5621 Commits

Author SHA1 Message Date
6fe060b79e [fix](streamload) fix http_stream retry mechanism (#24978)
If a failure occurs, Doris may retry. Because ctx->is_read_schema is a global variable that is not reset in time, the retry may cause exceptions.


---------

Co-authored-by: yiguolei <676222867@qq.com>
2023-10-08 11:16:21 +08:00
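A minimal sketch of the idea behind the retry fix in #24978, not the actual Doris code: per-attempt state is reset before every retry so a failed attempt cannot leak its flag into the next one. Only the name `is_read_schema` comes from the commit message; `LoadContext`, `reset_for_retry`, `run_with_retry`, and the flag's initial value are assumptions made for illustration.

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-in for a per-request load context. Only the flag name
// is_read_schema is taken from the commit message; its initial value is assumed.
struct LoadContext {
    bool is_read_schema = true;  // assumed: true until the schema has been consumed
    int attempts = 0;            // number of attempts started so far
};

// Restores per-attempt state so a retry does not see the flag value
// left behind by a failed attempt.
inline void reset_for_retry(LoadContext& ctx) {
    ctx.is_read_schema = true;
}

// Runs `attempt` up to max_attempts times, resetting the context before each try.
// Returns true as soon as one attempt succeeds.
inline bool run_with_retry(LoadContext& ctx, int max_attempts,
                           const std::function<bool(LoadContext&)>& attempt) {
    assert(max_attempts > 0);
    for (int i = 0; i < max_attempts; ++i) {
        reset_for_retry(ctx);
        ++ctx.attempts;
        if (attempt(ctx)) {
            return true;
        }
    }
    return false;
}
```

The point of the design is that the reset lives in the retry loop itself, so no caller can forget it.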
feb1cbe9ed [bug](partition_sort)partition sort need sort all data in two phase global (#24960)
PR #24886 marked the phase in FE; this PR adds the corresponding changes in BE.
Partition sort needs to sort all data in the two-phase global phase.
2023-10-08 10:46:43 +08:00
4e8cde127c [Enhance](catalog)add table cache in paimon jni (#25014)
- fix getting the old schema after refreshing a paimon table
- add table cache in paimon jni
2023-10-08 10:36:18 +08:00
239df5860b [enhancement](tablet_meta_lock) add more trace for write lock of tablet's _meta_lock (#25095) 2023-10-08 10:28:10 +08:00
f66708db0e [log](load) PUBLISH_TIMEOUT should not print stacktrace (#25080) 2023-10-08 10:16:25 +08:00
0df32c8e3e [Fix](Outfile) Use data_type_serde to export data to csv file format (#24721)
Modify the outfile logic to use the data type serde framework.
2023-10-07 22:50:44 +08:00
8953179c11 [fix](multi-table) fix multi table task cannot end (#25056)
When executing a multi-table task, the task cannot end if the exec plan fails, which prevents other routine load tasks from being submitted.
2023-10-07 19:45:42 +08:00
59261174d5 [chore](unused) Remove unused variable CPU_HARD_LIMIT in task_group.cc (#25076)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-10-07 03:36:13 -05:00
335804bb25 [fix](pipelinex) fix multi cast sink without init (#25066) 2023-10-07 15:49:03 +08:00
7b2ff38401 query cpu hard limit based on doris scheduler (#24844) 2023-10-07 12:03:07 +08:00
0631ed61b0 [feature](profilev2) Preliminary support for profilev2. (#24881)
You can set the level of counters on the backend using ADD_COUNTER_WITH_LEVEL/ADD_TIMER_WITH_LEVEL. The profile can then merge counters with level 1.
set profile_level = 1;
For example:
SQL:
select count(*) from customer join item on c_customer_sk = i_item_sk

Profile:

Simple profile

PLAN FRAGMENT 0
  OUTPUT EXPRS:
    count(*)
  PARTITION: UNPARTITIONED

  VRESULT SINK
     MYSQL_PROTOCAL


  7:VAGGREGATE (merge finalize)
  |  output: count(partial_count(*))[#44]
  |  group by:
  |  cardinality=1
  |  TotalTime: avg 725.608us, max 725.608us, min 725.608us
  |  RowsReturned: 1
  |
  6:VEXCHANGE
     offset: 0
     TotalTime: avg 52.411us, max 52.411us, min 52.411us
     RowsReturned: 8

PLAN FRAGMENT 1

  PARTITION: HASH_PARTITIONED: c_customer_sk

  STREAM DATA SINK
    EXCHANGE ID: 06
    UNPARTITIONED

    TotalTime: avg 106.263us, max 118.38us, min 81.403us
    BlocksSent: 8

  5:VAGGREGATE (update serialize)
  |  output: partial_count(*)[#43]
  |  group by:
  |  cardinality=1
  |  TotalTime: avg 679.296us, max 739.395us, min 554.904us
  |  BuildTime: avg 33.198us, max 48.387us, min 28.880us
  |  ExecTime: avg 27.633us, max 40.278us, min 24.537us
  |  RowsReturned: 8
  |
  4:VHASH JOIN
  |  join op: INNER JOIN(PARTITIONED)[]
  |  equal join conjunct: c_customer_sk = i_item_sk
  |  runtime filters: RF000[bloom] <- i_item_sk(18000/16384/1048576)
  |  cardinality=17,740
  |  vec output tuple id: 3
  |  vIntermediate tuple ids: 2
  |  hash output slot ids: 22
  |  RowsReturned: 18.0K (18000)
  |  ProbeRows: 18.0K (18000)
  |  ProbeTime: avg 862.308us, max 1.576ms, min 666.28us
  |  BuildRows: 18.0K (18000)
  |  BuildTime: avg 3.8ms, max 3.860ms, min 2.317ms
  |
  |----1:VEXCHANGE
  |       offset: 0
  |       TotalTime: avg 48.822us, max 67.459us, min 30.380us
  |       RowsReturned: 18.0K (18000)
  |
  3:VEXCHANGE
     offset: 0
     TotalTime: avg 33.162us, max 39.480us, min 28.854us
     RowsReturned: 18.0K (18000)

PLAN FRAGMENT 2

  PARTITION: HASH_PARTITIONED: c_customer_id

  STREAM DATA SINK
    EXCHANGE ID: 03
    HASH_PARTITIONED: c_customer_sk

    TotalTime: avg 753.954us, max 1.210ms, min 499.470us
    BlocksSent: 64

  2:VOlapScanNode
     TABLE: default_cluster:tpcds.customer(customer), PREAGGREGATION: ON
     runtime filters: RF000[bloom] -> c_customer_sk
     partitions=1/1, tablets=12/12, tabletList=1550745,1550747,1550749 ...
     cardinality=100000, avgRowSize=0.0, numNodes=1
     pushAggOp=NONE
     TotalTime: avg 18.417us, max 41.319us, min 10.189us
     RowsReturned: 18.0K (18000)
---------

Co-authored-by: yiguolei <676222867@qq.com>
2023-10-07 11:16:53 +08:00
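The avg/max/min figures in the profile of #24881 above can be pictured as a merge over per-instance counter values. The sketch below is an illustration only and does not use the Doris profile classes: `CounterSample`, `MergedCounter`, and `merge_counters` are hypothetical names, and treating "level 1" as "keep samples whose level is at most `profile_level`" is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-instance counter value tagged with a level.
struct CounterSample {
    int level;
    std::int64_t value_ns;
};

// Hypothetical merged view, matching the "avg / max / min" layout of the profile text.
struct MergedCounter {
    std::int64_t avg_ns;
    std::int64_t max_ns;
    std::int64_t min_ns;
    std::size_t samples;
};

// Merges all samples whose level is at most profile_level (assumed filter rule).
// Returns zeros when no sample qualifies.
inline MergedCounter merge_counters(const std::vector<CounterSample>& samples,
                                    int profile_level) {
    assert(profile_level >= 0);
    MergedCounter out{0, 0, 0, 0};
    std::int64_t sum = 0;
    for (const CounterSample& s : samples) {
        if (s.level > profile_level) {
            continue;  // counter is too detailed for the requested level
        }
        if (out.samples == 0) {
            out.max_ns = s.value_ns;
            out.min_ns = s.value_ns;
        } else {
            out.max_ns = std::max(out.max_ns, s.value_ns);
            out.min_ns = std::min(out.min_ns, s.value_ns);
        }
        sum += s.value_ns;
        ++out.samples;
    }
    if (out.samples > 0) {
        out.avg_ns = sum / static_cast<std::int64_t>(out.samples);
    }
    return out;
}
```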
83a9d07288 [refactor](segment iterator) remove some code to make the logic more clear (#25050)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-10-07 11:14:28 +08:00
bd582aee75 [pipelineX](minor) refine code (#25015) 2023-10-07 10:45:33 +08:00
a9d12f7b82 [Debug](float) Add clang debug tune float accuracy (#25041) 2023-10-07 09:34:50 +08:00
c2b46e4df7 [fix](move-memtable) exclude rpc memory in flush mem-tracker (#24722) 2023-10-05 22:10:53 +08:00
db6c16058a [improve](move-memtable) always share load streams (#24763) 2023-10-05 22:09:59 +08:00
93eedaff62 [opt](function) Use Dict to opt the function of time_round (#25029)
Before:

select hour_floor(`@timestamp`, 7) as t, count() as cnt from httplogs_date group by t order by t limit 10;
+---------------------+--------+
| t                   | cnt    |
+---------------------+--------+
| 1998-04-30 21:00:00 |    324 |
| 1998-05-01 04:00:00 | 286156 |
| 1998-05-01 11:00:00 | 266130 |
| 1998-05-01 18:00:00 | 483765 |
| 1998-05-02 01:00:00 | 276706 |
| 1998-05-02 08:00:00 | 169945 |
| 1998-05-02 15:00:00 | 223593 |
| 1998-05-02 22:00:00 | 272616 |
| 1998-05-03 05:00:00 | 188689 |
| 1998-05-03 12:00:00 | 184405 |
+---------------------+--------+
10 rows in set (3.39 sec)
After:

select hour_floor(`@timestamp`, 7) as t, count() as cnt from httplogs_date group by t order by t limit 10;
+---------------------+--------+
| t                   | cnt    |
+---------------------+--------+
| 1998-04-30 21:00:00 |    324 |
| 1998-05-01 04:00:00 | 286156 |
| 1998-05-01 11:00:00 | 266130 |
| 1998-05-01 18:00:00 | 483765 |
| 1998-05-02 01:00:00 | 276706 |
| 1998-05-02 08:00:00 | 169945 |
| 1998-05-02 15:00:00 | 223593 |
| 1998-05-02 22:00:00 | 272616 |
| 1998-05-03 05:00:00 | 188689 |
| 1998-05-03 12:00:00 | 184405 |
+---------------------+--------+
10 rows in set (2.19 sec)
2023-10-04 23:34:24 +08:00
4ce5213b1c [fix](insert) Fix test_group_commit_stream_load and add more regression in test_group_commit_http_stream (#24954) 2023-10-03 20:56:24 +08:00
6e836fe381 [fix](jdbc catalog) fix jdbc catalog read bitmap data crash (#25034) 2023-10-03 20:52:47 +08:00
10f0c63896 [FIX](complex-type) fix agg table with complex type with replace state (#24873)
2023-10-03 16:32:58 +08:00
f8a3034dca [Opt](performance) refactor and opt time round floor function (#25026)
2023-10-01 11:51:26 +08:00
642e5cdb69 [Fix](Status) Make Status [[nodiscard]] and handle returned Status correctly (#23395) 2023-09-29 22:38:52 +08:00
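As a rough illustration of what marking a status type `[[nodiscard]]` (#23395) asks of callers: the compiler warns when a returned status is dropped, so each call site has to check or propagate it. The `Status` class below is a simplified stand-in, not the Doris class, and `open_segment` and `load_two_segments` are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Simplified stand-in for a status type. Marking the class [[nodiscard]] makes
// the compiler warn whenever a function's returned Status is ignored.
class [[nodiscard]] Status {
public:
    static Status OK() { return Status(true, ""); }
    static Status Error(std::string msg) { return Status(false, std::move(msg)); }

    bool ok() const { return _ok; }

    // Only meaningful for failed statuses.
    const std::string& message() const {
        assert(!_ok);
        return _msg;
    }

private:
    Status(bool ok, std::string msg) : _ok(ok), _msg(std::move(msg)) {}
    bool _ok;
    std::string _msg;
};

// Hypothetical operation that can fail.
inline Status open_segment(int segment_id) {
    if (segment_id < 0) {
        return Status::Error("negative segment id");
    }
    return Status::OK();
}

// Propagates the first failure instead of silently dropping it.
inline Status load_two_segments(int first, int second) {
    // Writing `open_segment(first);` alone would draw a discarded-result warning.
    Status st = open_segment(first);
    if (!st.ok()) {
        return st;
    }
    return open_segment(second);
}
```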
d23bedf170 [fix](single-replica-load) fix duplicated done run in request_slave_tablet_pull_rowset (#25013)
BE crashes because done is run twice when try_offer() fails in
request_slave_tablet_pull_rowset.

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-09-28 21:08:18 +08:00
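One common way to rule out the double-run class of bug described in #25013 is to wrap the completion callback so that a second invocation becomes a no-op. This is a sketch under stated assumptions, not the change made in the PR: `OnceClosure` and `handle_request` are hypothetical, `try_offer` here is a plain function object standing in for a thread-pool submission, and the wrapper is not thread-safe.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical wrapper that runs a completion callback at most once.
// Not thread-safe; a real implementation would need an atomic flag.
class OnceClosure {
public:
    explicit OnceClosure(std::function<void()> fn) : _fn(std::move(fn)) {
        assert(_fn != nullptr);
    }

    // Runs the callback on the first call only; returns whether it ran.
    bool run() {
        if (_done) {
            return false;
        }
        _done = true;
        _fn();
        return true;
    }

    bool has_run() const { return _done; }

private:
    std::function<void()> _fn;
    bool _done = false;
};

// Hypothetical handler. try_offer receives the task and reports whether it
// was accepted. If it is rejected, the request is completed here; the guard
// in OnceClosure keeps a second completion path from running the callback again.
inline void handle_request(
        const std::function<bool(std::function<void()>)>& try_offer,
        OnceClosure& done) {
    bool accepted = try_offer([&done] { done.run(); });
    if (!accepted) {
        done.run();
    }
}
```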
864a0f9bcb [opt](pipeline) Make pipeline fragment context send_report asynchronized (#23142) 2023-09-28 17:55:53 +08:00
2ec50dcfc7 [log](compaction) add more stats for compaction log (#24984) 2023-09-28 15:29:15 +08:00
b6babf3af4 [pipelineX](sink) support jdbc table sink (#24970)
2023-09-28 14:39:32 +08:00
b35171b582 [pipelineX](bug) fix distinct streaming agg (#24995) 2023-09-28 14:01:26 +08:00
f0fad61db4 [pipelineX](bug) Fix file scan operator (#24989) 2023-09-28 11:12:27 +08:00
188d9ab94e [enhancement](statistics) collect table level loaded rows on BE to make RPC light weight (#24609) 2023-09-28 10:51:50 +08:00
430634367a [pipelineX](node)support file scan operator (#24924) 2023-09-27 22:10:43 +08:00
68087f6c82 [fix](json function) Fix the slow performance of get_json_path when processing JSONB (#24631)
When processing JSONB, automatically convert to jsonb_extract_string
2023-09-27 21:17:39 +08:00
d4e823950a [bug](json)Fix some problems of json function on Nereids (#24898)
Fix some problems with the json_length and json_contains functions on Nereids.
Fix the wrong result of the json_contains function.
Enable Nereids in the jsonb_p0 regression test.
2023-09-27 21:01:45 +08:00
947b116318 [pipelineX](fix) Fix BE crash due to ES scan operator (#24983) 2023-09-27 20:45:38 +08:00
1fb9022d07 [pipelineX](bug) Fix meta scan operator (#24963) 2023-09-27 20:34:47 +08:00
671b5f0a0a [Bug](pipeline) Fix block reusing for union source operator (#24977)
[CANCELLED][INTERNAL_ERROR]Merge block not match, self:[String], input:[String, Nullable(String), Nullable(String), Nullable(String), Nullable(String), DateV2]
2023-09-27 19:41:56 +08:00
5d138b6928 [remove](function) make execute_impl const and remove running_difference function (#24935) 2023-09-27 18:17:28 +08:00
c04078f3b8 [improvement](compaction) output tablet_id when be core dumped. (#24952) 2023-09-27 16:50:18 +08:00
19cff5d167 [fix](compile) failed on arm platform, with clang compiler and pch on (#24636)
2023-09-27 16:47:02 +08:00
Pxl
5fc04b6aeb [Improvement](hash) some refactor of process hash table probe impl (#24461)
2023-09-27 16:14:49 +08:00
aa4dbbedc7 [pipelineX](bug) Fix dead lock in exchange sink operator (#24947) 2023-09-27 15:40:25 +08:00
87a30dc41d [feature-wip](arrow-flight)(step3) Support authentication and user session (#24772) 2023-09-27 14:53:58 +08:00
26818de9c8 [feature](jni) support complex types in jni framework (#24810)
Support complex types in the JNI framework; verified end-to-end on Hudi.
### How to Use
Other scanners only need to implement three interfaces in `ColumnValue`:
```
// Get array elements and append into values
void unpackArray(List<ColumnValue> values);

// Get map key array&value array, and append into keys&values
void unpackMap(List<ColumnValue> keys, List<ColumnValue> values);

// Get the struct fields specified by `structFieldIndex`, and append into values
void unpackStruct(List<Integer> structFieldIndex, List<ColumnValue> values);
```
Developers can take `HudiColumnValue` as an example.
2023-09-27 14:47:41 +08:00
1b0e3246ea [pipelineX](fix) Fix exception reporting and Nereids plan (#24936) 2023-09-27 13:15:40 +08:00
c04e5bac39 [bug](pipelineX) fix java-udaf failed with open pipelineX (#24939) 2023-09-27 13:14:10 +08:00
452318a9fc [Enhancement](streamload) stream tvf support user specified label (#24219)
The stream TVF now supports a user-specified label.
Example:

curl -v --location-trusted -u root: -H "sql: insert into test.t1 WITH LABEL label1 select c1,c2 from http_stream(\"format\" = \"CSV\", \"column_separator\" = \",\")" -T example.csv http://127.0.0.1:8030/api/_http_stream
Response:

{
    "TxnId": 2064,
    "Label": "label1",
    "Comment": "",
    "TwoPhaseCommit": "false",
    "Status": "Success",
    "Message": "OK",
    "NumberTotalRows": 2,
    "NumberLoadedRows": 2,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 27,
    "LoadTimeMs": 152,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 83,
    "ReadDataTimeMs": 92,
    "WriteDataTimeMs": 41,
    "CommitAndPublishTimeMs": 24
}
2023-09-27 12:09:35 +08:00
24ee3607e1 [Bug](pipeline) nullprt may be close the sink if init failed (#24926) 2023-09-27 09:11:06 +08:00
a689a2fbb1 [pipelineX](fix) Fix projection expression (#24923) 2023-09-26 21:48:28 +08:00
55d1090137 [feature](insert) Support group commit stream load (#24304) 2023-09-26 20:57:02 +08:00
fe2879d8fe [fix](merge-on-write) MergeIndexDeleteBitmapCalculator stack overflow (#24913) 2023-09-26 20:32:23 +08:00
77e864df12 [enhancement](delete) use column id in delete push task instead of column name (#24549) 2023-09-26 19:54:55 +08:00