Commit Graph

5781 Commits

Author SHA1 Message Date
5a55e47acd [Enhancement](Load) stream tvf support two phase commit (#23800) 2023-10-09 14:15:56 +08:00
9e31cb26bb [fix](parse_url) fix parse_url is not working in some case to extract the HOST (#25040)
Issue Number: close #24452
2023-10-09 00:14:58 +08:00
451e299151 [Opt](performance) Optimize timeround with minute / second (#25073) 2023-10-08 23:14:23 +08:00
5c020be4d2 [Bug](join) corner case cause the mark join + null aware left join core dump in regression test in pipeline query engine (#25087) 2023-10-08 22:50:12 +08:00
9d8b993c51 [fix](fs) fix remove error log failed (#25108) 2023-10-08 22:15:37 +08:00
7e9ffad933 [fix](ES catalog)Doris cannot parse ES date field without time zone (#24864)
1. Add support for Doris to parse ES date field without time zone info. eg: `2023-04-17T23:01:18.151`, this time will be treated as UTC time, since ES assumes that the time zone for time fields without time zones is UTC.
2. Change local time zone convertion from system local time zone to session variable time zone.
2023-10-08 19:28:08 +08:00
b91335dbb8 [refactor](columndecimal) is_decimal_v2 member is useless because column decimal could detect by itself (#25110)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-10-08 18:09:19 +08:00
c3d9f42a3e [fix](scanner) fix load cannot end when set exec_mem_limit (#25090) 2023-10-08 17:07:30 +08:00
6fe060b79e [fix](streamload) fix http_stream retry mechanism (#24978)
If a failure occurs, doris may retry. Due to ctx->is_read_schema is a global variable that has not been reset in a timely manner, which may cause exceptions.


---------

Co-authored-by: yiguolei <676222867@qq.com>
2023-10-08 11:16:21 +08:00
feb1cbe9ed [bug](partition_sort)partition sort need sort all data in two phase global (#24960)
#24886 this PR have mark phase in FE, now add those change in BE.
partition sort need sort all data in two pahse global
2023-10-08 10:46:43 +08:00
4e8cde127c [Enhance](catalog)add table cache in paimon jni (#25014)
- fix get old schema after refresh paimon table
- add table cache in paimon jni
2023-10-08 10:36:18 +08:00
239df5860b [enhancement](tablet_meta_lock) add more trace for write lock of tablet's _meta_lock (#25095) 2023-10-08 10:28:10 +08:00
f66708db0e [log](load) PUBLISH_TIMEOUT should not print stacktrace (#25080) 2023-10-08 10:16:25 +08:00
0df32c8e3e [Fix](Outfile) Use data_type_serde to export data to csv file format (#24721)
Modify the outfile logic, use the data type serde framework.
2023-10-07 22:50:44 +08:00
cb0076e585 [fix](insert) fix group commit be ut (#24968) 2023-10-07 19:50:05 +08:00
8953179c11 [fix](multi-table) fix multi table task cannot end (#25056)
When exec multi table task, it can not end when exec plan error, which causes other routine load task can not submit.
2023-10-07 19:45:42 +08:00
59261174d5 [chore](unused) Remove unused variable CPU_HARD_LIMIT in task_group.cc (#25076)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-10-07 03:36:13 -05:00
335804bb25 [fix](pipelinex) fix multi cast sink without init (#25066) 2023-10-07 15:49:03 +08:00
7b2ff38401 query cpu hard limit based on doris scheduler (#24844) 2023-10-07 12:03:07 +08:00
0631ed61b0 [feature](profilev2) Preliminary support for profilev2. (#24881)
You can set the level of counters on the backend using ADD_COUNTER_WITH_LEVEL/ADD_TIMER_WITH_LEVEL. The profile can then merge counters with level 1.
set profile_level = 1;
such as
sql
select count(*) from customer join item on c_customer_sk = i_item_sk

profile

Simple  profile  
  
  PLAN  FRAGMENT  0
    OUTPUT  EXPRS:
        count(*)
    PARTITION:  UNPARTITIONED

    VRESULT  SINK
          MYSQL_PROTOCAL


    7:VAGGREGATE  (merge  finalize)
    |    output:  count(partial_count(*))[#44]
    |    group  by:  
    |    cardinality=1
    |    TotalTime:  avg  725.608us,  max  725.608us,  min  725.608us
    |    RowsReturned:  1
    |    
    6:VEXCHANGE
          offset:  0
          TotalTime:  avg  52.411us,  max  52.411us,  min  52.411us
          RowsReturned:  8

PLAN  FRAGMENT  1

    PARTITION:  HASH_PARTITIONED:  c_customer_sk

    STREAM  DATA  SINK
        EXCHANGE  ID:  06
        UNPARTITIONED

        TotalTime:  avg  106.263us,  max  118.38us,  min  81.403us
        BlocksSent:  8

    5:VAGGREGATE  (update  serialize)
    |    output:  partial_count(*)[#43]
    |    group  by:  
    |    cardinality=1
    |    TotalTime:  avg  679.296us,  max  739.395us,  min  554.904us
    |    BuildTime:  avg  33.198us,  max  48.387us,  min  28.880us
    |    ExecTime:  avg  27.633us,  max  40.278us,  min  24.537us
    |    RowsReturned:  8
    |    
    4:VHASH  JOIN
    |    join  op:  INNER  JOIN(PARTITIONED)[]
    |    equal  join  conjunct:  c_customer_sk  =  i_item_sk
    |    runtime  filters:  RF000[bloom]  <-  i_item_sk(18000/16384/1048576)
    |    cardinality=17,740
    |    vec  output  tuple  id:  3
    |    vIntermediate  tuple  ids:  2  
    |    hash  output  slot  ids:  22  
    |    RowsReturned:  18.0K  (18000)
    |    ProbeRows:  18.0K  (18000)
    |    ProbeTime:  avg  862.308us,  max  1.576ms,  min  666.28us
    |    BuildRows:  18.0K  (18000)
    |    BuildTime:  avg  3.8ms,  max  3.860ms,  min  2.317ms
    |    
    |----1:VEXCHANGE
    |              offset:  0
    |              TotalTime:  avg  48.822us,  max  67.459us,  min  30.380us
    |              RowsReturned:  18.0K  (18000)
    |        
    3:VEXCHANGE
          offset:  0
          TotalTime:  avg  33.162us,  max  39.480us,  min  28.854us
          RowsReturned:  18.0K  (18000)

PLAN  FRAGMENT  2

    PARTITION:  HASH_PARTITIONED:  c_customer_id

    STREAM  DATA  SINK
        EXCHANGE  ID:  03
        HASH_PARTITIONED:  c_customer_sk

        TotalTime:  avg  753.954us,  max  1.210ms,  min  499.470us
        BlocksSent:  64

    2:VOlapScanNode
          TABLE:  default_cluster:tpcds.customer(customer),  PREAGGREGATION:  ON
          runtime  filters:  RF000[bloom]  ->  c_customer_sk
          partitions=1/1,  tablets=12/12,  tabletList=1550745,1550747,1550749  ...
          cardinality=100000,  avgRowSize=0.0,  numNodes=1
          pushAggOp=NONE
          TotalTime:  avg  18.417us,  max  41.319us,  min  10.189us
          RowsReturned:  18.0K  (18000)
---------

Co-authored-by: yiguolei <676222867@qq.com>
2023-10-07 11:16:53 +08:00
83a9d07288 [refactor](segment iterator) remove some code to make the logic more clear (#25050)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-10-07 11:14:28 +08:00
bd582aee75 [pipelineX](minor) refine code (#25015) 2023-10-07 10:45:33 +08:00
a9d12f7b82 [Debug](float) Add clang debug tune float accuracy (#25041) 2023-10-07 09:34:50 +08:00
c2b46e4df7 [fix](move-memtable) exclude rpc memory in flush mem-tracker (#24722) 2023-10-05 22:10:53 +08:00
db6c16058a [improve](move-memtable) always share load streams (#24763) 2023-10-05 22:09:59 +08:00
93eedaff62 [opt](function) Use Dict to opt the function of time_round (#25029)
Before:

select hour_floor(`@timestamp`, 7) as t, count() as cnt from httplogs_date group by t order by t limit 10;
+---------------------+--------+
| t                   | cnt    |
+---------------------+--------+
| 1998-04-30 21:00:00 |    324 |
| 1998-05-01 04:00:00 | 286156 |
| 1998-05-01 11:00:00 | 266130 |
| 1998-05-01 18:00:00 | 483765 |
| 1998-05-02 01:00:00 | 276706 |
| 1998-05-02 08:00:00 | 169945 |
| 1998-05-02 15:00:00 | 223593 |
| 1998-05-02 22:00:00 | 272616 |
| 1998-05-03 05:00:00 | 188689 |
| 1998-05-03 12:00:00 | 184405 |
+---------------------+--------+
10 rows in set (3.39 sec)
after:

select hour_floor(`@timestamp`, 7) as t, count() as cnt from httplogs_date group by t order by t limit 10;
+---------------------+--------+
| t                   | cnt    |
+---------------------+--------+
| 1998-04-30 21:00:00 |    324 |
| 1998-05-01 04:00:00 | 286156 |
| 1998-05-01 11:00:00 | 266130 |
| 1998-05-01 18:00:00 | 483765 |
| 1998-05-02 01:00:00 | 276706 |
| 1998-05-02 08:00:00 | 169945 |
| 1998-05-02 15:00:00 | 223593 |
| 1998-05-02 22:00:00 | 272616 |
| 1998-05-03 05:00:00 | 188689 |
| 1998-05-03 12:00:00 | 184405 |
+---------------------+--------+
10 rows in set (2.19 sec)
2023-10-04 23:34:24 +08:00
4ce5213b1c [fix](insert) Fix test_group_commit_stream_load and add more regression in test_group_commit_http_stream (#24954) 2023-10-03 20:56:24 +08:00
6e836fe381 [fix](jdbc catalog) fix jdbc catalog read bitmap data crash (#25034) 2023-10-03 20:52:47 +08:00
10f0c63896 [FIX](complex-type) fix agg table with complex type with replace state (#24873)
fix agg table with complex type with replace state
2023-10-03 16:32:58 +08:00
f8a3034dca [Opt](performance) refactor and opt time round floor function (#25026)
refactor and opt time round floor function
2023-10-01 11:51:26 +08:00
642e5cdb69 [Fix](Status) Make Status [[nodiscard]] and handle returned Status correctly (#23395) 2023-09-29 22:38:52 +08:00
d23bedf170 [fix](single-replica-load) fix duplicated done run in request_slave_tablet_pull_rowset (#25013)
BE will crash because done run twice when try_offer() failed in
request_slave_tablet_pull_rowset.

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-09-28 21:08:18 +08:00
864a0f9bcb [opt](pipeline) Make pipeline fragment context send_report asynchronized (#23142) 2023-09-28 17:55:53 +08:00
2ec50dcfc7 [log](compaction) add more stats for compaction log (#24984) 2023-09-28 15:29:15 +08:00
b6babf3af4 [pipelineX](sink) support jdbc table sink (#24970)
* [pipelineX](sink) support jdbc table sink
2023-09-28 14:39:32 +08:00
b35171b582 [pipelineX](bug) fix distinct streaming agg (#24995) 2023-09-28 14:01:26 +08:00
f0fad61db4 [pipelineX](bug) Fix file scan operator (#24989) 2023-09-28 11:12:27 +08:00
188d9ab94e [enhancement](statistics) collect table level loaded rows on BE to make RPC light weight (#24609) 2023-09-28 10:51:50 +08:00
430634367a [pipelineX](node)support file scan operator (#24924) 2023-09-27 22:10:43 +08:00
68087f6c82 [fix](json function) Fix the slow performance of get_json_path when processing JSONB (#24631)
When processing JSONB, automatically convert to jsonb_extract_string
2023-09-27 21:17:39 +08:00
d4e823950a [bug](json)Fix some problems of json function on Nereids (#24898)
Fix some problems of json_length and json_contains function on Nereids
fix wrong result of json_contains function
Regression test jsonb_p0 to enable Nereids
2023-09-27 21:01:45 +08:00
947b116318 [pipelineX](fix) Fix BE crash due to ES scan operator (#24983) 2023-09-27 20:45:38 +08:00
1fb9022d07 [pipelineX](bug) Fix meta scan operator (#24963) 2023-09-27 20:34:47 +08:00
671b5f0a0a [Bug](pipeline) Fix block reusing for union source operator (#24977)
[CANCELLED][INTERNAL_ERROR]Merge block not match, self:[String], input:[String, Nullable(String), Nullable(String), Nullable(String), Nullable(String), DateV2]
2023-09-27 19:41:56 +08:00
5d138b6928 [remove](function) make execute_impl const and remove running_difference function (#24935) 2023-09-27 18:17:28 +08:00
c04078f3b8 [improvement](compaction) output tablet_id when be core dumped. (#24952) 2023-09-27 16:50:18 +08:00
19cff5d167 [fix](compile) failed on arm platform, with clang compiler and pch on (#24636)
failed on arm platform, with clang compiler and pch on
2023-09-27 16:47:02 +08:00
Pxl
5fc04b6aeb [Improvement](hash) some refactor of process hash table probe impl (#24461)
some refactor of process hash table probe impl
2023-09-27 16:14:49 +08:00
aa4dbbedc7 [pipelineX](bug) Fix dead lock in exchange sink operator (#24947) 2023-09-27 15:40:25 +08:00
87a30dc41d [feature-wip](arrow-flight)(step3) Support authentication and user session (#24772) 2023-09-27 14:53:58 +08:00