Commit Graph

965 Commits

Author SHA1 Message Date
1e5ff40e17 [refactor](group commit) remove future block (#27720)
Co-authored-by: huanghaibin <284824253@qq.com>
2023-12-11 08:41:51 +08:00
05adbfdb3d [feature](inverted index) match_phrase_prefix feature added (#27404)
select count() from test_index_match_phrase_prefix where request match_phrase_prefix 'xxx';
2023-12-05 20:15:13 +08:00
e62d19d90d [improve](partition) support auto list partition with more columns (#27817)
Previously, the partition by clause could have only one column.
This limit is now removed, so it can have more columns.
2023-12-04 11:33:18 +08:00
10483ea12c [fix](profile) fix error set with peak_memory_usage in pipeline #27749 2023-12-02 14:12:38 +08:00
ce271ff382 [fix](parquet)fix can not read parquet lz4 compress. (#27383)
Fixed the problem of not being able to read parquet lz4 compressed format. By default, it is decompressed according to the Hadoop lz4 format. If it fails, it will fall back to the standard lz4 compression format.
2023-11-29 19:04:53 +08:00
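The try-Hadoop-first, then fall back behaviour described in ce271ff382 can be sketched as follows. This is a minimal illustration and not the BE code. It assumes the Hadoop framing is a sequence of records, each made of a 4-byte big-endian decompressed size, a 4-byte big-endian compressed size, and one raw block. To stay runnable with only the standard library, zlib stands in for the raw LZ4 block codec. All function names are hypothetical.

```python
import struct
import zlib


def raw_decompress(block: bytes) -> bytes:
    # Stand-in for a raw LZ4 block decoder (assumption: zlib keeps this runnable).
    return zlib.decompress(block)


def try_hadoop_framed(data: bytes):
    """Return the decoded bytes if data parses as Hadoop-style frames, else None."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        if len(data) - pos < 8:
            return None  # not enough bytes left for the two size fields
        expected_size, compressed_size = struct.unpack_from(">II", data, pos)
        pos += 8
        if compressed_size > len(data) - pos:
            return None  # declared block is longer than the remaining input
        try:
            chunk = raw_decompress(data[pos:pos + compressed_size])
        except zlib.error:
            return None
        if len(chunk) != expected_size:
            return None  # size prefix does not match the decoded length
        out += chunk
        pos += compressed_size
    return bytes(out)


def decompress_page(data: bytes) -> bytes:
    """Try the Hadoop framing first, then fall back to the plain format."""
    framed = try_hadoop_framed(data)
    if framed is not None:
        return framed
    return raw_decompress(data)
```

Validating both size fields before accepting the framed result is what makes the fallback safe: a plain block whose first bytes happen to look like a header is rejected and decoded as plain data instead.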
f565f60bc3 [refactor](standard)BE:Initialize pointer variables in the class to nullptr by default (#27587) 2023-11-28 13:02:30 +08:00
6ed0be8e3c [refactor](profilev2) unify the counter name in shuffle operator and normal operator (#27267)
Use blocksproduced and rowsproduced as the unified counter names in DataStreamSender and the other exec nodes, and in the exchange operator and the other operators.
"Blocks produced" and "rows produced" are easier to understand.
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-11-20 14:21:39 +08:00
836cda65d8 [refactor](profilev2) split merged profile to a single runtime profile to make the logic more clear (#27184) 2023-11-19 13:21:50 +08:00
2f41e0c823 [FIX](complextype)fix information schema for complex type (#27203)
When selecting from information schema, complex type information was not shown.
2023-11-18 11:32:32 +08:00
e29d8cb110 [feature](move-memtable) support pipelineX in sink v2 (#27067) 2023-11-16 15:00:55 +08:00
83edcdead9 [enhancement](random_sink) change tablet search algorithm from random to round-robin for random distribution table (#26611)
1. Fix a race condition when getting the tablet load index.
2. Change the tablet search algorithm from random to round-robin for random distribution tables when load_to_single_tablet is set to false.
2023-11-15 19:55:31 +08:00
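The two points in 83edcdead9 (a race-free index and round-robin selection) can be illustrated together. This is a sketch under assumptions, not the Doris implementation: the class name, its methods, and the use of a lock are all hypothetical.

```python
import threading


class RoundRobinTabletPicker:
    """Hypothetical helper: hands out tablet indexes 0..n-1 in rotation."""

    def __init__(self, tablet_count: int, start_index: int = 0) -> None:
        if tablet_count <= 0:
            raise ValueError("tablet_count must be positive")
        self._tablet_count = tablet_count
        self._next = start_index % tablet_count
        self._lock = threading.Lock()

    def next_index(self) -> int:
        # Read and advance under one lock, so concurrent callers cannot
        # read the same counter value before it has been advanced.
        with self._lock:
            index = self._next
            self._next = (self._next + 1) % self._tablet_count
            return index


picker = RoundRobinTabletPicker(tablet_count=3)
print([picker.next_index() for _ in range(7)])  # [0, 1, 2, 0, 1, 2, 0]
```

Without the lock, two threads could both read the same value of the counter and send their data to the same tablet, which is the kind of race the first point refers to.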
d3fd923447 [opt](pipeline) Return InternalError to FE instead of doing a useless DCHECK in ExecNode #27035
Effect: the client sees an error message like the one below when BE hits a logical error in the plan.

ERROR 1105 (HY000): errCode = 2, detailMessage = ([xxx]())[CANCELLED]Logical error during processing VNewOlapScanNode(dr_case_tag), output of projections 2 mismatches with exec node output 3
2023-11-15 18:15:21 +08:00
a5565f68b2 [Refactor](opentelemetry) Remove opentelemetry (#26605) 2023-11-09 18:05:34 +08:00
baae7bf339 [fix](information_schema)fix bug that metadata_name_ids error tableid and append information_schema case. (#26238)
Fix the bug related to #24059.
Added information_schema scanner tests for the following tables:
files
schema_privileges
table_privileges
partitions
rowsets
statistics
table_constraints

With infodb_support_ext_catalog=false, the tests now cover all tables under the information_schema database.
2023-11-09 14:07:12 +08:00
95f74f1544 [FIX](complextype)fix shrink in topN for complex type #26609 2023-11-09 10:56:14 +08:00
a3666aa87e [feature](decimal) support decimal256 when creating table (#26308) 2023-11-08 15:21:01 +08:00
16644eff7f [opt](load) optimize the performance of row distribution (#25546)
For non-pipeline non-sinkv2:
before: 14s
now: 6s
For pipeline + sinkv2:
before: 230ms * 48 instances
now: 38ms * 48 instances
2023-11-07 10:04:59 +08:00
dd8bcc831c [keyword](decimalv2) Add DecimalV2 keyword (#26283) 2023-11-02 16:27:12 +08:00
e20cab64f4 [improvement](scan) avoid too many scanners for file scan node (#25727)
Previously, when using a file scan node (e.g., querying a Hive table), the max number of scanners for each scan node
was `doris_scanner_thread_pool_thread_num` (default is 48).
If the query parallelism is N, the total number of scanners would be 48 * N, which is too many.

This PR changes the logic so that the max number of scanners for each scan node
is `doris_scanner_thread_pool_thread_num / query parallelism`, so the total number of scanners
is at most `doris_scanner_thread_pool_thread_num`.

Reducing the number of scanners can significantly reduce the memory usage of a query.
2023-10-29 17:41:31 +08:00
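The arithmetic in e20cab64f4 can be checked with a few lines. This is only a sketch: the function name is hypothetical, and both the integer division and the floor of one scanner per node are assumptions, since the commit message does not say how the division is rounded.

```python
def max_scanners_per_scan_node(thread_pool_size: int, parallelism: int) -> int:
    """Per-node scanner cap: pool size divided by query parallelism, at least 1."""
    if thread_pool_size <= 0 or parallelism <= 0:
        raise ValueError("both arguments must be positive")
    # Assumption: integer division, and never less than one scanner per node.
    return max(1, thread_pool_size // parallelism)


POOL_SIZE = 48  # default of doris_scanner_thread_pool_thread_num per the commit message

for parallelism in (1, 5, 8):
    before_total = POOL_SIZE * parallelism
    per_node = max_scanners_per_scan_node(POOL_SIZE, parallelism)
    after_total = per_node * parallelism
    print(parallelism, before_total, per_node, after_total)
# parallelism=8: 384 scanners before, 6 per node and 48 in total after
```

Note that with the one-scanner floor assumed here, the total can exceed the pool size once the parallelism is larger than the pool size itself.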
d6c64d305f [chore](log) Add log to trace query execution #25739 2023-10-26 14:09:25 +08:00
693982fd1a [feature](decimal) support decimal256 (#25386) 2023-10-25 15:47:51 +08:00
6b2eed779c [feature](AuditLog) add scanRows scanBytes in auditlog (#25435) 2023-10-25 10:00:35 +08:00
Pxl
2972daaed9 [Bug](status) process error status on es_scroll_parser and compaction_action (#25745)
2023-10-24 15:51:01 +08:00
a4c9beba85 [fix](move-memtable) fallback if partial update (#25801) 2023-10-24 10:29:59 +08:00
0e0f8090f7 [refactor](text_convert)Use serde to replace text_convert. (#25543)
Remove text_convert and use serde to replace it.
2023-10-24 09:52:43 +08:00
b5ee4a9dbb [enhancement](profilev2) add some fields for profile v2 (#25611)
Add 3 counters for ExecNode:

ExecTime - Total execution time (excluding the execution time of children).
OutputBytes - The total number of bytes output to parent.
BlockCount - The total count of blocks output to parent.
2023-10-23 15:55:40 +08:00
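One way to read the ExecTime definition in b5ee4a9dbb is as a node's inclusive time minus the inclusive time of its direct children. The sketch below only illustrates that reading. The `ProfileNode` type and the timings are hypothetical, and the commit message does not say whether BE derives the value this way or measures it directly.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProfileNode:
    """Hypothetical profile tree node; total_time_us includes the children."""
    name: str
    total_time_us: int
    children: List["ProfileNode"] = field(default_factory=list)


def exec_time_us(node: ProfileNode) -> int:
    # Exclusive time: drop the time spent inside the direct children.
    return node.total_time_us - sum(child.total_time_us for child in node.children)


scan = ProfileNode("scan", total_time_us=100)
exchange = ProfileNode("exchange", total_time_us=50)
join = ProfileNode("join", total_time_us=350, children=[scan, exchange])

print(exec_time_us(join))  # 200
print(exec_time_us(scan))  # 100
```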
Pxl
2e2d5bcba2 [Improvements](status) catch some error status (#25677)
2023-10-23 10:19:08 +08:00
Pxl
642c149e6a remove datetime_value and move vecdatetime_value to doris namespace (#25695)
2023-10-20 22:08:17 +08:00
2e97044706 [fix](move-memtable) fix inverted index condition (#25684) 2023-10-20 17:37:39 +08:00
d0cd535cb9 [improvement](insert) refactor group commit stream load (#25560) 2023-10-20 13:27:30 +08:00
159be51ea6 [bugfix](schema_change) Fix the coredump when doubly write during schema change (#22557) 2023-10-19 14:43:18 +08:00
11fecafb74 [fix](move-memtable) fallback if target table contains inverted index (#25498) 2023-10-18 22:11:59 +08:00
b0e0a0569a [Fix](row store) Real default value should be used instead of default… (#25230)
Before this PR the default value was not correct; the default value in the Frontend schema should be used.
2023-10-18 10:13:44 +08:00
b2e3ecb81d [opt](load)change load_to_single_tablet tablet search algorithm from random to round-robin (#25256)
At present, `load_to_single_tablet` loads pick a tablet by taking a random number modulo the tablet count, which cannot achieve a truly even distribution. This leads to uneven disk IO and uneven use of cluster resources. To solve this problem, each load now picks tablets round-robin within each partition, so that load is spread evenly across tablets.

When generating the load query plan, the tablet index currently being loaded is passed to BE.
A daemon task is added in FE to regularly clean up the `loadTabletRecordMap`. The map gets the bucket_number of the partition and updates the `load_tablet_index` when `getCurrentLoadTabletIndex` is called.
2023-10-16 16:43:25 +08:00
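The difference between random-modulo selection and per-partition round-robin in b2e3ecb81d can be made concrete with a small simulation. This is a sketch only: `LoadTabletRecord` is a hypothetical stand-in for the FE-side map, its method is named after `getCurrentLoadTabletIndex` purely for readability, and the bucket and load counts are made up.

```python
import random
from collections import Counter
from typing import Dict


class LoadTabletRecord:
    """Hypothetical stand-in for the FE-side map: partition id -> next tablet index."""

    def __init__(self) -> None:
        self._next_index: Dict[int, int] = {}

    def get_current_load_tablet_index(self, partition_id: int, bucket_number: int) -> int:
        # Re-apply the modulo in case the partition's bucket number changed.
        index = self._next_index.get(partition_id, 0) % bucket_number
        self._next_index[partition_id] = (index + 1) % bucket_number
        return index


def spread(counts: Counter, bucket_number: int) -> int:
    """Difference between the most and the least loaded tablet."""
    per_tablet = [counts[i] for i in range(bucket_number)]
    return max(per_tablet) - min(per_tablet)


BUCKETS, LOADS = 12, 120
record = LoadTabletRecord()
round_robin = Counter(
    record.get_current_load_tablet_index(1, BUCKETS) for _ in range(LOADS)
)
rng = random.Random(0)
random_pick = Counter(rng.randrange(1 << 31) % BUCKETS for _ in range(LOADS))

print("round-robin spread:", spread(round_robin, BUCKETS))  # 0: every tablet gets 10 loads
print("random spread:", spread(random_pick, BUCKETS))
```

Keeping one index per partition means that loads into different partitions do not disturb each other's rotation.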
2014e16cfb [fix](es catalog)fix es http timeout (#25273) 2023-10-12 10:21:55 +08:00
7e9ffad933 [fix](ES catalog)Doris cannot parse ES date field without time zone (#24864)
1. Add support for Doris to parse ES date fields without time zone info, e.g. `2023-04-17T23:01:18.151`. Such a value is treated as UTC time, since ES assumes UTC for time fields without a time zone.
2. Change local time zone conversion from the system local time zone to the session variable time zone.
2023-10-08 19:28:08 +08:00
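The two rules in 7e9ffad933 (treat a zone-less ES date as UTC, then convert to the session time zone) look like this in a short sketch. The function name and the +08:00 session time zone are assumptions chosen for illustration; this is not the BE parsing code.

```python
from datetime import datetime, timedelta, timezone


def parse_es_date(text: str, session_tz: timezone) -> datetime:
    """Parse an ISO-8601 date; assume UTC when no zone is given, then convert."""
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # Rule 1: a value without time zone info is treated as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    # Rule 2: convert to the session time zone, not the system local one.
    return parsed.astimezone(session_tz)


session_tz = timezone(timedelta(hours=8))  # hypothetical session variable value
local = parse_es_date("2023-04-17T23:01:18.151", session_tz)
print(local.isoformat(timespec="milliseconds"))  # 2023-04-18T07:01:18.151+08:00
```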
0631ed61b0 [feature](profilev2) Preliminary support for profilev2. (#24881)
You can set the level of counters on the backend using ADD_COUNTER_WITH_LEVEL/ADD_TIMER_WITH_LEVEL. The profile can then merge counters with level 1.
To enable it:
set profile_level = 1;
Example SQL:
select count(*) from customer join item on c_customer_sk = i_item_sk

Resulting profile:

Simple  profile  
  
  PLAN  FRAGMENT  0
    OUTPUT  EXPRS:
        count(*)
    PARTITION:  UNPARTITIONED

    VRESULT  SINK
          MYSQL_PROTOCAL


    7:VAGGREGATE  (merge  finalize)
    |    output:  count(partial_count(*))[#44]
    |    group  by:  
    |    cardinality=1
    |    TotalTime:  avg  725.608us,  max  725.608us,  min  725.608us
    |    RowsReturned:  1
    |    
    6:VEXCHANGE
          offset:  0
          TotalTime:  avg  52.411us,  max  52.411us,  min  52.411us
          RowsReturned:  8

PLAN  FRAGMENT  1

    PARTITION:  HASH_PARTITIONED:  c_customer_sk

    STREAM  DATA  SINK
        EXCHANGE  ID:  06
        UNPARTITIONED

        TotalTime:  avg  106.263us,  max  118.38us,  min  81.403us
        BlocksSent:  8

    5:VAGGREGATE  (update  serialize)
    |    output:  partial_count(*)[#43]
    |    group  by:  
    |    cardinality=1
    |    TotalTime:  avg  679.296us,  max  739.395us,  min  554.904us
    |    BuildTime:  avg  33.198us,  max  48.387us,  min  28.880us
    |    ExecTime:  avg  27.633us,  max  40.278us,  min  24.537us
    |    RowsReturned:  8
    |    
    4:VHASH  JOIN
    |    join  op:  INNER  JOIN(PARTITIONED)[]
    |    equal  join  conjunct:  c_customer_sk  =  i_item_sk
    |    runtime  filters:  RF000[bloom]  <-  i_item_sk(18000/16384/1048576)
    |    cardinality=17,740
    |    vec  output  tuple  id:  3
    |    vIntermediate  tuple  ids:  2  
    |    hash  output  slot  ids:  22  
    |    RowsReturned:  18.0K  (18000)
    |    ProbeRows:  18.0K  (18000)
    |    ProbeTime:  avg  862.308us,  max  1.576ms,  min  666.28us
    |    BuildRows:  18.0K  (18000)
    |    BuildTime:  avg  3.8ms,  max  3.860ms,  min  2.317ms
    |    
    |----1:VEXCHANGE
    |              offset:  0
    |              TotalTime:  avg  48.822us,  max  67.459us,  min  30.380us
    |              RowsReturned:  18.0K  (18000)
    |        
    3:VEXCHANGE
          offset:  0
          TotalTime:  avg  33.162us,  max  39.480us,  min  28.854us
          RowsReturned:  18.0K  (18000)

PLAN  FRAGMENT  2

    PARTITION:  HASH_PARTITIONED:  c_customer_id

    STREAM  DATA  SINK
        EXCHANGE  ID:  03
        HASH_PARTITIONED:  c_customer_sk

        TotalTime:  avg  753.954us,  max  1.210ms,  min  499.470us
        BlocksSent:  64

    2:VOlapScanNode
          TABLE:  default_cluster:tpcds.customer(customer),  PREAGGREGATION:  ON
          runtime  filters:  RF000[bloom]  ->  c_customer_sk
          partitions=1/1,  tablets=12/12,  tabletList=1550745,1550747,1550749  ...
          cardinality=100000,  avgRowSize=0.0,  numNodes=1
          pushAggOp=NONE
          TotalTime:  avg  18.417us,  max  41.319us,  min  10.189us
          RowsReturned:  18.0K  (18000)
---------

Co-authored-by: yiguolei <676222867@qq.com>
2023-10-07 11:16:53 +08:00
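The idea in 0631ed61b0 of tagging counters with a level and merging only level-1 counters across instances can be sketched as below. This is an illustration under assumptions: the `LeveledTimer` type, the equality test on the level, and the avg/max/min output (modelled on the TotalTime lines in the profile above) are hypothetical and do not reproduce the ADD_TIMER_WITH_LEVEL macro or the real merge code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LeveledTimer:
    """Hypothetical timer value tagged with a profile level."""
    value_ns: int
    level: int


def merge_timers(
    instances: List[Dict[str, LeveledTimer]], merge_level: int = 1
) -> Dict[str, Dict[str, float]]:
    """Collect timers of the given level from every instance, report avg/max/min."""
    collected: Dict[str, List[int]] = {}
    for profile in instances:
        for name, timer in profile.items():
            if timer.level != merge_level:
                continue  # assumption: other levels stay out of the merged profile
            collected.setdefault(name, []).append(timer.value_ns)
    return {
        name: {"avg": sum(vals) / len(vals), "max": max(vals), "min": min(vals)}
        for name, vals in collected.items()
    }


instances = [
    {"TotalTime": LeveledTimer(100, 1), "DebugTime": LeveledTimer(7, 2)},
    {"TotalTime": LeveledTimer(200, 1), "DebugTime": LeveledTimer(9, 2)},
    {"TotalTime": LeveledTimer(300, 1)},
]
print(merge_timers(instances))
# {'TotalTime': {'avg': 200.0, 'max': 300, 'min': 100}}
```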
642e5cdb69 [Fix](Status) Make Status [[nodiscard]] and handle returned Status correctly (#23395) 2023-09-29 22:38:52 +08:00
864a0f9bcb [opt](pipeline) Make pipeline fragment context send_report asynchronized (#23142) 2023-09-28 17:55:53 +08:00
3b4d8b4ac8 [pipelineX](feature) Support schema scan operator (#24850) 2023-09-25 14:42:25 +08:00
ac55d45f79 [Fix](topn opt) fix heap use after free when shrink in fetch phase (#24774) 2023-09-22 19:48:05 +08:00
09e03247ec [chore](readability) Better readability of ExecNode.cpp #24733 2023-09-22 08:54:57 +08:00
58ab25ccaa Revert "[Feature](merge-on-write)Support ignore mode for merge-on-write unique table (#21773)" (#24731)
This reverts commit 3ee89aea35726197cb7e94bb4f2c36bc9d50da84.
2023-09-21 21:01:28 +08:00
dc9fa1a4f1 [Refactor](Sink) convert to tablet sink to tablet writer (#24474) 2023-09-20 14:47:18 +08:00
b9ddcbf729 [feature](merge-cloud) Rewrite code related to IOContext (#24269) 2023-09-15 19:57:58 +08:00
3ee89aea35 [Feature](merge-on-write)Support ignore mode for merge-on-write unique table (#21773) 2023-09-14 18:03:51 +08:00
11afd321cb [fix](es catalog) fix issue with select and insert from es catalog core (#24318)
Issue Number: close #24315

The root cause of this issue is that Elasticsearch's long type allows inserting floats and strings. Doris did not handle these cases when doing type conversion. The current strategy is to take the integer before the decimal point if a float or string is found.
2023-09-13 23:07:31 +08:00
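The conversion strategy described in 11afd321cb, taking the integer before the decimal point when an ES long field holds a float or a string, can be sketched as follows. The function name is hypothetical, and the handling of inputs such as ".5" is an assumption, since the commit message does not cover them.

```python
from typing import Union


def coerce_es_long(value: Union[int, float, str]) -> int:
    """Keep only the integer part of a value found in an ES long field."""
    if isinstance(value, bool):
        raise TypeError("bool is not a valid long value")
    if isinstance(value, int):
        return value
    if isinstance(value, float):
        return int(value)  # int() truncates toward zero
    text = value.strip()
    integer_part = text.split(".", 1)[0]
    if integer_part in ("", "+", "-"):
        return 0  # assumption: ".5" or "-.5" has no integer digits, use 0
    return int(integer_part)


print(coerce_es_long(42))      # 42
print(coerce_es_long(12.9))    # 12
print(coerce_es_long("12.9"))  # 12
print(coerce_es_long("-3.7"))  # -3
```

Strings in exponent notation such as "1e3" are not handled here and raise ValueError.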
e30c3f3a65 [fix](csv_reader)fix bug that Read garbled files caused be crash. (#24164)
Fix a bug where reading garbled files caused BE to crash.
2023-09-13 14:12:55 +08:00
d3f1388717 [Feature](partitions) Support auto-partition (#24153)
Co-authored-by: zhangstar333 <2561612514@qq.com>
2023-09-12 15:23:15 +08:00
82dc970916 [feature](insert) Support group commit insert (#22829) 2023-09-08 15:51:03 +08:00