Commit Graph

206 Commits

Author SHA1 Message Date
0af9371a96 [fix](hash join) fix column ref DCHECK failure of hash join node block mem reuse (#28991)
Introduced by #28851, after evaluating build side expr, some columns in resulting block may be referenced more than once in the same block.

e.g. coalesce(col_a, 'string') if col_a is nullable but actually contains no null values, in this case funcition coalesce will insert a new nullable column which references the original col_a.
2023-12-25 22:19:01 +08:00
7081139bdc [fix](block) fix be core while mutable block merge may cause different row size between columns in origin block (#27943) 2023-12-25 20:35:22 +08:00
d75300f166 [fix](hash join) fix stack overflow caused by evaluate case expr on huge build block (#28851) 2023-12-22 15:45:12 +08:00
e53cfa09da [fix](join) incorrect result of right anti join with nullable (#28301) 2023-12-14 14:07:12 +08:00
ac167f493b [fix](join) fix decimal overflow caused by left outer join (#28221)
For left outer join or full outer join, when build side data is empty, null data is output for build side, but nested column data of nullable column is not properly initialized, which may cause decimal arithmetic overflow
2023-12-11 11:51:05 +08:00
abc802b5ba [bugfix](core) child block is shared between operator and node, it should be shared ptr (#28106)
_child_block in nest loop join , table value function, repeat node will be shared between ExecNode and related operator, but it should not be a unique ptr in operator, it belongs to exec node.

It will double free the block, if operator's close method is not called correctly.

It should be a shared ptr, then it will not core even if the opeartor's close method is not called.
2023-12-09 00:18:14 +08:00
Pxl
e3d2425d47 [Improvement](join) remove insert_indices_from_join and special judge for -1 (#27779)
remove insert_indices_from_join and special judge for -1
2023-12-04 11:03:22 +08:00
Pxl
d969047b50 [Refactor](join) refactor of hash join (#27557)
Improve the performance under the tpch data set by reconstructing the join related code and the use of hash table

Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: BiteTheDDDDt <pxl290@qq.com>
2023-11-28 19:46:00 +08:00
Pxl
91b0edfaa2 [Bug](join) try fix wrong _has_null_in_build_side setted (#27684)
try fix wrong _has_null_in_build_side setted
2023-11-28 17:42:14 +08:00
f565f60bc3 [refactor](standard)BE:Initialize pointer variables in the class to nullptr by default (#27587) 2023-11-28 13:02:30 +08:00
Pxl
301bfe4d5d [Bug](mark-join) fix mark join report error when probe block have column do not output (#27360)
fix mark join report error when probe block have column do not output
2023-11-23 11:16:02 +08:00
febd60c75f [fix](join) incorrect result of left join with other conjuncts (#27238) 2023-11-19 15:36:39 +08:00
a5565f68b2 [Refactor](opentelemetry) Remove opentelemetry (#26605) 2023-11-09 18:05:34 +08:00
6761dc4113 [coverage](test) improve test coverage (#26096)
improve test coverage
2023-10-30 18:01:55 +08:00
b5ee4a9dbb [enhancement](profilev2) add some fields for profile v2 (#25611)
Add 3 counters for ExecNode:

ExecTime - Total execution time(excluding the execution time of children).
OutputBytes - The total number of bytes output to parent.
BlockCount - The total count of blocks output to parent.
2023-10-23 15:55:40 +08:00
7385602b19 [bug](rf) fix only min/max rf return error when has remote target (#25588) 2023-10-19 19:26:29 +08:00
ef7d8aa99a [fix](be)confix bug of converting outer join probe block to nullable (#25492)
_do_evaluate will add temp result column into original table block, so in order to only convert correct columns to be nullable, need call convert_block_to_null before _do_evaluate
2023-10-17 10:10:56 +08:00
Pxl
d00d029ffb Separate fixed key hash map context creator (#25438)
Separate fixed key hash map context creator
2023-10-16 11:20:30 +08:00
37dbda6209 [pipelineX](refactor) Use class template to simplify join (#25369) 2023-10-13 16:51:55 +08:00
Pxl
1a0344df16 [Improvement](hash) refactor of hash map context (#24966)
refactor of hash map context
2023-10-12 18:10:21 +08:00
cdf5f0fe68 [fix](pipelineX) mark join column should be nullable (#25275) 2023-10-11 11:35:43 +08:00
f5b826b66d [fix](mark join) mark join column should be nullable (#24910) 2023-10-10 10:10:36 +08:00
5c020be4d2 [Bug](join) corner case cause the mark join + null aware left join core dump in regression test in pipeline query engine (#25087) 2023-10-08 22:50:12 +08:00
0631ed61b0 [feature](profilev2) Preliminary support for profilev2. (#24881)
You can set the level of counters on the backend using ADD_COUNTER_WITH_LEVEL/ADD_TIMER_WITH_LEVEL. The profile can then merge counters with level 1.
set profile_level = 1;
such as
sql
select count(*) from customer join item on c_customer_sk = i_item_sk

profile

Simple  profile  
  
  PLAN  FRAGMENT  0
    OUTPUT  EXPRS:
        count(*)
    PARTITION:  UNPARTITIONED

    VRESULT  SINK
          MYSQL_PROTOCAL


    7:VAGGREGATE  (merge  finalize)
    |    output:  count(partial_count(*))[#44]
    |    group  by:  
    |    cardinality=1
    |    TotalTime:  avg  725.608us,  max  725.608us,  min  725.608us
    |    RowsReturned:  1
    |    
    6:VEXCHANGE
          offset:  0
          TotalTime:  avg  52.411us,  max  52.411us,  min  52.411us
          RowsReturned:  8

PLAN  FRAGMENT  1

    PARTITION:  HASH_PARTITIONED:  c_customer_sk

    STREAM  DATA  SINK
        EXCHANGE  ID:  06
        UNPARTITIONED

        TotalTime:  avg  106.263us,  max  118.38us,  min  81.403us
        BlocksSent:  8

    5:VAGGREGATE  (update  serialize)
    |    output:  partial_count(*)[#43]
    |    group  by:  
    |    cardinality=1
    |    TotalTime:  avg  679.296us,  max  739.395us,  min  554.904us
    |    BuildTime:  avg  33.198us,  max  48.387us,  min  28.880us
    |    ExecTime:  avg  27.633us,  max  40.278us,  min  24.537us
    |    RowsReturned:  8
    |    
    4:VHASH  JOIN
    |    join  op:  INNER  JOIN(PARTITIONED)[]
    |    equal  join  conjunct:  c_customer_sk  =  i_item_sk
    |    runtime  filters:  RF000[bloom]  <-  i_item_sk(18000/16384/1048576)
    |    cardinality=17,740
    |    vec  output  tuple  id:  3
    |    vIntermediate  tuple  ids:  2  
    |    hash  output  slot  ids:  22  
    |    RowsReturned:  18.0K  (18000)
    |    ProbeRows:  18.0K  (18000)
    |    ProbeTime:  avg  862.308us,  max  1.576ms,  min  666.28us
    |    BuildRows:  18.0K  (18000)
    |    BuildTime:  avg  3.8ms,  max  3.860ms,  min  2.317ms
    |    
    |----1:VEXCHANGE
    |              offset:  0
    |              TotalTime:  avg  48.822us,  max  67.459us,  min  30.380us
    |              RowsReturned:  18.0K  (18000)
    |        
    3:VEXCHANGE
          offset:  0
          TotalTime:  avg  33.162us,  max  39.480us,  min  28.854us
          RowsReturned:  18.0K  (18000)

PLAN  FRAGMENT  2

    PARTITION:  HASH_PARTITIONED:  c_customer_id

    STREAM  DATA  SINK
        EXCHANGE  ID:  03
        HASH_PARTITIONED:  c_customer_sk

        TotalTime:  avg  753.954us,  max  1.210ms,  min  499.470us
        BlocksSent:  64

    2:VOlapScanNode
          TABLE:  default_cluster:tpcds.customer(customer),  PREAGGREGATION:  ON
          runtime  filters:  RF000[bloom]  ->  c_customer_sk
          partitions=1/1,  tablets=12/12,  tabletList=1550745,1550747,1550749  ...
          cardinality=100000,  avgRowSize=0.0,  numNodes=1
          pushAggOp=NONE
          TotalTime:  avg  18.417us,  max  41.319us,  min  10.189us
          RowsReturned:  18.0K  (18000)
---------

Co-authored-by: yiguolei <676222867@qq.com>
2023-10-07 11:16:53 +08:00
642e5cdb69 [Fix](Status) Make Status [[nodiscard]] and handle returned Status correctly (#23395) 2023-09-29 22:38:52 +08:00
Pxl
5fc04b6aeb [Improvement](hash) some refactor of process hash table probe impl (#24461)
some refactor of process hash table probe impl
2023-09-27 16:14:49 +08:00
c9ef5ef2b1 [refactor](profile) refactor join node profile when build side shared hash table (#24785)
refactor join node profile when build side shared hash table
2023-09-25 10:28:16 +08:00
49f6eda843 [fix](nested_join) incorrect result of semi/anti mark join (#24616) 2023-09-20 10:41:06 +08:00
Pxl
35c5d71549 [Improvement](join) some improvement of hash join (#23972)
some improvement of hash join
2023-09-14 17:55:35 +08:00
8e7f7c9566 [fix](profile) move probe time to pull and add LoopGenerateJoin time #24302 2023-09-14 16:41:01 +08:00
c94e47583c [fix](join) avoid DCHECK failed in '_filter_data_and_build_output' (#24162)
avoid DCHECK failed in '_filter_data_and_build_output'
2023-09-11 11:54:44 +08:00
93c1151f1a [fix](join) incorrect result of mark join (#24112) 2023-09-10 11:30:45 +08:00
76ca57cf21 [bug](join) fix outer join not add tuple is null column when build rows is 0 (#23974)
fix outer join not add tuple is null column when build rows is 0
2023-09-08 17:55:03 +08:00
Pxl
69868f18d6 [Bug](join) fix nested loop join some problems (#24034) 2023-09-08 17:40:41 +08:00
68acb8597b [fix](nested_loop_join) null value should be output in semi-anti join (#23971)
create table t1
        (k1 bigint, k2 bigint)
        ENGINE=OLAP
DUPLICATE KEY(k1, k2)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(k2) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"is_being_synced" = "false",
"storage_format" = "V2",
"light_schema_change" = "true",
"disable_auto_compaction" = "false",
"enable_single_replica_compaction" = "false"
);
create table t3
        (k1 bigint, k2 bigint)
        ENGINE=OLAP
DUPLICATE KEY(k1, k2)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(k2) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"is_being_synced" = "false",
"storage_format" = "V2",
"light_schema_change" = "true",
"disable_auto_compaction" = "false",
"enable_single_replica_compaction" = "false"
);
Data:

insert into t1 values (1,null),(null,1),(1,2), (null,2),(1,3), (2,4), (2,5), (3,3), (3,4), (20,2), (22,3), (24,4),(null,null);
insert into t3 values (1,null),(null,1),(1,4), (1,2), (null,3), (2,4), (3,7), (3,9),(null,null),(5,1);
Query:

 select t1.* from t1 where not exists ( select k1 from t3 where t1.k2 < t3.k2 );
Result:

Empty set
Expect result:

+------+------+
| k1   | k2   |
+------+------+
| NULL | NULL |
|    1 | NULL |
+------+------+
2023-09-08 09:28:55 +08:00
3317909141 [pipelineX](join) support nested loop join operator (#23756) 2023-09-04 10:08:22 +08:00
9da9409bd4 [refactor](join) improve join node output when build table rows is 0 (#23713) 2023-09-04 09:48:38 +08:00
d22290e548 [pipelineX](join) support hash join (#23689) 2023-08-31 13:01:26 +08:00
9d1f2cd8e0 [Improvement](pipeline) Terminate early for short-circuit join (#23378) 2023-08-23 19:40:17 +08:00
b252c49071 [fix](hash join) fix heap-use-after-free of HashJoinNode (#23094) 2023-08-17 16:29:47 +08:00
343a6dc29d [improvement](hash join) Return result early if probe side has no data (#23044) 2023-08-17 09:17:09 +08:00
Pxl
d371101bfd [Improvement](aggregation) make fixed hashmap's bitmap_size flexable (#22573)
make fixed hashmap's bitmap_size flexable
2023-08-14 10:47:06 +08:00
e7e73a618c [exec](join) Print join type in profile (#22567) 2023-08-03 20:46:15 +08:00
7947569993 [Bug][RegressionTest] fix the DCHECK failed in join code (#22021) 2023-07-20 18:12:20 +08:00
b35cfc5d5e [opt](join) Opt the performance of join probe (#21845) 2023-07-19 01:21:22 +08:00
c36d225a27 [feature](profile) add process hashtable time in join node (#21878)
add process hashtable time in join node
2023-07-18 18:09:42 +08:00
7f50c07219 [Opt](exec) opt the outer join performance in TPCDS Q95 (#21806) 2023-07-14 18:42:08 +08:00
4d17400244 [profile](join) add collisions into profile (#21510) 2023-07-06 14:30:10 +08:00
b5da3f74f5 [improvement](join) avoid unnecessary copying in _build_output_block (#21360)
If the source columns are mutually exclusive within a temporary block, there is no need to duplicate the data.
2023-07-04 12:13:49 +08:00
ca0953ea51 [improvement](join) Serialize build keys in a vectorized (columnar) way (#21361)
There is a significant performance improvement in serializing keys in the aggregate node through vectorization. Now, applying it to the join node also brings performance improvement.
2023-07-03 09:29:10 +08:00