0af9371a96
[fix](hash join) fix column ref DCHECK failure of hash join node block mem reuse ( #28991 )
...
Introduced by #28851. After the build-side expressions are evaluated, some columns in the resulting block may be referenced more than once within the same block.
For example, with coalesce(col_a, 'string'), if col_a is nullable but actually contains no null values, the coalesce function inserts a new nullable column that references the original col_a.
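The aliasing described above can be illustrated with a minimal sketch. It assumes columns are held through shared pointers and that a block is simply a list of them; `Column`, `Block`, and `has_shared_column` are hypothetical names for illustration, not the Doris classes, and the real case involves a nullable wrapper around col_a rather than the same pointer stored twice.

```cpp
#include <memory>
#include <set>
#include <vector>

// Hypothetical stand-ins for a column and a block; not the Doris types.
struct Column {
    std::vector<int> data;
};
using ColumnPtr = std::shared_ptr<Column>;
using Block = std::vector<ColumnPtr>;

// Returns true if two slots of the block refer to the same column object.
// A memory-reuse path that assumes every column is uniquely referenced
// would have to handle this case instead of asserting it never happens.
bool has_shared_column(const Block& block) {
    std::set<const Column*> seen;
    for (const ColumnPtr& col : block) {
        if (!seen.insert(col.get()).second) {
            return true;  // this column was already seen in another slot
        }
    }
    return false;
}
```

A check of this kind only shows that the condition is detectable; how the actual fix treats such columns is not stated in the commit message.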
2023-12-25 22:19:01 +08:00
7081139bdc
[fix](block) fix be core while mutable block merge may cause different row size between columns in origin block ( #27943 )
2023-12-25 20:35:22 +08:00
d75300f166
[fix](hash join) fix stack overflow caused by evaluate case expr on huge build block ( #28851 )
2023-12-22 15:45:12 +08:00
e53cfa09da
[fix](join) incorrect result of right anti join with nullable ( #28301 )
2023-12-14 14:07:12 +08:00
ac167f493b
[fix](join) fix decimal overflow caused by left outer join ( #28221 )
...
For a left outer join or full outer join, when the build side is empty, null data is output for the build side. However, the nested column data of the nullable column was not properly initialized, which could cause decimal arithmetic overflow.
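A minimal sketch of the idea, assuming a nullable column is stored as a null map plus a nested value column of the same length. `NullableInt64Column` and `insert_nulls` are hypothetical names, and a 64-bit integer stands in for the decimal type.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical nullable column: a null map plus nested values of equal length.
struct NullableInt64Column {
    std::vector<std::uint8_t> null_map;  // 1 means the row is NULL
    std::vector<std::int64_t> nested;    // value slot exists even for NULL rows

    // Appends n NULL rows. The nested slots are set to a defined default (0),
    // so later arithmetic over the nested data never reads garbage values.
    void insert_nulls(std::size_t n) {
        null_map.resize(null_map.size() + n, 1);
        nested.resize(nested.size() + n, 0);
    }
};
```

Keeping the nested data at a known default matters because vectorized arithmetic may run over every row's value slot before the null map is applied.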
2023-12-11 11:51:05 +08:00
abc802b5ba
[bugfix](core) child block is shared between operator and node, it should be shared ptr ( #28106 )
...
_child_block in the nested loop join, table value function, and repeat nodes is shared between the ExecNode and its related operator. It belongs to the ExecNode, so it should not be a unique ptr in the operator.
If the operator's close method is not called correctly, the block is freed twice.
It should be a shared ptr, so that no core dump occurs even if the operator's close method is not called.
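The ownership change can be sketched as follows, assuming the node creates the block and the operator only takes a shared reference to it. `Block`, `ExecNode`, and `Operator` here are simplified, hypothetical stand-ins for the real classes.

```cpp
#include <memory>
#include <vector>

// Hypothetical, simplified stand-ins for the real classes.
struct Block {
    std::vector<int> rows;
};

struct ExecNode {
    // The node owns the child block and hands out shared references.
    std::shared_ptr<Block> child_block = std::make_shared<Block>();
};

struct Operator {
    std::shared_ptr<Block> child_block;

    explicit Operator(const ExecNode& node) : child_block(node.child_block) {}

    // Dropping the reference is safe whether or not close() is ever called:
    // the block is destroyed exactly once, when the last holder releases it.
    void close() { child_block.reset(); }
};
```

With shared ownership the block's lifetime no longer depends on the order in which the node and the operator are torn down.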
2023-12-09 00:18:14 +08:00
e3d2425d47
[Improvement](join) remove insert_indices_from_join and special judge for -1 ( #27779 )
2023-12-04 11:03:22 +08:00
d969047b50
[Refactor](join) refactor of hash join ( #27557 )
...
Improve performance on the TPC-H data set by restructuring the join-related code and the way the hash table is used.
Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: BiteTheDDDDt <pxl290@qq.com>
2023-11-28 19:46:00 +08:00
91b0edfaa2
[Bug](join) try fix wrong _has_null_in_build_side setted ( #27684 )
2023-11-28 17:42:14 +08:00
f565f60bc3
[refactor](standard)BE:Initialize pointer variables in the class to nullptr by default ( #27587 )
2023-11-28 13:02:30 +08:00
301bfe4d5d
[Bug](mark-join) fix mark join report error when probe block have column do not output ( #27360 )
2023-11-23 11:16:02 +08:00
febd60c75f
[fix](join) incorrect result of left join with other conjuncts ( #27238 )
2023-11-19 15:36:39 +08:00
a5565f68b2
[Refactor](opentelemetry) Remove opentelemetry ( #26605 )
2023-11-09 18:05:34 +08:00
6761dc4113
[coverage](test) improve test coverage ( #26096 )
2023-10-30 18:01:55 +08:00
b5ee4a9dbb
[enhancement](profilev2) add some fields for profile v2 ( #25611 )
...
Add 3 counters for ExecNode:
ExecTime - Total execution time (excluding the execution time of children).
OutputBytes - The total number of bytes output to parent.
BlockCount - The total count of blocks output to parent.
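As a rough illustration of how the three counters relate, here is a minimal sketch. The struct and method names are hypothetical and the nanosecond unit is an assumption; the commit message only gives the counter names and their meaning.

```cpp
#include <cstdint>

// Hypothetical holder for the three per-node counters described above.
struct NodeCounters {
    std::int64_t exec_time_ns = 0;  // ExecTime: own time, children excluded
    std::int64_t output_bytes = 0;  // OutputBytes: bytes output to the parent
    std::int64_t block_count = 0;   // BlockCount: blocks output to the parent

    // Records one call: total elapsed time minus the time spent in children.
    void add_exec_time(std::int64_t total_ns, std::int64_t children_ns) {
        exec_time_ns += total_ns - children_ns;
    }

    // Records one block handed to the parent.
    void add_output_block(std::int64_t bytes) {
        output_bytes += bytes;
        ++block_count;
    }
};
```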
2023-10-23 15:55:40 +08:00
7385602b19
[bug](rf) fix only min/max rf return error when has remote target ( #25588 )
2023-10-19 19:26:29 +08:00
ef7d8aa99a
[fix](be)confix bug of converting outer join probe block to nullable ( #25492 )
...
_do_evaluate adds temporary result columns to the original table block, so to convert only the correct columns to nullable, convert_block_to_null must be called before _do_evaluate.
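The ordering issue can be shown with a minimal sketch. All names below (`Column`, `Block`, `convert_block_to_nullable`, `evaluate_exprs`, `prepare_probe_block`) are hypothetical simplifications, with a boolean flag standing in for the real nullable conversion.

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified column: only a name and a nullable flag.
struct Column {
    std::string name;
    bool nullable = false;
};
using Block = std::vector<Column>;

// Marks every column currently in the block as nullable.
void convert_block_to_nullable(Block& block) {
    for (Column& col : block) {
        col.nullable = true;
    }
}

// Stand-in for expression evaluation: appends a temporary result column.
void evaluate_exprs(Block& block) {
    block.push_back(Column{"tmp_expr_result", false});
}

// Converting first touches only the original columns; the temporary
// column appended afterwards is left as the expression produced it.
void prepare_probe_block(Block& block) {
    convert_block_to_nullable(block);
    evaluate_exprs(block);
}
```

If the two calls were swapped, the temporary column would be converted as well, which is the situation the commit avoids.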
2023-10-17 10:10:56 +08:00
d00d029ffb
Separate fixed key hash map context creator ( #25438 )
2023-10-16 11:20:30 +08:00
37dbda6209
[pipelineX](refactor) Use class template to simplify join ( #25369 )
2023-10-13 16:51:55 +08:00
1a0344df16
[Improvement](hash) refactor of hash map context ( #24966 )
2023-10-12 18:10:21 +08:00
cdf5f0fe68
[fix](pipelineX) mark join column should be nullable ( #25275 )
2023-10-11 11:35:43 +08:00
f5b826b66d
[fix](mark join) mark join column should be nullable ( #24910 )
2023-10-10 10:10:36 +08:00
5c020be4d2
[Bug](join) corner case cause the mark join + null aware left join core dump in regression test in pipeline query engine ( #25087 )
2023-10-08 22:50:12 +08:00
0631ed61b0
[feature](profilev2) Preliminary support for profilev2. ( #24881 )
...
You can set the level of counters on the backend using ADD_COUNTER_WITH_LEVEL/ADD_TIMER_WITH_LEVEL. The profile can then merge counters with level 1.
For example, after running
set profile_level = 1;
the SQL query
select count(*) from customer join item on c_customer_sk = i_item_sk
produces the following profile:
Simple profile
PLAN FRAGMENT 0
OUTPUT EXPRS:
count(*)
PARTITION: UNPARTITIONED
VRESULT SINK
MYSQL_PROTOCAL
7:VAGGREGATE (merge finalize)
| output: count(partial_count(*))[#44 ]
| group by:
| cardinality=1
| TotalTime: avg 725.608us, max 725.608us, min 725.608us
| RowsReturned: 1
|
6:VEXCHANGE
offset: 0
TotalTime: avg 52.411us, max 52.411us, min 52.411us
RowsReturned: 8
PLAN FRAGMENT 1
PARTITION: HASH_PARTITIONED: c_customer_sk
STREAM DATA SINK
EXCHANGE ID: 06
UNPARTITIONED
TotalTime: avg 106.263us, max 118.38us, min 81.403us
BlocksSent: 8
5:VAGGREGATE (update serialize)
| output: partial_count(*)[#43 ]
| group by:
| cardinality=1
| TotalTime: avg 679.296us, max 739.395us, min 554.904us
| BuildTime: avg 33.198us, max 48.387us, min 28.880us
| ExecTime: avg 27.633us, max 40.278us, min 24.537us
| RowsReturned: 8
|
4:VHASH JOIN
| join op: INNER JOIN(PARTITIONED)[]
| equal join conjunct: c_customer_sk = i_item_sk
| runtime filters: RF000[bloom] <- i_item_sk(18000/16384/1048576)
| cardinality=17,740
| vec output tuple id: 3
| vIntermediate tuple ids: 2
| hash output slot ids: 22
| RowsReturned: 18.0K (18000)
| ProbeRows: 18.0K (18000)
| ProbeTime: avg 862.308us, max 1.576ms, min 666.28us
| BuildRows: 18.0K (18000)
| BuildTime: avg 3.8ms, max 3.860ms, min 2.317ms
|
|----1:VEXCHANGE
| offset: 0
| TotalTime: avg 48.822us, max 67.459us, min 30.380us
| RowsReturned: 18.0K (18000)
|
3:VEXCHANGE
offset: 0
TotalTime: avg 33.162us, max 39.480us, min 28.854us
RowsReturned: 18.0K (18000)
PLAN FRAGMENT 2
PARTITION: HASH_PARTITIONED: c_customer_id
STREAM DATA SINK
EXCHANGE ID: 03
HASH_PARTITIONED: c_customer_sk
TotalTime: avg 753.954us, max 1.210ms, min 499.470us
BlocksSent: 64
2:VOlapScanNode
TABLE: default_cluster:tpcds.customer(customer), PREAGGREGATION: ON
runtime filters: RF000[bloom] -> c_customer_sk
partitions=1/1, tablets=12/12, tabletList=1550745,1550747,1550749 ...
cardinality=100000, avgRowSize=0.0, numNodes=1
pushAggOp=NONE
TotalTime: avg 18.417us, max 41.319us, min 10.189us
RowsReturned: 18.0K (18000)
---------
Co-authored-by: yiguolei <676222867@qq.com>
2023-10-07 11:16:53 +08:00
642e5cdb69
[Fix](Status) Make Status [[nodiscard]] and handle returned Status correctly ( #23395 )
2023-09-29 22:38:52 +08:00
5fc04b6aeb
[Improvement](hash) some refactor of process hash table probe impl ( #24461 )
2023-09-27 16:14:49 +08:00
c9ef5ef2b1
[refactor](profile) refactor join node profile when build side shared hash table ( #24785 )
2023-09-25 10:28:16 +08:00
49f6eda843
[fix](nested_join) incorrect result of semi/anti mark join ( #24616 )
2023-09-20 10:41:06 +08:00
35c5d71549
[Improvement](join) some improvement of hash join ( #23972 )
2023-09-14 17:55:35 +08:00
8e7f7c9566
[fix](profile) move probe time to pull and add LoopGenerateJoin time #24302
2023-09-14 16:41:01 +08:00
c94e47583c
[fix](join) avoid DCHECK failed in '_filter_data_and_build_output' ( #24162 )
2023-09-11 11:54:44 +08:00
93c1151f1a
[fix](join) incorrect result of mark join ( #24112 )
2023-09-10 11:30:45 +08:00
76ca57cf21
[bug](join) fix outer join not add tuple is null column when build rows is 0 ( #23974 )
2023-09-08 17:55:03 +08:00
69868f18d6
[Bug](join) fix nested loop join some problems ( #24034 )
2023-09-08 17:40:41 +08:00
68acb8597b
[fix](nested_loop_join) null value should be output in semi-anti join ( #23971 )
...
create table t1
(k1 bigint, k2 bigint)
ENGINE=OLAP
DUPLICATE KEY(k1, k2)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(k2) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"is_being_synced" = "false",
"storage_format" = "V2",
"light_schema_change" = "true",
"disable_auto_compaction" = "false",
"enable_single_replica_compaction" = "false"
);
create table t3
(k1 bigint, k2 bigint)
ENGINE=OLAP
DUPLICATE KEY(k1, k2)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(k2) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"is_being_synced" = "false",
"storage_format" = "V2",
"light_schema_change" = "true",
"disable_auto_compaction" = "false",
"enable_single_replica_compaction" = "false"
);
Data:
insert into t1 values (1,null),(null,1),(1,2), (null,2),(1,3), (2,4), (2,5), (3,3), (3,4), (20,2), (22,3), (24,4),(null,null);
insert into t3 values (1,null),(null,1),(1,4), (1,2), (null,3), (2,4), (3,7), (3,9),(null,null),(5,1);
Query:
select t1.* from t1 where not exists ( select k1 from t3 where t1.k2 < t3.k2 );
Actual result:
Empty set
Expected result:
+------+------+
| k1 | k2 |
+------+------+
| NULL | NULL |
| 1 | NULL |
+------+------+
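The expected result follows from SQL's three-valued logic: a comparison with NULL is never TRUE, so a t1 row whose k2 is NULL matches nothing in t3 and must pass NOT EXISTS. The sketch below models that semantics in plain C++ with `std::optional` standing in for nullable values. It is an illustration of the expected semantics only, not the nested loop join implementation, and all names are hypothetical.

```cpp
#include <optional>
#include <utility>
#include <vector>

using Value = std::optional<long long>;  // empty optional models SQL NULL
using Row = std::pair<Value, Value>;     // (k1, k2)

// SQL "a < b" is TRUE only when both sides are non-NULL and a < b.
bool less_is_true(const Value& a, const Value& b) {
    return a.has_value() && b.has_value() && *a < *b;
}

// Models: select t1.* from t1
//         where not exists (select k1 from t3 where t1.k2 < t3.k2)
std::vector<Row> not_exists_less(const std::vector<Row>& t1,
                                 const std::vector<Row>& t3) {
    std::vector<Row> out;
    for (const Row& left : t1) {
        bool matched = false;
        for (const Row& right : t3) {
            if (less_is_true(left.second, right.second)) {
                matched = true;
                break;
            }
        }
        if (!matched) {
            out.push_back(left);  // rows with NULL k2 always land here
        }
    }
    return out;
}
```

On the data above, every non-NULL t1.k2 is smaller than the largest t3.k2 (9), so only the two rows with a NULL k2 remain.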
2023-09-08 09:28:55 +08:00
3317909141
[pipelineX](join) support nested loop join operator ( #23756 )
2023-09-04 10:08:22 +08:00
9da9409bd4
[refactor](join) improve join node output when build table rows is 0 ( #23713 )
2023-09-04 09:48:38 +08:00
d22290e548
[pipelineX](join) support hash join ( #23689 )
2023-08-31 13:01:26 +08:00
9d1f2cd8e0
[Improvement](pipeline) Terminate early for short-circuit join ( #23378 )
2023-08-23 19:40:17 +08:00
b252c49071
[fix](hash join) fix heap-use-after-free of HashJoinNode ( #23094 )
2023-08-17 16:29:47 +08:00
343a6dc29d
[improvement](hash join) Return result early if probe side has no data ( #23044 )
2023-08-17 09:17:09 +08:00
d371101bfd
[Improvement](aggregation) make fixed hashmap's bitmap_size flexable ( #22573 )
2023-08-14 10:47:06 +08:00
e7e73a618c
[exec](join) Print join type in profile ( #22567 )
2023-08-03 20:46:15 +08:00
7947569993
[Bug][RegressionTest] fix the DCHECK failed in join code ( #22021 )
2023-07-20 18:12:20 +08:00
b35cfc5d5e
[opt](join) Opt the performance of join probe ( #21845 )
2023-07-19 01:21:22 +08:00
c36d225a27
[feature](profile) add process hashtable time in join node ( #21878 )
2023-07-18 18:09:42 +08:00
7f50c07219
[Opt](exec) opt the outer join performance in TPCDS Q95 ( #21806 )
2023-07-14 18:42:08 +08:00
4d17400244
[profile](join) add collisions into profile ( #21510 )
2023-07-06 14:30:10 +08:00
b5da3f74f5
[improvement](join) avoid unnecessary copying in _build_output_block ( #21360 )
...
If the source columns are mutually exclusive within a temporary block, there is no need to duplicate the data.
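One possible reading of this optimization is sketched below, under the assumption that "mutually exclusive" means no source column is selected more than once. In that case the output can take ownership of each column instead of copying it. `Column`, `all_distinct`, and `build_output` are hypothetical names.

```cpp
#include <cstddef>
#include <unordered_set>
#include <utility>
#include <vector>

using Column = std::vector<int>;  // hypothetical stand-in for a column

// True if no source index appears more than once.
bool all_distinct(const std::vector<std::size_t>& src) {
    std::unordered_set<std::size_t> seen;
    for (std::size_t idx : src) {
        if (!seen.insert(idx).second) {
            return false;
        }
    }
    return true;
}

// Builds the output columns from a temporary block. When every source
// column is used exactly once, the data is moved; otherwise it is copied,
// because a moved-from column could not serve a second output slot.
std::vector<Column> build_output(std::vector<Column>& temp,
                                 const std::vector<std::size_t>& src) {
    std::vector<Column> out;
    out.reserve(src.size());
    const bool can_move = all_distinct(src);
    for (std::size_t idx : src) {
        if (can_move) {
            out.push_back(std::move(temp[idx]));
        } else {
            out.push_back(temp[idx]);
        }
    }
    return out;
}
```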
2023-07-04 12:13:49 +08:00
ca0953ea51
[improvement](join) Serialize build keys in a vectorized (columnar) way ( #21361 )
...
Vectorizing key serialization brought a significant performance improvement in the aggregate node. Applying it to the join node brings a performance improvement as well.
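A minimal sketch of column-at-a-time key serialization, assuming fixed-width 32-bit key columns and one byte string per row. The function name and the layout are illustrative assumptions, not the actual join node code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

using KeyColumn = std::vector<std::int32_t>;  // hypothetical fixed-width key column

// Serializes multi-column keys into one byte string per row. The outer loop
// walks columns and the inner loop walks rows, so each column is read
// sequentially instead of jumping between columns for every row.
std::vector<std::string> serialize_keys_columnar(
        const std::vector<KeyColumn>& columns, std::size_t rows) {
    const std::size_t width = sizeof(std::int32_t);
    std::vector<std::string> keys(rows, std::string(columns.size() * width, '\0'));
    for (std::size_t c = 0; c < columns.size(); ++c) {
        const KeyColumn& col = columns[c];
        for (std::size_t r = 0; r < rows; ++r) {
            std::memcpy(&keys[r][c * width], &col[r], width);
        }
    }
    return keys;
}
```

Rows with equal key values produce identical byte strings, which is what lets the serialized form be used directly as a hash table key.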
2023-07-03 09:29:10 +08:00