37dbda6209
[pipelineX](refactor) Use class template to simplify join ( #25369 )
2023-10-13 16:51:55 +08:00
1a0344df16
[Improvement](hash) refactor of hash map context ( #24966 )
2023-10-12 18:10:21 +08:00
cdf5f0fe68
[fix](pipelineX) mark join column should be nullable ( #25275 )
2023-10-11 11:35:43 +08:00
f5b826b66d
[fix](mark join) mark join column should be nullable ( #24910 )
2023-10-10 10:10:36 +08:00
5c020be4d2
[Bug](join) corner case cause the mark join + null aware left join core dump in regression test in pipeline query engine ( #25087 )
2023-10-08 22:50:12 +08:00
0631ed61b0
[feature](profilev2) Preliminary support for profilev2. ( #24881 )
...
You can set the level of counters on the backend using ADD_COUNTER_WITH_LEVEL/ADD_TIMER_WITH_LEVEL. The profile can then merge counters with level 1.
set profile_level = 1;
For example, the SQL
select count(*) from customer join item on c_customer_sk = i_item_sk
produces this profile:
Simple profile
PLAN FRAGMENT 0
OUTPUT EXPRS:
count(*)
PARTITION: UNPARTITIONED
VRESULT SINK
MYSQL_PROTOCAL
7:VAGGREGATE (merge finalize)
| output: count(partial_count(*))[#44 ]
| group by:
| cardinality=1
| TotalTime: avg 725.608us, max 725.608us, min 725.608us
| RowsReturned: 1
|
6:VEXCHANGE
offset: 0
TotalTime: avg 52.411us, max 52.411us, min 52.411us
RowsReturned: 8
PLAN FRAGMENT 1
PARTITION: HASH_PARTITIONED: c_customer_sk
STREAM DATA SINK
EXCHANGE ID: 06
UNPARTITIONED
TotalTime: avg 106.263us, max 118.38us, min 81.403us
BlocksSent: 8
5:VAGGREGATE (update serialize)
| output: partial_count(*)[#43 ]
| group by:
| cardinality=1
| TotalTime: avg 679.296us, max 739.395us, min 554.904us
| BuildTime: avg 33.198us, max 48.387us, min 28.880us
| ExecTime: avg 27.633us, max 40.278us, min 24.537us
| RowsReturned: 8
|
4:VHASH JOIN
| join op: INNER JOIN(PARTITIONED)[]
| equal join conjunct: c_customer_sk = i_item_sk
| runtime filters: RF000[bloom] <- i_item_sk(18000/16384/1048576)
| cardinality=17,740
| vec output tuple id: 3
| vIntermediate tuple ids: 2
| hash output slot ids: 22
| RowsReturned: 18.0K (18000)
| ProbeRows: 18.0K (18000)
| ProbeTime: avg 862.308us, max 1.576ms, min 666.28us
| BuildRows: 18.0K (18000)
| BuildTime: avg 3.8ms, max 3.860ms, min 2.317ms
|
|----1:VEXCHANGE
| offset: 0
| TotalTime: avg 48.822us, max 67.459us, min 30.380us
| RowsReturned: 18.0K (18000)
|
3:VEXCHANGE
offset: 0
TotalTime: avg 33.162us, max 39.480us, min 28.854us
RowsReturned: 18.0K (18000)
PLAN FRAGMENT 2
PARTITION: HASH_PARTITIONED: c_customer_id
STREAM DATA SINK
EXCHANGE ID: 03
HASH_PARTITIONED: c_customer_sk
TotalTime: avg 753.954us, max 1.210ms, min 499.470us
BlocksSent: 64
2:VOlapScanNode
TABLE: default_cluster:tpcds.customer(customer), PREAGGREGATION: ON
runtime filters: RF000[bloom] -> c_customer_sk
partitions=1/1, tablets=12/12, tabletList=1550745,1550747,1550749 ...
cardinality=100000, avgRowSize=0.0, numNodes=1
pushAggOp=NONE
TotalTime: avg 18.417us, max 41.319us, min 10.189us
RowsReturned: 18.0K (18000)
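
The avg/max/min figures in the profile above suggest that same-named counters from every fragment instance are merged into one summary line. A minimal sketch of that idea follows. The `Counter` record, its `level` field, and the `merge_counters` helper are hypothetical names for illustration, not the Doris API; the real counters are registered in C++ through ADD_COUNTER_WITH_LEVEL/ADD_TIMER_WITH_LEVEL.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Counter:
    name: str
    value_ns: int
    level: int  # assumption: level 1 marks counters that take part in the merge


def merge_counters(instances, profile_level=1):
    """Merge same-named counters across instances, keeping only the given level."""
    grouped = {}
    for counters in instances:
        for counter in counters:
            if counter.level == profile_level:
                grouped.setdefault(counter.name, []).append(counter.value_ns)
    # One summary entry per counter name, mirroring the avg/max/min layout.
    return {
        name: {"avg": mean(values), "max": max(values), "min": min(values)}
        for name, values in grouped.items()
    }
```

Counters registered at any other level are simply left out of the merged view in this sketch.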
---------
Co-authored-by: yiguolei <676222867@qq.com >
2023-10-07 11:16:53 +08:00
642e5cdb69
[Fix](Status) Make Status [[nodiscard]] and handle returned Status correctly ( #23395 )
2023-09-29 22:38:52 +08:00
5fc04b6aeb
[Improvement](hash) some refactor of process hash table probe impl ( #24461 )
2023-09-27 16:14:49 +08:00
c9ef5ef2b1
[refactor](profile) refactor join node profile when build side shared hash table ( #24785 )
2023-09-25 10:28:16 +08:00
49f6eda843
[fix](nested_join) incorrect result of semi/anti mark join ( #24616 )
2023-09-20 10:41:06 +08:00
35c5d71549
[Improvement](join) some improvement of hash join ( #23972 )
2023-09-14 17:55:35 +08:00
8e7f7c9566
[fix](profile) move probe time to pull and add LoopGenerateJoin time #24302
2023-09-14 16:41:01 +08:00
c94e47583c
[fix](join) avoid DCHECK failed in '_filter_data_and_build_output' ( #24162 )
2023-09-11 11:54:44 +08:00
93c1151f1a
[fix](join) incorrect result of mark join ( #24112 )
2023-09-10 11:30:45 +08:00
76ca57cf21
[bug](join) fix outer join not add tuple is null column when build rows is 0 ( #23974 )
2023-09-08 17:55:03 +08:00
69868f18d6
[Bug](join) fix nested loop join some problems ( #24034 )
2023-09-08 17:40:41 +08:00
68acb8597b
[fix](nested_loop_join) null value should be output in semi-anti join ( #23971 )
...
create table t1
(k1 bigint, k2 bigint)
ENGINE=OLAP
DUPLICATE KEY(k1, k2)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(k2) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"is_being_synced" = "false",
"storage_format" = "V2",
"light_schema_change" = "true",
"disable_auto_compaction" = "false",
"enable_single_replica_compaction" = "false"
);
create table t3
(k1 bigint, k2 bigint)
ENGINE=OLAP
DUPLICATE KEY(k1, k2)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(k2) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"is_being_synced" = "false",
"storage_format" = "V2",
"light_schema_change" = "true",
"disable_auto_compaction" = "false",
"enable_single_replica_compaction" = "false"
);
Data:
insert into t1 values (1,null),(null,1),(1,2), (null,2),(1,3), (2,4), (2,5), (3,3), (3,4), (20,2), (22,3), (24,4),(null,null);
insert into t3 values (1,null),(null,1),(1,4), (1,2), (null,3), (2,4), (3,7), (3,9),(null,null),(5,1);
Query:
select t1.* from t1 where not exists ( select k1 from t3 where t1.k2 < t3.k2 );
Result:
Empty set
Expected result:
+------+------+
| k1 | k2 |
+------+------+
| NULL | NULL |
| 1 | NULL |
+------+------+
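
The expected result follows from SQL three-valued logic: a comparison with NULL is unknown, so a row of t1 whose k2 is NULL never satisfies `t1.k2 < t3.k2` and therefore passes NOT EXISTS. The sketch below replays the data from this commit message in plain Python. The helper names `less_than` and `not_exists_rows` are made up for illustration and do not reflect how the nested loop join is implemented.

```python
T1 = [(1, None), (None, 1), (1, 2), (None, 2), (1, 3), (2, 4), (2, 5),
      (3, 3), (3, 4), (20, 2), (22, 3), (24, 4), (None, None)]
T3 = [(1, None), (None, 1), (1, 4), (1, 2), (None, 3), (2, 4), (3, 7),
      (3, 9), (None, None), (5, 1)]


def less_than(a, b):
    """SQL-style '<': the result is None (unknown) if either side is NULL."""
    if a is None or b is None:
        return None
    return a < b


def not_exists_rows(outer, inner):
    """Keep outer rows for which no inner row makes outer.k2 < inner.k2 TRUE."""
    return [
        row for row in outer
        if not any(less_than(row[1], other[1]) is True for other in inner)
    ]


result = not_exists_rows(T1, T3)
```

Every non-NULL k2 in t1 is at most 5 and t3 contains k2 = 9, so only the two rows with a NULL k2 survive.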
2023-09-08 09:28:55 +08:00
3317909141
[pipelineX](join) support nested loop join operator ( #23756 )
2023-09-04 10:08:22 +08:00
9da9409bd4
[refactor](join) improve join node output when build table rows is 0 ( #23713 )
2023-09-04 09:48:38 +08:00
d22290e548
[pipelineX](join) support hash join ( #23689 )
2023-08-31 13:01:26 +08:00
9d1f2cd8e0
[Improvement](pipeline) Terminate early for short-circuit join ( #23378 )
2023-08-23 19:40:17 +08:00
b252c49071
[fix](hash join) fix heap-use-after-free of HashJoinNode ( #23094 )
2023-08-17 16:29:47 +08:00
343a6dc29d
[improvement](hash join) Return result early if probe side has no data ( #23044 )
2023-08-17 09:17:09 +08:00
d371101bfd
[Improvement](aggregation) make fixed hashmap's bitmap_size flexable ( #22573 )
2023-08-14 10:47:06 +08:00
e7e73a618c
[exec](join) Print join type in profile ( #22567 )
2023-08-03 20:46:15 +08:00
7947569993
[Bug][RegressionTest] fix the DCHECK failed in join code ( #22021 )
2023-07-20 18:12:20 +08:00
b35cfc5d5e
[opt](join) Opt the performance of join probe ( #21845 )
2023-07-19 01:21:22 +08:00
c36d225a27
[feature](profile) add process hashtable time in join node ( #21878 )
2023-07-18 18:09:42 +08:00
7f50c07219
[Opt](exec) opt the outer join performance in TPCDS Q95 ( #21806 )
2023-07-14 18:42:08 +08:00
4d17400244
[profile](join) add collisions into profile ( #21510 )
2023-07-06 14:30:10 +08:00
b5da3f74f5
[improvement](join) avoid unnecessary copying in _build_output_block ( #21360 )
...
If the source columns are mutually exclusive within a temporary block, there is no need to duplicate the data.
2023-07-04 12:13:49 +08:00
ca0953ea51
[improvement](join) Serialize build keys in a vectorized (columnar) way ( #21361 )
...
There is a significant performance improvement in serializing keys in the aggregate node through vectorization. Now, applying it to the join node also brings performance improvement.
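
As a rough illustration of what serializing keys in a columnar way means, the sketch below builds one byte key per row by looping over columns on the outside and rows on the inside. It assumes fixed-width int64 key columns, and `serialize_keys_columnar` is a hypothetical name. Python shows only the loop order here, not the performance gain the commit reports for the C++ implementation.

```python
import struct


def serialize_keys_columnar(columns):
    """Build one serialized key per row, processing one whole column at a time."""
    num_rows = len(columns[0])
    keys = [bytearray() for _ in range(num_rows)]
    for column in columns:                 # outer loop: columns
        for i, value in enumerate(column):  # inner loop: rows of that column
            keys[i] += struct.pack("<q", value)  # assumption: int64 values
    return [bytes(key) for key in keys]


def serialize_keys_row_wise(columns):
    """Reference version: serialize each row's values one row at a time."""
    return [
        b"".join(struct.pack("<q", value) for value in row)
        for row in zip(*columns)
    ]
```

Both functions return identical keys; only the traversal order differs.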
2023-07-03 09:29:10 +08:00
2c9bdd64fa
[fix](memory) arena support memory reuse after clear() ( #21033 )
2023-06-21 23:27:21 +08:00
6d579d924d
[fix](profile) delete useless profile add_child #20989
2023-06-20 23:21:52 +08:00
fb9fcf460a
[fix](leftjoin) fix bug of left and full join with other conjuncts ( #20946 )
...
Fix a bug in left and full outer join with other conjuncts. When the number of equal-matched rows for a probe row exceeds batch_size, the _join_node->_is_any_probe_match_row_output flag is sometimes not set correctly, which results in extra rows being output for that probe row.
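
The failure mode can be pictured with a small model: the matches of one probe row are processed in batches, and the "some match was already output" state has to survive from one batch to the next. The sketch below is a simplified illustration with hypothetical names (`left_join_probe_row`, `other_conjunct`), not the actual Doris code path.

```python
def left_join_probe_row(probe_row, matched_build_rows, other_conjunct, batch_size):
    """Yield output batches for one probe row of a LEFT OUTER JOIN."""
    any_match_output = False  # must persist across all batches of this probe row
    for start in range(0, len(matched_build_rows), batch_size):
        chunk = matched_build_rows[start:start + batch_size]
        batch = [(probe_row, b) for b in chunk if other_conjunct(probe_row, b)]
        if batch:
            any_match_output = True
            yield batch
    # The NULL-padded row is emitted once, and only if no batch produced a match.
    if not any_match_output:
        yield [(probe_row, None)]
```

If the flag were reset at every batch boundary, a probe row whose only passing match sits in the first batch would also get a NULL-padded row from a later batch, which is the kind of extra output the commit describes.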
2023-06-19 12:27:06 +08:00
ab32299ba4
[feature](nereids) Support multi target rf #20714
...
Support multi-target runtime filters, mainly for set operations such as union/intersect/except.
2023-06-16 20:26:00 +08:00
460399f214
[fix](profile) remove same profile in join node ( #20734 )
2023-06-15 08:08:39 +08:00
31a4f96f01
[refactor](exprcontext) move close to expr context's dector method ( #20747 )
...
The close method does nothing, but I am not sure we can remove it. So I moved it into the destructor and removed the many explicit calls.
2023-06-14 18:01:07 +08:00
e010fa8d4f
[Chore](runtime filter) remove runtime filter ready_for_publish/publish_finally ( #20593 )
2023-06-13 11:20:49 +08:00
51bbf17786
[Refactor](Profile) Add and refactor the join profile ( #20693 )
2023-06-13 09:06:51 +08:00
ea264ce9de
[Opt](join) short circuit probe for join node ( #20585 )
...
Support _short_circuit_for_probe for the join node.
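
The commit message does not spell out when the short circuit triggers. One common case, assumed here, is an inner join whose build side is empty: no probe row can match, so the probe side does not need to be read at all. A sketch under that assumption, with `inner_hash_join` as a hypothetical function name:

```python
def inner_hash_join(build_rows, probe_rows, build_key, probe_key):
    """Inner hash join that skips the probe phase when the build side is empty."""
    table = {}
    for row in build_rows:
        table.setdefault(build_key(row), []).append(row)

    # Assumed short-circuit condition: an inner join with an empty build side
    # cannot produce rows, so the probe input is never consumed.
    if not table:
        return []

    output = []
    for probe in probe_rows:
        for build in table.get(probe_key(probe), []):
            output.append((probe, build))
    return output
```

Passing the probe side as an iterator makes the effect visible: after a short-circuited call the iterator is still at its first element.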
2023-06-12 16:01:09 +08:00
a6f625676b
[profile](remove child) child is for node, should not be used to organize counters ( #20676 )
...
Currently, many profiles use add child profile to organize counters into blocks, but this is wrong: a child profile carries its own total time counter. What we actually need is just a label.
- MemoryUsage:
- HashTable: 23.98 KB
- SerializeKeyArena: 446.75 KB
Add a new macro ADD_LABEL_COUNTER to add just a label in the profile.
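
To show the difference a label makes, here is a toy profile tree in which a label is a named entry with no value of its own and counters can hang beneath it. The `Profile` class and its methods are hypothetical and only mimic the layout shown above; the real mechanism is the C++ macro ADD_LABEL_COUNTER.

```python
class Profile:
    """Toy profile: labels group counters without adding a timer of their own."""

    def __init__(self):
        self._children = {"": []}  # parent name -> ordered child names
        self._values = {}          # entry name -> display value, None for a label

    def add_label(self, name, parent=""):
        self._add(name, None, parent)

    def add_counter(self, name, value, parent=""):
        self._add(name, value, parent)

    def _add(self, name, value, parent):
        self._children.setdefault(parent, []).append(name)
        self._children.setdefault(name, [])
        self._values[name] = value

    def render(self, parent="", depth=0):
        lines = []
        for name in self._children[parent]:
            value = self._values[name]
            text = f"- {name}:" if value is None else f"- {name}: {value}"
            lines.append("  " * depth + text)
            lines.extend(self.render(name, depth + 1))
        return lines
```

Rendering a `MemoryUsage` label with `HashTable` and `SerializeKeyArena` counters under it reproduces the three-line layout from the commit message, with no TotalTime line attached to the group.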
---------
Co-authored-by: yiguolei <yiguolei@gmail.com >
2023-06-12 10:00:35 +08:00
3b17cc8eb3
[Improvement](column) reduce cache miss for data copy ( #20583 )
2023-06-09 13:10:57 +08:00
b60860c5e5
[refactor](profile) refactor the join profile when its shared hash table ( #20391 )
...
In a join node, if it is a broadcast join with a shared hash table, some counters/timers about building the hash table are useless,
so we add those counters/timers to a faker profile,
and they will not be displayed in the web profile.
2023-06-09 08:59:49 +08:00
d00b7ad04b
[Opt](performance) opt the outer join for nested loop join ( #20524 )
2023-06-07 17:31:36 +08:00
1fc48e83f2
[fix](executor)Fix duplicate timer and add open timer #20448
...
1. Currently, the node's total timer counter is timed twice (in Open and alloc_resource), which may make the timer in the profile incorrect.
2. Add more timers to locate code that may take a long time.
2023-06-06 08:55:52 +08:00
4f77578d8a
[enhancement](profile) add build get child next time ( #20460 )
...
Currently, build time does not include the child(1)->get next time, which is very confusing in the shared hash table scenario, so I added a profile counter for it.
---------
Co-authored-by: yiguolei <yiguolei@gmail.com >
2023-06-06 08:55:19 +08:00
9f8de89659
[refactor](exec) replace the single pointer with an array of 'conjuncts' in ExecNode ( #19758 )
...
Refactoring the filtering conditions in the current ExecNode from an expression tree to an array can simplify the process of adding runtime filters. It eliminates the need for complex merge operations and removes the requirement for the frontend to combine expressions into a single entity.
By representing the filtering conditions as an array, each condition can be treated individually, making it easier to add runtime filters without the need for complex merging logic. The array can store the individual conditions, and the runtime filter logic can iterate through the array to apply the filters as needed.
This refactoring simplifies the codebase, improves readability, and reduces the complexity associated with handling filtering conditions and adding runtime filters. It separates the conditions into discrete entities, enabling more straightforward manipulation and management within the execution node.
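
The gist of the array representation can be sketched in a few lines: a row passes when every conjunct in the list accepts it, and adding a runtime filter is an append rather than a rewrite of an expression tree. The names below (`apply_conjuncts`, `add_runtime_filter`) are hypothetical and stand in for the C++ ExecNode logic.

```python
def apply_conjuncts(rows, conjuncts):
    """Keep rows that satisfy every conjunct (logical AND over the array)."""
    return [row for row in rows if all(pred(row) for pred in conjuncts)]


def add_runtime_filter(conjuncts, column, allowed_values):
    """Append an IN-style runtime filter; existing conjuncts stay untouched."""
    allowed = set(allowed_values)
    conjuncts.append(lambda row: row[column] in allowed)


# A static predicate from the plan, then a filter that arrives at runtime.
conjuncts = [lambda row: row["k1"] > 0]
add_runtime_filter(conjuncts, "k2", [2, 3])
```

With a single expression tree, the same step would require building a new AND node that wraps the old root, which is the merge logic this refactor avoids.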
2023-05-29 11:47:31 +08:00
43aa062fb1
[Chore](hash-join) remove useless conditions and add some case ( #20050 )
2023-05-26 14:45:24 +08:00
14b4c7abf9
[fix](hashtable) Check query cancel status during build hash table #19970
...
The hash table build stage should check the cancel status and stop if the query has been cancelled.
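
A minimal sketch of the pattern, assuming the build side arrives block by block and the cancel state is exposed as a flag. `QueryCancelled` and `build_hash_table` are hypothetical names, and a `threading.Event` stands in for the query's cancel status.

```python
import threading


class QueryCancelled(Exception):
    """Raised when the build stops because the query was cancelled."""


def build_hash_table(blocks, key, cancelled: threading.Event):
    """Build a hash table block by block, checking for cancellation each block."""
    table = {}
    for block in blocks:
        # Check once per block so a cancelled query stops early instead of
        # finishing the whole build.
        if cancelled.is_set():
            raise QueryCancelled("query cancelled during hash table build")
        for row in block:
            table.setdefault(key(row), []).append(row)
    return table
```

Checking per block rather than per row keeps the overhead of the check small relative to the insert work.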
2023-05-24 14:24:03 +08:00