Commit Graph

171 Commits

Author SHA1 Message Date
3317909141 [pipelineX](join) support nested loop join operator (#23756) 2023-09-04 10:08:22 +08:00
9da9409bd4 [refactor](join) improve join node output when build table rows is 0 (#23713) 2023-09-04 09:48:38 +08:00
d22290e548 [pipelineX](join) support hash join (#23689) 2023-08-31 13:01:26 +08:00
9d1f2cd8e0 [Improvement](pipeline) Terminate early for short-circuit join (#23378) 2023-08-23 19:40:17 +08:00
b252c49071 [fix](hash join) fix heap-use-after-free of HashJoinNode (#23094) 2023-08-17 16:29:47 +08:00
343a6dc29d [improvement](hash join) Return result early if probe side has no data (#23044) 2023-08-17 09:17:09 +08:00
Pxl
d371101bfd [Improvement](aggregation) make fixed hashmap's bitmap_size flexable (#22573)
make fixed hashmap's bitmap_size flexable
2023-08-14 10:47:06 +08:00
e7e73a618c [exec](join) Print join type in profile (#22567) 2023-08-03 20:46:15 +08:00
7947569993 [Bug][RegressionTest] fix the DCHECK failed in join code (#22021) 2023-07-20 18:12:20 +08:00
b35cfc5d5e [opt](join) Opt the performance of join probe (#21845) 2023-07-19 01:21:22 +08:00
c36d225a27 [feature](profile) add process hashtable time in join node (#21878)
add process hashtable time in join node
2023-07-18 18:09:42 +08:00
7f50c07219 [Opt](exec) opt the outer join performance in TPCDS Q95 (#21806) 2023-07-14 18:42:08 +08:00
4d17400244 [profile](join) add collisions into profile (#21510) 2023-07-06 14:30:10 +08:00
b5da3f74f5 [improvement](join) avoid unnecessary copying in _build_output_block (#21360)
If the source columns are mutually exclusive within a temporary block, there is no need to duplicate the data.
2023-07-04 12:13:49 +08:00
ca0953ea51 [improvement](join) Serialize build keys in a vectorized (columnar) way (#21361)
There is a significant performance improvement in serializing keys in the aggregate node through vectorization. Now, applying it to the join node also brings performance improvement.
2023-07-03 09:29:10 +08:00
2c9bdd64fa [fix](memory) arena support memory reuse after clear() (#21033) 2023-06-21 23:27:21 +08:00
6d579d924d [fix](profile) delete useless profile add_child #20989 2023-06-20 23:21:52 +08:00
fb9fcf460a [fix](leftjoin) fix bug of left and full join with other conjuncts (#20946)
Fix bug of left and full outer join with other conjuncts. When equal matched row count of a probe row exceed batch_size, some times the _join_node->_is_any_probe_match_row_output flag is not set correcty, which result in outputing extra rows for the probe row.
2023-06-19 12:27:06 +08:00
ab32299ba4 [feature](nereids) Support multi target rf #20714
Support multi target runtime filter, mainly for set operation, such as union/intersect/except.
2023-06-16 20:26:00 +08:00
460399f214 [fix](profile) remove same profile in join node (#20734) 2023-06-15 08:08:39 +08:00
31a4f96f01 [refactor](exprcontext) move close to expr context's dector method (#20747)
The close method does nothing. But I am not sure we could remove it. So that I add it to dector method and remove many many calls.
2023-06-14 18:01:07 +08:00
Pxl
e010fa8d4f [Chore](runtime filter) remove runtime filter ready_for_publish/publish_finally (#20593) 2023-06-13 11:20:49 +08:00
51bbf17786 [Refactor](Profile) Add and refactor the join profile (#20693) 2023-06-13 09:06:51 +08:00
ea264ce9de [Opt](join) short circuit probe for join node (#20585)
Support the _short_circuit_for_probe for join node
2023-06-12 16:01:09 +08:00
a6f625676b [profile](remove child) child is for node, should not be used to organize counters (#20676)
Currently, there are many profiles using add child profile to orgnanize profile into blocks. But it is wrong. Child profile will have a total time counter. Actually, what we should use is just a label.

                          -  MemoryUsage:  
                              -  HashTable:  23.98  KB
                              -  SerializeKeyArena:  446.75  KB
Add a new macro ADD_LABEL_COUNTER to add just a label in the profile.

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-06-12 10:00:35 +08:00
3b17cc8eb3 [Improvement](column) reduce cache miss for data copy (#20583) 2023-06-09 13:10:57 +08:00
b60860c5e5 [refactor](profile) refactor the join profile when its shared hash table (#20391)
in join node, if it's broadcast_join
and shared hash table, some counter/timer about build hash table is useless,
so we could add those counter/timer in faker profile, and those will not display in web profile.
2023-06-09 08:59:49 +08:00
d00b7ad04b [Opt](performance) opt the outer join for nested loop join (#20524) 2023-06-07 17:31:36 +08:00
1fc48e83f2 [fix](executor)Fix duplicate timer and add open timer #20448
1 Currently, Node's total timer couter has timed twice(in Open and alloc_resource), this may cause timer in profile is not correct.
2 Add more timer to find more code which may cost much time.
2023-06-06 08:55:52 +08:00
4f77578d8a [enhancement](profile) add build get child next time (#20460)
Currently, build time not include child(1)->get next time, it is very confusing during shared hash table scenario. So that I add a profile.

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-06-06 08:55:19 +08:00
9f8de89659 [refactor](exec) replace the single pointer with an array of 'conjuncts' in ExecNode (#19758)
Refactoring the filtering conditions in the current ExecNode from an expression tree to an array can simplify the process of adding runtime filters. It eliminates the need for complex merge operations and removes the requirement for the frontend to combine expressions into a single entity.

By representing the filtering conditions as an array, each condition can be treated individually, making it easier to add runtime filters without the need for complex merging logic. The array can store the individual conditions, and the runtime filter logic can iterate through the array to apply the filters as needed.

This refactoring simplifies the codebase, improves readability, and reduces the complexity associated with handling filtering conditions and adding runtime filters. It separates the conditions into discrete entities, enabling more straightforward manipulation and management within the execution node.
2023-05-29 11:47:31 +08:00
Pxl
43aa062fb1 [Chore](hash-join) remove useless conditions and add some case (#20050) 2023-05-26 14:45:24 +08:00
14b4c7abf9 [fix](hashtable) Check query cancel status during build hash table #19970
should cancel query during hash table build stage if the query is cancelled.
2023-05-24 14:24:03 +08:00
272a7565b8 [improvement](tracing) Remove useless span levels from be side tracing (#19665)
1. Remove an exec node method corresponding to a span and replace it with an exec node corresponding to a span;
2. Fix some problems with tracing in pipeline.
2023-05-17 19:04:52 +08:00
1d421a26d9 [bugfix](memory) merge block may allocate failed (#19507) 2023-05-11 10:42:47 +08:00
9e2b118288 [RegressTest](Exec) Add DCHECK null_aware_left_anti_join in mark join (#19149) 2023-04-27 17:52:03 +08:00
aabcab9dbe [Improvement](runtime filter) Improve merge phase (#18828) 2023-04-26 21:01:20 +08:00
8e4710079d [improvement](profile) Insert into add LoadChannel runtime profile (#18908)
TabletSink and LoadChannel in BE are M: N relationship,
Every once in a while LoadChannel will randomly return its own runtime profile to a TabletSink, so usually all LoadChannel runtime profiles are saved on each TabletSink, and the timeliness of the same LoadChannel profile saved on different TabletSinks is different, and each TabletSink will periodically send fe reports all the LoadChannel profiles saved by itself, and ensures to update the latest LoadChannel profile according to the timestamp.
2023-04-24 09:41:57 +08:00
3736530585 [refactor](query context) rename query fragments context to query context and make query context safe (#18950)
* [refactor](query context) rename query fragments context to query context and make query context safe

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-23 22:53:56 +08:00
293e115536 [Improvement](bloom filter) initialize bloom filter with adaptive size (#18785) 2023-04-20 10:06:40 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
79c446c89f [enhancement](exception) Column filter/replicate supports exception safety (#18503) 2023-04-18 19:23:09 +08:00
4ca0c0face [fix](join) fix wrong result of right join (#18365)
When processing data in hash table for right join and full outer join, if the output data rows of one hash bucket excceeds batch size, the logic when continue processing this bucket is wrong, it should differentiate between different join types.
2023-04-06 10:55:58 +08:00
e5793249cd [opt](hashtable) Modify default filled strategy to 75% (#18242) 2023-03-31 09:28:11 +08:00
d27201f331 [fix](nested_loop_join)got incorrect result from nested loop join without condition (#18139) 2023-03-28 16:20:05 +08:00
78abb40fdc [improvement](string) throw exception instead of log fatal if string column exceed total size limit (#17989)
Throw exception instead of log fatal if string column exceed total size limit, so that we can catch it and let query fail, instead of causing be exit.
2023-03-27 08:55:26 +08:00
7d91114304 [fix](join) fix wrong result of null aware left anti join (#17752) 2023-03-14 09:35:46 +08:00
93a865c3e8 [improvement](join) Avoid reading from left child while hash table is empty(right join) (#17655)
When the right (build) side is empty in a right outer join, there is no need to read data from the left child.
2023-03-13 09:03:17 +08:00
00727e8c11 [fix](in-bitmap) fix result may be wrong if the left side of the in bitmap predicate is a constant (#17570) 2023-03-09 10:59:05 +08:00
1244eed1cd [Opt](exec) opt the dispose nullable column logic (#17192) 2023-03-01 23:25:40 +08:00