141 Commits

Author SHA1 Message Date
9f8de89659 [refactor](exec) replace the single pointer with an array of 'conjuncts' in ExecNode (#19758)
Refactoring the filtering conditions in the current ExecNode from an expression tree to an array can simplify the process of adding runtime filters. It eliminates the need for complex merge operations and removes the requirement for the frontend to combine expressions into a single entity.

By representing the filtering conditions as an array, each condition can be treated individually, making it easier to add runtime filters without the need for complex merging logic. The array can store the individual conditions, and the runtime filter logic can iterate through the array to apply the filters as needed.

This refactoring simplifies the codebase, improves readability, and reduces the complexity associated with handling filtering conditions and adding runtime filters. It separates the conditions into discrete entities, enabling more straightforward manipulation and management within the execution node.
2023-05-29 11:47:31 +08:00
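The idea in the commit above can be sketched as follows. This is a minimal illustration, not the Doris implementation: `Row`, `Conjunct`, `FilterNode`, and `add_runtime_filter` are hypothetical names chosen for the example. It shows that when conjuncts are stored as an array, adding a runtime filter is a plain append and evaluation is an implicit AND over the array.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-ins, not Doris types: a row is a vector of ints and
// a conjunct is any predicate over a row.
using Row = std::vector<int>;
using Conjunct = std::function<bool(const Row&)>;

struct FilterNode {
    // Each filtering condition is stored as its own entry.
    std::vector<Conjunct> conjuncts;

    // Adding a runtime filter is a plain append; nothing has to be merged
    // into an existing expression tree.
    void add_runtime_filter(Conjunct filter) {
        conjuncts.push_back(std::move(filter));
    }

    // A row passes only if every conjunct accepts it (implicit AND).
    bool passes(const Row& row) const {
        for (const auto& conjunct : conjuncts) {
            if (!conjunct(row)) {
                return false;
            }
        }
        return true;
    }
};
```

With a single expression tree, the same append would require building a new AND node over the old root and the new filter.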
Pxl
43aa062fb1 [Chore](hash-join) remove useless conditions and add some case (#20050) 2023-05-26 14:45:24 +08:00
14b4c7abf9 [fix](hashtable) Check query cancel status during build hash table (#19970)
The hash table build stage should stop if the query has been cancelled.
2023-05-24 14:24:03 +08:00
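A rough sketch of the pattern the commit above describes, under stated assumptions: `build_hash_table`, the `cancelled` flag, and `check_interval` are hypothetical and do not reflect the actual Doris code. The build loop polls a cancellation flag periodically and abandons the build when the flag is set.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical sketch, not Doris code: build a hash table over `keys`,
// polling a cancellation flag every `check_interval` rows.
// Returns false if the build was abandoned because the query was cancelled.
bool build_hash_table(const std::vector<int>& keys,
                      const std::atomic<bool>& cancelled,
                      std::unordered_multimap<int, std::size_t>& table,
                      std::size_t check_interval = 1024) {
    assert(check_interval > 0);
    for (std::size_t i = 0; i < keys.size(); ++i) {
        // Polling only every few rows keeps the check cheap.
        if (i % check_interval == 0 && cancelled.load()) {
            return false;
        }
        table.emplace(keys[i], i);  // map key -> row index
    }
    return true;
}
```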
272a7565b8 [improvement](tracing) Remove useless span levels from be side tracing (#19665)
1. Stop creating one span per exec node method and create one span per exec node instead;
2. Fix some problems with tracing in pipeline.
2023-05-17 19:04:52 +08:00
1d421a26d9 [bugfix](memory) merge block may allocate failed (#19507) 2023-05-11 10:42:47 +08:00
9e2b118288 [RegressTest](Exec) Add DCHECK null_aware_left_anti_join in mark join (#19149) 2023-04-27 17:52:03 +08:00
aabcab9dbe [Improvement](runtime filter) Improve merge phase (#18828) 2023-04-26 21:01:20 +08:00
8e4710079d [improvement](profile) Insert into add LoadChannel runtime profile (#18908)
TabletSink and LoadChannel in BE have an M:N relationship.
Every once in a while, a LoadChannel returns its own runtime profile to a randomly chosen TabletSink, so each TabletSink usually ends up holding the runtime profiles of all LoadChannels, although copies of the same LoadChannel profile held by different TabletSinks may differ in freshness. Each TabletSink periodically reports all the LoadChannel profiles it holds to FE, and the latest LoadChannel profile is kept according to the timestamp.
2023-04-24 09:41:57 +08:00
3736530585 [refactor](query context) rename query fragments context to query context and make query context safe (#18950)
* [refactor](query context) rename query fragments context to query context and make query context safe

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-23 22:53:56 +08:00
293e115536 [Improvement](bloom filter) initialize bloom filter with adaptive size (#18785) 2023-04-20 10:06:40 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
79c446c89f [enhancement](exception) Column filter/replicate supports exception safety (#18503) 2023-04-18 19:23:09 +08:00
4ca0c0face [fix](join) fix wrong result of right join (#18365)
When processing data in the hash table for right join and full outer join, if the output rows of one hash bucket exceed the batch size, the logic for continuing to process that bucket is wrong: it should differentiate between join types.
2023-04-06 10:55:58 +08:00
e5793249cd [opt](hashtable) Modify default filled strategy to 75% (#18242) 2023-03-31 09:28:11 +08:00
d27201f331 [fix](nested_loop_join) got incorrect result from nested loop join without condition (#18139) 2023-03-28 16:20:05 +08:00
78abb40fdc [improvement](string) throw exception instead of log fatal if string column exceed total size limit (#17989)
Throw an exception instead of logging fatal if a string column exceeds the total size limit, so that the exception can be caught and the query fails instead of causing BE to exit.
2023-03-27 08:55:26 +08:00
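The change above can be illustrated with a small sketch. Everything here is an assumption made for the example: `StringColumn`, `run_query`, and the tiny `kMaxStringColumnBytes` limit are hypothetical, and the real limit and exception type are not stated in the commit message. The point is only that a thrown exception can be caught at the query boundary, whereas a fatal log would terminate the process.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical limit for illustration; the commit does not state the real value.
constexpr std::size_t kMaxStringColumnBytes = 16;

// Hypothetical column that stores all values in one contiguous buffer.
struct StringColumn {
    std::string chars;

    void append(const std::string& value) {
        if (chars.size() + value.size() > kMaxStringColumnBytes) {
            // Recoverable error: the caller decides what to do.
            throw std::length_error("string column exceeds total size limit");
        }
        chars += value;
    }
};

// Hypothetical query boundary: a failed append fails this query only,
// and the process keeps running.
bool run_query(StringColumn& column, const std::string& value) {
    try {
        column.append(value);
        return true;
    } catch (const std::length_error&) {
        return false;
    }
}
```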
7d91114304 [fix](join) fix wrong result of null aware left anti join (#17752) 2023-03-14 09:35:46 +08:00
93a865c3e8 [improvement](join) Avoid reading from left child while hash table is empty(right join) (#17655)
When the right (build) side is empty in a right outer join, there is no need to read data from the left child.
2023-03-13 09:03:17 +08:00
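A minimal sketch of the check described above, with hypothetical names (`JoinType`, `can_skip_probe_side`) that are not taken from the Doris source. It covers only the case the commit message names, a right outer join with an empty build side; other join types may allow the same shortcut, but that is not claimed here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, reduced set of join types for illustration.
enum class JoinType { Inner, LeftOuter, RightOuter, FullOuter };

// Returns true when the left (probe) child does not need to be read.
// A right outer join emits matched pairs plus unmatched build rows, so an
// empty build side means there is nothing to output.
bool can_skip_probe_side(JoinType type, std::size_t build_rows) {
    return type == JoinType::RightOuter && build_rows == 0;
}
```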
00727e8c11 [fix](in-bitmap) fix result may be wrong if the left side of the in bitmap predicate is a constant (#17570) 2023-03-09 10:59:05 +08:00
1244eed1cd [Opt](exec) opt the dispose nullable column logic (#17192) 2023-03-01 23:25:40 +08:00
e22a9ecc3b [enhancement](execute model) using thread pool to execute report or join task instead of starting too many threads (#17212)
* [enhancement](execute model) using thread pool to execute report or join task instead of starting too many threads

Doris starts a report thread and a join thread during fragment execution. Creating and destroying threads very frequently causes many problems; for example, jemalloc may misbehave and may even crash.

jemalloc/jemalloc#1405

It is better to use a thread pool for these tasks.
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-01 08:35:27 +08:00
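To illustrate the approach in the commit above, here is a minimal fixed-size thread pool. It is a generic sketch built on the C++ standard library, not the thread pool class used in Doris; `SimpleThreadPool` and `submit` are hypothetical names. Worker threads are created once and reused, so submitting a task does not create or destroy a thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal pool for illustration only.
class SimpleThreadPool {
public:
    explicit SimpleThreadPool(std::size_t num_threads) {
        for (std::size_t i = 0; i < num_threads; ++i) {
            workers_.emplace_back([this] { worker_loop(); });
        }
    }

    // Lets the workers drain the queue, then joins them.
    ~SimpleThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& worker : workers_) {
            worker.join();
        }
    }

    // Queues a task; an idle worker picks it up.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) {
                    return;  // only reachable once stopping_ is set
                }
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};
```

In this sketch, a report or join task would be passed to `submit` as a callable instead of being run on a freshly created `std::thread`.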
a1c0054b4c [fix](memory) fix memory GC details and join probe catch bad_alloc (#16989)
On Redhat 4.x, /proc/meminfo has no MemAvailable, so stop using MemAvailable to control memory there.
Record vm_rss_str and mem_available_str when GC is triggered, so that memory changes during GC do not make the logs inaccurate.
Catch bad_alloc in join probe, which may allocate 64G of memory at a time, to avoid OOM.
Modify the documented names of doris_be_all_segments_num and doris_be_all_rowsets_num.
2023-02-23 08:33:30 +08:00
fb0d08ff4c [fix](mark join) fix bug of mark join with other conjuncts (#16655)
Fix bug that probe_index is not increased for mark hash join with other conjuncts.
2023-02-14 14:47:15 +08:00
f71fc3291f [Bug](fix) right anti join error result when batch size is low (#16510) 2023-02-08 17:26:19 +08:00
f6a20f844b [fix](hashjoin) join produce blocks with rows larger than batch size: handle join with other conjuncts (#16402) 2023-02-08 14:26:35 +08:00
91229bb87d [Bug](mark join) Fix mark join with other conjuncts (#16435) 2023-02-07 09:31:41 +08:00
696c6ffcc5 [fix](join) crash caused by canceling query (#16311)
If the query was canceled, the status in shared context may be `OK` with other fields not set.
2023-02-02 09:55:37 +08:00
bf16228851 [fix](hashjoin) join produce blocks with rows larger than batch size (#16166)
* [fix](hashjoin) join produce blocks with rows larger than batch size

* fix
2023-02-01 16:02:31 +08:00
Pxl
46347a51d2 [Bug](exec) enable warning on ignoring function return value for vctx (#16157)
* enable warning on ignoring function return value for vctx
2023-01-29 17:23:21 +08:00
79ad74637d [refactor](remove expr) remove non vectorized Expr and ExprContext related codes (#16136) 2023-01-24 10:45:35 +08:00
9f106161a7 [Bug](join) Fix null aware anti join error in fuzzy mode (#15987) 2023-01-17 11:32:16 +08:00
97fcad76f8 [enhancement](memtracker) Improve readability (#15716) 2023-01-16 16:30:35 +08:00
9468711f9f [Bug](join) fix bug null aware left anti join not correct result (#15841) 2023-01-13 10:18:05 +08:00
d857b4af1b [refactor](remove row batch) remove impala rowbatch structure (#15767)
* [refactor](remove row batch) remove impala rowbatch structure

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-11 09:37:35 +08:00
9c0f96883a [fix](hashjoin) Fix right join pull output block memory overflow (#15440)
For outer join / right outer join / right semi join, when HashJoinNode::pull->process_data_in_hashtable outputs a block, it puts all rows of a key in the hash table into that block. Only after the output of a key is complete does it check whether the block size exceeds the batch size, and if so, it stops the output.

If a key has more than 20 million rows, memory overflow occurs when subsequent block operations are performed on those rows.
2023-01-10 10:10:43 +08:00
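One way to avoid the oversized block described above is to cap the number of rows emitted per call and remember where to resume within the key. The sketch below assumes this approach for illustration; the commit message does not spell out the actual fix, and `BucketCursor` and `emit_rows` are hypothetical names rather than Doris identifiers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical resume position inside the rows of one key.
struct BucketCursor {
    std::size_t next_row = 0;
};

// Appends at most `batch_size` rows of one key to `out`, starting at the
// cursor. Returns true once every row of the key has been emitted.
bool emit_rows(const std::vector<int>& rows_of_key, std::size_t batch_size,
               BucketCursor& cursor, std::vector<int>& out) {
    assert(batch_size > 0);
    const std::size_t remaining = rows_of_key.size() - cursor.next_row;
    const std::size_t count = std::min(batch_size, remaining);
    const auto first =
            rows_of_key.begin() + static_cast<std::ptrdiff_t>(cursor.next_row);
    out.insert(out.end(), first, first + static_cast<std::ptrdiff_t>(count));
    cursor.next_row += count;  // the next call resumes here
    return cursor.next_row == rows_of_key.size();
}
```

With this shape, a key with a very large number of rows is spread over many blocks, each bounded by the batch size.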
9c36278c4a [improvement](pipeline) Support sharing hash table for broadcast join (#15628) 2023-01-06 15:11:28 +08:00
05d72e8919 [fix](join) fix anti join incorrectly outputs null values (#15567) 2023-01-06 09:55:48 +08:00
5ff5b8fc98 [feature](mark join) Support mark join for hash join node (#15569)
* [feature](mark join) Support mark join for hash join node
2023-01-05 09:32:26 +08:00
10be583e52 [chore](pipeline) optimize profile information (#15433) 2022-12-30 09:56:33 +08:00
06f71f2bca [pipeline](fix) Fix bugs to pass all regression cases (#15306)
* [pipeline](fix) Fix bugs to pass all regression cases

* update

* update
2022-12-23 22:17:50 +08:00
b085ff49f0 [refactor](non-vec) delete non-vec data sink (#15283)
* [refactor](non-vec) delete non-vec data sink

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2022-12-23 14:10:47 +08:00
8ecf69b09b [pipeline](regression) nested loop join test get error result in pipeline engine and refactor the code for need more input data (#15208) 2022-12-21 19:03:51 +08:00
af54299b26 [Pipeline](projection) Support projection on pipeline engine (#15220) 2022-12-21 15:47:29 +08:00
732417258c [Bug](pipeline) Fix bugs to pass TPCDS cases (#15194) 2022-12-20 22:29:55 +08:00
5cf21fa7d1 [feature](planner) mark join to support subquery in disjunction (#14579)
Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
2022-12-20 15:22:43 +08:00
7c67fa8651 [Bug](pipeline) fix bug of right anti join error result in pipeline (#15165) 2022-12-19 19:28:44 +08:00
0732f31e5d [Bug](pipeline) Fix bugs for scan node and join node (#15164)
* [Bug](pipeline) Fix bugs for scan node and join node

* update
2022-12-19 15:59:29 +08:00
874acdf68f [vectorized](join) add try catch in create thread (#15065) 2022-12-16 19:55:09 +08:00
284a3351f4 [Refactor](exec) refactor the code of datasink eos logic (#15009) 2022-12-13 15:33:08 +08:00
68092fe514 [pipeline](NLJ) support nested loop join for pipeline (#14966) 2022-12-10 00:20:16 +08:00