fix ltrim result may incorrect in some case
according to https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
Built-in Function: int __builtin_cl/tz (unsigned int x)
If x is 0, the result is undefined.
So we handle the case of 0 separately
this function return different between gcc and clang when x is 0
Change 1: Support an adaptive runtime filter: IN_OR_BLOOM_FILTER
The processing logic is
If the number of rows in the right table < runtime_filter_max_in_num, then IN predicate will work
If the number of rows in the right table >= runtime_filter_max_in_num, then Bloom filter can take effect
Change 2: The default runtime filter is changed to filter: IN_OR_BLOOM_FILTER
# Proposed changes
Issue Number: close#6238
Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com>
Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com>
Co-authored-by: wangbo <506340561@qq.com>
Co-authored-by: emmymiao87 <522274284@qq.com>
Co-authored-by: Pxl <952130278@qq.com>
Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>
Co-authored-by: thinker <zchw100@qq.com>
Co-authored-by: Zeno Yang <1521564989@qq.com>
Co-authored-by: Wang Shuo <wangshuo128@gmail.com>
Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
Co-authored-by: xinghuayu007 <1450306854@qq.com>
Co-authored-by: weizuo93 <weizuo@apache.org>
Co-authored-by: yiguolei <guoleiyi@tencent.com>
Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com>
Co-authored-by: awakeljw <993007281@qq.com>
Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com>
Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com>
## Problem Summary:
### 1. Some code from clickhouse
**ClickHouse is an excellent implementation of the vectorized execution engine database,
so here we have referenced and learned a lot from its excellent implementation in terms of
data structure and function implementation.
We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers.**
The following comment has been added to the code from Clickhouse, eg:
// This file is copied from
// https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h
// and modified by Doris
### 2. Support exec node and query:
* vaggregation_node
* vanalytic_eval_node
* vassert_num_rows_node
* vblocking_join_node
* vcross_join_node
* vempty_set_node
* ves_http_scan_node
* vexcept_node
* vexchange_node
* vintersect_node
* vmysql_scan_node
* vodbc_scan_node
* volap_scan_node
* vrepeat_node
* vschema_scan_node
* vselect_node
* vset_operation_node
* vsort_node
* vunion_node
* vhash_join_node
You can run exec engine of SSB/TPCH and 70% TPCDS stand query test set.
### 3. Data Model
Vec Exec Engine Support **Dup/Agg/Unq** table, Support Block Reader Vectorized.
Segment Vec is working in process.
### 4. How to use
1. Set the environment variable `set enable_vectorized_engine = true; `(required)
2. Set the environment variable `set batch_size = 4096; ` (recommended)
### 5. Some diff from origin exec engine
https://github.com/doris-vectorized/doris-vectorized/issues/294
## Checklist(Required)
1. Does it affect the original behavior: (No)
2. Has unit tests been added: (Yes)
3. Has document been added or modified: (No)
4. Does it need to update dependencies: (No)
5. Are there any changes that cannot be rolled back: (Yes)
1. fix core dump when using multi explode_bitmap #7716
2. fix bug that json array extract by json path is wrong #7717
3. fix bug that after lateral view, the null value become non-null value #7718
4. fix bug that lateral view may return error: couldn't resolve slot descriptor 1. #7719
5. fix error result when using lateral view with where predicate #7720
Support merge IN predicate when exist remote target(e.g. shuffle hash join).
Remote the code that IN predicate implicit conversion to Bloom filter then exist remote target.
Close related #7546
1. replace all boost::shared_ptr to std::shared_ptr
2. replace all boost::scopted_ptr to std::unique_ptr
3. replace all boost::scoped_array to std::unique<T[]>
4. replace all boost:thread to std::thread
I tested hex in a 1000w times for loop with random numbers,
old hex avg time cost is 4.92 s,optimize hex avg time cost is 0.46 s which faster nearly 10x.
when right table has null value in string column, runtime filter may coredump
```
select count(*) from baseall t1 join test t2 where t1.k7 = t2.k7;
```
Currently, the function lower()/upper() can only handle one char at a time.
A vectorized function has been implemented, it makes performance 2 times faster. Here is the performance test:
The length of char: 26, test 100 times
vectorized-function-cost: 99491 ns
normal-function-cost: 134766 ns
The length of char: 260, test 100 times
vectorized-function-cost: 179341 ns
normal-function-cost: 344995 ns
* Revert "[Optimize] Put _Tuple_ptrs into mempool when RowBatch is initialized (#6036)"
This reverts commit f254870aeb18752a786586ef5d7ccf952b97f895.
* [BUG][Timeout][QueryLeak] Fixed memory not released in time, Fix Core dump in bloomfilter
This is part of the array type support and has not been fully completed.
The following functions are implemented
1. fe array type support and implementation of array function, support array syntax analysis and planning
2. Support import array type data through insert into
3. Support select array type data
4. Only the array type is supported on the value lie of the duplicate table
this pr merge some code from #4655#4650#4644#4643#4623#2979
refactor runtime filter bloomfilter and eliminate some virtual function calls which obtained a performance improvement of about 5%
import block bloom filter, for avx version obtained 40% performance improvement
before: bloomfilter size:default, about 2000W item cost about 1s400ms
after: bloomfilter size:524288, about 2000W item cost about 400ms
1. support in/bloomfilter/minmax
2. support broadcast/shuffle/bucket shuffle/colocate join
3. opt memory use and cpu cache miss while build runtime filter
4. opt memory use in left semi join (works well on tpcds-95)
MemTracker can provide memory consumption for us to find out which
module consume more memory, but it's just a current value, this patch
add metrics for some large memory consumers, then we can find out
which module consume more memory in timeline, it would be useful to
troubleshoot OOM problems and optimize configs.