Commit Graph

121 Commits

Author SHA1 Message Date
5029ef46c9 [fix] fix ltrim result may incorrect in some case (#7963)
fix ltrim result may incorrect in some case
according to https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
Built-in Function: int __builtin_cl/tz (unsigned int x)
If x is 0, the result is undefined.
So we handle the case of 0 separately

this function return different between gcc and clang when x is 0
2022-02-09 13:06:37 +08:00
Pxl
0553ce2944 [feature](vectorization) support function topn && remove some unused code (#7793) 2022-02-09 13:05:31 +08:00
c1fef37399 [improvement](runtime-filter) Support adaptive runtime filter(#7546) (#7645)
Change 1: Support an adaptive runtime filter: IN_OR_BLOOM_FILTER
    The processing logic is
    If the number of rows in the right table < runtime_filter_max_in_num, then IN predicate will work
    If the number of rows in the right table >= runtime_filter_max_in_num, then Bloom filter can take effect

Change 2: The default runtime filter is changed to filter: IN_OR_BLOOM_FILTER
2022-01-30 16:46:52 +08:00
Pxl
cd73a6b84b [chore] fix clang compile error (#7883) 2022-01-26 12:53:35 +08:00
e1d7233e9c [feature](vectorization) Support Vectorized Exec Engine In Doris (#7785)
# Proposed changes

Issue Number: close #6238

    Co-authored-by: HappenLee <happenlee@hotmail.com>
    Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com>
    Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com>
    Co-authored-by: wangbo <506340561@qq.com>
    Co-authored-by: emmymiao87 <522274284@qq.com>
    Co-authored-by: Pxl <952130278@qq.com>
    Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>
    Co-authored-by: thinker <zchw100@qq.com>
    Co-authored-by: Zeno Yang <1521564989@qq.com>
    Co-authored-by: Wang Shuo <wangshuo128@gmail.com>
    Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
    Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
    Co-authored-by: xinghuayu007 <1450306854@qq.com>
    Co-authored-by: weizuo93 <weizuo@apache.org>
    Co-authored-by: yiguolei <guoleiyi@tencent.com>
    Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com>
    Co-authored-by: awakeljw <993007281@qq.com>
    Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com>
    Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com>


## Problem Summary:

### 1. Some code from clickhouse

**ClickHouse is an excellent implementation of the vectorized execution engine database,
so here we have referenced and learned a lot from its excellent implementation in terms of
data structure and function implementation.
We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers.**

The following comment has been added to the code from Clickhouse, eg:
// This file is copied from
// https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h
// and modified by Doris

### 2. Support exec node and query:
* vaggregation_node
* vanalytic_eval_node
* vassert_num_rows_node
* vblocking_join_node
* vcross_join_node
* vempty_set_node
* ves_http_scan_node
* vexcept_node
* vexchange_node
* vintersect_node
* vmysql_scan_node
* vodbc_scan_node
* volap_scan_node
* vrepeat_node
* vschema_scan_node
* vselect_node
* vset_operation_node
* vsort_node
* vunion_node
* vhash_join_node

You can run exec engine of SSB/TPCH and 70% TPCDS stand query test set.

### 3. Data Model

Vec Exec Engine Support **Dup/Agg/Unq** table, Support Block Reader Vectorized.
Segment Vec is working in process.

### 4. How to use

1. Set the environment variable `set enable_vectorized_engine = true; `(required)
2. Set the environment variable `set batch_size = 4096; ` (recommended)

### 5. Some diff from origin exec engine

https://github.com/doris-vectorized/doris-vectorized/issues/294

## Checklist(Required)

1. Does it affect the original behavior: (No)
2. Has unit tests been added: (Yes)
3. Has document been added or modified: (No)
4. Does it need to update dependencies: (No)
5. Are there any changes that cannot be rolled back: (Yes)
2022-01-18 10:07:15 +08:00
5b0f11b665 [feature](mysql-compatibility)(function) add WEEKDAY function (#7673)
`WEEKDAY` in MySQL: returns an index from 0 to 6 for Monday to Sunday.
`DAYOFWEEK` in MySQL: returns an index from 1 to 7 for Sunday to Saturday.

Doris only have `DAYOFWEEK` function, so I add `WEEKDAY` function.

Thanks for the following materials:
- https://github.com/apache/incubator-doris/pull/6982/files
- https://www.bilibili.com/video/BV1V44y1Y7Ro
2022-01-16 10:39:21 +08:00
5e1caea2b1 [fix](lateral-view) Fix some bugs about lateral view (#7721)
1.  fix core dump when using multi explode_bitmap #7716 
2. fix bug that json array extract by json path is wrong #7717 
3. fix bug that after lateral view, the null value become non-null value #7718 
4. fix bug that lateral view may return error: couldn't resolve slot descriptor 1. #7719 
5. fix error result when using lateral view with where predicate #7720
2022-01-13 15:30:38 +08:00
563545475e [Optimize](Runtime Filter) Support merge in runtime filter(#7546) (#7547)
Support merge IN predicate when exist remote target(e.g. shuffle hash join).
Remote the code that IN predicate implicit conversion to Bloom filter then exist  remote target.

Close related #7546
2022-01-06 19:08:35 +08:00
07e2acb2f3 [feature] Suport national secret (national commercial password) algorithm SM3/SM4 (#7464)
SM3 is password hash algorithm
SM4 is a block cipher used to replace DES / AES and other international algorithms.
2021-12-28 10:39:54 +08:00
0c154733e0 [feature](function) support bitmap_union/intersect have more columns parameters (#7379)
support multi bitmap parameter for all bitmap aggregation function
2021-12-26 11:03:20 +08:00
3ba6dcf236 [fix](function) fix round function for inaccuracy (#7421) 2021-12-24 21:23:11 +08:00
382351b0ee [fix](ut) Fix run fe ut failed, be ut memory leak and build thirdparty failed (#7377) 2021-12-15 11:00:20 +08:00
d3316ff567 [performance](function) Support SIMD function in some string function (#7236)
Support SIMD function in some string function:lrtim,rtrim,trim,reverse,hex
2021-12-06 10:24:26 +08:00
c9e578032b optimize bitmap function count, use roaring cardinality method, this will more fast than current version (#7151) 2021-11-24 14:42:48 +08:00
Pxl
a74fdf184c [refactor](be) refactor predicate function creator (#7054)
Refactor predicate function creator, make MinMaxFunction/HybridSet/BloomFilter
use a unified interface through template to get function.
2021-11-24 10:39:29 +08:00
6c6380969b [refactor] replace boost smart ptr with stl (#6856)
1. replace all boost::shared_ptr to std::shared_ptr
2. replace all boost::scopted_ptr to std::unique_ptr
3. replace all boost::scoped_array to std::unique<T[]>
4. replace all boost:thread to std::thread
2021-11-17 10:18:35 +08:00
e69249c082 sub_bitmap (#6977)
Starting from the offset position, intercept the specified limit bitmap elements and return a bitmap subset.

Types of chang
2021-11-06 13:31:03 +08:00
599ecb1f30 [Function] Add bitmap function bitmap_subset_limit (#6980)
Add bitmap function bitmap_subset_limit.
This function will return subset in specified index.
2021-11-04 12:14:47 +08:00
aeec9c45e6 [Function] Add bitmap-xor-count function for doris (#6982)
Add bitmap-xor-count function for doris

relate to #6875
2021-11-02 16:37:00 +08:00
1ff3d708ca [Function] add functions of bitmap_and/or_count (#6912)
issue #6875
add bitmap_and_count/ bitmap_or_count
2021-11-01 14:00:07 +08:00
c7a3116f98 [Function] add bitmap function of bitmap_has_all (#6918)
The 'bitmap_has_all' function returns true if the first bitmap contains all the elements of the second bitmap.
2021-11-01 12:50:47 +08:00
65ded82778 [Function] add BE bitmap function bitmap_subset_in_range (#6917)
Add bitmap function bitmap_subset_in_range.
This function will return subset in specified range (not include the range_end).
2021-11-01 11:05:19 +08:00
Pxl
28030294f7 [Feature] Support bitmap_and_not & bitmap_and_not_count (#6910)
Support bitmap_and_not & bitmap_and_not_count.
2021-11-01 10:11:54 +08:00
a842d41b87 [Function] add BE bitmap function bitmap_max (#6942)
Support bitmap_max.
2021-10-30 18:16:38 +08:00
adb9b0d9c6 [Bug] Return 0 when hex(0) (#6837) 2021-10-15 10:18:55 +08:00
58440b90f0 [Bug] Left() string function behaves not identically to the mysql implementation (#6811)
See Fix #6810
2021-10-15 10:17:21 +08:00
ad949c2f65 Optimize Hex and add related Doc (#6697)
I tested hex in a 1000w times for loop with random numbers,
old hex avg time cost is 4.92 s,optimize hex avg time cost is 0.46 s which faster nearly 10x.
2021-10-13 11:36:14 +08:00
020282e885 [Bug] Fix aes_decrypt to handle null input correctly. (#6636) 2021-09-14 11:19:55 +08:00
225bdb1fda [Bug] fix replace function bug (#6605)
* fix replace function bug

* fix replace docs
2021-09-14 09:59:13 +08:00
Pxl
577ff01094 [Bug][Function] Fix pad function wrong result when len.val==str_char_size (#6564)
like #6563 and #6562
2021-09-07 11:55:49 +08:00
7a15e583a7 [Feature]Support functions of json_array, json_object, json_quote (#6504) 2021-09-02 09:59:02 +08:00
66a7a4b294 [Feature] Support exact percentile aggregate function (#6410)
Support to calculate the exact percentile value array of numeric column `col` at the given percentage(s).
2021-08-18 15:56:06 +08:00
285d44cd48 [BUG] Fix potential overflow exception when do money format for double (#6408)
* [BUG] Fix potential overflow bug when do money format for double

Co-authored-by: caiconghui <caiconghui@xiaomi.com>
2021-08-15 18:40:26 +08:00
2f5b06ae70 [Bug][Optimize] Fix race condition problem and optimize do_money_format function (#6350)
* [Bug][Optimize] Fix race condition problem and optimize do_money_format function

Co-authored-by: caiconghui <caiconghui@xiaomi.com>
2021-08-06 16:29:34 +08:00
4c0fdd2800 [Bug] Fix core dump in BloomFilter while build Runtime Filter right table string column contains null (#6305)
when right table has null value in string column, runtime filter may coredump
```
select count(*) from baseall t1 join test t2 where t1.k7 = t2.k7;
```
2021-07-26 09:41:41 +08:00
13ef2c9e1d [Function][Enhance] lower/upper case transfer function vectorized (#6253)
Currently, the function lower()/upper() can only handle one char at a time.
A vectorized function has been implemented, it makes performance 2 times faster. Here is the performance test:

The length of char: 26, test 100 times
vectorized-function-cost: 99491 ns
normal-function-cost: 134766 ns

The length of char: 260, test 100 times
vectorized-function-cost: 179341 ns
normal-function-cost: 344995 ns
2021-07-26 09:38:07 +08:00
fae3eff2e6 [Bug] Fix the bug of cast string to datetime return not null (#6228) 2021-07-17 10:55:08 +08:00
bf5db6eefe [BUG][Timeout][QueryLeak] Fixed memory not released in time (#6221)
* Revert "[Optimize] Put _Tuple_ptrs into mempool when RowBatch is initialized (#6036)"

This reverts commit f254870aeb18752a786586ef5d7ccf952b97f895.

* [BUG][Timeout][QueryLeak] Fixed memory not released in time, Fix Core dump in bloomfilter
2021-07-16 12:32:10 +08:00
c2695e9716 [Bug][RoutineLoad] Can not match whole json in routine load (#6213)
Support using json path "$" to match the whole json in routine load

Co-authored-by: chenmingyu <chenmingyu@baidu.com>
2021-07-16 09:21:27 +08:00
ed3ff470ce [ARRAY] Support array type load and select not include access by index (#5980)
This is part of the array type support and has not been fully completed. 
The following functions are implemented
1. fe array type support and implementation of array function, support array syntax analysis and planning
2. Support import array type data through insert into
3. Support select array type data
4. Only the array type is supported on the value lie of the duplicate table

this pr merge some code from #4655 #4650 #4644 #4643 #4623 #2979
2021-07-13 14:02:39 +08:00
290a844e04 [optimize] Optimize bloomfilter performance (#6180)
refactor runtime filter bloomfilter and eliminate some virtual function calls which obtained a performance improvement of about 5%
import block bloom filter, for avx version obtained 40% performance improvement
before: bloomfilter size:default, about 2000W item cost about 1s400ms
after: bloomfilter size:524288, about 2000W item cost about 400ms
2021-07-10 10:12:12 +08:00
c929a8935a [Feature][Function] support bit_length function (#6140)
support bit_length function like mysql
2021-07-08 09:40:30 +08:00
739c0268ff [refactor] Remove decimal v1 related code from code base (#6079)
remove ALL DECIMAL V1 type code , this is a part of #6073
2021-07-07 10:26:32 +08:00
149def9e42 [Feature] Support RuntimeFilter in Doris (BE Implement) (#6077)
1. support in/bloomfilter/minmax
2. support broadcast/shuffle/bucket shuffle/colocate join
3. opt memory use and cpu cache miss while build runtime filter
4. opt memory use in left semi join (works well on tpcds-95)
2021-07-04 20:59:05 +08:00
bde60280b8 [Optimize] use string_view instead of std::string in string function (#6010) 2021-06-16 09:40:13 +08:00
d33a6d1b98 [Function] Support date function: yearweek(), week(), makedate(). (#6000) 2021-06-10 17:38:25 +08:00
d790cc6a50 [BUG] Fixed the problem that substring function may access illegal address (#5952) 2021-06-03 18:38:10 +08:00
be733cfa9c [Metrics] Add some large memtrackers' metric (#5614)
MemTracker can provide memory consumption for us to find out which
module consume more memory, but it's just a current value, this patch
add metrics for some large memory consumers, then we can find out
which module consume more memory in timeline, it would be useful to
troubleshoot OOM problems and optimize configs.
2021-04-21 09:15:04 +08:00
40f53ac71f fix bitmap unit test failed (#5610) 2021-04-08 10:25:59 +08:00
1e8c4584ab [Function] Add BE udf bitmap_min (#2538) (#5581)
this function will return the min result of the input bitmap .
2021-04-08 09:11:32 +08:00