Commit Graph

41 Commits

Author SHA1 Message Date
94eedd8ea4 [Enhancement](function)make SUBSTRING_INDEX function DEPEND_ON_ARGUMENT (#30392) 2024-02-02 13:31:47 +08:00
ca5a314765 [fix](function) make STRLEFT and STRRIGHT and SUBSTR function DEPEND_ON_ARGUMENT (#28352)
make STRLEFT and STRRIGHT function DEPEND_ON_ARGUMENT
2024-01-25 13:23:59 +08:00
Pxl
02a27a587a remove some unused member function of IFunctionBase (#30260) 2024-01-24 09:59:45 +08:00
e417128fb9 [bug](bitmap) should return error status when execute failed (#29841) 2024-01-16 18:30:23 +08:00
d75300f166 [fix](hash join) fix stack overflow caused by evaluate case expr on huge build block (#28851) 2023-12-22 15:45:12 +08:00
f9df3bae61 [Enhancement](functions) change some nullable mode and clear some smooth upgrade (#25334) 2023-10-16 19:50:17 +08:00
642e5cdb69 [Fix](Status) Make Status [[nodiscard]] and handle returned Status correctly (#23395) 2023-09-29 22:38:52 +08:00
5d138b6928 [remove](function) make execute_impl const and remove running_difference function (#24935) 2023-09-27 18:17:28 +08:00
81e65f4a12 [feature](function) Support SHA family functions (#24342) 2023-09-20 17:21:45 +08:00
181dad4181 [fix](executor) make elt / repeat smooth upgrade. (#21493)
BE : 2.0,FE : 1.2

before

mysql [(none)]>select elt(1, 'aaa', 'bbb');
ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]Function elt get failed, expr is VectorizedFnCall[elt](arguments=,return=String) and return type is String.

mysql [test]> INSERT INTO tbb VALUES (1, repeat("test1111", 8192)), (2, repeat("test1111", 131072));
mysql [test]>select k1, md5(v1), length(v1) from tbb;
+------+----------------------------------+--------------+
| k1   | md5(`v1`)                        | length(`v1`) |
+------+----------------------------------+--------------+
| 1    | d41d8cd98f00b204e9800998ecf8427e |            0 |
| 2    | d41d8cd98f00b204e9800998ecf8427e |            0 |
+------+----------------------------------+--------------+

now

mysql [test]>select elt(1, 'aaa', 'bbb');
+----------------------+
| elt(1, 'aaa', 'bbb') |
+----------------------+
| aaa                  |
+----------------------+

mysql [test]>select k1, md5(v1), length(v1) from tbb;
+------+----------------------------------+--------------+
| k1   | md5(`v1`)                        | length(`v1`) |
+------+----------------------------------+--------------+
| 1    | 1f44fb91f47cab16f711973af06294a0 |        65536 |
| 2    | 3c514d3b89e26e2f983b7bd4cbb82055 |      1048576 |
+------+----------------------------------+--------------+
2023-07-06 19:15:06 +08:00
d03bb4ba7b [Optimize](function) Optimize locate function by compare across strings (#20290)
Optimize the locate function by comparing across strings; about a 90% speedup when tested with sum().
2023-06-05 12:43:14 +08:00
Pxl
d64be9565d [Bug](function) fix function in get wrong result when input const column (#19791)
Fix the `in` function returning a wrong result when the input is a const column.
2023-05-22 10:58:29 +08:00
6626f26506 [optimize](string) optimize char_length function by SIMD (#18925)
Optimize the char_length function with SIMD:
(1) optimize the utf8_len computation
(2) 840% performance improvement
2023-04-28 17:22:35 +08:00
b75f4c97f3 [function](string) support char function (#18878)
* [function](string) support char function

* fix
2023-04-22 08:36:48 +08:00
ab9500bfa6 [optimize](string) optimize instr and locate function for constant arguments (#18692)
Optimize the instr and locate functions for constant arguments.

    With constant arguments, instr and locate show a 58%~200% performance improvement.
    Refactor locate(substr, str, pos) to use standardized argument processing.
2023-04-20 10:40:19 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
18898db09d [feature](function) Add new parameters to 'trim'. (#18580) 2023-04-18 14:13:30 +08:00
43392918cd [Optimization](functions)Optimize function call for const columns. (#18310) 2023-04-12 11:11:01 +08:00
fb50626075 [optimize](string) optimize concat function by SIMD memcpy (#18458)
Optimize the concat function (29% faster) by using memcpy_small_allow_read_write_overflow15.
String functions optimized: concat, convert_to, mask, initcap, lower, upper.
2023-04-08 17:05:34 +08:00
4692d6764c [refactor](remove string val) remove string val structure, it is same with string ref (#17461)
remove stringval, decimalv2val, bigintval
2023-03-08 10:42:20 +08:00
199d7d3be8 [Refactor]Merged string_value into string_ref (#15925) 2023-01-22 16:39:23 +08:00
21c2e485ae [improvment](function) add new function substring_index (#15024) 2022-12-15 09:54:34 +08:00
38570312dd [feature](split_by_string)support split by string function (#13741) 2022-12-12 15:22:30 +08:00
7a08a799e9 [Vectorized](function) support order by convert_to function (#14555) 2022-11-29 15:22:27 +08:00
ee934483eb [Enhancement](function) optimize the upper and lower functions using the simd instruction. (#13326)
Optimize the `upper` and `lower` functions using SIMD instructions.
2022-11-03 15:12:25 +08:00
5805011629 [Feature](string-function) Add function mask/mask_first_n/mask_last_n (#13694)
Implementation of mask function from hive.
2022-10-28 10:43:56 +08:00
43c6428aea [Function](string) support sub_replace function (#13736)
* [Function](string) support sub_replace function

* remove conf
2022-10-28 08:40:08 +08:00
2b328eafbb [function](string_function) add new string function 'extract_url_parameter' (#13323) 2022-10-20 11:11:43 +08:00
8a068c8c92 [function](string_function) add new string function 'not_null_or_empty' (#13418) 2022-10-19 11:10:37 +08:00
de4315c1c5 [feature](function) support initcap string function (#13193)
support `initcap` string function
2022-10-13 21:31:44 +08:00
f1539761e8 [Bugfix](string_functions) rearrange code to avoid global buffer overflow in FindInSetOp::execute (#12677) 2022-09-21 09:19:38 +08:00
Pxl
0ead048b93 [Enhancement](column) remove ColumnString terminating zero and add a data_version for pblock (#12456)
1. remove ColumnString terminating zero
2. add a data_version for pblock
3. change EncryptionMode to enum class
2022-09-14 21:25:22 +08:00
09b45f2b71 [Function](ELT)Add elt function (#12321) 2022-09-07 15:21:08 +08:00
df47b6941d [feature-wip](array-type) support the array type in reverse function (#11213)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-08-09 20:49:09 +08:00
e0ef9b8f6c [refactor](vectorized) to_bitmap(-1) return NULL instead of return parse failed error_message (#8373) 2022-03-11 17:21:47 +08:00
Pxl
cd8694e532 [feature][vectorized] support replace() (#8384) 2022-03-08 18:57:12 +08:00
454b45bea3 [feature](vectorize)(function) support regexp&&sm4&&aes functions (#8307) 2022-03-08 13:14:02 +08:00
2b9b0fc1ec [Fix] Function percentile input null return null (#8238) 2022-03-01 14:42:48 +08:00
Pxl
b26e7e3c28 [feature](function)(vec) support locate function (#7988)
* support function locate in vectorized engine

* add ut and fix some bug
2022-02-12 16:00:37 +08:00
5029ef46c9 [fix] fix ltrim result may incorrect in some case (#7963)
Fix ltrim returning an incorrect result in some cases.
According to https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
Built-in Function: int __builtin_clz / __builtin_ctz (unsigned int x)
If x is 0, the result is undefined.
So we handle the case of 0 separately.

These functions return different results under gcc and clang when x is 0.
2022-02-09 13:06:37 +08:00
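
The zero-input caveat behind 5029ef46c9 can be sketched as follows. This is a minimal illustration, not the Doris code: `safe_ctz32`, `safe_clz32`, and `first_non_space` are hypothetical names, and the scalar mask over at most 32 bytes stands in for whatever SIMD comparison the real ltrim uses. It assumes GCC or Clang, since it calls `__builtin_ctz` and `__builtin_clz`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// __builtin_ctz(0) and __builtin_clz(0) are undefined, so handle 0 explicitly
// and return the bit width instead.
inline int safe_ctz32(std::uint32_t x) {
    return x == 0 ? 32 : __builtin_ctz(x);
}

inline int safe_clz32(std::uint32_t x) {
    return x == 0 ? 32 : __builtin_clz(x);
}

// Index of the first non-space byte in the first n (<= 32) bytes of s,
// or n when every byte is a space (the all-zero mask case).
inline std::size_t first_non_space(const char* s, std::size_t n) {
    assert(n <= 32);
    std::uint32_t mask = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (s[i] != ' ') {
            mask |= (std::uint32_t{1} << i);
        }
    }
    const std::size_t pos = static_cast<std::size_t>(safe_ctz32(mask));
    return pos < n ? pos : n;
}
```

An all-space input produces a zero mask, which is exactly the case where calling the builtin directly would be undefined.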
e1d7233e9c [feature](vectorization) Support Vectorized Exec Engine In Doris (#7785)
# Proposed changes

Issue Number: close #6238

    Co-authored-by: HappenLee <happenlee@hotmail.com>
    Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com>
    Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com>
    Co-authored-by: wangbo <506340561@qq.com>
    Co-authored-by: emmymiao87 <522274284@qq.com>
    Co-authored-by: Pxl <952130278@qq.com>
    Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>
    Co-authored-by: thinker <zchw100@qq.com>
    Co-authored-by: Zeno Yang <1521564989@qq.com>
    Co-authored-by: Wang Shuo <wangshuo128@gmail.com>
    Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
    Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
    Co-authored-by: xinghuayu007 <1450306854@qq.com>
    Co-authored-by: weizuo93 <weizuo@apache.org>
    Co-authored-by: yiguolei <guoleiyi@tencent.com>
    Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com>
    Co-authored-by: awakeljw <993007281@qq.com>
    Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com>
    Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com>


## Problem Summary:

### 1. Some code from clickhouse

**ClickHouse is an excellent implementation of a vectorized execution engine for databases,
so we have referenced and learned a lot from it in terms of
data structures and function implementations.
This work is based on ClickHouse v19.16.2.2, and we would like to thank the ClickHouse community and developers.**

The following comment has been added to code taken from ClickHouse, e.g.:
// This file is copied from
// https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h
// and modified by Doris

### 2. Supported exec nodes and queries:
* vaggregation_node
* vanalytic_eval_node
* vassert_num_rows_node
* vblocking_join_node
* vcross_join_node
* vempty_set_node
* ves_http_scan_node
* vexcept_node
* vexchange_node
* vintersect_node
* vmysql_scan_node
* vodbc_scan_node
* volap_scan_node
* vrepeat_node
* vschema_scan_node
* vselect_node
* vset_operation_node
* vsort_node
* vunion_node
* vhash_join_node

The vectorized exec engine can run the SSB and TPC-H standard query test sets and 70% of the TPC-DS standard query test set.

### 3. Data Model

The vectorized exec engine supports **Dup/Agg/Unq** tables and a vectorized block reader.
Segment vectorization is a work in progress.

### 4. How to use

1. Set the session variable `set enable_vectorized_engine = true;` (required)
2. Set the session variable `set batch_size = 4096;` (recommended)

### 5. Some differences from the original exec engine

https://github.com/doris-vectorized/doris-vectorized/issues/294

## Checklist(Required)

1. Does it affect the original behavior: (No)
2. Has unit tests been added: (Yes)
3. Has document been added or modified: (No)
4. Does it need to update dependencies: (No)
5. Are there any changes that cannot be rolled back: (Yes)
2022-01-18 10:07:15 +08:00