Commit Graph

63 Commits

Author SHA1 Message Date
181dad4181 [fix](executor) make elt / repeat smooth upgrade. (#21493)
BE: 2.0, FE: 1.2

before

mysql [(none)]>select elt(1, 'aaa', 'bbb');
ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]Function elt get failed, expr is VectorizedFnCall[elt](arguments=,return=String) and return type is String.

mysql [test]> INSERT INTO tbb VALUES (1, repeat("test1111", 8192)), (2, repeat("test1111", 131072));
mysql [test]>select k1, md5(v1), length(v1) from tbb;
+------+----------------------------------+--------------+
| k1   | md5(`v1`)                        | length(`v1`) |
+------+----------------------------------+--------------+
| 1    | d41d8cd98f00b204e9800998ecf8427e |            0 |
| 2    | d41d8cd98f00b204e9800998ecf8427e |            0 |
+------+----------------------------------+--------------+

now

mysql [test]>select elt(1, 'aaa', 'bbb');
+----------------------+
| elt(1, 'aaa', 'bbb') |
+----------------------+
| aaa                  |
+----------------------+

mysql [test]>select k1, md5(v1), length(v1) from tbb;
+------+----------------------------------+--------------+
| k1   | md5(`v1`)                        | length(`v1`) |
+------+----------------------------------+--------------+
| 1    | 1f44fb91f47cab16f711973af06294a0 |        65536 |
| 2    | 3c514d3b89e26e2f983b7bd4cbb82055 |      1048576 |
+------+----------------------------------+--------------+
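
The figures in the two result sets can be cross-checked outside the database. The following is a minimal Python sketch, not part of the commit: it shows that the md5 value in the "before" output is the digest of an empty string (consistent with length 0), and that the "now" lengths equal the length of "test1111" times the repeat count. The helper name repeat_length is hypothetical.

```python
import hashlib

# md5 digest of an empty byte string: the value both rows show in the "before" output.
EMPTY_MD5 = hashlib.md5(b"").hexdigest()


def repeat_length(unit: str, times: int) -> int:
    """Byte length of repeat(unit, times), computed without building the string."""
    return len(unit.encode("utf-8")) * times


if __name__ == "__main__":
    print(EMPTY_MD5)
    print(repeat_length("test1111", 8192))
    print(repeat_length("test1111", 131072))
```

The md5 values in the "now" output are not recomputed here; only the lengths and the empty-string digest are checked.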
2023-07-06 19:15:06 +08:00
Pxl
ab7ac31d89 [Chore](case) fix failed on test_big_pad when enable pipeline engine #20644 2023-06-12 09:15:55 +08:00
Pxl
90d710e83d [Enchancement](function) optimize for padding function && add string length check on string op (#20363) 2023-06-02 21:24:41 +08:00
Pxl
d64be9565d [Bug](function) fix function in get wrong result when input const column (#19791)
Fix the "in" function returning a wrong result when its input is a const column.
2023-05-22 10:58:29 +08:00
56809230d1 [Improvement](string function) optimize substring and in string set (#19257)
* [Improvement](string function) optimize substring and in string set

* update
2023-05-17 14:09:52 +08:00
6626f26506 [optimize](string) optimize char_length function by SIMD (#18925)
Optimize the char_length function with SIMD:
(1) optimize the utf8_len computation
(2) 840% performance improvement
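
For readers unfamiliar with the computation being optimized, here is a scalar Python sketch of one common way to count UTF-8 characters: count every byte that is not a continuation byte (bit pattern 10xxxxxx). This is an illustration under that assumption, not the SIMD code from the commit; a SIMD version would apply the same per-byte test to many bytes at once.

```python
def char_length(data: bytes) -> int:
    """Count UTF-8 characters by skipping continuation bytes (10xxxxxx)."""
    return sum(1 for b in data if (b & 0xC0) != 0x80)


if __name__ == "__main__":
    sample = "a睿b".encode("utf-8")
    print(len(sample), char_length(sample))
```

The sketch assumes well-formed UTF-8 input; it does not validate the byte sequence.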
2023-04-28 17:22:35 +08:00
b75f4c97f3 [function](string) support char function (#18878)
* [function](string) support char function

* fix
2023-04-22 08:36:48 +08:00
ab9500bfa6 [optimize](string) optimize instr and locate function for constant arguments (#18692)
Optimize the instr and locate functions for constant arguments.

    With constant arguments, instr and locate show a 58%~200% performance improvement.
    Refactor locate(substr, str, pos) to use the standardized argument processing.
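
As a reminder of what these functions compute, here is a small Python sketch of MySQL-style locate and instr semantics (1-based positions, 0 when not found). It assumes character-based positions and is not taken from the commit; the handling of pos values below 1 is an assumption.

```python
def locate(substr: str, s: str, pos: int = 1) -> int:
    """1-based position of substr in s, searching from pos; 0 if not found."""
    if pos < 1:
        return 0  # assumption: a non-positive start position finds nothing
    return s.find(substr, pos - 1) + 1  # find() returns -1 on a miss, giving 0


def instr(s: str, substr: str) -> int:
    """Same search as locate, with the argument order swapped."""
    return locate(substr, s)


if __name__ == "__main__":
    print(locate("bar", "foobarbar"))
    print(locate("bar", "foobarbar", 5))
    print(instr("foobarbar", "xyz"))
```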
2023-04-20 10:40:19 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
0b074ade02 [fix](const column) fix coredump caused by const column for some functions (#18737) 2023-04-18 13:57:55 +08:00
43392918cd [Optimization](functions)Optimize function call for const columns. (#18310) 2023-04-12 11:11:01 +08:00
0c5e3df4a3 [optimize](string) optimize split_by_string and substring_index function (#18496)
Use the SIMD string searcher and SIMD memcmp to optimize the split_by_string and substring_index functions.

The split_by_string function has a 32%~540% performance improvement.
The substring_index function has a 22%~46% performance improvement.
The gain depends on the needle size and on whether the needle is a constant parameter: the longer the needle, the larger the improvement.
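
To make the behavior being optimized concrete, here is a Python sketch of MySQL-style substring_index semantics: a positive count keeps everything to the left of the count-th delimiter, and a negative count keeps everything to the right of the count-th delimiter from the end. This is an illustration only; returning an empty string for a zero count or an empty delimiter is an assumption, and the code is unrelated to the SIMD implementation in the commit.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Return the part of s before (count > 0) or after (count < 0) the count-th delim."""
    if count == 0 or delim == "":
        return ""  # assumption: nothing to return for these inputs
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


if __name__ == "__main__":
    print(substring_index("www.apache.org", ".", 2))
    print(substring_index("www.apache.org", ".", -1))
```

When the count exceeds the number of delimiters, the slices cover the whole list and the input string is returned unchanged.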
2023-04-11 15:49:03 +08:00
fb50626075 [optimize](string) optimize concat function by SIMD memcpy (#18458)
Optimize the concat function (29% performance improvement) with memcpy_small_allow_read_write_overflow15.
String functions optimized: concat, convert_to, mask, initcap, lower, upper.
2023-04-08 17:05:34 +08:00
f84481886b [feature](string_functions) The 'split_part' function supports non-constant parameters (#18029) 2023-03-25 12:03:11 +08:00
b95cd7eca2 [Refactor](function) Reconstruct default logic for const args. (#17830) 2023-03-17 11:13:13 +08:00
77ab2fac20 [refactor](functioncontext) remove function context impl class (#17715)
* [refactor](functioncontext) remove function context impl class


Co-authored-by: yiguolei <yiguolei@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2023-03-14 11:21:45 +08:00
Pxl
16fc3a0e22 [Chore](compile) remove some unused static on inline function to reduce compile time (#17603)
remove some unused static on inline function to reduce compile time
2023-03-13 11:11:59 +08:00
4692d6764c [refactor](remove string val) remove string val structure, it is same with string ref (#17461)
remove stringval, decimalv2val, bigintval
2023-03-08 10:42:20 +08:00
17f4990bd3 [enhancement](functioncontext) function context should use shared ptr and simply function context (#17311)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-02 16:23:54 +08:00
3e40467ce6 [Bug](vec) Fix chinese pinyin order by (#17152)
Bug: some Chinese characters are not sorted by pinyin under GBK encoding.

CREATE TABLE `test_convert` (
    `a` varchar(100) NULL
) ENGINE=OLAP
DUPLICATE KEY(`a`)
DISTRIBUTED BY HASH(`a`) BUCKETS 3
PROPERTIES (
    "replication_allocation" = "tag.location.default: 1"
);
insert into test_convert values("b"), ("a"), ("c"), ("睿"), ("多"), ("丝");
Query OK, 6 rows affected (0.03 sec)
{'label':'insert_ca73a6acc2194d5b_888218a3949355a6', 'status':'VISIBLE', 'txnId':'18068'}
mysql [test]>select * from test_convert;
+------+
| a    |
+------+
| a    |
| c    |
| 丝   |
| b    |
| 多   |
| 睿   |
+------+
6 rows in set (0.01 sec)
mysql [test]>select * from test_convert order by convert(a using gbk);          
+------+
| a    |
+------+
| a    |
| b    |
| c    |
| 多   |
| 丝   |
| 睿   |
+------+
6 rows in set (0.01 sec)
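
The ordering in the second query can be reproduced outside the database. This Python sketch, which is not part of the commit, sorts the same six values by their GBK-encoded bytes and, for comparison, by their UTF-8 bytes. It assumes that convert(a using gbk) compares the GBK byte sequences, and it relies on Python's built-in "gbk" codec.

```python
words = ["b", "a", "c", "睿", "多", "丝"]


def gbk_key(s: str) -> bytes:
    """Sort key: the GBK-encoded bytes of the string."""
    return s.encode("gbk")


# Byte order under UTF-8 follows Unicode code points, not pinyin.
by_utf8 = sorted(words, key=lambda s: s.encode("utf-8"))
# Byte order under GBK matches the order shown by the second query.
by_gbk = sorted(words, key=gbk_key)

if __name__ == "__main__":
    print(by_utf8)
    print(by_gbk)
```

Sorting on GBK bytes gives pinyin order only for characters in the pinyin-ordered part of the character set, so it is an approximation rather than a full collation.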
2023-02-28 14:29:56 +08:00
883f575cfe [fix](string function) fix wrong usage of iconv_open (#17048)
* [fix](string function) fix wrong usage of iconv_open

Also add test case for function convert

* fix test case
2023-02-24 09:13:10 +08:00
e04c13b7a6 [enhancement](exception safe) make function state exception safe (#16771) 2023-02-20 23:01:45 +08:00
7d5a10e1af [bug](function) fix mask_first_n function can't handle const value (#16308) 2023-02-03 10:32:42 +08:00
95d7c2de26 [Refactor](function) Rewrite the function elt (#16287) 2023-02-01 11:17:06 +08:00
adb758dcac [refactor](remove non vec code) remove json functions string functions match functions and some code (#16141)
remove json functions code
remove string functions code
remove math functions code
move MatchPredicate to olap, since it is only used in storage predicate processing
remove some code in tuple; the Tuple structure should be removed in the future
remove a lot of unused code in the collection value structure
2023-01-26 16:21:12 +08:00
199d7d3be8 [Refactor]Merged string_value into string_ref (#15925) 2023-01-22 16:39:23 +08:00
2c9c7c48ac [improvement](decimalv3) Java UDF and array type support DECIMALV3 (#15674) 2023-01-09 15:13:16 +08:00
Pxl
6b3721af23 [Bug](function) fix core dump on reverse() when big string input
fix core dump on reverse() when big string input
2022-12-23 10:14:09 +08:00
77c15729d4 [fix](memory) Fix too many repeat cause OOM (#15217) 2022-12-22 17:16:18 +08:00
21c2e485ae [improvment](function) add new function substring_index (#15024) 2022-12-15 09:54:34 +08:00
b5c0d4870d [fix](nereids)fix bug of elt and sub_replace function (#14971) 2022-12-12 17:37:36 +08:00
38570312dd [feature](split_by_string)support split by string function (#13741) 2022-12-12 15:22:30 +08:00
33349c3419 [feature](function)Support negative index for function split_part (#13914) 2022-12-12 09:56:09 +08:00
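
A short Python sketch of what a negative index could mean for split_part: positive indexes count parts from the left starting at 1, negative indexes count from the right starting at -1. This is an illustration, not the code from the commit; returning None (standing in for SQL NULL) for index 0, an empty delimiter, or an out-of-range index is an assumption.

```python
from typing import Optional


def split_part(s: str, delim: str, index: int) -> Optional[str]:
    """Return the index-th part of s split by delim; negative indexes count from the end."""
    if index == 0 or delim == "":
        return None  # assumption: treated as NULL
    parts = s.split(delim)
    if abs(index) > len(parts):
        return None  # assumption: out-of-range index gives NULL
    return parts[index - 1] if index > 0 else parts[index]


if __name__ == "__main__":
    print(split_part("a,b,c", ",", 1))
    print(split_part("a,b,c", ",", -1))
```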
7a08a799e9 [Vectorized](function) support order by convert_to function (#14555) 2022-11-29 15:22:27 +08:00
5d7b51dcc2 [BugFix](Concat) output of string concat function exceeds UINT makes crash (#13916) 2022-11-03 19:44:44 +08:00
5805011629 [Feature](string-function) Add function mask/mask_first_n/mask_last_n (#13694)
Implementation of mask function from hive.
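
For reference, a Python sketch of Hive-style masking with its default replacement characters: uppercase letters become X, lowercase letters become x, digits become n, and other characters are left alone. The sketch handles ASCII only and takes n as a required argument, both of which are simplifying assumptions; it is not the implementation from the commit.

```python
def mask(s: str, upper: str = "X", lower: str = "x", digit: str = "n") -> str:
    """Replace ASCII upper/lower/digit characters; leave everything else unchanged."""
    out = []
    for ch in s:
        if "A" <= ch <= "Z":
            out.append(upper)
        elif "a" <= ch <= "z":
            out.append(lower)
        elif "0" <= ch <= "9":
            out.append(digit)
        else:
            out.append(ch)
    return "".join(out)


def mask_first_n(s: str, n: int) -> str:
    """Mask only the first n characters."""
    if n <= 0:
        return s
    return mask(s[:n]) + s[n:]


def mask_last_n(s: str, n: int) -> str:
    """Mask only the last n characters."""
    if n <= 0:
        return s
    return s[:-n] + mask(s[-n:])


if __name__ == "__main__":
    print(mask("abc123XYZ-"))
    print(mask_first_n("abc123", 2))
    print(mask_last_n("abc123", 2))
```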
2022-10-28 10:43:56 +08:00
43c6428aea [Function](string) support sub_replace function (#13736)
* [Function](string) support sub_replace function

* remove conf
2022-10-28 08:40:08 +08:00
ffcb2f8525 [opt](exec) Replace get_utf8_byte_length function by array (#13664) 2022-10-27 09:46:41 +08:00
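
The idea named in this commit title, replacing a function with an array lookup, can be sketched in Python as follows: precompute a 256-entry table that maps each possible leading byte to the byte length of its UTF-8 character, then index the table instead of branching. The names are hypothetical, and mapping continuation or invalid bytes to length 1 is an assumption made for the sketch.

```python
def _lead_byte_length(b: int) -> int:
    """Byte length of the UTF-8 character that starts with byte b."""
    if b < 0x80:
        return 1
    if b >> 5 == 0b110:
        return 2
    if b >> 4 == 0b1110:
        return 3
    if b >> 3 == 0b11110:
        return 4
    return 1  # assumption: continuation/invalid bytes advance by one byte


# Built once; afterwards each length is a single array lookup.
UTF8_BYTE_LENGTH = [_lead_byte_length(b) for b in range(256)]


def utf8_char_lengths(data: bytes) -> list:
    """Byte length of each character in data, using the lookup table."""
    lengths = []
    i = 0
    while i < len(data):
        n = UTF8_BYTE_LENGTH[data[i]]
        lengths.append(n)
        i += n
    return lengths


if __name__ == "__main__":
    print(utf8_char_lengths("a睿".encode("utf-8")))
```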
e62d3dd8e5 [opt](function) refactor extract_url to use StringValue (#13508)
Change extract_url to use StringValue in place of std::string for better speed.
2022-10-21 08:33:39 +08:00
d624ff0580 [chore](macOS) Avoid using binutils from Homebrew to build third parties (#13512)
Overwrite the environment variable PATH to avoid using binutils from Homebrew to build third parties which may cause compilation errors.

Error: building for macOS-x86_64 but attempting to link with file built for unknown-unsupported file format
2022-10-21 01:28:30 +08:00
2b328eafbb [function](string_function) add new string function 'extract_url_parameter' (#13323) 2022-10-20 11:11:43 +08:00
8a068c8c92 [function](string_function) add new string function 'not_null_or_empty' (#13418) 2022-10-19 11:10:37 +08:00
125def5102 [enhancement](macOS M1) Support building from source on macOS (M1) (#13195)
# Proposed changes

This PR fixed lots of issues when building from source on macOS with Apple M1 chip.

## ATTENTION

The job for supporting macOS with Apple M1 chip is too big and there are lots of unresolved issues during runtime:
1. Some errors with memory tracker occur when BE (RELEASE) starts.
2. Some UT cases fail.
...

Temporarily, the following changes are made on macOS to start BE successfully.
1. Disable memory tracker.
2. Use tcmalloc instead of jemalloc.

This PR kicks off the job. Anyone interested is welcome to continue fixing these runtime issues.

## Use case

```shell
./build.sh -j 8 --be --clean

cd output/be/bin
ulimit -n 60000
./start_be.sh --daemon
```

## Something else

It takes around _**10+**_ minutes to build BE (with prebuilt third-parties) on macOS with M1 chip. We will greatly improve the development experience on macOS once the adaptation job is finished.
2022-10-18 13:10:13 +08:00
f0dbbe5b46 [Bug](funciton) fix repeat coredump when step is to long (#13408) 2022-10-18 09:55:06 +08:00
144486e220 [Opt](fun) simd the substring function and use stack buf to speed up (#13338) 2022-10-16 11:48:34 +08:00
5757bbc9f3 fix be oom when replace with an empty old str (#13220) 2022-10-10 15:58:12 +08:00
Pxl
ee3dd423b9 [Bug](function) core dump on substr #13007 2022-09-28 08:54:49 +08:00
dd6ed5a9a7 [fix](function)fix string split function buffer overflow (#12834) 2022-09-24 17:32:00 +08:00
Pxl
0ead048b93 [Enhancement](column) remove ColumnString terminating zero and add a data_version for pblock (#12456)
1. remove ColumnString terminating zero
2. add a data_version for pblock
3. change EncryptionMode to enum class
2022-09-14 21:25:22 +08:00
09b45f2b71 [Function](ELT)Add elt function (#12321) 2022-09-07 15:21:08 +08:00