Commit Graph

55 Commits

Author SHA1 Message Date
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
0b074ade02 [fix](const column) fix coredump caused by const column for some functions (#18737) 2023-04-18 13:57:55 +08:00
43392918cd [Optimization](functions)Optimize function call for const columns. (#18310) 2023-04-12 11:11:01 +08:00
0c5e3df4a3 [optimize](string) optimize split_by_string and substring_index function (#18496)
Use SIMD stringsearcher and SIMD memcmp optimze split_by_string and substring_index function.

split_by_string function has 32%~540% up
substring_index function has 22%~46% up
Performance difference depends on the needle size and whether the needle is constant param. And the longer the needle, the more performance improvement
2023-04-11 15:49:03 +08:00
fb50626075 [optimize](string) optimize concat function by SIMD memcpy (#18458)
Optimize concat function 29% up by memcpy_small_allow_read_write_overflow15.
Optimize string functions list: concat, convert_to, mask, initcap, lower, upper.

concat function has 29% up:
2023-04-08 17:05:34 +08:00
f84481886b [feature](string_functions) The 'split_part' function supports non-constant parameters (#18029) 2023-03-25 12:03:11 +08:00
b95cd7eca2 [Refactor](function) Reconstruct default logic for const args. (#17830) 2023-03-17 11:13:13 +08:00
77ab2fac20 [refactor](functioncontext) remove function context impl class (#17715)
* [refactor](functioncontext) remove function context impl class


Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2023-03-14 11:21:45 +08:00
Pxl
16fc3a0e22 [Chore](compile) remove some unused static on inline function to reduce compile time (#17603)
remove some unused static on inline function to reduce compile time
2023-03-13 11:11:59 +08:00
4692d6764c [refactor](remove string val) remove string val structure, it is same with string ref (#17461)
remove stringval, decimalv2val, bigintval
2023-03-08 10:42:20 +08:00
17f4990bd3 [enhancement](functioncontext) function context should use shared ptr and simply function context (#17311)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-03-02 16:23:54 +08:00
3e40467ce6 [Bug](vec) Fix chinese pinyin order by (#17152)
bug: some chinese word not sort by pinyin in GBK coding

CREATE TABLE `test_convert` (
                 `a` varchar(100) NULL
             ) ENGINE=OLAP
               DUPLICATE KEY(`a`)
               DISTRIBUTED BY HASH(`a`) BUCKETS 3
               PROPERTIES (
               "replication_allocation" = "tag.location.default: 1"
               );
insert into test_convert values("b"), ("a"), ("c"), ("睿"), ("多"), ("丝");
Query OK, 6 rows affected (0.03 sec)
{'label':'insert_ca73a6acc2194d5b_888218a3949355a6', 'status':'VISIBLE', 'txnId':'18068'}
mysql [test]>select * from test_convert;
+------+
| a    |
+------+
| a    |
| c    |
| 丝   |
| b    |
| 多   |
| 睿   |
+------+
6 rows in set (0.01 sec)
mysql [test]>select * from test_convert order by convert(a using gbk);          
+------+
| a    |
+------+
| a    |
| b    |
| c    |
| 多   |
| 丝   |
| 睿   |
+------+
6 rows in set (0.01 sec)
2023-02-28 14:29:56 +08:00
883f575cfe [fix](string function) fix wrong usage of iconv_open (#17048)
* [fix](string function) fix wrong usage of iconv_open

Also add test case for function convert

* fix test case
2023-02-24 09:13:10 +08:00
e04c13b7a6 [enhancement](exception safe) make function state exception safe (#16771) 2023-02-20 23:01:45 +08:00
7d5a10e1af [bug](function) fix mask_first_n function can't handle const value (#16308) 2023-02-03 10:32:42 +08:00
95d7c2de26 [Refactor](function) Rewrite the function elt (#16287) 2023-02-01 11:17:06 +08:00
adb758dcac [refactor](remove non vec code) remove json functions string functions match functions and some code (#16141)
remove json functions code
remove string functions code
remove math functions code
move MatchPredicate to olap since it is only used in storage predicate process
remove some code in tuple, Tuple structure should be removed in the future.
remove many code in collection value structure, they are useless
2023-01-26 16:21:12 +08:00
199d7d3be8 [Refactor]Merged string_value into string_ref (#15925) 2023-01-22 16:39:23 +08:00
2c9c7c48ac [improvement](decimalv3) Java UDF and array type support DECIMALV3 (#15674) 2023-01-09 15:13:16 +08:00
Pxl
6b3721af23 [Bug](function) fix core dump on reverse() when big string input
fix core dump on reverse() when big string input
2022-12-23 10:14:09 +08:00
77c15729d4 [fix](memory) Fix too many repeat cause OOM (#15217) 2022-12-22 17:16:18 +08:00
21c2e485ae [improvment](function) add new function substring_index (#15024) 2022-12-15 09:54:34 +08:00
b5c0d4870d [fix](nereids)fix bug of elt and sub_replace function (#14971) 2022-12-12 17:37:36 +08:00
38570312dd [feature](split_by_string)support split by string function (#13741) 2022-12-12 15:22:30 +08:00
33349c3419 [feature](function)Support negative index for function split_part (#13914) 2022-12-12 09:56:09 +08:00
7a08a799e9 [Vectorized](function) support order by convert_to function (#14555) 2022-11-29 15:22:27 +08:00
5d7b51dcc2 [BugFix](Concat) output of string concat function exceeds UINT makes crash (#13916) 2022-11-03 19:44:44 +08:00
5805011629 [Feature](string-function) Add function mask/mask_first_n/mask_last_n (#13694)
Implementation of mask function from hive.
2022-10-28 10:43:56 +08:00
43c6428aea [Function](string) support sub_replace function (#13736)
* [Function](string) support sub_replace function

* remove conf
2022-10-28 08:40:08 +08:00
ffcb2f8525 [opt](exec) Replace get_utf8_byte_length function by array (#13664) 2022-10-27 09:46:41 +08:00
e62d3dd8e5 [opt](function) refactor extract_url to use StringValue (#13508)
change extract_url use stringvalue to repalce std::string to speed up
2022-10-21 08:33:39 +08:00
d624ff0580 [chore](macOS) Avoid using binutils from Homebrew to build third parties (#13512)
Overwrite the environment variable PATH to avoid using binutils from Homebrew to build third parties which may cause compilation errors.

Error: building for macOS-x86_64 but attempting to link with file built for unknown-unsupported file format
2022-10-21 01:28:30 +08:00
2b328eafbb [function](string_function) add new string function 'extract_url_parameter' (#13323) 2022-10-20 11:11:43 +08:00
8a068c8c92 [function](string_function) add new string function 'not_null_or_empty' (#13418) 2022-10-19 11:10:37 +08:00
125def5102 [enhancement](macOS M1) Support building from source on macOS (M1) (#13195)
# Proposed changes

This PR fixed lots of issues when building from source on macOS with Apple M1 chip.

## ATTENTION

The job for supporting macOS with Apple M1 chip is too big and there are lots of unresolved issues during runtime:
1. Some errors with memory tracker occur when BE (RELEASE) starts.
2. Some UT cases fail.
...

Temporarily, the following changes are made on macOS to start BE successfully.
1. Disable memory tracker.
2. Use tcmalloc instead of jemalloc.

This PR kicks off the job. Guys who are interested in this job can continue to fix these runtime issues.

## Use case

```shell
./build.sh -j 8 --be --clean

cd output/be/bin
ulimit -n 60000
./start_be.sh --daemon
```

## Something else

It takes around _**10+**_ minutes to build BE (with prebuilt third-parties) on macOS with M1 chip. We will improve the  development experience on macOS greatly when we finish the adaptation job.
2022-10-18 13:10:13 +08:00
f0dbbe5b46 [Bug](funciton) fix repeat coredump when step is to long (#13408) 2022-10-18 09:55:06 +08:00
144486e220 [Opt](fun) simd the substring function and use stack buf to speed up (#13338) 2022-10-16 11:48:34 +08:00
5757bbc9f3 fix be oom when replace with an empty old str (#13220) 2022-10-10 15:58:12 +08:00
Pxl
ee3dd423b9 [Bug](function) core dump on substr #13007 2022-09-28 08:54:49 +08:00
dd6ed5a9a7 [fix](function)fix string split function buffer overflow (#12834) 2022-09-24 17:32:00 +08:00
Pxl
0ead048b93 [Enhancement](column) remove ColumnString terminating zero and add a data_version for pblock (#12456)
1. remove ColumnString terminating zero
    2. add a data_version for pblock
    3. change EncryptionMode to enum class
2022-09-14 21:25:22 +08:00
09b45f2b71 [Function](ELT)Add elt function (#12321) 2022-09-07 15:21:08 +08:00
cf5d194fe1 [enhancement](array-type) Split Array Offsets and String Offsets (#12341)
In old Doris version string offsets are 32bit, but it is not enough for Array type.
If we change string offsets from 32bit to 64bit, there will be problem if we upgrade BE one by one. Because at the same time 32bit Offsets and 64 bit Offsets String will exist at the same time.
As a result, we separate the Codes for Array Offsets.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-09-06 11:18:27 +08:00
df47b6941d [feature-wip](array-type) support the array type in reverse function (#11213)
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-08-09 20:49:09 +08:00
3e3b2d15d4 [bug]string pad functions should always be nullable (#11140)
* string pad functions should always be nullable
2022-07-26 10:20:11 +08:00
5793cb11d0 [feature-wip] (array-type) function concat_ws support array (#10749)
Issue #10052
function concat_ws support array
2022-07-17 17:50:39 +08:00
8012d63ea0 [fix] substr('', 1, 5) return empty string instead of null (#10622) 2022-07-06 22:51:02 +08:00
c9f86bc7e2 [refactor] Refactoring Status static methods to format message using fmt(#9533) 2022-07-02 18:58:23 +08:00
0e404edf54 [improvement] Change array offset type from UInt32 to UInt64 (#10070)
Now column `Array<T>` contains column `offsets` and `data`, and type of column `offsets` is UInt32 now.
If we call array_union to merge arrays repeatedly, the size of array may overflow.
So we need to extend it before `Array Data Type` release.
2022-06-19 10:24:08 +08:00
f377c26bf7 [refactor][be] Optimize headers (#9708) 2022-05-30 16:12:10 +08:00