bug: some Chinese words are not sorted by pinyin when converted to GBK encoding
CREATE TABLE `test_convert` (
`a` varchar(100) NULL
) ENGINE=OLAP
DUPLICATE KEY(`a`)
DISTRIBUTED BY HASH(`a`) BUCKETS 3
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
insert into test_convert values("b"), ("a"), ("c"), ("睿"), ("多"), ("丝");
Query OK, 6 rows affected (0.03 sec)
{'label':'insert_ca73a6acc2194d5b_888218a3949355a6', 'status':'VISIBLE', 'txnId':'18068'}
mysql [test]>select * from test_convert;
+------+
| a |
+------+
| a |
| c |
| 丝 |
| b |
| 多 |
| 睿 |
+------+
6 rows in set (0.01 sec)
mysql [test]>select * from test_convert order by convert(a using gbk);
+------+
| a |
+------+
| a |
| b |
| c |
| 多 |
| 丝 |
| 睿 |
+------+
6 rows in set (0.01 sec)
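For context, GB2312/GBK encodes the most common (level-1) Hanzi in pinyin order, which is why ordering by the GBK-converted bytes is expected to approximate pinyin order. A quick way to inspect the GBK byte sequences of the values above outside Doris, assuming `iconv` and `xxd` are available, is sketched below:
```shell
# Print each character together with its GBK byte sequence, then sort the
# lines by those bytes to see the raw byte order the query above relies on.
for ch in 丝 多 睿; do
    printf '%s ' "$ch"
    printf '%s' "$ch" | iconv -f UTF-8 -t GBK | xxd -p
done | sort -k2
```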
remove json functions code
remove string functions code
remove math functions code
move MatchPredicate to olap since it is only used in storage predicate processing
remove some code in Tuple; the Tuple structure should be removed entirely in the future.
remove a large amount of unused code in the collection value structure
Overwrite the PATH environment variable so that the binutils from Homebrew are not used when building the third-party libraries; otherwise compilation fails with errors such as:
Error: building for macOS-x86_64 but attempting to link with file built for unknown-unsupported file format
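A minimal sketch of the idea (the exact directories are assumptions, not taken from the build scripts): put the system toolchain ahead of Homebrew so that Apple's linker and archiver are picked up instead of the Homebrew binutils.
```shell
# Hypothetical PATH override: system toolchain first, Homebrew paths last.
export PATH="/usr/bin:/bin:/usr/sbin:/sbin:$PATH"
```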
# Proposed changes
This PR fixes many issues encountered when building from source on macOS with the Apple M1 chip.
## ATTENTION
Supporting macOS with the Apple M1 chip is a large effort, and there are still many unresolved runtime issues:
1. Some errors with memory tracker occur when BE (RELEASE) starts.
2. Some UT cases fail.
...
Temporarily, the following changes are made on macOS to start BE successfully.
1. Disable memory tracker.
2. Use tcmalloc instead of jemalloc.
This PR kicks off that work. Anyone interested is welcome to continue fixing these runtime issues.
## Use case
```shell
./build.sh -j 8 --be --clean    # build BE from scratch with 8 parallel jobs
cd output/be/bin
ulimit -n 60000                 # raise the open-file limit before starting BE
./start_be.sh --daemon         # start BE as a background daemon
```
## Something else
It takes around _**10+**_ minutes to build BE (with prebuilt third-party libraries) on macOS with the M1 chip. The development experience on macOS will improve greatly once the adaptation work is finished.
In older Doris versions string offsets are 32-bit, which is not enough for the Array type.
If we simply changed string offsets from 32-bit to 64-bit, rolling upgrades (upgrading BE nodes one by one) would break, because strings with 32-bit offsets and strings with 64-bit offsets would coexist in the cluster at the same time.
As a result, we separate the code for Array offsets from the string offsets code.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
Currently the `Array<T>` column consists of an `offsets` column and a `data` column, and the `offsets` column is typed as UInt32.
If we call array_union to merge arrays repeatedly, the accumulated array size may overflow a 32-bit offset.
So we need to extend the offsets type to 64-bit before the `Array` data type is released.
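A quick arithmetic illustration of the overflow risk (the element count below is made up for illustration): an unsigned 32-bit offset cannot represent positions beyond 2^32 - 1, so any larger accumulated size silently wraps around.
```shell
# Largest position an unsigned 32-bit offset can represent.
printf 'max 32-bit offset: %u\n' $(( (1 << 32) - 1 ))            # 4294967295
# A hypothetical total of 5 billion elements after repeated array_union calls
# wraps when truncated to 32 bits, corrupting the row boundaries.
printf 'wrapped offset:    %u\n' $(( 5000000000 & 0xFFFFFFFF ))  # 705032704
```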
# Proposed changes
Issue Number: close #6238
Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com>
Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com>
Co-authored-by: wangbo <506340561@qq.com>
Co-authored-by: emmymiao87 <522274284@qq.com>
Co-authored-by: Pxl <952130278@qq.com>
Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>
Co-authored-by: thinker <zchw100@qq.com>
Co-authored-by: Zeno Yang <1521564989@qq.com>
Co-authored-by: Wang Shuo <wangshuo128@gmail.com>
Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
Co-authored-by: xinghuayu007 <1450306854@qq.com>
Co-authored-by: weizuo93 <weizuo@apache.org>
Co-authored-by: yiguolei <guoleiyi@tencent.com>
Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com>
Co-authored-by: awakeljw <993007281@qq.com>
Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com>
Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com>
## Problem Summary:
### 1. Some code from ClickHouse
**ClickHouse is an excellent vectorized execution engine, and we have referenced and learned a lot from its implementation of data structures and functions.
Our work is based on ClickHouse v19.16.2.2, and we would like to thank the ClickHouse community and developers.**
The following comment has been added to the code taken from ClickHouse, e.g.:
// This file is copied from
// https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h
// and modified by Doris
### 2. Supported exec nodes and queries:
* vaggregation_node
* vanalytic_eval_node
* vassert_num_rows_node
* vblocking_join_node
* vcross_join_node
* vempty_set_node
* ves_http_scan_node
* vexcept_node
* vexchange_node
* vintersect_node
* vmysql_scan_node
* vodbc_scan_node
* volap_scan_node
* vrepeat_node
* vschema_scan_node
* vselect_node
* vset_operation_node
* vsort_node
* vunion_node
* vhash_join_node
You can run the SSB and TPC-H benchmarks, as well as about 70% of the TPC-DS standard query set, on the vectorized exec engine.
### 3. Data Model
The vectorized exec engine supports **Duplicate/Aggregate/Unique** tables, and the block reader is vectorized.
Vectorized segment reading is a work in progress.
### 4. How to use
1. Set the session variable `set enable_vectorized_engine = true;` (required)
2. Set the session variable `set batch_size = 4096;` (recommended)
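For example, a minimal session could look like the sketch below; the host, port, and query are placeholders, and since these are session variables they must be set on the same connection that runs the query.
```shell
# Hypothetical session: connect to the FE over the MySQL protocol, enable the
# vectorized engine for this session, and run a query with it.
mysql -h 127.0.0.1 -P 9030 -u root -e "
    set enable_vectorized_engine = true;
    set batch_size = 4096;
    select count(*) from some_table;"
```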
### 5. Some differences from the original exec engine
https://github.com/doris-vectorized/doris-vectorized/issues/294
## Checklist (Required)
1. Does it affect the original behavior: (No)
2. Have unit tests been added: (Yes)
3. Has documentation been added or modified: (No)
4. Does it need to update dependencies: (No)
5. Are there any changes that cannot be rolled back: (Yes)