doris

Author	SHA1	Message	Date
Jerry Hu	9d1c702b3a	[improvement](function) do not use hyperscan for non-const patterns in like function (#23495 )	2023-08-25 20:40:23 +08:00
Jerry Hu	49f8f20fb1	[fix](regex) String with Chinese characters matching failed (#20493 )	2023-06-07 07:27:47 +08:00
ZhangYu0123	78c37b5244	[Optimize](Function) Add fast path for col like '%%' or col like '%' or regexp '\\.' (#20143 ) (1) like: about 34% speedup on a count() test; supports col like '%%', col like '%', col not like '%%', col not like '%'. (2) regexp: about 37% speedup on a count() test; supports col regexp '\\.', col not regexp '\\.'. Q1: select count() from hits where url like '%'; Q2: select count() from hits where url regexp '\\.*';	2023-06-02 16:26:56 +08:00
Kang	88ca4f3e6b	[feature](like) make like regexp used as a sql function (#19755 )	2023-05-18 10:03:12 +08:00
Pxl	ec517a53a8	[Chore](build) upgrade clang-format version to 16 && move thrift to fe-common (#19155 ) upgrade clang-format version to 16 move thrift to fe-common fix core dump on pipeline engine when operator canceled and not prepared	2023-04-28 14:14:51 +08:00
Adonis Ling	e412dd12e8	[chore](build) Use include-what-you-use to optimize includes (PART II) (#18761 ) Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.	2023-04-19 23:11:48 +08:00
Kang	3d28de6e54	[Enhancement](like) fallback to re2 if hyperscan failed (#18350 )	2023-04-09 09:18:13 +08:00
TengJianPing	9877143210	[fix](like) fix wrong result of like pattern with backslash (#18039 ) The query select * from person where address like '%\\\\%'; returns an empty result, but MySQL returns one row. CREATE TABLE `person` ( `id` int(11) NULL, `name` text NULL, `age` int(11) NULL, `class` int(11) NULL, `address` text NULL ) ENGINE=OLAP UNIQUE KEY(`id`) COMMENT 'OLAP' DISTRIBUTED BY HASH(`id`) BUCKETS 1 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "in_memory" = "false", "storage_format" = "V2", "disable_auto_compaction" = "false" ); insert into person values (10001,'test1',30,2,'test\\\\,xxx'); Logs added for debugging: select * from person where address like '%\\\\%'; I0323 10:26:15.907760 2387043 like.cpp:558] arg str: %\\%, size: 4, pattern LIKE_ENDS_WITH_RE: (?:%+)(((\\%)|(\\_)|([^%_]))+), size: 30 I0323 10:26:15.907789 2387043 like.cpp:562] match 0: \\%, size: 3 I0323 10:26:15.907801 2387043 like.cpp:562] match 1: \%, size: 2 I0323 10:26:15.907811 2387043 like.cpp:562] match 2: \%, size: 2 I0323 10:26:15.907821 2387043 like.cpp:562] match 3: , size: 0 I0323 10:26:15.907830 2387043 like.cpp:562] match 4: \, size: 1 I0323 10:26:15.907842 2387043 like.cpp:615] search_string : \\% I0323 10:26:15.907855 2387043 like.cpp:619] search_string escape removed: \% The pattern is matched against LIKE_ENDS_WITH_RE, which is wrong; the SQL should match strings that contain a backslash at any position.	2023-03-30 11:05:09 +08:00
HappenLee	7b93c17364	[Bug][Fix] regexp function core dump DCHECK failed and error result (#17953 ) CREATE TABLE `test` ( `name` varchar(64) NULL, `age` int(11) NULL ) ENGINE=OLAP DUPLICATE KEY(`name`) COMMENT 'OLAP' DISTRIBUTED BY HASH(`name`) BUCKETS 1 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "in_memory" = "false", "storage_format" = "V2", "disable_auto_compaction" = "false" ); insert into `test` values ("lemon",1),("tom",2); select a.name regexp concat('^', a.name) from test a;	2023-03-21 08:56:19 +08:00
yiguolei	17f4990bd3	[enhancement](functioncontext) function context should use shared ptr and simplify function context (#17311 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-03-02 16:23:54 +08:00
yiguolei	e04c13b7a6	[enhancement](exception safe) make function state exception safe (#16771 )	2023-02-20 23:01:45 +08:00
ZhaoChangle	199d7d3be8	[Refactor]Merged string_value into string_ref (#15925 )	2023-01-22 16:39:23 +08:00
Kang	34f43ac781	[bug](like function)fix like '' (empty string) get wrong result with all rows #14035	2022-11-08 08:51:39 +08:00
Gabriel	1d5ba9cbcc	[Improvement](like) Change `like` function to batch call (#13314 )	2022-10-16 16:18:22 +08:00
Gabriel	86d55dd79c	[Improvement](like function) avoid to convert const column to full column (#13214 )	2022-10-10 14:19:46 +08:00
starocean999	57b3c03371	[enhancement](like)pass data to like function in block not in row (#12825 ) The like predicate performs better when it processes data by block rather than by row. Currently, only non-nullable columns are optimized; nullable columns will be handled later. SELECT COUNT(*) FROM hits WHERE URL LIKE '%google%'; before: ~680ms after: ~570ms	2022-09-22 09:59:30 +08:00
Pxl	0ead048b93	[Enhancement](column) remove ColumnString terminating zero and add a data_version for pblock (#12456 ) 1. remove ColumnString terminating zero 2. add a data_version for pblock 3. change EncryptionMode to enum class	2022-09-14 21:25:22 +08:00
Kikyou1997	9a74ad1702	[feature](Nereids)add the ability of projection on each ExecNode and add column prune on OlapScan (#11842 ) We have added logical project before, but to actually finish the prune to reduce the data IO, we need to add related supports in translator and BE. This PR: - add projections on each ExecNode in BE - translate PhysicalProject into projections on PlanNode in FE - do column prune on ScanNode in FE Co-authored-by: HappenLee <happenlee@hotmail.com>	2022-08-30 16:17:10 +08:00
Jerry Hu	0291f84a9e	[fix](like-predicate) Add missing functions in LikeColumnPredicate (#11631 )	2022-08-10 15:03:14 +08:00
HappenLee	52460af74b	[Bug][Vectorized] Support the .* in hyperscan to valid the % in SQL (#11371 ) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-08-01 11:00:05 +08:00
Kang	4ea2c04676	Optimize regexp and like using hyperscan (#11116 ) * use hyperscan instead of re2 for regexp and like function	2022-07-27 16:43:58 +08:00
Kang	4e9d5a7f7a	optimize substr performance and fix ASAN global buffer overflow (#10442 ) * add volnitsky substr algorithm * replace std::search with volnitsky search algorithm in StringSearch * optimize substring for constant_substring_fn case use long run length search for performance	2022-07-12 08:36:21 +08:00
Tiewei Fang	c9f86bc7e2	[refactor] Refactoring Status static methods to format message using fmt(#9533 )	2022-07-02 18:58:23 +08:00
Mingyu Chen	9036f93df4	Revert "[improvement](function) optimize substr performance (#10169 )" (#10390 ) This reverts commit 2335d233f1f52eb64a380b4c9959becdf182b71b.	2022-06-24 14:38:52 +08:00
Kang	2335d233f1	[improvement](function) optimize substr performance (#10169 ) optimize substr performance about 1.5~2x speedup.	2022-06-24 08:57:31 +08:00
chenlinzhong	c9961c9bb9	[style] clang-format all c++ code (#9305 ) - sh build-support/clang-format.sh to clang-format all c++ code	2022-04-29 16:14:22 +08:00
HappenLee	e1d7233e9c	[feature](vectorization) Support Vectorized Exec Engine In Doris (#7785 ) # Proposed changes Issue Number: close #6238 Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com> Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com> Co-authored-by: wangbo <506340561@qq.com> Co-authored-by: emmymiao87 <522274284@qq.com> Co-authored-by: Pxl <952130278@qq.com> Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com> Co-authored-by: thinker <zchw100@qq.com> Co-authored-by: Zeno Yang <1521564989@qq.com> Co-authored-by: Wang Shuo <wangshuo128@gmail.com> Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com> Co-authored-by: Gabriel <gabrielleebuaa@gmail.com> Co-authored-by: xinghuayu007 <1450306854@qq.com> Co-authored-by: weizuo93 <weizuo@apache.org> Co-authored-by: yiguolei <guoleiyi@tencent.com> Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com> Co-authored-by: awakeljw <993007281@qq.com> Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com> Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com> ## Problem Summary: ### 1. Some code from clickhouse ClickHouse is an excellent implementation of the vectorized execution engine database, so here we have referenced and learned a lot from its excellent implementation in terms of data structure and function implementation. We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers. The following comment has been added to the code from Clickhouse, eg: // This file is copied from // https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h // and modified by Doris ### 2. Support exec node and query: * vaggregation_node * vanalytic_eval_node * vassert_num_rows_node * vblocking_join_node * vcross_join_node * vempty_set_node * ves_http_scan_node * vexcept_node * vexchange_node * vintersect_node * vmysql_scan_node * vodbc_scan_node * volap_scan_node * vrepeat_node * vschema_scan_node * vselect_node * vset_operation_node * vsort_node * vunion_node * vhash_join_node You can run exec engine of SSB/TPCH and 70% TPCDS stand query test set. ### 3. Data Model Vec Exec Engine Support Dup/Agg/Unq table, Support Block Reader Vectorized. Segment Vec is working in process. ### 4. How to use 1. Set the environment variable `set enable_vectorized_engine = true; `(required) 2. Set the environment variable `set batch_size = 4096; ` (recommended) ### 5. Some diff from origin exec engine https://github.com/doris-vectorized/doris-vectorized/issues/294 ## Checklist(Required) 1. Does it affect the original behavior: (No) 2. Has unit tests been added: (Yes) 3. Has document been added or modified: (No) 4. Does it need to update dependencies: (No) 5. Are there any changes that cannot be rolled back: (Yes)	2022-01-18 10:07:15 +08:00

27 Commits