doris

Author	SHA1	Message	Date
Zhengguo Yang	3eedd15f9c	[optimize] Optimize tablet read to avoid creating too many scanners for small tablets (#8096)	2022-03-08 13:59:45 +08:00
Pxl	668188b91f	[improvement][vectorized] Support ES node predicate peel (#8174)	2022-02-26 17:02:54 +08:00
HappenLee	51abaa89f3	[fix](vec) Fix some bugs in the vec engine (#7884) 1. Memory leak in vcollector iter 2. Slow query on agg table with limit 10 3. Slow queries in SSB q4, q5, q6	2022-02-03 19:21:17 +08:00
zuochunwei	4e783afa7a	[feature] Add generic debug timers for debugging or profiling (#7923) Add a group of debug timers for profiling or testing. Unlike the specifically named timers, these timers can be given any custom meaning.	2022-01-31 22:15:43 +08:00
HappenLee	e1d7233e9c	[feature](vectorization) Support Vectorized Exec Engine in Doris (#7785) # Proposed changes Issue Number: close #6238 Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com> Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com> Co-authored-by: wangbo <506340561@qq.com> Co-authored-by: emmymiao87 <522274284@qq.com> Co-authored-by: Pxl <952130278@qq.com> Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com> Co-authored-by: thinker <zchw100@qq.com> Co-authored-by: Zeno Yang <1521564989@qq.com> Co-authored-by: Wang Shuo <wangshuo128@gmail.com> Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com> Co-authored-by: Gabriel <gabrielleebuaa@gmail.com> Co-authored-by: xinghuayu007 <1450306854@qq.com> Co-authored-by: weizuo93 <weizuo@apache.org> Co-authored-by: yiguolei <guoleiyi@tencent.com> Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com> Co-authored-by: awakeljw <993007281@qq.com> Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com> Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com> ## Problem Summary: ### 1. Some code from ClickHouse ClickHouse is an excellent vectorized execution engine database, so we have referenced and learned a lot from its implementation of data structures and functions. We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers. The following comment has been added to code taken from ClickHouse, e.g.: // This file is copied from // https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h // and modified by Doris ### 2. Supported exec nodes and queries: * vaggregation_node * vanalytic_eval_node * vassert_num_rows_node * vblocking_join_node * vcross_join_node * vempty_set_node * ves_http_scan_node * vexcept_node * vexchange_node * vintersect_node * vmysql_scan_node * vodbc_scan_node * volap_scan_node * vrepeat_node * vschema_scan_node * vselect_node * vset_operation_node * vsort_node * vunion_node * vhash_join_node The engine can run the SSB and TPC-H query test sets and 70% of the TPC-DS standard query test set. ### 3. Data Model The vec exec engine supports Dup/Agg/Unq tables and vectorized block reading. Segment vectorization is a work in progress. ### 4. How to use 1. Set the session variable `set enable_vectorized_engine = true;` (required) 2. Set the session variable `set batch_size = 4096;` (recommended) ### 5. Some differences from the original exec engine https://github.com/doris-vectorized/doris-vectorized/issues/294 ## Checklist(Required) 1. Does it affect the original behavior: (No) 2. Have unit tests been added: (Yes) 3. Has documentation been added or modified: (No) 4. Does it need to update dependencies: (No) 5. Are there any changes that cannot be rolled back: (Yes)	2022-01-18 10:07:15 +08:00
zhoubintao	85521944dd	[refactor](olap-scan-node) Refactor OlapScanNode (#7131) 1. Delete unused variables 2. Add the const modifier to read-only functions 3. Delete the empty destructor, which the compiler generates automatically; see the rule of three/five/zero: [https://en.cppreference.com/w/cpp/language/rule_of_three] 4. Use the override keyword (instead of the virtual keyword) on virtual functions that subclasses override. With override, the compiler checks that the function really overrides a base-class virtual function, which improves safety; this is why C++11 introduced override.	2021-12-16 10:33:41 +08:00
thinker	850cf10991	[Refactor] Refactor olap_scan_node: discard boost, remove dynamic_cast (#6622) 1. Refactor olap_scan_node: discard boost, remove dynamic_cast 2. Use the move overload of push_back instead of the copy overload	2021-09-27 10:32:57 +08:00
HappenLee	9216735cfa	[New Feature] Support Vectorized Execution Engine Interface for Doris (#6329) 1. FE vectorized plan code 2. Function registration for vec functions 3. Diff function nullable type 4. New third-party code and new Thrift structs	2021-08-11 14:54:06 +08:00
Zhengguo Yang	ed3ff470ce	[ARRAY] Support loading and selecting the array type, not including access by index (#5980) This is part of the array type support and is not yet complete. The following features are implemented: 1. FE array type support and array functions, including array syntax analysis and planning 2. Import array type data through INSERT INTO 3. Select array type data 4. The array type is supported only on the value columns of duplicate tables. This PR merges some code from #4655 #4650 #4644 #4643 #4623 #2979	2021-07-13 14:02:39 +08:00
stdpain	290a844e04	[optimize] Optimize bloom filter performance (#6180) Refactor the runtime filter bloom filter and eliminate some virtual function calls, which gave a performance improvement of about 5%. Import a block bloom filter; the AVX version gave a 40% performance improvement. Before: bloom filter size: default, about 20 million (2000W) items cost about 1.4 s. After: bloom filter size: 524288, about 20 million items cost about 400 ms.	2021-07-10 10:12:12 +08:00
stdpain	149def9e42	[Feature] Support RuntimeFilter in Doris (BE Implementation) (#6077) 1. Support IN/bloom filter/min-max filters 2. Support broadcast/shuffle/bucket shuffle/colocate joins 3. Optimize memory use and CPU cache misses while building runtime filters 4. Optimize memory use in left semi joins (works well on TPC-DS q95)	2021-07-04 20:59:05 +08:00
Xinyi Zou	5748241dab	[Bug-fix] After a query is cancelled, transfer_thread no longer schedules scanner_thread (#5768) The cause of the problem is that after a query is cancelled, OlapScanNode::transfer_thread still continues to schedule OlapScanNode::scanner_thread until all tasks are scheduled. Although each task scans no data and exits quickly, together they still consume a lot of resources. (Guess) This may be the cause of the bug (#5767) that saturates I/O. So after a query is cancelled, transfer_thread now exits its scheduling loop immediately, waits for all scanner_threads to finish, and then exits as well.	2021-05-19 09:26:58 +08:00
xxiao2018	1100a0f3a0	[Profile] Add more timers for scan threads (#5511) 1. Add a timer that measures the time the transfer thread waits for scanner threads to return a row batch. 2. Add a timer that measures the time scanner threads wait for an available worker thread in the thread pool. Co-authored-by: chenmingyu <chenmingyu@baidu.com>	2021-03-15 10:07:11 +08:00
HappenLee	a5298d617d	[Performance Improve] Push Down _conjunct of 'not in' and '!=' to Storage Engine. (#5207)	2021-01-23 21:07:01 +08:00
Zhengguo Yang	93a4c7efc1	[LOG] Standardize the use of VLOG in code (#5264) At present, VLOG usage in the code is inconsistent: some calls use the VLOG_XX format inherited from Impala, and others use the VLOG(number) format, which has no unified convention. This PR standardizes the use of VLOG.	2021-01-21 12:09:09 +08:00
HuangWei	17d939b789	[Bug] Fix heap-use-after-free in scanner threads (#5111) Scanner threads may still be running and using member variables of OlapScanNode after the OlapScanNode has already been destroyed. We make `_running_thread` the last member variable that scanner threads access, and `transfer_thread` waits for `_running_thread==0`. After `transfer_thread` is joined, `OlapScanNode::close()` can continue.	2021-01-04 09:28:51 +08:00
HappenLee	9e19b6b133	[Performance Improve] Push Down _conjunct of 'A is NULL' and 'B is not NULL' to Storage Engine. (#5092) This patch mainly does the following: - Support #5086 - Refactor ColumnRangeValue so that it can contain NULL	2021-01-03 15:45:07 +08:00
HappenLee	0a0e46fd53	[Bug] Fix the bug where the condition a in ('A', 'B', 'V') and a in ('A') returns a wrong result (#5072) and refactor ColumnRangeValue and OlapScanNode. This patch mainly does the following: - Fix issue #5071 - Make type_min in ColumnRangeValue static - Add a type_limit class to make the code clearer - Refactor the normalize_in_and_eq_predicate function	2020-12-15 09:29:10 +08:00
Lijia Liu	ff4bd1223f	[Profile] Add CPU time cost to query audit (#5051)	2020-12-13 22:22:15 +08:00
HappenLee	6021d6fc7f	[Performance Optimization] Remove pushed-down conjuncts in olap scan node (#4999) Push as many conjuncts as possible to the storage engine, so the olap scan node does not need to filter data with the pushed-down conjuncts again. Fix #4986	2020-12-06 08:50:08 +08:00
sduzh	6fedf5881b	[CodeFormat] Clang-format cpp sources (#4965) Clang-format all C++ source files.	2020-11-28 18:36:49 +08:00
Xinyi Zou	66132d2836	[Feature] Layer the OLAP_SCAN_NODE running profile and enhance readability (#4825) Mainly includes: - `OLAP_SCAN_NODE` profile layering: `OLAP_SCAN_NODE`, `OlapScanner`, and `SegmentIterator`. - Delete meaningless statistics, mainly in scan_node.cpp. - Add the `RowsConditionsFiltered` statistic, split from `RowsDelFiltered`, which counts the rows filtered by the various column indexes (segment V2 only). - Update the documentation accordingly and improve its readability.	2020-11-11 21:21:25 +08:00
HappenLee	b1c1ffda4a	[Refactor] Refactor olap scan node code (#4823) 1. Remove meaningless code in Doris 2. Replace string copies with string references 3. Simplify the implementation of some functions	2020-11-01 09:12:23 +08:00
Zhengguo Yang	09f97f8a05	[Refactor] Fix some BE typos, part 2 (#4747)	2020-10-20 09:28:57 +08:00
Mingyu Chen	8b0b120aca	[Profile] Add 2 segment-related metrics to the query profile (#4348) Total number of segments and number of filtered segments	2020-08-27 12:07:21 +08:00
ZhangYu0123	97d963468a	[Code Cleanup] Convert nested templates to C++11 syntax and style (#4442)	2020-08-26 10:51:52 +08:00
Mingyu Chen	0cbacaf01d	[Refactor] Replace some boost with std in OlapScanNode (#3934) Replace some boost usages with std in OlapScanNode. This refactor seems to solve the problem described in #3929: I found that BE crashed when calling `boost::condition_variable.notify_all()`, but after this change BE no longer crashes.	2020-06-29 19:13:03 +08:00
Mingyu Chen	b8ee84a120	[Doc] Add docs for the OLAP_SCAN_NODE query profile (#3808)	2020-06-13 16:25:40 +08:00
Mingyu Chen	27046c5b61	[Enhancement] Improve the performance of queries with IN predicates (#3694) This CL mainly changes: 1. Add a new BE config `max_pushdown_conditions_per_column` to limit the number of conditions on a single column that can be pushed down to the storage engine. 2. Add 2 new session variables `max_scan_key_num` and `doris_max_scan_key_num`, which can be set at the session level and override the config values in BE.	2020-06-04 11:39:00 +08:00
HappenLee	d0fe7e4d94	[Profile] Make running profile clearer and more intuitive to improve usability (#3405) This CL mainly makes the following modifications: 1. Delete the invalid MemoryUsed counter and add PeakMemUsage to each exec node and DataStreamSender 2. Add indentation to child exec node profiles, making it easier to see the relationships between exec nodes 3. Delete _is_result_order, which is no longer supported, from olap_scan_node.h and olap_scan_node.cpp 4. Add a scan_disk method to olap_scanner to fix the counter _num_disks_accessed_counter 5. Now we do not use the buffer pool to read and write disk, so annotation eadio counter and 6. Delete the MemUsed counter in exec node.	2020-04-30 14:57:21 +08:00
kangpinghuang	d31f774852	Add block split bloom filter (#2471) [STORAGE][SEGMENTV2] Use a block split bloom filter; build the bloom filter per data page; add distinct values to the bloom filter; add an ordinal index to the bloom filter index	2019-12-18 12:57:44 +08:00
kangkaisen	f828670245	Add bitmap index reader (#2319) [STORAGE] [INDEX] For #2061 and #2062: add bitmap index reader; SegmentIterator supports bitmap index; add some metrics	2019-12-03 23:01:40 +08:00
kangpinghuang	9c2d149c36	Add profile for segment v2 (#2015)	2019-10-22 09:43:16 +08:00
kangkaisen	b246d93128	Avoid SerDe for aggregation query with object pool (#1854)	2019-09-26 13:51:13 +08:00
ZHAO Chun	0dc0dadad1	Reduce unnecessary memory allocation and copying in OlapScanNode (#1742)	2019-09-04 21:05:12 +08:00
ZHAO Chun	81ca3e3abf	Free olap scanner outside the lock (#1733) Close the scanner outside OlapScanner's batch lock; closing it while holding the lock makes all scanners wait for one scanner to finish.	2019-09-02 16:49:28 +08:00
chenhao	0e5b193243	Add CPU and IO indicators to audit log (#531)	2019-01-17 12:43:15 +08:00
ZHAO Chun	e8360f5eee	Add counters to OlapScanNode (#538) There is a non-negligible cost to converting VectorRowBatch to RowBatch. When we seek a block, we read only one row from the engine to minimize this conversion cost. This patch reduces the time of some queries from 5 s to 2 s.	2019-01-16 18:57:04 +08:00
Zhao Chun	a2b299e3b9	Reduce UT binary size (#314) * Reduce UT binary size Almost every module depends on ExecEnv, and ExecEnv contains all singletons, which makes the UT binary contain all object files. This patch moves ExecEnv's initialization and destruction into another file so that other files no longer depend on them. Also, status.cc includes debug_util.h, which depends on tuple.h and tuple_row.h, so I moved get_stack_trace() to stack_util.cpp to reduce status.cc's dependencies. I added USE_RTTI=1 to the rocksdb build to avoid linking librocksdb.a Issue: #292 * Update	2018-11-15 16:17:23 +08:00
chenhao7253886	37b4cafe87	Change variable and namespace names in BE (#268) Change 'palo' to 'doris'	2018-11-02 10:22:32 +08:00
morningman	2868793b6b	Change license to Apache License 2.0 (#262)	2018-11-01 09:06:01 +08:00
morningman	2419384e8a	Push 3.3.19 to GitHub (#193) * Push 3.3.19 to GitHub * Merge to 20ed420122a8283200aa37b0a6179b6a571d2837	2018-05-15 20:38:22 +08:00
李超勇	6486be64c3	Fix license statement (#29) * Change picture to word * Change picture to word * SHOW FULL TABLES WHERE Table_type != VIEW SQL cannot execute * Change license description	2017-08-18 19:16:23 +08:00
cyongli	e2311f656e	baidu palo	2017-08-11 17:51:21 +08:00

44 Commits