doris

Author	SHA1	Message	Date
Zhengguo Yang	3eedd15f9c	[optimize] optimize tablet read, avoid creating too many scanners for small tablets (#8096 )	2022-03-08 13:59:45 +08:00
Pxl	668188b91f	[improvement][vectorized] support es node predicate peel (#8174 )	2022-02-26 17:02:54 +08:00
yinzhijian	936da4f10a	[feature](thread-pool) Support thread pool per disk for scanners (#7994 ) Support a thread pool per disk for scanners to prevent poor performance caused by some disks with high I/O utilization. Key points: 1. Each disk has a thread pool for scanners. 2. Whenever the thread pool of one disk runs out of local work, tasks can be retrieved from other threads (disks). This is done round-robin. Performance testing: vec version: 25% faster than a single thread pool in a high I/O util disk test case; normal version: 8% faster than a single thread pool in a high I/O util disk test case.	2022-02-18 09:40:58 +08:00
yiguolei	aea3e4e59b	[refactor] Remove version hash from BE and related test in BE (#8027 )	2022-02-14 09:29:27 +08:00
zuochunwei	4e783afa7a	[feature] add Generic debug timer for debugging or profiling (#7923 ) Add a group of debug timers for profiling or testing. Unlike the specifically named timers, these timers can be used for any custom purpose.	2022-01-31 22:15:43 +08:00
HappenLee	e1d7233e9c	[feature](vectorization) Support Vectorized Exec Engine In Doris (#7785 ) # Proposed changes Issue Number: close #6238 Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com> Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com> Co-authored-by: wangbo <506340561@qq.com> Co-authored-by: emmymiao87 <522274284@qq.com> Co-authored-by: Pxl <952130278@qq.com> Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com> Co-authored-by: thinker <zchw100@qq.com> Co-authored-by: Zeno Yang <1521564989@qq.com> Co-authored-by: Wang Shuo <wangshuo128@gmail.com> Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com> Co-authored-by: Gabriel <gabrielleebuaa@gmail.com> Co-authored-by: xinghuayu007 <1450306854@qq.com> Co-authored-by: weizuo93 <weizuo@apache.org> Co-authored-by: yiguolei <guoleiyi@tencent.com> Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com> Co-authored-by: awakeljw <993007281@qq.com> Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com> Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com> ## Problem Summary: ### 1. Some code from ClickHouse ClickHouse is an excellent implementation of the vectorized execution engine database, so here we have referenced and learned a lot from its excellent implementation in terms of data structure and function implementation. We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers. The following comment has been added to the code from ClickHouse, e.g.: // This file is copied from // https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h // and modified by Doris ### 2. Support exec node and query: * vaggregation_node * vanalytic_eval_node * vassert_num_rows_node * vblocking_join_node * vcross_join_node * vempty_set_node * ves_http_scan_node * vexcept_node * vexchange_node * vintersect_node * vmysql_scan_node * vodbc_scan_node * volap_scan_node * vrepeat_node * vschema_scan_node * vselect_node * vset_operation_node * vsort_node * vunion_node * vhash_join_node You can run the exec engine on the SSB/TPCH and 70% of the TPCDS standard query test sets. ### 3. Data Model Vec Exec Engine Support Dup/Agg/Unq table, Support Block Reader Vectorized. Segment Vec is a work in progress. ### 4. How to use 1. Set the session variable `set enable_vectorized_engine = true;` (required) 2. Set the session variable `set batch_size = 4096;` (recommended) ### 5. Some diff from origin exec engine https://github.com/doris-vectorized/doris-vectorized/issues/294 ## Checklist(Required) 1. Does it affect the original behavior: (No) 2. Has unit tests been added: (Yes) 3. Has document been added or modified: (No) 4. Does it need to update dependencies: (No) 5. Are there any changes that cannot be rolled back: (Yes)	2022-01-18 10:07:15 +08:00
924060929	563545475e	[Optimize](Runtime Filter) Support merge in runtime filter(#7546 ) (#7547 ) Support merging IN predicates when a remote target exists (e.g. shuffle hash join). Remove the code that implicitly converts an IN predicate to a Bloom filter when a remote target exists. Close related #7546	2022-01-06 19:08:35 +08:00
zhoubintao	85521944dd	[refactor](olap-scan-node) Refactor olap scannode (#7131 ) 1. Delete useless variables 2. Add const modifier for read-only function 3. Delete the empty destructor, the compiler will automatically generate it, refer to the 3/5/0 rule: [https://en.cppreference.com/w/cpp/language/rule_of_three] 4. It is recommended to add the override keyword (instead of the virtual keyword) to the subclass virtual function. Override will let the compiler help check and improve security. This is also the reason why C++11 introduces override	2021-12-16 10:33:41 +08:00
Zhengguo Yang	6c6380969b	[refactor] replace boost smart ptr with stl (#6856 ) 1. replace all boost::shared_ptr with std::shared_ptr 2. replace all boost::scoped_ptr with std::unique_ptr 3. replace all boost::scoped_array with std::unique_ptr<T[]> 4. replace all boost::thread with std::thread	2021-11-17 10:18:35 +08:00
jiafeng.zhang	088a16d33b	Chinese annotation modification (#6958 ) * Modify Chinese comment (#6951)	2021-11-09 18:00:14 +08:00
HappenLee	c3b133bdb3	[Refactor] Refactor the reader code (#6866 ) 1. Removed useless redundant code logic 2. Change reader to interface, add tuple reader to simplify the structure of reader	2021-10-30 18:15:28 +08:00
Zhengguo Yang	4170aabf83	[Optimize] optimize some session variable and profile (#6920 ) 1. optimize error message when using batch delete 2. rename session variable is_report_success to enable_profile 3. add table name to OlapScanner profile	2021-10-27 18:03:12 +08:00
Mingyu Chen	adb6bfdf74	[Bug] Fix bug that truncate table may change the storage medium property (#6905 )	2021-10-25 10:07:27 +08:00
Zhengguo Yang	7297b275f1	[Optimize] Optimize cpu consumption when importing parquet files (#6782 ) Remove part of dynamic_cast, reduce the overhead caused by type conversion, and probably reduce the cpu consumption of parquet file import by about 10%	2021-10-03 12:14:35 +08:00
Mingyu Chen	ad3c9390a2	[Bug] Fix bdbje getDatabaseNames() bug and scan node close bug (#6769 ) 1. This bug is introduced from #6582 2. Optimize the error log of the Address used error msg. 3. Add some document about compilation. 1. Add a custom thirdparty download url. 2. Add a custom com.alibaba maven jar package for DataX. 4. Fix bug that BE crash when closing scan node, introduced from #6622.	2021-09-29 11:11:28 +08:00
thinker	850cf10991	[Refactor] refactor olap_scan_node: discard boost, remove dynamic_cast (#6622 ) 1. refactor olap_scan_node: discard boost, remove dynamic_cast 2. use move instead of copy version for push_back	2021-09-27 10:32:57 +08:00
Zhengguo Yang	5c45e26644	Fixed zone map init error for string type (#6667 ) Fixed the problem that the StringValue memory generated by Expr may be released before use Fixed from_string for String type may overflow	2021-09-23 09:44:22 +08:00
Mingyu Chen	74ddea8d83	[Optimize] Remove some unused code to reduce lock contention (#6566 ) 1. Remove global runtime profile counter 2. Remove unused thread token register	2021-09-07 11:56:12 +08:00
Mingyu Chen	3f2fdd236f	Add scan thread token (#6443 )	2021-08-27 10:56:17 +08:00
Zhengguo Yang	8738ce380b	Add long text type STRING, with a maximum length of 2GB. Usage is similar to varchar, and there is no guarantee for the performance of storing extremely long data (#6391 )	2021-08-18 09:05:40 +08:00
Mingyu Chen	2030c44dba	[Log] Modify some log level on BE side (#6381 )	2021-08-14 10:25:45 +08:00
stdpain	34af66bf1d	[BUG][Memory] fix memory tracker DCHECK fail in debug mode and Fix Process Memory limit fail (#6438 )	2021-08-14 10:24:33 +08:00
HappenLee	9216735cfa	[New Feature] Support Vectorization Execution Engine Interface For Doris (#6329 ) 1. FE vectorized plan code 2. Function register vec function 3. Diff function nullable type 4. New thirdparty code and new thrift struct	2021-08-11 14:54:06 +08:00
Zhengguo Yang	ed3ff470ce	[ARRAY] Support array type load and select not include access by index (#5980 ) This is part of the array type support and has not been fully completed. The following functions are implemented: 1. FE array type support and implementation of array functions; support array syntax analysis and planning 2. Support importing array type data through insert into 3. Support selecting array type data 4. The array type is only supported on the value columns of the duplicate table. This PR merges some code from #4655 #4650 #4644 #4643 #4623 #2979	2021-07-13 14:02:39 +08:00
Zhengguo Yang	739c0268ff	[refactor] Remove decimal v1 related code from code base (#6079 ) Remove all DECIMAL V1 type code; this is part of #6073	2021-07-07 10:26:32 +08:00
stdpain	149def9e42	[Feature] Support RuntimeFilter in Doris (BE Implement) (#6077 ) 1. support in/bloomfilter/minmax 2. support broadcast/shuffle/bucket shuffle/colocate join 3. opt memory use and cpu cache miss while build runtime filter 4. opt memory use in left semi join (works well on tpcds-95)	2021-07-04 20:59:05 +08:00
stdpain	1999a0c26b	[optimization] open gcc strict-aliasing optimization (#6034 ) * open gcc strict-aliasing optimization * use -Werror=strict-aliasing	2021-06-18 11:39:24 +08:00
Xinyi Zou	5748241dab	[Bug-fix] When query cancel, transfer_thread does not continue to schedule scanner_thread (#5768 ) The cause of the problem is that after query cancel, OlapScanNode::transfer_thread still continues to schedule OlapScanNode::scanner_thread until all tasks are scheduled. Although each task does not scan data and exits quickly, it still consumes a lot of resources. (Guess)This may be the cause of the BUG (#5767) causing the I/O to be full. So after query cancel, immediately exit the scheduling loop in transfer_thread, and after waiting for the end of all scanner_threads, transfer_thread will also exit.	2021-05-19 09:26:58 +08:00
Zhengguo Yang	98e80aa65e	[refactor] Replace boost::function with std::function (#5700 ) Replace boost::function with std::function	2021-05-09 22:00:48 +08:00
HappenLee	6ad1bf7d7e	[Bug] Fix dead lock in olap scan node and refactor some code in FE profile (#5713 ) * [Bug] Fix dead lock in olap scan node and refactor some code in FE profile * Add some comment	2021-04-30 10:12:18 +08:00
Zhengguo Yang	c4cc681d14	remove boost_foreach, using c++ foreach instead (#5611 )	2021-04-15 10:52:29 +08:00
xxiao2018	1100a0f3a0	[Profile] Add more timer for scan thread (#5511 ) 1. Add a timer to count the time the transfer thread waits for the scanner thread to return a rowbatch. 2. Add a timer to count the time that the scanner thread waits for available worker threads in the thread pool. Co-authored-by: chenmingyu <chenmingyu@baidu.com>	2021-03-15 10:07:11 +08:00
stdpain	7eae3e280a	[optimization] use inline optimize ExprContext::get_value (#5385 )	2021-02-16 22:35:14 +08:00
HappenLee	a5298d617d	[Performance Improve] Push Down _conjunct of 'not in' and '!=' to Storage Engine. (#5207 )	2021-01-23 21:07:01 +08:00
Zhengguo Yang	93a4c7efc1	[LOG] Standardize the use of VLOG in code (#5264 ) At present, the application of vlog in the code is quite confusing. It is inherited from impala VLOG_XX format, and there is also VLOG(number) format. VLOG(number) format does not have a unified specification, so this pr standardizes the use of VLOG	2021-01-21 12:09:09 +08:00
HuangWei	17d939b789	[Bug] Fix scanner threads heap-use-after-free (#5111 ) Scanner threads may still be running and using the member vars of OlapScanNode when the OlapScanNode has already been destroyed. We can make `_running_thread` the last accessed member variable, and `transfer_thread` needs to wait for `_running_thread==0`. After `transfer_thread` has joined, `OlapScanNode::close()` can continue.	2021-01-04 09:28:51 +08:00
HappenLee	9e19b6b133	[Performance Improve] Push Down _conjunct of 'A is NULL' and 'B is not NULL' to Storage Engine. (#5092 ) This patch mainly do the following: - Support #5086 - Refactor ColumnRangeValue to support contain null	2021-01-03 15:45:07 +08:00
Mingyu Chen	81c7c0360e	[Bug] Fix a core dump of counter in BE (#5078 ) Introduced by PR #5051. As @liutang123 said, when PlanFragmentExecutor is destructed, it will call `close -> ExecNode::close -> OlapScanNode::close`. OlapScanNode will wait for `_transfer_thread`. `_transfer_thread` will wait for all OlapScanner processing to complete. OlapScanner is processed by the scanner thread. When the last scanner processing is completed, `_transfer_thread` will break out of the loop, and PlanFragmentExecutor will continue to destruct. And if it is completed, its RuntimeProfile::Counter will also be destructed. At this time, the ScopedTimer in the Scan thread may still use this Counter when it is destructed. So we must make sure that the timer is deconstructed before deconstructing the runtime profile.	2020-12-15 09:33:38 +08:00
HappenLee	0a0e46fd53	[Bug] Fix the bug of where condition a in ('A', 'B', 'V') and a in ('A') return error result (#5072 ) And Refactor ColumnRangeValue and OlapScanNode This patch mainly do the following: - Fix issue #5071 - Change type_min in ColumnRangeValue as static - Add Class of type_limit make code clear - Refactor the function of normalize_in_and_eq_predicate	2020-12-15 09:29:10 +08:00
Lijia Liu	ff4bd1223f	[Profile] Add cpu time cost in query audit (#5051 )	2020-12-13 22:22:15 +08:00
HappenLee	6021d6fc7f	[Performance Optimization] Remove push down conjuncts in olap scan node (#4999 ) Push conjuncts down to the Storage Engine as much as possible; the olap scan node does not need to filter data with the pushed-down conjuncts again. fix #4986	2020-12-06 08:50:08 +08:00
sduzh	6fedf5881b	[CodeFormat] Clang-format cpp sources (#4965 ) Clang-format all c++ source files.	2020-11-28 18:36:49 +08:00
Xinyi Zou	66132d2836	[Feature] Running Profile OLAP_SCAN_NODE layering and enhance readability (#4825 ) Mainly includes: - `OLAP_SCAN_NODE` profile layering: `OLAP_SCAN_NODE`, `OlapScanner`, and `SegmentIterator`. - Delete meaningless statistical values, mainly in scan_node.cpp. - Add the `RowsConditionsFiltered` statistic, split from `RowsDelFiltered`; it is the number of rows filtered by various column indexes, only in segment V2. - Modify the document based on the above, and enhance readability.	2020-11-11 21:21:25 +08:00
HappenLee	b1c1ffda4a	[Refactor] Refactor olap scan node code (#4823 ) 1. Remove meaningless code in Doris 2. Replace string copy by string reference 3. Simplified the implementation of some functions	2020-11-01 09:12:23 +08:00
Zhengguo Yang	09f97f8a05	[Refactor] Fixes some be typo part 2 (#4747 )	2020-10-20 09:28:57 +08:00
Zhengguo Yang	75e0ba32a1	Fixes some be typo (#4714 )	2020-10-13 09:37:15 +08:00
Mingyu Chen	8b0b120aca	[Profile] Add 2 Segment related metrics in query profile (#4348 ) Total number of segments and number of filtered segments	2020-08-27 12:07:21 +08:00
sduzh	3ce6fc631e	[BUG] Fix wrong result of querying with cast expr in where clause (#4219 )	2020-08-01 17:46:39 +08:00
HuangWei	10f822eb43	[MemTracker] make all MemTrackers shared (#4135 ) We make all MemTrackers shared in order to show real-time MemTracker consumption on the web. As follows: 1. Nearly all MemTracker raw ptrs -> shared_ptr 2. Use CreateTracker() to create a new MemTracker (in order to add itself to its parent) 3. RowBatch & MemPool still use raw ptrs of MemTracker; it's easy to ensure that the RowBatch & MemPool destructors run before MemTracker's destructor, so we don't change this code. 4. MemTracker can use RuntimeProfile's counter to calculate consumption, so RuntimeProfile's counter needs to be shared too. We add a shared counter pool to store the shared counters and don't change the other counters of RuntimeProfile. Note that this PR doesn't change the MemTracker tree structure, so there are still some orphan trackers, e.g. RowBlockV2's MemTracker. If you find that some shared MemTrackers consume little memory and are too time-consuming, you can make them orphans; then it's fine to use the raw ptr.	2020-07-31 21:57:21 +08:00
Mingyu Chen	725ebafd99	[Bug] Cancel the query if OlapScanner prepare failed (#4002 )	2020-07-03 21:33:07 +08:00
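
The round-robin retrieval described in commit 936da4f10a (#7994) can be illustrated with a minimal sketch. This is not the Doris implementation: the class name `PerDiskTaskQueues`, its methods, and the single shared mutex are assumptions made for illustration only. It shows one task queue per disk, where a worker bound to a disk takes local work first and otherwise checks the other disks in round-robin order.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical sketch (not Doris code): one task queue per disk.
// A worker bound to a disk takes local work first; when its own queue is
// empty it looks at the other disks' queues in round-robin order.
class PerDiskTaskQueues {
public:
    using Task = std::function<void()>;

    explicit PerDiskTaskQueues(std::size_t num_disks) : _queues(num_disks) {
        assert(num_disks > 0);
    }

    // Enqueue a scanner task for the given disk.
    void submit(std::size_t disk, Task task) {
        std::lock_guard<std::mutex> lock(_mutex);
        _queues[disk % _queues.size()].push_back(std::move(task));
    }

    // Fetch the next task for a worker bound to `home_disk`.
    // Returns false when every queue is empty.
    bool take(std::size_t home_disk, Task* out) {
        std::lock_guard<std::mutex> lock(_mutex);
        const std::size_t n = _queues.size();
        for (std::size_t i = 0; i < n; ++i) {
            // i == 0 is the local queue; i > 0 visits the other disks in turn.
            std::deque<Task>& q = _queues[(home_disk + i) % n];
            if (!q.empty()) {
                *out = std::move(q.front());
                q.pop_front();
                return true;
            }
        }
        return false;
    }

private:
    // A single mutex keeps the sketch short; a real pool would more likely
    // use one lock per queue to reduce contention.
    std::mutex _mutex;
    std::vector<std::deque<Task>> _queues;
};
```

In this sketch, a worker for disk 0 with an empty local queue calls `take(0, &task)` and receives the oldest task queued for disk 1, then disk 2, and so on, so idle workers drain the backlog of a busy disk instead of waiting.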


84 Commits