doris

Author	SHA1	Message	Date
ShowCode	f565f60bc3	[refactor](standard)BE:Initialize pointer variables in the class to nullptr by default (#27587 )	2023-11-28 13:02:30 +08:00
zhiqiang	a5565f68b2	[Refactor](opentelemetry) Remove opentelemetry (#26605 )	2023-11-09 18:05:34 +08:00
Mryange	6b2eed779c	[feature](AuditLog) add scanRows scanBytes in auditlog (#25435 )	2023-10-25 10:00:35 +08:00
Jerry Hu	b5ee4a9dbb	[enhancement](profilev2) add some fields for profile v2 (#25611 ) Add 3 counters for ExecNode: ExecTime - Total execution time(excluding the execution time of children). OutputBytes - The total number of bytes output to parent. BlockCount - The total count of blocks output to parent.	2023-10-23 15:55:40 +08:00
zhiqqqq	09e03247ec	[chore](readability) Better readability of ExecNode.cpp #24733	2023-09-22 08:54:57 +08:00
Gabriel	9d1f2cd8e0	[Improvement](pipeline) Terminate early for short-circuit join (#23378 )	2023-08-23 19:40:17 +08:00
Xinyi Zou	96f42ca20a	[fix](memory) Independent count exec node memory profile (#22598 ) Independent count exec node memory profile, after #22582	2023-08-06 10:56:31 +08:00
Lijia Liu	d86c67863d	Remove unused code (#21735 )	2023-07-12 14:48:13 +08:00
zhangstar333	b60860c5e5	[refactor](profile) refactor the join profile when its shared hash table (#20391 ) in join node, if it's broadcast_join and shared hash table, some counter/timer about build hash table is useless, so we could add those counter/timer in faker profile, and those will not display in web profile.	2023-06-09 08:59:49 +08:00
Jerry Hu	9f8de89659	[refactor](exec) replace the single pointer with an array of 'conjuncts' in ExecNode (#19758 ) Refactoring the filtering conditions in the current ExecNode from an expression tree to an array can simplify the process of adding runtime filters. It eliminates the need for complex merge operations and removes the requirement for the frontend to combine expressions into a single entity. By representing the filtering conditions as an array, each condition can be treated individually, making it easier to add runtime filters without the need for complex merging logic. The array can store the individual conditions, and the runtime filter logic can iterate through the array to apply the filters as needed. This refactoring simplifies the codebase, improves readability, and reduces the complexity associated with handling filtering conditions and adding runtime filters. It separates the conditions into discrete entities, enabling more straightforward manipulation and management within the execution node.	2023-05-29 11:47:31 +08:00
luozenglin	272a7565b8	[improvement](tracing) Remove useless span levels from be side tracing (#19665 ) 1. Remove an exec node method corresponding to a span and replace it with an exec node corresponding to a span; 2. Fix some problems with tracing in pipeline.	2023-05-17 19:04:52 +08:00
yiguolei	8ef9212ddc	[enhancement](exceptionsafe) force check exec node method's return value (#19538 )	2023-05-12 10:21:00 +08:00
yiguolei	4e4fb33995	[refactor](conjuncts) simplify conjuncts in exec node (#19254 ) Co-authored-by: yiguolei <yiguolei@gmail.com> Currently, exec node save exprcontext*, but the object is in object pool, the code is very unclear. we could just use exprcontext.	2023-05-04 18:04:32 +08:00
Adonis Ling	e412dd12e8	[chore](build) Use include-what-you-use to optimize includes (PART II) (#18761 ) Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.	2023-04-19 23:11:48 +08:00
Adonis Ling	9e960f4c4f	[chore](build) Use include-what-you-use to optimize includes (#18681 ) Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.	2023-04-17 11:44:58 +08:00
Gabriel	06788bc2d0	[Bug](pipeline) Fix projection on streaming operator (#16592 )	2023-02-10 15:57:26 +08:00
yiguolei	eba70f972e	[improvement](global context) remove some unused method from runtime state (#16329 ) This is part of #16296. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-02-02 10:24:55 +08:00
yiguolei	79ad74637d	[refactor](remove expr) remove non vectorized Expr and ExprContext related codes (#16136 )	2023-01-24 10:45:35 +08:00
Xinyi Zou	97fcad76f8	[enhancement](memtracker) Improve readability (#15716 )	2023-01-16 16:30:35 +08:00
yiguolei	16862d9b43	[refactor](remove unused code) remove buffer pool and disk io mgr (#15853 ) * [refactor](remove buffer pool and disk io mgr) remove unused code Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-01-13 09:42:58 +08:00
yiguolei	a807978882	[refactor](non-vec) Remove rowbatch code from delta writer and some rowbatch related code (#15349 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-12-26 08:54:51 +08:00
Gabriel	b085ff49f0	[refactor](non-vec) delete non-vec data sink (#15283 ) * [refactor](non-vec) delete non-vec data sink Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-12-23 14:10:47 +08:00
Gabriel	af54299b26	[Pipeline](projection) Support projection on pipeline engine (#15220 )	2022-12-21 15:47:29 +08:00
zhangstar333	494eb895d3	[vectorized](pipeline) support union node operator (#15031 )	2022-12-19 22:01:56 +08:00
Pxl	c25a7235f9	[Pipeline](load) support pipeline broker load (#14940 ) support pipeline broker load	2022-12-13 00:28:36 +08:00
HappenLee	68092fe514	[pipeline](NLJ) support nested loop join for pipeline (#14966 )	2022-12-10 00:20:16 +08:00
Jerry Hu	873b128fde	[feature](pipeline) add inersect/except operators (#14868 )	2022-12-09 14:13:48 +08:00
HappenLee	b30cd86e9e	[Refactor](pipeline) Refactor operator and builder code of pipeline (#14787 )	2022-12-05 18:35:00 +08:00
HappenLee	12304bc0ee	[Pipeline](exec) Support pipeline exec engine (#14736 ) Co-authored-by: Lijia Liu <liutang123@yeah.net> Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: Jerry Hu <mrhhsg@gmail.com> Co-authored-by: Pxl <952130278@qq.com> Co-authored-by: shee <13843187+qzsee@users.noreply.github.com> Co-authored-by: Gabriel <gabrielleebuaa@gmail.com> ## Problem Summary: ### 1. Design DSIP: https://cwiki.apache.org/confluence/display/DORIS/DSIP-027%3A+Support+Pipeline+Exec+Engine ### 2. How to use: Set the environment variable `set enable_pipeline_engine = true; `	2022-12-02 17:11:34 +08:00
Xinyi Zou	176f519fa1	[enhancement](memtracker) Optimize exec node memory tracking (#14711 )	2022-12-01 14:52:21 +08:00
HappenLee	d5af4f6558	[Neried](Profile) Add projection timer for neried (#14286 )	2022-11-17 22:17:55 +08:00
Gabriel	1ef85ae1f2	[Improvement](join) Support nested loop outer join (#13965 )	2022-11-10 19:50:46 +08:00
Xinyi Zou	32a029d9dc	[enhancement](memtracker) Refactor load channel + memtable mem tracker (#13795 )	2022-11-03 09:47:12 +08:00
Mingyu Chen	893567628e	[fix](exec-node) fix nullptr of runtime state (#12395 ) Remove default nullptr runtime state, which is very error-prone	2022-09-07 08:46:42 +08:00
Kikyou1997	9a74ad1702	[feature](Nereids)add the ability of projection on each ExecNode and add column prune on OlapScan (#11842 ) We have added logical project before, but to actually finish the prune to reduce the data IO, we need to add related supports in translator and BE. This PR: - add projections on each ExecNode in BE - translate PhysicalProject into projections on PlanNode in FE - do column prune on ScanNode in FE Co-authored-by: HappenLee <happenlee@hotmail.com>	2022-08-30 16:17:10 +08:00
Mingyu Chen	a16cf0e2c8	[feature-wip](scan) add profile for new olap scan node (#12042 ) Copy most of profiles from VOlapScanNode and VOlapScanner to NewOlapScanNode and NewOlapScanner. Fix some blocking bug of new scan framework. TODO: Memtracker Opentelemetry spen The new framework is still disabled by default, so it will not effect other feature.	2022-08-30 10:55:48 +08:00
Xinyi Zou	1fc5515a78	[enhancement](memory) Remove unused reservation tracker (#11969 )	2022-08-24 08:49:34 +08:00
Mingyu Chen	7e3fc0d321	[enhancement](vec) Support outer join for vectorized exec engine (#11068 ) Hash join node adds three new attributes. The following will take an SQL as an example to illustrate the meaning of these three attributes ``` select t1. a from t1 left join t2 on t1. a=t2. b; ``` 1. vOutputTupleDesc：Tuple2(a'') 2. vIntermediateTupleDescList: Tuple1(a', b'<nullable>) 2. vSrcToOutputSMap: <Tuple1(a'), Tuple2(a'')> The slot in intermediatetuple corresponds to the slot in output tuple one by one through the expr calculation of the left child in vsrctooutputsmap. This code mainly merges the contents of two PRs: 1. [fix](vectorized) Support outer join for vectorized exec engine (https://github.com/apache/doris/pull/10323) 2. [Fix](Join) Fix the bug of outer join function under vectorization #9954 The following is the specific description of the first PR In a vectorized scenario, the query plan will generate a new tuple for the join node. This tuple mainly describes the output schema of the join node. Adding this tuple mainly solves the problem that the input schema of the join node is different from the output schema. For example: 1. The case where the null side column caused by outer join is converted to nullable. 2. The projection of the outer tuple. The following is the specific description of the second PR This pr mainly fixes the following problems: 1. Solve the query combined with inline view and outer join. After adding a tuple to the join operator, the position of the `tupleisnull` function is inconsistent with the row storage. Currently the vectorized `tupleisnull` will be calculated in the HashJoinNode.computeOutputTuple() function. 2. Column nullable property error problem. At present, once the outer join occurs, the column on the null-side side will be planned to be nullable in the semantic parsing stage. For example： ``` select * from (select a as k1 from test) tmp right join b on tmp.k1=b.k1 ``` At this time, the nullable property of column k1 in the `tmp` inline view should be true. In the vectorized code, the virtual `tableRef` of tmp will be used in constructing the output tuple of HashJoinNode (specifically, the function HashJoinNode.computeOutputTuple()). So the correctness of the column nullable property of this tableRef is very important. In the above case, since the tmp table needs to perform a right join with the b table, as a null-side tmp side, it is necessary to change the column attributes involved in the tmp table to nullable. In non-vectorized code, since the virtual tableRef tmp is not used at all, it uses the `TupleIsNull` function in `outputsmp` to ensure data correctness. That is to say, the a column of the original table test is still non-null, and it does not affect the correctness of the result. The vectorized nullable attribute requirements are very strict. Outer join will change the nullable attribute of the join column, thereby changing the nullable attribute of the column in the upper operator layer by layer. Since FE has no mechanism to modify the nullable attribute in the upper operator tuple layer by layer after the analyzer. So at present, we can only preset the attributes before the lower join as nullable in the analyzer stage in advance, so as to avoid the problem. (At the same time, be also wrote some evasive code in order to deal with the problem of null to non-null.) Co-authored-by: EmmyMiao87 Co-authored-by: HappenLee Co-authored-by: morrySnow Co-authored-by: EmmyMiao87 <522274284@qq.com>	2022-07-21 23:39:25 +08:00
Xinyi Zou	4960043f5e	[enhancement] Refactor to improve the usability of MemTracker (step2) (#10823 )	2022-07-21 17:11:28 +08:00
luozenglin	d5ea677282	[feature](tracing) Support query tracing to improve doris observability by introducing OpenTelemetry. (#10533 ) The collection of query traces is implemented in fe and be, and the spans are exported to zipkin. DSIP: https://cwiki.apache.org/confluence/display/DORIS/DSIP-012%3A+Introduce+opentelemetry	2022-07-09 15:50:40 +08:00
yiguolei	4ec6e3ee81	[refactor] Remove debug action since it is never used. (#10484 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-06-29 20:37:51 +08:00
Gabriel	eebfbd0c91	Revert "[fix](vectorized) Support outer join for vectorized exec engine (#10323 )" (#10424 ) This reverts commit 2cc670dba697a330358ae7d485d856e4b457c679.	2022-06-25 22:18:08 +08:00
HappenLee	2cc670dba6	[fix](vectorized) Support outer join for vectorized exec engine (#10323 ) In a vectorized scenario, the query plan will generate a new tuple for the join node. This tuple mainly describes the output schema of the join node. Adding this tuple mainly solves the problem that the input schema of the join node is different from the output schema. For example: 1. The case where the null side column caused by outer join is converted to nullable. 2. The projection of the outer tuple.	2022-06-24 08:59:30 +08:00
Pxl	fd0bd395ac	[Enhancement] Remove some unused include (#10035 )	2022-06-17 10:47:25 +08:00
hongbin	e61d296486	[Refactor] Replace '#ifndef' with '#pragma once' (#9456 ) * Replace '#ifndef' with '#pragma once'	2022-05-10 09:25:59 +08:00
Mingyu Chen	869fdff2f0	[refactor] add reference path for source file from impala (#9115 ) According to the requirements of the APLv2, the referenced code needs to be marked with the path of the source code.	2022-04-20 12:29:57 +08:00
Xinyi Zou	519305cb22	[feature-wip] (memory tracker) (step4) Switch TLS mem tracker to separate more detailed memory usage (#8669 ) Based on #8605, Separate out the memory usage of each operator from the Query/Load/StorageEngine mem tracker.	2022-04-08 09:02:26 +08:00
Xinyi Zou	e17aef9467	[refactor] refactor the implement of MemTracker, and related usage (#8322 ) Modify the implementation of MemTracker: 1. Simplify a lot of useless logic; 2. Added MemTrackerTaskPool, as the ancestor of all query and import trackers, This is used to track the local memory usage of all tasks executing; 3. Add cosume/release cache, trigger a cosume/release when the memory accumulation exceeds the parameter mem_tracker_consume_min_size_bytes; 4. Add a new memory leak detection mode (Experimental feature), throw an exception when the remaining statistical value is greater than the specified range when the MemTracker is destructed, and print the accurate statistical value in HTTP, the parameter memory_leak_detection 5. Added Virtual MemTracker, cosume/release will not sync to parent. It will be used when introducing TCMalloc Hook to record memory later, to record the specified memory independently; 6. Modify the GC logic, register the buffer cached in DiskIoMgr as a GC function, and add other GC functions later; 7. Change the global root node from Root MemTracker to Process MemTracker, and remove Process MemTracker in exec_env; 8. Modify the macro that detects whether the memory has reached the upper limit, modify the parameters and default behavior of creating MemTracker, modify the error message format in mem_limit_exceeded, extend and apply transfer_to, remove Metric in MemTracker, etc.; Modify where MemTracker is used: 1. MemPool adds a constructor to create a temporary tracker to avoid a lot of redundant code; 2. Added trackers for global objects such as ChunkAllocator and StorageEngine; 3. Added more fine-grained trackers such as ExprContext; 4. RuntimeState removes FragmentMemTracker, that is, PlanFragmentExecutor mem_tracker, which was previously used for independent statistical scan process memory, and replaces it with _scanner_mem_tracker in OlapScanNode; 5. MemTracker is no longer recorded in ReservationTracker, and ReservationTracker will be removed later;	2022-03-11 22:04:23 +08:00
HappenLee	e1d7233e9c	[feature](vectorization) Support Vectorized Exec Engine In Doris (#7785 ) # Proposed changes Issue Number: close #6238 Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com> Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com> Co-authored-by: wangbo <506340561@qq.com> Co-authored-by: emmymiao87 <522274284@qq.com> Co-authored-by: Pxl <952130278@qq.com> Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com> Co-authored-by: thinker <zchw100@qq.com> Co-authored-by: Zeno Yang <1521564989@qq.com> Co-authored-by: Wang Shuo <wangshuo128@gmail.com> Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com> Co-authored-by: Gabriel <gabrielleebuaa@gmail.com> Co-authored-by: xinghuayu007 <1450306854@qq.com> Co-authored-by: weizuo93 <weizuo@apache.org> Co-authored-by: yiguolei <guoleiyi@tencent.com> Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com> Co-authored-by: awakeljw <993007281@qq.com> Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com> Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com> ## Problem Summary: ### 1. Some code from clickhouse ClickHouse is an excellent implementation of the vectorized execution engine database, so here we have referenced and learned a lot from its excellent implementation in terms of data structure and function implementation. We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers. The following comment has been added to the code from Clickhouse, eg: // This file is copied from // https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h // and modified by Doris ### 2. Support exec node and query: * vaggregation_node * vanalytic_eval_node * vassert_num_rows_node * vblocking_join_node * vcross_join_node * vempty_set_node * ves_http_scan_node * vexcept_node * vexchange_node * vintersect_node * vmysql_scan_node * vodbc_scan_node * volap_scan_node * vrepeat_node * vschema_scan_node * vselect_node * vset_operation_node * vsort_node * vunion_node * vhash_join_node You can run exec engine of SSB/TPCH and 70% TPCDS stand query test set. ### 3. Data Model Vec Exec Engine Support Dup/Agg/Unq table, Support Block Reader Vectorized. Segment Vec is working in process. ### 4. How to use 1. Set the environment variable `set enable_vectorized_engine = true; `(required) 2. Set the environment variable `set batch_size = 4096; ` (recommended) ### 5. Some diff from origin exec engine https://github.com/doris-vectorized/doris-vectorized/issues/294 ## Checklist(Required) 1. Does it affect the original behavior: (No) 2. Has unit tests been added: (Yes) 3. Has document been added or modified: (No) 4. Does it need to update dependencies: (No) 5. Are there any changes that cannot be rolled back: (Yes)	2022-01-18 10:07:15 +08:00
zhoubintao	85521944dd	[refactor](olap-scan-node) Refactor olap scannode (#7131 ) 1. Delete useless variables 2. Add const modifier for read-only function 3. Delete the empty destructor, the compiler will automatically generate it, refer to the 3/5/0 rule: [https://en.cppreference.com/w/cpp/language/rule_of_three] 4. It is recommended to add the override keyword (instead of the virtual keyword) to the subclass virtual function. Override will let the compiler help check and improve security. This is also the reason why C++11 introduces override	2021-12-16 10:33:41 +08:00

1 2

67 Commits