doris

Author	SHA1	Message	Date
Mingyu Chen	ef2151ae66	[Feature-WIP](multi-catalog) Add Hive sink on BE side. (#32306 ) (#32364 ) bp #32306 Co-authored-by: Qi Chen <kaka11.chen@gmail.com>	2024-03-18 11:23:01 +08:00
zclllyybb	b177b26d39	[branch-2.1](tracing) Pick pipeline tracing and relative bugfix (#31367 ) * [Feature](pipeline) Trace pipeline scheduling (part I) (#31027) * [fix](compile) Fix performance compile fail #31305 * [fix](compile) Fix macOS compilation issues for PURE macro and CPU core identification (#31357) * [fix](compile) Correct PURE macro definition to fix compilation on macOS * 2 --------- Co-authored-by: zy-kkk <zhongyk10@gmail.com>	2024-02-29 08:42:35 +08:00
wangbo	0433b8730d	[Feature](profile)add shuffle send rows/bytes #30456	2024-01-28 18:25:08 +08:00
wangbo	0d691c638b	[Feature](profile)Support report runtime workload statistics #29591	2024-01-12 11:59:27 +08:00
yiguolei	b142ade69e	[refactor](renamefile) rename some files according to the class names (#28606 )	2023-12-19 14:10:11 +08:00
yiguolei	73f7b61019	[refactor](scanner) use weak ptr to lock task execution context to avoid core in scanner dctor (#28493 ) using weak ptr as a lock between fragment execute thread and scanner thread, to solve the core problem in scanner's dctor to access scannode's profile.	2023-12-18 14:09:32 +08:00
meiyi	1e5ff40e17	[refactor](group commit) remove future block (#27720 ) Co-authored-by: huanghaibin <284824253@qq.com>	2023-12-11 08:41:51 +08:00
Kaijie Chen	e29d8cb110	[feature](move-memtable) support pipelineX in sink v2 (#27067 )	2023-11-16 15:00:55 +08:00
Kaijie Chen	a4c9beba85	[fix](move-memtable) fallback if partial update (#25801 )	2023-10-24 10:29:59 +08:00
Kaijie Chen	2e97044706	[fix](move-memtable) fix inverted index condition (#25684 )	2023-10-20 17:37:39 +08:00
meiyi	d0cd535cb9	[improvement](insert) refactor group commit stream load (#25560 )	2023-10-20 13:27:30 +08:00
Kaijie Chen	11fecafb74	[fix](move-memtable) fallback if target table contains inverted index (#25498 )	2023-10-18 22:11:59 +08:00
HappenLee	dc9fa1a4f1	[Refactor](Sink) convert to tablet sink to tablet writer (#24474 )	2023-09-20 14:47:18 +08:00
meiyi	82dc970916	[feature](insert) Support group commit insert (#22829 )	2023-09-08 15:51:03 +08:00
HappenLee	c74ca15753	[pipeline](sink) Supprt Async Writer Sink of result file sink and memory scratch sink (#23589 )	2023-08-31 22:44:25 +08:00
Kaijie Chen	2b6d876280	[feature](move-memtable)[6/7] add options to enable memtable on sink node (#23470 ) Co-authored-by: Siyang Tang <82279870+TangSiyang2001@users.noreply.github.com>	2023-08-25 22:32:22 +08:00
HappenLee	5c2fae7ce5	[pipeline](exec) Refactor the table sink code in remove unless code (#23223 ) Refactor the table sink code in remove unless code	2023-08-22 20:42:14 +08:00
Gabriel	12075f9853	[pipelineX](projection) Support projection and blocking agg (#23256 )	2023-08-21 22:23:02 +08:00
Pxl	591aee528d	[Bug](exchange) change BlockSerializer from unique_ptr to object (#22653 ) change BlockSerializer from unique_ptr to object	2023-08-07 14:47:21 +08:00
Xinyi Zou	93b53cf2f4	[improvement](exception-safe) create and prepare node/sink support exception safe (#20551 )	2023-06-09 21:06:59 +08:00
Jerry Hu	9f8de89659	[refactor](exec) replace the single pointer with an array of 'conjuncts' in ExecNode (#19758 ) Refactoring the filtering conditions in the current ExecNode from an expression tree to an array can simplify the process of adding runtime filters. It eliminates the need for complex merge operations and removes the requirement for the frontend to combine expressions into a single entity. By representing the filtering conditions as an array, each condition can be treated individually, making it easier to add runtime filters without the need for complex merging logic. The array can store the individual conditions, and the runtime filter logic can iterate through the array to apply the filters as needed. This refactoring simplifies the codebase, improves readability, and reduces the complexity associated with handling filtering conditions and adding runtime filters. It separates the conditions into discrete entities, enabling more straightforward manipulation and management within the execution node.	2023-05-29 11:47:31 +08:00
HappenLee	ef0657c072	[Bug](pipeline) RegressionTest failed release resouce cause DCHECK failed (#19783 ) RegressionTest failed release resouce cause DCHECK failed	2023-05-18 18:57:25 +08:00
HappenLee	fe42e52851	[pipeline](CTE) Support multi stream data sink in pipeline (#19519 )	2023-05-18 10:34:37 +08:00
HappenLee	b68857902e	[Compile](BE) Fix compile failed with tcmalloc (#18748 )	2023-04-18 09:26:45 +08:00
yongjinhou	b59c4b4702	[fix](build) Fix missing header files (#18740 )	2023-04-17 21:22:15 +08:00
Adonis Ling	9e960f4c4f	[chore](build) Use include-what-you-use to optimize includes (#18681 ) Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.	2023-04-17 11:44:58 +08:00
Gabriel	0bb6005143	[Improvement](thrift) optimize thrift messages (#16383 ) Now we use a thrift message per fragment instance. However, there are many same messages between instances in a fragment. So this PR aims to extract the same messages and we only need to send thrift message once for a fragment	2023-02-16 11:07:46 +08:00
Gabriel	d0e8f84279	[feature](vectorized) Support MemoryScratchSink on vectorized engine (#15612 )	2023-01-10 10:38:35 +08:00
Gabriel	b085ff49f0	[refactor](non-vec) delete non-vec data sink (#15283 ) * [refactor](non-vec) delete non-vec data sink Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-12-23 14:10:47 +08:00
Gabriel	e9a201e0ec	[refactor](non-vec) delete some non-vec exec node (#15239 ) * [refactor](non-vec) delete some non-vec exec node	2022-12-22 14:05:51 +08:00
HappenLee	12304bc0ee	[Pipeline](exec) Support pipeline exec engine (#14736 ) Co-authored-by: Lijia Liu <liutang123@yeah.net> Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: Jerry Hu <mrhhsg@gmail.com> Co-authored-by: Pxl <952130278@qq.com> Co-authored-by: shee <13843187+qzsee@users.noreply.github.com> Co-authored-by: Gabriel <gabrielleebuaa@gmail.com> ## Problem Summary: ### 1. Design DSIP: https://cwiki.apache.org/confluence/display/DORIS/DSIP-027%3A+Support+Pipeline+Exec+Engine ### 2. How to use: Set the environment variable `set enable_pipeline_engine = true; `	2022-12-02 17:11:34 +08:00
Zhengguo Yang	12652ebb0e	[UDF](java udf) using config to enable java udf instead of macro at compile time (#14062 ) * [UDF](java udf) useing config to enable java udf instead of macro at compile time	2022-11-11 09:03:52 +08:00
zhangstar333	22a8d35999	[Feature](vectorized) support jdbc sink for insert into data to table (#12534 )	2022-09-15 11:08:41 +08:00
Xinyi Zou	4960043f5e	[enhancement] Refactor to improve the usability of MemTracker (step2) (#10823 )	2022-07-21 17:11:28 +08:00
HappenLee	d9b6e07e9d	[Vectorized] Support ODBC sink for vec exec engine (#11045 ) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-07-20 19:09:41 +08:00
camby	a7df6e3dee	rename some files inside vec/sink dir (#10636 ) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-07-06 17:52:47 +08:00
Pxl	fd0bd395ac	[Enhancement] Remove some unused include (#10035 )	2022-06-17 10:47:25 +08:00
Gabriel	1220cc147d	[feature](vectorized) Support outfile on vectorized engine (#10013 ) This PR supports output csv format file on vectorized engine. Parquet is still not supported.	2022-06-10 09:15:53 +08:00
Adonis Ling	718a51a388	[refactor][style] Use clang-format to sort includes (#9483 )	2022-05-10 21:25:35 +08:00
chenlinzhong	c9961c9bb9	[style] clang-format all c++ code (#9305 ) - sh build-support/clang-format.sh to clang-format all c++ code	2022-04-29 16:14:22 +08:00
Mingyu Chen	869fdff2f0	[refactor] add reference path for source file from impala (#9115 ) According to the requirements of the APLv2, the referenced code needs to be marked with the path of the source code.	2022-04-20 12:29:57 +08:00
Xinyi Zou	e17aef9467	[refactor] refactor the implement of MemTracker, and related usage (#8322 ) Modify the implementation of MemTracker: 1. Simplify a lot of useless logic; 2. Added MemTrackerTaskPool, as the ancestor of all query and import trackers, This is used to track the local memory usage of all tasks executing; 3. Add cosume/release cache, trigger a cosume/release when the memory accumulation exceeds the parameter mem_tracker_consume_min_size_bytes; 4. Add a new memory leak detection mode (Experimental feature), throw an exception when the remaining statistical value is greater than the specified range when the MemTracker is destructed, and print the accurate statistical value in HTTP, the parameter memory_leak_detection 5. Added Virtual MemTracker, cosume/release will not sync to parent. It will be used when introducing TCMalloc Hook to record memory later, to record the specified memory independently; 6. Modify the GC logic, register the buffer cached in DiskIoMgr as a GC function, and add other GC functions later; 7. Change the global root node from Root MemTracker to Process MemTracker, and remove Process MemTracker in exec_env; 8. Modify the macro that detects whether the memory has reached the upper limit, modify the parameters and default behavior of creating MemTracker, modify the error message format in mem_limit_exceeded, extend and apply transfer_to, remove Metric in MemTracker, etc.; Modify where MemTracker is used: 1. MemPool adds a constructor to create a temporary tracker to avoid a lot of redundant code; 2. Added trackers for global objects such as ChunkAllocator and StorageEngine; 3. Added more fine-grained trackers such as ExprContext; 4. RuntimeState removes FragmentMemTracker, that is, PlanFragmentExecutor mem_tracker, which was previously used for independent statistical scan process memory, and replaces it with _scanner_mem_tracker in OlapScanNode; 5. MemTracker is no longer recorded in ReservationTracker, and ReservationTracker will be removed later;	2022-03-11 22:04:23 +08:00
zhangstar333	f8411f3c6a	[refactor](mysql_table_writer)split into two parts of vectorized and row mode (#8081 )	2022-02-17 11:29:25 +08:00
zhangstar333	25d64775d1	[Vectorized][Feature] Support mysql external table insert into stm (#7979 )	2022-02-15 14:58:58 +08:00
HappenLee	ef233701b3	[feature](vec)(load) Support vtablet sink to enable insert into by using vec query engine (#7957 ) Support vtablet sink to enable insert into query in vec query engine	2022-02-08 11:04:09 +08:00
Mingyu Chen	5fc0a9f40d	[improvement](Load) Cancel the load job ASAP when encounter unqualified data (#6319 ) This PR mainly changes: 1. Help to Cancel the load job ASAP when encounter unqualified data. Solution is described in #6318 . Also replace some std::stringstream with fmt::memory_buffer to avoid performance issues. 2. fix a NPE bug when create user with empty host 3. fix compile warning after rebasing the master(vectorization)	2022-01-18 13:13:55 +08:00
HappenLee	e1d7233e9c	[feature](vectorization) Support Vectorized Exec Engine In Doris (#7785 ) # Proposed changes Issue Number: close #6238 Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com> Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com> Co-authored-by: wangbo <506340561@qq.com> Co-authored-by: emmymiao87 <522274284@qq.com> Co-authored-by: Pxl <952130278@qq.com> Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com> Co-authored-by: thinker <zchw100@qq.com> Co-authored-by: Zeno Yang <1521564989@qq.com> Co-authored-by: Wang Shuo <wangshuo128@gmail.com> Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com> Co-authored-by: Gabriel <gabrielleebuaa@gmail.com> Co-authored-by: xinghuayu007 <1450306854@qq.com> Co-authored-by: weizuo93 <weizuo@apache.org> Co-authored-by: yiguolei <guoleiyi@tencent.com> Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com> Co-authored-by: awakeljw <993007281@qq.com> Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com> Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com> ## Problem Summary: ### 1. Some code from clickhouse ClickHouse is an excellent implementation of the vectorized execution engine database, so here we have referenced and learned a lot from its excellent implementation in terms of data structure and function implementation. We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers. The following comment has been added to the code from Clickhouse, eg: // This file is copied from // https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h // and modified by Doris ### 2. Support exec node and query: * vaggregation_node * vanalytic_eval_node * vassert_num_rows_node * vblocking_join_node * vcross_join_node * vempty_set_node * ves_http_scan_node * vexcept_node * vexchange_node * vintersect_node * vmysql_scan_node * vodbc_scan_node * volap_scan_node * vrepeat_node * vschema_scan_node * vselect_node * vset_operation_node * vsort_node * vunion_node * vhash_join_node You can run exec engine of SSB/TPCH and 70% TPCDS stand query test set. ### 3. Data Model Vec Exec Engine Support Dup/Agg/Unq table, Support Block Reader Vectorized. Segment Vec is working in process. ### 4. How to use 1. Set the environment variable `set enable_vectorized_engine = true; `(required) 2. Set the environment variable `set batch_size = 4096; ` (recommended) ### 5. Some diff from origin exec engine https://github.com/doris-vectorized/doris-vectorized/issues/294 ## Checklist(Required) 1. Does it affect the original behavior: (No) 2. Has unit tests been added: (Yes) 3. Has document been added or modified: (No) 4. Does it need to update dependencies: (No) 5. Are there any changes that cannot be rolled back: (Yes)	2022-01-18 10:07:15 +08:00
Zhengguo Yang	6c6380969b	[refactor] replace boost smart ptr with stl (#6856 ) 1. replace all boost::shared_ptr to std::shared_ptr 2. replace all boost::scopted_ptr to std::unique_ptr 3. replace all boost::scoped_array to std::unique<T[]> 4. replace all boost:thread to std::thread	2021-11-17 10:18:35 +08:00
EmmyMiao87	9469b2ce1a	[Outfile] Support concurrent export of query results (#6539 ) This pr mainly supports 1. Export query result sets concurrently 2. Query result set export supports s3 protocol Among them, there are several preconditions for concurrently exporting query result sets 1. Enable concurrent export variables 2. The query itself can be exported concurrently (some queries containing sort nodes at the top level cannot be exported concurrently) 3. Export the s3 protocol used instead of the broker After exporting the result set concurrently, the file prefix is changed to outfile_{query_instance_id}_filenumber.{file_format}	2021-09-07 11:53:32 +08:00
HappenLee	9216735cfa	[New Featrue] Support Vectorization Execution Engine Interface For Doris (#6329 ) 1. FE vectorized plan code 2. Function register vec function 3. Diff function nullable type 4. New thirdparty code and new thrift struct	2021-08-11 14:54:06 +08:00

1 2

64 Commits