doris

Author	SHA1	Message	Date
meiyi	d0cd535cb9	[improvement](insert) refactor group commit stream load (#25560 )	2023-10-20 13:27:30 +08:00
Kaijie Chen	11fecafb74	[fix](move-memtable) fallback if target table contains inverted index (#25498 )	2023-10-18 22:11:59 +08:00
HappenLee	dc9fa1a4f1	[Refactor](Sink) convert to tablet sink to tablet writer (#24474 )	2023-09-20 14:47:18 +08:00
meiyi	82dc970916	[feature](insert) Support group commit insert (#22829 )	2023-09-08 15:51:03 +08:00
HappenLee	c74ca15753	[pipeline](sink) Supprt Async Writer Sink of result file sink and memory scratch sink (#23589 )	2023-08-31 22:44:25 +08:00
Kaijie Chen	2b6d876280	[feature](move-memtable)[6/7] add options to enable memtable on sink node (#23470 ) Co-authored-by: Siyang Tang <82279870+TangSiyang2001@users.noreply.github.com>	2023-08-25 22:32:22 +08:00
HappenLee	5c2fae7ce5	[pipeline](exec) Refactor the table sink code in remove unless code (#23223 )	2023-08-22 20:42:14 +08:00
Gabriel	12075f9853	[pipelineX](projection) Support projection and blocking agg (#23256 )	2023-08-21 22:23:02 +08:00
Pxl	591aee528d	[Bug](exchange) change BlockSerializer from unique_ptr to object (#22653 )	2023-08-07 14:47:21 +08:00
Xinyi Zou	93b53cf2f4	[improvement](exception-safe) create and prepare node/sink support exception safe (#20551 )	2023-06-09 21:06:59 +08:00
Jerry Hu	9f8de89659	[refactor](exec) replace the single pointer with an array of 'conjuncts' in ExecNode (#19758 ) Refactoring the filtering conditions in the current ExecNode from an expression tree to an array can simplify the process of adding runtime filters. It eliminates the need for complex merge operations and removes the requirement for the frontend to combine expressions into a single entity. By representing the filtering conditions as an array, each condition can be treated individually, making it easier to add runtime filters without the need for complex merging logic. The array can store the individual conditions, and the runtime filter logic can iterate through the array to apply the filters as needed. This refactoring simplifies the codebase, improves readability, and reduces the complexity associated with handling filtering conditions and adding runtime filters. It separates the conditions into discrete entities, enabling more straightforward manipulation and management within the execution node.	2023-05-29 11:47:31 +08:00
HappenLee	ef0657c072	[Bug](pipeline) RegressionTest failed release resouce cause DCHECK failed (#19783 )	2023-05-18 18:57:25 +08:00
HappenLee	fe42e52851	[pipeline](CTE) Support multi stream data sink in pipeline (#19519 )	2023-05-18 10:34:37 +08:00
HappenLee	b68857902e	[Compile](BE) Fix compile failed with tcmalloc (#18748 )	2023-04-18 09:26:45 +08:00
yongjinhou	b59c4b4702	[fix](build) Fix missing header files (#18740 )	2023-04-17 21:22:15 +08:00
Adonis Ling	9e960f4c4f	[chore](build) Use include-what-you-use to optimize includes (#18681 ) Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.	2023-04-17 11:44:58 +08:00
Gabriel	0bb6005143	[Improvement](thrift) optimize thrift messages (#16383 ) Now we use a thrift message per fragment instance. However, there are many same messages between instances in a fragment. So this PR aims to extract the same messages and we only need to send thrift message once for a fragment	2023-02-16 11:07:46 +08:00
Gabriel	d0e8f84279	[feature](vectorized) Support MemoryScratchSink on vectorized engine (#15612 )	2023-01-10 10:38:35 +08:00
Gabriel	b085ff49f0	[refactor](non-vec) delete non-vec data sink (#15283 ) Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-12-23 14:10:47 +08:00
Gabriel	e9a201e0ec	[refactor](non-vec) delete some non-vec exec node (#15239 )	2022-12-22 14:05:51 +08:00
HappenLee	12304bc0ee	[Pipeline](exec) Support pipeline exec engine (#14736 ) Co-authored-by: Lijia Liu <liutang123@yeah.net> Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: Jerry Hu <mrhhsg@gmail.com> Co-authored-by: Pxl <952130278@qq.com> Co-authored-by: shee <13843187+qzsee@users.noreply.github.com> Co-authored-by: Gabriel <gabrielleebuaa@gmail.com> ## Problem Summary: ### 1. Design DSIP: https://cwiki.apache.org/confluence/display/DORIS/DSIP-027%3A+Support+Pipeline+Exec+Engine ### 2. How to use: Set the environment variable `set enable_pipeline_engine = true; `	2022-12-02 17:11:34 +08:00
Zhengguo Yang	12652ebb0e	[UDF](java udf) using config to enable java udf instead of macro at compile time (#14062 ) * [UDF](java udf) useing config to enable java udf instead of macro at compile time	2022-11-11 09:03:52 +08:00
zhangstar333	22a8d35999	[Feature](vectorized) support jdbc sink for insert into data to table (#12534 )	2022-09-15 11:08:41 +08:00
Xinyi Zou	4960043f5e	[enhancement] Refactor to improve the usability of MemTracker (step2) (#10823 )	2022-07-21 17:11:28 +08:00
HappenLee	d9b6e07e9d	[Vectorized] Support ODBC sink for vec exec engine (#11045 ) Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-07-20 19:09:41 +08:00
camby	a7df6e3dee	rename some files inside vec/sink dir (#10636 ) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-07-06 17:52:47 +08:00
Pxl	fd0bd395ac	[Enhancement] Remove some unused include (#10035 )	2022-06-17 10:47:25 +08:00
Gabriel	1220cc147d	[feature](vectorized) Support outfile on vectorized engine (#10013 ) This PR supports output csv format file on vectorized engine. Parquet is still not supported.	2022-06-10 09:15:53 +08:00
Adonis Ling	718a51a388	[refactor][style] Use clang-format to sort includes (#9483 )	2022-05-10 21:25:35 +08:00
chenlinzhong	c9961c9bb9	[style] clang-format all c++ code (#9305 ) - sh build-support/clang-format.sh to clang-format all c++ code	2022-04-29 16:14:22 +08:00
Mingyu Chen	869fdff2f0	[refactor] add reference path for source file from impala (#9115 ) According to the requirements of the APLv2, the referenced code needs to be marked with the path of the source code.	2022-04-20 12:29:57 +08:00
Xinyi Zou	e17aef9467	[refactor] refactor the implement of MemTracker, and related usage (#8322 ) Modify the implementation of MemTracker: 1. Simplify a lot of useless logic; 2. Added MemTrackerTaskPool, as the ancestor of all query and import trackers, This is used to track the local memory usage of all tasks executing; 3. Add cosume/release cache, trigger a cosume/release when the memory accumulation exceeds the parameter mem_tracker_consume_min_size_bytes; 4. Add a new memory leak detection mode (Experimental feature), throw an exception when the remaining statistical value is greater than the specified range when the MemTracker is destructed, and print the accurate statistical value in HTTP, the parameter memory_leak_detection 5. Added Virtual MemTracker, cosume/release will not sync to parent. It will be used when introducing TCMalloc Hook to record memory later, to record the specified memory independently; 6. Modify the GC logic, register the buffer cached in DiskIoMgr as a GC function, and add other GC functions later; 7. Change the global root node from Root MemTracker to Process MemTracker, and remove Process MemTracker in exec_env; 8. Modify the macro that detects whether the memory has reached the upper limit, modify the parameters and default behavior of creating MemTracker, modify the error message format in mem_limit_exceeded, extend and apply transfer_to, remove Metric in MemTracker, etc.; Modify where MemTracker is used: 1. MemPool adds a constructor to create a temporary tracker to avoid a lot of redundant code; 2. Added trackers for global objects such as ChunkAllocator and StorageEngine; 3. Added more fine-grained trackers such as ExprContext; 4. RuntimeState removes FragmentMemTracker, that is, PlanFragmentExecutor mem_tracker, which was previously used for independent statistical scan process memory, and replaces it with _scanner_mem_tracker in OlapScanNode; 5. MemTracker is no longer recorded in ReservationTracker, and ReservationTracker will be removed later;	2022-03-11 22:04:23 +08:00
zhangstar333	f8411f3c6a	[refactor](mysql_table_writer)split into two parts of vectorized and row mode (#8081 )	2022-02-17 11:29:25 +08:00
zhangstar333	25d64775d1	[Vectorized][Feature] Support mysql external table insert into stm (#7979 )	2022-02-15 14:58:58 +08:00
HappenLee	ef233701b3	[feature](vec)(load) Support vtablet sink to enable insert into by using vec query engine (#7957 ) Support vtablet sink to enable insert into query in vec query engine	2022-02-08 11:04:09 +08:00
Mingyu Chen	5fc0a9f40d	[improvement](Load) Cancel the load job ASAP when encounter unqualified data (#6319 ) This PR mainly changes: 1. Help to Cancel the load job ASAP when encounter unqualified data. Solution is described in #6318 . Also replace some std::stringstream with fmt::memory_buffer to avoid performance issues. 2. fix a NPE bug when create user with empty host 3. fix compile warning after rebasing the master(vectorization)	2022-01-18 13:13:55 +08:00
HappenLee	e1d7233e9c	[feature](vectorization) Support Vectorized Exec Engine In Doris (#7785 ) # Proposed changes Issue Number: close #6238 Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com> Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com> Co-authored-by: wangbo <506340561@qq.com> Co-authored-by: emmymiao87 <522274284@qq.com> Co-authored-by: Pxl <952130278@qq.com> Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com> Co-authored-by: thinker <zchw100@qq.com> Co-authored-by: Zeno Yang <1521564989@qq.com> Co-authored-by: Wang Shuo <wangshuo128@gmail.com> Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com> Co-authored-by: Gabriel <gabrielleebuaa@gmail.com> Co-authored-by: xinghuayu007 <1450306854@qq.com> Co-authored-by: weizuo93 <weizuo@apache.org> Co-authored-by: yiguolei <guoleiyi@tencent.com> Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com> Co-authored-by: awakeljw <993007281@qq.com> Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com> Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com> ## Problem Summary: ### 1. Some code from clickhouse ClickHouse is an excellent implementation of the vectorized execution engine database, so here we have referenced and learned a lot from its excellent implementation in terms of data structure and function implementation. We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers. The following comment has been added to the code from Clickhouse, eg: // This file is copied from // https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h // and modified by Doris ### 2. Support exec node and query: * vaggregation_node * vanalytic_eval_node * vassert_num_rows_node * vblocking_join_node * vcross_join_node * vempty_set_node * ves_http_scan_node * vexcept_node * vexchange_node * vintersect_node * vmysql_scan_node * vodbc_scan_node * volap_scan_node * vrepeat_node * vschema_scan_node * vselect_node * vset_operation_node * vsort_node * vunion_node * vhash_join_node You can run exec engine of SSB/TPCH and 70% TPCDS stand query test set. ### 3. Data Model Vec Exec Engine Support Dup/Agg/Unq table, Support Block Reader Vectorized. Segment Vec is working in process. ### 4. How to use 1. Set the environment variable `set enable_vectorized_engine = true; `(required) 2. Set the environment variable `set batch_size = 4096; ` (recommended) ### 5. Some diff from origin exec engine https://github.com/doris-vectorized/doris-vectorized/issues/294 ## Checklist(Required) 1. Does it affect the original behavior: (No) 2. Has unit tests been added: (Yes) 3. Has document been added or modified: (No) 4. Does it need to update dependencies: (No) 5. Are there any changes that cannot be rolled back: (Yes)	2022-01-18 10:07:15 +08:00
Zhengguo Yang	6c6380969b	[refactor] replace boost smart ptr with stl (#6856 ) 1. replace all boost::shared_ptr to std::shared_ptr 2. replace all boost::scopted_ptr to std::unique_ptr 3. replace all boost::scoped_array to std::unique<T[]> 4. replace all boost:thread to std::thread	2021-11-17 10:18:35 +08:00
EmmyMiao87	9469b2ce1a	[Outfile] Support concurrent export of query results (#6539 ) This pr mainly supports 1. Export query result sets concurrently 2. Query result set export supports s3 protocol Among them, there are several preconditions for concurrently exporting query result sets 1. Enable concurrent export variables 2. The query itself can be exported concurrently (some queries containing sort nodes at the top level cannot be exported concurrently) 3. Export the s3 protocol used instead of the broker After exporting the result set concurrently, the file prefix is changed to outfile_{query_instance_id}_filenumber.{file_format}	2021-09-07 11:53:32 +08:00
HappenLee	9216735cfa	[New Featrue] Support Vectorization Execution Engine Interface For Doris (#6329 ) 1. FE vectorized plan code 2. Function register vec function 3. Diff function nullable type 4. New thirdparty code and new thrift struct	2021-08-11 14:54:06 +08:00
Yingchun Lai	0131c33966	[Enhance] Improve the readability of memtrackers' name (#5455 ) Improve the readability of memtrackers' name, then you will be happy to read website be_ip:port/mem_tracker	2021-03-11 22:33:31 +08:00
Yingchun Lai	58e58c94d8	[TSAN] Fix tsan bugs (part 1) (#5162 ) ThreadSanitizer, aka TSAN, is a useful tool to detect multi-thread problems, such as data race, mutex problems, etc. We should detect TSAN problems for Doris BE, both unit tests and server should pass through TSAN mode, to make Doris more robustness. This is the very beginning patch to fix TSAN problems, and some difficult problems are suppressed in file 'tsan_suppressions', you can suppress these problems by setting: export TSAN_OPTIONS="suppressions=tsan_suppressions" before running: `BUILD_TYPE=tsan ./run-be-ut.sh --run`	2021-01-15 09:45:11 +08:00
HappenLee	115d4332aa	[ODBC] Support ODBC Sink for insert into data to ODBC external table (#5033 ) issue:#5031 1. Support ODBC Sink for insert into data to ODBC external table. 2. Support Transaction for ODBC sink to make sure insert into data is atomicital. 3. The document about ODBC sink has been modified	2020-12-13 21:53:27 +08:00
sduzh	6fedf5881b	[CodeFormat] Clang-format cpp sources (#4965 ) Clang-format all c++ source files.	2020-11-28 18:36:49 +08:00
HuangWei	10f822eb43	[MemTracker] make all MemTrackers shared (#4135 ) We make all MemTrackers shared, in order to show MemTracker real-time consumptions on the web. As follows: 1. nearly all MemTracker raw ptr -> shared_ptr 2. Use CreateTracker() to create new MemTracker(in order to add itself to its parent) 3. RowBatch & MemPool still use raw ptrs of MemTracker, it's easy to ensure RowBatch & MemPool destructor exec before MemTracker's destructor. So we don't change these code. 4. MemTracker can use RuntimeProfile's counter to calc consumption. So RuntimeProfile's counter need to be shared too. We add a shared counter pool to store the shared counter, don't change other counters of RuntimeProfile. Note that, this PR doesn't change the MemTracker tree structure. So there still have some orphan trackers, e.g. RowBlockV2's MemTracker. If you find some shared MemTrackers are little memory consumption & too time-consuming, you could make them be the orphan, then it's fine to use the raw ptr.	2020-07-31 21:57:21 +08:00
Yunfeng,Wu	e3348c46a9	Expose data pruned-filter-scan ability (#1527 )	2019-08-11 12:59:24 +08:00
lichaoyong	0d48a3961c	Refactor Storage Engine (#1478 ) NOTE: This patch would modify all Backend's data. And this will cause a very long time to restart be. So if you want to interferer your product environment, you should upgrade backend one by one. 1. Refactoring be is to clarify the structure the codes. 2. Use unique id to indicate a rowset. Nameing rowset with tablet_id and version will lead to many conflicts among compaction, clone, restore. 3. Extract an rowset interface to encapsulate rowsets with different format.	2019-07-15 21:18:22 +08:00
ZHAO Chun	9d03ba236b	Uniform Status (#1317 )	2019-06-14 23:38:31 +08:00
ZHAO Chun	934ca2481a	Make MySQL support optional (#1248 )	2019-06-05 12:28:15 +08:00
chenhao	0e5b193243	Add cpu and io indicates to audit log (#531 )	2019-01-17 12:43:15 +08:00

54 Commits