doris

Author	SHA1	Message	Date
Pxl	5688c28364	[Bug](runtime-filter) try to fix heap use after free on runtime filter send filter size (#33465 ) (#33522 )	2024-04-11 13:10:24 +08:00
Xinyi Zou	cf7595d423	[opt](memory) Optimize mem tracker accuracy (#32039 ) (#33140 )	2024-04-10 11:42:19 +08:00
wangbo	0d691c638b	[Feature](profile)Support report runtime workload statistics #29591	2024-01-12 11:59:27 +08:00
yiguolei	fc4ca712ed	[bugfix](core) using weak ptr in data stream receiver to avoid runtime state is deconstructed (#29410 )	2024-01-12 11:48:39 +08:00
yiguolei	8cf360fff7	[refactor](closure) remove ref count closure using auto release closure (#26718 ) 1. closure should be managed by a unique ptr and released by brpc , should not hold by our code. If hold by our code, we need to wait brpc finished during cancel or close. 2. closure should be exception safe, if any exception happens, should not memory leak. 3. using a specific callback interface to be implemented by Doris's code, we could write any code and doris should manage callback's lifecycle. 4. using a weak ptr between callback and closure. If callback is deconstruted before closure'Run, should not core.	2023-11-12 11:57:46 +08:00
bobhan1	642e5cdb69	[Fix](Status) Make `Status` `[[nodiscard]]` and handle returned `Status` correctly (#23395 )	2023-09-29 22:38:52 +08:00
Gabriel	12075f9853	[pipelineX](projection) Support projection and blocking agg (#23256 )	2023-08-21 22:23:02 +08:00
Gabriel	d585a8acc1	[Improvement](shuffle) Accumulate rows in a batch for shuffling (#22218 )	2023-08-01 09:55:06 +08:00
Pxl	19ba6bec38	[Improvement](pipeline) support send eos on local exchange and remove some unused code (#22086 ) support send eos on local exchange and remove some unused code	2023-07-24 09:25:32 +08:00
Adonis Ling	16a394da0e	[chore](build) Use include-what-you-use to optimize includes (PART III) (#18958 ) Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.	2023-04-24 14:51:51 +08:00
HappenLee	f24659c003	[Refactor](pipeline) refactor the code of channel buffer limit and change the default value (#15650 )	2023-01-06 14:52:43 +08:00
plat1ko	f3aea7f0f0	[Enhancement](status) Unify error code and enable customed err msg for BE internal errors (#14744 )	2022-12-11 23:33:18 +08:00
HappenLee	12304bc0ee	[Pipeline](exec) Support pipeline exec engine (#14736 ) Co-authored-by: Lijia Liu <liutang123@yeah.net> Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: Jerry Hu <mrhhsg@gmail.com> Co-authored-by: Pxl <952130278@qq.com> Co-authored-by: shee <13843187+qzsee@users.noreply.github.com> Co-authored-by: Gabriel <gabrielleebuaa@gmail.com> ## Problem Summary: ### 1. Design DSIP: https://cwiki.apache.org/confluence/display/DORIS/DSIP-027%3A+Support+Pipeline+Exec+Engine ### 2. How to use: Set the environment variable `set enable_pipeline_engine = true; `	2022-12-02 17:11:34 +08:00
Xinyi Zou	0b945fe361	[enhancement](memtracker) Refactor mem tracker hierarchy (#13585 ) mem tracker can be logically divided into 4 layers: 1)process 2)type 3)query/load/compation task etc. 4)exec node etc. type includes enum Type { GLOBAL = 0, // Life cycle is the same as the process, e.g. Cache and default Orphan QUERY = 1, // Count the memory consumption of all Query tasks. LOAD = 2, // Count the memory consumption of all Load tasks. COMPACTION = 3, // Count the memory consumption of all Base and Cumulative tasks. SCHEMA_CHANGE = 4, // Count the memory consumption of all SchemaChange tasks. CLONE = 5, // Count the memory consumption of all EngineCloneTask. Note: Memory that does not contain make/release snapshots. BATCHLOAD = 6, // Count the memory consumption of all EngineBatchLoadTask. CONSISTENCY = 7 // Count the memory consumption of all EngineChecksumTask. } Object pointers are no longer saved between each layer, and the values of process and each type are periodically aggregated. other fix: In [fix](memtracker) Fix transmit_tracker null pointer because phamp is not thread safe #13528, I tried to separate the memory that was manually abandoned in the query from the orphan mem tracker. But in the actual test, the accuracy of this part of the memory cannot be guaranteed, so put it back to the orphan mem tracker again.	2022-11-08 09:52:33 +08:00
Pxl	bdcb600f3d	[Bug](load) fix core dump on big block load (#13014 )	2022-10-10 12:38:32 +08:00
Adonis Ling	f377c26bf7	[refactor][be] Optimize headers (#9708 )	2022-05-30 16:12:10 +08:00
Zhengguo Yang	5a44eeaf62	[refactor] Unify all unit tests into one binary file (#8958 ) 1. solved the previous delayed unit test file size is too large (1.7G+) and the unit test link time is too long problem problems 2. Unify all unit tests into one file to significantly reduce unit test execution time to less than 3 mins 3. temporarily disable stream_load_test.cpp, metrics_action_test.cpp, load_channel_mgr_test.cpp because it will re-implement part of the code and affect other tests	2022-04-12 15:30:40 +08:00
Mingyu Chen	3048ce8a4f	[improvement][refactor](vec) Refactor serde of vec block and using brpc attachment (#7939 ) This PR mainly changes: 1. Change the define of PBlock The new PBlock consists of a set of PColumnMeta and a binary buffer. The PColumnMeta records the metadata information of all columns in the Block, while the buffer stores the serialized binary data of all columns. 2. Refactor the serialize/deserialize method of data type Rewrite the `serialize()/deserialize()` of IDataType. And also add a new method `get_uncompressed_serialized_bytes()` to get the total length of uncompressed serialized data of a column. 3. Rewrite the serialize/deserialize method of Block Now, when serializing a Block to PBlock, it will first get the total length of uncompressed serialized data of all columns in this Block, and then allocate the memory to write the serialized data to the buffer. 4. Use brpc attachment to transmit the serialized column data	2022-02-08 11:11:42 +08:00
Zhengguo Yang	f8d086d87f	[feature](rpc) (experimental)Support implement UDF through GRPC protocol. (#7519 ) Support implement UDF through GRPC protocol. This brings several benefits: 1. The udf implementation language is not limited to c++, users can use any familiar language to implement udf 2. UDF is decoupled from Doris, udf will not cause doris coredump, udf computing resources are separated from doris, and doris services are not affected But RPC's UDF has a fixed overhead, so its performance is much slower than C++ UDF, especially when the amount of data is large. Create function like ``` CREATE FUNCTION rpc_add(INT, INT) RETURNS INT PROPERTIES ( "SYMBOL"="add_int", "OBJECT_FILE"="127.0.0.1:9999", "TYPE"="RPC" ); ``` Function service need to implement `check_fn` and `fn_call` methods Note: THIS IS AN EXPERIMENTAL FEATURE, THE INTERFACE AND DATA STRUCTURE MAY BE CHANGED IN FUTURE !!!	2022-02-08 09:25:09 +08:00
Mingyu Chen	5fc0a9f40d	[improvement](Load) Cancel the load job ASAP when encounter unqualified data (#6319 ) This PR mainly changes: 1. Help to Cancel the load job ASAP when encounter unqualified data. Solution is described in #6318 . Also replace some std::stringstream with fmt::memory_buffer to avoid performance issues. 2. fix a NPE bug when create user with empty host 3. fix compile warning after rebasing the master(vectorization)	2022-01-18 13:13:55 +08:00
HappenLee	e1d7233e9c	[feature](vectorization) Support Vectorized Exec Engine In Doris (#7785 ) # Proposed changes Issue Number: close #6238 Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com> Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com> Co-authored-by: wangbo <506340561@qq.com> Co-authored-by: emmymiao87 <522274284@qq.com> Co-authored-by: Pxl <952130278@qq.com> Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com> Co-authored-by: thinker <zchw100@qq.com> Co-authored-by: Zeno Yang <1521564989@qq.com> Co-authored-by: Wang Shuo <wangshuo128@gmail.com> Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com> Co-authored-by: Gabriel <gabrielleebuaa@gmail.com> Co-authored-by: xinghuayu007 <1450306854@qq.com> Co-authored-by: weizuo93 <weizuo@apache.org> Co-authored-by: yiguolei <guoleiyi@tencent.com> Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com> Co-authored-by: awakeljw <993007281@qq.com> Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com> Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com> ## Problem Summary: ### 1. Some code from clickhouse ClickHouse is an excellent implementation of the vectorized execution engine database, so here we have referenced and learned a lot from its excellent implementation in terms of data structure and function implementation. We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers. The following comment has been added to the code from Clickhouse, eg: // This file is copied from // https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h // and modified by Doris ### 2. Support exec node and query: * vaggregation_node * vanalytic_eval_node * vassert_num_rows_node * vblocking_join_node * vcross_join_node * vempty_set_node * ves_http_scan_node * vexcept_node * vexchange_node * vintersect_node * vmysql_scan_node * vodbc_scan_node * volap_scan_node * vrepeat_node * vschema_scan_node * vselect_node * vset_operation_node * vsort_node * vunion_node * vhash_join_node You can run exec engine of SSB/TPCH and 70% TPCDS stand query test set. ### 3. Data Model Vec Exec Engine Support Dup/Agg/Unq table, Support Block Reader Vectorized. Segment Vec is working in process. ### 4. How to use 1. Set the environment variable `set enable_vectorized_engine = true; `(required) 2. Set the environment variable `set batch_size = 4096; ` (recommended) ### 5. Some diff from origin exec engine https://github.com/doris-vectorized/doris-vectorized/issues/294 ## Checklist(Required) 1. Does it affect the original behavior: (No) 2. Has unit tests been added: (Yes) 3. Has document been added or modified: (No) 4. Does it need to update dependencies: (No) 5. Are there any changes that cannot be rolled back: (Yes)	2022-01-18 10:07:15 +08:00

21 Commits