doris

Author	SHA1	Message	Date
Xinyi Zou	5846b3fc54	[fix](memory) Remove PODArray peak allocated memory tracking #18010 #11740 , solved the problem that the query memory statistics are higher than the actual physical memory, because PODArray does not have memset 0 when allocating memory, and the query mem tracker is virtual memory. But in extreme cases, such as csv load, PODArray frequent insert will cause performance problems. So revert part of #11740 and part of #12820. The accuracy of the query mem tracker, there is currently no feedback, no further attention.	2023-03-26 09:45:10 +08:00
starocean999	6d2e6d85d3	[enhancement](be)release memory in Node's close() method (#14258 ) * [enhancement](be)release memory in Node's close() method * format code	2022-11-15 15:59:23 +08:00
HappenLee	d2be5096d6	[Revert](mem) revert the mem config cause perfermace degradation (#13526 ) * Revert "[fix](mem) failure of allocating memory (#13414)" This reverts commit 971eb9172f3e925c0b46ec1ffd1a9037a1b49801. * Revert "[improvement](memory) disable page cache and chunk allocator, optimize memory allocate size (#13285)" This reverts commit a5f3880649b094b58061f25c15dccdb50a4a2973.	2022-10-21 08:32:16 +08:00
yiguolei	1e42598fe6	[memory](podarray) revert not allocate too much memory in podarray change (#13457 ) revert not allocate too much memory in podarray change	2022-10-19 14:08:44 +08:00
Adonis Ling	125def5102	[enhancement](macOS M1) Support building from source on macOS (M1) (#13195 ) # Proposed changes This PR fixed lots of issues when building from source on macOS with Apple M1 chip. ## ATTENTION The job for supporting macOS with Apple M1 chip is too big and there are lots of unresolved issues during runtime: 1. Some errors with memory tracker occur when BE (RELEASE) starts. 2. Some UT cases fail. ... Temporarily, the following changes are made on macOS to start BE successfully. 1. Disable memory tracker. 2. Use tcmalloc instead of jemalloc. This PR kicks off the job. Guys who are interested in this job can continue to fix these runtime issues. ## Use case ```shell ./build.sh -j 8 --be --clean cd output/be/bin ulimit -n 60000 ./start_be.sh --daemon ``` ## Something else It takes around _10+_ minutes to build BE (with prebuilt third-parties) on macOS with M1 chip. We will improve the development experience on macOS greatly when we finish the adaptation job.	2022-10-18 13:10:13 +08:00
yiguolei	a5f3880649	[improvement](memory) disable page cache and chunk allocator, optimize memory allocate size (#13285 ) disable page cache by default disable chunk allocator by default not use chunk allocator for vectorized allocator by default add a new config memory_linear_growth_threshold = 128Mb, not allocate memory by RoundUpToPowerOf2 if the allocated size is larger than this threshold. This config is added to MemPool, ChunkAllocator, PodArray, Arena.	2022-10-15 17:27:17 +08:00
Yongqiang YANG	a7d42b5d81	[fix](streamload&sink) release and allocate memory in the same tracker (#12820 ) 1. HttpServer threads allocate bytebuffer and put them into streamload pipe, but scanner thread release them with query tracker. 2. We can assume brpc allocate memory in doris thread. Above problems leads to wrong result of memtracker.	2022-09-23 17:51:44 +08:00
Xinyi Zou	b41eaa5ac0	[fix](memtracker) Introduce orphan mem tracker to verify memory tracking accuracy (#12794 ) The mem hook consumes the orphan tracker by default. If the thread does not attach other trackers, by default all consumption will be passed to the process tracker through the orphan tracker. In real time, consumption of all other trackers + orphan tracker consumption = process tracker consumption. Ideally, all threads are expected to attach to the specified tracker, so that "all memory has its own ownership", and the consumption of the orphan mem tracker is close to 0, but greater than 0.	2022-09-21 15:47:10 +08:00
Xinyi Zou	f72d2559cf	[fix](compile) Fix compile error '<unknown>' may be used uninitialized in PODArray::insert_prepare #12202	2022-08-31 09:12:28 +08:00
Xinyi Zou	8370115cf6	[enhancement](memtracker) Improve performance of tracking real physical memory of PODArray #12168	2022-08-30 10:22:12 +08:00
Xinyi Zou	1304a17600	[fix](memtracker) Improve performance of tracking real physical memory of PodArray #12021	2022-08-24 14:24:14 +08:00
Xinyi Zou	2a1803c646	[enhancement](memtracker) Optimize query memory accuracy (#11740 ) Currently, only the virtual memory used by the query can be tracked through the tcmalloc hook. When the memory is not fully used after the application, the recorded virtual memory will be larger than the physical memory. At present, it is mainly because PODArray does not memset 0 when applying for memory, and blocks applied for through PODArray in places such as VOlapScanNode::_free_blocks are usually used for memory reuse and cannot be fully used.	2022-08-16 14:23:28 +08:00
chenlinzhong	c9961c9bb9	[style] clang-format all c++ code (#9305 ) - sh build-support/clang-format.sh to clang-format all c++ code	2022-04-29 16:14:22 +08:00
HappenLee	358bd79fb1	[improvement](vec)(Join) Mem reuse to speed up join operator (#7905 ) 1. Reuse the mem of output block in vec join node 2. Add the function `replicate` in column	2022-01-31 22:14:12 +08:00
zuochunwei	1ba20b1dbb	[improvement](storage) improving Column inserter (#7855 ) * optimize Column inserter * DCHECK * DCHECK Co-authored-by: zuochunwei <zuochunwei@meituan.com>	2022-01-27 14:18:15 +08:00
HappenLee	e1d7233e9c	[feature](vectorization) Support Vectorized Exec Engine In Doris (#7785 ) # Proposed changes Issue Number: close #6238 Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com> Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com> Co-authored-by: wangbo <506340561@qq.com> Co-authored-by: emmymiao87 <522274284@qq.com> Co-authored-by: Pxl <952130278@qq.com> Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com> Co-authored-by: thinker <zchw100@qq.com> Co-authored-by: Zeno Yang <1521564989@qq.com> Co-authored-by: Wang Shuo <wangshuo128@gmail.com> Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com> Co-authored-by: Gabriel <gabrielleebuaa@gmail.com> Co-authored-by: xinghuayu007 <1450306854@qq.com> Co-authored-by: weizuo93 <weizuo@apache.org> Co-authored-by: yiguolei <guoleiyi@tencent.com> Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com> Co-authored-by: awakeljw <993007281@qq.com> Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com> Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com> ## Problem Summary: ### 1. Some code from clickhouse ClickHouse is an excellent implementation of the vectorized execution engine database, so here we have referenced and learned a lot from its excellent implementation in terms of data structure and function implementation. We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers. The following comment has been added to the code from Clickhouse, eg: // This file is copied from // https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h // and modified by Doris ### 2. Support exec node and query: * vaggregation_node * vanalytic_eval_node * vassert_num_rows_node * vblocking_join_node * vcross_join_node * vempty_set_node * ves_http_scan_node * vexcept_node * vexchange_node * vintersect_node * vmysql_scan_node * vodbc_scan_node * volap_scan_node * vrepeat_node * vschema_scan_node * vselect_node * vset_operation_node * vsort_node * vunion_node * vhash_join_node You can run exec engine of SSB/TPCH and 70% TPCDS stand query test set. ### 3. Data Model Vec Exec Engine Support Dup/Agg/Unq table, Support Block Reader Vectorized. Segment Vec is working in process. ### 4. How to use 1. Set the environment variable `set enable_vectorized_engine = true; `(required) 2. Set the environment variable `set batch_size = 4096; ` (recommended) ### 5. Some diff from origin exec engine https://github.com/doris-vectorized/doris-vectorized/issues/294 ## Checklist(Required) 1. Does it affect the original behavior: (No) 2. Has unit tests been added: (Yes) 3. Has document been added or modified: (No) 4. Does it need to update dependencies: (No) 5. Are there any changes that cannot be rolled back: (Yes)	2022-01-18 10:07:15 +08:00

16 Commits