Commit Graph

16 Commits

Author SHA1 Message Date
5846b3fc54 [fix](memory) Remove PODArray peak allocated memory tracking #18010
#11740 , solved the problem that the query memory statistics are higher than the actual physical memory, because PODArray does not have memset 0 when allocating memory, and the query mem tracker is virtual memory.

But in extreme cases, such as csv load, PODArray frequent insert will cause performance problems. So revert part of #11740 and part of #12820.

The accuracy of the query mem tracker, there is currently no feedback, no further attention.
2023-03-26 09:45:10 +08:00
6d2e6d85d3 [enhancement](be)release memory in Node's close() method (#14258)
* [enhancement](be)release memory in Node's close() method

* format code
2022-11-15 15:59:23 +08:00
d2be5096d6 [Revert](mem) revert the mem config cause perfermace degradation (#13526)
* Revert "[fix](mem) failure of allocating memory (#13414)"

This reverts commit 971eb9172f3e925c0b46ec1ffd1a9037a1b49801.

* Revert "[improvement](memory) disable page cache and chunk allocator, optimize memory allocate size (#13285)"

This reverts commit a5f3880649b094b58061f25c15dccdb50a4a2973.
2022-10-21 08:32:16 +08:00
1e42598fe6 [memory](podarray) revert not allocate too much memory in podarray change (#13457)
revert not allocate too much memory in podarray change
2022-10-19 14:08:44 +08:00
125def5102 [enhancement](macOS M1) Support building from source on macOS (M1) (#13195)
# Proposed changes

This PR fixed lots of issues when building from source on macOS with Apple M1 chip.

## ATTENTION

The job for supporting macOS with Apple M1 chip is too big and there are lots of unresolved issues during runtime:
1. Some errors with memory tracker occur when BE (RELEASE) starts.
2. Some UT cases fail.
...

Temporarily, the following changes are made on macOS to start BE successfully.
1. Disable memory tracker.
2. Use tcmalloc instead of jemalloc.

This PR kicks off the job. Guys who are interested in this job can continue to fix these runtime issues.

## Use case

```shell
./build.sh -j 8 --be --clean

cd output/be/bin
ulimit -n 60000
./start_be.sh --daemon
```

## Something else

It takes around _**10+**_ minutes to build BE (with prebuilt third-parties) on macOS with M1 chip. We will improve the  development experience on macOS greatly when we finish the adaptation job.
2022-10-18 13:10:13 +08:00
a5f3880649 [improvement](memory) disable page cache and chunk allocator, optimize memory allocate size (#13285)
disable page cache by default
disable chunk allocator by default
not use chunk allocator for vectorized allocator by default
add a new config memory_linear_growth_threshold = 128Mb, not allocate memory by RoundUpToPowerOf2 if the allocated size is larger than this threshold. This config is added to MemPool, ChunkAllocator, PodArray, Arena.
2022-10-15 17:27:17 +08:00
a7d42b5d81 [fix](streamload&sink) release and allocate memory in the same tracker (#12820)
1. HttpServer threads allocate bytebuffer and put them into streamload pipe, but scanner thread release them with query tracker.
2. We can assume brpc allocate memory in doris thread.

Above problems leads to wrong result of memtracker.
2022-09-23 17:51:44 +08:00
b41eaa5ac0 [fix](memtracker) Introduce orphan mem tracker to verify memory tracking accuracy (#12794)
The mem hook consumes the orphan tracker by default. If the thread does not attach other trackers, by default all consumption will be passed to the process tracker through the orphan tracker.

In real time, consumption of all other trackers + orphan tracker consumption = process tracker consumption.

Ideally, all threads are expected to attach to the specified tracker, so that "all memory has its own ownership", and the consumption of the orphan mem tracker is close to 0, but greater than 0.
2022-09-21 15:47:10 +08:00
f72d2559cf [fix](compile) Fix compile error '<unknown>' may be used uninitialized in PODArray::insert_prepare #12202 2022-08-31 09:12:28 +08:00
8370115cf6 [enhancement](memtracker) Improve performance of tracking real physical memory of PODArray #12168 2022-08-30 10:22:12 +08:00
1304a17600 [fix](memtracker) Improve performance of tracking real physical memory of PodArray #12021 2022-08-24 14:24:14 +08:00
2a1803c646 [enhancement](memtracker) Optimize query memory accuracy (#11740)
Currently, only the virtual memory used by the query can be tracked through the tcmalloc hook. When the memory is not fully used after the application, the recorded virtual memory will be larger than the physical memory.

At present, it is mainly because PODArray does not memset 0 when applying for memory, and blocks applied for through PODArray in places such as VOlapScanNode::_free_blocks are usually used for memory reuse and cannot be fully used.
2022-08-16 14:23:28 +08:00
c9961c9bb9 [style] clang-format all c++ code (#9305)
- sh build-support/clang-format.sh  to  clang-format all c++ code
2022-04-29 16:14:22 +08:00
358bd79fb1 [improvement](vec)(Join) Mem reuse to speed up join operator (#7905)
1. Reuse the mem of output block in vec join node
2. Add the function `replicate` in column
2022-01-31 22:14:12 +08:00
1ba20b1dbb [improvement](storage) improving Column inserter (#7855)
* optimize Column inserter

* DCHECK

* DCHECK

Co-authored-by: zuochunwei <zuochunwei@meituan.com>
2022-01-27 14:18:15 +08:00
e1d7233e9c [feature](vectorization) Support Vectorized Exec Engine In Doris (#7785)
# Proposed changes

Issue Number: close #6238

    Co-authored-by: HappenLee <happenlee@hotmail.com>
    Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com>
    Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com>
    Co-authored-by: wangbo <506340561@qq.com>
    Co-authored-by: emmymiao87 <522274284@qq.com>
    Co-authored-by: Pxl <952130278@qq.com>
    Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>
    Co-authored-by: thinker <zchw100@qq.com>
    Co-authored-by: Zeno Yang <1521564989@qq.com>
    Co-authored-by: Wang Shuo <wangshuo128@gmail.com>
    Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
    Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
    Co-authored-by: xinghuayu007 <1450306854@qq.com>
    Co-authored-by: weizuo93 <weizuo@apache.org>
    Co-authored-by: yiguolei <guoleiyi@tencent.com>
    Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com>
    Co-authored-by: awakeljw <993007281@qq.com>
    Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com>
    Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com>


## Problem Summary:

### 1. Some code from clickhouse

**ClickHouse is an excellent implementation of the vectorized execution engine database,
so here we have referenced and learned a lot from its excellent implementation in terms of
data structure and function implementation.
We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers.**

The following comment has been added to the code from Clickhouse, eg:
// This file is copied from
// https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h
// and modified by Doris

### 2. Support exec node and query:
* vaggregation_node
* vanalytic_eval_node
* vassert_num_rows_node
* vblocking_join_node
* vcross_join_node
* vempty_set_node
* ves_http_scan_node
* vexcept_node
* vexchange_node
* vintersect_node
* vmysql_scan_node
* vodbc_scan_node
* volap_scan_node
* vrepeat_node
* vschema_scan_node
* vselect_node
* vset_operation_node
* vsort_node
* vunion_node
* vhash_join_node

You can run exec engine of SSB/TPCH and 70% TPCDS stand query test set.

### 3. Data Model

Vec Exec Engine Support **Dup/Agg/Unq** table, Support Block Reader Vectorized.
Segment Vec is working in process.

### 4. How to use

1. Set the environment variable `set enable_vectorized_engine = true; `(required)
2. Set the environment variable `set batch_size = 4096; ` (recommended)

### 5. Some diff from origin exec engine

https://github.com/doris-vectorized/doris-vectorized/issues/294

## Checklist(Required)

1. Does it affect the original behavior: (No)
2. Has unit tests been added: (Yes)
3. Has document been added or modified: (No)
4. Does it need to update dependencies: (No)
5. Are there any changes that cannot be rolled back: (Yes)
2022-01-18 10:07:15 +08:00