Commit Graph

79 Commits

Author SHA1 Message Date
e1f0fa069c [enhancement](memory) Refactored process memory statistics periodically refresh, and fix catch bad_alloc (#14580) 2022-11-29 10:15:25 +08:00
78adecac1b [enhancemennt](be)optimize mem usage in join and set node (#14602) 2022-11-27 13:38:49 +08:00
ac46922433 [fix](ut) Fix failures for BE UT macOS (#14543) 2022-11-24 17:39:37 +08:00
6c7f758ef7 [improvement](hashjoin) support partitioned hash table in hash join (#14480) 2022-11-24 14:16:47 +08:00
1520e5c88a [enhancement](agg)use new method to serialize keys in batch if the key is too large (#14484)
* [enhancement](agg)use new method to serialize keys in batch if the key is too large

* fix compile error
2022-11-23 17:35:39 +08:00
1ec7f45fb6 [Bug](avg) Fix avg for bigint (#14433) 2022-11-22 10:29:59 +08:00
2c42f0a905 [refactor](decimalv3) Refine code for DecimalV3 (#14394) 2022-11-19 16:57:17 +08:00
1f326fc0d6 [enhancement](be)limit mem cost to 16m when pre serialize keys in agg node (#14321)
* [enhancement](be)limit mem cost to 16m when pre serialize keys in agg node

* use only one chunk memory when serializing keys in agg node
2022-11-18 12:31:52 +08:00
6d2e6d85d3 [enhancement](be)release memory in Node's close() method (#14258)
* [enhancement](be)release memory in Node's close() method

* format code
2022-11-15 15:59:23 +08:00
cffdeff4ec [fix](memory) Fix memory leak by calling boost::stacktrace (#14269)
boost::stacktrace::stacktrace() has memory leak, so use glog internal func to print stacktrace.
The reason for the memory leak of boost::stacktrace is that a state is saved in the thread local of each thread but not actively released. The test found that each thread leaked about 100M after calling boost::stacktrace.
refer to:
boostorg/stacktrace#118
boostorg/stacktrace#111
2022-11-15 08:58:57 +08:00
dd11d5c0a5 [enhancement](memory) Support try catch bad alloc (#14135) 2022-11-13 11:22:56 +08:00
035657c5a1 [typo](comment) Fix a lot of spell errors in be comments (#14208)
fix typos.
2022-11-12 16:06:15 +08:00
e692636b4f [performance-wip] (vectorization) Opt HashJoin Performance (#12390) 2022-11-09 14:07:49 +08:00
0b945fe361 [enhancement](memtracker) Refactor mem tracker hierarchy (#13585)
mem tracker can be logically divided into 4 layers: 1)process 2)type 3)query/load/compation task etc. 4)exec node etc.

type includes

enum Type {
        GLOBAL = 0,        // Life cycle is the same as the process, e.g. Cache and default Orphan
        QUERY = 1,         // Count the memory consumption of all Query tasks.
        LOAD = 2,          // Count the memory consumption of all Load tasks.
        COMPACTION = 3,    // Count the memory consumption of all Base and Cumulative tasks.
        SCHEMA_CHANGE = 4, // Count the memory consumption of all SchemaChange tasks.
        CLONE = 5, // Count the memory consumption of all EngineCloneTask. Note: Memory that does not contain make/release snapshots.
        BATCHLOAD = 6,  // Count the memory consumption of all EngineBatchLoadTask.
        CONSISTENCY = 7 // Count the memory consumption of all EngineChecksumTask.
    }
Object pointers are no longer saved between each layer, and the values of process and each type are periodically aggregated.

other fix:

In [fix](memtracker) Fix transmit_tracker null pointer because phamp is not thread safe #13528, I tried to separate the memory that was manually abandoned in the query from the orphan mem tracker. But in the actual test, the accuracy of this part of the memory cannot be guaranteed, so put it back to the orphan mem tracker again.
2022-11-08 09:52:33 +08:00
Pxl
57a9b0fa65 [Enhancement](chore) remove unused diagnostic (#12337)
remove unused diagnostic
2022-10-31 19:19:13 +08:00
d2be5096d6 [Revert](mem) revert the mem config cause perfermace degradation (#13526)
* Revert "[fix](mem) failure of allocating memory (#13414)"

This reverts commit 971eb9172f3e925c0b46ec1ffd1a9037a1b49801.

* Revert "[improvement](memory) disable page cache and chunk allocator, optimize memory allocate size (#13285)"

This reverts commit a5f3880649b094b58061f25c15dccdb50a4a2973.
2022-10-21 08:32:16 +08:00
f329d33666 [chore](fix) Fix some spell errors in be's comments. #13452 2022-10-20 08:56:01 +08:00
1e42598fe6 [memory](podarray) revert not allocate too much memory in podarray change (#13457)
revert not allocate too much memory in podarray change
2022-10-19 14:08:44 +08:00
ac037e57f5 [fix](sort)the sort expr's nullability property may not be right (#13328) 2022-10-18 22:09:02 +08:00
125def5102 [enhancement](macOS M1) Support building from source on macOS (M1) (#13195)
# Proposed changes

This PR fixed lots of issues when building from source on macOS with Apple M1 chip.

## ATTENTION

The job for supporting macOS with Apple M1 chip is too big and there are lots of unresolved issues during runtime:
1. Some errors with memory tracker occur when BE (RELEASE) starts.
2. Some UT cases fail.
...

Temporarily, the following changes are made on macOS to start BE successfully.
1. Disable memory tracker.
2. Use tcmalloc instead of jemalloc.

This PR kicks off the job. Guys who are interested in this job can continue to fix these runtime issues.

## Use case

```shell
./build.sh -j 8 --be --clean

cd output/be/bin
ulimit -n 60000
./start_be.sh --daemon
```

## Something else

It takes around _**10+**_ minutes to build BE (with prebuilt third-parties) on macOS with M1 chip. We will improve the  development experience on macOS greatly when we finish the adaptation job.
2022-10-18 13:10:13 +08:00
1d5ba9cbcc [Improvement](like) Change like function to batch call (#13314) 2022-10-16 16:18:22 +08:00
a5f3880649 [improvement](memory) disable page cache and chunk allocator, optimize memory allocate size (#13285)
disable page cache by default
disable chunk allocator by default
not use chunk allocator for vectorized allocator by default
add a new config memory_linear_growth_threshold = 128Mb, not allocate memory by RoundUpToPowerOf2 if the allocated size is larger than this threshold. This config is added to MemPool, ChunkAllocator, PodArray, Arena.
2022-10-15 17:27:17 +08:00
baf2689610 [Improvement](join) compute hash values by vectorized way (#13335) 2022-10-13 16:04:58 +08:00
dfe308f501 [Improvement](join) refine prefetch strategy (#13286) 2022-10-12 19:02:06 +08:00
89b295c6cc [enhancement](memory) Print memory usage log when memory allocation fails (#13301) 2022-10-12 10:08:25 +08:00
1ba9e4b568 [Improvement](sort) Reuse memory in sort node (#12921) 2022-09-28 09:44:35 +08:00
a7d42b5d81 [fix](streamload&sink) release and allocate memory in the same tracker (#12820)
1. HttpServer threads allocate bytebuffer and put them into streamload pipe, but scanner thread release them with query tracker.
2. We can assume brpc allocate memory in doris thread.

Above problems leads to wrong result of memtracker.
2022-09-23 17:51:44 +08:00
b41eaa5ac0 [fix](memtracker) Introduce orphan mem tracker to verify memory tracking accuracy (#12794)
The mem hook consumes the orphan tracker by default. If the thread does not attach other trackers, by default all consumption will be passed to the process tracker through the orphan tracker.

In real time, consumption of all other trackers + orphan tracker consumption = process tracker consumption.

Ideally, all threads are expected to attach to the specified tracker, so that "all memory has its own ownership", and the consumption of the orphan mem tracker is close to 0, but greater than 0.
2022-09-21 15:47:10 +08:00
8f4bb0f804 [improvement](agg) iterate aggregation data in memory written order (#12704)
Following the iteration order of the hash table will result in out-of-order access to aggregate states, which is very inefficient.
Traversing aggregate states in memory write order can significantly improve memory read efficiency.

Test
hash table items count: 3.35M

Before this optimization: insert keys into column takes 500ms
With this optimization only takes 80ms
2022-09-21 14:58:50 +08:00
3cfaae0031 [Improvement](sort) Use heap sort to optimize sort node (#12700) 2022-09-21 10:01:52 +08:00
c05d736331 [Improvement](sort) fallback to partial sort small block if topN is small (#12604)
* [Improvement](sort) fallback to partial sort small block if topN is small
2022-09-16 10:20:17 +08:00
Pxl
0ead048b93 [Enhancement](column) remove ColumnString terminating zero and add a data_version for pblock (#12456)
1. remove ColumnString terminating zero
    2. add a data_version for pblock
    3. change EncryptionMode to enum class
2022-09-14 21:25:22 +08:00
f72d2559cf [fix](compile) Fix compile error '<unknown>' may be used uninitialized in PODArray::insert_prepare #12202 2022-08-31 09:12:28 +08:00
8370115cf6 [enhancement](memtracker) Improve performance of tracking real physical memory of PODArray #12168 2022-08-30 10:22:12 +08:00
9caaa4bfbd [fix](memory) fix set disable_chunk_allocator_in_vec=false performance #12092 2022-08-26 14:28:12 +08:00
1304a17600 [fix](memtracker) Improve performance of tracking real physical memory of PodArray #12021 2022-08-24 14:24:14 +08:00
c124470408 [enhancement](memory) Fix too much cache leads to less memory available for queries (#11751)
Disable Chunk Allocator in Vectorized Allocator, this will reduce memory cache.

For high concurrent queries, using Chunk Allocator with vectorized Allocator can reduce the impact of gperftools tcmalloc central lock.

Jemalloc or google tcmalloc have core cache, Chunk Allocator may no longer be needed after replacing gperftools tcmalloc.
2022-08-16 14:35:57 +08:00
2a1803c646 [enhancement](memtracker) Optimize query memory accuracy (#11740)
Currently, only the virtual memory used by the query can be tracked through the tcmalloc hook. When the memory is not fully used after the application, the recorded virtual memory will be larger than the physical memory.

At present, it is mainly because PODArray does not memset 0 when applying for memory, and blocks applied for through PODArray in places such as VOlapScanNode::_free_blocks are usually used for memory reuse and cannot be fully used.
2022-08-16 14:23:28 +08:00
092a394782 [improvement](agg)limit the output of agg node (#11461)
* [improvement](agg)limit the output of agg node
2022-08-05 07:53:55 +08:00
b74f36e009 [improvement]Use phmap for aggregation with integer keys (#11175) 2022-07-27 13:58:20 +08:00
4960043f5e [enhancement] Refactor to improve the usability of MemTracker (step2) (#10823) 2022-07-21 17:11:28 +08:00
989e6d1cf9 [chore]fix clang compile error (#11021) 2022-07-20 08:28:47 +08:00
fd2c374426 [fix]Empty string key in aggregation was output as NULL (#11011) 2022-07-19 23:25:28 +08:00
899acb6564 [improvement][agg]import sub hashmap (#10937) 2022-07-18 18:36:45 +08:00
d9095922d9 [Enhancement] [Memory] add strict memory usage compile option STRICT_MEMORY_USE (#10936)
In the strict memory usage mode of STRICT_MEMORY_USE=ON, when the capacity of the vectorized Hash Table is greater than 2G, it starts to grow when 75% of the capacity is satisfied, the memory usage of the vectorized Join becomes 50% of the previous value.

STRICT_MEMORY_USE=ON` expects BE to use less memory, and gives priority to ensuring stability when the cluster memory is limited.
2022-07-18 16:16:43 +08:00
d1573e1a4a [improvement]Use phmap for aggregation with serialized key (#10821) 2022-07-14 11:26:09 +08:00
4e9d5a7f7a optimize substr performance and fix ASAN global buffer overflow (#10442)
* add volnitsky substr algorithm

* replace std::search with volnitsky search algorithm in StringSearch

* optimize substring for constant_substring_fn case
use long run length search for performance
2022-07-12 08:36:21 +08:00
e293fbd277 [improvement]pre-serialize aggregation keys (#10700) 2022-07-09 06:21:56 +08:00
ec6620ae3e [feature-wip](array-type) add function arrays_overlap (#10233) 2022-06-30 08:12:29 +08:00
deeb3028ad [Enhancement] [Memory] [Vectorized] Stress test and optimize memory allocation (#9581)
* vec stress test, Allocator introduce chunkallocator

* fix comment
2022-06-29 02:57:51 +08:00