disable page cache by default
disable chunk allocator by default
not use chunk allocator for vectorized allocator by default
add a new config memory_linear_growth_threshold = 128Mb, not allocate memory by RoundUpToPowerOf2 if the allocated size is larger than this threshold. This config is added to MemPool, ChunkAllocator, PodArray, Arena.
1. HttpServer threads allocate bytebuffer and put them into streamload pipe, but scanner thread release them with query tracker.
2. We can assume brpc allocate memory in doris thread.
Above problems leads to wrong result of memtracker.
The mem hook consumes the orphan tracker by default. If the thread does not attach other trackers, by default all consumption will be passed to the process tracker through the orphan tracker.
In real time, consumption of all other trackers + orphan tracker consumption = process tracker consumption.
Ideally, all threads are expected to attach to the specified tracker, so that "all memory has its own ownership", and the consumption of the orphan mem tracker is close to 0, but greater than 0.
Following the iteration order of the hash table will result in out-of-order access to aggregate states, which is very inefficient.
Traversing aggregate states in memory write order can significantly improve memory read efficiency.
Test
hash table items count: 3.35M
Before this optimization: insert keys into column takes 500ms
With this optimization only takes 80ms
Disable Chunk Allocator in Vectorized Allocator, this will reduce memory cache.
For high concurrent queries, using Chunk Allocator with vectorized Allocator can reduce the impact of gperftools tcmalloc central lock.
Jemalloc or google tcmalloc have core cache, Chunk Allocator may no longer be needed after replacing gperftools tcmalloc.
Currently, only the virtual memory used by the query can be tracked through the tcmalloc hook. When the memory is not fully used after the application, the recorded virtual memory will be larger than the physical memory.
At present, it is mainly because PODArray does not memset 0 when applying for memory, and blocks applied for through PODArray in places such as VOlapScanNode::_free_blocks are usually used for memory reuse and cannot be fully used.
In the strict memory usage mode of STRICT_MEMORY_USE=ON, when the capacity of the vectorized Hash Table is greater than 2G, it starts to grow when 75% of the capacity is satisfied, the memory usage of the vectorized Join becomes 50% of the previous value.
STRICT_MEMORY_USE=ON` expects BE to use less memory, and gives priority to ensuring stability when the cluster memory is limited.
* add volnitsky substr algorithm
* replace std::search with volnitsky search algorithm in StringSearch
* optimize substring for constant_substring_fn case
use long run length search for performance
1. MAP_POPULATE is missing for mmap in Allocator, because macro OS_LINUX is not defined in allocator.h;
2. MAP_POPULATE has no effect for mremap as for mmap, zero-fill enlarged memory range explicitly to pre-fault the pages
1. fix track bthread
- Bthread, a high performance M:N thread library used by brpc. In Doris, a brpc server response runs on one bthread, possibly on multiple pthreads. Currently, MemTracker consumption relies on pthread local variables (TLS).
- This caused pthread TLS MemTracker confusion when switching pthread TLS MemTracker in brpc server response. So replacing pthread TLS with bthread TLS in the brpc server response saves the MemTracker.
Ref: 731730da85/docs/en/server.md (bthread-local)
2. fix track vectorized query
- Added track mmap. Currently, mmap allocates memory in many places of the vectorized execution engine.
- Refactored ThreadContext to avoid dependency conflicts and make it easier to debug.
- Fix some bugs.
1.Member functions defined in a class are inline by default (implicitly), and do not need to be added
2.inline is a keyword used for implementation, which has no effect when placed before the function declaration
1. Avoid print large string in error log
If user load a unqualified large string, the all string will be saved in error log,
so the error log is too big that can not be shown be using `show load warnings on "url"`.
Err: `Got packet bigger than 'max_allowed_packet' bytes`
2. Remove duplicate help doc
Do not allow doc with same title, or error thrown when starting FE:
`java.lang.IllegalArgumentException: Multiple entries with same key:`