In some situation, _handle_mem_exceed_limit will alloc a large memory block, more than 5G. After add some log, we found that:
alloc memory was made in vector::insert_realloc
writers_to_reduce_mem's size is more than 8 million.
which indicated that an infinite loop was met in while (!tablets_mem_heap.empty()).
By reviewing codes, """ if (std::get<0>(tablet_mem_item)++ != std::get<1>(tablet_mem_item)) """ is wrong,
which must be """ if (++std::get<0>(tablet_mem_item) != std::get<1>(tablet_mem_item)) """.
In the original code, we will made ++ on end iterator, and then compare to end iterator, the behavior is undefined.
Only switch to bthread local when modifying the mem tracker in the thread context. No longer switches to bthread local by default when bthread starts
mem tracker increases brpc IOBufBlockMemory memory
remove thread mem tracker metrics
fix mem_limit default value
memory_gc_sleep_time_s to memory_gc_sleep_time_ms
LoadChannelMgr::_handle_mem_exceed_limit process_mem_limit to process soft mem limit
fix query mem tracker print
TabletSink and LoadChannel in BE are M: N relationship,
Every once in a while LoadChannel will randomly return its own runtime profile to a TabletSink, so usually all LoadChannel runtime profiles are saved on each TabletSink, and the timeliness of the same LoadChannel profile saved on different TabletSinks is different, and each TabletSink will periodically send fe reports all the LoadChannel profiles saved by itself, and ensures to update the latest LoadChannel profile according to the timestamp.
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
No check mem tracker limit and no cancel task in mem hook, only in Allocator. This helps in clearer analysis of memory issues and reduces performance loss.
PODArray/hash table/arena memory allocation will use Allocator.
Optimize mem limit exceeded log printing
Optimize compilation time
mem tracker can be logically divided into 4 layers: 1)process 2)type 3)query/load/compation task etc. 4)exec node etc.
type includes
enum Type {
GLOBAL = 0, // Life cycle is the same as the process, e.g. Cache and default Orphan
QUERY = 1, // Count the memory consumption of all Query tasks.
LOAD = 2, // Count the memory consumption of all Load tasks.
COMPACTION = 3, // Count the memory consumption of all Base and Cumulative tasks.
SCHEMA_CHANGE = 4, // Count the memory consumption of all SchemaChange tasks.
CLONE = 5, // Count the memory consumption of all EngineCloneTask. Note: Memory that does not contain make/release snapshots.
BATCHLOAD = 6, // Count the memory consumption of all EngineBatchLoadTask.
CONSISTENCY = 7 // Count the memory consumption of all EngineChecksumTask.
}
Object pointers are no longer saved between each layer, and the values of process and each type are periodically aggregated.
other fix:
In [fix](memtracker) Fix transmit_tracker null pointer because phamp is not thread safe #13528, I tried to separate the memory that was manually abandoned in the query from the orphan mem tracker. But in the actual test, the accuracy of this part of the memory cannot be guaranteed, so put it back to the orphan mem tracker again.
In the policy changed by PR #12716, when reaching the hard limit, there might be multiple threads can pick same LoadChannel and call reduce_mem_usage on same TabletsChannel. Although there's a lock and condition variable can prevent multiple threads to reduce mem usage concurrently, but they still can do same reduce-work on that channel multiple times one by one, even it's just reduced.
When the physical memory of the process reaches 90% of the mem limit, trigger the load channel mgr to brush down
The default value of be.conf mem_limit is changed from 90% to 80%, and stability is the priority.
Fix deadlock in arena_locks in BufferPool::BufferAllocator::ScavengeBuffers and _lock in DebugString
1. Remove single load channel mem limit, only use load channel mgr mem limit
2. Default load channel mgr mem limit from 50% to 80%
3. load channel mgr add soft mem limit. When the soft limit is exceeded, other threads will not hang, only current thread triggers flush
4. When exceed load channel mgr mem limit, find a load channel with the largest mem usage, continue to find a tablet channel with the largest mem usage, and try to flush 1/3 of the mem usage of this tablet channel.
When the flush is triggered when the load channel exceeds the mem limit, if the flush fails, an error message is returned and the load is terminated.
Usually flush failure is -238 error code. Because the memtable is frequently flushed after the load channel exceeds the mem limit, the number of segments exceeds the max value.
1. Fix the memory leak. When the load task is canceled, the `IndexChannel` and `NodeChannel` mem trackers cannot be destructed in time.
2. Fix Load task being frequently canceled by oom and inaccurate `LoadChannel` mem tracker limit, and rewrite the variable name of `mem limit` in `LoadChannel`.
3. Fix core dump, when logout task mem tracker, phmap erase fails, resulting in repeated logout of the same tracker.
4. Fix the deadlock, when add_child_tracker mem limit exceeds, calling log_usage causes `_child_trackers_lock` deadlock.
5. Fix frequent log printing when thread mem tracker limit exceeds, which will affect readability and performance.
6. Optimize some details of mem tracker display.
1. Fix LoadTask, ChunkAllocator, TabletMeta, Brpc, the accuracy of memory track.
2. Modified some MemTracker names, deleted some unnecessary trackers, and improved readability.
3. More powerful MemTracker debugging capabilities.
4. Avoid creating TabletColumn temporary objects and improve BE startup time by 8%.
5. Fix some other details.