Modify the implementation of MemTracker:
1. Simplify a lot of useless logic;
2. Added MemTrackerTaskPool, as the ancestor of all query and import trackers, This is used to track the local memory usage of all tasks executing;
3. Add cosume/release cache, trigger a cosume/release when the memory accumulation exceeds the parameter mem_tracker_consume_min_size_bytes;
4. Add a new memory leak detection mode (Experimental feature), throw an exception when the remaining statistical value is greater than the specified range when the MemTracker is destructed, and print the accurate statistical value in HTTP, the parameter memory_leak_detection
5. Added Virtual MemTracker, cosume/release will not sync to parent. It will be used when introducing TCMalloc Hook to record memory later, to record the specified memory independently;
6. Modify the GC logic, register the buffer cached in DiskIoMgr as a GC function, and add other GC functions later;
7. Change the global root node from Root MemTracker to Process MemTracker, and remove Process MemTracker in exec_env;
8. Modify the macro that detects whether the memory has reached the upper limit, modify the parameters and default behavior of creating MemTracker, modify the error message format in mem_limit_exceeded, extend and apply transfer_to, remove Metric in MemTracker, etc.;
Modify where MemTracker is used:
1. MemPool adds a constructor to create a temporary tracker to avoid a lot of redundant code;
2. Added trackers for global objects such as ChunkAllocator and StorageEngine;
3. Added more fine-grained trackers such as ExprContext;
4. RuntimeState removes FragmentMemTracker, that is, PlanFragmentExecutor mem_tracker, which was previously used for independent statistical scan process memory, and replaces it with _scanner_mem_tracker in OlapScanNode;
5. MemTracker is no longer recorded in ReservationTracker, and ReservationTracker will be removed later;
For the first, we need to make a parameter to discribe the data is local or remote.
At then, we need to support some basic function to support the operation for remote storage.
#4995
**Implementation of Separated Page Cache**
- Add config "index_page_cache_ratio" to set the ratio of capacity of index page cache
- Change the member of StoragePageCache to maintain two type of cache
- Change the interface of StoragePageCache for selecting type of cache
- Change the usage of page cache in read_and_decompress_page in page_io.cpp
- add page type as argument
- check if current page type is available in StoragePageCache (cover the situation of ratio == 0 or 1)
- Add type as argument in superior call of read_and_decompress_page
- Change Unit Test
We create some objects which are only used for unit tests, it's not necessary,
and it may cause create duplicate instances for some classes.
This patch remove unnecessary instance of class BlockManager and StoragePageCache.
We make all MemTrackers shared, in order to show MemTracker real-time consumptions on the web.
As follows:
1. nearly all MemTracker raw ptr -> shared_ptr
2. Use CreateTracker() to create new MemTracker(in order to add itself to its parent)
3. RowBatch & MemPool still use raw ptrs of MemTracker, it's easy to ensure RowBatch & MemPool destructor exec
before MemTracker's destructor. So we don't change these code.
4. MemTracker can use RuntimeProfile's counter to calc consumption. So RuntimeProfile's counter need to be shared
too. We add a shared counter pool to store the shared counter, don't change other counters of RuntimeProfile.
Note that, this PR doesn't change the MemTracker tree structure. So there still have some orphan trackers, e.g. RowBlockV2's MemTracker. If you find some shared MemTrackers are little memory consumption & too time-consuming, you could make them be the orphan, then it's fine to use the raw ptr.
StorageEngine::open just return a very vague status info when failed,
we have to check logs to find out the root reason, and it's not
convenient to check logs if we run unit tests in CI dockers.
It would be better to return more detailed failure info to point out
the root reason, for example, it may return error status with message
"file descriptors limit is too small".
We can observe the workload of BE, and also it's a way to check
whether there is any problem in BE, like some container increase
too large and lead to OOM.
This patch add the following metrics:
```
Name Description
rowset_count_generated_and_in_use The total count of rowset id generated and in use since BE last start
unused_rowsets_count The total count of unused rowset waiting to be GC
broker_count The total count of brokers in management
data_stream_receiver_count The total count of data stream receivers in management
fragment_endpoint_count The total count of fragment endpoints of data stream in management, should always equal to data_stream_receiver_count
active_scan_context_count The total count of active scan contexts
plan_fragment_count The total count of plan fragments in executing
load_channel_count The total count of load channels in management
result_buffer_block_count The total count of result buffer blocks for queries, each block has a limited queue size (default 1024)
result_block_queue_count The total count of queues for fragments, each queue has a limited size (default 20, by config::max_memory_sink_batch_count)
routine_load_task_count The total count of routine load tasks in executing
small_file_cache_count The total count of cached small files' digest info
stream_load_pipe_count The total count of stream load pipes, each pipe has a limited buffer size (default 1M)
tablet_writer_count The total count of tablet writers
brpc_endpoint_stub_count The total count of brpc endpoints
```
1. BlockManager has been added into StorageEngine.
So StorageEngine should be initialized when starting BetaRowset unit test.
2. Cache should not use the same buf to store value, otherwise the address
will be freed twice and crash.
[Bug][BetaRowset] fix beta rowset read slowly with limit
beta rowset do not update raw_rows_read in statistics and will read all
data in tablet when query with limit, which lead to long query time.