This CL mainly changes:
1. the `storage_page_cache_limit` is based on config `mem_limit`
the default is 20% of `mem_limit`.
2. the `buffer_pool_limit` is based on config `mem_limit`
the default is 20% of `mem_limit`.
3. the `buffer_pool_clean_pages_limit` is based on config `buffer_pool_limit`
the default is 50% of `buffer_pool_limit`
4. Fix some show bugs of lru cache hit ratio and usage ratio
5. Fix a create view bug that `notEvalNondeterministicFunction` should be reset after analyze.
At present, some constant expression calculations are implemented on the FE side,
but they are incomplete, and some expressions cannot be completely consistent with
the value calculated by BE (such as part of the time function)
Therefore, we provide a way to pass all the constants in SQL to BE for calculation,
and then begin to analyze and plan SQL. This method can also solve the problem that some
complex constant calculations issued by BI cannot be processed on the FE side.
Here through a session variable enable_fold_constant_by_be to control this function,
which is disabled by default.
To avoid showing too many memtracker on BE web pages.
The MemTracker level now has 3 levels: OVERVIEW, TASK and VERBOSE.
OVERVIEW Mainly used for main memory consumption module such as Query/Load/Metadata.
TASK is mainly used to record the memory overhead of a single task such as a single query, load, and compaction task.
VERBOSE is used for other more detailed memtrackers.
1 Make some MemTracker have reasonable parent MemTracker not the root tracker
2 Make each MemTracker can be easily to trace.
3 Add show level of MemTracker to reduce the MemTracker show in the web page to have a way to control show how many tracker in web page.
MemTracker can provide memory consumption for us to find out which
module consume more memory, but it's just a current value, this patch
add metrics for some large memory consumers, then we can find out
which module consume more memory in timeline, it would be useful to
troubleshoot OOM problems and optimize configs.
* [Enhance] Make MemTracker more accurate (#5515)
This PR main about:
1. Improve the readability of MemTrackers' name
2. Add the MemTracker of:
* Load
* Compaction
* SchemaChange
* StoragePageCache
* TabletManager
3. Change SchemaChange to a Singleon
* revise some code for Code Review
* change the name of mem_tracker
* keep reader_context have the same lifetime of rowset_reader in schema change.
* change vlog notice to log(warning) in schema change
#4995
**Implementation of Separated Page Cache**
- Add config "index_page_cache_ratio" to set the ratio of capacity of index page cache
- Change the member of StoragePageCache to maintain two type of cache
- Change the interface of StoragePageCache for selecting type of cache
- Change the usage of page cache in read_and_decompress_page in page_io.cpp
- add page type as argument
- check if current page type is available in StoragePageCache (cover the situation of ratio == 0 or 1)
- Add type as argument in superior call of read_and_decompress_page
- Change Unit Test
1. Find the cache node by SQL Key, then find the corresponding partition data by Partition Key, and then decide whether to hit Cache by LastVersion and LastVersionTime
2. Refers to the classic cache algorithm LRU, which is the least recently used algorithm, using a three-layer data structure to achieve
3. The Cache elimination algorithm is implemented by ensuring the range of the partition as much as possible, to avoid the situation of partition discontinuity, which will reduce the hit rate of the Cache partition,
4. Use the two thresholds of maximum memory and elastic memory to control to avoid frequent elimination of data
BE can not graceful exit because some threads are running in endless
loop. This patch do the following optimization:
- Use the well encapsulated Thread and ThreadPool instead of std::thread
and std::vector<std::thread>
- Use CountDownLatch in thread's loop condition to avoid endless loop
- Introduce a new class Daemon for daemon works, like tcmalloc_gc,
memory_maintenance and calculate_metrics
- Decouple statistics type TaskWorkerPool and StorageEngine notification
by submit tasks to TaskWorkerPool's queue
- Reorder objects' stop and deconstruct in main(), i.e. stop network
services at first, then internal services
- Use libevent in pthreads mode, by calling evthread_use_pthreads(),
then EvHttpServer can exit gracefully in multi-threads
- Call brpc::Server's Stop() and ClearServices() explicitly
Redesign metrics to 3 layers:
MetricRegistry - MetricEntity - Metrics
MetricRegistry : the register center
MetricEntity : the entity registered on MetricRegistry. Generally a MetricRegistry can be registered on several
MetricEntities, each of MetricEntity is an independent entity, such as server, disk_devices, data_directories, thrift
clients and servers, and so on.
Metric : metrics of an entity. Such as fragment_requests_total on server entity, disk_bytes_read on a disk_device entity,
thrift_opened_clients on a thrift_client entity.
MetricPrototype: the type of a metric. MetricPrototype is a global variable, can be shared by the same metrics across
different MetricEntities.
We make all MemTrackers shared, in order to show MemTracker real-time consumptions on the web.
As follows:
1. nearly all MemTracker raw ptr -> shared_ptr
2. Use CreateTracker() to create new MemTracker(in order to add itself to its parent)
3. RowBatch & MemPool still use raw ptrs of MemTracker, it's easy to ensure RowBatch & MemPool destructor exec
before MemTracker's destructor. So we don't change these code.
4. MemTracker can use RuntimeProfile's counter to calc consumption. So RuntimeProfile's counter need to be shared
too. We add a shared counter pool to store the shared counter, don't change other counters of RuntimeProfile.
Note that, this PR doesn't change the MemTracker tree structure. So there still have some orphan trackers, e.g. RowBlockV2's MemTracker. If you find some shared MemTrackers are little memory consumption & too time-consuming, you could make them be the orphan, then it's fine to use the raw ptr.
Support BE plugin framework, include:
* update Plugin Manager, support Plugin find method
* support Builtin-Plugin register method
* plugin install/uninstall process
* PluginLoader:
* dynamic install and check Plugin .so file
* dynamic uninstall and check Plugin status
* PluginZip:
* support plugin remote/local .zip file download and extract
TODO:
* We should support a PluginContext to transmit necessary system variable when the plugin's init/close method invoke
* Add the entry which is BE dynamic Plugin install/uninstall process, include:
* The FE send install/uninstall Plugin statement (RPC way)
* The FE meta update request with Plugin list information
* The FE operation request(update/query) with Plugin (maybe don't need)
* Add the plugin status upload way
* Load already install Plugin when BE start
Normalize the setting of mem limit to avoid some unexpected exception.
For example, use may not setting query mem limit in query plan, which
may cause BE crash.
1. MemTableFlushExecutor maintain a ThreadPool to receive FlushTask.
2. FlushToken is used to seperate different tasks from different tablets.
Every DeltaWriter of tablet constructs a FlushToken,
task in FlushToken are handle serially, task between FlushToken are
handle concurrently.
3. I have remove thread limit on data_dir, because of I/O is not the main
timer consumer of Flush thread. Much of time is consumed in CPU decoding
and compress.
The control framework is implemented through heartbeat message. Use uint64_t as flags to control different functions.
Now add a flag to set the default rowset type to beta.
NOTE: This patch would modify all Backend's data.
And this will cause a very long time to restart be.
So if you want to interferer your product environment,
you should upgrade backend one by one.
1. Refactoring be is to clarify the structure the codes.
2. Use unique id to indicate a rowset.
Nameing rowset with tablet_id and version will lead to
many conflicts among compaction, clone, restore.
3. Extract an rowset interface to encapsulate rowsets
with different format.
1. Print broker address for debug.
2. Do not letting backup job cancelled if it already in state UPLOAD_INFO.
3. Cancel task on Backends when job is cancelled.
4. Show detail progress of backup and restore job.
5. Make 'show snapshot' result more readable.
6. Change upload and download thread num of backup and restore in Backend to 1.
* Reduce UT binary size
Almost every module depend on ExecEnv, and ExecEnv contains all
singleton, which make UT binary contains all object files.
This patch seperate ExecEnv's initial and destory to anthor file to
avoid other file's dependence. And status.cc include debug_util.h which
depend tuple.h tuple_row.h, and I move get_stack_trace() to
stack_util.cpp to reduce status.cc's dependence.
I add USE_RTTI=1 to build rocksdb to avoid linking librocksdb.a
Issue: #292
* Update