[doc](fix) cold hot separation cache doc (#20994)
@ -117,16 +117,12 @@ For details, please refer to the [resource](../sql-manual/sql-reference/Data-Def
As mentioned above, a cache is introduced for cold data in order to optimize query performance. On the first hit after cooling, Doris reloads the cooled data onto the BE's local disk. The cache has the following characteristics (a configuration sketch follows this list):
- The cache is stored on the BE's local disk and does not occupy memory.
- The cache limits its growth and cleans up data through an LRU policy.
- The BE parameter `file_cache_alive_time_sec` sets the maximum time cached data is kept after it was last accessed. The default is 604800 seconds, i.e. one week.
- The BE parameter `file_cache_max_size_per_disk` sets the disk space the cache may occupy. Once this limit is exceeded, the cached data that has not been accessed for the longest time is deleted. The default is 0, which means no size limit, unit: bytes.
- The BE parameter `file_cache_type` accepts `sub_file_cache` (cache segments of the remote file locally) or `whole_file_cache` (cache the entire remote file locally). The default is `""`, which means no file is cached; set this parameter when caching is required.
- The cache shares its implementation with the file cache used by federated query catalogs; the documentation is [here](../lakehouse/filecache.md).
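A minimal `be.conf` sketch combining these cache settings (the values are illustrative only, not recommendations; adjust them to your environment):

```
# be.conf (illustrative values)
# cache cooled data as whole remote files on the local disk
file_cache_type = whole_file_cache
# evict cached data that has not been accessed for 3 days (default 604800, i.e. one week)
file_cache_alive_time_sec = 259200
# cap the cache at 100 GB per disk (default 0, i.e. unlimited), unit: bytes
file_cache_max_size_per_disk = 107374182400
```

Changes to `be.conf` generally take effect only after the BE process is restarted.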
## Cold data compaction
Data becomes cold starting from the moment its rowset file is written to the local disk, plus the cooling time. Since data is not written and cooled all at once, Doris also performs compaction on cold data to avoid small-file problems in object storage.
However, cold data compaction runs at a relatively low frequency and resource priority, so that local hot data is compacted as much as possible before it is cooled. Specifically, it can be adjusted with the following BE parameters (see the sketch after the list):
- The BE parameter `cold_data_compaction_thread_num` sets the concurrency for executing cold data compaction. The default is 2.
- The BE parameter `cold_data_compaction_interval_sec` sets the time interval for executing cold data compaction. The default is 1800, unit: seconds, i.e. half an hour.
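As a hedged sketch (values are illustrative), these two knobs would be tuned in `be.conf` like this:

```
# be.conf (illustrative values)
# run cold data compaction with 4 concurrent threads (default 2)
cold_data_compaction_thread_num = 4
# schedule cold data compaction every 3600 seconds (default 1800)
cold_data_compaction_interval_sec = 3600
```

Raising the thread count lets cold data compaction catch up faster at the cost of more resource usage, while a larger interval makes it run less often.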
@ -118,13 +118,11 @@ ALTER TABLE create_table_partition MODIFY PARTITION (*) SET("storage_policy"="te
As mentioned above, a cache is introduced for cold data to optimize query performance and save object storage resources. On the first hit after cooling, Doris reloads the cooled data onto the BE's local disk. The cache has the following characteristics:
- The cache is stored on the BE's disk and does not occupy memory.
- The cache limits its growth and cleans up data through an LRU policy.
- The BE parameter `file_cache_alive_time_sec` sets the maximum retention time of cached data after it was last accessed. The default is 604800, i.e. one week.
- The BE parameter `file_cache_max_size_per_disk` sets the disk space the cache may occupy. Once this limit is exceeded, the cached data that has not been accessed for the longest time is deleted. The default is 0, which means no size limit, unit: bytes.
- The BE parameter `file_cache_type` accepts `sub_file_cache` (cache segments of the remote file locally) or `whole_file_cache` (cache the entire remote file locally). The default is `""`, which means no file is cached; set this parameter when caching is required.
- The cache shares its implementation with the file cache of the federated query catalog; see the documentation [here](../lakehouse/filecache.md).
## Cold data compaction
Data becomes cold starting from the moment its rowset file is written to the local disk, plus the cooling time. Since data is not written and cooled all at once, Doris also performs compaction on cold data to avoid small-file problems in object storage.
However, cold data compaction runs at a relatively low frequency and resource priority; it is also recommended to let local hot data be compacted before it is cooled. Specifically, it can be adjusted with the following BE parameters:
- The BE parameter `cold_data_compaction_thread_num` sets the concurrency for executing cold data compaction. The default is 2.
- The BE parameter `cold_data_compaction_interval_sec` sets the time interval for executing cold data compaction. The default is 1800, unit: seconds, i.e. half an hour.