[improvement](memory) simplify memory config related to tcmalloc (#13781)

There are several configs related to tcmalloc, and users do not know how to set them. In practice users want just two modes, performance or compact: in performance mode, Doris should run queries and loads quickly, while in compact mode, Doris should run with less memory usage. Anyone who still needs to tune tcmalloc settings individually can use the environment variables that tcmalloc supports.
2022-11-01 21:45:19 +08:00
parent 287a739510
commit 8b3afd431e
5 changed files with 40 additions and 72 deletions
--- a/be/src/common/config.h
+++ b/be/src/common/config.h
@@ -46,25 +46,9 @@ CONF_Int32(single_replica_load_brpc_num_threads, "64");
 // If no ip match this rule, will choose one randomly.
 CONF_String(priority_networks, "");

-////
-//// tcmalloc gc parameter
-////
-// min memory for TCmalloc, when used memory is smaller than this, do not returned to OS
-CONF_mInt64(tc_use_memory_min, "10737418240");
-// free memory rate.[0-100]
-CONF_mInt64(tc_free_memory_rate, "20");
-// tcmallc aggressive_memory_decommit
-CONF_Bool(tc_enable_aggressive_memory_decommit, "false");
-
-// Bound on the total amount of bytes allocated to thread caches.
-// This bound is not strict, so it is possible for the cache to go over this bound
-// in certain circumstances. This value defaults to 1GB
-// If you suspect your application is not scaling to many threads due to lock contention in TCMalloc,
-// you can try increasing this value. This may improve performance, at a cost of extra memory
-// use by TCMalloc.
-// reference: https://gperftools.github.io/gperftools/tcmalloc.html: TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES
-//            https://github.com/gperftools/gperftools/issues/1111
-CONF_Int64(tc_max_total_thread_cache_bytes, "1073741824");
+// memory mode
+// performance or compact
+CONF_String(memory_mode, "performance");

 // process memory limit specified as number of bytes
 // ('<int>[bB]?'), megabytes ('<float>[mM]'), gigabytes ('<float>[gG]'),
--- a/be/src/common/daemon.cpp
+++ b/be/src/common/daemon.cpp
@@ -71,6 +71,15 @@ void Daemon::tcmalloc_gc_thread() {
    // TODO All cache GC wish to be supported
 #if !defined(ADDRESS_SANITIZER) && !defined(LEAK_SANITIZER) && !defined(THREAD_SANITIZER) && \
        !defined(USE_JEMALLOC)
+
+    size_t tc_use_memory_min = MemInfo::mem_limit();
+    if (config::memory_mode == std::string("performance")) {
+        tc_use_memory_min = std::max(tc_use_memory_min / 10 * 9,
+                                     tc_use_memory_min - (size_t)10 * 1024 * 1024 * 1024);
+    } else {
+        tc_use_memory_min >>= 1;
+    }
+
    while (!_stop_background_threads_latch.wait_for(std::chrono::seconds(10))) {
        size_t used_size = 0;
        size_t free_size = 0;
@@ -80,10 +89,11 @@ void Daemon::tcmalloc_gc_thread() {
        MallocExtension::instance()->GetNumericProperty("tcmalloc.pageheap_free_bytes", &free_size);
        size_t alloc_size = used_size + free_size;
        LOG(INFO) << "tcmalloc.pageheap_free_bytes " << free_size
-                  << ", generic.current_allocated_bytes " << used_size;
+                  << ", generic.current_allocated_bytes " << used_size << ", tc_use_memory_min "
+                  << tc_use_memory_min;

-        if (alloc_size > config::tc_use_memory_min) {
-            size_t max_free_size = alloc_size * config::tc_free_memory_rate / 100;
+        if (alloc_size > tc_use_memory_min) {
+            size_t max_free_size = alloc_size * 20 / 100;
            if (free_size > max_free_size) {
                MallocExtension::instance()->ReleaseToSystem(free_size - max_free_size);
            }
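
Review note: the threshold logic in this hunk can be sketched in isolation as below. `gc_threshold` and `bytes_to_release` are hypothetical names, not functions in the patch. The saturating subtraction is an addition of this sketch: as written in the hunk, `tc_use_memory_min - 10 GiB` is an unsigned subtraction, so it appears to wrap around when `mem_limit` is below 10 GiB, and `std::max` would then pick the wrapped value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <string>

// Hypothetical helper (not in the patch): allocation size above which the
// GC thread starts returning free page-heap memory to the OS.
inline std::uint64_t gc_threshold(std::uint64_t mem_limit, const std::string& memory_mode) {
    constexpr std::uint64_t kTenGiB = 10ULL * 1024 * 1024 * 1024;
    if (memory_mode == "performance") {
        // Saturating subtraction is an assumption of this sketch; it avoids
        // unsigned wrap-around when mem_limit is smaller than 10 GiB.
        const std::uint64_t limit_minus_10g = mem_limit > kTenGiB ? mem_limit - kTenGiB : 0;
        return std::max(mem_limit / 10 * 9, limit_minus_10g);
    }
    // Any value other than "performance" is treated like "compact".
    return mem_limit / 2;
}

// Hypothetical helper (not in the patch): bytes to release, i.e. the part of
// the free page-heap memory that exceeds 20% of (used + free).
inline std::uint64_t bytes_to_release(std::uint64_t used, std::uint64_t free_bytes,
                                      std::uint64_t threshold) {
    // Guard against overflow of the sum below.
    assert(free_bytes <= std::numeric_limits<std::uint64_t>::max() - used);
    const std::uint64_t alloc = used + free_bytes;
    if (alloc <= threshold) return 0;
    const std::uint64_t max_free = alloc * 20 / 100;
    return free_bytes > max_free ? free_bytes - max_free : 0;
}
```

Under these assumptions, a 200 GiB `mem_limit` gives a threshold of 190 GiB in performance mode and 100 GiB in compact mode.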
--- a/be/src/service/doris_main.cpp
+++ b/be/src/service/doris_main.cpp
@@ -323,17 +323,19 @@ int main(int argc, char** argv) {

 #if !defined(__SANITIZE_ADDRESS__) && !defined(ADDRESS_SANITIZER) && !defined(LEAK_SANITIZER) && \
        !defined(THREAD_SANITIZER) && !defined(USE_JEMALLOC)
-    // Aggressive decommit is required so that unused pages in the TCMalloc page heap are
-    // not backed by physical pages and do not contribute towards memory consumption.
-    if (doris::config::tc_enable_aggressive_memory_decommit) {
-        MallocExtension::instance()->SetNumericProperty("tcmalloc.aggressive_memory_decommit", 1);
-    }
    // Change the total TCMalloc thread cache size if necessary.
-    if (!MallocExtension::instance()->SetNumericProperty(
-                "tcmalloc.max_total_thread_cache_bytes",
-                doris::config::tc_max_total_thread_cache_bytes)) {
-        fprintf(stderr, "Failed to change TCMalloc total thread cache size.\n");
-        return -1;
+    size_t total_thread_cache_bytes = 0;
+    if (!MallocExtension::instance()->GetNumericProperty("tcmalloc.max_total_thread_cache_bytes",
+                                                         &total_thread_cache_bytes)) {
+        fprintf(stderr, "Failed to get TCMalloc total thread cache size.\n");
+    }
+    const size_t kDefaultTotalThreadCacheBytes = 1024 * 1024 * 1024;
+    if (total_thread_cache_bytes < kDefaultTotalThreadCacheBytes) {
+        if (!MallocExtension::instance()->SetNumericProperty(
+                    "tcmalloc.max_total_thread_cache_bytes", kDefaultTotalThreadCacheBytes)) {
+            fprintf(stderr, "Failed to change TCMalloc total thread cache size.\n");
+            return -1;
+        }
    }
 #endif
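
Review note: the "raise the thread cache limit only if it is below the default" pattern in this hunk can be sketched without gperftools as follows. `PropertyStore` and `ensure_at_least` are hypothetical stand-ins, not the `MallocExtension` API. Unlike the hunk, which logs a failed read and carries on, this sketch returns early on a failed read; that choice is an assumption of the sketch.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical in-memory stand-in for an allocator's numeric properties.
// It only mimics the get/set shape used in the hunk above.
class PropertyStore {
public:
    bool get(const std::string& name, std::uint64_t* value) const {
        auto it = props_.find(name);
        if (it == props_.end()) return false;
        *value = it->second;
        return true;
    }
    bool set(const std::string& name, std::uint64_t value) {
        props_[name] = value;
        return true;
    }

private:
    std::map<std::string, std::uint64_t> props_;
};

// Raises `name` to `floor` only when the current value is lower.
// Returns false if the property cannot be read or written.
inline bool ensure_at_least(PropertyStore& store, const std::string& name,
                            std::uint64_t floor) {
    std::uint64_t current = 0;  // initialised so a failed read is never indeterminate
    if (!store.get(name, &current)) return false;
    if (current >= floor) return true;  // a larger, externally set value is kept
    return store.set(name, floor);
}
```

The point of the read-then-compare step is that a larger limit set from outside the process, for example through tcmalloc's `TCMALLOC_MAX_TOTAL_THREAD_CACHE_BYTES` environment variable, is left untouched.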

--- a/docs/en/docs/admin-manual/config/be-config.md
+++ b/docs/en/docs/admin-manual/config/be-config.md
@@ -838,6 +838,12 @@ The number of sliced tablets, plan the layout of the tablet, and avoid too many
 * Description: Limit the percentage of the server's maximum memory used by the BE process. It is used to prevent BE from occupying too much of the machine's memory. This parameter must be greater than 0. When the percentage is greater than 100%, the value will default to 100%.
 * Default value: 80%

+### `memory_mode`
+
+* Type: string
+* Description: Controls tcmalloc GC. In performance mode, Doris releases tcmalloc cache memory when usage >= 90% * mem_limit; otherwise (compact mode), Doris releases tcmalloc cache memory when usage >= 50% * mem_limit.
+* Default value: performance
+
 ### `memory_limitation_per_thread_for_schema_change`

 Default: 2 （G）
@@ -1350,26 +1356,6 @@ The RPC timeout for sending a Batch (1024 lines) during import. The default is 6

 When meet '[E1011]The server is overcrowded' error, you can tune the configuration `brpc_socket_max_unwritten_bytes`, but it can't be modified at runtime. Set it to `true` to avoid writing failed temporarily. Notice that, it only effects `write`, other rpc requests will still check if overcrowded.

-### `tc_free_memory_rate`
-
-Default: 20   (%)
-
-Available memory, value range: [0-100]
-
-### `tc_max_total_thread_cache_bytes`
-
-* Type: int64
-* Description: Used to limit the total thread cache size in tcmalloc. This limit is not a hard limit, so the actual thread cache usage may exceed this limit. For details, please refer to [TCMALLOC\_MAX\_TOTAL\_THREAD\_CACHE\_BYTES](https://gperftools.github.io/gperftools/tcmalloc.html)
-* Default: 1073741824
-
-If the system is found to be in a high-stress scenario and a large number of threads are found in the tcmalloc lock competition phase through the BE thread stack, such as a large number of `SpinLock` related stacks, you can try increasing this parameter to improve system performance. [Reference](https://github.com/gperftools/gperftools/issues/1111)
-
-### `tc_use_memory_min`
-
-Default: 10737418240
-
-The minimum memory of TCmalloc, when the memory used is less than this, it is not returned to the operating system
-
 ### `thrift_client_retry_interval_ms`

 * Type: int64
--- a/docs/zh-CN/docs/admin-manual/config/be-config.md
+++ b/docs/zh-CN/docs/admin-manual/config/be-config.md
@@ -839,6 +839,12 @@ txn 管理器中每个 txn_partition_map 的最大 txns 数，这是一种自我
 * 描述：限制BE进程使用服务器最大内存百分比。用于防止BE内存挤占太多的机器内存，该参数必须大于0，当百分大于100%之后，该值会默认为100%。
 * 默认值：80%

+### `memory_mode`
+
+* 类型：string
+* 描述：控制tcmalloc的回收。如果配置为performance，内存使用超过mem_limit的90%时，doris会释放tcmalloc cache中的内存，如果配置为compact，内存使用超过mem_limit的50%时，doris会释放tcmalloc cache中的内存。
+* 默认值：performance
+
 ### `memory_limitation_per_thread_for_schema_change`

 默认值：2 （GB）
@@ -1373,26 +1379,6 @@ tablet状态缓存的更新间隔，单位：秒

 当遇到'[E1011]The server is overcrowded'的错误时，可以调整配置项`brpc_socket_max_unwritten_bytes`，但这个配置项不能动态调整。所以可通过设置此项为`true`来临时避免写失败。注意，此配置项只影响写流程，其他的rpc请求依旧会检查是否overcrowded。

-### `tc_free_memory_rate`
-
-默认值：20   (%)
-
-可用内存，取值范围：[0-100]
-
-### `tc_max_total_thread_cache_bytes`
-
-* 类型：int64
-* 描述：用来限制 tcmalloc 中总的线程缓存大小。这个限制不是硬限，因此实际线程缓存使用可能超过这个限制。具体可参阅 [TCMALLOC\_MAX\_TOTAL\_THREAD\_CACHE\_BYTES](https://gperftools.github.io/gperftools/tcmalloc.html)
-* 默认值： 1073741824
-
-如果发现系统在高压力场景下，通过 BE 线程堆栈发现大量线程处于 tcmalloc 的锁竞争阶段，如大量的 `SpinLock` 相关堆栈，则可以尝试增大该参数来提升系统性能。[参考](https://github.com/gperftools/gperftools/issues/1111)
-
-### `tc_use_memory_min`
-
-默认值：10737418240
-
-TCmalloc 的最小内存，当使用的内存小于这个时，不返回给操作系统
-
 ### `thrift_client_retry_interval_ms`

 * 类型：int64