Commit Graph

139 Commits

Author SHA1 Message Date
de6ecd2035 [fix](tls) Manually track memory in Allocator instead of mem hook and ThreadContext life cycle to manual control (#26904)
Manually track query/load/compaction/etc. memory in Allocator instead of mem hook.
Can still use Mem Hook when cannot manually track memory code segments and find memory locations during debugging.
This will cause memory tracking loss for Query, loss less than 10% compared to the past, but this is expected to be more controllable.
Similarly, Mem Hook will no longer track unowned memory to the orphan mem tracker by default, so the total memory of all MemTrackers will be less than before.
Not need to get memory size from jemalloc in Mem Hook each memory alloc and free, which would lose performance in the past.
Not require caching bthread local in pthread local for memory hook, in the past this has caused core dumps inside bthread, seems to be a bug in bthread.
ThreadContext life cycle to manual control
In the past, ThreadContext was automatically created when it was used for the first time (this was usually in the Jemalloc Hook when the first malloc memory), and was automatically destroyed when the thread exited.
Now instead of manually controlling the create and destroy of ThreadContext, it is mainly created manually when the task thread start and destroyed before the task thread end.
Run 43 clickbench query tests.
Use MemHook in the past:
2023-11-14 10:30:42 +08:00
a5565f68b2 [Refactor](opentelemetry) Remove opentelemetry (#26605) 2023-11-09 18:05:34 +08:00
87b414cdae [Fix](query execution) Fix result sink fragment can't be cancelled in non-pipeline (#25524) 2023-10-24 11:30:29 +08:00
022762d5f0 [fix](memory) Fix work load group GC and add logs to locate slow GC #24975
Fix work load group GC, add cancel load and add logs.
Unify the format and change all to lowercase of GC logs, avoid unnecessary trouble when grep or less
Add logs to help locate the cause of slow GC.
2023-10-12 10:33:56 +08:00
f2f591e280 [fix](memory) Optimize memory exceed limit logs (#22655)
After memory exceeds the limit, print the top 15 task trackers with the largest memory.
After memory exceeds the limit, more detailed GC logs in stages.
fix large memory check.
2023-09-21 10:38:17 +08:00
c7ae2a7d22 [Refactor & Bugfix](static variables) move some static vairables to exec_env (#24029) 2023-09-13 09:27:03 +08:00
0143ae8266 [fix]Add logging before _builtin_unreachable() (#24101)
Co-authored-by: 宋光璠 <songguangfan@sf.com>
2023-09-09 00:30:11 +08:00
25b6e4deb2 [fix](daemon) Fix incorrect initialization order of daemon services (#23578)
Current initialization dependency:

      Daemon ───┬──► StorageEngine ──► ExecEnv ──► Disk/Mem/CpuInfo
                │
                │
BackendService ─┘
However, original code incorrectly initialize Daemon before StorageEngine.
This PR also stop and join threads of daemon services in their dtor, to ensure Daemon services release resources in reverse order of initialization via RAII.
2023-08-31 19:46:38 +08:00
f1e43fcaa4 [opt](cache) Support segment cache dynamic opening and closing (#23659)
Dynamically modify the config to clear the cache, each time the disable cache will only be cleared once.
TODO, Support page cache and other caches.

curl -X POST http://xxxx:8040/api/update_config?disable_segment_cache=true
2023-08-31 18:48:26 +08:00
hzq
c083336bbe [Improvement](pipeline) Cancel outdated query if original fe restarts (#23582)
If any FE restarts, queries that is emitted from this FE will be cancelled.

Implementation of #23704
2023-08-31 17:58:52 +08:00
ba351af452 [enhancement](thirdparty) upgrade thirdparty libs - again (#23414)
submit again #23290 (not upgrade brpc, because bthread local has error)

protobuf 3.15.0 -> 21.11
glog 0.4.0 -> 0.6.0
lz4 1.9.3 -> 1.9.4
curl 7.79.0 -> 8.2.1
zstd 1.5.2 -> 1.5.5
arrow 7.0.0 -> 13.0.0
abseil 20220623.1 -> 20230125.3
orc 1.7.2 -> 1.9.0
jemalloc for arrow 5.2.1 -> 5.3.0
xsimd 7.0.0 -> 13.0.0
opentelemetry-proto 0.19.0 -> 1.0.0
opentelemetry 1.8.3 -> 1.10.0

new:
c-ares -> 1.19.1
grpc -> 1.54.3
2023-08-26 22:59:10 +08:00
13cc7a31ab [fix](bug) Fix page handle safe exit #22849 2023-08-11 09:55:19 +08:00
96f42ca20a [fix](memory) Independent count exec node memory profile (#22598)
Independent count exec node memory profile, after #22582
2023-08-06 10:56:31 +08:00
e8d105d6ff [fix](debug) add bvar counter for memtracker #22581 2023-08-04 09:56:30 +08:00
ee754307bb [refactor](load) refactor memtable flush actively (#21634) 2023-07-30 21:31:54 +08:00
1f3de0eae3 [fix](memory) fix invalid large memory check && fix memory info thread safety (#22027)
fix invalid large memory check
fix memory info thread safety
2023-07-26 12:18:31 +08:00
7e1299bcbc [fix](memory) fix mem tracker grace exit 2 (#22003)
fix: #21136
mem tracker group uses class static variables instead of global variables

https://stackoverflow.com/questions/2204608/does-c-call-destructors-for-global-and-class-static-variables
TODO: A mem tracker manager is required, don't use global variables, it will sad

==3623982==ERROR: AddressSanitizer: heap-use-after-free on address 0x60f0000056b8 at pc 0x56478bbe3ae0 bp 0x7f20953d2270 sp 0x7f20953d2268
READ of size 8 at 0x60f0000056b8 thread T41 (memory_tracker_)
*** Query id: 0-0 ***
*** Aborted at 1689749969 (unix time) try "date -d @1689749969" if you are using GNU date ***
*** Current BE git commitID: b3e9cad48e ***
*** SIGSEGV address not mapped to object (@0x0) received by PID 3623982 (TID 3624277 OR 0x7f19e06dd640) from PID 0; stack trace: ***
    #0 0x56478bbe3adf in std::__shared_ptr::operator bool() const /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1295:16
    #1 0x56478bbe306e in doris::MemTracker::refresh_profile_counter() /doris/be/src/runtime/memory/mem_tracker.h:149:13
    #2 0x56478bbec669 in doris::MemTrackerLimiter::refresh_all_tracker_profile() /doris/be/src/runtime/memory/mem_tracker_limiter.cpp:119:22
    #3 0x564788f53fa0 in doris::Daemon::memory_tracker_profile_refresh_thread() /doris/be/src/common/daemon.cpp:295:9
    #4 0x564788f5d04b in doris::Daemon::start()::$_4::operator()() const /doris/be/src/common/daemon.cpp:473:30
    #5 0x564788f5cff6 in void std::__invoke_impl(std::__invoke_other, doris::Daemon::start()::$_4&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14
    #6 0x564788f5cf78 in std::enable_if, void>::type std::__invoke_r(doris::Daemon::start()::$_4&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:111:2
    #7 0x564788f5cdae in std::_Function_handler::_M_invoke(std::_Any_data const&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291:9
    #8 0x56478903f576 in std::function::operator()() const /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:560:9
    #9 0x56478c4a35af in doris::Thread::supervise_thread(void*) /doris/be/src/util/thread.cpp:465:5
    #10 0x7f217c8a244f in start_thread nptl/pthread_create.c:473:8
    #11 0x7f217cb27d52 in __clone misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95

0x60f0000056b8 is located 56 bytes inside of 168-byte region [0x60f000005680,0x60f000005728)
freed by thread T0 here:
    #0 0x564788e7280d in operator delete(void*) (/mnt/hdd01/dorisTestEnv/NEREIDS_ASAN/be/lib/doris_be+0x1758280d) (BuildId: 219493cc924323ee)
    #1 0x56478acec1d5 in std::default_delete::operator()(doris::MemTrackerLimiter*) const /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:85:2
    #2 0x56478ace9faf in std::unique_ptr >::~unique_ptr() /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:361:4
    #3 0x56478ace1471 in doris::ShardedLRUCache::~ShardedLRUCache() /doris/be/src/olap/lru_cache.cpp:581:1
    #4 0x56478ace14c8 in doris::ShardedLRUCache::~ShardedLRUCache() /doris/be/src/olap/lru_cache.cpp:572:37
    #5 0x56478acd0984 in std::default_delete::operator()(doris::Cache*) const /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:85:2
    #6 0x56478acceddf in std::unique_ptr >::~unique_ptr() /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:361:4
    #7 0x56478ad96dc6 in doris::StoragePageCache::~StoragePageCache() /doris/be/src/olap/page_cache.h:78:7
    #8 0x7f217ca54146 in __run_exit_handlers stdlib/exit.c:108:8

previously allocated by thread T0 here:
    #0 0x564788e71fad in operator new(unsigned long) (/mnt/hdd01/dorisTestEnv/NEREIDS_ASAN/be/lib/doris_be+0x17581fad) (BuildId: 219493cc924323ee)
    #1 0x56478ace9c90 in std::_MakeUniq::__single_object std::make_unique, std::allocator > const&>(doris::MemTrackerLimiter::Type&&, std::__cxx11::basic_string, std::allocator > const&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:962:30
    #2 0x56478acde930 in doris::ShardedLRUCache::ShardedLRUCache(std::__cxx11::basic_string, std::allocator > const&, unsigned long, doris::LRUCacheType, unsigned int, unsigned int) /doris/be/src/olap/lru_cache.cpp:526:20
    #3 0x56478ace22e1 in doris::new_lru_cache(std::__cxx11::basic_string, std::allocator > const&, unsigned long, doris::LRUCacheType, unsigned int) /doris/be/src/olap/lru_cache.cpp:670:16
    #4 0x56478ad91da2 in doris::StoragePageCache::StoragePageCache(unsigned long, int, long, unsigned int) /doris/be/src/olap/page_cache.cpp:47:17
    #5 0x56478ad9156e in doris::StoragePageCache::create_global_cache(unsigned long, int, long, unsigned int) /doris/be/src/olap/page_cache.cpp:31:29
    #6 0x56478b98b3d3 in doris::ExecEnv::_init_mem_env() /doris/be/src/runtime/exec_env_init.cpp:251:5
    #7 0x56478b98946c in doris::ExecEnv::_init(std::vector > const&) /doris/be/src/runtime/exec_env_init.cpp:182:5
    #8 0x56478b987139 in doris::ExecEnv::init(doris::ExecEnv*, std::vector > const&) /doris/be/src/runtime/exec_env_init.cpp:98:17
    #9 0x564788e79b50 in main /doris/be/src/service/doris_main.cpp:429:5
    #10 0x7f217ca38564 in __libc_start_main csu/../csu/libc-start.c:332:16

Thread T41 (memory_tracker_) created by T0 here:
    #0 0x564788e1fcaa in pthread_create (/mnt/hdd01/dorisTestEnv/NEREIDS_ASAN/be/lib/doris_be+0x1752fcaa) (BuildId: 219493cc924323ee)
    #1 0x56478c4a2366 in doris::Thread::start_thread(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, std::function const&, unsigned long, scoped_refptr*) /doris/be/src/util/thread.cpp:419:15
    #2 0x564788f59b91 in doris::Status doris::Thread::create(std::__cxx11::basic_string, std::allocator > const&, std::__cxx11::basic_string, std::allocator > const&, doris::Daemon::start()::$_4 const&, scoped_refptr*) /doris/be/src/util/thread.h:50:16
    #3 0x564788f58165 in doris::Daemon::start() /doris/be/src/common/daemon.cpp:471:10
    #4 0x564788e79a96 in main /doris/be/src/service/doris_main.cpp:420:12
    #5 0x7f217ca38564 in __libc_start_main csu/../csu/libc-start.c:332:16
2023-07-20 10:33:03 +08:00
4b30485d62 [improvement](memory) Refactor doris cache GC (#21522)
Abstract CachePolicy, which controls the gc of all caches.
Add stale sweep to all lru caches, including page caches, etc.
I0710 18:32:35.729460 2945318 mem_info.cpp:172] End Full GC Free, Memory 3866389992 Bytes. cost(us): 112165339, details: FullGC:
  FreeTopMemoryQuery:
     - CancelCostTime: 1m51s
     - CancelTasksNum: 1
     - FindCostTime: 0.000ns
     - FreedMemory: 2.93 GB
  WorkloadGroup:
  Cache name=DataPageCache:
     - CostTime: 15.283ms
     - FreedEntrys: 9.56K
     - FreedMemory: 691.97 MB
     - PruneAllNumber: 1
     - PruneStaleNumber: 1
2023-07-11 20:21:31 +08:00
38c8657e5e [improve](memory) more grace logging for memory exceed limit (#21311)
more grace logging for Allocator and MemTracker when memory exceed limit
fix bthread grace exit.
2023-07-05 14:59:06 +08:00
d2c42ec638 [fix](memory) Purge Jemalloc arena dirty pages when memory insufficient (#21237)
Jemalloc dirty page only use madvise MADV_FREE, memory is not release back to system, RSS won't reduce in time,

So when the process memory exceed limit or system available memory is insufficient,
manually transfer dirty page to the muzzy page, which will call MADV_DONTNEED to release the physical memory back to the system.

https://jemalloc.net/jemalloc.3.html#opt.dirty_decay_ms
2023-06-28 16:49:45 +08:00
0396f78590 [fix](memory) Remove ChunkAllocator & fix Allocator no use mmap (#21259) 2023-06-28 16:10:24 +08:00
29b3d39561 [enhancement](memory) print stacktrace for large allocation (#21069) 2023-06-27 19:39:51 +08:00
6f7759b08d [fix](memory) fix mem tracker grace exit (#21136) 2023-06-26 10:28:24 +08:00
84b97860a1 [fix](memory) Fix memory exceed limit and query has been canceled, Allocator will block 100ms (#20959) 2023-06-21 17:35:19 +08:00
ae8914d5c8 [fix](memory) Fix meta_tool start failed #20045 2023-06-05 17:09:13 +08:00
b81e9e2521 [fix](resource-group) Fix resource group memory isolation may release too much memory (#20066)
Suppose three queries are executed in a resource group with a memory_limit of 8G, and they consume memory of query_a = 3G, query_b = 3G, and query_c = 3G. The total memory used is counted as 9G when the resource group GC is executed, which exceeds the resource group limit and cancels query_a.

When the resource group is next GC, the memory of query_a may not be freed yet, and it will be counted again in the total memory consumed by that resource group, which again exceeds the resource group limit and cancels query_b.

From the user's perspective, it is fine to execute query_a and query_b at the same time, but executing query_ a, query_b and query_c will be cancelled for two queries, which is not as expected.

This pr skips the queries that are cancelled when counting the memory used by the resource group. If this causes the process memory to grow, the process gc will handle it.
2023-05-26 14:50:12 +08:00
1c950d6930 [fix](config) fix memory config enable_query_memroy_overcommit spell problem #19898 2023-05-22 00:32:20 +08:00
33fd965b5c [feature-wip](resouce-group) Supports memory soft isolation of resource group (#19802)
create resource groups name properties(
    'enable_memory_overcommit' = 'true' // whether to enable memory soft isolation
)
2023-05-21 19:33:57 +08:00
abde8bf26a [chore](build) Fix the compilation errors on macOS (arm64) (#19859)
Some errors raise when building the codebase on macOS (arm64).
2023-05-19 18:50:47 +08:00
068a32bc49 [Improvement](memory) faststring use Allocator #19762
After the outer catch exception, faststring resize reserve build may throw a memory alloc failure exception from the Allocator.

Currently page body compress will catch memory alloc failure exception
2023-05-18 15:00:49 +08:00
7c8b7878cd [fix](memory) Print all query/load memory before memory GC when memory_debug=true (#19720) 2023-05-18 14:55:47 +08:00
6c9c9e9765 [feature-wip](resource-group) Supports memory hard isolation of resource group (#19526) 2023-05-15 22:45:46 +08:00
03538381a3 [enhancement](memory) MemCounter supports lock-free thread safety (#19256)
make try_add() and update_peak() thread-safe.
2023-05-10 02:24:07 +08:00
58cb404661 [fix](memory) Allocator throws Exception instead of std::bad_alloc (#19285)
W0505 01:31:25.840227 1727715 scanner_scheduler.cpp:340] Scan thread read VScanner failed: [MEM_LIMIT_EXCEEDED]PreCatch error code:11, [E11] Allocator sys memory check failed: Cannot alloc:16384, consuming tracker:<Orphan>, exec node:<>, process memory used 5.87 GB exceed limit 5.64 GB or sys mem available 252.17 GB less than low water mark 1.60 GB, failed alloc size 16.00 KB.
    @     0x555c19e0cca8  doris::Exception::Exception()
    @     0x555c1c3e0c3f  Allocator<>::sys_memory_check()
    @     0x555c1c3e1052  Allocator<>::memory_check()
    @     0x555c19e0a645  Allocator<>::alloc()
    @     0x555c1c34508b  COWHelper<>::create<>()
    @     0x555c1e23f574  doris::vectorized::ConvertThroughParsing<>::execute<>()
    @     0x555c1e23f209  doris::vectorized::FunctionConvertFromString<>::execute_impl()
    @     0x555c1e23f4aa  doris::vectorized::FunctionConvertFromString<>::execute_impl()
    @     0x555c1e15ac29  doris::vectorized::PreparedFunctionImpl::execute_without_low_cardinality_columns()
    @     0x555c1e15ac56  doris::vectorized::PreparedFunctionImpl::execute()
    @     0x555c1e245276  _ZNSt17_Function_handlerIFN5doris6StatusEPNS0_15FunctionContextERNS0_10vectorized5BlockERKSt6vectorImSaImEEmmEZNKS4_12FunctionCast14create_wrapperINS4_14DataTypeNumberIiEEEESt8functionISC_ERKSt10shared_ptrIKNS4_9IDataTypeEEPKT_bEUlS3_S6_SB_mmE_E9_M_invokeERKSt9_Any_dataOS3_S6_SB_OmSY_
    @     0x555c1e2a9341  _ZZNK5doris10vectorized12FunctionCast23prepare_remove_nullableEPNS_15FunctionContextERKSt10shared_ptrIKNS0_9IDataTypeEES9_bENKUlS3_RNS0_5BlockERKSt6vectorImSaImEEmmE_clES3_SB_SG_mm
    @     0x555c1e2a8d42  _ZNSt17_Function_handlerIFN5doris6StatusEPNS0_15FunctionContextERNS0_10vectorized5BlockERKSt6vectorImSaImEEmmEZNKS4_12FunctionCast23prepare_remove_nullableES3_RKSt10shared_ptrIKNS4_9IDataTypeEESJ_bEUlS3_S6_SB_mmE_E9_M_invokeERKSt9_Any_dataOS3_S6_SB_OmSQ_
    @     0x555c1e20e42b  doris::vectorized::PreparedFunctionCast::execute_impl()
    @     0x555c1e15ac29  doris::vectorized::PreparedFunctionImpl::execute_without_low_cardinality_columns()
    @     0x555c1e15ac56  doris::vectorized::PreparedFunctionImpl::execute()
    @     0x555c1d63e960  doris::vectorized::IFunctionBase::execute()
    @     0x555c1d628700  doris::vectorized::VCastExpr::execute()
    @     0x555c1d6163e5  doris::vectorized::VExprContext::execute()
    @     0x555c20a83fe1  doris::vectorized::VFileScanner::_convert_to_output_block()
    @     0x555c20a809af  doris::vectorized::VFileScanner::_get_block_impl()
    @     0x555c209b9bc4  doris::vectorized::VScanner::get_block()
    @     0x555c209b1a50  doris::vectorized::ScannerScheduler::_scanner_scan()
    @     0x555c209b2ac1  _ZNSt17_Function_handlerIFvvEZZN5doris10vectorized16ScannerScheduler18_schedule_scannersEPNS2_14ScannerContextEENK3$_0clEvEUlvE1_E9_M_invokeERKSt9_Any_data
    @     0x555c1a8378cf  doris::ThreadPool::dispatch_thread()
    @     0x555c1a830fac  doris::Thread::supervise_thread()
    @     0x7f461faa117a  start_thread
    @     0x7f462033bdf3  __GI___clone
    @              (nil)  (unknown)
2023-05-05 18:01:48 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
c704351273 [enhancement](memory) Refactor memory limit exceeded behavior (#18590)
No check mem tracker limit and no cancel task in mem hook, only in Allocator. This helps in clearer analysis of memory issues and reduces performance loss.
PODArray/hash table/arena memory allocation will use Allocator.

Optimize mem limit exceeded log printing

Optimize compilation time
2023-04-14 10:42:35 +08:00
308ff9a16f [enchancement](memory) tracking lru cache memory and page memory not in cache (#18361)
Statistics lru cache memory in metrics
Statistics page memory not in cache in mem tracker
2023-04-07 14:22:44 +08:00
01d012bab7 [fix](memory) Remove page cache regular clear, disabled jemalloc prof by default (#18218)
Remove page cache regular clear
Now the page cache is turned off by default. If the user manually opens the page cache, it can be considered that the user can accept the memory usage of the page cache, and then can consider adding a manual clear command to the cache.

fix memory gc cancel top memory query

jemalloc prof is not enabled by default
2023-03-30 09:39:37 +08:00
990479e177 [refactor](memory) Query waits for memory free in Allocator, after memory exceed limit. (#18075)
After the memory exceeds the limit, the previous query waited for memory free in the mem hook, and changed it to wait in the Allocator.

more controllable and safe
2023-03-27 09:06:03 +08:00
7c0bcbdca1 [enhance](parquet-reader) cache file meta of parquet to speed up query (#18074)
Problem:
1. FE will split the parquet file into split. So a file can have several splits.
2. BE will scan each split, read the footer of the parquet file.
3. If 2 splits belongs to a same parquet file, the footer of this file will be read twice.

This PR mainly changes:
1. Use kv cache to cache the footer of parquet file.
2. The kv cache is belong to a scan node, so all parquet reader belong to this scan node will share same kv cache.
3. In cache, the key is "meta_file_path", the value is parsed thrift footer.

The KV Cache is sharded into mutlti sub cache.
So that different file can use different sub cache, avoid blocking each other

In my test, a query with 26 splits can reduce the footer parse time from 4s -> 1s
2023-03-25 23:22:57 +08:00
Pxl
16fc3a0e22 [Chore](compile) remove some unused static on inline function to reduce compile time (#17603)
remove some unused static on inline function to reduce compile time
2023-03-13 11:11:59 +08:00
b7677beab7 [enhancement](memtracker) Add special counter for memtracker and fix thread create and destroy track #17301
Add a special counter for memtracker, faster, but relaxed ordering and not accurate in real time
Track thread create and destroy memory, which was previously removed due to performance loss and added back
2023-03-02 08:55:00 +08:00
3871e989ac [fix](memory) Avoid repeating meaningless memory gc #17258 2023-03-01 19:23:33 +08:00
a1c0054b4c [fix](memory) fix memory GC details and join probe catch bad_alloc (#16989)
Fix Redhat 4.x OS /proc/meminfo has no MemAvailable, disable MemAvailable to control memory.
vm_rss_str and mem_available_str recorded when gc is triggered, to avoid memory changes during gc and cause inaccurate logs.
join probe catch bad_alloc, this may alloc 64G memory at a time, avoid OOM.
Modify document doris_be_all_segments_num and doris_be_all_rowsets_num names.
2023-02-23 08:33:30 +08:00
2074b83c67 [enhancement](third-party) Upgrade JEMalloc version from 5.2.1 to 5.3.0 (#14871)
https://github.com/jemalloc/jemalloc/releases
2023-02-20 00:00:40 +08:00
63d57b83f3 [fix](memory) Fix request jemallloc metrics wait lock je_malloc_mutex_lock_slow #16381
MetricRegistry::trigger_all_hooks holds the metrics lock and is stuck in get_je_metrics, to_prometheus is waiting for MetricRegistry::trigger_all_hooks to release the lock, so get_je_metrics is no longer called in MetricRegistry::trigger_all_hooks.
2023-02-04 22:49:22 +08:00
Pxl
5e4bb98900 [Chore](build) enable -Wpedantic and update lowest gcc version to 11.1 (#16290)
enable -Wpedantic and update lowest gcc version to 11.1
2023-02-03 11:28:48 +08:00
e9afd3210c [improvement](memory) Optimize the log of process memory insufficient and support regular GC cache (#16084)
1. When the process memory is insufficient, print the process memory statistics in a more timely and detailed manner.
2. Support regular GC cache, currently only page cache and chunk allocator are included, because many people reported that the memory does not drop after the query ends.
3. Reduce system available memory warning water mark to reduce memory waste
4. Optimize soft mem limit logging
2023-01-29 10:02:04 +08:00
949a065f22 [improvement](memory) load support overcommit memory (#16083)
Overcommit memory means that when the memory is sufficient, it no longer checks whether the memory of query/load exceeds the exec mem limit.

Instead, when the process memory exceeds the limit or the available system memory is insufficient, cancel the top overcommit query in minor gc, and cancel top memory query in full gc.

Previously only query supported overcommit memory, this pr supports load, including insert into and stream load.

Detailed explanation, I will update the memory document in these two days~
15bd56cd43/docs/zh-CN/docs/admin-manual/maint-monitor/memory-management/memory-limit-exceeded-analysis.md
2023-01-28 16:10:18 +08:00
97fcad76f8 [enhancement](memtracker) Improve readability (#15716) 2023-01-16 16:30:35 +08:00