Commit Graph

103 Commits

Author SHA1 Message Date
0d691c638b [Feature](profile)Support report runtime workload statistics #29591 2024-01-12 11:59:27 +08:00
83e7235bab [fix](memory) Add thread asynchronous purge jemalloc dirty pages (#28655)
jemallctl purge all arena dirty pages may take several seconds, which will block memory GC and cause OOM.
So purge asynchronously in a thread.
2023-12-22 12:05:20 +08:00
022762d5f0 [fix](memory) Fix work load group GC and add logs to locate slow GC #24975
Fix work load group GC, add cancel load and add logs.
Unify the format and change all to lowercase of GC logs, avoid unnecessary trouble when grep or less
Add logs to help locate the cause of slow GC.
2023-10-12 10:33:56 +08:00
c7ae2a7d22 [Refactor & Bugfix](static variables) move some static vairables to exec_env (#24029) 2023-09-13 09:27:03 +08:00
25b6e4deb2 [fix](daemon) Fix incorrect initialization order of daemon services (#23578)
Current initialization dependency:

      Daemon ───┬──► StorageEngine ──► ExecEnv ──► Disk/Mem/CpuInfo
                │
                │
BackendService ─┘
However, original code incorrectly initialize Daemon before StorageEngine.
This PR also stop and join threads of daemon services in their dtor, to ensure Daemon services release resources in reverse order of initialization via RAII.
2023-08-31 19:46:38 +08:00
f88f021e52 [fix](bug) Fix BE thread safe start and stop #22560 2023-08-11 15:34:10 +08:00
ed6bb1fc9d [fix](memory) remove memory tracker profile refresh thread #22582
Memtrackers are usually bound to operators in query/load. If a large number of query/loads are stuck, memtrackers will be very large. memory tracker profile refresh thread will get stuck on the lock.

This pr is for branch-2.0, I will rewrite the memory profile in the next pr
2023-08-04 11:51:19 +08:00
ee754307bb [refactor](load) refactor memtable flush actively (#21634) 2023-07-30 21:31:54 +08:00
38c8657e5e [improve](memory) more grace logging for memory exceed limit (#21311)
more grace logging for Allocator and MemTracker when memory exceed limit
fix bthread grace exit.
2023-07-05 14:59:06 +08:00
d2c42ec638 [fix](memory) Purge Jemalloc arena dirty pages when memory insufficient (#21237)
Jemalloc dirty page only use madvise MADV_FREE, memory is not release back to system, RSS won't reduce in time,

So when the process memory exceed limit or system available memory is insufficient,
manually transfer dirty page to the muzzy page, which will call MADV_DONTNEED to release the physical memory back to the system.

https://jemalloc.net/jemalloc.3.html#opt.dirty_decay_ms
2023-06-28 16:49:45 +08:00
1ec7f0e50a [fix](memory) memory management thread exits gracefully
memory_maintenance_thread, memory_gc_thread, load_channel_tracker_refresh_thread, memory_tracker_profile_refresh_thread
2023-06-16 11:40:24 +08:00
e801e3b737 [fix](memory) Fix crash at bthread_setspecific in brpc::Socket::CheckHealth() (#20450)
Only switch to bthread local when modifying the mem tracker in the thread context. No longer switches to bthread local by default when bthread starts
mem tracker increases brpc IOBufBlockMemory memory
remove thread mem tracker metrics
2023-06-08 19:48:19 +08:00
346c51faa2 [fix](expr) Make VExprContext exit gracefully (#19984) 2023-05-26 20:21:53 +08:00
cf7a74f6ec [fix](memory) query check cancel while waiting for memory in Allocator, and optimize log (#19967)
After the query check process memory exceed limit in Allocator, it will wait up to 5s.
Before, Allocator will not check whether the query is canceled while waiting for memory, this causes the query to not end quickly.
2023-05-24 11:08:48 +08:00
33fd965b5c [feature-wip](resouce-group) Supports memory soft isolation of resource group (#19802)
create resource groups name properties(
    'enable_memory_overcommit' = 'true' // whether to enable memory soft isolation
)
2023-05-21 19:33:57 +08:00
7c8b7878cd [fix](memory) Print all query/load memory before memory GC when memory_debug=true (#19720) 2023-05-18 14:55:47 +08:00
d5d47703fe [fix](memory) remove auto option in memory config and optimize memtracker logs #19706
fix mem_limit default value
memory_gc_sleep_time_s to memory_gc_sleep_time_ms
LoadChannelMgr::_handle_mem_exceed_limit process_mem_limit to process soft mem limit
fix query mem tracker print
2023-05-18 08:54:03 +08:00
6c9c9e9765 [feature-wip](resource-group) Supports memory hard isolation of resource group (#19526) 2023-05-15 22:45:46 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
b68857902e [Compile](BE) Fix compile failed with tcmalloc (#18748) 2023-04-18 09:26:45 +08:00
9e960f4c4f [chore](build) Use include-what-you-use to optimize includes (#18681)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-17 11:44:58 +08:00
dd78001cc1 [fix](memory) Fix memtable flush mem tracker #18330 2023-04-03 20:37:14 +08:00
01d012bab7 [fix](memory) Remove page cache regular clear, disabled jemalloc prof by default (#18218)
Remove page cache regular clear
Now the page cache is turned off by default. If the user manually opens the page cache, it can be considered that the user can accept the memory usage of the page cache, and then can consider adding a manual clear command to the cache.

fix memory gc cancel top memory query

jemalloc prof is not enabled by default
2023-03-30 09:39:37 +08:00
335c1e5953 [fix](memory) Fix MacOS mem_limit parse error and GC after env Init #17528
Fix MacOS mem_limit parse result is 0.
Fix GC after env Init, otherwise, when the memory is insufficient, BE will start failure.
*** Query id: 0-0 ***
*** Aborted at 1677833773 (unix time) try "date -d @1677833773" if you are using GNU date ***
*** Current BE git commitID: 8ee5f45 ***
*** SIGSEGV address not mapped to object (@0x70) received by PID 24145 (TID 0x7fa53c9fd700) from PID 112; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at be/src/common/signal_handler.h:420
 1# os::Linux::chained_handler(int, siginfo*, void*) in /usr/local/jdk/jre/lib/amd64/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/local/jdk/jre/lib/amd64/server/libjvm.so
 3# signalHandler(int, siginfo*, void*) in /usr/local/jdk/jre/lib/amd64/server/libjvm.so
 4# 0x00007FA56295A400 in /lib64/libc.so.6
 5# doris::MemTrackerLimiter::log_process_usage_str(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) at be/src/runtime/memory/mem_tracker_limiter.cpp:208
 6# doris::MemTrackerLimiter::print_log_process_usage(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) at be/src/runtime/memory/mem_tracker_limiter.cpp:226
 7# doris::Daemon::memory_maintenance_thread() at be/src/common/daemon.cpp:245
 8# doris::Thread::supervise_thread(void*) at be/src/util/thread.cpp:455
 9# start_thread in /lib64/libpthread.so.0
10# clone in /lib64/libc.so.6
2023-03-08 14:00:57 +08:00
b7677beab7 [enhancement](memtracker) Add special counter for memtracker and fix thread create and destroy track #17301
Add a special counter for memtracker, faster, but relaxed ordering and not accurate in real time
Track thread create and destroy memory, which was previously removed due to performance loss and added back
2023-03-02 08:55:00 +08:00
a1e3b908d7 [fix](memory) split mem usage thread and gc thread to different threads (#17213)
Ensure that the memory status is refreshed in time
Avoid frequent GC
2023-03-01 19:19:05 +08:00
a1c0054b4c [fix](memory) fix memory GC details and join probe catch bad_alloc (#16989)
Fix Redhat 4.x OS /proc/meminfo has no MemAvailable, disable MemAvailable to control memory.
vm_rss_str and mem_available_str recorded when gc is triggered, to avoid memory changes during gc and cause inaccurate logs.
join probe catch bad_alloc, this may alloc 64G memory at a time, avoid OOM.
Modify document doris_be_all_segments_num and doris_be_all_rowsets_num names.
2023-02-23 08:33:30 +08:00
d013d529c8 [Feature](ipv6)Support IPV6 (#14063)
Support IPV6 in Apache Doris, the main changes are:
1. enable binding to IPV6 address if network priority in config file contains an IPV6 CIDR string
2. BRPC and HTTP support binding to IPV6 address
3. BRPC and HTTP support visiting IPV6 Services
2023-02-14 21:43:10 +08:00
1de4e312cc [fix](metric) Fix be core when set enable_system_metrics to false in be (#16646)
when enable_system_metrics is false, we should not use system_metrics any more

Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2023-02-12 23:01:41 +08:00
63d57b83f3 [fix](memory) Fix request jemallloc metrics wait lock je_malloc_mutex_lock_slow #16381
MetricRegistry::trigger_all_hooks holds the metrics lock and is stuck in get_je_metrics, to_prometheus is waiting for MetricRegistry::trigger_all_hooks to release the lock, so get_je_metrics is no longer called in MetricRegistry::trigger_all_hooks.
2023-02-04 22:49:22 +08:00
17885acd09 [improvement](metrics) Metrics add all rowset nums and segment nums (#16208) 2023-01-30 09:55:32 +08:00
e9afd3210c [improvement](memory) Optimize the log of process memory insufficient and support regular GC cache (#16084)
1. When the process memory is insufficient, print the process memory statistics in a more timely and detailed manner.
2. Support regular GC cache, currently only page cache and chunk allocator are included, because many people reported that the memory does not drop after the query ends.
3. Reduce system available memory warning water mark to reduce memory waste
4. Optimize soft mem limit logging
2023-01-29 10:02:04 +08:00
e49766483e [refactor](remove unused code) remove many xxxVal structure (#16143)
remove many xxxVal structure
remove BetaRowsetWriter::_add_row
remove anyval_util.cpp
remove non-vectorized geo functions
remove non-vectorized like predicate
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-28 14:17:43 +08:00
0148b39de0 [fix](metric) fix be down when enable_system_metrics is false (#16140)
if we set enable_system_metrics to false, we will see be down with following message "enable metric calculator failed, 
maybe you set enable_system_metrics to false ", so fix it
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2023-01-28 00:10:39 +08:00
adb758dcac [refactor](remove non vec code) remove json functions string functions match functions and some code (#16141)
remove json functions code
remove string functions code
remove math functions code
move MatchPredicate to olap since it is only used in storage predicate process
remove some code in tuple, Tuple structure should be removed in the future.
remove many code in collection value structure, they are useless
2023-01-26 16:21:12 +08:00
615a5e7b51 [refactor](remove non vec code) remove non vec functions and AggregateInfo (#16138)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-25 12:53:05 +08:00
6e8eedc521 [refactor](remove unused code) remove storage buffer and orc reader (#16137)
remove olap storage byte buffer
remove orc reader
remove time operator
remove read_write_util
remove aggregate funcs
remove compress.h and cpp
remove bhp_lib

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-24 22:29:32 +08:00
79ad74637d [refactor](remove expr) remove non vectorized Expr and ExprContext related codes (#16136) 2023-01-24 10:45:35 +08:00
05f0f63718 [fix](daemon) should use GetMonoTimeMicros() (#16070) 2023-01-19 10:44:06 +08:00
16862d9b43 [refactor](remove unused code) remove buffer pool and disk io mgr (#15853)
* [refactor](remove buffer pool and disk io mgr) remove unused code


Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-13 09:42:58 +08:00
0fbdf8e3e1 [Refactor](table function) Decouple vectorized table functions from non-vectorized ones (#15772) 2023-01-12 15:08:21 +08:00
8f31a36429 [feature] support spill to disk for sort node (#15624) 2023-01-11 08:40:58 +08:00
edecc2e706 [feature-wip](inverted index) API for inverted index reader and syntax for fulltext match (#14211)
* [feature-wip](inverted index)inverted index api: reader

* [feature-wip](inverted index) Fulltext query syntax with MATCH/MATCH_ALL/MATCH_ALL

* [feature-wip](inverted index) Adapt to index meta

* [enhance] add more metrics

* [enhance] add fulltext match query check for column type and index parser

* [feature-wip](inverted index) Support apply inverted index in compound predicate which except leaf node of and node
2022-12-30 21:48:14 +08:00
c16cc5c602 [fix](memtracker) Fix load channel memory tracker are not refreshed in time (#15048) 2022-12-16 10:43:03 +08:00
f3aea7f0f0 [Enhancement](status) Unify error code and enable customed err msg for BE internal errors (#14744) 2022-12-11 23:33:18 +08:00
cdbbf1e4ee [enhancement](memory) Add Memory GC when the available memory of the BE process is lacking (#14712)
When the system MemAvailable is less than the warning water mark, or the memory used by the BE process exceeds the mem soft limit, run minor gc and try to release cache.

When the MemAvailable of the system is less than the low water mark, or the memory used by the BE process exceeds the mem limit, run fucc gc, try to release the cache, and start canceling from the query with the largest memory usage until the memory of mem_limit * 20% is released.
2022-12-07 15:28:52 +08:00
07472f7318 [fix](tcmalloc_gc) optimize policy of tcmalloc gc (#14776)
Release memory when memory pressure is above pressure limit and keep at lease 2% memory as tcmalloc cache.
2022-12-05 21:16:35 +08:00
5b29489c7f (tcmalloc) gc does not work in somecases (#14732)
gc does not work in some cases
2022-12-02 09:18:23 +08:00
486a77fec0 [fix](tcmalloc) use low_watermark instead of hard_mem_limit (#14660)
* [fix](tcmalloc) use low_watermark instead of hard_mem_limit

hard_mem_limit is removed.

* format
2022-11-30 11:29:57 +08:00
e1f0fa069c [enhancement](memory) Refactored process memory statistics periodically refresh, and fix catch bad_alloc (#14580) 2022-11-29 10:15:25 +08:00