doris

Author	SHA1	Message	Date
Xinyi Zou	068a32bc49	[Improvement](memory) faststring use Allocator #19762 After the outer catch exception, faststring resize reserve build may throw a memory alloc failure exception from the Allocator. Currently page body compress will catch memory alloc failure exception	2023-05-18 15:00:49 +08:00
Adonis Ling	e412dd12e8	[chore](build) Use include-what-you-use to optimize includes (PART II) (#18761 ) Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.	2023-04-19 23:11:48 +08:00
Xinyi Zou	cdbbf1e4ee	[enhancement](memory) Add Memory GC when the available memory of the BE process is lacking (#14712 ) When the system MemAvailable is less than the warning water mark, or the memory used by the BE process exceeds the mem soft limit, run minor gc and try to release cache. When the MemAvailable of the system is less than the low water mark, or the memory used by the BE process exceeds the mem limit, run fucc gc, try to release the cache, and start canceling from the query with the largest memory usage until the memory of mem_limit * 20% is released.	2022-12-07 15:28:52 +08:00
yiguolei	fd3af489a4	[memory](chunkallocator) disable chunkallocator when reserved bytes == 0 (#14494 ) disable chunkallocator when reserved bytes == 0 disable chunkallocator by default	2022-11-23 17:12:53 +08:00
Xinyi Zou	0b945fe361	[enhancement](memtracker) Refactor mem tracker hierarchy (#13585 ) mem tracker can be logically divided into 4 layers: 1)process 2)type 3)query/load/compation task etc. 4)exec node etc. type includes enum Type { GLOBAL = 0, // Life cycle is the same as the process, e.g. Cache and default Orphan QUERY = 1, // Count the memory consumption of all Query tasks. LOAD = 2, // Count the memory consumption of all Load tasks. COMPACTION = 3, // Count the memory consumption of all Base and Cumulative tasks. SCHEMA_CHANGE = 4, // Count the memory consumption of all SchemaChange tasks. CLONE = 5, // Count the memory consumption of all EngineCloneTask. Note: Memory that does not contain make/release snapshots. BATCHLOAD = 6, // Count the memory consumption of all EngineBatchLoadTask. CONSISTENCY = 7 // Count the memory consumption of all EngineChecksumTask. } Object pointers are no longer saved between each layer, and the values of process and each type are periodically aggregated. other fix: In [fix](memtracker) Fix transmit_tracker null pointer because phamp is not thread safe #13528, I tried to separate the memory that was manually abandoned in the query from the orphan mem tracker. But in the actual test, the accuracy of this part of the memory cannot be guaranteed, so put it back to the orphan mem tracker again.	2022-11-08 09:52:33 +08:00
HappenLee	d2be5096d6	[Revert](mem) revert the mem config cause perfermace degradation (#13526 ) * Revert "[fix](mem) failure of allocating memory (#13414)" This reverts commit 971eb9172f3e925c0b46ec1ffd1a9037a1b49801. * Revert "[improvement](memory) disable page cache and chunk allocator, optimize memory allocate size (#13285)" This reverts commit a5f3880649b094b58061f25c15dccdb50a4a2973.	2022-10-21 08:32:16 +08:00
yiguolei	a5f3880649	[improvement](memory) disable page cache and chunk allocator, optimize memory allocate size (#13285 ) disable page cache by default disable chunk allocator by default not use chunk allocator for vectorized allocator by default add a new config memory_linear_growth_threshold = 128Mb, not allocate memory by RoundUpToPowerOf2 if the allocated size is larger than this threshold. This config is added to MemPool, ChunkAllocator, PodArray, Arena.	2022-10-15 17:27:17 +08:00
Xinyi Zou	82ca62dfcc	[fix](memory) Fix disable_mem_pools to disable cache #12087	2022-08-26 11:43:19 +08:00
Xinyi Zou	c124470408	[enhancement](memory) Fix too much cache leads to less memory available for queries (#11751 ) Disable Chunk Allocator in Vectorized Allocator, this will reduce memory cache. For high concurrent queries, using Chunk Allocator with vectorized Allocator can reduce the impact of gperftools tcmalloc central lock. Jemalloc or google tcmalloc have core cache, Chunk Allocator may no longer be needed after replacing gperftools tcmalloc.	2022-08-16 14:35:57 +08:00
muyizi	bd6e3cf132	[improvement]lock_times_limit (#11404 ) Co-authored-by: songning03 <songning03@meituan.com>	2022-08-02 10:59:58 +08:00
Xinyi Zou	4960043f5e	[enhancement] Refactor to improve the usability of MemTracker (step2) (#10823 )	2022-07-21 17:11:28 +08:00
Tiewei Fang	c9f86bc7e2	[refactor] Refactoring Status static methods to format message using fmt(#9533 )	2022-07-02 18:58:23 +08:00
Xinyi Zou	deeb3028ad	[Enhancement] [Memory] [Vectorized] Stress test and optimize memory allocation (#9581 ) * vec stress test, Allocator introduce chunkallocator * fix comment	2022-06-29 02:57:51 +08:00
Xinyi Zou	b34ed43ec9	[feature-wip] (memory tracker) (step6, End) Fix some details (#9301 ) 1. Fix LoadTask, ChunkAllocator, TabletMeta, Brpc, the accuracy of memory track. 2. Modified some MemTracker names, deleted some unnecessary trackers, and improved readability. 3. More powerful MemTracker debugging capabilities. 4. Avoid creating TabletColumn temporary objects and improve BE startup time by 8%. 5. Fix some other details.	2022-05-10 18:17:09 +08:00
Xinyi Zou	26bc462e1c	[feature-wip] (memory tracker) (step5) Fix track bthread, fix track vectorized query (#9145 ) 1. fix track bthread - Bthread, a high performance M:N thread library used by brpc. In Doris, a brpc server response runs on one bthread, possibly on multiple pthreads. Currently, MemTracker consumption relies on pthread local variables (TLS). - This caused pthread TLS MemTracker confusion when switching pthread TLS MemTracker in brpc server response. So replacing pthread TLS with bthread TLS in the brpc server response saves the MemTracker. Ref: `731730da85/docs/en/server.md (bthread-local)` 2. fix track vectorized query - Added track mmap. Currently, mmap allocates memory in many places of the vectorized execution engine. - Refactored ThreadContext to avoid dependency conflicts and make it easier to debug. - Fix some bugs.	2022-04-27 20:34:02 +08:00
hongbin	c71ffc01de	[Refactor] Cleanup some unused include (#9063 )	2022-04-18 09:52:31 +08:00
Xinyi Zou	519305cb22	[feature-wip] (memory tracker) (step4) Switch TLS mem tracker to separate more detailed memory usage (#8669 ) Based on #8605, Separate out the memory usage of each operator from the Query/Load/StorageEngine mem tracker.	2022-04-08 09:02:26 +08:00
Xinyi Zou	aaaaae53b5	[feature] (memory) Switch TLS mem tracker to separate more detailed memory usage (#8605 ) In pr #8476, all memory usage of a process is recorded in the process mem tracker, and all memory usage of a query is recorded in the query mem tracker, and it is still necessary to manually call `transfer to` to track the cached memory size. We hope to separate out more detailed memory usage based on Hook TCMalloc new/delete + TLS mem tracker. In this pr, the more detailed mem tracker is switched to TLS, which automatically and accurately counts more detailed memory usage than before.	2022-03-24 14:29:34 +08:00
Xinyi Zou	eeae516e37	[Feature](Memory) Hook TCMalloc new/delete automatically counts to MemTracker (#8476 ) Early Design Documentation: https://shimo.im/docs/DT6JXDRkdTvdyV3G Implement a new way of memory statistics based on TCMalloc New/Delete Hook, MemTracker and TLS, and it is expected that all memory new/delete/malloc/free of the BE process can be counted.	2022-03-20 23:06:54 +08:00
Xinyi Zou	e17aef9467	[refactor] refactor the implement of MemTracker, and related usage (#8322 ) Modify the implementation of MemTracker: 1. Simplify a lot of useless logic; 2. Added MemTrackerTaskPool, as the ancestor of all query and import trackers, This is used to track the local memory usage of all tasks executing; 3. Add cosume/release cache, trigger a cosume/release when the memory accumulation exceeds the parameter mem_tracker_consume_min_size_bytes; 4. Add a new memory leak detection mode (Experimental feature), throw an exception when the remaining statistical value is greater than the specified range when the MemTracker is destructed, and print the accurate statistical value in HTTP, the parameter memory_leak_detection 5. Added Virtual MemTracker, cosume/release will not sync to parent. It will be used when introducing TCMalloc Hook to record memory later, to record the specified memory independently; 6. Modify the GC logic, register the buffer cached in DiskIoMgr as a GC function, and add other GC functions later; 7. Change the global root node from Root MemTracker to Process MemTracker, and remove Process MemTracker in exec_env; 8. Modify the macro that detects whether the memory has reached the upper limit, modify the parameters and default behavior of creating MemTracker, modify the error message format in mem_limit_exceeded, extend and apply transfer_to, remove Metric in MemTracker, etc.; Modify where MemTracker is used: 1. MemPool adds a constructor to create a temporary tracker to avoid a lot of redundant code; 2. Added trackers for global objects such as ChunkAllocator and StorageEngine; 3. Added more fine-grained trackers such as ExprContext; 4. RuntimeState removes FragmentMemTracker, that is, PlanFragmentExecutor mem_tracker, which was previously used for independent statistical scan process memory, and replaces it with _scanner_mem_tracker in OlapScanNode; 5. MemTracker is no longer recorded in ReservationTracker, and ReservationTracker will be removed later;	2022-03-11 22:04:23 +08:00
Xinyi Zou	ad0d2b82ab	[fix](memory) fix bug that ~BitShufflePageDecoder destroys uninitialized chunk (#7172 ) Added a safe way to destroy Chunk.	2021-11-23 15:24:25 +08:00
Xinyi Zou	f5a35c28e9	[Optimize] [Memory] BitShufflePageDecoder use memory allocated by ChunkAllocator instead of Faststring (#6515 ) BitShufflePageDecoder reuses the memory for storing decoder results, allocate memory directly from the `ChunkAllocator`, the performance is improved to a certain extent. In the case of #6285, the total time consumption is reduced by 13.5%, and the time consumption ratio of `~Reader()` has also been reduced from 17.65% to 1.53%, and the memory allocation is unified to `ChunkAllocator` for centralized management , Which is conducive to subsequent memory optimization. which can avoid the memory waste caused by `Mempool`, because the chunk can be free at any time, but the performance is lower than the allocation from `Mempool`. The guess is that there is no `Mempool` after secondary allocation of large chunks , Will directly apply for a large number of small chunks from `ChunkAllocator`, and it takes longer to lock in `pop_free_chunk` and `push_free_chunk` (but this is not proven from the flame graphs of BE's cpu and contention).	2021-11-17 11:20:21 +08:00
sduzh	6fedf5881b	[CodeFormat] Clang-format cpp sources (#4965 ) Clang-format all c++ source files.	2020-11-28 18:36:49 +08:00
Yingchun Lai	498b06fbe2	[Metrics] Support tablet level metrics (#4428 ) Sometimes we want to detect the hotspot of a cluster, for example, hot scanned tablet, hot wrote tablet, but we have no insight about tablets in the cluster. This patch introduce tablet level metrics to help to achieve this object, now support 4 metrics on tablets: `query_scan_bytes `, `query_scan_rows `, `flush_bytes `, `flush_count `. However, one BE may holds hundreds of thousands of tablets, so I add a parameter for the metrics HTTP request, and not return tablet level metrics by default.	2020-09-02 10:39:41 +08:00
Yingchun Lai	e71152132c	[metrics] Redesign metrics to 3 layers (#4115 ) Redesign metrics to 3 layers: MetricRegistry - MetricEntity - Metrics MetricRegistry : the register center MetricEntity : the entity registered on MetricRegistry. Generally a MetricRegistry can be registered on several MetricEntities, each of MetricEntity is an independent entity, such as server, disk_devices, data_directories, thrift clients and servers, and so on. Metric : metrics of an entity. Such as fragment_requests_total on server entity, disk_bytes_read on a disk_device entity, thrift_opened_clients on a thrift_client entity. MetricPrototype: the type of a metric. MetricPrototype is a global variable, can be shared by the same metrics across different MetricEntities.	2020-08-08 11:23:01 +08:00
lichaoyong	6c4d7c60dd	[Feature] Add QueryDetail to store query statistics. (#3744 ) 1. Store the query statistics in memory. 2. Supporting RESTFUL interface to get the statistics.	2020-06-15 18:16:54 +08:00
lichaoyong	1cc78fe69b	[Enhancement] Convert metric to Json format (#3635 ) Add a JSON format for existing metrics like this. ``` { "tags": { "metric":"thread_pool", "name":"thrift-server-pool", "type":"active_thread_num" }, "unit":"number", "value":3 } ``` I add a new JsonMetricVisitor to handle the transformation. It's not to modify existing PrometheusMetricVisitor and SimpleCoreMetricVisitor. Also I add 1. A unit item to indicate the metric better 2. Cloning tablet statistics divided by database. 3. Use white space to replace newline in audit.log	2020-05-27 08:49:30 +08:00
Yingchun Lai	b58b1b3953	[metrics] Make DorisMetrics to be a real singleton (#3417 )	2020-05-04 09:20:53 +08:00
ZHAO Chun	17e52a4bac	Improve LRUCache to get better performance (#1826 ) In this CL, I move the entry's deleter out of LRUCache's mutex block, which can let others access this cache without waiting free cache entry.	2019-09-19 17:37:02 +08:00
ZHAO Chun	11eafe524f	Add ChunkAllocator to accelerate chunk allocation (#1792 ) I add ChunkAllocator in this CL to put unused memory chunk to a chunk pool other than return it to system allocator. Now we only change MemPool's chunk allocation and free to this. And two configuration are introduduced too. 'chunk_reserved_bytes_limit' is the limit of how many bytes this chunk pool can reserve in total and its default value is 2147483648(2GB). 'use_mmap_allocate_chunk': if chunk is allocated via mmap and default value is false. And in my test case with default configuration a simple like "select * from table limit 10", this can improve throughput from 280 QPS to to 650 QPS. And when I config 'chunk_reserved_bytes_limit' to 0, which means this is disabled, the throughput is the same with origin's.	2019-09-13 08:27:24 +08:00

30 Commits