doris

Author	SHA1	Message	Date
Yingchun Lai	be733cfa9c	[Metrics] Add metrics for some large MemTrackers (#5614) MemTracker shows which modules consume more memory, but only as a current value. This patch adds metrics for some large memory consumers, so we can see which modules consume more memory over time, which is useful for troubleshooting OOM problems and tuning configs.	2021-04-21 09:15:04 +08:00
HappenLee	b423274f17	[Enhance] Make MemTracker more accurate (#5515) (#5516) * [Enhance] Make MemTracker more accurate (#5515) This PR mainly does the following: 1. Improve the readability of MemTrackers' names 2. Add MemTrackers for: * Load * Compaction * SchemaChange * StoragePageCache * TabletManager 3. Change SchemaChange to a singleton * Revise some code per code review * Change the name of mem_tracker * Keep reader_context alive for the same lifetime as rowset_reader in schema change * Change VLOG notice to LOG(WARNING) in schema change	2021-04-08 09:14:55 +08:00
Zhengguo Yang	d641a26490	[Refactor] Remove boost filesystem (#5579) * Use std::filesystem instead of boost Co-authored-by: Mingyu Chen <morningman.cmy@gmail.com>	2021-04-08 09:11:59 +08:00
stdpain	ad67dd34a0	Update GCC to GCC 10 and support C++17 (#5394) * Update GCC to GCC 10 and support C++17; update brpc to 0.9.7; update Boost to 1.73; remove third-party Boost 1.54 for MySQL * Update CMake version * Ignore JDK version * Remove unused patch * Avoid using the SYS_getrandom call	2021-03-25 09:30:38 +08:00
Yingchun Lai	8ead0aaad8	[Enhance] Sort directories by available space when doing trash sweep (#5498) * [Enhance] Sort directories by available space when doing trash sweep When one disk is about to become full, we want to sweep trash data on that disk as quickly as possible. The current trash sweep function removes trashed files in order of path name; however, data directories may differ greatly in available space because of the load balancing algorithm, so this patch changes it to remove files in order of the directories' available space. * Add log	2021-03-12 13:43:27 +08:00
Yingchun Lai	0131c33966	[Enhance] Improve the readability of MemTrackers' names (#5455) Improve the readability of MemTrackers' names, making the web page be_ip:port/mem_tracker easier to read	2021-03-11 22:33:31 +08:00
Zhengguo Yang	93a4c7efc1	[LOG] Standardize the use of VLOG in code (#5264) At present, VLOG usage in the code is inconsistent: some calls use the VLOG_XX format inherited from Impala, while others use the VLOG(number) format, which has no unified convention. This PR standardizes the use of VLOG.	2021-01-21 12:09:09 +08:00
Yingchun Lai	58e58c94d8	[TSAN] Fix TSAN bugs (part 1) (#5162) ThreadSanitizer (TSAN) is a useful tool for detecting multi-threading problems such as data races and mutex misuse. Both the Doris BE unit tests and the server should pass in TSAN mode, to make Doris more robust. This is the first patch fixing TSAN problems; some difficult problems are suppressed in the file 'tsan_suppressions'. You can suppress them by setting export TSAN_OPTIONS="suppressions=tsan_suppressions" before running `BUILD_TYPE=tsan ./run-be-ut.sh --run`	2021-01-15 09:45:11 +08:00
HuangWei	85076b5678	[UT] Fix test_env & add a sample (#5085) Makes it easier to create tests.	2020-12-27 22:14:30 +08:00
Yingchun Lai	176dcf8bd9	[Trace] Add trace for create tablet tasks (#5091) Add tracing for create tablet tasks; it helps admins find the bottleneck when tablet creation times out. For example, an admin could increase 'tablet_map_shard_size' upon finding that the 'got tablets shard lock' step takes too long.	2020-12-19 11:18:12 +08:00
Yingchun Lai	f6881d2f7b	[Bug] Fix coredump bug when creating new tablets (#5089) A bug may cause a BE coredump when creating a tablet: access to the tablet_set of a data dir should be protected by a lock.	2020-12-17 00:34:31 +08:00
weizuo93	f2d69a51d4	[Docs] Remove some unused variables and update BE config documents (#4987) Remove some unused variables and update the BE config documents about compaction.	2020-12-09 09:28:56 +08:00
weizuo93	ec7e1c6b1b	[Refactor] Execute 'pick rowsets' before applying for permits for a compaction task (#4891) In the current compaction mechanism, a producer thread continuously produces compaction tasks, and the selected tablet must apply for `permits`. When a tablet can hold `permits`, the compaction task for this tablet is submitted to the thread pool. The compaction score is used as `permits` to limit memory consumption. However, `pick_rowset_to_compaction()` is executed before the file merge in the compaction thread, and the number of segment files that are actually merged can be smaller than the compaction score. In addition, a compaction task may exit directly because the tablet doesn't meet the requirements for compaction. This patch optimizes and refactors the compaction code so that 'pick rowsets' is executed before applying for permits for a compaction task; the number of segment files that actually participate in the merge is calculated and used as `permits`.	2020-11-30 11:41:14 +08:00
sduzh	6fedf5881b	[CodeFormat] Clang-format C++ sources (#4965) Clang-format all C++ source files.	2020-11-28 18:36:49 +08:00
sduzh	10e1e29711	Remove header file common/names.h (#4945)	2020-11-26 17:00:48 +08:00
HappenLee	2682712349	[Bug] Fix BE UT compile failure and core dump in delta_writer_test when ulimit < 60000. (#4941)	2020-11-24 22:21:19 +08:00
Zhengguo Yang	0f13eddd97	Fix typo in log (#4790)	2020-10-27 10:03:56 +08:00
Yingchun Lai	6cbefd5621	[LRUCache] Expose LRU Cache status to metrics (#4688) Exposing LRU Cache status to metrics helps diagnose problems such as high usage and low hit rate.	2020-10-22 21:37:02 +08:00
Zhengguo Yang	09f97f8a05	[Refactor] Fix some BE typos, part 2 (#4747)	2020-10-20 09:28:57 +08:00
Yingchun Lai	45fa67aa71	[Refactor] Remove objects which are only used for unit tests (#4751) Some objects were created only for unit tests; this is unnecessary and may create duplicate instances of some classes. This patch removes the unnecessary instances of the classes BlockManager and StoragePageCache.	2020-10-18 21:37:12 +08:00
Yingchun Lai	3438a746ac	[Typo] Fix typo in metrics macros (#4739) Just fix typos: rename DEFINE_GAUGE_METRIC_PROTOTYPE_5ARG(name, unit) to DEFINE_GAUGE_METRIC_PROTOTYPE_2ARG(name, unit), and rename the old DEFINE_GAUGE_METRIC_PROTOTYPE_2ARG(name, unit), which defines core metrics, to DEFINE_GAUGE_CORE_METRIC_PROTOTYPE_2ARG(name, unit)	2020-10-15 19:56:43 +08:00
Zhengguo Yang	75e0ba32a1	Fix some BE typos (#4714)	2020-10-13 09:37:15 +08:00
weizuo93	eba595583e	[Optimize] Optimize the execution model of compaction to limit memory consumption (#4670) Currently, there are M threads doing base compaction and N threads doing cumulative compaction for each disk. Too many compaction tasks may run out of memory, so the maximum concurrency of running compaction tasks is limited by a semaphore. However, if the running threads use too much memory, this limit cannot prevent it. In addition, reducing concurrency to avoid OOM means some compaction tasks cannot be executed in time, and we may then face heavier compaction. Therefore, limiting concurrency is not enough. The strategy proposed in #3624 may be effective in solving the OOM problem. A CompactionPermitLimiter is used to limit compaction, with a single-producer/multi-consumer model. The producer generates compaction tasks and acquires `permits` for each task. A compaction task that holds `permits` is executed in the thread pool, and each finished task releases its `permits`. `permits` must be acquired before a compaction task can execute. When the sum of `permits` held by executing compaction tasks reaches a threshold, subsequent compaction tasks are no longer allowed until some `permits` are released. The tablet compaction score is used as the `permits` of a compaction task here. To some extent, memory consumption can be limited by setting an appropriate `permits` threshold.	2020-10-11 11:39:25 +08:00
Mingyu Chen	00f25c2b77	[Bug] Tablet and disk report threads not working (#4597) The tablet and disk information reporting threads need to report to the FE periodically. At the same time, these two reporting threads are also triggered by certain events. The change in PR #4440 caused these two threads to be triggered only by events, so they could no longer report periodically.	2020-09-20 20:51:52 +08:00
Yingchun Lai	b780df697a	[refactor] Optimize thread usage in BE (#4440) BE cannot exit gracefully because some threads run in endless loops. This patch makes the following optimizations: - Use the well-encapsulated Thread and ThreadPool instead of std::thread and std::vector<std::thread> - Use CountDownLatch in threads' loop conditions to avoid endless loops - Introduce a new class Daemon for daemon work, such as tcmalloc_gc, memory_maintenance and calculate_metrics - Decouple statistics-type TaskWorkerPool and StorageEngine notification by submitting tasks to TaskWorkerPool's queue - Reorder objects' stop and destruction in main(), i.e. stop network services first, then internal services - Use libevent in pthreads mode by calling evthread_use_pthreads(), so that EvHttpServer can exit gracefully with multiple threads - Call brpc::Server's Stop() and ClearServices() explicitly	2020-09-06 20:19:14 +08:00
Yingchun Lai	498b06fbe2	[Metrics] Support tablet level metrics (#4428) Sometimes we want to detect hotspots in a cluster, for example, hot scanned tablets or hot written tablets, but we have no insight into the tablets in the cluster. This patch introduces tablet-level metrics to achieve this, and currently supports 4 tablet metrics: `query_scan_bytes`, `query_scan_rows`, `flush_bytes`, `flush_count`. However, one BE may hold hundreds of thousands of tablets, so I added a parameter to the metrics HTTP request, and tablet-level metrics are not returned by default.	2020-09-02 10:39:41 +08:00
weizuo93	613c44e889	[Optimize] Optimize the disk selection strategy on BE for tablet creation (#4373) When creating a tablet, a disk must be selected to store it from all disks on the BE node that meet the requirements. In Doris, the current strategy is to randomly select one of the disks that meet the requirements for tablet creation. After the cluster has been running for a long time, we found that the number of tablets is unevenly distributed across the disks of a BE node. To solve this problem, we introduced the "two random choices" algorithm for disk selection when creating a tablet: (1) Randomly select two disks from all disks on the BE node that meet the requirements; (2) Create the tablet on whichever of the two disks selected in (1) has fewer tablets.	2020-08-26 10:35:33 +08:00
Mingyu Chen	3359467b9a	[Tablet][Recovery] Support using empty tablet to repair the damaged or missing tablet (#4255) In some very special circumstances, such as code bugs or human misoperation, all replicas of some tablets may be lost. In this case, the data is effectively lost. However, in some scenarios, the business still wants queries not to report errors even if there is data loss, to reduce the impact perceived at the user layer. In this situation, we can use an empty tablet to fill in the missing replica so that queries can be executed normally. Add a new FE config `recover_with_empty_tablet`, default false; true means an empty tablet is used to fill in the missing one. Also fix a bug. Fix #4274	2020-08-18 06:13:53 +00:00
Yingchun Lai	e71152132c	[metrics] Redesign metrics to 3 layers (#4115) Redesign metrics into 3 layers: MetricRegistry - MetricEntity - Metric. MetricRegistry: the registration center. MetricEntity: an entity registered on a MetricRegistry. Generally, several MetricEntities can be registered on one MetricRegistry; each MetricEntity is an independent entity, such as the server, disk_devices, data_directories, thrift clients and servers, and so on. Metric: a metric of an entity, such as fragment_requests_total on the server entity, disk_bytes_read on a disk_device entity, or thrift_opened_clients on a thrift_client entity. MetricPrototype: the type of a metric. A MetricPrototype is a global variable and can be shared by the same metric across different MetricEntities.	2020-08-08 11:23:01 +08:00
HuangWei	10f822eb43	[MemTracker] Make all MemTrackers shared (#4135) We make all MemTrackers shared in order to show MemTrackers' real-time consumption on the web, as follows: 1. Change nearly all MemTracker raw pointers to shared_ptr 2. Use CreateTracker() to create a new MemTracker (so that it adds itself to its parent) 3. RowBatch & MemPool still use raw pointers to MemTracker, since it's easy to ensure that the RowBatch & MemPool destructors run before the MemTracker's destructor, so we don't change this code. 4. MemTracker can use RuntimeProfile's counter to calculate consumption, so RuntimeProfile's counter needs to be shared too. We add a shared counter pool to store the shared counters and don't change the other counters of RuntimeProfile. Note that this PR doesn't change the MemTracker tree structure, so there are still some orphan trackers, e.g. RowBlockV2's MemTracker. If you find some shared MemTrackers have little memory consumption but are too time-consuming, you could make them orphans; then it's fine to use raw pointers.	2020-07-31 21:57:21 +08:00
HuangWei	a01d1aec56	[Compaction] Track RowsetReader's mem & add metric (#4068) Ref https://github.com/apache/incubator-doris/issues/3624#issuecomment-655933244 Only RowsetReaders in compaction are tracked. Other RowsetReaders won't be affected, because their parent_tracker is nullptr.	2020-07-24 07:58:09 +08:00
xy720	2c8fdb6134	[BUG] Make segment V1 and V2 share the same file cache (#3945) This commit makes segment V1 and V2 share the same file cache, so that segment V2's file descriptors stored in the cache can be cleaned up as V1's are.	2020-06-29 18:43:09 +08:00
lichaoyong	93a0b47d22	Revert "[Memory Engine] MemTablet creation and compatibility handling in BE (#3762)" (#3931) This reverts commit ca96ea30560c9e9837c28cfd2cdd8ed24196f787.	2020-06-24 10:13:45 +08:00
Binglin Chang	ca96ea3056	[Memory Engine] MemTablet creation and compatibility handling in BE (#3762)	2020-06-18 09:56:07 +08:00
Yingchun Lai	3c09e1e1d8	[trace] Adapt trace util to compaction module (#3814) The trace util is helpful for diagnosing compaction performance problems. For base compaction, we can get a trace log like: ``` W0610 11:26:33.804431 56452 storage_engine.cpp:552] Trace: 0610 11:23:03.727535 (+ 0us) storage_engine.cpp:554] start to perform base compaction 0610 11:23:03.728961 (+ 1426us) storage_engine.cpp:560] found best tablet 546859 0610 11:23:03.728963 (+ 2us) base_compaction.cpp:40] got base compaction lock 0610 11:23:03.729029 (+ 66us) base_compaction.cpp:44] rowsets picked 0610 11:24:51.784439 (+108055410us) compaction.cpp:46] got concurrency lock and start to do compaction 0610 11:24:51.784818 (+ 379us) compaction.cpp:74] prepare finished 0610 11:26:33.359265 (+101574447us) compaction.cpp:87] merge rowsets finished 0610 11:26:33.484481 (+125216us) compaction.cpp:102] output rowset built 0610 11:26:33.484482 (+ 1us) compaction.cpp:106] check correctness finished 0610 11:26:33.513197 (+ 28715us) compaction.cpp:110] modify rowsets finished 0610 11:26:33.513300 (+ 103us) base_compaction.cpp:49] compaction finished 0610 11:26:33.513441 (+ 141us) base_compaction.cpp:56] unused rowsets have been moved to GC queue Metrics: {"filtered_rows":0,"input_row_num":3346807,"input_rowsets_count":42,"input_rowsets_data_size":1256413170,"input_segments_num":44,"merge_rowsets_latency_us":101574444,"merged_rows":0,"output_row_num":3346807,"output_rowset_data_size":1228439659,"output_segments_num":6} ``` and for cumulative compaction, like: ``` W0610 11:14:18.714366 56468 storage_engine.cpp:518] Trace: 0610 11:14:08.068484 (+ 0us) storage_engine.cpp:520] start to perform cumulative compaction 0610 11:14:08.069844 (+ 1360us) storage_engine.cpp:526] found best tablet 547083 0610 11:14:08.069846 (+ 2us) cumulative_compaction.cpp:42] got cumulative compaction lock 0610 11:14:08.069947 (+ 101us) cumulative_compaction.cpp:46] calculated cumulative point 0610 11:14:08.070141 (+ 194us) cumulative_compaction.cpp:50] rowsets picked 0610 11:14:08.070143 (+ 2us) compaction.cpp:46] got concurrency lock and start to do compaction 0610 11:14:08.070518 (+ 375us) compaction.cpp:74] prepare finished 0610 11:14:15.389893 (+7319375us) compaction.cpp:87] merge rowsets finished 0610 11:14:15.390916 (+ 1023us) compaction.cpp:102] output rowset built 0610 11:14:15.390917 (+ 1us) compaction.cpp:106] check correctness finished 0610 11:14:15.409460 (+ 18543us) compaction.cpp:110] modify rowsets finished 0610 11:14:15.409496 (+ 36us) cumulative_compaction.cpp:55] compaction finished 0610 11:14:15.410138 (+ 642us) cumulative_compaction.cpp:65] unused rowsets have been moved to GC queue Metrics: {"filtered_rows":0,"input_row_num":136707,"input_rowsets_count":302,"input_rowsets_data_size":76617836,"input_segments_num":302,"merge_rowsets_latency_us":7319372,"merged_rows":0,"output_row_num":136707,"output_rowset_data_size":53893280,"output_segments_num":1} ```	2020-06-13 19:31:51 +08:00
Yingchun Lai	e4dc2ec440	[StorageEngine] Make StorageEngine::open return more detailed info (#3761) StorageEngine::open returns only a very vague status when it fails, so we have to check logs to find the root cause, which is inconvenient when running unit tests in CI Docker containers. It would be better to return more detailed failure info that points out the root cause; for example, it may return an error status with the message "file descriptors limit is too small".	2020-06-07 10:21:33 +08:00
Dayue Gao	273aad6cf4	[Bug] Restore tablet action not working because tablet status is shutdown (#3551)	2020-05-15 10:11:17 +08:00
Yingchun Lai	b58b1b3953	[metrics] Make DorisMetrics a real singleton (#3417)	2020-05-04 09:20:53 +08:00
Mingyu Chen	74b987f053	[Bug] Fix bug: storage engine background threads should start after env is ready	2020-04-29 11:21:19 +08:00
Yingchun Lai	72f3082358	[Metrics] Add some metrics for container sizes in BE (#3246) These metrics let us observe the workload of BE, and they are also a way to check whether there is any problem in BE, such as a container growing too large and leading to OOM. This patch adds the following metrics: ``` Name Description rowset_count_generated_and_in_use The total count of rowset ids generated and in use since BE last started unused_rowsets_count The total count of unused rowsets waiting to be GCed broker_count The total count of brokers in management data_stream_receiver_count The total count of data stream receivers in management fragment_endpoint_count The total count of fragment endpoints of data streams in management, should always equal data_stream_receiver_count active_scan_context_count The total count of active scan contexts plan_fragment_count The total count of plan fragments being executed load_channel_count The total count of load channels in management result_buffer_block_count The total count of result buffer blocks for queries, each block has a limited queue size (default 1024) result_block_queue_count The total count of queues for fragments, each queue has a limited size (default 20, by config::max_memory_sink_batch_count) routine_load_task_count The total count of routine load tasks being executed small_file_cache_count The total count of cached small files' digest info stream_load_pipe_count The total count of stream load pipes, each pipe has a limited buffer size (default 1M) tablet_writer_count The total count of tablet writers brpc_endpoint_stub_count The total count of brpc endpoints ```	2020-04-25 16:13:39 +08:00
Yingchun Lai	4a7a88ede1	[LSAN] Fix some memory leaks detected by LSAN (#3326)	2020-04-22 22:59:44 +08:00
caiconghui	a5703ef114	[Performance] Support sharding txn_map_lock into smaller map locks to improve performance for txn management tasks (#3222) This PR improves the performance of txn management tasks. When there are many txns in BE, the single txn_map_lock and the additional _txn_locks may cause poor performance, so we remove the additional _txn_locks and split txn_map_lock into many smaller locks.	2020-04-09 22:35:15 +08:00
HuangWei	162b1c5d8b	[Storage] Open data dirs in parallel (#3260)	2020-04-07 20:59:56 +08:00
HuangWei	0462607d8d	StorageEngine: unused_rowsets uses unordered_multimap (#3207)	2020-03-27 14:30:31 +08:00
Mingyu Chen	8aa8b8c96d	[Code Refactor] Use block manager to unify data file access. (#3189) Earlier we introduced `BlockManager` to separate data access logic from the underlying file read and write logic. This CL further unifies all `SegmentV2` data access under the `BlockManager`, removes the previous `FileManager` class, and moves the file cache to the `FileBlockManager`. There are no logical changes in this CL. After this CL, all user table data is accessed through the `WritableBlock` and `ReadableBlock` returned by the `BlockManager`, and no file operations are performed directly.	2020-03-25 20:39:07 +08:00
kangpinghuang	f6374fa9a5	Use default_rowset_type to replace compaction_rowset_type (#3101) * Use default_rowset_type to replace compaction_rowset_type * Segment v2 usage document	2020-03-16 22:23:48 +08:00
Mingyu Chen	ee06ce31ba	[Bug] Fix bug that the file_block_mgr object was incorrectly destructed (#3122) While a `block` is in use, some methods in the block manager are referenced, so `file_block_mgr` should be a resident, globally unique object. I put it in `StorageEngine`. TODO: `BlockManager` and `Env` need to be reorganized.	2020-03-16 17:07:27 +08:00
Yingchun Lai	64a06ea9d4	[UT] Fix some BE unit tests (#3110) Also support graceful exit for StorageEngine to avoid hanging too long in unit tests.	2020-03-16 13:31:44 +08:00
Mingyu Chen	42931d22cb	[Bug] Tablet meta is not updated correctly after compaction (#3098) This CL tries to fix a potential bug described in issue #3097, but I'm not sure this is the root cause. It also removes a lot of verbose logging and fixes a memory leak.	2020-03-14 23:39:11 +08:00
caiconghui	a1f5b57011	Support sharding tablet_map_lock into smaller map locks to improve performance for tablet management tasks (#3051) Support sharding tablet_map_lock into smaller map locks to improve performance for tablet management tasks	2020-03-09 16:29:56 +08:00
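
The "two random choices" disk selection described in #4373 can be illustrated with a short sketch. This is a minimal C++ example, not the Doris BE code: the `Disk` struct and the `pick_disk_two_choices` function are hypothetical, and it assumes the caller has already filtered the candidate list down to the disks that meet the requirements.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// Hypothetical stand-in for an eligible data dir on a BE node.
struct Disk {
    std::string path;
    std::size_t tablet_count;
};

// "Two random choices": pick two distinct eligible disks uniformly at random
// and return the index of the one that holds fewer tablets.
// Assumes `disks` has already been filtered to disks that meet the requirements.
std::size_t pick_disk_two_choices(const std::vector<Disk>& disks, std::mt19937& rng) {
    assert(!disks.empty());
    if (disks.size() == 1) {
        return 0;  // nothing to compare against
    }
    std::uniform_int_distribution<std::size_t> dist(0, disks.size() - 1);
    const std::size_t first = dist(rng);
    std::size_t second = dist(rng);
    while (second == first) {  // resample until the two choices differ
        second = dist(rng);
    }
    // Ties go to the first sample.
    return disks[first].tablet_count <= disks[second].tablet_count ? first : second;
}
```

Compared with a single uniformly random pick, sampling two candidates and keeping the less loaded one is the classic "power of two choices" load-balancing technique: it keeps per-disk tablet counts much closer together while inspecting only two disks per decision.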

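The compaction permit scheme from #4670 (with the permit amount refined by #4891) amounts to a counting limiter. Below is a minimal C++ sketch using a mutex and a condition variable. It is not the Doris `CompactionPermitLimiter`: the class name `PermitLimiter`, its methods, and the rule that an oversized request is admitted only when no permits are held are all assumptions made for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical counting permit limiter (illustrative only, not the Doris class).
class PermitLimiter {
public:
    explicit PermitLimiter(std::int64_t capacity) : _capacity(capacity) {
        assert(capacity > 0);
    }

    // Blocks until `permits` can be granted without exceeding the capacity.
    // Assumption: a request larger than the whole capacity is admitted only
    // when no permits are held, so it cannot wait forever.
    void request(std::int64_t permits) {
        assert(permits >= 0);
        std::unique_lock<std::mutex> lock(_mutex);
        _cv.wait(lock, [&] { return _fits(permits); });
        _used += permits;
    }

    // Non-blocking variant: grants the permits only if they fit right now.
    bool try_request(std::int64_t permits) {
        assert(permits >= 0);
        std::lock_guard<std::mutex> lock(_mutex);
        if (!_fits(permits)) {
            return false;
        }
        _used += permits;
        return true;
    }

    // Returns permits and wakes any waiting producers.
    void release(std::int64_t permits) {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            assert(permits >= 0 && permits <= _used);
            _used -= permits;
        }
        _cv.notify_all();
    }

    std::int64_t used() const {
        std::lock_guard<std::mutex> lock(_mutex);
        return _used;
    }

private:
    // Caller must hold _mutex.
    bool _fits(std::int64_t permits) const {
        return _used == 0 || _used + permits <= _capacity;
    }

    const std::int64_t _capacity;
    std::int64_t _used = 0;
    mutable std::mutex _mutex;
    std::condition_variable _cv;
};
```

In this sketch, a producer thread would call `request(permits)` before submitting a task to the pool, with `permits` set to the tablet's compaction score (#4670) or to the number of segment files actually merged (#4891), and the task would call `release(permits)` when it finishes. The oversized-request rule keeps a tablet whose score exceeds the capacity from waiting forever.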

85 Commits