doris

Author	SHA1	Message	Date
Pxl	236e0f1eda	[Feature] Support for querying the trash used capacity (#6247 ) Support for querying the trash used capacity. ``` SHOW TRASH [ON ...] ``` Now user can proactively scan trash directory.	2021-08-10 10:10:47 +08:00
Lijia Liu	f772649535	[Optimize] Optimize lock when check error storage (#6321 ) 1. `StorageEngine::_delete_tablets_on_unused_root_path` will try to obtain tablet shard write lock in `TabletManager` ``` StorageEngine::_delete_tablets_on_unused_root_path TabletManager::drop_tablets_on_error_root_path obtain each tablet shard's write lock ``` 2. `TabletManager::build_all_report_tablets_info` and other methods will obtain tablet shard read lock frequently. So, `StorageEngine::_delete_tablets_on_unused_root_path` will hold `_store_lock` for a long time. This will make it difficult for other threads to get write `_store_lock`, such as `StorageEngine::get_stores_for_create_tablet` `drop_tablets_on_error_root_path` is a small probability event, `TabletManager::drop_tablets_on_error_root_path` should return when its param `tablet_info_vec` is empty	2021-08-07 21:30:49 +08:00
Pxl	3812cca4db	[Bug]fix the calculation of the "_start_trash_sweep" run interval. (#6177 ) * fix the calculation of the _start_trash_sweep run interval	2021-07-09 09:45:44 +08:00
Zhengguo Yang	68bab73c35	[Bug] Fix select random storage path maybe same at a long time (#6062 ) random_shuffle generates the same random sequence when it is called multiple times. Although we shuffle twice, the result of the second shuffle does not change either when the size relationship between adjacent numbers is unchanged.	2021-06-20 16:16:32 +08:00
Mingyu Chen	d57c2344e1	[MemTracker] Refactored the hierarchical structure of memtracker (#5956 ) To avoid showing too many memtracker on BE web pages. The MemTracker level now has 3 levels: OVERVIEW, TASK and VERBOSE. OVERVIEW Mainly used for main memory consumption module such as Query/Load/Metadata. TASK is mainly used to record the memory overhead of a single task such as a single query, load, and compaction task. VERBOSE is used for other more detailed memtrackers.	2021-06-16 09:44:24 +08:00
Yingchun Lai	6d6c3d9703	[Enhancement] Reduce memory consumption by releasing readers earlier (#5811 ) We create multiple rowset readers to read the data of one tablet. After a rowset reader has reached EOF, it can be released to reduce resource (typically memory) consumption. Similarly, a segment reader can be released when it reaches EOF.	2021-06-16 09:37:50 +08:00
Mingyu Chen	206a711f9b	[Bug] SimplifyInvalidDateBinaryPredicatesDateRule may cause invalid query plan (#5987 ) 1. "where 1k > to_date(now())" will return EMPTYSET in query plan. 2. DateLiteral should accept date string like "2021-6-1".	2021-06-10 17:37:26 +08:00
Mingyu Chen	81ecf3d097	[Bug] Rebuilt version graph of a tablet when there are too many orphan vertex (#5945 ) The version information of a tablet is stored in memory as an adjacency graph. As new versions are written and old versions are deleted, the graph accumulates empty vertices with no associated edges (orphan vertices). These orphan vertices should be removed somehow.	2021-06-03 09:59:20 +08:00
Mingyu Chen	8850cfe2ad	[Compaction] Modify compaction logic (#5737 ) 1. Add /api/compaction/run_status to show the running compaction tasks. 2. Support do base and cumulative compaction for one tablet at same time. 3. Modify some log level. 4. Add a feedback document.	2021-05-07 11:18:47 +08:00
Yingchun Lai	58d0c8971e	[Bugfix] Fix BE metrics http API dead lock bug (#5730 )	2021-04-30 10:15:33 +08:00
Yingchun Lai	84f6d74322	[Optimize] Sort trashed files by name and skip processing unexpired files (#5678 )	2021-04-24 17:42:06 +08:00
xinghuayu007	4fa25b6eb9	[Optimize] make tablet meta checkpoint to be threadpool model (#5654 ) Currently, tablet meta checkpoint is a memory-intensive operation. If a host has 12 disks, it starts 12 threads to do tablet meta checkpoint. In our experience, the data size of one tablet can be as high as 2G. If 12 threads do the checkpoint at the same time, it may cause OOM. This PR tries to solve the problem. First, it starts only one thread to produce tablet meta checkpoint tasks. Second, it creates a thread pool to handle these tasks. You can configure the size of the thread pool to control the parallelism and avoid OOM. It is a producer-consumer model.	2021-04-23 09:45:15 +08:00
weizuo93	f5cf008bcc	[Bug] Fix stream load UT failed (#5692 ) Also move the stream load rocksdb dir to the first of storage root paths	2021-04-23 09:33:42 +08:00
weizuo93	a4f8194111	[Audit][Stream Load] Support audit function for stream load (#5452 ) Record finished stream load job (both successful job and failed job) into audit log so that we can see when the stream load job was executed and check the details of stream load jobs.	2021-04-21 16:36:12 +08:00
Yingchun Lai	be733cfa9c	[Metrics] Add some large memtrackers' metric (#5614 ) MemTracker can report memory consumption so that we can find out which module consumes more memory, but it is only a current value. This patch adds metrics for some large memory consumers, so that we can see which module consumes more memory over time. This is useful for troubleshooting OOM problems and optimizing configs.	2021-04-21 09:15:04 +08:00
HappenLee	b423274f17	[Enhance] Make MemTracker more accurate (#5515 ) (#5516 ) * [Enhance] Make MemTracker more accurate (#5515) This PR is mainly about: 1. Improve the readability of MemTrackers' name 2. Add the MemTracker of: * Load * Compaction * SchemaChange * StoragePageCache * TabletManager 3. Change SchemaChange to a Singleton * revise some code for Code Review * change the name of mem_tracker * keep reader_context have the same lifetime of rowset_reader in schema change. * change vlog notice to log(warning) in schema change	2021-04-08 09:14:55 +08:00
Zhengguo Yang	d641a26490	[Refactor] Remove boost filesystem (#5579 ) * use std::filesystem instead of boost Co-authored-by: Mingyu Chen <morningman.cmy@gmail.com>	2021-04-08 09:11:59 +08:00
stdpain	ad67dd34a0	update gcc to gcc 10 and support c++17 (#5394 ) * update gcc to gcc 10 and support c++17 update brpc to 0.9.7 update boost to 1.73 remove third-party boost 1.54 for mysql * update cmake version * ignore jdk version * remove unused patch * avoid use SYS_getrandom call	2021-03-25 09:30:38 +08:00
Yingchun Lai	8ead0aaad8	[Enhance] Sort directories by available space when do trash sweep (#5498 ) * [Enhance] Sort directories by available space when do trash sweep In the case when one disk is about to be full, we want to sweep trash data on this disk as quickly as possible. The current trash sweep function removes trashed files ordered by path name. However, disk data directories may differ greatly in available space because of the load balance algorithm, so this patch improves it to remove files ordered by the directories' available space. * add log	2021-03-12 13:43:27 +08:00
Yingchun Lai	0131c33966	[Enhance] Improve the readability of memtrackers' name (#5455 ) Improve the readability of memtrackers' name, then you will be happy to read website be_ip:port/mem_tracker	2021-03-11 22:33:31 +08:00
Zhengguo Yang	93a4c7efc1	[LOG] Standardize the use of VLOG in code (#5264 ) At present, the application of vlog in the code is quite confusing. It is inherited from impala VLOG_XX format, and there is also VLOG(number) format. VLOG(number) format does not have a unified specification, so this pr standardizes the use of VLOG	2021-01-21 12:09:09 +08:00
Yingchun Lai	58e58c94d8	[TSAN] Fix tsan bugs (part 1) (#5162 ) ThreadSanitizer, aka TSAN, is a useful tool to detect multi-thread problems, such as data races, mutex problems, etc. We should detect TSAN problems for Doris BE; both unit tests and the server should pass in TSAN mode, to make Doris more robust. This is the very first patch to fix TSAN problems, and some difficult problems are suppressed in file 'tsan_suppressions'. You can suppress these problems by setting: export TSAN_OPTIONS="suppressions=tsan_suppressions" before running: `BUILD_TYPE=tsan ./run-be-ut.sh --run`	2021-01-15 09:45:11 +08:00
HuangWei	85076b5678	[UT] fix test_env & add a sample (#5085 ) Easily create tests.	2020-12-27 22:14:30 +08:00
Yingchun Lai	176dcf8bd9	[Trace] Add trace for create tablet tasks (#5091 ) Add trace for create tablet tasks. It is a useful tool for admins to find the bottleneck when creating tablets times out. For example, an admin could enlarge 'tablet_map_shard_size' when the 'got tablets shard lock' procedure costs too much time.	2020-12-19 11:18:12 +08:00
Yingchun Lai	f6881d2f7b	[Bug] Fix coredump bug when create new tablets (#5089 ) There is a bug that may cause a BE coredump when creating a tablet: access to the tablet_set of a data dir should be protected by a lock.	2020-12-17 00:34:31 +08:00
weizuo93	f2d69a51d4	[Docs]Remove some unused variables and update BE config documents (#4987 ) Remove some unused variables and update BE config documents about compaction.	2020-12-09 09:28:56 +08:00
weizuo93	ec7e1c6b1b	[Refactor] Execute 'pick rowsets' before applying for permits for a compaction task (#4891 ) The current compaction mechanism is that there is a producer thread that has been producing compaction tasks, and the selected tablet must apply for `permits`. When a tablet could hold `permits`, compaction task for this tablet will be submitted to thread pool. We take compaction score as `permits` which is used for limiting memory consumption. However, `pick_rowset_to_compaction()` will be executed before the file merge in compaction thread, and the number of segment files that actually perform the merge operation is smaller than compaction score. In addition, it is also possible that compaction task exits directly because the tablet doesn't meet the requirements of compaction. This patch optimizes and refactors the code of compaction, so that we can execute 'pick rowsets' before applying for permits for a compaction task, calculate the number of segment files that actually participate in the merge operation, and take this number as `permits`.	2020-11-30 11:41:14 +08:00
sduzh	6fedf5881b	[CodeFormat] Clang-format cpp sources (#4965 ) Clang-format all c++ source files.	2020-11-28 18:36:49 +08:00
sduzh	10e1e29711	Remove header file common/names.h (#4945 )	2020-11-26 17:00:48 +08:00
HappenLee	2682712349	[Bug] Fix be ut compile failed and core in delta_writer_test when ulimit < 60000. (#4941 )	2020-11-24 22:21:19 +08:00
Zhengguo Yang	0f13eddd97	fix typo in log (#4790 )	2020-10-27 10:03:56 +08:00
Yingchun Lai	6cbefd5621	[LRUCache] Expose LRU Cache status to metrics (#4688 ) Expose LRU Cache status to metrics would be helpful to diagnose problems like high usage, low hit rate.	2020-10-22 21:37:02 +08:00
Zhengguo Yang	09f97f8a05	[Refactor] Fixes some be typo part 2 (#4747 )	2020-10-20 09:28:57 +08:00
Yingchun Lai	45fa67aa71	[Refactor] Remove objects which are only used for unit test (#4751 ) We create some objects which are only used for unit tests. This is not necessary, and it may create duplicate instances of some classes. This patch removes the unnecessary instances of class BlockManager and StoragePageCache.	2020-10-18 21:37:12 +08:00
Yingchun Lai	3438a746ac	[Typo] Fix typo in metrics macros (#4739 ) Just fix typo. Rename DEFINE_GAUGE_METRIC_PROTOTYPE_5ARG(name, unit) to DEFINE_GAUGE_METRIC_PROTOTYPE_2ARG(name, unit) Rename DEFINE_GAUGE_METRIC_PROTOTYPE_2ARG(name, unit), which defines core metrics, to DEFINE_GAUGE_CORE_METRIC_PROTOTYPE_2ARG(name, unit)	2020-10-15 19:56:43 +08:00
Zhengguo Yang	75e0ba32a1	Fixes some be typo (#4714 )	2020-10-13 09:37:15 +08:00
weizuo93	eba595583e	[Optimize] Optimize the execution model of compaction to limit memory consumption (#4670 ) Currently, there are M threads to do base compaction and N threads to do cumulative compaction for each disk. Too many compaction tasks may run out of memory, so the max concurrency of running compaction tasks is limited by a semaphore. If the running threads consume too much memory, we cannot defend against it. In addition, reducing concurrency to avoid OOM means that some compaction tasks cannot be executed in time, and we may encounter heavier compaction later. Therefore, a concurrency limit is not enough. The strategy proposed in #3624 may be effective in solving the OOM. A CompactionPermitLimiter is used to limit compaction, with a single-producer/multi-consumer model. The producer tries to generate compaction tasks and acquire `permits` for each task. A compaction task that holds `permits` is executed in the thread pool, and each finished task releases its `permits`. `permits` must be acquired before a compaction task can execute. When the sum of `permits` held by executing compaction tasks reaches a threshold, no further compaction tasks are allowed until some `permits` are released. The tablet compaction score is used as the `permits` of a compaction task here. To some extent, memory consumption can be limited by setting an appropriate `permits` threshold.	2020-10-11 11:39:25 +08:00
Mingyu Chen	00f25c2b77	[Bug] Tablet and Disk report thread not work (#4597 ) The tablet and disk information reporting threads need to report to the FE periodically. At the same time these two reporting threads will also be triggered by certain events. The modification in PR #4440 caused these two threads to be triggered only by events, and could not report regularly.	2020-09-20 20:51:52 +08:00
Yingchun Lai	b780df697a	[refactor] Optimize threads usage mode in BE (#4440 ) BE cannot exit gracefully because some threads run in endless loops. This patch makes the following optimizations: - Use the well encapsulated Thread and ThreadPool instead of std::thread and std::vector<std::thread> - Use CountDownLatch in thread's loop condition to avoid endless loop - Introduce a new class Daemon for daemon works, like tcmalloc_gc, memory_maintenance and calculate_metrics - Decouple statistics type TaskWorkerPool and StorageEngine notification by submitting tasks to TaskWorkerPool's queue - Reorder objects' stop and destruction in main(), i.e. stop network services first, then internal services - Use libevent in pthreads mode, by calling evthread_use_pthreads(), then EvHttpServer can exit gracefully in multi-threads - Call brpc::Server's Stop() and ClearServices() explicitly	2020-09-06 20:19:14 +08:00
Yingchun Lai	498b06fbe2	[Metrics] Support tablet level metrics (#4428 ) Sometimes we want to detect the hotspots of a cluster, for example, heavily scanned or heavily written tablets, but we have no insight into the tablets in the cluster. This patch introduces tablet level metrics to help achieve this goal. It now supports 4 metrics on tablets: `query_scan_bytes`, `query_scan_rows`, `flush_bytes`, `flush_count`. However, one BE may hold hundreds of thousands of tablets, so I add a parameter for the metrics HTTP request, and tablet level metrics are not returned by default.	2020-09-02 10:39:41 +08:00
weizuo93	613c44e889	[Optimize]Optimize the disk selection strategy on BE for tablet creation (#4373 ) When creating a tablet, it is necessary to select a disk from all disks that meet the requirements on the BE node to store the tablet. In Doris, the current disk selection strategy is to randomly select a disk from all disks that meet the requirements for tablet creation. After the cluster has been running for a long time, we found that the distribution of the number of tablets on different disks in a BE node is unbalanced. In order to solve this problem, we introduced the algorithm of "two random choices" for disk selection when creating the tablet: (1) Select two disks randomly from all disks that meet the requirements on the BE node; (2) Choose the disk with the smaller number of tablets from the two disks selected in (1) for tablet creation.	2020-08-26 10:35:33 +08:00
Mingyu Chen	3359467b9a	[Tablet][Recovery] Support using empty tablet to repair the damaged or missing tablet (#4255 ) In some very special circumstances, such as code bugs, or human misoperation, etc., all replicas of some tablets may be lost. In this case, the data has been substantially lost. However, in some scenarios, the business still hopes to ensure that the query will not report errors even if there is data loss, and reduce the perception of the user layer. At this point, we can use the blank Tablet to fill the missing replica to ensure that the query can be executed normally. Add a new FE config `recover_with_empty_tablet`. default is false. true means to use empty tablet to fill the missing one. Also fix a bug in Fix #4274	2020-08-18 06:13:53 +00:00
Yingchun Lai	e71152132c	[metrics] Redesign metrics to 3 layers (#4115 ) Redesign metrics to 3 layers: MetricRegistry - MetricEntity - Metrics MetricRegistry : the register center MetricEntity : the entity registered on MetricRegistry. Generally a MetricRegistry can be registered on several MetricEntities, each of MetricEntity is an independent entity, such as server, disk_devices, data_directories, thrift clients and servers, and so on. Metric : metrics of an entity. Such as fragment_requests_total on server entity, disk_bytes_read on a disk_device entity, thrift_opened_clients on a thrift_client entity. MetricPrototype: the type of a metric. MetricPrototype is a global variable, can be shared by the same metrics across different MetricEntities.	2020-08-08 11:23:01 +08:00
HuangWei	10f822eb43	[MemTracker] make all MemTrackers shared (#4135 ) We make all MemTrackers shared, in order to show MemTracker real-time consumptions on the web. As follows: 1. nearly all MemTracker raw ptr -> shared_ptr 2. Use CreateTracker() to create new MemTracker(in order to add itself to its parent) 3. RowBatch & MemPool still use raw ptrs of MemTracker, it's easy to ensure RowBatch & MemPool destructor exec before MemTracker's destructor. So we don't change these code. 4. MemTracker can use RuntimeProfile's counter to calc consumption. So RuntimeProfile's counter need to be shared too. We add a shared counter pool to store the shared counter, don't change other counters of RuntimeProfile. Note that, this PR doesn't change the MemTracker tree structure. So there still have some orphan trackers, e.g. RowBlockV2's MemTracker. If you find some shared MemTrackers are little memory consumption & too time-consuming, you could make them be the orphan, then it's fine to use the raw ptr.	2020-07-31 21:57:21 +08:00
HuangWei	a01d1aec56	[Compaction] track RowsetReader's mem & add metric (#4068 ) Ref https://github.com/apache/incubator-doris/issues/3624#issuecomment-655933244 Only RowsetReaders in compaction are tracked. Other RowsetReaders won't be affected, because the parent_tracker is nullptr.	2020-07-24 07:58:09 +08:00
xy720	2c8fdb6134	[BUG]Make segment V1 and V2 share same file cache (#3945 ) This commit makes segment V1 and V2 share the same file cache, so that segment V2's file descriptors stored in the cache can be cleaned up as V1's are.	2020-06-29 18:43:09 +08:00
lichaoyong	93a0b47d22	Revert "[Memory Engine] MemTablet creation and compatibility handling in BE (#3762 )" (#3931 ) This reverts commit ca96ea30560c9e9837c28cfd2cdd8ed24196f787.	2020-06-24 10:13:45 +08:00
Binglin Chang	ca96ea3056	[Memory Engine] MemTablet creation and compatibility handling in BE (#3762 )	2020-06-18 09:56:07 +08:00
Yingchun Lai	3c09e1e1d8	[trace] Adapt trace util to compaction module (#3814 ) Trace util is helpful for diagnosing compaction performance problems, we can get trace log for base compaction like: ``` W0610 11:26:33.804431 56452 storage_engine.cpp:552] Trace: 0610 11:23:03.727535 (+ 0us) storage_engine.cpp:554] start to perform base compaction 0610 11:23:03.728961 (+ 1426us) storage_engine.cpp:560] found best tablet 546859 0610 11:23:03.728963 (+ 2us) base_compaction.cpp:40] got base compaction lock 0610 11:23:03.729029 (+ 66us) base_compaction.cpp:44] rowsets picked 0610 11:24:51.784439 (+108055410us) compaction.cpp:46] got concurrency lock and start to do compaction 0610 11:24:51.784818 (+ 379us) compaction.cpp:74] prepare finished 0610 11:26:33.359265 (+101574447us) compaction.cpp:87] merge rowsets finished 0610 11:26:33.484481 (+125216us) compaction.cpp:102] output rowset built 0610 11:26:33.484482 (+ 1us) compaction.cpp:106] check correctness finished 0610 11:26:33.513197 (+ 28715us) compaction.cpp:110] modify rowsets finished 0610 11:26:33.513300 (+ 103us) base_compaction.cpp:49] compaction finished 0610 11:26:33.513441 (+ 141us) base_compaction.cpp:56] unused rowsets have been moved to GC queue Metrics: {"filtered_rows":0,"input_row_num":3346807,"input_rowsets_count":42,"input_rowsets_data_size":1256413170,"input_segments_num":44,"merge_rowsets_latency_us":101574444,"merged_rows":0,"output_row_num":3346807,"output_rowset_data_size":1228439659,"output_segments_num":6} ``` for cumulative compaction like: ``` W0610 11:14:18.714366 56468 storage_engine.cpp:518] Trace: 0610 11:14:08.068484 (+ 0us) storage_engine.cpp:520] start to perform cumulative compaction 0610 11:14:08.069844 (+ 1360us) storage_engine.cpp:526] found best tablet 547083 0610 11:14:08.069846 (+ 2us) cumulative_compaction.cpp:42] got cumulative compaction lock 0610 11:14:08.069947 (+ 101us) cumulative_compaction.cpp:46] calculated cumulative point 0610 11:14:08.070141 (+ 194us) cumulative_compaction.cpp:50] rowsets picked 0610 11:14:08.070143 (+ 2us) compaction.cpp:46] got concurrency lock and start to do compaction 0610 11:14:08.070518 (+ 375us) compaction.cpp:74] prepare finished 0610 11:14:15.389893 (+7319375us) compaction.cpp:87] merge rowsets finished 0610 11:14:15.390916 (+ 1023us) compaction.cpp:102] output rowset built 0610 11:14:15.390917 (+ 1us) compaction.cpp:106] check correctness finished 0610 11:14:15.409460 (+ 18543us) compaction.cpp:110] modify rowsets finished 0610 11:14:15.409496 (+ 36us) cumulative_compaction.cpp:55] compaction finished 0610 11:14:15.410138 (+ 642us) cumulative_compaction.cpp:65] unused rowsets have been moved to GC queue Metrics: {"filtered_rows":0,"input_row_num":136707,"input_rowsets_count":302,"input_rowsets_data_size":76617836,"input_segments_num":302,"merge_rowsets_latency_us":7319372,"merged_rows":0,"output_row_num":136707,"output_rowset_data_size":53893280,"output_segments_num":1} ```	2020-06-13 19:31:51 +08:00
Yingchun Lai	e4dc2ec440	[StorageEngine] Make StorageEngine::open return more detailed info (#3761 ) StorageEngine::open returns only a very vague status when it fails, so we have to check logs to find the root cause, and checking logs is not convenient when unit tests run in CI dockers. It would be better to return more detailed failure info that points out the root cause, for example, an error status with the message "file descriptors limit is too small".	2020-06-07 10:21:33 +08:00
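
Note on commit 613c44e889 (#4373): the "two random choices" disk selection described in that message can be sketched roughly as below. This is a minimal illustration, not the code from the Doris BE. `DiskInfo` and `pick_disk_two_choices` are hypothetical names introduced here, and the sketch assumes every entry in `disks` already meets the requirements for tablet creation.

```cpp
#include <cstddef>
#include <random>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a BE data directory; not a Doris class.
struct DiskInfo {
    std::string path;
    std::size_t tablet_count;
};

// "Two random choices": sample two distinct candidate disks uniformly at
// random and return the index of the one that holds fewer tablets.
// On a tie, the first sampled disk is kept.
template <typename Rng>
std::size_t pick_disk_two_choices(const std::vector<DiskInfo>& disks, Rng& rng) {
    if (disks.empty()) {
        throw std::invalid_argument("no candidate disks");
    }
    if (disks.size() == 1) {
        return 0;  // only one candidate, nothing to compare
    }
    std::uniform_int_distribution<std::size_t> first_dist(0, disks.size() - 1);
    const std::size_t first = first_dist(rng);
    // Draw the second index from the remaining n-1 slots, then shift it
    // past `first` so the two samples are always distinct.
    std::uniform_int_distribution<std::size_t> second_dist(0, disks.size() - 2);
    std::size_t second = second_dist(rng);
    if (second >= first) {
        ++second;
    }
    return disks[second].tablet_count < disks[first].tablet_count ? second : first;
}
```

Compared with picking a single random disk, comparing two samples steers new tablets away from the fullest disks without having to scan or sort every disk on each creation.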


99 Commits