Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to remove them. Enforcing a strict include-what-you-use policy brings many benefits, such as faster compilation and clearer header dependencies.
Follow-up to #17586.
This PR mainly changes:
Remove env/
Remove FileUtils/FilesystemUtils
Move some methods to LocalFileSystem
Remove olap/file_cache
Add an S3 client cache for the S3 file system (see the sketch after this list)
In my tests, this significantly reduces the time to open an S3 file
Fix a cold/hot data separation bug for the S3 file system
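A minimal sketch of the client-cache idea (all names here are illustrative stand-ins, not the actual implementation): clients are expensive to construct, so they are keyed by connection properties and reused.

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Stand-in for an SDK S3 client, which is expensive to construct
// (credential resolution, connection pools, TLS setup, ...).
struct S3Client {
    explicit S3Client(std::string endpoint) : endpoint(std::move(endpoint)) {}
    std::string endpoint;
};

// Cache clients keyed by connection properties, so repeated opens of S3
// files reuse an existing client instead of building a new one each time.
class S3ClientCache {
public:
    std::shared_ptr<S3Client> get(const std::string& endpoint,
                                  const std::string& ak,
                                  const std::string& bucket) {
        const std::string key = endpoint + "|" + ak + "|" + bucket;
        std::lock_guard<std::mutex> lock(_mu);
        auto& client = _clients[key];
        if (!client) client = std::make_shared<S3Client>(endpoint);
        return client;
    }

private:
    std::mutex _mu;
    std::map<std::string, std::shared_ptr<S3Client>> _clients;
};
```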
This is the last PR of #17764.
After this, all IO operations should be in io/fs.
In addition to the tests in #17586, I also tested some cases related to fs IO:
clone
concurrency query on local/s3/hdfs
load error log create and clean
disk metrics
Fix: RedHat 4.x has no MemAvailable in /proc/meminfo, so disable using MemAvailable to control memory there.
Record vm_rss_str and mem_available_str when GC is triggered, so that memory changes during GC do not make the logs inaccurate.
Catch bad_alloc in the join probe, which may allocate 64G of memory at a time, to avoid OOM.
Rename doris_be_all_segments_num and doris_be_all_rowsets_num in the documentation.
If we set enable_system_metrics to false, BE goes down with the message "enable metric calculator failed,
maybe you set enable_system_metrics to false", so fix it.
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
Currently, there are two sets of file readers in Doris; this PR rewrites the old broker reader on top of the new file reader.
TODO:
1. rewrite stream load pipe and kafka consumer pipe
This PR supports rowset-level data upload on the BE side, so that a tablet can hold both cold data and hot data,
and there is no need to prohibit loading new data into cooled tablets.
Each rowset is bound to a `FileSystem`, so that the storage layer can read and write rowsets without
being aware of the underlying filesystem.
The abstracted `RemoteFileSystem` can try local caching strategies of different granularity,
instead of caching whole segment files as before.
To avoid conflicts with the code in be/src/io, we temporarily put the file system related code in the be/src/io/fs directory.
In the future, `FileReader`s and `FileWriter`s should be unified.
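As a rough illustration of this binding, here is a minimal sketch of what such an abstraction might look like (all names and signatures are hypothetical, not the actual io/fs API):

```cpp
#include <memory>
#include <string>

// Hypothetical filesystem abstraction that rowsets can be bound to.
class FileSystem {
public:
    virtual ~FileSystem() = default;
    virtual bool read_file(const std::string& path, std::string* out) = 0;
    virtual bool write_file(const std::string& path, const std::string& data) = 0;
};

// A remote filesystem can layer a local cache at file, rowset, or block granularity.
class RemoteFileSystem : public FileSystem {
public:
    explicit RemoteFileSystem(std::shared_ptr<FileSystem> local_cache)
            : _local_cache(std::move(local_cache)) {}
    bool read_file(const std::string& path, std::string* out) override {
        // Serve from the local cache when possible, otherwise fetch remotely.
        if (_local_cache && _local_cache->read_file(path, out)) return true;
        // ... fetch from remote storage and populate the cache ...
        return false;
    }
    bool write_file(const std::string& path, const std::string& data) override {
        // ... upload to remote storage ...
        return true;
    }

private:
    std::shared_ptr<FileSystem> _local_cache;
};

// A rowset carries a reference to the filesystem it lives on, so readers
// never need to know whether the data is local or remote.
struct Rowset {
    std::string rowset_path;
    std::shared_ptr<FileSystem> fs;
};
```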
1. Fix memory tracking under bthread
- Bthread is a high-performance M:N thread library used by brpc. In Doris, a brpc server response runs on one bthread, which may be scheduled across multiple pthreads. Currently, MemTracker consumption relies on pthread-local variables (TLS).
- This caused pthread TLS MemTracker confusion when the response's bthread switched pthreads. So we replace pthread TLS with bthread TLS in the brpc server response to save the MemTracker (see the sketch after this list).
Ref: 731730da85/docs/en/server.md (bthread-local)
2. Fix memory tracking for vectorized queries
- Added tracking for mmap; the vectorized execution engine currently allocates memory via mmap in many places.
- Refactored ThreadContext to avoid dependency conflicts and make it easier to debug.
- Fix some bugs.
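A minimal sketch of the bthread-local idea using brpc's bthread key API; the MemTracker struct here is a stand-in, and the real integration is more involved:

```cpp
#include <bthread/bthread.h>

// Stand-in for the real MemTracker.
struct MemTracker {
    long long consumption = 0;
};

static bthread_key_t g_mem_tracker_key;

// Called once at process startup; the destructor runs when a bthread exits.
void init_bthread_mem_tracker_key() {
    bthread_key_create(&g_mem_tracker_key, [](void* tracker) {
        delete static_cast<MemTracker*>(tracker);
    });
}

// Attach the tracker to the *bthread*, so it follows the response even
// when the bthread migrates to another pthread.
void attach_tracker(MemTracker* tracker) {
    bthread_setspecific(g_mem_tracker_key, tracker);
}

MemTracker* current_tracker() {
    return static_cast<MemTracker*>(bthread_getspecific(g_mem_tracker_key));
}
```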
1. Add TStorageMigrationReqV2 and EngineStorageMigrationTask to support the migration action
2. Change TabletManager::create_tablet() for remote storage
3. Change TabletManager::try_delete_unused_tablet_path() for remote storage
In PR #8476, all memory usage of a process is recorded in the process mem tracker,
all memory usage of a query is recorded in the query mem tracker,
and it is still necessary to manually call `transfer to` to track the cached memory size.
We hope to separate out more detailed memory usage based on hooking TCMalloc new/delete plus a TLS mem tracker.
In this PR, the more detailed mem tracker is switched to TLS, which automatically and accurately
counts more detailed memory usage than before.
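A minimal sketch of the hook idea, assuming we intercept global new/delete and charge a thread-local counter (the real implementation hooks TCMalloc and charges a full MemTracker hierarchy):

```cpp
#include <cstdlib>
#include <new>

// Thread-local byte counter standing in for the TLS mem tracker.
thread_local long long tls_tracked_bytes = 0;

void* operator new(std::size_t size) {
    tls_tracked_bytes += static_cast<long long>(size);  // charge before allocating
    if (void* p = std::malloc(size)) return p;
    throw std::bad_alloc();
}

void operator delete(void* p, std::size_t size) noexcept {
    tls_tracked_bytes -= static_cast<long long>(size);  // release the charge
    std::free(p);
}

// Unsized form: the size is unknown here without asking the allocator,
// which is one reason the real code hooks TCMalloc directly.
void operator delete(void* p) noexcept {
    std::free(p);
}
```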
1. `StorageEngine::_delete_tablets_on_unused_root_path` will try to obtain each tablet shard's write lock in `TabletManager`:
```
StorageEngine::_delete_tablets_on_unused_root_path
TabletManager::drop_tablets_on_error_root_path
obtain each tablet shard's write lock
```
2. `TabletManager::build_all_report_tablets_info` and other methods obtain tablet shard read locks frequently.
So `StorageEngine::_delete_tablets_on_unused_root_path` can hold `_store_lock` for a long time.
This makes it difficult for other threads to acquire `_store_lock` for writing, such as `StorageEngine::get_stores_for_create_tablet`.
Since `drop_tablets_on_error_root_path` is a low-probability event, `TabletManager::drop_tablets_on_error_root_path` should return early when its parameter `tablet_info_vec` is empty.
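A minimal sketch of the early-return fix; the signature is illustrative, not the exact Doris one:

```cpp
#include <vector>

struct TabletInfo {};  // stand-in for the real type

void drop_tablets_on_error_root_path(const std::vector<TabletInfo>& tablet_info_vec) {
    // Low-probability event: bail out before touching any tablet shard's
    // write lock, so _store_lock is never held longer than necessary.
    if (tablet_info_vec.empty()) {
        return;
    }
    // ... obtain each tablet shard's write lock and drop the tablets ...
}
```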
1. Add metrics for `used permits` and `waiting permits` for compaction.
It is useful to monitor the `permits` held by all executing compaction tasks and by waiting compaction tasks.
2. Add a log, enabled by config, for merging rowsets.
It helps to track the progress of rowset merging in long-running compaction tasks.
A large number of small segment files leads to inefficient scan operations.
Multiple small files can be merged into a large file by a compaction operation.
So we can take the tablet scan frequency into consideration when selecting a tablet for compaction,
and preferentially compact the tablets that have been scanned frequently in the recent period.
Borrowing from Kudu's compaction strategy, the scan frequency of a tablet over the recent
period can be calculated and taken into consideration when calculating the compaction score.
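As a rough illustration, the adjustment could look like this (the names and weighting below are hypothetical, not the actual formula):

```cpp
// Hypothetical: boost the compaction score of frequently scanned tablets.
// base_score: the usual compaction score derived from rowset layout.
// recent_scan_rate: normalized scan frequency over the recent window.
// scan_weight: a config knob controlling how much scan frequency matters.
double adjusted_compaction_score(double base_score, double recent_scan_rate,
                                 double scan_weight) {
    return base_score * (1.0 + scan_weight * recent_scan_rate);
}
```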
1. Find the cache node by SQL key, then find the corresponding partition data by partition key, and then decide whether the cache is hit based on LastVersion and LastVersionTime.
2. The design follows the classic LRU (least recently used) cache algorithm, implemented with a three-layer data structure.
3. The cache eviction algorithm preserves partition ranges as much as possible, because discontinuous partitions reduce the partition cache hit rate.
4. Two thresholds, maximum memory and elastic memory, are used for control, to avoid frequent eviction of data.
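A minimal sketch of the three-layer lookup (SQL key → partition key → versioned entry); all names, types, and the exact hit condition are illustrative:

```cpp
#include <map>
#include <optional>
#include <string>

struct PartitionEntry {
    long last_version = 0;
    long last_version_time = 0;
    std::string result;  // cached partition result
};

// Layer 1: SQL key -> cache node. Layer 2: partition key -> entry.
using CacheNode = std::map<long, PartitionEntry>;
std::map<std::string, CacheNode> cache;

std::optional<std::string> lookup(const std::string& sql_key, long partition_key,
                                  long version, long version_time) {
    auto node_it = cache.find(sql_key);
    if (node_it == cache.end()) return std::nullopt;
    auto part_it = node_it->second.find(partition_key);
    if (part_it == node_it->second.end()) return std::nullopt;
    const PartitionEntry& e = part_it->second;
    // Hit only if the cached partition version is still current.
    if (e.last_version != version || e.last_version_time != version_time) {
        return std::nullopt;
    }
    return e.result;
}
```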
Sometimes we want to detect hotspots in a cluster, for example hot scanned tablets or hot written tablets,
but we have no insight into the tablets in the cluster.
This patch introduces tablet-level metrics to help achieve this; it currently supports 4 metrics per tablet: `query_scan_bytes`, `query_scan_rows`, `flush_bytes`, `flush_count`.
However, one BE may hold hundreds of thousands of tablets, so I added a parameter to the metrics HTTP request,
and tablet-level metrics are not returned by default.
Redesign metrics into 3 layers:
MetricRegistry - MetricEntity - Metric
MetricRegistry: the register center.
MetricEntity: an entity registered on the MetricRegistry. Generally, several MetricEntities can be registered
on one MetricRegistry; each MetricEntity is an independent entity, such as the server, disk_devices, data_directories, thrift
clients and servers, and so on.
Metric: a metric of an entity, such as fragment_requests_total on the server entity, disk_bytes_read on a disk_device entity,
or thrift_opened_clients on a thrift_client entity.
MetricPrototype: the type of a metric. A MetricPrototype is a global variable and can be shared by the same metric across
different MetricEntities.
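A minimal self-contained sketch of the three-layer design (the classes and method names are illustrative, not the actual Doris API):

```cpp
#include <atomic>
#include <map>
#include <memory>
#include <string>

// Global metric prototype, shared across entities of the same kind.
struct MetricPrototype {
    std::string name;  // e.g. "disk_bytes_read"
};

// A metric owned by exactly one entity.
class Metric {
public:
    void increment(long v) { _value += v; }
    long value() const { return _value; }

private:
    std::atomic<long> _value{0};
};

// An independent entity: the server, a disk device, a thrift client, ...
class MetricEntity {
public:
    Metric* register_metric(const MetricPrototype* proto) {
        auto& m = _metrics[proto];
        if (!m) m = std::make_unique<Metric>();
        return m.get();
    }

private:
    std::map<const MetricPrototype*, std::unique_ptr<Metric>> _metrics;
};

// The register center holding all entities.
class MetricRegistry {
public:
    MetricEntity* register_entity(const std::string& name) {
        auto& e = _entities[name];
        if (!e) e = std::make_unique<MetricEntity>();
        return e.get();
    }

private:
    std::map<std::string, std::unique_ptr<MetricEntity>> _entities;
};

int main() {
    static MetricPrototype disk_bytes_read{"disk_bytes_read"};
    MetricRegistry registry;
    MetricEntity* disk = registry.register_entity("disk_device.sda");
    registry.register_entity("server");  // several entities on one registry
    disk->register_metric(&disk_bytes_read)->increment(4096);
}
```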
Add a JSON format for existing metrics, like this:
```
{
    "tags": {
        "metric": "thread_pool",
        "name": "thrift-server-pool",
        "type": "active_thread_num"
    },
    "unit": "number",
    "value": 3
}
```
I added a new JsonMetricVisitor to handle the transformation.
It does not modify the existing PrometheusMetricVisitor and SimpleCoreMetricVisitor.
I also added:
1. A unit item to describe each metric better
2. Cloning tablet statistics, broken down by database
3. Replacing newlines with whitespace in audit.log
With these metrics we can observe the workload of a BE, and also check
whether there is any problem in the BE, such as some container growing
too large and leading to OOM.
This patch adds the following metrics:
| Name | Description |
| --- | --- |
| rowset_count_generated_and_in_use | The total count of rowset ids generated and in use since the BE last started |
| unused_rowsets_count | The total count of unused rowsets waiting to be GCed |
| broker_count | The total count of brokers in management |
| data_stream_receiver_count | The total count of data stream receivers in management |
| fragment_endpoint_count | The total count of fragment endpoints of data streams in management; should always equal data_stream_receiver_count |
| active_scan_context_count | The total count of active scan contexts |
| plan_fragment_count | The total count of plan fragments executing |
| load_channel_count | The total count of load channels in management |
| result_buffer_block_count | The total count of result buffer blocks for queries; each block has a limited queue size (default 1024) |
| result_block_queue_count | The total count of queues for fragments; each queue has a limited size (default 20, by config::max_memory_sink_batch_count) |
| routine_load_task_count | The total count of routine load tasks executing |
| small_file_cache_count | The total count of cached small files' digest info |
| stream_load_pipe_count | The total count of stream load pipes; each pipe has a limited buffer size (default 1M) |
| tablet_writer_count | The total count of tablet writers |
| brpc_endpoint_stub_count | The total count of brpc endpoints |
The abstraction of the Block layer, inspired by Kudu, lies between the "business
layer" and the "underlying file storage layer" (`Env`), so that they are no longer
strongly coupled.
In this way, the business layer (such as `SegmentWriter`) no longer needs to
perform file operations directly, which brings better encapsulation. The ideal
future situation is: when we need to support a new file storage system, we only
need to add a corresponding type of BlockManager, without modifying the business
code (such as `SegmentWriter`).
With the Block layer, there are some benefits:
1. First and foremost, the mapping between data and `Env` is more flexible.
For example, in the storage engine, the data of a tablet can be placed in
multiple file systems (`Env`) at the same time; that is, one-to-many
relationships can be supported, such as one copy on local storage and one on
remote storage.
2. The mapping between blocks and files can be adjusted; it need not be
one-to-one. For example, the data of multiple blocks can be stored in a single
physical file, which reduces the number of files that need to be opened during
querying. This is like the `LogBlockManager` in Kudu.
3. We can move the opened-file cache under the Block layer, which can automatically
close and reopen the files used by the upper layer, so that the upper business
level does not need to be aware of file handle limits at all (a problem that is
often encountered online today).
4. Better automatic cleanup when exceptions occur. For example, a block
that is not closed explicitly can automatically clean up its corresponding file,
thereby avoiding most garbage files.
5. More convenient batch file creation and deletion. Some operations, such as
compaction, create multiple files. At present, these files are processed one by
one: 1) creation; 2) writing data; 3) fsync to disk. But this is not necessary:
we only need to fsync the whole batch of files at the end, which gives the
operating system more opportunities to merge IO and thereby improves performance.
However, this handling is relatively tedious and need not be coupled into the
business code; the Block layer is an ideal place to put it.
This is the first patch; it just adds the related classes, laying the groundwork
for later switching the read and write logic.
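A minimal interface sketch of what such a Block layer might look like (names are illustrative, loosely following Kudu's block manager, not the actual classes added here):

```cpp
#include <cstddef>
#include <memory>
#include <string>

// A writable block: created, appended to, then closed (which persists it).
// A block that is never closed can clean up its backing file automatically.
class WritableBlock {
public:
    virtual ~WritableBlock() = default;
    virtual bool append(const void* data, size_t size) = 0;
    virtual bool close() = 0;  // flush + fsync; afterwards the block is readable
};

class ReadableBlock {
public:
    virtual ~ReadableBlock() = default;
    virtual bool read(size_t offset, size_t size, void* out) const = 0;
};

// The business layer (e.g. SegmentWriter) talks only to this interface;
// a file-per-block manager, a log-structured manager (like Kudu's
// LogBlockManager), or a remote-storage manager can all back it.
class BlockManager {
public:
    virtual ~BlockManager() = default;
    virtual std::unique_ptr<WritableBlock> create_block() = 0;
    virtual std::unique_ptr<ReadableBlock> open_block(const std::string& block_id) = 0;
};
```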
[Metric] Add tablet compaction score metrics
Backend:
Add metric "tablet_max_compaction_score" to monitor the current max compaction
score of tablets on this Backend. This metric will be updated each time
the compaction thread picking tablets to compact.
Frontend:
Add metric "tablet_max_compaction_score" for each Backend. These metrics will
be updated when backends report tablet.
And also add a calculated metric "max_tablet_compaction_core" to monitor the
max compaction core of tablets on all Backends.
Env now unifies all environment operations, such as file operations.
However, some of our old functions don't leverage it. This change unifies
FileUtils::scan_dir to use Env's functions.
Operators want to know when a job is scheduled into the PENDING
and LOADING states, and how long it takes to finish these sub-states.
Also add 2 metrics on BE to monitor memtable flush time:
`memtable_flush_total` and `memtable_flush_duration_us`.
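A minimal sketch of how the two flush metrics might be updated around a flush (the atomic counters stand in for the real metric types):

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>

// Stand-ins for the two BE metrics.
std::atomic<int64_t> memtable_flush_total{0};
std::atomic<int64_t> memtable_flush_duration_us{0};

void flush_memtable_with_metrics() {
    auto start = std::chrono::steady_clock::now();
    // ... flush the memtable to a segment file ...
    auto elapsed_us = std::chrono::duration_cast<std::chrono::microseconds>(
            std::chrono::steady_clock::now() - start).count();
    memtable_flush_total.fetch_add(1, std::memory_order_relaxed);
    memtable_flush_duration_us.fetch_add(elapsed_us, std::memory_order_relaxed);
}
```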
NOTE: This patch will modify all Backends' data,
which causes a very long BE restart.
So if you don't want to disturb your production environment,
you should upgrade Backends one by one.
1. Refactor BE to clarify the code structure.
2. Use a unique id to identify a rowset.
Naming a rowset with tablet_id and version leads to
many conflicts among compaction, clone, and restore.
3. Extract a Rowset interface to encapsulate rowsets
with different formats.
* Enhance usability
1. Add metrics to monitor transactions and the stream load process in BE.
2. Modify BE config 'result_buffer_cancelled_interval_time' to 300s.
3. Modify FE config 'enable_metric_calculator' to true.
4. Add more logs for tracing the broker load process.
5. Modify the query report process to cancel a query immediately if some instance fails.
* Fix bugs
1. Avoid NullPointer when enabling colocation join with broker load
2. Return immediately when pull load task coordinator execution fails
1. Max IO util of disks
2. Max network send/receive bytes rate across all network devices
3. Base/cumulative compaction request counters and failure counters