Commit Graph

46 Commits

Author SHA1 Message Date
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
308ff9a16f [enchancement](memory) tracking lru cache memory and page memory not in cache (#18361)
Statistics lru cache memory in metrics
Statistics page memory not in cache in mem tracker
2023-04-07 14:22:44 +08:00
05db6e9b55 [refactor](file-system)(step-2) remove env, file_utils and filesystem_utils (#18009)
Follow #17586.
This PR mainly changes:

Remove env/
Remove FileUtils/FilesystemUtils
Some methods are moved to LocalFileSystem
Remove olap/file_cache
Add s3 client cache for s3 file system
In my test, the time of open s3 file can be reduced significantly
Fix cold/hot separation bug for s3 fs.
This is the last PR of #17764.
After this, all IO operation should be in io/fs.

Except for tests in #17586, I also tested some case related to fs io:

clone
concurrency query on local/s3/hdfs
load error log create and clean
disk metrics
2023-03-29 09:00:52 +08:00
a1c0054b4c [fix](memory) fix memory GC details and join probe catch bad_alloc (#16989)
Fix Redhat 4.x OS /proc/meminfo has no MemAvailable, disable MemAvailable to control memory.
vm_rss_str and mem_available_str recorded when gc is triggered, to avoid memory changes during gc and cause inaccurate logs.
join probe catch bad_alloc, this may alloc 64G memory at a time, avoid OOM.
Modify document doris_be_all_segments_num and doris_be_all_rowsets_num names.
2023-02-23 08:33:30 +08:00
43eca4f209 [Feature-WIP](inverted index) Implementation for alter inverted index. (#16371)
implementation for add/drop inverted index.
2023-02-10 17:56:17 +08:00
17885acd09 [improvement](metrics) Metrics add all rowset nums and segment nums (#16208) 2023-01-30 09:55:32 +08:00
0148b39de0 [fix](metric) fix be down when enable_system_metrics is false (#16140)
if we set enable_system_metrics to false, we will see be down with following message "enable metric calculator failed, 
maybe you set enable_system_metrics to false ", so fix it
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2023-01-28 00:10:39 +08:00
b8f93681eb [feature-wip](file reader) Merge broker reader to the new file reader (#14980)
Currently, there are two sets of file readers in Doris, this pr rewrites the old broker reader with the new file reader.

TODO:
1. rewrite stream load pipe and kafka consumer pipe
2022-12-14 12:48:02 +08:00
00f44257e2 [feature-wip](file-reader) Merge hdfs reader to the new file reader (#14875) 2022-12-09 13:21:59 +08:00
8726bfa121 [enhancement](memory) Add tablet schema cache metrics (#14742) 2022-12-05 18:19:13 +08:00
eab8876abc [Feature](remote) Using heavy schema change if the table is not enable light weight schema change (#13487) 2022-10-28 15:48:22 +08:00
abbf75d302 [doc][refactor](metrics) Reorganize FE and BE metrics and add document (#11307) 2022-08-02 11:34:06 +08:00
4960043f5e [enhancement] Refactor to improve the usability of MemTracker (step2) (#10823) 2022-07-21 17:11:28 +08:00
3bc6655069 [refactor] remove BlockManager (#10913)
* remove BlockManager
* remove deprecated field in tablet meta
2022-07-17 14:10:06 +08:00
331fa50501 [feature](cold-data) move cold data to object storage without losing any feature(BE) (#10280)
This PR supports rowset level data upload on the BE side, so that there can be both cold data and hot data in a tablet,
and there is no necessary to prohibit loading new data to cooled tablets.

Each rowset is bound to a `FileSystem`, so that the storage layer can read and write rowsets without
perceiving the underlying filesystem.

The abstracted `RemoteFileSystem` can try local caching strategies with different granularity,
instead of caching segment files as before.

To avoid conflicts with the code in be/src/io, we temporarily put the file system related code in the be/src/io/fs directory.
In the future, `FileReader`s and `FileWriter`s should be unified.
2022-07-08 12:18:39 +08:00
26bc462e1c [feature-wip] (memory tracker) (step5) Fix track bthread, fix track vectorized query (#9145)
1. fix track bthread
- Bthread, a high performance M:N thread library used by brpc. In Doris, a brpc server response runs on one bthread, possibly on multiple pthreads. Currently, MemTracker consumption relies on pthread local variables (TLS).
- This caused pthread TLS MemTracker confusion when switching pthread TLS MemTracker in brpc server response. So replacing pthread TLS with bthread TLS in the brpc server response saves the MemTracker.
Ref: 731730da85/docs/en/server.md (bthread-local)

2. fix track vectorized query
- Added track mmap. Currently, mmap allocates memory in many places of the vectorized execution engine.
- Refactored ThreadContext to avoid dependency conflicts and make it easier to debug.
- Fix some bugs.
2022-04-27 20:34:02 +08:00
e157c2c254 [feature-wip](remote-storage) step3: Support remote storage, only for be, add migration_task_v2 (#8806)
1. Add TStorageMigrationReqV2 and EngineStorageMigrationTask to support migration action
2. Change TabletManager::create_tablet() for remote storage
3. Change TabletManager::try_delete_unused_tablet_path() for remote storage
2022-04-22 22:38:10 +08:00
aaaaae53b5 [feature] (memory) Switch TLS mem tracker to separate more detailed memory usage (#8605)
In pr #8476, all memory usage of a process is recorded in the process mem tracker,
and all memory usage of a query is recorded in the query mem tracker,
and it is still necessary to manually call `transfer to` to track the cached memory size.

We hope to separate out more detailed memory usage based on Hook TCMalloc new/delete + TLS mem tracker.

In this pr, the more detailed mem tracker is switched to TLS, which automatically and accurately
counts more detailed memory usage than before.
2022-03-24 14:29:34 +08:00
f772649535 [Optimize] Optimize lock when check error storage (#6321)
1. `StorageEngine::_delete_tablets_on_unused_root_path` will try to obtain tablet shard write lock in `TabletManager`
```
StorageEngine::_delete_tablets_on_unused_root_path
  TabletManager::drop_tablets_on_error_root_path
    obtain each tablet shard's write lock
```
2. `TabletManager::build_all_report_tablets_info` and other methods will obtain tablet shard read lock frequently.

So, `StorageEngine::_delete_tablets_on_unused_root_path` will hold `_store_lock` for a long time.
This will make it difficult for other threads to get write `_store_lock`, such as `StorageEngine::get_stores_for_create_tablet`

`drop_tablets_on_error_root_path` is a small probability event, `TabletManager::drop_tablets_on_error_root_path` should return when its param `tablet_info_vec` is empty
2021-08-07 21:30:49 +08:00
ad3a0fb79d [Metric] Add metrics of tablet version num distribution (#5665)
Add metrics (P50, P75, P90, P95, P99, etc.) to show the distribution of tablets version count.

```
# TYPE doris_be_tablet_version_num_distribution histogram
doris_be_tablet_version_num_distribution{quantile="0.50"} 9.21429
doris_be_tablet_version_num_distribution{quantile="0.75"} 11.7949
doris_be_tablet_version_num_distribution{quantile="0.90"} 13
doris_be_tablet_version_num_distribution{quantile="0.95"} 13
doris_be_tablet_version_num_distribution{quantile="0.99"} 13
doris_be_tablet_version_num_distribution_sum 950
doris_be_tablet_version_num_distribution_count 100
```
2021-04-23 21:23:22 +08:00
f3aded9370 [Bug] System metric init failed cause be start failed (#5262)
System metric init failed cause be start failed
2021-02-01 00:10:57 +08:00
6fedf5881b [CodeFormat] Clang-format cpp sources (#4965)
Clang-format all c++ source files.
2020-11-28 18:36:49 +08:00
4c63dc0027 [Metric] Add metrics for compaction permits and log for compaction merge (#4893)
1. Add metrics to `used permits` and `waitting permits` for compaction.
It would be useful to monitor `permits` hold by all executing compaction tasks and waitting compaction task.

2. Add log which can be chosen by config  for merge rowsets. 
It would be helpful to track the process of rowsets merging for compaction task which lasts for a long time.
2020-11-28 10:00:08 +08:00
6247408689 [Compact]Take tablet scan frequency into consider when selecting tablet for compaction (#4837)
A large number of small segment files will lead to low efficiency for scan operations.
Multiple small files can be merged into a large file by compaction operation.
So we could take the tablet scan frequency into consideration when selecting an tablet for compaction
and preferentially do compaction for those tablets which are scanned frequently during a
latest period of time at the present.

Using the compaction strategy of Kudu for reference, scan frequency can be calculated
for tablet during a latest period of time and be taken into consideration when calculating compaction score.
2020-11-18 21:51:12 +08:00
3438a746ac [Typo] Fix typo in metrics macros (#4739)
Just fix typo.
Rename DEFINE_GAUGE_METRIC_PROTOTYPE_5ARG(name, unit) to DEFINE_GAUGE_METRIC_PROTOTYPE_2ARG(name, unit)
Rename DEFINE_GAUGE_METRIC_PROTOTYPE_2ARG(name, unit) witch define core metrics to DEFINE_GAUGE_CORE_METRIC_PROTOTYPE_2ARG(name, unit)
2020-10-15 19:56:43 +08:00
5f43fb3bde [Cache][BE] LRU cache for sql/partition cache #2581 (#4005)
1. Find the cache node by SQL Key, then find the corresponding partition data by Partition Key, and then decide whether to hit Cache by LastVersion and LastVersionTime
2. Refers to the classic cache algorithm LRU, which is the least recently used algorithm, using a three-layer data structure to achieve
3. The Cache elimination algorithm is implemented by ensuring the range of the partition as much as possible, to avoid the situation of partition discontinuity, which will reduce the hit rate of the Cache partition,
4. Use the two thresholds of maximum memory and elastic memory to control to avoid frequent elimination of data
2020-09-20 20:50:51 +08:00
498b06fbe2 [Metrics] Support tablet level metrics (#4428)
Sometimes we want to detect the hotspot of a cluster, for example, hot scanned tablet, hot wrote tablet,
but we have no insight about tablets in the cluster.
This patch introduce tablet level metrics to help to achieve this object, now support 4 metrics on tablets: `query_scan_bytes `, `query_scan_rows `, `flush_bytes `, `flush_count `. 
However, one BE may holds hundreds of thousands of tablets, so I add a parameter for the metrics HTTP request,
and not return tablet level metrics by default.
2020-09-02 10:39:41 +08:00
e71152132c [metrics] Redesign metrics to 3 layers (#4115)
Redesign metrics to 3 layers:
    MetricRegistry - MetricEntity - Metrics
    MetricRegistry : the register center
    MetricEntity : the entity registered on MetricRegistry. Generally a MetricRegistry can be registered on several 
        MetricEntities, each of MetricEntity is an independent entity, such as server, disk_devices, data_directories, thrift 
        clients and servers, and so on. 
    Metric : metrics of an entity. Such as fragment_requests_total on server entity, disk_bytes_read on a disk_device entity, 
        thrift_opened_clients on a thrift_client entity.
    MetricPrototype: the type of a metric. MetricPrototype is a global variable, can be shared by the same metrics across 
        different MetricEntities.
2020-08-08 11:23:01 +08:00
8500d8b695 [metrics] Use atomic instead of SpinLock for integer metric (#4036) 2020-07-17 11:01:33 +08:00
6c4d7c60dd [Feature] Add QueryDetail to store query statistics. (#3744)
1. Store the query statistics in memory.
2. Supporting RESTFUL interface to get the statistics.
2020-06-15 18:16:54 +08:00
f89d970cfd [Bug][Metrics] Fix bug that some of metrics can not be got (#3708)
The metrics in a metric collector need have same type, but no need
to have same unit.
2020-05-28 09:09:14 +08:00
1cc78fe69b [Enhancement] Convert metric to Json format (#3635)
Add a JSON format for existing metrics like this.
```
{
    "tags":
    {
        "metric":"thread_pool",
        "name":"thrift-server-pool",
        "type":"active_thread_num"
    },
    "unit":"number",
    "value":3
}
```
I add a new JsonMetricVisitor to handle the transformation.
It's not to modify existing PrometheusMetricVisitor and SimpleCoreMetricVisitor.
Also I add
1.  A unit item to indicate the metric better 
2. Cloning tablet statistics divided by database.
3. Use white space to replace newline in audit.log
2020-05-27 08:49:30 +08:00
b58b1b3953 [metrics] Make DorisMetrics to be a real singleton (#3417) 2020-05-04 09:20:53 +08:00
72f3082358 [Metrics] Add some metrics for container size in BE (#3246)
We can observe the workload of BE, and also it's a way to check
whether there is any problem in BE, like some container increase
too large and lead to OOM.

This patch add the following metrics:
```
Name                                   Description
rowset_count_generated_and_in_use      The total count of rowset id generated and in use since BE last start
unused_rowsets_count                   The total count of unused rowset waiting to be GC
broker_count                           The total count of brokers in management
data_stream_receiver_count             The total count of data stream receivers in management
fragment_endpoint_count                The total count of fragment endpoints of data stream in management, should always equal to data_stream_receiver_count
active_scan_context_count              The total count of active scan contexts
plan_fragment_count                    The total count of plan fragments in executing
load_channel_count                     The total count of load channels in management
result_buffer_block_count              The total count of result buffer blocks for queries, each block has a limited queue size (default 1024)
result_block_queue_count               The total count of queues for fragments, each queue has a limited size (default 20, by config::max_memory_sink_batch_count)
routine_load_task_count                The total count of routine load tasks in executing
small_file_cache_count                 The total count of cached small files' digest info
stream_load_pipe_count                 The total count of stream load pipes, each pipe has a limited buffer size (default 1M)
tablet_writer_count                    The total count of tablet writers
brpc_endpoint_stub_count               The total count of brpc endpoints
```
2020-04-25 16:13:39 +08:00
4a7a88ede1 [LSAN] Fix some memory leak detected by LSAN (#3326) 2020-04-22 22:59:44 +08:00
58b8e3f574 [Fs Block] Add block layer to storage-engine (#2983)
The abstraction of the Block layer, inspired by Kudu, lies between the "business
layer" and the "underlying file storage layer" (`Env`), making them no longer
strongly coupled.

In this way, for the business layer (such as `SegmentWriter`),
there is no need to directly do the file operation, which will bring better
encapsulation. An ideal situation in the future is: when we need to support a
new file storage system, we only need to add a corresponding type of
BlockManager without modifying the business code (such as `SegmentWriter`).

With the Block layer, there are some benefits:

1. First and foremost, the mapping relationship between data and `Env` is more
   flexible. For example, in the storage engine, the data of the tablet can be
   placed in multiple file systems (`Env`) at the same time. That is, one-to-many
   relationships can be supported. For example: one on the local and one on the
   remote storage.
2. The mapping relationship between blocks and files can be adjusted, for example,
   it may not be a one-to-one relationship. For example, the data of multiple
   blocks can be stored in a physical file, which can reduce the number of files
   that need to be opened during querying. It is like `LogBlockManager` in Kudu.
3. We can move the opened-file-cache under the Block layer, which can automatically
   close and open the files used by the upper layer, so that the upper business
   level does not need to be aware of the restrictions of the file handle at all
   (This problem is often encountered online now).
4. Better automatic cleanup logic when there are exceptions. For example, a block
   that is not closed explicitly can automatically clean up its corresponding file,
   thereby avoiding generating most garbage files.
5. More convenient for batch file creation and deletion. Some business operations
   create multiple files, such as compaction. At present, the processing flow that
   these files go through is executed one by one: 1) creation; 2) writing data;
   3) fsync to disk. But in fact, this is not necessary, we only need to fsync this
   batch of files at the end. The advantage is that it can give the operating system
   more opportunities to perform IO merge, thereby improving performance. However,
   this operation is relatively tedious, there is no need to be coupled in the
   business code, it is an ideal place to put it in the Block layer.

This is the first patch, just add related classes, laying the groundwork for later
switching of read and write logic.
2020-03-01 10:48:00 +08:00
c39d35df4c Add tablet compaction score metrics (#2427)
[Metric] Add tablet compaction score metrics

Backend:
    Add metric "tablet_max_compaction_score" to monitor the current max compaction
    score of tablets on this Backend. This metric will be updated each time
    the compaction thread picking tablets to compact.

Frontend:
    Add metric "tablet_max_compaction_score" for each Backend. These metrics will
    be updated when backends report tablet.
    And also add a calculated metric "max_tablet_compaction_core" to monitor the
    max compaction core of tablets on all Backends.
2019-12-12 17:46:59 +08:00
f130bd3e7b Use Env function to operate directory (#1980)
Now Env has unify all environment operation, such as file operation.
However some of our old functions don't leverage it. This change unify
FileUtils::scan_dir to use Env's function.
2019-10-15 09:25:12 +08:00
6d040a33af Add zone map page(#1390) (#1633) 2019-08-24 00:57:30 +08:00
a88b55e649 Add more logs and metrics to trace the broker load process (#1530)
The Operator wants to known when the job being scheduled as PENDING
and LOADING. And how long it takes to finish these sub states.

Also add 2 metrics on BE to monitor the memtable's flush time.
`memtable_flush_total` and `memtable_flush_duration_us`
2019-07-23 21:42:44 +08:00
0d48a3961c Refactor Storage Engine (#1478)
NOTE: This patch would modify all Backend's data.
And this will cause a very long time to restart be.
So if you want to interferer your product environment,
you should upgrade backend one by one.

1. Refactoring be is to clarify the structure the codes.
2. Use unique id to indicate a rowset.
   Nameing rowset with tablet_id and version will lead to
   many conflicts among compaction, clone, restore.
3. Extract an rowset interface to encapsulate rowsets
   with different format.
2019-07-15 21:18:22 +08:00
a08170fd50 Enhance the usabilities (#1100)
* Enhence the usabilities

1. Add metrics to monitor transactions and steaming load process in BE.
2. Modify BE config 'result_buffer_cancelled_interval_time' to 300s.
3. Modify FE config 'enable_metric_calculator' to true.
4. Add more log for tracing broker load process.
5. Modify the query report process, to cancel query immediately if some instance failed.

* Fix bugs
1. Avoid NullPointer when enabling colocation join with broker load
2. Return immediately when pull load task coordinator execution failed
2019-05-07 15:55:04 +08:00
afa3aa9069 Add some pre-calculated metrics (#1079)
1. max io util of disks
2. max network send/receive bytes rate of all network devices
3. base/cumulative compaction request counter and failure counter
2019-04-30 11:12:23 +08:00
d3251a19f7 Modify the method to obtain some metrics (#904) 2019-04-10 19:37:48 +08:00
a9e9aef3ca Fix sync and trash bug (#570)
1. no need to save header when header has no incremental delta
2. make fsync tablet_meta configurable
3. add metric for meta operation
2019-01-22 20:13:09 +08:00
37b4cafe87 Change variable and namespace name in BE (#268)
Change 'palo' to 'doris'
2018-11-02 10:22:32 +08:00