Commit Graph

47 Commits

Author SHA1 Message Date
4960043f5e [enhancement] Refactor to improve the usability of MemTracker (step2) (#10823) 2022-07-21 17:11:28 +08:00
331fa50501 [feature](cold-data) move cold data to object storage without losing any feature(BE) (#10280)
This PR supports rowset level data upload on the BE side, so that there can be both cold data and hot data in a tablet,
and there is no necessary to prohibit loading new data to cooled tablets.

Each rowset is bound to a `FileSystem`, so that the storage layer can read and write rowsets without
perceiving the underlying filesystem.

The abstracted `RemoteFileSystem` can try local caching strategies with different granularity,
instead of caching segment files as before.

To avoid conflicts with the code in be/src/io, we temporarily put the file system related code in the be/src/io/fs directory.
In the future, `FileReader`s and `FileWriter`s should be unified.
2022-07-08 12:18:39 +08:00
c9961c9bb9 [style] clang-format all c++ code (#9305)
- sh build-support/clang-format.sh  to  clang-format all c++ code
2022-04-29 16:14:22 +08:00
e5e0dc421d [refactor] Change ALL OLAPStatus to Status (#8855)
Currently, there are 2 status code in BE, one is common/Status.h,
and the other is olap/olap_define.h called OLAPStatus.
OLAPStatus is just an enum type, it is very simple and could not save many informations,
I will unify these code to common/Status.
2022-04-14 11:43:49 +08:00
519305cb22 [feature-wip] (memory tracker) (step4) Switch TLS mem tracker to separate more detailed memory usage (#8669)
Based on #8605, Separate out the memory usage of each operator from the Query/Load/StorageEngine mem tracker.
2022-04-08 09:02:26 +08:00
c69dd54116 [refactor](mutex) Use std::mutex to replace Mutex and refactor some lock logic (#8452) 2022-03-24 14:50:02 +08:00
eeae516e37 [Feature](Memory) Hook TCMalloc new/delete automatically counts to MemTracker (#8476)
Early Design Documentation: https://shimo.im/docs/DT6JXDRkdTvdyV3G

Implement a new way of memory statistics based on TCMalloc New/Delete Hook,
MemTracker and TLS, and it is expected that all memory new/delete/malloc/free
of the BE process can be counted.
2022-03-20 23:06:54 +08:00
e17aef9467 [refactor] refactor the implement of MemTracker, and related usage (#8322)
Modify the implementation of MemTracker:
1. Simplify a lot of useless logic;
2. Added MemTrackerTaskPool, as the ancestor of all query and import trackers, This is used to track the local memory usage of all tasks executing;
3. Add cosume/release cache, trigger a cosume/release when the memory accumulation exceeds the parameter mem_tracker_consume_min_size_bytes;
4. Add a new memory leak detection mode (Experimental feature), throw an exception when the remaining statistical value is greater than the specified range when the MemTracker is destructed, and print the accurate statistical value in HTTP, the parameter memory_leak_detection
5. Added Virtual MemTracker, cosume/release will not sync to parent. It will be used when introducing TCMalloc Hook to record memory later, to record the specified memory independently;
6. Modify the GC logic, register the buffer cached in DiskIoMgr as a GC function, and add other GC functions later;
7. Change the global root node from Root MemTracker to Process MemTracker, and remove Process MemTracker in exec_env;
8. Modify the macro that detects whether the memory has reached the upper limit, modify the parameters and default behavior of creating MemTracker, modify the error message format in mem_limit_exceeded, extend and apply transfer_to, remove Metric in MemTracker, etc.;

Modify where MemTracker is used:
1. MemPool adds a constructor to create a temporary tracker to avoid a lot of redundant code;
2. Added trackers for global objects such as ChunkAllocator and StorageEngine;
3. Added more fine-grained trackers such as ExprContext;
4. RuntimeState removes FragmentMemTracker, that is, PlanFragmentExecutor mem_tracker, which was previously used for independent statistical scan process memory, and replaces it with _scanner_mem_tracker in OlapScanNode;
5. MemTracker is no longer recorded in ReservationTracker, and ReservationTracker will be removed later;
2022-03-11 22:04:23 +08:00
26289c28b0 [fix](load)(compaction) Fix NodeChannel coredump bug and modify some compaction logic (#8072)
1. Fix the problem of BE crash caused by destruct sequence. (close #8058)
2. Add a new BE config `compaction_task_num_per_fast_disk`

    This config specify the max concurrent compaction task num on fast disk(typically .SSD).
    So that for high speed disk, we can execute more compaction task at same time,
    to compact the data as soon as possible

3. Avoid frequent selection of unqualified tablet to perform compaction.
4. Modify some log level to reduce the log size of BE.
5. Modify some clone logic to handle error correctly.
2022-02-17 10:52:08 +08:00
6c6380969b [refactor] replace boost smart ptr with stl (#6856)
1. replace all boost::shared_ptr to std::shared_ptr
2. replace all boost::scopted_ptr to std::unique_ptr
3. replace all boost::scoped_array to std::unique<T[]>
4. replace all boost:thread to std::thread
2021-11-17 10:18:35 +08:00
99404df8b2 [Bug][Compaction] Fix bug that output rowset is not deleted after compaction failure (#4964)
This CL fix 2 bugs:

1. 
When the compaction fails, we must explicitly delete the output rowset,
otherwise the GC logic cannot process these rows.

2. 
Base compaction failed if compaction process include some delete version in SegmentV2,
Because the number of filtered rows is wrong.
2020-11-30 22:02:03 +08:00
ec7e1c6b1b [Refactor] Execute 'pick rowsets' before applying for permits for a compaction task (#4891)
The current compaction mechanism is that there is a producer thread that has been producing compaction tasks,
and the selected tablet must apply for `permits`.
When a tablet could hold `permits`, compaction task for this tablet will be submitted to  thread pool.
We take compaction score as `permits` which is used for limiting memory consumption.
However,  `pick_rowset_to_compaction()` will be executed before the file merge in compaction thread,
and the number of segment files that actually perform the merge operation is smaller than compaction score.
In addition, it is also possible that compaction task exits directly because the tablet doesn't meet
the requirements of compaction. 

This patch optimizes and refactors the code of compaction, so that we can execute 'pick rowsets'
before applying for permits for a compaction task, calculate the number of segment files that actually
participate in the merge operation, and take this number as `permits`.
2020-11-30 11:41:14 +08:00
6fedf5881b [CodeFormat] Clang-format cpp sources (#4965)
Clang-format all c++ source files.
2020-11-28 18:36:49 +08:00
10e1e29711 Remove header file common/names.h (#4945) 2020-11-26 17:00:48 +08:00
f239f44b37 [Compaction][Bug-Fix] Fix bug that meta lock need to be held when calculating compaction score (#4829)
* [Compaction][Buf] Fix bug that meta lock need to be held when calucating compaction score

* fix

Co-authored-by: morningman <chenmingyu@baidu.com>
2020-11-05 20:29:01 +08:00
09f97f8a05 [Refactor] Fixes some be typo part 2 (#4747) 2020-10-20 09:28:57 +08:00
75e0ba32a1 Fixes some be typo (#4714) 2020-10-13 09:37:15 +08:00
eba595583e [Optimize] Optimize the execution model of compaction to limit memory consumption (#4670)
Currently, there are M threads to do base compaction and N threads to do cumulative compaction for each disk. 
Too many compaction tasks may run out of memory, so the max concurrency of running compaction tasks
is limited by semaphore.
If the running threads cost too much memory, we can't defense it. In addition, reducing concurrency to avoid OOM
will lead to some compaction tasks can't be executed in time and we may encounter more heavy compaction. 
Therefore, concurrency limitation is not enough.

The strategy proposed in #3624  may be effective to solve the OOM. 

A CompactionPermitLimiter is used for compaction limitation, and use single-producer/multi-consumer model.
Producer will try to generate compaction tasks and acquire `permits` for each task. 
The compaction task which can hold `permits` will be executed in thread pool and each finished task will
release its `permits`.

`permits` should be applied for before a compaction task can execute. When the sum of `permits`
held by executing compaction tasks reaches a threshold, subsequent compaction task will be no longer allowed,
until some `permits` are released. Tablet compaction score is used as `permits` of compaction task here.

To some extent, memory consumption can be limited by setting appropriate `permits` threshold.
2020-10-11 11:39:25 +08:00
498b06fbe2 [Metrics] Support tablet level metrics (#4428)
Sometimes we want to detect the hotspot of a cluster, for example, hot scanned tablet, hot wrote tablet,
but we have no insight about tablets in the cluster.
This patch introduce tablet level metrics to help to achieve this object, now support 4 metrics on tablets: `query_scan_bytes `, `query_scan_rows `, `flush_bytes `, `flush_count `. 
However, one BE may holds hundreds of thousands of tablets, so I add a parameter for the metrics HTTP request,
and not return tablet level metrics by default.
2020-09-02 10:39:41 +08:00
10f822eb43 [MemTracker] make all MemTrackers shared (#4135)
We make all MemTrackers shared, in order to show MemTracker real-time consumptions on the web.
As follows:
1. nearly all MemTracker raw ptr -> shared_ptr
2. Use CreateTracker() to create new MemTracker(in order to add itself to its parent)
3. RowBatch & MemPool still use raw ptrs of MemTracker, it's easy to ensure RowBatch & MemPool destructor exec 
     before MemTracker's destructor. So we don't change these code.
4. MemTracker can use RuntimeProfile's counter to calc consumption. So RuntimeProfile's counter need to be shared 
    too. We add a shared counter pool to store the shared counter, don't change other counters of RuntimeProfile.
Note that, this PR doesn't change the MemTracker tree structure. So there still have some orphan trackers, e.g. RowBlockV2's MemTracker. If you find some shared MemTrackers are little memory consumption & too time-consuming, you could make them be the orphan, then it's fine to use the raw ptr.
2020-07-31 21:57:21 +08:00
a01d1aec56 [Compaction] track RowsetReader's mem & add metric (#4068)
Ref https://github.com/apache/incubator-doris/issues/3624#issuecomment-655933244
Only RowsetReaders in compaction are under the track.
Other RowsetReaders won't be effected, because the parent_tracker is nullptr.
2020-07-24 07:58:09 +08:00
cc7f04de2c [Log] Add compaction point log record (#4128)
Add log to record the compaction point changing.
When OLAP_ERR_BE_SEGMENTS_OVERLAPPING happens,
it will be used to  track the bugs. Add issue #4134 link.
2020-07-22 22:35:49 +08:00
03cf9b2a24 [Compaction] Add delayed deletion of rowsets function, fix -230 error. (#4039)
Related issue #4017, main changes as follows:
1. Add expired_snapshot_rs_version_map,_expired_snapshot_rs_metas,
2. Add  VersionedRowsetTracker record compacted path version
3. Record path version when rowsets compact
4. In gc process, add expired snapshot rowsets to unused set to remove.
2020-07-19 22:03:59 +08:00
e0461cc7f4 [bug] Make compaction metrics value is right (#3903)
Now _input_rowsets will be cleared when calling gc_used_rowsets().
After that, the metrics is not right upon be calculated.
2020-06-19 11:22:06 +08:00
3c09e1e1d8 [trace] Adapt trace util to compaction module (#3814)
Trace util is helpful for diagnosing compaction performance problems,
we can get trace log for base compaction like:
```
W0610 11:26:33.804431 56452 storage_engine.cpp:552] Trace:
0610 11:23:03.727535 (+     0us) storage_engine.cpp:554] start to perform base compaction
0610 11:23:03.728961 (+  1426us) storage_engine.cpp:560] found best tablet 546859
0610 11:23:03.728963 (+     2us) base_compaction.cpp:40] got base compaction lock
0610 11:23:03.729029 (+    66us) base_compaction.cpp:44] rowsets picked
0610 11:24:51.784439 (+108055410us) compaction.cpp:46] got concurrency lock and start to do compaction
0610 11:24:51.784818 (+   379us) compaction.cpp:74] prepare finished
0610 11:26:33.359265 (+101574447us) compaction.cpp:87] merge rowsets finished
0610 11:26:33.484481 (+125216us) compaction.cpp:102] output rowset built
0610 11:26:33.484482 (+     1us) compaction.cpp:106] check correctness finished
0610 11:26:33.513197 (+ 28715us) compaction.cpp:110] modify rowsets finished
0610 11:26:33.513300 (+   103us) base_compaction.cpp:49] compaction finished
0610 11:26:33.513441 (+   141us) base_compaction.cpp:56] unused rowsets have been moved to GC queue
Metrics: {"filtered_rows":0,"input_row_num":3346807,"input_rowsets_count":42,"input_rowsets_data_size":1256413170,"input_segments_num":44,"merge_rowsets_latency_us":101574444,"merged_rows":0,"output_row_num":3346807,"output_rowset_data_size":1228439659,"output_segments_num":6}
```
for cumulative compaction like:
```
W0610 11:14:18.714366 56468 storage_engine.cpp:518] Trace:
0610 11:14:08.068484 (+     0us) storage_engine.cpp:520] start to perform cumulative compaction
0610 11:14:08.069844 (+  1360us) storage_engine.cpp:526] found best tablet 547083
0610 11:14:08.069846 (+     2us) cumulative_compaction.cpp:42] got cumulative compaction lock
0610 11:14:08.069947 (+   101us) cumulative_compaction.cpp:46] calculated cumulative point
0610 11:14:08.070141 (+   194us) cumulative_compaction.cpp:50] rowsets picked
0610 11:14:08.070143 (+     2us) compaction.cpp:46] got concurrency lock and start to do compaction
0610 11:14:08.070518 (+   375us) compaction.cpp:74] prepare finished
0610 11:14:15.389893 (+7319375us) compaction.cpp:87] merge rowsets finished
0610 11:14:15.390916 (+  1023us) compaction.cpp:102] output rowset built
0610 11:14:15.390917 (+     1us) compaction.cpp:106] check correctness finished
0610 11:14:15.409460 (+ 18543us) compaction.cpp:110] modify rowsets finished
0610 11:14:15.409496 (+    36us) cumulative_compaction.cpp:55] compaction finished
0610 11:14:15.410138 (+   642us) cumulative_compaction.cpp:65] unused rowsets have been moved to GC queue
Metrics: {"filtered_rows":0,"input_row_num":136707,"input_rowsets_count":302,"input_rowsets_data_size":76617836,"input_segments_num":302,"merge_rowsets_latency_us":7319372,"merged_rows":0,"output_row_num":136707,"output_rowset_data_size":53893280,"output_segments_num":1}
```
2020-06-13 19:31:51 +08:00
b58b1b3953 [metrics] Make DorisMetrics to be a real singleton (#3417) 2020-05-04 09:20:53 +08:00
f77cfcdb61 [Compaction] Avoid unnecessary compaction (#2839)
It is not necessary to perform compaction in the following cases

1. A tablet has only 2 rowsets, the versions are [0-1] and [2-x]. In this case, 
there is no need to perform base compaction because the [0-1] version is an empty version.

    Some tables will be partitioned by day, and then each partition will only load one batch of data
 each day, so a large number of tablets with rowsets [0-1][2-2] will appear. And these tablets
 do not need to be base compaction.

2. The initial value of the `last successful execution time of compaction` is 0, 
which causes the first time to determine the time interval from the
 last successful execution time of compaction, which always meets the 
conditions to trigger cumulative compaction.
2020-02-06 16:40:38 +08:00
1421a9be41 [Compaction] Support compact only one rowset (#2558)
Support compaction operation to compact only one rowset.
After the modification, the last rowset of the tablet will
also be compacted.

At the same time, we added a `segments_overlap_pb` field to
the rowset meta. Used to describe whether the segment data
in the rowset overlaps. This field is set by `rowset_writer`.
Initially UNKNOWN for compatibility with existing data.

In addition, the version hash of the rowset generated after
compaction is directly set to the version hash of last rowset
participating in compaction, to ensure that the tablet's
version hash remains unchanged after compaction.
2019-12-27 10:08:41 +08:00
71731b25f4 Ignore some compaction errors to reduce logs (#1955) 2019-10-11 19:58:38 +08:00
38c82c039f Prepare _input_row_num and _input_rowsets_size before compaction (#1643) 2019-08-14 22:54:27 +08:00
dcb75729db Change cumulative compaction for decoupling storage from compution (#1576)
1. Calculate cumulative point when loading tablet first time.
2. Simplify pick rowsets logic upon delete predicate.
3. Saving meta and modify rowsets only once after cumulative compaction.
2019-08-13 18:25:56 +08:00
0d48a3961c Refactor Storage Engine (#1478)
NOTE: This patch would modify all Backend's data.
And this will cause a very long time to restart be.
So if you want to interferer your product environment,
you should upgrade backend one by one.

1. Refactoring be is to clarify the structure the codes.
2. Use unique id to indicate a rowset.
   Nameing rowset with tablet_id and version will lead to
   many conflicts among compaction, clone, restore.
3. Extract an rowset interface to encapsulate rowsets
   with different format.
2019-07-15 21:18:22 +08:00
758adce761 Change the compaction thread number based on disks number (#1161)
Add 2 BE configs: base_compaction_num_threads_per_disk and cumulative_compaction_num_threads_per_disk to control the number of threads per disks.
2019-05-15 14:17:11 +08:00
f88970a454 Fix get_tablet_stat data race and base_compaction deletion check bug. (#477)
1. It is wrong to use _tablet_map_lock to protect critical region in get_tablet_stat function.
   Add a _tablet_stat_mutex to protect critical region.
2. When base_compaction finished, it checks where there is version missed in tablet.
   If answer is yes, BE will be cored dump. Now check tablet's integrity in advance.
2018-12-27 19:50:08 +08:00
ff95f23615 Remove OLAP_LOG_DEBUG AND OLAP_LOG_TRACE log format (#378)
Use VLOG(3) and VLOG(10) instead
2018-12-03 10:08:21 +08:00
5dea8bd3e6 Remove OLAP_LOG_FATAL log format. Use LOG(FATAL) instead (#376) 2018-12-01 19:26:08 +08:00
3d324e38ea Remove OLAP_LOG_INFO log format. Use LOG(INFO) instead (#372) 2018-11-30 20:59:40 +08:00
49302955c8 Revert "Remove OLAP_LOG_INFO log format. Use LOG(INFO) instead (#370)" (#371)
This reverts commit a816925776de06dc7503ea7429802cad9042d0e4.
2018-11-30 20:56:51 +08:00
a816925776 Remove OLAP_LOG_INFO log format. Use LOG(INFO) instead (#370)
* Remove unused row-oriented format flags

* Remove unused row-oriented format flags

* Remove OLAP_LOG_INFO log format. Use LOG(INFO) instead
2018-11-30 20:36:58 +08:00
85d0996b35 Rename Rowset to SegmentGroup (#364)
* Rename Rowset to SegmentGroup

* Modify protobuf related rowset to SegmentGroup
2018-11-29 17:30:41 +08:00
477b6b3d2a Remove unused row-oriented format flags (#357) 2018-11-27 16:34:22 +08:00
1ba8a4ee4e Transform row-oriented table to columnar-oriented table (#311) 2018-11-16 16:03:56 +08:00
37b4cafe87 Change variable and namespace name in BE (#268)
Change 'palo' to 'doris'
2018-11-02 10:22:32 +08:00
2868793b6b Change license to Apache License 2.0 (#262) 2018-11-01 09:06:01 +08:00
051aced48d Missing many files in last commit
In last commit, a lot of files has been missed
2018-10-31 16:19:21 +08:00
bea10e4f06 1. hide password and other sensitive information in log and audit log
2. add 2 new proc '/current_queries' and '/current_backend_instances' to monitor the current running queries.
3. add a manual compaction api on Backend to trigger cumulative or base compaction manually.
4. add Frontend config 'max_bytes_per_broker_scanner' to limit to bytes per one broker scanner. This is to limit the memory cost of a single broker load job
5. add Frontend config 'max_unfinished_load_job' to limit load job number: if number of running load jobs exceed the limit, no more load job is allowed to be submmitted.
6. a log of bug fixed
2018-09-19 20:04:01 +08:00
2419384e8a push 3.3.19 to github (#193)
* push 3.3.19 to github

* merge to 20ed420122a8283200aa37b0a6179b6a571d2837
2018-05-15 20:38:22 +08:00