Commit Graph

64 Commits

Author SHA1 Message Date
73d8f5901d fix mem tracker limiter (#11376) 2022-08-01 09:44:04 +08:00
b6bdb3bdbc [fix] (mem tracker) Fix MemTracker accuracy (#11190) 2022-07-27 18:59:24 +08:00
01e108cb7b [feature-wip](unique-key-merge-on-write) update delete bitmap while publish version (#11195)
1.make version publish work in version order
2.update delete bitmap while publish version, load current version rowset
primary key and search in pre rowsets
3.speed up publish version task by parallel tablet publish task

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-07-27 16:26:42 +08:00
4960043f5e [enhancement] Refactor to improve the usability of MemTracker (step2) (#10823) 2022-07-21 17:11:28 +08:00
a266d7b040 [bug](be) fix be _quick_compaction_thread_pool without shutdown. (#10758) 2022-07-11 22:33:56 +08:00
331fa50501 [feature](cold-data) move cold data to object storage without losing any feature(BE) (#10280)
This PR supports rowset level data upload on the BE side, so that there can be both cold data and hot data in a tablet,
and there is no need to prohibit loading new data into cooled tablets.

Each rowset is bound to a `FileSystem`, so that the storage layer can read and write rowsets without
perceiving the underlying filesystem.

The abstracted `RemoteFileSystem` can try local caching strategies with different granularity,
instead of caching segment files as before.

To avoid conflicts with the code in be/src/io, we temporarily put the file system related code in the be/src/io/fs directory.
In the future, `FileReader`s and `FileWriter`s should be unified.
2022-07-08 12:18:39 +08:00
89e56ea67f [refactor] remove alpha rowset related code and vectorized row batch related code (#10584) 2022-07-05 20:33:34 +08:00
c9f86bc7e2 [refactor] Refactoring Status static methods to format message using fmt(#9533) 2022-07-02 18:58:23 +08:00
f35b235c3b [opt](compaction) optimize compaction in concurrent load (#10153)
Add some logic to optimize compaction:
1. Separate base and cumulative compaction, in case base compaction runs too long and
affects cumulative compaction.
2. Fix the level size in cumulative compaction so that files below 64M get the right level
size. When choosing rowsets to compact, the policy ignores big rowsets,
which reduces CPU usage by about 25% under high-frequency concurrent load.
3. Remove the skip window restriction so that a rowset can be compacted right after it is
generated, because rowsets are not deleted after compaction. This greatly
reduces the compaction score under concurrent load.
4. Remove the version consistency check in can_do_compaction. Consecutive rowsets are
chosen for compaction, so this check is useless.

With the changes above, compaction score and CPU cost improve substantially
under concurrent load.

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-06-17 17:49:45 +08:00
4dfebb9852 [Feature] compaction quickly for small data import (#9804)
* compaction quickly for small data import #9791
1. Merge small rowset versions as soon as possible to increase the import frequency of small version data.
2. A small version is one whose row count is less than config::small_compaction_rowset_rows (default 1000).
2022-06-15 21:48:34 +08:00
3743f19369 [feature] support convert alpha rowset (#9890)
Add alpha-to-beta rowset conversion so that rowsets are converted automatically. We will remove the alpha rowset code after 1.1.
2022-06-04 12:29:03 +08:00
ca05d1ee01 [fix](memory tracker) Fix lru cache, compaction tracker, add USE_MEM_TRACKER compile (#9661)
1. Fix the negative consumption value of the LRU cache MemTracker.
2. Fix the compaction cache MemTracker not tracking anything.
3. Add USE_MEM_TRACKER compile option.
4. Make sure the malloc/free hook is not stopped at any time.
2022-05-25 08:56:17 +08:00
51db78d375 [refactor] modify all OLAP_LOG_WARNING to LOG(WARNING) (#9473)
Co-authored-by: BePPPower <fangtiewei@selectdb.com>
2022-05-10 09:25:25 +08:00
c9961c9bb9 [style] clang-format all c++ code (#9305)
- run `sh build-support/clang-format.sh` to clang-format all C++ code
2022-04-29 16:14:22 +08:00
e5e0dc421d [refactor] Change ALL OLAPStatus to Status (#8855)
Currently, there are 2 status codes in BE: one is in common/Status.h,
and the other, called OLAPStatus, is in olap/olap_define.h.
OLAPStatus is just an enum type. It is very simple and cannot carry much information,
so I will unify this code into common/Status.
2022-04-14 11:43:49 +08:00
290366787c [refactor] refactor code, replace some file with stl libs (#8759)
1. replace ConditionVariables with std::condition_variable
2. replace Mutex with std::mutex
3. replace MonoTime with std::chrono
2022-04-13 09:55:29 +08:00
0c98c1ee03 [Improvement][fix](compaction) Change min_compaction_failure_interval_sec to 5 and fix a bug of log (#8781)
see issue #8767
2022-04-02 13:00:56 +08:00
26289c28b0 [fix](load)(compaction) Fix NodeChannel coredump bug and modify some compaction logic (#8072)
1. Fix the problem of BE crash caused by destruct sequence. (close #8058)
2. Add a new BE config `compaction_task_num_per_fast_disk`

    This config specifies the max number of concurrent compaction tasks on a fast disk (typically SSD),
    so that on high-speed disks we can execute more compaction tasks at the same time
    and compact the data as soon as possible.

3. Avoid frequent selection of unqualified tablet to perform compaction.
4. Modify some log level to reduce the log size of BE.
5. Modify some clone logic to handle error correctly.
2022-02-17 10:52:08 +08:00
Pxl
3ee000c13c [chore] support build with libc++ && add some build config (#7903)
support LIBCPP/LDD/BUILD_META_TOOL for build.sh
2022-01-30 16:47:22 +08:00
ed39ff1500 [feature](compaction) Support triggering compaction for a specific partition manually (#7521)
Add statement to trigger cumulative or base compaction for a specified partition.
2022-01-21 09:27:06 +08:00
20ef8a6e21 [feature-wip](remote storage)(step1) use a struct instead of string for parameter path, add basic remote method (#7098)
First, we need a parameter that describes whether the data is local or remote.
Then, we need some basic functions to support operations on remote storage.
2021-12-22 22:58:23 +08:00
6f91741628 [Bug]Fix BE coredump when manual compaction task is triggered (#7260)
* fix compaction action bug

Co-authored-by: weizuo <weizuo@xiaomi.com>
2021-12-08 17:10:34 +08:00
ed7a873a44 [Memory Usage] Implement segment lru cache to save memory of BE (#6829) 2021-10-25 10:07:15 +08:00
57199955d6 [Compaction][ThreadPool]Support adjust compaction threads num at runtime (#5781)
* adjust thread number of compaction thread pool dynamically

Co-authored-by: weizuo <weizuo@xiaomi.com>
2021-09-02 10:01:44 +08:00
Pxl
8a267f1ac5 [Feature] Support for cleaning the trash actively (#6323) 2021-08-12 10:07:51 +08:00
Pxl
3812cca4db [Bug]fix the calculation of the "_start_trash_sweep" run interval. (#6177)
* fix the calculation of the _start_trash_sweep run interval
2021-07-09 09:45:44 +08:00
8850cfe2ad [Compaction] Modify compaction logic (#5737)
1. Add /api/compaction/run_status to show the running compaction tasks.
2. Support running base and cumulative compaction for one tablet at the same time.
3. Modify some log level.
4. Add a feedback document.
2021-05-07 11:18:47 +08:00
e519a24c9a dynamic adjust compaction policy (#5651)
Co-authored-by: weizuo <weizuo@xiaomi.com>
2021-04-26 12:39:13 +08:00
12b2447724 [Optimize] Optimize the assign logic of compaction tasks to avoid starvation (#5683)
1. Reserve a slot to ensure that the cumulative compaction can be executed.
2. Ensure that the compaction score metric can be updated.
2021-04-23 09:48:37 +08:00
4fa25b6eb9 [Optimize] make tablet meta checkpoint to be threadpool model (#5654)
Currently, tablet meta checkpoint is a memory-intensive operation.
If a host has 12 disks, it will start 12 threads to do tablet meta checkpoint.
In our experience, the data size of one tablet can be as high as 2G.
If 12 threads do the checkpoint at the same time, it may cause OOM.

Therefore, this PR tries to solve this problem.
First, it starts only one thread to produce tablet meta checkpoint tasks.
Second, it creates a thread pool to handle these tasks.
You can configure the size of the thread pool to control the parallelism and avoid OOM.
It is a producer-consumer model.
2021-04-23 09:45:15 +08:00
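
The producer-consumer arrangement described in #5654 can be illustrated with a minimal Python sketch. This is not the BE code: `checkpoint_all`, `do_checkpoint`, and `pool_size` are hypothetical names, and the only point shown is that a single producer feeds a pool whose size, rather than the disk count, bounds how many checkpoints run at once.

```python
from concurrent.futures import ThreadPoolExecutor


def checkpoint_all(tablets_by_disk, do_checkpoint, pool_size):
    """Run do_checkpoint on every tablet using at most pool_size worker threads.

    tablets_by_disk maps a disk name to a list of tablets (hypothetical layout).
    """
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # Single producer: this loop is the only place that creates tasks.
        futures = [
            pool.submit(do_checkpoint, tablet)
            for tablets in tablets_by_disk.values()
            for tablet in tablets
        ]
        # Consumers are the pool threads; results come back in submission order.
        return [future.result() for future in futures]
```

With 12 disks and `pool_size=2`, at most two checkpoints hold memory at any moment, which is the parallelism knob the commit message refers to.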
79544d39cb [Metrics][LOG] Update metrics of 'max_compaction_score' and log for compaction (#5592)
* optimize compaction metrics and log

Co-authored-by: weizuo <weizuo@xiaomi.com>
2021-04-08 09:10:40 +08:00
90c2da54bd [Bug] Fix bug and add graceful exit for compaction producer (#5124)
1. Add a graceful exit mechanism for the compaction producer thread.
2. If a compaction task fails to be submitted, it should be removed from `_tablet_submitted_compaction`.
2021-01-30 16:35:36 +08:00
58e58c94d8 [TSAN] Fix tsan bugs (part 1) (#5162)
ThreadSanitizer, aka TSAN, is a useful tool to detect multi-thread
problems, such as data race, mutex problems, etc.
We should detect TSAN problems in Doris BE. Both the unit tests and the
server should pass in TSAN mode, to make Doris more robust.
This is the very beginning patch to fix TSAN problems, and some
difficult problems are suppressed in file 'tsan_suppressions', you
can suppress these problems by setting:
export TSAN_OPTIONS="suppressions=tsan_suppressions"

before running:
`BUILD_TYPE=tsan ./run-be-ut.sh --run`
2021-01-15 09:45:11 +08:00
3d4b2cb1ae [Bug] Fix tablet shared ptr circular reference causing the tablet not to be cleared (#5100)
Regardless of whether the tablet is submitted for compaction or not,
we need to call 'reset_compaction' to clean up the base_compaction or cumulative_compaction objects
in the tablet, because these two objects store the tablet's own shared_ptr.
If it is not cleaned up, the reference count of the tablet will always be greater than 1,
thus cannot be collected by the garbage collector. (TabletManager::start_trash_sweep)

This bug was introduced by #4891
2020-12-18 21:17:18 +08:00
ec7e1c6b1b [Refactor] Execute 'pick rowsets' before applying for permits for a compaction task (#4891)
The current compaction mechanism is that there is a producer thread that has been producing compaction tasks,
and the selected tablet must apply for `permits`.
When a tablet can hold `permits`, the compaction task for this tablet is submitted to the thread pool.
We take the compaction score as `permits`, which is used for limiting memory consumption.
However, `pick_rowset_to_compaction()` is executed before the file merge in the compaction thread,
and the number of segment files that actually perform the merge operation is smaller than compaction score.
In addition, it is also possible that compaction task exits directly because the tablet doesn't meet
the requirements of compaction. 

This patch optimizes and refactors the code of compaction, so that we can execute 'pick rowsets'
before applying for permits for a compaction task, calculate the number of segment files that actually
participate in the merge operation, and take this number as `permits`.
2020-11-30 11:41:14 +08:00
6fedf5881b [CodeFormat] Clang-format cpp sources (#4965)
Clang-format all c++ source files.
2020-11-28 18:36:49 +08:00
10e1e29711 Remove header file common/names.h (#4945) 2020-11-26 17:00:48 +08:00
77835dd9c4 [Bug][Compaction] Fix bug that compaction may be blocked (#4750)
The compaction producer thread may fail to produce compaction tasks because
the task map is modified in the wrong order.
2020-10-21 10:12:37 +08:00
09f97f8a05 [Refactor] Fixes some be typo part 2 (#4747) 2020-10-20 09:28:57 +08:00
eba595583e [Optimize] Optimize the execution model of compaction to limit memory consumption (#4670)
Currently, there are M threads to do base compaction and N threads to do cumulative compaction for each disk. 
Too many compaction tasks may run out of memory, so the max concurrency of running compaction tasks
is limited by semaphore.
If the running threads consume too much memory, we have no defense against it. In addition, reducing concurrency to avoid OOM
means some compaction tasks cannot be executed in time, and we may encounter heavier compactions.
Therefore, concurrency limitation is not enough.

The strategy proposed in #3624  may be effective to solve the OOM. 

A CompactionPermitLimiter is used to limit compaction, with a single-producer/multi-consumer model.
Producer will try to generate compaction tasks and acquire `permits` for each task. 
The compaction task which can hold `permits` will be executed in thread pool and each finished task will
release its `permits`.

`permits` must be acquired before a compaction task can execute. When the sum of `permits`
held by executing compaction tasks reaches a threshold, subsequent compaction tasks are no longer admitted
until some `permits` are released. The tablet compaction score is used as the `permits` of a compaction task here.

To some extent, memory consumption can be limited by setting appropriate `permits` threshold.
2020-10-11 11:39:25 +08:00
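
The permit scheme in #4670 can be sketched roughly as follows. This is a simplified Python illustration, not the BE's CompactionPermitLimiter: `PermitLimiter` and `run_compactions` are hypothetical names, and the rule that a task larger than the whole budget is admitted when nothing else is running is an assumption added here so that such a task cannot wait forever.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PermitLimiter:
    """Hypothetical stand-in for a permit-based limiter."""

    def __init__(self, total_permits):
        self._total = total_permits
        self._used = 0
        self._cond = threading.Condition()

    def request(self, permits):
        """Block until `permits` fit under the threshold, then hold them."""
        with self._cond:
            # Assumption: an oversized task is admitted once nothing else runs.
            while self._used > 0 and self._used + permits > self._total:
                self._cond.wait()
            self._used += permits

    def release(self, permits):
        """Give permits back and wake the producer if it is waiting."""
        with self._cond:
            self._used -= permits
            self._cond.notify_all()

    @property
    def used(self):
        with self._cond:
            return self._used


def run_compactions(scores, total_permits, workers=4):
    """Single producer submits one task per score; returns the peak permits held."""
    limiter = PermitLimiter(total_permits)
    peak = 0
    peak_lock = threading.Lock()

    def task(permits):
        nonlocal peak
        try:
            with peak_lock:
                peak = max(peak, limiter.used)
            # The actual merge work would happen here.
        finally:
            limiter.release(permits)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for score in scores:
            limiter.request(score)  # the producer blocks here, not the workers
            pool.submit(task, score)
    return peak
```

Because permits are acquired by the producer before submission, worker threads never sit idle holding a slot while they wait for budget.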
b780df697a [refactor] Optimize threads usage mode in BE (#4440)
BE cannot exit gracefully because some threads run in endless
loops. This patch makes the following optimizations:
- Use the well encapsulated Thread and ThreadPool instead of std::thread
  and std::vector<std::thread>
- Use CountDownLatch in thread's loop condition to avoid endless loop
- Introduce a new class Daemon for daemon works, like tcmalloc_gc,
  memory_maintenance and calculate_metrics
- Decouple statistics type TaskWorkerPool and StorageEngine notification
  by submit tasks to TaskWorkerPool's queue
- Reorder objects' stop and deconstruct in main(), i.e. stop network
  services at first, then internal services
- Use libevent in pthreads mode, by calling evthread_use_pthreads(),
  then EvHttpServer can exit gracefully in multi-threads
- Call brpc::Server's Stop() and ClearServices() explicitly
2020-09-06 20:19:14 +08:00
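
The "latch in the loop condition" idea from #4440 can be illustrated with a short Python sketch. It uses `threading.Event` as a stand-in for a CountDownLatch with a count of one; `start_daemon`, `work`, and `interval_sec` are hypothetical names and not part of the BE code.

```python
import threading


def start_daemon(work, interval_sec):
    """Run work() every interval_sec seconds until the returned event is set."""
    stop = threading.Event()

    def loop():
        # Event.wait returns False on timeout (keep looping) and True once
        # stop is set, so the sleep and the exit check are a single call.
        while not stop.wait(interval_sec):
            work()

    thread = threading.Thread(target=loop)
    thread.start()
    return stop, thread
```

Calling `stop.set()` followed by `thread.join()` ends the loop promptly instead of leaving the thread asleep or spinning forever.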
b85bb0e2e9 [Bug-Fix] Some deleted tablets are not recycled on BE (#4401) 2020-08-27 12:09:19 +08:00
dc3ed1c525 [Compaction]Compaction rules optimization (#4212)
Compaction rules optimization. For the detailed problem description and design, see #4164.
This PR adds 2 functions:
(1) make the cumulative policy configurable, and implement the original policy.
(2) implement the universal policy, the optimized version in #4164.
2020-08-19 09:34:13 +08:00
d6028863f3 [Compaction] Manually trigger compaction RESTapi interface (#4312)
Add a REST API to BE that triggers a compaction task manually. The detailed design is in #4311.
2020-08-13 23:41:46 +08:00
74b987f053 [Bug] Fix bug that storage engine bg threads should start after env is ready 2020-04-29 11:21:19 +08:00
8fc284d593 [config] Support to modify configs when BE is running without restarting (#3264)
In the past, when we want to modify some BE configs, we have to modify be.conf and then restart BE.
This patch provides a way to modify configs in the type of 'threshold', 'interval', 'enable flag'
when BE is running without restarting it.
You can update a single config at a time through BE's HTTP API: `be_host:be_http_port/api/update_config?config_name=new_value`
2020-04-08 11:17:47 +08:00
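
As a small illustration of the URL shape quoted in #3264, the helper below only assembles the address. `update_config_url` is a hypothetical name, the `http://` scheme is an assumption, and the host, port, and config name in the usage line are placeholders. The commit message does not say which HTTP method the endpoint expects, so no request is sent here.

```python
from urllib.parse import urlencode


def update_config_url(be_host, be_http_port, config_name, new_value):
    """Build be_host:be_http_port/api/update_config?config_name=new_value."""
    # urlencode escapes characters that are not safe inside a query string.
    query = urlencode({config_name: new_value})
    return f"http://{be_host}:{be_http_port}/api/update_config?{query}"
```

For example, `update_config_url("127.0.0.1", 8040, "some_interval_sec", 30)` builds the address for one placeholder config.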
64a06ea9d4 [UT] Fix some BE unit tests (#3110)
Also support graceful exit for StorageEngine to avoid hanging for too long
in unit tests.
2020-03-16 13:31:44 +08:00
5440e19d01 Improve the triggering strategy of BE report (#2881)
Currently, the report from BE to FE is completed in the background
threads of `AgentServer` (`report_tablet_thread` and
`report_disk_stat_thread`).  These two threads will sleep and be in
a standby state after each report, if there is any need to report
immediately, they will be notified and wake up immediately to report.

For example, when background thread (`disk_monitor_thread`) in
`StorageEngine` finds some tablets were deleted, it will notify
`AgentServer` to trigger a report immediately.

In the current implementation, in order to report ASAP, a local variable
(`_is_drop_tables`) and two other flags are used to record whether
reporting is needed, and then `StorageEngine::disk_monitor_thread` checks
the value of this variable every time it runs, to determine whether a
report needs to be triggered. This is actually superfluous, and it
may result in untimely notifications, as shown below:

```
(thread_1)        (thread_2)
disk-monitor     disk-stat-reporter
    |                  |
    |               reporting
    |                  |
  notify_1             |
    |                  |
    |                wait_for_notify(will wait until timeout or next notification)
    |                  |
    V                  V
```

When `report_tablet_thread` has not started waiting,
`StorageEngine::disk_monitor_thread` triggers a notification, so this
notification will not be received by `report_tablet_thread`,
resulting in the BE not reporting to the FE until the lock times out
or the next round of `disk_monitor_thread` detection.

This change restructures the triggering implementation, and solves the above problem.

This change also changes some methods(that do not need to be public) to private.
2020-02-11 20:38:44 +08:00
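
The commit message for #2881 does not spell out the restructured mechanism. One common way to avoid the lost notification it describes is to record the request in a flag guarded by the same lock as the condition variable, so that a notify arriving before the wait is still seen. A minimal Python sketch under that assumption, with `ReportTrigger` as a hypothetical name:

```python
import threading


class ReportTrigger:
    """Hypothetical notifier whose notifications are never lost."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = False

    def notify(self):
        with self._cond:
            # Remember the request even if nobody is waiting yet.
            self._pending = True
            self._cond.notify()

    def wait_for_notify(self, timeout):
        """Return True if a report was requested, False if the wait timed out."""
        with self._cond:
            requested = self._cond.wait_for(lambda: self._pending, timeout=timeout)
            self._pending = False
            return requested
```

If `notify()` runs while the reporter is still busy reporting, the next `wait_for_notify()` returns immediately instead of sleeping until the timeout.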
6cab929d6d [Compaction] Limit the max concurrency of running compaction tasks (#2635)
A compaction task may sometimes consume a lot of memory and result in OOM.
And currently, there is no good way to predict the mem consumption of
a compaction task, so I add a new BE config: max_compaction_concurrency
to limit the max concurrency of running compaction tasks manually.
2020-01-02 14:47:54 +08:00
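
A concurrency cap like the one in #2635 can be sketched with a semaphore. This is an illustrative Python sketch rather than the BE implementation: `run_limited`, `max_concurrency`, and `workers` are hypothetical names, and the peak counter exists only to make the cap observable.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_limited(tasks, max_concurrency, workers=8):
    """Run callables with at most max_concurrency inside the guarded section."""
    slots = threading.BoundedSemaphore(max_concurrency)
    lock = threading.Lock()
    running = 0
    peak = 0

    def guarded(task):
        nonlocal running, peak
        with slots:  # blocks while max_concurrency tasks are already running
            with lock:
                running += 1
                peak = max(peak, running)
            try:
                return task()
            finally:
                with lock:
                    running -= 1

    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(guarded, tasks))
    return results, peak
```

The cap bounds how many tasks run at once, but not how much memory each one uses.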
b72a4a4bc6 Add tablet meta checkpoint mechanism (#1936) 2019-10-10 09:39:02 +08:00