131 Commits

Author SHA1 Message Date
4960043f5e [enhancement] Refactor to improve the usability of MemTracker (step2) (#10823) 2022-07-21 17:11:28 +08:00
ec5996f1f8 [improvement]do not acquire mutex in metric hook (#10941) 2022-07-18 08:52:24 +08:00
523d395527 [refactor] Remove alpha rowset meta (#10933)
* remove alpha_rowset_meta
* remove alpha rowset related codes in compaction
* remove alpha rowset related codes in RowsetMeta
* fix be ut because some ut use alpha rowsetmeta
2022-07-18 08:45:46 +08:00
3bc6655069 [refactor] remove BlockManager (#10913)
* remove BlockManager
* remove deprecated field in tablet meta
2022-07-17 14:10:06 +08:00
a266d7b040 [bug](be) fix be _quick_compaction_thread_pool without shutdown. (#10758) 2022-07-11 22:33:56 +08:00
331fa50501 [feature](cold-data) move cold data to object storage without losing any feature(BE) (#10280)
This PR supports rowset level data upload on the BE side, so that there can be both cold data and hot data in a tablet,
and there is no need to prohibit loading new data into cooled tablets.

Each rowset is bound to a `FileSystem`, so that the storage layer can read and write rowsets without
perceiving the underlying filesystem.

The abstracted `RemoteFileSystem` can try local caching strategies with different granularity,
instead of caching segment files as before.

To avoid conflicts with the code in be/src/io, we temporarily put the file system related code in the be/src/io/fs directory.
In the future, `FileReader`s and `FileWriter`s should be unified.
2022-07-08 12:18:39 +08:00
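The idea of binding each rowset to a `FileSystem` can be illustrated with a minimal sketch. All class names below (`FileSystemSketch`, `InMemoryFileSystem`, `RowsetSketch`) are hypothetical stand-ins and an in-memory map replaces real storage; the actual interfaces under be/src/io/fs are not shown in this log and may look quite different.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified interface; the real Doris classes differ.
class FileSystemSketch {
public:
    virtual ~FileSystemSketch() = default;
    virtual std::string read(const std::string& path) const = 0;
    virtual void write(const std::string& path, const std::string& data) = 0;
    virtual bool is_remote() const = 0;
};

// In-memory stand-in used for both "local" and "remote" backends,
// so that the sketch runs without touching disk or object storage.
class InMemoryFileSystem : public FileSystemSketch {
public:
    explicit InMemoryFileSystem(bool remote) : _remote(remote) {}
    std::string read(const std::string& path) const override {
        auto it = _files.find(path);
        return it == _files.end() ? std::string() : it->second;
    }
    void write(const std::string& path, const std::string& data) override {
        _files[path] = data;
    }
    bool is_remote() const override { return _remote; }

private:
    bool _remote;
    std::map<std::string, std::string> _files;
};

// A rowset holds a filesystem handle and never checks which kind it is
// when reading or writing.
class RowsetSketch {
public:
    RowsetSketch(std::shared_ptr<FileSystemSketch> fs, std::string path)
            : _fs(std::move(fs)), _path(std::move(path)) {}
    void store(const std::string& data) { _fs->write(_path, data); }
    std::string load() const { return _fs->read(_path); }
    bool is_cold() const { return _fs->is_remote(); }

private:
    std::shared_ptr<FileSystemSketch> _fs;
    std::string _path;
};
```

With this shape, one tablet can hold a `RowsetSketch` backed by a remote filesystem next to one backed by a local filesystem, which is the mixed hot/cold layout the commit message describes.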
89e56ea67f [refactor] remove alpha rowset related code and vectorized row batch related code (#10584) 2022-07-05 20:33:34 +08:00
c9f86bc7e2 [refactor] Refactoring Status static methods to format message using fmt(#9533) 2022-07-02 18:58:23 +08:00
f35b235c3b [opt](compaction) optimize compaction in concurrent load (#10153)
Add some logic to optimize compaction:
1. Separate base and cumulative compaction, in case a base compaction runs too long and
affects cumulative compaction.
2. Fix the level size in cumulative compaction so that files below 64M get the right level
size. When choosing rowsets to compact, the policy ignores big rowsets,
which reduces CPU usage by about 25% under high-frequency concurrent load.
3. Remove the skip window restriction so that a rowset can be compacted right after it is
generated, because we will not delete rowsets after compaction. This greatly
reduces the compaction score under concurrent load.
4. Remove the version consistency check in can_do_compaction. We choose
consecutive rowsets to compact, so this check is useless.

With the logic above, compaction score and CPU cost are substantially
reduced under concurrent load.

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-06-17 17:49:45 +08:00
Pxl
f2aa5f32b8 [Feature] [Vectorized] Some pre-refactorings or interface additions for schema change (#9811)
Some pre-refactorings or interface additions for schema change
2022-06-07 15:04:57 +08:00
3743f19369 [feature] support convert alpha rowset (#9890)
Add an alpha-to-beta rowset converter so that rowsets are converted automatically. We will remove the alpha rowset code after 1.1.
2022-06-04 12:29:03 +08:00
f1aa9668af [refactor][storage format] Forbidden rowset v1 (#9248)
- Force the storage format of existing olap tables to change from V1 to V2
- Forbid creating new olap tables with storage format V1, and forbid schema changes that would create the V1 format
2022-05-04 17:32:20 +08:00
c9961c9bb9 [style] clang-format all c++ code (#9305)
- sh build-support/clang-format.sh  to  clang-format all c++ code
2022-04-29 16:14:22 +08:00
e157c2c254 [feature-wip](remote-storage) step3: Support remote storage, only for be, add migration_task_v2 (#8806)
1. Add TStorageMigrationReqV2 and EngineStorageMigrationTask to support migration action
2. Change TabletManager::create_tablet() for remote storage
3. Change TabletManager::try_delete_unused_tablet_path() for remote storage
2022-04-22 22:38:10 +08:00
e5e0dc421d [refactor] Change ALL OLAPStatus to Status (#8855)
Currently, there are two status codes in BE: one is in common/Status.h,
and the other, called OLAPStatus, is in olap/olap_define.h.
OLAPStatus is just an enum type; it is very simple and cannot carry much information.
I will unify this code into common/Status.
2022-04-14 11:43:49 +08:00
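The difference between a bare enum and a status object that carries a message can be sketched as follows. `OLAPStatusSketch` and `StatusSketch` are hypothetical, reduced stand-ins written for this illustration; they are not the definitions in olap/olap_define.h or common/Status.h.

```cpp
#include <string>
#include <utility>

// Hypothetical enum in the style of the old OLAPStatus.
enum OLAPStatusSketch { OLAP_OK_SKETCH = 0, OLAP_ERR_IO_SKETCH = -1 };

// Hypothetical status object: a code plus a human-readable message.
class StatusSketch {
public:
    static StatusSketch OK() { return StatusSketch(0, ""); }
    static StatusSketch IOError(std::string msg) {
        return StatusSketch(-1, std::move(msg));
    }
    bool ok() const { return _code == 0; }
    int code() const { return _code; }
    const std::string& message() const { return _msg; }

private:
    StatusSketch(int code, std::string msg) : _code(code), _msg(std::move(msg)) {}
    int _code;
    std::string _msg;
};

// Enum style: the caller learns only which kind of failure occurred.
inline OLAPStatusSketch read_enum_style(bool fail) {
    return fail ? OLAP_ERR_IO_SKETCH : OLAP_OK_SKETCH;
}

// Status style: the failure also says what was being read.
inline StatusSketch read_status_style(bool fail, const std::string& path) {
    if (fail) {
        return StatusSketch::IOError("failed to read " + path);
    }
    return StatusSketch::OK();
}
```

The status-object form lets a caller log or return the failing path without extra plumbing, which is the kind of additional information the commit message says an enum cannot hold.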
290366787c [refactor] refactor code, replace some file with stl libs (#8759)
1. replace ConditionVariables with std::condition_variable
2. replace Mutex with std::mutex
3. replace MonoTime with std::chrono
2022-04-13 09:55:29 +08:00
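As an illustration of the three standard-library replacements working together, here is a small timed queue built only on `std::mutex`, `std::condition_variable` and `std::chrono`. The class `TimedQueueSketch` is hypothetical and does not correspond to any class touched by the commit.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>

// Hypothetical example class; not part of the Doris code base.
class TimedQueueSketch {
public:
    void push(int value) {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _items.push(value);
        }
        _cv.notify_one();
    }

    // Waits up to `timeout` for an item; returns std::nullopt on timeout.
    std::optional<int> pop(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(_mutex);
        if (!_cv.wait_for(lock, timeout, [this] { return !_items.empty(); })) {
            return std::nullopt;
        }
        int value = _items.front();
        _items.pop();
        return value;
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    std::queue<int> _items;
};
```

The timeout is expressed as a `std::chrono` duration rather than a raw integer, so the unit is part of the type and cannot be confused between milliseconds and seconds at the call site.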
5a44eeaf62 [refactor] Unify all unit tests into one binary file (#8958)
1. solve the previous problems that the unit test files were too large (1.7G+) and that the unit test link time was too long
2. Unify all unit tests into one file to significantly reduce unit test execution time to less than 3 mins
3. temporarily disable stream_load_test.cpp, metrics_action_test.cpp, load_channel_mgr_test.cpp because they re-implement part of the code and affect other tests
2022-04-12 15:30:40 +08:00
519305cb22 [feature-wip] (memory tracker) (step4) Switch TLS mem tracker to separate more detailed memory usage (#8669)
Based on #8605, separate the memory usage of each operator out of the Query/Load/StorageEngine mem tracker.
2022-04-08 09:02:26 +08:00
98cab78320 [refactor](schema_hash) remove schema_hash since every tablet id in be is unique (#8574) 2022-04-07 08:37:45 +08:00
e63afc1a3c [feature-wip](remote storage)(step2) add storage_backend_mgr on BE side (#8663)
1. add storage backend mgr
2. remove env_remote
2022-03-31 11:13:14 +08:00
c69dd54116 [refactor](mutex) Use std::mutex to replace Mutex and refactor some lock logic (#8452) 2022-03-24 14:50:02 +08:00
eeae516e37 [Feature](Memory) Hook TCMalloc new/delete automatically counts to MemTracker (#8476)
Early Design Documentation: https://shimo.im/docs/DT6JXDRkdTvdyV3G

Implement a new way of memory statistics based on TCMalloc New/Delete Hook,
MemTracker and TLS, and it is expected that all memory new/delete/malloc/free
of the BE process can be counted.
2022-03-20 23:06:54 +08:00
e17aef9467 [refactor] refactor the implement of MemTracker, and related usage (#8322)
Modify the implementation of MemTracker:
1. Simplify a lot of useless logic;
2. Add MemTrackerTaskPool as the ancestor of all query and import trackers. It is used to track the local memory usage of all executing tasks;
3. Add a consume/release cache that triggers a consume/release when the accumulated memory exceeds the parameter mem_tracker_consume_min_size_bytes;
4. Add a new memory leak detection mode (experimental feature) that throws an exception when the remaining statistical value is greater than the specified range when the MemTracker is destructed, and prints the accurate statistical value in HTTP, the parameter memory_leak_detection
5. Add Virtual MemTracker, whose consume/release is not synced to its parent. It will be used later, when the TCMalloc Hook is introduced to record memory, to record the specified memory independently;
6. Modify the GC logic, register the buffer cached in DiskIoMgr as a GC function, and add other GC functions later;
7. Change the global root node from Root MemTracker to Process MemTracker, and remove Process MemTracker in exec_env;
8. Modify the macro that detects whether the memory has reached the upper limit, modify the parameters and default behavior of creating MemTracker, modify the error message format in mem_limit_exceeded, extend and apply transfer_to, remove Metric in MemTracker, etc.;

Modify where MemTracker is used:
1. MemPool adds a constructor to create a temporary tracker to avoid a lot of redundant code;
2. Added trackers for global objects such as ChunkAllocator and StorageEngine;
3. Added more fine-grained trackers such as ExprContext;
4. RuntimeState removes FragmentMemTracker, that is, PlanFragmentExecutor mem_tracker, which was previously used for independent statistical scan process memory, and replaces it with _scanner_mem_tracker in OlapScanNode;
5. MemTracker is no longer recorded in ReservationTracker, and ReservationTracker will be removed later;
2022-03-11 22:04:23 +08:00
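The consume/release cache from item 3 of the first list can be sketched as a small accumulator that forwards to the tracker only once the pending amount crosses a threshold. `TrackerSketch` and `CachedConsumerSketch` are hypothetical names, and the threshold argument merely plays the role that the commit assigns to mem_tracker_consume_min_size_bytes; the real MemTracker code is not reproduced here.

```cpp
#include <cstdint>

// Hypothetical stand-in for a tracker; not the real MemTracker API.
struct TrackerSketch {
    int64_t consumption = 0;
    int updates = 0;  // how many times the tracker was actually touched
    void consume(int64_t bytes) {
        consumption += bytes;
        ++updates;
    }
};

// Accumulates small consume/release calls and forwards them in batches.
class CachedConsumerSketch {
public:
    CachedConsumerSketch(TrackerSketch* tracker, int64_t min_size_bytes)
            : _tracker(tracker), _min_size(min_size_bytes) {}

    // A negative `bytes` value models a release.
    void consume(int64_t bytes) {
        _untracked += bytes;
        if (_untracked >= _min_size || _untracked <= -_min_size) {
            flush();
        }
    }

    // Forwards whatever is pending, regardless of the threshold.
    void flush() {
        if (_untracked != 0) {
            _tracker->consume(_untracked);
            _untracked = 0;
        }
    }

private:
    TrackerSketch* _tracker;
    int64_t _min_size;
    int64_t _untracked = 0;
};
```

In this sketch the tracker value can lag behind the true usage by up to the threshold until `flush()` is called, which is the price of touching the shared tracker less often.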
c86d469baf [Refactor](storage_engine) Use std::shared_mutex to replace RWMutex (#8387) 2022-03-11 18:14:24 +08:00
0f7a25367d [fix](rowset-meta) Fix bug that rowset meta is not deleted (#8118)
As described in #8120, a large number of rowset meta remain in rocksdb, which may be generated by:

1. drop tablet

    The drop tablet task itself just sets the state of the tablet meta to `SHUTDOWN`
    and moves the tablet to `_shutdown_tablets` vector then the background thread
    will periodically clean up the tablet in `_shutdown_tablets` (that's why even if we execute
    the `drop table xx force`, the tablet may be delayed by 10min to 1 hour before it goes into the trash directory).

    The regular cleanup thread in the background saves the complete tablet meta as a `.hdr` file
    when deleting the tablet, and then moves it to the trash directory along with the data files.

    But this process does not process the rowset meta (before doing the checkpoint of the tablet meta,
    the rowset meta is stored independently in rocksdb as a key-value). So this results in a residual rowset meta.

2. clone task

    The clone task may migrate back and forth between BEs, which may result in a situation
    where the tablet id is the same on the BE, but the tablet uuid is different.
    As a result, some rowset meta cannot find the corresponding tablet, and because no thread
    processes these rowsets, they eventually remain as residue.

In this PR, I handle it in the regular cleanup thread with the method `_clean_unused_rowset_metas()`.
I did not delete rowset meta along with "drop tablet" task, because "drop tablet" itself is not a synchronous operation.
It also relies on a background thread to clean up the tablet periodically.
So I put this operation in the background cleanup thread.
2022-02-19 12:00:48 +08:00
7d7e3a39f5 [refactor] Remove snapshot converter and unused Protobuf Definitions (#8026)
1. remove snapshot converter
2. remove unused protobuf definitions
3. move some macro as const variables
2022-02-12 16:06:04 +08:00
51abaa89f3 [fix](vec) Fix some bugs about vec engine (#7884)
1. mem leak in vcollector iter
2. query slow in agg table limit 10
3. query slow in SSB q4,q5,q6
2022-02-03 19:21:17 +08:00
Pxl
cd73a6b84b [chore] fix clang compile error (#7883) 2022-01-26 12:53:35 +08:00
20ef8a6e21 [feature-wip](remote storage)(step1) use a struct instead of string for parameter path, add basic remote method (#7098)
First, we need a parameter that describes whether the data is local or remote.
Then, we need some basic functions to support operations on remote storage.
2021-12-22 22:58:23 +08:00
6c6380969b [refactor] replace boost smart ptr with stl (#6856)
1. replace all boost::shared_ptr with std::shared_ptr
2. replace all boost::scoped_ptr with std::unique_ptr
3. replace all boost::scoped_array with std::unique_ptr<T[]>
4. replace all boost::thread with std::thread
2021-11-17 10:18:35 +08:00
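The smart pointer part of this mapping looks roughly like the following in standard C++. `SegmentSketch` and the helper functions are hypothetical and exist only to show the standard-library side of each replacement.

```cpp
#include <cstddef>
#include <memory>
#include <utility>

// Hypothetical payload type used only for this illustration.
struct SegmentSketch {
    int id;
};

// boost::shared_ptr<T>   -> std::shared_ptr<T>
inline std::shared_ptr<SegmentSketch> make_shared_segment(int id) {
    return std::make_shared<SegmentSketch>(SegmentSketch{id});
}

// boost::scoped_ptr<T>   -> std::unique_ptr<T>
inline std::unique_ptr<SegmentSketch> make_owned_segment(int id) {
    return std::make_unique<SegmentSketch>(SegmentSketch{id});
}

// boost::scoped_array<T> -> std::unique_ptr<T[]>
inline std::unique_ptr<char[]> make_buffer(std::size_t n) {
    // The array form of make_unique value-initializes, so the buffer is zeroed.
    return std::make_unique<char[]>(n);
}
```

One behavioral difference worth keeping in mind during such a migration: `boost::scoped_ptr` cannot be moved, while `std::unique_ptr` can, so ownership may now be transferred with `std::move` where it previously could not.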
ed7a873a44 [Memory Usage] Implement segment lru cache to save memory of BE (#6829) 2021-10-25 10:07:15 +08:00
Pxl
8a267f1ac5 [Feature] Support for cleaning the trash actively (#6323) 2021-08-12 10:07:51 +08:00
Pxl
236e0f1eda [Feature] Support for querying the trash used capacity (#6247)
Support for querying the trash used capacity.

```
SHOW TRASH [ON ...]
```

Users can now proactively scan the trash directory.
2021-08-10 10:10:47 +08:00
f772649535 [Optimize] Optimize lock when check error storage (#6321)
1. `StorageEngine::_delete_tablets_on_unused_root_path` will try to obtain tablet shard write lock in `TabletManager`
```
StorageEngine::_delete_tablets_on_unused_root_path
  TabletManager::drop_tablets_on_error_root_path
    obtain each tablet shard's write lock
```
2. `TabletManager::build_all_report_tablets_info` and other methods will obtain tablet shard read lock frequently.

So, `StorageEngine::_delete_tablets_on_unused_root_path` will hold `_store_lock` for a long time.
This makes it difficult for other threads, such as `StorageEngine::get_stores_for_create_tablet`, to acquire the write lock on `_store_lock`.

Dropping tablets on an error root path is a low-probability event, so `TabletManager::drop_tablets_on_error_root_path` should return early when its param `tablet_info_vec` is empty
2021-08-07 21:30:49 +08:00
Pxl
3812cca4db [Bug]fix the calculation of the "_start_trash_sweep" run interval. (#6177)
* fix the calculation of the _start_trash_sweep run interval
2021-07-09 09:45:44 +08:00
68bab73c35 [Bug] Fix select random storage path maybe same at a long time (#6062)
random_shuffle generates the same random sequence when called multiple times.
Although we shuffle twice, when the size relationship between the adjacent numbers
does not change, the result of the second shuffle does not change either
2021-06-20 16:16:32 +08:00
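One common way to avoid a repeating shuffle sequence is to use `std::shuffle` with an explicitly seeded engine that stays alive across calls. This is a general C++ technique and not necessarily the fix made in the PR; the helper names below are hypothetical. Note that `std::random_shuffle` was deprecated in C++14 and removed in C++17.

```cpp
#include <algorithm>
#include <random>
#include <string>
#include <vector>

// Creates an engine seeded from the platform's random device.
inline std::mt19937 make_engine() {
    std::random_device rd;
    return std::mt19937(rd());
}

// Returns a shuffled copy of `paths`. The caller keeps `engine` alive, so
// each call continues the engine's sequence instead of restarting it.
inline std::vector<std::string> shuffled_paths(std::vector<std::string> paths,
                                               std::mt19937& engine) {
    std::shuffle(paths.begin(), paths.end(), engine);
    return paths;
}
```

Passing the engine by reference is the important design choice here: constructing a fresh, identically seeded engine on every call would reproduce the same order each time.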
d57c2344e1 [MemTracker] Refactored the hierarchical structure of memtracker (#5956)
To avoid showing too many memtrackers on BE web pages,
MemTracker now has 3 levels: OVERVIEW, TASK and VERBOSE.

OVERVIEW is mainly used for the main memory-consuming modules such as Query/Load/Metadata.
TASK is mainly used to record the memory overhead of a single task such as a single query, load, and compaction task.
VERBOSE is used for other more detailed memtrackers.
2021-06-16 09:44:24 +08:00
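A level scheme like this can be used to filter what a page displays. The sketch below assumes that the three levels are ordered from least to most detailed and that a page shows every tracker at or below a chosen level; both the ordering and the names `MemTrackerLevelSketch`, `TrackerInfoSketch` and `visible_trackers` are assumptions made for illustration.

```cpp
#include <string>
#include <vector>

// Assumed ordering: a higher value means more detail.
enum class MemTrackerLevelSketch { OVERVIEW = 0, TASK = 1, VERBOSE = 2 };

struct TrackerInfoSketch {
    std::string label;
    MemTrackerLevelSketch level;
};

// Returns the labels of all trackers whose level is at or below `max_level`.
inline std::vector<std::string> visible_trackers(
        const std::vector<TrackerInfoSketch>& all, MemTrackerLevelSketch max_level) {
    std::vector<std::string> out;
    for (const auto& tracker : all) {
        if (static_cast<int>(tracker.level) <= static_cast<int>(max_level)) {
            out.push_back(tracker.label);
        }
    }
    return out;
}
```

With such a filter, a page limited to OVERVIEW would list only the module-level trackers, while raising the limit to TASK would add per-query and per-load entries.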
6d6c3d9703 [Enhancement] Reduce memory consumption by releasing readers earlier (#5811)
We create multiple rowset readers to read the data of one tablet.
After a rowset reader has reached EOF, it can be released to
reduce resource (typically memory) consumption.
Likewise, we can release a segment reader when it reaches EOF.
2021-06-16 09:37:50 +08:00
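Releasing a reader as soon as it reports EOF, instead of when the whole scan finishes, can be sketched like this. `RowsetReaderSketch` and `scan_all` are hypothetical; the fixed-size buffer only stands in for whatever memory a real reader holds.

```cpp
#include <memory>
#include <vector>

// Hypothetical reader that owns a buffer and yields a fixed number of rows.
class RowsetReaderSketch {
public:
    explicit RowsetReaderSketch(int rows) : _remaining(rows), _buffer(1024) {}

    // Returns true if a row was produced, false at EOF.
    bool next() {
        if (_remaining == 0) {
            return false;
        }
        --_remaining;
        return true;
    }

private:
    int _remaining;
    std::vector<char> _buffer;  // stands in for the reader's memory footprint
};

// Reads every row from every reader. Each reader is destroyed right after it
// reports EOF, so its buffer is freed before the next reader is consumed.
inline int scan_all(std::vector<std::unique_ptr<RowsetReaderSketch>>& readers) {
    int rows = 0;
    for (auto& reader : readers) {
        while (reader->next()) {
            ++rows;
        }
        reader.reset();  // release this reader's memory now
    }
    return rows;
}
```

In this sketch the peak memory held during a scan is closer to one reader's footprint than to the sum of all readers, which is the effect the commit message is after.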
206a711f9b [Bug] SimplifyInvalidDateBinaryPredicatesDateRule may cause invalid query plan (#5987)
1. "where 1k > to_date(now())" will return EMPTYSET in query plan.
2. DateLiteral should accept date string like "2021-6-1".
2021-06-10 17:37:26 +08:00
81ecf3d097 [Bug] Rebuilt version graph of a tablet when there are too many orphan vertex (#5945)
The version information of a tablet is stored in memory
in an adjacency graph data structure.
As new versions are written and old versions are deleted,
the data structure accumulates empty vertices with no edge associations (orphan vertices).

These orphan vertices should be removed somehow.
2021-06-03 09:59:20 +08:00
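How orphan vertices build up, and how a rebuild removes them, can be shown with a small adjacency structure. This sketch assumes that a version [start, end] is stored as an edge between vertex `start` and vertex `end + 1`, and that a rebuild is triggered by a caller-supplied threshold; the class `VersionGraphSketch` and both assumptions are illustrative, not taken from the PR.

```cpp
#include <cstddef>
#include <iterator>
#include <map>
#include <set>

// Hypothetical sketch; not the real version-graph class.
class VersionGraphSketch {
public:
    // Assumption: version [start, end] is an edge between start and end + 1.
    void add_version(long start, long end) {
        _adj[start].insert(end + 1);
        _adj[end + 1].insert(start);
    }

    // Removing an edge keeps both vertices, possibly with no edges left.
    void delete_version(long start, long end) {
        _adj[start].erase(end + 1);
        _adj[end + 1].erase(start);
    }

    std::size_t vertex_count() const { return _adj.size(); }

    std::size_t orphan_count() const {
        std::size_t n = 0;
        for (const auto& entry : _adj) {
            if (entry.second.empty()) {
                ++n;
            }
        }
        return n;
    }

    // Drops all orphan vertices once there are more than `max_orphans`.
    // Returns true if a rebuild happened.
    bool rebuild_if_needed(std::size_t max_orphans) {
        if (orphan_count() <= max_orphans) {
            return false;
        }
        for (auto it = _adj.begin(); it != _adj.end();) {
            it = it->second.empty() ? _adj.erase(it) : std::next(it);
        }
        return true;
    }

private:
    std::map<long, std::set<long>> _adj;
};
```

For example, after versions [2, 2] and [3, 3] are merged into [2, 3] and the two originals are deleted, vertex 3 has no edges left and becomes an orphan that only a rebuild removes.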
8850cfe2ad [Compaction] Modify compaction logic (#5737)
1. Add /api/compaction/run_status to show the running compaction tasks.
2. Support running base and cumulative compaction for one tablet at the same time.
3. Modify some log levels.
4. Add a feedback document.
2021-05-07 11:18:47 +08:00
58d0c8971e [Bugfix] Fix BE metrics http API dead lock bug (#5730) 2021-04-30 10:15:33 +08:00
84f6d74322 [Optimize] Sort trashed files by name and skip processing unexpired files (#5678) 2021-04-24 17:42:06 +08:00
4fa25b6eb9 [Optimize] make tablet meta checkpoint to be threadpool model (#5654)
Currently, tablet meta checkpoint is a memory-intensive operation.
If a host has 12 disks, it will start 12 threads to do tablet meta checkpoint.
In our experience, the data size of one tablet can be as high as 2G.
If 12 threads do the checkpoint at the same time, it may cause OOM.

Therefore, this PR tries to solve this problem.
First, it starts only one thread to produce tablet meta checkpoint tasks.
Second, it creates a thread pool to handle these tasks.
You can configure the size of the thread pool to control the parallelism in case of OOM.
It is a producer-consumer model.
2021-04-23 09:45:15 +08:00
f5cf008bcc [Bug] Fix stream load UT failed (#5692)
Also move the stream load rocksdb dir to the first of storage root paths
2021-04-23 09:33:42 +08:00
a4f8194111 [Audit][Stream Load] Support audit function for stream load (#5452)
Record finished stream load job (both successful job and failed job) into audit log
so that we can see when the stream load job was executed and check the details of stream load jobs.
2021-04-21 16:36:12 +08:00
be733cfa9c [Metrics] Add some large memtrackers' metric (#5614)
MemTracker can report memory consumption so that we can find out which
module consumes more memory, but it only gives the current value. This patch
adds metrics for some large memory consumers, so that we can see
which module consumes more memory over time. This is useful for
troubleshooting OOM problems and optimizing configs.
2021-04-21 09:15:04 +08:00
b423274f17 [Enhance] Make MemTracker more accurate (#5515) (#5516)
* [Enhance] Make MemTracker more accurate (#5515)
 This PR is mainly about:
 1. Improve the readability of MemTrackers' name
 2. Add the MemTracker of:
    * Load
    * Compaction
    * SchemaChange
    * StoragePageCache
    * TabletManager
 3. Change SchemaChange to a Singleton

* revise some code for Code Review

* change the name of mem_tracker

* keep reader_context have the same lifetime of rowset_reader in schema change.

* change vlog notice to log(warning) in schema change
2021-04-08 09:14:55 +08:00
d641a26490 [Refactor] Remove boost filesystem (#5579)
* use std::filesystem instead of boost
Co-authored-by: Mingyu Chen <morningman.cmy@gmail.com>
2021-04-08 09:11:59 +08:00
ad67dd34a0 update gcc to gcc 10 and support c++17 (#5394)
* update gcc to gcc 10 and support c++17
    update brpc to 0.9.7
    update boost to 1.73
    remove third-party boost 1.54 for mysql

* update cmake version

* ignore jdk version

* remove unused patch

* avoid using the SYS_getrandom call
2021-03-25 09:30:38 +08:00