Currently, there are many unnecessary includes in the codebase. We can use the include-what-you-use tool to clean them up; enforcing a strict include-what-you-use policy brings benefits such as faster compilation and clearer header dependencies.
Follow-up to #17586.
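To illustrate the policy, here is a minimal before/after sketch, including the IWYU pragma escape hatch for headers the tool cannot prove are used (the specific headers are just examples):

```cpp
// Before: <map> and <sstream> were included but unused, so
// include-what-you-use flags them for removal.

// After: keep only what is actually used, plus an explicit escape hatch for
// a header retained on purpose (IWYU will not suggest removing it):
#include <cassert>  // IWYU pragma: keep  (used only in debug-only code paths)
#include <string>   // std::string is used below

std::string greeting() { return "hello"; }
```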
This PR mainly changes:
- Remove env/
- Remove FileUtils/FilesystemUtils; some of their methods are moved to LocalFileSystem
- Remove olap/file_cache
- Add an S3 client cache for the S3 file system; in my tests, the time to open an S3 file is reduced significantly (see the sketch below)
- Fix a cold/hot separation bug for the S3 file system

This is the last PR of #17764. After this, all IO operations should live in io/fs.
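As a rough illustration of the client cache, here is a minimal sketch assuming clients are keyed by endpoint and access key (the `S3ClientCache` name and key scheme are illustrative, not the actual implementation):

```cpp
#include <map>
#include <memory>
#include <mutex>
#include <string>

// Illustrative stand-in for an AWS SDK S3 client; construction is the
// expensive part we want to avoid repeating.
struct S3Client {};

// Hypothetical cache: one shared client per (endpoint, access key) pair, so
// opening an S3 file reuses an existing client instead of building a new one.
class S3ClientCache {
public:
    std::shared_ptr<S3Client> get(const std::string& endpoint, const std::string& ak) {
        std::lock_guard<std::mutex> lock(_mutex);
        std::string key = endpoint + "/" + ak;
        auto it = _clients.find(key);
        if (it != _clients.end()) {
            return it->second;  // cache hit: skip client construction
        }
        auto client = std::make_shared<S3Client>(/* endpoint, credentials... */);
        _clients.emplace(std::move(key), client);
        return client;
    }

private:
    std::mutex _mutex;
    std::map<std::string, std::shared_ptr<S3Client>> _clients;
};
```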
Besides the tests in #17586, I also tested some cases related to fs IO:
- clone
- concurrent queries on local/S3/HDFS
- load error log creation and cleanup
- disk metrics
This patchset applies the following changes:
- use the vertical compaction mechanism to do segcompaction
- basic (WIP) refactoring to separate the segcompaction logic from BetaRowsetWriter
- add segcompaction-specific unit tests and regression tests
multiget_data runs in a bthread and may block the whole worker pthread of the bRPC framework, affecting other bthreads, so I moved the work into a separate task pool.
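A minimal sketch of the idea, using standard threads (the real code uses the BE's existing task/thread pool facilities; all names here are illustrative). The handler only enqueues the work and returns, so the blocking IO never occupies a bRPC worker pthread:

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Minimal dedicated worker pool, so blocking work never runs on the bRPC
// worker pthreads.
class TaskPool {
public:
    explicit TaskPool(size_t n) {
        for (size_t i = 0; i < n; ++i) {
            _workers.emplace_back([this] { run(); });
        }
    }
    ~TaskPool() {
        {
            std::lock_guard<std::mutex> l(_mu);
            _stop = true;
        }
        _cv.notify_all();
        for (auto& t : _workers) t.join();
    }
    // Called from the bRPC handler: enqueue and return immediately.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> l(_mu);
            _tasks.push_back(std::move(task));
        }
        _cv.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> l(_mu);
                _cv.wait(l, [this] { return _stop || !_tasks.empty(); });
                if (_stop && _tasks.empty()) return;
                task = std::move(_tasks.front());
                _tasks.pop_front();
            }
            task();  // the blocking multiget work happens here
        }
    }
    std::mutex _mu;
    std::condition_variable _cv;
    std::deque<std::function<void()>> _tasks;
    bool _stop = false;
    std::vector<std::thread> _workers;
};
```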
1. support a row format using the JSONB codec
2. short-path optimization for point queries
3. support prepared statements for point queries
4. support the MySQL binary format
1. Remove quick_compaction's rowset pick policy; run cumulative compaction directly when quick compaction is triggered.
2. Skip a tablet's compaction task when its compaction score is too small.
Co-authored-by: yixiutt <yixiu@selectdb.com>
The mem tracker can be logically divided into 4 layers: 1) process, 2) type, 3) query/load/compaction task etc., 4) exec node etc.
The types are:
```cpp
enum Type {
    GLOBAL = 0,        // Life cycle is the same as the process, e.g. Cache and default Orphan.
    QUERY = 1,         // Count the memory consumption of all Query tasks.
    LOAD = 2,          // Count the memory consumption of all Load tasks.
    COMPACTION = 3,    // Count the memory consumption of all Base and Cumulative tasks.
    SCHEMA_CHANGE = 4, // Count the memory consumption of all SchemaChange tasks.
    CLONE = 5,         // Count the memory consumption of all EngineCloneTask. Note: does not include memory for make/release snapshots.
    BATCHLOAD = 6,     // Count the memory consumption of all EngineBatchLoadTask.
    CONSISTENCY = 7    // Count the memory consumption of all EngineChecksumTask.
};
```
Object pointers are no longer saved between layers; the process-level and per-type values are aggregated periodically.
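A small sketch of what the periodic aggregation can look like under this design, assuming per-type atomic counters (names are illustrative):

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Each type layer keeps its own counter; no pointers back to children.
struct TypeTracker {
    std::atomic<int64_t> consumption{0};
};

// Run periodically: sum the per-type values into the process-level value,
// instead of walking an object graph on every consume/release.
int64_t refresh_process_consumption(const std::vector<TypeTracker*>& type_trackers) {
    int64_t total = 0;
    for (const TypeTracker* t : type_trackers) {
        total += t->consumption.load(std::memory_order_relaxed);
    }
    return total;  // becomes the process-level value for this interval
}
```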
Other fixes:
In [fix](memtracker) Fix transmit_tracker null pointer because phamp is not thread safe #13528, I tried to separate the memory that was manually abandoned in the query from the orphan mem tracker. But in actual tests, the accuracy of this part of the memory could not be guaranteed, so it is put back into the orphan mem tracker.
## Design
### Trigger
Whenever a rowset writer produces more than N (e.g. 10) segments, we trigger segment compaction. Note that only one segment compaction job runs for a single rowset at a time, to avoid a recursion/queuing nightmare.
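A hedged sketch of the trigger check (the threshold constant and parameters are illustrative):

```cpp
#include <functional>

// Hypothetical check inside the rowset writer after flushing a segment:
// start segment compaction once enough new segments have accumulated, and
// never run two jobs for the same rowset concurrently.
void maybe_trigger_segcompaction(int num_segments, int num_compacted,
                                 bool job_inflight,
                                 const std::function<void()>& submit_job) {
    constexpr int kTriggerThreshold = 10;  // the "N" from the text above
    if (job_inflight) return;              // one job per rowset at a time
    if (num_segments - num_compacted > kTriggerThreshold) {
        submit_job();
    }
}
```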
### Target Selection
We collect segments on every trigger. We skip big segments whose row count > M (e.g. 10000), because compacting them yields little benefit compared to the effort. Hence, we only pick the "Longest Consecutive Small" segment group for actual compaction, as sketched below.
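A sketch of the selection, assuming only each segment's row count is known (the function name is hypothetical):

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Pick the longest consecutive run of "small" segments (row count <= the
// threshold "M") as the compaction group; a big segment breaks the run.
// Returns [begin, end) indices of the chosen group.
std::pair<size_t, size_t> pick_longest_small_run(
        const std::vector<uint32_t>& segment_rows, uint32_t small_threshold) {
    size_t best_begin = 0, best_len = 0;
    size_t cur_begin = 0, cur_len = 0;
    for (size_t i = 0; i < segment_rows.size(); ++i) {
        if (segment_rows[i] <= small_threshold) {
            if (cur_len == 0) cur_begin = i;
            if (++cur_len > best_len) {
                best_len = cur_len;
                best_begin = cur_begin;
            }
        } else {
            cur_len = 0;  // a big segment breaks consecutiveness
        }
    }
    return {best_begin, best_begin + best_len};
}
```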
### Compaction Process
A new thread pool is introduced to do the job. We submit the above-mentioned "Longest Consecutive Small" segment group to the pool. Then the worker thread does the following (see the sketch after this list):
- build a MergeIterator from the target segments
- create a new segment writer
- for each block read from the MergeIterator, append it with the writer
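A sketch of that worker loop, with stand-in types standing for the real Doris `MergeIterator` and segment writer (the actual APIs differ):

```cpp
#include <vector>

// Minimal stand-ins, just to show the control flow.
struct Block { std::vector<int> rows; void clear() { rows.clear(); } };

struct MergeIterator {                        // merges the target segments in key order
    virtual bool next_batch(Block* out) = 0;  // false => end of stream
    virtual ~MergeIterator() = default;
};

struct SegmentWriter {                        // writes the single compacted segment
    virtual void append_block(const Block& b) = 0;
    virtual void finalize() = 0;              // flush footer/index
    virtual ~SegmentWriter() = default;
};

// The worker thread's loop: read merged blocks, append them to the new
// segment, then finalize it.
void compact_segments(MergeIterator& merge_iter, SegmentWriter& writer) {
    Block block;
    while (merge_iter.next_batch(&block)) {
        writer.append_block(block);
        block.clear();
    }
    writer.finalize();
}
```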
### SegID handling
SegID must remain consecutive after segment compaction.
If a rowset has small segments named seg_0, seg_1, seg_2, seg_3 and a big segment seg_4:
- we create a segment named "seg_0-3" to save compacted data for seg_0, seg_1, seg_2 and seg_3
- delete seg_0, seg_1, seg_2 and seg_3
- rename seg_0-3 to seg_0
- rename seg_4 to seg_1
It is worth noticing that we should wait for in-flight segment compaction tasks to finish before building the rowset meta and committing this txn.
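A small sketch of the ID rewrite for the example above (a hypothetical helper; the real code renames files on disk and patches segment metadata):

```cpp
#include <utility>
#include <vector>

// Segments [begin, end) were compacted into one, so every later segment
// shifts down to keep IDs consecutive. Returns (old_id -> new_id) renames.
std::vector<std::pair<int, int>> plan_renames(int num_segments, int begin, int end) {
    std::vector<std::pair<int, int>> renames;
    int compacted = end - begin;  // e.g. 4 for seg_0..seg_3
    // The compacted output "seg_{begin}-{end-1}" becomes seg_{begin}; the
    // group of `compacted` segments now occupies a single ID slot.
    for (int old_id = end; old_id < num_segments; ++old_id) {
        renames.emplace_back(old_id, old_id - compacted + 1);  // e.g. seg_4 -> seg_1
    }
    return renames;
}
```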
* support local cache GC by disk usage
* support GC per disk
* refactor the file cache size logic
* also consider unused file caches when GCing by disk size
* change config file_cache_max_size_per_disk from GB to B
* bugfix
* update
* use two-stage locking to avoid holding the lock during disk IO (see the sketch below)
* rdlock one by one for dummy file cache GC
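A minimal sketch of the two-stage locking mentioned above (all names are illustrative): the lock protects only the in-memory candidate selection, and the file deletions happen afterwards with no lock held.

```cpp
#include <cstdint>
#include <cstdio>
#include <mutex>
#include <string>
#include <vector>

class FileCacheGc {
public:
    void gc_by_disk_usage(int64_t bytes_to_free) {
        // Stage 1: under the lock, decide what to evict (metadata only).
        std::vector<std::string> victims;
        {
            std::lock_guard<std::mutex> l(_mutex);
            int64_t freed = 0;
            for (const auto& e : _entries) {
                if (freed >= bytes_to_free) break;
                victims.push_back(e.path);
                freed += e.size;
            }
            // (Removing the chosen entries from _entries is elided here.)
        }
        // Stage 2: delete files without holding the lock, so disk IO never
        // blocks other readers/writers of the cache metadata.
        for (const auto& path : victims) {
            std::remove(path.c_str());  // unlink the cached file
        }
    }

private:
    struct Entry { std::string path; int64_t size; };
    std::mutex _mutex;
    std::vector<Entry> _entries;  // assumed ordered by eviction priority
};
```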
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
1. Make version publish work in version order.
2. Update the delete bitmap while publishing a version: load the current version rowset's primary keys and search for them in previous rowsets (sketched below).
3. Speed up the publish version task by running tablet publish tasks in parallel.
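An illustrative sketch of step 2, assuming a flat key index over previous rowsets (the real code uses segment-level primary key indexes and bitmaps; names here are hypothetical):

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Where a key currently lives in earlier rowsets.
struct RowLocation { int rowset_id; int row_id; };

using DeleteBitmap = std::set<std::pair<int, int>>;  // (rowset_id, row_id)

// When publishing version V: look up each primary key of the new rowset in
// earlier rowsets, and mark the old row as deleted in the delete bitmap.
void update_delete_bitmap(const std::vector<std::string>& new_keys,
                          const std::map<std::string, RowLocation>& pre_rowsets_index,
                          DeleteBitmap* delete_bitmap) {
    for (const auto& key : new_keys) {
        auto it = pre_rowsets_index.find(key);
        if (it != pre_rowsets_index.end()) {
            // The same key exists in an earlier rowset: that row is now stale.
            delete_bitmap->insert({it->second.rowset_id, it->second.row_id});
        }
    }
}
```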
Co-authored-by: yixiutt <yixiu@selectdb.com>
This PR supports rowset-level data upload on the BE side, so that a tablet can hold both cold data and hot data,
and there is no need to prohibit loading new data into cooled tablets.
Each rowset is bound to a `FileSystem`, so that the storage layer can read and write rowsets without
being aware of the underlying filesystem.
The abstracted `RemoteFileSystem` can try local caching strategies of different granularity,
instead of caching segment files as before.
To avoid conflicts with the code in be/src/io, we temporarily put the filesystem-related code in the be/src/io/fs directory.
In the future, `FileReader`s and `FileWriter`s should be unified.
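In simplified form, the abstraction looks roughly like this (signatures are illustrative, not the actual interface):

```cpp
#include <memory>
#include <string>

// Each rowset holds a FileSystem; the storage layer reads and writes
// through it without knowing whether the data is local or remote.
class FileSystem {
public:
    virtual ~FileSystem() = default;
    virtual bool exists(const std::string& path) const = 0;
    // open_file / create_file / ... elided
};

class LocalFileSystem : public FileSystem {  // direct disk IO
public:
    bool exists(const std::string&) const override { return true; }
};

class RemoteFileSystem : public FileSystem {  // e.g. S3; may add local caching
public:
    bool exists(const std::string&) const override { return true; }
};

struct Rowset {
    std::shared_ptr<FileSystem> fs;  // bound at creation: cold data gets a
                                     // RemoteFileSystem, hot data a local one
    std::string rowset_path;
};
```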
Add some logic to optimize compaction:
1. Separate base & cumulative compaction, in case a base compaction runs too long and affects cumulative compaction.
2. Fix the level size in cumulative compaction so that files below 64 MB get a correct level size. When choosing rowsets for compaction, the policy will then ignore big rowsets; this reduces CPU usage by about 25% under high-frequency concurrent load (see the sketch below).
3. Remove the skip-window restriction so a rowset can be compacted right after it is generated, since we no longer delete rowsets after compaction. This greatly reduces the compaction score under concurrent load.
4. Remove the version consistency check in can_do_compaction; we always choose consecutive rowsets to compact, so this check is useless.
With the changes above, the compaction score and CPU cost improve substantially under concurrent load.
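For item 2, an illustrative take on the level-size idea: map a rowset's data size to a level bucket by halving from a 64 MB base, so small files still land in a sensible level (this is a sketch of the concept, not the actual policy code):

```cpp
#include <cstdint>

// Descend from the base level until the level fits the rowset, with a 1 MB
// floor, so rowsets below 64 MB no longer all collapse into one level.
int64_t pick_level_size(int64_t rowset_size, int64_t base_size = 64LL << 20) {
    int64_t level = base_size;
    while (level > rowset_size && level > (1LL << 20)) {
        level /= 2;
    }
    return level;
}
```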
Co-authored-by: yixiutt <yixiu@selectdb.com>
* Compact quickly for small data imports #9791
1. Merge small rowset versions as soon as possible, to increase the import frequency of small-version data.
2. A "small version" means the rowset's row count is less than config::small_compaction_rowset_rows (default 1000); see the check below.
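The check in item 2 boils down to something like this (the config name comes from the description; the helper is hypothetical):

```cpp
#include <cstdint>

namespace config {
// Config knob named in the description; default from the text.
inline int64_t small_compaction_rowset_rows = 1000;
}  // namespace config

// A rowset counts as a "small version" when its row count is under the threshold.
bool is_small_version(int64_t rowset_num_rows) {
    return rowset_num_rows < config::small_compaction_rowset_rows;
}
```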
1. Fix the LRU Cache MemTracker consumption value being negative.
2. Fix the compaction Cache MemTracker not tracking anything.
3. Add a USE_MEM_TRACKER compile option (illustrated below).
4. Make sure the malloc/free hook is never stopped.
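Item 3 is typically wired up like the sketch below (the macro name comes from the list; the hook call is illustrative):

```cpp
#include <cstdint>

// Declared elsewhere in this sketch; consumes `size` bytes on the current tracker.
void mem_tracker_consume(int64_t size);

// With USE_MEM_TRACKER defined, tracking calls are compiled in; without it,
// they expand to nothing and add zero overhead.
#ifdef USE_MEM_TRACKER
#define CONSUME_MEM_TRACKER(size) mem_tracker_consume(size)
#else
#define CONSUME_MEM_TRACKER(size) ((void)0)  // tracking compiled out entirely
#endif
```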
Currently, there are two status codes in the BE: one is common/Status.h,
and the other is OLAPStatus in olap/olap_define.h.
OLAPStatus is just an enum type; it is very simple and cannot carry much information.
I will unify these into common/Status.
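For contrast, a simplified sketch of the two styles: `OLAPStatus` is a bare enum, while a `Status`-style class can carry a code plus a message (this is a simplified rendition, not the exact common/Status interface):

```cpp
#include <string>
#include <utility>

// What OLAPStatus amounts to: a bare enum that cannot attach any context.
enum OLAPStatus { OLAP_SUCCESS = 0, OLAP_ERR_OTHER_ERROR = -1 /* ... */ };

// Simplified Status-style class: a code plus a human-readable message.
class Status {
public:
    static Status OK() { return Status(); }
    static Status InternalError(std::string msg) {
        return Status(-1, std::move(msg));
    }
    bool ok() const { return _code == 0; }
    const std::string& msg() const { return _msg; }

private:
    Status() = default;
    Status(int code, std::string msg) : _code(code), _msg(std::move(msg)) {}
    int _code = 0;
    std::string _msg;
};
```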