Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
Follow #17586.
This PR mainly changes:
Remove env/
Remove FileUtils/FilesystemUtils
Some methods are moved to LocalFileSystem
Remove olap/file_cache
Add s3 client cache for s3 file system
In my test, the time of open s3 file can be reduced significantly
Fix cold/hot separation bug for s3 fs.
This is the last PR of #17764.
After this, all IO operation should be in io/fs.
Except for tests in #17586, I also tested some case related to fs io:
clone
concurrency query on local/s3/hdfs
load error log create and clean
disk metrics
See #17764 for details
I have tested:
- Unit test for local/s3/hdfs/broker file system: be/test/io/fs/file_system_test.cpp
- Outfile to local/s3/hdfs/broker.
- Load from local/s3/hdfs/broker.
- Query file on local/s3/hdfs/broker file system, with table value function and catalog.
- Backup/Restore with local/s3/hdfs/broker file system
Not test:
- cold & host data separation case.
Currently, there are 2 status code in BE, one is common/Status.h,
and the other is olap/olap_define.h called OLAPStatus.
OLAPStatus is just an enum type, it is very simple and could not save many informations,
I will unify these code to common/Status.
1. Fix the problem of BE crash caused by destruct sequence. (close#8058)
2. Add a new BE config `compaction_task_num_per_fast_disk`
This config specify the max concurrent compaction task num on fast disk(typically .SSD).
So that for high speed disk, we can execute more compaction task at same time,
to compact the data as soon as possible
3. Avoid frequent selection of unqualified tablet to perform compaction.
4. Modify some log level to reduce the log size of BE.
5. Modify some clone logic to handle error correctly.
For the first, we need to make a parameter to discribe the data is local or remote.
At then, we need to support some basic function to support the operation for remote storage.
1. replace all boost::shared_ptr to std::shared_ptr
2. replace all boost::scopted_ptr to std::unique_ptr
3. replace all boost::scoped_array to std::unique<T[]>
4. replace all boost:thread to std::thread
At present, the application of vlog in the code is quite confusing.
It is inherited from impala VLOG_XX format, and there is also VLOG(number) format.
VLOG(number) format does not have a unified specification, so this pr standardizes the use of VLOG
1. MonoTime/MonoDelta
MonoTime: The MonoTime represents a particular point in time, relative to some fixed but unspecified reference point.
MonoDelta: The MonoDelta class represents an elapsed duration of time, the delta between two MonoTime instances.
2. CountDownLatch
This is a C++ implementation of the Java CountDownLatch
NOTE: This patch would modify all Backend's data.
And this will cause a very long time to restart be.
So if you want to interferer your product environment,
you should upgrade backend one by one.
1. Refactoring be is to clarify the structure the codes.
2. Use unique id to indicate a rowset.
Nameing rowset with tablet_id and version will lead to
many conflicts among compaction, clone, restore.
3. Extract an rowset interface to encapsulate rowsets
with different format.