Currently, there are some unused includes in the codebase. We can use a tool named include-what-you-use (IWYU) to clean them up. A strict include-what-you-use policy brings several benefits, such as fewer unnecessary dependencies between files and faster compilation.
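Under a strict include-what-you-use policy, a file directly includes the header that declares every symbol it uses, instead of relying on one header transitively pulling in another. A minimal illustration (the file content and `join_path` are hypothetical, not taken from the codebase):

```cpp
// Every standard symbol used below has its own direct include,
// which is what a strict IWYU policy asks for.
#include <cassert>  // assert
#include <cstddef>  // std::size_t
#include <string>   // std::string
#include <vector>   // std::vector

// Joins path components with '/'.
std::string join_path(const std::vector<std::string>& parts) {
    assert(!parts.empty());  // callers must pass at least one component
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i > 0) out += '/';
        out += parts[i];
    }
    return out;
}
```

If `<cstddef>` were dropped, the file would probably still compile because `<string>` or `<vector>` happens to provide `std::size_t`, but IWYU would typically report the missing direct include.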
Follow-up to #17586.
This PR mainly changes:
- Remove env/
- Remove FileUtils/FilesystemUtils
  - Some methods are moved to LocalFileSystem
- Remove olap/file_cache
- Add s3 client cache for s3 file system
  - In my tests, this significantly reduces the time to open an S3 file
- Fix cold/hot separation bug for s3 fs.
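One way to picture the S3 client cache: clients are keyed by their connection settings and handed out as shared objects, so opening a file reuses an existing client instead of constructing a new one. A minimal sketch under assumptions: `S3ClientConf`, `S3ClientCache`, and `get_client` are hypothetical names, and `S3Client` is a placeholder for the real, expensive-to-construct SDK client.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical connection settings; file systems with identical settings share a client.
struct S3ClientConf {
    std::string endpoint;
    std::string region;
    std::string ak;
    std::string sk;

    bool operator==(const S3ClientConf& o) const {
        return endpoint == o.endpoint && region == o.region && ak == o.ak && sk == o.sk;
    }
};

struct S3ClientConfHash {
    std::size_t operator()(const S3ClientConf& c) const {
        std::hash<std::string> h;
        std::size_t seed = 0;
        // Combine the field hashes (boost::hash_combine-style mixing).
        auto mix = [&](const std::string& s) {
            seed ^= h(s) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
        };
        mix(c.endpoint);
        mix(c.region);
        mix(c.ak);
        mix(c.sk);
        return seed;
    }
};

// Placeholder for the real SDK client.
struct S3Client {
    explicit S3Client(S3ClientConf c) : conf(std::move(c)) {}
    S3ClientConf conf;
};

class S3ClientCache {
public:
    // Returns the cached client for `conf`, creating it on first use.
    std::shared_ptr<S3Client> get_client(const S3ClientConf& conf) {
        std::lock_guard<std::mutex> lock(_mutex);
        auto it = _clients.find(conf);
        if (it != _clients.end()) {
            return it->second;
        }
        auto client = std::make_shared<S3Client>(conf);
        _clients.emplace(conf, client);
        return client;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(_mutex);
        return _clients.size();
    }

private:
    mutable std::mutex _mutex;  // get_client may be called from many query threads
    std::unordered_map<S3ClientConf, std::shared_ptr<S3Client>, S3ClientConfHash> _clients;
};
```

Keying on the full configuration means that changed credentials produce a new client rather than silently reusing a stale one.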
This is the last PR for #17764.
After this, all IO operations should live in io/fs.
In addition to the tests in #17586, I also tested some cases related to fs IO:
- clone
- concurrent queries on local/s3/hdfs
- load error log creation and cleanup
- disk metrics
See #17764 for details.
I have tested:
- Unit test for local/s3/hdfs/broker file system: be/test/io/fs/file_system_test.cpp
- Outfile to local/s3/hdfs/broker.
- Load from local/s3/hdfs/broker.
- Query files on local/s3/hdfs/broker file systems, with table-valued functions and catalogs.
- Backup/Restore with local/s3/hdfs/broker file system
Not tested:
- cold & hot data separation.
* Add unit tests for cooldown on BE
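Since FileSystem inherits from std::enable_shared_from_this, it is dangerous to create a FileSystem object that is not owned by a std::shared_ptr (e.g. through a raw pointer): calling shared_from_this() on such an object is undefined behavior before C++17 and throws std::bad_weak_ptr since C++17.
To prevent this, the constructor of each XxxFileSystem is made private, and the static method create(...) is used to obtain a new FileSystem object.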
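The private-constructor-plus-factory pattern can be sketched as follows. `ExampleFileSystem` and its `root_path` parameter are hypothetical stand-ins for the real `XxxFileSystem` classes and their arguments. Note that `std::make_shared` cannot call a private constructor, so the factory uses `std::shared_ptr<T>(new T(...))`:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

class FileSystem : public std::enable_shared_from_this<FileSystem> {
public:
    virtual ~FileSystem() = default;
    virtual std::string root() const = 0;

protected:
    FileSystem() = default;
};

class ExampleFileSystem final : public FileSystem {
public:
    // The only way to obtain an instance: the object is owned by a shared_ptr
    // from the start, so shared_from_this() is always valid on it.
    static std::shared_ptr<ExampleFileSystem> create(std::string root_path) {
        // std::make_shared cannot access the private constructor.
        return std::shared_ptr<ExampleFileSystem>(new ExampleFileSystem(std::move(root_path)));
    }

    std::string root() const override { return _root; }

private:
    explicit ExampleFileSystem(std::string root_path) : _root(std::move(root_path)) {}
    std::string _root;
};
```

With this in place, `ExampleFileSystem fs("/tmp");` or `new ExampleFileSystem(...)` outside the class no longer compiles, which is exactly the misuse the change rules out.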
During the load process, the same operations, such as sort and aggregation, are performed on all replicas,
and these operations are resource-intensive.
Concurrent data loads can therefore consume a lot of CPU and memory.
It's better to perform the write process (writing data into the MemTable and then flushing it) on a single replica
and synchronize the data files to the other replicas before the transaction finishes.
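The idea can be illustrated with a toy model: one replica does the expensive MemTable work (sort and aggregation) and flushes a segment, and the other replicas only receive a copy of the resulting file. This is not the Doris implementation; a `std::map` stands in for the MemTable, a vector of rows for a segment file, SUM is assumed as the aggregation, and `Replica`, `write_on_leader`, and `sync_to_peers` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Row = std::pair<std::string, int64_t>;  // (key, value)
using Segment = std::vector<Row>;             // a flushed, sorted "file"

struct Replica {
    std::vector<Segment> segments;  // files this replica holds
    int memtable_builds = 0;        // how often this replica did sort/aggregate work
};

// Expensive path, done once: sort and aggregate (SUM) rows in a MemTable, then flush.
Segment write_on_leader(Replica& leader, const std::vector<Row>& batch) {
    std::map<std::string, int64_t> memtable;  // ordered map keeps rows sorted by key
    for (const Row& r : batch) memtable[r.first] += r.second;
    ++leader.memtable_builds;
    Segment seg(memtable.begin(), memtable.end());
    leader.segments.push_back(seg);
    return seg;
}

// Cheap path: the other replicas just receive the already-built file.
void sync_to_peers(const Segment& seg, std::vector<Replica*>& peers) {
    for (Replica* p : peers) p->segments.push_back(seg);
}
```

The trade-off is that the other replicas no longer repeat the CPU- and memory-heavy MemTable work, at the cost of transferring the finished files before the transaction can finish.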
1. Make version publish work in version order
2. Update the delete bitmap while publishing a version: load the primary keys of the current version's rowset
   and search for them in previous rowsets
3. Speed up the publish version task by publishing tablets in parallel
Co-authored-by: yixiutt <yixiu@selectdb.com>
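Items 1 to 3 can be sketched together on a simplified unique-key tablet: a version is published only right after its predecessor, publishing it marks older rows with the same primary key as deleted in earlier rowsets, and independent tablets are published in parallel. This is a model under stated assumptions, not the actual code: `Tablet`, `RowLocation`, `publish`, and `publish_all` are hypothetical names, a rowset is reduced to its list of primary keys, and the delete bitmap is a set of (version, row index) positions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <future>
#include <set>
#include <string>
#include <utility>
#include <vector>

using RowLocation = std::pair<int64_t, std::size_t>;  // (rowset version, row index)

struct Tablet {
    int64_t max_version = 0;
    std::vector<std::pair<int64_t, std::vector<std::string>>> rowsets;  // (version, keys)
    std::set<RowLocation> delete_bitmap;

    // Returns false if `version` is not the next one; the caller retries later.
    bool publish(int64_t version, std::vector<std::string> keys) {
        if (version != max_version + 1) return false;  // enforce version order
        // For each primary key of the new rowset, search the previous rowsets
        // and mark every older row with that key as deleted (set insertion is
        // idempotent, so rows already marked stay marked).
        for (const std::string& key : keys) {
            for (const auto& rs : rowsets) {
                const std::vector<std::string>& old_keys = rs.second;
                for (std::size_t i = 0; i < old_keys.size(); ++i) {
                    if (old_keys[i] == key) delete_bitmap.insert({rs.first, i});
                }
            }
        }
        rowsets.emplace_back(version, std::move(keys));
        max_version = version;
        return true;
    }
};

// One transaction touches many tablets; each publish task only touches its own
// tablet, so the tasks can run in parallel.
bool publish_all(std::vector<Tablet>& tablets, int64_t version,
                 const std::vector<std::string>& keys) {
    std::vector<std::future<bool>> tasks;
    for (Tablet& t : tablets) {
        tasks.push_back(std::async(std::launch::async,
                                   [&t, version, keys] { return t.publish(version, keys); }));
    }
    bool ok = true;
    for (auto& f : tasks) ok = f.get() && ok;
    return ok;
}
```

Enforcing order per tablet is what makes the bitmap update sound: when version N is published, every earlier rowset it must search is already visible.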
This PR supports rowset-level data upload on the BE side, so that a tablet can contain both cold data and hot data,
and there is no need to prohibit loading new data into cooled tablets.
Each rowset is bound to a `FileSystem`, so that the storage layer can read and write rowsets without
being aware of the underlying file system.
The abstracted `RemoteFileSystem` can try local caching strategies at different granularities,
instead of caching segment files as before.
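Binding each rowset to a `FileSystem` can be sketched as below: the storage layer reads a rowset's segment through whichever file system the rowset carries, so a cooled rowset on remote storage and a hot rowset on local disk are read the same way. The `read_file` interface, the in-memory `MemFileSystem` stand-in (used here for both local and remote storage), `Rowset`, and `read_segment` are hypothetical simplifications, not the actual `io/fs` API.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Minimal file system interface; real implementations would be local, S3, HDFS, ...
class FileSystem {
public:
    virtual ~FileSystem() = default;
    virtual std::string read_file(const std::string& path) const = 0;
};

// In-memory stand-in for both local disk and remote storage.
class MemFileSystem : public FileSystem {
public:
    void write_file(const std::string& path, std::string data) {
        _files[path] = std::move(data);
    }
    std::string read_file(const std::string& path) const override {
        auto it = _files.find(path);
        return it == _files.end() ? std::string() : it->second;
    }

private:
    std::map<std::string, std::string> _files;
};

// Each rowset remembers which file system holds its segment files.
struct Rowset {
    std::string segment_path;
    std::shared_ptr<FileSystem> fs;
};

// The storage layer never checks whether the rowset is hot (local) or cold (remote).
std::string read_segment(const Rowset& rs) {
    return rs.fs->read_file(rs.segment_path);
}
```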
To avoid conflicts with the code in be/src/io, we temporarily put the file system related code in the be/src/io/fs directory.
In the future, `FileReader`s and `FileWriter`s should be unified.