Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
Follow #17586.
This PR mainly changes:
Remove env/
Remove FileUtils/FilesystemUtils
Some methods are moved to LocalFileSystem
Remove olap/file_cache
Add s3 client cache for s3 file system
In my test, the time of open s3 file can be reduced significantly
Fix cold/hot separation bug for s3 fs.
This is the last PR of #17764.
After this, all IO operation should be in io/fs.
Except for tests in #17586, I also tested some case related to fs io:
clone
concurrency query on local/s3/hdfs
load error log create and clean
disk metrics
# Proposed changes
This PR fixed lots of issues when building from source on macOS with Apple M1 chip.
## ATTENTION
The job for supporting macOS with Apple M1 chip is too big and there are lots of unresolved issues during runtime:
1. Some errors with memory tracker occur when BE (RELEASE) starts.
2. Some UT cases fail.
...
Temporarily, the following changes are made on macOS to start BE successfully.
1. Disable memory tracker.
2. Use tcmalloc instead of jemalloc.
This PR kicks off the job. Guys who are interested in this job can continue to fix these runtime issues.
## Use case
```shell
./build.sh -j 8 --be --clean
cd output/be/bin
ulimit -n 60000
./start_be.sh --daemon
```
## Something else
It takes around _**10+**_ minutes to build BE (with prebuilt third-parties) on macOS with M1 chip. We will improve the development experience on macOS greatly when we finish the adaptation job.
This PR supports rowset level data upload on the BE side, so that there can be both cold data and hot data in a tablet,
and there is no necessary to prohibit loading new data to cooled tablets.
Each rowset is bound to a `FileSystem`, so that the storage layer can read and write rowsets without
perceiving the underlying filesystem.
The abstracted `RemoteFileSystem` can try local caching strategies with different granularity,
instead of caching segment files as before.
To avoid conflicts with the code in be/src/io, we temporarily put the file system related code in the be/src/io/fs directory.
In the future, `FileReader`s and `FileWriter`s should be unified.
BE can not graceful exit because some threads are running in endless
loop. This patch do the following optimization:
- Use the well encapsulated Thread and ThreadPool instead of std::thread
and std::vector<std::thread>
- Use CountDownLatch in thread's loop condition to avoid endless loop
- Introduce a new class Daemon for daemon works, like tcmalloc_gc,
memory_maintenance and calculate_metrics
- Decouple statistics type TaskWorkerPool and StorageEngine notification
by submit tasks to TaskWorkerPool's queue
- Reorder objects' stop and deconstruct in main(), i.e. stop network
services at first, then internal services
- Use libevent in pthreads mode, by calling evthread_use_pthreads(),
then EvHttpServer can exit gracefully in multi-threads
- Call brpc::Server's Stop() and ClearServices() explicitly
Now Env has unify all environment operation, such as file operation.
However some of our old functions don't leverage it. This change unify
FileUtils::scan_dir to use Env's function.
NOTE: This patch would modify all Backend's data.
And this will cause a very long time to restart be.
So if you want to interferer your product environment,
you should upgrade backend one by one.
1. Refactoring be is to clarify the structure the codes.
2. Use unique id to indicate a rowset.
Nameing rowset with tablet_id and version will lead to
many conflicts among compaction, clone, restore.
3. Extract an rowset interface to encapsulate rowsets
with different format.
Currently, the load error log on BE will be cleaned along with the
intermediate data of load, configured by 'load_data_reserve_hours'.
Sometimes user want to reserve the error log for longer time.
1. No one can set root password expect for root user itself
2. NODE_PRIV cannot be granted.
3. ADMIN_PRIV and GRANT_PRIV can only be granted or revoked on *.*
4. No one can modifly privs of default role 'operator' and 'admin'.
5. No user can be granted to role 'operator'.
Fixed: the running load limit should not be applied to replay logic. It will cause replay or loading image fail.
Changed: optimize the problem of too many directories under mini load directory.
Fixed: missing password and auth check when handling mini load request in Frontend.
Fixed: DomainResolver should start after Frontends transfer to a certain ROLE, not in Catalog construction methods.
Fixed: a stupid bug that no one can set password for root user... fix it: only root user can set password for root.
Fixed: read null data twice
When reading data with a null value, in some cases, the same data will be read twice by the storage engine,
resulting in a wrong result.The reason for this problem is that when splitting,
and the start key is the minimum value, the data with null is read.
Fixed: add a flag to prevent DomainResovler thread start twice.
Fixed: fixed a mem leak of using ByteBuf when parsing auth info of http request.
Fixed: add a new config 'disable_hadoop_load', default is false, set to true to disable hadoop load.
Changed: add detail error msg of submitting hadoop load job in show load result.
Fixed: Backend process should be crashed if failed to saving header.
Added: exposure backend info to user when encounter error on Backend. for debugging it more convenient.
Fixed: Should remove fd from map when inputstream or outputstream is closed in Broker process.
Fixed: Change all files' LF to unix format.
Internal commit id: merge from dfcd0aca18eed9ff99d188eb3d01c60d419be1b8