bp #36045, and turn on batch split, which is turn off in #36109
Generate and get split batch concurrently.
`SplitSource.getNextBatch` remove the synchronization, and make each get their splits concurrently, and `SplitAssignment` generates splits asynchronously.
- Add 2 new BE config
- `s3_read_base_wait_time_ms` and `s3_read_max_wait_time_ms`
When meet s3 429 error, the "get" request will
sleep `s3_read_base_wait_time_ms (*1, *2, *3, *4)` ms get try again.
The max sleep time is s3_read_max_wait_time_ms
and the max retry time is max_s3_client_retry
- Add more metrics for s3 file reader
- `s3_file_reader_too_many_request`: counter of 429 error.
- `s3_file_reader_s3_get_request`: the QPS of s3 get request.
- `TotalGetRequest`: Get request counter in profile
- `TooManyRequestErr`: 429 error counter in profile
- `TooManyRequestSleepTime`: Sum of sleep time after 429 error in profile
- `TotalBytesRead`: Total bytes read from s3 in profile
## Proposed changes
Issue Number: close #xxx
<!--Describe your changes.-->
## Further comments
If this is a relatively large or complex change, kick off the discussion
at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why
you chose the solution you did and what alternatives you considered,
etc...
1. `memory.usage_in_bytes ~= free.used + free.(buff/cache) - (buff)`, free cache can be reused,
so, modify cgroup_memory_usage = memory.usage_in_bytes - memory.meminfo["Cached"].
2. If system not configured with cgroup, find cgroup file path will failed, refactor refresh cgroup memory info, compatible with find failed.
Refactor the config for log dir of FE and BE
TLDR:
- Use env variable `LOG_DIR` to set root log dir
- Remove `sys_log_dir` for FE and BE
Details:
1. FE
1. The root log dir is set by env variable `LOG_DIR` in `fe.conf`
2. The default value of `audit_log_dir` is same as `${LOG_DIR}/`
3. The default value of `spark_launcher_log_dir` is `${LOG_DIR}/spark_launcher_log`
4. The default value of `nereids_trace_log_dir` is `${LOG_DIR}/nereids_trace_log`
5. The origin `sys_log_dir` is deprecated, and default value is `""`.
But for compatibility, if user already set `sys_log_dir` before, Doris will still use it as root log dir.
2. BE
1. The root log dir is set by env variable `LOG_DIR` in `be.conf`
2. Remove `pipeline_tracing_log_dir`, use `${LOG_DIR}` directly.
3. The origin `sys_log_dir` is deprecated, and default value is `""`.
But for compatibility, if user already set `sys_log_dir` before, Doris will still use it as root log dir.
* The default delete bitmap cache is set to 100MB, which can be insufficient and cause performance issues when the amount of user data is large. To mitigate the problem of an inadequate cache, we will take the larger of 5% of the total memory and 100MB as the delete bitmap cache size.
In previously, when enabling FQDN, Doris will call dns resolver to get IP from hostname
each time when 1) FE gets BE's grpc client. 2) BE gets other BE's brpc client.
So when in high concurrency case, the dns resolver be overloaded and failed to resolve hostname.
This PR mainly changes:
1. Add DNSCache for both FE and BE.
The DNSCache will run on every FE and BE node. It has a cache, key is hostname and value is IP.
Caller can get IP by hostname from this cache, and if hostname does not exist, it will try to resolve it
and update the cache.
In addition, DNSCache has a daemon thread to refresh the cache every 1 min, in case that the IP may
be changed at anytime.
There are other implements of this dns cache:
1. 36fed13997
This is for BE side, but it does not handle the IP change case.
3. https://github.com/apache/doris/pull/28479
This is for FE side, but it can only work with Master FE. Other FE node will not be aware of the IP change.
And there are a bunch of BackendServiceProxy, this PR only handle cache in one of them.
File meta cache on BE is used to cache the meta for external table's file such as parquet footer.
This cache is counted by number, not memory consumption.
So if the cache object is big(eg, a large parquet footer), the total memory consumption of this cache
will be large and causing OOM.
This PR mainly changes:
1. Add a new method `exceed_prune_limit()` for `CachePolicy`
For `ObjLRUCache`, it always return true so that the minor of full gc on BE will prune the cache each time.
2. Reduce the default capability of file meta cache, from 20000 to 1000
Also change the default capability of hdfs file handle cache, from 20000 to 1000
4. Change judgement of whether enable file meta cache when querying
If the number of file need to be read is larger than the 1/3 of the file meta cache's capability, file meta cache
will be disabled for this query. Because cache is useless if there are too many files.
* (feature)(cloud) Use dynamic allocator instead of static buffer pool for better elasticity. (#28983)
* [fix](outfile) Fix unable to export empty data (#30703)
Issue Number: close#30600
Fix unable to export empty data to hdfs / S3, this behavior is inconsistent with version 1.2.7,
version 1.2.7 can export empty data to hdfs/ S3, and there will be exported files on S3/HDFS.
* [fix](file-writer) avoid empty file for segment writer (#31169)
---------
Co-authored-by: AlexYue <yj976240184@gmail.com>
Co-authored-by: zxealous <zhouchangyue@baidu.com>