For an HDFS TVF like:
```
select count(*) from hdfs(
"uri" = "hdfs://HDFS8000871/path/to/1.parquet",
"fs.defaultFS" = "hdfs://HDFS8000871/",
"format" = "parquet"
);
```
Previously, if `fs.defaultFS` ended with `/`, the query would fail with an error like:
```
reason: RemoteException: File does not exist: /user/doris/path/to/1.parquet
```
You can see the path is wrong: it is resolved with the unexpected prefix `/user/doris`.
Users had to set `fs.defaultFS` to `hdfs://HDFS8000871` (without the trailing slash) to avoid this error.
This PR fixes this issue.
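To make the failure mode concrete, here is a rough sketch (not the actual Doris code; the helper name and structure are made up for illustration): if the `fs.defaultFS` prefix is not stripped cleanly, the remaining path loses its leading `/`, and HDFS resolves the relative path against the user's home directory, which is where the `/user/doris` prefix comes from.
```
#include <string>

// Hypothetical helper for illustration only (not the actual Doris fix):
// extract the path from the full URI using the fs.defaultFS prefix, and keep
// it absolute even when fs.defaultFS ends with '/'. A relative path would be
// resolved by HDFS against the user's home directory, e.g. /user/doris.
std::string extract_hdfs_path(const std::string& uri, std::string fs_prefix) {
    // "hdfs://HDFS8000871/" -> "hdfs://HDFS8000871"
    while (!fs_prefix.empty() && fs_prefix.back() == '/') {
        fs_prefix.pop_back();
    }
    std::string path = uri;
    if (uri.compare(0, fs_prefix.size(), fs_prefix) == 0) {
        path = uri.substr(fs_prefix.size());
    }
    // Keep the path absolute so HDFS does not treat it as relative.
    if (path.empty() || path.front() != '/') {
        path = "/" + path;
    }
    return path; // "/path/to/1.parquet"
}
```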
Previously, when FQDN was enabled, Doris called the DNS resolver to turn a hostname into an IP
every time 1) the FE got a BE's gRPC client, or 2) a BE got another BE's bRPC client.
So under high concurrency the DNS resolver could be overloaded and fail to resolve hostnames.
This PR mainly changes:
1. Add a DNSCache for both FE and BE (see the sketch below).
The DNSCache runs on every FE and BE node. It holds a cache whose keys are hostnames and values are IPs.
Callers get an IP by hostname from this cache; if the hostname is not cached yet, the cache resolves it
and stores the result.
In addition, DNSCache has a daemon thread that refreshes the cache every 1 minute, in case an IP
changes at any time.
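Below is a minimal sketch of such a cache, assuming a simple map guarded by a mutex plus a background refresh thread; the class layout and the `resolve()` helper are illustrative, not the actual Doris implementation.
```
#include <atomic>
#include <chrono>
#include <map>
#include <mutex>
#include <string>
#include <thread>

// Illustrative sketch of a hostname -> IP cache with a background refresh
// thread; names and details are assumptions, not the exact Doris code.
class DNSCache {
public:
    DNSCache() : _refresher(&DNSCache::_refresh_loop, this) {}
    ~DNSCache() {
        _stop = true;
        _refresher.join();
    }

    // Return the cached IP; on a miss, resolve the hostname and cache it.
    std::string get(const std::string& hostname) {
        {
            std::lock_guard<std::mutex> l(_mu);
            auto it = _cache.find(hostname);
            if (it != _cache.end()) return it->second;
        }
        std::string ip = resolve(hostname);
        std::lock_guard<std::mutex> l(_mu);
        _cache[hostname] = ip;
        return ip;
    }

private:
    // Re-resolve all cached hostnames roughly every minute, since an IP may
    // change at any time. Sleeps in 1s slices so shutdown stays responsive.
    void _refresh_loop() {
        int elapsed = 0;
        while (!_stop) {
            std::this_thread::sleep_for(std::chrono::seconds(1));
            if (++elapsed < 60) continue;
            elapsed = 0;
            std::lock_guard<std::mutex> l(_mu);
            for (auto& entry : _cache) {
                entry.second = resolve(entry.first);
            }
        }
    }

    // Stand-in for a real DNS lookup (e.g. via getaddrinfo()).
    static std::string resolve(const std::string& hostname) { return hostname; }

    std::mutex _mu;
    std::map<std::string, std::string> _cache;
    std::atomic<bool> _stop {false};
    std::thread _refresher;
};
```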
There are other implementations of this DNS cache:
1. 36fed13997
This is for the BE side, but it does not handle the IP change case.
2. https://github.com/apache/doris/pull/28479
This is for the FE side, but it only works on the Master FE; other FE nodes will not be aware of the IP change.
Also, there are a number of `BackendServiceProxy` instances, and that PR only handles the cache in one of them.
The file meta cache on the BE caches metadata of external tables' files, such as Parquet footers.
This cache is limited by entry count, not by memory consumption.
So if a cached object is big (e.g. a large Parquet footer), the total memory consumption of this cache
can become large and cause an OOM.
This PR mainly changes:
1. Add a new method `exceed_prune_limit()` to `CachePolicy` (illustrated in the sketch after this list).
For `ObjLRUCache`, it always returns true, so that the minor or full GC on the BE prunes this cache every time.
2. Reduce the default capacity of the file meta cache from 20000 to 1000.
Also change the default capacity of the HDFS file handle cache from 20000 to 1000.
3. Change the judgment of whether to enable the file meta cache when querying.
If the number of files to read is larger than 1/3 of the file meta cache's capacity, the file meta cache
is disabled for that query, because the cache is useless when there are too many files.
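The sketch below illustrates the pruning hook and the per-query judgment described above; the type names, the `enable_file_meta_cache_for_query` helper, and the exact signatures are assumptions for illustration, not the actual Doris code.
```
#include <cstddef>

// Illustrative sketch only; class and function names are assumptions.
// A cache reports whether the BE's minor/full GC should prune it.
struct CachePolicySketch {
    virtual ~CachePolicySketch() = default;
    virtual bool exceed_prune_limit() { return false; }
};

// An object-count-based cache cannot reason about its memory footprint,
// so it always allows pruning, mirroring the ObjLRUCache behavior above.
struct ObjLRUCacheSketch : CachePolicySketch {
    bool exceed_prune_limit() override { return true; }
};

// Per-query judgment: disable the file meta cache when the query reads more
// than 1/3 of the cache's capacity, since entries would be evicted before reuse.
bool enable_file_meta_cache_for_query(size_t file_count, size_t cache_capacity) {
    return file_count <= cache_capacity / 3;
}
```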
Previously, the counters in `profile` could be updated when closing a file reader,
and the file reader could be closed while the object was being destructed.
But at that time the `profile` object may already have been deleted, causing a null pointer access and a BE crash.
This PR tries to fix this issue:
1. Remove the "profile counter update" logic from all `close()` methods.
2. Add a new interface `ProfileCollector` (sketched after this list).
It has 2 methods:
- `collect_profile_at_runtime()`
It can be called at runtime, e.g. in every `get_next_block()` method,
so that the counters in the profile can be updated at runtime.
- `collect_profile_before_close()`
It should be called before the object's `close()` is called, and it will only be called once.
3. Derive from `ProfileCollector`.
All classes that may update profile counters in their `close()` method should extend
`ProfileCollector` (such as `GenericReader`) and implement `collect_profile_before_close()`.
`collect_profile_before_close()` is called in `scanner->mark_to_need_to_close()`.
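Here is a minimal sketch of what such an interface could look like, assuming a once-only guard around the before-close collection; apart from the method names taken from the description, the details are illustrative rather than the actual Doris implementation.
```
#include <atomic>

// Illustrative sketch of the ProfileCollector interface; method names follow
// the description above, the surrounding details are assumptions.
class ProfileCollector {
public:
    virtual ~ProfileCollector() = default;

    // May be called repeatedly at runtime, e.g. from get_next_block(),
    // to push the current counters into the profile.
    virtual void collect_profile_at_runtime() = 0;

    // Must run before close(); guarded so that repeated calls are no-ops.
    void collect_profile_before_close() {
        if (!_collected.exchange(true)) {
            _collect_profile_before_close_impl();
        }
    }

protected:
    virtual void _collect_profile_before_close_impl() = 0;

private:
    std::atomic<bool> _collected {false};
};

// A reader that used to update profile counters in close() now updates them
// through the interface instead, before the profile can be destroyed.
class GenericReaderSketch : public ProfileCollector {
public:
    void collect_profile_at_runtime() override { /* update runtime counters */ }

protected:
    void _collect_profile_before_close_impl() override { /* flush final counters */ }
};
```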
In order to add common code to the value deleter of the LRU cache, all LRU cache values now inherit from the `LRUCacheValueBase` class, and memory is tracked in its destructor.
After all LRU caches inherit from `LRUCachePolicy`, they gain pruning of stale entries, eviction when memory exceeds the limit, and common properties. The `LRUCache` constructor is changed to private, so only `LRUCachePolicy` can construct it.
Implement `DummyLRUCache`: when an LRU cache's capacity is 0, there are no longer meaningless inserts and evictions.
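A minimal sketch of how these pieces could fit together, assuming simplified signatures; only the class names come from the description, everything else is illustrative.
```
#include <cstddef>
#include <string>

// Illustrative sketch; names mirror the description, details are assumptions.

// Common base for all LRU cache values: memory accounting is released in the
// destructor, so the cache's value deleter can share one code path.
struct LRUCacheValueBase {
    virtual ~LRUCacheValueBase() {
        // e.g. release tracked_bytes from the cache's memory tracker here
    }
    size_t tracked_bytes = 0;
};

// Minimal cache interface to show where DummyLRUCache fits in.
class LRUCachePolicySketch {
public:
    virtual ~LRUCachePolicySketch() = default;
    virtual void insert(const std::string& key, LRUCacheValueBase* value) = 0;
    virtual LRUCacheValueBase* lookup(const std::string& key) = 0;
};

// Used when the configured capacity is 0: insert frees the value immediately
// and lookup always misses, so no insert/evict bookkeeping is wasted.
class DummyLRUCache : public LRUCachePolicySketch {
public:
    void insert(const std::string&, LRUCacheValueBase* value) override { delete value; }
    LRUCacheValueBase* lookup(const std::string&) override { return nullptr; }
};
```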