In previously, when enabling FQDN, Doris will call dns resolver to get IP from hostname
each time when 1) FE gets BE's grpc client. 2) BE gets other BE's brpc client.
So when in high concurrency case, the dns resolver be overloaded and failed to resolve hostname.
This PR mainly changes:
1. Add DNSCache for both FE and BE.
The DNSCache will run on every FE and BE node. It has a cache, key is hostname and value is IP.
Caller can get IP by hostname from this cache, and if hostname does not exist, it will try to resolve it
and update the cache.
In addition, DNSCache has a daemon thread to refresh the cache every 1 min, in case that the IP may
be changed at anytime.
There are other implements of this dns cache:
1. 36fed13997
This is for BE side, but it does not handle the IP change case.
3. https://github.com/apache/doris/pull/28479
This is for FE side, but it can only work with Master FE. Other FE node will not be aware of the IP change.
And there are a bunch of BackendServiceProxy, this PR only handle cache in one of them.
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
Sometimes we want to detect the hotspot of a cluster, for example, hot scanned tablet, hot wrote tablet,
but we have no insight about tablets in the cluster.
This patch introduce tablet level metrics to help to achieve this object, now support 4 metrics on tablets: `query_scan_bytes `, `query_scan_rows `, `flush_bytes `, `flush_count `.
However, one BE may holds hundreds of thousands of tablets, so I add a parameter for the metrics HTTP request,
and not return tablet level metrics by default.
Redesign metrics to 3 layers:
MetricRegistry - MetricEntity - Metrics
MetricRegistry : the register center
MetricEntity : the entity registered on MetricRegistry. Generally a MetricRegistry can be registered on several
MetricEntities, each of MetricEntity is an independent entity, such as server, disk_devices, data_directories, thrift
clients and servers, and so on.
Metric : metrics of an entity. Such as fragment_requests_total on server entity, disk_bytes_read on a disk_device entity,
thrift_opened_clients on a thrift_client entity.
MetricPrototype: the type of a metric. MetricPrototype is a global variable, can be shared by the same metrics across
different MetricEntities.
Add a JSON format for existing metrics like this.
```
{
"tags":
{
"metric":"thread_pool",
"name":"thrift-server-pool",
"type":"active_thread_num"
},
"unit":"number",
"value":3
}
```
I add a new JsonMetricVisitor to handle the transformation.
It's not to modify existing PrometheusMetricVisitor and SimpleCoreMetricVisitor.
Also I add
1. A unit item to indicate the metric better
2. Cloning tablet statistics divided by database.
3. Use white space to replace newline in audit.log