# Proposed changes
This PR fixed lots of issues when building from source on macOS with Apple M1 chip.
## ATTENTION
The job for supporting macOS with Apple M1 chip is too big and there are lots of unresolved issues during runtime:
1. Some errors with memory tracker occur when BE (RELEASE) starts.
2. Some UT cases fail.
...
Temporarily, the following changes are made on macOS to start BE successfully.
1. Disable memory tracker.
2. Use tcmalloc instead of jemalloc.
This PR kicks off the job. Guys who are interested in this job can continue to fix these runtime issues.
## Use case
```shell
./build.sh -j 8 --be --clean
cd output/be/bin
ulimit -n 60000
./start_be.sh --daemon
```
## Something else
It takes around _**10+**_ minutes to build BE (with prebuilt third-parties) on macOS with M1 chip. We will improve the development experience on macOS greatly when we finish the adaptation job.
tcmalloc/jemalloc allocator cache does not participate in the mem check as part of the process physical memory.
because new/malloc will trigger mem hook when using tcmalloc/jemalloc allocator cache, but it may not actually alloc physical memory, which is not expected in mem hook fail.
in addition:
The value of tcmalloc/jemalloc allocator cache is used as a mem tracker, the parent is the process mem tracker, which is updated every 1s.
Modify the process default mem_limit to 90%. expect mem tracker to effectively limit the memory usage of the process.
1. add some metrics for cpu monitor;
2. add metrics for process state monitor;
3. add metrics for memory monitor;
It is convenient for us to use grafana to filter through different conditions.
After the added, we can find the cpu metrics like this:
doris_be_cpu{device="cpu1",mode="guest_nice"} 0
doris_be_cpu{device="cpu1",mode="guest"} 0
doris_be_cpu{device="cpu1",mode="steal"} 0
doris_be_cpu{device="cpu1",mode="soft_irq"} 107168
doris_be_cpu{device="cpu1",mode="irq"} 0
doris_be_cpu{device="cpu1",mode="iowait"} 3726931
doris_be_cpu{device="cpu1",mode="idle"} 2358039214
doris_be_cpu{device="cpu1",mode="system"} 58699464
doris_be_cpu{device="cpu1",mode="nice"} 1700438
doris_be_cpu{device="cpu1",mode="user"} 54974091
we can find the memory metrics as follow:
doris_be_memory_pswpin 167785
doris_be_memory_pswpout 203724
doris_be_memory_pgpgin 22308762092
doris_be_memory_pgpgout 152101956232
we also can find the process metrics as follow:
doris_be_proc{mode="interrupt"} 421721020416
doris_be_proc{mode="ctxt_switch"} 2806640907317
doris_be_proc{mode="procs_running"} 8
doris_be_proc{mode="procs_blocked"} 3
Sometimes we want to detect the hotspot of a cluster, for example, hot scanned tablet, hot wrote tablet,
but we have no insight about tablets in the cluster.
This patch introduce tablet level metrics to help to achieve this object, now support 4 metrics on tablets: `query_scan_bytes `, `query_scan_rows `, `flush_bytes `, `flush_count `.
However, one BE may holds hundreds of thousands of tablets, so I add a parameter for the metrics HTTP request,
and not return tablet level metrics by default.
Redesign metrics to 3 layers:
MetricRegistry - MetricEntity - Metrics
MetricRegistry : the register center
MetricEntity : the entity registered on MetricRegistry. Generally a MetricRegistry can be registered on several
MetricEntities, each of MetricEntity is an independent entity, such as server, disk_devices, data_directories, thrift
clients and servers, and so on.
Metric : metrics of an entity. Such as fragment_requests_total on server entity, disk_bytes_read on a disk_device entity,
thrift_opened_clients on a thrift_client entity.
MetricPrototype: the type of a metric. MetricPrototype is a global variable, can be shared by the same metrics across
different MetricEntities.
Fix#3920
CL:
1. Parse the TCP metrics header in `/proc/net/snmp` to get the right position of the metrics.
2. Add 2 new metrics: `tcp_in_segs` and `tcp_out_segs`
Add a JSON format for existing metrics like this.
```
{
"tags":
{
"metric":"thread_pool",
"name":"thrift-server-pool",
"type":"active_thread_num"
},
"unit":"number",
"value":3
}
```
I add a new JsonMetricVisitor to handle the transformation.
It's not to modify existing PrometheusMetricVisitor and SimpleCoreMetricVisitor.
Also I add
1. A unit item to indicate the metric better
2. Cloning tablet statistics divided by database.
3. Use white space to replace newline in audit.log
* Enhence the usabilities
1. Add metrics to monitor transactions and steaming load process in BE.
2. Modify BE config 'result_buffer_cancelled_interval_time' to 300s.
3. Modify FE config 'enable_metric_calculator' to true.
4. Add more log for tracing broker load process.
5. Modify the query report process, to cancel query immediately if some instance failed.
* Fix bugs
1. Avoid NullPointer when enabling colocation join with broker load
2. Return immediately when pull load task coordinator execution failed
1. max io util of disks
2. max network send/receive bytes rate of all network devices
3. base/cumulative compaction request counter and failure counter
Added: Support getting column size and precision info of table or view using JDBC.
Updated: Change the promethues type name GAUGE to lowercase, to fit the latest promethues version.
Updated: Backend ip saved in FE will be compared with BE's local ip when doing heartbeat, to avoid false positive heartbeat response.
Updated: Using version_num of tablet instead of calculating nice value to select cumulative compaction candicates.
Fixed: Predicates should not be pushed down to subquery which contains limit clause.
Fixed: Fix the formula of calculating BE load score.
Fixed: Fix a bug that in some edge cases, non-master Fontend may wait for a unnecessary long timeout after forwarding cmd to Master FE.
Fixed: A bug that granting privs on more than one table does not work.
Fixed: Support 'Insert into' table which contains HLL columns.
Fixed: ExportStmt' toSql() method may throw NullPointer Exception if table does not exist.
Fixed: Remove unnecessary 'get capacity' operation to avoid IO impact.
Internal commit id: merge to c16bd603a53dfe2089ff95704c698a738c317792