* Add `-XX:+PrintClassHistogramAfterFullGC` for JAVA_OPTS
* Add `classhisto*=trace` for JAVA_OPTS_FOR_JDK_17
fe.gc.log will print like this:
```
2024-11-15T11:49:00.316+0800: 11.346: [Class Histogram (after full gc):
num #instances #bytes class name
----------------------------------------------
1: 7464 7053464 [B
2: 37465 3656360 [C
3: 7076 2909880 [Ljava.lang.Object;
4: 4915 2306872 [I
5: 9167 1719552 [S
6: 16229 1168488 io.grpc.netty.shaded.io.netty.buffer.PoolSubpage
......
```
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
### Release note
None
### Check List (For Author)
- Test <!-- At least one of them must be included. -->
- [ ] Regression test
- [ ] Unit Test
- [ ] Manual test (add detailed scripts or steps below)
- [x] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
- [x] Previous test can cover this change.
- [x] No code files have been changed.
- [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
- [ ] No.
- [ ] Yes. <!-- Explain the behavior change -->
- Does this need documentation?
- [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg:
https://github.com/apache/doris-website/pull/1214 -->
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
bp #45148
### What problem does this PR solve?
Problem Summary:
Optimize reading of maxcompute partition tables:
1. Introduce batch mode to generate splits for Maxcompute partition
tables to optimize scenarios with a large number of partitions. Control
it through the variable `num_partitions_in_batch_mode`.
2. Introduce catalog parameter `mc.split_cross_partition`. The parameter
is true, which is more friendly to reading partition tables, and false,
which is more friendly to debug.
3. Add `-Darrow.enable_null_check_for_get=false` to be jvm to improve
the efficiency of mc arrow data conversion.
bp #40225 , #40888 ,#41386
## Proposed changes
Among them, #40225 is the new api of mc,
#40888 is used to fix the bug when reading null between the new and old
apis,
#41386 is used for compatibility between the new and old versions
Refactor the config for log dir of FE and BE
TLDR:
- Use env variable `LOG_DIR` to set root log dir
- Remove `sys_log_dir` for FE and BE
Details:
1. FE
1. The root log dir is set by env variable `LOG_DIR` in `fe.conf`
2. The default value of `audit_log_dir` is same as `${LOG_DIR}/`
3. The default value of `spark_launcher_log_dir` is `${LOG_DIR}/spark_launcher_log`
4. The default value of `nereids_trace_log_dir` is `${LOG_DIR}/nereids_trace_log`
5. The origin `sys_log_dir` is deprecated, and default value is `""`.
But for compatibility, if user already set `sys_log_dir` before, Doris will still use it as root log dir.
2. BE
1. The root log dir is set by env variable `LOG_DIR` in `be.conf`
2. Remove `pipeline_tracing_log_dir`, use `${LOG_DIR}` directly.
3. The origin `sys_log_dir` is deprecated, and default value is `""`.
But for compatibility, if user already set `sys_log_dir` before, Doris will still use it as root log dir.
Issue Number: close#30484
problem:
gson will use Java's reflection mechanism to generate a default Adapter, but JDK17 is prohibited from visiting such an access.
solution:
gson has provided solutions since 2.9.1, which can bypass this problem: Add support for reflection access filter by Marcono1234 · Pull Request #1905 · google/gson
We need to upgrade the gson version and use this solution
tc/jemalloc_free_memory in web beip:8040/mem_tracker is the cache size of Jemalloc.
Previously lg_tcache_max:20, this will cache up to 1M Bin in the thread cache, which will cause the Jemalloc cache to be too large in some scenarios.
Restore the default lg_tcache_max:16, which can cache a maximum of 64K Bins.
If you are doing a performance POC, you can consider increasing it.
Fix hadoop short circuit reading can not enabled in some environments.
- Revert #21430 because it will cause performance degradation issue.
- Add `$HADOOP_CONF_DIR` to `$CLASSPATH`.
- Remove empty `hdfs-site.xml`. Because in some environments it will cause hadoop short circuit reading can not enabled.
- Copy the hadoop common native libs(which is copied from https://github.com/apache/doris-thirdparty/pull/98
) and add it to `LD_LIBRARY_PATH`. Because in some environments `LD_LIBRARY_PATH` doesn't contain hadoop common native libs, which will cause hadoop short circuit reading can not enabled.
As we know, log4j2 some times may be bottleneck in doris fe when there are many logs to be output in sync mode while asynchronous logging has a better performance, and we find that capturing caller location has a similar impact across all logging libraries, and slows down asynchronous logging by about 30-100x. so, here we provide three log mode for log4j2 to meet the needs of different users.
refer to https://logging.apache.org/log4j/2.x/performance.html
1. cast string literal to date like type should not be an implict cast
2. the string representation of float like type should not be scientific notation
3. the data type of like function's regex expr should be string type even if it's a null literal
4. add -Xss4m in fe.conf to prevent stack overflow in some case