Add file cache metrics and management.
1. Get file cache metrics
> If the performance of file cache is not efficient, there are currently no metrics to investigate the cause. In practice, hit ratio, disk usage, and segments removed status are very important information.
API: `http://be_host:be_webserver_port/metrics`
File cache metrics for each base path start with `doris_be_file_cache_` prefix. `hits_ratio` is the hit ratio of the cache since BE startup; `removed_elements` is the num of removed segment files since BE startup; Every cache path has three queues: index, normal and disposable. The capacity ratio of the three queues is 1:17:2.
```
doris_be_file_cache_hits_ratio{path="/mnt/datadisk1/gaoxin/file_cache"} 0.500000
doris_be_file_cache_hits_ratio{path="/mnt/datadisk1/gaoxin/small_file_cache"} 0.500000
doris_be_file_cache_removed_elements{path="/mnt/datadisk1/gaoxin/file_cache"} 0
doris_be_file_cache_removed_elements{path="/mnt/datadisk1/gaoxin/small_file_cache"} 0
doris_be_file_cache_normal_queue_max_size{path="/mnt/datadisk1/gaoxin/file_cache"} 912680550400
doris_be_file_cache_normal_queue_max_size{path="/mnt/datadisk1/gaoxin/small_file_cache"} 8500000000
doris_be_file_cache_normal_queue_max_elements{path="/mnt/datadisk1/gaoxin/file_cache"} 217600
doris_be_file_cache_normal_queue_max_elements{path="/mnt/datadisk1/gaoxin/small_file_cache"} 102400
doris_be_file_cache_normal_queue_curr_size{path="/mnt/datadisk1/gaoxin/file_cache"} 14129846
doris_be_file_cache_normal_queue_curr_size{path="/mnt/datadisk1/gaoxin/small_file_cache"} 14874904
doris_be_file_cache_normal_queue_curr_elements{path="/mnt/datadisk1/gaoxin/file_cache"} 18
doris_be_file_cache_normal_queue_curr_elements{path="/mnt/datadisk1/gaoxin/small_file_cache"} 22
...
```
2. Release file cache
> Frequent segment files swapping can seriously affect the performance of file cache. Adding a deletion interface helps users clean up the file cache.
API: `http://be_host:be_webserver_port/api/file_cache?op=release&base_path=${file_cache_base_path}`
Return the number of released segment files. If `base_path` is not provide in url, all cache paths will be released.
It's thread-safe to call this api, so only the segment files not been read currently can be released.
```
{"released_elements":22}
```
3. Specify the base path to store cache data
> Currently, regression testing lacks test cases of file cache, which cannot guarantee the stability of file cache. This interface is generally used in regression testing scenarios. Different queries use different paths to verify different usage cases and performance.
User can set session variable `file_cache_base_path` to specify the base path to store cache data. `file_cache_base_path="random"` as default, means chosing a random path from cached paths to store cache data. If `file_cache_base_path` is not one of the base paths in BE configuration, a random path is used.
Prevent BE to be recompiled many files:
When we execute build.sh, it clean thrift code so that BE will be recompiled many files. It is added by this pr
#19217
We can use build.sh --clean to clean the thrift code. No need to clean it every time.
Co-authored-by: yiguolei <yiguolei@gmail.com>
Currently, exec node save exprcontext**, but the object is in object pool, the code is very unclear. we could just use exprcontext*.
bind netty-version to 4.1.89-final
bind jettison to 1.5.4
upgrade hadoop version to 3.3.5
upgrade range-plugins-common to 2.4.0
bind bcprov-jdk15on to 2.4.0
upgrade and bind woodstox to 6.5.1
upgrade and bind kerby to 2.0.3
upgrade hudi to 0.13.0
upgrade parquet to 1.13.0
upgrade maven-source-plugin to 3.2.1
upgrade maven-assembly-plugin to 3.3.0
upgrade maven-javadoc-plugin to 3.3.2
upgrade maven-shade-plugin to 3.3.4
upgrade maven-clean-plugin to 3.1.0
Remove meaningless plugins
Optimize doris maven path
Unify the Java modules for management in fe
TOperationStatus determines that a null pointer is redundant. If tOperationStatus is a null pointer, then tOperationStatus.getMessage() will have a null pointer exception.
In my scene, We need to specify databases that are excluded to synchronize to doris,
like some databases store temporary table.
Since #17803 introduce `specified_database_list` to specify 'include databases',
this pr introduce new config `exclude_database_list` to specify 'exclude databases',
and rename `specified_database_list` to `include_database_list` for naming symmetry.
BTW, when `include_database_list` and `exclude_database_list` specify overlapping databases, `exclude_database_list` would take effect with higher privilege over `include_database_list`.
When salve FE nodes replay OP_REFRESH_EXTERNAL_TABLE log, it will invoke `org.apache.doris.datasource.hive.HiveMetaStoreCache#invalidateTableCache`,
but if the table is a non-partitioned table, it will invoke `catalog.getClient().getTable`.
If some network problem occurs or this table is not existed, an exception will be thrown and FE will exit right away.
The solution is that we can use a dummy key as the file cache key which only contains db name and table name.
And when slave FE nodes replay OP_REFRESH_EXTERNAL_TABLE log,
it will not rely on the hms client and there will not any exception occurs.
LSC updates tablet's schema in writing. Be optimized adding columns via linked schema change and
it distinguishes adding by comparing column name. e.g. if new column's name is not found in old schema,
then it is a newly-add column.
When a table is under schema-changing, it adds __doris_shadow_ prefix in name of columns in shadow index.
Then writes during schema-changing would bring schema with __doris_shadow_ to be.
If schema change request arrives at be after writes, then be do it as a add-column schema change due to
__doris_shadow_ is not in base tablet.
Sometimes the dict is not initialized when run comparison predicate here, for example, the full page is null, then the reader will skip read, so that the dictionary is not inited. The cached code is wrong during this case, because the following page maybe not null, and the dict should have items in the future.
This will result the dict string column query return wrong result, if there are many null values in the column.
I also add some regression test for dict column's equal query, larger than query, less than query.
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
When running regression test with setting enable_resource_group = true, it's shared by other test case, may be cause regression test failed.
So we should not set it to true until we have fully test it.