1. Remove `kinit` from BE and log in through JNI with a keytab instead, because `kinit` cannot renew the TGT for Doris in many complex cases.
> This pull request adds support for creating a new instance from a keytab: https://github.com/apache/doris-thirdparty/pull/173, so the `kinit` command is no longer needed; we just log in with a keytab and principal.
2. Add `kerberos_ccache_path` to set the Kerberos credentials cache path manually.
3. Add `max_hdfs_file_handle_cache_time_ms` to set the HDFS fs handle cache time.
Once all LRU caches inherit from LRUCachePolicy, it becomes possible to prune stale entries, evict when memory exceeds the limit, and define common properties in one place. The LRUCache constructor is changed to private, so that only LRUCachePolicy can construct it.
Implement DummyLRUCache: when the LRU cache capacity is 0, it no longer performs meaningless inserts and evictions.
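
The zero-capacity case can be sketched as a null-object cache. This is only an illustration: `SimpleCache`, `SimpleLRUCache`, `DummyCache` and `make_cache` are hypothetical names, and the interface is far smaller than Doris' real cache classes.

```cpp
#include <cstddef>
#include <list>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical minimal cache interface (not the Doris Cache API).
class SimpleCache {
public:
    virtual ~SimpleCache() = default;
    virtual void insert(const std::string& key, int value) = 0;
    // Returns true and fills *value on a hit, false on a miss.
    virtual bool lookup(const std::string& key, int* value) = 0;
    virtual std::size_t size() const = 0;
};

// Ordinary LRU cache: the most recently used entry sits at the list front.
class SimpleLRUCache : public SimpleCache {
public:
    explicit SimpleLRUCache(std::size_t capacity) : _capacity(capacity) {}

    void insert(const std::string& key, int value) override {
        auto it = _index.find(key);
        if (it != _index.end()) {
            // Replace an existing entry for the same key.
            _entries.erase(it->second);
            _index.erase(it);
        }
        _entries.emplace_front(key, value);
        _index[key] = _entries.begin();
        if (_entries.size() > _capacity) {
            // Evict the least recently used entry from the back.
            _index.erase(_entries.back().first);
            _entries.pop_back();
        }
    }

    bool lookup(const std::string& key, int* value) override {
        auto it = _index.find(key);
        if (it == _index.end()) return false;
        // Move the hit entry to the front; the iterator stays valid.
        _entries.splice(_entries.begin(), _entries, it->second);
        *value = it->second->second;
        return true;
    }

    std::size_t size() const override { return _entries.size(); }

private:
    using Entry = std::pair<std::string, int>;
    std::size_t _capacity;
    std::list<Entry> _entries;
    std::unordered_map<std::string, std::list<Entry>::iterator> _index;
};

// Null-object cache: stores nothing, so there is never anything to evict.
class DummyCache : public SimpleCache {
public:
    void insert(const std::string&, int) override {}
    bool lookup(const std::string&, int*) override { return false; }
    std::size_t size() const override { return 0; }
};

// Hypothetical factory: capacity 0 selects the dummy implementation.
std::unique_ptr<SimpleCache> make_cache(std::size_t capacity) {
    if (capacity == 0) return std::make_unique<DummyCache>();
    return std::make_unique<SimpleLRUCache>(capacity);
}
```

With this shape, callers keep a single code path and never need to special-case a disabled cache.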
Normally we write the separate index files to disk before we merge the index files into an idx compound file.
In high-frequency load scenarios, disk IO can become a bottleneck.
To reduce the pressure on the disk, we first write the separate index files to a RAM directory, and only write to disk when merging them into the compound file.
Add config `index_inverted_index_by_ram_dir_enable`, which defaults to `false`.
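
The idea can be sketched as follows. This is a toy model, not the Doris or CLucene code: `RamDirectory`, `write_compound` and the compound layout (a `<name> <size>` header line followed by the raw bytes) are all made up for illustration.

```cpp
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// In-memory stand-in for a RAM directory: file name -> file bytes.
class RamDirectory {
public:
    void write_file(const std::string& name, const std::string& bytes) {
        _files[name] = bytes;
    }
    const std::map<std::string, std::string>& files() const { return _files; }

private:
    std::map<std::string, std::string> _files;
};

// Merge every buffered file into one compound stream in a single pass.
// In a real system `out` would be the on-disk compound file, so the
// separate index files themselves never touch the disk.
void write_compound(const RamDirectory& dir, std::ostream& out) {
    for (const auto& entry : dir.files()) {
        out << entry.first << ' ' << entry.second.size() << '\n' << entry.second;
    }
}
```

The separate files cost memory instead of disk IO, which is presumably why the option is off by default.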
We change memtable size from 200MB to 100MB to achieve smoother flush
performance. We change loadStreamPerNode from 20 to 60 to keep stream
RPC from becoming the bottleneck when memtable_on_sink_node is enabled.
We change the default S3 and broker load parallelism to make the most of
CPUs on modern multi-core systems.
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
1. Remove `doris_max_remote_scanner_thread_pool_thread_num` and use only `doris_scanner_thread_pool_thread_num`.
2. Set the default value of `doris_scanner_thread_pool_thread_num` to `std::max(48, CpuInfo::num_cores() * 4)`.
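
For illustration, the new default works out as below. The function names are hypothetical, and `std::thread::hardware_concurrency()` is used only as a stand-in for Doris' `CpuInfo::num_cores()`.

```cpp
#include <algorithm>
#include <thread>

// Default scanner thread count: four threads per core, but never below 48.
int default_scanner_threads(int num_cores) {
    return std::max(48, num_cores * 4);
}

// Same rule applied to the current host. hardware_concurrency() may
// return 0 when the core count is unknown; the floor of 48 then applies.
int default_scanner_threads_for_this_host() {
    unsigned cores = std::thread::hardware_concurrency();
    return default_scanner_threads(static_cast<int>(cores));
}
```

So hosts with 12 cores or fewer get 48 threads, while a 16-core host gets 64.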
During a load, if there are problems with the original data, we store the erroneous rows in an error_log file on disk for later debugging. However, when there are many erroneous rows, this file can take up a lot of disk space. We now want to limit the amount of error data saved to disk.
Get familiar with how Doris' import function is used and how it is implemented internally
Add a new BE configuration item `load_error_log_limit_bytes`, with a default value of 200MB
Use the newly added threshold to limit the amount of data that RuntimeState::append_error_msg_to_file writes to disk
Write regression cases for testing and verification
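
A minimal sketch of such a byte limit is shown below. `LimitedErrorLog` is a hypothetical helper, not the actual `RuntimeState` code: it writes to an in-memory stream instead of a file, and it assumes a message that would cross the limit is simply dropped.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical error log that stops growing once a byte limit is reached.
class LimitedErrorLog {
public:
    explicit LimitedErrorLog(std::size_t limit_bytes) : _limit_bytes(limit_bytes) {}

    // Appends msg plus a newline. Returns false, and writes nothing,
    // if doing so would push the total past the limit.
    bool append(const std::string& msg) {
        const std::size_t needed = msg.size() + 1;  // +1 for '\n'
        if (_written_bytes + needed > _limit_bytes) {
            return false;
        }
        _out << msg << '\n';
        _written_bytes += needed;
        return true;
    }

    std::size_t written_bytes() const { return _written_bytes; }
    std::string contents() const { return _out.str(); }

private:
    std::size_t _limit_bytes;
    std::size_t _written_bytes = 0;
    std::ostringstream _out;  // stand-in for the on-disk error_log file
};
```

With the 200MB default, the limit would be constructed as `LimitedErrorLog log(200UL * 1024 * 1024);`.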
Co-authored-by: xy720 <22125576+xy720@users.noreply.github.com>