configs
The bdbje elect timeout is 30 seconds, so we enlarge thrift_rpc_timeout_ms
and txn_commit_rpc_timeout_ms to 60s.
Also enlarge bdbje_lock_timeout_second from 1 to 5.
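For reference, the resulting fe.conf entries would look like this (60s expressed in milliseconds):
```
thrift_rpc_timeout_ms = 60000
txn_commit_rpc_timeout_ms = 60000
bdbje_lock_timeout_second = 5
```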
Support the following grammar:
ANALYZE TABLE test WITH CRON "* * * * * ?"
Such a job is scheduled according to the given cron expression, but natively supports minute-level scheduling only.
1. Cancel the future when a timeout is hit, and add a config to adjust the RPC timeout (see the sketch below).
2. Add a config to adjust the number of BackendServiceProxy instances, since under a highly concurrent workload the GRPC channel can become blocked.
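A minimal sketch of the timeout-and-cancel pattern from item 1, using generic `java.util.concurrent` code rather than the actual Doris proxy logic; `rpcTimeoutMs` stands in for the new configurable timeout:
```java
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class RpcTimeoutSketch {
    // Wait for the RPC result up to rpcTimeoutMs; on timeout, cancel the pending
    // future so the underlying slot is released instead of staying blocked.
    static <T> T getWithTimeout(Future<T> future, long rpcTimeoutMs) throws Exception {
        try {
            return future.get(rpcTimeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);
            throw e;
        }
    }
}
```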
Add a new FE config `force_olap_table_replication_num`.
If this config is larger than 0, the replication num of a table created by a create table operation
will be forced to this value.
The default is 0, which has no effect.
This config only affects the creation of olap tables; other operations such as `add partition` and
`modify table properties` are not affected.
The motivation is that most regression test cases create tables with a single replica,
which runs well in the p0 and p1 pipelines.
But we also need to run these cases on a multi-backend Doris cluster, which requires tables with multiple replicas,
and it is hard to modify each test case. So I added this config, so that we can simply set it to create all tables with
the specified replication number.
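For example, to run the existing single-replica test cases against a multi-backend cluster, one could set (the value 3 is just an example):
```
force_olap_table_replication_num = 3
```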
Use the file system type and Conf as the key to cache remote file systems.
This avoids creating a new file system for each external table partition's location.
The time cost of fetching 100000 partitions with 1 file per partition is reduced from about 15 minutes to 22s.
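A minimal sketch of the caching idea, with illustrative class names rather than the actual Doris ones: the cache key combines the file system type and the Conf, so partitions sharing both reuse one file system instance.
```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

public class RemoteFileSystemCache<C, F> {
    // Composite key: (file system type, configuration). The conf type must have
    // a stable equals/hashCode for the cache to work.
    private static final class FsKey<C> {
        final String fsType;
        final C conf;
        FsKey(String fsType, C conf) { this.fsType = fsType; this.conf = conf; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof FsKey)) return false;
            FsKey<?> other = (FsKey<?>) o;
            return fsType.equals(other.fsType) && Objects.equals(conf, other.conf);
        }
        @Override public int hashCode() { return Objects.hash(fsType, conf); }
    }

    private final ConcurrentHashMap<FsKey<C>, F> cache = new ConcurrentHashMap<>();
    private final BiFunction<String, C, F> factory; // builds a new remote file system

    public RemoteFileSystemCache(BiFunction<String, C, F> factory) {
        this.factory = factory;
    }

    // Reuse a cached file system for the same (type, conf) pair, creating it once.
    public F get(String fsType, C conf) {
        return cache.computeIfAbsent(new FsKey<>(fsType, conf),
                k -> factory.apply(k.fsType, k.conf));
    }
}
```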
Enlarge jetty_server_max_http_header_size to avoid the "Request Header Fields
Too Large" error when stream loading to FE.
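The setting can also be raised manually in fe.conf if needed; the value below is only an example, not the new default:
```
jetty_server_max_http_header_size = 1048576
```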
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
Older MySQL clients (< 5.7.28) try to connect to the server with TLS 1.1,
which is insecure and not supported by Doris FE, so the connection fails.
We disable SSL connection support on Doris FE by default to keep users' applications
unaffected. To enable SSL support explicitly, just put
the following in fe.conf:
```
enable_ssl = true
```
Fetch Iceberg table stats automatically while querying a table.
Collect accurate statistics for an Iceberg table by running analyze SQL in Doris (remove the collect-by-meta option).
I will enhance the performance of querying the meta cache of HMS tables in 2 steps:
**Step 1**: use concurrent batch loading for the meta cache
**Step 2**: execute some other tasks concurrently as soon as possible
**This PR mainly covers step 1 and does the following (a sketch of the bulk loader follows the list):**
- Create a `CacheBulkLoader` for batch loading
- Remove the executor of the previous async cache loader and change the loader's type to `CacheBulkLoader` (we do not set any refresh strategy for the LoadingCache, so the previous executor was not useful)
- Use a `FixedCacheThreadPool` to replace the `CacheThreadPool` (the previous `CacheThreadPool` just logged a warning and did not throw any exception when the pool was full)
- Remove parallel streams and use the `CacheBulkLoader` to do batch loading
- Change the value of `max_external_cache_loader_thread_pool_size` to 64, and set the pool size of the HMS client pool to `max_external_cache_loader_thread_pool_size`
- Fix the spelling mistake in `max_hive_table_catch_num`
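A minimal sketch of what such a bulk loader can look like, assuming Guava's `LoadingCache`; the names mirror the PR but the body is illustrative, not the exact Doris implementation. Overriding `CacheLoader.loadAll` is what makes `LoadingCache.getAll` load a batch of keys concurrently instead of one by one.
```java
import com.google.common.cache.CacheLoader;
import com.google.common.collect.Maps;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

public abstract class CacheBulkLoader<K, V> extends CacheLoader<K, V> {
    // The fixed-size pool used for batch loading, e.g. sized by
    // max_external_cache_loader_thread_pool_size.
    protected abstract ExecutorService getExecutor();

    @Override
    public Map<K, V> loadAll(Iterable<? extends K> keys) {
        Map<K, CompletableFuture<V>> futures = Maps.newLinkedHashMap();
        for (K key : keys) {
            futures.put(key, CompletableFuture.supplyAsync(() -> {
                try {
                    return load(key); // single-key load, run in parallel across the batch
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }, getExecutor()));
        }
        Map<K, V> result = Maps.newLinkedHashMap();
        futures.forEach((key, future) -> result.put(key, future.join()));
        return result;
    }
}
```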
Support estimating table row count based on file size (see the sketch below).
With sample size = 3000 (total partition number is 87491), the load cache time is 45s.
With sample size = 100000 (more than the total partition number of 87505), the load cache time is 388s.
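The estimate itself amounts to dividing the (possibly sampled) total file size by an assumed bytes-per-row figure derived from the schema; a hedged sketch with illustrative names, not the exact Doris formula:
```java
public class RowCountEstimator {
    // totalFileSizeBytes: summed size of the table's data files (or a sampled subset,
    // scaled up); estimatedBytesPerRow: per-row size guessed from the column types.
    public static long estimateRowCount(long totalFileSizeBytes, long estimatedBytesPerRow) {
        if (estimatedBytesPerRow <= 0) {
            return 0;
        }
        return totalFileSizeBytes / estimatedBytesPerRow;
    }
}
```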
The current column statistic cache loader loads data from the column_statistics olap table.
This PR changes the cache loader logic to first load from the column_statistics olap table and, if no data was loaded, then load from table metadata. This is mainly to support fetching statistics data for external catalogs using the HMS or Iceberg API (see the sketch below).
This is the first PR; the next PR will implement the fetch logic for different external catalogs.
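A minimal sketch of the new loading order, with illustrative method names rather than the actual loader API:
```java
import java.util.Optional;

public abstract class ColumnStatisticLoaderSketch<K, S> {
    // Step 1: the internal column_statistics olap table.
    protected abstract Optional<S> loadFromStatisticsTable(K key);

    // Step 2: fall back to table metadata (e.g. HMS or Iceberg APIs) when step 1 finds nothing.
    protected abstract Optional<S> loadFromTableMetadata(K key);

    public Optional<S> load(K key) {
        Optional<S> fromTable = loadFromStatisticsTable(key);
        return fromTable.isPresent() ? fromTable : loadFromTableMetadata(key);
    }
}
```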
1. Add more checks for the match expression in Nereids (sketched below):
- the match expression is only supported in a filter
- the left and right children of a match expression must both be string type
- the left child of a match expression must be a SlotRef, and the right child must be a Literal
2. Fix the regression cases test_index_match_select and test_index_match_phrase.
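Roughly, the added checks behave like the sketch below; the expression classes here are simplified stand-ins for the Nereids types, not the real ones.
```java
public class MatchExpressionChecks {
    // Simplified stand-ins for the Nereids expression model.
    interface Expression { boolean isStringType(); }
    static class SlotReference implements Expression { public boolean isStringType() { return true; } }
    static class StringLiteral implements Expression { public boolean isStringType() { return true; } }

    // inFilter: whether the match expression appears inside a filter (WHERE) clause.
    static void check(Expression left, Expression right, boolean inFilter) {
        if (!inFilter) {
            throw new IllegalArgumentException("match expression is only supported in filter");
        }
        if (!left.isStringType() || !right.isStringType()) {
            throw new IllegalArgumentException("both children of match must be string type");
        }
        if (!(left instanceof SlotReference) || !(right instanceof StringLiteral)) {
            throw new IllegalArgumentException(
                    "left child must be a slot reference and right child must be a literal");
        }
    }
}
```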
As we know, log4j2 can sometimes be a bottleneck in Doris FE when many logs are output in sync mode, while asynchronous logging performs much better. We also find that capturing the caller location has a similar impact across all logging libraries and slows down asynchronous logging by about 30-100x. So, here we provide three log modes for log4j2 to meet the needs of different users.
Refer to https://logging.apache.org/log4j/2.x/performance.html
This PR addresses the refactoring of common methods that were originally located within the ODBC classes, but were used by the JDBC classes. These methods have now been moved to the JDBC classes to improve code readability and maintainability.
In addition, we have disabled the creation of ODBC external tables by default. However, this will not affect the existing usage of ODBC. You can still enable the ODBC external tables through the enable_odbc_table setting. Please be aware that we plan to completely remove the ODBC external tables in future versions, so we recommend using the JDBC Catalog as a priority.
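If you still rely on ODBC external tables, they can be re-enabled with that setting, e.g. in fe.conf:
```
enable_odbc_table = true
```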
Support collecting statistics for HMS external tables with specific partitions. Add session variables to limit the partitions to collect when gathering whole-table row count and column statistics.
This commit supports a function that returns a field column from a named struct column.
Since the function can return any type, this commit also supports ANY_STRUCT_TYPE
and ANY_ELEMENT_TYPE.