In current cte multicast fragment param computing logic in coordinator, if shared hash table for bc opened, its destination's number will be the same as be hosts'. But the judgment of falling into shared hash table bc part code is wrong, which will cause when a multicast's target is fixed with both bc and partition, the first bc info will overwrite the following partition's, i.e, the destination info will be the host level, which should be per instance. This will cause the hash partition part hang.
Problem:
Minidump unit test failed because of column statistic deserialization need a new column schema but not added to minidump unit test file
Solved:
Add last update time to unit test input file
1. cancel future when meet timeout and add config to modify rpc timeout
2. add config to modify numof BackendServiceProxy since under high concurrent work load GRPC channel will be blocked
### Two main changes:
- 1、add minidump replay
- 2、change minidump serialization of statistic messages and some interface between main logic of nereids optimizer and minidump
### Use of nereids ut:
- 1、save minidump files:
Execute command by mysql-client:
```
set enable_nereids_planner=true;
set enable_minidump=true;
```
Execute sql in mysql-client
- 2、use nereids-ut script to execute directory:
```
cp -r ${DORIS_HOME}/minidump ${DORIS_HOME}/output/fe && cd ${DORIS_HOME}/output/fe
./nereids_ut --d ${directory_of_minidump_files}
```
### Refactor of minidump
- move statistics used serialization to serialization of input and serialize with catalogs
- generating minidump file only when enable_minidump flag is set, minidump module interactive with main optimizer only by :
serializeInputsToDumpFile(catalog, statistics, query) && serializeOutputsToDumpFile(outputplan).
In this PR, we introduce TOKENIZE function for inverted index, it is used as following:
```
SELECT TOKENIZE('I love my country', 'english');
```
It has two arguments, first is text which has to be tokenized, the second is parser type which can be **english**, **chinese** or **unicode**.
It also can be used with existing table, like this:
```
mysql> SELECT TOKENIZE(c,"chinese") FROM chinese_analyzer_test;
+---------------------------------------+
| tokenize(`c`, 'chinese') |
+---------------------------------------+
| ["来到", "北京", "清华大学"] |
| ["我爱你", "中国"] |
| ["人民", "得到", "更", "实惠"] |
+---------------------------------------+
```
the global limit will create a gather action, and all the data will be calculated in one instance. If we push down the global limit, the node run after the limit node will run slowly.
We fix it by push down only local limit.
a join plan tree before fixing:
```
LogicalLimit(global)
LogicalLimit(local)
Plan()
LogicalLimit(global)
LogicalLimit(local)
LogicalJoin
LogicalLimit(global)
LogicalLimit(local)
Plan()
LogicalLimit(global)
LogicalLimit(local)
Plan()
after fixing:
LogicalLimit(global)
LogicalLimit(local)
Plan()
LogicalLimit(local)
LogicalJoin
LogicalLimit(local)
Plan()
LogicalLimit(local)
Plan()
```
New aggregation function: map_agg.
This function requires two arguments: a key and a value, which are used to build a map.
select map_agg(column1, column2) from t group by column3;
1. support collect query counter and error query counter metric in user level
2. add sum and count for histogram metric for mistaken delete in PR #22045
Problem:
When inferring predicate in nereids, new inferred predicates can not be the source of next round. For example:
create table tt1(c1 int, c2 int) distributed by hash(c1) properties('replication_num'='1');
create table tt2(c1 int, c2 int) distributed by hash(c1) properties('replication_num'='1');
create table tt3(c1 int, c2 int) distributed by hash(c1) properties('replication_num'='1');
explain select * from tt1 left join tt2 on tt1.c1 = tt2.c1 left join tt3 on tt2.c1 = tt3.c1 where tt1.c1 = 123;
we expect to get t33.c1 = 123, but we can just get t22.c1 = 123. Because when infer tt1.c1 = 123 and tt2.c1 = tt3.c1, we can
not get any relationship of these two predicates.
Solution:
We need to cache middle results of source predicates like t22.c1 = 123 in example.
### Issue
Dictionary filtering is a mechanism that directly reads the dictionary encoding of a single string column filter condition for filter comparison. But dictionary filtered single string columns may be included in other multi-column filter conditions. This can cause problems.
For example:
`select * from multi_catalog.lineitem_string_date_orc where l_commitdate < l_receiptdate and l_receiptdate = '1995-01-01' order by l_orderkey, l_partkey, l_suppkey, l_linenumber limit 10;`
`l_receiptdate` is string filter column,it is included by multi-column filter condition `l_commitdate < l_receiptdate`.
### Solution
Resolve it by separating the multi-column filter conditions and executing it after the dictionary filter column is converted to string.
- common name support `-` ,reason: MySQL's db name support `-`
- table name support `-`
- username support `.`,reason:LDAP's username support `.`
- ldap doc
- ldap support rbac
1.support filesystem metastore
2.support predicate and project when split
3.fix partition table query error
todo: Now you need to manually put paimon-s3-0.4.0-incubating.jar in be/lib/java_extensions when use s3 filesystem
doc pr: #21966
no need to run pipeline if only modify regression-test/pipeline/p0/conf/regression-conf.groovy or regression-test/pipeline/p1/conf/regression-conf.groovy
1. set replica count fot stats tbl as :"Math.max(Config.statistic_internal_table_replica_num,Config.min_replication_num_per_tablet)"
2. update comment for stats tbl remove symbol `'`
Use weak_ptr to cache the file handle of file segment. The max cached number of file handles can be configured by `file_cache_max_file_reader_cache_size`, default `1000000`.
Users can inspect the number of cached file handles by request BE metrics: `http://be_host:be_webserver_port/metrics`:
```
# TYPE doris_be_file_cache_segment_reader_cache_size gauge
doris_be_file_cache_segment_reader_cache_size{path="/mnt/datadisk1/gaoxin/file_cache"} 2500
```
This commit introduces a transformation for SQL queries that contain multiple distinct aggregate functions. When the number of distinct values processed by these functions is greater than 1, they are converted into multi_distinct functions for more efficient handling.
Example:
```
SELECT COUNT(DISTINCT c1), SUM(DISTINCT c2) FROM tbl GROUP BY c3
-- Transformed to
SELECT MULTI_DISTINCT_COUNT(c1), MULTI_DISTINCT_SUM(c2) FROM tbl GROUP BY c3
```
The following functions can be transformed:
- COUNT
- SUM
- AVG
- GROUP_CONCAT
If any unsupported functions are encountered, an error is now reported during the optimization phase.
To ensure the absence of such cases, a final check has been implemented after the rewriting phase.
we convert input parameters to double for function ceil, floor and round,
because DecimalV2 could not do these operation. Since we intro DecimalV3,
we should convert all parameters to DecimalV3 to get correct result.
For example, when we use double as parameters, we get wrong result:
```sql
select round(341/20000,4),341/20000,round(0.01705,4);
+-------------------------+---------------+-------------------+
| round((341 / 20000), 4) | (341 / 20000) | round(0.01705, 4) |
+-------------------------+---------------+-------------------+
| 0.017 | 0.01705 | 0.0171 |
+-------------------------+---------------+-------------------+
```
DecimalV3 could get correct result
```sql
select round(341/20000,4),341/20000,round(0.01705,4);
+-------------------------+---------------+-------------------+
| round((341 / 20000), 4) | (341 / 20000) | round(0.01705, 4) |
+-------------------------+---------------+-------------------+
| 0.0171 | 0.01705 | 0.0171 |
+-------------------------+---------------+-------------------+
```
According the implementation in execution engine, all order keys
in SortNode will be output. We must normalize LogicalSort follow
by it.
We push down all non-slot order key in sort to materialize them
behind sort. So, all order key will be slot and do not need do
projection by SortNode itself.
This will simplify translation of SortNode by avoid to generate
resolvedTupleExprs and sortTupleDesc.