This PR was originally #16940, but that PR has not been updated by its original author @Cai-Yao for a long time. For now, we will merge part of the code into master first.
Thanks @Cai-Yao @yiguolei
* Workaround: when ingesting binlog after backup/restore, local_tablet.partition_id is not correct, so use req.partition_id instead
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
In some cases (or because of bugs), Doris may return a query response that JDBC cannot recognize, so the client hangs.
To fix this, add a 30-minute timeout to the JDBC connection.
MTMV regression tests may loop forever because of potential bugs. Therefore, we add a timeout to avoid an endless loop. The timeout is currently hard-coded to 30 minutes.
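The idea of bounding a polling loop with a deadline can be sketched as follows. This is a minimal illustration in Python, not the regression framework's actual code; `wait_until`, its parameters, and the polling interval are hypothetical names chosen for the example.

```python
import time

def wait_until(predicate, timeout_s, poll_interval_s=0.01):
    """Poll `predicate` until it returns True or `timeout_s` seconds elapse.

    Returns True if the predicate succeeded, False if the deadline passed.
    All names here are illustrative assumptions, not Doris APIs.
    """
    deadline = time.monotonic() + timeout_s  # monotonic clock is immune to wall-clock jumps
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False  # give up instead of looping forever
        time.sleep(poll_interval_s)

# A 30-minute limit, as described above, would be timeout_s=30 * 60.
finished = wait_until(lambda: True, timeout_s=30 * 60)
timed_out = not wait_until(lambda: False, timeout_s=0.05)
```

Returning a boolean instead of raising leaves the caller free to decide whether a timeout should fail the test or only be logged.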
This PR refactors column pruning to use a visitor. The benefits:
1. It is easy to add column pruning to a new plan: implement the `OutputPrunable` interface if the plan has an output field, or do nothing if it does not. There is no need to add a new rule like `PruneXxxChildColumns`. Only a few cases, such as pruning LogicalSetOperation and Aggregate, need to override the visit function with special logic.
2. Some plans can now shrink their output field, which skips useless operations and improves performance.
example:
```sql
select id
from (
select id, sum(age)
from student
group by id
)a
```
We should prune the useless `sum(age)` in the aggregate.
before refactor:
```
LogicalProject ( distinct=false, projects=[id#0], excepts=[], canEliminate=true )
+--LogicalSubQueryAlias ( qualifier=[a] )
   +--LogicalAggregate ( groupByExpr=[id#0], outputExpr=[id#0, sum(age#2) AS `sum(age)`#4], hasRepeat=false )
      +--LogicalProject ( distinct=false, projects=[id#0, age#2], excepts=[], canEliminate=true )
         +--LogicalOlapScan ( qualified=default_cluster:test.student, indexName=<index_not_selected>, selectedIndexId=10007, preAgg=ON )
```
after refactor:
```
LogicalProject ( distinct=false, projects=[id#0], excepts=[], canEliminate=true )
+--LogicalSubQueryAlias ( qualifier=[a] )
   +--LogicalAggregate ( groupByExpr=[id#0], outputExpr=[id#0], hasRepeat=false )
      +--LogicalProject ( distinct=false, projects=[id#0], excepts=[], canEliminate=true )
         +--LogicalOlapScan ( qualified=default_cluster:test.student, indexName=<index_not_selected>, selectedIndexId=10007, preAgg=ON )
```
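The top-down pruning shown in the two plans can be illustrated with a toy model. The sketch below is written in Python for brevity and is not the Nereids implementation: the `Plan` class, its fields, and `prune` are hypothetical, the subquery alias node is omitted, and the scan is left untouched.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Plan:
    kind: str                      # "project", "aggregate" or "scan" (toy model only)
    outputs: dict                  # output name -> set of input columns it reads
    child: Optional["Plan"] = None
    group_by: tuple = ()           # input columns used as grouping keys

def prune(plan, required):
    """Return a copy of `plan` that keeps only outputs named in `required`."""
    if plan.kind == "scan":
        return plan  # the scan is left as-is in this sketch
    # Keep only the outputs the parent actually needs.
    keep = {name: deps for name, deps in plan.outputs.items() if name in required}
    # The child must still produce the grouping keys and the inputs of kept outputs.
    child_required = set(plan.group_by)
    for deps in keep.values():
        child_required |= deps
    child = prune(plan.child, child_required) if plan.child else None
    return Plan(plan.kind, keep, child, plan.group_by)

# select id from (select id, sum(age) from student group by id) a
scan = Plan("scan", {"id": {"id"}, "age": {"age"}})
inner = Plan("project", {"id": {"id"}, "age": {"age"}}, scan)
agg = Plan("aggregate", {"id": {"id"}, "sum(age)": {"age"}}, inner, group_by=("id",))
top = Plan("project", {"id": {"id"}}, agg)

pruned = prune(top, {"id"})
print(list(pruned.child.outputs))        # aggregate outputs after pruning
print(list(pruned.child.child.outputs))  # inner project outputs after pruning
```

Each node only needs to know which of its outputs the parent requires, which is why a single visitor can replace per-plan pruning rules.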
* [fix](Nereids): fix scalar_function A-F.
* [Fix](regression-test) fix the regression test framework being unable to compare the double values NaN and inf.
* revert dround()
Main subtask of [DSIP-28](https://cwiki.apache.org/confluence/display/DORIS/DSIP-028%3A+Suppot+MySQL+Load+Data)
## Problem summary
Support the MySQL load syntax shown below:
```sql
LOAD DATA
[LOCAL]
INFILE 'file_name'
INTO TABLE tbl_name
[PARTITION (partition_name [, partition_name] ...)]
[COLUMNS TERMINATED BY 'string']
[LINES TERMINATED BY 'string']
[IGNORE number {LINES | ROWS}]
[(col_name_or_user_var [, col_name_or_user_var] ...)]
[SET (col_name={expr | DEFAULT} [, col_name={expr | DEFAULT}] ...)]
[PROPERTIES (key1 = value1 [, key2=value2]) ]
```
For example,
```sql
LOAD DATA
LOCAL
INFILE 'local_test.file'
INTO TABLE db1.table1
PARTITION (partition_a, partition_b, partition_c, partition_d)
COLUMNS TERMINATED BY '\t'
(k1, k2, v2, v10, v11)
set (c1=k1,c2=k2,c3=v10,c4=v11)
PROPERTIES ("auth" = "root:", "strict_mode"="true")
```
Note that in this PR the `auth` property must be set, since stream load needs authentication. I will optimize this later.
1. Support a row format that uses the jsonb codec
2. Short-path optimization for point queries
3. Support prepared statements for point queries
4. Support the MySQL binary format
Support benchmarkAction for regression tests. This action runs the benchmark queries and prints the results.
example:
benchmark {
    executeTimes 3
    warmUp true
    skipFailure true
    printResult true
    sqls(["select 1", "select 2"])
}
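To make the options concrete, here is a minimal Python sketch of what such a benchmark action might do. The interpretation of each option (one untimed warm-up run, `executeTimes` timed runs, failures recorded rather than raised when `skipFailure` is set) is an assumption, and `run_benchmark` and `execute` are hypothetical names, not the framework's API.

```python
import time

def run_benchmark(execute, sqls, execute_times=3, warm_up=True,
                  skip_failure=True, print_result=True):
    """Run each SQL statement through `execute` and collect timings.

    Returns a list of (sql, timings_in_seconds or None, error_message or None).
    """
    results = []
    for sql in sqls:
        try:
            if warm_up:
                execute(sql)  # untimed warm-up run (assumed semantics)
            timings = []
            for _ in range(execute_times):
                start = time.perf_counter()
                execute(sql)
                timings.append(time.perf_counter() - start)
            results.append((sql, timings, None))
        except Exception as exc:
            if not skip_failure:
                raise
            results.append((sql, None, str(exc)))  # record the failure and move on
    if print_result:
        for sql, timings, error in results:
            if error is not None:
                print(f"{sql}: FAILED ({error})")
            else:
                avg_ms = 1000 * sum(timings) / max(len(timings), 1)
                print(f"{sql}: avg {avg_ms:.3f} ms over {len(timings)} runs")
    return results

# Usage with a stand-in executor that only records the statements it receives.
calls = []
results = run_benchmark(calls.append, ["select 1", "select 2"], print_result=False)
```

Passing the executor as a parameter keeps the sketch independent of any real connection; in practice it would wrap a database client call.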
Proposed changes
1. Add function interfaces that can search for the matching signature, namely ComputeSignature. They correspond to Function.CompareMode:
- IdenticalSignature: equal to Function.CompareMode.IS_IDENTICAL
- NullOrIdenticalSignature: equal to Function.CompareMode.IS_INDISTINGUISHABLE
- ImplicitlyCastableSignature: equal to Function.CompareMode.IS_SUPERTYPE_OF
- ExplicitlyCastableSignature: equal to Function.CompareMode.IS_NONSTRICT_SUPERTYPE_OF
3. generate lots of scalar functions
4. bug-fix: the disassembled avg function computed a wrong result because of the wrong input type. AggregateParam.inputTypesBeforeDissemble is now used to save the original input types and pass them to the backend so that it can find the correct global aggregate function.
5. bug-fix: a subquery with OneRowRelation crashed because of a wrong nullable property.
Note:
1. Currently there are no additional unit tests or regression tests for the scalar functions. I will add them once the aggregate functions are migrated, so that both can be handled together.
2. A known problem is that variable-length functions cannot be invoked. I will fix it later.
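The four signature-matching modes listed under the proposed changes can be pictured with a toy matcher. This Python sketch only illustrates the idea of progressively looser matching; the type names, the cast tables, and `search_signature` are invented for the example and do not reflect Doris's real type system or the ComputeSignature code.

```python
# Toy cast tables: (from_type, to_type). These are assumptions for illustration only.
IMPLICIT_CASTS = {("int", "bigint"), ("int", "double"), ("bigint", "double")}
EXPLICIT_CASTS = IMPLICIT_CASTS | {("double", "bigint"), ("varchar", "bigint")}

def identical(arg, param):
    return arg == param

def null_or_identical(arg, param):
    return arg == "null" or arg == param

def implicitly_castable(arg, param):
    return null_or_identical(arg, param) or (arg, param) in IMPLICIT_CASTS

def explicitly_castable(arg, param):
    return null_or_identical(arg, param) or (arg, param) in EXPLICIT_CASTS

def search_signature(signatures, arg_types, matches):
    """Return the first signature whose parameters all satisfy `matches`."""
    for sig in signatures:
        if len(sig) == len(arg_types) and all(
            matches(arg, param) for arg, param in zip(arg_types, sig)
        ):
            return sig
    return None

# A hypothetical function with two overloads.
signatures = [("bigint",), ("double",)]
print(search_signature(signatures, ("int",), identical))
print(search_signature(signatures, ("int",), implicitly_castable))
```

Each predicate accepts everything the previous one accepts, so a function can pick the strictest mode that still resolves its overloads.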
Some problems have been found when parallel_fragment_exec_instance_num is set to a value greater than 1.
This PR tries setting a random parallel_fragment_exec_instance_num value for each query to cover more situations.
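One way to picture the per-query randomization is the following Python sketch. It is an assumption about the general approach rather than the PR's code: the helper name, the upper bound of 8, and the use of a `SET` statement before each query are all illustrative.

```python
import random

def with_random_parallelism(sql, rng, max_instances=8):
    """Prefix `sql` with a SET statement using a random instance count.

    `max_instances` is an arbitrary bound chosen for this sketch.
    """
    n = rng.randint(1, max_instances)  # inclusive on both ends
    return [f"SET parallel_fragment_exec_instance_num = {n}", sql]

# A seeded generator makes a failing run reproducible.
rng = random.Random(42)
statements = with_random_parallelism("select 1", rng)
```

Logging the seed or the chosen value alongside the query makes it possible to replay a failure with the same parallelism.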