1. Add support for statements that have an aggregate function in the ORDER BY list, e.g.:
SELECT COUNT(*) FROM t GROUP BY c1 ORDER BY COUNT(*) DESC;
2. Add ClickBench analyze unit tests.
This PR does three things:
1. Modified the framework of table-valued functions (TVF).
2. Supported the `fetch_table_schema` RPC on BE.
3. Implemented the `S3(path, AK, SK, format)` table-valued function (sketched below).
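A minimal usage sketch, assuming the positional `S3(path, AK, SK, format)` signature above and CSV input (bucket, keys, and path below are placeholders, not from this PR):
```
-- Query a CSV file on S3 through the new table-valued function:
SELECT * FROM S3(
    "s3://my-bucket/path/to/data.csv",  -- path (placeholder)
    "my_access_key",                    -- AK (placeholder)
    "my_secret_key",                    -- SK (placeholder)
    "csv"                               -- format
);
```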
[What is DLF](https://www.alibabacloud.com/product/datalake-formation)
This PR is a preparation for supporting DLF, with some changes to multi-catalog:
1. Add RuntimeException handling for most Hive Metastore and ES client access operations.
2. Add DLF-related dependencies.
3. Move the checks of ES catalog properties to the analysis phase of creating an ES catalog (see the sketch after this list).
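For item 3, a hedged sketch of the effect: an ES catalog with invalid properties should now fail while the CREATE CATALOG statement is analyzed, instead of later when the catalog is first accessed (property values below are illustrative):
```
-- Property checks (e.g. a malformed "hosts" value) now run at analysis time:
CREATE CATALOG es_catalog PROPERTIES (
    "type" = "es",
    "hosts" = "http://127.0.0.1:9200",
    "user" = "root",
    "password" = "root"
);
```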
TODO(in next PR):
1. Refactor the `getSplit` method to support not only HDFS, but also S3-compatible object storage.
2. Finish the implementation of DLF support.
Persist external catalogs/dbs/tables, including the columns of external tables.
After this change, external objects can have their own unique IDs throughout their lifetimes,
which is required for statistics collection.
Add a rule to check the permissions of the user executing a query: forbid users who do not have SELECT_PRIV on a table from running queries against it.
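A hedged illustration of the new check (user, database, and table names are made up):
```
-- u1 is granted only LOAD_PRIV, not SELECT_PRIV, on db1.tbl1:
GRANT LOAD_PRIV ON db1.tbl1 TO 'u1'@'%';
-- Running the following as u1 is now rejected by the rule:
SELECT * FROM db1.tbl1;
```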
Support running transactional insert operations with the new scan framework, e.g.:
```
admin set frontend config("enable_new_load_scan_node" = "true");
begin;
insert into tbl1 values(1,2);
insert into tbl1 values(3,4);
insert into tbl1 values(5,6);
commit;
```
Add some limitations to transactional insert:
Non-literal values are not supported in insert statements.
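A hedged example of a statement that would be rejected under this limitation, since now() is not a literal (table name as in the example above):
```
-- Inside a transaction, only literal values are allowed:
insert into tbl1 values(1, now());  -- rejected: now() is not a literal
```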
Fix some issues with the array type:
- Forbid casting other non-array types to a NESTED array type, which may cause a BE crash.
- Add a getStringValueForArray() method to Expr to get a valid string-formatted array value.
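A hedged sketch of the first change, assuming Doris's CAST-to-array syntax (the value is illustrative):
```
-- Casting a non-array value to a NESTED array type is now forbidden
-- instead of potentially crashing the BE:
SELECT CAST("1" AS ARRAY<ARRAY<INT>>);  -- now rejected
```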
Add useLocalSessionState=true to the regression-test JDBC URL.
Without this config, the JDBC driver sends some init commands each time it connects to the server, such as
select @@session.tx_read_only.
But when we use transactional insert, after the begin command Doris does not support any statement type
other than insert, commit, or rollback.
So this config is added so that the JDBC driver does NOT send those commands when connecting.
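A hedged sketch of the resulting URL (host, port, and database are placeholders):
```
jdbc:mysql://127.0.0.1:9030/regression_test?useLocalSessionState=true
```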
Support common table expressions (CTE) in Nereids:
- Only inline CTE is implemented, which means we copy the logicalPlan of the CTE everywhere it is referenced (see the sketch after this list);
- If the name of a CTE is the same as that of an existing table or view, the CTE takes precedence;
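A brief sketch (names are made up): both references to cte1 below are replaced by copies of its logical plan, and if a table named cte1 also existed, the CTE definition would be chosen:
```
WITH cte1 AS (SELECT c1 FROM t)
SELECT a.c1
FROM cte1 a JOIN cte1 b ON a.c1 = b.c1;
```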
`BackendServiceProxy.getInstance()` uses a round-robin strategy to obtain a proxy,
so when the current RPC request fails, the proxy removed by
`BackendServiceProxy.getInstance().removeProxy(...)` is not the abnormal one.
Proposed changes
1. Add function interfaces that can search for the matched signature, namely ComputeSignature. They correspond to Function.CompareMode:
- IdenticalSignature: equal to Function.CompareMode.IS_IDENTICAL
- NullOrIdenticalSignature: equal to Function.CompareMode.IS_INDISTINGUISHABLE
- ImplicitlyCastableSignature: equal to Function.CompareMode.IS_SUPERTYPE_OF
- ExplicitlyCastableSignature: equal to Function.CompareMode.IS_NONSTRICT_SUPERTYPE_OF
3. Generate lots of scalar functions.
4. Bug fix: the disassembled avg function computed a wrong result because of a wrong input type; AggregateParam.inputTypesBeforeDissemble is used to save the original input type and pass it to the backend to find the correct global aggregate function.
5. Bug fix: a subquery with OneRowRelation crashed because of a wrong nullable property.
Note:
1. There are currently no unit or regression tests for the scalar functions; I will add tests when migrating the aggregate functions for unified processing.
2. A known problem is that variable-length functions cannot be invoked; I will fix this later.
Refactor eliminate outer join #12985
Evaluate the expression with ConstantFoldRule. If the evaluation result is NULL or FALSE, then the elimination condition is satisfied.
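For example (a hedged sketch with made-up table and column names): folding the predicate with an all-NULL row substituted for t2's side yields NULL, so the condition is satisfied and the outer join can be reduced:
```
-- The WHERE predicate filters out any NULL-padded t2 rows,
-- so this LEFT JOIN can be rewritten as an INNER JOIN:
SELECT t1.id, t2.c1
FROM t1 LEFT JOIN t2 ON t1.id = t2.id
WHERE t2.c1 > 0;
```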
After a project, some slots may be projected to other ones, so we need to replace the ExprId in DistributionSpecHash with the new one. If the project does anything other than Alias, we need to return DistributionSpecAny instead of the child's DistributionSpec.
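A hedged illustration (names are made up): if the child is hash-distributed by c1, a pure Alias keeps the hash spec with a remapped ExprId, while any other computation degrades to DistributionSpecAny:
```
-- Pure Alias: the hash distribution on c1 carries over to c2,
-- with the ExprId in DistributionSpecHash replaced accordingly.
SELECT c1 AS c2 FROM t;
-- Non-Alias expression: the child's hash spec no longer describes
-- the output, so DistributionSpecAny is returned.
SELECT c1 + 1 AS c2 FROM t;
```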
This rule eliminates a Project whose output set is the same as its child's. If the Project is the root of the plan, the elimination condition is that the Project's output is exactly the same as its child's.
The reason for adding this rule: when we do join reorder during optimization, the root of the transformed plan may be a Project whose output set is the same as that of the root before the transformation. If we also had a Project on top of the root with the same output set, we would have two identical Projects in the memo, one the parent of the other. After MergeProjects, we get a new Project exactly like the child, which needs to be added to the parent's group and therefore triggers a group merge. Since that merge would produce a cycle, it is denied, and we end up with a final plan containing two consecutive Projects.
## For example:
**BEFORE OPTIMIZATION**
```
LogicalProject1( projects=[c_custkey#0, c_name#1]) [GroupId#1]
+--LogicalJoin(type=LEFT_SEMI_JOIN) [GroupId#2]
|--LogicalProject(...)
| +--LogicalJoin(type=INNER_JOIN)
| ...
+--LogicalOlapScan(...)
```
**AFTER APPLY RULE: LOGICAL_SEMI_JOIN_LOGICAL_JOIN_TRANSPOSE_PROJECT**
```
LogicalProject1( projects=[c_custkey#0, c_name#1]) [GroupId#1]
+--LogicalProject2( projects=[c_custkey#0, c_name#1]) [GroupId#2]
+--LogicalJoin(type=INNER_JOIN) [GroupId#10]
|--LogicalProject(...)
| +--LogicalJoin(type=LEFT_SEMI_JOIN)
| ...
+--LogicalOlapScan(...)
```
**AFTER APPLY RULE: MERGE_PROJECTS**
```
LogicalProject3( projects=[c_custkey#0, c_name#1]) [should be in GroupId#1, but in GroupId#2 in fact]
+--LogicalJoin(type=INNER_JOIN) [GroupId#10]
|--LogicalProject(...)
| +--LogicalJoin(type=LEFT_SEMI_JOIN)
| ...
+--LogicalOlapScan(...)
```
Since we have exactly the same GroupExpression (LogicalProject3 and LogicalProject2) in GroupId#1 and GroupId#2, we need to do MergeGroup(GroupId#1, GroupId#2). But a child of GroupId#1 is in GroupId#2, so the merge is denied.
If the best GroupExpression in GroupId#2 is LogicalProject3, we will get two consecutive Projects in the final plan.