1. Spark currently does not produce bucketed output that is compatible with Hive, so Spark bucketed tables are not supported in the current implementation.
2. Hive 3.0 introduced bucketing version 2, but Doris still uses Hive 2.3.7, which lacks the version 2 hash function, so I followed Trino's implementation and copied the hash function from Hive (a sketch of the final bucket-id step follows this list).
3. The current implementation does not support tables with multiple bucketed columns, and only supports `Equal` and `In` predicates.
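A minimal sketch of the final bucket-id step of Hive bucketing version 1 (the class and method names here are illustrative; the per-type column hash itself is the part copied from Hive):
```
// Illustrative sketch only: map an already-computed bucketing-v1 hash code
// to a bucket id. Hive masks the sign bit before the modulo so the result
// is always non-negative.
final class HiveBucketIdSketch {
    static int getBucketId(int bucketHashCode, int numBuckets) {
        return (bucketHashCode & Integer.MAX_VALUE) % numBuckets;
    }
}
```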
Add MultiJoin.
In addition, when `joinInputs.size() >= 3 && !conjuncts.isEmpty()`, the conjuncts can still contain ON predicates.
For example:
```
A join B on A.id = B.id where A.sid = B.sid
```
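A hedged sketch of how such a conjunct can be recognized (all names below are placeholders, not the actual Nereids classes): a WHERE conjunct whose slots span two or more join inputs behaves like an ON predicate.
```
import java.util.List;
import java.util.Set;

// Illustrative sketch: a conjunct is an implicit join predicate if the slots
// it references come from more than one join input.
final class ConjunctClassifierSketch {
    static boolean isImplicitJoinPredicate(Set<String> conjunctSlots,
                                           List<Set<String>> joinInputSlots) {
        int touchedInputs = 0;
        for (Set<String> inputSlots : joinInputSlots) {
            for (String slot : conjunctSlots) {
                if (inputSlots.contains(slot)) {
                    touchedInputs++;
                    break;
                }
            }
        }
        return touchedInputs >= 2; // e.g. A.sid = B.sid touches both A and B
    }
}
```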
Add a rule to push predicates down through the aggregation node.
Add PushDownPredicatesThroughAggregation.java.
Add a unit test for PushDownPredicatesThroughAggregation.
For example:
```
Logical plan tree:
any_node
|
filter (a>0 and b>0)
|
group by(a, c)
|
scan
```
transformed to:
```
project
|
upper filter (b>0)
|
group by(a, c)
|
bottom filter (a>0)
|
scan
```
Note:
'a>0' can be pushed down, because 'a' is one of the group-by keys;
'b>0' cannot be pushed down, because 'b' is not a group-by key.
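A minimal sketch of the split the rule performs, under the assumption of placeholder types (the real rule works on the Nereids expression and slot classes):
```
import java.util.List;
import java.util.Set;

// Placeholder expression type: exposes the slots a conjunct references.
interface Expr {
    Set<String> inputSlots();
}

final class PredicateSplitSketch {
    // Conjuncts referencing only group-by keys go below the aggregation;
    // the rest stay in the upper filter.
    static void split(List<Expr> conjuncts, Set<String> groupByKeys,
                      List<Expr> pushedDown, List<Expr> remaining) {
        for (Expr conjunct : conjuncts) {
            if (groupByKeys.containsAll(conjunct.inputSlots())) {
                pushedDown.add(conjunct);   // e.g. a > 0 when grouping by (a, c)
            } else {
                remaining.add(conjunct);    // e.g. b > 0: b is not a group-by key
            }
        }
    }
}
```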
Support `cast` and date `extract` for TPC-H, for example:
```
select cast(a as datetime) as d from test;
select extract(year from datetime_column) as y from test;
```
When we copy a plan into the memo, we check whether the plan is already in the memo or is new.
In the new version of Memo.copyIn(), we encapsulate is_new and the plan's corresponding group expression in the returned result.
is_new is used to avoid repeatedly applying rules to the same plan, and hence saves optimization effort.
Overview of changes:
Change Memo.copyIn() and related function interfaces.
Every time Memo.copyIn() is invoked, we check whether the plan is already recorded in the memo. If the plan is not new, we do not put its group onto the stack for further optimization.
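A minimal sketch of what such a return contract could look like (the class and field names are assumptions for illustration, not the exact Doris API):
```
// Illustrative stub standing in for the memo's group expression.
final class GroupExpression {
}

// Illustrative sketch of the new return contract: bundle whether the plan
// was new together with the expression that represents it in the memo.
final class CopyInResult {
    final boolean isNew;                           // false: plan already recorded
    final GroupExpression correspondingExpression; // expression owning the plan

    CopyInResult(boolean isNew, GroupExpression correspondingExpression) {
        this.isNew = isNew;
        this.correspondingExpression = correspondingExpression;
    }
}
```
The caller can then gate further work on `isNew`: only when the plan is new does its group need to be pushed onto the stack for further optimization.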
* [fix](image) fix bug that latestValidatedImageSeq may not be the second largest image id
When traversing the image files in the meta directory,
they are not guaranteed to be visited in ascending order of image id.
For example, if the image files are visited in the order 3, 5, 4, 1,
then latestImageSeq is 5, but latestValidatedImageSeq ends up as 3, which is wrong (it should be 4).
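A minimal sketch of a fix, assuming only the largest and second-largest ids are needed (variable names are illustrative): track both while visiting files in arbitrary order.
```
final class ImageSeqSketch {
    // Returns {largest, secondLargest}; -1 means "not seen yet".
    static long[] findTopTwo(long[] imageSeqs) {
        long latestImageSeq = -1;
        long latestValidatedImageSeq = -1;
        for (long seq : imageSeqs) {
            if (seq > latestImageSeq) {
                latestValidatedImageSeq = latestImageSeq; // old max becomes second
                latestImageSeq = seq;
            } else if (seq > latestValidatedImageSeq) {
                latestValidatedImageSeq = seq;
            }
        }
        // For the order 3, 5, 4, 1 this yields {5, 4} instead of {5, 3}.
        return new long[] {latestImageSeq, latestValidatedImageSeq};
    }
}
```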
The LogicalSort.equals() method depends on OrderKey.equals(), which was not defined correctly.
This PR defines OrderKey.equals() so that LogicalSort instances can be compared correctly.
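A minimal sketch of value-based equality for such a key, assuming fields like `expr`, `isAsc`, and `nullFirst` (the actual field set may differ); hashCode is kept consistent with equals:
```
import java.util.Objects;

final class OrderKeySketch {
    final Object expr;        // the sort expression
    final boolean isAsc;      // ascending vs. descending
    final boolean nullFirst;  // ordering of nulls

    OrderKeySketch(Object expr, boolean isAsc, boolean nullFirst) {
        this.expr = expr;
        this.isAsc = isAsc;
        this.nullFirst = nullFirst;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof OrderKeySketch)) {
            return false;
        }
        OrderKeySketch that = (OrderKeySketch) o;
        return isAsc == that.isAsc
                && nullFirst == that.nullFirst
                && Objects.equals(expr, that.expr);
    }

    @Override
    public int hashCode() {
        return Objects.hash(expr, isAsc, nullFirst);
    }
}
```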
This PR proposes moving the analysis logic into the dedicated class NereidsAnalyzer, which has the following benefits:
1. Unify the analysis logic across production and test code.
2. Facilitate analyzing subquery plans within different scopes.
1. Add support for subquery and `in` expressions.
2. Add TPC-H table creation and SQL queries, including the original SQL queries and the queries rewritten by Doris.
3. Adjust the position of checkAnalyze.
4. Add the exists subquery.
1. Unify and refine property names in LogicalAggregate and PhysicalAggregate.
2. Remove partitionExpressions from LogicalAggregate, since it is a physical property and should not appear in the logical plan. It should be generated when converting the logical aggregate to a physical aggregate, or in enforcer rules.
1. `[]` does not have a proper nested array type, which causes a BE coredump.
2. `[abc]` or `['abc']` loaded via vectorized load produces an incorrect result.
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
Currently, the new optimizer is under development. We want to merge features that are not fully developed into the codebase, without using them in the main code path.
Added an annotation, @Developing, to mark these features.
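A minimal sketch of what such a marker annotation could look like (the retention policy and targets are assumptions):
```
import java.lang.annotation.Documented;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Marks code that is merged but not yet used in the main code path.
@Documented
@Retention(RetentionPolicy.SOURCE)
@Target({ElementType.TYPE, ElementType.METHOD, ElementType.FIELD, ElementType.CONSTRUCTOR})
@interface Developing {
    String value() default ""; // optional note on what is still unfinished
}
```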
Currently, executing SSB with the Nereids planner runs into a dead loop and crashes the BE. This PR fixes the problem and adds some regression test cases to prevent SSB execution from failing again.
In Memo.copyIn(plan, group1, isRewrite), one branch handles the case where the plan is already recorded in the memo and owned by another group, 'group2'. In that case, 'group1' should be merged with 'group2', because they are equivalent.
After the merge, the upper level of 'group1', say 'p1 = group1.getLogicalExpression().getOwnerGroup()', and that of 'group2', say 'p2', are also equivalent, so we need to merge 'p1' and 'p2'. This process is recursive.
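A self-contained sketch of that recursion (the class names and the de-duplication index are illustrative, not the actual Memo internals):
```
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of recursive group merging in a memo.
final class MemoMergeSketch {
    static final class Group {
        final List<GroupExpr> expressions = new ArrayList<>();
        final Set<GroupExpr> parents = new HashSet<>(); // expressions using this group as a child
    }

    static final class GroupExpr {
        Group ownerGroup;
        final String op;                               // stands in for the plan node
        final List<Group> children = new ArrayList<>();

        GroupExpr(String op) {
            this.op = op;
        }

        // De-duplication identity: same operator over the same child groups.
        List<Object> key() {
            return List.of(op, List.copyOf(children));
        }
    }

    // Memo-wide index from (op, children) to the recorded expression.
    final Map<List<Object>, GroupExpr> index = new HashMap<>();

    // Merges 'source' into 'destination'; if re-pointing a parent expression
    // makes it a duplicate of a recorded one, their owner groups have become
    // equivalent as well, so they are merged too -- the recursive step.
    void mergeGroup(Group source, Group destination) {
        if (source == destination) {
            return;
        }
        for (GroupExpr parent : new ArrayList<>(source.parents)) {
            index.remove(parent.key());
            Collections.replaceAll(parent.children, source, destination);
            destination.parents.add(parent);
            GroupExpr existing = index.putIfAbsent(parent.key(), parent);
            if (existing != null && existing != parent) {
                mergeGroup(parent.ownerGroup, existing.ownerGroup);
            }
        }
        for (GroupExpr expr : source.expressions) {
            expr.ownerGroup = destination;
            destination.expressions.add(expr);
        }
        source.expressions.clear();
        source.parents.clear();
    }
}
```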