1. Spark does not currently produce bucketed output that is compatible with Hive, so Spark bucketed tables are not supported by the current implementation.
2. Hive 3.0 introduced bucket version 2, but Doris still uses Hive 2.3.7, which lacks the version 2 hash function, so I followed Trino's implementation and copied the hash function from Hive.
3. The current implementation does not support tables with multiple bucketed columns, and supports only `Equal` and `In` predicates.
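A minimal sketch of how `Equal` and `In` predicates on a single bucketed column can be turned into a set of candidate buckets. The class name `BucketPruningSketch`, the method `selectBuckets`, and the pluggable `hash` parameter are hypothetical and only for illustration. The real hash function is the one copied from Hive and is not reproduced here.

```java
import java.util.BitSet;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical helper, not the actual Doris class.
final class BucketPruningSketch {
    /**
     * Returns the buckets that may contain rows matching `col = v` (one literal)
     * or `col IN (v1, v2, ...)` (several literals).
     * `hash` stands in for the Hive-compatible hash function (assumption).
     */
    static BitSet selectBuckets(List<?> literals, int numBuckets, ToIntFunction<Object> hash) {
        BitSet selected = new BitSet(numBuckets);
        for (Object literal : literals) {
            int h = hash.applyAsInt(literal);
            // Clear the sign bit so the modulo result is never negative.
            selected.set((h & Integer.MAX_VALUE) % numBuckets);
        }
        return selected;
    }
}
```

Only the files belonging to the selected buckets need to be scanned. Any other predicate type leaves all buckets selected.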
Add MultiJoin.
In addition, when (joinInputs.size() >= 3 && !conjuncts.isEmpty()), the conjuncts can still contain an onPredicate.
For example:
```
A join B on A.id = B.id where A.sid = B.sid
```
1. Make version publish work in version order.
2. Update the delete bitmap during version publish: load the primary keys of the current version's rowset and search for them in the previous rowsets.
3. Speed up the publish version task by running tablet publish tasks in parallel.
Co-authored-by: yixiutt <yixiu@selectdb.com>
add a rule to push predicates down through the aggregation node
add PushDownPredicatesThroughAggregation.java
add unit tests for PushDownPredicatesThroughAggregation
For example:
```
Logical plan tree:
any_node
|
filter (a>0 and b>0)
|
group by(a, c)
|
scan
```
transformed to:
```
project
|
upper filter (b>0)
|
group by(a, c)
|
bottom filter (a>0)
|
scan
```
Note:
'a>0' can be pushed down, because 'a' is in the group by keys;
but 'b>0' cannot be pushed down, because 'b' is not in the group by keys.
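The split described in the note can be sketched as follows. This is a simplified illustration, not the code in PushDownPredicatesThroughAggregation.java: the classes `Conjunct`, `FilterSplit`, and `AggregatePushDownSketch` are hypothetical, and a conjunct is modelled only by its text and the set of columns it references.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of one conjunct of the filter condition.
final class Conjunct {
    final String text;
    final Set<String> columns; // columns referenced by this conjunct

    Conjunct(String text, Set<String> columns) {
        this.text = text;
        this.columns = columns;
    }
}

// Result of splitting the filter: the bottom filter and the upper filter.
final class FilterSplit {
    final List<Conjunct> pushedDown = new ArrayList<>();
    final List<Conjunct> kept = new ArrayList<>();
}

final class AggregatePushDownSketch {
    static FilterSplit split(List<Conjunct> conjuncts, Set<String> groupByKeys) {
        FilterSplit result = new FilterSplit();
        for (Conjunct c : conjuncts) {
            // A conjunct may move below the aggregation only if every column
            // it references is a group by key. Conjuncts without columns are
            // kept above the aggregation to stay conservative (assumption).
            if (!c.columns.isEmpty() && groupByKeys.containsAll(c.columns)) {
                result.pushedDown.add(c);
            } else {
                result.kept.add(c);
            }
        }
        return result;
    }
}
```

With group by keys (a, c) and the conjuncts 'a>0' and 'b>0', `pushedDown` holds 'a>0' (the bottom filter) and `kept` holds 'b>0' (the upper filter).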
This PR proposes to exclude the following auto-generated ANTLR4 file and directory from version control:
fe/fe-core/src/main/antlr4/org/apache/doris/nereids/DorisLexer.tokens
fe/fe-core/src/main/antlr4/org/apache/doris/nereids/gen/
support `cast` and date `extract` for TPC-H, for example:
select cast(a as datetime) as d from test;
select extract(year from datetime_column) as y from test;
When we copy a plan into the memo, we check whether the plan is already in the memo or is new.
In the new version of Memo.copyIn(), the result encapsulates is_new together with the plan's corresponding group expression.
is_new is used to avoid repeatedly applying rules to the same plan, which saves optimization effort.
change Memo.copyIn() and related function interfaces
Every time Memo.copyIn() is invoked, we check whether the plan is already recorded in the memo. If the plan is not new, we do not push its group onto the stack for further optimization.
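A rough sketch of the idea, under heavy simplification: plans and group expressions are plain strings, and the names `CopyInResult`, `MemoSketch`, and `copyInAndSchedule` are hypothetical, not the actual Nereids interfaces.

```java
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical result type: bundles is_new with the group expression.
final class CopyInResult {
    final boolean isNew;
    final String groupExpression;

    CopyInResult(boolean isNew, String groupExpression) {
        this.isNew = isNew;
        this.groupExpression = groupExpression;
    }
}

final class MemoSketch {
    // Plans are modelled as strings here; the real memo stores plan nodes.
    private final Map<String, String> groupExpressionByPlan = new HashMap<>();

    CopyInResult copyIn(String plan) {
        String existing = groupExpressionByPlan.get(plan);
        if (existing != null) {
            // Already recorded: return the existing group expression, not new.
            return new CopyInResult(false, existing);
        }
        String created = "GroupExpression#" + groupExpressionByPlan.size() + "(" + plan + ")";
        groupExpressionByPlan.put(plan, created);
        return new CopyInResult(true, created);
    }

    // Only new plans are pushed onto the stack for further optimization.
    static boolean copyInAndSchedule(MemoSketch memo, String plan, Deque<String> stack) {
        CopyInResult result = memo.copyIn(plan);
        if (result.isNew) {
            stack.push(result.groupExpression);
        }
        return result.isNew;
    }
}
```

Copying the same plan twice schedules it only once, which is the saving the is_new flag is meant to provide.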
* [fix](image) fix bug that latestValidatedImageSeq may not be the second largest image id
When traversing the image files in the meta directory,
there is no guarantee that they are visited in order of image id.
For example, if the image files are traversed in the order 3, 5, 4, 1,
then latestImageSeq is 5 but latestValidatedImageSeq is 3, which is wrong (it should be 4).
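One order-independent way to track the largest and second largest image ids in a single pass is sketched below. The class `ImageSeqSketch`, the method `latestTwo`, and the `-1` sentinel for "not found" are assumptions for illustration, not the code changed by this fix.

```java
import java.util.List;

// Hypothetical helper, not the actual Doris storage class.
final class ImageSeqSketch {
    /**
     * Returns {largest, secondLargest} of the given image ids, regardless of
     * the order in which they are traversed. -1 means "not found" (assumption).
     */
    static long[] latestTwo(List<Long> imageSeqs) {
        long largest = -1;
        long second = -1;
        for (long seq : imageSeqs) {
            if (seq > largest) {
                // The previous largest becomes the second largest.
                second = largest;
                largest = seq;
            } else if (seq > second && seq < largest) {
                second = seq;
            }
        }
        return new long[] {largest, second};
    }
}
```

For the traversal order 3, 5, 4, 1 this yields 5 and 4, so the second value is the expected latestValidatedImageSeq.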
LogicalSort.equals() depends on OrderKey.equals(), which is not defined correctly.
This PR defines OrderKey.equals() so that LogicalSort instances are compared correctly.
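A minimal sketch of a value-based equals() for an order key. The class `OrderKeySketch` and its fields (`expr` as a string, `isAsc`, `nullFirst`) are assumptions for illustration. The real OrderKey may hold different fields.

```java
import java.util.Objects;

// Hypothetical stand-in for OrderKey; the field set is an assumption.
final class OrderKeySketch {
    private final String expr;
    private final boolean isAsc;
    private final boolean nullFirst;

    OrderKeySketch(String expr, boolean isAsc, boolean nullFirst) {
        this.expr = expr;
        this.isAsc = isAsc;
        this.nullFirst = nullFirst;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        OrderKeySketch that = (OrderKeySketch) o;
        // Compare by value so that two sorts with the same keys are equal.
        return isAsc == that.isAsc && nullFirst == that.nullFirst && expr.equals(that.expr);
    }

    @Override
    public int hashCode() {
        // Keep hashCode consistent with equals.
        return Objects.hash(expr, isAsc, nullFirst);
    }
}
```

Once the key compares by value, a list of keys compares correctly through List.equals(), which is what a sort node's equals() can rely on.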
This PR proposes to move the analysis logic to the dedicated class NereidsAnalyzer, which has the following benefits:
Unify the analysis logic in production and test files.
Facilitate analyzing subquery plans within different scopes.