scalar subquery:
A subquery that returns exactly one row and one column.
A restriction has been added so that `a = (subquery)` requires the subquery to return a result of 1 row and 1 column.
For now, only the 1-column limit is enforced.
TODO: a follow-up will enforce the 1-row limit.
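To make the two-step restriction concrete, here is a minimal Java sketch of the column-count check; the method and names are illustrative only, not the actual Doris code:
```
import java.util.List;

public class ScalarSubqueryCheck {
    /**
     * Illustrative version of the first limit described above: a scalar subquery
     * compared with `=` must produce exactly one output column. The 1-row limit
     * is the follow-up (TODO) and has to be enforced when the subquery runs.
     */
    static void checkOneColumn(List<String> outputColumns) {
        if (outputColumns.size() != 1) {
            throw new IllegalStateException(
                    "scalar subquery must return exactly 1 column, got " + outputColumns.size());
        }
    }

    public static void main(String[] args) {
        checkOneColumn(List.of("max_salary"));   // passes
        // checkOneColumn(List.of("a", "b"));    // would throw
    }
}
```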
* [tracing] Support the OpenTelemetry collector.
1. support exporting traces to multiple distributed tracing systems via the collector;
2. support using the collector to process traces (a generic export sketch is shown below).
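The following is a generic OpenTelemetry Java SDK sketch of exporting spans to a collector over OTLP/gRPC; it is not Doris's actual tracing wiring, and the endpoint and names are assumptions:
```
import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter;
import io.opentelemetry.sdk.OpenTelemetrySdk;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor;

public class CollectorExportSketch {
    public static void main(String[] args) {
        // Export spans over OTLP/gRPC to a collector; the collector then forwards
        // them to whichever tracing backends it is configured with.
        OtlpGrpcSpanExporter exporter = OtlpGrpcSpanExporter.builder()
                .setEndpoint("http://localhost:4317")   // assumed collector address
                .build();
        SdkTracerProvider tracerProvider = SdkTracerProvider.builder()
                .addSpanProcessor(BatchSpanProcessor.builder(exporter).build())
                .build();
        OpenTelemetry openTelemetry = OpenTelemetrySdk.builder()
                .setTracerProvider(tracerProvider)
                .build();

        Tracer tracer = openTelemetry.getTracer("example-tracer");
        Span span = tracer.spanBuilder("example-operation").startSpan();
        try {
            // ... traced work ...
        } finally {
            span.end();
        }
        tracerProvider.shutdown();
    }
}
```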
Add parsing support for subqueries.
Add LogicalApply, LogicalCorrelatedJoin, and LogicalEnforceSingleRow.
(These structures are placeholders for now, in preparation for the follow-up work.)
LogicalApply:
Apply node for subqueries.
This node represents the subquery in the relational algebra tree;
refer to "Orthogonal Optimization of Subqueries and Aggregation".
LogicalCorrelatedJoin:
A relational algebra node with a join type; the subquery's Apply node is converted into this node.
LogicalEnforceSingleRow:
Guarantees that the result contains exactly 1 row.
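As a rough illustration of the guarantee LogicalEnforceSingleRow stands for, here is a generic Java sketch of enforcing the single-row property at runtime; it is not the actual Doris operator, and the usual SQL convention of an empty scalar result becoming NULL is assumed:
```
import java.util.Iterator;
import java.util.List;

public class EnforceSingleRowSketch {
    /** At most one row may come out of the subquery: an empty result becomes
     *  NULL, and more than one row is an error. */
    static <T> T enforceSingleRow(Iterator<T> rows) {
        if (!rows.hasNext()) {
            return null; // assumed SQL semantics: empty scalar subquery result -> NULL
        }
        T first = rows.next();
        if (rows.hasNext()) {
            throw new IllegalStateException("scalar subquery returned more than one row");
        }
        return first;
    }

    public static void main(String[] args) {
        System.out.println(enforceSingleRow(List.of(42).iterator()));   // 42
        // enforceSingleRow(List.of(1, 2).iterator());                   // would throw
    }
}
```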
1. Spark currently does not produce bucketed output that is compatible with Hive, so Spark bucketed tables are not supported in the current implementation.
2. Hive 3.0 introduced bucketing version 2, but Doris still depends on Hive 2.3.7, which lacks the version 2 hash function, so I followed Trino's implementation and copied the hash function from Hive.
3. The current implementation does not support tables with multiple bucketed columns, and it only supports `Equal` and `In` predicates (see the sketch below).
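As referenced in item 3, here is a minimal Java sketch of bucket pruning under these restrictions. The formula bucket id = (hash & Integer.MAX_VALUE) % numBuckets is the classic Hive v1 scheme; only the INT case (whose v1 hash is the value itself) is shown, and the method names are illustrative rather than the actual Doris code:
```
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class HiveBucketPruneSketch {
    /** Hive bucketing v1: bucket id = (hash & Integer.MAX_VALUE) % numBuckets.
     *  For INT columns the v1 hash is the value itself; other types use the
     *  per-type hash functions copied from Hive (via Trino), omitted here. */
    static int bucketFor(int value, int numBuckets) {
        return (value & Integer.MAX_VALUE) % numBuckets;
    }

    /** Pruning with an `In` predicate on the single bucketed column: only the
     *  buckets that some listed value hashes into need to be scanned.
     *  An `Equal` predicate is just the single-element case. */
    static Set<Integer> bucketsToScan(List<Integer> inValues, int numBuckets) {
        return inValues.stream()
                .map(v -> bucketFor(v, numBuckets))
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        System.out.println(bucketsToScan(List.of(1, 17, 33), 16)); // [1]
    }
}
```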
Add MultiJoin.
In addition, when (joinInputs.size() >= 3 && !conjuncts.isEmpty()), the conjuncts can still contain ON predicates.
For example:
```
A join B on A.id = B.id where A.sid = B.sid
```
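A small Java sketch of the idea that WHERE conjuncts referencing two or more join inputs behave like ON predicates; the Conjunct record and relation tracking are hypothetical, not MultiJoin's actual representation:
```
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class ConjunctClassifySketch {
    /** Hypothetical conjunct: its SQL text and the relations its slots come from. */
    record Conjunct(String sql, Set<String> referencedRelations) {}

    /** Conjuncts referencing two or more join inputs act as join (ON) conditions
     *  (key true below); the rest stay as plain single-table filters (key false). */
    static Map<Boolean, List<Conjunct>> split(List<Conjunct> conjuncts) {
        return conjuncts.stream()
                .collect(Collectors.partitioningBy(c -> c.referencedRelations().size() >= 2));
    }

    public static void main(String[] args) {
        List<Conjunct> conjuncts = List.of(
                new Conjunct("A.sid = B.sid", Set.of("A", "B")),   // acts as an ON predicate
                new Conjunct("A.x > 10", Set.of("A")));            // plain filter on A
        System.out.println(split(conjuncts));
    }
}
```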
Add a rule to push predicates down through the aggregation node.
Add PushDownPredicatesThroughAggregation.java.
Add a unit test for PushDownPredicatesThroughAggregation.
For example:
```
Logical plan tree:
any_node
|
filter (a>0 and b>0)
|
group by(a, c)
|
scan
```
transformed to:
```
project
|
upper filter (b>0)
|
group by(a, c)
|
bottom filter (a>0)
|
scan
```
Note:
'a>0' can be pushed down, because 'a' is one of the group-by keys;
but 'b>0' cannot be pushed down, because 'b' is not a group-by key.
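A minimal Java sketch of the split described in the note: a conjunct can be pushed below the aggregation only if every column it references is a group-by key. The Conjunct record and helper names are illustrative, not the rule's actual code:
```
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class PushDownThroughAggSketch {
    /** Hypothetical conjunct: its SQL text and the columns it references. */
    record Conjunct(String sql, Set<String> referencedColumns) {}

    /** Key true: pushable below the aggregation (all referenced columns are
     *  group-by keys). Key false: must stay in the filter above the aggregation. */
    static Map<Boolean, List<Conjunct>> splitByGroupByKeys(
            List<Conjunct> conjuncts, Set<String> groupByKeys) {
        return conjuncts.stream()
                .collect(Collectors.partitioningBy(
                        c -> groupByKeys.containsAll(c.referencedColumns())));
    }

    public static void main(String[] args) {
        Set<String> keys = Set.of("a", "c");
        List<Conjunct> filter = List.of(
                new Conjunct("a > 0", Set.of("a")),   // pushable: a is a group-by key
                new Conjunct("b > 0", Set.of("b")));  // not pushable: b is not a key
        System.out.println(splitByGroupByKeys(filter, keys));
    }
}
```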
Support cast and date-part extract for TPC-H, for example:
select cast(a as datetime) as d from test;
select extract(year from datetime_column) as y from test;
When we copy a plan into the memo, we check whether the plan is already in the memo or is a new plan.
In the new version of Memo.copyIn(), the return value encapsulates is_new and the plan's corresponding group expression.
is_new is used to avoid repeatedly applying rules to the same plan, and hence saves optimization effort.
Change Memo.copyIn() and related function interfaces.
Every time Memo.copyIn() is invoked, we check whether the plan is already recorded in the memo. If the plan is not new, we do not put its group onto the stack for further optimization.
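A hedged sketch of the calling pattern described above, with hypothetical Plan, GroupExpression, and CopyInResult types standing in for the real memo classes:
```
import java.util.ArrayDeque;
import java.util.Deque;

public class MemoCopyInSketch {
    /** Hypothetical stand-ins for the memo's plan and group-expression types. */
    static class Plan {}
    static class GroupExpression {}

    /** Hypothetical shape of what Memo.copyIn() returns after this change:
     *  whether the plan was new to the memo, plus its group expression. */
    record CopyInResult(boolean isNew, GroupExpression groupExpression) {}

    /** Placeholder for the real memo: deduplicates the plan and reports whether it was new. */
    static CopyInResult copyIn(Plan plan) {
        return new CopyInResult(true, new GroupExpression());
    }

    /** Caller-side pattern described above: only plans that are new to the memo are
     *  pushed back onto the optimization stack, so the same plan is not re-optimized. */
    static void copyInAndMaybeSchedule(Plan plan, Deque<GroupExpression> jobStack) {
        CopyInResult result = copyIn(plan);
        if (result.isNew()) {
            jobStack.push(result.groupExpression());
        }
    }

    public static void main(String[] args) {
        Deque<GroupExpression> stack = new ArrayDeque<>();
        copyInAndMaybeSchedule(new Plan(), stack);
        System.out.println("groups scheduled for optimization: " + stack.size()); // 1
    }
}
```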
* [fix](image) fix bug that latestValidatedImageSeq may not be the second largest image id
When traversing the image files in the meta directory,
they are not guaranteed to be visited in order of image id.
For example, if the image files are traversed in an order like 3, 5, 4, 1,
then latestImageSeq is 5, but latestValidatedImageSeq ends up as 3, which is wrong (it should be 4).
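A generic Java illustration of the intended behaviour (not the actual fix): the latest and second-latest image sequence numbers are computed independently of traversal order:
```
import java.util.List;

public class ImageSeqSketch {
    /** For image ids seen in arbitrary directory order (e.g. 3, 5, 4, 1), the
     *  latest seq is the largest id and the validated one should be the second
     *  largest (4 here), not whatever happened to be seen before the maximum. */
    static long[] latestAndSecondLatest(List<Long> imageSeqs) {
        long latest = Long.MIN_VALUE;
        long second = Long.MIN_VALUE;
        for (long seq : imageSeqs) {
            if (seq > latest) {
                second = latest;
                latest = seq;
            } else if (seq > second) {
                second = seq;
            }
        }
        return new long[] {latest, second};
    }

    public static void main(String[] args) {
        long[] r = latestAndSecondLatest(List.of(3L, 5L, 4L, 1L));
        System.out.println("latest=" + r[0] + ", secondLatest=" + r[1]); // latest=5, secondLatest=4
    }
}
```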
The LogicalSort.equals() method depends on OrderKey.equals(), which was not defined correctly.
This PR defines OrderKey.equals() so that LogicalSort instances can be compared correctly.
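A minimal sketch of an order key with equals()/hashCode() defined over all of its fields; the field names (expr, isAsc, nullFirst) are assumptions rather than the exact Doris OrderKey fields:
```
import java.util.Objects;

/** With equals() defined over all fields, a parent node such as LogicalSort
 *  can compare its order keys correctly. */
public final class OrderKeySketch {
    private final String expr;
    private final boolean isAsc;
    private final boolean nullFirst;

    public OrderKeySketch(String expr, boolean isAsc, boolean nullFirst) {
        this.expr = expr;
        this.isAsc = isAsc;
        this.nullFirst = nullFirst;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof OrderKeySketch)) {
            return false;
        }
        OrderKeySketch other = (OrderKeySketch) o;
        return isAsc == other.isAsc
                && nullFirst == other.nullFirst
                && Objects.equals(expr, other.expr);
    }

    @Override
    public int hashCode() {
        return Objects.hash(expr, isAsc, nullFirst);
    }
}
```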
This PR proposes to move the analysis logic to the dedicated class NereidsAnalyzer, which has the following benefits:
1. Unify the analysis logic in production code and tests.
2. Facilitate analyzing subquery plans within different scopes.