This PR changes some interfaces to avoid unsafe casts.
- Change the return type of `Plan.getExpressions()` from `List<Expression>` to `List<? extends Expression>`.
  Returning `projects` (a list of `NamedExpression`) from `getExpressions` then needs no unsafe cast. See `LogicalProject.getExpressions()` for an example.
- Change the return type of `EmptyRelation.getProjects()` from `List<NamedExpression>` to `List<? extends NamedExpression>`.
  An empty relation can then be created from a list of slots without an unsafe cast. See the `EliminateLimit` rule for an example.
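To illustrate why the wildcard types remove the casts, here is a minimal sketch. `Expression`, `NamedExpression`, `Slot`, `Plan`, `LogicalProject` and `EmptyRelation` below are hypothetical, trimmed-down stand-ins rather than the real Nereids classes, and an `EmptyRelation` constructor that takes a wildcard list is an assumption.

```java
import java.util.List;

// Hypothetical, heavily simplified stand-ins for the Nereids types;
// only the generic signatures matter in this sketch.
final class WildcardReturnDemo {
    interface Expression {}

    interface NamedExpression extends Expression {}

    record Slot(String name) implements NamedExpression {}

    interface Plan {
        // Any list whose elements are Expressions satisfies this signature.
        List<? extends Expression> getExpressions();
    }

    static final class LogicalProject implements Plan {
        private final List<NamedExpression> projects;

        LogicalProject(List<NamedExpression> projects) {
            this.projects = projects;
        }

        // List<NamedExpression> is a subtype of List<? extends Expression>, so this
        // covariant override needs no cast. Had Plan declared List<Expression>,
        // returning projects would need an unchecked cast (generics are invariant).
        @Override
        public List<NamedExpression> getExpressions() {
            return projects;
        }
    }

    static final class EmptyRelation {
        private final List<? extends NamedExpression> projects;

        // Accepts a List<Slot> directly; a List<NamedExpression> parameter would not.
        EmptyRelation(List<? extends NamedExpression> projects) {
            this.projects = projects;
        }

        List<? extends NamedExpression> getProjects() {
            return projects;
        }
    }

    // Callers can still read every element as an Expression.
    static Expression firstExpression(Plan plan) {
        return plan.getExpressions().get(0);
    }
}
```

The trade-off is that callers can no longer add elements to the returned list, which is acceptable for read-only accessors like these.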
In Nereids, we could not distinguish two relations that read the same table in one plan tree.
This led to some tricky code during planning, such as a separate branch for `equals` in `GroupExpression`.
This PR adds a `RelationId` to `LogicalRelation` and `PhysicalRelation`. The `equals` function of every relation now compares the `RelationId`, which lets us distinguish two relations from the same table.
TODO:
Add a relation id to `UnboundRelation`, `UnboundOneRowRelation`, `LogicalOneRowRelation` and `PhysicalOneRowRelation`.
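A minimal sketch of the idea, assuming a hypothetical `RelationId` record, a hypothetical `RelationIdGenerator`, and an `OlapScanSketch` class standing in for the real relation classes: because equality includes the relation id, the two sides of a self-join on the same table are no longer equal.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicInteger;

final class RelationIdDemo {
    // Hypothetical id type: every relation instance in a plan gets its own id.
    record RelationId(int id) {}

    // Hypothetical generator handing out a fresh RelationId per relation.
    static final class RelationIdGenerator {
        private final AtomicInteger next = new AtomicInteger();

        RelationId nextId() {
            return new RelationId(next.getAndIncrement());
        }
    }

    // Hypothetical stand-in for a relation that scans a table.
    static final class OlapScanSketch {
        final RelationId relationId;
        final String table;

        OlapScanSketch(RelationId relationId, String table) {
            this.relationId = relationId;
            this.table = table;
        }

        // Equal only if the table AND the RelationId match, so two scans of the
        // same table (e.g. both sides of a self-join) are distinguishable.
        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof OlapScanSketch)) {
                return false;
            }
            OlapScanSketch other = (OlapScanSketch) o;
            return relationId.equals(other.relationId) && table.equals(other.table);
        }

        // Kept consistent with equals.
        @Override
        public int hashCode() {
            return Objects.hash(relationId, table);
        }
    }
}
```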
A channel is closed when a timeout or exception happens; if only
one stub is used, all queries would then fail.
If we don't close the channel, grpc-java sometimes gets stuck without sending
any RPC.
1. Some related rules need to be executed together to get the expected plan.
2. (Piggyback) Add session variables to avoid falling back to the legacy planner when running the Nereids regression tests.
For debugging purposes:
- Add session variable `skip_storage_engine_merge`. When it is set to true, tables of the aggregate key model and the unique key model are read as if they used the duplicate key model.
- Add session variable `skip_delete_predicate`. When it is set to true, rows deleted with a DELETE statement are still selected.
Related PRs:
- https://github.com/apache/doris/pull/11582
- https://github.com/apache/doris/pull/12048
Use the new file scan node and the new scheduling framework to run load jobs, replacing the old broker scan node.
The load part (BE side) is a work in progress. The query part (FE side) has been tested with the TPC-H benchmark.
Please review only the FE code in this PR; the BE code is disabled by the `enable_new_load_scan_node` configuration. Another PR will follow soon to fix the BE-side code.
This rule rewrites project -> limit into limit -> project. The reason is that we can get a tree like project -> limit -> project -> other node. If we do not rewrite it, we cannot merge the two projects into one. And if more than one project sits on one node, the second one overwrites the first one during translation, and the BE then core dumps or returns a "slot cannot be found" error.
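A minimal sketch of the rewrite on a toy plan tree. `Plan`, `Project`, `Limit`, `Scan` and the helper methods are hypothetical; projects here only select column names, so merging two adjacent projects just keeps the outer column list. Moving the project below the limit is safe because a project does not change the number of rows.

```java
import java.util.List;

final class ProjectLimitTransposeDemo {
    sealed interface Plan permits Project, Limit, Scan {}

    record Scan(String table) implements Plan {}

    record Limit(long limit, Plan child) implements Plan {}

    // Simplification: a project only selects column names (no computed expressions).
    record Project(List<String> columns, Plan child) implements Plan {}

    // project -> limit -> X  ==>  limit -> project -> X
    static Plan pushProjectBelowLimit(Plan plan) {
        if (plan instanceof Project p && p.child() instanceof Limit l) {
            return new Limit(l.limit(), new Project(p.columns(), l.child()));
        }
        return plan;
    }

    // project -> project -> X  ==>  project -> X
    // With name-only projects, the outer column list already defines the output.
    static Plan mergeProjects(Plan plan) {
        if (plan instanceof Project outer && outer.child() instanceof Project inner) {
            return new Project(outer.columns(), inner.child());
        }
        return plan;
    }

    // Transposes first, which makes the two projects adjacent, then merges them.
    static Plan rewrite(Plan plan) {
        Plan transposed = pushProjectBelowLimit(plan);
        if (transposed instanceof Limit l) {
            return new Limit(l.limit(), mergeProjects(l.child()));
        }
        return transposed;
    }
}
```

For project[a] -> limit 10 -> project[a, b] -> scan t, the result is limit 10 -> project[a] -> scan t, leaving a single project above the scan.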
In some cases, the output slots of the agg info may be materialized by calling `SlotDescriptor`'s `materializeSrcExpr` method while the intermediate slots are not. This PR sets the materialized info of the intermediate slots to keep them consistent with the output slots.
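A minimal sketch of the consistency fix. `SlotDesc` is a hypothetical class with a single `materialized` flag (the real `SlotDescriptor` API is not used), and the sketch assumes the output and intermediate tuples list their slots in the same order.

```java
import java.util.List;

final class AggSlotSyncDemo {
    // Hypothetical, trimmed-down slot descriptor carrying only the flag of interest.
    static final class SlotDesc {
        boolean materialized;

        SlotDesc(boolean materialized) {
            this.materialized = materialized;
        }
    }

    // Copies each output slot's materialized flag to the intermediate slot at the
    // same position, assuming both tuples list their slots in the same order.
    static void syncIntermediateSlots(List<SlotDesc> outputSlots, List<SlotDesc> intermediateSlots) {
        if (outputSlots.size() != intermediateSlots.size()) {
            throw new IllegalArgumentException("output and intermediate tuples differ in size");
        }
        for (int i = 0; i < outputSlots.size(); i++) {
            intermediateSlots.get(i).materialized = outputSlots.get(i).materialized;
        }
    }
}
```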
# Proposed changes
First step of #12303
## Problem summary
This is the first step towards supporting rollup index selection for aggregate/unique key OLAP tables.
This PR selects a rollup index when an aggregate node is present and the aggregate function matches the value type, so pre-aggregation is turned on by default. Cases where pre-aggregation should be turned off will be addressed in the next PR.
Main steps for rollup index selection:
1. Filter the rollup indexes that contain all the required columns.
2. Among them, keep the rollup indexes that match the key prefix best.
3. Order the remaining rollup indexes by row count, column count, and rollup index id.
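The three steps can be sketched as follows. The `Index` record, the simplified notion of "key prefix match" (how many leading key columns appear in the query's predicates), and ascending order for all three sort keys are assumptions for illustration, not the actual selection code.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class RollupSelectionDemo {
    // Hypothetical index metadata: id, ordered key columns, all columns, row count.
    record Index(long id, List<String> keyColumns, Set<String> columns, long rowCount) {}

    // Number of leading key columns that the query's predicates reference.
    static int matchedPrefixLength(Index index, Set<String> predicateColumns) {
        int n = 0;
        for (String key : index.keyColumns()) {
            if (!predicateColumns.contains(key)) {
                break;
            }
            n++;
        }
        return n;
    }

    // Throws NoSuchElementException if no index holds all required columns
    // (the base index normally always does).
    static Index select(List<Index> indexes, Set<String> requiredColumns, Set<String> predicateColumns) {
        // 1. Keep only indexes that contain every required column.
        List<Index> candidates = indexes.stream()
                .filter(i -> i.columns().containsAll(requiredColumns))
                .collect(Collectors.toList());
        // 2. Keep only indexes with the longest matched key prefix.
        int best = candidates.stream()
                .mapToInt(i -> matchedPrefixLength(i, predicateColumns))
                .max()
                .orElse(0);
        // 3. Order by row count, then column count, then id, and take the first.
        return candidates.stream()
                .filter(i -> matchedPrefixLength(i, predicateColumns) == best)
                .min(Comparator.comparingLong(Index::rowCount)
                        .thenComparingInt(i -> i.columns().size())
                        .thenComparingLong(Index::id))
                .orElseThrow();
    }
}
```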
TODO remaining:
1. Address cases where pre-aggregation should be turned off (next PR).
2. Add more test cases.
Refactor:
- Add `Project.getSlotToProducer` to extract a map from each project output slot to its producing expression.
- Add `Filter.getConjuncts` to split the filter condition into conjunctive predicates.
- Replace the usages of `ExpressionReplacer` with `ExpressionUtils.replace(expr, replaceMap)` to simplify the code.
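A sketch of the two helpers on a toy expression tree: `conjuncts` flattens nested `AND`s, and `replace` substitutes sub-expressions from a map, for example a project's slot-to-producer map when rewriting a filter that sits above it. All types and method names here are hypothetical stand-ins for the Nereids classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ConjunctsDemo {
    sealed interface Expr permits And, Ref, Lit, Gt {}

    record And(Expr left, Expr right) implements Expr {}

    record Ref(String name) implements Expr {}

    record Lit(int value) implements Expr {}

    record Gt(Expr left, Expr right) implements Expr {}

    // Splits `a AND (b AND c)` into [a, b, c], similar in spirit to Filter.getConjuncts.
    static List<Expr> conjuncts(Expr e) {
        List<Expr> out = new ArrayList<>();
        collect(e, out);
        return out;
    }

    private static void collect(Expr e, List<Expr> out) {
        if (e instanceof And a) {
            collect(a.left(), out);
            collect(a.right(), out);
        } else {
            out.add(e);
        }
    }

    // Rebuilds the tree top-down, substituting any sub-expression found in
    // replaceMap, similar in spirit to ExpressionUtils.replace(expr, replaceMap).
    static Expr replace(Expr e, Map<Expr, Expr> replaceMap) {
        Expr hit = replaceMap.get(e);
        if (hit != null) {
            return hit;
        }
        if (e instanceof And a) {
            return new And(replace(a.left(), replaceMap), replace(a.right(), replaceMap));
        }
        if (e instanceof Gt g) {
            return new Gt(replace(g.left(), replaceMap), replace(g.right(), replaceMap));
        }
        return e;
    }
}
```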
Currently, executing a non-correlated subquery reports an error that the corresponding column cannot be found.
The reason is that the tuple ID of the child obtained in `visitPhysicalNestedLoopJoin` is not consistent with the child itself.
Non-correlated subqueries trigger this bug because they use a cross join.
This PR also adds subquery regression tests for non-correlated and complex scenarios.
Co-authored-by: morrySnow <morrysnow@126.com>
Support a function registry.
The classes:
- BuiltinFunctions: contains the list of built-in functions
- FunctionRegistry: registers scalar functions and aggregate functions, and finds a function by name
- FunctionBuilder: resolves a `BoundFunction` class, extracts its constructor, and builds a `BoundFunction` from the arguments (`List<Expression>`)
Registration example: for simplicity, you can add built-in functions to these lists:
```java
public class BuiltinFunctions implements FunctionHelper {
    // The extra string arguments register aliases: Substring is reachable
    // as both "substr" and "substring".
    public final List<ScalarFunc> scalarFunctions = ImmutableList.of(
            scalar(Substring.class, "substr", "substring"),
            scalar(WeekOfYear.class),
            scalar(Year.class)
    );

    public final ImmutableList<AggregateFunc> aggregateFunctions = ImmutableList.of(
            agg(Avg.class),
            agg(Count.class),
            agg(Max.class),
            agg(Min.class),
            agg(Sum.class)
    );
}
```
Note:
- Currently, only scalar functions and aggregate functions can be registered; support for registering table functions will be added later.
- Currently, a function can only be resolved by its name and arity; overloads with the same arity cannot be distinguished, e.g. `some_function(Expression)` and `some_function(Literal)`.
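A minimal sketch of resolution by name and arity, which makes the second note concrete. `FunctionRegistryDemo` and its `Builder` record are hypothetical: unlike the `FunctionBuilder` described above, which extracts a constructor from a `BoundFunction` class, this sketch uses a plain factory lambda and returns a string instead of a bound function.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

final class FunctionRegistryDemo {
    // Hypothetical builder: a fixed arity plus a factory from arguments to a call.
    record Builder(int arity, Function<List<Object>, String> factory) {}

    private final Map<String, List<Builder>> builders = new HashMap<>();

    // Registers the same builder under every alias, e.g. "substr" and "substring".
    void register(Builder builder, String... names) {
        for (String name : names) {
            builders.computeIfAbsent(name.toLowerCase(Locale.ROOT), k -> new ArrayList<>()).add(builder);
        }
    }

    // Resolution looks only at the name and the number of arguments, so two
    // builders with the same name and arity are ambiguous.
    Optional<String> build(String name, List<Object> args) {
        List<Builder> candidates = builders.getOrDefault(name.toLowerCase(Locale.ROOT), List.of())
                .stream()
                .filter(b -> b.arity() == args.size())
                .toList();
        if (candidates.size() > 1) {
            throw new IllegalStateException("ambiguous overload: " + name);
        }
        return candidates.stream().findFirst().map(b -> b.factory().apply(args));
    }
}
```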
When an uncorrelated subquery appears in the HAVING predicates, the output of the having node mistakenly includes a slot from the subquery. This PR fixes it by always adding a project on top of the having node.
Co-authored-by: mch_ucchi <organic_chemistry@foxmail.com>
1. For a query with 1656 unions, the plan thrift size is reduced from 400MB+ to 2MB.
   This optimization was introduced in #4904 but lost after #9720.
2. Disable `ExprSubstitutionMap.verify` when debug mode is disabled.
   As a result, the planning time of the query with 1656 unions is reduced from 20s to 2s.
The original statistics derivation algorithm relies on NDV and other column statistics, but we cannot get these statistics in production environments.
This PR changes the statistics calculation of these operators to use a DEFAULT RATIO variable instead of column statistics.
We should revisit these algorithms once column statistics are available in production environments.
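A minimal sketch of a ratio-based estimate for one operator. Both the value of `DEFAULT_RATIO` and the choice of a filter as the example operator are assumptions, since the PR names neither.

```java
final class DefaultRatioStatsDemo {
    // Hypothetical value: the actual DEFAULT RATIO is not stated in the PR.
    static final double DEFAULT_RATIO = 0.5;

    // Without NDV or other column statistics, a filter's output row count is
    // estimated as a fixed fraction of its input row count, rounded up.
    static long estimateFilterRows(long inputRows) {
        return (long) Math.ceil(inputRows * DEFAULT_RATIO);
    }
}
```

The estimate ignores the predicate entirely, which is exactly why it should be replaced once column statistics are available.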
Implement `uncheckedCast` on `VarcharLiteral` as a temporary way to make `TimestampArithmetic` work.
We should remove this code and do the implicit cast in the `TypeCoercion` rule in the future.
Just like the legacy planner, Nereids parses all fractional literals as decimals.
In the future, we will add more syntax to let users control the type of fractional literals.
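To illustrate what parsing to a decimal preserves, here is a sketch that derives a `DECIMAL(precision, scale)` type from the literal text with `java.math.BigDecimal`. The `DecimalType` record and the derivation rule are hypothetical, not the Nereids code; the point is that a decimal keeps the literal's exact digits, whereas a `double` cannot represent a value such as `0.1` exactly.

```java
import java.math.BigDecimal;

final class FractionalLiteralDemo {
    // Hypothetical type descriptor for DECIMAL(precision, scale).
    record DecimalType(int precision, int scale) {}

    // Hypothetical rule: derive the decimal type from the literal's exact digits.
    static DecimalType typeOf(String literal) {
        BigDecimal value = new BigDecimal(literal);
        if (value.scale() < 0) {
            // e.g. 1E+3 has scale -3; rescale so it is stored as the integer 1000.
            value = value.setScale(0);
        }
        int scale = value.scale();
        // precision() counts significant digits only: 0.05 has precision 1 and
        // scale 2, while DECIMAL(p, s) requires p >= s.
        int precision = Math.max(value.precision(), scale);
        return new DecimalType(precision, scale);
    }
}
```

For example, `123.45` maps to `DECIMAL(5, 2)` under this rule.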
When the `orthogonal_bitmap_union_count` function is used, the execution plan displays:
PREAGGREGATION: OFF
Reason: Invalid Aggregate Operator: orthogonal_bitmap_union_count
The correct plan is: PREAGGREGATION: ON
Co-authored-by: lihuigang <lihuigang@meituan.com>