After the NormalizeAggregate rule is applied, the owner groups of all the aggregate's children are removed.
The root cause is that the new aggregate node is treated as equal to the old aggregate node, because LogicalAggregate.equals() does not take some attributes ("normalized", "disassembled") into account.
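The effect of leaving a flag out of equals() can be illustrated with a minimal sketch. The `Agg` class below is a hypothetical stand-in, not the real LogicalAggregate, and the `HashMap` only approximates a structure that deduplicates nodes by equality.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class AggregateEqualsSketch {
    // Hypothetical stand-in for an aggregate node; not the real LogicalAggregate.
    static final class Agg {
        final String groupBy;
        final boolean normalized;
        final boolean flagInEquals; // toggles whether "normalized" takes part in equality

        Agg(String groupBy, boolean normalized, boolean flagInEquals) {
            this.groupBy = groupBy;
            this.normalized = normalized;
            this.flagInEquals = flagInEquals;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof Agg)) {
                return false;
            }
            Agg other = (Agg) o;
            if (!groupBy.equals(other.groupBy)) {
                return false;
            }
            return !flagInEquals || normalized == other.normalized;
        }

        @Override
        public int hashCode() {
            return flagInEquals ? Objects.hash(groupBy, normalized) : groupBy.hashCode();
        }
    }

    // Inserts the original node and its normalized copy into a map keyed by node
    // and returns how many distinct entries remain.
    public static int distinctNodes(boolean flagInEquals) {
        Map<Agg, String> nodes = new HashMap<>();
        nodes.put(new Agg("k1", false, flagInEquals), "old");
        nodes.put(new Agg("k1", true, flagInEquals), "new");
        return nodes.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctNodes(false)); // the two nodes collapse into one entry
        System.out.println(distinctNodes(true));  // the two nodes stay distinct
    }
}
```

When the flag is ignored, the new node collapses into the old one's entry, which mirrors the symptom described above.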
An earlier PR, #11842, added projection support to each ExecNode.
However, the projection expression list does not appear in the explain output, which makes debugging inconvenient.
This PR adds the projections to the explain string when they exist.
Add a new property called 'reserve_replica', which lets you
get back a table whose partitions have the same replication num
as before the backup.
Co-authored-by: Stalary <stalary@163.com>
Co-authored-by: camby <104178625@qq.com>
Fix some bugs that occur when adding REWRITE rules to the Cascades Optimizer
- all rules should be marked as non-rewrite rules when they are used in the Cascades Optimizer
- the promise of IMPLEMENT rules should be larger than that of other rules, since we should do exploration first.
In the old planner, Predicate sets its type in analyzeImpl(). However, analyzeImpl() is only on the old planner's path, not on the Nereids path, so the type is left invalid there.
Because every predicate has type bool, we now set the type in the constructor.
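The difference between the two approaches can be sketched as follows. All class names here (`Expr`, `LazyPredicate`, `EagerPredicate`) are hypothetical and only illustrate the idea; they are not the actual planner classes.

```java
public class PredicateTypeSketch {
    enum Type { INVALID, BOOLEAN }

    // Hypothetical base expression whose type starts out invalid.
    static class Expr {
        protected Type type = Type.INVALID;

        // Analysis step that only one planner path calls.
        void analyzeImpl() {
        }

        Type getType() {
            return type;
        }
    }

    // Before: the type is assigned only when analyzeImpl() runs.
    static class LazyPredicate extends Expr {
        @Override
        void analyzeImpl() {
            type = Type.BOOLEAN;
        }
    }

    // After: the type is assigned in the constructor, so it is valid on every path.
    static class EagerPredicate extends Expr {
        EagerPredicate() {
            type = Type.BOOLEAN;
        }
    }

    // Returns the type seen by a caller that never invokes analyzeImpl().
    public static String typeWithoutAnalyze(boolean eager) {
        Expr predicate = eager ? new EagerPredicate() : new LazyPredicate();
        return predicate.getType().name();
    }

    public static void main(String[] args) {
        System.out.println(typeWithoutAnalyze(false));
        System.out.println(typeWithoutAnalyze(true));
    }
}
```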
Currently, Nereids doesn't support queries whose aggregate functions have no slot reference, since all the columns would be pruned, e.g.
SELECT COUNT(1) FROM t;
This PR keeps the column with the smallest amount of data when pruning columns in this situation.
Note that this PR ONLY handles aggregate functions; projections with no slot reference still need to be handled in the future.
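A minimal sketch of the fallback, assuming that "smallest amount of data" can be approximated by a per-column byte width. The `prune` helper, the column names, and the widths are all made up for illustration and are not the actual rule implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class KeepSmallestColumnSketch {
    // Keeps the required columns; if none is required, keeps the narrowest column
    // so that the scan still produces rows for something like COUNT(1).
    static List<String> prune(Map<String, Integer> columnWidths, Set<String> required) {
        List<String> kept = new ArrayList<>();
        for (String name : columnWidths.keySet()) {
            if (required.contains(name)) {
                kept.add(name);
            }
        }
        if (kept.isEmpty() && !columnWidths.isEmpty()) {
            kept.add(Collections.min(columnWidths.entrySet(), Map.Entry.comparingByValue()).getKey());
        }
        return kept;
    }

    // Runs prune() on a made-up three-column table and joins the result with commas.
    public static String demo(String... required) {
        Map<String, Integer> widths = new LinkedHashMap<>();
        widths.put("id", 8);
        widths.put("flag", 1);
        widths.put("name", 32);
        return String.join(",", prune(widths, new HashSet<>(Arrays.asList(required))));
    }

    public static void main(String[] args) {
        System.out.println(demo());       // no slot referenced: falls back to the narrowest column
        System.out.println(demo("name")); // a referenced column is kept as usual
    }
}
```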
#11392 made the _input_block in each BetaRowsetReader sharable. However, for some types (e.g. nested arrays more than one level deep), the _column_vector_batches in RowBlockV2 can be nested, which means that there is a ColumnVectorBatch inside another ColumnVectorBatch. In this case, the data of the inner ColumnVectorBatch may be corrupted because the data of _input_block is copied shallowly to _output_block.
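The hazard of a shallow copy over nested batches can be shown with a small Java analogy. The real code is on the BE side, and the `Batch` class below is a hypothetical stand-in rather than the actual ColumnVectorBatch.

```java
import java.util.ArrayList;
import java.util.List;

public class NestedCopySketch {
    // Hypothetical stand-in for a batch that may hold an inner batch (nested types).
    static final class Batch {
        final List<Integer> data = new ArrayList<>();
        Batch inner; // non-null only for nested types

        // Copies this level's data but shares the inner batch with the source.
        Batch shallowCopy() {
            Batch copy = new Batch();
            copy.data.addAll(data);
            copy.inner = inner;
            return copy;
        }

        // Copies this level's data and recursively copies the inner batch.
        Batch deepCopy() {
            Batch copy = new Batch();
            copy.data.addAll(data);
            copy.inner = inner == null ? null : inner.deepCopy();
            return copy;
        }
    }

    // Copies a shared input batch, then overwrites the input as a later reader would,
    // and returns the value the copy observes in its inner batch.
    public static int innerValueAfterReuse(boolean deep) {
        Batch input = new Batch();
        input.inner = new Batch();
        input.inner.data.add(1);
        Batch output = deep ? input.deepCopy() : input.shallowCopy();
        input.inner.data.set(0, 99); // the shared input is reused
        return output.inner.data.get(0);
    }

    public static void main(String[] args) {
        System.out.println(innerValueAfterReuse(false)); // shallow copy sees the overwritten value
        System.out.println(innerValueAfterReuse(true));  // deep copy keeps the original value
    }
}
```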
Currently, the explain string prints every expression as a slot id, e.g. `<slot 1>`.
This PR prints the expression's name together with its slot id instead, e.g. `column_a[#1]`. In detail:
- print the qualified table name for OlapScanNode
- print the NamedExpression name with its SlotId instead of just the SlotId
- use "OlapScanNode" instead of the table name as OlapScanNode's node name
Currently, there are still lots of bugs related to ARRAY<NOT_NULL(T)>.
We have decided not to support ARRAY<NOT_NULL(T)> types in the first version, so all elements in an ARRAY are nullable.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
The `groupPlan()` pattern means finding a `GroupPlan` in the memo. Since the memo contains no `GroupPlan`, it always returns nothing.
When we want to write a pattern that matches any GROUP, we should use `group()`. But the `groupPlan()` pattern is very confusing and easy to misuse.
So this PR bans the `groupPlan()` pattern to avoid misuse.
There are some bugs in Nereids' StatsCalculator.
1. Project: returns the child's column stats directly, so its parents cannot find column stats by the project's slots.
2. Aggregate: does not return columns that are Alias, so its parents cannot find some column stats by the Aggregate's slots.
3. All: SlotReference is used as the key of the column-to-stats map, so we need to change SlotReference's equals and hashCode methods to use only ExprId, as we discussed.
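Why the map key should compare by ExprId alone can be sketched as below. The `Slot` class is a hypothetical stand-in for SlotReference, and the idea that parent and child see the same slot under different names is an assumption made for the illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class SlotKeySketch {
    // Hypothetical stand-in for SlotReference.
    static final class Slot {
        final int exprId;
        final String name;
        final boolean idOnly; // toggles whether equality uses only exprId

        Slot(int exprId, String name, boolean idOnly) {
            this.exprId = exprId;
            this.name = name;
            this.idOnly = idOnly;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Slot)) {
                return false;
            }
            Slot other = (Slot) o;
            return exprId == other.exprId && (idOnly || name.equals(other.name));
        }

        @Override
        public int hashCode() {
            return idOnly ? Integer.hashCode(exprId) : Objects.hash(exprId, name);
        }
    }

    // Stores stats under one view of a slot and looks them up through another view
    // that shares the same expression id but carries a different name.
    public static boolean lookupSucceeds(boolean idOnly) {
        Map<Slot, Double> columnStats = new HashMap<>();
        columnStats.put(new Slot(1, "t.a", idOnly), 100.0);
        return columnStats.containsKey(new Slot(1, "a", idOnly));
    }

    public static void main(String[] args) {
        System.out.println(lookupSucceeds(false)); // name takes part in equality: lookup misses
        System.out.println(lookupSucceeds(true));  // id-only equality: lookup hits
    }
}
```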
When binding TimestampArithmetic, we always try to cast the left child to DateTimeType. But sometimes it needs to be cast to DateType instead; this PR fixes this problem.
- enable CBO stage in Nereids
- use `chooseBestPlan()` to get the best plan
- add a new rule, JoinCommuteProject
- test the stage with the JoinCommute rule
When Sum's child is Decimal, Sum mistakenly returns Double type, which leads to wrong results. We should keep the return type as decimal when the child expression's type is decimal.
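The kind of error a double-typed sum introduces can be seen in plain Java, independent of the optimizer code. The helper names below are made up for the illustration.

```java
import java.math.BigDecimal;

public class DecimalSumSketch {
    // Sums the values using binary floating point, as a Double-typed Sum would.
    public static double sumAsDouble(String... values) {
        double sum = 0;
        for (String value : values) {
            sum += Double.parseDouble(value);
        }
        return sum;
    }

    // Sums the values using exact decimal arithmetic.
    public static BigDecimal sumAsDecimal(String... values) {
        BigDecimal sum = BigDecimal.ZERO;
        for (String value : values) {
            sum = sum.add(new BigDecimal(value));
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumAsDouble("0.1", "0.2"));  // not exactly 0.3
        System.out.println(sumAsDecimal("0.1", "0.2")); // exactly 0.3
    }
}
```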
This PR does two refactors:
1. remove useless parameter from `Plan#computeOutput`
2. refactor memo.copyIn
In the past, `memo.copyIn` had complex logic that handled init, rewrite and copyIn together. It was difficult to understand, bug-prone, and could leak memory through unreachable groups/groupExpressions.
So I split it into three methods:
1. `Memo.init` to initialize the Memo from a LogicalPlan
2. `Memo.doRewrite` for rewrite
3. `Memo.doCopyIn` for exploration and implementation
And split the UTs into three files:
1. `MemoInitTest`
2. `MemoRewriteTest`
3. `MemoCopyInTest`
I have added a lot of UTs for `Memo.rewrite`, and added some unreachable-DAG checks to the PlanChecker that run when the plan is changed.
We added logical project earlier, but to actually complete the pruning and reduce data IO, we need to add the related support in the translator and BE.
This PR:
- add projections on each ExecNode in BE
- translate PhysicalProject into projections on PlanNode in FE
- do column pruning on ScanNode in FE
Co-authored-by: HappenLee <happenlee@hotmail.com>
We cannot skip aggregation on replace columns; otherwise it would generate
wrong results, e.g. a row in a UNIQUE table that was deleted via delete_sign_column
would still be returned.
An array column should not be used as a distribution key, group-by key, or aggregate key, so we should forbid this before the table is created.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
Add one expression rewrite rule:
rewrite an InPredicate to an EqualTo expression if the InPredicate's options contain exactly one element.
Examples:
1. where A in (x) ==> where A = x
2. where A not in (x) ==> where not A = x
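The two examples above can be sketched on a toy expression tree. The `In`, `EqualTo`, and `Not` classes and the `rewrite` function are simplified stand-ins written for this illustration, not the actual Nereids expression classes or rule API.

```java
import java.util.Arrays;
import java.util.List;

public class InPredicateToEqualToSketch {
    interface Expr {
    }

    static final class In implements Expr {
        final String left;
        final List<String> options;

        In(String left, String... options) {
            this.left = left;
            this.options = Arrays.asList(options);
        }

        @Override
        public String toString() {
            return left + " in (" + String.join(", ", options) + ")";
        }
    }

    static final class EqualTo implements Expr {
        final String left;
        final String right;

        EqualTo(String left, String right) {
            this.left = left;
            this.right = right;
        }

        @Override
        public String toString() {
            return left + " = " + right;
        }
    }

    static final class Not implements Expr {
        final Expr child;

        Not(Expr child) {
            this.child = child;
        }

        @Override
        public String toString() {
            return "not " + child;
        }
    }

    // Replaces a single-option In with EqualTo, descending through Not;
    // every other expression is returned unchanged.
    static Expr rewrite(Expr expr) {
        if (expr instanceof Not) {
            return new Not(rewrite(((Not) expr).child));
        }
        if (expr instanceof In && ((In) expr).options.size() == 1) {
            In in = (In) expr;
            return new EqualTo(in.left, in.options.get(0));
        }
        return expr;
    }

    // Builds "A in (...)" or "not A in (...)", rewrites it, and renders the result.
    public static String demo(boolean negated, String... options) {
        Expr in = new In("A", options);
        return rewrite(negated ? new Not(in) : in).toString();
    }

    public static void main(String[] args) {
        System.out.println(demo(false, "x"));
        System.out.println(demo(true, "x"));
        System.out.println(demo(false, "x", "y"));
    }
}
```

Here `not in` is modeled as `Not` wrapping `In`, which is an assumption about the representation; an InPredicate with more than one option is left untouched.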