The `groupPlan()` pattern means "find a `GroupPlan` in the memo". Since the memo never contains a `GroupPlan`, this pattern always matches nothing.
To write a pattern that matches any group, use `group()` instead. The `groupPlan()` pattern is confusing and easy to misuse.
So this PR bans the `groupPlan()` pattern to avoid misuse.
When an ExecNode's projections are not empty, it uses the output row descriptor to initialize the block before doing the projection, but it should use the original row descriptor. This PR fixes that.
In VCollectIterator and VGenericIterator, use insert_range_from to copy rows
that are contiguous in a block, which saves CPU cost.
If rows in rowsets and segments are non-overlapping, this improves compaction
throughput by 30%. If rows are completely overlapping, such as when loading the
same file twice, the throughput is nearly the same as before.
Co-authored-by: yixiutt <yixiu@selectdb.com>
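To illustrate the insert_range_from change above, here is a minimal sketch of the idea in Python, not the BE code itself. It assumes the merge step produces a list of `(block_id, row_idx)` references; the names `group_contiguous` and `copy_rows` are hypothetical, and a list slice stands in for one `insert_range_from` call.

```python
from typing import Dict, List, Sequence, Tuple


def group_contiguous(refs: Sequence[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Collapse (block_id, row_idx) references into (block_id, start, length) runs."""
    runs: List[Tuple[int, int, int]] = []
    for block_id, row in refs:
        if runs and runs[-1][0] == block_id and runs[-1][1] + runs[-1][2] == row:
            # The row directly follows the previous run in the same block: extend it.
            b, start, length = runs[-1]
            runs[-1] = (b, start, length + 1)
        else:
            # Different block or a gap: start a new run.
            runs.append((block_id, row, 1))
    return runs


def copy_rows(blocks: Dict[int, list], refs: Sequence[Tuple[int, int]]) -> list:
    """Copy referenced rows into an output list, one range copy per run."""
    out: list = []
    for block_id, start, length in group_contiguous(refs):
        # Stand-in for a single range copy instead of `length` single-row copies.
        out.extend(blocks[block_id][start:start + length])
    return out
```

With non-overlapping inputs the references form a few long runs, so most rows are copied in bulk. With fully overlapping inputs the merge alternates between blocks, every run has length 1, and the cost matches row-by-row copying.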
There are some bugs in Nereids' StatsCalculator.
1. Project: it returns the child's column stats directly, so its parents cannot find column stats by the project's slots.
2. Aggregate: it does not return columns that are Alias, so its parents cannot find some column stats by the Aggregate's slots.
3. All: SlotReference is used as the key of the column-to-stats map, so we need to change SlotReference's equals and hashCode methods to use only ExprId, as we discussed.
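The effect of item 3 can be sketched in Python (a simplified stand-in, not the Nereids class): equality and hashing depend only on the expression id, so a slot that differs in name or nullability still finds the same stats entry. The field names here are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SlotReference:
    # Only expr_id takes part in __eq__ and __hash__.
    expr_id: int
    # Excluded from comparison and hashing (compare=False).
    name: str = field(compare=False)
    nullable: bool = field(default=True, compare=False)


# Column-to-stats map keyed by slot.
column_stats = {SlotReference(1, "a"): {"ndv": 10}}

# A parent operator sees the same column under another name and nullability,
# but the lookup still succeeds because the expr_id matches.
found = column_stats[SlotReference(1, "alias_a", nullable=False)]
```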
When binding TimestampArithmetic, we always cast the left child to DateTimeType. But sometimes we need to cast it to DateType instead. This PR fixes the problem.
Read and generate Parquet array columns (D is the definition level, R is the repetition level).
D=1, R=0 represents an empty array. An empty array is not a null value, so the NullMap entry for this row is false.
The offset range for this row is [offset_start, offset_end) with `offset_start == offset_end`,
and offset_end is also the start offset of the next row, so the row has no values in the nested primitive column.
D=0, R=0 represents a null array, and the NullMap entry for this row is true.
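The level handling described above can be sketched as follows. This is a minimal Python illustration, not the reader's actual code. It assumes a maximum definition level of 2 (D=2 means an element is present) and non-nullable elements; the function name `assemble_array_column` is hypothetical.

```python
from typing import Any, List, Sequence, Tuple


def assemble_array_column(
    def_levels: Sequence[int], rep_levels: Sequence[int], values: Sequence[Any]
) -> Tuple[List[bool], List[int], List[Any]]:
    """Build (null_map, end_offsets, nested_values) from Parquet levels.

    Assumed level meaning: D=0 null array, D=1 empty array, D=2 element present.
    R=0 starts a new row, R>0 continues the current row.
    """
    null_map: List[bool] = []
    offsets: List[int] = []  # offsets[i] is the end offset of row i
    nested: List[Any] = []
    value_iter = iter(values)
    for d, r in zip(def_levels, rep_levels):
        if r == 0:
            # New row: null only when D=0; the row starts out empty.
            null_map.append(d == 0)
            offsets.append(len(nested))
        if d == 2:
            # An element exists: append it and move the row's end offset.
            nested.append(next(value_iter))
            offsets[-1] = len(nested)
    return null_map, offsets, nested


# Rows: [1, 2], [], NULL, [3]
null_map, offsets, nested = assemble_array_column(
    def_levels=[2, 2, 1, 0, 2], rep_levels=[0, 1, 0, 0, 0], values=[1, 2, 3]
)
```

Row `i` covers `[offsets[i-1], offsets[i])` in the nested column (with 0 as the start of the first row), so both the empty row and the null row have equal start and end offsets and differ only in the null map.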
- enable the CBO stage in Nereids
- use `chooseBestPlan()` to get the best plan
- add a new rule JoinCommuteProject
- test the stage with the JoinCommute rule
When Sum's child is a Decimal, Double was returned as the result type by mistake, which led to wrong results. We should keep the return type as decimal when the child expression's type is decimal.
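One way such a type mismatch can surface is precision loss. The snippet below is a generic Python illustration of summing decimal values as binary doubles versus as decimals. It is not the Nereids code, and the helper names are hypothetical.

```python
from decimal import Decimal
from typing import Sequence


def sum_as_double(values: Sequence[Decimal]) -> float:
    """Sum after converting every value to a binary double."""
    return sum(float(v) for v in values)


def sum_as_decimal(values: Sequence[Decimal]) -> Decimal:
    """Sum while keeping the decimal type end to end."""
    return sum(values, Decimal(0))


amounts = [Decimal("0.1"), Decimal("0.1"), Decimal("0.1")]
double_total = sum_as_double(amounts)    # not exactly 0.3 in binary floating point
decimal_total = sum_as_decimal(amounts)  # exactly Decimal("0.3")
```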
This PR does two refactors:
1. remove useless parameter from `Plan#computeOutput`
2. refactor memo.copyIn
In the past, `memo.copyIn` had complex logic to handle init, rewrite and copyIn. It was difficult to understand, bug-prone, and could leak memory through unreachable groups/groupExpressions.
So I separated it into three methods:
1. `Memo.init` for init Memo by LogicalPlan
2. `Memo.doRewrite` for rewrite
3. `Memo.doCopyIn` for exploration and implementation
And separated the UT into 3 files:
1. `MemoInitTest`
2. `MemoRewriteTest`
3. `MemoCopyInTest`
I have added a lot of UTs for `Memo.rewrite`, and added some unreachable-DAG checks in the PlanChecker, which run when the plan is changed.
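The kind of unreachable-group check mentioned above can be sketched like this. It is a simplified Python model, not the PlanChecker implementation: the memo is assumed to be a map from group id to child group ids, and `unreachable_groups` is a hypothetical name.

```python
from typing import Dict, List, Set


def unreachable_groups(children: Dict[int, List[int]], root: int) -> Set[int]:
    """Return the ids of groups that cannot be reached from the root group."""
    seen: Set[int] = set()
    stack = [root]
    while stack:
        group = stack.pop()
        if group in seen:
            continue
        seen.add(group)
        # Follow the edges to every child group of this group.
        stack.extend(children.get(group, []))
    return set(children) - seen


# Group 3 was left behind by a rewrite: nothing under root 0 points to it.
memo_children = {0: [1, 2], 1: [], 2: [1], 3: [1]}
leaked = unreachable_groups(memo_children, root=0)
```

A test can assert that `leaked` is empty after each rewrite, which catches groups that stay in the memo without being part of the plan.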
We added logical project before, but to actually finish column pruning and reduce data IO, we need to add the related support in the translator and BE.
This PR:
- add projections on each ExecNode in BE
- translate PhysicalProject into projections on PlanNode in FE
- do column prune on ScanNode in FE
Co-authored-by: HappenLee <happenlee@hotmail.com>
We cannot skip aggregation on replace columns, otherwise it would generate
wrong results. E.g. a row in a UNIQUE table that was deleted by delete_sign_column
would still be returned.
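The failure mode can be reproduced with a small model. This is a Python sketch under assumptions, not the storage engine code: each row is `(key, version, value, delete_sign)`, the replace aggregation keeps the highest version per key, and the delete sign filter runs afterwards. `read_unique` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Row = Tuple[str, int, str, int]  # (key, version, value, delete_sign)


def read_unique(rows: List[Row], skip_aggregate: bool) -> List[Tuple[str, str]]:
    """Return visible (key, value) pairs of a UNIQUE-style table."""
    if skip_aggregate:
        # No merge: every stored version is a candidate row.
        candidates = [(key, value, sign) for key, _, value, sign in rows]
    else:
        # Replace aggregation: keep only the newest version of each key.
        latest: Dict[str, Tuple[int, str, int]] = {}
        for key, version, value, sign in rows:
            if key not in latest or version > latest[key][0]:
                latest[key] = (version, value, sign)
        candidates = [(key, v[1], v[2]) for key, v in latest.items()]
    # Delete sign filter: drop rows marked as deleted.
    return [(key, value) for key, value, sign in candidates if sign == 0]


rows = [("k1", 1, "a", 0), ("k1", 2, "a", 1), ("k2", 1, "b", 0)]
correct = read_unique(rows, skip_aggregate=False)  # k1 is deleted
wrong = read_unique(rows, skip_aggregate=True)     # stale k1 version leaks out
```

Without the merge, the filter removes only the version carrying the delete sign, and the older version of the same key is returned as if it were live.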