1. Some related rules need to be executed together to produce the expected plan.
2. Add session variables to avoid falling back to the legacy planner when running Nereids regression tests in piggyback mode.
At the compute level, the CHAR type shrinks trailing zeros.
To keep the logic consistent with the CHAR type, we also shrink trailing zeros for ARRAY and ARRAY<ARRAY> types.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
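A minimal, self-contained Java sketch of the shrinking idea described above (illustrative helper names only, not the actual BE code path): trailing '\0' padding is stripped from a CHAR value, and the same stripping is applied recursively to elements of nested ARRAY values.

```java
import java.util.List;
import java.util.stream.Collectors;

public class CharShrinkSketch {
    // Strip trailing '\0' padding from a CHAR value (hypothetical helper).
    static String shrinkChar(String padded) {
        int end = padded.length();
        while (end > 0 && padded.charAt(end - 1) == '\0') {
            end--;
        }
        return padded.substring(0, end);
    }

    // Apply the same shrinking recursively for ARRAY and ARRAY<ARRAY> values,
    // modeled here as nested lists of Strings purely for illustration.
    @SuppressWarnings("unchecked")
    static Object shrinkDeep(Object value) {
        if (value instanceof String) {
            return shrinkChar((String) value);
        }
        if (value instanceof List) {
            return ((List<Object>) value).stream()
                    .map(CharShrinkSketch::shrinkDeep)
                    .collect(Collectors.toList());
        }
        return value;
    }

    public static void main(String[] args) {
        // ARRAY<ARRAY<CHAR>> example: padding is removed at every nesting level.
        System.out.println(shrinkDeep(List.of(List.of("ab\0\0", "c\0"), List.of("d"))));
        // prints [[ab, c], [d]]
    }
}
```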
In some cases, the output slots of the aggregation info may be materialized by calling SlotDescriptor's materializeSrcExpr method, but the intermediate slots are not. This PR sets the intermediate slots' materialization info to keep them consistent with the output slots.
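A hedged sketch of the consistency fix described above, using simplified stand-in classes rather than the real FE SlotDescriptor API:

```java
import java.util.List;

// Simplified stand-in for the planner's slot descriptors (not the real FE class).
class SlotDesc {
    private boolean materialized;
    boolean isMaterialized() { return materialized; }
    void setMaterialized(boolean m) { materialized = m; }
}

public class AggSlotSync {
    // Keep the intermediate slots' materialization flags consistent with the
    // corresponding output slots; the two lists are assumed to be aligned
    // positionally, which is an assumption of this sketch.
    static void syncMaterialization(List<SlotDesc> outputSlots, List<SlotDesc> intermediateSlots) {
        for (int i = 0; i < outputSlots.size() && i < intermediateSlots.size(); i++) {
            intermediateSlots.get(i).setMaterialized(outputSlots.get(i).isMaterialized());
        }
    }
}
```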
When selecting a large amount of data from a table, the profile shows:
- ScannerCtxSchedCount: 2.82664M(2826640)
But ScannerSchedCount is only 8, and most of the scanners are busy running.
After this improvement, ScannerCtxSchedCount is reduced to only 10.
# Proposed changes
First step of #12303
## Problem summary
This is the first step for supporting rollup index selection for aggregate/unique key OLAP table.
This PR aims to select a rollup index when an aggregation node is present and the aggregate function matches the value type, so pre-aggregation is turned on by default. Cases where pre-aggregation should be turned off will be addressed in the next PR.
Main steps for rollup index selection (see the sketch after this list):
1. filter the rollup indexes that contain all the required columns.
2. keep the rollup indexes with the longest key prefix match.
3. order the remaining rollup indexes by row count, column count, and rollup index id.
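A rough Java sketch of the three steps above; RollupIndex and its fields are hypothetical stand-ins for the real rollup metadata, and the prefix-match heuristic is deliberately simplified.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-in for a rollup index's metadata.
class RollupIndex {
    long id;
    long rowCount;
    List<String> keyColumns;   // key columns in prefix order
    Set<String> allColumns;    // key + value columns

    RollupIndex(long id, long rowCount, List<String> keyColumns, Set<String> allColumns) {
        this.id = id;
        this.rowCount = rowCount;
        this.keyColumns = keyColumns;
        this.allColumns = allColumns;
    }
}

public class RollupSelectorSketch {
    // 1. keep indexes containing all required columns;
    // 2. keep indexes whose key prefix matches the most predicate columns;
    // 3. order by row count, then column count, then index id.
    static List<RollupIndex> select(List<RollupIndex> candidates,
                                    Set<String> requiredColumns,
                                    Set<String> prefixPredicateColumns) {
        List<RollupIndex> containAll = candidates.stream()
                .filter(idx -> idx.allColumns.containsAll(requiredColumns))
                .collect(Collectors.toList());

        int bestPrefix = containAll.stream()
                .mapToInt(idx -> prefixMatchLen(idx, prefixPredicateColumns))
                .max().orElse(0);

        return containAll.stream()
                .filter(idx -> prefixMatchLen(idx, prefixPredicateColumns) == bestPrefix)
                .sorted(Comparator.<RollupIndex>comparingLong(idx -> idx.rowCount)
                        .thenComparingInt(idx -> idx.allColumns.size())
                        .thenComparingLong(idx -> idx.id))
                .collect(Collectors.toList());
    }

    // Length of the longest key-column prefix covered by the predicate columns.
    static int prefixMatchLen(RollupIndex idx, Set<String> predicateColumns) {
        int len = 0;
        for (String key : idx.keyColumns) {
            if (!predicateColumns.contains(key)) {
                break;
            }
            len++;
        }
        return len;
    }
}
```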
Remaining TODOs:
1. address cases where pre-aggregation should be turned off (next PR).
2. add more test cases.
Refactor
- Add `Project.getSlotToProducer` to extract a map from the project output slot to its producing expression.
- Add `Filter.getConjuncts` to split the filter condition to conjunctive predicates.
- Move the usage of `ExpressionReplacer` to `ExpressionUtils.replace(expr, replaceMap)` to simplify the code.
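A small, self-contained illustration of the replace-by-map idea behind `ExpressionUtils.replace(expr, replaceMap)`; the tiny expression tree below is a stand-in for the Nereids expression classes, not the actual API.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Tiny stand-in expression tree, only to illustrate the replace-by-map idea;
// not the Nereids Expression hierarchy.
interface Expr {
    default List<Expr> children() { return List.of(); }
    Expr withChildren(List<Expr> newChildren);
}

record Slot(String name) implements Expr {
    public Expr withChildren(List<Expr> newChildren) { return this; }
}

record Add(Expr left, Expr right) implements Expr {
    public List<Expr> children() { return List.of(left, right); }
    public Expr withChildren(List<Expr> c) { return new Add(c.get(0), c.get(1)); }
}

public class ReplaceSketch {
    // Bottom-up replacement: rewrite the children first, then substitute the
    // node itself if it appears as a key in replaceMap.
    static Expr replace(Expr expr, Map<Expr, Expr> replaceMap) {
        List<Expr> newChildren = expr.children().stream()
                .map(child -> replace(child, replaceMap))
                .collect(Collectors.toList());
        Expr rewritten = newChildren.isEmpty() ? expr : expr.withChildren(newChildren);
        return replaceMap.getOrDefault(rewritten, rewritten);
    }

    public static void main(String[] args) {
        // Pretend the project produced `a + b AS c`; a predicate on slot c can
        // then be rewritten onto a + b via the slot-to-producer map.
        Expr producer = new Add(new Slot("a"), new Slot("b"));
        Map<Expr, Expr> slotToProducer = Map.of(new Slot("c"), producer);
        System.out.println(replace(new Slot("c"), slotToProducer));
    }
}
```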
When a non-correlated subquery is executed, an error is currently reported that the corresponding column cannot be found.
The reason is that the tuple IDs of the children obtained in visitPhysicalNestedLoopJoin are not consistent with the actual children.
Non-correlated subqueries trigger this bug because they use crossJoin.
Regression tests for non-correlated and complex subquery scenarios have also been added.
Co-authored-by: morrySnow <morrysnow@126.com>
Support OneRowRelation and EmptyRelation.
OneRowRelation: `select 100, 'abc', substring('abc', 1, 2)`
EmptyRelation: `select * from tbl limit 0`
Note:
PhysicalOneRowRelation will be translated to a UnionNode(constExpr) for BE execution.
In the earlier PR #11812, we split the join condition into two parts: hash join conjuncts and the other condition. But we forgot to translate the other condition into the other conjuncts of HashJoinNode in the legacy planner, so we get a wrong result if a query has an other condition on the join node, such as:
SELECT * FROM lineorder INNER JOIN part ON lo_partkey = p_partkey WHERE lo_orderkey > p_size;
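A hedged sketch of the split described above, using an illustrative predicate model rather than the Nereids Expression classes: equality predicates whose two sides come from different join children become hash join conjuncts, and everything else goes into the other conjuncts that HashJoinNode must still evaluate after probing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Illustrative predicate model; not the planner's Expression classes.
record Predicate(String op, String leftSlot, String rightSlot) {}

public class JoinConjunctSplit {
    static void split(List<Predicate> conditions,
                      Set<String> leftSlots, Set<String> rightSlots,
                      List<Predicate> hashConjuncts, List<Predicate> otherConjuncts) {
        for (Predicate p : conditions) {
            boolean crossesChildren =
                    (leftSlots.contains(p.leftSlot()) && rightSlots.contains(p.rightSlot()))
                    || (leftSlots.contains(p.rightSlot()) && rightSlots.contains(p.leftSlot()));
            if ("=".equals(p.op()) && crossesChildren) {
                hashConjuncts.add(p);          // e.g. lo_partkey = p_partkey
            } else {
                otherConjuncts.add(p);         // e.g. lo_orderkey > p_size
            }
        }
    }

    public static void main(String[] args) {
        List<Predicate> hash = new ArrayList<>();
        List<Predicate> other = new ArrayList<>();
        split(List.of(new Predicate("=", "lo_partkey", "p_partkey"),
                      new Predicate(">", "lo_orderkey", "p_size")),
              Set.of("lo_partkey", "lo_orderkey"), Set.of("p_partkey", "p_size"),
              hash, other);
        System.out.println("hash=" + hash + " other=" + other);
    }
}
```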
Implement the having clause for Nereids Planner.
NOTE:
This PR only aims at making the Nereids Planner generate the correct logical plan and physical plan. Runtime correctness is not a goal of this PR because GROUP BY is not ready in the Nereids Planner yet.
In the earlier PR #11842, we added the ability to apply a projection on each ExecNode.
However, we cannot see the projection expr list in explain output, which is inconvenient for debugging.
This PR adds it to the explain string if it exists.
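A minimal sketch of what "add the projection expr list to the explain string" could look like; the method and parameter names below are hypothetical, not the actual ExecNode code.

```java
import java.util.List;

public class ExplainProjectionSketch {
    // Append the node's projection expressions to its explain output only when
    // they exist; "prefix" and "projections" are illustrative names.
    static String explainProjections(String prefix, List<String> projections) {
        if (projections == null || projections.isEmpty()) {
            return "";
        }
        return prefix + "projections: " + String.join(", ", projections) + "\n";
    }
}
```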
Added regression tests for sub-queries. Currently only correlated sub-queries are covered; non-correlated sub-queries will be added after the project revision.
Currently, Nereids doesn't support aggregate functions with no slot reference in the query, since all of the columns would be pruned, e.g.
SELECT COUNT(1) FROM t;
This PR reserves the column with the smallest amount of data when doing column pruning in this situation.
Note that this PR ONLY handles aggregate functions, so projections with no slot reference still need to be handled in the future.
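A hedged sketch of the pruning rule described above; Column and its byteSize field are hypothetical stand-ins for the real catalog metadata.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical column metadata used only for this illustration.
record Column(String name, int byteSize) {}

public class PruneKeepSmallest {
    // When column pruning would remove every column (e.g. SELECT COUNT(1) FROM t),
    // keep the single column with the smallest data size so the scan still
    // produces rows to count.
    static Optional<Column> columnToKeep(List<Column> required, List<Column> allColumns) {
        if (!required.isEmpty()) {
            return Optional.empty();   // normal pruning, nothing extra to keep
        }
        return allColumns.stream().min(Comparator.comparingInt(Column::byteSize));
    }
}
```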
#11392 made _input_block in each BetaRowsetReader sharable. However, for some types (e.g. a nested array with more than 1 depth), the _column_vector_batches in RowBlockV2 can be nested, which means that there is a ColumnVectorBatch inside another ColumnVectorBatch. In this case, the data of the inner ColumnVectorBatch may be corrupted because the data of _input_block is copied shallowly to _output_block.
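A simplified Java illustration of the shallow-copy hazard described above (not the BE's C++ RowBlockV2/ColumnVectorBatch code): copying only the references to nested buffers means writes through the output block also change the input block.

```java
import java.util.Arrays;

// Simplified stand-in for a nested column batch; field names are illustrative.
class NestedBatch {
    int[] data;
    NestedBatch inner;        // nested batch, e.g. for ARRAY<ARRAY<...>> columns

    NestedBatch shallowCopy() {
        NestedBatch copy = new NestedBatch();
        copy.data = this.data;     // shared buffer
        copy.inner = this.inner;   // shared nested batch
        return copy;
    }
}

public class ShallowCopyHazard {
    public static void main(String[] args) {
        NestedBatch input = new NestedBatch();
        input.data = new int[] {1, 2, 3};
        input.inner = new NestedBatch();
        input.inner.data = new int[] {4, 5, 6};

        NestedBatch output = input.shallowCopy();
        output.inner.data[0] = 99;             // intended to touch only the output

        System.out.println(Arrays.toString(input.inner.data)); // [99, 5, 6] -- input corrupted
    }
}
```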