We have added logical project before, but to actually finish the prune to reduce the data IO, we need to add related supports in translator and BE.
This PR:
- add projections on each ExecNode in BE
- translate PhysicalProject into projections on PlanNode in FE
- do column prune on ScanNode in FE
Co-authored-by: HappenLee <happenlee@hotmail.com>
We can skip aggregate on replace column, otherwise it would generate
wrong result. e.g. a row in UNIQUE is deleted by delte_sign_column,
then it would be returned.
Copy most of profiles from VOlapScanNode and VOlapScanner to NewOlapScanNode and NewOlapScanner.
Fix some blocking bug of new scan framework.
TODO:
Memtracker
Opentelemetry spen
The new framework is still disabled by default, so it will not effect other feature.
Array column should not be distributed key or group by or aggregate key, we should forbid it before create table.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
Add one expression rewrite rule:
rewrite InPredicate to an EqualTo Expression, if there exists exactly one element in InPredicate Options.
Examples:
1. where A in (x) ==> where A = x
2. where A not in (x) ==> where not A = x
In BE, There is an implicit convention that HashJoinNode's left child's output Slot must before right child's output slot in intermediateTuple.
However, after we do commute rule on join plan in Nereids, this convention will be broken and cause core dump in BE.
There are two way to fix this problem:
1. add a project on join after we do commute
2. reorder output of join node when we do translate
Since we cannot translate project yet because BE projection support is on going(#11842). So we use second way to fix it now. After the project translation could work correctly, we should use the first way to fix it.
In current implementation, we detect invalid slot at execute phase. At execute phase, it is hard to get useful information for further debug. This pr moves error detection ahead to prepare phase, so that we can log related tuple descriptors.
Refactor TaggableLogger
Refactor status handling in agent task:
Unify log format in TaskWorkerPool
Pass Status to the top caller, and replace some OLAPInternalError with more detailed error message Status
Premature return with the opposite condition to reduce indention
Problem:
1. `enable_array_type` is masterOnly;
2. dynamic open config only affect FE MASTER
`admin set frontend config("enable_array_type"="true");`
3. query in FE FOLLOWER will fail, because of `enable_array_type` is false in FE FOLLOWER
`select * from table_with_array `
Solution:
Only check `enable_array_type` while creating new tables with array column.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
1. Rewrite Filter(Project) to Project(Filter) to make sure when do partition prune the tree looks like this: Project(Filter(OlapScan)).
2. Enable the MergeConsecutiveProject MergeConsecutiveFilter rules.
3. prune range partition just like what Legacy Planner do.
Parse parquet data with dictionary encoding.
Using the PLAIN_DICTIONARY enum value is deprecated in the Parquet 2.0 specification.
Prefer using RLE_DICTIONARY in a data page and PLAIN in a dictionary page for Parquet 2.0+ files.
refer: https://github.com/apache/parquet-format/blob/master/Encodings.md
This PR fix three problem in Nereids.
- Add selected index, partition and tablet Info in LogicalOlapScan
- with JoinReorderContext in new LogicalJoin
- fix compute data size when no column size info in StatsCaculator