This PR adds runtime filters to the Nereids planner. For now, runtime filters can only be pushed down through join nodes and scan nodes.
TODO:
1. Currently inner join, cross join and right outer join are supported; other join types will be supported in the future.
2. Translate left outer join to inner join if there are inner join ancestors.
3. Some complex situations cannot be handled yet; see the test case `testPushDownThroughJoin` for details.
4. Support the case where the source key is an aggregate group key.
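A rough illustration of what a runtime filter does: keys seen on the build side of a join are collected into a filter, and the probe-side scan drops rows whose key cannot match before they reach the join. The sketch below uses an IN-set filter; the class and method names are hypothetical, not the Nereids API. For an inner join this is safe because a probe row whose key is absent from the build side never produces output.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical IN-set runtime filter; not the real Nereids/BE classes.
final class InRuntimeFilter {
    private final Set<Integer> keys = new HashSet<>();

    // Build phase: collect every join key seen on the build side.
    void insert(int key) { keys.add(key); }

    // Probe phase: a scan row survives only if its key could match.
    boolean mightMatch(int key) { return keys.contains(key); }
}

final class RuntimeFilterDemo {
    // Applies the filter at the probe-side scan, before the join runs.
    static List<Integer> scanWithFilter(List<Integer> probeKeys, InRuntimeFilter filter) {
        return probeKeys.stream().filter(filter::mightMatch).collect(Collectors.toList());
    }
}
```

With build keys {1, 3}, scanning probe keys [1, 2, 3, 4] passes only [1, 3] on to the join.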
`NamedExpressionUtil::clear` should reset `nextId` rather than create a new `IdGenerator<ExprId>`, because the old generator may still be referenced by other objects. Replacing it can make some cases start in a dirty environment when the test cases are run as a package.
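A minimal sketch of why resetting in place matters, assuming a simplified generator backed by an `AtomicInteger`. The classes here are hypothetical stand-ins, not the real `IdGenerator` or `NamedExpressionUtil`. An object that captured the generator before `clear` keeps counting from the old value if the static field is reassigned, but sees the reset if the shared instance is reset.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an id generator shared by several objects.
final class IdGenerator {
    private final AtomicInteger nextId = new AtomicInteger(0);

    int getNextId() { return nextId.getAndIncrement(); }

    // Resets the counter in place, so every holder of this instance sees it.
    void reset() { nextId.set(0); }
}

final class NamedExpressionUtilSketch {
    static IdGenerator ID_GENERATOR = new IdGenerator();

    // Problematic variant: holders of the old instance keep the old counter.
    static void clearByReplacing() { ID_GENERATOR = new IdGenerator(); }

    // Variant described above: reset the shared instance.
    static void clearByResetting() { ID_GENERATOR.reset(); }
}
```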
Support distinct count with a group by clause.
For example:
SELECT count(distinct c_custkey + 1) FROM customer group by c_nation;
TODO: support distinct count without a group by clause.
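The intended result of the query above can be sketched as follows, assuming rows held in memory. This only illustrates the semantics (evaluate the argument, deduplicate within each group, count); it is not how Nereids plans the aggregation. `Customer` and `DistinctCountSketch` are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical row type for the customer example.
final class Customer {
    final int custKey;
    final String nation;

    Customer(int custKey, String nation) {
        this.custKey = custKey;
        this.nation = nation;
    }
}

final class DistinctCountSketch {
    // Semantics of: SELECT count(distinct c_custkey + 1) FROM customer GROUP BY c_nation
    static Map<String, Integer> countDistinctByNation(List<Customer> rows) {
        Map<String, Set<Integer>> seen = new HashMap<>();
        for (Customer c : rows) {
            // Evaluate the argument expression, then deduplicate within the group.
            seen.computeIfAbsent(c.nation, k -> new HashSet<>()).add(c.custKey + 1);
        }
        Map<String, Integer> result = new HashMap<>();
        seen.forEach((nation, values) -> result.put(nation, values.size()));
        return result;
    }
}
```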
This PR changes some interfaces to avoid unsafe casts.
- Change the return type of `Plan.getExpressions()` from `List<Expression>` to `List<? extends Expression>`.
Returning projects (a list of named expressions) from `getExpressions` then avoids an unsafe cast; see `LogicalProject.getExpressions()` for an example.
- Change the return type of `EmptyRelation.getProjects()` from `List<NamedExpression>` to `List<? extends NamedExpression>`.
Creating an empty relation with a list of slots then avoids an unsafe cast; see the `EliminateLimit` rule for an example.
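A minimal sketch of why the wildcard return type removes the cast, using a hypothetical three-level hierarchy in place of the real Nereids classes. Java generics are invariant, so a `List<NamedExpression>` is not a `List<Expression>`, but it is a `List<? extends Expression>`.

```java
import java.util.List;

// Hypothetical minimal hierarchy mirroring Expression > NamedExpression > slot.
interface Expression {}

interface NamedExpression extends Expression {}

final class SlotReference implements NamedExpression {
    final String name;

    SlotReference(String name) { this.name = name; }
}

interface Plan {
    // Wildcard return type: implementations may return a narrower list.
    List<? extends Expression> getExpressions();
}

final class LogicalProjectSketch implements Plan {
    private final List<NamedExpression> projects;

    LogicalProjectSketch(List<NamedExpression> projects) { this.projects = projects; }

    // Compiles without a cast. With a List<Expression> return type this would
    // need an unchecked cast through List<?> or a copy, since generics are invariant.
    @Override
    public List<? extends Expression> getExpressions() { return projects; }
}
```

Callers can still read the elements as `Expression`; they just cannot add to the returned list.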
In Nereids, we could not distinguish two relations of the same table in one plan tree.
This led to some tricky code for handling them during planning, such as a separate branch for equality checks in GroupExpression.
This PR adds a RelationId to LogicalRelation and PhysicalRelation. The equals function of every relation now compares the RelationId, which lets us distinguish two relations of the same table.
TODO:
Add a RelationId to UnboundRelation, UnboundOneRowRelation, LogicalOneRowRelation and PhysicalOneRowRelation.
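A sketch of the idea, assuming the relation id is a plain int assigned once per occurrence in the plan tree. The real `RelationId` type and the exact fields compared in `equals` may differ; `LogicalScanSketch` is hypothetical. In a self join both scans read table `t`, and only the id tells them apart.

```java
import java.util.Objects;

// Hypothetical relation node; the relation id is modeled as a plain int.
final class LogicalScanSketch {
    final int relationId;   // unique per occurrence in the plan tree
    final String tableName; // two occurrences may share the same table

    LogicalScanSketch(int relationId, String tableName) {
        this.relationId = relationId;
        this.tableName = tableName;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof LogicalScanSketch)) return false;
        LogicalScanSketch that = (LogicalScanSketch) o;
        // Comparing the relation id tells two scans of the same table apart.
        return relationId == that.relationId && tableName.equals(that.tableName);
    }

    @Override
    public int hashCode() { return Objects.hash(relationId, tableName); }
}
```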
Reuse compression contexts and buffers.
Use a global instance for every compression algorithm, and use a
thread-safe buffer pool to reuse compression buffers. The pool size equals
the maximum number of threads compressing in parallel, so it will not grow too large.
Tests show this feature improves data import and compaction performance by 5%.
Co-authored-by: yixiutt <yixiu@selectdb.com>
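The buffer-reuse part can be sketched as a bounded, thread-safe pool. This is a hypothetical Java rendering; the actual change lives in the C++ backend, and the class name, the sizing and the fall-back-to-allocate behavior here are assumptions.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical buffer pool; not the backend implementation.
final class CompressionBufferPool {
    private final BlockingQueue<ByteBuffer> free;
    private final int bufferSize;

    // capacity ~ max number of threads compressing in parallel, so the pool stays small.
    CompressionBufferPool(int capacity, int bufferSize) {
        this.free = new ArrayBlockingQueue<>(capacity);
        this.bufferSize = bufferSize;
    }

    // Reuse a pooled buffer if one is free; otherwise allocate a new one.
    ByteBuffer acquire() {
        ByteBuffer buf = free.poll();
        return buf != null ? buf : ByteBuffer.allocate(bufferSize);
    }

    // Return the buffer for reuse; offer() drops it if the pool is already full.
    void release(ByteBuffer buf) {
        buf.clear();
        free.offer(buf);
    }
}
```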
Store offsets rather than lengths in the file for data of array type. The new file format improves seek performance. Please refer to #12246 for the performance report.
Co-authored-by: xy720 <22125576+xy720@users.noreply.github.com>
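A sketch of why offsets help seeking, assuming each stored value is the end offset of a row's elements (a running sum of the lengths). The encoding details of the real file format are not shown; `ArrayOffsetsSketch` is hypothetical.

```java
// Hypothetical illustration of array column encoding; not the actual segment format.
final class ArrayOffsetsSketch {
    // Convert per-row lengths to end offsets (a running prefix sum).
    static long[] lengthsToOffsets(int[] lengths) {
        long[] offsets = new long[lengths.length];
        long total = 0;
        for (int i = 0; i < lengths.length; i++) {
            total += lengths[i];
            offsets[i] = total;
        }
        return offsets;
    }

    // With stored offsets, the start of row i is read directly: O(1).
    static long startWithOffsets(long[] offsets, int row) {
        return row == 0 ? 0 : offsets[row - 1];
    }

    // With stored lengths, seeking to row i must sum all earlier lengths: O(i).
    static long startWithLengths(int[] lengths, int row) {
        long start = 0;
        for (int i = 0; i < row; i++) {
            start += lengths[i];
        }
        return start;
    }
}
```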
Currently, Doris has a variety of readers for different file formats,
such as parquet reader, orc reader, csv reader, json reader and so on.
The interfaces of these readers are not unified, so they cannot be called through a unified method.
In this PR, I add a `GenericReader` interface class; the other readers will implement it
and expose the `get_next_block()` method.
This PR currently only modifies `arrow_reader` and `parquet reader`.
Other readers will be modified one by one in subsequent PRs.
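A hypothetical Java rendering of the interface idea. The real `GenericReader` is a C++ class in the backend, and the boolean end-of-input return used here is an assumption about its shape. The point is that the scan loop depends only on the interface, not on the file format.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical interface; the backend's get_next_block() signature may differ.
interface GenericReader {
    // Fills `block` with the next batch of rows; returns false once input is exhausted.
    boolean getNextBlock(List<String> block);
}

// A toy reader standing in for the parquet/orc/csv/json readers.
final class InMemoryReader implements GenericReader {
    private final Iterator<List<String>> batches;

    InMemoryReader(List<List<String>> batches) { this.batches = batches.iterator(); }

    @Override
    public boolean getNextBlock(List<String> block) {
        block.clear();
        if (!batches.hasNext()) return false;
        block.addAll(batches.next());
        return true;
    }
}

final class ScanLoop {
    // The caller only depends on GenericReader, whatever the file format.
    static int countRows(GenericReader reader) {
        List<String> block = new ArrayList<>();
        int rows = 0;
        while (reader.getNextBlock(block)) {
            rows += block.size();
        }
        return rows;
    }
}
```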
A channel is closed when a timeout or exception happens; if only
one stub is used, all queries would fail.
If we don't close the channel, grpc-java sometimes gets stuck without sending
any RPC.
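One way to read the above: keep several stubs and, when one channel is closed after an error, replace only that stub so other queries keep working. The sketch below is hypothetical and avoids the grpc-java API entirely; `Supplier<T>` stands in for creating a channel and a stub, and round-robin selection is an assumed policy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stub pool; not the actual FE client code.
final class StubPool<T> {
    private final List<T> stubs = new ArrayList<>();
    private final Supplier<T> factory;
    private int next = 0;

    StubPool(int size, Supplier<T> factory) {
        this.factory = factory;
        for (int i = 0; i < size; i++) {
            stubs.add(factory.get());
        }
    }

    // Round-robin selection spreads RPCs across several channels.
    synchronized T pick() {
        T stub = stubs.get(next);
        next = (next + 1) % stubs.size();
        return stub;
    }

    // After a timeout/exception closes a channel, replace only that stub.
    synchronized void replace(T broken) {
        int idx = stubs.indexOf(broken);
        if (idx >= 0) {
            stubs.set(idx, factory.get());
        }
    }
}
```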
1. Some related rules need to be executed together to get the expected plan.
2. Add session variables to avoid falling back to the stale planner when running the Nereids regression tests in piggyback mode.
At the compute level, the CHAR type shrinks trailing zeros.
To keep the logic consistent with the CHAR type, we also shrink them for ARRAY and ARRAY<ARRAY> types.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
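A sketch of the shrinking rule, assuming CHAR values are padded with NUL (`'\0'`) characters and that "shrink" means dropping that trailing padding. `CharShrinkSketch` is hypothetical; the backend implementation is C++.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; assumes CHAR values are padded with '\0' to the declared length.
final class CharShrinkSketch {
    // Drop trailing '\0' characters, as done for CHAR in the compute layer.
    static String shrink(String padded) {
        int end = padded.length();
        while (end > 0 && padded.charAt(end - 1) == '\0') {
            end--;
        }
        return padded.substring(0, end);
    }

    // Apply the same rule to every element of an ARRAY of CHAR.
    static List<String> shrinkArray(List<String> values) {
        List<String> out = new ArrayList<>(values.size());
        for (String v : values) {
            out.add(shrink(v));
        }
        return out;
    }

    // ARRAY<ARRAY<...>>: apply the element rule one level deeper.
    static List<List<String>> shrinkNested(List<List<String>> values) {
        List<List<String>> out = new ArrayList<>(values.size());
        for (List<String> inner : values) {
            out.add(shrinkArray(inner));
        }
        return out;
    }
}
```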
For debugging purposes:
Add session variable `skip_storage_engine_merge`: when set to true, tables of the aggregate key model and unique key model are read as if they used the duplicate key model.
Add session variable `skip_delete_predicate`: when set to true, rows deleted by a DELETE statement are still returned.
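A toy model of what the two variables change when reading an aggregate-key table whose value column uses SUM. The row layout, the `deleted` flag and the merge function are simplifying assumptions; the real storage engine merges by key across versions and evaluates delete predicates itself. `StoredRow` and `DebugReadSketch` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stored row of an aggregate-key table with a SUM value column.
final class StoredRow {
    final String key;
    final long value;
    final boolean deleted; // true if matched by a DELETE statement's predicate

    StoredRow(String key, long value, boolean deleted) {
        this.key = key;
        this.value = value;
        this.deleted = deleted;
    }
}

final class DebugReadSketch {
    static List<String> read(List<StoredRow> rows, boolean skipStorageEngineMerge, boolean skipDeletePredicate) {
        List<StoredRow> visible = new ArrayList<>();
        for (StoredRow r : rows) {
            // skip_delete_predicate: rows hit by a delete are still returned.
            if (skipDeletePredicate || !r.deleted) visible.add(r);
        }
        List<String> out = new ArrayList<>();
        if (skipStorageEngineMerge) {
            // Read like a duplicate-key table: every stored row as-is.
            for (StoredRow r : visible) out.add(r.key + "=" + r.value);
            return out;
        }
        // Normal aggregate-key read: merge rows sharing a key (SUM here).
        Map<String, Long> merged = new LinkedHashMap<>();
        for (StoredRow r : visible) merged.merge(r.key, r.value, Long::sum);
        merged.forEach((k, v) -> out.add(k + "=" + v));
        return out;
    }
}
```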
Related PRs:
https://github.com/apache/doris/pull/11582
https://github.com/apache/doris/pull/12048
Use the new file scan node and the new scheduling framework for load jobs, replacing the old broker scan node.
The load part (BE) is a work in progress. The query part (FE) has been tested with the TPC-H benchmark.
Please review only the FE code in this PR; the BE code has been disabled by the `enable_new_load_scan_node` configuration. Another PR will be sent soon to fix the BE side.