Doris currently has many types of ScanNode, and most of their logic is the same, including:
Runtime filter
Predicate pushdown
Scanner generation and scheduling
So I intend to unify the common logic of all ScanNodes.
Each data source then only needs to implement its own Scanner for data access.
This way, future scan optimizations apply to every data source,
and code duplication is reduced.
This PR mainly adds 4 new classes:
VScanner
The parent class of all Scanners. Subclasses inherit from it to implement specific data access methods.
VScanNode
The unified ScanNode. It is responsible for the common logic, including RuntimeFilter, predicate pushdown, and Scanner generation and scheduling.
ScannerContext
ScannerContext records the execution status
of the group of Scanners that belong to one ScanNode.
This includes how many scanners are currently scheduled, and it maintains
a producer-consumer block queue between the scanners and the scan node.
ScannerContext is also the scheduling unit of ScannerScheduler.
ScannerScheduler schedules one ScannerContext at a time
and submits its Scanners to the scanner thread pool for data scanning.
ScannerScheduler
Responsible for all Scanner scheduling tasks in a unified way.
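The producer-consumer block queue described for ScannerContext can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Block`, `SimpleScannerContext` and all method names are hypothetical stand-ins, and the real classes carry far more state.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for a block of scanned rows.
struct Block {
    std::string payload;
};

// Hypothetical, simplified context: counts the scanners that are still
// running and owns the queue between scanner threads and the scan node.
class SimpleScannerContext {
public:
    explicit SimpleScannerContext(int num_scanners) : _running_scanners(num_scanners) {}

    // Producer side: a scanner thread appends every block it reads.
    void append_block(Block block) {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _blocks.push_back(std::move(block));
        }
        _cv.notify_one();
    }

    // Producer side: a scanner reports that it has no more data.
    void scanner_finished() {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            --_running_scanners;
        }
        _cv.notify_all();
    }

    // Consumer side: the scan node waits until a block is available or all
    // scanners have finished. Returns false at end of stream.
    bool get_block(Block* out) {
        assert(out != nullptr);
        std::unique_lock<std::mutex> lock(_mutex);
        _cv.wait(lock, [this] { return !_blocks.empty() || _running_scanners == 0; });
        if (_blocks.empty()) {
            return false;
        }
        *out = std::move(_blocks.front());
        _blocks.pop_front();
        return true;
    }

    int running_scanners() {
        std::lock_guard<std::mutex> lock(_mutex);
        return _running_scanners;
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    std::deque<Block> _blocks;
    int _running_scanners;
};
```

In this sketch the scan node keeps calling `get_block` until it returns false, while a scheduler would be the component that hands each scanner to a worker thread.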
Test:
This work is still in progress and is disabled by default.
I tested it with JMeter at a concurrency of 50, but currently the scanner just returns without reading data.
The QPS reaches about 9000.
I can't compare it with the original implementation because no data is read for now. I will test it again when the new olap scanner is ready.
Co-authored-by: morningman <morningman@apache.org>
- add an interface ExpectsInputTypes to Expression
- add an interface ImplicitCastInputTypes to Expression
- add an Expression rewrite rule for type coercion
- add a Check Analysis rule to check whether the plan is semantically correct
If an Expression implements ImplicitCastInputTypes, the type coercion rule automatically rewrites its children, casting them to the most suitable type.
If an Expression implements ExpectsInputTypes, Check Analysis checks whether its children's types are accepted by the expected input types.
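The interplay of the coercion rule and the analysis check can be illustrated with a small language-neutral sketch, written here in C++ although the actual planner code is not. Everything in it is hypothetical: the `Expr` struct, the `implicit_cast` flag standing in for ImplicitCastInputTypes, and the widening rule INT -> BIGINT -> DOUBLE are assumptions made for the example only.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

enum class DataType { INT, BIGINT, DOUBLE, STRING };

// Hypothetical expression node.
struct Expr {
    std::string name;
    DataType type = DataType::INT;
    std::vector<std::shared_ptr<Expr>> children;
    // Expected type per child; empty means no expectations are declared.
    std::vector<DataType> expected_input_types;
    // true mirrors ImplicitCastInputTypes, false mirrors plain ExpectsInputTypes.
    bool implicit_cast = false;
};

// Assumed widening rule for this sketch only: INT -> BIGINT -> DOUBLE.
bool can_implicitly_cast(DataType from, DataType to) {
    if (from == to) return true;
    if (from == DataType::INT) return to == DataType::BIGINT || to == DataType::DOUBLE;
    if (from == DataType::BIGINT) return to == DataType::DOUBLE;
    return false;
}

// Type coercion: wrap each mismatching child in a cast node when allowed.
void coerce_children(Expr& expr) {
    if (!expr.implicit_cast) return;
    assert(expr.expected_input_types.size() == expr.children.size());
    for (std::size_t i = 0; i < expr.children.size(); ++i) {
        DataType expected = expr.expected_input_types[i];
        std::shared_ptr<Expr> child = expr.children[i];
        if (child->type != expected && can_implicitly_cast(child->type, expected)) {
            auto cast = std::make_shared<Expr>();
            cast->name = "cast";
            cast->type = expected;
            cast->children.push_back(child);
            expr.children[i] = cast;
        }
    }
}

// Analysis check: every child type must match the expected input type.
bool check_input_types(const Expr& expr) {
    if (expr.expected_input_types.empty()) return true;
    if (expr.expected_input_types.size() != expr.children.size()) return false;
    for (std::size_t i = 0; i < expr.children.size(); ++i) {
        if (expr.children[i]->type != expr.expected_input_types[i]) return false;
    }
    return true;
}
```

Running the coercion step before the check means an expression that allows implicit casts passes analysis once its children are wrapped, while one that only declares expectations fails the check on a mismatch.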
When config::enable_simdjson_parser=true in vectorized stream load, a core dump may occur if the JSON input is an invalid string such as '{ "a', or if all fields are null, such as '{}'. In these cases the simdjson library may throw an unhandled exception like `Objects and arrays can only be iterated when they are first encountered`. We should handle these cases.
Signed-off-by: eldenmoon <15605149486@163.com>
Support views in queries
and add a rewrite rule: merge consecutive projects.
The rule merges adjacent consecutive projects into one project to improve efficiency.
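The idea behind merging consecutive projects can be shown with a small sketch, written in C++ for illustration only. The `PlanNode` struct and `merge_consecutive_projects` function are hypothetical, and the sketch assumes each projection is a plain column reference or rename rather than an arbitrary expression.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical, heavily simplified plan node: a scan or a project.
struct PlanNode {
    bool is_project = false;
    // For a project: output column name -> input column name it reads.
    std::map<std::string, std::string> projections;
    std::shared_ptr<PlanNode> child;  // null for a scan
};

// Collapses every chain Project(Project(...(child))) into a single project.
std::shared_ptr<PlanNode> merge_consecutive_projects(std::shared_ptr<PlanNode> node) {
    if (!node) return node;
    // Rewrite bottom-up so the child is already a single project, if any.
    node->child = merge_consecutive_projects(node->child);
    if (node->is_project && node->child && node->child->is_project) {
        auto merged = std::make_shared<PlanNode>();
        merged->is_project = true;
        for (const auto& kv : node->projections) {
            // Resolve the outer reference through the inner project.
            auto inner = node->child->projections.find(kv.second);
            assert(inner != node->child->projections.end());
            merged->projections[kv.first] = inner->second;
        }
        merged->child = node->child->child;
        return merged;
    }
    return node;
}
```

With an inner project `{x <- a, y <- b}` and an outer project `{z <- x}`, the merged node is `{z <- a}` placed directly over the inner project's child.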
Add p0 test cases, including:
aggregate
join
union
order by
group by
keyword
arithmetic operators
logical operators
case function
coalesce
between
in
like
limit
where
regexp
window function
runtime filter
schema change
* [bugfix](schema change) when there is a string column with a delete predicate, the schema change may core dump
Co-authored-by: yiguolei <yiguolei@gmail.com>
1. Add StatementContext, and rename PlannerContext to CascadesContext. A CascadesContext belongs to a StatementContext, and a StatementContext belongs to a ConnectionContext, with lifecycles increasing in that order. StatementContext can hold state tied to a statement's lifecycle, such as ExpressionId and TableLock. MemoTestUtil simplifies creating a CascadesContext and Memo for tests.
2. Add PlanPreprocessor to process the parsed logical plan before it is copied into the memo, and add PlanPostprocessor to process the physical plan after it is copied out of the memo.
3. Use PlanPreprocessor to process the SET_VAR hint; the class is EliminateLogicalSelectHint.
4. pass the limit clause in regression test case, in set_var.groovy
1. Add InPredicate expression parser and translator
2. Add regression-test for In predicate (in nereids_syntax)
3. Support NOT EqualTo and NOT InPredicate in ExpressionTranslator#visitNot()
column_ptr will be a non-nullable column pointer after `column_ptr = &nullable_column->get_nested_column()`,
so we should not cast column_ptr to ColumnNullable anymore.
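The pitfall can be reproduced with a minimal sketch. The classes below are hypothetical stand-ins that only borrow the names `IColumn`, `ColumnNullable` and `get_nested_column()`; they are not the real Doris column classes, and `unwrap_nullable` is an invented helper.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical minimal column hierarchy for illustration.
struct IColumn {
    virtual ~IColumn() = default;
    virtual bool is_nullable() const { return false; }
};

struct ColumnInt32 : IColumn {
    std::vector<int32_t> data;
};

struct ColumnNullable : IColumn {
    std::shared_ptr<IColumn> nested;
    std::vector<uint8_t> null_map;
    bool is_nullable() const override { return true; }
    const IColumn& get_nested_column() const { return *nested; }
};

// Unwraps one nullable layer. The null map, if any, is captured before
// column_ptr is reassigned, because afterwards column_ptr points at the
// nested column and must not be cast to ColumnNullable again.
const IColumn* unwrap_nullable(const IColumn* column_ptr,
                               const std::vector<uint8_t>** null_map) {
    assert(column_ptr != nullptr && null_map != nullptr);
    *null_map = nullptr;
    if (column_ptr->is_nullable()) {
        const auto* nullable_column = static_cast<const ColumnNullable*>(column_ptr);
        *null_map = &nullable_column->null_map;
        column_ptr = &nullable_column->get_nested_column();
        // From here on, column_ptr is the non-nullable nested column.
    }
    return column_ptr;
}
```

Anything that needs the null map has to take it from `nullable_column` before the reassignment; a later `static_cast` of the returned pointer to `ColumnNullable` would be undefined behavior.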