Refactor Context in Cascades:
use two context in cascades framework.
JobContext is used in each job, contains such attributes:
- reference to PlannerContext
- current cost upper bound
- current required physical properties
PlannerContext is used to hold global info for query planner, contains such attributes:
- reference to Memo
- reference to connectContext
- reference to ruleset could be used for plan
- job pool to maintain unexecuted jobs
- job scheduler to schedule unexecuted jobs
- current job context for next job to be executed
During the query planning phase, the binary predicate rewrite optimization process converting DecimalLiteral to integers may overflow, resulting in false values like "id = 12345678901.0" (see the issue for detailed examples).
This pr fixes a possible overflow and optimizes the case where DecimalLiteral is not in the column type value range.
Issue Number: close#10544
`ExprBuilder` use stack to build the expr.
The input order is : col, value and the output is value, col, but the `>=` is not reverse.
Example:
`col >= 1` => `1 >= col`
In this case, it's better use the queue to keeper the input order.
And also the `CompoundPredicate(OR)` have some problems, it should be `alwaysTrue` whenever it's not a partition key or it's not a supported op.
enhancement
- add functions `finalizeForNereids` and `finalizeImplForNereids` in stale expression to generate some attributes using in BE.
- remove unnecessary parameter `Analyzer` in function `getBuiltinFunction`
- swap join condition if its left hand expression related to right table
- change join physical implementation to broadcast hash join
- add push predicate rule into planner
fix
- swap join children visit order to ensure the last fragment is root
- avoid visit join left child twice
known issues
- expression compute will generate a wrong answer when expression include arithmetic with two literal children.
Add Rule for disassemble the logical aggregate node, this is necessary since our execution framework is distributed and the execution of aggregate always in two steps, first, aggregate locally then merge them.
Add some fields to logical aggregate to determine whether a logical aggreate operator has been disasembled and mark the aggregate phase it belongs and add the logic to mapping the new aggregate function to its stale definition to get the function intermediate type.
In the funciton `TextConverter::write_vec_column`, it should execute the statement `nullable_column->get_null_map_data().push_back(0);` for every row.
Otherwise the null map will get error and cause the core dump.
Nereids could execute query: `select a from t;`
**enhancement**
- add a queriable interface for QueryStmt and LogicalPlanAdapter Temporarily
- refactor GroupId, GroupId extends doris.common.id now
- GroupId is generated by it's memo now, not global yet
- add varchar type
- Nereids enabled only when vectorized engine enabled
**fix**
- set output and column label to logicalPlanAdapter
- set output expression on root fragment
- set select partition and select index id to OlapScanNode
- BatchRulesJob add rule type mismatch
- add all implementation rules to rule set
- SlotReference get catalog column no longer returns null values
- bind star correctly
- implement `isNullable` in expressions
**known issue**
- could not do expression mapping(e.g. a + 1) on project node(wait intermediate tuple interface and project ability in ExecNode in be)
- aggregate do not work
- sort do not work
- filter do not work
- join do not work
This PR fix a bug in predicate inference.
The original predicate inference compare two slot without SlotId. This will arise an error when a query has SetOperand and more than one SetOperand's child use same table alias. e.g.
```
select * from tb1 inner join tb2 on tb1.k1 = tb2.k1
union
select * from tb1 inner join tb2 on tb1.k2 = tb2.k2 where tb1.k1 = 3;
```
in this case, we infer a predicate `tb2.k1 = 3` on table 'tbl2' of SetOperand's second child by mistake.
add parser error listener and post processor to parser
error listener:
- throw exception when parser find unexpected syntax
post processor:
- throw exception when find error indent
- replace '``' with '`' in quoted identifier
- replace non reserved key word with normal identifier
Add filter operator to join children according to the predicate of filter and join, in order to achieving predicate push-down
Pattern:
```
filter
|
join
/ \
child child
```
Transform:
```
filter
|
join
/ \
filter filter
| |
child child
```
Since we will do the column prune with project node, so we need compact the project outputs to the PlanNode in PhysicalPlanTranslator::visitPhysicalProject
1. Add support for ProjectNode to make column prune available.
2. Add SortNode to PlanFragment when it is unpartitioned piggyback
Organize the plan process, improve the batch execution of rules and the way to add jobs.
Fix the problem that the condition in PhysicalHashJoin is empty.
Follow-up #10241, this PR go through parse and analyze the SSB and add this functions:
1. support parse parenthesizedExpression
2. support analyze LogicalAggregate and LogicalSort
3. replace the functionCall to UnboundFunction and BoundFunction
4. support sum aggregate funciton
5. fix the dead loop in the ExpressionRewriter
6. refine some code
This pr follows up [#10435](https://github.com/apache/doris/pull/10435). [#10435](https://github.com/apache/doris/pull/10435) had supported catalog in sql syntax, but some doris statements are only valid in internal catalog. In order to remind users of the scope of catalog usage, it is necessary to throw errors to exceptions of using catalog in the analyze phase.
## How does it effect origin behavior
It is fully compatible with the previous sql statements. Meanwhile, if using the internal catalog in the statements that all the usage of the internal catalog, the syntax is still valid, but using the external catalog will directly throw errors. For example:
```
MySQL [(none)]> show data from tpch10.lineitem;
+-----------+-----------+------------+--------------+----------+
| TableName | IndexName | Size | ReplicaCount | RowCount |
+-----------+-----------+------------+--------------+----------+
| lineitem | lineitem | 210.809 MB | 32 | 6001215 |
| | Total | 210.809 MB | 32 | |
+-----------+-----------+------------+--------------+----------+
MySQL [(none)]> show data from internal_catalog.tpch10.lineitem;
+-----------+-----------+------------+--------------+----------+
| TableName | IndexName | Size | ReplicaCount | RowCount |
+-----------+-----------+------------+--------------+----------+
| lineitem | lineitem | 210.809 MB | 32 | 6001215 |
| | Total | 210.809 MB | 32 | |
+-----------+-----------+------------+--------------+----------+
MySQL [(none)]> show data from hive.tpch10.lineitem;
ERROR 1105 (HY000): errCode = 2, detailMessage = External catalog 'hive' is not allowed in 'ShowDataStmt'
```
Define a new file scanner node for hms table in be.
This file scanner node is different from broker scan node as blow:
1. Broker scan node will define src slot and dest slot, there is two memory copy in it: first is from file to src slot
and second from src to dest slot. Otherwise FileScanNode only have one stemp memory copy just from file to dest slot.
2. Broker scan node will read all the filed in the file to src slot and FileScanNode only read the need filed.
3. Broker scan node will convert type into string type for src slot and then use cast to convert to dest slot type,
but FileScanNode will have the final type.
Now FileScanNode is a standalone code, but we will uniform the file scan and broker scan in the feature.
In order to complete the ssb test, temporarily increase the implementation of costAndEnforcerJob, and create an OptimizeGroupjob for all children of the group.
Add more implmentation rules.
Current some `logical` and `physical` operator is different. I change some code to make them match.
Implementation
- Sort:only heap sort
- Agg
- OlapScan
doris on es8 can not work, because type change. The use of type is no longer recommended in es7,
and support for type has been removed from es8.
1. /_mapping not support include_type_name
2. /_search not support use type