Commit Graph

126 Commits

Author SHA1 Message Date
cd70c37402 [fix](nereids) filter and project node should be pushed down through cte (#20508)
1. move PushdownFilterThroughCTEAnchor and PushdownProjectThroughCTEAnchor into the PUSH_DOWN_FILTERS rule set
2. move PushdownFilterThroughProject before MergeProjectPostProcessor
2023-06-07 10:36:32 +08:00
cd0379df4e [fix](nereids) select with specified partition name does not work as expected (#20269)
This PR fixes selecting from a specified partition; some code related to this feature had been accidentally deleted.
2023-06-05 12:48:54 +08:00
519f01133a [feature](decimal)support cast rounding half up and div precision increment in decimalv3. (#19811) 2023-06-01 13:09:58 +08:00
55ccddb62c [Conf](decimalv3) enable decimalv3 by default 2023-05-29 15:38:31 +08:00
970efdc1cb [Feature](Nereids) support advanced materialized view (#19650)
Extend the functionality of advanced materialized views.

This feature is already supported by the legacy planner with PR #19650.

This PR implements it in Nereids, with the following features:
1. Support multiple columns in an aggregate function, e.g. select sum(c1 + c2) from t1;
2. Support complex expressions, e.g. select abs(c1), sum(abc(c1+1) + 1) from t1;

TODO:
1. Support adding a WHERE clause in materialized views
2023-05-29 10:37:44 +08:00
f1b949ad59 [fix](Nereids) local sort should not translate to unpartitioned partition (#20031)
1. local sort should not update the current fragment's partition to UNPARTITIONED
2. the input fragment's dest exchange node should be set after the dest fragment is created
2023-05-26 10:18:56 +08:00
0dce725120 [fix](nereids)fix decimalv3 type error of mod operator (#20039) 2023-05-25 17:25:11 +08:00
c41b486e7e [fix](nereids) LogicalProject should always have non-empty project list (#18863) 2023-04-21 14:28:07 +08:00
5300b21db7 [Bug](DECIMALV3) report failure if a decimal value overflows (#18336) 2023-04-17 13:18:14 +08:00
a9f9366736 [fix](nereids) the data type of compareExpr and listQuery should be the same when creating InSubquery (#18539)
Consider the SQL

select table_B_alias.b from table_B_alias where table_B_alias.b in ( select a from table_A_alias );

If table_B_alias.b is int and table_A_alias.a is bigint,
we should apply cast(b as bigint) so that the compare expression and the subquery output in the InSubquery have the same data type.
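
A minimal sketch of the coercion described above, written in Python for illustration rather than in the project's Java. The type ranking and the helper names `common_integral_type` and `coerce_in_subquery` are hypothetical and are not Nereids APIs; only the case where the compare expression is the narrower side is handled.

```python
# Hypothetical widening order for integral types; not the actual Nereids type lattice.
INTEGRAL_RANK = {"tinyint": 0, "smallint": 1, "int": 2, "bigint": 3, "largeint": 4}


def common_integral_type(left: str, right: str) -> str:
    """Return the wider of two integral type names."""
    return left if INTEGRAL_RANK[left] >= INTEGRAL_RANK[right] else right


def coerce_in_subquery(compare_expr: str, compare_type: str, query_type: str) -> str:
    """Wrap the compare expression in a cast when the subquery output type is wider.

    When the compare expression is already the wider side it is returned unchanged;
    casting the subquery output is outside the scope of this sketch.
    """
    target = common_integral_type(compare_type, query_type)
    if target == compare_type:
        return compare_expr
    return f"cast({compare_expr} as {target})"


# b is int, a is bigint: the compare expression is widened to bigint.
print(coerce_in_subquery("b", "int", "bigint"))
```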
2023-04-12 20:02:37 +08:00
735cd15a3d [fix](nereids) PushdownAliasThroughJoin should handle same column with different alias in project list (#18470) 2023-04-10 11:50:37 +08:00
b92087dee8 [Fix](Nereids) ReorderJoin rule cannot process MarkJoin correctly (#18159)
Fix two problems:
1. When reorderJoin runs on a logical join that contains a MarkJoinSlotReference column, it builds a plan->MarkJoinSlotReference map and restores the MarkJoinSlotReference column after the reorder completes. But when filter+crossJoin exists, the rules transform it into an innerJoin, so the map lookup fails, the corresponding plan cannot be found, and the MarkJoinSlotReference column is lost.
2. Originally, the MarkJoinSlotReference column was used as the NonUserVisibleOutput of logicalJoin. When logicalApply was generated, the added logicalProject did not include the MarkJoinSlotReference column, and other rules were relied on to delete that invalid logicalProject, so that LogicalApply sat directly under the logicalFilter and the MarkJoinSlotReference column could be recognized. This breaks if the logicalProject cannot be deleted.

Repair method:
1. For a logicalJoin containing a MarkJoinSlotReference, the reorderJoin rules are not executed.
2. Use MarkJoinSlotReference as the output of logicalJoin and also as the output of LogicalApply.
3. When generating LogicalApply, if a MarkJoinSlotReference is included, add an additional logicalProject above the logicalFilter to remove the MarkJoinSlotReference column.

e.g.
```
logicalFilter(subquery with disjunction)

after SubqueryToApply

logicalProject(without markJoinSlotReference)
+-- logicalFilter(markJoinSlotReference)
    +-- logicalProject(with markJoinSlotReference)
        +-- logicalApply()
```

```
SELECT * FROM sub_query_correlated_subquery1 WHERE k1 IN (SELECT k1 FROM sub_query_correlated_subquery3) OR k1 < 10;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                                                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalProject[60] ( distinct=false, projects=[k1#0, k2#1], excepts=[], canEliminate=true )                                                                                                        |
| +--LogicalProject[59] ( distinct=false, projects=[k1#0, k2#1], excepts=[], canEliminate=true )                                                                                                     |
|    +--LogicalFilter[58] ( predicates=($c$1#7#false OR (k1#0 < 10)) )                                                                                                                               |
|       +--LogicalProject[57] ( distinct=false, projects=[k1#0, k2#1, $c$1#7#false], excepts=[], canEliminate=true )                                                                                 |
|          +--LogicalApply ( correlationSlot=[], correlationFilter=Optional.empty, isMarkJoin=true, MarkJoinSlotReference=$c$1#7#false, scalarSubCorrespondingSlot=empty )                           |
|             |--LogicalOlapScan ( qualified=default_cluster:regression_test_nereids_syntax_p0.sub_query_correlated_subquery1, indexName=<index_not_selected>, selectedIndexId=63105, preAgg=ON )    |
|             +--LogicalProject[34] ( distinct=false, projects=[k1#2], excepts=[], canEliminate=true )                                                                                               |
|                +--LogicalOlapScan ( qualified=default_cluster:regression_test_nereids_syntax_p0.sub_query_correlated_subquery3, indexName=<index_not_selected>, selectedIndexId=63115, preAgg=ON ) |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```
2023-03-29 16:12:42 +08:00
d3e7f12ada [refactor](Nereids) refactor column pruning (#17579)
This PR refactors column pruning to use a visitor. Benefits:
1. It is easy to add column pruning for a new plan: implement the `OutputPrunable` interface if the plan contains an output field, or do nothing if it does not. There is no need to add a new rule like `PruneXxxChildColumns`. Only a few scenarios need to override the visit function with special logic, such as pruning LogicalSetOperation and Aggregate.
2. It supports shrinking the output field in some plans, which skips some useless operations and so improves efficiency.

example:
```sql
select id 
from (
  select id, sum(age)
  from student
  group by id
)a
```

we should prune the useless `sum (age)` in the aggregate.
before refactor:
```
LogicalProject ( distinct=false, projects=[id#0], excepts=[], canEliminate=true )
+--LogicalSubQueryAlias ( qualifier=[a] )
   +--LogicalAggregate ( groupByExpr=[id#0], outputExpr=[id#0, sum(age#2) AS `sum(age)`#4], hasRepeat=false )
      +--LogicalProject ( distinct=false, projects=[id#0, age#2], excepts=[], canEliminate=true )
         +--LogicalOlapScan ( qualified=default_cluster:test.student, indexName=<index_not_selected>, selectedIndexId=10007, preAgg=ON )
```

after refactor:
```
LogicalProject ( distinct=false, projects=[id#0], excepts=[], canEliminate=true )
+--LogicalSubQueryAlias ( qualifier=[a] )
   +--LogicalAggregate ( groupByExpr=[id#0], outputExpr=[id#0], hasRepeat=false )
      +--LogicalProject ( distinct=false, projects=[id#0], excepts=[], canEliminate=true )
         +--LogicalOlapScan ( qualified=default_cluster:test.student, indexName=<index_not_selected>, selectedIndexId=10007, preAgg=ON )
```
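
A rough sketch of the top-down idea, in Python for illustration only. The `Plan` class, its fields, and `prune` are hypothetical stand-ins and do not mirror the `OutputPrunable` interface or the real Nereids plan classes; the LogicalSubQueryAlias node is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Plan:
    kind: str                                   # "project", "aggregate" or "scan"
    outputs: List[str]                          # output column names
    # Maps a computed output to the child columns it reads; plain columns read themselves.
    reads: Dict[str, List[str]] = field(default_factory=dict)
    group_by: List[str] = field(default_factory=list)
    child: Optional["Plan"] = None


def prune(plan: Plan, required: Set[str]) -> Plan:
    """Keep only the outputs the parent requires, then prune the child accordingly."""
    kept = [col for col in plan.outputs if col in required]
    if plan.child is None:
        return Plan(plan.kind, kept)
    # Group-by keys are always read, even when the parent does not ask for them.
    child_required = set(plan.group_by)
    for col in kept:
        child_required.update(plan.reads.get(col, [col]))
    reads = {col: plan.reads[col] for col in kept if col in plan.reads}
    return Plan(plan.kind, kept, reads, plan.group_by, prune(plan.child, child_required))


# select id from (select id, sum(age) from student group by id) a
scan = Plan("scan", ["id", "name", "age"])
project = Plan("project", ["id", "age"], child=scan)
agg = Plan("aggregate", ["id", "sum(age)"],
           reads={"sum(age)": ["age"]}, group_by=["id"], child=project)
top = Plan("project", ["id"], child=agg)

pruned = prune(top, {"id"})
print(pruned.child.outputs)        # aggregate outputs after pruning
print(pruned.child.child.outputs)  # inner project outputs after pruning
```

In this sketch both the aggregate and the inner project end up with only `id`, matching the "after refactor" plan above.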
2023-03-24 09:00:48 +08:00
82716ec99d [fix](Nereids) type coercion for subquery (#17661)
Complete the type coercion of the subquery in the function Binder process.

Expressions generated when subqueries are nested uniformly get their implicit type conversion in the analyze stage.
Method: add a typeCoercionExpr field to the subquery expression to store the generated cast information.

Fix the scenario where scalarSubQuery handles arithmetic expressions during implicit type conversion.
2023-03-21 20:38:06 +08:00
5c990fb737 [fix](nereids) Analyze failed for SQL that has count distinct with same col (#17928)
This problem occurs because slots with the same hash code were put into a hash set, which caused the wrong rule to be selected. Use a list instead of a set as the return type of the getDistinctArguments method.
2023-03-19 21:31:47 +08:00
dfa2528b5e [fix](bitmap) fix wrong result of bitmap count functions for null values (#17849)
The result of bitmap count functions is null when there are null values, which is not right:
2023-03-19 11:49:58 +08:00
5f2b68df24 [fix](regression-test) fix unstable regression test cases found in p0 (#17900) 2023-03-19 10:11:57 +08:00
ffda858f01 [fix](regression) fix unstable test cases and remove redundant cases (#17845)
aggregate_strategies executes too slowly; use a smaller table valued function to speed it up
add a p2 case nereids_syntax_p2/aggregate_strategies that uses a larger table valued function to ensure correctness
remove case nereids_syntax_p0/test_join_nereids since it is redundant with nereids_p0/join/test_join
remove unstable case in query_p0/aggregate/aggregate
2023-03-16 15:59:26 +08:00
39b5682d59 [Pipeline](shared_scan_opt) Support shared scan opt in pipeline exec engine 2023-03-13 10:33:57 +08:00
6c894be007 [enhancement](Nereids) support decimalv3 and precision derive (#17393) 2023-03-09 14:12:10 +08:00
eea6d770d7 [fix](bitmap) fix wrong result of bitmap_or for null (#17456)
Result of select bitmap_to_string(bitmap_or(to_bitmap(1), null)) should be 1 instead of null.

This PR fixes the logic of bitmap_or and bitmap_or_count.

Other count-related functions should also be checked and fixed; that will be done in another PR.
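
The intended semantics can be sketched as follows, in illustrative Python and not the BE implementation. Bitmaps are modeled as Python sets and SQL NULL as `None`; returning an empty bitmap when every input is NULL is an assumption of this sketch, not something the PR states.

```python
from typing import Optional, Set


def bitmap_or(*bitmaps: Optional[Set[int]]) -> Set[int]:
    """Union the inputs, skipping None (SQL NULL) instead of propagating it."""
    result: Set[int] = set()
    for bitmap in bitmaps:
        if bitmap is not None:
            result |= bitmap
    return result


def bitmap_or_count(*bitmaps: Optional[Set[int]]) -> int:
    """Cardinality of the NULL-skipping union."""
    return len(bitmap_or(*bitmaps))


def bitmap_to_string(bitmap: Set[int]) -> str:
    """Comma-separated members in ascending order."""
    return ",".join(str(value) for value in sorted(bitmap))


# Mirrors select bitmap_to_string(bitmap_or(to_bitmap(1), null))
print(bitmap_to_string(bitmap_or({1}, None)))
```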
2023-03-08 16:29:01 +08:00
aab14922af [Feature](Nereids) support MarkJoin (#16616)
# Proposed changes
1. The new optimizer supports combining a subquery with a disjunction. With MarkJoin, it behaves the same as the old optimizer. For design details see: https://emmymiao87.github.io/jekyll/update/2021/07/25/Mark-Join.html.
2. Implicit type conversion is performed when conjuncts are generated after subquery parsing.
3. Convert the unnesting of scalarSubquery in filter from filter+join to join + Conjuncts.
2023-03-08 14:26:24 +08:00
fd8adb492d [fix](nereids) fix bugs in nereids window function (#17284)
fix two problems:

1. push agg-fun in windowExpression down to AggregateNode
for example, sql:
select sum(sum(a)) over (order by b)
Plan:
windowExpression( sum(y) over (order by b))
+--- Agg(sum(a) as y, b)

2. push other expr to upper proj
for example, sql:
select sum(a+1) over ()
Plan:
windowExpression(sum(y) over ())
+--- Project(a + 1 as y,...)
+--- Agg(a,...)
2023-03-07 16:35:37 +08:00
3eeeff09fd [enhancement](nereids) convert string literal to commontype in in-expr and case-when-expr (#17200) 2023-03-02 22:05:35 +08:00
469b6b8466 [enhancement](Nereids) datetime v2 type precision derive (#17079) 2023-02-26 22:33:55 +08:00
c53b6a9532 [fix](Nereids) fix nullable() of lead/lag (#17014)
fix bug when we use NULL as default value for window function lead() and lag()
2023-02-24 21:27:44 +08:00
7956800df7 [refactor](Nereids) make type coercion the same as legacy planner (#16844)
- change for Nereids
1. add a variable length parameter to the ctor of Count for a good error reporting of Count(a, b)
2. refactor StringRegexPredicate, let it inherit from ScalarFunction
3. remove useless class TypeCollection
4. use catalog.Type.Collection to check expression arguments type
5. change type coercion for TimestampArithmetic, divide, integral divide, comparison predicate, case when and in predicate, making them the same as the legacy planner.

- change for legacy planner
1. change the common type of floating and Decimal from Decimal to Double
2023-02-22 17:29:37 +08:00
77a3288ce7 [feature](Nereids) support window function (#14397) 2023-02-13 21:20:56 +08:00
4f778c38a1 [feature](nereids) support explore 4 phase aggregation (#16298)
support 4 phase Aggregation.
example: 
`select count(distinct k1), sum(k2) from t`
suppose t.k0 is the distribution key.

we have the plan
```
Agg(DISTINCT_GLOBAL)
  |
Exchange(Gather)
  |
Agg(DISTINCT_LOCAL)
  |
Agg(GLOBAL)
  |
Exchange(hash distribute by k1)
  |
Agg(LOCAL)
  |
scan
```

limitations:
1. only SQL with one distinct is supported.
not supported: `select count(distinct k1), count(distinct k2) from t`
2. only distinct on a single column is supported.
not supported: `select count(distinct k1, k2) from t`
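
To make the data flow of the plan concrete, here is a small simulation in illustrative Python, not the Nereids or BE code. Rows are `(k1, k2)` pairs, each inner list is one instance's share of table t, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, int]  # (k1, k2)


def group_sum(rows: List[Row]) -> Dict[int, int]:
    """Group by k1 and sum k2 within one instance (used for Agg(LOCAL) and Agg(GLOBAL))."""
    groups: Dict[int, int] = defaultdict(int)
    for k1, k2 in rows:
        groups[k1] += k2
    return dict(groups)


def hash_exchange(parts: List[Dict[int, int]], n: int) -> List[List[Row]]:
    """Exchange(hash distribute by k1): route each k1 group to instance hash(k1) % n."""
    out: List[List[Row]] = [[] for _ in range(n)]
    for part in parts:
        for k1, partial_sum in part.items():
            out[hash(k1) % n].append((k1, partial_sum))
    return out


def four_phase(instances: List[List[Row]]) -> Tuple[int, int]:
    """Evaluate select count(distinct k1), sum(k2) from t over the given instances."""
    n = len(instances)
    local = [group_sum(rows) for rows in instances]        # Agg(LOCAL)
    shuffled = hash_exchange(local, n)                     # Exchange(hash distribute by k1)
    merged = [group_sum(rows) for rows in shuffled]        # Agg(GLOBAL)
    # Agg(DISTINCT_LOCAL): after the shuffle every k1 lives on exactly one instance,
    # so counting groups per instance never double counts.
    partial = [(len(groups), sum(groups.values())) for groups in merged]
    # Exchange(Gather) followed by Agg(DISTINCT_GLOBAL): add up the partial results.
    return sum(count for count, _ in partial), sum(total for _, total in partial)


data = [[(1, 10), (2, 20), (1, 5)], [(2, 1), (3, 3)]]
print(four_phase(data))  # (3, 39): three distinct k1 values, k2 sums to 39
```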
2023-02-03 21:51:10 +08:00
929b31bd3c [Feature](Nereids) Support CaseWhen with subquery (#16385)
Co-authored-by: jianghaochen <jianghaochen@meituan.com>
2023-02-03 18:20:47 +08:00
e31913faca [Feature](Nereids) Support order and limit in subquery (#15971)
1. For compatibility with the old optimizer, sort and limit in a subquery do not take effect and are simply removed.
```
select * from sub_query_correlated_subquery1 where sub_query_correlated_subquery1.k1 > (select sum(sub_query_correlated_subquery3.k3) a from sub_query_correlated_subquery3 where sub_query_correlated_subquery3.v2 = sub_query_correlated_subquery1.k2 order by a limit 1);
```

2. Adjust the unnesting position of the subquery so that the conjuncts in the filter have already been optimized before unnesting.

Supported:
```
SELECT DISTINCT k1 FROM sub_query_correlated_subquery1 i1 WHERE ((SELECT count(*) FROM sub_query_correlated_subquery1 WHERE ((k1 = i1.k1) AND (k2 = 2)) or ((k1 = i1.k1) AND (k2 = 1)) )  > 0);
```
The above can be supported because the conjunct shared by both branches of the OR is extracted, so the query can be converted into the following:
```
SELECT DISTINCT k1 FROM sub_query_correlated_subquery1 i1 WHERE ((SELECT count(*) FROM sub_query_correlated_subquery1 WHERE ((k1 = i1.k1) AND (k2 = 2 or k2 = 1)) )  > 0);
```

Not supported:
```
SELECT DISTINCT k1 FROM sub_query_correlated_subquery1 i1 WHERE ((SELECT count(*) FROM sub_query_correlated_subquery1 WHERE ((k1 = i1.k1) AND (k2 = 2)) or ((k2 = i1.k1) AND (k2 = 1)) )  > 0);
```
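
The difference between the two cases can be sketched as common-conjunct extraction over an OR of ANDs, i.e. (A AND B) OR (A AND C) becomes A AND (B OR C). This is illustrative Python with predicates as plain strings; `extract_common_conjuncts` is a hypothetical helper, not the Nereids rule.

```python
from typing import List, Tuple


def extract_common_conjuncts(disjuncts: List[List[str]]) -> Tuple[List[str], List[List[str]]]:
    """Split an OR of ANDs into conjuncts shared by every branch and per-branch remainders.

    Expects at least one branch; each branch is the list of its conjuncts.
    """
    common = [c for c in disjuncts[0] if all(c in branch for branch in disjuncts[1:])]
    rest = [[c for c in branch if c not in common] for branch in disjuncts]
    return common, rest


# Supported case: the correlated predicate k1 = i1.k1 appears in both branches,
# so it can be pulled out of the OR.
supported = [["k1 = i1.k1", "k2 = 2"], ["k1 = i1.k1", "k2 = 1"]]
print(extract_common_conjuncts(supported))

# Unsupported case: the correlated predicates differ, nothing can be pulled out,
# and they stay underneath the OR.
unsupported = [["k1 = i1.k1", "k2 = 2"], ["k2 = i1.k1", "k2 = 1"]]
print(extract_common_conjuncts(unsupported))
```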
2023-02-02 18:17:30 +08:00
09abd32957 [fix](test) result order in group-by-constant case is not stable (#16323) 2023-02-02 16:54:01 +08:00
1ec88cbff6 [fix](nereids) AggregationNode process null as key column in wrong way (#16125)
In AggregationNode, the _merge_with_serialized_key_helper method should convert the key column to a full column if the key column is a null literal.
2023-01-29 20:12:07 +08:00
cbb203efd2 [fix](nereids) fix test_join regression test for nereids (#16094)
1. add TypeCoercion for (string, decimal) and (date, decimal)
2. The equality of LogicalProject nodes should consider children in some cases
3. don't push down join conditions like "t1 join t2 on true/false"
4. add PUSH_DOWN_FILTERS after FindHashConditionForJoin
5. nestloop join should support all kinds of joins
6. the intermediate tuple should contain slots from both children of a nest loop join.
2023-01-20 14:02:29 +08:00
dd869077f8 [fix](nereids) do not generate compare between Date to Date (#16061)
The BE storage engine has a bug in Date comparison, so if we push down predicates like Date 'x' < Date 'y', we get wrong results.
This PR just converts expressions like Date 'x' < Date 'y' to DateTime 'x' < DateTime 'y'.

TODO:
Does the storage engine support comparing a date slot with a datetime?
If it does, we could avoid adding a cast on the slot,
and then this expression could be pushed down to the storage engine.
2023-01-19 15:56:51 +08:00
21b78cb820 [fix](nereids) Fix bind failed of the slots in the group by clause (#16077)
A child's slot with the same name as a slot in the output expressions would be discarded, which caused binding to fail, because the slots in the group by expressions could not find the corresponding bound slots in the child's output.
2023-01-19 15:36:13 +08:00
0144c51ddb [fix](nereids) fix bug in CaseWhen.getDataType and add some missing case for findTightestCommonType (#15776) 2023-01-19 15:30:25 +08:00
d8f598eeab [enhancement](Nereids) add timestampadd, timestampdiff functions (#16072) 2023-01-19 01:05:25 +08:00
baf62b4418 [test](Nereids) add regression-test for running_difference and regexp_extract_all (#16049) 2023-01-18 22:24:52 +08:00
0916cbcb10 [enhancement](nereids) Made the parse for named expression more complete (#16010)
After this PR, the following grammar is supported.

SELECT SUBSTRING("dddd编", 0, 3) AS "测试";
SELECT SUBSTRING("dddd编", 0, 3) "测试";
2023-01-18 19:44:51 +08:00
1fa2b662cf [opt](Nereids) add date_add/sub function (#16048)
1. add week_add week_diff function
2. register all date_add/date_diff function
2023-01-18 17:11:44 +08:00
96b9115286 [fix](nereids) fix bug of invalid column in olap scan node when a materialized view is selected (#15976)
If a materialized view is selected, the olap scan node's NonUserVisibleOutput property may contain columns from other materialized views. This PR removes the invalid columns.
2023-01-18 01:02:12 +08:00
0c8255d9b8 [fix](nereids)nest loop join should support filter conjuncts like hash join (#15979) 2023-01-17 20:38:38 +08:00
7e4bc1fee6 [fix](Nereids) add a rule to adjust nullable of all expressions (#15791)
Some rules change the nullability of outputs in the rewrite step, so we need a rule to adjust nullability at the end of the rewrite step.

TODO
- remove the output slot map
- add nullable compare into slot reference
- use exprid to compare two slots when nullable does not need to be compared
- merge all rules into one to adjust all type plans
2023-01-17 15:51:25 +08:00
d98abb12f9 [fix](Nereids) set operation type coercion differs from legacy planner (#15982) 2023-01-17 11:41:41 +08:00
ce1d19b373 [fix](Nereids) lateral view cannot bind function nested in generators (#15960) 2023-01-17 11:37:56 +08:00
8d25b156aa [fix](nereids) bind slot using exactly match (#15950)
example:
unbound slot k
bound slots [k, t.k]

In the previous binding algorithm, there were 2 candidate bindings.
Now the bound slot k, which exactly matches the unbound slot k, has higher priority than t.k.
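
A minimal sketch of the priority rule, in illustrative Python; `bind_slot` and the string representation of slots are hypothetical and do not reflect the real Nereids binder.

```python
from typing import List


def bind_slot(unbound: str, bound_slots: List[str]) -> str:
    """Resolve an unbound name, preferring an exact match over a qualified-name match."""
    exact = [slot for slot in bound_slots if slot == unbound]
    if len(exact) == 1:
        return exact[0]
    # Fall back to matching the last part of a qualified name such as t.k.
    candidates = [slot for slot in bound_slots
                  if slot == unbound or slot.split(".")[-1] == unbound]
    if len(candidates) == 1:
        return candidates[0]
    if not candidates:
        raise ValueError(f"unknown column: {unbound}")
    raise ValueError(f"ambiguous column: {unbound}")


print(bind_slot("k", ["k", "t.k"]))  # the exact match k wins over t.k
```

Without the exact-match step, the first call would see two candidates and report an ambiguity.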
2023-01-17 11:25:08 +08:00
fa03c8a241 [feature](nereids) const folding for in-predicate with null literal (#15880)
select 1 in (2 , null)  => null
select 1 in (1 , null)  => true
select 1 not in (2 , null)  => null
select 1 not in (1 , null)  => false
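
These four results follow from SQL three-valued logic and can be reproduced with a short sketch in illustrative Python, with `None` standing in for NULL; `fold_in` and `fold_not_in` are hypothetical names, not the Nereids folding rule.

```python
from typing import List, Optional


def fold_in(value: Optional[int], options: List[Optional[int]]) -> Optional[bool]:
    """Fold `value IN (options)`: True on a match, NULL if unsure, False otherwise."""
    if value is None:
        return None
    if any(option is not None and option == value for option in options):
        return True
    # No match among the non-null options: a NULL option makes the result unknown.
    if any(option is None for option in options):
        return None
    return False


def fold_not_in(value: Optional[int], options: List[Optional[int]]) -> Optional[bool]:
    """Fold `value NOT IN (options)` as the three-valued negation of IN."""
    result = fold_in(value, options)
    return None if result is None else not result


print(fold_in(1, [2, None]))      # None
print(fold_in(1, [1, None]))      # True
print(fold_not_in(1, [2, None]))  # None
print(fold_not_in(1, [1, None]))  # False
```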
2023-01-16 13:48:45 +08:00
67378a2dc3 [fix](nereids) fix bug in SequenceFunction legality check (#15812)
1. fix bug in sequence_match function
2. do type promotion instead of explicit cast for
  - varcharLiteral -> stringLiteral
  - charLiteral->stringLiteral
2023-01-13 12:09:53 +08:00
39697bb83e [fix](Nereids) make the type of the first parameter of window_funnel integerLike (#15810) 2023-01-12 11:53:28 +08:00