Commit Graph

2085 Commits

Author SHA1 Message Date
3c58e9bac9 [Fix](Nereids) Fix problem of infer predicates not completely (#22145)
Problem:
When inferring predicates in Nereids, newly inferred predicates cannot be used as the source of the next round. For example:

create table tt1(c1 int, c2 int) distributed by hash(c1) properties('replication_num'='1');
create table tt2(c1 int, c2 int) distributed by hash(c1) properties('replication_num'='1');
create table tt3(c1 int, c2 int) distributed by hash(c1) properties('replication_num'='1');
explain select * from tt1 left join tt2 on tt1.c1 = tt2.c1 left join tt3 on tt2.c1 = tt3.c1 where tt1.c1 = 123;

We expect to get tt3.c1 = 123, but we only get tt2.c1 = 123, because from tt1.c1 = 123 and tt2.c1 = tt3.c1 alone we cannot derive any relationship between these two predicates.

Solution:
We need to cache the intermediate results of the source predicates, such as tt2.c1 = 123 in the example, so they can feed the next inference round.
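The intended inference chain for the query above, sketched as comments (a sketch of the rule's behavior, not the optimizer's exact trace):
```sql
-- round 1: tt1.c1 = 123 and tt1.c1 = tt2.c1  =>  infer tt2.c1 = 123
-- round 2: tt2.c1 = 123 and tt2.c1 = tt3.c1  =>  infer tt3.c1 = 123
-- caching the intermediate predicate tt2.c1 = 123 is what makes round 2 possible
```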
2023-07-25 10:05:00 +08:00
a0463ea047 [round](decimalv2) round decimalv2 to precision value (#22138)
2023-07-25 03:29:48 +08:00
752cec9e19 [Fix](multi-catalog) Fix not single slot filter conjuncts with dict filter issue. (#22052)
### Issue
Dictionary filtering is a mechanism that compares a filter condition on a single string column directly against the column's dictionary encoding. However, a dictionary-filtered string column may also appear in other multi-column filter conditions, which can cause problems.

For example:
`select * from multi_catalog.lineitem_string_date_orc where l_commitdate < l_receiptdate and l_receiptdate = '1995-01-01'  order by l_orderkey, l_partkey, l_suppkey, l_linenumber limit 10;`

`l_receiptdate` is a string filter column, and it is also included in the multi-column filter condition `l_commitdate < l_receiptdate`.

### Solution
Resolve it by separating out the multi-column filter conditions and executing them after the dictionary-filtered column has been converted back to strings.
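Conceptually, for the example query above, the conditions are now handled in two stages (a sketch of the intended split, not the exact code path):
```sql
-- stage 1 (dictionary filter): l_receiptdate = '1995-01-01' is evaluated
--   directly against the dictionary codes of l_receiptdate
-- stage 2 (after the dict column is decoded to string): the separated
--   multi-column condition l_commitdate < l_receiptdate is evaluated
```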
2023-07-24 22:31:18 +08:00
9fe470b273 [pipeline](check) update check-pr-if-need-run-build.sh (#22171)
There is no need to run the pipeline if only regression-test/pipeline/p0/conf/regression-conf.groovy or regression-test/pipeline/p1/conf/regression-conf.groovy is modified.
2023-07-24 21:04:23 +08:00
68bd4a1a96 [opt](Nereids) check multiple distinct functions that cannot be transformed into muti_distinct (#21626)
This commit introduces a transformation for SQL queries that contain multiple distinct aggregate functions. When the number of distinct values processed by these functions is greater than 1, they are converted into multi_distinct functions for more efficient handling.

Example:
```
SELECT COUNT(DISTINCT c1), SUM(DISTINCT c2) FROM tbl GROUP BY c3
-- Transformed to
SELECT MULTI_DISTINCT_COUNT(c1), MULTI_DISTINCT_SUM(c2) FROM tbl GROUP BY c3
```

The following functions can be transformed:
- COUNT
- SUM
- AVG
- GROUP_CONCAT

If any unsupported functions are encountered, an error is now reported during the optimization phase.

To ensure the absence of such cases, a final check has been implemented after the rewriting phase.
2023-07-24 16:34:17 +08:00
21deb57a4d [fix](Nereids) remove double sigature of ceil, floor and round (#22134)
We used to convert input parameters to double for the functions ceil, floor, and round,
because DecimalV2 could not perform these operations. Since we introduced DecimalV3,
we should convert all parameters to DecimalV3 instead to get the correct result.
For example, when we use double parameters, we get a wrong result:
```sql
select round(341/20000,4),341/20000,round(0.01705,4);
+-------------------------+---------------+-------------------+
| round((341 / 20000), 4) | (341 / 20000) | round(0.01705, 4) |
+-------------------------+---------------+-------------------+
| 0.017                   | 0.01705       | 0.0171            |
+-------------------------+---------------+-------------------+
```
With DecimalV3 we get the correct result:
```sql
select round(341/20000,4),341/20000,round(0.01705,4);
+-------------------------+---------------+-------------------+
| round((341 / 20000), 4) | (341 / 20000) | round(0.01705, 4) |
+-------------------------+---------------+-------------------+
| 0.0171                  | 0.01705       | 0.0171            |
+-------------------------+---------------+-------------------+
```
2023-07-24 16:08:00 +08:00
d2531db1cf [fix](inverted index) fix regression case test_index_change_7 occasional failure (#22066) 2023-07-24 15:39:08 +08:00
ac9480123c [refactor](Nereids) push down all non-slot order key in sort and prune them upper sort (#22034)
According to the implementation in the execution engine, all order keys in a SortNode are output, so we must normalize LogicalSort accordingly.
We push down all non-slot order keys of a sort to materialize them below the sort. As a result, every order key is a slot and the SortNode does not need to do any projection itself.
This simplifies the translation of SortNode by avoiding the generation of resolvedTupleExprs and sortTupleDesc.
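A minimal sketch of the effect, assuming a hypothetical table t(c1 int, c2 int):
```sql
-- before: the SortNode itself would have to project the non-slot order key c1 + c2
-- after: c1 + c2 is materialized as a slot by a project pushed below the sort,
--        so every order key of the SortNode is a plain slot, and the extra column
--        is pruned by a project above the sort
SELECT c1 FROM t ORDER BY c1 + c2;
```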
2023-07-24 15:36:33 +08:00
b5f27b5349 [enhance](nereids) enable wf partition topn by default (#21860) 2023-07-24 14:21:45 +08:00
99bf901607 [fix](in) throw exception for unsupported data type of in expr (#22050) 2023-07-24 14:13:31 +08:00
138e6c2f01 [stats](nereids)keep min/max expr in colstats (#22064)
columnStatistics.minExpr and maxExpr are useful when we derive stats for the cast function.
This PR
1. maintains the min/max expr during stats derivation for the filter conditions col < literal, col > literal, and col = literal
2. adjusts the column stats range for the cast function (currently only cast from string to other types is supported)

The plan for ds9 changes, but there is no performance issue: on tpcds_sf100_rf the execution time is 1.5~1.6 s, the same as master.
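A sketch of the kind of filter this helps, assuming a hypothetical table t with a string column s_date holding date strings:
```sql
-- with minExpr/maxExpr kept on the column stats, the range of s_date can be carried
-- through the cast, so the selectivity of this predicate is estimated from the casted
-- min/max instead of falling back to a default guess
SELECT count(*) FROM t WHERE CAST(s_date AS DATE) < '2021-01-01';
```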
2023-07-24 10:28:36 +08:00
4f0158c458 [fix](partial-update) fix update core for merge-on-write table (#22090) 2023-07-23 13:35:08 +08:00
Pxl
0755fd16d8 remove create hot partition failed check (#22093) 2023-07-22 17:47:46 +08:00
3d0f952934 [FIX](complex-type)delete enable_map/struct_type switch #21957 2023-07-22 15:29:32 +08:00
Pxl
ae809fbeba [Bug](storage )fix dead lock when create_tablet need lock two tablet && update mv_p0… (#21969)
Fix a deadlock when create_tablet needs to lock two tablets, and update the mv_p0/ssb cases.
2023-07-22 15:27:05 +08:00
50c8563f35 [fix](partial update) fix some bugs of sequence column (#21896) 2023-07-22 15:26:48 +08:00
355ac18363 [Fix](jdbc catalog) Pass conjuncts to JdbcScanNode and FileScanNode before doing finalize. (#21998)
JdbcScanNode needs the conjuncts to generate SQL in its finalize function, but the conjuncts have not been passed to JdbcScanNode yet when finalize is called. This PR passes the conjuncts to the scan node before they are used, to avoid scanning the whole table.
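For example (hypothetical catalog, database, and table names), the filter below should appear in the SQL that JdbcScanNode generates for the external source, rather than being applied after a full-table read:
```sql
SELECT * FROM mysql_catalog.db1.orders WHERE order_id = 1001;
-- expected to push "WHERE order_id = 1001" into the generated SQL
```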
2023-07-22 14:08:44 +08:00
f7e3cc1553 [FIX](map)fix map proto contains_null #22107
When we select a MAP column in a query with ORDER BY and LIMIT, the BE node will core dump.
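A minimal query of the shape described, assuming a hypothetical table t(id int, m map<string, int>):
```sql
SELECT id, m FROM t ORDER BY id LIMIT 10;
```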
2023-07-22 10:41:55 +08:00
93f9a8cbf5 [fix](nereids)PredicatePropagation only support integer types for now (#22096) 2023-07-21 23:40:08 +08:00
ef01988ae1 [opt](inverted index) support the same column create different type index (#21972) 2023-07-21 23:02:39 +08:00
acf4aa2818 [fix](planner)shouldn't force push down conjuncts for union statement (#22079)
2023-07-21 21:12:56 +08:00
e489b60ea3 [feature](load) support line delimiter for old broker load (#22030) 2023-07-21 19:31:19 +08:00
37f230ee3e [pipeline](regression) do not run build if only modified regression conf (#22075)
This is in order to quickly exclude changes that would otherwise block the regression pipeline.
2023-07-21 17:13:28 +08:00
40299d280d [Fix](json reader) fix rapidjson array->PushBack may take ownership… (#21988)
With the JSON paths below:
`["$.data","$.data.datatimestamp"]`

After `array_obj->PushBack`, ownership of the `data` value is transferred to array_obj (the value is moved out of the original document), which leads to null values for the JSON path `$.data.datatimestamp`.

Rapidjson doc:
```cpp
//! Append a GenericValue at the end of the array.
/*! \note The ownership of \c value will be transferred to this array on success. */
GenericValue& PushBack(GenericValue& value, Allocator& allocator);
```
2023-07-21 17:02:01 +08:00
2b2ac10e93 [feature](partial update) add failure tolerance for strict mode partial update stream load 2023-07-21 16:46:44 +08:00
732e0d14ff [Enhancement](window-funnel)add different modes for window_funnel() function (#20563) 2023-07-21 13:57:27 +08:00
74313c7d54 [feature-wip](autoinc)(step-3) add auto increment support for unique table (#22036) 2023-07-21 13:24:41 +08:00
fb5b412698 [fix](planner)fix bug of pushing conjuncts into inlineview (#21962)
1. The markConstantConjunct method shouldn't change the input conjunct.
2. Use Expr's comeFrom method to check whether the pushed expr is one of the group-by exprs; this is the correct way to check whether the conjunct can be pushed down through the agg node (see the sketch below).
3. migrateConstantConjuncts should substitute the conjuncts using inlineViewRef's analyzer, so that the analyzer recognizes the columns in the conjuncts in the following analyze phase.
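A sketch of case 2, assuming a hypothetical table t(c1 int): the conjunct v.c1 = 1 refers to a group-by expression of the inline view, so it can be pushed down through the agg node.
```sql
SELECT * FROM (SELECT c1, count(*) AS cnt FROM t GROUP BY c1) v WHERE v.c1 = 1;
```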
2023-07-21 11:34:56 +08:00
eabd5d386b [Fix](multi catalog)Fix nereids context table always use internal catalog bug (#21953)
The getTable function in CascadesContext only handles the internal catalog case (it tries to find the table only in the internal catalog and its dbs). However, it should take all the external catalogs into consideration; otherwise it will fail to find a table, or get the wrong table, when querying an external table. This PR fixes this bug.
2023-07-20 20:32:01 +08:00
ee65e0a6b1 [fix](Nereids) should not remove any limit from uncorrelated subquery (#21976)
We should not remove any limit from an uncorrelated subquery. For example:
```sql
-- should return nothing, but returns all tuples of t if we remove the limit from EXISTS
SELECT * FROM t WHERE EXISTS (SELECT * FROM t limit 0);

-- should return the tuple with the smallest c1 in t,
-- but reports an error if we remove the limit from the scalar subquery
SELECT * FROM t WHERE c1 = (SELECT * FROM t ORDER BY c1 LIMIT 1);
```
2023-07-20 18:37:04 +08:00
367ad9164a [feature-wip](auto-inc)(step-2) support auto-increment column for duplicate table (#19917) 2023-07-20 18:03:39 +08:00
86d7233b06 [fix](nereids) ExtractAndNormalizeWindowExpression rule should push down correct exprs to child (#21827)
consider the window function:
```sql
substr(
ref_1.cp_type,
sum(CASE WHEN ref_1.cp_type = 0 THEN 3 ELSE 2 END) OVER (),
1)
```
Before this PR, only "CASE WHEN ref_1.cp_type = 0 THEN 3 ELSE 2 END" was pushed down,
but both "ref_1.cp_type" and "CASE WHEN ref_1.cp_type = 0 THEN 3 ELSE 2 END"
should be pushed down. This PR fixes it.
2023-07-20 11:47:55 +08:00
0f116ce148 Revert "[Enhancement](Nereids)enable nereids DML by default. (#21539)" (#22013)
This reverts commit f668b3965effbd5df4902f20b496cb6b6642414c.
2023-07-20 11:32:54 +08:00
20242d9a0e [Improve](simdjson) put unescaped string value after parsed (#21866)
In some cases, it is necessary to unescape the original value, such as when converting a string to JSONB.
If it is not unescaped, the later JSONB parse will fail.
2023-07-20 10:33:17 +08:00
2daad2151d [enhancement](jdbc catalog) Add mysql jdbc catalog function to filter push-down identification (#21745) 2023-07-19 23:48:23 +08:00
58f2593ba1 [Fix](Nereids) Add cast comparison with slot reference when inferring predicate (#21171)
Problem:
When inferring predicates, we assume that only slot references need to be considered. But in this case:
create table tb1(l1 smallint) ...;
create table tb2(l2 int) ...;
select * from tb1 inner join tb2 where tb1.l1 = tb2.l2 and tb2.l2 = 1;
We cannot infer the filter tb1.l1 = 1, because a cast is added to l1: cast(l1 as int) = l2.

Solution:
Take casts into consideration when inferring predicates, and also adjust the equality judgement between a slot reference and a cast expression. However, inferring a predicate through a cast from a wider type to a narrower type would be a logical error.
For example:
select * from tb1 inner join tb2 where tb1.l1 = cast(tb2.l2 as smallint) and tb2.l2 = (a value between SMALLINT max and INT max);
The tb2.l2 predicate cannot be inferred to the left side, because the down-cast value would not fit tb1.l1 and the inferred predicate would be wrong; and if we add one more condition like tb1.l1 = tb3.l3 (smallint), it would make that predicate false as well.
2023-07-19 23:14:26 +08:00
56c67a442a [regression-test] add p0/p1 case about partition table (#21777) 2023-07-19 14:05:56 +08:00
f668b3965e [Enhancement](Nereids)enable nereids DML by default. (#21539)
TODO: fix casting of the agg_state type when doing insert
2023-07-19 13:52:15 +08:00
d987f782d2 [refactor](Nereids) refactor cte analyze, rewrite and reuse code (#21727)
REFACTOR:

1. Generate CTEAnchor, CTEProducer, and CTEConsumer during analyze.

For example, statement `WITH cte1 AS (SELECT * FROM t) SELECT * FROM cte1`.
Before this PR, we got analyzed plan like this:
```
logicalCTE(LogicalSubQueryAlias(cte1))
+-- logicalProject()
    +-- logicalCteConsumer()
```
we only have LogicalCteConsumer on the plan, but not LogicalCteProducer.
This is not a valid plan, and should not be the final result of analyze.
After this PR, we got analyzed plan like this:
```
logicalCteAnchor()
|-- logicalCteProducer()
+-- logicalProject()
    +-- logicalCteConsumer()
```
This is a valid plan with LogicalCteProducer and LogicalCteConsumer

2. Replace re-analyzing the unbound plan with deep copying the plan when doing CTEInline

Because we generate LogicalCteAnchor and LogicalCteProducer during analyze, we can no longer
re-analyze to generate the CTE inline plan.
Another reason is that we reuse the relation id between the unbound and bound relation,
so if we re-analyzed the unresolved CTE plan we would get two relations
with the same RelationId. This is wrong, because we use RelationId to distinguish
two different relations.
This PR implements two helper classes to deep copy a new plan from the CTEProducer:
`LogicalPlanDeepCopier` and `ExpressionDeepCopier`

3. New rewrite framework to ensure CTEInline is done in the right way (see the sketch after this list)

Before this PR, we did CTEInline before applying any rewrite rule,
but sometimes some CteConsumers can be eliminated by rewriting.
After this PR, we do CTEInline after the plans relying on the CTEProducer have
been rewritten, so we can still inline the CTE if the count of CTEConsumers
drops below the CTEInline threshold.

4. Add a relation id to all relation plan nodes
5. Let all relations generated from tables implement the trait CatalogRelation
6. Reuse the relation id between the unbound relation and the relation after binding
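A sketch for point 3 above, assuming a hypothetical table t and assuming the rewrite phase can eliminate the branch with the always-false filter: cte1 starts with two consumers, but after the dependent plans are rewritten only one consumer remains, which may fall below the CTEInline threshold so cte1 can be inlined.
```sql
WITH cte1 AS (SELECT c1 FROM t)
SELECT c1 FROM cte1
UNION ALL
SELECT c1 FROM cte1 WHERE FALSE;
```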


ENHANCEMENT:

1. Pull up CTEAnchor before RBO to avoid breaking other rules' patterns

Before this PR, we would generate CTEAnchor and LogicalCTE in the middle of the plan,
so every rule had to handle LogicalCTEAnchor, otherwise it would generate an unexpected plan.
For example, push-down-filter and push-down-project would have to add patterns like:
```
logicalProject(logicalCTE)
...
logicalFilter(logicalCteAnchor)
...
```
Project and filter must be pushed through these virtual plan nodes to ensure all projects
and filters can be merged together and end up in the right order. For example:
```
logicalProject
+-- logicalFilter
    +-- logicalCteAnchor
        +-- logicalProject
            +-- logicalFilter
                +-- logicalOlapScan
```
The plan above leads to a translation error, because we cannot apply filter and
project twice on the bottom logicalOlapScan.


BUGFIX:

1. Recursively analyze LogicalCTE to avoid binding an outer relation to an inner CTE

For example
```sql
SELECT * FROM (WITH cte1 AS (SELECT * FROM t1) SELECT * FROM cte1)v1, cte1 v2; 
```
Before this PR, we would use the nested CTE name to bind the outer plan,
so the outer cte1 with alias v2 would be bound to the inner cte1.
After this PR, this SQL throws a 'Table not exists' exception during binding.

2. Do withChildren in CTEProducer the right way and remove the projects attr from it

Before this PR, we added an attr named projects to CTEProducer to represent its output,
because we could not get the correct output by calling the `getOutput` method on it.
The root cause is the wrong implementation of computeOutput in LogicalCteProducer.
This PR fixes that problem and removes the projects attr of CTEProducer.

3. The adjust-nullable rule updates CTEConsumer's output according to CTEProducer's output

This PR processes nullable on LogicalCteConsumer to ensure the CteConsumer's output carries the right
nullable info when the CteProducer's output nullability has been adjusted.

4. Binding set operation expressions should not change the nullability of the children's output

This PR fixes a problem introduced by the previous PR #21168. The nullable info of
SetOperation's children should not change after binding the SetOperation.
2023-07-19 11:41:41 +08:00
5b043a980e [fix](planner)only forbid literal value in AnalyticExpr's order by list (#21819)
2023-07-19 09:40:55 +08:00
Pxl
0de94e857f [Bug](materialized view) fix wrong match mv when mv have where clause (#21797) 2023-07-19 01:11:39 +08:00
fff1983f40 [fix](planner)use tupleId of agg node to get its unsigned conjuncts (#21949) 2023-07-19 00:46:49 +08:00
28dfcd8785 [fix](pipeline) Fix pipeline that cause plenty timeout of p0 cases #21917 2023-07-18 23:15:49 +08:00
a9ea138caf [fix](two level hash table) fix dead loop when converting to two level hash table for zero value (#21899)
When the two-level hash table is enabled, a zero value in the existing one-level hash table causes an infinite loop while converting to the two-level hash table, because the PartitionedHashTable::_is_partitioned flag is not set correctly during the conversion.
2023-07-18 19:50:30 +08:00
c6063ed92f [Revert](lazy open) revert lazy open and add case (#21821) 2023-07-18 19:41:33 +08:00
87556b5741 [bug](test) fix regression test case failed with curdate (#21922)
2023-07-18 19:10:55 +08:00
d6d27ef428 [fix](Nereids) join other conjuncts should get slot from join output (#21840) 2023-07-18 18:22:40 +08:00
ec12a4159a [fix](planner) push conjuncts into SetOperationStmt inline view (#21718)
2023-07-18 14:17:07 +08:00
Pxl
417e3e5616 [Feature](delete) support fold constant on delete stmt (#21833)
Support folding constant expressions in the condition of a DELETE statement.
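A sketch of the effect, assuming a hypothetical table t(c1 int): the constant expression in the condition is folded before the delete predicate is applied.
```sql
DELETE FROM t WHERE c1 = 1 + 1;
-- folded to: DELETE FROM t WHERE c1 = 2
```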
2023-07-18 12:56:28 +08:00
Pxl
19492b06c1 [Bug](decimalv3) fix failed on test_dup_tab_decimalv3 due to wrong precision (#21890)
2023-07-18 12:53:09 +08:00