Commit Graph

1433 Commits

Author SHA1 Message Date
Pxl
ae809fbeba [Bug](storage )fix dead lock when create_tablet need lock two tablet && update mv_p0… (#21969)
fix dead lock when create_tablet need lock two tablet && update mv_p0/ssb case
2023-07-22 15:27:05 +08:00
50c8563f35 [fix](partial update) fix some bugs of sequence column (#21896) 2023-07-22 15:26:48 +08:00
355ac18363 [Fix](jdbc catalog) Pass conjuncts to JdbcScanNode and FileScanNode before doing finalize. (#21998)
JdbcScanNode need to use the conjuncts to generate sql in finalize function. But the conjuncts have not passed to JdbcScanNode yet while calling finalize. This pr is to pass the conjuncts to scan node before using it to avoid scan the whole table.
2023-07-22 14:08:44 +08:00
f7e3cc1553 [FIX](map)fix map proto contains_null #22107
when we select map in order by and limit; be node will coredump
2023-07-22 10:41:55 +08:00
40299d280d [Fix](json reader) fix rapidjson array->PushBack may take ownership… (#21988)
With bellow json path
`["$.data","$.data.datatimestamp"]`

After `array_obj->PushBack` the `data` field owner will be taken from array_obj, and lead to null values for json path `$.data.datatimestamp`

Rapidjson doc:
```
//! Append a GenericValue at the end of the array.
  \note The ownership of \c value will be transferred to this array on success.
 */
GenericValue& PushBack(GenericValue& value, Allocator& allocator);
```
2023-07-21 17:02:01 +08:00
2b2ac10e93 [feature](partial update) add failure tolerance for strict mode partial update stream load 2023-07-21 16:46:44 +08:00
732e0d14ff [Enhancement](window-funnel)add different modes for window_funnel() function (#20563) 2023-07-21 13:57:27 +08:00
74313c7d54 [feature-wip](autoinc)(step-3) add auto increment support for unique table (#22036) 2023-07-21 13:24:41 +08:00
eabd5d386b [Fix](multi catalog)Fix nereids context table always use internal catalog bug (#21953)
The getTable function in CascadesContext only handles the internal catalog case (try to find table only in internal 
catalog and dbs). However, it should take all the external catalogs into consideration, otherwise, it will failed to find a 
table or get the wrong table while querying external table. This pr is to fix this bug.
2023-07-20 20:32:01 +08:00
ee65e0a6b1 [fix](Nereids) should not remove any limit from uncorrelated subquery (#21976)
We should not remove any limit from uncorrelated subquery. For Example
```sql
-- should return nothing, but return all tuple of t if we remove limit from exists
SELECT * FROM t WHERE EXISTS (SELECT * FROM t limit 0);

-- should return the tuple with smallest c1 in t,
-- but report error if we remove limit from scalar subquery
SELECT * FROM t WHERE c1 = (SELECT * FROM t ORDER BY c1 LIMIT 1);
```
2023-07-20 18:37:04 +08:00
367ad9164a [feature-wip](auto-inc)(step-2) support auto-increment column for duplicate table (#19917) 2023-07-20 18:03:39 +08:00
86d7233b06 [fix](nereids) ExtractAndNormalizeWindowExpression rule should push down correct exprs to child (#21827)
consider the window function:
```sql
substr(
ref_1.cp_type,
sum(CASE WHEN ref_1.cp_type = 0 THEN 3 ELSE 2 END) OVER (),
1)
```
Before the pr, only "CASE WHEN ref_1.cp_type  = 0 THEN 3 ELSE 2 END" is pushed down.
But both "ref_1.cp_type" and "CASE WHEN ref_1.cp_type  = 0 THEN 3 ELSE 2 END"
should be pushed down.
This pr fix it
2023-07-20 11:47:55 +08:00
20242d9a0e [Improve](simdjson) put unescaped string value after parsed (#21866)
In some cases, it is necessary to unescape the original value, such as when converting a string to JSONB.
If not unescape, then later jsonb parse will be failed
2023-07-20 10:33:17 +08:00
2daad2151d [enhancement](jdbc catalog) Add mysql jdbc catalog function to filter push-down identification (#21745) 2023-07-19 23:48:23 +08:00
56c67a442a [regression-test] add p0/p1 case about partition table (#21777) 2023-07-19 14:05:56 +08:00
d987f782d2 [refactor](Nereids) refactor cte analyze, rewrite and reuse code (#21727)
REFACTOR:

1. Generate CTEAnchor, CTEProducer, CTEConsumer when analyze.

For example, statement `WITH cte1 AS (SELECT * FROM t) SELECT * FROM cte1`.
Before this PR, we got analyzed plan like this:
```
logicalCTE(LogicalSubQueryAlias(cte1))
+-- logicalProject()
    +-- logicalCteConsumer()
```
we only have LogicalCteConsumer on the plan, but not LogicalCteProducer.
This is not a valid plan, and should not as the final result of analyze.
After this PR, we got analyzed plan like this:
```
logicalCteAnchor()
|-- logicalCteProducer()
+-- logicalProject()
    +-- logicalCteConsumer()
```
This is a valid plan with LogicalCteProducer and LogicalCteConsumer

2. Replace re-analyze unbound plan with deepCopy plan when do CTEInline

Because we generate LogicalCteAnchor and LogicalCteProducer when analyze.
So, we could not do re-analyze to gnerate CTE inline plan anymore.
The another reason is, we reuse relation id between unbound and bound relation.
So, if we do re-analyze on unresloved CTE plan, we will get two relation
with same RelationId. This is wrong, because we use RelationId to distinguish
two different relations.
This PR implement two helper class to deep copy a new plan from CTEProducer.
`LogicalPlanDeepCopier` and `ExpressionDeepCopier`

3. New rewrite framework to ensure do CTEInline in right way.

Before this PR, we do CTEInline before apply any rewrite rule.
But sometimes, some CteConsumer could be eliminated after rewrite.
After this PR, we do CTEInline after the plans relaying on CTEProducer have
been rewritten. So we could do CTEInline if some the count of CTEConsumer
decrease under the threshold of CTEInline.

4. add relation id to all relation plan node
5. let all relation generated from table implement trait CatalogRelation
6. reuse relation id between unbound relation and relation after bind


ENHANCEMENT:

1. Pull up CTEAnchor before RBO to avoid break other rules' pattern

Before this PR, we will generate CTEAnchor and LogicalCTE in the middle of plan.
So all rules should process LogicalCTEAnchor, otherwise will generate unexpected plan.
For example, push down filter and push down project should add pattern like:
```
logicalProject(logicalCTE)
...
logicalFilter(logicalCteAnchor)
...
```
project and filter must be push through these virtual plan node to ensure all project
and filter could be merged togather and get right order of them. for Example:
```
logicalProject
+-- logicalFilter
    +-- logicalCteAnchor
        +-- logicalProject
            +-- logicalFilter
                +-- logicalOlapScan
```
upper plan will lead to translation error. because we could not do twice filter and
project on bottom logicalOlapScan.


BUGFIX:

1. Recursive analyze LogicalCTE to avoid bind outer relation on inner CTE

For example
```sql
SELECT * FROM (WITH cte1 AS (SELECT * FROM t1) SELECT * FROM cte1)v1, cte1 v2; 
```
Before this PR, we will use nested cte name to bind outer plan.
So the outer cte1 with alias v2 will bound on the inner cte1.
After this PR, the sql will throw Table not exists exception when binding.

2. Use right way do withChildren in CTEProducer and remove projects in it

Before this PR, we add an attr named projects in CTEProducer to represent the output
of it. This is because we cannot get right output of it by call `getOutput` method on it.
The root reason of that is the wrong implementation of computeOutput of LogicalCteProducer.
This PR fix this problem and remove projects attr of CTEProducer.

3. Adjust nullable rule update CTEConsumer's output by CTEProducer's output

This PR process nullable on LogicalCteConsumer to ensure CteConsumer's output with right
nullable info, if the CteProducer's output nullable has been adjusted.

4. Bind set operation expression should not change children's output's nullable

This PR use fix a problem introduced by prvious PR #21168. The nullable info of
SetOperation's children should not changed after binding SetOperation.
2023-07-19 11:41:41 +08:00
5b043a980e [fix](planner)only forbid literal value in AnalyticExpr's order by list (#21819)
* [fix](planner)only forbid literal value in AnalyticExpr's order by list
2023-07-19 09:40:55 +08:00
Pxl
0de94e857f [Bug](materialized view) fix wrong match mv when mv have where clause (#21797) 2023-07-19 01:11:39 +08:00
fff1983f40 [fix](planner)use tupleId of agg node to get its unsigned conjuncts (#21949) 2023-07-19 00:46:49 +08:00
a9ea138caf [fix](two level hash table) fix dead loop when converting to two level hash table for zero value (#21899)
When enable two level hash table , if there is zero value in the existing one level hash table, it will cause dead loop when converting to two level hash table, because the PartitionedHashTable::_is_partitioned flag is not set correctly when doing the converting.
2023-07-18 19:50:30 +08:00
87556b5741 [bug](test) fix regression test case failed with curdate (#21922)
fix regression test case failed with curdate
2023-07-18 19:10:55 +08:00
Pxl
417e3e5616 [Feature](delete) support fold constant on delete stmt (#21833)
support fold constant on delete stmt
2023-07-18 12:56:28 +08:00
Pxl
19492b06c1 [Bug](decimalv3) fix failed on test_dup_tab_decimalv3 due to wrong precision (#21890)
fix failed on test_dup_tab_decimalv3 due to wrong precision
2023-07-18 12:53:09 +08:00
489171e4c1 [Fix](multi catalog)Fix hive partition value contains special character such as / bug (#21876)
Hive escapes some special characters in partition value to %XX, for example, / is escaped to %2F.
Doris didn't handle this case which will cause doris failed to list the files under partition with special characters.
This pr is to fix this bug.
2023-07-18 11:20:38 +08:00
b656f31cf2 [Enchancement](compatible) show decimalv3 to decimal (#21782) 2023-07-18 09:17:14 +08:00
a92508c3f9 [Fix](statistics) Fix analyze db always use internal catalog bug (#21850)
`Analyze database db_name ` command couldn't use current catalog, it is always using the internal catalog. This will cause the command failed to find the db. This pr is to fix this bug.
2023-07-17 15:28:54 +08:00
03b575842d [Feature](table function) support explode_json_array_json (#21795) 2023-07-17 11:40:02 +08:00
Pxl
86841d8653 [Bug](materialized-view) fix some problems of mv and make ssb mv work on nereids (#21559)
fix some problems of mv and make ssb mv work on nereids
2023-07-17 10:08:25 +08:00
c409fa0f58 [Feature](Compaction)Support full compaction (#21177) 2023-07-16 13:21:15 +08:00
83ce4379ff [regression] add order by in test case for stable output (#21815) 2023-07-14 18:01:43 +08:00
c9a99ce171 [Feature](Nereids) support udf for Nereids (#18257)
Support alias function, Java UDF, Java UDAF for Nereids.
Implementation:
UDFs(alias function, Java UD(A)F) are saved in database object, we get it by FunctionDesc, which requires function name and arg types. So firstly we bind expressions of its children so that we can get the return type of args. Then we get the best selection.

Secondly:
For alias function:
The original function of the alias function is represented as original planner-style function, it's too hard to translate it to nereids-style expression hence we transfer it to the corresponding sql and parse it. Now we get the nereids-style function, and try to bind the function.
the bound function will also change the type by add cast node of its children to its expecting input types, so that if we travel a bound function more than one times, the cast node will be different. To solve the problem, we add a flag isAnalyzedFunction. it's set false by default and will be set true when return from the visitor function. If the flag is true, it will return immediately in visitor function.

Now we can ensure that the bound functions in children will be the same though we travel it more than one time. we can replace the alias function to its original function and bind the unbound functions.

For JavaUDF and JavaUDAF
JavaUDF and JavaUDAF can be recognized as a catalog function and hard to be entirely translated to Nereids-style function, we create a nereids expression object JavaUdf and JavaUdaf to wrap it.

All in all, now Nereids support UDFs and nesting them.
2023-07-14 17:02:01 +08:00
f95d728d3e [shape](nereids) TPCDS check all query shape, except ds64 (#21742)
there is a known bug on ds64 analyze. add ds 64 shape check latter
2023-07-14 16:56:46 +08:00
Pxl
4d44cea784 [Bug](materialized-view) check group expr at create mv (#21798)
check group expr at create mv
2023-07-14 15:39:38 +08:00
ca6e33ec0c [feature](table-value-functions)add catalogs table-value-function (#21790)
mysql> select * from catalogs() order by CatalogId;
2023-07-14 10:25:16 +08:00
6fd8f5cd2f [Fix](parquet-reader) Fix parquet string column min max statistics issue which caused query result incorrectly. (#21675)
In parquet, min and max statistics may not be able to handle UTF8 correctly.
Current processing method is using min_value and max_value statistics introduced by PARQUET-1025 if they are used.
If not, current processing method is temporarily ignored. A better way is try to read min and max statistics if it contains 
only ASCII characters. I will improve it in the future PR.
2023-07-14 00:09:41 +08:00
35fa9496e7 [fix](merge-on-write) fix wrong result when query with prefix key predicate (#21770) 2023-07-13 19:56:00 +08:00
abc21f5d77 [bugfix](ngram bf index) process differently for normal bloom filter index and ngram bf index (#21310)
* process differently for normal bloom filter index and ngram bf index

* fix review comments for readbility

* add test case

* add testcase for delete condition
2023-07-13 17:31:45 +08:00
d4bdd6768c [Feature](Nereids) support select into outfile (#21197) 2023-07-13 17:01:47 +08:00
00c48f7d46 [opt](regression case) add more index change case (#21734) 2023-07-12 21:52:48 +08:00
be55cb8dfc [Improve](jsonb_extract) support jsonb_extract multi parse path (#21555)
support jsonb_extract multi parse path
2023-07-12 21:37:36 +08:00
88c719233a [opt](nereids) convert OR expression to IN expression (#21326)
Add new rule named "OrToIn", used to convert multi equalTo which has same slot and compare to a literal of disjunction to a InPredicate so that it could be pushdown to storage engine.

for example:

```sql
col1 = 1 or col1 = 2 or col1 = 3 and (col2 = 4)
col1 = 1 and col1 = 3 and col2 = 3 or col2 = 4
(col1 = 1 or col1 = 2) and  (col2 = 3 or col2 = 4)
```

would be converted to 

```sql
col1 in (1, 2) or col1 = 3 and (col2 = 4)
col1 = 1 and col1 = 3 and col2 = 3 or col2 = 4
(col1 in (1, 2) and (col2 in (3, 4)))
```
2023-07-12 10:53:06 +08:00
ff42cd9b49 [feature](hive)add read of the hive table textfile format array type (#21514) 2023-07-11 22:37:48 +08:00
cb69349873 [regression] add bitmap filter p1 regression case (#21591) 2023-07-11 14:27:03 +08:00
5ed42705d4 [fix](jdbc scan) 1=1 does not translate to TRUE (#21688)
For most database systems, they recognize where 1=1 but not where true, so we should send the original 1=1 to the database
2023-07-11 14:04:49 +08:00
d3be10ee58 [improvement](column) Support for the default value of current_timestamp in microsecond (#21487) 2023-07-11 14:04:13 +08:00
7b403bff62 [feature](partial update)support insert new rows in non-strict mode partial update with nullable unmentioned columns (#21623)
1. expand the semantics of variable strict_mode to control the behavior for stream load: if strict_mode is true, the stream load can only update existing rows; if strict_mode is false, the stream load can insert new rows if the key is not present in the table
2. when inserting a new row in non-strict mode stream load, the unmentioned columns should have default value or be nullable
2023-07-11 09:38:56 +08:00
736d6f3b4c [improvement](timezone) support mixed uppper-lower case of timezone names (#21572) 2023-07-11 09:37:14 +08:00
8973610543 [feature](datetime) "timediff" supports calculating microseconds (#21371) 2023-07-10 19:21:32 +08:00
202a5c636f [fix](create table) modify varchar default length 1 to 65533 (#21302)
*modify archer default length 1 to  varchar.max.length , when create table.*

```mysql
create table t2 (             
k1 CHAR,              
K2 CHAR(10) ,               
K3 VARCHAR ,             
 K4 VARCHAR(1024) )              
duplicate key (k1)              
distributed by hash(k1) buckets 1              
properties('replication_num' = '1');  

desc t2;
```

| Field | Type           | Null | Key   | Default | Extra |
| -- |--|--| -| -| -| 
| k1    | CHAR(1)        | Yes  | true  | NULL    |       |
| K2    | CHAR(10)       | Yes  | false | NULL    | NONE  |
| K3    | VARCHAR(65533) | Yes  | false | NULL    | NONE  |
| K4    | VARCHAR(1024)  | Yes  | false | NULL    | NONE  |
2023-07-10 17:57:21 +08:00
0be349e250 [feature](jdbc) Support jdbc catalog to read json types (#21341) 2023-07-10 16:21:00 +08:00