Commit Graph

2044 Commits

Author SHA1 Message Date
fff1983f40 [fix](planner)use tupleId of agg node to get its unassigned conjuncts (#21949) 2023-07-19 00:46:49 +08:00
28dfcd8785 [fix](pipeline) Fix pipeline bug that caused plenty of timeouts in p0 cases #21917 2023-07-18 23:15:49 +08:00
a9ea138caf [fix](two level hash table) fix dead loop when converting to two level hash table for zero value (#21899)
When the two-level hash table is enabled and the existing one-level hash table contains a zero value, converting to the two-level hash table enters a dead loop, because the PartitionedHashTable::_is_partitioned flag is not set correctly during the conversion.
2023-07-18 19:50:30 +08:00
c6063ed92f [Revert](lazy open) revert lazy open and add case (#21821) 2023-07-18 19:41:33 +08:00
87556b5741 [bug](test) fix regression test case failure with curdate (#21922)
fix regression test case failure with curdate
2023-07-18 19:10:55 +08:00
d6d27ef428 [fix](Nereids) join other conjuncts should get slot from join output (#21840) 2023-07-18 18:22:40 +08:00
ec12a4159a [fix](planner) push conjuncts into SetOperationStmt inline view (#21718)
* [fix](planner)push conjuncts into SetOperationStmt inline view
2023-07-18 14:17:07 +08:00
Pxl 417e3e5616 [Feature](delete) support fold constant on delete stmt (#21833)
support fold constant on delete stmt
2023-07-18 12:56:28 +08:00
Pxl 19492b06c1 [Bug](decimalv3) fix failure on test_dup_tab_decimalv3 due to wrong precision (#21890)
fix failure on test_dup_tab_decimalv3 due to wrong precision
2023-07-18 12:53:09 +08:00
07e720e65d [fix](planner)need to recalculate nullable info of output slots for join node (#21650)
* [fix](planner)need to recalculate nullable info of output slots for join node
2023-07-18 12:10:27 +08:00
489171e4c1 [Fix](multi catalog)Fix bug with hive partition values containing special characters such as / (#21876)
Hive escapes some special characters in partition values to %XX; for example, / is escaped to %2F.
Doris did not handle this case, which caused it to fail to list the files under partitions containing special characters.
This PR fixes that bug.
2023-07-18 11:20:38 +08:00
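A minimal illustration of the case described in the entry above, with hypothetical catalog, table, column, and partition names: a partition value containing `/` is stored escaped as `%2F`, and a query filtering on that value should still list the partition's files.

```sql
-- Hypothetical Hive table: the partition value '2023/07' is stored on disk as '2023%2F07'.
-- Before this fix, Doris could fail to list the files under such a partition.
select count(*)
from hive_catalog.db_name.events
where batch_id = '2023/07';
```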
ebd2a4b707 [fix](dynamic partition) fix create hot partition failed without error response (#20996) 2023-07-18 10:56:37 +08:00
b656f31cf2 [Enhancement](compatible) show decimalv3 as decimal (#21782) 2023-07-18 09:17:14 +08:00
12784f863d [fix](Export) Fixed a bug that caused a core dump when exporting large amounts of data (#21761)
A heap-buffer-overflow error occurs when exporting large amounts of data in orc format.
Reserve 50 bytes for the buffer to avoid this problem.
2023-07-18 00:06:38 +08:00
a92508c3f9 [Fix](statistics) Fix analyze db always use internal catalog bug (#21850)
The `Analyze database db_name` command did not use the current catalog; it always used the internal catalog, so the command failed to find the database. This PR fixes that bug.
2023-07-17 15:28:54 +08:00
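A minimal illustration of the fixed behavior, assuming a hypothetical external catalog name and the database name from the entry above:

```sql
-- Switch to an external catalog, then analyze a database inside it.
SWITCH hive_catalog;
-- Before this fix, the database was resolved against the internal catalog and could not be found.
ANALYZE DATABASE db_name;
```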
5fc0a84735 [improvement](catalog) reduce the size of thrift params for external table query (#21771)
### 1
In the previous implementation, each FileSplit had its own `TFileScanRange`, and each `TFileScanRange`
contained a list of `TFileRangeDesc` and a `TFileScanRangeParams`.
So if there were thousands of FileSplits, there would be thousands of `TFileScanRange` objects, making the
thrift data sent to BE too large, resulting in:

1. the RPC that sends the fragment may fail due to timeout
2. FE may OOM

For a given query request, the `TFileScanRangeParams` is the common part and is the same for all `TFileScanRange` objects,
so I moved it to `TExecPlanFragmentParams`.
After that, each FileSplit only carries a list of `TFileRangeDesc`.

In my test, querying a hive table with 100000 partitions, the size of the thrift data dropped from 151MB to 15MB,
and the above 2 issues are gone.

### 2
Support disabling the file meta cache for parquet footers by setting `max_external_file_meta_cache_num` <= 0.
I found that for some wide tables the footer is too large (1MB in compact form, and much larger after being
deserialized to thrift), so it consumes too much BE memory when there are many files.

This will be optimized later; here I only add the ability to disable the cache.
2023-07-17 13:37:02 +08:00
03b575842d [Feature](table function) support explode_json_array_json (#21795) 2023-07-17 11:40:02 +08:00
d0775f8209 [log](profile) add doris version info to query profile (#21501) 2023-07-17 11:18:05 +08:00
Pxl 86841d8653 [Bug](materialized-view) fix some problems of mv and make ssb mv work on nereids (#21559)
fix some problems of mv and make ssb mv work on nereids
2023-07-17 10:08:25 +08:00
c409fa0f58 [Feature](Compaction)Support full compaction (#21177) 2023-07-16 13:21:15 +08:00
7a61953d17 [fix](nereids)SimplifyComparisonPredicate rule need special care for decimalv3 and datetimev2 literal (#21575) 2023-07-14 23:05:14 +08:00
83ce4379ff [regression] add order by in test case for stable output (#21815) 2023-07-14 18:01:43 +08:00
c9a99ce171 [Feature](Nereids) support udf for Nereids (#18257)
Support alias functions, Java UDF, and Java UDAF for Nereids.
Implementation:
UDFs (alias functions, Java UD(A)F) are saved in the database object, and we look them up by FunctionDesc, which requires the function name and argument types. So we first bind the expressions of the children to obtain the argument types, and then pick the best-matching function.

Secondly:
For alias functions:
The original function of an alias function is stored as a planner-style function. It is too hard to translate it directly into a Nereids-style expression, so we convert it to the corresponding SQL and parse that. We then have a Nereids-style function and try to bind it.
A bound function also changes the types of its children by adding cast nodes to match its expected input types, so if we traversed a bound function more than once, the cast nodes would differ. To solve this, we add a flag isAnalyzedFunction. It is false by default and is set to true when returning from the visitor function; if the flag is true, the visitor returns immediately.

Now we can ensure that the bound functions in the children stay the same even if we traverse them more than once, so we can replace the alias function with its original function and bind the unbound functions.

For Java UDF and Java UDAF:
Java UDF and Java UDAF are catalog functions and are hard to translate entirely into Nereids-style functions, so we create the Nereids expression objects JavaUdf and JavaUdaf to wrap them.

All in all, Nereids now supports UDFs, including nested ones.
2023-07-14 17:02:01 +08:00
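A short sketch of the kinds of UDFs covered by the entry above; the function names, jar path, and class name are hypothetical placeholders, using Doris's CREATE ALIAS FUNCTION and CREATE FUNCTION (JAVA_UDF) forms.

```sql
-- Hypothetical alias function: rewritten to its original built-in expression during binding.
CREATE ALIAS FUNCTION my_trunc(DATETIMEV2(3), INT)
    WITH PARAMETER (dt, n) AS date_trunc(days_sub(dt, n), 'day');

-- Hypothetical Java UDF, wrapped by the Nereids JavaUdf expression.
CREATE FUNCTION add_one(INT) RETURNS INT PROPERTIES (
    "file" = "file:///path/to/udf.jar",
    "symbol" = "org.example.AddOne",
    "type" = "JAVA_UDF"
);

-- Nereids can bind and nest both kinds.
SELECT my_trunc(now(3), 1), add_one(add_one(1));
```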
f95d728d3e [shape](nereids) TPCDS check all query shape, except ds64 (#21742)
There is a known bug in the ds64 analysis; the ds64 shape check will be added later.
2023-07-14 16:56:46 +08:00
Pxl 4d44cea784 [Bug](materialized-view) check group expr at create mv (#21798)
check group expr at create mv
2023-07-14 15:39:38 +08:00
ca6e33ec0c [feature](table-value-functions)add catalogs table-value-function (#21790)
mysql> select * from catalogs() order by CatalogId;
2023-07-14 10:25:16 +08:00
6fd8f5cd2f [Fix](parquet-reader) Fix parquet string column min/max statistics issue which caused incorrect query results. (#21675)
In parquet, the min and max statistics may not handle UTF8 correctly.
The current approach uses the min_value and max_value statistics introduced by PARQUET-1025 when they are present.
If they are not, the statistics are temporarily ignored. A better way would be to read the min and max statistics when they contain
only ASCII characters; I will improve this in a future PR.
2023-07-14 00:09:41 +08:00
37e247536a [tpcds](nereids) add tpcds 1T shape check #21753
Add a regression case to simulate tpcds 1T.
The shape checks will be added later once they are stable.
2023-07-13 21:44:10 +08:00
35fa9496e7 [fix](merge-on-write) fix wrong result when query with prefix key predicate (#21770) 2023-07-13 19:56:00 +08:00
abc21f5d77 [bugfix](ngram bf index) process differently for normal bloom filter index and ngram bf index (#21310)
* process differently for normal bloom filter index and ngram bf index

* fix review comments for readability

* add test case

* add testcase for delete condition
2023-07-13 17:31:45 +08:00
d4bdd6768c [Feature](Nereids) support select into outfile (#21197) 2023-07-13 17:01:47 +08:00
f863c653e2 [Fix](Planner) fix limit executing before sort in show export job (#21663)
Problem:
Before this change, when running show export, the limit was executed before the sort. The result was not the expected one, because the limit cut the results first and we could not get what we wanted.

Example:
We have export job1 and job2 with JobId1 > JobId2, and we want the job with JobId1:
show export from db order by JobId desc limit 1;
Since limit 1 was applied first, we would probably get job2, because JobIds are assigned from small to large.

Solution:
Do not cut the results first when there is an order by clause; cut the result set after sorting.
2023-07-13 11:17:28 +08:00
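The example from the entry above, formatted as a runnable statement (db is the placeholder database name used there):

```sql
-- Expect the export job with the largest JobId (job1).
-- Before the fix, LIMIT 1 was applied before the sort, so job2 could be returned instead.
SHOW EXPORT FROM db ORDER BY JobId DESC LIMIT 1;
```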
e18465eac7 [feature](TVF) support path partition keys for external file TVF (#21648) 2023-07-13 10:15:55 +08:00
00c48f7d46 [opt](regression case) add more index change case (#21734) 2023-07-12 21:52:48 +08:00
be55cb8dfc [Improve](jsonb_extract) support multiple parse paths in jsonb_extract (#21555)
support multiple parse paths in jsonb_extract
2023-07-12 21:37:36 +08:00
88c719233a [opt](nereids) convert OR expression to IN expression (#21326)
Add a new rule named "OrToIn" that converts a disjunction of equalTo predicates on the same slot, each comparing against a literal, into an InPredicate so that it can be pushed down to the storage engine.

for example:

```sql
col1 = 1 or col1 = 2 or col1 = 3 and (col2 = 4)
col1 = 1 and col1 = 3 and col2 = 3 or col2 = 4
(col1 = 1 or col1 = 2) and  (col2 = 3 or col2 = 4)
```

would be converted to 

```sql
col1 in (1, 2) or col1 = 3 and (col2 = 4)
col1 = 1 and col1 = 3 and col2 = 3 or col2 = 4
(col1 in (1, 2) and (col2 in (3, 4)))
```
2023-07-12 10:53:06 +08:00
ff42cd9b49 [feature](hive)add read of the hive table textfile format array type (#21514) 2023-07-11 22:37:48 +08:00
ed410034c6 [enhancement](nereids) Sync stats across FE cluster after analyze #21482
Before this PR, if a user connected to a follower and analyzed a table, the stats would not get cached in the follower FE, since the analyze stmt is forwarded to the master and the follower still lazily loads its cache. After this PR, once the analyze finishes on the master, the master syncs the stats to all followers and updates the followers' stats caches.
Load partition stats into column stats.
2023-07-11 20:09:02 +08:00
4cbd99ad9b [pipeline](ckb) trigger new ckb pipeline, even pr id also run (#21661)
* [pipeline](ckb) also trigger new ckb pipeline

* [pipeline](ckb) all pr run ckb pipeline

* change required

---------

Co-authored-by: stephen <hello-stephen@qq.com>
2023-07-11 15:24:26 +08:00
cb69349873 [regression] add bitmap filter p1 regression case (#21591) 2023-07-11 14:27:03 +08:00
Pxl bb88df3779 [regression-test](agg-state) change set to set global enable_agg_state (#21708)
When there are multiple FEs, we need SET GLOBAL to set the session variable on all FEs.
2023-07-11 14:15:54 +08:00
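A minimal sketch of the difference the regression case now relies on, using the session variable named in the title:

```sql
-- Session scope: only the current connection (a single FE) sees the value.
SET enable_agg_state = true;
-- Global scope: persisted and visible to sessions on every FE in the cluster.
SET GLOBAL enable_agg_state = true;
```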
5ed42705d4 [fix](jdbc scan) 1=1 does not translate to TRUE (#21688)
Most database systems recognize `where 1=1` but not `where true`, so we should send the original `1=1` to the database.
2023-07-11 14:04:49 +08:00
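A hedged illustration of the predicate forwarded through the JDBC catalog to the external database; the table name is a placeholder:

```sql
-- Before the fix: the constant predicate was rewritten, and per the entry above
-- some external databases do not accept a bare boolean literal.
SELECT * FROM remote_table WHERE TRUE;
-- After the fix: the predicate is forwarded exactly as the user wrote it.
SELECT * FROM remote_table WHERE 1 = 1;
```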
d3be10ee58 [improvement](column) Support for the default value of current_timestamp in microsecond (#21487) 2023-07-11 14:04:13 +08:00
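A minimal sketch of the column definition this enables, with hypothetical table and column names; DATETIME(6) and CURRENT_TIMESTAMP(6) carry microsecond precision:

```sql
-- Hypothetical table: the default value now keeps microseconds.
create table events (
    id BIGINT,
    created_at DATETIME(6) DEFAULT CURRENT_TIMESTAMP(6)
)
duplicate key (id)
distributed by hash(id) buckets 1
properties('replication_num' = '1');
```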
8eae31002d [fix](regression) update some cases with timediff (#21697)
Because this PR introduces scale and FE's current constant folding is incomplete, the exact type cannot be deduced.
2023-07-11 09:55:13 +08:00
7b403bff62 [feature](partial update)support insert new rows in non-strict mode partial update with nullable unmentioned columns (#21623)
1. expand the semantics of the variable strict_mode to control the behavior of stream load: if strict_mode is true, the stream load can only update existing rows; if strict_mode is false, the stream load can insert new rows when the key is not present in the table
2. when inserting a new row in a non-strict-mode stream load, the unmentioned columns must have a default value or be nullable
2023-07-11 09:38:56 +08:00
736d6f3b4c [improvement](timezone) support mixed upper-lower case of timezone names (#21572) 2023-07-11 09:37:14 +08:00
8973610543 [feature](datetime) "timediff" supports calculating microseconds (#21371) 2023-07-10 19:21:32 +08:00
202a5c636f [fix](create table) modify varchar default length 1 to 65533 (#21302)
*Modify the VARCHAR default length from 1 to varchar.max.length (65533) when creating a table.*

```mysql
create table t2 (
    k1 CHAR,
    K2 CHAR(10),
    K3 VARCHAR,
    K4 VARCHAR(1024)
)
duplicate key (k1)
distributed by hash(k1) buckets 1
properties('replication_num' = '1');

desc t2;
```

| Field | Type           | Null | Key   | Default | Extra |
| ----- | -------------- | ---- | ----- | ------- | ----- |
| k1    | CHAR(1)        | Yes  | true  | NULL    |       |
| K2    | CHAR(10)       | Yes  | false | NULL    | NONE  |
| K3    | VARCHAR(65533) | Yes  | false | NULL    | NONE  |
| K4    | VARCHAR(1024)  | Yes  | false | NULL    | NONE  |
2023-07-10 17:57:21 +08:00
0be349e250 [feature](jdbc) Support jdbc catalog to read json types (#21341) 2023-07-10 16:21:00 +08:00
f9c56d59fc [improvement](statistics)Support external table show table stats, modify column stats and drop stats (#21624)
Support external table show table stats, modify column stats and drop stats.
2023-07-10 11:33:06 +08:00