Commit Graph

1090 Commits

Author SHA1 Message Date
93c48f2bb0 [fix](regression) fix show create table in_memory = false test result error #19022 2023-04-25 09:04:59 +08:00
207c827cdb [fix](test) fix result of CHARACTER_OCTET_LENGTH in . (#18896) 2023-04-25 08:42:54 +08:00
efebb3d21e [fix](schema) fix show create table get wrong random distribution info (#18895)
* [fix](schema) fix show create table get wrong random distribution info


---------

Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-04-24 23:33:42 +08:00
6bf51150f3 [fix](nereids) remove unnecessary project above scan node (#18920)
1. remove unnecessary project node above scan node.
2. fix in subquery may be recognized as scalar subquery bug
3. fix some Quantile related functions' return type bug
2023-04-24 13:58:57 +08:00
Pxl
1f9450e0f7 [Chore](case) add some regression-test case about materialized-view #18946 2023-04-24 11:36:56 +08:00
ab2a6864bc [function](json) Json unquote (#18037) 2023-04-24 10:33:29 +08:00
45d0f53529 [Regression-test](Export) add regression test for export #18897 2023-04-23 19:43:22 +08:00
a9ac930e5f [Fix](mutli-catalogs) Fix jdbc regression tests. (#18927)
- Fix `test_show_where` result.
- Remove `enable_decimal_conversion = true` in `test_mysql_jdbc_catalog`.
- Remove `test_show_create_catalog`.
2023-04-23 19:42:13 +08:00
b75f4c97f3 [function](string) support char function (#18878)
* [function](string) support char function

* fix
2023-04-22 08:36:48 +08:00
de0e89d1b4 [feature](function) Modified cast as time to behave more like MySQL (#18565)
Because the underlying type of time was float64, select cast("19:22:18" as time) would result in a null value in the past.
Results in the following:
2023-04-22 06:11:59 +08:00
6eea3d9e2d [Test](multi-catalog) Fix test_hive_parquet regression test order issue. (#18879)
l_orderkey cannot guarantee unique order.
2023-04-21 22:59:34 +08:00
425101bf53 [fix](test)Move broker test to p2. Move test data to cos in Beijing region (#18893)
Fix broker load p2 test case error.
1. Move test data from cos Hong kong region to Beijing region.
2. Move broker load test to p2 group.
3. Fix error message mismatch error.
2023-04-21 22:15:52 +08:00
f7651d8dfb (fix)[olap] not support in_memory=true now (#18731)
* (fix)[olap] can not set in_memory=true now

---------

Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-04-21 21:55:37 +08:00
af20b2c95e [Bug](topn opt) Fix be crash when enable topn opt with larger thresho… (#18858)
topn opt should be inited when update it
2023-04-21 17:45:00 +08:00
ec1ab1a3d2 [Improve](GEO)wkb input and output are represented as hexadecimal strings And delete EWKB (#18721) 2023-04-21 15:11:18 +08:00
c41b486e7e [fix](nereids) LogicalProject should always has non-empty project list (#18863) 2023-04-21 14:28:07 +08:00
1a6401d682 [enchancement](statistics) support sampling collection of statistics (#18880)
1. Supports sampling to collect statistics
2. Improved syntax for collecting statistics
3. Support histogram specifies the number of buckets
4. Tweaked some code structure

---

The syntax supports WITH and PROPERTIES, using the same syntax as before.

Column Statistics Collection Syntax:
```SQL
ANALYZE [ SYNC ] TABLE table_name
     [ (column_name [, ...]) ]
     [ [WITH SYNC] | [WITH INCREMENTAL] | [WITH SAMPLE PERCENT | ROWS ] ]
     [ PROPERTIES ('key' = 'value', ...) ];
```

Column histogram collection syntax:
```SQL
ANALYZE [ SYNC ] TABLE table_name
     [ (column_name [, ...]) ]
     UPDATE HISTOGRAM
     [ [ WITH SYNC ][ WITH INCREMENTAL ][ WITH SAMPLE PERCENT | ROWS ][ WITH BUCKETS ] ]
     [ PROPERTIES ('key' = 'value', ...) ];
```

Illustrate:
- sync:Collect statistics synchronously. Return after collecting.
- incremental:Collect statistics incrementally. Incremental collection of histogram statistics is not supported.
- sample percent | rows:Collect statistics by sampling. Scale and number of rows can be sampled.
- buckets:Specifies the maximum number of buckets generated when collecting histogram statistics.
- table_name: The purpose table for collecting statistics. Can be of the form `db_name.table_name`.
- column_name: The specified destination column must be a column that exists in `table_name`, and multiple column names are separated by commas.
- properties:Properties used to set statistics tasks. Currently only the following configurations are supported (equivalent to the with statement)
   - 'sync' = 'true'
   - 'incremental' = 'true'
   - 'sample.percent' = '50'
   - 'sample.rows' = '1000'
   - 'num.buckets' = 10

--- 

TODO: 
- Supplement the complete p0 test
- `Incremental` statistics see #18653
2023-04-21 13:11:43 +08:00
Pxl
c033c6239f [Bug](table-function) fix wrong result when seprator of explode_split size more than one (#18824)
fix wrong result when seprator of explode_split size more than one
2023-04-21 11:00:47 +08:00
3328a65b75 [Fix](mutli-catalog) Use decimal v3 type to fix decimal loss issue in multi-catalog module. (#18835)
Fix decimal v3 precision loss issues in the multi-catalog module.
Now it will use decimal v3 to represent decimal type in the multi-catalog module.
Regression Test: `test_load_with_decimal.groovy`
2023-04-20 11:02:53 +08:00
f280b04736 [regression-test](iceberg)add iceberg in regression case (#18792)
add iceberg 'in' clause regression case
for #18226
2023-04-19 15:09:20 +08:00
7d6b1a115a [feature](nereids)Tpc-h 1T plan shape check #18717
add regression test to check tpc-h 1T plan shape
2023-04-19 12:00:54 +08:00
031d35d4a1 [fix](stats) Stats still in cache after user dropped it (#18720)
1. Evict the dropped stats from cache
2. Remove codes for the partition level stats collection
3. Disable analyze whole database directly
4. Fix the potential death loop in the stats cleaner
5. Sleep thread in each loop when scanning stats table to avoid excessive IO usage by this task.
2023-04-18 16:41:10 +08:00
0b074ade02 [fix](const column) fix coredump caused by const column for some functions (#18737) 2023-04-18 13:57:55 +08:00
6b351a2818 [vectorzied](function) fix array_map function analyzed failed with order by clause (#18676)
* [vectorzied](function) fix array_map function analyzed failed with order by clause

* add test
2023-04-18 12:01:44 +08:00
74d424e6d4 [Bug](DECIMAL) Fix bug for arithmatic expr DECIMALV2 / DECIMALV3 (#18723) 2023-04-17 16:43:36 +08:00
1e06763366 [fix](bitmap) fix bitmap_count errors to set nullable to non-nullable bitmap col (#18689) 2023-04-17 13:23:27 +08:00
5300b21db7 [Bug](DECIMALV3) report failure if a decimal value is overflow (#18336) 2023-04-17 13:18:14 +08:00
a2278dbc6c [opt](nereids) optimize filter estimation for pattern "col=col" #18716
Tpc-h q10 and q5 benefit from this optimization.

For a given hash join condition, A=B, sometimes both A and B are reduced by filters. In this pr, both reductions are counted in join estimation.
2023-04-17 11:44:35 +08:00
092d81f88a [BugFix](functions) fix multi_search_all_positions #18682 2023-04-17 08:32:57 +08:00
afdac1204d [improve](postgresql catalog) support postgresql bytea type to doris string (#18623)
* [improve](postgresql catalog) support postgresql bytea type to doris string

* modify function name

* add case
2023-04-16 18:14:42 +08:00
3ec52dc7da [tpch](nereids) add regression test for tpch_sf500 plan shape #18631
add regression test to check tpch_sf500 plan shape by explain shape plan.
2023-04-16 11:37:33 +08:00
98b8bef05b [bugfix](inverted index) fix inverted index to support NULL value filter (#18302) 2023-04-15 13:20:26 +08:00
f7e129934e [fix](nereids) only order by slot reference could use topn opt (#18622)
select cast(k1 as INT) as id from tbl1 order by id limit 2; 

is not valid for topN optimization, because 'id' is
a cast expr not a table column from scan node.
This pr address this issue.
2023-04-14 20:59:06 +08:00
f2d75cb492 [fix](Nereids) fix signature precision round for decimalv3 (#18639)
add decimalv3 signature to below functions:
ceil
dceil
dfloor
dround
floor
round
round_bankers
truncate
fix ComputePrecisionForRound to get correct signature
2023-04-14 18:18:41 +08:00
362b5a34ae [feat](stats) Support to delete expired stats periodically (#18614)
Support to delete expired stats periodically and manually.

default cleaner running interval is 2 days

Manually clean syntax is
```sql
DROP EXPIRED STATS
```

TODO:
1. process external catalog's stats
2. run drop at the appointed time
3. sleep a short time after drop one batch
2023-04-14 17:32:51 +08:00
4174d5a707 [opt](nereids) optimze aggregation estimation #18607
`select count(*) from T group by A, B`
suppose `ndv(A) > ndv(B)`
the estimated row count of aggregate is between ndv(A) and ndv(A) * ndv(B)

in previous version, we choose upper bound, that is ndv(A) * ndv(B). The drawback of this choice is the estimated row is often bigger that row count of T.

In this version, we choose the lower bound.
2023-04-14 16:13:25 +08:00
4d18ea30f4 [fix](Nereids) get_json_bigint should return bigint type (#18626) 2023-04-14 14:01:44 +08:00
8751f08d5a [bugfix](GEO)fix precision problem (#18642) 2023-04-14 10:39:19 +08:00
6c0af24e9d [Improve](simdjson reader) support UTF-8 unicode (with BOM) (#18585) 2023-04-13 21:58:44 +08:00
aa6b3cc537 [fix](planner)keep all agg functions if there is any virtual slots in group by list (#18630)
Because of the limitation of ProjectPlanner, we have to keep set agg functions materialized if there is any virtual slots in the group by list, such as 'GROUPING_ID' in the group by list etc.
2023-04-13 19:44:46 +08:00
2f64a8b387 [feature](GEO)Support read/write WKB/EWKB to gis types (#18526)
Support mutual conversion from wkb and gis types.also compatible with EWKB format
https://cwiki.apache.org/confluence/display/DORIS/DSIP-033%3A+More+GEO+functions
2023-04-13 16:25:18 +08:00
Pxl
eb46bcb304 [Bug](materialized-view) fix match wrong index on some scan node (#18561)
fix match wrong index on some scan node
2023-04-13 11:50:14 +08:00
df0aaece1d [Function](test) add some test cases for agg functions (#18610) 2023-04-13 10:23:41 +08:00
d57371da13 [feature](struct-type) support basic struct constructor function (#18190)
This commit will support struct and named_struct function.
2023-04-13 09:18:00 +08:00
1f9372558d [improve](regression case) Add more inverted index regression case (#18589)
1. add more inverted index regression case for unique mow
2. add inverted index case with different data types
2023-04-12 20:40:55 +08:00
a9f9366736 [fix](nereids) the data type of compareExpr and listQuery should be the same when creating InSubquery (#18539)
Consider sql

select table_B_alias.b from table_B_alias where table_B_alias.b in ( select a from table_A_alias );

if table_B_alias.b is int and table_A_alias.a is bigint,
we should cast(b as bigint) to make the data type the same as the InSubquery.
2023-04-12 20:02:37 +08:00
db44970685 [feature](stats) Support sync analyze (#18567)
Gammer:

```
ANALYZE [SYNC] TABLE ....
```

Add this feature so that we could test and tune stats framework conveniently.
2023-04-12 17:49:30 +08:00
b93e04ab66 [test](Nereids) add regression test to check join order for tpch queries (#18543)
by explain shape plan command, with stats injection, we add regression test to check tpch queries' plan shape.
2023-04-12 15:43:21 +08:00
43392918cd [Optimization](functions)Optimize function call for const columns. (#18310) 2023-04-12 11:11:01 +08:00
1238f6de97 [bug](array) fix be core in array_with_constant/array_repeat function when the first argument is nullable (#18404)
fix be core in array_with_constant/array_repeat function when the first argument is nullable
2023-04-11 19:46:41 +08:00