Commit Graph

1213 Commits

Author SHA1 Message Date
bb12a1cb49 [Enhance](array function) add support for DecimalV3 for array_enumerate_uniq() (#17724) 2023-05-30 13:09:19 +08:00
94e1072d14 Revert "[fix](DECIMALV3) Fix the error in DECIMALV3 when explicitly casting. (#19926)" (#20204)
This reverts commit 8ca4f9306763b5a18ffda27a07ab03cc77351e35.
2023-05-30 10:35:33 +08:00
72cfe5865a [feat](optimizer) Support CTE reuse (#19934)
Before this PR, the new optimizer inlined every CTE directly. However, in many scenarios a CTE is referenced many times, such as in the TPC-DS tests. In these cases, materializing the CTE's result set and reusing it significantly improves performance. In our tests on TPC-DS queries, performance improved by up to almost **4 times**.

We introduce the following plan nodes in the optimizer:

1. CTEConsumer: holds a reference to a CTEProducer.
2. CTEProducer: the plan defined by the CTE statement.
3. CTEAnchor: the parent node of a CTEProducer. A CTEProducer can only be referenced from the right child of its corresponding CTEAnchor.

A CTEConsumer is converted to an inlined plan if the corresponding CTE is referenced no more than `inline_cte_referenced_threshold` times (a session variable, 1 by default).
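The inline-or-reuse decision described above can be sketched roughly as follows. This is a minimal Python illustration, not the optimizer's Java code: `plan_ctes` and the constant name are hypothetical, and the only detail taken from the commit message is the threshold rule and its default of 1.

```python
from collections import Counter

# Assumption: mirrors the default of the session variable named in the commit message.
INLINE_CTE_REFERENCED_THRESHOLD = 1


def plan_ctes(cte_references, threshold=INLINE_CTE_REFERENCED_THRESHOLD):
    """Map each CTE name to "inline" or "materialize".

    cte_references lists one CTE name per reference found in the query.
    """
    counts = Counter(cte_references)
    return {
        # Referenced at most `threshold` times: inline the CTE body.
        # Referenced more often: keep one producer and reuse its result.
        name: "inline" if count <= threshold else "materialize"
        for name, count in counts.items()
    }
```

Under this rule, a query that references `cte` twice would keep a single producer with two consumers, while a CTE referenced once would be inlined.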


For SQL:

```sql
EXPLAIN REWRITTEN PLAN
WITH cte AS (SELECT col2 FROM t1)
SELECT * FROM t1 WHERE (col3 IN (SELECT c1.col2 FROM cte c1))
UNION ALL
SELECT * FROM t1 WHERE (col3 IN (SELECT c1.col2 FROM cte c1));
```

Rewritten plan before this PR:

```
+------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                                                       |
+------------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalUnion ( qualifier=ALL, outputs=[col1#14, col2#15, col3#16], hasPushedFilter=false )                                                           |
| |--LogicalJoin[559] ( type=LEFT_SEMI_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(col3#6 = col2#8)], otherJoinConjuncts=[] )      |
| |  |--LogicalProject[551] ( distinct=false, projects=[col1#4, col2#5, col3#6], excepts=[], canEliminate=true )                                       |
| |  |  +--LogicalFilter[549] ( predicates=(__DORIS_DELETE_SIGN__#7 = 0) )                                                                             |
| |  |     +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON )                                    |
| |  +--LogicalProject[555] ( distinct=false, projects=[col2#20 AS `col2`#8], excepts=[], canEliminate=true )                                          |
| |     +--LogicalFilter[553] ( predicates=(__DORIS_DELETE_SIGN__#22 = 0) )                                                                            |
| |        +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON )                                    |
| +--LogicalProject[575] ( distinct=false, projects=[col1#9, col2#10, col3#11], excepts=[], canEliminate=false )                                       |
|    +--LogicalJoin[573] ( type=LEFT_SEMI_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(col3#11 = col2#13)], otherJoinConjuncts=[] ) |
|       |--LogicalProject[565] ( distinct=false, projects=[col1#9, col2#10, col3#11], excepts=[], canEliminate=true )                                  |
|       |  +--LogicalFilter[563] ( predicates=(__DORIS_DELETE_SIGN__#12 = 0) )                                                                         |
|       |     +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON )                                 |
|       +--LogicalProject[569] ( distinct=false, projects=[col2#24 AS `col2`#13], excepts=[], canEliminate=true )                                      |
|          +--LogicalFilter[567] ( predicates=(__DORIS_DELETE_SIGN__#26 = 0) )                                                                         |
|             +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON )                                 |
+------------------------------------------------------------------------------------------------------------------------------------------------------+

```

Rewritten plan after this PR:

```
+------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                                                       |
+------------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalUnion ( qualifier=ALL, outputs=[col1#14, col2#15, col3#16], hasPushedFilter=false )                                                           |
| |--LOGICAL_CTE_ANCHOR#-1164890733                                                                                                                    |
| |  |--LOGICAL_CTE_PRODUCER#-1164890733                                                                                                               |
| |  |  +--LogicalProject[427] ( distinct=false, projects=[col2#1], excepts=[], canEliminate=true )                                                    |
| |  |     +--LogicalFilter[425] ( predicates=(__DORIS_DELETE_SIGN__#3 = 0) )                                                                          |
| |  |        +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON )                                 |
| |  +--LogicalJoin[373] ( type=LEFT_SEMI_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(col3#6 = col2#8)], otherJoinConjuncts=[] )   |
| |     |--LogicalProject[370] ( distinct=false, projects=[col1#4, col2#5, col3#6], excepts=[], canEliminate=true )                                    |
| |     |  +--LogicalFilter[368] ( predicates=(__DORIS_DELETE_SIGN__#7 = 0) )                                                                          |
| |     |     +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON )                                 |
| |     +--LOGICAL_CTE_CONSUMER#-1164890733#1038782805                                                                                                 |
| +--LogicalProject[384] ( distinct=false, projects=[col1#9, col2#10, col3#11], excepts=[], canEliminate=false )                                       |
|    +--LogicalJoin[382] ( type=LEFT_SEMI_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(col3#11 = col2#13)], otherJoinConjuncts=[] ) |
|       |--LogicalProject[379] ( distinct=false, projects=[col1#9, col2#10, col3#11], excepts=[], canEliminate=true )                                  |
|       |  +--LogicalFilter[377] ( predicates=(__DORIS_DELETE_SIGN__#12 = 0) )                                                                         |
|       |     +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON )                                 |
|       +--LOGICAL_CTE_CONSUMER#-1164890733#858618008                                                                                                  |
+------------------------------------------------------------------------------------------------------------------------------------------------------+

```
2023-05-30 10:18:59 +08:00
6f31ee9492 [fix](p0 regression)Update hive docker test case result data (#20176)
Doris changed the output format of array types: strings are now wrapped in double quotes instead of single quotes.
So the case out files need to be updated to use double quotes.
2023-05-30 00:17:30 +08:00
90b4e127e3 [Feature](inverted index) add parser_mode properties for inverted index parser (#20116)
This adds a parser mode for inverted indexes. Usage:
```
CREATE TABLE `inverted` (
  `FIELD0` text NULL,
  `FIELD1` text NULL,
  `FIELD2` text NULL,
  `FIELD3` text NULL,
  INDEX idx_name1 (`FIELD0`) USING INVERTED PROPERTIES("parser" = "chinese", "parser_mode" = "fine_grained") COMMENT '',
  INDEX idx_name2 (`FIELD1`) USING INVERTED PROPERTIES("parser" = "chinese", "parser_mode" = "coarse_grained") COMMENT ''
) ENGINE=OLAP;
```
2023-05-29 23:21:52 +08:00
8ca4f93067 [fix](DECIMALV3) Fix the error in DECIMALV3 when explicitly casting. (#19926)
before

mysql [test]>select cast(1 as DECIMALV3(16, 2)) /  cast(3 as DECIMALV3(16, 2));
+-----------------------------------------------------------+
| CAST(1 AS DECIMALV3(16, 2)) / CAST(3 AS DECIMALV3(16, 2)) |
+-----------------------------------------------------------+
|                                                      0.00 |
+-----------------------------------------------------------+


mysql [test]>select * from divtest;
+------+------+
| id   | val  |
+------+------+
|    3 | 5.00 |
|    2 | 4.00 |
|    1 | 3.00 |
+------+------+

mysql [test]>select cast(1 as decimalv3(16,2)) / val from divtest;
+-------------------------------------+
| CAST(1 AS DECIMALV3(16, 2)) / `val` |
+-------------------------------------+
|                                   0 |
|                                   0 |
|                                   0 |
+-------------------------------------+
after

mysql [test]>select cast(1 as DECIMALV3(16, 2)) /  cast(3 as DECIMALV3(16, 2));
+-----------------------------------------------------------+
| CAST(1 AS DECIMALV3(16, 2)) / CAST(3 AS DECIMALV3(16, 2)) |
+-----------------------------------------------------------+
|                                                      0.33 |
+-----------------------------------------------------------+

mysql [test]>select cast(1 as decimalv3(16,2)) / val from divtest;
+-------------------------------------+
| CAST(1 AS DECIMALV3(16, 2)) / `val` |
+-------------------------------------+
|                            0.250000 |
|                            0.200000 |
|                            0.333333 |
+-------------------------------------+
This is because in the previous code, the constant 1.000 would be transformed into 1.
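As an analogy only, the effect of losing the operand's scale can be reproduced with Python's `decimal` module. This is not Doris's arithmetic: the helper name `divide_at_scale`, the use of `ROUND_DOWN`, and the chosen scales are all assumptions made for illustration.

```python
from decimal import Decimal, ROUND_DOWN


def divide_at_scale(numerator, denominator, scale):
    """Divide two decimals and truncate the quotient to `scale` fractional digits."""
    quantum = Decimal(1).scaleb(-scale)  # scale=2 gives Decimal("0.01")
    return (numerator / denominator).quantize(quantum, rounding=ROUND_DOWN)


# With the scale preserved, the fractional part survives.
kept = divide_at_scale(Decimal("1.00"), Decimal("3.00"), 2)

# If the constant is reduced to a plain 1 and the result scale drops to 0,
# the quotient collapses to zero.
lost = divide_at_scale(Decimal(1), Decimal(3), 0)
```

Here `kept` is `0.33` and `lost` is `0`, which resembles the difference between the "after" and "before" outputs above.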

remove "ReduceType
2023-05-29 19:51:12 +08:00
Pxl
5788214416 [Bug](function) fix equals implements not judge order by elements of function call expr (#20083)
Fix the equals implementation, which did not compare the ORDER BY elements of function call expressions.
#19296
2023-05-29 19:03:05 +08:00
55ccddb62c [Conf](decimalv3) enable decimalv3 by default 2023-05-29 15:38:31 +08:00
a86134cb39 [fix](executor) Fixed an error with cast as time. #20144
before

mysql [(none)]>select cast("10:10:10" as time);
+-------------------------------+
| CAST('10:10:10' AS TIMEV2(0)) |
+-------------------------------+
| 00:00:00                      |
+-------------------------------+
after

mysql [(none)]>select cast("10:10:10" as time);
+-------------------------------+
| CAST('10:10:10' AS TIMEV2(0)) |
+-------------------------------+
| 10:10:10                      |
+-------------------------------+
In the past, we supported this syntax.

mysql [(none)]>select cast("2023:05:01 13:14:15" as time);
+------------------------------------------+
| CAST('2023:05:01 13:14:15' AS TIMEV2(0)) |
+------------------------------------------+
| 13:14:15                                 |
+------------------------------------------+
However, "10:10:10" is also a valid datetime.

mysql [(none)]>select cast("10:10:10" as datetime);
+-----------------------------------+
| CAST('10:10:10' AS DATETIMEV2(0)) |
+-----------------------------------+
| 2010-10-10 00:00:00               |
+-----------------------------------+
So here, the order of parsing has been adjusted.
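Why the parsing order matters can be shown with a small Python sketch. It is not the executor's code: the format lists and the `cast_as_time` helper are hypothetical, and `strptime` merely stands in for the real parser.

```python
from datetime import datetime

# Hypothetical format lists; the real parser is not strptime-based.
TIME_FORMATS = ["%H:%M:%S"]
DATETIME_FORMATS = ["%Y:%m:%d %H:%M:%S", "%y:%m:%d"]


def _try_parse(text, formats):
    """Return the first successful parse of `text`, or None."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


def cast_as_time(text, time_first=True):
    """Return the time-of-day part of `text`, trying format groups in order."""
    groups = [TIME_FORMATS, DATETIME_FORMATS]
    if not time_first:
        groups.reverse()  # old order: datetime formats win on ambiguous input
    for formats in groups:
        parsed = _try_parse(text, formats)
        if parsed is not None:
            return parsed.time()
    return None
```

With `time_first=False`, the ambiguous string `"10:10:10"` is read as the date 2010-10-10 and yields midnight. With `time_first=True` it yields 10:10:10, while `"2023:05:01 13:14:15"` still falls through to the datetime formats.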
2023-05-29 12:17:21 +08:00
970efdc1cb [Feature](Nereids) support advanced materialized view (#19650)
Extend the functionality of advanced materialized views.

This feature is already supported by the legacy planner with PR #19650.

This PR implements it in Nereids, with the following features:
1. Support multiple columns in an aggregate function, e.g. select sum(c1 + c2) from t1;
2. Support complex expressions, e.g. select abs(c1), sum(abc(c1+1) + 1) from t1;

TODO:
1. Support adding a WHERE clause in materialized views
2023-05-29 10:37:44 +08:00
859b03dfdf [Improvement](topn) prevent memory usage of key topn increasing unlimited (#19978) 2023-05-29 10:16:15 +08:00
ae352997b4 [Enhancement](alter inverted index) Improve alter inverted index performance with light weight add or drop inverted index (#19063) 2023-05-28 11:23:07 +08:00
4cbb6ece10 [fix](fe)ordering exprs should be substituted in the same way as select part (#20091) 2023-05-27 21:00:57 +08:00
f54a068d82 [feature](function) add json->operator convert to json_extract (#19899) 2023-05-27 12:45:45 +08:00
f3d8af330a [Bug](point query) check point query before check two phase read (#20055)
* [Bug](point query) checkAndSetPointQuery before checkEnableTwoPhaseRead

1. checkEnableTwoPhaseRead relies on the short-circuit flag
2. add more metrics to display the lookup profile

* fix rebase
2023-05-27 12:38:58 +08:00
9539bbf8ae Revert "[test](executor)add crud regression test for resource group (#19659)" (#20121)
This reverts commit 8b9813663d87afa7b359b31782f3864dc54881df.
2023-05-27 08:25:00 +08:00
23c95d15da [regression-test](sort) Fix unstable sorting (#20125) 2023-05-26 23:42:05 +08:00
860e28a3a3 [Fix](multi-catalog) Fix db name is not lower case when jdbc catalog configuration lower_case_table_names is true. (#20021)
Fix db name is not lower case when jdbc catalog configuration lower_case_table_names is true.
Fix regression-test test_oracle_jdbc_catalog.
2023-05-26 21:35:38 +08:00
ce45d6119d [FIX](regress-test) fix struct_export out data (#20111)
fix struct_export out data
2023-05-26 19:57:51 +08:00
317338913c [Bug](topn) Fix topn fetch set real default value (#20074)
1. Before this PR, if a rowset did not contain a column that had to be read for the related SlotDescriptor, `insert_default` was called on the column, but that is not the real default value. The information about the real default value should be provided by the frontend.

2. Support fetch when light schema change is not enabled, but disable it for the AGG or UNIQUE MOR models.
2023-05-26 16:06:55 +08:00
488c9ba7c2 [improvement](exchange) test: data stream sender stop sending data to receiver if it returns eos early (#20081) 2023-05-26 16:05:38 +08:00
Pxl
43aa062fb1 [Chore](hash-join) remove useless conditions and add some case (#20050) 2023-05-26 14:45:24 +08:00
315b30c23d [testcase](union) add test case for union of decimal (#20080) 2023-05-26 14:12:14 +08:00
ee34b6de2d [Refact] (serde) refact mysql serde with data type (#19543)
Refactor the MySQL output (de)serialization to use the data type serde, avoiding the switch-case on primitive types written in mysqlWriter.
2023-05-26 14:11:17 +08:00
558f625d3b [fix](planner) The group by part should be substituted in the same way as select part (#20019) 2023-05-26 11:05:02 +08:00
9c22fc4130 [fix](multi catalog)Support Hive partiton manually removed (#20024)
If the user manually removes a Hive partition (by removing the partition directory through HDFS), Doris fails to query the Hive table with the error message `get file split failed for table`. That is because the Hive metadata still contains the removed partition.
This PR fixes the bug by skipping directories that do not exist.
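Skipping missing directories might look like the following Python sketch. It is an illustration, not the catalog's Java code: `list_partition_files` and its injectable `is_dir` / `list_dir` parameters are hypothetical names, and the local filesystem stands in for HDFS.

```python
import os


def list_partition_files(partition_dirs, is_dir=os.path.isdir, list_dir=os.listdir):
    """Collect files from every partition directory that still exists.

    Directories that the metadata still lists but that were removed from
    storage are skipped instead of failing the whole listing.
    """
    files = []
    for directory in partition_dirs:
        if not is_dir(directory):
            continue  # stale metadata entry: the directory is gone
        for name in sorted(list_dir(directory)):
            files.append(os.path.join(directory, name))
    return files
```

Passing the filesystem calls as parameters keeps the sketch easy to exercise with an in-memory stand-in for the storage layer.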
2023-05-26 10:32:45 +08:00
5621ae08e6 [fix](Nereids) function ABS return type not same between constant folding and function signature (#20059)
abs returns the wrong type for integer types. Return the int type when the argument's type is integer.
2023-05-26 10:24:32 +08:00
f1b949ad59 [fix](Nereids) local sort should not translate to unpartitioned partition (#20031)
1. local sort should not update current fragment partition to UNPARTITIONED
2. should set input fragment dest exchange node after create dest fragment
2023-05-26 10:18:56 +08:00
0dce725120 [fix](nereids)fix decimalv3 type error of mod operator (#20039) 2023-05-25 17:25:11 +08:00
002c76e06f [vectorized](udaf) support udaf function work with window function (#19962) 2023-05-25 14:38:47 +08:00
8149b757c4 [Feature](Nereids)support insert into select command (#18869)
Support inserting the result of a query into a table, with `partition`, `with label`, and `cols` tags:

```
insert into t partition (p1, p2)
with label label_1
(c1, c2, c3)
[hint1, hint2]
with cte as (
  select * from src
)
select k1, k2, k3 from cte
```

We create new classes: InsertIntoTableCommand and Unbound/Logical/PhysicalOlapTableSink, to describe the insert command and the olapTableSink for Nereids.
We create an UnboundOlapTableSink in the parsing phase and bind it, then implement and translate the node to OlapTableSink.
Then we run the command within a transaction.
2023-05-25 10:44:41 +08:00
1dd3a4ed3a [fix](Nereids) fix unstable regression test cases and some bugs (#19999)
Fix bugs:
1. should return the other child of Or if the current child is NULL after constant folding
2. Lead should have three parameters; remove the constructors with default values

Not enable Nereids case under nereids_p0
1. nereids_p0/join/sql
2. nereids_p0/sql_functions/horology_functions/sql

Should disable Nereids explicitly because the results differ
1. query_p0/sql_functions/horology_functions/sql
2. query_p0/stats/query_stats_test.groovy
3. query_profile/test_profile.groovy

Unstable regression test case
1. nereids_syntax_p0/join.groovy
2023-05-24 20:34:01 +08:00
a713c225a5 [regressiontest](statistics) Collate and supplement statistics regression test (#19901)
This PR mainly supplements the statistics regression tests. It includes the following:

analyze stats p0 tests:

1. Universal analysis

analyze stats p1 tests:

1. Universal analysis
2. Sampled analysis
3. Incremental analysis
4. Automatic analysis
5. Periodic analysis

manage stats p0 tests:

1. Alter table stats
2. Show table stats
3. Alter column stats
4. Show column stats and histogram
5. Drop column stats
6. Drop expired stats

TODO:

1. Supplement related documents
2. Optimize for unstable cases encountered during testing
3. Add other cases

For PRs related to statistics, ensure that all of these cases pass!
2023-05-24 20:17:28 +08:00
4aad88abc4 [test](Nereids) fix tpcds shape out file #20002 2023-05-24 17:40:13 +08:00
f14e6189a9 [feature](load-refactor) Unfied mysql load use InsertStmt (#19571) 2023-05-24 12:09:16 +08:00
384a0c7aa7 [fix](testcases) Fix some unstable testcases. (#19956)
The test_string_concat_extremely_long_string case exceeds our test limit. Move it to p2 so that it is tested only in the SelectDB test environment.
Because we need to stay consistent with MySQL and avoid overflow, q67 must keep its current behavior. Once nereids and decimalV3 are fully applied, it will be fixed automatically.
In the parallel test, although all query stats were cleaned, cases running in parallel still affect them. So query_stats_test needs to use a unique table.
test_query_sys_tables did not handle some unstable situations. Fixed.
Temporarily disable the unstable analyze_test case for p0.
2023-05-24 09:52:02 +08:00
a6674bb7b1 [regression](nereids) tpcds sf100 plan shape regression cases (#19913) 2023-05-23 18:48:00 +08:00
35f8fc22f2 [testcase](test) Fix query stats test may failed (#19958) 2023-05-23 18:33:07 +08:00
a434a49f71 [Bug](decimal) fix mod function (#19925)
Bug:
select id, kdcml * ktint, kdcml / ktint, kdcml % ktint from expr_test order by id;
+------+-------------------+-------------------+-----------------------+
| id | kdcml * ktint | kdcml / ktint | kdcml % ktint |
+------+-------------------+-------------------+-----------------------+
| NULL | NULL | NULL | NULL |
| 1 | 24.395 | 24.395 | -4702111234474983.74 |
| 2 | 68.968 | 17.242 | -4702111234474983.74 |
| 3 | 146.268 | 16.252 | -4702111234474983.74 |
| 4 | 275.772 | 17.235 | -4702111234474983.74 |
| 5 | 487.470 | 19.498 | -4702111234474983.74 |
| 6 | 827.244 | 22.979 | -4702111234474983.74 |
| 7 | 1364.860 | 27.854 | -4702111234474983.74 |
| 8 | 2205.928 | 34.467 | -4702111234474983.74 |
| 9 | 3509.595 | 43.328 | -4702111234474983.74 |
| 10 | 5514.790 | 55.147 | -4702111234474983.74 |
| 11 | 8578.988 | 70.900 | -4702111234474983.74 |
| 12 | 13235.484 | 91.913 | -4702111234474983.74 |
| 13 | 24.395 | 24.395 | -4702111234474983.74 |
| 14 | 68.968 | 17.242 | -4702111234474983.74 |
| 15 | 146.268 | 16.252 | -4702111234474983.74 |
| 16 | 275.772 | 17.235 | -4702111234474983.74 |
| 17 | 487.470 | 19.498 | -4702111234474983.74 |
| 18 | 827.244 | 22.979 | -4702111234474983.74 |
| 19 | 1364.860 | 27.854 | -4702111234474983.74 |
| 20 | 2205.928 | 34.467 | -4702111234474983.74 |
| 21 | 3509.595 | 43.328 | -4702111234474983.74 |
| 22 | 5514.790 | 55.147 | -4702111234474983.74 |
| 23 | 8578.988 | 70.900 | -4702111234474983.74 |
| 24 | 13235.484 | 91.913 | -4702111234474983.74 |
2023-05-23 18:24:31 +08:00
c88ba85e10 [Bug](schema-change) fix varchar can not change to datev2 #19952 2023-05-23 18:18:55 +08:00
4398b91576 [Fix](multi catalog)Change all partition names to lower case (#19816)
An Iceberg table partition name may contain upper-case characters, for example: City=xxx, Nation=xxx.
But in Doris, all column names are lower case. Here we convert the partition names to lower case to stay consistent with the column names.
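The normalization can be sketched in a few lines of Python. The `key=value/key=value` path layout and the helper name `normalize_partition_path` are assumptions made for illustration; only the partition names are lowercased, the values are left untouched.

```python
def normalize_partition_path(path):
    """Lowercase the partition names in a `key=value/key=value` path."""
    parts = []
    for segment in path.split("/"):
        key, sep, value = segment.partition("=")
        # Only the name before "=" is lowercased; the value keeps its case.
        parts.append(key.lower() + sep + value)
    return "/".join(parts)
```

For example, `normalize_partition_path("City=Beijing/Nation=CN")` gives `city=Beijing/nation=CN`.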
2023-05-23 09:31:31 +08:00
bd74890cf7 [fix](multi-catalog) JDBC Catalog Unknown UNSIGNED type of mysql, type: [DOUBLE] (#19912) 2023-05-23 09:29:57 +08:00
6762af3c9b [Improve](struct)improve struct support into outfile (#19894)
support select into outfile for struct type
2023-05-22 18:45:56 +08:00
Pxl
9945067e3c [Bug](function) make VcompoundPred optimization work well (#19870)
make VcompoundPred optimization work well
#19818 tried to enable the VcompoundPred optimization but got a wrong result on tpcds q28.
The reason is that some nullable logic in MySQL needs special handling.

mysql [regression_test_tpcds_sf1_p1]>select null and false;
+----------------+
| NULL AND FALSE |
+----------------+
|              0 |
+----------------+
1 row in set (0.00 sec)

mysql [regression_test_tpcds_sf1_p1]>select null and true;
+---------------+
| NULL AND TRUE |
+---------------+
| NULL          |
+---------------+
1 row in set (0.00 sec)

mysql [regression_test_tpcds_sf1_p1]>select null or false;
+---------------+
| NULL OR FALSE |
+---------------+
| NULL          |
+---------------+
1 row in set (0.00 sec)

mysql [regression_test_tpcds_sf1_p1]>select null or true;
+--------------+
| NULL OR TRUE |
+--------------+
|            1 |
+--------------+
1 row in set (0.00 sec)
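The four results above follow SQL's three-valued logic, which can be written out as a short Python sketch. `None` stands in for NULL, and `and3` / `or3` are hypothetical helper names, not functions from the codebase.

```python
def and3(left, right):
    """Three-valued AND: FALSE dominates, then NULL, otherwise TRUE."""
    if left is False or right is False:
        return False
    if left is None or right is None:
        return None
    return True


def or3(left, right):
    """Three-valued OR: TRUE dominates, then NULL, otherwise FALSE."""
    if left is True or right is True:
        return True
    if left is None or right is None:
        return None
    return False
```

An evaluator may therefore short-circuit AND only on FALSE and OR only on TRUE. A NULL operand alone does not decide the result.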
2023-05-22 18:32:17 +08:00
Pxl
e9223f6a19 [Feature](aggregation) add agg_state define and ddl support (#19824)
add agg_state define and ddl support
2023-05-22 11:45:53 +08:00
Pxl
d64be9565d [Bug](function) fix function in get wrong result when input const column (#19791)
fix function in get wrong result when input const column
2023-05-22 10:58:29 +08:00
8b9813663d [test](executor)add crud regression test for resource group (#19659)
add crud regression test for resource group (#19659)
2023-05-20 13:49:02 +08:00
ca737c37ee add testcases for inverted index on different datatypes (#19843) 2023-05-20 00:21:34 +08:00
67dc68630b [Improve](complex-type)improve array/map/struct creating and function with decimalv3 (#19830) 2023-05-19 17:43:36 +08:00
609b20bd02 [Feature](planner) use partial update in update from & delete from (#19262) 2023-05-19 09:46:29 +08:00