Commit Graph

1726 Commits

Author SHA1 Message Date
bb12a1cb49 [Enhance](array function) add support for DecimalV3 for array_enumerate_uniq() (#17724) 2023-05-30 13:09:19 +08:00
94e1072d14 Revert "[fix](DECIMALV3) Fix the error in DECIMALV3 when explicitly casting. (#19926)" (#20204)
This reverts commit 8ca4f9306763b5a18ffda27a07ab03cc77351e35.
2023-05-30 10:35:33 +08:00
72cfe5865a [feat](optimizer) Support CTE reuse (#19934)
Before this PR, the new optimizer would inline CTEs directly. However, in many scenarios a CTE is referenced many times, such as in the TPC-DS tests. In these cases, materializing the result set of the CTE and reusing it can significantly improve performance. In our tests on TPC-DS queries, it improved performance by up to almost **4 times**.

We introduce the following plan nodes in the optimizer:

1. CTEConsumer: holds a reference to a CTEProducer
2. CTEProducer: the plan defined by the CTE statement
3. CTEAnchor: the parent node of a CTEProducer; a CTEProducer can only be referenced from the right child of its corresponding CTEAnchor.

A CTEConsumer is converted to an inlined plan if the corresponding CTE is referenced no more than `inline_cte_referenced_threshold` times (a session variable, 1 by default).
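
The inlining rule above can be sketched as follows. This is a minimal illustration in Python, not the optimizer's own code: `should_inline` and `plan_ctes` are hypothetical names, and only `inline_cte_referenced_threshold` and its default of 1 come from this commit message.

```python
from collections import Counter
from typing import Dict, List

# Session variable named in the commit message; the default is 1.
INLINE_CTE_REFERENCED_THRESHOLD = 1


def should_inline(reference_count: int,
                  threshold: int = INLINE_CTE_REFERENCED_THRESHOLD) -> bool:
    """Return True when a CTE consumer should be rewritten to an inlined plan."""
    return reference_count <= threshold


def plan_ctes(consumer_refs: List[str],
              threshold: int = INLINE_CTE_REFERENCED_THRESHOLD) -> Dict[str, str]:
    """Map each CTE name to 'inline' or 'materialize'.

    consumer_refs holds one CTE name per consumer found in the query
    (a hypothetical input shape chosen for this sketch).
    """
    counts = Counter(consumer_refs)
    return {
        name: "inline" if should_inline(count, threshold) else "materialize"
        for name, count in counts.items()
    }


# The example query in this commit message references `cte` twice,
# so with the default threshold it is materialized and reused.
decisions = plan_ctes(["cte", "cte"])
```

Raising the threshold to 2 in this sketch would make the same query inline the CTE again, which matches the "less than or equal" wording of the rule.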


For SQL:

```sql
EXPLAIN REWRITTEN PLAN
WITH cte AS (SELECT col2 FROM t1)
SELECT * FROM t1 WHERE (col3 IN (SELECT c1.col2 FROM cte c1))
UNION ALL
SELECT * FROM t1 WHERE (col3 IN (SELECT c1.col2 FROM cte c1));
```

Rewritten plan before this PR:

```
+------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                                                       |
+------------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalUnion ( qualifier=ALL, outputs=[col1#14, col2#15, col3#16], hasPushedFilter=false )                                                           |
| |--LogicalJoin[559] ( type=LEFT_SEMI_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(col3#6 = col2#8)], otherJoinConjuncts=[] )      |
| |  |--LogicalProject[551] ( distinct=false, projects=[col1#4, col2#5, col3#6], excepts=[], canEliminate=true )                                       |
| |  |  +--LogicalFilter[549] ( predicates=(__DORIS_DELETE_SIGN__#7 = 0) )                                                                             |
| |  |     +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON )                                    |
| |  +--LogicalProject[555] ( distinct=false, projects=[col2#20 AS `col2`#8], excepts=[], canEliminate=true )                                          |
| |     +--LogicalFilter[553] ( predicates=(__DORIS_DELETE_SIGN__#22 = 0) )                                                                            |
| |        +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON )                                    |
| +--LogicalProject[575] ( distinct=false, projects=[col1#9, col2#10, col3#11], excepts=[], canEliminate=false )                                       |
|    +--LogicalJoin[573] ( type=LEFT_SEMI_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(col3#11 = col2#13)], otherJoinConjuncts=[] ) |
|       |--LogicalProject[565] ( distinct=false, projects=[col1#9, col2#10, col3#11], excepts=[], canEliminate=true )                                  |
|       |  +--LogicalFilter[563] ( predicates=(__DORIS_DELETE_SIGN__#12 = 0) )                                                                         |
|       |     +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON )                                 |
|       +--LogicalProject[569] ( distinct=false, projects=[col2#24 AS `col2`#13], excepts=[], canEliminate=true )                                      |
|          +--LogicalFilter[567] ( predicates=(__DORIS_DELETE_SIGN__#26 = 0) )                                                                         |
|             +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON )                                 |
+------------------------------------------------------------------------------------------------------------------------------------------------------+

```

Rewritten plan after this PR:

```
+------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                                                       |
+------------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalUnion ( qualifier=ALL, outputs=[col1#14, col2#15, col3#16], hasPushedFilter=false )                                                           |
| |--LOGICAL_CTE_ANCHOR#-1164890733                                                                                                                    |
| |  |--LOGICAL_CTE_PRODUCER#-1164890733                                                                                                               |
| |  |  +--LogicalProject[427] ( distinct=false, projects=[col2#1], excepts=[], canEliminate=true )                                                    |
| |  |     +--LogicalFilter[425] ( predicates=(__DORIS_DELETE_SIGN__#3 = 0) )                                                                          |
| |  |        +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON )                                 |
| |  +--LogicalJoin[373] ( type=LEFT_SEMI_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(col3#6 = col2#8)], otherJoinConjuncts=[] )   |
| |     |--LogicalProject[370] ( distinct=false, projects=[col1#4, col2#5, col3#6], excepts=[], canEliminate=true )                                    |
| |     |  +--LogicalFilter[368] ( predicates=(__DORIS_DELETE_SIGN__#7 = 0) )                                                                          |
| |     |     +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON )                                 |
| |     +--LOGICAL_CTE_CONSUMER#-1164890733#1038782805                                                                                                 |
| +--LogicalProject[384] ( distinct=false, projects=[col1#9, col2#10, col3#11], excepts=[], canEliminate=false )                                       |
|    +--LogicalJoin[382] ( type=LEFT_SEMI_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(col3#11 = col2#13)], otherJoinConjuncts=[] ) |
|       |--LogicalProject[379] ( distinct=false, projects=[col1#9, col2#10, col3#11], excepts=[], canEliminate=true )                                  |
|       |  +--LogicalFilter[377] ( predicates=(__DORIS_DELETE_SIGN__#12 = 0) )                                                                         |
|       |     +--LogicalOlapScan ( qualified=default_cluster:test.t1, indexName=t1, selectedIndexId=42723, preAgg=ON )                                 |
|       +--LOGICAL_CTE_CONSUMER#-1164890733#858618008                                                                                                  |
+------------------------------------------------------------------------------------------------------------------------------------------------------+

```
2023-05-30 10:18:59 +08:00
6f31ee9492 [fix](p0 regression)Update hive docker test case result data (#20176)
Doris updated the array type output format to use double quotes for strings.
Previously it used single quotes, so the case out file needs to be updated to use double quotes.
2023-05-30 00:17:30 +08:00
90b4e127e3 [Feature](inverted index) add parser_mode properties for inverted index parser (#20116)
This adds a parser mode for inverted indexes. Usage:
```
CREATE TABLE `inverted` (
  `FIELD0` text NULL,
  `FIELD1` text NULL,
  `FIELD2` text NULL,
  `FIELD3` text NULL,
  INDEX idx_name1 (`FIELD0`) USING INVERTED PROPERTIES("parser" = "chinese", "parser_mode" = "fine_grained") COMMENT '',
  INDEX idx_name2 (`FIELD1`) USING INVERTED PROPERTIES("parser" = "chinese", "parser_mode" = "coarse_grained") COMMENT ''
) ENGINE=OLAP;
```
2023-05-29 23:21:52 +08:00
8ca4f93067 [fix](DECIMALV3) Fix the error in DECIMALV3 when explicitly casting. (#19926)
before

mysql [test]>select cast(1 as DECIMALV3(16, 2)) /  cast(3 as DECIMALV3(16, 2));
+-----------------------------------------------------------+
| CAST(1 AS DECIMALV3(16, 2)) / CAST(3 AS DECIMALV3(16, 2)) |
+-----------------------------------------------------------+
|                                                      0.00 |
+-----------------------------------------------------------+


mysql [test]>select * from divtest;
+------+------+
| id   | val  |
+------+------+
|    3 | 5.00 |
|    2 | 4.00 |
|    1 | 3.00 |
+------+------+

mysql [test]>select cast(1 as decimalv3(16,2)) / val from divtest;
+-------------------------------------+
| CAST(1 AS DECIMALV3(16, 2)) / `val` |
+-------------------------------------+
|                                   0 |
|                                   0 |
|                                   0 |
+-------------------------------------+
after

mysql [test]>select cast(1 as DECIMALV3(16, 2)) /  cast(3 as DECIMALV3(16, 2));
+-----------------------------------------------------------+
| CAST(1 AS DECIMALV3(16, 2)) / CAST(3 AS DECIMALV3(16, 2)) |
+-----------------------------------------------------------+
|                                                      0.33 |
+-----------------------------------------------------------+

mysql [test]>select cast(1 as decimalv3(16,2)) / val from divtest;
+-------------------------------------+
| CAST(1 AS DECIMALV3(16, 2)) / `val` |
+-------------------------------------+
|                            0.250000 |
|                            0.200000 |
|                            0.333333 |
+-------------------------------------+
This is because in the previous code, the constant 1.000 would be transformed into 1.
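
The "after" results can be reproduced with Python's `decimal` module, as a rough cross-check rather than a description of the Doris implementation. The helper `divide_at_scale` is a hypothetical name, and the last line is only an assumption about how a constant whose fractional part was dropped could collapse every quotient to zero.

```python
from decimal import Decimal, ROUND_DOWN


def divide_at_scale(dividend: str, divisor: str, scale: int) -> Decimal:
    """Divide two decimal literals and cut the quotient to `scale` fractional digits."""
    quotient = Decimal(dividend) / Decimal(divisor)
    return quotient.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_DOWN)


# Values matching the "after" output in this commit message.
fixed_constant = divide_at_scale("1.00", "3.00", 2)
fixed_column = [divide_at_scale("1.00", v, 6) for v in ("5.00", "4.00", "3.00")]

# Assumption: once the constant loses its fractional part, the division
# behaves like integer arithmetic and every quotient becomes zero.
reduced = [1 // int(Decimal(v)) for v in ("5.00", "4.00", "3.00")]
```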

remove "ReduceType
2023-05-29 19:51:12 +08:00
5f37396514 [Enhancement](Nerieds) add switch for developing Nereids DML (#20100) 2023-05-29 19:06:55 +08:00
Pxl
5788214416 [Bug](function) fix equals implements not judge order by elements of function call expr (#20083)
Fix the equals implementation, which did not compare the ORDER BY elements of a function call expression.
#19296
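
A minimal sketch of this kind of bug, assuming a simplified function call expression. The class and its fields `name`, `args`, and `order_by` are hypothetical and do not come from the Doris code base.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FunctionCallSketch:
    """Hypothetical function call expression; all field names are illustrative."""
    name: str
    args: Tuple[str, ...]
    order_by: Tuple[str, ...] = ()


def equals_ignoring_order_by(a: FunctionCallSketch, b: FunctionCallSketch) -> bool:
    # Faulty comparison: two calls that differ only in ORDER BY look identical.
    return a.name == b.name and a.args == b.args


def equals_with_order_by(a: FunctionCallSketch, b: FunctionCallSketch) -> bool:
    # Corrected comparison: the ORDER BY elements take part in equality.
    return equals_ignoring_order_by(a, b) and a.order_by == b.order_by


by_c2 = FunctionCallSketch("f", ("c1",), order_by=("c2",))
by_c3 = FunctionCallSketch("f", ("c1",), order_by=("c3",))
```

With the faulty comparison, `by_c2` and `by_c3` are treated as the same expression, so one could be wrongly substituted for the other.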
2023-05-29 19:03:05 +08:00
198433b131 [typo](config)Remove FE config max_conn_per_user (#20122)
---------

Co-authored-by: Yijia Su <suyijia@selectdb.com>
2023-05-29 17:20:36 +08:00
cc47ee480c [feat](stats) delete data size stat and Made task timeout configurable (#20090)
1. Delete the data size statistic, since collecting it costs too much time and the value is not useful
2. Make the task timeout configurable, since analyzing a very large table is common and the default 10 minutes is not always enough
2023-05-29 16:40:59 +08:00
55ccddb62c [Conf](decimalv3) enable decimalv3 by default 2023-05-29 15:38:31 +08:00
a86134cb39 [fix](executor) Fixed an error with cast as time. #20144
before

mysql [(none)]>select cast("10:10:10" as time);
+-------------------------------+
| CAST('10:10:10' AS TIMEV2(0)) |
+-------------------------------+
| 00:00:00                      |
+-------------------------------+
after

mysql [(none)]>select cast("10:10:10" as time);
+-------------------------------+
| CAST('10:10:10' AS TIMEV2(0)) |
+-------------------------------+
| 10:10:10                      |
+-------------------------------+
In the past, we supported this syntax.

mysql [(none)]>select cast("2023:05:01 13:14:15" as time);
+------------------------------------------+
| CAST('2023:05:01 13:14:15' AS TIMEV2(0)) |
+------------------------------------------+
| 13:14:15                                 |
+------------------------------------------+
However, "10:10:10" is also a valid datetime.

mysql [(none)]>select cast("10:10:10" as datetime);
+-----------------------------------+
| CAST('10:10:10' AS DATETIMEV2(0)) |
+-----------------------------------+
| 2010-10-10 00:00:00               |
+-----------------------------------+
So here, the order of parsing has been adjusted.
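
The effect of the parsing order can be illustrated with a small Python sketch. It is not the executor's parser: `cast_as_time` and the two format constants are hypothetical, and returning `None` for unparseable input is an assumption made for the example.

```python
from datetime import datetime, time
from typing import Optional

# Formats are tried in order: the plain time format comes first so that
# "10:10:10" is not mistaken for a datetime.
TIME_FORMAT = "%H:%M:%S"
DATETIME_FORMAT = "%Y:%m:%d %H:%M:%S"


def cast_as_time(text: str) -> Optional[time]:
    """Parse `text` as a time, falling back to the time part of a datetime."""
    for fmt in (TIME_FORMAT, DATETIME_FORMAT):
        try:
            return datetime.strptime(text, fmt).time()
        except ValueError:
            continue
    return None


plain = cast_as_time("10:10:10")
from_datetime = cast_as_time("2023:05:01 13:14:15")
```

Trying the datetime interpretation first would read "10:10:10" as the date 2010-10-10 and lose the time of day, which matches the "before" output shown above.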
2023-05-29 12:17:21 +08:00
9f8de89659 [refactor](exec) replace the single pointer with an array of 'conjuncts' in ExecNode (#19758)
Refactoring the filtering conditions in the current ExecNode from an expression tree to an array can simplify the process of adding runtime filters. It eliminates the need for complex merge operations and removes the requirement for the frontend to combine expressions into a single entity.

By representing the filtering conditions as an array, each condition can be treated individually, making it easier to add runtime filters without the need for complex merging logic. The array can store the individual conditions, and the runtime filter logic can iterate through the array to apply the filters as needed.

This refactoring simplifies the codebase, improves readability, and reduces the complexity associated with handling filtering conditions and adding runtime filters. It separates the conditions into discrete entities, enabling more straightforward manipulation and management within the execution node.
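
The idea can be sketched in Python as follows. `ExecNodeSketch` and its methods are hypothetical stand-ins, not the ExecNode interface. The point is only that a runtime filter becomes an append to a list instead of a merge into an expression tree.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]
Conjunct = Callable[[Row], bool]


class ExecNodeSketch:
    """Hypothetical stand-in for an execution node holding a list of conjuncts."""

    def __init__(self, conjuncts: List[Conjunct]) -> None:
        self.conjuncts = list(conjuncts)

    def add_runtime_filter(self, conjunct: Conjunct) -> None:
        # Appending is enough; nothing has to be merged into a single tree.
        self.conjuncts.append(conjunct)

    def filter(self, rows: List[Row]) -> List[Row]:
        # A row passes only if every conjunct accepts it (logical AND).
        return [row for row in rows if all(c(row) for c in self.conjuncts)]


node = ExecNodeSketch([lambda r: r["a"] > 1])
node.add_runtime_filter(lambda r: r["b"] in {10, 30})
kept = node.filter([{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}])
```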
2023-05-29 11:47:31 +08:00
970efdc1cb [Feature](Nereids) support advanced materialized view (#19650)
Extend the functionality of advanced materialized views.

This feature is already supported by the legacy planner with PR #19650.

This PR implements it in Nereids, with the following features:
1. Support multiple columns in an aggregate function, e.g. select sum(c1 + c2) from t1;
2. Support complex expressions, e.g. select abs(c1), sum(abs(c1+1) + 1) from t1;

TODO:
1. Support adding a WHERE clause in materialized views
2023-05-29 10:37:44 +08:00
859b03dfdf [Improvement](topn) prevent memory usage of key topn increasing unlimited (#19978) 2023-05-29 10:16:15 +08:00
ae352997b4 [Enhancement](alter inverted index) Improve alter inverted index performance with light weight add or drop inverted index (#19063) 2023-05-28 11:23:07 +08:00
637e083343 [regression](test) fix test case failed in pipeline mode (#20139) 2023-05-27 22:42:25 +08:00
4cbb6ece10 [fix](fe)ordering exprs should be substituted in the same way as select part (#20091) 2023-05-27 21:00:57 +08:00
f54a068d82 [feature](function) add json->operator convert to json_extract (#19899) 2023-05-27 12:45:45 +08:00
f3d8af330a [Bug](point query) check point query before check two phase read (#20055)
* [Bug](point query) checkAndSetPointQuery before checkEnableTwoPhaseRead

1. checkEnableTwoPhaseRead relies on the short-circuit flag
2. add more metrics to display the lookup profile

* fix rebase
2023-05-27 12:38:58 +08:00
9539bbf8ae Revert "[test](executor)add crud regression test for resource group (#19659)" (#20121)
This reverts commit 8b9813663d87afa7b359b31782f3864dc54881df.
2023-05-27 08:25:00 +08:00
23c95d15da [regression-test](sort) Fix unstable sorting (#20125) 2023-05-26 23:42:05 +08:00
860e28a3a3 [Fix](multi-catalog) Fix db name is not lower case when jdbc catalog configuration lower_case_table_names is true. (#20021)
Fix the db name not being lower-cased when the JDBC catalog configuration lower_case_table_names is true.
Fix the regression test test_oracle_jdbc_catalog.
2023-05-26 21:35:38 +08:00
ce45d6119d [FIX](regress-test) fix struct_export out data (#20111)
fix struct_export out data
2023-05-26 19:57:51 +08:00
317338913c [Bug](topn) Fix topn fetch set real default value (#20074)
1. Before this PR, if a rowset did not contain a column that had to be read for the related SlotDescriptor, `insert_default` was called on the column, but that is not the real default value. The real default value information should be provided by the frontend.

2. Support fetch when light schema change is not enabled, but disable it for the AGG or UNIQUE MOR model
2023-05-26 16:06:55 +08:00
488c9ba7c2 [improvement](exchange) test: data stream sender stop sending data to receiver if it returns eos early (#20081) 2023-05-26 16:05:38 +08:00
Pxl
43aa062fb1 [Chore](hash-join) remove useless conditions and add some case (#20050) 2023-05-26 14:45:24 +08:00
315b30c23d [testcase](union) add test case for union of decimal (#20080) 2023-05-26 14:12:14 +08:00
ee34b6de2d [Refact] (serde) refact mysql serde with data type (#19543)
Refactor MySQL output (de)serialization to use the data type serde, avoiding the switch-case on primitive type written in mysqlWriter.
2023-05-26 14:11:17 +08:00
558f625d3b [fix](planner) The group by part should be substituted in the same way as select part (#20019) 2023-05-26 11:05:02 +08:00
9c22fc4130 [fix](multi catalog)Support Hive partiton manually removed (#20024)
If the user manually removes a Hive partition (by removing the partition directory through HDFS), Doris fails to query the Hive
table with the error message "get file split failed for table". That is because the Hive metadata still contains the removed partition.
This PR fixes the bug by skipping directories that do not exist.
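
A minimal sketch of the skip, using the local file system in place of HDFS. The function name `existing_partition_dirs` and the partition paths are hypothetical, chosen only for the example.

```python
import os
import tempfile
from typing import List


def existing_partition_dirs(partition_paths: List[str]) -> List[str]:
    """Keep only partition locations that still exist on the file system."""
    return [path for path in partition_paths if os.path.isdir(path)]


with tempfile.TemporaryDirectory() as table_root:
    present = os.path.join(table_root, "dt=2023-05-01")
    removed = os.path.join(table_root, "dt=2023-05-02")
    os.mkdir(present)
    # `removed` is listed in the metadata but has no directory on disk.
    splittable = existing_partition_dirs([present, removed])
    only_present = splittable == [present]
```

Only the surviving directories are handed on for file splitting, so a stale metadata entry no longer fails the whole query.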
2023-05-26 10:32:45 +08:00
5621ae08e6 [fix](Nereids) function ABS return type not same between constant folding and function signature (#20059)
abs returned the wrong type for integer arguments. Return the int type when the argument's type is integer.
2023-05-26 10:24:32 +08:00
f1b949ad59 [fix](Nereids) local sort should not translate to unpartitioned partition (#20031)
1. local sort should not update current fragment partition to UNPARTITIONED
2. should set the input fragment's dest exchange node after creating the dest fragment
2023-05-26 10:18:56 +08:00
d6998723e8 Comment stats unstable cases (#20034) 2023-05-25 21:08:00 +08:00
0dce725120 [fix](nereids)fix decimalv3 type error of mod operator (#20039) 2023-05-25 17:25:11 +08:00
694b8b6cd3 [test](pipline) adjust mem limit to 50% (#20030) 2023-05-25 15:51:32 +08:00
Pxl
618961053f [Bug](materialized-view) forbid create mv/rollup on mow table (#20001)
forbid create mv/rollup on mow table
2023-05-25 15:30:12 +08:00
002c76e06f [vectorized](udaf) support udaf function work with window function (#19962) 2023-05-25 14:38:47 +08:00
8149b757c4 [Feature](Nereids)support insert into select command (#18869)
Support inserting the result of a query into a table with the `partition`, `with label`, and `cols` tags:

```
insert into t partition (p1, p2)
with label label_1
(c1, c2, c3)
[hint1, hint2]
with cte as (
  select * from src
)
select k1, k2, k3 from cte
```

We create new classes, InsertIntoTableCommand and Unbound/Logical/PhysicalOlapTableSink, to describe the insert command and the OlapTableSink for Nereids.
We create an UnboundOlapTableSink in the parsing phase and bind it, then implement and translate the node to OlapTableSink.
Then we run the command within a transaction.
2023-05-25 10:44:41 +08:00
Pxl
f9a4a04bdb [fix](Nereids) npe when one row relation contain aggregate function (#19974)
mysql [test]>select sum(1);
ERROR 1105 (HY000): errCode = 2, detailMessage = Unexpected exception: null
2023-05-25 09:09:50 +08:00
1dd3a4ed3a [fix](Nereids) fix unstable regression test cases and some bugs (#19999)
Fix bugs:
1. should return the other child of Or if the current child is NULL after constant folding
2. Lead should have three parameters; remove the constructors with default values

Cases under nereids_p0 that do not enable Nereids
1. nereids_p0/join/sql
2. nereids_p0/sql_functions/horology_functions/sql

Should disable Nereids explicitly because the results are not the same
1. query_p0/sql_functions/horology_functions/sql
2. query_p0/stats/query_stats_test.groovy
3. query_profile/test_profile.groovy

Unstable regression test case
1. nereids_syntax_p0/join.groovy
2023-05-24 20:34:01 +08:00
a713c225a5 [regressiontest](statistics) Collate and supplement statistics regression test (#19901)
This PR mainly supplements the statistics regression tests, including the following:

analyze stats p0 tests:

1. Universal analysis

analyze stats p1 tests:

1. Universal analysis
2. Sampled analysis
3. Incremental analysis
4. Automatic analysis
5. Periodic analysis

manage stats p0 tests:

1. Alter table stats
2. Show table stats
3. Alter column stats
4. Show column stats and histogram
5. Drop column stats
6. Drop expired stats

TODO:

1. Supplement related documents
2. Optimize for unstable cases encountered during testing
3. Add other cases

PRs related to statistics should ensure that all of these cases pass!
2023-05-24 20:17:28 +08:00
4aad88abc4 [test](Nereids) fix tpcds shape out file #20002 2023-05-24 17:40:13 +08:00
c84fd79051 [regression](nereids) fix tpcds plan shape #19985
skip tpcds 88/16/28/61/85/17/9/50/25/39/29/13/48/64
2023-05-24 14:04:28 +08:00
f14e6189a9 [feature](load-refactor) Unfied mysql load use InsertStmt (#19571) 2023-05-24 12:09:16 +08:00
b4669eaeba [Improve](complex-type)add switch for array/struct/map nesting complex type (#19928)
Currently, many operations in BE do not support array/map/struct types nested in each other. If we do not prohibit this in FE, we will hit a lot of undefined behavior in BE, so this PR adds a switch to prohibit nesting complex types. Once nesting is fully supported, it can be enabled.
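
A sketch of such a check, with several assumptions: `TypeSketch`, `has_nested_complex`, `check_type`, and the `allow_nesting` flag are hypothetical names, the flag's default is a guess, and treating any complex type inside another complex type as "nested" is one reading of the description rather than the confirmed rule.

```python
from dataclasses import dataclass, field
from typing import List

COMPLEX_KINDS = {"array", "map", "struct"}


@dataclass
class TypeSketch:
    """Hypothetical type description: a kind plus its element or field types."""
    kind: str
    children: List["TypeSketch"] = field(default_factory=list)


def has_nested_complex(t: TypeSketch) -> bool:
    """True when a complex type directly contains another complex type."""
    return t.kind in COMPLEX_KINDS and any(c.kind in COMPLEX_KINDS for c in t.children)


def check_type(t: TypeSketch, allow_nesting: bool = False) -> None:
    """Reject nested complex types unless the (hypothetical) switch is on."""
    if has_nested_complex(t) and not allow_nesting:
        raise ValueError(f"nested complex type is not allowed: {t.kind}")


flat = TypeSketch("array", [TypeSketch("int")])
nested = TypeSketch("array", [TypeSketch("map", [TypeSketch("string"), TypeSketch("int")])])
```

Here `check_type(flat)` passes, `check_type(nested)` raises, and `check_type(nested, allow_nesting=True)` passes again once the switch is turned on.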
2023-05-24 11:39:53 +08:00
384a0c7aa7 [fix](testcases) Fix some unstable testcases. (#19956)
The case test_string_concat_extremely_long_string exceeds our test limit. Move it to p2 so that it is only tested in the SelectDB test environment.
Because we need to stay consistent with MySQL and avoid overflow, q67 must keep its current behavior. Once Nereids and decimalV3 are fully applied, it will be fixed automatically.
In the parallel test, although all query stats are cleaned, cases running in parallel still affect them, so we need to use a unique table for query_stats_test.
test_query_sys_tables did not handle some unstable situations. Fixed.
Temporarily disable the unstable analyze_test case for p0.
2023-05-24 09:52:02 +08:00
a6674bb7b1 [regression](nereids) tpcds sf100 plan shape regression cases (#19913) 2023-05-23 18:48:00 +08:00
35f8fc22f2 [testcase](test) Fix query stats test may failed (#19958) 2023-05-23 18:33:07 +08:00
a434a49f71 [Bug](decimal) fix mod function (#19925)
Bug:
select id, kdcml * ktint, kdcml / ktint, kdcml % ktint from expr_test order by id;
+------+-------------------+-------------------+-----------------------+
| id   | kdcml * ktint     | kdcml / ktint     | kdcml % ktint         |
+------+-------------------+-------------------+-----------------------+
| NULL |              NULL |              NULL |                  NULL |
|    1 |            24.395 |            24.395 |  -4702111234474983.74 |
|    2 |            68.968 |            17.242 |  -4702111234474983.74 |
|    3 |           146.268 |            16.252 |  -4702111234474983.74 |
|    4 |           275.772 |            17.235 |  -4702111234474983.74 |
|    5 |           487.470 |            19.498 |  -4702111234474983.74 |
|    6 |           827.244 |            22.979 |  -4702111234474983.74 |
|    7 |          1364.860 |            27.854 |  -4702111234474983.74 |
|    8 |          2205.928 |            34.467 |  -4702111234474983.74 |
|    9 |          3509.595 |            43.328 |  -4702111234474983.74 |
|   10 |          5514.790 |            55.147 |  -4702111234474983.74 |
|   11 |          8578.988 |            70.900 |  -4702111234474983.74 |
|   12 |         13235.484 |            91.913 |  -4702111234474983.74 |
|   13 |            24.395 |            24.395 |  -4702111234474983.74 |
|   14 |            68.968 |            17.242 |  -4702111234474983.74 |
|   15 |           146.268 |            16.252 |  -4702111234474983.74 |
|   16 |           275.772 |            17.235 |  -4702111234474983.74 |
|   17 |           487.470 |            19.498 |  -4702111234474983.74 |
|   18 |           827.244 |            22.979 |  -4702111234474983.74 |
|   19 |          1364.860 |            27.854 |  -4702111234474983.74 |
|   20 |          2205.928 |            34.467 |  -4702111234474983.74 |
|   21 |          3509.595 |            43.328 |  -4702111234474983.74 |
|   22 |          5514.790 |            55.147 |  -4702111234474983.74 |
|   23 |          8578.988 |            70.900 |  -4702111234474983.74 |
|   24 |         13235.484 |            91.913 |  -4702111234474983.74 |
2023-05-23 18:24:31 +08:00