Commit Graph

7690 Commits

Author SHA1 Message Date
4de25ede85 [fix](Nereids): other conditions should be kept for each anti join when expanding anti joins such as (#31521) 2024-02-29 08:42:35 +08:00
8633a0c0cc [Opt](exec) enable top opt in string type (#31489)
2024-02-29 08:42:35 +08:00
3ca412efe3 Return UNKNOWN column stats if ndv is 0. (#31439) 2024-02-29 08:42:35 +08:00
e8a21b529e [Fix](fe-core) Fix: EliminateSortUnderSubquery should not affect the EliminateOrderByConstant rule (#31402) (#31403) 2024-02-28 17:52:11 +08:00
Pxl
6737fdea64 [Chore](agg-state) adjust AggStateType constructor check input (#31401)
2024-02-28 17:52:11 +08:00
d88caca44a [fix](Nereids) push down topn distinct through join by mistake (#31396)
The topn distinct should not be pushed down through a join when the corresponding
child of the join outputs more columns than the aggregate's distinct columns.

For example, for a LEFT_OUTER_JOIN:

The left child of the join outputs: c1, c2, c3
The distinct columns are: c1, c2
The topn is: limit 2

If the topn distinct is pushed down, the join could produce a result like this:

```
c1    c2    c3, ...
1     2     1
1     2     2
```

and the final result is:

```
c1    c2
1     2
```

This is wrong, because 2 rows are needed but only 1 is returned.
2024-02-28 17:51:32 +08:00
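The failure described in d88caca44a can be reproduced with plain collections. The sketch below is a hypothetical model, not Nereids code: it assumes the right side of the LEFT OUTER JOIN is empty, so the join passes every left row through, and it models the pushed-down topn distinct as a DISTINCT LIMIT over all of the child's output columns (c1, c2, c3).

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of the example in commit d88caca44a, not Nereids code.
class TopnDistinctPushdown {
    // One row produced by the left child of the join: (c1, c2, c3).
    record Row(int c1, int c2, int c3) {}

    // The aggregate's DISTINCT columns: (c1, c2).
    record Key(int c1, int c2) {}

    // Assumed left-child data; the right side is treated as empty, so the
    // LEFT OUTER JOIN simply passes every left row through.
    static final List<Row> LEFT = List.of(
            new Row(1, 2, 1),
            new Row(1, 2, 2),
            new Row(3, 4, 5));

    // Correct plan: join first, then DISTINCT (c1, c2) LIMIT n on top.
    static List<Key> correct(List<Row> left, int limit) {
        return left.stream()
                .map(r -> new Key(r.c1(), r.c2()))
                .distinct()
                .limit(limit)
                .collect(Collectors.toList());
    }

    // Wrong plan: a DISTINCT LIMIT n pushed below the join dedupes on all
    // child columns (c1, c2, c3), so it can keep rows that collapse later.
    static List<Key> pushedDown(List<Row> left, int limit) {
        List<Row> belowJoin = left.stream()
                .distinct()
                .limit(limit)
                .collect(Collectors.toList());
        // The join is the identity here; the final DISTINCT (c1, c2) LIMIT n still runs on top.
        return correct(belowJoin, limit);
    }

    public static void main(String[] args) {
        System.out.println("correct:     " + correct(LEFT, 2));
        System.out.println("pushed down: " + pushedDown(LEFT, 2));
    }
}
```

With the assumed data, `correct` keeps two (c1, c2) keys while `pushedDown` keeps only one, matching the 2-versus-1 discrepancy in the commit message.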
79dd4e24ff fix compile 2024-02-28 17:38:47 +08:00
7a42d3b52c [fix](fe ut) fix TabletRepairAndBalanceTest (#31397) 2024-02-28 13:08:41 +08:00
883d022f84 [fix](paimon) fix hadoop.username not taking effect in paimon catalog (#31478) 2024-02-28 13:08:41 +08:00
d0a8a30998 [improvement](iceberg/paimon)add show table stats (#31473) 2024-02-28 13:08:36 +08:00
5bbe9f7b40 Fix replaying binlog GC when the db binlog is not found (#31463)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2024-02-28 13:07:47 +08:00
5824f8a4bc [Feat](nereids) support multi-leading (#30379)
support multi-leading in each query block
2024-02-28 13:07:47 +08:00
ab00c7012b [fix] Fix the incorrect class name in the getLogger method call of MysqlTable. (#31465) 2024-02-28 13:07:47 +08:00
Pxl
3acfda413b [Chore](function) remove unused check on count function (#31400) 2024-02-28 13:07:47 +08:00
a371a10603 [fix](Nereids) make time type coercion the same as the legacy planner (#31472) 2024-02-28 13:07:47 +08:00
c0754583cb [opt](plsql) Fix procedure key compatibility (#31445)
Use dbId instead of dbName, because the database may be renamed by ALTER.
Add the package name to the procedure key (reserved only; there are currently no plans to support packages).
Optimize procedure creation and exception handling.
2024-02-28 13:07:47 +08:00
9ffcf48cce [enhancement](Nereids) Support show process time and process steps by explain statement (#31339)
## Proposed changes

1. Show the time spent in each planning phase when executing `explain plan xxx` with Nereids.
2. Add an `explain xxx plan process select ...` statement that shows each step of the planning process; showing the memo shape (physical plan) is not supported yet.

example:
show process time:
```
mysql> explain plan select * from tt;
+---------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                                               |
+---------------------------------------------------------------------------------------------------------------+
| ========== PARSED PLAN (time: 3ms) ==========                                                                 |
| UnboundResultSink[3] (  )                                                                                     |
| +--LogicalProject[2] ( distinct=false, projects=[*], excepts=[] )                                             |
|    +--LogicalCheckPolicy (  )                                                                                 |
|       +--UnboundRelation ( id=RelationId#0, nameParts=tt )                                                    |
|                                                                                                               |
| ========== ANALYZED PLAN (time: 6ms) ==========                                                               |
| LogicalResultSink[11] ( outputExprs=[id#0, name#1] )                                                          |
| +--LogicalProject[9] ( distinct=false, projects=[id#0, name#1], excepts=[] )                                  |
|    +--LogicalOlapScan ( qualified=test.tt, indexName=<index_not_selected>, selectedIndexId=10361, preAgg=ON ) |
|                                                                                                               |
| ========== REWRITTEN PLAN (time: 0ms)==========                                                               |
| LogicalResultSink[11] ( outputExprs=[id#0, name#1] )                                                          |
| +--LogicalProject[9] ( distinct=false, projects=[id#0, name#1], excepts=[] )                                  |
|    +--LogicalOlapScan ( qualified=test.tt, indexName=<index_not_selected>, selectedIndexId=10361, preAgg=ON ) |
|                                                                                                               |
| ========== OPTIMIZED PLAN (time: 2ms) ==========                                                              |
| PhysicalResultSink[56] ( outputExprs=[id#0, name#1] )                                                         |
| +--PhysicalDistribute[53]@1 ( stats=2, distributionSpec=DistributionSpecGather )                              |
|    +--PhysicalProject[50]@1 ( stats=2, projects=[id#0, name#1] )                                              |
|       +--PhysicalOlapScan[tt]@0 ( stats=2 )                                                                   |
+---------------------------------------------------------------------------------------------------------------+
21 rows in set (0.01 sec)
```

explain plan process:
```
mysql> explain plan process select * from tt\G
*************************** 1. row ***************************
  Rule: BINDING_RELATION
Before: UnboundResultSink[8] (  )
+--LogicalProject[7] ( distinct=false, projects=[*], excepts=[] )
   +--LogicalCheckPolicy (  )
      +--UnboundRelation ( id=RelationId#0, nameParts=tt )
 After: UnboundResultSink[11] (  )
+--LogicalProject[10] ( distinct=false, projects=[*], excepts=[] )
   +--LogicalCheckPolicy (  )
      +--LogicalOlapScan ( qualified=test.tt, indexName=<index_not_selected>, selectedIndexId=10361, preAgg=ON )
*************************** 2. row ***************************
  Rule: CHECK_ROW_POLICY
Before: UnboundResultSink[15] (  )
+--LogicalProject[14] ( distinct=false, projects=[*], excepts=[] )
   +--LogicalCheckPolicy (  )
      +--LogicalOlapScan ( qualified=test.tt, indexName=<index_not_selected>, selectedIndexId=10361, preAgg=ON )
 After: UnboundResultSink[17] (  )
+--LogicalProject[16] ( distinct=false, projects=[*], excepts=[] )
   +--LogicalOlapScan ( qualified=test.tt, indexName=<index_not_selected>, selectedIndexId=10361, preAgg=ON )
*************************** 3. row ***************************
  Rule: BINDING_PROJECT_SLOT
Before: UnboundResultSink[22] (  )
+--LogicalProject[21] ( distinct=false, projects=[*], excepts=[] )
   +--LogicalOlapScan ( qualified=test.tt, indexName=<index_not_selected>, selectedIndexId=10361, preAgg=ON )
 After: UnboundResultSink[23] (  )
+--LogicalProject[20] ( distinct=false, projects=[id#0, name#1], excepts=[] )
   +--LogicalOlapScan ( qualified=test.tt, indexName=<index_not_selected>, selectedIndexId=10361, preAgg=ON )
*************************** 4. row ***************************
  Rule: BINDING_RESULT_SINK
Before: UnboundResultSink[26] (  )
+--LogicalProject[20] ( distinct=false, projects=[id#0, name#1], excepts=[] )
   +--LogicalOlapScan ( qualified=test.tt, indexName=<index_not_selected>, selectedIndexId=10361, preAgg=ON )
 After: LogicalResultSink[25] ( outputExprs=[id#0, name#1] )
+--LogicalProject[20] ( distinct=false, projects=[id#0, name#1], excepts=[] )
   +--LogicalOlapScan ( qualified=test.tt, indexName=<index_not_selected>, selectedIndexId=10361, preAgg=ON )
*************************** 5. row ***************************
  Rule: ELIMINATE_UNNECESSARY_PROJECT
Before: LogicalResultSink[25] ( outputExprs=[id#0, name#1] )
+--LogicalProject[20] ( distinct=false, projects=[id#0, name#1], excepts=[] )
   +--LogicalOlapScan ( qualified=test.tt, indexName=<index_not_selected>, selectedIndexId=10361, preAgg=ON )
 After: LogicalResultSink[27] ( outputExprs=[id#0, name#1] )
+--LogicalOlapScan ( qualified=test.tt, indexName=<index_not_selected>, selectedIndexId=10361, preAgg=ON )
*************************** 6. row ***************************
  Rule: PRUNE_EMPTY_PARTITION
Before: LogicalResultSink[29] ( outputExprs=[id#0, name#1] )
+--LogicalOlapScan ( qualified=test.tt, indexName=<index_not_selected>, selectedIndexId=10361, preAgg=ON )
 After: LogicalResultSink[30] ( outputExprs=[id#0, name#1] )
+--LogicalOlapScan ( qualified=test.tt, indexName=<index_not_selected>, selectedIndexId=10361, preAgg=ON )
*************************** 7. row ***************************
  Rule: MATERIALIZED_INDEX_SCAN
Before: LogicalResultSink[36] ( outputExprs=[id#0, name#1] )
+--LogicalOlapScan ( qualified=test.tt, indexName=<index_not_selected>, selectedIndexId=10361, preAgg=ON )
 After: LogicalResultSink[37] ( outputExprs=[id#0, name#1] )
+--LogicalProject[35] ( distinct=false, projects=[id#0, name#1], excepts=[] )
   +--LogicalProject[34] ( distinct=false, projects=[id#0, name#1], excepts=[] )
      +--LogicalOlapScan ( qualified=test.tt, indexName=tt, selectedIndexId=10361, preAgg=ON )
*************************** 8. row ***************************
  Rule: MERGE_PROJECTS
Before: LogicalResultSink[40] ( outputExprs=[id#0, name#1] )
+--LogicalProject[39] ( distinct=false, projects=[id#0, name#1], excepts=[] )
   +--LogicalProject[34] ( distinct=false, projects=[id#0, name#1], excepts=[] )
      +--LogicalOlapScan ( qualified=test.tt, indexName=tt, selectedIndexId=10361, preAgg=ON )
 After: LogicalResultSink[41] ( outputExprs=[id#0, name#1] )
+--LogicalProject[38] ( distinct=false, projects=[id#0, name#1], excepts=[] )
   +--LogicalOlapScan ( qualified=test.tt, indexName=tt, selectedIndexId=10361, preAgg=ON )
*************************** 9. row ***************************
  Rule: ELIMINATE_UNNECESSARY_PROJECT
Before: LogicalResultSink[42] ( outputExprs=[id#0, name#1] )
+--LogicalProject[38] ( distinct=false, projects=[id#0, name#1], excepts=[] )
   +--LogicalOlapScan ( qualified=test.tt, indexName=tt, selectedIndexId=10361, preAgg=ON )
 After: LogicalResultSink[43] ( outputExprs=[id#0, name#1] )
+--LogicalOlapScan ( qualified=test.tt, indexName=tt, selectedIndexId=10361, preAgg=ON )
*************************** 10. row ***************************
  Rule: REWRITE_CTE_CHILDREN
Before: LogicalResultSink[25] ( outputExprs=[id#0, name#1] )
+--LogicalProject[20] ( distinct=false, projects=[id#0, name#1], excepts=[] )
   +--LogicalOlapScan ( qualified=test.tt, indexName=<index_not_selected>, selectedIndexId=10361, preAgg=ON )
 After: LogicalResultSink[43] ( outputExprs=[id#0, name#1] )
+--LogicalOlapScan ( qualified=test.tt, indexName=tt, selectedIndexId=10361, preAgg=ON )
10 rows in set (0.00 sec)
```
2024-02-28 13:07:25 +08:00
47d112e4e6 [fix](MySQL) implement SHOW CHARSET statement. (#31389) 2024-02-28 13:07:23 +08:00
7b3377d474 [fix](Nereids) make the `with` methods of plans use the correct logical properties (#31447) 2024-02-28 13:05:57 +08:00
37590f1778 Make sure external table fetched dbId before call getRowCount. (#31379) 2024-02-28 13:05:57 +08:00
eb0416032b [feature](multi-catalog)support hms catalog create and drop table/db (#30198) (#31499)
1. rename old create/drop table to add/removeMemoryTable
2. add new create/drop table/db method
3. support hms catalog create/drop table/db

(cherry picked from commit b2e869c7414c68186de8d43b324ae736d7cc3463)
2024-02-28 09:33:54 +08:00
39a8db27f2 [fix](mtmv) TVF query job: concurrent reading and writing causes an exception #31422 2024-02-27 16:06:26 +08:00
6b4a756837 [Fix] Only datetime and datetimev2 types can use current_timestamp as column default value (#31395)
For SQL like the following:

```
create table test_default10(
  a int,
  b varchar(100) default current_timestamp
)
distributed by hash(a)
properties('replication_num'="1");
```

This adds a check:
Types other than DATETIME and DATETIMEV2 cannot use current_timestamp as the default value.
2024-02-27 16:06:26 +08:00
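A minimal sketch of the kind of validation commit 6b4a756837 adds, in Java. The `ColumnType` enum, the `checkDefault` method, and the exact message wording are assumptions for illustration (the message paraphrases the commit text); the real check lives in the Doris FE and may be structured differently.

```java
import java.util.Set;

// Hypothetical sketch of the check in 6b4a756837, not the Doris FE code.
class DefaultValueCheck {
    // Simplified stand-in for the column types involved in the check.
    enum ColumnType { INT, VARCHAR, DATE, DATETIME, DATETIMEV2 }

    // The only types allowed to use current_timestamp as a default value.
    private static final Set<ColumnType> CURRENT_TIMESTAMP_TYPES =
            Set.of(ColumnType.DATETIME, ColumnType.DATETIMEV2);

    // Rejects DEFAULT current_timestamp on any other column type.
    static void checkDefault(String column, ColumnType type, String defaultValue) {
        if ("current_timestamp".equalsIgnoreCase(defaultValue)
                && !CURRENT_TIMESTAMP_TYPES.contains(type)) {
            throw new IllegalArgumentException("Column " + column + " of type " + type
                    + ": types other than DATETIME and DATETIMEV2 cannot use"
                    + " current_timestamp as the default value");
        }
    }

    public static void main(String[] args) {
        checkDefault("c", ColumnType.DATETIMEV2, "current_timestamp"); // accepted
        try {
            // Mirrors column `b varchar(100) default current_timestamp` above.
            checkDefault("b", ColumnType.VARCHAR, "current_timestamp");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```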
378ced72db [chore](Nereids) more reasonable parse select list only query (#31346) 2024-02-27 16:06:26 +08:00
481d94c3fc [feature](nereids) deal the slots that appear both in agg func and grouping sets (#31318)
This PR supports slots that appear in both an aggregate function and grouping sets,
for SQL like:
```
select sum(a) from t group by grouping sets ((a));
```

Before this PR, Nereids threw an exception like:
col_int_undef_signed cannot both in select list and aggregate functions when using GROUPING SETS/CUBE/ROLLUP, please use union instead.

This PR removes the restriction and supports this situation.
2024-02-27 10:12:33 +08:00
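The semantics that commit 481d94c3fc enables can be illustrated with a small evaluator. It is a hypothetical sketch of the expected result, not how Nereids rewrites the plan, and it extends the commit's example with an empty grouping set `()` to show why the aggregate must read the original column rather than the grouping column that is output as NULL for groups that exclude it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the expected result, not the Nereids rewrite.
class GroupingSetsSum {
    // One output row: `a` is null for the grouping set that excludes it.
    record Out(Integer a, long sumA) {}

    // Evaluates SELECT a, SUM(a) FROM t GROUP BY GROUPING SETS ((a), ())
    static List<Out> evaluate(List<Integer> t) {
        List<Out> result = new ArrayList<>();

        // Grouping set (a): one group per distinct value of a.
        Map<Integer, Long> byA = new LinkedHashMap<>();
        for (int a : t) {
            byA.merge(a, (long) a, Long::sum);
        }
        byA.forEach((a, sum) -> result.add(new Out(a, sum)));

        // Grouping set (): the output column a is NULL, but SUM(a) must
        // still aggregate the original values of a over all rows.
        long total = 0;
        for (int a : t) {
            total += a;
        }
        result.add(new Out(null, total));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(evaluate(List.of(1, 1, 2)));
    }
}
```

For `t = [1, 1, 2]`, `evaluate` returns `(1, 2)`, `(2, 2)`, and `(NULL, 4)`.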
dd229b77b1 [fix](inverted index)Remove the strong check for parser when creating a table with inverted index (#31391) 2024-02-27 10:12:33 +08:00
c34639245e [Improvement](executor)add remote scan thread pool (#31376)
* add remote scan thread pool
2024-02-27 10:12:33 +08:00
1127b0065a [Improvement](executor) Add scanbytes/scanrows condition (#31364)
* Add scanbytes/scanrows condition

* fix reg
2024-02-27 10:12:33 +08:00
f163d56a98 [feature](function) support the sequence function (alias of array_range) and enhance both to handle datetimev2 (#30823) 2024-02-27 10:12:19 +08:00
3cee6c6722 [fix](function) fix unexpected BE core dump in string search functions (#31312)
Fix BE core dump in the multi_match_any/multi_search_all_positions functions.
2024-02-27 10:12:18 +08:00
5d4a2d93a6 [feature](nereids) support join with joinRelation (#30909) 2024-02-26 19:07:11 +08:00
Pxl
fcea2b964e [Chore](materialized-view) forbid creating an MV that has calculations outside aggregate functions (#31336)
2024-02-26 19:07:11 +08:00
b1fc0ebbe7 [improvement](iceberg/paimon) support estimating row count (#31204)
Estimate the number of rows for Iceberg and Paimon tables.
2024-02-26 19:07:10 +08:00
f951ca2efb [refactor](stats) Remove useless async loader code. (#31380) 2024-02-26 19:07:10 +08:00
Pxl
3b7261abe4 [Mv] check delete from column exists on mv (#31321) 2024-02-26 19:07:10 +08:00
3451cd6c23 [fix](datetime) fix hour 24 on be (#31304) 2024-02-26 19:07:10 +08:00
e48f4f38d0 [Fix](fe-common) Fix a potential NullPointerException in Pair.java (#31371)
* Fix the NullPointerException thrown by equals and toString in the Pair class when first or second is null

* add license
2024-02-26 19:07:10 +08:00
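The usual way to make `equals`, `hashCode`, and `toString` safe when either component is null is to delegate to `java.util.Objects` and string concatenation. The class below is a hypothetical stand-in named `NullSafePair`; Doris's actual `Pair` fields, constructor, and `toString` format (assumed here to be `first:second`) may differ.

```java
import java.util.Objects;

// Hypothetical null-safe pair; Doris's actual Pair class may differ.
class NullSafePair<F, S> {
    final F first;
    final S second;

    NullSafePair(F first, S second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof NullSafePair)) {
            return false;
        }
        NullSafePair<?, ?> other = (NullSafePair<?, ?>) o;
        // Objects.equals tolerates null on either side, unlike first.equals(...).
        return Objects.equals(first, other.first) && Objects.equals(second, other.second);
    }

    @Override
    public int hashCode() {
        return Objects.hash(first, second); // null components hash to 0
    }

    @Override
    public String toString() {
        // String concatenation renders null as "null" instead of throwing.
        return first + ":" + second;
    }

    public static void main(String[] args) {
        NullSafePair<String, Integer> p = new NullSafePair<>(null, 1);
        System.out.println(p.equals(new NullSafePair<>(null, 1))); // true
        System.out.println(p);                                      // null:1
    }
}
```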
7a9fe5d275 [enhance](mtmv)MTMV supports Hive multi-level partitioning (#31060)
For example, the Hive table is partitioned by `date` and `region`, with the following 6 partitions:
```
20200101
        beijing
        shanghai
20200102
        beijing
        shanghai
20200103
        beijing
        shanghai
```

If the MTMV is partitioned by `date`, then the MTMV will have three partitions: 20200101, 20200102, 20200103

If the MTMV is partitioned by `region`, then the MTMV will have two partitions: beijing, shanghai
2024-02-25 18:08:19 +08:00
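The partition mapping in commit 7a9fe5d275's example amounts to taking the distinct values of the chosen partition column. The Java sketch below is a hypothetical model (the `HivePartition` record and `mtmvPartitions` method are illustrative names, not Doris APIs) built from the six Hive partitions listed in that commit.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical model of MTMV partitions derived from a two-level Hive table.
class MtmvPartitionMapping {
    // One Hive partition, identified by its `date` and `region` values.
    record HivePartition(String date, String region) {}

    // The six Hive partitions from the commit's example.
    static final List<HivePartition> HIVE = List.of(
            new HivePartition("20200101", "beijing"),
            new HivePartition("20200101", "shanghai"),
            new HivePartition("20200102", "beijing"),
            new HivePartition("20200102", "shanghai"),
            new HivePartition("20200103", "beijing"),
            new HivePartition("20200103", "shanghai"));

    // Each distinct value of the chosen column becomes one MTMV partition;
    // LinkedHashSet keeps first-seen order.
    static Set<String> mtmvPartitions(List<HivePartition> parts,
                                      Function<HivePartition, String> column) {
        Set<String> result = new LinkedHashSet<>();
        for (HivePartition p : parts) {
            result.add(column.apply(p));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(mtmvPartitions(HIVE, HivePartition::date));   // [20200101, 20200102, 20200103]
        System.out.println(mtmvPartitions(HIVE, HivePartition::region)); // [beijing, shanghai]
    }
}
```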
7a1caf4718 [refactor](wg) enable wg by default and init normal wg in constructor (#31373)
Workload groups should always be enabled, because other features depend on them, for example MTMV and spill to disk.
The normal workload group should be created in the constructor.
2024-02-25 18:08:19 +08:00
8001f73e52 [pipelineX](file scan) Improve task parallelism when data distribution is ignored (#31328) 2024-02-24 11:45:05 +08:00
aee49adf1e [opt](compute-node) refactor compute node doc and opt some default config (#31325)
2024-02-24 11:44:53 +08:00
dcbba9a013 fix compile 2024-02-24 08:26:57 +08:00
db58104bc3 [fix](inverted index) Fix inverted index for MOR unique table #31051 (#31354) 2024-02-23 23:10:36 +08:00
481517ac6a [fix](plan) only scan node with limit and no predicate can reduce to 1 instance (#31342)
PR #25952 introduced an optimization: if a scan node has a limit and predicates, only 1 instance is used, to save CPU and memory.
But this is wrong, because there is no guarantee that the predicates actually prune the data.
So this PR changes the logic to remove that optimization.
Now, only a scan node with a limit and NO predicates can be reduced to 1 instance.
2024-02-23 21:09:14 +08:00
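The rule from commit 481517ac6a can be summarized as a small decision function. This is a hypothetical sketch: `instanceNum` and its parameters are illustrative names, not Doris APIs, and the rationale in the comment is a plausible reading of the commit rather than a statement from it.

```java
// Hypothetical sketch of the rule described in 481517ac6a.
class ScanInstanceRule {
    // Returns the number of instances a scan node should use.
    static int instanceNum(boolean hasLimit, boolean hasPredicates, int defaultInstances) {
        // Without predicates every row read counts toward the limit, so one
        // instance can stop after `limit` rows. With predicates there is no
        // guarantee that enough rows pass early, so a single instance might
        // have to scan far more data; keep the default parallelism instead.
        if (hasLimit && !hasPredicates) {
            return 1;
        }
        return defaultInstances;
    }

    public static void main(String[] args) {
        System.out.println(instanceNum(true, false, 8));  // 1
        System.out.println(instanceNum(true, true, 8));   // 8
        System.out.println(instanceNum(false, false, 8)); // 8
    }
}
```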
49842eecc5 [Fix](multi-catalog) Fix NPE when refreshing catalog on Slave FE. (#31335)
Co-authored-by: wangxiangyu <wangxiangyu@360shuke.com>
2024-02-23 20:59:07 +08:00
b6ca76e7d4 fix routine load job throwing an exception after commit (#31303) 2024-02-23 20:57:03 +08:00
bb31f4adb6 [fix](mtmv) fix illegal partition names generated when a partition value contains a colon (#31282) 2024-02-23 19:05:20 +08:00
b5ec1e7b7d [fix](Nereids) support check authorization for view but skip check in the view (#31289)
Move UserAuthentication into BindRelation; check authorization on the view itself but skip the checks inside the view.

Related PR: #23295
2024-02-23 19:03:28 +08:00
9a40b6c978 Refactor get row count related interface, add row count cache for external table. (#31276) 2024-02-23 19:03:28 +08:00
8f77e6363a [Feature](function) Support xxhash function like murmur hash function (#31193) 2024-02-23 19:03:28 +08:00