Commit Graph

8203 Commits

Author SHA1 Message Date
42e91149e4 [enhancement](auto-partition) Forbid using Auto and Dynamic partition at the same time (#33736) 2024-04-19 23:41:46 +08:00
f2a0ac8ff2 [feature] (partition) Dynamic partition behavior changes (#33712) 2024-04-19 23:41:46 +08:00
15f8014e4e [enhancement](Nereids) Enable parse sql from sql cache and fix some bugs (#33867)
* [enhancement](Nereids) Enable parse sql from sql cache (#33262)

Before this PR, a query had to pass through the parser, analyzer, rewriter, optimizer, and translator before we could check whether it could use the sql cache. For a long query, or one joining many tables, plan time was usually >= 500 ms.

This PR reduces that time by skipping the full planning path: we can reuse the previous physical plan and query result if nothing has changed. In some cases we must not parse sql from the sql cache, e.g. when the table structure changed, data changed, user policies changed, privileges changed, user variables changed, or the query contains non-deterministic functions.

In my test case (querying a view with many joins and unions, over tables with empty partitions), query latency is about 3 ms; without parsing sql from the sql cache, plan time alone is about 550 ms.
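The fast path described above can be sketched roughly as follows. This is a minimal illustration, not the actual Nereids code: the `SqlCacheSketch` class, the `fullPlan` helper, and the invalidation flags are invented names standing in for the real checks (table structure, data, policies, privileges, variables, non-determinism).

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the sql cache fast path: reuse the previous
// physical plan keyed by the sql text, and fall back to full planning
// whenever any invalidation condition holds. All names are illustrative.
class SqlCacheSketch {
    private final Map<String, String> planCache = new ConcurrentHashMap<>();

    String plan(String sql, boolean anythingChanged, boolean nonDeterministic) {
        String cached = planCache.get(sql);
        if (cached != null && !anythingChanged && !nonDeterministic) {
            return cached; // skip parser/analyzer/rewriter/optimizer/translator
        }
        String physicalPlan = fullPlan(sql); // the expensive (~500 ms) path
        planCache.put(sql, physicalPlan);
        return physicalPlan;
    }

    private String fullPlan(String sql) {
        // Stands in for parse -> analyze -> rewrite -> optimize -> translate.
        return "PhysicalPlan(" + sql + ")";
    }
}
```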

## Features
1. Use `Config.sql_cache_manage_num` to control how many sql cache entries are reused on one FE.
2. If the explain output contains a `LogicalSqlCache` or `PhysicalSqlCache` plan, the query can use the sql cache, like this:
```sql
mysql> set enable_sql_cache=true;
Query OK, 0 rows affected (0.00 sec)

mysql> explain physical plan select * from test.t;
+----------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                  |
+----------------------------------------------------------------------------------+
| cost = 3.135                                                                     |
| PhysicalResultSink[53] ( outputExprs=[c1#0, c2#1] )                              |
| +--PhysicalDistribute[50]@0 ( stats=3, distributionSpec=DistributionSpecGather ) |
|    +--PhysicalOlapScan[t]@0 ( stats=3 )                                          |
+----------------------------------------------------------------------------------+
4 rows in set (0.02 sec)

mysql> select * from test.t;
+------+------+
| c1   | c2   |
+------+------+
|    1 |    2 |
|   -2 |   -2 |
| NULL |   30 |
+------+------+
3 rows in set (0.05 sec)

mysql> explain physical plan select * from test.t;
+-------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                           |
+-------------------------------------------------------------------------------------------+
| cost = 0.0                                                                                |
| PhysicalSqlCache[2] ( queryId=78511f515cda466b-95385d892d6c68d0, backend=127.0.0.1:9050 ) |
| +--PhysicalResultSink[52] ( outputExprs=[c1#0, c2#1] )                                    |
|    +--PhysicalDistribute[49]@0 ( stats=3, distributionSpec=DistributionSpecGather )       |
|       +--PhysicalOlapScan[t]@0 ( stats=3 )                                                |
+-------------------------------------------------------------------------------------------+
5 rows in set (0.01 sec)
```

(cherry picked from commit 03bd2a337d4a56ea9c91673b3bd4ae518ed10f20)

* fix

* [fix](Nereids) fix some sql cache consistence bug between multiple frontends (#33722)

Fix some sql cache consistency bugs between multiple frontends introduced by "[enhancement](Nereids) Enable parse sql from sql cache #33262"; fixed by making the row policy part of the sql cache key.
Also support dynamically updating the number of sql cache keys an FE manages.

(cherry picked from commit 90abd76f71e73702e49794d375ace4f27f834a30)

* [fix](Nereids) fix bug of dry run query with sql cache (#33799)

1. dry run queries should not use the sql cache
2. fix the sql cache test in cloud mode
3. cache OneRowRelation and EmptyRelation in the frontend to skip sql parsing

(cherry picked from commit dc80ecf7f33da7b8c04832dee88abd09f7db9ffe)

* remove cloud mode

* remove @NotNull
2024-04-19 15:22:14 +08:00
ad75b9b142 [opt](auto bucket) add fe config autobucket_max_buckets (#33842) 2024-04-19 15:03:06 +08:00
1a6f8c443e [bugfix](paimon) Create paimon catalog with hadoop user (#33833)
When creating a catalog, Paimon creates a warehouse on HDFS, so we need to create it as a user with the corresponding permissions.
2024-04-19 15:02:56 +08:00
6776a3ad1b [Fix](planner) fix create view star except and modify cast to sql (#33726) 2024-04-19 15:02:49 +08:00
a8ba933947 [Fix](nereids) fix bind order by expression logic (#33843) 2024-04-19 15:02:49 +08:00
2675e94a93 [feature](variable) add read_only and super_read_only (#33795) 2024-04-19 15:02:21 +08:00
5abc84af71 [fix](txn insert) Fix txn insert commit failed when schema change (#33706) 2024-04-19 15:01:57 +08:00
315f6e44c2 [Branch-2.1](Outfile) Fixed the problem that the concurrent Outfile wrote multiple Success files (#33870)
backport: #33016
2024-04-19 12:09:53 +08:00
561afde0c4 [feature](insert)support default value when create hive table (#33666)
Issue Number: #31442

Hive 3 supports creating tables with column default values;
when using Hive 3, we can write default values to the table.
2024-04-19 11:31:33 +08:00
734520a77b [bugfix](hive)delete write path after hive insert (#33798)
Issue #31442

1. delete files according to the query id
2. delete the write path after insert
2024-04-19 11:31:25 +08:00
c8a92b82cc [fix](restore) Reset index id for MaterializedIndexMeta (#33831) 2024-04-18 19:05:24 +08:00
46fa64f34b [minor](Nereids): remove useless getFilterConjuncts() filter() in Translator (#33801) 2024-04-18 19:05:24 +08:00
3eca9da0dd [refactor](filesystem)refactor filesystem interface (#33361)
1. Rename `list` to `globList`. The path passed to `list` needs to contain a wildcard, and the corresponding HDFS interface is `globStatus`, hence the new name `globList`.
2. To list files by plain path, use the `listFiles` operation.
3. Merge the `listLocatedFiles` function into `listFiles`.
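The refactored interface might look roughly like this. This is a hedged sketch: the `RemoteFileSystem`/`InMemoryFs` names, the `String`-based paths, and the toy glob translation are assumptions for illustration, not the actual Doris interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch of the interface after the rename. Method names
// follow the commit; everything else is assumed for demonstration.
interface RemoteFileSystem {
    // The path pattern must contain a wildcard; maps to HDFS globStatus.
    List<String> globList(String pathPattern);

    // Plain listing by directory; the old listLocatedFiles use case is
    // folded into this operation.
    List<String> listFiles(String dir);
}

class InMemoryFs implements RemoteFileSystem {
    private final List<String> paths;

    InMemoryFs(List<String> paths) {
        this.paths = paths;
    }

    @Override
    public List<String> globList(String pathPattern) {
        // Translate a simple '*'-only glob to a regex for matching.
        Pattern p = Pattern.compile(
                pathPattern.replace(".", "\\.").replace("*", "[^/]*"));
        List<String> out = new ArrayList<>();
        for (String path : paths) {
            if (p.matcher(path).matches()) {
                out.add(path);
            }
        }
        return out;
    }

    @Override
    public List<String> listFiles(String dir) {
        List<String> out = new ArrayList<>();
        for (String path : paths) {
            if (path.startsWith(dir + "/")) {
                out.add(path);
            }
        }
        return out;
    }
}
```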
2024-04-18 19:05:24 +08:00
34a97d5e8b [fix](Nereids)fix unstable plan shape in limit_push_down case 2024-04-18 19:05:24 +08:00
75b47b7189 [opt](nereids)clear min/max column stats if table is partially analyzed (#33685) 2024-04-18 19:04:03 +08:00
a05d738b6c [fix](planner) create view statement should forbid mv rewrite (#33784) 2024-04-18 19:02:58 +08:00
Pxl
8c535c51b5 [Improvement](materialized-view) support multiple agg functions with the same base table slot (#33774)
2024-04-18 19:02:49 +08:00
4de357ccfb [Fix](Variant Type) forbid distribution info containing variant columns (#33707) 2024-04-18 19:02:37 +08:00
a57e0d3500 [Pick](nerids) pick #33010 #32982 #33531 to branch 2.1 (#33829) 2024-04-18 18:40:36 +08:00
20b37e7a18 Add workload group id in workload policy's property (#33483) 2024-04-17 23:42:14 +08:00
048448eb32 [fix](Nereids) dphyper support evaluate join that has one side condition (#33702) 2024-04-17 23:42:14 +08:00
461561fed0 [minor](Nereids): remove useless stream filter() in Translator (#33758) 2024-04-17 23:42:14 +08:00
4460d23cd9 [chore](variable) update nereids timeout second default value to 30 (#33749) 2024-04-17 23:42:14 +08:00
b5640ae763 [fix](restore) add indexes as part of table signature (#33650) 2024-04-17 23:42:14 +08:00
ffa0e57122 [enhancement](auditlog) ignore any errors in write audit log (#33750) 2024-04-17 23:42:14 +08:00
5555cc175f [feature](window function) Improve error handling for window functions (#33673) 2024-04-17 23:42:14 +08:00
c6d1d75ff2 [Fix](Json type) forbid schema change adding JSON columns with a non-null default value (#33686) 2024-04-17 23:42:14 +08:00
2648a92594 [FIX](load)fix load with split-by-string (#33713) 2024-04-17 23:42:14 +08:00
ff8cb3cc43 [Fix](executor)Fix routine load failed when can not find group (#33596) 2024-04-17 23:42:13 +08:00
b44fed8dc2 [fix](restore) Reset index id for restore (#33648) 2024-04-17 23:42:13 +08:00
5734e2bd30 [opt](meta-cache) refine the meta cache (#33449) (#33754)
bp #33449
2024-04-17 23:42:13 +08:00
2854048eb5 fix compile 2024-04-17 23:42:13 +08:00
43974a2334 (Fix)(nereids) modify create view privilege check error message (#33669) 2024-04-17 23:42:13 +08:00
db846709d2 [opt](Nereids) auto fallback when meet udf override (#33708) 2024-04-17 23:42:13 +08:00
81f7c53bad [fix](Nereids) could not query variant that not from table (#33704) 2024-04-17 23:42:13 +08:00
22a6b1d3f5 [feature](function) support hll functions hll_from_base64, hll_to_base64 (#32089)
Issue Number: #31320 

Support two hll functions:

- hll_from_base64
Converts a base64 string (the result of hll_to_base64) into an hll.
- hll_to_base64
Converts an input hll to a base64 string.
2024-04-17 23:42:13 +08:00
3096150d1b [feature](agg) support aggregate function group_array_intersect (#33265) 2024-04-17 23:42:13 +08:00
07a8f44443 [improvement](spill) improve config and fix spill bugs (#33519) 2024-04-17 23:42:13 +08:00
3f267e36d1 [fix](nereids)InSubquery's withChildren method lost typeCoercionExpr (#33692) 2024-04-17 23:42:13 +08:00
2890f6c3cf [opt](Nereids) date literal support basic format with timezone (#33662) 2024-04-17 23:42:13 +08:00
11266dd9b8 [minor](Nereids): remove useless override (#33651) 2024-04-17 23:42:13 +08:00
16e9eb3b05 [fix](analyze) avoid java.util.ConcurrentModificationException (#33674)
```
java.util.ConcurrentModificationException: null
        at java.util.TreeMap$ValueSpliterator.forEachRemaining(TreeMap.java:3226) ~[?:?]
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509) ~[?:?]
        at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) ~[?:?]
        at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921) ~[?:?]
        at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:?]
        at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682) ~[?:?]
        at org.apache.doris.statistics.AnalysisManager.findShowAnalyzeResult(AnalysisManager.java:552) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.statistics.AnalysisManager.showAnalysisJob(AnalysisManager.java:533) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.qe.ShowExecutor.handleShowAnalyze(ShowExecutor.java:2772) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.qe.ShowExecutor.execute(ShowExecutor.java:447) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.qe.StmtExecutor.handleShow(StmtExecutor.java:2738) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.qe.StmtExecutor.executeByLegacy(StmtExecutor.java:1010) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:624) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.qe.StmtExecutor.execute(StmtExecutor.java:526) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.qe.ConnectProcessor.executeQuery(ConnectProcessor.java:333) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.qe.ConnectProcessor.handleQuery(ConnectProcessor.java:228) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.qe.MysqlConnectProcessor.handleQuery(MysqlConnectProcessor.java:176) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.qe.MysqlConnectProcessor.dispatch(MysqlConnectProcessor.java:205) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.qe.MysqlConnectProcessor.processOnce(MysqlConnectProcessor.java:258) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.mysql.ReadListener.lambda$handleEvent$0(ReadListener.java:52) ~[doris-fe.jar:1.2-SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
        at java.lang.Thread.run(Thread.java:840) ~[?:?]
```

This follows from the Javadoc of `Collections.synchronizedNavigableMap`:

```
In order to guarantee serial access, it is critical that all access to the backing navigable map is accomplished through the returned navigable map (or its views).
It is imperative that the user manually synchronize on the returned navigable map when traversing any of its collection views, or the collections views of any of its subMap, headMap or tailMap views, via Iterator, Spliterator or Stream
```
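A minimal Java illustration of the required pattern (generic names, not the actual `AnalysisManager` code): copy the values while holding the synchronized map's own monitor, then iterate the snapshot without the lock.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Traversing a map returned by Collections.synchronizedNavigableMap is
// only safe while synchronized on the returned map itself.
class SafeTraversal {
    static List<String> snapshotValues(NavigableMap<Long, String> syncMap) {
        // Streaming syncMap.values() directly can throw
        // ConcurrentModificationException if another thread mutates the
        // map, because Stream traversal does not hold the map's lock.
        synchronized (syncMap) {
            // Copy under the lock, then work on the snapshot lock-free.
            return new ArrayList<>(syncMap.values());
        }
    }

    static NavigableMap<Long, String> newSyncMap() {
        return Collections.synchronizedNavigableMap(new TreeMap<>());
    }
}
```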
2024-04-17 23:42:13 +08:00
ca728a2405 [feature](proc)Add table's indexes info in show proc interface (#33438)
1. Add the show proc `/dbs/db_id/table_id/indexes` implementation
2. Remove index_id from `show index from table`
3. Add test cases

---------

Co-authored-by: Luennng <luennng@gmail.com>
2024-04-17 23:42:13 +08:00
dac2829194 [fix](routine-load) fix data lost when FE leader change (#33678) 2024-04-17 23:42:13 +08:00
d15981abd2 [Enhancement](Nereids) add rule of agg(case when) to agg(filter) (#33598) 2024-04-17 23:42:13 +08:00
1fba73eea4 [fix](fe) Fix finalizeCommand sendAndFlush NullPointerException (#33420) 2024-04-17 23:42:13 +08:00
8e38549a92 [fix](nereids) Use correct PREAGGREGATION in agg(filter(scan)) (#33454)
1. set `PreAggStatus` to `ON` when aggregating a key column with max or min;
2. #28747 may change the scan's `PreAggStatus`, so inherit it from the previous one.
2024-04-17 23:42:13 +08:00
d18f5e2544 [refactor](refresh-catalog) refactor the refresh catalog code (#33653)
To unify the code.
Previously, catalog refresh was done in `CatalogMgr`, but database and
table refresh in `RefreshMgr`, which was confusing.

This PR moves all refresh-related code from `CatalogMgr` to `RefreshMgr`.

No logic is changed in this PR.
2024-04-17 23:42:12 +08:00