Commit Graph

168 Commits

Author SHA1 Message Date
fbc82e0253 [opt](log) refine the BE logger (#35942) (#35988)
bp #35942
2024-06-06 22:25:22 +08:00
373d9ab988 [enhance](mtmv)add truncate table case (#35599)
truncate table or truncate partition,mtmv should can detect data change
2024-05-30 19:59:37 +08:00
a03309c7e5 Revert "[regression-test](framework) add log for scpFiles (#35570)"
This reverts commit 6fd3916168af24357e12bc1b36cdbeb64985bbf7.
2024-05-30 11:24:29 +08:00
eefea4c7e6 [fix](mtmv) Fix partition mv rewrite result wrong (#35236)
this is brought by https://github.com/apache/doris/pull/33800
if mv is partitioned materialzied view,
the data will be wrong by using the hited materialized view when the
paritions in related base partiton table are deleted, created and so on.
this fix the problem.

if **SET enable_materialized_view_union_rewrite=true;** this will use
the materializd view and make sure the data is corrent
if **SET enable_materialized_view_union_rewrite=false;** this will query
base table directly to make sure the data is right
2024-05-29 20:30:23 +08:00
6fd3916168 [regression-test](framework) add log for scpFiles (#35570) 2024-05-29 15:06:46 +08:00
473e14ca82 [chore](backup) log backup/restore job during replay (#35234) 2024-05-24 16:23:57 +08:00
853dbdcb00 [Feature](PreparedStatement) implement general server side prepared (#33807) 2024-05-10 22:10:11 +08:00
3ae3f9d6e1 [opt](catalog) support using loading cache for db/table list in external catalog (#33610) (#34596)
bp #33610
2024-05-09 17:50:39 +08:00
222289697d [improve](regression) Support qt_target_sql (#34236) (#34270) 2024-04-29 11:50:35 +08:00
50f9d47e96 [test](hive) run suite cases both in hive2 and hive3 (#33874) (#34156)
bp #33874

Co-authored-by: 苏小刚 <suxiaogang223@icloud.com>
2024-04-26 13:48:09 +08:00
450f443413 [fix](decommission) fix cann't decommission mtmv (#33823) 2024-04-25 12:01:44 +08:00
60253c827c [fix](nereids) do not push RF into nested cte (#33769) 2024-04-20 20:08:00 +08:00
15f8014e4e [enhancement](Nereids) Enable parse sql from sql cache and fix some bugs (#33867)
* [enhancement](Nereids) Enable parse sql from sql cache (#33262)

Before this pr, the query must pass through parser, analyzer, rewriter, optimizer and translator, then we can check whether this query can use sql cache, if the query is too long, or the number of join tables too big, the plan time usually >= 500ms.

This pr reduce this time by skip the fashion plan path, because we can reuse the previous physical plan and query result if no any changed. In some cases we should not parse sql from sql cache, e.g. table structure changed, data changed, user policies changed, privileges changed, contains non-deterministic functions, and user variables changed.

In my test case: query a view which has lots of join and union, and the tables has empty partition, the query latency is about 3ms. if not parse sql from sql cache, the plan time is about 550ms

## Features
1. use Config.sql_cache_manage_num to control how many sql cache be reused in on fe
2. if explain plan appear some plans contains `LogicalSqlCache` or `PhysicalSqlCache`, it means the query can use sql cache, like this:
```sql
mysql> set enable_sql_cache=true;
Query OK, 0 rows affected (0.00 sec)

mysql> explain physical plan select * from test.t;
+----------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                  |
+----------------------------------------------------------------------------------+
| cost = 3.135                                                                     |
| PhysicalResultSink[53] ( outputExprs=[c1#0, c2#1] )                              |
| +--PhysicalDistribute[50]@0 ( stats=3, distributionSpec=DistributionSpecGather ) |
|    +--PhysicalOlapScan[t]@0 ( stats=3 )                                          |
+----------------------------------------------------------------------------------+
4 rows in set (0.02 sec)

mysql> select * from test.t;
+------+------+
| c1   | c2   |
+------+------+
|    1 |    2 |
|   -2 |   -2 |
| NULL |   30 |
+------+------+
3 rows in set (0.05 sec)

mysql> explain physical plan select * from test.t;
+-------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                           |
+-------------------------------------------------------------------------------------------+
| cost = 0.0                                                                                |
| PhysicalSqlCache[2] ( queryId=78511f515cda466b-95385d892d6c68d0, backend=127.0.0.1:9050 ) |
| +--PhysicalResultSink[52] ( outputExprs=[c1#0, c2#1] )                                    |
|    +--PhysicalDistribute[49]@0 ( stats=3, distributionSpec=DistributionSpecGather )       |
|       +--PhysicalOlapScan[t]@0 ( stats=3 )                                                |
+-------------------------------------------------------------------------------------------+
5 rows in set (0.01 sec)
```

(cherry picked from commit 03bd2a337d4a56ea9c91673b3bd4ae518ed10f20)

* fix

* [fix](Nereids) fix some sql cache consistence bug between multiple frontends (#33722)

fix some sql cache consistence bug between multiple frontends which introduced by [enhancement](Nereids) Enable parse sql from sql cache #33262, fix by use row policy as the part of sql cache key.
support dynamic update the num of fe manage sql cache key

(cherry picked from commit 90abd76f71e73702e49794d375ace4f27f834a30)

* [fix](Nereids) fix bug of dry run query with sql cache (#33799)

1. dry run query should not use sql cache
2. fix test sql cache in cloud mode
3. enable cache OneRowRelation and EmptyRelation in frontend to skip parse sql

(cherry picked from commit dc80ecf7f33da7b8c04832dee88abd09f7db9ffe)

* remove cloud mode

* remove @NotNull
2024-04-19 15:22:14 +08:00
46a258dc85 [improvement](binlog)Support inverted index format v2 in CCR (#33415) 2024-04-17 23:42:12 +08:00
9a3b19d21e [fix](cases) Add check status timeout for backup/restore cases (#32975) 2024-04-11 13:10:24 +08:00
0e99926b28 (httpaction) log response of http (#33270) 2024-04-10 15:58:14 +08:00
ff990eb869 [enhancement](Nereids) refactor expression rewriter to pattern match (#32617)
this pr can improve the performance of the nereids planner, in plan stage.

1. refactor expression rewriter to pattern match, so the lots of expression rewrite rules can criss-crossed apply in a big bottom-up iteration, and rewrite until the expression became stable. now we can process more cases because original there has no loop, and sometimes only process the top expression, like `SimplifyArithmeticRule`.
2. replace `Collection.stream()` to `ImmutableXxx.Builder` to avoid useless method call
3. loop unrolling some codes, like `Expression.<init>`, `PlanTreeRewriteBottomUpJob.pushChildrenJobs`
4. use type/arity specified-code, like `OneRangePartitionEvaluator.toNereidsLiterals()`, `PartitionRangeExpander.tryExpandRange()`, `PartitionRangeExpander.enumerableCount()`
5. refactor `ExtractCommonFactorRule`, now we can extract more cases, and I fix the deed loop when use `ExtractCommonFactorRule` and `SimplifyRange` in one iterative, because `SimplifyRange` generate right deep tree, but `ExtractCommonFactorRule` generate left deep tree
6. refactor `FoldConstantRuleOnFE`, support visitor/pattern match mode, in ExpressionNormalization, pattern match can criss-crossed apply with other rules; in PartitionPruner, visitor can evaluate expression faster
7. lazy compute and cache some operation
8. use int field to compare date
9. use BitSet to find disableNereidsRules
10. two level loop usually faster then build Multimap when bind slot in Scope, so I revert the code
11. `PlanTreeRewriteBottomUpJob` don't need to clearStatePhase any more

### test case
100 threads parallel continuous send this sql which query an empty table, test in my mac machine(m2 chip, 8 core), enable sql cache
```sql
select  count(1),date_format(time_col,'%Y%m%d'),varchar_col1
from tbl
where  partition_date>'2024-02-15'  and (varchar_col2 ='73130' or varchar_col3='73130') and time_col>'2024-03-04'
  and  time_col<'2024-03-05'
group by date_format(time_col,'%Y%m%d'),varchar_col1
order by date_format(time_col,'%Y%m%d') desc, varchar_col1 desc,count(1) asc
limit 1000
```

before this pr: 3100 peak QPS, about 2700 avg QPS
after this pr: 4800 peak QPS, about 4400 avg QPS

(cherry picked from commit 7338683fdbdf77711f2ce61e580c19f4ea100723)
2024-04-10 14:59:45 +08:00
a7c8abe58c [feature](nereids) support common sub expression by multi-layer projections (fe part) (#33087)
* cse fe part
2024-04-10 14:53:56 +08:00
39b768b9ea [branch-2.1](thirdparty) upgrade arrow to 15.0.2 #32827 2024-03-27 08:35:40 +08:00
ea8d4f2d0b [fix][regression]update ccr test project (#32445) 2024-03-21 14:07:24 +08:00
9767861ee9 [enhancement](test) opt the unique model by modify value (#32043) 2024-03-15 18:05:35 +08:00
5f75b36ad3 [regression](framework) add config caseNamePrefix (#32266) 2024-03-14 21:48:28 +08:00
cf04c9c300 [enhancement](Nereids) refine and speedup analyzer (#31792) (#32111)
## Proposed changes
1. check data type whether can applied should not throw exception when real data type is subclass of signature data type
2. merge `SlotBinder` and `FunctionBinder` to `ExpressionAnalyzer` to skip rewrite the whole expression tree multiple times.
3. `ExpressionAnalyzer.buildCustomSlotBinderAnalyzer()` provide more refined code to bind slot by different parts and different priority
4. the origin slot binder has O(n^2) complexity, this pr use `Scope.nameToSlot` to support O(n) bind
5. modify some `Collection.stream()` to `ImmutableXxx.builder()` to remove some method call which are difficult to inline by jvm in the hot path, e.g. `Expression.<init>` and `AbstractTreeNode.<init>`
6. modify some `ImmutableXxx.copyOf(xxx)` to `Utils.fastToImmutableList(xxx)` to skip addition copy of the array
7. set init size to `Immutable.builder()` to skip some useless resize
8. lazy compute and cache some heavy operations, like `Scope.nameToSlot` and `CaseWhen.computeDataTypesForCoercion()`

(cherry picked from commit 83c2f5a95827136aac4f0a78c5e841e9a099858c)
2024-03-12 17:09:38 +08:00
21e412393b [enhancement] add_method_for_schemachange (#31849) 2024-03-09 19:45:50 +08:00
cc0e58faec [enhancement](regression-test) upgrade groovy to 4.x and enable run test by jdk17/21 (#31906)
upgrade groovy to 4.x and enable run test by jdk17 / 21
2024-03-09 19:45:46 +08:00
07224686ef [feature](jdbc catalog) support db2 jdbc catalog (#31627) 2024-03-01 14:19:28 +08:00
5276cc4db6 [docker][fix] update routine load cases (#31553)
Co-authored-by: 胥剑旭 <xujianxu@xujianxudeMacBook-Pro.local>
2024-02-29 16:44:39 +08:00
22b6434054 [feature](doris compose) Add create cloud cluster (#31315) 2024-02-27 10:12:19 +08:00
a6c0be611c [fix](mtmv) fix mtmv workload group case failed (#31218) 2024-02-22 19:51:20 +08:00
98c3cb825f [fix](Nereids) simplify airthmetic should not change return type (#31237) 2024-02-22 19:50:47 +08:00
1d6dc9a5f0 [fix](catalog recycle bin) Forbid recover partition if table schame is changed #31146 2024-02-21 13:53:18 +08:00
fd453ace38 [enhancement](sc-test) Optimize waitForSchemeChangeDone (#31002) 2024-02-20 16:23:53 +08:00
08c196f3dc [enhancement](stmt-forward) record query result for proxy query to avoid EOF (#30536) 2024-02-16 10:12:25 +08:00
d1bb63ed67 [fix](arrow-flight) Modify FE Arrow version to 15.0.0 #30824 2024-02-05 21:56:57 +08:00
ccbcf879b5 [test](mtmv) Add materialized view availability regression test (#30769)
Add materialized view availability regression test

when mv refresh_time is in the grace_period(unit is second), materialized view will be use to
query rewrite regardless of the base table is update or not
when mv refresh_time is out of the grace_period(unit is second), will check the base table is update or not
if update the materialized view will not be used to query rewrite
2024-02-04 22:21:16 +08:00
b275cb0f44 [feature](mtmv) mtmv support workload group (#29595)
MTMV supports controlling the resource usage of refresh tasks by setting the name of workload group
about workload group : https://doris.apache.org/zh-CN/docs/dev/admin-manual/workload-group
2024-02-04 14:28:38 +08:00
ecf282ca92 [improve](catalog recycle bin) show data size info when show catalog recycle bin (#30592) 2024-02-01 19:00:51 +08:00
1f754c55d5 [chore](show replica) show replica print path (#30402) 2024-02-01 19:00:50 +08:00
3354ac48f7 [enhance](mtmv)add version and version time for table (#30437)
Add version to record data changes in the table

Scope of impact: 

- Transaction related operations
- drop partition
- replace partition
2024-01-29 19:03:47 +08:00
49f879f8fd [regression test](framework) add waitFor action (#30289) 2024-01-25 13:24:52 +08:00
48d7c1b1ed [test](regression-test) fix case, compatible with 3 replicas. (#29905) 2024-01-18 10:04:21 +08:00
697a6a4ba2 [Refactor](admin-stmt) rename some admin-show statestmt (#29492)
The `ADMIN SHOW` statement can not be executed with high version of mysql 8.x jdbc driver.
So I rename these statement, remove the `ADMIN` keywords.

1. ADMIN SHOW CONFIG -> SHOW CONFIG
2. ADMIN SHOW REPLICA -> SHOW REPLICA
3. ADMIN DIAGNOSE TABLET -> SHOW TABLET DIAGNOSIS
4. ADMIN SHOW TABLET -> SHOW TABLET

for compatibility, the old statements are still supported, but not recommend to use.
They will be removed in later version
2024-01-12 11:53:57 +08:00
3f988e1b7f [improvement](doris compose) regression test auto install doris compose requirements (#29012) 2024-01-06 20:20:38 +08:00
5985d216f3 [feature](mtmv)support cancel mtmv task command (#29252)
- `CANCEL MATERIALIZED VIEW TASK taskId on mvName`
- CANCEL MATERIALIZED VIEW TASK, tasks("type"="mv") and jobs("type"="mv") support check auth use priv of mv
- tasks and jobs add column mvName and mvDbName,you can use `select * from tasks("type"="mv") where MvName="xxx"` get all tasks of mv
- fix `desc mv all` error
- fix p0 The task sequence is incorrect
2023-12-31 23:10:30 +08:00
7604401b06 [Enhance](regression)Do path creation ahead of time for case test_export_external_table (#28616)
Do path creation ahead of time for case test_export_external_table
2023-12-29 17:59:20 +08:00
24cdc00b60 [improvement](doris compose) avoid docker network conflict (#28957) 2023-12-29 15:05:28 +08:00
6ed09c3f46 [fix](mtmv)add log for resolve pending task (#29078) 2023-12-27 12:36:17 +08:00
eb50db1f3f [fix](regression) fix test_set_replica_status due to force_olap_table_replication_num=3 (#28573) 2023-12-20 09:27:18 +08:00
f9ddf8c7ef [improvement](be report) add be report http (#28424) 2023-12-19 10:39:19 +08:00
a267a7fbe3 [fix](arrow-flight) Modify regression test Arrow version to 14.0.1 (#28472)
Same as #28093
2023-12-18 14:36:59 +08:00