Commit Graph

45 Commits

Author SHA1 Message Date
84af8e0a53 [enhance](mtmv)mtmv support hive default partition (#32051) 2024-03-12 22:51:11 +08:00
667b1fba04 [enhance](mtmv) MTMV Use partial partition of base table (#31632)
MTMV adds 3 properties:
- partition_sync_limit: an integer
- partition_sync_time_unit: DAY/MONTH/YEAR
- partition_sync_date_format: for example "%Y-%m-%d" or "%Y%m%d"

For example, if the current time is 2020-02-03 20:10:10:
- If partition_sync_limit is 1 and partition_sync_time_unit is DAY, only partitions with a time greater than or equal to 2020-02-03 00:00:00 are synchronized to the MTMV
- If partition_sync_limit is 1 and partition_sync_time_unit is MONTH, only partitions with a time greater than or equal to 2020-02-01 00:00:00 are synchronized to the MTMV
- If partition_sync_limit is 1 and partition_sync_time_unit is YEAR, only partitions with a time greater than or equal to 2020-01-01 00:00:00 are synchronized to the MTMV
- If partition_sync_limit is 3 and partition_sync_time_unit is MONTH, only partitions with a time greater than or equal to 2019-12-01 00:00:00 are synchronized to the MTMV
- If partition_sync_limit is 4 and partition_sync_time_unit is DAY, only partitions with a time greater than or equal to 2020-01-31 00:00:00 are synchronized to the MTMV
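All five examples fit one rule: truncate the current time to the start of the unit, then step back `partition_sync_limit - 1` units. The Python sketch below encodes that inferred rule; `sync_lower_bound` and `should_sync` are hypothetical names, and it assumes the `%Y`/`%m`/`%d` specifiers in `partition_sync_date_format` mean the same thing as in Python's `strptime`.

```python
from datetime import datetime, timedelta


def sync_lower_bound(now: datetime, limit: int, unit: str) -> datetime:
    """Earliest partition time that is synchronized (rule inferred from the examples)."""
    if limit < 1:
        raise ValueError("partition_sync_limit must be >= 1")
    if unit == "DAY":
        day_start = datetime(now.year, now.month, now.day)
        return day_start - timedelta(days=limit - 1)
    if unit == "MONTH":
        # Count months from year 0 so stepping back can cross a year boundary.
        months = now.year * 12 + (now.month - 1) - (limit - 1)
        return datetime(months // 12, months % 12 + 1, 1)
    if unit == "YEAR":
        return datetime(now.year - (limit - 1), 1, 1)
    raise ValueError(f"unsupported partition_sync_time_unit: {unit}")


def should_sync(partition_value: str, date_format: str, now: datetime,
                limit: int, unit: str) -> bool:
    # Parse the partition value with the configured format and compare it
    # against the lower bound.
    partition_time = datetime.strptime(partition_value, date_format)
    return partition_time >= sync_lower_bound(now, limit, unit)


now = datetime(2020, 2, 3, 20, 10, 10)
print(sync_lower_bound(now, 3, "MONTH"))                 # 2019-12-01 00:00:00
print(should_sync("20200130", "%Y%m%d", now, 4, "DAY"))  # False
```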
2024-03-07 16:53:49 +08:00
7a9fe5d275 [enhance](mtmv)MTMV supports Hive multi-level partitioning (#31060)
For example, the hive table is partitioned by `date` and `region`, with the following 6 partitions
```
20200101
        beijing
        shanghai
20200102
        beijing
        shanghai
20200103
        beijing
        shanghai
```

If the MTMV is partitioned by `date`, then the MTMV will have three partitions: 20200101, 20200102, 20200103

If the MTMV is partitioned by `region`, then the MTMV will have two partitions: beijing, shanghai
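A minimal Python sketch of the mapping above, assuming each Hive partition is modeled as a dict of column values and that the MTMV partition for a value covers every Hive partition sharing that value (`mtmv_partitions` is a hypothetical name):

```python
# The six Hive partitions from the example, as (date, region) column values.
hive_partitions = [
    {"date": d, "region": r}
    for d in ("20200101", "20200102", "20200103")
    for r in ("beijing", "shanghai")
]


def mtmv_partitions(partitions, partition_col):
    """Group Hive partitions by the MTMV partition column, in first-seen order."""
    groups = {}
    for p in partitions:
        groups.setdefault(p[partition_col], []).append(p)
    return groups


print(list(mtmv_partitions(hive_partitions, "date")))    # ['20200101', '20200102', '20200103']
print(list(mtmv_partitions(hive_partitions, "region")))  # ['beijing', 'shanghai']
```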
2024-02-25 18:08:19 +08:00
a6c0be611c [fix](mtmv) fix mtmv workload group case failed (#31218) 2024-02-22 19:51:20 +08:00
4c34ebb1cf [fix](mtmv)Fix the case failure issue caused by the same catalog name #31058 2024-02-20 09:12:38 +08:00
de1724ab6a [case](mtmv) MTMV hive case (#30930) 2024-02-16 10:12:24 +08:00
3017c0a6ff [enhance](mtmv) Limit the number of partitions for table creation (#30867)
- Creating too many partitions is time-consuming, so the number of partitions is now limited
- Add more cases, such as `mor` and `mow`
2024-02-16 10:12:24 +08:00
fc762f426b [enhance](mtmv) mtmv disable hive auto refresh (#30775)
- If the `related table` is `hive`, do not refresh automatically
- If the `related table` is `hive`, the partition column may be `null`; otherwise, it must be `not null`
- Add more unit tests (`ut`)
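The two Hive rules above can be sketched as plain checks. The type string `"hive"`, the `PartitionColumn` class, and both helper names are hypothetical; this only illustrates the rules, not how Doris models related tables.

```python
from dataclasses import dataclass


@dataclass
class PartitionColumn:
    name: str
    nullable: bool


def auto_refresh_enabled(related_table_type: str) -> bool:
    # An MTMV whose related table is Hive is not refreshed automatically.
    return related_table_type != "hive"


def validate_partition_column(related_table_type: str, col: PartitionColumn) -> None:
    # Only a Hive related table may have a nullable partition column.
    if col.nullable and related_table_type != "hive":
        raise ValueError(f"partition column `{col.name}` must be NOT NULL")
```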
2024-02-05 21:56:57 +08:00
b275cb0f44 [feature](mtmv) mtmv support workload group (#29595)
MTMV supports controlling the resource usage of refresh tasks by setting the name of a workload group.
For more about workload groups, see: https://doris.apache.org/zh-CN/docs/dev/admin-manual/workload-group
2024-02-04 14:28:38 +08:00
658c869aac [improvement](mtmv)mtmv support partition by hms table (#29989) 2024-01-29 19:02:46 +08:00
0b16938b7f [Fix](Nereids) Fix datatype length wrong when string contains chinese (#29885)
When a varchar literal contains Chinese characters, the varchar length should not be the number of characters but the number of bytes actually used.
Chinese characters are Unicode characters, and each occupies at most 4 bytes. So if a varchar literal contains Chinese, its length is set to 4 * length.

For example:
>        CREATE MATERIALIZED VIEW test_varchar_literal_mv
>             BUILD IMMEDIATE REFRESH AUTO ON MANUAL
>             DISTRIBUTED BY RANDOM BUCKETS 2
>             PROPERTIES ('replication_num' = '1')
>             AS
>             select case when l_orderkey > 1 then "一二三四" else "五六七八" end as field_1 from lineitem;

The definition of the materialized view is as follows:
mysql> desc test_varchar_literal_mv;
+---------+-------------+------+-------+---------+-------+
| Field   | Type        | Null | Key   | Default | Extra |
+---------+-------------+------+-------+---------+-------+
| field_1 | VARCHAR(16) | No   | false | NULL    | NONE  |
+---------+-------------+------+-------+---------+-------+
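The length rule can be checked in a few lines of Python. This sketch assumes that "contains Chinese" is approximated by "contains any non-ASCII character" (the text does not say how Chinese is detected), and `varchar_literal_length` is a hypothetical name. It also compares the declared length with the real UTF-8 byte count.

```python
def varchar_literal_length(literal: str) -> int:
    # Assumption: any non-ASCII character triggers the 4-bytes-per-character rule.
    if any(ord(ch) > 127 for ch in literal):
        return 4 * len(literal)
    return len(literal)


print(varchar_literal_length("一二三四"))  # 16, matching VARCHAR(16) above
print(len("一二三四".encode("utf-8")))     # 12 bytes actually used
print(varchar_literal_length("abcd"))     # 4
```

Because UTF-8 never uses more than 4 bytes per character, `4 * length` is always an upper bound on the encoded size, at the cost of over-reserving for 3-byte CJK characters.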
2024-01-16 18:31:59 +08:00
0c7c9485b6 [Fix](nereids) Fix get ralated partition table when nodata (#29453)
Support creating a partitioned materialized view on a table that has no data.
For example, given the following table definition:
>        CREATE TABLE `test_no_data` (
>         `user_id` LARGEINT NOT NULL COMMENT '"用户id"',
>         `date` DATE NOT NULL COMMENT '"数据灌入日期时间"',
>         `num` SMALLINT NOT NULL COMMENT '"数量"'
>        ) ENGINE=OLAP
>        DUPLICATE KEY(`user_id`, `date`, `num`)
>        COMMENT 'OLAP'
>        PARTITION BY RANGE(`date`)
>        (PARTITION p201701_1000 VALUES [('0000-01-01'), ('2017-02-01')),
>        PARTITION p201702_2000 VALUES [('2017-02-01'), ('2017-03-01')),
>        PARTITION p201703_all VALUES [('2017-03-01'), ('2017-04-01')))
>        DISTRIBUTED BY HASH(`user_id`) BUCKETS 2
>        PROPERTIES ('replication_num' = '1') ;

Even when table test_no_data has no data, a partitioned materialized view can be created on it as follows:
>        CREATE MATERIALIZED VIEW no_data_partition_mv
>            BUILD IMMEDIATE REFRESH AUTO ON MANUAL
>            partition by(`date`)
>            DISTRIBUTED BY RANDOM BUCKETS 2
>            PROPERTIES ('replication_num' = '1')
>            AS
>           SELECT * FROM test_no_data where date > '2017-05-01';
>
2024-01-12 11:44:21 +08:00
5985d216f3 [feature](mtmv)support cancel mtmv task command (#29252)
- `CANCEL MATERIALIZED VIEW TASK taskId on mvName`
- CANCEL MATERIALIZED VIEW TASK, tasks("type"="mv") and jobs("type"="mv") check authorization using the privileges on the MV
- tasks and jobs gain the columns mvName and mvDbName; use `select * from tasks("type"="mv") where MvName="xxx"` to get all tasks of an MV
- Fix the `desc mv all` error
- Fix the incorrect task sequence in p0
2023-12-31 23:10:30 +08:00
7434de9ed8 [improvement](nereids) Get partition related table disable nullable field and complete agg matched pattern mv rules. (#28973)
* [improvement] (nereids) Disable nullable fields when getting the partition-related table, modify the regression tests, and complete the agg MV rules.

* Make the field not null so that a partitioned MV can be created
2023-12-26 00:29:42 +08:00
66b14f4db1 [fix](mtmv)fix can not create mtmv all use default value (#28922) 2023-12-23 21:27:01 +08:00
0a1d9f4cbc [feature](mtmv)add more test case1 (#28910) 2023-12-23 14:39:44 +08:00
623257d02b [feature](mtmv)MTMV pause and resume (#28887)
- PAUSE MATERIALIZED VIEW JOB ON mv1
- RESUME MATERIALIZED VIEW JOB ON mv1
- Fix: dropping a database did not drop its jobs
- Add a lock so that one materialized view can only run one task at a time
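One way to enforce "one task at a time per materialized view" is a per-MV non-blocking lock. This is a hypothetical Python sketch (`MVTaskLocks`, `try_start`, and `finish` are illustrative names); the text does not say whether a second task waits or is rejected, and this sketch rejects it.

```python
import threading


class MVTaskLocks:
    """At most one running refresh task per materialized view."""

    def __init__(self) -> None:
        self._guard = threading.Lock()  # protects the lock registry itself
        self._locks = {}

    def _lock_for(self, mv_name: str) -> threading.Lock:
        with self._guard:
            return self._locks.setdefault(mv_name, threading.Lock())

    def try_start(self, mv_name: str) -> bool:
        # Non-blocking: returns False if a task for this MV is already running.
        return self._lock_for(mv_name).acquire(blocking=False)

    def finish(self, mv_name: str) -> None:
        self._lock_for(mv_name).release()
```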
2023-12-23 14:30:54 +08:00
3d2b4ae244 [fix](mtmv) fix failed to specify the number of buckets when bucket auto (#28854)
- Fix failure to specify the number of buckets when bucketing is auto
- Delete an unused SessionVariable
- If the MTMV uses an external table, check `isMaterializedViewRewriteEnableContainForeignTable`
2023-12-23 09:26:16 +08:00
d1e1619e89 [feature](mtmv)mtmv partition refresh case (#28787) 2023-12-22 14:03:31 +08:00
38e79e32fa [fix](mtmv)fix start time can not be earlier than the current time (#28379) 2023-12-14 17:28:04 +08:00
b6722653cf [test](Job)Delete the JOB show syntax (now we use TVF) and add tvf case (#28058) 2023-12-07 10:17:52 +08:00
6074cddcf8 [feature](mtmv)add Job and task tvf (#27967)
Add:
select * from jobs("type"="mv");
select * from tasks("type"="mv");
select * from jobs("type"="insert");
select * from tasks("type"="insert");

Add a privilege check for mv_infos("database"="xxx");

Change JobType MTMV==>MV
2023-12-05 15:12:36 +08:00
3791de3cfa [feature](mtmv)(6)implement cancel method (#27541)
1. Implement the cancel task method
2. Fix `show create table` not displaying `comment`
2023-11-27 09:49:46 +08:00
dfe3a2dd01 [feature](mtmv)(3)Implementing multi table materialized views (#26146)
Main classes:
- MTMVService: MTMV services for other modules to call
- MTMVHookService: all operations that affect the MTMV
  - MTMVJobManager: all operations that affect the MTMV job
  - MTMVCacheManager: all operations that affect the MTMV cache
- MTMVTask & MTMVJob: inherit from the job framework
2023-11-24 12:34:38 +08:00
2a74d9a8c8 [feature](mtmv)(1)remove old mtmv code (#26041)
Remove the old MTMV code; MTMV will be implemented in a new way.
2023-10-30 19:49:45 +08:00
3c18ed4e86 [test](fix) remove unused test case test_mtmv_ssb_ddl.groovy (#24434)
* forbid: test_mtmv_ssb_ddl

* remove: test_mtmv_ssb_ddl.groovy
2023-09-15 15:02:31 +08:00
7b93b26b8c [feature-wip](MTMV) optimize lock of mtmv job & task, to avoid dead lock (#21054) 2023-06-27 16:23:50 +08:00
4bee226698 [fix](regression-test) fix compile test_vertical_compaction_agg_keys failed (#20792)
Fix the compilation failure of test_vertical_compaction_agg_keys.
2023-06-14 23:25:17 +08:00
daf18a4b0e [fix](MTMV) Support refreshing data manually (#20108) 2023-06-12 17:57:06 +08:00
325ddab34e [conf](pipeline) turn pipeline on by default (#20458) 2023-06-08 09:20:51 +08:00
09d98c1663 [BugFix](MTMV)Set enable_mtmv_scheduler_framework master only to avoid regression fail (#18473)
Set enable_mtmv_scheduler_framework as master-only to avoid regression failures.
2023-04-09 08:47:18 +08:00
Pxl
0a4381197a [Bug](MTMV) fix waitingMTMVTaskFinished failed at test_mtmv_ssb_ddl (#18373)
Fix waitingMTMVTaskFinished failing in test_mtmv_ssb_ddl.
2023-04-05 11:04:41 +08:00
55bf38dbab [feature-wip](MTMV) Use SSB ddl to test (#18150)
Add regression tests for MTMV.
2023-03-30 00:11:38 +08:00
a65616a5cd [enhancement](MTMV) Add a timeout for regression tests (#18048)
MTMV regression tests may loop forever due to potential bugs, so a timeout is added to avoid an endless loop. The timeout is currently hard-coded to 30 minutes.
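The timeout guard can be sketched as a polling loop with a deadline. The actual regression tests are Groovy suites; this is only a Python illustration in which the state names `PENDING`/`RUNNING` and the `wait_for_task` helper are assumptions. The clock and sleep functions are injectable so the loop can be tested without waiting.

```python
import time

TIMEOUT_SECONDS = 30 * 60  # the hard-coded 30-minute timeout


def wait_for_task(get_state, timeout=TIMEOUT_SECONDS, poll_interval=1.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll until the task leaves the running states, or raise on timeout."""
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state not in ("PENDING", "RUNNING"):
            return state
        if clock() >= deadline:
            raise TimeoutError(f"task still {state} after {timeout} seconds")
        sleep(poll_interval)
```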
2023-03-24 10:39:42 +08:00
d3e7f12ada [refactor](Nereids) refactor column pruning (#17579)
This PR refactors column pruning using a visitor. The benefits:
1. It is easy to support column pruning for a new plan: implement the `OutputPrunable` interface if the plan contains output fields, or do nothing if it does not. There is no need to add new rules like `PruneXxxChildColumns`; only a few scenarios, such as pruning LogicalSetOperation and Aggregate, need to override the visit function with special logic.
2. Output fields can be shrunk in some plans, which skips useless operations and improves performance.

Example:
```sql
select id 
from (
  select id, sum(age)
  from student
  group by id
)a
```

The unused `sum(age)` in the aggregate should be pruned.
Before the refactor:
```
LogicalProject ( distinct=false, projects=[id#0], excepts=[], canEliminate=true )
+--LogicalSubQueryAlias ( qualifier=[a] )
   +--LogicalAggregate ( groupByExpr=[id#0], outputExpr=[id#0, sum(age#2) AS `sum(age)`#4], hasRepeat=false )
      +--LogicalProject ( distinct=false, projects=[id#0, age#2], excepts=[], canEliminate=true )
         +--LogicalOlapScan ( qualified=default_cluster:test.student, indexName=<index_not_selected>, selectedIndexId=10007, preAgg=ON )
```

After the refactor:
```
LogicalProject ( distinct=false, projects=[id#0], excepts=[], canEliminate=true )
+--LogicalSubQueryAlias ( qualifier=[a] )
   +--LogicalAggregate ( groupByExpr=[id#0], outputExpr=[id#0], hasRepeat=false )
      +--LogicalProject ( distinct=false, projects=[id#0], excepts=[], canEliminate=true )
         +--LogicalOlapScan ( qualified=default_cluster:test.student, indexName=<index_not_selected>, selectedIndexId=10007, preAgg=ON )
```
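The gain in this example comes from letting a node shrink its own output and then asking its child only for the columns that remain. A simplified Python model of that idea for the aggregate case follows; the `Aggregate` class and its methods are hypothetical and do not reflect the Nereids classes or the `OutputPrunable` interface.

```python
from dataclasses import dataclass


@dataclass
class Aggregate:
    """Hypothetical, simplified aggregate node."""
    group_by: list   # grouping column names
    outputs: dict    # output name -> input columns that expression reads

    def prune_outputs(self, required: set) -> "Aggregate":
        # Keep only the outputs that the parent plan actually references.
        kept = {name: cols for name, cols in self.outputs.items() if name in required}
        return Aggregate(self.group_by, kept)

    def child_required(self) -> set:
        # The child must still produce the grouping keys plus every column
        # read by the remaining output expressions.
        needed = set(self.group_by)
        for cols in self.outputs.values():
            needed.update(cols)
        return needed


agg = Aggregate(["id"], {"id": ["id"], "sum(age)": ["age"]})
print(sorted(agg.child_required()))  # ['age', 'id']
pruned = agg.prune_outputs({"id"})   # the outer query only selects `id`
print(pruned.outputs)                # {'id': ['id']}
print(pruned.child_required())       # {'id'}
```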
2023-03-24 09:00:48 +08:00
8df4a94826 [fix](MTMV) Tasks leak when dropping job (#17984)
1. Divide MTMV regression tests into 4 suites
2. Remove tasks that were killed by drop-job actions from the running map.
2023-03-21 23:22:17 +08:00
731ba93773 [fix](regression) fix regression case (#17846) 2023-03-16 14:33:16 +08:00
ffa1d4d96a [regression-test](mtmv) drop table and mv before running the case (#17802)
This avoids errors when the table or MV already exists.
2023-03-16 11:16:06 +08:00
96a3c60d3b [feature-wip](MTMV) Support alter statement (#16817)
Steps:
1. Drop the old MTMV jobs.
2. Clear the old task records and clean up the running and pending tasks.
3. Set the new scheduler info in the MTMV and replay it on followers.
4. Create a job on the master node.

Note that if you change the refresh info of MTMV, the old MTMV tasks will be cleaned.
2023-02-19 12:15:17 +08:00
1146bde695 [feature-wip](MTMV) Support refresh mtmv (#16218)
Support refreshing an MTMV manually with the following SQL, which generates an MTMV task immediately.

```
REFRESH MATERIALIZED VIEW test_mv_view [complete];
```

You can use `show mtmv task` to show the latest task.

This PR also clears the MTMV tasks when the MTMV is dropped, so that the test suite behaves correctly.
2023-02-04 20:17:45 +08:00
0842aa2947 [Fix](MTMV)Support master and follow change in multi fe for mtmv (#16149)
Support master/follower changes across multiple FEs for MTMV

This PR fixes the following issues:

1. Start the MTMV scheduler only on the master node; if the master changes to a follower, the scheduler is stopped.
2. Fix a duplicate metadata write.
3. Rename some edit log functions and variables.
4. If an MV has both a PeriodicalJob and an immediate job, and the PeriodicalJob is about to be triggered right now, the scheduler ignores the immediate job.
5. Fix expiration-time bugs, and make sure expired data is cleaned on all FEs.
6. Change the cleanerScheduler interval from 1 day to 1 minute.
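Items 1 and 4 amount to a small decision rule. A hypothetical Python sketch (the `MTMVScheduler` class and its methods here are illustrative, not the Doris FE classes):

```python
class MTMVScheduler:
    def __init__(self) -> None:
        self.running = False

    def on_role_change(self, is_master: bool) -> None:
        # Item 1: the scheduler runs only while this FE is the master.
        self.running = is_master

    def tasks_to_submit(self, periodic_due_now: bool, immediate_requested: bool) -> list:
        if not self.running:
            return []
        if periodic_due_now:
            # Item 4: a periodic run about to fire makes the immediate job redundant.
            return ["periodic"]
        return ["immediate"] if immediate_requested else []
```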
2023-02-01 20:02:46 +08:00
388d623506 [fix](MTMV) Refine the process of refreshing data (#16006)
1. Remove some redundant code.
2. Fix the issue with the state of MTMV task.
3. Fix the case - test_create_mtmv.

## Problem summary

1. We used a retry policy to re-run failed MTMV tasks, but set the state to `FAILURE` while the tasks were still being retried.
The state should be set only after all retries fail.
2. Some redundant code can be removed.
3. In the case test_create_mtmv, many background tasks were created to refresh the data. Some tasks may fail due to concurrency and cause the test to fail, although a single task is enough to verify the functionality.
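The corrected retry behavior from item 1 can be sketched as follows. The function name and the `RUNNING`/`SUCCESS` states are assumptions; only `FAILURE` comes from the text above.

```python
def run_with_retries(run_once, max_retries: int, record_state) -> bool:
    """Run a task with retries; record FAILURE only after every attempt fails."""
    record_state("RUNNING")
    for _ in range(1 + max_retries):
        try:
            run_once()
        except Exception:
            continue  # stay RUNNING while retries remain
        record_state("SUCCESS")
        return True
    record_state("FAILURE")
    return False
```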
2023-01-17 23:08:12 +08:00
fbe68e7ec8 [regression-test](MTMV) Make the case test_create_mtmv more robust (addendum) (#15909) 2023-01-13 22:51:47 +08:00
14e3879c4b [regression-test](MTMV) Make the case test_create_mtmv more robust (#15866)
## Proposed changes

1. Use the state of the MTMV task as the loop condition.
2. Check the data in the materialized view.

## Problem summary

There are some minor issues with #15546.
1. The case used a retry strategy as the loop condition, which may be unstable when the host machine is busy.
2. The case didn't check the final data in the materialized view.
2023-01-13 00:13:24 +08:00
5dc644769a [mtmv](regression-test) add mtmv write data regression test (#15546)
* [regression-test](mtmv) add mtmv write data regression test
2023-01-10 23:42:42 +08:00