The materialized view definition is as follows, and the query SQL is identical.
When the outer GROUP BY uses `col1`, the aggregate computed in the inner GROUP BY, the query can be rewritten by the materialized view.
select
t1.o_orderdate,
t1.o_orderkey,
t1.col1
from
(
select
o_orderkey,
o_custkey,
o_orderstatus,
o_orderdate,
sum(o_shippriority) as col1
from
orders
group by
o_orderkey,
o_custkey,
o_orderstatus,
o_orderdate
) as t1
left join lineitem on lineitem.l_orderkey = t1.o_orderkey
group by
t1.o_orderdate,
t1.o_orderkey,
t1.col1
The materialized view definition is:
> select
> sum(o_totalprice) as sum_total,
> max(o_totalprice) as max_total,
> min(o_totalprice) as min_total,
> count(*) as count_all,
> bitmap_union(to_bitmap(case when o_shippriority > 1 and o_orderkey IN (1, 3) then o_custkey else null end)) cnt_1,
> bitmap_union(to_bitmap(case when o_shippriority > 2 and o_orderkey IN (2) then o_custkey else null end)) as cnt_2
> from lineitem
> left join orders on l_orderkey = o_orderkey and l_shipdate = o_orderdate;
The following query can be rewritten by the materialized view above.
It uses arithmetic over aggregate functions in the select list:
> select
> count(distinct case when O_SHIPPRIORITY > 2 and o_orderkey IN (2) then o_custkey else null end) as cnt_2,
> (sum(o_totalprice) + min(o_totalprice)) * count(*),
> min(o_totalprice) + count(distinct case when O_SHIPPRIORITY > 2 and o_orderkey IN (2) then o_custkey else null end)
> from lineitem
> left join orders on l_orderkey = o_orderkey and l_shipdate = o_orderdate;
The materialized view definition is as follows:
> select l_shipdate, o_orderdate, l_partkey, l_suppkey, o_orderkey
> from lineitem
> left join (select * from orders where o_orderdate = '2023-12-10' ) t2
> on lineitem.l_orderkey = t2.o_orderkey;
For the following query, the rewrite should add the compensating filter `o_orderdate = '2023-12-10'` on top of the materialized view:
> select l_shipdate, o_orderdate, l_partkey, l_suppkey, o_orderkey
> from lineitem
> left join orders
> on lineitem.l_orderkey = orders.o_orderkey
> where o_orderdate = '2023-12-10' order by 1, 2, 3, 4, 5;
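The compensation described above can be checked on toy data. The following is a minimal sketch, not Doris itself: it uses SQLite through Python's `sqlite3`, models the materialized view as a plain table named `mv` (an assumed name), and uses made-up rows. It compares the original query with a scan of `mv` plus the compensating filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table lineitem (l_orderkey int, l_shipdate text, l_partkey int, l_suppkey int);
create table orders (o_orderkey int, o_orderdate text);
insert into lineitem values
  (1, '2023-12-10', 10, 100), (2, '2023-12-11', 20, 200), (3, '2023-12-12', 30, 300);
insert into orders values (1, '2023-12-10'), (2, '2023-12-11');
-- Stand-in for the materialized view: a table holding the view's result.
create table mv as
select l_shipdate, o_orderdate, l_partkey, l_suppkey, o_orderkey
from lineitem
left join (select * from orders where o_orderdate = '2023-12-10') t2
on lineitem.l_orderkey = t2.o_orderkey;
""")

# Original query against the base tables.
query_rows = conn.execute("""
select l_shipdate, o_orderdate, l_partkey, l_suppkey, o_orderkey
from lineitem left join orders on lineitem.l_orderkey = orders.o_orderkey
where o_orderdate = '2023-12-10' order by 1, 2, 3, 4, 5
""").fetchall()

# Rewritten form: scan the view and re-apply the filter as compensation.
rewritten_rows = conn.execute("""
select l_shipdate, o_orderdate, l_partkey, l_suppkey, o_orderkey
from mv where o_orderdate = '2023-12-10' order by 1, 2, 3, 4, 5
""").fetchall()

# Without the filter the view still holds the null-extended rows.
unfiltered_mv_rows = conn.execute("select * from mv").fetchall()
```

On this data the view holds three rows, two of them null-extended, so returning it unfiltered would give a different answer than the query; with the filter both forms return the same single row.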
Support query rewrite by materialized view when a join input contains an aggregate; the aggregate should be simple.
For example:
The materialized view definition is:
> select
> l_linenumber,
> count(distinct l_orderkey),
> sum(case when l_orderkey in (1,2,3) then l_suppkey * l_linenumber else 0 end),
> max(case when l_orderkey in (4, 5) then (l_quantity *2 + part_supp_a.qty_max) * 0.88 else 100 end),
> avg(case when l_partkey in (2, 3, 4) then l_discount + o_totalprice + part_supp_a.qty_sum else 50 end)
> from lineitem
> left join orders on l_orderkey = o_orderkey
> left join
> (select ps_partkey, ps_suppkey, sum(ps_availqty) qty_sum, max(ps_availqty) qty_max,
> min(ps_availqty) qty_min,
> avg(ps_supplycost) cost_avg
> from partsupp
> group by ps_partkey,ps_suppkey) part_supp_a
> on l_partkey = part_supp_a.ps_partkey
> and l_suppkey = part_supp_a.ps_suppkey
> group by l_linenumber;
A query like the following can be rewritten by the materialized view above:
> select
> l_linenumber,
> sum(case when l_orderkey in (1,2,3) then l_suppkey * l_linenumber else 0 end),
> avg(case when l_partkey in (2, 3, 4) then l_discount + o_totalprice + part_supp_a.qty_sum else 50 end)
> from lineitem
> left join orders on l_orderkey = o_orderkey
> left join
> (select ps_partkey, ps_suppkey, sum(ps_availqty) qty_sum, max(ps_availqty) qty_max,
> min(ps_availqty) qty_min,
> avg(ps_supplycost) cost_avg
> from partsupp
> group by ps_partkey,ps_suppkey) part_supp_a
> on l_partkey = part_supp_a.ps_partkey
> and l_suppkey = part_supp_a.ps_suppkey
> group by l_linenumber;
1. Make sure a new instance is created when the params of `StructInfo` or `Predicates` change.
2. Catch and record exceptions for every materialization context, so that an exception thrown while rewriting with one materialization context does not affect the others.
3. Support materialized view rewrite when the query has a count function in an aggregate without GROUP BY.
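The per-context exception handling in item 2 can be sketched as below. This is an illustration of the pattern only, not the Doris implementation: `try_rewrite_all`, `broken_context` and `working_context` are hypothetical names, and a "context" is modeled as a plain function from a query string to a rewritten query string.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical model: each materialization context is a rewrite function.
RewriteFn = Callable[[str], Optional[str]]


def try_rewrite_all(query: str, contexts: Dict[str, RewriteFn]) -> Tuple[List[str], Dict[str, str]]:
    """Try every context; record failures instead of letting them propagate."""
    rewritten: List[str] = []
    failures: Dict[str, str] = {}
    for name, rewrite in contexts.items():
        try:
            result = rewrite(query)
        except Exception as exc:
            # Record the failure for this context and move on to the next one.
            failures[name] = f"{type(exc).__name__}: {exc}"
            continue
        if result is not None:
            rewritten.append(result)
    return rewritten, failures


def broken_context(query: str) -> Optional[str]:
    raise ValueError("unsupported plan shape")


def working_context(query: str) -> Optional[str]:
    return f"select * from mv_ok /* rewritten from: {query} */"


plans, failures = try_rewrite_all(
    "select 1", {"mv_broken": broken_context, "mv_ok": working_context}
)
```

The failing context ends up in `failures`, while the rewrite produced by the other context is still returned in `plans`.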
Fix a CTE being wrongly rewritten by materialized view when the query has a scalar aggregate but the view does not.
For example, the following query should not be rewritten by the materialized view:
// materialized view definition
def mv20_1 = """
select
l_shipmode,
l_shipinstruct,
sum(l_extendedprice),
count(*)
from lineitem
left join
orders on lineitem.L_ORDERKEY = orders.O_ORDERKEY
group by
l_shipmode,
l_shipinstruct;
"""
// query sql
def query20_1 =
"""
select
sum(l_extendedprice),
count(*)
from lineitem
left join
orders
on lineitem.L_ORDERKEY = orders.O_ORDERKEY
"""
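One way a scalar aggregate and a roll-up over a grouped view can disagree is on empty input. The sketch below shows only that general SQL behavior, using SQLite through Python's `sqlite3`; it is not necessarily the exact case this fix targets. It assumes a simplified schema without the join to `orders`, and models the view as a plain table named `mv`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table lineitem (l_shipmode text, l_shipinstruct text, l_extendedprice real);
-- Stand-in for the grouped materialized view; lineitem is empty, so mv is empty too.
create table mv as
select l_shipmode, l_shipinstruct,
       sum(l_extendedprice) as sum_price, count(*) as cnt
from lineitem
group by l_shipmode, l_shipinstruct;
""")

# Scalar aggregate on the base table: always one row, count(*) is 0.
direct = conn.execute(
    "select sum(l_extendedprice), count(*) from lineitem"
).fetchone()

# Naive roll-up over the view: count becomes sum(cnt), which is NULL on empty input.
rolled_up = conn.execute(
    "select sum(sum_price), sum(cnt) from mv"
).fetchone()
```

Here `direct` is `(None, 0)` while `rolled_up` is `(None, None)`, so a rewrite that simply swaps `count(*)` for `sum(cnt)` would change the result.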
Fix incorrect predicate compensation.
For example, the following case now returns the right result; earlier the result was wrong.
// materialized view definition
def mv7_1 = """
select l_shipdate, o_orderdate, l_partkey, l_suppkey
from lineitem
left join orders
on lineitem.l_orderkey = orders.o_orderkey
where l_shipdate = '2023-12-08' and o_orderdate = '2023-12-08';
"""
// query sql
def query7_1 = """
select l_shipdate, o_orderdate, l_partkey, l_suppkey
from (select * from lineitem where l_shipdate = '2023-10-17' ) t1
left join orders
on t1.l_orderkey = orders.o_orderkey;
"""
Also optimize some code usage and add more comments to methods.
The materialized view definition is as follows:
> select l_linenumber, o_custkey
> from orders
> left join lineitem on lineitem.L_ORDERKEY = orders.O_ORDERKEY
> where o_custkey = 1;
The following query can be rewritten by the materialized view above.
This requires the query to have null-rejecting filters on the right input of the join;
the currently supported filter operators are "=", "<", "<=", ">", ">=", "<=>".
> select IFNULL(orders.O_CUSTKEY, 0) as custkey_not_null,
> case when l_linenumber in (1,2,3) then l_linenumber else o_custkey end as case_when
> from orders
> inner join lineitem on orders.O_ORDERKEY = lineitem.L_ORDERKEY
> where o_custkey = 1 and l_linenumber > 0;
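The role of the null-rejecting filter can be seen on toy data. The sketch below uses SQLite through Python's `sqlite3` rather than Doris, models the view as a plain table named `mv` (an assumed name), and uses made-up rows. Because `l_linenumber > 0` is never true for NULL, it removes the null-extended rows that the left join in the view produces, leaving exactly the inner-join rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table orders (o_orderkey int, o_custkey int);
create table lineitem (l_orderkey int, l_linenumber int);
insert into orders values (1, 1), (2, 1), (3, 2);
insert into lineitem values (1, 2), (1, 5);
-- Stand-in for the materialized view (left join, filter on o_custkey).
create table mv as
select l_linenumber, o_custkey
from orders
left join lineitem on lineitem.l_orderkey = orders.o_orderkey
where o_custkey = 1;
""")

# Original query: inner join plus a null-rejecting filter on the right input.
query_rows = conn.execute("""
select ifnull(orders.o_custkey, 0) as custkey_not_null,
       case when l_linenumber in (1,2,3) then l_linenumber else o_custkey end as case_when
from orders
inner join lineitem on orders.o_orderkey = lineitem.l_orderkey
where o_custkey = 1 and l_linenumber > 0
order by 1, 2
""").fetchall()

# Rewritten form: scan the view and keep only rows passing the null-rejecting filter.
rewritten_rows = conn.execute("""
select ifnull(o_custkey, 0) as custkey_not_null,
       case when l_linenumber in (1,2,3) then l_linenumber else o_custkey end as case_when
from mv
where l_linenumber > 0
order by 1, 2
""").fetchall()

mv_row_count = conn.execute("select count(*) from mv").fetchone()[0]
```

The view holds three rows, including one null-extended row for the order without line items; after the filter both forms return the same two rows.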
Query rewrite by materialized view supports roll-up of bitmap_union and bitmap_union_count. The aggregate functions that support roll-up are listed below:
| Function in query  | Function in materialized view | Function after roll-up |
|--------------------|-------------------------------|------------------------|
| max                | max                           | max                    |
| min                | min                           | min                    |
| sum                | sum                           | sum                    |
| count              | count                         | sum                    |
| count(distinct )   | bitmap_union                  | bitmap_union_count     |
| bitmap_union       | bitmap_union                  | bitmap_union           |
| bitmap_union_count | bitmap_union                  | bitmap_union_count     |
This depends on https://github.com/apache/doris/pull/29256.
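The first four rows of the roll-up table can be checked with a small sketch. It uses SQLite through Python's `sqlite3`, so the bitmap rows are not covered (SQLite has no bitmap type). The table `mv`, its column names, and the data are assumptions for illustration. The view groups by two columns and the query groups by only one of them, so each aggregate must be rolled up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table orders (o_orderdate text, o_orderstatus text, o_totalprice int);
insert into orders values
  ('2023-12-08', 'o', 10), ('2023-12-08', 'o', 30),
  ('2023-12-08', 'f', 5),  ('2023-12-09', 'o', 7);
-- Stand-in for a materialized view grouped at a finer granularity.
create table mv as
select o_orderdate, o_orderstatus,
       max(o_totalprice) as max_total, min(o_totalprice) as min_total,
       sum(o_totalprice) as sum_total, count(*) as count_all
from orders
group by o_orderdate, o_orderstatus;
""")

# Query at the coarser granularity, computed from the base table.
direct = conn.execute("""
select o_orderdate, max(o_totalprice), min(o_totalprice), sum(o_totalprice), count(*)
from orders group by o_orderdate order by 1
""").fetchall()

# Same query answered from the view: max->max, min->min, sum->sum, count->sum.
rolled_up = conn.execute("""
select o_orderdate, max(max_total), min(min_total), sum(sum_total), sum(count_all)
from mv group by o_orderdate order by 1
""").fetchall()
```

Both statements return the same rows, which is why `count` in the query maps to `sum` over the view's `count` column.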
When an aggregate function is rolled up, we should check that the function arguments in the query and in the materialized view are equal.
For example, with the materialized view definition and query SQL below, the rewrite should not succeed, because the argument of the `bitmap_union_basic` field is
not equal to that of the `count(distinct case when o_shippriority > 10 and o_orderkey IN (1, 3) then o_custkey else null end)` field in the query.
Materialized view definition:
> select l_shipdate, o_orderdate, l_partkey, l_suppkey,
> sum(o_totalprice) as sum_total,
> max(o_totalprice) as max_total,
> min(o_totalprice) as min_total,
> count(*) as count_all,
> bitmap_union(to_bitmap(case when o_shippriority > 1 and o_orderkey IN (1, 3) then o_custkey else null end)) as bitmap_union_basic
> from lineitem
> left join orders on lineitem.l_orderkey = orders.o_orderkey and l_shipdate = o_orderdate
> group by
> l_shipdate,
> o_orderdate,
> l_partkey,
> l_suppkey;
Query SQL:
> select t1.l_partkey, t1.l_suppkey, o_orderdate,
> sum(o_totalprice),
> max(o_totalprice),
> min(o_totalprice),
> count(*),
> count(distinct case when o_shippriority > 10 and o_orderkey IN (1, 3) then o_custkey else null end)
> from (select * from lineitem where l_shipdate = '2023-12-11') t1
> left join orders on t1.l_orderkey = orders.o_orderkey and t1.l_shipdate = o_orderdate
> group by
> o_orderdate,
> l_partkey,
> l_suppkey;
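Why the arguments must match can be shown with plain Python sets standing in for bitmaps. This is a sketch on made-up rows, not Doris code; `distinct_custkeys` is a hypothetical helper that mimics `case when o_shippriority > N and o_orderkey IN (1, 3) then o_custkey else null end` followed by a distinct collection.

```python
# Each row is (o_shippriority, o_orderkey, o_custkey); the values are made up.
rows = [
    (2, 1, 100),
    (5, 3, 200),
    (11, 1, 300),
    (12, 3, 300),
]


def distinct_custkeys(data, min_priority):
    """Collect o_custkey where the CASE WHEN condition holds; other rows yield NULL and are skipped."""
    return {
        custkey
        for priority, orderkey, custkey in data
        if priority > min_priority and orderkey in (1, 3)
    }


# What the view stores: the condition uses o_shippriority > 1.
mv_count = len(distinct_custkeys(rows, 1))

# What the query asks for: the condition uses o_shippriority > 10.
query_count = len(distinct_custkeys(rows, 10))
```

On these rows the view's bitmap would count 3 customers while the query expects 1, so answering the query from `bitmap_union_basic` would return a wrong value.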
Fix query rewrite by materialized view failing on self joins. After the fix, a query like the following can be rewritten:
def materialized_view = """
select
a.o_orderkey,
count(distinct a.o_orderstatus) num1,
SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority = 1 AND a.o_orderdate = '2023-12-08' AND b.o_orderdate = '2023-12-09' THEN a.o_shippriority+b.o_custkey ELSE 0 END) num2,
SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority = 1 AND a.o_orderdate >= '2023-12-01' AND a.o_orderdate <= '2023-12-09' THEN a.o_shippriority+b.o_custkey ELSE 0 END) num3,
SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority in (1,2) AND a.o_orderdate >= '2023-12-08' AND b.o_orderdate <= '2023-12-09' THEN a.o_shippriority-b.o_custkey ELSE 0 END) num4,
AVG(a.o_totalprice) num5,
MAX(b.o_totalprice) num6,
MIN(a.o_totalprice) num7
from
orders a
left outer join orders b
on a.o_orderkey = b.o_orderkey
and a.o_custkey = b.o_custkey
group by a.o_orderkey;
"""
def query = """
select
a.o_orderkey,
SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority = 1 AND a.o_orderdate = '2023-12-08' AND b.o_orderdate = '2023-12-09' THEN a.o_shippriority+b.o_custkey ELSE 0 END) num2,
SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority = 1 AND a.o_orderdate >= '2023-12-01' AND a.o_orderdate <= '2023-12-09' THEN a.o_shippriority+b.o_custkey ELSE 0 END) num3,
SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority in (1,2) AND a.o_orderdate >= '2023-12-08' AND b.o_orderdate <= '2023-12-09' THEN a.o_shippriority-b.o_custkey ELSE 0 END) num4,
AVG(a.o_totalprice) num5,
MAX(b.o_totalprice) num6,
MIN(a.o_totalprice) num7
from
orders a
left outer join orders b
on a.o_orderkey = b.o_orderkey
and a.o_custkey = b.o_custkey
group by a.o_orderkey;
"""
* [improvement] (nereids) Get partition related table disable nullable field and modify regression test, complete agg mv rules.
* Make the field not null to create a partition mv
1. Fix wrong data when using mv rewrite
2. Ignore case when getting the mv related partition table
3. Enable inferring the expression column name without an alias when creating an mv
- Support left outer join rewrite by materialized view
- Support bitmap_union roll-up to implement count(distinct)
- Support partition materialized view rewrite