Commit Graph

19 Commits

Author SHA1 Message Date
49cec39038 [test](mtmv) Add materialized view which contain external table rewrite test (#30975) 2024-02-16 10:16:39 +08:00
5c2a4a80dd [fix](nereids) Fix use aggregate mv wrongly when rewrite query which only contains join (#30858)
the materialized view def is as following:
>            select 
>               o_orderdate, 
>               o_shippriority, 
>               o_comment, 
>               l_orderkey, 
>               o_orderkey, 
>               count(*) 
>             from 
>               orders 
>               left join lineitem on l_orderkey = o_orderkey
>               group by o_orderdate, 
>               o_shippriority, 
>               o_comment, 
>               l_orderkey;

the query should rewrite success by using above materialized view
>             select 
>               o_orderdate, 
>               o_shippriority, 
>               o_comment, 
>               l_orderkey, 
>               ps_partkey, 
>               count(*) 
>             from 
>               orders left 
>               join lineitem on l_orderkey = o_orderkey
>               left join partsupp on ps_partkey = l_orderkey
>               group by
>              o_orderdate, 
>              o_shippriority, 
>              o_comment, 
>              l_orderkey, 
>              ps_partkey;
2024-02-16 10:12:23 +08:00
5aed3abb8a [Fix](Nereids) Fix rewrite by materialized view fail when join input has agg (#30734)
materialized view definition is as following, and the query sql is the same
when outer group by use the col1 in the inner group, which can be rewritten by materialized view

select
t1.o_orderdate,
t1.o_orderkey,
t1.col1
from
(
select
o_orderkey,
o_custkey,
o_orderstatus,
o_orderdate,
sum(o_shippriority) as col1
from
orders
group by
o_orderkey,
o_custkey,
o_orderstatus,
o_orderdate
) as t1
left join lineitem on lineitem.l_orderkey = t1.o_orderkey
group by
t1.o_orderdate,
t1.o_orderkey,
t1.col1
2024-02-03 20:27:04 +08:00
1ab37737ae [Test](Nereids) Add SSB dataset to test materialized view rewrite (#30528)
* [Test](Nereids) Add SSB dataset to test materialized view rewrite

* rollback irrelevant code

* fix sort slot 0
2024-02-01 19:00:50 +08:00
dce6c8bd65 [Improvement](Nereids) Support aggregate rewrite by materialized view with complex expression (#30440)
materialized view definition is

>            select
>            sum(o_totalprice) as sum_total,
>            max(o_totalprice) as max_total,
>            min(o_totalprice) as min_total,
>           count(*) as count_all,
>            bitmap_union(to_bitmap(case when o_shippriority > 1 and o_orderkey IN (1, 3) then o_custkey else null end)) >cnt_1,
>            bitmap_union(to_bitmap(case when o_shippriority > 2 and o_orderkey IN (2) then o_custkey else null end)) as >cnt_2
>            from lineitem
>            left join orders on l_orderkey = o_orderkey and l_shipdate = o_orderdate;
   

the query following can be rewritten by materialized view above.
it use the aggregate fuction arithmetic calculation in the select 

>            select
>            count(distinct case when O_SHIPPRIORITY > 2 and o_orderkey IN (2) then o_custkey else null end) as cnt_2,
>            (sum(o_totalprice) + min(o_totalprice)) * count(*),
>            min(o_totalprice) + count(distinct case when O_SHIPPRIORITY > 2 and o_orderkey IN (2) then o_custkey else null >end)
>            from lineitem
>            left join orders on l_orderkey = o_orderkey and l_shipdate = o_orderdate;
2024-01-29 19:03:47 +08:00
f25af15842 [Fix](Nereids) Fix lost predicate when query has a filter at the right input of the outer join (#30374)
materialized view def is as following:
>        select l_shipdate, o_orderdate, l_partkey, l_suppkey, o_orderkey  
>        from lineitem 
>        left join (select * from orders where o_orderdate = '2023-12-10' ) t2 
>        on lineitem.l_orderkey = t2.o_orderkey;
    
the query as following, should add filter `o_orderdate = '2023-12-10'` on mv when query rewrite by materialized view
>        select l_shipdate, o_orderdate, l_partkey, l_suppkey, o_orderkey 
>         from lineitem 
>         left join orders 
>        on lineitem.l_orderkey = orders.o_orderkey 
>         where o_orderdate = '2023-12-10' order by 1, 2, 3, 4, 5;
2024-01-27 09:09:02 +08:00
2f68aac885 [Improvement](Nereids) Support to query rewrite by materialized view when join input has aggregate (#30230)
Support to query rewrite by materialized view when join input has aggregate, the aggregate should be simple
For example as following:
The materialized view def is 
>            select
>              l_linenumber,
>              count(distinct l_orderkey),
>              sum(case when l_orderkey in (1,2,3) then l_suppkey * l_linenumber else 0 end),
>              max(case when l_orderkey in (4, 5) then (l_quantity *2 + part_supp_a.qty_max) * 0.88 else 100 end),
>              avg(case when l_partkey in (2, 3, 4) then l_discount + o_totalprice + part_supp_a.qty_sum else 50 end)
>            from lineitem
>            left join orders on l_orderkey = o_orderkey
>            left join 
>              (select ps_partkey, ps_suppkey, sum(ps_availqty) qty_sum, max(ps_availqty) qty_max,
>                min(ps_availqty) qty_min,
>                avg(ps_supplycost) cost_avg
>                from partsupp
>                group by ps_partkey,ps_suppkey) part_supp_a
>              on l_partkey = part_supp_a.ps_partkey
>                and l_suppkey = part_supp_a.ps_suppkey
>            group by l_linenumber;

when query is like following, it can be rewritten by mv above
>            select
>              l_linenumber,
>              sum(case when l_orderkey in (1,2,3) then l_suppkey * l_linenumber else 0 end),
>              avg(case when l_partkey in (2, 3, 4) then l_discount + o_totalprice + part_supp_a.qty_sum else 50 end)
>            from lineitem
>            left join orders on l_orderkey = o_orderkey
>            left join 
>              (select ps_partkey, ps_suppkey, sum(ps_availqty) qty_sum, max(ps_availqty) qty_max,
>                min(ps_availqty) qty_min,
>                avg(ps_supplycost) cost_avg
>                from partsupp
>                group by ps_partkey,ps_suppkey) part_supp_a
>              on l_partkey = part_supp_a.ps_partkey
>                and l_suppkey = part_supp_a.ps_suppkey
>            group by l_linenumber;
2024-01-25 13:23:59 +08:00
9a58cacf0f [Improvement](nereids) Make sure to catch and record exception for every materialization context (#29953)
1. Make sure instance when change params of StructInfo,Predicates.
2. Catch and record exception for every materialization context, this make sure that if throw exception when one materialization context rewrite, it will not influence others.
3. Support to mv rewrite when hava count function when aggregate without group by
2024-01-23 10:09:54 +08:00
4bf4239d7a [feature](Nereids): optimize logical group expression in dphyp (#30000) 2024-01-16 18:48:20 +08:00
d47adbb81f [Fix](nereids) Fix cte rewrite by mv failure and predicates compensation by mistake (#29820)
Fix cte rewrite by mv wrongly when query has scalar aggregate but view no
For example as following, it should not be rewritten by materialized view successfully

// materialzied view define
def mv20_1 = """
select
l_shipmode,
l_shipinstruct,
sum(l_extendedprice),
count()
from lineitem
left join
orders on lineitem.L_ORDERKEY = orders.O_ORDERKEY
group by
l_shipmode,
l_shipinstruct;
"""
// query sql
def query20_1 =
"""
select
sum(l_extendedprice),
count()
from lineitem
left join
orders
on lineitem.L_ORDERKEY = orders.O_ORDERKEY
"""

Fix predicates compensation by mistake
For example as following, it can return right result, but it's wrong earlier.

// materialzied view define
def mv7_1 = """
select l_shipdate, o_orderdate, l_partkey, l_suppkey
from lineitem
left join orders
on lineitem.l_orderkey = orders.o_orderkey
where l_shipdate = '2023-12-08' and o_orderdate = '2023-12-08';
"""
// query sql
def query7_1 = """
select l_shipdate, o_orderdate, l_partkey, l_suppkey
from (select * from lineitem where l_shipdate = '2023-10-17' ) t1
left join orders
on t1.l_orderkey = orders.o_orderkey;
"""

and optimize some code usage and add more comment for method
2024-01-16 18:31:27 +08:00
fda001b6d3 [Improvement](nereids) Support join derivation when mv rewrite (#29609)
materialized view def is as following:
>            select l_linenumber, o_custkey
>           from orders
>            left join lineitem on lineitem.L_ORDERKEY = orders.O_ORDERKEY
>            where o_custkey = 1;

when query is as following, it can be rewritten by mv above
it requires that query has reject null filters on the join right input, 
current supported filter are  "=", "<", "<=", ">", ">=", "<=>" 
>            select IFNULL(orders.O_CUSTKEY, 0) as custkey_not_null,
>           case when l_linenumber in (1,2,3) then l_linenumber else o_custkey end as case_when
>            from orders
>            inner join lineitem on orders.O_ORDERKEY = lineitem.L_ORDERKEY
>            where o_custkey = 1 and l_linenumber > 0;
2024-01-12 11:44:21 +08:00
d50c8b6d3a [Improvement](nereids) Query rewrite by mv support bitmap_union and bitmap_union_count roll up (#29418)
Query rewrite by mv support bitmap_union and bitmap_union_count roll up, aggregate functions which supports roll up is listed as following:

| 查询中函数            | 物化视图中函数      | 函数上卷后              |
|------------------|--------------|--------------------|
| max              | max          | max                |
| min              | min          | min                |
| sum              | sum          | sum                |
| count            | count        | sum                |
| count(distinct ) | bitmap_union | bitmap_union_count |
| bitmap_union | bitmap_union | bitmap_union|
| bitmap_union_count | bitmap_union | bitmap_union_count |

this depends on  https://github.com/apache/doris/pull/29256
2024-01-12 11:44:21 +08:00
49a3bab399 [fix](nereids) fix aggregate function roll up when expression arguments is not equals (#29256)
when aggregate function roll up, we should check the qury and mv function argument is equal
such as mv def and query sql as following, it should not rewrite success, because the  bitmap_union_basic field augument is
not equal to the `count(distinct case when o_shippriority > 10 and o_orderkey IN (1, 3) then o_custkey else null end)`  field in query

mv def:
>      select l_shipdate, o_orderdate, l_partkey, l_suppkey, 
>            sum(o_totalprice) as sum_total, 
>            max(o_totalprice) as max_total, 
>            min(o_totalprice) as min_total, 
>            count(*) as count_all, 
>            bitmap_union(to_bitmap(case when o_shippriority > 1 and o_orderkey IN (1, 3) then o_custkey else null end)) as bitmap_union_basic 
>           from lineitem 
>           left join orders on lineitem.l_orderkey = orders.o_orderkey and l_shipdate = o_orderdate 
>            group by 
>         l_shipdate, 
>         o_orderdate, 
>          l_partkey, 
>         l_suppkey;

query sql:

>             select t1.l_partkey, t1.l_suppkey, o_orderdate,
>           sum(o_totalprice),
>            max(o_totalprice),
>           min(o_totalprice),
>           count(*),
>            count(distinct case when o_shippriority > 10 and o_orderkey IN (1, 3) then o_custkey else null end)
>            from (select * from lineitem where l_shipdate = '2023-12-11') t1
>            left join orders on t1.l_orderkey = orders.o_orderkey and t1.l_shipdate = o_orderdate
>            group by
>            o_orderdate, 
>            l_partkey,
>            l_suppkey;
2024-01-03 18:58:18 +08:00
9fc613de9c [fix](nereids) Fix query rewrite by mv fail when self join (#29227)
Fix query rewrite by mv fail when self join, after fix query like following can be rewrited

def materialized view = """
    select 
    a.o_orderkey,
    count(distinct a.o_orderstatus) num1,
    SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority = 1 AND a.o_orderdate = '2023-12-08' AND b.o_orderdate = '2023-12-09' THEN a.o_shippriority+b.o_custkey ELSE 0 END) num2,
    SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority = 1 AND a.o_orderdate >= '2023-12-01' AND a.o_orderdate <= '2023-12-09' THEN a.o_shippriority+b.o_custkey ELSE 0 END) num3,
    SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority in (1,2) AND a.o_orderdate >= '2023-12-08' AND b.o_orderdate <= '2023-12-09' THEN a.o_shippriority-b.o_custkey ELSE 0 END) num4,
    AVG(a.o_totalprice) num5,
    MAX(b.o_totalprice) num6,
    MIN(a.o_totalprice) num7
    from
    orders a
    left outer join orders b
    on a.o_orderkey = b.o_orderkey
    and a.o_custkey = b.o_custkey
    group by a.o_orderkey;
"""

def query = """
    select 
    a.o_orderkey,
    SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority = 1 AND a.o_orderdate = '2023-12-08' AND b.o_orderdate = '2023-12-09' THEN a.o_shippriority+b.o_custkey ELSE 0 END) num2,
    SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority = 1 AND a.o_orderdate >= '2023-12-01' AND a.o_orderdate <= '2023-12-09' THEN a.o_shippriority+b.o_custkey ELSE 0 END) num3,
    SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority in (1,2) AND a.o_orderdate >= '2023-12-08' AND b.o_orderdate <= '2023-12-09' THEN a.o_shippriority-b.o_custkey ELSE 0 END) num4,
    AVG(a.o_totalprice) num5,
    MAX(b.o_totalprice) num6,
    MIN(a.o_totalprice) num7
    from
    orders a
    left outer join orders b
    on a.o_orderkey = b.o_orderkey
    and a.o_custkey = b.o_custkey
    group by a.o_orderkey;
"""
2023-12-29 13:45:33 +08:00
7434de9ed8 [improvement](nereids) Get partition related table disable nullable field and complete agg matched pattern mv rules. (#28973)
* [improvement] (nereids) Get partition related table disable nullable field and modify regression test, complete agg mv rules.

* make filed not null to create partition mv
2023-12-26 00:29:42 +08:00
9a5ec43f05 [fix](nereids) Fix data wrong using mv rewrite and ignore case when getting mv related partition table (#28699)
1. Fix data wrong using mv rewrite
2. Ignore case when getting mv related partition table
3. Enable infer expression column name without alias when create mv
2023-12-20 17:59:46 +08:00
4c0080e237 [feat](Nereids) support outer join and aggregate bitmap rewrite by mv (#28596)
- Support left outer join rewrite by materialized view
- Support bitmap_union roll up to imp count(distinct)
- Support partition materialized view rewrite
2023-12-20 10:23:30 +08:00
4c51558f6b [feature](nereids) Support basic aggregate rewrite and function rollup using materialized view (#28269)
Add aggregate materializedviewRules for query rewrite.
it support the query rewrite as following:

    def mv = "select lineitem.L_LINENUMBER, orders.O_CUSTKEY, sum(O_TOTALPRICE) as sum_alias " +
            "from lineitem " +
            "inner join orders on lineitem.L_ORDERKEY = orders.O_ORDERKEY " +
            "group by lineitem.L_LINENUMBER, orders.O_CUSTKEY "
    def query = "select lineitem.L_LINENUMBER, sum(O_TOTALPRICE) as sum_alias " +
            "from lineitem " +
            "inner join orders on lineitem.L_ORDERKEY = orders.O_ORDERKEY " +
            "group by lineitem.L_LINENUMBER"
2023-12-15 11:30:02 +08:00
be81eb1a9b [feature](nereids) Support inner join query rewrite by materialized view (#27922)
Work in process. Support inner join query rewrite by materialized view in some scene.
Such as an exmple as following:

> mv = "select  lineitem.L_LINENUMBER, orders.O_CUSTKEY " +
>             "from orders " +
>             "inner join lineitem on lineitem.L_ORDERKEY = orders.O_ORDERKEY "
>     query = "select lineitem.L_LINENUMBER " +
>             "from lineitem " +
>             "inner join orders on lineitem.L_ORDERKEY = orders.O_ORDERKEY "
2023-12-07 20:29:51 +08:00