The materialized view definition is:
> select
> sum(o_totalprice) as sum_total,
> max(o_totalprice) as max_total,
> min(o_totalprice) as min_total,
> count(*) as count_all,
> bitmap_union(to_bitmap(case when o_shippriority > 1 and o_orderkey IN (1, 3) then o_custkey else null end)) as cnt_1,
> bitmap_union(to_bitmap(case when o_shippriority > 2 and o_orderkey IN (2) then o_custkey else null end)) as cnt_2
> from lineitem
> left join orders on l_orderkey = o_orderkey and l_shipdate = o_orderdate;
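The MV stores `bitmap_union(to_bitmap(...))` rather than a plain distinct count because bitmaps can be merged: the union of partial bitmaps gives the exact distinct count, while adding partial distinct counts overcounts keys that appear in more than one partial. A minimal Python sketch of that property, assuming a Python `set` as a stand-in for Doris's bitmap type and treating a NULL key as contributing nothing; the partition data is made up for illustration:

```python
# Hypothetical per-partition keys produced by
# to_bitmap(CASE WHEN ... THEN o_custkey ELSE NULL END); None models the NULL branch.
PARTITIONS = [
    [100, 101, None],
    [101, 102],
    [None, 100],
]


def to_bitmap(keys):
    """Build a 'bitmap' (a Python set here) from the non-NULL keys."""
    return {k for k in keys if k is not None}


def bitmap_union(bitmaps):
    """Merge partial bitmaps; the merge is exact because it is a set union."""
    merged = set()
    for bitmap in bitmaps:
        merged |= bitmap
    return merged


partials = [to_bitmap(p) for p in PARTITIONS]
exact_distinct = len(bitmap_union(partials))    # {100, 101, 102} -> 3
summed_counts = sum(len(b) for b in partials)   # 2 + 2 + 1 -> 5, overcounts
print(exact_distinct, summed_counts)
```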
The following query can be rewritten using the materialized view above.
It applies arithmetic to aggregate functions in the select list:
> select
> count(distinct case when O_SHIPPRIORITY > 2 and o_orderkey IN (2) then o_custkey else null end) as cnt_2,
> (sum(o_totalprice) + min(o_totalprice)) * count(*),
> min(o_totalprice) + count(distinct case when O_SHIPPRIORITY > 2 and o_orderkey IN (2) then o_custkey else null end)
> from lineitem
> left join orders on l_orderkey = o_orderkey and l_shipdate = o_orderdate;
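Because the MV has no GROUP BY, it holds a single row, and each select expression of the query can be recomputed from the stored columns: `sum_total`, `min_total` and `count_all` are plugged into the arithmetic, and the distinct count becomes the cardinality of the `cnt_2` bitmap. The Python sketch below checks this on a few made-up joined rows; the row data, the helper names, and the use of a `set` for the bitmap are illustrative assumptions, not Doris internals. Columns the query does not read (`max_total`, `cnt_1`) are omitted.

```python
# Made-up rows of `lineitem LEFT JOIN orders`; None models the NULLs produced
# for a lineitem row without a matching order.
ROWS = [
    {"o_totalprice": 10.0, "o_shippriority": 3, "o_orderkey": 2, "o_custkey": 100},
    {"o_totalprice": 20.0, "o_shippriority": 3, "o_orderkey": 2, "o_custkey": 100},
    {"o_totalprice": 5.0, "o_shippriority": 2, "o_orderkey": 1, "o_custkey": 101},
    {"o_totalprice": None, "o_shippriority": None, "o_orderkey": None, "o_custkey": None},
]


def cnt_2_key(row):
    """CASE WHEN o_shippriority > 2 AND o_orderkey IN (2) THEN o_custkey ELSE NULL END."""
    prio = row["o_shippriority"]
    if prio is not None and prio > 2 and row["o_orderkey"] in (2,):
        return row["o_custkey"]
    return None


def prices(rows):
    # SUM/MIN skip NULLs; this sample has at least one non-NULL price.
    return [r["o_totalprice"] for r in rows if r["o_totalprice"] is not None]


def build_mv(rows):
    """The single MV row (no GROUP BY), restricted to the columns the query needs."""
    p = prices(rows)
    return {
        "sum_total": sum(p),
        "min_total": min(p),
        "count_all": len(rows),  # COUNT(*) also counts NULL-extended rows
        "cnt_2": {k for k in map(cnt_2_key, rows) if k is not None},  # bitmap as a set
    }


def query_on_base_tables(rows):
    p = prices(rows)
    distinct = len({k for k in map(cnt_2_key, rows) if k is not None})
    return (distinct, (sum(p) + min(p)) * len(rows), min(p) + distinct)


def query_on_mv(mv):
    distinct = len(mv["cnt_2"])  # bitmap cardinality replaces COUNT(DISTINCT ...)
    return (distinct,
            (mv["sum_total"] + mv["min_total"]) * mv["count_all"],
            mv["min_total"] + distinct)


print(query_on_base_tables(ROWS))   # (1, 160.0, 6.0)
print(query_on_mv(build_mv(ROWS)))  # (1, 160.0, 6.0)
```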
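The runtime filter (RF) target expression should be either:
1. an expression with only one numeric slot, or
2. a cast expression, for any data type.
Example:
> select * from T1 join T2 on abs(T1.a) = T2.a
The runtime filter is built from T2.a and applied to the target expression abs(T1.a):
RF T2.a -> abs(T1.a)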
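One way to read the two conditions is as a predicate over the target expression tree. The Python sketch below is an interpretation, not the Nereids implementation: the `Slot`, `Cast` and `Func` classes, the numeric type list, and `is_valid_rf_target` are hypothetical names chosen for illustration, and condition 2 is read as a cast over an expression that references a single slot of any type.

```python
from dataclasses import dataclass
from typing import Union

# Assumed set of numeric type names, for illustration only.
NUMERIC_TYPES = {"tinyint", "smallint", "int", "bigint", "float", "double", "decimal"}


@dataclass(frozen=True)
class Slot:
    name: str
    data_type: str


@dataclass(frozen=True)
class Cast:
    child: "Expr"
    target_type: str


@dataclass(frozen=True)
class Func:
    name: str
    args: tuple


Expr = Union[Slot, Cast, Func]


def input_slots(expr):
    """Collect the distinct slots referenced by an expression."""
    if isinstance(expr, Slot):
        return {expr}
    if isinstance(expr, Cast):
        return input_slots(expr.child)
    return set().union(*(input_slots(a) for a in expr.args))


def is_valid_rf_target(expr):
    slots = input_slots(expr)
    if len(slots) != 1:
        return False
    (slot,) = slots
    # Condition 1: the expression depends on exactly one numeric slot, e.g. abs(T1.a).
    if slot.data_type in NUMERIC_TYPES:
        return True
    # Condition 2 (as interpreted here): a cast, whatever the slot's data type.
    return isinstance(expr, Cast)


t1_a = Slot("T1.a", "int")
t1_b = Slot("T1.b", "int")
t1_s = Slot("T1.s", "varchar")
print(is_valid_rf_target(Func("abs", (t1_a,))))       # True
print(is_valid_rf_target(Func("add", (t1_a, t1_b))))  # False: two slots
print(is_valid_rf_target(Cast(t1_s, "int")))          # True: cast of a varchar slot
print(is_valid_rf_target(Func("upper", (t1_s,))))     # False
```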
The materialized view definition is as follows:
> select l_shipdate, o_orderdate, l_partkey, l_suppkey, o_orderkey
> from lineitem
> left join (select * from orders where o_orderdate = '2023-12-10' ) t2
> on lineitem.l_orderkey = t2.o_orderkey;
For the following query, the filter `o_orderdate = '2023-12-10'` should be added on top of the MV when the query is rewritten using the materialized view:
> select l_shipdate, o_orderdate, l_partkey, l_suppkey, o_orderkey
> from lineitem
> left join orders
> on lineitem.l_orderkey = orders.o_orderkey
> where o_orderdate = '2023-12-10' order by 1, 2, 3, 4, 5;
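The filter is needed because the MV keeps every lineitem row and NULL-extends those without an order dated 2023-12-10, while the query's WHERE clause removes exactly those NULL-extended rows. The Python sketch below simulates both sides on made-up rows; the sample data and the helpers `left_join` and `project` are assumptions for illustration.

```python
DAY = "2023-12-10"
ORDER_COLS = ("o_orderkey", "o_orderdate")
OUTPUT_COLS = ("l_shipdate", "o_orderdate", "l_partkey", "l_suppkey", "o_orderkey")

# Made-up sample data.
LINEITEM = [
    {"l_orderkey": 1, "l_shipdate": "2023-12-09", "l_partkey": 10, "l_suppkey": 100},
    {"l_orderkey": 2, "l_shipdate": "2023-12-10", "l_partkey": 20, "l_suppkey": 200},
    {"l_orderkey": 3, "l_shipdate": "2023-12-11", "l_partkey": 30, "l_suppkey": 300},
]
ORDERS = [
    {"o_orderkey": 1, "o_orderdate": "2023-12-10"},
    {"o_orderkey": 2, "o_orderdate": "2023-12-09"},
]


def left_join(lineitem, orders):
    """lineitem LEFT JOIN orders ON l_orderkey = o_orderkey."""
    out = []
    for li in lineitem:
        matches = [o for o in orders if o["o_orderkey"] == li["l_orderkey"]]
        if matches:
            out.extend({**li, **o} for o in matches)
        else:
            out.append({**li, **{c: None for c in ORDER_COLS}})  # NULL-extended row
    return out


def project(rows):
    return sorted(tuple(r[c] for c in OUTPUT_COLS) for r in rows)


# MV: the date filter sits inside the right input of the LEFT JOIN.
mv_rows = left_join(LINEITEM, [o for o in ORDERS if o["o_orderdate"] == DAY])
# Query: the date filter is applied after the LEFT JOIN.
query_rows = [r for r in left_join(LINEITEM, ORDERS) if r["o_orderdate"] == DAY]
# Rewrite: scan the MV and add the filter o_orderdate = DAY on top of it.
rewritten_rows = [r for r in mv_rows if r["o_orderdate"] == DAY]

print(len(mv_rows), project(query_rows) == project(rewritten_rows))  # 3 True
```

Without the added filter, the MV would return all three lineitem rows instead of the single row the query expects.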
* [Nereids](Variant) Implement variant type in Nereids and support a new sub-column access method
Take the following query as an example:
> SELECT v["a"]["b"] FROM simple_var WHERE cast(v["a"]["b"] AS int) = 1
1. During the binding stage, the expression element_at(var, "xxx") is transformed into a SlotReference with a specified path. This conversion is tracked in the StatementContext, where the parent slot is the primary key and the paths are secondary keys. This structure, known as subColumnSlotRefMap in the StatementContext, helps to eliminate duplicates of the same slot derived from identical paths.
2. A new rule, BindSlotWithPaths, is introduced in the analysis stage. This rule is responsible for converting slots with paths into their respective slot suppliers. To ensure that slots with paths are correctly associated with the appropriate LogicalOlapScan, an additional mapping, slotToRelation, is added to the StatementContext. This mapping links the top-level slot to its corresponding relation (i.e., LogicalOlapScan). Consequently, subsequent slots with paths can determine the correct LogicalOlapScan to merge with and modify accordingly.
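The two StatementContext mappings can be sketched as plain dictionaries. This is a hypothetical Python model, not the Java code in Nereids: `SlotRef`, `bind_element_at` and `slots_with_paths_by_relation` are illustrative names, and a nested access such as v["a"]["b"] is assumed to be flattened into the path ("a", "b").

```python
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)


@dataclass(frozen=True)
class SlotRef:
    name: str
    path: tuple = ()
    slot_id: int = field(default_factory=lambda: next(_next_id))


class StatementContext:
    def __init__(self):
        # parent slot -> {path -> slot with that path}; models subColumnSlotRefMap
        self.sub_column_slot_ref_map = {}
        # top-level slot -> relation it comes from; models slotToRelation
        self.slot_to_relation = {}

    def bind_element_at(self, parent, path):
        """Binding step: reuse the slot for (parent, path) if one already exists."""
        by_path = self.sub_column_slot_ref_map.setdefault(parent, {})
        if path not in by_path:
            by_path[path] = SlotRef(parent.name, path)
        return by_path[path]

    def slots_with_paths_by_relation(self):
        """Analysis step: group path slots by the relation of their parent slot."""
        grouped = {}
        for parent, by_path in self.sub_column_slot_ref_map.items():
            relation = self.slot_to_relation[parent]
            grouped.setdefault(relation, []).extend(by_path.values())
        return grouped


ctx = StatementContext()
v = SlotRef("v")
ctx.slot_to_relation[v] = "simple_var"          # stands for the LogicalOlapScan of simple_var
in_select = ctx.bind_element_at(v, ("a", "b"))  # SELECT v["a"]["b"]
in_where = ctx.bind_element_at(v, ("a", "b"))   # WHERE cast(v["a"]["b"] AS int) = 1
print(in_select is in_where)                                              # True
print(ctx.slots_with_paths_by_relation() == {"simple_var": [in_select]})  # True
```

Both occurrences of v["a"]["b"] resolve to the same slot, so the scan only needs to read that sub-column once.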
Support query rewrite by materialized view when a join input contains an aggregate; the aggregate must be simple.
For example, the materialized view definition is:
> select
> l_linenumber,
> count(distinct l_orderkey),
> sum(case when l_orderkey in (1,2,3) then l_suppkey * l_linenumber else 0 end),
> max(case when l_orderkey in (4, 5) then (l_quantity *2 + part_supp_a.qty_max) * 0.88 else 100 end),
> avg(case when l_partkey in (2, 3, 4) then l_discount + o_totalprice + part_supp_a.qty_sum else 50 end)
> from lineitem
> left join orders on l_orderkey = o_orderkey
> left join
> (select ps_partkey, ps_suppkey, sum(ps_availqty) qty_sum, max(ps_availqty) qty_max,
> min(ps_availqty) qty_min,
> avg(ps_supplycost) cost_avg
> from partsupp
> group by ps_partkey,ps_suppkey) part_supp_a
> on l_partkey = part_supp_a.ps_partkey
> and l_suppkey = part_supp_a.ps_suppkey
> group by l_linenumber;
The following query can be rewritten using the materialized view above:
> select
> l_linenumber,
> sum(case when l_orderkey in (1,2,3) then l_suppkey * l_linenumber else 0 end),
> avg(case when l_partkey in (2, 3, 4) then l_discount + o_totalprice + part_supp_a.qty_sum else 50 end)
> from lineitem
> left join orders on l_orderkey = o_orderkey
> left join
> (select ps_partkey, ps_suppkey, sum(ps_availqty) qty_sum, max(ps_availqty) qty_max,
> min(ps_availqty) qty_min,
> avg(ps_supplycost) cost_avg
> from partsupp
> group by ps_partkey,ps_suppkey) part_supp_a
> on l_partkey = part_supp_a.ps_partkey
> and l_suppkey = part_supp_a.ps_suppkey
> group by l_linenumber;
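In this example the query's join inputs (including the `part_supp_a` aggregate) and join conditions match the MV, and both group by `l_linenumber`, so the query's outputs are a subset of the MV's and the rewrite reduces to projecting MV columns. A hypothetical Python sketch of only that check: `AggPlan`, `rewrite_by_projection` and the short expression labels are assumptions standing in for normalized plan fragments, join types are not modeled, and cases that need a rollup are outside this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggPlan:
    join_inputs: tuple           # normalized join inputs, including aggregate subqueries
    join_conditions: frozenset
    group_by: frozenset
    outputs: tuple               # normalized output expressions, in select order


def rewrite_by_projection(query, mv):
    """Return the MV output positions the query reads, or None if this check fails."""
    if query.join_inputs != mv.join_inputs or query.join_conditions != mv.join_conditions:
        return None
    if query.group_by != mv.group_by:
        return None  # would need a rollup, not handled in this sketch
    positions = []
    for expr in query.outputs:
        if expr not in mv.outputs:
            return None
        positions.append(mv.outputs.index(expr))
    return positions


# Short labels standing in for the full expressions of the example.
PART_SUPP_A = "agg(partsupp; group by ps_partkey, ps_suppkey)"
JOIN_INPUTS = ("lineitem", "orders", PART_SUPP_A)
CONDITIONS = frozenset({
    "l_orderkey = o_orderkey",
    "l_partkey = part_supp_a.ps_partkey",
    "l_suppkey = part_supp_a.ps_suppkey",
})

mv = AggPlan(JOIN_INPUTS, CONDITIONS, frozenset({"l_linenumber"}),
             ("l_linenumber", "count_distinct_orderkey", "sum_case", "max_case", "avg_case"))
query = AggPlan(JOIN_INPUTS, CONDITIONS, frozenset({"l_linenumber"}),
                ("l_linenumber", "sum_case", "avg_case"))

print(rewrite_by_projection(query, mv))  # [0, 2, 4]
```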