when aggregate function roll up, we should check the qury and mv function argument is equal
such as mv def and query sql as following, it should not rewrite success, because the bitmap_union_basic field augument is
not equal to the `count(distinct case when o_shippriority > 10 and o_orderkey IN (1, 3) then o_custkey else null end)` field in query
mv def:
> select l_shipdate, o_orderdate, l_partkey, l_suppkey,
> sum(o_totalprice) as sum_total,
> max(o_totalprice) as max_total,
> min(o_totalprice) as min_total,
> count(*) as count_all,
> bitmap_union(to_bitmap(case when o_shippriority > 1 and o_orderkey IN (1, 3) then o_custkey else null end)) as bitmap_union_basic
> from lineitem
> left join orders on lineitem.l_orderkey = orders.o_orderkey and l_shipdate = o_orderdate
> group by
> l_shipdate,
> o_orderdate,
> l_partkey,
> l_suppkey;
query sql:
> select t1.l_partkey, t1.l_suppkey, o_orderdate,
> sum(o_totalprice),
> max(o_totalprice),
> min(o_totalprice),
> count(*),
> count(distinct case when o_shippriority > 10 and o_orderkey IN (1, 3) then o_custkey else null end)
> from (select * from lineitem where l_shipdate = '2023-12-11') t1
> left join orders on t1.l_orderkey = orders.o_orderkey and t1.l_shipdate = o_orderdate
> group by
> o_orderdate,
> l_partkey,
> l_suppkey;
Add some cases for index compaction:
1. index compaction with duplicate key table
2. index compaction with unique key table
3. optimize index compaction with multi segments in one inverted index
Fix query rewrite by mv fail when self join, after fix query like following can be rewrited
def materialized view = """
select
a.o_orderkey,
count(distinct a.o_orderstatus) num1,
SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority = 1 AND a.o_orderdate = '2023-12-08' AND b.o_orderdate = '2023-12-09' THEN a.o_shippriority+b.o_custkey ELSE 0 END) num2,
SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority = 1 AND a.o_orderdate >= '2023-12-01' AND a.o_orderdate <= '2023-12-09' THEN a.o_shippriority+b.o_custkey ELSE 0 END) num3,
SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority in (1,2) AND a.o_orderdate >= '2023-12-08' AND b.o_orderdate <= '2023-12-09' THEN a.o_shippriority-b.o_custkey ELSE 0 END) num4,
AVG(a.o_totalprice) num5,
MAX(b.o_totalprice) num6,
MIN(a.o_totalprice) num7
from
orders a
left outer join orders b
on a.o_orderkey = b.o_orderkey
and a.o_custkey = b.o_custkey
group by a.o_orderkey;
"""
def query = """
select
a.o_orderkey,
SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority = 1 AND a.o_orderdate = '2023-12-08' AND b.o_orderdate = '2023-12-09' THEN a.o_shippriority+b.o_custkey ELSE 0 END) num2,
SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority = 1 AND a.o_orderdate >= '2023-12-01' AND a.o_orderdate <= '2023-12-09' THEN a.o_shippriority+b.o_custkey ELSE 0 END) num3,
SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority in (1,2) AND a.o_orderdate >= '2023-12-08' AND b.o_orderdate <= '2023-12-09' THEN a.o_shippriority-b.o_custkey ELSE 0 END) num4,
AVG(a.o_totalprice) num5,
MAX(b.o_totalprice) num6,
MIN(a.o_totalprice) num7
from
orders a
left outer join orders b
on a.o_orderkey = b.o_orderkey
and a.o_custkey = b.o_custkey
group by a.o_orderkey;
"""
pred column also needs to be filtered by expr, exclude delete condition column, delete condition column not need to be filtered, query engine does not need it, after _output_column_by_sel_idx, delete condition materialize column will be erase at the end of the block.
Eg:
delete from table where a = 10;
select b from table;
a column only effective in segment iterator, the block from query engine only contain the b column, so no need to filter a column by expr.