The first time data is loaded into a partition, the partition columns need to be analyzed even when the health rate is high. Otherwise, the recorded min/max values of the column may not cover the new partition's values, which can lead to a bad plan.
Fix query rewrite by materialized view failing on self joins. After the fix, a query like the following can be rewritten:
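The decision described above can be sketched as follows. This is a minimal illustration, not the actual implementation: the class name, the method name, and the threshold value of 60 are all assumptions made for the example.

```java
// Hypothetical sketch: names and the threshold value are assumptions,
// not taken from the real code base.
final class PartitionAnalyzeSketch {
    // Assumed health threshold (percent); below it, stats count as stale.
    static final int HEALTH_THRESHOLD = 60;

    // Returns true when the partition columns should be analyzed.
    static boolean needsAnalyze(boolean firstLoadIntoPartition, int healthRate) {
        // A first load always triggers analysis, regardless of health,
        // so that min/max values cover the new partition's values.
        if (firstLoadIntoPartition) {
            return true;
        }
        // Otherwise fall back to the usual health check.
        return healthRate < HEALTH_THRESHOLD;
    }
}
```

With these assumptions, a first load into a partition triggers analysis even at a health rate of 95, while a later load at the same health rate does not.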
def materialized_view = """
select
a.o_orderkey,
count(distinct a.o_orderstatus) num1,
SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority = 1 AND a.o_orderdate = '2023-12-08' AND b.o_orderdate = '2023-12-09' THEN a.o_shippriority+b.o_custkey ELSE 0 END) num2,
SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority = 1 AND a.o_orderdate >= '2023-12-01' AND a.o_orderdate <= '2023-12-09' THEN a.o_shippriority+b.o_custkey ELSE 0 END) num3,
SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority in (1,2) AND a.o_orderdate >= '2023-12-08' AND b.o_orderdate <= '2023-12-09' THEN a.o_shippriority-b.o_custkey ELSE 0 END) num4,
AVG(a.o_totalprice) num5,
MAX(b.o_totalprice) num6,
MIN(a.o_totalprice) num7
from
orders a
left outer join orders b
on a.o_orderkey = b.o_orderkey
and a.o_custkey = b.o_custkey
group by a.o_orderkey;
"""
def query = """
select
a.o_orderkey,
SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority = 1 AND a.o_orderdate = '2023-12-08' AND b.o_orderdate = '2023-12-09' THEN a.o_shippriority+b.o_custkey ELSE 0 END) num2,
SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority = 1 AND a.o_orderdate >= '2023-12-01' AND a.o_orderdate <= '2023-12-09' THEN a.o_shippriority+b.o_custkey ELSE 0 END) num3,
SUM(CASE WHEN a.o_orderstatus = 'o' AND a.o_shippriority in (1,2) AND a.o_orderdate >= '2023-12-08' AND b.o_orderdate <= '2023-12-09' THEN a.o_shippriority-b.o_custkey ELSE 0 END) num4,
AVG(a.o_totalprice) num5,
MAX(b.o_totalprice) num6,
MIN(a.o_totalprice) num7
from
orders a
left outer join orders b
on a.o_orderkey = b.o_orderkey
and a.o_custkey = b.o_custkey
group by a.o_orderkey;
"""
Show auto analyze now shows running jobs as well, not only finished/failed jobs.
Show analyze task status can now show auto analyze tasks as well.
Remove some useless code.
Auto analyze processes catalogs, databases, and tables in order of id, smallest id first.
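One way to picture this ordering is a sort on (catalog id, db id, table id). The sketch below is illustrative only: the `Target` class and `inProcessingOrder` method are hypothetical, and treating the order as a lexicographic sort across the three levels is an assumption about how the per-level ordering composes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch: Target and inProcessingOrder are illustrative names.
final class AutoAnalyzeOrderSketch {
    static final class Target {
        final long catalogId;
        final long dbId;
        final long tableId;

        Target(long catalogId, long dbId, long tableId) {
            this.catalogId = catalogId;
            this.dbId = dbId;
            this.tableId = tableId;
        }
    }

    // Returns a new list ordered by catalog id, then db id, then table id,
    // smallest id first. The input list is left unchanged.
    static List<Target> inProcessingOrder(List<Target> targets) {
        List<Target> sorted = new ArrayList<>(targets);
        sorted.sort(Comparator.comparingLong((Target t) -> t.catalogId)
                .thenComparingLong(t -> t.dbId)
                .thenComparingLong(t -> t.tableId));
        return sorted;
    }
}
```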
Sample analyze needs to get the row count by calling table.getRowCount(). This method is not updated in real time, which may cause the sample task to scan the whole table.
This PR fixes that: set a flag indicating that the analyze job is for an empty table, and skip scanning the table. Meanwhile, don't reset updatedRows in this case.
Set hugeTableAutoAnalyzeIntervalInMillis = 0, because all default huge table sizes have been set to 0.
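The control flow of the fix might look roughly like the sketch below. It is a simplified illustration under stated assumptions: the `Job` class, its field names, and the `run` method are hypothetical, and a reported row count of 0 is assumed to be the condition that marks the job as an empty-table job.

```java
// Hypothetical sketch: Job, its fields, and run() are illustrative names,
// not the actual classes involved in the fix.
final class SampleAnalyzeSketch {
    static final class Job {
        boolean emptyTableJob; // flag: job ran against a table reported as empty
        boolean scanned;       // whether the table was actually scanned
        long updatedRows;      // rows changed since the last analyze
    }

    // reportedRowCount stands in for the possibly stale table.getRowCount().
    static void run(Job job, long reportedRowCount) {
        if (reportedRowCount == 0) {
            // Mark the job and skip the scan. updatedRows is deliberately
            // left untouched, so the pending changes are not forgotten.
            job.emptyTableJob = true;
            return;
        }
        // Normal path: scan (sampled) and reset the change counter.
        job.scanned = true;
        job.updatedRows = 0;
    }
}
```

Keeping updatedRows intact in the empty-table case means the table still looks changed to later analyze runs, once the row count has caught up.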