The names of the functions introduced in these PRs should follow a more standard convention:
#27342,
#25510,
#20936,
including the following:
ipv4numtostring,
ipv4stringtonum,
ipv4stringtonumordefault,
ipv4stringtonumornull,
ipv6numtostring.
This PR adds the necessary underscores between the words of each name,
for example renaming ipv4numtostring to ipv4_num_to_string.
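
The renaming rule can be illustrated with a small Python sketch. It is not part of the PR: the helper `add_underscores` and its word vocabulary are hypothetical and only cover the five names listed above. The resulting names follow from the rule stated in this description.

```python
import re

# Hypothetical vocabulary: just the words that occur in the five names above.
_WORDS = re.compile(r"ipv4|ipv6|string|default|null|num|to|or")

OLD_NAMES = [
    "ipv4numtostring",
    "ipv4stringtonum",
    "ipv4stringtonumordefault",
    "ipv4stringtonumornull",
    "ipv6numtostring",
]


def add_underscores(name: str) -> str:
    """Split a run-together name into known words and join them with '_'."""
    words = _WORDS.findall(name)
    # Refuse names that the vocabulary cannot cover completely.
    if "".join(words) != name:
        raise ValueError(f"cannot split {name!r} with the known vocabulary")
    return "_".join(words)


# Old name -> new name, derived from the rule rather than typed by hand.
RENAMES = {old: add_underscores(old) for old in OLD_NAMES}

if __name__ == "__main__":
    for old, new in RENAMES.items():
        print(f"{old} -> {new}")
```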
Change where computeSampleTabletIds is called. Table sampling did not take effect because selectedIndexId was -1 when computeSampleTabletIds ran. The FE UT had special handling for this case, so it did not catch the bug in time.
During query rewrite by materialized view (mv), we may want to know details of the rewrite process,
such as which materialized views were used in the rewrite, which were rewritten successfully, and
which one was finally chosen by cost.
We can run the following sql to see a summary of the mv rewrite process:
`explain <your_query_sql>`
The materialized view rewrite info is under the **MATERIALIZATIONS** tag.
For example, in the output below,
we can see that the materialized view named `mv2_3` was rewritten successfully and finally chosen,
while the materialized views named `mv2_4` and `mv1_3` were available but failed to rewrite.
Materialized View
MaterializedViewRewriteFail:
name: mv2_4
FailSummary: The graph logic between query and view is not consistent
name: mv1_3
FailSummary: Match mode is invalid
MaterializedViewRewriteSuccessButNotChose:
Names:
MaterializedViewRewriteSuccessAndChose:
Names: mv2_3
`MaterializedViewRewriteFail`:
the attempt to use this materialized view to represent the query failed.
`NAME` (the `name` field above) is the name of the MTMV.
`FAIL_SUMMARY` (the `FailSummary` field above) is a summary of the failure reason.
`MaterializedViewRewriteSuccessButNotChose`:
the query was successfully represented using this materialized view, but the CBO optimizer did not choose it in the end.
`MaterializedViewRewriteSuccessAndChose`:
the query was successfully represented using this materialized view, and the CBO optimizer chose it in the end.
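
For scripting around the summary, the three sections can be read with a few lines of Python. This is a minimal sketch, not part of the PR: `parse_mv_summary` is a hypothetical helper, it assumes the line layout shown in the example above, and it assumes that multiple entries after `Names:` are comma-separated.

```python
def parse_mv_summary(text: str) -> dict:
    """Group materialized view names by rewrite outcome.

    Assumes the section headers and the `name:` / `FailSummary:` / `Names:`
    labels appear as in the example output; leading whitespace is ignored.
    """
    result = {"failed": {}, "not_chosen": [], "chosen": []}
    section = None
    current = None  # name of the failed MV whose summary we expect next
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("MaterializedViewRewriteFail"):
            section = "failed"
        elif line.startswith("MaterializedViewRewriteSuccessButNotChose"):
            section = "not_chosen"
        elif line.startswith("MaterializedViewRewriteSuccessAndChose"):
            section = "chosen"
        elif section == "failed" and line.lower().startswith("name:"):
            current = line.split(":", 1)[1].strip()
            result["failed"][current] = ""
        elif section == "failed" and line.startswith("FailSummary:") and current:
            result["failed"][current] = line.split(":", 1)[1].strip()
        elif section in ("not_chosen", "chosen") and line.startswith("Names:"):
            names = line.split(":", 1)[1]
            # Assumption: several names would be separated by commas.
            result[section].extend(n.strip() for n in names.split(",") if n.strip())
    return result


SAMPLE = """\
MaterializedViewRewriteFail:
name: mv2_4
FailSummary: The graph logic between query and view is not consistent
name: mv1_3
FailSummary: Match mode is invalid
MaterializedViewRewriteSuccessButNotChose:
Names:
MaterializedViewRewriteSuccessAndChose:
Names: mv2_3
"""

if __name__ == "__main__":
    print(parse_mv_summary(SAMPLE))
```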
To see more detail, we can also run the following sql to see the detailed mv rewrite process info:
`explain memo plan <your_query_sql>`
The materialized view rewrite info is under the **MATERIALIZATIONS** tag.
For example, in the output below,
we can see that the materialized view named `mv2_3` was rewritten successfully and finally chosen,
while the materialized views named `mv2_4` and `mv1_3` failed, each with its failure reason.
========== MATERIALIZATIONS ==========
materializationContexts:
MaterializationContext[mv1_3] ( rewriteSuccess=false, failReason=[
ObjectId : ObjectId#257.
Summary : Match mode is invalid.
Reason : matchMode is VIEW_PARTIAL.
ObjectId : ObjectId#260.
Summary : Match mode is invalid.
Reason : matchMode is VIEW_PARTIAL.
ObjectId : ObjectId#251.
Summary : Match mode is invalid.
Reason : matchMode is VIEW_PARTIAL.
ObjectId : ObjectId#254.
Summary : Match mode is invalid.
Reason : matchMode is VIEW_PARTIAL.
] )
MaterializationContext[mv2_4] ( rewriteSuccess=false, failReason=[
ObjectId : ObjectId#771.
Summary : The graph logic between query and view is not consistent.
Reason : graph logical is not equal
query join edges is
[<{0} --LEFT_OUTER_JOIN-- {1}>],
view join edges is
[<{0} --LEFT_OUTER_JOIN-- {1}>],
query filter edges
is [<{0} --FILTER-- {}>, <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]],
view filter edges
is [<{0} --FILTER-- {}>, <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]]
inferred edge with conditions
{}
with error edge <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]=[(o_orderdate#20 = 2023-12-01)].
ObjectId : ObjectId#762.
Summary : The graph logic between query and view is not consistent.
Reason : graph logical is not equal
query join edges is
[<{0} --LEFT_OUTER_JOIN-- {1}>],
view join edges is
[<{0} --LEFT_OUTER_JOIN-- {1}>],
query filter edges
is [<{0} --FILTER-- {}>, <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]],
view filter edges
is [<{0} --FILTER-- {}>, <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]]
inferred edge with conditions
{}
with error edge <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]=[(o_orderdate#20 = 2023-12-01)].
] )
MaterializationContext[mv2_3] ( rewriteSuccess=true, failReason=[
] )
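
When the memo output is long, the per-view result can be pulled out of the `MaterializationContext[...]` header lines. The following Python sketch is an illustration only: `rewrite_status` is a hypothetical helper, and the regular expression assumes the header format shown in the output above.

```python
import re

# Matches headers such as:
#   MaterializationContext[mv2_3] ( rewriteSuccess=true, failReason=[
_CONTEXT = re.compile(
    r"MaterializationContext\[(?P<name>[^\]]+)\]\s*\(\s*"
    r"rewriteSuccess=(?P<ok>true|false)"
)


def rewrite_status(memo_text: str) -> dict:
    """Map each materialized view name to its rewriteSuccess flag."""
    return {
        m.group("name"): m.group("ok") == "true"
        for m in _CONTEXT.finditer(memo_text)
    }


SAMPLE_HEADERS = """\
MaterializationContext[mv1_3] ( rewriteSuccess=false, failReason=[
] )
MaterializationContext[mv2_4] ( rewriteSuccess=false, failReason=[
] )
MaterializationContext[mv2_3] ( rewriteSuccess=true, failReason=[
] )
"""

if __name__ == "__main__":
    print(rewrite_status(SAMPLE_HEADERS))
```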
`ObjectId` is the id of the group expression.
`Summary` is a summary of the failure reason.
`Reason` is the detailed failure reason.
Take the following info from above as an example:
MaterializationContext[mv2_4] ( rewriteSuccess=false, failReason=[
ObjectId : ObjectId#762.
Summary : The graph logic between query and view is not consistent.
Reason : graph logical is not equal
query join edges is
[<{0} --LEFT_OUTER_JOIN-- {1}>],
view join edges is
[<{0} --LEFT_OUTER_JOIN-- {1}>],
query filter edges
is [<{0} --FILTER-- {}>, <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]],
view filter edges
is [<{0} --FILTER-- {}>, <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]]
inferred edge with conditions
{}
with error edge <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]=[(o_orderdate#20 = 2023-12-01)].
]
`0` represents the table lineitem.
`1` represents the table orders.
`[<{0} --LEFT_OUTER_JOIN-- {1}>]` is the edge for lineitem left outer join orders.
`[<{0} --FILTER-- {}>, <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]]` means there is a filter above orders that cannot be pulled up because of the edge `[<{0} --LEFT_OUTER_JOIN-- {1}>]`.
The rewrite fails because `[(o_orderdate#20 = 2023-12-01)]` in the query is not found in **mv2_4**.
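
The edge notation can be made easier to read by substituting table names for the node indexes. This is a minimal Python sketch for reading the output, not code from the PR: `describe_edges` is a hypothetical helper, the index-to-table mapping is the one given above, and the pattern assumes each side of an edge holds at most one index.

```python
import re

# Node index -> table name, as stated for this example.
TABLES = {"0": "lineitem", "1": "orders"}

# Matches one edge such as <{0} --LEFT_OUTER_JOIN-- {1}> or <{1} --FILTER-- {}>.
_EDGE = re.compile(r"<\{(\d*)\} --(\w+)-- \{(\d*)\}>")


def describe_edges(text: str) -> list:
    """Return (left_table, edge_type, right_table) for every edge in text.

    An empty side, as in a FILTER edge, is reported as None.
    """
    edges = []
    for left, kind, right in _EDGE.findall(text):
        left_name = TABLES.get(left, left) if left else None
        right_name = TABLES.get(right, right) if right else None
        edges.append((left_name, kind, right_name))
    return edges


if __name__ == "__main__":
    print(describe_edges("[<{0} --LEFT_OUTER_JOIN-- {1}>]"))
    print(describe_edges(
        "[<{0} --FILTER-- {}>, <{1} --FILTER-- {}>"
        "[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]]"
    ))
```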
**mv1_3** is defined as follows:
CREATE MATERIALIZED VIEW mv1_3
BUILD IMMEDIATE REFRESH auto ON SCHEDULE EVERY 1 hour
DISTRIBUTED BY RANDOM BUCKETS 12
PROPERTIES ('replication_num' = '1') as
select
o_orderstatus,
o_clerk
from
orders
where
O_ORDERDATE = '2023-12-01'
group by
o_orderstatus,
o_clerk;
**mv2_3** is defined as follows:
CREATE MATERIALIZED VIEW mv2_3
BUILD IMMEDIATE REFRESH auto ON SCHEDULE EVERY 1 hour
DISTRIBUTED BY RANDOM BUCKETS 12
PROPERTIES ('replication_num' = '1') as
select
l_linestatus,
o_clerk
from
(
select
*
from
lineitem
where
l_shipdate = '2023-12-01'
) t1
left join (
select
*
from
orders
where
o_orderdate = '2023-12-01'
) t2 on l_orderkey = o_orderkey
group by
l_linestatus,
o_clerk;
**mv2_4** is defined as follows:
CREATE MATERIALIZED VIEW mv2_4
BUILD IMMEDIATE REFRESH auto ON SCHEDULE EVERY 1 hour
DISTRIBUTED BY RANDOM BUCKETS 12
PROPERTIES ('replication_num' = '1') as
select
l_linestatus,
o_clerk
from
(
select
*
from
lineitem
where
l_shipdate >= '2023-12-01' and l_shipdate <= '2023-12-05'
) t1
left join (
select
*
from
orders
where
o_orderdate >= '2023-12-01' and o_orderdate <= '2023-12-05'
) t2 on l_orderkey = o_orderkey
group by
l_linestatus,
o_clerk;
When an aggregate function is rolled up, we should check that the function arguments in the query and the mv are equal.
For example, with the mv def and query sql below, the rewrite should not succeed, because the argument of the `bitmap_union_basic` field is
not equal to that of the `count(distinct case when o_shippriority > 10 and o_orderkey IN (1, 3) then o_custkey else null end)` field in the query.
mv def:
> select l_shipdate, o_orderdate, l_partkey, l_suppkey,
> sum(o_totalprice) as sum_total,
> max(o_totalprice) as max_total,
> min(o_totalprice) as min_total,
> count(*) as count_all,
> bitmap_union(to_bitmap(case when o_shippriority > 1 and o_orderkey IN (1, 3) then o_custkey else null end)) as bitmap_union_basic
> from lineitem
> left join orders on lineitem.l_orderkey = orders.o_orderkey and l_shipdate = o_orderdate
> group by
> l_shipdate,
> o_orderdate,
> l_partkey,
> l_suppkey;
query sql:
> select t1.l_partkey, t1.l_suppkey, o_orderdate,
> sum(o_totalprice),
> max(o_totalprice),
> min(o_totalprice),
> count(*),
> count(distinct case when o_shippriority > 10 and o_orderkey IN (1, 3) then o_custkey else null end)
> from (select * from lineitem where l_shipdate = '2023-12-11') t1
> left join orders on t1.l_orderkey = orders.o_orderkey and t1.l_shipdate = o_orderdate
> group by
> o_orderdate,
> l_partkey,
> l_suppkey;
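
The check can be illustrated outside the optimizer with a simplified Python sketch. It is not the implementation in this PR: `can_roll_up` and `normalize` are hypothetical helpers that compare the argument text after normalizing whitespace and case, whereas a real check would compare expression trees. Lowercasing is a simplification that would also change string literals.

```python
import re


def normalize(expr: str) -> str:
    """Collapse whitespace and lowercase the expression text (simplification)."""
    return re.sub(r"\s+", " ", expr.strip().lower())


def can_roll_up(mv_arg: str, query_arg: str) -> bool:
    """Allow roll-up only when the mv and query aggregate the same argument."""
    return normalize(mv_arg) == normalize(query_arg)


# Argument inside bitmap_union(to_bitmap(...)) in the mv def above.
MV_ARG = (
    "case when o_shippriority > 1 and o_orderkey IN (1, 3) "
    "then o_custkey else null end"
)
# Argument inside count(distinct ...) in the query above.
QUERY_ARG = (
    "case when o_shippriority > 10 and o_orderkey IN (1, 3) "
    "then o_custkey else null end"
)

if __name__ == "__main__":
    # The thresholds differ (> 1 vs > 10), so the roll-up must be rejected.
    print(can_roll_up(MV_ARG, QUERY_ARG))
```

With the two arguments above the check returns false, which matches the expectation that this query is not rewritten with this mv.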
My organization uses HMS catalog to accelerate lake queries. Since we have a custom distributed file system that is hard to integrate into FE / BE, we introduced HMS Catalog broker scan support (#24830) and implemented the custom distributed file system adaptation in the broker.
We want to expand the scope of use to Iceberg table scans in HMS Catalog. This PR introduces the broker-scan-related classes `IcebergBrokerIO`, `BrokerInputFile`, and `BrokerInputStream` for Iceberg table scans.