Query rewrite by materialized view now supports roll-up for `bitmap_union` and `bitmap_union_count`. The aggregate functions that support roll-up are listed below:
| Function in query | Function in materialized view | Function after roll-up |
|------------------|--------------|--------------------|
| max | max | max |
| min | min | min |
| sum | sum | sum |
| count | count | sum |
| count(distinct ) | bitmap_union | bitmap_union_count |
| bitmap_union | bitmap_union | bitmap_union |
| bitmap_union_count | bitmap_union | bitmap_union_count |
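One way to read the table: the materialized view stores a partial aggregate per fine-grained group, and a query over a coarser group is answered by applying the roll-up function to those partials. The sketch below illustrates this with made-up rows. Python sets stand in for bitmaps, and every name in it is hypothetical rather than a Doris API.

```python
from collections import defaultdict

# Hypothetical base rows: (city, day, user_id).
rows = [
    ("bj", "d1", 1), ("bj", "d1", 2), ("bj", "d2", 2),
    ("sh", "d1", 3), ("sh", "d2", 3), ("sh", "d2", 4),
]

# Stand-in for a materialized view grouped by (city, day) that keeps
# count(*) and bitmap_union(user_id); a set plays the role of the bitmap.
mv = defaultdict(lambda: {"cnt": 0, "users": set()})
for city, day, user in rows:
    mv[(city, day)]["cnt"] += 1
    mv[(city, day)]["users"].add(user)


def rollup_by_city(mv_rows):
    """Answer a query grouped by city using only the view's rows."""
    out = defaultdict(lambda: {"cnt": 0, "users": set()})
    for (city, _day), agg in mv_rows.items():
        out[city]["cnt"] += agg["cnt"]      # count rolls up as sum
        out[city]["users"] |= agg["users"]  # bitmap_union rolls up as bitmap_union
    # count(distinct user_id) rolls up as bitmap_union_count (size of the union)
    return {city: (a["cnt"], len(a["users"])) for city, a in out.items()}


result = rollup_by_city(mv)
print(result)
```

Summing the per-day distinct counts for `bj` would give 3 instead of 2, which is why `count(distinct)` needs the bitmap form in the view rather than a plain count.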
This depends on https://github.com/apache/doris/pull/29256
The current logic for SQL dialect conversion is all in the `fe-core` module, which may lead to the following issues:
- The dialect conversion logic may change frequently. Because it lives in the fe-core module, users have to upgrade Doris itself to pick up each change, which lengthens the change cycle.
- Customized development is costly, because users have to replace the fe-core JAR package.
Turning the dialect conversion into a plugin addresses both issues.
Problem:
FE UT failed because of a null pointer error.
Cause:
In FE UT, getting the statement context from the connection context failed.
Resolved:
Add a null check.
Force the use of zonemap when collecting min and max for string-type columns.
Previously, string types did not use zonemap for min/max, because the zonemap value on the BE side is truncated at 512 bytes, which may make the value inaccurate. This is acceptable for statistics min/max, and it also avoids scanning the whole table while sampling.
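To make the accuracy trade-off concrete, here is a small sketch of what truncation does to min/max. Only the 512-byte limit comes from the text above; the `truncate` helper and the sample values are hypothetical, and the real BE truncation logic may differ in detail.

```python
ZONEMAP_MAX_LEN = 512  # truncation length stated in the description above


def truncate(value: bytes, limit: int = ZONEMAP_MAX_LEN) -> bytes:
    """Keep at most `limit` leading bytes (stand-in for zonemap truncation)."""
    return value[:limit]


# Hypothetical column values; the two long ones share a 512-byte prefix.
values = [b"a" * 10, b"z" * 600 + b"x", b"z" * 600 + b"y"]

true_min, true_max = min(values), max(values)
zm_min = min(truncate(v) for v in values)
zm_max = max(truncate(v) for v in values)

# Short values are unaffected, so the min is exact here.
print(zm_min == true_min)
# The max is only a prefix of the real max, so it is inexact but close.
print(zm_max == true_max, true_max.startswith(zm_max))
```

In this sketch the truncated value is always a prefix of the real one, so it never sorts after the real value. That kind of approximation is usually tolerable for optimizer statistics, which is the argument the description makes.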
BE does not support runtime filters (RF) for NullSafeEquals, so FE does not generate RFs for them.
However, after NullSafeEquals was supported as a hash join condition,
the order of RFs became wrong when they are generated in FE. This PR fixes it.
The signatures of functions in these PRs should be more standard:
#27342,
#25510,
#20936,
including the following:
ipv4numtostring,
ipv4stringtonum,
ipv4stringtonumordefault,
ipv4stringtonumornull,
ipv6numtostring.
This PR will add necessary underscores between the words of each of them,
like changing ipv4numtostring to ipv4_num_to_string.
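For callers that still use the old spellings, a migration helper could look like the sketch below. Only the `ipv4numtostring` to `ipv4_num_to_string` pair is spelled out above; the other four new names are inferred from the same underscore rule and should be treated as assumptions. The `migrate` function itself is hypothetical and not part of the PR.

```python
import re

# Old name -> new name. Only the first pair is stated in the description;
# the rest are inferred from the "underscore between words" rule.
RENAMES = {
    "ipv4numtostring": "ipv4_num_to_string",
    "ipv4stringtonum": "ipv4_string_to_num",
    "ipv4stringtonumordefault": "ipv4_string_to_num_or_default",
    "ipv4stringtonumornull": "ipv4_string_to_num_or_null",
    "ipv6numtostring": "ipv6_num_to_string",
}

# Match an old name only when it is a whole word followed by "(".
_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(RENAMES, key=len, reverse=True)) + r")\b(?=\s*\()",
    re.IGNORECASE,
)


def migrate(sql: str) -> str:
    """Rewrite old function names in a SQL string to the new spellings."""
    return _PATTERN.sub(lambda m: RENAMES[m.group(1).lower()], sql)


print(migrate("select ipv4numtostring(c), IPV4StringToNumOrNull(s) from t"))
```

The helper is purely textual, so it would also rewrite matches inside string literals or comments; a real migration would want a SQL parser.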
Move the call to computeSampleTabletIds. Table sampling did not take effect because selectedIndexId was -1 inside computeSampleTabletIds. FE UT previously had special handling for this case, so it did not catch the bug in time.
During query rewrite by materialized view, we may want to know details of the rewrite process,
such as which materialized views were used for rewriting, which were rewritten successfully, and
which one was finally chosen by cost.
Run the following SQL to see a summary of the mv rewrite process:
`explain <your_query_sql>`
MaterializedView rewrite info is under the **MATERIALIZATIONS** tag.
For example, in the output below,
the materialized view `mv2_3` is rewritten successfully and finally chosen,
while the materialized views `mv2_4` and `mv1_3` are available but fail to rewrite:
Materialized View
MaterializedViewRewriteFail:
name: mv2_4
FailSummary: The graph logic between query and view is not consistent
name: mv1_3
FailSummary: Match mode is invalid
MaterializedViewRewriteSuccessButNotChose:
Names:
MaterializedViewRewriteSuccessAndChose:
Names: mv2_3
`MaterializedViewRewriteFail`:
Using this materialized view to represent the query failed.
`name` is the name of the MTMV.
`FailSummary` is a summary of the failure reason.
`MaterializedViewRewriteSuccessButNotChose`:
The query was successfully rewritten to use this materialized view, but the CBO optimizer did not choose it in the end.
`MaterializedViewRewriteSuccessAndChose`:
The query was successfully rewritten to use this materialized view, and the CBO optimizer chose it in the end.
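When this summary has to be checked in a script or a regression test, it can be parsed with a few lines of code. The sketch below assumes the layout shown in the example above, and additionally assumes that multiple entries on a `Names:` line are comma-separated. `parse_mv_summary` is a hypothetical helper, not something Doris provides.

```python
def parse_mv_summary(text: str) -> dict:
    """Parse the materialized view summary section of `explain` output."""
    result = {"failed": {}, "success_not_chosen": [], "success_chosen": []}
    sections = {
        "MaterializedViewRewriteFail:": "failed",
        "MaterializedViewRewriteSuccessButNotChose:": "success_not_chosen",
        "MaterializedViewRewriteSuccessAndChose:": "success_chosen",
    }
    section, current = None, None
    for raw in text.splitlines():
        line = raw.strip()
        if line in sections:
            section = sections[line]
        elif section == "failed" and line.startswith("name:"):
            current = line.split(":", 1)[1].strip()
            result["failed"][current] = ""
        elif section == "failed" and line.startswith("FailSummary:") and current:
            result["failed"][current] = line.split(":", 1)[1].strip()
        elif section in ("success_not_chosen", "success_chosen") and line.startswith("Names:"):
            # Assumption: several names would be separated by commas.
            names = line.split(":", 1)[1]
            result[section] = [n.strip() for n in names.split(",") if n.strip()]
    return result


sample = """Materialized View
MaterializedViewRewriteFail:
name: mv2_4
FailSummary: The graph logic between query and view is not consistent
name: mv1_3
FailSummary: Match mode is invalid
MaterializedViewRewriteSuccessButNotChose:
Names:
MaterializedViewRewriteSuccessAndChose:
Names: mv2_3
"""
parsed = parse_mv_summary(sample)
print(parsed["success_chosen"], sorted(parsed["failed"]))
```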
To see more detail, run the following SQL to get the detailed mv rewrite process info:
`explain memo plan <your_query_sql>`
MaterializedView rewrite info is under the **MATERIALIZATIONS** tag.
For example, in the output below,
the materialized view `mv2_3` is rewritten successfully and finally chosen,
while the materialized views `mv2_4` and `mv1_3` fail, each with its fail reason:
========== MATERIALIZATIONS ==========
materializationContexts:
MaterializationContext[mv1_3] ( rewriteSuccess=false, failReason=[
ObjectId : ObjectId#257.
Summary : Match mode is invalid.
Reason : matchMode is VIEW_PARTIAL.
ObjectId : ObjectId#260.
Summary : Match mode is invalid.
Reason : matchMode is VIEW_PARTIAL.
ObjectId : ObjectId#251.
Summary : Match mode is invalid.
Reason : matchMode is VIEW_PARTIAL.
ObjectId : ObjectId#254.
Summary : Match mode is invalid.
Reason : matchMode is VIEW_PARTIAL.
] )
MaterializationContext[mv2_4] ( rewriteSuccess=false, failReason=[
ObjectId : ObjectId#771.
Summary : The graph logic between query and view is not consistent.
Reason : graph logical is not equal
query join edges is
[<{0} --LEFT_OUTER_JOIN-- {1}>],
view join edges is
[<{0} --LEFT_OUTER_JOIN-- {1}>],
query filter edges
is [<{0} --FILTER-- {}>, <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]],
view filter edges
is [<{0} --FILTER-- {}>, <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]]
inferred edge with conditions
{}
with error edge <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]=[(o_orderdate#20 = 2023-12-01)].
ObjectId : ObjectId#762.
Summary : The graph logic between query and view is not consistent.
Reason : graph logical is not equal
query join edges is
[<{0} --LEFT_OUTER_JOIN-- {1}>],
view join edges is
[<{0} --LEFT_OUTER_JOIN-- {1}>],
query filter edges
is [<{0} --FILTER-- {}>, <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]],
view filter edges
is [<{0} --FILTER-- {}>, <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]]
inferred edge with conditions
{}
with error edge <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]=[(o_orderdate#20 = 2023-12-01)].
] )
MaterializationContext[mv2_3] ( rewriteSuccess=true, failReason=[
] )
`ObjectId` is the id of the group expression.
`Summary` is a summary of the fail reason.
`Reason` is the detailed fail reason.
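Because the same summary is usually repeated for several group expressions, it can help to condense the section into one entry per materialized view. The following sketch does that with regular expressions written against the output format shown above; `summarize` is a hypothetical helper and the format may change between versions.

```python
import re

CTX = re.compile(
    r"MaterializationContext\[(?P<name>[^\]]+)\]\s*\(\s*rewriteSuccess=(?P<ok>true|false)"
)
SUMMARY = re.compile(r"^\s*Summary\s*:\s*(?P<text>.+?)\s*$")


def summarize(memo_text: str) -> dict:
    """Map each materialized view to its success flag and distinct summaries."""
    out, current = {}, None
    for line in memo_text.splitlines():
        ctx = CTX.search(line)
        if ctx:
            current = ctx.group("name")
            out[current] = {"success": ctx.group("ok") == "true", "summaries": []}
            continue
        summary = SUMMARY.match(line)
        if summary and current is not None:
            text = summary.group("text")
            if text not in out[current]["summaries"]:  # drop repeats
                out[current]["summaries"].append(text)
    return out


sample = """
MaterializationContext[mv1_3] ( rewriteSuccess=false, failReason=[
ObjectId : ObjectId#257.
Summary : Match mode is invalid.
Reason : matchMode is VIEW_PARTIAL.
ObjectId : ObjectId#260.
Summary : Match mode is invalid.
Reason : matchMode is VIEW_PARTIAL.
] )
MaterializationContext[mv2_3] ( rewriteSuccess=true, failReason=[
] )
"""
contexts = summarize(sample)
print(contexts)
```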
For example, consider this entry from the output above:
MaterializationContext[mv2_4] ( rewriteSuccess=false, failReason=[
ObjectId : ObjectId#762.
Summary : The graph logic between query and view is not consistent.
Reason : graph logical is not equal
query join edges is
[<{0} --LEFT_OUTER_JOIN-- {1}>],
view join edges is
[<{0} --LEFT_OUTER_JOIN-- {1}>],
query filter edges
is [<{0} --FILTER-- {}>, <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]],
view filter edges
is [<{0} --FILTER-- {}>, <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]]
inferred edge with conditions
{}
with error edge <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]=[(o_orderdate#20 = 2023-12-01)].
]
`0` represents table lineitem
`1` represents table orders
`[<{0} --LEFT_OUTER_JOIN-- {1}>]` is the edge for lineitem left outer join orders
`[<{0} --FILTER-- {}>, <{1} --FILTER-- {}>[[] , [<{0} --LEFT_OUTER_JOIN-- {1}>]]]` means there is a filter above orders that cannot be pulled up because of the edge `[<{0} --LEFT_OUTER_JOIN-- {1}>]`.
The rewrite fails because `[(o_orderdate#20 = 2023-12-01)]` in the query is not found in **mv2_4**.
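Why a filter on the nullable side of a left outer join cannot simply be pulled above the join can be seen in a tiny simulation. The query itself is not shown here, so the sketch assumes it filters orders on `o_orderdate = '2023-12-01'` below the join, as the error edge suggests. The rows and the `left_join` helper are made up for illustration.

```python
# Hypothetical rows using the column names from the view definitions.
lineitem = [{"l_orderkey": 1}, {"l_orderkey": 2}]
orders = [
    {"o_orderkey": 1, "o_orderdate": "2023-12-01"},
    {"o_orderkey": 2, "o_orderdate": "2023-12-03"},
]


def left_join(left, right):
    """lineitem LEFT OUTER JOIN orders ON l_orderkey = o_orderkey."""
    out = []
    for l in left:
        matches = [r for r in right if r["o_orderkey"] == l["l_orderkey"]]
        if matches:
            out.extend({**l, **r} for r in matches)
        else:
            # No match: keep the left row and pad the right side with NULLs.
            out.append({**l, "o_orderkey": None, "o_orderdate": None})
    return out


# Filter applied to orders below the join.
below = left_join(lineitem, [o for o in orders if o["o_orderdate"] == "2023-12-01"])

# Same predicate pulled up above the join.
above = [r for r in left_join(lineitem, orders) if r["o_orderdate"] == "2023-12-01"]

print(len(below), len(above))
```

With the filter below the join, the lineitem row for order 2 survives with a NULL-padded orders side; with the filter above the join, that row is dropped. The two plans are therefore not equivalent, which is why the filter stays attached to the join edge during comparison.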
**mv1_3** is defined as follows:
CREATE MATERIALIZED VIEW mv1_3
BUILD IMMEDIATE REFRESH auto ON SCHEDULE EVERY 1 hour
DISTRIBUTED BY RANDOM BUCKETS 12
PROPERTIES ('replication_num' = '1') as
select
o_orderstatus,
o_clerk
from
orders
where
O_ORDERDATE = '2023-12-01'
group by
o_orderstatus,
o_clerk;
**mv2_3** is defined as follows:
CREATE MATERIALIZED VIEW mv2_3
BUILD IMMEDIATE REFRESH auto ON SCHEDULE EVERY 1 hour
DISTRIBUTED BY RANDOM BUCKETS 12
PROPERTIES ('replication_num' = '1') as
select
l_linestatus,
o_clerk
from
(
select
*
from
lineitem
where
l_shipdate = '2023-12-01'
) t1
left join (
select
*
from
orders
where
o_orderdate = '2023-12-01'
) t2 on l_orderkey = o_orderkey
group by
l_linestatus,
o_clerk;
**mv2_4** is defined as follows:
CREATE MATERIALIZED VIEW mv2_4
BUILD IMMEDIATE REFRESH auto ON SCHEDULE EVERY 1 hour
DISTRIBUTED BY RANDOM BUCKETS 12
PROPERTIES ('replication_num' = '1') as
select
l_linestatus,
o_clerk
from
(
select
*
from
lineitem
where
l_shipdate >= '2023-12-01' and l_shipdate <= '2023-12-05'
) t1
left join (
select
*
from
orders
where
o_orderdate >= '2023-12-01' and o_orderdate <= '2023-12-05'
) t2 on l_orderkey = o_orderkey
group by
l_linestatus,
o_clerk;