Previously, whether an operation could be performed on a materialized view was decided by checking its `opType`.
Now, an `allowOpMTMV()` method is implemented by the individual `clauses` instead.
This is needed because some operations share the same `opType`, yet some of them are allowed and some are not.
For example, the `opType` for both `add column` and `create index` is `SCHEMA-CHANGE`, but `add column` is not allowed and `create index` is allowed.
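A minimal sketch of the idea, written in Python for brevity. The class names and the `check_alter_on_mtmv` helper are hypothetical stand-ins and are not the actual Doris classes; only the per-clause `allowOpMTMV()` decision comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class AlterClause:
    """Hypothetical base clause; both examples share one op type."""
    op_type: str = "SCHEMA-CHANGE"

    def allow_op_mtmv(self) -> bool:
        # Assumed default: deny unless a clause explicitly opts in.
        return False


class AddColumnClause(AlterClause):
    """Inherits the default, so it is rejected on a materialized view."""


class CreateIndexClause(AlterClause):
    def allow_op_mtmv(self) -> bool:
        return True


def check_alter_on_mtmv(clauses):
    """Raise ValueError if any clause is not permitted on a materialized view."""
    for clause in clauses:
        if not clause.allow_op_mtmv():
            name = type(clause).__name__
            raise ValueError(f"{name} is not allowed on a materialized view")
```

Because the decision lives on the clause, two clauses with the same `op_type` can give different answers, which a check on `op_type` alone cannot express.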
In this pull request, we optimize the ordering of children in the Intersect operator to improve query performance. The proposed change is to place the child with the least row count in the first position of the Intersect operator.
The rationale behind this optimization is that the Intersect operator works by first evaluating the leftmost child and then iterating through the results of the other children to find matching rows. By placing the child with the least row count first, we can minimize the number of iterations required to find the matching rows, thereby reducing the overall execution time of the query.
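The reordering can be sketched as follows. This is an illustration in Python, not the Nereids rule itself; the `(name, estimated_row_count)` tuples are a hypothetical representation of the children, and keeping the remaining children in their original order is an assumption.

```python
def reorder_intersect_children(children):
    """Move the child with the smallest estimated row count to the front.

    children: list of (name, estimated_row_count) tuples (hypothetical shape).
    The relative order of the other children is preserved (an assumption).
    """
    if not children:
        return []
    # Index of the first child with the minimum row count.
    smallest = min(range(len(children)), key=lambda i: children[i][1])
    return [children[smallest]] + children[:smallest] + children[smallest + 1:]
```

For example, `reorder_intersect_children([("a", 100), ("b", 5), ("c", 50)])` puts `("b", 5)` first and leaves `a` and `c` in their original order.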
commitid: 806e241
pr: #34768
Table ids may be the same even though the tables are actually different, so we optimize
org.apache.doris.nereids.rules.exploration.mv.mapping.RelationMapping#getTableQualifier with the following code:
Objects.hash(table.getDatabase().getCatalog().getId(), table.getDatabase().getId(), table.getId())
The table id is a long, but the tables used in MV rewrite are identified with a bitSet, which can only use int. So during MV rewrite we map each long id to an int id, separately for every query.
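A minimal sketch of the per-query id mapping, in Python. `TableIdMapper` and `to_bitset` are hypothetical names; the sketch keys on the `(catalog id, database id, table id)` tuple directly instead of the `Objects.hash(...)` value used above, and uses a plain integer as the bit set.

```python
class TableIdMapper:
    """Per-query mapping from a qualified table to a small dense int id."""

    def __init__(self):
        self._ids = {}

    def int_id(self, catalog_id, db_id, table_id):
        key = (catalog_id, db_id, table_id)
        if key not in self._ids:
            # Assign the next unused small integer.
            self._ids[key] = len(self._ids)
        return self._ids[key]


def to_bitset(int_ids):
    """Pack small int ids into one integer used as a bit set."""
    bits = 0
    for i in int_ids:
        bits |= 1 << i
    return bits
```

Two tables that share a table id but live in different catalogs get different int ids, so they occupy different bits.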
Suppose the query and the MV definition are as follows:
def mv1_1 = """
select t1.L_LINENUMBER,t2.l_extendedprice, t2.L_ORDERKEY
from lineitem t1
inner join lineitem t2 on t1.L_ORDERKEY = t2.L_ORDERKEY;
"""
def query1_1 = """
select t1.L_LINENUMBER, t2.L_ORDERKEY
from lineitem t1
inner join lineitem t2 on t1.L_ORDERKEY = t2.L_ORDERKEY;
"""
This generates relation mappings as a Cartesian product; if there are too many self joins, it causes a performance problem.
So we add the `materialized_view_relation_mapping_max_count` session variable, default 8. If the actual number of relation mappings is greater than this value, the excess relation mappings are discarded.
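To illustrate why self joins multiply the mappings and how a cap bounds them, here is a rough Python sketch. The function name, the dict-of-aliases input shape, and the choice of which mappings survive the cap are all assumptions made for this example; the aliases are assumed to be unique across tables.

```python
from itertools import islice, permutations, product


def relation_mappings(query_rels, view_rels, max_count=8):
    """Enumerate query-to-view relation mappings, keeping at most max_count.

    query_rels / view_rels: dict of table name -> list of relation aliases
    (hypothetical shape). Returns a list of dicts: query alias -> view alias.
    """
    per_table = []
    for table, q_aliases in query_rels.items():
        v_aliases = view_rels.get(table, [])
        # Every way to assign this table's query relations to view relations.
        per_table.append([
            list(zip(q_aliases, perm))
            for perm in permutations(v_aliases, len(q_aliases))
        ])
    # Cartesian product across tables, cut off lazily at max_count.
    combos = islice(product(*per_table), max_count)
    return [dict(pair for part in combo for pair in part) for combo in combos]
```

With `lineitem` joined to itself once on both sides there are 2 mappings; with four relations over the same table there are 4! = 24, which the default cap of 8 truncates.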
* [Feat](nereids) when dealing insert into stmt with empty table source, fe returns directly (#34418)
When a LogicalOlapScan has no partitions, transform it to a LogicalEmptyRelation.
When handling an insert into statement with an empty table source, FE returns directly.
* [Fix](nereids) fix when insert into select empty table
---------
Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
The nullable info of the output slots is not calculated correctly in the union node.
The old code only gets the correct result if the union node has children.
But a union node may have no children and only a constantExprList.
So in that case, we should calculate the output's nullable info from both the children and the constantExprList.
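The corrected calculation can be sketched as below, in Python. The function name and the list-of-booleans input shape are hypothetical; the point shown is only that constant rows contribute to nullability just like children do, so a union with no children still gets a correct answer.

```python
def union_output_nullable(children_nullable, constant_rows_nullable):
    """Compute per-column nullability of a union's output.

    Each argument is a list of rows; each row is a list of booleans, one per
    output column, saying whether that input's column is nullable.
    An output column is nullable if any input's column is nullable.
    """
    rows = list(children_nullable) + list(constant_rows_nullable)
    if not rows:
        return []
    # zip(*rows) groups the flags column by column.
    return [any(column) for column in zip(*rows)]
```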
agg_state is the intermediate state of an aggregate function; for details see
state combinator: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-functions/combinators/state
This change supports aggregate function roll-up as follows:
+----------------------+----------------------------------------------+----------------------+
| query                | materialized view                            | roll up              |
| -------------------- | -------------------------------------------- | -------------------- |
| agg_function()       | agg_function_union() or agg_function_state() | agg_function_merge() |
| agg_function_union() | agg_function_union() or agg_function_state() | agg_function_union() |
| agg_function_merge() | agg_function_union() or agg_function_state() | agg_function_merge() |
+----------------------+----------------------------------------------+----------------------+
For example, the following query can be rewritten by the MV successfully.
The MV definition is
```
select
o_orderstatus,
l_partkey,
l_suppkey,
sum_union(sum_state(o_shippriority)),
group_concat_union(group_concat_state(l_shipinstruct)),
avg_union(avg_state(l_linenumber)),
max_by_union(max_by_state(l_shipmode, l_suppkey)),
count_union(count_state(l_orderkey)),
multi_distinct_count_union(multi_distinct_count_state(l_shipmode))
from lineitem
left join orders
on lineitem.l_orderkey = o_orderkey and l_shipdate = o_orderdate
group by
o_orderstatus,
l_partkey,
l_suppkey;
```
Query is
```
select
o_orderstatus,
l_suppkey,
sum(o_shippriority),
group_concat(l_shipinstruct),
avg(l_linenumber),
max_by(l_shipmode,l_suppkey),
count(l_orderkey),
multi_distinct_count(l_shipmode)
from lineitem
left join orders
on l_orderkey = o_orderkey and l_shipdate = o_orderdate
group by
o_orderstatus,
l_suppkey;
```
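The roll-up table above can be read as a small lookup from the combinator used in the query to the combinator applied on top of the MV column. The Python sketch below only restates that table; the function and constant names are hypothetical and this is not how the rewrite rule is implemented.

```python
# Query-side combinator -> combinator used when rolling up from the MV.
# None means the query calls the plain aggregate function, e.g. sum().
ROLL_UP_COMBINATOR = {None: "merge", "union": "union", "merge": "merge"}


def roll_up_function(base, query_combinator=None):
    """Return the roll-up function name for an aggregate in the query.

    base: the aggregate function name, e.g. "sum".
    query_combinator: None, "union", or "merge".
    """
    return f"{base}_{ROLL_UP_COMBINATOR[query_combinator]}"
```

In the example above, `sum(o_shippriority)` in the query is a plain aggregate, so by the table it would roll up with `sum_merge` over the MV's `sum_union(sum_state(o_shippriority))` column.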
Support count(*) used as a window function:
CREATE TABLE `t1` (
`id` INT NULL,
`dt` TEXT NULL
)
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
select *, count(*) over() from t1;
Support single-table query rewrite without group by.
This is useful for complex filters or expressions.
The MV definition and query are as follows;
the query can be rewritten by the MV.
mv def:
```
select *
from lineitem where l_comment like '%xx%'
```
query:
```
select l_linenumber, l_receiptdate
from lineitem where l_comment like '%xx%'
```
Co-authored-by: zfr9527 <qhu15zhang3294197@163.com>
* [improvement](mtmv) Split the expression mapping in LogicalCompatibilityContext for performance (#34646)
The query-to-view expression mapping is needed when checking whether the logic of the hyper graphs is equal.
Getting all expression mappings at one time may affect performance, so the expressions are split into three types,
JOIN_EDGE, NODE, and FILTER_EDGE, and their mappings are fetched step by step.
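A rough Python sketch of the step-by-step idea: each type's mapping is computed only when first requested and then cached. The class, its constructor arguments, and the `compute_mapping` callback are hypothetical and do not mirror the real `LogicalCompatibilityContext` API.

```python
class ExpressionMappingContext:
    """Lazily builds the query-to-view expression mapping per expression type."""

    TYPES = ("JOIN_EDGE", "NODE", "FILTER_EDGE")

    def __init__(self, query_exprs, compute_mapping):
        # query_exprs: dict of type -> list of query expressions (assumed shape).
        # compute_mapping: callable turning a list of expressions into a mapping.
        self._query_exprs = query_exprs
        self._compute = compute_mapping
        self._cache = {}

    def mapping_for(self, expr_type):
        if expr_type not in self.TYPES:
            raise ValueError(f"unknown expression type: {expr_type}")
        if expr_type not in self._cache:
            # Computed on first use only; later calls hit the cache.
            exprs = self._query_exprs.get(expr_type, [])
            self._cache[expr_type] = self._compute(exprs)
        return self._cache[expr_type]
```

If the comparison fails while checking join edges, the node and filter-edge mappings are never computed, which is where the saving would come from.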
* fix code style