pick from master commit id: d20b18f pr: #34293
if mv3 is def as following:
select c1, c2, c3 from t1;
mv4 is def as following:
select c1, c2 from mv3;
when query is
select c1, c2 from t1;
the mv3 and mv4 both can be rewritten successfully
Otherwise result in crash
```
*** SIGSEGV address not mapped to object (@0x0) received by PID 4149909 (TID 4152328 OR 0x7efefc60d700) from PID 0; stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_master/doris/be/src/common/signal_handler.h:421
1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
2# JVM_handle_linux_signal in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
3# 0x00007F031AD0E090 in /lib/x86_64-linux-gnu/libc.so.6
4# doris::Status doris::vectorized::MutableBlock::merge_impl<doris::vectorized::Block const&>(doris::vectorized::Block const&) at /home/zcp/repo_center/doris_master/doris/be/src/vec/core/block.h:586
5# doris::Status doris::vectorized::MutableBlock::merge<doris::vectorized::Block const&>(doris::vectorized::Block const&) at /home/zcp/repo_center/doris_master/doris/be/src/vec/core/block.h:521
```
* [Feat](nereids) when dealing insert into stmt with empty table source, fe returns directly (#34418)
When a LogicalOlapScan has no partitions, transform it to a LogicalEmptyRelation.
When dealing insert into stmt with empty table source, fe returns directly.
* [Fix](nereids) fix when insert into select empty table
---------
Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
The output slot's nullable info is not correctly calculated in union node.
Because old code only get correct result if union node has children.
But the union node may have no children but only have constantExprList.
So in that case, we should calculate output's nullable info byboth children and constantExprList.
agg_state is agg intermediate state, detail see
state combinator: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-functions/combinators/state
this support agg function roll up as following
+---------------------+---------------------------------------------+---------------------+
| query | materialized view | roll up |
| ------------------- | ------------------------------------------- | ------------------- |
| agg_funtion() | agg_funtion_unoin() or agg_funtion_state() | agg_funtion_merge() |
| agg_funtion_unoin() | agg_funtion_unoin() or agg_funtion_state() | agg_funtion_union() |
| agg_funtion_merge() | agg_funtion_unoin() or agg_funtion_state() | agg_funtion_merge() |
+---------------------+---------------------------------------------+---------------------+
for example which can be rewritten by mv sucessfully as following
MV defination is
```
select
o_orderstatus,
l_partkey,
l_suppkey,
sum_union(sum_state(o_shippriority)),
group_concat_union(group_concat_state(l_shipinstruct)),
avg_union(avg_state(l_linenumber)),
max_by_union(max_by_state(l_shipmode, l_suppkey)),
count_union(count_state(l_orderkey)),
multi_distinct_count_union(multi_distinct_count_state(l_shipmode))
from lineitem
left join orders
on lineitem.l_orderkey = o_orderkey and l_shipdate = o_orderdate
group by
o_orderstatus,
l_partkey,
l_suppkey;
```
Query is
```
select
o_orderstatus,
l_suppkey,
sum(o_shippriority),
group_concat(l_shipinstruct),
avg(l_linenumber),
max_by(l_shipmode,l_suppkey),
count(l_orderkey),
multi_distinct_count(l_shipmode)
from lineitem
left join orders
on l_orderkey = o_orderkey and l_shipdate = o_orderdate
group by
o_orderstatus,
l_suppkey;
```
* Issue: Doris occasionally encounters an issue where memory usage becomes exceptionally high and does not decrease. The leaked memory is occupied by Bloom filters stored in memory.
Reason: The segment cache stores segment objects read from files into memory. It functions as an LRU cache with an eviction strategy: when the number of segments exceeds the maximum number, or the total memory size of segment objects in the cache exceeds the maximum usage, it evicts the older segments. However, there is a piece of logic in the code that first reads the segment object into memory, assuming it occupies memory size A, then places the read segment object into the cache (at this point, the cache considers the segment object size to be A). It then reads the segment's Bloom filter from the file and assigns it to the segment's Bloom filter member variable, assuming the Bloom filter occupies memory size B. Thus, the total size of the segment object at this point is A+B. However, the cache does not update this size, leading to the actual size of the segment object stored in the cache (A+B) being larger than the size considered by the cache (A). When the number of segment objects in the cache increases to a certain extent, the used memory will surge dramatically. However, the cache does not perceive the size as reaching the eviction limit, so it does not evict the segment objects. In such cases, a memory leak issue arises.
Solution: Since each segment object only reads the Bloom filter once, the issue can be resolved by changing the logic from reading the segment, placing it into the cache, and then reading the Bloom filter to reading the segment, reading the Bloom filter, and then placing it into the cache.
support count(*) used for window function
CREATE TABLE `t1` (
`id` INT NULL,
`dt` TEXT NULL
)
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
select *, count(*) over() from t1;
Followup #34797
`static_cast<void>` has ignored the wrong status, some of them should make the query finished with error status, so replace `static_cast<void>` with `RETURN_IF_ERROR`.
The following three scenarios need to be handled separately and cannot be simply replaced:
1. The outer function returns void;
2. Call status function inner constructors or destructors;
3. Call status function with best effort, and should ignore the wrong status.
Support Single table query rewrite with out group by
this is useful for complex filter or expresission
the mv def and query is as following
which can be query rewritten
mv def:
```
select *
from lineitem where l_comment like '%xx%'
```
query:
```
select l_linenumber, l_receiptdate
from lineitem where l_comment like '%xx%'
```
Co-authored-by: zfr9527 <qhu15zhang3294197@163.com>
* [improvement](mtmv) Split the expression mapping in LogicalCompatibilityContext for performance (#34646)
Need query to view expression mapping when check the logic of hyper graph is equals or not.
Getting all expression mapping one-time may affect performance. So split the expresson to three type
JOIN_EDGE, NODE, FILTER_EDGE and get them step by step.
* fix code style
followup #35241
In #35241, we update the doris-shade version to 2.1.0, which already contains dlf dependencies.
pick part of #34749, to remove dlf dependencies in fe/pom.xml