cherry-pick #34313 to branch-2.1
MergePercentileToArray is to perform a transformation in this case:
select ss_item_sk, percentile(ss_quantity,0.9), percentile(ss_quantity,0.6), percentile(ss_quantity,0.3)
from store_sales group by ss_item_sk;
==>
select ss_item_sk, percentile_array(ss_quantity,[0.3,0.6,0.9]) from store_sales group by ss_item_sk;
support data type ipv4/ipv6 with inverted index
and then we can query like "> or < or >= or <= or in/not in " this
conjuncts expr for ip with inverted index speeding up
## Proposed changes
Issue #31442
<!--Describe your changes.-->
1. The unit of the seventh parameter of `ZonedDateTime.of` is
nanosecond, so we should multiply the microsecond by 1000.
2. When writing to a non-partitioned iceberg table, the data path has an
extra slash
pick from master #34548
The modification involving CloudGlobalTransactionMgr was not picked up
to 2.1 because the 2.1 branch does not yet have the Thunderbolt
CloudGlobalTransactionMgr
## Proposed changes
This pr fixes some failed regression test about checking shape
<!--Describe your changes.-->
## Further comments
If this is a relatively large or complex change, kick off the discussion
at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why
you chose the solution you did and what alternatives you considered,
etc...
## Proposed changes
Change `use_cnt` mechanism for incremental (auto partition) channels and
streams, it's now dynamically counted.
Use `close_wait()` of regular partitions as a synchronize point to make
sure all sinks are in close phase before closing any incremental (auto
partition) channels and streams.
Add dummy (fake) partition and tablet if there is no regular partition
in the auto partition table.
Backport #35287
Co-authored-by: zhaochangle <zhaochangle@selectdb.com>
If there are duplicated expressions in the select list, the result will
be incorrect.
## Proposed changes
Issue Number: close#28438
<!--Describe your changes.-->
this is brought by https://github.com/apache/doris/pull/33800
if mv is partitioned materialzied view,
the data will be wrong by using the hited materialized view when the
paritions in related base partiton table are deleted, created and so on.
this fix the problem.
if **SET enable_materialized_view_union_rewrite=true;** this will use
the materializd view and make sure the data is corrent
if **SET enable_materialized_view_union_rewrite=false;** this will query
base table directly to make sure the data is right
eliminate empty relations for following patterns:
topn->empty
sort->empty
distribute->empty
project->empty
(cherry picked from commit 8340f23946c0c8e40510ce937acd3342cb2e28b7)
## Proposed changes
Issue Number: close #xxx
<!--Describe your changes.-->
## Further comments
If this is a relatively large or complex change, kick off the discussion
at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why
you chose the solution you did and what alternatives you considered,
etc...
## Proposed changes
backport #35347
## Further comments
If this is a relatively large or complex change, kick off the discussion
at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why
you chose the solution you did and what alternatives you considered,
etc...
example:
filter (y=1)
+-- window( ... partition by x)
+-- project( A as x, A as y)
filter(y=1) is equivalent to filter(x=1),
because x and y are in the same equal-set in window#logicalProperties.
And hence we could push filter(y=1) through window operator
bitmap filter is implemented before mark-join. When support mark-join, we forgot to update the bitmap-filter branch.
when convert a bitmap-apply-in to join, we should set markjoinReference to the join if there are markJoinRefereneces
introduced by #31811
sql like this:
select col1, col2 from (select a as col1, a as col2 from mal_test1 group by a) t group by col1, col2 ;
Transformation Description:
In the process of optimizing the query, an agg-project-agg pattern is transformed into a project-agg pattern:
Before Transformation:
LogicalAggregate
+-- LogicalPrject
+-- LogicalAggregate
After Transformation:
LogicalProject
+-- LogicalAggregate
Before the transformation, the projection in the LogicalProject was a AS col1, a AS col2, and the outer aggregate group by keys were col1, col2. After the transformation, the aggregate group by keys became a, a, and the projection remained a AS col1, a AS col2.
Problem:
When building the project projections, the group by key a, a needed to be transformed to a AS col1, a AS col2. The old code had a bug where it used the slot as the map key and the alias in the projections as the map value. This approach did not account for the situation where aliases might have the same slot.
Solution:
The new code fixes this issue by using the original outer aggregate group by expression's exprId. It searches within the original project projections to find the NamedExpression that has the same exprId. These expressions are then placed into the new projections. This method ensures that the correct aliases are maintained, resolving the bug.
The following sql and when the dictionary column contains functions related to null, the results will be incorrect.
```
select * from ( select IF(o_orderpriority IS NULL, 'null', o_orderpriority) AS o_orderpriority from test_string_dict_filter_orc ) as A where o_orderpriority = 'null';
```
```
select * from ( select IFNULL(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null'
```
```
select * from ( select COALESCE(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null';
```
Hive's test environment uses docker, so when using 127.0.0.1,
BE will write the file to the docker of its own machine.
But if FE and are not on the same machine,
FE cannot read this file because it can only read docker on its own machine.
Therefore, the address 127.0.0.1 cannot be used in the test environment.
In this pull request, we optimize the ordering of children in the Intersect operator to improve query performance. The proposed change is to place the child with the least row count in the first position of the Intersect operator.
The rationale behind this optimization is that the Intersect operator works by first evaluating the leftmost child and then iterating through the results of the other children to find matching rows. By placing the child with the least row count first, we can minimize the number of iterations required to find the matching rows, thereby reducing the overall execution time of the query.
if query and mv def is as following:
def mv1_1 = """
select t1.L_LINENUMBER,t2.l_extendedprice, t2.L_ORDERKEY
from lineitem t1
inner join lineitem t2 on t1.L_ORDERKEY = t2.L_ORDERKEY;
"""
def query1_1 = """
select t1.L_LINENUMBER, t2.L_ORDERKEY
from lineitem t1
inner join lineitem t2 on t1.L_ORDERKEY = t2.L_ORDERKEY;
"""
this will generate relation mapping by Cartesian, if the num of self join is too much, this will cause the performance problem
so we add `materialized_view_relation_mapping_max_count` session varaible, default 8. if actual num is greater than the value, the excess relation mapping is discarded.
* [Feat](nereids) when dealing insert into stmt with empty table source, fe returns directly (#34418)
When a LogicalOlapScan has no partitions, transform it to a LogicalEmptyRelation.
When dealing insert into stmt with empty table source, fe returns directly.
* [Fix](nereids) fix when insert into select empty table
---------
Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
agg_state is agg intermediate state, detail see
state combinator: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-functions/combinators/state
this support agg function roll up as following
+---------------------+---------------------------------------------+---------------------+
| query | materialized view | roll up |
| ------------------- | ------------------------------------------- | ------------------- |
| agg_funtion() | agg_funtion_unoin() or agg_funtion_state() | agg_funtion_merge() |
| agg_funtion_unoin() | agg_funtion_unoin() or agg_funtion_state() | agg_funtion_union() |
| agg_funtion_merge() | agg_funtion_unoin() or agg_funtion_state() | agg_funtion_merge() |
+---------------------+---------------------------------------------+---------------------+
for example which can be rewritten by mv sucessfully as following
MV defination is
```
select
o_orderstatus,
l_partkey,
l_suppkey,
sum_union(sum_state(o_shippriority)),
group_concat_union(group_concat_state(l_shipinstruct)),
avg_union(avg_state(l_linenumber)),
max_by_union(max_by_state(l_shipmode, l_suppkey)),
count_union(count_state(l_orderkey)),
multi_distinct_count_union(multi_distinct_count_state(l_shipmode))
from lineitem
left join orders
on lineitem.l_orderkey = o_orderkey and l_shipdate = o_orderdate
group by
o_orderstatus,
l_partkey,
l_suppkey;
```
Query is
```
select
o_orderstatus,
l_suppkey,
sum(o_shippriority),
group_concat(l_shipinstruct),
avg(l_linenumber),
max_by(l_shipmode,l_suppkey),
count(l_orderkey),
multi_distinct_count(l_shipmode)
from lineitem
left join orders
on l_orderkey = o_orderkey and l_shipdate = o_orderdate
group by
o_orderstatus,
l_suppkey;
```
support count(*) used for window function
CREATE TABLE `t1` (
`id` INT NULL,
`dt` TEXT NULL
)
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
select *, count(*) over() from t1;
Support Single table query rewrite with out group by
this is useful for complex filter or expresission
the mv def and query is as following
which can be query rewritten
mv def:
```
select *
from lineitem where l_comment like '%xx%'
```
query:
```
select l_linenumber, l_receiptdate
from lineitem where l_comment like '%xx%'
```
Co-authored-by: zfr9527 <qhu15zhang3294197@163.com>