The following SQL returns incorrect results when a null-related function (`IF(... IS NULL ...)`, `IFNULL`, `COALESCE`) is applied to a dictionary column.
```
select * from ( select IF(o_orderpriority IS NULL, 'null', o_orderpriority) AS o_orderpriority from test_string_dict_filter_orc ) as A where o_orderpriority = 'null';
```
```
select * from ( select IFNULL(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null'
```
```
select * from ( select COALESCE(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null';
```
Hive's test environment runs in docker, so when 127.0.0.1 is used,
BE writes the file into the docker on its own machine.
But if FE and BE are not on the same machine,
FE cannot read this file because it can only read the docker on its own machine.
Therefore, the address 127.0.0.1 cannot be used in the test environment.
In this pull request, we optimize the ordering of children in the Intersect operator to improve query performance. The proposed change is to place the child with the least row count in the first position of the Intersect operator.
The rationale behind this optimization is that the Intersect operator works by first evaluating the leftmost child and then iterating through the results of the other children to find matching rows. By placing the child with the least row count first, we can minimize the number of iterations required to find the matching rows, thereby reducing the overall execution time of the query.
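As a rough illustration of the reordering idea only (not the actual Nereids rule), the sketch below moves the child with the smallest estimated row count to the first position and leaves the remaining children in their original order. The `Child` class and the `reorder_intersect_children` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class Child:
    """Hypothetical stand-in for one child plan of an Intersect operator."""
    name: str
    row_count: int  # estimated row count of this child


def reorder_intersect_children(children):
    """Return the children with the smallest one moved to the first position.

    The other children keep their original relative order.
    """
    if not children:
        return []
    smallest = min(range(len(children)), key=lambda i: children[i].row_count)
    return [children[smallest]] + children[:smallest] + children[smallest + 1:]


if __name__ == "__main__":
    kids = [Child("a", 100), Child("b", 5), Child("c", 50)]
    print([c.name for c in reorder_intersect_children(kids)])
```

Only the first position is changed here because the description above only requires the smallest child to be evaluated first; a full sort by row count would be another possible choice.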
If the query and MV definition are as follows:
def mv1_1 = """
select t1.L_LINENUMBER,t2.l_extendedprice, t2.L_ORDERKEY
from lineitem t1
inner join lineitem t2 on t1.L_ORDERKEY = t2.L_ORDERKEY;
"""
def query1_1 = """
select t1.L_LINENUMBER, t2.L_ORDERKEY
from lineitem t1
inner join lineitem t2 on t1.L_ORDERKEY = t2.L_ORDERKEY;
"""
This generates relation mappings by Cartesian product. If the number of self joins is too large, this causes a performance problem,
so we add the `materialized_view_relation_mapping_max_count` session variable, default 8. If the actual number of relation mappings is greater than this value, the excess relation mappings are discarded.
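To make the growth concrete, here is a minimal sketch of how self joins multiply the candidate relation mappings and how a cap discards the excess. It is an illustration under assumptions, not the Doris implementation: `relation_mappings` is a hypothetical function, relations are modeled as table name to alias lists, aliases are assumed unique across tables, and `max_count` merely mirrors the session variable's default of 8.

```python
from itertools import islice, permutations, product


def relation_mappings(query_rels, view_rels, max_count=8):
    """Enumerate query-alias -> view-alias mappings, keeping at most max_count.

    query_rels / view_rels: dict of table name -> list of aliases of that table.
    """
    per_table = []
    for table, q_aliases in query_rels.items():
        v_aliases = view_rels.get(table, [])
        # Every way to assign this table's query aliases to distinct view aliases.
        per_table.append(
            [list(zip(q_aliases, p)) for p in permutations(v_aliases, len(q_aliases))]
        )
    # Cartesian product across tables; islice drops everything past the cap.
    combos = islice(product(*per_table), max_count)
    return [dict(pair for group in combo for pair in group) for combo in combos]


if __name__ == "__main__":
    two = {"lineitem": ["t1", "t2"]}
    print(relation_mappings(two, two))
    four = {"lineitem": ["t1", "t2", "t3", "t4"]}
    print(len(relation_mappings(four, four)))
```

With two aliases of `lineitem` on each side there are 2 mappings; with four there would be 24, so the cap keeps only the first 8.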
* [Feat](nereids) when dealing insert into stmt with empty table source, fe returns directly (#34418)
When a LogicalOlapScan has no partitions, transform it to a LogicalEmptyRelation.
When handling an insert into statement whose source table is empty, FE returns directly.
* [Fix](nereids) fix when insert into select empty table
---------
Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
agg_state is the intermediate state of an aggregate function. For details, see
the state combinator: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-functions/combinators/state
This supports aggregate function roll up as follows:
+----------------------+----------------------------------------------+----------------------+
| query                | materialized view                            | roll up              |
| -------------------- | -------------------------------------------- | -------------------- |
| agg_function()       | agg_function_union() or agg_function_state() | agg_function_merge() |
| agg_function_union() | agg_function_union() or agg_function_state() | agg_function_union() |
| agg_function_merge() | agg_function_union() or agg_function_state() | agg_function_merge() |
+----------------------+----------------------------------------------+----------------------+
For example, the following query can be rewritten by the MV successfully.
MV definition is
```
select
o_orderstatus,
l_partkey,
l_suppkey,
sum_union(sum_state(o_shippriority)),
group_concat_union(group_concat_state(l_shipinstruct)),
avg_union(avg_state(l_linenumber)),
max_by_union(max_by_state(l_shipmode, l_suppkey)),
count_union(count_state(l_orderkey)),
multi_distinct_count_union(multi_distinct_count_state(l_shipmode))
from lineitem
left join orders
on lineitem.l_orderkey = o_orderkey and l_shipdate = o_orderdate
group by
o_orderstatus,
l_partkey,
l_suppkey;
```
Query is
```
select
o_orderstatus,
l_suppkey,
sum(o_shippriority),
group_concat(l_shipinstruct),
avg(l_linenumber),
max_by(l_shipmode,l_suppkey),
count(l_orderkey),
multi_distinct_count(l_shipmode)
from lineitem
left join orders
on l_orderkey = o_orderkey and l_shipdate = o_orderdate
group by
o_orderstatus,
l_suppkey;
```
Support count(*) used as a window function, for example:
CREATE TABLE `t1` (
`id` INT NULL,
`dt` TEXT NULL
)
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
select *, count(*) over() from t1;
Support single table query rewrite without group by.
This is useful for complex filters or expressions.
With the MV definition and query below,
the query can be rewritten.
mv def:
```
select *
from lineitem where l_comment like '%xx%'
```
query:
```
select l_linenumber, l_receiptdate
from lineitem where l_comment like '%xx%'
```
Co-authored-by: zfr9527 <qhu15zhang3294197@163.com>
1. Fix the issue with tvf reading empty compressed files.
2. Move two test cases (`test_local_tvf_compression` and `test_s3_tvf_compression`) from p2 to p0.
* [Fix](auto inc) Fix multiple replica partial update auto inc data inconsistency problem (#34788)
* **Problem:** For tables with auto-increment columns, updating partial columns can cause data inconsistency among replicas.
**Cause:** Previously, the implementation for updating partial columns in tables with auto-increment columns was done independently on each BE (Backend), leading to potential inconsistencies in the auto-increment column values generated by each BE.
**Solution:** Before distributing blocks, determine if the update involves partial columns of a table with an auto-increment column. If so, add the auto-increment column to the last column of the block. After distributing to each BE, each BE will check if the data key for the partial column update exists. If it exists, the previous auto-increment column value is used; if not, the auto-increment column value from the last column of the block is used. This ensures that the auto-increment column values are consistent across different BEs.
* [Fix](regression-test) Fix auto inc partial update unstable regression test (#34940)
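The auto-increment fix described above (a value is attached to each row before the block is distributed, and each BE prefers the value it already stores for an existing key) can be sketched as follows. This is a simplified model under assumptions, not BE code: `resolve_auto_inc` is a hypothetical function, a replica's stored data is modeled as a dict from key to auto-increment value, and the block's last column is modeled as a list of `(key, preallocated_value)` pairs.

```python
def resolve_auto_inc(existing_rows, block_rows):
    """Pick the auto-increment value each replica should store for a partial update.

    existing_rows: dict of key -> auto-increment value already stored on this replica.
    block_rows: list of (key, preallocated_value); the preallocated value stands for
                the auto-increment column appended as the last column of the block.
    """
    resolved = {}
    for key, preallocated in block_rows:
        if key in existing_rows:
            # Key exists: keep the previous auto-increment value.
            resolved[key] = existing_rows[key]
        else:
            # New key: use the value carried in the block, identical on every replica.
            resolved[key] = preallocated
    return resolved


if __name__ == "__main__":
    block = [("k1", 1001), ("k2", 1002)]
    replica_a = {"k1": 7}
    replica_b = {"k1": 7}
    print(resolve_auto_inc(replica_a, block) == resolve_auto_inc(replica_b, block))
```

Because no replica generates a value locally, two replicas that start from the same stored data end up with the same auto-increment values.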
Fix leading with a CTE and a repeated subquery alias name.
Example:
with tbl1 as (select t1.c1 from t1)
select tbl2.c2 from (select /*+ leading(t2 tbl1) */ tbl1.c1, t2.c2 from tbl1 join t2) as tbl2 join t3;
Reason:
In this case, preprocessing before analysis changes subquery tbl2 into a CTE plan. This CTE plan should be in the upper-level CTE plan, not in the logical result sink plan.
Problem:
When using a leading hint like leading(tbl1 tbl2) in
"select * from (select tbl1.c1 from t1 as tbl1 join t2 as tbl2) join t3 as tbl2 on tbl2.c3 != 101;",
tbl2.c3 means t3.c3, not t2.c3.
Cause and fix:
When resolving columns in a condition, the leading hint looks up the RelationId of tbl2.c3. When we collect RelationId and aliasName,
we should update the entry if the aliasName is repeated.
Fix leading with multiple levels of brace pairs.
Example:
leading(t1 {{t2 t3} {t4 t5}} t6) can be reduced to leading(t1 {t2 t3 {t4 t5}} t6)
Also update the cases that remove the project node from the explain shape plan.
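The brace reduction in the example above can be sketched as below. This is an illustration that assumes tables inside one brace pair are joined left to right, so a brace pair in the first position of a group is redundant and can be spliced into its parent; it is not the parser used by the leading hint. `parse_leading`, `reduce_braces` and `render` are hypothetical helper names.

```python
import re


def parse_leading(text):
    """Parse the body of a leading hint into nested lists, one list per brace pair."""
    stack = [[]]
    for tok in re.findall(r"[{}]|[^\s{}]+", text):
        if tok == "{":
            stack.append([])
        elif tok == "}":
            if len(stack) == 1:
                raise ValueError("unbalanced braces")
            group = stack.pop()
            stack[-1].append(group)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError("unbalanced braces")
    return stack[0]


def reduce_braces(group):
    """Splice a brace pair that appears first in its group into the parent group."""
    items = [reduce_braces(x) if isinstance(x, list) else x for x in group]
    while items and isinstance(items[0], list):
        items = items[0] + items[1:]
    return items


def render(group):
    """Turn nested lists back into hint text."""
    return " ".join("{" + render(x) + "}" if isinstance(x, list) else x for x in group)


if __name__ == "__main__":
    print(render(reduce_braces(parse_leading("t1 {{t2 t3} {t4 t5}} t6"))))
```

A brace pair that is not first in its group, such as `{t4 t5}` here, is kept because it still changes the join shape.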
Before, when executing `create table hive.db.table as select` to create a table in a hive catalog,
if the current catalog was not a hive catalog, the default engine name was filled with `olap`, which is wrong.
This PR fills the default engine name based on the specified catalog.
Before the fix, the join node retained some slots that are not materialized and not required.
The join node needs to remove these slots and must not make them output slots.
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
If an MTMV is created with `date_trunc(`xxx`,'month')`:
When the related table is `range` partitioned and has 3 partitions:
```
20200101-20200102
20200102-20200103
20200201-20200202
```
then the MTMV will have 2 partitions:
```
20200101-20200201
20200201-20200301
```
When the related table is `list` partitioned and has 3 partitions:
```
(20200101,20200102)
(20200103)
(20200201)
```
then the MTMV will have 2 partitions:
```
(20200101,20200102,20200103)
(20200201)
```
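The partition roll-up shown above can be reproduced with a small sketch. It only models the monthly case from the examples and assumes every base partition falls inside a single month; `roll_up_range` and `roll_up_list` are hypothetical helpers, not MTMV code.

```python
from datetime import date


def month_start(d):
    """First day of the month containing d."""
    return d.replace(day=1)


def next_month(d):
    """First day of the month after the one containing d."""
    return date(d.year + (d.month == 12), d.month % 12 + 1, 1)


def roll_up_range(partitions):
    """partitions: list of (lower, upper) dates. Returns one range per month."""
    months = sorted({month_start(lower) for lower, _ in partitions})
    return [(m, next_month(m)) for m in months]


def roll_up_list(partitions):
    """partitions: list of lists of dates. Merges the values that share a month."""
    merged = {}
    for values in partitions:
        for v in values:
            merged.setdefault(month_start(v), []).append(v)
    return [sorted(merged[m]) for m in sorted(merged)]


if __name__ == "__main__":
    ranges = [
        (date(2020, 1, 1), date(2020, 1, 2)),
        (date(2020, 1, 2), date(2020, 1, 3)),
        (date(2020, 2, 1), date(2020, 2, 2)),
    ]
    print(roll_up_range(ranges))
    lists = [[date(2020, 1, 1), date(2020, 1, 2)], [date(2020, 1, 3)], [date(2020, 2, 1)]]
    print(roll_up_list(lists))
```

Both calls collapse the three base partitions into two monthly partitions, matching the examples above.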
Fix the test_hive_parquet_alter_column p2 case.
Since this is a p2 case, the data is stored on emr, not in docker, so there is no need to consider hive2 and hive3.
* Revert "[refactor](mysql result format) use new serde framework to tuple convert (#25006)"
This reverts commit e5ef0aa6d439c3f9b1f1fe5bc89c9ea6a71d4019.
Enhance properties regulator checking:
(1) the right bucket shuffle restriction takes effect only when either side has the NATURAL shuffle type.
(2) enhance the bothSideShuffleKeysAreSameOrder check by taking EquivalenceExprIds into consideration.
Co-authored-by: zhongjian.xzj <zhongjian.xzj@zhongjianxzjdeMacBook-Pro.local>