1. Fix the issue with tvf reading empty compressed files.
2. Move two test cases (`test_local_tvf_compression` and `test_s3_tvf_compression`) from p2 to p0.
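A minimal sketch of the kind of query the fix covers, assuming the `s3()` table-valued function with a `compress_type` property (property names vary across versions); the bucket, credentials, and object are hypothetical, and the expectation is that an empty compressed file now yields zero rows instead of an error:

```sql
SELECT *
FROM s3(
    "uri" = "s3://my-bucket/path/empty_file.csv.gz",  -- hypothetical empty .gz object
    "s3.access_key" = "ak",
    "s3.secret_key" = "sk",
    "format" = "csv",
    "compress_type" = "gz"
);
```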
* [Fix](auto inc) Fix multiple replica partial update auto inc data inconsistency problem (#34788)
* **Problem:** For tables with auto-increment columns, updating partial columns can cause data inconsistency among replicas.
**Cause:** Previously, the implementation for updating partial columns in tables with auto-increment columns was done independently on each BE (Backend), leading to potential inconsistencies in the auto-increment column values generated by each BE.
**Solution:** Before distributing blocks, determine whether the update touches partial columns of a table with an auto-increment column. If so, append the auto-increment column as the last column of the block. After the block is distributed, each BE checks whether the key of a partial-update row already exists: if it does, the existing auto-increment value is kept; if not, the value carried in the last column of the block is used. This keeps the auto-increment column consistent across BEs. A sketch of the affected scenario is shown below.
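A hedged sketch of the scenario, assuming a merge-on-write unique-key table and the `enable_unique_key_partial_update` session variable; the table layout is hypothetical:

```sql
CREATE TABLE t (
    k  INT NOT NULL,
    id BIGINT NOT NULL AUTO_INCREMENT,  -- auto-increment value column
    v  INT
) UNIQUE KEY(k)
DISTRIBUTED BY HASH(k) BUCKETS 1
PROPERTIES (
    'replication_num' = '3',                       -- multiple replicas (needs 3 BEs)
    'enable_unique_key_merge_on_write' = 'true'
);

-- Partial-column update: only k and v are written. The id value must be generated
-- once before the block is distributed, not independently on each replica.
SET enable_unique_key_partial_update = true;
INSERT INTO t (k, v) VALUES (1, 10), (2, 20);
```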
* [Fix](regression-test) Fix auto inc partial update unstable regression test (#34940)
fix leading hint with cte and same subquery alias name
Example:
with tbl1 as (select t1.c1 from t1)
select tbl2.c2 from (select /*+ leading(t2 tbl1) */ tbl1.c1, t2.c2 from tbl1 join t2) as tbl2 join t3;
Reason:
in this case, the pre-analysis preprocess step turns the subquery tbl2 into a CTE plan, and this CTE plan should be placed in the upper-level CTE plan, not in the logical result sink plan
Problem:
when using a leading hint such as leading(tbl1 tbl2) in
"select * from (select tbl1.c1 from t1 as tbl1 join t2 as tbl2) join t3 as tbl2 on tbl2.c3 != 101;",
tbl2.c3 refers to t3.c3, not t2.c3
Cause and fix:
when resolving columns in the condition, the leading hint looks up the RelationId of tbl2.c3; when collecting the RelationId and aliasName pairs,
the mapping should be updated whenever an aliasName is repeated
fix leading hint with multiple levels of brace pairs
example:
leading(t1 {{t2 t3} {t4 t5}} t6) can be reduced to leading(t1 {t2 t3 {t4 t5}} t6)
also update cases which remove the project node from the explain shape plan
Before, when executing `create table hive.db.table as select` to create a table in a hive catalog,
if the current catalog was not a hive catalog, the default engine name was filled with `olap`, which is wrong.
This PR fills the default engine name based on the specified catalog.
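A hedged sketch of the affected statement; the catalog, database, and table names (`hive_ctl`, `tpch`, `orders_copy`) are hypothetical:

```sql
-- The current catalog is the internal (olap) catalog.
SWITCH internal;
-- CTAS into a hive catalog: the engine name should be derived from hive_ctl,
-- not default to olap just because the current catalog is not a hive catalog.
CREATE TABLE hive_ctl.tpch.orders_copy
AS SELECT * FROM hive_ctl.tpch.orders;
```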
before the fix, the join node retained some slots which are neither materialized nor required.
The join node needs to remove these slots and must not make them output slots.
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
if an MTMV is created with the partition expression `date_trunc(xxx, 'month')`:
when the related table uses `range` partitioning and has 3 partitions:
```
20200101-20200102
20200102-20200103
20200201-20200202
```
then the MTMV will have 2 partitions:
```
20200101-20200201
20200201-20200301
```
when the related table uses `list` partitioning and has 3 partitions:
```
(20200101,20200102)
(20200103)
(20200201)
```
then the MTMV will have 2 partitions:
```
(20200101,20200102,20200103)
(20200201)
```
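A minimal creation sketch of such an MTMV, assuming the async materialized view syntax; the base table, columns, and properties (`base_tbl`, `dt`, `k1`) are hypothetical:

```sql
CREATE MATERIALIZED VIEW mv_by_month
BUILD DEFERRED REFRESH AUTO ON MANUAL
PARTITION BY (date_trunc(dt, 'month'))   -- roll the related table's partitions up to months
DISTRIBUTED BY HASH(k1) BUCKETS 2
PROPERTIES ('replication_num' = '1')
AS SELECT k1, dt, v FROM base_tbl;
```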
fix test_hive_parquet_alter_column p2 case.
Since this is a p2 case, the data is stored on EMR rather than in Docker, so there is no need to consider hive2 and hive3.
* Revert "[refactor](mysql result format) use new serde framework to tuple convert (#25006)"
This reverts commit e5ef0aa6d439c3f9b1f1fe5bc89c9ea6a71d4019.
* run buildall
* MORE
* FIX
Enhance properties regulator checking:
(1) the right-side bucket shuffle restriction takes effect only when either side has the NATURAL shuffle type.
(2) enhance the bothSideShuffleKeysAreSameOrder check by taking EquivalenceExprIds into consideration.
Co-authored-by: zhongjian.xzj <zhongjian.xzj@zhongjianxzjdeMacBook-Pro.local>
* [refactor](Nereids)refactor runtime filter generator (#34275)
1. unify the process of generating runtime filters (rf) for hash join and for nested loop join
2. fix some bugs in generating rf
3. remove some duplicated checks
(cherry picked from commit 07267faac0d9c6ef3bb1fd4ee101b4c761c8a2f2)
* [refactor](nereids) do not deny a runtime filter by removing an entry in aliasMap (#34559)
in the current version, there are 2 approaches to verify whether a join condition can be used to generate a runtime filter:
1. remove the output slot from aliasMap
2. pushDownVisitor.visit(...) returns false
The 1st approach has some drawbacks, so we prefer the 2nd one.
In this PR, all cases are handled by the 2nd approach, and the code related to the 1st approach is removed.
(cherry picked from commit a29082bf31e66efa2df193b38347e610f2bf7464)
* rebase
we have an order-preserving mapping from string to double.
for a string column A, we have double values for A.min and A.max.
when estimating A<"abc", A.min/A.max can be used to judge whether 'abc' lies between A.min and A.max, but they cannot be used for range estimation. Suppose "abc" is mapped to the double x: if we compute selectivity by the formula sel = (x - A.min) / (A.max - A.min), we are likely to obtain extreme values, because the mapping preserves the order of strings but not the distances between them.
The legacy planner encounters issues when handling filters such as c1 (boolean type) = 0.0 (decimalv3).
The literal 0.0 is interpreted as decimalv3(1,1), and the boolean column c1 is coerced to decimalv3(1,1).
decimalv3(1,1) can only hold values in the range [0,1), while the boolean true is represented as 1, which exceeds the upper bound and causes an overflow.
This pull request addresses the issue by treating the boolean type as decimalv3(1,0), so that both c1 and 0.0 are cast to decimalv3(2,1).
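A hedged reproduction sketch under the legacy planner; the table definition is an assumption for illustration:

```sql
CREATE TABLE t (
    k  INT NOT NULL,
    c1 BOOLEAN
) DUPLICATE KEY(k)
DISTRIBUTED BY HASH(k) BUCKETS 1
PROPERTIES ('replication_num' = '1');

INSERT INTO t VALUES (1, true), (2, false);

-- Before the fix, c1 was cast to decimalv3(1,1), so the value 1 for `true`
-- overflowed; after the fix both sides are cast to decimalv3(2,1).
SELECT * FROM t WHERE c1 = 0.0;
```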
Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
In NormalizeRepeat, three parts of the outputExpression of LogicalRepeat need to be pushed down and output by the bottom project: flattenGroupingSetExpr, argumentsOfGroupingScalarFunction, and argumentsOfAggregateFunction.
In the original code, these three parts were used to rewrite the outputExpressions of LogicalRepeat to slots. This can cause problems in some cases, for example:
```sql
SELECT
ROUND( SUM(pk + 1) - 3) col_alias1,
pk + 1 AS col_alias3
FROM
table_20_undef_partitions2_keys3_properties4_distributed_by53
GROUP BY
GROUPING SETS ((pk), ()) ;
```
The expressions from these three parts that need to be pushed down are pk and pk+1. The original code used pk+1 to rewrite pk + 1 AS col_alias3 to a slot, but pk+1 is not in the list of grouping outputs, so an error was reported.
This PR changes the rewrite process by dividing the expressions to be pushed down into 2 parts: one is (flattenGroupingSetExpr) and the other is (argumentsOfGroupingScalarFunction, argumentsOfAggregateFunction),
and it uses flattenGroupingSetExpr to rewrite all LogicalRepeat outputExpressions, while argumentsOfGroupingScalarFunction and argumentsOfAggregateFunction are used to rewrite only the aggregate function arguments and the grouping scalar functions.
So, in the SQL above, pk + 1 AS col_alias3 is not rewritten to a slot and can be computed.
this PR is based on @sjyango's work in #32326,
which was intended to be merged into the master branch, but it has been a draft and unmaintained for a long time, so this new PR was created.
Co-authored-by: sjyango <sjyang2022@zju.edu.cn>