2908 Commits

Author SHA1 Message Date
fb28d0b185 [BUG] fix scan range boundary handling is incorrect (#34832)
Fix incorrect scan range boundary handling.
Co-authored-by: shizhiqiang03 <shizhiqiang03@meituan.com>
2024-05-21 13:00:50 +08:00
c0fd98abe5 [Fix](tvf) Fix that tvf reading empty files in compressed formats. (#34926)
1. Fix the issue with tvf reading empty compressed files.
2. Move two test cases (`test_local_tvf_compression` and `test_s3_tvf_compression`) from p2 to p0.
2024-05-21 12:59:31 +08:00
5872173901 [improve](function) add limit check for lpad/rpad function input big value of length (#34810) 2024-05-21 12:54:25 +08:00
aba00d7146 [Fix](executor)Fix workload reg test #35082 2024-05-20 20:36:29 +08:00
42425808a1 [Cherry-Pick](branch-2.1) Pick "Fix multiple replica partial update auto inc data inconsistency problem #34788" (#35056)
* [Fix](auto inc) Fix multiple replica partial update auto inc data inconsistency problem (#34788)

* **Problem:** For tables with auto-increment columns, updating partial columns can cause data inconsistency among replicas.

**Cause:** Previously, the implementation for updating partial columns in tables with auto-increment columns was done independently on each BE (Backend), leading to potential inconsistencies in the auto-increment column values generated by each BE.

**Solution:** Before distributing blocks, determine if the update involves partial columns of a table with an auto-increment column. If so, add the auto-increment column to the last column of the block. After distributing to each BE, each BE will check if the data key for the partial column update exists. If it exists, the previous auto-increment column value is used; if not, the auto-increment column value from the last column of the block is used. This ensures that the auto-increment column values are consistent across different BEs.

* 2

* [Fix](regression-test) Fix auto inc partial update unstable regression test (#34940)
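The solution described for #34788 above can be illustrated with a minimal sketch. It is not the Doris implementation: the names `resolve_auto_inc` and `apply_partial_update` are hypothetical, and each replica is modelled as a plain dictionary from data key to auto-increment value. The point is only that every replica receives the same pre-generated value in the block and falls back to it when the key is new.

```python
def resolve_auto_inc(replica, key, value_from_block):
    """Return the auto-increment value a replica should keep for `key`.

    If the key already exists, the previously stored value wins;
    otherwise the value carried in the last column of the block is used.
    """
    if key in replica:
        return replica[key]
    return value_from_block


def apply_partial_update(replica, block):
    """Apply a block of (key, pre-generated auto-inc value) pairs to one replica."""
    for key, auto_inc in block:
        replica[key] = resolve_auto_inc(replica, key, auto_inc)
    return replica


# Hypothetical data: both replicas start from the same state and receive
# the same block, so they end up with identical auto-increment values.
block = [("k1", 100), ("k2", 101)]
replica_a = apply_partial_update({"k1": 7}, block)
replica_b = apply_partial_update({"k1": 7}, block)
```

Because the value is generated once before distribution rather than on each BE, the replicas cannot diverge for new keys.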
2024-05-20 15:43:46 +08:00
be50139eb1 [Fix](Nereids) fix leading with cte and same subqueryalias name (#34838) (#35047)
Fix leading hint with a CTE and a subquery alias of the same name.
Example:
with tbl1 as (select t1.c1 from t1)
select tbl2.c2 from (select /*+ leading(t2 tbl1) */ tbl1.c1, t2.c2 from tbl1 join t2) as tbl2 join t3;
Reason:
In this case, preprocessing before analysis changes subquery tbl2 into a CTE plan, and this CTE plan should be placed in the upper-level CTE plan rather than in the logical result sink plan.
2024-05-20 10:44:22 +08:00
5ac4ea2cd9 [Fix](Nereids) fix leading hint with update of alias name (#34434) (#35046)
Problem:
When using a leading hint such as leading(tbl1 tbl2) in
"select * from (select tbl1.c1 from t1 as tbl1 join t2 as tbl2) join t3 as tbl2 on tbl2.c3 != 101;",
tbl2.c3 refers to t3.c3, not t2.c3.
Cause and solution:
When resolving columns in a condition, the leading hint looks up the RelationId of tbl2.c3. When collecting RelationId and aliasName pairs,
the entry should be updated if the aliasName is repeated.
2024-05-20 10:40:10 +08:00
7c29a964e5 [Fix](Nereids) fix leading with multi level of brace pairs (#34169) (#35043)
Fix leading hint with multiple levels of brace pairs.
Example:
leading(t1 {{t2 t3} {t4 t5}} t6) can be reduced to leading(t1 {t2 t3 {t4 t5}} t6)
Also update cases to remove the project node from the explain shape plan.
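The brace reduction in the example above can be sketched on nested lists. This is one reading of the example, not the Nereids code: it assumes that a brace group whose first element is itself a brace group can have that inner group spliced in place, because the join order inside a group is built left to right. The function name `reduce_braces` and the list representation are hypothetical.

```python
def reduce_braces(group):
    """Splice a leading nested group into its parent, recursively.

    A hint is modelled as a list whose items are table names (str)
    or nested brace groups (list).
    """
    reduced = [reduce_braces(item) if isinstance(item, list) else item
               for item in group]
    # If the group opens with another group, the inner braces are redundant.
    while reduced and isinstance(reduced[0], list):
        reduced = reduced[0] + reduced[1:]
    return reduced


def render(group):
    """Format a nested list back into hint syntax."""
    return " ".join("{" + render(item) + "}" if isinstance(item, list) else item
                    for item in group)


hint = ["t1", [["t2", "t3"], ["t4", "t5"]], "t6"]
reduced_hint = reduce_braces(hint)
rendered = "leading(" + render(reduced_hint) + ")"
```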
2024-05-20 10:28:22 +08:00
a6a398d7a4 [Fix](function) remove datev2 signature of microsecond #35017 2024-05-19 19:58:02 +08:00
22f85be712 [fix](hive-ctas) support create hive table with full quolified name (#34984)
Previously, when executing `create table hive.db.table as select` to create a table in a hive catalog,
if the current catalog was not a hive catalog, the default engine name was filled with `olap`, which is wrong.

This PR fills the default engine name based on the specified catalog.
2024-05-18 18:42:43 +08:00
a59f9c3fa1 [fix](planner) fix unrequired slot bug when join node introduced by #25204 (#34923)
Before this fix, the join node retained some slots that are neither materialized nor required.
The join node needs to remove these slots and must not make them output slots.

Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2024-05-18 18:40:56 +08:00
e3e5f18f26 [Fix](Json type) correct cast result for json type (#34764) 2024-05-18 18:40:17 +08:00
81bcb9d490 [opt](planner)(Nereids) support auto aggregation for random distributed table (#33630)
Support auto aggregation when querying detail data of a random distributed table:
rows with the same key columns will return only one row.
2024-05-18 18:40:16 +08:00
9b5028785d [fix](prepare) fix datetimev2 return err when binary_row_format (#34662)
Fix datetimev2 returning an error when binary_row_format is used. Before this PR, the backend always returned datetimev2 via to_string.
Also fix the datetimev2 return metadata losing its scale.
2024-05-18 18:37:41 +08:00
eb7eaee386 [fix](function) money format (#34680) 2024-05-18 18:35:29 +08:00
30a036e7a4 [feature](mtmv) create mtmv support partitions rollup (#31812)
If the MTMV is created with `date_trunc(`xxx`,'month')`,
when the related table is `range`-partitioned and has 3 partitions:
```
20200101-20200102
20200102-20200103
20200201-20200202
```
then MTMV will have 2 partitions:
```
20200101-20200201
20200201-20200301
```

when the related table is `list`-partitioned and has 3 partitions:
```
(20200101,20200102)
(20200103)
(20200201)
```
then MTMV will have 2 partitions:
```
(20200101,20200102,20200103)
(20200201)
```
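The two rollup examples above can be reproduced with a small sketch. It is an illustration under stated assumptions, not the MTMV implementation: the function names are hypothetical, range partitions are modelled as `(lower, upper)` date pairs with an exclusive upper bound, and each range partition is assumed to fall entirely within one month.

```python
from datetime import date


def month_start(d):
    """Truncate a date to the first day of its month."""
    return d.replace(day=1)


def next_month_start(d):
    """First day of the month following the month of `d`."""
    if d.month == 12:
        return date(d.year + 1, 1, 1)
    return date(d.year, d.month + 1, 1)


def rollup_range_partitions(partitions):
    """Merge (lower, upper) range partitions into one range per month."""
    months = {month_start(lower) for lower, _upper in partitions}
    return [(m, next_month_start(m)) for m in sorted(months)]


def rollup_list_partitions(partitions):
    """Merge list partitions (tuples of dates) into one list per month."""
    grouped = {}
    for values in partitions:
        for value in values:
            grouped.setdefault(month_start(value), []).append(value)
    return [tuple(sorted(grouped[m])) for m in sorted(grouped)]


range_result = rollup_range_partitions([
    (date(2020, 1, 1), date(2020, 1, 2)),
    (date(2020, 1, 2), date(2020, 1, 3)),
    (date(2020, 2, 1), date(2020, 2, 2)),
])
list_result = rollup_list_partitions([
    (date(2020, 1, 1), date(2020, 1, 2)),
    (date(2020, 1, 3),),
    (date(2020, 2, 1),),
])
```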
2024-05-18 18:14:48 +08:00
1545d96617 [WIP](test) remove enable_nereids_planner in regression cases (part 4) (#34642)
Previous PRs:
#34417
#34490
#34558
2024-05-18 18:07:39 +08:00
385739564d [test](executor) Add workload group upgrade test #35007 2024-05-17 17:34:08 +08:00
e74b17c761 [Fix](Row store) support decimal256 type (#34887) 2024-05-15 19:01:18 +08:00
baf9a45e57 [fix](mtmv) check groupby in agg-bottom-plan when rewrite agg query by mv (#34274)
check groupby in agg-bottom-plan when rewrite and rollup agg query by mv
2024-05-15 12:38:40 +08:00
1f0c45204b [fix](iceberg) read the primary key columns if hasing equality delete (#34884)
backport: #34835
2024-05-15 11:37:25 +08:00
d5ab2787ba [Fix](function) fix pad functions behaviour of empty pad string (#34796)
fix pad functions behaviour of empty pad string
2024-05-15 10:28:09 +08:00
0b4d814598 [fix](decimal) Fix wrong result produced by decimal128 multiply (#34825)
* [fix](decimal) Fix wrong result produced by decimal128 multiply

* update
2024-05-14 23:34:11 +08:00
a0a025f763 [fix](regression test)fix test_hive_parquet_alter_column p2 case. (#34727) (#34859)
fix test_hive_parquet_alter_column p2 case.
Since this is a p2 case, the data is stored on EMR, not in Docker, so there is no need to consider hive2 and hive3.
2024-05-14 23:30:06 +08:00
4dd5379951 [bugfix](hive)fix error for writing to hive for 2.1 (#34518)
mirror #34520
2024-05-14 23:27:29 +08:00
0ae1b9c70a [chore](remove code) Remove dragonbox related (#34528)
* Revert "[refactor](mysql result format) use new serde framework to tuple convert (#25006)"

This reverts commit e5ef0aa6d439c3f9b1f1fe5bc89c9ea6a71d4019.

* run buildall

* MORE

* FIX
2024-05-13 22:16:57 +08:00
db15c811f8 [opt](Nereids) enhance properties regulator checking (#34603)
Enhance properties regulator checking:
(1) The right bucket shuffle restriction takes effect only when either side has the NATURAL shuffle type.
(2) Enhance the bothSideShuffleKeysAreSameOrder check by taking EquivalenceExprIds into consideration.


Co-authored-by: zhongjian.xzj <zhongjian.xzj@zhongjianxzjdeMacBook-Pro.local>
2024-05-13 22:15:16 +08:00
3ef5ed1ad0 [opt](Nereids) normalize column name of output file (#34650)
When exporting to an output file, normalize the column names.
For example, in

> SELECT 1 > 2 INTO OUTFILE "..."

the column name of 1 > 2 will be __greater_than_0.
2024-05-13 22:12:46 +08:00
ca9eb56233 [Fix](functions) fix strcmp return value #34565 2024-05-12 09:49:38 +08:00
20e2d2e2f8 [Fix](executor)Fix workload thread start failed when follower convert to master 2024-05-12 09:30:14 +08:00
9915862bf7 [opt](nereids)estimate rowcount for is-null filter when column stats are not available (#34519)
* estimate rowcount for is-null filter when column stats are not available
2024-05-11 15:04:35 +08:00
719e50f353 [fix](json function) fix failed when json_exists_path use not null input (#34289) 2024-05-11 15:04:35 +08:00
Pxl
1ff4dc8f85 [Bug](runtime-filter) fix coredump won change_null_to_true when argument column is not null… (#34602)
Fix coredump on change_null_to_true when the argument column is not nullable.
2024-05-11 15:04:35 +08:00
2392477f76 [test](shuffle) test insert row count when rows filtered by ExchangeNode (#34657) 2024-05-11 11:47:49 +08:00
8c237e82a3 [Bug](exec) fix intersections/differences bug (#34675) 2024-05-11 11:45:31 +08:00
58c19e33b3 [fix](round) Fix incorrect decimal scale inference in round functions (#34471)
* FIX NEEDED

* FORMAT

* FORMAT

* FIX TEST
2024-05-11 11:42:12 +08:00
7ba66c5890 [branch-2.1](routine-load) do not schedule task when there is no data (#34654) 2024-05-11 11:01:18 +08:00
dd1b54cf62 [pick](nereids)Runtime filter pushdown refactor for branch-2.1 (#34682)
* [refactor](Nereids)refactor runtime filter generator (#34275)

1. unify the process of generating rf for hash join and for nested loop join
2. fix some bugs in generating rf
3. remove some duplicated check

(cherry picked from commit 07267faac0d9c6ef3bb1fd4ee101b4c761c8a2f2)

* [refactor](nereids) do not deny a runtime filter by removing an entry in aliasMap (#34559)

In the current version, there are 2 approaches to verify whether a join condition can be used to generate a runtime filter:
1. remove the output slot from aliasMap
2. pushDownVisitor.visit(...) returns false
The 1st approach has some drawbacks, so we prefer the 2nd approach.
In this PR, all cases are handled by the 2nd approach, and the code related to the 1st approach is removed.

(cherry picked from commit a29082bf31e66efa2df193b38347e610f2bf7464)

* rebase
2024-05-11 09:44:24 +08:00
e38801968d [Fix](functions) Fix bug in makedate and str_to_date functions 2024-05-10 22:14:25 +08:00
d5d6c7f8a4 [opt](nereids) optimize str-like-col range filter estimation (#34542)
We have an order-preserving mapping from string to double.
For a string column A, we have double values for A.min and A.max.
When estimating A<"abc", A.min and A.max can be used to judge whether "abc" lies between A.min and A.max, but they cannot be used for range estimation. Suppose "abc" is mapped to the double x. If we compute selectivity with the formula "sel = (x-A.min)/(A.max-A.min)", we are likely to obtain extreme values.
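A small sketch can show why the linear formula misbehaves. The mapping below is hypothetical (the first 8 bytes read as a big-endian integer) and is not claimed to be the one Doris uses; it is merely order-preserving for short ASCII strings. With it, the estimate is dominated by the leading byte, so strings that share a prefix with A.min collapse toward 0.

```python
def str_to_double(s):
    """Hypothetical order-preserving mapping: first 8 bytes as a big-endian number."""
    raw = s.encode("utf-8")[:8].ljust(8, b"\x00")
    return float(int.from_bytes(raw, "big"))


def naive_selectivity(value, col_min, col_max):
    """sel = (x - A.min) / (A.max - A.min), clamped to [0, 1]."""
    lo, hi, x = str_to_double(col_min), str_to_double(col_max), str_to_double(value)
    if hi == lo:
        return 1.0
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


# Hypothetical column whose values range from "a" to "z".
sel_abc = naive_selectivity("abc", "a", "z")
sel_b = naive_selectivity("b", "a", "z")
```

Here `sel_abc` comes out below 2% regardless of how many rows actually start with "a", which is the kind of extreme value the commit message warns about.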
2024-05-10 22:14:00 +08:00
845732b440 [WIP](test) remove enable_nereids_planner in regression cases (part 3) (#34558)
previous PR:
part 1: #34417
part 2: #34490
2024-05-10 22:11:01 +08:00
853dbdcb00 [Feature](PreparedStatement) implement general server side prepared (#33807) 2024-05-10 22:10:11 +08:00
6c11dd2231 [Fix](planner) fix ScalarType.getAssignmentCompatibleType() when deal boolean and decimal (#34435)
The legacy planner encounters issues when handling filters such as: c1(boolean type)=0.0(decimalv3).
The literal 0.0 is interpreted as decimalv3(1,1), and the boolean type c1 is coerced to decimalv3(1,1).
decimalv3(1,1) can only retain values in the range [0,1), while the boolean true is represented as 1, exceeding the upper bound, thus causing an overflow problem.
This pull request addresses the issue by treating the boolean type as decimalv3(1,0), so that both c1 and 0.0 are cast to decimal(2,1).
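The arithmetic behind decimal(2,1) can be checked with a short sketch. It assumes the usual widening rule for two decimal types (keep the larger scale and the larger number of integer digits); that rule and the names `common_decimal_type` and `fits` are assumptions for illustration, not taken from ScalarType.getAssignmentCompatibleType().

```python
from decimal import Decimal


def common_decimal_type(a, b):
    """Widen two (precision, scale) pairs to a type that can hold both."""
    (p1, s1), (p2, s2) = a, b
    scale = max(s1, s2)
    int_digits = max(p1 - s1, p2 - s2)
    return (int_digits + scale, scale)


def fits(value, dtype):
    """True if `value` has no more integer digits than `dtype` allows."""
    precision, scale = dtype
    return abs(Decimal(value)) < Decimal(10) ** (precision - scale)


boolean_as_decimal = (1, 0)   # boolean treated as decimalv3(1,0)
literal_type = (1, 1)         # the literal 0.0 as decimalv3(1,1)
widened = common_decimal_type(boolean_as_decimal, literal_type)
```

With the old coercion, `true` (1) does not fit decimalv3(1,1); after widening, it fits.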


Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
2024-05-10 22:07:16 +08:00
7c56c17ecc [Fix](nereids) fix NormalizeRepeat, change the outputExpression rewrite logic (#34196)
In NormalizeRepeat, three parts of the outputExpression of LogicalRepeat need to be pushed down and output by the bottom project: flattenGroupingSetExpr, argumentsOfGroupingScalarFunction, and argumentsOfAggregateFunction.
The original code used these three parts to rewrite the outputExpressions of LogicalRepeat to slots. This can cause problems in some cases, for example:
```sql
SELECT
	ROUND( SUM(pk + 1) - 3) col_alias1,
	pk + 1 AS col_alias3 
	FROM
	table_20_undef_partitions2_keys3_properties4_distributed_by53
GROUP BY
	GROUPING SETS ((pk), ()) ;
```
The expressions that need to be pushed down are pk and pk+1. The original code used pk+1 to rewrite pk + 1 AS col_alias3 to a slot, but pk+1 is not in the list of grouping outputs, so an error was reported.
This PR changes the rewrite process by dividing the expressions to be pushed down into 2 parts: one is flattenGroupingSetExpr, and the other is argumentsOfGroupingScalarFunction and argumentsOfAggregateFunction.
flattenGroupingSetExpr is used to rewrite all LogicalRepeat outputExpressions, while argumentsOfGroupingScalarFunction and argumentsOfAggregateFunction are used to rewrite only the arguments of the aggregate functions and the grouping scalar functions.
So, in the SQL above, pk + 1 AS col_alias3 is not rewritten to a slot and can be computed.
2024-05-10 22:03:31 +08:00
c0cca6103b [WIP](test) remove enable_nereids_planner in regression cases (part 2) (#34490) 2024-05-10 22:02:32 +08:00
97bc367611 [enhancement](regression-test) modify a key type from BIGINT/LARGEINT to other type (#34436)
Co-authored-by: cjj2010 <2449402815@qq.com>
2024-05-10 14:48:52 +08:00
25ae7cd65f [bug](ipv6) the ipv6 type should be uint128_t (#34121)
The IPv6 type should be uint128_t, since the maximum value is ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff.
If int128_t is used, that address is interpreted as a negative value instead of the maximum.
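The signedness issue can be demonstrated with Python's standard `ipaddress` module. The helper `as_signed_128` is a hypothetical name that reinterprets an unsigned 128-bit value as two's-complement, which is what storing the address in int128_t amounts to.

```python
import ipaddress


def as_signed_128(u):
    """Reinterpret an unsigned 128-bit integer as a two's-complement int128."""
    return u - (1 << 128) if u >= (1 << 127) else u


max_addr = int(ipaddress.IPv6Address("ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff"))
loopback = int(ipaddress.IPv6Address("::1"))

# Unsigned: the all-ones address is the largest possible value.
unsigned_order_ok = max_addr > loopback
# Signed: the same bits become -1 and sort below ::1.
signed_max = as_signed_128(max_addr)
signed_order_broken = signed_max < as_signed_128(loopback)
```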
2024-05-10 14:43:46 +08:00
9b712b03b4 [FIX]fix is_ip_address_in_range func with const param (#34266) 2024-05-10 14:37:20 +08:00
520774a24b [fix](serde) fix ipv4/v6 serde functions for arrow, orc, parquet format (#34042)
This PR is based on @sjyango's work in #32326.
The intent was to merge #32326 into the master branch, but it is a draft and has not been maintained for a long time, so this new PR was created.
Co-authored-by: sjyango <sjyang2022@zju.edu.cn>
2024-05-10 14:37:04 +08:00
cc00666be6 [opt](inverted index) add inlist condition handling to compound (#34134)
1. Previously, the compound did not support the inlist condition, which could impact performance if an inverted index was created.
2024-05-10 14:35:47 +08:00