Commit Graph

2931 Commits

Author SHA1 Message Date
68eda58a8c [Fix](multi-catalog) Fix string dict filtering when using null-related functions in parquet and orc reader. (#35335)
When the dictionary-encoded column is used with null-related functions, as in the following SQL, the results will be incorrect.
```
select * from ( select IF(o_orderpriority IS NULL, 'null', o_orderpriority) AS o_orderpriority from test_string_dict_filter_orc ) as A where o_orderpriority = 'null';
```
```
select * from ( select IFNULL(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null'
```
```
select * from ( select COALESCE(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null';
```
2024-05-27 15:25:29 +08:00
f98ed4e4c5 [bugfix](hive)Misspelling of class names (#34981) 2024-05-27 15:24:38 +08:00
b1795d44ec [bugfix](hive)fix testcase for test_hive_write_different_path (#35209)
Hive's test environment uses docker, so when 127.0.0.1 is used,
BE writes the file into the docker container on its own machine.
But if FE and BE are not on the same machine,
FE cannot read this file, because it can only read the docker container on its own machine.
Therefore, the address 127.0.0.1 cannot be used in the test environment.
2024-05-27 15:24:30 +08:00
2422439e45 [Update](regression) add case for inverted index (#35305)
Co-authored-by: Kang <kxiao.tiger@gmail.com>
2024-05-27 15:24:09 +08:00
af986c370b [feat](Nereids): Put the Child with Least Row Count in the First Position of Intersect (#34290) (#35339)
In this pull request, we optimize the ordering of children in the Intersect operator to improve query performance. The proposed change is to place the child with the least row count in the first position of the Intersect operator.

The rationale behind this optimization is that the Intersect operator works by first evaluating the leftmost child and then iterating through the results of the other children to find matching rows. By placing the child with the least row count first, we can minimize the number of iterations required to find the matching rows, thereby reducing the overall execution time of the query.
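As an illustrative sketch (the table names small_dim and big_fact are hypothetical), a query like the one below benefits when the input with fewer rows is placed first:

```
-- small_dim is assumed to hold far fewer rows than big_fact;
-- with this change the optimizer places small_dim as the leftmost
-- child of the Intersect operator, so fewer probe iterations are needed
SELECT customer_id FROM small_dim
INTERSECT
SELECT customer_id FROM big_fact;
```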
2024-05-27 11:52:35 +08:00
62998719df [opt](mtmv) Add threshold for relation mapping num when query rewrite (#34694) (#35378)
if the query and the mv definition are as follows:

    def mv1_1 = """
        select  t1.L_LINENUMBER,t2.l_extendedprice, t2.L_ORDERKEY
        from lineitem t1
        inner join lineitem t2 on t1.L_ORDERKEY = t2.L_ORDERKEY;
    """
    def query1_1 = """
        select  t1.L_LINENUMBER, t2.L_ORDERKEY
        from lineitem t1
        inner join lineitem t2 on t1.L_ORDERKEY = t2.L_ORDERKEY;
    """

this generates relation mappings as a Cartesian product; if there are too many self joins, this causes a performance problem.
So we add the `materialized_view_relation_mapping_max_count` session variable, default 8. If the actual number is greater than this value, the excess relation mappings are discarded.
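A minimal usage sketch of the new session variable (the value shown is illustrative; the default is 8 as stated above):

```
-- raise the cap on relation mappings considered during mv query rewrite
SET materialized_view_relation_mapping_max_count = 16;
```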
2024-05-24 20:36:29 +08:00
639c7ee7fb [fix](decimalv2) fix scale of decimalv2 to string (#35222) (#35359)
* [fix](decimalv2) fix scale of decimalv2 to string
2024-05-24 17:20:43 +08:00
1e07971a98 [Feat](nereids)when dealing insert into stmt with empty table source, fe returns directly (#35333)
* [Feat](nereids) when dealing insert into stmt with empty table source, fe returns directly (#34418)

When a LogicalOlapScan has no partitions, transform it to a LogicalEmptyRelation.
When handling an insert-into statement with an empty table source, FE returns directly.

* [Fix](nereids) fix when insert into select empty table

---------

Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
2024-05-24 16:25:00 +08:00
f6beeb1ddd [Enhancement](tvf) select tvf supports using resource (#35139)
Create an S3/HDFS resource, and a TVF can then use it directly to access the data source.
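A hypothetical sketch of the intended usage; the resource property names and the TVF `resource` parameter are assumptions based on the commit title, not confirmed syntax:

```
-- create a reusable S3 resource (property names are illustrative)
CREATE RESOURCE "my_s3_res" PROPERTIES (
    "type" = "s3",
    "s3.endpoint" = "http://s3.example.com",
    "s3.access_key" = "ak",
    "s3.secret_key" = "sk"
);

-- reference the resource from the S3 table-valued function instead of
-- repeating the connection properties inline
SELECT * FROM s3(
    "uri" = "s3://my-bucket/path/data.parquet",
    "format" = "parquet",
    "resource" = "my_s3_res"
);
```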
2024-05-24 16:23:58 +08:00
d6e8fb7d77 [feature](mtmv) Support agg state roll up and optimize the roll up code (#35026)
agg_state is an aggregation intermediate state; for details see the
state combinator: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-functions/combinators/state

this supports agg function roll up as follows:

| query                | materialized view                            | roll up              |
| -------------------- | -------------------------------------------- | -------------------- |
| agg_function()       | agg_function_union() or agg_function_state() | agg_function_merge() |
| agg_function_union() | agg_function_union() or agg_function_state() | agg_function_union() |
| agg_function_merge() | agg_function_union() or agg_function_state() | agg_function_merge() |

For example, the following can be rewritten by the mv successfully.

The MV definition is

```
            select
            o_orderstatus,
            l_partkey,
            l_suppkey,
            sum_union(sum_state(o_shippriority)),
            group_concat_union(group_concat_state(l_shipinstruct)),
            avg_union(avg_state(l_linenumber)),
            max_by_union(max_by_state(l_shipmode, l_suppkey)),
            count_union(count_state(l_orderkey)),
            multi_distinct_count_union(multi_distinct_count_state(l_shipmode))
            from lineitem
            left join orders
            on lineitem.l_orderkey = o_orderkey and l_shipdate = o_orderdate
            group by
            o_orderstatus,
            l_partkey,
            l_suppkey;
```

Query is

```
            select
            o_orderstatus,
            l_suppkey,
            sum(o_shippriority),
            group_concat(l_shipinstruct),
            avg(l_linenumber),
            max_by(l_shipmode,l_suppkey),
            count(l_orderkey),
            multi_distinct_count(l_shipmode)
            from lineitem
            left join orders 
            on l_orderkey = o_orderkey and l_shipdate = o_orderdate
            group by
            o_orderstatus,
            l_suppkey;
```
2024-05-24 16:23:58 +08:00
dd567fa774 [fix](function) support return JsonType for If function (#35199)
add a FunctionSignature for If to support JsonType as the return type.
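A hypothetical sketch of a query that relies on the new signature (table and column names are illustrative):

```
-- IF can now return a JSON value directly instead of falling back to string
SELECT IF(k = 1, json_parse('{"a": 1}'), json_parse('{"b": 2}')) FROM t;
```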
2024-05-24 16:23:58 +08:00
98b2bda660 [opt](Nereids) remove restrict for count(*) in window (#35220)
support count(*) used as a window function

```
CREATE TABLE `t1` (
  `id` INT NULL,
  `dt` TEXT NULL
)
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);

select *, count(*) over() from t1;
```
2024-05-24 16:23:58 +08:00
b3f6668464 fix case: test_create_table_without_distribution 2024-05-23 19:03:30 +08:00
4075408b84 [feature](mtmv)Support single table mv rewrite (#34185) (#35242)
Support single table query rewrite without group by.
This is useful for complex filters or expressions.

The mv def and query are as follows,
which can be query rewritten.

mv def:
```
          select *
            from lineitem where l_comment like '%xx%'
```

query:
```
            select l_linenumber, l_receiptdate
            from lineitem where l_comment like '%xx%'
```

Co-authored-by: zfr9527 <qhu15zhang3294197@163.com>
2024-05-23 19:00:36 +08:00
adc364a6fd [feature](Paimon) support deletion vector for Paimon naive reader (#34743) (#35241)
bp #34743
Co-authored-by: 苏小刚 <suxiaogang223@icloud.com>
2024-05-23 00:01:30 +08:00
24990383ff [refactor](jdbc catalog) split clickhouse jdbc executor (#34794) (#35174)
pick master #34794
2024-05-22 19:09:05 +08:00
291cf57c54 [Configurations](multi-catalog) Add enable_parquet_filter_by_min_max and enable_orc_filter_by_min_max Session variables. (#35012) (#35164)
backport #35012
2024-05-22 19:06:12 +08:00
15f70c8183 [Feat](planner) create table stmt offers default distribution attributes: random distribution and auto bucket (#35189)
Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
2024-05-22 15:18:29 +08:00
c23384ff07 [fix](decimal) Fix long string casting to decimalv2 (#35121) 2024-05-22 14:32:29 +08:00
84f7bfffe2 [Bug](bitmap-filter) fix empty bitmap when rf do merge (#34182)
Fix an empty bitmap when the runtime filter performs a merge.
2024-05-22 14:29:50 +08:00
f0b2f5ba36 [Fix](bug) agg limit contains null values may cause error result (#35180) 2024-05-22 10:57:57 +08:00
af7b16f213 [optimize](desc) display the correct data type of aggStateType (#34968)
If a table column is of AGG_STATE type, we can't get its clearly defined data type using the `desc tbl` statement.

```
create table a_table(
    k1 int null,
    k2 agg_state<max_by(int not null,int)> generic,
    k3 agg_state<group_concat(string)> generic
)
aggregate key (k1)
distributed BY hash(k1) buckets 3
properties("replication_num" = "1");
```

before optimize:

mysql> desc a_table;
+-------+------------------------------------------------+------+-------+---------+---------+
| Field | Type                                           | Null | Key   | Default | Extra   |
+-------+------------------------------------------------+------+-------+---------+---------+
| k1    | INT                                            | Yes  | true  | NULL    |         |
| k2    | org.apache.doris.catalog.AggStateType@239f771c | No   | false | NULL    | GENERIC |
| k3    | org.apache.doris.catalog.AggStateType@2e535f50 | No   | false | NULL    | GENERIC |
+-------+------------------------------------------------+------+-------+---------+---------+
3 rows in set (0.00 sec)


after optimize:

mysql> desc a_table;
+-------+------------------------------------+------+-------+---------+---------+
| Field | Type                               | Null | Key   | Default | Extra   |
+-------+------------------------------------+------+-------+---------+---------+
| k1    | INT                                | Yes  | true  | NULL    |         |
| k2    | AGG_STATE<max_by(INT, INT NULL)>   | No   | false | NULL    | GENERIC |
| k3    | AGG_STATE<group_concat(TEXT NULL)> | No   | false | NULL    | GENERIC |
+-------+------------------------------------+------+-------+---------+---------+


Co-authored-by: duanxujian <duanxujian@jd.com>
2024-05-22 10:03:31 +08:00
b96148c9cd [Fix](function) fix days/weeks_diff result wrong on BE #35104
select days_diff('2024-01-01 00:00:00', '2023-12-31 23:59:59');
should be 0 but got 1 on BE.
2024-05-22 10:00:26 +08:00
fb28d0b185 [BUG] fix scan range boundary handling is incorrect (#34832)
Fix incorrect handling of scan range boundaries.
Co-authored-by: shizhiqiang03 <shizhiqiang03@meituan.com>
2024-05-21 13:00:50 +08:00
c0fd98abe5 [Fix](tvf) Fix that tvf reading empty files in compressed formats. (#34926)
1. Fix the issue with tvf reading empty compressed files.
2. move two test cases (`test_local_tvf_compression` and `test_s3_tvf_compression`) from p2 to p0
2024-05-21 12:59:31 +08:00
5872173901 [improve](function) add limit check for lpad/rpad function input big value of length (#34810) 2024-05-21 12:54:25 +08:00
aba00d7146 [Fix](executor)Fix workload reg test #35082 2024-05-20 20:36:29 +08:00
42425808a1 [Cherry-Pick](branch-2.1) Pick "Fix multiple replica partial update auto inc data inconsistency problem #34788" (#35056)
* [Fix](auto inc) Fix multiple replica partial update auto inc data inconsistency problem (#34788)

* **Problem:** For tables with auto-increment columns, updating partial columns can cause data inconsistency among replicas.

**Cause:** Previously, the implementation for updating partial columns in tables with auto-increment columns was done independently on each BE (Backend), leading to potential inconsistencies in the auto-increment column values generated by each BE.

**Solution:** Before distributing blocks, determine if the update involves partial columns of a table with an auto-increment column. If so, add the auto-increment column to the last column of the block. After distributing to each BE, each BE will check if the data key for the partial column update exists. If it exists, the previous auto-increment column value is used; if not, the auto-increment column value from the last column of the block is used. This ensures that the auto-increment column values are consistent across different BEs.

* 2

* [Fix](regression-test) Fix auto inc partial update unstable regression test (#34940)
2024-05-20 15:43:46 +08:00
be50139eb1 [Fix](Nereids) fix leading with cte and same subqueryalias name (#34838) (#35047)
Fix leading hint with a cte and the same subquery alias name.
Example:
with tbl1 as (select t1.c1 from t1)
select tbl2.c2 from (select /*+ leading(t2 tbl1) */ tbl1.c1, t2.c2 from tbl1 join t2) as tbl2 join t3;
Reason:
in this case, before analysis the preprocess step changes the subquery tbl2 into a cte plan; this cte plan should belong to the upper-level cte plan, not to the logical result sink plan.
2024-05-20 10:44:22 +08:00
5ac4ea2cd9 [Fix](Nereids) fix leading hint with update of alias name (#34434) (#35046)
Problem:
when using a leading hint like leading(tbl1 tbl2) in
"select * from (select tbl1.c1 from t1 as tbl1 join t2 as tbl2) join t3 as tbl2 on tbl2.c3 != 101;",
tbl2.c3 refers to t3.c3, not t2.c3.
Cause and solution:
when finding columns in the condition, the leading hint resolves tbl2.c3's RelationId; when we collect RelationId and aliasName,
we should update the mapping if the aliasName is repeated.
2024-05-20 10:40:10 +08:00
7c29a964e5 [Fix](Nereids) fix leading with multi level of brace pairs (#34169) (#35043)
Fix leading hint with multiple levels of brace pairs.
Example:
leading(t1 {{t2 t3} {t4 t5}} t6) can be reduced to leading(t1 {t2 t3 {t4 t5}} t6)
Also update cases, which remove the project node from the explain shape plan.
2024-05-20 10:28:22 +08:00
a6a398d7a4 [Fix](function) remove datev2 signature of microsecond #35017 2024-05-19 19:58:02 +08:00
22f85be712 [fix](hive-ctas) support create hive table with fully qualified name (#34984)
Before, when executing `create table hive.db.table as select` to create a table in a hive catalog,
if the current catalog was not the hive catalog, the default engine name was filled with `olap`, which is wrong.

This PR fills the default engine name based on the specified catalog.
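A minimal sketch of the fully qualified CTAS described above (catalog, database, and table names are hypothetical):

```
-- executed while the current catalog is not the hive catalog;
-- the engine is now derived from the `hive` catalog prefix rather than defaulting to olap
CREATE TABLE hive.tpch.orders_copy AS SELECT * FROM internal.db1.orders;
```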
2024-05-18 18:42:43 +08:00
a59f9c3fa1 [fix](planner) fix unrequired slot bug when join node introduced by #25204 (#34923)
Before the fix, the join node retained some slots that are neither materialized nor required.
The join node needs to remove these slots and not make them output slots.

Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2024-05-18 18:40:56 +08:00
e3e5f18f26 [Fix](Json type) correct cast result for json type (#34764) 2024-05-18 18:40:17 +08:00
81bcb9d490 [opt](planner)(Nereids) support auto aggregation for random distributed table (#33630)
support auto aggregation for querying detail data of a random distributed table:
rows with the same key columns are returned as a single aggregated row.
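A minimal sketch, assuming a random distributed aggregate-key table (names are illustrative):

```
CREATE TABLE rand_agg_t (
    k1 INT,
    v1 BIGINT SUM
)
AGGREGATE KEY (k1)
DISTRIBUTED BY RANDOM BUCKETS 10
PROPERTIES ("replication_num" = "1");

-- a plain detail query now returns one automatically aggregated row per k1
SELECT k1, v1 FROM rand_agg_t;
```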
2024-05-18 18:40:16 +08:00
9b5028785d [fix](prepare) fix datetimev2 return err when binary_row_format (#34662)
Fix datetimev2 returning an error when binary_row_format is used. Before this PR, the Backend always returned datetimev2 via to_string.
Also fix datetimev2 return metadata losing the scale.
2024-05-18 18:37:41 +08:00
eb7eaee386 [fix](function) money format (#34680) 2024-05-18 18:35:29 +08:00
30a036e7a4 [feature](mtmv) create mtmv support partitions rollup (#31812)
if an MTMV is created with `date_trunc(`xxx`,'month')`:
when the related table uses `range` partitioning and has 3 partitions:
```
20200101-20200102
20200102-20200103
20200201-20200202
```
then MTMV will have 2 partitions:
```
20200101-20200201
20200201-20200301
```

when the related table uses `list` partitioning and has 3 partitions:
```
(20200101,20200102)
(20200103)
(20200201)
```
then MTMV will have 2 partitions:
```
(20200101,20200102,20200103)
(20200201)
```
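A hypothetical sketch of an MTMV using this partition rollup (view name, base table, and columns are illustrative, and the exact clause syntax may differ from the actual MTMV DDL):

```
CREATE MATERIALIZED VIEW mv_by_month
BUILD DEFERRED REFRESH AUTO ON MANUAL
PARTITION BY (date_trunc(`dt`, 'month'))
DISTRIBUTED BY RANDOM BUCKETS 2
AS
SELECT dt, sum(amount) AS total FROM sales GROUP BY dt;
```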
2024-05-18 18:14:48 +08:00
1545d96617 [WIP](test) remove enable_nereids_planner in regression cases (part 4) (#34642)
Previous PRs are:
#34417
#34490
#34558
2024-05-18 18:07:39 +08:00
385739564d [test](executor) Add workload group upgrade test #35007 2024-05-17 17:34:08 +08:00
e74b17c761 [Fix](Row store) support decimal256 type (#34887) 2024-05-15 19:01:18 +08:00
baf9a45e57 [fix](mtmv) check groupby in agg-bottom-plan when rewrite agg query by mv (#34274)
Check the group-by in the agg-bottom-plan when rewriting and rolling up an aggregate query by mv.
2024-05-15 12:38:40 +08:00
1f0c45204b [fix](iceberg) read the primary key columns if having equality deletes (#34884)
backport: #34835
2024-05-15 11:37:25 +08:00
d5ab2787ba [Fix](function) fix pad functions behaviour of empty pad string (#34796)
fix pad functions behaviour of empty pad string
2024-05-15 10:28:09 +08:00
0b4d814598 [fix](decimal) Fix wrong result produced by decimal128 multiply (#34825)
* [fix](decimal) Fix wrong result produced by decimal128 multiply

* update
2024-05-14 23:34:11 +08:00
a0a025f763 [fix](regression test)fix test_hive_parquet_alter_column p2 case. (#34727) (#34859)
fix test_hive_parquet_alter_column p2 case.
Since this is a p2 case, the data is stored on EMR, not in docker, so there is no need to consider hive2 and hive3.
2024-05-14 23:30:06 +08:00
4dd5379951 [bugfix](hive)fix error for writing to hive for 2.1 (#34518)
mirror #34520
2024-05-14 23:27:29 +08:00
0ae1b9c70a [chore](remove code) Remove dragonbox related (#34528)
* Revert "[refactor](mysql result format) use new serde framework to tuple convert (#25006)"

This reverts commit e5ef0aa6d439c3f9b1f1fe5bc89c9ea6a71d4019.

* run buildall

* MORE

* FIX
2024-05-13 22:16:57 +08:00
db15c811f8 [opt](Nereids) enhance properties regulator checking (#34603)
Enhance properties regulator checking:
(1) the right bucket shuffle restriction takes effect only when either side has the NATURAL shuffle type.
(2) enhance bothSideShuffleKeysAreSameOrder checking by taking EquivalenceExprIds into consideration.


Co-authored-by: zhongjian.xzj <zhongjian.xzj@zhongjianxzjdeMacBook-Pro.local>
2024-05-13 22:15:16 +08:00