SQL like:
```sql
select k5, k6, SUM(k3) AS k3
from (
    select
        k5,
        date_format(k6, '%Y-%m-%d') as k6,
        count(distinct k3) as k3
    from t
    group by k5, k6
) AS temp where 1=1
group by k5, k6;
```
will throw an exception because the conjuncts are planned on the exchange node, which cannot handle conjuncts. Now we skip the exchange node when planning conjuncts, which fixes the bug.
Notice: the bug occurs iff the conjunct is always true, like the `1=1` above.
SQL like:
```sql
select * from (select *, null as top from v1) t where top = 5;
select * from (select *, null as top from v1) t where top is not null;
```
will cause an NPE because `targetTypeDef` is null when the value is null. Now we use the cast's target type as the `targetTypeDef`.
1. close https://github.com/apache/doris/issues/16458 for Nereids
2. varchar and string types should be treated as the same type in the bucket shuffle join scenario.
```
create table shuffle_join_t1 ( a varchar(10) not null )
distributed by hash(a) buckets 1 properties ("replication_num" = "1");
create table shuffle_join_t2 ( a varchar(5) not null, b string not null, c char(3) not null )
distributed by hash(a) buckets 1 properties ("replication_num" = "1");
```
the two SQLs below can use bucket shuffle join:
```
select * from shuffle_join_t1 t1 left join shuffle_join_t2 t2 on t1.a = t2.a;
select * from shuffle_join_t1 t1 left join shuffle_join_t2 t2 on t1.a = t2.b;
```
3. PushdownExpressionsInHashCondition should consider both hash conjuncts and other conjuncts; see the sketch after this list
4. visitPhysicalProject should handle MarkJoinSlotReference
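For item 3, a minimal sketch of a join whose condition carries both kinds of conjuncts (the tables `t1`/`t2` and their columns are hypothetical; this illustrates the scenario, not a test from this PR):
```sql
-- the equality on expressions is the hash conjunct; the inequality is an
-- "other" conjunct. When abs(t1.a) and abs(t2.a) are pushed down into
-- projects below the join, the other conjunct must be considered too.
select *
from t1 join t2
on abs(t1.a) = abs(t2.a) and t1.b < t2.b;
```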
1. some bitmap functions like bitmap_or, bitmap_and_count, bitmap_or_count, etc. shouldn't follow the constant-fold rule for PropagateNullable functions. So we remove the PropagateNullable property, and these functions now use their own constant-fold logic correctly; see the sketch after this list
2. dphyper's PlanReceiver class shouldn't change the hyperGraph's complex project info. So PlanReceiver now uses its own copy of the complex project info.
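For item 1, a minimal sketch of the nullability point (assuming these bitmap functions treat a NULL argument by their own semantics rather than returning NULL, which is why PropagateNullable-style folding was wrong):
```sql
-- PropagateNullable-style constant folding would collapse this whole call
-- to NULL because of the NULL literal; with the functions' own folding
-- logic, the NULL argument is handled by the function semantics instead.
select bitmap_or_count(to_bitmap(1), NULL);
```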
MTMV regression tests may loop forever due to some potential bugs, so we add a timeout to avoid an endless loop. The timeout is currently hard-coded to 30 minutes.
The reason the UDAF case is unstable: when enable_pipeline_engine=true and the aggregate function runs in only 1 instance, the default value is not merged; but with more than 1 instance, the default value will be merged.
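A hedged sketch of pinning the instance count when reproducing this (the choice of session variables is an assumption, not from this PR):
```sql
-- hypothetical reproduction setup: force a single instance so the
-- default value is not merged
set enable_pipeline_engine = true;
set parallel_fragment_exec_instance_num = 1;
```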
The legacy PartitionPruner only supports some simple cases; some useful cases are not supported:
1. it cannot evaluate some builtin functions, like `cast(part_column as bigint) = 1`
2. it cannot prune multi-level range partitions. For the partition `[('1', 'a'), ('2', 'b'))`, these constraints hold:
- first_part_column between '1' and '2'
- if first_part_column = '1' then second_part_column >= 'a'
- if first_part_column = '2' then second_part_column < 'b'
This PR refactors it and supports:
1. using a visitor to evaluate functions and fold constants
2. if the partition type is discrete, like int or date, we can expand and evaluate it, e.g. `[1, 5)` will be expanded to `[1, 2, 3, 4]`
3. pruning multi-level range partitions, as previously described; see the DDL sketch after this list
4. evaluation capabilities for a range slot. E.g., for the datetime range partition `[('2023-03-21 00:00:00'), ('2023-03-21 23:59:59'))`, if the filter is `date(col1) = '2023-03-22'`, this partition will be pruned; we can prune it because we know the date is always `2023-03-21`. You can implement the visit method in FoldConstantRuleOnFE and OneRangePartitionEvaluator to support such functions.
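A minimal DDL sketch of the multi-level range partition case in item 3 (the table, column, and partition names are hypothetical):
```sql
create table t_multi_range (
    k1 int not null,
    k2 int not null,
    v int
)
duplicate key(k1, k2)
partition by range(k1, k2)
(
    -- constraints of p1: k1 is between 1 and 2;
    -- if k1 = 1 then k2 >= 10; if k1 = 2 then k2 < 20
    partition p1 values [("1", "10"), ("2", "20"))
)
distributed by hash(k1) buckets 1
properties ("replication_num" = "1");

-- after the refactor, this filter can prune p1 entirely:
select * from t_multi_range where k1 = 2 and k2 >= 20;
```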
### How can we do it so finely?
Generally, a range partition can be separated into three kinds of slots: `const`, `range`, and `other`.
For example, in the partition `[('1', 'a', 'D'), ('1', 'c', 'D'))`:
1. the first partition column is `const`: it always equals '1'
2. the second partition column is `range`: `slot >= 'a' and slot <= 'c'`. If there were no later slots, it would be `slot >= 'a' and slot < 'c'`
3. the third partition column is `other`: regardless of whether the upper and lower bounds are the same, multiple values can exist, e.g. `('1', 'a', 'D')`, `('1', 'a', 'F')`, `('1', 'b', 'A')`, `('1', 'c', 'A')`
In a partition, there is one and only one `range` slot, but there may be zero, one, or many `const`/`other` slots.
Normally, a partition looks like `[const*, range, other*]`. These are the possible shapes:
1. [range], e.g. `[('1'), ('10'))`
2. [const, range], e.g. `[('1', 'a'), ('1', 'd'))`
3. [range, other, other], e.g. `[('1', '1', '1'), ('2', '1', '1'))`
4. [const, const, ..., range, other, other, ...], e.g. `[('1', '1', '2', '3', '4'), ('1', '1', '3', '3', '4'))`
The properties of `const`:
1. we can replace the slot with a literal to evaluate the expression tree.
The properties of `range`:
1. if the slot's data type is discrete, like int or date, we can expand it to literals and evaluate the expression tree
2. if the type is not discrete, like datetime, or there are too many discrete values, like `[1, 1000000)`, we can keep the slot in the expression tree and assign a range to it; when evaluating the expression tree, we also compute the range and check whether it is an empty set, and if so we can simplify to BooleanLiteral.FALSE to skip this partition
3. if the range slot satisfies some conditions, we can fold the slot with some functions too; see the datetime example above
The properties of `other`:
1. only when the previous slot is a literal equal to the lower or upper bound of the partition can we shrink the range of the `other` slot
According to these properties, we can do it finely.
At runtime, the `range` and `other` slots may shrink their ranges of values, e.g.:
1. the partition `[('a'), ('b'))` with predicate `part_col = 'a'` will shrink the range `['a', 'b')` to `['a']`, like a `range` slot downgrading to a `const` slot;
2. the partition `[('a', '1'), ('b', '10'))` with predicate `part_col1 = 'a'` will shrink the range of the `other` slot from unknown (the full range) to `['1', +∞)`, like an `other` slot downgrading to a `range` slot.
But to keep it simple, I haven't changed the slot type at runtime, just shrunk the ColumnRange.
This PR refactors column pruning with a visitor. The benefits:
1. it is easy to provide column pruning for a new plan: implement the `OutputPrunable` interface if the plan contains an output field, or do nothing if it does not. There is no need to add a new rule like `PruneXxxChildColumns`; only a few scenarios need to override the visit function for special logic, like pruning LogicalSetOperation and Aggregate
2. it supports shrinking the output field in some plans, which skips some useless operations and improves performance
example:
```sql
select id
from (
select id, sum(age)
from student
group by id
)a
```
we should prune the useless `sum(age)` in the aggregate.
before refactor:
```
LogicalProject ( distinct=false, projects=[id#0], excepts=[], canEliminate=true )
+--LogicalSubQueryAlias ( qualifier=[a] )
   +--LogicalAggregate ( groupByExpr=[id#0], outputExpr=[id#0, sum(age#2) AS `sum(age)`#4], hasRepeat=false )
      +--LogicalProject ( distinct=false, projects=[id#0, age#2], excepts=[], canEliminate=true )
         +--LogicalOlapScan ( qualified=default_cluster:test.student, indexName=<index_not_selected>, selectedIndexId=10007, preAgg=ON )
```
after refactor:
```
LogicalProject ( distinct=false, projects=[id#0], excepts=[], canEliminate=true )
+--LogicalSubQueryAlias ( qualifier=[a] )
   +--LogicalAggregate ( groupByExpr=[id#0], outputExpr=[id#0], hasRepeat=false )
      +--LogicalProject ( distinct=false, projects=[id#0], excepts=[], canEliminate=true )
         +--LogicalOlapScan ( qualified=default_cluster:test.student, indexName=<index_not_selected>, selectedIndexId=10007, preAgg=ON )
```
In PR #17813 we wanted to forbid binding a slot on a sibling's column.
However, the fix was not done in the correct way.
The correct way is to forbid the subquery from registering itself in its parent's analyzer.
This reverts commit b91a3b5a72520105638dad1079b71a05f02c10a0.
When adding an inverted index to a UNIQUE_KEYS table without merge-on-write enabled, match queries may fail before the segment is compacted.
So we add the restriction here.
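A minimal sketch of the DDL pattern that is now restricted (the table, column, and index names are hypothetical):
```sql
create table uniq_t (
    k1 int,
    v1 varchar(100),
    index idx_v1 (v1) using inverted
)
unique key(k1)
distributed by hash(k1) buckets 1
properties (
    -- without merge-on-write, match queries against idx_v1 may fail
    -- before compaction, so this combination is now rejected
    "replication_num" = "1",
    "enable_unique_key_merge_on_write" = "false"
);
```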
Add regression tests of decimalv3 for Nereids and refactor some suites.
Too many suites would be changed, so in this PR we just add the arithmetic tests.
1. some tests are disabled because of unfixed results and precision issues; in detail, a big integer multiplied or divided by a float causes the precision issue, and bit-ops cause the unfixed results
2. the disabled tests tagged with the original planner are caused by unfixed results
Consider a query like this:
```sql
SELECT
k3, k4
FROM
test
WHERE
EXISTS( SELECT
d.*
FROM
(SELECT
k1 AS _1234, SUM(k2)
FROM
`test` d
GROUP BY _1234) d
LEFT JOIN
(SELECT
k1 AS _1234,
SUM(k2)
FROM
`test`
GROUP BY _1234) temp ON d._1234 = temp._1234)
ORDER BY k3, k4
```
When we analyze the group by exprs in the `temp` inline view, we bind `_1234` to `d._1234` by mistake.
That is because, when we analyze a **SUB-QUERY**, we resolve SlotRefs against its own tuples **AND** the parent's tuples; meanwhile, we register the child's tuples in the parent's analyzer. So, in a **SUB-QUERY**, a sibling's tuple can affect the resolution of the current inline view's slots.
This PR:
1. add a flag to the function `resolveColumnRef` in `Analyzer`:
```java
private TupleDescriptor resolveColumnRef(String colName, boolean requestByChild);
private TupleDescriptor resolveColumnRef(TableName tblName, String colName, boolean requestByChild);
```
2. add a flag to specify whether the tuple is from a child:
```java
// alias name -> <from child, tupleDesc>
private final Multimap<String, Pair<Boolean, TupleDescriptor>> tupleByAlias;
```
When `requestByChild == true`, we **SKIP** tuples from other children to avoid resolution errors.
1. When querying data, it is no longer necessary to verify permissions on the entire table; instead we verify permissions on the queried columns. Ranger already supports column permissions, and the internal catalog provides a dummy implementation of column permissions (the actually verified permissions are still table permissions).
2. Delete roles in userIdentity.
3. Change the trigger logic of initAccessController.