doris

Author	SHA1	Message	Date
morrySnow	698bae09b2	[fix](Nereids)get NPE and group not be optimized when add REWRITE rule to Cascades Optimizer (#12346 ) Fix some bugs that occur when adding REWRITE rules to the Cascades Optimizer: - all rules should be set as non-rewrite rules when they are used in the Cascades Optimizer - the IMPLEMENT rule promise should be larger than the others, since we should do exploration first.	2022-09-05 19:11:48 +08:00
minghong	f466a072d8	fix bug: tpch-q12 invalid type (#12347 ) In the old planner, Predicate sets its type in analyzeImpl(). However, analyzeImpl() is only on the old planner path, not on the Nereids path, and hence the type is invalid. Because every predicate has type bool, we set its type in the constructor.	2022-09-05 19:09:27 +08:00
Kikyou1997	dadfd85c40	prune for agg with constant expr (#12274 ) Currently, Nereids doesn't support aggregate functions with no slot reference in the query, since all the columns would be pruned, e.g. SELECT COUNT(1) FROM t; This PR reserves the column with the smallest amount of data when doing column pruning in this situation. Note that this PR ONLY handles aggregate functions, so projections with no slot reference need to be handled in the future.	2022-09-05 19:09:00 +08:00
Adonis Ling	8bfb89c100	[feature-wip](array-type) Add some regression tests for nested array (#12322 ) #11392 made _input_block in each BetaRowsetReaders sharable. However, for some types (e.g. nested array with more than 1 depth), the _column_vector_batches in RowBlockV2 can be nested which means that there is a ColumnVectorBatch inside another ColumnVectorBatch. In this case, the data of inner ColumnVectorBatch may be corrupted because the data of _input_block is copied shallowly to the _output_block.	2022-09-05 14:05:24 +08:00
Gabriel	3b104e334a	[Bug](load) fix missing nullable info in stream load (#12302 )	2022-09-05 13:41:28 +08:00
morrySnow	2398cd3bb6	[enhancement](Nereids)print slot name in explain string (#12272 ) Currently, the explain string prints every expression as a slot id, e.g. `<slot 1>`. This PR prints its name with the slot id instead, e.g. `column_a[#1]`. In detail: - print the qualified table name for OlapScanNode - print the NamedExpression name with SlotId instead of just SlotId - OlapScanNode's node name uses "OlapScanNode" instead of the table name	2022-09-05 11:31:35 +08:00
camby	90a0baf5f8	[fix](array-type) Forbid ARRAY<NOT_NULL(T)> temporarily (#12262 ) Currently, there are still lots of bugs related to ARRAY<NOT_NULL(T)>. We decide that we don't support ARRAY<NOT_NULL(T)> types at the first version and all elements in ARRAY are nullable. Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-09-03 14:26:08 +08:00
minghong	34dd67f804	[feature](nereids) add weekOfYear to support ssb-flat benchmark (#12207 ) support function WeekOfYear In current implementation, WeekOfYear can be used in where clause, but not in select clause.	2022-09-03 12:04:51 +08:00
xy720	62561834a8	[Feature](array-type) Support is-null-predicate for array type (#12237 )	2022-09-03 11:37:57 +08:00
Zhengguo Yang	c944496fb4	[chore](log) add cluster and tag message to exception (#12287 )	2022-09-02 20:46:39 +08:00
Stalary	0d33c713d1	[Bug](CTAS) Fix CTAS error for use agg column as first. (#12299 ) * FIX: ctas default use duplicate key.	2022-09-02 20:44:01 +08:00
zhengshiJ	7f7a3a7524	[feature](nereids) Convert subqueries into algebraic expressions and … (#11454 )

1. Convert subqueries to Apply nodes.
2. Convert ApplyNode to ordinary join.

### Detailed design:

There are three types of current subexpressions, scalarSubquery, inSubquery, and Exists. The scalarSubquery refers to the returned data as 1 row and 1 column.

Subquery replacement

```
before:
scalarSubquery: filter(t1.a = scalarSubquery(output b));
inSubquery: filter(inSubquery); inSubquery = (t1.a in select *);
exists: filter(exists); exists = (select );

end:
scalarSubquery: filter(t1.a = b);
inSubquery: filter(True);
exists: filter(True);
```

Subquery Transformation Rules

```
PushApplyUnderFilter
 * before:
 *              Apply
 *            /       \
 * Input(output:b)    Filter(Correlated predicate/UnCorrelated predicate)
 *
 * after:
 *          Filter(Correlated predicate)
 *                     |
 *                   Apply
 *                 /       \
 *      Input(output:b)    Filter(UnCorrelated predicate)
```

```
PushApplyUnderProject
 * before:
 *              Apply
 *            /       \
 * Input(output:b)    Project(output:a)
 *
 * after:
 *    Project(b,(if the Subquery is Scalar add 'a' as the output column))
 *            /       \
 * Input(output:b)    Apply
```

```
ApplyPullFilterOnAgg
 * before:
 *              Apply
 *            /       \
 * Input(output:b)    agg(output:fn,c; group by:null)
 *                      |
 *                    Filter(Correlated predicate(Input.e = this.f)/UnCorrelated predicate)
 *
 * end:
 *              Apply(Correlated predicate(Input.e = this.f))
 *            /       \
 * Input(output:b)    agg(output:fn,this.f; group by:this.f)
 *                      |
 *                    Filter(UnCorrelated predicate)
```

```
ApplyPullFilterOnProjectUnderAgg
 * before:
 *              apply
 *            /       \
 * Input(output:b)    agg
 *                      |
 *                    Project(output:a)
 *                      |
 *                    Filter(correlated predicate(Input.e = this.f)/Unapply predicate)
 *                      |
 *                    child
 *
 * after:
 *              apply
 *            /       \
 * Input(output:b)    agg
 *                      |
 *                    Filter(correlated predicate(Input.e = this.f)/Unapply predicate)
 *                      |
 *                    Project(output:a,this.f, Unapply predicate(slots))
 *                      |
 *                    child
```

```
ScalarToJoin
 * UnCorrelated -> CROSS_JOIN
 * Correlated -> LEFT_OUTER_JOIN
```

```
InToJoin
 * Not In -> LEFT_ANTI_JOIN
 * In -> LEFT_SEMI_JOIN
```

```
existsToJoin
 * Exists
 *    Correlated -> LEFT_SEMI_JOIN
 *        correlated                LEFT_SEMI_JOIN(Correlated Predicate)
 *       /         \       -->        /          \
 *    input     queryPlan          input      queryPlan
 *
 *    UnCorrelated -> CROSS_JOIN(limit(1))
 *       uncorrelated                 CROSS_JOIN
 *       /         \       -->        /          \
 *    input     queryPlan          input      limit(1)
 *                                               |
 *                                            queryPlan
 *
 * Not Exists
 *    Correlated -> LEFT_ANTI_JOIN
 *        correlated                LEFT_ANTI_JOIN(Correlated Predicate)
 *       /         \       -->        /          \
 *    input     queryPlan          input      queryPlan
 *
 *    UnCorrelated -> CROSS_JOIN(Count())
 *                                  Filter(count() = 0)
 *                                        |
 *          apply                     Cross_Join
 *       /         \       -->        /          \
 *    input     queryPlan          input      agg(output:count())
 *                                               |
 *                                            limit(1)
 *                                               |
 *                                            queryPlan
```
	2022-09-02 17:34:19 +08:00
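The join-type choices listed in the ScalarToJoin, InToJoin and existsToJoin rules of #11454 can be summarized as a small lookup. The following is only an illustrative Python sketch, not the Nereids implementation: the function name `subquery_join_type` and its string arguments are hypothetical, and the extra plan nodes the commit mentions for uncorrelated EXISTS (`limit(1)`, or `count()` plus a filter for NOT EXISTS) are noted in comments rather than modeled.

```python
def subquery_join_type(kind: str, correlated: bool, negated: bool = False) -> str:
    """Return the join type the commit message lists for a subquery kind.

    kind is one of "scalar", "in", "exists" (hypothetical labels).
    negated marks NOT IN / NOT EXISTS.
    """
    if kind == "scalar":
        # ScalarToJoin: UnCorrelated -> CROSS_JOIN, Correlated -> LEFT_OUTER_JOIN
        return "LEFT_OUTER_JOIN" if correlated else "CROSS_JOIN"
    if kind == "in":
        # InToJoin: Not In -> LEFT_ANTI_JOIN, In -> LEFT_SEMI_JOIN
        return "LEFT_ANTI_JOIN" if negated else "LEFT_SEMI_JOIN"
    if kind == "exists":
        if correlated:
            # Correlated EXISTS -> LEFT_SEMI_JOIN, NOT EXISTS -> LEFT_ANTI_JOIN
            return "LEFT_ANTI_JOIN" if negated else "LEFT_SEMI_JOIN"
        # Uncorrelated: CROSS_JOIN in both cases. The commit additionally puts
        # limit(1) under the join (EXISTS), or count() with a
        # Filter(count() = 0) on top (NOT EXISTS); that is not modeled here.
        return "CROSS_JOIN"
    raise ValueError(f"unknown subquery kind: {kind}")
```

For example, `subquery_join_type("exists", correlated=True, negated=True)` yields `"LEFT_ANTI_JOIN"`, matching the "Not Exists, Correlated" case above.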
Adonis Ling	81c5732dc7	[feature-wip](MTMV) Support creating materialized view for multiple tables (#11646 ) Support creating materialized view for multiple tables. Examples: mysql> CREATE TABLE t1 (pk INT, v1 INT SUM) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk) PROPERTIES ('replication_num' = '1'); mysql> CREATE TABLE t2 (pk INT, v2 INT SUM) AGGREGATE KEY (pk) DISTRIBUTED BY hash (pk) PROPERTIES ('replication_num' = '1'); mysql> CREATE MATERIALIZED VIEW mv BUILD IMMEDIATE REFRESH COMPLETE KEY (mv_pk) DISTRIBUTED BY HASH (mv_pk) PROPERTIES ('replication_num' = '1') AS SELECT t1.pk as mv_pk FROM t1, t2 WHERE t1.pk = t2.pk;	2022-09-02 14:51:56 +08:00
morrySnow	87086ffe31	[enhancement](Nereids)enable normalize aggregate rule (#12194 ) Enable the normalize aggregate rule introduced by #12013	2022-09-01 19:20:37 +08:00
Mingyu Chen	3ce305134a	[fix](scan) fix potential wrong cancel when sql has limit (#12224 )	2022-09-01 19:11:40 +08:00
starocean999	f8eb480bec	[fix](emptynode)fix empty node bug in vec engine (#12258 ) * [fix](emptynode)fix empty node bug in vec engine * update fe ut	2022-09-01 18:52:10 +08:00
Henry2SS	ad8e2f4749	[fix](rpc) fix that coordinator rpc timeout too large may make show load blocked for long time (#12152 ) Co-authored-by: wuhangze <wuhangze@jd.com>	2022-09-01 18:05:37 +08:00
morrySnow	068e60145e	[enhancement](Nereids)ban groupPlan() pattern to avoid misuse (#12250 ) The `groupPlan()` pattern means to find a `GroupPlan` in the memo. Since we have no `GroupPlan` in the memo, it always returns nothing. When we want to write a pattern that matches any GROUP, we should use `group()`. But the `groupPlan` pattern is very confusing and easy to misuse. So this PR bans the `groupPlan()` pattern to avoid misuse.	2022-09-01 14:37:48 +08:00
Gabriel	3bcab8bbef	[feature](function) support now/current_timestamp functions with precision (#12219 ) * [feature](function) support now/current_timestamp functions with precision	2022-09-01 14:35:12 +08:00
starocean999	d7e02a9514	[fix](join)join reorder by mistake (#12113 )	2022-09-01 09:46:01 +08:00
morrySnow	a49bde8a71	[fix](Nereids)statistics calculator for Project and Aggregate lost some columns (#12196 ) There are some bugs in Nereids' StatsCalculator. 1. Project: returns the child's column stats directly, so its parents cannot find column stats by the project's slots. 2. Aggregate: does not return columns that are Aliases, so its parents cannot find some column stats by the Aggregate's slots. 3. All: SlotReference is used as the key of the column-to-stats map, so we need to change SlotReference's equals and hashCode methods to use only ExprId, as we discussed.	2022-08-31 20:47:22 +08:00
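The third point of #12196 (keying the column-to-stats map by a slot whose equality depends only on its expression id) can be illustrated with a short Python sketch. This is an analogy, not the Nereids Java code: the `SlotReference` class below, its fields `expr_id` and `name`, and the stats value are all hypothetical stand-ins.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SlotReference:
    """Hypothetical slot: equality and hash use expr_id only."""
    expr_id: int
    # compare=False excludes the name from both __eq__ and __hash__
    name: str = field(compare=False)


# Column stats keyed by slot; the value is a made-up row count.
column_stats = {SlotReference(1, "column_a"): 1000}

# A parent node may refer to the same expression under another name
# (for example after an alias); the lookup still succeeds because
# only expr_id participates in equality and hashing.
aliased = SlotReference(1, "a_alias")
found = column_stats.get(aliased)
```

With name-sensitive equality, the lookup through `aliased` would miss, which is the kind of lost column stats the commit describes.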
morrySnow	57051d3591	[fix](Nereids)cast StringType to DateType failed when bind TimestampArithmetic function (#12198 ) When bind TimestampArithmetic, we always want to cast left child to DateTimeType. But sometimes, we need to cast it to DateType, this PR fix this problem.	2022-08-31 19:52:03 +08:00
xy720	90c5180370	[Bug](array-type) Fix bug in creating view from table with array types (#12200 )	2022-08-31 14:36:31 +08:00
camby	da4ffd3c56	[Enhancement](metric-type) more readable error message for only metric type #12162 Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-08-31 14:35:48 +08:00
starocean999	3cdd19821d	[fix](sort)the slot in sort node should be nullable if it's outer joined (#12193 ) The sort node's output expr should be nullable if it is outer joined.	2022-08-31 14:34:14 +08:00
jakevin	8999ba34ae	[improve](Nereids)unify all plan toString() function (#12132 ) Add a Util function to generate uniform format plan toString for easy reading and debugging	2022-08-31 14:28:44 +08:00
camby	8b98e2021e	[enhancement](array-type) Array type do not support compare with '=','>', '<', make the error message more readable (#12181 ) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-08-31 12:49:10 +08:00
HouRong	f0cde35ea6	[performance improvement] Spark Load, SparkDpp processRDDAggregate performance improvement (#12186 ) Co-authored-by: hourong <hourong@zhihu.com>	2022-08-31 09:14:13 +08:00
Mingyu Chen	8251e7cbfc	[refactor](column) remove confused field (#12187 )	2022-08-31 09:13:31 +08:00
minghong	f949262ddf	[fix](planner) a slot id is bounded on a wrong tuple id, if cross join has a hash join as child (#12156 )	2022-08-31 09:07:55 +08:00
Mingyu Chen	22430cd7bb	[feature](stmt) add ADMIN COPY TABLET stmt for local debug (#12176 ) Add a new stmt ADMIN COPY TABLET to easily copy a tablet to a local env to reproduce a problem. See the document for more details.	2022-08-31 09:06:49 +08:00
morrySnow	172c213fbc	[fix](Nereids)get NPE from finalize TimestampArithmeticExpr that generated by ExpressionTranslator (#12163 ) This PR: 1. refactor getDataType in TimestampArithmetic 2. set TimeUnit correctly when translate TimestampArithmetic to TimestampArithmeticExpr	2022-08-30 21:59:28 +08:00
jakevin	59e5527eb0	[feature](Nereids)enable CBO optimize stage in Nereids (#12008 ) - enable CBO stage in Nereids - use the `chooseBestPlan()` to get the best plan - add a new rule JoinCommuteProject - test the stage by JoinCommute rule	2022-08-30 21:28:17 +08:00
morrySnow	de4bdc7f6f	[fix](Nereids)Sum return DoubleType when child is DecimalType by mistake (#12169 ) When Sum's child is Decimal, Sum returned DoubleType by mistake, which led to wrong results. So we should keep the return type as decimal when the child expression's type is decimal.	2022-08-30 19:53:25 +08:00
924060929	f6a10e9ea3	[refactor](Nereids)refactor memo.copyIn (#12147 ) This PR does two refactors: 1. remove the useless parameter from `Plan#computeOutput` 2. refactor memo.copyIn In the past, `memo.copyIn` had complex logic to process init, rewrite and copyIn. It was difficult to understand, bug-prone, and could leak memory through unreachable group/groupExpression. So I separated it into three methods: 1. `Memo.init` to init the Memo from a LogicalPlan 2. `Memo.doRewrite` for rewrite 3. `Memo.doCopyIn` for exploration and implementation And separated the UT into 3 files: 1. `MemoInitTest` 2. `MemoRewriteTest` 3. `MemoCopyInTest` I have added a lot of UT for `Memo.rewrite`, and added some unreachable DAG checks in the PlanChecker for when the plan is changed.	2022-08-30 19:36:04 +08:00
Kikyou1997	9a74ad1702	[feature](Nereids)add the ability of projection on each ExecNode and add column prune on OlapScan (#11842 ) We have added logical project before, but to actually finish the prune to reduce the data IO, we need to add related supports in translator and BE. This PR: - add projections on each ExecNode in BE - translate PhysicalProject into projections on PlanNode in FE - do column prune on ScanNode in FE Co-authored-by: HappenLee <happenlee@hotmail.com>	2022-08-30 16:17:10 +08:00
Yongqiang YANG	fb27e3ef31	[fix](planner) let OlapScanNode turn off preaggregation when there is a filter on DELETE_SIGN (#12118 ) We can skip aggregate on replace column, otherwise it would generate wrong result. e.g. a row in UNIQUE is deleted by delete_sign_column, then it would be returned.	2022-08-30 15:54:37 +08:00
caiconghui	2715ff9e0f	[Enhancement](select) Make select variables request handled in fe without be to avoid potential blocked problem when login (#12111 )	2022-08-29 23:07:30 +08:00
jakevin	580b8dd3ec	[improve](Nereids)make some DataType's constructor's access level to private (#12143 ) include these types: - NullType - BooleanType - TinyIntType - SmallIntType - IntegerType - BigIntType - LargeIntType - FloatType - DoubleType - DateType - DateTimeType - StringType	2022-08-29 21:21:24 +08:00
camby	47c89d49f0	[fix](array-type) array can not be distributed key and aggregation key (#12082 ) Array column should not be distributed key or group by or aggregate key, we should forbid it before create table. Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-08-29 17:29:23 +08:00
Mingyu Chen	b2c85d753f	Revert "[behavior change](planner)change Doris's query organization syntax to standard sql (#9745 )" (#12135 ) Reverts apache/doris#9745 This may cause NPE when calling `parseDefineExprWithoutAnalyze()`	2022-08-29 16:29:09 +08:00
Fy	c0b56400ed	[feature](nereids)support one expression-rewrite rule: inPredicateToEqualTo (#12046 ) Add one expression rewrite rule: rewrite InPredicate to an EqualTo Expression, if there exists exactly one element in InPredicate Options. Examples: 1. where A in (x) ==> where A = x 2. where A not in (x) ==> where not A = x	2022-08-29 15:54:06 +08:00
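The rewrite described in #12046 can be sketched as follows. This is a minimal Python illustration under assumed names, not the Nereids rule itself: the classes `InPredicate`, `EqualTo` and `Not` and the function `in_predicate_to_equal_to` are hypothetical, and expressions are reduced to plain strings.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class EqualTo:
    left: str
    right: str


@dataclass(frozen=True)
class Not:
    child: EqualTo


@dataclass(frozen=True)
class InPredicate:
    compare_expr: str
    options: Tuple[str, ...]
    negated: bool = False  # True models NOT IN


def in_predicate_to_equal_to(
    expr: InPredicate,
) -> Union[InPredicate, EqualTo, Not]:
    """Rewrite `A in (x)` to `A = x` and `A not in (x)` to `not A = x`.

    Predicates whose option list does not hold exactly one element
    are returned unchanged.
    """
    if len(expr.options) != 1:
        return expr
    equal = EqualTo(expr.compare_expr, expr.options[0])
    return Not(equal) if expr.negated else equal
```

The rule only fires on exactly one option, so `A in (x, y)` is left for other rules to handle.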
morrySnow	1d9d99c8ec	[fix](Nereids)join output order need same with child plan node output when translate (#12130 ) In BE, there is an implicit convention that the output slots of HashJoinNode's left child must come before the output slots of its right child in intermediateTuple. However, after we apply the commute rule to a join plan in Nereids, this convention is broken, which causes a core dump in BE. There are two ways to fix this problem: 1. add a project on the join after we do commute 2. reorder the output of the join node when we do translate. We cannot translate project yet, because BE projection support is ongoing (#11842), so we use the second way for now. Once project translation works correctly, we should switch to the first way.	2022-08-29 15:32:55 +08:00
Gabriel	af09c1f4eb	[Improvement](window funnel) restrict timestamp to datetime type in window funnel (#12123 )	2022-08-29 12:14:04 +08:00
yinzhijian	3ca6f34c87	[fix](view) Fix view not showing specific lengths for varchar type (#12107 )	2022-08-29 12:09:48 +08:00
Pxl	7829c21b20	[Bug](lateral-view) fix some conjunct not work on lateral view #12105	2022-08-29 12:08:20 +08:00
tk047	fb7c42a4e3	[fix](fe) Fixed alterOp from HashSet to EnumSet (#12094 ) Change the HashSet to EnumSet of the AlterOp's currentOps for better performance	2022-08-29 12:07:31 +08:00
jakevin	eb3e0b2f7d	[test](Nereids): add more plan equals test for Nereids (#12127 ) - add more plan equals test for Nereids - fix join equals bugs	2022-08-29 11:46:30 +08:00
camby	fe9767941d	[fix](array-type) adjust enable_array_type config (#12071 ) Problem: 1. `enable_array_type` is masterOnly; 2. dynamic open config only affect FE MASTER `admin set frontend config("enable_array_type"="true");` 3. query in FE FOLLOWER will fail, because of `enable_array_type` is false in FE FOLLOWER `select * from table_with_array ` Solution: Only check `enable_array_type` while creating new tables with array column. Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-08-29 11:10:52 +08:00
Mingyu Chen	7fbcf3c8ba	[api-change](http) change kill query http api by using query id (#12120 ) Users can now cancel a query over HTTP in two steps: 1. get the query id by trace id 2. cancel the query by query id. The modified API has not been released yet.	2022-08-29 09:51:51 +08:00

2681 Commits