doris

Author	SHA1	Message	Date
starocean999	e0518fd19d	[fix](nereids)remove redundant visit call in Validator (#18103 )	2023-03-25 11:41:34 +08:00
mch_ucchi	1164611393	[enhancement](planner) fix unclear exception msg when create mv (#17537 ) A materialized view's FROM clause can only be a single table, not a subquery, but the exception message was an NPE. This PR changes it to a clear message.	2023-03-25 11:36:40 +08:00
Gabriel	2408ca5da8	[Bug](DECIMALV3) Fix wrong precision for plus/minus (#18052 ) Result type for DECIMAL(x, y) plus/minus DECIMAL(m, n) should be DECIMAL(max(x - y, m - n) + max(y, n) + 1, max(y, n))	2023-03-25 09:42:39 +08:00
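The result-type rule in the commit above can be sketched as a small function. This is only an illustration, assuming the scale term is meant as max(y, n); the function name is hypothetical, and the sketch does not model any cap on maximum precision that the engine may apply.

```python
def decimal_add_result_type(x, y, m, n):
    """Return (precision, scale) for DECIMAL(x, y) plus/minus DECIMAL(m, n).

    Hypothetical helper, not Doris code. It follows the rule quoted in the
    commit message: keep the larger scale, keep the larger number of integer
    digits, and add one digit for a possible carry.
    """
    scale = max(y, n)                   # fractional digits of the result
    integer_digits = max(x - y, m - n)  # digits left of the decimal point
    return integer_digits + scale + 1, scale


# DECIMAL(10, 2) + DECIMAL(8, 4): 8 integer digits, scale 4, one carry digit
print(decimal_add_result_type(10, 2, 8, 4))
```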
AKIRA	dc4b719528	[enhancement](stats) Make estimation with histogram much more precisely (#18053 )	2023-03-25 01:02:36 +08:00
Lijia Liu	51962fbfaf	[fix](meta) FE should delete a colocate table's replica when it is redundant (#17998 ) If a colocate table's tablet is healthy and a BE reports an extra replica to FE, FE does not delete it. It should be deleted; otherwise the BE reports it again and again and nothing handles it.	2023-03-25 00:16:31 +08:00
minghong	80d2e6f4c1	[fix](nereids) should not assign stats after cast on the original slot (#18061 ) For `select * from T where A = 10.0`, suppose A is an int column. After stats derivation on `cast(A as double) = 10.0`, the column stats for `cast(A as double)` were set on `A`.	2023-03-24 21:37:06 +08:00
HappenLee	473f0c45ff	[Bug](delete) Fix bug of delete partition prune error (#18057 )	2023-03-24 20:22:12 +08:00
gitccl	0523860877	[Enhancement](streamload) print profile for streamload (#18015 ) When both enable_profile and enable_stream_load_profile_log are true, the stream load profile is printed to the log.	2023-03-24 20:17:33 +08:00
zhangdong	219ef01c65	[bugfix](k8s)roll back jackson version (#18046 ) After upgrading the jackson version, the k8s client fails with: java.lang.NoClassDefFoundError: org/yaml/snakeyaml/LoaderOptions at com.fasterxml.jackson.dataformat.yaml.YAMLParser.<init>(YAMLParser.java:191) ~[jackson-dataformat-yaml-2.14.2.jar:2.14.2] at com.fasterxml.jackson.dataformat.yaml.YAMLFactory._createParser(YAMLFactory.java:509) ~[jackson-dataformat-yaml-2.14.2.jar:2.14.2] at com.fasterxml.jackson.dataformat.yaml.YAMLFactory.createParser(YAMLFactory.java:413) ~[jackson-dataformat-yaml-2.14.2.jar:2.14.2] at com.fasterxml.jackson.dataformat.yaml.YAMLFactory.createParser(YAMLFactory.java:386) ~[jackson-dataformat-yaml-2.14.2.jar:2.14.2] at com.fasterxml.jackson.dataformat.yaml.YAMLFactory.createParser(YAMLFactory.java:15) ~[jackson-dataformat-yaml-2.14.2.jar:2.14.2] at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3677) ~[jackson-databind-2.14.2.jar:2.14.2] at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3645) ~[jackson-databind-2.14.2.jar:2.14.2] at io.fabric8.kubernetes.client.internal.KubeConfigUtils.parseConfigFromString(KubeConfigUtils.java:47) ~[kubernetes-client-5.12.2.jar:?] ...	2023-03-24 19:36:59 +08:00
starocean999	7bdd854fdc	[fix](nereids) bucket shuffle and colocate join is not correctly recognized (#17807 ) 1. close (https://github.com/apache/doris/issues/16458) for nereids 2. varchar and string types should be treated as the same type in the bucket shuffle join scenario. ``` create table shuffle_join_t1 ( a varchar(10) not null ) create table shuffle_join_t2 ( a varchar(5) not null, b string not null, c char(3) not null ) ``` the 2 SQL statements below can use bucket shuffle join ``` select * from shuffle_join_t1 t1 left join shuffle_join_t2 t2 on t1.a = t2.a; select * from shuffle_join_t1 t1 left join shuffle_join_t2 t2 on t1.a = t2.b; ``` 3. PushdownExpressionsInHashCondition should consider both hash and other conjuncts 4. visitPhysicalProject should handle MarkJoinSlotReference	2023-03-24 19:21:41 +08:00
lexluo09	562f572311	[enhancement](UDF) The user defined functions support global ('show functions'/'show create') operation (#16973 ) (#17964 ) 1. add the global keyword. SHOW [GLOBAL] [FULL] [BUILTIN] FUNCTIONS [IN|FROM db] [LIKE 'function_pattern'] SHOW CREATE GLOBAL FUNCTION function_name(arg_type [, ...]); 2. show the details of the global udf.	2023-03-24 19:07:38 +08:00
jakevin	354d109130	[feat](Nereids): check Memo Plan for Unit Test. (#18082 )	2023-03-24 18:31:33 +08:00
Xinyi Zou	cd28e9f3b5	[fix](function) fix encrypt/decrypt function bug select list expression not produced by aggregation output #18078 Fix function analysis repeat add child. select list expression not produced by aggregation output (missing from GROUP BY clause?): if(length(`r_2_3`.`name`) % 32 = 0, aes_decrypt(unhex(`r_2_3`.`name`), '***'), `r_2_3`.`name`)	2023-03-24 18:03:18 +08:00
wangqt	ca0e4844e8	[typo](comment) code comment fix (#17870 ) Co-authored-by: wangqingtao6 <wangqingtao6@jd.com>	2023-03-24 17:47:30 +08:00
Pxl	8249441335	[Bug](planner) add conjunct slotref id to table function node to avoid result incorrect (#18063 )	2023-03-24 14:48:03 +08:00
mch_ucchi	aa3ea4beed	[fix](planner) failed to create view when use window function (#17815 ) Fix the failure to create a view that uses a window function: the view string contained slot ids, which cannot be parsed.	2023-03-24 10:58:52 +08:00
starocean999	22fce33fb2	[fix](nereids) fix bitmap function nullable trait and dphyper bugs (#18041 ) 1. some bitmap functions like bitmap_or, bitmap_and_count, bitmap_or_count etc shouldn't follow constant fold rule for PropagateNullable functions. So remove PropagateNullable property and these functions would use their own constant fold logic correctly 2. dphyper's PlanReceiver class shouldn't change hyperGraph's complex project info. So make PlanReceiver use its own copy of complex project info now.	2023-03-24 10:53:45 +08:00
jakevin	f9f87545d6	[improve](Nereids): check slot from children in validator. (#17951 )	2023-03-24 10:52:12 +08:00
hqx871	1999cccde9	[feature](array-type) Unique table support array value (#17024 ) Unique table support array value --------- Co-authored-by: huangqixiang.871 <huangqixiang.871@bytedance.com>	2023-03-24 10:18:59 +08:00
Xiangyu Wang	1f8ba4948d	[Fix](multi-catalog) add handler for hms INSERT EVENT. (#17933 ) When we use a hive client to submit an `INSERT INTO TBL SELECT * FROM ...` or `INSERT INTO TBL VALUES ...` statement and the table is non-partitioned, the hms generates an insert event. The insert statement may change the hdfs file distribution of this table, but currently we do not handle this, so the file cache of this table may be inaccurate.	2023-03-24 10:17:47 +08:00
924060929	321bb3e9ee	[refactor](Nereids) Refactor and optimize partition pruning (#18003 ) The legacy PartitionPruner only supports some simple cases; some useful cases are not supported: 1. it cannot evaluate some builtin functions, like `cast(part_column as bigint) = 1` 2. it cannot prune multi-level range partitions. For the partition `[('1', 'a'), ('2', 'b'))`, the constraints are: - first_part_column between '1' and '2' - if first_part_column = '1' then second_part_column >= 'a' - if first_part_column = '2' then second_part_column < 'b' This PR refactors it and supports: 1. using a visitor to evaluate functions and fold constants 2. if the partition column is discrete, like int or date, expanding the range and evaluating each value, e.g. `[1, 5)` is expanded to `[1, 2, 3, 4]` 3. pruning multi-level range partitions, as described above 4. evaluating functions over a range slot, e.g. for the datetime range partition `[('2023-03-21 00:00:00'), ('2023-03-21 23:59:59'))`, if the filter is `date(col1) = '2023-03-22'`, this partition is pruned, because we know that the date is always `2023-03-21`. You can implement the visit method in FoldConstantRuleOnFE and OneRangePartitionEvaluator to support such functions. ### How can we do it so finely? Generally, the columns of a range partition fall into three kinds: `const`, `range`, `other`. For example, in the partition `[('1', 'a', 'D'), ('1', 'c', 'D'))`: 1. the first partition column is `const`: it always equals '1' 2. the second partition column is `range`: `slot >= 'a' and <= 'c'`. If there is no later slot, it must be `slot >= 'a' and < 'c'` 3. the third partition column is `other`: regardless of whether the upper and lower bounds are the same, it can take multiple values, e.g. `('1', 'a', 'D')`, `('1', 'a', 'F')`, `('1', 'b', 'A')`, `('1', 'c', 'A')` In a partition, exactly one `range` slot exists; there may be zero, one, or many `const`/`other` slots. 
Normally, a partition looks like [const, range, other]; these are the possible shapes: 1. [range], e.g. `[('1'), ('10'))` 2. [const, range], e.g. `[('1', 'a'), ('1', 'd'))` 3. [range, other, other], e.g. `[('1', '1', '1'), ('2', '1', '1'))` 4. [const, const, ..., range, other, other, ...], e.g. `[('1', '1', '2', '3', '4'), ('1', '1', '3', '3', '4'))` The properties of `const`: 1. we can replace the slot with a literal to evaluate the expression tree. The properties of `range`: 1. if the slot data type is discrete, like int or date, we can expand it to literals and evaluate the expression tree 2. if the type is not discrete, like datetime, or there are too many discrete values, like [1, 1000000), we can keep the slot in the expression tree and assign a range to it; when evaluating the expression tree, we also compute the range and check whether it is the empty set, and if so we can simplify to BooleanLiteral.FALSE to skip this partition. 3. if the range slot satisfies some conditions, we can also fold the slot with some functions; see the datetime example above The properties of `other`: 1. only when the previous slot is a literal equal to the lower bound or upper bound of the partition can we shrink the range of the `other` slot According to these properties, we can prune finely. At runtime, the `range` and `other` slots may have their value ranges shrunk, e.g. 1. the partition `[('a'), ('b'))` with predicate `part_col = 'a'` shrinks the range `['a', 'b')` to `['a']`, like a `range` slot downgrading to a `const` slot; 2. the partition `[('a', '1'), ('b', '10'))` with predicate `part_col1 = 'a'` shrinks the range of the `other` slot from unknown (all values) to `['1', +∞)`, like an `other` slot downgrading to a `range` slot. But to keep things simple, I haven't changed the type at runtime, just shrunk the ColumnRange.	2023-03-24 09:06:52 +08:00
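The `const` / `range` / `other` classification and the discrete-range expansion described in the commit above can be sketched roughly as follows. This is a simplified illustration, not the Doris implementation (which is Java); the function names are hypothetical, and the sketch assumes the lower and upper bound tuples have equal length and differ in at least one column.

```python
def classify_partition_columns(lower, upper):
    """Label each column of a range partition [lower, upper).

    Leading columns whose bounds are equal are 'const', the first column
    whose bounds differ is 'range', and every later column is 'other'.
    """
    kinds = []
    seen_range = False
    for lo, hi in zip(lower, upper):
        if seen_range:
            kinds.append("other")
        elif lo == hi:
            kinds.append("const")
        else:
            kinds.append("range")
            seen_range = True
    return kinds


def can_prune_discrete(lower, upper, predicate):
    """Single int column, partition [lower, upper).

    Expand the range to its discrete values and evaluate the predicate on
    each one; the partition can be skipped if no value satisfies it.
    """
    return not any(predicate(v) for v in range(lower, upper))


print(classify_partition_columns(("1", "a", "D"), ("1", "c", "D")))
print(can_prune_discrete(1, 5, lambda v: v == 7))
```

Expansion is only practical for small discrete ranges; for a range like [1, 1000000) the commit describes keeping the slot and tracking a value range instead.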
924060929	d3e7f12ada	[refactor](Nereids) refactor column pruning (#17579 ) This PR refactors column pruning to use a visitor. The benefits: 1. it is easy to add column pruning for a new plan: implement the interface `OutputPrunable` if the plan contains an output field, or do nothing if it does not. There is no need to add a new rule like `PruneXxxChildColumns`; only a few scenarios need to override the visit function with special logic, like pruning LogicalSetOperation and Aggregate 2. it supports shrinking the output field in some plans, which skips some useless operations and so improves performance. Example: ```sql select id from ( select id, sum(age) from student group by id )a ``` we should prune the useless `sum (age)` in the aggregate. before refactor: ``` LogicalProject ( distinct=false, projects=[id#0], excepts=[], canEliminate=true ) +--LogicalSubQueryAlias ( qualifier=[a] ) +--LogicalAggregate ( groupByExpr=[id#0], outputExpr=[id#0, sum(age#2) AS `sum(age)`#4], hasRepeat=false ) +--LogicalProject ( distinct=false, projects=[id#0, age#2], excepts=[], canEliminate=true ) +--LogicalOlapScan ( qualified=default_cluster:test.student, indexName=<index_not_selected>, selectedIndexId=10007, preAgg=ON ) ``` after refactor: ``` LogicalProject ( distinct=false, projects=[id#0], excepts=[], canEliminate=true ) +--LogicalSubQueryAlias ( qualifier=[a] ) +--LogicalAggregate ( groupByExpr=[id#0], outputExpr=[id#0], hasRepeat=false ) +--LogicalProject ( distinct=false, projects=[id#0], excepts=[], canEliminate=true ) +--LogicalOlapScan ( qualified=default_cluster:test.student, indexName=<index_not_selected>, selectedIndexId=10007, preAgg=ON ) ```	2023-03-24 09:00:48 +08:00
morrySnow	c1bd5b26a8	[refactor](Nereids) expression translate no long rely on legacy planner code (#17671 )	2023-03-23 23:05:15 +08:00
morrySnow	47bd3e77e8	[fix](Nereids) cannot select random olap table (#18044 )	2023-03-23 22:11:36 +08:00
AlexYue	3bb3c36b9b	[bugfix](txn) return when txn state is null when doing abort txn (#18045 )	2023-03-23 20:51:21 +08:00
lihangyu	4c5ba4bb01	[Improve](point query) optimize sendFields since `writeField` is heav… (#18000 ) Saves about 20% of FE CPU cost for point queries with prepared statements on a table with 100 columns.	2023-03-23 17:45:56 +08:00
lihangyu	8b617afe43	[Improve](point query) improve column match performance when doing `computeColumnFilter` to prune partition (#17982 ) Only use key columns in `computeColumnFilter`; otherwise, for wide tables the match process can be very slow. QPS on a 500-column table: 6186 -> 13208	2023-03-23 17:45:34 +08:00
morrySnow	20d26397aa	[fix](planner) forbid inline view but not the subquery resolve from parent tuples (#18032 ) In PR #17813 , we wanted to forbid binding a slot on a sibling's column; however, the fix was not done the correct way. The correct way is to forbid a subquery from registering itself in its parent's analyzer. This reverts commit b91a3b5a72520105638dad1079b71a05f02c10a0.	2023-03-23 16:11:04 +08:00
AKIRA	34dc7e57c1	[ehancement](stats) Tune for stats framework (#18035 ) 1. Estimate timearithmeticexpr instead of setting Double.MAX Double.MIN directly 2. Enable histogram to derive stats 3. Loosen the condition for histogram usage 4. Improve the accuracy for agg on TPC-H 1G greatly 5. Fix avg qerror calculation	2023-03-23 16:03:58 +08:00
HappenLee	e9ff3d185b	[Opt](pipeline) disable coloagg when the para instance num >= tablet_num * 2 (#18030 )	2023-03-23 15:53:13 +08:00
zhengshiJ	574365b6d4	[Feature](Nereids) support new mv (#17853 ) The metadata storage format of the materialized view has changed, and the new optimizer adapts to the new storage method. The column storage format of the metadata for the materialized view is changed to start with mv_ or start with mva_ This pr allows the new optimizer to recognize the new materialized view columns and select the correct materialized view. TODO: support advance mv	2023-03-23 15:25:42 +08:00
Jibing-Li	6684d65075	[Improvement](TVF)Support file split for TableValueFunction (#17958 ) Currently, getSplits for TVF creates one split for each file. In this case, scan performance for large files may be bad. This PR implements the getSplits function in TVFSplitter to support splitting a file into multiple blocks, which may improve performance for large files.	2023-03-23 15:05:44 +08:00
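The idea of splitting one file into multiple blocks, instead of one split per file, can be illustrated with a minimal sketch. The function name, the (offset, length) representation, and the fixed block size are all assumptions for illustration; the actual TVFSplitter logic is not shown in the commit message.

```python
def split_file(length, block_size):
    """Cut a file of `length` bytes into (offset, size) blocks.

    Hypothetical helper, not the TVFSplitter API. Every block has
    `block_size` bytes except possibly the last one, which holds the rest.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [
        (offset, min(block_size, length - offset))
        for offset in range(0, length, block_size)
    ]


# A 10-byte file with 4-byte blocks yields two full blocks and a 2-byte tail.
print(split_file(10, 4))
```

Each block can then be scanned independently, which is where the potential gain for large files comes from.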
mch_ucchi	2d4f5886ab	[Enhancement](Nereids) add single sql fall back to original planner hint (#17994 ) Now we can use /*+ SET_VAR(enable_nereids_planner="false") */ to disable nereids for a single SQL statement.	2023-03-23 13:38:40 +08:00
minghong	e415754130	[enhancement](nereids) adjust distribution cost in cost model v1 (#17990 ) 1. cost model adjustment: the cost of broadcast should be lower than the cost of shuffle when the data size is small. For broadcast, we do not know the number of receiver BEs, so we use the number of BEs in the system. 2. debug message adjustment: a. in explain, print the row count after filter b. if a join is not a marked join, do not print marked join info	2023-03-23 13:32:36 +08:00
morrySnow	fadf3b906d	[enhancement](planner) delete support between predicate (#17892 )	2023-03-23 13:24:32 +08:00
mch_ucchi	abeec4848a	[Fix](Nereids)fix be fold constant incorrectly on from_unixtime. (#18016 )	2023-03-23 11:17:08 +08:00
ZhangYu0123	089a91ecd5	[vectorized](function) support array_exists lambda function (#17931 ) Co-authored-by: zhangyu209 <zhangyu209@meituan.com>	2023-03-23 11:11:39 +08:00
ElvinWei	5a7d99e2f0	[Improvement](statistics) Support for collecting statistics at the granularity of partitions. (#17966 ) * Support for collecting statistics at the granularity of partitions * Add ut and fix some bug	2023-03-23 09:05:42 +08:00
HappenLee	58b00858ab	[Refactor](pipeline) Remove unless fe session variable enable_rpc_opt_for_pipeline (#18019 )	2023-03-23 07:27:58 +08:00
Xiangyu Wang	7ed15ee8c9	[Fix](multi-catalog) invalidates the file cache when table is non-partitioned. (#17932 ) As `org.apache.doris.planner.external.HiveSplitter` shows, the file cache of `HiveMetaStoreCache` may be created even if the table is non-partitioned, so `RefreshTableStmt` should handle this case.	2023-03-22 23:34:18 +08:00
Adonis Ling	5021c0f91a	[feature-wip](MTMV) Support joining tables with views (#18026 ) * [feature-wip](MTMV) Support joining tables with views * Resolve comments	2023-03-22 23:21:50 +08:00
yongkang.zhong	e2e806a5e7	[improve](clickhouse jdbc) support clickhouse array type (#17993 ) In this PR, I match the array type of ClickHouse to the array type of Doris's jdbc external.	2023-03-22 19:42:32 +08:00
qiye	410907c940	[improvement](inverted index)UNIQUE_KEYS table only supports inverted index when merge_on_write is enabled. (#17827 ) When adding an inverted index to a UNIQUE_KEYS table without merge_on_write enabled, the match query may fail before the segment is compacted. So we add the restriction here.	2023-03-22 17:47:30 +08:00
Xinyi Zou	ebef0c038d	Revert "[fix](function) fix AES/SM3/SM4 encrypt/ decrypt algorithm initialization vector bug (#17420 )" (#17887 ) This reverts commit 397cc011c4f1ba5a25c770258c13f1cd3f28b47d.	2023-03-22 13:28:25 +08:00
jakevin	bd46d721e9	[feature](Nereids): pull up SEMI JOIN from INNER JOIN (#17765 )	2023-03-22 12:48:04 +08:00
Pxl	40ca250678	[Feature](materialized-view) support where clause on create materialized view (#17534 )	2023-03-22 11:25:13 +08:00
Pxl	401836f523	[Bug](planner) fix core dump when lateral view above union node and have predicate (#17912 )	2023-03-22 11:24:45 +08:00
starocean999	17a1ce5ed3	[fix](nereids) add a project node above sort node to eliminate unused order by keys (#17913 ) If the order by keys are not simple slots in the sort node, the order by exprs have to be added to the sort node's output tuple. In that case, we need to add a project node above the sort node to eliminate the unused order by exprs. For example: ```sql WITH t0 AS (SELECT DATE_FORMAT(date, '%Y%m%d') AS date FROM cir_1756_t1 ), t3 AS (SELECT date_format(date, '%Y%m%d') AS `date` FROM `cir_1756_t2` GROUP BY date_format(date, '%Y%m%d') ORDER BY date_format(date, '%Y%m%d') ) SELECT t0.date FROM t0 LEFT JOIN t3 ON t0.date = t3.date; ``` before: ``` +--------------------------------------------------------------------------------------------------------------------------------------------------+ | Explain String | +--------------------------------------------------------------------------------------------------------------------------------------------------+ | LogicalProject[159] ( distinct=false, projects=[date#1], excepts=[], canEliminate=true ) | | +--LogicalJoin[158] ( type=LEFT_OUTER_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(date#1 = date#3)], otherJoinConjuncts=[] ) | | |--LogicalProject[151] ( distinct=false, projects=[date_format(date#0, '%Y%m%d') AS `date`#1], excepts=[], canEliminate=true ) | | | +--LogicalOlapScan ( qualified=default_cluster:bugfix.cir_1756_t1, indexName=cir_1756_t1, selectedIndexId=412339, preAgg=ON ) | | +--LogicalSort[157] ( orderKeys=[date_format(cast(date#3 as DATETIME), '%Y%m%d') asc null first] ) | | +--LogicalAggregate[156] ( groupByExpr=[date#3], outputExpr=[date#3], hasRepeat=false ) | | +--LogicalProject[155] ( distinct=false, projects=[date_format(date#2, '%Y%m%d') AS `date`#3], excepts=[], canEliminate=true ) | | +--LogicalOlapScan ( qualified=default_cluster:bugfix.cir_1756_t2, indexName=cir_1756_t2, selectedIndexId=412352, preAgg=ON ) | 
+--------------------------------------------------------------------------------------------------------------------------------------------------+ ``` after: ``` +--------------------------------------------------------------------------------------------------------------------------------------------------+ | Explain String | +--------------------------------------------------------------------------------------------------------------------------------------------------+ | LogicalProject[171] ( distinct=false, projects=[date#2], excepts=[], canEliminate=true ) | | +--LogicalJoin[170] ( type=LEFT_OUTER_JOIN, markJoinSlotReference=Optional.empty, hashJoinConjuncts=[(date#2 = date#4)], otherJoinConjuncts=[] ) | | |--LogicalProject[162] ( distinct=false, projects=[date_format(date#0, '%Y%m%d') AS `date`#2], excepts=[], canEliminate=true ) | | | +--LogicalOlapScan ( qualified=default_cluster:bugfix.cir_1756_t1, indexName=cir_1756_t1, selectedIndexId=1049812, preAgg=ON ) | | +--LogicalProject[169] ( distinct=false, projects=[date#4], excepts=[], canEliminate=false ) | | +--LogicalSort[168] ( orderKeys=[date_format(cast(date#4 as DATETIME), '%Y%m%d') asc null first] ) | | +--LogicalAggregate[167] ( groupByExpr=[date#4], outputExpr=[date#4], hasRepeat=false ) | | +--LogicalProject[166] ( distinct=false, projects=[date_format(date#3, '%Y%m%d') AS `date`#4], excepts=[], canEliminate=true ) | | +--LogicalOlapScan ( qualified=default_cluster:bugfix.cir_1756_t2, indexName=cir_1756_t2, selectedIndexId=1049825, preAgg=ON ) | +--------------------------------------------------------------------------------------------------------------------------------------------------+ ```	2023-03-22 11:19:32 +08:00
AKIRA	f600f70619	[ehancement](fe) Tune for stats framework (#17860 )	2023-03-22 11:07:56 +08:00
morrySnow	173d68409c	[enhencement](planner) update and delete support use alias for target table (#17914 )	2023-03-22 11:07:39 +08:00

8289 Commits