doris

Author	SHA1	Message	Date
jakevin	58de8ec2df	[enhance](Nereids): add variable to enable Bushy Tree (#18202 )	2023-03-29 21:53:24 +08:00
Xinyi Zou	6964d9f99c	[fix](function) resubmit-fix AES/SM3/SM4 encrypt/decrypt algorithm initialization vector bug (#17907 ) * Revert "[fix](function) fix AES/SM3/SM4 encrypt/decrypt algorithm initialization vector bug (#17420)" This reverts commit 397cc011c4f1ba5a25c770258c13f1cd3f28b47d. * [fix-resubmit](function) fix AES/SM3/SM4 encrypt/decrypt algorithm initialization vector bug (#17420) For the ECB algorithm, block_encryption_mode does not take effect; it only takes effect when an init vector is provided. Solved: 1. 192/256 supports calculation without an init vector. 2. For other algorithms, an error is reported when there is no init vector. Initialization vector: the default value for the block_encryption_mode system variable is aes-128-ecb, or ECB mode, which does not require an initialization vector. The alternative permitted block encryption modes CBC, CFB1, CFB8, CFB128, and OFB all require an initialization vector. Reference: https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_aes-decrypt Note: This fix does not support smooth upgrades. During the upgrade process, queries may report the error: function not found	2023-03-29 21:13:01 +08:00
minghong	f24174ebf1	when estimated rowCount is 0, adjust to 1 (#18174 )	2023-03-29 19:03:53 +08:00
zhengshiJ	b92087dee8	[Fix](Nereids) ReorderJoin rule cannot process MarkJoin correctly (#18159 ) Fix two problems: 1. A logical join containing the MarkJoinSlotReference column generates a plan->MarkJoinSlotReference structure when reorderJoin is executed, and the MarkJoinSlotReference column is restored after the reorder is completed. But when filter+crossJoin exists, it is transformed into innerJoin in the rules, so the map lookup fails, the corresponding plan cannot be found, and the MarkJoinSlotReference column is lost. 2. Originally, the MarkJoinSlotReference column was used as the NonUserVisibleOutput of logicalJoin. At the same time, when logicalApply was generated, the added logicalProject did not include the MarkJoinSlotReference column, and the invalid logicalProject was deleted by other rules, so that LogicalApply sat directly under the logicalFilter and the filter could recognize the MarkJoinSlotReference column. But this breaks if the logicalProject cannot be deleted. Repair method: 1. For a logicalJoin containing MarkJoinSlotReference, the reorderJoin rules are not executed. 2. Use MarkJoinSlotReference as the output of logicalJoin and also as the output of LogicalApply. 3. When generating LogicalApply, if MarkJoinSlotReference is included, add an additional logicalProject above the logicalFilter that removes the MarkJoinSlotReference column.
eg ``` logicalFilter(subquery with disconjunct) after SubqueryToApply logicalProject(without markJoinSlotReference) +-- logicalFilter(markJoinSlotReference) +-- logicalProject(with markJoinSlotReference) +-- logicalApply() ``` ``` SELECT * FROM sub_query_correlated_subquery1 WHERE k1 IN (SELECT k1 FROM sub_query_correlated_subquery3) OR k1 < 10; +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ \| Explain String \| +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ \| LogicalProject[60] ( distinct=false, projects=[k1#0, k2#1], excepts=[], canEliminate=true ) \| \| +--LogicalProject[59] ( distinct=false, projects=[k1#0, k2#1], excepts=[], canEliminate=true ) \| \| +--LogicalFilter[58] ( predicates=($c$1#7#false OR (k1#0 < 10)) ) \| \| +--LogicalProject[57] ( distinct=false, projects=[k1#0, k2#1, $c$1#7#false], excepts=[], canEliminate=true ) \| \| +--LogicalApply ( correlationSlot=[], correlationFilter=Optional.empty, isMarkJoin=true, MarkJoinSlotReference=$c$1#7#false, scalarSubCorrespondingSlot=empty ) \| \| \|--LogicalOlapScan ( qualified=default_cluster:regression_test_nereids_syntax_p0.sub_query_correlated_subquery1, indexName=<index_not_selected>, selectedIndexId=63105, preAgg=ON ) \| \| +--LogicalProject[34] ( distinct=false, projects=[k1#2], excepts=[], canEliminate=true ) \| \| +--LogicalOlapScan ( qualified=default_cluster:regression_test_nereids_syntax_p0.sub_query_correlated_subquery3, indexName=<index_not_selected>, selectedIndexId=63115, preAgg=ON ) \| +----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ ```	
2023-03-29 16:12:42 +08:00
Pxl	503c6bf38e	[Chore](materialized-view) forbid creating mv with some constant expr and curdate() (#18145 )	2023-03-29 16:08:48 +08:00
Lei Zhang	f7f7958d35	[fix](bdbje) fix handle bdb RollbackException incorrectly (#17483 )	2023-03-29 16:02:55 +08:00
morrySnow	545160a343	[refactor](planner) Separate the planning process for the legacy planner and Nereids (#17991 ) 1. separate the planning process for the legacy planner and Nereids in StmtExecutor 2. add forward-to-master logic to Nereids 3. refactor Command processing for Nereids, add a run interface to Command 4. internal queries can run on Nereids like normal queries 5. fix CreatePolicyCommand syntax, making it exactly the same as the legacy planner 6. let Nereids session variables forward to master	2023-03-29 11:36:38 +08:00
starocean999	db25165498	[fix](nereids)move AdjustAggregateNullableForEmptySet before NormalizeAggregate (#18147 )	2023-03-29 11:34:01 +08:00
Pxl	0c01df6bb2	[Bug](view) fix AES_ENCRYPT have wrong result on view (#18034 )	2023-03-29 10:49:39 +08:00
Pxl	fd18e34c0c	[Chore](planner) add error information for OnClause contain ExistsPredicates (#18090 )	2023-03-29 10:47:41 +08:00
zhangdong	7e9e02a173	[Enhancement](auth) Desc table check col auth (#18114 ) 1. Change the permission exception format 2. When describing a table, show different cols according to auth 3. Delete unused code	2023-03-29 10:42:18 +08:00
Mingyu Chen	05db6e9b55	[refactor](file-system)(step-2) remove env, file_utils and filesystem_utils (#18009 ) Follow #17586. This PR mainly changes: 1. Remove env/ 2. Remove FileUtils/FilesystemUtils. Some methods are moved to LocalFileSystem 3. Remove olap/file_cache 4. Add s3 client cache for s3 file system. In my test, the time to open an s3 file can be reduced significantly 5. Fix cold/hot separation bug for s3 fs. This is the last PR of #17764. After this, all IO operations should be in io/fs. Besides the tests in #17586, I also tested some cases related to fs io: clone; concurrency query on local/s3/hdfs; load; error log create and clean; disk metrics	2023-03-29 09:00:52 +08:00
WenYao	c3fe113894	rename PaloFe to DorisFE (#18167 )	2023-03-29 00:30:16 +08:00
奕冷	5d218388f3	[enhancement](stmt-forward) make fe follower err msg shown to client be consistent with master (#18180 ) Found that the RPC timeout is so short that the RPC client closes before the execution result is returned. Therefore, use a coefficient to prolong the RPC client timeout, so that it can wait for the real cause to be received.	2023-03-28 21:27:45 +08:00
Liqf	012f7bd031	[feature](function)Add ST_Area function (#18138 )	2023-03-28 19:36:09 +08:00
jakevin	cff6a7195b	[feature](Nereids): add bushy tree rule; (#18130 )	2023-03-28 19:32:53 +08:00
GoGoWen	b9161295b7	[Fix](plan) fix bug that the case sensitivity of column name may impact join method (#17904 ) Issue Number: close #17876	2023-03-28 15:18:30 +08:00
Xiangyu Wang	6bd2609294	[Enhancement](multi-catalog) add config for external meta cache loade… (#18117 ) Add config for external cache-loader's max thread-pool size.	2023-03-28 15:10:19 +08:00
Tiewei Fang	d7dcdfcba9	[Fix](Create View) support create view from tvf (#18087 ) Support create view as select * from tvf()	2023-03-28 15:07:32 +08:00
jakevin	d6339b36a4	[fix](Nereids): correct the order of pushdown semi rules. (#18148 )	2023-03-28 14:20:07 +08:00
xueweizhang	1956f04aa2	[feature](multi-catalog) add specified_database_list PROPERTY for jdbc/hms/iceberg catalog (#17803 ) add specified_database_list PROPERTY for jdbc catalog, user can use many database specified by jdbc catalog	2023-03-28 14:04:41 +08:00
xy720	daeaa91dd6	[feature](function) support variadic template type in SQL function (#17985 ) Inspired by c++ function `std::vector::emplace_back()`, we can use variadic template for this issue. e.g. ``` [['struct'], 'STRUCT<TYPES>', ['TYPES'], 'ALWAYS_NOT_NULLABLE', ['TYPES...']] ``` `...TYPES` in template_types defines a variadic template `TYPE`. Then the variadic template will be expanded to multiple normal templates based on actual input arguments at runtime in FE. But make sure `TYPES...` is placed on the last position in all template type arguments. BTW, the origin template function logic is not affected.	2023-03-28 11:08:24 +08:00
Pxl	9c1e86f84f	[Bug](materialized-view) add some limit for create mv on aggregate table (#18141 ) add some limit for create mv on aggregate table. ```sql CREATE TABLE t1 ( p1 INT, p2 INT, p3 INT, v1 INT SUM, v2 INT MAX, v3 INT MIN ) AGGREGATE KEY (p1, p2, p3) DISTRIBUTED BY HASH (p1) BUCKETS 1 PROPERTIES ('replication_num' = '1'); CREATE MATERIALIZED VIEW mv_1 AS SELECT p1, SUM(v3) FROM t1 GROUP BY p1; // invalid aggregate type CREATE MATERIALIZED VIEW mv_2 AS SELECT p1, MIN(v3+v3) FROM t1 GROUP BY p1; // invalid expression calculate on aggregate column CREATE MATERIALIZED VIEW mv_3 AS SELECT p1, SUM(v1) FROM t1 GROUP BY p1; // cast v1 as bigint, ok CREATE MATERIALIZED VIEW mv_4 AS SELECT p1, SUM(abs(v1)) FROM t1 GROUP BY p1; // invalid expression calculate on aggregate column ```	2023-03-28 10:28:29 +08:00
mch_ucchi	84c6f47e4f	[Feature](Nereids) add WinMagic rule to rewrite scalar sub-query to window function (#17968 ) Reference paper: WinMagic - Subquery Elimination Using Window Aggregation. For SQL like TPC-H Q2 and Q17, which contains a correlated sub-query with only one aggregate function output, we can eliminate the sub-query and transform it into a window function. For example, TPC-H Q17 is ```sql select sum(l_extendedprice) / 7.0 as avg_yearly from lineitem, part where p_partkey = l_partkey and p_brand = 'Brand#23' and p_container = 'MED BOX' and l_quantity < ( select 0.2 * avg(l_quantity) from lineitem where l_partkey = p_partkey ); ``` which we rewrite to ```sql select sum(l_extendedprice) / 7.0 as avg_yearly from ( select l_extendedprice, l_quantity, avg(l_quantity) over(partition by l_partkey) avg_l_quantity from lineitem, part where p_partkey = l_partkey and p_brand = 'Brand#23' and p_container = 'MED BOX' ) where l_quantity < 0.2 * avg_l_quantity ``` For now the rule only handles the case where the conjuncts in the outer scope contain one sub-query and the conjunct containing the sub-query is a comparison predicate; support for compound predicates and for more than one conjunct containing a sub-query will come later.	2023-03-27 23:58:41 +08:00
lexluo09	785e3e3bca	[Enhancement](multi catalog) Support hive meta cache TTL (#18102 ) Currently, if a user modifies a file on HDFS directly rather than through Hive, Doris does not notice the change and the user gets wrong data. Support a TTL (time-to-live) config for the file cache, so that stale file info is invalidated automatically after expiring. 1. Add a configuration parameter to set the file cache TTL: "file.meta.cache.ttl-second". 2. Set the guava expireAfterAccess value to the configured value. Co-authored-by: lexluo <lexluo@tencent.com>	2023-03-27 19:19:31 +08:00
minghong	c8e4684578	[enhancement](nereids)support topN opt in nereids (#17741 ) 1. support topN opt in nereids 2. pushdown limit->proj->sort	2023-03-27 18:57:56 +08:00
mch_ucchi	894f38a517	[fix](planner) fix conjunct planned on exchange node (#18042 ) sql like: select k5, k6, SUM(k3) AS k3 from ( select k5, date_format(k6, '%Y-%m-%d') as k6, count(distinct k3) as k3 from t group by k5, k6 ) AS temp where 1=1 group by k5, k6; will throw exception since conjuncts planned on exchange node, because exchange node cannot handle conjuncts, now we skip exchange node when planning conjuncts, which fixes the bug. notice: the bug occurs iff the conjunct is always true like 1=1 above.	2023-03-27 17:50:52 +08:00
mch_ucchi	902629adb6	[fix](planner) fix targetTypeDef NPE when value is null (#18072 ) sql like: select * from (select *, null as top from v1)t where top = 5; select * from (select *, null as top from v1)t where top is not null; will cause NPE because targetTypeDef is null when value is null. Now we use cast target type to the targetTypeDef.	2023-03-27 17:29:14 +08:00
Gabriel	cd85b5b262	[conf](nereids) disable new cost model since it hurts performance (#18127 )	2023-03-27 16:12:15 +08:00
jakevin	da8c53a831	[feat](Nereids): pushdown semijoin through agg. (#18105 )	2023-03-27 15:27:44 +08:00
Tiewei Fang	642c378fc7	[feature](table-valued-function) add Backends table-valued-function (#17667 ) This pr implement a new Metadata TVF called backends. And the implement process tutorial is in #17974.	2023-03-27 15:18:31 +08:00
AKIRA	1576130094	[enhancement](stats) Tune stats framework (#18118 )	2023-03-27 14:38:10 +08:00
AKIRA	dc7b2015f5	eh (#18122 )	2023-03-27 11:09:35 +08:00
Liqf	bcf95cd920	[feature](function)Add ST_Angle_Sphere function (#17919 )	2023-03-27 10:14:46 +08:00
Mingyu Chen	c2dd005efb	[fix](chore) fix BE compile and FE protoc artifact issue (#18120 ) add <optional> head to solve the compilation issue use 3.12.9 as the protoc.artifact's version, because there is no 3.12.21 See: https://repo.maven.apache.org/maven2/com/google/protobuf/protoc/ Remove --show-progress arguments of wget because it is not supported in low version wget	2023-03-27 08:53:42 +08:00
jakevin	1027dd52ba	[feature](Nereids): Pushdown SemiJoin in RBO. (#18099 )	2023-03-26 20:58:43 +08:00
huanghaibin	304064653c	[feature](log) check and log lock holding time when it exceeds threshold (#17965 ) Sometimes lock competition in DatabaseTransactionMgr is fierce, which may lead to publish timeouts; a log entry should hint at this lock competition.	2023-03-26 20:11:40 +08:00
Lijia Liu	e06c613f9a	[fix](meta) Fix FE trying to repair a tablet which cannot be repaired. #17959	2023-03-26 20:11:14 +08:00
Tiewei Fang	3e8b3d68fc	[BugFix](jdbc catalog) fix OOM when jdbc catalog queries large data from doris #18067 When using JDBC Catalog to query Doris data, because Doris does not provide the cursor reading method (that is, fetchBatchSize is invalid), Doris sends the data to the client all at once, resulting in client OOM. The MySQL protocol provides a stream reading method, and Doris can use this method to avoid OOM. Using the stream method requires setting fetchbatchsize = Integer.MIN_VALUE and setting ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY	2023-03-26 20:02:03 +08:00
ZashJie	2a0890d803	[feature](datatype) add show data types stmt (#18111 )	2023-03-26 12:37:06 +08:00
bin41215	0347ae4dbd	[Enhancement](proc) sort result by backend id when show backends (#18112 )	2023-03-26 11:30:47 +08:00
slothever	c5dcb633e9	[fix](hive)throw exception if complex type in text format table (#18013 ) For Hive text input format: the column types ARRAY/MAP/STRUCT are not supported yet. It will be supported over successive versions. Co-authored-by: jinzhe <jinzhe@selectdb.com>	2023-03-25 23:26:52 +08:00
Mingyu Chen	7c0bcbdca1	[enhance](parquet-reader) cache file meta of parquet to speed up query (#18074 ) Problem: 1. FE splits a parquet file into splits, so a file can have several splits. 2. BE scans each split and reads the footer of the parquet file. 3. If 2 splits belong to the same parquet file, the footer of this file is read twice. This PR mainly changes: 1. Use a kv cache to cache the footer of the parquet file. 2. The kv cache belongs to a scan node, so all parquet readers belonging to this scan node share the same kv cache. 3. In the cache, the key is "meta_file_path" and the value is the parsed thrift footer. The KV cache is sharded into multiple sub caches, so that different files can use different sub caches and avoid blocking each other. In my test, a query with 26 splits reduced the footer parse time from 4s to 1s	2023-03-25 23:22:57 +08:00
gitccl	96f274b8f3	[fix](global-variable) fix bug that set default value for global variable will cause NullPointerException (#18004 )	2023-03-25 22:45:26 +08:00
Yisong Han	df0eca4003	[improvement] (schema change) Lightweight schema change of modify column with varchar length (#17207 ) Signed-off-by: Yisong Han <yisong8686@gmail.com>	2023-03-25 22:38:19 +08:00
abmdocrt	cb6fca95b2	[fix](lambda-func) fix lambda functions exception message errors (#18068 )	2023-03-25 22:36:55 +08:00
ZhangYu0123	360d3050bc	[Feature](array-function) Support array_reverse_sort function (#17754 ) Co-authored-by: zhangyu209 <zhangyu209@meituan.com>	2023-03-25 21:58:11 +08:00
xueweizhang	50eeb2d9a4	[fix](json) change int to bigint for json function (#17769 )	2023-03-25 21:57:29 +08:00
奕冷	855852d582	[enhancement](timeout) fix set timeout failure and simplify timeout logic (#17837 )	2023-03-25 21:56:06 +08:00
jakevin	f9013f2668	[feature](Nereids): pullup all semijoin through join. (#18106 )	2023-03-25 20:25:28 +08:00
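
The parquet footer cache described in commit 7c0bcbdca1 (#18074) above can be illustrated with a minimal Python sketch. It is not the Doris BE implementation (which is C++). The names `ShardedKVCache`, `fake_parse_footer` and `scan_splits` are hypothetical, the `"meta_" + path` key layout is an assumption based on the "meta_file_path" wording in the commit message, and the shard count is arbitrary. It only shows the idea: one cache per scan node, split into independently locked shards, so that several splits of the same file parse the footer once.

```python
import threading
import zlib
from typing import Any, Callable, Dict, List


class ShardedKVCache:
    """Hypothetical per-scan-node cache split into independently locked shards."""

    def __init__(self, num_shards: int = 8) -> None:
        self._shards: List[Dict[str, Any]] = [{} for _ in range(num_shards)]
        self._locks = [threading.Lock() for _ in range(num_shards)]

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        # A stable hash sends every lookup of the same key to the same shard.
        idx = zlib.crc32(key.encode("utf-8")) % len(self._shards)
        # Only this shard is locked, so keys in other shards are not blocked.
        with self._locks[idx]:
            shard = self._shards[idx]
            if key not in shard:
                shard[key] = loader()
            return shard[key]


def fake_parse_footer(path: str, counter: Dict[str, int]) -> Dict[str, str]:
    # Stand-in for reading and parsing a parquet footer; counts how often it runs.
    counter[path] = counter.get(path, 0) + 1
    return {"path": path, "schema": "placeholder"}


def scan_splits(splits: List[str]) -> Dict[str, int]:
    """Scan splits (identified by file path) and return footer parse counts per file."""
    cache = ShardedKVCache(num_shards=4)  # one cache shared by the whole scan node
    parse_counts: Dict[str, int] = {}
    for path in splits:
        key = "meta_" + path  # assumed key layout
        cache.get_or_load(key, lambda p=path: fake_parse_footer(p, parse_counts))
    return parse_counts
```

Calling `scan_splits` with three splits of one file and one split of another parses each footer a single time, which is the saving the commit message reports.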

8289 Commits