doris

Author	SHA1	Message	Date
LiBinfeng	0c3acfdb7c	[Fix](planner) Set inline view output as non constant after analyze (#21212 ) Problem: Select list should be non const when from list have tables or multiple tuples. Or upper query will regard wrong of isConstant And make wrong constant folding For example： when using nullif funtion with subquery which result in two alternative constant, planner would treat it as constant expr. So analyzer would report an error of order by clause can not be constant Solusion: Change inline view output to non constant, because (select 1 a from table) as view , a in output is no constant when we see view.a outside	2023-07-06 15:37:43 +08:00
LiBinfeng	068fe44493	[feature](profile) Add important time of legacy planner to profile (#20602 ) Add important time in planning process. Add time points of: // Join reorder end time queryJoinReorderFinishTime means time after analyze and before join reorder // Create single node plan end time queryCreateSingleNodeFinishTime means time after join reorder and before finish create single node plan // Create distribute plan end time queryDistributedFinishTime means time after create single node plan and before finish create distributed node plan	2023-07-06 15:36:25 +08:00
Xiangyu Wang	bb3b6770b5	[Enhancement](multi-catalog) Make meta cache batch loading concurrently. (#21471 ) I will enhance performance about querying meta cache of hms tables by 2 steps: Step1 : use concurrent batch loading for meta cache Step2 : execute some other tasks concurrently as soon as possible This pr mainly for step1 and it mainly do the following things: - Create a `CacheBulkLoader` for batch loading - Remove the executor of the previous async cache loader and change the loader's type to `CacheBulkLoader` (We do not set any refresh strategies for LoadingCache, so the previous executor is not useful) - Use a `FixedCacheThreadPool` to replace the `CacheThreadPool` (The previous `CacheThreadPool` just log warn infos and will not throw any exceptions when the pool is full). - Remove parallel streams and use the `CacheBulkLoader` to do batch loadings - Change the value of `max_external_cache_loader_thread_pool_size` to 64, and set the pool size of hms client pool to `max_external_cache_loader_thread_pool_size` - Fix the spelling mistake for `max_hive_table_catch_num`	2023-07-06 15:18:30 +08:00
jakevin	8839518bfb	[Performance](Nereids): add withGroupExprLogicalPropChildren to reduce new Plan (#21477 )	2023-07-06 14:10:31 +08:00
lihangyu	013bfc6a06	[Bug](row store) Fix column aggregate info lost when table is unique model (#21506 )	2023-07-06 12:06:22 +08:00
Siyang Tang	b1be59c799	[enhancement](query) enable strong consistency by syncing max journal id from master (#21205 ) Add a session var & config enable_strong_consistency_read to solve the problem that loading result may be shortly invisible to follwers, to meet users requirements in strong consistency read scenario. Will sync max journal id from master and wait for replaying.	2023-07-06 10:25:38 +08:00
Mingyu Chen	c1e82ce817	[fix](backup) fix show snapshot cauing mysql connection lost (#21520 ) If this is no `info file` in repository, the mysql connection may lost when user executing `show snapshot on repo`, ``` 2023-07-05 09:22:48,689 WARN (mysql-nio-pool-0\|199) [ReadListener.lambda$handleEvent$0():60] Exception happened in one session(org.apache.doris.qe.ConnectContext@730797c1). java.io.IOException: Error happened when receiving packet. at org.apache.doris.qe.ConnectProcessor.processOnce(ConnectProcessor.java:691) ~[doris-fe.jar:1.2-SNAPSHOT] at org.apache.doris.mysql.ReadListener.lambda$handleEvent$0(ReadListener.java:52) ~[doris-fe.jar:1.2-SNAPSHOT] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_322] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_322] at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_322] ``` This is because there are some field missing in returned result set.	2023-07-05 22:44:57 +08:00
Xiangyu Wang	b6a5afa87d	[Feature](multi-catalog) support query hive-view for nereids planner. (#21419 ) Relevant pr #18815, support query hive views for nereids planner.	2023-07-05 21:58:03 +08:00
jakevin	b3db904847	[fix](Nereids): when child is Aggregate, don't infer Distinct for it (#21519 )	2023-07-05 19:39:41 +08:00
Xiangyu Wang	f868aa9d4a	[Enhancement](multi-catalog) Add some checks for ShowPartitionsStmt. (#21446 ) 1. Add some validations for ShowPartitionsStmt with hive tables 2. Make the behavior consistently with hive	2023-07-05 16:28:05 +08:00
Xiangyu Wang	0da1bc7acd	[Fix](multi-catalog) Fallback to refresh catalog when hms events are missing (#21333 ) Fix #20227, the implementation has some problems and can not catch event-missing-exception.	2023-07-05 16:27:01 +08:00
Jibing-Li	37a52789bd	[improvement](statistics, multi catalog)Estimate hive table row count based on file size. (#21207 ) Support estimate table row count based on file size. With sample size=3000 (total partition number is 87491), load cache time is 45s. With sample size=100000 (more than total partition number 87505), load cache time is 388s.	2023-07-05 16:07:12 +08:00
jakevin	1121e7d0c3	[feature](Nereids): pushdown distinct through join. (#21437 )	2023-07-05 15:55:21 +08:00
morrySnow	4d414c649a	[fix](Nereids) set operation physical properties derive is wrong (#21496 )	2023-07-05 15:44:40 +08:00
xzj7019	f9bc433917	[fix](nereids) fix runtime filter expr order (#21480 ) Current runtime filter pushing down to cte internal, we construct the runtime filter expr_order with incremental number, which is not correct. For cte internal rf pushing down, the join node will be always different, the expr_order should be fixed as 0 without incrementation, otherwise, it will lead the checking for expr_order and probe_expr_size illegal or wrong query result. This pr will revert 2827bc1 temporarily, it will break the cte rf pushing down plan pattern.	2023-07-05 14:27:35 +08:00
starocean999	de5cfe34bf	[fix](feut)should not create a DeriveStatsJob in fe ut (#21498 )	2023-07-05 10:38:09 +08:00
DeadlineFen	15ec191a77	[Fix](CCR) Use tableId as the credential for CCR syncer instead of tableName (#21466 )	2023-07-05 10:16:09 +08:00
DeadlineFen	93795442a4	[Fix](CCR) Binlog config is missed when create replica task (#21397 )	2023-07-05 10:15:13 +08:00
zy-kkk	f498beed07	[improvement](jdbc)Support for automatically obtaining the precision of the trino/presto timestamp type (#21386 )	2023-07-04 18:59:42 +08:00
zy-kkk	aec5bac498	[improvement](jdbc)Support for automatically obtaining the precision of the hana timestamp type (#21380 )	2023-07-04 18:59:21 +08:00
zy-kkk	b27fa70558	[fix](jdbc) fix presto jdbc catalog pushDown and nameFormat (#21447 )	2023-07-04 18:58:33 +08:00
AKIRA	9d997b9349	[revert](nereids) Revert data size agg (#21216 ) To make stats derivation more precise	2023-07-04 18:02:15 +08:00
jakevin	1b86e658fd	[fix](Nereids): decrease the memo GroupExpression of limits (#21354 )	2023-07-04 17:15:41 +08:00
Mingyu Chen	c2b483529c	[fix](heartbeat) need to set backend status base on edit log (#21410 ) For non-master FE, must set Backend's status based on the content of edit log. There is a bug that if we set fe config: `max_backend_heartbeat_failure_tolerance_count` larger that one, the non-master FE will not set Backend as dead until it receive enough number of heartbeat edit log, which is wrong. This will causing the Backend is dead on Master FE, but is alive on non-master FE	2023-07-04 17:12:53 +08:00
Ashin Gau	9adbca685a	[opt](hudi) use spark bundle to read hudi data (#21260 ) Use spark-bundle to read hudi data instead of using hive-bundle to read hudi data. Advantage for using spark-bundle to read hudi data: 1. The performance of spark-bundle is more than twice that of hive-bundle 2. spark-bundle using `UnsafeRow` can reduce data copying and GC time of the jvm 3. spark-bundle support `Time Travel`, `Incremental Read`, and `Schema Change`, these functions can be quickly ported to Doris Disadvantage for using spark-bundle to read hudi data: 1. More dependencies make hudi-dependency.jar very cumbersome(from 138M -> 300M) 2. spark-bundle only provides `RDD` interface and cannot be used directly	2023-07-04 17:04:49 +08:00
morrySnow	90dd8716ed	[refactor](multicast) change the way multicast do filter, project and shuffle (#21412 ) Co-authored-by: Jerry Hu <mrhhsg@gmail.com> 1. Filtering is done at the sending end rather than the receiving end 2. Projection is done at the sending end rather than the receiving end 3. Each sender can use different shuffle policies to send data	2023-07-04 16:51:07 +08:00
jakevin	9e8501f191	[Performance](Nereids): speedup analyze by removing sort()/addAll() in OptimizeGroupExpressionJob to (#21452 ) sort() and allAll() all rules will cost much time and it's useless action, remove them to speed up. explain tpcds q72: 1.72s -> 1.46s	2023-07-04 16:01:54 +08:00
Pxl	65cb91e60e	[Chore](agg-state) add sessionvariable enable_agg_state (#21373 ) add sessionvariable enable_agg_state	2023-07-04 14:25:21 +08:00
zhangdong	8cbc1d58e1	[fix](MTMV) Disable partition specification temporarily (#20793 ) The syntax for supporting partition updates in the future has not been investigated yet and there are issues with partition syntax. Therefore, the partition syntax has been temporarily removed in the current version and will be added after future research.	2023-07-04 11:09:04 +08:00
jakevin	d5f39a6e54	[Performance](Nereids) refactor code speedup analyze (#21458 ) refactor those code which cost much time.	2023-07-04 10:59:07 +08:00
starocean999	599ba4529c	[fix](nereids) need run ConvertInnerOrCrossJoin rule again after EliminateNotNull (#21346 ) after running EliminateNotNull rule, the join conjuncts may be removed from inner join node. So need run ConvertInnerOrCrossJoin rule to convert inner join with no join conjuncts to cross join node.	2023-07-04 10:52:36 +08:00
Xiangyu Wang	11e18f4c98	[Fix](multi-catalog) fix NPE for FileCacheValue. (#21441 ) FileCacheValue.files may be null if there is not any files exists for some partitions.	2023-07-03 23:38:58 +08:00
starocean999	63b170251e	[fix](nereids)cast filter and join conjunct's return type to boolean (#21434 )	2023-07-03 17:22:46 +08:00
Qi Chen	f80df20b6f	[Fix](multi-catalog) Fix read error in mixed partition locations. (#21399 ) Issue Number: close #20948 Fix read error in mixed partition locations(for example, some partitions locations are on s3, other are on hdfs) by `getLocationType` of file split level instead of the table level.	2023-07-03 15:14:17 +08:00
jakevin	9fa2dac352	[fix](Nereids): DefaultPlanRewriter visit plan children. (#21395 )	2023-07-03 13:20:01 +08:00
minghong	17af099dc3	[fix](nereids)miss group id in explain plan #21402 after we introduce "PushdownFilterThroughProject" post processor, some plan node missed their groupExpression (withChildren function will remove groupExpression). this is not good for debug, since it takes more time to find the owner group of a plan node This pr record the missing owner group id in plan node mutableState.	2023-07-03 13:16:33 +08:00
minghong	2827bc1a39	[Fix](nereids) fix a bug in ColumnStatistics.numNulls update #21220 no impact on tpch has impact on tpcds 95, before 1.63 sec, after 1.30 sec	2023-07-03 10:51:23 +08:00
Pxl	59c1bbd163	[Feature](materialized view) support query match mv with agg_state on nereids planner (#21067 ) * support create mv contain aggstate column * update * update * update * support query match mv with agg_state on nereids planner update * update * update	2023-07-03 10:19:31 +08:00
Qi Chen	124516c1ea	[Fix](orc-reader) Fix `Wrong data type for column` error when column order in hive table is not same in orc file schema. (#21306 ) `Wrong data type for column` error when column order in hive table is not same in orc file schema. The root cause is in order to handle the following case: The table in orc format of Hive 1.x may encounter system column names such as `_col0`, `_col1`, `_col2`... in the underlying orc file schema, which need to use the column names in the hive table for mapping. ### Solution Currently fix this issue by handling the following case by specifying hive version to 1.x.x in the hive catalog configuration. ```sql CREATE CATALOG hive PROPERTIES ( 'hive.version' = '1.x.x' ); ```	2023-07-03 09:32:55 +08:00
slothever	f5af735fa6	[fix](multi-catalog)fix obj file cache and dlf iceberg catalog (#21238 ) 1. fix storage prefix for obj file cache: oss/cos/obs don't need convert to s3 prefix , just convert when create split 2. dlf iceberg catalog: support dlf iceberg table, use s3 file io.	2023-07-02 21:08:41 +08:00
xy720	f74e635aa5	[bug](proc) fix NumberFormatException in show proc '/current_queries' (#21400 ) If the current query is running for a very long time, the ExecTime of this query may larger than the MAX_INT value, then a NumberFormatException will be thrown when execute "show proc '/current_queries'." The query's ExecTime is long type, we should not use 'Integer.parseInt' to parse it.	2023-07-01 17:42:46 +08:00
Mingyu Chen	887d33c789	[fix](cup) add keywords KW_PERCENT (#21404 ) Or it may cause some edit log replay error, like parsing create routine load stmt, which contains this keyword as a column name	2023-07-01 16:53:54 +08:00
htyoung	603f4ab20f	[fix](truncate) it will directly return and avoid throwing IllegalStateException caused by bufferSize equals zero when table has no partition (#21378 ) if table currently has no partition, the truncate SQL will be a empty command, it should directly return and avoid throwing IllegalStateException caused by bufferSize equals zero Issue Number: close #21316 Co-authored-by: tongyang.han <tongyang.han@jiduauto.com>	2023-07-01 08:39:38 +08:00
Ashin Gau	0e17cd4d92	[fix](hudi) use hudi api to split the COW table (#21385 ) Fix tow bugs: COW & Read Optimized table will use hive splitter to split files, but it can't recognize some specific files. ERROR 1105 (HY000): errCode = 2, detailMessage = (172.21.0.101)[CORRUPTION]Invalid magic number in parquet file, bytes read: 3035, file size: 3035, path: /usr/hive/warehouse/hudi.db/test/.hoodie/metadata/.hoodie/00000000000000.deltacommit.inflight, read magic: The read optimized table created by spark will add empty partition even if the table has no partition, so we have to filter these empty partition keys in hive client. \| test_ro \| CREATE TABLE `test_ro`( `_hoodie_commit_time` string COMMENT '', ... `ts` bigint COMMENT '') PARTITIONED BY ( `` string) ROW FORMAT SERDE	2023-07-01 08:35:33 +08:00
zzzzzzzs	96aa0e5876	[fix](tvf) To fix the bug that requires adding backticks on "frontends()" in order to query the frontends TVF. (#21338 )	2023-06-30 22:37:21 +08:00
Jibing-Li	ed2cd4974e	[fix](nereids) to_date should return type datev2 for datetimev2 (#21375 ) To_date function in nereids return type should be DATEV2 if the arg type is DATETIMEV2. Before the return type was DATE which would cause BE get wrong query result.	2023-06-30 21:42:59 +08:00
jakevin	18b7d84436	[fix](Nereids): reject infer distinct when children exist NLJ (#21391 )	2023-06-30 20:29:48 +08:00
xzj7019	4117f0b93b	[improve](nereids) Support outer rf into inner left outer join (#21368 ) Support rf into left outer join from outside allowed type join. Before this pr, some join type, such as full outer join, are all not allowed to do rf pushing. For example, (a left join b on a.id = b.id) inner join c on a.id2 = c.id2, will lost the rf pushing from c.id2 to inner table a. This pr will open this limitation for supporting rf into left outer join from outside allowed type join.	2023-06-30 19:07:39 +08:00
xzj7019	164448dac3	[fix](nereids) fix rf info missing for set op (#21367 ) During physical set operation translation, we forget to inherit rf related info from set op children, which will lead the merge filter error and get a long waittime.	2023-06-30 18:50:29 +08:00
Pxl	88cbea2b56	[Bug](agg-state) fix core dump on not nullable argument for aggstate's nested argument (#21331 ) fix core dump on not nullable argument for aggstate's nested argument	2023-06-30 18:20:25 +08:00

1 2 3 4 5 ...

3927 Commits