doris

Author	SHA1	Message	Date
airborne12	5d2739b5c5	[Fix](submodule) revert clucene version wrong rollback (#21523 )	2023-07-05 19:10:15 +08:00
Xiangyu Wang	f868aa9d4a	[Enhancement](multi-catalog) Add some checks for ShowPartitionsStmt. (#21446 ) 1. Add some validations for ShowPartitionsStmt with Hive tables 2. Make the behavior consistent with Hive	2023-07-05 16:28:05 +08:00
Xiangyu Wang	0da1bc7acd	[Fix](multi-catalog) Fallback to refresh catalog when hms events are missing (#21333 ) Fix #20227, the implementation has some problems and can not catch event-missing-exception.	2023-07-05 16:27:01 +08:00
Mingyu Chen	242a35fa80	[fix](s3) fix s3 fs benchmark tool (#21401 ) 1. Fix a concurrency bug in the s3 fs benchmark tool to avoid crashes with multiple threads. 2. Add a `prefetch_read` operation to test the prefetch reader. 3. Add the `AWS_EC2_METADATA_DISABLED` env in `start_be.sh` to avoid calling EC2 metadata when creating the s3 client. 4. Add the `AWS_MAX_ATTEMPTS` env in `start_be.sh` to avoid warning logs from the s3 sdk.	2023-07-05 16:20:58 +08:00
HappenLee	39590f95b0	[pipeline](load) return error status in pipeline load (#21303 )	2023-07-05 16:13:32 +08:00
Jibing-Li	37a52789bd	[improvement](statistics, multi catalog)Estimate hive table row count based on file size. (#21207 ) Support estimating table row count based on file size. With sample size=3000 (total partition number is 87491), load cache time is 45s. With sample size=100000 (more than total partition number 87505), load cache time is 388s.	2023-07-05 16:07:12 +08:00
jakevin	1121e7d0c3	[feature](Nereids): pushdown distinct through join. (#21437 )	2023-07-05 15:55:21 +08:00
morrySnow	4d414c649a	[fix](Nereids) set operation physical properties derive is wrong (#21496 )	2023-07-05 15:44:40 +08:00
abmdocrt	d8a549fe61	[Fix](Comment) Comment should be in English (#20964 )	2023-07-05 15:41:34 +08:00
abmdocrt	48bfb8e9cf	[Enhancement](regression-test)Add regression test for MoW backup and restore (#21223 )	2023-07-05 15:16:04 +08:00
Xinyi Zou	38c8657e5e	[improve](memory) more grace logging for memory exceed limit (#21311 ) More graceful logging for Allocator and MemTracker when memory exceeds the limit. Fix bthread graceful exit.	2023-07-05 14:59:06 +08:00
xzj7019	f9bc433917	[fix](nereids) fix runtime filter expr order (#21480 ) Currently, when pushing a runtime filter down into a CTE, we construct the runtime filter expr_order with an incrementing number, which is not correct. For CTE-internal runtime filter pushdown, the join node is always different, so expr_order should be fixed at 0 without incrementing; otherwise the check of expr_order against probe_expr_size becomes illegal or the query result is wrong. This PR temporarily reverts 2827bc1; it will break the CTE runtime filter pushdown plan pattern.	2023-07-05 14:27:35 +08:00
Pxl	f02bec8ad1	[Chore](runtime filter) change runtime filter dcheck to error status or exception (#21475 )	2023-07-05 14:03:55 +08:00
catpineapple	d3eeb233c8	[fix](dbt) dbt getconfig array or string (#21345 ) {{ config(unique_key='id') }} {{ config(unique_key=['id','name']) }} Following dbt convention, use a string for a single column name and an array for multiple columns.	2023-07-05 11:42:38 +08:00
catpineapple	e510e6b0a6	[fix](dbt) dbt-doris match dbt-core==1.5 (#21392 ) dbt-doris==0.2 matches dbt-core==1.3 or older; subsequent dbt-doris versions match dbt-core==1.4, 1.5	2023-07-05 11:42:19 +08:00
catpineapple	c9c183e498	[fix](dbt) dbt seed config read (#21492 )	2023-07-05 11:41:59 +08:00
Ashin Gau	0084b9fd9a	[fix](hudi) scala can't call Properties.putAll in jdk11 (#21494 )	2023-07-05 10:53:09 +08:00
starocean999	de5cfe34bf	[fix](feut)should not create a DeriveStatsJob in fe ut (#21498 )	2023-07-05 10:38:09 +08:00
DeadlineFen	15ec191a77	[Fix](CCR) Use tableId as the credential for CCR syncer instead of tableName (#21466 )	2023-07-05 10:16:09 +08:00
DeadlineFen	93795442a4	[Fix](CCR) Binlog config is missed when create replica task (#21397 )	2023-07-05 10:15:13 +08:00
DeadlineFen	0469c02202	[Test](regression) Temporarily disable quickTest for SHOW CREATE TABLE to adapt to enable_feature_binlog=true (#21247 )	2023-07-05 10:12:02 +08:00
zhangstar333	122f5f6c2d	[enchanment](udf) add more info when download jar package failed (#21440 ) When downloading a jar package, the checksum sometimes does not match, but the root cause is unknown. Now add more error messages when the download fails.	2023-07-04 20:35:35 +08:00
Xinyi Zou	3b73604f74	[fix](memory) fix jemalloc purge arena dirty pages core dump (#21486 ) jemalloc/jemalloc#2470 Occasional core dump during stress test.	2023-07-04 20:35:13 +08:00
Mryange	81ee4d7402	[performance](group_concat) avoid extra copy in group_concat (#21432 )	2023-07-04 20:21:44 +08:00
Luzhijing	8c2963961f	[docs](releasenote) 2.0 beta release note (#21457 )	2023-07-04 19:02:18 +08:00
zy-kkk	f498beed07	[improvement](jdbc)Support for automatically obtaining the precision of the trino/presto timestamp type (#21386 )	2023-07-04 18:59:42 +08:00
zy-kkk	aec5bac498	[improvement](jdbc)Support for automatically obtaining the precision of the hana timestamp type (#21380 )	2023-07-04 18:59:21 +08:00
zy-kkk	b27fa70558	[fix](jdbc) fix presto jdbc catalog pushDown and nameFormat (#21447 )	2023-07-04 18:58:33 +08:00
zy-kkk	be406a1696	[typo](docs) fix presto jdbc catalog docs (#21445 )	2023-07-04 18:24:58 +08:00
YueW	899f7fbfeb	[fix](regression case) fix variable scope bug in some inverted index regression cases (#21194 )	2023-07-04 18:05:46 +08:00
AKIRA	9d997b9349	[revert](nereids) Revert data size agg (#21216 ) To make stats derivation more precise	2023-07-04 18:02:15 +08:00
jakevin	1b86e658fd	[fix](Nereids): decrease the memo GroupExpression of limits (#21354 )	2023-07-04 17:15:41 +08:00
Mingyu Chen	13fb69550a	[improvement](kerberos) disable hdfs fs handle cache to renew kerberos ticket at fix interval (#21265 ) Add a new BE config `kerberos_ticket_lifetime_seconds`, default is 86400. It is best set to the same value as `ticket_lifetime` in `krb5.conf`. If an HDFS fs handle in the cache lives longer than HALF of this time, it is marked invalid and recreated, and the Kerberos ticket is renewed.	2023-07-04 17:13:34 +08:00
Mingyu Chen	c2b483529c	[fix](heartbeat) need to set backend status base on edit log (#21410 ) A non-master FE must set a Backend's status based on the content of the edit log. There is a bug: if the FE config `max_backend_heartbeat_failure_tolerance_count` is set larger than one, the non-master FE will not mark the Backend as dead until it has received enough heartbeat edit logs, which is wrong. As a result, the Backend is dead on the Master FE but still alive on the non-master FE.	2023-07-04 17:12:53 +08:00
Ashin Gau	9adbca685a	[opt](hudi) use spark bundle to read hudi data (#21260 ) Use spark-bundle instead of hive-bundle to read hudi data. Advantages of using spark-bundle to read hudi data: 1. The performance of spark-bundle is more than twice that of hive-bundle. 2. spark-bundle uses `UnsafeRow`, which can reduce data copying and GC time of the JVM. 3. spark-bundle supports `Time Travel`, `Incremental Read`, and `Schema Change`; these functions can be quickly ported to Doris. Disadvantages of using spark-bundle to read hudi data: 1. More dependencies make hudi-dependency.jar very cumbersome (from 138M -> 300M). 2. spark-bundle only provides the `RDD` interface and cannot be used directly.	2023-07-04 17:04:49 +08:00
morrySnow	90dd8716ed	[refactor](multicast) change the way multicast do filter, project and shuffle (#21412 ) Co-authored-by: Jerry Hu <mrhhsg@gmail.com> 1. Filtering is done at the sending end rather than the receiving end 2. Projection is done at the sending end rather than the receiving end 3. Each sender can use different shuffle policies to send data	2023-07-04 16:51:07 +08:00
hqx871	09f414e0f4	fix lru cache handle field order (#21435 ) For LRUHandle, all fields should be put ahead of key_data. The LRUHandle is allocated with malloc, and the memory starting at field key_data holds the key data.	2023-07-04 16:10:05 +08:00
jakevin	9e8501f191	[Performance](Nereids): speedup analyze by removing sort()/addAll() in OptimizeGroupExpressionJob to (#21452 ) Calling sort() and addAll() on all rules costs much time and is useless; remove them to speed up. explain tpcds q72: 1.72s -> 1.46s	2023-07-04 16:01:54 +08:00
Huang Haijun	890e55b604	[typo](docs)Delete unsupported sql statements in GROUP_CONCAT() (#21455 )	2023-07-04 14:46:49 +08:00
Pxl	65cb91e60e	[Chore](agg-state) add sessionvariable enable_agg_state (#21373 )	2023-07-04 14:25:21 +08:00
Kang	9477436524	[fix](test) add def keyword to define local variable success (#21206 )	2023-07-04 14:24:37 +08:00
Jerry Hu	b5da3f74f5	[improvement](join) avoid unnecessary copying in _build_output_block (#21360 ) If the source columns are mutually exclusive within a temporary block, there is no need to duplicate the data.	2023-07-04 12:13:49 +08:00
Kaijie Chen	cac465472a	[chore](tools) add submodules in .idea/vcs.xml (#21383 )	2023-07-04 11:44:09 +08:00
Calvin Kirs	e4c0a0ac24	[improve](dependency)Upgrade dependency version (#21431 ) Exclude old netty version; upgrade spring-boot version to 2.7.13; use ojdbc8 to replace ojdbc6; upgrade jackson version to 2.15.2; upgrade fabric8 version to 6.7.2	2023-07-04 11:29:21 +08:00
Xinyi Zou	b86dd11a7d	[fix](pipeline) refactor olap table sink close (#20771 ) For pipeline, olap table sink close is divided into three stages: try_close() --> pending_finish() --> close(). Only after all node channels are done or canceled will pending_finish() return false and close() start. This avoids blocking the pipeline on close(). In close, check the index channel's intolerable failure status after each node channel failure; if intolerable failure is true, the close is terminated early and all node channels are canceled to avoid meaningless blocking.	2023-07-04 11:27:51 +08:00
zhangdong	8cbc1d58e1	[fix](MTMV) Disable partition specification temporarily (#20793 ) The syntax for supporting partition updates in the future has not been investigated yet and there are issues with partition syntax. Therefore, the partition syntax has been temporarily removed in the current version and will be added after future research.	2023-07-04 11:09:04 +08:00
jakevin	d5f39a6e54	[Performance](Nereids) refactor code speedup analyze (#21458 ) refactor those code which cost much time.	2023-07-04 10:59:07 +08:00
starocean999	599ba4529c	[fix](nereids) need run ConvertInnerOrCrossJoin rule again after EliminateNotNull (#21346 ) After running the EliminateNotNull rule, the join conjuncts may be removed from an inner join node, so the ConvertInnerOrCrossJoin rule needs to run again to convert an inner join with no join conjuncts into a cross join node.	2023-07-04 10:52:36 +08:00
Kaijie Chen	b1c16b96d6	[refactor](load) move validator out of VOlapTableSink (#21460 )	2023-07-04 10:16:56 +08:00
TengJianPing	938c0765cd	[improvement](memory) improve inserting sparse rows into string column (#21420 ) For the following test, which simulates a hash join outputting 435699854 rows from 5131 build rows: { auto col = doris::vectorized::ColumnString::create(); constexpr int build_rows = 5131; constexpr int output_rows = 435699854; std::string str("01234567"); for (int i = 0; i < build_rows; ++i) { col->insert_data(str.data(), str.size()); } int indices[output_rows]; for (int i = 0; i < output_rows; ++i) { indices[i] = i % build_rows; } auto col2 = doris::vectorized::ColumnString::create(); doris::MonotonicStopWatch watch; watch.start(); col2->insert_indices_from(*col, indices, indices + output_rows); watch.stop(); LOG(WARNING) << "string column insert_indices_from, rows: " << output_rows << ", time: " << doris::PrettyPrinter::print(watch.elapsed_time(), doris::TUnit::TIME_NS); } The ColumnString::insert_indices_from insert time improves from 6s665ms to 3s158ms: W0702 23:08:39.672044 1277989 doris_main.cpp:545] string column insert_indices_from, rows: 435699854, time: 3s153ms W0702 23:09:36.368853 1282061 doris_main.cpp:545] string column insert_indices_from, rows: 435699854, time: 3s158ms W0703 00:30:26.093307 1468640 doris_main.cpp:545] string column insert_indices_from, rows: 435699854, time: 6s761ms W0703 00:31:21.043638 1472937 doris_main.cpp:545] string column insert_indices_from, rows: 435699854, time: 6s665ms	2023-07-04 09:34:10 +08:00
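
Note on commit 09f414e0f4 (lru cache handle field order): the message says every LRUHandle field must sit ahead of key_data because the handle is malloc'd and the key bytes are stored starting at key_data. The sketch below illustrates that layout rule only. It is not the Doris source: the struct `Handle`, its field names, and the helpers `make_handle` and `handle_key` are hypothetical, and the allocation slightly over-allocates for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical, simplified handle (NOT the real Doris LRUHandle).
// All fixed-size fields come first; key_data is the last member, so key
// bytes written from key_data onward cannot overwrite any other field.
struct Handle {
    void* value;
    std::size_t charge;
    std::size_t key_length;
    std::uint32_t hash;
    std::uint32_t refs;
    char key_data[1];  // start of the variable-length key storage
};

// Allocate one block holding the fixed fields followed by the key bytes.
// sizeof(Handle) + key.size() over-allocates by a few bytes, which keeps
// the sketch simple and always large enough.
inline Handle* make_handle(const std::string& key, std::uint32_t hash) {
    void* mem = std::malloc(sizeof(Handle) + key.size());
    if (mem == nullptr) {
        return nullptr;
    }
    Handle* h = static_cast<Handle*>(mem);
    h->value = nullptr;
    h->charge = 0;
    h->key_length = key.size();
    h->hash = hash;
    h->refs = 1;
    // Copy the key into the tail of the allocation, starting at key_data.
    char* dst = reinterpret_cast<char*>(h) + offsetof(Handle, key_data);
    std::memcpy(dst, key.data(), key.size());
    return h;
}

// Read the key back from the tail of the allocation.
inline std::string handle_key(const Handle* h) {
    assert(h != nullptr);
    const char* src = reinterpret_cast<const char*>(h) + offsetof(Handle, key_data);
    return std::string(src, h->key_length);
}
```

A caller would create a handle with `make_handle("abc", 7)`, read the key with `handle_key`, and release it with `std::free`. If a field were declared after key_data, copying a key longer than one byte would overwrite it, which is the class of bug the commit message describes.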

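Note on commit 13fb69550a (kerberos): the message states that a cached HDFS fs handle is invalidated once it has lived longer than half of `kerberos_ticket_lifetime_seconds` (default 86400). A minimal sketch of that expiry check follows, assuming timestamps are plain seconds. The function `is_handle_expired` and its parameters are hypothetical names for illustration, not Doris APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not Doris code): returns true when a cached handle
// has lived longer than HALF of the configured ticket lifetime and should
// therefore be dropped and recreated.
// ticket_lifetime_seconds mirrors the BE config
// kerberos_ticket_lifetime_seconds, whose default is 86400.
inline bool is_handle_expired(std::int64_t created_at_seconds,
                              std::int64_t now_seconds,
                              std::int64_t ticket_lifetime_seconds = 86400) {
    assert(ticket_lifetime_seconds > 0);
    const std::int64_t age_seconds = now_seconds - created_at_seconds;
    return age_seconds > ticket_lifetime_seconds / 2;
}
```

With the default lifetime, a handle created at time 0 is still valid at 43200 seconds and counts as expired from 43201 seconds onward.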
11677 Commits