Commit Graph

11816 Commits

Author SHA1 Message Date
d4bdd6768c [Feature](Nereids) support select into outfile (#21197) 2023-07-13 17:01:47 +08:00
b72e0d9172 [github](labeler) remove scope labeler (#21789)
The scope labeler is no longer used, so we can remove it.
2023-07-13 16:13:58 +08:00
8a42ba5742 [typo](docs) modify bitmap function document (#21721) 2023-07-13 14:02:10 +08:00
06d129c364 [docs](stats) Update statistics related content #21766
1. Update grammar of `ANALYZE`
2. Add a command description for deleting an analyze job
2023-07-13 13:51:26 +08:00
e167394dc1 [Fix](pipeline) close sink when fragment context destructs (#21668)
Co-authored-by: airborne12 <airborne12@gmail.com>
2023-07-13 11:52:24 +08:00
14253b6a30 [fix](ccr) Add tableName in DropInfo && BatchDropInfo (#21736)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>
2023-07-13 11:47:49 +08:00
9cad929e96 [Fix](rowset) When a rowset is cooled down, it is directly deleted. This can result in data query misses in the second phase of a two-phase query. (#21741)
* [Fix](rowset) When a rowset is cooled down, it is directly deleted. This can result in data query misses in the second phase of a two-phase query.

related pr #20732

There are two reasons for moving the delayed-deletion logic from the Tablet to the StorageEngine. First, it consolidates the logic and unifies the delayed operations. Second, delaying garbage collection during queries can leave rowsets in the "stale rowsets" state, preventing timely deletion of rowset metadata and potentially making the rowset metadata too large.

* not use unused rowsets
2023-07-13 11:46:12 +08:00
f863c653e2 [Fix](Planner) fix limit execute before sort in show export job (#21663)
Problem:
When executing SHOW EXPORT, the limit was applied before the sort, so the result could be unexpected: the limit cut the result set first, and we could not get the rows we wanted.

Example:
Suppose we have export job1 and job2 with JobId1 > JobId2, and we want the job with JobId1:
show export from db order by JobId desc limit 1;
Because limit 1 was applied first, we would probably get job2, since JobIds are assigned from small to large.

Solution:
Do not cut the result set first when there is an ORDER BY clause; cut it after sorting.
2023-07-13 11:17:28 +08:00
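A hedged sketch (plain Python, not Doris code) of the bug described in the commit message above: applying LIMIT before ORDER BY returns the wrong row. The job records and function names are illustrative.

```python
# Illustrative only: two export jobs stored in insertion order
# (JobIds are assigned ascending).
jobs = [{"JobId": 2}, {"JobId": 5}]

def show_export_buggy(jobs, limit):
    cut = jobs[:limit]  # old behavior: cut the result set first ...
    return sorted(cut, key=lambda j: j["JobId"], reverse=True)  # ... then sort

def show_export_fixed(jobs, limit):
    ordered = sorted(jobs, key=lambda j: j["JobId"], reverse=True)  # sort first ...
    return ordered[:limit]  # ... then cut

print(show_export_buggy(jobs, 1))  # [{'JobId': 2}] -- not the latest job
print(show_export_fixed(jobs, 1))  # [{'JobId': 5}] -- the latest job
```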
cf016f210d Revert "[imporve](bloomfilter) refactor runtime_filter_mgr with bloomfilter (#21715)" (#21763)
This reverts commit 925da90480f60afc0e5333a536d41e004234874e.
2023-07-13 10:44:20 +08:00
2d2beb637a [enhancement](RoutineLoad)Mutile table support pipeline load (#21678) 2023-07-13 10:26:46 +08:00
e18465eac7 [feature](TVF) support path partition keys for external file TVF (#21648) 2023-07-13 10:15:55 +08:00
105a162f94 [Enhancement](multi-catalog) Merge hms events every round to speed up events processing. (#21589)
Currently, MetastoreEventsProcessor cannot keep up with the rate at which HMS events are produced in our cluster, so we need to merge some HMS events every round.
2023-07-12 23:41:07 +08:00
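A hedged sketch of the batching idea above: collapse multiple pending HMS events on the same table so each processing round handles fewer items. The merge rule here (last event per table wins) is an illustration, not the actual MetastoreEventsProcessor logic.

```python
def merge_hms_events(events):
    """events: (table, event_type, event_id) tuples in arrival order."""
    merged = {}
    for table, event_type, event_id in events:
        merged[table] = (table, event_type, event_id)  # later event supersedes
    return list(merged.values())

batch = [("t1", "ALTER", 1), ("t2", "ALTER", 2), ("t1", "ALTER", 3)]
print(merge_hms_events(batch))  # [('t1', 'ALTER', 3), ('t2', 'ALTER', 2)]
```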
2e3d15b552 [Feature](doris compose) A tool for setup and manage doris docker cluster scaling easily (#21649) 2023-07-12 22:13:38 +08:00
00c48f7d46 [opt](regression case) add more index change case (#21734) 2023-07-12 21:52:48 +08:00
7f133b7514 [fix](partial-update) transient rowset writer should not trigger segcompaction when build rowset (#21751) 2023-07-12 21:47:07 +08:00
be55cb8dfc [Improve](jsonb_extract) support jsonb_extract multi parse path (#21555)
support jsonb_extract multi parse path
2023-07-12 21:37:36 +08:00
da67d08bca [fix](compile) fix be compile error (#21765)
* [fix](compile) fix be compile error

* remove warning
2023-07-12 21:14:04 +08:00
3163841a3a [FIX](serde)Fix decimal for arrow serde (#21716) 2023-07-12 19:15:48 +08:00
f0d08da97c [enhancement](merge-on-write) split delete bitmap from tablet meta (#21456) 2023-07-12 19:13:36 +08:00
9d96e18614 [fix](multi-table-load) fix memory leak when processing multi-table routine load (#21611)
* use a raw pointer to prevent a reference cycle

* add comments
2023-07-12 17:32:56 +08:00
0243c403f1 [refactor](nereids)set session var for bushy join (#21744)
add session var: MAX_JOIN_NUMBER_BUSHY_TREE, default is 5
if the number of tables in a join cluster is less than MAX_JOIN_NUMBER_BUSHY_TREE, Nereids tries a bushy tree; otherwise, a zigzag tree
2023-07-12 16:40:48 +08:00
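A minimal sketch of the dispatch described in the commit message above, assuming the session variable simply thresholds the join-cluster size; not the actual Nereids planner code.

```python
MAX_JOIN_NUMBER_BUSHY_TREE = 5  # default per this commit message

def choose_join_tree_shape(num_tables):
    # fewer tables than the threshold: try a bushy tree; otherwise zigzag
    return "bushy" if num_tables < MAX_JOIN_NUMBER_BUSHY_TREE else "zigzag"

print(choose_join_tree_shape(4))  # bushy
print(choose_join_tree_shape(6))  # zigzag
```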
3b76428de9 [fix](stats) when some stat is NULL, causing an exception during display stats (#21588)
During manual statistics injection, some statistics may be NULL, causing an exception during display.
2023-07-12 14:57:06 +08:00
d86c67863d Remove unused code (#21735) 2023-07-12 14:48:13 +08:00
a18b345459 [opt](stats)update tbl stats of statistics collection after system statistics collection job succeeded (#21528)
So that if the FE crashed while a system analyze task was running, the system task for the column could be created and run again once the FE recovered
2023-07-12 11:11:50 +08:00
56c2deadb1 [opt](nereids) update CTEConsumer's stats when CTEProducer's stats updated (#21469) 2023-07-12 10:55:40 +08:00
88c719233a [opt](nereids) convert OR expression to IN expression (#21326)
Add a new rule named "OrToIn" that converts a disjunction of equality comparisons on the same slot against literals into an InPredicate, so that it can be pushed down to the storage engine.

for example:

```sql
col1 = 1 or col1 = 2 or col1 = 3 and (col2 = 4)
col1 = 1 and col1 = 3 and col2 = 3 or col2 = 4
(col1 = 1 or col1 = 2) and  (col2 = 3 or col2 = 4)
```

would be converted to 

```sql
col1 in (1, 2) or col1 = 3 and (col2 = 4)
col1 = 1 and col1 = 3 and col2 = 3 or col2 = 4
(col1 in (1, 2) and (col2 in (3, 4)))
```
2023-07-12 10:53:06 +08:00
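A hedged sketch of the "OrToIn" grouping idea from the commit above: collect equality disjuncts on the same slot into one IN list. Real Nereids rules operate on an expression tree; this version only shows the grouping step, with illustrative names.

```python
from collections import defaultdict

def or_to_in(disjuncts):
    """disjuncts: (slot, literal) pairs joined by OR."""
    groups = defaultdict(list)
    for slot, literal in disjuncts:
        groups[slot].append(literal)  # same slot -> one IN list
    return dict(groups)

# col1 = 1 OR col1 = 2 OR col2 = 3  ->  col1 IN (1, 2) OR col2 IN (3)
print(or_to_in([("col1", 1), ("col1", 2), ("col2", 3)]))
```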
ff42cd9b49 [feature](hive)add read of the hive table textfile format array type (#21514) 2023-07-11 22:37:48 +08:00
925da90480 [imporve](bloomfilter) refactor runtime_filter_mgr with bloomfilter (#21715)
Reduced the lock granularity; previously, the entire map was locked.
map(string) --> map(int)
The bloom filter does not need init_with_fixed_length.
2023-07-11 22:35:30 +08:00
4b30485d62 [improvement](memory) Refactor doris cache GC (#21522)
Abstract CachePolicy, which controls the gc of all caches.
Add stale sweep to all lru caches, including page caches, etc.
I0710 18:32:35.729460 2945318 mem_info.cpp:172] End Full GC Free, Memory 3866389992 Bytes. cost(us): 112165339, details: FullGC:
  FreeTopMemoryQuery:
     - CancelCostTime: 1m51s
     - CancelTasksNum: 1
     - FindCostTime: 0.000ns
     - FreedMemory: 2.93 GB
  WorkloadGroup:
  Cache name=DataPageCache:
     - CostTime: 15.283ms
     - FreedEntrys: 9.56K
     - FreedMemory: 691.97 MB
     - PruneAllNumber: 1
     - PruneStaleNumber: 1
2023-07-11 20:21:31 +08:00
ed410034c6 [enhancement](nereids) Sync stats across FE cluster after analyze #21482
Before this PR, if a user connected to a follower and analyzed a table, the stats would not be cached in the follower FE: the ANALYZE stmt is forwarded to the master, and on the follower the stats are still lazily loaded into the cache. After this PR, once the analyze finishes on the master, the master syncs the stats to all followers and updates their stats caches.
Load partition stats to col stats
2023-07-11 20:09:02 +08:00
8ffa21a157 [fix](config) set FE header size limit to 1MB from 10k (#21719)
Enlarge jetty_server_max_http_header_size to avoid Request Header Fields
Too Large error when streamloading to FE.

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-07-11 19:52:14 +08:00
da86d2ff65 [fix](mow) fix flush_single_block core in calc_segment_delete_bitmap (#21619) 2023-07-11 15:56:57 +08:00
d3317aa33b [Fix](executor)Fix scan entity core #21696
After the last call to scan_task.scan_func(), the task should be ended, which means the PipelineFragmentContext could be released.
Once the PipelineFragmentContext is released, visiting its fields such as query_ctx or _state may cause a core dump.
But this can only explain core 2.

```cpp
void ScannerScheduler::_task_group_scanner_scan(ScannerScheduler* scheduler,
                                                taskgroup::ScanTaskTaskGroupQueue* scan_queue) {
    while (!_is_closed) {
        taskgroup::ScanTask scan_task;
        auto success = scan_queue->take(&scan_task);
        if (success) {
            int64_t time_spent = 0;
            {
                SCOPED_RAW_TIMER(&time_spent);
                scan_task.scan_func();
            }
            scan_queue->update_statistics(scan_task, time_spent);
        }
    }
}
```
2023-07-11 15:56:13 +08:00
4cbd99ad9b [pipeline](ckb) trigger new ckb pipeline, even pr id also run (#21661)
* [pipeline](ckb) also trigger new ckb pipeline

* [pipeline](ckb) all pr run ckb pipeline

* change required

---------

Co-authored-by: stephen <hello-stephen@qq.com>
2023-07-11 15:24:26 +08:00
b2c7a4575c [Bug](dynamic table) set all CreateTableStmt from cup parser dynamic table flag false (#21706) 2023-07-11 15:23:27 +08:00
d0eb4d7da3 [Improve](hash-fun)improve nested hash with range #21699
Issue Number: close #xxx

1. When calculating an array hash, the element size is not needed to seed the hash:
hash = HashUtil::zlib_crc_hash(reinterpret_cast<const char*>(&elem_size),
                               sizeof(elem_size), hash);
But we need to be careful with [[], [1]] vs [[1], []]: when an array nests an empty array, we should still make the element size part of the hash seed to keep the two distinct.
2. Use a range for one hash value to avoid a virtual function call in the loop, which doubles the performance. Measured in a unit test with:

column: array[int64]
50 rows, each array holding 100,000 elements
2023-07-11 14:40:40 +08:00
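A sketch of the hashing pitfall described in the commit above: if an empty nested array contributes nothing to the hash, [[], [1]] and [[1], []] collide. Mixing a per-element seed (here, the element size) keeps them distinct. The hash combiner below is illustrative, not Doris's zlib_crc_hash.

```python
def hash_nested(arrays, seed_empty=True):
    h = 0
    for arr in arrays:
        if seed_empty:
            h = (h * 31 + len(arr)) & 0xFFFFFFFF  # seed with element size
        for v in arr:
            h = (h * 31 + hash(v)) & 0xFFFFFFFF
    return h

a, b = [[], [1]], [[1], []]
print(hash_nested(a, seed_empty=False) == hash_nested(b, seed_empty=False))  # True: collision
print(hash_nested(a) == hash_nested(b))  # False: seeding disambiguates
```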
cb69349873 [regression] add bitmap filter p1 regression case (#21591) 2023-07-11 14:27:03 +08:00
Pxl
bb88df3779 [regression-test](agg-state) change set to set global enable_agg_state (#21708)
When there are multiple FEs, we need SET GLOBAL to set the session variable on all of them
2023-07-11 14:15:54 +08:00
7a758f7944 [enhancement](mysql) Add have_query_cache variable to be compatible with old mysql client (#21701) 2023-07-11 14:05:40 +08:00
8d98f2ac7e [fix](errCode) Change the error code of a read-only variable (#21705) 2023-07-11 14:05:18 +08:00
5ed42705d4 [fix](jdbc scan) 1=1 does not translate to TRUE (#21688)
Most database systems recognize `where 1=1` but not `where true`, so we should send the original 1=1 to the database
2023-07-11 14:04:49 +08:00
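A tiny illustration of the portability choice above: when generating a pushed-down WHERE clause for an external JDBC source, keep the `1 = 1` spelling rather than normalizing it to `TRUE`, which some databases reject. The function name is hypothetical, not a Doris API.

```python
def always_true_predicate():
    # portable across most JDBC sources; a bare `true` often is not
    return "1 = 1"

query = "SELECT * FROM t WHERE " + always_true_predicate()
print(query)  # SELECT * FROM t WHERE 1 = 1
```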
d3be10ee58 [improvement](column) Support for the default value of current_timestamp in microsecond (#21487) 2023-07-11 14:04:13 +08:00
Pxl
ca71048f7f [Chore](status) avoid empty error msg on status (#21454)
avoid empty error msg on status
2023-07-11 13:48:16 +08:00
5a15967b65 [fix](sparkdpp) Change spark dpp default version to 1.2-SNAPSHOT (#21698) 2023-07-11 10:49:53 +08:00
8eae31002d [fix](regression)update some case with timediff (#21697)
This PR introduces scale; however, the FE's current constant folding is incomplete, so the exact type cannot be deduced
2023-07-11 09:55:13 +08:00
7b403bff62 [feature](partial update)support insert new rows in non-strict mode partial update with nullable unmentioned columns (#21623)
1. Expand the semantics of the variable strict_mode to control the stream load behavior: if strict_mode is true, the stream load can only update existing rows; if strict_mode is false, it can insert new rows when the key is not present in the table
2. When inserting a new row in a non-strict-mode stream load, the unmentioned columns must have a default value or be nullable
2023-07-11 09:38:56 +08:00
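A hedged sketch of the strict_mode semantics described in the commit above; the function name and return values are illustrative, not Doris APIs.

```python
def partial_update_action(key_exists, strict_mode, unmentioned_cols_have_defaults):
    if key_exists:
        return "update"  # existing row: always allowed
    if strict_mode:
        raise ValueError("strict_mode=true: partial update may only touch existing rows")
    if not unmentioned_cols_have_defaults:
        raise ValueError("unmentioned columns must have defaults or be nullable")
    return "insert"  # non-strict mode may add new rows

print(partial_update_action(key_exists=False, strict_mode=False,
                            unmentioned_cols_have_defaults=True))  # insert
```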
736d6f3b4c [improvement](timezone) support mixed uppper-lower case of timezone names (#21572) 2023-07-11 09:37:14 +08:00
47dd2db292 [doc](fix) storage policy fe conf doc (#21679)
* [doc](fix) storage policy fe conf doc
2023-07-11 09:16:58 +08:00
f87a3ccba2 [fix](runtime_filter) runtime_profile was not initialized in multi_cast_data_stream_source (#21690) 2023-07-11 00:16:29 +08:00
d59c21e594 [test](spill) disable fuzzy spill variables for now (#21677)
We will rewrite this logic, so it is unused for now; stop testing it.
2023-07-10 22:28:41 +08:00