Arrow does not support null elements in a map's key column, but Doris map key columns are nullable by default. So we need to handle the case where a Doris map row has a null element in the key column: in that case we put a null into Arrow.
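A minimal sketch of that rule, assuming Arrow's C++ MapBuilder API and a toy row representation; this is not the actual Doris serializer, and nulling the whole map entry is one plausible reading of the behavior described above:

#include <arrow/api.h>

#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: Arrow forbids null map keys, so if any key in the Doris
// row is null, emit a null map entry instead of appending the null key.
// key_builder/item_builder are the child builders owned by map_builder.
arrow::Status append_map_row(
        arrow::MapBuilder* map_builder, arrow::StringBuilder* key_builder,
        arrow::Int32Builder* item_builder,
        const std::vector<std::pair<std::optional<std::string>, int32_t>>& row) {
    for (const auto& kv : row) {
        if (!kv.first.has_value()) {
            return map_builder->AppendNull();  // the whole map entry becomes null
        }
    }
    ARROW_RETURN_NOT_OK(map_builder->Append());  // start a new map slot
    for (const auto& kv : row) {
        ARROW_RETURN_NOT_OK(key_builder->Append(*kv.first));
        ARROW_RETURN_NOT_OK(item_builder->Append(kv.second));
    }
    return arrow::Status::OK();
}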
In Parquet, min and max statistics may not handle UTF-8 correctly.
The current approach is to use the min_value and max_value statistics introduced by PARQUET-1025 when they are present.
Otherwise, the statistics are ignored for now. A better way would be to read the min and max statistics only when they contain
only ASCII characters. I will improve this in a future PR.
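A sketch of that future improvement (an assumption, not part of this PR): trust the legacy min/max statistics only when every byte is ASCII, because then the byte-wise ordering matches the UTF-8 ordering.

#include <string>

static bool is_all_ascii(const std::string& stat_value) {
    for (unsigned char c : stat_value) {
        if (c >= 0x80) {
            return false;  // non-ASCII byte: legacy min/max ordering may be wrong
        }
    }
    return true;
}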
* process normal bloom filter index and ngram bf index differently
* fix review comments for readability
* add test case
* add test case for delete condition
* [Fix](rowset) When a rowset is cooled down, it is directly deleted. This can result in data query misses in the second phase of a two-phase query.
Related PR: #20732
There are two reasons for moving the delayed-deletion logic from the Tablet to the StorageEngine. The first is to consolidate the logic and unify the delayed operations. The second is that delayed garbage collection during queries can keep rowsets in the "stale rowsets" state, preventing timely deletion of rowset metadata, which may cause the rowset metadata to grow too large.
* do not use unused rowsets
After the last call to scan_task.scan_func(), the scan should be finished, which means the PipelineFragmentContext could be released.
Then, after the PipelineFragmentContext is released, visiting its fields such as query_ctx or _state may cause a core dump.
But this only explains core 2.
void ScannerScheduler::_task_group_scanner_scan(ScannerScheduler* scheduler,
                                                taskgroup::ScanTaskTaskGroupQueue* scan_queue) {
    while (!_is_closed) {
        taskgroup::ScanTask scan_task;
        auto success = scan_queue->take(&scan_task);
        if (success) {
            int64_t time_spent = 0;
            {
                SCOPED_RAW_TIMER(&time_spent);
                // After the last call, the fragment may finish and release its context.
                scan_task.scan_func();
            }
            // If the PipelineFragmentContext was already released by this point,
            // touching state that belongs to it here can core dump.
            scan_queue->update_statistics(scan_task, time_spent);
        }
    }
}
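One generic way to avoid this kind of use-after-free, shown only as a sketch of a common pattern and not necessarily this PR's fix, is to promote a weak reference to the context before running the task, so that any code executed after scan_func() still owns the object it touches (the QueryContext/ScanTask shapes below are simplified stand-ins):

#include <functional>
#include <memory>

struct QueryContext {};  // simplified stand-in for the fragment's context

struct ScanTask {
    std::function<void()> scan_func;
    std::weak_ptr<QueryContext> query_ctx;  // non-owning reference to the context
};

void run_task(ScanTask& scan_task) {
    // Promote the weak_ptr first; if the fragment context is already released,
    // skip the task instead of dereferencing freed state afterwards.
    if (auto ctx = scan_task.query_ctx.lock()) {
        scan_task.scan_func();
        // ctx keeps the context alive here for any post-scan bookkeeping.
    }
}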
Issue Number: close #xxx
When calculating the array hash, the element size does not need to be hashed into the seed:
hash = HashUtil::zlib_crc_hash(reinterpret_cast<const char*>(&elem_size),
sizeof(elem_size), hash);
But we need to be careful with [[], [1]] vs [[1], []]: when an array nests another array and the nested array is empty, we still need to adjust the hash seed so that the two produce different hashes.
2. Use a range for one hash value to avoid a virtual function call per element in the loop,
which doubles the performance. I measured it in a unit test:
column: array[int64]
50 rows, and each array has 100,000 elements. A rough sketch of both points follows.
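The sketch below is illustrative only; it reuses HashUtil::zlib_crc_hash from the snippet above on a plain std::vector model, so the names and layout are assumptions rather than the real Doris column code.

#include <cstdint>
#include <vector>

#include "util/hash_util.hpp"  // HashUtil::zlib_crc_hash (header path is an assumption)

static uint32_t hash_array_of_arrays(const std::vector<std::vector<int64_t>>& rows,
                                     uint32_t hash) {
    for (const auto& nested : rows) {
        // Mix the nested size into the hash so an empty nested array still
        // perturbs it: [[], [1]] and [[1], []] no longer collide.
        uint32_t nested_size = static_cast<uint32_t>(nested.size());
        hash = HashUtil::zlib_crc_hash(reinterpret_cast<const char*>(&nested_size),
                                       sizeof(nested_size), hash);
        if (!nested.empty()) {
            // Hash the whole contiguous range in one call instead of issuing a
            // virtual call per element.
            hash = HashUtil::zlib_crc_hash(reinterpret_cast<const char*>(nested.data()),
                                           nested.size() * sizeof(int64_t), hash);
        }
    }
    return hash;
}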
1. Expand the semantics of the variable strict_mode to control stream load behavior: if strict_mode is true, the stream load can only update existing rows; if strict_mode is false, the stream load can insert new rows when the key is not present in the table.
2. When inserting a new row in a non-strict-mode stream load, the unmentioned columns must have a default value or be nullable (see the sketch below).
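A minimal sketch of rule 2, using a hypothetical ColumnMeta shape rather than the real Doris schema types:

#include <vector>

struct ColumnMeta {
    bool mentioned_in_load;   // column appears in the stream load
    bool nullable;
    bool has_default_value;
};

// In non-strict mode, a brand-new row may be inserted only if every column the
// load does not mention can still be filled with NULL or a default value.
static bool can_insert_new_row(const std::vector<ColumnMeta>& columns) {
    for (const auto& col : columns) {
        if (!col.mentioned_in_load && !col.nullable && !col.has_default_value) {
            return false;
        }
    }
    return true;
}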
Here we calculate the delete bitmaps of all rowsets that are committed but not yet published, to reduce the calculation pressure of the publish phase.
Step 1: collect the delete bitmaps of all of this tablet's committed rowsets.
Step 2: calculate the delete bitmaps of all rowsets that are published during compaction.
Step 3: write back the updated delete bitmap and tablet info.
Note: enable_simdjson_reader should be set to false on master, since master has enable_simdjson_reader=true by default.
Issue Number: close #21389
from rapidjson:
Query String
In addition to GetString(), the Value class also contains GetStringLength(). Here is why:
According to RFC 4627, JSON strings can contain Unicode character U+0000, which must be escaped as "\u0000". The problem is that, C/C++ often uses null-terminated string, which treats \0 as the terminator symbol.
To conform with RFC 4627, RapidJSON supports string containing U+0000 character. If you need to handle this, you can use GetStringLength() to obtain the correct string length.
For example, after parsing the following JSON to Document d:
{ "s" : "a\u0000b" }
The correct length of the string "a\u0000b" is 3, as returned by GetStringLength(). But strlen() returns 1.
GetStringLength() can also improve performance, as the user may often need to call strlen() for allocating a buffer.
Besides, std::string also supports a constructor:
string(const char* s, size_t count);
which accepts the length of the string as a parameter. This constructor supports storing null characters within the string, and should also provide better performance.
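A small self-contained check of the quoted behavior (assuming RapidJSON is available):

#include <cstring>
#include <string>

#include <rapidjson/document.h>

int main() {
    rapidjson::Document d;
    d.Parse(R"({ "s" : "a\u0000b" })");
    const rapidjson::Value& s = d["s"];
    // GetStringLength() sees the embedded U+0000, strlen() stops at it.
    bool ok = s.GetStringLength() == 3 && std::strlen(s.GetString()) == 1;
    // The (ptr, length) constructor keeps the embedded null character.
    std::string full(s.GetString(), s.GetStringLength());
    return (ok && full.size() == 3) ? 0 : 1;
}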
Fix problem:
For the same column, a drop index request and a build index request can run concurrently. If the build index request obtains the lock before the drop index request and builds a new index file, then when the drop index request executes, the linked files do not contain all of the column's index files, so the new index file is missed.
Based on the above, use the index id instead of the column unique id to determine whether a hard link is required when building an index.
Refactor the interface of create_file_reader:
file_size and mtime are merged into FileDescription and are no longer in FileReaderOptions.
Now the file handle cache can get the file's correct modification time from FileDescription.
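A sketch of the reshaped structures as described; beyond the named fields (file_size, mtime), the members are assumptions and the actual Doris definitions contain more:

#include <cstdint>
#include <string>

struct FileDescription {
    std::string path;
    int64_t file_size = -1;  // moved here from FileReaderOptions
    int64_t mtime = 0;       // modification time consumed by the file handle cache
};

struct FileReaderOptions {
    // file_size and mtime no longer live here; only reader behavior options remain.
    bool cache_file_handle = true;  // hypothetical example option
};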
Add HdfsIO for the hdfs file reader.
Picked from [Enhancement](multi-catalog) Add hdfs read statistics profile. #21442