doris

Author	SHA1	Message	Date
YueW	d2531db1cf	[fix](inverted index) fix regression case test_index_change_7 occasional failure (#22066 )	2023-07-24 15:39:08 +08:00
TengJianPing	99bf901607	[fix](in) throw exception for unsupported data type of in expr (#22050 )	2023-07-24 14:13:31 +08:00
Pxl	19ba6bec38	[Improvement](pipeline) support send eos on local exchange and remove some unused code (#22086 )	2023-07-24 09:25:32 +08:00
Chenyang Sun	0396ac9d38	fix(compaction) release the block and segment iterator after reading to the end of the segment file (#22082 ) When reading to the end of the segment file, clearing the block did not release the memory, leading to high memory usage during compaction. When reading through segment file for columns that are dictionary encoded, the column iterator in the segment iterator will hold the dictionary. Release the segment iterator to free up the dictionary.	2023-07-24 08:47:19 +08:00
Liqf	ddd7e9871d	[improvement](Jsonb) optimization Jsonb path parse (#21495 ) The previous logic read the jsonb value while parsing the JSON path, so complex JSON paths caused a lot of repeated parsing work. The optimization separates parsing the JSON path from reading the value.	2023-07-23 18:59:12 +08:00
yiguolei	2c16fe0da9	[bugfix](runtimefilter) runtime filter is shared between multi instances with same node id, should not cache exprs (#22114 ) A runtime filter is shared among multiple instances. In the past, we cached the pushdown expr (generated by the runtime filter), and every scan node (runtime filter consumer) would try to call prepare on the expr, but the expr may have been generated with a different fn_context_id. Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-07-23 13:04:33 +08:00
zhangstar333	256051a965	[bug](node) fix partition sort node core dump when eos (#22108 )	2023-07-23 12:00:53 +08:00
yiguolei	f8307f1a1a	[bugfix](scanner) when scanner init failed during get tablet, not need call update counters (#22117 ) Co-authored-by: yiguolei <yiguolei@gmail.com> If the scanner fails during init or open, there is no need to update counters, because the query has failed and the counters are useless. Updating them may also cause a core dump. For example, updating counters depends on the scanner's tablet, but the tablet is null when init fails.	2023-07-23 10:19:20 +08:00
Pxl	ae809fbeba	[Bug](storage )fix dead lock when create_tablet need lock two tablet && update mv_p0… (#21969 ) fix deadlock when create_tablet needs to lock two tablets && update mv_p0/ssb case	2023-07-22 15:27:05 +08:00
zhangstar333	afeac4419f	[Bug](node) fix partition sort node forget handle some type of key in hashmap (#22037 ) * [enhancement](repeat) add filter in repeat node in BE * update	2023-07-21 23:30:40 +08:00
Kaijie Chen	bed940b7fc	[fix](log) column index off-by-one error in scanner logs (#19747 )	2023-07-21 18:30:01 +08:00
bobhan1	2b2ac10e93	[feature](partial update) add failure tolerance for strict mode partial update stream load	2023-07-21 16:46:44 +08:00
bobhan1	732e0d14ff	[Enhancement](window-funnel)add different modes for window_funnel() function (#20563 )	2023-07-21 13:57:27 +08:00
ZenoYang	6512893257	[refactor](vectorized) Remove useless control variables to simplify aggregation node code (#22026 ) * [refactor](vectorized) Remove useless control variables to simplify aggregation node code * fix	2023-07-21 12:45:23 +08:00
Mryange	6875ef4b8b	[refactor](mem_reuse) refactor mem_reuse in MutableBlock (#21564 )	2023-07-20 22:53:19 +08:00
HappenLee	7947569993	[Bug][RegressionTest] fix the DCHECK failed in join code (#22021 )	2023-07-20 18:12:20 +08:00
bobhan1	367ad9164a	[feature-wip](auto-inc)(step-2) support auto-increment column for duplicate table (#19917 )	2023-07-20 18:03:39 +08:00
zhangstar333	650d7cfc8c	[enhancement](repeat) add filter in repeat node in BE (#21984 )	2023-07-20 17:25:13 +08:00
HappenLee	9182b8d3c2	[Refactor](exec) Remove the useless header of vresult_writer (#22011 ) Remove useless code of vresult_writer.	2023-07-20 13:31:44 +08:00
lihangyu	20242d9a0e	[Improve](simdjson) put unescaped string value after parsed (#21866 ) In some cases, it is necessary to unescape the original value, such as when converting a string to JSONB. If the value is not unescaped, the later JSONB parse will fail.	2023-07-20 10:33:17 +08:00
amory	ce397a8d32	[FIX](map)fix arrow serde with map null key #21955	2023-07-19 12:09:34 +08:00
HappenLee	b35cfc5d5e	[opt](join) Opt the performance of join probe (#21845 )	2023-07-19 01:21:22 +08:00
zclllyybb	845cf94a7a	[feature](function) support time_to_sec (#21722 ) mysql >select sec_to_time(time_to_sec(cast('16:32:18' as time))); +----------------------------------------------------+ \| sec_to_time(time_to_sec(CAST('16:32:18' AS TIME))) \| +----------------------------------------------------+ \| 16:32:18 \| +----------------------------------------------------+ 1 row in set (0.53 sec) mysql [test]>select sec_to_time(59538); +--------------------+ \| sec_to_time(59538) \| +--------------------+ \| 16:32:18 \| +--------------------+ 1 row in set (0.03 sec)	2023-07-19 01:09:48 +08:00
Pxl	4171309b9b	[Bug](scanner) fix core dump due to release ScannerContext too early #21946	2023-07-19 00:53:23 +08:00
TengJianPing	a9ea138caf	[fix](two level hash table) fix dead loop when converting to two level hash table for zero value (#21899 ) When the two-level hash table is enabled and the existing one-level hash table contains a zero value, converting to a two-level hash table causes a dead loop, because the PartitionedHashTable::_is_partitioned flag is not set correctly during the conversion.	2023-07-18 19:50:30 +08:00
HHoflittlefish777	c6063ed92f	[Revert](lazy open) revert lazy open and add case (#21821 )	2023-07-18 19:41:33 +08:00
Mryange	c36d225a27	[feature](profile) add process hashtable time in join node (#21878 )	2023-07-18 18:09:42 +08:00
Pxl	3089e4b3b6	[Bug](execution) fix ScannerContext is done make query failed (#21923 )	2023-07-18 17:58:00 +08:00
Pxl	19492b06c1	[Bug](decimalv3) fix failed on test_dup_tab_decimalv3 due to wrong precision (#21890 )	2023-07-18 12:53:09 +08:00
Pxl	b3d3ffa2de	[Bug](pipeline) adjust scanner scheduler.submit and _num_scheduling_ctx maintain (#21843 )	2023-07-18 11:55:21 +08:00
Tiewei Fang	12784f863d	[fix](Export) Fixed the bug that would be core when exporting large amounts of data (#21761 ) A heap-buffer-overflow error occurs when exporting large amounts of data to orc format. Reserve 50B for buffer to avoid this problem.	2023-07-18 00:06:38 +08:00
Mingyu Chen	5fc0a84735	[improvement](catalog) reduce the size thrift params for external table query (#21771 ) ### 1 In the previous implementation, each FileSplit had its own `TFileScanRange`, and each `TFileScanRange` contained a list of `TFileRangeDesc` and a `TFileScanRangeParams`. So with thousands of FileSplits there were thousands of `TFileScanRange`, which made the thrift data sent to BE too large, resulting in: 1. the rpc of sending the fragment may fail due to timeout 2. FE will OOM. For a given query request, the `TFileScanRangeParams` is the common part and is the same for all `TFileScanRange`, so I moved it to `TExecPlanFragmentParams`. After that, each FileSplit carries only a list of `TFileRangeDesc`. In my test, querying a hive table with 100000 partitions, the size of the thrift data dropped from 151MB to 15MB, and the above 2 issues are gone. ### 2 When `max_external_file_meta_cache_num` is set to <= 0, the file meta cache for the parquet footer is not used. I found that for some wide tables the footer is too large (1MB after compact, and much more after being deserialized to thrift), so it consumes too much BE memory when there are many files. This will be optimized later; here I just support disabling this cache.	2023-07-17 13:37:02 +08:00
zy-kkk	03b575842d	[Feature](table function) support explode_json_array_json (#21795 )	2023-07-17 11:40:02 +08:00
HappenLee	a7eb186801	[Bug](CSVReader) fix null pointer coredump in CSVReader in p2 (#20811 )	2023-07-15 22:50:10 +08:00
HappenLee	7f50c07219	[Opt](exec) opt the outer join performance in TPCDS Q95 (#21806 )	2023-07-14 18:42:08 +08:00
Siyang Tang	b013f8006d	[enhancement](multi-table) enable multi table routine load on pipeline engine (#21729 )	2023-07-14 12:16:32 +08:00
zhangstar333	c07e2ada43	[improve](udaf) refactor java-udaf executor by using for loop (#21713 )	2023-07-14 11:37:19 +08:00
Mryange	ebe771d240	[refactor](executor) remove unused variable	2023-07-14 10:35:59 +08:00
daidai	ca6e33ec0c	[feature](table-value-functions)add catalogs table-value-function (#21790 ) mysql> select * from catalogs() order by CatalogId;	2023-07-14 10:25:16 +08:00
amory	cbddff0694	[FIX](map) fix map key-column nullable for arrow serde #21762 Arrow does not support null elements in a map key column, but the Doris map key column is nullable by default, so when a Doris map row has a null element in its key column, we put null to Arrow.	2023-07-14 00:30:07 +08:00
HappenLee	254f76f61d	[Agg](exec) support aggregation_node limit short circuit (#21767 )	2023-07-14 00:29:19 +08:00
Qi Chen	6fd8f5cd2f	[Fix](parquet-reader) Fix parquet string column min max statistics issue which caused query result incorrectly. (#21675 ) In parquet, min and max statistics may not handle UTF8 correctly. The current approach uses the min_value and max_value statistics introduced by PARQUET-1025 when they are present. If they are not, the statistics are ignored for now. A better way is to try reading the min and max statistics when they contain only ASCII characters. I will improve this in a future PR.	2023-07-14 00:09:41 +08:00
lihangyu	9cad929e96	[Fix](rowset) When a rowset is cooled down, it is directly deleted. This can result in data query misses in the second phase of a two-phase query. (#21741 ) Related PR: #20732. There are two reasons for moving the logic of delayed deletion from the Tablet to the StorageEngine. The first is to consolidate the logic and unify the delayed operations. The second is that delayed garbage collection during queries can cause rowsets to remain in the "stale rowsets" state, preventing the timely deletion of rowset metadata, which may make the rowset metadata too large. * not use unused rowsets	2023-07-13 11:46:12 +08:00
amory	be55cb8dfc	[Improve](jsonb_extract) support jsonb_extract multi parse path (#21555 )	2023-07-12 21:37:36 +08:00
amory	3163841a3a	[FIX](serde)Fix decimal for arrow serde (#21716 )	2023-07-12 19:15:48 +08:00
Lijia Liu	d86c67863d	Remove unused code (#21735 )	2023-07-12 14:48:13 +08:00
daidai	ff42cd9b49	[feature](hive)add read of the hive table textfile format array type (#21514 )	2023-07-11 22:37:48 +08:00
Xinyi Zou	4b30485d62	[improvement](memory) Refactor doris cache GC (#21522 ) Abstract CachePolicy, which controls the gc of all caches. Add stale sweep to all lru caches, including page caches, etc. I0710 18:32:35.729460 2945318 mem_info.cpp:172] End Full GC Free, Memory 3866389992 Bytes. cost(us): 112165339, details: FullGC: FreeTopMemoryQuery: - CancelCostTime: 1m51s - CancelTasksNum: 1 - FindCostTime: 0.000ns - FreedMemory: 2.93 GB WorkloadGroup: Cache name=DataPageCache: - CostTime: 15.283ms - FreedEntrys: 9.56K - FreedMemory: 691.97 MB - PruneAllNumber: 1 - PruneStaleNumber: 1	2023-07-11 20:21:31 +08:00
wangbo	d3317aa33b	[Fix](executor)Fix scan entity core #21696 After the last call to scan_task.scan_func(), the scan should have ended, which means PipelineFragmentContext could be released. After PipelineFragmentContext is released, visiting its fields such as query_ctx or _state may cause a core dump. But this can only explain core 2. void ScannerScheduler::_task_group_scanner_scan(ScannerScheduler* scheduler, taskgroup::ScanTaskTaskGroupQueue* scan_queue) { while (!_is_closed) { taskgroup::ScanTask scan_task; auto success = scan_queue->take(&scan_task); if (success) { int64_t time_spent = 0; { SCOPED_RAW_TIMER(&time_spent); scan_task.scan_func(); } scan_queue->update_statistics(scan_task, time_spent); } } }	2023-07-11 15:56:13 +08:00
amory	d0eb4d7da3	[Improve](hash-fun)improve nested hash with range #21699 1. When calculating an array hash, the element size does not need to seed the hash: hash = HashUtil::zlib_crc_hash(reinterpret_cast<const char*>(&elem_size), sizeof(elem_size), hash); but we need to take care of [[], [1]] vs [[1], []]: when an array nests arrays and a nested array is empty, we should seed the hash to tell them apart. 2. Use a range for one hash value to avoid a virtual function call in the loop, which doubles the performance. I measured it in a UT with column array[int64], 50 rows, where a single array has 10w (100,000) elements.	2023-07-11 14:40:40 +08:00
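
The empty nested array case described in d0eb4d7da3 can be illustrated with a minimal sketch. This is not the Doris implementation: `mix`, `kSeed`, `hash_elements_only` and `hash_with_empty_marker` are hypothetical names, and an FNV-1a-style multiply is swapped in for `HashUtil::zlib_crc_hash` so the example stays self-contained. It only shows why an empty inner array has to contribute something to the running hash.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mixing step (FNV-1a style), standing in for
// HashUtil::zlib_crc_hash, which the commit actually uses.
inline uint64_t mix(uint64_t hash, uint64_t value) {
    return (hash ^ value) * 1099511628211ULL;
}

// Assumed starting seed (the 64-bit FNV offset basis).
constexpr uint64_t kSeed = 14695981039346656037ULL;

using NestedArray = std::vector<std::vector<int64_t>>;

// Hashes only the leaf elements; empty inner arrays contribute nothing,
// so [[], [1]] and [[1], []] produce the same value.
uint64_t hash_elements_only(const NestedArray& rows) {
    uint64_t hash = kSeed;
    for (const auto& inner : rows) {
        for (int64_t v : inner) {
            hash = mix(hash, static_cast<uint64_t>(v));
        }
    }
    return hash;
}

// Same as above, but an empty inner array mixes its size (0) into the
// running hash, so its position in the outer array changes the result.
uint64_t hash_with_empty_marker(const NestedArray& rows) {
    uint64_t hash = kSeed;
    for (const auto& inner : rows) {
        if (inner.empty()) {
            hash = mix(hash, uint64_t{0});
            continue;
        }
        for (int64_t v : inner) {
            hash = mix(hash, static_cast<uint64_t>(v));
        }
    }
    return hash;
}
```

With these definitions, `hash_elements_only` returns the same value for `NestedArray{{}, {1}}` and `NestedArray{{1}, {}}`, while `hash_with_empty_marker` tells them apart. Following the commit message, the sketch marks only empty inner arrays and does not hash the size of non-empty ones.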

1929 Commits