doris

Author	SHA1	Message	Date
Qi Chen	3ed5cf8350	[Optimize] add `has_filter` template param in `get_next_run() to decrease` _has_filter `condition checking count in the loop.` (#19043 )	2023-04-27 21:23:36 +08:00
Qi Chen	e4f7d77c5c	[Optimize](parquet-reader) Opt by filtering null count statistics in rowgroup and page level. (#19106 ) Issue Number: About #19038, we found in this case, l_orderkey has many nulls, so we can filter it by null count statistics in the row group and page level, then it can improve a lot of performance in this case.	2023-04-27 21:21:30 +08:00
yixiutt	95d91e7010	[bugfix](txn_manager) use write lock to protect txn_tablet_map (#19161 )	2023-04-27 20:21:20 +08:00
HappenLee	9e2b118288	[RegressTest](Exec) Add DCHECK null_aware_left_anti_join in mark join (#19149 )	2023-04-27 17:52:03 +08:00
Xinyi Zou	f23c93b3c6	[fix](memory) Fix AggFunc memory leak due to incorrect destroy (#19126 )	2023-04-27 14:58:32 +08:00
Xinyi Zou	98a975b013	[fix](memory) Fix SchemaChange memory leak due to incorrect aggfunc destroy (#19130 )	2023-04-27 14:44:00 +08:00
Yongqiang YANG	8412571030	[fix](memleak) avoid memleak due to race condition (#19071 )	2023-04-27 14:22:09 +08:00
Kang	68d3111629	[bugfix](topn) fix memory leak in topn AcceptNullPredicate (#19060 ) fix the memory leak reported by ASAN as follows.	2023-04-27 14:07:57 +08:00
airborne12	b9855a6e29	[Fix](inverted index) fix memory leak for inverted index (#19008 ) forget to delete handler->_shared_lock	2023-04-27 11:53:55 +08:00
lihangyu	e76b3a316f	[Bug](mysql proto) fix binary proto with dynamic mode (#19055 ) Dynamic mode used in array type when serialize it to mysql row buffer using dynamic mode, when combine binary row format with dynamic mode,something goes wrong, and lead to invalid binary row format.	2023-04-27 11:18:01 +08:00
Mingyu Chen	84d040bdbf	[fix](heartbeat) fix update BE last start time (#18962 ) Sometimes the LastStartTime info in show backends result is unchanged even if BE restart. This PR fix it	2023-04-27 09:59:04 +08:00
brody715	20395ce501	[feature](array_function): add support for array_cum_sum function (#18231 )	2023-04-27 09:57:13 +08:00
Yongqiang YANG	6eb12640a1	[fix](segment_iter) do not init segment_iterator twice (#18337 ) * [fix](segment_iter) do not init segment_iterator twice SegmentIterator::init is called by Segment::new_iterator and BetaRowsetReader::get_segment_iterators twice.	2023-04-27 09:51:57 +08:00
yiguolei	a262f42a28	[refactor](exceptionsafe) make scanner and scancontext exception safe (#19057 )	2023-04-27 09:23:01 +08:00
Liqf	d12fe4a7d2	[bug](fix)fix Geo memory leak (#19116 )	2023-04-27 09:04:10 +08:00
xy720	925efc1902	[bug](map-type)fix some bugs in map and map element function (#18935 ) fix some bugs in map and map element function.	2023-04-26 22:10:15 +08:00
Gabriel	aabcab9dbe	[Improvement](runtime filter) Improve merge phase (#18828 )	2023-04-26 21:01:20 +08:00
amory	1ccbdee757	[FIX](map-type)fix map regress test & create mapTypeInfo without delete #19033	2023-04-26 19:03:55 +08:00
caiconghui	a32fa219ec	Revert "[Enhancement](compaction) stop tablet compaction when table dropped (#18702 )" (#19086 ) This reverts commit 296b0c92f702675b92eee3c8af219f3862802fb2. we can use drop table force stmt to fast drop tablets, no need to check tablet dropped state in every report Co-authored-by: caiconghui1 <caiconghui1@jd.com>	2023-04-26 18:27:46 +08:00
Pxl	60cda12e57	[Bug](pipeline-engine) fix hang on insert into select when enable pipeline engine (#19075 )	2023-04-26 16:50:19 +08:00
zclllyybb	e1651bfea5	[bugfix](aggregate_function) Fix wrong registration for percentile_approx #19070	2023-04-26 16:17:46 +08:00
Kang	1dfc5ea34c	[bugfix](jsonb) fix jsonb parser crash on noavx2 host (#18977 ) support avx2 and noavx2 for jsonb parser using __AVX2__ macro.	2023-04-26 15:10:12 +08:00
Tiewei Fang	94b11af17c	[fixbug](json-reader) fix memory leak of new_json_reader #19067	2023-04-26 12:54:47 +08:00
Qi Chen	5bd4a3897e	[optimize](multi-catalog) Skip whole row group in lazy_read if data has been filtered. (#19039 ) We found qt_q11 in regression test test_external_catalog_hive is very slow. The result is only one record, so other data should be filtered out in the parquet lazy read situation. Then we found currently the parquet reader read many records because we can only skip parquet page. But in order to skip parquet page, currently we need to read page header, then it will caused prefetch data. Therefore, prefetch data in this case may be not good. So there are two issues: Skip whole row group in this case. Prefetching data in this case may be not good, need to improve it. This PR resolve issues 1.	2023-04-26 12:10:14 +08:00
Adonis Ling	375789d345	[enhancement](JNI) Provide default environment variables if it is unset (#19041 )	2023-04-26 12:06:38 +08:00
Mryange	5fd6d8ebd4	[fix](function) Support more behaviors of cast time in MySQL	2023-04-26 07:49:54 +08:00
WenYao	2c836251b2	[Fix](schema scanner) Fixed the problem of overflow when multiplying two INT	2023-04-25 23:58:47 +08:00
Lightman	1be5dac036	[improve] Refactor file cache and Improve the file cache strategy (#18652 ) 1. Refactor file cache. Before refactor, the file cache config format is "[{"path":"/path/to/file_cache","normal":21474836480,"persistent":10737418240,"query_limit":10737418240}]" and now change to "[{"path":"/mnt/disk3/selectdb_cloud/file_cache","total_size":21474836480,"query_limit":10737418240}]". It will be simpler than before. 2. Support more strategy. Support file cache priority. The file cache will have three queue, name as 'index'/'normal'/'disposable'. We can avoid that the higher priority data is eliminate by the lower priority data.	2023-04-25 23:14:28 +08:00
herry2038	17b59df8dd	[fix](function) Array_map compared offset rows one by one (#18406 ) Array_map 's multi columns compare not only nested data rows to be equal,but also the offsets data must equal each other.	2023-04-25 19:12:19 +08:00
Mryange	fa0f3a2859	[fix](planner) vdatetime_value.cpp:1585 Array access may overflow. (#18872 ) int64_t months = _year * 12 + _month - 1 + sign * (12 * interval.year + interval.month); _year = months / 12; if (_year > 9999) { return false; } _month = (months % 12) + 1; if (_day > s_days_in_month[_month]) { _day = s_days_in_month[_month]; if (_month == 2 && doris::is_leap(_year)) { _day++; } } The variable "months" may be negative. Taking modulus with it (_month) may also result in a negative value, which can cause an array access overflow.	2023-04-25 17:57:21 +08:00
yiguolei	8d21f20753	[enhancement](javaudf) not depend on parent will cause deconstructor core (#18948 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-04-25 15:26:54 +08:00
WenYao	339d804ec4	[Refactor](exceptionsafe) add factory creator to some class (#19000 )	2023-04-25 14:33:47 +08:00
Ashin Gau	39d66ca2c6	[fix](parquet) hasn't initialize select vector when number of nested values equals zero (#18953 ) Fix bug when reading array type in parquet file: ``` ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]Read parquet file xxx failed, reason = [IO_ERROR]Decode too many values in current page ``` When reading normal columns, `ScalarColumnReader::_read_values` still calls `ColumnSelectVector::set_run_length_null_map` to initialize select vector, but `ScalarColumnReader::_read_nested_column` hasn't do this, making the number of values wrong. The situation where this error occurs is particularly extreme: The column pages have remaining values to be read, but all of them are null values at ancestor level, so there's no actual read operation, just skipping null values at ancestor level.	2023-04-25 14:21:33 +08:00
yixiutt	8b27d42b9b	[bugfix](MOW) fix core in set_txn_related_delete_bitmap (#18956 ) Fe will clear transaction info when transaction timeout, but calc delete bitmap related logic in DeltaWriter::close_wait will continue. In set_txn_related_delete_bitmap, we return directly in such case.	2023-04-25 10:57:26 +08:00
lihangyu	d555bae290	[Bug](serde) fix serialize column to jsonb when meet boolean and decimal_v3 (#19011 ) * [Bug](serde) fix serialize column to jsonb when meet boolean and decimal_v3 * add comment to explain why use uint8	2023-04-25 10:48:13 +08:00
yiguolei	4e9b32d622	[bugfix](exception) remove fmt code to test if there still exist core (#19009 )	2023-04-25 07:24:14 +08:00
yiguolei	3899c08036	[optimize](compile) remove unused template param from load channel (#18980 ) * [optimize](compile) remove unused template param from load channel --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-04-24 23:36:47 +08:00
HappenLee	b2c26e17e1	[Compile](vec) Fix compile by BHREAD_SCANNER (#18979 )	2023-04-24 17:07:06 +08:00
Adonis Ling	16a394da0e	[chore](build) Use include-what-you-use to optimize includes (PART III) (#18958 ) Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.	2023-04-24 14:51:51 +08:00
gitccl	296b0c92f7	[Enhancement](compaction) stop tablet compaction when table dropped (#18702 ) * [Enhancement](compaction) stop tablet compaction when table dropped * fix be ut	2023-04-24 11:04:27 +08:00
Mellorsssss	ab2a6864bc	[function](json) Json unquote (#18037 )	2023-04-24 10:33:29 +08:00
yiguolei	8d7a9fd21b	[refactor](exceptionsafe) add factory creator to some class (#18978 ) make vexprecontext,vexpr,function,query context,runtimestate thread safe. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-04-24 10:32:11 +08:00
Xinyi Zou	8e4710079d	[improvement](profile) Insert into add LoadChannel runtime profile (#18908 ) TabletSink and LoadChannel in BE are M: N relationship, Every once in a while LoadChannel will randomly return its own runtime profile to a TabletSink, so usually all LoadChannel runtime profiles are saved on each TabletSink, and the timeliness of the same LoadChannel profile saved on different TabletSinks is different, and each TabletSink will periodically send fe reports all the LoadChannel profiles saved by itself, and ensures to update the latest LoadChannel profile according to the timestamp.	2023-04-24 09:41:57 +08:00
Jerry Hu	0c95d760fe	[fix](fixed_hashtable) The incorrect implementation of copy constructor (#18921 )	2023-04-24 08:36:52 +08:00
airborne12	07ea350201	[Fix](inverted index) fix memory leak when create bkd reader (#18914 ) The function compoundReader->openInput is called three times, and if any of these calls fail, an error is logged, and the function returns early. If one or two of the calls succeed, but the others fail, there might be a situation where the allocated memory for the IndexInput objects is not freed. To fix this, you could use std::unique_ptr to manage the memory for IndexInput objects. This would automatically clean up the memory when the function goes out of scope.	2023-04-23 23:21:44 +08:00
AlexYue	c3baa65de3	[feature](io) enable s3 file writer with multi part uploading concurrently (#17585 ) Formerly S3FileWriter has to write each buffer with 5MB or more then upload one part, after all these works are done it could then process the incoming data, it's blocking and inefficient. This pr brings one bufferpool where the data could write into memory buffer immediately if has free buffer and then it would be uploaded into the S3. This pr doesn't provide the ability to elegantly support cases where there is no free buffer, i'll leave it as one future work.	2023-04-23 23:19:44 +08:00
yiguolei	3736530585	[refactor](query context) rename query fragments context to query context and make query context safe (#18950 ) * [refactor](query context) rename query fragments context to query context and make query context safe --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-04-23 22:53:56 +08:00
ZenoYang	0da2cf270a	[improvement](fetch data) Merge result into batch to reduce rpc times (#17828 )	2023-04-23 15:07:28 +08:00
yiguolei	61b44108e2	[bugfix](asan) fix possible asan check bug in exception to string (#18936 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-04-23 12:26:36 +08:00
Ashin Gau	29f502380c	[opt](FileReader) merge small IO to optimize read performace (#18796 ) Add `MergeRangeFileReader` to merge small IO to optimize parquet&orc read performance. `MergeRangeFileReader` is a FileReader that efficiently supports random access in format like parquet and orc. In order to merge small IO in parquet and orc, the random access ranges should be generated when creating the reader. The random access ranges is a list of ranges that order by offset. The range in random access ranges should be reading sequentially, can be skipped, but can't be read repeatedly. When calling read_at, if the start offset located in random access ranges, the slice size should not span two ranges. For example, in parquet, the random access ranges is the column offsets in a row group. When reading at offset, if [offset, offset + 8MB) contains many random access ranges, the reader will read data in [offset, offset + 8MB) as a whole, and copy the data in random access ranges into small buffers(name as box, default 1MB, 64MB in total). A box can be occupied by many ranges, and use a reference counter to record how many ranges are cached in the box. If reference counter equals zero, the box can be release or reused by other ranges. When there is no empty box for a new read operation, the read operation will do directly. ## Effects The runtime of ClickBench reduces from 102s to 77s, and the runtime of Query 24 reduces from 24.74s to 9.45s. The profile of Query 24: ``` VFILE_SCAN_NODE (id=0):(Active: 8s344ms, % non-child: 83.06%) - FileReadBytes: 534.46 MB - FileReadCalls: 1.031K (1031) - FileReadTime: 28s801ms - GetNextTime: 8s304ms - MaxScannerThreadNum: 12 - MergedSmallIO: 0ns - CopyTime: 157.774ms - MergedBytes: 549.91 MB - MergedIO: 94 - ReadTime: 28s642ms - RequestBytes: 507.96 MB - RequestIO: 1.001K (1001) - NumScanners: 18 ``` 1001 request IOs has been merged into 94 IOs. ## Remaining problems 1. Add p2 regression test in nest PR 2. Profiles are scattered in various codes and will be refactored in the next PR 3. Support ORC reader	2023-04-23 10:51:38 +08:00

1 2 3 4 5 ...

4229 Commits