doris

Author	SHA1	Message	Date
YueW	ed3420000e	[fix](bthread) fix bthread hang (#16594 )	2023-02-14 00:08:57 +08:00
yixiutt	de725d5d44	[bugfix](column_reader) index_page should not be pre-decoded (#16605 ) In the current logic, the index page is pre-decoded, but pre-decoding usually returns OK early: the index page is built with BinaryPlainPageBuilder, so the first 4 bytes of the page are an offset that is very unlikely to equal EncodingTypePB::DICT_ENCODING, which is 5. Code in bitshuffle_page_pre_decode.h: ``` if constexpr (USED_IN_DICT_ENCODING) { auto type = decode_fixed32_le((const uint8_t*)&data.data[0]); if (static_cast<EncodingTypePB>(type) != EncodingTypePB::DICT_ENCODING) { return Status::OK(); } size_of_dict_header = BINARY_DICT_PAGE_HEADER_SIZE; data.remove_prefix(4); } ``` But if that value happens to equal EncodingTypePB::DICT_ENCODING, BitShuffle is used to decode a BinaryPlainPage, which leads to a fatal error.	2023-02-14 00:06:14 +08:00
lihangyu	36955a6769	[regression-test](dynamic-table) add regression test for dynamic table (#16656 )	2023-02-14 00:03:19 +08:00
plat1ko	5014ad03e7	[feature](cooldown) Auto delete unused remote files (#16588 )	2023-02-13 23:59:39 +08:00
Tiewei Fang	c620e06f6a	[Enhencement](Broker reader)Use smart Pointer instead of native Pointers in broker reader	2023-02-13 21:01:53 +08:00
Jack Drogon	15d9dd114b	[Fix](cpp17) fix gutil unary_function binary_function for cpp17 (#16670 )	2023-02-13 20:30:10 +08:00
HappenLee	a34cc6ed23	[Refactor](exchange) Remove unless variable and change block mem count way (#16668 )	2023-02-13 19:14:01 +08:00
AlexYue	8317c4a752	[Bug](cooldown) set new replica id when early exit in doing clone when no missed versions (#16644 ) * set new replica id * reduce lock * reset when replica id is different	2023-02-13 14:39:03 +08:00
yiguolei	be9385d40a	[improvement](lock raii) use raii to lock and unlock (#16652 ) * [improvement](lock raii) use raii to lock and unlock This is part of exception safe: #16366. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-02-13 14:06:36 +08:00
airborne12	91c4d1cade	[Feature-WIP](inverted index) step 1 for supporting range predicate pushing down to inverted index (#16615 )	2023-02-13 10:30:51 +08:00
huangzhaowei	f41a2055d3	[feature](Load)Remove user/password in properties for mysql load to avoid double auth. (#16073 ) Use the FE cluster token to authenticate stream load. This auth is only open for BE; FE auth still only supports HTTP basic auth. This auth will be used for MySQL load to build a no-auth stream load from FE to BE, which avoids double auth in MySQL load. See the design doc for more information.	2023-02-13 10:00:08 +08:00
caiconghui	1de4e312cc	[fix](metric) Fix be core when set enable_system_metrics to false in be (#16646 ) when enable_system_metrics is false, we should not use system_metrics any more Co-authored-by: caiconghui1 <caiconghui1@jd.com>	2023-02-12 23:01:41 +08:00
奕冷	cf739e7496	[Enhancement](Stmt) Set insert_into timeout session variable separately (#16343 )	2023-02-12 16:56:10 +08:00
AlexYue	6a8fc35b78	[Bug](Cooldown) fix load balance causing no cooldown replica (#16641 )	2023-02-12 16:47:38 +08:00
HappenLee	09b7c22f6b	[Opt](exec) remove unless null key when no split in convert key range (#16624 )	2023-02-11 15:44:35 +08:00
Kang	aba843bb2b	[Improvement](inverted index) inverted index query match bitmap cache (#16578 ) Add cache for inverted index query match bitmap to accelerate common query keyword, especially for keyword matching many rows. Tests result: - large result: matching 99% out of 247 million rows shows 8x speed up. - small result: matching 0.1% out of 247 million rows shows 2x speed up.	2023-02-11 13:38:58 +08:00
lihangyu	37d1519316	[WIP](dynamic-table) support dynamic schema table (#16335 ) Issue Number: close #16351 A dynamic schema table is a special type of table whose schema changes during the loading procedure. This feature is currently implemented mainly for semi-structured data such as JSON: since JSON is self-describing, schema information can be extracted from the original documents and the final type information inferred. This special table reduces manual schema change operations, makes it easy to import semi-structured data, and extends its schema automatically.	2023-02-11 13:37:50 +08:00
YueW	b155fc07f6	[fix](fragment thread) fix thread in fragment thread pool hang (#16608 ) process the return status for exec_state->execute() in function FragmentMgr::_exec_actual	2023-02-11 09:05:10 +08:00
TengJianPing	171ae2892f	[improvement](batch size) pass batch size of exec engine to storage engine (#16614 ) Currently batch_size is not passed on to SegmentIterator, the SegmentIterator uses the hard coded value 4096 - 32 as the max row count of a block. * fix bug	2023-02-11 09:01:44 +08:00
lihangyu	8749aedbae	[Bug](point query) make get_rowset thread safe (#16609 ) Calling `get_rowset` from `lookup_row_data` without a lock leads to a core dump if _rs_version_map or _stale_rs_version_map changes	2023-02-10 23:54:56 +08:00
Xin Liao	c3110f8153	[fix](merge-on-write) fix that the query result has duplicate keys when load with sequence column (#16587 )	2023-02-10 22:31:05 +08:00
yiguolei	75847f7f6a	[bugfix](exchange node) should not depend on eos to judge the ending of stream receiver (#16600 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-02-10 20:35:49 +08:00
YueW	ad141747b4	[fix](inverted index) fix array type inverted index query error (#16582 )	2023-02-10 17:57:15 +08:00
YueW	43eca4f209	[Feature-WIP](inverted index) Implementation for alter inverted index. (#16371 ) implementation for add/drop inverted index.	2023-02-10 17:56:17 +08:00
Xin Liao	6a5277b391	[fix](sequence-column) MergeIterator does not use the correct seq column for comparison (#16494 )	2023-02-10 17:51:15 +08:00
Jerry Hu	861f31205a	[fix](window function) invalid order_by_start in VAnalyticEvalNode (#16589 )	2023-02-10 17:40:40 +08:00
lihangyu	32188855ef	[improve](topn) seperate multiget rpc to ThreadPool (#16598 ) multiget_data runs in a bthread and may block the whole worker pthread of the BRPC framework and affect other bthreads, so the work is moved into a separate task pool.	2023-02-10 17:39:31 +08:00
AlexYue	1f631c388d	[enhance](cooldown)accelerate cooldown task produce efficiency (#16089 )	2023-02-10 16:58:27 +08:00
zhangstar333	b99e2dc727	[bug](jdbc) fix jdbc can't get object of PGobject (#16496 ) When a PG table has unsupported column types such as point, polygon, or jsonb, the JDBC catalog converts them to string type in Doris, but the object obtained from the result set in Java is org.postgresql.util.PGobject. Some tests need this PR: #16442	2023-02-10 16:19:02 +08:00
Gabriel	06788bc2d0	[Bug](pipeline) Fix projection on streaming operator (#16592 )	2023-02-10 15:57:26 +08:00
xueweizhang	379bef598d	[fix-core](block) clear block row_same_bit when block reuse (#16172 )	2023-02-10 12:21:27 +08:00
xy720	1b3902baa2	[Feature](Complex-type) Add struct and map type to Doris (#16444 ) This commit support: 1、Insert + select for struct/map type 2、Json stream load for struct type 3、m[key] function for map type How to use: Set the fe config to create table for struct and map type 1、admin set frontend config("enable_struct_type" = "true"); 2、admin set frontend config("enable_map_type" = "true"); #16547 Co-authored-by: xy720 <xuyang25@baidu.com> Co-authored-by: amory <wangqiannan@selectdb.com> Co-authored-by: cambyzju <zhuxiaoli01@baidu.com> Co-authored-by: hucheng01 <hucheng01@baidu.com>	2023-02-10 11:00:33 +08:00
Pxl	266bb971a6	[Enchancement](function) display elements number on check_chars_length #16570	2023-02-10 08:52:41 +08:00
Xinyi Zou	c1a1275870	[fix](memory) Fix parquet load stack overflow (#16537 )	2023-02-10 08:48:12 +08:00
yiguolei	4fcd6cd236	[refactor](remove unused code) remove load stream mgr (#16580 ) remove old stream load pipe remove old stream load manager --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-02-10 07:46:18 +08:00
Kang	130b3599bc	[Improvement](writer) make DeltaWriter close idempotent to be more robust (#16558 ) return `Status::OK()` instead of `Status::Error<ALREADY_CLOSED>()` for close() in `DeltaWriter` if it's already closed.	2023-02-09 19:48:23 +08:00
Gabriel	a038fdaec6	[Bug](pipeline) Fix bug in non-local exchange on pipeline engine (#16463 ) Currently, for broadcast shuffle, we serialize a block once and then send it by RPC through multiple channels. After this, we serialize the next block into the same memory for the sake of memory reuse. However, since the RPC is asynchronous, the next block may be serialized before the previous block has been sent. So this PR uses a ref count to determine whether the serialized block can be reused in broadcast shuffle.	2023-02-09 19:22:40 +08:00
Ashin Gau	539fd684e9	[improvement](filecache) use dynamic segment size to cache remote file block (#16485 ) `CachedRemoteFileReader` has used a fixed segment size (file_cache_max_file_segment_size=4M) to cache remote file blocks. However, the column size in a row group/stripe may be smaller than 10K if a parquet/orc file has many columns, resulting in particularly serious read amplification. For example: Q1 in clickbench: select count() from hits ``` - FileCache: 0ns - IOHitCacheNum: 552 - IOTotalNum: 835 - ReadFromFileCacheBytes: 19.98 MB - ReadFromWriteCacheBytes: 0.00 - ReadTotalBytes: 29.52 MB - SkipCacheBytes: 0.00 - WriteInFileCacheBytes: 915.77 MB - WriteInFileCacheNum: 283 ``` Only 30MB of data is needed, but 900MB+ of data is read from hdfs. The query time of Q1 (single scan thread) increased from 5.17s to 24.45s when the file cache is enabled. Therefore, this PR introduces a dynamic segment size based on the `read_size` of the data. To prevent IO that is too small or too large, the segment size is limited to [4096, file_cache_max_file_segment_size]. Q1 in clickbench takes 5.66s when the file cache is enabled. The performance is almost the same as when the cache is disabled, and the data size read from hdfs is reduced to 45MB. ``` - FileCache: 0ns - IOHitCacheNum: 297 - IOTotalNum: 835 - ReadFromFileCacheBytes: 8.73 MB - ReadFromWriteCacheBytes: 0.00 - ReadTotalBytes: 29.52 MB - SkipCacheBytes: 0.00 - WriteInFileCacheBytes: 45.66 MB - WriteInFileCacheNum: 544 ``` ## Remaining Problems Small queries may result in a large number of small files (4KB at least), and the `BE` saves too much meta information of cached segments. ## Fix bug `FileCachePolicy` in `FileReaderOptions` is a constant reference, but the parameter passed in `FileFactory::create_file_reader` is a temporary variable, resulting in a segmentation fault.	2023-02-09 16:39:10 +08:00
Gabriel	e48a033338	[Bug](pipeline) Support projection in UnionSourceOperator (#16525 )	2023-02-09 14:43:44 +08:00
HappenLee	7d035486ad	[Opt](vec) opt the fast execute logic to remove useless function call (#16532 )	2023-02-09 14:12:40 +08:00
yiguolei	646ba2cc88	[bugfix](scannode) 1. make rows_read correct 2. use single scanner if has limit clause (#16473 ) Make rows_read correct so that the scheduler can use it correctly. Use a single scanner if there is a limit clause. Move it from fragment context to scannode. --------- Co-authored-by: yiguolei <yiguolei@gmail.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2023-02-09 14:12:18 +08:00
Xiaocc	0142ef8b95	[improvement](scanner) Supports bthread scanner (#16031 )	2023-02-09 10:24:56 +08:00
yixiutt	9f8753ffd2	[bugfix](vertical_compaction) fix base_compaction delete_sign handler (#16469 ) In vertical base compaction, identical rows are filtered in vertical_merge_iterator, and we should skip these filtered rows when setting the agg flag of the delete sign. For example, if the schema is a,b,delete_sign, the data is 1,1,1 1,1,0 1,1,0 2,2,1 2,2 and the Block we get in VerticalBlockReader is 1,1,1 2,2,1, then we should set the agg flag at index 0,4 to true when handling the delete sign, so we add a function continuous_agg_count to skip the identical rows filtered in VerticalMergeIterator.	2023-02-09 10:13:41 +08:00
plat1ko	e1f1386395	[fix](cooldown) Rewrite update cooldown conf (#16488 ) Remove error-prone CooldownJob, and use CooldownConfHandler to update Tablet's cooldown conf. Some bug fix about cooldown.	2023-02-09 09:12:55 +08:00
Gabriel	d1c6b81140	[Bug](log) add some log to find out bug (#16518 )	2023-02-08 21:23:02 +08:00
HappenLee	f71fc3291f	[Bug](fix) right anti join error result when batch size is low (#16510 )	2023-02-08 17:26:19 +08:00
lihangyu	d956cb13af	[Bug](point query) Reusable in PointQueryExecutor should call init before add to LookupCache (#16489 ) Otherwise, under highly concurrent queries, _block_pool may be used before Reusable::init is done in other threads	2023-02-08 16:05:59 +08:00
TengJianPing	f6a20f844b	[fix](hashjoin) join produce blocks with rows larger than batch size: handle join with other conjuncts (#16402 )	2023-02-08 14:26:35 +08:00
abmdocrt	41947c73eb	[Feature](array-function) Support array functions for nested type datev2 and datetimev2 (#16382 )	2023-02-08 12:51:07 +08:00
Kang	cf18de14b5	[fix](writer) add _is_closed state to DeltaWriter and avoid write/close core after close (#16453 )	2023-02-07 22:40:26 +08:00


3773 Commits