doris

Author	SHA1	Message	Date
Xinyi Zou	9c0f96883a	[fix](hashjoin) Fix right join pull output block memory overflow (#15440 ) For outer join / right outer join / right semi join, when HashJoinNode::pull->process_data_in_hashtable outputs a block, it will output all rows of a key in the hash table into a block, and the output of a key is completed After that, it will check whether the block size exceeds the batch size, and if it exceeds, the output will be terminated. If a key has 2000w+ rows, memory overflow will occur when the subsequent block operations on the 2000w+ rows are performed.	2023-01-10 10:10:43 +08:00
Mingyu Chen	9e3a61989b	[refactor](es) remove BE generated dsl for es query #15751 remove fe config enable_new_es_dsl and all related code. Now the DSL for es is always generated on FE side.	2023-01-10 08:40:32 +08:00
plat1ko	ab186a60ce	[enhancement](compaction) Optimize judging delete rowset and picking candidate rowsets for compaction #15631 Tablet::version_for_delete_predicate should travel all rowset metas in tablet meta which complex is O(N), however we can directly judge whether this rowset is a delete rowset by RowsetMeta::has_delete_predicate which complex is O(1). As we won't call Tablet::version_for_delete_predicate when pick input rowsets for compaction, we can reduce the critical area of Tablet::_meta_lock.	2023-01-10 08:32:15 +08:00
Gabriel	2c9c7c48ac	[improvement](decimalv3) Java UDF and array type support DECIMALV3 (#15674 )	2023-01-09 15:13:16 +08:00
Gabriel	699bf972e2	[Bug](bitmap) Fix bitmap_from_string for null constant (#15698 )	2023-01-09 10:21:08 +08:00
Pxl	1514b5ab5c	[Feature](Materialized-View) support advanced Materialized-View (#15212 )	2023-01-09 09:53:11 +08:00
Lijia Liu	c57fa7c930	[Pipeline] Fix PipScannerContext::can_finish return wrong status (#15259 ) Now in ScannerContext::push_back_scanner_and_reschedule, _num_running_scanners-- is before _num_scheduling_ctx++. InPipScannerContext::can_finish, we check _num_running_scanners == 0 && _num_scheduling_ctx == 0 without obtaining _transfer_lock. In follow case, PipScannerContext::can_finish will return wrong result. _num_running_scanners-- Check _num_running_scanners == 0 && _num_scheduling_ctx == 0` return true. _num_scheduling_ctx++ So, we can set _num_running_scanners-- in the last of this func. Describe your changes. PipScannerContext::get_block_from_queue not block. Set _num_running_scanners-- in the last of ScannerContext::push_back_scanner_and_reschedule.	2023-01-09 08:46:58 +08:00
zbtzbtzbt	ba54634d55	[refactor] delete non vec load from memtable (#15667 ) * [refactor] delete non vec load from memtable delete non vec load from memtable totally. remove function keys_type() in memtable. Co-authored-by: zhoubintao <1229701101@qq.com>	2023-01-09 08:41:58 +08:00
ElvinWei	36590da24b	[fix](regression p0) add the alias function hist to histogram and fix p0 (#15708 ) add the alias function hist to histogram and fix p0	2023-01-08 11:31:23 +08:00
yixiutt	90be1a22a9	[bugfix](vertical compaction) fix dcheck failed in MOW tablet (#15638 ) fix a dcheck error for vertical compaction on Merge-On-Write table。 When merge rowsets with empty segment, VerticalHeapMergeIterator::init return ok directly and _record_rowids not set, dcheck failed when _unique_key_next_block call current_block_row_locations。	2023-01-08 10:39:52 +08:00
Ashin Gau	707eab9a63	[opt](multi-catalog) cache and reuse position delete rows in iceberg v2 (#15670 ) A deleted file may belong to multiple data files. Each data file will read a full amount of deleted files, so a deleted file may be read repeatedly. The deleted files can be cached, and multiple data files can reuse the first read content. The performance is improved by 60% in the case of single thread, and by 30% in the case of multithreading.	2023-01-07 22:29:11 +08:00
ElvinWei	76ad599fd7	[enhancement](histogram) optimise aggregate function histogram (#15317 ) This pr mainly to optimize the histogram(👉🏻 https://github.com/apache/doris/pull/14910) aggregation function. Including the following: 1. Support input parameters `sample_rate` and `max_bucket_num` 2. Add UT and regression test 3. Add documentation 4. Optimize function implementation logic Parameter description： - `sample_rate`：Optional. The proportion of sample data used to generate the histogram. The default is 0.2. - `max_bucket_num`：Optional. Limit the number of histogram buckets. The default value is 128. --- Example： ``` MySQL [test]> SELECT histogram(c_float) FROM histogram_test; +-------------------------------------------------------------------------------------------------------------------------------------+ \| histogram(`c_float`) \| +-------------------------------------------------------------------------------------------------------------------------------------+ \| {"sample_rate":0.2,"max_bucket_num":128,"bucket_num":3,"buckets":[{"lower":"0.1","upper":"0.1","count":1,"pre_sum":0,"ndv":1},...]} \| +-------------------------------------------------------------------------------------------------------------------------------------+ MySQL [test]> SELECT histogram(c_string, 0.5, 2) FROM histogram_test; +-------------------------------------------------------------------------------------------------------------------------------------+ \| histogram(`c_string`) \| +-------------------------------------------------------------------------------------------------------------------------------------+ \| {"sample_rate":0.5,"max_bucket_num":2,"bucket_num":2,"buckets":[{"lower":"str1","upper":"str7","count":4,"pre_sum":0,"ndv":3},...]} \| +-------------------------------------------------------------------------------------------------------------------------------------+ ``` Query result description： ``` { "sample_rate": 0.2, "max_bucket_num": 128, "bucket_num": 3, "buckets": [ { "lower": "0.1", "upper": "0.2", "count": 2, "pre_sum": 0, "ndv": 2 }, { "lower": "0.8", "upper": "0.9", "count": 2, "pre_sum": 2, "ndv": 2 }, { "lower": "1.0", "upper": "1.0", "count": 2, "pre_sum": 4, "ndv": 1 } ] } ``` Field description： - sample_rate：Rate of sampling - max_bucket_num：Limit the maximum number of buckets - bucket_num：The actual number of buckets - buckets：All buckets - lower：Upper bound of the bucket - upper：Lower bound of the bucket - count：The number of elements contained in the bucket - pre_sum：The total number of elements in the front bucket - ndv：The number of different values in the bucket > Total number of histogram elements = number of elements in the last bucket(count) + total number of elements in the previous bucket(pre_sum).	2023-01-07 00:50:32 +08:00
Jerry Hu	9c36278c4a	[improvement](pipeline) Support sharing hash table for broadcast join (#15628 )	2023-01-06 15:11:28 +08:00
HappenLee	1038093c29	[Pipeline](Exec) disable work steal of hash join build (#15652 )	2023-01-06 15:08:10 +08:00
HappenLee	f24659c003	[Refactor](pipeline) refactor the code of channel buffer limit and change the default value (#15650 )	2023-01-06 14:52:43 +08:00
Tiewei Fang	df2da89b89	[feature](multi-catalog) support postgresql jdbc catalog (#15570 ) support postgresql jdbc catalog	2023-01-06 11:00:59 +08:00
Gabriel	b57500d0c3	[Bug](decimalv3) fix wrong result for MOD operation (#15644 )	2023-01-06 10:38:53 +08:00
luozenglin	05d72e8919	[fix](join) fix anti join incorrectly outputs null values (#15567 )	2023-01-06 09:55:48 +08:00
Zhengguo Yang	b41934864e	[enhancement](frontendservice) add retry when create connection to frontend service (#15635 )	2023-01-06 09:15:08 +08:00
Adonis Ling	95f2f43c02	[fix](macOS) Failed to run BE UT due to syscall to map cache into shared region failed (#15641 ) According to the post https://developer.apple.com/forums/thread/676684, the executable whose size is bigger than 2G may fail to start. The size of the executable `doris_be_test` generated by run-be-ut.sh is 2.1G (> 2G) now and we can't run it on macOS (arm64). We can separate the debug info from the executable `doris_be_test` to reduce the size. After that, we can run `doris_be_test` successfully.	2023-01-06 01:23:37 +08:00
Kang	9d1f02c580	[Improvement](topn) runtime prune for topn query (#15558 )	2023-01-05 20:10:12 +08:00
Gabriel	5ee479f45c	[Pipeline](load) Support transaction on pipeline engine (#15597 )	2023-01-05 15:59:18 +08:00
Zhengguo Yang	6523b546ab	[chore](vulnerability) fix some high risk vulnerabilities report by bug scanner (#15621 ) * [chore](vulnerability) fix some high risk vulnerabilities report by bug scanner	2023-01-05 14:58:23 +08:00
spaces-x	1018657d9d	[Enhancement](SparkLoad): avoid BE OOM in push task, fix #15572 (#15620 ) Release memory pool held by the parquet reader when the data has been flushed by rowset writter. Co-authored-by: spaces-x <weixiang06@meituan.com>	2023-01-05 10:20:32 +08:00
Pxl	93f5e440eb	[Bug](execute) fix get next non stop for eos on streaming preagg (#15611 ) * fix get nnext non stop for eos on streaming preagg * update	2023-01-05 09:36:11 +08:00
Gabriel	5ff5b8fc98	[feature](mark join) Support mark join for hash join node (#15569 ) * [feature](mark join) Support mark join for hash join node	2023-01-05 09:32:26 +08:00
yixiutt	804ea08825	[add_log](checksum) add cost and size log for checksum (#15599 )	2023-01-04 19:10:40 +08:00
Mingyu Chen	4075e3aec6	[fix](csv-reader) fix new csv reader's performance issue (#15581 )	2023-01-04 18:25:08 +08:00
luozenglin	c42c61dcad	[fix](bitmapfilter) fix bitmap filter not pushing down (#15532 )	2023-01-04 14:33:53 +08:00
yixiutt	73d4070708	[bugfix](compaction) fix missing key_bounds in vertical compaction (#15578 ) When flush last segment of every column, missing set segment key bound for rowset meta so that rowset tree init error.	2023-01-04 13:39:09 +08:00
Pxl	85fe9d2496	[Bug](filter) fix not in(null) return true (#15466 ) fix not in(null) return true	2023-01-03 21:14:50 +08:00
Xin Liao	4380f1ec54	[Enhancement](load) reduce memory by memory size of global delta writer (#14491 )	2023-01-03 20:05:21 +08:00
zhangstar333	b50448d5c4	[vectorized](udaf) fix udaf result is null when has multiple aggs (#15554 )	2023-01-03 16:03:43 +08:00
TengJianPing	77fda4f749	[SpillToDisk](block reader and writer)Support spill to disk: implement interfaces for spill block and read block (#15399 )	2023-01-03 12:42:45 +08:00
Zhengguo Yang	57620f6f0d	[bugfix](datetimev2) fix coredump when load datatime data to doris (#15510 )	2023-01-03 10:05:44 +08:00
Jibing-Li	17286861ef	[Fix](multi catalog)Skip non-vectorized init code for NewFileScanNode. #15550	2023-01-03 09:22:17 +08:00
yiguolei	14eaf41029	[refactor](remove rowblockv2) remove rowblock v2 structure (#15540 ) * [refactor](remove rowblockv2) remove rowblock v2 structure * fix bugs Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-01-03 09:21:57 +08:00
xueweizhang	40c53931e5	[fix](vec) VMergeIterator add key same label for agg table (#14722 )	2023-01-02 22:54:21 +08:00
yixiutt	365c3eec16	[enhancement](compaction) vertical compaction support unique-key mow (#15353 )	2023-01-02 22:53:04 +08:00
AlexYue	87110ad3e3	[chore](Sink)remove useless OlapTablePartitionParam-related code (#15549 )	2023-01-02 22:47:16 +08:00
Gabriel	ad9a67a76a	[Bug](decimalv3) Fix wrong decimalv3 value after insertion (#15505 )	2023-01-01 11:08:59 +08:00
abmdocrt	3c2dee1d10	[fix](typo) Fix typo in variable name (#15538 )	2023-01-01 11:03:45 +08:00
Ashin Gau	50f1931f96	[fix](multi-catalog) get dictionary-encode from parquet metadata (#15525 )	2022-12-31 19:08:10 +08:00
starocean999	100834df8b	[fix](nereids) fix some arrgregate bugs in Nereids (#15326 ) 1. the agg function without distinct keyword should be a "merge" funcion in threePhaseAggregateWithDistinct 2. use aggregateParam.aggMode.consumeAggregateBuffer instead of aggregateParam.aggPhase.isGlobal() to indicate if a agg function is a "merge" function 3. add an AvgDistinctToSumDivCount rule to support avg(distinct xxx) in some case 4. AggregateExpression's nullable method should call inner function's nullable method. 5. add a bind slot rule to bind pattern "logicalSort(logicalHaving(logicalProject()))" 6. don't remove project node in PhysicalPlanTranslator 7. add a cast to bigint expr when count( distinct datelike type ) 8. fallback to old optimizer if bitmap runtime filter is enabled. 9. fix exchange node mem leak	2022-12-30 23:07:37 +08:00
Xin Liao	cc7a9d92ad	[refactor](non-vec) remove non vec code for indexed column reader (#15409 )	2022-12-30 23:01:54 +08:00
plat1ko	ad68764977	[enhancement](tablet) Unify redundant `create_rowset_writer` methods (#15519 ) * Remove redundant create_rowset_writer methods * Set resource id when setting FS in rowset meta * fix * fix ut	2022-12-30 22:57:12 +08:00
YueW	edecc2e706	[feature-wip](inverted index) API for inverted index reader and syntax for fulltext match (#14211 ) * [feature-wip](inverted index)inverted index api: reader * [feature-wip](inverted index) Fulltext query syntax with MATCH/MATCH_ALL/MATCH_ALL * [feature-wip](inverted index) Adapt to index meta * [enhance] add more metrics * [enhance] add fulltext match query check for column type and index parser * [feature-wip](inverted index) Support apply inverted index in compound predicate which except leaf node of and node	2022-12-30 21:48:14 +08:00
yiguolei	b23d068281	[refactor](remove-non-vec) Remove non vec load from memtable and delta writer (#15517 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-12-30 21:22:58 +08:00
Xin Liao	ec52907b06	[fix](index) fix wrong dcheck in indexed column writer (#15520 )	2022-12-30 20:12:41 +08:00
luozenglin	dec1eb360c	[fix](brokerload) be core dump caused by broker load orc format file nullptr pointer (#15460 )	2022-12-30 15:37:33 +08:00

1 2 3 4 5 ...

3526 Commits