doris

Author	SHA1	Message	Date
HappenLee	d5a3e8df3a	[Exec](opt) Opt the vexplode_split function performance (#15945 )	2023-01-17 19:02:57 +08:00
starocean999	151ae71761	[fix](be)fix bug of VSetOperationNode::release_resource (#15997 ) Should call "ExecNode::release_resource(state)" if the child class overrides the parent's method	2023-01-17 16:16:25 +08:00
Gabriel	7d34512501	[Bug](pipeline) Fix DCHECK failure (#15928 )	2023-01-17 12:01:20 +08:00
HappenLee	9f106161a7	[Bug](join) Fix null aware anti join error in fuzzy mode (#15987 )	2023-01-17 11:32:16 +08:00
YueW	b1caa68706	[Feature-WIP](inverted index) inverted index reader's implementation, and add mysql_fulltext regression case to test fulltext query (#15823 ) Issue Number: Step2 of DSIP-023: Add inverted index for full text search implementation of inverted index reader dependency pr: #14211 #15807 #15821	2023-01-17 09:13:56 +08:00
WenYao	bdec4d5ac2	[enhancement](profile) add read columns to scanner profile (#15902 )	2023-01-16 19:32:46 +08:00
Xinyi Zou	97fcad76f8	[enhancement](memtracker) Improve readability (#15716 )	2023-01-16 16:30:35 +08:00
Pxl	81bab55d43	[Bug](function) catch function calculation error on aggregate node to avoid core dump (#15903 )	2023-01-16 11:21:28 +08:00
Pxl	b727033906	[Chore](build) enable -Wextra and remove some -Wno (#15760 )	2023-01-15 10:40:35 +08:00
Gabriel	5af7bcaa55	[Bug](decimalv3) Fix missing precision and scale in predicates (#15930 )	2023-01-15 00:01:48 +08:00
Tiewei Fang	c4475a8dbc	[Enhancement](jdbc scanner) add profile for jdbc scanner (#15914 )	2023-01-14 10:28:59 +08:00
Ashin Gau	34bb9cd5d3	[fix](parquet-reader) fix coredump when loading datetime data to doris from parquet (#15794 ) `date_time_v2` checks the scale when a datetimev2 is constructed: ``` LOG(FATAL) << fmt::format("Scale {} is out of bounds", scale); ``` This [PR](https://github.com/apache/doris/pull/15510) fixed the issue, but the parquet reader does not use the constructor to create `TypeDescriptor`, leaving `scale = -1` when reading datetimev2 data.	2023-01-13 11:51:11 +08:00
HappenLee	9468711f9f	[Bug](join) fix bug null aware left anti join not correct result (#15841 )	2023-01-13 10:18:05 +08:00
yongkang.zhong	688a0bb96a	[feature](multi-catalog) support clickhouse jdbc catalog (#15780 )	2023-01-13 10:07:22 +08:00
Gabriel	0fbdf8e3e1	[Refactor](table function) Decouple vectorized table functions from non-vectorized ones (#15772 )	2023-01-12 15:08:21 +08:00
AlexYue	98d69d1568	[fix](compile) fix vscan node compile error (#15805 ) conflict merge of #15604 and #15618	2023-01-11 15:08:46 +08:00
Mingyu Chen	3fec5ff0f5	[refactor](scan-pool) move scan pool from env to scanner scheduler (#15604 ) The original scan pools are in exec_env. But since new_load_scan_node is enabled by default, the scan pool in exec_env is no longer used, and all scan tasks are submitted to the scan pool in scanner_scheduler. Also reorganize the scan pool into 3 kinds: (1) local scan pool, for olap scan node; (2) remote scan pool, for file scan node; (3) limited scan pool, for queries that set a cpu resource limit or have a small limit clause. TODO: Use bthread to unify all IO tasks. Some trivial issues: fix the incorrect memtable flush size printed in the log; add a RuntimeProfile param in VScanner.	2023-01-11 09:38:42 +08:00
yiguolei	d857b4af1b	[refactor](remove row batch) remove impala rowbatch structure (#15767 ) Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-01-11 09:37:35 +08:00
TengJianPing	8f31a36429	[feature] support spill to disk for sort node (#15624 )	2023-01-11 08:40:58 +08:00
slothever	90a92f0643	[feature-wip](multi-catalog) add iceberg tvf to read snapshots (#15618 ) Support a new table-valued function `iceberg_meta("table" = "ctl.db.tbl", "query_type" = "snapshots")`. The SQL `select * from iceberg_meta("table" = "ctl.db.tbl", "query_type" = "snapshots")` returns the snapshot info of a table. Other iceberg metadata will be supported later when needed. One usage: time travel already uses SQL such as `select * from ice_table FOR TIME AS OF "2022-10-10 11:11:11"`; `select * from ice_table FOR VERSION AS OF "snapshot_id"`; the snapshots metadata provides the `committed time` or `snapshot_id`, which can then be used as the time or version in the time travel clause.	2023-01-10 22:37:35 +08:00
Tiewei Fang	f17d69e450	[feature](file cache)Import `file cache` for remote file reader (#15622 ) The main purpose of this PR is to introduce a `fileCache` for lakehouse reads of remote files. The local disk is used as a cache for remote files, so the next time a file is read, the data can be obtained directly from the local disk. In addition, this PR includes a few other minor changes. Import File Cache: 1. The imported `fileCache` is called `block_file_cache` and uses an LRU replacement policy. 2. Implement a new FileReader, `CachedRemoteFilereader`, so that the `file cache` logic is hidden under `CachedRemoteFilereader`. Other changes: 1. Add a new interface `fs()` for `FileReader`. 2. `IOContext` adds some statistics to track `FileCache` usage. Co-authored-by: Lightman <31928846+Lchangliang@users.noreply.github.com>	2023-01-10 12:23:56 +08:00
Xinyi Zou	9c0f96883a	[fix](hashjoin) Fix right join pull output block memory overflow (#15440 ) For outer join / right outer join / right semi join, when HashJoinNode::pull->process_data_in_hashtable outputs a block, it writes all rows of a key in the hash table into the block. Only after the output of a key is complete does it check whether the block size exceeds the batch size and, if so, terminate the output. If a key has 20 million+ rows, memory overflow will occur when the subsequent block operations are performed on those rows.	2023-01-10 10:10:43 +08:00
Mingyu Chen	9e3a61989b	[refactor](es) remove BE generated dsl for es query #15751 remove fe config enable_new_es_dsl and all related code. Now the DSL for es is always generated on FE side.	2023-01-10 08:40:32 +08:00
Pxl	1514b5ab5c	[Feature](Materialized-View) support advanced Materialized-View (#15212 )	2023-01-09 09:53:11 +08:00
Lijia Liu	c57fa7c930	[Pipeline] Fix PipScannerContext::can_finish return wrong status (#15259 ) Now in ScannerContext::push_back_scanner_and_reschedule, _num_running_scanners-- runs before _num_scheduling_ctx++. In PipScannerContext::can_finish, we check _num_running_scanners == 0 && _num_scheduling_ctx == 0 without obtaining _transfer_lock. In the following interleaving, PipScannerContext::can_finish returns the wrong result: (1) _num_running_scanners--; (2) the check _num_running_scanners == 0 && _num_scheduling_ctx == 0 returns true; (3) _num_scheduling_ctx++. So we can move _num_running_scanners-- to the end of this function. Changes: PipScannerContext::get_block_from_queue does not block. Set _num_running_scanners-- at the end of ScannerContext::push_back_scanner_and_reschedule.	2023-01-09 08:46:58 +08:00
Ashin Gau	707eab9a63	[opt](multi-catalog) cache and reuse position delete rows in iceberg v2 (#15670 ) A delete file may apply to multiple data files. Each data file reads the delete files in full, so a delete file may be read repeatedly. Delete files can be cached so that multiple data files reuse the content of the first read. Performance improves by 60% in the single-threaded case and by 30% in the multithreaded case.	2023-01-07 22:29:11 +08:00
Jerry Hu	9c36278c4a	[improvement](pipeline) Support sharing hash table for broadcast join (#15628 )	2023-01-06 15:11:28 +08:00
HappenLee	f24659c003	[Refactor](pipeline) refactor the code of channel buffer limit and change the default value (#15650 )	2023-01-06 14:52:43 +08:00
Tiewei Fang	df2da89b89	[feature](multi-catalog) support postgresql jdbc catalog (#15570 )	2023-01-06 11:00:59 +08:00
luozenglin	05d72e8919	[fix](join) fix anti join incorrectly outputs null values (#15567 )	2023-01-06 09:55:48 +08:00
Kang	9d1f02c580	[Improvement](topn) runtime prune for topn query (#15558 )	2023-01-05 20:10:12 +08:00
Zhengguo Yang	6523b546ab	[chore](vulnerability) fix some high risk vulnerabilities report by bug scanner (#15621 )	2023-01-05 14:58:23 +08:00
Pxl	93f5e440eb	[Bug](execute) fix get next non stop for eos on streaming preagg (#15611 )	2023-01-05 09:36:11 +08:00
Gabriel	5ff5b8fc98	[feature](mark join) Support mark join for hash join node (#15569 )	2023-01-05 09:32:26 +08:00
Mingyu Chen	4075e3aec6	[fix](csv-reader) fix new csv reader's performance issue (#15581 )	2023-01-04 18:25:08 +08:00
luozenglin	c42c61dcad	[fix](bitmapfilter) fix bitmap filter not pushing down (#15532 )	2023-01-04 14:33:53 +08:00
Pxl	85fe9d2496	[Bug](filter) fix not in(null) return true (#15466 )	2023-01-03 21:14:50 +08:00
Ashin Gau	50f1931f96	[fix](multi-catalog) get dictionary-encode from parquet metadata (#15525 )	2022-12-31 19:08:10 +08:00
starocean999	100834df8b	[fix](nereids) fix some aggregate bugs in Nereids (#15326 ) 1. the agg function without distinct keyword should be a "merge" function in threePhaseAggregateWithDistinct 2. use aggregateParam.aggMode.consumeAggregateBuffer instead of aggregateParam.aggPhase.isGlobal() to indicate if an agg function is a "merge" function 3. add an AvgDistinctToSumDivCount rule to support avg(distinct xxx) in some cases 4. AggregateExpression's nullable method should call inner function's nullable method. 5. add a bind slot rule to bind pattern "logicalSort(logicalHaving(logicalProject()))" 6. don't remove project node in PhysicalPlanTranslator 7. add a cast to bigint expr when count( distinct datelike type ) 8. fallback to old optimizer if bitmap runtime filter is enabled. 9. fix exchange node mem leak	2022-12-30 23:07:37 +08:00
YueW	edecc2e706	[feature-wip](inverted index) API for inverted index reader and syntax for fulltext match (#14211 ) * [feature-wip](inverted index)inverted index api: reader * [feature-wip](inverted index) Fulltext query syntax with MATCH/MATCH_ALL/MATCH_ALL * [feature-wip](inverted index) Adapt to index meta * [enhance] add more metrics * [enhance] add fulltext match query check for column type and index parser * [feature-wip](inverted index) Support apply inverted index in compound predicate which except leaf node of and node	2022-12-30 21:48:14 +08:00
Jerry Hu	10be583e52	[chore](pipeline) optimize profile information (#15433 )	2022-12-30 09:56:33 +08:00
Ashin Gau	2c8de30cce	[optimize](multi-catalog) use dictionary encode&filter to process delete files (#15441 ) Optimize: PR #14470 used `Expr` to filter delete rows to match the current data file, but the rows in a delete file are [sorted by file_path then position](https://iceberg.apache.org/spec/#position-delete-files) to optimize filtering rows while scanning, so this PR removes `Expr` and uses binary search to filter delete rows. In addition, delete files are likely to be dictionary-encoded, and decoding `file_path` columns into `ColumnString` is time-consuming, so this PR uses `ColumnDictionary` to read the `file_path` column. In testing, the performance of iceberg v2's MOR improved by 30%+. Fix bug: the lazy-read block may not have the filter column if the whole group is filtered by `Expr` and the batch_eof is generated from the next batch.	2022-12-30 08:57:55 +08:00
zhangstar333	85c7c531f1	[vectorized](jdbc) support array type in jdbc external table (#15303 )	2022-12-30 00:29:08 +08:00
Jibing-Li	987970e8e3	[fix](multi catalog)Set column default value for query. (#15415 ) Currently the column default value is used only for load tasks. But after an Iceberg schema change, a query task may also need to read the default value for columns that do not exist in the old schema. This PR supports default values for query tasks. Manually tested the broker load and external emr regression cases.	2022-12-29 12:03:17 +08:00
zhangstar333	3146fc8189	[bug](jdbc) fix jdbc external table with char type length error (#15386 ) Tested pg and oracle with char(100): if data='abc', the string read back has length 100, so the extra spaces need to be trimmed	2022-12-29 11:19:03 +08:00
YueW	305dd15fea	[improvement](index) Support bitmap index can be applied with compound predicate when enable vectorized engine query (#13035 ) Currently the bitmap index can only be applied to pushed-down predicates in AND conditions. Predicates in OR conditions and other complex compound conditions are not pushed down to the storage layer, which leads to reading more data. To address that, this PR: 1. supports applying the bitmap index to compound predicates, for queries like: select * from tb where a > 'hello' or b < 100; select * from tb where a > 'hello' or b < 100 or c > 'ok'; select * from tb where (a > 'hello' or b <100) and (a < 'world' or b > 200); select * from tb where (not a> 'hello') or b < 100; ... where columns a, b and c in the SQL above have a bitmap_index created. 2. reduces the data read by using the index. 3. is enabled by setting the config enable_index_apply_compound_predicates.	2022-12-28 20:08:57 +08:00
luozenglin	f8bb8c7829	[fix](broker) fix be core dump caused by broker load (#15390 )	2022-12-28 10:57:41 +08:00
Ashin Gau	fc8f6a0715	[fix](multi-catalog) throw NPE when reading data after EOF (#15358 ) 1. Fix 1 bug: a null pointer exception is thrown when reading data after the reader reaches the end of file, so return directly when `_do_lazy_read` reads no data. 2. Optimize code: Remove unused parameters. 3. Fix regression test	2022-12-26 22:49:35 +08:00
Xin Liao	bf71943605	[feature](load) stream load trim double quotes for csv (#15241 )	2022-12-26 11:45:54 +08:00
HappenLee	ca4674ca68	[pipeline](opt) opt the exec performance of pipe exec engine (#15330 )	2022-12-26 09:58:52 +08:00

509 Commits