Commit Graph

1137 Commits

Author SHA1 Message Date
6e8eedc521 [refactor](remove unused code) remove storage buffer and orc reader (#16137)
remove olap storage byte buffer
remove orc reader
remove time operator
remove read_write_util
remove aggregate funcs
remove compress.h and cpp
remove bhp_lib

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-24 22:29:32 +08:00
79ad74637d [refactor](remove expr) remove non vectorized Expr and ExprContext related codes (#16136) 2023-01-24 10:45:35 +08:00
23edb3de5a [fix](icebergv2) fix bug that delete file reader is not opened (#16133)
PR #15836 changed the way the parquet reader is used: first open(), then init_reader().
But we forgot to call open() for the iceberg delete file, which caused a coredump.
2023-01-24 10:19:46 +08:00
a3cd0ddbdc [refactor](remove broker scan node) it is not useful any more (#16128)
remove broker scannode
remove broker table
remove broker scanner
remove json scanner
remove orc scanner
remove hive external table
remove hudi external table
remove broker external table; users can use the broker table value function instead
Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-23 19:37:38 +08:00
61fccc88d7 [vectorized](analytic) fix analytic node of window function get wrong… (#16074)
[Bug] rank() window function produces wrong ordering results #15951
2023-01-23 16:09:46 +08:00
199d7d3be8 [Refactor] Merged string_value into string_ref (#15925) 2023-01-22 16:39:23 +08:00
8920295534 [refactor](remove non vec code) remove non vectorized conjunctx from scanner (#16121)
1. remove arrow group filter
2. remove non vectorized conjunctx from scanner

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-21 19:23:17 +08:00
253445ca46 [vectorized](jdbc) fix jdbc executor for get result by batch and memo… (#15843)
1. Result set should be fetched by batch size.
2. Fix memory leak.
2023-01-21 08:22:22 +08:00
de12957057 [debug](ParquetReader) print file path if failed to read parquet file (#16118) 2023-01-21 08:05:17 +08:00
7814d2b651 [Fix](Oracle External Table) fix that oracle external table can not insert batch values (#16117)
Issue Number: close #xxx

This PR fixes two bugs:

1. `_jdbc_scanner` may be nullptr in vjdbc_connector.cpp, so we use another method to count jdbc statistics. Closes "[Enhencement](jdbc scanner) add profile for jdbc scanner" #15914.
2. In the batch insertion scenario, the Oracle database does not support the syntax `insert into table values (...),(...);`. What it supports is:
```
insert all
into table(col1,col2) values(c1v1, c2v1)
into table(col1,col2) values(c1v2, c2v2)
SELECT 1 FROM DUAL;
```
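A hedged sketch of how a connector could render that INSERT ALL form (illustrative only; the function name and the pre-escaped row strings are assumptions, not the actual Doris JDBC connector code):

```cpp
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Builds Oracle's INSERT ALL form from row tuples, since Oracle rejects
// the multi-row `insert into t values (...),(...)` syntax.
// Assumes `rows` holds pre-escaped value lists like "c1v1, c2v1".
std::string build_oracle_insert_all(const std::string& table,
                                    const std::string& columns,
                                    const std::vector<std::string>& rows) {
    std::ostringstream sql;
    sql << "insert all\n";
    for (const auto& row : rows) {
        sql << "into " << table << "(" << columns << ") values(" << row << ")\n";
    }
    sql << "SELECT 1 FROM DUAL";
    return sql.str();
}

int main() {
    std::cout << build_oracle_insert_all(
            "table", "col1,col2", {"c1v1, c2v1", "c1v2, c2v2"});
}
```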
2023-01-21 07:57:12 +08:00
9ffd109b35 [fix](datetimev2) Fix BE datetimev2 type returning wrong result (#15885) 2023-01-20 22:25:20 +08:00
171404228f [improvement](vertical compaction) cache segment in vertical compaction (#16101)
1. In vertical compaction, segments are loaded for every column group, so we cache the segment ptr to avoid repeated IO.
2. Fix a vertical compaction data size bug.
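A minimal sketch of the caching idea under assumed names (`SegmentCache` and `get_or_load` are hypothetical; the real types live in the Doris BE):

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

// Hypothetical stand-ins for the real rowset/segment types.
struct Segment {};
using SegmentSharedPtr = std::shared_ptr<Segment>;

class SegmentCache {
public:
    // Returns the cached segments for a rowset, loading them only once.
    // Subsequent column groups reuse the same shared pointers, so each
    // segment file is opened a single time instead of once per group.
    const std::vector<SegmentSharedPtr>& get_or_load(int64_t rowset_id) {
        auto it = _cache.find(rowset_id);
        if (it == _cache.end()) {
            it = _cache.emplace(rowset_id, _load_segments(rowset_id)).first;
        }
        return it->second;
    }

private:
    std::vector<SegmentSharedPtr> _load_segments(int64_t /*rowset_id*/) {
        // Placeholder for the real segment-loading IO.
        return {std::make_shared<Segment>()};
    }

    std::map<int64_t, std::vector<SegmentSharedPtr>> _cache;
};
```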
2023-01-20 16:38:23 +08:00
1638936e3f [fix](oracle catalog) oracle catalog supports the TIMESTAMP date type of Oracle (#16113)
Oracle's `TIMESTAMP` date type maps to Doris's `DateTime` type.
2023-01-20 14:47:58 +08:00
116e17428b [Enhancement](point query optimize) improve performance of point queries on primary keys (#15491)
1. support row format using the jsonb codec
2. short path optimization for point queries
3. support prepared statements for point queries
4. support mysql binary format
2023-01-20 13:33:01 +08:00
3ebc98228d [feature wip](multi catalog)Support iceberg schema evolution. (#15836)
Support iceberg schema evolution for the parquet file format.
Iceberg uses a unique ID for each column to support schema evolution.
To support this feature in Doris, the FE side needs to get the current column ID for each column and send the IDs to the BE side.
The BE reads the column IDs from the parquet key_value_metadata, sets the changed column names in the Block to match the names in the parquet file before reading data, and sets the names back after reading.
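A simplified sketch of the rename-by-column-id step, with hypothetical types standing in for the real Block and parquet metadata:

```cpp
#include <map>
#include <string>
#include <vector>

struct BlockColumn {
    std::string name;
};

// file_name_by_id: column id -> name recorded in the parquet file's
// key_value_metadata. current_id_by_name: current schema name -> column id
// sent from the FE. Rename block columns to the file's names before reading,
// remembering the originals so they can be restored afterwards.
std::vector<std::string> rename_for_read(
        std::vector<BlockColumn>& block,
        const std::map<std::string, int>& current_id_by_name,
        const std::map<int, std::string>& file_name_by_id) {
    std::vector<std::string> original_names;
    for (auto& col : block) {
        original_names.push_back(col.name);
        auto id_it = current_id_by_name.find(col.name);
        if (id_it == current_id_by_name.end()) continue;
        auto file_it = file_name_by_id.find(id_it->second);
        if (file_it != file_name_by_id.end()) {
            col.name = file_it->second; // match the name inside the file
        }
    }
    return original_names; // use these to set the names back after reading
}
```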
2023-01-20 12:57:36 +08:00
6e090e4daf [Bug](predicate) fix date predicate (#16053) 2023-01-19 14:14:48 +08:00
0b5e71d3b4 [refactor](refactor field) remove unused method (#16068) 2023-01-19 10:16:09 +08:00
3894de49d2 [Enhancement](topn) support two phase read for topn query (#15642)
This PR optimizes TopN queries like `SELECT * FROM tableX ORDER BY columnA ASC/DESC LIMIT N`.

TopN is composed of a SortNode and a ScanNode. When the user table is wide (100+ columns), the order by clause covers just a few columns, but the ScanNode still has to scan all columns from the storage engine even if the limit is very small. This can lead to lots of read amplification. So in this PR I divide the TopN query into two phases:
1. In the first phase we read only `columnA`'s data from the storage engine, along with an extra RowId column called `__DORIS_ROWID_COL__`. The other columns are pruned from the ScanNode.
2. The second phase is placed in the ExchangeNode because it is the central node for the topn nodes in the cluster. The ExchangeNode spawns RPCs to the other nodes using the RowIds (sorted and limited by the SortNode) read in the first phase, and fetches the rows from the storage engine row by row.

After the second-phase read, the Block contains all the data needed for the query.
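A toy sketch of the two-phase idea against an in-memory "storage engine" (purely illustrative; the real implementation spans ScanNode, SortNode, ExchangeNode and RPCs):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Row { int64_t column_a; /* ...100+ other columns... */ };

// Phase 1: read only the sort key plus a row id (__DORIS_ROWID_COL__),
// then sort and apply the limit.
std::vector<size_t> phase1_topn_rowids(const std::vector<Row>& table, size_t n) {
    std::vector<size_t> rowids(table.size());
    for (size_t i = 0; i < table.size(); ++i) rowids[i] = i;
    std::partial_sort(rowids.begin(),
                      rowids.begin() + std::min(n, rowids.size()), rowids.end(),
                      [&](size_t a, size_t b) {
                          return table[a].column_a < table[b].column_a;
                      });
    rowids.resize(std::min(n, rowids.size()));
    return rowids;
}

// Phase 2: fetch the full rows for just the surviving row ids, avoiding
// the read amplification of scanning every column up front.
std::vector<Row> phase2_fetch(const std::vector<Row>& table,
                              const std::vector<size_t>& rowids) {
    std::vector<Row> out;
    out.reserve(rowids.size());
    for (size_t id : rowids) out.push_back(table[id]);
    return out;
}
```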
2023-01-19 10:01:33 +08:00
d5a3e8df3a [Exec](opt) Opt the vexplode_split function performance (#15945) 2023-01-17 19:02:57 +08:00
151ae71761 [fix](be)fix bug of VSetOperationNode::release_resource (#15997)
The child class should call `ExecNode::release_resource(state)` if it overrides the parent's method.
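The pattern the fix enforces, sketched with simplified signatures (the real classes carry many more members):

```cpp
struct RuntimeState {};

struct ExecNode {
    virtual ~ExecNode() = default;
    virtual void release_resource(RuntimeState* state) {
        // base-class cleanup shared by all exec nodes
    }
};

struct VSetOperationNode : ExecNode {
    void release_resource(RuntimeState* state) override {
        // ... release this node's own resources first ...
        // An overriding child class must still invoke the parent's method,
        // otherwise the base-class resources leak:
        ExecNode::release_resource(state);
    }
};
```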
2023-01-17 16:16:25 +08:00
d062ca2944 [refactor](vectorized) remove unnecessary vectorization check (#15984) 2023-01-17 12:21:46 +08:00
7d34512501 [Bug](pipeline) Fix DCHECK failure (#15928) 2023-01-17 12:01:20 +08:00
9f106161a7 [Bug](join) Fix null aware anti join error in fuzzy mode (#15987) 2023-01-17 11:32:16 +08:00
b1caa68706 [Feature-WIP](inverted index) inverted index reader's implementation, and add mysql_fulltext regression case to test fulltext query (#15823)
Issue Number: Step 2 of DSIP-023: Add inverted index for full text search
Implementation of the inverted index reader.

Dependency PRs: #14211 #15807 #15821
2023-01-17 09:13:56 +08:00
0057243f54 [improvement](reader) use union merge when rowsets are non-overlapping (#15749) 2023-01-16 21:53:18 +08:00
bdec4d5ac2 [enhancement](profile) add read columns to scanner profile (#15902) 2023-01-16 19:32:46 +08:00
97fcad76f8 [enhancement](memtracker) Improve readability (#15716) 2023-01-16 16:30:35 +08:00
63d48564ed [fix](datetimev2) fix datetimev2 error with T (#15915)
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-01-16 15:30:48 +08:00
81bab55d43 [Bug](function) catch function calculation error on aggregate node to avoid core dump (#15903) 2023-01-16 11:21:28 +08:00
b727033906 [Chore](build) enable -Wextra and remove some -Wno (#15760)
2023-01-15 10:40:35 +08:00
5af7bcaa55 [Bug](decimalv3) Fix missing precision and scale in predicates (#15930) 2023-01-15 00:01:48 +08:00
c4475a8dbc [Enhencement](jdbc scanner) add profile for jdbc scanner (#15914) 2023-01-14 10:28:59 +08:00
049f8ad2f9 [Bug](sort) fix merge sorter might divide by zero when block bytes are fewer than block rows (#15859)
If a block's byte count is smaller than its row count, integer division makes `avg_size_per_row` zero, which ends up dividing by zero in the following logic.
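A minimal sketch of the failure mode and an obvious guard (names are illustrative):

```cpp
#include <algorithm>
#include <cstddef>

// With integer division, block_bytes < block_rows makes the average zero,
// and a later `x / avg_size_per_row` divides by zero. Clamping to 1 avoids it.
size_t safe_avg_size_per_row(size_t block_bytes, size_t block_rows) {
    if (block_rows == 0) return 1;
    return std::max<size_t>(1, block_bytes / block_rows);
}
```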
2023-01-13 18:33:40 +08:00
34bb9cd5d3 [fix](parquet-reader) fix coredump when loading datetime data into doris from parquet (#15794)
`date_time_v2` checks the scale when a datetimev2 value is constructed:
```
LOG(FATAL) << fmt::format("Scale {} is out of bounds", scale);
```

This [PR](https://github.com/apache/doris/pull/15510) fixed the issue, but the parquet reader does not use this constructor to create `TypeDescriptor`, leaving `scale = -1` when reading datetimev2 data.
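A hedged sketch of the kind of normalization involved (field and function names are assumptions; the assumed default of 6 is datetimev2's maximum scale, not necessarily what the PR picks):

```cpp
struct TypeDescriptor {
    int scale = -1; // -1 is the "unset" sentinel
};

// datetimev2 requires 0 <= scale <= 6; a TypeDescriptor assembled field by
// field (as the parquet reader does) must normalize the sentinel instead of
// passing -1 into the datetimev2 constructor, which FATALs on a bad scale.
int normalized_datetimev2_scale(const TypeDescriptor& type) {
    if (type.scale < 0 || type.scale > 6) return 6; // assumed default
    return type.scale;
}
```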
2023-01-13 11:51:11 +08:00
9468711f9f [Bug](join) fix null aware left anti join returning incorrect results (#15841) 2023-01-13 10:18:05 +08:00
688a0bb96a [feature](multi-catalog) support clickhouse jdbc catalog (#15780) 2023-01-13 10:07:22 +08:00
bae29157aa [fix](olap) dictionary cannot be sorted after inserting some null values (#15829) 2023-01-13 09:28:55 +08:00
730571e386 [fix](sort spill) fix failure to create spill file (#15864)
Also increase buffered block size when it has started to spill.
2023-01-13 09:23:26 +08:00
174e5e601f [refactor](rpc fn) decouple vectorized remote function from row-based one (#15871) 2023-01-13 09:21:33 +08:00
0fbdf8e3e1 [Refactor](table function) Decouple vectorized table functions from non-vectorized ones (#15772) 2023-01-12 15:08:21 +08:00
7441b4dc96 [Feature](function) Support width_bucket function (#14396) 2023-01-12 13:59:21 +08:00
f3ef3f7e15 [fix](sink) fix memory leak in VNodeChannel (#15834) (#15835)
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-01-12 09:24:51 +08:00
98d69d1568 [fix](compile) fix vscan node compile error (#15805)
merge conflict between #15604 and #15618
2023-01-11 15:08:46 +08:00
3fec5ff0f5 [refactor](scan-pool) move scan pool from env to scanner scheduler (#15604)
The original scan pools were in exec_env.
But after enabling new_load_scan_node by default, the scan pool in exec_env is no longer used.
All scan tasks are now submitted to the scan pool in scanner_scheduler.

BTW, the scan pools are reorganized into 3 kinds:

local scan pool: for the olap scan node
remote scan pool: for the file scan node
limited scan pool: for queries that set a cpu resource limit or have a small limit clause

TODO:
Use bthread to unify all IO tasks.

Some trivial issues:

fix a bug where the memtable flush size printed in the log was wrong
add a RuntimeProfile param to VScanner
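A rough sketch of the three-pool dispatch described above (names approximate the description, not the actual scanner_scheduler code):

```cpp
struct ThreadPool {};

struct ScannerScheduler {
    ThreadPool local_scan_pool;   // olap scan node
    ThreadPool remote_scan_pool;  // file scan node
    ThreadPool limited_scan_pool; // cpu-limited or small-limit queries

    ThreadPool* pick_pool(bool is_remote, bool cpu_limited, bool small_limit) {
        // Queries with a cpu resource limit or a small limit clause go to a
        // dedicated pool so they cannot starve regular scans (and vice versa).
        if (cpu_limited || small_limit) return &limited_scan_pool;
        return is_remote ? &remote_scan_pool : &local_scan_pool;
    }
};
```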
2023-01-11 09:38:42 +08:00
d857b4af1b [refactor](remove row batch) remove impala rowbatch structure (#15767)

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-01-11 09:37:35 +08:00
8f31a36429 [feature] support spill to disk for sort node (#15624) 2023-01-11 08:40:58 +08:00
4bbc93b7ce [refactor](hashtable) simplify template args of partitioned hash table (#15736) 2023-01-11 08:39:13 +08:00
90a92f0643 [feature-wip](multi-catalog) add iceberg tvf to read snapshots (#15618)
Support new table value function `iceberg_meta("table" = "ctl.db.tbl", "query_type" = "snapshots")`
We can use the SQL `select * from iceberg_meta("table" = "ctl.db.tbl", "query_type" = "snapshots")` to get the snapshot info of a table. Other iceberg metadata will be supported later as needed.

One usage example:

Previously, we used the following SQL to time travel:
`select * from ice_table FOR TIME AS OF "2022-10-10 11:11:11"`;
`select * from ice_table FOR VERSION AS OF "snapshot_id"`;
Now we can use the snapshots metadata to get the `committed time` or `snapshot_id`,
and then use it as the time or version in the time travel clause.
2023-01-10 22:37:35 +08:00
c3da5a687a [fix] fixed dangerous usage of namespace std (#15741)
Co-authored-by: zhaochangle <zhaochangle@selectdb.com>
2023-01-10 16:10:49 +08:00
f17d69e450 [feature](file cache)Import file cache for remote file reader (#15622)
The main purpose of this PR is to introduce a `fileCache` for lakehouse reads of remote files.
The local disk is used as a cache for remote file data, so the next time the file is read,
the data can be fetched directly from the local disk.
In addition, this PR includes a few other minor changes.

Import File Cache:
1. The imported `fileCache` is called `block_file_cache`, and it uses an LRU replacement policy.
2. Implement a new FileReader, `CachedRemoteFileReader`, so that the `file cache` logic is hidden inside `CachedRemoteFileReader`.

Other changes:
1. Add a new interface `fs()` to `FileReader`.
2. `IOContext` adds some statistics to track `FileCache` usage.
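A bare-bones sketch of the `CachedRemoteFileReader` idea (hypothetical interfaces; the real block_file_cache persists blocks on local disk, aligns reads to blocks, applies LRU eviction, and records statistics):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

struct FileReader {
    virtual ~FileReader() = default;
    virtual size_t read_at(int64_t offset, char* buf, size_t len) = 0;
};

// Wraps a remote reader; each read is fetched once and then served from a
// local cache (simplified here to in-memory buffers keyed by offset rather
// than fixed-size on-disk blocks).
class CachedRemoteFileReader : public FileReader {
public:
    explicit CachedRemoteFileReader(FileReader* remote) : _remote(remote) {}

    size_t read_at(int64_t offset, char* buf, size_t len) override {
        auto& block = _cache[offset];
        if (block.size() < len) {
            block.resize(len);
            _remote->read_at(offset, block.data(), len); // cache miss: go remote
        }
        std::copy(block.begin(), block.begin() + len, buf); // cache hit path
        return len;
    }

private:
    FileReader* _remote;
    std::map<int64_t, std::vector<char>> _cache;
};
```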

Co-authored-by: Lightman <31928846+Lchangliang@users.noreply.github.com>
2023-01-10 12:23:56 +08:00