doris

Author	SHA1	Message	Date
camby	3ea9d3f2e1	[enhancement](array) support read list(Array) type from orc file (#14132) Before this PR, loading an ORC file with native list (array) type data crashed the BE. Complex types in an ORC file span multiple physical columns, so columns must be filtered by column name; otherwise not all required columns can be read. Arrow release-7.0.0 only supports creating a stripe reader by column index, so we patch it to support creating a stripe reader by column names. Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-11-15 17:48:17 +08:00
Ashin Gau	21f233d7e7	[feature-wip](multi-catalog) use apache orc reader to read orc file (#13404) Use apache orc to read orc file, and convert ColumnVectorBatch to doris block.	2022-10-18 13:47:56 +08:00
Mingyu Chen	d286aa7bf7	[fix](spark-load) no need to filter row group when doing spark load (#13116) 1. Fix issue #13115. 2. Modify the `get_next_block` method of `GenericReader` to return "read_rows" explicitly. Some columns in the block may not be filled by the reader; if the first column is not filled, `block->rows()` cannot return the real row count. 3. Add more checks for broker load test cases.	2022-10-05 23:00:56 +08:00
Mingyu Chen	d80b7b9689	[feature-wip](new-scan) support more load situation (#12953)	2022-09-27 21:48:32 +08:00
Mingyu Chen	c5ad989065	[refactor](reader) refactor the interface of file reader (#12574) Currently, Doris has a variety of readers for different file formats, such as parquet reader, orc reader, csv reader, json reader and so on. The interfaces of these readers are not unified, which makes it impossible to call them through a unified method. In this PR, I added a `GenericReader` interface class, and other Readers will implement this interface class to use the `get_next_block()` method. This PR currently only modifies `arrow_reader` and `parquet reader`. Other readers will be modified one by one in subsequent PRs.	2022-09-14 22:31:11 +08:00
Jibing-Li	9b9ed1aef1	[data lake](arrow scanner) Fix file arrow scanner column index out of range core. (#11691)	2022-08-12 11:34:29 +08:00
huangzhaowei	6eb8ac0ebf	[feature-wip][multi-catalog] Support caseSensitive field name in file scan node (#11310) Implement case sensitivity in file scan node	2022-08-05 08:03:16 +08:00
huangzhaowei	0ac5228c05	[feature-wip][multi-catalog] Support prefetch for orc file format (#11292) Refactor the prefetch code in parquet and support prefetch for orc file format	2022-08-02 11:01:15 +08:00
slothever	c358a43f35	[feature-wip] support parquet predicate push down (#10512)	2022-07-08 23:11:25 +08:00
HappenLee	94089b9192	[Refactor] Use file factory to replace create file reader/writer (#9505) 1. Simplify code logic and improve abstraction 2. Fix the mem leak of raw pointer Co-authored-by: lihaopeng <lihaopeng@baidu.com>	2022-06-08 15:07:39 +08:00
yinzhijian	cbbda7857b	[feature-wip](parquet-orc) Support orc scanner in vectorized engine (#9541)	2022-05-26 21:39:12 +08:00

11 Commits
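
The `GenericReader` refactor (#12574) and the later change to report "read_rows" explicitly (#13116) can be illustrated with a small sketch. This is not the Doris source: `Block`, `PartialFillReader`, `count_rows`, `rows_from_first_column`, and the exact `get_next_block` signature are hypothetical stand-ins, assumed only to show why a caller should trust an explicit row count rather than the size of the first column.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a Doris block: one vector of values per column.
struct Block {
    std::vector<std::vector<int>> columns;
};

// Sketch of a unified reader interface (assumed signature, not the real one).
class GenericReader {
public:
    virtual ~GenericReader() = default;
    // Fills `block`, reports the rows read in `read_rows`, sets `eof` at the end.
    virtual bool get_next_block(Block* block, std::size_t* read_rows, bool* eof) = 0;
};

// Hypothetical reader that fills only the second column and leaves the first
// one empty, mimicking a column that is populated later by another stage.
class PartialFillReader : public GenericReader {
public:
    PartialFillReader(std::vector<int> data, std::size_t batch_size)
            : data_(std::move(data)), batch_size_(batch_size) {}

    bool get_next_block(Block* block, std::size_t* read_rows, bool* eof) override {
        block->columns.assign(2, {});
        const std::size_t n = std::min(batch_size_, data_.size() - pos_);
        block->columns[1].assign(data_.begin() + pos_, data_.begin() + pos_ + n);
        pos_ += n;
        *read_rows = n;  // explicit row count, independent of column 0
        *eof = (n == 0) || (pos_ >= data_.size());
        return true;
    }

private:
    std::vector<int> data_;
    std::size_t batch_size_;
    std::size_t pos_ = 0;
};

// Naive row count derived from the first column only.
std::size_t rows_from_first_column(const Block& block) {
    return block.columns.empty() ? 0 : block.columns[0].size();
}

// Drains any reader through the shared interface, summing the explicit counts.
std::size_t count_rows(GenericReader& reader) {
    std::size_t total = 0;
    bool eof = false;
    while (!eof) {
        Block block;
        std::size_t read_rows = 0;
        if (!reader.get_next_block(&block, &read_rows, &eof)) break;
        total += read_rows;
    }
    return total;
}
```

In this sketch, a caller that relied on `rows_from_first_column` would see zero rows for every batch, while `count_rows` still totals the rows correctly because each reader reports its own count.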