doris

Author	SHA1	Message	Date
Mingyu Chen	c5ad989065	[refactor](reader) refactor the interface of file reader (#12574) Currently, Doris has a variety of readers for different file formats, such as the parquet reader, orc reader, csv reader, and json reader. The interfaces of these readers are not unified, so they cannot be called through a common method. This PR adds a `GenericReader` interface class; other readers implement this interface to expose the `get_next_block()` method. This PR currently only modifies `arrow_reader` and `parquet reader`. Other readers will be modified one by one in subsequent PRs.	2022-09-14 22:31:11 +08:00
huangzhaowei	54f878b781	[feature-wip](multi-catalog) Support orc format file split for file scan node (#11046)	2022-07-25 11:41:46 +08:00
slothever	8a366c9ba2	[feature](multi-catalog) read parquet file by start/offset (#10843) To avoid reading the same row group repeatedly, we should align offsets.	2022-07-18 20:51:08 +08:00
Mingyu Chen	8e364fb848	[fix](load) skip empty orc file (#10593) Sometimes the upstream system (e.g., Hive) may create an empty orc file that has only a header and footer, without a schema. If we call `_reader->createRowReader()` with selected columns, it throws ParserError: Invalid column selected xx. So here we first check the number of rows and skip such files. This fix applies only to non-vec load; vec load uses the arrow scanner to read orc files, which does not have this problem.	2022-07-05 22:18:56 +08:00
yinzhijian	cbbda7857b	[feature-wip](parquet-orc) Support orc scanner in vectorized engine (#9541)	2022-05-26 21:39:12 +08:00
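
The `GenericReader` refactor described in c5ad989065 can be pictured with a minimal sketch. Only the names `GenericReader` and `get_next_block()` come from the commit message; the parameter list, the `Block` and `Status` stand-in types, and the `InMemoryReader` and `count_rows` helpers are hypothetical and are not taken from the Doris source.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stand-in types (assumptions): far simpler than Doris's real Block and Status.
struct Block {
    std::vector<std::string> rows;
};

struct Status {
    bool ok = true;
    std::string msg;
};

// Common interface named in the commit message.
// The (Block*, bool*) parameter list is an assumption made for this sketch.
class GenericReader {
public:
    virtual ~GenericReader() = default;
    virtual Status get_next_block(Block* block, bool* eof) = 0;
};

// Hypothetical reader that serves rows from memory in fixed-size batches.
class InMemoryReader : public GenericReader {
public:
    InMemoryReader(std::vector<std::string> rows, std::size_t batch_size)
        : _rows(std::move(rows)), _batch_size(batch_size == 0 ? 1 : batch_size) {}

    Status get_next_block(Block* block, bool* eof) override {
        block->rows.clear();
        for (std::size_t i = 0; i < _batch_size && _pos < _rows.size(); ++i) {
            block->rows.push_back(_rows[_pos++]);
        }
        // Report end of file once every row has been handed out.
        *eof = _pos >= _rows.size();
        return Status{};
    }

private:
    std::vector<std::string> _rows;
    std::size_t _batch_size;
    std::size_t _pos = 0;
};

// A caller that depends only on the interface, not on a concrete file format.
std::size_t count_rows(GenericReader& reader) {
    std::size_t total = 0;
    bool eof = false;
    while (!eof) {
        Block block;
        Status st = reader.get_next_block(&block, &eof);
        if (!st.ok) break;
        total += block.rows.size();
    }
    return total;
}
```

With a shape like this, a scan loop such as `count_rows` would not need to change when a further format-specific reader adopts the interface.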

5 Commits
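
The offset alignment mentioned in 8a366c9ba2 can be illustrated with a small sketch. It assumes one possible ownership rule: a split `[split_start, split_start + split_size)` reads a row group only when the row group's midpoint falls inside the split, so adjacent splits never read the same row group twice. The rule, the `RowGroupRange` struct, and the `select_row_groups` function are hypothetical; the commit message does not say how Doris aligns the offsets.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of where a row group sits in the file.
struct RowGroupRange {
    int64_t start_offset;
    int64_t length;
};

// Return the indexes of the row groups owned by the given split.
// Assumed rule: a row group belongs to the split that contains its midpoint.
std::vector<std::size_t> select_row_groups(const std::vector<RowGroupRange>& groups,
                                           int64_t split_start, int64_t split_size) {
    std::vector<std::size_t> selected;
    const int64_t split_end = split_start + split_size;
    for (std::size_t i = 0; i < groups.size(); ++i) {
        const int64_t mid = groups[i].start_offset + groups[i].length / 2;
        if (mid >= split_start && mid < split_end) {
            selected.push_back(i);
        }
    }
    return selected;
}
```

Because the splits tile the file without overlapping and each midpoint lies in exactly one split, every row group is assigned to exactly one split under this rule, even when a split boundary cuts through the middle of a row group.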