[enhancement](array) support read list(Array) type from orc file (#14132)

Before this PR, loading an ORC file that contains native list (array) data crashed the BE.
Complex types in an ORC file are stored as multiple physical columns, so we need to filter columns by column name.
Otherwise we cannot read all the columns we need.
Arrow release-7.0.0 only supports creating a stripe reader by column index, so we patch it to also support creating a stripe reader by column names.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
camby
2022-11-15 17:48:17 +08:00
committed by GitHub
parent 9d70c531a3
commit 3ea9d3f2e1
6 changed files with 121 additions and 2 deletions

@@ -71,11 +71,13 @@ void ArrowReaderWrap::close() {
 Status ArrowReaderWrap::column_indices() {
     _include_column_ids.clear();
+    _include_cols.clear();
     for (auto& slot_desc : _file_slot_descs) {
         // Get the Column Reader for the boolean column
         auto iter = _map_column.find(slot_desc->col_name());
         if (iter != _map_column.end()) {
             _include_column_ids.emplace_back(iter->second);
+            _include_cols.push_back(slot_desc->col_name());
         } else {
             _missing_cols.push_back(slot_desc->col_name());
         }
@@ -136,6 +138,7 @@ Status ArrowReaderWrap::next_batch(std::shared_ptr<arrow::RecordBatch>* batch, b
     while (!_closed && _queue.empty()) {
         if (_batch_eof) {
             _include_column_ids.clear();
+            _include_cols.clear();
             *eof = true;
             return Status::OK();
         }