[feature-wip](parquet-reader) bug fix, create compress codec before parsing dictionary (#12422)

## Fixes five bugs:
1. Parquet dictionary data may be compressed, but `ColumnChunkReader` tried to parse the dictionary data before creating the compression codec, causing unexpected data errors.
2. `FE` doesn't resolve array types when mapping Hive column types.
3. `ParquetFileHdfsScanner` doesn't fill partition values when the table is partitioned.
4. `ParquetFileHdfsScanner` sets `_scanner_eof = true` when a scan range is empty, which ends the whole scanner prematurely and results in data loss.
5. A typographical error in `PageReader`.
Ashin Gau
2022-09-08 09:54:25 +08:00
committed by GitHub
parent d40a9d0555
commit dd2f834c79
8 changed files with 56 additions and 34 deletions


```diff
@@ -813,6 +813,12 @@ public class HiveMetaStoreClientHelper {
             default:
                 break;
         }
+        if (lowerCaseType.startsWith("array")) {
+            if (lowerCaseType.indexOf("<") == 5 && lowerCaseType.lastIndexOf(">") == lowerCaseType.length() - 1) {
+                Type innerType = hiveTypeToDorisType(lowerCaseType.substring(6, lowerCaseType.length() - 1));
+                return ArrayType.create(innerType, true);
+            }
+        }
         if (lowerCaseType.startsWith("char")) {
             ScalarType type = ScalarType.createType(PrimitiveType.CHAR);
             Matcher match = digitPattern.matcher(lowerCaseType);
```
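The array branch in the hunk above rests on simple string surgery: `"array".length()` is 5, so for a well-formed type string like `array<int>` the `<` must sit at index 5 and the matching `>` must be the last character; the inner type is then the substring between them and is fed back into `hiveTypeToDorisType` recursively. A minimal standalone sketch of just that extraction step (the class name, `innerType` helper, and `main` are mine for illustration, not part of the patch):

```java
public class HiveTypeParser {
    // Mirrors the check added in HiveMetaStoreClientHelper: accept only
    // strings of the exact shape "array<...>" and return the inner type,
    // or null when the string is not an array type.
    public static String innerType(String lowerCaseType) {
        if (lowerCaseType.startsWith("array")
                && lowerCaseType.indexOf("<") == 5
                && lowerCaseType.lastIndexOf(">") == lowerCaseType.length() - 1) {
            // Strip "array<" (6 chars) and the trailing ">".
            return lowerCaseType.substring(6, lowerCaseType.length() - 1);
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(innerType("array<int>"));        // int
        System.out.println(innerType("array<array<int>>")); // array<int>
        System.out.println(innerType("map<int,string>"));   // null
    }
}
```

Because `lastIndexOf(">")` anchors on the final `>`, nested types such as `array<array<int>>` peel off one level per call, which is why the real code can recurse into `hiveTypeToDorisType` to build nested `ArrayType`s.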