Commit Graph

9 Commits

Author SHA1 Message Date
6d925054de [feature-wip](parquet-reader) decode parquet time & datetime & decimal (#11845)
1. Spark can set the timestamp precision by the following configuration:
spark.sql.parquet.outputTimestampType = INT96(NANOS), TIMESTAMP_MICROS, TIMESTAMP_MILLIS
DATETIME V1 only keeps the second precision, DATETIME V2 keeps the microsecond precision.
2. If using DECIMAL V2, the BE saves the value as decimal128, and keeps the precision of decimal as (precision=27, scale=9). DECIMAL V3 can maintain the right precision of decimal
2022-08-22 10:15:35 +08:00
124b4f7694 [feature-wip](parquet-reader) row group reader ut finish (#11887)
Co-authored-by: jinzhe <jinzhe@selectdb.com>
2022-08-18 17:18:14 +08:00
f39f57636b [feature-wip](parquet-reader) update column read model and add page index (#11601) 2022-08-16 15:04:07 +08:00
0b9bfd15b7 [feature-wip](parquet-reader) parquet physical type to doris logical type (#11769)
Two improvements have been added:
1. Translate parquet physical type into doris logical type.
2. Decode parquet column chunk into doris ColumnPtr, and add unit tests to show how to use related API.
2022-08-15 16:08:11 +08:00
8f5aed27ec [feature-wip](parquet-reader)read and decode parquet physical type (#11637)
# Proposed changes

Read and decode parquet physical type.
1. The encoding type of boolean is bit-packing, this PR introduces the implementation of bit-packing from Impala
2. Create a parquet including all the primitive types supported by hive

## Remaining Problems
1. At present, only physical types are decoded, and there is no corresponding and conversion methods with doris logical.
2. No parsing and processing Decimal type / Timestamp / Date.
3. Int_8 / Int_16 is stored as Int_32. How to resolve these types.
2022-08-11 10:17:32 +08:00
37d1180cca [feature-wip](parquet-reader)decode parquet data (#11536) 2022-08-08 12:44:06 +08:00
e8a344b683 [feature-wip](parquet-reader) add predicate filter and column reader (#11488) 2022-08-08 10:21:24 +08:00
44a1a20e65 [feature-wip](parquet-reader)parse parquet schema (#11381)
Analyze schema elements in parquet FileMetaData, and generate the hierarchy of nested fields.
For exmpale:
1. primitive type
```
// thrift:
optional int32 <column-name>;
// sql definition:
<column-name> int32;
```
2. nested type
```
// thrift:
optional group <column-name> (LIST) {
  repeated group bag {
    optional group array_element (LIST) {
      repeated group bag {
        optional int32 array_element
      }
    }
  }
}
// sql definition:
<column-name> array<array<int32>>
```
2022-08-02 10:56:13 +08:00
e4bc3f6b6f [feature-wip] (parquet-reader) add parquet reader impl template (#11285) 2022-07-29 14:30:31 +08:00