Commit Graph

7 Commits

Author SHA1 Message Date
fa6110accd [fix](catalog)paimon support more data type (#22899) 2023-08-14 13:48:33 +08:00
7fcf702081 [improvement](multi catalog)paimon support filesystem metastore (#21910)
1. Support filesystem metastore.

2. Support predicate and projection when generating splits.

3. Fix partition table query error.

TODO: when using the S3 filesystem, you currently need to manually place paimon-s3-0.4.0-incubating.jar in be/lib/java_extensions.

doc pr: #21966
2023-07-24 22:02:57 +08:00
9adbca685a [opt](hudi) use spark bundle to read hudi data (#21260)
Use spark-bundle instead of hive-bundle to read hudi data.

**Advantages** of using spark-bundle to read hudi data:
1. spark-bundle is more than twice as fast as hive-bundle.
2. spark-bundle uses `UnsafeRow`, which reduces data copying and JVM GC time.
3. spark-bundle supports `Time Travel`, `Incremental Read`, and `Schema Change`; these features can be quickly ported to Doris.

**Disadvantages** of using spark-bundle to read hudi data:
1. The additional dependencies make hudi-dependency.jar much larger (from 138M to 300M).
2. spark-bundle only provides the `RDD` interface and cannot be used directly.
2023-07-04 17:04:49 +08:00
a6b51ec19a [Feature](avro) Support Apache Avro file format (#19990)
Support reading Avro files via hdfs() or s3().
```sql
select * from s3(
         "uri" = "http://127.0.0.1:9312/test2/person.avro",
         "ACCESS_KEY" = "ak",
         "SECRET_KEY" = "sk",
         "FORMAT" = "avro");
+--------+--------------+-------------+-----------------+
| name   | boolean_type | double_type | long_type       |
+--------+--------------+-------------+-----------------+
| Alyssa |            1 |     10.0012 | 100000000221133 |
| Ben    |            0 |    5555.999 |      4009990000 |
| lisi   |            0 | 5992225.999 |      9099933330 |
+--------+--------------+-------------+-----------------+

select * from hdfs(
                "uri" = "hdfs://127.0.0.1:9000/input/person2.avro",
                "fs.defaultFS" = "hdfs://127.0.0.1:9000",
                "hadoop.username" = "doris",
                "format" = "avro");
+--------+--------------+-------------+-----------+
| name   | boolean_type | double_type | long_type |
+--------+--------------+-------------+-----------+
| Alyssa |            1 |  8888.99999 |  89898989 |
+--------+--------------+-------------+-----------+
```

The current Avro reader only supports common data types; complex data types will be supported later.
2023-06-28 21:15:35 +08:00
8f7a62c79b [improvement](mutil-catalog) PaimonColumnValue support short and Decimal (#20723) 2023-06-25 22:31:38 +08:00
923f7edad0 [opt](hudi) using native reader to read the base file with no log file (#20988)
Two optimizations:
1. Insert string bytes directly, removing the decoding and encoding step.
2. Use the native reader to read a hudi base file if it has no log files. Use `explain` to show how many splits are read natively.
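
The second optimization can be illustrated with a minimal sketch. It is not the code from this commit: `HudiSplitPlanner`, `HudiSplit`, `ReaderKind`, and the file names are hypothetical, and the fallback reader for splits that do have log files is assumed here to be a Java-side scanner. The sketch only shows the rule stated above (no log files means the base file can be read natively) and the kind of count that `explain` would report.

```java
import java.util.List;

public class HudiSplitPlanner {
    /** Hypothetical stand-in for a hudi split: one base file plus zero or more log files. */
    static final class HudiSplit {
        final String baseFile;
        final List<String> logFiles;

        HudiSplit(String baseFile, List<String> logFiles) {
            this.baseFile = baseFile;
            this.logFiles = logFiles;
        }
    }

    /** NATIVE: read the base file directly. JAVA_SCANNER: assumed fallback when log files must be merged. */
    enum ReaderKind { NATIVE, JAVA_SCANNER }

    /** A split without log files has nothing to merge, so the native reader is enough. */
    static ReaderKind chooseReader(HudiSplit split) {
        return split.logFiles.isEmpty() ? ReaderKind.NATIVE : ReaderKind.JAVA_SCANNER;
    }

    /** Counts the splits that would be read natively. */
    static long countNativeSplits(List<HudiSplit> splits) {
        return splits.stream()
                .filter(s -> chooseReader(s) == ReaderKind.NATIVE)
                .count();
    }

    public static void main(String[] args) {
        List<HudiSplit> splits = List.of(
                new HudiSplit("part-0001.parquet", List.of()),
                new HudiSplit("part-0002.parquet", List.of(".part-0002.log.1")),
                new HudiSplit("part-0003.parquet", List.of()));
        // Prints "2/3 splits read natively"
        System.out.println(countNativeSplits(splits) + "/" + splits.size() + " splits read natively");
    }
}
```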
2023-06-20 11:20:21 +08:00
57656b2459 [Enhancement](java-udf) java-udf module split to sub modules (#20185)
The java-udf module has become increasingly large and difficult to manage, making it inconvenient to package and use as needed. It needs to be split into multiple sub-modules, such as java-common, java-udf, jdbc-scanner, hudi-scanner, and paimon-scanner.

Co-authored-by: lexluo <lexluo@tencent.com>
2023-06-13 09:41:22 +08:00