Commit Graph

15 Commits

Author SHA1 Message Date
5789b7e380 [fix](jin) add datetimev2 precision (#29528) 2024-01-06 13:35:26 +08:00
2d2f14bc75 [fix](paimon) use SlotDescriptor to parse the required fields (#28990)
Before this PR, Paimon has created the schema of `VectorTable` by accessing meta information. However, once the schema of `VectorTable` in java is not same as `Block` in c++, BE will crashed, and there is no good way to troubleshoot errors.
2023-12-27 15:45:53 +08:00
0b5fe681e4 [fix](paimon) read batch by doris' batch size (#29039) 2023-12-27 12:35:17 +08:00
f38e11ec4e [fix](paimon)fix type convert for paimon (#28774)
fix type convert for paimon
2023-12-22 13:18:25 +08:00
c459408580 [fix](jni) avoid BE crash and NPE when close paimon reader (#27129)
1. Do not use FATAL log when jni encounter error, to avoid crash.
2. Fix NPE when closing PaimonReader, the reader may not be assigned if PaimonReader open failed.
2023-11-17 20:01:08 +08:00
267c11207b [feature](paimon)paimon catalog supports complex types (#25364) 2023-10-23 17:32:13 +08:00
ce18f1148a [improvement](catalog)compatible with paimon 0.5 (#24985)
compatible with paimon 0.5
add p0 for paimon,need set enablePaimonTest=true
2023-10-17 22:07:13 +08:00
4e8cde127c [Enhance](catalog)add table cache in paimon jni (#25014)
- fix get old schema after refresh paimon table
- add table cache in paimon jni
2023-10-08 10:36:18 +08:00
fa6110accd [fix](catalog)paimon support more data type (#22899) 2023-08-14 13:48:33 +08:00
7fcf702081 [improvement](multi catalog)paimon support filesystem metastore (#21910)
1.support filesystem metastore

2.support predicate and project when split

3.fix partition table query error

todo: Now you need to manually put paimon-s3-0.4.0-incubating.jar in be/lib/java_extensions when use s3 filesystem

doc pr: #21966
2023-07-24 22:02:57 +08:00
9adbca685a [opt](hudi) use spark bundle to read hudi data (#21260)
Use spark-bundle to read hudi data instead of using hive-bundle to read hudi data.

**Advantage** for using spark-bundle to read hudi data:
1. The performance of spark-bundle is more than twice that of hive-bundle
2. spark-bundle using `UnsafeRow` can reduce data copying and GC time of the jvm
3. spark-bundle support `Time Travel`, `Incremental Read`, and `Schema Change`, these functions can be quickly ported to Doris

**Disadvantage** for using spark-bundle to read hudi data:
1. More dependencies make hudi-dependency.jar very cumbersome(from 138M -> 300M)
2. spark-bundle only provides `RDD` interface and cannot be used directly
2023-07-04 17:04:49 +08:00
a6b51ec19a [Feature](avro) Support Apache Avro file format (#19990)
support read avro file by hdfs() or s3() .
```sql
select * from s3(
         "uri" = "http://127.0.0.1:9312/test2/person.avro",
         "ACCESS_KEY" = "ak",
         "SECRET_KEY" = "sk",
         "FORMAT" = "avro");
+--------+--------------+-------------+-----------------+
| name   | boolean_type | double_type | long_type       |
+--------+--------------+-------------+-----------------+
| Alyssa |            1 |     10.0012 | 100000000221133 |
| Ben    |            0 |    5555.999 |      4009990000 |
| lisi   |            0 | 5992225.999 |      9099933330 |
+--------+--------------+-------------+-----------------+

select * from hdfs(
                "uri" = "hdfs://127.0.0.1:9000/input/person2.avro",
                "fs.defaultFS" = "hdfs://127.0.0.1:9000",
                "hadoop.username" = "doris",
                "format" = "avro");
+--------+--------------+-------------+-----------+
| name   | boolean_type | double_type | long_type |
+--------+--------------+-------------+-----------+
| Alyssa |            1 |  8888.99999 |  89898989 |
+--------+--------------+-------------+-----------+
```

current avro reader only support common data type, the complex data types will be supported later.
2023-06-28 21:15:35 +08:00
8f7a62c79b [improvement](mutil-catalog) PaimonColumnValue support short and Decimal (#20723) 2023-06-25 22:31:38 +08:00
923f7edad0 [opt](hudi) using native reader to read the base file with no log file (#20988)
Two optimizations:
1. Insert string bytes directly to remove decoding&encoding process.
2. Use native reader to read the hudi base file if it has no log file. Use `explain` to show how many splits are read natively.
2023-06-20 11:20:21 +08:00
57656b2459 [Enhancement](java-udf) java-udf module split to sub modules (#20185)
The java-udf module has become increasingly large and difficult to manage, making it inconvenient to package and use as needed. It needs to be split into multiple sub-modules, such as : java-commom、java-udf、jdbc-scanner、hudi-scanner、 paimon-scanner.

Co-authored-by: lexluo <lexluo@tencent.com>
2023-06-13 09:41:22 +08:00