[opt](hudi) use spark bundle to read hudi data (#21260)
Use the Hudi spark-bundle instead of the hive-bundle to read Hudi data.

**Advantages** of using spark-bundle to read Hudi data:
1. spark-bundle reads are more than twice as fast as hive-bundle reads.
2. spark-bundle uses `UnsafeRow`, which reduces data copying and JVM GC time.
3. spark-bundle supports `Time Travel`, `Incremental Read`, and `Schema Change`; these features can be quickly ported to Doris.

**Disadvantages** of using spark-bundle to read Hudi data:
1. The extra dependencies make hudi-dependency.jar much heavier (from 138M to 300M).
2. spark-bundle only exposes an `RDD` interface and cannot be used directly.
```diff
@@ -65,6 +65,9 @@ public class HudiUtils {
                 int scale = ((LogicalTypes.Decimal) logicalType).getScale();
                 return String.format("decimal(%s,%s)", precision, scale);
             } else {
                 if (columnType == Schema.Type.BYTES) {
                     return "binary";
                 }
                 return "string";
             }
         case ARRAY:
```
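The diff above shows how `HudiUtils` maps Avro types to Doris type strings: a decimal logical type becomes `decimal(precision,scale)`, a raw `BYTES` column becomes `binary`, and anything else falls back to `string`. A minimal standalone sketch of that mapping logic (the class name, enum, and method names here are hypothetical stand-ins for the real Avro `LogicalTypes.Decimal` and `Schema.Type` handling):

```java
// Illustrative sketch of the type mapping shown in the diff above.
// The real code operates on org.apache.avro.Schema objects; this
// stand-in uses plain ints and a local enum instead.
public class TypeMappingSketch {
    enum AvroType { BYTES, FIXED, STRING }

    // Avro decimal logical type maps to a Doris decimal(precision,scale) string.
    static String decimalType(int precision, int scale) {
        return String.format("decimal(%s,%s)", precision, scale);
    }

    // Non-logical types: raw BYTES maps to binary, everything else to string.
    static String fallbackType(AvroType columnType) {
        if (columnType == AvroType.BYTES) {
            return "binary";
        }
        return "string";
    }

    public static void main(String[] args) {
        System.out.println(decimalType(10, 2));            // decimal(10,2)
        System.out.println(fallbackType(AvroType.BYTES));  // binary
        System.out.println(fallbackType(AvroType.STRING)); // string
    }
}
```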