[opt](hudi) use spark bundle to read hudi data (#21260)

Use spark-bundle instead of hive-bundle to read Hudi data.

**Advantages** of using spark-bundle to read Hudi data:
1. spark-bundle is more than twice as fast as hive-bundle.
2. spark-bundle uses `UnsafeRow`, which reduces data copying and JVM GC time.
3. spark-bundle supports `Time Travel`, `Incremental Read`, and `Schema Change`; these features can be quickly ported to Doris.

**Disadvantages** of using spark-bundle to read Hudi data:
1. The additional dependencies make hudi-dependency.jar significantly larger (from 138 MB to 300 MB).
2. spark-bundle only provides an `RDD` interface and cannot be used directly.
Ashin Gau
2023-07-04 17:04:49 +08:00
committed by GitHub
parent 90dd8716ed
commit 9adbca685a
26 changed files with 1524 additions and 540 deletions


```diff
@@ -65,6 +65,9 @@ public class HudiUtils {
                 int scale = ((LogicalTypes.Decimal) logicalType).getScale();
                 return String.format("decimal(%s,%s)", precision, scale);
             } else {
+                if (columnType == Schema.Type.BYTES) {
+                    return "binary";
+                }
                 return "string";
             }
         case ARRAY:
```
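The hunk above extends the Avro-to-Doris type mapping in `HudiUtils`: a `BYTES` column that does not carry a decimal logical type now maps to `binary` instead of falling through to `string`. A minimal self-contained sketch of that branch logic follows; `AvroType` and `toDorisType` are illustrative stand-ins for Avro's `Schema.Type` and the actual `HudiUtils` method, not Doris's real API.

```java
// Sketch of the Avro-to-Doris type mapping shown in the hunk above.
// AvroType and toDorisType are hypothetical stand-ins; only the branch
// logic mirrors the patch.
public class TypeMappingSketch {
    enum AvroType { BYTES, FIXED, STRING }

    static String toDorisType(AvroType columnType, boolean isDecimal,
                              int precision, int scale) {
        if (isDecimal) {
            // A decimal logical type carries precision and scale
            // regardless of the underlying physical type.
            return String.format("decimal(%s,%s)", precision, scale);
        } else if (columnType == AvroType.BYTES) {
            // The patch: raw BYTES without a logical type maps to binary.
            return "binary";
        }
        return "string";
    }

    public static void main(String[] args) {
        System.out.println(toDorisType(AvroType.BYTES, true, 10, 2));  // decimal(10,2)
        System.out.println(toDorisType(AvroType.BYTES, false, 0, 0));  // binary
        System.out.println(toDorisType(AvroType.STRING, false, 0, 0)); // string
    }
}
```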