[opt](hudi) use spark bundle to read hudi data (#21260)

Use spark-bundle instead of hive-bundle to read Hudi data.

**Advantages** of using spark-bundle to read Hudi data:
1. spark-bundle is more than twice as fast as hive-bundle.
2. spark-bundle uses `UnsafeRow`, which reduces data copying and JVM GC time.
3. spark-bundle supports `Time Travel`, `Incremental Read`, and `Schema Change`; these features can be quickly ported to Doris.

**Disadvantages** of using spark-bundle to read Hudi data:
1. The additional dependencies make hudi-dependency.jar significantly larger (from 138 MB to 300 MB).
2. spark-bundle only provides an `RDD` interface and cannot be used directly.
Ashin Gau
2023-07-04 17:04:49 +08:00
committed by GitHub
parent 90dd8716ed
commit 9adbca685a
26 changed files with 1524 additions and 540 deletions


```diff
@@ -65,6 +65,9 @@ public class HudiUtils {
                 int scale = ((LogicalTypes.Decimal) logicalType).getScale();
                 return String.format("decimal(%s,%s)", precision, scale);
             } else {
+                if (columnType == Schema.Type.BYTES) {
+                    return "binary";
+                }
                 return "string";
             }
         case ARRAY:
```
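The hunk above extends the Avro-to-Doris type mapping in `HudiUtils`: a `BYTES` column that does not carry a decimal logical type now maps to `binary` instead of falling through to `string`. A minimal self-contained sketch of that branch logic follows; `AvroType` and `toDorisType` are illustrative stand-ins for Avro's `Schema.Type` and the actual `HudiUtils` method, not Doris's real API.

```java
// Sketch of the Avro-to-Doris type mapping shown in the hunk above.
// AvroType and toDorisType are hypothetical stand-ins; only the branch
// logic mirrors the patch.
public class TypeMappingSketch {
    enum AvroType { BYTES, FIXED, STRING }

    static String toDorisType(AvroType columnType, boolean isDecimal,
                              int precision, int scale) {
        if (isDecimal) {
            // A decimal logical type carries precision and scale
            // regardless of the underlying physical type.
            return String.format("decimal(%s,%s)", precision, scale);
        } else if (columnType == AvroType.BYTES) {
            // The patch: raw BYTES without a logical type maps to binary.
            return "binary";
        }
        return "string";
    }

    public static void main(String[] args) {
        System.out.println(toDorisType(AvroType.BYTES, true, 10, 2));  // decimal(10,2)
        System.out.println(toDorisType(AvroType.BYTES, false, 0, 0));  // binary
        System.out.println(toDorisType(AvroType.STRING, false, 0, 0)); // string
    }
}
```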