Use spark-bundle to read hudi data instead of using hive-bundle to read hudi data.
**Advantage** for using spark-bundle to read hudi data:
1. The performance of spark-bundle is more than twice that of hive-bundle
2. spark-bundle using `UnsafeRow` can reduce data copying and GC time of the jvm
3. spark-bundle support `Time Travel`, `Incremental Read`, and `Schema Change`, these functions can be quickly ported to Doris
**Disadvantage** for using spark-bundle to read hudi data:
1. More dependencies make hudi-dependency.jar very cumbersome(from 138M -> 300M)
2. spark-bundle only provides `RDD` interface and cannot be used directly
exclude old netty version
upgrade spring-boot version to 2.7.13
used ojdbc8 replace ojdbc6
upgrade jackson version to 2.15.2
upgrade fabric8 version to 6.7.2
1. Fix MC jni scanner OOM
2. add the second datetime type for MC SDK timestamp
3. make s3 uri case insensitive by the way
4. optimize max compute scanner parallel model
Add JNI metrics, for example:
```
- HudiJniScanner: 0ns
- FillBlockTime: 31.29ms
- GetRecordReaderTime: 1m5s
- JavaScanTime: 35s991ms
- OpenScannerTime: 1m6s
```
Add three common performance metrics for JNI scanner:
1. `OpenScannerTime`: Time to init and open JNI scanner
2. `JavaScanTime`: Time to scan data and insert into vector table in java side
3. `FillBlockTime`: Time to convert java vector table to c++ block
And support user defined metrics in java side, for example: `OpenScannerTime` is a long time for the open process, we want to determine which sub-process takes too much time, so we add `GetRecordReaderTime` in java side.
The user defined metrics in java side can be attached to BE profile automatically.
Two optimizations:
1. Insert string bytes directly to remove decoding&encoding process.
2. Use native reader to read the hudi base file if it has no log file. Use `explain` to show how many splits are read natively.
hudi serializer `org.apache.hudi.common.util.SerializationUtils$KryoInstantiator.newKryo` throws error like `java.lang.IllegalArgumentException: classLoader cannot be null`. Set the default class loader for scan thread.
```
public Kryo newKryo() {
Kryo kryo = new Kryo();
...
// Thread.currentThread().getContextClassLoader() returns null
kryo.setClassLoader(Thread.currentThread().getContextClassLoader());
...
return kryo;
}
```
The java-udf module has become increasingly large and difficult to manage, making it inconvenient to package and use as needed. It needs to be split into multiple sub-modules, such as : java-commom、java-udf、jdbc-scanner、hudi-scanner、 paimon-scanner.
Co-authored-by: lexluo <lexluo@tencent.com>