doris

Author	SHA1	Message	Date
zy-kkk	0be349e250	[feature](jdbc) Support jdbc catalog to read json types (#21341 )	2023-07-10 16:21:00 +08:00
zhangstar333	bb985cd9a1	[refactor](udf) refactor java-udf execute method by using for loop (#21388 )	2023-07-07 11:43:11 +08:00
Ashin Gau	0084b9fd9a	[fix](hudi) scala can't call Properties.putAll in jdk11 (#21494 )	2023-07-05 10:53:09 +08:00
Ashin Gau	9adbca685a	[opt](hudi) use spark bundle to read hudi data (#21260 ) Use spark-bundle to read hudi data instead of using hive-bundle to read hudi data. Advantage for using spark-bundle to read hudi data: 1. The performance of spark-bundle is more than twice that of hive-bundle 2. spark-bundle using `UnsafeRow` can reduce data copying and GC time of the jvm 3. spark-bundle support `Time Travel`, `Incremental Read`, and `Schema Change`, these functions can be quickly ported to Doris Disadvantage for using spark-bundle to read hudi data: 1. More dependencies make hudi-dependency.jar very cumbersome(from 138M -> 300M) 2. spark-bundle only provides `RDD` interface and cannot be used directly	2023-07-04 17:04:49 +08:00
Calvin Kirs	e4c0a0ac24	[improve](dependency)Upgrade dependency version (#21431 ) exclude old netty version upgrade spring-boot version to 2.7.13 used ojdbc8 replace ojdbc6 upgrade jackson version to 2.15.2 upgrade fabric8 version to 6.7.2	2023-07-04 11:29:21 +08:00
DongLiang-0	a6b51ec19a	[Feature](avro) Support Apache Avro file format (#19990 ) support read avro file by hdfs() or s3() . ```sql select * from s3( "uri" = "http://127.0.0.1:9312/test2/person.avro", "ACCESS_KEY" = "ak", "SECRET_KEY" = "sk", "FORMAT" = "avro"); +--------+--------------+-------------+-----------------+ | name | boolean_type | double_type | long_type | +--------+--------------+-------------+-----------------+ | Alyssa | 1 | 10.0012 | 100000000221133 | | Ben | 0 | 5555.999 | 4009990000 | | lisi | 0 | 5992225.999 | 9099933330 | +--------+--------------+-------------+-----------------+ select * from hdfs( "uri" = "hdfs://127.0.0.1:9000/input/person2.avro", "fs.defaultFS" = "hdfs://127.0.0.1:9000", "hadoop.username" = "doris", "format" = "avro"); +--------+--------------+-------------+-----------+ | name | boolean_type | double_type | long_type | +--------+--------------+-------------+-----------+ | Alyssa | 1 | 8888.99999 | 89898989 | +--------+--------------+-------------+-----------+ ``` current avro reader only support common data type, the complex data types will be supported later.	2023-06-28 21:15:35 +08:00
slothever	2b3c82f57a	[fix](multi-catalog)fix max compute scanner OOM and datetime (#20957 ) 1. Fix MC jni scanner OOM 2. add the second datetime type for MC SDK timestamp 3. make s3 uri case insensitive by the way 4. optimize max compute scanner parallel model	2023-06-26 13:53:29 +08:00
yuxuan-luo	8f7a62c79b	[improvement](multi-catalog) PaimonColumnValue support short and Decimal (#20723 )	2023-06-25 22:31:38 +08:00
Ashin Gau	ef17289925	[feature](jni) add jni metrics and attach to BE profile automatically (#21004 ) Add JNI metrics, for example: ``` - HudiJniScanner: 0ns - FillBlockTime: 31.29ms - GetRecordReaderTime: 1m5s - JavaScanTime: 35s991ms - OpenScannerTime: 1m6s ``` Add three common performance metrics for JNI scanner: 1. `OpenScannerTime`: Time to init and open JNI scanner 2. `JavaScanTime`: Time to scan data and insert into vector table in java side 3. `FillBlockTime`: Time to convert java vector table to c++ block And support user defined metrics in java side, for example: `OpenScannerTime` is a long time for the open process, we want to determine which sub-process takes too much time, so we add `GetRecordReaderTime` in java side. The user defined metrics in java side can be attached to BE profile automatically.	2023-06-21 11:19:02 +08:00
zy-kkk	53b2fe5db6	[improvement](jdbc) Set the JDBC connection timeout to be conf (#21000 )	2023-06-20 14:23:48 +08:00
Ashin Gau	923f7edad0	[opt](hudi) using native reader to read the base file with no log file (#20988 ) Two optimizations: 1. Insert string bytes directly to remove decoding&encoding process. 2. Use native reader to read the hudi base file if it has no log file. Use `explain` to show how many splits are read natively.	2023-06-20 11:20:21 +08:00
zy-kkk	d9b3c2aba2	[improvement](jdbc) support get mysql information_schema's table and clickhouse system's table (#20768 )	2023-06-15 14:53:51 +08:00
zy-kkk	09d187ec77	[improvement](ck jdbc) Optimized reading of datetime and ip types of the ClickHouse JDBC Catalog (#20804 )	2023-06-14 23:28:08 +08:00
Ashin Gau	062641e8f8	[fix](hudi) set default class loader for hudi serializer (#20680 ) hudi serializer `org.apache.hudi.common.util.SerializationUtils$KryoInstantiator.newKryo` throws error like `java.lang.IllegalArgumentException: classLoader cannot be null`. Set the default class loader for scan thread. ``` public Kryo newKryo() { Kryo kryo = new Kryo(); ... // Thread.currentThread().getContextClassLoader() returns null kryo.setClassLoader(Thread.currentThread().getContextClassLoader()); ... return kryo; } ```	2023-06-14 16:02:56 +08:00
lexluo09	57656b2459	[Enhancement](java-udf) java-udf module split to sub modules (#20185 ) The java-udf module has become increasingly large and difficult to manage, making it inconvenient to package and use as needed. It needs to be split into multiple sub-modules, such as: java-common, java-udf, jdbc-scanner, hudi-scanner, paimon-scanner. Co-authored-by: lexluo <lexluo@tencent.com>	2023-06-13 09:41:22 +08:00

15 Commits