Commit Graph

30 Commits

88cfaedb16 [opt](paimon)Optimize the storage location of the serialized paimon table for 2.1 (#44274) (#44660)
bp: #44274
2024-11-27 20:35:35 +08:00
821c0d1380 branch-2.1: [improvement](paimon)Using table serialization on the jni side (#43475)
Cherry-picked from #43167

Co-authored-by: wuwenchi <wuwenchihdu@hotmail.com>
Co-authored-by: wuwenchi <wuwenchi@selectdb.com>
2024-11-12 14:43:32 +08:00
d1e63c5201 [improvement](external)add some improvements for external scan (#38946) (#43156)
bp #38946

Co-authored-by: wuwenchi <wuwenchihdu@hotmail.com>
2024-11-04 09:40:08 +08:00
17d84dc88f [enhance](paimon)paimon scanner code optimization #42606 (#42875)
cherry pick from #42606

Co-authored-by: zhangdong <493738387@qq.com>
2024-10-30 12:51:59 +08:00
14a2a66106 [fix](paimon) fix not able to read paimon data from hdfs with HA (#39806) (#39876)
bp #39806
2024-08-24 17:51:15 +08:00
41fa7bc9fd [bugfix](paimon)Fixed the reading of timestamp with time zone type data for 2.1 (#37716) (#38592)
bp: #37716
2024-08-01 10:23:06 +08:00
4008a04da7 [bugfix](paimon)Fix field case issues for 2.1 (#36288)
bp:  #36239
2024-06-17 18:38:00 +08:00
bd6b913e00 [bugfix](paimon)paimon's field length judgment error for 2.1 (#36049)
bp #35981
2024-06-07 21:13:08 +08:00
72a27a0938 [fix](paimon)fix paimon cache bug (#35309)
Issue Number: close #35024 
This bug occurs because the FE incorrectly sets the update time of the Paimon
catalog, so the BE cannot refresh Paimon's schema in time.
```java
    private void initTable() {
        PaimonTableCacheKey key = new PaimonTableCacheKey(ctlId, dbId, tblId, paimonOptionParams, dbName, tblName);
        TableExt tableExt = PaimonTableCache.getTable(key);
        if (tableExt.getCreateTime() < lastUpdateTime) {
            LOG.warn("invalidate cache table:{}, localTime:{}, remoteTime:{}", key, tableExt.getCreateTime(),
                    lastUpdateTime);
            PaimonTableCache.invalidateTableCache(key);
            tableExt = PaimonTableCache.getTable(key);
        }
        this.table = tableExt.getTable();
        paimonAllFieldNames = PaimonScannerUtils.fieldNames(this.table.rowType());
        if (LOG.isDebugEnabled()) {
            LOG.debug("paimonAllFieldNames:{}", paimonAllFieldNames);
        }
    }
```
2024-05-28 18:52:51 +08:00
11039ade7b [opt](paimon) support mapping Paimon column type "Row" to Doris type "Struct" (#34239)
backport: #33786
2024-04-28 19:38:50 +08:00
9c6180d9ba [revert](jni) revert part of #32455 #32904 2024-03-27 20:45:44 +08:00
c0d7a5660e [fix](paimon) support paimon with hive2 (#32455)
To support Paimon with Hive 2, we need to modify the original HiveMetastoreClient.java
to make it compatible with both Hive 2 and Hive 3.
This modified HiveMetastoreClient must be at the front of the CLASSPATH so that
it overrides the HiveMetastoreClient in the Hadoop jar.

This PR mainly changes:

1. Copy HiveMetastoreClient.java from the FE to the BE's preload jar.

2. Split the original `preload-extensions-jar-with-dependencies.jar` into 2 jars:
    1. `preload-extensions-project.jar`, which contains the modified HiveMetastoreClient.
    2. `preload-extensions-jar-with-dependencies.jar`, which contains the other dependency jars.

3. Modify `start_be.sh` so that `preload-extensions-project.jar` is loaded first.

4. Change the way the JNI scanner jar is assembled.
    Only the project jar needs to be assembled, without other dependencies,
    because we only use classes under the `org.apache.doris` package.
    Removing the unused dependency jars also reduces the BE output size.

5. Fix a bug: the prefix of Paimon properties should be `paimon.`, not `paimon` (see the sketch below).

6. Support Paimon with Hive 2.
    Users can set `hive.version` in the Paimon catalog properties to specify the Hive version.
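
A minimal sketch of the property-prefix fix in point 5, using only plain Java (the map contents and method name are hypothetical): keep the properties whose keys start with `paimon.` and strip that prefix before handing them to Paimon as catalog options, so a key that merely begins with `paimon` no longer matches by accident.
```java
import java.util.HashMap;
import java.util.Map;

public class PaimonPropertyPrefixSketch {
    // The prefix ends with a dot, so "paimon.warehouse" matches but "paimonish_key" does not.
    private static final String PAIMON_PREFIX = "paimon.";

    // Keep only properties starting with "paimon." and strip the prefix
    // before passing them on as Paimon catalog options.
    static Map<String, String> extractPaimonOptions(Map<String, String> catalogProps) {
        Map<String, String> options = new HashMap<>();
        for (Map.Entry<String, String> e : catalogProps.entrySet()) {
            if (e.getKey().startsWith(PAIMON_PREFIX)) {
                options.put(e.getKey().substring(PAIMON_PREFIX.length()), e.getValue());
            }
        }
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("paimon.warehouse", "hdfs:///user/paimon/warehouse");
        props.put("hive.version", "2.3.7"); // consumed elsewhere, not a paimon.* option
        System.out.println(extractPaimonOptions(props)); // {warehouse=hdfs:///user/paimon/warehouse}
    }
}
```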
2024-03-26 15:31:07 +08:00
4c8aaa156a [fix](jni) remove 'push_down_predicates' and fix BE crash with decimal predicate (#32253) (#32599) 2024-03-21 14:07:50 +08:00
02bded2688 [Improve](common)Optimize logging performance with LOG.isDebugEnabled() (#31091)
* [Improve](common)Optimize logging performance with LOG.isDebugEnabled()

* fix error ut
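
The title describes the standard debug-log guard, the same pattern visible in the Paimon scanner snippet above: check `LOG.isDebugEnabled()` before building debug-only arguments, so the formatting cost is skipped when debug logging is off. A minimal sketch, assuming a Log4j2 logger and a hypothetical field list:
```java
import java.util.List;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class DebugLogGuardSketch {
    private static final Logger LOG = LogManager.getLogger(DebugLogGuardSketch.class);

    void logFieldNames(List<String> fieldNames) {
        // Without the guard, the (potentially expensive) argument below is still
        // evaluated even when debug logging is disabled.
        if (LOG.isDebugEnabled()) {
            LOG.debug("paimonAllFieldNames:{}", String.join(",", fieldNames));
        }
    }
}
```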
2024-02-20 09:16:14 +08:00
11f7b36dab [bugfix](paimon)add class loader (#30483) 2024-01-30 15:33:40 +08:00
5789b7e380 [fix](jni) add datetimev2 precision (#29528) 2024-01-06 13:35:26 +08:00
2d2f14bc75 [fix](paimon) use SlotDescriptor to parse the required fields (#28990)
Before this PR, the Paimon scanner created the schema of `VectorTable` by accessing table meta information. However, if the schema of `VectorTable` on the Java side does not match the `Block` on the C++ side, the BE will crash, and there is no good way to troubleshoot the error.
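
A heavily hedged sketch of the idea with hypothetical names (not the actual Doris classes): resolve the Java-side columns from the required field names the BE derives from its SlotDescriptors, so both sides describe the same columns in the same order instead of relying on table metadata alone.
```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RequiredFieldsSketch {
    // Resolve column types in the exact order requested by the BE (derived from its
    // SlotDescriptors), instead of taking the full schema from table metadata.
    static List<String> resolveRequiredTypes(Map<String, String> tableSchema, String[] requiredFields) {
        List<String> types = new ArrayList<>();
        for (String name : requiredFields) {
            String type = tableSchema.get(name);
            if (type == null) {
                throw new IllegalArgumentException("unknown required field: " + name);
            }
            types.add(type); // same column count and order as the BE-side Block expects
        }
        return types;
    }

    public static void main(String[] args) {
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("id", "bigint");
        schema.put("name", "string");
        schema.put("price", "decimal(10,2)");
        // The BE only asks for two of the three columns, in its own order.
        System.out.println(resolveRequiredTypes(schema, new String[] {"price", "id"}));
    }
}
```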
2023-12-27 15:45:53 +08:00
0b5fe681e4 [fix](paimon) read batch by doris' batch size (#29039) 2023-12-27 12:35:17 +08:00
f38e11ec4e [fix](paimon)fix type convert for paimon (#28774)
fix type convert for paimon
2023-12-22 13:18:25 +08:00
c459408580 [fix](jni) avoid BE crash and NPE when close paimon reader (#27129)
1. Do not log at FATAL level when JNI encounters an error, to avoid crashing the BE.
2. Fix an NPE when closing PaimonReader: the reader may not have been assigned if opening it failed.
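
A minimal sketch of the null-guarded close in point 2, with a hypothetical reader field rather than the actual Doris class: if opening the reader threw before it was assigned, `close()` must not dereference it, and a close failure is logged instead of treated as fatal.
```java
public class SafeCloseSketch {
    private AutoCloseable reader; // stays null if open() failed before assignment

    public void close() {
        // Guard against the reader never having been created.
        if (reader == null) {
            return;
        }
        try {
            reader.close();
        } catch (Exception e) {
            // Log and continue rather than escalating, so a close failure
            // does not take the whole BE process down.
            System.err.println("failed to close reader: " + e.getMessage());
        } finally {
            reader = null;
        }
    }
}
```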
2023-11-17 20:01:08 +08:00
267c11207b [feature](paimon)paimon catalog supports complex types (#25364) 2023-10-23 17:32:13 +08:00
ce18f1148a [improvement](catalog)compatible with paimon 0.5 (#24985)
Compatible with Paimon 0.5.
Add p0 regression tests for Paimon; enablePaimonTest=true must be set to run them.
2023-10-17 22:07:13 +08:00
4e8cde127c [Enhance](catalog)add table cache in paimon jni (#25014)
- Fix getting a stale schema after refreshing a Paimon table
- Add a table cache in the Paimon JNI scanner
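
A minimal sketch of the caching idea using only the JDK (key and value types are hypothetical): each entry records when it was loaded, so a caller can compare that timestamp against the catalog's last refresh time and invalidate stale entries, which is what the `PaimonTableCache` snippet earlier in this log relies on.
```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class TableCacheSketch<K, V> {
    // A cache entry remembers when the table object was loaded, so callers can
    // invalidate it if the catalog was refreshed afterwards.
    public static final class Entry<V> {
        public final V value;
        public final long createTimeMs;
        Entry(V value) {
            this.value = value;
            this.createTimeMs = System.currentTimeMillis();
        }
    }

    private final Map<K, Entry<V>> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    public TableCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    public Entry<V> get(K key) {
        // Load the table at most once per key until it is invalidated.
        return cache.computeIfAbsent(key, k -> new Entry<>(loader.apply(k)));
    }

    public void invalidate(K key) {
        cache.remove(key);
    }
}
```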
2023-10-08 10:36:18 +08:00
fa6110accd [fix](catalog)paimon support more data type (#22899) 2023-08-14 13:48:33 +08:00
7fcf702081 [improvement](multi catalog)paimon support filesystem metastore (#21910)
1. Support the filesystem metastore.

2. Support predicate pushdown and projection when generating splits (see the sketch below).

3. Fix a query error on partitioned tables.

TODO: for now, paimon-s3-0.4.0-incubating.jar must be placed manually in be/lib/java_extensions when using the S3 filesystem.

doc pr: #21966
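
A hedged sketch of point 2. It assumes Paimon's `ReadBuilder` API (`withFilter`, `withProjection`, `newScan().plan().splits()`), whose exact signatures may differ between Paimon versions: push the predicate and the projected column indexes down before planning, so files that cannot match are skipped and only the required columns are read.
```java
import java.util.List;

import org.apache.paimon.predicate.Predicate;
import org.apache.paimon.table.Table;
import org.apache.paimon.table.source.ReadBuilder;
import org.apache.paimon.table.source.Split;

public class PushDownSplitSketch {
    static List<Split> planSplits(Table table, Predicate filter, int[] projection) {
        ReadBuilder builder = table.newReadBuilder()
                .withFilter(filter)          // skip files whose statistics cannot match
                .withProjection(projection); // read only the required columns
        return builder.newScan().plan().splits();
    }
}
```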
2023-07-24 22:02:57 +08:00
9adbca685a [opt](hudi) use spark bundle to read hudi data (#21260)
Use spark-bundle instead of hive-bundle to read Hudi data.

**Advantages** of using spark-bundle to read Hudi data:
1. The performance of spark-bundle is more than twice that of hive-bundle.
2. spark-bundle uses `UnsafeRow`, which reduces data copying and JVM GC time.
3. spark-bundle supports `Time Travel`, `Incremental Read`, and `Schema Change`; these features can be ported to Doris quickly.

**Disadvantages** of using spark-bundle to read Hudi data:
1. More dependencies make hudi-dependency.jar much larger (from 138 MB to 300 MB).
2. spark-bundle only provides the `RDD` interface and cannot be used directly.
2023-07-04 17:04:49 +08:00
a6b51ec19a [Feature](avro) Support Apache Avro file format (#19990)
Support reading Avro files via the hdfs() or s3() table functions.
```sql
select * from s3(
         "uri" = "http://127.0.0.1:9312/test2/person.avro",
         "ACCESS_KEY" = "ak",
         "SECRET_KEY" = "sk",
         "FORMAT" = "avro");
+--------+--------------+-------------+-----------------+
| name   | boolean_type | double_type | long_type       |
+--------+--------------+-------------+-----------------+
| Alyssa |            1 |     10.0012 | 100000000221133 |
| Ben    |            0 |    5555.999 |      4009990000 |
| lisi   |            0 | 5992225.999 |      9099933330 |
+--------+--------------+-------------+-----------------+

select * from hdfs(
                "uri" = "hdfs://127.0.0.1:9000/input/person2.avro",
                "fs.defaultFS" = "hdfs://127.0.0.1:9000",
                "hadoop.username" = "doris",
                "format" = "avro");
+--------+--------------+-------------+-----------+
| name   | boolean_type | double_type | long_type |
+--------+--------------+-------------+-----------+
| Alyssa |            1 |  8888.99999 |  89898989 |
+--------+--------------+-------------+-----------+
```

The current Avro reader only supports common data types; complex data types will be supported later.
2023-06-28 21:15:35 +08:00
8f7a62c79b [improvement](multi-catalog) PaimonColumnValue support short and Decimal (#20723) 2023-06-25 22:31:38 +08:00
923f7edad0 [opt](hudi) using native reader to read the base file with no log file (#20988)
Two optimizations:
1. Insert string bytes directly to avoid the decode/encode round trip (see the sketch below).
2. Use the native reader for Hudi base files that have no log file. Use `explain` to show how many splits are read natively.
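
A minimal sketch of optimization 1 with hypothetical method names: the column value is already valid UTF-8, so its bytes can be handed through directly instead of decoding to a Java `String` and encoding again.
```java
import java.nio.charset.StandardCharsets;

public class StringBytesSketch {
    // Before: decode to a String, then encode again when writing to the output vector.
    static byte[] copyViaString(byte[] utf8) {
        String s = new String(utf8, StandardCharsets.UTF_8); // allocation + decode
        return s.getBytes(StandardCharsets.UTF_8);           // second allocation + encode
    }

    // After: the bytes are already valid UTF-8, so pass them through untouched.
    static byte[] copyDirectly(byte[] utf8) {
        return utf8;
    }
}
```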
2023-06-20 11:20:21 +08:00
57656b2459 [Enhancement](java-udf) java-udf module split to sub modules (#20185)
The java-udf module has become increasingly large and difficult to manage, making it inconvenient to package and use as needed. It needs to be split into multiple sub-modules, such as java-common, java-udf, jdbc-scanner, hudi-scanner, and paimon-scanner.

Co-authored-by: lexluo <lexluo@tencent.com>
2023-06-13 09:41:22 +08:00