doris

Author	SHA1	Message	Date
wuwenchi	11f7b36dab	[bugfix](paimon)add class loader (#30483)	2024-01-30 15:33:40 +08:00
slothever	b1a9370004	[fix](glue)support access glue iceberg with credential list (#30473) merge from #30292	2024-01-28 18:23:07 +08:00
zy-kkk	bc03354be8	[improvement](jdbc catalog) Optimize the Close logic of JDBC client (#30236) Optimize the Close logic of the JDBC client so that the Jdbc Catalog can correctly cancel the running query when the query is cancelled.	2024-01-23 13:22:14 +08:00
zy-kkk	0ccd706a30	[Enhancement](Jdbc Catalog) Map Jdbc Catalog JSON Type to String for Improved Performance and Compatibility (#30035) This PR maps external catalog JSON types to String instead of JsonB in Apache Doris. The motivation is that JDBC retrieves JSON data as a JSON string regardless of its storage format (Json(String) or Json(Binary)). Mapping to String streamlines data retrieval, simplifies write-backs, and keeps compatibility with all JSON(String) and JSON(Binary) functions, although Doris may then display JSON data as Strings, which can be misleading. This approach avoids the performance overhead and complexity of converting each row of data from JsonB to String. About upgrading: to keep queries on existing Catalogs working in the upgraded version, we currently still retain the capability to query external JSON types as JSONB. However, once you upgrade and either refresh the Catalog or create a new one, all external JSON types will be treated as Strings. For consistent behavior, and because support for querying JSON as JSONB may be removed in the future, it is highly recommended that you manually refresh your Catalog as soon as possible after upgrading.	2024-01-18 12:03:07 +08:00
Mingyu Chen	12af86176a	[fix](class-loader) fix class loader conflict on BE side (#29942) 1. Make `hadoop-common` in the BE java extension `provided`. 2. The BE java extension jars must be loaded before the hadoop jars.	2024-01-14 15:53:33 +08:00
zy-kkk	8fc9c18c85	[improvement](jdbc catalog) Put the jdbc connection pool parameters into catalog properties (#29195)	2024-01-12 11:40:28 +08:00
Ashin Gau	5789b7e380	[fix](jin) add datetimev2 precision (#29528)	2024-01-06 13:35:26 +08:00
Ashin Gau	2d2f14bc75	[fix](paimon) use SlotDescriptor to parse the required fields (#28990) Before this PR, Paimon created the schema of `VectorTable` by accessing meta information. However, if the schema of `VectorTable` in Java did not match the `Block` in C++, the BE would crash, and there was no good way to troubleshoot the error.	2023-12-27 15:45:53 +08:00
wuwenchi	0b5fe681e4	[fix](paimon) read batch by doris' batch size (#29039)	2023-12-27 12:35:17 +08:00
zy-kkk	10623ad671	[improvement](jdbc catalog) Optimize connection pool caching logic (#28859) The old caching logic used only jdbcurl, user, and password as cache keys, so an old connection could still be used after the jar package was replaced. The key is now built by concatenating all the parameters required for the connection pool.	2023-12-26 14:12:37 +08:00
Ashin Gau	f30e50676e	[opt](scanner) optimize the number of threads of scanners (#28640) 1. Remove `doris_max_remote_scanner_thread_pool_thread_num` and use only `doris_scanner_thread_pool_thread_num`. 2. Set the default value of `doris_scanner_thread_pool_thread_num` to `std::max(48, CpuInfo::num_cores() * 4)`.	2023-12-26 10:24:12 +08:00
zhangstar333	726a9b96c2	[enhancement](udf) add prepare function for java-udf (#28750)	2023-12-22 22:15:59 +08:00
wuwenchi	f38e11ec4e	[fix](paimon)fix type convert for paimon (#28774)	2023-12-22 13:18:25 +08:00
wudongliang	49eed98c1e	[fix](tvf)Fixed the avro-scanner projection pushdown failing to query on multiple BEs (#28709)	2023-12-20 19:39:26 +08:00
wudongliang	111185407c	[Improve](tvf)jni-avro support split file (#27933)	2023-12-19 16:37:34 +08:00
zy-kkk	3e1e8d2ebe	[fix](jdbc catalog) Fixed data conversion problem when all data is null (#28230)	2023-12-11 17:57:57 +08:00
slothever	1706699e7e	[fix](multi-catalog)support the max compute partition prune (#27154) 1. MaxCompute partition pruning: previously, MC partitions could only be filtered by '=', which selects just one partition; partition pruning is needed to support multiple-partition filters and range operators ('>', '<', '>=', ...). 2. Add a MaxCompute row count cache and a partitionValues cache. 3. Add a MaxCompute regression case.	2023-12-01 22:28:26 +08:00
wudongliang	cd6c61347d	[Feature](tvf)(avro-jni) avro-jni add projection push down (#26885)	2023-11-27 10:33:27 +08:00
slothever	add6bdb240	[fix](multi-catalog)add the max compute fe ut and fix download expired (#27007) 1. Add the MaxCompute FE unit tests and fix the expired-download issue. 2. Fix a memory leak when the allocator is closed. 3. Add correct partition rows.	2023-11-20 10:42:07 +08:00
Mingyu Chen	c459408580	[fix](jni) avoid BE crash and NPE when close paimon reader (#27129) 1. Do not use FATAL logs when JNI encounters an error, to avoid a crash. 2. Fix an NPE when closing PaimonReader: the reader may not be assigned if PaimonReader failed to open.	2023-11-17 20:01:08 +08:00
zy-kkk	df867a1531	[fix](catalog) Fix ClickHouse DataTime64 precision parsing (#26977)	2023-11-15 10:23:21 +08:00
zy-kkk	2f32a721ee	[refactor](jni) unified jni framework for jdbc catalog (#26317) This commit overhauls the JDBC connector logic within our project, transitioning from the previous mechanism of fetching data through JNI calls for individual ResultSet items to a more efficient and unified approach using the VectorTable data structure.	2023-11-13 14:28:15 +08:00
zy-kkk	8434389358	[fix](jdbc) fix clickhouse catalog arr nullable and add case (#26639)	2023-11-09 19:32:05 +08:00
wudongliang	22bf2889e5	[feature](tvf)(jni-avro)jni-avro scanner add complex data types (#26236) Support avro's enum, record, and union data types.	2023-11-09 13:58:49 +08:00
Gabriel	809510f8b2	[bug](udf) Fix method invoking (#26131)	2023-10-31 11:46:14 +08:00
DongLiang-0	267c11207b	[feature](paimon)paimon catalog supports complex types (#25364)	2023-10-23 17:32:13 +08:00
Ashin Gau	a2ceea5951	[refactor](jni) unified jni framework for java udaf (#25591) Following https://github.com/apache/doris/pull/25302, use the unified jni framework to refactor java udaf. This PR removes the old interfaces for running java udf/udaf. Thanks to the ease of use of the new framework, the core code for modifying UDAF does not exceed 100 lines, and the logic is similar to that of UDF.	2023-10-20 16:13:40 +08:00
Ashin Gau	47689fd452	[refactor](jni) unified jni framework for java udf (#25302) Use the unified jni framework to refactor java udf. The unified jni framework takes VectorTable as the container to transform data between C++ and Java, and hides the details of data format conversion. In addition, the unified framework supports complex and nested types. The performance of basic types remains consistent, with a 30% improvement in string types and an order of magnitude improvement in complex types.	2023-10-18 09:27:54 +08:00
slothever	18c2a13e09	[fix](multi-catalog)fix maxcompute partition filter and session creation (#24911) Add MaxCompute partition support; fix the MaxCompute partition filter; modify the MaxCompute session creation method.	2023-10-17 22:36:10 +08:00
zhangdong	ce18f1148a	[improvement](catalog)compatible with paimon 0.5 (#24985) Compatible with Paimon 0.5; add p0 tests for Paimon, which require setting enablePaimonTest=true.	2023-10-17 22:07:13 +08:00
Ashin Gau	522faa8cd2	[fix](jni) the offset in map type is int64 (#25394) The offset in a map type column is int64, but #24810 stored it as int32, causing errors like:	2023-10-13 14:23:17 +08:00
zhangdong	4e8cde127c	[Enhance](catalog)add table cache in paimon jni (#25014) - fix getting the old schema after refreshing a paimon table - add table cache in paimon jni	2023-10-08 10:36:18 +08:00
Ashin Gau	26818de9c8	[feature](jni) support complex types in jni framework (#24810) Support complex types in the jni framework, and successfully run end-to-end on hudi. How to use: other scanners only need to implement three interfaces in `ColumnValue`: `void unpackArray(List<ColumnValue> values);` (get the array elements and append them into values); `void unpackMap(List<ColumnValue> keys, List<ColumnValue> values);` (get the map's key array and value array, and append them into keys and values); `void unpackStruct(List<Integer> structFieldIndex, List<ColumnValue> values);` (get the struct fields specified by `structFieldIndex`, and append them into values). Developers can take `HudiColumnValue` as an example.	2023-09-27 14:47:41 +08:00
lsy3993	1f8e0b48bc	[fix](S3)delete main function because hardcoded ip is not safe (#24872)	2023-09-26 10:49:16 +08:00
Calvin Kirs	c832e018d0	[Dependence](Fe)Upgrade Fe dependencies (#24606) BE scanner: upgrade avro to 1.11.2. FE: upgrade quartz to 2.5.0-rc1; upgrade maxcompute to 0.45-2-publish; bind avro-ipc to 1.11.2. Also bind the hbase version to 2.5.5 and the nimbusds version to 9.35.	2023-09-22 10:14:42 +08:00
Mryange	ee56783629	[fix](Java UDF) Do not use enum as the data type for JavaUdfDataType. (#24460)	2023-09-19 14:06:02 +08:00
slothever	4816ca6679	[fix](multi-catalog)fix mc decimal type parse, fix wrong obj location (#24242) 1. The MC decimal type needs to be parsed correctly with the arrow vector method. 2. Fix the wrong object location when using oss, obs, or cosn. A test case will be added in another PR.	2023-09-15 17:44:56 +08:00
zy-kkk	dbfacdc4af	[improvement](jdbc catalog) Optimize Loop Performance by Caching `isNebula` Method Result (#24260)	2023-09-13 21:40:28 +08:00
slothever	fca34ec337	[fix](multi-catalog)support bit type and hidden mc secret key (#24124) Support the MaxCompute bit type and mask the MC secret key: the bool type uses the bit arrow vector, and the secret key should be masked. Closes #24019.	2023-09-12 10:36:48 +08:00
Ashin Gau	6e28d878b5	[fix](hudi) compatible with hudi spark configuration and support skip merge (#24067) Fix three bugs: 1. A Hudi slice may contain only log files, so `new Path(filePath)` will throw errors. 2. Hive column names are lowercase only, so match column names case-insensitively. 3. Be compatible with [Spark Datasource Configs](https://hudi.apache.org/docs/configurations/#Read-Options), so users can add `hoodie.datasource.merge.type=skip_merge` in catalog properties to skip merging log files.	2023-09-11 19:54:59 +08:00
Mingyu Chen	f85da7d942	[improvement](jdbc) add profile for jdbc read and convert phase (#23962) Add 2 metrics in the jdbc scan node profile: - `CallJniNextTime`: calling get next on the jdbc result set - `ConvertBatchTime`: converting jobjects to the column block. Also fix a potential concurrency issue when initializing the jdbc connection cache pool.	2023-09-10 21:42:06 +08:00
Ashin Gau	13c9c41c1f	[opt](hudi) reduce the memory usage of avro reader (#23745) 1. Reduce the number of threads reading avro logs and keep the readers in a fixed thread pool. 2. Regularly clean the cached resolvers in the thread-local map via reflection.	2023-09-05 23:59:23 +08:00
GoGoWen	228f0ac5bb	[Feature](Multi-Catalog) support query doris bitmap column in external jdbc catalog (#23021)	2023-09-02 12:46:33 +08:00
Mryange	96c4471b4a	[feature](udf) udf array/map support decimal and update doc (#23560) Changes: update; decimal; update table name; remove log; add log.	2023-08-31 07:44:18 +08:00
zhangstar333	aef162ad4c	[test](log) add some log in udf function when thrown exception (#23651)	2023-08-30 14:16:05 +08:00
slothever	f66f161017	[fix](multi-catalog)fix hive table with cosn location issue (#23409) Sometimes the partitions of a hive table may be on different storage, e.g., some on HDFS and others on object storage (cos, etc.). This PR mainly changes: 1. Fix the bug of accessing files via cosn. 2. Add a new field `fs_name` in TFileRangeDesc. This is because, when accessing a file, the BE gets an hdfs client from the hdfs client cache, and different files in one query request may have different fs names (e.g., some are `hdfs://`, some are `cosn://`), so the fs name must be specified for each file; otherwise, it may return the error: `reason: IllegalArgumentException: Wrong FS: cosn://doris-build-1308700295/xxxx, expected: hdfs://172.xxxx:4007`	2023-08-26 00:16:00 +08:00
slothever	5ba505ebf4	[fix](multi-catalog)fix avro and jdbc scanner dependency (#23015) Add a preload-extensions module and put all conflicting dependencies into the pom.xml of preload-extensions.	2023-08-20 19:28:17 +08:00
zy-kkk	221e7bdd17	[test](jdbc external) fix mysql and pg external regression test (#22998)	2023-08-16 10:44:47 +08:00
zhangdong	fa6110accd	[fix](catalog)paimon support more data type (#22899)	2023-08-14 13:48:33 +08:00
DongLiang-0	a089fe3e43	[Improve](jni-avro)Reduce the volume of the avro-scanner-jar package (#22276) The avro-scanner-jar package is reduced from 204M to 160M. Hadoop-related dependencies in the original avro pom were packaged directly into the jar, resulting in a jar size of about 200M. Since the BE lib already provides a hadoop jar environment, those dependencies can now be referenced directly.	2023-08-11 17:26:14 +08:00

85 Commits