Commit Graph

85 Commits

Author SHA1 Message Date
11f7b36dab [bugfix](paimon)add class loader (#30483) 2024-01-30 15:33:40 +08:00
b1a9370004 [fix](glue)support access glue iceberg with credential list (#30473)
merge from #30292
2024-01-28 18:23:07 +08:00
bc03354be8 [improvement](jdbc catalog) Optimize the Close logic of JDBC client (#30236)
Optimize the Close logic of the JDBC client so that the Jdbc Catalog can correctly cancel the running query when the query is cancelled.
2024-01-23 13:22:14 +08:00
0ccd706a30 [Enhancement](Jdbc Catalog) Map Jdbc Catalog JSON Type to String for Improved Performance and Compatibility (#30035)
This PR proposes mapping external catalog JSON types to String instead of JsonB in Apache Doris. This change is motivated by the realization that JDBC retrieves JSON data as a String JSON string, regardless of its storage format (Json(String) or Json(Binary)). Mapping to String streamlines data retrieval, simplifies write-backs, and ensures compatibility with all JSON(String) and JSON(Binary) functions, despite potentially misleading displays of JSON data as Strings in Doris. This approach avoids the performance overhead and complexity of converting each row of data from JsonB to String, making the process more efficient and elegant.

About Upgrade
To ensure query compatibility with existing Catalogs in the upgraded version,we currently still retain the capability to query external JSON types as JSONB. However, once you upgrade to the new version and either refresh the Catalog or create a new one, all external JSON types will be treated as Strings. To ensure consistent behavior,and possible future removal of support for JSON as JSONB query code, it is highly recommended that you manually refresh your Catalog as soon as possible after upgrading to the new version.
2024-01-18 12:03:07 +08:00
12af86176a [fix](class-loader) fix class loader conflict on BE side (#29942)
1. make `hadoop-common` in be java extension as `provided`.
2. must load be java extension jars before hadoop jars
2024-01-14 15:53:33 +08:00
8fc9c18c85 [improvement](jdbc catalog) Put the jdbc connection pool parameters into catalog properties (#29195) 2024-01-12 11:40:28 +08:00
5789b7e380 [fix](jin) add datetimev2 precision (#29528) 2024-01-06 13:35:26 +08:00
2d2f14bc75 [fix](paimon) use SlotDescriptor to parse the required fields (#28990)
Before this PR, Paimon has created the schema of `VectorTable` by accessing meta information. However, once the schema of `VectorTable` in java is not same as `Block` in c++, BE will crashed, and there is no good way to troubleshoot errors.
2023-12-27 15:45:53 +08:00
0b5fe681e4 [fix](paimon) read batch by doris' batch size (#29039) 2023-12-27 12:35:17 +08:00
10623ad671 [improvement](jdbc catalog) Optimize connection pool caching logic (#28859)
In the old caching logic, we only used jdbcurl, user, and password as cache keys. This may cause the old link to be still used when replacing the jar package, so we should concatenate all the parameters required for the connection pool as the key.
2023-12-26 14:12:37 +08:00
f30e50676e [opt](scanner) optimize the number of threads of scanners (#28640)
1. Remove `doris_max_remote_scanner_thread_pool_thread_num`, use `doris_scanner_thread_pool_thread_num` only.
2. Set the default value `doris_scanner_thread_pool_thread_num` as `std::max(48, CpuInfo::num_cores() * 4)`
2023-12-26 10:24:12 +08:00
726a9b96c2 [enhancement](udf) add prepare function for java-udf (#28750) 2023-12-22 22:15:59 +08:00
f38e11ec4e [fix](paimon)fix type convert for paimon (#28774)
fix type convert for paimon
2023-12-22 13:18:25 +08:00
49eed98c1e [fix](tvf)Fixed the avro-scanner projection pushdown failing to query on multiple BEs (#28709) 2023-12-20 19:39:26 +08:00
111185407c [Improve](tvf)jni-avro support split file (#27933) 2023-12-19 16:37:34 +08:00
3e1e8d2ebe [fix](jdbc catalog) Fixed data conversion problem when all data is null (#28230) 2023-12-11 17:57:57 +08:00
1706699e7e [fix](multi-catalog)support the max compute partition prune (#27154)
1. max compute partition prune,
we just support filter mc partitions by '=',it can filter just one partition
to support multiple partition filter and range operator('>','<', '>='..), the partition prune should be supported.

2. add max compute row count cache and partitionValues cache

3. add max compute regression case
2023-12-01 22:28:26 +08:00
cd6c61347d [Feature](tvf)(avro-jni) avro-jni add projection push down (#26885) 2023-11-27 10:33:27 +08:00
add6bdb240 [fix](multi-catalog)add the max compute fe ut and fix download expired (#27007)
1. add the max compute fe ut and fix download expired
2. solve memery leak when allocator close
3. add correct partition rows
2023-11-20 10:42:07 +08:00
c459408580 [fix](jni) avoid BE crash and NPE when close paimon reader (#27129)
1. Do not use FATAL log when jni encounter error, to avoid crash.
2. Fix NPE when closing PaimonReader, the reader may not be assigned if PaimonReader open failed.
2023-11-17 20:01:08 +08:00
df867a1531 [fix](catalog) Fix ClickHouse DataTime64 precision parsing (#26977) 2023-11-15 10:23:21 +08:00
2f32a721ee [refactor](jni) unified jni framework for jdbc catalog (#26317)
This commit overhauls the JDBC connector logic within our project, transitioning from the previous mechanism of fetching data through JNI calls for individual ResultSet items to a more efficient and unified approach using the VectorTable data structure.
2023-11-13 14:28:15 +08:00
8434389358 [fix](jdbc) fix clickhouse catalog arr nullable and add case (#26639) 2023-11-09 19:32:05 +08:00
22bf2889e5 [feature](tvf)(jni-avro)jni-avro scanner add complex data types (#26236)
Support avro's enum, record, union data types
2023-11-09 13:58:49 +08:00
809510f8b2 [bug](udf) Fix method invoking (#26131) 2023-10-31 11:46:14 +08:00
267c11207b [feature](paimon)paimon catalog supports complex types (#25364) 2023-10-23 17:32:13 +08:00
a2ceea5951 [refactor](jni) unified jni framework for java udaf (#25591)
Follow https://github.com/apache/doris/pull/25302, and use the unified jni framework to refactor java udaf.
This PR has removed the old interfaces to run java udf/udaf. Thanks to the ease of use of the new framework, the core code for modifying UDAF does not exceed 100 lines, and the logic is similar to that of UDF.
2023-10-20 16:13:40 +08:00
47689fd452 [refactor](jni) unified jni framework for java udf (#25302)
Use the unified jni framework to refactor java udf.
The unified jni framework takes VectorTable as the container to transform data between c++ and java, and hide the details of data format conversion.
In addition, the unified framework supports complex and nested types.
The performance of basic types remains consistent, with a 30% improvement in string types and an order of magnitude improvement in complex types.
2023-10-18 09:27:54 +08:00
18c2a13e09 [fix](multi-catalog)fix maxcompute partition filter and session creation (#24911)
add maxcompute partition support
fix maxcompute partition filter
modify maxcompute session create method
2023-10-17 22:36:10 +08:00
ce18f1148a [improvement](catalog)compatible with paimon 0.5 (#24985)
compatible with paimon 0.5
add p0 for paimon,need set enablePaimonTest=true
2023-10-17 22:07:13 +08:00
522faa8cd2 [fix](jni) the offset in map type is int64 (#25394)
The offset in map type column is int64, but #24810 has put as int32, causing error like:
2023-10-13 14:23:17 +08:00
4e8cde127c [Enhance](catalog)add table cache in paimon jni (#25014)
- fix get old schema after refresh paimon table
- add table cache in paimon jni
2023-10-08 10:36:18 +08:00
26818de9c8 [feature](jni) support complex types in jni framework (#24810)
Support complex types in jni framework, and successfully run end-to-end on hudi.
### How to Use
Other scanners only need to implement three interfaces in `ColumnValue`:
```
// Get array elements and append into values
void unpackArray(List<ColumnValue> values);

// Get map key array&value array, and append into keys&values
void unpackMap(List<ColumnValue> keys, List<ColumnValue> values);

// Get the struct fields specified by `structFieldIndex`, and append into values
void unpackStruct(List<Integer> structFieldIndex, List<ColumnValue> values);
```
Developers can take `HudiColumnValue` as an example.
2023-09-27 14:47:41 +08:00
1f8e0b48bc [fix](S3)delete main function because hardcoded ip is not safe (#24872) 2023-09-26 10:49:16 +08:00
c832e018d0 [Dependence](Fe)Upgrade Fe dependencies (#24606)
* be scanner
- Upgrade avro to 1.11.2
fe
- Upgrade quartz to 2.5.0-rc1
- Upgrade maxcompute to 0.45-2-publish
- Binding  avro-ipc  to 1.11.2

* Binding hbase  version to 2.5.5
binding nimbusds version to 9.35
2023-09-22 10:14:42 +08:00
ee56783629 [fix](Java UDF) Do not use enum as the data type for JavaUdfDataType. (#24460) 2023-09-19 14:06:02 +08:00
4816ca6679 [fix](multi-catalog)fix mc decimal type parse, fix wrong obj location (#24242)
1. mc decimal type need parse correctly by arrow vector method
2. fix wrong obj location if use oss,obs,cosn

Will add test case in another PR
2023-09-15 17:44:56 +08:00
dbfacdc4af [improvement](jdbc catalog) Optimize Loop Performance by Caching isNebula Method Result (#24260) 2023-09-13 21:40:28 +08:00
fca34ec337 [fix](multi-catalog)support bit type and hidden mc secret key (#24124)
support max compute bit type and mask mc secret key
bool type will use bit arrow vector
should mask secret key: close #24019
2023-09-12 10:36:48 +08:00
6e28d878b5 [fix](hudi) compatible with hudi spark configuration and support skip merge (#24067)
Fix three bugs:
1. Hudi slice maybe has log files only, so `new Path(filePath)`  will throw errors.
2. Hive column names are lowercase only, so match column names in ignore-case-mode.
3.  Compatible with [Spark Datasource Configs](https://hudi.apache.org/docs/configurations/#Read-Options), so users can add `hoodie.datasource.merge.type=skip_merge` in catalog properties to skip merge logs files.
2023-09-11 19:54:59 +08:00
f85da7d942 [improvement](jdbc) add profile for jdbc read and convert phase (#23962)
Add 2 metrics in jdbc scan node profile:
- `CallJniNextTime`: call get next from jdbc result set
- `ConvertBatchTime`: call convert jobject to columm block

Also fix a potential concurrency issue when init jdbc connection cache pool
2023-09-10 21:42:06 +08:00
13c9c41c1f [opt](hudi) reduce the memory usage of avro reader (#23745)
1. Reduce the number of threads reading avro logs and keep the readers in a fixed thread pool.
2. Regularly cleaning the cached resolvers in the thread local map by reflection.
2023-09-05 23:59:23 +08:00
228f0ac5bb [Feature](Multi-Catalog) support query doris bitmap column in external jdbc catalog (#23021) 2023-09-02 12:46:33 +08:00
96c4471b4a [feature](udf) udf array/map support decimal and update doc (#23560)
* update

* decimal

* update table name

* remove log

* add log
2023-08-31 07:44:18 +08:00
aef162ad4c [test](log) add some log in udf function when thrown exception (#23651)
[test](log) add some log in udf function when thrown exception (#23651)
2023-08-30 14:16:05 +08:00
f66f161017 [fix](multi-catalog)fix hive table with cosn location issue (#23409)
Sometimes, the partitions of a hive table may on different storage, eg, some is on HDFS, others on object storage(cos, etc).
This PR mainly changes:

1. Fix the bug of accessing files via cosn.
2. Add a new field `fs_name` in TFileRangeDesc
    This is because, when accessing a file, the BE will get a hdfs client from hdfs client cache, and different file in one query
request may have different fs name, eg, some of are `hdfs://`, some of are `cosn://`, so we need to specify fs name
for each file, otherwise, it may return error:

`reason: IllegalArgumentException: Wrong FS: cosn://doris-build-1308700295/xxxx, expected: hdfs://[172.xxxx:4007](http://172.xxxxx:4007/)`
2023-08-26 00:16:00 +08:00
5ba505ebf4 [fix](multi-catalog)fix avro and jdbc scanner dependency (#23015)
add preload-extensions module, put all conflict dependencies to pom.xml in preload-extensions
2023-08-20 19:28:17 +08:00
221e7bdd17 [test](jdbc external) fix mysql and pg external regression test (#22998) 2023-08-16 10:44:47 +08:00
fa6110accd [fix](catalog)paimon support more data type (#22899) 2023-08-14 13:48:33 +08:00
a089fe3e43 [Improve](jni-avro)Reduce the volume of the avro-scanner-jar package (#22276)
The avro-scanner-jar package is reduced from 204M to 160M.

Hadoop-related dependencies in the original avro pom are directly packaged into a jar package, resulting in a jar volume of 200M. Now since there is already a hadoop jar package environment in be lib, it can be directly referenced.
2023-08-11 17:26:14 +08:00