Commit Graph

112 Commits

Author SHA1 Message Date
fbd2c9db2d [upgrade](hive-shade)(paimon) upgrade hive shade to 2.0.0 and paimon to 0.7 (#34085)
* Adapt paimon 0.6.0 (#33943)

Version 2.0.0 of the shade package eliminates potential jar conflicts, resolves dependency component issues, and significantly reduces package size.
Use the directly declared guava dependency instead of relying on transitively included copies.

* [chore](dependencies)Upgrade paimon to 0.7.0 (#33987)

---------

Co-authored-by: Calvin Kirs <kirs@apache.org>
2024-04-25 12:01:44 +08:00
f6af79c0ed [fix](catalog) Remove unexpected cleanup when reading jdbc data (#33529) 2024-04-17 23:42:12 +08:00
e9b67bc82d [bugfix](paimon)merge meta-inf/services for paimon FileIOLoader (#33166)
We introduced Paimon's OSS and S3 packages but did not register them in META-INF/services. As a result, when BE used the S3 or OSS interface, the class could not be found and an error was reported (`Could not find a file io implementation for scheme 's3' in the classpath.`). A short `ServiceLoader` sketch follows this entry.

FYI:
https://stackoverflow.com/questions/47310215/merging-meta-inf-services-files-with-maven-assembly-plugin
https://stackoverflow.com/questions/1607220/how-can-i-merge-resource-files-in-a-maven-assembly
2024-04-07 22:13:00 +08:00
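For context on the fix above: Java's `ServiceLoader` only discovers providers listed in a `META-INF/services` file of the final jar, so if the assembly keeps only one of several such files, the remaining providers silently disappear. A minimal, generic sketch (the `FileIO`/`S3FileIO` names are illustrative stand-ins, not Paimon's actual classes):

```java
import java.util.ServiceLoader;

// Illustrative service interface; Paimon's real FileIOLoader API differs.
interface FileIO {
    String scheme();
}

// Example provider; it is only discovered if its fully-qualified name is
// listed in META-INF/services/FileIO inside the final (merged) jar.
class S3FileIO implements FileIO {
    public String scheme() { return "s3"; }
}

public class ServiceLoaderDemo {
    public static void main(String[] args) {
        // If two jars each ship their own META-INF/services file and the
        // assembly keeps only one of them, the providers from the other jar
        // vanish, producing "could not find ... for scheme 's3'".
        for (FileIO io : ServiceLoader.load(FileIO.class)) {
            System.out.println("Discovered FileIO for scheme: " + io.scheme());
        }
    }
}
```

Merging all `META-INF/services` files during assembly, as described in the linked answers, restores the missing entries.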
9c6180d9ba [revert](jni) revert part of #32455 #32904 2024-03-27 20:45:44 +08:00
c0d7a5660e [fix](paimon) support paimon with hive2 (#32455)
In order to support Paimon with hive2, we need to modify the original HiveMetastoreClient.java
to make it compatible with both hive2 and hive3.
This modified HiveMetastoreClient must be at the front of the CLASSPATH so that
it overrides the HiveMetastoreClient in the hadoop jar.

This PR mainly changes:

1. Copy HiveMetastoreClient.java in FE to BE's preload jar.

2. Split the origin `preload-extensions-jar-with-dependencies.jar` into 2 jars
    1. `preload-extensions-project.jar`, which contains the modified HiveMetastoreClient.
    2. `preload-extensions-jar-with-dependencies.jar`, which contains other dependency jars.

3. Modify the `start_be.sh`, to let `preload-extensions-project.jar` be loaded first.

4. Change the way the jni scanner jar is assembled
    Only the project jar needs to be assembled, without other dependencies,
    because we only use classes under the `org.apache.doris` package.
    Removing the unused dependency jars also reduces the output size of BE.

5. Fix a bug: the prefix of paimon properties should be `paimon.`, not `paimon` (see the property-handling sketch after this entry)

6. Support paimon with hive2
    Users can set `hive.version` in the paimon catalog properties to specify the hive version.
2024-03-26 15:31:07 +08:00
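A rough sketch of the property handling in items 5 and 6 above, with hypothetical helper names (not Doris's actual classes): catalog properties prefixed with `paimon.` have the prefix stripped before being passed to Paimon, while `hive.version` stays a catalog-level option.

```java
import java.util.HashMap;
import java.util.Map;

public class PaimonCatalogPropsSketch {
    private static final String PAIMON_PREFIX = "paimon."; // note the trailing dot

    // Strip the "paimon." prefix and forward the remainder as Paimon options.
    static Map<String, String> extractPaimonOptions(Map<String, String> catalogProps) {
        Map<String, String> options = new HashMap<>();
        for (Map.Entry<String, String> e : catalogProps.entrySet()) {
            if (e.getKey().startsWith(PAIMON_PREFIX)) {
                options.put(e.getKey().substring(PAIMON_PREFIX.length()), e.getValue());
            }
        }
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("paimon.s3.endpoint", "http://example.com");
        props.put("hive.version", "2.3.7"); // catalog-level option, not forwarded
        System.out.println(extractPaimonOptions(props)); // {s3.endpoint=http://example.com}
    }
}
```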
ec43f65235 [feature](hudi) support hudi incremental read (#32052)
* [feature](hudi) support incremental read for hudi table

* fix jdk17 java options
2024-03-26 15:31:07 +08:00
a10466598b [fix](jdbc catalog) Fix query errors without jdbc pool default value on only BE upgrade (#32618) 2024-03-22 16:36:22 +08:00
4c8aaa156a [fix](jni) remove 'push_down_predicates' and fix BE crash with decimal predicate (#32253) (#32599) 2024-03-21 14:07:50 +08:00
e952b5ef5b [opt](jdbc catalog) Refine the jdbc_connector close logic and actively clear the jvm occupied by jdbcexecutor (#32300) 2024-03-21 14:07:23 +08:00
85b2c42f76 [Enhancement](jdbc catalog) Add a property to test the connection when creating a Jdbc catalog (#32125) (#32531) 2024-03-21 14:05:59 +08:00
cf6b22c621 [fix](jdbc catalog) fix type conversion error in MySQL JDBC Driver 5.x (#31880) 2024-03-12 14:07:57 +08:00
0f1cbcc86a [refactor](jdbc catalog) split postgresql jdbc Executor (#31730) 2024-03-06 13:08:04 +08:00
2e9bd268cd [improvement](jdbc catalog) support sqlserver timestamp type read (#31805) 2024-03-06 13:08:04 +08:00
11903d29a1 [fix](jdbc catalog) fix close abort in sqlserver (#31718) 2024-03-06 13:07:49 +08:00
a5b9127656 [refactor](jdbc catalog) split sqlserver jdbc executor (#31679) 2024-03-06 13:04:29 +08:00
07224686ef [feature](jdbc catalog) support db2 jdbc catalog (#31627) 2024-03-01 14:19:28 +08:00
0b5b7175d6 [fix](multi-catalog) add max compute custom odps and tunnel url (#31390)
add max compute custom odps and tunnel url
2024-02-29 16:44:40 +08:00
450422556a [fix](paimon) auto deploy paimon oss/s3 jar files (#31568)
No need to deploy the paimon oss/s3 jar files manually;
they are now included in preload-extensions-jar-with-dependencies.jar
2024-02-29 16:44:40 +08:00
3c37fb085c [refactor](jdbc catalog) split jdbc executor for different data sources (step-1) (#31406) 2024-02-29 12:38:03 +08:00
883d022f84 [fix](paimon) fix hadoop.username does not take effect in paimon catalog (#31478) 2024-02-28 13:08:41 +08:00
18f9ca9242 [fix](jdbc catalog) Fix Resource Closing Logic After Connection Abort (#31219)
* [improvement](jdbc catalog) Optimize the closing logic of Jdbc connection after abort

* fix
2024-02-22 20:35:10 +08:00
260568db17 [update](hudi) update hudi version to 0.14.1 and compatible with flink hive catalog (#31181)
1. Update hudi version from 0.13.1 to 0.14.1
2. Be compatible with hudi tables created by the Flink Hive catalog
2024-02-22 19:51:20 +08:00
02bded2688 [Improve](common)Optimize logging performance with LOG.isDebugEnabled() (#31091)
* [Improve](common)Optimize logging performance with LOG.isDebugEnabled()

* fix error ut
2024-02-20 09:16:14 +08:00
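The pattern behind this optimization is the usual guard that skips building a debug message when debug logging is disabled; a minimal sketch assuming a Log4j-style logger:

```java
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;

public class DebugGuardSketch {
    private static final Logger LOG = LogManager.getLogger(DebugGuardSketch.class);

    void process(Object row) {
        // Without the guard, the string concatenation runs on every call
        // even when debug logging is disabled.
        if (LOG.isDebugEnabled()) {
            LOG.debug("processing row: " + row);
        }
    }
}
```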
4a33d9820a [fix](multi-catalog)fix getting ugi methods and unify them (#30844)
Put all UGI login methods into HadoopUGI
2024-02-20 09:12:38 +08:00
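A hedged sketch of what a unified UGI login helper can look like; the `HadoopUGISketch` shape is illustrative, only the Hadoop `UserGroupInformation` calls are real API:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

// Illustrative single entry point for UGI logins; not Doris's actual class.
public final class HadoopUGISketch {
    public static UserGroupInformation login(Configuration conf,
                                             String principal,
                                             String keytab,
                                             String simpleUser) throws IOException {
        if (principal != null && keytab != null) {
            // Kerberos login from a keytab.
            UserGroupInformation.setConfiguration(conf);
            return UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab);
        }
        // Simple auth: impersonate the configured hadoop.username.
        return UserGroupInformation.createRemoteUser(simpleUser);
    }
}
```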
bc2e8ac8f9 [fix](kerberos) fix kerberos ugi login method (#30766) 2024-02-06 08:35:54 +08:00
4f8730d092 [improvement](jdbc catalog) Optimize connection pool parameter settings (#30588)
This PR makes the following changes to the connection pool of the JDBC Catalog (a sketch of the derived intervals follows this entry):
1. Set the maximum connection lifetime; the default is 30 minutes

-   One-half of the maximum lifetime is used as the idle-eviction time,
-   One-tenth is used as the check interval for evicting connections

2. Keepalive takes effect only for the connection pool on BE, and is triggered at one-fifth of the maximum lifetime.
3. The maximum number of connections is reduced from 100 to 10.
4. Add a connection-cache eviction thread on BE, plus a parameter to control the eviction interval; the default is 28800 seconds (8 hours).
5. Add the catalog ID to the connection-pool cache key for better isolation; a catalog refresh is required for this to take effect.
6. Upgrade the druid connection pool to version 1.2.20.
7. Set default parameters in JdbcResource when upgrading the FE version, to avoid errors caused by unset parameters.
2024-02-03 20:26:03 +08:00
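To make the ratios above concrete, here is a minimal sketch that derives the intervals from the maximum lifetime and applies them to a Druid pool. The setters shown should exist on `DruidDataSource`, but the exact properties and values Doris uses are an assumption:

```java
import com.alibaba.druid.pool.DruidDataSource;

public class JdbcPoolConfigSketch {
    public static DruidDataSource build(String url, String user, String password) {
        long maxLifetimeMs = 30L * 60 * 1000;                     // 1. max connection lifetime: 30 min

        DruidDataSource ds = new DruidDataSource();
        ds.setUrl(url);
        ds.setUsername(user);
        ds.setPassword(password);
        ds.setMaxActive(10);                                      // 3. max connections: 10
        ds.setMinEvictableIdleTimeMillis(maxLifetimeMs / 2);      // eviction threshold = 1/2 lifetime
        ds.setTimeBetweenEvictionRunsMillis(maxLifetimeMs / 10);  // eviction check = 1/10 lifetime
        ds.setKeepAlive(true);                                    // 2. keepalive (Doris enables it on BE only)
        ds.setKeepAliveBetweenTimeMillis(maxLifetimeMs / 5);      // keepalive period = 1/5 lifetime
        return ds;
    }
}
```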
9b7c6af581 [Fix](JDK17) Fixed that BE could not be started using JDK17 (#30286)
Issue Number: #30484

This is because hadoop-client-api relies on hadoop-common,
so with JDK17 hadoop-common is still pulled in.
2024-02-01 23:14:14 +08:00
11f7b36dab [bugfix](paimon)add class loader (#30483) 2024-01-30 15:33:40 +08:00
b1a9370004 [fix](glue)support access glue iceberg with credential list (#30473)
merge from #30292
2024-01-28 18:23:07 +08:00
bc03354be8 [improvement](jdbc catalog) Optimize the Close logic of JDBC client (#30236)
Optimize the Close logic of the JDBC client so that the Jdbc Catalog can correctly cancel the running query when the query is cancelled.
2024-01-23 13:22:14 +08:00
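A minimal sketch of the close pattern described above, in plain JDBC (illustrative, not Doris's actual executor code): cancel a possibly still-running statement before closing resources, so that cancelling the Doris query also stops the query on the external database.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class JdbcCloseSketch {
    // Cancel first, then close, ignoring close-time errors.
    static void closeQuietly(ResultSet rs, Statement stmt, Connection conn) {
        if (stmt != null) {
            try {
                stmt.cancel();   // interrupt a query that is still executing
            } catch (SQLException ignored) {
            }
        }
        try { if (rs != null) rs.close(); } catch (SQLException ignored) { }
        try { if (stmt != null) stmt.close(); } catch (SQLException ignored) { }
        try { if (conn != null) conn.close(); } catch (SQLException ignored) { }
    }
}
```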
0ccd706a30 [Enhancement](Jdbc Catalog) Map Jdbc Catalog JSON Type to String for Improved Performance and Compatibility (#30035)
This PR proposes mapping external catalog JSON types to String instead of JsonB in Apache Doris. The change is motivated by the fact that JDBC retrieves JSON data as a JSON string regardless of its storage format (Json(String) or Json(Binary)). Mapping to String streamlines data retrieval, simplifies write-backs, and keeps compatibility with all JSON(String) and JSON(Binary) functions, even though JSON data may be displayed as a String in Doris. This approach avoids the performance overhead and complexity of converting each row of data from JsonB to String, making the process more efficient and elegant.

About Upgrade
To keep queries against existing Catalogs working after the upgrade, we currently still retain the capability to query external JSON types as JSONB. However, once you upgrade to the new version and either refresh the Catalog or create a new one, all external JSON types will be treated as Strings. To ensure consistent behavior, and because the code for querying JSON as JSONB may be removed in the future, it is highly recommended that you manually refresh your Catalog as soon as possible after upgrading.
2024-01-18 12:03:07 +08:00
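A small sketch of why the String mapping is cheap: the JDBC driver already hands JSON values back as text, so the reader can pass them through without a per-row JSONB conversion (illustrative code, not the actual Doris JDBC executor):

```java
import java.sql.ResultSet;
import java.sql.SQLException;

public class JsonColumnReadSketch {
    // JDBC returns JSON columns as text regardless of whether the source
    // stores them as JSON (text) or JSONB (binary), so mapping to String
    // is a straight pass-through with no per-row re-encoding.
    static String readJsonColumn(ResultSet rs, int columnIndex) throws SQLException {
        return rs.getString(columnIndex);
    }
}
```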
12af86176a [fix](class-loader) fix class loader conflict on BE side (#29942)
1. Mark `hadoop-common` in BE java extensions as `provided`.
2. BE java extension jars must be loaded before hadoop jars.
2024-01-14 15:53:33 +08:00
8fc9c18c85 [improvement](jdbc catalog) Put the jdbc connection pool parameters into catalog properties (#29195) 2024-01-12 11:40:28 +08:00
5789b7e380 [fix](jni) add datetimev2 precision (#29528) 2024-01-06 13:35:26 +08:00
2d2f14bc75 [fix](paimon) use SlotDescriptor to parse the required fields (#28990)
Before this PR, Paimon created the schema of `VectorTable` from the table's meta information. However, if the schema of `VectorTable` in Java did not match the `Block` in C++, BE would crash, and there was no good way to troubleshoot the error.
2023-12-27 15:45:53 +08:00
0b5fe681e4 [fix](paimon) read batch by doris' batch size (#29039) 2023-12-27 12:35:17 +08:00
10623ad671 [improvement](jdbc catalog) Optimize connection pool caching logic (#28859)
In the old caching logic, we only used the jdbc url, user, and password as the cache key. This could cause an old connection to still be used after replacing the driver jar, so we now concatenate all the parameters required for the connection pool into the key (see the sketch after this entry).
2023-12-26 14:12:37 +08:00
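A rough sketch of the stricter cache key (field names are illustrative): every parameter that affects the physical connection, including the driver jar and, per a later commit in this list, the catalog ID, is folded into the key so a stale pool cannot be reused.

```java
import java.util.StringJoiner;

public class JdbcPoolCacheKeySketch {
    // Illustrative: concatenate every parameter that affects the physical
    // connection, so a stale pool cannot be reused after a driver change.
    static String poolCacheKey(long catalogId, String jdbcUrl, String user, String password,
                               String driverUrl, String driverClass, int maxPoolSize) {
        return new StringJoiner("|")
                .add(Long.toString(catalogId))      // catalog ID for per-catalog isolation
                .add(jdbcUrl)
                .add(user)
                .add(password)
                .add(driverUrl)                     // changing the driver jar changes the key
                .add(driverClass)
                .add(Integer.toString(maxPoolSize)) // pool sizing is part of the key too
                .toString();
    }
}
```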
f30e50676e [opt](scanner) optimize the number of threads of scanners (#28640)
1. Remove `doris_max_remote_scanner_thread_pool_thread_num`, use `doris_scanner_thread_pool_thread_num` only.
2. Set the default value `doris_scanner_thread_pool_thread_num` as `std::max(48, CpuInfo::num_cores() * 4)`
2023-12-26 10:24:12 +08:00
726a9b96c2 [enhancement](udf) add prepare function for java-udf (#28750) 2023-12-22 22:15:59 +08:00
f38e11ec4e [fix](paimon)fix type convert for paimon (#28774)
fix type convert for paimon
2023-12-22 13:18:25 +08:00
49eed98c1e [fix](tvf)Fixed the avro-scanner projection pushdown failing to query on multiple BEs (#28709) 2023-12-20 19:39:26 +08:00
111185407c [Improve](tvf)jni-avro support split file (#27933) 2023-12-19 16:37:34 +08:00
3e1e8d2ebe [fix](jdbc catalog) Fixed data conversion problem when all data is null (#28230) 2023-12-11 17:57:57 +08:00
1706699e7e [fix](multi-catalog)support the max compute partition prune (#27154)
1. Max Compute partition prune
   Previously we only supported filtering MC partitions by '=', which matches just one partition.
   To support multiple partition filters and range operators ('>', '<', '>=', ...), partition pruning is needed (a simplified sketch follows this entry).

2. add max compute row count cache and partitionValues cache

3. add max compute regression case
2023-12-01 22:28:26 +08:00
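A simplified sketch of the pruning idea (generic code, not the Max Compute connector): evaluate each partition value against the pushed-down equality and range predicates and keep only the partitions that can match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class PartitionPruneSketch {
    // Keep only partitions whose value satisfies every pushed-down predicate.
    static List<String> prune(List<String> partitionValues, List<Predicate<String>> predicates) {
        List<String> kept = new ArrayList<>();
        for (String value : partitionValues) {
            boolean match = true;
            for (Predicate<String> p : predicates) {
                if (!p.test(value)) { match = false; break; }
            }
            if (match) kept.add(value);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> parts = List.of("20231101", "20231102", "20231103");
        // A range predicate ('>='), which '='-only filtering could not express.
        Predicate<String> ge = v -> v.compareTo("20231102") >= 0;
        System.out.println(prune(parts, List.of(ge))); // [20231102, 20231103]
    }
}
```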
cd6c61347d [Feature](tvf)(avro-jni) avro-jni add projection push down (#26885) 2023-11-27 10:33:27 +08:00
add6bdb240 [fix](multi-catalog)add the max compute fe ut and fix download expired (#27007)
1. add the max compute FE UT and fix expired downloads
2. fix a memory leak when the allocator is closed
3. add correct partition row counts
2023-11-20 10:42:07 +08:00
c459408580 [fix](jni) avoid BE crash and NPE when close paimon reader (#27129)
1. Do not use a FATAL log when JNI encounters an error, to avoid crashing.
2. Fix an NPE when closing PaimonReader: the reader may not be assigned if opening it failed (see the sketch after this entry).
2023-11-17 20:01:08 +08:00
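A small sketch of the null-safe close described in point 2 (illustrative structure, not the actual PaimonJniScanner): if `open()` failed, the reader field was never assigned, so `close()` has to tolerate a null reader.

```java
import java.io.IOException;

public class PaimonReaderCloseSketch {
    // Stand-in for Paimon's record reader; the real type differs.
    interface Reader extends AutoCloseable { }

    private Reader reader;          // may stay null if open() threw

    public void close() throws IOException {
        if (reader == null) {       // open() failed before assigning the reader
            return;                 // nothing to release; avoid the NPE
        }
        try {
            reader.close();
        } catch (Exception e) {
            // Log and propagate instead of using a FATAL log that would abort BE.
            throw new IOException("failed to close paimon reader", e);
        }
    }
}
```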
df867a1531 [fix](catalog) Fix ClickHouse DateTime64 precision parsing (#26977) 2023-11-15 10:23:21 +08:00
2f32a721ee [refactor](jni) unified jni framework for jdbc catalog (#26317)
This commit overhauls the JDBC connector logic within our project, transitioning from the previous mechanism of fetching data through JNI calls for individual ResultSet items to a more efficient and unified approach using the VectorTable data structure.
2023-11-13 14:28:15 +08:00
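A hedged sketch of the batching idea behind this refactor; the `VectorTableSketch` below is a stand-in, not Doris's actual `VectorTable`: rows are drained from the JDBC `ResultSet` into a columnar batch on the Java side, and the whole batch is handed to BE in one JNI call instead of one call per value.

```java
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class VectorTableSketch {
    // Stand-in columnar batch: one list per column.
    private final List<List<Object>> columns = new ArrayList<>();
    private int numRows = 0;

    // Fill the batch with up to batchSize rows; return the number of rows read.
    public int fill(ResultSet rs, int numColumns, int batchSize) throws SQLException {
        columns.clear();
        for (int c = 0; c < numColumns; c++) {
            columns.add(new ArrayList<>());
        }
        numRows = 0;
        while (numRows < batchSize && rs.next()) {
            for (int c = 0; c < numColumns; c++) {
                columns.get(c).add(rs.getObject(c + 1)); // JDBC columns are 1-based
            }
            numRows++;
        }
        // BE would then copy this whole batch across JNI in a single call.
        return numRows;
    }
}
```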
8434389358 [fix](jdbc) fix clickhouse catalog arr nullable and add case (#26639) 2023-11-09 19:32:05 +08:00