Commit Graph

74 Commits

Author SHA1 Message Date
3a22af836e [fix](jdbc catalog) fix error to clickhouse uint64 type Conversion (#19463)
* [fix](jdbc catalog) fix error to clickhouse uint64 type Conversion

* add test case
2023-05-10 21:53:30 +08:00
1bc405c06f [fix](catalog) fix doris jdbc catalog largeint select error (#19407)
When I create a Doris JDBC catalog with mysql-jdbc 5.1.47, LARGEINT columns cannot be selected.
mysql-jdbc reads a LARGEINT value as a string because the value is too long.

mysql> select `largeint` from type3;
ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]Fail to convert jdbc type of java.lang.String to doris type LARGEINT on column: largeint. You need to check this column type between external table and doris table.
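A minimal sketch of the kind of conversion this error calls for: accept the `java.lang.String` the driver hands back and turn it into a 128-bit integer. The class and method names (`LargeIntConverter`, `toLargeInt`) are hypothetical and are not taken from the Doris code base.

```java
import java.math.BigInteger;

public final class LargeIntConverter {
    // Doris LARGEINT is a signed 128-bit integer.
    private static final BigInteger MIN = BigInteger.ONE.shiftLeft(127).negate();
    private static final BigInteger MAX = BigInteger.ONE.shiftLeft(127).subtract(BigInteger.ONE);

    private LargeIntConverter() {}

    /** Converts a JDBC value (String, BigInteger, Long or Integer) to a LARGEINT-range BigInteger. */
    public static BigInteger toLargeInt(Object value) {
        if (value == null) {
            return null;
        }
        BigInteger result;
        if (value instanceof BigInteger) {
            result = (BigInteger) value;
        } else if (value instanceof String) {
            // The case from the error above: the driver returned the number as text.
            result = new BigInteger(((String) value).trim());
        } else if (value instanceof Long || value instanceof Integer) {
            result = BigInteger.valueOf(((Number) value).longValue());
        } else {
            throw new IllegalArgumentException("Unsupported type: " + value.getClass().getName());
        }
        if (result.compareTo(MIN) < 0 || result.compareTo(MAX) > 0) {
            throw new ArithmeticException("Value out of LARGEINT range: " + result);
        }
        return result;
    }
}
```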
2023-05-09 17:34:48 +08:00
aeb3450151 [feature](graph)Support querying data from the Nebula graph database (#19209)
Support querying data from the Nebula graph database
This feature comes from commercial customers who use both Doris and Nebula and want to connect the two databases.

The changes mainly include:

* Add a new graph database JDBC type
* Adapt the types and map the graph to Doris types
2023-05-09 15:30:11 +08:00
5459cd9c30 [Improve](fe)Upgrade dependencies and optimize jar package management (#18882)
bind netty-version to 4.1.89-final
bind jettison to 1.5.4
upgrade hadoop version to 3.3.5
upgrade ranger-plugins-common to 2.4.0
bind bcprov-jdk15on to 2.4.0
upgrade and bind woodstox to 6.5.1
upgrade and bind kerby to 2.0.3
upgrade hudi to 0.13.0
upgrade parquet to 1.13.0
upgrade maven-source-plugin to 3.2.1
upgrade maven-assembly-plugin to 3.3.0
upgrade maven-javadoc-plugin to 3.3.2
upgrade maven-shade-plugin to 3.3.4
upgrade maven-clean-plugin to 3.1.0
Remove meaningless plugins
Optimize doris maven path
Unify the Java modules for management in fe
2023-05-04 10:07:37 +08:00
8864266a42 [fix](Jdbc Catalog) fix Druid Pool parameter and set testWhileIdle = true (#19049)
Set `testWhileIdle` for the druid pool to true
2023-04-26 11:44:45 +08:00
fd905b66b0 [refactor](jdbc) close datasource if no need to maintain the cache (#18724)
After PR #18670, JVM parameters can be used to initialize the JDBC datasource.
When JDBC_MIN_POOL=0 is set, the datasource can be closed immediately;
there is no need to wait for the recycling timer.
2023-04-22 22:07:34 +08:00
13894ae790 [fix](jdbc catalog) Use default value if the user does not set the pool parameter in be.conf #18919 2023-04-22 08:39:26 +08:00
575c1620c2 [Improve](fe)Use commons-lang3 uniformly and refactor PatternGenerator#generateTypePattern (#18666)
`commons-lang` (1 and 2) has not been maintained since 2011, and the officially recommended replacement is `commons-lang3`, which is a smooth upgrade and stays compatible with `commons-lang`.
We use both dependencies in `fe`, so they can be completely unified.

`PatternGenerator#generateTypePattern` has many meaningless loops, and IntegerRange is introduced only for them,
which is unnecessary. So I refactored it.
2023-04-17 20:15:17 +08:00
ddbff2aa39 [feature](jni) map c++ block to java vector table (#18566)
PR (#17960) introduced the vector table, which can map a Java table to a C++ block.
In some cases (Java UDF and JDBC executor), we need to map a C++ block to a Java table. This PR implements that function.

The memory structure of the Java vector table and the C++ block is consistent,
so the implementation doesn't copy the block; it just passes the memory address.
2023-04-17 00:04:53 +08:00
afdac1204d [improve](postgresql catalog) support postgresql bytea type to doris string (#18623)
* [improve](postgresql catalog) support postgresql bytea type to doris string

* modify function name

* add case
2023-04-16 18:14:42 +08:00
e1b3955e05 [refactor](jdbc) using jvm parameters to init jdbc datasource (#18670)
Use JVM parameters to initialize the JDBC datasource connection pool.
If the connections do not need to be maintained, set JDBC_MIN_POOL=0.
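A minimal sketch of reading such a parameter with a fallback. Only the name `JDBC_MIN_POOL` comes from this message; reading it as a JVM system property (for example `-DJDBC_MIN_POOL=0`), the default of 1 (borrowed from the `min = 1` setting in #18563), and the class `JdbcPoolSettings` are assumptions made for illustration.

```java
public final class JdbcPoolSettings {
    // Name taken from the commit message; treating it as a system property is an assumption.
    static final String MIN_POOL_PROPERTY = "JDBC_MIN_POOL";
    // Fallback borrowed from the "min = 1" setting described in #18563.
    static final int DEFAULT_MIN_POOL = 1;

    private JdbcPoolSettings() {}

    /** Returns the configured minimum pool size, or the default when unset or unparsable. */
    public static int minPoolSize() {
        int value = Integer.getInteger(MIN_POOL_PROPERTY, DEFAULT_MIN_POOL);
        return Math.max(value, 0);
    }

    /** With a minimum of 0 there is nothing to keep alive, so the pool need not be cached. */
    public static boolean shouldKeepPool() {
        return minPoolSize() > 0;
    }
}
```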
2023-04-14 18:45:29 +08:00
1d3699a70c [refactor](jdbc) refactor jdbc connection num in datasource (#18563)
JDBC may currently open too many connections without releasing them,
so change the datasource properties: init = 1, min = 1, max = 100, and an idle time of 10 minutes.
2023-04-13 22:08:08 +08:00
5f981b0b1f [fix](catalog)Use hive-catalog-shade to solve thrift version compatibility issues (#18504)
`Hive 3` uses the `thrift-0.9.3` package, and `Doris` uses the `thrift-0.16.0` package.
These two packages are not compatible, so we use the `hive-catalog-shade` package to manage hive dependencies
in a unified way. This jar package renames the `thrift` classes, which resolves the conflict.
2023-04-11 13:19:39 +08:00
e29fc3b46b [fix](chore) fix compile failed in JdbcExecutor and revert #18306 since be crash randomly (#18371)
Fix 2 problems:
1. PR #18187 uses the API `resizeColumn` in JNINativeMethod, which was removed by #17960.
2. Revert PR #18306 to fix the pipeline core dump during load.
2023-04-04 20:04:28 +08:00
54dbb4af67 [vectorzied](jdbc) refactor jdbc table read array type (#18187)
When a JDBC table reads an array type, Doris returns the result as a string, PG as java.sql.Array, and CK as java.lang.Object.
This makes the code difficult to read and maintain,
so convert every database's array result to a string, then add a cast function from string to the Doris array type.
2023-04-04 11:57:04 +08:00
fe9d2b00fc [test](jdbc catalog) add clickhouse jdbc catalog base type test (#18007) 2023-04-03 20:18:36 +08:00
1c2f95b887 [improve](clickhouse jdbc) support clickhouse jdbc 4.x version (#18258)
In the 4.x version of ClickHouse's JDBC driver, some UInt types use special Java types, so I adapted Doris's ClickHouse JDBC External to handle them:
```
com.clickhouse.data.value.UnsignedByte;
com.clickhouse.data.value.UnsignedInteger;
com.clickhouse.data.value.UnsignedLong;
com.clickhouse.data.value.UnsignedShort;
```
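A minimal sketch of why unsigned values need special handling on the Java side: each unsigned width has to be widened to the next larger signed type to keep its value. The sketch deliberately uses only JDK methods on raw bit patterns rather than the `com.clickhouse.data.value` classes above, and `UnsignedWidening` with its method names is hypothetical. The target Java types are an assumption, not a statement of how Doris maps them.

```java
import java.math.BigInteger;

public final class UnsignedWidening {
    private UnsignedWidening() {}

    /** UInt8 bit pattern -> value in 0..255. */
    public static short fromUInt8(byte raw) {
        return (short) Byte.toUnsignedInt(raw);
    }

    /** UInt16 bit pattern -> value in 0..65535. */
    public static int fromUInt16(short raw) {
        return Short.toUnsignedInt(raw);
    }

    /** UInt32 bit pattern -> value in 0..2^32-1. */
    public static long fromUInt32(int raw) {
        return Integer.toUnsignedLong(raw);
    }

    /** UInt64 bit pattern -> value in 0..2^64-1, which no longer fits in a signed long. */
    public static BigInteger fromUInt64(long raw) {
        return new BigInteger(Long.toUnsignedString(raw));
    }
}
```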
2023-03-31 13:40:10 +08:00
d6b0fe9072 [feature](jni) jni table scanner framework (#17960)
A framework that reads data through a JNI scanner, which can support data sources from the Java ecosystem (Java API).

## Java Interface
A Java scanner should extend `org.apache.doris.jni.JniScanner` and implement the following methods:
```
// Initialize JniScanner
public abstract void open() throws IOException;
// Close JniScanner and release resources
public abstract void close() throws IOException;
// Scan data and save as vector table
public abstract int getNext() throws IOException;
```
See demo usage in `org.apache.doris.jni.MockJniScanner`
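A minimal sketch of the open / getNext / close life cycle. To stay self-contained it declares a hypothetical stand-in class `SketchScanner` with the three methods listed above instead of extending the real `org.apache.doris.jni.JniScanner`. It also assumes that `getNext()` returns the number of rows in the batch and 0 once the source is exhausted, which this message does not state.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public final class ScannerSketch {
    private ScannerSketch() {}

    /** Hypothetical stand-in with the three methods listed above; not the real JniScanner. */
    public abstract static class SketchScanner {
        public abstract void open() throws IOException;
        public abstract void close() throws IOException;
        public abstract int getNext() throws IOException;
    }

    /** Serves rows from an in-memory list in fixed-size batches. */
    public static final class InMemoryScanner extends SketchScanner {
        private final List<String> source;
        private final int batchSize;
        private final List<String> currentBatch = new ArrayList<>();
        private int cursor;
        private boolean opened;

        public InMemoryScanner(List<String> source, int batchSize) {
            if (batchSize <= 0) {
                throw new IllegalArgumentException("batchSize must be positive");
            }
            this.source = source;
            this.batchSize = batchSize;
        }

        @Override
        public void open() {
            cursor = 0;
            opened = true;
        }

        @Override
        public void close() {
            currentBatch.clear();
            opened = false;
        }

        @Override
        public int getNext() throws IOException {
            if (!opened) {
                throw new IOException("scanner is not open");
            }
            currentBatch.clear();
            int end = Math.min(cursor + batchSize, source.size());
            currentBatch.addAll(source.subList(cursor, end));
            cursor = end;
            return currentBatch.size(); // assumed convention: 0 means no more data
        }
    }

    /** Drives a scanner through its full life cycle and counts the rows it produced. */
    public static int countRows(SketchScanner scanner) throws IOException {
        scanner.open();
        try {
            int total = 0;
            for (int n = scanner.getNext(); n > 0; n = scanner.getNext()) {
                total += n;
            }
            return total;
        } finally {
            scanner.close();
        }
    }
}
```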

## C++ Interface
C++ reader should use `doris::JniConnector` to get data from `org.apache.doris.jni.JniScanner`. See demo usage in `doris::MockJniReader`.

## Pushed-down predicates
Java scanner can get pushed-down predicates by `org.apache.doris.jni.vec.ScanPredicate`.

## Remaining work:
1. Implement complex nested types.
2. Read hudi MOR table as the end-to-end demo usage.
2023-03-30 23:47:45 +08:00
3e8b3d68fc [BugFix](jdbc catalog) fix OOM when jdbc catalog querys large data from doris #18067
When JDBC Catalog queries Doris data, Doris does not provide a cursor-based read (that is, fetchBatchSize has no effect), so Doris sends all the data to the client at once, which causes a client OOM.

The MySQL protocol provides a streaming read, which Doris can use to avoid the OOM. Streaming requires setting fetchBatchSize = Integer.MIN_VALUE and creating the statement with ResultSet.TYPE_FORWARD_ONLY and ResultSet.CONCUR_READ_ONLY.
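A minimal sketch of those two requirements expressed through the standard `java.sql` API. The class and method names (`StreamingRead`, `createStreamingStatement`) are hypothetical; treating `Integer.MIN_VALUE` as the streaming signal is behavior of MySQL Connector/J and not of JDBC in general.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public final class StreamingRead {
    private StreamingRead() {}

    /** Creates a statement configured so that MySQL Connector/J streams rows one at a time. */
    public static Statement createStreamingStatement(Connection conn) throws SQLException {
        // Forward-only, read-only result sets are required for streaming.
        Statement stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        // Connector/J interprets this fetch size as "do not buffer the whole result".
        stmt.setFetchSize(Integer.MIN_VALUE);
        return stmt;
    }
}
```

The caller still has to read or close the result set completely before issuing another query on the same connection.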
2023-03-26 20:02:03 +08:00
e2e806a5e7 [improve](clickhouse jdbc) support clickhouse array type (#17993)
In this PR, I match the array type of ClickHouse to the array type of Doris's jdbc external.
2023-03-22 19:42:32 +08:00
e359e412e1 [vectorized](udaf) fix java udaf meet error of std::bad_alloc (#17848)
If the user code of a Java UDAF throws an exception, nothing in the C++ code of the aggregate function handles it,
which may lead to a std::bad_alloc error.
2023-03-19 11:52:15 +08:00
Pxl
1a549edac2 [Chore](third-party) upgrade thrift from 0.13 to 0.16 (#17202)
Upgrade thrift from 0.13 to 0.16.
Thrift's release notes: https://github.com/apache/thrift/blob/master/CHANGES.md
2023-03-10 11:33:16 +08:00
4ef46159ae [vectorized](udaf) support array type for java-udaf (#17351) 2023-03-09 11:30:07 +08:00
d908d5fe01 [dependency](fe)Dependency Upgrade (#17377)
* Upgrade log4j to 2.X
  - binding log4j version to 2.18.0
  - use log4j-1.2-api for a smooth upgrade
* Upgrade fileupload to 1.5
* Upgrade commons-io to 2.7
* Upgrade commons-compress to 1.22
* Upgrade gson to 2.8.9
* Upgrade guava to 30.0-jre
* Binding jackson version to 2.14.2
* Upgrade netty-all to 4.1.89.final
* Upgrade protobuf to 3.21.12
* Upgrade kafka-clients to 3.4.0
* Upgrade calcite version to 1.33.0
* Upgrade aws-java-sdk to 1.12.302
* Upgrade hadoop to 3.3.4
* Upgrade zookeeper to 3.4.14
* Binding tomcat-embed-core to 8.5.86
* Upgrade apache parent pom to 25
* Use hive-exec-core as a hive dependency, add the missing jar-hive-serde separately
* Basic public dependencies are extracted to parent dependencies
* Use jackson uniformly as the basic json tool
* Remove springloaded, spring-boot-devtools has the same functionality
* Change the scope of spark-related dependencies to provided, since they should be supplied at runtime
2023-03-08 14:28:40 +08:00
48c2d806d7 [enhencement](jdbc catalog) Use Druid instead of HikariCP in JdbcClient (#17395)
This PR does three things:
1. Use Druid instead of HikariCP in JdbcClient.
2. When downloading a UDF jar, append the jar package name to the local file name.
3. Refactor some jdbcResource code.
2023-03-07 08:51:10 +08:00
c2cc75d741 [BugFix](Jdbc Catalog) Fix null pointer exception in JdbcExecutor (#16958)
This PR does two things:
1. Fix:
    JdbcExecutor uses `column[0]` to determine the class type, but `column[0]` may be null.

2. Enhancement:
    In the original logic, all fields in a JDBC catalog table are set to Nullable.
    However, nullable fields are inefficient. Through JDBC we can tell whether the fields of the data source table
    are nullable, so we can set the corresponding fields in the Doris JDBC catalog to nullable or not accordingly.
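A minimal sketch of the first point: probe the column for its first non-null element instead of trusting index 0. `ColumnTypeProbe` and `firstNonNullClass` are hypothetical names, and this is not presented as the actual change made in JdbcExecutor.

```java
public final class ColumnTypeProbe {
    private ColumnTypeProbe() {}

    /**
     * Returns the runtime class of the first non-null element,
     * or null when the column is null, empty, or contains only nulls.
     */
    public static Class<?> firstNonNullClass(Object[] column) {
        if (column == null) {
            return null;
        }
        for (Object value : column) {
            if (value != null) {
                return value.getClass();
            }
        }
        return null;
    }
}
```

A caller would still need a fallback for the all-null case, since no class can be inferred from it.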
2023-02-23 14:04:54 +08:00
dc3dab5a23 [vectorized](jdbc) fix jdbc connect sql server error (#16929) 2023-02-22 19:36:27 +08:00
54bf40b6e7 [feature](Nereids): Eliminate duplicate join condition. (#16910) 2023-02-21 19:40:44 +08:00
5291f14aff [vectorized](udf) java udf support array type (#16841) 2023-02-20 10:00:25 +08:00
af5dc7565e [bug](udf) fix udf return type of decimal check scale must is 9 (#16497) 2023-02-14 10:53:53 +08:00
b99e2dc727 [bug](jdbc) fix jdbc can't get object of PGobject (#16496)
When a PG table has unsupported column types such as point, polygon, or jsonb,
the JDBC catalog converts them to the string type in Doris, but the object in the Java result set is org.postgresql.util.PGobject.

Some tests need this PR: #16442
2023-02-10 16:19:02 +08:00
458adf6c91 [improvement](jdbc) refator jdbc of copy result set by batch (#16337)
Tested reads on a JDBC external table: performance improves by more than 10% after the optimization.
2023-02-04 22:51:55 +08:00
253445ca46 [vectorzied](jdbc) fix jdbc executor for get result by batch and memo… (#15843)
1. The result set should be fetched by batch size.
2. Fix memory leak.
2023-01-21 08:22:22 +08:00
01c001e2ac [refactor](javaudf) simplify UdfExecutor and UdafExecutor (#16050)
* [refactor](javaudf) simplify UdfExecutor and UdafExecutor

* update

* update
2023-01-21 08:07:28 +08:00
7814d2b651 [Fix](Oracle External Table) fix that oracle external table can not insert batch values (#16117)
This PR fixes two bugs:

1. `_jdbc_scanner` may be nullptr in vjdbc_connector.cpp, so we use another method to count JDBC statistics. Closes "[Enhencement](jdbc scanner) add profile for jdbc scanner" #15914.
2. In the batch insertion scenario, the Oracle database does not support the syntax `insert into tables values (...),(...);`. What it supports is:

insert all
into table(col1,col2) values(c1v1, c2v1)
into table(col1,col2) values(c1v2, c2v2)
SELECT 1 FROM DUAL;
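
A minimal sketch of generating that statement shape from a batch of rows. `OracleBatchInsert` and `buildInsertAll` are hypothetical names, and the values are assumed to be already rendered as SQL literals; production code would bind parameters instead of concatenating values.

```java
import java.util.List;

public final class OracleBatchInsert {
    private OracleBatchInsert() {}

    /** Builds one Oracle INSERT ALL statement covering every row in the batch. */
    public static String buildInsertAll(String table, List<String> columns, List<List<String>> rows) {
        if (rows.isEmpty()) {
            throw new IllegalArgumentException("at least one row is required");
        }
        String columnList = String.join(",", columns);
        StringBuilder sql = new StringBuilder("INSERT ALL");
        for (List<String> row : rows) {
            if (row.size() != columns.size()) {
                throw new IllegalArgumentException("row width does not match column count");
            }
            sql.append(" INTO ").append(table)
               .append('(').append(columnList).append(')')
               .append(" VALUES (").append(String.join(", ", row)).append(')');
        }
        // INSERT ALL requires a trailing subquery; selecting from DUAL yields exactly one row.
        sql.append(" SELECT 1 FROM DUAL");
        return sql.toString();
    }
}
```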
2023-01-21 07:57:12 +08:00
1638936e3f [fix](oracle catalog) oracle catalog support TIMESTAMP dateType of oracle (#16113)
The `TIMESTAMP` data type of Oracle maps to the `DateTime` data type of Doris.
2023-01-20 14:47:58 +08:00
4035bd83c3 [fix](jdbc) fix jdbc driver bug and external datasource p2 test case issue (#16033)
Fix a bug where creating a JDBC resource with only the JDBC driver file name fails the checksum.
This is because we forgot to pass the full driver URL to JdbcClient.

Add ResultSet.FETCH_FORWARD and set AutoCommit to false on the JDBC connection to avoid OOM when fetching a large amount of data.

Set useCursorFetch in the JDBC URL for both MySQL and PostgreSQL.

Fix some p2 external datasource bugs.
2023-01-18 17:48:06 +08:00
4b49d05e97 [refactor](fe) remove type related class to fe-common to reduce java-udf jar size (#15808) 2023-01-17 00:01:15 +08:00
2c9c7c48ac [improvement](decimalv3) Java UDF and array type support DECIMALV3 (#15674) 2023-01-09 15:13:16 +08:00
df2da89b89 [feature](multi-catalog) support postgresql jdbc catalog (#15570)
support postgresql jdbc catalog
2023-01-06 11:00:59 +08:00
85c7c531f1 [vectorized](jdbc) support array type in jdbc external table (#15303) 2022-12-30 00:29:08 +08:00
d48abd91df [deps](fe)upgrade deps version (#15262)
upgrade hadoop version to 2.10.2
jackson-databind to 2.14.1
2022-12-24 22:18:10 +08:00
e8bac706d3 [deps](FE)Upgrade the velocity version that hive-exec depends on to 2.3 (#15067) 2022-12-19 14:20:11 +08:00
17e14e9a63 [bug](udaf) fix java udaf incorrect get null value with row (#15151) 2022-12-19 10:07:12 +08:00
962810b973 [Vectorized](jdbc) add check type for jdbc table (#14501) 2022-12-08 10:27:47 +08:00
9d2cb133f2 [fix](jdbc) fix logger error of statusLogger unrecognized (#14854)
* [fix](jdbc) fix logger error of statusLogger unrecognized

* update
2022-12-07 11:43:05 +08:00
9272680d00 [feature](multi-catalog) support Jdbc catalog (#14527)
I add a JDBC catalog for the Doris multi-catalog feature.
Currently, the JDBC catalog only supports the MySQL DBMS.

TODO:

* Support for PostgreSQL.
* Support for other databases.

Problem summary

For the JDBC catalog, we can create a catalog like:

CREATE CATALOG jdbc4 PROPERTIES (
    "type"="jdbc",
    "jdbc.user"="root",
    "jdbc.password"="123456",
    "jdbc.jdbc_url" = "jdbc:mysql://127.0.0.1:13396/demo?yearIsDateType=false",
    "jdbc.driver_url" = "file:/mnt/disk2/ftw/tools/jar/mysql-connector-java-5.1.47/mysql-connector-java-5.1.47.jar",
    "jdbc.driver_class" = "com.mysql.jdbc.Driver"
);
Note:
yearIsDateType is a JDBC parameter:
If the yearIsDateType configuration property is set to false, the returned object type is java.lang.Short. If set to true (the default), the returned object is of type java.sql.Date with the date set to January 1st, at midnight.
For compatibility with MySQL, FE forces yearIsDateType=false. If the user sets yearIsDateType=true, Doris FE changes it to yearIsDateType=false.
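
A minimal sketch of forcing that parameter in a JDBC URL. `JdbcUrlRewriter` and `forceYearIsDateTypeFalse` are hypothetical names, the matching is plain case-sensitive string handling, and this is not presented as the FE implementation.

```java
public final class JdbcUrlRewriter {
    private static final String PARAM = "yearIsDateType";

    private JdbcUrlRewriter() {}

    /** Returns the URL with yearIsDateType=false, overriding or appending the parameter as needed. */
    public static String forceYearIsDateTypeFalse(String jdbcUrl) {
        // Override an explicit user setting of true.
        String rewritten = jdbcUrl.replace(PARAM + "=true", PARAM + "=false");
        if (rewritten.contains(PARAM + "=")) {
            return rewritten;
        }
        // The parameter is absent: append it with the right separator.
        String separator;
        if (rewritten.endsWith("?") || rewritten.endsWith("&")) {
            separator = "";
        } else if (rewritten.contains("?")) {
            separator = "&";
        } else {
            separator = "?";
        }
        return rewritten + separator + PARAM + "=false";
    }
}
```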
2022-11-30 11:28:08 +08:00
36419fae48 [fix](JdbcExecutor) fix that JdbcExecutor did not load the class jar (#14598)
JdbcExecutor did not load the JDBC driver jar, so add a classloader to load it.
2022-11-26 23:53:05 +08:00
496a92b668 [JavaUDF](loader) Fix compatible problem for JAVA 11 (#14519) 2022-11-23 23:36:39 +08:00
ce489cf723 [Feature](JDBC)support clickhouse jdbc external table (#14244) 2022-11-21 10:33:53 +08:00