Commit Graph

61 Commits

Author SHA1 Message Date
575c1620c2 [Improve](fe)Use commons-lang3 uniformly and refactor PatternGenerator#generateTypePattern (#18666)
`commons-lang`(1and2) is no longer maintained since 2011, and the official recommendation is `commons-lang3`, which can be smoothly upgraded to be compatible with `commons-lang`.
We use both dependencies in `fe`, which can be completely unified.

`PatternGenerator#generateTypePattern` has many meaningless loops, and IntegerRange is introduced for,
which is unnecessary. So I refactored it.
2023-04-17 20:15:17 +08:00
ddbff2aa39 [feature](jni) map c++ block to java vector table (#18566)
PR(#17960) has introduced vector table which can map java table to c++ block.
In some cases(java udf & jdbc exector), we should map c++ block to java table. This PR implements this function.

The memory structure of java vector table and c++ block is consistent,
so the implementation doesn't copy the block, just passes the memory address.
2023-04-17 00:04:53 +08:00
afdac1204d [improve](postgresql catalog) support postgresql bytea type to doris string (#18623)
* [improve](postgresql catalog) support postgresql bytea type to doris string

* modify function name

* add case
2023-04-16 18:14:42 +08:00
e1b3955e05 [refactor](jdbc) using jvm parameters to init jdbc datasource (#18670)
using the jvm parameters to init jdbc datasource connect pool.
if anyone don't need to maintain the connect, so could set JDBC_MIN_POOL=0
2023-04-14 18:45:29 +08:00
1d3699a70c [refactor](jdbc) refactor jdbc connection num in datasource (#18563)
now maybe jdbc have problem that there are too many connections and they do not release,
so change the property of datasource: init = 1, min = 1, max = 100, and idle time is 10 minutes.
2023-04-13 22:08:08 +08:00
e29fc3b46b [fix](chore) fix compile failed in JdbcExecutor and revert #18306 since be crash randomly (#18371)
fix 2 problems:
1. PR #18187 use the api resizeColumn in JNINativeMethod has been removed by #17960
2. revert PR #18306 to fix pipeline core when load
2023-04-04 20:04:28 +08:00
54dbb4af67 [vectorzied](jdbc) refactor jdbc table read array type (#18187)
jdbc read array type get result from Doris is string, PG is java.sql.array, CK is java.lang.object
it's difficult to maintain and read the code,
so change all database's array result to string, then add a cast function from string to doris array type
2023-04-04 11:57:04 +08:00
fe9d2b00fc [test](jdbc catalog) add clickhouse jdbc catalog base type test (#18007) 2023-04-03 20:18:36 +08:00
1c2f95b887 [improve](clickhouse jdbc) support clickhouse jdbc 4.x version (#18258)
In clickhouse's 4.x version of jdbc, some UInt types use special Java types, so I adapted Doris's ClickHouse JDBC External
```
com.clickhouse.data.value.UnsignedByte;
com.clickhouse.data.value.UnsignedInteger;
com.clickhouse.data.value.UnsignedLong;
com.clickhouse.data.value.UnsignedShort;
```
2023-03-31 13:40:10 +08:00
d6b0fe9072 [feature](jni) jni table scanner framework (#17960)
A framework that read data from jni scanner, which can support the data source from java ecosystem(java API).

## Java Interface
Java scanner should extends `org.apache.doris.jni.JniScanner`, implements the following methods:
```
// Initialize JniScanner
public abstract void open() throws IOException;
// Close JniScanner and release resources
public abstract void close() throws IOException;
// Scan data and save as vector table
public abstract int getNext() throws IOException;
```
See demo usage in `org.apache.doris.jni.MockJniScanner`

## c++ interface
C++ reader should use `doris::JniConnector` to get data from `org.apache.doris.jni.JniScanner`. See demo usage in `doris::MockJniReader`. 

## Pushed-down predicates
Java scanner can get pushed-down predicates by `org.apache.doris.jni.vec.ScanPredicate`.

## Remaining works:
1. Implement complex nested types.
2. Read hudi MOR table as the end-to-end demo usage.
2023-03-30 23:47:45 +08:00
3e8b3d68fc [BugFix](jdbc catalog) fix OOM when jdbc catalog querys large data from doris #18067
When using JDBC Catalog to query the Doris data, because Doris does not provide the cursor reading method (that is, fetchBatchSize is invalid), Doris will send the data to the client at one time, resulting in client OOM.

The MySQL protocol provides a stream reading method. Doris can use this method to avoid OOM. The requirements of using the stream method are setting fetchbatchsize =  Integer.MIN_VALUE and setting ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY
2023-03-26 20:02:03 +08:00
e2e806a5e7 [improve](clickhouse jdbc) support clickhouse array type (#17993)
In this PR, I match the array type of ClickHouse to the array type of Doris's jdbc external.
2023-03-22 19:42:32 +08:00
e359e412e1 [vectorized](udaf) fix java udaf meet error of std::bad_alloc (#17848)
Now if the user code of java udaf throws exception, because c++ code of agg function nobody could deal
with it, so maybe get error of std::bad_alloc
2023-03-19 11:52:15 +08:00
Pxl
1a549edac2 [Chore](third-party) upgrade thrift from 0.13 to 0.16 (#17202)
upgrade thrift from 0.13 to 0.16
There is thrift's release notes https://github.com/apache/thrift/blob/master/CHANGES.md
2023-03-10 11:33:16 +08:00
4ef46159ae [vectorized](udaf) support array type for java-udaf (#17351) 2023-03-09 11:30:07 +08:00
d908d5fe01 [dependency](fe)Dependency Upgrade (#17377)
* Upgrade log4j to 2.X
  - binding log4j version to 2.18.0
  - used log4j-1.2-api complete smooth upgrade
* Upgrade filerupload to 1.5
* Upgrade commons-io to 2.7
* Upgrade commons-compress to 1.22
* Upgrade gson to 2.8.9
* Upgrade guava to 30.0-jre
* Binding jackson version to 2.14.2
* Upgrade netty-all to 4.1.89.final
* Upgrade protobuf to 3.21.12
* Upgrade kafka-clints to 3.4.0
* Upgrade calcite version to 1.33.0
* Upgrade aws-java-sdk to 1.12.302
* Upgrade hadoop to 3.3.4
* Upgrade zookeeper to 3.4.14
* Binding tomcat-embed-core to 8.5.86
* Upgrade apache parent pom to 25
* Use hive-exec-core as a hive dependency, add the missing jar-hive-serde separately
* Basic public dependencies are extracted to parent dependencies
* Use jackson uniformly as the basic json tool
* Remove springloaded, spring-boot-devtools has the same functionality
* Modify the spark-related dependency scope to provide, which should be provided at runtime
2023-03-08 14:28:40 +08:00
c2cc75d741 [BugFix](Jdbc Catalog) Fix null pointer exception in JdbcExecutor (#16958)
This pr do two things:
1. fix: 
    It use `column[0]` to judge class type in JdbcExecutor, but column[0] may be null !

2. Enhencement
    In the original logic, all fields in jdbc catalog table will be set Nullable.
    However, it is inefficient for nullable fields. Actually, we can know if the fields in data source table
    is nullable through jdbc. So we can set the corresponding fields in Doris jdbc catalog to nullable or not.
2023-02-23 14:04:54 +08:00
dc3dab5a23 [vectorized](jdbc) fix jdbc connect sql server error (#16929) 2023-02-22 19:36:27 +08:00
54bf40b6e7 [feature](Nereids): Eliminate duplicate join condition. (#16910) 2023-02-21 19:40:44 +08:00
5291f14aff [vectorized](udf) java udf support array type (#16841) 2023-02-20 10:00:25 +08:00
af5dc7565e [bug](udf) fix udf return type of decimal check scale must is 9 (#16497) 2023-02-14 10:53:53 +08:00
b99e2dc727 [bug](jdbc) fix jdbc can't get object of PGobject (#16496)
when pg table have some  unsupported column type like: point, polygon, jsonb......
jdbc catalog will convert it to string type in doris. but get result set in java is org.postgresql.util.PGobject
 
Some test need this pr: #16442
2023-02-10 16:19:02 +08:00
458adf6c91 [improvement](jdbc) refator jdbc of copy result set by batch (#16337)
have test jdbc external table with read,  10%+ performance improvement after optimization
2023-02-04 22:51:55 +08:00
253445ca46 [vectorzied](jdbc) fix jdbc executor for get result by batch and memo… (#15843)
result set should be get by batch size2.
fix memory leak3.
2023-01-21 08:22:22 +08:00
01c001e2ac [refactor](javaudf) simplify UdfExecutor and UdafExecutor (#16050)
* [refactor](javaudf) simplify UdfExecutor and UdafExecutor

* update

* update
2023-01-21 08:07:28 +08:00
7814d2b651 [Fix](Oracle External Table) fix that oracle external table can not insert batch values (#16117)
Issue Number: close #xxx

This pr fix two bugs:

_jdbc_scanner may be nullptr in vjdbc_connector.cpp, so we use another method to count jdbc statistic. close [Enhencement](jdbc scanner) add profile for jdbc scanner #15914
In the batch insertion scenario, oracle database does not support syntax insert into tables values (...),(...); , what it supports is:
insert all
into table(col1,col2) values(c1v1, c2v1)
into table(col1,col2) values(c1v2, c2v2)
SELECT 1 FROM DUAL;
2023-01-21 07:57:12 +08:00
1638936e3f [fix](oracle catalog) oracle catalog support TIMESTAMP dateType of oracle (#16113)
`TIMESTAMP` dateType of Oracle will map to `DateTime` dateType of Doris
2023-01-20 14:47:58 +08:00
4035bd83c3 [fix](jdbc) fix jdbc driver bug and external datasource p2 test case issue (#16033)
Fix bug that when create jdbc resource with only jdbc driver file name, it will failed to do checksum
This is because we forgot the pass the full driver url to JdbcClient.

Add ResultSet.FETCH_FORWARD and set AutoCommit to false to jdbc connection, so to avoid OOM when fetching large amount of data

set useCursorFetch in jdbc url for both MySQL and PostgreSQL.

Fix some p2 external datasource bug
2023-01-18 17:48:06 +08:00
4b49d05e97 [refactor](fe) remove type related class to fe-common to reduce java-udf jar size (#15808) 2023-01-17 00:01:15 +08:00
2c9c7c48ac [improvement](decimalv3) Java UDF and array type support DECIMALV3 (#15674) 2023-01-09 15:13:16 +08:00
df2da89b89 [feature](multi-catalog) support postgresql jdbc catalog (#15570)
support postgresql jdbc catalog
2023-01-06 11:00:59 +08:00
85c7c531f1 [vectorized](jdbc) support array type in jdbc external table (#15303) 2022-12-30 00:29:08 +08:00
17e14e9a63 [bug](udaf) fix java udaf incorrect get null value with row (#15151) 2022-12-19 10:07:12 +08:00
962810b973 [Vectorized](jdbc) add check type for jdbc table (#14501) 2022-12-08 10:27:47 +08:00
9d2cb133f2 [fix](jdbc) fix logger error of statusLogger unrecognized (#14854)
* [fix](jdbc) fix logger error of statusLogger unrecognized

* update
2022-12-07 11:43:05 +08:00
9272680d00 [feature](multi-catalog) support Jdbc catalog (#14527)
Issue Number: close #xxx

I add jdbc catalog for doris multi-catalog feature.
Currently, the jdbc catalog only supports MYSQL DBMS.

TODO:

support for postgre DB
Support for other databases.
Problem summary
For jdbc catalog, we can create catalog like:

CREATE CATALOG jdbc4 PROPERTIES (
    "type"="jdbc",
    "jdbc.user"="root",
    "jdbc.password"="123456",
    "jdbc.jdbc_url" = "jdbc:mysql://127.0.0.1:13396/demo?yearIsDateType=false",
    "jdbc.driver_url" = "file:/mnt/disk2/ftw/tools/jar/mysql-connector-java-5.1.47/mysql-connector-java-5.1.47.jar",
    "jdbc.driver_class" = "com.mysql.jdbc.Driver"
);
Note:
yearIsDateType is a param of jdbc:
If yearIsDateType configuration property is set to false, then the returned object type is java.sql.Short. If set to true (the default), then the returned object is of type java.sql.Date with the date set to January 1st, at midnight.
To compat with mysql, we force the use of yearIsDateType=false in FE. if user sets yearIsDateType=true, doris FE will force to change yearIsDateType=false.
2022-11-30 11:28:08 +08:00
36419fae48 [fix](JdbcExecutor) fix that JdbcExecutor did not load the class jar (#14598)
JdbcExecutor did not load jdbc driver jar, so add classloader to load jdbc jar.
2022-11-26 23:53:05 +08:00
496a92b668 [JavaUDF](loader) Fix compatible problem for JAVA 11 (#14519) 2022-11-23 23:36:39 +08:00
ce489cf723 [Feature](JDBC)support clickhouse jdbc external table (#14244) 2022-11-21 10:33:53 +08:00
df622d8b7d [Bug](udf) fix java-udaf process string type error and add some tests (#14106) 2022-11-10 09:30:57 +08:00
0a228a68d6 [Improvement](javaudf) support different date argument for date/datetime type (#13920) 2022-11-03 20:33:20 +08:00
287a739510 [javaudf](string) Fix string format in java udf (#13854) 2022-11-01 21:25:12 +08:00
3c95106d45 [Bug](jdbc) Fix memory leak for JDBC datasource (#13657) 2022-10-27 00:02:25 +08:00
847b80ebfa [test](jdbc) add jdbc and hive regression test (#13143)
1. Modify default behavior of `build.sh`
    The `BUILD_JAVA_UDF` is default ON, so that jvm is needed for compilation and runtime.

2. Add docker-compose for MySQL 5.7, PostgreSQL 14 and Hive 2
   See `docker/thirdparties/docker-compose`.

3. Add some regression test cases for jdbc query on MySQL, PG and Hive Catalog
   The default is `false`, if set to true, you need first start docker for MySQL/PG/Hive.

4. Support `if not exists` and `if exists` for create/drop resource and create/drop encryptkey
2022-10-21 15:29:27 +08:00
22a8d35999 [Feature](vectorized) support jdbc sink for insert into data to table (#12534) 2022-09-15 11:08:41 +08:00
42bdde8750 [Feature](Vectorized) support jdbc scan node (#12010) 2022-09-07 10:29:41 +08:00
adfef85c0c [improve](fe): use Pair.of to replace new Pair<>() (#11945) 2022-08-22 08:27:40 +08:00
Pxl
461a31b1f6 [enhancement][Storage] refactor create predicate (#11017) 2022-07-27 16:12:23 +08:00
e361eb385e [vectorized][udf] improvement java-udaf with group by clause (#10296)
save for file about udaf
add bool _destory_deserialize
update some code according reviewer
change destroy all data at once
2022-07-14 11:23:42 +08:00
3b46242483 [feature-wip] Optimize Decimal type (#10794)
* [feature-wip](decimalv3) support decimalv3

* [feature-wip] Optimize Decimal type

Co-authored-by: liaoxin <liaoxinbit@126.com>
2022-07-14 10:50:50 +08:00