69 Commits

Author SHA1 Message Date
657e927d50 [fix](json)Fix the bug that read json file Out of bounds access (#23411) 2023-09-02 01:11:37 +08:00
4c00b1760b [feature](partial update) Support partial update for broker load (#22970) 2023-08-29 14:41:01 +08:00
d19dcd6bc1 [improve](jdbc catalog) support sqlserver uniqueidentifier data type (#23297) 2023-08-28 10:30:10 +08:00
a7675243d9 [fix](jdbc catalog) fix adaptation to Oracle special character / table names (#23080)
The changes of this PR for JdbcOracleClient are as follows:

#### bug fixes:
  1. Fix the problem that columns are mixed up during schema synchronization when a table name containing `/` characters closely resembles another table name
  2. Fix the NPE during metadata synchronization after enabling the lower_case_table_names configuration

#### improvement:
  1. Change how Oracle users are mapped to Doris databases during synchronization: use `metadata.getSchemas` instead of `SELECT DISTINCT OWNER FROM all_tables`
  2. When synchronizing metadata, pass `conn.getCatalog` instead of `null` at the catalog level
2023-08-22 15:25:42 +08:00
51db11ed0b [improve](jdbc catalog) Add a variable to accommodate the final keyword in ClickHouse Jdbc Catalog queries (#23282) 2023-08-22 12:13:36 +08:00
41ff48f838 [regresstion][external]fix case test_show_where and es_query 0811 (#22898) 2023-08-12 19:41:55 +08:00
23094a01d4 [fix](test) load data inpath will remove the data in hdfs (#22908)
In Hive, loading data from HDFS moves the source directory into the table's location directory, which leads to errors such as `Can not get first file, please check uri` in the tvf test.
2023-08-12 15:12:00 +08:00
de5603da6b [regresstion][external]fix jdbc cases fail external 0809 (#22761)
fix jdbc cases fail external 0809
2023-08-10 15:23:30 +08:00
3eeca7ee55 [enhance](regresstion case)add external group mark 0727 (#22287)
* add external group mark 0727

* add external pipeline regression conf 0727

* update pipeline regression config 0727

* open es config from docker 0727
2023-07-28 17:11:19 +08:00
6f1c03c766 [fix](jdbc_catalog) fix int and bigint in mysql view when use doris catalog (#22251) 2023-07-27 16:50:42 +08:00
4f6a3c5bf0 [feature](catalog) support clob type in oracle jdbc catalog (#21532) 2023-07-27 15:49:15 +08:00
cf677b327b [fix](jdbc catalog) Fixed mappings with type errors for bool and tinyint(1) (#22089)
MySQL has no dedicated boolean type; its boolean is actually tinyint(1). The previous logic forced tinyint(1) to boolean by passing tinyInt1isBit=true, which causes an error when a tinyint(1) value is not 0 or 1. Therefore tinyint(1) is now mapped to tinyint instead of boolean. This change does not affect the correctness of where k = 1 or where k = true queries.
2023-07-25 22:45:22 +08:00
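The tinyint(1) change in #22089 can be illustrated with a minimal sketch. This is not the Doris implementation: `map_mysql_type` and `read_as_bit` are hypothetical names, and the small type table is an assumption made for illustration only.

```python
# Hypothetical sketch: why tinyint(1) is mapped to TINYINT rather than BOOLEAN.

def map_mysql_type(mysql_type: str) -> str:
    """Return an assumed Doris type name for a few MySQL integer types."""
    normalized = mysql_type.strip().lower()
    if normalized == "tinyint(1)":
        # Treated like any other tinyint, so values other than 0/1 survive.
        return "TINYINT"
    assumed_mapping = {"tinyint": "TINYINT", "int": "INT", "bigint": "BIGINT"}
    if normalized not in assumed_mapping:
        raise ValueError(f"type not covered by this sketch: {mysql_type}")
    return assumed_mapping[normalized]


def read_as_bit(value: int) -> bool:
    """Model of the old behaviour: a forced boolean only works for 0 and 1."""
    if value not in (0, 1):
        raise ValueError(f"tinyint(1) value {value} is not a valid boolean")
    return bool(value)
```

With the integer mapping, a stored value such as 2 is read back unchanged, while `where k = 1` still matches rows holding 1.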
5ed42705d4 [fix](jdbc scan) 1=1 does not translate to TRUE (#21688)
Most database systems recognize where 1=1 but not where true, so the original 1=1 should be sent to the database.
2023-07-11 14:04:49 +08:00
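As a rough sketch of the idea in #21688, the query sent to the remote database can be built from the predicate text as written rather than from its constant-folded form. The `Predicate` class and `to_remote_sql` function are hypothetical and do not correspond to Doris classes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Predicate:
    original_sql: str  # predicate text as written by the user, e.g. "1=1"
    folded_sql: str    # text after constant folding, e.g. "TRUE"


def to_remote_sql(table: str, predicates: List[Predicate]) -> str:
    """Build the remote query from the original predicate text.

    folded_sql is deliberately not used here, because many databases
    reject a bare TRUE in a WHERE clause.
    """
    where = " AND ".join(p.original_sql for p in predicates)
    sql = f"SELECT * FROM {table}"
    return f"{sql} WHERE {where}" if where else sql
```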
0be349e250 [feature](jdbc) Support jdbc catalog to read json types (#21341) 2023-07-10 16:21:00 +08:00
124516c1ea [Fix](orc-reader) Fix Wrong data type for column error when column order in hive table is not same in orc file schema. (#21306)
Fix the `Wrong data type for column` error that occurs when the column order in the Hive table differs from the column order in the ORC file schema.

The root cause lies in the logic that handles the following case:

ORC-format tables written by Hive 1.x may contain system column names such as `_col0`, `_col1`, `_col2`... in the underlying ORC file schema, so the column names of the Hive table must be used for mapping.

### Solution
For now, this issue is fixed by applying that handling only when the Hive version is set to 1.x.x in the Hive catalog configuration.

```sql
CREATE CATALOG hive PROPERTIES (
    'hive.version' = '1.x.x'
);
```
2023-07-03 09:32:55 +08:00
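The mapping rule described in #21306 can be sketched as follows. This is an illustrative model only: `map_orc_columns` is a hypothetical helper, and the version check by string prefix is an assumption rather than what the ORC reader does.

```python
import re
from typing import Dict, List

# Hive 1.x may write ORC files whose columns are named _col0, _col1, ...
_SYSTEM_COLUMN = re.compile(r"^_col(\d+)$")


def map_orc_columns(orc_names: List[str], hive_names: List[str],
                    hive_version: str) -> Dict[str, str]:
    """Return {ORC file column name: Hive table column name}."""
    matches = [_SYSTEM_COLUMN.match(name) for name in orc_names]
    if hive_version.startswith("1.") and orc_names and all(matches):
        # Positional mapping: _colN refers to the N-th Hive table column.
        mapping = {}
        for name, match in zip(orc_names, matches):
            index = int(match.group(1))
            if index < len(hive_names):
                mapping[name] = hive_names[index]
        return mapping
    # Otherwise map by name, so differing column order causes no mismatch.
    hive_set = set(hive_names)
    return {name: name for name in orc_names if name in hive_set}
```

Restricting the positional path to catalogs configured with a 1.x.x version keeps files with real column names on the name-based path, regardless of column order.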
449c8d4568 [fix](jdbc) Handling Zero DateTime Values in Non-nullable Columns for JDBC Catalog Reading MySQL (#21296) 2023-06-28 22:51:17 +08:00
a6ff87f32c [docker](trino) add Trino docker compose and hive catalog (#21086) 2023-06-28 11:04:41 +08:00
d871df64ca [improvement](oracle jdbc)Support for automatically obtaining the precision of the oracle timestamp type (#21252) 2023-06-28 00:19:01 +08:00
c9306e9c48 [improvement](ms jdbc)Support for automatically obtaining the precision of the sqlserver datetime type (#21145) 2023-06-26 23:10:46 +08:00
7e01f074e2 [improvement](jdbc mysql) support auto calculate the precision of timestamp/datetime (#20788) 2023-06-20 10:39:34 +08:00
fe18cfa2fb [improvement](pg jdbc)Support for automatically obtaining the precision of the postgresql timestamp type (#20909) 2023-06-16 23:41:09 +08:00
367f64e7bd [improvement](jdbc) support insert autoinc and default value column to mysql (#20765)
In JdbcMysqlClient, I've added methods to retrieve auto-increment and default value columns from MySQL. These columns are then mapped into Doris metadata to make them visible to users.

When handling the InsertStmt into an execution plan, Doris used to automatically fill in NULL or default values for columns not specified in the InsertStmt. However, in the JDBC catalog, we don't need Doris to handle these unspecified columns, so I've made changes to skip them directly.

For the insert prepared statement required for writing, our previous behavior was to obtain all columns for placeholders. So, the change I made is to pass in the columns processed by the execution plan during the sink task generation stage for dynamic generation.
2023-06-16 23:38:11 +08:00
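The dynamic statement generation described in #20765 might look like the sketch below, where placeholders are produced only for the columns the execution plan actually passes in. `build_insert_sql` is a hypothetical name, and the backtick quoting assumes a MySQL target.

```python
from typing import Sequence


def build_insert_sql(table: str, columns: Sequence[str]) -> str:
    """Build an INSERT prepared statement for the given columns only.

    Columns that are omitted (auto-increment or default-valued ones)
    are left for the remote MySQL server to fill in.
    """
    if not columns:
        raise ValueError("at least one column is required")
    column_list = ", ".join(f"`{name}`" for name in columns)
    placeholders = ", ".join("?" for _ in columns)
    return f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"
```

For a table with an auto-increment `id`, calling `build_insert_sql("t", ["name", "age"])` leaves `id` out of the statement entirely.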
722839e118 [Fix](multi-catalog) Fix hive transaction table regression test by adding hive-docker missing configurations. (#20832)
Fix the Hive transactional table regression test test_transactional_hive by adding the hive-docker configurations missing from #20679. Hive needs these configurations to run compaction.
2023-06-16 13:08:24 +08:00
73ad885e19 [Feature][Fix](multi-catalog) Implements transactional hive full acid tables. (#20679)
Following the support for insert-only transactional Hive tables in #19518 and #19419, this PR adds support for transactional Hive full ACID tables.

Hive3 transactional full ACID tables are supported.
Hive2 transactional full ACID tables require major compactions to be run.
2023-06-13 08:55:16 +08:00
4faee4d8fd [Fix](multi-catalog) Fix be crashed when query hive table after schema changed(new column added). (#20537)
Fix the BE crash when querying a Hive table after a schema change (new column added).

Regression Test: test_hive_schema_evolution.groovy
2023-06-08 18:10:36 +08:00
bd74890cf7 [fix](multi-catalog) JDBC Catalog Unknown UNSIGNED type of mysql, type: [DOUBLE] (#19912) 2023-05-23 09:29:57 +08:00
9535ed01aa [feature](tvf) Support compress file for tvf hdfs() and s3() (#19530)
This can be supported by adding a new property to the tvf, for example:

`select * from hdfs("uri" = "xxx", ..., "compress_type" = "lz4", ...)`

Users can:

1. Specify the compression explicitly by setting `"compress_type" = "xxx"`.
2. Let Doris infer the compression type from the file name suffix (e.g. `file1.gz`).

Currently, only compressed files in `csv` format can be read, and the BE side already supports this.
All that remains is to analyze `"compress_type"` on the FE side and pass it to the BE.
2023-05-16 08:50:43 +08:00
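The two ways of choosing a compression type in #19530 can be sketched as a small resolver. The suffix table, the `"plain"` fallback, and the name `resolve_compress_type` are assumptions for illustration, not the values Doris uses.

```python
from typing import Dict

# Assumed suffix table for illustration; the real list may differ.
_SUFFIX_TO_COMPRESS_TYPE = {".gz": "gz", ".bz2": "bz2", ".lz4": "lz4"}


def resolve_compress_type(uri: str, properties: Dict[str, str]) -> str:
    """Prefer an explicit "compress_type"; otherwise infer from the suffix."""
    explicit = properties.get("compress_type")
    if explicit:
        return explicit.lower()
    lowered = uri.lower()
    for suffix, compress_type in _SUFFIX_TO_COMPRESS_TYPE.items():
        if lowered.endswith(suffix):
            return compress_type
    return "plain"  # assumed marker for "not compressed"
```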
4418eb36a3 [Fix](multi-catalog) Fix some hive partition issues. (#19513)
Fix some hive partition issues.
1. Fix the BE crash when Hive partition fields are of type `date`, `timestamp`, or `decimal`.
2. Fix the HDFS URI decoding error with `timestamp` partition fields, whose values contain URL-encoded special characters, e.g. `:` is encoded as `%3A`.
2023-05-11 07:49:46 +08:00
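The URI decoding issue in #19513 comes down to partition values being URL-encoded in the directory name. A minimal sketch using the standard library follows; `decode_partition_path` and the sample path are hypothetical.

```python
from typing import Dict
from urllib.parse import unquote


def decode_partition_path(path: str) -> Dict[str, str]:
    """Split 'k1=v1/k2=v2' into a dict with URL-decoded values."""
    values = {}
    for segment in path.strip("/").split("/"):
        key, _, raw_value = segment.partition("=")
        values[key] = unquote(raw_value)  # e.g. %3A becomes ':'
    return values
```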
3a22af836e [fix](jdbc catalog) fix error to clickhouse uint64 type Conversion (#19463)
* [fix](jdbc catalog) fix error to clickhouse uint64 type Conversion

* add test case
2023-05-10 21:53:30 +08:00
224bca3794 [docker](hudi) add hudi docker compose (#19048) 2023-05-02 09:54:52 +08:00
Pxl
ec517a53a8 [Chore](build) upgrade clang-format version to 16 && move thrift to fe-common (#19155)
upgrade clang-format version to 16
move thrift to fe-common
fix core dump on pipeline engine when operator canceled and not prepared
2023-04-28 14:14:51 +08:00
25e8c71943 [test](fix) fix postgresql test (#18900)
* [test](fix) fix postgresql test

* fix
2023-04-23 18:41:41 +08:00
1ff2ccc6c5 [Fix](docker) Fix regression test docker issues. (#18928)
1. Fix not reset data after pg restarted.
2. 'docker-compose' to 'docker compose'.
2023-04-22 18:03:50 +08:00
afdac1204d [improve](postgresql catalog) support postgresql bytea type to doris string (#18623)
* [improve](postgresql catalog) support postgresql bytea type to doris string

* modify function name

* add case
2023-04-16 18:14:42 +08:00
fe9d2b00fc [test](jdbc catalog) add clickhouse jdbc catalog base type test (#18007) 2023-04-03 20:18:36 +08:00
32ccf0c68d [test](case)add external hive parquet case 0328 #18169
add case about external hive parquet
2023-03-29 09:13:03 +08:00
4ba93efc98 [Enhance](DOE)Support parse default es iso datetime string (#17412)
* support parse default es iso datetime string
2023-03-10 09:59:20 +08:00
9bcc3ae283 [Fix](DOE)Fix be core dump when parse es epoch_millis date format (#17100) 2023-02-28 20:09:35 +08:00
3a9aa03aab [BugFix](oracle-catalog) Modify the doris data type mapping of oracle NUMBER(p,s) type (#17051)
The Oracle `NUMBER(p,s)` data type differs semantically from the Doris decimal type.
For the Oracle `NUMBER(p,s)` type:
1. If s<0, the value is an integer. This `NUMBER(p,s)` has (p+|s|) significant digits, and rounding is performed at position s. E.g. if we insert 1234567 into a `NUMBER(5,-2)` column, Oracle stores 1234600. In this case, Doris uses an integer type (`TINYINT/SMALLINT/INT/.../LARGEINT`).

2. If s>=0 && s<p, it behaves just like Doris Decimal(p,s).

3. If s>=0 && s>p, the value is a pure fraction (like 0.xxxxx). p is the number of significant digits allowed after the leading zeros to the right of the decimal point, and digits beyond position s are rounded. E.g. 0.0123456 cannot be inserted into a `NUMBER(5,7)` column, because there must be two zeros immediately to the right of the decimal point, whereas 0.0012345 can. In this case, Doris uses `DECIMAL(s,s)`.

4. If p and s are not specified, as in a bare `NUMBER`, the p and s of `NUMBER` are undetermined. In this case, Doris cannot determine p and s, so it cannot determine the data type.
2023-02-26 09:05:41 +08:00
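The four `NUMBER(p,s)` cases from #17051 can be written out as a small sketch. The digit thresholds used to pick an integer type are assumptions chosen so that every value fits, and `map_oracle_number` and `round_to_scale` are hypothetical names, not the Doris code.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Optional

# Assumed limits: the largest digit count that always fits each integer type.
_INTEGER_TYPES = ((2, "TINYINT"), (4, "SMALLINT"), (9, "INT"),
                  (18, "BIGINT"), (38, "LARGEINT"))


def map_oracle_number(p: Optional[int], s: Optional[int]) -> Optional[str]:
    """Return a Doris type name, or None when the type cannot be determined."""
    if p is None or s is None:
        return None  # bare NUMBER: p and s are undetermined
    if s < 0:
        digits = p + abs(s)  # integer with p + |s| digits
        for limit, type_name in _INTEGER_TYPES:
            if digits <= limit:
                return type_name
        return None
    if s > p:
        return f"DECIMAL({s},{s})"  # pure fraction such as 0.0012345
    # s == p yields the same result under either rule.
    return f"DECIMAL({p},{s})"


def round_to_scale(value: str, s: int) -> Decimal:
    """Round a literal at scale s; a negative s rounds left of the point."""
    return Decimal(value).quantize(Decimal(1).scaleb(-s), rounding=ROUND_HALF_UP)
```

For example, `map_oracle_number(5, -2)` counts 7 digits and picks `INT` under these assumed limits, and `round_to_scale("1234567", -2)` rounds to the nearest hundred.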
92ecd16573 (feature)[DOE]Support array for Doris on ES (#16941)
* (feature)[DOE]Support array for Doris on ES
2023-02-23 19:31:18 +08:00
3c3110b253 [Fix](Jdbc Catalog) jdbc catalog support to connect to doris database (#16527)
Doris can use mysql-jdbc-jar to connect to a Doris database, but Doris has some data types that MySQL lacks, such as DecimalV3 and Date/DatetimeV2.
This PR adds some case judgments in `Mysql Catalog` so that the JDBC catalog can identify Doris data types.
2023-02-10 20:24:40 +08:00
557159d3ce [feature](JdbcExternalCatalog) support insert data in JdbcExternalCatalog (#16271) 2023-02-02 17:31:33 +08:00
c9f66250a8 [docker](iceberg) add iceberg docker compose and modify scripts (#16175)
Add iceberg docker compose
Rename start-thirdparties-docker.sh to run-thirdparties-docker.sh and support starting or stopping specified components.
2023-01-29 14:31:27 +08:00
1638936e3f [fix](oracle catalog) oracle catalog support TIMESTAMP dateType of oracle (#16113)
The `TIMESTAMP` data type of Oracle maps to the `DateTime` data type of Doris.
2023-01-20 14:47:58 +08:00
ba71516eba [feature](jdbc catalog) support SQLServer jdbc catalog (#16093) 2023-01-20 12:37:38 +08:00
2580c88c1b [feature](multi-catalog) support oracle jdbc catalog (#15862) 2023-01-14 00:01:33 +08:00
500c7fb702 [improvement](multi-catalog) support unsupported column type (#15660)
When an external catalog is created, Doris automatically syncs the table schemas from the external catalog.
But some column types, such as struct and map, are not yet supported by Doris.

Previously, when Doris met such an unsupported column, it threw an exception and the corresponding
table could not be synced. But the user may just want to query the other, supported columns.

This PR adds a new column type: UNSUPPORTED. For now it is used only for external table schema sync.
An unsupported column is synced as a column of type UNSUPPORTED.

When querying such a table, there are several situations:

* select * from table: throws the error Unsupported type 'UNSUPPORTED_TYPE' xxx
* select k1 from table: k1 has a supported type. Query OK.
* select * except(k2): k2 has an unsupported type. Query OK.
2023-01-08 10:07:10 +08:00
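The three query situations listed for #15660 can be modelled with a short sketch. `resolve_select`, `UnsupportedTypeError`, and the schema dictionary are hypothetical and only mirror the behaviour described in the commit message.

```python
from typing import Dict, Iterable, List, Optional

UNSUPPORTED = "UNSUPPORTED"


class UnsupportedTypeError(Exception):
    """Raised when a query touches a column synced as UNSUPPORTED."""


def resolve_select(schema: Dict[str, str],
                   selected: Optional[Iterable[str]] = None,
                   excluded: Iterable[str] = ()) -> List[str]:
    """Return the output columns; selected=None stands for 'select *'."""
    skip = set(excluded)
    names = list(schema) if selected is None else list(selected)
    names = [name for name in names if name not in skip]
    for name in names:
        if schema[name] == UNSUPPORTED:
            raise UnsupportedTypeError(f"Unsupported type for column '{name}'")
    return names
```

The table itself still syncs; only queries that actually read an UNSUPPORTED column fail.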
df2da89b89 [feature](multi-catalog) support postgresql jdbc catalog (#15570)
support postgresql jdbc catalog
2023-01-06 11:00:59 +08:00
e7a077a81f [fix](jdbc catalog) fix bugs of jdbc catalog and table valued function (#15216)
* fix bugs

* add `desc function` test

* add test

* fix
2022-12-23 16:46:39 +08:00
7627defc88 [fix](regression-test) Add test data for test_mysql_jdbc_catalog and fix mysql-5.7.yaml about UTF8 (#14749)
Fix two things:
1. Fix garbled text in the MySQL table even though UTF8 is specified for the table.
2. Fix missing returned data for table `ex_tb13` in `test_mysql_jdbc_catalog.out`.
2022-12-02 11:58:11 +08:00