[Enhancement](Jdbc Catalog) Map Jdbc Catalog JSON Type to String for Improved Performance and Compatibility (#30035)

This PR maps JSON types from external JDBC catalogs to String instead of JsonB in Apache Doris. The motivation is that JDBC returns JSON data as a JSON string regardless of how the source stores it (Json(String) or Json(Binary)). Mapping to String streamlines data retrieval, simplifies write-backs, and stays compatible with all JSON(String) and JSON(Binary) functions, although JSON columns are now displayed as Strings in Doris, which may look misleading. It also avoids the performance overhead and complexity of converting every row of data from JsonB to String.
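
The mapping change described above can be pictured with a small, self-contained sketch. This is not the Doris source: the class `JsonTypeMappingSketch` and the method `mapMysqlType` are hypothetical names, the result is a plain type-name string rather than a Doris `ScalarType`, and only a handful of MySQL types are covered. It only illustrates the idea that the `JSON` case now falls through to the same branch as the other string-like types.

```java
import java.util.Locale;

/** Illustrative only: mimics the shape of a type-mapping switch, not the Doris code. */
final class JsonTypeMappingSketch {

    /**
     * Returns the Doris type name for a MySQL column type name.
     * Hypothetical helper covering only a small subset of types.
     */
    static String mapMysqlType(String mysqlType) {
        switch (mysqlType.toUpperCase(Locale.ROOT)) {
            case "INT":
                return "INT";
            case "DATE":
                return "DATE";
            case "JSON":      // previously mapped to JSONB; now shares the STRING branch
            case "TIME":
            case "TINYTEXT":
            case "TEXT":
                return "STRING";
            default:
                // Anything this sketch does not list is reported as unsupported.
                return "UNSUPPORTED";
        }
    }

    public static void main(String[] args) {
        System.out.println(mapMysqlType("JSON")); // prints STRING
        System.out.println(mapMysqlType("json")); // prints STRING (lookup is case-insensitive)
        System.out.println(mapMysqlType("INT"));  // prints INT
    }
}
```

Because the JSON value already arrives over JDBC as text, returning the string type here means no per-row conversion step is needed on the read path.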

About Upgrade
To keep queries against existing Catalogs working after an upgrade, the ability to query external JSON types as JSONB is retained for now. However, once you upgrade to the new version and either refresh the Catalog or create a new one, all external JSON types will be treated as Strings. To ensure consistent behavior, and because the code for querying JSON as JSONB may be removed in the future, it is highly recommended that you manually refresh your Catalog as soon as possible after upgrading to the new version.
This commit is contained in:
zy-kkk
2024-01-18 10:11:38 +08:00
committed by yiguolei
parent c1722a6ab0
commit 0ccd706a30
9 changed files with 1183 additions and 1184 deletions

@ -706,6 +706,7 @@ Status JdbcConnector::_cast_string_to_bitmap(const SlotDescriptor* slot_desc, Bl
return Status::OK();
}
// Deprecated: retained only for compatibility, so that queries still work when upgrading from a version that maps JSON to JSONB to this version. It will be deleted in a subsequent version.
Status JdbcConnector::_cast_string_to_json(const SlotDescriptor* slot_desc, Block* block,
int column_index, int rows) {
DataTypePtr _target_data_type = slot_desc->get_data_type_ptr();

@ -303,37 +303,37 @@ CREATE CATALOG jdbc_mysql PROPERTIES (
#### Type Mapping
| MYSQL Type | Doris Type | Comment |
|-------------------------------------------|----------------|-------------------------------------------------------------------------------|
| BOOLEAN | TINYINT | |
| TINYINT | TINYINT | |
| SMALLINT | SMALLINT | |
| MEDIUMINT | INT | |
| INT | INT | |
| BIGINT | BIGINT | |
| UNSIGNED TINYINT | SMALLINT | Doris does not have an UNSIGNED data type, so expand by an order of magnitude |
| UNSIGNED MEDIUMINT | INT | Doris does not have an UNSIGNED data type, so expand by an order of magnitude |
| UNSIGNED INT | BIGINT | Doris does not have an UNSIGNED data type, so expand by an order of magnitude |
| UNSIGNED BIGINT | LARGEINT | |
| FLOAT | FLOAT | |
| DOUBLE | DOUBLE | |
| DECIMAL | DECIMAL | |
| UNSIGNED DECIMAL(p,s) | DECIMAL(p+1,s) / STRING | If p+1>38, the Doris STRING type will be used. |
| DATE | DATE | |
| TIMESTAMP | DATETIME | |
| DATETIME | DATETIME | |
| YEAR | SMALLINT | |
| TIME | STRING | |
| CHAR | CHAR | |
| VARCHAR | VARCHAR | |
| JSON | JSON | |
| SET | STRING | |
| BIT | BOOLEAN/STRING | BIT(1) will be mapped to BOOLEAN, and other BITs will be mapped to STRING |
| TINYTEXT、TEXT、MEDIUMTEXT、LONGTEXT | STRING | |
| BLOB、MEDIUMBLOB、LONGBLOB、TINYBLOB | STRING | |
| TINYSTRING、STRING、MEDIUMSTRING、LONGSTRING | STRING | |
| BINARY、VARBINARY | STRING | |
| Other | UNSUPPORTED | |
| MYSQL Type | Doris Type | Comment |
|-------------------------------------------|-------------------------|----------------------------------------------------------------------------------------|
| BOOLEAN | TINYINT | |
| TINYINT | TINYINT | |
| SMALLINT | SMALLINT | |
| MEDIUMINT | INT | |
| INT | INT | |
| BIGINT | BIGINT | |
| UNSIGNED TINYINT | SMALLINT | Doris does not have an UNSIGNED data type, so expand by an order of magnitude |
| UNSIGNED MEDIUMINT | INT | Doris does not have an UNSIGNED data type, so expand by an order of magnitude |
| UNSIGNED INT | BIGINT | Doris does not have an UNSIGNED data type, so expand by an order of magnitude |
| UNSIGNED BIGINT | LARGEINT | |
| FLOAT | FLOAT | |
| DOUBLE | DOUBLE | |
| DECIMAL | DECIMAL | |
| UNSIGNED DECIMAL(p,s) | DECIMAL(p+1,s) / STRING | If p+1>38, the Doris STRING type will be used. |
| DATE | DATE | |
| TIMESTAMP | DATETIME | |
| DATETIME | DATETIME | |
| YEAR | SMALLINT | |
| TIME | STRING | |
| CHAR | CHAR | |
| VARCHAR | VARCHAR | |
| JSON | STRING | For better performance, map JSON from external data sources to STRING instead of JSONB |
| SET | STRING | |
| BIT | BOOLEAN/STRING | BIT(1) will be mapped to BOOLEAN, and other BITs will be mapped to STRING |
| TINYTEXT、TEXT、MEDIUMTEXT、LONGTEXT | STRING | |
| BLOB、MEDIUMBLOB、LONGBLOB、TINYBLOB | STRING | |
| TINYSTRING、STRING、MEDIUMSTRING、LONGSTRING | STRING | |
| BINARY、VARBINARY | STRING | |
| Other | UNSUPPORTED | |
### PostgreSQL
@ -366,30 +366,30 @@ Doris obtains all schemas that PG user can access through the SQL statement: `se
#### Type Mapping
| POSTGRESQL Type | Doris Type | Comment |
|-----------------------------------------|----------------|-------------------------------------------|
| boolean | BOOLEAN | |
| smallint/int2 | SMALLINT | |
| integer/int4 | INT | |
| bigint/int8 | BIGINT | |
| decimal/numeric | DECIMAL | |
| real/float4 | FLOAT | |
| double precision | DOUBLE | |
| smallserial | SMALLINT | |
| serial | INT | |
| bigserial | BIGINT | |
| char | CHAR | |
| varchar/text | STRING | |
| timestamp | DATETIME | |
| date | DATE | |
| json/josnb | JSON | |
| time | STRING | |
| interval | STRING | |
| point/line/lseg/box/path/polygon/circle | STRING | |
| cidr/inet/macaddr | STRING | |
| bit | BOOLEAN/STRING | bit(1) will be mapped to BOOLEAN, and other bits will be mapped to STRING |
| uuid | STRING | |
| Other | UNSUPPORTED | |
| POSTGRESQL Type | Doris Type | Comment |
|-----------------------------------------|----------------|----------------------------------------------------------------------------------------|
| boolean | BOOLEAN | |
| smallint/int2 | SMALLINT | |
| integer/int4 | INT | |
| bigint/int8 | BIGINT | |
| decimal/numeric | DECIMAL | |
| real/float4 | FLOAT | |
| double precision | DOUBLE | |
| smallserial | SMALLINT | |
| serial | INT | |
| bigserial | BIGINT | |
| char | CHAR | |
| varchar/text | STRING | |
| timestamp | DATETIME | |
| date | DATE | |
| json/jsonb | STRING | For better performance, map JSON from external data sources to STRING instead of JSONB |
| time | STRING | |
| interval | STRING | |
| point/line/lseg/box/path/polygon/circle | STRING | |
| cidr/inet/macaddr | STRING | |
| bit | BOOLEAN/STRING | bit(1) will be mapped to BOOLEAN, and other bits will be mapped to STRING |
| uuid | STRING | |
| Other | UNSUPPORTED | |
### Oracle

@ -303,37 +303,37 @@ CALL EXECUTE_STMT(jdbc_catalog", "create table dbl1.tbl2 (k1 int)");
#### Type Mapping
| MYSQL Type | Doris Type | Comment |
|-------------------------------------------|----------------|-------------------------------------------------|
| BOOLEAN | TINYINT | |
| TINYINT | TINYINT | |
| SMALLINT | SMALLINT | |
| MEDIUMINT | INT | |
| INT | INT | |
| BIGINT | BIGINT | |
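| UNSIGNED TINYINT | SMALLINT | Doris does not have an UNSIGNED data type, so expand by an order of magnitude |
| UNSIGNED MEDIUMINT | INT | Doris does not have an UNSIGNED data type, so expand by an order of magnitude |
| UNSIGNED INT | BIGINT | Doris does not have an UNSIGNED data type, so expand by an order of magnitude |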
| UNSIGNED BIGINT | LARGEINT | |
| FLOAT | FLOAT | |
| DOUBLE | DOUBLE | |
| DECIMAL | DECIMAL | |
| UNSIGNED DECIMAL(p,s) | DECIMAL(p+1,s) / STRING | If p+1>38, the Doris STRING type will be used. |
| DATE | DATE | |
| TIMESTAMP | DATETIME | |
| DATETIME | DATETIME | |
| YEAR | SMALLINT | |
| TIME | STRING | |
| CHAR | CHAR | |
| VARCHAR | VARCHAR | |
| JSON | JSON | |
| SET | STRING | |
| BIT | BOOLEAN/STRING | BIT(1) will be mapped to BOOLEAN, and other BITs will be mapped to STRING |
| TINYTEXT、TEXT、MEDIUMTEXT、LONGTEXT | STRING | |
| BLOB、MEDIUMBLOB、LONGBLOB、TINYBLOB | STRING | |
| TINYSTRING、STRING、MEDIUMSTRING、LONGSTRING | STRING | |
| BINARY、VARBINARY | STRING | |
| Other | UNSUPPORTED | |
| MYSQL Type | Doris Type | Comment |
|-------------------------------------------|-------------------------|-------------------------------------------------------|
| BOOLEAN | TINYINT | |
| TINYINT | TINYINT | |
| SMALLINT | SMALLINT | |
| MEDIUMINT | INT | |
| INT | INT | |
| BIGINT | BIGINT | |
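| UNSIGNED TINYINT | SMALLINT | Doris does not have an UNSIGNED data type, so expand by an order of magnitude |
| UNSIGNED MEDIUMINT | INT | Doris does not have an UNSIGNED data type, so expand by an order of magnitude |
| UNSIGNED INT | BIGINT | Doris does not have an UNSIGNED data type, so expand by an order of magnitude |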
| UNSIGNED BIGINT | LARGEINT | |
| FLOAT | FLOAT | |
| DOUBLE | DOUBLE | |
| DECIMAL | DECIMAL | |
| UNSIGNED DECIMAL(p,s) | DECIMAL(p+1,s) / STRING | If p+1>38, the Doris STRING type will be used. |
| DATE | DATE | |
| TIMESTAMP | DATETIME | |
| DATETIME | DATETIME | |
| YEAR | SMALLINT | |
| TIME | STRING | |
| CHAR | CHAR | |
| VARCHAR | VARCHAR | |
| JSON | STRING | For better performance, map JSON from external data sources to STRING instead of JSONB |
| SET | STRING | |
| BIT | BOOLEAN/STRING | BIT(1) will be mapped to BOOLEAN, and other BITs will be mapped to STRING |
| TINYTEXT、TEXT、MEDIUMTEXT、LONGTEXT | STRING | |
| BLOB、MEDIUMBLOB、LONGBLOB、TINYBLOB | STRING | |
| TINYSTRING、STRING、MEDIUMSTRING、LONGSTRING | STRING | |
| BINARY、VARBINARY | STRING | |
| Other | UNSUPPORTED | |
### PostgreSQL
@ -366,30 +366,30 @@ Doris uses the SQL statement `select nspname from pg_namespace where has_schema_privil
#### Type Mapping
| POSTGRESQL Type | Doris Type | Comment |
|-----------------------------------------|----------------|-----------------------------------------------|
| boolean | BOOLEAN | |
| smallint/int2 | SMALLINT | |
| integer/int4 | INT | |
| bigint/int8 | BIGINT | |
| decimal/numeric | DECIMAL | |
| real/float4 | FLOAT | |
| double precision | DOUBLE | |
| smallserial | SMALLINT | |
| serial | INT | |
| bigserial | BIGINT | |
| char | CHAR | |
| varchar/text | STRING | |
| timestamp | DATETIME | |
| date | DATE | |
| json/josnb | JSON | |
| time | STRING | |
| interval | STRING | |
| point/line/lseg/box/path/polygon/circle | STRING | |
| cidr/inet/macaddr | STRING | |
| bit | BOOLEAN/STRING | bit(1) will be mapped to BOOLEAN, and other bits will be mapped to STRING |
| uuid | STRING | |
| Other | UNSUPPORTED | |
| POSTGRESQL Type | Doris Type | Comment |
|-----------------------------------------|----------------|-----------------------------------------------------|
| boolean | BOOLEAN | |
| smallint/int2 | SMALLINT | |
| integer/int4 | INT | |
| bigint/int8 | BIGINT | |
| decimal/numeric | DECIMAL | |
| real/float4 | FLOAT | |
| double precision | DOUBLE | |
| smallserial | SMALLINT | |
| serial | INT | |
| bigserial | BIGINT | |
| char | CHAR | |
| varchar/text | STRING | |
| timestamp | DATETIME | |
| date | DATE | |
| json/jsonb | STRING | For better performance, map JSON from external data sources to STRING instead of JSONB |
| time | STRING | |
| interval | STRING | |
| point/line/lseg/box/path/polygon/circle | STRING | |
| cidr/inet/macaddr | STRING | |
| bit | BOOLEAN/STRING | bit(1) will be mapped to BOOLEAN, and other bits will be mapped to STRING |
| uuid | STRING | |
| Other | UNSUPPORTED | |
### Oracle

@ -128,6 +128,7 @@ public class JdbcExecutor {
resultSet.close();
}
if (stmt != null) {
stmt.cancel();
stmt.close();
}
if (conn != null) {

@ -312,7 +312,6 @@ public class JdbcMySQLClient extends JdbcClient {
return ScalarType.createStringType();
}
case "JSON":
return ScalarType.createJsonbType();
case "TIME":
case "TINYTEXT":
case "TEXT":
@ -430,9 +429,8 @@ public class JdbcMySQLClient extends JdbcClient {
}
case "STRING":
case "TEXT":
return ScalarType.createStringType();
case "JSON":
return ScalarType.createJsonbType();
return ScalarType.createStringType();
case "HLL":
return ScalarType.createHllType();
case "BITMAP":

@ -101,10 +101,9 @@ public class JdbcPostgreSQLClient extends JdbcClient {
case "varbit":
case "uuid":
case "bytea":
return ScalarType.createStringType();
case "json":
case "jsonb":
return ScalarType.createJsonbType();
return ScalarType.createStringType();
default:
return Type.UNSUPPORTED;
}

@ -92,7 +92,7 @@ date_col DATE Yes false \N NONE
datetime_col DATETIME(3) Yes false \N NONE
char_col CHAR(10) Yes false \N NONE
varchar_col VARCHAR(10) Yes false \N NONE
json_col JSON Yes false \N NONE
json_col TEXT Yes false \N NONE
-- !desc_ctas_arr --
int_col INT Yes true \N

@ -292,9 +292,9 @@ information_schema
-- !mysql_all_types --
\N 302 \N 502 602 4.14159 \N 6.14159 \N -124 -302 2013 -402 -502 -602 \N 2012-10-26T02:08:39.345700 2013-10-26T08:09:18 -5.14145 \N -7.1400 row2 \N 09:11:09.567 text2 0xE86F6C6C6F20576F726C67 \N \N 0x2F \N 0x88656C6C9F Value3
201 301 401 501 601 3.14159 4.1415926 5.14159 1 -123 -301 2012 -401 -501 -601 2012-10-30 2012-10-25T12:05:36.345700 2012-10-25T08:08:08 -4.14145 -5.1400000001 -6.1400 row1 line1 09:09:09.567 text1 0x48656C6C6F20576F726C64 {"age":30,"city":"London","name":"Alice"} Option1,Option3 0x2A 0x48656C6C6F00000000000000 0x48656C6C6F Value2
202 302 402 502 602 4.14159 5.1415926 6.14159 0 -124 -302 2013 -402 -502 -602 2012-11-01 2012-10-26T02:08:39.345700 2013-10-26T08:09:18 -5.14145 -6.1400000001 -7.1400 row2 line2 09:11:09.567 text2 0xE86F6C6C6F20576F726C67 {"age":18,"city":"ChongQing","name":"Gaoxin"} Option1,Option2 0x2F 0x58676C6C6F00000000000000 0x88656C6C9F Value3
203 303 403 503 603 7.14159 8.1415926 9.14159 0 \N -402 2017 -602 -902 -1102 2012-11-02 \N 2013-10-27T08:11:18 -5.14145 -6.1400000000001 -7.1400 row3 line3 09:11:09.567 text3 0xE86F6C6C6F20576F726C67 {"age":24,"city":"ChongQing","name":"ChenQi"} Option2 0x2F 0x58676C6C6F00000000000000 \N Value1
201 301 401 501 601 3.14159 4.1415926 5.14159 1 -123 -301 2012 -401 -501 -601 2012-10-30 2012-10-25T12:05:36.345700 2012-10-25T08:08:08 -4.14145 -5.1400000001 -6.1400 row1 line1 09:09:09.567 text1 0x48656C6C6F20576F726C64 {"age": 30, "city": "London", "name": "Alice"} Option1,Option3 0x2A 0x48656C6C6F00000000000000 0x48656C6C6F Value2
202 302 402 502 602 4.14159 5.1415926 6.14159 0 -124 -302 2013 -402 -502 -602 2012-11-01 2012-10-26T02:08:39.345700 2013-10-26T08:09:18 -5.14145 -6.1400000001 -7.1400 row2 line2 09:11:09.567 text2 0xE86F6C6C6F20576F726C67 {"age": 18, "city": "ChongQing", "name": "Gaoxin"} Option1,Option2 0x2F 0x58676C6C6F00000000000000 0x88656C6C9F Value3
203 303 403 503 603 7.14159 8.1415926 9.14159 0 \N -402 2017 -602 -902 -1102 2012-11-02 \N 2013-10-27T08:11:18 -5.14145 -6.1400000000001 -7.1400 row3 line3 09:11:09.567 text3 0xE86F6C6C6F20576F726C67 {"age": 24, "city": "ChongQing", "name": "ChenQi"} Option2 0x2F 0x58676C6C6F00000000000000 \N Value1
-- !select_insert_all_types --
\N 302 \N 502 602 4.14159 \N 6.14159 \N -124 -302 2013 -402 -502 -602 \N 2012-10-26T02:08:39.345700 2013-10-26T08:09:18 -5.14145 \N -7.1400 row2 \N 09:11:09.567 text2 0xE86F6C6C6F20576F726C67 \N \N 0x2F \N 0x88656C6C9F Value3
@ -304,9 +304,9 @@ information_schema
-- !ctas --
\N 302 \N 502 602 4.14159 \N 6.14159 \N -124 -302 2013 -402 -502 -602 \N 2012-10-26T02:08:39.345700 2013-10-26T08:09:18 -5.14145 \N -7.1400 row2 \N 09:11:09.567 text2 0xE86F6C6C6F20576F726C67 \N \N 0x2F \N 0x88656C6C9F Value3
201 301 401 501 601 3.14159 4.1415926 5.14159 1 -123 -301 2012 -401 -501 -601 2012-10-30 2012-10-25T12:05:36.345700 2012-10-25T08:08:08 -4.14145 -5.1400000001 -6.1400 row1 line1 09:09:09.567 text1 0x48656C6C6F20576F726C64 {"age":30,"city":"London","name":"Alice"} Option1,Option3 0x2A 0x48656C6C6F00000000000000 0x48656C6C6F Value2
202 302 402 502 602 4.14159 5.1415926 6.14159 0 -124 -302 2013 -402 -502 -602 2012-11-01 2012-10-26T02:08:39.345700 2013-10-26T08:09:18 -5.14145 -6.1400000001 -7.1400 row2 line2 09:11:09.567 text2 0xE86F6C6C6F20576F726C67 {"age":18,"city":"ChongQing","name":"Gaoxin"} Option1,Option2 0x2F 0x58676C6C6F00000000000000 0x88656C6C9F Value3
203 303 403 503 603 7.14159 8.1415926 9.14159 0 \N -402 2017 -602 -902 -1102 2012-11-02 \N 2013-10-27T08:11:18 -5.14145 -6.1400000000001 -7.1400 row3 line3 09:11:09.567 text3 0xE86F6C6C6F20576F726C67 {"age":24,"city":"ChongQing","name":"ChenQi"} Option2 0x2F 0x58676C6C6F00000000000000 \N Value1
201 301 401 501 601 3.14159 4.1415926 5.14159 1 -123 -301 2012 -401 -501 -601 2012-10-30 2012-10-25T12:05:36.345700 2012-10-25T08:08:08 -4.14145 -5.1400000001 -6.1400 row1 line1 09:09:09.567 text1 0x48656C6C6F20576F726C64 {"age": 30, "city": "London", "name": "Alice"} Option1,Option3 0x2A 0x48656C6C6F00000000000000 0x48656C6C6F Value2
202 302 402 502 602 4.14159 5.1415926 6.14159 0 -124 -302 2013 -402 -502 -602 2012-11-01 2012-10-26T02:08:39.345700 2013-10-26T08:09:18 -5.14145 -6.1400000001 -7.1400 row2 line2 09:11:09.567 text2 0xE86F6C6C6F20576F726C67 {"age": 18, "city": "ChongQing", "name": "Gaoxin"} Option1,Option2 0x2F 0x58676C6C6F00000000000000 0x88656C6C9F Value3
203 303 403 503 603 7.14159 8.1415926 9.14159 0 \N -402 2017 -602 -902 -1102 2012-11-02 \N 2013-10-27T08:11:18 -5.14145 -6.1400000000001 -7.1400 row3 line3 09:11:09.567 text3 0xE86F6C6C6F20576F726C67 {"age": 24, "city": "ChongQing", "name": "ChenQi"} Option2 0x2F 0x58676C6C6F00000000000000 \N Value1
-- !ctas_desc --
bigint BIGINT Yes false \N NONE
@ -327,7 +327,7 @@ float FLOAT Yes false \N NONE
float_u FLOAT Yes false \N NONE
int INT Yes false \N NONE
int_u BIGINT Yes false \N NONE
json JSON Yes false \N NONE
json TEXT Yes false \N NONE
mediumint INT Yes false \N NONE
mediumint_u INT Yes true \N
set TEXT Yes false \N NONE

File diff suppressed because one or more lines are too long