Commit Graph

119 Commits

Author SHA1 Message Date
df5ec16d7c [Refactor](exectuor)Add schema type table active_queries (#32057)
* Add schema type table active_queries
2024-03-15 17:57:28 +08:00
31ee448c87 [test](fix) Fix one missing line of output in out file (#32036) 2024-03-12 14:17:55 +08:00
cf6b22c621 [fix](jdbc catalog) fix type conversion error in MySQL JDBC Driver 5.x (#31880) 2024-03-12 14:07:57 +08:00
c5390d00bb [Improvement]Add schema table backend_active_tasks (#31945) 2024-03-09 19:55:48 +08:00
5b00f4fbeb [improvement](jdbc catalog) opt get db2 schema list & xml type mapping (#31856)
1. Trim Schema Names: Adapted the system to remove trailing spaces from DB2 schema names, ensuring compatibility without affecting query operations.
2. XML Mapping: Implemented a feature to directly map XML types to String.
2024-03-07 16:53:19 +08:00
2e9bd268cd [improvement](jdbc catalog) support sqlserver timestamp type read (#31805) 2024-03-06 13:08:04 +08:00
07224686ef [feature](jdbc catalog) support db2 jdbc catalog (#31627) 2024-03-01 14:19:28 +08:00
4636b6195b [Fix](JNI) fix BE core when using JNI to query the empty map type value (#31502) 2024-02-29 14:03:38 +08:00
3c37fb085c [refactor](jdbc catalog) split jdbc executor for different data sources (step-1) (#31406) 2024-02-29 12:38:03 +08:00
b3ac2128dd [Refactor](catalog) Refactor Jdbc Catalog external name case mapping rules (#28414) 2024-02-19 17:22:03 +08:00
8db2824c44 [bugfix](es catalog) add constant_keyword wildcard data type (#30947) 2024-02-19 17:20:21 +08:00
6e4f76de54 [improvement](jdbc catalog) Delete unnecessary schema and optimize insert logic (#30880)
In the previous design, we were compatible with MySQL's auto-increment column and default value to bypass the null value check when writing back Jdbc External Table. However, because MySQL's default value is not completely unified with Doris, this resulted in The unsuitable default value is wrong. In response to this situation, I made the following optimizations
1. For JDBC External Table, we always allow certain columns to be missing during insertion. Even if these columns are not allowed to be empty at the source end, the error should be generated by the source end, not Doris herself.
2. When the target column is non-nullable and the insertion is done via `INSERT INTO tbl VALUES()` or `INSERT INTO tbl SELECT constants`, Doris should verify any inconsistency between them and throw an exception. This check is not applied for `INSERT INTO tbl SELECT ... FROM tbl` operations.
2024-02-19 17:20:21 +08:00
2cb46eed94 [Feature](auto-inc) Add start value for auto increment column (#30512) 2024-02-16 10:12:23 +08:00
92226c986a [fix](catalog) fix data_sub/data_add func pushdown in jdbcscan (#30807) 2024-02-06 08:35:54 +08:00
9100fba47e [Fix](parquet-reader) Fix decimal test case out files. (#30715) 2024-02-01 21:17:17 +08:00
92cad69fc4 [Fix](parquet-reader) Fix reading fixed length byte array decimal in parquet reader. (#30535) 2024-01-31 23:53:40 +08:00
7d037c12bf [bugfix](paimon)fix paimon testcases (#30514)
1. set default timezone
2. not supported `char` type to pushdown
2024-01-31 23:53:39 +08:00
0f81d2d533 [FIX](complextype)fix complex type nested version type but not hide version (#30419) 2024-01-29 19:03:47 +08:00
779a9a1fbb [opt](planner) use string for varchar in ctas if original table is not olap (#30323) 2024-01-29 19:03:47 +08:00
fac0580eae [opt](docker)optimize ES docker compose (#30068)
1. add volume for es logs
2. optimize health check, waiting for es status to be green
3. fix es6 valume path error
4. optimize disk watermark to avoid es disk watermark error
5. fix es6 create index error
6. add custom elasticsearch.yml for es6
7. add log4j2.properties for es6, es7, es8
2024-01-19 15:48:56 +08:00
0ccd706a30 [Enhancement](Jdbc Catalog) Map Jdbc Catalog JSON Type to String for Improved Performance and Compatibility (#30035)
This PR proposes mapping external catalog JSON types to String instead of JsonB in Apache Doris. This change is motivated by the realization that JDBC retrieves JSON data as a String JSON string, regardless of its storage format (Json(String) or Json(Binary)). Mapping to String streamlines data retrieval, simplifies write-backs, and ensures compatibility with all JSON(String) and JSON(Binary) functions, despite potentially misleading displays of JSON data as Strings in Doris. This approach avoids the performance overhead and complexity of converting each row of data from JsonB to String, making the process more efficient and elegant.

About Upgrade
To ensure query compatibility with existing Catalogs in the upgraded version,we currently still retain the capability to query external JSON types as JSONB. However, once you upgrade to the new version and either refresh the Catalog or create a new one, all external JSON types will be treated as Strings. To ensure consistent behavior,and possible future removal of support for JSON as JSONB query code, it is highly recommended that you manually refresh your Catalog as soon as possible after upgrading to the new version.
2024-01-18 12:03:07 +08:00
44ba9e102c [feature](statistics)support statistics for iceberg/paimon/hudi table (#29868) 2024-01-18 12:03:07 +08:00
ade720470d [Improve](config)delete confused config for nested complex type (#29988) 2024-01-18 12:03:07 +08:00
f53d2c28cb [improvement](catalog) fix jdbc mysql catalog to_date fun pushdown (#29900) 2024-01-16 18:46:19 +08:00
f6dc6ea13b [improvement](catalog) Escape characters for columns in recovery predicate pushdown in SQL (#29854)
In the previous logic, when we restored the Column in the predicate pushdown based on the logical syntax tree for JdbcScanNode, in order to avoid query errors caused by keywords such as `key`, we added escape characters for it, but before we only Binary predicates are processed, which is imperfect. We should add escape characters to all columns that appear in the predicate to avoid errors with keywords or illegal characters.
2024-01-16 18:39:00 +08:00
ebfbe0c8dd [opt](information_schema) support information_schema in external catalog (#28919)
Add `information_schema` database for all catalog.
This is useful when using BI tools to connect to Doris,
the tools can get meta info from `information_schema`.

This PR mainly changes:

1. There will be a `information_schema` db in each catalog.
2. Each `information_schema` db only store the meta info of the catalog it belongs to.
3. For `information_schema`, the `TABLE_SCHEMA` column's value is the database name.
4. There is a new global variable `show_full_dbname_in_info_schema_db`, default is false, if set to true,
    The `TABLE_SCHEMA` column's value is the like `ctl.db`, because:

	When connect to Doris, the `database` info in connection url will be: `xxx?db=ctl.db`.
	
	And then some BI will try to query `information_schema` with sql like:
	
	`select * from information_schema.columns where TABLE_SCHEMA = "ctl.db"`
	
	So it has to be format as `ctl.db`
	
	eg, the `information_schema.columns` table in external catalog `doris` is like:
	
	```
	mysql> select * from information_schema.columns limit 1\G
	*************************** 1. row ***************************
	           TABLE_CATALOG: doris
	            TABLE_SCHEMA: doris.__internal_schema
	              TABLE_NAME: column_statistics
	             COLUMN_NAME: id
	        ORDINAL_POSITION: 1
	          COLUMN_DEFAULT: NULL
	             IS_NULLABLE: NO
	               DATA_TYPE: varchar
	CHARACTER_MAXIMUM_LENGTH: 4096
	  CHARACTER_OCTET_LENGTH: 16384
	       NUMERIC_PRECISION: NULL
	           NUMERIC_SCALE: NULL
	      DATETIME_PRECISION: NULL
	      CHARACTER_SET_NAME: NULL
	          COLLATION_NAME: NULL
	             COLUMN_TYPE: varchar(4096)
	              COLUMN_KEY:
	                   EXTRA:
	              PRIVILEGES:
	          COLUMN_COMMENT:
	             COLUMN_SIZE: 4096
	          DECIMAL_DIGITS: NULL
	   GENERATION_EXPRESSION: NULL
	                  SRS_ID: NULL
	```
	
6. Modify the behavior of

	- show tables
	- shwo databases
	- show columns
	- show table status

	The above statements may query the `information_schema` db if there is `where` predicate after them
2024-01-12 13:58:19 +08:00
3cd1c7745a [fix](jdbc catalog) Fix the precision of decimal type mapping to 0 (#29407) 2024-01-12 11:39:57 +08:00
5789b7e380 [fix](jin) add datetimev2 precision (#29528) 2024-01-06 13:35:26 +08:00
2a9b4a0f76 [enhancement](paimon)support predict for null and notnull (#29134) 2024-01-03 12:53:39 +08:00
2c4e52e44e [fix](es catalog) only es_query function can push down to ES (#29320)
Issue Number: close #29318 
1. Only push down `es_query` function to ES
2. Add null check where ES query result not have `_source` or `fields` fields.
2023-12-30 09:33:26 +08:00
1d8822b2b7 [fix](paimon)fix like predicate (#28803)
fix like predict
2023-12-23 22:25:55 +08:00
f38e11ec4e [fix](paimon)fix type convert for paimon (#28774)
fix type convert for paimon
2023-12-22 13:18:25 +08:00
7da86c37ec [fix](hive) add support for quoteChar and seperatorChar for hive (#28613)
add support for quoteChar and seperatorChar .
2023-12-19 19:35:03 +08:00
a3e2c6affe [fix](jdbc catalog) fix JdbcScanNode NOT CompoundPredicate filter expr handling errors (#28497) 2023-12-16 12:54:55 +08:00
01c94a554d [fix](autoinc) Fix broker load when target table has autoinc column (#28402) 2023-12-14 18:02:54 +08:00
c08ab9edc7 [feature](HiveCatalog) Support for getting hive meta data from relational databases under HMS (#28188) 2023-12-14 17:50:17 +08:00
dee89d2c4a [refactor](Nereids) let create table compatible with legacy planner (#28078) 2023-12-13 16:35:40 +08:00
3e1e8d2ebe [fix](jdbc catalog) Fixed data conversion problem when all data is null (#28230) 2023-12-11 17:57:57 +08:00
07336980f9 [fix](meta) show partitions with Limit for external HMS tables (27835) (#27835)
This enhancement shall extend existing logic for SHOW PARTITIONS FROM to include: -

Limit/Offset
Where [partition name only] [equal operator and like operator]
Order by [partition name only]
Issue Number: close #27834
2023-12-09 01:44:45 +08:00
baf85547ae [feature](jdbc) support call function to pass sql directly to jdbc catalog #26492
Support a new stmt in Nereids:
`CALL EXECUTE_STMT("jdbc", "stmt")`

So that we can pass the origin stmt directly to the datasource of a jdbc catalog.

show case:
```
mysql> select * from mysql_catalog.db1.tbl1;
+------+------+
| k1   | k2   |
+------+------+
|  111 | 222  |
+------+------+
1 row in set (0.63 sec)

mysql> call execute("mysql_catalog", "insert into db1.tbl1 values(1,'abc')");
Query OK, 0 rows affected (0.01 sec)

mysql> select * from mysql_catalog.db1.tbl1;
+------+------+
| k1   | k2   |
+------+------+
|  111 | 222  |
|    1 | abc  |
+------+------+
2 rows in set (0.03 sec)

mysql> call execute_stmt("mysql_catalog", "delete from db1.tbl1 where k1=111");
Query OK, 0 rows affected (0.01 sec)

mysql> select * from mysql_catalog.db1.tbl1;
+------+------+
| k1   | k2   |
+------+------+
|    1 | abc  |
+------+------+
1 row in set (0.03 sec)
```
2023-12-08 23:06:05 +08:00
81a0f8c041 [Feature](function) support generating const values from tvf numbers (#28051)
If specified, got a column of constant. otherwise an incremental series like it always be.

mysql> select * from numbers("number" = "5", "const_value" = "-123");
+--------+
| number |
+--------+
|   -123 |
|   -123 |
|   -123 |
|   -123 |
|   -123 |
+--------+
5 rows in set (0.11 sec)
2023-12-07 22:26:43 +08:00
8749e5208f [fix](jdbc catalog) fix insert into jdbc table column order (#27855) 2023-12-01 20:46:48 +08:00
60bc3be8a2 [Opt](Compression) Opt zstd block decompression by ZSTD_decompressDCtx(). (#27534)
Opt zstd block decompression by `ZSTD_decompressDCtx()` to replace streaming decompression.
It will improve performance but consume more memory. 

Test result: 
- env: 1 node(16 cores, 64G).
- parquet column: 100 million rows of char(255) column.
- result: 5.2 -> 4.6.
2023-12-01 09:10:32 +08:00
e4149c6e4c [Fix](parquet-reader) Fix null map issue in parquet reader. (#27777)
Fix null map issue in parquet reader which cause result incorrect such as `min()`, `max()`.

In order to share null map between parquet converted src column and dst column to avoid copying. It is very tricky that will call mutable function `doris_nullable_column->get_null_map_column_ptr()` which will set `_need_update_has_null = true`. Because some operations such as agg will call `has_null()` to set `_need_update_has_null = false`.
2023-11-30 13:55:37 +08:00
573f0eaad9 [fix](regression)fix parquet data page v2 unstable case (#27753) 2023-11-29 18:58:37 +08:00
d771f16b79 [fix](parquet)fix bug that can not read parquet data page v2 (#27655) 2023-11-28 22:43:46 +08:00
cd6c61347d [Feature](tvf)(avro-jni) avro-jni add projection push down (#26885) 2023-11-27 10:33:27 +08:00
cc395f5428 [Fix](hive-transactional-table) Fix NPE when query empty hive transactional table. (#27563) 2023-11-25 10:29:39 +08:00
b477839bce [enhancement](jdbc catalog) Add lowercase column name mapping to Jdbc data source & optimize database and table mapping (#27124)
This PR adds the processing of lowercase Column names in Oracle Jdbc Catalog. In the previous behavior, we changed all Oracle columns to uppercase queries by default, but could not handle the lowercase case. This PR can solve this situation and improve All Jdbc Catalog works
2023-11-17 23:51:47 +08:00
5dbc3cbba4 [test](information_schema)append information_schema external_table_p0 case. (#26846) 2023-11-15 14:30:16 +08:00