Commit Graph

19 Commits

Author SHA1 Message Date
73ad885e19 [Feature][Fix](multi-catalog) Implements transactional hive full acid tables. (#20679)
After supporting insert-only transactional hive full acid tables #19518, #19419, this PR support transactional hive full acid tables.

Support hive3 transactional hive full acid tables.
Hive2 transactional hive full acid tables need to run major compactions.
2023-06-13 08:55:16 +08:00
1b968c4ade [fix](multi catalog)Fix nereids planner text format include extra column index bug (#20260)
Nereids planner include all columns index in TFileScanRangeParams, this may cause the column projection incorrect for
 text format table. Because csv reader use the column index position to split a line. Extra column index will cause get 
wrong split result. This PR is to reset the column index after Projection, remove the useless column index.
2023-06-01 12:17:47 +08:00
0c98355fff [fix](catalog) fix create catalog with resource replay issue and kerberos auth issue (#20137)
1. Fix create catalog with resource replay bug.
	If user create catalog using `create catalog hive with resource xxx`, when replaying edit log,
	there is a bug that resource may be dropped, causing NPE and FE will fail to start.

	In this PR, I add a new FE config `disallow_create_catalog_with_resource`, default is true.
	So that `with resource` will not be allowed, and it will be deprecated later.

	And also fix the replay bug to avoid NPE.

2. Fix issue when creating 2 hive catalogs to connect with and without kerberos authentication.

	When user create 2 hive catalogs, one use simple auth, the other use kerberos auth.
	The query may fail with error like: `Server asks us to fall back to SIMPLE auth, but this client is configured to only allow secure connections.`

	So I add a default property for hive catalog: `"ipc.client.fallback-to-simple-auth-allowed" = "true"`.
	Which means this property will be added automatically when user creating hive catalog, to avoid such problem.

3. Fix calling `hdfsExists()` issue

	When calling `hdfsExists()` with non-zero return code, should check if it encounters error or is file not found.

3. Some code refactor

	Avoid import `org.apache.parquet.Strings`
2023-05-30 16:57:39 +08:00
6f31ee9492 [fix](p0 regression)Update hive docker test case result data (#20176)
Doris updated array type output format, using double quote for Strings.
Before, it was using single quote. So we need to update the case out file using double quote.
2023-05-30 00:17:30 +08:00
30c4f25cb3 [fix](multi-catalog) verify the precision of datetime types for each data source (#19544)
Fix threes bugs of timestampv2 precision:
1. Hive catalog doesn't set the precision of timestampv2, and can't get the precision from hive metastore, so set the largest precision for timestampv2;
2. Jdbc catalog use datetimev1 to parse timestamp, and convert to timestampv2, so the precision is lost.
3. TVF doesn't use the precision from meta data of file format.
2023-05-17 20:50:15 +08:00
4418eb36a3 [Fix](multi-catalog) Fix some hive partition issues. (#19513)
Fix some hive partition issues.
1. Fix be will crash when using hive partitions field of `date`, `timestamp`, `decimal` type.
2. Fix hdfs uri decode error when using `timestamp` partition filed which will cause some url-encoding for special chars, such as `%3A` will encode `:`.
2023-05-11 07:49:46 +08:00
68505a1192 [Test](multi catalog)Add test case for Iceberg External Table. #19488 2023-05-11 01:13:40 +08:00
6eea3d9e2d [Test](multi-catalog) Fix test_hive_parquet regression test order issue. (#18879)
l_orderkey cannot guarantee unique order.
2023-04-21 22:59:34 +08:00
c6630a06c1 [Fix](multi-catalog) Fix "test_hive_other" regression test. (#17611) 2023-03-14 09:16:48 +08:00
292926e5aa [Fix](multi catalog)Fix partition case bug (#16763)
Set column names from path to lower case in case-insensitive case.
This is for Iceberg columns from path. Iceberg columns are case sensitive,
which may cause error for table with partitions.
2023-02-16 15:47:23 +08:00
3ebc98228d [feature wip](multi catalog)Support iceberg schema evolution. (#15836)
Support iceberg schema evolution for parquet file format.
Iceberg use unique id for each column to support schema evolution.
To support this feature in Doris, FE side need to get the current column id for each column and send the ids to be side.
Be read column id from parquet key_value_metadata, set the changed column name in Block to match the name in parquet file before reading data. And set the name back after reading data.
2023-01-20 12:57:36 +08:00
211cc66d02 [fix](multi-catalog) fix image loading failture when create catalog with resource (#15692)
Bug fix
fix image loading failture when create catalog with resource
When creating jdbc catalog with resource, the metadata image will failed to be loaded.
Because when loading jdbc catalog image, it will try to get resource from ResourceMgr,
but ResourceMgr has not been loaded, so NPE will be thrown.

This PR fix this bug, and refactor some logic about catalog and resource.

When loading jdbc catalog image, it will not get resource from ResourceMgr.
And now user can create catalog with resource and properties, like:

create catalog jdbc_catalog with resource jdbc_resource
properites("user" = "user1");
The properties in "properties" clause will overwrite the properties in "jdbc_resource".

force adding tinyInt1isBit=false to jdbc url
The default value of tinyInt1isBit is true, and it will cause tinyint in mysql to be bit type.
force adding tinyInt1isBit=false to jdbc url so that the tinyint in mysql will be tinyint in Doris.

Avoid calculate checksum of jdbc driver jar multiple times
Refactor
Refactor the notification logic when updating properties in resource.
When updating properties in resource, it will notify the corresponding catalog to update its own properties.
This PR change this logic. After updating properties in resource, it will only uninitialize the catalog's internal
objects such "jdbc client" or "hms client". And this objects will be re-initialized lazily.

And all properties will be got from Resource at runtime, so that it will always get the latest properties

Regression test cases
Because we add tinyInt1isBit=false to jdbc url, some of cases need to be changed.
2023-01-09 09:56:26 +08:00
500c7fb702 [improvement](multi-catalog) support unsupported column type (#15660)
When creating an external catalog, Doris will automatically sync the schema of table from external catalog.
But some of column type are not supported by Doris now, such as struct, map, etc.

In previous, when meeting these unsupported column, Doris will throw an exception, and the corresponding
table can not be synced. But user may just want to query other supported columns.

In this PR, I add a new column type: UNSUPPORTED. And now it is just used for external table schema sync.
When meeting unsupported column, it will be synced as column with UNSUPPORTED type.

When query this table, there are serval situation:

select * from table: throw error Unsupported type 'UNSUPPORTED_TYPE' xxx
select k1 from table: k1 is with supported type. query OK.
select * except(k2): k2 is with unsupported type. query OK
2023-01-08 10:07:10 +08:00
1520a4af6d [refactor](resource) use resource to create external catalog (#14978)
Use resource to create external catalog.
-- HMS
mysql> create resource hms_resource properties(
    -> "type"="hms",
    -> 'hive.metastore.uris' = 'thrift://172.21.0.44:7004',
    -> 'dfs.nameservices'='HANN',
    -> 'dfs.ha.namenodes.HANN'='nn1,nn2',
    -> 'dfs.namenode.rpc-address.HANN.nn1'='172.21.0.32:4007',
    -> 'dfs.namenode.rpc-address.HANN.nn2'='172.21.0.44:4007',
    -> 'dfs.client.failover.proxy.provider.HANN'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
    -> );

-- MYSQL
mysql> create resource mysql_resource properties (
    -> "type"="jdbc",
    -> "user"="root",
    -> "password"="123456",
    -> "jdbc_url" = "jdbc:mysql://127.0.0.1:3316/doris_test?useSSL=false",
    -> "driver_url" = "https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/jdbc_driver/mysql-connector-java-8.0.25.jar",
    -> "driver_class" = "com.mysql.cj.jdbc.Driver");

-- ES
mysql> create resource es_resource properties (
    -> "type"="es",
    -> "hosts"="http://127.0.0.1:29200",
    -> "nodes_discovery"="false",
    -> "enable_keyword_sniff"="true");
2022-12-22 13:45:55 +08:00
d0d7a6d8ad [fix](multi-catalog) can't show databases when creating a new user in external catalog (#15204)
Fix bug: A new user with grants to access external catalog can't show databases.
2022-12-21 08:58:06 +08:00
6625e650c4 [fix](resource) HdfsStorage can get default.Fs from path or configuration (#15079) 2022-12-15 16:56:32 +08:00
dd7ec8f4ca [improvement](test) add tpch1 orc for hive catalog and refactor some test dir (#14669)
Add tpch 1g orc test case in hive docker

Refactor some suites dir of catalog test cases.

And "-internal" for dlf endpoint, to support access oss with aliyun vpc.
2022-11-30 10:03:58 +08:00
44ee4386f7 [test](multi-catalog)Regression test for external hive orc table (#13762)
Add regression test for external hive orc table. This PR has generated all basic types support by hive orc, and create a hive external table to touch them in docker environment.
Functions to be tested:
1. Ensure that all types are parsed correctly
2. Ensure that the null map of all types are parsed correctly
3. Ensure that the `SearchArgument` of `OrcReader` works well
4. Only select partition columns
2022-11-17 20:36:02 +08:00
30f36070b5 [test](multi-catalog)Regression test for external hive parquet table (#13611) 2022-11-14 14:10:10 +08:00