doris

Author	SHA1	Message	Date
Qi Chen	73ad885e19	[Feature][Fix](multi-catalog) Implements transactional hive full acid tables. (#20679 ) Following support for insert-only transactional hive tables in #19518 and #19419, this PR supports transactional hive full acid tables. Hive3 full acid tables are supported directly. Hive2 full acid tables need to run major compactions.	2023-06-13 08:55:16 +08:00
Jibing-Li	1b968c4ade	[fix](multi catalog)Fix nereids planner text format include extra column index bug (#20260 ) The Nereids planner includes the indexes of all columns in TFileScanRangeParams, which may make column projection incorrect for text format tables, because the csv reader uses the column index positions to split a line and extra column indexes produce a wrong split result. This PR resets the column indexes after projection and removes the unneeded ones.	2023-06-01 12:17:47 +08:00
Mingyu Chen	0c98355fff	[fix](catalog) fix create catalog with resource replay issue and kerberos auth issue (#20137 ) 1. Fix the create-catalog-with-resource replay bug. If a user creates a catalog using `create catalog hive with resource xxx`, the resource may be dropped when the edit log is replayed, causing an NPE so that FE fails to start. This PR adds a new FE config `disallow_create_catalog_with_resource`, which defaults to true, so that `with resource` is not allowed; it will be deprecated later. The replay bug is also fixed to avoid the NPE. 2. Fix an issue when creating 2 hive catalogs, one using simple auth and the other using kerberos auth. Queries may fail with an error like: `Server asks us to fall back to SIMPLE auth, but this client is configured to only allow secure connections.` This PR adds a default property for hive catalogs: `"ipc.client.fallback-to-simple-auth-allowed" = "true"`. The property is added automatically when a user creates a hive catalog, to avoid this problem. 3. Fix the `hdfsExists()` call. When `hdfsExists()` returns a non-zero code, check whether it encountered an error or the file was not found. 4. Some code refactoring: avoid importing `org.apache.parquet.Strings`.	2023-05-30 16:57:39 +08:00
Jibing-Li	6f31ee9492	[fix](p0 regression)Update hive docker test case result data (#20176 ) Doris changed the array type output format to use double quotes for strings; previously it used single quotes. The case out file therefore needs to be updated to use double quotes.	2023-05-30 00:17:30 +08:00
Ashin Gau	30c4f25cb3	[fix](multi-catalog) verify the precision of datetime types for each data source (#19544 ) Fix three bugs of timestampv2 precision: 1. Hive catalog doesn't set the precision of timestampv2 and can't get the precision from hive metastore, so set the largest precision for timestampv2; 2. Jdbc catalog uses datetimev1 to parse timestamps and then converts them to timestampv2, so the precision is lost; 3. TVF doesn't use the precision from the metadata of the file format.	2023-05-17 20:50:15 +08:00
Qi Chen	4418eb36a3	[Fix](multi-catalog) Fix some hive partition issues. (#19513 ) Fix some hive partition issues. 1. Fix a BE crash when using hive partition fields of `date`, `timestamp`, or `decimal` type. 2. Fix an hdfs uri decode error when using a `timestamp` partition field, whose values contain special chars that get url-encoded, such as `:` encoded as `%3A`.	2023-05-11 07:49:46 +08:00
Jibing-Li	68505a1192	[Test](multi catalog)Add test case for Iceberg External Table. #19488	2023-05-11 01:13:40 +08:00
Qi Chen	6eea3d9e2d	[Test](multi-catalog) Fix test_hive_parquet regression test order issue. (#18879 ) l_orderkey alone cannot guarantee a unique order.	2023-04-21 22:59:34 +08:00
Qi Chen	c6630a06c1	[Fix](multi-catalog) Fix "test_hive_other" regression test. (#17611 )	2023-03-14 09:16:48 +08:00
Jibing-Li	292926e5aa	[Fix](multi catalog)Fix partition case bug (#16763 ) Set column names from path to lower case in the case-insensitive case. This is for Iceberg columns from path. Iceberg columns are case sensitive, which may cause errors for tables with partitions.	2023-02-16 15:47:23 +08:00
Jibing-Li	3ebc98228d	[feature wip](multi catalog)Support iceberg schema evolution. (#15836 ) Support iceberg schema evolution for the parquet file format. Iceberg uses a unique id for each column to support schema evolution. To support this feature in Doris, the FE side needs to get the current column id for each column and send the ids to the BE side. BE reads the column ids from the parquet key_value_metadata, sets the changed column names in the Block to match the names in the parquet file before reading data, and sets the names back after reading data.	2023-01-20 12:57:36 +08:00
Mingyu Chen	211cc66d02	[fix](multi-catalog) fix image loading failture when create catalog with resource (#15692 ) Bug fix: fix image loading failure when creating a catalog with a resource. When a jdbc catalog is created with a resource, the metadata image fails to load, because loading the jdbc catalog image tries to get the resource from ResourceMgr before ResourceMgr has been loaded, so an NPE is thrown. This PR fixes the bug and refactors some logic about catalogs and resources. Loading a jdbc catalog image no longer gets the resource from ResourceMgr. Users can now create a catalog with both a resource and properties, like: create catalog jdbc_catalog with resource jdbc_resource properties("user" = "user1"); The properties in the "properties" clause overwrite the properties in "jdbc_resource". Force adding tinyInt1isBit=false to the jdbc url: the default value of tinyInt1isBit is true, which causes tinyint in mysql to be read as bit type. Forcing tinyInt1isBit=false in the jdbc url keeps tinyint in mysql as tinyint in Doris. Avoid calculating the checksum of the jdbc driver jar multiple times. Refactor: refactor the notification logic when updating properties in a resource. Previously, updating properties in a resource notified the corresponding catalog to update its own properties. This PR changes this logic: after properties in a resource are updated, only the catalog's internal objects, such as the "jdbc client" or "hms client", are uninitialized. These objects are re-initialized lazily, and all properties are read from the Resource at runtime, so the latest properties are always used. Regression test cases: because tinyInt1isBit=false is added to the jdbc url, some cases need to be changed.	2023-01-09 09:56:26 +08:00
Mingyu Chen	500c7fb702	[improvement](multi-catalog) support unsupported column type (#15660 ) When an external catalog is created, Doris automatically syncs table schemas from the external catalog. But some column types are not yet supported by Doris, such as struct, map, etc. Previously, when Doris met such an unsupported column, it threw an exception and the corresponding table could not be synced. But the user may just want to query the other, supported columns. This PR adds a new column type: UNSUPPORTED. For now it is used only for external table schema sync. An unsupported column is synced as a column with UNSUPPORTED type. When querying such a table, there are several situations: select * from table: throws error Unsupported type 'UNSUPPORTED_TYPE' xxx; select k1 from table (k1 has a supported type): query OK; select * except(k2) (k2 has an unsupported type): query OK.	2023-01-08 10:07:10 +08:00
Ashin Gau	1520a4af6d	[refactor](resource) use resource to create external catalog (#14978 ) Use resource to create external catalog. -- HMS mysql> create resource hms_resource properties( -> "type"="hms", -> 'hive.metastore.uris' = 'thrift://172.21.0.44:7004', -> 'dfs.nameservices'='HANN', -> 'dfs.ha.namenodes.HANN'='nn1,nn2', -> 'dfs.namenode.rpc-address.HANN.nn1'='172.21.0.32:4007', -> 'dfs.namenode.rpc-address.HANN.nn2'='172.21.0.44:4007', -> 'dfs.client.failover.proxy.provider.HANN'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider' -> ); -- MYSQL mysql> create resource mysql_resource properties ( -> "type"="jdbc", -> "user"="root", -> "password"="123456", -> "jdbc_url" = "jdbc:mysql://127.0.0.1:3316/doris_test?useSSL=false", -> "driver_url" = "https://doris-community-test-1308700295.cos.ap-hongkong.myqcloud.com/jdbc_driver/mysql-connector-java-8.0.25.jar", -> "driver_class" = "com.mysql.cj.jdbc.Driver"); -- ES mysql> create resource es_resource properties ( -> "type"="es", -> "hosts"="http://127.0.0.1:29200", -> "nodes_discovery"="false", -> "enable_keyword_sniff"="true");	2022-12-22 13:45:55 +08:00
Ashin Gau	d0d7a6d8ad	[fix](multi-catalog) can't show databases when creating a new user in external catalog (#15204 ) Fix bug: A new user with grants to access external catalog can't show databases.	2022-12-21 08:58:06 +08:00
Ashin Gau	6625e650c4	[fix](resource) HdfsStorage can get default.Fs from path or configuration (#15079 )	2022-12-15 16:56:32 +08:00
Mingyu Chen	dd7ec8f4ca	[improvement](test) add tpch1 orc for hive catalog and refactor some test dir (#14669 ) Add a tpch 1g orc test case in hive docker. Refactor some suite dirs of the catalog test cases. Add "-internal" for the dlf endpoint, to support accessing oss within an aliyun vpc.	2022-11-30 10:03:58 +08:00
Ashin Gau	44ee4386f7	[test](multi-catalog)Regression test for external hive orc table (#13762 ) Add a regression test for external hive orc tables. This PR generates data for all basic types supported by hive orc and creates a hive external table to exercise them in the docker environment. Functions to be tested: 1. Ensure that all types are parsed correctly 2. Ensure that the null maps of all types are parsed correctly 3. Ensure that the `SearchArgument` of `OrcReader` works well 4. Select only partition columns	2022-11-17 20:36:02 +08:00
Jibing-Li	30f36070b5	[test](multi-catalog)Regression test for external hive parquet table (#13611 )	2022-11-14 14:10:10 +08:00

19 Commits