[docs](multi-catalog)update en docs (#16160)

Hu Yanjun
2023-01-29 00:36:31 +08:00
committed by GitHub
parent b7379daffa
commit 46ce66cbd8
7 changed files with 304 additions and 16 deletions

View File

@ -1,6 +1,6 @@
---
{
"title": "Aliyun DLF",
"title": "Alibaba Cloud DLF",
"language": "en"
}
---
@ -25,7 +25,79 @@ under the License.
-->
# Aliyun DLF
# Alibaba Cloud DLF
Data Lake Formation (DLF) is the unified metadata management service of Alibaba Cloud. It is compatible with the Hive Metastore protocol.
> [What is DLF](https://www.alibabacloud.com/product/datalake-formation)
Doris can access DLF the same way as it accesses Hive Metastore.
## Connect to DLF
1. Create `hive-site.xml`
Create the `hive-site.xml` file, and put it in the `fe/conf` directory.
```
<?xml version="1.0"?>
<configuration>
<!--Set to use dlf client-->
<property>
<name>hive.metastore.type</name>
<value>dlf</value>
</property>
<property>
<name>dlf.catalog.endpoint</name>
<value>dlf-vpc.cn-beijing.aliyuncs.com</value>
</property>
<property>
<name>dlf.catalog.region</name>
<value>cn-beijing</value>
</property>
<property>
<name>dlf.catalog.proxyMode</name>
<value>DLF_ONLY</value>
</property>
<property>
<name>dlf.catalog.uid</name>
<value>20000000000000000</value>
</property>
<property>
<name>dlf.catalog.accessKeyId</name>
<value>XXXXXXXXXXXXXXX</value>
</property>
<property>
<name>dlf.catalog.accessKeySecret</name>
<value>XXXXXXXXXXXXXXXXX</value>
</property>
</configuration>
```
* `dlf.catalog.endpoint`: DLF Endpoint. See [Regions and Endpoints of DLF](https://www.alibabacloud.com/help/en/data-lake-formation/latest/regions-and-endpoints).
* `dlf.catalog.region`: DLF Region. See [Regions and Endpoints of DLF](https://www.alibabacloud.com/help/en/data-lake-formation/latest/regions-and-endpoints).
* `dlf.catalog.uid`: Alibaba Cloud account. You can find the "Account ID" in the upper right corner on the Alibaba Cloud console.
* `dlf.catalog.accessKeyId`: AccessKey, which you can create and manage on the [Alibaba Cloud console](https://ram.console.aliyun.com/manage/ak).
* `dlf.catalog.accessKeySecret`: SecretKey, which you can create and manage on the [Alibaba Cloud console](https://ram.console.aliyun.com/manage/ak).
Other configuration items are fixed and require no modifications.
2. Restart FE, and create Catalog via the `CREATE CATALOG` statement.
Doris will read and parse `fe/conf/hive-site.xml`.
```sql
CREATE CATALOG hive_with_dlf PROPERTIES (
"type"="hms",
"hive.metastore.uris" = "thrift://127.0.0.1:9083"
)
```
`type` should always be `hms`. `hive.metastore.uris` can be an arbitrary value, since it is not actually used, but it must follow the format of a Hive Metastore Thrift URI.
After the above steps, you can access metadata in DLF the same way as you access Hive MetaStore.
Doris supports accessing Hive/Iceberg/Hudi metadata in DLF.
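After creating the catalog, a quick way to verify the connection is to switch to it and browse the metadata. The sketch below assumes the `hive_with_dlf` catalog from the example above; the database and table names are hypothetical placeholders.
```sql
-- Switch to the DLF-backed catalog created above
SWITCH hive_with_dlf;

-- Browse the metadata fetched from DLF (database and table names are hypothetical)
SHOW DATABASES;
USE your_dlf_db;
SHOW TABLES;

-- Query a table just like a regular Hive table
SELECT * FROM your_dlf_table LIMIT 10;
```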
TODO: translate

View File

@ -26,4 +26,149 @@ under the License.
# Hive
TODO: translate
Once connected to Hive Metastore, or to a metadata service compatible with Hive Metastore, Doris can automatically obtain Hive database and table information and run queries against them.
Besides Hive, many other systems, such as Iceberg and Hudi, use Hive Metastore to store their metadata. Thus, Doris can also access these systems via the Hive Catalog.
## Usage
When connecting to Hive, Doris:
1. Supports Hive version 1/2/3;
2. Supports both Managed Table and External Table;
3. Can identify metadata of Hive, Iceberg, and Hudi stored in Hive Metastore;
4. Supports Hive tables with data stored in JuiceFS, which can be used the same way as normal Hive tables (put `juicefs-hadoop-x.x.x.jar` in `fe/lib/` and `apache_hdfs_broker/lib/`).
## Create Catalog
```sql
CREATE CATALOG hive PROPERTIES (
'type'='hms',
'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
'hadoop.username' = 'hive',
'dfs.nameservices'='your-nameservice',
'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
```
In addition to `type` and `hive.metastore.uris`, which are required, you can specify other parameters regarding the connection.
For example, to specify HDFS HA:
```sql
CREATE CATALOG hive PROPERTIES (
'type'='hms',
'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
'hadoop.username' = 'hive',
'dfs.nameservices'='your-nameservice',
'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
```
To specify HDFS HA and Kerberos authentication information:
```sql
CREATE CATALOG hive PROPERTIES (
'type'='hms',
'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
'hive.metastore.sasl.enabled' = 'true',
'dfs.nameservices'='your-nameservice',
'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
'hadoop.security.authentication' = 'kerberos',
'hadoop.kerberos.keytab' = '/your-keytab-filepath/your.keytab',
'hadoop.kerberos.principal' = 'your-principal@YOUR.COM',
'yarn.resourcemanager.address' = 'your-rm-address:your-rm-port',
'yarn.resourcemanager.principal' = 'your-rm-principal/_HOST@YOUR.COM'
);
```
To provide Hadoop KMS encrypted transmission information:
```sql
CREATE CATALOG hive PROPERTIES (
'type'='hms',
'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
'dfs.encryption.key.provider.uri' = 'kms://http@kms_host:kms_port/kms'
);
```
Or to connect to Hive data stored in JuiceFS:
```sql
CREATE CATALOG hive PROPERTIES (
'type'='hms',
'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
'hadoop.username' = 'root',
'fs.jfs.impl' = 'io.juicefs.JuiceFileSystem',
'fs.AbstractFileSystem.jfs.impl' = 'io.juicefs.JuiceFS',
'juicefs.meta' = 'xxx'
);
```
In Doris 1.2.1 and newer, you can create a Resource that contains all these parameters, and reuse the Resource when creating new Catalogs. Here is an example:
```sql
# 1. Create Resource
CREATE RESOURCE hms_resource PROPERTIES (
'type'='hms',
'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
'hadoop.username' = 'hive',
'dfs.nameservices'='your-nameservice',
'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
# 2. Create a Catalog using an existing Resource. The key/value pairs specified here override the corresponding information in the Resource.
CREATE CATALOG hive WITH RESOURCE hms_resource PROPERTIES(
'key' = 'value'
);
```
You can also put the `hive-site.xml` file in the `conf` directories of FE and BE. This enables Doris to automatically read information from `hive-site.xml`. Information is then overridden according to the following rules:
* Information in Resource will overwrite that in `hive-site.xml`.
* Information in `CREATE CATALOG PROPERTIES` will overwrite that in Resource.
### Hive Versions
Doris can access Hive Metastore across all Hive versions. By default, Doris uses interfaces compatible with Hive 2.3 to access Hive Metastore. You can specify the Hive version when creating a Catalog, for example:
```sql
CREATE CATALOG hive PROPERTIES (
'type'='hms',
'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
'hive.version' = '1.1.0'
);
```
## Column Type Mapping
This is applicable to Hive/Iceberg/Hudi.
| HMS Type | Doris Type | Comment |
| ------------- | ------------- | ------------------------------------------------- |
| boolean | boolean | |
| tinyint | tinyint | |
| smallint | smallint | |
| int | int | |
| bigint | bigint | |
| date | date | |
| timestamp | datetime | |
| float | float | |
| double | double | |
| char | char | |
| varchar | varchar | |
| decimal | decimal | |
| `array<type>` | `array<type>` | Support nested array, such as `array<array<int>>` |
| other | unsupported | |
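As a usage sketch (not part of the original example set), once the `hive` catalog above is created, tables can be queried either by switching to the catalog or by using fully qualified names. The database and table names below are hypothetical.
```sql
-- List catalogs and switch to the Hive catalog created above
SHOW CATALOGS;
SWITCH hive;

-- Explore and query Hive tables (names are hypothetical)
SHOW DATABASES;
SELECT count(*) FROM your_db.your_hive_table;

-- Or reference a table by its fully qualified name without switching
SELECT * FROM hive.your_db.your_hive_table LIMIT 10;
```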

View File

@ -27,4 +27,28 @@ under the License.
# Hudi
TODO: translate
## Usage
1. Currently, Doris supports Snapshot Query on Copy-on-Write Hudi tables and Read Optimized Query on Merge-on-Read tables. In the future, it will support Snapshot Query on Merge-on-Read tables and Incremental Query.
2. Doris only supports Hive Metastore Catalogs currently. The usage is basically the same as that of Hive Catalogs. More types of Catalogs will be supported in future versions.
## Create Catalog
Same as creating Hive Catalogs. A simple example is provided here. See [Hive](./hive) for more information.
```sql
CREATE CATALOG hudi PROPERTIES (
'type'='hms',
'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
'hadoop.username' = 'hive',
'dfs.nameservices'='your-nameservice',
'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
```
## Column Type Mapping
Same as that in Hive Catalogs. See the relevant section in [Hive](./hive).
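For illustration, both a Snapshot Query on a Copy-on-Write table and a Read Optimized Query on a Merge-on-Read table are issued as plain `SELECT` statements through the `hudi` catalog created above. All database and table names below are hypothetical, and the `_ro` suffix is only an assumption based on the common Hudi naming convention for read-optimized views.
```sql
SWITCH hudi;
USE hudi_db;    -- hypothetical database

-- Snapshot Query on a Copy-on-Write table
SELECT * FROM hudi_cow_table LIMIT 10;

-- Read Optimized Query on a Merge-on-Read table (reads compacted base files only)
SELECT * FROM hudi_mor_table_ro LIMIT 10;
```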

View File

@ -27,4 +27,51 @@ under the License.
# Iceberg
TODO: translate
## Usage
When connecting to Iceberg, Doris:
1. Supports Iceberg V1/V2 table formats;
2. Supports Position Delete but not Equality Delete for V2 format;
3. Only supports Hive Metastore Catalogs. The usage is the same as that of Hive Catalogs.
## Create Catalog
Same as creating Hive Catalogs. A simple example is provided here. See [Hive](./hive) for more information.
```sql
CREATE CATALOG iceberg PROPERTIES (
'type'='hms',
'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
'hadoop.username' = 'hive',
'dfs.nameservices'='your-nameservice',
'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
```
## Column Type Mapping
Same as that in Hive Catalogs. See the relevant section in [Hive](./hive).
## Time Travel
<version since="dev">
Doris supports reading the specified Snapshot of Iceberg tables.
</version>
Each write operation to an Iceberg table will generate a new Snapshot.
By default, a read request will only read the latest Snapshot.
You can read data of historical table versions using the `FOR TIME AS OF` or `FOR VERSION AS OF` clause, based on the Snapshot ID or the time point when the Snapshot was generated. For example:
`SELECT * FROM iceberg_tbl FOR TIME AS OF "2022-10-07 17:20:37";`
`SELECT * FROM iceberg_tbl FOR VERSION AS OF 868895038966572;`
You can use the [iceberg_meta](https://doris.apache.org/docs/dev/sql-manual/sql-functions/table-functions/iceberg_meta/) table function to view the Snapshot details of the specified table.
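As a hedged, end-to-end sketch, you might first list the snapshots of a table with `iceberg_meta` and then read the table as of one of the returned Snapshot IDs. The catalog, database, and table names are placeholders; the Snapshot ID reuses the value from the example above.
```sql
-- List snapshots of an Iceberg table (catalog/database/table names are placeholders)
SELECT * FROM iceberg_meta(
    "table" = "iceberg.iceberg_db.iceberg_tbl",
    "query_type" = "snapshots"
);

-- Read the table as of a specific Snapshot ID returned above
SELECT * FROM iceberg.iceberg_db.iceberg_tbl FOR VERSION AS OF 868895038966572;
```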

View File

@ -261,7 +261,7 @@ See [Hudi](./hudi)
### Connect to Elasticsearch
See [Elasticsearch](./elasticsearch)
See [Elasticsearch](./es)
### Connect to JDBC

View File

@ -28,7 +28,7 @@ under the License.
By connecting to Hive Metastore, or to a metadata service compatible with Hive Metastore, Doris can automatically obtain Hive database and table information and run data queries.
Besides Hive, many other systems also use Hive Metastore to store metadata. So through the Hive Catalog, we can access not only Hive but also systems that use Hive Metastore as their metadata store, such as Iceberg and Hudi.
## Usage
@ -38,7 +38,7 @@ under the License.
4. Supports Hive tables with data stored in JuiceFS, used as follows (you need to put `juicefs-hadoop-x.x.x.jar` in `fe/lib/` and `apache_hdfs_broker/lib/`).
## Create Catalog
```sql
CREATE CATALOG hive PROPERTIES (
'type'='hms',
@ -51,7 +51,7 @@ CREATE CATALOG hive PROPERTIES (
'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
```
In addition to the two required parameters `type` and `hive.metastore.uris`, you can pass additional parameters to provide the information needed for the connection.
For example, to provide HDFS HA information:
@ -68,7 +68,7 @@ CREATE CATALOG hive PROPERTIES (
'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
```
To provide both HDFS HA information and Kerberos authentication information:
```sql
@ -87,7 +87,7 @@ CREATE CATALOG hive PROPERTIES (
'yarn.resourcemanager.principal' = 'your-rm-principal/_HOST@YOUR.COM'
);
```
To provide Hadoop KMS encrypted transmission information:
```sql
@ -110,7 +110,7 @@ CREATE CATALOG hive PROPERTIES (
'juicefs.meta' = 'xxx'
);
```
Since version 1.2.1, you can also store this information centrally by creating a Resource, and then use that Resource when creating a Catalog. For example:
```sql
@ -126,12 +126,12 @@ CREATE RESOURCE hms_resource PROPERTIES (
'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
# 2. Create a Catalog and use the Resource. The key/value pairs here override the information in the Resource.
CREATE CATALOG hive WITH RESOURCE hms_resource PROPERTIES(
'key' = 'value'
);
```
You can also put `hive-site.xml` directly in the `conf` directories of FE and BE; the system will then automatically read information from `hive-site.xml` as well. The override rules are as follows:
* Information in the Resource overrides that in `hive-site.xml`.

View File

@ -37,7 +37,7 @@ under the License.
Basically the same as creating a Hive Catalog; only a simple example is given here. For more examples, see [Hive Catalog](./hive).
```sql
CREATE CATALOG iceberg PROPERTIES (
CREATE CATALOG hudi PROPERTIES (
'type'='hms',
'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
'hadoop.username' = 'hive',