[docs](multi-catalog) update en docs (#16160)
@ -1,6 +1,6 @@
---
{
    "title": "Alibaba Cloud DLF",
    "language": "en"
}
---
@ -25,7 +25,79 @@ under the License.
-->

# Alibaba Cloud DLF

Data Lake Formation (DLF) is the unified metadata management service of Alibaba Cloud. It is compatible with the Hive Metastore protocol.

> [What is DLF](https://www.alibabacloud.com/product/datalake-formation)

Doris can access DLF the same way as it accesses Hive Metastore.

## Connect to DLF

1. Create `hive-site.xml`

Create the `hive-site.xml` file and put it in the `fe/conf` directory.

```xml
<?xml version="1.0"?>
<configuration>
    <!-- Set to use DLF client -->
    <property>
        <name>hive.metastore.type</name>
        <value>dlf</value>
    </property>
    <property>
        <name>dlf.catalog.endpoint</name>
        <value>dlf-vpc.cn-beijing.aliyuncs.com</value>
    </property>
    <property>
        <name>dlf.catalog.region</name>
        <value>cn-beijing</value>
    </property>
    <property>
        <name>dlf.catalog.proxyMode</name>
        <value>DLF_ONLY</value>
    </property>
    <property>
        <name>dlf.catalog.uid</name>
        <value>20000000000000000</value>
    </property>
    <property>
        <name>dlf.catalog.accessKeyId</name>
        <value>XXXXXXXXXXXXXXX</value>
    </property>
    <property>
        <name>dlf.catalog.accessKeySecret</name>
        <value>XXXXXXXXXXXXXXXXX</value>
    </property>
</configuration>
```

* `dlf.catalog.endpoint`: DLF endpoint. See [Regions and Endpoints of DLF](https://www.alibabacloud.com/help/en/data-lake-formation/latest/regions-and-endpoints).
* `dlf.catalog.region`: DLF region. See [Regions and Endpoints of DLF](https://www.alibabacloud.com/help/en/data-lake-formation/latest/regions-and-endpoints).
* `dlf.catalog.uid`: Alibaba Cloud account ID, shown as "Account ID" in the upper right corner of the Alibaba Cloud console.
* `dlf.catalog.accessKeyId`: AccessKey ID, which you can create and manage on the [Alibaba Cloud console](https://ram.console.aliyun.com/manage/ak).
* `dlf.catalog.accessKeySecret`: AccessKey Secret, which you can create and manage on the [Alibaba Cloud console](https://ram.console.aliyun.com/manage/ak).

Other configuration items are fixed and require no modification.

2. Restart FE, and create a Catalog via the `CREATE CATALOG` statement.

Doris will read and parse `fe/conf/hive-site.xml`.

```sql
CREATE CATALOG hive_with_dlf PROPERTIES (
    "type"="hms",
    "hive.metastore.uris" = "thrift://127.0.0.1:9083"
);
```

`type` must always be `hms`. Since `hive.metastore.uris` is not actually used, it can be set to an arbitrary value, but it must follow the format of a Hive Metastore Thrift URI.

After the above steps, you can access metadata in DLF the same way as you access Hive Metastore.

Doris supports accessing Hive/Iceberg/Hudi metadata in DLF.
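
For example, once the catalog is created, you can switch to it and browse the metadata in DLF. A minimal sketch (the `dlf_db` and `dlf_tbl` names below are hypothetical):

```sql
-- Switch to the DLF-backed catalog created above
SWITCH hive_with_dlf;
SHOW DATABASES;
-- dlf_db.dlf_tbl is a hypothetical table registered in DLF
SELECT * FROM dlf_db.dlf_tbl LIMIT 10;
```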
@ -26,4 +26,149 @@ under the License.

# Hive

Once Doris is connected to Hive Metastore, or to a metadata service compatible with Hive Metastore, it can access the databases and tables in Hive and run queries against them.

Besides Hive, many other systems, such as Iceberg and Hudi, use Hive Metastore to keep their metadata. Thus, Doris can also access these systems via Hive Catalog.

## Usage

When connecting to Hive, Doris:

1. Supports Hive versions 1, 2, and 3;
2. Supports both Managed Tables and External Tables;
3. Can identify metadata of Hive, Iceberg, and Hudi stored in Hive Metastore;
4. Supports Hive tables with data stored in JuiceFS, which can be used the same way as normal Hive tables (put `juicefs-hadoop-x.x.x.jar` in `fe/lib/` and `apache_hdfs_broker/lib/`).

## Create Catalog

```sql
CREATE CATALOG hive PROPERTIES (
    'type'='hms',
    'hive.metastore.uris' = 'thrift://172.21.0.1:7004'
);
```

In addition to `type` and `hive.metastore.uris`, which are required, you can specify other parameters regarding the connection.

For example, to specify HDFS HA:

```sql
CREATE CATALOG hive PROPERTIES (
    'type'='hms',
    'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
    'hadoop.username' = 'hive',
    'dfs.nameservices'='your-nameservice',
    'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
    'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
    'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
    'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
```

To specify HDFS HA and Kerberos authentication information:

```sql
CREATE CATALOG hive PROPERTIES (
    'type'='hms',
    'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
    'hive.metastore.sasl.enabled' = 'true',
    'dfs.nameservices'='your-nameservice',
    -- dfs.ha.namenodes is needed so the failover proxy can enumerate the HA NameNodes
    'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
    'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
    'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
    'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
    'hadoop.security.authentication' = 'kerberos',
    'hadoop.kerberos.keytab' = '/your-keytab-filepath/your.keytab',
    'hadoop.kerberos.principal' = 'your-principal@YOUR.COM',
    'yarn.resourcemanager.address' = 'your-rm-address:your-rm-port',
    'yarn.resourcemanager.principal' = 'your-rm-principal/_HOST@YOUR.COM'
);
```

To provide Hadoop KMS encrypted transmission information:

```sql
CREATE CATALOG hive PROPERTIES (
    'type'='hms',
    'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
    'dfs.encryption.key.provider.uri' = 'kms://http@kms_host:kms_port/kms'
);
```

Or, to connect to Hive data stored in JuiceFS:

```sql
CREATE CATALOG hive PROPERTIES (
    'type'='hms',
    'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
    'hadoop.username' = 'root',
    'fs.jfs.impl' = 'io.juicefs.JuiceFileSystem',
    'fs.AbstractFileSystem.jfs.impl' = 'io.juicefs.JuiceFS',
    'juicefs.meta' = 'xxx'
);
```

In Doris 1.2.1 and newer, you can create a Resource that contains all these parameters and reuse it when creating new Catalogs. Here is an example:

```sql
# 1. Create the Resource
CREATE RESOURCE hms_resource PROPERTIES (
    'type'='hms',
    'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
    'hadoop.username' = 'hive',
    'dfs.nameservices'='your-nameservice',
    'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
    'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
    'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
    'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);

# 2. Create a Catalog that uses the existing Resource. The key-value pairs given here will override the corresponding values in the Resource.
CREATE CATALOG hive WITH RESOURCE hms_resource PROPERTIES(
    'key' = 'value'
);
```

You can also put the `hive-site.xml` file in the `conf` directories of FE and BE. Doris will then automatically read information from `hive-site.xml`. Values are overridden according to the following rules:

* Information in the Resource overrides that in `hive-site.xml`.
* Information in `CREATE CATALOG PROPERTIES` overrides that in the Resource.
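
Once the catalog is created, you can switch to it and query Hive tables directly, or reference tables by their fully qualified names. A minimal sketch (the `hive_db` and `hive_tbl` names are hypothetical):

```sql
-- Switch to the Hive catalog and browse its databases
SWITCH hive;
SHOW DATABASES;
-- Or query without switching, using catalog.database.table
SELECT * FROM hive.hive_db.hive_tbl LIMIT 10;
```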

### Hive Versions

Doris can access Hive Metastore in all Hive versions. By default, Doris uses the interface compatible with Hive 2.3 to access Hive Metastore. You can also specify a certain Hive version when creating a Catalog, for example:

```sql
CREATE CATALOG hive PROPERTIES (
    'type'='hms',
    'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
    'hive.version' = '1.1.0'
);
```

## Column Type Mapping

This is applicable for Hive, Iceberg, and Hudi.

| HMS Type      | Doris Type    | Comment                                              |
| ------------- | ------------- | ---------------------------------------------------- |
| boolean       | boolean       |                                                      |
| tinyint       | tinyint       |                                                      |
| smallint      | smallint      |                                                      |
| int           | int           |                                                      |
| bigint        | bigint        |                                                      |
| date          | date          |                                                      |
| timestamp     | datetime      |                                                      |
| float         | float         |                                                      |
| double        | double        |                                                      |
| char          | char          |                                                      |
| varchar       | varchar       |                                                      |
| decimal       | decimal       |                                                      |
| `array<type>` | `array<type>` | Supports nested arrays, such as `array<array<int>>`  |
| other         | unsupported   |                                                      |
@ -27,4 +27,28 @@ under the License.

# Hudi

## Usage

1. Currently, Doris supports Snapshot Query on Copy-on-Write Hudi tables and Read Optimized Query on Merge-on-Read tables. In the future, it will support Snapshot Query on Merge-on-Read tables and Incremental Query.
2. Doris currently only supports Hive Metastore Catalogs. The usage is basically the same as that of Hive Catalogs. More types of Catalogs will be supported in future versions.

## Create Catalog

Creating a Hudi Catalog is the same as creating a Hive Catalog. A simple example is provided here; see [Hive](./hive) for more information.

```sql
CREATE CATALOG hudi PROPERTIES (
    'type'='hms',
    'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
    'hadoop.username' = 'hive',
    'dfs.nameservices'='your-nameservice',
    'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
    'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
    'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
    'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
```
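
Once the catalog is created, Hudi tables can be queried like ordinary Hive tables. A minimal sketch (the `hudi_db` and `hudi_tbl` names are hypothetical):

```sql
-- Read Optimized or Snapshot Query, depending on the table type (see Usage above)
SELECT * FROM hudi.hudi_db.hudi_tbl LIMIT 10;
```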

## Column Type Mapping

Same as that in Hive Catalogs. See the relevant section in [Hive](./hive).
@ -27,4 +27,51 @@ under the License.

# Iceberg

## Usage

When connecting to Iceberg, Doris:

1. Supports the Iceberg V1/V2 table formats;
2. Supports Position Delete but not Equality Delete for the V2 format;
3. Currently only supports Hive Metastore Catalogs. The usage is the same as that of Hive Catalogs.

## Create Catalog

Creating an Iceberg Catalog is the same as creating a Hive Catalog. A simple example is provided here; see [Hive](./hive) for more information.

```sql
CREATE CATALOG iceberg PROPERTIES (
    'type'='hms',
    'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
    'hadoop.username' = 'hive',
    'dfs.nameservices'='your-nameservice',
    'dfs.ha.namenodes.your-nameservice'='nn1,nn2',
    'dfs.namenode.rpc-address.your-nameservice.nn1'='172.21.0.2:4007',
    'dfs.namenode.rpc-address.your-nameservice.nn2'='172.21.0.3:4007',
    'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
```

## Column Type Mapping

Same as that in Hive Catalogs. See the relevant section in [Hive](./hive).

## Time Travel

<version since="dev">

Doris supports reading a specified Snapshot of an Iceberg table.

</version>

Each write operation to an Iceberg table generates a new Snapshot. By default, a read request only reads the latest Snapshot.

You can read a historical version of a table using the `FOR TIME AS OF` or `FOR VERSION AS OF` clauses, based on the Snapshot ID or the time point at which the Snapshot was generated. For example:

`SELECT * FROM iceberg_tbl FOR TIME AS OF "2022-10-07 17:20:37";`

`SELECT * FROM iceberg_tbl FOR VERSION AS OF 868895038966572;`

You can use the [iceberg_meta](https://doris.apache.org/docs/dev/sql-manual/sql-functions/table-functions/iceberg_meta/) table function to view the Snapshot details of a specified table.
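
For example, a sketch of listing the Snapshots of a table (the `iceberg.db1.tbl1` name is hypothetical):

```sql
-- List all Snapshots of the table, including their IDs and commit times
SELECT * FROM iceberg_meta(
    "table" = "iceberg.db1.tbl1",
    "query_type" = "snapshots"
);
```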
@ -261,7 +261,7 @@ See [Hudi](./hudi)

### Connect to Elasticsearch

See [Elasticsearch](./es)

### Connect to JDBC
@ -28,7 +28,7 @@ under the License.

By connecting to Hive Metastore, or to a metadata service compatible with Hive Metastore, Doris can automatically obtain Hive database and table information and run data queries.

Besides Hive, many other systems also use Hive Metastore to store metadata. So through Hive Catalog, we can access not only Hive but also systems that use Hive Metastore as their metadata storage, such as Iceberg and Hudi.

## Usage Restrictions
@ -38,7 +38,7 @@ under the License.

4. Supports Hive tables with data stored on JuiceFS, used as follows (you need to put `juicefs-hadoop-x.x.x.jar` in `fe/lib/` and `apache_hdfs_broker/lib/`).

## Create Catalog

```sql
CREATE CATALOG hive PROPERTIES (
    'type'='hms',
@ -51,7 +51,7 @@ CREATE CATALOG hive PROPERTIES (
    'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
```

In addition to the two required parameters `type` and `hive.metastore.uris`, you can pass more parameters to supply the information needed for the connection.

To provide HDFS HA information, for example:
@ -68,7 +68,7 @@ CREATE CATALOG hive PROPERTIES (
    'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);
```

To provide both HDFS HA information and Kerberos authentication information, for example:

```sql
@ -87,7 +87,7 @@ CREATE CATALOG hive PROPERTIES (
    'yarn.resourcemanager.principal' = 'your-rm-principal/_HOST@YOUR.COM'
);
```

To provide Hadoop KMS encrypted transmission information, for example:

```sql
@ -110,7 +110,7 @@ CREATE CATALOG hive PROPERTIES (
    'juicefs.meta' = 'xxx'
);
```

Since version 1.2.1, you can also store this information centrally by creating a Resource, and then use that Resource when creating a Catalog. For example:

```sql
@ -126,12 +126,12 @@ CREATE RESOURCE hms_resource PROPERTIES (
    'dfs.client.failover.proxy.provider.your-nameservice'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider'
);

# 2. Create a Catalog that uses the Resource. The key-value pairs given here will override the corresponding values in the Resource.
CREATE CATALOG hive WITH RESOURCE hms_resource PROPERTIES(
    'key' = 'value'
);
```

You can also put `hive-site.xml` directly in the conf directories of FE and BE; the system will then automatically read the information in `hive-site.xml`. Values are overridden according to the following rules:

* Information in the Resource overrides that in `hive-site.xml`.
@ -37,7 +37,7 @@ under the License.

Basically the same as for a Hive Catalog; only a simple example is given here. For other examples, see [Hive Catalog](./hive).

```sql
CREATE CATALOG hudi PROPERTIES (
    'type'='hms',
    'hive.metastore.uris' = 'thrift://172.21.0.1:7004',
    'hadoop.username' = 'hive',