[doc](catalog) add faq for hive catalog (#18298)

@@ -185,7 +185,7 @@ The rules of dynamic partition are prefixed with `dynamic_partition.`:

- `dynamic_partition.storage_medium`

<version since="1.2.3"></version>

Specifies the default storage medium for newly created dynamic partitions. The default is HDD; SSD can also be selected.
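
For example, a minimal sketch of setting this property on an existing table (the table name `tbl1` is a placeholder):

```sql
-- Store newly created dynamic partitions on SSD;
-- tbl1 is a hypothetical table with dynamic partitioning enabled.
ALTER TABLE tbl1 SET ("dynamic_partition.storage_medium" = "SSD");
```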

@@ -153,6 +153,27 @@ LIMIT 5;

You can use a Table Value Function anywhere in SQL that a table can appear, such as in the WITH clause of a CTE or in the FROM clause. This way, you can treat a file as an ordinary table and analyze it conveniently.

<version since="dev"></version>

You can also create a logical view for a Table Value Function with the `CREATE VIEW` statement. You can then access the view and manage privileges on it as with any other view, and allow other users to access the Table Value Function.

```sql
CREATE VIEW v1 AS
SELECT * FROM s3(
    "URI" = "http://127.0.0.1:9312/test2/test.snappy.parquet",
    "ACCESS_KEY" = "minioadmin",
    "SECRET_KEY" = "minioadmin",
    "Format" = "parquet",
    "use_path_style" = "true");

DESC v1;

SELECT * FROM v1;

GRANT SELECT_PRIV ON db1.v1 TO user1;
```

### Data Ingestion

Users can ingest files into Doris tables via `INSERT INTO SELECT` for faster file analysis:
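
A hedged sketch of such a statement, reusing the `s3()` parameters from the view example above (the target table `test_table` is a placeholder):

```sql
-- Load the Parquet file into an existing Doris table.
-- test_table and the connection parameters are illustrative.
INSERT INTO test_table
SELECT * FROM s3(
    "URI" = "http://127.0.0.1:9312/test2/test.snappy.parquet",
    "ACCESS_KEY" = "minioadmin",
    "SECRET_KEY" = "minioadmin",
    "Format" = "parquet",
    "use_path_style" = "true");
```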

@@ -41,28 +41,34 @@ File Cache caches the accessed remote data in the local BE node. The original data

File Cache is disabled by default. You need to set the relevant configurations in both FE and BE to enable it.

### Configurations for FE

Enable File Cache for a given session:

```sql
SET enable_file_cache = true;
```

Enable File Cache globally:

```sql
SET GLOBAL enable_file_cache = true;
```

### Configurations for BE

Add the settings to the BE node's configuration file `conf/be.conf`, and restart the BE node for the configuration to take effect.

| Parameter | Description |
| --- | --- |
| `enable_file_cache` | Whether to enable File Cache; default false |
| `file_cache_max_file_segment_size` | Max size of a single cached block; default 4MB, should be greater than 4096 |
| `file_cache_path` | Cache path configuration in JSON format, for example: `[{"path": "storage1", "normal":53687091200,"persistent":21474836480,"query_limit": "10737418240"},{"path": "storage2", "normal":53687091200,"persistent":21474836480},{"path": "storage3","normal":53687091200,"persistent":21474836480}]`. `path` is the path where cached data is saved; `normal` is the max size of cached data; `query_limit` is the max size of cached data for a single query; `persistent` / `file_cache_max_file_segment_size` is the max number of cache blocks. |
| `enable_file_cache_query_limit` | Whether to limit the cache size used by a single query; default false |
| `clear_file_cache` | Whether to delete the previous cache data when the BE restarts; default false |
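
As an illustration, a minimal `conf/be.conf` sketch (the path and sizes are placeholder assumptions, not recommendations):

```
enable_file_cache = true
file_cache_path = [{"path": "/path/to/file_cache", "normal": 53687091200, "persistent": 21474836480}]
```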

## Check whether a query hits the cache

Execute `SET enable_profile = true` to enable the session variable; you can then view the query profile in the Queries tab of the FE web page. The metrics related to File Cache are as follows:

```
- FileCache:
  - IOHitCacheNum: 552
```

@@ -60,7 +60,7 @@ under the License.

Upgrade the JDK to Java 8u162 or later, or download and install the JCE Unlimited Strength Jurisdiction Policy Files corresponding to the JDK.

5. When querying a table in ORC format, the FE reports the error `Could not obtain block` or `Caused by: java.lang.NoSuchFieldError: types`

For ORC files, by default, the FE accesses HDFS to obtain file information and split the files. In some cases, the FE may not be able to access HDFS. This can be solved by adding the following parameters:

@@ -81,3 +81,18 @@ under the License.

8. An error is reported when connecting to a MySQL database through the JDBC Catalog: `Establishing SSL connection without server's identity verification is not recommended`

Please add `useSSL=true` to the `jdbc_url`.
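
For example (host, port, and database name are placeholders):

```
"jdbc_url" = "jdbc:mysql://127.0.0.1:3306/test?useSSL=true"
```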

9. An error is reported when connecting to a Hive Catalog: `Caused by: java.lang.NullPointerException`

If fe.log contains a stack trace like the following:

```
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook.getFilteredObjects(AuthorizationMetaStoreFilterHook.java:78) ~[hive-exec-3.1.3-core.jar:3.1.3]
    at org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook.filterDatabases(AuthorizationMetaStoreFilterHook.java:55) ~[hive-exec-3.1.3-core.jar:3.1.3]
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAllDatabases(HiveMetaStoreClient.java:1548) ~[doris-fe.jar:3.1.3]
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAllDatabases(HiveMetaStoreClient.java:1542) ~[doris-fe.jar:3.1.3]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181]
```

try adding `"metastore.filter.hook" = "org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl"` to the `CREATE CATALOG` statement.
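
A hedged sketch of where this property goes (catalog name and metastore URI are placeholders):

```sql
CREATE CATALOG hive PROPERTIES (
    'type' = 'hms',
    'hive.metastore.uris' = 'thrift://172.0.0.1:9083',
    -- disable the authorization-based filter hook that triggers the NPE
    'metastore.filter.hook' = 'org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl'
);
```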

@@ -132,7 +132,7 @@ Same as that in Hive Catalogs. See the relevant section in [Hive](./hive.md).

## Time Travel

<version since="1.2.2">

Doris supports reading the specified Snapshot of Iceberg tables.
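
A hedged sketch of the time-travel syntax (table name, timestamp, and snapshot id are placeholders):

```sql
-- Read the table as of a given point in time.
SELECT * FROM iceberg_tbl FOR TIME AS OF "2022-10-07 17:20:37";

-- Read a specific snapshot by id.
SELECT * FROM iceberg_tbl FOR VERSION AS OF 868895038966572;
```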

@@ -139,7 +139,7 @@ Once connected, Doris will ingest metadata of databases and tables from the external data source.

6. Doris

<version since="1.2.3"></version>

The Jdbc Catalog also supports connecting to another Doris database:
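
A hedged sketch of such a catalog (connection parameters are placeholders; per the note below, a 5.x MySQL JDBC driver is assumed):

```sql
CREATE CATALOG doris_catalog PROPERTIES (
    "type" = "jdbc",
    "user" = "root",
    "password" = "123456",
    -- Doris speaks the MySQL protocol on its query port
    "jdbc_url" = "jdbc:mysql://127.0.0.1:9030?useSSL=false",
    "driver_url" = "mysql-connector-java-5.1.47.jar",
    "driver_class" = "com.mysql.jdbc.Driver"
);
```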

@@ -158,7 +158,7 @@ Currently, the Jdbc Catalog only supports using a 5.x MySQL JDBC jar package to connect.

7. SAP_HANA

<version since="1.2.3"></version>

```sql
CREATE CATALOG hana_catalog PROPERTIES (
```

@@ -58,8 +58,7 @@ Variables that support both the current session and the global effect include:

- `sql_mode`
- `enable_profile`
- `query_timeout`
- <version since="dev" type="inline">`insert_timeout`</version>
- `exec_mem_limit`
- `batch_size`
- `allow_partition_column_nullable`
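
Each of these can be set at session or global scope, for example (a hedged sketch using `insert_timeout`; the value is illustrative):

```sql
-- current session only
SET insert_timeout = 3600;

-- all sessions
SET GLOBAL insert_timeout = 3600;
```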

@@ -177,7 +177,7 @@ under the License.

- `dynamic_partition.storage_medium`

<version since="1.2.3"></version>

Specifies the default storage medium for newly created dynamic partitions. The default is HDD; SSD can also be selected.

@@ -154,6 +154,26 @@ LIMIT 5;

A Table Value Function can appear anywhere in SQL that a table can, such as in the WITH clause of a CTE or in the FROM clause. This way, you can treat a file as an ordinary table and analyze it however you like.

<version since="dev"></version>

You can also create a logical view for a Table Value Function with the `CREATE VIEW` statement. You can then access the view and manage privileges on it as with any other view, and allow other users to access the Table Value Function.

```sql
CREATE VIEW v1 AS
SELECT * FROM s3(
    "URI" = "http://127.0.0.1:9312/test2/test.snappy.parquet",
    "ACCESS_KEY" = "minioadmin",
    "SECRET_KEY" = "minioadmin",
    "Format" = "parquet",
    "use_path_style" = "true");

DESC v1;

SELECT * FROM v1;

GRANT SELECT_PRIV ON db1.v1 TO user1;
```

### Data Ingestion

Together with the `INSERT INTO SELECT` syntax, you can conveniently load files into Doris tables for faster analysis:

@@ -42,28 +42,32 @@ File Cache caches the accessed remote data on the local BE node. The original data

File Cache is disabled by default. You need to set the relevant parameters in both FE and BE to enable it.

### FE Configuration

Enable File Cache for a single session:

```sql
SET enable_file_cache = true;
```

Enable File Cache globally:

```sql
SET GLOBAL enable_file_cache = true;
```

### BE Configuration

Add the parameters to the BE node's configuration file `conf/be.conf`, and restart the BE node for them to take effect.

| Parameter | Description |
| --- | --- |
| `enable_file_cache` | Whether to enable File Cache; default false |
| `file_cache_max_file_segment_size` | Max size of a single cached block; default 4MB, must be greater than 4096 |
| `file_cache_path` | Cache path configuration in JSON format, for example: `[{"path": "storage1", "normal":53687091200,"persistent":21474836480,"query_limit": "10737418240"},{"path": "storage2", "normal":53687091200,"persistent":21474836480},{"path": "storage3","normal":53687091200,"persistent":21474836480}]`. `path` is the path where cached data is saved; `normal` is the max size of cached data; `query_limit` is the max cache size a single query can use; `persistent` / `file_cache_max_file_segment_size` is the max number of cached blocks. |
| `enable_file_cache_query_limit` | Whether to limit the cache size used by a single query; default false |
| `clear_file_cache` | Whether to delete the previous cache data when the BE restarts; default false |

### Check File Cache hit status

Execute `set enable_profile=true` to enable the session variable; you can then view the query profile in the Queries tab of the FE web page. The metrics related to File Cache are as follows:

```
- FileCache:
  - IOHitCacheNum: 552
```

@@ -60,7 +60,7 @@ under the License.

Upgrade the JDK to Java 8u162 or later, or download and install the JCE Unlimited Strength Jurisdiction Policy Files corresponding to the JDK.

5. When querying a table in ORC format, the FE reports the error `Could not obtain block` or `Caused by: java.lang.NoSuchFieldError: types`

For ORC files, by default, the FE accesses HDFS to obtain file information and split the files. In some cases, the FE may not be able to access HDFS. This can be solved by adding the following parameters:

@@ -81,3 +81,19 @@ under the License.

8. An error is reported when connecting to a MySQL database through the JDBC Catalog: `Establishing SSL connection without server's identity verification is not recommended`

Please add `useSSL=true` to the `jdbc_url`.

9. An error is reported when connecting to a Hive Catalog: `Caused by: java.lang.NullPointerException`

If fe.log contains a stack trace like the following:

```
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook.getFilteredObjects(AuthorizationMetaStoreFilterHook.java:78) ~[hive-exec-3.1.3-core.jar:3.1.3]
    at org.apache.hadoop.hive.ql.security.authorization.plugin.AuthorizationMetaStoreFilterHook.filterDatabases(AuthorizationMetaStoreFilterHook.java:55) ~[hive-exec-3.1.3-core.jar:3.1.3]
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAllDatabases(HiveMetaStoreClient.java:1548) ~[doris-fe.jar:3.1.3]
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAllDatabases(HiveMetaStoreClient.java:1542) ~[doris-fe.jar:3.1.3]
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_181]
```

you can try adding `"metastore.filter.hook" = "org.apache.hadoop.hive.metastore.DefaultMetaStoreFilterHookImpl"` to the `create catalog` statement to resolve it.

@@ -157,6 +157,7 @@ CREATE CATALOG hive WITH RESOURCE hms_resource PROPERTIES(

<version since="dev"></version>

When creating a Catalog, you can use the parameter `file.meta.cache.ttl-second` to set an automatic expiration time for the File Cache, or set the value to 0 to disable the File Cache. The unit is seconds. For example:

```sql
CREATE CATALOG hive PROPERTIES (
```

@@ -172,7 +173,6 @@ CREATE CATALOG hive PROPERTIES (
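
The example is split across the diff hunks above; a self-contained hedged sketch (the metastore URI and TTL value are illustrative assumptions):

```sql
CREATE CATALOG hive PROPERTIES (
    'type' = 'hms',
    'hive.metastore.uris' = 'thrift://172.0.0.1:9083',
    -- cached file metadata expires after 60 seconds; '0' would disable the File Cache
    'file.meta.cache.ttl-second' = '60'
);
```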

You can also put hive-site.xml directly into the conf directories of the FE and BE; the system will automatically read the information in hive-site.xml as well. The information is overridden by the following rules:

* Information in the Resource overrides that in hive-site.xml.

@@ -130,7 +130,7 @@ CREATE CATALOG iceberg PROPERTIES (

## Time Travel

<version since="1.2.2">

Doris supports reading the specified Snapshot of Iceberg tables.

@@ -140,7 +140,7 @@ CREATE CATALOG sqlserver_catalog PROPERTIES (

6. Doris

<version since="1.2.3"></version>

The Jdbc Catalog also supports connecting to another Doris database:

@@ -159,7 +159,7 @@ CREATE CATALOG doris_catalog PROPERTIES (

7. SAP_HANA

<version since="1.2.3"></version>

```sql
CREATE CATALOG hana_catalog PROPERTIES (
```

@@ -58,8 +58,7 @@ SET variable_assignment [, variable_assignment] ...

- `sql_mode`
- `enable_profile`
- `query_timeout`
- <version since="dev" type="inline">`insert_timeout`</version>
- `exec_mem_limit`
- `batch_size`
- `allow_partition_column_nullable`