diff --git a/docs/en/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT.md b/docs/en/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT.md
index c6e6285ac3..d7c10b7d05 100644
--- a/docs/en/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT.md
+++ b/docs/en/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT.md
@@ -85,6 +85,7 @@ The bottom layer of the `Export` statement actually executes the `select...outfi
   - `parallelism`: The concurrency degree of the `export` job, the default is `1`. The export job will be divided into `select..outfile..` statements of the number of `parallelism` to execute concurrently. (If the value of `parallelism` is greater than the number of tablets in the table, the system will automatically set `parallelism` to the number of tablets, that is, each `select..outfile..` statement is responsible for one tablet)
   - `delete_existing_files`: default `false`. If it is specified as true, you will first delete all files specified in the directory specified by the file_path, and then export the data to the directory.For example: "file_path" = "/user/tmp", then delete all files and directory under "/user/"; "file_path" = "/user/tmp/", then delete all files and directory under "/user/tmp/"
   - `max_file_size`: it is the limit for the size of a single file in the export job. If the result file exceeds this value, it will be split into multiple files. The valid range for `max_file_size` is [5MB, 2GB], with a default value of 1GB. (When exporting to the ORC file format, the actual size of the split files will be multiples of 64MB, for example, if max_file_size is specified as 5MB, the actual split size will be 64MB; if max_file_size is specified as 65MB, the actual split size will be 128MB.)
+  - `timeout`: The timeout of the export job, in seconds. The default is 2 hours (7200 seconds).
 
   > Note that to use the `delete_existing_files` parameter, you also need to add the configuration `enable_delete_existing_files = true` to the fe.conf file and restart the FE. Only then will the `delete_existing_files` parameter take effect. Setting `delete_existing_files = true` is a dangerous operation and it is recommended to only use it in a testing environment.
 
@@ -367,5 +368,3 @@ WITH BROKER "broker_name"
  - Currently, The `Export Job` is simply check whether the `Tablets version` is the same, it is recommended not to import data during the execution of the `Export Job`.
 
  - The maximum number of partitions that an `Export job` allows is 2000. You can add a parameter to the fe.conf `maximum_number_of_export_partitions` and restart FE to modify the setting.
-
-- - The timeout time for the `EXPORT` command is the same as the timeout time for queries. It can be set using `SET query_timeout=xxx`, where xxx represents the desired timeout value.
diff --git a/docs/en/docs/sql-manual/sql-reference/Data-Manipulation-Statements/OUTFILE.md b/docs/en/docs/sql-manual/sql-reference/Data-Manipulation-Statements/OUTFILE.md
index 5593feca9d..9bf2608ce9 100644
--- a/docs/en/docs/sql-manual/sql-reference/Data-Manipulation-Statements/OUTFILE.md
+++ b/docs/en/docs/sql-manual/sql-reference/Data-Manipulation-Statements/OUTFILE.md
@@ -33,7 +33,7 @@ OURFILE
 
 This statement is used to export query results to a file using the `SELECT INTO OUTFILE` command. Currently, it supports exporting to remote storage, such as HDFS, S3, BOS, COS (Tencent Cloud), through the Broker process, S3 protocol, or HDFS protocol.
 
-**grammar:**
+#### Syntax:
 
 ```sql
 query_stmt
@@ -42,7 +42,7 @@ INTO OUTFILE "file_path"
 [properties]
 ```
 
-**illustrate:**
+#### Description:
 
 1. file_path
 
@@ -129,6 +129,57 @@ INTO OUTFILE "file_path"
     INTO OUTFILE "file:///home/work/path/result_";
     ```
 
+#### Data Type Mapping
+
+Parquet and ORC file formats have their own data types. Doris automatically converts its own data types to the corresponding Parquet/ORC types on export. The mappings are as follows:
+
+1. Doris data types map to ORC data types as follows:
+
+    | Doris Type | ORC Type |
+    | --- | --- |
+    | boolean | boolean |
+    | tinyint | tinyint |
+    | smallint | smallint |
+    | int | int |
+    | bigint | bigint |
+    | largeInt | string |
+    | date | string |
+    | datev2 | string |
+    | datetime | string |
+    | datetimev2 | timestamp |
+    | float | float |
+    | double | double |
+    | char / varchar / string | string |
+    | decimal | decimal |
+    | struct | struct |
+    | map | map |
+    | array | array |
+
+2. When Doris exports to the Parquet file format, it first converts its in-memory data to the Arrow in-memory format, and Arrow then writes the Parquet file. Doris data types map to Arrow data types as follows:
+
+    | Doris Type | Arrow Type |
+    | --- | --- |
+    | boolean | boolean |
+    | tinyint | int8 |
+    | smallint | int16 |
+    | int | int32 |
+    | bigint | int64 |
+    | largeInt | utf8 |
+    | date | utf8 |
+    | datev2 | utf8 |
+    | datetime | utf8 |
+    | datetimev2 | utf8 |
+    | float | float32 |
+    | double | float64 |
+    | char / varchar / string | utf8 |
+    | decimal | decimal128 |
+    | struct | struct |
+    | map | map |
+    | array | list |
+
+
+
+
 ### example
 
 1. Use the broker method to export, and export the simple query results to the file `hdfs://path/to/result.txt`. Specifies that the export format is CSV. Use `my_broker` and set kerberos authentication information. Specify the column separator as `,` and the row separator as `\n`.
diff --git a/docs/zh-CN/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT.md b/docs/zh-CN/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT.md
index 6eddc68785..e780c93084 100644
--- a/docs/zh-CN/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT.md
+++ b/docs/zh-CN/docs/sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT.md
@@ -90,6 +90,8 @@ EXPORT
 
   - `delete_existing_files`: Defaults to false. If set to true, all files under the directory specified by `export_path` are deleted first, and the data is then exported to that directory. For example: "export_path" = "/user/tmp" deletes all files and directories under "/user/"; "file_path" = "/user/tmp/" deletes all files and directories under "/user/tmp/".
 
+  - `timeout`: The timeout of the export job, in seconds. The default is 2 hours (7200 seconds).
+
   > Note: To use the delete_existing_files parameter, you also need to add the configuration `enable_delete_existing_files = true` to fe.conf and restart the FE; only then does delete_existing_files take effect. Setting delete_existing_files = true is a dangerous operation, and it is recommended only in test environments.
 
 
@@ -361,5 +363,3 @@ The specific logic for splitting an Export job into multiple `SELECT INTO OUTFILE` statements is: this table
 - Currently, export only performs a simple check of whether the tablets versions are consistent; it is recommended not to load data into the table while the export is running.
 
 - An Export Job can export at most 2000 partitions. You can add the parameter `maximum_number_of_export_partitions` to fe.conf and restart the FE to change this setting.
-
-- The timeout of the `EXPORT` command is the same as the query timeout. It can be set with SET query_timeout=xxx
diff --git a/docs/zh-CN/docs/sql-manual/sql-reference/Data-Manipulation-Statements/OUTFILE.md b/docs/zh-CN/docs/sql-manual/sql-reference/Data-Manipulation-Statements/OUTFILE.md
index b4f7a8e3d2..39d74072ed 100644
--- a/docs/zh-CN/docs/sql-manual/sql-reference/Data-Manipulation-Statements/OUTFILE.md
+++ b/docs/zh-CN/docs/sql-manual/sql-reference/Data-Manipulation-Statements/OUTFILE.md
@@ -34,7 +34,7 @@ OURFILE
 
 The `SELECT INTO OUTFILE` command exports query results to a file. It currently supports exporting to remote storage such as HDFS, S3, BOS, and COS (Tencent Cloud) through the Broker process, the S3 protocol, or the HDFS protocol.
 
-Syntax:
+#### Syntax:
 
 ```
 query_stmt
@@ -43,7 +43,7 @@ INTO OUTFILE "file_path"
 [properties]
 ```
 
-Description:
+#### Description:
 
 1. file_path
 
@@ -133,6 +133,55 @@ INTO OUTFILE "file_path"
     INTO OUTFILE "file:///home/work/path/result_";
     ```
 
+#### Data Type Mapping
+
+Parquet and ORC file formats have their own data types. Doris automatically converts its own data types to the corresponding Parquet/ORC types on export. The mappings are as follows:
+
+1. Data type mapping for export from Doris to the ORC file format:
+
+    | Doris Type | ORC Type |
+    | --- | --- |
+    | boolean | boolean |
+    | tinyint | tinyint |
+    | smallint | smallint |
+    | int | int |
+    | bigint | bigint |
+    | largeInt | string |
+    | date | string |
+    | datev2 | string |
+    | datetime | string |
+    | datetimev2 | timestamp |
+    | float | float |
+    | double | double |
+    | char / varchar / string | string |
+    | decimal | decimal |
+    | struct | struct |
+    | map | map |
+    | array | array |
+
+
+2. When Doris exports to the Parquet file format, it first converts its in-memory data to the Arrow in-memory format, and Arrow then writes the Parquet file. Doris data types map to Arrow data types as follows:
+
+    | Doris Type | Arrow Type |
+    | --- | --- |
+    | boolean | boolean |
+    | tinyint | int8 |
+    | smallint | int16 |
+    | int | int32 |
+    | bigint | int64 |
+    | largeInt | utf8 |
+    | date | utf8 |
+    | datev2 | utf8 |
+    | datetime | utf8 |
+    | datetimev2 | utf8 |
+    | float | float32 |
+    | double | float64 |
+    | char / varchar / string | utf8 |
+    | decimal | decimal128 |
+    | struct | struct |
+    | map | map |
+    | array | list |
+
 ### example
 
 1. Export using the broker method: export a simple query result to the file `hdfs://path/to/result.txt`. Specify CSV as the export format. Use `my_broker` and set kerberos authentication information. Specify `,` as the column separator and `\n` as the row separator.
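
The sizing rules quoted in the EXPORT.md hunk (`parallelism` capped at the tablet count, ORC split sizes in multiples of 64MB, a 2-hour default `timeout` expressed in seconds) can be checked with a little arithmetic. The following is a minimal Python sketch of that arithmetic only, not Doris code: the function and constant names are hypothetical, 1MB is assumed to mean 1024 * 1024 bytes, and rounding up to the next 64MB multiple is an assumption that happens to reproduce the two examples given in the doc (5MB gives 64MB, 65MB gives 128MB).

```python
# Illustrative arithmetic only; names are hypothetical and not part of Doris.
MB = 1024 * 1024          # assumption: the docs' "MB" means 2**20 bytes
GB = 1024 * MB

ORC_SPLIT_UNIT = 64 * MB              # doc: ORC split sizes are multiples of 64MB
MIN_FILE_SIZE = 5 * MB                # doc: valid range is [5MB, 2GB]
MAX_FILE_SIZE = 2 * GB
DEFAULT_TIMEOUT_SECONDS = 2 * 60 * 60  # doc: default timeout is 2 hours, unit is seconds


def effective_orc_split_size(max_file_size: int) -> int:
    """Round max_file_size up to the next multiple of 64MB (assumed rule)."""
    if not MIN_FILE_SIZE <= max_file_size <= MAX_FILE_SIZE:
        raise ValueError("max_file_size must be within [5MB, 2GB]")
    units = -(-max_file_size // ORC_SPLIT_UNIT)  # ceiling division
    return units * ORC_SPLIT_UNIT


def effective_parallelism(parallelism: int, tablet_count: int) -> int:
    """Cap parallelism at the number of tablets, as the doc describes."""
    return min(parallelism, tablet_count)


if __name__ == "__main__":
    print(effective_orc_split_size(5 * MB) // MB)
    print(effective_orc_split_size(65 * MB) // MB)
    print(effective_parallelism(8, 3))
    print(DEFAULT_TIMEOUT_SECONDS)
```

With these assumptions, a `timeout` property left unset corresponds to 7200 seconds, and a job that requests a `parallelism` of 8 on a 3-tablet table would run as 3 `select..outfile..` statements.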