From b763bfa17db45ee62687c72f9d86944dc67fdcb0 Mon Sep 17 00:00:00 2001
From: DongLiang-0 <46414265+DongLiang-0@users.noreply.github.com>
Date: Thu, 31 Aug 2023 21:49:27 +0800
Subject: [PATCH] [Doc](tvf)Added tvf support for reading documents from avro files (#23436)

---
 .../sql-functions/table-functions/hdfs.md     |  2 +-
 .../sql-functions/table-functions/s3.md       | 19 +++++++++++++++++++
 .../sql-functions/table-functions/hdfs.md     |  2 +-
 .../sql-functions/table-functions/s3.md       | 18 ++++++++++++++++++
 4 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/docs/en/docs/sql-manual/sql-functions/table-functions/hdfs.md b/docs/en/docs/sql-manual/sql-functions/table-functions/hdfs.md
index 5969585ad5..7cbd21366a 100644
--- a/docs/en/docs/sql-manual/sql-functions/table-functions/hdfs.md
+++ b/docs/en/docs/sql-manual/sql-functions/table-functions/hdfs.md
@@ -69,7 +69,7 @@ Related parameters for accessing HDFS in HA mode:

 File format parameters:

-- `format`: (required) Currently support `csv/csv_with_names/csv_with_names_and_types/json/parquet/orc`
+- `format`: (required) Currently support `csv/csv_with_names/csv_with_names_and_types/json/parquet/orc/avro`
 - `column_separator`: (optional) default `,`.
 - `line_delimiter`: (optional) default `\n`.
 - `compress_type`: (optional) Currently support `UNKNOWN/PLAIN/GZ/LZO/BZ2/LZ4FRAME/DEFLATE`. Default value is `UNKNOWN`, it will automatically infer the type based on the suffix of `uri`.
diff --git a/docs/en/docs/sql-manual/sql-functions/table-functions/s3.md b/docs/en/docs/sql-manual/sql-functions/table-functions/s3.md
index b42788583f..d089c98155 100644
--- a/docs/en/docs/sql-manual/sql-functions/table-functions/s3.md
+++ b/docs/en/docs/sql-manual/sql-functions/table-functions/s3.md
@@ -424,6 +424,25 @@ MySQL [(none)]> select * from s3(
 +-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
 ```

+**avro format**
+
+`avro` format: S3 tvf supports parsing the column names and column types of the table schema from the avro file. Example:
+
+```sql
+select * from s3(
+    "uri" = "http://127.0.0.1:9312/test2/person.avro",
+    "ACCESS_KEY" = "ak",
+    "SECRET_KEY" = "sk",
+    "FORMAT" = "avro");
++--------+--------------+-------------+-----------------+
+| name   | boolean_type | double_type | long_type       |
++--------+--------------+-------------+-----------------+
+| Alyssa | 1            | 10.0012     | 100000000221133 |
+| Ben    | 0            | 5555.999    | 4009990000      |
+| lisi   | 0            | 5992225.999 | 9099933330      |
++--------+--------------+-------------+-----------------+
+```
+
 **uri contains wildcards**

 uri can use wildcards to read multiple files. Note: If wildcards are used, the format of each file must be consistent (especially csv/csv_with_names/csv_with_names_and_types count as different formats), S3 tvf uses the first file to parse out the table schema.
For example:
diff --git a/docs/zh-CN/docs/sql-manual/sql-functions/table-functions/hdfs.md b/docs/zh-CN/docs/sql-manual/sql-functions/table-functions/hdfs.md
index 47e723a623..c7faaa7a86 100644
--- a/docs/zh-CN/docs/sql-manual/sql-functions/table-functions/hdfs.md
+++ b/docs/zh-CN/docs/sql-manual/sql-functions/table-functions/hdfs.md
@@ -70,7 +70,7 @@ hdfs(
 - `dfs.client.failover.proxy.provider.your-nameservices`:(选填)

 文件格式相关参数
-- `format`:(必填) 目前支持 `csv/csv_with_names/csv_with_names_and_types/json/parquet/orc`
+- `format`:(必填) 目前支持 `csv/csv_with_names/csv_with_names_and_types/json/parquet/orc/avro`
 - `column_separator`:(选填) 列分割符, 默认为`,`。
 - `line_delimiter`:(选填) 行分割符,默认为`\n`。
 - `compress_type`: (选填) 目前支持 `UNKNOWN/PLAIN/GZ/LZO/BZ2/LZ4FRAME/DEFLATE`。 默认值为 `UNKNOWN`, 将会根据 `uri` 的后缀自动推断类型。
diff --git a/docs/zh-CN/docs/sql-manual/sql-functions/table-functions/s3.md b/docs/zh-CN/docs/sql-manual/sql-functions/table-functions/s3.md
index 1a64205643..081734985c 100644
--- a/docs/zh-CN/docs/sql-manual/sql-functions/table-functions/s3.md
+++ b/docs/zh-CN/docs/sql-manual/sql-functions/table-functions/s3.md
@@ -428,6 +428,24 @@ MySQL [(none)]> select * from s3(
 | 5 | forest brown coral puff cream | Manufacturer#3 | Brand#32 | STANDARD POLISHED TIN | 15 | SM PKG | 905 | wake carefully |
 +-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
 ```
+**avro format**
+
+`avro` 格式:S3 tvf支持从avro文件中解析出table schema的列名、列类型。举例:
+
+```sql
+select * from s3(
+    "uri" = "http://127.0.0.1:9312/test2/person.avro",
+    "ACCESS_KEY" = "ak",
+    "SECRET_KEY" = "sk",
+    "FORMAT" = "avro");
++--------+--------------+-------------+-----------------+
+| name   | boolean_type | double_type | long_type       |
++--------+--------------+-------------+-----------------+
+| Alyssa | 1            | 10.0012     | 100000000221133 |
+| Ben    | 0            | 5555.999    | 4009990000      |
+| lisi   | 0            | 5992225.999 | 9099933330      |
++--------+--------------+-------------+-----------------+
+```
 **uri包含通配符**