[Doc](tvf) Add TVF support for reading data from Avro files (#23436)
@@ -69,7 +69,7 @@ Related parameters for accessing HDFS in HA mode:
File format parameters:
-- `format`: (required) Currently support `csv/csv_with_names/csv_with_names_and_types/json/parquet/orc`
+- `format`: (required) Currently support `csv/csv_with_names/csv_with_names_and_types/json/parquet/orc/avro`
- `column_separator`: (optional) default `,`.
- `line_delimiter`: (optional) default `\n`.
- `compress_type`: (optional) Currently supports `UNKNOWN/PLAIN/GZ/LZO/BZ2/LZ4FRAME/DEFLATE`. Default value is `UNKNOWN`; the type will be inferred automatically from the suffix of `uri`.
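For example, to read a gzip-compressed CSV (a sketch assuming a hypothetical `people.csv.gz` object and placeholder endpoint and credentials; with `compress_type` omitted, `GZ` would be inferred from the `.gz` suffix):

```sql
-- "compress_type" set explicitly here; omitting it would trigger
-- suffix-based inference (hypothetical uri and keys).
select * from s3(
    "uri" = "http://127.0.0.1:9312/test2/people.csv.gz",
    "ACCESS_KEY" = "ak",
    "SECRET_KEY" = "sk",
    "format" = "csv",
    "compress_type" = "GZ");
```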
@@ -424,6 +424,25 @@ MySQL [(none)]> select * from s3(
+-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
```
**avro format**
`avro` format: the S3 TVF supports parsing the table schema (column names and column types) from the Avro file. Example:
```sql
select * from s3(
"uri" = "http://127.0.0.1:9312/test2/person.avro",
"ACCESS_KEY" = "ak",
"SECRET_KEY" = "sk",
"FORMAT" = "avro");
+--------+--------------+-------------+-----------------+
| name | boolean_type | double_type | long_type |
+--------+--------------+-------------+-----------------+
| Alyssa | 1 | 10.0012 | 100000000221133 |
| Ben | 0 | 5555.999 | 4009990000 |
| lisi | 0 | 5992225.999 | 9099933330 |
+--------+--------------+-------------+-----------------+
```
**uri contains wildcards**
The uri can contain wildcards to read multiple files. Note: if wildcards are used, the format of every file must be consistent (in particular, csv/csv_with_names/csv_with_names_and_types count as different formats), and the S3 TVF uses the first matched file to parse the table schema. For example:
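A minimal sketch of wildcard use, assuming several hypothetical `person_*.avro` files with identical schemas in the same bucket (placeholder uri and keys):

```sql
-- Reads every file matching the pattern; the table schema is
-- parsed from the first matched file (hypothetical uri and keys).
select * from s3(
    "uri" = "http://127.0.0.1:9312/test2/person_*.avro",
    "ACCESS_KEY" = "ak",
    "SECRET_KEY" = "sk",
    "FORMAT" = "avro");
```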
@@ -70,7 +70,7 @@ hdfs(
- `dfs.client.failover.proxy.provider.your-nameservices`: (optional)
File format parameters:
-- `format`: (required) Currently support `csv/csv_with_names/csv_with_names_and_types/json/parquet/orc`
+- `format`: (required) Currently support `csv/csv_with_names/csv_with_names_and_types/json/parquet/orc/avro`
- `column_separator`: (optional) Column separator. Default `,`.
- `line_delimiter`: (optional) Line delimiter. Default `\n`.
- `compress_type`: (optional) Currently supports `UNKNOWN/PLAIN/GZ/LZO/BZ2/LZ4FRAME/DEFLATE`. Default value is `UNKNOWN`; the type will be inferred automatically from the suffix of `uri`.
@@ -428,6 +428,24 @@ MySQL [(none)]> select * from s3(
| 5 | forest brown coral puff cream | Manufacturer#3 | Brand#32 | STANDARD POLISHED TIN | 15 | SM PKG | 905 | wake carefully |
+-----------+------------------------------------------+----------------+----------+-------------------------+--------+-------------+---------------+---------------------+
```
**avro format**
`avro` format: the S3 TVF supports parsing the table schema (column names and column types) from the Avro file. Example:
```sql
select * from s3(
"uri" = "http://127.0.0.1:9312/test2/person.avro",
"ACCESS_KEY" = "ak",
"SECRET_KEY" = "sk",
"FORMAT" = "avro");
+--------+--------------+-------------+-----------------+
| name | boolean_type | double_type | long_type |
+--------+--------------+-------------+-----------------+
| Alyssa | 1 | 10.0012 | 100000000221133 |
| Ben | 0 | 5555.999 | 4009990000 |
| lisi | 0 | 5992225.999 | 9099933330 |
+--------+--------------+-------------+-----------------+
```
**uri contains wildcards**