[doc](fix)Unified JSON format (#19147)

This commit is contained in:
Liqf
2023-04-27 14:59:18 +08:00
committed by GitHub
parent f23c93b3c6
commit a9480bdcf3
2 changed files with 98 additions and 98 deletions

View File

@ -1,6 +1,6 @@
---
{
"title": "Load Json Format Data",
"title": "Load JSON Format Data",
"language": "en"
}
---
@ -30,21 +30,21 @@ Doris supports importing data in JSON format. This document mainly describes the
## Supported import methods
Currently, only the following import methods support data import in Json format:
Currently, only the following import methods support data import in JSON format:
- Through [S3 table function](../../../sql-manual/sql-functions/table-functions/s3.md) import statement: insert into select * from S3();
- Through [S3 table function](../../../sql-manual/sql-functions/table-functions/s3.md) import statement: insert into table select * from S3();
- Import the local JSON format file through [STREAM LOAD](../../../sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md).
- Subscribe to and consume JSON format messages in Kafka via [ROUTINE LOAD](../../../sql-manual/sql-reference/Data-Manipulation-Statements/Load/CREATE-ROUTINE-LOAD.md).
Other ways of importing data in JSON format are not currently supported.
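For the first method (the S3 table function), a minimal sketch is shown below. The database, table, and bucket URI are placeholders, access credentials are omitted, and the exact S3 property names may differ across Doris versions:
```bash
# Hypothetical sketch only: run an INSERT ... SELECT over the S3 table function
# through the MySQL client (9030 is the default FE query port).
# db1.tbl1 and the bucket URI are placeholders; access credentials and other
# S3 properties are omitted and depend on your Doris version.
mysql -h 127.0.0.1 -P 9030 -uroot -e '
INSERT INTO db1.tbl1
SELECT * FROM S3(
    "uri"    = "https://your-bucket.s3.example.com/path/example.json",
    "format" = "json"
);'
```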
## Supported Json formats
## Supported JSON formats
Currently only the following two Json formats are supported:
Currently only the following two JSON formats are supported:
1. Multiple rows of data represented by Array
Json format with Array as root node. Each element in the Array represents a row of data to be imported, usually an Object. An example is as follows:
JSON format with Array as root node. Each element in the Array represents a row of data to be imported, usually an Object. An example is as follows:
````json
[
@ -67,7 +67,7 @@ Currently only the following two Json formats are supported:
This method must be used with the setting `strip_outer_array=true`. Doris will expand the array when parsing, and then parse each Object in turn as a row of data.
2. A single row of data represented by Object
Json format with Object as root node. The entire Object represents a row of data to be imported. An example is as follows:
JSON format with Object as root node. The entire Object represents a row of data to be imported. An example is as follows:
````json
{ "id": 123, "city" : "beijing"}
@ -107,19 +107,19 @@ This parameter is usually used to import the format of **multi-line data represe
This feature requires that each row of data in the Array has exactly the same order of fields. Doris will only parse according to the field order of the first row, and then access the subsequent data in the form of subscripts. This method can improve the import efficiency by 3-5X.
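A minimal Stream Load sketch of this optimization is shown below; it assumes the parameter discussed here is the `fuzzy_parse` header, and the database, table, and endpoint are placeholders:
```bash
# Hypothetical sketch: enable the same-field-order fast path when importing
# an Array of Objects. db1/tbl1 and the endpoint are placeholders;
# fuzzy_parse is assumed to be the header that enables this feature.
curl --location-trusted -u user:passwd \
    -H "format: json" \
    -H "strip_outer_array: true" \
    -H "fuzzy_parse: true" \
    -T data.json \
    http://localhost:8030/api/db1/tbl1/_stream_load
```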
## Json Path
## JSON Path
Doris supports extracting data specified in Json through Json Path.
Doris supports extracting data specified in JSON through JSON Path.
**Note: For Array type data, Doris will expand the array first and then process each element as a single row in Object format. So the later examples in this document are all explained using JSON data in a single Object format.**
- do not specify Json Path
- do not specify JSON Path
If Json Path is not specified, Doris will use the column name in the table to find the element in Object by default. An example is as follows:
If JSON Path is not specified, Doris will use the column name in the table to find the element in Object by default. An example is as follows:
The table contains two columns: `id`, `city`
The Json data is as follows:
The JSON data is as follows:
````json
{ "id": 123, "city" : "beijing"}
@ -127,7 +127,7 @@ Doris supports extracting data specified in Json through Json Path.
Then Doris will use `id`, `city` for matching, and get the final data `123` and `beijing`.
If the Json data is as follows:
If the JSON data is as follows:
````json
{ "id": 123, "name" : "beijing"}
@ -135,9 +135,9 @@ Doris supports extracting data specified in Json through Json Path.
Then use `id`, `city` for matching, and get the final data `123` and `null`.
- Specify Json Path
- Specify JSON Path
Specify a set of Json Path in the form of a Json data. Each element in the array represents a column to extract. An example is as follows:
Specify a set of JSON Paths in the form of a JSON array. Each element in the array represents a column to extract. An example is as follows:
````json
["$.id", "$.name"]
@ -147,19 +147,19 @@ Doris supports extracting data specified in Json through Json Path.
["$.id.sub_id", "$.name[0]", "$.city[0]"]
````
Doris will use the specified Json Path for data matching and extraction.
Doris will use the specified JSON Path for data matching and extraction.
- matches non-primitive types
The values that are finally matched in the preceding examples are all primitive types, such as integers, strings, and so on. Doris currently does not support composite types, such as Array, Map, etc. So when a non-primitive type is matched, Doris will convert it to a string in JSON format and import it as a string type. An example is as follows:
The Json data is:
The JSON data is:
````json
{ "id": 123, "city" : { "name" : "beijing", "region" : "haidian" }}
````
Json Path is `["$.city"]`. The matched elements are:
JSON Path is `["$.city"]`. The matched elements are:
````json
{ "name" : "beijing", "region" : "haidian" }
@ -175,21 +175,21 @@ Doris supports extracting data specified in Json through Json Path.
When the match fails, `null` will be returned. An example is as follows:
The Json data is:
The JSON data is:
````json
{ "id": 123, "name" : "beijing"}
````
Json Path is `["$.id", "$.info"]`. The matched elements are `123` and `null`.
JSON Path is `["$.id", "$.info"]`. The matched elements are `123` and `null`.
Doris currently does not distinguish between null values represented in Json data and null values produced when a match fails. Suppose the Json data is:
Doris currently does not distinguish between null values represented in JSON data and null values produced when a match fails. Suppose the JSON data is:
````json
{ "id": 123, "name" : null }
````
The same result would be obtained with the following two Json Paths: `123` and `null`.
The same result would be obtained with the following two JSON Paths: `123` and `null`.
````json
["$.id", "$.name"]
@ -207,7 +207,7 @@ Doris supports extracting data specified in Json through Json Path.
{ "id": 123, "city" : "beijing" }
````
If the Json Path is written incorrectly as (or if the Json Path is not specified, the columns in the table do not contain `id` and `city`):
If the JSON Path is incorrectly written as follows (or, when JSON Path is not specified, the table columns do not contain `id` and `city`):
````json
["$.ad", "$.infa"]
@ -215,11 +215,11 @@ Doris supports extracting data specified in Json through Json Path.
would cause the exact match to fail, and the line would be marked as an error line instead of yielding `null, null`.
## Json Path and Columns
## JSON Path and Columns
Json Path is used to specify how to extract data in JSON format, while Columns specifies the mapping and conversion relationship of columns. Both can be used together.
JSON Path is used to specify how to extract data in JSON format, while Columns specifies the mapping and conversion relationship of columns. Both can be used together.
In other words, it is equivalent to rearranging the columns of a Json format data according to the column order specified in Json Path through Json Path. After that, you can map the rearranged source data to the columns of the table through Columns. An example is as follows:
In other words, JSON Path rearranges the columns of the JSON data according to the column order it specifies; Columns then maps the rearranged source data to the columns of the table. An example is as follows:
Data content:
@ -239,7 +239,7 @@ Import statement 1 (take Stream Load as an example):
curl -v --location-trusted -u root: -H "format: json" -H "jsonpaths: [\"$.k2\", \"$.k1\"]" -T example.json http://127.0.0.1:8030/api/db1/tbl1/_stream_load
````
In import statement 1, only Json Path is specified, and Columns is not specified. The role of Json Path is to extract the Json data in the order of the fields in the Json Path, and then write it in the order of the table structure. The final imported data results are as follows:
In import statement 1, only JSON Path is specified, and Columns is not specified. The role of JSON Path is to extract the JSON data in the order of the fields in the JSON Path, and then write it in the order of the table structure. The final imported data results are as follows:
````text
+------+------+
@ -249,7 +249,7 @@ In import statement 1, only Json Path is specified, and Columns is not specified
+------+------+
````
You can see that the actual k1 column imports the value of the "k2" column in the Json data. This is because the field name in Json is not equivalent to the field name in the table structure. We need to explicitly specify the mapping between the two.
You can see that the actual k1 column imports the value of the "k2" column in the JSON data. This is because the field name in JSON is not equivalent to the field name in the table structure. We need to explicitly specify the mapping between the two.
Import statement 2:
@ -257,7 +257,7 @@ Import statement 2:
curl -v --location-trusted -u root: -H "format: json" -H "jsonpaths: [\"$.k2\", \"$.k1\"]" -H "columns: k2, k1" -T example.json http://127.0.0.1:8030/api/db1/tbl1/_stream_load
````
Compared with the import statement 1, the Columns field is added here to describe the mapping relationship of the columns, in the order of `k2, k1`. That is, after extracting in the order of fields in Json Path, specify the value of column k2 in the table for the first column, and the value of column k1 in the table for the second column. The final imported data results are as follows:
Compared with import statement 1, the Columns field is added here to describe the column mapping, in the order `k2, k1`. That is, after extraction in the field order of JSON Path, the first column is written to column k2 of the table, and the second column to column k1. The final imported data results are as follows:
````text
+------+------+
@ -283,19 +283,19 @@ The above example will import the value of k1 multiplied by 100. The final impor
+------+------+
````
## Json root
## JSON root
Doris supports extracting data specified in Json through Json root.
Doris supports extracting data specified in JSON through JSON root.
**Note: For Array type data, Doris will expand the array first and then process each element as a single row in Object format. So the later examples in this document are all explained using JSON data in a single Object format.**
- do not specify Json root
- do not specify JSON root
If Json root is not specified, Doris will use the column name in the table to find the element in Object by default. An example is as follows:
If JSON root is not specified, Doris will use the column name in the table to find the element in Object by default. An example is as follows:
The table contains two columns: `id`, `city`
The Json data is as follows:
The JSON data is as follows:
```json
{ "id": 123, "name" : { "id" : "321", "city" : "shanghai" }}
@ -303,17 +303,17 @@ Doris supports extracting data specified in Json through Json root.
Then use `id`, `city` for matching, and get the final data `123` and `null`
- Specify Json root
- Specify JSON root
When the import data format is json, you can specify the root node of the Json data through json_root. Doris will extract the elements of the root node through json_root for parsing. Default is empty.
When the import data format is JSON, you can specify the root node of the JSON data through json_root. Doris will extract the elements of the root node through json_root for parsing. It is empty by default.
Specify Json root `-H "json_root: $.name"`. The matched elements are:
Specify JSON root `-H "json_root: $.name"`. The matched elements are:
```json
{ "id" : "321", "city" : "shanghai" }
```
The element will be treated as new json for subsequent import operations,and get the final data 321 and shanghai
The element will be treated as a new JSON document for subsequent import operations, yielding the final data `321` and `shanghai`.
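Putting this together, a minimal Stream Load sketch that applies the `json_root` header to the example above (database, table, and endpoint are placeholders):
```bash
# Hypothetical sketch: extract the Object under $.name and load its
# id/city fields into the table. db1/tbl1 and the endpoint are placeholders.
curl --location-trusted -u user:passwd \
    -H "format: json" \
    -H "json_root: $.name" \
    -T data.json \
    http://localhost:8030/api/db1/tbl1/_stream_load
```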
## NULL and Default values
@ -374,7 +374,7 @@ curl -v --location-trusted -u root: -H "format: json" -H "strip_outer_array: tru
### Stream Load
Because of the inseparability of the Json format, when using Stream Load to import a Json format file, the file content will be fully loaded into the memory before processing begins. Therefore, if the file is too large, it may take up more memory.
Because the JSON format cannot be split, when using Stream Load to import a JSON format file, the entire file content is loaded into memory before processing begins. Therefore, if the file is too large, it may consume a large amount of memory.
Suppose the table structure is:
@ -390,7 +390,7 @@ code INT NULL
{"id": 100, "city": "beijing", "code" : 1}
````
- do not specify Json Path
- do not specify JSON Path
```bash
curl --location-trusted -u user:passwd -H "format: json" -T data.json http://localhost:8030/api/db1/tbl1/_stream_load
@ -402,7 +402,7 @@ code INT NULL
100 beijing 1
````
- Specify Json Path
- Specify JSON Path
```bash
curl --location-trusted -u user:passwd -H "format: json" -H "jsonpaths: [\"$.id\",\"$.city\",\"$.code\"]" -T data.json http://localhost:8030/api/db1/tbl1/_stream_load
@ -420,7 +420,7 @@ code INT NULL
{"id": 100, "content": {"city": "beijing", "code": 1}}
````
- Specify Json Path
- Specify JSON Path
```bash
curl --location-trusted -u user:passwd -H "format: json" -H "jsonpaths: [\"$.id\",\"$.content.city\",\"$.content.code\"]" -T data.json http://localhost:8030/api/db1/tbl1/_stream_load
@ -451,7 +451,7 @@ code INT NULL
]
````
- Specify Json Path
- Specify JSON Path
```bash
curl --location-trusted -u user:passwd -H "format: json" -H "jsonpaths: [\"$.id\",\"$.city\",\"$.code\"]" -H "strip_outer_array: true" -T data.json http://localhost:8030/api/db1/tbl1/_stream_load
@ -508,9 +508,9 @@ Import result:
105 {"order1":["guangzhou"]} 7
````
6. Import Array by json
6. Import Array by JSON
Since RapidJSON's handling of decimal and largeint numbers can cause precision problems,
we suggest you to use json string to import data to `array<decimal>` or `array<largeint>` column.
we suggest using a JSON string to import data into `array<decimal>` or `array<largeint>` columns.
```json
{"k1": 39, "k2": ["-818.2173181"]}
@ -556,6 +556,6 @@ MySQL > select * from array_test_largeint;
### Routine Load
The processing principle of Routine Load for Json data is the same as that of Stream Load. It is not repeated here.
The processing principle of Routine Load for JSON data is the same as that of Stream Load. It is not repeated here.
For Kafka data sources, the content in each Massage is treated as a complete Json data. If there are multiple rows of data represented in Array format in a Massage, multiple rows will be imported, and the offset of Kafka will only increase by 1. If an Array format Json represents multiple lines of data, but the Json parsing fails due to the wrong Json format, the error line will only increase by 1 (because the parsing fails, in fact, Doris cannot determine how many lines of data are contained in it, and can only error by one line data record)
For Kafka data sources, the content of each Message is treated as a complete JSON document. If a Message contains multiple rows of data in Array format, multiple rows will be imported, while the Kafka offset only increases by 1. If an Array format JSON represents multiple rows of data but parsing fails due to an invalid JSON format, the error row count will only increase by 1 (since parsing failed, Doris cannot determine how many rows of data it contains and can only record it as one error row).
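For reference, a rough CREATE ROUTINE LOAD sketch for consuming JSON messages from Kafka; the job, database, table, broker, and topic names are placeholders, and the property names follow the CREATE ROUTINE LOAD documentation linked above:
```bash
# Hypothetical sketch: create a Routine Load job that treats each Kafka
# message as a JSON object. All names, brokers, and topics are placeholders.
mysql -h 127.0.0.1 -P 9030 -uroot -e '
CREATE ROUTINE LOAD db1.example_json_job ON tbl1
PROPERTIES
(
    "format" = "json",
    "jsonpaths" = "[\"$.id\",\"$.city\",\"$.code\"]",
    "strip_outer_array" = "false"
)
FROM KAFKA
(
    "kafka_broker_list" = "broker1:9092",
    "kafka_topic" = "example_topic"
);'
```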

View File

@ -26,25 +26,25 @@ under the License.
# JSON格式数据导入
Doris 支持导入 JSON 格式的数据。本文档主要说明在进行JSON格式数据导入时的注意事项。
Doris 支持导入 JSON 格式的数据。本文档主要说明在进行 JSON 格式数据导入时的注意事项。
## 支持的导入方式
目前只有以下导入方式支持 Json 格式的数据导入:
目前只有以下导入方式支持 JSON 格式的数据导入:
- 通过 [S3表函数](../../../sql-manual/sql-functions/table-functions/s3.md) 导入语句:insert into select * from S3();
- 通过 [S3 表函数](../../../sql-manual/sql-functions/table-functions/s3.md) 导入语句:insert into table select * from S3();
- 将本地 JSON 格式的文件通过 [STREAM LOAD](../../../sql-manual/sql-reference/Data-Manipulation-Statements/Load/STREAM-LOAD.md) 方式导入。
- 通过 [ROUTINE LOAD](../../../sql-manual/sql-reference/Data-Manipulation-Statements/Load/CREATE-ROUTINE-LOAD.md) 订阅并消费 Kafka 中的 JSON 格式消息。
暂不支持其他方式的 JSON 格式数据导入。
## 支持的 Json 格式
## 支持的 JSON 格式
当前仅支持以下两种 Json 格式:
当前仅支持以下两种 JSON 格式:
1. 以 Array 表示的多行数据
以 Array 为根节点的 Json 格式。Array 中的每个元素表示要导入的一行数据,通常是一个 Object。示例如下:
以 Array 为根节点的 JSON 格式。Array 中的每个元素表示要导入的一行数据,通常是一个 Object。示例如下:
```json
[
@ -68,7 +68,7 @@ Doris 支持导入 JSON 格式的数据。本文档主要说明在进行JSON格
2. 以 Object 表示的单行数据
以 Object 为根节点的 Json 格式。整个 Object 即表示要导入的一行数据。示例如下:
以 Object 为根节点的 JSON 格式。整个 Object 即表示要导入的一行数据。示例如下:
```json
{ "id": 123, "city" : "beijing"}
@ -108,19 +108,19 @@ Doris 支持导入 JSON 格式的数据。本文档主要说明在进行JSON格
这个功能要求 Array 中的每行数据的**字段顺序完全一致**。Doris 仅会根据第一行的字段顺序做解析,然后以下标的形式访问之后的数据。该方式可以提升 3-5X 的导入效率。
## Json Path
## JSON Path
Doris 支持通过 Json Path 抽取 Json 中指定的数据。
Doris 支持通过 JSON Path 抽取 JSON 中指定的数据。
**注:因为对于 Array 类型的数据,Doris 会先进行数组展开,最终按照 Object 格式进行单行处理。所以本文档之后的示例都以单个 Object 格式的 Json 数据进行说明。**
- 不指定 Json Path
- 不指定 JSON Path
如果没有指定 Json Path,则 Doris 会默认使用表中的列名查找 Object 中的元素。示例如下:
如果没有指定 JSON Path,则 Doris 会默认使用表中的列名查找 Object 中的元素。示例如下:
表中包含两列: `id`, `city`
Json 数据如下:
JSON 数据如下:
```json
{ "id": 123, "city" : "beijing"}
@ -128,7 +128,7 @@ Doris 支持通过 Json Path 抽取 Json 中指定的数据。
则 Doris 会使用 `id`, `city` 进行匹配,得到最终数据 `123` 和 `beijing`。
如果 Json 数据如下:
如果 JSON 数据如下:
```json
{ "id": 123, "name" : "beijing"}
@ -136,9 +136,9 @@ Doris 支持通过 Json Path 抽取 Json 中指定的数据。
则使用 `id`, `city` 进行匹配,得到最终数据 `123` 和 `null`。
- 指定 Json Path
- 指定 JSON Path
通过一个 Json 数据的形式指定一组 Json Path。数组中的每个元素表示一个要抽取的列。示例如下:
通过一个 JSON 数据的形式指定一组 JSON Path。数组中的每个元素表示一个要抽取的列。示例如下:
```json
["$.id", "$.name"]
@ -148,19 +148,19 @@ Doris 支持通过 Json Path 抽取 Json 中指定的数据。
["$.id.sub_id", "$.name[0]", "$.city[0]"]
```
Doris 会使用指定的 Json Path 进行数据匹配和抽取。
Doris 会使用指定的 JSON Path 进行数据匹配和抽取。
- 匹配非基本类型
前面的示例最终匹配到的数值都是基本类型,如整型、字符串等。Doris 当前暂不支持复合类型,如 Array、Map 等。所以当匹配到一个非基本类型时,Doris 会将该类型转换为 Json 格式的字符串,并以字符串类型进行导入。示例如下:
前面的示例最终匹配到的数值都是基本类型,如整型、字符串等。Doris 当前暂不支持复合类型,如 Array、Map 等。所以当匹配到一个非基本类型时,Doris 会将该类型转换为 JSON 格式的字符串,并以字符串类型进行导入。示例如下:
Json 数据为:
JSON 数据为:
```json
{ "id": 123, "city" : { "name" : "beijing", "region" : "haidian" }}
```
Json Path 为 `["$.city"]`。则匹配到的元素为:
JSON Path 为 `["$.city"]`。则匹配到的元素为:
```json
{ "name" : "beijing", "region" : "haidian" }
@ -176,21 +176,21 @@ Doris 支持通过 Json Path 抽取 Json 中指定的数据。
当匹配失败时,将会返回 `null`。示例如下:
Json 数据为:
JSON 数据为:
```json
{ "id": 123, "name" : "beijing"}
```
Json Path 为 `["$.id", "$.info"]`。则匹配到的元素为 `123` 和 `null`。
JSON Path 为 `["$.id", "$.info"]`。则匹配到的元素为 `123` 和 `null`。
Doris 当前不区分 Json 数据中表示的 null 值,和匹配失败时产生的 null 值。假设 Json 数据为:
Doris 当前不区分 JSON 数据中表示的 null 值,和匹配失败时产生的 null 值。假设 JSON 数据为:
```json
{ "id": 123, "name" : null }
```
则使用以下两种 Json Path 会获得相同的结果:`123` 和 `null`。
则使用以下两种 JSON Path 会获得相同的结果:`123` 和 `null`。
```json
["$.id", "$.name"]
@ -202,13 +202,13 @@ Doris 支持通过 Json Path 抽取 Json 中指定的数据。
- 完全匹配失败
为防止一些参数设置错误导致的误操作。Doris 在尝试匹配一行数据时,如果所有列都匹配失败,则会认为这个是一个错误行。假设 Json 数据为:
为防止一些参数设置错误导致的误操作。Doris 在尝试匹配一行数据时,如果所有列都匹配失败,则会认为这个是一个错误行。假设 JSON 数据为:
```json
{ "id": 123, "city" : "beijing" }
```
如果 Json Path 错误的写为(或者不指定 Json Path 时,表中的列不包含 `id` 和 `city`):
如果 JSON Path 错误的写为(或者不指定 JSON Path 时,表中的列不包含 `id` 和 `city`):
```json
["$.ad", "$.infa"]
@ -216,11 +216,11 @@ Doris 支持通过 Json Path 抽取 Json 中指定的数据。
则会导致完全匹配失败,则该行会标记为错误行,而不是产出 `null, null`。
## Json Path 和 Columns
## JSON Path 和 Columns
Json Path 用于指定如何对 JSON 格式中的数据进行抽取,而 Columns 指定列的映射和转换关系。两者可以配合使用。
JSON Path 用于指定如何对 JSON 格式中的数据进行抽取,而 Columns 指定列的映射和转换关系。两者可以配合使用。
换句话说,相当于通过 Json Path,将一个 Json 格式的数据,按照 Json Path 中指定的列顺序进行了列的重排。之后,可以通过 Columns,将这个重排后的源数据和表的列进行映射。举例如下:
换句话说,相当于通过 JSON Path,将一个 JSON 格式的数据,按照 JSON Path 中指定的列顺序进行了列的重排。之后,可以通过 Columns,将这个重排后的源数据和表的列进行映射。举例如下:
数据内容:
@ -240,7 +240,7 @@ k2 int, k1 int
curl -v --location-trusted -u root: -H "format: json" -H "jsonpaths: [\"$.k2\", \"$.k1\"]" -T example.json http://127.0.0.1:8030/api/db1/tbl1/_stream_load
```
导入语句1中,仅指定了 Json Path,没有指定 Columns。其中 Json Path 的作用是将 Json 数据按照 Json Path 中字段的顺序进行抽取,之后会按照表结构的顺序进行写入。最终导入的数据结果如下:
导入语句1中,仅指定了 JSON Path,没有指定 Columns。其中 JSON Path 的作用是将 JSON 数据按照 JSON Path 中字段的顺序进行抽取,之后会按照表结构的顺序进行写入。最终导入的数据结果如下:
```text
+------+------+
@ -250,7 +250,7 @@ curl -v --location-trusted -u root: -H "format: json" -H "jsonpaths: [\"$.k2\",
+------+------+
```
会看到,实际的 k1 列导入了 Json 数据中的 "k2" 列的值。这是因为,Json 中字段名称并不等同于表结构中字段的名称。我们需要显式的指定这两者之间的映射关系。
会看到,实际的 k1 列导入了 JSON 数据中的 "k2" 列的值。这是因为,JSON 中字段名称并不等同于表结构中字段的名称。我们需要显式的指定这两者之间的映射关系。
导入语句2:
@ -258,7 +258,7 @@ curl -v --location-trusted -u root: -H "format: json" -H "jsonpaths: [\"$.k2\",
curl -v --location-trusted -u root: -H "format: json" -H "jsonpaths: [\"$.k2\", \"$.k1\"]" -H "columns: k2, k1" -T example.json http://127.0.0.1:8030/api/db1/tbl1/_stream_load
```
相比如导入语句1,这里增加了 Columns 字段,用于描述列的映射关系,按 `k2, k1` 的顺序。即按Json Path 中字段的顺序抽取后,指定第一列为表中 k2 列的值,而第二列为表中 k1 列的值。最终导入的数据结果如下:
相比于导入语句1,这里增加了 Columns 字段,用于描述列的映射关系,按 `k2, k1` 的顺序。即按 JSON Path 中字段的顺序抽取后,指定第一列为表中 k2 列的值,而第二列为表中 k1 列的值。最终导入的数据结果如下:
```text
+------+------+
@ -284,19 +284,19 @@ curl -v --location-trusted -u root: -H "format: json" -H "jsonpaths: [\"$.k2\",
+------+------+
```
## Json root
## JSON root
Doris 支持通过 Json root 抽取 Json 中指定的数据。
Doris 支持通过 JSON root 抽取 JSON 中指定的数据。
**注:因为对于 Array 类型的数据,Doris 会先进行数组展开,最终按照 Object 格式进行单行处理。所以本文档之后的示例都以单个 Object 格式的 Json 数据进行说明。**
- 不指定 Json root
- 不指定 JSON root
如果没有指定 Json root,则 Doris 会默认使用表中的列名查找 Object 中的元素。示例如下:
如果没有指定 JSON root,则 Doris 会默认使用表中的列名查找 Object 中的元素。示例如下:
表中包含两列: `id`, `city`
Json 数据为:
JSON 数据为:
```json
{ "id": 123, "name" : { "id" : "321", "city" : "shanghai" }}
@ -304,17 +304,17 @@ Doris 支持通过 Json root 抽取 Json 中指定的数据。
则 Doris 会使用id, city 进行匹配,得到最终数据 123 和 null。
- 指定 Json root
- 指定 JSON root
通过 json_root 指定 Json 数据的根节点。Doris 将通过 json_root 抽取根节点的元素进行解析。默认为空。
通过 json_root 指定 JSON 数据的根节点。Doris 将通过 json_root 抽取根节点的元素进行解析。默认为空。
指定 Json root `-H "json_root: $.name"`。则匹配到的元素为:
指定 JSON root `-H "json_root: $.name"`。则匹配到的元素为:
```json
{ "id" : "321", "city" : "shanghai" }
```
该元素会被当作新Json进行后续导入操作,得到最终数据 321 和 shanghai
该元素会被当作新 JSON 进行后续导入操作,得到最终数据 321 和 shanghai
## NULL 和 Default 值
@ -374,7 +374,7 @@ curl -v --location-trusted -u root: -H "format: json" -H "strip_outer_array: tru
### Stream Load
因为 Json 格式的不可拆分特性,所以在使用 Stream Load 导入 Json 格式的文件时,文件内容会被全部加载到内存后,才开始处理。因此,如果文件过大的话,可能会占用较多的内存。
因为 JSON 格式的不可拆分特性,所以在使用 Stream Load 导入 JSON 格式的文件时,文件内容会被全部加载到内存后,才开始处理。因此,如果文件过大的话,可能会占用较多的内存。
假设表结构为:
@ -390,7 +390,7 @@ code INT NULL
{"id": 100, "city": "beijing", "code" : 1}
```
- 不指定 Json Path
- 不指定 JSON Path
```bash
curl --location-trusted -u user:passwd -H "format: json" -T data.json http://localhost:8030/api/db1/tbl1/_stream_load
@ -402,7 +402,7 @@ code INT NULL
100 beijing 1
```
- 指定 Json Path
- 指定 JSON Path
```bash
curl --location-trusted -u user:passwd -H "format: json" -H "jsonpaths: [\"$.id\",\"$.city\",\"$.code\"]" -T data.json http://localhost:8030/api/db1/tbl1/_stream_load
@ -420,7 +420,7 @@ code INT NULL
{"id": 100, "content": {"city": "beijing", "code" : 1}}
```
- 指定 Json Path
- 指定 JSON Path
```bash
curl --location-trusted -u user:passwd -H "format: json" -H "jsonpaths: [\"$.id\",\"$.content.city\",\"$.content.code\"]" -T data.json http://localhost:8030/api/db1/tbl1/_stream_load
@ -451,7 +451,7 @@ code INT NULL
]
```
- 指定 Json Path
- 指定 JSON Path
```bash
curl --location-trusted -u user:passwd -H "format: json" -H "jsonpaths: [\"$.id\",\"$.city\",\"$.code\"]" -H "strip_outer_array: true" -T data.json http://localhost:8030/api/db1/tbl1/_stream_load
@ -468,9 +468,9 @@ code INT NULL
105 {"order1":["guangzhou"]} 6
```
4. 以 多行Object 形式导入多行数据
4. 以多行 Object 形式导入多行数据
```
```json
{"id": 100, "city": "beijing", "code" : 1}
{"id": 101, "city": "shanghai"}
{"id": 102, "city": "tianjin", "code" : 3}
@ -511,8 +511,8 @@ curl --location-trusted -u user:passwd -H "format: json" -H "jsonpaths: [\"$.id\
105 {"order1":["guangzhou"]} 7
```
6. 使用json导入Array类型
由于Rapidjson处理decimal和largeint数值会导致精度问题,所以我们建议使用json字符串来导入数据到`array<decimal>` 或 `array<largeint>`列。
6. 使用 JSON 导入Array类型
由于 RapidJSON 处理decimal和largeint数值会导致精度问题,所以我们建议使用 JSON 字符串来导入数据到`array<decimal>` 或 `array<largeint>`列。
```json
{"k1": 39, "k2": ["-818.2173181"]}
@ -558,6 +558,6 @@ MySQL > select * from array_test_largeint;
### Routine Load
Routine Load 对 Json 数据的处理原理和 Stream Load 相同。在此不再赘述。
Routine Load 对 JSON 数据的处理原理和 Stream Load 相同。在此不再赘述。
对于 Kafka 数据源,每个 Massage 中的内容被视作一个完整的 Json 数据。如果一个 Massage 中是以 Array 格式的表示的多行数据,则会导入多行,而 Kafka 的 offset 只会增加 1。而如果一个 Array 格式的 Json 表示多行数据,但是因为 Json 格式错误导致解析 Json 失败,则错误行只会增加 1(因为解析失败,实际上 Doris 无法判断其中包含多少行数据,只能按一行错误数据记录)
对于 Kafka 数据源,每个 Message 中的内容被视作一个完整的 JSON 数据。如果一个 Message 中是以 Array 格式表示的多行数据,则会导入多行,而 Kafka 的 offset 只会增加 1。而如果一个 Array 格式的 JSON 表示多行数据,但是因为 JSON 格式错误导致解析 JSON 失败,则错误行只会增加 1(因为解析失败,实际上 Doris 无法判断其中包含多少行数据,只能按一行错误数据记录)