[doc](insert-overwrite) add related doc (#20657)

This commit is contained in:
nanfeng
2023-06-16 21:11:04 +08:00
committed by GitHub
parent ab32299ba4
commit 483d96368c
3 changed files with 364 additions and 1 deletions

View File

@ -0,0 +1,180 @@
---
{
"title": "INSERT-OVERWRITE",
"language": "en"
}
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
## INSERT OVERWRITE
### Name
INSERT OVERWRITE
### Description
The function of this statement is to overwrite a table or a partition of a table
```sql
INSERT OVERWRITE table table_name
[ PARTITION (p1, ...) ]
[ WITH LABEL label]
[ (column [, ...]) ]
[ [ hint [, ...] ] ]
{ VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }
```
Parameters
> table_name: the destination table to overwrite. This table must exist. It can be of the form `db_name.table_name`
>
> partitions: the table partition that needs to be overwritten must be one of the existing partitions in `table_name` separated by a comma
>
> label: specify a label for the Insert task
>
> column_name: the specified destination column must be one of the existing columns in `table_name`
>
> expression: the corresponding expression that needs to be assigned to a column
>
> DEFAULT: let the column use the default value
>
> query: a common query, the result of the query will overwrite the target.
>
> hint: some indicator used to indicate the execution behavior of `INSERT`. You can choose one of this values: `/*+ STREAMING */`, `/*+ SHUFFLE */` or `/*+ NOSHUFFLE */.
>
> 1. STREAMING: At present, it has no practical effect and is only reserved for compatibility with previous versions. (In the previous version, adding this hint would return a label, but now it defaults to returning a label)
> 2. SHUFFLE: When the target table is a partition table, enabling this hint will do repartiiton.
> 3. NOSHUFFLE: Even if the target table is a partition table, repartiiton will not be performed, but some other operations will be performed to ensure that the data is correctly dropped into each partition.
Notice:
In the current version, the session variable `enable_insert_strict` is set to `true` by default. If some data that does not conform to the format of the target table is filtered out during the execution of the `INSERT OVERWRITE` statement, such as when overwriting a partition and not all partition conditions are satisfied, overwriting the target table will fail.
The `INSERT OVERWRITE` statement first creates a new table, inserts the data to be overwritten into the new table, and then atomically replaces the old table with the new table and modifies its name. Therefore, during the process of overwriting the table, the data in the old table can still be accessed normally until the overwriting is completed.
### Example
Assuming there is a table named `test`. The table contains two columns `c1` and `c2`, and two partitions `p1` and `p2`
```sql
CREATE TABLE IF NOT EXISTS test (
`c1` int NOT NULL DEFAULT "1",
`c2` int NOT NULL DEFAULT "4"
) ENGINE=OLAP
UNIQUE KEY(`c1`)
PARTITION BY LIST (`c1`)
(
PARTITION p1 VALUES IN ("1","2","3"),# Partition p1 only allows 1, 2, and 3 to exist.
PARTITION p2 VALUES IN ("4","5","6") # Partition p2 only allows 1, 5, and 6 to exist.
)
DISTRIBUTED BY HASH(`c1`) BUCKETS 3
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "V2"
);
```
#### Overwrite Table
1. Overwrite the `test` table using the form of `VALUES`.
```sql
// Single-row overwrite.
INSERT OVERWRITE table test VALUES (1, 2);
INSERT OVERWRITE table test (c1, c2) VALUES (1, 2);
INSERT OVERWRITE table test (c1, c2) VALUES (1, DEFAULT);
INSERT OVERWRITE table test (c1) VALUES (1);
// Multi-row overwrite.
INSERT OVERWRITE table test VALUES (1, 2), (3, 2 + 2);
INSERT OVERWRITE table test (c1, c2) VALUES (1, 2), (3, 2 * 2);
INSERT OVERWRITE table test (c1, c2) VALUES (1, DEFAULT), (3, DEFAULT);
INSERT OVERWRITE table test (c1) VALUES (1), (3);
```
- The first and second statements have the same effect. If the target column is not specified during overwriting, the column order in the table will be used as the default target column. After the overwrite is successful, there is only one row of data in the `test` table.
- The third and fourth statements have the same effect. The unspecified column `c2` will be overwritten with the default value 1. After the overwrite is successful, there is only one row of data in the `test` table.
- The fifth and sixth statements have the same effect. Expressions (such as `2+2`, `2*2`) can be used in the statement. The result of the expression will be computed during the execution of the statement and then overwritten into the `test` table. After the overwrite is successful, there are two rows of data in the `test` table.
- The seventh and eighth statements have the same effect. The unspecified column `c2` will be overwritten with the default value 1. After the overwrite is successful, there are two rows of data in the `test` table.
2. Overwrite the `test` table in the form of a query statement. The data format of the `test2` table and the `test` table must be consistent. If they are not consistent, implicit data type conversion will be triggered.
```sql
INSERT OVERWRITE table test SELECT * FROM test2;
INSERT OVERWRITE table test (c1, c2) SELECT * from test2;
```
- The first and second statements have the same effect. The purpose of these statements is to take data from the `test2` table and overwrite the `test` table with the taken data. After the overwrite is successful, the data in the `test` table will be consistent with the data in the `test2` table.
3. Overwrite the `test` table and specify a label.
```sql
INSERT OVERWRITE table test WITH LABEL `label1` SELECT * FROM test2;
INSERT OVERWRITE table test WITH LABEL `label2` (c1, c2) SELECT * from test2;
```
- Using a label will encapsulate this task into an **asynchronous task**. After executing the statement, the relevant operations will be executed asynchronously. Users can use the `SHOW LOAD;` command to check the status of the job imported by this `label`. It should be noted that the label is unique.
#### Overwrite Table Partition
1. Overwrite partitions `P1` and `P2` of the `test` table using the form of `VALUES`.
```sql
// Single-row overwrite.
INSERT OVERWRITE table test PARTITION(p1,p2) VALUES (1, 2);
INSERT OVERWRITE table test PARTITION(p1,p2) (c1, c2) VALUES (1, 2);
INSERT OVERWRITE table test PARTITION(p1,p2) (c1, c2) VALUES (1, DEFAULT);
INSERT OVERWRITE table test PARTITION(p1,p2) (c1) VALUES (1);
// Multi-row overwrite.
INSERT OVERWRITE table test PARTITION(p1,p2) VALUES (1, 2), (4, 2 + 2);
INSERT OVERWRITE table test PARTITION(p1,p2) (c1, c2) VALUES (1, 2), (4, 2 * 2);
INSERT OVERWRITE table test PARTITION(p1,p2) (c1, c2) VALUES (1, DEFAULT), (4, DEFAULT);
INSERT OVERWRITE table test PARTITION(p1,p2) (c1) VALUES (1), (4);
```
Unlike overwriting an entire table, the above statements are overwriting partitions in the table. Partitions can be overwritten one at a time or multiple partitions can be overwritten at once. It should be noted that only data that satisfies the corresponding partition filtering condition can be overwritten successfully. If there is data in the overwritten data that does not satisfy any of the partitions, the overwrite will fail. An example of a failure is shown below.
```sql
INSERT OVERWRITE table test PARTITION(p1,p2) VALUES (7, 2);
```
The data overwritten by the above statements (`c1=7`) does not satisfy the conditions of partitions `P1` and `P2`, so the overwrite will fail.
2. Overwrite partitions `P1` and `P2` of the `test` table in the form of a query statement. The data format of the `test2` table and the `test` table must be consistent. If they are not consistent, implicit data type conversion will be triggered.
```sql
INSERT OVERWRITE table test PARTITION(p1,p2) SELECT * FROM test2;
INSERT OVERWRITE table test PARTITION(p1,p2) (c1, c2) SELECT * from test2;
```
3. Overwrite partitions `P1` and `P2` of the `test` table and specify a label.
```sql
INSERT OVERWRITE table test PARTITION(p1,p2) WITH LABEL `label3` SELECT * FROM test2;
INSERT OVERWRITE table test PARTITION(p1,p2) WITH LABEL `label4` (c1, c2) SELECT * from test2;
```
### Keywords
INSERT OVERWRITE

View File

@ -882,7 +882,8 @@
"sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/DELETE",
"sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/UPDATE",
"sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/EXPORT",
"sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/CANCEL-EXPORT"
"sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/CANCEL-EXPORT",
"sql-manual/sql-reference/Data-Manipulation-Statements/Manipulation/INSERT-OVERWRITE"
]
},
"sql-manual/sql-reference/Data-Manipulation-Statements/OUTFILE"

View File

@ -0,0 +1,182 @@
---
{
"title": "INSERT-OVERWRITE",
"language": "zh-CN"
}
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
## INSERT OVERWRITE
### Name
INSERT OVERWRITE
### Description
该语句的功能是重写表或表的某个分区
```sql
INSERT OVERWRITE table table_name
[ PARTITION (p1, ...) ]
[ WITH LABEL label]
[ (column [, ...]) ]
[ [ hint [, ...] ] ]
{ VALUES ( { expression | DEFAULT } [, ...] ) [, ...] | query }
```
Parameters
> table_name: 需要重写的目的表。这个表必须存在。可以是 `db_name.table_name` 形式
>
> partitions: 需要重写的表分区,必须是 `table_name` 中存在的分区,多个分区名称用逗号分隔
>
> label: 为 Insert 任务指定一个 label
>
> column_name: 指定的目的列,必须是 `table_name` 中存在的列
>
> expression: 需要赋值给某个列的对应表达式
>
> DEFAULT: 让对应列使用默认值
>
> query: 一个普通查询,查询的结果会重写到目标中
>
> hint: 用于指示 `INSERT` 执行行为的一些指示符。目前 hint 有三个可选值`/*+ STREAMING */`、`/*+ SHUFFLE */`或`/*+ NOSHUFFLE */`
>
> 1. STREAMING:目前无实际作用,只是为了兼容之前的版本,因此保留。(之前的版本加上这个 hint 会返回 label,现在默认都会返回 label)
> 2. SHUFFLE:当目标表是分区表,开启这个 hint 会进行 repartiiton。
> 3. NOSHUFFLE:即使目标表是分区表,也不会进行 repartiiton,但会做一些其他操作以保证数据正确落到各个分区中。
注意:
在当前版本中,会话变量 `enable_insert_strict` 默认为 `true`,如果执行 `INSERT OVERWRITE` 语句时,对于有不符合目标表格式的数据被过滤掉的话会重写目标表失败(比如重写分区时,不满足所有分区条件的数据会被过滤)。
INSERT OVERWRITE语句会首先创建一个新表,将需要重写的数据插入到新表中,最后原子性的用新表替换旧表并修改名称。因此,在重写表的过程中,旧表中的数据在重写完毕之前仍然可以正常访问。
### Example
假设有`test` 表。该表包含两个列`c1`, `c2`,两个分区`p1`,`p2`。建表语句如下所示
```sql
CREATE TABLE IF NOT EXISTS test (
`c1` int NOT NULL DEFAULT "1",
`c2` int NOT NULL DEFAULT "4"
) ENGINE=OLAP
UNIQUE KEY(`c1`)
PARTITION BY LIST (`c1`)
(
PARTITION p1 VALUES IN ("1","2","3"),# 分区p1只允许1 2 3存在
PARTITION p2 VALUES IN ("4","5","6") # 分区p2只允许1 5 6存在
)
DISTRIBUTED BY HASH(`c1`) BUCKETS 3
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "V2"
);
```
#### Overwrite Table
1. VALUES的形式重写`test`
```sql
# 单行重写
INSERT OVERWRITE table test VALUES (1, 2);
INSERT OVERWRITE table test (c1, c2) VALUES (1, 2);
INSERT OVERWRITE table test (c1, c2) VALUES (1, DEFAULT);
INSERT OVERWRITE table test (c1) VALUES (1);
# 多行重写
INSERT OVERWRITE table test VALUES (1, 2), (3, 2 + 2);
INSERT OVERWRITE table test (c1, c2) VALUES (1, 2), (3, 2 * 2);
INSERT OVERWRITE table test (c1, c2) VALUES (1, DEFAULT), (3, DEFAULT);
INSERT OVERWRITE table test (c1) VALUES (1), (3);
```
- 第一条语句和第二条语句的效果一致,重写时如果不指定目标列,会使用表中的列顺序来作为默认的目标列。重写成功后表`test`中只有一行数据。
- 第三条语句和第四条语句的效果一致,没有指定的列`c2`会使用默认值1来完成数据重写。重写成功后表`test`中只有一行数据。
- 第五条语句和第六条语句的效果一致,在语句中可以使用表达式(如`2+2`,`2*2`),执行语句的时候会计算出表达式的结果再重写表`test`。重写成功后表`test`中有两行数据。
- 第七条语句和第八条语句的效果一致,没有指定的列`c2`会使用默认值1来完成数据重写。重写成功后表`test`中有两行数据。
2. 查询语句的形式重写`test`表,表`test2`和表`test`的数据格式需要保持一致,如果不一致会触发数据类型的隐式转换
```sql
INSERT OVERWRITE table test SELECT * FROM test2;
INSERT OVERWRITE table test (c1, c2) SELECT * from test2;
```
- 第一条语句和第二条语句的效果一致,该语句的作用是将数据从表`test2`中取出,使用取出的数据重写表`test`。重写成功后表`test`中的数据和表`test2`中的数据保持一致。
3. 重写 `test` 表并指定label
```sql
INSERT OVERWRITE table test WITH LABEL `label1` SELECT * FROM test2;
INSERT OVERWRITE table test WITH LABEL `label2` (c1, c2) SELECT * from test2;
```
- 使用label会将此任务封装成一个**异步任务**,执行语句之后,相关操作都会异步执行,用户可以通过`SHOW LOAD;`命令查看此`label`导入作业的状态。需要注意的是label具有唯一性。
#### Overwrite Table Partition
1. VALUES的形式重写`test`表分区`P1`和`p2`
```sql
# 单行重写
INSERT OVERWRITE table test PARTITION(p1,p2) VALUES (1, 2);
INSERT OVERWRITE table test PARTITION(p1,p2) (c1, c2) VALUES (1, 2);
INSERT OVERWRITE table test PARTITION(p1,p2) (c1, c2) VALUES (1, DEFAULT);
INSERT OVERWRITE table test PARTITION(p1,p2) (c1) VALUES (1);
# 多行重写
INSERT OVERWRITE table test PARTITION(p1,p2) VALUES (1, 2), (4, 2 + 2);
INSERT OVERWRITE table test PARTITION(p1,p2) (c1, c2) VALUES (1, 2), (4, 2 * 2);
INSERT OVERWRITE table test PARTITION(p1,p2) (c1, c2) VALUES (1, DEFAULT), (4, DEFAULT);
INSERT OVERWRITE table test PARTITION(p1,p2) (c1) VALUES (1), (4);
```
以上语句与重写表不同的是,它们都是重写表中的分区。分区可以一次重写一个分区也可以一次重写多个分区。需要注意的是,只有满足对应分区过滤条件的数据才能够重写成功。如果重写的数据中有数据不满足其中任意一个分区,那么本次重写会失败。一个失败的例子如下所示
```sql
INSERT OVERWRITE table test PARTITION(p1,p2) VALUES (7, 2);
```
以上语句重写的数据`c1=7`分区`p1`和`p2`的条件都不满足,因此会重写失败。
2. 查询语句的形式重写`test`表分区`P1`和`p2`,表`test2`和表`test`的数据格式需要保持一致,如果不一致会触发数据类型的隐式转换
```sql
INSERT OVERWRITE table test PARTITION(p1,p2) SELECT * FROM test2;
INSERT OVERWRITE table test PARTITION(p1,p2) (c1, c2) SELECT * from test2;
```
3. 重写 `test` 表分区`P1`和`p2`并指定label
```sql
INSERT OVERWRITE table test PARTITION(p1,p2) WITH LABEL `label3` SELECT * FROM test2;
INSERT OVERWRITE table test PARTITION(p1,p2) WITH LABEL `label4` (c1, c2) SELECT * from test2;
```
### Keywords
INSERT OVERWRITE