[docs](doc) Add autobucket doc (#16746)
Signed-off-by: Jack Drogon <jack.xsuperman@gmail.com>

docs/en/docs/advanced/autobucket.md (new file, 175 lines)
@@ -0,0 +1,175 @@
---
{
    "title": "AutoBucket",
    "language": "en"
}
---

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Background

<version since="1.2.2">

DISTRIBUTED BY ... BUCKETS auto

</version>

Users often set an inappropriate number of buckets, which leads to various problems. Auto bucketing provides a way to set the number of buckets automatically. For now, it only works for OLAP tables.

# Implementation

In the past, you had to set the number of buckets manually when creating a table. The automatic bucket projection feature lets Apache Doris infer the number of buckets dynamically, so that the bucket count always stays within a suitable range and users no longer have to worry about its minutiae.

First, for the sake of clarity, this section splits bucketing into two phases: initial bucketing and subsequent bucketing. "Initial" and "subsequent" are only terms this article uses to describe the feature clearly; Apache Doris buckets themselves have no such distinction.

As the section above on creating buckets shows, BUCKET_DESC is very simple but requires an explicit bucket count. With automatic bucket projection, the BUCKET_DESC syntax simply changes the bucket count to AUTO and adds a new Properties configuration:

```sql
-- Old syntax: specify the number of buckets explicitly
DISTRIBUTED BY HASH(site) BUCKETS 20

-- New syntax: let Doris project the number of buckets automatically
DISTRIBUTED BY HASH(site) BUCKETS AUTO
properties("estimate_partition_size" = "100G")
```

The new configuration parameter estimate_partition_size indicates the data volume of a single partition. The parameter is optional; if it is not given, Doris uses 10GB as the default value of estimate_partition_size.
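
For instance, a minimal sketch of a distribution clause that relies on the 10GB default (no properties given):

```sql
-- estimate_partition_size omitted: Doris assumes 10GB per partition
DISTRIBUTED BY HASH(site) BUCKETS AUTO
```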

As noted above, a partition bucket is a Tablet at the physical level, and for best performance it is recommended that a Tablet be in the 1GB - 10GB size range. So how does automatic bucket projection keep Tablet sizes within this range? It follows a few principles:

- If the overall data volume is small, do not set the number of buckets too high.
- If the overall data volume is large, relate the number of buckets to the total number of disks, so that the capacity of every BE machine and every disk is fully utilized.

## Initial bucket projection

Starting from these principles, the detailed logic of automatic bucket projection becomes easy to understand. First, look at the initial bucketing:

1. Derive a bucket count N from the data volume. First, divide the value of estimate_partition_size by 5 (assuming a 5:1 compression ratio when text-format data is stored in Doris), then map the result:

   - (0, 100MB): take N = 1
   - [100MB, 1GB): take N = 2
   - [1GB, +∞): one bucket per GB

2. Compute a bucket count M from the number of BE nodes and the disk capacity of each BE node, where each BE node counts as 1 and every 50GB of disk capacity counts as 1:

   M = number of BE nodes * (size of one disk / 50GB) * number of disks per BE

   For example, with 3 BEs, each with 4 disks of 500GB, M = 3 * (500GB / 50GB) * 4 = 120.

3. Compute the final bucket count:

   First compute an intermediate value x = min(M, N, 128).

   If x < N and x < the number of BE nodes, the final bucket count is y, the number of BE nodes; otherwise it is x.

The pseudo-code for the process above:

```
int N = compute N from the data volume;
int M = compute M from BE nodes and disk capacity;

int y = number of BE nodes;
int x = min(M, N, 128);

if (x < N && x < y) {
    return y;
}
return x;
```
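
To make the pseudo-code concrete, here is a minimal runnable Java sketch of the initial projection. The class and method names, the GB-based units, and the rounding of the per-disk term are illustrative assumptions (the worked cases below suggest the per-disk term is rounded up), not Doris internals:

```java
// Illustrative sketch of the initial bucket projection described above.
public class InitialBucketProjection {

    // Step 1: bucket count N from the compressed partition size in GB
    // (estimate_partition_size divided by the assumed 5:1 compression ratio).
    static int computeN(double compressedSizeGb) {
        if (compressedSizeGb < 0.1) return 1;      // (0, 100MB)   -> N = 1
        if (compressedSizeGb < 1.0) return 2;      // [100MB, 1GB) -> N = 2
        return (int) Math.ceil(compressedSizeGb);  // [1GB, +inf)  -> one bucket per GB
    }

    // Step 2: bucket count M from cluster capacity; each BE counts as 1 and
    // every 50GB of disk counts as 1 (per-disk term rounded up, an assumption).
    static int computeM(int beNodes, double diskSizeGb, int disksPerBe) {
        return beNodes * (int) Math.ceil(diskSizeGb / 50.0) * disksPerBe;
    }

    // Step 3: final bucket count.
    static int finalBuckets(int n, int m, int beNodes) {
        int x = Math.min(Math.min(m, n), 128);
        return (x < n && x < beNodes) ? beNodes : x;
    }

    public static void main(String[] args) {
        // Reproduces case 6 below: 1TB of data, 10 BEs, 3 x 2TB disks each.
        int n = computeN(1024.0 / 5);               // 205
        int m = computeM(10, 2048.0, 3);            // 10 * 41 * 3 = 1230
        System.out.println(finalBuckets(n, m, 10)); // prints 128
    }
}
```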

With the algorithm above in mind, let's walk through some examples to better understand this part of the logic:

```
case 1:
Data volume 100MB, 10 BE machines, 2TB * 3 disks
Data volume N = 1
BE disks M = 10 * (2TB / 50GB) * 3 = 1230
x = min(M, N, 128) = 1
Final: 1

case 2:
Data volume 1GB, 3 BE machines, 500GB * 2 disks
Data volume N = 2
BE disks M = 3 * (500GB / 50GB) * 2 = 60
x = min(M, N, 128) = 2
Final: 2

case 3:
Data volume 100GB, 3 BE machines, 500GB * 2 disks
Data volume N = 20
BE disks M = 3 * (500GB / 50GB) * 2 = 60
x = min(M, N, 128) = 20
Final: 20

case 4:
Data volume 500GB, 3 BE machines, 1TB * 1 disk
Data volume N = 100
BE disks M = 3 * (1TB / 50GB) * 1 = 63
x = min(M, N, 128) = 63
Final: 63

case 5:
Data volume 500GB, 10 BE machines, 2TB * 3 disks
Data volume N = 100
BE disks M = 10 * (2TB / 50GB) * 3 = 1230
x = min(M, N, 128) = 100
Final: 100

case 6:
Data volume 1TB, 10 BE machines, 2TB * 3 disks
Data volume N = 205
BE disks M = 10 * (2TB / 50GB) * 3 = 1230
x = min(M, N, 128) = 128
Final: 128

case 7:
Data volume 500GB, 1 BE machine, 100TB * 1 disk
Data volume N = 100
BE disk M = 1 * (100TB / 50GB) * 1 = 2048
x = min(M, N, 128) = 100
Final: 100

case 8:
Data volume 1TB, 200 BE machines, 4TB * 7 disks
Data volume N = 205
BE disks M = 200 * (4TB / 50GB) * 7 = 114800
x = min(M, N, 128) = 128
Final: 200 (x < N and x < y = 200 BE nodes, so y wins)
```

As you can see, the detailed logic matches the principles.

## Subsequent bucket projection

The above is the calculation logic for initial bucketing. For subsequent bucketing, since some partition data has already accumulated, the bucket count can be evaluated from the existing partition data volume. The subsequent bucket count is estimated from the EMA[1] (short-term exponential moving average) of up to the 7 most recent partitions, which is used as the estimate_partition_size. There are two ways to compute it; assuming partitioning by day, let the most recent day's partition size be S7, the day before that S6, and so on back to S1.

1. If the partition data over the 7 days increases strictly day by day, the trend value is used.

   There are 6 delta values:

```
S7 - S6 = delta1,
S6 - S5 = delta2,
...
S2 - S1 = delta6
```

This yields the ema(delta) value.

Then, today's estimate_partition_size = S7 + ema(delta)

2. Otherwise, the EMA of the previous days' sizes is taken directly:

> today's estimate_partition_size = EMA(S1, ..., S7)
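
A minimal Java sketch of this estimation, assuming oldest-first input and an illustrative smoothing factor alpha (the doc does not specify the exact EMA parameters):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of the subsequent estimate_partition_size estimation.
public class SubsequentSizeEstimate {

    // Exponential moving average over values, oldest first; alpha is assumed.
    static double ema(List<Double> values, double alpha) {
        double e = values.get(0);
        for (int i = 1; i < values.size(); i++) {
            e = alpha * values.get(i) + (1 - alpha) * e;
        }
        return e;
    }

    // sizes: up to the 7 most recent partition sizes, oldest (S1) first.
    static double estimatePartitionSize(List<Double> sizes, double alpha) {
        boolean strictlyIncreasing = sizes.size() >= 2;
        for (int i = 1; i < sizes.size(); i++) {
            if (sizes.get(i) <= sizes.get(i - 1)) {
                strictlyIncreasing = false;
                break;
            }
        }
        if (strictlyIncreasing) {
            // Trend case: add the EMA of the day-to-day deltas to the newest size.
            List<Double> deltas = new ArrayList<>();
            for (int i = 1; i < sizes.size(); i++) {
                deltas.add(sizes.get(i) - sizes.get(i - 1));
            }
            return sizes.get(sizes.size() - 1) + ema(deltas, alpha);
        }
        // Otherwise: EMA over the partition sizes themselves.
        return ema(sizes, alpha);
    }

    public static void main(String[] args) {
        // Strictly increasing sizes in GB -> trend case: 13 + ema(deltas)
        List<Double> sizes = List.of(7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0);
        System.out.println(estimatePartitionSize(sizes, 0.5)); // prints 14.0
    }
}
```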

With the algorithm above, both the initial bucket count and the subsequent bucket counts can be calculated. Unlike before, when only a fixed bucket count could be specified, business data changes may make the bucket count of one partition differ from that of the next. This is transparent to the user: there is no need to care about the exact bucket count of each partition, and the automatic projection makes the bucket count more reasonable.

# Description

When autobucket is enabled, the schema you see in `show create table` is also `BUCKETS AUTO`. If you want to see the exact number of buckets, you can use `show partitions from ${table};`.
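
For example, with a hypothetical table named `site_visits`:

```sql
-- The schema shows the bucket clause as written, i.e. BUCKETS AUTO
SHOW CREATE TABLE site_visits;

-- Each partition row lists the actual bucket count chosen for it
SHOW PARTITIONS FROM site_visits;
```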

@@ -337,6 +337,7 @@ Range partitioning also supports batch partitioning. For example, you can create

1. If you choose to specify multiple bucketing columns, the data will be more evenly distributed. However, if the query condition does not include the equivalent conditions for all bucketing columns, the system will scan all buckets, largely increasing the query throughput and decreasing the latency of a single query. This method is suitable for high-throughput, low-concurrency query scenarios.
2. If you choose to specify only one or a few bucketing columns, point queries might scan only one bucket. Thus, when multiple point queries are performed concurrently, they might scan various buckets, with no interaction between the IO operations (especially when the buckets are stored on various disks). This approach is suitable for high-concurrency point query scenarios.

* AutoBucket: calculates the number of partition buckets based on the amount of data. For partitioned tables, the bucket count can be determined from the data volume of historical partitions, the number of machines, and the number of disks.
* There is no theoretical limit on the number of buckets.

3. Recommendations on the number and data volume for Partitions and Buckets.

@@ -248,12 +248,12 @@ distribution_desc

1) Hash
Syntax:
`DISTRIBUTED BY HASH (k1[,k2 ...]) [BUCKETS num]`
`DISTRIBUTED BY HASH (k1[,k2 ...]) [BUCKETS num|auto]`
Explain:
Hash bucketing using the specified key column.
2) Random
Syntax:
`DISTRIBUTED BY RANDOM [BUCKETS num]`
`DISTRIBUTED BY RANDOM [BUCKETS num|auto]`
Explain:
Use random numbers for bucketing.
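
As a quick illustration (the column name is hypothetical), both bucketing methods accept the new `auto` keyword:

```sql
-- Hash bucketing with an automatically projected bucket count
DISTRIBUTED BY HASH(user_id) BUCKETS AUTO

-- Random bucketing with an automatically projected bucket count
DISTRIBUTED BY RANDOM BUCKETS AUTO
```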

docs/zh-CN/docs/advanced/autobucket.md (new file, 175 lines)
@@ -0,0 +1,175 @@
---
{
    "title": "AutoBucket",
    "language": "zh-CN"
}
---

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Background

<version since="1.2.2">

DISTRIBUTED BY ... BUCKETS auto

</version>

Users often set an inappropriate number of buckets, leading to various problems. This feature provides a way to set the bucket count automatically. For now, it only takes effect for OLAP tables.

# Implementation

In the past, the number of buckets had to be set manually at table creation. The automatic bucket projection feature lets Apache Doris infer the bucket count dynamically, keeping it within a suitable range so that users no longer have to worry about its details.

To present the feature clearly, this section splits bucketing into two phases: initial bucketing and subsequent bucketing. "Initial" and "subsequent" are only terms adopted by this article; Apache Doris bucketing itself has no such distinction.

As the section above on creating buckets shows, BUCKET_DESC is very simple but requires the bucket count to be specified. With automatic bucket projection, the BUCKET_DESC syntax simply changes the bucket count to "Auto" and adds a new Properties configuration:

```sql
-- Old syntax: specify the number of buckets explicitly
DISTRIBUTED BY HASH(site) BUCKETS 20

-- New syntax: let Doris project the number of buckets automatically
DISTRIBUTED BY HASH(site) BUCKETS AUTO
properties("estimate_partition_size" = "100G")
```

The new configuration parameter estimate_partition_size indicates the data volume of a single partition. The parameter is optional; if not given, Doris uses 10GB as the default value of estimate_partition_size.

As noted above, a bucket is physically a Tablet, and for the best performance it is recommended that a Tablet be in the 1GB - 10GB size range. So how does automatic bucket projection keep Tablet sizes within this range? It follows a few principles:

- If the overall data volume is small, do not set the number of buckets too high.
- If the overall data volume is large, relate the bucket count to the total number of disks, to fully utilize the capacity of every BE machine and every disk.

## Initial bucket projection

Starting from these principles, the detailed logic of automatic bucket projection becomes easy to understand. First, look at the initial bucketing:

1. Derive a bucket count N from the data volume. First, divide the value of estimate_partition_size by 5 (assuming a 5:1 compression ratio when text-format data is stored in Doris), then map the result:

   - (0, 100MB): take N = 1
   - [100MB, 1GB): take N = 2
   - [1GB, +∞): one bucket per GB

2. Compute a bucket count M from the number of BE nodes and the disk capacity of each BE node, where each BE node counts as 1 and every 50GB of disk capacity counts as 1:

   M = number of BE nodes * (size of one disk / 50GB) * number of disks per BE

   For example, with 3 BEs, each with 4 disks of 500GB, M = 3 * (500GB / 50GB) * 4 = 120.

3. Compute the final bucket count:

   First compute an intermediate value x = min(M, N, 128).

   If x < N and x < the number of BE nodes, the final bucket count is y, the number of BE nodes; otherwise it is x.

The pseudo-code for the process above:

```
int N = compute N from the data volume;
int M = compute M from BE nodes and disk capacity;

int y = number of BE nodes;
int x = min(M, N, 128);

if (x < N && x < y) {
    return y;
}
return x;
```

With the algorithm above in mind, let's walk through some examples to better understand this part of the logic:

```
case 1:
Data volume 100MB, 10 BE machines, 2TB * 3 disks
Data volume N = 1
BE disks M = 10 * (2TB / 50GB) * 3 = 1230
x = min(M, N, 128) = 1
Final: 1

case 2:
Data volume 1GB, 3 BE machines, 500GB * 2 disks
Data volume N = 2
BE disks M = 3 * (500GB / 50GB) * 2 = 60
x = min(M, N, 128) = 2
Final: 2

case 3:
Data volume 100GB, 3 BE machines, 500GB * 2 disks
Data volume N = 20
BE disks M = 3 * (500GB / 50GB) * 2 = 60
x = min(M, N, 128) = 20
Final: 20

case 4:
Data volume 500GB, 3 BE machines, 1TB * 1 disk
Data volume N = 100
BE disks M = 3 * (1TB / 50GB) * 1 = 63
x = min(M, N, 128) = 63
Final: 63

case 5:
Data volume 500GB, 10 BE machines, 2TB * 3 disks
Data volume N = 100
BE disks M = 10 * (2TB / 50GB) * 3 = 1230
x = min(M, N, 128) = 100
Final: 100

case 6:
Data volume 1TB, 10 BE machines, 2TB * 3 disks
Data volume N = 205
BE disks M = 10 * (2TB / 50GB) * 3 = 1230
x = min(M, N, 128) = 128
Final: 128

case 7:
Data volume 500GB, 1 BE machine, 100TB * 1 disk
Data volume N = 100
BE disk M = 1 * (100TB / 50GB) * 1 = 2048
x = min(M, N, 128) = 100
Final: 100

case 8:
Data volume 1TB, 200 BE machines, 4TB * 7 disks
Data volume N = 205
BE disks M = 200 * (4TB / 50GB) * 7 = 114800
x = min(M, N, 128) = 128
Final: 200 (x < N and x < y = 200 BE nodes, so y wins)
```

As you can see, the detailed logic matches the principles.

## Subsequent bucket projection

The above is the calculation logic for initial bucketing. For subsequent bucketing, since some partition data has already accumulated, the bucket count can be evaluated from the existing partition data volume. The subsequent bucket count is estimated from the EMA[1] (short-term exponential moving average) of up to the 7 most recent partitions, used as the estimate_partition_size. There are two ways to compute it; assuming partitioning by day, let the most recent day's partition size be S7, the day before that S6, and so on back to S1.

1. If the partition data over the 7 days increases strictly day by day, the trend value is used.

   There are 6 delta values:

```
S7 - S6 = delta1,
S6 - S5 = delta2,
...
S2 - S1 = delta6
```

This yields the ema(delta) value.

Then, today's estimate_partition_size = S7 + ema(delta)

2. Otherwise, the EMA of the previous days' sizes is taken directly:

> today's estimate_partition_size = EMA(S1, ..., S7)

With the algorithm above, both the initial bucket count and subsequent bucket counts can be calculated. Unlike before, when only a fixed bucket count could be specified, changes in business data may make the bucket count of one partition differ from that of the next. This is transparent to the user: there is no need to care about the exact bucket count of each partition, and the automatic projection makes the bucket count more reasonable.

# Description

After autobucket is enabled, the schema shown by `show create table` is also `BUCKETS AUTO`. To see the exact bucket count, use `show partitions from ${table};`.

@@ -341,6 +341,7 @@ Doris supports two levels of data partitioning. The first level is Partition, supporting Range and Li

- The choice of bucketing columns is a trade-off between **query throughput** and **query concurrency**:
  1. If multiple bucketing columns are chosen, the data is distributed more evenly. If a query condition does not contain equality conditions on all bucketing columns, the query scans all buckets, increasing query throughput and lowering the latency of a single query. This is suitable for high-throughput, low-concurrency query scenarios.
  2. If only one or a few bucketing columns are chosen, a point query may scan only one bucket. When multiple point queries run concurrently, they are likely to scan different buckets, so the IO interference between them is small (especially when the buckets sit on different disks). This is suitable for high-concurrency point query scenarios.
- AutoBucket: computes the bucket count from the data volume. For partitioned tables, the bucket count can be determined from the data volume of historical partitions, the number of machines, and the number of disks.
- There is theoretically no upper limit on the number of buckets.

3. **Recommendations on the number and data volume of Partitions and Buckets.**

@@ -245,12 +245,12 @@ distribution_desc

1) Hash bucketing
Syntax:
`DISTRIBUTED BY HASH (k1[,k2 ...]) [BUCKETS num]`
`DISTRIBUTED BY HASH (k1[,k2 ...]) [BUCKETS num|auto]`
Explain:
Hash bucketing using the specified key columns.
2) Random bucketing
Syntax:
`DISTRIBUTED BY RANDOM [BUCKETS num]`
`DISTRIBUTED BY RANDOM [BUCKETS num|auto]`
Explain:
Use random numbers for bucketing.