[Conf] Make default_storage_medium configurable (#2980)

Doris supports choosing a storage medium when creating a table, and the cluster balance strategy depends on the storage medium. Most users, however, do not specify a storage medium when creating a table; even when they know they should choose one, they have no idea which storage mediums the cluster actually has. So I think we should make storage_medium and storage_cooldown_time
configurable, and this should be the admin's responsibility.

For example, suppose the cluster's storage medium is HDD and we need to switch part of the machines to SSD. After the switch, the tablets created before the change are stored on HDD and cannot find a destination path to migrate to, while users keep creating tables as usual, so all tablets end up on the old machines and the new machines hold only a few tablets. Without this config, the only remedy is for the admin to traverse all partitions in the cluster and change their storage_medium property, which increases operational and maintenance costs.

So I added an FE config, default_storage_medium, so that the admin can set the default storage medium.
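
For illustration, a minimal sketch of how the new default is resolved (this mirrors the `DataProperty` change further down; any value other than "SSD", including the shipped default "HDD", falls back to HDD):

```java
import org.apache.doris.common.Config;
import org.apache.doris.thrift.TStorageMedium;

// Sketch only: how the FE turns the fe.conf value into a medium.
// An unset or mistyped default_storage_medium safely yields HDD.
public class DefaultMediumSketch {
    static TStorageMedium defaultMedium() {
        return "SSD".equalsIgnoreCase(Config.default_storage_medium)
                ? TStorageMedium.SSD
                : TStorageMedium.HDD;
    }
}
```

With `default_storage_medium=SSD` in `fe.conf`, a CREATE TABLE that omits the storage_medium property now lands on SSD instead of HDD.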
Authored by WingC on 2020-03-27 07:22:18 -05:00, committed by GitHub
parent 32c4fc691c
commit c1969a3fb3
12 changed files with 30 additions and 23 deletions

View File

@ -246,7 +246,7 @@ PARTITION BY RANGE(`date`, `id`)
2. storage_medium & storage\_cooldown\_time
* BE data storage directories can be explicitly designated as SSD or HDD (distinguished by the .SSD or .HDD suffix). When creating a table, you can uniformly specify the initial storage medium for all Partitions. Note that the suffix only explicitly designates the disk medium; it does not check whether it matches the actual medium type.
* The default initial storage medium is HDD. If SSD is specified, the data is initially stored on SSD.
* The default initial storage medium can be specified via `default_storage_medium=xxx` in the FE configuration file `fe.conf`; if not specified, it defaults to HDD. If SSD is specified, the data is initially stored on SSD.
* If storage\_cooldown\_time is not specified, the data is automatically migrated from SSD to HDD after 7 days by default. If storage\_cooldown\_time is specified, the data is migrated only after storage_cooldown_time is reached.
* Note that when storage_medium is specified, this parameter is only a "best effort" setting. Even if no SSD storage medium is configured in the cluster, no error is reported; the data is simply stored in an available data directory. Likewise, if the SSD medium is inaccessible or out of space, the data may initially be stored directly on another available medium. And when expired data is migrated to HDD, if the HDD medium is inaccessible or out of space, the migration may fail (but it will keep retrying).

View File

@ -223,9 +223,9 @@ under the License.
)
```
storage_medium: specifies the initial storage medium of this partition; SSD or HDD can be chosen. Defaults to HDD.
storage_medium: specifies the initial storage medium of this partition; SSD or HDD can be chosen. The default initial storage medium can be specified via `default_storage_medium=xxx` in the FE configuration file `fe.conf`; if not specified, it defaults to HDD.
storage_cooldown_time: when the storage medium is set to SSD, specifies the expiration time for keeping this partition's data on SSD.
Stored for 7 days by default.
Stored for 30 days by default.
Format: "yyyy-MM-dd HH:mm:ss"
replication_num: the number of replicas for this partition. Defaults to 3

View File

@ -242,7 +242,7 @@ Replication_num
2. storage_medium & storage\_cooldown\_time
* The BE data storage directory can be explicitly specified as SSD or HDD (differentiated by the .SSD or .HDD suffix). When you create a table, you can uniformly specify the medium for the initial storage of all Partitions. Note that the suffix only explicitly specifies the disk medium; it does not check whether it matches the actual medium type.
* The default initial storage medium is HDD. If specified as an SSD, the data is initially stored on the SSD.
* The default initial storage medium can be specified by `default_storage_medium=xxx` in the FE configuration file `fe.conf`; if not specified, it defaults to HDD. If specified as SSD, the data is initially stored on the SSD.
* If storage\_cooldown\_time is not specified, the data is automatically migrated from the SSD to the HDD after 7 days by default. If storage\_cooldown\_time is specified, the data will not migrate until storage_cooldown_time is reached.
* Note that when storage_medium is specified, this parameter is only a "best effort" setting. Even if no SSD storage medium is configured in the cluster, no error is reported; the data is simply stored in an available data directory. Similarly, if the SSD medium is inaccessible or out of space, the data may initially be stored directly on other available media. When the data expires and is migrated to HDD, if the HDD medium is inaccessible or out of space, the migration may fail (but will keep retrying).
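
The expiry side of this behavior is visible in the Catalog hunk below: a partition on SSD whose cooldown timestamp has passed is rewritten to HDD. A minimal sketch of that check (the class and method names here are illustrative, not from the commit):

```java
import org.apache.doris.thrift.TStorageMedium;

// Sketch of the cooldown check performed by the FE (see the Catalog
// hunk below): once the cooldown time is in the past, an SSD
// partition's data property is replaced with a plain HDD one.
public class CooldownCheckSketch {
    static boolean shouldCoolDown(TStorageMedium medium, long cooldownTimeMs, long currentTimeMs) {
        return medium == TStorageMedium.SSD && cooldownTimeMs < currentTimeMs;
    }
}
```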

View File

@ -204,7 +204,7 @@ Syntax:
)
```
storage_medium: SSD or HDD
storage_medium: SSD or HDD. The default initial storage medium can be specified by `default_storage_medium=xxx` in the FE configuration file `fe.conf`; if not specified, it defaults to HDD.
storage_cooldown_time: If storage_medium is SSD, data will be automatically moved to HDD when the cooldown time is reached.
Default is 7 days.
Format: "yyyy-MM-dd HH:mm:ss"

View File

@ -56,7 +56,7 @@ public class SingleRangePartitionDesc {
this.partitionKeyDesc = partitionKeyDesc;
this.properties = properties;
this.partitionDataProperty = DataProperty.DEFAULT_HDD_DATA_PROPERTY;
this.partitionDataProperty = DataProperty.DEFAULT_DATA_PROPERTY;
this.replicationNum = FeConstants.default_replication_num;
}
@ -107,7 +107,7 @@ public class SingleRangePartitionDesc {
// analyze data property
partitionDataProperty = PropertyAnalyzer.analyzeDataProperty(properties,
DataProperty.DEFAULT_HDD_DATA_PROPERTY);
DataProperty.DEFAULT_DATA_PROPERTY);
Preconditions.checkNotNull(partitionDataProperty);
// analyze replication num

View File

@ -3561,7 +3561,7 @@ public class Catalog {
DataProperty dataProperty = null;
try {
dataProperty = PropertyAnalyzer.analyzeDataProperty(stmt.getProperties(),
DataProperty.DEFAULT_HDD_DATA_PROPERTY);
DataProperty.DEFAULT_DATA_PROPERTY);
} catch (AnalysisException e) {
throw new DdlException(e.getMessage());
}
@ -3673,7 +3673,7 @@ public class Catalog {
try {
// just for remove entries in stmt.getProperties(),
// and then check if there still has unknown properties
PropertyAnalyzer.analyzeDataProperty(stmt.getProperties(), DataProperty.DEFAULT_HDD_DATA_PROPERTY);
PropertyAnalyzer.analyzeDataProperty(stmt.getProperties(), DataProperty.DEFAULT_DATA_PROPERTY);
DynamicPartitionUtil.checkAndSetDynamicPartitionProperty(olapTable, properties);
if (properties != null && !properties.isEmpty()) {
@ -4519,8 +4519,7 @@ public class Catalog {
if (dataProperty.getStorageMedium() == TStorageMedium.SSD
&& dataProperty.getCooldownTimeMs() < currentTimeMs) {
// expire. change to HDD.
partitionInfo.setDataProperty(partition.getId(),
DataProperty.DEFAULT_HDD_DATA_PROPERTY);
partitionInfo.setDataProperty(partition.getId(), new DataProperty(TStorageMedium.HDD));
storageMediumMap.put(partitionId, TStorageMedium.HDD);
LOG.debug("partition[{}-{}-{}] storage medium changed from SSD to HDD",
dbId, tableId, partitionId);
@ -4529,7 +4528,7 @@ public class Catalog {
ModifyPartitionInfo info =
new ModifyPartitionInfo(db.getId(), olapTable.getId(),
partition.getId(),
DataProperty.DEFAULT_HDD_DATA_PROPERTY,
DataProperty.DEFAULT_DATA_PROPERTY,
(short) -1,
partitionInfo.getIsInMemory(partition.getId()));
editLog.logModifyPartition(info);

View File

@ -17,6 +17,7 @@
package org.apache.doris.catalog;
import org.apache.doris.common.Config;
import org.apache.doris.common.io.Text;
import org.apache.doris.common.io.Writable;
import org.apache.doris.common.util.TimeUtils;
@ -27,7 +28,8 @@ import java.io.DataOutput;
import java.io.IOException;
public class DataProperty implements Writable {
public static final DataProperty DEFAULT_HDD_DATA_PROPERTY = new DataProperty(TStorageMedium.HDD);
public static final DataProperty DEFAULT_DATA_PROPERTY = new DataProperty(
"SSD".equalsIgnoreCase(Config.default_storage_medium) ? TStorageMedium.SSD : TStorageMedium.HDD);
public static final long MAX_COOLDOWN_TIME_MS = 253402271999000L; // 9999-12-31 23:59:59
private TStorageMedium storageMedium;
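
One thing to note about this hunk: `DEFAULT_DATA_PROPERTY` is a static final field, so `Config.default_storage_medium` is read exactly once, at class-load time, and the config itself is declared as a plain `@ConfField` (not `mutable = true`; see the Config hunk below), so changing it requires an FE restart. A self-contained toy, not Doris code, showing the capture-once semantics:

```java
// Toy example (not Doris code): a static final initializer captures the
// config value once at class load; later changes are not observed.
public class StaticCaptureToy {
    static String configValue = "HDD";
    static final String CAPTURED = configValue; // evaluated at class load

    public static void main(String[] args) {
        configValue = "SSD";            // too late to affect CAPTURED
        System.out.println(CAPTURED);   // prints "HDD"
    }
}
```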

View File

@ -23,6 +23,7 @@ import org.apache.doris.common.io.Writable;
import com.google.common.base.Preconditions;
import org.apache.doris.thrift.TStorageMedium;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
@ -127,7 +128,7 @@ public class PartitionInfo implements Writable {
out.writeInt(idToDataProperty.size());
for (Map.Entry<Long, DataProperty> entry : idToDataProperty.entrySet()) {
out.writeLong(entry.getKey());
if (entry.getValue() == DataProperty.DEFAULT_HDD_DATA_PROPERTY) {
if (entry.getValue().equals(new DataProperty(TStorageMedium.HDD))) {
out.writeBoolean(true);
} else {
out.writeBoolean(false);
@ -145,9 +146,9 @@ public class PartitionInfo implements Writable {
int counter = in.readInt();
for (int i = 0; i < counter; i++) {
long partitionId = in.readLong();
boolean isDefaultDataProperty = in.readBoolean();
if (isDefaultDataProperty) {
idToDataProperty.put(partitionId, DataProperty.DEFAULT_HDD_DATA_PROPERTY);
boolean isDefaultHddDataProperty = in.readBoolean();
if (isDefaultHddDataProperty) {
idToDataProperty.put(partitionId, new DataProperty(TStorageMedium.HDD));
} else {
idToDataProperty.put(partitionId, DataProperty.read(in));
}
@ -170,7 +171,7 @@ public class PartitionInfo implements Writable {
for (Map.Entry<Long, DataProperty> entry : idToDataProperty.entrySet()) {
buff.append(entry.getKey()).append("is HDD: ");
if (entry.getValue() == DataProperty.DEFAULT_HDD_DATA_PROPERTY) {
if (entry.getValue().equals(new DataProperty(TStorageMedium.HDD))) {
buff.append(true);
} else {
buff.append(false);
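
The switch from `==` identity checks to `.equals(new DataProperty(TStorageMedium.HDD))` in this file keeps the persisted metadata compatible: the boolean written before each entry has always meant "this is the plain HDD default", and it must keep meaning that even when `DEFAULT_DATA_PROPERTY` resolves to SSD. This relies on `DataProperty` having value equality; a hypothetical sketch of such an override, inside `DataProperty` (the `cooldownTimeMs` field name is an assumption, following the constants shown above):

```java
// Hypothetical equals() for DataProperty; the commit is assumed to rely
// on an equivalent value-equality implementation.
@Override
public boolean equals(Object obj) {
    if (this == obj) {
        return true;
    }
    if (!(obj instanceof DataProperty)) {
        return false;
    }
    DataProperty other = (DataProperty) obj;
    return this.storageMedium == other.storageMedium
            && this.cooldownTimeMs == other.cooldownTimeMs; // assumed field
}
```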

View File

@ -555,7 +555,12 @@ public class Config extends ConfigBase {
@ConfField(mutable = true, masterOnly = true)
public static int max_backend_down_time_second = 3600; // 1h
/*
* When creating a table (or partition), you can specify its storage media (HDD or SSD).
* When creating a table (or partition), you can specify its storage medium (HDD or SSD).
* If not set, this config specifies the default medium used at creation.
*/
@ConfField public static String default_storage_medium = "HDD";
/*
* When creating a table (or partition), you can specify its storage medium (HDD or SSD).
* If set to SSD, this specifies the default duration that tablets will stay on SSD.
* After that, tablets will be moved to HDD automatically.
* You can set storage cooldown time in CREATE TABLE stmt.

View File

@ -214,7 +214,7 @@ public class CatalogTestUtil {
// table
PartitionInfo partitionInfo = new SinglePartitionInfo();
partitionInfo.setDataProperty(partitionId, DataProperty.DEFAULT_HDD_DATA_PROPERTY);
partitionInfo.setDataProperty(partitionId, DataProperty.DEFAULT_DATA_PROPERTY);
partitionInfo.setReplicationNum(partitionId, (short) 3);
OlapTable table = new OlapTable(tableId, testTable1, columns, KeysType.AGG_KEYS, partitionInfo,
distributionInfo);

View File

@ -108,7 +108,7 @@ public class UnitTestUtil {
// table
PartitionInfo partitionInfo = new SinglePartitionInfo();
partitionInfo.setDataProperty(partitionId, DataProperty.DEFAULT_HDD_DATA_PROPERTY);
partitionInfo.setDataProperty(partitionId, DataProperty.DEFAULT_DATA_PROPERTY);
partitionInfo.setReplicationNum(partitionId, (short) 3);
partitionInfo.setIsInMemory(partitionId, false);
OlapTable table = new OlapTable(tableId, TABLE_NAME, columns,

View File

@ -150,7 +150,7 @@ abstract public class DorisHttpTestCase {
// table
PartitionInfo partitionInfo = new SinglePartitionInfo();
partitionInfo.setDataProperty(testPartitionId, DataProperty.DEFAULT_HDD_DATA_PROPERTY);
partitionInfo.setDataProperty(testPartitionId, DataProperty.DEFAULT_DATA_PROPERTY);
partitionInfo.setReplicationNum(testPartitionId, (short) 3);
OlapTable table = new OlapTable(testTableId, name, columns, KeysType.AGG_KEYS, partitionInfo,
distributionInfo);
@ -168,7 +168,7 @@ abstract public class DorisHttpTestCase {
columns.add(k1);
columns.add(k2);
PartitionInfo partitionInfo = new SinglePartitionInfo();
partitionInfo.setDataProperty(testPartitionId + 100, DataProperty.DEFAULT_HDD_DATA_PROPERTY);
partitionInfo.setDataProperty(testPartitionId + 100, DataProperty.DEFAULT_DATA_PROPERTY);
partitionInfo.setReplicationNum(testPartitionId + 100, (short) 3);
EsTable table = null;
Map<String, String> props = new HashMap<>();