[fix](transaction) Fix concurrent schema change and txn cause dead lock (#26428)

A schema change running concurrently with a txn may cause a deadlock. An example:

Txn T is committed but not yet published;
A schema change or rollup runs on a partition related to T and adds an alter replica R;
The sc/rollup records a sched txn watermark M;
The FE restarts;
After the FE restarts, T's loadedTblIndexes is cleared because it is not saved to disk;
T publishes its version to all tablets, including the sc/rollup's new alter replica R;
Since R does not contain the txn's data, publishing T fails, and T then waits indefinitely for R's data;
The sc/rollup waits for all txns before M to finish; only after that does it let R copy history data;
Since T has not finished, the sc/rollup keeps waiting, so R never copies history data;
Txn T and the sc/rollup wait for each other forever, causing a deadlock.
Fix: because sc/rollup ensures double write after the sched watermark M, when finishing a transaction and checking an alter replica:

if the txn id is bigger than M, check it just like a normal replica;
otherwise skip checking this replica; the BE will modify history data later.
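
The rule above can be sketched in a few lines of Java. This is a minimal illustration, not the code from this commit: the class `AlterReplicaCheck`, the enum `ReplicaState`, and the method `mustCheckReplica` are hypothetical names, and the sketch assumes txn ids and the watermark M are plain `long` values.

```java
// Hypothetical sketch of the publish-time replica check described in the commit message.
// None of these names come from the Doris code base.
class AlterReplicaCheck {
    // Illustrative replica states: a normal replica, or one created by sc/rollup.
    enum ReplicaState { NORMAL, ALTER }

    /**
     * Returns true if the replica must already hold the txn's data
     * before the txn can be considered finished.
     *
     * @param state          state of the replica being checked
     * @param txnId          id of the txn being finished
     * @param watermarkTxnId the sched watermark M recorded by sc/rollup
     */
    static boolean mustCheckReplica(ReplicaState state, long txnId, long watermarkTxnId) {
        if (state != ReplicaState.ALTER) {
            // Normal replicas are always checked.
            return true;
        }
        // Alter replica: double write is only guaranteed for txns after M,
        // so only those txns are checked. Following the wording literally,
        // a txn id equal to M is skipped as well.
        return txnId > watermarkTxnId;
    }

    public static void main(String[] args) {
        long m = 100L;
        System.out.println(mustCheckReplica(ReplicaState.ALTER, 90L, m));   // false
        System.out.println(mustCheckReplica(ReplicaState.ALTER, 101L, m));  // true
        System.out.println(mustCheckReplica(ReplicaState.NORMAL, 90L, m));  // true
    }
}
```

In the deadlock example, T's id is below M, so R would be skipped and T could finish, which in turn lets the sc/rollup proceed and copy history data into R.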
yujun
2023-11-13 21:39:28 +08:00
committed by GitHub
parent 7b50a62f0c
commit ebc15fc6cc
12 changed files with 306 additions and 22 deletions

@@ -439,6 +439,13 @@ public class Config extends ConfigBase {
+ "then the load task will be successful." })
public static int publish_wait_time_second = 300;
@ConfField(mutable = true, masterOnly = true, description = {"导入 Publish 阶段是否检查正在做 Schema 变更的副本。"
+ "正常情况下,不要关闭此检查。除非在极端情况下出现导入和 Schema 变更出现互相等待死锁时才临时打开。",
"Check the replicas which are doing schema change when publish transaction. Do not turn off this check "
+ " under normal circumstances. It's only temporarily skip check if publish version and schema change have"
+ " dead lock" })
public static boolean publish_version_check_alter_replica = true;
@ConfField(mutable = true, masterOnly = true, description = {"提交事务的最大超时时间,单位是秒。"
+ "该参数仅用于事务型 insert 操作中。",
"Maximal waiting time for all data inserted before one transaction to be committed, in seconds. "