From b059d78a302d4f92d134372a54ea26bf3825743c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Markus=20M=C3=A4kel=C3=A4?=
Date: Wed, 7 Feb 2018 21:34:47 +0200
Subject: [PATCH] Fix master loss on split cluster

When four servers (A, B, C and E where E and A replicate from each other
and A is the master for B and C) form a cluster and only three of them (A,
B and C) are configured into MaxScale, a failover operation from A to B
(making B the current master) and a restart of A causes B to lose its
master status.

The following diagram illustrates the state of the cluster at the end of
the process described above.

      +----------------------+
      |            +---+     |
      +------------+ B <-+   |
    +-v-+ |        +---+ |   |
    | E | |              |   |
    +-^-+ |  +---+     +-+-+ |
      +------+ A |     | C | |
      |      +---+     +---+ |
      |                      |
      +----------------------+

The external server E was not correctly ignored in the replication
topology generation causing both A and B to be seen as the lowest slave
nodes in the tree. From a theoretical point of view this is the correct
interpretation as there are two distinct trees and neither of them
contains any true masters.

In practice, MaxScale should treat any servers that replicate from an
external master as root level master nodes. Doing this guarantees that
they are labeled as masters if they have slaves replicating from them.
---
 server/modules/monitor/mariadbmon/mysql_mon.cc | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/server/modules/monitor/mariadbmon/mysql_mon.cc b/server/modules/monitor/mariadbmon/mysql_mon.cc
index cd1a8f79a..51586f3db 100644
--- a/server/modules/monitor/mariadbmon/mysql_mon.cc
+++ b/server/modules/monitor/mariadbmon/mysql_mon.cc
@@ -2847,7 +2847,11 @@ static MXS_MONITORED_SERVER *get_replication_tree(MXS_MONITOR *mon, int num_serv
         current = ptr->server;
 
         node_id = current->master_id;
-        if (node_id < 1)
+
+        /** Either this node doesn't replicate from a master or the master
+         * where it replicates from is not configured to this monitor. */
+        if (node_id < 1 ||
+            getServerByNodeId(mon->monitored_servers, node_id) == NULL)
         {
             MXS_MONITORED_SERVER *find_slave;
             find_slave = getSlaveOfNodeId(mon->monitored_servers, current->node_id, ACCEPT_DOWN);
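
The effect of the changed condition can be sketched in isolation. The following is a minimal, standalone illustration and not MaxScale code: `Server`, `is_monitored`, `is_root_candidate`, `is_root_candidate_old` and `example_cluster` are hypothetical names, the node ids (A=1, B=2, C=3, E=4) are made up, and it assumes that after the failover B replicates from the external server E, as the commit message suggests but does not state outright.

```cpp
#include <vector>

// Hypothetical, simplified stand-in for a monitored server.
// This is not MaxScale's MXS_MONITORED_SERVER.
struct Server
{
    char name;      // label from the commit message (A, B or C)
    int  node_id;   // id of this server
    int  master_id; // id of the server it replicates from, < 1 if none
};

// Return true if a server with the given id is configured into the monitor.
// Plays the role that getServerByNodeId(...) != NULL plays in the patch.
bool is_monitored(const std::vector<Server>& monitored, int node_id)
{
    for (const Server& s : monitored)
    {
        if (s.node_id == node_id)
        {
            return true;
        }
    }
    return false;
}

// Condition before the patch: only servers without any master qualify.
bool is_root_candidate_old(const Server& s)
{
    return s.master_id < 1;
}

// Condition after the patch: a server also qualifies when its master is
// external, i.e. not one of the monitored servers.
bool is_root_candidate(const std::vector<Server>& monitored, const Server& s)
{
    return s.master_id < 1 || !is_monitored(monitored, s.master_id);
}

// Assumed end state: A and B replicate from external E (id 4, not listed),
// C replicates from B. The ids are illustrative only.
std::vector<Server> example_cluster()
{
    return { {'A', 1, 4}, {'B', 2, 4}, {'C', 3, 2} };
}
```

Under these assumptions the old check rejects both A and B because each has a non-zero `master_id`, while the new check accepts them because id 4 is not among the monitored servers. C is rejected by both checks since its master B is monitored.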