Fix master loss on split cluster

When four servers (A, B, C and E, where E and A replicate from each other and A is the master for B and C) form a cluster and only three of them (A, B and C) are configured into MaxScale, a failover operation from A to B (making B the current master) and a restart of A causes B to lose its master status.

The following diagram illustrates the state of the cluster at the end of the process described above.

          +----------------------+
          |        +---+         |
      +------------+ B <-+       |
    +-v-+ |        +---+ |       |
    | E | |              |       |
    +-^-+ |  +---+     +-+-+     |
      +------+ A |     | C |     |
          |  +---+     +---+     |
          |                      |
          +----------------------+

The external server E was not correctly ignored in the replication topology generation, causing both A and B to be seen as the lowest slave nodes in the tree. From a theoretical point of view this is the correct interpretation, as there are two distinct trees and neither of them contains any true masters.

In practice, MaxScale should treat any servers that replicate from an external master as root level master nodes. Doing this guarantees that they are labeled as masters if they have slaves replicating from them.
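
The rule described above can be sketched in isolation. The following is a minimal, self-contained illustration and not the monitor's actual code: the `Server` struct and the functions `is_monitored`, `is_root_candidate`, `has_slave` and `find_masters` are hypothetical stand-ins for MaxScale's own types and helpers, and the numeric node ids are assumptions chosen for the example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a monitored server (not MXS_MONITORED_SERVER).
struct Server
{
    int node_id;    // id of this server, assumed to be > 0
    int master_id;  // id of the server it replicates from, < 1 if none
};

// True if a server with the given id belongs to the monitored set.
bool is_monitored(const std::vector<Server>& servers, int node_id)
{
    for (const Server& s : servers)
    {
        if (s.node_id == node_id)
        {
            return true;
        }
    }
    return false;
}

// A server is a root candidate if it has no master, or if its master
// is an external server that this monitor does not know about.
bool is_root_candidate(const std::vector<Server>& servers, const Server& s)
{
    assert(s.node_id > 0);
    return s.master_id < 1 || !is_monitored(servers, s.master_id);
}

// True if some monitored server replicates from the given node.
bool has_slave(const std::vector<Server>& servers, int node_id)
{
    for (const Server& s : servers)
    {
        if (s.master_id == node_id)
        {
            return true;
        }
    }
    return false;
}

// Ids of the servers that would be labeled as masters: root candidates
// that have at least one monitored slave replicating from them.
std::vector<int> find_masters(const std::vector<Server>& servers)
{
    std::vector<int> masters;
    for (const Server& s : servers)
    {
        if (is_root_candidate(servers, s) && has_slave(servers, s.node_id))
        {
            masters.push_back(s.node_id);
        }
    }
    return masters;
}
```

With A, B and C given the assumed ids 1, 2 and 3, and the external server E the id 4, the state in the diagram is `{{1, 4}, {2, 4}, {3, 2}}`. Under this sketch `find_masters` selects only B, because A and B are both root candidates but only B has a monitored slave. A check on `master_id < 1` alone would find no root candidate at all in this layout.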
2018-02-07 21:34:47 +02:00
commit b059d78a30
parent 2504ff19b3
1 changed file with 5 additions and 1 deletion
--- a/server/modules/monitor/mariadbmon/mysql_mon.cc
+++ b/server/modules/monitor/mariadbmon/mysql_mon.cc
@@ -2847,7 +2847,11 @@ static MXS_MONITORED_SERVER *get_replication_tree(MXS_MONITOR *mon, int num_serv
        current = ptr->server;

        node_id = current->master_id;
-        if (node_id < 1)
+
+        /** Either this node doesn't replicate from a master or the master
+         * where it replicates from is not configured to this monitor. */
+        if (node_id < 1 ||
+            getServerByNodeId(mon->monitored_servers, node_id) == NULL)
        {
            MXS_MONITORED_SERVER *find_slave;
            find_slave = getSlaveOfNodeId(mon->monitored_servers, current->node_id, ACCEPT_DOWN);