Fix master loss on split cluster

When four servers (A, B, C and E, where E and A replicate from each other
and A is the master for B and C) form a cluster and only three of them (A,
B and C) are configured into MaxScale, a failover operation from A to B
(making B the current master) followed by a restart of A causes B to lose
its master status.

The following diagram illustrates the state of the cluster at the end of
the process described above.

      +----------------------+
      |        +---+         |
  +------------+ B <-+       |
+-v-+ |        +---+ |       |
| E | |              |       |
+-^-+ |  +---+     +-+-+     |
  +------+ A |     | C |     |
      |  +---+     +---+     |
      |                      |
      +----------------------+

The external server E was not correctly ignored in the replication
topology generation, causing both A and B to be seen as the lowest slave
nodes in the tree. From a theoretical point of view this is the correct
interpretation, as there are two distinct trees and neither of them
contains any true masters.

In practice, MaxScale should treat any servers that replicate from an
external master as root level master nodes. Doing this guarantees that they
are labeled as masters if they have slaves replicating from them.
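
The rule above can be sketched in isolation. The following is a minimal,
self-contained illustration and not the actual MaxScale code: the
demo_server_t type, the helper names and the numeric IDs are hypothetical,
and the master_id values reflect one reading of the diagram (A and B both
replicate from the unmonitored E, C replicates from B).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a monitored server. */
typedef struct
{
    const char *name;
    int node_id;   /* ID of this server */
    int master_id; /* ID of the server it replicates from, < 1 if none */
} demo_server_t;

/* Assumed final state: only A, B and C are monitored. E (ID 4) is external. */
static const demo_server_t demo_servers[] =
{
    {"A", 1, 4}, /* replicates from external E */
    {"B", 2, 4}, /* replicates from external E */
    {"C", 3, 2}, /* replicates from B */
};

/* Return the monitored server with the given ID, or NULL if it is not monitored. */
static const demo_server_t *find_by_node_id(const demo_server_t *servers, size_t n, int node_id)
{
    assert(servers != NULL);
    for (size_t i = 0; i < n; i++)
    {
        if (servers[i].node_id == node_id)
        {
            return &servers[i];
        }
    }
    return NULL;
}

/* Old rule: a server is a tree root only if it replicates from nothing. */
static bool is_root_old(const demo_server_t *s)
{
    return s->master_id < 1;
}

/* New rule: a server whose master is not monitored is also a tree root. */
static bool is_root_new(const demo_server_t *servers, size_t n, const demo_server_t *s)
{
    return s->master_id < 1 || find_by_node_id(servers, n, s->master_id) == NULL;
}

/* True if some monitored server replicates from s. */
static bool has_slave(const demo_server_t *servers, size_t n, const demo_server_t *s)
{
    for (size_t i = 0; i < n; i++)
    {
        if (servers[i].master_id == s->node_id)
        {
            return true;
        }
    }
    return false;
}

/* A root is labeled as a master only if it has at least one slave. */
static bool is_master_new(const demo_server_t *servers, size_t n, const demo_server_t *s)
{
    return is_root_new(servers, n, s) && has_slave(servers, n, s);
}
```

Under these assumptions the old rule finds no root at all, while the new
rule treats both A and B as roots and labels only B as a master, because C
is the only monitored slave and it replicates from B.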
Markus Mäkelä 2018-02-07 21:34:47 +02:00
parent 2504ff19b3
commit b059d78a30

@@ -2847,7 +2847,11 @@ static MXS_MONITORED_SERVER *get_replication_tree(MXS_MONITOR *mon, int num_serv
         current = ptr->server;
         node_id = current->master_id;
-        if (node_id < 1)
+        /** Either this node doesn't replicate from a master or the master
+         * where it replicates from is not configured to this monitor. */
+        if (node_id < 1 ||
+            getServerByNodeId(mon->monitored_servers, node_id) == NULL)
         {
             MXS_MONITORED_SERVER *find_slave;
             find_slave = getSlaveOfNodeId(mon->monitored_servers, current->node_id, ACCEPT_DOWN);