From f8898a15620a26612e2e018003f8fa157f34cb36 Mon Sep 17 00:00:00 2001
From: Esa Korhonen <esa.korhonen@mariadb.com>
Date: Fri, 20 Jul 2018 15:14:44 +0300
Subject: [PATCH] Update MariaDB-Monitor documentation

Added explanation on master selection.
---
 Documentation/Monitors/MariaDB-Monitor.md | 53 +++++++++++++++++++++--
 1 file changed, 49 insertions(+), 4 deletions(-)
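The four cases in the new "Master selection" section under which an existing master becomes invalid can be summarized as a single predicate. The Python below is a minimal sketch under stated assumptions, not MaxScale's implementation. `MasterSnapshot`, its fields and `master_still_valid` are hypothetical names. Case 2 follows the section's "more than *failcount* passes" wording. Case 4 is read as "the master left its group, or its current group replicates from a server outside it".

```python
from dataclasses import dataclass


@dataclass
class MasterSnapshot:
    """Hypothetical view of the current master on one monitor pass."""
    read_only: bool
    down_passes: int                 # consecutive monitor passes the server has been down
    replicates_in_cluster: bool      # replicating from another monitored server
    in_multimaster_group: bool       # member of a cyclical replication group
    group_has_external_master: bool  # that group replicates from a server outside it


def master_still_valid(prev: MasterSnapshot, cur: MasterSnapshot,
                       failcount: int, auto_failover: bool) -> bool:
    """Return False if any of the four documented invalidation cases applies."""
    # Case 1: the master is unwritable.
    if cur.read_only:
        return False
    # Case 2: down for more than `failcount` passes while failover is disabled.
    if cur.down_passes > failcount and not auto_failover:
        return False
    # Case 3: it did not replicate from the cluster before, but now it does.
    if not prev.replicates_in_cluster and cur.replicates_in_cluster:
        return False
    # Case 4 (assumed reading): it left its multimaster group, or its group
    # now replicates from a server outside the group.
    if prev.in_multimaster_group and not cur.in_multimaster_group:
        return False
    if cur.in_multimaster_group and cur.group_has_external_master:
        return False
    return True
```

With `auto_failover=True`, this check never invalidates a master that stays down. As the `failcount` text explains, failover takes over in that case.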

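When a new master must be chosen, the preference described in the same section could look roughly like this: most slaves across replication layers, running and writable only, ties broken by position in `servers`. This is again a sketch. `Server`, its `master` field, `count_slaves` and `select_master` are hypothetical names. Slaves are counted whether or not they are running, and the real monitor may apply further conditions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Server:
    """Hypothetical monitor view of one backend."""
    name: str
    running: bool                # monitor could connect
    read_only: bool
    master: Optional[str] = None  # name of the server this one replicates from


def count_slaves(root: str, servers: list[Server]) -> int:
    """Count servers replicating from `root`, directly or through intermediate layers."""
    children: dict[str, list[str]] = {}
    for s in servers:
        if s.master is not None:
            children.setdefault(s.master, []).append(s.name)
    seen = {root}
    stack = [root]
    while stack:  # depth-first walk down the replication tree
        for child in children.get(stack.pop(), []):
            if child not in seen:  # guards against cycles (multimaster groups)
                seen.add(child)
                stack.append(child)
    return len(seen) - 1  # exclude the root itself


def select_master(servers: list[Server]) -> Optional[str]:
    """Pick the running, writable server with the most slaves, or None."""
    best_name, best_count = None, -1
    for s in servers:  # configuration order, so the first of tied servers wins
        if not s.running or s.read_only:
            continue
        n = count_slaves(s.name, servers)
        if n > best_count:  # strictly greater: ties keep the earlier server
            best_name, best_count = s.name, n
    return best_name
```

Candidates are visited in `servers` order, and the choice is replaced only on a strictly larger slave count, so the first of several tied servers wins. In a multimaster ring every member reaches all the others, so they tie and the earliest-listed member is picked.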
diff --git a/Documentation/Monitors/MariaDB-Monitor.md b/Documentation/Monitors/MariaDB-Monitor.md
index 67ce52140..7a89873b1 100644
--- a/Documentation/Monitors/MariaDB-Monitor.md
+++ b/Documentation/Monitors/MariaDB-Monitor.md
@@ -5,7 +5,45 @@ Up until MariaDB MaxScale 2.2.0, this monitor was called _MySQL Monitor_.
 ## Overview
 
 The MariaDB Monitor monitors a Master-Slave replication cluster. It monitors the
-state of the backends and assigns master and slave roles.
+state of the backends and assigns server roles such as master and slave, which
+are used by the routers when deciding where to route a query. It can also modify
+the replication cluster by performing failover, switchover and rejoin. Backend
+server versions older than MariaDB/MySQL 5.5 are not supported.
+
+## Master selection
+
+Only one backend can be master at any given time. When a master has been
+selected, the monitor prefers to stick with the choice even if other potential
+masters are available. Only if the current master is clearly unsuitable does the
+monitor try to select another master. An existing master becomes invalid if:
+
+1. It is unwritable (*read_only* is on).
+2. It has been down for more than *failcount* monitor passes and automatic
+failover is disabled.
+3. It did not previously replicate from another server in the cluster but it
+is now replicating.
+4. It was previously part of a multimaster group but is no longer, or the
+multimaster group is replicating from a server not in the group.
+
+Cases 1 and 2 cover situations in which the master server is genuinely invalid
+and can no longer act as master. Cases 3 and 4 are less severe: the topology has
+changed significantly, so the master should be re-selected, although the current
+master may still turn out to be the best choice.
+
+The master change described above is not the same as the failover described in
+the section
+[Failover, switchover and auto-rejoin](#failover,-switchover-and-auto-rejoin).
+A master change only modifies the server roles inside MaxScale; it does not
+modify the replication topology. For this reason, case 2 applies only when
+automatic failover is off, since otherwise failover handles the failed master.
+
+Master selection prefers the server with the most slaves, counting slaves across
+multiple replication layers (slaves of slaves also count). A master must also be
+running (successfully connected) and *read_only* must be off. Servers in a
+cyclical replication topology (multimaster group) are interpreted as having all
+the other servers in the group as slaves. Even within a multimaster group, only
+one server is selected as the overall master. When multiple servers are tied for
+master status, the server listed first in the monitor's `servers` parameter wins.
 
 ## Configuration
 
@@ -131,9 +169,16 @@ replicate from. In most cases this should be left on.
 
 ### `failcount`
 
-Number of failures that must consecutively occur on a failed master before an
-automatic failover triggers. The default value is 5 failures. Automatic failover
-must be enabled for this effect (`auto_failover=true`).
+Number of consecutive monitor passes a master server must be down before it is
+considered failed. At this point, automatic failover is performed if enabled
+(`auto_failover=true`). If automatic failover is off, the monitor instead
+searches for another server to fulfill the master role. See section
+[Master selection](#master-selection)
+for more details. Changing the master may break replication, as queries could be
+routed to a server that lacks events written to the previous master. To prevent
+this, avoid having multiple valid master servers in the cluster.
+
+The default value is 5 failures.
 
 ### `allow_cluster_recovery`