From 06e16954c4910a8be3ddc76619f97c05c6ba17d0 Mon Sep 17 00:00:00 2001
From: Esa Korhonen <esa.korhonen@mariadb.com>
Date: Fri, 8 Dec 2017 12:26:07 +0200
Subject: [PATCH] Add documentation for switchover, failover and rejoin

---
 Documentation/Monitors/MySQL-Monitor.md | 259 +++++++++++++-----------
 1 file changed, 142 insertions(+), 117 deletions(-)

diff --git a/Documentation/Monitors/MySQL-Monitor.md b/Documentation/Monitors/MySQL-Monitor.md
index 450c0f944..47a6ddd82 100644
--- a/Documentation/Monitors/MySQL-Monitor.md
+++ b/Documentation/Monitors/MySQL-Monitor.md
@@ -184,6 +184,10 @@ the master.
 The formula for calculating the actual number of milliseconds before the server
 is labelled as the master is `monitor_interval * failcount`.
 
+If automatic failover is enabled (`auto_failover=true`), this setting also
+controls how many times the master server must fail to respond before failover
+begins.
+
 ### `allow_cluster_recovery`
 
 Allow recovery after the cluster has dropped down to one server. This feature
@@ -214,60 +218,163 @@ assigned the _Slave_ status which allows them to be used like normal slave
 servers. When the option is disabled, the servers will only receive the _Slave
 of External Server_ status and they will not be used.
 
-### `failover`
+## Failover, switchover and auto-rejoin
+
+Starting with MaxScale 2.2.1, MySQL Monitor supports replication cluster
+modification. The operations implemented are: _failover_ (replacing a failed
+master), _switchover_ (swapping a slave with a running master) and _rejoin_
+(joining a standalone server to the cluster). The features and the parameters
+controlling them are presented in this section.
+
+Both failover and switchover can be activated manually through MaxAdmin.
+Failover selects the new master server automatically, whereas switchover
+requires the user to designate both the new master and the current master.
+Example commands are below:
+
+```
+call command mysqlmon failover MySQL-Monitor
+call command mysqlmon switchover MySQL-Monitor SlaveServ3 MasterServ
+```
+
+Failover can also activate automatically if `auto_failover` is on. Activation
+begins when the master has been down for the number of monitor iterations
+defined in `failcount`.
+
+When `auto_rejoin` is active, the monitor will try to rejoin standalone servers
+and slaves replicating from the wrong master (any server other than the cluster
+master). These servers are redirected to replicate from the correct master
+server, forcing the replication topology to a 1-master-N-slaves configuration.
+
+All three features require that the monitor user (`user`) has the SUPER
+privilege. In addition, the monitor needs to know which username and password a
+slave should use when starting replication. These are given in
+`replication_user` and `replication_password`.
+
+### Limitations
+
+Switchover and failover only understand simple topologies. They will not work if
+the cluster has multiple masters, relay masters, or if the topology is circular.
+The server cluster is assumed to be well-behaved, with no significant
+replication lag, and all commands that modify the cluster are assumed to
+complete in seconds (faster than `backend_read_timeout` and `backend_write_timeout`).
+
+The backends must all use GTID-based replication, and the domain id should not
+change during a switchover or failover. Master and slaves must have
+well-behaving GTIDs: no extra events on slave servers.
+
+### Configuration parameters
+
+#### `auto_failover`
 
 Enable automated master failover. This parameter expects a boolean value and the
 default value is false.
 
-When the failover functionality is enabled, traditional MariaDB Master-Slave
-clusters will automatically elect a new master if the old master goes down. The
-failover functionality will not take place when MaxScale is configured as a
-passive instance. For details on how MaxScale behaves in passive mode, see the
-following documentation of `failover_timeout`.
+When automatic failover is enabled, traditional MariaDB Master-Slave clusters
+will automatically elect a new master if the old master goes down and stays down
+for the number of iterations given in `failcount`. Failover will not take place
+when MaxScale is configured as a passive instance. For details on how MaxScale
+behaves in passive mode, see the documentation on `failover_timeout` below.
 
 If an attempt at failover fails or multiple master servers are detected, an
-error is logged and the failover functionality is disabled. If this happens, the
-cluster must be fixed manually and the failover needs to be re-enabled via the
-REST API or MaxAdmin.
+error is logged and automatic failover is disabled. If this happens, the cluster
+must be fixed manually and automatic failover must be re-enabled via the REST
+API or MaxAdmin.
 
-**Note:** The monitor user must have the SUPER privilege if the failover feature
-  is enabled.
+The monitor user must have the SUPER privilege for failover to work.
 
-### `failover_timeout`
+#### `auto_rejoin`
 
-The timeout for the cluster failover in seconds. The default value is 90
+Enable automatic joining of servers to the cluster. This parameter expects a
+boolean value and the default value is false.
+
+When enabled, the monitor will attempt to direct standalone servers and servers replicating from a relay master to the main cluster master server, enforcing a 1-master-N-slaves configuration.
+
+For example, consider the following event series.
+
+1. Slave A goes down
+2. Master goes down and a failover is performed, promoting Slave B
+3. Slave A comes back
+
+Slave A is still trying to replicate from the downed master, since it wasn't online during failover. If `auto_rejoin` is on, Slave A will quickly be redirected to Slave B, the current master.
+
+#### `replication_user` and `replication_password`
+
+The username and password of the replication user. These are given as the values
+for `MASTER_USER` and `MASTER_PASSWORD` whenever a `CHANGE MASTER TO` command is
+executed.
+
+Both `replication_user` and `replication_password` parameters must be defined if
+a custom replication user is used. If neither of the parameters is defined, the
+`CHANGE MASTER TO` command will use the monitor credentials for the replication
+user.
+
+The credentials used for replication must have the `REPLICATION SLAVE`
+privilege.
+
+#### `failover_timeout`
+
+Time limit for the cluster failover in seconds. The default value is 90
 seconds.
 
 If no successful failover takes place within the configured time period, a
-message is logged and the failover functionality is disabled.
+message is logged and automatic failover is disabled.
 
 This parameter also controls how long a MaxScale instance that has transitioned
 from passive to active will wait for a failover to take place after an apparent
 loss of a master server. If no new master server is detected within the
-configured time period, the failover will be initiated again.
+configured time period, failover will be initiated again.
 
-### `switchover`
+#### `verify_master_failure`
 
-Enable switchover via MaxScale. This parameter expects a boolean value and
-the default value is false.
+Enable master failure verification for automatic failover. This parameter
+expects a boolean value and the feature is enabled by default.
 
-When the switchover functionality is enabled, a REST API endpoint will be
-made available, using which switchover may be performed. The endpoint will
-be available irrespective of whether MaxScale is in active or passive mode,
-but switchover will only be attempted if MaxScale is in active mode and an
-error logged if an attempt is made when MaxScale is in passive mode.
-Switchover may also be triggered from MaxAdmin and the same rules regarding
-active/passive holds.
+The failure of a master can be verified by checking whether the slaves are still
+connected to the master. The timeout for master failure verification is
+controlled by the `master_failure_timeout` parameter.
 
-It is safe to perform switchover even with the failover functionality
-enabled, as MaxScale will disable the failover behaviour for the duration
-of the switchover.
+#### `master_failure_timeout`
 
-Only if the switchover succeeds, will the failover functionality be re-enabled.
-Otherwise it will remain disabled and must be turned on manually via the REST
-API or MaxAdmin.
+This parameter controls the period of time, in seconds, that the monitor must
+wait before it can declare that the master has failed. The default value is 10
+seconds. For failover to activate, the `failcount` requirement must also be met.
 
-When switchover is iniated via the REST-API, the URL path looks as follows:
+The failure of a master is verified by tracking when the last change to the
+relay log was done and when the last replication heartbeat was received. If the
+period of time between the last received event and the time of the check exceeds
+the configured value, the slave's connection to the master is considered to be
+broken.
+
+When all slaves of a failed master are no longer connected to the master, the
+master failure is verified and the failover can be safely performed.
+
+If the slaves lose their connections to the master before the configured timeout
+is exceeded, the failover is performed immediately. This allows a faster
+failover when the master server crashes causing immediate disconnection of the
+network connections.
+
+#### `switchover_timeout`
+
+Time limit for cluster switchover in seconds. The default value is 90
+seconds.
+
+If no successful switchover takes place within the configured time period, a
+message is logged and automatic failover is disabled, even if it was enabled
+before the switchover attempt. This prevents further modifications to the
+misbehaving cluster.
+
+### Manual switchover and failover
+
+Both failover and switchover can be activated manually through the REST API or
+MaxAdmin. The commands are only performed when MaxScale is in active mode.
+
+It is safe to perform switchover or failover even with `auto_failover` on, since
+the automatic operation cannot happen simultaneously with the manual one.
+
+If a switchover or failover fails, automatic failover is disabled. It can be
+turned back on manually via the REST API or MaxAdmin.
+
+When switchover is initiated via the REST API, the URL path is:
 ```
 /v1/maxscale/mysqlmon/switchover?<monitor-instance>&<new-master>&<current-master>
 ```
@@ -291,94 +398,12 @@ path for making `server4` the new master would be:
 /v1/maxscale/mysqlmon/switchover?Cluster1&server4&server2
 ```
 
-**Note:** The monitor user must have the SUPER privilege if the switchover
-  feature is enabled.
-
-### `switchover_script`
-
-*NOTE* By default, MariaDB MaxScale uses the MariaDB provided switchover
-script, so `switchover_script` need not be specified.
-
-This command will be executed when MaxScale has been told to perform a
-switchover, either via MaxAdmin or the REST-API. The parameter should be an
-absolute path to a command or the command should be in the executable path.
-The user which is used to run MaxScale should have execution rights to the
-file itself and the directory it resides in.
-
+The REST API path for manual failover is similar, although the `<new-master>`
+and `<current-master>` fields are left out.
 ```
-script=/home/user/myswitchover.sh current_master=$CURRENT_MASTER new_master=$NEW_MASTER
+/v1/maxscale/mysqlmon/failover?Cluster1
 ```
 
-In addition to the substitutions documented in
-[Common Monitor Parameters](./Monitor-Common.md)
-the following substitutions will be made to the parameter value:
-
-* `$CURRENT_MASTER` will be replaced with the IP and port of the current
-  master. If the is no current master, the value will be `none`.
-* `$NEW_MASTER` will be replaced with the IP and port of the server that
-  should be made into the new master.
-
-The script should return 0 for success and a non-zero value for failure.
-
-### `switchover_timeout`
-
-The timeout for the cluster switchover in seconds. The default value is 90
-seconds.
-
-If no successful switchover takes place within the configured time period,
-a message is logged and the failover (not switchover) functionality will not
-be enabled, even if it was enabled before the switchover attempt.
-
-### `replication_user`
-
-The username of the replication user. This is given as the value for
-`MASTER_USER` whenever a `CHANGE_MASTER_TO` command is executed.
-
-Both `replication_user` and `replication_password` parameters must be defined if
-a custom replication user is used. If neither of the parameters is defined, the
-`CHANGE MASTER TO` command will use the monitor credentials for the replication
-user.
-
-The credentials used for replication must have the `REPLICATION SLAVE`
-privilege.
-
-### `replication_password`
-
-The password of the replication user. This is given as the value for
-`MASTER_USER` whenever a `CHANGE_MASTER_TO` command is executed.
-
-See `replication_user` parameter documentation for details about the use of this
-parameter.
-
-### `verify_master_failure`
-
-Enable master failure verification for failover. This parameter expects a
-boolean value and the feature is enabled by default.
-
-The failure of a master can be verified by checking whether the slaves are still
-connected to the master. The timeout for master failure verification is
-controlled by the `master_failure_timeout` parameter.
-
-### `master_failure_timeout`
-
-This parameter controls the period of time, in seconds, that the monitor must
-wait before it can declare that the master has failed. The default value is 10
-seconds.
-
-The failure of a master is verified by tracking when the last change to the
-relay log was done and when the last replication heartbeat was received. If the
-period of time between the last received event and the time of the check exceeds
-the configured value, the slave's connection to the master is considered to be
-broken.
-
-When all slaves of a failed master are no longer connected to the master, the
-master failure is verified and the failover can be safely performed.
-
-If the slaves lose their connections to the master before the configured timeout
-is exceeded, the failover is performed immediately. This allows a faster
-failover when the master server crashes causing immediate disconnection of the
-the network connections.
-
 ## Using the MySQL Monitor With Binlogrouter
 
 Since MaxScale 2.2 it's possible to detect a replication setup