MXS-1493: Improve master failure detection

The master failure can now be verified by checking when the slaves are
connected to the master. If the slaves do not receive any events from the
master, the connections are considered as down after a configurable limit.

Added two parameters for controlling whether the check is done and for how
long the monitor waits before doing the failover.
Markus Mäkelä
2017-10-26 12:16:55 +03:00
parent 26b47d0b90
commit 0be39b8545
3 changed files with 78 additions and 0 deletions

@@ -358,6 +358,35 @@ The password of the replication user. This is given as the value for
See `replication_user` parameter documentation for details about the use of this
parameter.

### `verify_master_failure`

Enable master failure verification for failover. This parameter expects a
boolean value and the feature is enabled by default.

The failure of a master can be verified by checking whether the slaves are still
connected to the master. The timeout for master failure verification is
controlled by the `master_failure_timeout` parameter.

### `master_failure_timeout`

This parameter controls the period of time, in seconds, that the monitor must
wait before it can declare that the master has failed. The default value is 10
seconds.

The failure of a master is verified by tracking when the last change to the
relay log was done and when the last replication heartbeat was received. If the
period of time between the last received event and the time of the check exceeds
the configured value, the slave's connection to the master is considered to be
broken.

When all slaves of a failed master are no longer connected to the master, the
master failure is verified and the failover can be safely performed.

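
The timeout rule described above can be sketched in Python as follows. This is
an illustrative sketch only, not the monitor's actual implementation: the
function name, its arguments and the use of plain timestamps in seconds are
assumptions made for the example.

```python
def connection_considered_broken(last_relay_log_change, last_heartbeat,
                                 now, master_failure_timeout=10):
    """Return True if a slave's connection to the master counts as broken.

    All times are in seconds on the same clock. The names are hypothetical.
    """
    # The most recent sign of life is the later of the two tracked events.
    last_event = max(last_relay_log_change, last_heartbeat)
    # Broken only when the silence strictly exceeds the configured timeout.
    return (now - last_event) > master_failure_timeout
```

With the default of 10 seconds, a slave whose last event arrived 15 seconds
before the check would be treated as disconnected, while one whose last event
arrived 5 seconds before the check would not.
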
If the slaves lose their connections to the master before the configured timeout
is exceeded, the failover is performed immediately. This allows a faster
failover when the master server crashes, causing immediate disconnection of the
network connections.
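
The overall decision can be sketched in Python as shown here. It is a
simplified illustration under stated assumptions, not the monitor's code: the
`SlaveState` type, its fields and the `master_failure_verified` function are
hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass
class SlaveState:
    # Hypothetical fields; a real monitor would read these from each slave.
    connected_to_master: bool
    seconds_since_last_event: float


def master_failure_verified(slaves, master_failure_timeout=10.0):
    """Return True when no slave can still be considered connected."""
    def lost_master(slave):
        # An explicit disconnect counts immediately, without waiting.
        if not slave.connected_to_master:
            return True
        # Otherwise the slave must have been silent longer than the timeout.
        return slave.seconds_since_last_event > master_failure_timeout

    return all(lost_master(s) for s in slaves)
```

In this sketch, a set of slaves that have all dropped their connections
verifies the failure at once, whereas a single slave that is still connected
and received an event 3 seconds ago blocks the failover until the timeout is
exceeded.
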
## Using the MySQL Monitor With Binlogrouter
Since MaxScale 2.2 it's possible to detect a replication setup