Remove old "detect_standalone_master"-feature, update documentation

The auto_failover feature is a more reliable solution and should be used instead. Several
unused parameters were removed, although they can still be defined in the config
file. Updated documentation on the relevant parts.
Esa Korhonen 2018-07-04 15:16:01 +03:00
parent f7538db3b7
commit 936bcde135
6 changed files with 42 additions and 234 deletions

@ -4,9 +4,8 @@ Up until MariaDB MaxScale 2.2.0, this monitor was called _MySQL Monitor_.
## Overview
The MariaDB Monitor is a monitoring module for MaxScale that monitors a Master-Slave
replication cluster. It assigns master and slave roles inside MaxScale according to
the actual replication tree in the cluster.
The MariaDB Monitor monitors a Master-Slave replication cluster. It monitors the
state of the backends and assigns master and slave roles.
## Configuration
@ -20,14 +19,14 @@ module=mariadbmon
servers=server1,server2,server3
user=myuser
passwd=mypwd
```
Note that from MaxScale 2.2.1 onwards, the module name is `mariadbmon`; up until
MaxScale 2.2.0 it was `mysqlmon`. The name `mysqlmon` has been deprecated but can
still be used, although it will cause a warning to be logged.
The user requires the REPLICATION CLIENT privilege to successfully monitor the
state of the servers.
From MaxScale 2.2.1 onwards, the module name is `mariadbmon` instead of
`mysqlmon`. The old name can still be used.
The `user` requires the REPLICATION CLIENT privilege to successfully monitor the
state of the servers. SUPER privilege is required for cluster manipulation
features such as failover.
```
MariaDB [(none)]> grant replication client on *.* to 'maxscale'@'maxscalehost';
@ -49,16 +48,17 @@ A boolean value which controls if replication lag between the master and the
slaves is monitored. This allows the routers to route read queries to only
slaves that are up to date. Default value for this parameter is _false_.
To detect the replication lag, MaxScale uses the _maxscale_schema.replication_heartbeat_
table. This table is created on the master server and it is updated at every heartbeat
with the current timestamp. The updates are then replicated to the slave servers
and when the replicated timestamp is read from the slave servers, the lag between
the slave and the master can be calculated.
To measure the replication lag, MaxScale uses the
*maxscale_schema.replication_heartbeat* table. This table is created on the
master server and it is updated at every heartbeat with the current timestamp.
The updates are then replicated to the slave servers and when the replicated
timestamp is read from the slave servers, the lag between the slave and the
master is calculated.
The monitor user requires INSERT, UPDATE, DELETE and SELECT permissions on the
maxscale_schema.replication_heartbeat table and CREATE permissions on the
maxscale_schema database. The monitor user will always try to create the database
and the table if they do not exist.
*maxscale_schema.replication_heartbeat* table and CREATE permissions on the
maxscale_schema database. The monitor creates the database and the table if they
do not exist.
### `detect_stale_master`
@ -97,38 +97,11 @@ detect_stale_slave=true
### `mysql51_replication`
Enable support for MySQL 5.1 replication monitoring. This is needed if a MySQL
server older than 5.5 is used as a slave in replication.
```
mysql51_replication=true
```
Deprecated and unused as of MaxScale 2.3. Can be defined but is ignored.
### `multimaster`
Detect multi-master replication topologies. This feature is disabled by default.
When enabled, the multi-master detection looks for the root master servers in
the replication clusters. These masters can be found by detecting cycles in the
graph created by the servers. When a cycle is detected, it is assigned a master
group ID. Every master in a master group will receive the Master status. The
special group ID 0 is assigned to all servers which are not a part of a
multi-master replication cycle.
If one or more masters in a group has the `@@read_only` system variable set to
`ON`, those servers will receive the Slave status even though they are in the
multi-master group. Slave servers with `@@read_only` disabled will never receive
the master status.
By setting the servers into read-only mode, the user can control which
server receive the master status. To do this:
- Enable `@@read_only` on all servers (preferably through the configuration file)
- Manually disable `@@read_only` on the server which should be the master
This functionality is similar to the [Multi-Master Monitor](MM-Monitor.md)
functionality. The only difference is that the MariaDB monitor will also detect
traditional Master-Slave topologies.
Deprecated and unused as of MaxScale 2.3. Can be defined but is ignored.
### `ignore_external_masters`
@ -149,89 +122,22 @@ External Server, Running` labels will instead get the `Master, Running` labels.
### `detect_standalone_master`
Detect standalone master servers. This feature takes a boolean parameter and
from MaxScale 2.2.1 onwards is enabled by default. Up until MaxScale 2.2.0 it
was disabled by default. In MaxScale 2.1.0, this parameter was called `failover`.
Detect standalone master servers. This feature takes a boolean parameter and is
enabled by default.
This parameter is intended to be used with simple, two node master-slave pairs
where the failure of the master can be resolved by "promoting" the slave as the
new master. Normally this is done by using an external agent of some sort
(possibly triggered by MaxScale's monitor scripts), like
[MariaDB Replication Manager](https://github.com/tanji/replication-manager)
or [MHA](https://code.google.com/p/mysql-master-ha/).
When the number of running servers in the cluster drops down to one, MaxScale
cannot be absolutely certain whether the last remaining server is a master or a
slave. At this point, MaxScale will try to deduce the type of the server by
looking at the system variables of the server in question.
By default, MaxScale will only attempt to deduce if the server can be used as a
slave server (controlled by the `detect_stale_slave` parameter). When the
`detect_standalone_master` mode is enabled, MaxScale will also attempt to deduce
whether the server can be used as a master server. This is done by checking that
the server is not in read-only mode and that it is not configured as a slave.
This mode in mariadbmon is completely passive in the sense that it does not modify
the cluster or any of the servers in it. It only labels the last remaining
server in a cluster as the master server.
Before a server is labelled as a standalone master, the following conditions must
have been met:
- Previous attempts to connect to other servers in the cluster have failed,
controlled by the `failcount` parameter
- There is only one running server among the monitored servers
- The value of the `@@read_only` system variable is set to `OFF`
In 2.1.1, the following additional condition was added:
- The last running server is not configured as a slave
If the value of the `allow_cluster_recovery` parameter is set to false, the monitor
sets all other servers into maintenance mode. This is done to prevent accidental
use of the failed servers if they came back online. If the failed servers come
back up, the maintenance mode needs to be manually cleared once replication has
been set up.
**Note**: A failover will cause permanent changes in the data of the promoted
server. Only use this feature if you know that the slave servers are capable
of acting as master servers.
This setting controls whether a standalone server can be a master. A standalone
server is a server from which no other server in the cluster is attempting to
replicate. In most cases this should be left on.
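The definition of a standalone server can be illustrated with a minimal sketch. It is not the monitor's actual implementation: `ServerModel` and `is_standalone` are hypothetical names, and the cluster is modelled only as a list of server ids with the ids each server is trying to replicate from.

```cpp
#include <vector>

// Hypothetical, simplified model of a monitored server (not a MaxScale type):
// its own id and the ids of the servers it is attempting to replicate from.
struct ServerModel
{
    int id;
    std::vector<int> replicating_from;
};

// Returns true if no other server in `cluster` is attempting to replicate
// from the server identified by `target_id`.
bool is_standalone(const std::vector<ServerModel>& cluster, int target_id)
{
    for (const ServerModel& other : cluster)
    {
        if (other.id == target_id)
        {
            continue;   // Only replication attempts by *other* servers count.
        }
        for (int master_id : other.replicating_from)
        {
            if (master_id == target_id)
            {
                return false;
            }
        }
    }
    return true;
}
```

In this model, a server with at least one slave pointing at it is not standalone, while a server that nothing replicates from is.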
### `failcount`
Number of failures that must occur on all failed servers before a standalone
server is labelled as a master. The default value is 5 failures.
The monitor will attempt to contact all servers once per monitoring cycle. When
`detect_standalone_master` is enabled, all of the failed servers must fail
_failcount_ number of connection attempts before the last server is labeled as
the master.
The formula for calculating the actual number of milliseconds before the server
is labelled as the master is `monitor_interval * failcount`.
If automatic failover is enabled (`auto_failover=true`), this setting also
controls how many times the master server must fail to respond before failover
begins.
Number of failures that must consecutively occur on a failed master before an
automatic failover triggers. The default value is 5 failures. Automatic failover
must be enabled for this effect (`auto_failover=true`).
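The "consecutive failures" rule can be sketched as a small counter. This is an illustration under stated assumptions, not the monitor's code: `MasterFailureCounter` and `min_failover_delay_ms` are hypothetical names, and the delay estimate assumes one check per `monitor_interval` and ignores any other preconditions the monitor may apply before failover.

```cpp
#include <cstdint>

// Hypothetical helper (not part of MaxScale): counts consecutive failed
// monitor checks of the master server.
class MasterFailureCounter
{
public:
    explicit MasterFailureCounter(int failcount)
        : m_failcount(failcount)
    {
    }

    // Record the result of one monitor check. A successful check resets the
    // counter, so only consecutive failures accumulate. Returns true once
    // `failcount` consecutive failures have been seen.
    bool record_check(bool master_responded)
    {
        m_failures = master_responded ? 0 : m_failures + 1;
        return m_failures >= m_failcount;
    }

private:
    int m_failcount;
    int m_failures = 0;
};

// Rough lower bound, in milliseconds, on the time between the first failed
// check and the failover trigger, assuming one check per monitor interval.
int64_t min_failover_delay_ms(int64_t monitor_interval_ms, int failcount)
{
    return monitor_interval_ms * failcount;
}
```

Under these assumptions, a 2000 ms monitor interval with the default `failcount` of 5 gives an estimate of about 10 seconds before failover begins.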
### `allow_cluster_recovery`
Allow recovery after the cluster has dropped down to one server. This feature
takes a boolean parameter is enabled by default. This parameter requires that
`detect_standalone_master` is set to true. In MaxScale 2.1.0, this parameter was
called `failover_recovery`.
When this parameter is disabled, if the last remaining server is labelled as the
master, the monitor will set all of the failed servers into maintenance
mode. When this option is enabled, the failed servers are allowed to rejoin the
cluster.
This option should be enabled only when MaxScale is used in conjunction with an
external agent that automatically reintegrates failed servers into the
cluster. One of these agents is the _replication-manager_ which automatically
configures the failed servers as new slaves of the current master.
Deprecated and unused as of MaxScale 2.3. Can be defined but is ignored.
### `enforce_read_only_slaves`

@ -463,78 +463,6 @@ static bool check_replicate_wild_ignore_table(MXS_MONITORED_SERVER* database)
return rval;
}
/**
* @brief Check whether standalone master conditions have been met
*
* This function checks whether all the conditions to use a standalone master are met. For this to happen,
* only one server must be available and other servers must have passed the configured tolerance level of
* failures.
*
* @return True if standalone master should be used
*/
bool MariaDBMonitor::standalone_master_required()
{
int candidates = 0;
for (auto iter = m_servers.begin(); iter != m_servers.end(); iter++)
{
MariaDBServer* server = *iter;
if (server->is_running())
{
candidates++;
if (server->m_read_only || !server->m_slave_status.empty() || candidates > 1)
{
return false;
}
}
else if (server->m_server_base->mon_err_count < m_failcount)
{
return false;
}
}
return candidates == 1;
}
/**
* @brief Use standalone master
*
* This function assigns the last remaining server the master status and sets all other servers into
* maintenance mode. By setting the servers into maintenance mode, we prevent any possible conflicts when
* the failed servers come back up.
*
* @return True if standalone master was set
*/
bool MariaDBMonitor::set_standalone_master()
{
bool rval = false;
for (auto iter = m_servers.begin(); iter != m_servers.end(); iter++)
{
MariaDBServer* server = *iter;
auto mon_server = server->m_server_base;
if (server->is_running())
{
if (!server->is_master() && m_warn_set_standalone_master)
{
MXS_WARNING("Setting standalone master, server '%s' is now the master.%s",
server->name(), m_allow_cluster_recovery ? "" :
" All other servers are set into maintenance mode.");
m_warn_set_standalone_master = false;
}
monitor_set_pending_status(mon_server, SERVER_MASTER | SERVER_WAS_MASTER);
monitor_clear_pending_status(mon_server, SERVER_SLAVE);
m_master = server;
rval = true;
}
else if (!m_allow_cluster_recovery)
{
server->set_status(SERVER_MAINT);
}
}
return rval;
}
/**
* Find the server with the best reach in the candidates-array. Running state or 'read_only' is ignored by
* this method.

@ -42,6 +42,7 @@ static const char CN_NO_PROMOTE_SERVERS[] = "servers_no_promotion";
static const char CN_FAILOVER_TIMEOUT[] = "failover_timeout";
static const char CN_SWITCHOVER_ON_LOW_DISK_SPACE[] = "switchover_on_low_disk_space";
static const char CN_SWITCHOVER_TIMEOUT[] = "switchover_timeout";
static const char CN_DETECT_STANDALONE_MASTER[] = "detect_standalone_master";
static const char CN_MAINTENANCE_ON_LOW_DISK_SPACE[] = "maintenance_on_low_disk_space";
// Parameters for master failure verification and timeout
static const char CN_VERIFY_MASTER_FAILURE[] = "verify_master_failure";
@ -58,7 +59,6 @@ MariaDBMonitor::MariaDBMonitor(MXS_MONITOR* monitor)
, m_cluster_topology_changed(true)
, m_cluster_modified(false)
, m_switchover_on_low_disk_space(false)
, m_warn_set_standalone_master(true)
, m_log_no_master(true)
, m_warn_failover_precond(true)
, m_warn_cannot_rejoin(true)
@ -181,11 +181,9 @@ bool MariaDBMonitor::configure(const MXS_CONFIG_PARAMETER* params)
m_detect_stale_master = config_get_bool(params, "detect_stale_master");
m_detect_stale_slave = config_get_bool(params, "detect_stale_slave");
m_detect_replication_lag = config_get_bool(params, "detect_replication_lag");
m_detect_multimaster = config_get_bool(params, "multimaster");
m_ignore_external_masters = config_get_bool(params, "ignore_external_masters");
m_detect_standalone_master = config_get_bool(params, "detect_standalone_master");
m_detect_standalone_master = config_get_bool(params, CN_DETECT_STANDALONE_MASTER);
m_failcount = config_get_integer(params, CN_FAILCOUNT);
m_allow_cluster_recovery = config_get_bool(params, "allow_cluster_recovery");
m_script = config_get_string(params, "script");
m_events = config_get_enum(params, "events", mxs_monitor_event_enum_values);
m_failover_timeout = config_get_integer(params, CN_FAILOVER_TIMEOUT);
@ -244,7 +242,7 @@ void MariaDBMonitor::diagnostics(DCB *dcb) const
dcb_printf(dcb, "\nServer information:\n-------------------\n\n");
for (auto iter = m_servers.begin(); iter != m_servers.end(); iter++)
{
string server_info = (*iter)->diagnostics(m_detect_multimaster) + "\n";
string server_info = (*iter)->diagnostics() + "\n";
dcb_printf(dcb, "%s", server_info.c_str());
}
}
@ -256,10 +254,8 @@ json_t* MariaDBMonitor::diagnostics_json() const
json_object_set_new(rval, "detect_stale_master", json_boolean(m_detect_stale_master));
json_object_set_new(rval, "detect_stale_slave", json_boolean(m_detect_stale_slave));
json_object_set_new(rval, "detect_replication_lag", json_boolean(m_detect_replication_lag));
json_object_set_new(rval, "multimaster", json_boolean(m_detect_multimaster));
json_object_set_new(rval, "detect_standalone_master", json_boolean(m_detect_standalone_master));
json_object_set_new(rval, CN_DETECT_STANDALONE_MASTER, json_boolean(m_detect_standalone_master));
json_object_set_new(rval, CN_FAILCOUNT, json_integer(m_failcount));
json_object_set_new(rval, "allow_cluster_recovery", json_boolean(m_allow_cluster_recovery));
json_object_set_new(rval, CN_AUTO_FAILOVER, json_boolean(m_auto_failover));
json_object_set_new(rval, CN_FAILOVER_TIMEOUT, json_integer(m_failover_timeout));
json_object_set_new(rval, CN_SWITCHOVER_TIMEOUT, json_integer(m_switchover_timeout));
@ -281,7 +277,7 @@ json_t* MariaDBMonitor::diagnostics_json() const
json_t* arr = json_array();
for (auto iter = m_servers.begin(); iter != m_servers.end(); iter++)
{
json_array_append_new(arr, (*iter)->diagnostics_json(m_detect_multimaster));
json_array_append_new(arr, (*iter)->diagnostics_json());
}
json_object_set_new(rval, "server_info", arr);
}
@ -491,20 +487,6 @@ void MariaDBMonitor::tick()
}
}
/* Check if need to use standalone master. TODO: Rewrite these methods. */
if (m_detect_standalone_master)
{
if (standalone_master_required())
{
// Other servers have died, set last remaining server as master
set_standalone_master();
}
else
{
m_warn_set_standalone_master = true;
}
}
if (m_master != NULL && m_master->is_master())
{
// Update cluster-wide values dependant on the current master.
@ -1273,10 +1255,10 @@ extern "C" MXS_MODULE* MXS_CREATE_MODULE()
{"detect_stale_master", MXS_MODULE_PARAM_BOOL, "true"},
{"detect_stale_slave", MXS_MODULE_PARAM_BOOL, "true"},
{"mysql51_replication", MXS_MODULE_PARAM_BOOL, "false", MXS_MODULE_OPT_DEPRECATED},
{"multimaster", MXS_MODULE_PARAM_BOOL, "false"},
{"detect_standalone_master", MXS_MODULE_PARAM_BOOL, "true"},
{"multimaster", MXS_MODULE_PARAM_BOOL, "false", MXS_MODULE_OPT_DEPRECATED},
{CN_DETECT_STANDALONE_MASTER, MXS_MODULE_PARAM_BOOL, "true"},
{CN_FAILCOUNT, MXS_MODULE_PARAM_COUNT, "5"},
{"allow_cluster_recovery", MXS_MODULE_PARAM_BOOL, "true"},
{"allow_cluster_recovery", MXS_MODULE_PARAM_BOOL, "true", MXS_MODULE_OPT_DEPRECATED},
{"ignore_external_masters", MXS_MODULE_PARAM_BOOL, "false"},
{
"script",

@ -143,14 +143,11 @@ private:
CycleInfo m_master_cycle_status; /**< Info about master server cycle from previous round */
// Replication topology detection settings
bool m_allow_cluster_recovery; /**< Allow failed servers to rejoin the cluster */
bool m_detect_replication_lag; /**< Monitor flag for MySQL replication heartbeat */
bool m_detect_multimaster; /**< Detect and handle multi-master topologies */
bool m_detect_stale_master; /**< Monitor flag for MySQL replication Stale Master detection */
bool m_detect_stale_slave; /**< Monitor flag for MySQL replication Stale Slave detection */
bool m_detect_standalone_master; /**< If standalone master are detected */
bool m_ignore_external_masters; /**< Ignore masters outside of the monitor configuration */
bool m_mysql51_replication; /**< Use MySQL 5.1 replication */
// Failover, switchover and rejoin settings
bool m_auto_failover; /**< Is automatic master failover is enabled? */
@ -174,7 +171,6 @@ private:
// Other settings
std::string m_script; /**< Script to call when state changes occur on servers */
uint64_t m_events; /**< enabled events */
bool m_warn_set_standalone_master; /**< Log a warning when setting standalone master */
bool m_log_no_master; /**< Should it be logged that there is no master */
bool m_warn_no_valid_in_cycle; /**< Log a warning when a replication cycle has no valid master */
bool m_warn_no_valid_outside_cycle; /**< Log a warning when a replication topology has no valid master
@ -196,8 +192,6 @@ private:
// Cluster discovery and status assignment methods
void update_server(MariaDBServer& server);
void find_graph_cycles();
bool standalone_master_required();
bool set_standalone_master();
void log_master_changes();
void update_gtid_domain();
void update_external_master();

@ -487,7 +487,7 @@ const char* MariaDBServer::name() const
return m_server_base->server->name;
}
string MariaDBServer::diagnostics(bool multimaster) const
string MariaDBServer::diagnostics() const
{
std::stringstream ss;
ss << "Server: " << name() << "\n";
@ -507,14 +507,14 @@ string MariaDBServer::diagnostics(bool multimaster) const
{
ss << "Gtid binlog position: " << m_gtid_binlog_pos.to_string() << "\n";
}
if (multimaster)
if (m_node.cycle != NodeData::CYCLE_NONE)
{
ss << "Master group: " << m_node.cycle << "\n";
}
return ss.str();
}
json_t* MariaDBServer::diagnostics_json(bool multimaster) const
json_t* MariaDBServer::diagnostics_json() const
{
json_t* srv = json_object();
json_object_set_new(srv, "name", json_string(name()));
@ -541,7 +541,7 @@ json_t* MariaDBServer::diagnostics_json(bool multimaster) const
json_object_set_new(srv, "gtid_io_pos",
json_string(m_slave_status[0].gtid_io_pos.to_string().c_str()));
}
if (multimaster)
if (m_node.cycle != NodeData::CYCLE_NONE)
{
json_object_set_new(srv, "master_group", json_integer(m_node.cycle));
}

@ -299,18 +299,16 @@ public:
/**
* Print server information to a json object.
*
* @param multimaster Print multimaster group
* @return Json diagnostics object
*/
json_t* diagnostics_json(bool multimaster) const;
json_t* diagnostics_json() const;
/**
* Print server information to a string.
*
* @param multimaster Print multimaster group
* @return Diagnostics string
*/
std::string diagnostics(bool multimaster) const;
std::string diagnostics() const;
/**
* Check if server is using gtid replication.