Auto-rejoin now explains more accurately if a server cannot be joined due
to conflicting gtid.
Also, auto-rejoin is no longer disabled if a join fails. Usually the fail
is due to the server not replying fast enough with query completion. The
query is often completed anyways. This can lead to some log spam.
Auto-failover is no longer considered to have failed if the preconditions
are not met. An error message with the failed checks is printed once, but
the checks are repeated every loop as long as the master is down.
If the feature is enabled (default off), at the end of a monitor loop
(once server states are known), read_only is enabled on slaves servers
without it.
The sql queries are given in two text files, defined by options promotion_sql_file
and demotion_sql_file. The files must exist when monitor starts. The files are read
line by line, ignoring empty lines and lines starting with '#'. All other lines
are sent to the server being promoted, demoted or rejoined. Any error in opening
a file, reading it or executing the contents will cause the entire operation to
fail.
The filed defined in demotion_sql_file is also ran when rejoining a server. This
is to ensure a previously failed master is "demoted" properly when it joins the
cluster.
If the master is replicating from an external master, the monitor will save the
host:port of the external server. During demotion, the old master stops the external
replication while the new master begins it. Also, any commands that would add
to gtid have to be omitted when an external master is in play.
"servers_no_promotion" is a comma-separated list of servers
which cannot be chosen when selecting a new master during failover
(auto or manual), or when automatically selecting a new master
for switchover (currently disabled).
The servers in the list are redirected normally and can be promoted
by switchover when manually selecting a new master.
For some reason, the source code of mysqlmon was split into C and C++
sources. This caused problems by effectively discarding all changes from
2.1 that are merged into 2.2.
This commit merges the changes into the correct file that were added to
the wrong file.
When enabled, the monitor will redirect servers to replicate from the
current master. Standalone servers and servers replicating from a slave
are redirected.
The new parameter allows ignoring of master servers that are external to
the monitor configuration. This allows sub-trees of the actual replication
tree to be used as fully fledged replication trees.
If the gtid_domain_pos of the master is ever modified,
gtid-variables will have multiple domains. Generally, we are
only interested in the most recent domain. This is tracked in
gtid_domain_id:s and the value of the master is used for
filtering the correct domain from all gtid-values.
Also, use gtid_current_pos instead of gtid_slave_pos. The
advantage of current_pos is that the same variable works also
for master servers. The gtid-handling is now more thorough and
detects some weird situations.
The master failure can now be verified by checking when the slaves are
connected to the master. If the slaves do not receive any events from the
master, the connections are considered as down after a configurable limit.
Added two parameters for controlling whether the check is done and for how
long the monitor waits before doing the failover.
Moved mon_process_failover() from monitor.cc to mysql_mon.cc. Renamed
some functions and variables related to previous failover functionality
to avoid confusion.
If a switchover_script parameter is given, its value will be used as
the switchover script. Otherwise the default will be used. Currently
just echo.
The MySQL Monitor now introduces two script variables, CURRENT_MASTER
and NEW_MASTER, that contain information about the current and new
master respectively.
Switchover is performed only if switchover has been enabled and MaxScale
is *not* in passive mode.
Tentative documentation.
With the 'switchover' config variable the switchover functionality
can be enabled. If enabled a REST API endpoint will appear, using
which that switchover can be initiated.
Switchover can only be performed when MaxScale is in active mode
and failover will be disabled for the duration of the switchover.
Only if the switchover succeeds, will failover be enabled again.
Might be easier to expose that REST API always and only change
the behaviour when calling it, instead of making it appear and
re-appear.
The `failover` and `failover_timeout` parameters are now declared as a
part of the mysqlmon module. Changed the implementation of the failover
function so that the dependencies on the monitor struct can be removed or
moved into parameters.
All monitors now persist the state of the server in a monitor journal
file.
Moved the removal of stale journals into the core and removed them from
the monitor journal interface.
If a monitor is started and stopped before the external monitoring thread
has had time to start, a deadlock will occur.
The first thing that the monitoring threads do is read the monitor handle
from the monitor object. This handle is given as the return value of
startMonitor and it is stored in the monitor object. As this can still be
NULL when the monitor thread starts, the threads use locks to prevent
this.
The correct way to prevent this is to pass the handle as the thread
parameter so that no locks are required.
The MySQL monitor stores the server states in a backup file which can be
used to restore the state of the servers even if MaxScale is stoppen in an
uncontrolled fashion.
The `failover_recovery` option allows failed servers to rejoin the
cluster. This should make using MaxScale with two node clusters easier.
One use case for this is when the replication-manager promotes the last
node in the cluster as the master. When this is done, the slave
configuration is cleared and the read-only mode is disabled. Since the
failover requires that the server is not configured as a slave and that it
is not in read-only mode, it is safe to use `failover_recovery` with
replication-manager.
This removes parts of the nearly identical code from all monitors.
The removal of monitor type specific event checking is done based on the
assumption that only the monitor that is monitoring the server can be the
cause for a state change. This removes the need to actually check that the
state change is relevant for each monitor and allows the event handling to
be moved into the core.
All monitors now declare the parameters that they use. This allows the
core to check the validity of the parameters before they are passed to the
monitor. It also simplifies the processing of the parameters as they are
guaranteed to be valid.
The general purpose stuff in skygw_utils.h was moved to utils.h
and the corresponding implementation from skygw_utils.cc to utils.c.
Includes updated accordingly.
Skygw_utils.h is now only used by log_manager and by mlist, which
is only used by log_manager. Consequently, skygw_utils.h was moved
to server/maxscale.
Utils.h needs a separate overhaul.
- All now include maxscale/cdefs.h as the very first file.
- MXS_[BEGIN|END]_DECLS added to all C-headers.
Strictly speaking not necessary for private headers, but
does not hurt either.
- Include guards moved to the very top of the file.
- #pragma once added.
- Headers now to be included as <maxscale/xyz.h>
- First step, no cleanup of headers has been made. Only moving
from one place to another + necessary modifications.
The mysqlmon simple failover mode allows it to direct write traffic to a
secondary node. This enables a very simple failover mode with MaxScale
when it is used in a two node master-slave setup.
The mysqlmon now supports proper detection of multi-master topologies by
building a directed graph out of the monitored server. If cycles are found from
this graph, they are assigned a master group ID. All servers with a positive
master group ID will receive the Master status unless they have `@@read_only`
enabled.
This new functionality can be enabled with the 'multimaster' boolean
parameter.
Mysqlmon now stores the values of read_only, slave_sql_running,
slave_io_running, the name and position of the masters binlog and the
replication configuration status of the slave.
This allows more detailed server information to be displayed with the
`show monitor <name>` diagnostic interface. In addition to this, the new
structure used to store them provides an easy way to store information
that is specific to a monitor and the servers it monitors.
These new status variables can be used to implement better multi-master
detection in mysqlmon by using the value of read_only to resolve
situations where multiple master candidates are available.