When detect_standalone_master is enabled, the root_master variable was not
updated after the master was changed by the standalone server detection
mechanism. This caused debug assertions to fire in addition to possibly
causing some of the ignore_external_masters logic to break.
"servers_no_promotion" is a comma-separated list of servers
which cannot be chosen when selecting a new master during failover
(auto or manual), or when automatically selecting a new master
for switchover (currently disabled).
The servers in the list are redirected normally and can be promoted
by switchover when manually selecting a new master.
The Master status now prevents Slave status from being assigned to a
server. In practice this simply means that the master will not have both
the Master and Slave status bits.
In debug mode, when scanning the server id from a string, check that resulting
number is 32bit. Also, when querying the server id, query the global version.
Now, if a super user modifies the server id the monitor will notice it.
Server id:s in gtid:s are handled similarly.
Now detects some erroneous situations before starting switchover.
Switchover can be activated without specifying current master.
In this case, the cluster master server is selected.
The monitor will now also create the database if it is missing. Since it
already creates the table, also creating the database is not a large
addition.
Cleaned up some of the related checking code and combined them into a
simple utility function.
Time elapsed is now properly tracked during a switchover. After slave
redirection, an event is added to the master. Then, the slaves are queried
repeatedly until they advance to the newest event. I/O and SQL errors are
also detected.
During switchover, MASTER_GTID_WAIT is now called on all slaves. This causes
switchover to complete slower than before but is safer if log_slave_updates
is not on on the new master server. Also, read_only is disabled on the
demoted server if waiting on slaves or promotion fails. This should
effectively cancel the failover for the old master.
'mysqlmon' is still accepted but 'mariadbmon' is loaded instead.
This is done at runtime instead of e.g. by using a symbolic link,
so that a warning can be logged.
The warning is logged and the translation of the module name is
made by the code that loads the modules so that it's easy to do
the same thing for other modules as well.
In a subsequent commit the documentation is updated.
Change the ordering of the two flushes such that FLUSH LOGS comes last.
This seems to make sure gtid:s are updated to newest values before
the MASTER_GTID_WAIT-call. Without this fix, switchover does complete
succesfully, but some of the slaves may not be able to replicate due to
not having same events as new master. Exact reason for this still unclear.
For some reason, the source code of mysqlmon was split into C and C++
sources. This caused problems by effectively discarding all changes from
2.1 that are merged into 2.2.
This commit merges the changes into the correct file that were added to
the wrong file.
Previously, the rejoin would only be ran on servers with a connected slave io
thread. This patch runs the rejoin also on slaves which cannot connect to a
downed old master while the master hostname or port differs from the current
cluster master server.
When enabled, the monitor will redirect servers to replicate from the
current master. Standalone servers and servers replicating from a slave
are redirected.
The new parameter allows ignoring of master servers that are external to
the monitor configuration. This allows sub-trees of the actual replication
tree to be used as fully fledged replication trees.
If the gtid_domain_pos of the master is ever modified,
gtid-variables will have multiple domains. Generally, we are
only interested in the most recent domain. This is tracked in
gtid_domain_id:s and the value of the master is used for
filtering the correct domain from all gtid-values.
Also, use gtid_current_pos instead of gtid_slave_pos. The
advantage of current_pos is that the same variable works also
for master servers. The gtid-handling is now more thorough and
detects some weird situations.
If given a readily selected master server, Switchover will use it
as the new master. If the given server is invalid, nothing will
happen and an error is returned.
The internal header directory conflicted with in-source builds causing a
build failure. This is fixed by renaming the internal header directory to
something other than maxscale.
The renaming pointed out a few problems in a couple of source files that
appeared to include internal headers when the headers were in fact public
headers.
Fixed maxctrl in-source builds by making the copying of the sources
optional.
The setting limits the maximum time a MASTER_GTID_WAIT-function
can wait. To work around this limitation, the function is now called
in a loop such that the total timeout is approximately equal to
the requested timeout.
Slave redirection is a special case, as there the total failure
is only known after all redirects have been attempted. In the
failure case, all errors from connections are gathered to one
message.
If a server goes down and it has the stale master bit enabled, all other
bits for the server are cleared. This allows failed masters that have been
replaced to be first detected and then reintroduced into the replication
topology.
The slave and stale slave status bits should be cleared from a master if
it still has them.
Also used the correct functions to manipulate the bits instead of directly
setting them in the monitor.
The value of the global gtid_slave_pos is only needed during
failover, so querying it every monitor loop is unnecessary. The
value is now only requested when deciding on a new master server
or when waiting for the selected promotion target to clear its
relay logs.
Also, when waiting for the logs to clear, gtid_io_pos must stay
constant or failover is cancelled. Io_pos advancing indicates that
the server is still receiving events from the old master.
The Gtid_Slave_Pos returned by SHOW ALL SLAVES STATUS is not quite
reliable (MDEV-14182) so the variable version is used instead. Added
a convenience function for querying a single row of values.
Also, gtid_strict_mode, log_bin and log_slave_updates are now
queried during failover. The first only causes a warning message
if disabled, the last two affect new master selection.