When MaxScale thinks that the master has failed, it tries to verify it by
seeing if the slave server is receiving events. There was a missing IO
thread status check in the slave_receiving_events function which caused
the failover to wait until the verification timed out.
The relay master detection logic also lacked a check for the slave SQL
thread status. The code should check the state of the SQL thread to
determine whether the server is actually a functional slave to a master.
The fix causes a regression in the failover functionality as there is a
dependency between the slave's master ID and how the failover
performs. This dependency should not exist but fixing it causes a problem
with the mysqlmon_rejoin_bad2 test.
Startup now done in a static method. Constructor initializes some values.
Config parameters loaded in a separate method. Some things still need
looking.
Previously, if the list contained servers that were not monitored by
the monitor yet were valid servers, an error value would be returned
and the monitor failed to start.
With this update, the non-monitored servers are simply ignored when
forming the final list.
Also, added printing of the list to diagnostics.
If the master is replicating from an external master, the monitor will save the
host:port of the external server. During demotion, the old master stops the external
replication while the new master begins it. Also, any commands that would add
to gtid have to be omitted when an external master is in play.
When four servers (A, B, C and E where E and A replicate from each other
and A is the master for B and C) form a cluster and only three of them (A,
B and C) are configured into MaxScale, a failover operation from A to B
(making B the current master) and a restart of A causes B to lose its
master status.
The following diagram illustrates the state of the cluster at the end of
the process described above.
+----------------------+
| +---+ |
+------------+ B <-+ |
+-v-+ | +---+ | |
| E | | | |
+-^-+ | +---+ +-+-+ |
+------+ A | | C | |
| +---+ +---+ |
| |
+----------------------+
The external server E was not correctly ignored in the replication
topology generation causing both A and B to be seen as the lowest slave
nodes in the tree. From a theoretical point of view this is the correct
interpretation as there are two distinct trees and neither of them
contains any true masters.
In practice, MaxScale should treat any servers that replicate from an
external master as root level master nodes. Doing this guarantees that they
are labeled as masters if they have slaves replicating from them.
When detect_standalone_master is enabled, the root_master variable was not
updated after the master was changed by the standalone server detection
mechanism. This caused debug assertions to fire in addition to possibly
causing some of the ignore_external_masters logic to break.
"servers_no_promotion" is a comma-separated list of servers
which cannot be chosen when selecting a new master during failover
(auto or manual), or when automatically selecting a new master
for switchover (currently disabled).
The servers in the list are redirected normally and can be promoted
by switchover when manually selecting a new master.
The Master status now prevents Slave status from being assigned to a
server. In practice this simply means that the master will not have both
the Master and Slave status bits.
In debug mode, when scanning the server id from a string, check that resulting
number is 32bit. Also, when querying the server id, query the global version.
Now, if a super user modifies the server id the monitor will notice it.
Server id:s in gtid:s are handled similarly.
Now detects some erroneous situations before starting switchover.
Switchover can be activated without specifying current master.
In this case, the cluster master server is selected.
The monitor will now also create the database if it is missing. Since it
already creates the table, also creating the database is not a large
addition.
Cleaned up some of the related checking code and combined them into a
simple utility function.