Since the servers are not modified before or during the wait, the waiting
can be done in the preparation method. This simplifies the actual failover
somewhat, and allows the monitor to keep running normally while waiting for
the log to clear.
When the replication status from the external master is checked, the
pending status must be used. This makes sure that the SlaveStatusArray and
the server state are sync.
Also extended the message that was logged when the external master was
lost. By adding the network address there, it makes it easier to see where
the server was replicating from if only the log file is available.
This reduces the ambiguity of server id:s in the slave status contents.
If a slave connection has been seen properly connected at an earlier time,
it can be trusted to report the correct master server id. This also
fixes some wrong status assignment edge cases with the SERVER_WAS_SLAVE-bit.
The bit will be removed in a later commit.
Even this does not solve the situation when MaxScale is started with
some servers down.
Uses mostly the status functions for reading the flags. Strickly
speaking this breaks the REST API since in some cases (status combinations)
the printed string is different from what was printed before.
The master failure verification would not work if the slaves did not have
a state change since MaxScale had started. This can be fixed by treating
the startup of MaxScale as an event of sorts.
The auto_failover is a more reliable solution and should be used instead. Several
unused parameters were removed, although they can still be defined in the config
file. Updated documentation on the relevant parts.
The command is saved in a function object which is read by the monitor
thread. This way, manual and automatic cluster modification commands are
ran in the same step of a monitor cycle.
This update required several modifications in related code.
The monitor now detects when a server has changed such that a replication
graph rebuild is needed and only then rebuilds the graph and detects
cycles and master.
Also, some old code is no longer called in the monitor cycle. It will be
removed in later commits. Refactored some of the related functions.
Not yet used, as more is needed to replace the old code. The
algorithm is based on counting the total number of slave nodes
a server has, possibly in multiple layers and/or cycles.
This makes the code clearer and reduces race conditions, as the monitor
could be writing SERVER->status while a router is reading it. Also,
the time during which the SERVER struct is locked drops to a fraction.
Since monitors are now freed at MaxScale exit, the server data should be freed. Also,
gtid domain variables are now initialized with a common constant.
The updating of GTIDs was only considered successful if both the current
GTID position and binlog GTID positions were non-empty. If a slave has no
binlogged events, the GTID update would always fail.
This change in behavior caused the mysqlmon_failover_auto and
mysqlmon_failver_manual tests to break. The test disabled the binary log
on one of the servers which caused it to be left out from the rejoining
process.