No longer writes events to the master, as this creates problems if the
promoted server was not the overall master. Instead, the slave status
output is inspected.
Clean up, comments and enhancements. StopWatch lap() didn't mean lap-time, but elapsed time. Changed meaning to lap-time and added split() for split-time.
In progress, does not yet overwrite existing code.
The new promotion mechanism automatically retries queries which timed out. It also
handles multimaster situations correctly.
The test adds a scheduled server event, the does failover, rejoin and
switchover and checks that event is manipulated correctly. Also includes
a change to the monitor to fix an invalid ALTER EVENT query when the event
definer has wildcard host.
Now logs messages explaining what has been done. Scheduled events are
disabled/enabled during the operation. Redirection of slaves is done at
the end similar to failover/switchover.
The 'reset_replication' module command deletes all slave connections and binlogs,
sets gtid to sequence 0 and restarts replication from the given master. Should be
only used if gtid:s are incompatible but the actual data is known to be in sync.
Event handling is now enabled by default. If the monitor cannot query the EVENTS-
table (most likely because of missing credentials), print an error suggesting to
turn the feature off.
When disabling events on a rejoining standalone server (likely a former master),
disable binlog event recording for the session. This prevents the ALTER EVENT
queries from generating binlog events.
Also added documentation and combined similar parts in the code.
See script directory for method. The script to run in the top level
MaxScale directory is called maxscale-uncrustify.sh, which uses
another script, list-src, from the same directory (so you need to set
your PATH). The uncrustify version was 0.66.
Since the servers are not modified before or during the wait, the waiting
can be done in the preparation method. This simplifies the actual failover
somewhat, and allows the monitor to keep running normally while waiting for
the log to clear.
When the replication status from the external master is checked, the
pending status must be used. This makes sure that the SlaveStatusArray and
the server state are sync.
Also extended the message that was logged when the external master was
lost. By adding the network address there, it makes it easier to see where
the server was replicating from if only the log file is available.
This reduces the ambiguity of server id:s in the slave status contents.
If a slave connection has been seen properly connected at an earlier time,
it can be trusted to report the correct master server id. This also
fixes some wrong status assignment edge cases with the SERVER_WAS_SLAVE-bit.
The bit will be removed in a later commit.
Even this does not solve the situation when MaxScale is started with
some servers down.
Uses mostly the status functions for reading the flags. Strickly
speaking this breaks the REST API since in some cases (status combinations)
the printed string is different from what was printed before.
The master failure verification would not work if the slaves did not have
a state change since MaxScale had started. This can be fixed by treating
the startup of MaxScale as an event of sorts.
The auto_failover is a more reliable solution and should be used instead. Several
unused parameters were removed, although they can still be defined in the config
file. Updated documentation on the relevant parts.
The command is saved in a function object which is read by the monitor
thread. This way, manual and automatic cluster modification commands are
ran in the same step of a monitor cycle.
This update required several modifications in related code.
The monitor now detects when a server has changed such that a replication
graph rebuild is needed and only then rebuilds the graph and detects
cycles and master.
Also, some old code is no longer called in the monitor cycle. It will be
removed in later commits. Refactored some of the related functions.