Now the test program will
1) Write to each node in a Galera cluster and verify that the data
ends up in the slave.
2) At the end of 1) execute STOP SLAVE and START SLAVE to check that
replication can be stopped and started again (won't work unless
each node has the same server_id and value for @@log_bin_basename).
3) Block the node BLR is replicating from and expect it to connect
to the next configured master and that replication continues to
work. Do that for all nodes.
4) Stop MaxScale and restart it and expect 3) to work. That checks
that BLR saves all necessary information in master.ini and is
capable of reading it.
It should be possible to START SLAVE and STOP SLAVE irrespective
of which Galera node updates are mode to.
That will be the case if @@log_slave_updates is on and each node
in the Galera cluster have the same server id. Otherwise it will
fail with the current incarnation of BLR.
If
* BLR replicates from a node in a Galera cluster and
* writes are made to all nodes in that cluster,
then
* if a slave to BLR is stopped when it has received an event
originating in a node different than the one BLR is replicating
from
the subsequent (re)starting of the slave will fail because BLR looks
for the last event from a file whose path contains the server id of
the node where the event originates, although it should look for it
in the file whose path contains the server id of the node from which
BLR replicates.
The behavior of mariadbmon was changed so that it better understands
slaves attempting to replicate. Rewrote the test to accommodate the change
in behavior and take the opportunity to use newer code.
The switchover sometimes fails due to a broken connection when the STOP
SLAVE on the new master is executed. Nothing is logged on the server in
question and the error message simply states that the connection was lost
in the middle of a query.
Increasing the query_retries to 1 reduced the likelihood of failure from
about 1/3 of tests failing to roughly 1/6 of tests failing. Increasing it
to 5 seems to remove it completely. As to what is the real reason this
happens, we do not yet know.
Because of monitor changes, the test had wrong assumptions.
Renamed the test and updated it to use MaxCtrl for some queries.
Also, changed the type of the cycle container in the monitor to an
ordered map so that results are predictable.
The test environment isn't always pristine after a test run so for the
sake of being able to actually test what we're attempting to test, we
should ignore duplicate databases for the time being.
The long-term fix is detect when a test doesn't clean up after itself.
Don't test failover functionality when it is not needed. The bug is only
about the extra events that appear when a master is demoted and a slave is
promoted.
The tests can now wait for a number of monitor intervals. This removes the
need to have hard-coded sleeps in the code and makes monitor tests more
robust under heavier load.
The test may fail if the client/maxscale/mariadb combo is too slow.
TODO, maybe: today I saw mysql_query() hang (in a poll) when maxscale
disconnected. Is there something that can be done about that?
I added a source directory, base, for stuff that should become part of a common
utility library in the future.