The `master_reconnection` parameter now controls both the reconnection of
the master server and the migration of the master to another server.
Although these two cases appear to be different, the end result from
readwritesplit's point of view is the same, so both are controlled with the
same parameter.
The RWBackend class now resets its internal state when it is closed. This
allows readwritesplit to handle the case where a result was expected from
the master but the master died before the result arrived. The same code
should also handle slave connections that fail in the middle of a result,
allowing the Backend to be reused.
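A minimal sketch of the idea, not the actual RWBackend implementation; the
class and member names here are illustrative only:

```cpp
// Simplified sketch: connection-specific state is cleared on close() so
// the same Backend object can safely be taken into use again later.
class Backend
{
public:
    void close()
    {
        // Drop any bookkeeping about a result that was still expected from
        // the server; the connection is gone, so the result never arrives.
        m_expected_results = 0;
        m_result_in_progress = false;
        m_open = false;
    }

    bool in_use() const
    {
        return m_open;
    }

private:
    int  m_expected_results = 0;
    bool m_result_in_progress = false;
    bool m_open = false;
};
```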
Added a test case that verifies the new functionality when combined with
`master_failure_mode=error_on_write`.
The `MYSQL_ROW row` variable was being overwritten by the extra query done
by the SST method detection code. Moving the SST method detection into its
own function prevents this and makes the code significantly easier to
comprehend.
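A hedged sketch of the refactoring pattern using the MySQL C API; the
function name and the exact query are illustrative, not taken from the
actual code:

```cpp
#include <mysql.h>
#include <string>

// Illustrative helper: the SST method detection runs its extra query with a
// MYSQL_ROW of its own, so the row fetched by the caller is never clobbered.
static std::string detect_sst_method(MYSQL* conn)
{
    std::string method;

    if (mysql_query(conn, "SHOW VARIABLES LIKE 'wsrep_sst_method'") == 0)
    {
        if (MYSQL_RES* res = mysql_store_result(conn))
        {
            if (MYSQL_ROW row = mysql_fetch_row(res))
            {
                method = row[1] ? row[1] : "";
            }

            mysql_free_result(res);
        }
    }

    return method;
}
```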
Added a test case that reproduces the problem (a MaxScale crash) and
verifies that the patch fixes it.
Test the routing behavior with `master_failure_mode=error_on_write` and
`allow_master_changes=true`. By sending an error instead of closing the
connection when the master fails, the connection can resume execution if a
new master becomes available.
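A heavily simplified illustration of the difference; the names and types
here are hypothetical and not readwritesplit's actual internals:

```cpp
enum class FailureMode { FAIL_INSTANTLY, ERROR_ON_WRITE };
enum class Action { ROUTE_TO_MASTER, SEND_ERROR, CLOSE_SESSION };

Action route_write(bool master_usable, FailureMode mode)
{
    if (master_usable)
    {
        return Action::ROUTE_TO_MASTER;
    }

    // With error_on_write the session stays open and only the failed write
    // gets an error, so execution can resume once a new master appears.
    return mode == FailureMode::ERROR_ON_WRITE ? Action::SEND_ERROR
                                               : Action::CLOSE_SESSION;
}
```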
Added test cases that verify that the functionality works as
expected. Also made Mariadb_nodes::change_master less verbose when one of
the nodes is down.
The test checks that failover works even when the master of the monitored
cluster is itself a slave of an external master. The test also verifies
that the servers do not get unexpected status labels.
Tests that `local_address` is taken into account. However, at the time
of writing the maxscale VM does not have two usable IP addresses, so
we only test that explicitly specifying an IP address does not break
things.
Locally it has been confirmed that this indeed works the way it is
supposed to.
- Start 4 threads where each thread sits in a loop and performs
20% updates and 80% selects. Each thread has a table of its own.
- The main thread executes the following in a loop.
- Perform a switchover from the current master to the next (which is
simply the next node % all nodes).
- Keep on doing that for 1.5 minutes.
The expectation is that the switchover will succeed, that is, after the
operation there will be a new master.
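A rough sketch of the test structure, assuming hypothetical run_query() and
switchover_to() helpers; in the real test both the load and the switchover
go through MaxScale and the test framework:

```cpp
#include <atomic>
#include <chrono>
#include <cstdio>
#include <random>
#include <thread>
#include <vector>

std::atomic<bool> running{true};

// Placeholder for the per-thread load; the real test runs the SQL against
// the thread's own table through MaxScale.
void run_query(int thread_id, bool is_update)
{
    std::printf("thread %d: %s\n", thread_id, is_update ? "update" : "select");
}

// Placeholder for the switchover; the real test triggers it through the
// MaxScale administrative interface.
void switchover_to(int node)
{
    std::printf("switchover to node %d\n", node);
}

void worker(int id)
{
    std::mt19937 gen(id);
    std::uniform_int_distribution<int> dist(0, 99);

    while (running)
    {
        // Roughly 20% updates, 80% selects.
        run_query(id, dist(gen) < 20);
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
}

int main()
{
    const int n_nodes = 4;
    std::vector<std::thread> threads;

    for (int i = 0; i < 4; i++)
    {
        threads.emplace_back(worker, i);
    }

    int master = 0;
    auto end = std::chrono::steady_clock::now() + std::chrono::seconds(90);

    while (std::chrono::steady_clock::now() < end)
    {
        // Switch to the next node: simply (current master + 1) % all nodes.
        master = (master + 1) % n_nodes;
        switchover_to(master);
        std::this_thread::sleep_for(std::chrono::seconds(5));
    }

    running = false;

    for (auto& t : threads)
    {
        t.join();
    }

    return 0;
}
```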
- Start 4 threads where each thread sits in a loop and performs
20% updates and 80% selects. Each thread has a table of its own.
- The main thread executes the following in a loop.
- Take down the current master and wait a while (failover assumed
to happen).
- Bring the old master node back up and wait a while.
- Keep on doing that for 1.5 minutes.
At the end, check that:
- There is exactly one 'Master'.
- The other nodes are either
  - 'Slave', or
  - 'Running', in which case it is verified that the node could not be
    rejoined.
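A sketch of this end-of-test check, assuming hypothetical status() and
rejoin_was_impossible() helpers; the real test reads the server states
through MaxScale:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the status label of a node as reported by
// MaxScale, e.g. "Master", "Slave" or "Running".
std::string status(int node);

// Hypothetical helper: true if the monitor reported that the node could
// not be rejoined.
bool rejoin_was_impossible(int node);

void check_end_state(int n_nodes)
{
    int masters = 0;

    for (int i = 0; i < n_nodes; i++)
    {
        std::string s = status(i);

        if (s == "Master")
        {
            masters++;
        }
        else if (s == "Running")
        {
            // A plain 'Running' node is acceptable only if it could not be
            // rejoined to the cluster.
            assert(rejoin_was_impossible(i));
        }
        else
        {
            assert(s == "Slave");
        }
    }

    // Exactly one master at the end.
    assert(masters == 1);
}
```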
The tests now reset the replication state with queries and a switchover
instead of calling fix_replication(). The results are checked, so these
tests now exercise switchover as well.
Also reduced the amount of printing when verbose mode is on for any test
that uses the get_output() function in fail_switch_rejoin_common.cpp.
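As an illustration of what a query-based reset might look like (the actual
statements used by the tests are not shown here, so treat the following as
an assumption), a slave could be pointed back at the master roughly like
this:

```cpp
#include <mysql.h>
#include <cstdio>

// Rough illustration only; the exact statements used by the tests may
// differ, and the credentials and port here are placeholders.
bool reset_slave(MYSQL* conn, const char* master_host)
{
    char change_master[512];
    std::snprintf(change_master, sizeof(change_master),
                  "CHANGE MASTER TO MASTER_HOST='%s', MASTER_PORT=3306, "
                  "MASTER_USER='repl', MASTER_PASSWORD='repl', "
                  "MASTER_USE_GTID=slave_pos",
                  master_host);

    return mysql_query(conn, "STOP SLAVE") == 0
           && mysql_query(conn, change_master) == 0
           && mysql_query(conn, "START SLAVE") == 0;
}
```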
auto_failover=true
auto_rejoin=false
This test verifies the following:
- Regular master-slave setup
- Create a table, insert some data
- Sync all slaves
- Stop a slave
- Insert some more data
- Sync remaining slaves
- Stop the master
- Expect the failover mechanism to pick a new master (server2)
- Bring up the slave
- Perform a switchover from server2 to server4
- This should fail.
Currently it does fail, but only because of a timeout:
    [mysqlmon] MASTER_GTID_WAIT() timed out on slave 'server4'.
There should be a check that ensures the failure is reported faster than
that.
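One possible way to assert this, sketched here with a hypothetical
attempt_switchover() helper and an arbitrary time bound:

```cpp
#include <cassert>
#include <chrono>

// Hypothetical helper: triggers the switchover through MaxScale and
// returns whether it succeeded.
bool attempt_switchover(const char* new_master, const char* current_master);

void expect_fast_failure()
{
    auto start = std::chrono::steady_clock::now();
    bool ok = attempt_switchover("server4", "server2");
    auto elapsed = std::chrono::duration_cast<std::chrono::seconds>(
        std::chrono::steady_clock::now() - start);

    // The switchover is expected to fail, and it should fail well before
    // MASTER_GTID_WAIT() would time out. The 10 second bound is arbitrary.
    assert(!ok);
    assert(elapsed.count() < 10);
}
```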
This test verifies the following:
- Regular master-slave setup
- Create a table, insert some data
- Sync all slaves
- Stop a slave
- Insert some more data
- Sync remaining slaves
- Stop the master
- Expect the failover mechanism to pick a new master
- Bring up the slave
- Expect the slave to be rejoined
- The test starts with the usual setup of 1 master and 3 slaves.
- Then the current master is taken down and it is verified that the
  failover mechanism promotes one of the slaves to master.
- This is repeated until only a single master is left (no slaves).