- Start 4 threads where each thread sits in a loop and performs
20% updates and 80% selects. Each thread has a table of its own.
- The main thread executes the following in a loop.
- Take down the current master and wait a while (failover assumed
to happen).
- Bring the old master node back up and wait a while.
Keep on doing that for 1.5 minutes.
At the end check that:
- There is one 'Master'.
- The other nodes are either
- 'Slave' or
- 'Running', in which case it is checked that this is because the node
could not be rejoined.
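
A minimal sketch of what each client thread's loop might look like, assuming a
MySQL C-API connection and a table owned by that thread; the table name, stop
flag and structure are illustrative, not the test's actual code.

```cpp
// Illustrative sketch: per-thread client loop with a 20/80 update/select mix.
// Assumes a MySQL C-API connection and a table used by this thread alone.
#include <mysql.h>
#include <atomic>
#include <cstdio>
#include <string>

std::atomic<bool> keep_running{true};   // cleared by the main thread after ~1.5 minutes

void client_thread(MYSQL* conn, const std::string& table)
{
    int i = 0;
    while (keep_running)
    {
        char query[256];
        if (i % 5 == 0)   // every fifth query is an update -> 20% updates, 80% selects
        {
            snprintf(query, sizeof(query),
                     "UPDATE %s SET data = data + 1 WHERE id = 1", table.c_str());
        }
        else
        {
            snprintf(query, sizeof(query),
                     "SELECT data FROM %s WHERE id = 1", table.c_str());
        }

        // Errors are expected while a failover is in progress; the test only
        // requires that the cluster ends up in a sane state afterwards.
        if (mysql_query(conn, query) == 0)
        {
            if (MYSQL_RES* res = mysql_store_result(conn))
            {
                mysql_free_result(res);
            }
        }
        ++i;
    }
}
```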
If node00 is not the master when a test is started, the cleanup
performed by TestConnections takes considerable time.
By restoring the situation to normal before ending the test,
the startup time of the next test can be shortened significantly.
Now that mariadbmon supports failover and switchover, a test may
change the master to a node other than node 0.
Consequently, as part of fixing the replication, gtid_slave_pos
of node 0 must be reset as well.
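
As a rough illustration, resetting node 0 might look like the sketch below,
assuming a direct MySQL C-API connection to that server; the helper name and
exact statement set are assumptions, not the framework's actual code.

```cpp
// Illustrative sketch: clear the replication state of node 0 before the
// original topology (node 0 as master) is rebuilt.
#include <mysql.h>

void reset_node0_replication(MYSQL* node0)
{
    // Forget any slave configuration the node picked up while it was
    // replicating from another master after a failover/switchover.
    mysql_query(node0, "STOP SLAVE");
    mysql_query(node0, "RESET SLAVE ALL");
    // Clear gtid_slave_pos so stale coordinates do not interfere when
    // node 0 becomes the master again.
    mysql_query(node0, "SET GLOBAL gtid_slave_pos = ''");
}
```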
The test uses standard setup (1xMaster, 3xSlaves).
1. Shut down the master (server 1), check that autofailover promotes
a new master.
2. Stop MaxScale.
3. Start server 1 and add some events to it so it can no longer rejoin the
cluster (a sketch of this step follows the list).
4. Start MaxScale, check that server 1 does not join.
5. Set the current master to replicate from server 1, turning it into a relay
master.
6. Check that server 1 is master, all others are slaves (due to auto-rejoin).
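
A minimal sketch of step 3, assuming a direct connection to server 1 while
MaxScale is stopped; the table and row contents are placeholders.

```cpp
// Illustrative sketch: write directly to server 1 so its GTID history
// diverges from the cluster and auto-rejoin has to refuse it.
#include <mysql.h>

void diverge_server1(MYSQL* server1)
{
    // A few local writes are enough to create events the rest of the
    // cluster has never seen.
    mysql_query(server1, "CREATE TABLE IF NOT EXISTS test.divergence (id INT)");
    for (int i = 0; i < 5; i++)
    {
        mysql_query(server1, "INSERT INTO test.divergence VALUES (1)");
    }
}
```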
A subset of the checks done at connection creation time needs to be done at
query routing time. This guarantees that the connection is closed if the
server no longer qualifies as a valid candidate.
Added a test case that checks that a change in the replication topology
correctly breaks the connection.
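
The following is only a sketch of the idea, not MaxScale's router code: the
validity condition used when the connection is created is re-evaluated for
every routed query, and the session is closed when it no longer holds. The
status flags and function names are made up for illustration.

```cpp
// Illustrative sketch: repeat a connection-time validity check at query
// routing time and close the connection if the backend no longer qualifies.
#include <cstdint>

enum ServerStatus : uint32_t
{
    SERVER_RUNNING = 0x01,
    SERVER_MASTER  = 0x02,
    SERVER_SLAVE   = 0x04,
};

// Connection-time rule: the target must be running and have the role the
// session was created for.
bool is_valid_candidate(uint32_t status, uint32_t required_role)
{
    return (status & SERVER_RUNNING) && (status & required_role);
}

// Routing-time rule: re-evaluate the same condition for every query.
bool route_query(uint32_t current_status, uint32_t required_role)
{
    if (!is_valid_candidate(current_status, required_role))
    {
        return false;   // caller closes the client connection
    }
    // ... forward the query to the backend ...
    return true;
}
```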
The tests now reset the replication state using queries and switchover instead of
calling fix_replication(). The results are checked, so these tests now
exercise switchover as well.
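
A rough sketch of what the query/switchover based reset might look like,
assuming the mariadbmon switchover module command is invoked through maxadmin;
the monitor name (MySQL-Monitor), server name and exact argument syntax are
assumptions about the test environment.

```cpp
// Illustrative sketch: restore server1 as master via a mariadbmon switchover
// instead of calling fix_replication(). Names and command syntax are assumed.
#include <cstdio>
#include <cstdlib>

bool switchover_back_to_server1()
{
    int rc = system("maxadmin call command mariadbmon switchover "
                    "MySQL-Monitor server1");
    if (rc != 0)
    {
        printf("Switchover command failed, topology was not restored.\n");
        return false;
    }
    // The caller then checks the result (server1 is 'Master', the others are
    // 'Slave'), so the restore step doubles as a switchover test.
    return true;
}
```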
Also, reduced printing when verbose is on for any test using the get_output()
function in fail_switch_rejoin_common.cpp.
The expected and received values are now both logged when an error
occurs. The execution is also cut short to prevent excessive message
flooding as most of the time all comparisons fail.
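
In outline, the check could look like the sketch below, with hypothetical
names: on the first mismatch both values are logged and the comparison stops
so a systematic failure does not flood the output.

```cpp
// Illustrative sketch: log expected and received values on a mismatch and
// stop at the first error instead of reporting every failing row.
#include <algorithm>
#include <cstdio>
#include <vector>

bool compare_results(const std::vector<int>& expected, const std::vector<int>& received)
{
    size_t rows = std::min(expected.size(), received.size());
    for (size_t i = 0; i < rows; i++)
    {
        if (expected[i] != received[i])
        {
            printf("Error at row %zu: expected %d, received %d\n",
                   i, expected[i], received[i]);
            return false;   // cut execution short, later rows would likely fail too
        }
    }
    return expected.size() == received.size();
}
```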
The test can fail if the Galera nodes aren't synced when the connection to
MaxScale is made. Adding a small sleep should allow the Galera cluster to
stabilize after the configuration switch.
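
The fix itself is just a short sleep before connecting; the sketch below also
shows an optional, stricter variant that polls wsrep_local_state_comment until
the node reports 'Synced'. The connection handle and timeout are assumptions.

```cpp
// Illustrative sketch: give the Galera cluster time to stabilize after the
// configuration switch before connecting through MaxScale.
#include <mysql.h>
#include <unistd.h>
#include <cstring>

bool wait_for_galera_sync(MYSQL* node, int timeout_s)
{
    sleep(5);   // the small sleep added by the fix

    // Optional, stricter check: wait until the node actually reports Synced.
    for (int i = 0; i < timeout_s; i++)
    {
        if (mysql_query(node, "SHOW STATUS LIKE 'wsrep_local_state_comment'") == 0)
        {
            MYSQL_RES* res = mysql_store_result(node);
            MYSQL_ROW row = res ? mysql_fetch_row(res) : nullptr;
            bool synced = row && row[1] && strcmp(row[1], "Synced") == 0;
            if (res)
            {
                mysql_free_result(res);
            }
            if (synced)
            {
                return true;
            }
        }
        sleep(1);
    }
    return false;
}
```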