The switchover sometimes fails due to a broken connection when the STOP
SLAVE on the new master is executed. Nothing is logged on the server in
question and the error message simply states that the connection was lost
in the middle of a query.
Increasing the query_retries to 1 reduced the likelihood of failure from
about 1/3 of tests failing to roughly 1/6 of tests failing. Increasing it
to 5 seems to remove it completely. As to what is the real reason this
happens, we do not yet know.
Because of monitor changes, the test had wrong assumptions.
Renamed the test and updated it to use MaxCtrl for some queries.
Also, changed the type of the cycle container in the monitor to an
ordered map so that results are predictable.
The test environment isn't always pristine after a test run so for the
sake of being able to actually test what we're attempting to test, we
should ignore duplicate databases for the time being.
The long-term fix is detect when a test doesn't clean up after itself.
Don't test failover functionality when it is not needed. The bug is only
about the extra events that appear when a master is demoted and a slave is
promoted.
The tests can now wait for a number of monitor intervals. This removes the
need to have hard-coded sleeps in the code and makes monitor tests more
robust under heavier load.
The test may fail if the client/maxscale/mariadb combo is too slow.
TODO, maybe: today I saw mysql_query() hang (in a poll) when maxscale
disconnected. Is there something that can be done about that?
I added a source directory, base, for stuff that should become part of a common
utility library in the future.
Port 9003 is not open by default in the test environment. Changing it to
port 4006, which is open, will work around this restriction.
Also added the mysql_error output to the error message when the querying
fails.
The `master_reconnection` parameter now controls both the reconnection of
the master server as well as the migration of the master server to another
server. Although these two cases appear to be different, the end result
from readwritesplit's point of view is the same and are thus controlled
with the same parameter.
The RWBackend class now resets its internal state when it is closed. This
allows readwritesplit to handle the case when a result was expected from
the master but the master died before the result was returned. The same
code should also handle slave connection failures mid-result, allowing
Backend reuse.
Added a test case that verifies the new functionality when combined with
`master_failure_mode=error_on_write`.