The behavior of mariadbmon was changed so that it better understands
slaves attempting to replicate. Rewrote the test to accommodate the change
in behavior and take the opportunity to use newer code.
When the filters of a service are modified, each session to that service
should either see the old filters or the new filters i.e. the operation
must be atomic.
The command naming caused problems when other parts of the service were
being altered. The parser doesn't seem to handle the case of overlapping
commands that well.
Updated test cases with new code and adjusted syntax accordingly.
If the test would otherwise succeed but a coredump is found when the logs
are copied in the destructor, the test would pass. To prevent this, the
destructor should always exit with a non-zero status when it detects an
error.
The switchover sometimes fails due to a broken connection when the STOP
SLAVE on the new master is executed. Nothing is logged on the server in
question and the error message simply states that the connection was lost
in the middle of a query.
Increasing the query_retries to 1 reduced the likelihood of failure from
about 1/3 of tests failing to roughly 1/6 of tests failing. Increasing it
to 5 seems to remove it completely. As to what is the real reason this
happens, we do not yet know.
The causal read queries were performed also when the target server was the
master. The extra functionality of the causal reads is only needed on
slaves.
Adjusted the test case to require GTID replication.
The configuration used the wrong parameter name. The test also did not
explicitly enable tracking of the last_gtid variable which caused it to
fail if it wasn't already on.
The MariaDB 10.3.8 release fixes the use of last_gtid as a value for
session_track_system_variables. This means that the test can be enabled if
the backends are new enough.
As we know that the test doesn't rely on absolute binlog positions, we can
force synchronization by creating a table on the master and waiting until
that table is replicated to all slaves.
Streamlined the test to perform as much testing as fast as possible. The
gradual ramp up did not provide any concrete benefits compared to testing
everything at once.
Replaced structures with C++11 alternatives where possible and removed
unused, redundant or dead code.
Because of monitor changes, the test had wrong assumptions.
Renamed the test and updated it to use MaxCtrl for some queries.
Also, changed the type of the cycle container in the monitor to an
ordered map so that results are predictable.
The test environment isn't always pristine after a test run so for the
sake of being able to actually test what we're attempting to test, we
should ignore duplicate databases for the time being.
The long-term fix is detect when a test doesn't clean up after itself.
Some of the older tests expected results that didn't make much sense. The
mxs1643_extra_events test should expect the master to devolve into the
Running state when it is set into read-only. This is understandable as a
set of servers consisting only of slaves is rather disorienting.
In mxs1678_relay_master breaking the replication of the relay master
devolves it into the Running state as well as all slaves replicating from
it. This is a better result as in a real-life scenario only the valid and
up-to-date slaves would be used.