The test suite now compiles Jansson instead of letting the CDC connector
do it. This way the connector can be used as a very simple static library
with a dependency on the Jansson library.
The CDC connector now uses a non-blocking socket for the reads. This
allows the possibility of adding read timeouts.
Added some utility functions for dealing with GTIDs and delayed the
reading of the first row.
The transaction safety was checked even if master GTID registration was
disabled. This always caused a failure when the router was started without
the transaction safety parameter.
As transaction safety is required by the GTID registration, it is not very
helpful to refuse to start if an invalid set of options is detected. To
make usage of the master GTID registration easier, the transaction safety
is also automatically enabled.
The Gtid_Slave_Pos returned by SHOW ALL SLAVES STATUS is not quite
reliable (MDEV-14182) so the variable version is used instead. Added
a convenience function for querying a single row of values.
Also, gtid_strict_mode, log_bin and log_slave_updates are now
queried during failover. The first only causes a warning message
if disabled, the last two affect new master selection.
The two depended on the PCRE2 and Connector-C libraries which means that
the libraries need to be built first. This information needs to be told to
CMake with the add_dependency call.
The cdc-connector did not build on Ubuntu Trusty due to the wrong order of
linker flags. Moving the crypt and crypto linkage to be after
cdc-connector appears to fix it.
The CDC connector was moved to its own repository and some changes to its
interface were made. Updated build scripts, deleted old connector and
fixed code to use new interfaces.
The test suite now compiles Jansson instead of letting the CDC connector
do it. This way the connector can be used as a very simple static library
with a dependency on the Jansson library.
The CDC connector now uses a non-blocking socket for the reads. This
allows the possibility of adding read timeouts.
Added some utility functions for dealing with GTIDs and delayed the
reading of the first row.
The authentication code assumed that the initial request only had
authentication related data. This is not true if the client library
predicts that the authentication will succeed and it sends a query right
after it sends the authentication data.
Gtid_Slave_Pos may contain multiple triplets even with single-source
replication if the domain has changed at some point. For failover, we
only need to know the current domain values, so the gtid-parsing now
accepts an optional domain parameter. The Gtid-class still only stores
one triplet of values.
When parsing the Show Slave Status result, Gtid_IO_Pos is parsed first.
The resulting domain is then read from Gtid_Slave_Pos.
When selecting the new master server, Gtid_IO_Pos is checked to
select the slave with the latest event in relay log. If there is a
tie, the slave that has processed most events wins.
It's possible that the winning slave has unprocessed events. In
this case, failover waits for the slave to complete processing the
log. The maximum wait is defined in monitor parameter
"failover_timeout", defaulting to 90 seconds. If time runs out
failover ends in failure.
The Gtid struct was separated to its own definition to handle gtid:s
easier.
From the documentation:
* `never`: When there is an active transaction, no data will be returned
from the cache, but all requests will always be sent to the backend.
The cache will be populated inside _explicitly_ read-only transactions.
Inside transactions that are not explicitly read-only, the cache will
be populated _until_ the first non-SELECT statement.
* `read_only_transactions`: The cache will be used and populated inside
_explicitly_ read-only transactions. Inside transactions that are not
explicitly read-only, the cache will be populated, but not used
_until_ the first non-SELECT statement.
* `all_transactions`: The cache will be used and populated inside
_explicitly_ read-only transactions. Inside transactions that are not
explicitly read-only, the cache will be used and populated _until_ the
first non-SELECT statement.
When deciding whether the cache should be consulted or not,
the value of the configuration parameter 'cache_in_transaction'
is taken into account as well.
The SlaveStatus info is now in a separate class, although it's
still embedded in the MYSQL_SERVER_INFO-class. Both classes now
use strings intead of char*:s.
As the passive parameter is only used by the failover and the failover can
only be initiated by the monitor, there is no true need to synchronize the
reads and write of this parameter.
As all runtime changes are protected by the runtime lock, only partial
reads are of concern. For the supported platforms, this is not a practical
problem and it only confuses the reader when other variables are modified
without atomic operations.
The master failure was assumed to be the only master related event for
each monitoring loop. If the master was switched by an external actor, the
monitor tracking would be out of sync.
The helper function provides map-like access to row values. This is used
to retrieve the values for all MariaDB 10.0+ versions as there are
differences in the returned results between 10.1 and 10.2.
Using timestamps to detect whether MaxScale was active or passive can
cause problems if multiple events happen at the same time. This can be
avoided by separating events into actively observed and passively observed
events. This clarifies the logic by removing the ambiguity of timestamps.
As the monitoring threads are separate from the worker threads, it is
prudent to use atomic operations to modify and read the state of the
MaxScale. This will impose an happens-before relation between MaxScale
being set into passive mode and events being classified as being passively
observed.
The master failure can now be verified by checking when the slaves are
connected to the master. If the slaves do not receive any events from the
master, the connections are considered as down after a configurable limit.
Added two parameters for controlling whether the check is done and for how
long the monitor waits before doing the failover.
The slave heartbeat count and period are collected from the SHOW ALL
SLAVES STATUS output. This, in addition to the relay log position, is used
to calculate the point in time when a slave has last interacted with the
master.
By using this timestamp, the monitor can enforce a minimum "timeout" for
the master before a failover is performed.
Now SHOW ALL SLAVES STATUS reports new fields:
Retried_transactions;
Max_relay_log_size,
Executed_log_entries,
Slave_received_heartbeats,
Slave_heartbeat_period,
Gtid_Slave_Pos"
As the Avro C API depends on the Jansson library, the build scripts must
build it. This is not optimal as the Jansson version needs to be updated
in two places.