To allow MariaDBMon to be used with Clustrix we need to handle
Clustrix separately as its apparent version is 5.0.45, which is
lower than what MariaDBMon supports. Further, we must ensure that
Clustrix does not query the slave status as there are no slaves
in the M/S sense in a Clustrix cluster.
NOTE: Once there is a specific Clustrix monitor, this code should
be removed.
Under heavy load some of the basic network operations could fail which led
to some of the allocated memory to leak.
Also the backend protocol never freed the current protocol command if it
was not completed. This would happen if a user executed a session command
as the first command but backend authentication would fail.
The authentication code did not initialize one of the buffers used to
calculate the password hashes. This resulted in the use of uninitialized
memory when the user provided no password.
By resetting the replay state the transaction replay can start again on a
new server. This allows the replay process work when a master server is
shutting down.
By delaying the replay for a second, we give the monitor a small chance to
adap to master failures. It'll also prevent rapid re-querying if multiple
transaction replays are supported.
A transaction that just completed will go through the start_trx_replay
function as from the client protocol's point of view the transaction is
still open. The debug assertion did not take this into account and would
fail if a successful commit was the last thing done on master that failed.
Also fixed the formatting.
If a Galera cluster drops down to a single node, the last node would not
be considered valid. During the failure of the second to last node, the
master would also temporarily lose the master status.
The behavior was changed to always keep the cluster UUID until the cluster
size drops down to zero. This guarantees that the same cluster is used as
long as possible.
When a server is stopping, it'll send an error to the client before
terminating the TCP connection. The code in readwritesplit would detect
this error and create a hangup event on the DCB. This would cause it to
appear as if the TCP connection was broken and the router would
immediately try to reconnect to the same server.
By ignoring the error and allowing the connection to die on its own, we
avoid immediately reconnecting and retrying any transactions on the
stopping server. This increases the chances that the monitor will see it
first and assign the server states correctly before the transaction replay
is attempted.
The assertion would hold true for a single worker but it can't be
guaranteed to be true on a multi-worker system where the statistics are
distributed across the workers.
Enabling the feature by default prevents the master connection from dying
during times when there are very little or no writes. Having a modest ping
interval of 300 seconds serves to minimize the amount of extra work that
both MaxScale and the server have to do while still keeping the
connections in good shape.
Used only with supporting server versions. Using the time limit ensures that
the server interrupts the query at the same point Connector-C would cut the
connection. This prevents lingering queries.
Also, cleans up some associated error messages.
If the server where a query is being executed is shutting down,
readwritesplit should treat it as an error to make retrying of the query
possible.
By treating server shutdowns as network errors, the same code path that is
used for actual network errors can be taken. This removes the need for any
extra retrying logic for this particular case.
If the master succeeds in executing a session command but the slave fails,
the error message could help explain why it failed. At the moment this is
mainly relevant for inspection of test results.
Previously, if the server had no gtid:s, the method would fail leading to
a confusing error message. This could even totally stop the monitor from working
if a recent server version (10.X) did not have any gtid events.
The transaction replay could get mixed up with new queries if the client
managed to perform one while the delayed routing was taking place. A
proper way to solve this would be to cork the client DCB until the
transaction is fully replayed. As this change would be relatively more
complex compared to simply labeling queries that are being retried the
corking implementation is left for later when a more complete solution can
be designed.
This commit also adds some of the missing info logging for the transaction
replaying which makes analysis of failures easier.
If the client sends two different sets of capability bits during the
authentication phase of an SSL enabled connection, both sets need to be
combined. This prevents capabilities from degrading mid-connection which
is the case when Oracle Connector/J drops the SSL capability bit
mid-authentication.
The servers with a zero weight would be always used over ones that have a
weight. This means that the behavior was inverted and caused the
mxs2054_hybrid_cluster test to fail in 2.3.
Also fixed a typo in the deprecation message.
Commit a9e236497963251f8b4afa07484b88ad97e73a03 changed where the PS ID
for a binary protocol command is replaced with the internal form. This
caused prepared statements that are also session commands to be always
routed with the external ID.
As the external ID is almost always the master's ID, the aforementioned
bug resulted in odd side-effects and the true cause of these was only
revealed when the error message sent by the slave was included in the log
messages.
If the service doesn't require collection of complete packets, the user
reauthentication done with COM_CHANGE_USER would be skipped. This caused
the change_user test to fail.
By temporarily switching to full packet collection mode for the duration
of the COM_CHANGE_USER, we avoid duplicating the code for the streaming
router types.
If the initial handshake that is sent by the accepting thread is buffered,
the subsequent flushing of it is done by the owning thread. As
cross-thread buffer usage is not allowed, the initial handshake must be
sent by the owning thread.
If a PS command is routed multiple times, the ID will not be reverted to
the external ID in the failure cases. This prevented prepared statements
from being re-routed correctly.