If the server where a query is being executed is shutting down,
readwritesplit should treat it as an error so that the query can be
retried.
By treating server shutdowns as network errors, the same code path that is
used for actual network errors can be taken. This removes the need for any
extra retrying logic for this particular case.
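A minimal sketch of the idea, with hypothetical names rather than the
actual readwritesplit types: an orderly shutdown is classified as the same
kind of error as a broken connection, so the existing retry path handles
both.

    enum class ReplyError {None, Network, Fatal};

    // Treat an orderly server shutdown exactly like a network failure so
    // that the existing error handling retries the query on another server.
    ReplyError classify_error(bool server_shutting_down, bool socket_error)
    {
        if (server_shutting_down || socket_error)
        {
            return ReplyError::Network;
        }

        return ReplyError::None;
    }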
If the master succeeds in executing a session command but the slave
fails, the error message returned by the slave could help explain why it
failed. At the moment this is mainly relevant for inspection of test
results.
Previously, if the server had no GTIDs, the method would fail, leading to
a confusing error message. This could even stop the monitor from working
entirely if a recent server version (10.X) did not have any GTID events.
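A hedged illustration of the approach, using hypothetical names: an empty
GTID string simply produces an empty GTID object instead of a parse
failure, and the monitor decides how to react.

    #include <string>
    #include <vector>

    struct Gtid
    {
        std::vector<std::string> triplets;    // e.g. "0-1-42"
        bool empty() const { return triplets.empty(); }
    };

    // Parsing an empty string yields an empty Gtid instead of an error;
    // "no GTIDs yet" is then a normal, reportable state.
    Gtid parse_gtid(const std::string& value)
    {
        Gtid gtid;

        for (std::string::size_type start = 0; start < value.size();)
        {
            auto end = value.find(',', start);
            gtid.triplets.push_back(value.substr(start, end - start));
            start = (end == std::string::npos) ? value.size() : end + 1;
        }

        return gtid;
    }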
The transaction replay could get mixed up with new queries if the client
managed to execute one while the delayed routing was taking place. A
proper way to solve this would be to cork the client DCB until the
transaction is fully replayed. As that change would be considerably more
complex than simply labeling the queries that are being retried, the
corking implementation is left for later, when a more complete solution
can be designed.
This commit also adds some of the missing info logging for the transaction
replay, which makes analysis of failures easier.
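A simplified sketch of the labeling approach (the types are assumptions,
not the real readwritesplit code): every routed statement carries a flag
telling whether it originates from the replay, so a new client query that
arrives mid-replay can be told apart from the replayed ones.

    #include <string>

    struct RoutedQuery
    {
        std::string sql;
        bool        from_replay;    // true when re-executed by the replay
    };

    class ReplayState
    {
    public:
        // Returns true if the query may be routed right now. While a replay
        // is in progress, only labeled replay statements proceed; anything
        // else would get mixed into the replayed transaction.
        bool can_route(const RoutedQuery& query) const
        {
            return !m_replay_active || query.from_replay;
        }

        void start_replay()  { m_replay_active = true; }
        void finish_replay() { m_replay_active = false; }

    private:
        bool m_replay_active = false;
    };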
If the client sends two different sets of capability bits during the
authentication phase of an SSL-enabled connection, both sets need to be
combined. This prevents capabilities from degrading mid-connection, as
happens when Oracle Connector/J drops the SSL capability bit
mid-authentication.
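A minimal sketch of the fix (field names are illustrative, not the actual
protocol module): the capability bits from both handshake responses are
OR'd together, so a bit that was set before the TLS switch cannot
disappear later.

    #include <cstdint>

    constexpr uint32_t CLIENT_SSL = 1u << 11;    // standard MySQL capability bit

    struct ClientCapabilities
    {
        uint32_t bits = 0;

        // Combine instead of overwrite: if the packet sent after the TLS
        // handshake no longer has CLIENT_SSL set, the bit from the first
        // handshake response is still preserved.
        void update(uint32_t caps_from_packet)
        {
            bits |= caps_from_packet;
        }
    };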
Servers with a zero weight would always be used over ones that have a
weight. This means the behavior was inverted, which caused the
mxs2054_hybrid_cluster test to fail in 2.3.
Also fixed a typo in the deprecation message.
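A sketch of the intended selection order, heavily simplified from how the
weighting actually works in the router: a server with a non-zero weight
must win over a zero-weight one.

    struct Candidate
    {
        int weight;         // configured weight
        int connections;    // current number of connections
    };

    // Returns true if 'a' should be picked over 'b'. A weighted server wins
    // over a zero-weight one; this comparison is what had been inverted.
    bool is_better_candidate(const Candidate& a, const Candidate& b)
    {
        if ((a.weight == 0) != (b.weight == 0))
        {
            return a.weight != 0;
        }

        return a.connections < b.connections;
    }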
Commit a9e236497963251f8b4afa07484b88ad97e73a03 changed where the PS ID
for a binary protocol command is replaced with the internal form. This
caused prepared statements that are also session commands to always be
routed with the external ID.
As the external ID is almost always the master's ID, the aforementioned
bug resulted in odd side-effects whose true cause was only revealed when
the error message sent by the slave was included in the log messages.
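An illustration of the ID translation involved, with made-up types: the
router keeps a mapping from the external (client-visible) statement ID to
its own internal ID and resolves it just before a command is written to a
backend; session commands must take the same path.

    #include <cstdint>
    #include <unordered_map>

    class PsIdMapper
    {
    public:
        void add(uint32_t external_id, uint32_t internal_id)
        {
            m_to_internal[external_id] = internal_id;
        }

        // All binary protocol commands, session commands included, resolve
        // their ID here before being written to a backend.
        uint32_t to_internal(uint32_t external_id) const
        {
            auto it = m_to_internal.find(external_id);
            return it != m_to_internal.end() ? it->second : external_id;
        }

    private:
        std::unordered_map<uint32_t, uint32_t> m_to_internal;
    };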
If the service didn't require the collection of complete packets, the
user reauthentication done with COM_CHANGE_USER would be skipped. This
caused the change_user test to fail.
By temporarily switching to full packet collection mode for the duration
of the COM_CHANGE_USER, we avoid duplicating the code for the streaming
router types.
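A sketch of the workaround (names are assumptions, not the actual
protocol code): the packet collection mode is raised only for the duration
of the COM_CHANGE_USER and restored afterwards.

    enum class CollectMode {Streaming, FullPackets};

    class ProtocolState
    {
    public:
        // COM_CHANGE_USER reauthentication needs whole packets, so the mode
        // is switched temporarily and restored once the command completes.
        void on_change_user_start()
        {
            m_saved_mode = m_mode;
            m_mode = CollectMode::FullPackets;
        }

        void on_change_user_done()
        {
            m_mode = m_saved_mode;
        }

        CollectMode mode() const { return m_mode; }

    private:
        CollectMode m_mode = CollectMode::Streaming;
        CollectMode m_saved_mode = CollectMode::Streaming;
    };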
If the initial handshake that is sent by the accepting thread is buffered,
the subsequent flushing of it is done by the owning thread. As
cross-thread buffer usage is not allowed, the initial handshake must be
sent by the owning thread.
If a PS command was routed multiple times, the ID would not be reverted
to the external ID in the failure cases. This prevented prepared
statements from being re-routed correctly.
If the connection to the master is broken while the session is not
configured to use the read-only modes and the monitor can still connect
to the server, the connection will be closed and an error is sent to the
client. To leave some trace of this problem in the MaxScale logs, a
message should always be logged when a network error occurs.
As the router is the only one that knows what backends a particular
statement has been sent to, it is the responsibility of the router
to keep the session bookkeeping up to date. If it doesn't, we will
know what statements a session has received (provided at least some
component in the routing chain has RCAP_TYPE_STMT_INPUT capability),
but not how long their processing took. Currently only readwritesplit
does that.
All queries are stored, not just COM_QUERY, as that makes the overall
bookkeeping simpler; at clientReply() time we do not need to know whether
or not to bookkeep information, we can just do it. When session
information is queried for, we report as much information as we have
available.
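A simplified sketch of the kind of bookkeeping described above (not the
actual MaxScale session API): every routed command is recorded with a
start time and the entry is completed when the reply arrives.

    #include <chrono>
    #include <deque>
    #include <string>

    struct StmtRecord
    {
        std::string                           canonical;    // empty for non-COM_QUERY commands
        std::chrono::steady_clock::time_point started;
        std::chrono::steady_clock::duration   duration{};
        bool                                  completed = false;
    };

    class SessionBookkeeping
    {
    public:
        // Called whenever a command is routed, not just for COM_QUERY.
        void book_query(std::string canonical)
        {
            m_stmts.push_back({std::move(canonical),
                               std::chrono::steady_clock::now(), {}, false});
        }

        // Called at clientReply() time: finish the oldest incomplete entry.
        void book_reply()
        {
            for (auto& stmt : m_stmts)
            {
                if (!stmt.completed)
                {
                    stmt.duration = std::chrono::steady_clock::now() - stmt.started;
                    stmt.completed = true;
                    return;
                }
            }
        }

    private:
        std::deque<StmtRecord> m_stmts;
    };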
The main class was getting unwieldy and too general. Dividing the fields
makes it easier to add support for other operation types.
This commit leaves most data duplicated; later commits clean up the
affected code.
The causal_reads_timeout default value is too long considering the
behavioral changes that MXS-2141 introduced. With a 10 second default
value, a result is returned to the client in a reasonable amount of time.
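For reference, a minimal readwritesplit service snippet with the setting
in question (servers, user and password omitted; the exact value syntax
depends on the MaxScale version):

    [RW-Split-Router]
    type=service
    router=readwritesplit
    causal_reads=true
    causal_reads_timeout=10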
With causal_reads enabled, the query would return with an error if the
slave was not able to catch up to the master fast enough. By automatically
retrying the query on the master, we're guaranteed that a valid result is
always returned to the client.
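A sketch of the retry decision with hypothetical names: if the slave
reports that it could not reach the required position in time, the
original statement is re-routed to the master instead of returning the
error to the client.

    #include <functional>
    #include <string>

    // The master is by definition up to date, so retrying there always
    // produces a valid result for the client.
    bool handle_causal_read_reply(bool slave_timed_out,
                                  const std::string& original_sql,
                                  const std::function<bool(const std::string&)>& route_to_master)
    {
        if (slave_timed_out)
        {
            return route_to_master(original_sql);
        }

        return true;    // forward the slave's result as-is
    }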
Added the csmon monitor which supports both old and new ColumnStore
servers. As older server versions aren't able to express their role, the
master needs to be designated by the user. When a ColumnStore version is
released that supports the mcsSystemPrimary() function, the master can be
automatically found.
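A hedged sketch of the role selection (everything except the
mcsSystemPrimary() name is an assumption): newer ColumnStore servers
report their own role, older ones rely on the user-designated master.

    #include <string>

    enum class Role {Master, Slave};

    Role resolve_role(bool supports_mcs_system_primary,
                      bool reports_primary,    // e.g. the result of mcsSystemPrimary()
                      const std::string& server_name,
                      const std::string& designated_master)
    {
        if (supports_mcs_system_primary)
        {
            return reports_primary ? Role::Master : Role::Slave;
        }

        // Older versions cannot express their role, so fall back to the
        // server designated as master in the monitor configuration.
        return server_name == designated_master ? Role::Master : Role::Slave;
    }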