This correctly triggers the session command response processing to accept
results from other servers than the current master backend if the session
can continue. If the session cannot continue, it will be stopped
immediately.
The use of the server state is not transactional across multiple uses of
the function. This means that any assertions on the target state can fail
if the monitor updates the state between target selection and the
assertion.
The slave backend would be closed twice if it would both respond with a
different result and be closed due to a hangup before the master
responded.
Added a test case that reproduced the problem.
The errors that are ignored by readwritesplit are now stored as the
current close reason in the Backend. This allows the information about the
error to be retained and it can be used later in the error handler to
display the true reason why the connection was closed.
This allows the same verbose information to be logged in the cases where
it is of use. Mostly this information can be used to figure out why a
certain session was closed.
By doing the reconnection only when a new query arrives, we prevent the
excessive reconnecting that is done when a server's actual and monitored
states are in conflict.
If a master failed during an ongoing session command history replay, it
would be treated as if a normal session command failed which would result
in the already executed session command being re-executed on all servers
at the wrong logical position.
To fix this, the history replay must be distinguished from normal session
command execution. When a connection replaying the history fails, the
query routing simply needs to be attempted again.
If session command execution during server reconnection caused a query to
be queued, the query would be put on the tail end of the queue. This would
cause queries to be reordered if the queue wasn't empty. The correct thing
to do would be to put the next pending query back at the front of the
queue.
If a master reconnection occurred after the session command history was
disabled due to the limit being exceeded, a debug assertion would be hit
in prepare_target. This assert makes sure that a connection can be safely
created to the server which means that in release mode builds the session
state would be inconsistent on the new master.
As this is an unrecoverable situation, the session should stop immediately
even if delayed_retry is enabled. Currently the session will continue
until the delayed retry timeout is hit. This happens due to the fact that
the delayed retry mechanism handles all errors in a similar way.
The expected response counter was not decremented if a transaction replay
was started. This caused the connections to hang which in turn caused the
failure of the mxs1507_trx_stress test case.
If the slave's response differs from the master and the slave sent an
error packet, log the contents of the error. This should make it obvious
as to what caused the failure.
The assertion in routeQuery that expects there to be at least one ongoing
query would be triggered if a query was received after a master had failed
but before the session would close. To make sure the internal logic stays
consistent, the error handler should only decrement the expected response
count if the session can continue.
This could end up in infinite mutual recursion if no responses are
expected. Although this does not happen now that MXS-2587 is fixed, the
code should not even be there.
If a transaction replay fails, no queries must be routed before the
connection is closed. This could happen if the client received the error
from the replay failure and closes the connection before the fake hangup
generated by the replay failure is processed.
The error was only generated for COM_STMT_EXECUTE commands when all PS
commands should trigger it. In addition, large packets would get sent two
errors upon the arrival of the trailing end.
If a server fails mid-resultset, there's not a lot we can do to recover
the situation. A few cases could be handled (e.g. generate an ERR if the
resultset has proceeded to the row processing stage) but these fall
outside the scope of the original issue.
If a COM_STMT_EXECUTE has no metadata in it and it has more than one
parameter, it must be routed to the same backend where the previous
COM_STMT_EXECUTE with the same ID was routed to. This prevents MDEV-19811
that is triggered by MaxScale routing the queries to different backends.
By always restoring the ID, we are guaranteed to only store the query in
the form that it was originally sent in. This should be changed so that
the ID that the client sends can be used as-is in the backends.
If one slave is executing a query while another one is executing a session
command and the one that is executing the session command fails, the
ongoing query would get retried even though the server that failed was not
executing it. If the server was executing a session command, nothing needs
to be done.
If the execution of a session command fails on a master, it is retried
again. If the master is not available, the response will be returned from
one of the slaves.
The retrying of a read on a slave should only be done when the failing
server is waiting for a result and it was the last server from which a
result was expected.
If the master fails when a session command is being executed with
delayed_retry enabled, a null query would get placed into the query
queue. This change simply prevents the crash and closes the session even
though the query could be retried.
A query should not be queued if no responses are expected. The code that
executes queued queries should be dead code and this assertion would catch
it.