This could happen if a session command triggers a master reconnection and
the connection fails while the history replay is ongoing. The code assumed
that history replay would only happen when a query was in the query queue.
This correctly triggers the session command response processing to accept
results from other servers than the current master backend if the session
can continue. If the session cannot continue, it will be stopped
immediately.
Session commands did not trigger a reconnection process which caused
sessions to be closed in cases where recovery was possible.
Added a test case that verifies the patch fixes the problem.
If the session command could not be routed, the log message should contain
the actual command that was routed. This makes failure analysis much
easier.
If a limit on the replication lag is configured, servers with unmeasured
replication lag should not be used. The code in question did use them even
when a limit was set as the value used for undefined lag was -1 which
always measured lower than the limit.
The code relied on last_read for the idle time calculation which caused
the pings that were written to not reset the idle time. This increased the
chance of multiple COM_PING packets being sent to a backend before a reply
was received.
The use of the server state is not transactional across multiple uses of
the function. This means that any assertions on the target state can fail
if the monitor updates the state between target selection and the
assertion.
The slave backend would be closed twice if it would both respond with a
different result and be closed due to a hangup before the master
responded.
Added a test case that reproduced the problem.
It appears that rollback errors are possible outside of
transactions. Since this was not something we expected to see, logging it
as an error allows us to see why this happens in production deployments.
The errors that are ignored by readwritesplit are now stored as the
current close reason in the Backend. This allows the information about the
error to be retained and it can be used later in the error handler to
display the true reason why the connection was closed.
When lazy_connect is enabled and the first query is `SET autocommit=0`, a
master connection can be created. If it is, then the m_current_master
pointer must also be updated.
Also fixed the case where a failure to connect to one slave would cause
the connection attempts to stop too early.
This allows the same verbose information to be logged in the cases where
it is of use. Mostly this information can be used to figure out why a
certain session was closed.
By doing the reconnection only when a new query arrives, we prevent the
excessive reconnecting that is done when a server's actual and monitored
states are in conflict.