Backported the fix for MXS-3617 to 2.4.
If a packet was read from the backend while the client DCB was being
throttled due to writeq_high_water being hit, the response would be
discarded as it did not qualify for routing. The check should not consider
whether the client DCB is in epoll, as epoll membership has no effect on
writes.
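
As an illustration of the intent rather than MaxScale's actual code, the
corrected qualification check could look roughly like the sketch below; the
Dcb type and its fields are hypothetical names:

    // Hypothetical sketch: a response still qualifies for routing while the
    // client DCB is throttled (writeq_high_water reached). Whether the DCB
    // is currently in epoll is irrelevant for writes, so it is not part of
    // the condition.
    struct Dcb
    {
        bool open = true;        // connection is still open
        bool throttled = false;  // reads paused due to writeq_high_water
    };

    bool response_qualifies_for_routing(const Dcb& client)
    {
        // Only the open state matters; throttling and epoll membership do
        // not affect our ability to write to the client.
        return client.open;
    }
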
Unlike readwritesplit, schemarouter will process all responses from
backends as if they are expected. There are cases where errors are
generated that aren't sent as a response to a query. Such errors must be
ignored and not routed to the client. Copying the code as-is from
readwritesplit isn't the cleanest solution but it avoids refactoring code
in a patch release.
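
A minimal sketch of the guard this describes, using purely illustrative types
and names (BackendReply and expects_response are not MaxScale identifiers):

    #include <cstdio>

    struct BackendReply
    {
        bool is_error = false;          // the packet is an ERR packet
        bool expects_response = false;  // a routed query is awaiting a reply
    };

    // An error that arrives when no query is in flight was generated by the
    // backend on its own (e.g. the connection was killed) and must not be
    // routed to the client.
    bool should_route_to_client(const BackendReply& reply)
    {
        if (reply.is_error && !reply.expects_response)
        {
            std::fprintf(stderr, "Discarding unexpected error from backend\n");
            return false;
        }
        return true;
    }
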
The custom error number (2003) used by the backend protocol code was not
an actual error number that the server would send. The error code in
question was for an error that only the C connector returns:
CR_CONN_HOST_ERROR. Using ER_CONNECTION_KILLED as the error number better
conveys the fact that the connection was killed due to a reason not
related to any ongoing query.
By using a known error number that is correctly handled, we also avoid
writing errors to the client in the middle of a resultset or as the
initial response to a result. This explains why the problem described in
MXS-3267 happened in the first place: an unrelated connection was lost in
the middle of a resultset and the error was interpreted as the end of a
resultset. Because there was still more data left to read, the unexpected
result state messages were logged.
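
For reference, a sketch of building such an error packet with the server-side
error code. The packet layout follows the MariaDB protocol (3-byte payload
length, sequence id, 0xFF marker, 2-byte error code, '#' plus SQLSTATE,
message); the helper name, SQLSTATE and message text are illustrative:

    #include <cstdint>
    #include <string>
    #include <vector>

    std::vector<uint8_t> make_error_packet(uint8_t seq, uint16_t code,
                                           const char* sqlstate,
                                           const std::string& msg)
    {
        std::vector<uint8_t> p;
        uint32_t len = uint32_t(1 + 2 + 1 + 5 + msg.size());  // payload size
        p.push_back(len & 0xff);            // 3-byte payload length
        p.push_back((len >> 8) & 0xff);
        p.push_back((len >> 16) & 0xff);
        p.push_back(seq);                   // sequence id
        p.push_back(0xff);                  // ERR packet marker
        p.push_back(code & 0xff);           // error code, little-endian
        p.push_back(code >> 8);
        p.push_back('#');
        p.insert(p.end(), sqlstate, sqlstate + 5);
        p.insert(p.end(), msg.begin(), msg.end());
        return p;
    }

    // 1927 is ER_CONNECTION_KILLED in MariaDB, a code the server itself can
    // send, unlike the client-only CR_CONN_HOST_ERROR (2003).
    auto packet = make_error_packet(1, 1927, "70100", "Connection was killed");
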
If the protocol has already routed a COM_QUIT packet to the backend, it must
not generate another one when it is shutting down. Otherwise unexpected write
errors could occur if the backend server managed to close the socket before
the write was done.
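
A hypothetical sketch of the resulting check; the BackendConn type and its
field are illustrative, only the COM_QUIT command byte (0x01) comes from the
protocol:

    #include <cstdint>

    constexpr uint8_t COM_QUIT = 0x01;

    struct BackendConn
    {
        uint8_t last_command = 0;  // last command byte routed to the backend
    };

    // Generate a COM_QUIT on shutdown only if the client did not already
    // send one; otherwise the socket may already be closed by the server.
    bool needs_quit_on_shutdown(const BackendConn& conn)
    {
        return conn.last_command != COM_QUIT;
    }
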
By deferring the closing of a DCB until the protocol indicates that it is in
a stable state, we avoid closing the connection mid-authentication. This
makes sure that all connections have reached a stable state before they
are closed which in turn prevents the connections from counting towards
aborted connects (or failed authentications like it did with the old fix).
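
A rough sketch of the deferred-close idea, assuming a close_pending flag and
an in_stable_state() query; both are illustrative names, not MaxScale's API:

    struct Dcb
    {
        bool close_pending = false;
    };

    struct Protocol
    {
        bool authenticated = false;
        // A connection is "stable" once handshake/authentication is done.
        bool in_stable_state() const { return authenticated; }
    };

    void request_close(Dcb& dcb, const Protocol& proto)
    {
        if (!proto.in_stable_state())
        {
            // Mid-authentication: remember the request and close later, so
            // the attempt does not count as an aborted connect.
            dcb.close_pending = true;
            return;
        }

        // Safe to close right away; the real code would tear down the DCB
        // here.
    }
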
When a fake handshake response is generated for a connection that hasn't
received the server's handshake, the client's SHA1 would be used with a
static scramble. In theory this would weaken the authentication to some
extent, so to prevent it entirely a null password is used instead. This
removes any possibility of the password being exposed.
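
To make the reasoning concrete, here is a sketch of the mysql_native_password
token computation, using OpenSSL's SHA1 purely for illustration: when no real
scramble from the server is available, a zero-length token (a null password)
is sent, so nothing derived from the password is written out. The function
and parameter names are assumptions, not MaxScale's implementation:

    #include <openssl/sha.h>

    #include <cstdint>
    #include <cstring>
    #include <vector>

    std::vector<uint8_t> auth_token(const uint8_t* sha1_password, // SHA1(password)
                                    const uint8_t* scramble,      // 20-byte scramble
                                    bool have_real_scramble)
    {
        if (!have_real_scramble)
        {
            return {};  // zero-length token == null password, nothing to expose
        }

        // token = SHA1(password) XOR SHA1(scramble + SHA1(SHA1(password)))
        uint8_t stage2[SHA_DIGEST_LENGTH];
        SHA1(sha1_password, SHA_DIGEST_LENGTH, stage2);

        uint8_t buf[2 * SHA_DIGEST_LENGTH];
        std::memcpy(buf, scramble, SHA_DIGEST_LENGTH);
        std::memcpy(buf + SHA_DIGEST_LENGTH, stage2, SHA_DIGEST_LENGTH);

        uint8_t mix[SHA_DIGEST_LENGTH];
        SHA1(buf, sizeof(buf), mix);

        std::vector<uint8_t> token(SHA_DIGEST_LENGTH);
        for (int i = 0; i < SHA_DIGEST_LENGTH; i++)
        {
            token[i] = sha1_password[i] ^ mix[i];
        }
        return token;
    }
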
The charset sent in the handshake is now chosen with the following
priorities:
* First Master server
* Last Slave server
* First Running server or Down server whose charset is known
The change is that a server in the Down state to which we've successfully
connected can also be used as the charset source. This, in addition to an
"empty" default charset, helps avoid the use of the default latin1 charset
unless absolutely necessary (the selection order is sketched below).
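
Under illustrative types and names (the real implementation differs), the
priority order could be expressed as follows; a charset value of 0 stands for
"unknown" here:

    #include <vector>

    enum class State { Master, Slave, Running, Down };

    struct Server
    {
        State state;
        unsigned charset;  // 0 == charset not known for this server
    };

    unsigned handshake_charset(const std::vector<Server>& servers)
    {
        unsigned rval = 0;

        for (const auto& s : servers)
        {
            if (s.charset == 0)
            {
                continue;  // never pick a charset we have not actually seen
            }

            if (s.state == State::Master)
            {
                return s.charset;  // first Master wins outright
            }
            else if (s.state == State::Slave)
            {
                rval = s.charset;  // keep overwriting: last Slave
            }
            else if (rval == 0)
            {
                rval = s.charset;  // first Running or Down server with a charset
            }
        }

        return rval;  // 0 means nothing known; caller falls back to a default
    }
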
The backend didn't expect AuthSwitchRequest packets in response to the
handshake response packets. This is allowed by the protocol and appears to
happen with at least MySQL 8.0.
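
A hedged sketch of detecting such a packet: an AuthSwitchRequest payload
starts with a 0xFE marker, followed by a null-terminated plugin name and the
new scramble, which must then be used to hash the password again. The types
and names below are illustrative:

    #include <cstddef>
    #include <cstdint>
    #include <cstring>
    #include <string>

    struct AuthSwitch
    {
        std::string plugin;    // e.g. "mysql_native_password"
        std::string scramble;  // plugin data that follows the plugin name
    };

    bool parse_auth_switch(const uint8_t* payload, size_t len, AuthSwitch* out)
    {
        if (len < 2 || payload[0] != 0xFE)
        {
            return false;  // not a full AuthSwitchRequest payload
        }

        // Plugin name is a null-terminated string after the 0xFE marker.
        const uint8_t* name = payload + 1;
        const auto* nul =
            static_cast<const uint8_t*>(std::memchr(name, 0, len - 1));
        size_t name_len = nul ? size_t(nul - name) : len - 1;
        out->plugin.assign(reinterpret_cast<const char*>(name), name_len);

        // Whatever follows the terminator is the new scramble.
        size_t data_start = 1 + name_len + 1;
        if (nul && data_start < len)
        {
            out->scramble.assign(
                reinterpret_cast<const char*>(payload + data_start),
                len - data_start);
        }
        return true;
    }
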
If the client DCB of the session was passed into the function, it was
possible that the session pointer for it was already set to null. The
session pointer of an open DCB is never null but a client DCB's session
pointer can be null if accessed via the MXS_SESSION object.
By incrementing the counters when the session is created, we know that the
counter will always be decremented correctly. This does cause the listener
session to be counted as an actual session but this is already present in
the statistics calculations and is something we have to live with in 2.3.
This change also makes it possible to overshoot the connection count
limitation as the session creation is delayed until authentication
fails. Both of these problems are fixed in 2.4.
This causes the connection failure to be counted as an authentication
failure instead of a connection error. The former never causes the host to
be blocked, which effectively solves the problem for most cases. The only
case where this would not work is when the network buffer for a backend
DCB is full right after the connection is created.