Unlike readwritesplit, schemarouter will process all responses from
backends as if they are expected. There are cases where errors are
generated that aren't sent as a response to a query. These queries must be
ignored and not routed to the client. Copying the code as-is from
readwritesplit isn't the cleanest solution but it avoids refactoring code
in a patch release.
The custom error number (2003) used by the backend protocol code was not
an actual error number that the server would send. The error code in
question was for an error that only the C connector returns:
CR_CONN_HOST_ERROR. Using ER_CONNECTION_KILLED as the error number better
conveys the fact that the connection was killed due to a reason not
related to any ongoing query.
By using a known error number that is correctly handled, we also avoid
writing errors to the client in the middle of a resultset or as the
initial response to a result. This explains why the problem described in
MXS-3267 happened in the first place: an unrelated connection was lost in
the middle of a resultset and the error was interpreted as the end of a
resultset. As a result of there being more data to be read, the unexpected
result state messages were logged.
This could happen if a session command triggers a master reconnection and
the connection fails while the history replay is ongoing. The code assumed
that history replay would only happen when a query was in the query queue.
This correctly triggers the session command response processing to accept
results from other servers than the current master backend if the session
can continue. If the session cannot continue, it will be stopped
immediately.
Fixed leak in load_utils.cc and the cache filter. Also changed all
instances of json_object_set with json_object_set_new to make sure it's
only used when the references are to be stolen.
Connectors that wouldn't send the plugin name even when the plugin
authentication capability was enabled would have to do an extra step in
the authentication.
The diagnostics_json call could access the std::unordered_map at the same
time it was being updated by the monitoring thread. This leads to
undefined behavior which in the case of MXS-3059 manifested as a segfault.
The call to gwbuf_length will fail if the pointed to buffer isn't the head
of the chain. To prevent this, the length is calculated before the buffer
is appended.
Also fixed the use of gwbuf_append. The return value should be assigned
and the code shouldn't reply on the value passed to the function being
correct.
The code used a null GWBUF with gwbuf_append which causes a crash. The
return value of the function that used it was also not correctly handled
and would be mistaken for a different error.
If the protocol routes a COM_QUIT packet to the backend, it must not
generate a packet when it is shutting down. This could cause unexpected
write errors if the backend server managed to close the socket before the
write was done.
By deferring the closing of a DCB until the protocol tells that it's in a
stable state, we avoid closing the connection mid-authentication. This
makes sure that all connections have reached a stable state before they
are closed which in turn prevents the connections from counting towards
aborted connects (or failed authentications like it did with the old fix).
In some cases the dbfwfilter is too strict and SQL that would not match a
rule is blocked due to it not being fully parsed. To allow a more lenient
mode of operation, the requirement for full parsing must be made
configurable.
When a fake handshake response is generated for a connection that hasn't
received the server's handshake, the client's SHA1 would be used with a
static scramble. This, in theory, would weaken the authentication to some
extend so to completely prevent this, a null password is used. This
removes any possibility of the password being exposed.
This allows the set of servers used by the service to also participate in
the cache value resolution. This will prevent the most obvious of problems
but any abstractions of the servers will prevent this from working.
Session commands did not trigger a reconnection process which caused
sessions to be closed in cases where recovery was possible.
Added a test case that verifies the patch fixes the problem.