If the slave's response differs from the master and the slave sent an
error packet, log the contents of the error. This should make it obvious
what caused the failure.
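As an illustration only, the information that can be pulled out of a MySQL ERR
packet for logging looks roughly like this; the function name and the plain
byte-vector buffer are stand-ins for the real MaxScale types, not its API.

    #include <cstdint>
    #include <cstdio>
    #include <string>
    #include <vector>

    // ERR packet layout: 4-byte packet header, 0xFF marker, 2-byte error code,
    // optional '#' + 5-char SQL state, then the human readable message.
    std::string describe_err_packet(const std::vector<uint8_t>& packet)
    {
        if (packet.size() < 9 || packet[4] != 0xFF)
        {
            return "not an ERR packet";
        }

        uint16_t code = packet[5] | (packet[6] << 8);
        size_t offset = 7;
        std::string sqlstate = "HY000";

        if (packet.size() >= offset + 6 && packet[offset] == '#')
        {
            sqlstate.assign(packet.begin() + offset + 1, packet.begin() + offset + 6);
            offset += 6;
        }

        std::string message(packet.begin() + offset, packet.end());
        char buf[512];
        snprintf(buf, sizeof(buf), "Slave returned error %d (%s): %s",
                 (int)code, sqlstate.c_str(), message.c_str());
        return buf;
    }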
The assertion in routeQuery that expects there to be at least one ongoing
query would be triggered if a query was received after a master had failed
but before the session was closed. To make sure the internal logic stays
consistent, the error handler should only decrement the expected response
count if the session can continue.
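A minimal sketch of that rule, with invented types and field names rather than
the real readwritesplit classes:

    struct Backend
    {
        bool waiting_for_reply = false;
    };

    struct Session
    {
        int  expected_responses = 0;
        bool open = true;
    };

    void handle_backend_error(Session& session, Backend& failed, bool can_continue)
    {
        if (can_continue)
        {
            // The reply will never arrive; forget about it so that the
            // routeQuery invariant about ongoing queries still holds.
            if (failed.waiting_for_reply && session.expected_responses > 0)
            {
                --session.expected_responses;
                failed.waiting_for_reply = false;
            }
        }
        else
        {
            // The session is about to close anyway; leave the bookkeeping alone.
            session.open = false;
        }
    }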
This could result in infinite mutual recursion if no responses are
expected. Although this does not happen now that MXS-2587 is fixed, the
code should not even be there.
If a transaction replay fails, no queries must be routed before the
connection is closed. This could happen if the client receives the error
from the replay failure and closes the connection before the fake hangup
generated by the replay failure is processed.
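Roughly, with an invented RWSplitSessionSketch type standing in for the real
session class:

    #include <cstdio>

    struct RWSplitSessionSketch
    {
        bool replay_failed = false;

        bool routeQuery(const char* sql)
        {
            if (replay_failed)
            {
                // The client may still send queries after seeing the replay
                // error but before the fake hangup closes the session.
                printf("discarding query after failed replay: %s\n", sql);
                return false;
            }

            printf("routing: %s\n", sql);
            return true;
        }
    };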
The error was only generated for COM_STMT_EXECUTE commands even though all
PS commands should trigger it. In addition, large packets would be sent two
errors when the trailing part arrived.
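For reference, a sketch of both points; the command constants come from the
MySQL protocol, while the tracker type and its names are invented:

    #include <cstdint>

    // MySQL command bytes for the prepared statement protocol.
    enum : uint8_t
    {
        COM_STMT_PREPARE        = 0x16,
        COM_STMT_EXECUTE        = 0x17,
        COM_STMT_SEND_LONG_DATA = 0x18,
        COM_STMT_CLOSE          = 0x19,
        COM_STMT_RESET          = 0x1a,
        COM_STMT_FETCH          = 0x1c,
    };

    // Any PS command should trigger the error, not just COM_STMT_EXECUTE.
    inline bool is_ps_command(uint8_t cmd)
    {
        return cmd == COM_STMT_PREPARE || cmd == COM_STMT_EXECUTE
            || cmd == COM_STMT_SEND_LONG_DATA || cmd == COM_STMT_CLOSE
            || cmd == COM_STMT_RESET || cmd == COM_STMT_FETCH;
    }

    // A payload length of 0xffffff means a continuation packet follows; only the
    // first part of such a large packet should be answered with an error.
    struct LargePacketTracker
    {
        bool expecting_trailing = false;

        bool should_generate_error(uint32_t payload_len, uint8_t cmd)
        {
            if (expecting_trailing)
            {
                // Continuation of an earlier large packet; it already got its
                // error, so do not send a second one.
                expecting_trailing = (payload_len == 0xffffff);
                return false;
            }

            expecting_trailing = (payload_len == 0xffffff);
            return is_ps_command(cmd);
        }
    };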
If a server fails mid-resultset, there's not a lot we can do to recover
the situation. A few cases could be handled (e.g. generate an ERR if the
resultset has proceeded to the row processing stage) but these fall
outside the scope of the original issue.
If a COM_STMT_EXECUTE has no metadata and more than one parameter, it must
be routed to the same backend as the previous COM_STMT_EXECUTE with the same
ID. This prevents MDEV-19811, which is triggered by MaxScale routing the
queries to different backends.
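A sketch of the routing rule, assuming an invented ExecuteRouter type and
backend names as plain strings:

    #include <cstdint>
    #include <string>
    #include <unordered_map>

    struct ExecuteRouter
    {
        std::unordered_map<uint32_t, std::string> exec_target;  // stmt ID -> backend

        std::string route_execute(uint32_t stmt_id, bool has_metadata, int n_params,
                                  const std::string& preferred_backend)
        {
            if (!has_metadata && n_params > 1)
            {
                auto it = exec_target.find(stmt_id);
                if (it != exec_target.end())
                {
                    // Without the metadata only the backend that saw the earlier
                    // execution can interpret the parameters (MDEV-19811).
                    return it->second;
                }
            }

            exec_target[stmt_id] = preferred_backend;
            return preferred_backend;
        }
    };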
By always restoring the ID, we are guaranteed to store the query only in the
form in which it was originally sent. This should be changed so that the ID
the client sends can be used as-is on the backends.
The connection counts are now always used to pick the best servers where
the initial connections are created. This covers both master and slave
selection. Reconnections done while routing queries still pick the "best"
server according to the slave selection criteria. This allows better
servers to be taken into use when `lazy_connect` is enabled.
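The selection itself boils down to a fewest-connections pick over the
candidates, roughly like this (types and names are illustrative):

    #include <algorithm>
    #include <string>
    #include <vector>

    struct ServerInfo
    {
        std::string name;
        int connections;
    };

    // Prefer the candidate with the fewest existing connections when the initial
    // connections are created, e.g. with lazy_connect enabled.
    const ServerInfo* pick_by_connection_count(const std::vector<ServerInfo>& candidates)
    {
        if (candidates.empty())
        {
            return nullptr;
        }

        auto it = std::min_element(candidates.begin(), candidates.end(),
                                   [](const ServerInfo& a, const ServerInfo& b) {
                                       return a.connections < b.connections;
                                   });
        return &*it;
    }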
1. Remove persistence of performance data
2. Move global CanonicalPerformance into SmartRouter object
3. Implement an alternative kill_all_others_v2. kill_all_others_v1 is left
in case it should be fixed and used instead.
Implements the measurements, their use and their persistence to file.
Kill is not implemented, so the router waits for all clusters to fully respond.
The performance data uses a mutex and the persistence data file
is written while holding the mutex. This obviously needs to be
improved, but this commit shows the working concept.
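A rough sketch of the concept; all names are made up, and copying the data
before writing the file is only one possible way the locking could later be
improved:

    #include <chrono>
    #include <fstream>
    #include <mutex>
    #include <string>
    #include <unordered_map>

    class CanonicalPerformanceSketch
    {
    public:
        void update(const std::string& canonical, std::chrono::milliseconds duration)
        {
            std::lock_guard<std::mutex> guard(m_lock);
            m_durations[canonical] = duration;
        }

        void persist(const std::string& path) const
        {
            std::unordered_map<std::string, std::chrono::milliseconds> snapshot;
            {
                std::lock_guard<std::mutex> guard(m_lock);
                snapshot = m_durations;      // copy under the lock
            }

            std::ofstream file(path);        // write without holding the lock
            for (const auto& entry : snapshot)
            {
                file << entry.first << '\t' << entry.second.count() << '\n';
            }
        }

    private:
        mutable std::mutex m_lock;
        std::unordered_map<std::string, std::chrono::milliseconds> m_durations;
    };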
If one slave is executing a query while another is executing a session
command and the slave executing the session command fails, the ongoing
query would get retried even though the failed server was not executing
it. If the failed server was only executing a session command, nothing
needs to be done.
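Sketched out with invented state flags (not the real backend class):

    struct BackendState
    {
        bool executing_query = false;         // the ongoing "real" query
        bool executing_session_cmd = false;   // e.g. SET, USE, COM_STMT_PREPARE
    };

    bool should_retry_query(const BackendState& failed_backend)
    {
        if (failed_backend.executing_query)
        {
            // The reply is lost, so the query has to be retried.
            return true;
        }

        // The failed backend was idle or only running a session command; the
        // query is still running on another backend and must not be retried.
        return false;
    }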
If a resultset is followed by an ERR packet that is not expected
(e.g. server is shutting down), the packet must not be sent to the
client. This allows readwritesplit to replace the failing connection with
a new one thus hiding server shutdowns from clients.
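Roughly, with invented types standing in for the real reply tracking:

    #include <cstdint>
    #include <vector>

    struct ReplyState
    {
        bool resultset_complete = false;
    };

    // If a complete resultset has already been delivered and an unexpected ERR
    // packet follows (e.g. "server shutdown in progress"), drop the ERR so the
    // failing connection can be replaced without the client noticing.
    bool should_forward_packet(const ReplyState& state, const std::vector<uint8_t>& packet)
    {
        bool is_err = packet.size() > 4 && packet[4] == 0xFF;

        if (state.resultset_complete && is_err)
        {
            return false;
        }

        return true;
    }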
The name of the object (i.e. the section name from the configuration
file) is now stored in the configuration object for that object.
That way, more contextual and hence more user friendly errors and
warnings can be generated.
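A minimal sketch of the idea, with invented names rather than MaxScale's
actual configuration classes:

    #include <cstdio>
    #include <string>

    class ConfigurationSketch
    {
    public:
        explicit ConfigurationSketch(std::string section_name)
            : m_name(std::move(section_name))
        {
        }

        void report_error(const std::string& parameter, const std::string& problem) const
        {
            // The stored section name makes the error point at the offending object.
            fprintf(stderr, "Error in section '%s': parameter '%s' %s\n",
                    m_name.c_str(), parameter.c_str(), problem.c_str());
        }

    private:
        std::string m_name;    // section name from the configuration file
    };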
If the server (a real one or a service exposed as a server) is
on the same machine as MaxScale, then for performance reasons
a Unix domain socket and not a TCP/IP socket should be used.
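For illustration, connecting over a Unix domain socket with plain POSIX calls;
the socket path here is an assumption and would in practice come from the
configuration:

    #include <cstring>
    #include <string>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    // Returns a connected file descriptor or -1 on failure.
    int connect_local(const std::string& socket_path)
    {
        int fd = socket(AF_UNIX, SOCK_STREAM, 0);
        if (fd < 0)
        {
            return -1;
        }

        sockaddr_un addr{};
        addr.sun_family = AF_UNIX;
        strncpy(addr.sun_path, socket_path.c_str(), sizeof(addr.sun_path) - 1);

        if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) != 0)
        {
            close(fd);
            return -1;
        }

        return fd;
    }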
As an error returned by the server is now stored inside RWBackend,
irrespective of whether it is returned on its own or, for example, last
after a result set, there is no need to examine the GWBUF in rws; the
information that already exists can be used instead.
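Sketched with an invented class in place of RWBackend:

    #include <cstdint>
    #include <string>

    // The backend remembers the last error it saw, whether it arrived on its own
    // or trailing a resultset, so the router can ask the backend instead of
    // re-parsing the buffer.
    class BackendErrorSketch
    {
    public:
        void set_error(uint16_t code, std::string message)
        {
            m_has_error = true;
            m_code = code;
            m_message = std::move(message);
        }

        bool has_error() const { return m_has_error; }
        uint16_t error_code() const { return m_code; }
        const std::string& error_message() const { return m_message; }

    private:
        bool m_has_error = false;
        uint16_t m_code = 0;
        std::string m_message;
    };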