Commit Graph

152 Commits

Author SHA1 Message Date
790d90f229 Update 2.3.16 Change Date 2020-01-15 11:08:51 +02:00
b0a1eddb6c Store ignored error information
The errors that are ignored by readwritesplit are now stored as the
current close reason in the Backend. This allows the information about the
error to be retained and it can be used later in the error handler to
display the true reason why the connection was closed.
2019-12-17 13:48:51 +02:00
df6c56e7ca Update 2.3.13 Change Date 2019-10-29 12:51:31 +02:00
f46f873dc1 Add verbose backend status helper
This allows the same verbose information to be logged in the cases where
it is of use. Mostly this information can be used to figure out why a
certain session was closed.
2019-09-19 13:41:49 +03:00
fd0c156655 MXS-2564: Reconnect only when necessary
By doing the reconnection only when a new query arrives, we prevent the
excessive reconnecting that is done when a server's actual and monitored
states are in conflict.
2019-09-19 13:41:49 +03:00
1748e6599d MXS-2609: Fix session command mixup on master failure
If a master failed during an ongoing session command history replay, it
would be treated as if a normal session command failed which would result
in the already executed session command being re-executed on all servers
at the wrong logical position.

To fix this, the history replay must be distinguished from normal session
command execution. When a connection replaying the history fails, the
query routing simply needs to be attempted again.
2019-08-09 01:54:09 +03:00
fd72332ea4 Improve master failure error message
The message will now always contain the server name.
2019-08-05 12:48:19 +03:00
918a2964d5 MXS-2592 Add configuration for session specific in-memory log
When enabled each session will write log messages in the in-memory log.
If session ends in error this log is written to the actual log in disk.
2019-07-28 20:56:22 +03:00
b07ffdb2fa Fix hang on transaction replay
The expected response counter was not decremented if a transaction replay
was started. This caused the connections to hang which in turn caused the
failure of the mxs1507_trx_stress test case.
2019-07-26 09:34:08 +03:00
84f4688ebb Fix readwritesplit response count assertion
The assertion in routeQuery that expects there to be at least one ongoing
query would be triggered if a query was received after a master had failed
but before the session would close. To make sure the internal logic stays
consistent, the error handler should only decrement the expected response
count if the session can continue.
2019-07-18 12:24:06 +03:00
e516c11ac5 MXS-2587: Never route stored queries in routeQuery
This could end up in infinite mutual recursion if no responses are
expected. Although this does not happen now that MXS-2587 is fixed, the
code should not even be there.
2019-07-05 14:19:44 +03:00
953dd4098b MXS-2587: Prevent queries after failed trx replay
If a transaction replay fails, no queries must be routed before the
connection is closed. This could happen if the client received the error
from the replay failure and closes the connection before the fake hangup
generated by the replay failure is processed.
2019-07-04 08:21:16 +03:00
3b6387c952 MXS-2562: Stop immediately on mid-resultset failure
If a server fails mid-resultset, there's not a lot we can do to recover
the situation. A few cases could be handled (e.g. generate an ERR if the
resultset has proceeded to the row processing stage) but these fall
outside the scope of the original issue.
2019-06-28 20:25:31 +03:00
805be70a78 Add more information to rwsplit info messages
The statement ID for all binary protocol statements and the error given to
handleError are now logged.
2019-06-20 14:27:03 +03:00
f6a5b59067 MXS-2563: Fix query retrying on slave failure
If one slave is executing a query while another one is executing a session
command and the one that is executing the session command fails, the
ongoing query would get retried even though the server that failed was not
executing it. If the server was executing a session command, nothing needs
to be done.
2019-06-17 14:07:52 +03:00
220fea3546 MXS-2464: Retry failed session commands
If the execution of a session command fails on a master, it is retried
again. If the master is not available, the response will be returned from
one of the slaves.
2019-05-31 14:01:15 +03:00
cb089f69e6 Add read retry assertion
The retrying of a read on a slave should only be done when the failing
server is waiting for a result and it was the last server from which a
result was expected.
2019-05-31 14:01:15 +03:00
625740e69d MXS-2464: Fix crash on failed session command
If the master fails when a session command is being executed with
delayed_retry enabled, a null query would get placed into the query
queue. This change simply prevents the crash and closes the session even
though the query could be retried.
2019-05-31 14:01:15 +03:00
ee7e63a611 MXS-2464: Assert that responses are expected
A query should not be queued if no responses are expected. The code that
executes queued queries should be dead code and this assertion would catch
it.
2019-05-31 14:01:14 +03:00
96a477ec89 MXS-2490: Send error to client on unknown PS handle
If a client requests an unknown binary protocol prepared statement handle,
a custom error shows the actual ID used instead of the "empty" ID of 0
that the backend sends.
2019-05-17 14:13:44 +03:00
bf63698991 MXS-2464: Bring back the runtime query queue check
The code that checked that only non-empty queries are stored in the query
queue was left out when the query queue fix was backported to 2.3. Since
MXS-2464 is caused by a still unknown bug, the runtime check should help
figure out in which cases the problem occurs.
2019-05-17 13:03:03 +03:00
ec890b33cd Prevent checksum mismatch on second trx replay
If a transaction replay has to be executed twice due to a failure of the
original candidate master, the query queue could contain replayed
queries. The replayed queries would be placed into the queue if a new
connection needs to be created before the transaction replay can start.
2019-04-05 13:33:16 +03:00
6421af1bb4 Backport query queue changes to 2.3
Backported the changes that convert the query queue in readwritesplit into
a proper queue. This changes combines both
5e3198f8313b7bb33df386eb35986bfae1db94a3 and
6042a53cb31046b1100743723567906c5d8208e2 into one commit.
2019-04-05 13:33:16 +03:00
a217dde1f0 MXS-2419: Queue queries executed during trx replay
By storing the queries in the query queue and routing it once the
transaction replay is done, we prevent two problems:

* Multiple transaction replays would overwrite the m_interrupted_query
  buffer that was used to store any queries executed during the
  transaction replay.

* Incorrect ordering of queries when the query queue is not empty and a
  new query is executed during transaction replay.
2019-04-03 12:57:05 +03:00
9bc721afb6 Merge commit '11ee74bad327e7fb15e8388d20e7838b9e49cadf' into 2.3 2019-03-21 17:52:42 +02:00
4dda31ffe3 Merge branch '2.2' into 2.3 2019-03-16 09:30:56 +02:00
09dc92973e Discard connections as the last step
Th discarding of connections in maintenance mode must be done after any
results have been written to them. This prevents closing of the connection
before the actual result is returned.
2019-03-14 12:15:30 +02:00
b537176248 Fix parsing of non-query packets
Packets that do not contain SQL should not be parsed.
2019-03-13 15:44:02 +02:00
710e5df27b MXS-2365: Fix classification of queued queries
Queries in the query queue need to be explicitly parsed since they are
stored in a single buffer and thus share the query classification
information. In the next major version this should be changed into an
array of individual buffers instead of a shared buffer.
2019-03-08 14:45:18 +02:00
b97976c4ee MXS-2323: Close stale connections
Cleaning up and closing stale connections to servers in maintenance mode
helps administrators see when a server is no longer in use.
2019-03-07 15:59:26 +02:00
24c9b62a2f Add verbose logging for session command failures
If the routing of a session command fails due to problems with the backend
connections, a more verbose error message is logged. The added status
information in the Backend class makes tracking the original cause of the
problem a lot easier due to knowing where, when and why the connection was
closed.
2019-01-31 14:23:26 +02:00
021d48f94c Log low-level reason and idle time on master failure
If the connection to the master is lost, knowing what type of an error
caused the call to handleError helps deduce what was the real reason for
it. Logging the idle time of the connection helps detect when the
wait_timeout of a connection is exceeded.
2019-01-16 09:43:49 +02:00
147f0bb656 Extend master failure error message
The error now describes the failure mode in more detail. This should make
post mortem analysis of failed connections a lot easier.
2019-01-09 20:05:38 +02:00
9adbd2f8f0 Cache the local server statistics object
By storing the server statistics object in side the session, the lookup
involved in getting a worker-local value is avoided. Since the lookup is
done multiple times for a single query, it is beneficial to store it in
the session.

As the worker-local value is never deleted, it is safe to store a
reference to it in the session. It is also never updated concurrently so
no atomic operations are necessary.
2019-01-03 09:37:59 +02:00
1fa3b133c7 Make keepalive ping checks more efficient
The code now only checks the need for a keepalive ping once every
keepalive interval. Reduced the number of mxs_clock calls to one so that
all servers use the same value.
2019-01-03 09:37:59 +02:00
48efa6d027 MXS-2213: Clear stored PS information
The information stored for each prepared statement would not be cleared
until the end of the session. This is a problem if the sessions last for a
very long time as the stored information is unused once a COM_STMT_CLOSE
has been received.

In addition to this, the session command response maps were not cleared
correctly if all backends had processed all session commands.
2018-12-11 13:54:10 +02:00
da83551493 MXS-2189: Prevent unwanted trx replay
When a transaction is being executed on a slave and the master fails, the
transaction replay would start.
2018-11-27 12:52:45 +02:00
1abcbd64bd MXS-2187: Allow multiple transaction retries
By resetting the replay state the transaction replay can start again on a
new server. This allows the replay process work when a master server is
shutting down.
2018-11-27 12:52:44 +02:00
e6325d39fb Delay initial transaction replay
By delaying the replay for a second, we give the monitor a small chance to
adap to master failures. It'll also prevent rapid re-querying if multiple
transaction replays are supported.
2018-11-27 12:52:44 +02:00
851793cb86 Fix transaction replay debug assertion
A transaction that just completed will go through the start_trx_replay
function as from the client protocol's point of view the transaction is
still open. The debug assertion did not take this into account and would
fail if a successful commit was the last thing done on master that failed.

Also fixed the formatting.
2018-11-27 12:52:44 +02:00
7bf5c07835 Ignore errors sent by servers in shutdown
When a server is stopping, it'll send an error to the client before
terminating the TCP connection. The code in readwritesplit would detect
this error and create a hangup event on the DCB. This would cause it to
appear as if the TCP connection was broken and the router would
immediately try to reconnect to the same server.

By ignoring the error and allowing the connection to die on its own, we
avoid immediately reconnecting and retrying any transactions on the
stopping server. This increases the chances that the monitor will see it
first and assign the server states correctly before the transaction replay
is attempted.
2018-11-26 09:42:12 +02:00
925670ae2f Fix false master failure log message
The message would be logged even if the session continues.
2018-11-26 09:42:11 +02:00
cab8a4bde8 MXS-2144: Treat server shutdown as a network error
If the server where a query is being executed is shutting down,
readwritesplit should treat it as an error to make retrying of the query
possible.

By treating server shutdowns as network errors, the same code path that is
used for actual network errors can be taken. This removes the need for any
extra retrying logic for this particular case.
2018-11-14 16:23:47 +02:00
c32bb18862 Fix transaction replay checksum mismatches
The transaction replay could get mixed up with new queries if the client
managed to perform one while the delayed routing was taking place. A
proper way to solve this would be to cork the client DCB until the
transaction is fully replayed. As this change would be relatively more
complex compared to simply labeling queries that are being retried the
corking implementation is left for later when a more complete solution can
be designed.

This commit also adds some of the missing info logging for the transaction
replaying which makes analysis of failures easier.
2018-11-13 16:48:03 +02:00
2a6df0e724 Merge branch '2.2' into 2.3 2018-11-09 14:22:28 +02:00
c899f00541 MXS-1780 Collect server response information
As the router is the only one that knows what backends a particular
statement has been sent to, it is the responsibility of the router
to keep the session bookkeeping up to date. If it doesn't we will
know what statements a session has received (provided at least some
component in the routing chain has RCAP_TYPE_STMT_INPUT capability),
but not how long their processing took. Currently only readwritesplit
does that.

All queries are stored and not just COM_QUERY as that makes the
overall bookkeeping simpler; at clientReply() time we do not need to
know whether or not to bookkeep information, we can just do it.

When session information is queried for, we report as much information
we have available.
2018-11-08 12:04:55 +02:00
c692c864e2 MXS-2078 Take new statistics into use 2018-11-08 10:44:32 +02:00
e56372b153 MXS-2141: Retry query on master if it times out on slave
With causal_reads enabled, the query would return with an error if the
slave was not able to catch up to the master fast enough. By automatically
retrying the query on the master, we're guaranteed that a valid result is
always returned to the client.
2018-11-06 15:09:14 +02:00
f8c132903b Fix query average measurment and average text output.
The query_ended() call was not in the right spot. Tests did not
detect it. Changed textual output to reflect the fact that they
are for RWSplit reads.
2018-11-04 17:18:09 +02:00
4d8a95d041 Merge commit '262f1d7e471bacca6b985ec3f2cd5cb76d6e2584' into 2.3 2018-10-26 12:44:57 +03:00