Commit Graph

953 Commits

Author SHA1 Message Date
da83551493 MXS-2189: Prevent unwanted trx replay
When a transaction is being executed on a slave and the master fails, the
transaction replay would start.
2018-11-27 12:52:45 +02:00
1abcbd64bd MXS-2187: Allow multiple transaction retries
By resetting the replay state the transaction replay can start again on a
new server. This allows the replay process work when a master server is
shutting down.
2018-11-27 12:52:44 +02:00
e6325d39fb Delay initial transaction replay
By delaying the replay for a second, we give the monitor a small chance to
adap to master failures. It'll also prevent rapid re-querying if multiple
transaction replays are supported.
2018-11-27 12:52:44 +02:00
851793cb86 Fix transaction replay debug assertion
A transaction that just completed will go through the start_trx_replay
function as from the client protocol's point of view the transaction is
still open. The debug assertion did not take this into account and would
fail if a successful commit was the last thing done on master that failed.

Also fixed the formatting.
2018-11-27 12:52:44 +02:00
842f9f1d15 Fix transaction replay timeout
The timeout would not be triggered due to the fact that the
delayed_retry_timeout wasn't inspected.
2018-11-26 09:42:12 +02:00
7bf5c07835 Ignore errors sent by servers in shutdown
When a server is stopping, it'll send an error to the client before
terminating the TCP connection. The code in readwritesplit would detect
this error and create a hangup event on the DCB. This would cause it to
appear as if the TCP connection was broken and the router would
immediately try to reconnect to the same server.

By ignoring the error and allowing the connection to die on its own, we
avoid immediately reconnecting and retrying any transactions on the
stopping server. This increases the chances that the monitor will see it
first and assign the server states correctly before the transaction replay
is attempted.
2018-11-26 09:42:12 +02:00
9f6700b329 Skip connection_keepalive during transaction_replay
When a transaction is replayed, there is no target but the routing was
"successful".
2018-11-26 09:42:11 +02:00
925670ae2f Fix false master failure log message
The message would be logged even if the session continues.
2018-11-26 09:42:11 +02:00
8b92c63248 Remove incorrect assertion
The assertion would hold true for a single worker but it can't be
guaranteed to be true on a multi-worker system where the statistics are
distributed across the workers.
2018-11-26 09:42:11 +02:00
dcf53da209 Enable connection_keepalive by default
Enabling the feature by default prevents the master connection from dying
during times when there are very little or no writes. Having a modest ping
interval of 300 seconds serves to minimize the amount of extra work that
both MaxScale and the server have to do while still keeping the
connections in good shape.
2018-11-26 09:42:11 +02:00
cab8a4bde8 MXS-2144: Treat server shutdown as a network error
If the server where a query is being executed is shutting down,
readwritesplit should treat it as an error to make retrying of the query
possible.

By treating server shutdowns as network errors, the same code path that is
used for actual network errors can be taken. This removes the need for any
extra retrying logic for this particular case.
2018-11-14 16:23:47 +02:00
370483fb4b Log slave error message on failed session command
If the master succeeds in executing a session command but the slave fails,
the error message could help explain why it failed. At the moment this is
mainly relevant for inspection of test results.
2018-11-14 16:23:46 +02:00
c32bb18862 Fix transaction replay checksum mismatches
The transaction replay could get mixed up with new queries if the client
managed to perform one while the delayed routing was taking place. A
proper way to solve this would be to cork the client DCB until the
transaction is fully replayed. As this change would be relatively more
complex compared to simply labeling queries that are being retried the
corking implementation is left for later when a more complete solution can
be designed.

This commit also adds some of the missing info logging for the transaction
replaying which makes analysis of failures easier.
2018-11-13 16:48:03 +02:00
ae0e9b359d Fix use of zero-weight servers
The servers with a zero weight would be always used over ones that have a
weight. This means that the behavior was inverted and caused the
mxs2054_hybrid_cluster test to fail in 2.3.

Also fixed a typo in the deprecation message.
2018-11-12 10:13:59 +02:00
b443bb7525 Store PS session commands with internal ID
Commit a9e236497963251f8b4afa07484b88ad97e73a03 changed where the PS ID
for a binary protocol command is replaced with the internal form. This
caused prepared statements that are also session commands to be always
routed with the external ID.

As the external ID is almost always the master's ID, the aforementioned
bug resulted in odd side-effects and the true cause of these was only
revealed when the error message sent by the slave was included in the log
messages.
2018-11-12 10:13:59 +02:00
2a6df0e724 Merge branch '2.2' into 2.3 2018-11-09 14:22:28 +02:00
a9e2364979 Fix unknown PS ID on query re-routing
If a PS command is routed multiple times, the ID will not be reverted to
the external ID in the failure cases. This prevented prepared statements
from being re-routed correctly.
2018-11-09 12:13:22 +02:00
bfc8cb4803 MXS-2151: Always log fatal master connection errors
When the connection to the master is broken, the session is not configured
to use the read-only modes and the monitor can still connect to the
server, the connection will be closed and and error is sent to the
client. To leave some trace of this problem in the MaxScale logs, a
message should always be logged when a network error occurs.
2018-11-09 00:39:32 +02:00
c899f00541 MXS-1780 Collect server response information
As the router is the only one that knows what backends a particular
statement has been sent to, it is the responsibility of the router
to keep the session bookkeeping up to date. If it doesn't we will
know what statements a session has received (provided at least some
component in the routing chain has RCAP_TYPE_STMT_INPUT capability),
but not how long their processing took. Currently only readwritesplit
does that.

All queries are stored and not just COM_QUERY as that makes the
overall bookkeeping simpler; at clientReply() time we do not need to
know whether or not to bookkeep information, we can just do it.

When session information is queried for, we report as much information
we have available.
2018-11-08 12:04:55 +02:00
3ccdb508de Fix bug in roulette wheel
Slot values were changed after the total was calculated. Fix bug
and adjust the offending code.
2018-11-08 10:50:23 +02:00
c692c864e2 MXS-2078 Take new statistics into use 2018-11-08 10:44:32 +02:00
113b1503f6 Expand readwritesplit delayed retry error message
The error now explains if the write failure was due to the
delayed_retry_timeout being reached.
2018-11-06 15:09:14 +02:00
4341c2b6e2 MXS-2142: Set causal_reads_timeout default to 10
The causal_reads_timeout default value is too long when considering the
behavioral changes that MXS-2141 introduced. With a 10 second default
value, a result is returned to the client in a reasonable amount of time.
2018-11-06 15:09:14 +02:00
e56372b153 MXS-2141: Retry query on master if it times out on slave
With causal_reads enabled, the query would return with an error if the
slave was not able to catch up to the master fast enough. By automatically
retrying the query on the master, we're guaranteed that a valid result is
always returned to the client.
2018-11-06 15:09:14 +02:00
c661f5e838 MXS-2139: Extend transaction_replay requirements
Enabling transaction_replay now automatically enables
master_failure_mode=fail_on_write. This makes the behavior consistent
across all failure modes.
2018-11-06 15:09:14 +02:00
95745f5a4e MXS-2140: Fix readwritesplit configuration processing
Runtime configuration changes did not properly enable implicitly enabled
parameters.
2018-11-06 15:09:14 +02:00
f8c132903b Fix query average measurment and average text output.
The query_ended() call was not in the right spot. Tests did not
detect it. Changed textual output to reflect the fact that they
are for RWSplit reads.
2018-11-04 17:18:09 +02:00
4d8a95d041 Merge commit '262f1d7e471bacca6b985ec3f2cd5cb76d6e2584' into 2.3 2018-10-26 12:44:57 +03:00
192563a947 MXS-2108: Fix open connection calculation
When a connection to a server is lost and the session command history is
disabled, the session will continue as long as at least one connection is
open. Previously the open connection calculation used the same code that
was used when a new session was created which only inspected the
configured server count instead of the actual open connection count.
2018-10-19 15:20:34 +03:00
f8cf5053bd MXS-2103: Fix CREATE TEMPORARY TABLE detection
The table creation was not detected as the function used to extract the
table name did not return the fully qualified names. Even if it did return
a fully qualified name, it wouldn't have been correctly processed.
2018-10-18 20:26:58 +03:00
20af9afb49 Merge branch '2.2' into 2.3 2018-10-16 11:10:48 +03:00
92057f6ff9 Add more logging to readwritesplit
When a read-only transaction fails due to a connection error, no message
would be logged. Also added an info level message for the case when a
backend connection would get closed before the session is in the correct
state and a debug assertion that the router session should never be closed
when the handleError method is called.
2018-10-16 11:04:57 +03:00
75ea1b6ea1 Fix formatting of new(std::nothrow)
The code previously formatted everything as `new( std::nothrow)`.
2018-10-04 21:50:44 +03:00
3b1b63d939 MXS-1777 Change LOWEST_RESPONSE_TIME to ADAPTIVE_ROUTING
LOWEST_RESPONSE_TIME is not quite correct, and marketing material
will call it Adaptive Routing, so better match that.
2018-10-04 19:26:16 +03:00
ada91f2d53 MXS-1777 Make sure slower servers are sampled sufficiently 2018-10-04 17:53:57 +03:00
d866cb3a21 Add bias value to server score calculations
By biasing the values of all counter type scores to positive integers, the
server weights are always taken into use.

This fixes the case when weights were ignored until all score base values
were larger than zero (the mxs922_server test).
2018-10-03 08:41:44 +03:00
9278da1f54 MXS-2067: Remove spinlock.h
Removed the spinlock.h header and replaced with plain pthread types and
functions.
2018-09-28 12:18:24 +03:00
66227301aa Merge branch '2.2' into develop 2018-09-27 11:47:32 +03:00
7d231e5328 MXS-1777 Changing server weights to match 2.2 behavior.
Match 2.2, changed the weights back to non-inverse because 0-weight
is a special case. Renamed to server_weight for greppability.
2018-09-26 12:05:48 +03:00
7d2a5b2c13 Fix readwritesplit debug assertion
The debug assertion is wrong as the code was changed to prioritize hints
over the router target selection. Also removed the superficial check for
master, slave and relay master states as they are implied by the fact that
the connection is in use.
2018-09-26 11:08:23 +03:00
ddd6feff69 Move transaction management into a subfunction
The readwritesplit transaction management was a large part of the
clientReply function. Moving it into a separate function clarifies the
clientReply function by hiding the comments and details of the transaction
management.
2018-09-26 09:43:26 +03:00
24b438c9b6 MXS-2068: Split reply_is_complete into two functions
By splitting the processing and state querying into two separate
functions, the result can be inspected multiple times without triggering
the result processing.
2018-09-26 09:43:25 +03:00
a32361e894 MXS-2068: Move common functionality into RWBackend
The RWBackend now updates the internal state when a new write is done in
addition to acknowledging it when the reply is complete.
2018-09-26 09:43:25 +03:00
09a64753f1 MXS-2068: Move RWBackend into mysqlcommon
This cleanly allows multiple modules to use it.
2018-09-26 09:43:25 +03:00
e5a0b4e9bb Merge branch '2.2' into develop 2018-09-21 14:18:15 +03:00
d55c07dc2e MXS-2066: Reset resultset collection by default
The collection of resultsets needs to be disabled by default when a
response is received to cover the cases where an error is returned.

The collection of results should also not be set for queries that do not
generate any responses.
2018-09-21 11:14:45 +03:00
71ffef5708 Partially revert 4ba011266843857bbd3201e5b925a47e88e1808f
Add back leading operator enforcement.
2018-09-20 15:57:30 +03:00
d8d0b1a29c Merge branch '2.2' into develop 2018-09-20 12:21:53 +03:00
97a4cdcd49 MXS-2052: Log error on failed routing of session command
If no server receives the session command, an error is now logged.
2018-09-18 21:07:18 +03:00
2e069fa892 MXS-1632: Take mxb::atomic::add into use
The function now mostly replaces the use of atomic_add_ functions declared
in atomic.h.
2018-09-18 15:21:54 +03:00