Commit Graph

166 Commits

Author SHA1 Message Date
da83551493 MXS-2189: Prevent unwanted trx replay
When a transaction is being executed on a slave and the master fails, the
transaction replay would start.
2018-11-27 12:52:45 +02:00
1abcbd64bd MXS-2187: Allow multiple transaction retries
By resetting the replay state the transaction replay can start again on a
new server. This allows the replay process work when a master server is
shutting down.
2018-11-27 12:52:44 +02:00
e6325d39fb Delay initial transaction replay
By delaying the replay for a second, we give the monitor a small chance to
adap to master failures. It'll also prevent rapid re-querying if multiple
transaction replays are supported.
2018-11-27 12:52:44 +02:00
851793cb86 Fix transaction replay debug assertion
A transaction that just completed will go through the start_trx_replay
function as from the client protocol's point of view the transaction is
still open. The debug assertion did not take this into account and would
fail if a successful commit was the last thing done on master that failed.

Also fixed the formatting.
2018-11-27 12:52:44 +02:00
7bf5c07835 Ignore errors sent by servers in shutdown
When a server is stopping, it'll send an error to the client before
terminating the TCP connection. The code in readwritesplit would detect
this error and create a hangup event on the DCB. This would cause it to
appear as if the TCP connection was broken and the router would
immediately try to reconnect to the same server.

By ignoring the error and allowing the connection to die on its own, we
avoid immediately reconnecting and retrying any transactions on the
stopping server. This increases the chances that the monitor will see it
first and assign the server states correctly before the transaction replay
is attempted.
2018-11-26 09:42:12 +02:00
925670ae2f Fix false master failure log message
The message would be logged even if the session continues.
2018-11-26 09:42:11 +02:00
cab8a4bde8 MXS-2144: Treat server shutdown as a network error
If the server where a query is being executed is shutting down,
readwritesplit should treat it as an error to make retrying of the query
possible.

By treating server shutdowns as network errors, the same code path that is
used for actual network errors can be taken. This removes the need for any
extra retrying logic for this particular case.
2018-11-14 16:23:47 +02:00
c32bb18862 Fix transaction replay checksum mismatches
The transaction replay could get mixed up with new queries if the client
managed to perform one while the delayed routing was taking place. A
proper way to solve this would be to cork the client DCB until the
transaction is fully replayed. As this change would be relatively more
complex compared to simply labeling queries that are being retried the
corking implementation is left for later when a more complete solution can
be designed.

This commit also adds some of the missing info logging for the transaction
replaying which makes analysis of failures easier.
2018-11-13 16:48:03 +02:00
2a6df0e724 Merge branch '2.2' into 2.3 2018-11-09 14:22:28 +02:00
c899f00541 MXS-1780 Collect server response information
As the router is the only one that knows what backends a particular
statement has been sent to, it is the responsibility of the router
to keep the session bookkeeping up to date. If it doesn't we will
know what statements a session has received (provided at least some
component in the routing chain has RCAP_TYPE_STMT_INPUT capability),
but not how long their processing took. Currently only readwritesplit
does that.

All queries are stored and not just COM_QUERY as that makes the
overall bookkeeping simpler; at clientReply() time we do not need to
know whether or not to bookkeep information, we can just do it.

When session information is queried for, we report as much information
we have available.
2018-11-08 12:04:55 +02:00
c692c864e2 MXS-2078 Take new statistics into use 2018-11-08 10:44:32 +02:00
e56372b153 MXS-2141: Retry query on master if it times out on slave
With causal_reads enabled, the query would return with an error if the
slave was not able to catch up to the master fast enough. By automatically
retrying the query on the master, we're guaranteed that a valid result is
always returned to the client.
2018-11-06 15:09:14 +02:00
f8c132903b Fix query average measurment and average text output.
The query_ended() call was not in the right spot. Tests did not
detect it. Changed textual output to reflect the fact that they
are for RWSplit reads.
2018-11-04 17:18:09 +02:00
4d8a95d041 Merge commit '262f1d7e471bacca6b985ec3f2cd5cb76d6e2584' into 2.3 2018-10-26 12:44:57 +03:00
20af9afb49 Merge branch '2.2' into 2.3 2018-10-16 11:10:48 +03:00
ddd6feff69 Move transaction management into a subfunction
The readwritesplit transaction management was a large part of the
clientReply function. Moving it into a separate function clarifies the
clientReply function by hiding the comments and details of the transaction
management.
2018-09-26 09:43:26 +03:00
24b438c9b6 MXS-2068: Split reply_is_complete into two functions
By splitting the processing and state querying into two separate
functions, the result can be inspected multiple times without triggering
the result processing.
2018-09-26 09:43:25 +03:00
a32361e894 MXS-2068: Move common functionality into RWBackend
The RWBackend now updates the internal state when a new write is done in
addition to acknowledging it when the reply is complete.
2018-09-26 09:43:25 +03:00
2e069fa892 MXS-1632: Take mxb::atomic::add into use
The function now mostly replaces the use of atomic_add_ functions declared
in atomic.h.
2018-09-18 15:21:54 +03:00
c81173e320 Move C++ code out of C headers
The additions into the server.h header used C++ language which caused C
programs to fail to compile. Moved the implementation of the EMAverage
class into the private Server class in the server.hh header and exposed it
via functions in the server.h header. Also temporarily moved
almost_equal_server_scores into the public server.hh as there is no
service.hh header.
2018-09-10 11:21:06 +03:00
c447e5cf15 Uncrustify maxscale
See script directory for method. The script to run in the top level
MaxScale directory is called maxscale-uncrustify.sh, which uses
another script, list-src, from the same directory (so you need to set
your PATH). The uncrustify version was 0.66.
2018-09-09 22:26:19 +03:00
fa7ec95069 MXS-1777 Tune code for cases with slow, or new servers.
Changes that allow slow or new servers to quickly apply samples towards the
server average. The most important changes are to not ignore the first N samples,
and apply an average to the server as soon as there is one available.
The new ResponseStat::make_valid() will use filter samples to add an average,
if no averages have yet been added, even if the number of  filter samples is less
than the filter limit.
2018-09-09 14:17:40 +03:00
6351ab9c73 MXS-1777: Initial version of routing based on query response time.
The main piece of code, slave selection (backend_cmp_response_time), uses the available
method of pair-wise comparison of slaves. This will be changed to selection using all
available slaves, along with removal of hard coded values.
2018-09-05 17:05:06 +03:00
a13e95951b Merge branch '2.2' into develop 2018-08-30 11:37:49 +03:00
3f53eddbde MXS-2020 Replace ss[_info]_dassert with mxb_assert[_message] 2018-08-22 11:34:59 +03:00
ae43e4f0f2 MXS-2013 Remove all CHK_-macros 2018-08-15 09:28:04 +03:00
c01840ffb3 Remove unnecessary SConfig from readwritesplit
The configuration doesn't need to be contained in shared pointer as each
session holds its own version of it. This removes most of the overhead in
configuration reloading. The only thing that's left is any overhead added
by the use of thread-local storage.
2018-08-06 21:20:29 +03:00
d7a3980308 Read correct parameter for causal_reads
The configuration used the wrong parameter name. The test also did not
explicitly enable tracking of the last_gtid variable which caused it to
fail if it wasn't already on.
2018-07-31 09:41:09 +03:00
6c59da77fb Merge branch '2.2' into develop 2018-07-26 11:27:09 +03:00
2acf5f545e MXS-1066 Add query hint to route to last used server
Add new hint type and support for it in the readwritesplit router.
2018-07-13 11:11:02 +03:00
f3c84d84c7 Fix transaction migration
The transaction migration in the case of a changed master never worked as
transaction replay would only be triggered when the master fails. To cover
this case, the transaction replay just needs to be started when the need
for a transaction migration is detected.

To help diagnose the behavior, the Trx class no longer logs a message when
a transaction is closed. This is now done by readwritesplit which has more
knowledge of the context in which the transaction is closed.
2018-07-11 14:08:50 +03:00
86cdb14286 Don't process queued commands when replaying transaction
If a transaction is replayed, queued commands must not be processed. The
exception to this rule is when pending session commands are executed
before the first statement in the replayed transaction is executed.
2018-07-11 14:08:47 +03:00
0614ff4c9d Fix handling of transactions with large results
If transaction replaying was enabled and a result was returned in more
than one call to clientReply, a NULL value would be added to the statement
which in turn would trigger a debug assertion.

Similarly any following statements in the transaction would be executed
regardless of whether the result was complete.

Renamed the statement execution function to better describe what it does.

Extended the basic functional test case to cover this.
2018-07-11 14:08:47 +03:00
bd4be3a97b Use shared configurations in readwritesplit
By using a shared pointer instead of a plain object, we can replace the
router configuration without it affecting existing sessions. This is a
change that is required to enable runtime reconfiguration of
readwritesplit.
2018-07-11 14:08:45 +03:00
8d7cb27884 Remove faulty debug assertion
The debug assertion was missing the check for the queued commands.
2018-07-02 13:29:21 +03:00
9c6cc713c8 Remove unnecessary session command logging
All executed session commands were logged in the RWSplitSession
destructor. This is not really necessary and shouldn't have been placed
there in the first place.
2018-07-02 13:29:20 +03:00
12398bfc26 MXS-1549: Implement optimistic transaction execution
When the `optimistic_trx` mode is enabled, all transactions are started on
a slave server. If the client executes a query inside the transaction that
is not of a read-only nature, the transaction is rolled back and replayed
on the master.
2018-07-02 13:29:19 +03:00
d6a964304b MXS-1549: Always store previous target
Unconditionally update the previous target on each routed query. This
allows routing to the previous server in case it is needed. One example of
this is a new type of hint that allows routing to the same server where
the previous query was sent.

Also added a minor clarifying comment to the resetting of the
current_query.
2018-07-02 13:29:18 +03:00
93fdada534 Fix crash on trx replay with session command
Readwritesplit would crash with the following transaction:

    BEGIN;
    SET @a = 1; -- This is where it would crash
    COMMIT;

When a session command was a part of the transaction, empty queries
(i.e. NULL GWBUFs) would be added to the transaction. If the transaction
were to be replayed, MaxScale would crash when these NULL queries were
executed.

Once the empty responses were fixed, the replaying of the transaction
would fail with a checksum mismatch. This was caused by the wrong order of
processing in RWSplitSession::clientReply. The response processing for
session commands was done after the response processing for replayed
transactions. This would trigger a checksum comparison too early for the
transaction in question.
2018-06-30 19:26:23 +03:00
cc0299aee6 Update change date of 2.3 2018-06-25 10:07:52 +03:00
e561c3995c Use correct write in Backend::execute_session_command
Backend::execute_session_command would use the overridden write method
instead of the Backend::write method that it intended to use. This caused
session commands that did not expect a response to be in a state that
expected a result.

Also fixed RWBackend::write pass the response_type value to
Backend::write.
2018-06-22 10:37:11 +03:00
6278f27ab6 Merge branch '2.2' into develop 2018-06-20 10:26:29 +03:00
1f166482b2 Fix slave reconnection regression
The state of the backend needs to be checked before any pending session
commands are executed on it.

Added debug assertions to catch invalid use of the status functions of
closed backends.
2018-06-18 14:25:05 +03:00
2005164222 Fix slave reconnection logic
Allowing calls to select_connect_backend_servers even when all slaves are
connected solves the debug assertion in select_connect_backend_servers
that happens when the execution of a queued query causes a new connection
to be created.
2018-06-15 16:16:53 +03:00
3ed6411741 Fix debug assert on reconnection with session commands
When a query was routed to a server that must first be connected to, the
expected response count was not updated for the executed session commands.
2018-06-15 16:16:53 +03:00
0d73530ff3 Merge branch '2.2' into develop 2018-06-08 11:30:55 +03:00
445eece95b MXS-1507: Fix replaying of empty transactions
If the starting of a transaction was interrupted by a server failure, the
query needs to be retried. This needs to be done as a transaction replay
to keep the routing logic consistent and simple.

When a non-autocommit transaction is interrupted, there will be no query
in progress and no replaying is needed. To handle this case, the replay
initialization logic needed to be altered to treat truly empty
transactions as a success case.
2018-06-04 19:26:36 +03:00
4a3216d483 Merge branch '2.2' into develop 2018-06-04 16:00:19 +03:00
2bbf1271c9 Fix large packet execution
The number of expected responses was not correctly tracked for large
packets.
2018-05-28 13:51:05 +03:00
a33f09ad06 Fix test failures and add debug logging
Fixed test failures, increased some of the timeouts, added extra info
level logging into rwsplit to help debug the test failures.
2018-05-22 17:46:27 +03:00