Commit Graph

231 Commits

Author SHA1 Message Date
021d48f94c Log low-level reason and idle time on master failure
If the connection to the master is lost, knowing what type of an error
caused the call to handleError helps deduce what was the real reason for
it. Logging the idle time of the connection helps detect when the
wait_timeout of a connection is exceeded.
2019-01-16 09:43:49 +02:00
7cac2c009d Merge branch '2.3' into develop 2019-01-10 12:43:46 +02:00
9cac927542 MXS-2220 Move server response calculation functions inside class 2019-01-10 10:26:53 +02:00
147f0bb656 Extend master failure error message
The error now describes the failure mode in more detail. This should make
post mortem analysis of failed connections a lot easier.
2019-01-09 20:05:38 +02:00
f0f9c21d1c Merge branch '2.3' into develop 2019-01-07 10:54:42 +02:00
40485d746c MXS-2220 Change server name to constant string 2019-01-03 12:13:15 +02:00
9adbd2f8f0 Cache the local server statistics object
By storing the server statistics object in side the session, the lookup
involved in getting a worker-local value is avoided. Since the lookup is
done multiple times for a single query, it is beneficial to store it in
the session.

As the worker-local value is never deleted, it is safe to store a
reference to it in the session. It is also never updated concurrently so
no atomic operations are necessary.
2019-01-03 09:37:59 +02:00
1fa3b133c7 Make keepalive ping checks more efficient
The code now only checks the need for a keepalive ping once every
keepalive interval. Reduced the number of mxs_clock calls to one so that
all servers use the same value.
2019-01-03 09:37:59 +02:00
4d0a40ef9f Add missing pointer initialization
The change from SRWBackend to RWBackend* had some side effects, namely the
missing automatic initialization into zero values.
2018-12-28 08:19:23 +02:00
20fe9b9dca MXS-2196: Rename session states
Minor renaming of the session state enum values. Also exposed the session
state stringification function in the public header and removed the
stringification macro.
2018-12-13 13:27:45 +02:00
48efa6d027 MXS-2213: Clear stored PS information
The information stored for each prepared statement would not be cleared
until the end of the session. This is a problem if the sessions last for a
very long time as the stored information is unused once a COM_STMT_CLOSE
has been received.

In addition to this, the session command response maps were not cleared
correctly if all backends had processed all session commands.
2018-12-11 13:54:10 +02:00
77477d9648 MXS-2196: Rename dcb_role_t to DCB::Role 2018-12-05 15:30:44 +02:00
0d09b56f58 MXS-2025 RWBackends as a vector of unique_ptr:s
For lifetime management keep RWBackends in a vector of unique_ptrs.
RWSplitSession keeps the unique_ptrs very private, and provides a vector
of plain pointers for all other interfaces.
2018-12-05 10:23:57 +02:00
20b62a3f3d MXS-2025 Change RWBackend usage to a vector of raw ptrs.
This is essentially just a search and replace to change SRWBackend to
RWBackend* and SRWBackendList to PRWBackends, a vector of a raw
pointers. In the next few commits vector<unique_ptr<RWBackend>>
will be used for life time management.

There are a lot of diffs from the global search and replace. Only a few manual
edits had to be done.

list-src -x build | xargs sed -ri 's/SRWBackends/prwbackends/g'
list-src -x build | xargs sed -ri 's/const mxs::SRWBackend\&/const mxs::RWBackend\*/g'
list-src -x build | xargs sed -ri 's/const SRWBackend\&/const RWBackend\*/g'
list-src -x build | xargs sed -ri 's/mxs::SRWBackend\&/mxs::RWBackend\*/g'
list-src -x build | xargs sed -ri 's/mxs::SRWBackend/mxs::RWBackend\*/g'
list-src -x build | xargs sed -ri 's/SRWBackend\(\)/nullptr/g'
list-src -x build | xargs sed -ri 's/mxs::SRWBackend\&/mxs::RWBackend\*/g'
list-src -x build | xargs sed -ri 's/mxs::SRWBackend/mxs::RWBackend\*/g'
list-src -x build | xargs sed -ri 's/SRWBackend\&/RWBackend\*/g'
list-src -x build | xargs sed -ri 's/SRWBackend\b/RWBackend\*/g'
list-src -x build | xargs sed -ri 's/prwbackends/PRWBackends/g'
2018-12-05 10:23:57 +02:00
d96a7dedc5 MXS-2205 Convert maxscale/poll.h to .hh 2018-12-04 14:51:02 +02:00
da83551493 MXS-2189: Prevent unwanted trx replay
When a transaction is being executed on a slave and the master fails, the
transaction replay would start.
2018-11-27 12:52:45 +02:00
1abcbd64bd MXS-2187: Allow multiple transaction retries
By resetting the replay state the transaction replay can start again on a
new server. This allows the replay process work when a master server is
shutting down.
2018-11-27 12:52:44 +02:00
e6325d39fb Delay initial transaction replay
By delaying the replay for a second, we give the monitor a small chance to
adap to master failures. It'll also prevent rapid re-querying if multiple
transaction replays are supported.
2018-11-27 12:52:44 +02:00
851793cb86 Fix transaction replay debug assertion
A transaction that just completed will go through the start_trx_replay
function as from the client protocol's point of view the transaction is
still open. The debug assertion did not take this into account and would
fail if a successful commit was the last thing done on master that failed.

Also fixed the formatting.
2018-11-27 12:52:44 +02:00
7bf5c07835 Ignore errors sent by servers in shutdown
When a server is stopping, it'll send an error to the client before
terminating the TCP connection. The code in readwritesplit would detect
this error and create a hangup event on the DCB. This would cause it to
appear as if the TCP connection was broken and the router would
immediately try to reconnect to the same server.

By ignoring the error and allowing the connection to die on its own, we
avoid immediately reconnecting and retrying any transactions on the
stopping server. This increases the chances that the monitor will see it
first and assign the server states correctly before the transaction replay
is attempted.
2018-11-26 09:42:12 +02:00
925670ae2f Fix false master failure log message
The message would be logged even if the session continues.
2018-11-26 09:42:11 +02:00
cab8a4bde8 MXS-2144: Treat server shutdown as a network error
If the server where a query is being executed is shutting down,
readwritesplit should treat it as an error to make retrying of the query
possible.

By treating server shutdowns as network errors, the same code path that is
used for actual network errors can be taken. This removes the need for any
extra retrying logic for this particular case.
2018-11-14 16:23:47 +02:00
c32bb18862 Fix transaction replay checksum mismatches
The transaction replay could get mixed up with new queries if the client
managed to perform one while the delayed routing was taking place. A
proper way to solve this would be to cork the client DCB until the
transaction is fully replayed. As this change would be relatively more
complex compared to simply labeling queries that are being retried the
corking implementation is left for later when a more complete solution can
be designed.

This commit also adds some of the missing info logging for the transaction
replaying which makes analysis of failures easier.
2018-11-13 16:48:03 +02:00
2a6df0e724 Merge branch '2.2' into 2.3 2018-11-09 14:22:28 +02:00
c899f00541 MXS-1780 Collect server response information
As the router is the only one that knows what backends a particular
statement has been sent to, it is the responsibility of the router
to keep the session bookkeeping up to date. If it doesn't we will
know what statements a session has received (provided at least some
component in the routing chain has RCAP_TYPE_STMT_INPUT capability),
but not how long their processing took. Currently only readwritesplit
does that.

All queries are stored and not just COM_QUERY as that makes the
overall bookkeeping simpler; at clientReply() time we do not need to
know whether or not to bookkeep information, we can just do it.

When session information is queried for, we report as much information
we have available.
2018-11-08 12:04:55 +02:00
c692c864e2 MXS-2078 Take new statistics into use 2018-11-08 10:44:32 +02:00
e56372b153 MXS-2141: Retry query on master if it times out on slave
With causal_reads enabled, the query would return with an error if the
slave was not able to catch up to the master fast enough. By automatically
retrying the query on the master, we're guaranteed that a valid result is
always returned to the client.
2018-11-06 15:09:14 +02:00
f8c132903b Fix query average measurment and average text output.
The query_ended() call was not in the right spot. Tests did not
detect it. Changed textual output to reflect the fact that they
are for RWSplit reads.
2018-11-04 17:18:09 +02:00
4d8a95d041 Merge commit '262f1d7e471bacca6b985ec3f2cd5cb76d6e2584' into 2.3 2018-10-26 12:44:57 +03:00
20af9afb49 Merge branch '2.2' into 2.3 2018-10-16 11:10:48 +03:00
ddd6feff69 Move transaction management into a subfunction
The readwritesplit transaction management was a large part of the
clientReply function. Moving it into a separate function clarifies the
clientReply function by hiding the comments and details of the transaction
management.
2018-09-26 09:43:26 +03:00
24b438c9b6 MXS-2068: Split reply_is_complete into two functions
By splitting the processing and state querying into two separate
functions, the result can be inspected multiple times without triggering
the result processing.
2018-09-26 09:43:25 +03:00
a32361e894 MXS-2068: Move common functionality into RWBackend
The RWBackend now updates the internal state when a new write is done in
addition to acknowledging it when the reply is complete.
2018-09-26 09:43:25 +03:00
2e069fa892 MXS-1632: Take mxb::atomic::add into use
The function now mostly replaces the use of atomic_add_ functions declared
in atomic.h.
2018-09-18 15:21:54 +03:00
c81173e320 Move C++ code out of C headers
The additions into the server.h header used C++ language which caused C
programs to fail to compile. Moved the implementation of the EMAverage
class into the private Server class in the server.hh header and exposed it
via functions in the server.h header. Also temporarily moved
almost_equal_server_scores into the public server.hh as there is no
service.hh header.
2018-09-10 11:21:06 +03:00
c447e5cf15 Uncrustify maxscale
See script directory for method. The script to run in the top level
MaxScale directory is called maxscale-uncrustify.sh, which uses
another script, list-src, from the same directory (so you need to set
your PATH). The uncrustify version was 0.66.
2018-09-09 22:26:19 +03:00
fa7ec95069 MXS-1777 Tune code for cases with slow, or new servers.
Changes that allow slow or new servers to quickly apply samples towards the
server average. The most important changes are to not ignore the first N samples,
and apply an average to the server as soon as there is one available.
The new ResponseStat::make_valid() will use filter samples to add an average,
if no averages have yet been added, even if the number of  filter samples is less
than the filter limit.
2018-09-09 14:17:40 +03:00
6351ab9c73 MXS-1777: Initial version of routing based on query response time.
The main piece of code, slave selection (backend_cmp_response_time), uses the available
method of pair-wise comparison of slaves. This will be changed to selection using all
available slaves, along with removal of hard coded values.
2018-09-05 17:05:06 +03:00
a13e95951b Merge branch '2.2' into develop 2018-08-30 11:37:49 +03:00
3f53eddbde MXS-2020 Replace ss[_info]_dassert with mxb_assert[_message] 2018-08-22 11:34:59 +03:00
ae43e4f0f2 MXS-2013 Remove all CHK_-macros 2018-08-15 09:28:04 +03:00
c01840ffb3 Remove unnecessary SConfig from readwritesplit
The configuration doesn't need to be contained in shared pointer as each
session holds its own version of it. This removes most of the overhead in
configuration reloading. The only thing that's left is any overhead added
by the use of thread-local storage.
2018-08-06 21:20:29 +03:00
d7a3980308 Read correct parameter for causal_reads
The configuration used the wrong parameter name. The test also did not
explicitly enable tracking of the last_gtid variable which caused it to
fail if it wasn't already on.
2018-07-31 09:41:09 +03:00
6c59da77fb Merge branch '2.2' into develop 2018-07-26 11:27:09 +03:00
2acf5f545e MXS-1066 Add query hint to route to last used server
Add new hint type and support for it in the readwritesplit router.
2018-07-13 11:11:02 +03:00
f3c84d84c7 Fix transaction migration
The transaction migration in the case of a changed master never worked as
transaction replay would only be triggered when the master fails. To cover
this case, the transaction replay just needs to be started when the need
for a transaction migration is detected.

To help diagnose the behavior, the Trx class no longer logs a message when
a transaction is closed. This is now done by readwritesplit which has more
knowledge of the context in which the transaction is closed.
2018-07-11 14:08:50 +03:00
86cdb14286 Don't process queued commands when replaying transaction
If a transaction is replayed, queued commands must not be processed. The
exception to this rule is when pending session commands are executed
before the first statement in the replayed transaction is executed.
2018-07-11 14:08:47 +03:00
0614ff4c9d Fix handling of transactions with large results
If transaction replaying was enabled and a result was returned in more
than one call to clientReply, a NULL value would be added to the statement
which in turn would trigger a debug assertion.

Similarly any following statements in the transaction would be executed
regardless of whether the result was complete.

Renamed the statement execution function to better describe what it does.

Extended the basic functional test case to cover this.
2018-07-11 14:08:47 +03:00
bd4be3a97b Use shared configurations in readwritesplit
By using a shared pointer instead of a plain object, we can replace the
router configuration without it affecting existing sessions. This is a
change that is required to enable runtime reconfiguration of
readwritesplit.
2018-07-11 14:08:45 +03:00
8d7cb27884 Remove faulty debug assertion
The debug assertion was missing the check for the queued commands.
2018-07-02 13:29:21 +03:00