Commit Graph

166 Commits

Author SHA1 Message Date
5828061321 Merge branch '2.3' into develop 2019-05-17 14:39:30 +03:00
96a477ec89 MXS-2490: Send error to client on unknown PS handle
If a client requests an unknown binary protocol prepared statement handle,
a custom error shows the actual ID used instead of the "empty" ID of 0
that the backend sends.
2019-05-17 14:13:44 +03:00
bf63698991 MXS-2464: Bring back the runtime query queue check
The code that checked that only non-empty queries are stored in the query
queue was left out when the query queue fix was backported to 2.3. Since
MXS-2464 is caused by a still unknown bug, the runtime check should help
figure out in which cases the problem occurs.
2019-05-17 13:03:03 +03:00
418ccf861d Format routers and monitors 2019-05-10 10:31:12 +03:00
23a09a6294 MXS-2455 Use mxb::Buffer::iterator
Simplifies the code and as extra allocations etc. are only
made when info is enabled, and can thus be ignored.
2019-05-09 15:04:03 +03:00
c818b1208a MXS-2455 Recognize transaction rollbacks
All transaction rollback errors have an sql_state like "40XXX".
So, when an error reply is received we check for that and act
accordingly.
2019-05-08 10:00:50 +03:00
3dd9298b18 MXS-2456: Test transaction replay cap
Added a test that makes sure the transaction replay cap is respected. Also
improved the logging to show how many transaction replay attemps have been
done and to log if a replay is not done due to too many attempts.
2019-05-02 16:59:36 +03:00
26b2897280 MXS-2456: Cap transaction replay attempts
In most cases it is reasonable to stop attempting transaction replays
after a certain number of failed attempts. This prevents transactions from
being repeatedly replayed on the same server over and over again if, for
example, it keeps crashing.
2019-05-02 16:59:36 +03:00
01b1d469a8 MXS-2435 Handle recoverable Clustrix errors
If
- transaction replay is enabled,
- an error is returned and
- the error is one of the recoverable Clustrix errors
we will retry the transaction.

If it succeeds, then the client will not notice anything but
for a short delay.

Note that the error message is looked for irrespective of whether
the backend is Clustrix or not. However, as errors are not common
the price for doing that can probably be ignored.

However, a bigger problem is that explicit knowledge of different
backends should *not* be coded into routers.
2019-04-26 10:54:57 +03:00
c643f9bc8d Merge branch '2.3' into develop 2019-04-12 13:23:49 +03:00
ec890b33cd Prevent checksum mismatch on second trx replay
If a transaction replay has to be executed twice due to a failure of the
original candidate master, the query queue could contain replayed
queries. The replayed queries would be placed into the queue if a new
connection needs to be created before the transaction replay can start.
2019-04-05 13:33:16 +03:00
6421af1bb4 Backport query queue changes to 2.3
Backported the changes that convert the query queue in readwritesplit into
a proper queue. This changes combines both
5e3198f8313b7bb33df386eb35986bfae1db94a3 and
6042a53cb31046b1100743723567906c5d8208e2 into one commit.
2019-04-05 13:33:16 +03:00
2aa3515fc8 Merge commit '09cb4a885f88d30b5108d215dcdaa5163229a230' into develop 2019-04-04 14:34:17 +03:00
a217dde1f0 MXS-2419: Queue queries executed during trx replay
By storing the queries in the query queue and routing it once the
transaction replay is done, we prevent two problems:

* Multiple transaction replays would overwrite the m_interrupted_query
  buffer that was used to store any queries executed during the
  transaction replay.

* Incorrect ordering of queries when the query queue is not empty and a
  new query is executed during transaction replay.
2019-04-03 12:57:05 +03:00
5242cd5ebf Readwritesplit: Graceful maintenance mode
By allowing transactions to the master to end even if the server is in
maintenance mode makes it possible to terminate connections at a known
point. This helps prevent interrupted transactions which can help reduce
errors that are visible to the clients.
2019-04-02 14:21:54 +03:00
74eeb64fba Don't close connections to servers being drained
The connections to servers being drained should not be closed like they
should be for servers in maintenance mode. The change in functionality
between 2.3 and develop caused the connections to be discarded if the
server was in either maintenance or drain mode.
2019-03-21 18:19:10 +02:00
9bc721afb6 Merge commit '11ee74bad327e7fb15e8388d20e7838b9e49cadf' into 2.3 2019-03-21 17:52:42 +02:00
6042a53cb3 Replace raw GWBUF pointers with mxs::Buffer
Now that the query queue is stored in an actual container, it is only
logical to use mxs::Buffer instead of GWBUF as the stored type.
2019-03-18 13:18:52 +02:00
5e3198f831 Replace the plain GWBUF query queue with std::deque
Using a std::deque to store the queries retains the exact state of the
object thus removing the need to parse the query again. It also removes
the need to split the queue into individual packets which makes the code
cleaner.
2019-03-18 13:18:52 +02:00
0001babd26 Clean up readwritesplit routing functions
Moved the more verbose parts of the routing code into subfunctions and
arranged it so that more relevant parts are closer to each other. Also
added the SQL statement that is being delayed to the message.
2019-03-18 13:18:52 +02:00
4bf9fa872c MXS-2313: Use servers of same rank in readwritesplit
When a readwritesplit session has a connection to a master server, servers
of the same rank as the master are used. If no master connection is
available, the server with the highest rank among all connected servers is
used. If there are no open connections, the server with the best rank is
chosen and a connection to it is made.

Connections with different rank values than what is the current rank value
of the session will be discarded. This reduces the use of server with
different ranks when the master server of a session fails. Without the
active pruning of connections, slave connections to primary clusters
without masters would remain in use even after the primary master
fails. This guarantees full switchover to a secondary cluster if a master
change occurs.
2019-03-18 13:12:59 +02:00
ba448cb12c MXS-2313: Clean up readwritesplit connection creation
The connection creation is now internal to RWSplitSession. This makes the
code more readable by removing the need to pass parameters and allowing
easier reuse of existing functions. The various conditions require to
create connections are now also checked in only one place.
2019-03-18 13:12:58 +02:00
4dda31ffe3 Merge branch '2.2' into 2.3 2019-03-16 09:30:56 +02:00
995c890664 Fix uninitialized pointers in readwritesplit 2019-03-15 15:41:39 +02:00
667a9f1c6f Merge branch '2.3' into develop 2019-03-15 12:31:08 +02:00
09dc92973e Discard connections as the last step
Th discarding of connections in maintenance mode must be done after any
results have been written to them. This prevents closing of the connection
before the actual result is returned.
2019-03-14 12:15:30 +02:00
b537176248 Fix parsing of non-query packets
Packets that do not contain SQL should not be parsed.
2019-03-13 15:44:02 +02:00
1c3a5bda83 Merge branch '2.3' into develop 2019-03-11 12:29:56 +02:00
710e5df27b MXS-2365: Fix classification of queued queries
Queries in the query queue need to be explicitly parsed since they are
stored in a single buffer and thus share the query classification
information. In the next major version this should be changed into an
array of individual buffers instead of a shared buffer.
2019-03-08 14:45:18 +02:00
24ea222ed6 MXS-2350: Allow lazy connection creation
The lazy connection creation reduces the burden that short sessions place
on the backend servers. This also prevents the problems caused by early
disconnections that happen when only one server is used but multiple
connections are created. This does not solve the problem (MXS-619) but it
does mitigate it to acceptable levels.

This commit also adds a change to the weighting algorithm that prefers
existing connections over unopened ones. This helps avoid the
flip-flopping that happens when the absolute scores are very similar. The
hard-coded value might need to be tuned once testing is done.
2019-03-08 08:20:44 +02:00
95317725ce Merge branch '2.3' into develop 2019-03-07 16:21:03 +02:00
b97976c4ee MXS-2323: Close stale connections
Cleaning up and closing stale connections to servers in maintenance mode
helps administrators see when a server is no longer in use.
2019-03-07 15:59:26 +02:00
6038f1f386 Merge branch '2.3' into develop 2019-02-01 13:55:54 +02:00
24c9b62a2f Add verbose logging for session command failures
If the routing of a session command fails due to problems with the backend
connections, a more verbose error message is logged. The added status
information in the Backend class makes tracking the original cause of the
problem a lot easier due to knowing where, when and why the connection was
closed.
2019-01-31 14:23:26 +02:00
a3fa2f8111 Merge branch '2.3' into develop 2019-01-16 16:31:14 +02:00
021d48f94c Log low-level reason and idle time on master failure
If the connection to the master is lost, knowing what type of an error
caused the call to handleError helps deduce what was the real reason for
it. Logging the idle time of the connection helps detect when the
wait_timeout of a connection is exceeded.
2019-01-16 09:43:49 +02:00
7cac2c009d Merge branch '2.3' into develop 2019-01-10 12:43:46 +02:00
9cac927542 MXS-2220 Move server response calculation functions inside class 2019-01-10 10:26:53 +02:00
147f0bb656 Extend master failure error message
The error now describes the failure mode in more detail. This should make
post mortem analysis of failed connections a lot easier.
2019-01-09 20:05:38 +02:00
f0f9c21d1c Merge branch '2.3' into develop 2019-01-07 10:54:42 +02:00
40485d746c MXS-2220 Change server name to constant string 2019-01-03 12:13:15 +02:00
9adbd2f8f0 Cache the local server statistics object
By storing the server statistics object in side the session, the lookup
involved in getting a worker-local value is avoided. Since the lookup is
done multiple times for a single query, it is beneficial to store it in
the session.

As the worker-local value is never deleted, it is safe to store a
reference to it in the session. It is also never updated concurrently so
no atomic operations are necessary.
2019-01-03 09:37:59 +02:00
1fa3b133c7 Make keepalive ping checks more efficient
The code now only checks the need for a keepalive ping once every
keepalive interval. Reduced the number of mxs_clock calls to one so that
all servers use the same value.
2019-01-03 09:37:59 +02:00
4d0a40ef9f Add missing pointer initialization
The change from SRWBackend to RWBackend* had some side effects, namely the
missing automatic initialization into zero values.
2018-12-28 08:19:23 +02:00
20fe9b9dca MXS-2196: Rename session states
Minor renaming of the session state enum values. Also exposed the session
state stringification function in the public header and removed the
stringification macro.
2018-12-13 13:27:45 +02:00
48efa6d027 MXS-2213: Clear stored PS information
The information stored for each prepared statement would not be cleared
until the end of the session. This is a problem if the sessions last for a
very long time as the stored information is unused once a COM_STMT_CLOSE
has been received.

In addition to this, the session command response maps were not cleared
correctly if all backends had processed all session commands.
2018-12-11 13:54:10 +02:00
77477d9648 MXS-2196: Rename dcb_role_t to DCB::Role 2018-12-05 15:30:44 +02:00
0d09b56f58 MXS-2025 RWBackends as a vector of unique_ptr:s
For lifetime management keep RWBackends in a vector of unique_ptrs.
RWSplitSession keeps the unique_ptrs very private, and provides a vector
of plain pointers for all other interfaces.
2018-12-05 10:23:57 +02:00
20b62a3f3d MXS-2025 Change RWBackend usage to a vector of raw ptrs.
This is essentially just a search and replace to change SRWBackend to
RWBackend* and SRWBackendList to PRWBackends, a vector of a raw
pointers. In the next few commits vector<unique_ptr<RWBackend>>
will be used for life time management.

There are a lot of diffs from the global search and replace. Only a few manual
edits had to be done.

list-src -x build | xargs sed -ri 's/SRWBackends/prwbackends/g'
list-src -x build | xargs sed -ri 's/const mxs::SRWBackend\&/const mxs::RWBackend\*/g'
list-src -x build | xargs sed -ri 's/const SRWBackend\&/const RWBackend\*/g'
list-src -x build | xargs sed -ri 's/mxs::SRWBackend\&/mxs::RWBackend\*/g'
list-src -x build | xargs sed -ri 's/mxs::SRWBackend/mxs::RWBackend\*/g'
list-src -x build | xargs sed -ri 's/SRWBackend\(\)/nullptr/g'
list-src -x build | xargs sed -ri 's/mxs::SRWBackend\&/mxs::RWBackend\*/g'
list-src -x build | xargs sed -ri 's/mxs::SRWBackend/mxs::RWBackend\*/g'
list-src -x build | xargs sed -ri 's/SRWBackend\&/RWBackend\*/g'
list-src -x build | xargs sed -ri 's/SRWBackend\b/RWBackend\*/g'
list-src -x build | xargs sed -ri 's/prwbackends/PRWBackends/g'
2018-12-05 10:23:57 +02:00
d96a7dedc5 MXS-2205 Convert maxscale/poll.h to .hh 2018-12-04 14:51:02 +02:00