If
- transaction replay is enabled,
- an error is returned, and
- the error is one of the recoverable Clustrix errors,
we will retry the transaction.
If it succeeds, the client will notice nothing but a short delay.
Note that the error message is checked irrespective of whether the
backend is Clustrix or not, but as errors are not common, the cost of
that check can probably be ignored.
However, a bigger problem is that explicit knowledge of different
backends should *not* be coded into routers.
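A minimal sketch of the retry decision in C++; the Error type, the
helper names and the match patterns are all assumptions for
illustration, not code from this repository:

    #include <string>
    #include <vector>

    // Stand-in for the real error object; name and fields are assumptions.
    struct Error
    {
        int         code = 0;
        std::string message;

        explicit operator bool() const
        {
            return code != 0;
        }
    };

    // Matches the message against known recoverable Clustrix errors.
    // The pattern below is a placeholder, not the actual error text.
    bool is_recoverable_clustrix_error(const Error& err)
    {
        static const std::vector<std::string> patterns = {"placeholder pattern"};

        for (const auto& p : patterns)
        {
            if (err.message.find(p) != std::string::npos)
            {
                return true;
            }
        }

        return false;
    }

    // Replay only when all three conditions listed above hold.
    bool should_replay(bool replay_enabled, const Error& err)
    {
        return replay_enabled && err && is_recoverable_clustrix_error(err);
    }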
If a transaction replay has to be executed twice due to a failure of the
original candidate master, the query queue could contain replayed
queries. The replayed queries would be placed into the queue if a new
connection needs to be created before the transaction replay can start.
Backported the changes that convert the query queue in readwritesplit into
a proper queue. This change combines both
5e3198f8313b7bb33df386eb35986bfae1db94a3 and
6042a53cb31046b1100743723567906c5d8208e2 into one commit.
By storing the queries in the query queue and routing them once the
transaction replay is done, we prevent two problems (a sketch follows the
list):
* Multiple transaction replays would overwrite the m_interrupted_query
buffer that was used to store any queries executed during the
transaction replay.
* Incorrect ordering of queries when the query queue is not empty and a
new query is executed during transaction replay.
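A rough sketch of the queueing idea; Buffer, route_to_backend and the
member names are invented stand-ins for the real types:

    #include <deque>
    #include <string>
    #include <utility>

    using Buffer = std::string;  // stand-in for the real query buffer type

    class Session
    {
    public:
        void route_query(Buffer query)
        {
            if (m_replay_active)
            {
                // Queue instead of overwriting a single interrupted-query
                // buffer; ordering is preserved automatically.
                m_query_queue.push_back(std::move(query));
            }
            else
            {
                route_to_backend(query);
            }
        }

        void on_replay_complete()
        {
            m_replay_active = false;

            // Drain queued queries in their original order.
            while (!m_query_queue.empty())
            {
                route_to_backend(m_query_queue.front());
                m_query_queue.pop_front();
            }
        }

    private:
        void route_to_backend(const Buffer&) { /* deliver to a backend */ }

        bool m_replay_active = false;
        std::deque<Buffer> m_query_queue;
    };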
Allowing transactions to the master to end even if the server is in
maintenance mode makes it possible to terminate connections at a known
point. This helps prevent interrupted transactions, which reduces the
errors that are visible to the clients.
Connections to servers being drained should not be closed the way
connections to servers in maintenance mode are. The change in
functionality between 2.3 and develop caused the connections to be
discarded if the server was in either maintenance or drain mode.
Using a std::deque to store the queries retains the exact state of the
object, thus removing the need to parse the query again. It also removes
the need to split the queue into individual packets, which makes the code
cleaner.
Moved the more verbose parts of the routing code into subfunctions and
arranged it so that related parts are closer to each other. Also added
the SQL statement that is being delayed to the message.
When a readwritesplit session has a connection to a master server, servers
of the same rank as the master are used. If no master connection is
available, the server with the highest rank among all connected servers is
used. If there are no open connections, the server with the best rank is
chosen and a connection to it is made.
Connections whose rank differs from the session's current rank are
discarded. This reduces the use of servers with different ranks when the
master server of a session fails. Without the active pruning of
connections, slave connections to a primary cluster without a master
would remain in use even after the primary master fails. This guarantees
a full switchover to a secondary cluster if a master change occurs.
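The selection and pruning rules could be sketched roughly as below; the
Backend struct and the ordering of rank values (smaller is better here)
are assumptions for illustration:

    #include <vector>

    struct Backend
    {
        int  rank;       // smaller value = better rank (an assumption)
        bool connected;
        bool is_master;
    };

    int session_rank(const std::vector<Backend>& backends)
    {
        const Backend* best_connected = nullptr;
        const Backend* best_overall = nullptr;

        for (const auto& b : backends)
        {
            // 1. A master connection decides the rank of the session.
            if (b.connected && b.is_master)
            {
                return b.rank;
            }

            // 2. Otherwise, track the best rank among connected servers.
            if (b.connected && (!best_connected || b.rank < best_connected->rank))
            {
                best_connected = &b;
            }

            // 3. ...and the best rank overall for when nothing is open.
            if (!best_overall || b.rank < best_overall->rank)
            {
                best_overall = &b;
            }
        }

        if (best_connected)
        {
            return best_connected->rank;
        }

        return best_overall ? best_overall->rank : 0;
    }

    // Connections whose rank differs from the session's rank are pruned.
    bool should_discard(const Backend& b, int session_rank_value)
    {
        return b.connected && b.rank != session_rank_value;
    }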
The connection creation is now internal to RWSplitSession. This makes the
code more readable by removing the need to pass parameters and allowing
easier reuse of existing functions. The various conditions required to
create connections are now also checked in only one place.
The discarding of connections in maintenance mode must be done after any
results have been written to them. This prevents the connection from
being closed before the actual result is returned.
Queries in the query queue need to be explicitly parsed since they are
stored in a single buffer and thus share the query classification
information. In the next major version this should be changed into an
array of individual buffers instead of a shared buffer.
The lazy connection creation reduces the burden that short sessions place
on the backend servers. This also prevents the problems caused by early
disconnections that happen when only one server is used but multiple
connections are created. This does not solve the problem (MXS-619) but it
does mitigate it to acceptable levels.
This commit also adds a change to the weighting algorithm that prefers
existing connections over unopened ones. This helps avoid the
flip-flopping that happens when the absolute scores are very similar. The
hard-coded value might need to be tuned once testing is done.
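A sketch of the preference for open connections; the score scale
(lower is better here) and the bonus value are assumptions, and the
commit itself notes the hard-coded value may need tuning:

    // An open connection gets its score scaled down so that near-ties
    // resolve in favor of reuse instead of flip-flopping.
    double adjusted_score(double base_score, bool has_open_connection)
    {
        constexpr double reuse_bonus = 0.85;  // illustrative value only
        return has_open_connection ? base_score * reuse_bonus : base_score;
    }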
If the routing of a session command fails due to problems with the backend
connections, a more verbose error message is logged. The added status
information in the Backend class makes tracking the original cause of the
problem a lot easier due to knowing where, when and why the connection was
closed.
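Tracking where, when and why a connection was closed could look roughly
like this; all names in the sketch are invented:

    #include <ctime>
    #include <string>

    class Backend
    {
    public:
        void close(std::string reason, std::string where)
        {
            m_closed_at = std::time(nullptr);
            m_close_reason = std::move(reason);
            m_closed_in = std::move(where);
        }

        // Included in the error message when routing a session command
        // fails, so the original cause of the problem is visible.
        std::string status() const
        {
            return "closed in " + m_closed_in + ": " + m_close_reason
                   + " at " + std::to_string(m_closed_at);
        }

    private:
        std::time_t m_closed_at = 0;
        std::string m_close_reason;
        std::string m_closed_in;
    };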
If the connection to the master is lost, knowing what type of error
caused the call to handleError helps deduce the real reason for it.
Logging the idle time of the connection helps detect when the
wait_timeout of a connection is exceeded.
By storing the server statistics object inside the session, the lookup
involved in getting a worker-local value is avoided. Since the lookup is
done multiple times for a single query, it is beneficial to store it in
the session.
As the worker-local value is never deleted, it is safe to store a
reference to it in the session. It is also never updated concurrently so
no atomic operations are necessary.
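A minimal sketch of caching the worker-local value, with invented names:

    // ServerStats stands in for the worker-local statistics object.
    struct ServerStats
    {
        long queries = 0;
    };

    class Session
    {
    public:
        // The lookup is done once; keeping a reference is safe because
        // the worker-local value is never deleted.
        explicit Session(ServerStats& stats)
            : m_stats(stats)
        {
        }

        void on_query()
        {
            // No atomics needed: the value is never updated concurrently.
            ++m_stats.queries;
        }

    private:
        ServerStats& m_stats;
    };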
The code now only checks the need for a keepalive ping once every
keepalive interval. Reduced the number of mxs_clock calls to one so that
all servers use the same value.
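The interval check might be sketched as follows; 'now' comes from a
single mxs_clock() call so every server is compared against the same
value (the surrounding structure is invented):

    #include <cstdint>
    #include <vector>

    struct BackendConn
    {
        uint64_t last_activity = 0;
    };

    void check_keepalive(std::vector<BackendConn>& conns, uint64_t now,
                         uint64_t keepalive, uint64_t& next_check)
    {
        if (now < next_check)
        {
            return;  // checked at most once per keepalive interval
        }

        next_check = now + keepalive;

        for (auto& c : conns)
        {
            if (now - c.last_activity >= keepalive)
            {
                // send a ping here (omitted) and record the activity
                c.last_activity = now;
            }
        }
    }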
Minor renaming of the session state enum values. Also exposed the session
state stringification function in the public header and removed the
stringification macro.
The information stored for each prepared statement would not be cleared
until the end of the session. This is a problem if the sessions last for a
very long time as the stored information is unused once a COM_STMT_CLOSE
has been received.
In addition to this, the session command response maps were not cleared
correctly if all backends had processed all session commands.
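The cleanup amounts to erasing the stored entry when the close command
arrives; a minimal sketch with invented names:

    #include <cstdint>
    #include <map>
    #include <string>

    class PSManager
    {
    public:
        void store(uint32_t stmt_id, std::string info)
        {
            m_info[stmt_id] = std::move(info);
        }

        // Called when a COM_STMT_CLOSE is received: without this, entries
        // would accumulate for the lifetime of the session.
        void erase(uint32_t stmt_id)
        {
            m_info.erase(stmt_id);
        }

    private:
        std::map<uint32_t, std::string> m_info;
    };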
For lifetime management, keep RWBackends in a vector of unique_ptrs.
RWSplitSession keeps the unique_ptrs very private, and provides a vector
of plain pointers for all other interfaces.
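The ownership split sketched in C++; PRWBackends is named in the next
commit, the rest of the sketch is illustrative:

    #include <memory>
    #include <vector>

    class RWBackend
    {
        // ...
    };

    using PRWBackends = std::vector<RWBackend*>;  // non-owning view

    class RWSplitSession
    {
    public:
        // Other interfaces only ever see plain pointers.
        PRWBackends backends() const
        {
            PRWBackends view;

            for (const auto& b : m_backends)
            {
                view.push_back(b.get());
            }

            return view;
        }

    private:
        // The owning unique_ptrs stay private to the session.
        std::vector<std::unique_ptr<RWBackend>> m_backends;
    };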
This is essentially just a search and replace to change SRWBackend to
RWBackend* and SRWBackendList to PRWBackends, a vector of raw
pointers. In the next few commits vector<unique_ptr<RWBackend>>
will be used for lifetime management.
There are a lot of diffs from the global search and replace. Only a few manual
edits had to be done.
list-src -x build | xargs sed -ri 's/SRWBackends/prwbackends/g'
list-src -x build | xargs sed -ri 's/const mxs::SRWBackend\&/const mxs::RWBackend\*/g'
list-src -x build | xargs sed -ri 's/const SRWBackend\&/const RWBackend\*/g'
list-src -x build | xargs sed -ri 's/mxs::SRWBackend\&/mxs::RWBackend\*/g'
list-src -x build | xargs sed -ri 's/mxs::SRWBackend/mxs::RWBackend\*/g'
list-src -x build | xargs sed -ri 's/SRWBackend\(\)/nullptr/g'
list-src -x build | xargs sed -ri 's/mxs::SRWBackend\&/mxs::RWBackend\*/g'
list-src -x build | xargs sed -ri 's/mxs::SRWBackend/mxs::RWBackend\*/g'
list-src -x build | xargs sed -ri 's/SRWBackend\&/RWBackend\*/g'
list-src -x build | xargs sed -ri 's/SRWBackend\b/RWBackend\*/g'
list-src -x build | xargs sed -ri 's/prwbackends/PRWBackends/g'
By resetting the replay state, the transaction replay can start again on
a new server. This allows the replay process to work when a master server
is shutting down.
By delaying the replay for a second, we give the monitor a small chance to
adapt to master failures. It'll also prevent rapid re-querying if multiple
transaction replays are supported.
A transaction that just completed will go through the start_trx_replay
function as from the client protocol's point of view the transaction is
still open. The debug assertion did not take this into account and would
fail if a successful commit was the last thing done on the master that
failed.
Also fixed the formatting.
When a server is stopping, it'll send an error to the client before
terminating the TCP connection. The code in readwritesplit would detect
this error and create a hangup event on the DCB. This would cause it to
appear as if the TCP connection was broken and the router would
immediately try to reconnect to the same server.
By ignoring the error and allowing the connection to die on its own, we
avoid immediately reconnecting and retrying any transactions on the
stopping server. This increases the chances that the monitor will see it
first and assign the server states correctly before the transaction replay
is attempted.
If the server where a query is being executed is shutting down,
readwritesplit should treat it as an error to make retrying of the query
possible.
By treating server shutdowns as network errors, the same code path that is
used for actual network errors can be taken. This removes the need for any
extra retrying logic for this particular case.
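The mapping could be as simple as classifying the shutdown error
together with network errors; 1053 is MySQL's ER_SERVER_SHUTDOWN, while
the enum and function are invented for the sketch:

    enum class ErrorType
    {
        NETWORK,
        OTHER
    };

    constexpr int ER_SERVER_SHUTDOWN = 1053;  // "Server shutdown in progress"

    ErrorType classify(int mysql_errno)
    {
        if (mysql_errno == ER_SERVER_SHUTDOWN)
        {
            // Take the network-error path so the existing retry logic
            // handles the shutdown without any special cases.
            return ErrorType::NETWORK;
        }

        return ErrorType::OTHER;
    }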
The transaction replay could get mixed up with new queries if the client
managed to perform one while the delayed routing was taking place. A
proper way to solve this would be to cork the client DCB until the
transaction is fully replayed. As this change would be more complex than
simply labeling the queries that are being retried, the corking
implementation is left for later, when a more complete solution can be
designed.
This commit also adds some of the missing info logging for transaction
replay, which makes analysis of failures easier.
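A rough sketch of the labeling approach; the field and function names
are invented:

    #include <string>

    // A retried query carries a label so delayed routing can tell it
    // apart from a brand-new client query.
    struct Query
    {
        std::string sql;
        bool        replayed = false;  // true only for retried queries
    };

    // During replay, only labeled queries may proceed; everything else
    // is held back until the transaction is fully replayed.
    bool may_route_now(const Query& q, bool replay_active)
    {
        return !replay_active || q.replayed;
    }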