Commit Graph

5397 Commits

Author SHA1 Message Date
7f978f275f MXS-2223 Log a message when a slave is discriminated due to replication lag
Both the replication lag and the message printing state are saved in SERVER,
although the values are mostly used by readwritesplit. A log message is printed
both when a server goes over the limit and when it comes back below.
Because of concurrency issues, a message may be printed multiple times before
different threads detect the new message state.

Documentation updated to explain the change.
2019-01-21 13:02:18 +02:00
a469ef83b6 MXS-2217: Pick DCB owner before adding to epoll
There is a race condition between the addition of the DCB into epoll and
the execution of the event that initiates the protocol pointer for the DCB
and sends the handshake to the client. If a hangup event would occur
before the handshake would be sent, it would be possible that the DCB
would get freed before the code that sends the handshake is executed.

By picking the worker who owns the DCB before the DCB is placed into the
owner's epoll instance, we make sure no events arrive on the DCB while the
control is transferred from the accepting worker to the owning
worker.
2019-01-17 10:35:49 +02:00
317166540f MXS-2266: Close prepared statements with internal ID
The ID used to store the prepared statements uses the internal ID and
using the external ID caused unwanted memory use and a false warning.
2019-01-16 16:19:35 +02:00
57fe5ff56a Fix error packet stringification function
The code read past the stack buffer.
2019-01-16 09:43:49 +02:00
021d48f94c Log low-level reason and idle time on master failure
If the connection to the master is lost, knowing what type of an error
caused the call to handleError helps deduce what was the real reason for
it. Logging the idle time of the connection helps detect when the
wait_timeout of a connection is exceeded.
2019-01-16 09:43:49 +02:00
8cef8b9472 Compile MariaDBMonitor unit tests only if flag is set 2019-01-15 15:44:21 +02:00
8ac786110e MXS-2255: Fix COMMIT matching
The code used a rather questionable method for parsing SQL statements
instead of using the query classifier for detecting transaction start and
stop events.
2019-01-11 10:27:00 +02:00
147f0bb656 Extend master failure error message
The error now describes the failure mode in more detail. This should make
post mortem analysis of failed connections a lot easier.
2019-01-09 20:05:38 +02:00
63358fb4c1 Add correct operator to comparison
The clause had a bitwise AND. The end result is likely the same.
2019-01-08 10:18:15 +02:00
5f83b07fc2 MXS-2241: Detect invalid readwritesplit configuration
master_reconnection and disable_sescmd_history are, in practice, mutually
exclusive settings.
2019-01-07 11:06:24 +02:00
46bea87ff6 MXS-2238: Fix reading of large Avro schemas
The schemas were read incorrectly which resulted in large schemas having
multiple newlines in them.
2019-01-06 13:05:42 +02:00
8c437c6440 Fix crash in galeramon
An std::string was assigned a null value which will cause a crash.
2019-01-04 12:00:48 +02:00
9adbd2f8f0 Cache the local server statistics object
By storing the server statistics object in side the session, the lookup
involved in getting a worker-local value is avoided. Since the lookup is
done multiple times for a single query, it is beneficial to store it in
the session.

As the worker-local value is never deleted, it is safe to store a
reference to it in the session. It is also never updated concurrently so
no atomic operations are necessary.
2019-01-03 09:37:59 +02:00
1fa3b133c7 Make keepalive ping checks more efficient
The code now only checks the need for a keepalive ping once every
keepalive interval. Reduced the number of mxs_clock calls to one so that
all servers use the same value.
2019-01-03 09:37:59 +02:00
26da72a41f Merge branch '2.2' into 2.3 2019-01-03 09:23:16 +02:00
8f0e4a3034 MXS-2232: Fix version string prefix check
The prefix was always added even when the original version would've been
acceptable. For example, a version string of 5.5.40 would get converted to
5.5.5-5.5.40 which is quite confusing for older client applications.
2019-01-02 19:29:48 +02:00
04dd05b262 MXS-2231: Move TLS handshake code into MariaDBClient
The code is now in the correct place and TLS connections with all
authenticators should now work.
2019-01-02 19:29:41 +02:00
edd03e950f MXS-2209: Use compound roles only with 10.2.15+
Due to MDEV-15556 and MDEV-15840 recursive CTEs can't be reliably used
with older 10.2 versions. To prevent problems, only use the query that
extracts composite roles with newer versions.
2019-01-02 19:27:14 +02:00
35d31801bb Merge branch '2.2' into 2.3 2018-12-17 23:52:56 +02:00
48efa6d027 MXS-2213: Clear stored PS information
The information stored for each prepared statement would not be cleared
until the end of the session. This is a problem if the sessions last for a
very long time as the stored information is unused once a COM_STMT_CLOSE
has been received.

In addition to this, the session command response maps were not cleared
correctly if all backends had processed all session commands.
2018-12-11 13:54:10 +02:00
8b00a00ea7 MXS-2216: Use correct function in response processing
When a response to a prepared statement was processed, the number of EOF
packets was used to see whether the response was complete. This code used
a function that does not work with the special packet returned by a PS
preparation that is similar to an OK packet.

The correct method is to count the total number of packets in the
response.
2018-12-11 13:54:10 +02:00
2dc6718d47 MXS-2188: Update target table in prepare_table
Passing the target table and create to the prepare_table function allows
the converter to update the internal variables.
2018-11-28 19:21:59 +02:00
2eee524eef Fix blr server protocol name
The server that the binlogrouter creates used the old protocol name that
caused a confusing deprecation warning.
2018-11-28 19:21:59 +02:00
cf66cc6968 Merge branch '2.2' into 2.3 2018-11-28 11:27:44 +02:00
6451b1f21a MXS-2183: Fix memory leaks
Under heavy load some of the basic network operations could fail which led
to some of the allocated memory to leak.

Also the backend protocol never freed the current protocol command if it
was not completed. This would happen if a user executed a session command
as the first command but backend authentication would fail.
2018-11-28 02:03:00 +02:00
24d1876ed4 Initialize memory in password hashing
The authentication code did not initialize one of the buffers used to
calculate the password hashes. This resulted in the use of uninitialized
memory when the user provided no password.
2018-11-28 00:15:57 +02:00
da83551493 MXS-2189: Prevent unwanted trx replay
When a transaction is being executed on a slave and the master fails, the
transaction replay would start.
2018-11-27 12:52:45 +02:00
1abcbd64bd MXS-2187: Allow multiple transaction retries
By resetting the replay state the transaction replay can start again on a
new server. This allows the replay process work when a master server is
shutting down.
2018-11-27 12:52:44 +02:00
e6325d39fb Delay initial transaction replay
By delaying the replay for a second, we give the monitor a small chance to
adap to master failures. It'll also prevent rapid re-querying if multiple
transaction replays are supported.
2018-11-27 12:52:44 +02:00
851793cb86 Fix transaction replay debug assertion
A transaction that just completed will go through the start_trx_replay
function as from the client protocol's point of view the transaction is
still open. The debug assertion did not take this into account and would
fail if a successful commit was the last thing done on master that failed.

Also fixed the formatting.
2018-11-27 12:52:44 +02:00
81937c635b Fix current command tracking during LDLI
When a LOAD DATA LOCAL INFILE was executed, the client and backend
protocols would update the command byte leading to misinterpretations of
the data.
2018-11-27 12:52:44 +02:00
f87ff431c1 Merge branch '2.2' into 2.3 2018-11-27 11:46:47 +02:00
f41caae5c7 MXS-2175: Fix available_when_donor
If a Galera cluster drops down to a single node, the last node would not
be considered valid. During the failure of the second to last node, the
master would also temporarily lose the master status.

The behavior was changed to always keep the cluster UUID until the cluster
size drops down to zero. This guarantees that the same cluster is used as
long as possible.
2018-11-27 09:22:39 +02:00
a042ad646b MXS-2184: Fix avrorouter GTID generation
The event number in the GTID was not incremented for the update_after part
of the transaction.
2018-11-26 09:42:12 +02:00
842f9f1d15 Fix transaction replay timeout
The timeout would not be triggered due to the fact that the
delayed_retry_timeout wasn't inspected.
2018-11-26 09:42:12 +02:00
7bf5c07835 Ignore errors sent by servers in shutdown
When a server is stopping, it'll send an error to the client before
terminating the TCP connection. The code in readwritesplit would detect
this error and create a hangup event on the DCB. This would cause it to
appear as if the TCP connection was broken and the router would
immediately try to reconnect to the same server.

By ignoring the error and allowing the connection to die on its own, we
avoid immediately reconnecting and retrying any transactions on the
stopping server. This increases the chances that the monitor will see it
first and assign the server states correctly before the transaction replay
is attempted.
2018-11-26 09:42:12 +02:00
9f6700b329 Skip connection_keepalive during transaction_replay
When a transaction is replayed, there is no target but the routing was
"successful".
2018-11-26 09:42:11 +02:00
925670ae2f Fix false master failure log message
The message would be logged even if the session continues.
2018-11-26 09:42:11 +02:00
8b92c63248 Remove incorrect assertion
The assertion would hold true for a single worker but it can't be
guaranteed to be true on a multi-worker system where the statistics are
distributed across the workers.
2018-11-26 09:42:11 +02:00
dcf53da209 Enable connection_keepalive by default
Enabling the feature by default prevents the master connection from dying
during times when there are very little or no writes. Having a modest ping
interval of 300 seconds serves to minimize the amount of extra work that
both MaxScale and the server have to do while still keeping the
connections in good shape.
2018-11-26 09:42:11 +02:00
b47d4a4105 Use max_statement_time
Used only with supporting server versions. Using the time limit ensures that
the server interrupts the query at the same point Connector-C would cut the
connection. This prevents lingering queries.

Also, cleans up some associated error messages.
2018-11-23 15:15:44 +02:00
0a5a3309e0 Add missing quotes when printing server names
Some of the log messages didn't have the quotes.
2018-11-23 14:02:09 +02:00
fb52e565fe Store capabilities of monitored server
Checking the version number in various places in the code gets confusing.
It's better to check the version in one place and record the relevant data.
2018-11-21 17:36:52 +02:00
01628dd0de Cleanup server version updating 2018-11-21 17:36:52 +02:00
78829429ae MXS-2178 Add WD workaround to REST-API and maxadmin 2018-11-21 13:31:49 +02:00
5d4775cac1 MXS-2168 Update test_cycle_find
The test now uses both server id:s and hostname:port combinations.
2018-11-21 10:40:21 +02:00
90da3a4d90 Remove MXS_MONITORED_SERVER mapping from MariaDBMon
The mapping was rarely used.
2018-11-21 10:30:11 +02:00
1a046bd453 MXS-2168 Add 'assume_unique_hostnames'-setting to MariaDBMonitor
Adds the setting and takes it into use during replication graph creation
and the most important checks.
2018-11-21 10:30:11 +02:00
48eb57bdc5 Merge branch '2.3.1' into 2.3 2018-11-20 11:27:59 +02:00
c552845fd1 Deprecate old admin modules
Added notification messages for the deprecation of the old admin interface
modules. Also added notes into the documentation about their deprecation.
2018-11-20 10:51:49 +02:00