Commit Graph

2631 Commits

Author SHA1 Message Date
da83551493 MXS-2189: Prevent unwanted trx replay
When a transaction is being executed on a slave and the master fails, the
transaction replay would start.
2018-11-27 12:52:45 +02:00
1abcbd64bd MXS-2187: Allow multiple transaction retries
By resetting the replay state the transaction replay can start again on a
new server. This allows the replay process work when a master server is
shutting down.
2018-11-27 12:52:44 +02:00
e6325d39fb Delay initial transaction replay
By delaying the replay for a second, we give the monitor a small chance to
adap to master failures. It'll also prevent rapid re-querying if multiple
transaction replays are supported.
2018-11-27 12:52:44 +02:00
851793cb86 Fix transaction replay debug assertion
A transaction that just completed will go through the start_trx_replay
function as from the client protocol's point of view the transaction is
still open. The debug assertion did not take this into account and would
fail if a successful commit was the last thing done on master that failed.

Also fixed the formatting.
2018-11-27 12:52:44 +02:00
a042ad646b MXS-2184: Fix avrorouter GTID generation
The event number in the GTID was not incremented for the update_after part
of the transaction.
2018-11-26 09:42:12 +02:00
842f9f1d15 Fix transaction replay timeout
The timeout would not be triggered due to the fact that the
delayed_retry_timeout wasn't inspected.
2018-11-26 09:42:12 +02:00
7bf5c07835 Ignore errors sent by servers in shutdown
When a server is stopping, it'll send an error to the client before
terminating the TCP connection. The code in readwritesplit would detect
this error and create a hangup event on the DCB. This would cause it to
appear as if the TCP connection was broken and the router would
immediately try to reconnect to the same server.

By ignoring the error and allowing the connection to die on its own, we
avoid immediately reconnecting and retrying any transactions on the
stopping server. This increases the chances that the monitor will see it
first and assign the server states correctly before the transaction replay
is attempted.
2018-11-26 09:42:12 +02:00
9f6700b329 Skip connection_keepalive during transaction_replay
When a transaction is replayed, there is no target but the routing was
"successful".
2018-11-26 09:42:11 +02:00
925670ae2f Fix false master failure log message
The message would be logged even if the session continues.
2018-11-26 09:42:11 +02:00
8b92c63248 Remove incorrect assertion
The assertion would hold true for a single worker but it can't be
guaranteed to be true on a multi-worker system where the statistics are
distributed across the workers.
2018-11-26 09:42:11 +02:00
dcf53da209 Enable connection_keepalive by default
Enabling the feature by default prevents the master connection from dying
during times when there are very little or no writes. Having a modest ping
interval of 300 seconds serves to minimize the amount of extra work that
both MaxScale and the server have to do while still keeping the
connections in good shape.
2018-11-26 09:42:11 +02:00
78829429ae MXS-2178 Add WD workaround to REST-API and maxadmin 2018-11-21 13:31:49 +02:00
c552845fd1 Deprecate old admin modules
Added notification messages for the deprecation of the old admin interface
modules. Also added notes into the documentation about their deprecation.
2018-11-20 10:51:49 +02:00
cab8a4bde8 MXS-2144: Treat server shutdown as a network error
If the server where a query is being executed is shutting down,
readwritesplit should treat it as an error to make retrying of the query
possible.

By treating server shutdowns as network errors, the same code path that is
used for actual network errors can be taken. This removes the need for any
extra retrying logic for this particular case.
2018-11-14 16:23:47 +02:00
370483fb4b Log slave error message on failed session command
If the master succeeds in executing a session command but the slave fails,
the error message could help explain why it failed. At the moment this is
mainly relevant for inspection of test results.
2018-11-14 16:23:46 +02:00
c32bb18862 Fix transaction replay checksum mismatches
The transaction replay could get mixed up with new queries if the client
managed to perform one while the delayed routing was taking place. A
proper way to solve this would be to cork the client DCB until the
transaction is fully replayed. As this change would be relatively more
complex compared to simply labeling queries that are being retried the
corking implementation is left for later when a more complete solution can
be designed.

This commit also adds some of the missing info logging for the transaction
replaying which makes analysis of failures easier.
2018-11-13 16:48:03 +02:00
ae0e9b359d Fix use of zero-weight servers
The servers with a zero weight would be always used over ones that have a
weight. This means that the behavior was inverted and caused the
mxs2054_hybrid_cluster test to fail in 2.3.

Also fixed a typo in the deprecation message.
2018-11-12 10:13:59 +02:00
b443bb7525 Store PS session commands with internal ID
Commit a9e236497963251f8b4afa07484b88ad97e73a03 changed where the PS ID
for a binary protocol command is replaced with the internal form. This
caused prepared statements that are also session commands to be always
routed with the external ID.

As the external ID is almost always the master's ID, the aforementioned
bug resulted in odd side-effects and the true cause of these was only
revealed when the error message sent by the slave was included in the log
messages.
2018-11-12 10:13:59 +02:00
7e54cb8132 Fix crash in cat
The router used the wrong capabilities and results weren't delivered as
complete and contiguous packets.
2018-11-12 10:13:22 +02:00
2a6df0e724 Merge branch '2.2' into 2.3 2018-11-09 14:22:28 +02:00
a9e2364979 Fix unknown PS ID on query re-routing
If a PS command is routed multiple times, the ID will not be reverted to
the external ID in the failure cases. This prevented prepared statements
from being re-routed correctly.
2018-11-09 12:13:22 +02:00
bfc8cb4803 MXS-2151: Always log fatal master connection errors
When the connection to the master is broken, the session is not configured
to use the read-only modes and the monitor can still connect to the
server, the connection will be closed and and error is sent to the
client. To leave some trace of this problem in the MaxScale logs, a
message should always be logged when a network error occurs.
2018-11-09 00:39:32 +02:00
c899f00541 MXS-1780 Collect server response information
As the router is the only one that knows what backends a particular
statement has been sent to, it is the responsibility of the router
to keep the session bookkeeping up to date. If it doesn't we will
know what statements a session has received (provided at least some
component in the routing chain has RCAP_TYPE_STMT_INPUT capability),
but not how long their processing took. Currently only readwritesplit
does that.

All queries are stored and not just COM_QUERY as that makes the
overall bookkeeping simpler; at clientReply() time we do not need to
know whether or not to bookkeep information, we can just do it.

When session information is queried for, we report as much information
we have available.
2018-11-08 12:04:55 +02:00
3ccdb508de Fix bug in roulette wheel
Slot values were changed after the total was calculated. Fix bug
and adjust the offending code.
2018-11-08 10:50:23 +02:00
c692c864e2 MXS-2078 Take new statistics into use 2018-11-08 10:44:32 +02:00
6a8ba999bd MXS-2095: Fix crash on GRANT CREATE TEMPORARY TABLE
The avrorouter classified the GRANT statement as a CREATE TABLE statement.
2018-11-08 08:31:48 +02:00
774e9d1efb Fix merge problems
The values were not removed from the merge.
2018-11-06 21:18:42 +02:00
383b0b1989 Merge branch '2.2' into 2.3 2018-11-06 21:12:20 +02:00
113b1503f6 Expand readwritesplit delayed retry error message
The error now explains if the write failure was due to the
delayed_retry_timeout being reached.
2018-11-06 15:09:14 +02:00
4341c2b6e2 MXS-2142: Set causal_reads_timeout default to 10
The causal_reads_timeout default value is too long when considering the
behavioral changes that MXS-2141 introduced. With a 10 second default
value, a result is returned to the client in a reasonable amount of time.
2018-11-06 15:09:14 +02:00
e56372b153 MXS-2141: Retry query on master if it times out on slave
With causal_reads enabled, the query would return with an error if the
slave was not able to catch up to the master fast enough. By automatically
retrying the query on the master, we're guaranteed that a valid result is
always returned to the client.
2018-11-06 15:09:14 +02:00
c661f5e838 MXS-2139: Extend transaction_replay requirements
Enabling transaction_replay now automatically enables
master_failure_mode=fail_on_write. This makes the behavior consistent
across all failure modes.
2018-11-06 15:09:14 +02:00
95745f5a4e MXS-2140: Fix readwritesplit configuration processing
Runtime configuration changes did not properly enable implicitly enabled
parameters.
2018-11-06 15:09:14 +02:00
562c7be8fe MXS-2106: Fix NULL value handling
The NULL values were not stored as NULL Avro values due to the fact that
the file format has no native NULL-ness for the basic types. To solve
this, all values must be stored as a union that contains the actual type
as well as the null type.

Unions were not implemented in the maxavro library but implementing means
simply recursing one level down.
2018-11-05 13:37:29 +02:00
7f36ec83da MXS-2095: Add runtime detection of unknown SQL
If the query statement is wrongly treated as a table creation statement it
could cause a crash. To handle this, unknown SQL is now reported and the
processing is stopped early. This does not solve the root cause of the
problem but makes it possible to detect it in the future.
2018-11-05 13:37:28 +02:00
f8c132903b Fix query average measurment and average text output.
The query_ended() call was not in the right spot. Tests did not
detect it. Changed textual output to reflect the fact that they
are for RWSplit reads.
2018-11-04 17:18:09 +02:00
ce35b0d541 Merge branch '2.2' into 2.3 2018-10-30 14:16:33 +02:00
91c5f8580c MXS-2119: Fix file permissions
The admin files are now created with 640 permissions and automatically
created directories now properly set the permissions for the group as
well. All files and directories created by avrorouter and binlogrouter
also now correctly limit the read and write permissions only to the owner
and the group.
2018-10-30 12:45:36 +02:00
4d8a95d041 Merge commit '262f1d7e471bacca6b985ec3f2cd5cb76d6e2584' into 2.3 2018-10-26 12:44:57 +03:00
192563a947 MXS-2108: Fix open connection calculation
When a connection to a server is lost and the session command history is
disabled, the session will continue as long as at least one connection is
open. Previously the open connection calculation used the same code that
was used when a new session was created which only inspected the
configured server count instead of the actual open connection count.
2018-10-19 15:20:34 +03:00
8a0805d264 MXS-2090 Drop requirement that GTID based replication is used
Drop the requirement that GTID based replication is used for
the BinLog Galera "failover" mechanism. There is no reason for
that restriction; it works just as well with file+position based
replication.
2018-10-19 08:03:11 +03:00
f8cf5053bd MXS-2103: Fix CREATE TEMPORARY TABLE detection
The table creation was not detected as the function used to extract the
table name did not return the fully qualified names. Even if it did return
a fully qualified name, it wouldn't have been correctly processed.
2018-10-18 20:26:58 +03:00
20af9afb49 Merge branch '2.2' into 2.3 2018-10-16 11:10:48 +03:00
92057f6ff9 Add more logging to readwritesplit
When a read-only transaction fails due to a connection error, no message
would be logged. Also added an info level message for the case when a
backend connection would get closed before the session is in the correct
state and a debug assertion that the router session should never be closed
when the handleError method is called.
2018-10-16 11:04:57 +03:00
75ea1b6ea1 Fix formatting of new(std::nothrow)
The code previously formatted everything as `new( std::nothrow)`.
2018-10-04 21:50:44 +03:00
3b1b63d939 MXS-1777 Change LOWEST_RESPONSE_TIME to ADAPTIVE_ROUTING
LOWEST_RESPONSE_TIME is not quite correct, and marketing material
will call it Adaptive Routing, so better match that.
2018-10-04 19:26:16 +03:00
ada91f2d53 MXS-1777 Make sure slower servers are sampled sufficiently 2018-10-04 17:53:57 +03:00
661bdd5b82 Work around debug assertions in binlogrouter
The binlogrouter uses buffers across worker threads which is no longer OK
in 2.3. The correct solution would be to store data in something other
than a GWBUF (e.g. std::vector) and protect the sharing with a mutex. The
current solution simply works around the assertions by using macros
instead of functions.
2018-10-04 12:48:27 +03:00
d866cb3a21 Add bias value to server score calculations
By biasing the values of all counter type scores to positive integers, the
server weights are always taken into use.

This fixes the case when weights were ignored until all score base values
were larger than zero (the mxs922_server test).
2018-10-03 08:41:44 +03:00
ea971a664e Fix readconnroute debug assertion
If the DCB is closed in handleError, it would be NULL in closeSession. To
only close the DCB in one place, the handleError can be reduced to writing
an error to the client and marking the failure as a fatal one.
2018-10-03 08:41:44 +03:00