The transaction replay could get mixed up with new queries if the client
managed to execute one while the delayed routing was taking place. A
proper way to solve this would be to cork the client DCB until the
transaction is fully replayed. As this change would be considerably more
complex than simply labeling the queries that are being retried, the
corking implementation is left for later when a more complete solution
can be designed.
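As a rough sketch of what such corking could look like (the class and
member names below are hypothetical, not existing MaxScale code):

    #include <deque>

    struct GWBUF;  // stand-in for the MaxScale buffer type

    // Hypothetical corking sketch, not the actual implementation.
    class ReplaySession
    {
    public:
        void start_replay()
        {
            m_replaying = true;
        }

        bool route_query(GWBUF* query)
        {
            if (m_replaying)
            {
                // Cork: hold the query back until the replay has finished
                m_corked.push_back(query);
                return true;
            }
            return route_to_target(query);
        }

        void replay_complete()
        {
            m_replaying = false;
            // Uncork: route the held queries in their original order
            while (!m_corked.empty())
            {
                route_to_target(m_corked.front());
                m_corked.pop_front();
            }
        }

    private:
        bool route_to_target(GWBUF* query);  // normal routing path

        bool               m_replaying = false;
        std::deque<GWBUF*> m_corked;
    };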
This commit also adds some of the missing info-level logging for
transaction replay, which makes analysis of failures easier.
Commit a9e236497963251f8b4afa07484b88ad97e73a03 changed where the PS ID
for a binary protocol command is replaced with the internal form. This
caused prepared statements that are also session commands to always be
routed with the external ID.
As the external ID is almost always the master's ID, the aforementioned
bug resulted in odd side effects whose true cause was only revealed when
the error message sent by the slave was included in the log messages.
If a PS command was routed multiple times, the ID was not reverted to
the external ID in the failure cases. This prevented prepared statements
from being re-routed correctly.
With causal_reads enabled, the query would return an error if the slave
was not able to catch up to the master fast enough. By automatically
retrying the query on the master, we're guaranteed that a valid result
is always returned to the client.
The debug assertion is wrong, as the code was changed to prioritize
hints over the router's target selection. Also removed the superfluous
check for master, slave and relay master states, as they are implied by
the fact that the connection is in use.
The collection of resultsets needs to be disabled by default when a
response is received, to cover the cases where an error is returned.
The collection of results should also not be enabled for queries that
do not generate any responses.
The read-write distribution in readwritesplit is now stored in a map
partitioned by the servers that the router has used. Currently, the
statistics for removed servers aren't dropped, so some filtering still
needs to be added.
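A minimal sketch of the kind of per-server partitioning described above
(the type names are illustrative, not the actual readwritesplit types):

    #include <cstdint>
    #include <map>
    #include <string>

    // Illustrative only: read/write counters kept per target server.
    struct TargetStats
    {
        uint64_t reads = 0;
        uint64_t writes = 0;
    };

    // Keyed by the servers the router has routed to. Entries for servers
    // that have since been removed are what still needs to be filtered out.
    using ServerStats = std::map<std::string, TargetStats>;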
See the script directory for the method. The script to run in the
top-level MaxScale directory is called maxscale-uncrustify.sh, which
uses another script, list-src, from the same directory (so you need to
set your PATH). The uncrustify version used was 0.66.
The math becomes simpler when the weight is inverted, i.e. a simple
multiplication gives the (inverse) score. Inverse weights are normalized
to the range [0..1], where a lower number means a higher weight.
The enum select_criteria_t is used to provide a std::function that takes
the backends as a vector (rather than the prior pairwise comparisons)
and returns the best backend.
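As a hedged illustration of both points, assuming a simplified backend
type and an assumed normalization of the weights (the names below are
stand-ins, not the actual readwritesplit code):

    #include <algorithm>
    #include <functional>
    #include <memory>
    #include <vector>

    struct RWBackend
    {
        double response_time;   // measured response time
        double inverse_weight;  // normalized to [0..1], lower means higher weight
    };

    using SRWBackend = std::shared_ptr<RWBackend>;

    // A selection criteria maps the whole candidate vector to the best backend.
    using BackendSelectFunction =
        std::function<SRWBackend(const std::vector<SRWBackend>&)>;

    // Example criteria: thanks to the inverted weight the score is a plain
    // multiplication, and the lowest score wins (candidates assumed non-empty).
    BackendSelectFunction by_response_time = [](const std::vector<SRWBackend>& candidates)
    {
        return *std::min_element(candidates.begin(), candidates.end(),
                                 [](const SRWBackend& a, const SRWBackend& b)
                                 {
                                     return a->response_time * a->inverse_weight
                                            < b->response_time * b->inverse_weight;
                                 });
    };

With the weight inverted like this, scoring a backend needs only a
multiplication and a comparison instead of a division.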
This commit refactors slave selection. The comparison is still pair-wise
but is isolated into a small run_comparison() function. The function
get_slave_candidate() is used when new connections are created; it was
both moved and modified (the move was required due to scoping), so the
diff is somewhat off.
For slave selection during routing, get_slave_backend() now contains the
filtering logic from the old get_slave_backend() and compare_backends(),
the latter of which has been removed.
Backend functions mostly take shared_ptr<SRWBackend> in various forms
(as-is, as a const reference, or in a container). Ideally the shared_ptr
would be used only where it is really needed, and either naked pointers
or references to RWBackend would be used. This refactor does not address
that issue, but compounds it by using even deeper shared_ptr structures.
Fixing that is left for a future commit.
The main piece of code, slave selection (backend_cmp_response_time),
uses the existing method of pair-wise comparison of slaves. This will be
changed to selection using all available slaves, along with the removal
of hard-coded values.
By storing the data gathered by readwritesplit inside the session, the
protocol will be aware of the state of the LOAD DATA LOCAL INFILE
execution. This prevents misinterpretation of the data, which previously
led to closed connections, effectively rendering LOAD DATA LOCAL INFILE
unusable.
This change is a temporary solution to a problem that needs to be solved
at the protocol level. The changes required to implement that are too
big to include in a bug fix release.
The configuration doesn't need to be contained in a shared pointer, as
each session holds its own version of it. This removes most of the
overhead in configuration reloading. The only thing that's left is any
overhead added by the use of thread-local storage.
The causal read queries were also performed when the target server was
the master. The extra functionality of causal reads is only needed on
slaves.
Adjusted the test case to require GTID replication.
The configuration used the wrong parameter name. The test also did not
explicitly enable tracking of the last_gtid variable, which caused it to
fail if it wasn't already on.
The transaction migration in the case of a changed master never worked,
as transaction replay would only be triggered when the master failed. To
cover this case, the transaction replay just needs to be started when
the need for a transaction migration is detected.
To help diagnose the behavior, the Trx class no longer logs a message
when a transaction is closed. This is now done by readwritesplit, which
has more knowledge of the context in which the transaction is closed.
Moved transaction statistics calculations into a member function and
placed all target-type-specific processing into their respective
functions.
Also inverted the connection keepalive check so that it also covers
hinted queries.
By using a shared pointer instead of a plain object, we can replace the
router configuration without affecting existing sessions. This is a
change that is required to enable runtime reconfiguration of
readwritesplit.
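A minimal sketch of the pattern this enables, assuming a hypothetical
Config type (illustrative, not the actual router code):

    #include <memory>
    #include <mutex>

    struct Config
    {
        // router parameters
    };

    class Router
    {
    public:
        // Each new session takes its own reference to the current configuration.
        std::shared_ptr<const Config> get_config() const
        {
            std::lock_guard<std::mutex> guard(m_lock);
            return m_config;
        }

        // Runtime reconfiguration swaps the pointer; sessions created before
        // the swap keep using the configuration they started with.
        void update_config(Config updated)
        {
            auto next = std::make_shared<const Config>(std::move(updated));
            std::lock_guard<std::mutex> guard(m_lock);
            m_config = std::move(next);
        }

    private:
        mutable std::mutex m_lock;
        std::shared_ptr<const Config> m_config = std::make_shared<const Config>();
    };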
When a valid target was not found, no error message was logged by the
router. This would cause the "Routing the query failed. Session will be
closed." message to be logged with no explanation as to why the routing
failed.
In addition to the above-mentioned case, no message would be logged if the
target for a COM_STMT_FETCH was not in use.
If two or more session commands contain identical buffers, the buffer of
the first session command is shared with the others. This reduces the
amount of memory used to store repeated executions of session commands.
The purging of session command history in readwritesplit was replaced with
session command de-duplication. This was done to prevent problems that
could arise when the order of session commands plays a significant role.
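To illustrate the buffer sharing with standard types (the real
implementation operates on MaxScale's own buffer type; the names below
are illustrative only):

    #include <cstdint>
    #include <memory>
    #include <string>
    #include <vector>

    // Illustrative only: identical session commands point at one shared buffer.
    using SharedBuffer = std::shared_ptr<const std::string>;

    struct SessionCommand
    {
        uint64_t     position;  // order in the session command history
        SharedBuffer buffer;    // shared with earlier identical commands
    };

    // Reuse the first matching command's buffer instead of storing a copy.
    SharedBuffer deduplicated_buffer(const std::vector<SessionCommand>& history,
                                     std::string sql)
    {
        for (const auto& cmd : history)
        {
            if (*cmd.buffer == sql)
            {
                return cmd.buffer;
            }
        }
        return std::make_shared<const std::string>(std::move(sql));
    }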
The assertion that was added to RWSplitSession::handle_slave_is_target
failed when delayed_retry was enabled or when slave reconnection
occurred. In 2.3, targets returned by the target selection functions do
not need to be in use but they must be valid connection targets.
When the `optimistic_trx` mode is enabled, all transactions are started on
a slave server. If the client executes a query inside the transaction that
is not of a read-only nature, the transaction is rolled back and replayed
on the master.
Unconditionally update the previous target on each routed query. This
allows routing to the previous server in case it is needed. One example of
this is a new type of hint that allows routing to the same server where
the previous query was sent.
Also added a minor clarifying comment to the resetting of the
current_query.
Formatted readwritesplit with Astyle. Changed the initialization of
Backend::m_modutil_state to use curly braces to cope with Astyle's lack of
support for curly braces inside parentheses.
Readwritesplit now keeps track of how many read-only and read-write
transactions have been executed. This allows a coarse estimate of how
widely read-only transactions are used, even when explicit read-only
transactions (i.e. START TRANSACTION READ ONLY) are not.
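A tiny sketch of the bookkeeping this implies (the counter names are
illustrative, not the actual readwritesplit members):

    #include <cstdint>

    // Illustrative only: updated when readwritesplit closes a transaction.
    struct TrxStats
    {
        uint64_t ro_transactions = 0;  // transactions that only performed reads
        uint64_t rw_transactions = 0;  // transactions that performed writes

        void account(bool was_read_only)
        {
            if (was_read_only)
            {
                ++ro_transactions;
            }
            else
            {
                ++rw_transactions;
            }
        }
    };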