By storing the server statistics object in side the session, the lookup
involved in getting a worker-local value is avoided. Since the lookup is
done multiple times for a single query, it is beneficial to store it in
the session.
As the worker-local value is never deleted, it is safe to store a
reference to it in the session. It is also never updated concurrently so
no atomic operations are necessary.
The code now only checks the need for a keepalive ping once every
keepalive interval. Reduced the number of mxs_clock calls to one so that
all servers use the same value.
The information stored for each prepared statement would not be cleared
until the end of the session. This is a problem if the sessions last for a
very long time as the stored information is unused once a COM_STMT_CLOSE
has been received.
In addition to this, the session command response maps were not cleared
correctly if all backends had processed all session commands.
For lifetime management keep RWBackends in a vector of unique_ptrs.
RWSplitSession keeps the unique_ptrs very private, and provides a vector
of plain pointers for all other interfaces.
This is essentially just a search and replace to change SRWBackend to
RWBackend* and SRWBackendList to PRWBackends, a vector of a raw
pointers. In the next few commits vector<unique_ptr<RWBackend>>
will be used for life time management.
There are a lot of diffs from the global search and replace. Only a few manual
edits had to be done.
list-src -x build | xargs sed -ri 's/SRWBackends/prwbackends/g'
list-src -x build | xargs sed -ri 's/const mxs::SRWBackend\&/const mxs::RWBackend\*/g'
list-src -x build | xargs sed -ri 's/const SRWBackend\&/const RWBackend\*/g'
list-src -x build | xargs sed -ri 's/mxs::SRWBackend\&/mxs::RWBackend\*/g'
list-src -x build | xargs sed -ri 's/mxs::SRWBackend/mxs::RWBackend\*/g'
list-src -x build | xargs sed -ri 's/SRWBackend\(\)/nullptr/g'
list-src -x build | xargs sed -ri 's/mxs::SRWBackend\&/mxs::RWBackend\*/g'
list-src -x build | xargs sed -ri 's/mxs::SRWBackend/mxs::RWBackend\*/g'
list-src -x build | xargs sed -ri 's/SRWBackend\&/RWBackend\*/g'
list-src -x build | xargs sed -ri 's/SRWBackend\b/RWBackend\*/g'
list-src -x build | xargs sed -ri 's/prwbackends/PRWBackends/g'
The transaction replay could get mixed up with new queries if the client
managed to perform one while the delayed routing was taking place. A
proper way to solve this would be to cork the client DCB until the
transaction is fully replayed. As this change would be relatively more
complex compared to simply labeling queries that are being retried the
corking implementation is left for later when a more complete solution can
be designed.
This commit also adds some of the missing info logging for the transaction
replaying which makes analysis of failures easier.
Commit a9e236497963251f8b4afa07484b88ad97e73a03 changed where the PS ID
for a binary protocol command is replaced with the internal form. This
caused prepared statements that are also session commands to be always
routed with the external ID.
As the external ID is almost always the master's ID, the aforementioned
bug resulted in odd side-effects and the true cause of these was only
revealed when the error message sent by the slave was included in the log
messages.
If a PS command is routed multiple times, the ID will not be reverted to
the external ID in the failure cases. This prevented prepared statements
from being re-routed correctly.
With causal_reads enabled, the query would return with an error if the
slave was not able to catch up to the master fast enough. By automatically
retrying the query on the master, we're guaranteed that a valid result is
always returned to the client.
The debug assertion is wrong as the code was changed to prioritize hints
over the router target selection. Also removed the superficial check for
master, slave and relay master states as they are implied by the fact that
the connection is in use.
The collection of resultsets needs to be disabled by default when a
response is received to cover the cases where an error is returned.
The collection of results should also not be set for queries that do not
generate any responses.
The read-write distribution in readwritesplit is now stored in a map
partitioned by the servers that the router has used. Currently, the
statistics for removed servers aren't dropped so some filtering still
needs to be added.
See script directory for method. The script to run in the top level
MaxScale directory is called maxscale-uncrustify.sh, which uses
another script, list-src, from the same directory (so you need to set
your PATH). The uncrustify version was 0.66.
The math becomes simpler when the weight is inverted, i.e. a simple multiplication
to get the (inverse) score. Inverse weights are normalized to the range [0..1] where a lower
number is a higher weight,
The enum select_criteria_t is used to provide a std::function that takes the backends
as vector (rather than the prior pairwise compares) and returns the best backend.
This commit refactors slave selection. The compare is still pair-wise but isolated into a small run_comparison() function. The function get_slave_candidate() is used when new connections are created, which I both moved and modified (had to move due to scoping), so diff is off.
The slave selection for routing: get_slave_backend() now has the filtering logic from old get_slave_backend() and compare_backends(), the latter of which is removed.
Backend functions mostly take shared_ptr<SRWBackend> in various forms (as is, const ref, in a container). Ideally the shared_ptr would be used only to where it is really needed, and either naked ptrs or references to RWBackend would be used. This refactor does not address that issue, but compounds it by using even deeper shared_ptr structures. Fixing that in a future commit.
The main piece of code, slave selection (backend_cmp_response_time), uses the available
method of pair-wise comparison of slaves. This will be changed to selection using all
available slaves, along with removal of hard coded values.
By storing the data gathere by readwritesplit inside the session, the
protocol will be aware of the state of the LOAD DATA LOCAL INFILE
execution. This prevents misinterpretation of the data which previously
led to closed connections, effectively rendering LOAD DATA LOCAL INFILE
unusable.
This change is a temporary solution to a problem that needs to be solved
at the protocol level. The changes required to implement this are too big
to add into a bug fix release.
The configuration doesn't need to be contained in shared pointer as each
session holds its own version of it. This removes most of the overhead in
configuration reloading. The only thing that's left is any overhead added
by the use of thread-local storage.
The causal read queries were performed also when the target server was the
master. The extra functionality of the causal reads is only needed on
slaves.
Adjusted the test case to require GTID replication.
The configuration used the wrong parameter name. The test also did not
explicitly enable tracking of the last_gtid variable which caused it to
fail if it wasn't already on.
The transaction migration in the case of a changed master never worked as
transaction replay would only be triggered when the master fails. To cover
this case, the transaction replay just needs to be started when the need
for a transaction migration is detected.
To help diagnose the behavior, the Trx class no longer logs a message when
a transaction is closed. This is now done by readwritesplit which has more
knowledge of the context in which the transaction is closed.
Moved transaction statistics calculations into a member function and
placed all target type specific processing into their respective
functions.
Also inverted the connection keepalive check to also cover hinted queries.