The slave backend would be closed twice if it would both respond with a
different result and be closed due to a hangup before the master
responded.
Added a test case that reproduced the problem.
If the client DCB of the session was passed into the function, it was
possible that the session pointer for it was already set to null. The
session pointer of an open DCB is never null but a client DCB's session
pointer can be null if accessed via the MXS_SESSION object.
By incrementing the counters when the session is created, we know that the
counter will always be decremented correctly. This does cause the listener
session to be counted as an actual session but this is already present in
the statistics calculations and is something we have to live with in 2.3
This change also makes it possible to overshoot the connection count
limitation as the session creation is delayed until authentication
fails. Both of these problems are fixed in 2.4.
The default database was not exposed in the warning that was logged when
authentication failed. The authentication uses the username, host and the
default database to find the user entry and the lack of the default
database made it hard to know for sure which user entry a client should've
matched against.
This causes the connection failure to be counted as an authentication
failure instead of a connection error. The former never causes the host to
be blocked which effectively solves the problem for most cases. The only
case where this would not work is where the network buffer for a backend
DCB is full right after the connection is created.
The password values are now masked with asterisks. This tells whether a
password is set or not but it does not expose any information about the
password itself.
The events are similar to normal query events except that they have an
extra 13 bytes of static data. This data is of no relevance to Maxscale
and thus can be ignored. This also allows the reuse of the same query
event code for execute load query events.
It appears that rollback errors are possible outside of
transactions. Since this was not something we expected to see, logging it
as an error allows us to see why this happens in production deployments.
The errors that are ignored by readwritesplit are now stored as the
current close reason in the Backend. This allows the information about the
error to be retained and it can be used later in the error handler to
display the true reason why the connection was closed.
The hangup and error handlers now have unique messages. Although the
behavior in the handlers is practically the same in both cases, the cause
of the error is not the same.
If a socket error is present, it is added to the error message. If an
error is present, it should clearly show the reason why the TCP socket was
closed.
The is_fake_event boolean helps distinguish fake events from real
ones. This makes figuring out the real source of hangup events easier.
By checking whether the users have changed whenever they are reloaded, we
improve the visibility of the user reloading process. Using a checksum
allows us to easily compress the information with acceptable loss of
accuracy. Using a CAS loop prevents duplicate messages without losing any
updates even if multiple user reloads result in different outcomes.
The use of a regular expression allows multiple rewrite rules to be
combined into one. This allows more versatile conversions but, given the
simple nature of regular expressions, also makes accidental changes more
likely.
Addd mxs::pcre2_substitute that is a more C++-friendly version of
mxs_pcre2_substitute to make. This makes string replacement a lot easier
to do when the source and destination are not C strings.
When rewrite_src and rewrite_dest have different lengths, the slave must
use GTID based replication. This removes the need for one-to-one matching
between the slave's relay log and the master's binlog which gets broken
when event lengths are modified due to event rewriting.
The replication events use a redundant format that has both the length of
the event and the position of the next event. The length can be modified
so that the next event position of the previous event and the length of
the curren event can be different. This includes overlap of the events
where the next event position of an event is "inside" the current event.
The next event position must retain its original value as that allows
replication slaves to reconnect with the correct position when file and
position based replication is used. For GTID replication, the slave asks
for the coordinates from the master and uses those.
When a slave receives a heartbeat event from a master, it checks that the
binlog name matches and that the next event position in the event is not
behind the slave's relay log position. These events must be modified to
contain a fake next event position that will never be reached by the
slave. This makes sure that the simple sanity checks never fail even if
we've caused the slave's relay log to be ahead of the master's binlog.
The parameters allow rudimentary database rewriting in the replication
stream. This is still very limited as the replacement must have the same
length as the original. In theory it could be shorter without causing
problems but making it longer is not easy.
The binlogfilter needs to read results one packet at a time but it needs
resultsets to be collected into a single buffer. This behavior is
guaranteed implicitly when the binlogrouter is used but is not present
when it is used without it. To support the use of the binlogfilter with
readconnroute, the filter must properly declare the capabilities.
The SQL for the second recursive CTE table can be optimized by adding a
where condition on the recursive part that rules out users that are not
roles. The functionality remains the same as only roles can be granted to
users.
If an existing cache-entry should be updated, but the new value
is larger that the maximum size of the cache, then the cache can
not be updated, but the old value must be removed.
Whether or not we succeed in removing the entry, an error result
must be returned. Earlier OK was returned, but no node was
allocated, which then caused a crash.