If a master with a better rank and a slave with a worse rank were
available and master_accept_reads wasn't enabled, the slave would be
preferred over the master. The check for master_accept_reads was done
twice and also in the wrong place.
Although the default value is the maximum value of a signed 32-bit
integer, the value is stored as a 64-bit integer. The integer type
conversion functions return 64-bit values so storing it as one makes
sense.
Currently values higher than the default are allowed but the accepted
range of input should be restricted in the future.
Readwritesplit now respects server ranks. When servers are selected for
either routing or connection creation, the servers are partitioned by
their rank into sets of servers. These sets of servers are never mixed so
the end result is that only servers of the same rank are considered for
candidacy.
The master selection is slightly different: the server with the best rank
that is capable of acting as a master is chosen. This means that a session
can have a master with a lower rank and slaves with higher ranks than the
master. In most cases this actually is the preferred behavior as the rank
is used to prioritize usage but not outright prevent it.
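
The two selection rules above can be sketched as follows. This is a
minimal illustration, not the readwritesplit code: the Server struct, its
fields and both function names are hypothetical, and it assumes that a
numerically smaller rank is the better one.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for a backend server; not the MaxScale type.
struct Server
{
    std::string name;
    int         rank;        // assumption: a smaller value is a better rank
    bool        can_master;  // capable of acting as a master
    bool        can_slave;   // usable as a read candidate
};

// Read candidates: only the usable slaves that share the best rank.
// Servers of different ranks are never mixed in the result.
std::vector<Server> slave_candidates(const std::vector<Server>& servers)
{
    std::vector<Server> out;
    std::copy_if(servers.begin(), servers.end(), std::back_inserter(out),
                 [](const Server& s) { return s.can_slave; });
    if (out.empty())
    {
        return out;
    }

    auto by_rank = [](const Server& a, const Server& b) { return a.rank < b.rank; };
    int best = std::min_element(out.begin(), out.end(), by_rank)->rank;

    out.erase(std::remove_if(out.begin(), out.end(),
                             [best](const Server& s) { return s.rank != best; }),
              out.end());
    assert(!out.empty());   // the server that defined `best` always remains
    return out;
}

// Master: the best-ranked server that can act as a master, regardless of
// the rank the read candidates came from. Returns nullptr if none exists.
const Server* pick_master(const std::vector<Server>& servers)
{
    const Server* best = nullptr;
    for (const Server& s : servers)
    {
        if (s.can_master && (!best || s.rank < best->rank))
        {
            best = &s;
        }
    }
    return best;
}
```

With two rank-1 slaves and a rank-2 master, the sketch returns both slaves
as read candidates and still picks the rank-2 server as the master.
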
The connection creation is now internal to RWSplitSession. This makes the
code more readable by removing the need to pass parameters and allowing
easier reuse of existing functions. The various conditions required to
create connections are now also checked in only one place.
Readwritesplit now picks the best available master if no open master
connection is available. This is required if the server rank is to be
taken into account when master selection is done.
The discarding of connections in maintenance mode must be done after any
results have been written to them. This prevents closing of the connection
before the actual result is returned.
The candidate selection code used default values that would cause reads
past buffers. The code could also dereference the end iterator which
causes undefined behavior.
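
A minimal sketch of the safe pattern, assuming a hypothetical Candidate
type where a lower score is better: the iterator returned by the search
is compared against end() before it is dereferenced.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical candidate with a precomputed score; lower is better.
struct Candidate
{
    int    id;
    double score;
};

// Returns a pointer to the best candidate, or nullptr when there is none.
const Candidate* best_candidate(const std::vector<Candidate>& candidates)
{
    auto it = std::min_element(candidates.begin(), candidates.end(),
                               [](const Candidate& a, const Candidate& b) {
                                   return a.score < b.score;
                               });

    // For an empty range std::min_element returns end(), which must not
    // be dereferenced.
    if (it == candidates.end())
    {
        return nullptr;
    }

    assert(!candidates.empty());
    return &*it;
}
```
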
Queries in the query queue need to be explicitly parsed since they are
stored in a single buffer and thus share the query classification
information. In the next major version this should be changed into an
array of individual buffers instead of a shared buffer.
If a session command is executed when lazy_connect is enabled and no
connections have been created, a connection must be made. This makes sure
that the session isn't closed and that the client receives a response.
The lazy connection creation reduces the burden that short sessions place
on the backend servers. This also prevents the problems caused by early
disconnections that happen when only one server is used but multiple
connections are created. This does not solve the problem (MXS-619) but it
does mitigate it to acceptable levels.
This commit also adds a change to the weighting algorithm that prefers
existing connections over unopened ones. This helps avoid the
flip-flopping that happens when the absolute scores are very similar. The
hard-coded value might need to be tuned once testing is done.
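
The idea behind the weighting change can be sketched like this. The
penalty factor of 1.1, the Conn struct and the function names are
assumptions made for illustration only, as is the convention that a lower
score is better; the actual hard-coded value is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical view of a backend: its raw score and whether a connection
// to it is already open.
struct Conn
{
    double score;
    bool   open;
};

// Assumed penalty for servers that would need a new connection.
constexpr double UNOPENED_PENALTY = 1.1;

double effective_score(const Conn& c)
{
    return c.open ? c.score : c.score * UNOPENED_PENALTY;
}

// Index of the candidate with the lowest effective score.
std::size_t pick(const std::vector<Conn>& conns)
{
    assert(!conns.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < conns.size(); ++i)
    {
        if (effective_score(conns[i]) < effective_score(conns[best]))
        {
            best = i;
        }
    }
    return best;
}
```

With scores of 10.0 (open) and 9.9 (unopened) the open connection wins, so
small score differences no longer cause a switch. A clearly better
unopened server, for example 5.0 against 10.0, is still chosen.
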
The protocol should not track the session state as the parsing is quite
expensive with the current code. This change is a workaround that enables
the parsing only when required. A proper way to handle this would be to do
all the response processing in one place, thus avoiding the duplication of
work.
Because there are only three possible categories, the map can be replaced
with a static array that needs no memory allocations. Making this array
thread-local allows it to be reused, which places an upper limit on the
number of memory allocations.
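
A rough sketch of the technique, under the assumption that each category
holds a vector of items. The names Buckets, get_buckets and group are
hypothetical and the items are plain integers for brevity.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// The source only states that there are three categories.
constexpr std::size_t CATEGORY_COUNT = 3;

using Buckets = std::array<std::vector<int>, CATEGORY_COUNT>;

// One set of buckets per thread. clear() keeps the capacity, so once the
// vectors have grown to their peak size no further allocations happen.
Buckets& get_buckets()
{
    thread_local Buckets buckets;
    for (auto& b : buckets)
    {
        b.clear();
    }
    return buckets;
}

// Group item ids by category; categorize(id) is supplied by the caller
// and must return an index below CATEGORY_COUNT.
template<class Fn>
Buckets& group(const std::vector<int>& ids, Fn categorize)
{
    Buckets& buckets = get_buckets();
    for (int id : ids)
    {
        std::size_t c = categorize(id);
        assert(c < CATEGORY_COUNT);
        buckets[c].push_back(id);
    }
    return buckets;
}
```

The returned reference is only valid until the next call on the same
thread, which is the price of reusing the storage.
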
The documentation stated that at most `max_sescmd_history` commands were
kept, but in reality the history held one command fewer than what was
documented.
Replaces uses of config_get_param() in modules with either contains()
or get_string(). config_get_param() is moved to internal headers,
as it allows seeing inside a config setting.
This commit adds a new parameter that, when enabled, prunes the session
command history to a known length. This makes it possible to keep a
client-side pooled connection open indefinitely at the cost of making
reconnections theoretically unsafe. In practice the maximum history length
can be set to a value that encompasses a single session using the pooled
connection with no risk to session state integrity. The default history
length of 50 commands is quite likely to be adequate for the majority of
use-cases.
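
The pruning behavior can be illustrated with a bounded queue. This is a
sketch only: the SescmdHistory class and its interface are hypothetical,
and what the router does when the history is full and pruning is disabled
is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>

// Hypothetical session command history holding at most max_len commands.
class SescmdHistory
{
public:
    SescmdHistory(std::size_t max_len, bool prune)
        : m_max(max_len)
        , m_prune(prune)
    {
        assert(max_len > 0);
    }

    // Stores the command. When the history is full, the oldest command is
    // dropped if pruning is enabled; otherwise nothing is stored and
    // false is returned.
    bool add(std::string cmd)
    {
        if (m_cmds.size() >= m_max)
        {
            if (!m_prune)
            {
                return false;
            }
            m_cmds.pop_front();
        }
        m_cmds.push_back(std::move(cmd));
        assert(m_cmds.size() <= m_max);
        return true;
    }

    std::size_t size() const
    {
        return m_cmds.size();
    }

    const std::string& oldest() const
    {
        return m_cmds.front();
    }

private:
    std::deque<std::string> m_cmds;
    std::size_t             m_max;
    bool                    m_prune;
};
```

Dropping the oldest command is what makes a reconnection theoretically
unsafe: a new connection replays only the commands still in the history.
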
When the connection state is reset by executing a COM_CHANGE_USER or
COM_RESET_CONNECTION, readwritesplit does not need to store the session
command history that was executed before it. With this, pooled connections
will effectively behave like normal connections if the pooling mechanism
is smart enough to reset the connection. This also prevents unwanted
visibility into the session states of other connections.
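
A minimal sketch of the reset handling, assuming the history is a simple
list of command bytes. The command values are the ones defined by the
MySQL/MariaDB client protocol; the record() function and the decision to
keep the resetting command itself as the first entry are assumptions of
this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Command bytes from the MySQL/MariaDB client protocol.
constexpr std::uint8_t COM_INIT_DB          = 0x02;
constexpr std::uint8_t COM_QUERY            = 0x03;
constexpr std::uint8_t COM_CHANGE_USER      = 0x11;
constexpr std::uint8_t COM_RESET_CONNECTION = 0x1f;

bool resets_session_state(std::uint8_t command)
{
    return command == COM_CHANGE_USER || command == COM_RESET_CONNECTION;
}

// Hypothetical history update: anything executed before a reset no longer
// affects the session state, so it is dropped.
void record(std::vector<std::uint8_t>& history, std::uint8_t command)
{
    if (resets_session_state(command))
    {
        history.clear();
    }
    history.push_back(command);
    assert(!history.empty());
}
```
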
If the routing of a session command fails due to problems with the backend
connections, a more verbose error message is logged. The added status
information in the Backend class makes tracking the original cause of the
problem a lot easier due to knowing where, when and why the connection was
closed.
If a master is found but it is being drained, the connection attempt
is rejected if the master failure mode is fail_instantly.
In that case the logged message makes it clear that draining is the
reason the connection attempt failed.
If a server was not chosen as the target of a routing hint, the server
status would not be logged. By logging the server state in the message, it
is easier to figure out why a server wasn't chosen as the routing target.
Both the replication lag and the message printing state are saved in SERVER,
although the values are mostly used by readwritesplit. A log message is printed
both when a server goes over the limit and when it comes back below.
Because of concurrency issues, a message may be printed multiple times before
different threads detect the new message state.
Documentation updated to explain the change.
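
The state tracking described above could look roughly like this. The
ServerLag struct and should_log_transition() are hypothetical names, and
the sketch assumes that "over the limit" means a lag strictly greater than
the configured maximum.

```cpp
#include <atomic>

enum class LagState
{
    BELOW_LIMIT,
    ABOVE_LIMIT
};

// Hypothetical per-server state shared by all worker threads.
struct ServerLag
{
    std::atomic<LagState> reported{LagState::BELOW_LIMIT};
};

// Returns true when the caller should log a message about the server
// crossing the limit in either direction.
bool should_log_transition(ServerLag& server, int lag, int max_lag)
{
    LagState now = lag > max_lag ? LagState::ABOVE_LIMIT : LagState::BELOW_LIMIT;

    // The load and the store are separate steps, so two threads can both
    // observe the old state and both report the change.
    if (server.reported.load(std::memory_order_relaxed) != now)
    {
        server.reported.store(now, std::memory_order_relaxed);
        return true;
    }
    return false;
}
```

Replacing the load and store with a single exchange() would make the
message appear exactly once per transition, at the cost of a
read-modify-write on every check.
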
If the connection to the master is lost, knowing what type of error
caused the call to handleError helps deduce the real reason for it.
Logging the idle time of the connection helps detect when the
wait_timeout of a connection is exceeded.