The mapping from a canonical statement to the query classification
result is maintained by the class QCInfoCache of which there exist
an instance per thread. That way no locking is needed but the
information will be cached multiple times (but that is a smaller
price to pay). Currently the information is stored in a regular
std::unordered_map, which means that the consumed amount of
memory will just keep on growing unless the number of canonical
statements used by clients happens to have an upper bound.
The LRU cache (that provides means for putting a bound on the
amount of memory used and number of items) used in the cache filter
will be generalized and be taken into use here as well.
The key is now the canonical statement itself, which means that
a fair amount of memory will be used. To preserve memory it might
make sense to use a hashed value instead, although that at least
in principle opens up the possibility for unintended collisions.
This feature will also be made configurable.
Minor cleanup:
- Unit variables places in anonymous namespace.
- Unit variables accessed via this_unit struct.
This is the first commit of many where the mapping from canonical
statement to query classification result is introduced.
The mapping is placed above the actual query classifier, so that all
query classifiers will benefit from it.
The mapping will be maintained by thread so that there will not be
any synchronization issues. Further, initially a simple map without
upper boundary will be used; if this is found to provide measurable
benefits, the map will then be replaced with an LRU mechanism so
that it becomes possible to specify just how much memory the mapping
may use.
The evq_length file held the returned number of descriptors from
the last epoll_wait() call. As such it is highly temporal and not
particularly meaningful.
That has now been removed and the instead the average number of
returned descriptors is maintained. That information changes slowly
and thus carries some meaning.
Due to the skewed accept distribution without SO_REUSEPORT, we use
round-robin assignment of workers for new client connections. This
provides better performance as work is more likely to be evenly
distributed across all threads.
Using a least-busy-worker algorithm would provide a more stable result but
this is not trivially simple to implement. For this reason, the
round-robin based approach was chosen for 2.2.
If two or more session commands contain identical buffers, the buffer of
the first session command is shared between the others. This reduces the
amount of memory used to store repeated executions of session commands.
The purging of session command history in readwritesplit was replaced with
session command de-duplication. This was done to prevent problems that
could arise when the order of session commands plays a significant role.
The characteristics of a transaction can now be tracked by the query
classifier. This allows read-only and read-write transaction statistics to
be calculated.
Prepared statements via readwritesplit need to have their IDs mapped from
the internal representation to the backend specific one. The RWBackend
class does this in its write method but the fix in commit
e561c3995c7396cf3749ccdf6a3357d7dd32c856 caused this to be bypassed and
the base version was always used.
The id has now been moved from mxs::Worker to mxs::RoutingWorker
and the implications are felt in many places.
The primary need for the id was to be able to access worker specfic
data, maintained outside of a routing worker, when given a worker
(the id is used to index into an array). Slightly related to that
was the need to be able to iterate over all workers. That obviously
implies some kind of collection.
That causes all sorts of issues if there is a need for being able
to create and destroy a worker at runtime. With the id removed from
mxs::Worker all those issues are gone, and its perfectly ok to create
and destory mxs::Workers as needed.
Further, while there is a need to broadcast a particular message to
all _routing_ workers, it hardly makes sense to broadcast a particular
message too _all_ workers. Consequently, only routing workers are kept
in a collection and all static member functions dealing with all
workers (e.g. broadcast) have now been moved to mxs::RoutingWorker.
Now, instead of passing the id around we instead deal directly
with the worker pointer. Later the data in all those external arrays
will be moved into mxs::[Worker|RoutingWorker] so that worker related
data is maintained in exactly one place.
To get rid of the need that a Worker must have an id, we store
in the MXS_POLL_DATA structure a pointer to the owning worker
instead of the id of the owning worker. This also allows some
further cleanup as the need for switching back and forth between
the id and the worker disappears.
The id will be moved from Worker to RoutingWorker as there
currently is a fair amount of code that assumes that the id of
routing workers start from 0.
It's no longer necessary to inherit from Worker in order to use
it, but it can now be used in a stand-alone fashion. This fits
the MonitorInstance use-case better.
If a session command produces a different result on the slave than it did
on the master, a warning is logged. This warning now also logs the query
that was being executed to make investigation of the problem easier.
The only way to cleanly separate the maxutils library from the MaxScale
CMake project is to make it a standalone CMake project. With the help of
ExternalProject, it should be relatively easy to use.
Backend::execute_session_command would use the overridden write method
instead of the Backend::write method that it intended to use. This caused
session commands that did not expect a response to be in a state that
expected a result.
Also fixed RWBackend::write pass the response_type value to
Backend::write.
Look for the expected message several times, with short sleeps in
between. That way we will not sleep more than necessary, yet will
not immediately give up either.
At the start of the monitor tick, the monitor pending status is set to the
value of the current server status. This allows the informative and
history bits to survive even if a server goes down.
To make sure that no stale state bits are in effect when the post_tick
method is called, the pending status must be cleared of all status bits.
The core library now contains the maxscale_shutdown() command. This makes
it possible to resolve all symbols at link time even for administrative
modules.
MaxScale now defines events for which the syslog
facility and level can explicitly be defined by the
administrator. Currently there is only one such
event, namelt AUTHENTICATION_FAILURE.
In a subsequent commit, config.cc will be modified so
that event-related configuration parameters are passed
to event::configure() and in another subsequent commit
the authenticators will be modifed to use this mechanism.
In practice a line like:
MXS_WARNING("%s: login attempt for user '%s'@[%s]:%s, "
"authentication failed.",
dcb->service->name, client_data->user,
dcb->remote, dcb->path);
will be changed to
MXS_LOG_EVENT(event::AUTHENTICATION_FAILURE,
"%s: login attempt for user '%s'@[%s]:%s, "
"authentication failed.",
dcb->service->name, client_data->user,
dcb->remote, dcb->path);
Allowing calls to select_connect_backend_servers even when all slaves are
connected solves the debug assertion in select_connect_backend_servers
that happens when the execution of a queued query causes a new connection
to be created.
In principle a syslog priority consists of a syslog level
bit-or:d with a syslog facility. That's clear from the syslog
man page, but not so clear from the code in syslog.h.
Anyway, to make it possible to log using a specific facility
(instead of the default LOG_USER), we must allow priorities
that include a specified facility.