The log manager is the only one that uses the mlist_t versioned list. The
counter that keeps track of the version number was not modified using
atomic operations meaning that the compiler is free to optimize away parts
of the lock-free versioning mechanism that uses it.
To prevent this optimization, the variable is declared volatile. A rewrite
is direly needed but it cannot be done in 2.2.
Due to the skewed accept distribution without SO_REUSEPORT, we use
round-robin assignment of workers for new client connections. This
provides better performance as work is more likely to be evenly
distributed across all threads.
Using a least-busy-worker algorithm would provide a more stable result but
this is not trivially simple to implement. For this reason, the
round-robin based approach was chosen for 2.2.
The id has now been moved from mxs::Worker to mxs::RoutingWorker
and the implications are felt in many places.
The primary need for the id was to be able to access worker specfic
data, maintained outside of a routing worker, when given a worker
(the id is used to index into an array). Slightly related to that
was the need to be able to iterate over all workers. That obviously
implies some kind of collection.
That causes all sorts of issues if there is a need for being able
to create and destroy a worker at runtime. With the id removed from
mxs::Worker all those issues are gone, and its perfectly ok to create
and destory mxs::Workers as needed.
Further, while there is a need to broadcast a particular message to
all _routing_ workers, it hardly makes sense to broadcast a particular
message too _all_ workers. Consequently, only routing workers are kept
in a collection and all static member functions dealing with all
workers (e.g. broadcast) have now been moved to mxs::RoutingWorker.
Now, instead of passing the id around we instead deal directly
with the worker pointer. Later the data in all those external arrays
will be moved into mxs::[Worker|RoutingWorker] so that worker related
data is maintained in exactly one place.
MaxScale now defines events for which the syslog
facility and level can explicitly be defined by the
administrator. Currently there is only one such
event, namelt AUTHENTICATION_FAILURE.
In a subsequent commit, config.cc will be modified so
that event-related configuration parameters are passed
to event::configure() and in another subsequent commit
the authenticators will be modifed to use this mechanism.
In practice a line like:
MXS_WARNING("%s: login attempt for user '%s'@[%s]:%s, "
"authentication failed.",
dcb->service->name, client_data->user,
dcb->remote, dcb->path);
will be changed to
MXS_LOG_EVENT(event::AUTHENTICATION_FAILURE,
"%s: login attempt for user '%s'@[%s]:%s, "
"authentication failed.",
dcb->service->name, client_data->user,
dcb->remote, dcb->path);
By storing a link to the backend DCBs in the session object itself, we can
reach all related objects from the session. This removes the need to
iterate over all DCBs to find the set of related DCBs.
MonitorDestroy() (renamed to monitor_destroy()) will be used for
actually destroying the monitor instance, that is, execute
destroyInstance() on the loaded module instance and freeing the
the monitor instance.
TODO: monitor_deactivate() could do all the stuff which is currently
done to the monitor in config_runtime(), instead of just
turning off the flag.
Fixed string truncation warnings by reducing max parameter lengths by one
where applicable. The binlogrouter filename lengths are slightly different
so using memcpy to work around the warnings is an adequate "solution"
until the root of the problem is solved.
Removed unnecessary CMake policy settings from qc_sqlite. Adding a
self-dependency on the source file of an external project has no effect
and only caused warnings to be logged.
When providing pointer to instance and pointer to member function
of the class of the instance, the pointer to the member function
should be first and the pointer to the instance second.
There's now double bookkeeping:
- All delayed calls are in a map whose key is the next
invocation time. Since it's a map (and not an unordered_map)
it's sorted just the way we want to have it.
- In addition, there's an unordered set for each tag.
With this arrangement we can easily invoke the delayed calls
in the right order and be able to efficiently remove all
delayed calls related to a particular tag.
When canceling, a DelayedCall instance must be removed from the
collection holding all delayed calls. Consequently priority_queue
cannot be used as it 1) does not provide access to the underlying
collection and 2) the underlying collection (vector or deque)
is a bad choise if items in the middle needs to be removed.
The interface for canceling calls is now geared towards the needs
of sessions. Basically the idea is as follows:
class MyFilterSession : public maxscale::FilterSession
{
...
int MyFilterSession::routeQuery(GWBUF* pPacket)
{
...
if (needs_to_be_delayed())
{
Worker* pWorker = Worker::current();
void* pTag = this;
pWorker->delayed_call(5000, pTag, this,
&MyFilterSession::delayed_routeQuery,
pPacket);
return 1;
}
...
}
bool MyFilterSession::delayed_routeQuery(Worker::Call:action_t action,
GWBUF* pPacket)
{
if (action == Worker::Call::EXECUTE)
{
routeQuery(pPacket);
}
else
{
ss_dassert(action == Worker::Call::CANCEL);
gwbuf_free(pPacket);
}
return false;
}
~MyFilterSession()
{
void* pTag = this;
Worker::current()->cancel_delayed_calls(pTag);
}
}
The alternative, returning some key that the caller must keep
around seems more cumbersome for the general case.
It's now possible to provide Worker with a function to call
at a later time. It's possible to provide a function or a
member function (with the object), taking zero or one argument
of any kind. The argument must be copyable.
There's currently no way to cancel a call, which must be added
as typically the delayed calling is associated with a session
and if the session is closed before the delayed call is made,
bad things are likely to happen.
Worker::Timer class and Worker::DelegatingTimer templates are
timers built on top of timerfd_create(2). As such they consume
descriptor and hence cannot be created independently for each
timer need.
Each Worker has now a private timer member variable on top of
which a general timer mechanism will be provided.
Worker is now the base class of all workers. It has a message
queue and can be run in a thread of its own, or in the calling
thread. Worker can not be used as such, but a concrete worker
class must be derived from it. Currently there is only one
concrete class RoutingWorker.
There is some overlapping in functionality between Worker and
RoutingWorker, as there is e.g. a need for broadcasting a
message to all routing workers, but not to other workers.
Currently other workers can not be created as the array for
holding the pointers to the workers is exactly as large as
there will be RoutingWorkers. That will be changed so that
the maximum number of threads is hardwired to some ridiculous
value such as 128. That's the first step in the path towards
a situation where the number of worker threads can be changed
at runtime.
A new class mxs::Worker will be introduced and mxs::RoutingWorker
will be inherited from that. mxs::Worker will basically only be a
thread with a message-loop.
Once available, all current non-worker threads (but the one
implicitly created by microhttpd) can be creating by inheriting
from that; in practice that means the housekeeping thread, all
monitor threads and possibly the logging thread.
The benefit of this arrangement is that there then will be a general
mechanism for cross thread communication without having to use any
shared data structures.
With a granularity of 1 second, the load will from a human
perspective reflect the current situation. That also means
that the maxadmin output shows "natural" steps; 1s, 1m and 1h.
By definition, the load is calculated using the following formula:
L = 100 * ((T - t) / T)
where T is a time period and t the time of that period that the worker
spends in epoll_wait(). So, if there is so much work that epoll_wait()
always returns immediately, then the load is 100 and if the thread
spends the entire period in epoll_wait(), then the load is 0.
The basic idea is that the timeout given to epoll_wait() is adjusted
so that epoll_wait() will always return roughly at 10 seconds interval.
By making a note of when we are about to enter epoll_wait() and when we
return from it, we have all the information we need for calculating the
load.
Due to the nature of things, we will not be able to calculate the load
at exact 10-second boundaries, but it will be pretty close. And the load
is always calculated using the true length of the period.
We will then calculate 1 minute load by averaging the load value for 6
consecutive 10-second periods and the 1 hour load by averaging the load
value of 60 consecutive 1 minute loads.
So, while the 10-second load represents the load of the most recently
measured 10-second period (and not the load of the most recent 10
seconds), the 1 minute load and the 1 hour load represents the load of
the most recent minute and hour respectively.
Pre-loading users for all threads at startup significantly reduces the
chance for failures caused by the lazy initialization of the user database
done by the authenticators.
If users are not loaded at startup and the connection limit for all
servers is reached, authentication in MaxScale will fail not due to too
many connections but due to the lack of authentication data. This causes
repeated reloading of users, which floods the log with messages, and
unnecessary stress on the cluster itself.
The internal header directory conflicted with in-source builds causing a
build failure. This is fixed by renaming the internal header directory to
something other than maxscale.
The renaming pointed out a few problems in a couple of source files that
appeared to include internal headers when the headers were in fact public
headers.
Fixed maxctrl in-source builds by making the copying of the sources
optional.