Because both client connections and listeners use sessions in 2.3, the
client_count tracking must be done inside the client DCB. In addition,
the max_connections check didn't take the currently pending connection
into account, which caused an off-by-one error.
This commit fixes the connection_limit test failure that was introduced by
commit 6306519e5e75575ba083ee2f0edfe7e624da5d26.
By incrementing the counters when the session is created, we know that the
counter will always be decremented correctly. This does cause the listener
session to be counted as an actual session, but this is already present in
the statistics calculations and is something we have to live with in 2.3.
This change also makes it possible to overshoot the connection count
limit, as session creation is delayed until authentication has
completed. Both of these problems are fixed in 2.4.
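As a rough sketch of the corrected check (the field names here are
illustrative, not the actual 2.3 code), the pending connection itself has
to be counted before comparing against the limit:

    // Hypothetical sketch of the fixed limit check. Without the "+ 1",
    // max_connections + 1 clients could connect: the connection being
    // accepted is not yet included in client_count, since the counter
    // is only incremented once the session is created.
    bool connection_limit_reached(const Service* service)
    {
        return service->max_connections > 0
               && service->client_count + 1 > service->max_connections;
    }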
The FakeEventTask called the actual DCB handler with a fake event but
didn't set the fake event flag. This caused KILL queries to be treated as
if they were network errors.
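A minimal sketch of the fix, with illustrative member names: the flag is
set before the handler runs so the handler can tell the fake event apart
from a real network error.

    // Hypothetical sketch of the fake event delivery.
    void FakeEventTask::execute(DCB* dcb)
    {
        dcb->is_fake_event = true;     // mark the event as fake ...
        dcb->func.hangup(dcb);         // ... before the handler inspects it
        dcb->is_fake_event = false;    // subsequent real events stay real
    }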
The hangup and error handlers now have unique messages. Although the
behavior in the handlers is practically the same in both cases, the cause
of the error is not the same.
If a socket error is present, it is appended to the error message so that
the reason why the TCP socket was closed is clearly visible.
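Fetching the pending socket error can be done with the standard SO_ERROR
socket option; a sketch of the kind of helper involved (the name is
illustrative):

    #include <cerrno>
    #include <sys/socket.h>

    // Return the pending error on the socket, or 0 if there is none,
    // so it can be appended to the hangup/error log message.
    static int get_socket_error(int fd)
    {
        int error = 0;
        socklen_t len = sizeof(error);

        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &error, &len) == -1)
        {
            error = errno;
        }

        return error;
    }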
The is_fake_event boolean helps distinguish fake events from real
ones. This makes figuring out the real source of hangup events easier.
When fake hangup events were delivered via DCBs, the current DCB was not
updated. This caused error messages to be logged without a session ID,
which makes failure analysis harder.
Given the assumption that queries are rarely 16MB long and that
realistically the only time that happens is during a large dump of data,
we can limit the size of a single read to at most one MariaDB/MySQL packet
at a time. This change allows the network throttling to engage much
sooner and reduces the maximum overshoot of throttling to 16MB.
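A sketch of the cap, assuming the standard MariaDB/MySQL packet layout (a
3-byte little-endian payload length plus a sequence byte); the names are
illustrative:

    #include <cstddef>
    #include <cstdint>

    static constexpr size_t MYSQL_HEADER_LEN = 4;

    // Three header bytes encode at most 0xFFFFFF bytes of payload, so a
    // single read is limited to one 16MB packet plus its header.
    size_t single_read_limit(const uint8_t* header)
    {
        size_t payload = header[0] | (header[1] << 8) | (header[2] << 16);
        return MYSQL_HEADER_LEN + payload;
    }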
The function allocated a constant-sized chunk of memory for all messages,
which was excessive and potentially dangerous when used with large
strings.
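A common fix for this pattern, sketched here (not the actual function):
let vsnprintf measure the message first and allocate exactly what is
needed.

    #include <cstdarg>
    #include <cstdio>
    #include <string>

    // Format a message into a buffer sized to fit instead of into a
    // fixed-size chunk that may be too large or, worse, too small.
    std::string format_message(const char* fmt, ...)
    {
        va_list args;
        va_start(args, fmt);
        va_list copy;
        va_copy(copy, args);

        int len = vsnprintf(nullptr, 0, fmt, args);    // measure only
        va_end(args);

        std::string msg;
        if (len >= 0)
        {
            msg.resize(len + 1);
            vsnprintf(&msg[0], len + 1, fmt, copy);    // format into place
            msg.resize(len);                           // drop trailing NUL
        }
        va_end(copy);
        return msg;
    }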
If a hangup event was sent to a DCB via dcb_hangup_foreach shortly after
the DCB was closed, the DCB would still receive it. To prevent this,
events must only be delivered to DCBs that haven't been closed.
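Conceptually the fix is a guard in the delivery path; a sketch with an
illustrative helper name:

    // Deliver a fake hangup only if the DCB has not been closed yet;
    // dcb_is_closed() stands in for the actual state check.
    void deliver_fake_hangup(DCB* dcb)
    {
        if (!dcb_is_closed(dcb))
        {
            dcb->func.hangup(dcb);
        }
    }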
There is a race condition between the addition of the DCB into epoll and
the execution of the event that initializes the protocol pointer for the
DCB and sends the handshake to the client. If a hangup event occurred
before the handshake was sent, the DCB could be freed before the code that
sends the handshake was executed.
By picking the worker that owns the DCB before the DCB is placed into the
owner's epoll instance, we make sure that no events arrive on the DCB
while control is transferred from the accepting worker to the owning
worker.
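The key is the ordering, sketched below with illustrative names: the owner
is fixed before the fd is registered, so every event, including an early
hangup, is handled by the same worker that runs the handshake task.

    #include <sys/epoll.h>

    // Hypothetical sketch of the accept path.
    void assign_client_dcb(DCB* client_dcb)
    {
        // 1. Decide the owning worker before any event can fire.
        RoutingWorker* owner = pick_worker();
        client_dcb->owner = owner;

        // 2. Only now add the fd to the owner's epoll instance. A hangup
        //    arriving from here on is processed by the owner, after the
        //    task that initializes the protocol and sends the handshake.
        owner->add_fd(client_dcb->fd, EPOLLIN | EPOLLRDHUP, client_dcb);
    }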
When poll_add_dcb was called for a DCB that was once in the polling system
but was subsequently removed, the DCB would appear twice in the worker's
list of DCBs. This caused a hang when the DCB was the last one in the
worker's list and dcb_foreach_local was called.
To prevent the aforementioned problem, the DCBs are now added and removed
directly to and from the workers instead of indirectly via poll_add_dcb
and poll_remove_dcb.
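A sketch of what the direct registration amounts to (the container and
method names are illustrative): the worker-local list is the single source
of truth and a DCB can only appear in it once.

    // Hypothetical worker-local bookkeeping.
    void RoutingWorker::register_dcb(DCB* dcb)
    {
        if (m_dcbs.insert(dcb).second)      // std::set rejects duplicates
        {
            add_fd(dcb->fd, EPOLLIN, dcb);  // epoll follows the list
        }
    }

    void RoutingWorker::deregister_dcb(DCB* dcb)
    {
        remove_fd(dcb->fd);
        m_dcbs.erase(dcb);
    }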
Even though directly closing the socket is not very neat architecturally,
it offers the best of both worlds: the socket is closed immediately and is
free for reuse while the listener struct is still available as a
reference.
This change needs to be revised when the listeners are refactored into
separate objects.
Updated documentation to reflect the change in behavior.
When a DCB was removed and added more than once with poll_add_dcb and
poll_remove_dcb, the code previously chose a different thread each time
the DCB was added. This violated the assumption that all DCBs are local
to a single worker.
The callbacks iterated over all threads when only the local ones should be
iterated. Iterating only the local callbacks prevents a deadlock from
occurring when multiple threads start throttling at the same time.
Also fixed the gwbuf_append debug assertion.
The reads now read as much of the data as is available to reduce the
number of distinct malloc calls that need to be made. The SSL_read path
also now allocates the buffer before reading into it so that the amount of
copying is reduced.
Also removed some of the less helpful debug messages.
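A sketch of the idea using FIONREAD for plain sockets and SSL_pending for
TLS (an illustrative wrapper, not the actual dcb_read code):

    #include <openssl/ssl.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <cstdint>
    #include <vector>

    // Read everything that is currently buffered with one allocation and
    // one read call instead of many small reads, each with its own malloc.
    std::vector<uint8_t> read_available(int fd)
    {
        int available = 0;
        ioctl(fd, FIONREAD, &available);    // bytes waiting in the kernel

        std::vector<uint8_t> buf(available > 0 ? available : 1);
        ssize_t n = read(fd, buf.data(), buf.size());
        buf.resize(n > 0 ? n : 0);
        return buf;
    }

    // For TLS, SSL_pending() tells how much decrypted data is already
    // buffered, so the buffer can be allocated before SSL_read fills it.
    std::vector<uint8_t> ssl_read_available(SSL* ssl)
    {
        int available = SSL_pending(ssl);
        std::vector<uint8_t> buf(available > 0 ? available : 1024);
        int n = SSL_read(ssl, buf.data(), static_cast<int>(buf.size()));
        buf.resize(n > 0 ? n : 0);
        return buf;
    }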
The client connections had the Nagle algorithm enabled, which could cause
bad performance with smaller workloads. The common network configuration
code in utils.cc, currently used by the backend connections, configures
this properly.
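Disabling Nagle is a single setsockopt call; a sketch of what the shared
configuration code boils down to:

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    // Disable the Nagle algorithm so small writes are sent immediately
    // instead of being coalesced, which hurts small request/response
    // workloads.
    bool disable_nagle(int fd)
    {
        int one = 1;
        return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY,
                          &one, sizeof(one)) == 0;
    }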
See the script directory for the method. The script to run in the
top-level MaxScale directory is called maxscale-uncrustify.sh, which uses
another script, list-src, from the same directory (so you need to set
your PATH). The uncrustify version was 0.66.
Given that worker.hh was public, it made sense to make routingworker.hh
public as well. This removes the need to include private headers in
modules and allows C++ constructs to be used in C++ code when previously
only the C API was available.