The global configuration parameter 'query_classifier_cache' turns on
the query classification cache. At the moment it does not matter what
value the parameter has; its mere presence enables the caching.
Eventually you will be able to specify how much memory the cache
is allowed to consume.
The parameter now takes a structure that, if present, enables query
classification caching and specifies the properties of the cache.
For the time being no actual properties are available.
The sql mode affects the result of the query classification.
Consequently, we cannot use a cached result if it was generated
when the sql mode was different from the current one.
The mapping from a canonical statement to the query classification
result is maintained by the class QCInfoCache, of which there exists
one instance per thread. That way no locking is needed, but the same
information may be cached multiple times (a smaller price to pay).
Currently the information is stored in a regular std::unordered_map,
which means that the amount of memory consumed will keep on growing
unless the number of canonical statements used by clients happens to
have an upper bound.
The LRU cache used in the cache filter (which provides a means of
bounding both the amount of memory used and the number of items)
will be generalized and taken into use here as well.
The key is now the canonical statement itself, which means that
a fair amount of memory will be used. To save memory it might
make sense to use a hashed value instead, although that at least
in principle opens up the possibility of unintended collisions.
This feature will also be made configurable.
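The gist of the arrangement can be sketched as follows. This is an
illustration only; SqlMode, ClassificationEntry and the function names
are hypothetical and do not match the actual QCInfoCache code.

    #include <cstdint>
    #include <string>
    #include <unordered_map>

    // Each thread owns its own map from the canonical statement to the
    // classification result, so no locking is needed. The entry records the
    // sql mode it was produced under so that a stale result is not reused.
    enum class SqlMode { DEFAULT, ORACLE };

    struct ClassificationEntry
    {
        SqlMode  sql_mode;   // sql mode in effect when the statement was classified
        uint32_t type_mask;  // stand-in for the cached classification result
    };

    thread_local std::unordered_map<std::string, ClassificationEntry> this_thread_cache;

    const ClassificationEntry* cache_lookup(const std::string& canonical, SqlMode current_mode)
    {
        auto it = this_thread_cache.find(canonical);

        // A hit is usable only if it was produced under the current sql mode.
        if (it != this_thread_cache.end() && it->second.sql_mode == current_mode)
        {
            return &it->second;
        }

        return nullptr;
    }

    void cache_insert(const std::string& canonical, SqlMode mode, uint32_t type_mask)
    {
        this_thread_cache[canonical] = ClassificationEntry{mode, type_mask};
    }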
Minor cleanup:
- Unit variables placed in an anonymous namespace.
- Unit variables accessed via a this_unit struct.
This is the first commit of many where the mapping from canonical
statement to query classification result is introduced.
The mapping is placed above the actual query classifier, so that all
query classifiers will benefit from it.
The mapping will be maintained per thread so that there will not be
any synchronization issues. Further, initially a simple map without
an upper bound will be used; if this is found to provide measurable
benefits, the map will be replaced with an LRU mechanism so that it
becomes possible to specify just how much memory the mapping may use.
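For reference, an LRU-bounded mapping could look roughly like the
following. This is a generic sketch, not the cache filter's LRU
implementation; the memory accounting is purely illustrative.

    #include <cstddef>
    #include <cstdint>
    #include <list>
    #include <string>
    #include <unordered_map>

    // Keep the mapping within a memory budget by evicting the least
    // recently used entries.
    class LruMapping
    {
    public:
        explicit LruMapping(size_t max_size) : m_max_size(max_size) {}

        const uint32_t* get(const std::string& canonical)
        {
            auto it = m_map.find(canonical);

            if (it == m_map.end())
            {
                return nullptr;
            }

            // Move the entry to the front of the LRU list on access.
            m_lru.splice(m_lru.begin(), m_lru, it->second.order);
            return &it->second.result;
        }

        void put(const std::string& canonical, uint32_t result)
        {
            auto it = m_map.find(canonical);

            if (it != m_map.end())
            {
                m_size -= it->second.size;
                m_lru.erase(it->second.order);
                m_map.erase(it);
            }

            m_lru.push_front(canonical);
            size_t entry_size = canonical.size() + sizeof(result);
            m_map.emplace(canonical, Entry{result, entry_size, m_lru.begin()});
            m_size += entry_size;

            // Evict least recently used entries until we are within budget.
            while (m_size > m_max_size && !m_lru.empty())
            {
                auto victim = m_map.find(m_lru.back());
                m_size -= victim->second.size;
                m_map.erase(victim);
                m_lru.pop_back();
            }
        }

    private:
        struct Entry
        {
            uint32_t                         result;
            size_t                           size;
            std::list<std::string>::iterator order;
        };

        size_t                                 m_max_size;
        size_t                                 m_size = 0;
        std::list<std::string>                 m_lru;  // front = most recently used
        std::unordered_map<std::string, Entry> m_map;
    };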
The evq_length field held the number of descriptors returned by the
last epoll_wait() call. As such it is highly transient and not
particularly meaningful.
That field has now been removed and instead the average number of
returned descriptors is maintained. That information changes slowly
and thus carries some meaning.
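A rough sketch of the kind of slowly changing statistic this refers
to, assuming a simple exponentially weighted average (the actual
averaging method may differ):

    #include <sys/epoll.h>

    // Maintain a slowly changing average of the number of descriptors that
    // epoll_wait() returns, instead of the highly transient last value.
    struct PollStats
    {
        double avg_event_count = 0.0;

        void update(int n_returned)
        {
            const double alpha = 0.1;   // small weight => the average changes slowly
            avg_event_count = (1.0 - alpha) * avg_event_count + alpha * n_returned;
        }
    };

    void poll_once(int epoll_fd, PollStats& stats)
    {
        struct epoll_event events[64];
        int n = epoll_wait(epoll_fd, events, 64, 100 /* timeout in ms */);

        if (n >= 0)
        {
            stats.update(n);
        }

        // ... dispatch the returned events ...
    }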
It is now possible to prevent the masking filter from rejecting
statements that use functions in conjunction with fields to be
masked. This makes it possible to forgo the masking filter's blanket
rejection and to replace it with more detailed firewall rules.
The masking filter works only on the result-set. However, if
functions are used, the original column names will not be available
in the result-set and hence masking will not take place (a masked
column wrapped in a function appears in the result-set under a
different name). Now the statement is checked, and if functions are
used in conjunction with columns that should be masked, the statement
is rejected. Thus functions can no longer be used for bypassing the
masking. Preventing that was possible earlier as well, but required
manually setting up the firewall filter.
If a service has no active servers and users are injected, a warning
was logged. The warning is misleading when the service has no servers,
and it should only be logged if the failure to load any users is an
unexpected situation.
The log manager is the only component that uses the mlist_t versioned
list. The counter that keeps track of the version number was not
modified using atomic operations, meaning that the compiler is free to
optimize away parts of the lock-free versioning mechanism that uses it.
To prevent this optimization, the variable is declared volatile. A rewrite
is direly needed but it cannot be done in 2.2.
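For illustration, the pattern in question looks roughly like the
following seqlock-style versioning; the names are not the actual
mlist_t ones. Note that volatile only stops the compiler from eliding
or merging the stores; it does not give real atomicity or ordering
guarantees, hence the need for a rewrite.

    #include <cstdint>

    // Writers bump the version to an odd value while modifying the list and
    // back to an even value when done; readers retry if the version changed
    // or was odd during the read.
    struct VersionedList
    {
        volatile uint32_t version = 0;

        void update()
        {
            ++version;               // odd: a writer is modifying the list
            // ... modify the list contents ...
            ++version;               // even: the list is consistent again
        }

        bool read_consistent() const
        {
            uint32_t before = version;
            // ... read the list contents ...
            uint32_t after = version;

            // The read is valid only if no writer was active and nothing changed.
            return before % 2 == 0 && before == after;
        }
    };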
Empty databases were not mapped to ServerMap because they had no tables
to map. Modified the query to also include empty databases making it
possible to also connect to empty databases through schemarouter.
Due to the skewed accept distribution without SO_REUSEPORT, we use
round-robin assignment of workers for new client connections. This
provides better performance as work is more likely to be evenly
distributed across all threads.
Using a least-busy-worker algorithm would provide a more stable result
but is not trivial to implement. For this reason, the round-robin
based approach was chosen for 2.2.
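The assignment itself is simple; a minimal sketch with a placeholder
Worker type, not the actual worker class:

    #include <atomic>
    #include <cstddef>
    #include <vector>

    struct Worker { /* placeholder for the actual worker thread type */ };

    class WorkerPool
    {
    public:
        explicit WorkerPool(size_t n) : m_workers(n) {}

        // Hand each accepted client connection to the next worker in turn so
        // that the load spreads evenly even if one thread accepts most sockets.
        Worker& pick_worker()
        {
            size_t i = m_next.fetch_add(1, std::memory_order_relaxed) % m_workers.size();
            return m_workers[i];
        }

    private:
        std::vector<Worker> m_workers;
        std::atomic<size_t> m_next{0};
    };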
Parameters that accept whitespace-only values need to have their default
values quoted if they contain only whitespace characters. In 2.2, the
qlafilter was the only module that did not do this.
When a valid target was not found, no error message was logged by the
router. This would cause the "Routing the query failed. Session will be
closed." message to be logged with no explanation as to why the routing
failed.
In addition to the above-mentioned case, no message would be logged if the
target for a COM_STMT_FETCH was not in use.
If the authentication failure was due to a missing database, this extra
information can now be logged. This will help in cases where users are
using databases that do not exist.
If two or more session commands contain identical buffers, the buffer of
the first session command is shared with the others. This reduces the
amount of memory used to store repeated executions of session commands.
The purging of session command history in readwritesplit was replaced with
session command de-duplication. This was done to prevent problems that
could arise when the order of session commands plays a significant role.
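Conceptually the de-duplication works as sketched below; the real code
operates on GWBUF objects rather than std::string and the names are
illustrative.

    #include <cstdint>
    #include <deque>
    #include <memory>
    #include <string>

    struct SessionCommand
    {
        std::shared_ptr<std::string> buffer;   // shared when the contents are identical
        uint64_t                     position; // order in which it was executed
    };

    class SessionCommandHistory
    {
    public:
        void add(const std::string& sql)
        {
            std::shared_ptr<std::string> buf;

            // If an identical command has been executed before, share its
            // buffer instead of storing another copy of the same bytes.
            for (const auto& cmd : m_history)
            {
                if (*cmd.buffer == sql)
                {
                    buf = cmd.buffer;
                    break;
                }
            }

            if (!buf)
            {
                buf = std::make_shared<std::string>(sql);
            }

            m_history.push_back(SessionCommand{buf, m_position++});
        }

    private:
        std::deque<SessionCommand> m_history;
        uint64_t                   m_position = 0;
    };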
The assertion that was added to RWSplitSession::handle_slave_is_target
failed when delayed_retry was enabled or when slave reconnection
occurred. In 2.3, targets returned by the target selection functions do
not need to be in use but they must be valid connection targets.
All executed session commands were logged in the RWSplitSession
destructor. This is not really necessary and shouldn't have been placed
there in the first place.
When the `optimistic_trx` mode is enabled, all transactions are started on
a slave server. If the client executes a query inside the transaction that
is not of a read-only nature, the transaction is rolled back and replayed
on the master.
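In outline, the routing decision behaves roughly as follows; the names
are hypothetical and do not correspond to the actual readwritesplit
implementation.

    enum class Target { MASTER, SLAVE };

    struct OptimisticTrx
    {
        bool active = false;        // a transaction is open on a slave
        bool needs_replay = false;  // a write was seen; replay on the master

        Target begin()
        {
            active = true;
            return Target::SLAVE;   // start the transaction optimistically on a slave
        }

        Target route(bool query_is_read_only)
        {
            if (active && !query_is_read_only)
            {
                // The optimistic assumption failed: roll back on the slave and
                // replay the whole transaction on the master.
                needs_replay = true;
                active = false;
                return Target::MASTER;
            }

            return active ? Target::SLAVE : Target::MASTER;
        }
    };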
The optimistic_trx parameter will control whether transactions are assumed
to be read-only and will be optimistically executed on slave
servers. Currently, the parameter does nothing.
Unconditionally update the previous target on each routed query. This
allows routing to the previous server in case it is needed. One example of
this is a new type of hint that allows routing to the same server where
the previous query was sent.
Also added a minor clarifying comment to the resetting of the
current_query.
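A minimal sketch of the bookkeeping, with hypothetical names:

    #include <string>

    struct RouteState
    {
        std::string prev_target;

        std::string route(const std::string& chosen, bool hint_route_to_previous)
        {
            // Honour a hint asking for the server the previous query went to.
            std::string target = (hint_route_to_previous && !prev_target.empty())
                ? prev_target : chosen;

            prev_target = target;   // updated unconditionally on every routed query
            return target;
        }
    };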
Formatted readwritesplit with Astyle. Changed the initialization of
Backend::m_modutil_state to use curly braces to cope with Astyle's lack of
support for curly braces inside parentheses.
Readwritesplit now keeps track of how many read-only and read-write
transactions have been executed. This allows a coarse estimate of how
widely read-only transactions are used, even when they are not declared
explicitly (i.e. with START TRANSACTION READ ONLY).
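Conceptually the statistics amount to the following; the actual member
names in readwritesplit differ.

    #include <cstdint>

    struct TrxStats
    {
        bool     in_trx = false;
        bool     trx_is_read_only = true;
        uint64_t ro_trx_count = 0;
        uint64_t rw_trx_count = 0;

        void statement(bool starts_trx, bool ends_trx, bool is_write)
        {
            if (starts_trx)
            {
                in_trx = true;
                trx_is_read_only = true;
            }

            if (in_trx && is_write)
            {
                trx_is_read_only = false;   // the transaction modified data
            }

            if (in_trx && ends_trx)
            {
                trx_is_read_only ? ++ro_trx_count : ++rw_trx_count;
                in_trx = false;
            }
        }
    };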
The characteristics of a transaction can now be tracked by the query
classifier. This allows read-only and read-write transaction statistics to
be calculated.
Also change the following defaults:
- "selects": Was "verify_cacheable", is now "assume_cacheable"
- "cached_data": Was "shared", is now "thread_specific"
Readwritesplit would crash with the following transaction:
BEGIN;
SET @a = 1; -- This is where it would crash
COMMIT;
When a session command was a part of the transaction, empty queries
(i.e. NULL GWBUFs) would be added to the transaction. If the transaction
were to be replayed, MaxScale would crash when these NULL queries were
executed.
Once the empty responses were fixed, the replaying of the transaction
would fail with a checksum mismatch. This was caused by the wrong order of
processing in RWSplitSession::clientReply. The response processing for
session commands was done after the response processing for replayed
transactions. This would trigger a checksum comparison too early for the
transaction in question.
The command is saved in a function object which is read by the monitor
thread. This way, manual and automatic cluster modification commands are
run in the same step of a monitor cycle.
This update required several modifications in related code.
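Schematically the mechanism looks like the following; Monitor and the
member names here are placeholders, not the actual monitor classes.

    #include <functional>
    #include <mutex>
    #include <utility>

    class Monitor
    {
    public:
        // Called from the admin interface: store a manual command for later.
        void schedule_manual_command(std::function<void()> cmd)
        {
            std::lock_guard<std::mutex> guard(m_lock);
            m_manual_command = std::move(cmd);
        }

        // Called by the monitor thread at a fixed point in every cycle, so
        // manual and automatic cluster modifications run in the same step.
        void tick()
        {
            std::function<void()> cmd;
            {
                std::lock_guard<std::mutex> guard(m_lock);
                cmd.swap(m_manual_command);
            }

            if (cmd)
            {
                cmd();      // e.g. a manual switchover or failover
            }

            run_automatic_operations();
        }

    private:
        void run_automatic_operations() { /* automatic failover, rejoin, ... */ }

        std::mutex            m_lock;
        std::function<void()> m_manual_command;
    };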
The monitor now detects when a server has changed such that a rebuild
of the replication graph is needed, and only then rebuilds the graph
and detects cycles and the master.
Also, some old code is no longer called in the monitor cycle. It will be
removed in later commits. Refactored some of the related functions.