Added `match` and `exclude` functionality. This allows versatile filtering
without a large investment of development time by leveraging the benefits
of PCRE2 regular expressions.
Also cleaned up the filter and removed the single-table matching and
the active parameter, both of which were obsoleted by the regular
expression parameters.
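As an illustration, a filter section using the new parameters might
look as follows; the section name, module and patterns are placeholders
and the exact matching semantics depend on the filter, only `match` and
`exclude` come from this change:

```ini
[MyFilter]
type=filter
module=somefilter
# PCRE2 patterns; hypothetical semantics: statements matching `match`
# are processed unless they also match `exclude`.
match=SELECT.*FROM.*accounts
exclude=WHERE.*id=42
```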
See the script directory for the method. The script to run in the
top-level MaxScale directory is called maxscale-uncrustify.sh, which
uses another script, list-src, from the same directory (so you need to
set your PATH). The uncrustify version used was 0.66.
qc_thread_init() must now be called explicitly in every thread, not
just in threads other than the one where qc_process_init() is called.
This change was caused by the QC_INIT_SELF initialization actually
being performed in query_classifier.cc. Before this change there
was a leak in the routing worker running in the main thread: the
query classification cache was created twice.
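A minimal sketch of the call pattern now required; QC_INIT_SELF appears
above, while the exact signatures and the end counterparts are
assumptions:

```cpp
#include <maxscale/query_classifier.h>

int main()
{
    qc_process_init(QC_INIT_SELF); // once per process

    // Now required in *every* thread, including the one where
    // qc_process_init() was called.
    qc_thread_init(QC_INIT_SELF);

    // ... run the worker ...

    qc_thread_end(QC_INIT_SELF);
    qc_process_end(QC_INIT_SELF);
    return 0;
}
```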
In principle it would be better if the qc information were
obtained via a specific query_classifier resource. However,
there are multiple problems with that (e.g. the qc has no way
of safely accessing information of another thread) and hence
the worker-specific qc cache statistics are reported as part of
the worker statistics.
The cache now enforces the defined maximum size by evicting entries
when the insertion of a new entry would cause the maximum size to be
exceeded. Currently the eviction algorithm simply removes a random
element.
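A minimal sketch of that policy, assuming the entries live in a
std::unordered_map keyed by the canonical statement and that the limit
is expressed as an entry count (the names are hypothetical):

```cpp
#include <cstdlib>
#include <iterator>
#include <string>
#include <unordered_map>

struct QC_STMT_INFO; // opaque classification result

using InfoMap = std::unordered_map<std::string, QC_STMT_INFO*>;

// Make room for one more entry by evicting random elements.
void make_room(InfoMap& infos, size_t max_size)
{
    while (!infos.empty() && infos.size() >= max_size)
    {
        auto it = infos.begin();
        std::advance(it, std::rand() % infos.size());
        infos.erase(it);
    }
}
```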
Now takes a structure that, if present, enables the query
classification caching and specifies the properties of the
cache.
For the time being, no actual properties are available.
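A sketch of the idea; the structure name and the qc_setup signature
are assumptions modelled on this description:

```cpp
// Hypothetical names; a non-null pointer enables the caching.
struct QC_CACHE_PROPERTIES
{
    // No actual properties yet; e.g. a maximum size is expected later.
};

bool qc_setup(const QC_CACHE_PROPERTIES* cache_properties,
              const char* plugin_name, const char* plugin_args);

void example()
{
    QC_CACHE_PROPERTIES properties;
    qc_setup(&properties, "qc_sqlite", nullptr); // caching enabled
    qc_setup(nullptr, "qc_sqlite", nullptr);     // caching disabled
}
```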
The mapping from a canonical statement to the query classification
result is maintained by the class QCInfoCache, of which there exists
one instance per thread. That way no locking is needed, but the
information will be cached multiple times (a smaller price to pay).
Currently the information is stored in a regular std::unordered_map,
which means that the consumed amount of memory will just keep on
growing unless the number of canonical statements used by clients
happens to have an upper bound.
The LRU cache used in the cache filter (which provides a means of
bounding both the amount of memory used and the number of items) will
be generalized and taken into use here as well.
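A minimal sketch of the arrangement; QCInfoCache and std::unordered_map
are named above, while the member names are assumptions:

```cpp
#include <string>
#include <unordered_map>

struct QC_STMT_INFO; // classification result, owned by the plugin

// One instance per thread, so no locking is needed.
class QCInfoCache
{
public:
    QC_STMT_INFO* peek(const std::string& canonical) const
    {
        auto it = m_infos.find(canonical);
        return it != m_infos.end() ? it->second : nullptr;
    }

    void insert(const std::string& canonical, QC_STMT_INFO* info)
    {
        m_infos.emplace(canonical, info);
    }

private:
    std::unordered_map<std::string, QC_STMT_INFO*> m_infos;
};

thread_local QCInfoCache this_thread_cache;
```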
The key is now the canonical statement itself, which means that
a fair amount of memory will be used. To conserve memory it might
make sense to use a hashed value instead, although that, at least
in principle, opens up the possibility of unintended collisions.
This feature will also be made configurable.
The query classifier stores information about the statement carried
by a GWBUF in the GWBUF itself. We need to be able to store that
object beyond the lifetime of the GWBUF, so we require that a
query classifier is capable of duplicating references to that object.
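A sketch of the required capability in terms of the plugin API; the
member names are assumptions:

```cpp
struct QC_STMT_INFO; // the classification result stored in a GWBUF

// Hypothetical API members: duplicating and releasing references
// allows the result to outlive the GWBUF it was created for.
struct QUERY_CLASSIFIER
{
    // ...
    void (*qc_info_dup)(QC_STMT_INFO* info);   // take another reference
    void (*qc_info_close)(QC_STMT_INFO* info); // release a reference
};
```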
The two operations return different types of results and need to be
treated differently in order for them to be handled correctly in 2.2.
This fixes the unexpected internal state errors that happened in all 2.2
versions due to a wrong assumption made by readwritesplit. This fix is not
necessary for newer versions as the LOAD DATA LOCAL INFILE processing is
done with a simpler, and more robust, method.
If the API versions do not match, MaxScale will treat this as an
error. The API versioning would allow backwards-compatible changes,
but the functionality to handle that is not implemented in MaxScale.
Updated API versions based on changes done to module APIs in 2.2.
Basically it would be trivial to report far more operations
explicitly, but for the fact that the values in qc_query_op_t
currently, quite unnecessarily, form a bitmask.
In 2.2 that is no longer the case, so other operations will be
added there.
Changed the query operation enum to use implicit enum values instead
of explicitly provided ones. The operation was never used as a
bitmask, so it is pointless to declare the values as such.
Added the EXECUTE type to the enum and used it in qc_sqlite and
qc_mysqlembedded.
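A sketch of the resulting enum; apart from QUERY_OP_EXECUTE, the
member names shown are illustrative:

```cpp
// Previously the values were explicit and, quite unnecessarily, formed
// a bitmask (QUERY_OP_SELECT = 1 << 0, QUERY_OP_UPDATE = 1 << 1, ...).
// With implicit values, adding a new operation is trivial.
enum qc_query_op_t
{
    QUERY_OP_UNDEFINED = 0,
    QUERY_OP_SELECT,
    QUERY_OP_UPDATE,
    // ... further operations ...
    QUERY_OP_EXECUTE // added in this change
};
```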
By default, only the essentials - the type and the operation - of
a statement will be collected; fields, tables, functions and
databases will be collected only if they are explicitly asked for.
However, a statement will be parsed at most twice; if parsing is
needed a second time, then all information will be collected.
If it is known that some particular information is needed,
qc_parse() can be called explicitly to ensure it is collected
during the first parsing.
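For instance, a caller that knows it will need field and table
information could parse up front; the QC_COLLECT_* flag names are
assumptions modelled on this description:

```cpp
#include <maxscale/query_classifier.h>

void example(GWBUF* stmt)
{
    // Ask for fields and tables already at the first parsing.
    qc_parse_result_t result =
        qc_parse(stmt, QC_COLLECT_FIELDS | QC_COLLECT_TABLES);

    if (result == QC_QUERY_PARSED)
    {
        // Field and table information is available without a re-parse.
    }
}
```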
It is now possible to specify what information the caller is
interested in. With this, the cost of collecting information during
query parsing that nobody is interested in can be avoided.
Transaction boundaries can now be detected using regexes.
All else being equal, it gives a 10% performance improvement
compared to qc-based detection.
In a subsequent change, mysql_client.c will be modified to use
qc_get_trx_type_mask() instead of qc_get_type_mask().
Currently the use of regex matching is turned on using an
environment variable. That will change.
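A sketch of the intended usage; QUERY_TYPE_BEGIN_TRX is an existing
type bit, the surrounding code is illustrative:

```cpp
#include <stdint.h>
#include <maxscale/query_classifier.h>

void example(GWBUF* stmt)
{
    // Only the transaction-boundary bits, detected with regexes
    // instead of full classification.
    uint32_t type_mask = qc_get_trx_type_mask(stmt);

    if (type_mask & QUERY_TYPE_BEGIN_TRX)
    {
        // A transaction starts.
    }
}
```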
The process and thread initialization/finalization of the query
classifier plugins is handled using the process and thread
initialization/finalization functions in the module object.
However, the top-level query classifier will also need to perform
process and thread initialization when transaction boundaries are
detected using regular expressions.
Since the whole preparable statement will be available, it is
superfluous to provide a function with which the operation of
the preparable statement can be obtained.
This function will return the preparable statement of a PREPARE
statement as a COM_QUERY GWBUF. That is, once obtained, the normal
query classifier functions can be used for obtaining information
about the preparable statement itself.
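Assuming the function is called qc_get_preparable_stmt (the name is an
assumption, the behaviour is as described above):

```cpp
#include <maxscale/query_classifier.h>

void example(GWBUF* stmt)
{
    // For "PREPARE ps FROM 'SELECT ...'", this returns the inner
    // SELECT as a COM_QUERY GWBUF; NULL if stmt is not a PREPARE
    // statement.
    GWBUF* preparable = qc_get_preparable_stmt(stmt);

    if (preparable)
    {
        // The normal query classifier functions now apply.
        qc_query_op_t op = qc_get_operation(preparable);
    }
}
```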
- Only fixed-size types are used in the API.
- The actual function return value specifies whether the parsing
  process succeeded, while "logical" return values are returned
  as out arguments.
The wrapper function currently ignores the function return value.
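A sketch of a function following this convention; the name and the
result constant are assumptions:

```cpp
#include <stdint.h>

struct GWBUF;

// The return value reports whether parsing succeeded (e.g.
// QC_RESULT_OK), while the logical result, the type mask, is
// returned through the out argument.
int32_t qc_plugin_get_type_mask(GWBUF* stmt, uint32_t* type_mask);
```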
The setting up and the initialization of the query classifier has
now been separated. The gateway explicitly sets up the query
classifier (i.e. chooses which one to use and what arguments to
provide), but the actual initialization is performed as part of
the general module initialization.
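In outline, the two phases now look like this; the signatures are
simplified sketches:

```cpp
void example()
{
    // 1. The gateway explicitly chooses which classifier to use and
    //    what arguments to provide; nothing is initialized yet.
    qc_setup("qc_sqlite", nullptr);

    // 2. The actual initialization is performed later, as part of the
    //    general module initialization.
    qc_process_init();
}
```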
Together with the field names, qc_get_field_info now also returns
field usage information, that is, in what context a field is used.
This allows, for instance, the cache to take action if a particular
field is selected (SELECT a FROM ...), but not if it is used in a
GROUP BY clause (...GROUP BY a).
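A sketch of how a caller might inspect the usage information; the
QC_USED_IN_* flag names are assumptions modelled on the example above:

```cpp
#include <maxscale/query_classifier.h>

void example(GWBUF* stmt)
{
    const QC_FIELD_INFO* infos;
    size_t n_infos;
    qc_get_field_info(stmt, &infos, &n_infos);

    for (size_t i = 0; i < n_infos; ++i)
    {
        if (infos[i].usage & QC_USED_IN_SELECT)
        {
            // The field is selected: SELECT a FROM ...
        }
        if (infos[i].usage & QC_USED_IN_GROUP_BY)
        {
            // The field appears in ... GROUP BY a
        }
    }
}
```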
This caused significant modifications to qc_mysqlembedded, which
earlier did not walk the parse tree but instead looped over a list
of st_select_lex instances that, the name notwithstanding, also
contain information about other things than SELECTs. The former
approach lost all contextual information, so it was not possible
to know where a particular field was used.
Now the parse tree is walked, which means that the contextual
information is known and thus the field usage can be updated.
The function is no longer used and was quite suboptimal, so it has
now been removed.
qc_get_prepare_name, qc_get_prepare_operation and qc_get_field_info,
which were missing from qc_dummy, were added at the same time.
This function returns more detailed information about the fields
of a statement. It supersedes qc_get_affected_fields(), which will
be deprecated and removed.
Note that this function introduces a new kind of behaviour: the
returned data belongs to the GWBUF and remains valid for as long as
the GWBUF is alive. That means that unnecessary copying can be
avoided.
The operation of the statement to be prepared is no longer
reported as the operation of the PREPARE statement.
Instead, when the type of the statement is
QUERY_TYPE_PREPARE_NAMED_STMT, the operation can be obtained
using qc_get_prepare_operation().
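A sketch of the resulting usage; QUERY_TYPE_PREPARE_NAMED_STMT and
qc_get_prepare_operation() are named above, the surrounding code is
illustrative:

```cpp
#include <maxscale/query_classifier.h>

void example(GWBUF* stmt)
{
    if (qc_get_type_mask(stmt) & QUERY_TYPE_PREPARE_NAMED_STMT)
    {
        // The operation of the statement being prepared; no longer
        // reported as the operation of the PREPARE statement itself.
        qc_query_op_t op = qc_get_prepare_operation(stmt);
    }
}
```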
The qc_mysqlembedded implementation will be provided in a
subsequent commit.