Added a find_package module for CMake that locates the Jansson
libraries and headers. This will make dependency checking easier and
prevent build failures due to missing dependencies.
The general purpose stuff in skygw_utils.h was moved to utils.h
and the corresponding implementation from skygw_utils.cc to utils.c.
Includes updated accordingly.
Skygw_utils.h is now only used by log_manager and by mlist, which
is only used by log_manager. Consequently, skygw_utils.h was moved
to server/maxscale.
Utils.h needs a separate overhaul.
- STRERROR_BUFLEN moved to cdefs.h and renamed to MXS_STRERROR_BUFLEN.
Better would be to provide a 'const char* mxs_strerror(int errno)'
that would have a thread-specific buffer for the error message
(sketched after this list).
- MIN and MAX also moved to cdefs.h as MXS_MIN and MXS_MAX.
- Now only mlist.h of the headers depends upon skygw_utils.h.
- All headers now include maxscale/cdefs.h as the very first thing.
- MXS_[BEGIN|END]_DECLS added to all C-headers.
Strictly speaking not necessary for private headers, but
does not hurt either.
- Include guards moved to the very top of the file.
- #pragma once added.
- Headers now to be included as <maxscale/xyz.h>
- First step, no cleanup of headers has been made. Only moving
from one place to another + necessary modifications.
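Taken together, a hypothetical header maxscale/example.h would, after
the move, look roughly as below. The sketch also shows the suggested
mxs_strerror(); its implementation is an assumption (C++11
thread_local and the GNU strerror_r(), the default with g++ on
Linux), not existing code.

    /* maxscale/example.h - hypothetical header following the conventions above */
    #pragma once
    #ifndef MAXSCALE_EXAMPLE_H
    #define MAXSCALE_EXAMPLE_H

    #include <maxscale/cdefs.h>  /* always the very first include */

    MXS_BEGIN_DECLS

    /* Thread-safe strerror(); the returned pointer refers to a buffer
     * of MXS_STRERROR_BUFLEN bytes private to the calling thread. */
    const char* mxs_strerror(int error);

    MXS_END_DECLS

    #endif

    /* A possible implementation: */
    #include <string.h>

    const char* mxs_strerror(int error)
    {
        static thread_local char buffer[MXS_STRERROR_BUFLEN];
        return strerror_r(error, buffer, sizeof(buffer));
    }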
The storage_rocksdb version is now stored to the database. That will
ensure that should the format (key content, length, etc.) change, we
will detect whether a database is too old and take action.
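A minimal sketch of such a version check, assuming RocksDB's C++ API
and a reserved key under which the version is stored; the key name
and the helper are illustrative:

    #include <cstdint>
    #include <cstring>
    #include <string>
    #include <rocksdb/db.h>

    static const uint32_t STORAGE_ROCKSDB_VERSION = 1;  // bump on format changes
    static const std::string VERSION_KEY = "storage_rocksdb_version";

    // Returns true if the database may be used; stamps new databases.
    bool check_version(rocksdb::DB* pDb, bool new_db)
    {
        std::string value;

        if (new_db)
        {
            // Fresh database: stamp it with the current version.
            value.assign(reinterpret_cast<const char*>(&STORAGE_ROCKSDB_VERSION),
                         sizeof(STORAGE_ROCKSDB_VERSION));
            return pDb->Put(rocksdb::WriteOptions(), VERSION_KEY, value).ok();
        }

        // Existing database: the stored version must match ours.
        rocksdb::Status status = pDb->Get(rocksdb::ReadOptions(), VERSION_KEY, &value);

        uint32_t version = 0;
        if (status.ok() && (value.size() == sizeof(version)))
        {
            memcpy(&version, value.data(), sizeof(version));
        }

        return version == STORAGE_ROCKSDB_VERSION;  // otherwise: too old, take action
    }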
The databases targeted by a query are now included in the key.
That way, two identical queries targeting a different default
database will not clash.
Further, it will mean that queries targeting the same databases
are stored near each other, which is good for performance.
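A sketch of how such a key could be formed, assuming OpenSSL's
SHA512() and some helper that yields the databases a statement
targets; keeping the (sorted) database names unhashed at the front of
the key is what makes entries for the same databases end up near each
other:

    #include <algorithm>
    #include <string>
    #include <vector>
    #include <openssl/sha.h>

    std::string cache_key(std::vector<std::string> databases,
                          const std::string& query)
    {
        std::sort(databases.begin(), databases.end());

        std::string key;
        for (const auto& db : databases)
        {
            key += db;    // unhashed prefix keeps related entries together
            key += '\0';
        }

        // The query itself is hashed.
        unsigned char digest[SHA512_DIGEST_LENGTH];
        SHA512(reinterpret_cast<const unsigned char*>(query.data()),
               query.size(), digest);
        key.append(reinterpret_cast<const char*>(digest), sizeof(digest));

        return key;
    }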
The concept of 'allowed_references' was removed from the
documentation and the code. Now that COM_INIT_DB is tracked,
we will always know what the default database is and hence
we can create a cache key that distinguishes between identical
queries targeting different default databases (that is not
implemented yet in this change).
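A minimal sketch of the COM_INIT_DB tracking itself, assuming the
standard MySQL packet layout (3-byte payload length, 1-byte sequence
number, command byte first in the payload); the function is
illustrative:

    #include <cstddef>
    #include <cstdint>
    #include <string>

    const uint8_t COM_INIT_DB = 0x02;

    // Inspect a client packet and update the session's default database.
    void track_default_db(const uint8_t* pPacket, size_t len,
                          std::string* pDefault_db)
    {
        if ((len >= 5) && (pPacket[4] == COM_INIT_DB))
        {
            uint32_t payload_len =
                pPacket[0] | (pPacket[1] << 8) | (pPacket[2] << 16);

            if (4 + payload_len <= len)  // whole packet present
            {
                // The payload is the command byte followed by the name.
                pDefault_db->assign(reinterpret_cast<const char*>(pPacket + 5),
                                    payload_len - 1);
            }
        }
    }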
The rules for the cache are expressed using a JSON object.
There are two decisions to be made: when to store data to the
cache and when to use data from the cache. The latter is
obviously dependent on the former.
In this change, the 'store' handling is implemented; 'use'
handling will be in a subsequent change.
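As a sketch, the rules file could be loaded with Jansson (cf. the
find_package module mentioned earlier) roughly as follows; a
top-level object with a "store" array is an assumption about the
layout:

    #include <jansson.h>

    // Load the rules file and pick out the 'store' rules; 'use' rules
    // will be handled similarly in a later change.
    bool load_rules(const char* zPath)
    {
        json_error_t error;
        json_t* pRoot = json_load_file(zPath, 0, &error);

        if (!pRoot)
        {
            // error.text and error.line tell what and where.
            return false;
        }

        json_t* pStore = json_object_get(pRoot, "store");
        bool rv = pStore && json_is_array(pStore);

        // ... compile each rule in the array ...

        json_decref(pRoot);
        return rv;
    }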
With this change, the cache will be aware of which default database
is being used. That will remove the need for the cache parameter
'allowed_references' and thus make the cache easier to configure
and manage.
The return values of pcre2_match are now properly handled. A positive
match is a return value which is greater than or equal to zero. This fix
should give a small performance boost too, as memory is no longer
needlessly allocated.
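In code, the corrected handling looks roughly like this (a sketch;
re is assumed to be a compiled pattern):

    #define PCRE2_CODE_UNIT_WIDTH 8
    #include <pcre2.h>

    // Returns true if re matches; rc >= 0 now counts as a match
    // (0 only means the ovector was too small for all captures).
    bool matches(pcre2_code* re, const char* zSubject, size_t len)
    {
        pcre2_match_data* pData = pcre2_match_data_create_from_pattern(re, NULL);
        int rc = pcre2_match(re, reinterpret_cast<PCRE2_SPTR>(zSubject), len,
                             0, 0, pData, NULL);
        pcre2_match_data_free(pData);

        if (rc >= 0)
        {
            return true;
        }
        else if (rc != PCRE2_ERROR_NOMATCH)
        {
            // A genuine error; should be logged.
        }

        return false;
    }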
When a query has been sent to a backend, the response is now
processed to the extent that the cache is capable of figuring
out how many rows are being returned, so that the cache setting
`max_resultset_rows` can be processed.
The code is now also written in such a manner that it should be
insensitive to how a packet has been split up into a chain of
GWBUFs.
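A sketch of what boundary-insensitive parsing amounts to; the
protocol state lives outside the loop, each GWBUF is fed as one
contiguous chunk, and the mapping from packets to rows (skipping the
column definitions, stopping at EOF/ERR) is left out:

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>

    // Parser state kept across buffers, so a packet header that
    // straddles a GWBUF boundary is handled correctly.
    struct PacketState
    {
        uint8_t  header[3];        // partially accumulated length field
        uint32_t header_bytes = 0;
        uint32_t payload_left = 0; // sequence byte + payload still expected
        uint64_t packets = 0;
    };

    // Feed one contiguous chunk (e.g. the contents of one GWBUF).
    void feed(PacketState& s, const uint8_t* data, size_t len)
    {
        while (len > 0)
        {
            if (s.payload_left > 0)
            {
                size_t n = std::min<size_t>(len, s.payload_left);
                s.payload_left -= static_cast<uint32_t>(n);
                data += n;
                len -= n;
            }
            else
            {
                s.header[s.header_bytes++] = *data++;
                --len;

                if (s.header_bytes == 3)
                {
                    uint32_t payload =
                        s.header[0] | (s.header[1] << 8) | (s.header[2] << 16);
                    s.payload_left = payload + 1;  // + the sequence byte
                    s.header_bytes = 0;
                    ++s.packets;
                }
            }
        }
    }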
The RocksDB TTL database only honours the TTL when the database
is compacted. If the database is not compacted, stale values will
be returned until the end of time.
Here we utilize the knowledge that the timestamp is stored after the
actual value and use the root database for getting the value,
thereby getting access to the timestamp.
It's still worthwhile using the TTL database as that'll give
us compaction and the removal of stale items.
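A sketch of such a lookup, relying on DBWithTTL appending a 4-byte
timestamp to each value and on StackableDB's GetBaseDB() accessor:

    #include <cstring>
    #include <ctime>
    #include <string>
    #include <rocksdb/utilities/db_ttl.h>

    // Fetch a value, rejecting it if it is older than ttl seconds,
    // whether or not compaction has run yet.
    bool get_fresh(rocksdb::DBWithTTL* pDb, const rocksdb::Slice& key,
                   int32_t ttl, std::string* pValue)
    {
        // Go through the root database so the timestamp suffix is kept.
        rocksdb::Status status =
            pDb->GetBaseDB()->Get(rocksdb::ReadOptions(), key, pValue);

        if (!status.ok() || (pValue->size() < sizeof(int32_t)))
        {
            return false;
        }

        int32_t timestamp;
        memcpy(&timestamp, pValue->data() + pValue->size() - sizeof(timestamp),
               sizeof(timestamp));
        pValue->resize(pValue->size() - sizeof(timestamp));  // strip the suffix

        return (time(nullptr) - timestamp) <= ttl;  // false => stale
    }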
RocksDB is cloned from GitHub and version v4.9 (latest at the time of
this writing) is checked out.
RocksDB can only be compiled as C++11, which means that RocksDB and hence
storage_rocksdb can be built only if the GCC version is >= 4.7.
The actual storage implementation is quite straightforward.
- The key is a SHA512 of the entire query. That will be changed so that
the database/table name is stored in the beginning of the key unhashed
as that will cause cached items from the same table to be stored
together. Assumption is that if you access something from a particular
table, chances are you will access something else as well.
- When the SO is loaded, the initialization function will create a
subdirectory storage_rocksdb under the MaxScale cache directory.
- For each instance, the RocksDB cache is created under that
directory, in a subdirectory whose name is the same as the cache
filter name in the configuration file.
- The storage API's get and put functions are then mapped directly on
top of RocksDB's equivalent functions (see the sketch after this list).
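A sketch of that mapping; the result codes and signatures are
illustrative, not the actual storage API:

    #include <string>
    #include <rocksdb/utilities/db_ttl.h>

    enum cache_result_t  // illustrative result codes
    {
        CACHE_RESULT_OK,
        CACHE_RESULT_NOT_FOUND,
        CACHE_RESULT_ERROR
    };

    cache_result_t storage_get(rocksdb::DBWithTTL* pDb,
                               const std::string& key, std::string* pValue)
    {
        rocksdb::Status status = pDb->Get(rocksdb::ReadOptions(), key, pValue);

        return status.ok() ? CACHE_RESULT_OK
                           : (status.IsNotFound() ? CACHE_RESULT_NOT_FOUND
                                                  : CACHE_RESULT_ERROR);
    }

    cache_result_t storage_put(rocksdb::DBWithTTL* pDb,
                               const std::string& key, const std::string& value)
    {
        // The TTL database appends the timestamp used for expiration.
        rocksdb::Status status = pDb->Put(rocksdb::WriteOptions(), key, value);

        return status.ok() ? CACHE_RESULT_OK : CACHE_RESULT_ERROR;
    }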
The cache filter consists of two separate components: the cache
itself, which evaluates whether a particular query is subject to
caching, and the actual cache storage. The storage is loaded at
runtime by the cache filter, currently using a custom mechanism;
once the new plugin loading macros/mechanism is in place, I'll see
whether that can be used instead.
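The custom mechanism boils down to a dlopen() lookup of a fixed entry
point. A sketch, with the symbol name and the API struct assumed:

    #include <dlfcn.h>

    struct CACHE_STORAGE_API;  // the struct of storage functions

    typedef CACHE_STORAGE_API* (*CacheGetStorageAPIFN)();

    // Load e.g. "libstorage_rocksdb.so" and fetch its storage API.
    CACHE_STORAGE_API* load_storage_module(const char* zSo_path)
    {
        void* pHandle = dlopen(zSo_path, RTLD_NOW | RTLD_LOCAL);

        if (!pHandle)
        {
            return nullptr;  // dlerror() tells why
        }

        void* pSym = dlsym(pHandle, "CacheGetStorageAPI");

        return pSym ? reinterpret_cast<CacheGetStorageAPIFN>(pSym)() : nullptr;
    }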
There are a few open questions/issues.
- Can a GWBUF delivered to the filter contain more MySQL packets
than one? If yes, then some queueing mechanism needs to be
introduced. Currently the code is written so that the packets
are processed in a loop, which will not work.
- Currently, the storage API is synchronous. That may work with a
storage built upon RocksDB, which writes asynchronously to disk,
but not with memcached, which can be (and in MaxScale's case
would have to be) used asynchronously.
Reading may be problematic with RocksDB as values are returned
synchronously. So that will stall the thread being used. However,
as RocksDB uses in-memory caching and it is possible to arrange
so that e.g. selects targeting the same table are stored together,
it is not obvious what the impact would be.
So as not to block the MaxScale worker threads, there'd have to
be a separate thread-pool for RocksDB access and then arrange
the data to be moved across.
But initially, the interface is synchronous.
- How is the cache configured? The original requirement mentions
all sorts of parameters - database name, table name, column name,
presence of WHERE clause, regexp, date/time of query, user -
but it's not altogether clear exactly how they should be specified
and how they should interact. So initially all selects will
be subject to caching with a TTL.
The section name of a filter in the MaxScale configuration
file is unique for each filter. Hence, by providing that name
to the filter instance creation function, it is possible to
act differently depending on which particular instance is
being created.
Case in point:
There can be multiple cache filters defined in the MaxScale
configuration file, each with a different set of rules, ttl
and even backing store.
Each cache will be backed by a separate storage that e.g.
in the case of RocksDB will correspond to a particular path.
In other words, for each cache instance corresponding to a
particular cache definition in the MaxScale configuration file,
we need to be able to create a unique path.
If the filter section name (in the MaxScale configuration file),
which in any case needs to be unique, is provided to the filter,
then that name can be used when forming the unique path.
The alternative is to require the DBA to provide some unique
parameter for each cache definition, which adds configuration
overhead and is error-prone.
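As a sketch, forming the unique path then amounts to little more
than the following (directory layout as in the storage_rocksdb
description above):

    #include <string>

    // E.g. with cache_dir "/var/cache/maxscale" and the filter section
    // name "Cache1", the instance gets
    // "/var/cache/maxscale/storage_rocksdb/Cache1".
    std::string storage_path(const std::string& cache_dir,
                             const std::string& filter_name)
    {
        return cache_dir + "/storage_rocksdb/" + filter_name;
    }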
Furthermore, by providing the name to filters, error messages
can also be customized for the particular section when
appropriate.