The RocksDB TTL database only honours the TTL when the database
is compacted. If the database is not compacted, stale values will
be returned until the end of time.
Here we utilize the knowledge that the timestamp is stored after the
actual value, and use the root database for getting the value,
thereby getting access to the timestamp.
It's still worthwhile using the TTL database as that'll give
us compaction and the removal of stale items.
RocksDB is cloned from GitHub and version v4.9 (latest at the time of
this writing) is checked out.
RocksDB can only be compiled as C++11, which means that RocksDB and hence
storage_rocksdb can be built only if the GCC version is >= 4.7.
The actual storage implementation is quite straightforward.
- The key is a SHA512 hash of the entire query. This will be changed so
  that the database/table name is stored unhashed at the beginning of the
  key, as that will cause cached items from the same table to be stored
  together. The assumption is that if you access something from a particular
  table, chances are you will access something else from it as well.
- When the SO is loaded, the initialization function will create a
subdirectory storage_rocksdb under the MaxScale cache directory.
- For each instance, the RocksDB cache is created under that directory,
  in a subdirectory whose name is the same as the cache filter name in
  the configuration file.
- The storage API's get and put functions are then mapped directly on
  top of RocksDB's equivalent functions, as sketched below.
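In rough terms, assuming RocksDB's TTL database and OpenSSL's one-shot
SHA512 (the function names are illustrative):

    #include <rocksdb/utilities/db_ttl.h>
    #include <openssl/sha.h>
    #include <string>

    // Hash the entire query to form the cache key.
    static std::string cache_key(const std::string& query)
    {
        unsigned char digest[SHA512_DIGEST_LENGTH];
        SHA512(reinterpret_cast<const unsigned char*>(query.data()),
               query.length(), digest);
        return std::string(reinterpret_cast<const char*>(digest), sizeof(digest));
    }

    // put and get map directly onto the RocksDB equivalents.
    rocksdb::Status cache_put(rocksdb::DBWithTTL* db,
                              const std::string& query,
                              const rocksdb::Slice& result)
    {
        return db->Put(rocksdb::WriteOptions(), cache_key(query), result);
    }

    rocksdb::Status cache_get(rocksdb::DBWithTTL* db,
                              const std::string& query,
                              std::string* result)
    {
        return db->Get(rocksdb::ReadOptions(), cache_key(query), result);
    }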
The `detect_stale_slave` functionality used to work only when MaxScale
knew that a master server had existed and that replication was working
at some point in time. This might be a "safe" way to do it with regard
to staleness of the data, but in practice it is preferable to always
allow slaves to be used for reads.
This change adds the missing functionality to the monitor by assigning
slave status to all servers which are configured as replication slaves
when no master can be found.
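A sketch of the idea (the helper names are hypothetical, not the actual
monitor code):

    // When no master is available, every server that is configured as a
    // replication slave still gets the slave status.
    if (root_master == NULL)
    {
        for (MONITOR_SERVERS* ptr = mon->databases; ptr; ptr = ptr->next)
        {
            if (server_is_configured_as_slave(ptr)) // hypothetical check
            {
                monitor_set_pending_status(ptr, SERVER_SLAVE);
            }
        }
    }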
The new member variable that was added to the SERVER should be removed in
2.1 where the server_info offers the same functionality without "polluting"
the SERVER type.
The different server monitoring functions all did similar work and
combining them into one function makes the whole process of monitoring a
server simpler.
If a relay master server is found in the replication tree, it should not
get the master status. Previously all master servers were assigned the
master status regardless of their depth in the replication tree.
By comparing the depth value of each potential master, the monitor can
find the right master at the root of the replication tree.
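In outline (the names are illustrative):

    // Of all master candidates, the one closest to the root of the
    // replication tree, i.e. with the smallest depth, wins.
    MONITOR_SERVERS* root_master = NULL;

    for (MONITOR_SERVERS* ptr = candidates; ptr; ptr = ptr->next)
    {
        if (root_master == NULL || ptr->server->depth < root_master->server->depth)
        {
            root_master = ptr;
        }
    }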
The mysqlmon now supports proper detection of multi-master topologies by
building a directed graph out of the monitored servers. If cycles are found
in this graph, they are assigned a master group ID. All servers with a positive
master group ID will receive the Master status unless they have `@@read_only`
enabled.
This new functionality can be enabled with the `multimaster` boolean
parameter.
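For example (the server list and credentials are illustrative):

    [Multi-Master-Monitor]
    type=monitor
    module=mysqlmon
    servers=server1,server2,server3
    user=maxuser
    passwd=maxpwd
    multimaster=true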
Mysqlmon now stores the values of read_only, slave_sql_running and
slave_io_running, the name and position of the master's binlog, and the
replication configuration status of the slave.
This allows more detailed server information to be displayed with the
`show monitor <name>` diagnostic interface. In addition to this, the new
structure used to store them provides an easy way to store information
that is specific to a monitor and the servers it monitors.
These new status variables can be used to implement better multi-master
detection in mysqlmon by using the value of read_only to resolve
situations where multiple master candidates are available.
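The structure is along these lines (a sketch; the actual member names may
differ):

    #include <stdint.h>
    #include <stdbool.h>

    /* Per-server information collected by mysqlmon. */
    typedef struct mysql_server_info
    {
        bool     read_only;          /* Value of @@read_only */
        bool     slave_configured;   /* Replication has been configured */
        bool     slave_sql_running;  /* Slave SQL thread is running */
        bool     slave_io_running;   /* Slave I/O thread is running */
        char     binlog_name[128];   /* Name of the master's binlog file */
        uint64_t binlog_pos;         /* Position in the master's binlog */
    } MYSQL_SERVER_INFO;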
When a client executes commands which do not return results (for example
inserting BLOB data via the C API), readwritesplit expects a result for
each sent packet. This is somewhat of a false assumption, but it clears
itself out when the session is closed normally. If the session is closed
due to an error, the counter is not decremented.
Each session should only increase the number of active operations on a
server by one. By checking that the session is not already executing an
operation before incrementing the active operation count, the runtime
operation count stays correct.
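Roughly (the names are illustrative, not the actual readwritesplit code):

    // Increment the server's active operation count only if this session
    // is not already waiting for a result from that backend.
    if (!backend_ref_is_waiting_result(bref))
    {
        atomic_add(&bref->server->stats.n_current_ops, 1);
        backend_ref_set_waiting_result(bref);
    }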
It's now possible to use both a Unix domain socket and host/port
when connecting with MaxAdmin to MaxScale.
By default MaxAdmin will attempt to use the default Unix domain
socket, but if a host and/or port has been specified, then an inet
socket will be used.
maxscaled will authenticate the connection attempt differently
depending on whether a Unix domain socket is used or not. If
a Unix domain socket is used, then the Linux user id will be
used for the authorization, otherwise the 1.4.3 username/password
handshake will be performed.
adminusers has now been extended so that there is one set of
functions for local users (connecting locally over a Unix socket)
and one set of functions for remote users (connecting locally
or remotely over an Inet socket).
The local users are stored in the new .../maxscale-users and the
remote users in .../passwd. That is, the old users of a 1.4
installation will work as such in 2.0.
One difference is that there will be *no* default remote user.
That is, remote users will always have to be added manually using
a local user.
The implementation is shared; the local and remote alternatives
use common functions to which the hashtable and filename to be
used are forwarded.
The commands "[add|remove] user" behave now exactly like they did
in 1.4.3, and also all existing users work out of the box.
In addition, there are now the commands "[enable|disable] account",
with which Linux accounts can be enabled for MaxAdmin usage.
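For example (the account name is illustrative):

    # Connect over the default Unix domain socket; the Linux user id is
    # used for authorization, so no password is needed.
    maxadmin list servers

    # Connect over an inet socket using the 1.4.3-style handshake.
    maxadmin -h localhost -P 6603 -u admin -p mariadb list servers

    # Enable a Linux account for MaxAdmin usage.
    maxadmin enable account jdoe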
The change in readwritesplit routing priorities, where hints have the
highest priority, gives users more options to control how readwritesplit
acts.
For example, this allows read-only stored procedures to be routed to
slaves by adding a hint to the query:
    CALL myproc(); -- maxscale route to slave
The readwritesplit documentation also warns the user not to use routing
hints unless they can be absolutely sure that no damage will be done.
The binlogrouter requires that users are not loaded at startup. This
allows it to inject the service user into the list of valid MySQL users so
that the binlogrouter can be controlled via the listeners.
The authenticator modules now load the user data when the new loadusers
entry point is called. This new entry point is optional.
At the moment the code that was in service.c was just moved into the
modules, but the groundwork for allowing different user loading mechanisms
is done.
Further improvements need to be made so that the authenticators behave
more like routers and filters. This work includes the creation of an
AUTHENTICATOR module object, the addition of createInstance entry points
for authenticators, and implementing them for all authenticators.
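One possible shape for such a module object (purely illustrative; not the
actual MaxScale API):

    /* Hypothetical authenticator module object. The loadusers entry
     * point is optional and is called when user data should be loaded. */
    typedef struct gw_authenticator
    {
        int  (*extract)(DCB* dcb, GWBUF* buffer); /* Extract credentials   */
        int  (*authenticate)(DCB* dcb);           /* Carry out the check   */
        void (*free)(DCB* dcb);                   /* Release client data   */
        int  (*loadusers)(SERV_LISTENER* port);   /* Optional user loading */
    } AUTHENTICATOR;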
Local admins are the ones accessing MaxScale on the same host
over a Unix domain socket, and they are strongly identified;
optional remote admins are the ones accessing MaxScale over a
TCP socket (potentially over the network), and they are weakly
identified.
These are completely separate and a different set of functions
will be needed for managing them. This initial change merely
renames the functions.
With this change, if two master servers both have equal depths but
different weights, the one with the higher weight is used. If the depths
and weights are equal, the first master listed in the configuration is
used.
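As a comparison rule (a sketch with illustrative names):

    // Lower depth wins; on equal depth, higher weight wins. On a full
    // tie the comparison returns false, so the earlier candidate from
    // the configuration is kept.
    static bool is_better_master(const SERVER* candidate, const SERVER* current)
    {
        if (candidate->depth != current->depth)
        {
            return candidate->depth < current->depth;
        }

        return candidate->weight > current->weight;
    }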
The cache filter consists of two separate components; the cache
itself that evaluates whether a particular query is subject to
caching and the actual cache storage. The storage is loaded at
runtime by the cache filter, currently using a custom mechanism;
once the new plugin loading macros/mechanism is in place, I'll see
if that can be used.
There are a few open questions/issues.
- Can a GWBUF delivered to the filter contain more than one MySQL
  packet? If yes, then some queueing mechanism needs to be
  introduced. Currently the code is written so that the packets
  are processed in a loop, which will not work.
- Currently, the storage API is synchronous. That may work with a
  storage built upon RocksDB, which writes asynchronously to disk,
  but not with memcached, which can be (and in MaxScale's case
  would have to be) used asynchronously.
Reading may be problematic with RocksDB as values are returned
synchronously. So that will stall the thread being used. However,
as RocksDB uses in-memory caching and it is possible to arrange
so that e.g. selects targeting the same table are stored together,
it is not obvious what the impact would be.
So as not to block the MaxScale worker threads, there'd have to
be a separate thread-pool for RocksDB access and then arrange
the data to be moved across.
But initially, the interface is synchronous.
- How is the cache configured? The original requirement mentions
  all sorts of parameters - database name, table name, column name,
  presence of WHERE clause, regexp, date/time of query, user -
  but it is not altogether clear exactly how they should be specified
  and how they should interact. So initially all selects will be
  subject to caching with a TTL, as in the configuration sketch below.
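An initial configuration could thus be as simple as this (the parameter
names reflect the description above and may change):

    [Cache]
    type=filter
    module=cache
    storage=storage_rocksdb
    ttl=60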
The section name of a filter in the MaxScale configuration
file is unique for each filter. Hence, by providing that name
to the filter instance creation function, it is possible to
act differently depending on which particular instance is
being created.
A case in point:
There can be multiple cache filters defined in the MaxScale
configuration file, each with a different set of rules, ttl
and even backing store.
Each cache will be backed by a separate storage that e.g.
in the case of RocksDB will correspond to a particular path.
In other words, for each cache instance corresponding to a
particular cache definition in the MaxScale configuration file,
we need to be able to create a unique path.
If the filter section name (in the MaxScale configuration file),
which needs to be unique anyway, is provided to the filter, then
that name can be used when forming the unique path.
The alternative is to require the DBA to provide some unique
parameter for each cache definition, which adds configuration
overhead and is error-prone.
Furthermore, by providing the name to filters, error messages
can also be customized for a particular section when appropriate.
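For instance (illustrative section names; the path layout is the one
described earlier):

    # Section name      ->  storage path
    [Cache-Products]    ->  <maxscale cache dir>/storage_rocksdb/Cache-Products
    [Cache-Sessions]    ->  <maxscale cache dir>/storage_rocksdb/Cache-Sessions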
C++11 requires a space between a string literal and a macro.
Does not change the content or layout of GW_MYSQL_VERSION.
RocksDB must be compiled with C++11.
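For instance (illustrative values, not the actual contents of
GW_MYSQL_VERSION):

    #define GW_MYSQL_VERSION "10.0.0"

    // C++03 accepted the missing space:
    //     const char* v = "5.5.5-"GW_MYSQL_VERSION;
    // In C++11 that is parsed as a user-defined literal suffix and fails
    // to compile; with a space, the two adjacent literals concatenate:
    const char* version = "5.5.5-" GW_MYSQL_VERSION;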
Currently, if the address/port information of a maxscaled protocol
listener is not updated to a socket when MaxScale is upgraded to 2.0,
maxscaled will not start, with the effect of the user losing maxadmin
after an upgrade.
After this change, if address/port information is detected, a warning
is logged and the default socket path is used. That way, maxadmin will
still be usable after an upgrade, even if the address/port information
is not updated.
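In configuration terms (the section and service names are illustrative):

    # 1.4-style definition; from now on this logs a warning and the
    # default socket path is used instead:
    [MaxAdmin-Listener]
    type=listener
    service=MaxAdmin-Service
    protocol=maxscaled
    address=localhost
    port=6603

    # 2.0-style definition using the default Unix domain socket:
    [MaxAdmin-Listener]
    type=listener
    service=MaxAdmin-Service
    protocol=maxscaled
    socket=default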
The Aurora monitor inspects the status information in the
`replica_host_status` table in the `information_schema` database. Using
this information the monitor determines which of the nodes is the master
of this Aurora cluster.
This monitor also supports monitor scripts as described in
Monitor-Common.md.
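The check boils down to a query along these lines (a sketch, not
necessarily the exact query used); in Aurora the writer is the node whose
row carries the session_id `MASTER_SESSION_ID`:

    SELECT server_id, session_id
    FROM information_schema.replica_host_status;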