The parameter extraction caused a recursive lock of the server
spinlock. To work around this, an unlocked version of server_get_parameter
is needed.
Ideally, a lock-free setup would be used but due to this being a bug fix,
it will have to be done later on.
StartMonitor() now takes a MXS_MONITOR_INSTANCE and returns
true, if the monitor could be started and false otherwise.
So, the setup is such that in createInstance(), the instance
data is created and then using startMonitor() and stopMonitor()
the monitor is started/stopped. Finally in destroyInstance(),
the actual instance data is deleted.
The following type name changes
MXS_MONITOR_OBJECT -> MXS_MONITOR_API
MXS_SPECIFIC_MONITOR -> MXS_MONITOR_INSTANCE
Further, the 'handle' instance variable of what used to be
called MXS_MONITOR_OBJECT has been renamed to 'api'.
An example, what used to look like
mon->module->stopMonitor(mon->handle);
now looks like
mon->api->stopMonitor(mon->instance);
which makes it more obvious what is going on.
CreateInstance() (renamed from initMonitor()) and destroyInstance()
(renamed from finishMonitor()) have now tentatively been
implemented for all monitors.
Next step is to
1) change the prototype of startMonitor() to
bool (*startMonitor)(MXS_SPECIFIC_MONITOR*,
const MXS_MONITOR_PARAMETER*);
and assume that mon->handle will always contain the
instance,
2) not delete any data in stopMonitor(),
3) add monitorCreateAll() that calls createInstance() for all
monitors (and call that in main()), and
4) add monitorDestroyAll() that calls destroyInstance() for
all monitors (and call that in main()).
Now, all monitor functions but startMonitor takes a
MXS_SPECIFIC_MONITOR instead of MXS_MONITOR. That is, startMonitor
is now like a static factory member returning a new specific
monitor instance and the other functions are like member functions
of that instance.
Instead of using void there's now a MXS_SPECIFIC_MONITOR struct
from which monitor specific types can be derived. This change
does not bring about other benefits than a bit of clarity but
this is the first step in clearing up the monitor API.
The `MYSQL_ROW row` variable was being overwritten by the extra query done
by the SST method detection code. Moving it into its own function prevents
this and makes the code significantly easier to comprehend.
Added a test case that reproduced the problem (MaxScale crashed) and
verifies that the patch fixes the problem.
It is now possible to specify the thread stack size to be used,
when a new thread is created. This will subsequently be used
for allowing the stack size to be specified for worker threads.
All monitors now persist the state of the server in a monitor journal
file.
Moved the removal of stale journals into the core and removed them from
the monitor journal interface.
Enabling the option hinders the use of maintenance mode with the root
master node in most use-cases.
This behavior occurs due to the fact that the maintenance mode causes a
server to be treted as if it was down. The Galera monitor waits for the
cluster to reorganize before assigning a new master node. This is correct
(but very unexpected) behavior for single instance use-cases.
The function should actually be in include/maxscale/protocol/mysql.h
and the implementation in MySQLCommon, but the monitors do not link
to that yet.
All MySQL related should be moved to MySQLCommon and the core
refactored so that no MySQL knowledge is needed there.
If a monitor is started and stopped before the external monitoring thread
has had time to start, a deadlock will occur.
The first thing that the monitoring threads do is read the monitor handle
from the monitor object. This handle is given as the return value of
startMonitor and it is stored in the monitor object. As this can still be
NULL when the monitor thread starts, the threads use locks to prevent
this.
The correct way to prevent this is to pass the handle as the thread
parameter so that no locks are required.
This number (defaults to 1) sets how many times mon_connect_to_db
will try to connect to a backend before returning an error. Every
connection attempt may take backend_connect_timeout seconds to
complete.
Also refactored code a bit. Renamed mon_connect_to_db to
mon_ping_or_connect_to_db, since it does not connect if the connection
is already alive.
When log messages are written with both address and port information, IPv6
addresses can cause confusion if the normal address:port formatting is
used. The RFC 3986 suggests that all IPv6 addresses are expressed as a
bracket enclosed address optionally followed by the port that is separate
from the address by a colon.
In practice, the "all interfaces" address and port number 3306 can be
written in IPv4 numbers-and-dots notation as 0.0.0.0:3306 and in IPv6
notation as [::]:3306. Using the latter format in log messages keeps the
output consistent with all types of addresses.
The details of the standard can be found at the following addresses:
https://www.ietf.org/rfc/rfc3986.txthttps://www.rfc-editor.org/std/std66.txt
The static capabilities declared in getCapabilities allows certain
capabilities to be queried before instances are created. The intended use
of this capability is to remove the need for the `is_internal_service`
function.
Monitored nodes could be part of different cluster UUIDs: select only
the ones belonging to UUID with more joined nodes.
In case of different UUIDs if the joined numbers is less than (n_nodes
/ 2 ) + 1 don’t consider any node part of the cluster
Moved some typedefs to router.h and server.h, changed a few
constants to these enums. Renamed some types in config.h to
remove "Gateway".
There are still some functions in the public header which are
only used in core, but they seem to fit the theme of public functions
so were not moved.
The slave nodes par od the cluster are sorted by wsrep_local_index or
server priority.
The sorted list od nodes is then used in ‘wsrep_sst_donor’ variable.
Note: candidate master servers are always at the end of the list