50 Commits

Author SHA1 Message Date
Johan Wikman
ff33453e1a MXS-2463 Prepare for another set of queries
Persisted information about dynamic nodes must be used only if
the bootrap information has not been changed, as otherwise we risk
using information that is not valid.
2019-05-07 10:36:21 +03:00
Johan Wikman
6f607e13de MXS-2446 Do not assume created server exists
"Once you eliminate the impossible, whatever remains, no matter
 how improbable, must be the truth." Arthur Conan Doyle

Since server objects are never destroyed, currently the only
explanation for the crash described in MXS-2446 is that a server
created at runtime could not, immediately after the creation, be
found using its name.
2019-05-06 12:13:29 +03:00
Johan Wikman
86b099b487 MXS-2468 When the URLs change, HTTP GET must be cancelled
If the nodes change while a multi HTTP GET is in process, the
corresponding delayed called must be cancelled. Otherwise we
eventually would end up attempting to update the state of the
nodes using the wrong result.
2019-05-03 15:00:00 +03:00
Esa Korhonen
75c0ac5323 Move items from MonitorWorker to MonitorWorkerSimple
MonitorWorker only enforces the use of a worker thread but otherwise
does not define how the monitor is implemented.
2019-04-24 11:27:11 +03:00
Johan Wikman
8b29e70f63 MXS-2428 Allow fixed Clustrix configuration
If 'dynamic_node_detection' has been set to false, then the
Clustrix monitor will not dynamically figure out what nodes are
available, but instead use the bootstrap nodes as such.

With 'dynamic_node_detection' being false, the Clustrix monitor
will do no cluster checks, but simply ping the health port of
each server.
2019-04-16 13:58:27 +03:00
Johan Wikman
e09a6c8100 MXS-2428 Add 'dynamic_node_detection' 'health_check_port'
'dynamic_node_detection' specifies whether the Clustrix monitor
should dynamically figure out what nodes there are, or just rely
upon static information.

'health_check_port' specifies the port to be used when perforing
the health check ping.
2019-04-16 13:58:27 +03:00
Johan Wikman
893059c537 MXS-2424 Use persisted nodes if bootstrap node missing
At runtime the Clustrix monitor will save to an sqlite3
database information about detected nodes and delete that
information if a node disappears.

At startup, if the monitor fails to connect to a bootstrap
node, it will try to connect any of the persisted nodes and
start from there.

This means that in general it is sufficient if the Clustrix
monitor at the very first startup can connect to a bootstrap
node; thereafter it will get by even if the bootstrap node
would disappear for good.
2019-04-12 16:29:21 +03:00
Johan Wikman
c422aafe1d MXS-2424 Refactor for further changes
In subsequent change(s) persisted node information will be used
as a last resort to connect to a Clustrix node.
2019-04-12 16:29:21 +03:00
Johan Wikman
875146f53c MXS-2424 Store information about dynamic Clustrix nodes
Information about the detected Clustrix nodes is now stored to
a Clustrix monitor specific sqlite-database. This will be used
for bootstrapping the Clustrix monitor, in case a statically
defined bootstrap server is unavailable.
2019-04-12 16:29:21 +03:00
Esa Korhonen
d89f0c062b MXS-2271 Change Monitor->m_name to std::string
Also, monitor address is no longer printed.
2019-04-02 13:08:38 +03:00
Johan Wikman
84bf241dd1 MXS-2339 A running Clustrix node is regarded as master
In this context master should be interpreted as "can be read
and written to".

Marking them as master requires less changes in RWS to make it
usable with a Clustrix cluster.
2019-04-02 08:13:50 +03:00
Markus Mäkelä
858327acf7
Rename Being Drained to Draining
With this, the words are unique and can be searched for more easily. This
does not fix the test failure of mxs2273_being_drained.
2019-03-28 13:21:24 +02:00
Johan Wikman
bf5f80b13b Fix ClustrixMonitor
The cluster check can only be made after the monitor has been
started. If done when monitor is configured it will at startup
be done when services are not yet available and hence they will
not be populated with the dynamically discovered servers.
2019-03-25 13:56:39 +02:00
Markus Mäkelä
203bba0e1d
Add support for multiple runtime error messages
Storing all the runtime errors makes it possible to return all of them
them via the REST API. MaxAdmin will still only show the latest error but
MaxCtrl will now show all errors if more than one error occurs.
2019-03-21 18:19:10 +02:00
Esa Korhonen
6b14479b6c MXS-2271 Rename MXS_MONITORED_SERVER to MonitorServer 2019-03-19 13:32:38 +02:00
Esa Korhonen
14b4fa632a MXS-2271 Move Monitor inside maxscale-namespace
Rearranged monitor.cc by namespace.
2019-03-15 12:57:35 +02:00
Esa Korhonen
a8949b2560 MXS-2271 Move free monitor functions into classes
Functions are divided to MonitorManager, Monitor, or the monitored
server.
2019-03-12 10:29:55 +02:00
Esa Korhonen
1858fe9127 MXS-2271 Monitor modifications always go through Monitor::configure()
Previously, runtime monitor modifications could directly alter monitor fields,
which could leave the text-form parameters and reality out-of-sync. Also,
the configure-function was not called for the entire monitor-object, only the
module-implementation.

Now, all modifications go through the overridden configure-function, which calls the
base-class function. As most configuration changes are given in text-form, this
removes the need for specific setters. The only exceptions are the server add/remove
operations, which must modify the text-form serverlist.
2019-03-12 10:19:45 +02:00
Johan Wikman
5c34550b40 MXS-2330 Do not use softfailed node as hub
When a softfailed node is finally revoked, it will appear as the
single node in a functioning Clustrix cluster. To ensure that the
Clustrix monitor will not stick to that node, if the node that is
used as hub is softfailed, it is immediately replaced with another
node.
2019-02-15 08:09:03 +02:00
Johan Wikman
7a99b5d253 MXS-2314 Define monitor state in terms of worker state
Worker::STOPPED    -> MONITOR_STATE_STOPPED
Worker::POLLING    -> MONITOR_STATE_RUNNING
Worker::PROCESSING -> MONITOR_STATE_RUNNING

By defining the monitor state from the worker state there is
no risk they will ever get out of sync. And there is one thing
less to maintain.
2019-02-11 13:03:18 +02:00
Johan Wikman
cac1d76e48 MXS-2314 Monitor decides whether servers are added to services
When the servers of a service are defined by a monitor, then
at startup all servers of the monitor should be added to relevant
services. Likewise, when a server is added to or removed from a
monitor at runtime, those changes should affect services as well.

However, whether that should happen or not depends upon the monitor.
In the case of the Clustrix monitor this should not happen as it
adds and removes servers depending on the runtime state of the
Clustrix cluster.
2019-02-11 13:03:18 +02:00
Johan Wikman
b4eb87dfcc MXS-2314 Populate services with servers
The services whose servers are defined using a monitor, will
now be populated from the monitor.

Note, no consideration has yet been given to runtime changes.
2019-02-11 13:03:18 +02:00
Johan Wikman
692dd195ec MXS-2275 Trigger cluster check if node is down
The likely reason for a node being down is that some cluster level
modifications have been performed. Consequently a cluster check should
be triggered in that case.
2019-02-04 12:02:58 +02:00
Johan Wikman
b582119d27 MXS-2275 Check for softfailed nodes
When checking the node info, also include information about wheter
a node is being SOFTFAILed. If it is, turn on the `Being Drained`
bit.

A node is SOFTFAILed with the intention of removing it, so better
not to create new connections to it as they later would be broken
when the node is actually taken down.
2019-02-01 11:00:53 +02:00
Johan Wikman
55b1e031d6 MXS-2275 Fix breakage due to rebasing 2019-02-01 11:00:53 +02:00
Johan Wikman
6d60714a17 MXS-2275 Always log monitor instance name
When logging something, always log the monitor instance name
as well.
2019-02-01 11:00:53 +02:00
Johan Wikman
cb07687672 MXS-2275 Implement [un]softfailing
It is now possible to [un]softfail a Clustrix node via MaxScale
using a Clustrix monitor module command.

In case a node is successfully softfailed, the `Being Drained` bit
will automatically turned on. Similarly, if a node is successfully
unsoftfailed, the `Being Drained` bit will be cleared.
2019-02-01 11:00:53 +02:00
Johan Wikman
2e395c4477 MXS-2275 Add skeleton softfail/unsoftfail support
Add skeleton implementation for the functionality for being able
to softfail and unsoftfail a Clustrix node.
2019-02-01 11:00:53 +02:00
Esa Korhonen
c8a84cebd0 MXS-2304 Use get_integer() instead of config_get_integer() 2019-01-31 18:12:25 +02:00
Esa Korhonen
0903648542 MXS-2271 Move connection settings inside settings struct
Since the settings are now protected fields, all related functions were
moved inside the monitor class. mon_ping_or_connect_to_db() is now a method
of MXS_MONITORED_SERVER. The connection settings class is defined inside the
server since that is the class actually using the settings.
2019-01-31 17:00:47 +02:00
Esa Korhonen
6326172325 MXS-2271 Rename basic Monitor fields
Adds the m_-prefix.
2019-01-28 15:41:00 +02:00
Esa Korhonen
cef4e836bc MXS-2271 Store monitored servers in a vector
The array is still a public member because it's used in several non-member functions.
2019-01-28 15:41:00 +02:00
Esa Korhonen
546b80de4b MXS-2271 Move monitor interval to settings container 2019-01-25 13:46:01 +02:00
Johan Wikman
92b27500c7 MXS-2276 Fix things due to MXS_MONITOR -> Monitor change 2019-01-25 10:30:27 +02:00
Johan Wikman
42b3402a71 MXS-2276 Use dynamic servers also for cluster check
Once the monitor has been able to connect to a clustrix node
and obtain the clustrix nodes, it'll primarily use those nodes
when looking for a Clustrix node to be used as the "hub".
With this change it is sufficient (but perhaps unwise) to provide
a single node boostrap node in the configuration file.

Some other rearrangements and renamings of functions has also been
made.
2019-01-25 10:30:27 +02:00
Johan Wikman
6937bb6663 MXS-2274 Create globally unique name for dynamic servers
Convention needs to be that the runtime object creating other objects
needs to incorporate its own name in the name of any object created.
Together with the '@@' prefix that ensures that the created name will
be reasonably globally unique.
2019-01-24 17:42:29 +02:00
Esa Korhonen
f6cec41dd8 MXS-2271 Monitor config name and instance name are parameters of createInstance()
Also adds/moves some comments from previous entrypoints. Name and module
are now constant fields.
2019-01-24 09:49:53 +02:00
Esa Korhonen
dadb6a1a79 MXS-2271 All monitors inherit from MXS_MONITOR
Most of the API entrypoints are replaced with virtual functions.
2019-01-22 15:59:17 +02:00
Johan Wikman
a7f0bcc4c5 MXS-2219 Close server connection if unusable
If a server cannot be used, close the associated MYSQL connection.
Further, when an existing connection is used, verify that the server
is still part of the quorum.
2019-01-21 15:41:55 +02:00
Johan Wikman
c51895eaad MXS-2219 Replace for_each with regular for loops
In this context the former provides no advantage.
2019-01-21 15:41:55 +02:00
Johan Wikman
01c3da9e0f MXS-2219 Check that monitored server is part of quorum
When the monitor connects to a Clustrix node, it checks that
the node is part of the quorum, before taking it into use.
2019-01-21 15:41:55 +02:00
Johan Wikman
6b556859ce MXS-2219 Use system.membership as primary table
From system.membership we can find out what server exist in the
cluster while system.nodeinfo contains information about those
servers. If a node goes down, it will disappear from system.nodeinfo,
but not from system.membership. Consequently, we must start from
system.membership and then fetch more information from system.nodeinfo.

Incidentally, a query like

    SELECT ms.nid, ni.iface_ip
    FROM system.membership AS ms
        LEFT JOIN system.nodeinfo AS ni ON ms.nid=ni.nodeid;

should provide all information in one go, but it seems that such joins
are not supported on the system tables.
2019-01-21 15:41:55 +02:00
Johan Wikman
f7c840df26 MXS-2219 Update datastructures instead of recreating them
The node infos of the Clustrix servers are now kept around and
and updated based upon changing conditions instead of regularly
being re-created.

Further, the server is now looked up by name only right after
having been created (and that only due to runtime_create_server()
currently being used).

The state of the dynamically created server is now updated directly
as a result of the health-check ping, while the state of the bootstrap
servers is updated during the tick()-call according to the monitor
"protocol".
2019-01-21 15:41:55 +02:00
Johan Wikman
ac61e205d8 MXS-2219 Dynamically create Clustrix servers
MaxScale server objects are now created for all Clustrix nodes.
Currently the name is "Clustrix-Server-N" where N is the number
of the node.

The server is created using runtime_create_server() that has been
modified so that it optionally will not persist the created server.
That is probably just a temporary solution as a monitor should not
need to include .../core/internal-stuff.
2019-01-17 11:11:21 +02:00
Johan Wikman
bd2eb3d5dc MXS-2219 Allow starting Clx monitor with no servers 2019-01-17 11:11:21 +02:00
Johan Wikman
89c059411d MXS-2219 Add health check threshold
Make it configurable how many times a node may fail to respond
on the health check port before it is considered to be down.
2019-01-17 11:11:21 +02:00
Johan Wikman
880842e55d MXS-2219 Perform cluster monitoring as well
Now the monitor
- will frequently ping the health port of each server
- less frequently check from system.membership the actual
  number of available nodes
and act accordingly.

Currently, the updated servers are the ones listed in the conf
file. Subsequently this will be changed so that the servers listed
in the configuration file are only used for bootstrapping the monitor
and server objects are then created dynamically according to what is
found in the cluster.
2019-01-17 11:11:21 +02:00
Johan Wikman
f8545a0b7f MXS-2219 Address review comments 2019-01-07 12:59:57 +02:00
Johan Wikman
4512295e40 MXS-2219 Implement rudimentary Clustrix monitoring
The monitor now pings the health check ports of the Clustrix nodes.
A response translates to RUNNING and a non-response to DOWN.
2019-01-07 12:59:27 +02:00
Johan Wikman
115feab946 MXS-2164 Add skeleton Clustrix monitor 2018-12-18 15:17:09 +02:00