Commit Graph

63 Commits

Author SHA1 Message Date
f2b2951c99 Track the number of performed monitoring intervals
Tracking how many times the monitor has performed its monitoring allows
the test framework to consistently wait for an event instead of waiting
for a hard-coded time period. The MaxCtrl `api get` command can be used to
easily extract the numeric value.
2018-06-06 08:46:46 +03:00
b8d3da4968 Add error tolerance to "servers_no_promotion"
Previously, if the list contained servers that were not monitored by
the monitor yet were valid servers, an error value would be returned
and the monitor failed to start.

With this update, the non-monitored servers are simply ignored when
forming the final list.

Also, added printing of the list to diagnostics.
2018-02-12 10:49:28 +02:00
2181c9d240 Include MariaDB Connector-C headers first
The MariaDB Connector-C headers that are built by MaxScale must be
included before any system headers.

Fixed code that explicitly included the <mysql.h> header to use the
<maxscale/protocol/mysql.h> wrapper instead.
2018-02-07 16:07:17 +02:00
1cf3de4a74 Add config parameter for excluding servers from failover
"servers_no_promotion" is a comma-separated list of servers
which cannot be chosen when selecting a new master during failover
(auto or manual), or when automatically selecting a new master
for switchover (currently disabled).

The servers in the list are redirected normally and can be promoted
by switchover when manually selecting a new master.
2018-02-07 14:07:10 +02:00
4089b6b6fd MXS-1647: Detect API version mismatch
If the API versions do not match, MaxScale will treat this as an
error. The API versioning would allow backwards compatible changes but the
functionality to handle that is not implemented in MaxScale.

Updated API versions based on changes done to module APIs in 2.2.
2018-02-06 14:51:07 +02:00
255250652d Refactor pre-switchover, add similar checks as in failover
Now detects some erroneous situations before starting switchover.
Switchover can be activated without specifying current master.
In this case, the cluster master server is selected.
2018-01-31 10:40:09 +02:00
6036c1cdca MXS-1539: Assign capability bits for all module types
All modules now have an 8-bit range for capability flags. Currently only
the client side authenticator and protocol capability bits are loaded due
to the fact that backend versions of these modules don't relate to a
particular service.
2018-01-03 08:56:41 +02:00
703230a930 Only write monitor journal when it changes
The state of the monitored servers is only persisted if the states of the
servers have changed. This removes the unnecessary disk IO caused by the
writing on the monitor journal.
2017-11-16 15:38:13 +02:00
3a78b716b8 Merge branch '2.2' into 2.2-mrm 2017-10-30 11:06:34 +02:00
2d1e5f46fa Remove use of timestamps in failover code
Using timestamps to detect whether MaxScale was active or passive can
cause problems if multiple events happen at the same time. This can be
avoided by separating events into actively observed and passively observed
events. This clarifies the logic by removing the ambiguity of timestamps.

As the monitoring threads are separate from the worker threads, it is
prudent to use atomic operations to modify and read the state of the
MaxScale. This will impose an happens-before relation between MaxScale
being set into passive mode and events being classified as being passively
observed.
2017-10-27 15:31:46 +03:00
63c7550196 MXS-1490 Prepare for failover functionality addition
Moved mon_process_failover() from monitor.cc to mysql_mon.cc. Renamed
some functions and variables related to previous failover functionality
to avoid confusion.
2017-10-25 12:24:29 +03:00
582a65f77c Do not return empty relationships
If no relationships of a particular type are defined for a resource, the
key for that relationship should not be defined.
2017-10-23 19:37:24 +03:00
df816ea2a9 MXS-1460 Add failover_script parameter
The failover script can now be specified in the configuration file.
2017-10-03 15:24:29 +03:00
7ca8db14de MXS-1444: Add monitor parameter alteration
The parameter handling for monitors can now be done in a consistent manner
by establishing a rule that the monitor owns the parameter object as long
as it is running. This will allow parameters to be added and removed
safely both from outside and inside monitors.

Currently this functionality is only used by mysqlmon to disable failover
after an attempt to perform a failover has failed.
2017-10-03 14:50:20 +03:00
438b4e0341 Merge branch '2.2' into 2.2-mrm 2017-10-02 15:49:08 +03:00
68432bbaa3 Rename MXS_MONITOR::databases to MXS_MONITOR::monitored_servers
More descriptive name. Some local varaibles could now also be
renamed to be more descriptive, but that's for another day.
2017-10-02 15:33:58 +03:00
8d03876e3e Rename MXS_MONITOR_SERVERS to MXS_MONITORED_SERVER
An element in a linked list is not a list.
2017-10-02 15:05:17 +03:00
d4fd34cecd MXS-1446: Move failover parameters into mysqlmon
The `failover` and `failover_timeout` parameters are now declared as a
part of the mysqlmon module. Changed the implementation of the failover
function so that the dependencies on the monitor struct can be removed or
moved into parameters.
2017-09-28 08:23:34 +03:00
ef115208e6 MXS-1446: Move failover to mysqlmon
Split the state change processing and failover handling into two separate
functions and added a call to the failover function into mysqlmon. This
prevents unintended behavior when failover is enabled for non-mysqlmon
monitors. The parameter itself still needs to be moved into mysqlmon.

Moved the failover documentation to the mysqlmon documentation as it is
specific to this monitor.
2017-09-28 07:54:42 +03:00
4c3d6f6884 MXS-1446: Add execution of dummy failover command
The failover command is simulated by executing a call to /usr/bin/echo
with all possible monitor parameters. This allows testing of the failover
mechanism without actually using the failover command.
2017-09-27 19:44:21 +03:00
316f792242 MXS-1446: Make failover_timeout configurable
The time that MaxScale waits for a failover is now configurable.
2017-09-27 19:37:41 +03:00
ef2ee38ccf MXS-1446: Store more detailed event information
The timestamp of the last change from passive to active is now
tracked. This, with the timestamps of the last master_down and master_up
events, allows detection of cases when MaxScale was failed over but the
failover was not done.

Currently, only a warning is logged if no new master has appeared within
90 seconds of a master_down event and MaxScale was set to active from
passive.

The last event and when the event was triggered is now shown for all
servers. The latest change from passive to active is also shown.
2017-09-27 19:32:58 +03:00
3e1d89ff17 MXS-1446: Store last triggered event for each server
When an event occurs on a server, it is now stored so that the last event
for each server is known. This allows a state change to trigger an event
even if, at the time of the event, no action was taken.

This change is only cosmetic as no functionality is implemented.
2017-09-27 19:32:58 +03:00
fe40511d97 MXS-1405: Take script_timeout into use
The script_timeout parameter is now used by all monitors.
2017-09-14 12:34:34 +03:00
53bf21f785 MXS-1262: Use monitor journals in all monitors
All monitors now persist the state of the server in a monitor journal
file.

Moved the removal of stale journals into the core and removed them from
the monitor journal interface.
2017-08-11 04:09:08 +03:00
b448b129d0 MXS-1262: Move journal_max_age to MaxScale core
The parameter is now defined in the monitor. Further refactoring is needed
to make the interface of the journal system simpler.
2017-08-11 04:09:08 +03:00
837d57f4f4 MXS-1262: Move monitor journals into the core
The journaling functionality is now in the core. Only the MySQL Monitor is
using it.
2017-08-11 04:09:07 +03:00
512c3c018d Add recycling of destroyed monitors
If a destroyed monitor is created again, it will be reused. This should
prevent excessive memory growth when the same monitor is created and
destroyed again.
2017-08-09 11:39:24 +03:00
1f4856cdc5 Merge branch '2.1' into develop 2017-08-03 19:01:32 +03:00
f7b8744460 Add more error messages to monitors
When the execution of a query fails, the error reported by the Connector-C
and the server where the query was executed is logged.
2017-08-03 15:42:40 +03:00
f546a17e77 Update change date of 2.2 2017-06-01 10:24:20 +03:00
dd68069471 MXS-1220: Add back the old diagnostic entry point
This makes 2.2 maxadmin backwards compatible with 2.1.
2017-05-04 09:14:04 +03:00
695a4032ca Use common constants for monitor and server parameters
The parameter names for monitors and servers now use a set of constant
names. This removes some of the errors caused by spelling mistakes when
the same parameter name is repeated in multiple places.

The service, filter and listener parameters should also be converted to
constants. This allows for a consistent user experience.
2017-05-04 09:14:04 +03:00
3d2219e8ef MXS-1220: Add missing relationships
The relationships from servers to services and monitors and filters to
services were not implemented. Now each server lists the services and
monitors that use it and each filter lists the services that use the
filter.

This enables the creation of a server and linking of that server to
services and monitors in one atomic operation.
2017-05-04 09:14:03 +03:00
5b9c276123 MXS-1220: Create corrent relation links
When a resource has a relation to another resource, it should be expressed
as a working link to the resource. By passing the hostname of the server
to the functions, we are able to generate working relation links.
2017-05-04 09:14:01 +03:00
bbe0620944 MXS-1220: Add JSON return value to diagnostics entry points
The modules that implement a diagnostics entry point now return a JSON
type object. This removes the need to format data inside the modules.

The module implementations of these are not yet complete which means that
MaxScale will fail to compile.
2017-05-04 09:12:15 +03:00
75b44198ba MXS-1220: Add monitor to JSON conversion
Monitors can now be printed in JSON format. The REST API resource
`/monitors` accepts GET requests and returns a JSON representation of the
monitors as a response.
2017-05-04 09:12:15 +03:00
a362bd0024 Add parameter backend_connect_attempts to monitor
This number (defaults to 1) sets how many times mon_connect_to_db
will try to connect to a backend before returning an error. Every
connection attempt may take backend_connect_timeout seconds to
complete.

Also refactored code a bit. Renamed mon_connect_to_db to
mon_ping_or_connect_to_db, since it does not connect if the connection
is already alive.
2017-04-03 09:56:22 +03:00
5648f708af Update license to BSL 1.1 2017-02-14 21:42:28 +02:00
7d51864402 Clean config.h some more
Moved some typedefs to router.h and server.h, changed a few
constants to these enums. Renamed some types in config.h to
remove "Gateway".

There are still some functions in the public header which are
only used in core, but they seem to fit the theme of public functions
so were not moved.
2017-01-25 16:05:51 +02:00
9430da94b9 Further cleanup of monitor headers 2017-01-23 14:16:34 +02:00
53c5b475ad More cleanup of monitor.h
Move some more functions into core header + miscellaneous cleanup
2017-01-19 15:32:40 +02:00
7ed7f972b1 Divide monitor.h to internal and interface headers similar to filter.h
Definitions and function used by core are in
server/core/maxscale/monitor.h
Definitions and function used by plugins are in
include/maxscale/monitor.h
2017-01-19 09:50:48 +02:00
680401cf8e Rename public types and constants in monitor.h
Preparing to split monitor.h into module and core sections. Also
changed a few comments in monitor.h.
2017-01-17 15:47:13 +02:00
0b6b9c3d81 Format core source code and headers
Formatted core source code and headers with Astyle.
2017-01-17 14:47:50 +02:00
9168974882 Remove extra space after enum names
The monitor event enum values had an extra space which caused them to be
falsely classified as invalid parameters.
2017-01-12 15:33:48 +02:00
a196420c2d Move monitor script processing and launching into the core
This removes parts of the nearly identical code from all monitors.

The removal of monitor type specific event checking is done based on the
assumption that only the monitor that is monitoring the server can be the
cause for a state change. This removes the need to actually check that the
state change is relevant for each monitor and allows the event handling to
be moved into the core.
2017-01-11 14:21:57 +02:00
7e9db7ed0c Have server status updates applied during monitor loop
Previously, server status changes from MaxAdmin would be set immediately
as long as the server lock could be acquired. This meant that it might take
several seconds until the next monitor pass is executed. Usually, this was
fine but in some situations we would want the monitor to run immediately
after the change (MXS-740 and Galera). This patch changes the logic of
setting and clearing status bits to a delayed mode: changes are first applied
to a "status_pending"-variable, and only once the monitor runs will the
setting be applied. To reduce the delay, the monitor now has a flag
which is checked during sleep (between short 0.1s naps). If set, the
sleep is cut short.

If a server is not monitored, the status bits are set directly.

There is a small possibility of a race condition: If a monitor is stopped or
destroyed before the pending change is applied, the change is forgotten.
2016-12-19 10:19:23 +02:00
259e944b3d Server status changes now happen under a lock
MXS-873 To prevent monitors and MaxAdmin from interfering with each other,
changes to the server status flags now happen under a lock. To avoid
interfering with monitor logic, the monitors now acquire locks to all
of their servers at the start of the monitor loop and release them
before sleeping.
2016-12-12 15:04:05 +02:00
cb218804ef Detect double monitoring of servers
Adding a server to multiple monitors is forbidden. This should be detected
and reported to the end user.

The information provided by the config_runtime system to the client isn't
as detailed as it could be. Some sort of an error message stack should be
added so that client facing interfaces could properly report the reason
for the failure. Currently the only way to detect the reason of the
failure is to parse the log files.
2016-12-12 10:48:53 +02:00