119 Commits

Author SHA1 Message Date
Esa Korhonen
7e6ce2d13f MXS-1937 Disable server events on master during switchover
The feature is behind a config setting.
2018-09-05 15:54:08 +03:00
Esa Korhonen
85d8a85cde Update master failure detection from slaves
The detection now works with multiple slave connections.
2018-08-29 17:07:52 +03:00
Esa Korhonen
a593d00c65 Simplify failed master detection
No longer depends on monitor events as the other operations do not
either. The active_event/new_event detection was removed, as these
only protect against a rare situation. A similar feature which
protects all the cluster modifications will be implemented later.
2018-08-29 17:04:05 +03:00
Esa Korhonen
9e566bc619 A master that is down with no running slaves can be replaced
This should be a more general way to detect situations where a DBA
or another MaxScale performs a failover.
2018-08-29 16:41:20 +03:00
Esa Korhonen
61bb172033 Cleanup failover/switchover
Replication settings warnings are printed once more. Changed some
parameter names to be more consistent within the monitor.
2018-08-23 10:28:47 +03:00
Esa Korhonen
f2dfd39f79 Clean up JSON diagnostics
Now prints all slave connections.
2018-08-22 12:23:28 +03:00
Esa Korhonen
3777da96bd Miscellaneous cleanup
Removes needless status assignments and unused code. Moves and modifies some comments.
2018-08-22 12:11:33 +03:00
Niclas Antti
24ab3c099c Move top of the file "#pragma once" to after the following comment (swap them). If the comment is a BPL update it to the latest one 2018-08-21 13:13:15 +03:00
Esa Korhonen
0d762b2019 MXS-2012 Remove old replication lag detection
The method was quite disruptive as it continuosly wrote to the database.
Also removed the MaxScale ID global, as it wasn't used for anything else.
2018-08-21 11:51:10 +03:00
Johan Wikman
f9ba8824d4 MXS-2004 Remove additional dependencies on maxscale/thread.h 2018-08-13 13:38:39 +03:00
Johan Wikman
e2902b6513 MXS-2002 Remove GenericFunction typedef
It does not save much in characters compared to std::function<void ()>
and it's a bit misleading as not just any callable object will do, but
only one that takes no argument and returns void.
2018-08-13 08:30:05 +03:00
Esa Korhonen
b9ec3f5130 Monitor json diagnostics printing cleanup
The 'events' and 'script' config values were defined for every monitor.
Removed the extra definitions and moved the variables to MXS_MONITOR.

MariaDBMonitor was printing config values a second time, they are
already printed by the caller.

Moved the events enum definition to the internal header since it's no longer
required by modules.

Added a default config setting "all" to 'events' to clarify that it enables
all events.
2018-08-10 11:19:09 +03:00
Esa Korhonen
3f2838ab36 Avoid repeated logging when retrying automatic failover or switchover
Prevents repeated logging of similar error messages.
2018-08-07 16:36:48 +03:00
Esa Korhonen
17c84a22c7 Refactor preparations to failover
The two operations are quite similar so the code should look
similar as well and use shared functions.
2018-08-07 16:33:56 +03:00
Esa Korhonen
c0bd5ca3a1 MXS-1905 Switchover if master is low on disk space
Required quite a bit of refactoring.
2018-08-06 13:24:05 +03:00
Markus Mäkelä
d22b02047f
Disable parameters on main worker
Disabling the parameter on the main worker prevents deadlocks if the
parameter is disabled at the same time a monitor diagnostic is executed.
2018-08-03 10:34:47 +03:00
Esa Korhonen
89dfc80f86 Better tracking for slave status bits
The monitor can now differentiate between slaves with a running
series of slave connections to the master from slaves with broken
links. Both still get the SERVER_SLAVE-flag if 'detect_stale_slave'
is on.

Also, relay servers must be running.
2018-07-31 14:53:29 +03:00
Esa Korhonen
cfa07c69ff Clean up switchover_check_current()
Now uses MariaDBServer.
2018-07-27 11:20:23 +03:00
Esa Korhonen
b421e56d1c Move execute_worker_task to MonitorInstance
The function is rather general and may of use to other monitor modules.
2018-07-24 15:07:18 +03:00
Esa Korhonen
382a017518 A master which is down for longer than failcount is considered an invalid master
If auto_failover is disabled and an alternative master exists, the
monitor will swap the master. This may break replication, but the
situation requires that the dba has set up a cluster with multiple
masters.
2018-07-20 15:47:23 +03:00
Esa Korhonen
c9570ff616 Check failover applicability to the cluster every turn
This should give an advance warning if a user tries to activate auto_failover
on a cluster which does not support it.
2018-07-20 15:33:47 +03:00
Esa Korhonen
862ae099b0 Construct diagnostics results in the monitor thread
MariaDBMonitor diagnostics printing is unsafe as some of the read
fields are arrays. To be on the safe side, the fields are now read
in the monitor worker thread.

Since diagnostics must work even for stopped monitors, a worker task
is used. In practice, it usually runs when the monitor is sleeping.
2018-07-20 10:18:58 +03:00
Esa Korhonen
590df89dbc Fix mm_mysqlmon test
Because of monitor changes, the test had wrong assumptions.
Renamed the test and updated it to use MaxCtrl for some queries.

Also, changed the type of the cycle container in the monitor to an
ordered map so that results are predictable.
2018-07-18 16:32:16 +03:00
Esa Korhonen
f2e0bf3caa Factor out functions
The topology update is now in a method. Also, the m_master-field
is only written inside a method so that the cycle info is always
updated.
2018-07-16 15:58:16 +03:00
Esa Korhonen
936bcde135 Remove old "detect_standalone_master"-feature, update documentation
The auto_failover is a more reliable solution and should be used instead. Several
unused parameters were removed, although they can still be defined in the config
file. Updated documentation on the relevant parts.
2018-07-16 15:58:16 +03:00
Markus Mäkelä
77a1417479
Replace TR1 headers with standard headers
Now that the C++11 standard is the default one, we can remove the TR1
headers and classes.
2018-07-11 14:08:46 +03:00
Esa Korhonen
03491a45f0 Remove old code
The functionality is elsewhere.
2018-07-03 10:32:06 +03:00
Esa Korhonen
fd31c9cced MXS-1905 Set slaves with low disk space to maintenance
Also, servers in maintenance are updated just as other servers.
2018-07-02 14:24:57 +03:00
Esa Korhonen
960d08a36a Code cleanup
Removed unused code.
2018-06-29 10:54:34 +03:00
Esa Korhonen
9525d3507b Run manual commands without stopping the monitor
The command is saved in a function object which is read by the monitor
thread. This way, manual and automatic cluster modification commands are
ran in the same step of a monitor cycle.

This update required several modifications in related code.
2018-06-28 16:56:41 +03:00
Esa Korhonen
6bf10904d7 MXS-1845 Only rebuild topology when required
The monitor now detects when a server has changed such that a replication
graph rebuild is needed and only then rebuilds the graph and detects
cycles and master.

Also, some old code is no longer called in the monitor cycle. It will be
removed in later commits. Refactored some of the related functions.
2018-06-28 16:56:41 +03:00
Johan Wikman
cc0299aee6 Update change date of 2.3 2018-06-25 10:07:52 +03:00
Esa Korhonen
58207ec414 MXS-1775 Check disk space warning bit when selecting a new master for failover
This also applies to autoselect switchover. The disk space warning has the least
priority, as the other criteria could lead to replication failures. Also, print
the reason the new master was selected over the second best candidate.
2018-06-18 17:59:19 +03:00
Esa Korhonen
019d62bbb8 MXS-1886 Better auto-rejoin error description and tolerance
Contains changes from commit 09df01752812444c6e7c409a8957d292f7de63cf
adapted to the 2.3 branch.
2018-06-18 16:35:28 +03:00
Esa Korhonen
d3e9cc9a4f MXS-1886 Auto-failover error tolerance
Contains changes from commit 9e68d8ec3ddf1621f533067021c4b3042f695e80
adapted to the 2.3 branch.
2018-06-18 16:35:28 +03:00
Esa Korhonen
2f987d0b10 MXS-1845 Only select a master if current master is no longer usable
The purpose is to make the selected master server sticky. The master is reselected only
if the current master is no longer a valid master.
2018-06-18 11:06:58 +03:00
Esa Korhonen
5324a1bdaa MXS-1845 Assign server roles
Assign server roles (master, slave, relay master, slave of external master)
for a graph with possibly multiple paths to a slave server.
2018-06-13 17:38:53 +03:00
Esa Korhonen
3f82c25c62 MXS-1845 New algorithm for finding the master server
Not yet used, as more is needed to replace the old code. The
algorithm is based on counting the total number of slave nodes
a server has, possibly in multiple layers and/or cycles.
2018-06-13 17:38:33 +03:00
Johan Wikman
8afa8c2c5a MXS-1775 Add MonitorInstanceSimple class
MonitorInstanceSimple is intended for simple monitors that
probe servers in a straightforward fashion. More complex monitors
can be derived directly from MonitorInstance.
2018-06-07 15:13:26 +03:00
Esa Korhonen
2481de260f Move monitor-dependent code in MariaDBServer to MariaDBMonitor
Removes Monitor-dependency from the MariaDBServer-class.
2018-06-06 22:28:38 +03:00
Johan Wikman
b2a190c2b8 MXS-1775 Add switchover_on_low_disk_space parameter 2018-06-06 15:25:57 +03:00
Johan Wikman
f600b3a769 MXS-1775 Check disk space in MariaDBMonitor 2018-06-06 15:25:57 +03:00
Johan Wikman
18ece193bb MXS-1775 MariaDBMonitor::main() removed
Now uses MonitorInstance::main() as all other monitors.
2018-06-06 15:25:57 +03:00
Johan Wikman
329a6df662 MXS-1775 Factor out post-processing
Further adjustments for being able to move MariaDBMonitor on
top of MonitorInstance::main().
2018-06-06 15:25:57 +03:00
Johan Wikman
71194d83d3 MXS-1775 Rearrange for moving main loop to MonitorInstance
This is another step in the process for moving the main loop
from MariaDBMonitor to MonitorInstance.
2018-06-06 15:25:57 +03:00
Esa Korhonen
17c9ac95bd MXS-1845 Add unit test for the cycle find algorithm (Tarjan SCC)
Only a few test cases for now.
2018-06-05 16:33:51 +03:00
Esa Korhonen
9a0445cd4e MXS-1845 Save cycle members
The saved data may be useful later on. Also includes some cleanup.
2018-06-05 16:25:04 +03:00
Esa Korhonen
4d7aff4ab9 MXS-1845 Find strongly connected components with multiple slave connections
Rewrote the algorithm for clarity.
2018-06-01 14:04:50 +03:00
Johan Wikman
32c7ae2f9f MXS-1775 Inherit MariaDBMonitor from mxs::MonitorInstance
Start/stop now provided by MonitorInstance. The thread main
function is now virtual and overriden by MariaDBMonitor. Some
additional refactoring is necessary in order to be able to allow
MonitorInstance to handle the main loop.
2018-06-01 13:48:15 +03:00
Johan Wikman
be3bdc7bc9 MXS-1775 Rename MariaDBMonitor::main_loop() to main()
To make it compatible with mxs::MonitorInstance.
2018-06-01 13:48:15 +03:00