Commit Graph

354 Commits

Author SHA1 Message Date
e0361e335f Fix relay master assignment
The relay master status was assigned to a server based on the last known
replication status of the slaves that have at some point replicated from
it. This can cause false positives and the relay master status is assigned
to servers that have never been observed to act as relay masters.
2018-07-17 11:52:20 +03:00
0750e93eeb Fix verify_master_failure
The master failure verification would not work if the slaves did not have
a state change since MaxScale had started. This can be fixed by treating
the startup of MaxScale as an event of sorts.
2018-07-17 11:52:19 +03:00
bded99aea3 Assign slave status even if no master is available
The master validity check now checks if the master is down. This requires
that the slave status is assigned even if no master is available.

The failover precondition is also fulfilled as long as one valid promotion
candidate is found. Previously a slave that didn't use GTID replication
appeared to prevent failover.
2018-07-17 11:52:18 +03:00
f2e0bf3caa Factor out functions
The topology update is now in a method. Also, the m_master-field
is only written inside a method so that the cycle info is always
updated.
2018-07-16 15:58:16 +03:00
936bcde135 Remove old "detect_standalone_master"-feature, update documentation
The auto_failover is a more reliable solution and should be used instead. Several
unused parameters were removed, although they can still be defined in the config
file. Updated documentation on the relevant parts.
2018-07-16 15:58:16 +03:00
77a1417479 Replace TR1 headers with standard headers
Now that the C++11 standard is the default one, we can remove the TR1
headers and classes.
2018-07-11 14:08:46 +03:00
9d94230237 Assign status bits only for running servers
In previously the status bits were assigned only for running servers. Due
to the changes done in the monitoring algorithm, the slave and master
status bits are assigned to servers that are down. This change broke a
number of tests and deviates from previous behavior.

To keep the old behavior and to fix the test, the status bits are not
assigned to servers that are down.
2018-07-09 12:10:36 +03:00
34f61bc4f2 Close connections before starting loop
The connections should be closed after the check queries.
2018-07-03 10:32:06 +03:00
03491a45f0 Remove old code
The functionality is elsewhere.
2018-07-03 10:32:06 +03:00
fd31c9cced MXS-1905 Set slaves with low disk space to maintenance
Also, servers in maintenance are updated just as other servers.
2018-07-02 14:24:57 +03:00
960d08a36a Code cleanup
Removed unused code.
2018-06-29 10:54:34 +03:00
0afcd4b468 Fix test-program failures
Due to recent changes, mxs::MessageQueue::init() must be called
explicitly, if a monitor is created.
2018-06-29 10:43:49 +03:00
9525d3507b Run manual commands without stopping the monitor
The command is saved in a function object which is read by the monitor
thread. This way, manual and automatic cluster modification commands are
ran in the same step of a monitor cycle.

This update required several modifications in related code.
2018-06-28 16:56:41 +03:00
6bf10904d7 MXS-1845 Only rebuild topology when required
The monitor now detects when a server has changed such that a replication
graph rebuild is needed and only then rebuilds the graph and detects
cycles and master.

Also, some old code is no longer called in the monitor cycle. It will be
removed in later commits. Refactored some of the related functions.
2018-06-28 16:56:41 +03:00
cc0299aee6 Update change date of 2.3 2018-06-25 10:07:52 +03:00
8bd9e1d473 Check monitor permissions when reconnecting to server
Previously, the permissions would only be checked at monitor start.
Now, the permissions are checked if [Auth Error] is on or server
was reconnected.
2018-06-20 17:54:46 +03:00
58207ec414 MXS-1775 Check disk space warning bit when selecting a new master for failover
This also applies to autoselect switchover. The disk space warning has the least
priority, as the other criteria could lead to replication failures. Also, print
the reason the new master was selected over the second best candidate.
2018-06-18 17:59:19 +03:00
019d62bbb8 MXS-1886 Better auto-rejoin error description and tolerance
Contains changes from commit 09df01752812444c6e7c409a8957d292f7de63cf
adapted to the 2.3 branch.
2018-06-18 16:35:28 +03:00
d3e9cc9a4f MXS-1886 Auto-failover error tolerance
Contains changes from commit 9e68d8ec3ddf1621f533067021c4b3042f695e80
adapted to the 2.3 branch.
2018-06-18 16:35:28 +03:00
2f987d0b10 MXS-1845 Only select a master if current master is no longer usable
The purpose is to make the selected master server sticky. The master is reselected only
if the current master is no longer a valid master.
2018-06-18 11:06:58 +03:00
09df017528 MXS-1886 Better auto-rejoin error description and tolerance
Auto-rejoin now explains more accurately if a server cannot be joined due
to conflicting gtid.

Also, auto-rejoin is no longer disabled if a join fails. Usually the fail
is due to the server not replying fast enough with query completion. The
query is often completed anyways. This can lead to some log spam.
2018-06-15 13:11:10 +03:00
9e68d8ec3d MXS-1886 Auto-failover error tolerance
Auto-failover is no longer considered to have failed if the preconditions
are not met. An error message with the failed checks is printed once, but
the checks are repeated every loop as long as the master is down.
2018-06-15 12:52:03 +03:00
de37f1a5c4 Fix cycle find test 2018-06-14 10:28:10 +03:00
5324a1bdaa MXS-1845 Assign server roles
Assign server roles (master, slave, relay master, slave of external master)
for a graph with possibly multiple paths to a slave server.
2018-06-13 17:38:53 +03:00
3f82c25c62 MXS-1845 New algorithm for finding the master server
Not yet used, as more is needed to replace the old code. The
algorithm is based on counting the total number of slave nodes
a server has, possibly in multiple layers and/or cycles.
2018-06-13 17:38:33 +03:00
8afa8c2c5a MXS-1775 Add MonitorInstanceSimple class
MonitorInstanceSimple is intended for simple monitors that
probe servers in a straightforward fashion. More complex monitors
can be derived directly from MonitorInstance.
2018-06-07 15:13:26 +03:00
2481de260f Move monitor-dependent code in MariaDBServer to MariaDBMonitor
Removes Monitor-dependency from the MariaDBServer-class.
2018-06-06 22:28:38 +03:00
b2a190c2b8 MXS-1775 Add switchover_on_low_disk_space parameter 2018-06-06 15:25:57 +03:00
dc47835ef6 MXS-1775 Add documentation for new monitor parameter 2018-06-06 15:25:57 +03:00
af717426d5 MXS-1775 Load server journal unconditionally
The server journal is unconditionally loaded and need not be
done in @c pre_loop.
2018-06-06 15:25:57 +03:00
f600b3a769 MXS-1775 Check disk space in MariaDBMonitor 2018-06-06 15:25:57 +03:00
18ece193bb MXS-1775 MariaDBMonitor::main() removed
Now uses MonitorInstance::main() as all other monitors.
2018-06-06 15:25:57 +03:00
44b1e805a3 MXS-1775 Move MariaDBMonitor functionality to tick
Now all is set for moving MariaDBMonitor on top of
MonitorInstance::main.
2018-06-06 15:25:57 +03:00
329a6df662 MXS-1775 Factor out post-processing
Further adjustments for being able to move MariaDBMonitor on
top of MonitorInstance::main().
2018-06-06 15:25:57 +03:00
71194d83d3 MXS-1775 Rearrange for moving main loop to MonitorInstance
This is another step in the process for moving the main loop
from MariaDBMonitor to MonitorInstance.
2018-06-06 15:25:57 +03:00
70fdd0fc17 Merge branch '2.2' into develop 2018-06-06 08:56:31 +03:00
f2b2951c99 Track the number of performed monitoring intervals
Tracking how many times the monitor has performed its monitoring allows
the test framework to consistently wait for an event instead of waiting
for a hard-coded time period. The MaxCtrl `api get` command can be used to
easily extract the numeric value.
2018-06-06 08:46:46 +03:00
17c9ac95bd MXS-1845 Add unit test for the cycle find algorithm (Tarjan SCC)
Only a few test cases for now.
2018-06-05 16:33:51 +03:00
9a0445cd4e MXS-1845 Save cycle members
The saved data may be useful later on. Also includes some cleanup.
2018-06-05 16:25:04 +03:00
7cd19a12a2 MXS-1883 Remove locking during monitor loop
Since the admin cannot modify server states any more, the locks are not
required.
2018-06-01 15:03:07 +03:00
37841183b3 Cleanup server.h
Renamed, rearranged and clarified status bits. Removed unused macros.
2018-06-01 14:29:51 +03:00
4d7aff4ab9 MXS-1845 Find strongly connected components with multiple slave connections
Rewrote the algorithm for clarity.
2018-06-01 14:04:50 +03:00
32c7ae2f9f MXS-1775 Inherit MariaDBMonitor from mxs::MonitorInstance
Start/stop now provided by MonitorInstance. The thread main
function is now virtual and overriden by MariaDBMonitor. Some
additional refactoring is necessary in order to be able to allow
MonitorInstance to handle the main loop.
2018-06-01 13:48:15 +03:00
5219245a04 MXS-1775 Reset server info in MariaDBMonitor::configure()
That way, MariaDBMonitor no longer needs a custome start() function.
2018-06-01 13:48:15 +03:00
be3bdc7bc9 MXS-1775 Rename MariaDBMonitor::main_loop() to main()
To make it compatible with mxs::MonitorInstance.
2018-06-01 13:48:15 +03:00
a82c5911e5 MXS-1775 Rename m_monitor_base to m_monitor
To make it compatible with how the variable is named
in maxscale::MonitorInstance.
2018-06-01 13:48:15 +03:00
f862939dd7 MXS-1775 Make MariaDBMon non-dependent on stop() return value
To align it with the behavour or MonitorInstance::stop()
2018-06-01 13:48:15 +03:00
62f3e89ae7 MXS-1775 Tilt MariaDbMon towards maxscale::MonitorInstance
- m_status -> m_state
- m_keep_running -> m_shutdown
- load_config_params() -> configure()
2018-06-01 13:48:15 +03:00
b439857a84 MXS-1775 Remove destroy()
Now the instance is deleted simply by deleting it.
2018-06-01 13:48:15 +03:00
c039821467 MXS-1883 Maintenance is now the only user-modifiable bit for a monitored server
The request to turn maintenance off/on is a separate flag, although the actual
status is still contained in the status bitfield.
2018-05-30 10:09:15 +03:00