716 Commits

Author SHA1 Message Date
Esa Korhonen
c0bd5ca3a1 MXS-1905 Switchover if master is low on disk space
Required quite a bit of refactoring.
2018-08-06 13:24:05 +03:00
Markus Mäkelä
d22b02047f
Disable parameters on main worker
Disabling the parameter on the main worker prevents deadlocks if the
parameter is disabled at the same time a monitor diagnostic is executed.
2018-08-03 10:34:47 +03:00
Markus Mäkelä
d412b8d729
Move execute_worker_task into mxs::Worker
The function has use outside of the monitors as it makes execution of
worker tasks much more convenient. Currently, this change only moves the
code and takes it into use: there should be no functional changes.
2018-08-02 18:56:35 +03:00
Esa Korhonen
836db54800 Clean up server status printing
Uses mostly the status functions for reading the flags. Strickly
speaking this breaks the REST API since in some cases (status combinations)
the printed string is different from what was printed before.
2018-08-02 10:42:12 +03:00
Johan Wikman
f7e3d4c2fb Remove maxscale/thread.hh
A C++11-like implementation of thread, future, etc. that now is
obsolete as we use C++11.
2018-08-01 17:12:49 +03:00
Esa Korhonen
1e33ab69f2 Rename server_is_running() to server_is_usable()
The previous name was misleading. The new server_is_running() only
checks for the running bit so that a server is always either running
or down.
2018-07-31 14:53:56 +03:00
Esa Korhonen
89dfc80f86 Better tracking for slave status bits
The monitor can now differentiate between slaves with a running
series of slave connections to the master from slaves with broken
links. Both still get the SERVER_SLAVE-flag if 'detect_stale_slave'
is on.

Also, relay servers must be running.
2018-07-31 14:53:29 +03:00
Esa Korhonen
cfa07c69ff Clean up switchover_check_current()
Now uses MariaDBServer.
2018-07-27 11:20:23 +03:00
Esa Korhonen
18bfca0533 Define inline functions for status variables
The functions are used in MariaDB Monitor.
2018-07-27 11:20:23 +03:00
Esa Korhonen
fbce38878b Turn server status macros to functions 2018-07-25 11:19:47 +03:00
Esa Korhonen
b421e56d1c Move execute_worker_task to MonitorInstance
The function is rather general and may of use to other monitor modules.
2018-07-24 15:07:18 +03:00
Esa Korhonen
27084f1368 Handle the situation where the previous master is reselected 2018-07-23 12:17:00 +03:00
Esa Korhonen
382a017518 A master which is down for longer than failcount is considered an invalid master
If auto_failover is disabled and an alternative master exists, the
monitor will swap the master. This may break replication, but the
situation requires that the dba has set up a cluster with multiple
masters.
2018-07-20 15:47:23 +03:00
Esa Korhonen
c9570ff616 Check failover applicability to the cluster every turn
This should give an advance warning if a user tries to activate auto_failover
on a cluster which does not support it.
2018-07-20 15:33:47 +03:00
Esa Korhonen
862ae099b0 Construct diagnostics results in the monitor thread
MariaDBMonitor diagnostics printing is unsafe as some of the read
fields are arrays. To be on the safe side, the fields are now read
in the monitor worker thread.

Since diagnostics must work even for stopped monitors, a worker task
is used. In practice, it usually runs when the monitor is sleeping.
2018-07-20 10:18:58 +03:00
Esa Korhonen
590df89dbc Fix mm_mysqlmon test
Because of monitor changes, the test had wrong assumptions.
Renamed the test and updated it to use MaxCtrl for some queries.

Also, changed the type of the cycle container in the monitor to an
ordered map so that results are predictable.
2018-07-18 16:32:16 +03:00
Markus Mäkelä
e0361e335f
Fix relay master assignment
The relay master status was assigned to a server based on the last known
replication status of the slaves that have at some point replicated from
it. This can cause false positives and the relay master status is assigned
to servers that have never been observed to act as relay masters.
2018-07-17 11:52:20 +03:00
Markus Mäkelä
0750e93eeb
Fix verify_master_failure
The master failure verification would not work if the slaves did not have
a state change since MaxScale had started. This can be fixed by treating
the startup of MaxScale as an event of sorts.
2018-07-17 11:52:19 +03:00
Markus Mäkelä
bded99aea3
Assign slave status even if no master is available
The master validity check now checks if the master is down. This requires
that the slave status is assigned even if no master is available.

The failover precondition is also fulfilled as long as one valid promotion
candidate is found. Previously a slave that didn't use GTID replication
appeared to prevent failover.
2018-07-17 11:52:18 +03:00
Esa Korhonen
f2e0bf3caa Factor out functions
The topology update is now in a method. Also, the m_master-field
is only written inside a method so that the cycle info is always
updated.
2018-07-16 15:58:16 +03:00
Esa Korhonen
936bcde135 Remove old "detect_standalone_master"-feature, update documentation
The auto_failover is a more reliable solution and should be used instead. Several
unused parameters were removed, although they can still be defined in the config
file. Updated documentation on the relevant parts.
2018-07-16 15:58:16 +03:00
Markus Mäkelä
77a1417479
Replace TR1 headers with standard headers
Now that the C++11 standard is the default one, we can remove the TR1
headers and classes.
2018-07-11 14:08:46 +03:00
Markus Mäkelä
e6cf20ea29
Fix mmmon hang
The iteration of servers would never exit.
2018-07-09 12:10:36 +03:00
Markus Mäkelä
9d94230237
Assign status bits only for running servers
In previously the status bits were assigned only for running servers. Due
to the changes done in the monitoring algorithm, the slave and master
status bits are assigned to servers that are down. This change broke a
number of tests and deviates from previous behavior.

To keep the old behavior and to fix the test, the status bits are not
assigned to servers that are down.
2018-07-09 12:10:36 +03:00
Esa Korhonen
34f61bc4f2 Close connections before starting loop
The connections should be closed after the check queries.
2018-07-03 10:32:06 +03:00
Esa Korhonen
03491a45f0 Remove old code
The functionality is elsewhere.
2018-07-03 10:32:06 +03:00
Esa Korhonen
fd31c9cced MXS-1905 Set slaves with low disk space to maintenance
Also, servers in maintenance are updated just as other servers.
2018-07-02 14:24:57 +03:00
Esa Korhonen
a59c0c61ce Remove depth field from SERVER
It was not really used anymore.
2018-06-29 10:54:34 +03:00
Esa Korhonen
960d08a36a Code cleanup
Removed unused code.
2018-06-29 10:54:34 +03:00
Johan Wikman
0afcd4b468 Fix test-program failures
Due to recent changes, mxs::MessageQueue::init() must be called
explicitly, if a monitor is created.
2018-06-29 10:43:49 +03:00
Esa Korhonen
9525d3507b Run manual commands without stopping the monitor
The command is saved in a function object which is read by the monitor
thread. This way, manual and automatic cluster modification commands are
ran in the same step of a monitor cycle.

This update required several modifications in related code.
2018-06-28 16:56:41 +03:00
Esa Korhonen
6bf10904d7 MXS-1845 Only rebuild topology when required
The monitor now detects when a server has changed such that a replication
graph rebuild is needed and only then rebuilds the graph and detects
cycles and master.

Also, some old code is no longer called in the monitor cycle. It will be
removed in later commits. Refactored some of the related functions.
2018-06-28 16:56:41 +03:00
Johan Wikman
cc0299aee6 Update change date of 2.3 2018-06-25 10:07:52 +03:00
Esa Korhonen
8bd9e1d473 Check monitor permissions when reconnecting to server
Previously, the permissions would only be checked at monitor start.
Now, the permissions are checked if [Auth Error] is on or server
was reconnected.
2018-06-20 17:54:46 +03:00
Esa Korhonen
58207ec414 MXS-1775 Check disk space warning bit when selecting a new master for failover
This also applies to autoselect switchover. The disk space warning has the least
priority, as the other criteria could lead to replication failures. Also, print
the reason the new master was selected over the second best candidate.
2018-06-18 17:59:19 +03:00
Esa Korhonen
019d62bbb8 MXS-1886 Better auto-rejoin error description and tolerance
Contains changes from commit 09df01752812444c6e7c409a8957d292f7de63cf
adapted to the 2.3 branch.
2018-06-18 16:35:28 +03:00
Esa Korhonen
d3e9cc9a4f MXS-1886 Auto-failover error tolerance
Contains changes from commit 9e68d8ec3ddf1621f533067021c4b3042f695e80
adapted to the 2.3 branch.
2018-06-18 16:35:28 +03:00
Markus Mäkelä
9d961ece3a
Clear galeramon server info in pre_tick
The server information in galeramon is gathered every monitoring
interval. To prevent stale information from being used, the server
information needs to be cleared at the start of each monitoring interval.
2018-06-18 14:25:05 +03:00
Esa Korhonen
2f987d0b10 MXS-1845 Only select a master if current master is no longer usable
The purpose is to make the selected master server sticky. The master is reselected only
if the current master is no longer a valid master.
2018-06-18 11:06:58 +03:00
Esa Korhonen
de37f1a5c4 Fix cycle find test 2018-06-14 10:28:10 +03:00
Esa Korhonen
5324a1bdaa MXS-1845 Assign server roles
Assign server roles (master, slave, relay master, slave of external master)
for a graph with possibly multiple paths to a slave server.
2018-06-13 17:38:53 +03:00
Esa Korhonen
3f82c25c62 MXS-1845 New algorithm for finding the master server
Not yet used, as more is needed to replace the old code. The
algorithm is based on counting the total number of slave nodes
a server has, possibly in multiple layers and/or cycles.
2018-06-13 17:38:33 +03:00
Markus Mäkelä
c798a4ae36
Remove use of HASHTABLE in galeramon
Replaced the HASHTABLE in galeramon with an std::unordered_map. This
simplifies the code by a great amount and makes it more readable. Removed
the extraneous functions that mostly logged debug information and
simplified the logic by removing redundant checks.
2018-06-11 10:22:23 +03:00
Markus Mäkelä
9263a06b15
Fix galera master selection
The master selection still used the current status instead of the pending
status. This caused the master selection to lag behind by one monitor
loop.
2018-06-11 10:22:23 +03:00
Markus Mäkelä
dd49d4faea
Check pending Synced status in Galeramon
The pending status must be used, not the current.
2018-06-08 14:41:11 +03:00
Markus Mäkelä
9f5358eac0
Remove server_get_parameter_nolock
The function is no longer needed as there is no recursive access to the
server.
2018-06-08 14:41:11 +03:00
Johan Wikman
8afa8c2c5a MXS-1775 Add MonitorInstanceSimple class
MonitorInstanceSimple is intended for simple monitors that
probe servers in a straightforward fashion. More complex monitors
can be derived directly from MonitorInstance.
2018-06-07 15:13:26 +03:00
Esa Korhonen
2481de260f Move monitor-dependent code in MariaDBServer to MariaDBMonitor
Removes Monitor-dependency from the MariaDBServer-class.
2018-06-06 22:28:38 +03:00
Johan Wikman
b2a190c2b8 MXS-1775 Add switchover_on_low_disk_space parameter 2018-06-06 15:25:57 +03:00
Johan Wikman
dc47835ef6 MXS-1775 Add documentation for new monitor parameter 2018-06-06 15:25:57 +03:00