Commit Graph

728 Commits

Author SHA1 Message Date
e5a90d63e1 Remove SERVER_WAS_SLAVE status bit
Was unused due to MariaDBMonitor changes.
2018-08-13 11:30:20 +03:00
e2902b6513 MXS-2002 Remove GenericFunction typedef
It does not save much in characters compared to std::function<void ()>
and it's a bit misleading as not just any callable object will do, but
only one that takes no argument and returns void.
2018-08-13 08:30:05 +03:00
4193c4d3db MXS-2002 Add additional versions of Worker::[call|execute]() 2018-08-13 08:30:05 +03:00
e9758ebaf1 MXS-2002 Rename Worker::post() to Worker::execute()
The main point is that tasks/functions are executed, not that
they are posted.
2018-08-13 08:30:05 +03:00
3013adb14f MXS-2002 Worker::execute() renamed to Worker::call()
In preparation for Worker::post() to be renamed to Worker:execute().
The concept of _posting_ will be reserved to mean the transfer of
something over the message queue to the worker for processing and
nothing else.
2018-08-13 08:30:05 +03:00
9cfd451a1d MXS-2002 Make Worker excecution mode explicit
This is the first step in some cleanup of the Worker interface.
The execution mode must now be explicitly specified, but that is
just a temporary step. Further down the road, _posting_ will
*always* mean via the message loop while _executing_ will optionally
and by default mean direct execution if the calling thread is that
of the worker.
2018-08-13 08:30:05 +03:00
b9ec3f5130 Monitor json diagnostics printing cleanup
The 'events' and 'script' config values were defined for every monitor.
Removed the extra definitions and moved the variables to MXS_MONITOR.

MariaDBMonitor was printing config values a second time, they are
already printed by the caller.

Moved the events enum definition to the internal header since it's no longer
required by modules.

Added a default config setting "all" to 'events' to clarify that it enables
all events.
2018-08-10 11:19:09 +03:00
f14380243b Rename cppdefs.hh to ccdefs.hh
For obvious reasons; the c++ suffix is .cc and not .cpp
2018-08-10 07:50:18 +03:00
b7c94abb34 Keep track of previously observed slave connections
This reduces the ambiguity of server id:s in the slave status contents.
If a slave connection has been seen properly connected at an earlier time,
it can be trusted to report the correct master server id. This also
fixes some wrong status assignment edge cases with the SERVER_WAS_SLAVE-bit.
The bit will be removed in a later commit.

Even this does not solve the situation when MaxScale is started with
some servers down.
2018-08-09 20:39:19 +03:00
3f2838ab36 Avoid repeated logging when retrying automatic failover or switchover
Prevents repeated logging of similar error messages.
2018-08-07 16:36:48 +03:00
17c84a22c7 Refactor preparations to failover
The two operations are quite similar so the code should look
similar as well and use shared functions.
2018-08-07 16:33:56 +03:00
0a81f78442 Use unique pointer instead of auto-pointer 2018-08-06 13:24:05 +03:00
c0bd5ca3a1 MXS-1905 Switchover if master is low on disk space
Required quite a bit of refactoring.
2018-08-06 13:24:05 +03:00
d22b02047f Disable parameters on main worker
Disabling the parameter on the main worker prevents deadlocks if the
parameter is disabled at the same time a monitor diagnostic is executed.
2018-08-03 10:34:47 +03:00
d412b8d729 Move execute_worker_task into mxs::Worker
The function has use outside of the monitors as it makes execution of
worker tasks much more convenient. Currently, this change only moves the
code and takes it into use: there should be no functional changes.
2018-08-02 18:56:35 +03:00
836db54800 Clean up server status printing
Uses mostly the status functions for reading the flags. Strickly
speaking this breaks the REST API since in some cases (status combinations)
the printed string is different from what was printed before.
2018-08-02 10:42:12 +03:00
f7e3d4c2fb Remove maxscale/thread.hh
A C++11-like implementation of thread, future, etc. that now is
obsolete as we use C++11.
2018-08-01 17:12:49 +03:00
1e33ab69f2 Rename server_is_running() to server_is_usable()
The previous name was misleading. The new server_is_running() only
checks for the running bit so that a server is always either running
or down.
2018-07-31 14:53:56 +03:00
89dfc80f86 Better tracking for slave status bits
The monitor can now differentiate between slaves with a running
series of slave connections to the master from slaves with broken
links. Both still get the SERVER_SLAVE-flag if 'detect_stale_slave'
is on.

Also, relay servers must be running.
2018-07-31 14:53:29 +03:00
cfa07c69ff Clean up switchover_check_current()
Now uses MariaDBServer.
2018-07-27 11:20:23 +03:00
18bfca0533 Define inline functions for status variables
The functions are used in MariaDB Monitor.
2018-07-27 11:20:23 +03:00
fbce38878b Turn server status macros to functions 2018-07-25 11:19:47 +03:00
b421e56d1c Move execute_worker_task to MonitorInstance
The function is rather general and may of use to other monitor modules.
2018-07-24 15:07:18 +03:00
27084f1368 Handle the situation where the previous master is reselected 2018-07-23 12:17:00 +03:00
382a017518 A master which is down for longer than failcount is considered an invalid master
If auto_failover is disabled and an alternative master exists, the
monitor will swap the master. This may break replication, but the
situation requires that the dba has set up a cluster with multiple
masters.
2018-07-20 15:47:23 +03:00
c9570ff616 Check failover applicability to the cluster every turn
This should give an advance warning if a user tries to activate auto_failover
on a cluster which does not support it.
2018-07-20 15:33:47 +03:00
862ae099b0 Construct diagnostics results in the monitor thread
MariaDBMonitor diagnostics printing is unsafe as some of the read
fields are arrays. To be on the safe side, the fields are now read
in the monitor worker thread.

Since diagnostics must work even for stopped monitors, a worker task
is used. In practice, it usually runs when the monitor is sleeping.
2018-07-20 10:18:58 +03:00
590df89dbc Fix mm_mysqlmon test
Because of monitor changes, the test had wrong assumptions.
Renamed the test and updated it to use MaxCtrl for some queries.

Also, changed the type of the cycle container in the monitor to an
ordered map so that results are predictable.
2018-07-18 16:32:16 +03:00
e0361e335f Fix relay master assignment
The relay master status was assigned to a server based on the last known
replication status of the slaves that have at some point replicated from
it. This can cause false positives and the relay master status is assigned
to servers that have never been observed to act as relay masters.
2018-07-17 11:52:20 +03:00
0750e93eeb Fix verify_master_failure
The master failure verification would not work if the slaves did not have
a state change since MaxScale had started. This can be fixed by treating
the startup of MaxScale as an event of sorts.
2018-07-17 11:52:19 +03:00
bded99aea3 Assign slave status even if no master is available
The master validity check now checks if the master is down. This requires
that the slave status is assigned even if no master is available.

The failover precondition is also fulfilled as long as one valid promotion
candidate is found. Previously a slave that didn't use GTID replication
appeared to prevent failover.
2018-07-17 11:52:18 +03:00
f2e0bf3caa Factor out functions
The topology update is now in a method. Also, the m_master-field
is only written inside a method so that the cycle info is always
updated.
2018-07-16 15:58:16 +03:00
936bcde135 Remove old "detect_standalone_master"-feature, update documentation
The auto_failover is a more reliable solution and should be used instead. Several
unused parameters were removed, although they can still be defined in the config
file. Updated documentation on the relevant parts.
2018-07-16 15:58:16 +03:00
77a1417479 Replace TR1 headers with standard headers
Now that the C++11 standard is the default one, we can remove the TR1
headers and classes.
2018-07-11 14:08:46 +03:00
e6cf20ea29 Fix mmmon hang
The iteration of servers would never exit.
2018-07-09 12:10:36 +03:00
9d94230237 Assign status bits only for running servers
In previously the status bits were assigned only for running servers. Due
to the changes done in the monitoring algorithm, the slave and master
status bits are assigned to servers that are down. This change broke a
number of tests and deviates from previous behavior.

To keep the old behavior and to fix the test, the status bits are not
assigned to servers that are down.
2018-07-09 12:10:36 +03:00
34f61bc4f2 Close connections before starting loop
The connections should be closed after the check queries.
2018-07-03 10:32:06 +03:00
03491a45f0 Remove old code
The functionality is elsewhere.
2018-07-03 10:32:06 +03:00
fd31c9cced MXS-1905 Set slaves with low disk space to maintenance
Also, servers in maintenance are updated just as other servers.
2018-07-02 14:24:57 +03:00
a59c0c61ce Remove depth field from SERVER
It was not really used anymore.
2018-06-29 10:54:34 +03:00
960d08a36a Code cleanup
Removed unused code.
2018-06-29 10:54:34 +03:00
0afcd4b468 Fix test-program failures
Due to recent changes, mxs::MessageQueue::init() must be called
explicitly, if a monitor is created.
2018-06-29 10:43:49 +03:00
9525d3507b Run manual commands without stopping the monitor
The command is saved in a function object which is read by the monitor
thread. This way, manual and automatic cluster modification commands are
ran in the same step of a monitor cycle.

This update required several modifications in related code.
2018-06-28 16:56:41 +03:00
6bf10904d7 MXS-1845 Only rebuild topology when required
The monitor now detects when a server has changed such that a replication
graph rebuild is needed and only then rebuilds the graph and detects
cycles and master.

Also, some old code is no longer called in the monitor cycle. It will be
removed in later commits. Refactored some of the related functions.
2018-06-28 16:56:41 +03:00
cc0299aee6 Update change date of 2.3 2018-06-25 10:07:52 +03:00
8bd9e1d473 Check monitor permissions when reconnecting to server
Previously, the permissions would only be checked at monitor start.
Now, the permissions are checked if [Auth Error] is on or server
was reconnected.
2018-06-20 17:54:46 +03:00
58207ec414 MXS-1775 Check disk space warning bit when selecting a new master for failover
This also applies to autoselect switchover. The disk space warning has the least
priority, as the other criteria could lead to replication failures. Also, print
the reason the new master was selected over the second best candidate.
2018-06-18 17:59:19 +03:00
019d62bbb8 MXS-1886 Better auto-rejoin error description and tolerance
Contains changes from commit 09df01752812444c6e7c409a8957d292f7de63cf
adapted to the 2.3 branch.
2018-06-18 16:35:28 +03:00
d3e9cc9a4f MXS-1886 Auto-failover error tolerance
Contains changes from commit 9e68d8ec3ddf1621f533067021c4b3042f695e80
adapted to the 2.3 branch.
2018-06-18 16:35:28 +03:00
9d961ece3a Clear galeramon server info in pre_tick
The server information in galeramon is gathered every monitoring
interval. To prevent stale information from being used, the server
information needs to be cleared at the start of each monitoring interval.
2018-06-18 14:25:05 +03:00