Commit Graph

176 Commits

Author SHA1 Message Date
56c84541df MXS-1712 Add reset replication to MariaDB Monitor
The 'reset_replication' module command deletes all slave connections and binlogs,
sets gtid to sequence 0 and restarts replication from the given master. Should be
only used if gtid:s are incompatible but the actual data is known to be in sync.
2018-09-14 17:15:05 +03:00
cb54880b99 MXS-1937 Cleanup event handling
Event handling is now enabled by default. If the monitor cannot query the EVENTS-
table (most likely because of missing credentials), print an error suggesting to
turn the feature off.

When disabling events on a rejoining standalone server (likely a former master),
disable binlog event recording for the session. This prevents the ALTER EVENT
queries from generating binlog events.

Also added documentation and combined similar parts in the code.
2018-09-14 16:54:24 +03:00
108638b0cf Format with Uncrustify 0.67 2018-09-10 13:31:39 +03:00
d11c78ad80 Format all sources with Uncrustify
Formatted all sources and manually tuned some files to make the code look
neater.
2018-09-10 13:22:49 +03:00
c447e5cf15 Uncrustify maxscale
See script directory for method. The script to run in the top level
MaxScale directory is called maxscale-uncrustify.sh, which uses
another script, list-src, from the same directory (so you need to set
your PATH). The uncrustify version was 0.66.
2018-09-09 22:26:19 +03:00
f3fbb297a4 MXS-1937 Enable server events on new master during switchover and failover
Combined some of the shared code in enable/disable_events(). Also disables
events on a joining standalone server.
2018-09-05 15:54:08 +03:00
7e6ce2d13f MXS-1937 Disable server events on master during switchover
The feature is behind a config setting.
2018-09-05 15:54:08 +03:00
7cd1cfdb80 Relay log waiting is part of failover_prepare
Since the servers are not modified before or during the wait, the waiting
can be done in the preparation method. This simplifies the actual failover
somewhat, and allows the monitor to keep running normally while waiting for
the log to clear.
2018-08-30 17:07:34 +03:00
c39177bc8d Relay log clear supports multiple slave connections
Now waits for the relay log of the correct slave connection.
2018-08-29 17:07:52 +03:00
85d8a85cde Update master failure detection from slaves
The detection now works with multiple slave connections.
2018-08-29 17:07:52 +03:00
91ab59530f Use pending status in external master checks
When the replication status from the external master is checked, the
pending status must be used. This makes sure that the SlaveStatusArray and
the server state are sync.

Also extended the message that was logged when the external master was
lost. By adding the network address there, it makes it easier to see where
the server was replicating from if only the log file is available.
2018-08-23 15:46:45 +03:00
61bb172033 Cleanup failover/switchover
Replication settings warnings are printed once more. Changed some
parameter names to be more consistent within the monitor.
2018-08-23 10:28:47 +03:00
f2dfd39f79 Clean up JSON diagnostics
Now prints all slave connections.
2018-08-22 12:23:28 +03:00
3777da96bd Miscellaneous cleanup
Removes needless status assignments and unused code. Moves and modifies some comments.
2018-08-22 12:11:33 +03:00
3f53eddbde MXS-2020 Replace ss[_info]_dassert with mxb_assert[_message] 2018-08-22 11:34:59 +03:00
03cefcc4ac MXS-2012 Write replication lag to SERVER
Allows routers to read the value.
2018-08-21 11:51:10 +03:00
1c508cd413 MXS-2012 Read and print Seconds_Behind_Master
Replaces the old replication lag detection.
2018-08-21 11:51:10 +03:00
b408894f6d MXS-2004 Remove dependency of maxscale/thread.h 2018-08-13 13:38:39 +03:00
681c456bd7 Separate unknown server version from old versions
This allows better failover support detection.
2018-08-13 11:30:21 +03:00
b7c94abb34 Keep track of previously observed slave connections
This reduces the ambiguity of server id:s in the slave status contents.
If a slave connection has been seen properly connected at an earlier time,
it can be trusted to report the correct master server id. This also
fixes some wrong status assignment edge cases with the SERVER_WAS_SLAVE-bit.
The bit will be removed in a later commit.

Even this does not solve the situation when MaxScale is started with
some servers down.
2018-08-09 20:39:19 +03:00
17c84a22c7 Refactor preparations to failover
The two operations are quite similar so the code should look
similar as well and use shared functions.
2018-08-07 16:33:56 +03:00
0a81f78442 Use unique pointer instead of auto-pointer 2018-08-06 13:24:05 +03:00
c0bd5ca3a1 MXS-1905 Switchover if master is low on disk space
Required quite a bit of refactoring.
2018-08-06 13:24:05 +03:00
836db54800 Clean up server status printing
Uses mostly the status functions for reading the flags. Strickly
speaking this breaks the REST API since in some cases (status combinations)
the printed string is different from what was printed before.
2018-08-02 10:42:12 +03:00
1e33ab69f2 Rename server_is_running() to server_is_usable()
The previous name was misleading. The new server_is_running() only
checks for the running bit so that a server is always either running
or down.
2018-07-31 14:53:56 +03:00
18bfca0533 Define inline functions for status variables
The functions are used in MariaDB Monitor.
2018-07-27 11:20:23 +03:00
fbce38878b Turn server status macros to functions 2018-07-25 11:19:47 +03:00
c9570ff616 Check failover applicability to the cluster every turn
This should give an advance warning if a user tries to activate auto_failover
on a cluster which does not support it.
2018-07-20 15:33:47 +03:00
0750e93eeb Fix verify_master_failure
The master failure verification would not work if the slaves did not have
a state change since MaxScale had started. This can be fixed by treating
the startup of MaxScale as an event of sorts.
2018-07-17 11:52:19 +03:00
936bcde135 Remove old "detect_standalone_master"-feature, update documentation
The auto_failover is a more reliable solution and should be used instead. Several
unused parameters were removed, although they can still be defined in the config
file. Updated documentation on the relevant parts.
2018-07-16 15:58:16 +03:00
9525d3507b Run manual commands without stopping the monitor
The command is saved in a function object which is read by the monitor
thread. This way, manual and automatic cluster modification commands are
ran in the same step of a monitor cycle.

This update required several modifications in related code.
2018-06-28 16:56:41 +03:00
6bf10904d7 MXS-1845 Only rebuild topology when required
The monitor now detects when a server has changed such that a replication
graph rebuild is needed and only then rebuilds the graph and detects
cycles and master.

Also, some old code is no longer called in the monitor cycle. It will be
removed in later commits. Refactored some of the related functions.
2018-06-28 16:56:41 +03:00
cc0299aee6 Update change date of 2.3 2018-06-25 10:07:52 +03:00
8bd9e1d473 Check monitor permissions when reconnecting to server
Previously, the permissions would only be checked at monitor start.
Now, the permissions are checked if [Auth Error] is on or server
was reconnected.
2018-06-20 17:54:46 +03:00
019d62bbb8 MXS-1886 Better auto-rejoin error description and tolerance
Contains changes from commit 09df01752812444c6e7c409a8957d292f7de63cf
adapted to the 2.3 branch.
2018-06-18 16:35:28 +03:00
d3e9cc9a4f MXS-1886 Auto-failover error tolerance
Contains changes from commit 9e68d8ec3ddf1621f533067021c4b3042f695e80
adapted to the 2.3 branch.
2018-06-18 16:35:28 +03:00
2f987d0b10 MXS-1845 Only select a master if current master is no longer usable
The purpose is to make the selected master server sticky. The master is reselected only
if the current master is no longer a valid master.
2018-06-18 11:06:58 +03:00
5324a1bdaa MXS-1845 Assign server roles
Assign server roles (master, slave, relay master, slave of external master)
for a graph with possibly multiple paths to a slave server.
2018-06-13 17:38:53 +03:00
3f82c25c62 MXS-1845 New algorithm for finding the master server
Not yet used, as more is needed to replace the old code. The
algorithm is based on counting the total number of slave nodes
a server has, possibly in multiple layers and/or cycles.
2018-06-13 17:38:33 +03:00
2481de260f Move monitor-dependent code in MariaDBServer to MariaDBMonitor
Removes Monitor-dependency from the MariaDBServer-class.
2018-06-06 22:28:38 +03:00
f600b3a769 MXS-1775 Check disk space in MariaDBMonitor 2018-06-06 15:25:57 +03:00
37841183b3 Cleanup server.h
Renamed, rearranged and clarified status bits. Removed unused macros.
2018-06-01 14:29:51 +03:00
4d7aff4ab9 MXS-1845 Find strongly connected components with multiple slave connections
Rewrote the algorithm for clarity.
2018-06-01 14:04:50 +03:00
3ec449339f Only write to SERVER->status at the end of a monitoring loop
This makes the code clearer and reduces race conditions, as the monitor
could be writing SERVER->status while a router is reading it. Also,
the time during which the SERVER struct is locked drops to a fraction.
2018-05-23 16:19:08 +03:00
a31549221b MXS-1845 Print all slave connections at 'show monitors'
The format has changed.
2018-05-18 13:48:08 +03:00
45da5a08d9 MXS-1845 Cleanup monitor backend updating
Now detects errors and prints them.
2018-05-18 13:48:08 +03:00
b71710e066 MXS-1865 Combine server version and type
Binlog server is now handled better.
2018-05-16 13:55:45 +03:00
b29bae6e84 MXS-1865 Update server version only when (re)connecting
Updating it every iteration is needless.
2018-05-16 13:55:45 +03:00
c6b8277281 Cleanup server status querying
No functional changes except that status changes are no longer logged when
querying. It was bugged to begin with.
2018-05-14 11:32:02 +03:00
df4454027a Clean up monitor initialization and destruction
Since monitors are now freed at MaxScale exit, the server data should be freed. Also,
gtid domain variables are now initialized with a common constant.
2018-05-14 11:32:02 +03:00