40 Commits

Author SHA1 Message Date
Markus Mäkelä
f41111b4bd MXS-1517: Retain stale master bit even on master failure
If a server goes down and it has the stale master bit enabled, all other
bits for the server are cleared. This allows failed masters that have been
replaced to be first detected and then reintroduced into the replication
topology.
2017-11-14 16:53:09 +02:00
Esa Korhonen
3a13469691 MXS-1490 Fix bug with gtid_io_pos change check
The conditional was the opposite of what was intended.
2017-11-08 10:46:51 +02:00
Esa Korhonen
a1a5947d61 MXS-1490: Parse Gtid_IO_Pos only when using Gtid
First check "Using_Gtid", as that should be always valid. If set to
"Slave_Pos", parse "Gtid_IO_Pos".
2017-11-08 10:46:51 +02:00
Johan Wikman
4cf01fa88f Remove 'failover_script' parameter
As the failover is now internal to MySQL Monitor, no failover
script parameter is needed.
2017-11-07 16:05:44 +02:00
Markus Mäkelä
dce073a684 MXS-1496: Don't assign slave status for masters
The slave and stale slave status bits should be cleared from a master if
it still has them.

Also used the correct functions to manipulate the bits instead of directly
setting them in the monitor.
2017-11-07 15:52:28 +02:00
Esa Korhonen
84e95cee96 MXS-1490: Query gtid_slave_pos only during failover
The value of the global gtid_slave_pos is only needed during
failover, so querying it every monitor loop is unnecessary. The
value is now only requested when deciding on a new master server
or when waiting for the selected promotion target to clear its
relay logs.

Also, when waiting for the logs to clear, gtid_io_pos must stay
constant or failover is cancelled. Io_pos advancing indicates that
the server is still receiving events from the old master.
2017-11-07 13:09:51 +02:00
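The wait described in 84e95cee96 can be sketched roughly as below. This is only an illustration, not the monitor's code: `GtidPositions`, `WaitResult`, `wait_for_relay_log` and the injected `poll` callback are hypothetical names, and positions are reduced to sequence numbers within a single GTID domain.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Hypothetical snapshot of one slave's positions (single domain assumed).
struct GtidPositions
{
    uint64_t io_seq;    // latest event received (from Gtid_IO_Pos)
    uint64_t slave_seq; // latest event applied (from gtid_slave_pos)
};

enum class WaitResult { CLEARED, IO_POS_MOVED, TIMEOUT };

// Poll the promotion target until it has applied everything it received.
// The wait is abandoned if the IO position advances, since that means the
// server is still receiving events from the old master.
WaitResult wait_for_relay_log(const std::function<GtidPositions()>& poll,
                              std::chrono::milliseconds timeout,
                              std::chrono::milliseconds interval)
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    const GtidPositions first = poll();
    GtidPositions pos = first;

    while (true)
    {
        if (pos.io_seq != first.io_seq)
        {
            return WaitResult::IO_POS_MOVED;
        }
        if (pos.slave_seq >= pos.io_seq)
        {
            return WaitResult::CLEARED;
        }
        if (std::chrono::steady_clock::now() >= deadline)
        {
            return WaitResult::TIMEOUT;
        }
        std::this_thread::sleep_for(interval);
        pos = poll();
    }
}
```

Injecting the poll function keeps the sketch independent of any database client; in a real monitor it would run the queries against the candidate server.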
Esa Korhonen
0bb54511b7 MXS-1490: Query binlog & gtid settings, read @@gtid_slave_pos
The Gtid_Slave_Pos returned by SHOW ALL SLAVES STATUS is not quite
reliable (MDEV-14182) so the variable version is used instead. Added
a convenience function for querying a single row of values.

Also, gtid_strict_mode, log_bin and log_slave_updates are now
queried during failover. The first only causes a warning message
if disabled, the last two affect new master selection.
2017-11-06 12:23:35 +02:00
Johan Wikman
2115ad7911 Make lines <= 110 chars long 2017-11-02 09:29:24 +02:00
Esa Korhonen
e79a95cd96 MXS-1490: Parse Gtid-strings with multiple triplets
Gtid_Slave_Pos may contain multiple triplets even with single-source
replication if the domain has changed at some point. For failover, we
only need to know the current domain values, so the gtid-parsing now
accepts an optional domain parameter. The Gtid-class still only stores
one triplet of values.

When parsing the Show Slave Status result, Gtid_IO_Pos is parsed first.
The resulting domain is then read from Gtid_Slave_Pos.
2017-11-01 14:43:13 +02:00
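A minimal sketch of the multi-triplet parsing described in e79a95cd96, assuming the usual MariaDB text form `domain-server_id-sequence` with triplets separated by commas. The `Gtid` struct and `parse_gtid` function here are illustrative stand-ins, not the definitions in mysqlmon.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Illustrative stand-in: stores a single triplet, as the commit describes.
struct Gtid
{
    uint32_t domain = 0;
    uint32_t server_id = 0;
    uint64_t sequence = 0;
};

// Parse "d-s-n[,d-s-n...]". If 'domain' is negative, the first triplet is
// returned; otherwise the triplet with the matching domain. Returns false
// on malformed input or when no triplet matches.
bool parse_gtid(const std::string& str, Gtid* out, int64_t domain = -1)
{
    std::istringstream list(str);
    std::string item;

    while (std::getline(list, item, ','))
    {
        std::istringstream triplet(item);
        Gtid gtid;
        char dash1 = 0;
        char dash2 = 0;
        triplet >> gtid.domain >> dash1 >> gtid.server_id >> dash2 >> gtid.sequence;

        if (triplet.fail() || dash1 != '-' || dash2 != '-')
        {
            return false;
        }
        if (domain < 0 || gtid.domain == domain)
        {
            *out = gtid;
            return true;
        }
    }
    return false;
}
```

Following the commit message, a caller would parse Gtid_IO_Pos without a domain first and then pass the resulting domain when parsing Gtid_Slave_Pos.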
Esa Korhonen
0f2c1ff7d6 MXS-1490: Wait for a slave to clear relay logs before promotion
When selecting the new master server, Gtid_IO_Pos is checked to
select the slave with the latest event in relay log. If there is a
tie, the slave that has processed most events wins.

It's possible that the winning slave has unprocessed events. In
this case, failover waits for the slave to complete processing the
log. The maximum wait is defined in the monitor parameter
"failover_timeout", defaulting to 90 seconds. If time runs out,
failover ends in failure.

The Gtid struct was separated into its own definition to make GTIDs
easier to handle.
2017-10-31 18:27:16 +02:00
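The selection rule in 0f2c1ff7d6 (latest received event wins, ties broken by most processed events) could look something like the following. `SlaveCandidate`, `select_new_master` and `relay_log_cleared` are hypothetical names, and positions are simplified to sequence numbers in one GTID domain.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-slave summary used only for this illustration.
struct SlaveCandidate
{
    std::string name;
    uint64_t io_seq;    // sequence of the latest event in the relay log
    uint64_t slave_seq; // sequence of the latest processed event
};

// Pick the slave with the latest received event; on a tie, prefer the one
// that has processed the most events. Returns nullptr for an empty list.
const SlaveCandidate* select_new_master(const std::vector<SlaveCandidate>& slaves)
{
    const SlaveCandidate* best = nullptr;

    for (const SlaveCandidate& s : slaves)
    {
        if (!best
            || s.io_seq > best->io_seq
            || (s.io_seq == best->io_seq && s.slave_seq > best->slave_seq))
        {
            best = &s;
        }
    }
    return best;
}

// True when the candidate has no unprocessed events left in its relay log.
bool relay_log_cleared(const SlaveCandidate& s)
{
    return s.slave_seq >= s.io_seq;
}
```

If the winner still has unprocessed events, failover would wait for `relay_log_cleared` to become true, bounded by "failover_timeout".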
Esa Korhonen
41cd0cd6d7 MXS-1490 Separate SlaveStatus information to its own class
The SlaveStatus info is now in a separate class, although it's
still embedded in the MYSQL_SERVER_INFO-class. Both classes now
use strings instead of char* pointers.
2017-10-30 10:33:41 +02:00
Markus Mäkelä
c7c670930c MXS-1493: Check that master appears dead before verifying it
Before the verification of the master's failure is done, the master must
first appear to have failed.
2017-10-27 15:31:46 +03:00
Markus Mäkelä
0bc439641a Add helper function for reading values by field name
The helper function provides map-like access to row values. This is used
to retrieve the values for all MariaDB 10.0+ versions as there are
differences in the returned results between 10.1 and 10.2.
2017-10-27 15:31:46 +03:00
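The idea behind the helper in 0bc439641a, looking values up by column name so that column order differences between server versions do not matter, can be illustrated without any client library. The function names below are hypothetical and the real helper works on the MySQL result set rather than on string vectors.

```cpp
#include <map>
#include <string>
#include <vector>

// Build a name -> value map from parallel lists of field names and row
// values. Returns an empty map if the two lists do not line up.
std::map<std::string, std::string> row_by_field_name(const std::vector<std::string>& names,
                                                     const std::vector<std::string>& values)
{
    std::map<std::string, std::string> row;

    if (names.size() == values.size())
    {
        for (size_t i = 0; i < names.size(); i++)
        {
            row[names[i]] = values[i];
        }
    }
    return row;
}

// Read one field, falling back to a default when the server version does
// not return that column.
std::string get_field(const std::map<std::string, std::string>& row,
                      const std::string& name,
                      const std::string& fallback = "")
{
    std::map<std::string, std::string>::const_iterator it = row.find(name);
    return it != row.end() ? it->second : fallback;
}
```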
Markus Mäkelä
2d1e5f46fa Remove use of timestamps in failover code
Using timestamps to detect whether MaxScale was active or passive can
cause problems if multiple events happen at the same time. This can be
avoided by separating events into actively observed and passively observed
events. This clarifies the logic by removing the ambiguity of timestamps.

As the monitoring threads are separate from the worker threads, it is
prudent to use atomic operations to modify and read the state of
MaxScale. This will impose a happens-before relation between MaxScale
being set into passive mode and events being classified as being passively
observed.
2017-10-27 15:31:46 +03:00
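As a rough illustration of the approach in 2d1e5f46fa, the passive flag can be published with release semantics and read with acquire semantics, and each event tagged at observation time instead of compared by timestamp. This sketch uses `std::atomic`; the class and function names are hypothetical and MaxScale's own atomic helpers may differ.

```cpp
#include <atomic>

// How an event was observed by the monitor.
enum class Observation { ACTIVE, PASSIVE };

// Hypothetical holder for the passive/active state shared between the
// worker threads (writers) and the monitor threads (readers).
class MaxScaleMode
{
public:
    void set_passive(bool passive)
    {
        // Release: everything written before this store is visible to a
        // thread that later observes the new value with an acquire load.
        m_passive.store(passive, std::memory_order_release);
    }

    bool is_passive() const
    {
        return m_passive.load(std::memory_order_acquire);
    }

private:
    std::atomic<bool> m_passive{false};
};

// Tag an event when it is seen, rather than deriving it from timestamps.
Observation classify_event(const MaxScaleMode& mode)
{
    return mode.is_passive() ? Observation::PASSIVE : Observation::ACTIVE;
}
```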
Markus Mäkelä
52473c379b Extract Gtid_Slave_Pos in mysqlmon
The string form value of Gtid_Slave_Pos is extracted into different
integer components.
2017-10-27 15:31:46 +03:00
Markus Mäkelä
0be39b8545 MXS-1493: Improve master failure detection
The master failure can now be verified by checking when the slaves are
connected to the master. If the slaves do not receive any events from the
master, the connections are considered as down after a configurable limit.

Added two parameters for controlling whether the check is done and for how
long the monitor waits before doing the failover.
2017-10-27 15:31:18 +03:00
Markus Mäkelä
26b47d0b90 MXS-1493: Collect slave heartbeats
The slave heartbeat count and period are collected from the SHOW ALL
SLAVES STATUS output. This, in addition to the relay log position, is used
to calculate the point in time when a slave has last interacted with the
master.

By using this timestamp, the monitor can enforce a minimum "timeout" for
the master before a failover is performed.
2017-10-27 15:30:38 +03:00
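One way to picture the heartbeat-based check from 26b47d0b90 and 0be39b8545: remember the last seen heartbeat count and relay log position per slave, move a timestamp forward whenever either changes, and treat the master link as down once that timestamp is older than the configured limit. The struct, functions and the numeric relay log position are assumptions made for this sketch.

```cpp
#include <cstdint>
#include <ctime>

// Hypothetical per-slave bookkeeping.
struct SlaveActivity
{
    uint64_t heartbeats = 0;    // received heartbeat count
    uint64_t relay_log_pos = 0; // relay log position, simplified to a number
    time_t last_event = 0;      // when the slave last heard from the master
};

// Record a new sample; the timestamp moves only if something changed.
void update_activity(SlaveActivity& a, uint64_t heartbeats, uint64_t relay_log_pos, time_t now)
{
    if (heartbeats != a.heartbeats || relay_log_pos != a.relay_log_pos)
    {
        a.heartbeats = heartbeats;
        a.relay_log_pos = relay_log_pos;
        a.last_event = now;
    }
}

// True when the slave has not interacted with the master for longer than
// the given number of seconds.
bool master_link_timed_out(const SlaveActivity& a, time_t now, int timeout_s)
{
    return std::difftime(now, a.last_event) > timeout_s;
}
```

A failover would then be considered only when the master appears down and every slave reports a timed-out link.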
Esa Korhonen
48a15368d0 MXS-1490-1492: First version of failover script
Works in ideal situations and can be tested. Does not consider
relay log and only checks that commands were received by a backend.
Work in progress.
2017-10-27 10:54:50 +03:00
Markus Mäkelä
114ea49e10 MXS-1494: Add missing replication credentials parameters
The parameters weren't added to the list of module parameters.
2017-10-26 17:37:02 +03:00
Esa Korhonen
63c7550196 MXS-1490 Prepare for failover functionality addition
Moved mon_process_failover() from monitor.cc to mysql_mon.cc. Renamed
some functions and variables related to previous failover functionality
to avoid confusion.
2017-10-25 12:24:29 +03:00
Markus Mäkelä
554ae642d7 MXS-1495: Add failover sanity check
The sanity check disables the failover functionality if a server is
configured to replicate from more than one source.
2017-10-24 23:45:23 +03:00
Markus Mäkelä
c3ff2aa1e9 MXS-1495: Move the MYSQL_SERVER_INFO extraction into a function
The get_server_info function takes the monitor handle and a database and
returns the corresponding MYSQL_SERVER_INFO struct. This hides a part of
the actual implementation of the info struct from the monitor code,
allowing future refactoring to be done. It also makes the code a bit more
readable.
2017-10-24 23:44:59 +03:00
Markus Mäkelä
95ac9d501c MXS-1494: Add replication credentials to mysqlmon
The credentials used for slave servers can now be controlled with the
replication_user and replication_password parameters.
2017-10-24 23:44:46 +03:00
Markus Mäkelä
75a2e190b2 Add function for updating the MYSQL_SERVER_INFO struct
The values in the MYSQL_SERVER_INFO struct can now be updated with the
update_slave_status function.

Also moved the number of configured and running slave configurations into
the info struct. This removes the need to pass output parameters.
2017-10-24 15:43:03 +03:00
Markus Mäkelä
3cefb53e1d Split server state and info processing into two
The MYSQL_SERVER_INFO struct is updated first and then the server status
is updated. This allows the function to be called without it affecting the
server state.
2017-10-24 15:27:36 +03:00
Johan Wikman
df816ea2a9 MXS-1460 Add failover_script parameter
The failover script can now be specified in the configuration file.
2017-10-03 15:24:29 +03:00
Markus Mäkelä
8c3c103060 Merge branch '2.2' into 2.2-mrm 2017-10-03 14:52:21 +03:00
Markus Mäkelä
7ca8db14de MXS-1444: Add monitor parameter alteration
The parameter handling for monitors can now be done in a consistent manner
by establishing a rule that the monitor owns the parameter object as long
as it is running. This will allow parameters to be added and removed
safely both from outside and inside monitors.

Currently this functionality is only used by mysqlmon to disable failover
after an attempt to perform a failover has failed.
2017-10-03 14:50:20 +03:00
Markus Mäkelä
27d1be7f96 Merge branch '2.2' of github.com:mariadb-corporation/MaxScale into 2.2 2017-10-03 14:46:14 +03:00
Markus Mäkelä
bd39284f9c Merge branch '2.1' into 2.2 2017-10-03 14:30:06 +03:00
Johan Wikman
267a45ad63 MXS-1441 Add switchover_script parameter
If a switchover_script parameter is given, its value will be used as
the switchover script. Otherwise the default, currently just echo,
will be used.

The MySQL Monitor now introduces two script variables, CURRENT_MASTER
and NEW_MASTER, that contain information about the current and new
master respectively.

Switchover is performed only if switchover has been enabled and MaxScale
is *not* in passive mode.
2017-10-03 13:51:08 +03:00
Johan Wikman
8d1c4bdd56 MXS-1441 Use monitor_launch_script for performing switchover
To be able to do that, we need to get hold of the MXS_MONITORED_SERVER
corresponding to the SERVER specified as the new master.

So, instead of just returning a boolean indicating whether the server was
found or not, we return the MXS_MONITORED_SERVER pointer.
2017-10-03 09:28:42 +03:00
Johan Wikman
f29d8209cc MXS-1441 Create proper json error objects 2017-10-02 16:08:12 +03:00
Johan Wikman
438b4e0341 Merge branch '2.2' into 2.2-mrm 2017-10-02 15:49:08 +03:00
Johan Wikman
68432bbaa3 Rename MXS_MONITOR::databases to MXS_MONITOR::monitored_servers
More descriptive name. Some local variables could now also be
renamed to be more descriptive, but that's for another day.
2017-10-02 15:33:58 +03:00
Johan Wikman
8d03876e3e Rename MXS_MONITOR_SERVERS to MXS_MONITORED_SERVER
An element in a linked list is not a list.
2017-10-02 15:05:17 +03:00
Johan Wikman
a81d85fba2 MXS-1441 Add switchover logic
Switchover expects one or two servers as argument, one (the new
master) if there is no master and two (the new master, and the
current master) if there currently is a master.

The procedure is as follows:
- Stop monitor
- Check that provided arguments are reasonable.
  - If there is no master currently, then only one argument is
    accepted.
  - If there is a master, then it must also be specified.

  This is to prevent pathological cases where the situation has
  changed after the admin has issued the switchover command.
- Check the failover mode and disable it.
- Perform the failover.
- If it succeeded, re-enable failover if it was enabled.
- If it failed and failover was enabled, do not re-enable it and log
  an alert. If failover was not enabled, just log an error.
2017-09-29 16:19:12 +03:00
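The argument check from a81d85fba2 can be sketched as follows. `switchover_args_ok` is a hypothetical name, servers are represented by plain names, and an empty `current_master` stands for "no master at the moment".

```cpp
#include <string>
#include <vector>

// args[0] is the new master; args[1], when present, is the master the
// admin believes is current. Returns false and sets 'error' if the
// arguments do not match the situation the monitor sees.
bool switchover_args_ok(const std::vector<std::string>& args,
                        const std::string& current_master,
                        std::string* error)
{
    if (current_master.empty())
    {
        if (args.size() != 1)
        {
            *error = "No master exists, so only the new master may be given.";
            return false;
        }
    }
    else if (args.size() != 2)
    {
        *error = "A master exists, so it must be given as the second argument.";
        return false;
    }
    else if (args[1] != current_master)
    {
        *error = "The given current master is not the actual master.";
        return false;
    }

    error->clear();
    return true;
}
```

Requiring the current master to be named explicitly is what guards against the situation having changed after the admin issued the command.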
Johan Wikman
f87a878073 Set monitor state before thread launch
When the monitor is started, the state is immediately updated
to running, and not only when the thread actually has started
executing.
2017-09-29 13:10:49 +03:00
Johan Wikman
89d1f81e37 Merge branch '2.2' into 2.2-mrm 2017-09-28 15:19:20 +03:00
Johan Wikman
9c60b68476 Convert mysql_mon.c to mysql_mon.cc 2017-09-28 15:17:27 +03:00