426 Commits

Author SHA1 Message Date
5d4775cac1 MXS-2168 Update test_cycle_find
The test now uses both server IDs and hostname:port combinations.
2018-11-21 10:40:21 +02:00
90da3a4d90 Remove MXS_MONITORED_SERVER mapping from MariaDBMon
The mapping was rarely used.
2018-11-21 10:30:11 +02:00
1a046bd453 MXS-2168 Add 'assume_unique_hostnames'-setting to MariaDBMonitor
Adds the setting and uses it during replication graph creation
and in the most important checks.
2018-11-21 10:30:11 +02:00
bba0bc0f31 MXS-2158 Relax requirements for manual rejoin
The operation is now allowed even if the rejoining server has empty gtids.
Auto-rejoin keeps the safeties on.
2018-11-16 13:03:30 +02:00
6a1cfddb43 MXS-2158 Clean up gtid updating during rejoin
Error messages from update_gtids() are now printed. can_replicate_from()
no longer updates gtids.
2018-11-16 12:56:24 +02:00
a377a9fc5a Add gtid event in reset-replication
Adds a "FLUSH TABLES" command at the end so that the new master has a non-empty
gtid_binlog_pos after the operation.
2018-11-14 11:01:48 +02:00
14e38e4e08 MXS-2158 Return true if update_gtids() succeeds, even if no data is returned
Previously, if the server had no gtids, the method would fail, leading to
a confusing error message. This could even totally stop the monitor from working
if a recent server version (10.X) did not have any gtid events.
2018-11-14 10:56:42 +02:00
ecc7442358 Detect manual commands faster
Previously, MariaDBMonitor would wait until the next monitor interval before detecting
a new manual command. The commands are now checked every 100 ms.
2018-11-08 19:12:00 +02:00
fb3ccc94d6 MXS-1944 Cleanup function parameters
The naming and ordering are now a bit more consistent between promote() and demote().
2018-11-07 12:55:59 +02:00
e4e2235297 MXS-1944 Use time limited methods in rejoin
Uses switchover time limit, since the typical rejoin of a standalone server
is somewhat similar to a switchover.
2018-11-07 12:55:59 +02:00
184e187732 Different cluster operations use different parameter types
Only the parameters used by all operations are in the common class.
2018-11-07 12:55:59 +02:00
a4ce4e4613 MariaDBServer no longer uses ClusterOperation
The functions in the server class now only use the general parameters object.
2018-11-07 12:55:59 +02:00
8877e7180b Continue separation of ClusterOperation elements 2018-11-07 12:55:59 +02:00
90e6ff078a Divide ClusterOperation into two types
The main class was getting unwieldy and too general. Dividing the fields
helps add support for other operation types.

This commit leaves most data duplicated; later commits clean up the affected code.
2018-11-07 12:55:59 +02:00
11a756a028 Detect undefined references at link time
Instruct the linker to make sure all symbols are resolved at link time.
2018-11-06 21:34:28 +02:00
c5a54d2fe9 Disallow switchover promotion of a server low on disk space
This protects the user and also prevents a never-ending series of
automatic switchovers in the case when all servers are low.
2018-11-06 11:44:50 +02:00
906d8cee5b Format all files
Formatted all files with uncrustify.
2018-10-31 09:46:02 +02:00
36b666898c Fix connection merging
The conditional was inverted.
2018-10-16 16:09:38 +03:00
2d61b78439 Fix low disk space maintenance
The setting didn't work because the code updated a status flag which
would be overwritten before being read. Also, promotion code now checks
that the server is not in maintenance.
2018-10-16 16:09:38 +03:00
0c203fa02d Don't redirect duplicate connections
The redirection method checks if a slave connection to the redirection
target already exists. If so, the connection is not modified. Also, failover
better detects duplicate connections during promotion.
2018-10-16 16:09:38 +03:00
e930270b9c Use copy when checking removed connections
The function modifies the reference parameter contents indirectly.
2018-10-16 16:09:38 +03:00
f554ef770b Allow switchover for arbitrary topologies
The demoted server no longer needs to be the master.
2018-10-16 16:09:38 +03:00
0cf8ea43f7 Redirect slaves of promotion target
This affects situations where the promoted server is a relay or multimaster
group member.
2018-10-16 16:09:38 +03:00
2f76c48b06 Clarify server version error message
The required server version is now printed.
2018-10-11 11:41:46 +03:00
c0945020ee Only running slave connections are checked for non-gtid replication
This prevents auto-failover from being disabled due to recently generated or
non-functional slave connections.
2018-10-11 11:41:46 +03:00
f2067fcf7c Monitor cleanup
Removes unused code, compacts lines, moves code.
No functional changes.
2018-10-11 11:39:05 +03:00
d0444ff054 SlaveStatus::to_short_string() uses member field
The owner server name is now stored in a field.
2018-10-10 17:26:48 +03:00
2f1512a22d Cleanup slave connection removal during promotion/demotion
Connection removal and slave status updating are now separated into a function.
As the MariaDBServer object now contains the updated slave connections,
keeping track of removed connections is no longer required.
2018-10-09 14:29:49 +03:00
c10fab977d Cleanup slave connection copy & merge
The two cases are now separated. In switchover, the promotion and
demotion targets can swap connections between each other without worry.
In failover, the two connection lists must be merged semi-intelligently.

The slave connections of the two servers are now saved to the operation
descriptor object at the start of the operation. This allows slave status
updating during the operation.
2018-10-09 14:29:49 +03:00
4d6f961695 Cleanup mariadbserver.hh 2018-10-09 14:29:49 +03:00
5cc4eb08ee Clean up mariadbmon.hh 2018-10-09 14:29:49 +03:00
68d65682b5 Reorganize MariaDBServer code
The server-class keeps growing, so the additional classes are moved out of
the main class file.
2018-10-09 14:29:49 +03:00
75ea1b6ea1 Fix formatting of new(std::nothrow)
The code previously formatted everything as `new( std::nothrow)`.
2018-10-04 21:50:44 +03:00
a398da58a4 Add sleep to execute_cmd_time_limit
If the query fails instantly, the retries end up busy-looping. Now
each try is at least one second.
2018-10-04 20:19:57 +03:00
707506feae A slave must have running slaves to be a relay master
This prevents some questionable status assignments, but also means that
the Relay Master status can be lost if a slave goes down. This is
contrary to Master status which is not lost if slaves go down. Fixes
mxs1961_standalone_rejoin.
2018-10-04 20:16:29 +03:00
374ae2fc9b Only redirect usable slaves
Prevents pointless retrying/waiting when redirecting slaves.
2018-10-04 20:11:12 +03:00
80c731f02a Fix verify_master_failure
The log message had changed, so the test was changed to match. Also, the remaining
delay is now printed.
2018-10-04 13:38:10 +03:00
86ae0c3e4d MXS-1845 Remove unneeded code & cleanup 2018-10-04 13:38:10 +03:00
a1b3a005dd MXS-1845 Relax cluster operation support requirements
Support for more complicated topologies is quite close to completion and
in any case the function was too aggressive.
2018-10-04 13:09:28 +03:00
30eb21914f MXS-1845 Switchover cleanup
Several small changes:
Binlog is flushed at the end of old master demotion.
Only new master is required to catch up to old master.
Use the same replication check method as failover.
2018-10-04 11:45:33 +03:00
49e85d9a28 MXS-1845 Add demotion code
The master demotion in switchover now uses query retrying with
the switchover time limit.
2018-10-04 11:45:33 +03:00
a4747f5b03 Revert the last commit, and an additional fix to the
"Fix code for warnings:" commit.
2018-10-03 17:22:10 +03:00
268e689dc5 Fix code for warnings: unused-but-set-variable and warn_unused_result. 2018-10-03 16:33:24 +03:00
d14b9bfe43 MXS-1845 Cluster stabilization rewrite
No longer writes events to the master, as this creates problems if the
promoted server was not the overall master. Instead, the slave status
output is inspected.
2018-10-02 11:09:16 +03:00
1ca5d02abb MXS-1845 Add redirection code
Should work with multimaster replication.
2018-10-02 11:09:16 +03:00
6b8443aba6 MXS-1845 Complete server promotion code
Now copies slave connections from the previous master. Promotion
code taken into use.
2018-10-01 18:06:39 +03:00
c65edd1298 Enhance StopWatch
Clean up, comments and enhancements. StopWatch lap() didn't mean lap-time, but elapsed time. Changed meaning to lap-time and added split() for split-time.
2018-10-01 09:30:24 +03:00
fe81b399b2 Use maxbase time and clock classes instead of std::chrono 2018-09-27 17:04:59 +03:00
05d18e81ae Use string instead of stringstream
Most of the monitor was already using string for formatted printing.
2018-09-27 16:46:59 +03:00
dd9ff27743 MXS-1845 Rewrite server promotion code
In progress, does not yet overwrite existing code.

The new promotion mechanism automatically retries queries which timed out. It also
handles multimaster situations correctly.
2018-09-26 13:20:29 +03:00