MaxScale

Author	SHA1	Message	Date
Esa Korhonen	b94f3b8792	Failover: do not check cluster stabilization if no slaves Caused debug assert.	2018-02-12 12:37:41 +02:00
Esa Korhonen	b8d3da4968	Add error tolerance to "servers_no_promotion" Previously, if the list contained servers that were not monitored by the monitor yet were valid servers, an error value would be returned and the monitor failed to start. With this update, the non-monitored servers are simply ignored when forming the final list. Also, added printing of the list to diagnostics.	2018-02-12 10:49:28 +02:00
Esa Korhonen	faaf43ff39	Add gtid to monitor diagnostics, clean up formatting Gtid:s are now queried every monitor loop. diagnostics() no longer prints slave related info if the server has no slave connection.	2018-02-10 12:32:56 +02:00
Esa Korhonen	a0d9c7da74	External master server support for failover/switchover If the master is replicating from an external master, the monitor will save the host:port of the external server. During demotion, the old master stops the external replication while the new master begins it. Also, any commands that would add to gtid have to be omitted when an external master is in play.	2018-02-08 18:44:08 +02:00
Markus Mäkelä	ed28d986e9	Fix debug assertion when no master is present If no master is present, the debug assertion would dereference a NULL pointer.	2018-02-08 13:33:31 +02:00
Markus Mäkelä	b059d78a30	Fix master loss on split cluster When four servers (A, B, C and E where E and A replicate from each other and A is the master for B and C) form a cluster and only three of them (A, B and C) are configured into MaxScale, a failover operation from A to B (making B the current master) and a restart of A causes B to lose its master status. The following diagram illustrates the state of the cluster at the end of the process described above. [ASCII diagram: servers A, B, C and E] The external server E was not correctly ignored in the replication topology generation causing both A and B to be seen as the lowest slave nodes in the tree. From a theoretical point of view this is the correct interpretation as there are two distinct trees and neither of them contains any true masters. In practice, MaxScale should treat any servers that replicate from an external master as root level master nodes. Doing this guarantees that they are labeled as masters if they have slaves replicating from them.	2018-02-08 12:48:56 +02:00
Johan Wikman	6c42221c9c	MXS-1654 Prevent crash in debug mode if no master is found	2018-02-07 16:43:13 +02:00
Markus Mäkelä	8558ace801	Clean up ignore_external_masters code Removed redundant checks and cleared only bits that aren't already cleared.	2018-02-07 16:07:16 +02:00
Markus Mäkelä	8bf756ca56	Update root_master when using a standalone master When detect_standalone_master is enabled, the root_master variable was not updated after the master was changed by the standalone server detection mechanism. This caused debug assertions to fire in addition to possibly causing some of the ignore_external_masters logic to break.	2018-02-07 16:07:16 +02:00
Esa Korhonen	1cf3de4a74	Add config parameter for excluding servers from failover "servers_no_promotion" is a comma-separated list of servers which cannot be chosen when selecting a new master during failover (auto or manual), or when automatically selecting a new master for switchover (currently disabled). The servers in the list are redirected normally and can be promoted by switchover when manually selecting a new master.	2018-02-07 14:07:10 +02:00
Esa Korhonen	4a478d31f3	Print Gtid IO position during monitor diagnostics	2018-02-05 17:21:20 +02:00
Markus Mäkelä	f6afb0c6d1	MXS-1643: Make Master and Slave status mutually exclusive The Master status now prevents Slave status from being assigned to a server. In practice this simply means that the master will not have both the Master and Slave status bits.	2018-02-05 16:53:43 +02:00
Esa Korhonen	a83b36ca45	Use 64 bits for storing server id In debug mode, when scanning the server id from a string, check that resulting number is 32bit. Also, when querying the server id, query the global version. Now, if a super user modifies the server id the monitor will notice it. Server id:s in gtid:s are handled similarly.	2018-02-02 11:34:32 +02:00
Esa Korhonen	255250652d	Refactor pre-switchover, add similar checks as in failover Now detects some erroneous situations before starting switchover. Switchover can be activated without specifying current master. In this case, the cluster master server is selected.	2018-01-31 10:40:09 +02:00
Esa Korhonen	e455e8d43c	Simplify monitor start and stop during switchover/failover If these operations failed, the monitor could be left stopped.	2018-01-29 15:44:32 +02:00
Esa Korhonen	9ded584836	Check that all slaves use gtid replication before performing failover	2018-01-29 13:33:16 +02:00
Esa Korhonen	257034bf3e	Clarify master failure verification The two previous functions were somewhat overlapping.	2018-01-23 16:14:50 +02:00
Markus Mäkelä	753b97303a	MXS-1606: Create maxscale_schema database if missing The monitor will now also create the database if it is missing. Since it already creates the table, also creating the database is not a large addition. Cleaned up some of the related checking code and combined them into a simple utility function.	2018-01-19 11:34:25 +02:00
Esa Korhonen	a4f6176ced	Fix bug in printing switchover/failover module command info The string constant passed to the register-function went invalid once CREATE_MODULE() completed, causing random characters to be printed.	2018-01-17 12:01:00 +02:00
Esa Korhonen	23f2c3b980	Better failover timing and redirection success is tested Works similarly to switchover.	2018-01-16 18:05:12 +02:00
Esa Korhonen	c2c898ee93	Fix formatting in MariaDB Monitor	2018-01-16 13:27:20 +02:00
Esa Korhonen	ff2ad05d0a	Add manual rejoin command to MariaDB Monitor The rejoin command takes as parameters the monitor name and name of server to rejoin. This change required refactoring the rejoin code.	2018-01-16 13:20:35 +02:00
Esa Korhonen	5f4db64ac7	Better timing for switchover, check slaves for IO/SQL errors Time elapsed is now properly tracked during a switchover. After slave redirection, an event is added to the master. Then, the slaves are queried repeatedly until they advance to the newest event. I/O and SQL errors are also detected.	2018-01-08 15:23:25 +02:00
Esa Korhonen	047c08f577	MXS-1588: Wait on all slaves during switchover During switchover, MASTER_GTID_WAIT is now called on all slaves. This causes switchover to complete slower than before but is safer if log_slave_updates is not enabled on the new master server. Also, read_only is disabled on the demoted server if waiting on slaves or promotion fails. This should effectively cancel the failover for the old master.	2018-01-03 12:52:33 +02:00
Johan Wikman	9558addbfe	Update module name for mariadbmon	2018-01-02 11:01:28 +02:00
Johan Wikman	d4f9cb661f	MXS-1587 Rename mysqlmon to mariadbmon 'mysqlmon' is still accepted but 'mariadbmon' is loaded instead. This is done at runtime instead of e.g. by using a symbolic link, so that a warning can be logged. The warning is logged and the translation of the module name is made by the code that loads the modules so that it's easy to do the same thing for other modules as well. In a subsequent commit the documentation is updated.	2017-12-27 11:22:27 +02:00
Esa Korhonen	3ccd6eed28	MXS-1588 Fix switchover Change the ordering of the two flushes such that FLUSH LOGS comes last. This seems to make sure gtid:s are updated to newest values before the MASTER_GTID_WAIT-call. Without this fix, switchover does complete successfully, but some of the slaves may not be able to replicate due to not having same events as new master. Exact reason for this still unclear.	2017-12-22 13:35:36 +02:00
Markus Mäkelä	79652301d8	Fix split of mysqlmon sources For some reason, the source code of mysqlmon was split into C and C++ sources. This caused problems by effectively discarding all changes from 2.1 that are merged into 2.2. This commit merges the changes into the correct file that were added to the wrong file.	2017-12-21 16:24:03 +02:00
Esa Korhonen	7ed9172496	MXS-1533: Rejoin also when the old master cannot be connected Previously, the rejoin would only be run on servers with a connected slave io thread. This patch runs the rejoin also on slaves which cannot connect to a downed old master while the master hostname or port differs from the current cluster master server.	2017-12-18 13:04:47 +02:00
Johan Wikman	b6e983c0b5	Change default value of detect_standalone_master The default value was changed from false to true.	2017-12-13 13:18:44 +02:00
Markus Mäkelä	79afaa447e	Merge branch '2.1' into 2.2	2017-12-12 13:23:02 +02:00
Esa Korhonen	b29a8eb4f8	MXS-1513: Flush logs before tables during switchover_demote_master In this order, the new binary log file will have 1 event instead of 0. In total, only 1 event is added.	2017-12-08 12:28:50 +02:00
Esa Korhonen	c6daf8c26b	MXS-1513: More accurate error messages The failed query is now printed.	2017-12-07 10:54:30 +02:00
Esa Korhonen	b2bc087508	MXS-1491: Print failcount Failcount is now printed in "show monitors". Also, when master goes down but failover does not yet happen because failcount > 1, a message is logged.	2017-12-07 10:05:45 +02:00
Esa Korhonen	046ed5c93d	MXS-1513: mysql_mon.cc formatting changes Ran astyle, cut some long lines.	2017-12-04 13:53:16 +02:00
Esa Korhonen	c0ab80e459	MXS-1513: Cleanup some messages, change function names No real functionality changes.	2017-12-04 12:53:09 +02:00
Esa Korhonen	45834a89b5	MXS-1533: Rename "auto_join" to "auto_rejoin"	2017-12-04 09:59:29 +02:00
Esa Korhonen	508ce3a703	MXS-1491: Failover can be executed manually Also, renamed config setting "failover" to "auto_failover". Removed setting "switchover" as it is now always enabled.	2017-12-04 09:41:00 +02:00
Esa Korhonen	90f6d78a58	MXS-1533: Add automatic join feature When enabled, the monitor will redirect servers to replicate from the current master. Standalone servers and servers replicating from a slave are redirected.	2017-12-04 09:37:16 +02:00
Esa Korhonen	4cb50f48ad	MXS-1533: Fix relay master identification and root master detection Relay master servers must now have a running slave. Also, fix cluster master detection in get_replication_tree().	2017-11-30 16:16:19 +02:00
Markus Mäkelä	d5d41349ae	MXS-1509: Add `ignore_external_masters` parameter The new parameter allows ignoring of master servers that are external to the monitor configuration. This allows sub-trees of the actual replication tree to be used as fully fledged replication trees.	2017-11-30 12:39:00 +02:00
Esa Korhonen	23cd294dad	MXS-1533: Better handling of multi-domain gtids If the gtid_domain_pos of the master is ever modified, gtid-variables will have multiple domains. Generally, we are only interested in the most recent domain. This is tracked in gtid_domain_id:s and the value of the master is used for filtering the correct domain from all gtid-values. Also, use gtid_current_pos instead of gtid_slave_pos. The advantage of current_pos is that the same variable works also for master servers. The gtid-handling is now more thorough and detects some weird situations.	2017-11-27 16:31:13 +02:00
Esa Korhonen	15e330e127	MXS-1533: Save gtid_domain_id and server version to MySqlServerInfo Cleaned the surrounding code, as it was querying server version twice.	2017-11-27 16:30:43 +02:00
Esa Korhonen	dc2286c774	MXS-1513: Obey user-given new master server If given a readily selected master server, Switchover will use it as the new master. If the given server is invalid, nothing will happen and an error is returned.	2017-11-23 13:37:42 +02:00
Markus Mäkelä	396b81f336	Fix in-source builds The internal header directory conflicted with in-source builds causing a build failure. This is fixed by renaming the internal header directory to something other than maxscale. The renaming pointed out a few problems in a couple of source files that appeared to include internal headers when the headers were in fact public headers. Fixed maxctrl in-source builds by making the copying of the sources optional.	2017-11-22 18:40:18 +02:00
Esa Korhonen	8077d97e25	MXS-1513: Work around the backend_read_timeout-setting The setting limits the maximum time a MASTER_GTID_WAIT-function can wait. To work around this limitation, the function is now called in a loop such that the total timeout is approximately equal to the requested timeout.	2017-11-20 12:31:11 +02:00
Esa Korhonen	59616b5f3e	MXS-1490: Cleanup error printing, add json error Slave redirection is a special case, as there the total failure is only known after all redirects have been attempted. In the failure case, all errors from connections are gathered to one message.	2017-11-20 11:33:47 +02:00
Esa Korhonen	84d1ea0bff	MXS-1490: Perform failover only after failcount monitor loops The same failcount variable is used for the detect_standalone_master- feature.	2017-11-17 10:12:33 +02:00
Esa Korhonen	b63c6504a3	MXS-1513: Switchover script First version of switchover script. Unsafe to run as it has no timeouts for most queries. Also, removed code launching the previous switchover_script.	2017-11-16 10:51:12 +02:00
Markus Mäkelä	f41111b4bd	MXS-1517: Retain stale master bit even on master failure If a server goes down and it has the stale master bit enabled, all other bits for the server are cleared. This allows failed masters that have been replaced to be first detected and then reintroduced into the replication topology.	2017-11-14 16:53:09 +02:00

466 Commits