MaxScale

Author	SHA1	Message	Date
Esa Korhonen	09df017528	MXS-1886 Better auto-rejoin error description and tolerance Auto-rejoin now explains more accurately if a server cannot be joined due to conflicting gtid. Also, auto-rejoin is no longer disabled if a join fails. Usually the failure is due to the server not replying fast enough with query completion, and the query often completes anyway. This can lead to some log spam.	2018-06-15 13:11:10 +03:00
Esa Korhonen	9e68d8ec3d	MXS-1886 Auto-failover error tolerance Auto-failover is no longer considered to have failed if the preconditions are not met. An error message with the failed checks is printed once, but the checks are repeated every loop as long as the master is down.	2018-06-15 12:52:03 +03:00
Markus Mäkelä	f2b2951c99	Track the number of performed monitoring intervals Tracking how many times the monitor has performed its monitoring allows the test framework to consistently wait for an event instead of waiting for a hard-coded time period. The MaxCtrl `api get` command can be used to easily extract the numeric value.	2018-06-06 08:46:46 +03:00
Esa Korhonen	fb56de641a	MXS-1859 Add options for enforcing read_only on slaves If the feature is enabled (default off), at the end of a monitor loop (once server states are known), read_only is enabled on slave servers that do not have it set.	2018-05-18 15:29:56 +03:00
Esa Korhonen	39789c19d3	MXS-1856 Do not set read_only OFF if join_cluster() fails This could in some cases leave read_only OFF even if the target slave begins replication.	2018-05-08 13:56:57 +03:00
Esa Korhonen	2a38902aa6	MXS-1639 Discard results when executing sql text files This removes the limitation that the queries could not return result sets.	2018-04-24 13:21:44 +03:00
Esa Korhonen	fa7cd9450a	MXS-1639 Do not run demote_sql_file if the server already has a slave connection In this case, the server was already a slave and is not being demoted. Also, the file may contain queries which cannot be run while a slave connection is running.	2018-04-24 13:21:44 +03:00
Esa Korhonen	739edcbe22	MXS-1639 Run user-given sql commands during promotion, demotion and rejoin The SQL queries are given in two text files, defined by options promotion_sql_file and demotion_sql_file. The files must exist when monitor starts. The files are read line by line, ignoring empty lines and lines starting with '#'. All other lines are sent to the server being promoted, demoted or rejoined. Any error in opening a file, reading it or executing the contents will cause the entire operation to fail. The file defined in demotion_sql_file is also run when rejoining a server. This is to ensure a previously failed master is "demoted" properly when it joins the cluster.	2018-04-19 17:01:36 +03:00
Esa Korhonen	7209080236	MXS-1747 Improve error messages of rejoin operations Now states which query caused the error.	2018-03-28 12:39:10 +03:00
Esa Korhonen	6c32c7421b	MXS-1746 Query global gtid_domain_id instead of session-specific value The monitor queried the session-specific domain id, which does not follow the global value while the session is alive. This caused the monitor to follow the wrong gtid domain if the domain was changed after MaxScale was started. This patch modifies the query to read the global value instead. Even this is not fool-proof, as existing sessions can issue writes with the old domain, confusing the gtid-parsing.	2018-03-28 12:23:57 +03:00
Esa Korhonen	bd8b6dbc6f	MXS-1722 Add better error messages to switchover_demote_master() The error messages should now be a bit more reliable.	2018-03-21 15:04:39 +02:00
Esa Korhonen	2178667245	MXS-1679 Check for existence of master before continuing failover checks Seems to fix the issue with MaxScale detecting an old master down event.	2018-03-16 11:26:58 +02:00
Esa Korhonen	b982458497	MXS-1679 Add more accurate error printing The reason for rejoin failing should now be clearer.	2018-03-12 17:16:54 +02:00
Markus Mäkelä	5a62adc63e	MXS-1678: Detect broken replication with Last_IO_Errno This commit introduces changes that fix the relay master detection that was broken by the merge from 2.1 into 2.2 by commit 1ecd791887994209eb29e56e1271f8c407cd0cdf. In 2.2, the master server ID is used to detect whether a slave is actually replicating from a master. The value is still displayed even if the slave is not actively replicating from a master. The commit in 2.1 causes this value to be stored unconditionally if it is available. By checking the value of Last_IO_Errno and comparing it to a list of known error codes, we know whether the slave is replicating properly. The slave detection in 2.2 correctly identifies a broken slave with a stopped IO thread. Due to this, the test case must be modified to check that the relay master is not a slave if the IO thread is stopped.	2018-03-12 14:55:54 +02:00
Markus Mäkelä	f7b284bbb7	Check IO thread status when verifying master failure When MaxScale thinks that the master has failed, it tries to verify it by seeing if the slave server is receiving events. There was a missing IO thread status check in the slave_receiving_events function which caused the failover to wait until the verification timed out. The relay master detection logic also lacked a check for the slave SQL thread status. The code should check the state of the SQL thread to determine whether the server is actually a functional slave to a master.	2018-03-09 20:53:56 +02:00
Johan Wikman	d443e22d1b	Merge branch '2.2.3' into 2.2	2018-03-09 20:50:01 +02:00
Markus Mäkelä	f4c7a4700a	Disable fix to MXS-1678 in 2.2.3 The fix causes a regression in the failover functionality as there is a dependency between the slave's master ID and how the failover performs. This dependency should not exist but fixing it causes a problem with the mysqlmon_rejoin_bad2 test.	2018-03-08 21:03:52 +02:00
Markus Mäkelä	ff9024bdfb	MXS-1698: Remove false debug assertion It is not an error if the correct GTID is not found and thus it should not be asserted that one is found.	2018-03-07 11:55:46 +02:00
Johan Wikman	39d3c42c94	Merge branch '2.1' into 2.2	2018-03-01 17:52:42 +02:00
Johan Wikman	b67ab83486	Revert "Use dedicated header in NDBClusterMon" This reverts commit b9d80f6061d6b536d7a15febf0367e5f6dba0e84.	2018-02-24 15:43:15 +02:00
Johan Wikman	0bbf0246f9	Revert "Compile mariadbmon.h as C++" This reverts commit 60d57aee61d96832aeec1b8a61d36803c38ca77c.	2018-02-24 15:40:21 +02:00
Johan Wikman	236e906d88	Revert "Turn MariaDB Monitor struct to class with public fields" This reverts commit cb6f70119d9857b277306e9af5881fe29c574a32.	2018-02-24 15:37:50 +02:00
Johan Wikman	13661ab4a6	Revert "MariaDB Monitor: Move additional classes to separate file" This reverts commit ff55106610881d55db88eca9e2ef6a056cbc8d51.	2018-02-24 15:35:36 +02:00
Johan Wikman	e721733434	Revert "MariaDBMon: Move replication manipulation functions to a separate file" This reverts commit 8cdd23dda2add6486abb685834def94c72a09b6c.	2018-02-24 15:35:02 +02:00
Esa Korhonen	8cdd23dda2	MariaDBMon: Move replication manipulation functions to a separate file Refactoring continues. This update moves some of the replication manipulation functions to a separate file and turns them into class methods.	2018-02-22 10:51:52 +02:00
Esa Korhonen	ff55106610	MariaDB Monitor: Move additional classes to separate file Also use stl containers in monitor definition.	2018-02-21 12:24:24 +02:00
Esa Korhonen	cb6f70119d	Turn MariaDB Monitor struct to class with public fields Allows using std::string for strings. Also, cleanup.	2018-02-21 11:00:42 +02:00
Esa Korhonen	60d57aee61	Compile mariadbmon.h as C++	2018-02-20 11:14:21 +02:00
Esa Korhonen	b9d80f6061	Use dedicated header in NDBClusterMon NDBClusterMonitor used the MariaDBMonitor header instead of its own.	2018-02-20 11:09:04 +02:00
Esa Korhonen	754d80da75	Do not auto_rejoin if MaxScale is passive	2018-02-14 17:30:02 +02:00
Markus Mäkelä	3b2ec4ab5a	Change references from MySQL to MariaDB A few were missed when the renaming was done. Also renamed the file to mariadbmon.cc.	2018-02-14 14:47:03 +02:00
Esa Korhonen	b94f3b8792	Failover: do not check cluster stabilization if no slaves Caused debug assert.	2018-02-12 12:37:41 +02:00
Esa Korhonen	b8d3da4968	Add error tolerance to "servers_no_promotion" Previously, if the list contained servers that were not monitored by the monitor yet were valid servers, an error value would be returned and the monitor failed to start. With this update, the non-monitored servers are simply ignored when forming the final list. Also, added printing of the list to diagnostics.	2018-02-12 10:49:28 +02:00
Esa Korhonen	faaf43ff39	Add gtid to monitor diagnostics, clean up formatting GTIDs are now queried every monitor loop. diagnostics() no longer prints slave-related info if the server has no slave connection.	2018-02-10 12:32:56 +02:00
Esa Korhonen	a0d9c7da74	External master server support for failover/switchover If the master is replicating from an external master, the monitor will save the host:port of the external server. During demotion, the old master stops the external replication while the new master begins it. Also, any commands that would add to gtid have to be omitted when an external master is in play.	2018-02-08 18:44:08 +02:00
Markus Mäkelä	ed28d986e9	Fix debug assertion when no master is present If no master is present, the debug assertion would dereference a NULL pointer.	2018-02-08 13:33:31 +02:00
Markus Mäkelä	b059d78a30	Fix master loss on split cluster When four servers (A, B, C and E where E and A replicate from each other and A is the master for B and C) form a cluster and only three of them (A, B and C) are configured into MaxScale, a failover operation from A to B (making B the current master) and a restart of A causes B to lose its master status. The following diagram illustrates the state of the cluster at the end of the process described above. [Diagram: replication topology of servers A, B, C and E] The external server E was not correctly ignored in the replication topology generation causing both A and B to be seen as the lowest slave nodes in the tree. From a theoretical point of view this is the correct interpretation as there are two distinct trees and neither of them contains any true masters. In practice, MaxScale should treat any servers that replicate from an external master as root level master nodes. Doing this guarantees that they are labeled as masters if they have slaves replicating from them.	2018-02-08 12:48:56 +02:00
Johan Wikman	6c42221c9c	MXS-1654 Prevent crash in debug mode if no master is found	2018-02-07 16:43:13 +02:00
Markus Mäkelä	8558ace801	Clean up ignore_external_masters code Removed redundant checks and cleared only bits that aren't already cleared.	2018-02-07 16:07:16 +02:00
Markus Mäkelä	8bf756ca56	Update root_master when using a standalone master When detect_standalone_master is enabled, the root_master variable was not updated after the master was changed by the standalone server detection mechanism. This caused debug assertions to fire in addition to possibly causing some of the ignore_external_masters logic to break.	2018-02-07 16:07:16 +02:00
Esa Korhonen	1cf3de4a74	Add config parameter for excluding servers from failover "servers_no_promotion" is a comma-separated list of servers which cannot be chosen when selecting a new master during failover (auto or manual), or when automatically selecting a new master for switchover (currently disabled). The servers in the list are redirected normally and can be promoted by switchover when manually selecting a new master.	2018-02-07 14:07:10 +02:00
Esa Korhonen	4a478d31f3	Print Gtid IO position during monitor diagnostics	2018-02-05 17:21:20 +02:00
Markus Mäkelä	f6afb0c6d1	MXS-1643: Make Master and Slave status mutually exclusive The Master status now prevents Slave status from being assigned to a server. In practice this simply means that the master will not have both the Master and Slave status bits.	2018-02-05 16:53:43 +02:00
Esa Korhonen	a83b36ca45	Use 64 bits for storing server id In debug mode, when scanning the server id from a string, check that the resulting number fits in 32 bits. Also, when querying the server id, query the global version. Now, if a super user modifies the server id the monitor will notice it. Server IDs in GTIDs are handled similarly.	2018-02-02 11:34:32 +02:00
Esa Korhonen	255250652d	Refactor pre-switchover, add similar checks as in failover Now detects some erroneous situations before starting switchover. Switchover can be activated without specifying current master. In this case, the cluster master server is selected.	2018-01-31 10:40:09 +02:00
Esa Korhonen	e455e8d43c	Simplify monitor start and stop during switchover/failover If these operations failed, the monitor could be left stopped.	2018-01-29 15:44:32 +02:00
Esa Korhonen	9ded584836	Check that all slaves use gtid replication before performing failover	2018-01-29 13:33:16 +02:00
Esa Korhonen	257034bf3e	Clarify master failure verification The two previous functions were somewhat overlapping.	2018-01-23 16:14:50 +02:00
Markus Mäkelä	753b97303a	MXS-1606: Create maxscale_schema database if missing The monitor will now also create the database if it is missing. Since it already creates the table, also creating the database is not a large addition. Cleaned up some of the related checking code and combined them into a simple utility function.	2018-01-19 11:34:25 +02:00
Esa Korhonen	a4f6176ced	Fix bug in printing switchover/failover module command info The string constant passed to the register-function went invalid once CREATE_MODULE() completed, causing random characters to be printed.	2018-01-17 12:01:00 +02:00


57 Commits
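Commit 739edcbe22 describes how promotion_sql_file and demotion_sql_file are processed. Each file is read line by line, empty lines and lines starting with '#' are skipped, and any error in opening or reading the file fails the whole operation. The following is a minimal C++ sketch of that parsing rule only, not MaxScale's actual code.

It makes two assumptions:
* A line counts as empty only when it has zero length. Whitespace trimming is not mentioned in the commit.
* The name `read_sql_file` is hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not MaxScale's API): collects the queries from a
// promotion_sql_file / demotion_sql_file style text file.
// Empty lines and lines starting with '#' are skipped; every other line is
// kept as one query. Returns false if the file cannot be opened or a read
// error occurs, so the caller can fail the whole operation.
bool read_sql_file(const std::string& path, std::vector<std::string>* queries)
{
    assert(queries != nullptr);
    std::ifstream file(path);
    if (!file.is_open())
    {
        return false;   // Opening failed: the operation must fail.
    }

    std::vector<std::string> result;
    std::string line;
    while (std::getline(file, line))
    {
        if (line.empty() || line[0] == '#')
        {
            continue;   // Ignore empty lines and comment lines.
        }
        result.push_back(line);
    }

    if (file.bad())
    {
        return false;   // A low-level read error also fails the operation.
    }
    *queries = std::move(result);
    return true;
}
```

The caller would then send each returned line to the server being promoted, demoted or rejoined, and abort on the first query error, as the commit message describes.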
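Commits 1cf3de4a74 and b8d3da4968 describe "servers_no_promotion" as a comma-separated list of servers, where servers not monitored by the monitor are ignored rather than treated as an error. Below is a rough C++ sketch of building that final list.

It makes three assumptions:
* Surrounding spaces and tabs around each name are trimmed.
* Validating names that match no server at all is out of scope.
* The names `trim` and `parse_no_promotion` are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: trims spaces and tabs from both ends of a string.
static std::string trim(const std::string& s)
{
    const char* ws = " \t";
    std::string::size_type begin = s.find_first_not_of(ws);
    if (begin == std::string::npos)
    {
        return "";
    }
    std::string::size_type end = s.find_last_not_of(ws);
    return s.substr(begin, end - begin + 1);
}

// Hypothetical helper: turns a "servers_no_promotion" style value into the
// list of servers excluded from promotion, keeping only servers that this
// monitor actually monitors. Unmonitored names are silently dropped.
std::vector<std::string> parse_no_promotion(const std::string& value,
                                            const std::set<std::string>& monitored)
{
    std::vector<std::string> excluded;
    std::istringstream stream(value);
    std::string item;
    while (std::getline(stream, item, ','))
    {
        std::string name = trim(item);
        if (!name.empty() && monitored.count(name) > 0)
        {
            excluded.push_back(name);
        }
    }
    return excluded;
}
```

Dropping unmonitored names keeps the monitor startable when the list mentions valid servers that belong to a different monitor, which is the behavior change b8d3da4968 describes.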
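Commit a83b36ca45 stores the server id in 64 bits but, in debug mode, checks that the value scanned from a string fits in 32 bits. MariaDB's server_id is an unsigned 32-bit value. Below is a small C++ sketch of that pattern, assuming the id arrives as decimal text. `parse_server_id` is a hypothetical name, and the `assert` stands in for whatever debug check MaxScale actually uses.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <limits>

// Hypothetical helper: parses a server_id returned as decimal text.
// The result is stored in 64 bits; in debug builds (NDEBUG not defined)
// the assert verifies that the value still fits in an unsigned 32-bit range.
std::int64_t parse_server_id(const char* text)
{
    std::int64_t id = std::strtoll(text, nullptr, 10);
    assert(id >= 0
           && id <= static_cast<std::int64_t>(std::numeric_limits<std::uint32_t>::max()));
    return id;
}
```

Using a signed 64-bit type lets the full unsigned 32-bit range be stored without overflow, while leaving room for sentinel values such as -1.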