diff --git a/Documentation/Tutorials/MaxScale-HA-with-Corosync-Pacemaker.md b/Documentation/Tutorials/MaxScale-HA-with-Corosync-Pacemaker.md
deleted file mode 100644
index c06af5720..000000000
--- a/Documentation/Tutorials/MaxScale-HA-with-Corosync-Pacemaker.md
+++ /dev/null
@@ -1,528 +0,0 @@
# How to make MariaDB MaxScale Highly Available

This document shows an example of a Pacemaker / Corosync setup with MariaDB MaxScale on Linux CentOS 6.5, using three virtual servers and unicast heartbeat mode, with the following minimum requirements:

- The MariaDB MaxScale process is started/stopped and monitored via the /etc/init.d/maxscale script, which is LSB compatible so that it can be managed by the Pacemaker resource manager

- A virtual IP providing access to the MariaDB MaxScale process, which can be assigned to any one of the cluster nodes

- Basic knowledge of Pacemaker/Corosync and the crmsh command line tool

Please note this solution is a quick setup example that may not be suited for all production environments.

## Clustering Software installation

On each node in the cluster perform the following steps.

### Add clustering repos to yum

```
# vi /etc/yum.repos.d/ha-clustering.repo
```

Add the following to the file.

```
[haclustering]
name=HA Clustering
baseurl=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/
enabled=1
gpgcheck=0
```

### Install the software

```
# yum install pacemaker corosync crmsh
```

Package versions used:

```
Package pacemaker-1.1.10-14.el6_5.3.x86_64
Package corosync-1.4.5-2.4.x86_64
Package crmsh-2.0+git46-1.1.x86_64
```

### Assign a hostname to each node

In this example the three names used for the nodes are: node1, node2, node3.

```
[root@server1 ~]# hostname node1
...
[root@server2 ~]# hostname node2
...
[root@server3 ~]# hostname node3
```

On each node, add all the server names to `/etc/hosts`.

```
[root@node3 ~]# vi /etc/hosts
10.74.14.39   node1
10.228.103.72 node2
10.35.15.26   node3 current-node
...
[root@node1 ~]# vi /etc/hosts
10.74.14.39   node1 current-node
10.228.103.72 node2
10.35.15.26   node3
```

**Note**: add _current-node_ as an alias for the current node in each of the /etc/hosts files.

### Prepare authkey for optional cryptographic use

On one of the nodes, say node2, run the corosync-keygen utility and follow the instructions.

```
[root@node2 ~]# corosync-keygen

Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
```

After completion the key will be found in `/etc/corosync/authkey`.
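Before copying the key to the other nodes (done in the next section together with corosync.conf), it is worth confirming that the file exists and is readable by root only. A minimal check, assuming the default corosync paths; the exact size and mode may vary between corosync versions:

```
[root@node2 ~]# ls -l /etc/corosync/authkey
[root@node2 ~]# stat -c '%a %U %s' /etc/corosync/authkey
```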
### Prepare the corosync configuration file

Using node2 as an example:

```
[root@node2 ~]# vi /etc/corosync/corosync.conf
```

Add the following to the file:

```
# Please read the corosync.conf.5 manual page

compatibility: whitetank

totem {
    version: 2
    secauth: off
    interface {
        member {
            memberaddr: node1
        }
        member {
            memberaddr: node2
        }
        member {
            memberaddr: node3
        }
        ringnumber: 0
        bindnetaddr: current-node
        mcastport: 5405
        ttl: 1
    }
    transport: udpu
}

logging {
    fileline: off
    to_logfile: yes
    to_syslog: yes
    logfile: /var/log/cluster/corosync.log
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}

# this will start the Pacemaker processes

service {
    ver: 0
    name: pacemaker
}
```

**Note**: in this example:

- unicast UDP is used (transport: udpu)

- bindnetaddr for the corosync process is current-node, which resolves to the right address on each node thanks to the alias added to /etc/hosts above

- the Pacemaker processes are started by the corosync daemon, so there is no need to launch them via /etc/init.d/pacemaker start

### Copy the configuration file and auth key to each of the other nodes

```
[root@node2 ~]# scp /etc/corosync/* root@node1:/etc/corosync/
...
[root@node2 ~]# scp /etc/corosync/* root@nodeN:/etc/corosync/
```

Corosync needs UDP port 5405 to be open. Configure any firewall or iptables accordingly. For a quick start just disable iptables on each node:

```
[root@node2 ~]# service iptables stop
...
[root@nodeN ~]# service iptables stop
```

### Start Corosync on each node

```
[root@node2 ~]# /etc/init.d/corosync start
...
[root@nodeN ~]# /etc/init.d/corosync start
```

Check that the corosync daemon is successfully bound to port 5405.

```
[root@node2 ~]# netstat -na | grep 5405
udp   0   0 10.228.103.72:5405   0.0.0.0:*
```

Check that the other nodes are reachable with the nc utility in UDP mode (-u).

```
[root@node2 ~]# echo "check ..." | nc -u node1 5405
[root@node2 ~]# echo "check ..." | nc -u node3 5405
...
[root@node1 ~]# echo "check ..." | nc -u node2 5405
[root@node1 ~]# echo "check ..." | nc -u node3 5405
```

If the following message is displayed, there is an issue with communication between the nodes.

```
nc: Write error: Connection refused
```

This is most likely an issue with the firewall configuration on your nodes. Check and resolve any issues with your firewall configuration.

### Check the cluster status from any node

```
[root@node3 ~]# crm status
```

The command should produce output similar to the following.

```
[root@node3 ~]# crm status
Last updated: Mon Jun 30 12:47:53 2014
Last change: Mon Jun 30 12:47:39 2014 via crmd on node2
Stack: classic openais (with plugin)
Current DC: node2 - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726

3 Nodes configured, 3 expected votes
0 Resources configured

Online: [ node1 node2 node3 ]
```

For this basic setup disable the following properties:

- stonith

- quorum policy

```
[root@node3 ~]# crm configure property 'stonith-enabled'='false'
[root@node3 ~]# crm configure property 'no-quorum-policy'='ignore'
```

For additional information see:

[http://www.clusterlabs.org/doc/crm_fencing.html](http://www.clusterlabs.org/doc/crm_fencing.html)

[http://clusterlabs.org/doc/](http://clusterlabs.org/doc/)

The configuration is automatically updated on every node.
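You can quickly confirm on the local node that the two properties have been applied before inspecting the full configuration; the grep pattern below is just one convenient way to filter the output.

```
[root@node3 ~]# crm configure show | grep -E 'stonith-enabled|no-quorum-policy'
```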
Check it from another node, say node1:

```
[root@node1 ~]# crm configure show
node node1
node node2
node node3
property cib-bootstrap-options: \
    dc-version=1.1.10-14.el6_5.3-368c726 \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes=3 \
    stonith-enabled=false \
    no-quorum-policy=ignore \
    placement-strategy=balanced \
    default-resource-stickiness=infinity
```

The Corosync / Pacemaker cluster is now ready to be configured to manage resources.

## MariaDB MaxScale init script

The MariaDB MaxScale init script in `/etc/init.d/maxscale` allows you to start, stop, restart and monitor the MariaDB MaxScale process running on the system.

```
[root@node1 ~]# /etc/init.d/maxscale
Usage: /etc/init.d/maxscale {start|stop|status|restart|condrestart|reload}
```

- Start

```
[root@node1 ~]# /etc/init.d/maxscale start
Starting MaxScale: maxscale (pid 25892) is running...      [  OK  ]
```

- Start again

```
[root@node1 ~]# /etc/init.d/maxscale start
Starting MaxScale: found maxscale (pid 25892) is running.  [  OK  ]
```

- Stop

```
[root@node1 ~]# /etc/init.d/maxscale stop
Stopping MaxScale:                                         [  OK  ]
```

- Stop again

```
[root@node1 ~]# /etc/init.d/maxscale stop
Stopping MaxScale:                                         [FAILED]
```

- Status (MaxScale not running)

```
[root@node1 ~]# /etc/init.d/maxscale status
MaxScale is stopped                                        [FAILED]
```

The script exit code for "status" is 3.

- Status (MaxScale is running)

```
[root@node1 ~]# /etc/init.d/maxscale status
Checking MaxScale status: MaxScale (pid 25953) is running. [  OK  ]
```

The script exit code for "status" is 0.

Read the following for additional information about LSB init scripts:

[http://www.linux-ha.org/wiki/LSB_Resource_Agents](http://www.linux-ha.org/wiki/LSB_Resource_Agents)

After checking that the init script for MariaDB MaxScale works, it is possible to configure MariaDB MaxScale for HA via Pacemaker.

## Configure MariaDB MaxScale for HA with Pacemaker

```
[root@node2 ~]# crm configure primitive MaxScale lsb:maxscale \
op monitor interval="15s" timeout="10s" \
op start interval="0" timeout="15s" \
op stop interval="0" timeout="30s"
```

The MaxScale resource will be started:

```
[root@node2 ~]# crm status
Last updated: Mon Jun 30 13:15:34 2014
Last change: Mon Jun 30 13:15:28 2014 via cibadmin on node2
Stack: classic openais (with plugin)
Current DC: node2 - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726

3 Nodes configured, 3 expected votes
1 Resources configured

Online: [ node1 node2 node3 ]

 MaxScale	(lsb:maxscale):	Started node1
```

## Basic use cases

### Resource restarted after a failure

In this example the MariaDB MaxScale PID is 26114; kill the process immediately.

```
[root@node2 ~]# kill -9 26114
...
[root@node2 ~]# crm status
Last updated: Mon Jun 30 13:16:11 2014
Last change: Mon Jun 30 13:15:28 2014 via cibadmin on node2
Stack: classic openais (with plugin)
Current DC: node2 - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726

3 Nodes configured, 3 expected votes
1 Resources configured

Online: [ node1 node2 node3 ]

Failed actions:
    MaxScale_monitor_15000 on node1 'not running' (7): call=19, status=complete, last-rc-change='Mon Jun 30 13:16:14 2014', queued=0ms, exec=0ms
```

**Note**: the _MaxScale_monitor_ failed action shows that Pacemaker detected the failure.
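Each monitor failure also increments the resource's fail count on the node where it happened. If you want to inspect that counter while testing, crmsh exposes it; the command below is a sketch assuming the crmsh syntax of the version used in this tutorial.

```
[root@node1 ~]# crm resource failcount MaxScale show node1
```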
After a few seconds the resource is started again.

```
[root@node2 ~]# crm status
Last updated: Mon Jun 30 13:21:12 2014
Last change: Mon Jun 30 13:15:28 2014 via cibadmin on node1
Stack: classic openais (with plugin)
Current DC: node2 - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726

3 Nodes configured, 3 expected votes
1 Resources configured

Online: [ node1 node2 node3 ]

 MaxScale	(lsb:maxscale):	Started node1
```

### The resource cannot be migrated to node1 because of a failure

First, migrate the resource to another node, say node3.

```
[root@node1 ~]# crm resource migrate MaxScale node3
...

Online: [ node1 node2 node3 ]

Failed actions:
    MaxScale_start_0 on node1 'not running' (7): call=76, status=complete, last-rc-change='Mon Jun 30 13:31:17 2014', queued=2015ms, exec=0ms
```

**Note**: the _MaxScale_start_ failed action on node1. Check the status again after a few seconds.

```
[root@node3 ~]# crm status
Last updated: Mon Jun 30 13:35:00 2014
Last change: Mon Jun 30 13:31:13 2014 via crm_resource on node3
Stack: classic openais (with plugin)
Current DC: node2 - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726

3 Nodes configured, 3 expected votes
1 Resources configured

Online: [ node1 node2 node3 ]

 MaxScale	(lsb:maxscale):	Started node2

Failed actions:
    MaxScale_start_0 on node1 'not running' (7): call=76, status=complete, last-rc-change='Mon Jun 30 13:31:17 2014', queued=2015ms, exec=0ms
```

MaxScale has been successfully started on another node (node2).

**Note**: the failed actions remain in the output of crm status. They can be cleaned up with "crm resource cleanup MaxScale":

```
[root@node1 ~]# crm resource cleanup MaxScale
Cleaning up MaxScale on node1
Cleaning up MaxScale on node2
Cleaning up MaxScale on node3
```

The cleaned status is visible from the other nodes as well.

```
[root@node2 ~]# crm status
Last updated: Mon Jun 30 13:38:18 2014
Last change: Mon Jun 30 13:38:17 2014 via crmd on node3
Stack: classic openais (with plugin)
Current DC: node2 - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726

3 Nodes configured, 3 expected votes
1 Resources configured

Online: [ node1 node2 node3 ]

 MaxScale	(lsb:maxscale):	Started node2
```

## Add a Virtual IP (VIP) to the cluster

It is possible to add a virtual IP to the cluster so that the MariaDB MaxScale process is only contacted via this IP. The virtual IP can move across nodes if one of them fails.

The setup is straightforward. Assuming an additional IP address is available and can be assigned to one of the nodes, this is the new configuration to add:

```
[root@node2 ~]# crm configure primitive maxscale_vip ocf:heartbeat:IPaddr2 params ip=192.168.122.125 op monitor interval=10s
```

The MariaDB MaxScale process and the VIP must run on the same node, so both resources are added to the group `maxscale_service`:

```
[root@node2 ~]# crm configure group maxscale_service maxscale_vip MaxScale
```
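A group both colocates its members and starts them in the order they are listed, so the VIP is brought up before MaxScale. In a real deployment you will usually also want to give IPaddr2 an explicit netmask and network interface; the values below (a /24 prefix on eth0) are only illustrative and must be adapted to your environment, and the output shown next reflects the simpler definition used above.

```
[root@node2 ~]# crm configure primitive maxscale_vip ocf:heartbeat:IPaddr2 \
params ip=192.168.122.125 cidr_netmask=24 nic=eth0 \
op monitor interval=10s
```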
The following is the final configuration.

```
[root@node3 ~]# crm configure show
node node1
node node2
node node3
primitive MaxScale lsb:maxscale \
    op monitor interval=15s timeout=10s \
    op start interval=0 timeout=15s \
    op stop interval=0 timeout=30s
primitive maxscale_vip IPaddr2 \
    params ip=192.168.122.125 \
    op monitor interval=10s
group maxscale_service maxscale_vip MaxScale \
    meta target-role=Started
property cib-bootstrap-options: \
    dc-version=1.1.10-14.el6_5.3-368c726 \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes=3 \
    stonith-enabled=false \
    no-quorum-policy=ignore \
    placement-strategy=balanced \
    last-lrm-refresh=1404125486
```

Check the resource status.

```
[root@node1 ~]# crm status
Last updated: Mon Jun 30 13:51:29 2014
Last change: Mon Jun 30 13:51:27 2014 via crmd on node1
Stack: classic openais (with plugin)
Current DC: node2 - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726

3 Nodes configured, 3 expected votes
2 Resources configured

Online: [ node1 node2 node3 ]

 Resource Group: maxscale_service
     maxscale_vip	(ocf::heartbeat:IPaddr2):	Started node2
     MaxScale	(lsb:maxscale):	Started node2
```

With both resources running on node2, the MariaDB MaxScale service is now reachable via the configured VIP address 192.168.122.125.
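As a final check, connect to MaxScale through the VIP with a MySQL client. The port and the credentials depend on the listeners and users defined in your MaxScale configuration; port 4006 and the user name below are placeholders only.

```
$ mysql -h 192.168.122.125 -P 4006 -u maxuser -p
```

You can also verify that the VIP follows the service by moving the whole group to another node with `crm resource migrate maxscale_service <node>` and running `crm status` again; remember to clear the resulting location preference afterwards with `crm resource unmigrate maxscale_service`.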