# How to make MariaDB MaxScale Highly Available

This document shows an example of a Pacemaker/Corosync setup with MariaDB MaxScale on CentOS 6.5 Linux, using three virtual servers and unicast heartbeat mode, with the following minimum requirements:

- The MariaDB MaxScale process is started, stopped and monitored via the `/etc/init.d/maxscale` script, which is LSB compatible so that the Pacemaker resource manager can manage it

- A virtual IP provides access to the MariaDB MaxScale process and can be assigned to any one of the cluster nodes

- Basic knowledge of Pacemaker/Corosync and the crmsh command line tool

Please note that this solution is a quick setup example that may not be suited for all production environments.

## Clustering Software installation

On each node in the cluster do the following steps.

### Add clustering repos to yum

```
# vi /etc/yum.repos.d/ha-clustering.repo
```

Add the following to the file.

```
[haclustering]
name=HA Clustering
baseurl=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/
enabled=1
gpgcheck=0
```

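When the same file has to be created on three nodes, a here-document avoids editing it by hand each time. The following is a minimal sketch; the `write_ha_repo` function name is an illustrative helper and not part of any package, and the file content is the one shown above.

```shell
# Write the HA clustering repo definition to the given path.
# Defaults to the path used in this guide; write_ha_repo is a
# hypothetical helper name chosen for this example.
write_ha_repo() {
    repo_file="${1:-/etc/yum.repos.d/ha-clustering.repo}"
    cat > "$repo_file" <<'EOF'
[haclustering]
name=HA Clustering
baseurl=http://download.opensuse.org/repositories/network:/ha-clustering:/Stable/CentOS_CentOS-6/
enabled=1
gpgcheck=0
EOF
}

# Usage (as root on each node):
#   write_ha_repo
```

Passing a different path as the first argument makes it possible to inspect the generated file before it is placed under `/etc/yum.repos.d/`.
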
### Install the software

```
# yum install pacemaker corosync crmsh
```

Package versions used:

```
Package pacemaker-1.1.10-14.el6_5.3.x86_64
Package corosync-1.4.5-2.4.x86_64
Package crmsh-2.0+git46-1.1.x86_64
```

### Assign hostname on each node

In this example the three names used for the nodes are node1, node2 and node3.

```
[root@server1 ~]# hostname node1
...
[root@server2 ~]# hostname node2
...
[root@server3 ~]# hostname node3
```

On each node, add all the server names to `/etc/hosts`.

```
[root@node3 ~]# vi /etc/hosts
10.74.14.39 node1
10.228.103.72 node2
10.35.15.26 node3 current-node
...
[root@node1 ~]# vi /etc/hosts
10.74.14.39 node1 current-node
10.228.103.72 node2
10.35.15.26 node3
```

**Note**: add _current-node_ as an alias for the current node in each of the `/etc/hosts` files.

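The per-node difference in `/etc/hosts` is only the position of the _current-node_ alias, so the entries can be generated from one list. This is a rough sketch under the assumption that the three example addresses above are used; `cluster_hosts` is a hypothetical helper name.

```shell
# Print the /etc/hosts lines for the cluster, appending the
# "current-node" alias to the entry whose name matches $1.
# The addresses are the example values used in this guide.
cluster_hosts() {
    current="$1"
    while read -r ip name; do
        if [ "$name" = "$current" ]; then
            echo "$ip $name current-node"
        else
            echo "$ip $name"
        fi
    done <<'EOF'
10.74.14.39 node1
10.228.103.72 node2
10.35.15.26 node3
EOF
}

# Usage (as root, after the hostname has been assigned):
#   cluster_hosts "$(hostname)" >> /etc/hosts
```
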
### Prepare authkey for optional cryptographic use

On one of the nodes, say node2, run the corosync-keygen utility and follow the prompts.

```
[root@node2 ~]# corosync-keygen

Corosync Cluster Engine Authentication key generator. Gathering 1024 bits for key from /dev/random. Press keys on your keyboard to generate entropy.

After completion the key will be found in /etc/corosync/authkey.
```

### Prepare the corosync configuration file

Using node2 as an example:

```
[root@node2 ~]# vi /etc/corosync/corosync.conf
```

Add the following to the file:

```
# Please read the corosync.conf.5 manual page

compatibility: whitetank

totem {
    version: 2
    secauth: off
    interface {
        member {
            memberaddr: node1
        }
        member {
            memberaddr: node2
        }
        member {
            memberaddr: node3
        }
        ringnumber: 0
        bindnetaddr: current-node
        mcastport: 5405
        ttl: 1
    }
    transport: udpu
}

logging {
    fileline: off
    to_logfile: yes
    to_syslog: yes
    logfile: /var/log/cluster/corosync.log
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
    }
}

# this will start Pacemaker processes

service {
    ver: 0
    name: pacemaker
}
```

**Note**: in this example:

- Unicast UDP is used

- `bindnetaddr` for the corosync process is `current-node`, which resolves to the right address on each node thanks to the alias added in `/etc/hosts` above

- Pacemaker processes are started by the corosync daemon, so there is no need to launch them via `/etc/init.d/pacemaker start`

### Copy configuration files and auth key to each of the other nodes

```
[root@node2 ~]# scp /etc/corosync/* root@node1:/etc/corosync/
...
[root@node2 ~]# scp /etc/corosync/* root@nodeN:/etc/corosync/
```

Corosync needs port 5405 to be open. Configure any firewall or iptables rules accordingly. For a quick start, just disable iptables on each node:

```
[root@node2 ~]# service iptables stop
...
[root@nodeN ~]# service iptables stop
```

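As an alternative to stopping iptables entirely, the corosync port can be opened for the cluster peers only. The sketch below just prints the commands so they can be reviewed before being run as root; `corosync_fw_rules` is a hypothetical helper, the peer names are the ones from this example, and opening the port below `mcastport` as well is an assumption based on the corosync.conf(5) note that `mcastport - 1` is also used.

```shell
# Print iptables commands that accept corosync UDP traffic from each peer.
# $1 is the mcastport from corosync.conf; remaining arguments are peer names.
# Assumption: mcastport - 1 is opened too, as described in corosync.conf(5).
corosync_fw_rules() {
    port="$1"
    shift
    low=$((port - 1))
    for peer in "$@"; do
        echo "iptables -I INPUT -p udp -s $peer --dport $low:$port -j ACCEPT"
    done
    # Persist the rules across restarts of the iptables service (CentOS 6).
    echo "service iptables save"
}

# Usage on node2 (review the output first, then pipe it to sh as root):
#   corosync_fw_rules 5405 node1 node3
#   corosync_fw_rules 5405 node1 node3 | sh
```
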
### Start Corosync on each node

```
[root@node2 ~]# /etc/init.d/corosync start
...
[root@nodeN ~]# /etc/init.d/corosync start
```

Check that the corosync daemon is successfully bound to port 5405.

```
[root@node2 ~]# netstat -na | grep 5405
udp 0 0 10.228.103.72:5405 0.0.0.0:*
```

Check that the other nodes are reachable using the nc utility with the UDP option (-u).

```
[root@node2 ~]# echo "check ..." | nc -u node1 5405
[root@node2 ~]# echo "check ..." | nc -u node3 5405
...
[root@node1 ~]# echo "check ..." | nc -u node2 5405
[root@node1 ~]# echo "check ..." | nc -u node3 5405
```

If the following message is displayed, there is an issue with communication between the nodes.

```
nc: Write error: Connection refused
```

This is most likely an issue with the firewall configuration on your nodes. Check and resolve any issues with your firewall configuration.

### Check the cluster status from any node

```
[root@node3 ~]# crm status
```

The command should produce the following.

```
[root@node3 ~]# crm status
Last updated: Mon Jun 30 12:47:53 2014
Last change: Mon Jun 30 12:47:39 2014 via crmd on node2
Stack: classic openais (with plugin)
Current DC: node2 - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726

3 Nodes configured, 3 expected votes
0 Resources configured

Online: [ node1 node2 node3 ]
```

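For scripted checks, the number of online nodes can be extracted from the `crm status` output. This is a small sketch that assumes the `Online: [ ... ]` line has the format shown above; `online_count` is a hypothetical helper name.

```shell
# Read `crm status` output on stdin and print how many nodes are
# listed on the "Online: [ ... ]" line (0 if there is no such line).
online_count() {
    # Fields on that line: "Online:", "[", the node names, "]".
    awk '/^Online:/ { n = NF - 3 } END { print n + 0 }'
}

# Usage:
#   [ "$(crm status | online_count)" -eq 3 ] && echo "all nodes online"
```
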
For the basic setup, disable the following properties:

- stonith

- quorum policy

```
[root@node3 ~]# crm configure property 'stonith-enabled'='false'
[root@node3 ~]# crm configure property 'no-quorum-policy'='ignore'
```

For additional information see:

[http://www.clusterlabs.org/doc/crm_fencing.html](http://www.clusterlabs.org/doc/crm_fencing.html)

[http://clusterlabs.org/doc/](http://clusterlabs.org/doc/)

The configuration is automatically updated on every node.

Check it from another node, say node1:

```
[root@node1 ~]# crm configure show
node node1
node node2
node node3
property cib-bootstrap-options: \
    dc-version=1.1.10-14.el6_5.3-368c726 \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes=3 \
    stonith-enabled=false \
    no-quorum-policy=ignore \
    placement-strategy=balanced \
    default-resource-stickiness=infinity
```

The Corosync / Pacemaker cluster is ready to be configured to manage resources.

## MariaDB MaxScale init script

The MariaDB MaxScale init script in `/etc/init.d/maxscale` lets you start, stop, restart and monitor the MariaDB MaxScale process running on the system.

```
[root@node1 ~]# /etc/init.d/maxscale
Usage: /etc/init.d/maxscale {start|stop|status|restart|condrestart|reload}
```

- Start

```
[root@node1 ~]# /etc/init.d/maxscale start
Starting MaxScale: maxscale (pid 25892) is running... [ OK ]
```

- Start again

```
[root@node1 ~]# /etc/init.d/maxscale start
Starting MaxScale: found maxscale (pid 25892) is running.[ OK ]
```

- Stop

```
[root@node1 ~]# /etc/init.d/maxscale stop
Stopping MaxScale: [ OK ]
```

- Stop again

```
[root@node1 ~]# /etc/init.d/maxscale stop
Stopping MaxScale: [FAILED]
```

- Status (MaxScale not running)

```
[root@node1 ~]# /etc/init.d/maxscale status
MaxScale is stopped [FAILED]
```

The script exit code for "status" is 3.

- Status (MaxScale is running)

```
[root@node1 ~]# /etc/init.d/maxscale status
Checking MaxScale status: MaxScale (pid 25953) is running.[ OK ]
```

The script exit code for "status" is 0.

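Pacemaker relies on these exit codes rather than on the printed text, so it helps to check them explicitly. The sketch below maps the `status` exit codes defined by the LSB init script specification to a short description; `lsb_status_text` is a hypothetical helper name, not part of MaxScale or Pacemaker.

```shell
# Translate an LSB init script "status" exit code into text.
# Codes 0-4 are defined by the LSB specification; anything else
# is reserved or application specific.
lsb_status_text() {
    case "$1" in
        0) echo "running" ;;
        1) echo "dead, but PID file exists" ;;
        2) echo "dead, but lock file exists" ;;
        3) echo "not running" ;;
        4) echo "status unknown" ;;
        *) echo "reserved or non-LSB code" ;;
    esac
}

# Usage:
#   /etc/init.d/maxscale status >/dev/null 2>&1
#   lsb_status_text $?
```

With the two cases shown above, the helper would print "not running" for exit code 3 and "running" for exit code 0.
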
Read the following for additional information about LSB init scripts:

[http://www.linux-ha.org/wiki/LSB_Resource_Agents](http://www.linux-ha.org/wiki/LSB_Resource_Agents)

After checking that the init scripts for MariaDB MaxScale work, it is possible to configure MariaDB MaxScale for HA via Pacemaker.

## Configure MariaDB MaxScale for HA with Pacemaker

```
[root@node2 ~]# crm configure primitive MaxScale lsb:maxscale \
    op monitor interval="10s" timeout="15s" \
    op start interval="0" timeout="15s" \
    op stop interval="0" timeout="30s"
```

The MaxScale resource will be started.

```
[root@node2 ~]# crm status
Last updated: Mon Jun 30 13:15:34 2014
Last change: Mon Jun 30 13:15:28 2014 via cibadmin on node2
Stack: classic openais (with plugin)
Current DC: node2 - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726

3 Nodes configured, 3 expected votes
1 Resources configured

Online: [ node1 node2 node3 ]

MaxScale (lsb:maxscale): Started node1
```

## Basic use cases

### Resource restarted after a failure

In this example the MariaDB MaxScale PID is 26114. Kill the process with `kill -9` so that it terminates immediately.

```
[root@node2 ~]# kill -9 26114
...
[root@node2 ~]# crm status
Last updated: Mon Jun 30 13:16:11 2014
Last change: Mon Jun 30 13:15:28 2014 via cibadmin on node2
Stack: classic openais (with plugin)
Current DC: node2 - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726

3 Nodes configured, 3 expected votes
1 Resources configured

Online: [ node1 node2 node3 ]

Failed actions:

    MaxScale_monitor_15000 on node1 'not running' (7): call=19, status=complete, last-rc-change='Mon Jun 30 13:16:14 2014', queued=0ms, exec=0ms
```

**Note**: the _MaxScale_monitor_ failed action.

After a few seconds the resource will be started again.

```
[root@node2 ~]# crm status
Last updated: Mon Jun 30 13:21:12 2014
Last change: Mon Jun 30 13:15:28 2014 via cibadmin on node1
Stack: classic openais (with plugin)
Current DC: node2 - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726

3 Nodes configured, 3 expected votes
1 Resources configured

Online: [ node1 node2 node3 ]

MaxScale (lsb:maxscale): Started node1
```

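When testing the restart behaviour repeatedly, waiting for the process to come back can be scripted instead of re-running the status command by hand. This is a generic sketch; `wait_until` is a hypothetical helper, and the 60 second timeout in the usage line is an arbitrary value chosen for illustration.

```shell
# Retry a command about once per second until it succeeds or the
# number of attempts given as the first argument is used up.
# Returns 0 on success, 1 on timeout.
wait_until() {
    remaining="$1"
    shift
    while ! "$@" >/dev/null 2>&1; do
        remaining=$((remaining - 1))
        [ "$remaining" -le 0 ] && return 1
        sleep 1
    done
    return 0
}

# Usage on the node where the resource runs:
#   wait_until 60 /etc/init.d/maxscale status && echo "MaxScale is running again"
```

The init script is used as the probe here because its `status` action returns 0 only while MaxScale is running.
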
### The resource cannot be migrated to node1 because of a failure

First, migrate the resource to another node, say node3.

```
[root@node1 ~]# crm resource migrate MaxScale node3
...

Online: [ node1 node2 node3 ]

Failed actions:

    MaxScale_start_0 on node1 'not running' (7): call=76, status=complete, last-rc-change='Mon Jun 30 13:31:17 2014', queued=2015ms, exec=0ms
```

**Note**: the _MaxScale_start_ failed action on node1. After a few seconds the status is as follows.

```
[root@node3 ~]# crm status
Last updated: Mon Jun 30 13:35:00 2014
Last change: Mon Jun 30 13:31:13 2014 via crm_resource on node3
Stack: classic openais (with plugin)
Current DC: node2 - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726

3 Nodes configured, 3 expected votes
1 Resources configured

Online: [ node1 node2 node3 ]

MaxScale (lsb:maxscale): Started node2

Failed actions:

    MaxScale_start_0 on node1 'not running' (7): call=76, status=complete, last-rc-change='Mon Jun 30 13:31:17 2014', queued=2015ms, exec=0ms
```

MaxScale has been started successfully on a new node (node2).

**Note**: Failed actions remain in the output of `crm status`.

Use `crm resource cleanup MaxScale` to clear the messages:

```
[root@node1 ~]# crm resource cleanup MaxScale
Cleaning up MaxScale on node1
Cleaning up MaxScale on node2
Cleaning up MaxScale on node3
```

The cleaned status is visible from other nodes as well.

```
[root@node2 ~]# crm status
Last updated: Mon Jun 30 13:38:18 2014
Last change: Mon Jun 30 13:38:17 2014 via crmd on node3
Stack: classic openais (with plugin)
Current DC: node2 - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726

3 Nodes configured, 3 expected votes
1 Resources configured

Online: [ node1 node2 node3 ]

MaxScale (lsb:maxscale): Started node2
```

## Add a Virtual IP (VIP) to the cluster

A virtual IP can be added to the cluster so that clients contact the MariaDB MaxScale process only through this address. The virtual IP moves to another node if the node holding it fails.

The setup is straightforward. Assuming an additional IP address is available and can be added to one of the nodes, this is the new configuration to add.

```
[root@node2 ~]# crm configure primitive maxscale_vip ocf:heartbeat:IPaddr2 params ip=192.168.122.125 op monitor interval=10s
```

The MariaDB MaxScale process and the VIP must run on the same node, so both resources must be added to a group, here named `maxscale_service`.

```
[root@node2 ~]# crm configure group maxscale_service maxscale_vip MaxScale
```

The following is the final configuration.

```
[root@node3 ~]# crm configure show
node node1
node node2
node node3
primitive MaxScale lsb:maxscale \
    op monitor interval=15s timeout=10s \
    op start interval=0 timeout=15s \
    op stop interval=0 timeout=30s
primitive maxscale_vip IPaddr2 \
    params ip=192.168.122.125 \
    op monitor interval=10s
group maxscale_service maxscale_vip MaxScale \
    meta target-role=Started
property cib-bootstrap-options: \
    dc-version=1.1.10-14.el6_5.3-368c726 \
    cluster-infrastructure="classic openais (with plugin)" \
    expected-quorum-votes=3 \
    stonith-enabled=false \
    no-quorum-policy=ignore \
    placement-strategy=balanced \
    last-lrm-refresh=1404125486
```

Check the resource status.

```
[root@node1 ~]# crm status
Last updated: Mon Jun 30 13:51:29 2014
Last change: Mon Jun 30 13:51:27 2014 via crmd on node1
Stack: classic openais (with plugin)
Current DC: node2 - partition with quorum
Version: 1.1.10-14.el6_5.3-368c726

3 Nodes configured, 3 expected votes
2 Resources configured

Online: [ node1 node2 node3 ]

Resource Group: maxscale_service

    maxscale_vip (ocf::heartbeat:IPaddr2): Started node2

    MaxScale (lsb:maxscale): Started node2
```

With both resources on node2, the MariaDB MaxScale service is now reachable via the configured VIP address 192.168.122.125.

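To confirm on a given node that the VIP is actually configured on one of its interfaces, the output of `ip -o -4 addr show` can be searched for the address. The following is a minimal sketch; `has_vip` is a hypothetical helper name, and it assumes the one-line-per-address output format of the `ip` tool.

```shell
# Succeed if the IPv4 address given as $1 appears as an "inet" address
# in `ip -o -4 addr show` output supplied on stdin.
has_vip() {
    awk -v vip="$1" '
        {
            for (i = 1; i < NF; i++) {
                if ($i == "inet") {
                    # The field after "inet" is ADDRESS/PREFIX.
                    split($(i + 1), parts, "/")
                    if (parts[1] == vip) found = 1
                }
            }
        }
        END { exit(found ? 0 : 1) }'
}

# Usage on the node reported by crm status (node2 in this example):
#   ip -o -4 addr show | has_vip 192.168.122.125 && echo "VIP is on this node"
```

Comparing the full address rather than using a plain substring match avoids false positives such as 192.168.122.12 matching 192.168.122.125.
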