256 lines
		
	
	
		
			9.0 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
			
		
		
	
	
			256 lines
		
	
	
		
			9.0 KiB
		
	
	
	
		
			Markdown
		
	
	
	
	
	
| # MaxScale Failover with Keepalived and MaxCtrl
 | |
| 
 | |
| ## Introduction
 | |
| 
 | |
| [Keepalived](http://www.keepalived.org/index.html) is a routing software for
 | |
| load balancing and high-availability. It has several applications, but for this
 | |
| tutorial the goal is to set up a simple IP failover between two servers running
 | |
| MaxScale. If the main server fails the backup machine takes over, receiving any
 | |
| new connections. The Keepalived settings used in this tutorial follow the
 | |
| example given in [Simple keepalived failover setup on Ubuntu 14.04](
 | |
| https://raymii.org/s/tutorials/Keepalived-Simple-IP-failover-on-Ubuntu.html).
 | |
| 
 | |
| Two hosts and one client machine are used, all in the same LAN. Hosts run
 | |
| MaxScale and Keepalived. The backend servers may be running on one of the hosts,
 | |
| e.g. in docker containers, or on separate machines for a more realistic setup.
 | |
| Clients connect to the virtual IP (VIP), which is claimed by the current master
 | |
| host.
 | |
| 
 | |
| 
 | |
| 
 | |
| Once configured and running, the different Keepalived nodes continuously
 | |
| broadcast their status to the network and listen for each other. If a node does
 | |
| not receive a status message from another node with a higher priority than
 | |
| itself, it will claim the VIP, effectively becoming the master. Thus, a node can
 | |
| be put online or removed by starting and stopping the Keepalived service.
 | |
| 
 | |
| If the current master node is removed (e.g. by stopping the service or pulling
 | |
| the network cable) the remaining nodes will quickly elect a new master and
 | |
| future traffic to the VIP will be directed to that node. Any connections to the
 | |
| old master node will naturally break. If the old master comes back online, it
 | |
| will again claim the VIP, breaking any connections to the backup machine.
 | |
| 
 | |
| MaxScale has no knowledge of this even happening. Both MaxScales are running
 | |
| normally, monitoring the backend servers and listening for client connections.
 | |
| Since clients are connecting through the VIP, only the machine claiming the VIP
 | |
| will receive incoming connections. The connections between MaxScale and the
 | |
| backends are using real IPs and are unaffected by the VIP.
 | |
| 
 | |
| ## Configuration
 | |
| 
 | |
| MaxScale does not require any specific configuration to work with Keepalived in
 | |
| this simple setup, it just needs to be running on both hosts. The MaxScale
 | |
| configurations should be similar to the extent that both look identical to
 | |
| connecting clients. In practice the listening ports and related services should
 | |
| be the same. Setting the service-level setting “version_string” to different
 | |
| values on the MaxScale nodes is recommended, as it will be printed to any
 | |
| connecting clients indicating which node was connected to.
 | |
| 
 | |
| Keepalived requires specific setups on both machines. On the **primary host**,
 | |
| the */etc/keepalived/keepalived.conf*-file should be as follows.
 | |
| 
 | |
| ```
 | |
| vrrp_instance VI_1 {
 | |
|   state MASTER
 | |
|   interface eth0
 | |
|   virtual_router_id 51
 | |
|   priority 150
 | |
|   advert_int 1
 | |
|   authentication {
 | |
|     auth_type PASS
 | |
|     auth_pass mypass
 | |
|   }
 | |
|   virtual_ipaddress {
 | |
|     192.168.1.123
 | |
|   }
 | |
| }
 | |
| ```
 | |
| 
 | |
| The *state* must be MASTER on both hosts. *virtual_router_id* and *auth_pass*
 | |
| must be identical on all hosts. The *interface* defines the network interface
 | |
| used. This depends on the system, but often the correct value is *eth0*,
 | |
| *enp0s12f3* or similar. *priority* defines the voting strength between different
 | |
| Keepalived instances when negotiating on which should be the master. The
 | |
| instances should have different values of priority. In this example, the backup
 | |
| host(s) could have priority 149, 148 and so on. *advert_int* is the interval
 | |
| between a host “advertising” its existence to other Keepalived host. One second
 | |
| is a reasonable value.
 | |
| 
 | |
| *virtual_ipaddress* (VIP) is the IP the different Keepalived hosts try to claim
 | |
| and must be identical between the hosts. For IP negotiation to work, the VIP
 | |
| must be in the local network address space and unclaimed by any other machine
 | |
| in the LAN. An example *keepalived.conf*-file for a **backup host** is listed
 | |
| below.
 | |
| 
 | |
| ```
 | |
| vrrp_instance VI_1 {
 | |
|   state MASTER
 | |
|   interface eth0
 | |
|   virtual_router_id 51
 | |
|   priority 100
 | |
|   advert_int 1
 | |
|   authentication {
 | |
|     auth_type PASS
 | |
|     auth_pass mypass
 | |
|   }
 | |
|   virtual_ipaddress {
 | |
|     192.168.1.123
 | |
|   }
 | |
| }
 | |
| ```
 | |
| 
 | |
| Once the Keepalived service is running, recent log entries can be printed with
 | |
| the command `service keepalived status`.
 | |
| 
 | |
| ```
 | |
| Aug 11 10:27:59 maxscale2 Keepalived_vrrp[27369]: VRRP_Instance(VI_1) Received higher prio advert
 | |
| Aug 11 10:27:59 maxscale2 Keepalived_vrrp[27369]: VRRP_Instance(VI_1) Entering BACKUP STATE
 | |
| Aug 11 10:27:59 maxscale2 Keepalived_vrrp[27369]: VRRP_Instance(VI_1) removing protocol VIPs.
 | |
| ```
 | |
| 
 | |
| ## MaxScale health check
 | |
| 
 | |
| So far, none of this tutorial has been MaxScale-specific and the health of the
 | |
| MaxScale process has been ignored. To ensure that MaxScale is running on the
 | |
| current master host, a *check script* should be set. Keepalived runs the script
 | |
| regularly and if the script returns an error value, the Keepalived node will
 | |
| assume that it has failed, stops broadcasting its state and relinquishes the
 | |
| VIP. This allows another node to take the master status and claim the VIP. To
 | |
| define a check script, modify the configuration as follows. The example is for
 | |
| the primary node. See [Keepalived Check and Notify Scripts](
 | |
| https://tobrunet.ch/2013/07/keepalived-check-and-notify-scripts/) for more
 | |
| information.
 | |
| 
 | |
| ```
 | |
| vrrp_script chk_myscript {
 | |
|   script "/home/scripts/is_maxscale_running.sh"
 | |
|   interval 2 # check every 2 seconds
 | |
|   fall 2 # require 2 failures for KO
 | |
|   rise 2 # require 2 successes for OK
 | |
| }
 | |
| 
 | |
| vrrp_instance VI_1 {
 | |
|   state MASTER
 | |
|   interface wlp2s0
 | |
|   virtual_router_id 51
 | |
|   priority 150
 | |
|   advert_int 1
 | |
|   authentication {
 | |
|     auth_type PASS
 | |
|     auth_pass mypass
 | |
|   }
 | |
|   virtual_ipaddress {
 | |
|     192.168.1.13
 | |
|   }
 | |
|   track_script {
 | |
|     chk_myscript
 | |
|   }
 | |
| }
 | |
| ```
 | |
| 
 | |
| An example script, *is_maxscale_running.sh*, is listed below. The script uses
 | |
| MaxAdmin to try to contact the locally running MaxScale and request a server
 | |
| list, then check that the list has at least some expected elements. The timeout
 | |
| command ensures the MaxAdmin call exits in reasonable time. The script detects
 | |
| if MaxScale has crashed, is stuck or is totally overburdened and no longer
 | |
| responds to connections.
 | |
| 
 | |
| ```
 | |
| #!/bin/bash
 | |
| fileName="maxadmin_output.txt"
 | |
| rm $fileName
 | |
| timeout 2s maxadmin list servers > $fileName
 | |
| to_result=$?
 | |
| if [ $to_result -ge 1 ]
 | |
| then
 | |
|   echo Timed out or error, timeout returned $to_result
 | |
|   exit 3
 | |
| else
 | |
|   echo MaxAdmin success, rval is $to_result
 | |
|   echo Checking maxadmin output sanity
 | |
|   grep1=$(grep server1 $fileName)
 | |
|   grep2=$(grep server2 $fileName)
 | |
| 
 | |
|   if [ "$grep1" ] && [ "$grep2" ]
 | |
|   then
 | |
|     echo All is fine
 | |
|     exit 0
 | |
|   else
 | |
|     echo Something is wrong
 | |
|     exit 3
 | |
|   fi
 | |
| fi
 | |
| ```
 | |
| 
 | |
| ```
 | |
| Aug 11 10:51:56 maxscale2 Keepalived_vrrp[20257]: VRRP_Script(chk_myscript) failed
 | |
| Aug 11 10:51:57 maxscale2 Keepalived_vrrp[20257]: VRRP_Instance(VI_1) Entering FAULT STATE
 | |
| Aug 11 10:51:57 maxscale2 Keepalived_vrrp[20257]: VRRP_Instance(VI_1) removing protocol VIPs.
 | |
| Aug 11 10:51:57 maxscale2 Keepalived_vrrp[20257]: VRRP_Instance(VI_1) Now in FAULT state
 | |
| ```
 | |
| 
 | |
| ## MaxScale active/passive-setting
 | |
| 
 | |
| When using multiple MaxScales with replication cluster management features
 | |
| (failover, switchover, rejoin), only one MaxScale instance should be allowed to
 | |
| modify the cluster at any given time. This instance should be the one with
 | |
| MASTER Keepalived status. MaxScale itself does not know its state, but MaxCtrl
 | |
| (a replacement for MaxAdmin) can set a MaxScale instance to passive mode. As of
 | |
| version 2.2.2, a passive MaxScale behaves similar to an active one with the
 | |
| distinction that it won't perform failover, switchover or rejoin. Even manual
 | |
| versions of these commands will end in error. The passive/active mode
 | |
| differences may be expanded in the future.
 | |
| 
 | |
| To have Keepalived modify the MaxScale operating mode, a notify script is
 | |
| needed. This script is ran whenever Keepalived changes its state. The script
 | |
| file is defined in the Keepalived configuration file as `notify`.
 | |
| 
 | |
| ```
 | |
| ...
 | |
| virtual_ipaddress {
 | |
|   192.168.1.13
 | |
| }
 | |
| track_script {
 | |
|   chk_myscript
 | |
| }
 | |
| notify /home/user/notify_script.sh
 | |
| ...
 | |
| ```
 | |
| Keepalived calls the script with three parameters. In our case, only the third
 | |
| parameter, STATE, is relevant. An example script is below.
 | |
| 
 | |
| ```
 | |
| #!/bin/bash
 | |
| 
 | |
| TYPE=$1
 | |
| NAME=$2
 | |
| STATE=$3
 | |
| 
 | |
| OUTFILE=/home/user/state.txt
 | |
| 
 | |
| case $STATE in
 | |
|   "MASTER") echo "Setting this MaxScale node to active mode" > $OUTFILE
 | |
|                   maxctrl alter maxscale passive false
 | |
|                   exit 0
 | |
|                   ;;
 | |
|   "BACKUP") echo "Setting this MaxScale node to passive mode" > $OUTFILE
 | |
|                   maxctrl alter maxscale passive true
 | |
|                   exit 0
 | |
|                   ;;
 | |
|   "FAULT")  echo "MaxScale failed the status check." > $OUTFILE
 | |
|                   maxctrl alter maxscale passive true
 | |
|                   exit 0
 | |
|                   ;;
 | |
|         *)        echo "Unknown state" > $OUTFILE
 | |
|                   exit 1
 | |
|                   ;;
 | |
| esac
 | |
| 
 | |
| ```
 | |
| The script logs the current state to a text file and sets the operating mode of
 | |
| MaxScale. The FAULT case also attempts to set MaxScale to passive mode,
 | |
| although the MaxCtrl command will likely fail.
 | |
| 
 | |
| If all MaxScale/Keepalived instances have a similar notify script, only one
 | |
| MaxScale should ever be in active mode.
 | 
