Add Keepalived tutorial

2017-08-25 11:41:08 +03:00 · 2017-08-25 11:41:08 +03:00 · 08ae659310
commit 08ae659310
parent 13f7015e7b
3 changed files with 191 additions and 0 deletions
--- a/Documentation/Documentation-Contents.md
+++ b/Documentation/Documentation-Contents.md
@@ -37,6 +37,7 @@ These tutorials are for specific use cases and module combinations.

 - [Administration Tutorial](Tutorials/Administration-Tutorial.md)
 - [Avro Router Tutorial](Tutorials/Avrorouter-Tutorial.md)
+ - [Failover with Keepalived](Tutorials/Failover-with-Keepalived.md)
 - [Filter Tutorial](Tutorials/Filter-Tutorial.md)
 - [Galera Cluster Connection Routing Tutorial](Tutorials/Galera-Cluster-Connection-Routing-Tutorial.md)
 - [Galera Cluster Read Write Splitting Tutorial](Tutorials/Galera-Cluster-Read-Write-Splitting-Tutorial.md)
--- a/Documentation/Tutorials/Failover-with-Keepalived.md
+++ b/Documentation/Tutorials/Failover-with-Keepalived.md
@@ -0,0 +1,190 @@
+# Failover with Keepalived
+
+## Introduction
+
+[Keepalived](http://www.keepalived.org/index.html) is routing software for
+load balancing and high availability. It has several applications, but for this
+tutorial the goal is to set up a simple IP failover between two servers running
+MaxScale. If the main server fails, the backup machine takes over and receives
+any new connections. The Keepalived settings used in this tutorial follow the
+example given in [Simple keepalived failover setup on Ubuntu 14.04](
+https://raymii.org/s/tutorials/Keepalived-Simple-IP-failover-on-Ubuntu.html).
+
+Two hosts and one client machine are used, all in the same LAN. Both hosts run
+MaxScale and Keepalived. The backend servers may be running on one of the hosts,
+e.g. in Docker containers, or on separate machines for a more realistic setup.
+Clients connect to the virtual IP (VIP), which is claimed by the current master
+host.
+
+![](images/Keepalived.png)
+
+Once configured and running, the different Keepalived nodes continuously
+broadcast their status to the network and listen for each other. If a node does
+not receive a status message from another node with a higher priority than
+itself, it will claim the VIP, effectively becoming the master. Thus, a node can
+be put online or removed by starting and stopping the Keepalived service.
+
+If the current master node is removed (e.g. by stopping the service or pulling
+the network cable) the remaining nodes will quickly elect a new master and
+future traffic to the VIP will be directed to that node. Any connections to the
+old master node will naturally break. If the old master comes back online, it
+will again claim the VIP, breaking any connections to the backup machine.
+
+MaxScale is unaware of any of this happening. Both MaxScale instances keep running
+normally, monitoring the backend servers and listening for client connections.
+Since clients are connecting through the VIP, only the machine claiming the VIP
+will receive incoming connections. The connections between MaxScale and the
+backends are using real IPs and are unaffected by the VIP.
+
+## Configuration
+
+MaxScale does not require any specific configuration to work with Keepalived in
+this simple setup; it just needs to be running on both hosts. The MaxScale
+configurations should be similar enough that both nodes look identical to
+connecting clients. In practice, the listening ports and related services should
+be the same. Setting the service-level parameter `version_string` to a different
+value on each MaxScale node is recommended, as it is shown to connecting clients
+and indicates which node they reached.
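+
+As a minimal sketch, a service section in *maxscale.cnf* on the first node might
+look like the following. The section name, router, credentials and the version
+string value are placeholders chosen for illustration, not values taken from a
+real setup. On the second node only the `version_string` value would change,
+for example to one ending in `node2`.
+
+```
+# Hypothetical service definition; adjust names and credentials to your setup
+[Read-Write-Service]
+type=service
+router=readwritesplit
+servers=server1,server2
+user=maxuser
+passwd=maxpwd
+# Shown to connecting clients, so it reveals which MaxScale node answered
+version_string=10.1.12-MaxScale-node1
+```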
+
+Keepalived must be configured on both machines. On the **primary host**, the
+*/etc/keepalived/keepalived.conf* file should look as follows.
+
+```
+vrrp_instance VI_1 {
+  state MASTER
+  interface eth0
+  virtual_router_id 51
+  priority 150
+  advert_int 1
+  authentication {
+    auth_type PASS
+    auth_pass mypass
+  }
+  virtual_ipaddress {
+    192.168.1.123
+  }
+}
+```
+
+The *state* must be MASTER on both hosts. *virtual_router_id* and *auth_pass*
+must be identical on all hosts. The *interface* defines the network interface
+used. This depends on the system, but often the correct value is *eth0*,
+*enp0s12f3* or similar. *priority* defines the voting strength of each
+Keepalived instance when they negotiate which one should be the master. The
+instances should have different priority values. In this example, the backup
+host(s) could have any lower priority such as 149 or 148; the backup example
+below uses 100. *advert_int* is the interval at which a host “advertises” its
+existence to the other Keepalived hosts. One second is a reasonable value.
+
+*virtual_ipaddress* (VIP) is the IP address the Keepalived hosts try to claim,
+and it must be identical on all hosts. For IP negotiation to work, the VIP
+must be in the local network address space and unclaimed by any other machine
+in the LAN. An example *keepalived.conf* file for a **backup host** is listed
+below.
+
+```
+vrrp_instance VI_1 {
+  state MASTER
+  interface eth0
+  virtual_router_id 51
+  priority 100
+  advert_int 1
+  authentication {
+    auth_type PASS
+    auth_pass mypass
+  }
+  virtual_ipaddress {
+    192.168.1.123
+  }
+}
+```
+
+Once the Keepalived service is running, recent log entries can be printed with
+the command `service keepalived status`.
+
+```
+Aug 11 10:27:59 maxscale2 Keepalived_vrrp[27369]: VRRP_Instance(VI_1) Received higher prio advert
+Aug 11 10:27:59 maxscale2 Keepalived_vrrp[27369]: VRRP_Instance(VI_1) Entering BACKUP STATE
+Aug 11 10:27:59 maxscale2 Keepalived_vrrp[27369]: VRRP_Instance(VI_1) removing protocol VIPs.
+```
+
+## MaxScale health check
+
+So far, none of this tutorial has been MaxScale-specific and the health of the
+MaxScale process has been ignored. To ensure that MaxScale is running on the
+current master host, a *check script* should be set. Keepalived runs the script
+regularly, and if the script returns an error value, the Keepalived node
+assumes that it has failed, stops broadcasting its state and relinquishes the
+VIP. This allows another node to take the master status and claim the VIP. To
+define a check script, modify the configuration as follows. The example is for
+the primary node. See [Keepalived Check and Notify Scripts](
+https://tobrunet.ch/2013/07/keepalived-check-and-notify-scripts/) for more
+information.
+
+```
+vrrp_script chk_myscript {
+  script "/home/scripts/is_maxscale_running.sh"
+  interval 2 # check every 2 seconds
+  fall 2 # require 2 failures for KO
+  rise 2 # require 2 successes for OK
+}
+
+vrrp_instance VI_1 {
+  state MASTER
+  interface eth0
+  virtual_router_id 51
+  priority 150
+  advert_int 1
+  authentication {
+    auth_type PASS
+    auth_pass mypass
+  }
+  virtual_ipaddress {
+    192.168.1.123
+  }
+  track_script {
+    chk_myscript
+  }
+}
+```
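+
+The backup node would need the same check script so that it also gives up the
+VIP when its own MaxScale stops responding. The following is a sketch that
+combines the check script definition with the interface, VIP and priority from
+the earlier backup host example. It assumes the script is installed at the same
+path on the backup host.
+
+```
+vrrp_script chk_myscript {
+  script "/home/scripts/is_maxscale_running.sh"
+  interval 2 # check every 2 seconds
+  fall 2 # require 2 failures for KO
+  rise 2 # require 2 successes for OK
+}
+
+vrrp_instance VI_1 {
+  state MASTER
+  interface eth0
+  virtual_router_id 51
+  priority 100 # lower than the primary host's 150
+  advert_int 1
+  authentication {
+    auth_type PASS
+    auth_pass mypass
+  }
+  virtual_ipaddress {
+    192.168.1.123
+  }
+  track_script {
+    chk_myscript
+  }
+}
+```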
+
+An example script, *is_maxscale_running.sh*, is listed below. The script uses
+MaxAdmin to try to contact the locally running MaxScale and request a server
+list, then checks that the list has at least some expected elements. The
+`timeout` command ensures the MaxAdmin call exits in reasonable time. The script
+detects if MaxScale has crashed, is stuck or is so overburdened that it no
+longer responds to connections.
+
+```
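+#!/bin/bash
+# Keepalived check script: exits 0 if the local MaxScale answers and lists
+# the expected servers, 3 otherwise.
+fileName="maxadmin_output.txt"
+# -f avoids an error message when the file does not exist yet
+rm -f "$fileName"
+timeout 2s maxadmin list servers > "$fileName"
+to_result=$?
+if [ "$to_result" -ge 1 ]
+then
+  echo "Timed out or error, timeout returned $to_result"
+  exit 3
+else
+  echo "MaxAdmin success, rval is $to_result"
+  echo "Checking maxadmin output sanity"
+  # server1 and server2 are the server names expected in the list
+  grep1=$(grep server1 "$fileName")
+  grep2=$(grep server2 "$fileName")
+
+  if [ "$grep1" ] && [ "$grep2" ]
+  then
+    echo "All is fine"
+    exit 0
+  else
+    echo "Something is wrong"
+    exit 3
+  fi
+fi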
+```
+
+If the check script fails, Keepalived logs entries similar to the following as
+the node enters the FAULT state and gives up the VIP.
+
+```
+Aug 11 10:51:56 maxscale2 Keepalived_vrrp[20257]: VRRP_Script(chk_myscript) failed
+Aug 11 10:51:57 maxscale2 Keepalived_vrrp[20257]: VRRP_Instance(VI_1) Entering FAULT STATE
+Aug 11 10:51:57 maxscale2 Keepalived_vrrp[20257]: VRRP_Instance(VI_1) removing protocol VIPs.
+Aug 11 10:51:57 maxscale2 Keepalived_vrrp[20257]: VRRP_Instance(VI_1) Now in FAULT state
+```
--- a/Documentation/Tutorials/images/Keepalived.png
+++ b/Documentation/Tutorials/images/Keepalived.png