The LifeKeeper Generic Application Recovery Kit for Load Balancer Health Checks (“GenLB Recovery Kit”) may be used as part of a LifeKeeper resource hierarchy to help route load balancer traffic to the cluster node where a particular resource is currently in-service. This is achieved by maintaining a listener on a user-specified TCP port on the cluster node where the resource is in-service.
Install the GenLB Recovery Kit
The GenLB Recovery Kit is supported only on SIOS Protection Suite for Linux version 9.5.1 and later. The rpm installation file can be obtained from the same FTP directory from which the SIOS Protection Suite for Linux installation media was downloaded. The filename format is steeleye-lkHOTFIX-Gen-LB-PL-7172-x.x.x-xxxx.x86_64.rpm. No additional SIOS licenses are required to create GenLB resources.
- Install the rpm:
[root@node-a ~]# rpm -ivh steeleye-lkHOTFIX-Gen-LB-PL-7172-x.x.x-xxxx.x86_64.rpm
Verifying... ####################### [100%]
Preparing... ####################### [100%]
Updating / installing...
1:steeleye-lkHOTFIX-Gen-LB-PL-7172-# [100%]
- Verify that the GenLB resource action scripts have been successfully installed to the /opt/LifeKeeper/SIOS_Hotfixes/Gen-LB-PL-7172 directory.
[root@node-a ~]# ls -l /opt/LifeKeeper/SIOS_Hotfixes/Gen-LB-PL-7172/
total 12
-r-x------ 1 root root 2579 Jan 01 00:00 quickCheck
-r-x------ 1 root root 1734 Jan 01 00:00 remove.pl
-r-x------ 1 root root 3909 Jan 01 00:00 restore.pl
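The verification step above can also be scripted. The following is a minimal sketch, not part of the kit: the function name check_genlb_scripts is hypothetical, and it only assumes the three script names and the installation directory shown in the listing.

```shell
# Hypothetical helper (not shipped with the kit): confirm that the three
# GenLB action scripts exist as regular files and are executable.
# Usage: check_genlb_scripts [directory]
check_genlb_scripts() {
    dir=${1:-/opt/LifeKeeper/SIOS_Hotfixes/Gen-LB-PL-7172}
    missing=0
    for script in restore.pl remove.pl quickCheck; do
        if [ -f "$dir/$script" ] && [ -x "$dir/$script" ]; then
            echo "ok: $dir/$script"
        else
            echo "missing or not executable: $dir/$script"
            missing=1
        fi
    done
    # Return 0 only if every script passed the check.
    return "$missing"
}
```

Run as root on each node where the rpm was installed, for example `check_genlb_scripts && echo "GenLB scripts present"`.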
Create a GenLB Resource
In this example we will create a sample GenLB resource on server node-a listening on TCP port 54321.
- In the LifeKeeper GUI, click to open the Create Resource Wizard. Select the “Generic Application” Recovery Kit.
- Enter the following values into the Create Resource Wizard and click Create when prompted. The “Application Info” field has the format “<TCP Port> <Response>”. The response is optional, and no whitespace is allowed in the response text.
Switchback Type | Intelligent |
Server | node-a |
Restore Script | /opt/LifeKeeper/SIOS_Hotfixes/Gen-LB-PL-7172/restore.pl |
Remove Script | /opt/LifeKeeper/SIOS_Hotfixes/Gen-LB-PL-7172/remove.pl |
QuickCheck Script | /opt/LifeKeeper/SIOS_Hotfixes/Gen-LB-PL-7172/quickCheck |
Local Recovery Script | None (Empty) |
Application Info | 54321 |
Bring Resource In Service | Yes |
Resource Tag | ilb-test-54321 |
Once the resource has been created and brought in-service successfully, click Next> to proceed to the Pre-Extend Wizard.
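To make the “Application Info” format concrete, here is a small sketch that checks a candidate value before it is typed into the wizard. The function validate_app_info is hypothetical and is not how the kit itself parses the field. It assumes a single space separates the port from the optional response and that the port must fall in the range 1 to 65535.

```shell
# Hypothetical pre-check for a GenLB "Application Info" value of the form
# "<TCP Port> <Response>" (response optional, no whitespace in the response).
# Prints "ok" and returns 0 if the value looks valid, else prints a reason.
validate_app_info() {
    vai_info=$1
    # Everything before the first space is the port.
    vai_port=${vai_info%% *}
    # Everything after the first space (if any) is the response.
    case $vai_info in
        *" "*) vai_response=${vai_info#* } ;;
        *)     vai_response="" ;;
    esac
    # The port must be 1-5 decimal digits in the range 1..65535.
    case $vai_port in
        ''|*[!0-9]*) echo "invalid port: not a number"; return 1 ;;
    esac
    if [ "${#vai_port}" -gt 5 ] || [ "$vai_port" -lt 1 ] || [ "$vai_port" -gt 65535 ]; then
        echo "invalid port: out of range"; return 1
    fi
    # The response must not contain whitespace.
    case $vai_response in
        *[[:space:]]*) echo "invalid response: contains whitespace"; return 1 ;;
    esac
    echo "ok"
}
```

For example, `validate_app_info "54321"` and `validate_app_info "54321 OK"` both print ok, while `validate_app_info "54321 two words"` is rejected because the response contains a space.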
- Enter the following values into the Pre-Extend Wizard.
Target Server | node-b |
Switchback Type | Intelligent |
Template Priority | 1 |
Target Priority | 10 |
Once the pre-extend checks have passed, click Next> to proceed to the Extend gen/app Resource Hierarchy Wizard.
- Enter the following values into the Extend gen/app Resource Hierarchy Wizard and click Extend when prompted.
Resource Tag | ilb-test-54321 |
Application Info | 54321 |
Once the resource has been extended successfully, click Finish.
- Back in the LifeKeeper GUI, we see that the newly created ilb-test-54321 resource is Active on node-a and Standby on node-b. In this state, a TCP load balancer with a TCP health check on port 54321 will treat node-a as healthy and node-b as unhealthy, causing all load balancer traffic to be routed to node-a. When placed in a resource hierarchy with a protected application, this resource will ensure that load balancer traffic is always routed to the server on which the application is currently running.
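The Active/Standby behavior described above can be observed from any host that can reach both nodes by attempting a TCP connection to the health check port. The sketch below approximates what a simple TCP health check does; actual probe behavior varies by load balancer. The function name tcp_probe is hypothetical, and the sketch assumes that bash (for its /dev/tcp redirection) and the GNU coreutils timeout command are available.

```shell
# Hypothetical probe: try to open a TCP connection to host:port.
# Returns 0 if the connection opens within the timeout, non-zero otherwise.
# Usage: tcp_probe <host> <port> [timeout_seconds]
tcp_probe() {
    probe_host=$1
    probe_port=$2
    probe_timeout=${3:-2}
    # bash opens a TCP connection when redirecting to /dev/tcp/<host>/<port>;
    # the connection is closed when the child bash process exits.
    timeout "$probe_timeout" bash -c 'exec 3<>"/dev/tcp/$0/$1"' \
        "$probe_host" "$probe_port" 2>/dev/null
}
```

With the resource Active on node-a, `tcp_probe node-a 54321 && echo healthy || echo unhealthy` would be expected to report healthy, and the same probe against node-b would be expected to report unhealthy.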
Test GenLB Resource Switchover and Failover
In this section we assume that an internal load balancer has been created with node-a and node-b as backend targets and with the following properties:
- Front-end internal IP: 10.20.0.10
- TCP health check on port 54321
and that the ilb-test-54321 GenLB resource that was created in the previous section is currently Active on node-a.
For convenience we will set up a temporary Apache web server that will simply return the hostname of each server. Execute the following commands on both node-a and node-b. Adjust the commands accordingly (e.g., to use zypper install) if installing on a SLES server.
# yum install -y httpd
# systemctl start httpd
# echo $(hostname) > /var/www/html/index.html
Before continuing, verify that traffic is allowed on TCP port 80 for node-a and node-b.
We will now test the switchover and failover capabilities of the ilb-test-54321 GenLB resource.
- With the ilb-test-54321 resource Active on node-a and Standby on node-b, verify the output of the following command on each server.
[root@node-a ~]# curl http://10.20.0.10
node-a
[root@node-b ~]# curl http://10.20.0.10
node-a
- Execute the following command on node-a:
[root@node-a ~]# while true; do curl http://10.20.0.10; sleep 1; done
and initiate a switchover of the ilb-test-54321 resource to node-b. Once the switchover has completed successfully, use Ctrl-C (SIGINT) to terminate the running command on node-a.
The output of the command should be similar to:
…
node-a
node-a
node-a
[switchover occurs]
node-b
node-b
node-b
…
In particular, the load balancer should cleanly stop routing traffic to node-a before beginning to route it to node-b. If the output near the switchover point looks like the following:
…
node-a
[switchover occurs]
node-b
node-a
node-b
node-a
node-b
node-a
node-b
node-b
node-b
…
then you may need to edit the health check properties to decrease the time between health check probes and/or decrease the minimum number of unsuccessful health check probes before a backend instance is marked unhealthy and removed from the load balancer pool.
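When the captured output is long, the interleaving shown above can be spotted by counting how often the responding hostname changes. This is a minimal sketch with a hypothetical function name, count_transitions. It assumes the input contains only the hostnames returned by the web servers, one per line (for example, by running curl with the -s option so that progress and error messages are not mixed in).

```shell
# Hypothetical helper: read hostnames on stdin, one per line, and print the
# number of times consecutive non-empty lines differ. A clean switchover
# produces exactly 1 transition; a larger count suggests that the load
# balancer alternated between nodes for a while.
count_transitions() {
    awk 'NF { if (seen && $1 != prev) n++; prev = $1; seen = 1 }
         END { print n + 0 }'
}
```

For example, the loop output could be saved with `while true; do curl -s http://10.20.0.10; sleep 1; done | tee lb.log` (lb.log is an arbitrary file name) and then checked with `count_transitions < lb.log`.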
- With the ilb-test-54321 resource Active on node-b, execute the following command on node-a:
[root@node-a ~]# while true; do curl http://10.20.0.10; sleep 1; done
and forcefully reboot node-b to initiate a failover of the ilb-test-54321 resource back to node-a:
[root@node-b ~]# echo b > /proc/sysrq-trigger
After the failover has completed successfully, use Ctrl-C (SIGINT) to terminate the running command on node-a.
The output of the command on node-a should be similar to:
…
node-b
node-b
node-b
[failover occurs]
node-a
node-a
node-a
…
At this point, basic verification of the GenLB resource behavior is complete. Execute additional tests as necessary to verify the interaction between the GenLB resource and your protected application on switchover and failover. Once you have finished testing the GenLB resource functionality, the temporary Apache web servers may be removed by executing the following commands on both node-a and node-b:
# systemctl stop httpd
# rm -f /var/www/html/index.html
# yum remove -y httpd