Virtual IP Resource
LifeKeeper brings an IP resource into service by creating an IP alias address on one of the physical network interfaces on the primary server. Users connect to the node using this alias address.
The IP Recovery Kit software performs checks to help ensure that the selected address, network mask and interface can function properly. The software verifies the following elements:
- Unused resource. The new IP address is not already assigned to any other IP resource in the LifeKeeper cluster.
- Unique address. The address is not currently active on the network. In addition to checking during creation, the software repeats the uniqueness check immediately before bringing the resource into service. If it detects a duplicate address on the network, it does not bring the resource into service.
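The two checks above can be sketched as follows. This is only an illustration of the logic, not the IP Recovery Kit's actual implementation: the function name `can_bring_in_service`, the `is_active_on_network` probe, and the 192.0.2.x placeholder addresses are all assumptions introduced for the example.

```python
import ipaddress


def can_bring_in_service(candidate, existing_resources, is_active_on_network):
    """Return (ok, reason) for a candidate alias address.

    existing_resources: addresses already assigned to IP resources in the
        cluster (hypothetical input; the real kit reads its own configuration).
    is_active_on_network: hypothetical callable that probes the network and
        returns True if some host already answers for the address.
    """
    addr = ipaddress.ip_address(candidate)

    # Unused resource: the address must not belong to another IP resource.
    assigned = {ipaddress.ip_address(a) for a in existing_resources}
    if addr in assigned:
        return False, "already assigned to another IP resource"

    # Unique address: the address must not be active on the network.
    if is_active_on_network(addr):
        return False, "duplicate address detected on the network"

    return True, "ok"
```

Because the uniqueness check is repeated just before the resource is brought into service, a function like this would be called both at creation time and again at in-service time.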
When the primary server fails, the IP Recovery Kit brings the IP resource into service on a backup server by configuring the IP alias on one of that server’s physical network interfaces.
Because session context is lost during recovery, users must reconnect to the IP address afterward, using exactly the same procedure they used to connect originally.
In a manual switchover, the IP Recovery Kit removes the alias address from service on the active server before adding it to the backup server.
To clarify the administration and operation of the IP Recovery Kit, consider the scenario shown in Figure 1. This example configuration contains two servers, Server 1 and Server 2. Each server has a single LAN interface, eth0, connected to subnet 25.0.1. The user systems are also on this subnet. The LAN interfaces on Server 1 and Server 2 have addresses 18.104.22.168 and 22.214.171.124, respectively.
The system administrator decides to use 126.96.36.199 as the alias address for an IP resource, to be called ipname. The administrator creates entries in the /etc/hosts files (and in the DNS, if used), similar to the following:
Assuming that Server 1 is the primary server for the resource, the administrator creates the IP resource hierarchy for ipname on Server 1 using the wizard described in the section entitled Creating an IP Resource Hierarchy. The software finds the address associated with ipname (188.8.131.52) from /etc/hosts, verifies that it is available and brings it into service by configuring a secondary address on eth0 on Server 1. eth0 on Server 1 now responds to both server1 and ipname.
With LifeKeeper 7.3 or earlier, the new alias address can be verified using the ifconfig or ip addr show command. Starting with LifeKeeper 7.4, the ip addr show command should be used (for more information, see the IPv6 Known Issue).
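As an illustration of this verification step, the sketch below parses the one-line output of `ip -o addr show` and reports whether a given address is configured on an interface. The function names, the sample output, and the 192.0.2.x placeholder addresses are assumptions made for the example; they are not part of LifeKeeper.

```python
import ipaddress
import subprocess


def read_ip_output():
    """Return the one-line-per-address output of `ip -o addr show`."""
    result = subprocess.run(
        ["ip", "-o", "addr", "show"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def alias_is_configured(ip_output, interface, address):
    """Return True if `address` appears on `interface` in the given output."""
    wanted = ipaddress.ip_address(address)
    for line in ip_output.splitlines():
        fields = line.split()
        # Expected layout: "<index>: <ifname> <family> <addr>/<prefix> ..."
        if len(fields) < 4 or fields[1] != interface:
            continue
        if fields[2] not in ("inet", "inet6"):
            continue
        if ipaddress.ip_interface(fields[3]).ip == wanted:
            return True
    return False


# Illustrative sample only: a primary address plus a secondary (alias) address.
SAMPLE = (
    "2: eth0    inet 192.0.2.6/24 brd 192.0.2.255 scope global eth0\\"
    "       valid_lft forever preferred_lft forever\n"
    "2: eth0    inet 192.0.2.10/24 scope global secondary eth0\\"
    "       valid_lft forever preferred_lft forever\n"
)
```

On a live server, `alias_is_configured(read_ip_output(), "eth0", "192.0.2.10")` would perform the same check against the real interface state.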
Users can then connect to Server 1 by entering, for example, telnet ipname. If Server 1 crashes, LifeKeeper automatically switches over the ipname address to eth0 on Server 2. The user sessions on Server 1 terminate. When users re-run telnet ipname, they are connected to Server 2.
Regardless of where ipname is in service, the addresses server1 and server2 remain active and usable, though they are not protected by LifeKeeper recovery. These addresses can be used whenever a connection to a specific server by name is required rather than a connection to a switched application. Examples might include remote system management and the LifeKeeper communications path. (In this case, for example, 184.108.40.206 and 220.127.116.11 would be used for the LifeKeeper communications path.)
Actual IP Resource
LifeKeeper can protect not only a virtual IP address but also an actual IP address (i.e., the primary IP address configured on the network interface). This makes it possible to build a configuration without a virtual IP address in the Amazon Web Services (AWS) environment by using the Recovery Kit for Route 53. Refer to the Recovery Kit for Route 53™ Administration Guide for more information.
IP Resource Monitoring
LifeKeeper periodically monitors the health of the IP resources under its control, using the following techniques in this order:
- Check the link status for the network interface on which the IP resource is configured to determine whether the interface is properly connected to the physical network.
- Verify that the IP resource is still configured as an alias on the appropriate network interface.
- Perform a broadcast ping test or ping a pre-configured list of addresses, using the protected IP address as the source address of the pings, to determine whether the IP resource can successfully send and receive data on the network.
The broadcast ping test is the default test mechanism. It operates by sending a broadcast ping packet to the broadcast address of the subnet associated with the IP resource, using the protected IP address as the source address. If a response is received from any address other than addresses on the local system, the test is considered successful.
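The success rule for the broadcast ping test can be sketched as below. The function names and the 192.0.2.x placeholder addresses are assumptions for illustration, and the list of responders is supplied by the caller rather than collected from a real ping.

```python
import ipaddress


def broadcast_target(protected_cidr):
    """Return the broadcast address of the subnet of the protected address."""
    return ipaddress.ip_interface(protected_cidr).network.broadcast_address


def broadcast_test_passed(responders, local_addresses):
    """Pass if at least one reply came from an address that is not local."""
    local = {ipaddress.ip_address(a) for a in local_addresses}
    return any(ipaddress.ip_address(r) not in local for r in responders)
```

For a protected address of 192.0.2.10/24, `broadcast_target` yields 192.0.2.255. Replies that come only from the server's own addresses do not count, so `broadcast_test_passed(["192.0.2.10"], ["192.0.2.6", "192.0.2.10"])` is false.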
For environments in which there are no systems on the network that can respond to the broadcast ping test (which is the default configuration of many systems), LifeKeeper also offers the ability to configure a list of addresses to be pinged as an alternative to the broadcast ping test. If such a list has been specified, the broadcast ping test is skipped, and all of the addresses in the list are pinged in parallel. The test is considered successful if a ping response is received from any one of the addresses in the Ping List. This technique is also useful to reduce broadcast storms on larger networks.
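A minimal sketch of the Ping List behavior follows: every address in the list is pinged in parallel, and a single reply is enough to pass. The function names are hypothetical, and `ping_once` assumes the Linux iputils `ping` command with its `-c`, `-W`, and `-I` options; LifeKeeper's own mechanism may differ.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ping_once(source, target, timeout_s=1):
    """Send one ping from `source` to `target`; True if a reply arrived."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), "-I", source, target],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def ping_list_succeeded(source, targets, ping=ping_once):
    """Ping all targets in parallel; succeed if any one of them replies."""
    if not targets:
        # No list configured; this sketch simply treats that as a failure.
        return False
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        return any(pool.map(lambda t: ping(source, t), targets))
```

The `ping` parameter lets a caller substitute a different probe, which also makes the any-one-reply rule easy to exercise without touching the network.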
If any of these tests fails during the periodic health check of an IP resource, LifeKeeper is notified of the failure. LifeKeeper first attempts a local recovery operation to restore the IP resource to a working state on the local node. See the section IP Local Recovery and Configuration Considerations for more information about the local recovery procedure. If local recovery does not restore the IP resource to a working state, LifeKeeper then attempts to migrate the application hierarchy containing the IP resource to another LifeKeeper system in the cluster.
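The escalation sequence described here can be summarized in a short sketch. All names (`periodic_health_check`, `local_recovery`, `fail_over`) are hypothetical stand-ins for LifeKeeper's internal operations, and the checks are passed in as plain callables.

```python
def periodic_health_check(checks, local_recovery, fail_over):
    """Run the checks in order and escalate on the first failure.

    checks: ordered list of (name, callable) pairs; each callable returns
        True when its test passes.
    local_recovery: callable returning True if the resource was restored
        on the local node.
    fail_over: callable that migrates the hierarchy to another node.
    """
    for name, check in checks:
        if check():
            continue
        # First failure: try to repair the resource on the local node.
        if local_recovery():
            return f"recovered locally after {name} failure"
        # Local recovery did not help: move the hierarchy to another system.
        fail_over()
        return f"failed over after {name} failure"
    return "healthy"
```

A call such as `periodic_health_check([("link", link_ok), ("alias", alias_ok), ("ping", ping_ok)], recover, migrate)` would mirror the link, alias, and ping checks in the order listed earlier, with the caller providing each function.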
LifeKeeper also uses these same health checks to verify the proper operation of an IP resource immediately after it is brought into service. A failure of any of the checks causes the in-service operation to fail.