This overview of operations uses the protection of IPv4 addresses as an example.
Bringing a Resource In Service (restore)
- Checking protected IP addresses
Check whether the protected IP address is a physical IP address.
If a physical IP address is protected, the restore process ends successfully at this point.
Note that if the [Restore and Recover] setting is [Disable], no further processing is performed and the restore process ends successfully. The default setting for [Restore and Recover] is [Enable].
- Checking the link status
If IP_NOLINKCHECK=1 is specified, skip this step and proceed to step 3.
Execute the following command to determine whether the device link is up or down. The link is considered down if the output contains the “NO-CARRIER” status.
# ip -o link show <device>
If the result of the above command indicates that the device link is up, go to step 3.
If the device link is down, the device is stopped and started once, using the commands below. If the link is still down after that, the restore process ends with an error.
For RHEL8 or later
# nmcli connection down <device>
Wait for 5 seconds (variable; can be changed, in seconds, with IP_WAIT_LINKDOWN)
# nmcli connection up <device>
Wait for 1 second (fixed; cannot be changed)
# ip -o link show <device>
For SLES and RHEL7 series
# ifdown <device>
Wait for 5 seconds (variable; can be changed, in seconds, with IP_WAIT_LINKDOWN)
# ifup <device>
Wait for 1 second (fixed; cannot be changed)
# ip -o link show <device>
In an environment where NetworkManager is enabled, also execute the following command, in addition to the link status check with the ip command, to check the NetworkManager connection status of the device. If this check fails, the restore process ends with an error.
# nmcli connection show --active | grep -qE <device>
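The two link checks above reduce to simple tests on command output. The following is a minimal bash sketch, assuming the captured output of `ip -o link show <device>` and `nmcli connection show --active` is passed in as strings; the function names link_is_up and nm_connection_active are hypothetical and not part of LifeKeeper. Only the NO-CARRIER test and the grep on the device name come from the steps above.

```bash
#!/usr/bin/env bash
# Sketch only: link_is_up and nm_connection_active are hypothetical helpers,
# not LifeKeeper functions. Each takes captured command output as an argument
# so the decision logic can be exercised without a real device.

# $1: output of `ip -o link show <device>`
# Returns 0 (link up) unless the output contains NO-CARRIER.
link_is_up() {
  case "$1" in
    *NO-CARRIER*) return 1 ;;
    *) return 0 ;;
  esac
}

# $1: output of `nmcli connection show --active`, $2: device name
# Mirrors `nmcli connection show --active | grep -qE <device>`.
nm_connection_active() {
  printf '%s\n' "$1" | grep -qE -- "$2"
}
```

Usage on a live system would look like `link_is_up "$(ip -o link show eth0)" || echo "eth0 link is down"`. Note that the grep is a substring match, as in the documented command, so a device name such as eth0 would also match eth0.100.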
- Pre-confirmation of assignment status
Execute the following command to obtain the IP addresses assigned to the device and check whether the virtual IP address is already present in the output.
# ip -4 -o addr show dev <device>
If there is no virtual IP address, go to step 4.
If the virtual IP address is already assigned, the same checks as in the monitoring process are performed. If they determine that the address is healthy, the restore process ends successfully; otherwise, it ends with an error.
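Checking the assignment status amounts to looking for an `inet <address>/<prefix>` entry in the one-line output. A minimal bash sketch, assuming that output is passed in as a string; vip_assigned is a hypothetical name, not a LifeKeeper function.

```bash
#!/usr/bin/env bash
# Sketch only: vip_assigned is a hypothetical helper, not a LifeKeeper function.
# $1: output of `ip -4 -o addr show dev <device>`, $2: virtual IP (no prefix)
# Returns 0 if the address appears as an "inet <address>/<prefix>" entry.
vip_assigned() {
  # Matching the trailing "/" keeps 192.168.1.1 from matching 192.168.1.10/24.
  printf '%s\n' "$1" | grep -qF -- " inet $2/"
}
```

The same kind of check applies to step 2 of the remove process and step 1 of the monitoring process, with the protected IP address in place of the virtual IP address.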
- Duplication check
If NOIPUNIQUE=1 is specified, skip this step and proceed to step 5.
Execute the following command to check connectivity from the physical IP address on the local node to the virtual IP address to be assigned. If there is a response, it is determined that the virtual IP address is duplicated and the restore process ends with an error.
# /opt/LifeKeeper/bin/lkping -b -c 1 -w IP_PINGTIMEOUT <Virtual IP>
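Note that the logic of this check is inverted: a ping response is a failure. A minimal bash sketch, assuming the timeout value is passed as an argument; vip_is_duplicated is a hypothetical name, and the LKPING override exists only so the logic can be tried without lkping installed.

```bash
#!/usr/bin/env bash
# Sketch only: vip_is_duplicated is hypothetical. LKPING may be overridden
# (e.g. for testing); by default it points at the lkping path shown above.
LKPING=${LKPING:-/opt/LifeKeeper/bin/lkping}

# $1: virtual IP, $2: timeout in seconds (the IP_PINGTIMEOUT value)
# Returns 0 (duplicate found, so the restore must fail) when the virtual IP
# answers; returns 1 when there is no answer or NOIPUNIQUE=1 skips the check.
vip_is_duplicated() {
  [ "${NOIPUNIQUE:-0}" = 1 ] && return 1
  "$LKPING" -b -c 1 -w "$2" "$1" >/dev/null 2>&1
}
```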
- Assigning virtual IP addresses and flushing the ARP table
First, execute the following command to assign a virtual IP address. If the result of the command indicates that the assignment failed, the restore process ends with an error.
# ip -4 addr add <Virtual IP/prefix> dev <device>
Next, execute the following command to flush the ARP cache for all network devices on the own node.
# ip neigh flush dev <device>
Then run the following commands to send gratuitous ARP packets, so that the ARP tables of other hosts on the network to which the virtual IP address was assigned are updated.
# arping -Uq -c1 -s <Virtual IP> -I <device> <Virtual IP>
# arping -Aq -c1 -s <Virtual IP> -I <device> <Virtual IP>
# arping -Uq -c1 -s <Virtual IP> -I <device> <broadcast address>
# arping -Aq -c1 -s <Virtual IP> -I <device> <broadcast address>
When the arping command is not installed, execute the following command only for the network device to which the virtual IP address is assigned.
# ip neigh flush dev <device>
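The order of the ARP update commands, and the fallback when arping is missing, can be made explicit with a small dry-run sketch. arp_update_cmds is a hypothetical name; it only prints the commands listed above and does not run them.

```bash
#!/usr/bin/env bash
# Sketch only: arp_update_cmds is hypothetical. It prints (does not run) the
# ARP update commands listed above, in the same order.
# $1: virtual IP, $2: device, $3: broadcast address,
# $4: "yes" if arping is installed (default), anything else otherwise
arp_update_cmds() {
  local vip=$1 dev=$2 bcast=$3 target mode
  if [ "${4:-yes}" = yes ]; then
    for target in "$vip" "$bcast"; do
      for mode in U A; do
        echo "arping -${mode}q -c1 -s $vip -I $dev $target"
      done
    done
  else
    # Fallback without arping: flush only this device's neighbor cache.
    echo "ip neigh flush dev $dev"
  fi
}
```

To actually execute the output (as root), it could be piped to a shell, for example `arp_update_cmds 192.168.1.100 eth0 192.168.1.255 | sh`. Printing first keeps the sequence easy to review.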
- Enable source address setting
If Source Address Setting is enabled on the IP resource properties screen, execute the following command so that communication from the local node to the network to which the virtual IP address belongs uses the virtual IP address as its source address.
# ip -4 route change <network address/prefix> src <Virtual IP>
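The <network address/prefix> placeholder here, and the <broadcast address> used by the arping and lkping commands, follow from the virtual IP address and its prefix length. One way to compute them in bash is sketched below; this is an illustration under that assumption, not how LifeKeeper derives them, and ip_to_int, int_to_ip and net_info are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch only: ip_to_int, int_to_ip and net_info are hypothetical helpers.
# They use 64-bit bash arithmetic and mask results to 32 bits.

ip_to_int() {  # dotted quad -> integer
  local IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

int_to_ip() {  # integer -> dotted quad
  local n=$1
  echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# $1: <Virtual IP/prefix>; prints "<network address/prefix> <broadcast address>"
net_info() {
  local addr=${1%/*} prefix=${1#*/} n mask net bcast
  n=$(ip_to_int "$addr")
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  net=$(( n & mask ))
  bcast=$(( net | (~mask & 0xFFFFFFFF) ))
  echo "$(int_to_ip "$net")/$prefix $(int_to_ip "$bcast")"
}
```

For example, `read -r netpfx bcast < <(net_info 192.168.1.100/24)` sets netpfx to 192.168.1.0/24 and bcast to 192.168.1.255, after which the route command could be run as `ip -4 route change "$netpfx" src 192.168.1.100`.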
- Connectivity check
The connectivity check is the same as for the monitoring process.
Please refer to the “Connectivity check” section for details. Depending on the success or failure of this check, the restore process ends successfully or with an error.
Taking a Resource Out of Service (remove)
- Checking protected IP addresses
Check whether the protected IP is a physical IP address.
If the physical IP address is protected, the remove process ends successfully at this point.
- Checking the assignment status
Execute the following command to obtain the IP address assigned to the device and verify that there is a virtual IP address in the output.
# ip -4 -o addr show dev <device>
If the virtual IP address does not exist, the remove process ends successfully.
If the virtual IP address exists, go to step 3.
- Disabling source address setting
If Source Address Setting is enabled on the IP resource properties screen, execute the following command so that communication from the local node to the network to which the virtual IP address belongs uses the physical IP address as its source address again.
# ip -4 route change <network address/prefix>
- Removing a virtual IP address
Execute the following command to remove the virtual IP address. The result of this command determines whether the remove process ends successfully or with an error; in either case, steps 5 and 6 are still performed.
# ip -4 addr delete <Virtual IP/prefix> dev <device>
- Remove host route
Execute the following command to remove the host route.
# ip -4 route delete <Virtual IP/prefix> dev <device>
- Clear routing cache
Execute the following command to clear the routing cache.
# echo 0 > /proc/sys/net/ipv4/route/flush 2> /dev/null
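The key point of steps 4 to 6 is that the exit status comes from step 4 while steps 5 and 6 always run. A minimal bash sketch of that ordering, with a hypothetical DRY_RUN switch that prints commands instead of executing them; run and remove_vip are hypothetical names, and steps 1 to 3 are omitted.

```bash
#!/usr/bin/env bash
# Sketch only: run and remove_vip are hypothetical helpers covering steps 4-6.
# With DRY_RUN set to any non-empty value, commands are printed, not executed.

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# $1: <Virtual IP/prefix>, $2: device
# The exit status is that of step 4; steps 5 and 6 run regardless.
remove_vip() {
  local vipcidr=$1 dev=$2 rc=0
  # Step 4: remove the virtual IP address and remember the result.
  run ip -4 addr delete "$vipcidr" dev "$dev" || rc=$?
  # Step 5: remove the host route (failure ignored).
  run ip -4 route delete "$vipcidr" dev "$dev" 2>/dev/null || true
  # Step 6: clear the routing cache (failure ignored).
  if [ -n "${DRY_RUN:-}" ]; then
    echo "echo 0 > /proc/sys/net/ipv4/route/flush"
  else
    { echo 0 > /proc/sys/net/ipv4/route/flush; } 2>/dev/null || true
  fi
  return "$rc"
}
```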
Monitoring (quickCheck)
Note that if the [Restore and Recover] setting is [Disable], no further processing is performed and the monitoring process ends successfully. The default setting for [Restore and Recover] is [Enable].
- Check the assignment status
Execute the following command to obtain the IP address assigned to the device and verify that there is a protected IP address in the output.
# ip -4 -o addr show dev <device>
If the protected IP address does not exist, the monitoring process ends with an error at this point.
If the protected IP address exists, go to step 2.
- Checking the link status
If IP_NOLINKCHECK=1 is specified, skip this step and proceed to step 3.
Execute the following command to determine whether the device link is up or down. The link is considered down if the output contains the “NO-CARRIER” status.
# ip -o link show <device>
If the result of the above command indicates that the device link is down, the monitoring process ends with an error at this point.
In an environment where NetworkManager is enabled, also execute the following command, in addition to the link status check with the ip command, to check the NetworkManager connection status of the device. If this check fails, the monitoring process ends with an error.
# nmcli connection show --active | grep -qE <device>
- Connectivity check
The connectivity check is the same as for the restore process.
Please refer to the “Connectivity check” section for details. Depending on the success or failure of this check, the monitoring process ends successfully or with an error.
If the monitoring process does not finish within the IP_QUICKCHECK_TIMEOUT value (default: 12 seconds), it times out and causes a resource failure.
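The timeout behavior can be illustrated with the GNU coreutils `timeout` command. This is a minimal sketch, not how LifeKeeper enforces IP_QUICKCHECK_TIMEOUT; quickcheck_with_timeout is a hypothetical name, and the 12-second fallback is the default stated above.

```bash
#!/usr/bin/env bash
# Sketch only: quickcheck_with_timeout is hypothetical and relies on GNU
# coreutils `timeout`. $@: the check command to run.
# Returns the command's status, or 124 if it did not finish in time.
quickcheck_with_timeout() {
  timeout "${IP_QUICKCHECK_TIMEOUT:-12}" "$@"
}
```

A status of 124 would correspond to the timeout case that causes a resource failure.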
Recovery (recover)
- Check protected IP addresses
Check whether the protected IP is a physical IP address.
If the physical IP address is protected, the recovery process ends with an error at this point.
- Perform the recovery process
Attempt to recover the virtual IP address by performing the remove steps above and then the restore steps.
Connectivity check
This connectivity check is common to both the restore and monitoring processes. It falls into one of three types, depending on the following conditions:
- Availability of PingList
- NOBCASTPING setting value (0 or 1)
When the check in this section ends with success or an error, processing returns to the calling step of the restore or monitoring process.
Connectivity check using unicast ping and broadcast ping
Condition: PingList is available and NOBCASTPING=0 or 1
- Connectivity check using unicast ping
Execute the following command to perform a unicast ping to each IP listed in the PingList.
# /opt/LifeKeeper/bin/lkping -b -c 1 -w IP_PINGTIME -I <Virtual IP> <PingList IP>
If there is at least one response to the above command, this process ends successfully.
If there is no response to the above command, go to step 2.
- Connectivity check using broadcast ping
If NOBCASTPING=1 is specified, this process ends with an error at this point, because step 1 has failed.
If NOBCASTPING=0 is specified, execute the following command to perform a broadcast ping to the network to which the virtual IP address belongs.
# /opt/LifeKeeper/bin/lkping <broadcast address> -b -c 1 -w IP_PINGTIME -I <Virtual IP> -z <own node IP>
If there is at least one response to the above command, this process ends successfully.
If there is no response to the above command, this process ends with an error.
Successful completion without a connectivity check
Condition: PingList is not available and NOBCASTPING=1
Connectivity check is not performed under this condition and the process ends successfully.
Connectivity check using broadcast ping only
Condition: PingList is not available and NOBCASTPING=0
Execute the following command to perform a broadcast ping to the network to which the virtual IP address belongs.
# /opt/LifeKeeper/bin/lkping <broadcast address> -b -c 1 -w IP_PINGTIME -I <Virtual IP> -z <own node IP>
If there is at least one response to the above command, this process ends successfully.
If there is no response to the above command, this process ends with an error.
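The three cases above form a single decision on PingList and NOBCASTPING. A minimal bash sketch, assuming PingList is passed as a space-separated string and that VIP, BCAST, OWN_IP and IP_PINGTIME are shell variables holding the values named in the commands; connectivity_check, unicast_ping and bcast_ping are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch only: connectivity_check, unicast_ping and bcast_ping are hypothetical.
# Assumptions: PingList is a space-separated string, and VIP, BCAST, OWN_IP
# and IP_PINGTIME hold the values named in the lkping commands above.
LKPING=${LKPING:-/opt/LifeKeeper/bin/lkping}

unicast_ping() {  # $1: one PingList address
  "$LKPING" -b -c 1 -w "$IP_PINGTIME" -I "$VIP" "$1" >/dev/null 2>&1
}

bcast_ping() {
  "$LKPING" "$BCAST" -b -c 1 -w "$IP_PINGTIME" -I "$VIP" -z "$OWN_IP" >/dev/null 2>&1
}

# $1: PingList (may be empty), $2: NOBCASTPING (0 or 1)
# Returns 0 for success and 1 for error.
connectivity_check() {
  local pinglist=$1 nobcast=$2 target
  if [ -n "$pinglist" ]; then
    for target in $pinglist; do          # unicast ping to each PingList IP
      unicast_ping "$target" && return 0
    done
    [ "$nobcast" = 1 ] && return 1       # no unicast reply, broadcast disabled
    bcast_ping && return 0 || return 1   # fall back to broadcast ping
  fi
  [ "$nobcast" = 1 ] && return 0         # no PingList, no broadcast: no check
  bcast_ping && return 0 || return 1     # broadcast ping only
}
```

Keeping the pings in separate functions makes the decision logic testable with stub functions in place of lkping.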
lkping command
Command Overview
The lkping command is a connectivity check command customized for LifeKeeper for Linux and used for handling some LifeKeeper resources.
Difference from the ping command
The lkping command differs from the standard ping command, which is commonly used to check connectivity, in some of its options, but its internal behavior is the same.
Therefore, if the lkping command results in an error, the ping command can be expected to fail as well.
Options specific to the lkping command
The following options are specific to lkping. Other options are the same as the regular ping command.
-w
: Specifies the number of seconds before the command times out (not in milliseconds)
-z
: Specifies a comma-separated list of addresses whose ping responses are ignored
*The -z option is used to ignore responses to broadcast pings from the IP addresses of the local node.
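The effect of -z can be modeled as a filter over the addresses that replied. This is a minimal sketch of the idea, not lkping's implementation; filter_responders is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch only: filter_responders is hypothetical and models the effect of
# lkping's -z option; it is not lkping's implementation.
# $1: comma-separated exclusion list (as passed to -z); remaining args: responders.
# Prints the responders that still count as replies.
filter_responders() {
  local exclude=",$1," addr
  shift
  for addr in "$@"; do
    case "$exclude" in
      *",$addr,"*) ;;         # reply from an excluded (local node) address: ignore
      *) echo "$addr" ;;
    esac
  done
}
```

If only local node addresses replied, nothing remains after filtering, so the broadcast ping counts as having no response. This is why the local node's own addresses cannot make the connectivity check succeed on their own.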
IP Recovery Kit Parameters
For more details on the IP Recovery Kit parameters described here, see IP Parameters List.