The Standby Node Health Check feature monitors CPU and memory utilization on the standby node, as well as the health of out-of-service resources, to detect errors on the standby node. Issues can then be resolved in advance, reducing the risk of an unsuccessful failover if a failure occurs on the active node. This monitoring runs at the same interval as normal LifeKeeper resource monitoring (the LKCHECKINTERVAL setting in /etc/default/LifeKeeper).
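As a rough illustration of where the interval comes from, the sketch below reads a KEY=VALUE setting such as LKCHECKINTERVAL from text in the style of /etc/default/LifeKeeper. The helper name `read_lifekeeper_setting` and the sample value are assumptions made for this example; they are not part of LifeKeeper.

```python
def read_lifekeeper_setting(text, key, default=None):
    """Return the value assigned to `key` in KEY=VALUE style text.

    Blank lines and lines starting with '#' are skipped. If the key is
    assigned more than once, the last assignment is returned.
    """
    found = default
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            found = value.strip().strip('"')
    return found


# Sample text for illustration only; the value is not taken from a real system.
sample = """
# monitoring interval in seconds (illustrative value)
LKCHECKINTERVAL=120
"""

interval = int(read_lifekeeper_setting(sample, "LKCHECKINTERVAL"))
```

On a real node the text would come from reading /etc/default/LifeKeeper itself, for example with `open("/etc/default/LifeKeeper").read()`.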
The Standby Node Health Check performs the following two functions:
Node Monitoring
If all resources on a node are out of service, LifeKeeper considers it a standby node and calls the node monitoring script. The node monitoring script monitors CPU and memory utilization. If it determines that the node cannot be switched to successfully (due to high CPU or memory load), it sends this information to the administrator by email or SNMP event forwarding. See Node Monitoring for details.
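The kind of decision described here can be sketched as a threshold comparison. This is not the LifeKeeper node monitoring script; the function names, the 90% limits, and the use of /proc/meminfo style input are assumptions chosen only to make the idea concrete.

```python
def memory_percent_used(meminfo_text):
    """Compute memory utilization from text in /proc/meminfo format."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])  # value in kB
    total = fields["MemTotal"]
    available = fields["MemAvailable"]
    return 100.0 * (total - available) / total


def evaluate_standby_load(cpu_percent, mem_percent, cpu_limit=90.0, mem_limit=90.0):
    """Return warning messages for any metric above its (hypothetical) limit."""
    warnings = []
    if cpu_percent > cpu_limit:
        warnings.append(
            f"CPU utilization {cpu_percent:.1f}% exceeds {cpu_limit:.1f}%"
        )
    if mem_percent > mem_limit:
        warnings.append(
            f"memory utilization {mem_percent:.1f}% exceeds {mem_limit:.1f}%"
        )
    return warnings


# Illustrative input, not measurements from a real node.
sample_meminfo = "MemTotal: 1000 kB\nMemAvailable: 250 kB\n"
warnings = evaluate_standby_load(95.0, memory_percent_used(sample_meminfo))
```

An empty list means the node looks able to accept a switchover under these assumed limits; a non-empty list is what would be passed on to the administrator.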
Out-of-Service (OSU) Resource Monitoring
For each out-of-service (OSU) resource, lkcheck periodically calls the OSUquickCheck script. The OSUquickCheck script performs a quick health check for the resource. If it determines that the resource cannot start successfully, it changes the state of the resource to OSF and sends this information to the administrator by email or SNMP event forwarding. See OSU Resource Monitoring for details.
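The flow above can be modeled in a few lines. This is a simplified sketch, not lkcheck or an actual OSUquickCheck script: the resource tags, the `quick_check` callback, and the `notify` callback are hypothetical stand-ins for the per-resource script and for email or SNMP notification.

```python
def check_out_of_service(states, quick_check, notify):
    """Run a quick check on every OSU resource and mark failures as OSF.

    states      -- dict mapping a resource tag to its state string
    quick_check -- callable(tag) returning True if the resource could start
    notify      -- callable(message) used to report a failed check
    """
    for tag, state in states.items():
        if state != "OSU":
            continue  # only out-of-service resources are checked
        if not quick_check(tag):
            states[tag] = "OSF"
            notify(f"resource {tag} failed its quick check and was marked OSF")
    return states


# Hypothetical resource tags and results for illustration.
messages = []
states = {"ip-example": "OSU", "fs-example": "OSU", "app-example": "ISP"}
result = check_out_of_service(
    states,
    quick_check=lambda tag: tag != "fs-example",  # pretend one check fails
    notify=messages.append,
)
```

After this run, only `fs-example` has moved to OSF and one message has been queued for the administrator; the in-service resource is left untouched.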
Installation and Configuration
There is no special installation required.
Setting up Standby Node Health Check
- Configure email notification and event forwarding via SNMP.
- Configure Standby Node Health Check by setting the SNHC parameters in the /etc/default/LifeKeeper configuration file. See Standby Node Health Check Parameters List for details.
- If LifeKeeper is already running, restart the lkcheck process so that the new configuration takes effect. Run the following command to restart the lkcheck process:
Once the above steps are completed, the Standby Node Health Check is activated on that node.