For each out-of-service (OSU) resource, lkcheck periodically calls the resource's OSUquickCheck script, which performs a quick health check. If the script determines that the resource cannot start successfully, it changes the resource state to OSF (out of service with failure) and notifies the administrator by email or SNMP event forwarding. This monitoring runs at the same interval as normal LifeKeeper resource monitoring, which is set by LKCHECKINTERVAL in /etc/default/LifeKeeper.
Monitored Resources
The following can be monitored with OSU Resource Monitoring:
IP Resource | Verify that the NIC link is up (disable this check by setting IP_NOLINKCHECK=1 in /etc/default/LifeKeeper). If a ping list is configured, also verify network reachability. |
Disk or DMMP Resource(s) | Verify that the paths to the monitored disk are functional, using commands specific to each resource type. |
NAS Resource | Verify that the NFS server is accessible over NFS. Refer to NAS Configuration Considerations for information on the timeout value for NFS access. |
OSU Resource Monitoring Configuration
Set the SNHC_IPCHECK and SNHC_DISKCHECK settings in the /etc/default/LifeKeeper configuration file. You may also need to configure the following setting. See Standby Node Health Check Parameters List for details.
- SNHC_IPCHECK_SLEEPTIME
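Before editing the configuration file, it can help to see which of the settings mentioned on this page are already present. The following is a minimal sketch, not a LifeKeeper utility: show_osu_settings is a hypothetical helper name, and it assumes the file uses plain NAME=value lines. It prints only uncommented entries and does not validate their values; see Standby Node Health Check Parameters List for the values each setting accepts.

```shell
# Hypothetical helper (not shipped with LifeKeeper): list the settings named
# on this page that are currently set in a LifeKeeper defaults file.
# Assumes uncommented NAME=value lines; commented-out lines are skipped.
show_osu_settings() {
    conf="${1:-/etc/default/LifeKeeper}"
    grep -E '^(LKCHECKINTERVAL|IP_NOLINKCHECK|SNHC_IPCHECK|SNHC_DISKCHECK|SNHC_IPCHECK_SLEEPTIME)=' "$conf"
}
```

Calling show_osu_settings with no argument reads /etc/default/LifeKeeper. A non-zero exit status means none of the listed settings were found in the file.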
Recovery from Failure
If an error is detected during OSU resource monitoring, the state of the corresponding resource is changed to OSF (out of service with failure). Once a resource is in the OSF state, OSU resource monitoring is no longer performed for it. After reviewing the reported failure and resolving it, change the resource state back to OSU. The state can be changed from OSF to OSU using the following command:
/opt/LifeKeeper/lkadm/bin/retstate <resource tag>
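When several resources were moved to OSF by the same underlying fault, the command above can be repeated for each tag. The following is a rough sketch under stated assumptions: reset_osf_to_osu and the DRY_RUN and RETSTATE variables are hypothetical names introduced here for illustration, and the only LifeKeeper detail relied on is the retstate path and its single resource tag argument shown above.

```shell
# Hypothetical wrapper (not part of LifeKeeper): run retstate once per tag.
# RETSTATE defaults to the path given in this document.
RETSTATE="${RETSTATE:-/opt/LifeKeeper/lkadm/bin/retstate}"

reset_osf_to_osu() {
    for tag in "$@"; do
        if [ -n "${DRY_RUN:-}" ]; then
            # Dry run: print the command that would be executed.
            echo "$RETSTATE $tag"
        else
            # Stop at the first failure so the remaining tags can be reviewed.
            "$RETSTATE" "$tag" || { echo "retstate failed for $tag" >&2; return 1; }
        fi
    done
}
```

Setting DRY_RUN=1 before the call prints the commands without running them, which is a reasonable first step to confirm the tag list. The tags passed in should be the resource tags reported in the failure notification.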