Detecting and raising alarms for problems within an application is critical to building a fault-resilient solution. Because the mechanism and format of failures vary from one application to the next, no single set of generic mechanisms can cover every case. In general, however, many application configurations can rely on the core system error detection provided by LifeKeeper Single Server Protection. This topic illustrates LifeKeeper Single Server Protection's core facilities.
Below is a recovery scenario to demonstrate how LifeKeeper Single Server Protection provides fault detection and recovery when an application fails.
- LifeKeeper Single Server Protection will first attempt recovery by trying to restart the application.
- If the recovery succeeds, the application should continue to run normally.
- If the recovery attempt fails:
a. If LifeKeeper Single Server Protection is installed in a VMware guest OS with HA enabled (HA_DISABLE=0 in /etc/default/LifeKeeper), it triggers VMware HA by withholding the heartbeat that it sends to the Application Monitoring Interface. VMware HA then responds by restarting the server.
b. If LifeKeeper Single Server Protection is not installed in a VMware guest OS, or is installed in a VMware guest OS with HA disabled (HA_DISABLE=1 in /etc/default/LifeKeeper), a system reboot is forced.
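To check which of the two branches above applies on a given server, it can help to read the HA_DISABLE setting directly. The following is a minimal Python sketch, not a LifeKeeper tool. It assumes that /etc/default/LifeKeeper holds shell-style NAME=value lines with # comments; the function names parse_ha_disable and ha_disabled are hypothetical. Because this topic does not state what happens when the setting is absent, the sketch returns None in that case rather than guessing a default.

```python
from pathlib import Path
from typing import Optional


def parse_ha_disable(text: str) -> Optional[bool]:
    """Return True for HA_DISABLE=1, False for HA_DISABLE=0, None if unset.

    Assumption: the defaults file uses shell-style NAME=value lines and
    '#' comments. If the name appears more than once, the last one wins.
    """
    result = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, value = line.partition("=")
        if sep and name.strip() == "HA_DISABLE":
            # Tolerate optional quotes around the value.
            result = value.strip().strip("\"'") == "1"
    return result


def ha_disabled(path: str = "/etc/default/LifeKeeper") -> Optional[bool]:
    """Read the defaults file named in this topic and report HA_DISABLE."""
    return parse_ha_disable(Path(path).read_text())
```

Calling ha_disabled() on the protected server returns False when HA is enabled (case a) and True when it is disabled (case b).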
Optionally, LifeKeeper Single Server Protection can be placed in Notification Only mode. In this mode, the automatic triggering of a system reboot is disabled (see the section VMware HA and Notification Only Mode below). In Notification Only mode, you must log in to the system and correct the issue that caused the failure.
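The escalation rules described so far can be summarized as a small decision function. This is an illustrative Python sketch of the documented behavior only, not LifeKeeper's implementation. The names Escalation and escalation_after_failed_recovery are hypothetical, and the function covers only what happens once a failure has not been recovered.

```python
from enum import Enum


class Escalation(Enum):
    NOTIFY_ONLY = "no automatic reboot; an administrator must correct the issue"
    TRIGGER_VMWARE_HA = "withhold the heartbeat so VMware HA restarts the server"
    FORCE_REBOOT = "force a system reboot"


def escalation_after_failed_recovery(
    in_vmware_guest: bool,
    ha_disable: int,
    notification_only: bool,
) -> Escalation:
    """Pick the documented response to an unrecovered application failure.

    ha_disable mirrors HA_DISABLE in /etc/default/LifeKeeper
    (0 = HA enabled, 1 = HA disabled).
    """
    if notification_only:
        # Notification Only mode disables the automatic reboot.
        return Escalation.NOTIFY_ONLY
    if in_vmware_guest and ha_disable == 0:
        # Case a: VMware guest with HA enabled.
        return Escalation.TRIGGER_VMWARE_HA
    # Case b: not a VMware guest, or HA disabled in the guest.
    return Escalation.FORCE_REBOOT
```

For example, escalation_after_failed_recovery(True, 1, False) yields Escalation.FORCE_REBOOT, because HA is disabled in the guest.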
VMware HA and Notification Only Mode
- In Notification Only mode, with HA enabled in the VMware guest OS and the LifeKeeper SSP vCenter plugin installed, LifeKeeper Single Server Protection does not attempt to restart the application when a failure is detected. Instead, the resource is marked as Failed, and the status screen in the vCenter plugin dashboard view shows the failure (Application Status: Failed).
- Log in to the server and correct the issue that caused the failure.
- Open the LifeKeeper Admin Console either through the CLI or by clicking on the protected virtual machine within the vSphere Client User Interface.
- Bring the application back in service.
- Go to the dashboard view within the vSphere Client User Interface.
- Click Refresh. The application status returns to Active.