Detecting problems within an application and raising alarms for them is critical to building a fault-resilient solution. Because the mechanism and format of failures vary from one application to the next, no single set of generic mechanisms can cover every case. In general, however, many application configurations can rely on the core system error detection provided within LifeKeeper Single Server Protection. This topic demonstrates LifeKeeper Single Server Protection’s core facilities.

The following recovery scenario shows how LifeKeeper Single Server Protection provides fault detection and recovery when an application fails.

  1. LifeKeeper Single Server Protection will first attempt recovery by trying to restart the application.
  1. If the recovery succeeds, the application should continue to run normally.
  1. If the recovery attempt fails:

a. If LifeKeeper Single Server Protection is installed in a VMware guest OS with HA enabled (HA_DISABLE=0 in %LKROOT%\etc\default\LifeKeeper), it triggers VMware HA by withholding the heartbeat that it sends to the Application Monitoring Interface. VMware HA then responds by restarting the server.

b. If LifeKeeper Single Server Protection is not installed in a VMware guest OS, or is installed in a VMware guest OS with HA disabled (HA_DISABLE=1 in %LKROOT%\etc\default\LifeKeeper), a system reboot is forced.
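
The escalation path in the scenario above can be summarized as a small decision function. This is only an illustrative sketch of the documented behavior, not LifeKeeper code: the names `Outcome` and `escalation` and the boolean inputs are hypothetical, and only the `HA_DISABLE` values come from this topic.

```python
from enum import Enum


class Outcome(Enum):
    """Possible end states of the recovery scenario (hypothetical names)."""
    RUNNING = "application continues to run normally"
    VMWARE_HA_RESTART = "heartbeat withheld; VMware HA restarts the server"
    FORCED_REBOOT = "system reboot is forced"


def escalation(restart_succeeded: bool, in_vmware_guest: bool, ha_disable: int) -> Outcome:
    """Model steps 1-3: a local restart first, then escalation if it fails.

    ha_disable mirrors the HA_DISABLE setting: 0 means HA enabled, 1 means disabled.
    """
    if restart_succeeded:
        # Step 2: the restart worked, so nothing else happens.
        return Outcome.RUNNING
    if in_vmware_guest and ha_disable == 0:
        # Step 3a: VMware guest with HA enabled.
        return Outcome.VMWARE_HA_RESTART
    # Step 3b: not a VMware guest, or HA disabled.
    return Outcome.FORCED_REBOOT


if __name__ == "__main__":
    print(escalation(False, True, 0).value)
    print(escalation(False, False, 0).value)
```

The sketch deliberately leaves out Notification Only mode, which disables the automatic reboot.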

Optionally, LifeKeeper Single Server Protection can be placed in Notification Only mode. In this mode the automatic triggering of a system reboot is disabled (see the section VMware HA and Notification Only Mode below), so you must log in to the system and correct the issue that caused the failure.

VMware HA and Notification Only Mode

  1. To enable Notification Only mode, edit the file %LKROOT%\etc\default\LifeKeeper and set SSP_NOTIFICATION_ONLY=1. The default value is 0 (disabled). Then restart LifeKeeper Single Server Protection on the node.
  1. In Notification Only mode, when a failure is detected, LifeKeeper Single Server Protection will not attempt to restart the application. Instead, the resource will be marked as Failed.
  1. Open the LifeKeeper Admin Console through the CLI.
  1. Bring the application back in service.
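
To check which mode a node is configured for, the defaults file can be read as simple KEY=VALUE text. The following is a minimal sketch under stated assumptions: the helper names are hypothetical, the file is assumed to contain one KEY=VALUE setting per line, and treating lines that start with `#` as comments is also an assumption. Only the setting names and their documented values come from this topic.

```python
def parse_defaults(text):
    """Parse KEY=VALUE lines into a dict (assumed file format)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, assumed '#' comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def notification_only_enabled(settings):
    """SSP_NOTIFICATION_ONLY defaults to 0 (disabled) when it is not set."""
    return settings.get("SSP_NOTIFICATION_ONLY", "0") == "1"


def vmware_ha_enabled(settings):
    """HA_DISABLE=0 means VMware HA integration is enabled."""
    return settings.get("HA_DISABLE") == "0"


if __name__ == "__main__":
    # Sample content for illustration only, not a real defaults file.
    sample = "HA_DISABLE=0\nSSP_NOTIFICATION_ONLY=1\n"
    parsed = parse_defaults(sample)
    print("Notification Only:", notification_only_enabled(parsed))
    print("VMware HA enabled:", vmware_ha_enabled(parsed))
```

On a real node, the text would come from %LKROOT%\etc\default\LifeKeeper, and any change to the file still requires restarting LifeKeeper Single Server Protection, as described in step 1.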
