Resource Policy Management in LifeKeeper Single Server Protection (SSP) provides behavior management of resource local recovery and failover. Resource policies are managed with the lkpolicy command line tool (CLI).

LifeKeeper SSP Recovery Behavior

LifeKeeper SSP is designed to monitor individual applications and groups of related applications, performing local recoveries or sending notifications when protected applications fail. Related applications are, for example, hierarchies where the primary application depends on lower-level storage or network resources. When an application or resource failure occurs, the default behavior is:

  1. Local Recovery: First, attempt local recovery of the resource or application. An attempt will be made to restore the resource or application on the local server without external intervention. If local recovery is successful, then LifeKeeper SSP will not perform any additional action.
  2. Failover: Second, if a local recovery attempt fails to restore the resource or application (or the recovery kit monitoring the resource has no support for local recovery), then a failover will be initiated (see Failover in the Standard Policies section below).
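The two-step default behavior above can be sketched as a small decision function. This is an illustrative model only, not LifeKeeper SSP code: the function name handle_failure and the returned strings are hypothetical, and a recovery kit without local recovery support is represented by passing None.

```python
from typing import Callable, Optional


def handle_failure(try_local_recovery: Optional[Callable[[], bool]]) -> str:
    """Return the action taken for one detected failure under default policies.

    try_local_recovery is a hypothetical callback that returns True when the
    resource was restored locally; None models a recovery kit that has no
    local recovery support.
    """
    # Step 1: attempt local recovery, if the kit supports it.
    if try_local_recovery is not None and try_local_recovery():
        return "recovered-locally"  # no additional action is taken
    # Step 2: local recovery failed or is unsupported, so failover is initiated.
    return "failover"
```

For example, handle_failure(lambda: False) returns "failover", matching the case where the local recovery attempt does not restore the resource.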

Please see LifeKeeper Single Server Protection Fault Detection and Recovery Scenario for more detailed information about our recovery behavior.

Custom and Maintenance-Mode Behavior via Policies

LifeKeeper SSP supports the ability to set additional policies that modify the default recovery behavior. There are four policies that can be set for individual resources (see the section below about precautions regarding individual resource policies) or for an entire server. The recommended approach is to alter policies at the server level.

The available policies are:

Standard Policies

  • Failover – For LifeKeeper SSP this policy setting can be used to turn on/off resource failover (which results in a reboot).
  • LocalRecovery – By default, LifeKeeper SSP will attempt to recover protected resources by restarting the individual resource or the entire protected application prior to performing a failover (which would be a reboot). This policy setting can be used to turn on/off local recovery.
  • TemporalRecovery – Normally, LifeKeeper SSP will perform local recovery of a failed resource. If local recovery fails, LifeKeeper SSP will perform a reboot. If the local recovery succeeds, failover (which would be a reboot) will not be performed.

There may be cases where the local recovery succeeds but, due to some irregularity in the server, the local recovery is re-attempted within a short time, resulting in multiple, consecutive local recovery attempts. This may degrade availability for the affected application.

To prevent this repetitive local recovery/failure cycle, you may set a temporal recovery policy. The temporal recovery policy allows an administrator to limit the number of local recovery attempts (successful or not) within a defined time period.

Example: If a user sets the policy definition to limit the resource to three local recovery attempts in a 30-minute time period, LifeKeeper SSP will fail over (reboot) when a third local recovery attempt occurs within the 30-minute period.

Defined temporal recovery policies may be turned on or off. When a temporal recovery policy is off, temporal recovery processing will continue to be done and notifications will appear in the log when the policy would have fired; however, no actions will be taken.
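The temporal recovery behavior described above can be modeled as a sliding-window counter. The following is a minimal sketch, not LifeKeeper SSP's implementation: the class name, the return values, and the exact window-boundary handling are assumptions made for illustration. It reproduces the three-attempts-in-30-minutes example, and shows that a policy that is turned off still flags the event without taking action.

```python
from collections import deque


class TemporalRecoveryPolicy:
    """Toy model of a temporal recovery policy (hypothetical, for illustration)."""

    def __init__(self, recovery_limit, period_minutes, enabled=True):
        self.recovery_limit = recovery_limit
        self.period_seconds = period_minutes * 60
        self.enabled = enabled
        self._attempts = deque()  # timestamps (seconds) of recent attempts

    def on_recovery_attempt(self, now):
        """Record an attempt at time `now`; return (action, policy_fired)."""
        # Forget attempts that fall outside the window (assumed semantics).
        while self._attempts and now - self._attempts[0] >= self.period_seconds:
            self._attempts.popleft()
        self._attempts.append(now)

        policy_fired = len(self._attempts) >= self.recovery_limit
        if policy_fired and self.enabled:
            return "failover", True
        # Policy off: the event is still flagged for logging, but no action is taken.
        return "local-recovery", policy_fired


policy = TemporalRecoveryPolicy(recovery_limit=3, period_minutes=30)
actions = [policy.on_recovery_attempt(t)[0] for t in (0, 600, 1200)]
```

Here the attempts at 0, 10, and 20 minutes all fall inside one 30-minute window, so the third attempt yields "failover". The sketch does not model what happens to the attempt history after a reboot.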

Meta Policies

The “meta” policies are the ones that can affect more than one other policy at the same time. These policies are usually used as shortcuts for getting certain system behaviors that would otherwise require setting multiple standard policies.

  • NotificationOnly – This mode allows administrators to put LifeKeeper SSP in a “monitoring only” state. Both local recovery and failover (reboot) of a resource (or all resources in the case of a server-wide policy) are affected. The user interface will indicate a Failure state if a failure is detected, but no recovery or failover (reboot) action will be taken. Note: The administrator will need to correct the problem that caused the failure manually and then bring the affected resource(s) back in service to continue normal LifeKeeper SSP operations.
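The effect of the NotificationOnly meta policy on the two standard on/off policies can be sketched as follows. This only models the behavior described above; how LifeKeeper SSP stores or combines policies internally is not covered here, and the function and key names are hypothetical.

```python
def effective_actions(local_recovery=True, failover=True, notification_only=False):
    """Return which recovery actions remain active for a resource or server."""
    if notification_only:
        # "Monitoring only": failures are reported, but neither action is taken.
        return {"local_recovery": False, "failover": False}
    return {"local_recovery": local_recovery, "failover": failover}
```

With notification_only=True, both actions are suppressed regardless of the individual LocalRecovery and Failover settings, which is why it acts as a shortcut for setting both.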

Important Considerations for Resource-Level Policies

Resource-level policies are policies that apply to a specific resource only, as opposed to an entire resource hierarchy or server.

Example:

app
  - IP
  - file system

In the above resource hierarchy, app depends on both IP and file system. A policy can be set to disable local recovery or failover of a specific resource. This means that, for example, if the IP resource’s local recovery fails and a policy was set to disable failover of the IP resource, then the IP resource will not fail over or cause a failover of the other resources. However, if the file system resource’s local recovery fails and the file system resource policy does not have failover disabled, then the entire hierarchy will fail over, causing a reboot.

This is a simple example. Complex hierarchies can be configured, so care must be taken when setting resource-level policies.
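The hierarchy example above can be expressed as a short sketch. It is a simplified model under stated assumptions: the function name, the return strings, and the representation of policies as a set of resource tags are all hypothetical, and only the single case described above (a failed local recovery on one resource) is covered.

```python
def on_local_recovery_failure(resource, failover_disabled):
    """Decide what happens after `resource` fails local recovery.

    failover_disabled is the set of resource tags whose Failover policy
    has been turned off at the resource level.
    """
    if resource in failover_disabled:
        # This resource neither fails over nor triggers failover of the others.
        return "no-failover"
    # Otherwise the entire hierarchy fails over, which results in a reboot.
    return "hierarchy-failover-reboot"


failover_disabled = {"IP"}
ip_result = on_local_recovery_failure("IP", failover_disabled)
fs_result = on_local_recovery_failure("file system", failover_disabled)
```

With failover disabled only on IP, a failed IP recovery leaves the server running, while a failed file system recovery still reboots it. This is the kind of asymmetry to check for before setting resource-level policies in larger hierarchies.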

The lkpolicy Tool

The lkpolicy tool is the command-line tool for managing policies on servers running LifeKeeper SSP. It supports setting or modifying policies, removing policies, and viewing all available policies and their current settings. In addition, defined policies can be turned on or off, preserving resource/server settings while affecting recovery behavior.

The general usage is:

lkpolicy [--list-policies | --get-policies | --set-policy | --remove-policy] <name value pair data…>

The <name value pair data…> differs depending on the operation and the policy being manipulated, particularly when setting policies. For example, most on/off type policies only require the --on or --off switch, but the temporal policy requires additional values to describe the thresholds.
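When lkpolicy is driven from a script, the switch and the name=value pairs can be assembled programmatically. The sketch below only builds an argument list and does not run anything. The helper name build_set_policy_argv is hypothetical; the option names (written with double hyphens), the policy names, and the recoverylimit, period, and tag pairs are the ones that appear in this topic.

```python
KNOWN_POLICIES = {"Failover", "LocalRecovery", "TemporalRecovery", "NotificationOnly"}


def build_set_policy_argv(policy, on, **pairs):
    """Build an lkpolicy --set-policy argument list (hypothetical helper).

    `pairs` become name=value arguments in the order given,
    for example recoverylimit=5, period=15 or tag="myresource".
    """
    if policy not in KNOWN_POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    argv = ["lkpolicy", "--set-policy", policy, "--on" if on else "--off"]
    argv.extend(f"{name}={value}" for name, value in pairs.items())
    return argv


temporal = build_set_policy_argv("TemporalRecovery", True, recoverylimit=5, period=15)
simple = build_set_policy_argv("Failover", False)
```

The resulting list could be passed to subprocess.run on a server where lkpolicy is installed. Passing a list rather than a single string avoids shell quoting issues for values such as tag=\*.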

Example lkpolicy Usage

Authenticating With Local and Remote Servers

The lkpolicy tool communicates with LifeKeeper SSP servers via an API that the servers expose. This API requires authentication from clients like the lkpolicy tool. The first time the lkpolicy tool is asked to access a LifeKeeper SSP server, if the credentials for that server are not known, it will ask the user for credentials for that server. These credentials are in the form of a username and password and:

  1. Clients must have LifeKeeper SSP admin rights. This means the username must be in the lkadmin group according to the operating system’s authentication configuration (via PAM). It is not necessary to run as root, but the root user can be used since it is in the appropriate group by default.
  2. The credentials will be stored in the credential store so they do not have to be entered manually each time the tool is used to access this server.

See Configuring Credentials for SIOS Protection Suite for more information on the credential store and its management with the credstore utility.

An example session with lkpolicy might look like this:

[root@thor49 ~]# lkpolicy -l -d v6test4
Please enter your credentials for the system ‘v6test4’.
Username: root
Password:
Confirm password:
Failover
LocalRecovery
TemporalRecovery
NotificationOnly
[root@thor49 ~]# lkpolicy -l -d v6test4
Failover
LocalRecovery
TemporalRecovery
NotificationOnly
[root@thor49 ~]#

Listing Policies

lkpolicy --list-policy-types

Showing Current Policies

lkpolicy --get-policies

lkpolicy --get-policies tag=\*

lkpolicy --get-policies --verbose tag=mysql\* # all resources starting with mysql

lkpolicy --get-policies tag=mytagonly
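In the commands above, the backslash before the asterisk keeps the shell from expanding it, so lkpolicy receives the pattern itself. The sketch below illustrates which tags such a pattern would select, assuming shell-style wildcard matching. That matching rule is an assumption based on the "all resources starting with mysql" comment, the tag names other than mytagonly are made up, and Python's fnmatch stands in for whatever matching lkpolicy performs.

```python
from fnmatch import fnmatchcase


def select_tags(pattern, tags):
    """Return the tags matching a shell-style pattern (case-sensitive)."""
    return [tag for tag in tags if fnmatchcase(tag, pattern)]


# Hypothetical resource tags, for illustration only.
tags = ["mysql-data", "mysql", "ip-10.0.0.5", "mytagonly"]
mysql_tags = select_tags("mysql*", tags)
all_tags = select_tags("*", tags)
exact = select_tags("mytagonly", tags)
```

A pattern without a wildcard, such as mytagonly, selects only the resource with exactly that tag.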

Setting Policies

lkpolicy --set-policy Failover --off

lkpolicy --set-policy Failover --on tag=myresource

lkpolicy --set-policy Failover --on tag=\*

lkpolicy --set-policy LocalRecovery --off tag=myresource

lkpolicy --set-policy NotificationOnly --on

lkpolicy --set-policy TemporalRecovery --on recoverylimit=5 period=15

lkpolicy --set-policy TemporalRecovery --on --force recoverylimit=5 period=10

Removing Policies

lkpolicy --remove-policy Failover tag=steve
