Resource Policy Management in SIOS Protection Suite for Linux provides behavior management of resource local recovery and failover. Resource policies are managed with the lkpolicy command line tool (CLI).
SIOS Protection Suite
SIOS Protection Suite is designed to monitor individual applications and groups of related applications, periodically performing local recoveries or notifications when protected applications fail. Related applications, for example, are hierarchies where the primary application depends on lower-level storage or network resources. When an application or resource failure occurs, the default behavior is:
- Local Recovery: First, attempt local recovery of the resource or application. An attempt will be made to restore the resource or application on the local server without external intervention. If local recovery is successful, then SIOS Protection Suite will not perform any additional action.
- Failover: Second, if a local recovery attempt fails to restore the resource or application (or the recovery kit monitoring the resource has no support for local recovery), then a failover will be initiated. The failover action attempts to bring the application (and all dependent resources) into service on another server within the cluster.
Please see SIOS Protection Suite Fault Detection and Recovery Scenarios for more detailed information about our recovery behavior.
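The default two-step order described above can be summarized as a small decision function. This is only an illustrative shell sketch, not SIOS Protection Suite code; the function name default_recovery_action and its yes/no arguments are hypothetical.

```shell
#!/bin/sh
# Illustrative model of the default recovery order (not SIOS code).
# Arguments (both hypothetical):
#   $1  "yes" if the recovery kit supports local recovery, else "no"
#   $2  "yes" if the local recovery attempt succeeded, else "no"
# Prints the follow-up action: "none" or "failover".
default_recovery_action() {
    kit_supports_local=$1
    local_succeeded=$2

    # Step 1: local recovery is tried first when the kit supports it.
    if [ "$kit_supports_local" = "yes" ] && [ "$local_succeeded" = "yes" ]; then
        echo "none"        # resource restored locally; no further action
        return 0
    fi

    # Step 2: local recovery failed or is unsupported, so fail over.
    echo "failover"
}

default_recovery_action yes yes   # prints: none
default_recovery_action yes no    # prints: failover
default_recovery_action no  no    # prints: failover
```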
Custom and Maintenance-Mode Behavior via Policies
SIOS Protection Suite Version 7.5 and later supports the ability to set additional policies that modify the default recovery behavior. There are four policies that can be set for individual resources (see the section below about precautions regarding individual resource policies) or for an entire server. The recommended approach is to alter policies at the server level.
The available policies are:
Standard Policies
- Failover – This policy setting can be used to turn resource failover on or off. (Note: For reservations to be handled correctly, Failover cannot be turned off for individual SCSI resources.)
- LocalRecovery – SIOS Protection Suite, by default, will attempt to recover protected resources by restarting the individual resource or the entire protected application prior to performing a failover. This policy setting can be used to turn on/off local recovery.
- TemporalRecovery – Normally, SIOS Protection Suite will perform local recovery of a failed resource. If local recovery fails, SIOS Protection Suite will perform a resource hierarchy failover to another node. If the local recovery succeeds, failover will not be performed.
There may be cases where the local recovery succeeds but, due to some irregularity in the server, is re-attempted within a short time, resulting in multiple consecutive local recovery attempts. This may degrade availability for the affected application.
To prevent this repetitive local recovery/failure cycle, you may set a temporal recovery policy. The temporal recovery policy allows an administrator to limit the number of local recovery attempts (successful or not) within a defined time period.
Example: If a user sets the policy definition to limit the resource to three local recovery attempts in a 30-minute time period, SIOS Protection Suite will fail over when a third local recovery attempt occurs within the 30-minute period.
Defined temporal recovery policies may be turned on or off. When a temporal recovery policy is off, temporal recovery processing will continue to be done and notifications will appear in the log when the policy would have fired; however, no actions will be taken.
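The temporal recovery threshold can be pictured as counting local recovery attempts inside a time window. The following shell sketch models the example of three attempts in 30 minutes. It is an illustration only: the function name temporal_action, the use of timestamps in seconds, and the sliding-window and boundary handling are all assumptions, not a description of how SIOS Protection Suite implements the policy.

```shell
#!/bin/sh
# Illustrative model of a temporal recovery policy (not SIOS code).
# Usage (hypothetical):
#   temporal_action LIMIT PERIOD_MINUTES on|off NOW PREVIOUS_ATTEMPT...
# NOW and PREVIOUS_ATTEMPT are timestamps in seconds. The attempt that
# is happening at NOW is counted together with earlier attempts that
# fall inside the last PERIOD_MINUTES.
temporal_action() {
    limit=$1; period_min=$2; policy_state=$3; now=$4
    shift 4

    window_start=$(( now - period_min * 60 ))
    count=1                      # the attempt occurring at $now
    for ts in "$@"; do
        if [ "$ts" -gt "$window_start" ] && [ "$ts" -le "$now" ]; then
            count=$(( count + 1 ))
        fi
    done

    if [ "$count" -lt "$limit" ]; then
        echo "local-recovery"    # threshold not reached
    elif [ "$policy_state" = "on" ]; then
        echo "failover"          # threshold reached and policy is on
    else
        echo "log-only"          # policy is off: only a log notification
    fi
}

# Limit of 3 attempts per 30 minutes; earlier attempts at 0 s and 600 s.
temporal_action 3 30 on  1500 0 600   # prints: failover
temporal_action 3 30 on  2100 0 600   # prints: local-recovery
temporal_action 3 30 off 1500 0 600   # prints: log-only
```

In the second call the attempt at 0 s has aged out of the 30-minute window, so only two attempts are counted and the threshold is not reached.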
Meta Policies
The “meta” policies are the ones that can affect more than one other policy at the same time. These policies are usually used as shortcuts for getting certain system behaviors that would otherwise require setting multiple standard policies.
- NotificationOnly – This mode allows administrators to put SIOS Protection Suite in a “monitoring only” state. Both local recovery and failover of a resource (or all resources in the case of a server-wide policy) are affected. The user interface will indicate a Failure state if a failure is detected, but no recovery or failover action will be taken. Note: The administrator will need to correct the problem that caused the failure manually and then bring the affected resource(s) back in service to continue normal SIOS Protection Suite operations.
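The way a meta policy acts as a shortcut over the standard policies can be sketched as follows. This is a hypothetical shell model, not SIOS Protection Suite code: the function name effective_recovery and its output format are invented for illustration, and it assumes that NotificationOnly suppresses both actions regardless of the individual LocalRecovery and Failover settings.

```shell
#!/bin/sh
# Illustrative model of the NotificationOnly meta policy (not SIOS code).
# Usage (hypothetical):
#   effective_recovery NOTIFICATION_ONLY LOCAL_RECOVERY FAILOVER
# Each argument is "on" or "off". Prints the recovery actions that
# would actually be taken when a failure is detected.
effective_recovery() {
    notification_only=$1
    local_recovery=$2
    failover=$3

    # Monitoring-only state: failures are reported but nothing is recovered.
    if [ "$notification_only" = "on" ]; then
        echo "local_recovery=off failover=off"
        return 0
    fi

    # Otherwise the standard policy settings apply unchanged.
    echo "local_recovery=$local_recovery failover=$failover"
}

effective_recovery on  on on    # prints: local_recovery=off failover=off
effective_recovery off on off   # prints: local_recovery=on failover=off
```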
Important Considerations for Resource-Level Policies
Resource-level policies are policies that apply to a specific resource only, as opposed to an entire resource hierarchy or server.
Example:
app
- IP
- file system
In the above resource hierarchy, app depends on both IP and file system. A policy can be set to disable local recovery or failover of a specific resource. This means that, for example, if the IP resource’s local recovery fails and a policy was set to disable failover of the IP resource, then the IP resource will not fail over or cause a failover of the other resources. However, if the file system resource’s local recovery fails and the file system resource policy does not have failover disabled, then the entire hierarchy will fail over.
This is a simple example. Complex hierarchies can be configured, so care must be taken when setting resource-level policies.
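The app, IP and file system example above can be modeled in a few lines of shell. The sketch assumes that local recovery of the named resource has already failed, and it treats the tags IP and filesystem as placeholder names; the function hierarchy_action is hypothetical and does not reflect SIOS Protection Suite internals.

```shell
#!/bin/sh
# Illustrative model of resource-level Failover policies (not SIOS code).
# Usage (hypothetical):
#   hierarchy_action FAILED_TAG [TAG_WITH_FAILOVER_OFF...]
# FAILED_TAG is the resource whose local recovery has already failed.
# The remaining arguments are resources with Failover turned off.
hierarchy_action() {
    failed=$1
    shift

    for tag in "$@"; do
        if [ "$tag" = "$failed" ]; then
            # Failover is disabled for the failed resource itself.
            echo "no-failover"
            return 0
        fi
    done

    # No policy blocks failover, so the whole hierarchy moves.
    echo "hierarchy-failover"
}

# Failover is turned off for the IP resource only.
hierarchy_action IP IP            # prints: no-failover
hierarchy_action filesystem IP    # prints: hierarchy-failover
```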
The lkpolicy Tool
The lkpolicy tool is the command-line tool that allows management (querying, setting, removing) of policies on servers running SIOS Protection Suite for Linux. lkpolicy supports setting/modifying policies, removing policies and viewing all available policies and their current settings. In addition, defined policies can be set on or off, preserving resource/server settings while affecting recovery behavior.
The general usage is:
lkpolicy [--list-policies | --get-policies | --set-policy | --remove-policy] <name value pair data…>
The <name value pair data…> differ depending on the operation and the policy being manipulated, particularly when setting policies. For example, most on/off policies require only the --on or --off switch, but the temporal policy requires additional values to describe the thresholds.
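To show how the name/value data varies by policy, the sketch below assembles a --set-policy command line and prints it without running it. The helper build_set_policy is hypothetical, and the check that TemporalRecovery must be given both recoverylimit and period when turned on is an assumption drawn from the usage examples in this topic, not a statement about how lkpolicy validates its input.

```shell
#!/bin/sh
# Dry-run helper that prints an lkpolicy --set-policy command line.
# Usage (hypothetical): build_set_policy POLICY on|off [name=value...]
# Nothing is executed; the command is only echoed for review.
build_set_policy() {
    policy=$1; state=$2
    shift 2

    case "$state" in
        on|off) ;;
        *) echo "state must be on or off" >&2; return 1 ;;
    esac

    # Assumption: turning TemporalRecovery on needs both threshold values.
    if [ "$policy" = "TemporalRecovery" ] && [ "$state" = "on" ]; then
        case " $* " in
            *" recoverylimit="*) ;;
            *) echo "TemporalRecovery needs recoverylimit=N" >&2; return 1 ;;
        esac
        case " $* " in
            *" period="*) ;;
            *) echo "TemporalRecovery needs period=N" >&2; return 1 ;;
        esac
    fi

    if [ "$#" -gt 0 ]; then
        echo "lkpolicy --set-policy $policy --$state $*"
    else
        echo "lkpolicy --set-policy $policy --$state"
    fi
}

build_set_policy Failover off
# prints: lkpolicy --set-policy Failover --off
build_set_policy TemporalRecovery on recoverylimit=5 period=15
# prints: lkpolicy --set-policy TemporalRecovery --on recoverylimit=5 period=15
```

Printing the command first gives an administrator a chance to review it before running it against a live server.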
Example lkpolicy Usage
Authenticating With Local and Remote Servers
The lkpolicy tool communicates with SIOS Protection Suite servers via an API that the servers expose. This API requires authentication from clients like the lkpolicy tool. The first time the lkpolicy tool is asked to access a SIOS Protection Suite server, if the credentials for that server are not known, it will ask the user for credentials for that server. These credentials take the form of a username and password. Note the following:
- Clients must have SIOS Protection Suite admin rights. This means the username must be in the lkadmin group according to the operating system’s authentication configuration (via PAM). It is not necessary to run as root, but the root user can be used since it is in the appropriate group by default.
- The credentials will be stored in the credential store so they do not have to be entered manually each time the tool is used to access this server.
See Configuring Credentials for SIOS Protection Suite for more information on the credential store and its management with the credstore utility.
An example session with lkpolicy might look like this:
[root@thor49 ~]# lkpolicy -l -d v6test4
Please enter your credentials for the system ‘v6test4’.
Username: root
Password:
Confirm password:
Failover
LocalRecovery
TemporalRecovery
NotificationOnly
[root@thor49 ~]# lkpolicy -l -d v6test4
Failover
LocalRecovery
TemporalRecovery
NotificationOnly
[root@thor49 ~]#
Listing Policies
lkpolicy --list-policy-types
Showing Current Policies
lkpolicy --get-policies
lkpolicy --get-policies tag=\*
lkpolicy --get-policies --verbose tag=mysql\* # all resources starting with mysql
lkpolicy --get-policies tag=mytagonly
Setting Policies
lkpolicy --set-policy Failover --off
lkpolicy --set-policy Failover --on tag=myresource
lkpolicy --set-policy Failover --on tag=\*
lkpolicy --set-policy LocalRecovery --off tag=myresource
lkpolicy --set-policy NotificationOnly --on
lkpolicy --set-policy TemporalRecovery --on recoverylimit=5 period=15
lkpolicy --set-policy TemporalRecovery --on --force recoverylimit=5 period=10
Removing Policies
lkpolicy --remove-policy Failover tag=steve