Local Recovery Policies
In a highly available SAP HANA deployment using HANA System Replication (HSR), data from the primary database instance is continuously replicated to a running secondary database instance on a standby server. This secondary instance acts as a warm standby, so the SAP HANA resource can be switched or failed over to the standby server without the time-consuming step of starting the database during the in-service operation. As a result, in some situations, especially when protecting very large database instances, it may be faster to fail over to the standby server immediately than to restart the database on the primary server when a failure is detected.
This behavior can be enforced by setting a LifeKeeper local recovery policy for the SAP HANA resource. Several ways of setting a local recovery policy are described below.
Setting the Local Recovery Policy During Resource Creation
During creation of an SAP HANA resource, a dialog is presented which allows the user to either enable or disable local recovery for the resource. Select Enable to allow local recovery attempts on the server where the resource is being created, or Disable to skip all local recovery attempts and immediately fail the resource hierarchy over to the standby server when a database failure is detected on this server.
Setting the Local Recovery Policy During Resource Extension
During extension of an SAP HANA resource, a dialog is presented which allows the user to either enable or disable local recovery for the resource. Select Enable to allow local recovery attempts on the server that the resource is being extended to, or Disable to skip all local recovery attempts and immediately fail the resource hierarchy over to the standby server when a database failure is detected on this server.
Setting the Local Recovery Policy Using the LifeKeeper GUI
Right-clicking an SAP HANA resource in the LifeKeeper GUI and selecting “Enable/Disable Local Recovery…” opens a dialog allowing the user to enable or disable local recovery for the resource on the chosen server.
Setting the Local Recovery Policy Using LKCLI
The following LKCLI command may be used to enable or disable local recovery for an SAP HANA resource on the server where the command is executed. Replace <HANA Tag> in the command with the tag name of the SAP HANA resource to be configured.
# /opt/LifeKeeper/bin/lkcli resource config hana --tag <HANA Tag> --set_local_recovery_policy <enable|disable>
Setting the Local Recovery Policy Using LKPolicy
The LKPolicy utility may be used to create a local recovery policy.
Execute the following command to disable local recovery for a given resource on the server where the command is run:
# /opt/LifeKeeper/bin/lkpolicy --set-policy LocalRecovery --off tag=<Resource Tag>
Execute the following command to enable local recovery for a given resource on the server where the command is run:
# /opt/LifeKeeper/bin/lkpolicy --set-policy LocalRecovery --on tag=<Resource Tag>
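As a hedged sketch, the two lkpolicy commands above could be wrapped in a small shell function that validates its arguments before calling the utility. The function name set_local_recovery and the LKPOLICY and DRY_RUN variables are hypothetical and used only for illustration; only the lkpolicy path and options shown above come from this section.

```shell
#!/bin/sh
# Hypothetical helper around the lkpolicy commands shown in this section.
# LKPOLICY, DRY_RUN, and set_local_recovery are illustrative names, not part of LifeKeeper.
LKPOLICY="${LKPOLICY:-/opt/LifeKeeper/bin/lkpolicy}"

# Usage: set_local_recovery <on|off> <Resource Tag>
set_local_recovery() {
    state="$1"   # desired policy state: "on" or "off"
    tag="$2"     # tag name of the resource to configure

    # Reject anything other than the two states the section describes.
    case "$state" in
        on|off) ;;
        *) echo "usage: set_local_recovery <on|off> <Resource Tag>" >&2
           return 2 ;;
    esac

    if [ -z "$tag" ]; then
        echo "error: a resource tag is required" >&2
        return 2
    fi

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        echo "$LKPOLICY --set-policy LocalRecovery --$state tag=$tag"
    else
        # Run on the server whose policy should change (root required).
        "$LKPOLICY" --set-policy LocalRecovery "--$state" "tag=$tag"
    fi
}
```

With DRY_RUN set to 1, the function prints the command it would run instead of executing it, which can be useful for checking the tag before changing the policy on a server.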
Temporal Recovery Policies
In some situations, all necessary database processes start successfully but then begin to fail shortly afterwards due to an underlying server issue. If local recovery is enabled for the SAP HANA resource, this could lead to a potentially endless cycle of quickCheck failures followed by successful local recovery attempts. A sequence of several local recovery attempts within a short period of time may therefore indicate a server issue, even if all of the attempts are successful.
To help avoid endless local recovery cycles like this, the LKPolicy utility may be used to establish a temporal recovery policy on each server. With a temporal recovery policy, LifeKeeper will immediately fail all resource hierarchies from the faulty server to a standby server when it experiences X local recovery attempts within Y minutes (where X and Y are parameters set by the policy).
For example, the following command can be executed on a server to set a temporal recovery policy which will trigger failover after 3 recovery attempts on that server within 10 minutes:
# /opt/LifeKeeper/bin/lkpolicy --set-policy TemporalRecovery --on recoverylimit=3 period=10
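To make the "X attempts within Y minutes" rule concrete, the following sketch models it as a simple window check over a list of recovery attempt times. This is an illustrative model only, not LifeKeeper's implementation: the function name would_trigger_failover is hypothetical, timestamps are assumed to be epoch seconds, and treating the period boundary as inclusive is an assumption.

```shell
#!/bin/sh
# Illustrative model of a temporal recovery policy; NOT LifeKeeper's actual logic.
# Usage: would_trigger_failover <recoverylimit> <period_minutes> <timestamp>...
# Timestamps are recovery attempt times in epoch seconds (assumed format).
# Returns 0 if at least <recoverylimit> attempts fall inside any window of
# <period_minutes> that starts at one of the attempts, and 1 otherwise.
would_trigger_failover() {
    limit="$1"
    period_sec=$(( $2 * 60 ))
    shift 2

    # Treat each attempt as the start of a candidate window.
    for start in "$@"; do
        count=0
        # Count attempts inside [start, start + period] (inclusive, by assumption).
        for t in "$@"; do
            if [ "$t" -ge "$start" ] && [ "$t" -le $(( start + period_sec )) ]; then
                count=$(( count + 1 ))
            fi
        done
        if [ "$count" -ge "$limit" ]; then
            return 0
        fi
    done
    return 1
}
```

Under this model, with recoverylimit=3 and period=10, attempts at 0, 200, and 400 seconds all fall within one 600-second window and would trigger failover, while attempts at 0, 400, and 800 seconds would not.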
The following command may be executed to remove an existing temporal recovery policy:
# /opt/LifeKeeper/bin/lkpolicy --remove-policy TemporalRecovery