Takeover with Handshake of the SAP HANA Database
The “takeover with handshake” feature, available in SAP HANA 2.0 SPS04 and later, reduces downtime during a switchover by suspending the primary database, rather than stopping it completely, before the SAP HANA System Replication takeover is performed on the new database host.
When a “takeover with handshake” is initiated, the SAP HANA Recovery Kit performs the following steps:
- The database instance is left running on the previous primary node.
- A “takeover with handshake” of SAP HANA System Replication is executed on the new primary node (i.e., the previous secondary node). This puts the database on the original primary node into a suspended state before registering the new primary system replication site.
- The suspended database on the new secondary node (i.e., the previous primary node) is stopped and re-registered as the secondary system replication site.
- The database instance is started on the new secondary node.
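For orientation only, the four steps above map roughly onto standard SAP HANA System Replication commands. The sketch below prints (it does not execute) those commands. It reflects our assumption about an equivalent manual procedure, not the Recovery Kit's actual implementation. The SID SHC, instance number 00, host sap-rhel81-2, and site name SITEA in the usage line are hypothetical, as are the replicationMode and operationMode values. Running these commands by hand on a LifeKeeper-protected database would bypass LifeKeeper.

```shell
#!/usr/bin/env bash
# Illustrative only: PRINTS (does not run) standard SAP HANA commands that
# roughly correspond to the four steps above. This is NOT the Recovery Kit's code.
# Arguments: SID, instance number, old primary host, new primary host,
# and the site name under which the old primary is re-registered (hypothetical).
print_handshake_steps() {
  local sid=$1 nr=$2 old_primary=$3 new_primary=$4 site=$5
  local adm
  adm="$(printf '%s' "$sid" | tr '[:upper:]' '[:lower:]')adm"   # e.g. SHC -> shcadm
  # Step 1: the database on the old primary is left running, so nothing to do.
  # Step 2: takeover with handshake on the new primary suspends the old primary.
  echo "[$new_primary] su - $adm -c \"hdbnsutil -sr_takeover --suspendPrimary\""
  # Step 3: stop the suspended database and re-register it as secondary.
  # The replicationMode/operationMode values are assumptions; use your own.
  echo "[$old_primary] su - $adm -c \"sapcontrol -nr $nr -function StopWait 600 5\""
  echo "[$old_primary] su - $adm -c \"hdbnsutil -sr_register --remoteHost=$new_primary --remoteInstance=$nr --replicationMode=sync --operationMode=logreplay --name=$site\""
  # Step 4: start the database on the new secondary.
  echo "[$old_primary] su - $adm -c \"sapcontrol -nr $nr -function StartWait 600 5\""
}
```

Usage: `print_handshake_steps SHC 00 sap-rhel81-1 sap-rhel81-2 SITEA`. Each printed line is prefixed with the host it would apply to.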
Performing a Takeover with Handshake
Prerequisites
Before performing a takeover with handshake, ensure that the following prerequisites are met:
- SAP HANA 2.0 SPS04 or later is installed on each server,
- The protected SAP HANA database is in-service on a server in the cluster, and
- SAP HANA System Replication is in-sync.
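One way to check the last prerequisite from a shell is SAP's systemReplicationStatus.py script, whose exit status encodes the replication state. The sketch below assumes the commonly documented return codes (10 NoHSR, 11 Error, 12 Unknown, 13 Initializing, 14 Syncing, 15 Active) and the standard script location under $DIR_INSTANCE/exe/python_support; verify both against your SAP HANA release. The function names are hypothetical and are not part of LifeKeeper. Run the check on the current primary server.

```shell
#!/usr/bin/env bash
# Map the exit status of SAP's systemReplicationStatus.py to a state name.
# The code-to-state mapping is an assumption based on SAP documentation.
hsr_status_name() {
  case $1 in
    10) echo NoHSR ;;
    11) echo Error ;;
    12) echo Unknown ;;
    13) echo Initializing ;;
    14) echo Syncing ;;
    15) echo Active ;;
    *)  echo "Unexpected($1)" ;;
  esac
}

# Hypothetical helper: returns 0 only if system replication reports Active (15).
# $DIR_INSTANCE is single-quoted so it expands in <sid>adm's login shell.
hsr_in_sync() {
  local sid=$1 adm rc
  adm="$(printf '%s' "$sid" | tr '[:upper:]' '[:lower:]')adm"
  su - "$adm" -c 'python "$DIR_INSTANCE/exe/python_support/systemReplicationStatus.py" >/dev/null'
  rc=$?
  echo "HSR status: $(hsr_status_name "$rc")"
  [ "$rc" -eq 15 ]
}
```

Usage (SID SHC is hypothetical): `hsr_in_sync SHC && echo "ready for takeover with handshake"`.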
Takeover with Handshake in the LifeKeeper GUI
To perform a takeover with handshake in the LifeKeeper GUI, right-click the resource on the standby server and select the “In Service – Takeover with Handshake…” action.
This displays a confirmation dialog that describes the takeover with handshake feature.
Click Perform Takeover to initiate the takeover. Once the takeover process is complete, the SAP HANA resource state will be Active on the target server and Standby – Suspended on the previous primary server.
Once the SAP HANA resource has been brought in-service on the target server, a ‘remoteregisterdb’ event automatically fires in the background to stop, register, and restart the secondary database instance. To have this restart of the secondary database occur in the foreground while the SAP HANA resource is being brought in-service, set HANA_REGISTER_SECONDARY_DURING_RESTORE=true in /etc/default/LifeKeeper.
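If you prefer to script this setting rather than edit /etc/default/LifeKeeper by hand, a minimal sketch follows. The helper name set_lk_default is hypothetical. It replaces an existing assignment of the key, or appends one, and keeps a backup copy. It assumes GNU sed (for `sed -i`) and that the file uses simple KEY=value lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KEY=VALUE in a LifeKeeper defaults file, replacing
# any existing assignment of KEY, or appending one if none exists.
set_lk_default() {
  local key=$1 value=$2 file=${3:-/etc/default/LifeKeeper}
  # Keys are expected to be plain identifiers such as
  # HANA_REGISTER_SECONDARY_DURING_RESTORE; reject anything else.
  case $key in
    ''|*[!A-Za-z0-9_]*) echo "invalid key: $key" >&2; return 2 ;;
  esac
  cp -p "$file" "$file.bak" || return 1   # keep a backup copy first
  if grep -q "^${key}=" "$file"; then
    # VALUE must not contain '|', '&', '\' or newlines (sed replacement text).
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

Usage (as root): `set_lk_default HANA_REGISTER_SECONDARY_DURING_RESTORE true`.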
Once the ‘remoteregisterdb’ event has restarted the secondary database instance, the resource state will transition to Standby – HDB Running. During the subsequent quickCheck cycle, LifeKeeper spawns its HSR monitoring process; once it determines that HSR is in-sync, the resource state will transition to Standby – In Sync. At this point, the SAP HANA database is highly available.
Takeover with Handshake on the Command Line
Takeover with handshake may be performed on the command line by executing either of the following commands:
- /opt/LifeKeeper/bin/hana_takeover_with_handshake -t <HANA Tag> -s <Target Server> [-b]
- lkcli resource config hana --tag <HANA Tag> --takeover_with_handshake <Target Server>
The optional -b parameter in the first command controls how much of the SAP HANA resource hierarchy is brought in-service on the target server. Without the -b option, the entire hierarchy (including all parent resources and resources with shared dependencies) will be brought in-service. With the -b option, only the given SAP HANA resource and its dependencies will be brought in-service.
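To make the effect of -b concrete, the sketch below builds the first command for either scope. The wrapper name twh and its DRY_RUN switch are hypothetical; only the hana_takeover_with_handshake path and its -t, -s, and -b options come from this section. By default it only prints the command line.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: build the hana_takeover_with_handshake command line.
# Prints it by default (DRY_RUN=1); executes it only when DRY_RUN=0.
twh() {
  local tag=$1 target=$2 scope=${3:-hierarchy}
  local -a cmd=(/opt/LifeKeeper/bin/hana_takeover_with_handshake -t "$tag" -s "$target")
  case $scope in
    hierarchy) ;;              # entire hierarchy, including parents: no -b
    resource)  cmd+=(-b) ;;    # only the SAP HANA resource and its dependencies
    *) echo "scope must be 'hierarchy' or 'resource'" >&2; return 2 ;;
  esac
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Usage: `twh HANA-SHC_HDB00 sap-rhel81-2 resource` prints the command with -b; `DRY_RUN=0 twh HANA-SHC_HDB00 sap-rhel81-2` runs it for the whole hierarchy. The target server name sap-rhel81-2 is hypothetical.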
Controlling Failback Behavior
By default, if the SAP HANA resource cannot be brought in-service during a takeover with handshake attempt, it is left in the Out of Service – Failed (OSF) state on the target server and requires manual intervention to bring it back in-service. Alternatively, LifeKeeper can be configured to attempt an automated failback in this scenario, bringing the SAP HANA resource hierarchy back in-service on the previous host. To enable this automated failback behavior, set HANA_HANDSHAKE_TAKEOVER_FAILBACK=true in /etc/default/LifeKeeper.
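To confirm which failback behavior is configured on a server, a small read-only check can look up the flag. The function name lk_flag_enabled is hypothetical. It assumes simple KEY=value lines, treats a missing key as disabled, tolerates surrounding double quotes, and, if the key is assigned more than once, uses the last assignment (as a shell would if the file were sourced).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if KEY is set to true in the defaults file.
lk_flag_enabled() {
  local key=$1 file=${2:-/etc/default/LifeKeeper} value
  # Take the value of the last assignment of KEY, if any.
  value=$(sed -n "s/^${key}=//p" "$file" | tail -n 1)
  value=${value%\"}; value=${value#\"}   # tolerate KEY="true"
  [ "$value" = true ]
}
```

Usage: `lk_flag_enabled HANA_HANDSHAKE_TAKEOVER_FAILBACK && echo "automated failback enabled"`.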
Resuming a Suspended Primary Database
If the protected SAP HANA database becomes suspended on the node where the resource is currently in-service (for example, because an administrator manually performed a takeover with handshake outside of LifeKeeper), the resource state is displayed as Active – Suspended on that node and Standby – Primary on the node where the database is now registered as primary.
While in this state, all SAP HANA resource monitoring is suspended. Until the issue is resolved, a message similar to the following is logged and broadcast to all open terminals on the server where the SAP HANA resource is currently in-service:
EMERG:hana:quickCheck:HANA-SHC_HDB00:136377:SAP HANA database HDB00 corresponding to resource HANA-SHC_HDB00 is currently suspended on server sap-rhel81-1 due to actions performed outside of LifeKeeper. Please take the SAP HANA resource out of service on server sap-rhel81-1 and bring it in-service on the server where the database should be registered as primary master. Bringing resource HANA-SHC_HDB00 back in-service on sap-rhel81-1 will resume the suspended database. Resource monitoring for HANA-SHC_HDB00 will be suspended until the issue is resolved.
If necessary, the database can be manually resumed by executing the following command on the server where it is suspended:
su - <sid>adm -c "hdbnsutil -sr_resumeSuspendedPrimary"
After resuming the database with this command, it must also be manually stopped on the standby server where it is registered as primary by executing the following command:
su - <sid>adm -c "sapcontrol -nr <InstNum> -function StopWait 600 5"
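The two manual recovery commands above can be wrapped so that the <sid>adm user name is derived from the SID and the instance number is sanity-checked before anything runs. This is a minimal sketch; the function names are hypothetical and not part of LifeKeeper or SAP tooling. The example SID SHC is an assumption inferred from the resource tag HANA-SHC_HDB00 in the log message above.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers around the two manual recovery commands.
# <sid>adm is the lowercase SID followed by "adm" (e.g. SHC -> shcadm).
sidadm_user() {
  printf '%sadm\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

# Run on the server where the database is suspended.
resume_suspended_primary() {
  local sid=$1
  su - "$(sidadm_user "$sid")" -c "hdbnsutil -sr_resumeSuspendedPrimary"
}

# Run on the standby server where the database is registered as primary.
# StopWait 600 5: wait up to 600 seconds, polling every 5 seconds.
stop_standby_primary() {
  local sid=$1 nr=$2
  # SAP HANA instance numbers are two digits, e.g. 00.
  if ! [[ $nr =~ ^[0-9]{2}$ ]]; then
    echo "instance number must be two digits: $nr" >&2
    return 2
  fi
  su - "$(sidadm_user "$sid")" -c "sapcontrol -nr $nr -function StopWait 600 5"
}
```

Usage: run `resume_suspended_primary SHC` on the server where the database is suspended (sap-rhel81-1 in the example message), then `stop_standby_primary SHC 00` on the standby server where it is registered as primary.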
Alternatively, if the standby server is intended to become the new primary server, then the issue can be resolved by switching over the SAP HANA resource to the server where the SAP HANA resource state is Standby – Primary.