Switchover of the SAP HANA Resource
When a switchover of the primary database instance is initiated, the SAP HANA Recovery Kit performs the following steps:
- The database instance is stopped on the previous primary node
- A takeover of SAP HANA System Replication is executed on the new primary node (i.e., the previous secondary node)
- The new secondary node (i.e., the previous primary node) is re-registered as the secondary SAP HANA System Replication site
- The database instance is started on the new secondary node
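The ordering of the steps above can be sketched as a dry-run script that only prints a plan and executes nothing. The commands shown (HDB stop, HDB start, hdbnsutil -sr_takeover, hdbnsutil -sr_register) are standard SAP HANA command-line tools normally run as the <sid>adm user. They illustrate the sequence and are not necessarily what the Recovery Kit runs internally. The node names, site name, replication mode, and operation mode are all hypothetical values chosen for the example.

```shell
#!/bin/sh
# Dry-run sketch: prints the switchover sequence in order and runs nothing.
# Assumption: these SAP commands illustrate each step; the Recovery Kit's
# internal implementation may differ.

print_switchover_plan() {
    old_primary=$1   # node that currently holds the primary role
    new_primary=$2   # node that currently holds the secondary role
    inst_num=$3      # HDB instance number, e.g. 00
    site_name=$4     # hypothetical replication site name for the old primary

    # Step 1: stop the database on the previous primary node.
    echo "1. [$old_primary] HDB stop"
    # Step 2: take over replication on the new primary node.
    echo "2. [$new_primary] hdbnsutil -sr_takeover"
    # Step 3: re-register the previous primary as the secondary site.
    # replicationMode/operationMode values here are assumptions.
    echo "3. [$old_primary] hdbnsutil -sr_register --remoteHost=$new_primary --remoteInstance=$inst_num --replicationMode=sync --operationMode=logreplay --name=$site_name"
    # Step 4: start the database on the new secondary node.
    echo "4. [$old_primary] HDB start"
}

print_switchover_plan node1 node2 00 SiteA
```

Printing the plan rather than executing it keeps the example safe to run on any machine; in a LifeKeeper-protected cluster the switchover itself should be initiated through LifeKeeper.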
If a message similar to the following:
ERROR:hana:restore:HANA-SPS_HDB00:136266:The resource HANA-SPS_HDB00 protecting SAP HANA database HDB00 is not in sync. To protect the data LifeKeeper will not restore the resource on $me. Please restore the resource on the previous source server to allow the resync to complete.
is displayed while bringing the SAP HANA resource in-service, this means that SAP HANA System Replication was not in-sync when the primary database instance was stopped. Therefore data may exist on the primary database server which has not yet been replicated to the secondary database server. For this reason, LifeKeeper will not allow the secondary server to take over the primary replication role. The recommendation in this scenario is to bring the SAP HANA resource hierarchy back in-service on the previous primary server and allow the resynchronization to complete.
If the previous primary server is down and cannot be recovered, the SAP HANA resource can be forced online on the server where the data is out of sync, but this will result in the loss of all data that has not yet been replicated from the previous primary server. If a database administrator determines that this data loss is acceptable or unavoidable, the out-of-sync data flag can be manually removed with the following command:
/opt/LifeKeeper/bin/flg_remove -f '!HANA_DATA_OUT_OF_SYNC_<Tag>'
where <Tag> is the SAP HANA resource tag name in LifeKeeper (e.g., HANA-SPS_HDB00). After removing the out-of-sync data flag, reattempt the in-service operation for the HANA resource. Once brought in-service, the database will take over the primary SAP HANA System Replication role and all data on the previous primary server that has not been replicated will be lost. Once the previous primary server is repaired and brought back online, it will be registered as the secondary system replication site and the database will be restarted as the replication target.
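Because removing the flag accepts data loss, it can help to build the exact command from the resource tag and review it before running it. The following is a minimal sketch under that assumption. The function names and the DRY_RUN variable are hypothetical helpers introduced for illustration; only the flg_remove path, the -f option, and the flag name format come from the text above.

```shell
#!/bin/sh
# Sketch: build (and optionally run) the out-of-sync flag removal for a tag.
# out_of_sync_flag, remove_out_of_sync_flag and DRY_RUN are hypothetical names.

out_of_sync_flag() {
    # $1 = LifeKeeper tag of the SAP HANA resource, e.g. HANA-SPS_HDB00
    printf '!HANA_DATA_OUT_OF_SYNC_%s\n' "$1"
}

remove_out_of_sync_flag() {
    tag=$1
    if [ -z "$tag" ]; then
        echo "usage: remove_out_of_sync_flag <Tag>" >&2
        return 2
    fi
    flag=$(out_of_sync_flag "$tag")
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: only print the command so it can be reviewed first.
        echo "/opt/LifeKeeper/bin/flg_remove -f '$flag'"
    else
        # DRY_RUN=0: actually remove the flag (accepts loss of unreplicated data).
        /opt/LifeKeeper/bin/flg_remove -f "$flag"
    fi
}

remove_out_of_sync_flag HANA-SPS_HDB00
```

With the default dry-run setting, the call prints the command for the example tag HANA-SPS_HDB00 without touching the system. Setting DRY_RUN=0 runs it and should be done only after a database administrator has accepted the data loss.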
Takeover with Handshake of the SAP HANA Database
The “takeover with handshake” feature, available in SAP HANA 2.0 SPS04 and later, allows for reduced downtime of the primary database during switchover by suspending the primary database (rather than completely stopping it) before performing a takeover of SAP HANA System Replication on the new database host. For more information on how to perform this type of takeover with the SAP HANA Recovery Kit, see Takeover with Handshake.
Stopping the SAP HANA Database
When the SAP HANA resource is taken out of service in LifeKeeper, only the primary database instance is stopped. The secondary database instance is kept running to minimize downtime during switchover or failover of the HANA resource hierarchy.
There are two special cases that cause exceptional behavior:
- If the !volatile!hana_leave_db_running_<HANA Tag> LifeKeeper flag is set on the system where the SAP HANA resource is being taken out of service, LifeKeeper will not stop the database instance during the out of service operation. If this flag has been set unintentionally, it can be removed with the following command:
/opt/LifeKeeper/bin/flg_remove -f '!volatile!hana_leave_db_running_<HANA Tag>'
- If the database is suspended on the system where the SAP HANA resource is being taken out of service, LifeKeeper will not stop the database instance during the out of service operation. Keeping the suspended database instance running in this scenario preserves the option to resume it, should that action be required. This scenario most commonly occurs when a “takeover with handshake” has been manually performed by a database administrator outside of LifeKeeper. If necessary, the suspended database instance can be stopped manually with the following command:
su - <sid>adm -c "sapcontrol -nr <InstNum> -function StopWait 600 5"
where <sid> is the lower-case SID for the SAP HANA installation and <InstNum> is the HDB instance number.
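The placeholder substitution described above can be sketched as a small helper that lower-cases the SID and prints the resulting command without executing it. The function name and the example SID (SPS) are hypothetical. The two-digit check reflects the usual SAP convention for instance numbers (00 to 99) and is stated here as an assumption.

```shell
#!/bin/sh
# Sketch: build the manual stop command for a suspended database instance.
# build_stop_command is a hypothetical helper; it prints, it does not execute.

build_stop_command() {
    # The <sid>adm OS user is named after the lower-case SID.
    sid=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    inst_num=$2
    # Assumption: instance numbers are always two digits (00-99).
    case $inst_num in
        [0-9][0-9]) ;;
        *)
            echo "instance number must be two digits, got '$inst_num'" >&2
            return 2
            ;;
    esac
    printf 'su - %sadm -c "sapcontrol -nr %s -function StopWait 600 5"\n' "$sid" "$inst_num"
}

build_stop_command SPS 00
```

For the example values, the helper prints the command with spsadm as the user and 00 as the instance number, ready to be reviewed and run as root.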
Stopping all SAP HANA Databases (Maintenance Mode)
When this option is chosen, the primary HANA resource is taken out of service and all of the HANA database instances in the HANA resource cluster are stopped. This option must be used with the utmost care, as it removes the possibility of a quick failover/switchover to the backup machine. Note: This option should only be chosen when the secondary database instance must also be stopped (e.g., during a maintenance window).
To use this option, perform the following steps on the HANA resource hierarchy:
- Right-click either the HANA resource in the left-hand panel or an in-service server, and choose the option Out of Service – Stop HDB on All Nodes.
- Verify the HANA resource and follow the instructions given in the dialog box. Click on Stop All SAP HANA DBs to start the process.
- Once the process finishes, click Finish to complete the process.
- The final state of SAP HANA resource will appear as shown below:
- Once all of the maintenance activities are complete, bring the SAP HANA resource hierarchy in service on the last primary system.
The protected SAP HANA database may also be stopped on all servers in the cluster by executing the following command on the server where the SAP HANA resource is currently in-service (ISP):
/opt/LifeKeeper/bin/hana_stop_all_dbs -t <HANA Resource Tag>