Local recovery may fail if the local database is stopped with a sapcontrol Stop/StopWait command, resulting in a failover of the SAP HANA resource hierarchy
ISSUE: When a user issues a Stop/StopWait request for the HDB instance with the sapcontrol utility (which an ‘HDB stop’ command also uses internally), sapstartsrv begins gracefully stopping all of the HANA database processes asynchronously. It continues this stop action until either the database is completely shut down or the action times out. Any other action issued via sapcontrol while the graceful shutdown is in progress therefore competes with the stop action, and ultimately times out and fails.
In particular, the following sequence of events may lead to a failover of the SAP HANA resource hierarchy, even when local recovery is enabled for the protected database:
1. A user initiates a graceful shutdown of the HANA database while it is running on the primary server by issuing a ‘sapcontrol Stop/StopWait’ or ‘HDB stop’ command.
2. The ‘quickCheck’ script in the SAP HANA Recovery Kit detects that at least one database process is no longer running, which triggers an attempt to restart the database locally.
3. The ‘recover’ script in the SAP HANA Recovery Kit issues a ‘sapcontrol StartWait’ command to restart the protected HDB instance.
4. Because the ‘sapcontrol Stop/StopWait’ command issued in step 1 is still actively stopping the HANA database processes, the ‘sapcontrol StartWait’ command issued by the SAP HANA Recovery Kit times out and fails.
5. Since the SAP HANA Recovery Kit is unable to restart the database locally, local recovery fails and the SAP HANA resource hierarchy fails over to the standby server.
WORKAROUND/SOLUTION: If the database is being stopped manually as part of pre-production cluster testing to simulate local recovery after a failure of the primary database, consider forcefully killing the database processes (e.g., with ‘HDB kill-9’) to more accurately simulate a primary database crash. See Testing Your SAP HANA Resource Hierarchy for sample test cases.
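The difference between a graceful stop and a forced kill can be illustrated with a small toy model. This is only a sketch of the behavior described above, not the Recovery Kit's actual logic: `ToySapStartSrv`, `local_recovery`, and the timing values are hypothetical names and numbers chosen for illustration, and the model assumes a start request fails whenever a stop action is still in progress.

```python
from dataclasses import dataclass


@dataclass
class ToySapStartSrv:
    """Toy stand-in for sapstartsrv; not the real service or its API."""
    running: bool = True
    stop_busy_until: float = 0.0  # simulated time at which the stop action ends

    def stop_wait(self, now: float, shutdown_seconds: float) -> None:
        # Graceful stop: processes begin going down, and the stop action
        # stays in progress for shutdown_seconds of simulated time.
        self.running = False
        self.stop_busy_until = now + shutdown_seconds

    def kill(self) -> None:
        # Forced kill: processes die at once and no stop action remains active.
        self.running = False
        self.stop_busy_until = 0.0

    def start_wait(self, now: float) -> bool:
        # Assumption: a start issued while a stop is in progress fails.
        if now < self.stop_busy_until:
            return False
        self.running = True
        return True


def local_recovery(srv: ToySapStartSrv, now: float) -> str:
    """Mimic a health check followed by a local restart attempt."""
    if srv.running:
        return "healthy"
    if srv.start_wait(now):
        return "restarted locally"
    return "failover to standby"


# Graceful stop: the restart attempt competes with the in-progress stop.
graceful = ToySapStartSrv()
graceful.stop_wait(now=0.0, shutdown_seconds=300.0)
print(local_recovery(graceful, now=30.0))  # failover to standby

# Forced kill: nothing blocks the restart, so local recovery succeeds.
crashed = ToySapStartSrv()
crashed.kill()
print(local_recovery(crashed, now=30.0))  # restarted locally
```

In this model the only thing that separates the two outcomes is whether a stop action is still active when the restart is attempted, which is why a forced kill is the closer simulation of a real crash.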