What if a failure occurs in the middle of a rolling upgrade of a pre-9.7.0 cluster?
Before upgrading from a pre-9.7.0 version of LifeKeeper to version 9.7.0 or later, a user with an existing HANA resource hierarchy should generate an lkbackup archive on each node so that the hierarchy can be rolled back if necessary. See the lkbackup topic for more details.
Possible scenario:
- User has an existing two-node SAP HANA cluster (node1, node2) running a pre-9.7.0 version of LifeKeeper.
- The user upgrades node2 to LifeKeeper 9.7.0 or later.
- The user brings the HANA resource hierarchy in-service on node2.
- The HANA quickCheck script on node2 updates the info field to the new structure on both node1 and node2.
- A failure of the HANA hierarchy on node2 causes LifeKeeper to fail the hierarchy over to node1.
- The HANA resource on node1 (still running a pre-9.7.0 version of LifeKeeper) cannot re-establish replication to node2 because it cannot read the replication mode from the updated info field structure.
What errors will occur if the resource is brought in-service on a pre-9.7.0 LifeKeeper system after the info field has been updated by the LifeKeeper 9.7.0 upgrade?
The resource will come in-service on the pre-9.7.0 system but will not re-establish replication to the standby system. Messages similar to the following will appear in the LifeKeeper log on the in-service system:
- INFO:hana:remoteregisterdb:HANA-SPS_HDB00:136175:failed. trace file nameserver_node2.00000.000.trc may contain more error details.
- ERROR:hana:remoteregisterdb:HANA-SPS_HDB00:136258:Attempt to register server node2 as the secondary SAP HANA System Replication site for database HDB00 failed with exit code 127.
- NOTIFY:hana:remoteregisterdb:HANA-SPS_HDB00:136701:END failed remoteregisterdb of HANA-SPS_HDB00 with return value of 1.
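The messages above can be searched for mechanically. The following is a minimal sketch, not part of LifeKeeper, that scans log lines for the three message IDs shown above (136175, 136258, 136701). It assumes each entry contains the colon-separated fields in the order seen in the excerpts (level, application, script, resource tag, message ID, text); the function and class names are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple

# Message IDs copied from the log excerpts above.
SYMPTOM_IDS = {"136175", "136258", "136701"}

# Assumed field order, based only on the excerpts above:
# LEVEL:hana:remoteregisterdb:<resource tag>:<message ID>:<text>
LOG_ENTRY = re.compile(
    r"(?P<level>[A-Z]+):hana:remoteregisterdb:"
    r"(?P<tag>[^:]+):(?P<msg_id>\d+):(?P<text>.*)"
)


class Finding(NamedTuple):
    level: str
    tag: str
    msg_id: str
    text: str


def find_registration_failures(lines: Iterable[str]) -> List[Finding]:
    """Return the entries whose message ID matches one of the known symptoms."""
    findings = []
    for line in lines:
        match = LOG_ENTRY.search(line)  # search() tolerates any leading prefix
        if match and match.group("msg_id") in SYMPTOM_IDS:
            findings.append(
                Finding(
                    match.group("level"),
                    match.group("tag"),
                    match.group("msg_id"),
                    match.group("text").strip(),
                )
            )
    return findings
```

To use it, pass an open log file, for example `find_registration_failures(open(path))`, where `path` is the location of the LifeKeeper log on the in-service system. An empty result only means these particular message IDs were not found; it does not prove that replication is healthy.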
How can a user roll back to the pre-9.7.0 format of the info field?
Recommended:
- On each node, restore the resource hierarchy from a recent lkbackup archive that was taken while the node was running a pre-9.7.0 LifeKeeper version. See the lkbackup topic for more details.
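As a rough illustration of the backup and restore steps, the sketch below builds lkbackup command lines and, by default, only prints them. It assumes that lkbackup is installed at /opt/LifeKeeper/bin/lkbackup and that `-c` creates an archive, `-x` extracts one, and `-f` names the archive file; confirm the path and options against the lkbackup topic for your release before running anything. The helper names and the archive path in the usage note are hypothetical.

```python
import subprocess
from typing import List, Optional

# Assumed default install location; adjust for your system.
LKBACKUP = "/opt/LifeKeeper/bin/lkbackup"

# Assumed options: -c creates an archive, -x extracts (restores) one.
ACTION_FLAGS = {"create": "-c", "restore": "-x"}


def lkbackup_command(action: str, archive: Optional[str] = None) -> List[str]:
    """Build the lkbackup argument list for 'create' or 'restore'."""
    if action not in ACTION_FLAGS:
        raise ValueError(f"unknown action: {action!r}")
    command = [LKBACKUP, ACTION_FLAGS[action]]
    if archive:
        command += ["-f", archive]  # assumed option for naming the archive file
    return command


def run_lkbackup(action: str, archive: Optional[str] = None,
                 dry_run: bool = True) -> List[str]:
    """Print the command when dry_run is True; otherwise execute it."""
    command = lkbackup_command(action, archive)
    if dry_run:
        print("would run:", " ".join(command))
    else:
        subprocess.run(command, check=True)  # raises if lkbackup exits non-zero
    return command
```

For example, `run_lkbackup("create")` before the upgrade and `run_lkbackup("restore", "/path/to/archive.tar.gz")` during a rollback would print the corresponding commands for review; the archive path here is a placeholder. The dry-run default is deliberate, since a restore overwrites the current configuration and should be run on each node only after the command has been checked.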