Test Switchover and Failover

In this section we will perform basic tests to verify the expected behavior of the SAP-SPS_ASCS10 and SAP-SPS_ERS20 resource hierarchies on switchover and failover. It is important to test that the enqueue server process in the ASCS10 instance is able to successfully recover the enqueue lock table from the ERS20 instance after switchover or failover.

Verify that the SAP-SPS_ASCS10 resource state is currently Active on node-a and Standby on node-b, and that the SAP-SPS_ERS20 resource state is currently Active on node-b and Standby on node-a.

On AWS or Azure, the LifeKeeper GUI should resemble the following image:

On Google Cloud, the LifeKeeper GUI should resemble the following image:

Execute the following commands to verify that the ASCS10 and ERS20 instances are running successfully on node-a and node-b, respectively:

[root@node-a ~]# su - spsadm -c "sapcontrol -nr 10 -function GetProcessList"
04.03.2021 20:24:12
GetProcessList
OK
name, description, dispstatus, textstatus, starttime, elapsedtime, pid
msg_server, MessageServer, GREEN, Running, 2020 12 21 16:53:00, 1755:31:12, 11497
enq_server, Enqueue Server 2, GREEN, Running, 2020 12 21 16:53:00, 1755:31:12, 11498

[root@node-b ~]# su - spsadm -c "sapcontrol -nr 20 -function GetProcessList"
04.03.2021 20:24:22
GetProcessList
OK
name, description, dispstatus, textstatus, starttime, elapsedtime, pid
enq_replicator, Enqueue Replicator 2, GREEN, Running, 2021 02 22 16:55:17, 243:29:05, 30028
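
The manual check above can also be scripted. The following is a minimal sketch, assuming the GetProcessList output keeps the comma-separated layout shown in the transcripts; `all_green` is a hypothetical helper name, not part of sapcontrol or LifeKeeper.

```shell
#!/bin/sh
# Hypothetical helper (not provided by SAP or LifeKeeper).
# Reads GetProcessList output on stdin and succeeds only if at least one
# process row is present and every process row reports dispstatus GREEN.
all_green() {
    awk -F', ' '
        # Process rows have 7 comma-separated fields; skip the header row.
        NF == 7 && $1 != "name" {
            rows++
            if ($3 != "GREEN") bad++
        }
        END { exit (rows > 0 && bad == 0) ? 0 : 1 }
    '
}

# Example usage on node-a (assumes the same spsadm user and instance number as above):
#   su - spsadm -c "sapcontrol -nr 10 -function GetProcessList" | all_green \
#       && echo "ASCS10 processes are all GREEN"
```

Parsing the dispstatus column rather than searching for the word GREEN anywhere in the output avoids a false pass when one process is GREEN and another is not.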

Execute the following command on node-a to write 100 exclusive non-cumulative locks labeled 0-99 to the lock table maintained by the enqueue server:

[root@node-a ~]# su - spsadm -c "enq_admin --set_locks=100:X:DIAG::TAB:%u pf=/usr/sap/SPS/SYS/profile/SPS_ASCS10_sps-ascs"
Enqueue Server 2

2021-03-04 20:32:16; OK; 'Set Locks'; Response=41496 usec
==============================================================

Execute the following commands to verify that the locks have been successfully stored in the lock table on node-a and replicated to the enqueue replication server on node-b:

[root@node-a ~]# su - spsadm -c "sapcontrol -nr 10 -function EnqGetStatistic" | grep locks_now
locks_now: 100

[root@node-b ~]# su - spsadm -c "sapcontrol -nr 20 -function EnqGetStatistic" | grep locks_now
locks_now: 100
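
To compare the two counts without reading them by eye, the values can be extracted and checked in a script. This is a sketch under the assumption that EnqGetStatistic prints a `locks_now: N` line as shown above; `locks_now` and `locks_match` are hypothetical helper names introduced for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the numeric locks_now value found in
# EnqGetStatistic output on stdin (prints nothing if the line is absent).
locks_now() {
    sed -n 's/^[[:space:]]*locks_now:[[:space:]]*\([0-9][0-9]*\).*$/\1/p' | head -n 1
}

# Hypothetical helper: succeed only if both counts are present and equal
# to the expected value. $1 = ASCS count, $2 = ERS count, $3 = expected.
locks_match() {
    [ -n "$1" ] && [ -n "$2" ] && [ "$1" -eq "$3" ] && [ "$2" -eq "$3" ]
}

# Example usage (each count is collected on the node running that instance):
#   ascs=$(su - spsadm -c "sapcontrol -nr 10 -function EnqGetStatistic" | locks_now)
#   ers=$(su - spsadm -c "sapcontrol -nr 20 -function EnqGetStatistic" | locks_now)
#   locks_match "$ascs" "$ers" 100 && echo "lock table replicated"
```

Because the two instances run on different nodes, the counts have to be gathered separately (for example over SSH) before `locks_match` is called.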

Note: If the locks are not being replicated to the Enqueue Replication Server on node-b in a Google Cloud deployment, verify that IP forwarding has been disabled on both node-a and node-b as described in the Disable IP Forwarding section of Google Cloud – Using an Internal Load Balancer. Also verify that each GenLB resource has a dependent IP resource protecting the frontend IP address of the corresponding load balancer using network mask 255.255.255.255. Until these configuration steps are completed on Google Cloud, the Enqueue Server and Enqueue Replication Server cannot communicate with each other through the frontend IP addresses of their internal load balancers when running on different cluster nodes.

Perform a switchover of the ASCS resource hierarchy by right-clicking the SAP-SPS_ASCS10 resource on node-b and choosing the In-Service… operation. Click In Service to begin the switchover. Once the switchover is complete, the SAP-SPS_ASCS10 and SAP-SPS_ERS20 resources will both be in-service on node-b.

On AWS or Azure, the LifeKeeper GUI should resemble the following image:

On Google Cloud, the LifeKeeper GUI should resemble the following image:

Once the ASCS resource hierarchy has successfully come in-service on node-b and the enqueue server process has obtained the copy of the backup enqueue lock table from the enqueue replication server process, LifeKeeper will automatically relocate the SAP-SPS_ERS20 resource to node-a to provide lock table redundancy across cluster nodes. This process may take several minutes to complete.

On AWS or Azure, the LifeKeeper GUI should resemble the following image:

On Google Cloud, the LifeKeeper GUI should resemble the following image:
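
Because the relocation may take several minutes, a small polling loop can stand in for repeatedly re-running the status command by hand. The sketch below is a generic retry helper; `wait_until` is a hypothetical name, and the attempt count and interval in the usage comment are illustrative values, not LifeKeeper defaults.

```shell
#!/bin/sh
# Hypothetical helper: run a command repeatedly until it succeeds.
# $1 = maximum number of attempts, $2 = seconds to sleep between attempts,
# remaining arguments = the command to run.
# Note: plain POSIX sh has no local variables, so tries/interval are global.
wait_until() {
    tries=$1
    interval=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        "$@" && return 0
        tries=$((tries - 1))
        # Sleep only if another attempt remains.
        [ "$tries" -gt 0 ] && sleep "$interval"
    done
    return 1
}

# Example usage on node-a: wait up to about 5 minutes for the enqueue
# replicator in ERS20 to report GREEN after the relocation.
#   wait_until 30 10 sh -c \
#     'su - spsadm -c "sapcontrol -nr 20 -function GetProcessList" | grep -q "enq_replicator.*GREEN"'
```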

Once LifeKeeper has relocated the SAP-SPS_ERS20 resource to node-a, execute the following commands to verify that the ASCS10 and ERS20 instances are running successfully on node-b and node-a, respectively, and that they both still hold the 100 locks written in step 3:

[root@node-a ~]# su - spsadm -c "sapcontrol -nr 20 -function GetProcessList"
04.03.2021 20:58:57
GetProcessList
OK
name, description, dispstatus, textstatus, starttime, elapsedtime, pid
enq_replicator, Enqueue Replicator 2, GREEN, Running, 2021 03 04 20:57:34, 0:01:23, 21967

[root@node-a ~]# su - spsadm -c "sapcontrol -nr 20 -function EnqGetStatistic" | grep locks_now
locks_now: 100

[root@node-b ~]# su - spsadm -c "sapcontrol -nr 10 -function GetProcessList"
04.03.2021 20:56:56
GetProcessList
OK
name, description, dispstatus, textstatus, starttime, elapsedtime, pid
msg_server, MessageServer, GREEN, Running, 2021 03 04 20:54:47, 0:02:09, 17074
enq_server, Enqueue Server 2, GREEN, Running, 2021 03 04 20:54:47, 0:02:09, 17075

[root@node-b ~]# su - spsadm -c "sapcontrol -nr 10 -function EnqGetStatistic" | grep locks_now
locks_now: 100

Execute the following command to forcefully reboot node-b:

[root@node-b ~]# echo b > /proc/sysrq-trigger

Once LifeKeeper has detected that node-b has gone down, the status of node-b changes to “Unknown” in the LifeKeeper GUI.

On AWS or Azure, the LifeKeeper GUI should resemble the following image:

On Google Cloud, the LifeKeeper GUI should resemble the following image:

At this point, LifeKeeper will initiate automatic failover of the SAP-SPS_ASCS10 resource hierarchy back to node-a. The SAP-SPS_ASCS10 and SAP-SPS_ERS20 resource hierarchies will both be Active on node-a until node-b comes back online.

On AWS or Azure, the LifeKeeper GUI should resemble the following image:

On Google Cloud, the LifeKeeper GUI should resemble the following image:

Once node-b is back online, LifeKeeper will automatically relocate the SAP-SPS_ERS20 resource hierarchy back to node-b. This process may take several minutes to complete. Once this process is complete, the SAP-SPS_ASCS10 and SAP-SPS_ERS20 resource hierarchies will be back in-service on node-a and node-b, respectively.

On AWS or Azure, the LifeKeeper GUI should resemble the following image:

On Google Cloud, the LifeKeeper GUI should resemble the following image:

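Execute the sapcontrol GetProcessList and EnqGetStatistic commands shown earlier again to verify the expected state on each node: ASCS10 running on node-a, ERS20 running on node-b, and both still holding the 100 locks.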

Execute the following command on node-a to release the 100 locks that were written in step 3:

[root@node-a ~]# su - spsadm -c "enq_admin --release_locks=100:X:DIAG::TAB:%u pf=/usr/sap/SPS/SYS/profile/SPS_ASCS10_sps-ascs"
Enqueue Server 2

2021-03-04 21:10:22; OK; 'Release Locks'; Response=36883 usec
===============================================================

We have now verified the basic switchover and failover functionality of the ASCS and ERS resource hierarchies.
