Symptom:
When an SAP resource hierarchy that does not contain a dependent file system resource is brought back in-service on a node that previously hosted it, the in-service (restore) of the SAP resource fails.
Messages similar to the following appear in the SAP instance trace logs indicating that a port required by one of the instance processes is already in-use (this example was taken from /usr/sap/<SID>/ASCS<##>/work/dev_ms.new):
[Thr 140225937864512] ***LOG Q0I=> NiIBindSocket: bind (98: Address already in use) [/bas/781_REL/src/base/ni/nixxi.cpp 3946]
[Thr 140225937864512] *** ERROR => NiIBindSocket: SiBind failed for hdl 1/sock 6
(SI_EPORT_INUSE/98; I4; ST; 0.0.0.0:3610) [nixxi.cpp 3946]
[Thr 140225937864512] *** ERROR => MsSCommInit: NiBuf2Listen(sapmsSHC) (rc=NIESERV_USED) [msxxserv.c 12838]
[Thr 140225937864512] *** ERROR => MsSInit: MsSCommInit [msxxserv.c 2732]
[Thr 140225937864512] *** ERROR => MsSInit failed, see dev_ms.new for details [msxxserv.c 7363]
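To find which port the instance failed to bind, the trace file can be scanned for the SI_EPORT_INUSE detail line. The following is a minimal sketch, not a LifeKeeper or SAP tool: the function name ports_in_use and the regular expression are illustrative assumptions based only on the line format shown in the excerpt above, and other kernel releases may format the line differently.

```python
import re

# Assumed format, taken from the excerpt above:
# "(SI_EPORT_INUSE/98; I4; ST; 0.0.0.0:3610) [nixxi.cpp 3946]"
PORT_IN_USE = re.compile(r"SI_EPORT_INUSE/\d+;.*?;\s*([0-9.]+):(\d+)\)")


def ports_in_use(trace_text):
    """Return sorted, unique (address, port) pairs reported as already in use."""
    found = {(m.group(1), int(m.group(2))) for m in PORT_IN_USE.finditer(trace_text)}
    return sorted(found)


if __name__ == "__main__":
    sample = (
        "[Thr 140225937864512] *** ERROR => NiIBindSocket: SiBind failed for hdl 1/sock 6\n"
        "(SI_EPORT_INUSE/98; I4; ST; 0.0.0.0:3610) [nixxi.cpp 3946]\n"
    )
    print(ports_in_use(sample))  # [('0.0.0.0', 3610)]
```

In practice the text passed in would be the contents of the relevant dev_* trace file from the instance work directory.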
Cause:
The SAP resource Protection Level was set to Basic, Minimum, or Standard after the resource was created or extended. With any of these settings, the protected SAP instance is not stopped when the SAP resource is taken out of service, so the instance processes are still running after the remove script completes. Because the SAP resource hierarchy does not contain a dependent file system resource, the file systems that the SAP instance depends on also stay mounted after the hierarchy is taken out of service. Had LifeKeeper been unmounting an underlying file system, the instance processes would have been killed by a fuser -k call; without that unmount, they continue running.
When LifeKeeper attempts to bring the resource back in-service on the server where the SAP instance processes were left running, the protected instance is unable to start due to the required ports already being in-use by these running processes.
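Before retrying the in-service, it can help to confirm whether the port is still held on the node. The following is a minimal sketch that probes a port by attempting to bind it, which is the same operation that fails in the trace. The function name port_in_use is hypothetical, and port 3610 is taken from the example trace above; substitute the port reported in your own trace file.

```python
import errno
import socket


def port_in_use(port, host="0.0.0.0"):
    """Return True if binding host:port fails with EADDRINUSE."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # any other error (e.g. permissions) is reported, not guessed at
        return False


if __name__ == "__main__":
    # 3610 is the port from the example trace, not a fixed value.
    print("port 3610 in use:", port_in_use(3610))
```

A True result after the SAP resource has been taken out of service suggests that instance processes were left running on that node. The probe socket is closed as soon as the check completes, so it does not hold the port itself.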
Action:
- The Basic and Minimum settings should only be used to place a resource in a temporary maintenance mode. Neither should be used as an ongoing Protection Level setting.
- The Standard setting may be used for replicated enqueue resources only. In particular, it should not be used when the SAP resource does not have a dependent file system under LifeKeeper protection.
- To resolve the issue, follow these steps:
  a. Reboot all servers in the cluster that the SAP resource has been extended to. After rebooting, all processes in the protected SAP instance will be stopped and the required ports will be available.
  b. Bring the SAP resource hierarchy in-service on one of the servers. Change the SAP resource Protection Level to Full by right-clicking the resource in the LifeKeeper GUI and selecting “Update Protection Level”.
  c. Repeat step (b) for each server in the cluster that the SAP resource has been extended to.