As previously described in the Configuration Considerations topic, if the file system has been configured on either the PAS Primary or Backup server to locally mount NFS shares, an NFS hierarchy out-of-service operation will hang the system and prevent a clean reboot. To avoid causing your cluster to hang by inadvertently stopping the NFS server, we make the following recommendations:
- Do not take your NFS hierarchy out of service on a server that contains local NFS mount points to the protected NFS share. You may take your SAP resource in and out of service freely so long as the NFS child resources stay in service. You may also bring your NFS hierarchies in service on a different server prior to shutting a server down.
- If you must stop LifeKeeper on a server where the NFS hierarchy protecting locally mounted NFS shares is in service, always use the -f option. Stopping LifeKeeper using the command lkstop -f stops LifeKeeper without taking the hierarchies out of service, thereby preventing a server hang due to local NFS mounts. See the lkstop man page for additional information.
- If you must reboot a server where the NFS hierarchy protecting locally mounted NFS shares is in service, you should first stop LifeKeeper using the -f option as described above. A server reboot will cause the system to stop LifeKeeper without the -f option, thereby taking the NFS hierarchies out of service and hanging the system.
- If you need to uninstall the SAP package, do not do so when there are SAP hierarchies containing NFS resources that are in-service protected (ISP) on the server. Delete the SAP hierarchy prior to uninstalling the package.
- If you are upgrading SPS or need to run the SPS Installation setup scripts, follow the upgrade instructions included in the SPS for Linux Installation Guide. These include switching all applications away from the server to be upgraded before running the setup script on the SPS Installation image file and/or updating your SPS packages. Specifically, do not run the setup script on the LifeKeeper Installation image file on a server where LifeKeeper is protecting active NFS shares: upgrading the nfsd kernel module requires stopping NFS on that server, which may cause the server to hang if it has locally mounted NFS file systems. For additional information, refer to the NFS Server Recovery Kit Documentation.
- Using TCP for the NFS mounts can lead to hangs in the forceumount call during out-of-service operations. When the NFS shares are not accessible, the unmount can fail, and LifeKeeper will retry it multiple times. These retries typically succeed eventually, but they delay taking the resource out of service. To avoid the retries, use the ‘nfsvers=3,proto=udp’ mount options.
- If the /sapmnt (or /sapmnt/<SID>) filesystem is shared via NFS, ‘SAP_NFS_CHECK_DIRS=/sapmnt’ should be added to /etc/default/LifeKeeper on each node in the cluster to help prevent hangs in SAP resource administration actions due to a loss of the NFS shares.
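
As an illustration of the recommendations above, the following is a minimal shell sketch, not part of the product. It assumes a Linux server where /proc/mounts lists the mounted file systems. The function names list_nfs_mounts and ensure_sap_nfs_check_dirs are hypothetical helpers introduced here for illustration; only lkstop -f, SAP_NFS_CHECK_DIRS, and /etc/default/LifeKeeper come from this topic.

```shell
#!/bin/sh
# Sketch only: these are helper definitions; nothing runs until a function is called.

# Print the mount point of every NFS file system in a mounts table.
# $1: mounts table to read (assumed default: /proc/mounts).
list_nfs_mounts() {
    # Fields in /proc/mounts: device, mount point, file system type, options, ...
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "${1:-/proc/mounts}"
}

# Add SAP_NFS_CHECK_DIRS=<dir> to the defaults file unless the variable
# is already set there, so that repeated runs do not duplicate the entry.
# $1: directory to check, e.g. /sapmnt
# $2: defaults file (default: /etc/default/LifeKeeper)
ensure_sap_nfs_check_dirs() {
    dir=$1
    file=${2:-/etc/default/LifeKeeper}
    if grep -q '^SAP_NFS_CHECK_DIRS=' "$file" 2>/dev/null; then
        return 0
    fi
    # Make sure the new entry starts on its own line.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    printf 'SAP_NFS_CHECK_DIRS=%s\n' "$dir" >> "$file"
}

# Example use before a planned stop or reboot (run as root, shown as comments):
#   ensure_sap_nfs_check_dirs /sapmnt
#   if [ -n "$(list_nfs_mounts)" ]; then
#       lkstop -f    # stop LifeKeeper without taking hierarchies out of service
#   fi
```

The check in list_nfs_mounts only reports that NFS file systems are mounted locally; it does not confirm that they belong to the protected NFS share, so treat a non-empty result as a prompt to use lkstop -f rather than as proof that a hang would occur.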