Technical Notes

LifeKeeper Features

Item	Description
Licensing	LifeKeeper requires unique runtime license keys for each server. This applies to both physical and virtual servers. A license key is required for the LifeKeeper core software, as well as for each separately packaged LifeKeeper recovery kit. The installation script installs a License Utilities package that obtains and displays the Host ID of your server during the initial install of LifeKeeper. Once your licenses have been installed the utility will return the Entitlement ID if it is available or the Host ID if it is not. The Host IDs, along with the Activation ID(s) provided with your software, are used to obtain license keys from the SIOS Technology Corp. website.
Large Cluster Support	LifeKeeper supports large cluster configurations, up to 32 servers. There are many factors other than LifeKeeper, however, that can affect the number of servers supported in a cluster. This includes items such as the storage interconnect and operating system or storage software limitations. Refer to the vendor-specific hardware and software configuration information to determine the maximum supported cluster size.
Internationalization and Localization	LifeKeeper for Linux v5.2 and later does support wide/multi-byte characters in resource and tag names but does not include native language message support. The LifeKeeper GUI can be localized by creating locale-specific versions of the Java property files, although currently only the English version is fully localized. However, many of the messages displayed by the LifeKeeper GUI come from the LifeKeeper core, so localization of the GUI will provide only a partial solution for users until the core software is fully localized. See also Language Environment Effects in Known Issues and Restrictions for additional information.
LifeKeeper MIB File	LifeKeeper can be configured to issue SNMP traps describing the events that are occurring within the LifeKeeper cluster. See the lk_configsnmp(8) man page for more information about configuring this capability. The MIB file describing the LifeKeeper traps can be found at /opt/LifeKeeper/include/LifeKeeper-MIB.txt.
Watchdog	LifeKeeper supports the watchdog feature. The feature was tested by SIOS Technology Corp. on Red Hat EL 5.5 64-bit, and Red Hat EL 6 + softdog.
STONITH	LifeKeeper supports the STONITH feature. This feature was tested by SIOS Technology Corp. on SLES 11 on IBM x3550 x86_64 architecture and RHEL 5.5 64-bit.
XFS File System	The XFS file system does not use the fsck utility to check and fix a file system but instead relies on mount to replay the log. If there is a concern that there may be a consistency problem, the system administrator should unmount the file system by taking it out of service and run xfs_check(8) and xfs_repair(8) to resolve any issues.
IPv6	SIOS has migrated to the use of the ip command and away from the ifconfig command (for more information, see IPv6 Known Issue).

Tuning

Item

Description

IPC Semaphores and IPC Shared Memory

LifeKeeper requires Inter-Process Communication (IPC) semaphores and IPC shared memory. The default Red Hat values for the following Linux kernel options are located in /usr/src/linux/include/linux/sem.h and should be sufficient to support most LifeKeeper configurations.

Option	Required	Default Red Hat 6.2
SEMOPM	14	32
SEMUME	20	32
SEMMNU	60	32000
SEMMAP	25	32000
SEMMNI	25	128
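
On current Linux kernels the live semaphore limits can usually be read from /proc/sys/kernel/sem, whose four fields are SEMMSL, SEMMNS, SEMOPM and SEMMNI, in that order. The other options in the table are not exposed there. The following is a minimal sketch, not a SIOS-supplied tool, that compares those two fields against the Required column above. The function name check_lk_sem is hypothetical.

```shell
# check_lk_sem: hypothetical helper, not part of LifeKeeper.
# Arguments: the four fields of /proc/sys/kernel/sem
#            (SEMMSL SEMMNS SEMOPM SEMMNI).
# Prints one line per checked limit and returns non-zero if either
# value is below the "Required" column of the table above.
check_lk_sem() {
    semopm=$3
    semmni=$4
    rc=0
    if [ "$semopm" -ge 14 ]; then
        echo "SEMOPM=$semopm ok"
    else
        echo "SEMOPM=$semopm below required 14"
        rc=1
    fi
    if [ "$semmni" -ge 25 ]; then
        echo "SEMMNI=$semmni ok"
    else
        echo "SEMMNI=$semmni below required 25"
        rc=1
    fi
    return $rc
}

# Usage on a live system (word splitting of the file contents is intended):
#   check_lk_sem $(cat /proc/sys/kernel/sem)
```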

System File Table

LifeKeeper requires that system resources be available in order to fail over successfully to a backup system. For example, if the system file table is full, LifeKeeper may be unable to start new processes and perform a recovery. In kernels with enterprise patches, including those supported by LifeKeeper, file-max (the maximum number of open files in the system) is configured by default to 1/10 of the system memory size, which should be sufficient to support most LifeKeeper configurations. Configuring the file-max value lower than the default could result in unexpected LifeKeeper failures.

The value of file-max may be obtained using the following command:

cat /proc/sys/fs/file-nr

This will return three numbers. The first represents the high watermark of file table entries (i.e. the maximum the system has seen so far); the second represents the current number of file table entries, and the third is the file-max value.
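
As an illustration only, the three numbers can be turned into a rough headroom figure. This sketch takes the first and third fields as described above and prints the first as a whole-number percentage of file-max. The function name file_table_usage is hypothetical.

```shell
# file_table_usage: hypothetical helper, not part of LifeKeeper.
# Arguments: the three fields of /proc/sys/fs/file-nr; the third is file-max.
# Prints the first field as a whole-number percentage of file-max.
file_table_usage() {
    first=$1
    max=$3
    echo "file-max=$max first-field=$first ($(( first * 100 / max ))%)"
}

# Usage on a live system (word splitting of the file contents is intended):
#   file_table_usage $(cat /proc/sys/fs/file-nr)
```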

To adjust file-max, add (or alter) the “fs.file-max” value in /etc/sysctl.conf (see sysctl.conf(5) for the format) and then run

sysctl -p

to update the system. The value in /etc/sysctl.conf will persist across reboots.
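
The add-or-alter step can be scripted. The sketch below is one possible approach under stated assumptions, not a SIOS-supplied procedure: the function name set_file_max is hypothetical, and the value 200000 in the usage comment is illustrative only, not a recommendation.

```shell
# set_file_max: hypothetical helper, not part of LifeKeeper.
# $1 = path to a sysctl.conf-style file, $2 = new fs.file-max value.
# Alters an existing fs.file-max line, or appends one if none exists.
set_file_max() {
    conf=$1
    value=$2
    if grep -q '^[[:space:]]*fs\.file-max[[:space:]]*=' "$conf"; then
        # Rewrite the existing line via a temporary copy; other lines are untouched
        sed "s/^[[:space:]]*fs\.file-max[[:space:]]*=.*/fs.file-max = $value/" "$conf" > "$conf.new" &&
            cat "$conf.new" > "$conf" &&
            rm -f "$conf.new"
    else
        echo "fs.file-max = $value" >> "$conf"
    fi
}

# Usage (as root), with an illustrative value only:
#   set_file_max /etc/sysctl.conf 200000 && sysctl -p
```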

LifeKeeper Operations

Item	Description
Kernel Debugger (kdb) init s	Before using the Kernel Debugger (kdb) or moving to init s on a LifeKeeper protected server, you should first either shut down LifeKeeper on that server or switch any LifeKeeper protected resources over to the backup server. Use of kdb with the LifeKeeper SCSI Reservation Daemons (lkscsid and lkccissd) enabled (they are enabled by default) can also lead to unexpected panics.
System Panic on Locked Shared Devices	LifeKeeper uses a lock to protect shared data from being accessed by other servers on a shared SCSI Bus. If LifeKeeper cannot access a device as a result of another server taking the lock on a device, then a critical error has occurred and quick action should be taken or data can be corrupted. When this condition is detected, LifeKeeper enables a feature that will cause the system to panic. If LifeKeeper is stopped by some means other than ‘/etc/init.d/lifekeeper stop-nofailover’ with shared devices still reserved (this could be caused by executing init s), then the LifeKeeper locking mechanism may trigger a kernel panic when the other server recovers the resource(s). All resources must be placed out-of-service before stopping LifeKeeper in this manner.
nolock Option	When using storage applications with locking and following recommendations for the NFS mount options, SPS requires the additional nolock option be set, e.g. rw,nolock,bg,hard,nointr,tcp,nfsvers=3,timeo=600,rsize=32768,wsize=32768,actimeo=0.
Recovering Out-of-Service Hierarchies	As part of the recovery following the failure of a LifeKeeper server, resource hierarchies that were configured on the failed server but were not in service anywhere at the time of the failure are recovered on the highest-priority server that was alive at that time. This is the case no matter where the out-of-service hierarchy was last in service, including the failed server, the recovering server, or some other server in the cluster.
Coexistence with Linux Firewalls	The firewall is enabled upon installation. After installation is complete, the firewall should be modified. LifeKeeper will function if a host firewall is enabled. However, unless absolutely necessary, it is recommended that the firewall be disabled and that the LifeKeeper protected resources reside behind another shielding firewall. If LifeKeeper must coexist on firewall enabled hosts, then the specific ports that LifeKeeper is using need to be opened. Please note that LifeKeeper uses specific ports for communication paths, GUI, IP and Data Replication. Refer to Running LifeKeeper with a Firewall for details. To disable or modify the firewall please refer to the documentation for your OS distribution.
Coexistence with SELinux	Disable SELinux. To disable SELinux, please refer to the documentation for your OS distribution. AppArmor (for distributions that use this security model) may be enabled.
Suid Mount Option	The suid mount option is the default when mounting as root and is not written to the /etc/mtab by the mount command. The suid mount option is not needed in LifeKeeper environments.
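
For the nolock Option item above, a comma-separated mount option string can be checked for a given option with a small shell test. This is a minimal sketch. The function name has_mount_option is hypothetical, and the option string in the usage comment is the one quoted in the table.

```shell
# has_mount_option: hypothetical helper, not part of LifeKeeper.
# $1 = comma-separated mount option string, $2 = option to look for.
# Returns 0 if the option is present as a whole item, 1 otherwise.
has_mount_option() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage:
#   opts="rw,nolock,bg,hard,nointr,tcp,nfsvers=3,timeo=600,rsize=32768,wsize=32768,actimeo=0"
#   has_mount_option "$opts" nolock && echo "nolock is set"
```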

Server Configuration

Item	Description
BIOS Updates	The latest available BIOS should always be installed on all LifeKeeper servers.

LifeKeeper Version 8.2.0 and Later GUI Requirement

64-bit versions of any PAM related packages will be required for the LifeKeeper GUI Client to successfully authenticate users.

Confirm Failover and Block Resource Failover Settings

Make sure you review and understand the following descriptions, examples and considerations before setting the Confirm Failover or Block Resource Failover in your LifeKeeper environment. These settings are available from the command line or on the Properties panel in the LifeKeeper GUI.

Confirm Failover On:

Definition – Enables manual failover confirmation from System A to System B (where System A is the server whose properties are being displayed in the Properties Panel and System B is the system to the left of the checkbox). If this option is set on a system, it will require a manual confirmation by a system administrator before allowing LifeKeeper to perform a failover recovery of a system that it detects as failed.

Use the lk_confirmso command to confirm the failover. By default, the administrator has 10 minutes to run this command. This time can be changed by modifying the CONFIRMSOTO setting in /etc/default/LifeKeeper. If the administrator does not run the lk_confirmso command within the time allowed, the failover will either proceed or be blocked. By default, the failover will proceed. This behavior can be changed by modifying the CONFIRMSODEF setting in /etc/default/LifeKeeper.

Example: If you wish to block automatic failovers completely, then you should set the Confirm Failover On option in the Properties panel and also set CONFIRMSODEF to 1 (block failover) and CONFIRMSOTO to 0 (do not wait to decide on the failover action).
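
As a sketch, the example above would correspond to entries like the following in /etc/default/LifeKeeper. This assumes the file uses plain NAME=value lines. Verify the format against the file on your installation before editing it.

```shell
# Fragment of /etc/default/LifeKeeper (NAME=value format is an assumption)
# 1 = block the failover when no confirmation is given
CONFIRMSODEF=1
# 0 = do not wait before applying the CONFIRMSODEF action
CONFIRMSOTO=0
```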

When to select this setting:

This setting is used in most Disaster Recovery and other WAN configurations where the configuration does not include redundant heartbeat communications paths.

In a regular site (non multi-site cluster), open the Properties page from one server and then select the server that you want the Confirm Failover flag to be set on.

For a Multi-site WAN configuration: Enable manual failover confirmation.

For a Multi-site LAN configuration: Do not enable manual failover confirmation.

In a multi-site cluster environment, from the non-disaster system, select the DR system and set the Confirm Failover flag. You will need to open the Properties panel and select this setting for each non-disaster server in the cluster.

Set Block Resource Failover On:

Definition – By default, all resource failures will result in a recover event that will attempt to recover the failed resource on the local system. If local recovery fails or is not enabled, then LifeKeeper transfers the resource hierarchy to the next highest priority system for which the resource is defined. However, if this setting is selected on a designated system(s), all resource transfers due to a resource failure will be blocked from the given system.

When the setting is enabled, the following message is logged:

Local recovery failure, failover blocked, MANUAL INTERVENTION REQUIRED
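
To find out whether this has happened, the log can be searched for the message text. The sketch below is illustrative. The function name count_blocked_failovers is hypothetical, and the log path in the usage comment is an assumption that may differ on your installation.

```shell
# count_blocked_failovers: hypothetical helper, not part of LifeKeeper.
# $1 = log file to search.
# Prints the number of lines containing the blocked-failover message.
count_blocked_failovers() {
    grep -c 'failover blocked, MANUAL INTERVENTION REQUIRED' "$1"
}

# Usage (the log location is an assumption; adjust it for your system):
#   count_blocked_failovers /var/log/lifekeeper.log
```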

Conditions/Considerations:

In a Multi-site configuration, do not select Block Failover for any server in the configuration.

Remember: This setting will not affect failover behavior if there is a complete system failure. It will only block failovers due to resource failures.

NFS Client Options

When setting up a LifeKeeper protected NFS server, how the NFS clients connect to that server can make a significant impact on the speed of reconnection on failover.

NFS Client Mounting Considerations

An NFS Server provides a network-based storage system to client computers. To utilize this resource, the client systems must “mount” the file systems that have been NFS exported by the NFS server. There are several options that system administrators must consider on how NFS clients are connected to the LifeKeeper protected NFS resources.

UDP or TCP?

The NFS Protocol can utilize either the User Datagram Protocol (UDP) or the Transmission Control Protocol (TCP). NFS has historically used the UDP protocol for client-server communication. One reason for this is that it is easier for NFS to work in a stateless fashion using the UDP protocol. This “statelessness” is valuable in a high availability clustering environment, as it permits easy reconnection of clients if the protected NFS server resource is switched between cluster hosts. In general, when working with a LifeKeeper protected NFS resource, the UDP protocol tends to work better than TCP.

Sync Option in /etc/exports

Specifying “sync” as an export option is recommended for LifeKeeper protected NFS resources. The “sync” option tells NFS to commit writes to the disk before sending an acknowledgment back to the NFS client. The contrasting “async” option is also available, but using this option can lead to data corruption, as the NFS server will acknowledge NFS writes to the client before committing them to disk. NFS clients can also specify “sync” as an option when they mount the NFS file system.
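
A quick way to audit an exports file for entries that lack “sync” is sketched below. It assumes one client specification per line, such as the hypothetical entry /export/data 192.168.1.0/24(rw,sync), and reports the whole line if “sync” does not appear as an option anywhere on it. The function name exports_missing_sync is hypothetical.

```shell
# exports_missing_sync: hypothetical helper, not part of LifeKeeper.
# $1 = exports-style file.
# Prints every non-comment, non-blank line that has no "sync" option.
# "async" does not count, because the match requires "(" or "," before "sync".
exports_missing_sync() {
    grep -v '^[[:space:]]*#' "$1" |
        grep -v '^[[:space:]]*$' |
        grep -v -E '[(,]sync[,)]'
}

# Usage:
#   exports_missing_sync /etc/exports
```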

Red Hat EL6 (and Fedora 14) Clients with Red Hat EL6 NFS Server

Due to what appears to be a bug in the NFS server for Red Hat EL6, NFS clients running Red Hat EL6 (and Fedora 14) cannot specify both an NFS version (nfsvers) and UDP in the mount command. The same behavior has been observed on an Ubuntu 10.10 client. It is not seen with Red Hat EL5 clients using a Red Hat EL6 NFS server, nor with any clients using a Red Hat EL5 NFS server. The best combination of NFS mount directives to use with Red Hat EL6 (Fedora 14) clients and a Red Hat EL6 NFS server is:

mount <protected-IP>:<export> <mount point> -o nfsvers=2,sync,hard,intr,timeo=1

This combination produces the fastest re-connection times for the client in case of a switchover or failover of the LifeKeeper protected NFS server.

Red Hat EL5 NFS Clients with a Red Hat EL6 NFS Server

The best combination of options when using NFS clients running Red Hat EL5 with a Red Hat EL6 NFS server for fast reconnection times is:

mount <protected-IP>:<export> <mount point> -o nfsvers=3,sync,hard,intr,timeo=1,udp
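
The two recommendations above can be captured in a small helper so that client setup scripts pick the matching option string. This is a sketch only. The function name nfs_client_opts and the el5/el6 labels are hypothetical, and the address, export and mount point in the usage comment are placeholders.

```shell
# nfs_client_opts: hypothetical helper, not part of LifeKeeper.
# $1 = client type: "el6" (Red Hat EL6 / Fedora 14) or "el5" (Red Hat EL5).
# Prints the mount options recommended above for a Red Hat EL6 NFS server.
nfs_client_opts() {
    case "$1" in
        el6) echo "nfsvers=2,sync,hard,intr,timeo=1" ;;
        el5) echo "nfsvers=3,sync,hard,intr,timeo=1,udp" ;;
        *)   echo "unknown client type: $1" >&2; return 1 ;;
    esac
}

# Usage with a placeholder address, export and mount point:
#   mount 192.0.2.10:/export/data /mnt/data -o "$(nfs_client_opts el6)"
```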
