LifeKeeper Features

Item Description
Licensing LifeKeeper requires unique runtime license keys for each server. This applies to both physical and virtual servers. A license key is required for the LifeKeeper core software, as well as for each separately packaged LifeKeeper recovery kit. The installation script installs a License Utilities package that obtains and displays the Host ID of your server during the initial install of LifeKeeper. Once your licenses have been installed the utility will return the Entitlement ID if it is available or the Host ID if it is not. The Host IDs, along with the Activation ID(s) provided with your software, are used to obtain license keys from the SIOS Technology Corp. website.
Large Cluster Support LifeKeeper supports large cluster configurations, up to 32 servers. There are many factors other than LifeKeeper, however, that can affect the number of servers supported in a cluster. This includes items such as the storage interconnect and operating system or storage software limitations. Refer to the vendor-specific hardware and software configuration information to determine the maximum supported cluster size.
Internationalization and Localization LifeKeeper for Linux v5.2 and later supports wide/multi-byte characters in resource and tag names but does not include native language message support. The LifeKeeper GUI can be localized by creating locale-specific versions of the Java property files, although currently only the English version is fully localized. However, many of the messages displayed by the LifeKeeper GUI come from the LifeKeeper core, so localizing the GUI provides only a partial solution until the core software is fully localized.

See also Language Environment Effects in Known Issues and Restrictions for additional information.

LifeKeeper MIB File LifeKeeper can be configured to issue SNMP traps describing the events that are occurring within the LifeKeeper cluster. See the lk_configsnmp(8) man page for more information about configuring this capability. The MIB file describing the LifeKeeper traps can be found at /opt/LifeKeeper/include/LifeKeeper-MIB.txt.
Watchdog LifeKeeper supports the watchdog feature. The feature was tested by SIOS Technology Corp. on Red Hat EL 5.5 64-bit, and Red Hat EL 6 + softdog.
STONITH LifeKeeper supports the STONITH feature. This feature was tested by SIOS Technology Corp. on SLES 11 on IBM x3550 x86_64 architecture and RHEL5.5 64-bit.
XFS File System The XFS file system does not use the fsck utility to check and fix a file system but instead relies on mount to replay the log. If there is a concern that there may be a consistency problem, the system administrator should unmount the file system by taking it out of service and run xfs_check(8) and xfs_repair(8) to resolve any issues.
IPv6 SIOS has migrated to the use of the ip command and away from the ifconfig command (for more information, see IPv6 Known Issue).

Tuning

Item Description
IPC Semaphores and IPC Shared Memory LifeKeeper requires Inter-Process Communication (IPC) semaphores and IPC shared memory. The default Red Hat values for the following Linux kernel options are located in /usr/include/linux/sem.h and should be sufficient to support most LifeKeeper configurations.

Option    Required    Default Red Hat 7
SEMOPM    14          32
SEMUME    20          32
SEMMNU    60          32000
SEMMAP    25          32000
SEMMNI    25          128
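
The runtime values of two of these options can be compared against the Required column. The following is a minimal sketch, not a LifeKeeper tool: check_sem is a hypothetical helper name, and it relies on the fact that /proc/sys/kernel/sem reports four fields in the order SEMMSL SEMMNS SEMOPM SEMMNI. The other options in the table (SEMUME, SEMMNU, SEMMAP) are not exposed through that file and are not checked here.

```shell
# check_sem is a hypothetical helper, not part of LifeKeeper.
# $1: one line in the /proc/sys/kernel/sem format, e.g. "250 32000 32 128"
#     (fields: SEMMSL SEMMNS SEMOPM SEMMNI).
check_sem() {
    set -- $1                      # split the line into positional fields
    if [ "$#" -ne 4 ]; then
        echo "expected 4 fields, got $#"
        return 2
    fi
    semopm=$3
    semmni=$4
    status=0
    if [ "$semopm" -lt 14 ]; then
        echo "SEMOPM is $semopm, below the required 14"
        status=1
    fi
    if [ "$semmni" -lt 25 ]; then
        echo "SEMMNI is $semmni, below the required 25"
        status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "SEMOPM and SEMMNI meet the LifeKeeper requirements"
    fi
    return "$status"
}
```

On a live server the function could be called as check_sem "$(cat /proc/sys/kernel/sem)".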

System File Table LifeKeeper requires that system resources be available in order to fail over successfully to a backup system. For example, if the system file table is full, LifeKeeper may be unable to start new processes and perform a recovery. In kernels with enterprise patches, including those supported by LifeKeeper, file-max (the maximum number of open files in the system) is configured by default to 1/10 of the system memory size, which should be sufficient to support most LifeKeeper configurations. Configuring the file-max value lower than the default could result in unexpected LifeKeeper failures.

The value of file-max may be obtained using the following command:

cat /proc/sys/fs/file-nr

This will return three numbers. The first represents the high watermark of file table entries (i.e. the maximum the system has seen so far); the second represents the current number of file table entries, and the third is the file-max value.

To adjust file-max, add (or alter) the “fs.file-max” value in /etc/sysctl.conf (see sysctl.conf(5) for the format) and then run

sysctl -p

to update the system. The value in /etc/sysctl.conf will persist across reboots.
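
The two steps above (reading file-max and changing it in /etc/sysctl.conf) can be sketched as shell functions. This is an illustration under stated assumptions, not a LifeKeeper utility: file_max_from_file_nr and set_file_max are hypothetical helper names, and the value 200000 in the usage note is an arbitrary example, not a recommended setting.

```shell
# Hypothetical helpers; neither is part of LifeKeeper.

# Print the file-max value (third field) from a line in the
# /proc/sys/fs/file-nr format, e.g. "2944 0 810720".
file_max_from_file_nr() {
    set -- $1                      # split the line into positional fields
    echo "$3"
}

# Set fs.file-max in a sysctl.conf-style file: replace an existing
# uncommented entry, or append one if none is present.
# $1: path to the file, $2: new value.
set_file_max() {
    conf=$1
    value=$2
    if grep -q '^[[:space:]]*fs\.file-max[[:space:]]*=' "$conf"; then
        tmp=$(mktemp)
        sed "s/^[[:space:]]*fs\.file-max[[:space:]]*=.*/fs.file-max = $value/" "$conf" > "$tmp"
        cat "$tmp" > "$conf"       # overwrite in place so ownership and mode are kept
        rm -f "$tmp"
    else
        echo "fs.file-max = $value" >> "$conf"
    fi
}
```

For example, running set_file_max /etc/sysctl.conf 200000 as root, followed by sysctl -p, would apply the new value. The current value alone can also be read with sysctl -n fs.file-max.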

LifeKeeper Operations

Item Description
Kernel Debugger (kdb) Before using the Kernel Debugger (kdb) on a LifeKeeper protected server, you should first either shut down LifeKeeper on that server or switch any LifeKeeper protected resources over to the backup server. Use of kdb with the LifeKeeper SCSI Reservation Daemons (lkscsid) enabled (they are enabled by default) can also lead to unexpected panics.
System Panic on Locked Shared Devices LifeKeeper uses a lock to protect shared data from being accessed by other servers on a shared SCSI Bus. If LifeKeeper cannot access a device as a result of another server taking the lock on a device, then a critical error has occurred and quick action should be taken or data can be corrupted. When this condition is detected, LifeKeeper enables a feature that will cause the system to panic.

If the LifeKeeper daemons are stopped without taking the resources out of service (for example, with lkcli stop -f) while shared devices are still reserved, the LifeKeeper locking mechanism may trigger a kernel panic when the other server recovers the resource(s). All resources must be placed out-of-service before stopping LifeKeeper in this manner.

nolock Option When using storage applications with locking and following recommendations for the NFS mount options, SPS requires the additional nolock option be set, e.g. rw,nolock,bg,hard,nointr,tcp,nfsvers=3,timeo=600,rsize=32768,wsize=32768,actimeo=0.
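
A quick way to confirm that a mount option string carries nolock is a pattern match on the comma-separated list. The sketch below is illustrative only; has_nolock is a hypothetical helper name, not a LifeKeeper or NFS command.

```shell
# has_nolock is a hypothetical helper, not part of LifeKeeper.
# $1: a comma-separated NFS mount option string.
# Returns 0 if "nolock" appears as a whole option, 1 otherwise.
has_nolock() {
    case ",$1," in
        *,nolock,*) return 0 ;;
        *)          return 1 ;;
    esac
}
```

Wrapping the string in commas keeps the match exact, so an option that merely contains the text "nolock" as part of a longer word is not accepted.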
Recovering Out-of-Service Hierarchies As part of the recovery following the failure of a LifeKeeper server, resource hierarchies that were configured on the failed server but which were not in-service anywhere at the time of the server failure are recovered on the highest priority alive server at the time of the failure. This is the case no matter where the out-of-service hierarchy was last in service, including the failed server, the recovering server, or some other server in the cluster.
Coexistence with Linux Firewalls

The firewall is enabled upon installation. After installation is complete, the firewall should be modified.

LifeKeeper will function if a host firewall is enabled. However, unless absolutely necessary, it is recommended that the firewall be disabled and that the LifeKeeper protected resources reside behind another shielding firewall.

If LifeKeeper must coexist on firewall enabled hosts, then the specific ports that LifeKeeper is using need to be opened. Please note that LifeKeeper uses specific ports for communication paths, GUI, IP and Data Replication. Refer to Running LifeKeeper with a Firewall for details.

To disable or modify the firewall please refer to the documentation for your OS distribution.

Coexistence with SELinux Disable SELinux. To disable SELinux, please refer to the documentation for your OS distribution.

AppArmor (for distributions that use this security model) may be enabled.

Suid Mount Option The suid mount option is the default when mounting as root and is not written to the /etc/mtab by the mount command. The suid mount option is not needed in LifeKeeper environments.

Server Configuration

Item Description
BIOS Updates The latest available BIOS should always be installed on all LifeKeeper servers.

LifeKeeper Version 8.2.0 and Later GUI Requirement

64-bit versions of any PAM related packages will be required for the LifeKeeper GUI Client to successfully authenticate users.

Confirm Failover and Block Resource Failover Settings

Make sure you review and understand the following descriptions, examples and considerations before setting the Confirm Failover or Block Resource Failover in your LifeKeeper environment. These settings are available from the command line or on the Properties panel in the LifeKeeper GUI.

Confirm Failover On:

Definition – Enables manual failover confirmation from System A to System B (where System A is the server whose properties are being displayed in the Properties Panel and System B is the system to the left of the checkbox). If this option is set on a system, it will require a manual confirmation by a system administrator before allowing LifeKeeper to perform a failover recovery of a system that it detects as failed.

Use the lk_confirmso command to confirm the failover. By default, the administrator has 10 minutes to run this command. This time can be changed by modifying the CONFIRMSOTO setting in /etc/default/LifeKeeper. If the administrator does not run the lk_confirmso command within the time allowed, the failover will either proceed or be blocked. By default, the failover will proceed. This behavior can be changed by modifying the CONFIRMSODEF setting in /etc/default/LifeKeeper.

Example: If you wish to block automatic failovers completely, then you should set the Confirm Failover On option in the Properties panel and also set CONFIRMSODEF to 1 (block failover) and CONFIRMSOTO to 0 (do not wait to decide on the failover action).
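
The interaction of the two settings can be summarized with a small script. This is a sketch under several assumptions that the text above does not confirm: that /etc/default/LifeKeeper uses NAME=value lines, that CONFIRMSOTO is expressed in seconds (so the 10-minute default is 600), and that CONFIRMSODEF=0 corresponds to the default "proceed" behavior. confirmso_policy is a hypothetical helper name, not a LifeKeeper command.

```shell
# confirmso_policy is a hypothetical helper, not a LifeKeeper command.
# Assumptions: the file uses NAME=value lines, CONFIRMSOTO is in seconds,
# and CONFIRMSODEF=0 means "proceed" (the documented default behavior).
# $1: path to a file in the /etc/default/LifeKeeper format.
confirmso_policy() {
    # Take the last assignment of each setting, if any.
    def=$(sed -n 's/^CONFIRMSODEF=//p' "$1" | tail -n 1)
    to=$(sed -n 's/^CONFIRMSOTO=//p' "$1" | tail -n 1)
    def=${def:-0}     # unset: failover proceeds after the timeout
    to=${to:-600}     # unset: 10 minutes (assumed to be stored as seconds)
    if [ "$def" = "1" ]; then
        action="blocked"
    else
        action="allowed to proceed"
    fi
    echo "Unconfirmed failover is $action after $to seconds"
}
```

With CONFIRMSODEF=1 and CONFIRMSOTO=0 in the file, as in the example above, the function reports that an unconfirmed failover is blocked after 0 seconds.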

When to select this setting:

This setting is used in most Disaster Recovery and other WAN configurations where the configuration does not include redundant heartbeat communications paths.

Open the Properties page from one server and then select the server that you want the Confirm Failover flag to be set on.

Set Block Resource Failover On:

Definition – By default, all resource failures will result in a recover event that will attempt to recover the failed resource on the local system. If local recovery fails or is not enabled, then LifeKeeper transfers the resource hierarchy to the next highest priority system for which the resource is defined. However, if this setting is selected on a designated system(s), all resource transfers due to a resource failure will be blocked from the given system.

When the setting is enabled, the following message is logged:

Local recovery failure, failover blocked, MANUAL INTERVENTION REQUIRED
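
To see whether this condition has occurred, the log can be searched for the message text. The following is a minimal sketch: count_blocked_failovers is a hypothetical helper name, and the log path in the usage note (/var/log/lifekeeper.log) is an assumption that should be checked against your installation.

```shell
# count_blocked_failovers is a hypothetical helper, not a LifeKeeper command.
# $1: path to a log file to search.
# Prints the number of lines that contain the blocked-failover message.
count_blocked_failovers() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when the count is 0, so the exit status is ignored.
    grep -c -F 'failover blocked, MANUAL INTERVENTION REQUIRED' "$1" || true
}
```

For example, count_blocked_failovers /var/log/lifekeeper.log prints 0 when no resource failover has been blocked on that system.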
