
Known Issues and Restrictions

Included below are the restrictions or known issues open against LifeKeeper for Linux, broken down by functional area.

Installation

Description

In Release 7.4 and forward, relocation of the SteelEye product RPM packages is no longer supported.

lklin00001458

Package check errors (rpm -V steeleye-lk) will occur against the core package when it is installed on SUSE.

Because of the way SUSE runs shutdown scripts (versus other Linux distributions), the following scripts are moved to another location after installation so that LifeKeeper will be shut down when changing run levels or rebooting. These should be the only errors reported when verifying the steeleye-lk package:

Missing    /etc/rc.d/rc0.d/K01lifekeeper

Missing    /etc/rc.d/rc1.d/K01lifekeeper

Missing    /etc/rc.d/rc6.d/K01lifekeeper

Bug 1481

GUI does not work with default RHEL6 64-bit

There is a compatibility issue with Red Hat Enterprise Linux 6 64-bit.

Solution: Install the following packages, which are included on the OS installation media, prior to installing LifeKeeper. If these are not installed first, the install will not finish correctly.

 libXau-1.0.5-1.el6.i686.rpm 
 libxcb-1.5-1.el6.i686.rpm 
 libX11-1.3-2.el6.i686.rpm 
 libXext-1.1-3.el6.i686.rpm 
 libXi-1.3-3.el6.i686.rpm 
 libXtst-1.0.99.2-3.el6.i686.rpm

Bug 1533

The multipathd daemon will log errors when the nbd driver is loaded, as it tries to scan the new devices.

Solution:  To avoid these errors in the log, add devnode "^nbd" to the blacklist in /etc/multipath.conf.
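For example, a minimal blacklist stanza in /etc/multipath.conf might look like the following sketch (if a blacklist section already exists, add the devnode line to it rather than creating a second section):

blacklist {
        devnode "^nbd"
}

Reload or restart the multipathd service after editing the file so the change takes effect.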

Bug 2196

The following errors appear in the LK_install log file after running setup from de.img on SLES11 SP1:

     ************** SETUP ENDING: Thu Sep 22 15:37:12 EDT 2011
Press ENTER to exit# running LSB install on lifekeeper rc script
insserv: Script jexec is broken: incomplete LSB comment.
insserv: missing `Required-Stop:'  entry: please add even if empty.
insserv: warning: current stop runlevel(s) (empty) of script `lifekeeper' overwrites defaults (2 3 4 5).
insserv: can not symlink(../lifekeeper, rc2.d/S12lifekeeper): File exists
insserv: can not symlink(../lifekeeper, rc3.d/S12lifekeeper): File exists
insserv: can not symlink(../lifekeeper, rc4.d/S12lifekeeper): File exists
insserv: can not symlink(../lifekeeper, rc5.d/S12lifekeeper): File exists
insserv: warning: current stop runlevel(s) (empty) of script `lifekeeper_stop' overwrites defaults (2 3 4 5).

Workaround: Insert the line "# Required-Stop:" in /etc/init.d/jexec as follows:

### BEGIN INIT INFO
# Provides: binfmt_misc
# Required-Start: $local_fs
# Default-Start: 1 2 3 4 5
# Required-Stop:
# Default-Stop: 0 6
# chkconfig: 12345 95 05
# Description: Supports the direct execution of binary formats.
### END INIT INFO
#

Bug 2239

Incomplete NFS Setup Logging

When running the Installation Support setup script from the ISO image de.img, the output from the script patching process for NFS is not captured in the LifeKeeper install log (/var/log/LK_install.log). No workaround is available.

Bug 2219

Core package upgrade from 7.x fails with conflict on Html.pm package

Upgrading the LifeKeeper Core package (steeleye-lk) from a release prior to 7.4.0 to release 7.5.0 or later will result in a conflict error on the file /opt/LifeKeeper/lib/perl/Html.pm. Resolving this error and successfully installing the Core package requires the use of the --force option to rpm.
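For example (the exact package filename will vary by release and architecture):

rpm -Uvh --force steeleye-lk-<version>.<arch>.rpm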

Bug 2176

When using the loopback interface in the INTERFACELIST tunable, licensing will not function properly.

The loopback (lo) interface cannot be used in the INTERFACELIST tunable.

Bug 2264 

lklicmgr tool incorrectly displays a "HOSTID mismatch" when a license file based on an IP address is used.

If a license file based on an IP address is used, lklicmgr incorrectly displays a HOSTID mismatch error. This is only a display issue with lklicmgr; the license will function as expected.

LifeKeeper Core

Description

lklin00002100

Language Environment Effects

Some LifeKeeper scripts parse the output of Linux system utilities and rely on certain patterns in order to extract information.  When some of these commands run under non-English locales, the expected patterns are altered, and LifeKeeper scripts fail to retrieve the needed information.  For this reason, the language environment variable LC_MESSAGES has been set to the POSIX “C” locale (LC_MESSAGES=C) in /etc/default/LifeKeeper.  It is not necessary to install Linux with the language set to English (any language variant available with your installation media may be chosen); the setting of LC_MESSAGES in /etc/default/LifeKeeper will only influence LifeKeeper.  If you change the value of LC_MESSAGES in /etc/default/LifeKeeper, be aware that it may adversely affect the way LifeKeeper operates.  The side effects depend on whether or not message catalogs are installed for various languages and utilities and if they produce text output that LifeKeeper does not expect.

lklin00003765

File system labels should not be used in large configurations

The use of file system labels can cause performance problems during boot-up with large clusters. The problems generally result from the requirement that, in order to resolve labels, all devices connected to a system must be scanned. For systems connected to a SAN, especially those with LifeKeeper where access to a device is blocked, this scanning can be very slow.

To avoid this performance problem on Red Hat systems, edit /etc/fstab and replace the labels with the path names. 
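For example, a label-based entry such as the following (the device, mount point and file system type are illustrative):

LABEL=/data    /data    ext3    defaults    1 2

would be replaced with the corresponding device path:

/dev/sdb1      /data    ext3    defaults    1 2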

lklin00003994

Cannot break reservation on QLogic driver (qla2xxx) running SUSE SLES 10

Failover does not work on a SUSE SLES 10 system using the QLogic driver (qla2xxx). On x86 boxes running SLES 10 with the stock QLogic driver, a failover does not work since the reservation cannot be broken. It appears the qla2xxx driver delivered on SLES 10 will only issue a reset if there is a hung IO. NOTE: The qla2xxx driver delivered on SLES 10 SP1 corrects the problem.

lklin00004221

CCISS device checking thread hung errors with the HP MSA 500

Customers have seen a problem using the HP MSA 500 with LifeKeeper. LifeKeeper waits on an I/O from the MSA 500 controller (via the cciss driver) that is never received. Device checking thread hung errors are logged in the LifeKeeper log, and LifeKeeper successfully fails the resources over to the backup server.

lklin00004361

Syntax errors can occur with gen/app resources

When the steeleye-lkGUI package is upgraded without upgrading the core, a syntax error can occur with gen/app resources.  The steeleye-lkGUI package contains updates to the gen/app GUI components that require the same version or later version of the core. 

NOTE:  When upgrading LifeKeeper, both the GUI and the core packages should be upgraded to the latest versions.  When the core is upgraded in conjunction with the GUI package, no errors should occur. 

lklin00004392

Shutdown hangs on SLES10 systems

When running shutdown on an AMD64 system with SLES10, the system locks up and the shutdown does not complete.  This has been reported to Novell via bug #294787.  The lockup appears to be caused by the SLES10 powersave package.

Workaround: Remove the SLES10 powersave package to enable shutdown to complete successfully.

Bug 1046

lkscsid will halt system when it should issue a sendevent

When lkscsid detects a disk failure, it should, by default, issue a sendevent to LifeKeeper to recover from the failure.  The sendevent will first try to recover the failure locally and if that fails, will try to recover the failure by switching the hierarchy with the disk to another server.  On some versions of Linux (RHEL5 and SLES11), lkscsid will not be able to issue the sendevent but instead will immediately halt the system.  This only affects hierarchies using the SCSI device nodes such as /dev/sda.

Bug 1589

RHEL6: LifeKeeper core gets in a state where it cannot stop or start

When running LifeKeeper with RHEL 6, the LifeKeeper core gets into a state where LifeKeeper cannot be stopped or started.

Workaround:  If LifeKeeper gets into this state (where lkstart reports that "LifeKeeper should already be running" but "ps" or "lktest" shows no daemons are running), execute the following commands:  

initctl stop lifekeeper

rm /etc/init/lifekeeper.conf

rm /etc/init/lk-logmgr.conf

rm /etc/init/lk-logmgr-kill.conf

rm /etc/init/lkstart.conf

lkstart

Bug 1930

Setup will fail for RHEL6 64-bit

There is a compatibility issue against Red Hat Enterprise Linux 6 64-bit.

Solution: Install the following packages, which are included on the OS installation media, before running the LifeKeeper setup. If they are not installed first, the setup will not finish correctly.

rpm -i compat-libstdc++-33-3.2.3-69.el6.i686 libgcc-4.4.4-13.el6.i686
rpm -i nss-softokn-freebl-3.12.7-1.1.el6.i686 glibc-2.12-1.7.el6.i686    

Note: See Package Dependencies List for LifeKeeper 7.5 and Later for more information.

Bug 2213

DataKeeper Create Resource fails

When using DataKeeper with fully virtualized VMs running on Citrix XenServer (or other hypervisor that may provide IDE disk emulation), an error occurs on the create:

ERROR 104052: Cannot get the hardware ID of the device "dev/hda3"

This is due to the fact that the local disk drives of fully virtualized VMs show up as IDE drives, and getId is not able to query IDE disks on these VMs properly.

Workaround:  Add /dev/hda* to the DEVNAME device_pattern file, e.g.:

# cat /opt/LifeKeeper/subsys/scsi/Resources/DEVNAME/device_pattern

/dev/hda*

Bug 2257

Specifying hostnames for API access 

The key name used to store LifeKeeper server credentials must match exactly the hostname of the other LifeKeeper server (as displayed by the hostname command on that server). If the hostname is an FQDN, then the credential key must also be the FQDN. If the hostname is a short name, then the key must also be the short name.

Workaround: Make sure that the hostname(s) stored by credstore match the hostname exactly.

Bug 2537

Resources removed in the wrong order during failover

In cases where a hierarchy shares a common resource instance with another root hierarchy, resources are sometimes removed in the wrong order during a cascading failover or resource failover.

Solution: Creating a common root will ensure that resource removals in the hierarchy occur from the top down.

  1. Create a gen/app that always succeeds on restore and remove.
  2. Make all current roots children of this new gen/app.

Note: Using /bin/true for the restore and remove script would accomplish this.

Internet/IP Licensing

Bug 2300

INTERFACELIST syntax, /etc/hosts settings dependency

/etc/hosts settings:
When using internet-based licensing (IPv4 address), the configuration of /etc/hosts can negatively impact license validation. If LifeKeeper startup fails with:

Error in obtaining LifeKeeper license key:
Invalid host.
 The hostid of this system does not match the hostid specified in the license file.

and the listed internet hostid is correct, then the configuration of /etc/hosts may be the cause. To correctly match /etc/hosts entries, IPv4 entries must be listed before any IPv6 entries. To verify if the /etc/hosts configuration is the cause, run the following command:

/opt/LifeKeeper/bin/lmutil lmhostid -internet -n

If the IPv4 address listed does not match the IPv4 address in the installed license file, then /etc/hosts must be modified to place IPv4 entries before IPv6 entries to return the correct address.
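For example, with hypothetical addresses, the IPv4 entry for the host must appear before the IPv6 entry:

192.168.1.10    server1.example.com    server1
2001:db8::10    server1.example.com    server1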

INTERFACELIST syntax:

By default, licensing in LifeKeeper is based on the primary network interface eth0. LifeKeeper installation and startup errors will occur if interface eth0 is renamed. Renaming is not supported, as it will cause LifeKeeper to fail to obtain a unique system HOST ID. To address the consistent network device naming conventions introduced in Red Hat Enterprise Linux 6.1, the INTERFACELIST tunable was added to specify the name of the primary interface on Red Hat Enterprise Linux 6.x.

The consistent network device naming of interfaces uses the name em<port number> for onboard interfaces and pci<slot number>p<port number>_<virtual function instance> for PCI add-in interfaces. By default, LifeKeeper will look for network device em0 on Red Hat Enterprise Linux 6.x systems. If that device does not exist, then the INTERFACELIST tunable must be configured to specify the primary interface name. The tunable normally needs to contain only the primary interface name, but it does support additional names in a colon-separated list, e.g. INTERFACELIST=em0:em1.

Note: The INTERFACELIST tunable value should be set in /etc/default/LifeKeeper. If the LifeKeeper core package has not yet been installed, /etc/default/LifeKeeper will not exist. In this case, ensure that INTERFACELIST is set in the environment prior to rerunning the setup script (e.g. export INTERFACELIST=em1).

GUI

Description

lklin00004276

GUI login prompt may not re-appear when reconnecting via a web browser after exiting the GUI

When you exit or disconnect from the GUI applet and then try to reconnect from the same web browser session, the login prompt may not appear.

Workaround: Close the web browser, re-open the browser and then connect to the server. When using the Firefox browser, close all Firefox windows and re-open.

lklin00004181

lkGUIapp on RHEL5 reports unsupported theme errors

When you start the GUI application client, you may see the following console message. This message comes from the RHEL 5 and FC6 Java platform look and feel and will not adversely affect the behavior of the GUI client.

/usr/share/themes/Clearlooks/gtk-2.0/gtkrc:60: Engine "clearlooks" is unsupported, ignoring

lklin00000477

GUI does not immediately update IP resource state after network is disconnected and then reconnected

When the primary network between servers in a cluster is disconnected and then reconnected, the IP resource state on a remote GUI client may take as long as 1 minute and 25 seconds to be updated due to a problem in the RMI/TCP layer.

Bug 1211

Java Mixed Signed/Unsigned Code Warning - When loading the LifeKeeper Java GUI client applet from a remote system, the following security warning may be displayed: 

[Image: digital signature verification security warning dialog]

 
Select “Run” and the following dialog will be displayed:

[Image: Java security warning dialog]

When asked whether to block potentially unsafe components, select “No” and the LifeKeeper GUI will be allowed to operate.

Recommended Actions:  To reduce the number of security warnings, you have two options: 

  1. Check the “Always trust content from this publisher” box and select “Run”.  The next time the LifeKeeper GUI Java client is loaded, the warning message will not be displayed.

or

  2. Add the following entry to your Java “deployment.properties” file to eliminate the second dialog about blocking. The security warning will still be displayed when you load the Java client; however, the applet will not be blocked and the Block “Yes” or “No” dialog will not be displayed. Please note this setting will apply to all of your Java applets.

deployment.security.mixcode=HIDE_RUN 

To bypass both messages, implement 1 and 2.

Bug 2522

steeleye-lighttpd process fails to start if Port 778 is in use

If a process is using Port 778 when steeleye-lighttpd starts up, steeleye-lighttpd fails without logging any message, causing a failure to connect to the GUI.

Solution: Check to see if the port is in use by trying the following:

  • Check to see if the steeleye-lighttpd process is running:

ps -efww | grep steeleye-lighttpd

  • If steeleye-lighttpd is not running, it is more than likely caused by some other program using the port. This can be confirmed by running the following command:

netstat -anp | grep 778

If the output produced shows another process using Port 778, set the following tunable on all nodes in the cluster and then restart LifeKeeper on all the nodes:

Add the following line to /etc/default/LifeKeeper:

API_SSL_PORT=port_number

where port_number is the new port to use.
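For example, to move the API to port 779 (the port number here is illustrative; any port not otherwise in use on all cluster nodes will work):

API_SSL_PORT=779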

Data Replication

Description

lklin00001536

In symmetric active SDR configurations with significant I/O traffic on both servers, the filesystem mounted on the netraid device (mirror) stops responding and eventually the whole system hangs

Due to the single-threaded nature of the Linux buffer cache, the buffer cache flushing daemon can hang trying to flush out a buffer which needs to be committed remotely. While the flushing daemon is hung, all activities in the Linux system with dirty buffers will stop if the number of dirty buffers goes over the system accepted limit (set in /proc/sys/vm/bdflush).

Usually this is not a serious problem unless something happens to prevent the remote system from clearing remote buffers (e.g. a network failure).  LifeKeeper will detect a network failure and stop replication in that event, thus clearing a hang condition.  However, if the remote system is also replicating to the local system (i.e. they are both symmetrically replicating to each other), they can deadlock forever if they both get into this flushing daemon hang situation.

The deadlock can be released by manually killing the nbd-client daemons on both systems (which will break the mirrors).  To avoid this potential deadlock entirely, however, symmetric active replication is not recommended.

lklin00004972

GUI does not show proper state on SLES 10 SP2 system

This issue is due to a SLES 10 SP2 kernel bug and has been fixed in update kernel version 2.6.16.60-0.23. On SLES 10 SP2, netstat is broken due to a new format in /proc/<PID>/fd. 

Solution: Please upgrade to kernel version 2.6.16.60-0.23 or later if running on SLES 10 SP2.

Important Information Regarding Kernel Upgrades: LifeKeeper typically installs kernel modules to support some of its features; therefore, when applying a kernel patch/kernel upgrade on a Red Hat system, it is important to rerun the ./setup script from the installation media to ensure that any kernel modules installed as part of LifeKeeper will be available to the new kernel. Failure to perform this step may leave LifeKeeper resources unable to be put into service and/or improperly protected.

Bug 1563

32-bit zlib packages should be installed to RHEL 6 (64-bit) for Set Compression Level 

When using SDR with RHEL 6 (64-bit), the following error may appear:  

Could not start balance on Target when Compression Level is set on RHEL 6 (64-bit)  

Solution:  To resolve the issue, please install the 32-bit zlib packages from RHEL 6 when using RHEL 6 (64-bit).
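For example, on a system with access to the RHEL 6 package repositories, the 32-bit library can be installed with the following command (the packages may also be installed directly from the installation media):

yum install zlib.i686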

Bug 2405

Mirror breaks and fills up /var/log/messages with errors

This issue has been seen occasionally (on Red Hat EL 6.x and CentOS 6) during stress tests with induced failures, especially when killing the nbd_server process that runs on a mirror target system. Upgrading to the latest kernel for your distribution, such as kernel-2.6.32-131.17.1.el6 on Red Hat EL 6.0 or 6.1, may help lower the risk of seeing this particular issue. Rebooting the source system will clear up this issue.

With the default kernel that comes with CentOS 6 (2.6.32-71.el6), this issue may occur much more frequently (even when the mirror is just under a heavy load). Unfortunately, CentOS has not yet released a kernel (2.6.32-131.17.1) that will improve this situation. SIOS recommends updating to the 2.6.32-131.17.1 kernel as soon as it becomes available for CentOS 6.

Important Information Regarding Kernel Upgrades: LifeKeeper typically installs kernel modules to support some of its features; therefore, when applying a kernel patch/kernel upgrade on a Red Hat system, it is important to rerun the ./setup script from the installation media to ensure that any kernel modules installed as part of LifeKeeper will be available to the new kernel. Failure to perform this step may leave LifeKeeper resources unable to be put into service and/or improperly protected.

IPv6

Description

SIOS has migrated to the use of the ip command and away from the ifconfig command. Because of this change, customers with external scripts are advised to make a similar change. Instead of issuing the ifconfig command and parsing the output looking for a specific interface, scripts should instead use "ip -o addr show" and parse the output looking for lines that contain the words "inet" and "secondary".

# ip -o addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
     \    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
1: lo    inet 127.0.0.1/8 scope host lo
1: lo    inet6 ::1/128 scope host
     \       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
     \    link/ether d2:05:de:4f:a2:e6 brd ff:ff:ff:ff:ff:ff
2: eth0    inet 172.17.100.77/22 brd 172.17.103.255 scope global eth0
2: eth0    inet 172.17.100.79/22 scope global secondary eth0
2: eth0    inet 172.17.100.80/22 scope global secondary eth0
2: eth0    inet6 2001:5c0:110e:3364::1:2/64 scope global
     \       valid_lft forever preferred_lft forever
2: eth0    inet6 2001:5c0:110e:3300:d005:deff:fe4f:a2e6/64 scope global dynamic
     \       valid_lft 86393sec preferred_lft 14393sec
2: eth0    inet6 fe80::d005:deff:fe4f:a2e6/64 scope link
     \       valid_lft forever preferred_lft forever

So for the above output from the ip command, the following lines contain virtual IP addresses for the eth0 interface:

2: eth0    inet 172.17.100.79/22 scope global secondary eth0
2: eth0    inet 172.17.100.80/22 scope global secondary eth0
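As a sketch, a script could collect the secondary (virtual) addresses for a given interface with a pipeline such as the following (the interface name eth0 is an example):

ip -o addr show eth0 | awk '$3 == "inet" && /secondary/ {print $4}'

For the sample output above, this prints 172.17.100.79/22 and 172.17.100.80/22.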

Bug 1971

'IPV6_AUTOCONF = No' for /etc/sysconfig/network-scripts/ifcfg-<nicName> is not being honored on reboot or boot

On boot, a stateless, auto-configured IPv6 address is assigned to the network interface. If a comm path is created with a stateless IPv6 address of an interface that has IPV6_AUTOCONF=No set, the address will be removed if any system resources manage the interface, e.g. ifdown <nicName>;ifup <nicName>.

A comm path using auto-configured IPv6 addresses did not recover and remained dead after the primary server was rebooted because IPV6_AUTOCONF was set to No.

Solution: Use static IPv6 addresses only. The use of auto-configured IPv6 addresses can cause a comm path loss after a reboot, a NIC change, etc. (see the example interface configuration after the list below).

While IPv6 auto-configured addresses may be used for comm path creation, it is incumbent upon the system administrator to be aware of the following conditions:

  • IPv6 auto-configured/stateless addresses are dependent on the network interface (NIC) MAC address. If a comm path was created and the associated NIC is later replaced, the auto-configured IPv6 address will be different and LifeKeeper will correctly show the comm path is dead. The comm path will need to be recreated.

  • At least with RHEL 5.6, implementing the intended behavior for consistent IPv6 auto-configuration during all phases of host operation requires specific domain knowledge for accurately and precisely setting both the individual interface config files AND the net.ipv6.* directives in sysctl.conf (i.e. explicitly setting IPV6_AUTOCONF in the ifcfg-<nic> file, which is referenced by the 'if/ip' utilities, AND setting directives in /etc/sysctl.conf which impact NIC control when the system is booting and switching init levels).
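As an illustration of the static configuration recommended above, a RHEL-style interface file might look like the following sketch (the interface name and address are hypothetical and must be adapted to the local network):

# /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
ONBOOT=yes
IPV6INIT=yes
IPV6_AUTOCONF=no
IPV6ADDR=2001:db8::10/64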

Bug 1977

IP: Modify Source Address Setting for IPv6 doesn't set source address

When attempting to set the source address for an IPv6 IP resource, the operation will report success even though nothing was changed.

Workaround: Currently no workaround is available. This will be addressed in a future release.

Bug 2008

IP: Invalid IPv6 addressing allowed in IP resource creation

Entering IPv6 addresses of the format 2001:5c0:110e:3368:000000:000000001:61:14 is accepted even though some groups contain more than four hexadecimal characters.

Workaround: Enter correctly formatted IPv6 addresses.

Bugs 1922, 1923 and 1989

Can't connect to host via IPv6 addressing

lkGUIapp will fail connecting to a host via IPv6 hex addressing, either via resolvable host name or IP address. lkGUIapp requires an IPv4 configured node for connection. IPv6 comm paths are fully supported.

Bug 2398

IPv6 resource reported as ISP when address assigned to bonded NIC but in 'tentative' state

IPv6 protected resources in LifeKeeper will incorrectly be identified as 'In Service Protected' (ISP) on SLES systems where the IPv6 resource is on a bonded interface, the bond mode is something other than 'active-backup' (1), and the Linux kernel is 2.6.21 or lower. The IPv6 bonded link will remain in the 'tentative' state with the address unresolvable.

Workaround: Set the bonded interface mode to 'active-backup' (1) or operate with an updated kernel which will set the link state from 'tentative' to 'valid' for modes other than 'active-backup' (1).

Important Information Regarding Kernel Upgrades: LifeKeeper typically installs kernel modules to support some of its features; therefore, when applying a kernel patch/kernel upgrade on a Red Hat system, it is important to rerun the ./setup script from the installation media to ensure that any kernel modules installed as part of LifeKeeper will be available to the new kernel. Failure to perform this step may leave LifeKeeper resources unable to be put into service and/or improperly protected.

Apache

Description

Bug 1946

Apache Kit does not support IPv6; doesn't identify IPv6 in httpd.conf

Any IPv6 addresses assigned to the 'Listen' directive entry in the httpd.conf file will cause problems.

Solution: Until there is support for IPv6 in the Apache Recovery Kit, there can be no IPv6 address in the httpd.conf file after the resource has been created.

Oracle Recovery Kit

Description

lklin00000819

The Oracle Recovery Kit does not include support for Connection Manager and Oracle Names features

The LifeKeeper Oracle Recovery Kit does not include support for the following Oracle Net features of Oracle: Oracle Connection Manager, a routing process that manages a large number of connections that need to access the same service; and Oracle Names, the Oracle-specific name service that maintains a central store of service addresses.

The LifeKeeper Oracle Recovery Kit does protect the Oracle Net Listener process that listens for incoming client connection requests and manages traffic to the server.  Refer to the LifeKeeper for Linux Oracle Recovery Kit Administration Guide for LifeKeeper configuration specific information regarding the Oracle Listener.

lklin00003290 / lklin00003323

The Oracle Recovery Kit does not support the ASM or grid component features of Oracle 10g

The following information applies to Oracle 10g database instances only. The Oracle Automatic Storage Manager (ASM) feature provided in Oracle 10g is not currently supported with LifeKeeper. In addition, the grid components of 10g are not protected by the LifeKeeper Oracle Recovery Kit. Support for raw devices, file systems, and logical volumes is included in the current LifeKeeper for Linux Oracle Recovery Kit. Support for the grid components can be added to LifeKeeper protection using the gen/app recovery kit.

Bug 2385

The Oracle package install fails to add app and typ entries with LifeKeeper running

When installing the Oracle package (Version 7.2), app and typ entries are not created if LifeKeeper is running, preventing the creation of Oracle resource hierarchies until LifeKeeper is stopped and restarted.

Solution: Stop LifeKeeper before installing the Oracle rpm. 

  1. Stop LifeKeeper: lkstop -f

  2. Install Oracle.

  3. Restart LifeKeeper: lkstart

NFS Server Recovery Kit

Description

lklin00001123

Top level NFS resource hierarchy uses the switchback type of the hanfs resource

The switchback type, which dictates whether the NFS resource hierarchy will automatically switch back to the primary server when it comes back into service after a failure, is defined by the hanfs resource.

lklin00003427

Some clients are unable to reacquire nfs file locks

When acting as NFS clients, some Linux kernels do not respond correctly to notifications from an NFS server that an NFS lock has been dropped and needs to be reacquired.  As a result, when these systems are the clients of an NFS file share that is protected by LifeKeeper, the NFS locks held by these clients are lost during a failover or switchover.

Bug 1961

NFS v4 changes not compatible with SLES 11 nfs subsystem operation

The mounting of a non-NFS v4 remote export on SLES 11 starts rpc.statd. The startup of rpc.statd on the out-of-service node in a cluster protecting an NFS v4 root export will fail.

Solution: Do not mix NFS v2/v3 with a cluster protecting an NFS v4 root export.

Bug 2014

NFS v4 cannot be configured with IPv6

The IPv6 virtual IP gets rolled up into the NFSv4 hierarchy.

Solution: Do not use an IPv6 virtual IP resource when creating an NFSv4 resource.

Bug 2019

NFS v4: Unable to re-extend hierarchy after unextend

Extend fails because the export point is already exported on the target server. A re-extend to server A of an NFS v4 hierarchy will fail if the hierarchy was created on server A and extended to server B, brought in service on server B and then unextended from server A.

Solution: On server A run the command "exportfs -ra" to clean up the extra export information left behind.

Bug 2391

NFSv3: File Lock switchover fails on RedHat 6.x and CentOS 6.x

Attempting to fail over file locks on a server failover/switchover does not work on any Red Hat 6.x or CentOS 6.x system. Lock failover with NFSv3 is currently not supported on these OS versions.

Solution: Use the lock failover features available with NFSv4.

SAP Recovery Kit

Description

lklin00002532

Failed delete or unextend of a SAP hierarchy

Deleting or unextending a SAP hierarchy that contains the same IP resource in multiple locations within the hierarchy can sometimes cause a core dump that results in resources not being deleted.

To correct the problem, after the failed unextend or delete operation, manually remove any remaining resources using the LifeKeeper GUI.  You may also want to remove the core file from the server.

Bug 2082

Handle Warnings gives a syntax error at -e line 1

When changing the default behavior of No in Handle Warnings to Yes, an error is received.

Solution: Leave this option at the default setting of No. Note: It is highly recommended that this setting be left at the default selection of No, as Yellow is a transient state that most often does not indicate a failure.

Bug 2084

Choosing same setting causes missing button on Update Wizard

If the user attempts to update Handle Warnings without changing the current setting, the next screen, which indicates that they must go back, is missing the Done button.

Bug 2087

When changes are made to res_state, monitoring is disabled

If Protection Level is set to BASIC and SAP is taken down manually (e.g. for maintenance), it will be marked as FAILED and monitoring will stop.

Solution: In order for monitoring to resume, the resource must be brought back up by LifeKeeper rather than started manually.

Bug 2092

ERS in-service fails on remote host if ERS is not parent of Core/CI

Creating an ERS resource without any additional SAP resource dependents will cause initial in-service to fail on switchover. 

Solution: Create ERS as a parent of the CI/Core instance (SCS or ASCS), then retry the in-service operation.

LVM Recovery Kit

Description

lklin00003844

Use of lkID incompatible with LVM pvcreate on entire disk

When lkID is used to generate unique disk IDs on disks that are configured as LVM physical volumes, there is a conflict in the locations in which the lkID and LVM information is stored on the disk.  This causes either the lkID or LVM information to be overwritten depending on the order in which lkID and pvcreate are used.

Workaround:  When it is necessary to use lkID in conjunction with LVM, partition the disk and use the disk partition(s) as the LVM physical volume(s) rather than the entire disk.
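For example (the device name is illustrative), after creating a partition on the disk with fdisk or parted:

pvcreate /dev/sdb1

rather than running pvcreate against the whole disk /dev/sdb.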

Bug 1565

LVM actions slow on RHEL 6

When running certain LVM commands on RHEL 6, performance is sometimes slower than in previous releases.  This can be seen in slightly longer restore and remove times for hierarchies with LVM resources.

DMMP Recovery Kit

Description

lklin00004530

DMMP: Write issued on standby server can hang

If a write is issued to a DMMP device that is reserved on another server, then the IO can hang indefinitely (or until the device is no longer reserved on the other server).  If/when the device is released on the other server and the write is issued, this can cause data corruption.

The problem is due to the way the path checking is done along with the IO retries in DMMP.  When "no_path_retry" is set to 0 (fail), this hang will not occur.  When the path_checker for a device fails when the path is reserved by another server (MSA1000), then this also will not occur.

Workaround: Set "no_path_retry" to 0 (fail).  However, this can cause IO failures due to transient path failures.
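For example, the setting can be placed in the defaults section of /etc/multipath.conf as in the following sketch ("fail" and "0" are equivalent; the setting may also be applied per-device):

defaults {
        no_path_retry fail
}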

Bug 1327

DMMP: Multiple initiators are not registered properly for SAS arrays that support ATP_C

LifeKeeper does not support configurations where there are multiple SAS initiators connected to a SAS array. In these configurations, LifeKeeper will not register each initiator correctly, so only one initiator will be able to issue IOs. Errors will occur if the multipath driver (DMMP, for example) tries to issue IOs through an unregistered initiator.

Bug 1595

LifeKeeper on RHEL 6 cannot support reservations connected to an EMC Clariion

PostgreSQL Recovery Kit

Description

lklin00004972

On SLES 10 SP2, the PostgreSQL resource hierarchy fails with the error "the database is not running or has experienced a dbfail event"

This issue is due to a SLES 10 SP2 kernel bug and has been fixed in update kernel version 2.6.16.60-0.23. On SLES 10 SP2, netstat is broken due to a new format in /proc/<PID>/fd. The netstat utility is used in the PostgreSQL recovery kit to verify that the database is running.

Solution: Please upgrade to kernel version 2.6.16.60-0.23 or later if running on SLES 10 SP2.

Important Information Regarding Kernel Upgrades: LifeKeeper typically installs kernel modules to support some of its features; therefore, when applying a kernel patch/kernel upgrade on a Red Hat system, it is important to rerun the ./setup script from the installation media to ensure that any kernel modules installed as part of LifeKeeper will be available to the new kernel. Failure to perform this step may leave LifeKeeper resources unable to be put into service and/or improperly protected.

MD Recovery Kit

Description

Bug 1043

MD Kit does not support mirrors created with “homehost”

The LifeKeeper MD Recovery Kit will not work properly with a mirror created with the "homehost" feature.  Where "homehost" is configured, LifeKeeper will use a unique ID that is improperly formatted such that in-service operations will fail.  On SLES 11 systems, the “homehost” will be set by default when a mirror is created.  The version of mdadm that supports “homehost” is expected to be available on other distributions and versions as well.  When creating a mirror, specify --homehost="" on the command line to disable this feature.  If a mirror already exists that has been created with the “homehost” setting, the mirror must be recreated to disable the setting.  If a LifeKeeper hierarchy has already been built for a mirror created with “homehost”, the hierarchy must be deleted and recreated after the mirror has been built with the “homehost” disabled.
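For example, a two-disk RAID-1 mirror could be created with “homehost” disabled as follows (the device names are illustrative):

mdadm --create /dev/md0 --level=1 --raid-devices=2 --homehost="" /dev/sdb1 /dev/sdc1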

Bug 130

MD Kit does not support MD devices created on LVM devices

The LifeKeeper MD Recovery Kit will not work properly with an MD device created on an LVM device.  When the MD device is created, it is given a name that LifeKeeper does not recognize. 

Bug 131

MD Kit configuration file entries in /etc/mdadm.conf not commented out

The LifeKeeper configuration file entries in /etc/mdadm.conf should be commented out after a reboot. These file entries are not commented out.

Bug 1098

Components not going out of service in some all path failures

In some cases during an all-path failure, mdadm detects the failed legs and the MD quickCheck starts trying to recover before lkscsid detects the failed disk, causing multiple recoveries at the same time and resulting in components not being taken out of service.

Bug 1126

Local recovery not performed in large configurations

In some cases with large configurations (6 or more hierarchies), if a local recovery is triggered (sendevent), not all of the hierarchies are checked, resulting in local recovery attempt failures.

Bug 1534

Mirrors automatically started during boot

On some systems (for example, those running RHEL 6), there is an AUTO entry in the configuration file (/etc/mdadm.conf) that will automatically start mirrors during boot (example: AUTO +imsm +1.x -all).

Solution: Since LifeKeeper requires that mirrors not be automatically started, this entry will need to be edited to make sure that LifeKeeper mirrors will not be automatically started during boot. The previous example (AUTO +imsm +1.x -all) tells the system to automatically start mirrors created using imsm metadata and 1.x metadata, minus all others. This entry should be changed to "AUTO -all", telling the system to automatically start everything “minus” all; therefore, nothing will be automatically started. Important: If system critical resources (such as root) are using MD, make sure that those mirrors are started by other means while the LifeKeeper protected mirrors are not.

Bug 2574

MD resource instances can be adversely impacted by udev processing during restore

During udev processing, device nodes are removed and recreated. Occasionally during a restore, LifeKeeper will try to access a node before it has been recreated, causing the restore to fail.

Solution: Perform the LifeKeeper restore action again.
