
Known Issues and Restrictions

Included below are the restrictions or known issues open against LifeKeeper for Linux, broken down by functional area.

Installation

Description

In Release 7.4 and forward, relocation of the SIOS product RPM packages is no longer supported.

Linux Dependencies

Successful completion of the installation of SIOS Protection Suite for Linux, including the optional Recovery Kits, requires the installation of a number of prerequisite packages. Note: Failure to install these dependent packages will also impact the ability to start SIOS Protection Suite for Linux, along with the ability to load the SIOS Protection Suite for Linux GUI.

To prevent script failures, these packages should be installed prior to attempting to run the installation setup script. For a complete list of dependencies, see the Linux Dependencies topic.

Bug 1533

The multipathd daemon will log errors in the error log when the nbd driver is loaded, as it tries to scan the new devices.

Solution:  To avoid these errors in the log, add devnode "^nbd" to the blacklist in /etc/multipath.conf.
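A minimal sketch of the corresponding /etc/multipath.conf fragment (if the file already contains a blacklist section, merge this entry into it rather than adding a second section):

```
blacklist {
    devnode "^nbd"
}
```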

Bug 2239

Incomplete NFS Setup Logging

When running the Installation setup script from the ISO image (sps.img), the output from the script patching process for NFS is not captured in the LifeKeeper install log (/var/log/LK_install.log). No workaround is available.

Bug 3467

mksh conflicts with SIOS Protection Suite for Linux setup needing ksh

If the mksh package is installed, the SIOS Protection Suite for Linux setup will fail indicating a package conflict. The SIOS Protection Suite for Linux requires the ksh package.

Workaround: On RHEL, CentOS or Oracle Linux, remove the mksh package and install the ksh package. After installing the ksh package, re-run the SIOS Protection Suite for Linux setup.

Example:

  1. Remove the mksh package

yum remove mksh

  2. Install the ksh package

yum install ksh

  3. Re-run setup
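On yum versions that support it, steps 1 and 2 can also be combined into a single transaction; verify the swap subcommand is available on your release before relying on it:

```shell
yum swap mksh ksh
```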

Unexpected termination of daemons

Daemons using IPC terminate unexpectedly after update to Red Hat Enterprise Linux 7.2 and Red Hat 7.2 derivative systems. A new systemd feature was introduced in Red Hat Enterprise Linux 7.2 related to the cleanup of all allocated inter-process communication (IPC) resources when the last user session finishes. A session can be an administrative cron job or an interactive session. This behavior can cause daemons running under the same user, and using the same resources, to terminate unexpectedly.

To work around this problem, edit the file /etc/systemd/logind.conf and add the following line:

RemoveIPC=no

Then, execute the following command, so that the change is put into effect:
systemctl restart systemd-logind.service

After performing these steps, daemons no longer crash in the described situation. Applications (such as MQ, Oracle, and SAP) using shared memory and semaphores may be affected by this issue and therefore require this change.

LifeKeeper Core

Description

lklin00003765

File system labels should not be used in large configurations

The use of file system labels can cause performance problems during boot-up with large clusters. These problems generally stem from the requirement that, to use labels, all devices connected to a system must be scanned. For systems connected to a SAN, especially those with LifeKeeper where access to a device is blocked, this scanning can be very slow.

To avoid this performance problem on Red Hat systems, edit /etc/fstab and replace the labels with the path names. 
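For example, in /etc/fstab (device names here are illustrative):

```
# Before: mounted by label, which forces every attached device to be scanned
LABEL=/data1   /data1   ext3   defaults   1 2

# After: mounted by device path
/dev/sdb1      /data1   ext3   defaults   1 2
```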

Bug 1046

lkscsid will halt system when it should issue a sendevent

When lkscsid detects a disk failure, it should, by default, issue a sendevent to LifeKeeper to recover from the failure.  The sendevent will first try to recover the failure locally and if that fails, will try to recover the failure by switching the hierarchy with the disk to another server. On some versions of Linux (RHEL 5 and SLES11), lkscsid will not be able to issue the sendevent but instead will immediately halt the system. This only affects hierarchies using the SCSI device nodes such as /dev/sda.

Bug 2213

DataKeeper Create Resource fails

When using DataKeeper in certain environments (e.g., virtualized environments with IDE disk emulation, or servers with HP CCISS storage), an error may occur when a mirror is created:

ERROR 104052: Cannot get the hardware ID of the device "dev/hda3"

This is because LifeKeeper does not recognize the disk in question and cannot get a unique ID to associate with the device.  

Workaround:  Add a pattern for our disk(s) to the DEVNAME device_pattern file, e.g.:

# cat /opt/LifeKeeper/subsys/scsi/resources/DEVNAME/device_pattern

/dev/hda*

Bug 2257

Specifying hostnames for API access 

The key name used to store LifeKeeper server credentials must match exactly the hostname of the other LifeKeeper server (as displayed by the hostname command on that server). If the hostname is an FQDN, then the credential key must also be the FQDN. If the hostname is a short name, then the key must also be the short name.

Workaround: Make sure that the hostname(s) stored by credstore match the hostname exactly.

Bug 2833

The use of lkbackups taken from versions of LifeKeeper prior to 8.3.2 requires manually updating /etc/default/LifeKeeper when restored on a later version

In the current version of LifeKeeper/SPS, there have been significant enhancements to the logging and other major core components. These enhancements affect the tunables in the /etc/default/LifeKeeper file. When an lkbackup created with an older version of LifeKeeper/SPS is restored on a newer version, these tunables will no longer have the correct values, causing a conflict.

Solution: Prior to restoring from an lkbackup, save /etc/default/LifeKeeper. After restoring from the lkbackup, merge in the tunable values as listed below.

When using lkbackups taken from versions previous to 8.0.0 and restoring on 8.0.0 or later:

LKSYSLOGTAG=LifeKeeper

LKSYSLOGSELECTOR=local6

Also, when using lkbackups taken from versions previous to 9.0.1 and restoring on 9.0.1 or later:

PATH=/opt/LifeKeeper/bin:/usr/java/jre1.8.0_51/bin:/usr/java/bin:/usr/java/jdk1.8.0_51/bin:/bin:/usr/bin:/usr/sbin:/sbin

See the Logging With syslog section for further information.

Bug 2884

Restore of an lkbackup after a resource has been created may leave broken equivalencies

The configuration files for created resources are saved during an lkbackup. If a resource is created for the first time after an lkbackup has been taken, that resource may not be properly accounted for when restoring from this previous backup.

Solution: Restore from lkbackup prior to adding a new resource for the first time. If a new resource has been added after an lkbackup, it should either be deleted prior to performing the restore, or delete an instance of the resource hierarchy, then re-extend the hierarchy after the restore. Note: It is recommended that an lkbackup be run when a resource of a particular type is created for the first time.

Bug 2537

Resources removed in the wrong order during failover

In cases where a hierarchy shares a common resource instance with another root hierarchy, resources are sometimes removed in the wrong order during a cascading failover or resource failover.

Solution: Creating a common root will ensure that resource removals in the hierarchy occur from the top down.

  1. Create a gen/app that always succeeds on restore and remove.
  2. Make all current roots children of this new gen/app.

Note: Using /bin/true for the restore and remove script would accomplish this.
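As a sketch, an always-succeeding action script can be generated as follows and used for both the restore and remove scripts of the common-root gen/app (the /tmp path is illustrative; /bin/true itself can be used directly as noted above):

```shell
# Create a trivial action script that does nothing and always succeeds.
cat > /tmp/lk_noop_action.sh <<'EOF'
#!/bin/sh
# Common-root gen/app restore/remove: no real work to do, report success.
exit 0
EOF
chmod +x /tmp/lk_noop_action.sh
```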

LifeKeeper syslog EMERG severity messages do not display to a SLES11 host's console which has AppArmor enabled

LifeKeeper is accessing /var/run/utmp which is disallowed by the SLES11 AppArmor syslog-ng configuration.

Solution: To allow LifeKeeper syslog EMERG severity messages to appear on a SLES11 console with AppArmor enabled, add the following entry to /etc/apparmor.d/sbin.syslog-ng:

/var/run/utmp kr

After adding this entry to sbin.syslog-ng, you can reload the existing AppArmor definition (without rebooting) with:

apparmor_parser -r /etc/apparmor.d/sbin.syslog-ng

Verify that the AppArmor update was successful by sending an EMERG syslog entry via:

logger -p local6.emerg "This is a syslog/lk/apparmor test."

RHEL 6.0 is NOT Recommended

SIOS strongly discourages the use of RHEL 6.0. If RHEL 6.0 is used, please understand that an OS update may be required to fix certain issues including, but not limited to:

  • DMMP fails to recover from cable pull with EMC CLARiiON (fixed in RHEL 6.1)
  • md recovery process hangs (fixed in the first update kernel of RHEL 6.1)

Note: In DataKeeper configurations, if the operating system is updated, a reinstall/upgrade of SIOS Protection Suite for Linux is required.

Bug 2873

Delete of nested file system hierarchy generates "Object does not exist" message

Solution: This message can be disregarded as it does not create any issues.

Bug 2890

filesyshier returns the wrong tag on a nested mount create

When a database has nested file system resources, the file system kit will create the file system for both the parent and the nested child. However, filesyshier returns only the child tag. This causes the application to create a dependency on the child but not the parent.

Solution: When multiple file systems are nested within a single mount point, it may be necessary to manually create the additional dependencies to the parent application tag using dep_create or via the UI Create Dependency.

Bug 2891

DataKeeper: Nested file system create will fail with DataKeeper

When creating a DataKeeper mirror for replicating an existing File System, if a file system is nested within this structure, you must unmount it first before creating the File System resource.

Workaround: Manually unmount the nested file systems and remount / create each nested mount.

Bug 3096

lkstart on SLES 11 SP2 generates insserv message

When lkstart is run on SLES 11 SP2, the following insserv message is generated:

insserv: Service syslog is missed in the runlevel 4 to use service steeleye-runit

LifeKeeper and steeleye-runit scripts are configured by default to start in run level 4, where the dependent init script syslog is not. If the system run level is changed to 4, syslog will be terminated and LifeKeeper will be unable to log.

Solution: Make sure the system run level is NOT changed to run level 4.

Bug 7054

Changing the mount point of a device protected by a File System resource may lead to data corruption

The mount point of a device protected by LifeKeeper via the File System resource (filesys) must not be changed. Doing so may result in the device being mounted on multiple nodes after a switchover, which can lead to data corruption.

XFS file system usage may cause quickCheck to fail

When the CHECK_FS_QUOTAS setting is enabled for LifeKeeper installed on Red Hat Enterprise Linux 7 / Oracle Linux 7 / CentOS 7, quickCheck fails if the uquota or gquota option is set on the protected XFS file system resource.

Solution: Use usrquota and grpquota instead of uquota and gquota in the XFS file system mount options, or disable the CHECK_FS_QUOTAS setting.
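For example, an /etc/fstab entry for a protected XFS file system using the supported options (device and mount point are illustrative):

```
/dev/sdb1   /data   xfs   defaults,usrquota,grpquota   0 0
```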

Btrfs is not supported

Btrfs cannot be used for LifeKeeper files (/opt/LifeKeeper), bitmap files if they are not in /opt/LifeKeeper, lkbackupfiles, or any other LifeKeeper related files. In addition, LifeKeeper does not support protecting Btrfs within a resource hierarchy.

SLES12 SP1 or later on AWS

The following restrictions apply with SLES12 SP1 or later on AWS:

  • Cannot set static routing configuration
    Automatic IP address configuration via DHCP does not work if a static routing configuration is set in /etc/sysconfig/network/routes. This causes the network not to start correctly.
    Solution: Update the routing information in the configuration file by modifying the "ROUTE" parameter in /etc/sysconfig/network/ifroute-ethX
  • Hostname is changed even if the "Change Hostname via DHCP" setting is disabled.
    The LK service does not work properly if the hostname is rewritten. In SLES12 SP1 or later on AWS, the hostname is changed even after the "Change Hostname via DHCP" setting is disabled.
    Solution:
    • Update /etc/cloud/cloud.cfg to comment out the "update_hostname" parameter
    • Update /etc/cloud/cloud.cfg to set the preserve_hostname parameter to "true"
    • Update /etc/sysconfig/network/dhcp to set the DHCLIENT_SET_HOSTNAME parameter to "no"
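Sketched together, the relevant settings look like this (exact file contents vary by SLES image):

```
# /etc/cloud/cloud.cfg
preserve_hostname: true
# ...and comment out the update_hostname module entry:
#   - update_hostname

# /etc/sysconfig/network/dhcp
DHCLIENT_SET_HOSTNAME="no"
```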

Shutdown Strategy set to "Switchover Resources" may fail when using Quorum/Witness Kit in Witness mode

Hierarchy switchover during LifeKeeper shutdown may fail to occur when using the Quorum/Witness Kit in Witness mode.
Workaround: Manually switchover resource hierarchies before shutdown.

Edit /etc/services

If the following entry in /etc/services is deleted, LifeKeeper cannot start up.

lcm_server 7365/tcp

Don't delete this entry when editing the file.

Any storage unit which returns a string including a space for the SCSI ID cannot be protected by LifeKeeper.

Automatic recovery of network may fail when link status goes down on Red Hat 5.x and Red Hat 6.x systems when using bonded interfaces.

The network doesn't recover automatically when using bonded interfaces when the link status is lost and then restored on Red Hat 5.x or Red Hat 6.x. Loss of the link status can occur via a bad network cable, a bad switch or hub, or a cable reconnection (ifdown -> ifup). To recover from this status, restart the network manually by executing the following command as a user with root privileges.

# service network restart

Please note that this problem is already corrected for RHEL7. Also, this problem doesn't occur with SLES.

 

Internet/IP Licensing

Description

Bug 2300

/etc/hosts settings dependency

When using internet-based licensing (IPv4 address), the configuration of /etc/hosts can negatively impact license validation. If LifeKeeper startup fails with:

Error in obtaining LifeKeeper license key:
Invalid host.
 The hostid of this system does not match the hostid specified in the license file.

and the listed internet hostid is correct, then the configuration of /etc/hosts may be the cause. To correctly match /etc/hosts entries, IPv4 entries must be listed before any IPv6 entries. To verify if the /etc/hosts configuration is the cause, run the following command:

/opt/LifeKeeper/bin/lmutil lmhostid -internet -n

If the IPv4 address listed does not match the IPv4 address in the installed license file, then /etc/hosts must be modified to place IPv4 entries before IPv6 entries to return the correct address.
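For example, with illustrative addresses, the IPv4 entry must be listed first:

```
# /etc/hosts - IPv4 entry before any IPv6 entry for the same host
192.168.1.10    node1.example.com node1
2001:db8::10    node1.example.com node1
```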

GUI

Description

lklin00004276

GUI login prompt may not re-appear when reconnecting via a web browser after exiting the GUI

When you exit or disconnect from the GUI applet and then try to reconnect from the same web browser session, the login prompt may not appear.

Workaround: Close the web browser, re-open the browser and then connect to the server. When using the Firefox browser, close all Firefox windows and re-open.

lklin00004181

lkGUIapp on RHEL 5 reports unsupported theme errors

When you start the GUI application client, you may see the following console message. This message comes from the RHEL 5 and FC6 Java platform look and feel and will not adversely affect the behavior of the GUI client.

/usr/share/themes/Clearlooks/gtk-2.0/gtkrc:60: Engine "clearlooks" is unsupported, ignoring

lklin00000477

GUI does not immediately update IP resource state after network is disconnected and then reconnected

When the primary network between servers in a cluster is disconnected and then reconnected, the IP resource state on a remote GUI client may take as long as 1 minute and 25 seconds to be updated due to a problem in the RMI/TCP layer.

Bug 1211

Java Mixed Signed/Unsigned Code Warning - When loading the LifeKeeper Java GUI client applet from a remote system, the following security warning may be displayed: 

(Screenshot: DigitalSignatureVerify.jpg)

Select “Run” and the following dialog will be displayed:

(Screenshot: JavaSecurityWarning.jpg)

When asked whether to block the applet, select “No” and the LifeKeeper GUI will be allowed to operate.

Recommended Actions:  To reduce the number of security warnings, you have two options: 

  1. Check the “Always trust content from this publisher” box and select “Run”. The next time the LifeKeeper GUI Java client is loaded, the warning message will not be displayed.

or

  2. Add the following entry to your Java “deployment.properties” file to eliminate the second dialog about blocking. The security warning will still be displayed when you load the Java client; however, the applet will not be blocked and the Block “Yes” or “No” dialog will not be displayed. Please note this setting will apply to all of your Java applets.

deployment.security.mixcode=HIDE_RUN

To bypass both messages, implement both 1 and 2.

Bug 2522

steeleye-lighttpd process fails to start if Port 778 is in use

If a process is already using Port 778 when steeleye-lighttpd starts up, steeleye-lighttpd fails to start, which prevents connections to the GUI.

Solution: Set the following tunable on all nodes in the cluster and then restart LifeKeeper on all the nodes:

Add the following line to /etc/default/LifeKeeper:

API_SSL_PORT=port_number

where port_number is the new port to use.

Data Replication

Description

Important reminder about DataKeeper for Linux asynchronous mode in an LVM over DataKeeper configuration

Kernel panics may occur in configurations where LVM resources sit above multiple asynchronous mirrors. In these configurations, data consistency may be an issue if a panic occurs. Therefore, the required configurations are a single DataKeeper mirror or multiple synchronous DataKeeper mirrors.

lklin00001536

In symmetric active SDR configurations with significant I/O traffic on both servers, the filesystem mounted on the netraid device (mirror) stops responding and eventually the whole system hangs

Due to the single-threaded nature of the Linux buffer cache, the buffer cache flushing daemon can hang trying to flush out a buffer which needs to be committed remotely. While the flushing daemon is hung, all activities in the Linux system with dirty buffers will stop if the number of dirty buffers goes over the system accepted limit (set in /proc/sys/vm/bdflush).

Usually this is not a serious problem unless something happens to prevent the remote system from clearing remote buffers (e.g. a network failure).  LifeKeeper will detect a network failure and stop replication in that event, thus clearing a hang condition.  However, if the remote system is also replicating to the local system (i.e. they are both symmetrically replicating to each other), they can deadlock forever if they both get into this flushing daemon hang situation.

The deadlock can be released by manually killing the nbd-client daemons on both systems (which will break the mirrors).  To avoid this potential deadlock entirely, however, symmetric active replication is not recommended.

Bug 2405

Mirror breaks and fills up /var/log/messages with errors

This issue has been seen occasionally (on Red Hat EL 6.x and CentOS 6) during stress tests with induced failures, especially in killing the nbd_server process that runs on a mirror target system. Upgrading to the latest kernel for your distribution may help lower the risk of seeing this particular issue, such as kernel-2.6.32-131.17.1 on Red Hat EL 6.0 or 6.1. Rebooting the source system will clear up this issue. 

With the default kernel that comes with CentOS 6 (2.6.32-71), this issue may occur much more frequently (even when the mirror is just under a heavy load). Unfortunately, CentOS has not yet released a kernel (2.6.32-131.17.1) that will improve this situation. SIOS recommends updating to the 2.6.32-131.17.1 kernel as soon as it becomes available for CentOS 6.

Note: Beginning with SPS 8.1, when performing a kernel upgrade on Red Hat Enterprise Linux systems, it is no longer a requirement that the setup script (./setup) from the installation image be rerun. Modules should be automatically available to the upgraded kernel without any intervention as long as the kernel was installed from a proper Red Hat package (rpm file).

Bug 2707

High CPU usage reported by top for md_raid1 process with large mirror sizes

With the mdX_raid1 process (with X representing the mirror number), high CPU usage as reported by top can be seen on some OS distributions when working with very large mirrors (500GB or more).

Solution: To reduce the CPU usage percent, modify the chunk size to 1024 via the LifeKeeper tunable LKDR_CHUNK_SIZE then delete and recreate the mirror in order to use this new setting.
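For example, in /etc/default/LifeKeeper (the new chunk size applies only to mirrors created after the change, which is why the mirror must be deleted and recreated):

```
LKDR_CHUNK_SIZE=1024
```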

Bug 2819

The use of lkbackup with DataKeeper resources requires a full resync

Although lkbackup will save the instance and mirror_info files, it is best practice to perform a full resync of DataKeeper mirrors after a restore from lkbackup as the status of source and target cannot be guaranteed while a resource does not exist.

Bug 2373

Mirror resyncs may hang in early Red Hat/CentOS 6.x kernels with a "Failed to remove device" message in the LifeKeeper log

Kernel versions prior to version 2.6.32-131.17.1 (RHEL 6.1 kernel version 2.6.32-131.0.15 before update, etc) contain a problem in the md driver used for replication. This problem prevents the release of the nbd device from the mirror resulting in the logging of multiple "Failed to remove device" messages and the aborting of the mirror resync. A system reboot may be required to clear the condition. This problem has been observed during initial resyncs after mirror creation and when the mirror is under stress.

Solution: Kernel 2.6.32-131.17.1 has been verified to contain the fix for this problem. If you are using DataKeeper with Red Hat or CentOS 6 kernels before the 2.6.32-131.17.1 version, we recommend updating to this or the latest available version.

DataKeeper does not support using Network Compression on SLES11 SP4 and SLES12 SP1 or later

DataKeeper does not support using Network Compression on SLES11 SP4 and SLES12 SP1 or later due to a disk I/O performance problem.

IPv6

Description

SIOS has migrated to the use of the ip command and away from the ifconfig command. Because of this change, customers with external scripts are advised to make a similar change. Instead of issuing the ifconfig command and parsing the output looking for a specific interface, scripts should instead use "ip -o addr show" and parse the output looking for lines that contain the words "inet" and "secondary".

# ip -o addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 16436 qdisc noqueue state UNKNOWN
     \    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
1: lo    inet 127.0.0.1/8 scope host lo
1: lo    inet6 ::1/128 scope host
     \       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
     \    link/ether d2:05:de:4f:a2:e6 brd ff:ff:ff:ff:ff:ff
2: eth0    inet 172.17.100.77/22 brd 172.17.103.255 scope global eth0
2: eth0    inet 172.17.100.79/22 scope global secondary eth0
2: eth0    inet 172.17.100.80/22 scope global secondary eth0
2: eth0    inet6 2001:5c0:110e:3364::1:2/64 scope global
     \       valid_lft forever preferred_lft forever
2: eth0    inet6 2001:5c0:110e:3300:d005:deff:fe4f:a2e6/64 scope global dynamic
     \       valid_lft 86393sec preferred_lft 14393sec
2: eth0    inet6 fe80::d005:deff:fe4f:a2e6/64 scope link
     \       valid_lft forever preferred_lft forever

So for the above output from the ip command, the following lines contain virtual IP addresses for the eth0 interface:

2: eth0    inet 172.17.100.79/22 scope global secondary eth0
2: eth0    inet 172.17.100.80/22 scope global secondary eth0
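As a sketch, a script can extract the secondary (virtual) addresses like this (parsing a captured sample here; a live script would pipe `ip -o addr show` directly into awk):

```shell
# Sample output captured from `ip -o addr show` (see above)
ip_output='2: eth0    inet 172.17.100.77/22 brd 172.17.103.255 scope global eth0
2: eth0    inet 172.17.100.79/22 scope global secondary eth0
2: eth0    inet 172.17.100.80/22 scope global secondary eth0'

# Keep only lines containing "inet" and "secondary"; field 4 is the address/prefix
printf '%s\n' "$ip_output" | awk '/inet / && /secondary/ {print $4}'
```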

Bug 1971

'IPV6_AUTOCONF = No' for /etc/sysconfig/network-scripts/ifcfg-<nicName> is not being honored on reboot or boot

On boot, a stateless, auto-configured IPv6 address is assigned to the network interface. If a comm path is created with a stateless IPv6 address of an interface that has IPV6_AUTOCONF=No set, the address will be removed if any system resources manage the interface, e.g. ifdown <nicName>;ifup <nicName>.

A comm path using auto-configured IPv6 addresses did not recover and remained dead after rebooting the primary server because IPV6_AUTOCONF was set to No.

Solution: Use Static IPv6 addresses only. The use of auto-configured IPv6 addresses could cause a comm loss after a reboot, change NIC, etc.

While IPv6 auto-configured addresses may be used for comm path creation, it is incumbent upon the system administrator to be aware of the following conditions:

  • IPv6 auto-configured/stateless addresses are dependent on the network interface (NIC) MAC address. If a comm path was created and the associated NIC is later replaced, the auto-configured IPv6 address will be different and LifeKeeper will correctly show the comm path is dead. The comm path will need to be recreated.

  • At least with RHEL 5.6, implementing the intended behavior for assuring consistent IPv6 auto-configuration during all phases of host operation requires specific domain knowledge for accurately and precisely setting the individual interface config files AS WELL AS the sysctl.conf net.ipv6.* directives (i.e. explicitly setting IPV6_AUTOCONF in the ifcfg-<nic> file, which is referenced by the 'if/ip' utilities, AND setting directives in /etc/sysctl.conf which impact NIC control when the system is booting and switching init levels).

Bug 1977

IP: Modify Source Address Setting for IPv6 doesn't set source address

When attempting to set the source address for an IPv6 IP resource, it will report success even though nothing was changed.

Workaround: Currently no workaround is available. This will be addressed in a future release.

Bug 2008

IP: Invalid IPv6 addressing allowed in IP resource creation

Entering IPv6 addresses of the format 2001:5c0:110e:3368:000000:000000001:61:14 is accepted even though some of the groups contain more than four hex characters.

Workaround: Enter correctly formatted IPv6 addresses.

Bugs 1922, 1923 and 1989

Can't connect to host via IPv6 addressing

lkGUIapp will fail connecting to a host via IPv6 hex addressing, either via resolvable host name or IP address. lkGUIapp requires an IPv4 configured node for connection. IPv6 comm paths are fully supported.

Bug 2398

IPv6 resource reported as ISP when address assigned to bonded NIC but in 'tentative' state

IPv6 protected resources in LifeKeeper will incorrectly be identified as 'In Service Protected' (ISP) on SLES systems where the IPv6 resource is on a bonded interface, a mode other than 'active-backup' (1) and Linux kernel 2.6.21 or lower. The IPv6 bonded link will remain in the 'tentative' state with the address unresolvable.

Workaround: Set the bonded interface mode to 'active-backup' (1) or operate with an updated kernel which will set the link state from 'tentative' to 'valid' for modes other than 'active-backup' (1).

Apache

Description

Bug 1946

Apache Kit does not support IPv6; doesn't identify IPv6 in httpd.conf

Any IPv6 addresses assigned to the 'Listen' directive entry in the httpd.conf file will cause problems.

Solution: Until there is support for IPv6 in the Apache Recovery Kit, there can be no IPv6 address in the httpd.conf file after the resource has been created.

Oracle Recovery Kit

Description

lklin00000819

The Oracle Recovery Kit does not include support for Connection Manager and Oracle Names features

The LifeKeeper Oracle Recovery Kit does not include support for the following Oracle Net features of Oracle: Oracle Connection Manager, a routing process that manages a large number of connections that need to access the same service; and Oracle Names, the Oracle-specific name service that maintains a central store of service addresses.

The LifeKeeper Oracle Recovery Kit does protect the Oracle Net Listener process that listens for incoming client connection requests and manages traffic to the server.  Refer to the LifeKeeper for Linux Oracle Recovery Kit Administration Guide for LifeKeeper configuration specific information regarding the Oracle Listener.

lklin00003290 / lklin00003323

The Oracle Recovery Kit does not support the ASM or grid component features of Oracle 10g

The following information applies to Oracle 10g database instances only.  The Oracle Automatic Storage Manager (ASM) feature provided in Oracle 10g is not currently supported with LifeKeeper.  In addition, the grid components of 10g are not protected by the LifeKeeper Oracle Recovery Kit.  Support for raw devices, file systems, and logical volumes are included in the current LifeKeeper for Linux Oracle Recovery Kit.  The support for the grid components can be added to LifeKeeper protection using the gen/app recovery kit.

The Oracle Recovery Kit does not support NFS Version 4

The Oracle Recovery Kit supports NFS Version 3 for shared database storage. NFS Version 4 is not supported at this time due to NFSv4 file locking mechanisms.

Bug 3307

Oracle listener stays in service on primary server after failover

Network failures may result in the listener process remaining active on the primary server after an application failover to the backup server. Though connections to the correct database are unaffected, you may still want to kill that listener process.

The Oracle Recovery Kit does not support Oracle Database Standard Edition 2 (SE2) on AWS EC2

During testing of Oracle Database Standard Edition 2 (SE2), unexpected behavior was observed on AWS EC2. The issue was not reproduced on other platforms, so the Oracle Recovery Kit supports SE2 on systems other than EC2.

MySQL

Description

The "include" directive is not supported

The "include" directive is not supported. All the setup configuration information must be described in a single my.cnf file.

Crash Recovery

Restarting MySQL after an abnormal termination initiates a MySQL crash recovery. While in this recovery state, MySQL client connections are denied. This prevents LifeKeeper from checking the state of MySQL, possibly causing a failover to the standby node.

NFS Server Recovery Kit

Description

lklin00001123

Top level NFS resource hierarchy uses the switchback type of the hanfs resource

The switchback type, which dictates whether the NFS resource hierarchy will automatically switch back to the primary server when it comes back into service after a failure, is defined by the hanfs resource.

lklin00003427

Some clients are unable to reacquire nfs file locks

When acting as NFS clients, some Linux kernels do not respond correctly to notifications from an NFS server that an NFS lock has been dropped and needs to be reacquired.  As a result, when these systems are the clients of an NFS file share that is protected by LifeKeeper, the NFS locks held by these clients are lost during a failover or switchover.

When using storage applications with locking, and following the recommendations for the NFS mount options, SPS requires that the additional nolock option be set, e.g. rw,nolock,bg,hard,nointr,tcp,nfsvers=3,timeo=600,rsize=32768,wsize=32768,actimeo=0.
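For example (server name and paths are illustrative):

```shell
mount -t nfs -o rw,nolock,bg,hard,nointr,tcp,nfsvers=3,timeo=600,rsize=32768,wsize=32768,actimeo=0 nfsserver:/export/db /mnt/db
```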

Bug 1961

NFS v4 changes not compatible with SLES 11 nfs subsystem operation

The mounting of a non-NFS v4 remote export on SLES 11 starts rpc.statd. The startup of rpc.statd on the out-of-service node in a cluster protecting an NFS v4 root export will fail.

Solution: Do not mix NFS v2/v3 with a cluster protecting an NFS v4 root export.

Bug 2014

NFS v4 cannot be configured with IPv6

IPv6 virtual IP gets rolled up into the NFSv4 hierarchy.

Solution: Do not use an IPv6 virtual IP resource when creating an NFSv4 resource.

Bug 2019

NFS v4: Unable to re-extend hierarchy after unextend

Extend fails because the export point is already exported on the target server. A re-extend to server A of an NFS v4 hierarchy will fail if a hierarchy is created on server A and extended to server B, brought in service on server B and then unextended from server A.

Solution: On server A run the command "exportfs -ra" to clean up the extra export information left behind.

File Lock switchover with NFSv3 fails on some operating systems

File lock failover with NFSv3 during resource switchover/failover does not work on the operating systems listed below. Lock failover with NFSv3 is currently not supported on these OS versions.

  • RHEL7, CentOS7, OL7
  • RHEL6, CentOS6, OL6
  • SLES12
  • SLES11

Solution: Use the lock failover features available with NFSv4.

The Oracle Recovery Kit does not support NFSv4

The Oracle Recovery Kit supports NFSv3 for shared database storage. NFSv4 is not supported at this time due to NFSv4 file locking mechanisms.

Bug 3319

Stopping and starting NFS subsystem adversely impacts SIOS Protection Suite protected NFS exports.

If the NFS subsystem (/etc/init.d/nfs on Red Hat or /etc/init.d/nfsserver on SuSE) is stopped while SIOS Protection Suite for Linux is protecting NFS exports, then all SIOS Protection Suite for Linux protected exported directories will be impacted as the NFS stop action performs an unexport of all the directories. The SIOS Protection Suite for Linux NFS quickCheck script will detect the stopped NFS processes and the unexported directories and run a local recovery to restart the processes and re-export the directories. However, it will take a quickCheck run for each protected export for the SIOS Protection Suite NFS ARK to recover everything. For example, if five exports are protected by the SIOS Protection Suite for Linux NFS ARK, it will take five quickCheck runs to recover all the exported directories the kit protects. Based on the default quickCheck time of two minutes, it could take up to ten minutes to recover all the exported directories.

Workaround: Do not stop the NFS subsystem while the SIOS Protection Suite NFS ARK is actively protecting exported directories on the system. If the NFS subsystem must be stopped, all NFS resources should be switched to the standby node before stopping the NFS subsystem. Use of the exportfs command should also be considered. This command line utility provides the ability to export and unexport a single directory thus bypassing the need to stop the entire NFS subsystem.
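As a sketch, unexporting and re-exporting a single directory with exportfs looks like this (the client specification, path, and export options below are examples only):

```
# Unexport only this directory; other NFS exports stay online:
exportfs -u 10.0.0.0/24:/export/data

# ...perform maintenance...

# Re-export the same directory with its original options:
exportfs -o rw,sync 10.0.0.0/24:/export/data

# Verify the current export list:
exportfs -v
```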

SAP Recovery Kit

Description

lklin00002532

Failed delete or unextend of a SAP hierarchy

Deleting or unextending a SAP hierarchy that contains the same IP resource in multiple locations within the hierarchy can sometimes cause a core dump that results in resources not being deleted.

To correct the problem, after the failed unextend or delete operation, manually remove any remaining resources using the LifeKeeper GUI.  You may also want to remove the core file from the server.

Bug 2082

Handle Warnings gives a syntax error at -e line 1

When changing the default behavior of No in Handle Warnings to Yes, an error is received.

Solution: Leave this option at the default setting of No. Note: It is highly recommended that this setting be left at the default selection of No, as Yellow is a transient state that most often does not indicate a failure.

Bug 2084

Choosing same setting causes missing button on Update Wizard

If the user attempts to update the Handle Warnings setting without changing the current value, the next screen, which indicates that they must go back, is missing the Done button.

Bug 2087

When changes are made to res_state, monitoring is disabled

If the Protection Level is set to BASIC and SAP is taken down manually (e.g. for maintenance), it will be marked as FAILED and monitoring will stop.

Solution: In order for monitoring to resume, the resource must be started through LifeKeeper rather than manually.

Bug 2092

ERS in-service fails on remote host if ERS is not parent of Core/CI

Creating an ERS resource without any additional SAP resource dependents will cause the initial in-service to fail on switchover.

Solution: Create ERS as parent of CI/Core instance (SCS or ASCS), then retry in-service.

LVM Recovery Kit

Description

Important reminder about DataKeeper for Linux asynchronous mode in an LVM over DataKeeper configuration

Kernel panics may occur in configurations where LVM resources sit above multiple asynchronous mirrors. In these configurations, data consistency may be an issue if a panic occurs. Therefore, the required configuration is a single DataKeeper mirror or multiple synchronous DataKeeper mirrors.

lklin00003844

Use of lkID incompatible with LVM overwritten on entire disk

When lkID is used to generate unique disk IDs on disks that are configured as LVM physical volumes, there is a conflict in the locations in which the lkID and LVM information is stored on the disk.  This causes either the lkID or LVM information to be overwritten depending on the order in which lkID and pvcreate are used.

Workaround:  When it is necessary to use lkID in conjunction with LVM, partition the disk and use the disk partition(s) as the LVM physical volume(s) rather than the entire disk.
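For instance, assuming a hypothetical disk /dev/sdb, the workaround amounts to creating a partition and using that partition as the physical volume:

```
# Create a single partition spanning the disk (example device):
parted -s /dev/sdb mklabel msdos mkpart primary 0% 100%

# Use the partition, not the whole disk, as the LVM physical volume:
pvcreate /dev/sdb1
```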

Bug 1565

LVM actions slow on RHEL 6

When running certain LVM commands on RHEL 6, performance is sometimes slower than in previous releases.  This can be seen in slightly longer restore and remove times for hierarchies with LVM resources.

The configuration of Raw and LVM Recovery Kits together is not supported in RHEL 6 environment

When creating a Raw resource, the Raw Recovery Kit looks for a device file based on the major and minor numbers of the Raw device. As a result, /dev/dm-* will be the device; however, this type of /dev/dm-* device cannot be handled by the LVM Recovery Kit, and a "raw device not found" error will occur in the GUI.

Multipath Recovery Kits (DMMP / HDLM / PPATH / NECSPS)

Description

Bug 3523

Multipath Recovery Kits (DMMP / HDLM / PPATH / NECSPS): Registration conflict error occurs on lkstop when the resource is OSF

The multipath recovery kits (DMMP, HDLM, PPATH, NECSPS) can cause a system halt on the active (ISP) node when LifeKeeper is stopped on the standby (OSU) node if the Multipath resource state is OSF.

Workarounds:

a) Switch the hierarchy to the standby node before LifeKeeper is stopped

OR

b) Run ins_setstate on the standby node and set the Multipath resource state to OSU before LifeKeeper is stopped

DMMP Recovery Kit

Description

lklin00004530

DMMP: Write issued on standby server can hang

If a write is issued to a DMMP device that is reserved on another server, then the IO can hang indefinitely (or until the device is no longer reserved on the other server).  If/when the device is released on the other server and the write is issued, this can cause data corruption.

The problem is due to the way path checking is done along with the IO retries in DMMP.  When "no_path_retry" is set to 0 (fail), this hang will not occur.  This also will not occur when the path_checker for a device fails while the path is reserved by another server (MSA1000).

Workaround: Set "no_path_retry" to 0 (fail).  However, this can cause IO failures due to transient path failures.
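In /etc/multipath.conf, the setting can be applied in the defaults section (or per device); a minimal sketch:

```
defaults {
    no_path_retry fail    # equivalent to 0: fail IO immediately when all paths are down
}
```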

Bug 1327

DMMP: Multiple initiators are not registered properly for SAS arrays that support ATP_C

LifeKeeper does not natively support configurations where there are multiple SAS initiators connected to a SAS array. In these configurations, LifeKeeper will not register each initiator correctly, so only one initiator will be able to issue IOs. Errors will occur if the multipath driver (DMMP for example) tries to issue IOs to an unregistered initiator.

Solution: Set the following tunable in /etc/default/LifeKeeper to allow path IDs to be set based on SAS storage information:

MULTIPATH_SAS=TRUE

Bug 1595

LifeKeeper on RHEL 6.0 cannot support reservations connected to an EMC Clariion

Bug 7017

Two or more different storage models cannot be used concurrently when one of the storage models requires specific DMMP Recovery Kit parameter configuration.

DMMP RK doesn't function correctly if the disk name ends with "p<number>"

Workaround: Do not create disk names ending in "p<number>".

DB2 Recovery Kit

Description

DB2 Recovery Kit reports unnecessary error

If DB2 is installed on a shared disk, the following message may be seen when extending a DB2 resource.

LifeKeeper was unable to add instance "%s" and/or its variables to the DB2 registry.

This message will not adversely affect the behavior of the DB2 resource extend.

MD Recovery Kit

Description

Bug 1043

MD Kit does not support mirrors created with “homehost”

The LifeKeeper MD Recovery Kit will not work properly with a mirror created with the "homehost" feature.  Where "homehost" is configured, LifeKeeper will use a unique ID that is improperly formatted, such that in-service operations will fail.  On SLES 11 systems, "homehost" is set by default when a mirror is created.  The version of mdadm that supports "homehost" is expected to be available on other distributions and versions as well.

Workaround: When creating a mirror, specify --homehost="" on the command line to disable this feature.  If a mirror already exists that was created with the "homehost" setting, the mirror must be recreated to disable the setting.  If a LifeKeeper hierarchy has already been built for a mirror created with "homehost", the hierarchy must be deleted and recreated after the mirror has been rebuilt with "homehost" disabled.
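A mirror-creation command with homehost disabled might look like the following (the device names and RAID level are examples only):

```
mdadm --create /dev/md0 --level=1 --raid-devices=2 --homehost="" /dev/sdb1 /dev/sdc1
```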

Bug 130

MD Kit does not support MD devices created on LVM devices

The LifeKeeper MD Recovery Kit will not work properly with an MD device created on an LVM device.  When the MD device is created, it is given a name that LifeKeeper does not recognize. 

Bug 131

MD Kit configuration file entries in /etc/mdadm.conf not commented out

The LifeKeeper configuration file entries in /etc/mdadm.conf should be commented out after a reboot. These file entries are not commented out.

Bug 1126

Local recovery not performed in large configurations

In some cases with large configurations (6 or more hierarchies), if a local recovery is triggered (sendevent), not all of the hierarchies are checked resulting in local recovery attempt failures.

Bug 1534

Mirrors automatically started during boot

On some systems (for example, those running RHEL 6), there is an AUTO entry in the configuration file (/etc/mdadm.conf) that will automatically start mirrors during boot (example: AUTO +imsm +1.x -all).

Solution:  Since LifeKeeper requires that mirrors not be automatically started, this entry will need to be edited to ensure that LifeKeeper mirrors are not automatically started during boot.  The previous example (AUTO +imsm +1.x -all) tells the system to automatically start mirrors created using imsm metadata and 1.x metadata, minus all others.  This entry should be changed to "AUTO -all", telling the system to automatically start everything "minus" all; therefore, nothing will be automatically started.

Important:  If system critical resources (such as root) are using MD, make sure that those mirrors are started by other means while the LifeKeeper protected mirrors are not.
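The edit itself is a one-line change. The sketch below applies it to a temporary copy of the file for illustration; on a real system you would edit /etc/mdadm.conf directly:

```shell
# Work on a temporary copy for illustration:
conf=$(mktemp)
printf 'AUTO +imsm +1.x -all\n' > "$conf"     # typical default entry (example)

# Replace the AUTO policy so no arrays are auto-assembled at boot:
sed -i 's/^AUTO .*/AUTO -all/' "$conf"
cat "$conf"
```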

 

SAP DB / MaxDB Recovery Kit

Description

Bug 3272

Extend fails when MaxDB is installed and configured on shared storage.

Workaround: Install a local copy of MaxDB on the backup node(s) in the same directories as the primary system.

 

Sybase ASE Recovery Kit

Description

User Name/Password Issues:

  • If the default user name is password-protected, the create UI does not detect this until after all validation is complete

When creating the Sybase resource, you are prompted to enter the user name. The help text displays a message that if no user is specified, the default of 'sa' will be used. However, no password validation is done for the default account at this time. When SIOS Protection Suite attempts to create the Sybase resource, the resource creation fails because the password has not been validated or entered. The password validation occurs on the user/password dialog, but only when a valid user is actually entered on the user prompt. Even if using the default user name, it must be specified during the create action.

  • Password prompt skipped if no user name specified

User/password dialog skips the password prompt if you do not enter a user name. When updating the user/password via the UI option, if you do not enter the Sybase user name, the default of ‘sa’ will be used and no password validation is done for the account. This causes the monitoring of the database to fail with invalid credential errors. Even if using the default user name, it must be specified during the update action. To fix this failure, perform the following steps:

  1. Verify that the required Sybase data files are currently accessible from the intended server. In most instances, this will be the backup server due to the monitoring and local recovery failure on the primary.
  2. Start the Sybase database instance from the command line on this server (see the Sybase product documentation for information on starting the database manually).
  3. From the command line, change directory (cd) to the LKROOT/bin directory (/opt/LifeKeeper/bin on most installations).
  4. Once in the bin directory, execute the following:

./ins_setstate -t <SYBASE_TAG> -S ISP

where <SYBASE_TAG> is the tag name of the Sybase resource

  5. When the command completes, immediately execute the Update User/Password Wizard from the UI and enter a valid user name, even if planning to use the Sybase default of ‘sa’. Note: The Update User/Password Wizard can be accessed by right-clicking on the Sybase resource instance and selecting Change Username/Password.
  6. When the hierarchy has been updated on the local server, verify that the resource can be brought in service on all nodes.
  • Protecting backup server fails when Sybase local user name >= eight characters

The Sybase user name must consist of less than eight characters. If the Sybase local user name is greater than eight characters, the process and user identification checks used for resource creation and monitoring will fail. This will also prevent the protection of a valid Sybase Backup Server instance from being selected for protection. This problem is caused by the operating system translation of user names that are >= eight characters from the name to the UID in various commands (for example, ps). You must use a user name that is less than eight characters long.

Resource Create Issue:

  • Default Sybase install prompt is based on ASE 16.0 SP02 (/opt/sybase). During the SIOS Protection Suite resource creation, the default prompt for the location of the Sybase installation shows up relative to Sybase Version 16.0 SP02 (/opt/sybase). You must manually enter or browse to the correct Sybase install location during the resource create prompt.

Extend Issues:

  • The Sybase tag prompt on extend is editable but should not be changed. The Sybase tag can be changed during extend, but this is not recommended. Using different tags on each server can lead to issues with remote administration via the command line.

Properties Page Issues:

  • Image appears missing for the Properties pane update user/password. Instead of the proper image, a small square appears on the toolbar. Selecting this square will launch the User/Password Update Wizard.

Sybase Monitor server is not supported in 15.7 or later with SIOS Protection Suite. If the Sybase Monitor server process is configured in Sybase 15.7 or later, you must use a Generic Application (gen/app) resource to protect this server process.

Remove command not recognized by Sybase on SLES 11

If bringing Sybase in service on the backup server, it must first be taken out of service on the primary server. In order for Sybase to recognize this command, you must add a line in "locales/locales.dat" for the SIOS Protection Suite remotely executed Sybase command using "POSIX" as "vendor_locale".

Example:

locale = POSIX, us_english, utf8

Unable to detect that Sybase ARK is running

Symptom: Unable to detect that Sybase ARK is running.
Cause: The Sybase ARK uses the default SQL interface tool (isql). On some 64-bit systems, the isql tool is installed as isql64 rather than isql.
Solution: The isql64 tool can be copied, in the same path, to isql, or a link can be created from isql to the isql64 executable.
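A sketch of the link workaround, simulated in a scratch directory; on a real system the directory would be the Sybase bin directory that contains isql64:

```shell
bindir=$(mktemp -d)                       # stand-in for the Sybase bin directory
touch "$bindir/isql64"                    # stand-in for the 64-bit isql binary
ln -s "$bindir/isql64" "$bindir/isql"     # give the ARK the 'isql' name it expects
readlink "$bindir/isql"
```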

 

WebSphere MQ Recovery Kit

Description

Error when lksupport command is executed:

The following error can be output when the lksupport command is executed if an MQ queue manager protected by the MQ Recovery Kit is located on a disk shared by NFS.

cat: <PATH>/mqm/qmgrs/tkqmgr/qm.ini: Operation not permitted

This happens because root access to the NFS area is prohibited. This error output does not cause any problems.

  • quickCheck fails if the queue has long messages: if a message of size > 101 characters is in the test queue, the put/get fails and the queue fills up.
  • Install fails if the only installed MQ is a relocated install (non-standard and not likely).
  • Package install fails if the MQ package does not have the default name.
  • Compiling samples fails if the software is not installed under /opt/mqm.
  • If two listeners are defined for a single instance, with one set to manual and the other to automatic, failures can occur in create and quickCheck.

 

Quick Service Protection (QSP) Recovery Kit

Description

An unexpected failover may occur when the operating system is shutting down.

Workaround: Stop LifeKeeper with lkstop before stopping the OS. Alternatively, take all resources out of service, confirm that no resources remain in the ISP state, and then shut down the OS.
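The first workaround is a simple ordering of commands; the paths below assume the default /opt/LifeKeeper install location:

```
/opt/LifeKeeper/bin/lkstop    # stop LifeKeeper cleanly first
shutdown -h now               # then shut down the OS
```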


© 2017 SIOS Technology Corp., the industry's leading provider of business continuity solutions, data replication for continuous data protection.