LifeKeeper Core – Known Issues / Restrictions - LifeKeeper for Linux LIVE

Description

File system labels should not be used in large configurations

The use of file system labels can cause performance problems during boot-up with large clusters. These problems generally arise because using labels requires scanning every device connected to the system. For systems connected to a SAN, especially those where LifeKeeper blocks access to some devices, this scanning can be very slow.

To avoid this performance problem on Red Hat systems, edit /etc/fstab and replace the labels with the path names.
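For many entries, the substitution can be scripted. The following is a minimal Python sketch, not a SIOS tool. It assumes you already have a mapping from each label to its device path (for example, built by resolving the symlinks under /dev/disk/by-label with os.path.realpath), and the function name replace_fstab_labels is hypothetical. It rewrites only the first field of entries that start with LABEL= and leaves every other line untouched.

```python
def replace_fstab_labels(fstab_text, label_to_device):
    """Replace LABEL=<name> in the first field of fstab entries with a device path."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        # Only entries whose device field is LABEL=... are rewritten;
        # comments, blank lines, and other entries are kept verbatim.
        if fields and fields[0].startswith("LABEL="):
            label = fields[0][len("LABEL="):]
            if label not in label_to_device:
                raise KeyError(f"no device path known for label {label!r}")
            line = line.replace(fields[0], label_to_device[label], 1)
        out.append(line)
    return "\n".join(out) + "\n"
```

Keep a copy of the original /etc/fstab and review the rewritten text before replacing the file.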

lkscsid halts the system instead of issuing a sendevent when a disk fails in certain environments

When lkscsid detects a disk failure, it should, by default, issue a sendevent to LifeKeeper to recover from the failure. The sendevent first attempts to recover locally; if that fails, it attempts recovery by switching the hierarchy containing the disk over to another server. On some versions of Linux (RHEL 5 and SLES11), lkscsid cannot issue the sendevent and instead halts the system immediately. This only affects hierarchies that use SCSI device nodes such as /dev/sda in a shared storage configuration.
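To judge whether a hierarchy is exposed to this issue, you can check whether its devices are raw SCSI disk nodes. This hypothetical Python helper assumes such nodes follow the usual /dev/sdX or /dev/sdXN naming; persistent names under /dev/disk/by-id and device-mapper paths do not match it.

```python
import re

# Assumed naming for raw SCSI disk nodes, e.g. /dev/sda, /dev/sdb1, /dev/sdaa3.
SCSI_NODE = re.compile(r"/dev/sd[a-z]+[0-9]*")

def uses_scsi_device_node(device_path):
    """Return True if device_path looks like a raw SCSI disk node."""
    return SCSI_NODE.fullmatch(device_path) is not None
```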

DataKeeper Create Resource fails

When using DataKeeper in certain environments (e.g., virtualized environments with IDE disk emulation, or servers with HP CCISS storage), an error may occur when a mirror is created:

ERROR 104052: Cannot get the hardware ID of the device “dev/hda3”

This is because LifeKeeper does not recognize the disk in question and cannot get a unique ID to associate with the device.

Workaround: Use a GUID partition so that LifeKeeper can recognize the disk in question. Otherwise, add a pattern for the disk(s) to the DEVNAME device_pattern file, for example:

# cat /opt/LifeKeeper/subsys/scsi/resources/DEVNAME/device_pattern

/dev/hda*
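The example pattern /dev/hda* looks like a shell-style glob. Assuming the device_pattern file holds one such glob per line (an assumption based on the example above, not on documented syntax), a device can be checked against it with Python's fnmatch module. The function name is hypothetical. Note that fnmatch's * also matches /, unlike shell globbing of file names.

```python
from fnmatch import fnmatchcase

def matches_device_pattern(device, pattern_text):
    """Return True if device matches any non-empty line of pattern_text."""
    patterns = [line.strip() for line in pattern_text.splitlines() if line.strip()]
    return any(fnmatchcase(device, pattern) for pattern in patterns)
```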

Specifying hostnames for API access

The key name used to store LifeKeeper server credentials must match the hostname of the other LifeKeeper server exactly (as displayed by the hostname command on that server). If the hostname is an FQDN, then the credential key must also be the FQDN. If the hostname is a short name, then the key must also be the short name.

Workaround: Make sure that the hostname(s) stored by credstore match the hostname exactly.
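A quick consistency check can be sketched in Python. The helper below is hypothetical: it compares a stored credential key with the output of the hostname command on the other server, and distinguishes a short name versus FQDN mismatch from a completely different name. The comparison is an exact string match, as required above.

```python
def check_credential_key(key, peer_hostname):
    """Compare a stored credential key with the peer's `hostname` output."""
    if key == peer_hostname:
        return "ok"
    # Same host written differently, e.g. "node2" versus "node2.example.com".
    if key.split(".")[0] == peer_hostname.split(".")[0]:
        return "short name / FQDN mismatch"
    return "mismatch"
```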

The use of lkbackup taken from versions of LifeKeeper prior to 8.3.2 requires manually updating /etc/default/LifeKeeper when restored on a later version

The current version of LifeKeeper/SPS includes significant enhancements to logging and other major core components. These enhancements affect the tunables in the /etc/default/LifeKeeper file. When an lkbackup created with an older version of LifeKeeper/SPS is restored on a newer version, these tunables no longer have the correct values, which causes a conflict.

Solution: Prior to restoring from an lkbackup, save /etc/default/LifeKeeper. After restoring from the lkbackup, merge in the tunable values as listed below.

When using lkbackup taken from versions previous to 8.0.0 and restoring on 8.0.0 or later:

LKSYSLOGTAG=LifeKeeper

LKSYSLOGSELECTOR=local6

Also, when using lkbackup taken from versions previous to 9.0.1 and restoring on 9.0.1 or later:

PATH=/opt/LifeKeeper/bin:/usr/java/jre1.8.0_51/bin:/usr/java/bin:/usr/java/jdk1.8.0_51/bin:/bin:/usr/bin:/usr/sbin:/sbin

See Logging With syslog for more information.
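The merge step can be sketched as a small Python function. This is an illustration under assumptions, not a SIOS utility: it treats /etc/default/LifeKeeper as simple KEY=VALUE lines, replaces every uncommented assignment to a listed key, and appends keys that are missing. Lines written as export KEY=VALUE are not handled.

```python
def merge_tunables(config_text, tunables):
    """Set KEY=VALUE tunables in config text, appending keys that are missing."""
    out = []
    written = set()
    for line in config_text.splitlines():
        stripped = line.strip()
        key = stripped.split("=", 1)[0]
        # Replace every uncommented assignment to a key we want to set.
        if "=" in stripped and not stripped.startswith("#") and key in tunables:
            out.append(f"{key}={tunables[key]}")
            written.add(key)
        else:
            out.append(line)
    # Keys not found in the restored file are appended at the end.
    out.extend(f"{k}={v}" for k, v in tunables.items() if k not in written)
    return "\n".join(out) + "\n"
```

For example, merge_tunables(text, {"LKSYSLOGTAG": "LifeKeeper", "LKSYSLOGSELECTOR": "local6"}) covers the 8.0.0 case. Pass only the values that apply to the versions involved, and review the merged text before writing it back.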

Restore of an lkbackup after a resource has been created may leave broken equivalencies

The configuration files for created resources are saved during an lkbackup. If a resource is created for the first time after an lkbackup has been taken, that resource may not be properly accounted for when restoring from this previous backup.

Solution: Restore from lkbackup before adding a new resource for the first time. If a new resource has been added since the lkbackup was taken, either delete it before performing the restore, or delete an instance of the resource hierarchy and then re-extend the hierarchy after the restore. Note: It is recommended to run lkbackup whenever a resource of a particular type is created for the first time.
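Because the risk comes from a resource type that did not exist when the backup was taken, the check amounts to a set difference. In this hypothetical Python sketch, backup_types and current_resources are inputs you would gather yourself (for example, from the backup contents and the current resource list); the type names in the test data are illustrative only.

```python
def resources_at_risk(backup_types, current_resources):
    """Return tags of resources whose type was absent when the lkbackup was taken.

    backup_types: set of resource types present in the backup.
    current_resources: iterable of (tag, resource_type) pairs on the node now.
    """
    return sorted(tag for tag, rtype in current_resources if rtype not in backup_types)
```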

Resources removed in the wrong order during failover

In cases where a hierarchy shares a common resource instance with another root hierarchy, resources are sometimes removed in the wrong order during a cascading failover or resource failover.

Solution: Creating a common root ensures that resources in the hierarchy are removed from the top down:

- Create a gen/app resource that always succeeds on restore and remove.
- Make all current roots children of this new gen/app resource.

Note: Using /bin/true as the restore and remove scripts accomplishes this.
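Why a single root helps can be seen from the ordering itself. The Python sketch below is an illustration, not LifeKeeper's implementation: given a hypothetical parent-to-children map, it applies Kahn's topological sort from one root, so a shared child resource is removed only after every parent that depends on it.

```python
from collections import deque

def removal_order(children, root):
    """Return resources reachable from root, each listed after all of its parents."""
    # Collect every resource reachable from the root.
    reachable, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node not in reachable:
            reachable.add(node)
            stack.extend(children.get(node, []))
    # Count the parents of each reachable resource.
    pending = {node: 0 for node in reachable}
    for node in reachable:
        for child in children.get(node, []):
            pending[child] += 1
    # Kahn's algorithm: a resource becomes removable once all its parents are gone.
    ready, order = deque([root]), []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children.get(node, []):
            pending[child] -= 1
            if pending[child] == 0:
                ready.append(child)
    return order
```

With the common gen/app root as the single entry point, the shared file system resource in the test data is ordered after both applications that depend on it.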

LifeKeeper syslog EMERG severity messages do not display to a SLES11 host’s console which has AppArmor enabled

LifeKeeper accesses /var/run/utmp, which the SLES11 AppArmor syslog-ng profile does not allow.

Solution: To allow LifeKeeper syslog EMERG severity messages to appear on a SLES11 console with AppArmor enabled, add the following entry to /etc/apparmor.d/sbin.syslog-ng:

/var/run/utmp kr,
After adding the entry to sbin.syslog-ng, you can reload the AppArmor profile without rebooting:

apparmor_parser -r /etc/apparmor.d/sbin.syslog-ng
Verify that the AppArmor update was successful by sending an EMERG syslog entry via:

logger -p local6.emerg "This is a syslog/lk/apparmor test."
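If you prefer to script the profile change, the following Python sketch inserts the rule only when it is not already present. It assumes the profile's own closing brace is the last line consisting solely of } in the file, and the function name is hypothetical. AppArmor file rules are terminated with a comma, so the default rule includes one.

```python
def add_apparmor_rule(profile_text, rule="/var/run/utmp kr,"):
    """Insert rule before the profile's final closing brace unless already present."""
    lines = profile_text.splitlines()
    if any(line.strip() == rule for line in lines):
        return profile_text  # already present; leave the profile unchanged
    # Assumption: the last line that is just "}" closes the syslog-ng profile.
    for i in range(len(lines) - 1, -1, -1):
        if lines[i].strip() == "}":
            lines.insert(i, "  " + rule)
            return "\n".join(lines) + "\n"
    raise ValueError("no closing brace found in profile")
```

After writing the result back, reload the profile with apparmor_parser -r.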

RHEL 6.0 is NOT Recommended

SIOS strongly discourages the use of RHEL 6.0. If RHEL 6.0 is used, please understand that an OS update may be required to fix certain issues including, but not limited to:

- DMMP fails to recover from a cable pull with EMC CLARiiON (fixed in RHEL 6.1)
- md recovery process hangs (fixed in the first update kernel of RHEL 6.1)

Note: In DataKeeper configurations, if the operating system is updated, a reinstall/upgrade of SIOS Protection Suite for Linux is required.

Delete of nested file system hierarchy generates “Object does not exist” message

Solution: This message can be disregarded as it does not create any issues.

filesyshier returns the wrong tag on a nested mount create

When a database has nested file system resources, the file system kit will create the file system for both the parent and the nested child. However, filesyshier returns only the child tag. This causes the application to create a dependency on the child but not the parent.

Solution: When multiple file systems are nested within a single mount point, it may be necessary to manually create the additional dependencies to the parent application tag using dep_create or via the UI Create Dependency.
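To find which parent file systems need the extra dependency, you can map each nested mount point to the closest mount point that contains it. This is a hypothetical Python helper that works on plain path strings; it does not query LifeKeeper or the live mount table.

```python
import posixpath

def enclosing_mounts(mount_points):
    """Map each nested mount point to the closest mount point that contains it."""
    mounts = {posixpath.normpath(m) for m in mount_points}
    result = {}
    for mount in sorted(mounts):
        # A parent is any other mount whose path is a directory prefix of this one.
        parents = [p for p in mounts
                   if p != mount and mount.startswith(p.rstrip("/") + "/")]
        if parents:
            result[mount] = max(parents, key=len)  # longest prefix = closest parent
    return result
```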

DataKeeper: Nested file system create will fail with DataKeeper

When creating a DataKeeper mirror to replicate an existing file system, any file system nested within it must be unmounted before the File System resource is created.

Workaround: Manually unmount the nested file systems, then remount and create each nested mount.
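Nested mounts have to be unmounted from the deepest level upward and remounted in the reverse order. A minimal Python sketch, with a hypothetical function name, that computes the unmount order for the mounts under the file system being mirrored:

```python
import posixpath

def nested_unmount_order(top, mount_points):
    """Return the mount points nested under top, deepest first."""
    top = posixpath.normpath(top)
    prefix = top.rstrip("/") + "/"
    nested = []
    for mount in mount_points:
        mount = posixpath.normpath(mount)
        if mount != top and mount.startswith(prefix):
            nested.append(mount)
    # A child mount must be unmounted before its parent, so deeper paths come first.
    return sorted(nested, key=lambda m: m.count("/"), reverse=True)
```

The remount order is simply list(reversed(order)).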

lkstart on SLES 11 SP2 generates insserv message

When lkstart is run on SLES 11 SP2, the following insserv message is generated:

insserv: Service syslog is missed in the runlevel 4 to use service steeleye-runit

By default, the LifeKeeper and steeleye-runit init scripts are configured to start in run level 4, but the syslog init script they depend on is not. If the system run level is changed to 4, syslog will be terminated and LifeKeeper will be unable to log.

Solution: Make sure the system run level is NOT changed to 4.

Changing the mount point of a device protected by a File System resource may lead to data corruption

The mount point of a device protected by LifeKeeper through a File System resource (filesys) must not be changed. Doing so may result in the device being mounted on multiple nodes when a switchover is performed, which could lead to data corruption.

XFS file system usage may cause quickCheck to fail.

When the CHECK_FS_QUOTAS setting is enabled for LifeKeeper on Red Hat Enterprise Linux 7 / Oracle Linux 7 / CentOS 7, quickCheck fails if the uquota or gquota mount option is set on the protected XFS file system.

Solution: Use usrquota and grpquota instead of uquota and gquota as mount options for the XFS file system, or disable the CHECK_FS_QUOTAS setting.
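The option rewrite can be sketched in Python. This hypothetical helper takes the comma-separated option field of an /etc/fstab entry and maps uquota to usrquota and gquota to grpquota; the XFS mount documentation lists each pair as equivalent spellings, so only the option name changes.

```python
# Short XFS quota option names and their equivalent long forms.
QUOTA_OPTION_ALIASES = {"uquota": "usrquota", "gquota": "grpquota"}

def normalize_xfs_quota_options(options):
    """Rewrite a comma-separated mount option string such as 'defaults,uquota'."""
    return ",".join(QUOTA_OPTION_ALIASES.get(opt, opt) for opt in options.split(","))
```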

Btrfs is not supported

Btrfs cannot be used for LifeKeeper files (/opt/LifeKeeper), bitmap files if they are not in /opt/LifeKeeper, lkbackup files, or any other LifeKeeper-related files. In addition, LifeKeeper does not support protecting Btrfs within a resource hierarchy.
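To confirm that none of these locations sit on Btrfs, you can look up the file system type of the mount that contains each path. The Python sketch below is hypothetical and parses text in /proc/mounts format (device, mount point, type, and so on); it ignores the octal escaping that /proc/mounts uses for spaces in paths.

```python
import posixpath

def fs_type_for(path, mounts_text):
    """Return the file system type of the mount that contains path, or None."""
    path = posixpath.normpath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount, fstype = fields[1], fields[2]
        # The longest mount point that is a directory prefix of path wins.
        inside = path == mount or path.startswith(mount.rstrip("/") + "/")
        if inside and len(mount) > len(best_mount):
            best_mount, best_type = mount, fstype
    return best_type
```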

SLES12 SP1 or later on AWS

The following restrictions apply with SLES12 SP1 or later on AWS:

- Cannot set static routing configuration
  Automatic IP address configuration via DHCP does not work if a static routing configuration is set in /etc/sysconfig/network/routes. This causes the network not to start correctly.
  Solution: Update the routing information in the configuration file by modifying the “ROUTE” parameter in /etc/sysconfig/network/ifroute-ethX
- Hostname is changed even if the “Change Hostname via DHCP” setting is disabled.
  The LK service does not work properly if the hostname is rewritten. In SLES12 SP1 or later on AWS, the hostname is changed even after the “Change Hostname via DHCP” setting is disabled.
  Solution:
  - Update /etc/cloud/cloud.cfg to comment out the “update_hostname” parameter
  - Update /etc/cloud/cloud.cfg to set the preserve_hostname parameter to “true”
  - Update /etc/sysconfig/network/dhcp to set the DHCLIENT_SET_HOSTNAME parameter to “no”
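The two cloud.cfg changes can be scripted. This hypothetical Python sketch assumes the common cloud-init layout, where update_hostname appears as a list item (- update_hostname) under a module list and preserve_hostname is a top-level key; check your file before relying on it. The DHCLIENT_SET_HOSTNAME change in /etc/sysconfig/network/dhcp is not covered.

```python
def patch_cloud_cfg(text):
    """Comment out the update_hostname module and set preserve_hostname: true."""
    out = []
    found_preserve = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "- update_hostname":
            out.append("# " + line)  # disable the module without deleting the line
        elif stripped.startswith("preserve_hostname:"):
            out.append("preserve_hostname: true")  # assumes a top-level key
            found_preserve = True
        else:
            out.append(line)
    if not found_preserve:
        out.append("preserve_hostname: true")  # add the key if the file lacks it
    return "\n".join(out) + "\n"
```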

Shutdown Strategy set to “Switchover Resources” may fail when using Quorum/Witness Kit in Witness mode

Hierarchy switchover during LifeKeeper shutdown may fail to occur when using the Quorum/Witness Kit in Witness mode.

Workaround: Manually switch over resource hierarchies before shutdown.

Edit /etc/services

If the following entry in /etc/services is deleted, LifeKeeper cannot start:

lcm_server 7365/tcp

Don’t delete this entry when editing the file.
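A simple check that the entry is still present can be sketched in Python. The helper is hypothetical; it strips comments, splits each line on whitespace, and looks for lcm_server with 7365/tcp in the first two fields, matching the entry shown above. On a live system, socket.getservbyname("lcm_server", "tcp") should also return 7365 when the entry exists.

```python
def has_service_entry(services_text, name="lcm_server", port_proto="7365/tcp"):
    """Return True if services_text (services-file format) maps name to port_proto."""
    for line in services_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2 and fields[0] == name and fields[1] == port_proto:
            return True
    return False
```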

LifeKeeper cannot protect any storage unit that returns a SCSI ID containing a space.

Automatic network recovery may fail on RHEL 6.x systems using bonded interfaces when the link status goes down

On RHEL 6.x, a network that uses bonded interfaces does not recover automatically when the link status is lost and then restored. Link loss can be caused by a bad network cable, a bad switch or hub, a cable reconnection, or running ifdown followed by ifup. To recover, restart the network manually by running the following command as root:

# service network restart

This problem has already been corrected in RHEL 7, and it does not occur with SLES.

Using bind mounts is not supported

Bind mounts (mount --bind) cannot be used for file systems protected by LifeKeeper.
