Description
LifeKeeper fails to start with the message, “Incorrect SELinux configuration (<setting>). LifeKeeper can not startup.”

Here <setting> names the SELinux boolean and its incorrect value. For example, when the SELinux mode is set to enforcing, the boolean mmap_low_allowed must be set to on for LifeKeeper to start. If it is not, the message will be, “Incorrect SELinux configuration (mmap_low_allowed=off). LifeKeeper can not startup.” To correct the setting, use the setsebool utility:

setsebool -P mmap_low_allowed=on
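Before changing the boolean, it can help to confirm its current state. The following is a minimal sketch, assuming the usual getsebool output format of "name --> value". The helper name sebool_value is hypothetical and is not part of LifeKeeper.

```shell
#!/bin/sh
# sebool_value: read one line of getsebool output on stdin, for example
# "mmap_low_allowed --> off", and print its last field ("on" or "off").
# Hypothetical helper for illustration only.
sebool_value() {
    awk '{ print $NF }'
}

# Example use on a live system (assumes getsebool and setsebool are installed):
#   if [ "$(getsebool mmap_low_allowed | sebool_value)" != "on" ]; then
#       setsebool -P mmap_low_allowed=on
#   fi
```

The example use is left as a comment so that nothing on the system is changed unless it is run deliberately.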
Uninstalling a recovery kit may not remove the corresponding resource type from the LifeKeeper Configuration Database

Symptom: Resource types may continue to appear in the output of /opt/LifeKeeper/bin/typ_list after the corresponding recovery kit has been uninstalled.

Solution: Restart LifeKeeper in order to update the list of available resource types.
The following message will be displayed when setup is run if SELinux is set to “enforcing” and the SELinux parameter “mmap_low_allowed” is “off”.

“SELinux appears to be set to Enforcing.

SELinux configuration changes are required to run LifeKeeper in Enforcing mode. If you continue setup, mmap_low_allowed will be changed in the SELinux configuration. If you exit setup, no changes will be made.”

If <Continue> is selected then “mmap_low_allowed” will be set to “on” and installation will continue. If <Exit> is selected then “mmap_low_allowed” will be left unchanged and the install of LifeKeeper will be aborted.
The following message will be displayed when setup is run to install LifeKeeper if SELinux is set to “permissive” and the SELinux parameter “mmap_low_allowed” is “off”.

“SELinux appears to be set to Permissive.

SELinux configuration changes are required to run LifeKeeper in Permissive mode. If you continue setup, mmap_low_allowed will be changed in the SELinux configuration. If you exit setup, no changes will be made.”

If <Continue> is selected then “mmap_low_allowed” will be set to “on” and installation will continue. If <Exit> is selected then “mmap_low_allowed” will be left unchanged and the install of LifeKeeper will be aborted.
The following message will be displayed when setup is run to upgrade LifeKeeper if SELinux is set to “permissive” and the SELinux parameter “mmap_low_allowed” is “off”.

“SELinux appears to be set to Permissive.

SELinux configuration changes are required to run LifeKeeper in Permissive mode. The SELinux configuration should have mmap_low_allowed enabled to avoid errors being logged.”

No configuration changes are made. To prevent SELinux error messages from being logged when LifeKeeper programs run, mmap_low_allowed must be set to “on”. This can be done by running:

setsebool -P mmap_low_allowed=on
Use caution when removing packages with package management tools such as dnf, yum, or zypper.

If command line arguments that automatically remove dependent packages are specified it can lead to the removal of packages that are still required for normal operation of LifeKeeper. Specifying arguments to remove dependent packages will remove those packages and any packages that depend on them and continue to remove packages until all dependencies have been addressed. Before doing any removal, dependencies should be checked to ensure that all direct as well as indirect dependencies are known. Failure to perform these checks can result in the removal of LifeKeeper packages and files.

For example:
On some Red Hat distributions the redhat-lsb package requires the cups package. If the package management utility “dnf” is used with instructions to remove dependent packages and “cups” is specified for removal, the redhat-lsb package and any packages that depend on it are removed as well. In this case, LifeKeeper packages are among those removed to resolve the additional dependencies. This adversely impacts the operation of LifeKeeper and requires re-installation and recreation of resource hierarchies.
Antivirus Software – Exclusions for LifeKeeper for Linux.

CrowdStrike

CrowdStrike may interfere with the nbd processes, preventing the mirror from reaching and staying in sync.

Recommended Action

Add the following path exclusions for LifeKeeper:
/usr/local/bin/nbd-client
/usr/local/bin/nbd-server
Antivirus Software – Exclusions for LifeKeeper for Linux.
Note: SIOS tested using SentinelOne.

If LifeKeeper for Linux is used in an environment where anti-virus software is running, LifeKeeper may not operate properly.

Recommended Action

Add the following path exclusions for LifeKeeper:
/opt/LifeKeeper/*
/var/log/lifekeeper*
/etc/default/LifeKeeper
/var/log/messages
/tmp/*

If using SAP HANA, add the SAP HANA log directory location.

Add the following path exclusions for DataKeeper:
/usr/local/bin/nbd-client
/usr/local/bin/nbd-server
On systems where LifeKeeper installs an NBD module, include that module in the exclusions as well.

Optional path exclusions:
/var/LifeKeeper
Increasing the time allowed for a umount or decreasing the time before a force umount is done

Customers sometimes need to modify tunable values, either to allow more time for a umount or to shorten the time before a force umount is done. Three tunables contribute to this timer:

FS_UMOUNT_RETRIES (defaults to 1)
FS_KERNEL_RETRIES (defaults to 60); this is the recommended tunable to change when adjusting the timing
FS_UMOUNT_TIMEOUT (defaults to 30)

The customer can lower the maximum execution time of the forceumount script through manipulation of the FS_KERNEL_RETRIES tunable.

Lowering the value by one will save three seconds of script execution time.

FS_UMOUNT_TIMEOUT (defaults to 30) must be set to a value greater than Maximum execution time.

Maximum execution time = ($FS_KERNEL_RETRIES * 3) + 3*($FS_UMOUNT_RETRIES + 3)
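The formula above can be evaluated with a small shell function. This is a minimal sketch: the function name max_execution_time is hypothetical, and the arguments are simply the values of FS_KERNEL_RETRIES and FS_UMOUNT_RETRIES passed in by hand.

```shell
#!/bin/sh
# max_execution_time KERNEL_RETRIES UMOUNT_RETRIES
# Prints (KERNEL_RETRIES * 3) + 3 * (UMOUNT_RETRIES + 3), in seconds.
# Hypothetical helper for illustration only.
max_execution_time() {
    kernel_retries=$1
    umount_retries=$2
    echo $(( kernel_retries * 3 + 3 * (umount_retries + 3) ))
}

max_execution_time 60 1   # documented defaults: prints 192
max_execution_time 50 1   # ten fewer kernel retries: prints 162
```

The second call shows the three-seconds-per-retry effect: lowering FS_KERNEL_RETRIES by ten shortens the result by thirty seconds.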
Cannot failover when using VEEAM Backup in conjunction with LifeKeeper for Linux

Workaround:

A gen app resource must be created to remove the ‘veeamsnap’ kernel module. While the veeamsnap kernel module is loaded, it prevents LifeKeeper from stopping the mirror via ‘mdadm --stop’ because the mirror still appears to be “in use”. The gen app resource for Veeam needs two scripts:
  • The first is a restore script that simply does an ‘exit 0’ and is used to bring the new gen app resource in service.
  • The second is a remove script, which takes the resource out of service and unloads the ‘veeamsnap’ kernel module via the command ‘rmmod veeamsnap’.
    Once the ‘veeamsnap’ module is unloaded, the ‘mdadm --stop’ command can stop the mirror and the DataKeeper resource can be removed successfully. See the Cannot failover when using VEEAM Backup in conjunction with LifeKeeper for Linux Solution for more information.
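The remove script described above could be sketched as follows. This is an illustration under stated assumptions, not the script shipped or recommended by SIOS: the function names are hypothetical, the module list is read from /proc/modules, and the unload command can be overridden so that the logic can be tried without touching the kernel.

```shell
#!/bin/sh
# module_loaded NAME [MODULES_FILE]
# Succeeds if NAME appears as a loaded module. MODULES_FILE defaults to
# /proc/modules, where each line starts with the module name and a space.
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

# unload_if_loaded NAME [MODULES_FILE] [UNLOAD_CMD]
# Runs UNLOAD_CMD (default: rmmod) on NAME only when the module is loaded.
# Hypothetical helper; pass "echo" as UNLOAD_CMD for a dry run.
unload_if_loaded() {
    name=$1
    modules_file=${2:-/proc/modules}
    unload_cmd=${3:-rmmod}
    if module_loaded "$name" "$modules_file"; then
        "$unload_cmd" "$name"
    fi
}

# In an actual remove script the body would be along these lines:
#   unload_if_loaded veeamsnap || exit 1
#   exit 0
```

Checking whether the module is loaded first lets the remove script succeed when veeamsnap has already been unloaded. The matching restore script is just ‘exit 0’, as described above.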
Automatic switchback may not be successful after quorum loss when using the ‘fastboot’ quorum loss action

Solution: When using the ‘fastboot’ quorum loss action (QUORUM_LOSS_ACTION=fastboot), a server is forcefully rebooted when it loses quorum (see Quorum/Witness for more details). When the server comes back online, resource hierarchies configured to use automatic switchback may not be automatically put back in-service on the higher priority server by LifeKeeper. In this case, the user must manually bring the resource hierarchies back in-service on the higher priority server.

New or Deprecated Mount Options After a Kernel Upgrade

When upgrading the Linux kernel, it is possible that some existing file system mount options may be deprecated in the new kernel or that the new kernel may add new default mount options to existing mounts. For example, the “nobarrier” mount option was deprecated in Red Hat Enterprise Linux 8, and some kernel versions have added new default mount options such as “logbufs=8” and “logbsize=32k”.

If a LifeKeeper-protected file system resource contains mount options which become deprecated after a kernel upgrade, the deprecated options should be removed from the list of mount options for the LifeKeeper resource on every server in the cluster. See the Modifying Mount Options for a LifeKeeper File System Resource section for more details.

If new default mount options are added by the kernel to an existing LifeKeeper-protected mount point after a kernel upgrade, then the new options should be added to the list of mount options for the LifeKeeper resource on every server in the cluster. See the Modifying Mount Options for a LifeKeeper File System Resource section for more details.

If you set your shutdown strategy to “Do not Switchover Resources” (default), do not start LifeKeeper immediately after stopping it. If the time between stopping and starting LifeKeeper is too short, a split brain may occur. This is especially important for Quorum configurations in storage mode.

Conflicts between LifeKeeper’s stop and start processes can cause a split brain. Allow a few seconds between stopping and starting LifeKeeper. Since Quorum configurations in storage mode take longer to stop, you need to allow more time than QWK_STORAGE_HBEATTIME * QWK_STORAGE_NUMHBEATS (24 seconds by default).
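The wait time for storage mode can be worked out from the two tunables. The sketch below is hypothetical: the function name is made up, and the values 6 and 4 are illustrative inputs chosen only because their product matches the documented 24-second default. Read the actual values from your own configuration.

```shell
#!/bin/sh
# min_restart_wait HBEATTIME NUMHBEATS
# Prints QWK_STORAGE_HBEATTIME * QWK_STORAGE_NUMHBEATS in seconds.
# The time allowed between stopping and starting LifeKeeper should be
# longer than this value. Hypothetical helper for illustration only.
min_restart_wait() {
    echo $(( $1 * $2 ))
}

min_restart_wait 6 4   # illustrative values: prints 24
```

In practice, add a margin on top of the printed value, since the text above calls for more time than the product itself.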

If there is a problem with a network connection, stop the service that automatically configures the network

In an environment where IP addresses are protected using LifeKeeper, IP resources may conflict with daemons and services that automatically configure the network, such as avahi-daemon. If there is a problem when restoring communication paths or starting IP resources, stop the services that automatically configure the network.

Do not disconnect the network using the ifconfig down or the ip link down command

When a network interface is disconnected using the ifconfig down or ip link down command, a communication path may not be restored after reconnecting, if a virtual IP resource is configured on the interface.

LifeKeeper does not start with systemd target set to multi-user

In order for LifeKeeper to function properly, when running systemctl set-default or systemctl isolate, you must use the lifekeeper-graphical.target (for graphical mode) or lifekeeper-multi-user.target (for console mode). Do not use the normal graphical.target and multi-user.target systemd targets.
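A quick way to check the configured default target is sketched below. The function name is hypothetical; the two target names are the ones given above.

```shell
#!/bin/sh
# is_lifekeeper_target TARGET
# Succeeds only if TARGET is one of the two LifeKeeper systemd targets.
# Hypothetical helper for illustration only.
is_lifekeeper_target() {
    case "$1" in
        lifekeeper-graphical.target|lifekeeper-multi-user.target) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use on a live system (assumes systemctl is available):
#   is_lifekeeper_target "$(systemctl get-default)" ||
#       echo "default target is not a LifeKeeper target" >&2
```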

DataKeeper Disk UUID Restriction

Starting in version 9.5.0, DataKeeper can no longer mirror disks that do not present a UUID to the operating system. The best way to mirror such a disk is to partition it with a GPT (GUID Partition Table). The “parted” tool can be used for this purpose. Caution: partitioning a disk will destroy any data that is already stored on the disk.

Workaround: See DataKeeper for Linux Troubleshooting
On SLES 15, LifeKeeper logging may not appear in the LifeKeeper log file following a log rotation

If logrotate is run on the command line or if a background log rotation occurs due to the size of the log, LifeKeeper will stop logging.

Workaround: Run systemctl reload rsyslog to resume LifeKeeper logging.

File system labels should not be used in large configurations

The use of file system labels can cause performance problems during boot-up with large clusters. The problems generally arise because using labels requires that all devices connected to a system be scanned. For systems connected to a SAN, especially those with LifeKeeper where accessing a device is blocked, this scanning can be very slow.

To avoid this performance problem on Red Hat systems, edit /etc/fstab and replace the labels with the path names.
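To find the entries that need editing, the label-based lines in /etc/fstab can be listed first. This is a minimal sketch with a hypothetical function name. It only reads the file and leaves the replacement itself to the administrator.

```shell
#!/bin/sh
# list_label_entries [FSTAB_FILE]
# Prints the non-comment lines whose first field starts with LABEL=.
# FSTAB_FILE defaults to /etc/fstab. Returns non-zero when none are found.
# Hypothetical helper for illustration only.
list_label_entries() {
    grep -E '^[[:space:]]*LABEL=' "${1:-/etc/fstab}"
}

# Example use:
#   list_label_entries            # inspect /etc/fstab
#   list_label_entries /tmp/copy  # inspect a copy of the file
```

On systems with util-linux, running findfs with an argument of the form LABEL=name prints the device path that the label currently resolves to, which can then be written into /etc/fstab in place of the label.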

lkscsid will halt the system when it should issue a sendevent when a disk fails in certain environments

When lkscsid detects a disk failure, it should, by default, issue a sendevent to LifeKeeper to recover from the failure. The sendevent will first try to recover the failure locally and if that fails, will try to recover the failure by switching the hierarchy with the disk to another server. On some versions of Linux (RHEL 5 and SLES11), lkscsid will not be able to issue the sendevent but instead will immediately halt the system. This only affects hierarchies using the SCSI device nodes such as /dev/sda in a shared storage configuration.

DataKeeper Create Resource fails

When using DataKeeper in certain environments (e.g., virtualized environments with IDE disk emulation, or servers with HP CCISS storage), an error may occur when a mirror is created:


ERROR 104052: Cannot get the hardware ID of the device “/dev/hda3”

This is because LifeKeeper does not recognize the disk in question and cannot get a unique ID to associate with the device.

Workaround: Partition the disk with a GUID Partition Table (GPT) so that LifeKeeper can recognize the disk in question.
Specifying hostnames for API access

The key name used to store LifeKeeper server credentials must match the hostname of the other LifeKeeper server exactly (as displayed by the hostname command on that server). If the hostname is an FQDN, then the credential key must also be the FQDN. If the hostname is a short name, then the key must also be the short name.

Workaround: Make sure that the hostname(s) stored by credstore match the hostname exactly.

Restore of an lkbackup after a resource has been created may leave broken equivalencies

The configuration files for created resources are saved during an lkbackup. If a resource is created for the first time after an lkbackup has been taken, that resource may not be properly accounted for when restoring from this previous backup.

Solution: Restore from lkbackup prior to adding a new resource for the first time. If a new resource has been added after an lkbackup, either delete it prior to performing the restore, or delete an instance of the resource hierarchy and re-extend the hierarchy after the restore. Note: It is recommended that an lkbackup be run when a resource of a particular type is created for the first time.

Resources removed in the wrong order during failover

In cases where a hierarchy shares a common resource instance with another root hierarchy, resources are sometimes removed in the wrong order during a cascading failover or resource failover.

Solution: Creating a common root will ensure that resource removals in the hierarchy occur from the top down.

  1. Create a gen/app that always succeeds on restore and remove.

  2. Make all current roots children of this new gen/app.



Note: Using /bin/true for the restore and remove script would accomplish this.
Delete of nested file system hierarchy generates “Object does not exist” message

Solution: This message can be disregarded as it does not create any issues.

filesyshier returns the wrong tag on a nested mount create

When a database has nested file system resources, the file system kit will create the file system for both the parent and the nested child. However, filesyshier returns only the child tag. This causes the application to create a dependency on the child but not the parent.

Solution: When multiple file systems are nested within a single mount point, it may be necessary to manually create the additional dependencies to the parent application tag using dep_create or via the UI Create Dependency.

DataKeeper: Nested file system create will fail with DataKeeper

When creating a DataKeeper mirror for replicating an existing file system, if a file system is nested within this structure, you must unmount it first before creating the File System resource.

Workaround: Manually unmount the nested file systems and remount / create each nested mount.

Changing the mount point of the device protected by Filesystem resource may lead to data corruption

The mount point of the device protected by LifeKeeper via the File System resource (filesys) must not be changed. Doing so may lead to the device being mounted on multiple nodes if a switchover is done, which could lead to data corruption.

XFS file system usage may cause quickCheck to fail.

When the CHECK_FS_QUOTAS setting is enabled for LifeKeeper installed on Red Hat Enterprise Linux 7 / Oracle Linux 7 / CentOS 7, quickCheck fails if the uquota or gquota mount option is set on the XFS file system resource to be protected.

Solution: Use usrquota and grpquota instead of uquota and gquota as mount options for the XFS file system, or disable the CHECK_FS_QUOTAS setting.
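The option substitution can be sketched as a small filter over a comma-separated mount option string. The function name is hypothetical, and the sketch only rewrites a string. Applying the result to the LifeKeeper resource is a separate step.

```shell
#!/bin/sh
# fix_xfs_quota_opts OPTIONS
# Prints OPTIONS with uquota replaced by usrquota and gquota by grpquota.
# Each option is matched as a whole, so other option names are left alone.
# Hypothetical helper for illustration only.
fix_xfs_quota_opts() {
    printf '%s\n' "$1" | tr ',' '\n' |
        sed -e 's/^uquota$/usrquota/' -e 's/^gquota$/grpquota/' |
        paste -s -d ',' -
}

fix_xfs_quota_opts 'rw,uquota,gquota'   # prints rw,usrquota,grpquota
```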

Btrfs is not supported

Btrfs (or any other file system not supported by LifeKeeper for Linux) cannot be used for LifeKeeper files (/opt/LifeKeeper), bitmap files if they are not in /opt/LifeKeeper, lkbackup files, or any other LifeKeeper related files. In addition, LifeKeeper does not support protecting Btrfs (or any other unsupported file system) within a resource hierarchy.

Solution: A simple workaround when /opt/LifeKeeper would otherwise reside on a Btrfs file system is to add a small disk to your instance, format that disk with ext4 or xfs, and mount the resulting file system as /opt/LifeKeeper.

  1. Create a small disk to be used for /opt/LifeKeeper
    • A minimum of 110MB is required for software installs
    • Note: In Azure, you can create a 1 GB data disk at a minimum.
    • Note: Additional ARKs and the number of mirrors may increase the total required space.

  2. Once the disk is added to the node and visible, partition the disk (or use lvm).
    • Example: gdisk /dev/sdb
    • Note: How to add a disk to your system is outside the scope of this KBA (contact your sysadmin for your environment)

  3. Format the partition with a supported filesystem (see http://docs.us.sios.com/spslinux/9.4.1/en/topic/sios-protection-suite-for-linux-release-notes).
    • Example: mkfs.ext4 /dev/sdb1 (where sdb1 was created in step 2)

  4. Add the newly created and formatted partition to /etc/fstab and set it to be automatically mounted on system boot.

  5. Mount the new partition as /opt/LifeKeeper
    • Example: mount /dev/sdb1 /opt/LifeKeeper
    • Verify that the file system is mounted

  6. Install LifeKeeper for Linux

  7. After the installation, edit /etc/fstab and add the entry, so the disk can be mounted on reboot.
    • Example: /dev/sdb1          /opt/LifeKeeper      ext4      defaults      0 0
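After editing /etc/fstab, the presence of the /opt/LifeKeeper entry can be checked with a short function. This is a minimal sketch with a hypothetical function name. It only inspects the file and does not mount anything.

```shell
#!/bin/sh
# fstab_has_mountpoint MOUNTPOINT [FSTAB_FILE]
# Succeeds if a non-comment line in FSTAB_FILE (default /etc/fstab) has
# MOUNTPOINT as its second field. Hypothetical helper for illustration only.
fstab_has_mountpoint() {
    awk -v mp="$1" '
        $1 !~ /^#/ && $2 == mp { found = 1 }
        END { exit !found }
    ' "${2:-/etc/fstab}"
}

# Example use:
#   fstab_has_mountpoint /opt/LifeKeeper || echo "no fstab entry yet" >&2
```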


SLES12 SP1 or later on AWS

The following restrictions apply with SLES12 SP1 or later on AWS:

• Cannot set static routing configuration
Automatic IP address configuration via DHCP does not work if a static routing configuration is set in /etc/sysconfig/network/routes. This causes the network not to start correctly.


Solution: Update the routing information in the configuration file by modifying the “ROUTE” parameter in /etc/sysconfig/network/ifroute-ethX


• Hostname is changed even if the “Change Hostname via DHCP” setting is disabled.
The LifeKeeper service does not work properly if the hostname is rewritten. In SLES12 SP1 or later on AWS, the hostname is changed even after the “Change Hostname via DHCP” setting is disabled.


Solution:

    • Update /etc/cloud/cloud.cfg to comment out the “update_hostname” parameter
    • Update /etc/cloud/cloud.cfg to set the preserve_hostname parameter to “true”
    • Update /etc/sysconfig/network/dhcp to set the DHCLIENT_SET_HOSTNAME parameter to “no”
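The three settings above can be verified with a read-only check such as the following. This is a sketch under assumptions: the function name is hypothetical, cloud.cfg is assumed to use the usual "key: value" form for preserve_hostname and a "- update_hostname" list item for the module, and the dhcp file is assumed to use a shell-style NAME="value" assignment.

```shell
#!/bin/sh
# hostname_preserved [CLOUD_CFG] [DHCP_CFG]
# Succeeds only if all three settings described above are in place.
# Hypothetical helper for illustration only; it does not modify any file.
hostname_preserved() {
    cloud_cfg=${1:-/etc/cloud/cloud.cfg}
    dhcp_cfg=${2:-/etc/sysconfig/network/dhcp}

    # preserve_hostname must be set to true.
    grep -Eq '^[[:space:]]*preserve_hostname:[[:space:]]*true' "$cloud_cfg" || return 1

    # No uncommented update_hostname list item may remain.
    if grep -Eq '^[[:space:]]*-[[:space:]]*update_hostname' "$cloud_cfg"; then
        return 1
    fi

    # DHCLIENT_SET_HOSTNAME must be "no" (quotes optional).
    grep -Eq '^DHCLIENT_SET_HOSTNAME="?no"?[[:space:]]*$' "$dhcp_cfg"
}

# Example use:
#   hostname_preserved || echo "hostname may still be rewritten" >&2
```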



Shutdown Strategy set to “Switchover Resources” may fail when using Quorum/Witness Kit in Witness mode

Hierarchy switchover during LifeKeeper shutdown may fail to occur when using the Quorum/Witness Kit in Witness mode.

Workaround: Manually switchover resource hierarchies before shutdown.

Edit /etc/services

If the following entry in /etc/services is deleted, LifeKeeper cannot start up.


lcm_server 7365/tcp


Do not delete this entry when editing the file.

NOTE: When LifeKeeper is installed it adds “lcm_server” (with an underscore) to /etc/services. This entry is normally used to determine the port number used by LCM (determined by the configuration in /etc/nsswitch.conf). There are also entries in /etc/services for “lcm-server” (with a dash). Those entries are NOT used by LCM.
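After editing /etc/services, a read-only check such as the following can confirm that the underscore entry is still present. The function name is hypothetical. The pattern matches only the “lcm_server” form described in the note, not the “lcm-server” entries.

```shell
#!/bin/sh
# has_lcm_server_entry [SERVICES_FILE]
# Succeeds if SERVICES_FILE (default /etc/services) contains a line that
# starts with "lcm_server" followed by whitespace and "7365/tcp".
# Hypothetical helper for illustration only.
has_lcm_server_entry() {
    grep -Eq '^lcm_server[[:space:]]+7365/tcp' "${1:-/etc/services}"
}

# Example use:
#   has_lcm_server_entry || echo "lcm_server entry is missing" >&2
```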
Any storage unit which returns a string including a space for the SCSI ID cannot be protected by LifeKeeper.
Using bind mounts is not supported


Bind mounts (mount --bind) cannot be used for the file system protected by LifeKeeper.

On SLES running on AWS or Azure, change the network interface configuration file in order to prevent a cloud network plug-in from removing the virtual IP address.

Invalid subdirectory cache of $id still exists

This may prevent $id from unmounting, causing the resource remove to fail. This is most likely caused by a known issue when mounting a subdirectory of an NFSv3 export.

Solution: Reconfigure the resources so that a subdirectory is not mounted. Only the directory that is exported should be mounted.
