Data Replication – Known Issues / Restrictions - LifeKeeper for Linux LIVE - 9.8.0

Description
A Linux kernel vulnerability may cause a kernel panic.
The following Linux kernel vulnerabilities may cause a kernel panic. The issue was identified during high-load testing in a development environment and does not occur frequently in production environments. For details, contact your OS vendor.
- CVE-2024-35979
  https://access.redhat.com/security/cve/cve-2024-35979
  https://www.suse.com/security/cve/CVE-2024-35979.html
- CVE-2024-49855
  https://access.redhat.com/security/cve/cve-2024-49855
  https://www.suse.com/security/cve/CVE-2024-49855.html
Solution: Use an OS version that is not affected by these vulnerabilities.

DataKeeper synchronization fails with certain kernel versions.
The following messages are logged repeatedly to /var/log/messages:
Apr 18 13:05:59 node1 nbd-client: Begin Negotiation
Apr 18 13:05:59 node1 nbd-client: size = 53684994048
Apr 18 13:05:59 node1 nbd-client: Negotiation Complete
Apr 18 13:05:59 node1 nbd-client: Ioctl/2 failed: Device or resource busy
Apr 18 13:05:59 node1 kernel: block nbd1: Device being setup by another task
If you are using RHEL 8.0 through 8.2, update to RHEL 8.3 or later. If you are using Oracle Linux UEK R5 (kernel-4.14.35-*), update to UEK R6 or later.

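To check whether a node is showing this symptom, the log can be scanned for the two failure messages in the excerpt above. The following is a minimal sketch, assuming the message text on your system matches the excerpt exactly; the function name `find_nbd_setup_failures` is hypothetical and is not part of LifeKeeper.

```python
from typing import Iterable, List

# Message fragments copied from the log excerpt in this entry.
FAILURE_MARKERS = (
    "nbd-client: Ioctl/2 failed: Device or resource busy",
    "Device being setup by another task",
)


def find_nbd_setup_failures(lines: Iterable[str]) -> List[str]:
    """Return the log lines that contain one of the failure markers."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(marker in line for marker in FAILURE_MARKERS)
    ]
```

The function accepts any iterable of lines, so an open file object for /var/log/messages can be passed directly. Reading that file usually requires root privileges.
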
Extending to the third node fails when a resync is in progress
If you attempt to extend a DataKeeper resource (mirror) to a third node while a resync is in progress, the extend fails because the mirror cannot be "grown" during a resync. The result is an error similar to the following:
removing hierarchy remnants
getId: /opt/LifeKeeper/lkadm/subsys/scsi/CPQARRAY/bin/getId -i "/dev/sdb1" returned ""
getId: /opt/LifeKeeper/lkadm/subsys/scsi/device/bin/getId -i "/dev/sdb1" returned "360022480e2671ceb01246bb1c5d67ebd-1"
mdadm: /dev/md0 is performing resync/recovery and cannot be reshaped
Failed to grow array (1)

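Before extending, it can help to confirm that no resync is running. The following is a minimal sketch that inspects the text of /proc/mdstat, assuming the standard Linux md status format in which an active resync or recovery appears as a `resync =` or `recovery =` progress line under the array. The function name `arrays_resyncing` is hypothetical, and the sketch conservatively treats delayed or pending resyncs as in progress as well.

```python
import re
from typing import List, Optional

_ARRAY_LINE = re.compile(r"^(md\d+)\s*:")
_SYNC_LINE = re.compile(r"\b(resync|recovery)\s*=")


def arrays_resyncing(mdstat_text: str) -> List[str]:
    """Return md device names whose /proc/mdstat entry shows resync/recovery."""
    busy: List[str] = []
    current: Optional[str] = None
    for line in mdstat_text.splitlines():
        header = _ARRAY_LINE.match(line)
        if header:
            # Start of a new array entry, e.g. "md0 : active raid1 ..."
            current = header.group(1)
        elif not line.strip():
            # A blank line ends the current array entry.
            current = None
        elif current and _SYNC_LINE.search(line) and current not in busy:
            busy.append(current)
    return busy
```

Passing the contents of /proc/mdstat returns the arrays that are still synchronizing. An empty list suggests that the extend will not be blocked for this reason.
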
RHEL 8.6 / 9.0 does not support DataKeeper asynchronous mode on thin-provisioned disks
Kernel panics have been observed with the kernels provided for RHEL 8.6 / 9.0. While we work with Red Hat to provide an updated kernel that includes the upstream fix, we recommend not configuring asynchronous mirrors on a RHEL 8.6 / 9.0 kernel where the disk is thin provisioned. A warning is displayed when LifeKeeper is installed or updated on a RHEL 8.6 / 9.0 system.

A DataKeeper resource configuration where the resource is created with asynchronous mode and extended with synchronous mode is not supported.
If a DataKeeper resource is created in asynchronous mode and extended in synchronous mode, read/write processing for the mirror may hang within the kernel. To avoid this issue, do not mix synchronous and asynchronous modes: a resource is acceptable only if it is synchronous on all nodes or asynchronous on all nodes.
Run the following command on each node to determine whether the DataKeeper resource is synchronous or asynchronous. In the output, 0 indicates synchronous mode and a nonzero value indicates asynchronous mode.
`perl -nle 'my @x = split(/\x01/, $_); print "$x[0]:$x[3]";' /opt/LifeKeeper/subsys/scsi/resources/netraid/mirrorinfo_<md num>`
Solution: No workaround is currently available. Recreate the DataKeeper resource and select synchronous mode both when creating and when extending it.

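The check performed by the perl one-liner can also be scripted so that the results from several nodes are compared automatically. The following is a minimal sketch, assuming only what the one-liner implies: each line of the mirrorinfo file is a record separated by the 0x01 byte, and the second printed value (field index 3) is the mode flag. The function names are hypothetical and are not part of LifeKeeper.

```python
from typing import Iterable, List, Tuple


def read_mirror_modes(mirrorinfo_text: str) -> List[Tuple[str, str]]:
    """Like the perl one-liner: split each line on 0x01, keep fields 0 and 3."""
    rows: List[Tuple[str, str]] = []
    for line in mirrorinfo_text.split("\n"):
        if not line:
            continue  # skip empty lines, such as the one after a final newline
        fields = line.split("\x01")
        if len(fields) < 4:
            raise ValueError(f"expected at least 4 fields, got {len(fields)}")
        rows.append((fields[0], fields[3]))
    return rows


def describe_mode(flag: str) -> str:
    """Assumption: 0 means synchronous, any nonzero value means asynchronous."""
    return "synchronous" if int(flag) == 0 else "asynchronous"


def modes_are_mixed(rows_per_node: Iterable[List[Tuple[str, str]]]) -> bool:
    """True if the rows collected from all nodes contain both modes."""
    modes = {describe_mode(flag) for rows in rows_per_node for _, flag in rows}
    return len(modes) > 1
```

Collect the output of `read_mirror_modes` from each node and pass the results to `modes_are_mixed`. A result of `True` indicates the unsupported mixed configuration described in this entry.
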
Partitions with an odd number of sectors are not supported when running kernel 4.12 or later
A partition with an odd number of sectors cannot be used in a DataKeeper mirror in environments running kernel 4.12 or later, because a resync may fail when it attempts to write past the end of the disk.

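A partition can be checked for this condition before a mirror is created on it. The following is a minimal sketch, assuming that "sectors" refers to the 512-byte units that Linux reports in /sys/class/block/<device>/size. The function names are hypothetical.

```python
from pathlib import Path


def has_odd_sector_count(sectors: int) -> bool:
    """True if the sector count is odd, the unsupported case in this entry."""
    return sectors % 2 == 1


def sector_count(device_name: str) -> int:
    """Read a block device's size (e.g. "sdb1") in 512-byte sectors from sysfs."""
    size_file = Path("/sys/class/block") / device_name / "size"
    return int(size_file.read_text().strip())
```

For example, `has_odd_sector_count(sector_count("sdb1"))` reports whether the partition /dev/sdb1 has an odd sector count.
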
Important reminder about DataKeeper for Linux asynchronous mode in an LVM over DataKeeper configuration
Kernel panics may occur in configurations where LVM resources sit above multiple asynchronous mirrors, and data consistency may be at risk if a panic occurs. Therefore, use either a single DataKeeper mirror or multiple synchronous DataKeeper mirrors in these configurations.

In symmetric active SDR configurations with significant I/O traffic on both servers, the filesystem mounted on the mirror stops responding and eventually the whole system hangs
Due to the single-threaded nature of the Linux buffer cache, the buffer cache flushing daemon can hang while trying to flush a buffer that needs to be committed remotely. While the flushing daemon is hung, all activity in the Linux system that involves dirty buffers stops if the number of dirty buffers exceeds the system limit (set in /proc/sys/kernel/vm/bdflush). Usually this is not a serious problem unless something prevents the remote system from clearing remote buffers (e.g. a network failure). LifeKeeper detects a network failure and stops replication in that event, which clears the hang condition. However, if the remote system is also replicating to the local system (i.e. the two systems are replicating symmetrically to each other), they can deadlock indefinitely if both enter this flushing daemon hang. The deadlock can be released by manually killing the nbd-client daemons on both systems (which breaks the mirrors). To avoid this potential deadlock entirely, symmetric active replication is not recommended.

High CPU usage reported by top for md_raid1 process with large mirror sizes
On some OS distributions, top may report high CPU usage for the mdX_raid1 process (where X is the mirror number) when working with very large mirrors (500GB or more).
Solution: To reduce the CPU usage, set the chunk size to 1024 via the LifeKeeper tunable LKDR_CHUNK_SIZE, then delete and recreate the mirror so that it uses the new setting.

The use of lkbackup with DataKeeper resources requires a full resync
Although lkbackup saves the instance and mirror_info files, it is best practice to perform a full resync of DataKeeper mirrors after a restore from lkbackup, because the status of the source and target cannot be guaranteed while a resource does not exist.

DataKeeper does not support using Network Compression on SLES12 SP1 or later
Network Compression is not supported on SLES12 SP1 or later because of a disk I/O performance problem.

Certain kernel versions do not support DataKeeper asynchronous mode.
Kernel panics have been observed with certain kernel versions when DataKeeper resources are used in asynchronous mode with LifeKeeper for Linux. Because this is a kernel-dependent problem, it cannot be resolved within LifeKeeper. To use DataKeeper asynchronous mode, you must update or downgrade the kernel.
The following kernel versions do not support DataKeeper asynchronous mode:
- 3.10.0-693 series: 3.10.0-693.24.1.el7.x86_64 or later
- 3.10.0-862.el7.x86_64 ~ 3.10.0-862.26.x.el7.x86_64
- 3.10.0-957.el7.x86_64 ~ 3.10.0-957.3.x.el7.x86_64
If you are running one of the kernel versions listed above and want to use DataKeeper resources in asynchronous mode, update (or downgrade) to one of the following kernel versions:
- 3.10.0-693 series: earlier than 3.10.0-693.24.1.el7.x86_64
- 3.10.0-862.29.1.el7.x86_64 or later
- 3.10.0-957.4.1.el7.x86_64 or later
If you cannot update (or downgrade) the kernel, do not use DataKeeper asynchronous mode.

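The version ranges above can be turned into a quick check against the output of `uname -r`. The following is a minimal sketch, assuming release strings of the form `3.10.0-<series>[.<minor>...].el7.<arch>`. The function name `async_mode_status` is hypothetical. Versions that fall between the listed ranges (for example 3.10.0-862.27.x) and kernels outside the three series are reported as "unlisted", because this entry makes no statement about them.

```python
import re

_EL7_RELEASE = re.compile(r"^3\.10\.0-(\d+)((?:\.\d+)*)\.el7")


def async_mode_status(release: str) -> str:
    """Classify a `uname -r` string as "affected", "ok", or "unlisted"."""
    match = _EL7_RELEASE.match(release)
    if not match:
        return "unlisted"
    series = int(match.group(1))
    # ".24.1" -> (24, 1); an empty suffix -> ()
    minor = tuple(int(part) for part in match.group(2).split(".") if part)
    if series == 693:
        # Affected from 693.24.1 onward; earlier 693 kernels are acceptable.
        return "affected" if minor >= (24, 1) else "ok"
    if series == 862:
        # Affected from 862 through 862.26.x; acceptable from 862.29.1.
        if not minor or minor[0] <= 26:
            return "affected"
        return "ok" if minor >= (29, 1) else "unlisted"
    if series == 957:
        # Affected from 957 through 957.3.x; acceptable from 957.4.1.
        if not minor or minor[0] <= 3:
            return "affected"
        return "ok" if minor >= (4, 1) else "unlisted"
    return "unlisted"
```

Calling `async_mode_status(platform.release())` after `import platform` classifies the running kernel. Only an "affected" result corresponds to the restriction described in this entry.
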
Some kernel versions do not support the Secure Boot feature
If Secure Boot is enabled on RHEL7, RHEL8, or their compatible operating systems, the nbd module fails to load. In addition, with some kernel versions of SUSE Linux Enterprise Server and the Oracle Linux UEK kernel, the md / raid1 kernel module fails to load when Secure Boot is enabled.
Solution: Take one of the following actions:
1. Disable Secure Boot: Disable Secure Boot in the UEFI configuration.
2. Disable signature verification: Disable signature verification with the `mokutil --disable-validation` command. See the mokutil documentation for details.
Solution 1 is recommended. Both require a system reboot.

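To find out whether a node has Secure Boot enabled, the state reported by `mokutil --sb-state` can be inspected. The following is a minimal sketch, assuming that the command prints a line containing "SecureBoot enabled" or "SecureBoot disabled". Any other output, or a missing mokutil binary, is treated as unknown. The function names are hypothetical.

```python
import subprocess
from typing import Optional


def parse_sb_state(output: str) -> Optional[bool]:
    """Interpret `mokutil --sb-state` output; None means the state is unknown."""
    text = output.lower()
    if "secureboot enabled" in text:
        return True
    if "secureboot disabled" in text:
        return False
    return None


def secure_boot_enabled() -> Optional[bool]:
    """Run mokutil and report the Secure Boot state, or None if unavailable."""
    try:
        result = subprocess.run(
            ["mokutil", "--sb-state"],
            capture_output=True,
            text=True,
            check=False,
        )
    except FileNotFoundError:
        return None  # mokutil is not installed
    return parse_sb_state(result.stdout + result.stderr)
```

A return value of `True` means one of the two solutions above is needed before the nbd or md / raid1 modules can load on the affected kernels.
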
Mirroring occasionally stops when the communication paths recover
When LifeKeeper recovers from a communication failure and a resync starts, multiple resync processes run. Depending on the timing, one process blocks the initialization of another and the mirror ends up in an Out of Sync state. This issue has most likely occurred when all of the following conditions hold after recovery from a communication failure:
- The mirror state of the DataKeeper resource is “Out of Sync”.
- The following log message is displayed at quickCheck intervals: `INFO:lkcheck:::006041:recover is being in progress: [Tag name of the DataKeeper resource]`
- One or more recover processes are still running. For example, you can check this with the following command: `top | grep recover`
Solution: Reboot the node on which the DataKeeper resource is In Service. If this does not resolve the issue, contact our support desk.

The function to pause/resume mirroring is not available when using LVM over DataKeeper
When mirroring is paused, the file system is expected to be mounted on the target node, but it is not mounted. Resuming mirroring also fails. Do not pause or resume mirroring in this configuration.
