Data Replication – Known Issues / Restrictions - LifeKeeper for Linux LIVE


Description

Partitions with an odd number of sectors are not supported when running kernel 4.12 or later

The use of a partition with an odd number of sectors is not supported in a DataKeeper mirror in environments running kernel 4.12 or later. This is due to an issue where a resync may fail when attempting to write past the end of the disk.
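As a rough pre-check before creating a mirror, the restriction above can be expressed as a small function. This is a minimal sketch and not part of DataKeeper: the function names are hypothetical, and it assumes the sector count comes from `/sys/class/block/<partition>/size`, which Linux reports in 512-byte units.

```python
import re


def kernel_at_least(release, major, minor):
    """Return True if a kernel release string such as
    '4.12.14-150.17-default' is at or above major.minor."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return (int(match.group(1)), int(match.group(2))) >= (major, minor)


def partition_usable_for_mirror(sector_count, kernel_release):
    """Hypothetical helper: returns False when the partition has an odd
    number of sectors and the kernel is 4.12 or later."""
    if kernel_at_least(kernel_release, 4, 12) and sector_count % 2 == 1:
        return False
    return True
```

On a live system, `platform.release()` returns the running kernel release, and reading `/sys/class/block/sdb1/size` (where `sdb1` is only an example device name) gives the sector count to pass in.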

Important reminder about DataKeeper for Linux asynchronous mode in an LVM over DataKeeper configuration

Kernel panics may occur in configurations where LVM resources sit above multiple asynchronous mirrors, and data consistency may be at risk if a panic occurs. Therefore, the required configuration is either a single DataKeeper mirror or multiple synchronous DataKeeper mirrors.

In symmetric active SDR configurations with significant I/O traffic on both servers, the filesystem mounted on the netraid device (mirror) stops responding and eventually the whole system hangs

Because the Linux buffer cache is single-threaded, the buffer cache flushing daemon can hang while trying to flush a buffer that needs to be committed remotely. While the flushing daemon is hung, all activity in the Linux system involving dirty buffers will stop if the number of dirty buffers exceeds the system limit (set in /proc/sys/vm/bdflush).

Usually this is not a serious problem unless something happens to prevent the remote system from clearing remote buffers (e.g. a network failure). LifeKeeper will detect a network failure and stop replication in that event, thus clearing a hang condition. However, if the remote system is also replicating to the local system (i.e. they are both symmetrically replicating to each other), they can deadlock forever if they both get into this flushing daemon hang situation.

The deadlock can be released by manually killing the nbd-client daemons on both systems (which will break the mirrors). To avoid this potential deadlock entirely, however, symmetric active replication is not recommended.

Mirror breaks and fills up /var/log/messages with errors

This issue has been seen occasionally (on Red Hat EL 6.x and CentOS 6) during stress tests with induced failures, especially when killing the nbd_server process that runs on a mirror target system. Upgrading to the latest kernel for your distribution, such as kernel-2.6.32-131.17.1 on Red Hat EL 6.0 or 6.1, may lower the risk of seeing this issue. Rebooting the source system will clear it.

With the default kernel that comes with CentOS 6 (2.6.32-71), this issue may occur much more frequently, even when the mirror is merely under heavy load. CentOS has not yet released a kernel (2.6.32-131.17.1) that improves this situation. SIOS recommends updating to the 2.6.32-131.17.1 kernel as soon as it becomes available for CentOS 6.

Note: Beginning with SPS 8.1, when performing a kernel upgrade on Red Hat Enterprise Linux systems, it is no longer a requirement that the setup script (./setup) from the installation image be rerun. Modules should be automatically available to the upgraded kernel without any intervention as long as the kernel was installed from a proper Red Hat package (rpm file).

High CPU usage reported by top for md_raid1 process with large mirror sizes

On some OS distributions, top may report high CPU usage for the mdX_raid1 process (where X is the mirror number) when working with very large mirrors (500 GB or more).

Solution: To reduce the CPU usage, set the chunk size to 1024 via the LifeKeeper tunable LKDR_CHUNK_SIZE, then delete and recreate the mirror so that it uses the new setting.
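The tunable change can be sketched as a small text transformation. This is an illustration under stated assumptions rather than a supported tool: it assumes LifeKeeper tunables are stored as `NAME=value` lines in a defaults file (commonly `/etc/default/LifeKeeper`), and the helper name `set_tunable` is hypothetical.

```python
def set_tunable(defaults_text, name, value):
    """Return defaults_text with NAME=value set, replacing an existing
    uncommented assignment or appending a new line if none exists."""
    new_line = f"{name}={value}"
    lines = defaults_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        # Compare only the text before the first '=' so that commented
        # lines such as '#LKDR_CHUNK_SIZE=...' are left untouched.
        if line.split("=", 1)[0].strip() == name:
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"
```

The function only produces the new file contents. Writing them back, and deleting and recreating the mirror so the setting takes effect, remain manual steps.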

The use of lkbackup with DataKeeper resources requires a full resync

Although lkbackup will save the instance and mirror_info files, it is best practice to perform a full resync of DataKeeper mirrors after a restore from lkbackup as the status of source and target cannot be guaranteed while a resource does not exist.

Mirror resyncs may hang in early Red Hat/CentOS 6.x kernels with a “Failed to remove device” message in the LifeKeeper log

Kernel versions prior to 2.6.32-131.17.1 (for example, the RHEL 6.1 kernel 2.6.32-131.0.15 before update) contain a problem in the md driver used for replication. This problem prevents the release of the nbd device from the mirror, which results in multiple “Failed to remove device” messages being logged and the mirror resync being aborted. A system reboot may be required to clear the condition. This problem has been observed during initial resyncs after mirror creation and when the mirror is under stress.

Solution: Kernel 2.6.32-131.17.1 has been verified to contain the fix for this problem. If you are using DataKeeper with Red Hat or CentOS 6 kernels before the 2.6.32-131.17.1 version, we recommend updating to this or the latest available version.
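To check whether a given kernel predates the fixed version, the release string can be compared field by field. The following is a minimal sketch: the function names are hypothetical, and it assumes the check applies only to 2.6.32-based kernels (the Red Hat/CentOS 6 series).

```python
import re

# First kernel version stated above to contain the md driver fix.
FIXED_KERNEL = (2, 6, 32, 131, 17, 1)


def kernel_fields(release):
    """Parse the leading numeric fields of a release string, e.g.
    '2.6.32-131.0.15.el6.x86_64' -> (2, 6, 32, 131, 0, 15)."""
    head = re.match(r"\d+(?:[.-]\d+)*", release)
    if head is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in re.split(r"[.-]", head.group(0)))


def needs_kernel_update(release):
    """True for 2.6.32-based kernels older than 2.6.32-131.17.1."""
    fields = kernel_fields(release)
    return fields[:3] == (2, 6, 32) and fields < FIXED_KERNEL
```

Passing the output of `uname -r` (or `platform.release()` in Python) to `needs_kernel_update` indicates whether the running kernel is older than the fixed version.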

DataKeeper does not support using Network Compression on SLES11 SP4 and SLES12 SP1 or later

DataKeeper does not support using Network Compression on SLES11 SP4 and SLES12 SP1 or later due to a disk I/O performance problem.

Asynchronous replication mode for DataKeeper mirrors is not supported on RHEL 7.5 and 7.6, CentOS 7.5 and 7.6 and Oracle Linux 7.5 and 7.6 OS distributions.

Asynchronous replication mode for DataKeeper mirrors is not supported under the kernel-3.10.0-862.el7.x86_64 and kernel-3.10.0-957.el7.x86_64 environments due to a kernel issue that results in a system panic.
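A simple guard before choosing asynchronous mode might look like the sketch below. The function name is hypothetical, and the check matches only the two kernel releases named above. Whether later errata kernels in the same series are affected is not stated here, so the sketch does not cover them.

```python
# Kernel releases (as printed by `uname -r`) listed above as unsupported
# for asynchronous replication.
UNSUPPORTED_ASYNC_KERNELS = (
    "3.10.0-862.el7.x86_64",
    "3.10.0-957.el7.x86_64",
)


def async_mode_supported(kernel_release):
    """Return False if kernel_release is one of the listed kernels."""
    return kernel_release not in UNSUPPORTED_ASYNC_KERNELS
```

For example, `async_mode_supported(platform.release())` could be evaluated on each node before a mirror is created in asynchronous mode.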
