Description

Mirror status is not displayed correctly when running kernel 4.13 or later

The default value of the nbd kernel module parameter max_part changed in kernel 4.13. As a result, mirror status is not displayed correctly in the GUI or in the output of the mirror_status command.

Solution: Follow the steps below to set the nbd module parameter on all nodes.

Perform the following steps on the standby node to set the parameter.

  1. In the file /etc/modprobe.d/lifekeeper-nbd.conf, add “max_part=0” to the line containing “options nbd”, e.g.:

      options nbd nbds_max=128 max_part=0

    (note that nbds_max may be different on your system, so leave that setting at its current value)

  2. Run the following commands to reload the nbd module.

      modprobe -r nbd
      modprobe nbd

  3. Switch over, then perform steps 1 and 2 on the former active node (now the standby node).

  4. Switch over again to return to the original active/standby state.
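Steps 1 and 2 can be scripted roughly as follows. This is a minimal sketch, not a SIOS-provided tool: the function names set_nbd_max_part and reload_nbd are hypothetical, and it assumes the file already contains an "options nbd" line. It only touches max_part, so nbds_max keeps its current value, and running it twice leaves the file unchanged.

```shell
#!/bin/sh
# Hypothetical helper: add (or overwrite) max_part=0 on the "options nbd" line
# of the given modprobe config file. Other options such as nbds_max are kept.
set_nbd_max_part() {
    conf="${1:-/etc/modprobe.d/lifekeeper-nbd.conf}"
    if ! grep -q '^options nbd' "$conf"; then
        echo "no 'options nbd' line in $conf" >&2
        return 1
    fi
    if grep -q '^options nbd.*max_part=' "$conf"; then
        # A max_part value is already present: replace it with 0.
        sed -i '/^options nbd/s/max_part=[0-9]*/max_part=0/' "$conf"
    else
        # No max_part yet: append it to the end of the options nbd line.
        sed -i '/^options nbd/s/$/ max_part=0/' "$conf"
    fi
}

# Hypothetical helper: reload nbd so the new parameter takes effect.
# Run only on the standby node; "modprobe -r" fails while nbd is in use.
reload_nbd() {
    modprobe -r nbd && modprobe nbd
}

# Usage on the standby node, as root:
#   set_nbd_max_part && reload_nbd
#   cat /sys/module/nbd/parameters/max_part   # should now show 0
```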

Partitions with an odd number of sectors are not supported when running kernel 4.12 or later

The use of a partition with an odd number of sectors is not supported in a DataKeeper mirror in environments running kernel 4.12 or later. This is due to an issue where a resync may fail when attempting to write past the end of the disk.
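To check a partition before using it in a mirror, its sector count can be read from sysfs. A minimal sketch, assuming "sectors" here means the 512-byte units reported by /sys/class/block/<name>/size (the same units as "blockdev --getsz"); has_odd_sectors and the SYSFS_BLOCK override, which exists only to make the function testable, are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: print the size of a block device in 512-byte sectors.
# $1 is the device name without /dev/ (e.g. sdb1). SYSFS_BLOCK is a
# hypothetical override of the sysfs directory, used only for testing.
sector_count() {
    cat "${SYSFS_BLOCK:-/sys/class/block}/$1/size"
}

# Succeeds (exit 0) if the device has an odd number of sectors,
# returns 1 if even, and 2 if the size could not be read.
has_odd_sectors() {
    n=$(sector_count "$1") || return 2
    [ $((n % 2)) -eq 1 ]
}

# Usage:
#   has_odd_sectors sdb1 && echo "sdb1 has an odd sector count; not supported"
```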

Important reminder about DataKeeper for Linux asynchronous mode in an LVM over DataKeeper configuration

Kernel panics may occur in configurations where LVM resources sit above multiple asynchronous mirrors. In these configurations, data consistency may be an issue if a panic occurs. Therefore, LVM over DataKeeper must use either a single DataKeeper mirror or multiple synchronous DataKeeper mirrors.

In symmetric active SDR configurations with significant I/O traffic on both servers, the filesystem mounted on the mirror stops responding and eventually the whole system hangs

Due to the single-threaded nature of the Linux buffer cache, the buffer cache flushing daemon can hang while trying to flush a buffer that must be committed remotely. While the flushing daemon is hung, all activity in the Linux system involving dirty buffers stops once the number of dirty buffers exceeds the system limit (set in /proc/sys/vm/bdflush).

Usually this is not a serious problem unless something happens to prevent the remote system from clearing remote buffers (e.g. a network failure). LifeKeeper will detect a network failure and stop replication in that event, thus clearing a hang condition. However, if the remote system is also replicating to the local system (i.e. they are both symmetrically replicating to each other), they can deadlock forever if they both get into this flushing daemon hang situation.

The deadlock can be released by manually killing the nbd-client daemons on both systems (which will break the mirrors). To avoid this potential deadlock entirely, however, symmetric active replication is not recommended.

Mirror breaks and fills up /var/log/messages with errors

This issue has been seen occasionally (on Red Hat EL 6.x and CentOS 6.x) during stress tests with induced failures, especially when killing the nbd-server process that runs on a mirror target system. Upgrading to the latest kernel for your distribution (kernel-2.6.32-131.17.1 or later) may lower the risk of seeing this issue. Rebooting the source system clears the condition.

With the default kernel that comes with CentOS 6 (2.6.32-71), this issue may occur much more frequently (even when the mirror is just under a heavy load).

Note: Beginning with SPS 8.1, when performing a kernel upgrade on Red Hat Enterprise Linux systems, it is no longer a requirement that the setup script (./setup) from the installation image be rerun. Modules should be automatically available to the upgraded kernel without any intervention as long as the kernel was installed from a proper Red Hat package (rpm file).

High CPU usage reported by top for md_raid1 process with large mirror sizes

On some OS distributions, top may report high CPU usage for the mdX_raid1 process (where X is the mirror number) when working with very large mirrors (500GB or more).

Solution: To reduce CPU usage, set the LifeKeeper tunable LKDR_CHUNK_SIZE to 1024, then delete and recreate the mirror so that it uses the new setting.
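The tunable change can be sketched as follows. This assumes LifeKeeper tunables are read from /etc/default/LifeKeeper (pass a different path as the first argument if your installation differs); set_lkdr_chunk_size is a hypothetical helper name. It only edits the setting: the mirror must still be deleted and recreated afterward.

```shell
#!/bin/sh
# Hypothetical helper: set LKDR_CHUNK_SIZE in a LifeKeeper defaults file.
# Assumption: tunables live in /etc/default/LifeKeeper.
set_lkdr_chunk_size() {
    file="${1:-/etc/default/LifeKeeper}"
    size="${2:-1024}"
    if grep -q '^LKDR_CHUNK_SIZE=' "$file"; then
        # Overwrite the existing value.
        sed -i "s/^LKDR_CHUNK_SIZE=.*/LKDR_CHUNK_SIZE=$size/" "$file"
    else
        # Tunable not present yet: append it.
        printf 'LKDR_CHUNK_SIZE=%s\n' "$size" >> "$file"
    fi
}

# Usage, as root on each node:
#   set_lkdr_chunk_size
#   grep '^LKDR_CHUNK_SIZE=' /etc/default/LifeKeeper
```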

The use of lkbackup with DataKeeper resources requires a full resync

Although lkbackup will save the instance and mirror_info files, it is best practice to perform a full resync of DataKeeper mirrors after a restore from lkbackup as the status of source and target cannot be guaranteed while a resource does not exist.

Mirror resyncs may hang in early Red Hat/CentOS 6.x kernels with a “Failed to remove device” message in the LifeKeeper log

Kernel versions prior to 2.6.32-131.17.1 (for example, the pre-update RHEL 6.1 kernel 2.6.32-131.0.15) contain a problem in the md driver used for replication. This problem prevents the release of the nbd device from the mirror, resulting in multiple "Failed to remove device" messages being logged and the mirror resync being aborted. A system reboot may be required to clear the condition. This problem has been observed during initial resyncs after mirror creation and when the mirror is under stress.

Solution: Kernel 2.6.32-131.17.1 has been verified to contain the fix for this problem. If you are using DataKeeper with Red Hat or CentOS 6 kernels before the 2.6.32-131.17.1 version, we recommend updating to this or the latest available version.
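A quick way to check whether a RHEL/CentOS 6 kernel is at least 2.6.32-131.17.1 is sketched below. It is an approximation: it strips the ".el6..." suffix and uses GNU "sort -V" ordering rather than the RPM version-comparison algorithm, and md_fix_present and version_ge are hypothetical names.

```shell
#!/bin/sh
# Succeeds if version $1 >= version $2 according to GNU "sort -V".
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: succeeds if kernel release $1 (default: the running
# kernel) is 2.6.32-131.17.1 or later.
md_fix_present() {
    release="${1:-$(uname -r)}"
    # Drop the distribution suffix, e.g. ".el6.x86_64", before comparing.
    release="${release%%.el*}"
    version_ge "$release" "2.6.32-131.17.1"
}

# Usage:
#   md_fix_present || echo "kernel $(uname -r) predates 2.6.32-131.17.1; update recommended"
```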

DataKeeper does not support using Network Compression on SLES11 SP4 and SLES12 SP1 or later

DataKeeper does not support using Network Compression on SLES11 SP4 and SLES12 SP1 or later due to a disk I/O performance problem.

Asynchronous replication mode for DataKeeper mirrors is not supported with some kernels on RHEL 7.4-7.6, CentOS 7.4-7.6, and Oracle Linux 7.4-7.6 OS distributions

Asynchronous replication mode for DataKeeper mirrors is not supported with the following kernels, due to a kernel issue that results in a system panic:
3.10.0-693.x earlier than 3.10.0-693.51.1.el7
3.10.0-862.x earlier than 3.10.0-862.29.1.el7
3.10.0-957.x earlier than 3.10.0-957.10.1.el7
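The kernel list above can be checked with a sketch like the following. It assumes release strings in the usual RHEL form (for example 3.10.0-862.14.4.el7.x86_64), strips the ".el7..." suffix, and compares with GNU "sort -V" as an approximation of RPM ordering; async_unsupported_kernel and version_lt are hypothetical names.

```shell
#!/bin/sh
# Succeeds if version $1 is strictly lower than version $2 ("sort -V" order).
version_lt() {
    [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Hypothetical helper: succeeds if kernel release $1 (default: the running
# kernel) falls in one of the unsupported ranges for asynchronous mirrors.
async_unsupported_kernel() {
    release="${1:-$(uname -r)}"
    release="${release%%.el7*}"   # e.g. 3.10.0-862.14.4.el7.x86_64 -> 3.10.0-862.14.4
    case "$release" in
        3.10.0-693|3.10.0-693.*) version_lt "$release" "3.10.0-693.51.1" ;;
        3.10.0-862|3.10.0-862.*) version_lt "$release" "3.10.0-862.29.1" ;;
        3.10.0-957|3.10.0-957.*) version_lt "$release" "3.10.0-957.10.1" ;;
        *) return 1 ;;
    esac
}

# Usage:
#   async_unsupported_kernel && echo "do not use asynchronous mirrors on $(uname -r)"
```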

Secure Boot is not supported on RHEL/CentOS/Oracle Linux

If Secure Boot is enabled on RHEL 7 or later, CentOS 7 or later, or Oracle Linux 7 or later, the nbd module fails to load. For this reason, Secure Boot cannot be enabled in a DataKeeper environment. The exception is Oracle Linux running UEK (Unbreakable Enterprise Kernel), where Secure Boot can be enabled.

Solution: Take one of the following actions:

  1. Disable Secure Boot – Disable Secure Boot in the UEFI configuration.
  2. Disable signature verification – Disable signature verification with the "mokutil --disable-validation" command. See the mokutil documentation for details.

    Solution 1 is recommended. Both require a system reboot.
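To confirm the Secure Boot state before or after either action, the output of "mokutil --sb-state" can be checked. A minimal sketch, assuming mokutil's usual "SecureBoot enabled" / "SecureBoot disabled" output; secure_boot_enabled is a hypothetical name, and it accepts the output text as an optional argument so it can be tested without mokutil installed.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if Secure Boot is reported as enabled.
# With no argument, it runs "mokutil --sb-state"; otherwise it parses $1.
secure_boot_enabled() {
    state="${1-$(mokutil --sb-state 2>/dev/null)}"
    case "$state" in
        *"SecureBoot enabled"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   secure_boot_enabled && echo "Secure Boot is on; the nbd module will not load"
```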
