Avoiding Full Resynchronizations

When replicating large amounts of data over a WAN link, it is desirable to avoid full resynchronizations, which can consume large amounts of network bandwidth and time. With newer kernels, SIOS DataKeeper can avoid almost all full resyncs by using its bitmap technology. However, the initial full resync, which occurs when the mirror is first set up, cannot be avoided when existing data is being replicated. (For brand new data, SIOS DataKeeper does not perform a full resync, so the steps below are not necessary.)

There are a couple of ways to avoid an initial full resync when replicating existing data. Two recommended methods are described below.

Method 1 – Replicating to a 2nd node

The first method consists of taking a raw disk image and shipping it to the target site. This results in minimal downtime as the mirror can be active on the source system while the data is in transit to the target system.

Procedure

Create the mirror (selecting Replicate Existing Filesystem), but do not extend the mirror to the target system.

Take the mirror out of service.

Take an image of the source disk or partition. For this example, the chosen disk or partition is /dev/sda1:

root@source# dd if=/dev/sda1 of=/tmp/sdr_disk.img bs=65536

(The block size argument of 65536 is merely for efficiency.)

This will create a file containing the raw disk image of the disk or partition.

Note that instead of a file, a hard drive or other storage device could have been used.

Optional Step – Take a checksum of the source disk or partition:

root@source# md5sum /dev/sda1

Optional Step – Compress the disk image file:

root@source# gzip /tmp/sdr_disk.img

Clear the bitmap file. Replace the last argument with the path of the bitmap file that you specified when creating the resources:

root@source# /opt/LifeKeeper/bin/bitmap -c /opt/LifeKeeper/bitmap__dr

Bring the mirror and any dependent filesystem and applications into service. The bitmap file will track any changes made while the data is transferred to the target system.

Transfer the disk image to the target system using your preferred transfer method.

Optional Step – Uncompress the disk image file on the target system:

root@target# gunzip /tmp/sdr_disk.img.gz

Optional Step – Verify that the checksum of the image file matches the original checksum of the source disk or partition taken in Step 4:

root@target# md5sum /tmp/sdr_disk.img

Transfer the image to the target disk, for example, /dev/sda2:

root@target# dd if=/tmp/sdr_disk.img of=/dev/sda2 bs=65536

Set LKDR_NO_FULL_SYNC=1 in /etc/default/LifeKeeper on both systems:

root@source# echo 'LKDR_NO_FULL_SYNC=1' >>/etc/default/LifeKeeper

root@target# echo 'LKDR_NO_FULL_SYNC=1' >>/etc/default/LifeKeeper

Extend the mirror to the target. A partial resync will occur.

Edit /etc/default/LifeKeeper on both systems to remove the LKDR_NO_FULL_SYNC entry.
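The imaging and checksum steps above can be sketched as two small shell helpers. This is only an illustration: image_and_checksum and verify_image are hypothetical names, not LifeKeeper commands, and the sketch checksums the image file rather than the device itself. The two are expected to match as long as the mirror stays out of service while the image is taken.

```shell
#!/bin/sh
# Hypothetical helpers (not part of LifeKeeper) for the imaging steps.

# Copy a disk, partition, or file to an image and print the image's MD5 sum.
image_and_checksum() {
    src=$1   # for example /dev/sda1 on the source
    img=$2   # for example /tmp/sdr_disk.img
    dd if="$src" of="$img" bs=65536 2>/dev/null || return 1
    md5sum "$img" | awk '{ print $1 }'
}

# Return 0 only if the MD5 sum of the given path equals the expected value.
verify_image() {
    path=$1
    expected=$2
    actual=$(md5sum "$path" | awk '{ print $1 }') || return 1
    [ "$actual" = "$expected" ]
}
```

On the source, the value printed by image_and_checksum can be recorded and later passed to verify_image on the target, first against the transferred image file and then, if desired, against the target partition after the final dd.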

Extending to a 3rd node or any additional nodes without doing a full resync

Procedure for copying data from the Source:

These steps assume the mirror has already been created and extended to the 2nd node, aka target1.

Set LKDR_NO_FULL_SYNC=1 in /etc/default/LifeKeeper on each system:

root@source# echo 'LKDR_NO_FULL_SYNC=1' >>/etc/default/LifeKeeper

root@target1# echo 'LKDR_NO_FULL_SYNC=1' >>/etc/default/LifeKeeper

root@target2# echo 'LKDR_NO_FULL_SYNC=1' >>/etc/default/LifeKeeper

Extend the mirror to the new target (target2). A partial resync will occur.

Pause the mirror to the new target (target2).

Take the mirror out of service.

Unmount the file system on the paused mirror on target2 by running 'umount <filesystem>' on target2.

Stop the md device running on target2 by running 'mdadm --stop /dev/md#' on target2, where # is the value reported in /proc/mdstat.

Make a copy of the source disk or partition on the source node. This can be done using dd or tools from the disk vendor or cloud vendor. The copy must be a block-for-block identical copy; a file-level copy will not work.

Optional step – Collect checksum data to verify disk image (md5sum, sha256sum, etc).

Optional step – Compress disk image.

Bring the mirror and any dependent filesystem and applications into service. The bitmap file will track any changes made while the data is transferred to the target system.

Verify that the mirror to target2 is still paused. If it is not, restart at step 4 (Take the mirror out of service).

Verify that the file system and md device are not running on target2. If they are, unmount the file system and stop the md device.

Transfer the disk image to the target disk on target2.

Verify that the disk image is correct, for example by using md5sum or sha256sum to validate the disk contents.

Resume the paused mirror to target2. The bitmap on the source has been tracking any changes made since target2 was paused. When the mirror is resumed, these changes will be sent to target2.

Edit /etc/default/LifeKeeper on each system to remove the LKDR_NO_FULL_SYNC entry.
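Because the echo commands append a line every time they are run, and the entry has to be removed again at the end, it may be convenient to wrap both operations in small helpers. The following is a minimal sketch, assuming GNU sed (for the -i option); set_no_full_sync and unset_no_full_sync are hypothetical names, and the configuration path is a parameter so the helpers can be tried on a scratch copy first.

```shell
#!/bin/sh
# Hypothetical helpers (not part of LifeKeeper); assumes GNU sed for -i.

# Ensure exactly one LKDR_NO_FULL_SYNC=1 line exists in the config file.
set_no_full_sync() {
    cfg=${1:-/etc/default/LifeKeeper}
    touch "$cfg" || return 1
    # Drop any existing entry first so repeated calls do not add duplicates.
    sed -i '/^LKDR_NO_FULL_SYNC=/d' "$cfg" || return 1
    echo 'LKDR_NO_FULL_SYNC=1' >> "$cfg"
}

# Remove the LKDR_NO_FULL_SYNC entry, leaving all other lines untouched.
unset_no_full_sync() {
    cfg=${1:-/etc/default/LifeKeeper}
    sed -i '/^LKDR_NO_FULL_SYNC=/d' "$cfg"
}
```

Each helper would be run as root on every system in the cluster, for example set_no_full_sync before extending the mirror and unset_no_full_sync once the partial resync has completed.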

Procedure for copying data from paused target:

This procedure requires no downtime, but while the targets are paused there is no data redundancy.

These steps assume the mirror has already been created and extended to the 2nd node, aka target1.

Set LKDR_NO_FULL_SYNC=1 in /etc/default/LifeKeeper on each system:

root@source# echo 'LKDR_NO_FULL_SYNC=1' >>/etc/default/LifeKeeper

root@target1# echo 'LKDR_NO_FULL_SYNC=1' >>/etc/default/LifeKeeper

root@target2# echo 'LKDR_NO_FULL_SYNC=1' >>/etc/default/LifeKeeper

Extend the mirror to the new target (target2). A partial resync will occur.

Pause the mirror to the new target (target2).

Pause the mirror to target1.

Unmount the file system on the paused mirror on target2 by running 'umount <filesystem>' on target2.

Stop the md device running on target2 by running 'mdadm --stop /dev/md#' on target2, where # is the value reported in /proc/mdstat.

Unmount the file system on the paused mirror on target1 by running 'umount <filesystem>' on target1.

Stop the md device running on target1 by running 'mdadm --stop /dev/md#' on target1, where # is the value reported in /proc/mdstat.

Make a copy of the target disk or partition on target1. This can be done using dd or tools from the disk vendor or cloud vendor. The copy must be a block-for-block identical copy of the full disk or partition; a file-level copy will not work.

Optional step – Collect checksum data to verify disk image (md5sum, sha256sum, etc).

Optional step – Compress disk image.

Resume replication to target1. The bitmap file will track any changes made while the data is transferred to target2.

Verify that the mirror to target2 is paused. If it is not then restart at step 4.

Verify that the file system and md device are not running on target2. If they are, then unmount the file system and stop the md device.

Transfer the disk image to the target disk on target2.

Optional step – Decompress disk image.

Optional step – Verify the disk image is correct (md5sum, sha256sum, etc.).

Resume the paused mirror to target2. The bitmap on the source has been tracking any changes made since target2 was paused. When the mirror is resumed, these changes will be sent to target2.

Edit /etc/default/LifeKeeper on each system to remove the LKDR_NO_FULL_SYNC entry.
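The verification step (file system unmounted and md device stopped on target2) can be checked from a script before the image is written. The sketch below is an assumption about how one might do this, not a LifeKeeper facility: md_is_running and is_mounted are hypothetical names, and they rely on the usual Linux layouts of /proc/mdstat (one "mdN : ..." line per assembled array) and /proc/mounts (mount point in the second field).

```shell
#!/bin/sh
# Hypothetical checks (not part of LifeKeeper) to run on target2.

# Return 0 if the named md device (for example md0) is listed in mdstat.
md_is_running() {
    md=$1
    mdstat=${2:-/proc/mdstat}
    grep -q "^$md :" "$mdstat"
}

# Return 0 if something is mounted at the given mount point.
is_mounted() {
    mnt=$1
    mounts=${2:-/proc/mounts}
    awk -v m="$mnt" '$2 == m { found = 1 } END { exit !found }' "$mounts"
}
```

A guard such as "if md_is_running md0 || is_mounted /data; then echo 'not safe to write image'; fi" could then precede the transfer; md0 and /data are placeholders for the actual device and mount point.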

Method 2

This method can be used if the target system can be easily transported to the source site, or will already be there when the systems are configured. It consists of temporarily modifying network routes to turn the eventual WAN mirror into a LAN mirror so that the initial full resync can be performed over a faster local network. In the following example, assume the source site is on subnet 10.10.10.0/24 and the target site is on subnet 10.10.20.0/24. By temporarily setting up static routes on the source and target systems, the "WAN" traffic can be made to go directly from one server to the other over a local Ethernet connection or crossover cable.
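To get a rough feel for how much time a local resync can save, the transfer time can be estimated as data size divided by link speed. The helper below is a back-of-the-envelope sketch only: transfer_seconds is a hypothetical name, the figures in the usage note are made-up examples, and protocol overhead, compression and disk throughput are ignored.

```shell
#!/bin/sh
# Hypothetical estimate: seconds needed to move BYTES over a link of
# BITS_PER_SECOND, ignoring overhead (integer arithmetic, rounds down).
transfer_seconds() {
    bytes=$1
    bits_per_second=$2
    echo $(( bytes * 8 / bits_per_second ))
}
```

For an assumed 500 GB volume, transfer_seconds 500000000000 100000000 (a 100 Mbit/s WAN) prints 40000, roughly 11 hours, while transfer_seconds 500000000000 1000000000 (a 1 Gbit/s LAN) prints 4000, a little over an hour.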

Procedure

Install and configure the systems at the source site.

Add static routes:

root@source# ip route add 10.10.20.0/24 dev eth0

root@target# ip route add 10.10.10.0/24 dev eth0

The systems should now be able to talk to each other over the LAN.

Configure the communication paths in LifeKeeper.

Create the mirror and extend to the target. A full resync will occur.

Pause the mirror. Changes will be tracked in the bitmap file until the mirror is resumed.

Delete the static routes:

root@source# ip route del 10.10.20.0/24

root@target# ip route del 10.10.10.0/24

Shut down the target system and ship it to its permanent location.

Boot the target system and ensure network connectivity with the source.

Resume replication. A partial resync will occur.
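Before the target is shipped, it may be worth confirming that the temporary static routes are really gone. The following is a minimal sketch under the assumption that each line of "ip route show" output begins with the route's destination prefix; route_present is a hypothetical helper name, and it reads the route table on standard input so it can be exercised with saved output as well.

```shell
#!/bin/sh
# Hypothetical check: return 0 if a route for the given subnet appears in
# the route table text supplied on standard input.
route_present() {
    subnet=$1   # for example 10.10.20.0/24
    awk -v net="$subnet" '$1 == net { found = 1 } END { exit !found }'
}
```

On the source, "ip route show | route_present 10.10.20.0/24" should succeed after the route is added and fail after it is deleted; the same check with 10.10.10.0/24 applies on the target.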
