How SteelEye DataKeeper Works

SteelEye DataKeeper creates and protects NetRAID devices.  A NetRAID device is a RAID1 device that consists of a local disk or partition and a Network Block Device (NBD) as shown in the diagram below.

[Figure sdrblock.gif: a NetRAID device built from a local disk or partition and a Network Block Device (NBD)]

A LifeKeeper supported file system can be mounted on a NetRAID device like any other storage device.  In this case, the file system is called a replicated file system.  LifeKeeper protects both the NetRAID device and the replicated file system.

The NetRAID device is created by building the DataKeeper resource hierarchy. Extending the NetRAID device to another server will create the NBD device and make the network connection between the two servers.  SteelEye DataKeeper starts replicating data as soon as the NBD connection is made.

The nbd-client process executes on the primary server and connects to the nbd-server process running on the backup server.
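
LifeKeeper creates and manages all of these pieces automatically when the DataKeeper resource is created and extended. Purely as a conceptual sketch of the underlying Linux building blocks (nbd-server, nbd-client and an md RAID1 device), and not the command sequence DataKeeper itself runs, the following illustrates how a comparable device could be assembled by hand; the device names, host name, port and mount point are hypothetical:

    # On the backup (target) server: export the target partition over the network
    nbd-server 5000 /dev/sdb1

    # On the primary (source) server: attach the exported partition as /dev/nbd0
    nbd-client server2 5000 /dev/nbd0

    # Mirror the local partition with the NBD device; --write-mostly marks the
    # network member so that reads are normally served from the local disk
    mdadm --create /dev/md0 --level=1 --raid-devices=2 \
          /dev/sda2 --write-mostly /dev/nbd0

    # The resulting NetRAID-style device carries a file system like any other disk
    mkfs -t ext3 /dev/md0
    mount /dev/md0 /mnt/replicated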

Synchronization (and Resynchronization)

After the DataKeeper resource hierarchy is created and before it is extended, it is in a degraded mode; that is, data is written to the local disk or partition only. Once the hierarchy is extended to the backup (target) system, SteelEye DataKeeper synchronizes the data between the two systems, and all subsequent writes are replicated to the target. If at any time the data gets out of sync (e.g., because a system or network failure occurs), SteelEye DataKeeper will automatically resynchronize the data on the source and target systems. If the mirror was configured to use an intent log (bitmap file), SteelEye DataKeeper uses it to determine what data is out of sync so that a full resynchronization is not required. If the mirror was not configured to use a bitmap file, then a full resynchronization is performed after any interruption of data replication.
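
The intent log behaves like the standard md write-intent bitmap, so the state of a synchronization or resynchronization can be observed with ordinary Linux tools. A minimal sketch, assuming the conceptual md mirror from the earlier example (the bitmap path is hypothetical; DataKeeper chooses the actual location):

    # Overall mirror state and resynchronization progress for all md devices
    cat /proc/mdstat

    # Examine a write-intent bitmap file to see how many chunks are marked dirty,
    # i.e. how much data a partial resynchronization would have to copy
    mdadm --examine-bitmap /path/to/bitmap_file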

Standard Mirror Configuration

The most common mirror configuration involves two servers with a mirror established between local disks or partitions on each server, as shown below. Server1 is considered the primary server containing the mirror source. Server2 is the backup server containing the mirror target.

[Figure mirrorconfig1.gif: standard mirror configuration with Server1 (mirror source) and Server2 (mirror target)]

N+1 Configuration

A commonly used variation of the standard mirror configuration above is a cluster in which two or more servers replicate data to a common backup server. In this case, each mirror source must replicate to a separate disk or partition on the backup server, as shown below.

[Figure sdrnplus1.gif: N+1 configuration with multiple mirror sources replicating to a common backup server]

Multiple Target Configuration

When used with an appropriate Linux distribution and kernel version 2.6.7 or higher, SteelEye DataKeeper can also replicate data from a single disk or partition on the primary server to multiple backup systems, as shown below.

[Figure sdrmulttar.gif: multiple target configuration replicating one source disk or partition to several backup systems]


A given source disk or partition can be replicated to a maximum of 7 mirror targets, and each mirror target must be on a separate system (i.e. a source disk or partition cannot be mirrored to more than one disk or partition on the same target system).
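
Conceptually, each additional target is simply a further network member of the same mirror. As a hedged illustration only (hypothetical host names and devices, not DataKeeper's own commands), a source replicating to two targets corresponds to an md RAID1 with one local member and two NBD members:

    # Attach one NBD device per backup system
    nbd-client target1 5000 /dev/nbd0
    nbd-client target2 5000 /dev/nbd1

    # One local member plus two network members: one source, two mirror targets
    mdadm --create /dev/md0 --level=1 --raid-devices=3 \
          /dev/sda2 --write-mostly /dev/nbd0 --write-mostly /dev/nbd1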

This type of configuration allows the use of LifeKeeper’s cascading failover feature, providing multiple backup systems for a protected application and its associated data.

SteelEye DataKeeper Resource Hierarchy

The following example shows a typical DataKeeper resource hierarchy as it appears in the LifeKeeper GUI:

[Figure sdrgui.gif: DataKeeper resource hierarchy as it appears in the LifeKeeper GUI]

The resource datarep-ext3-sdr is the NetRAID resource, and the parent resource ext3-sdr is the file system resource. Note that subsequent references to the DataKeeper resource in this documentation refer to both resources together. Because the file system resource is dependent on the NetRAID resource, performing an action on the NetRAID resource will also affect the file system resource above it.
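
The same hierarchy can also be inspected from the command line. A minimal sketch, assuming the standard LifeKeeper core tools installed under /opt/LifeKeeper/bin and the example tags shown above:

    # Show the LifeKeeper resource status; the file system resource ext3-sdr
    # appears with its child NetRAID resource datarep-ext3-sdr beneath it
    /opt/LifeKeeper/bin/lcdstatus -q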

Failover Scenarios

The following four examples show what happens during a failover using SteelEye DataKeeper. In these examples, the LifeKeeper for Linux cluster consists of two servers, Server 1 (primary server) and Server 2 (backup server). 

Scenario 1

Server 1 has successfully completed its replication to Server 2 after which Server 1 becomes inoperable.

[Figure sdrfail1.gif: Scenario 1, Server 1 fails after replication to Server 2 completes]

Result: Failover occurs. Server 2 now takes on the role of primary server and operates in a degraded mode (with no backup) until Server 1 is again operational. SteelEye DataKeeper will then initiate a resynchronization from Server 2 to Server 1. On kernels 2.6.18 and lower, this is a full resynchronization. On kernels 2.6.19 and later, or on Red Hat Enterprise Linux 5.4 kernels 2.6.18-164 and later (or a supported derivative of Red Hat 5.4 or later), the resynchronization is partial: only the changed blocks recorded in the bitmap files on the source and target need to be synchronized.

Note:  SteelEye DataKeeper sets the following flag on the server that is currently acting as the mirror source:

$LKROOT/subsys/scsi/resources/netraid/$TAG_last_owner

When Server 1 fails over to Server 2, this flag is set on Server 2. When Server 1 comes back up, SteelEye DataKeeper removes the last owner flag from Server 1 and then begins resynchronizing the data from Server 2 to Server 1.
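
The flag is an ordinary file under the LifeKeeper configuration tree, so its presence can be checked directly from a shell. A minimal sketch, assuming LKROOT points at the LifeKeeper installation root (commonly /opt/LifeKeeper) and using a hypothetical resource tag:

    # Check whether this server is currently marked as the last mirror source
    LKROOT=${LKROOT:-/opt/LifeKeeper}   # assumed default install location
    TAG=datarep-ext3-sdr                # hypothetical NetRAID resource tag
    ls -l "$LKROOT/subsys/scsi/resources/netraid/${TAG}_last_owner"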

Scenario 2

Considering scenario 1, Server 2 (still the primary server) becomes inoperable during the resynchronization with Server 1 (now the backup server).

Result: Because the resynchronization process did not complete successfully, there is potential for data corruption. As a result, LifeKeeper will not attempt to fail over the DataKeeper resource to Server 1. Only when Server 2 becomes operational again will LifeKeeper attempt to bring the DataKeeper resource in-service (ISP) on Server 2.

Scenario 3

Both Server 1 (primary) and Server 2 (target) become inoperable. Server 1 (primary) comes back up first.

[Figure sdrfail3.gif: Scenario 3, both servers fail and Server 1 comes back up first]

Result: Server 1 will not bring the DataKeeper resource in-service. The reason is that when a source server goes down and then cannot communicate with the target after it comes back online, it sets the following flag:

$LKROOT/subsys/scsi/resources/netraid/$TAG_data_corrupt

This is a safeguard to avoid resynchronizing data in the wrong direction. In this case you will need to force the mirror online on Server 1, which will delete the data_corrupt flag and bring the resource into service on Server 1. See Force Mirror Online.

Note: The user must be certain that Server 1 was the last primary before removing the $TAG_data_corrupt file. Otherwise, data corruption might occur. You can verify this by checking for the presence of the last_owner flag, as shown in the sketch below.
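
A minimal shell sketch of that check, under the same assumptions as above (LKROOT defaulting to /opt/LifeKeeper and a hypothetical resource tag):

    LKROOT=${LKROOT:-/opt/LifeKeeper}
    TAG=datarep-ext3-sdr
    FLAGDIR="$LKROOT/subsys/scsi/resources/netraid"

    # Only force the mirror online on this server if it was the last mirror source
    if [ -e "$FLAGDIR/${TAG}_last_owner" ]; then
        echo "This server was the last mirror source; it is safe to force the mirror online here."
    else
        echo "No last_owner flag found; do not force the mirror online on this server."
    fi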

Scenario 4

Both Server 1 (primary) and Server 2 (target) become inoperable. Server 2 (target) comes back up first.

[Figure sdrfail4.gif: Scenario 4, both servers fail and Server 2 comes back up first]

Result: LifeKeeper will not bring the DataKeeper resource ISP on Server 2. When Server 1 comes back up, LifeKeeper will automatically bring the DataKeeper resource ISP on Server 1.
