Replacing a node in a LifeKeeper for Windows cluster that has DataKeeper mirrored volume resources involves making the following changes:

  • Move any DataKeeper Volume resources from the node being replaced to another node in the cluster
  • Using the registry editor, remove the node being replaced from the DataKeeper job for each mirrored volume. For 1×1 mirrors (2 node cluster) delete the DataKeeper job.
  • For each mirrored volume, use EMCMD to delete mirrors from the source system to the target node which is being replaced
  • Remove the node from the cluster
    • Version 8.9.0 and later – use the delallsys.pl utility to remove the system
    • Versions prior to 8.9.0 – manually remove comm paths, equivalencies, and the system from the LifeKeeper Configuration Database
  • Bring up the replacement node
  • Use the DataKeeper GUI to re-create mirrors to the new node
  • Add the node to the LifeKeeper cluster

Two cases require slightly different steps to achieve node replacement. In the first case, a cluster node has been lost and cannot be recovered. In the second case, a node is scheduled for replacement but is still up and running beforehand.

Within those two cases are two scenarios that also require slightly different steps. The first scenario is a two-node cluster, with one mirror for each clustered volume. The second scenario is a three-node cluster, or a two-node cluster with a node outside the cluster.

Case 1 – Node is lost and not recoverable

SCENARIO 1: Two-node cluster with no nodes outside the cluster

In this example, there is a two-node DKCE cluster. The cluster nodes are:

  • W16-1
  • W16-2

There are two mirrored volumes – E: and F:.

Node W16-2 has been lost and is not recoverable. It will be replaced with a new node, also named W16-2.

Step 1 – Move any DataKeeper Volume resources from the node being replaced to another node in the cluster

LifeKeeper hierarchies are in service on node W16-1.

Step 2 – Using the registry editor, delete the jobs that contain mirrored volumes.

DataKeeper Jobs are stored in the Windows registry, in the following registry key:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\ExtMirr\Parameters\Jobs

Start the registry editor and navigate to that key.

Each DataKeeper job has an ID, which you’ll see as a subkey in the “Jobs” key. For example, on this system there are two jobs “E” and “F”. The job ID is listed in the output of the “emcmd . getjobinfo” command:


C:\Program Files (x86)\SIOS\DataKeeper>emcmd . getjobinfo
ID = 27473bab-6269-45c6-a49f-ee2499e6a406
Name = E
Description =
MirrorEndPoints = W16-1.SIOS.LOCAL;E;172.31.58.101;W16-2.SIOS.LOCAL;E;172.31.27.23;A

ID = 2b37cb2b-3ba9-4ba4-9ff9-be8f3f9daa7e
Name = F
Description =
MirrorEndPoints = W16-1.SIOS.LOCAL;F;172.31.58.101;W16-2.SIOS.LOCAL;F;172.31.27.23;A

The registry shows the two Job IDs.

Each job should be deleted, unless it contains information about mirrored volumes that are not part of the cluster. To delete the job, right-click the key whose name is the job ID, and choose “Delete”. This removes the job completely from DataKeeper on this system.

Note: Some DataKeeper jobs contain information for more than one volume.

After completing this step on one of the cluster nodes, repeat it on all other cluster nodes.
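When several jobs exist, matching each job name to its registry subkey by hand is error-prone. The following Python sketch parses saved “emcmd . getjobinfo” output and builds the registry key path for each job. The names parse_job_info and job_registry_keys are hypothetical helpers, not part of DataKeeper, and the parser assumes the output has the same shape as the sample shown earlier in this step. It only reads text and does not modify the registry.

```python
# Registry key that holds the DataKeeper jobs (from the documentation above).
JOBS_KEY = r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\ExtMirr\Parameters\Jobs"


def parse_job_info(text):
    """Parse saved `emcmd . getjobinfo` output into a list of job dicts."""
    jobs = []
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue  # blank line or command prompt line
        key, value = key.strip(), value.strip()
        if key == "ID":
            # Every job block starts with its ID line.
            jobs.append({"ID": value, "Name": "", "Description": "",
                         "MirrorEndPoints": []})
        elif not jobs:
            continue  # ignore anything before the first ID line
        elif key == "MirrorEndPoints":
            jobs[-1]["MirrorEndPoints"].append(value)
        elif key in ("Name", "Description"):
            jobs[-1][key] = value
    return jobs


def job_registry_keys(jobs):
    """Map each job name to the registry subkey named after its job ID."""
    return {job["Name"]: JOBS_KEY + "\\" + job["ID"] for job in jobs}
```

Feeding the sample output for jobs “E” and “F” to parse_job_info and then job_registry_keys gives one key path per job, which can be compared against the subkeys visible in the registry editor before deleting anything.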

Step 3 – For each mirrored volume, use EMCMD to delete mirrors from the source system to the target node which is being replaced

To delete a mirror using EMCMD, start a CMD prompt on the mirror source node. Then change directory to the DataKeeper install directory using the command “cd /d %ExtMirrBase%”.

To delete the mirror for a mirrored volume that has only one target, run this command:

emcmd . deletemirror <vol>

In this case, run the commands:


C:\Users\administrator.SIOS>cd /d %extmirrbase%

C:\Program Files (x86)\SIOS\DataKeeper>emcmd . getmirrorvolinfo e
E: 1 W16-1 172.31.27.23 1

C:\Program Files (x86)\SIOS\DataKeeper>emcmd . getmirrorvolinfo f
F: 1 W16-1 172.31.27.23 1

C:\Program Files (x86)\SIOS\DataKeeper>emcmd . deletemirror e
         Status = 0

C:\Program Files (x86)\SIOS\DataKeeper>emcmd . deletemirror f
         Status = 0

Step 4 – Remove the node from the cluster

At this point, the node has been completely removed from DataKeeper. The next step in the replacement process is to remove the node from the LifeKeeper cluster.

Version 8.9.0 and later – use delallsys.pl

Command: delallsys [-f] [-e] <sys>

[-f] force items to be deleted — without this option, NOTHING will be deleted. Run without this first to see what this command will delete!
[-e] delete equivalencies only, not comm paths
<sys> delete all resource equivalencies and comm paths to <sys>
where <sys> is permanently out of the cluster
(i.e., dead) and is being replaced

Note: LifeKeeper MUST be running before you can run this utility.

Note: Run this command with extreme caution and only under the direction of the SIOS Support Team. If you have any questions please contact Support at support@us.sios.com.

If your LifeKeeper version is 8.9.0 or later, the delallsys.pl utility is included with LifeKeeper. To use this tool, follow these steps.

  1. Open a CMD prompt
  2. Change directory to the LifeKeeper bin directory by running the command “cd /d %LKBIN%”
  3. Start a shell session by running the command “sh”
  4. Set PATH using the command “export PATH=$PATH:/bin”
  5. Run the delallsys.pl script by running the command “perl delallsys.pl -f <node name>”
  6. Run “exit” to return to the CMD prompt


C:\Users\administrator.SIOS>cd /d %LKBIN%

C:\LK\Bin>sh
$ export PATH=$PATH:/bin
$ perl delallsys.pl -f W16-2
Removing equivalency 17.17.17.17 -> 17.17.17.17 on W16-2…
Removing equivalency Vol.E -> Vol.E on W16-2…
Removing equivalency Vol.F -> Vol.F on W16-2…
Removing network comm path(s) 172.31.58.101 -> 172.31.27.23…
$ exit

C:\LK\Bin>
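The usage text above recommends a preview run without -f before the forced run. A minimal Python sketch that builds both command lines in that order follows. The name delallsys_commands is a hypothetical helper; it only formats the strings to type inside the “sh” session and does not execute anything.

```python
def delallsys_commands(node, equivalencies_only=False):
    """Return [preview_command, forced_command] for the given node.

    The preview omits -f, which per the usage text deletes nothing.
    """
    extra = ["-e"] if equivalencies_only else []
    preview = ["perl", "delallsys.pl", *extra, node]
    forced = ["perl", "delallsys.pl", "-f", *extra, node]
    return [" ".join(preview), " ".join(forced)]


if __name__ == "__main__":
    for command in delallsys_commands("W16-2"):
        print(command)
```

Reviewing the preview output before running the forced command keeps the deletion limited to the node that is actually being replaced.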

Versions prior to 8.9.0 – manually remove the node

If your LifeKeeper version is earlier than 8.9.0, the LifeKeeper system must be manually removed from the configuration database. To do this, follow these steps:

  1. Open a CMD prompt
  2. Change directory to the LifeKeeper bin directory by running the command “cd /d %LKBIN%”
  3. Remove equivalencies:
    a. Run “eqv_list” to get a list of equivalencies. Identify all of the ones that contain the node being replaced (W16-2):


C:\LK\Bin> eqv_list
W16-1▯17.17.17.17▯W16-2▯17.17.17.17▯SHARED▯1▯10
W16-1▯Vol.E▯W16-2▯Vol.E▯SHARED▯1▯10
W16-1▯Vol.F▯W16-2▯Vol.F▯SHARED▯1▯10

    b. For each instance tag (Vol.E, 17.17.17.17, and Vol.F in this case), find the local tag (the second item in each entry, after the first “▯” character) and the remote tag (the 4th item). Run the command “eqv_remove -t <localtag> -S <node being removed> -o <remotetag> -e <instancetype>”.


C:\LK\Bin>eqv_remove -t Vol.E -S W16-2 -o Vol.E -e SHARED
C:\LK\Bin>eqv_remove -t 17.17.17.17 -S W16-2 -o 17.17.17.17 -e SHARED
C:\LK\Bin>eqv_remove -t Vol.F -S W16-2 -o Vol.F -e SHARED

    c. Verify that the eqv_list output is now empty:


C:\LK\Bin>eqv_list

C:\LK\Bin>

  4. Remove LifeKeeper communication paths to the node being replaced
    a. Run “net_list” to get a list of communication paths. Identify all of the paths that contain the node being replaced (W16-2):


C:\LK\Bin>net_list
W16-2▯TCPIP:1500▯TLI▯0▯0▯0▯ALIVE▯1▯6▯5▯0▯0▯0▯0▯0▯0▯0▯0▯172.31.27.23▯172.31.58.101

    b. For each communication path to be removed, get the device name (the 2nd item in the list, after the first “▯” character). Run the command “net_remove -D <devicename>” to delete the communication path. Then run “net_list” to verify that the communication path has been removed.


C:\LK\Bin>net_remove -D TCPIP:1500

C:\LK\Bin>net_list

C:\LK\Bin>

  5. Remove the node from the list of LifeKeeper systems by running “sys_remove -s <node being removed>”. Verify that the system was removed by running “sys_list”:


C:\LK\Bin>sys_remove -s W16-2

C:\LK\Bin>sys_list
W16-1
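The equivalency removal in the manual procedure above can be sketched in Python: the function below reads saved “eqv_list” output and produces one “eqv_remove” command per equivalency that involves the node being removed. The name eqv_remove_commands is a hypothetical helper. The field separator is an assumption: it is the “▯” character as rendered in this document, and real output may use a different (possibly non-printing) character, so it is a parameter. The field positions follow the description above (local tag second, remote system third, remote tag fourth, type fifth).

```python
def eqv_remove_commands(eqv_list_output, node, sep="▯"):
    """Build eqv_remove commands for every equivalency whose remote system is `node`."""
    commands = []
    for line in eqv_list_output.splitlines():
        fields = line.strip().split(sep)
        # Skip prompt lines, blank lines, and equivalencies with other nodes.
        if len(fields) < 5 or fields[2] != node:
            continue
        local_tag, remote_tag, eqv_type = fields[1], fields[3], fields[4]
        commands.append(
            f"eqv_remove -t {local_tag} -S {node} -o {remote_tag} -e {eqv_type}"
        )
    return commands


if __name__ == "__main__":
    sample = (
        "W16-1▯17.17.17.17▯W16-2▯17.17.17.17▯SHARED▯1▯10\n"
        "W16-1▯Vol.E▯W16-2▯Vol.E▯SHARED▯1▯10\n"
        "W16-1▯Vol.F▯W16-2▯Vol.F▯SHARED▯1▯10\n"
    )
    for command in eqv_remove_commands(sample, "W16-2"):
        print(command)
```

The sketch only prints commands so that they can be reviewed before being run from the LifeKeeper bin directory.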

Step 5 – Bring up the replacement node and add it to the cluster

Configure the new node, adding storage as appropriate. Then add it to the cluster.

Step 6 – Use the DataKeeper GUI to re-create mirrors to the new node

Start the DataKeeper GUI, connect to the new node, and create a mirror to it within the appropriate job.

SCENARIO 2: Three-or-more-node cluster, or two-node cluster with one or more nodes outside the cluster

In this example, there is a three-node DKCE cluster. The cluster nodes are:

  • W16-1
  • W16-2
  • W16-3

There are two mirrored volumes – E: and F:.

Node W16-3 has been lost and is not recoverable. It will be replaced with a new node, also named W16-3.

Step 1 – Move any DataKeeper Volume resources from the node being replaced to another node in the cluster

LifeKeeper hierarchies are in service on node W16-1.

Step 2 – Using the registry editor, remove the node being replaced from the DataKeeper job for each mirrored volume.

DataKeeper Jobs are stored in the Windows registry. To modify a job that is configured on a node that is not accessible, update the registry values associated with the job.

DataKeeper Jobs are stored in the following registry key:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\ExtMirr\Parameters\Jobs

Start the registry editor and navigate to that key.

Each DataKeeper job has an ID, which you’ll see as a subkey in the “Jobs” key. For example, on this system there are two jobs “E” and “F”. The job ID is listed in the output of the “emcmd . getjobinfo” command:


C:\Program Files (x86)\SIOS\DataKeeper>emcmd . getjobinfo
ID = c1bfee06-ec2e-4d72-b8a1-81c649259541
Name = E
Description =
MirrorEndPoints = W16-2.SIOS.LOCAL;E;172.31.27.23;W16-1.SIOS.LOCAL;E;172.31.58.101;A
MirrorEndPoints = W16-3.SIOS.LOCAL;E;172.31.39.191;W16-2.SIOS.LOCAL;E;172.31.27.23;A
MirrorEndPoints = W16-3.SIOS.LOCAL;E;172.31.39.191;W16-1.SIOS.LOCAL;E;172.31.58.101;A

ID = f6a74540-e012-4b3b-8687-933905d0d1f1
Name = F
Description =
MirrorEndPoints = W16-2.SIOS.LOCAL;F;172.31.27.23;W16-1.SIOS.LOCAL;F;172.31.58.101;A
MirrorEndPoints = W16-3.SIOS.LOCAL;F;172.31.39.191;W16-2.SIOS.LOCAL;F;172.31.27.23;A
MirrorEndPoints = W16-3.SIOS.LOCAL;F;172.31.39.191;W16-1.SIOS.LOCAL;F;172.31.58.101;A

The registry shows the two Job IDs. Navigate into one of them – you will see that it contains 3 values: Name, Description, and Endpoints.

To remove a node from a job, the Endpoints value needs to be modified. Double-click the Endpoints value and find any lines containing the node that is to be removed.

In this case, the 2nd and 3rd lines should be removed. Highlight them and press the Delete key, then click “OK” to save the value. A single line should remain (for the mirror between W16-1 and W16-2).
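To double-check which lines should remain before editing the value, the filtering rule can be sketched in Python. The name endpoints_without_node is a hypothetical helper, and the sketch assumes each line has the semicolon-separated shape shown in the “emcmd . getjobinfo” output (system name, volume, IP address, system name, volume, IP address, type). It works on copied text only and does not write to the registry.

```python
def endpoints_without_node(endpoints, node):
    """Return the endpoint lines that do not reference `node`.

    `node` may be a short name (W16-3) or a fully qualified name.
    """
    node = node.lower()
    kept = []
    for line in endpoints:
        fields = line.split(";")
        # Fields 0 and 3 hold the two system names in the sample output.
        hosts = {fields[0].lower(), fields[3].lower()} if len(fields) >= 4 else set()
        short_names = {host.split(".")[0] for host in hosts}
        if node in hosts or node in short_names:
            continue  # this mirror involves the node being removed
        kept.append(line)
    return kept


if __name__ == "__main__":
    job_e = [
        "W16-2.SIOS.LOCAL;E;172.31.27.23;W16-1.SIOS.LOCAL;E;172.31.58.101;A",
        "W16-3.SIOS.LOCAL;E;172.31.39.191;W16-2.SIOS.LOCAL;E;172.31.27.23;A",
        "W16-3.SIOS.LOCAL;E;172.31.39.191;W16-1.SIOS.LOCAL;E;172.31.58.101;A",
    ]
    for line in endpoints_without_node(job_e, "W16-3"):
        print(line)
```

For job “E” in this example, only the line for the mirror between W16-1 and W16-2 is kept.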

Repeat these steps for all jobs. When completed, “emcmd . getjobinfo” will reflect the new job contents:


C:\Program Files (x86)\SIOS\DataKeeper>emcmd . getjobinfo
ID = c1bfee06-ec2e-4d72-b8a1-81c649259541
Name = E
Description =
MirrorEndPoints = W16-2.SIOS.LOCAL;E;172.31.27.23;W16-1.SIOS.LOCAL;E;172.31.58.101;A

ID = f6a74540-e012-4b3b-8687-933905d0d1f1
Name = F
Description =
MirrorEndPoints = W16-2.SIOS.LOCAL;F;172.31.27.23;W16-1.SIOS.LOCAL;F;172.31.58.101;A

Note: Some DataKeeper jobs contain information for more than one volume. In those cases, the same steps should be followed – remove any lines that contain references to the node being removed.

After completing this step on one of the cluster nodes, repeat it on all other cluster nodes. An alternative is to export the “Jobs” key to a file, and import that key on each of the other nodes. This ensures that job information is consistent across the nodes.
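The export and import alternative can be carried out with the standard Windows “reg export” and “reg import” commands. The Python sketch below only builds the two command lines; job_sync_commands and the file name dk_jobs.reg are hypothetical, chosen for illustration. Note that “reg import” merges the file into the registry, so a job subkey that exists on the destination node but not in the exported file is not removed by the import.

```python
# Same key as above, written with the HKLM abbreviation accepted by reg.exe.
JOBS_KEY = r"HKLM\System\CurrentControlSet\Services\ExtMirr\Parameters\Jobs"


def job_sync_commands(export_file="dk_jobs.reg"):
    """Return the command to run on the edited node and on each other node."""
    return {
        # Run on the node where the jobs were edited; /y overwrites an existing file.
        "export": f'reg export "{JOBS_KEY}" "{export_file}" /y',
        # Run on each other node after copying the file there.
        "import": f'reg import "{export_file}"',
    }


if __name__ == "__main__":
    for step, command in job_sync_commands().items():
        print(f"{step}: {command}")
```

After importing, running “emcmd . getjobinfo” on each node is a simple way to confirm that the job contents match.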

Step 3 – For each mirrored volume, use EMCMD to delete mirrors from the source system to the target node which is being replaced

To delete a mirror using EMCMD, start a CMD prompt on the mirror source node. Then change directory to the DataKeeper install directory using the command “cd /d %ExtMirrBase%”.

To delete the mirror whose target is the node being removed, run the command:

emcmd . deletemirror <vol> <target_ip>

using the volume letter and IP address of the node being removed. In this case, run the commands:


C:\Users\administrator.SIOS>cd /d %extmirrbase%

C:\Program Files (x86)\SIOS\DataKeeper>emcmd . getmirrorvolinfo E
E: 1 W16-1 172.31.27.23 1
E: 1 W16-1 172.31.39.191 4

C:\Program Files (x86)\SIOS\DataKeeper>emcmd . getmirrorvolinfo F
F: 1 W16-1 172.31.39.191 4
F: 1 W16-1 172.31.27.23 1

C:\Program Files (x86)\SIOS\DataKeeper>emcmd . deletemirror E 172.31.39.191
         Status = 0

C:\Program Files (x86)\SIOS\DataKeeper>emcmd . deletemirror F 172.31.39.191
         Status = 0
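When there are many mirrored volumes, the “deletemirror” commands can be derived from saved “emcmd . getmirrorvolinfo” output. The Python sketch below does this under the assumption that each output line has the five columns seen in the sample above, with the target IP address in the fourth column. The name deletemirror_commands is a hypothetical helper, and it only prints commands for review.

```python
def deletemirror_commands(volinfo_output, target_ip):
    """Build one deletemirror command per mirror whose target is `target_ip`."""
    commands = []
    for line in volinfo_output.splitlines():
        fields = line.split()
        # Expected shape, per the sample: "E:", role, source system, target IP, state.
        if len(fields) != 5 or len(fields[0]) != 2 or not fields[0].endswith(":"):
            continue  # prompt line, blank line, or unexpected format
        volume, _role, _source, ip, _state = fields
        if ip == target_ip:
            commands.append(f"emcmd . deletemirror {volume[0]} {target_ip}")
    return commands


if __name__ == "__main__":
    sample = (
        "E: 1 W16-1 172.31.27.23 1\n"
        "E: 1 W16-1 172.31.39.191 4\n"
        "F: 1 W16-1 172.31.39.191 4\n"
        "F: 1 W16-1 172.31.27.23 1\n"
    )
    for command in deletemirror_commands(sample, "172.31.39.191"):
        print(command)
```

Mirrors to the surviving target (172.31.27.23 in this example) are left out, so only the mirrors to the node being replaced are deleted.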

Step 4 – Remove the node from the cluster

At this point, the node has been completely removed from DataKeeper. The next step in the replacement process is to remove the node from the LifeKeeper cluster.

Version 8.9.0 and later – use delallsys.pl

Command: delallsys [-f] [-e] <sys>

[-f] force items to be deleted — without this option, NOTHING will be deleted. Run without this first to see what this command will delete!
[-e] delete equivalencies only, not comm paths
<sys> delete all resource equivalencies and comm paths to <sys>
where <sys> is permanently out of the cluster
(i.e., dead) and is being replaced

Note: LifeKeeper MUST be running before you can run this utility.

Note: Run this command with extreme caution and only under the direction of the SIOS Support Team. If you have any questions please contact Support at support@us.sios.com.

If your LifeKeeper version is 8.9.0 or later, the delallsys.pl utility is included with LifeKeeper. To use this tool, follow these steps.

  1. Open a CMD prompt
  2. Change directory to the LifeKeeper bin directory by running the command “cd /d %LKBIN%”
  3. Start a shell session by running the command “sh”
  4. Set PATH using the command “export PATH=$PATH:/bin”
  5. Run the delallsys.pl script by running the command “perl delallsys.pl -f <node name>”
  6. Run “exit” to return to the CMD prompt


C:\Users\administrator.SIOS>cd /d %LKBIN%

C:\LK\Bin>sh
$ export PATH=$PATH:/bin
$ perl delallsys.pl -f W16-3
Removing equivalency 17.17.17.17 -> 17.17.17.17 on W16-3…
Removing equivalency Vol.E -> Vol.E on W16-3…
Removing equivalency Vol.F -> Vol.F on W16-3…
Removing network comm path(s) 172.31.58.101 -> 172.31.39.191…
$ exit

C:\LK\Bin>

Repeat these steps on all remaining LifeKeeper nodes (W16-2 in this case).

Versions prior to 8.9.0 – manually remove the node

If your LifeKeeper version is earlier than 8.9.0, the LifeKeeper system must be manually removed from the configuration database. To do this, follow these steps:

  1. Open a CMD prompt
  2. Change directory to the LifeKeeper bin directory by running the command “cd /d %LKBIN%”
  3. Remove equivalencies:
    a. Run “eqv_list” to get a list of equivalencies. Identify all of the ones that contain the node being replaced (W16-3):


C:\LK\Bin> eqv_list
W16-1▯17.17.17.17▯W16-2▯17.17.17.17▯SHARED▯1▯10
W16-1▯Vol.E▯W16-3▯Vol.E▯SHARED▯1▯20
W16-1▯Vol.E▯W16-2▯Vol.E▯SHARED▯1▯10
W16-1▯17.17.17.17▯W16-3▯17.17.17.17▯SHARED▯1▯20
W16-1▯Vol.F▯W16-2▯Vol.F▯SHARED▯1▯10
W16-1▯Vol.F▯W16-3▯Vol.F▯SHARED▯1▯20

    b. For each instance tag (Vol.E, 17.17.17.17, and Vol.F in this case), find the local tag (the second item in each entry, after the first “▯” character) and the remote tag (the 4th item). Run the command “eqv_remove -t <localtag> -S <node being removed> -o <remotetag> -e <instancetype>”.


C:\LK\Bin>eqv_remove -t Vol.E -S W16-3 -o Vol.E -e SHARED
C:\LK\Bin>eqv_remove -t 17.17.17.17 -S W16-3 -o 17.17.17.17 -e SHARED
C:\LK\Bin>eqv_remove -t Vol.F -S W16-3 -o Vol.F -e SHARED

    c. Verify that the eqv_list output no longer includes the node being removed:


C:\LK\Bin>eqv_list
W16-1▯Vol.E▯W16-2▯Vol.E▯SHARED▯1▯10
W16-1▯Vol.F▯W16-2▯Vol.F▯SHARED▯1▯10
W16-1▯17.17.17.17▯W16-2▯17.17.17.17▯SHARED▯1▯10

  4. Remove LifeKeeper communication paths to the node being replaced
    a. Run “net_list” to get a list of communication paths. Identify all of the paths that contain the node being replaced (W16-3):


C:\LK\Bin>net_list
W16-2▯TCPIP:1500▯TLI▯0▯0▯0▯ALIVE▯1▯6▯5▯0▯0▯0▯0▯0▯0▯0▯0▯172.31.27.23▯172.31.58.101
W16-3▯TCPIP:1510▯TLI▯0▯0▯0▯DEAD▯2▯6▯5▯1▯0▯0▯0▯0▯0▯0▯0▯172.31.39.191▯172.31.58.101

    b. For each communication path to be removed, get the device name (the 2nd item in the list, after the first “▯” character). Run the command “net_remove -D <devicename>” to delete the communication path. Then run “net_list” to verify that the communication path has been removed.


C:\LK\Bin>net_remove -D TCPIP:1510

C:\LK\Bin>net_list
W16-2▯TCPIP:1500▯TLI▯0▯0▯0▯ALIVE▯1▯6▯5▯0▯0▯0▯0▯0▯0▯0▯0▯172.31.27.23▯172.31.58.101

  5. Remove the node from the list of LifeKeeper systems by running “sys_remove -s <node being removed>”. Verify that the system was removed by running “sys_list”:


C:\LK\Bin>sys_remove -s W16-3

C:\LK\Bin>sys_list
W16-1
W16-2

Repeat these steps on each remaining node in the cluster.
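Because the comm path removal has to be repeated on every remaining node, it can help to generate the “net_remove” commands from saved “net_list” output. The Python sketch below does so; net_remove_commands is a hypothetical helper. As an assumption, the field separator is the “▯” character as rendered in this document (real output may use a different character, so it is a parameter), the remote system is the first field, and the device name is the second.

```python
def net_remove_commands(net_list_output, node, sep="▯"):
    """Build one net_remove command per comm path whose remote system is `node`."""
    commands = []
    for line in net_list_output.splitlines():
        fields = line.strip().split(sep)
        # Skip prompt lines, blank lines, and paths to other systems.
        if len(fields) >= 2 and fields[0] == node:
            commands.append(f"net_remove -D {fields[1]}")
    return commands


if __name__ == "__main__":
    sample = (
        "W16-2▯TCPIP:1500▯TLI▯0▯0▯0▯ALIVE▯1▯6▯5▯0▯0▯0▯0▯0▯0▯0▯0▯172.31.27.23▯172.31.58.101\n"
        "W16-3▯TCPIP:1510▯TLI▯0▯0▯0▯DEAD▯2▯6▯5▯1▯0▯0▯0▯0▯0▯0▯0▯172.31.39.191▯172.31.58.101\n"
    )
    for command in net_remove_commands(sample, "W16-3"):
        print(command)
    print("sys_remove -s W16-3")
```

Only the path to W16-3 is selected, so the live comm path to W16-2 is left in place.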

Step 5 – Bring up the replacement node

Configure the new node, adding storage as appropriate.

Step 6 – Use the DataKeeper GUI to re-create mirrors to the new node

Start the DataKeeper GUI, connect to the new node, and create a mirror to it within the appropriate job.

Step 7 – Add the node to the LifeKeeper cluster

Create comm paths from the existing LifeKeeper nodes to the new replacement node, then extend all hierarchies.

Case 2 – Node is running and can be accessed prior to being replaced

If you are planning to replace a cluster node that is still running, the procedure is very similar to Case 1 – node is lost and not recoverable. Perform the following steps in order:

  • Move any hierarchies from the node being replaced to another node in the cluster
  • Shut down the node that is going to be replaced. After this point, do NOT re-start this node, since it will have invalid mirror and job configuration.
  • Follow the steps described in Case 1 – node is lost and not recoverable.
