Replacing a node in a LifeKeeper for Windows cluster that has DataKeeper mirrored volume resources involves making the following changes:
- Move any DataKeeper Volume resources from the node being replaced to another node in the cluster
- Using the registry editor, remove the node being replaced from the DataKeeper job for each mirrored volume. For 1×1 mirrors (two-node cluster), delete the DataKeeper job instead.
- For each mirrored volume, use EMCMD to delete mirrors from the source system to the target node which is being replaced
- Remove the node from the cluster
- Version 8.9.0 and later – use the delallsys.pl utility to remove the system
- Versions prior to 8.9.0 – manually remove comm paths, equivalencies, and the system from the LifeKeeper Configuration Database
- Bring up the replacement node
- Use the DataKeeper GUI to re-create mirrors to the new node
- Add the node to the LifeKeeper cluster
There are two cases that require slightly different steps to achieve node replacement. In the first case, a cluster node has been lost and cannot be recovered. In the second case, a node is scheduled for replacement but is still up and running until it is replaced.
Each case has two scenarios that also require slightly different steps. The first scenario is a two-node cluster with one mirror for each clustered volume. The second scenario is a cluster of three or more nodes, or a two-node cluster with one or more nodes outside the cluster.
Case 1 – Node is lost and not recoverable
SCENARIO 1: Two-node cluster with no nodes outside the cluster
In this example, there is a two-node DKCE cluster. The cluster nodes are:
- W16-1
- W16-2
There are two mirrored volumes – E: and F:.
Node W16-2 has been lost and is not recoverable. It will be replaced with a new node, also named W16-2.
Step 1 – Move any DataKeeper Volume resources from the node being replaced to another node in the cluster
In this example, LifeKeeper hierarchies are already in service on node W16-1, so no resources need to be moved.
Step 2 – Using the registry editor, delete the jobs that contain mirrored volumes.
DataKeeper Jobs are stored in the Windows registry, in the following registry key:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\ExtMirr\Parameters\Jobs
Start the registry editor and navigate to that key.
Each DataKeeper job has an ID, which appears as a subkey of the “Jobs” key. For example, this system has two jobs, “E” and “F”. The job IDs are listed in the output of the “emcmd . getjobinfo” command.
The registry shows the two Job IDs.
Each job should be deleted, unless it contains information about mirrored volumes that are not part of the cluster. To delete the job, right-click the key whose name is the job ID, and choose “Delete”. This removes the job completely from DataKeeper on this system.
Note: Some DataKeeper jobs contain information for more than one volume.
After completing this step on one of the cluster nodes, repeat it on all other cluster nodes.
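If you prefer to script Step 2 instead of clicking through the registry editor, the deletions map onto the standard Windows `reg delete` command. The Python sketch below only builds the command lines for review; the helper name `job_delete_commands` is hypothetical, and the job IDs are placeholders that you would copy from the `emcmd . getjobinfo` output.

```python
# Hypothetical helper for Step 2 (Scenario 1): build "reg delete" command lines
# for DataKeeper jobs. It only returns strings; nothing is executed.
JOBS_KEY = r"HKLM\System\CurrentControlSet\Services\ExtMirr\Parameters\Jobs"


def job_delete_commands(job_ids):
    """Return one 'reg delete' command per job ID.

    Deleting the job-ID subkey removes the job and all of its values.
    /f suppresses the confirmation prompt, so review the list before running it,
    and leave out any job that also describes mirrors outside the cluster.
    """
    return [f'reg delete "{JOBS_KEY}\\{job_id}" /f' for job_id in job_ids]


if __name__ == "__main__":
    # Placeholder IDs: replace with the IDs reported by "emcmd . getjobinfo".
    for cmd in job_delete_commands(["<job-id-1>", "<job-id-2>"]):
        print(cmd)
```

Writing under HKEY_LOCAL_MACHINE requires administrator rights, so the printed commands would be run from an elevated CMD prompt, once on each cluster node.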
Step 3 – For each mirrored volume, use EMCMD to delete mirrors from the source system to the target node which is being replaced
To delete a mirror using EMCMD, start a CMD prompt on the mirror source node. Then change directory to the DataKeeper install directory using the command “cd /d %ExtMirrBase%”.
To delete the mirror for a mirrored volume that has only one target, run this command:
emcmd . deletemirror <vol>
In this case, run the command once for each mirrored volume (E: and F:). For example, the command for volume F is:
C:\Program Files (x86)\SIOS\DataKeeper>emcmd . deletemirror f
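Because Step 3 repeats the same EMCMD call for every mirrored volume, the command list can be generated. This is a minimal sketch with a hypothetical helper name; it normalizes input such as "F:" to the lowercase drive letter used in the example above and does not run anything.

```python
# Hypothetical helper for Step 3 (Scenario 1): one "emcmd . deletemirror <vol>"
# command per mirrored volume. Commands are returned as strings, not executed.


def deletemirror_commands(volumes):
    """Build single-target deletemirror commands.

    Accepts "F", "f", or "F:" and emits the bare lowercase drive letter,
    matching the "emcmd . deletemirror f" example. Run the result from
    %ExtMirrBase% on the mirror source node.
    """
    commands = []
    for vol in volumes:
        letter = vol.strip().rstrip(":").lower()
        if len(letter) != 1 or not letter.isalpha():
            raise ValueError(f"expected a drive letter, got {vol!r}")
        commands.append(f"emcmd . deletemirror {letter}")
    return commands


if __name__ == "__main__":
    for cmd in deletemirror_commands(["E:", "F:"]):
        print(cmd)
```

Validating the drive letter up front means a typo such as "EF" raises an error in the script rather than being passed to EMCMD.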
Step 4 – Remove the node from the cluster
At this point, the node has been completely removed from DataKeeper. The next step in the replacement process is to remove the node from the LifeKeeper cluster.
Version 8.9.0 and later – use delallsys.pl
If your LifeKeeper version is 8.9.0 or later, the delallsys.pl utility is included with LifeKeeper. To use this tool, follow these steps.
- Open a CMD prompt
- Change directory to the LifeKeeper bin directory by running the command “cd /d %LKBIN%”
- Start a shell session by running the command “sh”
- Set PATH using the command “export PATH=$PATH:/bin”
- Run the delallsys.pl script by running the command “perl delallsys.pl -f <node name>”
- Run “exit” to return to the CMD prompt
Versions prior to 8.9.0 – manually remove the node
If your LifeKeeper version is earlier than 8.9.0, the LifeKeeper system must be manually removed from the configuration database. To do this, follow these steps:
- Open a CMD prompt
- Change directory to the LifeKeeper bin directory by running the command “cd /d %LKBIN%”
- Remove equivalencies:
a. Run “eqv_list” to get a list of equivalencies. Identify all of the entries that contain the node being replaced (W16-2).
b. For each instance tag (Vol.E, 17.17.17.17, and Vol.F in this case), find the local tag (the second field in each entry; fields are separated by the “▯” character) and the remote tag (the fourth field). Then run the command “eqv_remove -t <localtag> -S <node being removed> -o <remotetag> -e <instancetype>”.
c. Verify that the eqv_list output is now empty.
- Remove LifeKeeper communication paths to the node being replaced
a. Run “net_list” to get a list of communication paths. Identify all of the paths that contain the node being replaced (W16-2).
b. For each communication path to be removed, get the device name (the 2nd item in the list after the “▯” character). Run the command “net_remove -D <devicename>” to delete the communication path. Then run “net_list” to verify that the communication path has been removed.
- Remove the node from the list of LifeKeeper systems by running “sys_remove -s <node being removed>”. Verify that the system was removed by running “sys_list”
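The manual removal for pre-8.9.0 versions follows a mechanical pattern, so it can be sketched as a small Python helper that reads saved `eqv_list` and `net_list` output and produces the `eqv_remove`, `net_remove`, and `sys_remove` commands for review. All helper names are hypothetical, and the field layout is an assumption based only on the descriptions above (local tag second, remote tag fourth, device name second); the instance type passed to `-e` is assumed to be the fifth field, and the separator character must be supplied because it only displays as “▯”.

```python
# Sketch: build LifeKeeper removal commands from saved eqv_list / net_list output.
# ASSUMPTIONS, not confirmed by this procedure:
#   * one entry per line, fields separated by a single character `sep`
#     (it displays as a box; pass the real character from the raw output);
#   * eqv_list fields: [local system, local tag, remote system, remote tag, type, ...],
#     where "type" is the value used for -e;
#   * net_list field 2 (index 1) is the device name used for net_remove -D.
import re


def _entries(output, sep):
    """Split non-blank lines into whitespace-stripped fields."""
    return [[field.strip() for field in line.split(sep)]
            for line in output.splitlines() if line.strip()]


def _mentions(fields, node):
    """True if any field is exactly the node name (case-insensitive)."""
    return any(field.lower() == node.lower() for field in fields)


def eqv_remove_commands(eqv_list_output, node, sep):
    commands = []
    for fields in _entries(eqv_list_output, sep):
        if not _mentions(fields, node):
            continue
        if len(fields) < 5:
            raise ValueError(f"unexpected eqv_list entry: {fields}")
        local_tag, remote_tag, eqv_type = fields[1], fields[3], fields[4]
        commands.append(f"eqv_remove -t {local_tag} -S {node} -o {remote_tag} -e {eqv_type}")
    return commands


def net_remove_commands(net_list_output, node, sep):
    commands = []
    for fields in _entries(net_list_output, sep):
        if not _mentions(fields, node):
            continue
        if len(fields) < 2:
            raise ValueError(f"unexpected net_list entry: {fields}")
        commands.append(f"net_remove -D {fields[1]}")
    return commands


def removal_plan(eqv_list_output, net_list_output, node, sep):
    """Commands in the order used above: equivalencies, comm paths, then the system."""
    return (eqv_remove_commands(eqv_list_output, node, sep)
            + net_remove_commands(net_list_output, node, sep)
            + [f"sys_remove -s {node}"])


def still_referenced(listing_output, node):
    """Verification aid: does eqv_list, net_list, or sys_list output still name the node?

    The name must not be glued to a letter, digit, or hyphen, so "W16-3"
    does not match "W16-30".
    """
    pattern = rf"(?<![A-Za-z0-9-]){re.escape(node)}(?![A-Za-z0-9-])"
    return re.search(pattern, listing_output, re.IGNORECASE) is not None
```

For this scenario you would call `removal_plan(eqv_text, net_text, "W16-2", sep)` and run each printed command from %LKBIN%. Afterwards, `still_referenced` applied to fresh eqv_list, net_list, and sys_list output should return False.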
Step 5 – Bring up the replacement node and add it to the cluster
Configure the new node, adding storage as appropriate. Then add it to the cluster.
Step 6 – Use the DataKeeper GUI to re-create mirrors to the new node
Start the DataKeeper GUI, connect to the new node, and create a mirror to it within the appropriate job.
SCENARIO 2: Three-or-more-node cluster, or two-node cluster with one or more nodes outside the cluster
In this example, there is a three-node DKCE cluster. The cluster nodes are:
- W16-1
- W16-2
- W16-3
There are two mirrored volumes – E: and F:.
Node W16-3 has been lost and is not recoverable. It will be replaced with a new node, also named W16-3.
Step 1 – Move any DataKeeper Volume resources from the node being replaced to another node in the cluster
In this example, LifeKeeper hierarchies are already in service on node W16-1, so no resources need to be moved.
Step 2 – Using the registry editor, remove the node being replaced from the DataKeeper job for each mirrored volume.
To modify a job that includes a node that is no longer accessible, update the registry values associated with the job. DataKeeper jobs are stored in the following registry key:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\ExtMirr\Parameters\Jobs
Start the registry editor and navigate to that key.
Each DataKeeper job has an ID, which appears as a subkey of the “Jobs” key. For example, this system has two jobs, “E” and “F”. The job IDs are listed in the output of the “emcmd . getjobinfo” command.
The registry shows the two Job IDs. Navigate into one of them – you will see that it contains 3 values: Name, Description, and Endpoints.
To remove a node from a job, the Endpoints value needs to be modified. Double-click the Endpoints value and find any lines containing the node that is to be removed.
In this case, the second and third lines should be removed. Highlight them, press the Delete key, and then click “OK” to save the value. A single line should remain (for the mirror between W16-1 and W16-2).
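The exact layout of each Endpoints line is not described here, so the sketch below treats every line as opaque text and keeps only the lines that do not mention the node being removed, identified by its name and/or IP address. The function name is hypothetical, and the sample values in the usage note are invented for illustration; the kept lines are what you would leave in the value in the registry editor.

```python
import re


def remove_node_from_endpoints(endpoint_lines, node_identifiers):
    """Return the Endpoints lines that do not mention the node being removed.

    endpoint_lines:   the lines of the job's Endpoints value, as shown in regedit.
    node_identifiers: strings that identify the removed node (e.g. name and IP).
    A match must not be glued to a letter, digit, or hyphen, so "W16-3" does not
    match "W16-30" and "10.0.0.3" does not match "10.0.0.30".
    """
    patterns = [
        re.compile(rf"(?<![A-Za-z0-9-]){re.escape(ident)}(?![A-Za-z0-9-])", re.IGNORECASE)
        for ident in node_identifiers
    ]
    return [line for line in endpoint_lines
            if not any(p.search(line) for p in patterns)]
```

For example, `remove_node_from_endpoints(lines, ["W16-3", "10.0.0.3"])`, with a made-up IP address, would drop every line that references W16-3 while leaving a line for a similarly named host such as W16-30 untouched.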
Repeat these steps for all jobs. When you are done, the output of “emcmd . getjobinfo” reflects the new job contents.
Note: Some DataKeeper jobs contain information for more than one volume. In those cases, the same steps should be followed – remove any lines that contain references to the node being removed.
After completing this step on one of the cluster nodes, repeat it on all other cluster nodes. An alternative is to export the “Jobs” key to a file, and import that key on each of the other nodes. This ensures that job information is consistent across the nodes.
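The export/import alternative corresponds to the standard `reg export` and `reg import` commands. This sketch, with a hypothetical helper name and file path, builds both command lines so that the same key path is used on every node.

```python
# Hypothetical helper: commands to copy the "Jobs" key from the node where it was
# edited to the other cluster nodes, using the standard reg.exe export/import verbs.
JOBS_KEY = r"HKLM\System\CurrentControlSet\Services\ExtMirr\Parameters\Jobs"


def export_import_commands(reg_file):
    """Return (export_cmd, import_cmd) for the DataKeeper Jobs key.

    /y lets "reg export" overwrite an existing file without prompting.
    """
    export_cmd = f'reg export "{JOBS_KEY}" "{reg_file}" /y'
    import_cmd = f'reg import "{reg_file}"'
    return export_cmd, import_cmd


if __name__ == "__main__":
    exp, imp = export_import_commands(r"C:\temp\jobs.reg")
    print("On the edited node:", exp)
    print("On each other node:", imp)
```

One design point to keep in mind: `reg import` merges the file into the registry. It overwrites values that are present in the file, such as an edited Endpoints value, but it does not delete job subkeys that exist on the target node and are absent from the file.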
Step 3 – For each mirrored volume, use EMCMD to delete mirrors from the source system to the target node which is being replaced
To delete a mirror using EMCMD, start a CMD prompt on the mirror source node. Then change directory to the DataKeeper install directory using the command “cd /d %ExtMirrBase%”.
To delete the mirror whose target is the node being removed, run the command:
emcmd . deletemirror <vol> <target_ip>
using the volume letter and the IP address of the node being removed. In this case, run the command once for volume E and once for volume F.
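In this scenario, each `deletemirror` call also names the target IP address. The sketch below, with a hypothetical helper name, builds one command per volume and parses the IP address with Python's standard `ipaddress` module, so that a mistyped address fails in the script instead of reaching EMCMD.

```python
import ipaddress


def deletemirror_to_target_commands(volumes, target_ip):
    """Build "emcmd . deletemirror <vol> <target_ip>" commands (hypothetical helper).

    Volumes may be given as "E", "e", or "E:"; the bare lowercase letter is used.
    Run the result from %ExtMirrBase% on the mirror source node.
    """
    ip = str(ipaddress.ip_address(target_ip.strip()))  # raises ValueError if malformed
    commands = []
    for vol in volumes:
        letter = vol.strip().rstrip(":").lower()
        if len(letter) != 1 or not letter.isalpha():
            raise ValueError(f"expected a drive letter, got {vol!r}")
        commands.append(f"emcmd . deletemirror {letter} {ip}")
    return commands


if __name__ == "__main__":
    # 192.0.2.13 is a documentation address standing in for W16-3's real IP.
    for cmd in deletemirror_to_target_commands(["E:", "F:"], "192.0.2.13"):
        print(cmd)
```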
Step 4 – Remove the node from the cluster
At this point, the node has been completely removed from DataKeeper. The next step in the replacement process is to remove the node from the LifeKeeper cluster.
Version 8.9.0 and later – use delallsys.pl
If your LifeKeeper version is 8.9.0 or later, the delallsys.pl utility is included with LifeKeeper. To use this tool, follow these steps.
- Open a CMD prompt
- Change directory to the LifeKeeper bin directory by running the command “cd /d %LKBIN%”
- Start a shell session by running the command “sh”
- Set PATH using the command “export PATH=$PATH:/bin”
- Run the delallsys.pl script by running the command “perl delallsys.pl -f <node name>”
- Run “exit” to return to the CMD prompt
Repeat these steps on all remaining LifeKeeper nodes (W16-2 in this case).
Versions prior to 8.9.0 – manually remove the node
If your LifeKeeper version is earlier than 8.9.0, the LifeKeeper system must be manually removed from the configuration database. To do this, follow these steps:
- Open a CMD prompt
- Change directory to the LifeKeeper bin directory by running the command “cd /d %LKBIN%”
- Remove equivalencies:
a. Run “eqv_list” to get a list of equivalencies. Identify all of the entries that contain the node being replaced (W16-3).
b. For each instance tag (Vol.E, 17.17.17.17, and Vol.F in this case), find the local tag (the second field in each entry; fields are separated by the “▯” character) and the remote tag (the fourth field). Then run the command “eqv_remove -t <localtag> -S <node being removed> -o <remotetag> -e <instancetype>”.
c. Verify that the eqv_list output no longer includes the node being removed.
- Remove LifeKeeper communication paths to the node being replaced
a. Run “net_list” to get a list of communication paths. Identify all of the paths that contain the node being replaced (W16-3).
b. For each communication path to be removed, get the device name (the 2nd item in the list after the “▯” character). Run the command “net_remove -D <devicename>” to delete the communication path. Then run “net_list” to verify that the communication path has been removed.
- Remove the node from the list of LifeKeeper systems by running “sys_remove -s <node being removed>”. Verify that the system was removed by running “sys_list”
Repeat these steps on each remaining node in the cluster.
Step 5 – Bring up the replacement node
Configure the new node, adding storage as appropriate.
Step 6 – Use the DataKeeper GUI to re-create mirrors to the new node
Start the DataKeeper GUI, connect to the new node, and create a mirror to it within the appropriate job.
Step 7 – Add the node to the LifeKeeper cluster
Create comm paths from the existing LifeKeeper nodes to the new replacement node, then extend all hierarchies.
Case 2 – Node is running and can be accessed prior to being replaced
If you are planning to replace a cluster node with a new one, the steps are very similar to those in Case 1 – Node is lost and not recoverable:
- Move any hierarchies from the node being replaced to another node in the cluster
- Shut down the node that is going to be replaced. After this point, do NOT restart this node, because its mirror and job configuration will be invalid.
- Follow the steps described in Case 1 – Node is lost and not recoverable.