STONITH (Shoot The Other Node in the Head) is a fencing technique for remotely powering down a node in a cluster. LifeKeeper can provide STONITH capabilities by using external power switch controls, IPMI-enabled motherboard controls, hypervisor-provided power capabilities, and cloud vendor tools to power off the other nodes in a cluster. Each STONITH method allows the cluster software to power off a cluster node that appears to have died, thus ensuring that the unhealthy node cannot access or corrupt any shared data.
STONITH using IPMI
IPMI (Intelligent Platform Management Interface) defines a set of common interfaces to a computer system that can be used to monitor system health and manage the system. Used with STONITH, it allows the cluster software to instruct the baseboard management controller (BMC), via a serial or network connection, to power off a cluster node that appears to have died, thus ensuring that the unhealthy node cannot access or corrupt any shared data.
Package Requirements
- IPMI tools package on each server in the cluster (e.g. ipmitool-1.8.11-6.el6.x86_64.rpm)
Configure Baseboard Management Controller (BMC)
Using BIOS or the ipmitool command on each server in the cluster (example using ipmitool):
- Use Static IP: ipmitool lan set 1 ipsrc static
- Add IP address: ipmitool lan set 1 ipaddr 192.168.0.1
- Set subnet mask: ipmitool lan set 1 netmask 255.0.0.0
- Set User name: ipmitool user set name 1 root
- Set Password: ipmitool user set password 1 secret
- Add administrator privilege level to the user: ipmitool user priv 1 4
- Enable network access to the user: ipmitool user enable 1
(For detailed information, see the ipmitool man page.)
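The steps above can be collected into one script so that every server gets the same settings. This is a minimal sketch, not part of LifeKeeper: the variable names (CHANNEL, USER_ID, BMC_IP, BMC_NETMASK, BMC_USER, BMC_PASS, DRY_RUN) are illustrative, and the defaults simply repeat the example values from the list. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: apply the BMC settings from the list above in one pass.
# All variable names are illustrative; adjust channel, user ID and
# credentials to the local BMC.
CHANNEL=${CHANNEL:-1}
USER_ID=${USER_ID:-1}
BMC_IP=${BMC_IP:-192.168.0.1}
BMC_NETMASK=${BMC_NETMASK:-255.0.0.0}
BMC_USER=${BMC_USER:-root}
BMC_PASS=${BMC_PASS:-secret}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run the seven ipmitool steps in order, stopping at the first failure.
configure_bmc() {
    run ipmitool lan set "$CHANNEL" ipsrc static &&
    run ipmitool lan set "$CHANNEL" ipaddr "$BMC_IP" &&
    run ipmitool lan set "$CHANNEL" netmask "$BMC_NETMASK" &&
    run ipmitool user set name "$USER_ID" "$BMC_USER" &&
    run ipmitool user set password "$USER_ID" "$BMC_PASS" &&
    run ipmitool user priv "$USER_ID" 4 &&
    run ipmitool user enable "$USER_ID"
}

configure_bmc
```

Setting DRY_RUN=0 executes the commands instead of printing them; each server needs its own BMC_IP.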
STONITH Installation
Install the LifeKeeper STONITH script on each server where LifeKeeper is installed and communication paths are configured to all servers by running the following command:
# /opt/LifeKeeper/samples/STONITH/stonith-install
Update /opt/LifeKeeper/config/stonith.conf
Add entries to the stonith.conf file for each server in the cluster.
# LifeKeeper STONITH configuration
#
# Example: <host> ipmitool -I <interface> -H <ip> -U root -P secret power off
minute-maid ipmitool -I lanplus -H 192.168.0.1 -U root -P secret power off
kool-aid ipmitool -I lanplus -H 192.168.0.2 -U root -P secret power off
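Before relying on an entry, it may help to confirm that the BMC answers with the same interface and credentials. The sketch below uses the ipmitool `power status` query, which only reads the power state. The function name check_bmc and the IPMITOOL override variable are illustrative and not part of LifeKeeper.

```shell
# check_bmc IP USER PASSWORD
# Queries the chassis power state over the same lanplus interface that the
# stonith.conf entries use. IPMITOOL is an illustrative override hook that
# allows a different binary (or a stub) to be substituted.
check_bmc() {
    "${IPMITOOL:-ipmitool}" -I lanplus -H "$1" -U "$2" -P "$3" power status
}

# Usage: check_bmc 192.168.0.1 root secret
```

A non-zero exit status suggests that the corresponding power off entry would also fail when the cluster needs it.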
STONITH in VMware vSphere Environments
vCLI (vSphere Command-Line Interface) is a command-line interface supported by VMware for managing your virtual infrastructure including the ESXi hosts and virtual machines. You can choose the vCLI command best suited for your needs and apply it for your LifeKeeper STONITH usage between VMware virtual machines.
Package Requirements
STONITH Server
VMware vSphere SDK Package or VMware vSphere CLI (vSphere CLI is included in the same installation package as the vSphere SDK).
Each Monitored Virtual Machine
VMware Tools
Configuration
vSphere CLI commands run on top of vSphere SDK for Perl.
- vCLI-esxcli
- esxcli --server=10.0.0.1 --username=root --password=secret vms vm kill --type='hard' --world-id=1234567
- vCLI-vmware_cmd
- vmware-cmd -H 10.0.0.1 -U root -P secret <vm_id> stop hard
Determining <vm_id>
<vm_id> identifies the VM. It should be the path to the configuration file of the VM being configured.
- Get the list of registered virtual machines:
vmware-cmd -H <vmware host> -l
- Sample output:
/vmfs/volumes/4e08c1b9-d741c09c-1d3e-0019b9cb28be/lampserver/lampserver.vmx
/vmfs/volumes/4e1e1386-0b862fae-a859-0019b9cb28bc/oracle10/oracle.vmx
/vmfs/volumes/4e08c1b9-d741c09c-1d3e-0019b9cb28be/lampserver02/lampserver02.vmx
- The command referencing the first VM in the output:
vmware-cmd -H 10.0.0.1 -U root -P secret /vmfs/volumes/4e08c1b9-d741c09c-1d3e-0019b9cb28be/lampserver/lampserver.vmx stop hard
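When a host runs many virtual machines, the configuration path for one VM can be picked out of the listing by its file name. This is a small sketch, assuming each path ends in `<name>.vmx` as in the sample output above. The function name find_vmx is illustrative.

```shell
# find_vmx NAME
# Reads a VM listing (one configuration path per line) on standard input and
# prints the paths whose last component is exactly NAME.vmx.
find_vmx() {
    awk -F/ -v name="$1" '$NF == (name ".vmx")'
}

# Usage: vmware-cmd -H 10.0.0.1 -U root -P secret -l | find_vmx lampserver
```

Matching on the whole file name rather than a substring keeps lampserver from also matching lampserver02.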
STONITH Installation
Install the LifeKeeper STONITH script on each server where LifeKeeper is installed and communication paths are configured to all servers by running the following command:
# /opt/LifeKeeper/samples/STONITH/stonith-install
Update /opt/LifeKeeper/config/stonith.conf
Add entries to the stonith.conf file for the three virtual machines listed in the output above (all other entries should be commented out or deleted):
# LifeKeeper STONITH configuration
#
# Example: vmware-cmd -H 10.0.0.1 -U root -P secret stop hard
lampserver vmware-cmd -H 10.0.0.1 -U root -P secret /vmfs/volumes/4e08c1b9-d741c09c-1d3e-0019b9cb28be/lampserver/lampserver.vmx stop hard
oracle vmware-cmd -H 10.0.0.1 -U root -P secret /vmfs/volumes/4e1e1386-0b862fae-a859-0019b9cb28bc/oracle10/oracle.vmx stop hard
lampserver02 vmware-cmd -H 10.0.0.1 -U root -P secret /vmfs/volumes/4e08c1b9-d741c09c-1d3e-0019b9cb28be/lampserver02/lampserver02.vmx stop hard
STONITH in Microsoft Azure Environments
The Azure CLI is a command-line interface supported by Microsoft to manage Azure resources such as virtual machines. Used with STONITH, it allows the cluster software to power off a cluster node that appears to have died, thus ensuring that the unhealthy node cannot access or corrupt any shared data.
Requirements
Package
Azure CLI: install the Azure CLI for Linux on each server in the cluster.
Custom Role
The custom role assigned to the user that performs Azure fencing must grant at least the powerOff permission on the cluster's virtual machines:
Microsoft.Compute/*/read
Microsoft.Compute/virtualMachines/powerOff/action
Microsoft.Compute/virtualMachines/start/action
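One way to package these permissions is a custom role definition in JSON, which can then be passed to `az role definition create`. The sketch below only prints such a definition. The role name, description, function name role_definition, and the subscription-wide scope are assumptions made for illustration; a narrower scope such as a single resource group may be preferable.

```shell
# role_definition SUBSCRIPTION_ID
# Prints a custom role definition (JSON) containing the three permissions
# listed above. Name, Description and the scope are illustrative choices.
role_definition() {
    cat <<EOF
{
  "Name": "LifeKeeper STONITH (example)",
  "Description": "Example role allowing cluster nodes to power VMs off and on",
  "Actions": [
    "Microsoft.Compute/*/read",
    "Microsoft.Compute/virtualMachines/powerOff/action",
    "Microsoft.Compute/virtualMachines/start/action"
  ],
  "NotActions": [],
  "AssignableScopes": ["/subscriptions/$1"]
}
EOF
}

# Usage (needs an account that is allowed to create role definitions):
#   role_definition "<subscription id>" > stonith-role.json
#   az role definition create --role-definition @stonith-role.json
```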
Pre-checking
On each server run the following command to verify that authentication on Microsoft Azure is working.
# az vm show --resource-group <group name> --name <vm name>
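With more than one node in the cluster, the same check can be looped over every virtual machine. This is a minimal sketch built on the `az vm show` command above. The function name precheck and the AZ override variable are illustrative, and `--output none` is used only to suppress the JSON that the command would otherwise print.

```shell
# precheck RESOURCE_GROUP VM_NAME...
# Runs "az vm show" for each VM and reports whether the call succeeded.
# Returns non-zero if any VM could not be queried.
# AZ is an illustrative override hook for the az binary.
precheck() {
    group=$1
    shift
    rc=0
    for vm in "$@"; do
        if "${AZ:-az}" vm show --resource-group "$group" --name "$vm" --output none; then
            echo "OK   $vm"
        else
            echo "FAIL $vm"
            rc=1
        fi
    done
    return $rc
}

# Usage: precheck rg-Group1 vm-HostA vm-HostB
```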
STONITH Installation
Install the LifeKeeper for Microsoft Azure STONITH script on each server where LifeKeeper is installed and communication paths are configured to all servers by running the following command:
# /opt/LifeKeeper/samples/STONITH/azure-stonith-install
The command runs interactively. Enter the resource group name, then confirm the Microsoft Azure virtual machine name for each cluster node displayed.
Example output from the command
STONITH script install…
Please enter the Resource Group name in Azure: rg-Group1
Please enter the System name in Azure[vm-HostA]:
Enable Stonith on node vm-HostA [Yes]: s
Please enter the System name in Azure[vm-HostB]:
Enable Stonith on node vm-HostB [Yes]:
Configuration file /opt/LifeKeeper/config/stonith.conf was saved.
Update /opt/LifeKeeper/config/stonith.conf
After the installation is completed, the settings to power off the virtual machine will be added to the following file.
/opt/LifeKeeper/config/stonith.conf
# LifeKeeper STONITH configuration
#
# Example: <host> az vm restart -g <resource group> -n <node name>
vm-HostA az vm stop -g rg-Group1 -n vm-HostA --skip-shutdown
vm-HostB az vm stop -g rg-Group1 -n vm-HostB --skip-shutdown
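To review what was written for a given node, the file can be read as "host name, then command", which is the layout shown above. The sketch below extracts the command configured for one host. It reflects only that visible layout and is not a description of how LifeKeeper itself parses the file. The function name stonith_command_for is illustrative.

```shell
# stonith_command_for HOST FILE
# Prints the command configured for HOST, skipping comment and blank lines.
# Assumes each entry is "<host> <command...>" on a single line.
stonith_command_for() {
    awk -v host="$1" '
        /^[ \t]*#/ || NF == 0 { next }
        $1 == host {
            sub(/^[ \t]*[^ \t]+[ \t]+/, "")   # drop the leading host field
            print
        }
    ' "$2"
}

# Usage: stonith_command_for vm-HostA /opt/LifeKeeper/config/stonith.conf
```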
Expected Behaviors
When LifeKeeper detects a communication failure with a node, that node will be powered off and a failover will occur. Once the issue is repaired, the node will have to be manually powered on.
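Powering the node back on can usually be done with the same tool that powered it off. The sketch below rewrites a power-off command of the three forms used in this topic into a corresponding power-on command and prints it without running it. The function name power_on_command is illustrative, and the rewrite rules cover only the example commands shown here.

```shell
# power_on_command "POWER-OFF COMMAND"
# Prints a power-on counterpart for the power-off command forms used in
# this topic:
#   ipmitool ... power off      -> ipmitool ... power on
#   vmware-cmd ... stop hard    -> vmware-cmd ... start
#   az vm stop ... --skip-shutdown -> az vm start ...
power_on_command() {
    printf '%s\n' "$1" | sed \
        -e 's/ power off$/ power on/' \
        -e 's/ stop hard$/ start/' \
        -e 's/^az vm stop /az vm start /' \
        -e 's/ --skip-shutdown$//'
}

# Usage: power_on_command "az vm stop -g rg-Group1 -n vm-HostA --skip-shutdown"
```

The printed command should be reviewed and run by an administrator only after the cause of the failure has been repaired.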