STONITH (Shoot The Other Node in the Head) is a fencing technique for remotely powering down a node in a cluster. LifeKeeper can provide STONITH capabilities by using external power switch controls, IPMI-enabled motherboard controls, hypervisor-provided power capabilities, and cloud vendor tools to power off the other nodes in a cluster. Each STONITH method allows the cluster software to power off a cluster node that appears to have died, ensuring that the unhealthy node cannot access or corrupt any shared data.

STONITH using IPMI

IPMI (Intelligent Platform Management Interface) defines a set of common interfaces to a computer system which can be used to monitor system health and manage the system. Used with STONITH, it allows the cluster software to instruct a node's Baseboard Management Controller (BMC) over a network connection to power off a cluster node that appears to have died, ensuring that the unhealthy node cannot access or corrupt any shared data.

Package Requirements

  • IPMI tools package on each server in the cluster (e.g. ipmitool-1.8.11-6.el6.x86_64.rpm)

Configure Baseboard Management Controller (BMC)

Configure the BMC on each server in the cluster using either the BIOS or the ipmitool command (example using ipmitool):

  • Use Static IP: ipmitool lan set 1 ipsrc static
  • Add IP address: ipmitool lan set 1 ipaddr 192.168.0.1
  • Set subnet mask: ipmitool lan set 1 netmask 255.0.0.0
  • Set User name: ipmitool user set name 1 root
  • Set Password: ipmitool user set password 1 secret
  • Add administrator privilege level to the user: ipmitool user priv 1 4
  • Enable network access to the user: ipmitool user enable 1
    (For detailed information, see the ipmitool man page.)
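
Once the BMC is configured, it may be worth confirming from the other cluster node that it answers over the network before relying on it for fencing. The following is a minimal sketch and is not part of LifeKeeper: the function name bmc_power_status and the IPMITOOL override variable are hypothetical, and the address and credentials are the example values used above.

```shell
# Hypothetical helper (not shipped with LifeKeeper): query the chassis power
# state of a remote BMC over the LAN using the lanplus interface.
# IPMITOOL may be overridden (e.g. with a full path); it defaults to "ipmitool".
IPMITOOL="${IPMITOOL:-ipmitool}"

bmc_power_status() {
    # $1 = BMC IP address, $2 = user name, $3 = password
    "$IPMITOOL" -I lanplus -H "$1" -U "$2" -P "$3" power status
}
```

For example, running bmc_power_status 192.168.0.1 root secret from the peer node should report the chassis power state if the LAN settings and credentials are correct.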

STONITH Installation

Install the LifeKeeper STONITH script on each server where LifeKeeper is installed and communication paths are configured to all servers by running the following command:

# /opt/LifeKeeper/samples/STONITH/stonith-install

Update /opt/LifeKeeper/config/stonith.conf

Add entries to the stonith.conf file for each server in the cluster.

# LifeKeeper STONITH configuration
#
# Example: <host> ipmitool -I <interface> -H <ip> -U root -P secret power off
minute-maid ipmitool -I lanplus -H 192.168.0.1 -U root -P secret power off
kool-aid ipmitool -I lanplus -H 192.168.0.2 -U root -P secret power off
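
Because each entry follows the same pattern (host name, then the power-off command for that host's BMC), the lines can be generated rather than typed by hand. This is a sketch only: stonith_ipmi_entry is a hypothetical helper name, and the hosts, addresses, and credentials are the example values from the file above.

```shell
# Hypothetical helper: print one stonith.conf line for an IPMI-fenced host.
stonith_ipmi_entry() {
    # $1 = LifeKeeper host name, $2 = BMC IP, $3 = BMC user, $4 = BMC password
    printf '%s ipmitool -I lanplus -H %s -U %s -P %s power off\n' \
        "$1" "$2" "$3" "$4"
}

# Print the two example entries; review them before adding them to stonith.conf.
stonith_ipmi_entry minute-maid 192.168.0.1 root secret
stonith_ipmi_entry kool-aid    192.168.0.2 root secret
```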

STONITH in VMware vSphere Environments

vCLI (vSphere Command-Line Interface) is a command-line interface supported by VMware for managing your virtual infrastructure including the ESXi hosts and virtual machines. You can choose the vCLI command best suited for your needs and apply it for your LifeKeeper STONITH usage between VMware virtual machines.

Package Requirements

STONITH Server

VMware vSphere SDK Package or VMware vSphere CLI (vSphere CLI is included in the same installation package as the vSphere SDK).

Each Monitored Virtual Machine

VMware Tools

Configuration

vSphere CLI commands run on top of vSphere SDK for Perl.

  • vCLI-esxcli
    • esxcli --server=10.0.0.1 --username=root --password=secret vms vm kill --type='hard' --world-id=1234567
  • vCLI-vmware_cmd
    • vmware-cmd -H 10.0.0.1 -U root -P secret <vm_id> stop hard

Determining <vm_id>

<vm_id> identifies the VM. It should be the path to the configuration file (.vmx) of the VM being configured.

  1. Get the list of available virtual machines:

vmware-cmd -H <vmware host> -l

  2. Sample output:

/vmfs/volumes/4e08c1b9-d741c09c-1d3e-0019b9cb28be/lampserver/lampserver.vmx
/vmfs/volumes/4e1e1386-0b862fae-a859-0019b9cb28bc/oracle10/oracle.vmx
/vmfs/volumes/4e08c1b9-d741c09c-1d3e-0019b9cb28be/lampserver02/lampserver02.vmx

  3. The command referencing the first VM in the output:

vmware-cmd -H 10.0.0.1 -U root -P secret /vmfs/volumes/4e08c1b9-d741c09c-1d3e-0019b9cb28be/lampserver/lampserver.vmx stop hard
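
When a host has many VMs, picking the right configuration file out of the listing by eye is error-prone. The sketch below filters the listing for one VM. The helper name find_vmx is hypothetical, and it assumes the configuration file is named <vm name>.vmx, as in the sample output.

```shell
# Hypothetical helper: read a VM listing (one .vmx path per line) on stdin and
# print the path(s) whose file name is exactly <name>.vmx.
find_vmx() {
    # $1 = configuration file name without the .vmx suffix
    while IFS= read -r path; do
        case "$path" in
            */"$1.vmx") printf '%s\n' "$path" ;;
        esac
    done
}
```

It could be used as vmware-cmd -H <vmware host> -l | find_vmx lampserver, which matches lampserver.vmx but not lampserver02.vmx.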

STONITH Installation

Install the LifeKeeper STONITH script on each server where LifeKeeper is installed and communication paths are configured to all servers by running the following command:

# /opt/LifeKeeper/samples/STONITH/stonith-install

Update /opt/LifeKeeper/config/stonith.conf

Add entries to the stonith.conf file for the three virtual machines listed in the output above (all other entries should be commented out or deleted):

# LifeKeeper STONITH configuration
#
# Example: <host> vmware-cmd -H 10.0.0.1 -U root -P secret <vm_id> stop hard
lampserver vmware-cmd -H 10.0.0.1 -U root -P secret /vmfs/volumes/4e08c1b9-d741c09c-1d3e-0019b9cb28be/lampserver/lampserver.vmx stop hard
oracle vmware-cmd -H 10.0.0.1 -U root -P secret /vmfs/volumes/4e1e1386-0b862fae-a859-0019b9cb28bc/oracle10/oracle.vmx stop hard
lampserver02 vmware-cmd -H 10.0.0.1 -U root -P secret /vmfs/volumes/4e08c1b9-d741c09c-1d3e-0019b9cb28be/lampserver02/lampserver02.vmx stop hard
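
Each entry must stay on a single line: the host name, then the full vmware-cmd command including the .vmx path. Since the paths are long, generating the lines from the VM listing may help avoid wrapping mistakes. This is a sketch under one stated assumption: the .vmx file name (without its suffix) equals the LifeKeeper host name, as it does in the example above. The helper name stonith_vmware_entries is hypothetical.

```shell
# Hypothetical helper: read .vmx paths on stdin and print one stonith.conf
# line per VM.
# ASSUMPTION: the .vmx file name without its suffix is the LifeKeeper host
# name (true for the example above); edit the output if your names differ.
stonith_vmware_entries() {
    # $1 = ESXi host address, $2 = user name, $3 = password
    while IFS= read -r vmx; do
        [ -n "$vmx" ] || continue         # skip empty lines
        host=$(basename "$vmx" .vmx)      # .../oracle10/oracle.vmx -> oracle
        printf '%s vmware-cmd -H %s -U %s -P %s %s stop hard\n' \
            "$host" "$1" "$2" "$3" "$vmx"
    done
}
```

For instance, the output of vmware-cmd -H <vmware host> -l could be piped into stonith_vmware_entries 10.0.0.1 root secret and the result reviewed before it is placed in stonith.conf.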

STONITH in Microsoft Azure Environments

The Azure CLI is a command-line interface supported by Microsoft for managing Azure resources such as virtual machines. Used with STONITH, it allows the cluster software to power off a cluster node that appears to have died, ensuring that the unhealthy node cannot access or corrupt any shared data.

Requirements

Package

Azure CLI: install the Azure CLI for Linux on each server in the cluster.

Custom Role

The custom role assigned to the user that performs Azure fencing must grant at least the powerOff permission on the cluster's virtual machines on Microsoft Azure. The permissions used are:

Microsoft.Compute/*/read
Microsoft.Compute/virtualMachines/powerOff/action
Microsoft.Compute/virtualMachines/start/action
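
One way to package these permissions is an Azure custom role definition in JSON. The sketch below only prints such a definition. The helper name azure_fence_role_json, the role name, and the description are placeholders invented for illustration, and the subscription ID is supplied by the caller.

```shell
# Hypothetical helper: print a custom role definition containing the three
# permissions listed above. Role name and description are placeholders.
azure_fence_role_json() {
    # $1 = Azure subscription ID used as the assignable scope
    cat <<EOF
{
  "Name": "LifeKeeper STONITH (example)",
  "Description": "Example role: read compute resources, power off and start VMs.",
  "Actions": [
    "Microsoft.Compute/*/read",
    "Microsoft.Compute/virtualMachines/powerOff/action",
    "Microsoft.Compute/virtualMachines/start/action"
  ],
  "AssignableScopes": [ "/subscriptions/$1" ]
}
EOF
}
```

The output could be saved to a file such as role.json and, assuming the account is allowed to create roles, registered with az role definition create --role-definition @role.json.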

Pre-checking

On each server run the following command to verify that authentication on Microsoft Azure is working.

# az vm show --resource-group <group name> --name <vm name>
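
Since a server must be able to act on the other cluster nodes in order to fence them, it can be useful to repeat the check for every VM in the cluster. The following is a minimal sketch: azure_precheck and the AZ override variable are hypothetical names, and the resource group and VM names in the usage note are the example values from this section.

```shell
# Hypothetical helper: run "az vm show" for each VM and report which lookups
# succeed. AZ may be overridden (e.g. with a full path); it defaults to "az".
AZ="${AZ:-az}"

azure_precheck() {
    # $1 = resource group name; remaining arguments = VM names
    group=$1
    shift
    failed=0
    for vm in "$@"; do
        if "$AZ" vm show --resource-group "$group" --name "$vm" >/dev/null 2>&1; then
            echo "OK: $vm"
        else
            echo "FAILED: $vm"
            failed=1
        fi
    done
    return "$failed"    # non-zero if any lookup failed
}
```

For example, azure_precheck rg-Group1 vm-HostA vm-HostB prints one line per VM and returns a non-zero status if any lookup fails.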

STONITH Installation

Install the LifeKeeper for Microsoft Azure STONITH script on each server where LifeKeeper is installed and communication paths are configured to all servers by running the following command:

# /opt/LifeKeeper/samples/STONITH/azure-stonith-install

The command is interactive: enter the resource group name, then confirm the Microsoft Azure virtual machine name displayed for each cluster node.

Example output from the command

STONITH script install…
Please enter the Resource Group name in Azure: rg-Group1

Please enter the System name in Azure[vm-HostA]:
Enable Stonith on node vm-HostA [Yes]:
Please enter the System name in Azure[vm-HostB]:
Enable Stonith on node vm-HostB [Yes]:
Configuration file /opt/LifeKeeper/config/stonith.conf was saved.

Update /opt/LifeKeeper/config/stonith.conf

After the installation completes, the settings used to power off each virtual machine are added to the following file.

/opt/LifeKeeper/config/stonith.conf

# LifeKeeper STONITH configuration
#
# Example: <host> az vm restart -g <resource group> -n <node name>
vm-HostA az vm stop -g rg-Group1 -n vm-HostA --skip-shutdown
vm-HostB az vm stop -g rg-Group1 -n vm-HostB --skip-shutdown
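
After any of the installation methods, a quick check that every cluster node has an active entry can catch a missed host. This is a sketch only: stonith_missing_hosts is a hypothetical helper, and it assumes the format shown above, where an entry begins with the host name followed by whitespace and comment lines begin with #.

```shell
# Hypothetical helper: print each host that has no active (uncommented) entry
# in a stonith.conf-style file.
stonith_missing_hosts() {
    # $1 = path to the configuration file; remaining arguments = host names
    conf=$1
    shift
    for host in "$@"; do
        # An entry is a line whose first whitespace-separated field is the host.
        if ! awk -v h="$host" '$1 == h { found = 1 } END { exit !found }' "$conf"; then
            echo "$host"
        fi
    done
}
```

For example, stonith_missing_hosts /opt/LifeKeeper/config/stonith.conf vm-HostA vm-HostB prints nothing when both hosts have entries.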

Expected Behaviors

When LifeKeeper detects a communication failure with a node, that node will be powered off and a failover will occur. Once the issue is repaired, the node will have to be manually powered on.
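
Powering the node back on uses the same tools as fencing. The sketch below prints, without running, a power-on command that mirrors the IPMI and Azure power-off entries shown earlier. The helper name power_on_command is hypothetical, and the argument values in the usage note are the example values from this page.

```shell
# Hypothetical helper: print (not run) the command that powers a fenced node
# back on, mirroring the power-off entries used in stonith.conf.
power_on_command() {
    # $1 = method ("ipmi" or "azure"); remaining arguments depend on the method
    case "$1" in
        ipmi)   # $2 = BMC IP address, $3 = user name, $4 = password
            echo "ipmitool -I lanplus -H $2 -U $3 -P $4 power on" ;;
        azure)  # $2 = resource group, $3 = VM name
            echo "az vm start -g $2 -n $3" ;;
        *)
            echo "unknown method: $1" >&2
            return 1 ;;
    esac
}
```

For example, power_on_command azure rg-Group1 vm-HostA prints the az vm start command for that VM, which can then be run once the underlying issue has been repaired.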
