STONITH (Shoot The Other Node in the Head) is a fencing technique for remotely powering down a node in a cluster. LifeKeeper can provide STONITH capabilities by using external power switch controls, IPMI-enabled motherboard controls, hypervisor-provided power capabilities, and cloud vendor tools to power off the other nodes in a cluster. Each STONITH method allows the cluster software to power off a cluster node that appears to have died, which ensures that the unhealthy node cannot access or corrupt any shared data.
How to configure STONITH in each environment
We have provided examples of STONITH configuration for the following environments.
Testing STONITH In General
STONITH is activated when the comm paths to a node are lost.
To test your STONITH configuration, you can simulate a lost comm path by blocking the LifeKeeper port with an iptables or firewalld command.
Iptables
# iptables -A INPUT -p tcp --destination-port 7365 -j DROP
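The rule above stays in place until it is deleted or the node restarts without saved rules. As a minimal sketch, the block and its reversal can be wrapped in two helper functions. The function names and the `LK_PORT` and `DRY_RUN` variables are made up for this example and are not part of LifeKeeper. With `DRY_RUN=1` (the default here) the commands are only printed, so the wrapper can be reviewed before anything is changed.

```shell
#!/bin/sh
# Sketch only: run, block_commpath, unblock_commpath, LK_PORT and DRY_RUN
# are hypothetical names chosen for this example.
LK_PORT="${LK_PORT:-7365}"   # port blocked by the iptables command above
DRY_RUN="${DRY_RUN:-1}"      # 1 = print the command, 0 = run it (requires root)

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Append the DROP rule that simulates the lost comm path.
block_commpath() {
    run iptables -A INPUT -p tcp --destination-port "$LK_PORT" -j DROP
}

# Delete the same rule; -D takes the same rule specification as -A.
unblock_commpath() {
    run iptables -D INPUT -p tcp --destination-port "$LK_PORT" -j DROP
}
```

Calling `block_commpath` prints the command from this section, and `DRY_RUN=0 block_commpath` run as root applies it. `unblock_commpath` removes the rule once the test is finished.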
Firewalld
There are two options for the firewall.
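Option 1 - Disable the firewall at boot, then start it.
Disabling the service first keeps firewalld from starting again when the node is powered back on. Starting it then blocks traffic on port 7365, provided that port has not been opened in the firewall.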
# systemctl disable firewalld
# systemctl start firewalld
Option 2 - If the firewall is already configured.
Use this option if ports 81, 82, 1024, and 7365 are open.
- Remove port 7365 from the port list.
# firewall-cmd --zone=public --remove-port=7365/tcp
- Perform a reboot.
- Start the firewall if it is not already running.
# systemctl start firewalld
- The system will reboot. Once the reboot is complete, add port 7365 again with the following command:
# firewall-cmd --zone=public --add-port=7365/tcp
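Option 2 applies only when ports 81, 82, 1024, and 7365 are already open, so it can help to check that first. The following sketch reads the output of `firewall-cmd --zone=public --list-ports` on standard input and prints any required port that is missing. The function name `missing_ports` and the `REQUIRED_PORTS` variable are hypothetical. The sketch also assumes that all four ports are TCP, which this page states only for 7365, and it does not recognize ports opened as part of a range.

```shell
#!/bin/sh
# Sketch only: missing_ports and REQUIRED_PORTS are hypothetical names.
# Assumes every required port is opened as an individual TCP port.
REQUIRED_PORTS="81 82 1024 7365"

# Reads a list such as "81/tcp 82/tcp 1024/tcp 7365/tcp" on stdin and
# prints the required ports that do not appear in it.
missing_ports() {
    # Collapse newlines and repeated blanks into single spaces.
    open_ports=$(tr -s '[:space:]' ' ')
    missing=""
    for port in $REQUIRED_PORTS; do
        case " $open_ports " in
            *" $port/tcp "*) ;;               # port is open
            *) missing="$missing $port" ;;    # port is not in the list
        esac
    done
    # Drop the leading space; the output is empty when nothing is missing.
    echo "${missing# }"
}
```

For example, `firewall-cmd --zone=public --list-ports | missing_ports` prints an empty line when all four ports are open.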
Expected behavior
When LifeKeeper detects a communication failure with a node, it will power off that node and a failover will occur. Once the problem is fixed, you will need to manually power on the node.
Remove STONITH
- Remove all folders under /opt/LifeKeeper/events/prefailover.
# rm -r /opt/LifeKeeper/events/prefailover/*
- Remove the configuration file /opt/LifeKeeper/config/stonith.conf.
# rm /opt/LifeKeeper/config/stonith.conf
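Both removals are permanent. As a cautious variant, the sketch below archives the STONITH files before deleting them. The backup step is an addition for illustration and is not part of the documented procedure. The function name `remove_stonith`, its arguments, and the default archive location are hypothetical.

```shell
#!/bin/sh
# Sketch only: remove_stonith is a hypothetical helper, not a LifeKeeper command.
# Usage: remove_stonith [LKROOT] [ARCHIVE]
remove_stonith() {
    lkroot="${1:-/opt/LifeKeeper}"
    archive="${2:-$HOME/stonith-backup.tar}"

    # Save the current STONITH files first; stop if the archive cannot be written.
    tar -cf "$archive" -C "$lkroot" events/prefailover config/stonith.conf || return 1

    # Same removals as the two commands above, relative to $lkroot.
    rm -r "$lkroot"/events/prefailover/*
    rm "$lkroot/config/stonith.conf"
}
```

Run as root, `remove_stonith` with no arguments works on /opt/LifeKeeper. The archive can be inspected afterwards with `tar -tf`.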