With STONITH, you can power off a faulty node to prevent it from accessing or corrupting shared data. Used together with Quorum/Witness, it provides stronger protection against split-brain scenarios. The procedure below combines two fencing actions, halting the Linux kernel through the SSM Agent and stopping the instance through the EC2 API, so that fencing is more reliable than with either action alone.
Requirements
Package
Install the AWS CLI and the SSM Agent on each cluster node. The SSM Agent can be installed as follows.
# dnf install https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm
# systemctl enable amazon-ssm-agent
# systemctl start amazon-ssm-agent
For detailed instructions, refer to "Manually Install and Uninstall SSM Agent on EC2 Instances for Linux" in the AWS User’s Guide.
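After the installation steps above, it can be useful to confirm that the agent service is actually running. The following is a minimal sketch, assuming a systemd-based distribution; `check_ssm_agent` is a hypothetical helper name and is not part of LifeKeeper or the SSM Agent.

```shell
#!/bin/sh
# check_ssm_agent: hypothetical helper (an assumption, not shipped with any product).
# Reports whether the amazon-ssm-agent systemd unit is currently active.
check_ssm_agent() {
    if systemctl is-active --quiet amazon-ssm-agent; then
        echo "amazon-ssm-agent is running"
    else
        echo "amazon-ssm-agent is not running" >&2
        return 1
    fi
}
```

Calling `check_ssm_agent` as root on each cluster node should print the running message when the service was started successfully.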
IAM Role
The IAM role must have permission to stop EC2 instances and to send commands to the SSM Agent.
ec2:StopInstances
ssm:SendCommand
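The two permissions above could be expressed as an IAM policy document. The following sketch writes such a document to a local file; the file name `lk-stonith-policy.json` and the wildcard `Resource` are assumptions made for illustration, and a production policy would normally restrict `Resource` to the cluster's instances.

```shell
#!/bin/sh
# Write a minimal IAM policy document that allows the two actions listed above.
# The file name and the "*" resource are illustrative assumptions.
cat > lk-stonith-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:StopInstances",
        "ssm:SendCommand"
      ],
      "Resource": "*"
    }
  ]
}
EOF
echo "wrote lk-stonith-policy.json"
```

One possible way to attach the document to a role is `aws iam put-role-policy --role-name <role> --policy-name <name> --policy-document file://lk-stonith-policy.json`, where `<role>` and `<name>` are placeholders for your own values.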
Pre-checking
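Run the following commands on each node to confirm that AWS authentication is working. Both commands act on the specified instance: the first force-stops it and the second powers it off through the SSM Agent, so run them only against a node that can safely go down.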
# aws ec2 stop-instances --instance-ids <instance> --force --region <region>
# aws ssm send-command --document-name AWS-RunShellScript --instance-ids <instance> --parameters '{"commands":["dohalt() { sleep 1; echo o > /proc/sysrq-trigger; }","echo 1 > /proc/sys/kernel/sysrq","dohalt &","echo shutoff the system"],"executionTimeout":["10"]}' --timeout-seconds 30 --region <region>
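The commands above really do stop the target instance. For the EC2 permission only, a non-destructive check is possible with the `--dry-run` option of `aws ec2 stop-instances`, which reports `DryRunOperation` when the request would have been allowed. The following is a sketch under that assumption; `check_stop_permission` is a hypothetical helper name, and it does not verify the `ssm:SendCommand` permission.

```shell
#!/bin/sh
# check_stop_permission: hypothetical helper (not part of LifeKeeper or the AWS CLI).
# Usage: check_stop_permission <instance> <region>
check_stop_permission() {
    instance="$1"
    region="$2"
    # With --dry-run the CLI exits non-zero in both cases, so inspect the
    # message: DryRunOperation means the caller would have been allowed.
    output=$(aws ec2 stop-instances --instance-ids "$instance" \
        --dry-run --region "$region" 2>&1)
    case "$output" in
        *DryRunOperation*)
            echo "permitted"
            return 0
            ;;
        *)
            echo "not permitted: $output"
            return 1
            ;;
    esac
}
```

For example, `check_stop_permission <instance> <region>` prints `permitted` when the node's credentials allow `ec2:StopInstances` on that instance, without changing the instance state.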
STONITH Installation
On each server that has communication paths configured to all other servers, run the following command to install the LifeKeeper STONITH script.
# /opt/LifeKeeper/samples/STONITH/stonith-install
Create ec2-stonith.sh
Create the following STONITH script and place it in /usr/local/bin/ec2-stonith.sh.
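#!/bin/sh
# Usage: ec2-stonith.sh <instance> [<region>]
instance="--instance-ids $1"
# Pass --region only when a region argument was given.
region="${2:+--region $2}"
# Through the SSM Agent, enable SysRq on the target and power it off (SysRq "o").
aws ssm send-command --document-name AWS-RunShellScript $instance --parameters '{"commands":["dohalt() { sleep 1; echo o > /proc/sysrq-trigger; }","echo 1 > /proc/sys/kernel/sysrq","dohalt &","echo shutoff the system"],"executionTimeout":["10"]}' --timeout-seconds 30 $region
sleep 1
# Force-stop the instance through the EC2 API; the request is sent twice.
aws ec2 stop-instances $instance --force $region
sleep 1
aws ec2 stop-instances $instance --force $region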
Grant execution permissions.
# chmod 755 /usr/local/bin/ec2-stonith.sh
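The script's `region="${2:+--region $2}"` line relies on the POSIX `${parameter:+word}` expansion, which yields `word` only when the parameter is set and non-empty. The short sketch below isolates that behavior; `build_region_flag` is a hypothetical function used only for illustration, and the instance ID and region are made-up sample values.

```shell
#!/bin/sh
# build_region_flag: hypothetical helper that mirrors the script's region handling.
# $1 is the instance ID (unused here), $2 is the optional region.
build_region_flag() {
    # Expands to "--region <value>" only when $2 is set and non-empty;
    # otherwise it expands to nothing, so the AWS CLI default region applies.
    echo "${2:+--region $2}"
}

build_region_flag i-0123456789abcdef0 ap-northeast-1
build_region_flag i-0123456789abcdef0
```

The first call prints `--region ap-northeast-1` and the second prints an empty line, which is why the region argument in the script is optional.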
Update /opt/LifeKeeper/config/stonith.conf
Update the stonith.conf file so that each node's entry calls the script with that node's instance ID and region.
vm-HostA /usr/local/bin/ec2-stonith.sh <instance> <region>
vm-HostB /usr/local/bin/ec2-stonith.sh <instance> <region>
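To find the `<instance>` and `<region>` values for a node, one option is to query the EC2 instance metadata service from that node. The following is a sketch assuming IMDSv2 is reachable at its standard address and that `curl` is installed; `imds_get` and `print_stonith_args` are hypothetical helper names.

```shell
#!/bin/sh
# imds_get: hypothetical helper that reads one metadata path using IMDSv2.
imds_get() {
    path="$1"
    # Request a short-lived session token, then use it for the metadata call.
    token=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
        -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
    curl -s -H "X-aws-ec2-metadata-token: $token" \
        "http://169.254.169.254/latest/meta-data/$path"
}

# print_stonith_args: prints "<instance> <region>" for the node it runs on.
print_stonith_args() {
    echo "$(imds_get instance-id) $(imds_get placement/region)"
}
```

Running `print_stonith_args` on vm-HostA, for example, prints the two values to place after `/usr/local/bin/ec2-stonith.sh` on the vm-HostA line.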