STONITH powers off a faulty node so that it cannot access or corrupt shared data. Used together with Quorum/Witness, it provides stronger protection against split-brain scenarios. For reliability, this procedure combines two methods: powering off the node from within Linux through the SSM Agent, and force-stopping the instance through the EC2 API.

Requirements

Package

Install the AWS CLI and the SSM Agent on each cluster node. The SSM Agent can be installed as follows.

# dnf install https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm
# systemctl enable amazon-ssm-agent
# systemctl start amazon-ssm-agent

For more detailed instructions, refer to "Manually Install and Uninstall SSM Agent on EC2 Instances for Linux" in the AWS User's Guide.

IAM Role

The IAM role must allow the following actions so that a node can power off virtual machines and communicate with the SSM Agent.

ec2:StopInstances
ssm:SendCommand
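
The two actions above could be granted with a policy document along the following lines. This is only a minimal sketch: the function name print_stonith_policy is hypothetical, and the "Resource": "*" value is an assumption made for brevity that you would normally narrow to the cluster's own instances.

```shell
#!/bin/sh
# Print a minimal IAM policy document that allows the two actions listed above.
# Assumption: "Resource": "*" is used only for brevity; restrict it in practice.
print_stonith_policy() {
    cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:StopInstances",
        "ssm:SendCommand"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}

print_stonith_policy
```

The output can be saved to a file and attached to the role used by the cluster nodes, for example with aws iam put-role-policy and its --policy-document file://... option.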

Pre-checking

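Run the following commands on each node to confirm that authentication with AWS is working. Both commands actually shut down the target instance, so run them only against a node that can safely be stopped.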

# aws ec2 stop-instances --instance-ids <instance> --force --region <region>
# aws ssm send-command --document-name AWS-RunShellScript --instance-ids <instance> --parameters '{"commands":["dohalt() { sleep 1; echo o > /proc/sysrq-trigger; }","echo 1 > /proc/sys/kernel/sysrq","dohalt &","echo shutoff the system"],"executionTimeout":["10"]}' --timeout-seconds 30 --region <region>
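
If stopping a node is not acceptable at this stage, the EC2 permission alone can be checked without side effects by using the --dry-run option of aws ec2 stop-instances. The sketch below assumes a hypothetical helper named check_stop_permission. It relies on the EC2 API reporting DryRunOperation when the request would have succeeded. It does not verify the ssm:SendCommand permission.

```shell
#!/bin/sh
# Non-destructive check of the ec2:StopInstances permission (a sketch).
# check_stop_permission is a hypothetical helper, not part of LifeKeeper.
# With --dry-run, the AWS CLI exits non-zero in both cases; the error code
# in the message tells them apart:
#   DryRunOperation       -> the request would have succeeded
#   UnauthorizedOperation -> the caller lacks the permission
check_stop_permission() {
    instance="$1"
    region="$2"
    output=$(aws ec2 stop-instances --instance-ids "$instance" \
        --region "$region" --dry-run 2>&1) || true
    case "$output" in
        *DryRunOperation*)
            echo "ok: ec2:StopInstances is permitted for $instance"
            return 0
            ;;
        *)
            echo "failed: $output" >&2
            return 1
            ;;
    esac
}

# Example: check_stop_permission <instance> <region>
```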

STONITH Installation

On each server that has communication paths configured to all other servers, run the following command to install the LifeKeeper STONITH script.

# /opt/LifeKeeper/samples/STONITH/stonith-install

Create ec2-stonith.sh

Create the following STONITH script and place it in /usr/local/bin/ec2-stonith.sh.

#!/bin/sh
# Usage: ec2-stonith.sh <instance> [<region>]
instance="--instance-ids $1"
region="${2:+--region $2}"
# Ask the SSM Agent on the target node to power it off through SysRq.
aws ssm send-command --document-name AWS-RunShellScript $instance --parameters '{"commands":["dohalt() { sleep 1; echo o > /proc/sysrq-trigger; }","echo 1 > /proc/sys/kernel/sysrq","dohalt &","echo shutoff the system"],"executionTimeout":["10"]}' --timeout-seconds 30 $region
sleep 1
# Force-stop the instance through the EC2 API. The request is issued twice.
aws ec2 stop-instances $instance --force $region
sleep 1
aws ec2 stop-instances $instance --force $region
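
The way the script builds its AWS CLI arguments can be illustrated in isolation. In the sketch below, build_args is a hypothetical helper that reuses the script's two assignments. It shows that the region argument is optional: ${2:+--region $2} expands to nothing when the second argument is missing or empty, in which case the AWS CLI falls back to its configured default region.

```shell
#!/bin/sh
# Illustrate the argument handling used by ec2-stonith.sh.
# build_args is a hypothetical helper for demonstration only.
build_args() {
    instance="--instance-ids $1"
    region="${2:+--region $2}"   # empty when $2 is unset or empty
    # Left unquoted on purpose, as in the script, so that each variable
    # splits into separate words and an empty region disappears.
    echo $instance $region
}

build_args i-0123456789abcdef0 us-east-1
# prints: --instance-ids i-0123456789abcdef0 --region us-east-1
build_args i-0123456789abcdef0
# prints: --instance-ids i-0123456789abcdef0
```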

Grant execution permissions.

# chmod 755 /usr/local/bin/ec2-stonith.sh

Update /opt/LifeKeeper/config/stonith.conf

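Add an entry to stonith.conf for each node, specifying the node's host name, the script path, and the node's EC2 instance ID and region.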

vm-HostA /usr/local/bin/ec2-stonith.sh <instance> <region>
vm-HostB /usr/local/bin/ec2-stonith.sh <instance> <region>
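
To review the entries after editing, a small helper such as the following could summarize them. This is a sketch under assumptions: list_stonith_entries is a hypothetical function, every entry is assumed to follow the "<host> <script> <instance> [<region>]" layout shown above, and lines starting with # are assumed to be comments.

```shell
#!/bin/sh
# Summarize the entries of a stonith.conf-style file (a sketch).
# Assumptions: entries look like "<host> <script> <instance> [<region>]",
# and lines beginning with '#' are comments.
list_stonith_entries() {
    conf="${1:-/opt/LifeKeeper/config/stonith.conf}"
    while read -r host script instance region; do
        # Skip blank lines and comment lines.
        case "$host" in
            ''|'#'*) continue ;;
        esac
        echo "$host -> $instance (${region:-default region}) via $script"
    done < "$conf"
}

# Example: list_stonith_entries /opt/LifeKeeper/config/stonith.conf
```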
