The File System Health Monitoring feature detects conditions that could cause LifeKeeper-protected applications that depend on the file system to fail. Only active/in-service resources (i.e., file systems) are monitored. The two conditions that are monitored are:
- A full (or almost full) file system, and
- An improperly mounted (or unmounted) file system.
When either of these two conditions is detected, one of several actions might be taken:
- A warning message can be logged and email sent to a system administrator.
- Local recovery of the resource can be attempted.
- The resource can be failed over to a backup server.
Condition Definitions
Full or Almost Full File System
A “disk full” condition can be detected, but it cannot be resolved by performing a local recovery or failover; administrator intervention is required. A message will be logged by default. Additional notification functionality is available. For example, an email can be sent to a system administrator, or another application can be invoked to send a warning message by some other means.
To enable notification for the full/almost full disk conditions, a basic event notification script named notify has been provided in the directory /opt/LifeKeeper/events/filesys/diskfull. Simply add the functionality required to send email or execute another application.
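As a rough illustration of the kind of functionality that could be added to the notify script, the sketch below composes and sends a warning email. It is written in Python purely as an example. The function names, the sender and recipient addresses, and the presence of an SMTP relay on localhost are all assumptions, and how the mount point or usage figure would reach the script is not specified here.

```python
from email.message import EmailMessage
import smtplib


def build_disk_full_alert(mount_point, percent_full, recipient,
                          sender="lifekeeper@localhost"):
    """Compose a warning email; every argument here is an illustrative placeholder."""
    msg = EmailMessage()
    msg["Subject"] = f"LifeKeeper: file system {mount_point} is {percent_full}% full"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(
        f"The protected file system mounted at {mount_point} has reached "
        f"{percent_full}% of capacity. Administrator intervention is required."
    )
    return msg


def send_alert(msg, smtp_host="localhost"):
    """Hand the message to an SMTP relay (assumes one is listening on smtp_host)."""
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

A call such as send_alert(build_disk_full_alert("/data", 96, "admin@example.com")) would then deliver the warning, provided the assumed relay is available.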
In addition to a “disk full” condition, a “disk almost full” condition can be detected and a warning message logged in the LifeKeeper log.
The “disk full” threshold is:
FILESYSFULLERROR=95
The “disk almost full” threshold is:
FILESYSFULLWARN=90
The default values are 90% (warning) and 95% (error) as shown, but both are configurable via tunables in the /etc/default/LifeKeeper file. The meanings of these two thresholds are as follows:
FILESYSFULLWARN – When a file system reaches this percentage full, a message will be written to the LifeKeeper log.
FILESYSFULLERROR – When a file system reaches this percentage full, a message will be written to the LifeKeeper log as well as the system log. The file system notify script will also be called.
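The threshold logic can be pictured with a minimal Python sketch. It is not LifeKeeper's implementation: the function names are hypothetical, the tunables file is assumed to hold simple KEY=VALUE lines, and the way usage is measured here (via shutil.disk_usage) may differ from what LifeKeeper or df report.

```python
import shutil

DEFAULT_WARN = 90   # default FILESYSFULLWARN
DEFAULT_ERROR = 95  # default FILESYSFULLERROR


def read_thresholds(defaults_text):
    """Return (warn, error) from KEY=VALUE text, falling back to the defaults."""
    values = {"FILESYSFULLWARN": DEFAULT_WARN, "FILESYSFULLERROR": DEFAULT_ERROR}
    for line in defaults_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() in values:
            values[key.strip()] = int(value.strip().strip("\"'"))
    return values["FILESYSFULLWARN"], values["FILESYSFULLERROR"]


def classify(percent_full, warn, error):
    """Map a usage percentage onto the conditions described above."""
    if percent_full >= error:
        return "error"  # logged to LifeKeeper and system logs, notify script called
    if percent_full >= warn:
        return "warn"   # logged to the LifeKeeper log only
    return "ok"


def percent_full(path):
    """Approximate usage of the file system containing path, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100 * usage.used / usage.total
```

With the default thresholds, classify(92, 90, 95) falls into the warning range, while any value of 95 or above is treated as an error.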
Unmounted or Improperly Mounted File System
LifeKeeper checks the /etc/mtab file to determine whether a LifeKeeper-protected file system that is in service is actually mounted. In addition, the current mount options are checked against the mount options stored in the filesys resource information field to ensure that they match the original mount options used at the time the hierarchy was created.
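The check described above can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, the comparison is assumed to be an order-independent exact match of the option lists, and escaped characters in mtab entries (such as \040 for a space) are not handled.

```python
def parse_mtab(mtab_text):
    """Map each mount point to (device, fstype, set of options) from mtab-format text."""
    mounts = {}
    for line in mtab_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0].startswith("#"):
            continue  # skip blank, malformed, or comment lines
        device, mount_point, fstype, options = fields[:4]
        mounts[mount_point] = (device, fstype, set(options.split(",")))
    return mounts


def check_mount(mtab_text, mount_point, expected_options):
    """Report whether mount_point is mounted with the expected options."""
    mounts = parse_mtab(mtab_text)
    if mount_point not in mounts:
        return "unmounted"
    _, _, actual_options = mounts[mount_point]
    if actual_options != set(expected_options.split(",")):
        return "improperly mounted"
    return "ok"
```

For an mtab line such as "/dev/sdb1 /data ext4 rw,noatime 0 0", checking /data against the stored options "rw,noatime" reports "ok", while checking it against "ro" reports "improperly mounted".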
If an unmounted or improperly mounted file system is detected, local recovery is invoked and will attempt to remount the file system with the correct mount options.
If the remount fails, failover will be attempted to resolve the condition. The following are common causes of remount failure that would lead to a failover:
- corrupted file system (fsck failure)
- failure to create mount point directory
- mount point is busy
- mount failure
- LifeKeeper internal error
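The escalation order described in this section (local recovery first, failover only if the remount fails) can be summarized in a short Python sketch. The remount and failover callables are hypothetical stand-ins, not LifeKeeper interfaces.

```python
def handle_mount_fault(remount, failover):
    """Try local recovery first; fail over only if the remount does not succeed.

    remount: hypothetical callable returning True when the file system was
             remounted with the correct options.
    failover: hypothetical callable that moves the resource to a backup server.
    """
    if remount():
        return "recovered locally"
    failover()
    return "failed over"
```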