The File System Health Monitoring feature detects conditions that could cause LifeKeeper-protected applications that depend on the file system to fail. Monitoring occurs only on resources (that is, file systems) that are active/in-service. Two conditions are monitored:
A full (or almost full) file system, and
An improperly mounted (or unmounted) file system.
When either condition is detected, one of the following actions may be taken:
A warning message can be logged and an email sent to a system administrator.
Local recovery of the resource can be attempted.
The resource can be failed over to a backup server.
A "disk full" condition can be detected, but it cannot be resolved by local recovery or failover; administrator intervention is required. A message is logged by default. Additional notification functionality is available: for example, an email can be sent to a system administrator, or another application can be invoked to send a warning by some other means. To enable this notification functionality, refer to the topic Configuring LifeKeeper Event Email Notification.
In addition to a "disk full" condition, a "disk almost full" condition can be detected and a warning message logged in the LifeKeeper log.
The "disk full" threshold is:
FILESYSFULLERROR=95
The "disk almost full" threshold is:
FILESYSFULLWARNING=90
The default values are 95% and 90% as shown, but both are configurable via tunables in the /etc/default/LifeKeeper file. The meanings of these two thresholds are as follows:
FILESYSFULLWARNING - When a file system reaches this percentage full, a message is written to the LifeKeeper log.
FILESYSFULLERROR - When a file system reaches this percentage full, a message is written to the LifeKeeper log as well as the system log. The file system notify script is also called.
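The relationship between the two thresholds can be illustrated with a short Python sketch. This is not LifeKeeper's implementation; it only shows the comparison described above. The function names (read_thresholds, classify, usage_percent) are hypothetical, the KEY=VALUE layout of the defaults file is an assumption, and "reaches" is taken to mean "greater than or equal to". The percentage computed here may also differ slightly from what df reports.

```python
import shutil

# Default values taken from the text above; names mirror the tunables.
FILESYSFULLWARNING = 90
FILESYSFULLERROR = 95


def read_thresholds(text, warning=FILESYSFULLWARNING, error=FILESYSFULLERROR):
    """Parse KEY=VALUE lines (assumed layout) and return (warning, error)."""
    values = {"FILESYSFULLWARNING": warning, "FILESYSFULLERROR": error}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key in values:
            values[key] = int(value.strip().strip('"'))
    return values["FILESYSFULLWARNING"], values["FILESYSFULLERROR"]


def classify(percent_full, warning=FILESYSFULLWARNING, error=FILESYSFULLERROR):
    """Return 'error', 'warning', or 'ok' for a usage percentage."""
    if percent_full >= error:
        return "error"    # "disk full": both logs plus the notify script
    if percent_full >= warning:
        return "warning"  # "disk almost full": LifeKeeper log only
    return "ok"


def usage_percent(path):
    """Percentage of the file system containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total
```

For example, classify(usage_percent("/opt")) would report which of the two conditions, if either, currently applies to the file system that holds /opt.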
LifeKeeper checks the /etc/mtab file to determine whether a LifeKeeper-protected file system that is in service is actually mounted. In addition, the current mount options are compared with the mount options stored in the filesys resource information field, to ensure that they match the options used when the hierarchy was created.
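As a rough illustration of this kind of check (not LifeKeeper's actual code), the Python sketch below parses mtab-style text and compares the mount options for one mount point against an expected option string. The function names and return values are hypothetical, and treating the options as an order-independent set that must match exactly is an assumption. Mount points containing spaces, which mtab escapes as \040, are not handled.

```python
def parse_mtab(text):
    """Map each mount point to its set of mount options."""
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        # Fields: device, mount point, file system type, options, dump, pass
        mounts[fields[1]] = set(fields[3].split(","))
    return mounts


def check_mount(mtab_text, mount_point, expected_options):
    """Return 'unmounted', 'options_mismatch', or 'ok' (hypothetical labels)."""
    mounts = parse_mtab(mtab_text)
    if mount_point not in mounts:
        return "unmounted"
    if mounts[mount_point] != set(expected_options.split(",")):
        return "options_mismatch"
    return "ok"
```

A caller could read /etc/mtab into a string and pass it as mtab_text together with the stored options for the resource.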
If an unmounted or improperly mounted file system is detected, local recovery is invoked and will attempt to remount the file system with the correct mount options.
If the remount fails, a failover is attempted to resolve the condition. Common causes of remount failure that lead to a failover include:
corrupted file system (fsck failure)
failure to create mount point directory
mount point is busy
LifeKeeper internal error
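The escalation order described above (local recovery first, failover only if the remount fails) can be sketched in Python as follows. The remount and failover arguments are hypothetical callables that return True on success; they stand in for whatever LifeKeeper does internally. The "unresolved" outcome is an assumption, since the text does not say what happens if the failover also fails.

```python
def recover(remount, failover):
    """Try a local remount first; fail over only if the remount fails."""
    if remount():
        return "recovered_locally"
    if failover():
        return "failed_over"
    return "unresolved"  # assumed outcome; not described in the text
```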
© 2016 SIOS Technology Corp., the industry's leading provider of business continuity solutions, data replication for continuous data protection.