
File System Health Monitoring

The File System Health Monitoring feature detects conditions that could cause LifeKeeper protected applications that depend on the file system to fail. Monitoring occurs only on active/in-service resources (that is, file systems). The two conditions that are monitored are:

  • a full or almost full file system

  • an unmounted or improperly mounted file system

When either of these two conditions is detected, one of several actions might be taken.

Condition Definitions

Full or Almost Full File System

A "disk full" condition can be detected, but it cannot be resolved by performing a local recovery or failover; administrator intervention is required. A message will be logged by default. Additional notification functionality is available. For example, an email can be sent to a system administrator, or another application can be invoked to send a warning message by some other means. To enable this notification functionality, refer to the topic Configuring LifeKeeper Event Email Notification.

In addition to a "disk full" condition, a "disk almost full" condition can be detected and a warning message logged in the LifeKeeper log.

The "disk full" threshold is:

FILESYSFULLERROR=95

The "disk almost full" threshold is:

FILESYSFULLWARN=90

The default values are 90% and 95% as shown, but are configurable via tunables in the /etc/default/LifeKeeper file. The meanings of these two thresholds are as follows:

FILESYSFULLWARN - When a file system reaches this percentage full, a message will be displayed in the LifeKeeper log.

FILESYSFULLERROR - When a file system reaches this percentage full, a message will be displayed in the LifeKeeper log as well as the system log. The file system notify script will also be called.
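
The way the two thresholds partition file system usage can be illustrated with a short Python sketch. This is not LifeKeeper's implementation: the function names classify_usage and usage_percent are hypothetical, "reaches" is assumed to mean greater than or equal to, and the percentage computed here from shutil.disk_usage may differ slightly from what LifeKeeper or df reports (for example, because of reserved blocks).

```python
import shutil

# Defaults mirroring the tunables described above (assumed values,
# normally read from /etc/default/LifeKeeper).
FILESYSFULLWARN = 90
FILESYSFULLERROR = 95


def classify_usage(percent_full, warn=FILESYSFULLWARN, error=FILESYSFULLERROR):
    """Return 'error', 'warn', or 'ok' for a percentage-full value.

    Assumption: a threshold is "reached" when usage is >= the threshold.
    """
    if percent_full >= error:
        return "error"   # "disk full": log, system log, notify script
    if percent_full >= warn:
        return "warn"    # "disk almost full": warning in the log
    return "ok"


def usage_percent(mount_point):
    """Percentage of the file system at mount_point that is in use."""
    usage = shutil.disk_usage(mount_point)
    return 100.0 * usage.used / usage.total
```

For example, classify_usage(usage_percent("/data")) would return "warn" for a hypothetical /data file system that is 92% full with the default thresholds.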

Unmounted or Improperly Mounted File System

LifeKeeper checks the /etc/mtab file to determine whether a LifeKeeper protected file system that is in service is actually mounted. In addition, the mount options are checked against the stored mount options in the filesys resource information field to ensure that they match the original mount options used at the time the hierarchy was created.
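
The check described above can be sketched in Python as follows. This is an illustration only, not LifeKeeper's code: the function names are hypothetical, the stored options are assumed to be a comma-separated string, and the comparison is assumed to be an order-insensitive exact match. Escaped characters in mount point names (such as \040 for a space) are not handled.

```python
def parse_mtab(text):
    """Map each mount point to its set of mount options.

    Expects mtab-format lines: device, mount point, type, options, ...
    """
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0].startswith("#"):
            continue  # skip blank, malformed, or comment lines
        mounts[fields[1]] = set(fields[3].split(","))
    return mounts


def check_mount(mtab_text, mount_point, expected_options):
    """Return 'unmounted', 'wrong_options', or 'ok' for a mount point."""
    mounts = parse_mtab(mtab_text)
    if mount_point not in mounts:
        return "unmounted"
    if mounts[mount_point] != set(expected_options.split(",")):
        return "wrong_options"
    return "ok"
```

In practice mtab_text would be the contents of /etc/mtab, and expected_options would come from the stored mount options of the filesys resource.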

If an unmounted or improperly mounted file system is detected, local recovery is invoked and will attempt to remount the file system with the correct mount options.  

If the remount fails, failover will be attempted to resolve the condition. The following is a list of common causes for remount failure which would lead to a failover:

  • corrupted file system (fsck failure)

  • failure to create mount point directory

  • mount point is busy    

  • mount failure   

  • LifeKeeper internal error    
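
The escalation order described in this section (local recovery first, failover only if the remount fails) can be summarized in a minimal Python sketch. The function and its callable parameters are hypothetical placeholders and do not correspond to any LifeKeeper interface.

```python
def handle_mount_problem(remount, failover):
    """Try a local remount first; fail over only if the remount fails.

    remount:  callable returning True if the file system was remounted
              with the correct options (hypothetical placeholder).
    failover: callable that starts a failover (hypothetical placeholder).
    """
    if remount():
        return "recovered_locally"
    failover()
    return "failed_over"
```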


© 2016 SIOS Technology Corp., the industry's leading provider of business continuity solutions, data replication for continuous data protection.