If one of the legs of a mirror fails, that leg can be repaired.

If a problem occurs, the resource will be marked OSF (out of service, failed). (Note: An email notification will be sent if notification is enabled.)

Figure 16 – LifeKeeper Hierarchy With Failed Component

The mdComponent could be marked OSF while the disk itself is okay but the component is marked “faulty” in the mirror. This can happen when mdadm detects an issue while the device is being brought online (check the error log for further information), or when the mdadm utility was used manually to “break” the mirror.
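
For illustration, a mirror leg can be “broken” by hand with the mdadm utility as in the sketch below (the array and device names are hypothetical):

    # Manually mark one leg of the mirror as "faulty" (hypothetical names)
    mdadm --manage /dev/md0 --fail /dev/sdb1

    # Verify that the component is now listed as faulty in the mirror
    mdadm --detail /dev/md0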

The mdComponent, as well as the underlying disk/device, could be marked OSF if they failed during the in-service operation, for example, if the disk was broken or physically disconnected when the virtual device was started.

The following screenshots depict an array failure, from the state before the failure, through the initial handling of that failure, to updating the component state to “failed” and bringing it back in service. (These screenshots include an example using a “terminal resource” to tie the bottom of each hierarchy to a single resource.)

Figure 17 – Before Failure of Array

Figure 18 – After Failure of Array

When the failure of the array is initially handled, all resources will be marked OSF. During this failure, I/Os continue to the good leg of the mirror.
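
While the mirror is running degraded, its state can be checked from the command line. The /proc/mdstat output below is an abridged, illustrative sketch with hypothetical device names; the failed leg is flagged with (F) and the status field drops from [2/2] to [2/1]:

    # Check mirror status while running degraded (output abridged)
    cat /proc/mdstat

    md0 : active raid1 sdc1[1] sdb1[0](F)
          1048512 blocks [2/1] [_U]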

Figure 19 – Failed Disk Array

Figure 20 – Updating Failed Component to Standby

If the failed component was successfully removed from the mirror configuration during the error handling, the resource will transition to OSU (out of service, unimpaired). This transition occurs when the MD quickCheck runs after the failure. If the failed component could not be removed from the mirror configuration during the handling, the resource will remain in the OSF state.
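
The removal that LifeKeeper performs during error handling corresponds to the following manual mdadm step, shown here only for illustration (hypothetical names):

    # Remove a component that has already been marked faulty from the mirror
    mdadm --manage /dev/md0 --remove /dev/sdb1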

Figure 21 – Restored Storage Resources

If the server has to reboot while in the failed state, perhaps to repair the storage, then the storage resources under the failed component will be restored (if the failure was properly repaired), but the failed component will not automatically be re-added to the mirror. An in-service operation (from the GUI or using perform_action(1M)) on the failed component will re-add it, which resumes I/O to that leg. The mirror will then perform a partial resync if an internal bitmap is configured; otherwise, a full resync will be done.
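
As a sketch, the in-service operation can be driven from the command line as follows; the resource tag names are hypothetical:

    # Bring the failed mdComponent resource back in service; this re-adds
    # the leg to the mirror and resumes I/O to it (hypothetical tag name)
    perform_action -t <md-component-tag> -a restore

    # Check whether the array has an internal write-intent bitmap; if so,
    # only a partial resync is needed, otherwise a full resync runs
    mdadm --detail /dev/md0 | grep -i bitmap

    # Monitor the resync progress
    cat /proc/mdstat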

Figure 22 – Software RAID In-Service Status

If the failed leg is repaired manually in the virtual device, LifeKeeper will automatically detect the change when quickCheck runs, and the state of the resource will change to reflect its new state. However, if the resources below the component (that is, the device and/or disk) are failed, those states will not be updated automatically. To update them, the GUI or perform_action(1M) must be used to bring the resource(s) in service.
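
A manual repair of this kind might look like the following sketch (hypothetical names); quickCheck would then detect the state change on its next run:

    # Re-add the repaired leg to the mirror by hand; the mirror begins
    # resyncing and quickCheck picks up the new component state
    mdadm --manage /dev/md0 --add /dev/sdb1

    # Resources below the component (device/disk) still require an explicit
    # in-service to update their states (hypothetical tag name)
    perform_action -t <disk-resource-tag> -a restore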

Figure 23 – Software RAID Successful In-Service
