If one leg of a mirror fails, that leg can be repaired.
If a problem occurs, the resource will be marked OSF. (Note: An email notification will be sent if notifications are enabled.)
The mdComponent could be marked OSF while the disk itself is okay but the component is marked “faulty” in the mirror. This can happen when mdadm detects an issue as the device is brought online (check the error log for details), or when the mdadm utility was used manually to “break” the mirror.
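As a rough illustration of the second case, the commands below sketch how a leg might be manually marked faulty and how that state appears afterward. The array name `/dev/md0` and the device names are hypothetical, and the `/proc/mdstat` excerpt is a sample, not output captured from a real system:

```shell
# Manually marking a leg faulty ("breaking" the mirror) requires root and
# would look like this on a hypothetical array /dev/md0 with leg /dev/sdb1:
#   mdadm --manage /dev/md0 --fail /dev/sdb1
#
# The faulty leg is then flagged with (F) in /proc/mdstat. Sample excerpt:
mdstat='md0 : active raid1 sdb1[1](F) sda1[0]
      1048512 blocks [2/1] [U_]'

# Extract the faulty component name from the sample:
echo "$mdstat" | grep -oE '[a-z0-9]+\[[0-9]+\]\(F\)'
```

The `[2/1] [U_]` summary in the sample shows that only one of the two configured members is active.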
Both the mdComponent and the underlying disk/device can be marked OSF if they fail during the in-service operation, for example, if the disk was broken or physically disconnected when the virtual device was started.
The following screen shots walk through an array failure: the state before the failure, the initial handling of the failure, the update of the state to “failed,” and the return to service. (These screen shots include an example that uses a “terminal resource” to tie the bottom of each hierarchy to a single resource.)
When the failure of the array is initially handled, all resources will be marked OSF. During this failure, I/Os continue to the good component (leg) of the mirror.
If the failed component was successfully removed from the mirror configuration during error handling, the resource transitions to OSU when the MD quickCheck runs after the failure. If the failed component could not be removed from the mirror configuration, the resource remains in the OSF state.
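To make the two outcomes concrete, the sketch below distinguishes them from the array state. This is a sample `/proc/mdstat` excerpt with hypothetical device names, not output from a real system; after a successful removal the faulty `(F)` entry is gone but the array still reports itself degraded:

```shell
# Degraded-but-running state, as it might appear after the failed leg was
# removed from the mirror configuration (device names are hypothetical):
mdstat='md0 : active raid1 sda1[0]
      1048512 blocks [2/1] [U_]'

# [2/1] means 2 configured members with only 1 active; [U_] shows which
# slot is down. A degraded-but-clean array corresponds to the OSU case.
if echo "$mdstat" | grep -q '\[U_\]'; then
  echo "degraded"
fi
```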
If the server must reboot while in the failed state, perhaps to repair the storage, the storage resources under the failed component will be restored (if the storage was properly repaired), but the failed component will not automatically be re-added to the mirror. Bringing the failed component in service (from the GUI or with perform_action(1M)) re-adds it and resumes I/O to that leg. The mirror then performs a partial resync if an internal bitmap is configured; otherwise it performs a full resync.
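The re-add and resync behavior can be sketched as follows. The `mdadm` command is shown as a comment because it requires root and a real array; the `/proc/mdstat` excerpt is a hypothetical sample showing what a partial resync with an internal bitmap might look like:

```shell
# Re-adding the repaired leg (requires root; device names are hypothetical):
#   mdadm --manage /dev/md0 --add /dev/sdb1
#
# With an internal write-intent bitmap, only the regions written while the
# leg was out resync. Sample /proc/mdstat excerpt during recovery:
mdstat='md0 : active raid1 sdb1[1] sda1[0]
      1048512 blocks [2/1] [U_]
      [==>..................]  recovery = 12.5% (131064/1048512)
      bitmap: 0/8 pages [0KB], 64KB chunk'

# A "bitmap:" line indicates an internal bitmap is configured, so the
# resync is partial rather than full:
echo "$mdstat" | grep -c 'bitmap:'
```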
If the failed leg is repaired manually in the virtual device, LifeKeeper automatically detects the change when quickCheck runs, and the state of the resource changes to reflect the repair. However, if the resources below the component (that is, the device and/or disk) are failed, those states will not be updated automatically. To update them, use the GUI or perform_action(1M) to bring the resource(s) in service.