aboutsummaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorJonathan Brassow <jbrassow@redhat.com>2009-09-04 15:40:32 -0400
committerAlasdair G Kergon <agk@redhat.com>2009-09-04 15:40:32 -0400
commitd2b698644c97cb033261536a4f2010924a00eac9 (patch)
tree32192650a01c913a85953c0a756fca06320fef46
parentb8313b6da7e2e7c7f47d93d8561969a3ff9ba0ea (diff)
dm raid1: do not allow log_failure variable to unset after being set
This patch fixes a bug which was triggering a case where the primary leg could not be changed on failure even when the mirror was in-sync. The case involves the failure of the primary device along with the transient failure of the log device. The problem is that bios can be put on the 'failures' list (due to log failure) before 'fail_mirror' is called due to the primary device failure. Normally, this is fine, but if the log device failure is transient, a subsequent iteration of the work thread, 'do_mirror', will reset 'log_failure'. The 'do_failures' function then resets the 'in_sync' variable when processing bios on the failures list. The 'in_sync' variable is what is used to determine if the primary device can be switched in the event of a failure. Since this has been reset, the primary device is incorrectly assumed to be not switchable. The case has been seen in the cluster mirror context, where one machine realizes the log device is dead before the other machines. As the responsibilities of the server migrate from one node to another (because the mirror is being reconfigured due to the failure), the new server may think for a moment that the log device is fine - thus resetting the 'log_failure' variable. In any case, it is inappropiate for us to reset the 'log_failure' variable. The above bug simply illustrates that it can actually hurt us. Cc: stable@kernel.org Signed-off-by: Jonathan Brassow <jbrassow@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
-rw-r--r--drivers/md/dm-raid1.c8
1 files changed, 7 insertions, 1 deletions
diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c
index 9726577cde49..33f179e66bf5 100644
--- a/drivers/md/dm-raid1.c
+++ b/drivers/md/dm-raid1.c
@@ -648,7 +648,13 @@ static void do_writes(struct mirror_set *ms, struct bio_list *writes)
648 */ 648 */
649 dm_rh_inc_pending(ms->rh, &sync); 649 dm_rh_inc_pending(ms->rh, &sync);
650 dm_rh_inc_pending(ms->rh, &nosync); 650 dm_rh_inc_pending(ms->rh, &nosync);
651 ms->log_failure = dm_rh_flush(ms->rh) ? 1 : 0; 651
652 /*
653 * If the flush fails on a previous call and succeeds here,
654 * we must not reset the log_failure variable. We need
655 * userspace interaction to do that.
656 */
657 ms->log_failure = dm_rh_flush(ms->rh) ? 1 : ms->log_failure;
652 658
653 /* 659 /*
654 * Dispatch io. 660 * Dispatch io.