diff options
author | Jun'ichi Nomura <j-nomura@ce.jp.nec.com> | 2006-03-27 04:17:47 -0500 |
---|---|---|
committer | Linus Torvalds <torvalds@g5.osdl.org> | 2006-03-27 11:44:58 -0500 |
commit | 930d332a23682202c07df0276dd665a57755b37d (patch) | |
tree | 0a0202fb581a65a0b63a4a98aa725eb2088a7814 /drivers/md | |
parent | 76df1c651b66bdf07d60b3d60789feb5f58d73e3 (diff) |
[PATCH] drivers/md/dm-raid1.c: Fix inconsistent mirroring after interrupted recovery
dm-mirror has potential data corruption problem: while on-disk log shows
that all disk contents are in-sync, actual contents of the disks are not
synchronized. This problem occurs if initial recovery (synching) is
interrupted and resumed.
Attached patch fixes this problem.
Background:
rh_dec() changes the region state from RH_NOSYNC (out-of-sync) to RH_CLEAN
(in-sync), which results in the corresponding bit of clean_bits being set.
This is harmful if on-disk log is used and the map is removed/suspended
before the initial sync is completed. The clean_bits is written down to
the on-disk log at the map removal, and, upon resume, it's read and copied
to sync_bits. Since the recovery process refers to the sync_bits to find a
region to be recovered, the region whose state was changed from RH_NOSYNC
to RH_CLEAN is no longer recovered.
If you haven't applied dm-raid1-read-balancing.patch proposed in dm-devel
sometimes ago, the contents of the mirrored disk just corrupt silently. If
you have, balanced read may get bogus data from out-of-sync disks.
The patch keeps RH_NOSYNC state unchanged. It will be changed to
RH_RECOVERING when recovery starts and get reclaimed when the recovery
completes. So it doesn't leak the region hash entry.
Description:
Keep RH_NOSYNC state unchanged when I/O on the region completes.
rh_dec() changes the region state from RH_NOSYNC (out-of-sync) to RH_CLEAN
(in-sync), which results in the corresponding bit of clean_bits being set.
This is harmful if on-disk log is used and the map is removed/suspended
before the initial sync is completed. The clean_bits is written down to
the on-disk log at the map removal, and, upon resume, it's read and copied
to sync_bits. Since the recovery process refers to the sync_bits to find a
region to be recovered, the region whose state was changed from RH_NOSYNC
to RH_CLEAN is no longer recovered.
If you haven't applied dm-raid1-read-balancing.patch proposed in dm-devel
sometimes ago, the contents of the mirrored disk just corrupt silently. If
you have, balanced read may get bogus data from out-of-sync disks.
The RH_NOSYNC region will be changed to RH_RECOVERING when recovery starts
on the region and get reclaimed when the recovery completes. So it doesn't
leak the region hash entry.
Alasdair said:
I've analysed the relevant part of the state machine and I believe that
the patch is correct.
(Further work on this code is still needed - this patch has the
side-effect of holding onto memory unnecessarily for long periods of time
under certain workloads - but better that than corrupting data.)
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Diffstat (limited to 'drivers/md')
-rw-r--r-- | drivers/md/dm-raid1.c | 14 |
1 files changed, 13 insertions, 1 deletions
diff --git a/drivers/md/dm-raid1.c b/drivers/md/dm-raid1.c index 4e90f231fbf..3d90f1b37e0 100644 --- a/drivers/md/dm-raid1.c +++ b/drivers/md/dm-raid1.c | |||
@@ -402,9 +402,21 @@ static void rh_dec(struct region_hash *rh, region_t region) | |||
402 | 402 | ||
403 | spin_lock_irqsave(&rh->region_lock, flags); | 403 | spin_lock_irqsave(&rh->region_lock, flags); |
404 | if (atomic_dec_and_test(®->pending)) { | 404 | if (atomic_dec_and_test(®->pending)) { |
405 | /* | ||
406 | * There is no pending I/O for this region. | ||
407 | * We can move the region to corresponding list for next action. | ||
408 | * At this point, the region is not yet connected to any list. | ||
409 | * | ||
410 | * If the state is RH_NOSYNC, the region should be kept off | ||
411 | * from clean list. | ||
412 | * The hash entry for RH_NOSYNC will remain in memory | ||
413 | * until the region is recovered or the map is reloaded. | ||
414 | */ | ||
415 | |||
416 | /* do nothing for RH_NOSYNC */ | ||
405 | if (reg->state == RH_RECOVERING) { | 417 | if (reg->state == RH_RECOVERING) { |
406 | list_add_tail(®->list, &rh->quiesced_regions); | 418 | list_add_tail(®->list, &rh->quiesced_regions); |
407 | } else { | 419 | } else if (reg->state == RH_DIRTY) { |
408 | reg->state = RH_CLEAN; | 420 | reg->state = RH_CLEAN; |
409 | list_add(®->list, &rh->clean_regions); | 421 | list_add(®->list, &rh->clean_regions); |
410 | } | 422 | } |