author | NeilBrown <neilb@suse.com> | 2017-04-02 22:11:32 -0400 |
---|---|---|
committer | Shaohua Li <shli@fb.com> | 2017-04-10 13:35:27 -0400 |
commit | 7471fb77ce4dc4cb81291189947fcdf621a97987 | |
tree | 8b8e0ee2aec838c866e15634c43e3f98215b5b88 | |
parent | 583da48e388f472e8818d9bb60ef6a1d40ee9f9d |
md/raid6: Fix anomaly when recovering a single device in RAID6.

When recovering a single missing/failed device in a RAID6,
those stripes where the Q block is on the missing device are
handled a bit differently. In these cases it is easy to
check that the P block is correct, so we do. This results
in the P block being destroyed. Consequently the P block needs
to be read a second time in order to compute Q. This causes
lots of seeks and hurts performance.

It shouldn't be necessary to re-read P as it can be computed
from the DATA. But we only compute blocks on missing
devices, since c337869d9501 ("md: do not compute parity
unless it is on a failed drive").

So relax the change made in that commit to allow computing
the P block in a RAID6 when it is the only missing
block.

This makes RAID6 recovery run much faster as the disk just
"before" the recovering device is no longer seeking
back-and-forth.

Reported-by-tested-by: Brad Campbell <lists2009@fnarfbargle.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
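
The commit message above says that checking P destroys it, and that P can be recomputed from the data instead of being re-read. The following is a minimal userspace sketch of why that can happen. It assumes the check XORs the data blocks into the P buffer in place and tests the result for zero. The names `NDATA`, `BLK`, `xor_into`, `check_p_in_place` and `compute_p` are hypothetical and are not taken from raid5.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NDATA 4	/* hypothetical number of data blocks per stripe */
#define BLK   8	/* hypothetical block size in bytes */

/* XOR every data block into dst, in place. */
static void xor_into(uint8_t *dst, uint8_t data[NDATA][BLK])
{
	for (size_t d = 0; d < NDATA; d++)
		for (size_t i = 0; i < BLK; i++)
			dst[i] ^= data[d][i];
}

/*
 * Assumed in-place P check: XOR the data into the P buffer and test
 * for all zeroes. Returns 1 when P matched the data. Either way the
 * buffer no longer holds P once this returns.
 */
static int check_p_in_place(uint8_t *p, uint8_t data[NDATA][BLK])
{
	xor_into(p, data);
	for (size_t i = 0; i < BLK; i++)
		if (p[i])
			return 0;
	return 1;
}

/* Recompute P from the data blocks alone, with no read of the P device. */
static void compute_p(uint8_t *p, uint8_t data[NDATA][BLK])
{
	memset(p, 0, BLK);
	xor_into(p, data);
}
```

After a successful `check_p_in_place()` the buffer is all zeroes, so P has to come from somewhere before Q can be computed: either a second read from disk (the seek the commit removes) or a call like `compute_p()`.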
-rw-r--r-- | drivers/md/raid5.c | 13 |
1 files changed, 12 insertions, 1 deletions
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index a5676559e7a6..09d94ad5e52b 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3619,9 +3619,20 @@ static int fetch_block(struct stripe_head *sh, struct stripe_head_state *s,
 		BUG_ON(test_bit(R5_Wantcompute, &dev->flags));
 		BUG_ON(test_bit(R5_Wantread, &dev->flags));
 		BUG_ON(sh->batch_head);
+
+		/*
+		 * In the raid6 case if the only non-uptodate disk is P
+		 * then we already trusted P to compute the other failed
+		 * drives. It is safe to compute rather than re-read P.
+		 * In other cases we only compute blocks from failed
+		 * devices, otherwise check/repair might fail to detect
+		 * a real inconsistency.
+		 */
+
 		if ((s->uptodate == disks - 1) &&
+		    ((sh->qd_idx >= 0 && sh->pd_idx == disk_idx) ||
 		    (s->failed && (disk_idx == s->failed_num[0] ||
-				   disk_idx == s->failed_num[1]))) {
+				   disk_idx == s->failed_num[1])))) {
 			/* have disk failed, and we're requested to fetch it;
 			 * do compute it
 			 */