md/raid6: Fix anomaly when recovering a single device in RAID6.

When recovering a single missing/failed device in a RAID6, those stripes where the Q block is on the missing device are handled a bit differently. In these cases it is easy to check that the P block is correct, so we do. This results in the P block being destroyed. Consequently the P block needs to be read a second time in order to compute Q. This causes lots of seeks and hurts performance.

It shouldn't be necessary to re-read P as it can be computed from the DATA. But we only compute blocks on missing devices, since c337869d9501 ("md: do not compute parity unless it is on a failed drive").

So relax the change made in that commit to allow computing of the P block in a RAID6 when it is the only missing block.

This makes RAID6 recovery run much faster as the disk just "before" the recovering device is no longer seeking back-and-forth.

Reported-by-tested-by: Brad Campbell <lists2009@fnarfbargle.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
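To illustrate why P can be recomputed from the data rather than re-read: P is the bytewise XOR of the data blocks in a stripe. The following is a minimal userspace sketch of that relationship, not the kernel's code path (which goes through its own XOR/async machinery). The function name `compute_p` and its parameters are hypothetical and chosen only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: rebuild the P block of one
 * stripe as the bytewise XOR of all of its data blocks.
 *
 *   p     - output buffer of 'len' bytes
 *   data  - array of 'ndata' pointers, each to a data block of 'len' bytes
 */
static void compute_p(uint8_t *p, const uint8_t *const *data,
                      size_t ndata, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint8_t acc = 0;

        /* XOR byte i of every data block together. */
        for (size_t d = 0; d < ndata; d++)
            acc ^= data[d][i];
        p[i] = acc;
    }
}
```

Since every data block is already in memory when P is the only block that is not up to date, a computation of this kind avoids the extra read and the seek that goes with it.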
author: NeilBrown <neilb@suse.com> 2017-04-02 22:11:32 -0400
committer: Shaohua Li <shli@fb.com> 2017-04-10 13:35:27 -0400
commit: 7471fb77ce4dc4cb81291189947fcdf621a97987 (patch)
tree: 8b8e0ee2aec838c866e15634c43e3f98215b5b88
parent: 583da48e388f472e8818d9bb60ef6a1d40ee9f9d (diff)
1 files changed, 12 insertions, 1 deletions
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index a5676559e7a6..09d94ad5e52b 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3619,9 +3619,20 @@ static int fetch_block(struct stripe_head *sh, struct stripe_head_state *s,
                BUG_ON(test_bit(R5_Wantcompute, &dev->flags));
                BUG_ON(test_bit(R5_Wantread, &dev->flags));
                BUG_ON(sh->batch_head);
+                /*
+                 * In the raid6 case if the only non-uptodate disk is P
+                 * then we already trusted P to compute the other failed
+                 * drives. It is safe to compute rather than re-read P.
+                 * In other cases we only compute blocks from failed
+                 * devices, otherwise check/repair might fail to detect
+                 * a real inconsistency.
+                 */
                if ((s->uptodate == disks - 1) &&
+                    ((sh->qd_idx >= 0 && sh->pd_idx == disk_idx) ||
                    (s->failed && (disk_idx == s->failed_num[0] ||
-                                   disk_idx == s->failed_num[1]))) {
+                                   disk_idx == s->failed_num[1])))) {
                        /* have disk failed, and we're requested to fetch it;
                         * do compute it
                         */
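
The relaxed condition in the hunk above can be read as a standalone predicate. The sketch below is a simplified userspace model for illustration only: `struct stripe_view` and `may_compute` are hypothetical names that flatten the fields the condition reads from `struct stripe_head` and `struct stripe_head_state`. It assumes, as the `sh->qd_idx >= 0` test in the patch suggests, that a negative `qd_idx` means the array has no Q block (that is, it is not RAID6).

```c
#include <stdbool.h>

/* Hypothetical, flattened view of the fields the condition reads. */
struct stripe_view {
    int disks;          /* total devices in the stripe */
    int uptodate;       /* number of devices whose block is up to date */
    int failed;         /* number of failed devices */
    int failed_num[2];  /* indices of failed devices, -1 if unused */
    int pd_idx;         /* index of the P block */
    int qd_idx;         /* index of the Q block, assumed < 0 if not RAID6 */
};

/* Return true if the block at disk_idx may be computed instead of read. */
static bool may_compute(const struct stripe_view *s, int disk_idx)
{
    /* Computing is only possible when exactly one block is missing. */
    if (s->uptodate != s->disks - 1)
        return false;

    /* New in this patch: on RAID6, P may be computed rather than re-read. */
    if (s->qd_idx >= 0 && s->pd_idx == disk_idx)
        return true;

    /* Previous rule: only blocks on failed devices are computed. */
    return s->failed && (disk_idx == s->failed_num[0] ||
                         disk_idx == s->failed_num[1]);
}
```

In this model, a non-RAID6 stripe with no failed device still returns false for P, which matches the patch comment's concern that check/repair must be able to detect a real inconsistency.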