author	Shaohua Li <shli@fb.com>	2017-02-28 16:00:20 -0500
committer	Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2017-03-26 07:05:57 -0400
commit	ad5166415ff3178cc75331dc6366ea8f4e48207d (patch)
tree	b5d9b456d1758b86d02cddd8cd6d5eb9d85f3bd3
parent	4265e0b487da6427105e8238679a237704557945 (diff)
md/raid1/10: fix potential deadlock
commit 61eb2b43b99ebdc9bc6bc83d9792257b243e7cb3 upstream.

Neil Brown pointed out a potential deadlock in raid10 code with
bio_split/chain. The raid1 code could have the same issue, but recent
barrier rework makes it less likely to happen.

The deadlock happens in the following sequence:

1. generic_make_request(bio), this will set current->bio_list
2. raid10_make_request will split bio to bio1 and bio2
3. __make_request(bio1), wait_barrier, add underlying disk bio to
   current->bio_list
4. __make_request(bio2), wait_barrier

If raise_barrier happens between 3 & 4, then since wait_barrier runs at
3, raise_barrier waits for IO completion from 3. And since
raise_barrier sets the barrier, 4 waits for raise_barrier. But the IO
from 3 can't be dispatched because raid10_make_request() hasn't
finished yet. The result is a deadlock.

The solution is to adjust the IO ordering. Quoting Neil:

"It is much safer to:

    if (need to split) {
        split = bio_split(bio, ...)
        bio_chain(...)
        make_request_fn(split);
        generic_make_request(bio);
    } else
        make_request_fn(mddev, bio);

This way we first process the initial section of the bio (in 'split')
which will queue some requests to the underlying devices. These
requests will be queued in generic_make_request. Then we queue the
remainder of the bio, which will be added to the end of the
generic_make_request queue. Then we return. generic_make_request()
will pop the lower-level device requests off the queue and handle
them first. Then it will process the remainder of the original bio
once the first section has been fully processed."

Note, this only happens in the read path. In the write path, the bio
is flushed to the underlying disks either by blk flush (from schedule)
or offloaded to raid1/10d; it is queued in current->bio_list.

Cc: Coly Li <colyli@suse.de>
Suggested-by: NeilBrown <neilb@suse.com>
Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
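For context, here is a minimal sketch of the reordering Neil describes,
written against the 4.x-era block API (where fs_bio_set is exported as a
struct bio_set pointer). The names my_make_request() and
__handle_request() are hypothetical stand-ins for raid10_make_request()
and __make_request(); this illustrates the pattern, it is not the
upstream code.

/*
 * Illustrative sketch only, not the upstream function.
 * my_make_request() and __handle_request() are hypothetical names;
 * assumes the 4.x-era API where fs_bio_set is a struct bio_set pointer.
 */
#include <linux/bio.h>
#include "md.h"		/* struct mddev, in-tree drivers/md context */

static void __handle_request(struct mddev *mddev, struct bio *bio);

static void my_make_request(struct mddev *mddev, struct bio *bio,
			    int max_sectors)
{
	if (bio_sectors(bio) > max_sectors) {
		/* Split off and handle only the first part now. */
		struct bio *split = bio_split(bio, max_sectors,
					      GFP_NOIO, fs_bio_set);
		bio_chain(split, bio);
		__handle_request(mddev, split);
		/*
		 * Requeue the remainder.  generic_make_request() adds it
		 * to current->bio_list behind the lower-level requests
		 * queued by __handle_request(), so those are dispatched
		 * before the remainder ever waits on the barrier.
		 */
		generic_make_request(bio);
	} else {
		__handle_request(mddev, bio);
	}
}

The key point is that generic_make_request() drains current->bio_list
in FIFO order, so handing the remainder back to it instead of looping
guarantees the lower-level requests for the first half are dispatched
before the second half waits on the barrier.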
-rw-r--r--	drivers/md/raid10.c	18
1 file changed, 18 insertions, 0 deletions
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 39fddda2fef2..55b5e0e77b17 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1470,7 +1470,25 @@ static void raid10_make_request(struct mddev *mddev, struct bio *bio)
 			split = bio;
 		}
 
+		/*
+		 * If a bio is split, the first part of bio will pass
+		 * barrier but the bio is queued in current->bio_list (see
+		 * generic_make_request). If there is a raise_barrier() called
+		 * here, the second part of bio can't pass barrier. But since
+		 * the first part bio isn't dispatched to underlying disks
+		 * yet, the barrier is never released, hence raise_barrier will
+		 * always wait. We have a deadlock.
+		 * Note, this only happens in read path. For write path, the
+		 * first part of bio is dispatched in a schedule() call
+		 * (because of blk plug) or offloaded to raid10d.
+		 * Quitting from the function immediately can change the bio
+		 * order queued in bio_list and avoid the deadlock.
+		 */
 		__make_request(mddev, split);
+		if (split != bio && bio_data_dir(bio) == READ) {
+			generic_make_request(bio);
+			break;
+		}
 	} while (split != bio);
 
 	/* In case raid10d snuck in to freeze_array */