author | Shaohua Li <shli@fb.com> | 2017-02-28 16:00:20 -0500 |
---|---|---|
committer | Greg Kroah-Hartman <gregkh@linuxfoundation.org> | 2017-03-26 07:05:57 -0400 |
commit | ad5166415ff3178cc75331dc6366ea8f4e48207d | |
tree | b5d9b456d1758b86d02cddd8cd6d5eb9d85f3bd3 /drivers/md | |
parent | 4265e0b487da6427105e8238679a237704557945 |
md/raid1/10: fix potential deadlock
commit 61eb2b43b99ebdc9bc6bc83d9792257b243e7cb3 upstream.
Neil Brown pointed out a potential deadlock in the raid10 code with
bio_split()/bio_chain(). The raid1 code could have the same issue, but the
recent barrier rework makes it less likely to happen. The deadlock happens
in the following sequence:
1. generic_make_request(bio) sets current->bio_list.
2. raid10_make_request() splits bio into bio1 and bio2.
3. __make_request(bio1) calls wait_barrier() and adds the underlying disk
bios to current->bio_list.
4. __make_request(bio2) calls wait_barrier().
If raise_barrier() runs between steps 3 and 4, it waits for the IO issued
in step 3 to complete, because step 3 has already passed wait_barrier().
And since raise_barrier() has set the barrier, step 4 waits for
raise_barrier(). But the IO from step 3 can't be dispatched, because
raid10_make_request() hasn't finished yet.
The solution is to adjust the IO ordering. Quoting Neil:
"
It is much safer to:

    if (need to split) {
        split = bio_split(bio, ...)
        bio_chain(...)
        make_request_fn(split);
        generic_make_request(bio);
    } else
        make_request_fn(mddev, bio);
This way we first process the initial section of the bio (in 'split')
which will queue some requests to the underlying devices. These
requests will be queued in generic_make_request.
Then we queue the remainder of the bio, which will be added to the end
of the generic_make_request queue.
Then we return.
generic_make_request() will pop the lower-level device requests off the
queue and handle them first. Then it will process the remainder
of the original bio once the first section has been fully processed.
"
Note that this only happens in the read path. In the write path, the bio is
flushed to the underlying disks either by a blk plug flush (from schedule())
or offloaded to raid1d/raid10d.
It's queued in current->bio_list.
Cc: Coly Li <colyli@suse.de>
Suggested-by: NeilBrown <neilb@suse.com>
Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Diffstat (limited to 'drivers/md')
-rw-r--r-- | drivers/md/raid10.c | 18 |
1 files changed, 18 insertions, 0 deletions

    diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
    index 39fddda2fef2..55b5e0e77b17 100644
    --- a/drivers/md/raid10.c
    +++ b/drivers/md/raid10.c
    @@ -1470,7 +1470,25 @@ static void raid10_make_request(struct mddev *mddev, struct bio *bio)
     			split = bio;
     		}
     
    +		/*
    +		 * If a bio is splitted, the first part of bio will pass
    +		 * barrier but the bio is queued in current->bio_list (see
    +		 * generic_make_request). If there is a raise_barrier() called
    +		 * here, the second part of bio can't pass barrier. But since
    +		 * the first part bio isn't dispatched to underlaying disks
    +		 * yet, the barrier is never released, hence raise_barrier will
    +		 * alays wait. We have a deadlock.
    +		 * Note, this only happens in read path. For write path, the
    +		 * first part of bio is dispatched in a schedule() call
    +		 * (because of blk plug) or offloaded to raid10d.
    +		 * Quitting from the function immediately can change the bio
    +		 * order queued in bio_list and avoid the deadlock.
    +		 */
     		__make_request(mddev, split);
    +		if (split != bio && bio_data_dir(bio) == READ) {
    +			generic_make_request(bio);
    +			break;
    +		}
     	} while (split != bio);
     
     	/* In case raid10d snuck in to freeze_array */