author: Eric Mei <eric.mei@seagate.com> 2015-03-19 01:39:11 -0400
committer: NeilBrown <neilb@suse.de> 2015-04-21 18:00:43 -0400
commit: 9ffc8f7cb9647b13dfe4d1ad0d5e1427bb8b46d6
tree: ffb66260046b7c622362daf1fa0105e2c9b06ce7
parent: edbe83ab4c27ea6669eb57adb5ed7eaec1118ceb
md/raid5: don't do chunk aligned read on degraded array.

When an array is degraded, a read that lands on a failed drive results in
reading the rest of the data in the stripe. So a single sequential read
results in the same data being read twice.

This patch avoids chunk aligned reads on a degraded array. The downside is
that reads then go through the stripe cache, which means the associated CPU
overhead and an extra memory copy.

Test Results:

The following tests were done on an enterprise storage node with Seagate 6T
SAS drives and a Xeon E5-2648L CPU (10 cores, 1.9GHz), 10-disk MD RAID6 8+2,
chunk size 128 KiB.

I used FIO with direct-io, various bs sizes and enough queue depth, and
tested sequential and 100% random read against 3 array configs:
 1) optimal, as baseline;
 2) degraded;
 3) degraded with this patch.
Kernel version is 4.0-rc3.

I ran each individual test only once, so there might be some variation, but
we just focus on the big trend.

Sequential Read:
 bs (KiB)   optimal (MiB/s)   degraded (MiB/s)   degraded-with-patch (MiB/s)
   1024          1608               656                    995
    512          1624               710                    956
    256          1635               728                    980
    128          1636               771                    983
     64          1612              1119                   1000
     32          1580              1420                   1004
     16          1368               688                    986
      8           768               647                    953
      4           411               413                    850

Random Read:
 bs (KiB)   optimal (IOPS)    degraded (IOPS)    degraded-with-patch (IOPS)
   1024           163               160                    156
    512           274               273                    272
    256           426               428                    424
    128           576               592                    591
     64           726               724                    726
     32           849               848                    837
     16           900               970                    971
      8           927               940                    929
      4           948               940                    955

Some notes:
 * In sequential + optimal, as the bs size gets smaller, the FIO thread
   becomes CPU bound.
 * In sequential + degraded, there's a big increase when bs is 64K and 32K.
   I don't have an explanation.
 * In sequential + degraded-with-patch, the MD thread mostly becomes CPU
   bound.

If you want, we can discuss specific data points in this data. But in
general it seems that with this patch we have more predictable and, in most
cases, significantly better sequential read performance when the array is
degraded, and almost no noticeable impact on random read.

Performance is a complicated thing. The patch works well for this particular
configuration, but may not be universal. For example, I imagine testing on
an all-SSD array may give very different results. But I personally think
that in most cases IO bandwidth is a scarcer resource than CPU.

Signed-off-by: Eric Mei <eric.mei@seagate.com>
Signed-off-by: NeilBrown <neilb@suse.de>
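
The "same data being read twice" argument in the message can be made concrete with a rough count of device chunk reads per stripe. The sketch below is a toy model, not kernel code: the function names are hypothetical, it assumes exactly one data chunk of the stripe sits on a failed drive, and it leaves the number of parity chunks read during reconstruction as a parameter because that depends on the implementation.

```c
/*
 * Toy model (illustrative assumption, not taken from raid5.c): count the
 * device chunk reads needed to return every data chunk of ONE stripe when
 * exactly one of its data chunks sits on a failed drive.
 *
 * data_disks   - data chunks per stripe (8 for the 8+2 layout tested above)
 * parity_reads - parity chunks read to rebuild the missing chunk (assumed)
 */

/* Degraded array, chunk aligned reads still attempted (before the patch). */
unsigned int reads_chunk_aligned(unsigned int data_disks,
				 unsigned int parity_reads)
{
	/* Surviving data chunks are read directly, bypassing the stripe cache. */
	unsigned int direct = data_disks - 1;
	/*
	 * The chunk on the failed drive falls back to the stripe cache, which
	 * reads the surviving data chunks a second time, plus parity.
	 */
	unsigned int rebuild = (data_disks - 1) + parity_reads;

	return direct + rebuild;
}

/* Degraded array, everything through the stripe cache (with the patch). */
unsigned int reads_stripe_cache(unsigned int data_disks,
				unsigned int parity_reads)
{
	/* Each surviving data chunk is read once, plus parity. */
	return (data_disks - 1) + parity_reads;
}
```

Under these assumptions, with `data_disks = 8` and `parity_reads = 1`, the chunk aligned path issues 15 chunk reads per stripe against 8 for the stripe cache path. The model ignores parity rotation, request merging and CPU cost, so it only shows the direction of the effect, not the measured throughput.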
-rw-r--r-- drivers/md/raid5.c 15
1 files changed, 12 insertions, 3 deletions
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 9716319cc477..77dfd720aaa0 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4632,8 +4632,12 @@ static int raid5_mergeable_bvec(struct mddev *mddev,
 	unsigned int chunk_sectors = mddev->chunk_sectors;
 	unsigned int bio_sectors = bvm->bi_size >> 9;
 
-	if ((bvm->bi_rw & 1) == WRITE)
-		return biovec->bv_len; /* always allow writes to be mergeable */
+	/*
+	 * always allow writes to be mergeable, read as well if array
+	 * is degraded as we'll go through stripe cache anyway.
+	 */
+	if ((bvm->bi_rw & 1) == WRITE || mddev->degraded)
+		return biovec->bv_len;
 
 	if (mddev->new_chunk_sectors < mddev->chunk_sectors)
 		chunk_sectors = mddev->new_chunk_sectors;
@@ -5110,7 +5114,12 @@ static void make_request(struct mddev *mddev, struct bio * bi)
 
 	md_write_start(mddev, bi);
 
-	if (rw == READ &&
+	/*
+	 * If array is degraded, better not do chunk aligned read because
+	 * later we might have to read it again in order to reconstruct
+	 * data on failed drives.
+	 */
+	if (rw == READ && mddev->degraded == 0 &&
 	     mddev->reshape_position == MaxSector &&
 	     chunk_aligned_read(mddev,bi))
 		return;
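
The two conditions changed by this patch can be restated as standalone predicates, which may make the new behaviour easier to check case by case. This is only a sketch: `struct array_state` and both function names are hypothetical stand-ins, and `reshape_in_progress` is assumed to model `mddev->reshape_position != MaxSector`.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the struct mddev fields the patch consults. */
struct array_state {
	int degraded;             /* non-zero when the array is degraded */
	bool reshape_in_progress; /* models reshape_position != MaxSector */
};

/*
 * Mirrors the patched test in make_request(): the chunk aligned bypass is
 * attempted only for reads on a non-degraded array that is not reshaping.
 */
bool may_try_chunk_aligned_read(const struct array_state *a, bool is_read)
{
	return is_read && a->degraded == 0 && !a->reshape_in_progress;
}

/*
 * Mirrors the patched early return in raid5_mergeable_bvec(): writes, and
 * any I/O on a degraded array, are not limited to a chunk boundary because
 * they go through the stripe cache anyway.
 */
bool merge_unrestricted(const struct array_state *a, bool is_write)
{
	return is_write || a->degraded != 0;
}
```

With these predicates, a read on a degraded array gives `may_try_chunk_aligned_read() == false` and `merge_unrestricted() == true`, which matches the intent of the patch: such reads are allowed to merge across chunk boundaries and are always handled by the stripe cache.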