author Robert Becker <Rob.Becker@riverbed.com> 2009-12-13 20:49:58 -0500
committer NeilBrown <neilb@suse.de> 2009-12-13 20:51:41 -0500
commit 1e50915fe0bbf7a46db0fa7e1e604d3fc95f057d
tree 7a722ad6f56c61a6173493f1cd44d809c8b1bd8d /drivers/md/md.h
parent 67b8dc4b06b0e97df55fd76e209f34f9a52e820e
raid: improve MD/raid10 handling of correctable read errors.

We've noticed severe, lasting performance degradation of our raid arrays
when we have drives that yield large numbers of media errors. The raid10
module will queue each failed read for retry, and will also attempt to
call fix_read_error() to perform the read recovery. Read recovery is
performed while the array is frozen, so repeated recovery attempts can
degrade the performance of the array for extended periods of time.

With this patch I propose adding a per md device max number of corrected
read attempts. Each rdev will maintain a count of read correction
attempts in the rdev->read_errors field (not used currently for raid10).
When we enter fix_read_error() we'll check to see when the last read
error occurred, and divide the read error count by 2 for every hour
since the last read error. If at that point our read error count exceeds
the read error threshold, we'll fail the raid device.

In addition, in this patch I add sysfs nodes (get/set) for the per md
max_read_errors attribute and the rdev->read_errors attribute, and add
some printk's to indicate when fix_read_error fails to repair an rdev.

For testing I used debugfs->fail_make_request to inject IO errors to the
rdev while doing IO to the raid array.

Signed-off-by: Robert Becker <Rob.Becker@riverbed.com>
Signed-off-by: NeilBrown <neilb@suse.de>
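
The decay rule described in the message can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the code the patch adds to raid10: `struct rdev_sketch`, `decay_read_errors()`, and `note_read_error()` are hypothetical names, time is passed in as monotonic seconds, and the point at which the counter is incremented (after the decay, before the threshold check) is an assumption the message does not spell out.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECONDS_PER_HOUR 3600

/* Hypothetical stand-in for the per-rdev state; not the kernel struct. */
struct rdev_sketch {
    unsigned int read_errors;     /* read correction attempts so far */
    int64_t last_read_error_sec;  /* monotonic seconds at the last read error */
    bool seen_read_error;         /* false until the first error is recorded */
};

/* Halve the error count once for every full hour since the last error. */
static void decay_read_errors(struct rdev_sketch *rdev, int64_t now_sec)
{
    if (!rdev->seen_read_error) {
        /* First error on this device: nothing to decay, just record the time. */
        rdev->seen_read_error = true;
        rdev->last_read_error_sec = now_sec;
        return;
    }

    int64_t elapsed = now_sec - rdev->last_read_error_sec;
    uint64_t hours = elapsed > 0 ? (uint64_t)elapsed / SECONDS_PER_HOUR : 0;

    rdev->last_read_error_sec = now_sec;

    /* Shifting by the full bit width or more is undefined in C, so clamp to 0. */
    if (hours >= 8 * sizeof(rdev->read_errors))
        rdev->read_errors = 0;
    else
        rdev->read_errors >>= hours;
}

/*
 * Record one read error. Returns true when the decayed count exceeds
 * max_errors, i.e. when the device should be failed. Incrementing after
 * the decay is an assumption of this sketch.
 */
static bool note_read_error(struct rdev_sketch *rdev, int64_t now_sec,
                            unsigned int max_errors)
{
    decay_read_errors(rdev, now_sec);
    rdev->read_errors++;
    return rdev->read_errors > max_errors;
}
```

With a threshold of 2, three errors in quick succession would trip the limit, while an error arriving two hours after that burst would see the count shifted right by two before being incremented, so the device would be back under the threshold.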
Diffstat (limited to 'drivers/md/md.h')
-rw-r--r-- drivers/md/md.h | 4
1 files changed, 4 insertions, 0 deletions
diff --git a/drivers/md/md.h b/drivers/md/md.h
index d9138885b87f..8e4c75c00d46 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -97,6 +97,9 @@ struct mdk_rdev_s
 	atomic_t	read_errors;	/* number of consecutive read errors that
 					 * we have tried to ignore.
 					 */
+	struct timespec last_read_error;	/* monotonic time since our
+						 * last read error
+						 */
 	atomic_t	corrected_errors; /* number of corrected read errors,
 					   * for reporting to userspace and storing
 					   * in superblock.
@@ -299,6 +302,7 @@ struct mddev_s
 		int			external;
 	} bitmap_info;
 
+	atomic_t			max_corr_read_errors; /* max read retries */
 	struct list_head		all_mddevs;
 
 	/* Generic barrier handling.