[PATCH] readahead: backoff on I/O error

Back off the readahead size exponentially on I/O error.

Michael Tokarev <mjt@tls.msk.ru> described the problem as:

[QUOTE]
Suppose there's a CD-rom with a scratch/etc, one sector is unreadable. In order to "fix" it, one have to read it and write to another CD-rom, or something.. or just ignore the error (if it's just a skip in a video stream). Let's assume the unreadable block is number U.

But current behavior is just insane. An application requests block number N, which is before U. Kernel tries to read-ahead blocks N..U. Cdrom drive tries to read it, re-read it.. for some time. Finally, when all the N..U-1 blocks are read, kernel returns block number N (as requested) to an application, successfully.

Now an app requests block number N+1, and kernel tries to read blocks N+1..U+1. Retrying again as in previous step.

And so on, up to when an app requests block number U-1. And when, finally, it requests block U, it receives read error.

So, kernel currently tries to re-read the same failing block as many times as the current readahead value (256 (times?) by default).

This whole process already killed my cdrom drive (I posted about it to LKML several months ago) - literally, the drive has fried, and does not work anymore. Of course that problem was a bug in firmware (or whatever) of the drive *too*, but.. main problem with that is current readahead logic as described above.
[/QUOTE]

Which was confirmed by Jens Axboe <axboe@suse.de>:

[QUOTE]
For ide-cd, it tends to only end the first part of the request on a medium error. So you may see a lot of repeats :/
[/QUOTE]

With this patch, retries are expected to be reduced from, say, 256, to 5.

[akpm@osdl.org: cleanups]
Signed-off-by: Wu Fengguang <wfg@mail.ustc.edu.cn>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
author: Wu Fengguang <wfg@mail.ustc.edu.cn> 2006-06-25 08:48:43 -0400
committer: Linus Torvalds <torvalds@g5.osdl.org> 2006-06-25 13:01:17 -0400
commit: 76d42bd96984832c4ea8bc8cbd74e496ac31409e (patch)
tree: 138fb5c39d671166485cf2e16e450332daeb7081 /mm/filemap.c
parent: 78dbe706e22f54bce61571ad837238382e1ba5f9 (diff)
1 files changed, 28 insertions, 0 deletions
diff --git a/mm/filemap.c b/mm/filemap.c
index 1ed4be2a7654..9c7334bafda8 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -828,6 +828,32 @@ grab_cache_page_nowait(struct address_space *mapping, unsigned long index)
 }
 EXPORT_SYMBOL(grab_cache_page_nowait);
 
+/*
+ * CD/DVDs are error prone. When a medium error occurs, the driver may fail
+ * a _large_ part of the i/o request. Imagine the worst scenario:
+ *
+ *      ---R__________________________________________B__________
+ *         ^ reading here                             ^ bad block(assume 4k)
+ *
+ * read(R) => miss => readahead(R...B) => media error => frustrating retries
+ * => failing the whole request => read(R) => read(R+1) =>
+ * readahead(R+1...B+1) => bang => read(R+2) => read(R+3) =>
+ * readahead(R+3...B+2) => bang => read(R+3) => read(R+4) =>
+ * readahead(R+4...B+3) => bang => read(R+4) => read(R+5) => ......
+ *
+ * It is going insane. Fix it by quickly scaling down the readahead size.
+ */
+static void shrink_readahead_size_eio(struct file *filp,
+                                        struct file_ra_state *ra)
+{
+        if (!ra->ra_pages)
+                return;
+
+        ra->ra_pages /= 4;
+        printk(KERN_WARNING "Reducing readahead size to %luK\n",
+                        ra->ra_pages << (PAGE_CACHE_SHIFT - 10));
+}
+
 /**
 * do_generic_mapping_read - generic file read routine
 * @mapping:    address_space to be read
@@ -985,6 +1011,7 @@ readpage:
                                }
                                unlock_page(page);
                                error = -EIO;
+                                shrink_readahead_size_eio(filp, &ra);
                                goto readpage_error;
                        }
                        unlock_page(page);
@@ -1522,6 +1549,7 @@ page_not_uptodate:
         * Things didn't work out. Return zero to tell the
         * mm layer so, possibly freeing the page cache page first.
         */
+        shrink_readahead_size_eio(file, ra);
        page_cache_release(page);
        return NULL;
 }
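
The "256 to 5" estimate in the commit message can be checked with a small userspace model of the backoff arithmetic. This is only a sketch, not kernel code: `struct ra_state_model`, `model_shrink_on_eio()` and `model_errors_until_disabled()` are hypothetical names introduced here, and the model assumes that every I/O error triggers exactly one call to the shrink helper and that the window starts at the 256 figure quoted above.

```c
/*
 * Userspace model of the backoff in shrink_readahead_size_eio().
 * All names below are hypothetical; only the "divide by 4 unless
 * already zero" arithmetic is taken from the patch.
 */

/* Stand-in for struct file_ra_state; models only the field the patch touches. */
struct ra_state_model {
	unsigned long ra_pages;	/* maximum readahead window, in pages */
};

/* Quarter the window unless it is already zero. Returns 1 if it shrank. */
static int model_shrink_on_eio(struct ra_state_model *ra)
{
	if (!ra->ra_pages)
		return 0;

	ra->ra_pages /= 4;
	return 1;
}

/* Count how many consecutive errors it takes to drive the window to zero. */
static unsigned int model_errors_until_disabled(unsigned long initial)
{
	struct ra_state_model ra = { .ra_pages = initial };
	unsigned int shrinks = 0;

	while (model_shrink_on_eio(&ra))
		shrinks++;

	return shrinks;
}
```

Under these assumptions, `model_errors_until_disabled(256)` walks through 256, 64, 16, 4, 1, 0 and returns 5, which is consistent with the figure given in the commit message. A starting value of 32 reaches zero after 3 errors.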