author     Zach Brown <zach.brown@oracle.com>  2006-12-10 05:21:05 -0500
committer  Linus Torvalds <torvalds@woody.osdl.org>  2006-12-10 12:57:21 -0500
commit     8459d86aff04fa53c2ab6a6b9f355b3063cc8014
tree       c0584c4907f0d63a18998b7cbffdf7900609606b /mm
parent     20258b2b397031649e4a41922fe803d57017df84
[PATCH] dio: only call aio_complete() after returning -EIOCBQUEUED

The only time it is safe to call aio_complete() is when the ->ki_retry function returns -EIOCBQUEUED to the AIO core. direct_io_worker() has historically done this by relying on its caller to translate positive return codes into -EIOCBQUEUED for the aio case. It did this by trying to keep conditionals in sync. direct_io_worker() knew when finished_one_bio() was going to call aio_complete(). It would reverse the test and wait and free the dio in the cases it thought that finished_one_bio() wasn't going to.

Not surprisingly, it ended up getting it wrong. 'ret' could be a negative errno from the submission path but it failed to communicate this to finished_one_bio(). direct_io_worker() would return < 0, its callers wouldn't raise -EIOCBQUEUED, and aio_complete() would be called. In the future finished_one_bio()'s tests wouldn't reflect this and aio_complete() would be called for a second time, which can manifest as an oops.

The previous cleanups have whittled the sync and async completion paths down to the point where we can collapse them and clearly reassert the invariant that we must only call aio_complete() after returning -EIOCBQUEUED. direct_io_worker() will only return -EIOCBQUEUED when it is not the last to drop the dio refcount, and the aio bio completion path will only call aio_complete() when it is the last to drop the dio refcount. direct_io_worker() can ensure that it is the last to drop the reference count by waiting for bios to drain. It does this for sync ops, of course, and for partial dio writes that must fall back to buffered, and for aio ops that saw errors during submission.

This means that operations that end up waiting, even if they were issued as aio ops, will not call aio_complete() from dio. Instead we return the return code of the operation and let the aio core call aio_complete(). This is purposely done to fix a bug where AIO DIO file extensions would call aio_complete() before their callers have a chance to update i_size.

Now that direct_io_worker() is explicitly returning -EIOCBQUEUED its callers no longer have to translate for it. XFS needs to be careful not to free resources that will be used during AIO completion if -EIOCBQUEUED is returned. We maintain the previous behaviour of trying to write fs metadata for O_SYNC aio+dio writes.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Suparna Bhattacharya <suparna@in.ibm.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Cc: <xfs-masters@oss.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

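
The refcount invariant in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the kernel code: struct model_dio, model_dio_put(), model_bio_end_io(), model_submit_done() and MODEL_EIOCBQUEUED are hypothetical names invented here, and the model assumes the submitter holds one reference and each in-flight bio holds one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's -EIOCBQUEUED; the value is illustrative. */
#define MODEL_EIOCBQUEUED (-529)

/* Hypothetical, simplified model of a dio; not the kernel's struct dio. */
struct model_dio {
	int refcount;		/* one ref for the submitter, one per in-flight bio */
	bool is_async;		/* issued as an aio op */
	int result;		/* bytes transferred or a negative errno */
	int aio_complete_calls;	/* how many times completion was signalled from dio */
};

/* Drop one reference; returns true if the caller was the last holder. */
static bool model_dio_put(struct model_dio *dio)
{
	assert(dio->refcount > 0);
	return --dio->refcount == 0;
}

/* Bio completion side: only the last dropper of an async dio signals completion. */
static void model_bio_end_io(struct model_dio *dio)
{
	if (model_dio_put(dio) && dio->is_async)
		dio->aio_complete_calls++;
}

/*
 * Submitter side: if this is the last reference, report the result directly
 * and leave completion to the caller; otherwise a bio will complete the
 * operation later, so report "queued".
 */
static int model_submit_done(struct model_dio *dio)
{
	if (model_dio_put(dio))
		return dio->result;
	return MODEL_EIOCBQUEUED;
}
```

In this model, completion is signalled from the bio side only when the submitter has already returned MODEL_EIOCBQUEUED. A submitter that waits for its bios to drain before calling model_submit_done() always drops the last reference and returns the result itself.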
Diffstat (limited to 'mm')
-rw-r--r--  mm/filemap.c  9
1 files changed, 3 insertions, 6 deletions
diff --git a/mm/filemap.c b/mm/filemap.c
index 606432f71b3a..8332c77b1bd1 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1181,8 +1181,6 @@ generic_file_aio_read(struct kiocb *iocb, const struct iovec *iov,
 			if (pos < size) {
 				retval = generic_file_direct_IO(READ, iocb,
 							iov, pos, nr_segs);
-				if (retval > 0 && !is_sync_kiocb(iocb))
-					retval = -EIOCBQUEUED;
 				if (retval > 0)
 					*ppos = pos + retval;
 			}
@@ -2047,15 +2045,14 @@ generic_file_direct_write(struct kiocb *iocb, const struct iovec *iov,
 	 * Sync the fs metadata but not the minor inode changes and
 	 * of course not the data as we did direct DMA for the IO.
 	 * i_mutex is held, which protects generic_osync_inode() from
-	 * livelocking.
+	 * livelocking. AIO O_DIRECT ops attempt to sync metadata here.
 	 */
-	if (written >= 0 && ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
+	if ((written >= 0 || written == -EIOCBQUEUED) &&
+	    ((file->f_flags & O_SYNC) || IS_SYNC(inode))) {
 		int err = generic_osync_inode(inode, mapping, OSYNC_METADATA);
 		if (err < 0)
 			written = err;
 	}
-	if (written == count && !is_sync_kiocb(iocb))
-		written = -EIOCBQUEUED;
 	return written;
 }
 EXPORT_SYMBOL(generic_file_direct_write);
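
The widened O_SYNC test in the second hunk can be read as a standalone predicate. The sketch below is illustrative only: model_should_sync_metadata() and MODEL_EIOCBQUEUED are hypothetical names, and the two boolean parameters stand in for the (file->f_flags & O_SYNC) and IS_SYNC(inode) checks in the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's -EIOCBQUEUED; the value is illustrative. */
#define MODEL_EIOCBQUEUED (-529)

/*
 * Mirrors the patched condition: attempt the metadata sync when the write
 * made progress (written >= 0) or was queued as aio, and only if the file
 * or the inode asks for synchronous semantics.
 */
static bool model_should_sync_metadata(long written, bool o_sync, bool inode_sync)
{
	return (written >= 0 || written == MODEL_EIOCBQUEUED) &&
	       (o_sync || inode_sync);
}
```

Under the old condition a queued aio write would have skipped the sync, because -EIOCBQUEUED is negative. Other negative errno values still skip it in both versions.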