path: root/mm/filemap.c
author: Jeff Layton <jlayton@redhat.com> 2017-07-06 07:02:25 -0400
committer: Jeff Layton <jlayton@redhat.com> 2017-07-06 07:02:25 -0400
commit: 5660e13d2fd6af1903d4b0b98020af95ca2d638a
tree: 10944f111ba11bf1d3b194300018f3a45e9fd9e8
parent: 84cbadadc6eafc4798513773a2c8fce37dcd2fb8
fs: new infrastructure for writeback error handling and reporting

Most filesystems currently use mapping_set_error and filemap_check_errors for setting and reporting/clearing writeback errors at the mapping level. filemap_check_errors is indirectly called from most of the filemap_fdatawait_* functions and from filemap_write_and_wait*. These functions are called from all sorts of contexts to wait on writeback to finish -- e.g. mostly in fsync, but also in truncate calls, getattr, etc.

The non-fsync callers are problematic. We should be reporting writeback errors during fsync, but many places spread over the tree clear out errors before they can be properly reported, or report errors at nonsensical times.

If I get -EIO on a stat() call, there is no reason for me to assume that it is because some previous writeback failed. The fact that it also clears out the error such that a subsequent fsync returns 0 is a bug, and a nasty one since that's potentially silent data corruption.

This patch adds a small bit of new infrastructure for setting and reporting errors during address_space writeback. While the above was my original impetus for adding this, I think it's also the case that current fsync semantics are just problematic for userland. Most applications that call fsync do so to ensure that the data they wrote has hit the backing store.

In the case where there are multiple writers to the file at the same time, this is really hard to determine. The first one to call fsync will see any stored error, and the rest get back 0. The processes with open fds may not be associated with one another in any way. They could even be in different containers, so ensuring coordination between all fsync callers is not really an option.

One way to remedy this would be to track what file descriptor was used to dirty the file, but that's rather cumbersome and would likely be slow. However, there is a simpler way to improve the semantics here without incurring too much overhead.

This set adds an errseq_t to struct address_space, and a corresponding one is added to struct file. Writeback errors are recorded in the mapping's errseq_t, and the one in struct file is used as the "since" value.

This changes the semantics of the Linux fsync implementation such that applications can now use it to determine whether there were any writeback errors since fsync(fd) was last called (or since the file was opened in the case of fsync having never been called).

Note that those writeback errors may have occurred when writing data that was dirtied via an entirely different fd, but that's the case now with the current mapping_set_error/filemap_check_errors infrastructure. This will at least prevent you from getting a false report of success.

The new behavior is still consistent with the POSIX spec, and is more reliable for application developers. This patch just adds some basic infrastructure for doing this, and ensures that the f_wb_err "cursor" is properly set when a file is opened. Later patches will change the existing code to use this new infrastructure for reporting errors at fsync time.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Diffstat (limited to 'mm/filemap.c')
-rw-r--r-- mm/filemap.c | 84
1 files changed, 84 insertions, 0 deletions
diff --git a/mm/filemap.c b/mm/filemap.c
index eb99b5f23c61..d7a30aefee0d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -553,6 +553,90 @@ int filemap_write_and_wait_range(struct address_space *mapping,
 }
 EXPORT_SYMBOL(filemap_write_and_wait_range);
 
+void __filemap_set_wb_err(struct address_space *mapping, int err)
+{
+	errseq_t eseq = __errseq_set(&mapping->wb_err, err);
+
+	trace_filemap_set_wb_err(mapping, eseq);
+}
+EXPORT_SYMBOL(__filemap_set_wb_err);
+
+/**
+ * file_check_and_advance_wb_err - report wb error (if any) that was previously
+ *                                 and advance wb_err to current one
+ * @file: struct file on which the error is being reported
+ *
+ * When userland calls fsync (or something like nfsd does the equivalent), we
+ * want to report any writeback errors that occurred since the last fsync (or
+ * since the file was opened if there haven't been any).
+ *
+ * Grab the wb_err from the mapping. If it matches what we have in the file,
+ * then just quickly return 0. The file is all caught up.
+ *
+ * If it doesn't match, then take the mapping value, set the "seen" flag in
+ * it and try to swap it into place. If it works, or another task beat us
+ * to it with the new value, then update the f_wb_err and return the error
+ * portion. The error at this point must be reported via proper channels
+ * (a'la fsync, or NFS COMMIT operation, etc.).
+ *
+ * While we handle mapping->wb_err with atomic operations, the f_wb_err
+ * value is protected by the f_lock since we must ensure that it reflects
+ * the latest value swapped in for this file descriptor.
+ */
+int file_check_and_advance_wb_err(struct file *file)
+{
+	int err = 0;
+	errseq_t old = READ_ONCE(file->f_wb_err);
+	struct address_space *mapping = file->f_mapping;
+
+	/* Locklessly handle the common case where nothing has changed */
+	if (errseq_check(&mapping->wb_err, old)) {
+		/* Something changed, must use slow path */
+		spin_lock(&file->f_lock);
+		old = file->f_wb_err;
+		err = errseq_check_and_advance(&mapping->wb_err,
+						&file->f_wb_err);
+		trace_file_check_and_advance_wb_err(file, old);
+		spin_unlock(&file->f_lock);
+	}
+	return err;
+}
+EXPORT_SYMBOL(file_check_and_advance_wb_err);
+
+/**
+ * file_write_and_wait_range - write out & wait on a file range
+ * @file: file pointing to address_space with pages
+ * @lstart: offset in bytes where the range starts
+ * @lend: offset in bytes where the range ends (inclusive)
+ *
+ * Write out and wait upon file offsets lstart->lend, inclusive.
+ *
+ * Note that @lend is inclusive (describes the last byte to be written) so
+ * that this function can be used to write to the very end-of-file (end = -1).
+ *
+ * After writing out and waiting on the data, we check and advance the
+ * f_wb_err cursor to the latest value, and return any errors detected there.
+ */
+int file_write_and_wait_range(struct file *file, loff_t lstart, loff_t lend)
+{
+	int err = 0, err2;
+	struct address_space *mapping = file->f_mapping;
+
+	if ((!dax_mapping(mapping) && mapping->nrpages) ||
+	    (dax_mapping(mapping) && mapping->nrexceptional)) {
+		err = __filemap_fdatawrite_range(mapping, lstart, lend,
+						 WB_SYNC_ALL);
+		/* See comment of filemap_write_and_wait() */
+		if (err != -EIO)
+			__filemap_fdatawait_range(mapping, lstart, lend);
+	}
+	err2 = file_check_and_advance_wb_err(file);
+	if (!err)
+		err = err2;
+	return err;
+}
+EXPORT_SYMBOL(file_write_and_wait_range);
+
 /**
  * replace_page_cache_page - replace a pagecache page with a new one
  * @old: page to be replaced