path: root/mm/filemap.c
author Linus Torvalds <torvalds@linux-foundation.org> 2017-07-07 22:38:17 -0400
committer Linus Torvalds <torvalds@linux-foundation.org> 2017-07-07 22:38:17 -0400
commit 088737f44bbf6378745f5b57b035e57ee3dc4750 (patch)
tree 86a2b1240ea5f7a0ebca837d17a53c07cd07d62a /mm/filemap.c
parent 33198c165b7afd500f7b6b7680ef994296805ef0 (diff)
parent 333427a505be1e10d8da13427dc0c33ec1976b99 (diff)
Merge tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull Writeback error handling updates from Jeff Layton:

"This pile represents the bulk of the writeback error handling fixes that I have for this cycle. Some of the earlier patches in this pile may look trivial but they are prerequisites for later patches in the series.

The aim of this set is to improve how we track and report writeback errors to userland. Most applications that care about data integrity will periodically call fsync/fdatasync/msync to ensure that their writes have made it to the backing store.

For a very long time, we have tracked writeback errors using two flags in the address_space: AS_EIO and AS_ENOSPC. Those flags are set when a writeback error occurs (via mapping_set_error) and are cleared as a side-effect of filemap_check_errors (as you noted yesterday).

This model really sucks for userland. Only the first task to call fsync (or msync or fdatasync) will see the error. Any subsequent task calling fsync on a file will get back 0 (unless another writeback error occurs in the interim). If I have several tasks writing to a file and calling fsync to ensure that their writes got stored, then I need to have them coordinate with one another. That's difficult enough, but in a world of containerized setups that coordination may even not be possible.

But wait...it gets worse!

The calls to filemap_check_errors can be buried pretty far down in the call stack, and there are internal callers of filemap_write_and_wait and the like that also end up clearing those errors. Many of those callers ignore the error return from that function or return it to userland at nonsensical times (e.g. truncate() or stat()). If I get back -EIO on a truncate, there is no reason to think that it was because some previous writeback failed, and a subsequent fsync() will (incorrectly) return 0.
This pile aims to do three things:

 1) ensure that when a writeback error occurs that that error will be reported to userland on a subsequent fsync/fdatasync/msync call, regardless of what internal callers are doing

 2) report writeback errors on all file descriptions that were open at the time that the error occurred. This is a user-visible change, but I think most applications are written to assume this behavior anyway. Those that aren't are unlikely to be hurt by it.

 3) document what filesystems should do when there is a writeback error. Today, there is very little consistency between them, and a lot of cargo-cult copying. We need to make it very clear what filesystems should do in this situation.

To achieve this, the set adds a new data type (errseq_t) and then builds new writeback error tracking infrastructure around that. Once all of that is in place, we change the filesystems to use the new infrastructure for reporting wb errors to userland.

Note that this is just the initial foray into cleaning up this mess. There is a lot of work remaining here:

 1) convert the rest of the filesystems in a similar fashion. Once the initial set is in, then I think most other fs' will be fairly simple to convert. Hopefully most of those can go in via individual filesystem trees.

 2) convert internal waiters on writeback to use errseq_t for detecting errors instead of relying on the AS_* flags. I have some draft patches for this for ext4, but they are not quite ready for prime time yet.

This was a discussion topic this year at LSF/MM too.
If you're interested in the gory details, LWN has some good articles about this:

    https://lwn.net/Articles/718734/
    https://lwn.net/Articles/724307/"

* tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  btrfs: minimal conversion to errseq_t writeback error reporting on fsync
  xfs: minimal conversion to errseq_t writeback error reporting
  ext4: use errseq_t based error handling for reporting data writeback errors
  fs: convert __generic_file_fsync to use errseq_t based reporting
  block: convert to errseq_t based writeback error tracking
  dax: set errors in mapping when writeback fails
  Documentation: flesh out the section in vfs.txt on storing and reporting writeback errors
  mm: set both AS_EIO/AS_ENOSPC and errseq_t in mapping_set_error
  fs: new infrastructure for writeback error handling and reporting
  lib: add errseq_t type and infrastructure for handling it
  mm: don't TestClearPageError in __filemap_fdatawait_range
  mm: clear AS_EIO/AS_ENOSPC when writeback initiation fails
  jbd2: don't clear and reset errors after waiting on writeback
  buffer: set errors in mapping at the time that the error occurs
  fs: check for writeback errors after syncing out buffers in generic_file_fsync
  buffer: use mapping_set_error instead of setting the flag
  mm: fix mapping_set_error call in me_pagecache_dirty
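The difference between the old test-and-clear flags and a per-file error cursor, as the pull message describes it, can be sketched in user space. The block below is a simplified, hypothetical model and not the kernel's errseq_t code: every name in it (model_errseq_t, model_set, model_sample, model_check_and_advance, legacy_mapping and its helpers) and the bit layout are assumptions made for illustration, and nothing here is atomic.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical user-space model, NOT the kernel's lib/errseq.c.
 * All names and the bit layout are invented for illustration.
 */
typedef uint32_t model_errseq_t;

#define MODEL_ERR_MASK 0x0fffu	/* low bits hold the positive errno value */
#define MODEL_SEEN     0x1000u	/* set once some reader has observed it */
#define MODEL_CTR_INC  0x2000u	/* the counter lives above the flag */

/* Old scheme: one flag per mapping, cleared by whoever checks it first. */
struct legacy_mapping {
	int eio;
};

static void legacy_set_error(struct legacy_mapping *m)
{
	m->eio = 1;
}

static int legacy_check_errors(struct legacy_mapping *m)
{
	if (m->eio) {
		m->eio = 0;	/* test-and-clear: later callers get 0 */
		return -EIO;
	}
	return 0;
}

/* New scheme: record an error (negative errno) in the shared value. */
static void model_set(model_errseq_t *shared, int err)
{
	model_errseq_t old = *shared;
	model_errseq_t val = (old & ~(MODEL_ERR_MASK | MODEL_SEEN)) |
			     (model_errseq_t)(-err);

	/* Bump the counter only if the previous value was observed. */
	if (old & MODEL_SEEN)
		val += MODEL_CTR_INC;
	*shared = val;
}

/* Take a starting cursor, as would happen when a file is opened. */
static model_errseq_t model_sample(model_errseq_t *shared)
{
	if (*shared != 0)
		*shared |= MODEL_SEEN;
	return *shared;
}

/* Report an error once per cursor, then catch the cursor up. */
static int model_check_and_advance(model_errseq_t *shared,
				   model_errseq_t *cursor)
{
	model_errseq_t cur = *shared;

	if (cur == *cursor)
		return 0;	/* nothing new since this cursor last looked */

	cur |= MODEL_SEEN;
	*shared = cur;
	*cursor = cur;
	return -(int)(cur & MODEL_ERR_MASK);
}
```

In this model, two cursors sampled before model_set(&shared, -EIO) each get -EIO from their first model_check_and_advance() call and 0 from the next one, whereas a second legacy_check_errors() call already returns 0 because the first caller cleared the flag.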
Diffstat (limited to 'mm/filemap.c')
-rw-r--r-- mm/filemap.c 126
1 files changed, 109 insertions, 17 deletions
diff --git a/mm/filemap.c b/mm/filemap.c
index 2e906ef52143..3247b4208034 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -309,6 +309,16 @@ int filemap_check_errors(struct address_space *mapping)
 }
 EXPORT_SYMBOL(filemap_check_errors);
 
+static int filemap_check_and_keep_errors(struct address_space *mapping)
+{
+	/* Check for outstanding write errors */
+	if (test_bit(AS_EIO, &mapping->flags))
+		return -EIO;
+	if (test_bit(AS_ENOSPC, &mapping->flags))
+		return -ENOSPC;
+	return 0;
+}
+
 /**
  * __filemap_fdatawrite_range - start writeback on mapping dirty pages in range
  * @mapping:	address space structure to write
@@ -408,17 +418,16 @@ bool filemap_range_has_page(struct address_space *mapping,
 }
 EXPORT_SYMBOL(filemap_range_has_page);
 
-static int __filemap_fdatawait_range(struct address_space *mapping,
+static void __filemap_fdatawait_range(struct address_space *mapping,
 				     loff_t start_byte, loff_t end_byte)
 {
 	pgoff_t index = start_byte >> PAGE_SHIFT;
 	pgoff_t end = end_byte >> PAGE_SHIFT;
 	struct pagevec pvec;
 	int nr_pages;
-	int ret = 0;
 
 	if (end_byte < start_byte)
-		goto out;
+		return;
 
 	pagevec_init(&pvec, 0);
 	while ((index <= end) &&
@@ -435,14 +444,11 @@ static int __filemap_fdatawait_range(struct address_space *mapping,
 				continue;
 
 			wait_on_page_writeback(page);
-			if (TestClearPageError(page))
-				ret = -EIO;
+			ClearPageError(page);
 		}
 		pagevec_release(&pvec);
 		cond_resched();
 	}
-out:
-	return ret;
 }
 
 /**
@@ -462,14 +468,8 @@ out:
 int filemap_fdatawait_range(struct address_space *mapping, loff_t start_byte,
 			    loff_t end_byte)
 {
-	int ret, ret2;
-
-	ret = __filemap_fdatawait_range(mapping, start_byte, end_byte);
-	ret2 = filemap_check_errors(mapping);
-	if (!ret)
-		ret = ret2;
-
-	return ret;
+	__filemap_fdatawait_range(mapping, start_byte, end_byte);
+	return filemap_check_errors(mapping);
 }
 EXPORT_SYMBOL(filemap_fdatawait_range);
 
@@ -485,15 +485,17 @@ EXPORT_SYMBOL(filemap_fdatawait_range);
  * call sites are system-wide / filesystem-wide data flushers: e.g. sync(2),
  * fsfreeze(8)
  */
-void filemap_fdatawait_keep_errors(struct address_space *mapping)
+int filemap_fdatawait_keep_errors(struct address_space *mapping)
 {
 	loff_t i_size = i_size_read(mapping->host);
 
 	if (i_size == 0)
-		return;
+		return 0;
 
 	__filemap_fdatawait_range(mapping, 0, i_size - 1);
+	return filemap_check_and_keep_errors(mapping);
 }
+EXPORT_SYMBOL(filemap_fdatawait_keep_errors);
 
 /**
  * filemap_fdatawait - wait for all under-writeback pages to complete
@@ -535,6 +537,9 @@ int filemap_write_and_wait(struct address_space *mapping)
 			int err2 = filemap_fdatawait(mapping);
 			if (!err)
 				err = err2;
+		} else {
+			/* Clear any previously stored errors */
+			filemap_check_errors(mapping);
 		}
 	} else {
 		err = filemap_check_errors(mapping);
@@ -569,6 +574,9 @@ int filemap_write_and_wait_range(struct address_space *mapping,
 						lstart, lend);
 			if (!err)
 				err = err2;
+		} else {
+			/* Clear any previously stored errors */
+			filemap_check_errors(mapping);
 		}
 	} else {
 		err = filemap_check_errors(mapping);
@@ -577,6 +585,90 @@ int filemap_write_and_wait_range(struct address_space *mapping,
 }
 EXPORT_SYMBOL(filemap_write_and_wait_range);
 
+void __filemap_set_wb_err(struct address_space *mapping, int err)
+{
+	errseq_t eseq = __errseq_set(&mapping->wb_err, err);
+
+	trace_filemap_set_wb_err(mapping, eseq);
+}
+EXPORT_SYMBOL(__filemap_set_wb_err);
+
+/**
+ * file_check_and_advance_wb_err - report wb error (if any) that was previously
+ * 				   and advance wb_err to current one
+ * @file: struct file on which the error is being reported
+ *
+ * When userland calls fsync (or something like nfsd does the equivalent), we
+ * want to report any writeback errors that occurred since the last fsync (or
+ * since the file was opened if there haven't been any).
+ *
+ * Grab the wb_err from the mapping. If it matches what we have in the file,
+ * then just quickly return 0. The file is all caught up.
+ *
+ * If it doesn't match, then take the mapping value, set the "seen" flag in
+ * it and try to swap it into place. If it works, or another task beat us
+ * to it with the new value, then update the f_wb_err and return the error
+ * portion. The error at this point must be reported via proper channels
+ * (a'la fsync, or NFS COMMIT operation, etc.).
+ *
+ * While we handle mapping->wb_err with atomic operations, the f_wb_err
+ * value is protected by the f_lock since we must ensure that it reflects
+ * the latest value swapped in for this file descriptor.
+ */
+int file_check_and_advance_wb_err(struct file *file)
+{
+	int err = 0;
+	errseq_t old = READ_ONCE(file->f_wb_err);
+	struct address_space *mapping = file->f_mapping;
+
+	/* Locklessly handle the common case where nothing has changed */
+	if (errseq_check(&mapping->wb_err, old)) {
+		/* Something changed, must use slow path */
+		spin_lock(&file->f_lock);
+		old = file->f_wb_err;
+		err = errseq_check_and_advance(&mapping->wb_err,
+						&file->f_wb_err);
+		trace_file_check_and_advance_wb_err(file, old);
+		spin_unlock(&file->f_lock);
+	}
+	return err;
+}
+EXPORT_SYMBOL(file_check_and_advance_wb_err);
+
+/**
+ * file_write_and_wait_range - write out & wait on a file range
+ * @file:	file pointing to address_space with pages
+ * @lstart:	offset in bytes where the range starts
+ * @lend:	offset in bytes where the range ends (inclusive)
+ *
+ * Write out and wait upon file offsets lstart->lend, inclusive.
+ *
+ * Note that @lend is inclusive (describes the last byte to be written) so
+ * that this function can be used to write to the very end-of-file (end = -1).
+ *
+ * After writing out and waiting on the data, we check and advance the
+ * f_wb_err cursor to the latest value, and return any errors detected there.
+ */
+int file_write_and_wait_range(struct file *file, loff_t lstart, loff_t lend)
+{
+	int err = 0, err2;
+	struct address_space *mapping = file->f_mapping;
+
+	if ((!dax_mapping(mapping) && mapping->nrpages) ||
+	    (dax_mapping(mapping) && mapping->nrexceptional)) {
+		err = __filemap_fdatawrite_range(mapping, lstart, lend,
+						 WB_SYNC_ALL);
+		/* See comment of filemap_write_and_wait() */
+		if (err != -EIO)
+			__filemap_fdatawait_range(mapping, lstart, lend);
+	}
+	err2 = file_check_and_advance_wb_err(file);
+	if (!err)
+		err = err2;
+	return err;
+}
+EXPORT_SYMBOL(file_write_and_wait_range);
+
 /**
  * replace_page_cache_page - replace a pagecache page with a new one
  * @old:	page to be replaced