author Linus Torvalds <torvalds@linux-foundation.org> 2017-09-06 17:11:03 -0400
committer Linus Torvalds <torvalds@linux-foundation.org> 2017-09-06 17:11:03 -0400
commit ec3604c7a5aae8953545b0d05495357009a960e5 (patch)
tree dd3927047b90048231d924fc151a9d1881f7b8cd
parent 066dea8c30ae7d8e061145bcf5422ce0773582eb (diff)
parent 6d4b51241394664fffbf68ea86c96d2699344583 (diff)
Merge tag 'wberr-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux

Pull writeback error handling updates from Jeff Layton:
 "This pile continues the work from last cycle on better tracking
  writeback errors. In v4.13 we added some basic errseq_t infrastructure
  and converted a few filesystems to use it.

  This set continues refining that infrastructure, adds documentation,
  and converts most of the other filesystems to use it. The main
  exception at this point is the NFS client"

* tag 'wberr-v4.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
  ecryptfs: convert to file_write_and_wait in ->fsync
  mm: remove optimizations based on i_size in mapping writeback waits
  fs: convert a pile of fsync routines to errseq_t based reporting
  gfs2: convert to errseq_t based writeback error reporting for fsync
  fs: convert sync_file_range to use errseq_t based error-tracking
  mm: add file_fdatawait_range and file_write_and_wait
  fuse: convert to errseq_t based error tracking for fsync
  mm: consolidate dax / non-dax checks for writeback
  Documentation: add some docs for errseq_t
  errseq: rename __errseq_set to errseq_set

-rw-r--r--  Documentation/errseq.rst  149
-rw-r--r--  arch/powerpc/platforms/cell/spufs/file.c  2
-rw-r--r--  drivers/staging/lustre/lustre/llite/file.c  2
-rw-r--r--  drivers/video/fbdev/core/fb_defio.c  2
-rw-r--r--  fs/9p/vfs_file.c  4
-rw-r--r--  fs/affs/file.c  2
-rw-r--r--  fs/afs/write.c  2
-rw-r--r--  fs/cifs/file.c  4
-rw-r--r--  fs/ecryptfs/file.c  2
-rw-r--r--  fs/exofs/file.c  2
-rw-r--r--  fs/f2fs/file.c  2
-rw-r--r--  fs/fuse/file.c  6
-rw-r--r--  fs/gfs2/file.c  6
-rw-r--r--  fs/hfs/inode.c  2
-rw-r--r--  fs/hfsplus/inode.c  2
-rw-r--r--  fs/hostfs/hostfs_kern.c  2
-rw-r--r--  fs/hpfs/file.c  2
-rw-r--r--  fs/jffs2/file.c  2
-rw-r--r--  fs/jfs/file.c  2
-rw-r--r--  fs/ncpfs/file.c  2
-rw-r--r--  fs/ntfs/dir.c  2
-rw-r--r--  fs/ntfs/file.c  2
-rw-r--r--  fs/ocfs2/file.c  2
-rw-r--r--  fs/reiserfs/dir.c  2
-rw-r--r--  fs/reiserfs/file.c  2
-rw-r--r--  fs/sync.c  4
-rw-r--r--  fs/ubifs/file.c  2
-rw-r--r--  include/linux/errseq.h  14
-rw-r--r--  include/linux/fs.h  20
-rw-r--r--  lib/errseq.c  17
-rw-r--r--  mm/filemap.c  64
31 files changed, 241 insertions, 89 deletions
diff --git a/Documentation/errseq.rst b/Documentation/errseq.rst
new file mode 100644
index 000000000000..4c29bd5afbc5
--- /dev/null
+++ b/Documentation/errseq.rst
@@ -0,0 +1,149 @@
+The errseq_t datatype
+=====================
+An errseq_t is a way of recording errors in one place, and allowing any
+number of "subscribers" to tell whether it has changed since a previous
+point where it was sampled.
+
+The initial use case for this is tracking errors for file
+synchronization syscalls (fsync, fdatasync, msync and sync_file_range),
+but it may be usable in other situations.
+
+It's implemented as an unsigned 32-bit value. The low order bits are
+designated to hold an error code (between 1 and MAX_ERRNO). The upper bits
+are used as a counter. This is done with atomics instead of locking so that
+these functions can be called from any context.
+
+Note that there is a risk of collisions if new errors are being recorded
+frequently, since we have so few bits to use as a counter.
+
+To mitigate this, the bit between the error value and counter is used as
+a flag to tell whether the value has been sampled since a new value was
+recorded. That allows us to avoid bumping the counter if no one has
+sampled it since the last time an error was recorded.
+
+Thus we end up with a value that looks something like this::
+
+    bit:  31..13        12        11..0
+          +-----------------+----+----------------+
+          |     counter     | SF |      errno     |
+          +-----------------+----+----------------+
+
+The general idea is for "watchers" to sample an errseq_t value and keep
+it as a running cursor. That value can later be used to tell whether
+any new errors have occurred since that sampling was done, and atomically
+record the state at the time that it was checked. This allows us to
+record errors in one place, and then have a number of "watchers" that
+can tell whether the value has changed since they last checked it.
+
+A new errseq_t should always be zeroed out. An errseq_t value of all zeroes
+is the special (but common) case where there has never been an error. An all
+zero value thus serves as the "epoch" if one wishes to know whether there
+has ever been an error set since it was first initialized.
+
+API usage
+=========
+Let me tell you a story about a worker drone. Now, he's a good worker
+overall, but the company is a little...management heavy. He has to
+report to 77 supervisors today, and tomorrow the "big boss" is coming in
+from out of town and he's sure to test the poor fellow too.
+
+They're all handing him work to do -- so much he can't keep track of who
+handed him what, but that's not really a big problem. The supervisors
+just want to know when he's finished all of the work they've handed him so
+far and whether he made any mistakes since they last asked.
+
+He might have made the mistake on work they didn't actually hand him,
+but he can't keep track of things at that level of detail; all he can
+remember is the most recent mistake that he made.
+
+Here's our worker_drone representation::
+
+	struct worker_drone {
+		errseq_t	wd_err;	/* for recording errors */
+	};
+
+Every day, the worker_drone starts out with a blank slate::
+
+	struct worker_drone	wd;
+
+	wd.wd_err = (errseq_t)0;
+
+The supervisors come in and get an initial read for the day. They
+don't care about anything that happened before their watch begins::
+
+	struct supervisor {
+		errseq_t	s_wd_err;	/* private "cursor" for wd_err */
+		spinlock_t	s_wd_err_lock;	/* protects s_wd_err */
+	};
+
+	struct supervisor	su;
+
+	su.s_wd_err = errseq_sample(&wd.wd_err);
+	spin_lock_init(&su.s_wd_err_lock);
+
+Now they start handing him tasks to do. Every few minutes they ask him to
+finish up all of the work they've handed him so far. Then they ask him
+whether he made any mistakes on any of it::
+
+	spin_lock(&su.s_wd_err_lock);
+	err = errseq_check_and_advance(&wd.wd_err, &su.s_wd_err);
+	spin_unlock(&su.s_wd_err_lock);
+
+Up to this point, that just keeps returning 0.
+
+Now, the owners of this company are quite miserly and have given him
+substandard equipment with which to do his job. Occasionally it
+glitches and he makes a mistake. He sighs a heavy sigh, and marks it
+down::
+
+	errseq_set(&wd.wd_err, -EIO);
+
+...and then gets back to work. The supervisors eventually poll again
+and they each get the error when they next check. Subsequent calls will
+return 0, until another error is recorded, at which point it's reported
+to each of them once.
+
+Note that the supervisors can't tell how many mistakes he made, only
+whether one was made since they last checked, and the latest value
+recorded.
+
+Occasionally the big boss comes in for a spot check and asks the worker
+to do a one-off job for him. He's not really watching the worker
+full-time like the supervisors, but he does need to know whether a
+mistake occurred while his job was processing.
+
+He can just sample the current errseq_t in the worker, and then use that
+to tell whether an error has occurred later::
+
+	errseq_t since = errseq_sample(&wd.wd_err);
+	/* submit some work and wait for it to complete */
+	err = errseq_check(&wd.wd_err, since);
+
+Since he's just going to discard "since" after that point, he doesn't
+need to advance it here. He also doesn't need any locking since it's
+not usable by anyone else.
+
+Serializing errseq_t cursor updates
+===================================
+Note that the errseq_t API does not protect the errseq_t cursor during a
+check_and_advance operation. Only the canonical error code is handled
+atomically. In a situation where more than one task might be using the
+same errseq_t cursor at the same time, it's important to serialize
+updates to that cursor.
+
+If that's not done, then it's possible for the cursor to go backward,
+in which case the same error could be reported more than once.
+
+Because of this, it's often advantageous to first do an errseq_check to
+see if anything has changed, and only later do an
+errseq_check_and_advance after taking the lock. e.g.::
+
+	if (errseq_check(&wd.wd_err, READ_ONCE(su.s_wd_err))) {
+		/* su.s_wd_err is protected by s_wd_err_lock */
+		spin_lock(&su.s_wd_err_lock);
+		err = errseq_check_and_advance(&wd.wd_err, &su.s_wd_err);
+		spin_unlock(&su.s_wd_err_lock);
+	}
+
+That avoids the spinlock in the common case where nothing has changed
+since the last time it was checked.
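
The semantics described in Documentation/errseq.rst can be modelled in a few lines of userspace C. This is a simplified, single-threaded sketch under stated assumptions, not the kernel implementation: every `model_`/`MODEL_` name is hypothetical, MAX_ERRNO is assumed to be 4095 so that the errno fits in bits 11..0 of the diagram, and plain loads and stores stand in for the atomic operations the documentation says the real code uses.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model only; all model_/MODEL_ names are hypothetical. */
typedef uint32_t model_errseq_t;

#define MODEL_MAX_ERRNO	4095u		/* assumed: errno lives in bits 11..0 */
#define MODEL_SEEN	(1u << 12)	/* the "SF" flag in bit 12 */
#define MODEL_CTR_INC	(1u << 13)	/* the counter lives in bits 31..13 */

/* Record an error; err must be between -1 and -MODEL_MAX_ERRNO. */
static model_errseq_t model_errseq_set(model_errseq_t *eseq, int err)
{
	model_errseq_t old = *eseq, next;

	assert(err < 0 && err >= -(int)MODEL_MAX_ERRNO);

	/* Replace the errno bits and clear the SEEN flag. */
	next = (old & ~(MODEL_MAX_ERRNO | MODEL_SEEN)) | (uint32_t)(-err);
	/* Bump the counter only if someone sampled the previous value. */
	if (old & MODEL_SEEN)
		next += MODEL_CTR_INC;
	*eseq = next;
	return next;
}

/* Take a starting cursor; mark the value as seen unless it is still all zero. */
static model_errseq_t model_errseq_sample(model_errseq_t *eseq)
{
	if (*eseq != 0)
		*eseq |= MODEL_SEEN;
	return *eseq;
}

/* Return the latest error if the value changed since "since" was taken. */
static int model_errseq_check(const model_errseq_t *eseq, model_errseq_t since)
{
	if (*eseq == since)
		return 0;
	return -(int)(*eseq & MODEL_MAX_ERRNO);
}

/* Same check, but also move the cursor so the error is reported only once. */
static int model_errseq_check_and_advance(model_errseq_t *eseq,
					  model_errseq_t *since)
{
	if (*eseq == *since)
		return 0;
	*eseq |= MODEL_SEEN;
	*since = *eseq;
	return -(int)(*eseq & MODEL_MAX_ERRNO);
}
```

In this model two cursors sampled from a zeroed value each see a recorded error exactly once, and recording the same errno again after it has been seen bumps the counter, so the repeat is still distinguishable from the earlier report.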
diff --git a/arch/powerpc/platforms/cell/spufs/file.c b/arch/powerpc/platforms/cell/spufs/file.c
index ae2f740a82f1..5ffcdeb1eb17 100644
--- a/arch/powerpc/platforms/cell/spufs/file.c
+++ b/arch/powerpc/platforms/cell/spufs/file.c
@@ -1749,7 +1749,7 @@ out:
 static int spufs_mfc_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 {
 	struct inode *inode = file_inode(file);
-	int err = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	int err = file_write_and_wait_range(file, start, end);
 	if (!err) {
 		inode_lock(inode);
 		err = spufs_mfc_flush(file, NULL);
diff --git a/drivers/staging/lustre/lustre/llite/file.c b/drivers/staging/lustre/lustre/llite/file.c
index 2c30e422a47e..be665454f407 100644
--- a/drivers/staging/lustre/lustre/llite/file.c
+++ b/drivers/staging/lustre/lustre/llite/file.c
@@ -2364,7 +2364,7 @@ int ll_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 	       PFID(ll_inode2fid(inode)), inode);
 	ll_stats_ops_tally(ll_i2sbi(inode), LPROC_LL_FSYNC, 1);
 
-	rc = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	rc = file_write_and_wait_range(file, start, end);
 	inode_lock(inode);
 
 	/* catch async errors that were recorded back when async writeback
diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
index 37f69c061210..487d5e336e1b 100644
--- a/drivers/video/fbdev/core/fb_defio.c
+++ b/drivers/video/fbdev/core/fb_defio.c
@@ -69,7 +69,7 @@ int fb_deferred_io_fsync(struct file *file, loff_t start, loff_t end, int datasy
 {
 	struct fb_info *info = file->private_data;
 	struct inode *inode = file_inode(file);
-	int err = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	int err = file_write_and_wait_range(file, start, end);
 	if (err)
 		return err;
 
diff --git a/fs/9p/vfs_file.c b/fs/9p/vfs_file.c
index 43c242e17132..03c9e325bfbc 100644
--- a/fs/9p/vfs_file.c
+++ b/fs/9p/vfs_file.c
@@ -445,7 +445,7 @@ static int v9fs_file_fsync(struct file *filp, loff_t start, loff_t end,
 	struct p9_wstat wstat;
 	int retval;
 
-	retval = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	retval = file_write_and_wait_range(filp, start, end);
 	if (retval)
 		return retval;
 
@@ -468,7 +468,7 @@ int v9fs_file_fsync_dotl(struct file *filp, loff_t start, loff_t end,
 	struct inode *inode = filp->f_mapping->host;
 	int retval;
 
-	retval = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	retval = file_write_and_wait_range(filp, start, end);
 	if (retval)
 		return retval;
 
diff --git a/fs/affs/file.c b/fs/affs/file.c
index 196ee7f6fdc4..00331810f690 100644
--- a/fs/affs/file.c
+++ b/fs/affs/file.c
@@ -954,7 +954,7 @@ int affs_file_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
 	struct inode *inode = filp->f_mapping->host;
 	int ret, err;
 
-	err = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	err = file_write_and_wait_range(filp, start, end);
 	if (err)
 		return err;
 
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 2d2fccd5044b..106e43db1115 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -714,7 +714,7 @@ int afs_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 	       vnode->fid.vid, vnode->fid.vnode, file,
 	       datasync);
 
-	ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	ret = file_write_and_wait_range(file, start, end);
 	if (ret)
 		return ret;
 	inode_lock(inode);
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index bc09df6b473a..0786f19d288f 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -2329,7 +2329,7 @@ int cifs_strict_fsync(struct file *file, loff_t start, loff_t end,
 	struct inode *inode = file_inode(file);
 	struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
 
-	rc = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	rc = file_write_and_wait_range(file, start, end);
 	if (rc)
 		return rc;
 	inode_lock(inode);
@@ -2371,7 +2371,7 @@ int cifs_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 	struct cifs_sb_info *cifs_sb = CIFS_FILE_SB(file);
 	struct inode *inode = file->f_mapping->host;
 
-	rc = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	rc = file_write_and_wait_range(file, start, end);
 	if (rc)
 		return rc;
 	inode_lock(inode);
diff --git a/fs/ecryptfs/file.c b/fs/ecryptfs/file.c
index ca4e83750214..c74ed3ca3372 100644
--- a/fs/ecryptfs/file.c
+++ b/fs/ecryptfs/file.c
@@ -328,7 +328,7 @@ ecryptfs_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 {
 	int rc;
 
-	rc = filemap_write_and_wait(file->f_mapping);
+	rc = file_write_and_wait(file);
 	if (rc)
 		return rc;
 
diff --git a/fs/exofs/file.c b/fs/exofs/file.c
index 28645f0640f7..a94594ea2aa3 100644
--- a/fs/exofs/file.c
+++ b/fs/exofs/file.c
@@ -48,7 +48,7 @@ static int exofs_file_fsync(struct file *filp, loff_t start, loff_t end,
 	struct inode *inode = filp->f_mapping->host;
 	int ret;
 
-	ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	ret = file_write_and_wait_range(filp, start, end);
 	if (ret)
 		return ret;
 
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 2706130c261b..843a0d99f7ea 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -206,7 +206,7 @@ static int f2fs_do_sync_file(struct file *file, loff_t start, loff_t end,
 	/* if fdatasync is triggered, let's do in-place-update */
 	if (datasync || get_dirty_pages(inode) <= SM_I(sbi)->min_fsync_blocks)
 		set_inode_flag(inode, FI_NEED_IPU);
-	ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	ret = file_write_and_wait_range(file, start, end);
 	clear_inode_flag(inode, FI_NEED_IPU);
 
 	if (ret) {
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 0273029b1220..d66789804287 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -457,7 +457,7 @@ int fuse_fsync_common(struct file *file, loff_t start, loff_t end,
 	 * wait for all outstanding writes, before sending the FSYNC
 	 * request.
 	 */
-	err = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	err = file_write_and_wait_range(file, start, end);
 	if (err)
 		goto out;
 
@@ -465,10 +465,10 @@ int fuse_fsync_common(struct file *file, loff_t start, loff_t end,
 
 	/*
 	 * Due to implementation of fuse writeback
-	 * filemap_write_and_wait_range() does not catch errors.
+	 * file_write_and_wait_range() does not catch errors.
 	 * We have to do this directly after fuse_sync_writes()
 	 */
-	err = filemap_check_errors(file->f_mapping);
+	err = file_check_and_advance_wb_err(file);
 	if (err)
 		goto out;
 
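
Replacing filemap_check_errors() with file_check_and_advance_wb_err() changes who gets to see a writeback error. The userspace sketch below illustrates the difference under loud assumptions: every name in it is hypothetical, the flag-based state of the mapping is reduced to one boolean, and the errseq_t cursor is reduced to a plain sequence number plus the most recent errno. It only aims to show that a test-and-clear flag is consumed by the first caller, whereas a per-open-file cursor reports the error once to each open file.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an address_space, holding both schemes. */
struct model_mapping {
	bool		eio_flag;	/* flag style: set on error, cleared on read */
	unsigned int	wb_seq;		/* cursor style: bumped per recorded error */
	int		wb_errno;	/* cursor style: most recent error, e.g. -5 */
};

/* Hypothetical stand-in for a struct file: one private cursor per open. */
struct model_file {
	struct model_mapping	*mapping;
	unsigned int		seen_seq;
};

static void model_open(struct model_file *f, struct model_mapping *m)
{
	assert(f != NULL && m != NULL);
	f->mapping = m;
	f->seen_seq = m->wb_seq;	/* ignore errors from before the open */
}

static void model_record_error(struct model_mapping *m, int err)
{
	m->eio_flag = true;
	m->wb_seq++;
	m->wb_errno = err;
}

/* Flag style: the first caller consumes the error for everyone. */
static int model_check_flag(struct model_mapping *m)
{
	if (!m->eio_flag)
		return 0;
	m->eio_flag = false;
	return -5;			/* -EIO */
}

/* Cursor style: each open file is told once about errors it has not seen. */
static int model_check_and_advance(struct model_file *f)
{
	if (f->seen_seq == f->mapping->wb_seq)
		return 0;
	f->seen_seq = f->mapping->wb_seq;
	return f->mapping->wb_errno;
}
```

With two files open on the same mapping and one recorded error, the flag check returns -5 once and then 0, while the cursor check returns -5 once for each file before falling back to 0.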
diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index bb48074be019..33a0cb5701a3 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -668,12 +668,14 @@ static int gfs2_fsync(struct file *file, loff_t start, loff_t end,
 		if (ret)
 			return ret;
 		if (gfs2_is_jdata(ip))
-			filemap_write_and_wait(mapping);
+			ret = file_write_and_wait(file);
+		if (ret)
+			return ret;
 		gfs2_ail_flush(ip->i_gl, 1);
 	}
 
 	if (mapping->nrpages)
-		ret = filemap_fdatawait_range(mapping, start, end);
+		ret = file_fdatawait_range(file, start, end);
 
 	return ret ? ret : ret1;
 }
diff --git a/fs/hfs/inode.c b/fs/hfs/inode.c
index bfbba799430f..2538b49cc349 100644
--- a/fs/hfs/inode.c
+++ b/fs/hfs/inode.c
@@ -656,7 +656,7 @@ static int hfs_file_fsync(struct file *filp, loff_t start, loff_t end,
 	struct super_block * sb;
 	int ret, err;
 
-	ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	ret = file_write_and_wait_range(filp, start, end);
 	if (ret)
 		return ret;
 	inode_lock(inode);
diff --git a/fs/hfsplus/inode.c b/fs/hfsplus/inode.c
index e8638d528195..4f26b6877130 100644
--- a/fs/hfsplus/inode.c
+++ b/fs/hfsplus/inode.c
@@ -283,7 +283,7 @@ int hfsplus_file_fsync(struct file *file, loff_t start, loff_t end,
 	struct hfsplus_sb_info *sbi = HFSPLUS_SB(inode->i_sb);
 	int error = 0, error2;
 
-	error = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	error = file_write_and_wait_range(file, start, end);
 	if (error)
 		return error;
 	inode_lock(inode);
diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c
index e61261a7417e..c148e7f4f451 100644
--- a/fs/hostfs/hostfs_kern.c
+++ b/fs/hostfs/hostfs_kern.c
@@ -374,7 +374,7 @@ static int hostfs_fsync(struct file *file, loff_t start, loff_t end,
 	struct inode *inode = file->f_mapping->host;
 	int ret;
 
-	ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	ret = file_write_and_wait_range(file, start, end);
 	if (ret)
 		return ret;
 
diff --git a/fs/hpfs/file.c b/fs/hpfs/file.c
index b3be1b5a62e2..f26138425b16 100644
--- a/fs/hpfs/file.c
+++ b/fs/hpfs/file.c
@@ -24,7 +24,7 @@ int hpfs_file_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 	struct inode *inode = file->f_mapping->host;
 	int ret;
 
-	ret = filemap_write_and_wait_range(file->f_mapping, start, end);
+	ret = file_write_and_wait_range(file, start, end);
 	if (ret)
 		return ret;
 	return sync_blockdev(inode->i_sb->s_bdev);
diff --git a/fs/jffs2/file.c b/fs/jffs2/file.c
index c12476e309c6..bd0428bebe9b 100644
--- a/fs/jffs2/file.c
+++ b/fs/jffs2/file.c
@@ -35,7 +35,7 @@ int jffs2_fsync(struct file *filp, loff_t start, loff_t end, int datasync)
 	struct jffs2_sb_info *c = JFFS2_SB_INFO(inode->i_sb);
 	int ret;
 
-	ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	ret = file_write_and_wait_range(filp, start, end);
 	if (ret)
 		return ret;
 
diff --git a/fs/jfs/file.c b/fs/jfs/file.c
index 739492c7a3fd..36665fd37095 100644
--- a/fs/jfs/file.c
+++ b/fs/jfs/file.c
@@ -34,7 +34,7 @@ int jfs_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 	struct inode *inode = file->f_mapping->host;
 	int rc = 0;
 
-	rc = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	rc = file_write_and_wait_range(file, start, end);
 	if (rc)
 		return rc;
 
diff --git a/fs/ncpfs/file.c b/fs/ncpfs/file.c
index 76965e772264..a06c07619ee6 100644
--- a/fs/ncpfs/file.c
+++ b/fs/ncpfs/file.c
@@ -23,7 +23,7 @@
 
 static int ncp_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 {
-	return filemap_write_and_wait_range(file->f_mapping, start, end);
+	return file_write_and_wait_range(file, start, end);
 }
 
 /*
diff --git a/fs/ntfs/dir.c b/fs/ntfs/dir.c
index 0ee19ecc982d..1a24be9e8405 100644
--- a/fs/ntfs/dir.c
+++ b/fs/ntfs/dir.c
@@ -1506,7 +1506,7 @@ static int ntfs_dir_fsync(struct file *filp, loff_t start, loff_t end,
 
 	ntfs_debug("Entering for inode 0x%lx.", vi->i_ino);
 
-	err = filemap_write_and_wait_range(vi->i_mapping, start, end);
+	err = file_write_and_wait_range(filp, start, end);
 	if (err)
 		return err;
 	inode_lock(vi);
diff --git a/fs/ntfs/file.c b/fs/ntfs/file.c
index c4f68c338735..331910fa8442 100644
--- a/fs/ntfs/file.c
+++ b/fs/ntfs/file.c
@@ -1989,7 +1989,7 @@ static int ntfs_file_fsync(struct file *filp, loff_t start, loff_t end,
 
 	ntfs_debug("Entering for inode 0x%lx.", vi->i_ino);
 
-	err = filemap_write_and_wait_range(vi->i_mapping, start, end);
+	err = file_write_and_wait_range(filp, start, end);
 	if (err)
 		return err;
 	inode_lock(vi);
diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
index bfeb647459d9..66e59d3163ea 100644
--- a/fs/ocfs2/file.c
+++ b/fs/ocfs2/file.c
@@ -196,7 +196,7 @@ static int ocfs2_sync_file(struct file *file, loff_t start, loff_t end,
 	if (ocfs2_is_hard_readonly(osb) || ocfs2_is_soft_readonly(osb))
 		return -EROFS;
 
-	err = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	err = file_write_and_wait_range(file, start, end);
 	if (err)
 		return err;
 
diff --git a/fs/reiserfs/dir.c b/fs/reiserfs/dir.c
index 45aa05e2232f..5b50689d8539 100644
--- a/fs/reiserfs/dir.c
+++ b/fs/reiserfs/dir.c
@@ -34,7 +34,7 @@ static int reiserfs_dir_fsync(struct file *filp, loff_t start, loff_t end,
 	struct inode *inode = filp->f_mapping->host;
 	int err;
 
-	err = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	err = file_write_and_wait_range(filp, start, end);
 	if (err)
 		return err;
 
diff --git a/fs/reiserfs/file.c b/fs/reiserfs/file.c
index b396eb09f288..843aadcc123c 100644
--- a/fs/reiserfs/file.c
+++ b/fs/reiserfs/file.c
@@ -154,7 +154,7 @@ static int reiserfs_sync_file(struct file *filp, loff_t start, loff_t end,
 	int err;
 	int barrier_done;
 
-	err = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	err = file_write_and_wait_range(filp, start, end);
 	if (err)
 		return err;
 
diff --git a/fs/sync.c b/fs/sync.c
index 2a54c1f22035..27d6b8bbcb6a 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -342,7 +342,7 @@ SYSCALL_DEFINE4(sync_file_range, int, fd, loff_t, offset, loff_t, nbytes,
 
 	ret = 0;
 	if (flags & SYNC_FILE_RANGE_WAIT_BEFORE) {
-		ret = filemap_fdatawait_range(mapping, offset, endbyte);
+		ret = file_fdatawait_range(f.file, offset, endbyte);
 		if (ret < 0)
 			goto out_put;
 	}
@@ -355,7 +355,7 @@ SYSCALL_DEFINE4(sync_file_range, int, fd, loff_t, offset, loff_t, nbytes,
 	}
 
 	if (flags & SYNC_FILE_RANGE_WAIT_AFTER)
-		ret = filemap_fdatawait_range(mapping, offset, endbyte);
+		ret = file_fdatawait_range(f.file, offset, endbyte);
 
 out_put:
 	fdput(f);
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c
index 8cad0b19b404..f90a466ea5db 100644
--- a/fs/ubifs/file.c
+++ b/fs/ubifs/file.c
@@ -1337,7 +1337,7 @@ int ubifs_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 		 */
 		return 0;
 
-	err = filemap_write_and_wait_range(inode->i_mapping, start, end);
+	err = file_write_and_wait_range(file, start, end);
 	if (err)
 		return err;
 	inode_lock(inode);
diff --git a/include/linux/errseq.h b/include/linux/errseq.h
index 9e0d444ac88d..f746bd8fe4d0 100644
--- a/include/linux/errseq.h
+++ b/include/linux/errseq.h
@@ -1,18 +1,12 @@
+/*
+ * See Documentation/errseq.rst and lib/errseq.c
+ */
 #ifndef _LINUX_ERRSEQ_H
 #define _LINUX_ERRSEQ_H
 
-/* See lib/errseq.c for more info */
-
 typedef u32 errseq_t;
 
-errseq_t __errseq_set(errseq_t *eseq, int err);
-static inline void errseq_set(errseq_t *eseq, int err)
-{
-	/* Optimize for the common case of no error */
-	if (unlikely(err))
-		__errseq_set(eseq, err);
-}
-
+errseq_t errseq_set(errseq_t *eseq, int err);
 errseq_t errseq_sample(errseq_t *eseq);
 int errseq_check(errseq_t *eseq, errseq_t since);
 int errseq_check_and_advance(errseq_t *eseq, errseq_t *since);
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 0a51a8b197f7..5b744a3456c5 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2544,12 +2544,19 @@ extern int invalidate_inode_pages2_range(struct address_space *mapping,
 extern int write_inode_now(struct inode *, int);
 extern int filemap_fdatawrite(struct address_space *);
 extern int filemap_flush(struct address_space *);
-extern int filemap_fdatawait(struct address_space *);
 extern int filemap_fdatawait_keep_errors(struct address_space *mapping);
 extern int filemap_fdatawait_range(struct address_space *, loff_t lstart,
 				   loff_t lend);
+
+static inline int filemap_fdatawait(struct address_space *mapping)
+{
+	return filemap_fdatawait_range(mapping, 0, LLONG_MAX);
+}
+
 extern bool filemap_range_has_page(struct address_space *, loff_t lstart,
 				  loff_t lend);
+extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart,
+						loff_t lend);
 extern int filemap_write_and_wait(struct address_space *mapping);
 extern int filemap_write_and_wait_range(struct address_space *mapping,
 				        loff_t lstart, loff_t lend);
@@ -2558,12 +2565,19 @@ extern int __filemap_fdatawrite_range(struct address_space *mapping,
 extern int filemap_fdatawrite_range(struct address_space *mapping,
 				loff_t start, loff_t end);
 extern int filemap_check_errors(struct address_space *mapping);
-
 extern void __filemap_set_wb_err(struct address_space *mapping, int err);
+
+extern int __must_check file_fdatawait_range(struct file *file, loff_t lstart,
+						loff_t lend);
 extern int __must_check file_check_and_advance_wb_err(struct file *file);
 extern int __must_check file_write_and_wait_range(struct file *file,
 						loff_t start, loff_t end);
 
+static inline int file_write_and_wait(struct file *file)
+{
+	return file_write_and_wait_range(file, 0, LLONG_MAX);
+}
+
 /**
  * filemap_set_wb_err - set a writeback error on an address_space
  * @mapping: mapping in which to set writeback error
@@ -2577,8 +2591,6 @@ extern int __must_check file_write_and_wait_range(struct file *file,
  * When a writeback error occurs, most filesystems will want to call
  * filemap_set_wb_err to record the error in the mapping so that it will be
  * automatically reported whenever fsync is called on the file.
- *
- * FIXME: mention FS_* flag here?
  */
 static inline void filemap_set_wb_err(struct address_space *mapping, int err)
 {
diff --git a/lib/errseq.c b/lib/errseq.c
index 841fa24e6e00..7b900c2a277a 100644
--- a/lib/errseq.c
+++ b/lib/errseq.c
@@ -41,23 +41,20 @@
 #define ERRSEQ_CTR_INC	(1 << (ERRSEQ_SHIFT + 1))
 
 /**
- * __errseq_set - set a errseq_t for later reporting
+ * errseq_set - set a errseq_t for later reporting
  * @eseq: errseq_t field that should be set
- * @err: error to set
+ * @err: error to set (must be between -1 and -MAX_ERRNO)
  *
  * This function sets the error in *eseq, and increments the sequence counter
  * if the last sequence was sampled at some point in the past.
  *
  * Any error set will always overwrite an existing error.
  *
- * Most callers will want to use the errseq_set inline wrapper to efficiently
- * handle the common case where err is 0.
- *
- * We do return an errseq_t here, primarily for debugging purposes. The return
- * value should not be used as a previously sampled value in later calls as it
- * will not have the SEEN flag set.
+ * We do return the latest value here, primarily for debugging purposes. The
+ * return value should not be used as a previously sampled value in later calls
+ * as it will not have the SEEN flag set.
  */
-errseq_t __errseq_set(errseq_t *eseq, int err)
+errseq_t errseq_set(errseq_t *eseq, int err)
 {
 	errseq_t cur, old;
 
@@ -107,7 +104,7 @@ errseq_t __errseq_set(errseq_t *eseq, int err)
 	}
 	return cur;
 }
-EXPORT_SYMBOL(__errseq_set);
+EXPORT_SYMBOL(errseq_set);
 
 /**
  * errseq_sample - grab current errseq_t value
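The errseq_set() comment above describes three behaviours: a new error always overwrites the old one, the sequence counter is bumped only if the previous value had been sampled, and the returned value never carries the SEEN flag. A minimal single-threaded userspace sketch of those rules follows. It is not the kernel implementation: the real function uses cmpxchg() and WARNs on a bad err, the model_* names are hypothetical, and the bit layout assumes MAX_ERRNO is 4095 as in the kernel headers.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed layout: low 12 bits hold the errno, then a SEEN flag, then a counter. */
#define MODEL_MAX_ERRNO	4095u
#define MODEL_SHIFT	12			/* ilog2(MAX_ERRNO + 1) */
#define MODEL_SEEN	(1u << MODEL_SHIFT)
#define MODEL_CTR_INC	(1u << (MODEL_SHIFT + 1))

typedef uint32_t model_errseq_t;	/* hypothetical stand-in for errseq_t */

/* Record a negative errno; bump the counter only if the old value was SEEN. */
model_errseq_t model_errseq_set(model_errseq_t *eseq, int err)
{
	model_errseq_t old = *eseq, next;

	/* Out-of-range err: leave the value alone (the kernel also WARNs). */
	if (err >= 0 || err < -(int)MODEL_MAX_ERRNO)
		return old;

	/* Drop the old errno and the SEEN flag, keep the counter bits. */
	next = (old & ~(MODEL_MAX_ERRNO | MODEL_SEEN)) | (unsigned int)-err;
	if (old & MODEL_SEEN)
		next += MODEL_CTR_INC;

	*eseq = next;
	return next;
}

/* Hypothetical helper: pretend some reader has sampled the current value. */
void model_mark_seen(model_errseq_t *eseq)
{
	if (*eseq != 0)
		*eseq |= MODEL_SEEN;
}

/* Walks through the three rules described in the lead-in. */
void model_errseq_demo(void)
{
	model_errseq_t e = 0, r;

	r = model_errseq_set(&e, -EIO);
	assert((r & MODEL_MAX_ERRNO) == EIO);
	assert((r >> (MODEL_SHIFT + 1)) == 0);	/* counter still 0 */

	/* Nobody sampled yet: the error is overwritten, counter unchanged. */
	r = model_errseq_set(&e, -ENOSPC);
	assert((r & MODEL_MAX_ERRNO) == ENOSPC);
	assert((r >> (MODEL_SHIFT + 1)) == 0);

	/* After a sample, the next error bumps the counter and clears SEEN. */
	model_mark_seen(&e);
	r = model_errseq_set(&e, -EIO);
	assert((r >> (MODEL_SHIFT + 1)) == 1);
	assert((r & MODEL_SEEN) == 0);
}
```

Calling model_errseq_demo() from a test program exercises all three cases. The counter staying put while nobody has sampled is what lets repeated errors collapse into one until a reader has actually observed the value.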
diff --git a/mm/filemap.c b/mm/filemap.c
index 65b4b6e7f7bd..1e01cb6e5173 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -476,6 +476,29 @@ int filemap_fdatawait_range(struct address_space *mapping, loff_t start_byte,
 EXPORT_SYMBOL(filemap_fdatawait_range);
 
 /**
+ * file_fdatawait_range - wait for writeback to complete
+ * @file: file pointing to address space structure to wait for
+ * @start_byte: offset in bytes where the range starts
+ * @end_byte: offset in bytes where the range ends (inclusive)
+ *
+ * Walk the list of under-writeback pages of the address space that file
+ * refers to, in the given range and wait for all of them. Check error
+ * status of the address space vs. the file->f_wb_err cursor and return it.
+ *
+ * Since the error status of the file is advanced by this function,
+ * callers are responsible for checking the return value and handling and/or
+ * reporting the error.
+ */
+int file_fdatawait_range(struct file *file, loff_t start_byte, loff_t end_byte)
+{
+	struct address_space *mapping = file->f_mapping;
+
+	__filemap_fdatawait_range(mapping, start_byte, end_byte);
+	return file_check_and_advance_wb_err(file);
+}
+EXPORT_SYMBOL(file_fdatawait_range);
+
+/**
  * filemap_fdatawait_keep_errors - wait for writeback without clearing errors
  * @mapping: address space structure to wait for
  *
@@ -489,45 +512,22 @@ EXPORT_SYMBOL(filemap_fdatawait_range);
  */
 int filemap_fdatawait_keep_errors(struct address_space *mapping)
 {
-	loff_t i_size = i_size_read(mapping->host);
-
-	if (i_size == 0)
-		return 0;
-
-	__filemap_fdatawait_range(mapping, 0, i_size - 1);
+	__filemap_fdatawait_range(mapping, 0, LLONG_MAX);
 	return filemap_check_and_keep_errors(mapping);
 }
 EXPORT_SYMBOL(filemap_fdatawait_keep_errors);
 
-/**
- * filemap_fdatawait - wait for all under-writeback pages to complete
- * @mapping: address space structure to wait for
- *
- * Walk the list of under-writeback pages of the given address space
- * and wait for all of them. Check error status of the address space
- * and return it.
- *
- * Since the error status of the address space is cleared by this function,
- * callers are responsible for checking the return value and handling and/or
- * reporting the error.
- */
-int filemap_fdatawait(struct address_space *mapping)
+static bool mapping_needs_writeback(struct address_space *mapping)
 {
-	loff_t i_size = i_size_read(mapping->host);
-
-	if (i_size == 0)
-		return 0;
-
-	return filemap_fdatawait_range(mapping, 0, i_size - 1);
+	return (!dax_mapping(mapping) && mapping->nrpages) ||
+	    (dax_mapping(mapping) && mapping->nrexceptional);
 }
-EXPORT_SYMBOL(filemap_fdatawait);
 
 int filemap_write_and_wait(struct address_space *mapping)
 {
 	int err = 0;
 
-	if ((!dax_mapping(mapping) && mapping->nrpages) ||
-	    (dax_mapping(mapping) && mapping->nrexceptional)) {
+	if (mapping_needs_writeback(mapping)) {
 		err = filemap_fdatawrite(mapping);
 		/*
 		 * Even if the above returned error, the pages may be
@@ -566,8 +566,7 @@ int filemap_write_and_wait_range(struct address_space *mapping,
 {
 	int err = 0;
 
-	if ((!dax_mapping(mapping) && mapping->nrpages) ||
-	    (dax_mapping(mapping) && mapping->nrexceptional)) {
+	if (mapping_needs_writeback(mapping)) {
 		err = __filemap_fdatawrite_range(mapping, lstart, lend,
 						 WB_SYNC_ALL);
 		/* See comment of filemap_write_and_wait() */
@@ -589,7 +588,7 @@ EXPORT_SYMBOL(filemap_write_and_wait_range);
 
 void __filemap_set_wb_err(struct address_space *mapping, int err)
 {
-	errseq_t eseq = __errseq_set(&mapping->wb_err, err);
+	errseq_t eseq = errseq_set(&mapping->wb_err, err);
 
 	trace_filemap_set_wb_err(mapping, eseq);
 }
@@ -656,8 +655,7 @@ int file_write_and_wait_range(struct file *file, loff_t lstart, loff_t lend)
 	int err = 0, err2;
 	struct address_space *mapping = file->f_mapping;
 
-	if ((!dax_mapping(mapping) && mapping->nrpages) ||
-	    (dax_mapping(mapping) && mapping->nrexceptional)) {
+	if (mapping_needs_writeback(mapping)) {
 		err = __filemap_fdatawrite_range(mapping, lstart, lend,
 						 WB_SYNC_ALL);
 		/* See comment of filemap_write_and_wait() */
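The three hunks above replace the same open-coded test with mapping_needs_writeback(). As a rough illustration of what that predicate decides, here is a userspace sketch with the same boolean shape. The struct and the model_* names are hypothetical stand-ins and not the kernel's struct address_space; the is_dax field models the result of dax_mapping().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in holding only the fields the predicate reads. */
struct model_mapping {
	bool is_dax;			/* models dax_mapping(mapping) */
	unsigned long nrpages;		/* page cache pages */
	unsigned long nrexceptional;	/* exceptional (e.g. DAX) entries */
};

/* Same boolean shape as the helper introduced in the diff. */
bool model_mapping_needs_writeback(const struct model_mapping *m)
{
	return (!m->is_dax && m->nrpages) ||
	       (m->is_dax && m->nrexceptional);
}

/* Spot-checks the four interesting combinations. */
void model_needs_writeback_demo(void)
{
	struct model_mapping plain_pages = { .is_dax = false, .nrpages = 3 };
	struct model_mapping plain_empty = { .is_dax = false, .nrexceptional = 2 };
	struct model_mapping dax_entries = { .is_dax = true, .nrexceptional = 2 };
	struct model_mapping dax_pages_only = { .is_dax = true, .nrpages = 3 };

	assert(model_mapping_needs_writeback(&plain_pages));
	assert(!model_mapping_needs_writeback(&plain_empty));
	assert(model_mapping_needs_writeback(&dax_entries));
	assert(!model_mapping_needs_writeback(&dax_pages_only));
}
```

In this model a non-DAX mapping is judged by nrpages alone and a DAX mapping by nrexceptional alone, which is the distinction the three callers previously each spelled out by hand.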