aboutsummaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorJeff Layton <jlayton@redhat.com>2007-11-20 18:19:03 -0500
committerSteve French <sfrench@us.ibm.com>2007-11-20 18:19:03 -0500
commitcea218054ad277d6c126890213afde07b4eb1602 (patch)
tree8bfbd3c7d8ab94d35ec749ed4e0d66b1f6b69101
parent2a97468024fb5b6eccee2a67a7796485c829343a (diff)
[CIFS] Fix potential data corruption when writing out cached dirty pages
Fix RedHat bug 329431 The idea here is separate "conscious" from "unconscious" flushes. Conscious flushes are those due to a fsync() or close(). Unconscious ones are flushes that occur as a side effect of some other operation or due to memory pressure. Currently, when an error occurs during an unconscious flush (ENOSPC or EIO), we toss out the page and don't preserve that error to report to the user when a conscious flush occurs. If after the unconscious flush, there are no more dirty pages for the inode, the conscious flush will simply return success even though there were previous errors when writing out pages. This can lead to data corruption. The easiest way to reproduce this is to mount up a CIFS share that's very close to being full or where the user is very close to quota. mv a file to the share that's slightly larger than the quota allows. The writes will all succeed (since they go to pagecache). The mv will do a setattr to set the new file's attributes. This calls filemap_write_and_wait, which will return an error since all of the pages can't be written out. Then later, when the flush and release ops occur, there are no more dirty pages in pagecache for the file and those operations return 0. mv then assumes that the file was written out correctly and deletes the original. CIFS already has a write_behind_rc variable where it stores the results from earlier flushes, but that value is only reported in cifs_close. Since the VFS ignores the return value from the release operation, this isn't helpful. We should be reporting this error during the flush operation. This patch does the following: 1) changes cifs_fsync to use filemap_write_and_wait and cifs_flush and also sync to check its return code. If it returns successful, they then check the value of write_behind_rc to see if an earlier flush had reported any errors. If so, they return that error and clear write_behind_rc. 2) sets write_behind_rc in a few other places where pages are written out as a side effect of other operations and the code waits on them. 3) changes cifs_setattr to only call filemap_write_and_wait for ATTR_SIZE changes. 4) makes cifs_writepages accurately distinguish between EIO and ENOSPC errors when writing out pages. Some simple testing indicates that the patch works as expected and that it fixes the reproduceable known problem. Acked-by: Dave Kleikamp <shaggy@austin.rr.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
-rw-r--r--fs/cifs/CHANGES4
-rw-r--r--fs/cifs/README27
-rw-r--r--fs/cifs/cifsfs.c7
-rw-r--r--fs/cifs/file.c24
-rw-r--r--fs/cifs/inode.c26
5 files changed, 58 insertions, 30 deletions
diff --git a/fs/cifs/CHANGES b/fs/cifs/CHANGES
index e31aa74f7d9e..a609599287aa 100644
--- a/fs/cifs/CHANGES
+++ b/fs/cifs/CHANGES
@@ -1,7 +1,9 @@
1Version 1.52 1Version 1.52
2------------ 2------------
3Fix oops on second mount to server when null auth is used. 3Fix oops on second mount to server when null auth is used.
4Enable experimental Kerberos support 4Enable experimental Kerberos support. Return writebehind errors on flush
5and sync so that events like out of disk space get reported properly on
6cached files.
5 7
6Version 1.51 8Version 1.51
7------------ 9------------
diff --git a/fs/cifs/README b/fs/cifs/README
index b806b11b5560..bf11329ac784 100644
--- a/fs/cifs/README
+++ b/fs/cifs/README
@@ -225,12 +225,9 @@ If no password is provided, mount.cifs will prompt for password entry
225 225
226Restrictions 226Restrictions
227============ 227============
228Servers must support the NTLM SMB dialect (which is the most recent, supported
229by Samba and Windows NT version 4, 2000 and XP and many other SMB/CIFS servers)
230Servers must support either "pure-TCP" (port 445 TCP/IP CIFS connections) or RFC 228Servers must support either "pure-TCP" (port 445 TCP/IP CIFS connections) or RFC
2311001/1002 support for "Netbios-Over-TCP/IP." Neither of these is likely to be a 2291001/1002 support for "Netbios-Over-TCP/IP." This is not likely to be a
232problem as most servers support this. IPv6 support is planned for the future, 230problem as most servers support this.
233and is almost complete.
234 231
235Valid filenames differ between Windows and Linux. Windows typically restricts 232Valid filenames differ between Windows and Linux. Windows typically restricts
236filenames which contain certain reserved characters (e.g.the character : 233filenames which contain certain reserved characters (e.g.the character :
@@ -458,6 +455,8 @@ A partial list of the supported mount options follows:
458 byte range locks). 455 byte range locks).
459 remount remount the share (often used to change from ro to rw mounts 456 remount remount the share (often used to change from ro to rw mounts
460 or vice versa) 457 or vice versa)
458 cifsacl Report mode bits (e.g. on stat) based on the Windows ACL for
459 the file. (EXPERIMENTAL)
461 servern Specify the server 's netbios name (RFC1001 name) to use 460 servern Specify the server 's netbios name (RFC1001 name) to use
462 when attempting to setup a session to the server. This is 461 when attempting to setup a session to the server. This is
463 This is needed for mounting to some older servers (such 462 This is needed for mounting to some older servers (such
@@ -584,8 +583,8 @@ Experimental When set to 1 used to enable certain experimental
584 performance enhancement was disabled when 583 performance enhancement was disabled when
585 signing turned on in case buffer was modified 584 signing turned on in case buffer was modified
586 just before it was sent, also this flag will 585 just before it was sent, also this flag will
587 be used to use the new experimental sessionsetup 586 be used to use the new experimental directory change
588 code). 587 notification code).
589 588
590These experimental features and tracing can be enabled by changing flags in 589These experimental features and tracing can be enabled by changing flags in
591/proc/fs/cifs (after the cifs module has been installed or built into the 590/proc/fs/cifs (after the cifs module has been installed or built into the
@@ -608,7 +607,8 @@ the start of smb requests and responses can be enabled via:
608Two other experimental features are under development. To test these 607Two other experimental features are under development. To test these
609requires enabling CONFIG_CIFS_EXPERIMENTAL 608requires enabling CONFIG_CIFS_EXPERIMENTAL
610 609
611 ipv6 enablement 610 cifsacl support needed to retrieve approximated mode bits based on
611 the contents on the CIFS ACL.
612 612
613 DNOTIFY fcntl: needed for support of directory change 613 DNOTIFY fcntl: needed for support of directory change
614 notification and perhaps later for file leases) 614 notification and perhaps later for file leases)
@@ -625,10 +625,7 @@ that they represent all for that share, not just those for which the server
625returned success. 625returned success.
626 626
627Also note that "cat /proc/fs/cifs/DebugData" will display information about 627Also note that "cat /proc/fs/cifs/DebugData" will display information about
628the active sessions and the shares that are mounted. Note: NTLMv2 enablement 628the active sessions and the shares that are mounted.
629will not work since its implementation is not quite complete yet. Do not alter 629Enabling Kerberos (extended security) works when CONFIG_CIFS_EXPERIMENTAL is enabled
630the ExtendedSecurity configuration value unless you are doing specific testing. 630but requires a user space helper (from the Samba project). NTLM and NTLMv2 and
631Enabling extended security works to Windows 2000 Workstations and XP but not to 631LANMAN support do not require this helpr.
632Windows 2000 server or Samba since it does not usually send "raw NTLMSSP"
633(instead it sends NTLMSSP encapsulated in SPNEGO/GSSAPI, which support is not
634complete in the CIFS VFS yet).
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c
index 416dc9fe8961..093beaa3900d 100644
--- a/fs/cifs/cifsfs.c
+++ b/fs/cifs/cifsfs.c
@@ -266,6 +266,7 @@ cifs_alloc_inode(struct super_block *sb)
266 cifs_inode->cifsAttrs = 0x20; /* default */ 266 cifs_inode->cifsAttrs = 0x20; /* default */
267 atomic_set(&cifs_inode->inUse, 0); 267 atomic_set(&cifs_inode->inUse, 0);
268 cifs_inode->time = 0; 268 cifs_inode->time = 0;
269 cifs_inode->write_behind_rc = 0;
269 /* Until the file is open and we have gotten oplock 270 /* Until the file is open and we have gotten oplock
270 info back from the server, can not assume caching of 271 info back from the server, can not assume caching of
271 file data or metadata */ 272 file data or metadata */
@@ -852,7 +853,7 @@ static int cifs_oplock_thread(void *dummyarg)
852 struct cifsTconInfo *pTcon; 853 struct cifsTconInfo *pTcon;
853 struct inode *inode; 854 struct inode *inode;
854 __u16 netfid; 855 __u16 netfid;
855 int rc; 856 int rc, waitrc = 0;
856 857
857 set_freezable(); 858 set_freezable();
858 do { 859 do {
@@ -884,9 +885,11 @@ static int cifs_oplock_thread(void *dummyarg)
884 filemap_fdatawrite(inode->i_mapping); 885 filemap_fdatawrite(inode->i_mapping);
885 if (CIFS_I(inode)->clientCanCacheRead 886 if (CIFS_I(inode)->clientCanCacheRead
886 == 0) { 887 == 0) {
887 filemap_fdatawait(inode->i_mapping); 888 waitrc = filemap_fdatawait(inode->i_mapping);
888 invalidate_remote_inode(inode); 889 invalidate_remote_inode(inode);
889 } 890 }
891 if (rc == 0)
892 rc = waitrc;
890 } else 893 } else
891 rc = 0; 894 rc = 0;
892 /* mutex_unlock(&inode->i_mutex);*/ 895 /* mutex_unlock(&inode->i_mutex);*/
diff --git a/fs/cifs/file.c b/fs/cifs/file.c
index 802564196510..dd26e2759b17 100644
--- a/fs/cifs/file.c
+++ b/fs/cifs/file.c
@@ -130,7 +130,9 @@ static inline int cifs_open_inode_helper(struct inode *inode, struct file *file,
130 if (file->f_path.dentry->d_inode->i_mapping) { 130 if (file->f_path.dentry->d_inode->i_mapping) {
131 /* BB no need to lock inode until after invalidate 131 /* BB no need to lock inode until after invalidate
132 since namei code should already have it locked? */ 132 since namei code should already have it locked? */
133 filemap_write_and_wait(file->f_path.dentry->d_inode->i_mapping); 133 rc = filemap_write_and_wait(file->f_path.dentry->d_inode->i_mapping);
134 if (rc != 0)
135 CIFS_I(file->f_path.dentry->d_inode)->write_behind_rc = rc;
134 } 136 }
135 cFYI(1, ("invalidating remote inode since open detected it " 137 cFYI(1, ("invalidating remote inode since open detected it "
136 "changed")); 138 "changed"));
@@ -425,7 +427,9 @@ reopen_error_exit:
425 pCifsInode = CIFS_I(inode); 427 pCifsInode = CIFS_I(inode);
426 if (pCifsInode) { 428 if (pCifsInode) {
427 if (can_flush) { 429 if (can_flush) {
428 filemap_write_and_wait(inode->i_mapping); 430 rc = filemap_write_and_wait(inode->i_mapping);
431 if (rc != 0)
432 CIFS_I(inode)->write_behind_rc = rc;
429 /* temporarily disable caching while we 433 /* temporarily disable caching while we
430 go to server to get inode info */ 434 go to server to get inode info */
431 pCifsInode->clientCanCacheAll = FALSE; 435 pCifsInode->clientCanCacheAll = FALSE;
@@ -1367,7 +1371,10 @@ retry:
1367 rc, bytes_written)); 1371 rc, bytes_written));
1368 /* BB what if continued retry is 1372 /* BB what if continued retry is
1369 requested via mount flags? */ 1373 requested via mount flags? */
1370 set_bit(AS_EIO, &mapping->flags); 1374 if (rc == -ENOSPC)
1375 set_bit(AS_ENOSPC, &mapping->flags);
1376 else
1377 set_bit(AS_EIO, &mapping->flags);
1371 } else { 1378 } else {
1372 cifs_stats_bytes_written(cifs_sb->tcon, 1379 cifs_stats_bytes_written(cifs_sb->tcon,
1373 bytes_written); 1380 bytes_written);
@@ -1499,9 +1506,11 @@ int cifs_fsync(struct file *file, struct dentry *dentry, int datasync)
1499 cFYI(1, ("Sync file - name: %s datasync: 0x%x", 1506 cFYI(1, ("Sync file - name: %s datasync: 0x%x",
1500 dentry->d_name.name, datasync)); 1507 dentry->d_name.name, datasync));
1501 1508
1502 rc = filemap_fdatawrite(inode->i_mapping); 1509 rc = filemap_write_and_wait(inode->i_mapping);
1503 if (rc == 0) 1510 if (rc == 0) {
1511 rc = CIFS_I(inode)->write_behind_rc;
1504 CIFS_I(inode)->write_behind_rc = 0; 1512 CIFS_I(inode)->write_behind_rc = 0;
1513 }
1505 FreeXid(xid); 1514 FreeXid(xid);
1506 return rc; 1515 return rc;
1507} 1516}
@@ -1553,8 +1562,11 @@ int cifs_flush(struct file *file, fl_owner_t id)
1553 filemapfdatawrite appears easier for the time being */ 1562 filemapfdatawrite appears easier for the time being */
1554 1563
1555 rc = filemap_fdatawrite(inode->i_mapping); 1564 rc = filemap_fdatawrite(inode->i_mapping);
1556 if (!rc) /* reset wb rc if we were able to write out dirty pages */ 1565 /* reset wb rc if we were able to write out dirty pages */
1566 if (!rc) {
1567 rc = CIFS_I(inode)->write_behind_rc;
1557 CIFS_I(inode)->write_behind_rc = 0; 1568 CIFS_I(inode)->write_behind_rc = 0;
1569 }
1558 1570
1559 cFYI(1, ("Flush inode %p file %p rc %d", inode, file, rc)); 1571 cFYI(1, ("Flush inode %p file %p rc %d", inode, file, rc));
1560 1572
diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c
index 7d907e84e032..e915eb1d2e66 100644
--- a/fs/cifs/inode.c
+++ b/fs/cifs/inode.c
@@ -1233,7 +1233,7 @@ cifs_rename_exit:
1233int cifs_revalidate(struct dentry *direntry) 1233int cifs_revalidate(struct dentry *direntry)
1234{ 1234{
1235 int xid; 1235 int xid;
1236 int rc = 0; 1236 int rc = 0, wbrc = 0;
1237 char *full_path; 1237 char *full_path;
1238 struct cifs_sb_info *cifs_sb; 1238 struct cifs_sb_info *cifs_sb;
1239 struct cifsInodeInfo *cifsInode; 1239 struct cifsInodeInfo *cifsInode;
@@ -1333,7 +1333,9 @@ int cifs_revalidate(struct dentry *direntry)
1333 if (direntry->d_inode->i_mapping) { 1333 if (direntry->d_inode->i_mapping) {
1334 /* do we need to lock inode until after invalidate completes 1334 /* do we need to lock inode until after invalidate completes
1335 below? */ 1335 below? */
1336 filemap_fdatawrite(direntry->d_inode->i_mapping); 1336 wbrc = filemap_fdatawrite(direntry->d_inode->i_mapping);
1337 if (wbrc)
1338 CIFS_I(direntry->d_inode)->write_behind_rc = wbrc;
1337 } 1339 }
1338 if (invalidate_inode) { 1340 if (invalidate_inode) {
1339 /* shrink_dcache not necessary now that cifs dentry ops 1341 /* shrink_dcache not necessary now that cifs dentry ops
@@ -1342,7 +1344,9 @@ int cifs_revalidate(struct dentry *direntry)
1342 shrink_dcache_parent(direntry); */ 1344 shrink_dcache_parent(direntry); */
1343 if (S_ISREG(direntry->d_inode->i_mode)) { 1345 if (S_ISREG(direntry->d_inode->i_mode)) {
1344 if (direntry->d_inode->i_mapping) 1346 if (direntry->d_inode->i_mapping)
1345 filemap_fdatawait(direntry->d_inode->i_mapping); 1347 wbrc = filemap_fdatawait(direntry->d_inode->i_mapping);
1348 if (wbrc)
1349 CIFS_I(direntry->d_inode)->write_behind_rc = wbrc;
1346 /* may eventually have to do this for open files too */ 1350 /* may eventually have to do this for open files too */
1347 if (list_empty(&(cifsInode->openFileList))) { 1351 if (list_empty(&(cifsInode->openFileList))) {
1348 /* changed on server - flush read ahead pages */ 1352 /* changed on server - flush read ahead pages */
@@ -1485,10 +1489,20 @@ int cifs_setattr(struct dentry *direntry, struct iattr *attrs)
1485 1489
1486 /* BB check if we need to refresh inode from server now ? BB */ 1490 /* BB check if we need to refresh inode from server now ? BB */
1487 1491
1488 /* need to flush data before changing file size on server */
1489 filemap_write_and_wait(direntry->d_inode->i_mapping);
1490
1491 if (attrs->ia_valid & ATTR_SIZE) { 1492 if (attrs->ia_valid & ATTR_SIZE) {
1493 /*
1494 Flush data before changing file size on server. If the
1495 flush returns error, store it to report later and continue.
1496 BB: This should be smarter. Why bother flushing pages that
1497 will be truncated anyway? Also, should we error out here if
1498 the flush returns error?
1499 */
1500 rc = filemap_write_and_wait(direntry->d_inode->i_mapping);
1501 if (rc != 0) {
1502 CIFS_I(direntry->d_inode)->write_behind_rc = rc;
1503 rc = 0;
1504 }
1505
1492 /* To avoid spurious oplock breaks from server, in the case of 1506 /* To avoid spurious oplock breaks from server, in the case of
1493 inodes that we already have open, avoid doing path based 1507 inodes that we already have open, avoid doing path based
1494 setting of file size if we can do it by handle. 1508 setting of file size if we can do it by handle.