diff options
author | Jeff Layton <jlayton@redhat.com> | 2007-11-20 18:19:03 -0500 |
---|---|---|
committer | Steve French <sfrench@us.ibm.com> | 2007-11-20 18:19:03 -0500 |
commit | cea218054ad277d6c126890213afde07b4eb1602 (patch) | |
tree | 8bfbd3c7d8ab94d35ec749ed4e0d66b1f6b69101 /fs | |
parent | 2a97468024fb5b6eccee2a67a7796485c829343a (diff) |
[CIFS] Fix potential data corruption when writing out cached dirty pages
Fix RedHat bug 329431
The idea here is separate "conscious" from "unconscious" flushes.
Conscious flushes are those due to a fsync() or close(). Unconscious
ones are flushes that occur as a side effect of some other operation or
due to memory pressure.
Currently, when an error occurs during an unconscious flush (ENOSPC or
EIO), we toss out the page and don't preserve that error to report to
the user when a conscious flush occurs. If after the unconscious flush,
there are no more dirty pages for the inode, the conscious flush will
simply return success even though there were previous errors when writing
out pages. This can lead to data corruption.
The easiest way to reproduce this is to mount up a CIFS share that's
very close to being full or where the user is very close to quota. mv
a file to the share that's slightly larger than the quota allows. The
writes will all succeed (since they go to pagecache). The mv will do a
setattr to set the new file's attributes. This calls
filemap_write_and_wait,
which will return an error since all of the pages can't be written out.
Then later, when the flush and release ops occur, there are no more
dirty pages in pagecache for the file and those operations return 0. mv
then assumes that the file was written out correctly and deletes the
original.
CIFS already has a write_behind_rc variable where it stores the results
from earlier flushes, but that value is only reported in cifs_close.
Since the VFS ignores the return value from the release operation, this
isn't helpful. We should be reporting this error during the flush
operation.
This patch does the following:
1) changes cifs_fsync to use filemap_write_and_wait and cifs_flush and also
sync to check its return code. If it returns successful, they then check
the value of write_behind_rc to see if an earlier flush had reported any
errors. If so, they return that error and clear write_behind_rc.
2) sets write_behind_rc in a few other places where pages are written
out as a side effect of other operations and the code waits on them.
3) changes cifs_setattr to only call filemap_write_and_wait for
ATTR_SIZE changes.
4) makes cifs_writepages accurately distinguish between EIO and ENOSPC
errors when writing out pages.
Some simple testing indicates that the patch works as expected and that
it fixes the reproduceable known problem.
Acked-by: Dave Kleikamp <shaggy@austin.rr.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Diffstat (limited to 'fs')
-rw-r--r-- | fs/cifs/CHANGES | 4 | ||||
-rw-r--r-- | fs/cifs/README | 27 | ||||
-rw-r--r-- | fs/cifs/cifsfs.c | 7 | ||||
-rw-r--r-- | fs/cifs/file.c | 24 | ||||
-rw-r--r-- | fs/cifs/inode.c | 26 |
5 files changed, 58 insertions, 30 deletions
diff --git a/fs/cifs/CHANGES b/fs/cifs/CHANGES index e31aa74f7d9e..a609599287aa 100644 --- a/fs/cifs/CHANGES +++ b/fs/cifs/CHANGES | |||
@@ -1,7 +1,9 @@ | |||
1 | Version 1.52 | 1 | Version 1.52 |
2 | ------------ | 2 | ------------ |
3 | Fix oops on second mount to server when null auth is used. | 3 | Fix oops on second mount to server when null auth is used. |
4 | Enable experimental Kerberos support | 4 | Enable experimental Kerberos support. Return writebehind errors on flush |
5 | and sync so that events like out of disk space get reported properly on | ||
6 | cached files. | ||
5 | 7 | ||
6 | Version 1.51 | 8 | Version 1.51 |
7 | ------------ | 9 | ------------ |
diff --git a/fs/cifs/README b/fs/cifs/README index b806b11b5560..bf11329ac784 100644 --- a/fs/cifs/README +++ b/fs/cifs/README | |||
@@ -225,12 +225,9 @@ If no password is provided, mount.cifs will prompt for password entry | |||
225 | 225 | ||
226 | Restrictions | 226 | Restrictions |
227 | ============ | 227 | ============ |
228 | Servers must support the NTLM SMB dialect (which is the most recent, supported | ||
229 | by Samba and Windows NT version 4, 2000 and XP and many other SMB/CIFS servers) | ||
230 | Servers must support either "pure-TCP" (port 445 TCP/IP CIFS connections) or RFC | 228 | Servers must support either "pure-TCP" (port 445 TCP/IP CIFS connections) or RFC |
231 | 1001/1002 support for "Netbios-Over-TCP/IP." Neither of these is likely to be a | 229 | 1001/1002 support for "Netbios-Over-TCP/IP." This is not likely to be a |
232 | problem as most servers support this. IPv6 support is planned for the future, | 230 | problem as most servers support this. |
233 | and is almost complete. | ||
234 | 231 | ||
235 | Valid filenames differ between Windows and Linux. Windows typically restricts | 232 | Valid filenames differ between Windows and Linux. Windows typically restricts |
236 | filenames which contain certain reserved characters (e.g.the character : | 233 | filenames which contain certain reserved characters (e.g.the character : |
@@ -458,6 +455,8 @@ A partial list of the supported mount options follows: | |||
458 | byte range locks). | 455 | byte range locks). |
459 | remount remount the share (often used to change from ro to rw mounts | 456 | remount remount the share (often used to change from ro to rw mounts |
460 | or vice versa) | 457 | or vice versa) |
458 | cifsacl Report mode bits (e.g. on stat) based on the Windows ACL for | ||
459 | the file. (EXPERIMENTAL) | ||
461 | servern Specify the server 's netbios name (RFC1001 name) to use | 460 | servern Specify the server 's netbios name (RFC1001 name) to use |
462 | when attempting to setup a session to the server. This is | 461 | when attempting to setup a session to the server. This is |
463 | This is needed for mounting to some older servers (such | 462 | This is needed for mounting to some older servers (such |
@@ -584,8 +583,8 @@ Experimental When set to 1 used to enable certain experimental | |||
584 | performance enhancement was disabled when | 583 | performance enhancement was disabled when |
585 | signing turned on in case buffer was modified | 584 | signing turned on in case buffer was modified |
586 | just before it was sent, also this flag will | 585 | just before it was sent, also this flag will |
587 | be used to use the new experimental sessionsetup | 586 | be used to use the new experimental directory change |
588 | code). | 587 | notification code). |
589 | 588 | ||
590 | These experimental features and tracing can be enabled by changing flags in | 589 | These experimental features and tracing can be enabled by changing flags in |
591 | /proc/fs/cifs (after the cifs module has been installed or built into the | 590 | /proc/fs/cifs (after the cifs module has been installed or built into the |
@@ -608,7 +607,8 @@ the start of smb requests and responses can be enabled via: | |||
608 | Two other experimental features are under development. To test these | 607 | Two other experimental features are under development. To test these |
609 | requires enabling CONFIG_CIFS_EXPERIMENTAL | 608 | requires enabling CONFIG_CIFS_EXPERIMENTAL |
610 | 609 | ||
611 | ipv6 enablement | 610 | cifsacl support needed to retrieve approximated mode bits based on |
611 | the contents on the CIFS ACL. | ||
612 | 612 | ||
613 | DNOTIFY fcntl: needed for support of directory change | 613 | DNOTIFY fcntl: needed for support of directory change |
614 | notification and perhaps later for file leases) | 614 | notification and perhaps later for file leases) |
@@ -625,10 +625,7 @@ that they represent all for that share, not just those for which the server | |||
625 | returned success. | 625 | returned success. |
626 | 626 | ||
627 | Also note that "cat /proc/fs/cifs/DebugData" will display information about | 627 | Also note that "cat /proc/fs/cifs/DebugData" will display information about |
628 | the active sessions and the shares that are mounted. Note: NTLMv2 enablement | 628 | the active sessions and the shares that are mounted. |
629 | will not work since its implementation is not quite complete yet. Do not alter | 629 | Enabling Kerberos (extended security) works when CONFIG_CIFS_EXPERIMENTAL is enabled |
630 | the ExtendedSecurity configuration value unless you are doing specific testing. | 630 | but requires a user space helper (from the Samba project). NTLM and NTLMv2 and |
631 | Enabling extended security works to Windows 2000 Workstations and XP but not to | 631 | LANMAN support do not require this helpr. |
632 | Windows 2000 server or Samba since it does not usually send "raw NTLMSSP" | ||
633 | (instead it sends NTLMSSP encapsulated in SPNEGO/GSSAPI, which support is not | ||
634 | complete in the CIFS VFS yet). | ||
diff --git a/fs/cifs/cifsfs.c b/fs/cifs/cifsfs.c index 416dc9fe8961..093beaa3900d 100644 --- a/fs/cifs/cifsfs.c +++ b/fs/cifs/cifsfs.c | |||
@@ -266,6 +266,7 @@ cifs_alloc_inode(struct super_block *sb) | |||
266 | cifs_inode->cifsAttrs = 0x20; /* default */ | 266 | cifs_inode->cifsAttrs = 0x20; /* default */ |
267 | atomic_set(&cifs_inode->inUse, 0); | 267 | atomic_set(&cifs_inode->inUse, 0); |
268 | cifs_inode->time = 0; | 268 | cifs_inode->time = 0; |
269 | cifs_inode->write_behind_rc = 0; | ||
269 | /* Until the file is open and we have gotten oplock | 270 | /* Until the file is open and we have gotten oplock |
270 | info back from the server, can not assume caching of | 271 | info back from the server, can not assume caching of |
271 | file data or metadata */ | 272 | file data or metadata */ |
@@ -852,7 +853,7 @@ static int cifs_oplock_thread(void *dummyarg) | |||
852 | struct cifsTconInfo *pTcon; | 853 | struct cifsTconInfo *pTcon; |
853 | struct inode *inode; | 854 | struct inode *inode; |
854 | __u16 netfid; | 855 | __u16 netfid; |
855 | int rc; | 856 | int rc, waitrc = 0; |
856 | 857 | ||
857 | set_freezable(); | 858 | set_freezable(); |
858 | do { | 859 | do { |
@@ -884,9 +885,11 @@ static int cifs_oplock_thread(void *dummyarg) | |||
884 | filemap_fdatawrite(inode->i_mapping); | 885 | filemap_fdatawrite(inode->i_mapping); |
885 | if (CIFS_I(inode)->clientCanCacheRead | 886 | if (CIFS_I(inode)->clientCanCacheRead |
886 | == 0) { | 887 | == 0) { |
887 | filemap_fdatawait(inode->i_mapping); | 888 | waitrc = filemap_fdatawait(inode->i_mapping); |
888 | invalidate_remote_inode(inode); | 889 | invalidate_remote_inode(inode); |
889 | } | 890 | } |
891 | if (rc == 0) | ||
892 | rc = waitrc; | ||
890 | } else | 893 | } else |
891 | rc = 0; | 894 | rc = 0; |
892 | /* mutex_unlock(&inode->i_mutex);*/ | 895 | /* mutex_unlock(&inode->i_mutex);*/ |
diff --git a/fs/cifs/file.c b/fs/cifs/file.c index 802564196510..dd26e2759b17 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c | |||
@@ -130,7 +130,9 @@ static inline int cifs_open_inode_helper(struct inode *inode, struct file *file, | |||
130 | if (file->f_path.dentry->d_inode->i_mapping) { | 130 | if (file->f_path.dentry->d_inode->i_mapping) { |
131 | /* BB no need to lock inode until after invalidate | 131 | /* BB no need to lock inode until after invalidate |
132 | since namei code should already have it locked? */ | 132 | since namei code should already have it locked? */ |
133 | filemap_write_and_wait(file->f_path.dentry->d_inode->i_mapping); | 133 | rc = filemap_write_and_wait(file->f_path.dentry->d_inode->i_mapping); |
134 | if (rc != 0) | ||
135 | CIFS_I(file->f_path.dentry->d_inode)->write_behind_rc = rc; | ||
134 | } | 136 | } |
135 | cFYI(1, ("invalidating remote inode since open detected it " | 137 | cFYI(1, ("invalidating remote inode since open detected it " |
136 | "changed")); | 138 | "changed")); |
@@ -425,7 +427,9 @@ reopen_error_exit: | |||
425 | pCifsInode = CIFS_I(inode); | 427 | pCifsInode = CIFS_I(inode); |
426 | if (pCifsInode) { | 428 | if (pCifsInode) { |
427 | if (can_flush) { | 429 | if (can_flush) { |
428 | filemap_write_and_wait(inode->i_mapping); | 430 | rc = filemap_write_and_wait(inode->i_mapping); |
431 | if (rc != 0) | ||
432 | CIFS_I(inode)->write_behind_rc = rc; | ||
429 | /* temporarily disable caching while we | 433 | /* temporarily disable caching while we |
430 | go to server to get inode info */ | 434 | go to server to get inode info */ |
431 | pCifsInode->clientCanCacheAll = FALSE; | 435 | pCifsInode->clientCanCacheAll = FALSE; |
@@ -1367,7 +1371,10 @@ retry: | |||
1367 | rc, bytes_written)); | 1371 | rc, bytes_written)); |
1368 | /* BB what if continued retry is | 1372 | /* BB what if continued retry is |
1369 | requested via mount flags? */ | 1373 | requested via mount flags? */ |
1370 | set_bit(AS_EIO, &mapping->flags); | 1374 | if (rc == -ENOSPC) |
1375 | set_bit(AS_ENOSPC, &mapping->flags); | ||
1376 | else | ||
1377 | set_bit(AS_EIO, &mapping->flags); | ||
1371 | } else { | 1378 | } else { |
1372 | cifs_stats_bytes_written(cifs_sb->tcon, | 1379 | cifs_stats_bytes_written(cifs_sb->tcon, |
1373 | bytes_written); | 1380 | bytes_written); |
@@ -1499,9 +1506,11 @@ int cifs_fsync(struct file *file, struct dentry *dentry, int datasync) | |||
1499 | cFYI(1, ("Sync file - name: %s datasync: 0x%x", | 1506 | cFYI(1, ("Sync file - name: %s datasync: 0x%x", |
1500 | dentry->d_name.name, datasync)); | 1507 | dentry->d_name.name, datasync)); |
1501 | 1508 | ||
1502 | rc = filemap_fdatawrite(inode->i_mapping); | 1509 | rc = filemap_write_and_wait(inode->i_mapping); |
1503 | if (rc == 0) | 1510 | if (rc == 0) { |
1511 | rc = CIFS_I(inode)->write_behind_rc; | ||
1504 | CIFS_I(inode)->write_behind_rc = 0; | 1512 | CIFS_I(inode)->write_behind_rc = 0; |
1513 | } | ||
1505 | FreeXid(xid); | 1514 | FreeXid(xid); |
1506 | return rc; | 1515 | return rc; |
1507 | } | 1516 | } |
@@ -1553,8 +1562,11 @@ int cifs_flush(struct file *file, fl_owner_t id) | |||
1553 | filemapfdatawrite appears easier for the time being */ | 1562 | filemapfdatawrite appears easier for the time being */ |
1554 | 1563 | ||
1555 | rc = filemap_fdatawrite(inode->i_mapping); | 1564 | rc = filemap_fdatawrite(inode->i_mapping); |
1556 | if (!rc) /* reset wb rc if we were able to write out dirty pages */ | 1565 | /* reset wb rc if we were able to write out dirty pages */ |
1566 | if (!rc) { | ||
1567 | rc = CIFS_I(inode)->write_behind_rc; | ||
1557 | CIFS_I(inode)->write_behind_rc = 0; | 1568 | CIFS_I(inode)->write_behind_rc = 0; |
1569 | } | ||
1558 | 1570 | ||
1559 | cFYI(1, ("Flush inode %p file %p rc %d", inode, file, rc)); | 1571 | cFYI(1, ("Flush inode %p file %p rc %d", inode, file, rc)); |
1560 | 1572 | ||
diff --git a/fs/cifs/inode.c b/fs/cifs/inode.c index 7d907e84e032..e915eb1d2e66 100644 --- a/fs/cifs/inode.c +++ b/fs/cifs/inode.c | |||
@@ -1233,7 +1233,7 @@ cifs_rename_exit: | |||
1233 | int cifs_revalidate(struct dentry *direntry) | 1233 | int cifs_revalidate(struct dentry *direntry) |
1234 | { | 1234 | { |
1235 | int xid; | 1235 | int xid; |
1236 | int rc = 0; | 1236 | int rc = 0, wbrc = 0; |
1237 | char *full_path; | 1237 | char *full_path; |
1238 | struct cifs_sb_info *cifs_sb; | 1238 | struct cifs_sb_info *cifs_sb; |
1239 | struct cifsInodeInfo *cifsInode; | 1239 | struct cifsInodeInfo *cifsInode; |
@@ -1333,7 +1333,9 @@ int cifs_revalidate(struct dentry *direntry) | |||
1333 | if (direntry->d_inode->i_mapping) { | 1333 | if (direntry->d_inode->i_mapping) { |
1334 | /* do we need to lock inode until after invalidate completes | 1334 | /* do we need to lock inode until after invalidate completes |
1335 | below? */ | 1335 | below? */ |
1336 | filemap_fdatawrite(direntry->d_inode->i_mapping); | 1336 | wbrc = filemap_fdatawrite(direntry->d_inode->i_mapping); |
1337 | if (wbrc) | ||
1338 | CIFS_I(direntry->d_inode)->write_behind_rc = wbrc; | ||
1337 | } | 1339 | } |
1338 | if (invalidate_inode) { | 1340 | if (invalidate_inode) { |
1339 | /* shrink_dcache not necessary now that cifs dentry ops | 1341 | /* shrink_dcache not necessary now that cifs dentry ops |
@@ -1342,7 +1344,9 @@ int cifs_revalidate(struct dentry *direntry) | |||
1342 | shrink_dcache_parent(direntry); */ | 1344 | shrink_dcache_parent(direntry); */ |
1343 | if (S_ISREG(direntry->d_inode->i_mode)) { | 1345 | if (S_ISREG(direntry->d_inode->i_mode)) { |
1344 | if (direntry->d_inode->i_mapping) | 1346 | if (direntry->d_inode->i_mapping) |
1345 | filemap_fdatawait(direntry->d_inode->i_mapping); | 1347 | wbrc = filemap_fdatawait(direntry->d_inode->i_mapping); |
1348 | if (wbrc) | ||
1349 | CIFS_I(direntry->d_inode)->write_behind_rc = wbrc; | ||
1346 | /* may eventually have to do this for open files too */ | 1350 | /* may eventually have to do this for open files too */ |
1347 | if (list_empty(&(cifsInode->openFileList))) { | 1351 | if (list_empty(&(cifsInode->openFileList))) { |
1348 | /* changed on server - flush read ahead pages */ | 1352 | /* changed on server - flush read ahead pages */ |
@@ -1485,10 +1489,20 @@ int cifs_setattr(struct dentry *direntry, struct iattr *attrs) | |||
1485 | 1489 | ||
1486 | /* BB check if we need to refresh inode from server now ? BB */ | 1490 | /* BB check if we need to refresh inode from server now ? BB */ |
1487 | 1491 | ||
1488 | /* need to flush data before changing file size on server */ | ||
1489 | filemap_write_and_wait(direntry->d_inode->i_mapping); | ||
1490 | |||
1491 | if (attrs->ia_valid & ATTR_SIZE) { | 1492 | if (attrs->ia_valid & ATTR_SIZE) { |
1493 | /* | ||
1494 | Flush data before changing file size on server. If the | ||
1495 | flush returns error, store it to report later and continue. | ||
1496 | BB: This should be smarter. Why bother flushing pages that | ||
1497 | will be truncated anyway? Also, should we error out here if | ||
1498 | the flush returns error? | ||
1499 | */ | ||
1500 | rc = filemap_write_and_wait(direntry->d_inode->i_mapping); | ||
1501 | if (rc != 0) { | ||
1502 | CIFS_I(direntry->d_inode)->write_behind_rc = rc; | ||
1503 | rc = 0; | ||
1504 | } | ||
1505 | |||
1492 | /* To avoid spurious oplock breaks from server, in the case of | 1506 | /* To avoid spurious oplock breaks from server, in the case of |
1493 | inodes that we already have open, avoid doing path based | 1507 | inodes that we already have open, avoid doing path based |
1494 | setting of file size if we can do it by handle. | 1508 | setting of file size if we can do it by handle. |