<feed xmlns='http://www.w3.org/2005/Atom'>
<title>litmus-rt.git/fs/nfs, branch tracing-devel</title>
<subtitle>The LITMUS^RT kernel.</subtitle>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt.git/'/>
<entry>
<title>Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache</title>
<updated>2009-11-30T21:33:48+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2009-11-30T21:33:48+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt.git/commit/?id=6e80133f7f247f313da1638af4ce30f2bac303cc'/>
<id>6e80133f7f247f313da1638af4ce30f2bac303cc</id>
<content type='text'>
* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache: (31 commits)
  FS-Cache: Provide nop fscache_stat_d() if CONFIG_FSCACHE_STATS=n
  SLOW_WORK: Fix GFS2 to #include &lt;linux/module.h&gt; before using THIS_MODULE
  SLOW_WORK: Fix CIFS to pass THIS_MODULE to slow_work_register_user()
  CacheFiles: Don't log lookup/create failing with ENOBUFS
  CacheFiles: Catch an overly long wait for an old active object
  CacheFiles: Better showing of debugging information in active object problems
  CacheFiles: Mark parent directory locks as I_MUTEX_PARENT to keep lockdep happy
  CacheFiles: Handle truncate unlocking the page we're reading
  CacheFiles: Don't write a full page if there's only a partial page to cache
  FS-Cache: Actually requeue an object when requested
  FS-Cache: Start processing an object's operations on that object's death
  FS-Cache: Make sure FSCACHE_COOKIE_LOOKING_UP cleared on lookup failure
  FS-Cache: Add a retirement stat counter
  FS-Cache: Handle pages pending storage that get evicted under OOM conditions
  FS-Cache: Handle read request vs lookup, creation or other cache failure
  FS-Cache: Don't delete pending pages from the page-store tracking tree
  FS-Cache: Fix lock misorder in fscache_write_op()
  FS-Cache: The object-available state can't rely on the cookie to be available
  FS-Cache: Permit cache retrieval ops to be interrupted in the initial wait phase
  FS-Cache: Use radix tree preload correctly in tracking of pages to be stored
  ...
</content>
</entry>
<entry>
<title>FS-Cache: Handle pages pending storage that get evicted under OOM conditions</title>
<updated>2009-11-19T18:11:35+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2009-11-19T18:11:35+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt.git/commit/?id=201a15428bd54f83eccec8b7c64a04b8f9431204'/>
<id>201a15428bd54f83eccec8b7c64a04b8f9431204</id>
<content type='text'>
Handle netfs pages that the vmscan algorithm wants to evict from the pagecache
under OOM conditions, but that are waiting for write to the cache.  Under these
conditions, vmscan calls the releasepage() function of the netfs, asking if a
page can be discarded.

The problem is typified by the following trace of a stuck process:

	kslowd005     D 0000000000000000     0  4253      2 0x00000080
	 ffff88001b14f370 0000000000000046 ffff880020d0d000 0000000000000007
	 0000000000000006 0000000000000001 ffff88001b14ffd8 ffff880020d0d2a8
	 000000000000ddf0 00000000000118c0 00000000000118c0 ffff880020d0d2a8
	Call Trace:
	 [&lt;ffffffffa00782d8&gt;] __fscache_wait_on_page_write+0x8b/0xa7 [fscache]
	 [&lt;ffffffff8104c0f1&gt;] ? autoremove_wake_function+0x0/0x34
	 [&lt;ffffffffa0078240&gt;] ? __fscache_check_page_write+0x63/0x70 [fscache]
	 [&lt;ffffffffa00b671d&gt;] nfs_fscache_release_page+0x4e/0xc4 [nfs]
	 [&lt;ffffffffa00927f0&gt;] nfs_release_page+0x3c/0x41 [nfs]
	 [&lt;ffffffff810885d3&gt;] try_to_release_page+0x32/0x3b
	 [&lt;ffffffff81093203&gt;] shrink_page_list+0x316/0x4ac
	 [&lt;ffffffff8109372b&gt;] shrink_inactive_list+0x392/0x67c
	 [&lt;ffffffff813532fa&gt;] ? __mutex_unlock_slowpath+0x100/0x10b
	 [&lt;ffffffff81058df0&gt;] ? trace_hardirqs_on_caller+0x10c/0x130
	 [&lt;ffffffff8135330e&gt;] ? mutex_unlock+0x9/0xb
	 [&lt;ffffffff81093aa2&gt;] shrink_list+0x8d/0x8f
	 [&lt;ffffffff81093d1c&gt;] shrink_zone+0x278/0x33c
	 [&lt;ffffffff81052d6c&gt;] ? ktime_get_ts+0xad/0xba
	 [&lt;ffffffff81094b13&gt;] try_to_free_pages+0x22e/0x392
	 [&lt;ffffffff81091e24&gt;] ? isolate_pages_global+0x0/0x212
	 [&lt;ffffffff8108e743&gt;] __alloc_pages_nodemask+0x3dc/0x5cf
	 [&lt;ffffffff81089529&gt;] grab_cache_page_write_begin+0x65/0xaa
	 [&lt;ffffffff8110f8c0&gt;] ext3_write_begin+0x78/0x1eb
	 [&lt;ffffffff81089ec5&gt;] generic_file_buffered_write+0x109/0x28c
	 [&lt;ffffffff8103cb69&gt;] ? current_fs_time+0x22/0x29
	 [&lt;ffffffff8108a509&gt;] __generic_file_aio_write+0x350/0x385
	 [&lt;ffffffff8108a588&gt;] ? generic_file_aio_write+0x4a/0xae
	 [&lt;ffffffff8108a59e&gt;] generic_file_aio_write+0x60/0xae
	 [&lt;ffffffff810b2e82&gt;] do_sync_write+0xe3/0x120
	 [&lt;ffffffff8104c0f1&gt;] ? autoremove_wake_function+0x0/0x34
	 [&lt;ffffffff810b18e1&gt;] ? __dentry_open+0x1a5/0x2b8
	 [&lt;ffffffff810b1a76&gt;] ? dentry_open+0x82/0x89
	 [&lt;ffffffffa00e693c&gt;] cachefiles_write_page+0x298/0x335 [cachefiles]
	 [&lt;ffffffffa0077147&gt;] fscache_write_op+0x178/0x2c2 [fscache]
	 [&lt;ffffffffa0075656&gt;] fscache_op_execute+0x7a/0xd1 [fscache]
	 [&lt;ffffffff81082093&gt;] slow_work_execute+0x18f/0x2d1
	 [&lt;ffffffff8108239a&gt;] slow_work_thread+0x1c5/0x308
	 [&lt;ffffffff8104c0f1&gt;] ? autoremove_wake_function+0x0/0x34
	 [&lt;ffffffff810821d5&gt;] ? slow_work_thread+0x0/0x308
	 [&lt;ffffffff8104be91&gt;] kthread+0x7a/0x82
	 [&lt;ffffffff8100beda&gt;] child_rip+0xa/0x20
	 [&lt;ffffffff8100b87c&gt;] ? restore_args+0x0/0x30
	 [&lt;ffffffff8102ef83&gt;] ? tg_shares_up+0x171/0x227
	 [&lt;ffffffff8104be17&gt;] ? kthread+0x0/0x82
	 [&lt;ffffffff8100bed0&gt;] ? child_rip+0x0/0x20

In the above backtrace, the following is happening:

 (1) A page storage operation is being executed by a slow-work thread
     (fscache_write_op()).

 (2) FS-Cache farms the operation out to the cache to perform
     (cachefiles_write_page()).

 (3) CacheFiles is then calling Ext3 to perform the actual write, using Ext3's
     standard write (do_sync_write()) under KERNEL_DS directly from the netfs
     page.

 (4) However, for Ext3 to perform the write, it must allocate some memory, in
     particular, it must allocate at least one page cache page into which it
     can copy the data from the netfs page.

 (5) Under OOM conditions, the memory allocator can't immediately come up with
     a page, so it uses vmscan to find something to discard
     (try_to_free_pages()).

 (6) vmscan finds a clean netfs page it might be able to discard (possibly the
     one it's trying to write out).

 (7) The netfs is called to throw the page away (nfs_release_page()) - but it's
     called with __GFP_WAIT, so the netfs decides to wait for the store to
     complete (__fscache_wait_on_page_write()).

 (8) This blocks a slow-work processing thread - possibly against itself.

The system ends up stuck because it can't write out any netfs pages to the
cache without allocating more memory.

To avoid this, we make FS-Cache cancel some writes that aren't in the middle of
actually being performed.  This means that some data won't make it into the
cache this time.  To support this, a new FS-Cache function is added
fscache_maybe_release_page() that replaces what the netfs releasepage()
functions used to do with respect to the cache.

The decisions fscache_maybe_release_page() makes are counted and displayed
through /proc/fs/fscache/stats on a line labelled "VmScan".  There are four
counters provided: "nos=N" - pages that weren't pending storage; "gon=N" -
pages that were pending storage when we first looked, but weren't by the time
we got the object lock; "bsy=N" - pages that we ignored as they were actively
being written when we looked; and "can=N" - pages that we cancelled the storage
of.

What I'd really like to do is alter the behaviour of the cancellation
heuristics, depending on how necessary it is to expel pages.  If there are
plenty of other pages that aren't waiting to be written to the cache that
could be ejected first, then it would be nice to hold up on immediate
cancellation of cache writes - but I don't see a way of doing that.

Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
</content>
</entry>
<entry>
<title>NFSv4: Fix a cache validation bug which causes getcwd() to return ENOENT</title>
<updated>2009-11-11T07:15:42+00:00</updated>
<author>
<name>Trond Myklebust</name>
<email>Trond.Myklebust@netapp.com</email>
</author>
<published>2009-11-11T07:15:42+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt.git/commit/?id=96d25e532234bec1a1989e6e1baf702d43a78b0d'/>
<id>96d25e532234bec1a1989e6e1baf702d43a78b0d</id>
<content type='text'>
Changeset a65318bf3afc93ce49227e849d213799b072c5fd (NFSv4: Simplify some
cache consistency post-op GETATTRs) incorrectly changed the getattr
bitmap for readdir().
This causes the readdir() function to fail to return a
fileid/inode number, which again exposed a bug in the NFS readdir code that
causes spurious ENOENT errors to appear in applications (see
http://bugzilla.kernel.org/show_bug.cgi?id=14541).

The immediate band-aid is to revert the incorrect bitmap change, but in the
longer term, we should change the NFS readdir code to cope with the
fact that NFSv4 servers are not required to support fileids/inode numbers.

Reported-by: Daniel J Blueman &lt;daniel.blueman@gmail.com&gt;
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</content>
</entry>
<entry>
<title>NFSv4: The link() operation should return any delegation on the file</title>
<updated>2009-10-26T12:09:46+00:00</updated>
<author>
<name>Trond Myklebust</name>
<email>Trond.Myklebust@netapp.com</email>
</author>
<published>2009-10-26T12:09:46+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt.git/commit/?id=9a3936aac133037f65124fcb2d676a6c201a90a4'/>
<id>9a3936aac133037f65124fcb2d676a6c201a90a4</id>
<content type='text'>
Otherwise, we have to wait for the server to recall it.

Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</content>
</entry>
<entry>
<title>NFSv4: Fix two unbalanced put_rpccred() issues.</title>
<updated>2009-10-26T12:09:46+00:00</updated>
<author>
<name>Trond Myklebust</name>
<email>Trond.Myklebust@netapp.com</email>
</author>
<published>2009-10-26T12:09:46+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt.git/commit/?id=141aeb9f26f9f12f1584c128ce8697cdffb046e7'/>
<id>141aeb9f26f9f12f1584c128ce8697cdffb046e7</id>
<content type='text'>
Commits 29fba38b (nfs41: lease renewal) and fc01cea9 (nfs41: sequence
operation) introduce a couple of put_rpccred() calls on credentials for
which there is no corresponding get_rpccred().

See http://bugzilla.kernel.org/show_bug.cgi?id=14249

Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</content>
</entry>
<entry>
<title>NFSv4: Fix a bug when the server returns NFS4ERR_RESOURCE</title>
<updated>2009-10-23T18:46:42+00:00</updated>
<author>
<name>Trond Myklebust</name>
<email>Trond.Myklebust@netapp.com</email>
</author>
<published>2009-10-23T18:46:42+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt.git/commit/?id=52567b03ca38b6e556ced450d64dba8d66e23b0e'/>
<id>52567b03ca38b6e556ced450d64dba8d66e23b0e</id>
<content type='text'>
RFC 3530 states that when we receive the error NFS4ERR_RESOURCE, we are not
supposed to bump the sequence number on OPEN, LOCK, LOCKU, CLOSE, etc
operations. The problem is that we map that error into EREMOTEIO in the XDR
layer, and so the NFSv4 middle-layer routines like seqid_mutating_err(),
and nfs_increment_seqid() don't recognise it.

The fix is to defer the mapping until after the middle layers have
processed the error.

Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</content>
</entry>
<entry>
<title>nfs: Panic when commit fails</title>
<updated>2009-10-23T18:16:30+00:00</updated>
<author>
<name>Terry Loftin</name>
<email>terry.loftin@hp.com</email>
</author>
<published>2009-10-23T01:36:01+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt.git/commit/?id=a8b40bc7e635831b61c43acc71a86d3a68b2dff0'/>
<id>a8b40bc7e635831b61c43acc71a86d3a68b2dff0</id>
<content type='text'>
Actually pass the NFS_FILE_SYNC option to the server to avoid a
panic in nfs_direct_write_complete() when a commit fails.

At the end of an nfs write, if the nfs commit fails, all the writes
will be rescheduled.  They are supposed to be rescheduled as NFS_FILE_SYNC
writes, but the rpc_task structure is not completely initialized and so
the option is not passed.  When the rescheduled writes complete, the
return indicates that they are NFS_UNSTABLE and we try to do another
commit.  This leads to a panic because the commit data structure pointer
was set to null in the initial (failed) commit attempt.

Signed-off-by: Terry Loftin &lt;terry.loftin@hp.com&gt;
Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</content>
</entry>
<entry>
<title>nfs: Fix nfs_parse_mount_options() kfree() leak</title>
<updated>2009-10-21T23:15:23+00:00</updated>
<author>
<name>Yinghai Lu</name>
<email>yinghai@kernel.org</email>
</author>
<published>2009-10-20T05:13:46+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt.git/commit/?id=4223a4a155f245d41c350ed9eba4fc32e965c4da'/>
<id>4223a4a155f245d41c350ed9eba4fc32e965c4da</id>
<content type='text'>
Fix a (small) memory leak in one of the error paths of the NFS mount
options parsing code.

Regression introduced in 2.6.30 by commit a67d18f (NFS: load the
rpc/rdma transport module automatically).

Reported-by: Yinghai Lu &lt;yinghai@kernel.org&gt;
Reported-by: Pekka Enberg &lt;penberg@cs.helsinki.fi&gt;
Signed-off-by: Ingo Molnar &lt;mingo@elte.hu&gt;
Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>NFS: suppress a build warning</title>
<updated>2009-10-12T17:25:12+00:00</updated>
<author>
<name>Stefan Richter</name>
<email>stefanr@s5r6.in-berlin.de</email>
</author>
<published>2009-10-12T15:26:12+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt.git/commit/?id=a1be9eee2996fd9969625e7b5e2f2bc2032fd3cb'/>
<id>a1be9eee2996fd9969625e7b5e2f2bc2032fd3cb</id>
<content type='text'>
struct sockaddr_storage * can safely be used as struct sockaddr *.
Suppress an "incompatible pointer type" warning.

Signed-off-by: Stefan Richter &lt;stefanr@s5r6.in-berlin.de&gt;
Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
</entry>
<entry>
<title>NFSv4: Kill nfs4_renewd_prepare_shutdown()</title>
<updated>2009-10-08T15:50:55+00:00</updated>
<author>
<name>Trond Myklebust</name>
<email>Trond.Myklebust@netapp.com</email>
</author>
<published>2009-10-08T15:50:55+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt.git/commit/?id=3050141bae57984dd660e6861632ccf9b8bca77e'/>
<id>3050141bae57984dd660e6861632ccf9b8bca77e</id>
<content type='text'>
The NFSv4 renew daemon is shared between all active super blocks that refer
to a particular NFS server, so it is wrong to be shutting it down in
nfs4_kill_super every time a super block is destroyed.

This patch therefore kills nfs4_renewd_prepare_shutdown altogether, and
leaves it up to nfs4_shutdown_client() to also shut down the renew daemon
by means of the existing call to nfs4_kill_renewd().

Signed-off-by: Trond Myklebust &lt;Trond.Myklebust@netapp.com&gt;
</content>
</entry>
</feed>
