aboutsummaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAge
* fs/ncpfs/getopt.c: cleanup keneldocQinghuang Feng2009-01-06
| | | | | | | | | | There are no argument named @flag in ncp_getopt(), remove it. Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Petr Vandrovec <VANDROVE@vc.cvut.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/binfmt_misc.c: add terminating newline to /proc/sys/fs/binfmt_misc/statusQinghuang Feng2009-01-06
| | | | | | | | | | | | The following is what it looks like before patching. It is not much readable. user@ubuntu:/proc/sys/fs/binfmt_misc$ cat status enableduser@ubuntu:/proc/sys/fs/binfmt_misc$ Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs: fix function param name in kernel-docRandy Dunlap2009-01-06
| | | | | | | | | | | Fix function parameter name in kernel-doc: Warning(linux-2.6.28-git5//fs/block_dev.c:1272): No description found for parameter 'pathname' Warning(linux-2.6.28-git5//fs/block_dev.c:1272): Excess function parameter 'path' description in 'lookup_bdev' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/inode: fix kernel-doc notationRandy Dunlap2009-01-06
| | | | | | | | | | | | | | Fix kernel-doc notation: Warning(linux-2.6.28-git3//fs/inode.c:120): No description found for parameter 'sb' Warning(linux-2.6.28-git3//fs/inode.c:120): No description found for parameter 'inode' Warning(linux-2.6.28-git3//fs/inode.c:588): No description found for parameter 'sb' Warning(linux-2.6.28-git3//fs/inode.c:588): No description found for parameter 'inode' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* do_coredump(): check return from argv_split()Tetsuo Handa2009-01-06
| | | | | | | | | do_coredump() accesses helper_argv[0] without checking helper_argv != NULL. This can happen if page allocation failed. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* add missing accounting calls to compat_sys_{readv,writev}Gerd Hoffmann2009-01-06
| | | | | | | | Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Cc: Jay Lan <jlan@engr.sgi.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs: fix name overwrite in __register_chrdev_region()Cyrill Gorcunov2009-01-06
| | | | | | | | | | | | It's possible to register a chrdev with a name size exactly the same as was allocated in structure. It seems it was not intended behaviour. At least chrdev_show does not like it. Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* percpu_counter: FBC_BATCH should be a variableEric Dumazet2009-01-06
| | | | | | | | | | | | | | | | | | | | | For NR_CPUS >= 16 values, FBC_BATCH is 2*NR_CPUS Considering more and more distros are using high NR_CPUS values, it makes sense to use a more sensible value for FBC_BATCH, and get rid of NR_CPUS. A sensible value is 2*num_online_cpus(), with a minimum value of 32 (This minimum value helps branch prediction in __percpu_counter_add()) We already have a hotcpu notifier, so we can adjust FBC_BATCH dynamically. We rename FBC_BATCH to percpu_counter_batch since its not a constant anymore. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* poll: allow f_op->poll to sleepTejun Heo2009-01-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | f_op->poll is the only vfs operation which is not allowed to sleep. It's because poll and select implementation used task state to synchronize against wake ups, which doesn't have to be the case anymore as wait/wake interface can now use custom wake up functions. The non-sleep restriction can be a bit tricky because ->poll is not called from an atomic context and the result of accidentally sleeping in ->poll only shows up as temporary busy looping when the timing is right or rather wrong. This patch converts poll/select to use custom wake up function and use separate triggered variable to synchronize against wake up events. The only added overhead is an extra function call during wake up and negligible. This patch removes the one non-sleep exception from vfs locking rules and is beneficial to userland filesystem implementations like FUSE, 9p or peculiar fs like spufs as it's very difficult for those to implement non-sleeping poll method. While at it, make the following cosmetic changes to make poll.h and select.c checkpatch friendly. * s/type * symbol/type *symbol/ : three places in poll.h * remove blank line before EXPORT_SYMBOL() : two places in select.c Oleg: spotted missing barrier in poll_schedule_timeout() Davide: spotted missing write barrier in pollwake() Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Ron Minnich <rminnich@sandia.gov> Cc: Ingo Molnar <mingo@elte.hu> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Brad Boyer <flar@allandria.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Roland McGrath <roland@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs: use menuconfig to control the Misc. filesystems menuRandy Dunlap2009-01-06
| | | | | | | | | Have one option to control Miscellaneous filesystems. This makes it easy to disable all of them at one time. Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/exec.c:__bprm_mm_init(): clean up error handlingLuiz Fernando N. Capitulino2009-01-06
| | | | | | | | | Untangle the error unwinding in this function, saving a test of local variable `vma'. Signed-off-by: Luiz Fernando N. Capitulino <lcapitulino@mandriva.com.br> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs: sys_sync fixNick Piggin2009-01-06
| | | | | | | | | | | | | | s_syncing livelock avoidance was breaking data integrity guarantee of sys_sync, by allowing sys_sync to skip writing or waiting for superblocks if there is a concurrent sys_sync happening. This livelock avoidance is much less important now that we don't have the get_super_to_sync() call after every sb that we sync. This was replaced by __put_super_and_need_restart. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs: sync_sb_inodes fixNick Piggin2009-01-06
| | | | | | | | | | Fix data integrity semantics required by sys_sync, by iterating over all inodes and waiting for any writeback pages after the initial writeout. Comments explain the exact problem. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs: remove WB_SYNC_HOLDNick Piggin2009-01-06
| | | | | | | | | | | | | | | | | | | | | | | | | | Remove WB_SYNC_HOLD. The primary motiviation is the design of my anti-starvation code for fsync. It requires taking an inode lock over the sync operation, so we could run into lock ordering problems with multiple inodes. It is possible to take a single global lock to solve the ordering problem, but then that would prevent a future nice implementation of "sync multiple inodes" based on lock order via inode address. Seems like a backward step to remove this, but actually it is busted anyway: we can't use the inode lists for data integrity wait: an inode can be taken off the dirty lists but still be under writeback. In order to satisfy data integrity semantics, we should wait for it to finish writeback, but if we only search the dirty lists, we'll miss it. It would be possible to have a "writeback" list, for sys_sync, I suppose. But why complicate things by prematurely optimise? For unmounting, we could avoid the "livelock avoidance" code, which would be easier, but again premature IMO. Fixing the existing data integrity problem will come next. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* UBIFS: do not use WB_SYNC_HOLDArtem Bityutskiy2009-01-06
| | | | | | | | | | | | | | | | | | WB_SYNC_HOLD is going to be zapped so we should not use it. Use %WB_SYNC_NONE instead. Here is what akpm said: "I think I'll just switch that to WB_SYNC_NONE. The `wait==0' mode is just an advisory thing to help the fs shove lots of data into the queues. If some gets missed then it'll be picked up on the second ->sync_fs call, with wait==1." Thanks to Randy Dunlap for catching this. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* block_write_begin(): remove useless gotoFranck Bui-Huu2009-01-06
| | | | | | Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* hugetlb: unsigned ret cannot be negativeRoel Kluin2009-01-06
| | | | | | | | | | | | | | unsigned long ret cannot be negative, but ret can get -EFAULT. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Cc: Hugh Dickins <hugh@veritas.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Adam Litke <agl@us.ibm.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Ken Chen <kenchen@google.com> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs: truncate blocks outside i_size after O_DIRECT write errorDmitri Monakhov2009-01-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case of error extending write may have instantiated a few blocks outside i_size. We need to trim these blocks. We have to do it *regardless* to blocksize. At least ext2, ext3 and reiserfs interpret (i_size < biggest block) condition as error. Fsck will complain about wrong i_size. Then fsck will fix the error by changing i_size according to the biggest block. This is bad because this blocks contain garbage from previous write attempt. And result in data corruption. ####TESTCASE_BEGIN $touch /mnt/test/BIG_FILE ## at this moment /mnt/test/BIG_FILE size and blocks equal to zero open("/mnt/test/BIG_FILE", O_WRONLY|O_CREAT|O_DIRECT, 0666) = 3 write(3, "aaaaaaaaaaaa"..., 104857600) = -1 ENOSPC (No space left on device) ## size and block sould't be changed because write op failed. $stat /mnt/test/BIG_FILE File: `/mnt/test/BIG_FILE' Size: 0 Blocks: 110896 IO Block: 1024 regular empty file <<<<<<<<^^^^^^^^^^^^^^^^^^^^^^^^^^^^^file size is less than biggest block idx Device: fe07h/65031d Inode: 14 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2007-01-24 20:03:38.000000000 +0300 Modify: 2007-01-24 20:03:38.000000000 +0300 Change: 2007-01-24 20:03:39.000000000 +0300 #fsck.ext3 -f /dev/VG/test e2fsck 1.39 (29-May-2006) Pass 1: Checking inodes, blocks, and sizes Inode 14, i_size is 0, should be 56556544. Fix<y>? yes Pass 2: Checking directory structure .... #####TESTCASE_ENDdiff --git a/fs/direct-io.c b/fs/direct-io.c index af0558d..4e88bea 100644 [akpm@linux-foundation.org: use i_size_read()] Signed-off-by: Dmitri Monakhov <dmonakhov@openvz.org> Cc: Zach Brown <zach.brown@oracle.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Chris Mason <chris.mason@oracle.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: remove GFP_HIGHUSER_PAGECACHEHugh Dickins2009-01-06
| | | | | | | | | | | | | | | GFP_HIGHUSER_PAGECACHE is just an alias for GFP_HIGHUSER_MOVABLE, making that harder to track down: remove it, and its out-of-work brothers GFP_NOFS_PAGECACHE and GFP_USER_PAGECACHE. Since we're making that improvement to hotremove_migrate_alloc(), I think we can now also remove one of the "o"s from its comment. Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* do_mpage_readpage(): remove useless clear_buffer_mapped() callFranck Bui-Huu2009-01-06
| | | | | | | | It is known that buffer_mapped() is false in this code path. Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: do_sync_mapping_range integrity fixNick Piggin2009-01-06
| | | | | | | | | | | | | | Chris Mason notices do_sync_mapping_range didn't actually ask for data integrity writeout. Unfortunately, it is advertised as being usable for data integrity operations. This is a data integrity bug. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Chris Mason <chris.mason@oracle.com> Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* do_mpage_readpage(): don't submit lots of small bios on boundaryMiquel van Smoorenburg2009-01-06
| | | | | | | | | | | | | | | | | | | | | | While tracing I/O patterns with blktrace (a great tool) a few weeks ago I identified a minor issue in fs/mpage.c As the comment above mpage_readpages() says, a fs's get_block function will set BH_Boundary when it maps a block just before a block for which extra I/O is required. Since get_block() can map a range of pages, for all these pages the BH_Boundary flag will be set. But we only need to push what I/O we have accumulated at the last block of this range. This makes do_mpage_readpage() send out the largest possible bio instead of a bunch of page-sized ones in the BH_Boundary case. Signed-off-by: Miquel van Smoorenburg <mikevs@xs4all.net> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: report the MMU pagesize in /proc/pid/smapsMel Gorman2009-01-06
| | | | | | | | | | | | | | | The KernelPageSize entry in /proc/pid/smaps is the pagesize used by the kernel to back a VMA. This matches the size used by the MMU in the majority of cases. However, one counter-example occurs on PPC64 kernels whereby a kernel using 64K as a base pagesize may still use 4K pages for the MMU on older processor. To distinguish, this patch reports MMUPageSize as the pagesize used by the MMU in /proc/pid/smaps. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Cc: "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: report the pagesize backing a VMA in /proc/pid/smapsMel Gorman2009-01-06
| | | | | | | | | | | | | | | It is useful to verify a hugepage-aware application is using the expected pagesizes for its memory regions. This patch creates an entry called KernelPageSize in /proc/pid/smaps that is the size of page used by the kernel to back a VMA. The entry is not called PageSize as it is possible the MMU uses a different size. This extension should not break any sensible parser that skips lines containing unrecognised information. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: "KOSAKI Motohiro" <kosaki.motohiro@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for-linus' of ↵Linus Torvalds2009-01-05
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm: dlm: fs/dlm/ast.c: fix warning dlm: add new debugfs entry dlm: add time stamp of blocking callback dlm: change lock time stamping dlm: improve how bast mode handling dlm: remove extra blocking callback check dlm: replace schedule with cond_resched dlm: remove kmap/kunmap dlm: trivial annotation of be16 value dlm: fix up memory allocation flags
| * dlm: fs/dlm/ast.c: fix warningAndrew Morton2008-12-23
| | | | | | | | | | | | | | | | | | | | fs/dlm/ast.c: In function 'dlm_astd': fs/dlm/ast.c:64: warning: 'bastmode' may be used uninitialized in this function Cleans code up. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Teigland <teigland@redhat.com>
| * dlm: add new debugfs entryDavid Teigland2008-12-23
| | | | | | | | | | | | | | | | | | The new debugfs entry dumps all rsb and lkb structures, and includes a lot more information than has been available before. This includes the new timestamps added by a previous patch for debugging callback issues. Signed-off-by: David Teigland <teigland@redhat.com>
| * dlm: add time stamp of blocking callbackDavid Teigland2008-12-23
| | | | | | | | | | | | | | | | Record the time the latest blocking callback was queued for a lock. This will be used for debugging in combination with lock queue timestamp changes in the previous patch. Signed-off-by: David Teigland <teigland@redhat.com>
| * dlm: change lock time stampingDavid Teigland2008-12-23
| | | | | | | | | | | | | | | | | | Use ktime instead of jiffies for timestamping lkb's. Also stamp the time on every lkb whenever it's added to a resource queue, instead of just stamping locks subject to timeouts. This will allow us to use timestamps more widely for debugging all locks. Signed-off-by: David Teigland <teigland@redhat.com>
| * dlm: improve how bast mode handlingDavid Teigland2008-12-23
| | | | | | | | | | | | | | | | | | | | | | The lkb bastmode value is set in the context of processing the lock, and read by the dlm_astd thread. Because it's accessed in these two separate contexts, the writing/reading ought to be done under a lock. This is simple to do by setting it and reading it when the lkb is added to and removed from dlm_astd's callback list which is properly locked. Signed-off-by: David Teigland <teigland@redhat.com>
| * dlm: remove extra blocking callback checkDavid Teigland2008-12-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just before delivering a blocking callback (bast), the dlm_astd thread checks again that the granted mode of the lkb actually blocks the mode requested by the bast. The idea behind this was originally that the granted mode may have changed since the bast was queued, making the callback now unnecessary. Reasons for removing this extra check are: - dlm_astd doesn't lock the rsb before reading the lkb grmode, so it's not technically safe (this removes the long standing FIXME) - after running some tests, it doesn't appear the check ever actually eliminates a bast - delivering an unnecessary blocking callback isn't a bad thing and can happen anyway Signed-off-by: David Teigland <teigland@redhat.com>
| * dlm: replace schedule with cond_reschedSteven Whitehouse2008-12-23
| | | | | | | | | | | | | | | | | | | | This is a one-liner to use cond_resched() rather than schedule() in the ast delivery loop. It should not be necessary to schedule every time, so this will save some cpu time while continuing to allow scheduling when required. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
| * dlm: remove kmap/kunmapSteven Whitehouse2008-12-23
| | | | | | | | | | | | | | | | The pages used in lowcomms are not highmem, so kmap is not necessary. Cc: Christine Caulfield <ccaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
| * dlm: trivial annotation of be16 valueHarvey Harrison2008-12-23
| | | | | | | | | | | | | | | | | | fs/dlm/dir.c:419:14: warning: incorrect type in assignment (different base types) fs/dlm/dir.c:419:14: expected unsigned short [unsigned] [addressable] [assigned] [usertype] be_namelen fs/dlm/dir.c:419:14: got restricted __be16 [usertype] <noident> Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David Teigland <teigland@redhat.com>
| * dlm: fix up memory allocation flagsSteven Whitehouse2008-12-23
| | | | | | | | | | | | | | | | | | | | | | Use ls_allocation for memory allocations, which a cluster fs sets to GFP_NOFS. Use GFP_NOFS for allocations when no lockspace struct is available. Taking dlm locks needs to avoid calling back into the cluster fs because write-out can require taking dlm locks. Cc: Christine Caulfield <ccaulfie@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmwLinus Torvalds2009-01-05
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (27 commits) GFS2: Use DEFINE_SPINLOCK GFS2: Fix use-after-free bug on umount (try #2) Revert "GFS2: Fix use-after-free bug on umount" GFS2: Streamline alloc calculations for writes GFS2: Send useful information with uevent messages GFS2: Fix use-after-free bug on umount GFS2: Remove ancient, unused code GFS2: Move four functions from super.c GFS2: Fix bug in gfs2_lock_fs_check_clean() GFS2: Send some sensible sysfs stuff GFS2: Kill two daemons with one patch GFS2: Move gfs2_recoverd into recovery.c GFS2: Fix "truncate in progress" hang GFS2: Clean up & move gfs2_quotad GFS2: Add more detail to debugfs glock dumps GFS2: Banish struct gfs2_rgrpd_host GFS2: Move rg_free from gfs2_rgrpd_host to gfs2_rgrpd GFS2: Move rg_igeneration into struct gfs2_rgrpd GFS2: Banish struct gfs2_dinode_host GFS2: Move i_size from gfs2_dinode_host and rename it to i_disksize ...
| * | GFS2: Use DEFINE_SPINLOCKJulia Lawall2009-01-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SPIN_LOCK_UNLOCKED is deprecated. The following makes the change suggested in Documentation/spinlocks.txt The semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @@ declarer name DEFINE_SPINLOCK; identifier xxx_lock; @@ - spinlock_t xxx_lock = SPIN_LOCK_UNLOCKED; + DEFINE_SPINLOCK(xxx_lock); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Fix use-after-free bug on umount (try #2)Steven Whitehouse2009-01-05
| | | | | | | | | | | | | | | | | | This should solve the issue with the previous attempt at fixing this. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | Revert "GFS2: Fix use-after-free bug on umount"Steven Whitehouse2009-01-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 78802499912f1ba31ce83a94c55b5a980f250a43. The original patch is causing problems in relation to order of operations at umount in relation to jdata files. I need to fix this a different way. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Streamline alloc calculations for writesSteven Whitehouse2009-01-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes some unused code, and make the calculation of the number of blocks required conditional in order to reduce the number of times this (potentially expensive) calculation is done. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Send useful information with uevent messagesSteven Whitehouse2009-01-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to distinguish between two differing uevent messages and to avoid using the (racy) method of reading status from sysfs in future, this adds some status information to our uevent messages. Btw, before anybody says "sysfs isn't racy", I'm aware of that, but the way that GFS2 was using it (send an ambiugous uevent and then expect the receiver to read sysfs to find out the status of the reported operation) was. The additional benefit of using the new interface is that it should be possible for a node to recover multiple journals at the same time, since there is no longer any confusion as to which journal the status belongs to. At some future stage, when all the userland programs have been converted, I intend to remove the old interface. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Fix use-after-free bug on umountSteven Whitehouse2009-01-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There was a use-after-free with the GFS2 super block during umount. This patch moves almost all of the umount code from ->put_super into ->kill_sb, the only bit that cannot be moved being the glock hash clearing which has to remain as ->put_super due to umount ordering requirements. As a result its now obvious that the kfree is the final operation, whereas before it was hidden in ->put_super. Also gfs2_jindex_free is then only referenced from a single file so thats moved and marked static too. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Remove ancient, unused codeSteven Whitehouse2009-01-05
| | | | | | | | | | | | | | | | | | | | | Remove code that used to have something to do with initrd but has been unused for a long time. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Move four functions from super.cSteven Whitehouse2009-01-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The functions which are being moved can all be marked static in their new locations, since they only have a single caller each. Their new locations are more logical than before and some of the functions are small enough that the compiler might well inline them. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Fix bug in gfs2_lock_fs_check_clean()Steven Whitehouse2009-01-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | gfs2_lock_fs_check_clean() should not be calling gfs2_jindex_hold() since it doesn't work like rindex hold, despite the comment. That allows gfs2_jindex_hold() to be moved into ops_fstype.c where it can be made static. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Send some sensible sysfs stuffSteven Whitehouse2009-01-05
| | | | | | | | | | | | | | | | | | | | | We ought to inform the user of the locktable and lockproto for each uevent we generate. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Kill two daemons with one patchSteven Whitehouse2009-01-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes the two daemons, gfs2_scand and gfs2_glockd and replaces them with a shrinker which is called from the VM. The net result is that GFS2 responds better when there is memory pressure, since it shrinks the glock cache at the same rate as the VFS shrinks the dcache and icache. There are no longer any time based criteria for shrinking glocks, they are kept until such time as the VM asks for more memory and then we demote just as many glocks as required. There are potential future changes to this code, including the possibility of sorting the glocks which are to be written back into inode number order, to get a better I/O ordering. It would be very useful to have an elevator based workqueue implementation for this, as that would automatically deal with the read I/O cases at the same time. This patch is my answer to Andrew Morton's remark, made during the initial review of GFS2, asking why GFS2 needs so many kernel threads, the answer being that it doesn't :-) This patch is a net loss of about 200 lines of code. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Move gfs2_recoverd into recovery.cSteven Whitehouse2009-01-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By moving gfs2_recoverd, we can make an additional function static and it also leaves only (the already scheduled for removal) gfs2_glockd in daemon.c. At the same time the declaration of gfs2_quotad is moved to quota.h to reflect the new location of gfs2_quotad in a previous patch. Also the recovery.h and quota.h headers are cleaned up. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Fix "truncate in progress" hangSteven Whitehouse2009-01-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Following on from the recent clean up of gfs2_quotad, this patch moves the processing of "truncate in progress" inodes from the glock workqueue into gfs2_quotad. This fixes a hang due to the "truncate in progress" processing requiring glocks in order to complete. It might seem odd to use gfs2_quotad for this particular item, but we have to use a pre-existing thread since creating a thread implies a GFP_KERNEL memory allocation which is not allowed from the glock workqueue context. Of the existing threads, gfs2_logd and gfs2_recoverd may deadlock if used for this operation. gfs2_scand and gfs2_glockd are both scheduled for removal at some (hopefully not too distant) future point. That leaves only gfs2_quotad whose workload is generally fairly light and is easily adapted for this extra task. Also, as a result of this change, it opens the way for a future patch to make the reading of the inode's information asynchronous with respect to the glock workqueue, which is another improvement that has been on the list for some time now. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Clean up & move gfs2_quotadSteven Whitehouse2009-01-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is a clean up of gfs2_quotad prior to giving it an extra job to do in addition to the current portfolio of updating the quota and statfs information from time to time. As a result it has been moved into quota.c allowing one of the functions it calls to be made static. Also the clean up allows the two existing functions to have separate timeouts and also to coexist with its future role of dealing with the "truncate in progress" inode flag. The (pointless) setting of gfs2_quotad_secs is removed since we arrange to only wake up quotad when one of the two timers expires. In addition the struct gfs2_quota_data is moved into a slab cache, mainly for easier debugging. It should also be possible to use a shrinker in the future, rather than the current scheme of scanning the quota data entries from time to time. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>