aboutsummaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAge
* [PATCH] uclinux: fix proc_task()/get_proc-task() namingGreg Ungerer2006-07-04
| | | | | | | Fix changed name of proc_task() to get_proc_task(). Signed-off-by: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Merge git://git.infradead.org/mtd-2.6Linus Torvalds2006-07-04
|\ | | | | | | | | | | | | | | | | | | | | * git://git.infradead.org/mtd-2.6: [JFFS2][XATTR] Fix memory leak in POSIX-ACL support fs/jffs2/: make 2 functions static [MTD] NAND: Fix broken sharpsl driver [JFFS2][XATTR] Fix xd->refcnt race condition MTD: kernel-doc fixes + additions MTD: fix all kernel-doc warnings [MTD] DOC: Fixup read functions and do a little cleanup
| * [JFFS2][XATTR] Fix memory leak in POSIX-ACL supportKaiGai Kohei2006-07-02
| | | | | | | | | | | | | | | | | | | | | | | | jffs2_clear_acl() which releases acl caches allocated by kmalloc() was defined but it was never called. Thus, we faced to the risk of memory leaking. This patch plugs jffs2_clear_acl() into jffs2_do_clear_inode(). It ensures to release acl cache when inode is cleared. Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org>
| * fs/jffs2/: make 2 functions staticAdrian Bunk2006-06-29
| | | | | | | | | | | | | | This patch makes two needlessly global functions static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David Woodhouse <dwmw2@infradead.org>
| * [JFFS2][XATTR] Fix xd->refcnt race conditionKaiGai Kohei2006-06-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When xd->refcnt is checked whether this xdatum should be released or not, atomic_dec_and_lock() is used to ensure holding the c->erase_completion_lock. This fix change a specification of delete_xattr_datum(). Previously, it's only called when xd->refcnt equals zero. (calling it with positive xd->refcnt cause a BUG()) If you applied this patch, the function checks whether xd->refcnt is zero or not under the spinlock if necessary. Then, it marks xd DEAD flahs and links with xattr_dead_list or releases it immediately when xd->refcnt become zero. Signed-off-by: KaiGai Kohei <kaigai@ak.jp.nec.com> Signed-off-by: David Woodhouse <dwmw2@infradead.org>
* | [PATCH] sched: cleanup, remove task_t, convert to struct task_structIngo Molnar2006-07-03
| | | | | | | | | | | | | | | | | | | | | | | | cleanup: remove task_t and convert all the uses to struct task_struct. I introduced it for the scheduler anno and it was a mistake. Conversion was mostly scripted, the result was reviewed and all secondary whitespace and style impact (if any) was fixed up by hand. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] lockdep: annotate blkdev nestingIngo Molnar2006-07-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Teach special (recursive) locking code to the lock validator. Effects on non-lockdep kernels: - the introduction of the following function variants: extern struct block_device *open_partition_by_devnum(dev_t, unsigned); extern int blkdev_put_partition(struct block_device *); static int blkdev_get_whole(struct block_device *bdev, mode_t mode, unsigned flags); which on non-lockdep are the same as open_by_devnum(), blkdev_put() and blkdev_get(). - a subclass parameter to do_open(). [unused on non-lockdep] - a subclass parameter to __blkdev_put(), which is a new internal function for the main blkdev_put*() functions. [parameter unused on non-lockdep kernels, except for two sanity check WARN_ON()s] these functions carry no semantical difference - they only express object dependencies towards the lockdep subsystem. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Neil Brown <neilb@cse.unsw.edu.au> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] lockdep: annotate sb ->s_umountArjan van de Ven2006-07-03
| | | | | | | | | | | | | | | | | | | | | | | | | | The s_umount rwsem needs to be classified as per-superblock since it's perfectly legit to keep multiple of those recursively in the VFS locking rules. Has no effect on non-lockdep kernels. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] lockdep: annotate ->s_lockIngo Molnar2006-07-03
| | | | | | | | | | | | | | | | | | | | | | Teach special (per-filesystem) locking code to the lock validator. Minimal effect on non-lockdep kernels: one extra parameter to alloc_super(). Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] lockdep: annotate the quota codeArjan van de Ven2006-07-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The quota code plays interesting games with the lock ordering; to quote Jan: | i_mutex of inode containing quota file is acquired after all other | quota locks. i_mutex of all other inodes is acquired before quota | locks. Quota code makes sure (by resetting inode operations and | setting special flag on inode) that noone tries to enter quota code | while holding i_mutex on a quota file... The good news is that all of this special case i_mutex grabbing happens in the (per filesystem) low level quota write function. For this special case we need a new I_MUTEX_* nesting level, since this just entirely outside any of the regular VFS locking rules for i_mutex. I trust Jan on his blue eyes that this is not ever going to deadlock; and based on that the patch below is what it takes to inform lockdep of these very interesting new locking rules. The new locking rule for the I_MUTEX_QUOTA nesting level is that this is the deepest possible level of nesting for i_mutex, and that this only should be used in quota write (and possibly read) function of filesystems. This makes the lock ordering of the I_MUTEX_* levels: I_MUTEX_PARENT -> I_MUTEX_CHILD -> I_MUTEX_NORMAL -> I_MUTEX_QUOTA Has no effect on non-lockdep kernels. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Jan Kara <jack@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] lockdep: annotate NTFS locking rulesIngo Molnar2006-07-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NTFS uses lots of type-opaque objects which acquire their true identity runtime - so the lock validator needs to be helped in a couple of places to figure out object types. Many thanks to Anton Altaparmakov for giving lots of explanations about NTFS locking rules. Has no effect on non-lockdep kernels. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Anton Altaparmakov <aia21@cantab.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] lockdep: annotate i_mutexIngo Molnar2006-07-03
| | | | | | | | | | | | | | | | | | | | Teach special (recursive) locking code to the lock validator. Has no effect on non-lockdep kernels. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] lockdep: annotate dcacheIngo Molnar2006-07-03
| | | | | | | | | | | | | | | | | | | | Teach special (recursive) locking code to the lock validator. Has no effect on non-lockdep kernels. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] lockdep: annotate direct ioIngo Molnar2006-07-03
| | | | | | | | | | | | | | | | | | | | | | Teach special (rwsem-in-irq) locking code to the lock validator. Has no effect on non-lockdep kernels. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] lockdep: locking init debugging improvementIngo Molnar2006-07-03
| | | | | | | | | | | | | | | | | | | | | | | | Locking init improvement: - introduce and use __SPIN_LOCK_UNLOCKED for array initializations, to pass in the name string of locks, used by debugging Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] binfmt_elf: fix checks for bad addressChuck Ebbert2006-07-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix check for bad address; use macro instead of open-coding two checks. Taken from RHEL4 kernel update. From: Ernie Petrides <petrides@redhat.com> For background, the BAD_ADDR() macro should return TRUE if the address is TASK_SIZE, because that's the lowest address that is *not* valid for user-space mappings. The macro was correct in binfmt_aout.c but was wrong for the "equal to" case in binfmt_elf.c. There were two in-line validations of user-space addresses in binfmt_elf.c, which have been appropriately converted to use the corrected BAD_ADDR() macro in the patch you posted yesterday. Note that the size checks against TASK_SIZE are okay as coded. The additional changes that I propose are below. These are in the error paths for bad ELF entry addresses once load_elf_binary() has already committed to exec'ing the new image (following the tearing down of the task's original address space). The 1st hunk deals with the interp-side of the outer "if". There were two problems here. The printk() should be removed because this path can be triggered at will by a bogus interpreter image created and used by a malicious user. Further, the error code should not be ENOEXEC, because that causes the loop in search_binary_handler() to continue trying other exec handlers (twice, in fact). But it's too late for this to work correctly, because the user address space has already been torn down, and an exec() failure cannot be returned to the user code because the code no longer exists. The only recovery is to force a SIGSEGV, but it's best to terminate the search loop immediately. I somewhat arbitrarily chose EINVAL as a fallback error code, but any error returned by load_elf_interp() will override that (but this value will never be seen by user-space). The 2nd hunk deals with the non-interp-side of the outer "if". There were two problems here as well. The SIGSEGV needs to be forced, because a prior sigaction() syscall might have set the associated disposition to SIG_IGN. And the ENOEXEC should be changed to EINVAL as described above. Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com> Signed-off-by: Ernie Petrides <petrides@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] nfs: non-procfs build fixDominik Hackl2006-07-02
| | | | | | | | | | | | | | | | This fixes a bug in fs/nfs which makes it impossible to build nfs without having procfs enabled. Signed-off-by: Dominik Hackl <dominik@hackl.dhs.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] reiserfs: update ctime and mtime on expanding truncateVladimir Saveliev2006-07-01
| | | | | | | | | | | | | | | | | | | | | | | | | | Reiserfs does not update ctime and mtime on expanding truncate via truncate(). This patch fixes it. Signed-off-by: Vladimir Saveliev <vs@namesys.com> Cc: Hans Reiser <reiser@namesys.com> Cc: Michael Kerrisk <mtk-manpages@gmx.net> Cc: Chris Mason <mason@suse.com> Cc: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] ufs: truncate should allocate block for last byteEvgeniy Dushistov2006-07-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes buggy behaviour of UFS in such kind of scenario: open(, O_TRUNC...) ftruncate(, 1024) ftruncate(, 0) Such a scenario causes ufs_panic and remount read-only. This happen because of according to specification UFS should always allocate block for last byte, and many parts of our implementation rely on this, but `ufs_truncate' doesn't care about this. To make possible return error code and to know about old size, this patch removes `truncate' from ufs inode_operations and uses `setattr' method to call ufs_truncate. Signed-off-by: Evgeniy Dushistov <dushistov@mail.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivialLinus Torvalds2006-06-30
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: Remove obsolete #include <linux/config.h> remove obsolete swsusp_encrypt arch/arm26/Kconfig typos Documentation/IPMI typos Kconfig: Typos in net/sched/Kconfig v9fs: do not include linux/version.h Documentation/DocBook/mtdnand.tmpl: typo fixes typo fixes: specfic -> specific typo fixes in Documentation/networking/pktgen.txt typo fixes: occuring -> occurring typo fixes: infomation -> information typo fixes: disadvantadge -> disadvantage typo fixes: aquire -> acquire typo fixes: mecanism -> mechanism typo fixes: bandwith -> bandwidth fix a typo in the RTC_CLASS help text smb is no longer maintained Manually merged trivial conflict in arch/um/kernel/vmlinux.lds.S
| * | Remove obsolete #include <linux/config.h>Jörn Engel2006-06-30
| | | | | | | | | | | | | | | Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: Adrian Bunk <bunk@stusta.de>
| * | v9fs: do not include linux/version.hPaul Collins2006-06-30
| | | | | | | | | | | | | | | | | | | | | I noticed that part of v9fs was being rebuilt when version.h changed. Signed-off-by: Paul Collins <paul@ondioline.org> Signed-off-by: Adrian Bunk <bunk@stusta.de>
| * | typo fixes: aquire -> acquireAdrian Bunk2006-06-30
| | | | | | | | | | | | | | | Signed-off-by: Adrian Bunk <bunk@stusta.de> Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
* | | [PATCH] knfsd: nfsd: mark rqstp to prevent use of sendfile in privacy caseJ. Bruce Fields2006-06-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a rq_sendfile_ok flag to svc_rqst which will be cleared in the privacy case so that the wrapping code will get copies of the read data instead of real page cache pages. This makes life simpler when we encrypt the response. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] knfsd: nfsd4: fix open flag passingJ. Bruce Fields2006-06-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since nfsv4 actually keeps around the file descriptors it gets from open (instead of just using them for a single read or write operation), we need to make sure that we can do RDWR opens and not just RDONLY/WRONLY. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] knfsd: nfsd4: fix some open argument testsJ. Bruce Fields2006-06-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These tests always returned true; clearly that wasn't what was intended. In keeping with kernel style, make them functions instead of macros while we're at it. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] knfsd: nfsd: fix misplaced fh_unlock() in nfsd_link()David M. Richter2006-06-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the event that lookup_one_len() fails in nfsd_link(), fh_unlock() is skipped and locks are held overlong. Patch was tested on 2.6.17-rc2 by causing lookup_one_len() to fail and verifying that fh_unlock() gets called appropriately. Signed-off-by: David M. Richter <richterd@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] knfsd: nfsd4: remove superfluous grace period checksJ. Bruce Fields2006-06-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We're checking nfs_in_grace here a few times when there isn't really any reason to--bad_stateid is probably the more sensible return value anyway. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] knfsd: nfsd: call nfsd_setuser() on fh_compose(), fix nfsd4 ↵J. Bruce Fields2006-06-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | permissions problem In the typical v2/v3 case the only new filehandles used as arguments to operations are filehandles taken directly off the wire, which don't get dentries until fh_verify() is called. But in v4 the filehandles that are arguments to operations were often created by previous operations (putrootfh, lookup, etc.) using fh_compose, which sets the dentry in the filehandle without calling nfsd_setuser(). This also means that, for example, if filesystem B is mounted on filesystem A, and filesystem A is exported without root-squashing, then a client can bypass the rootsquashing on B using a compound that starts at a filehandle in A, crosses into B using lookups, and then does stuff in B. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] knfsd: nfsd4: fix open_confirm lockingJ. Bruce Fields2006-06-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | Fix an improper unlock in an error path. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] knfsd: ignore ref_fh when crossing a mountpointNeilBrown2006-06-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nfsd tries to return to a client the same sort of filehandle as was used by the client. This removes some filehandle aliasing issues and means that a server upgrade followed by a downgrade will not confused clients not restarted during that time. However when crossing a mountpoint, the filehandle used for one filesystem doesn't provide any useful information on what sort of filehandle should be used on the other, and can provide misleading information. So if the reference filehandle is on a different filesystem to the one being generated, ignore it. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] knfsd: remove noise about filehandle being uptodateNeilBrown2006-06-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a perfectly valid situation where fh_update gets called on an already uptodate filehandle - in nfsd_create_v3 where a CREATE_UNCHECKED finds an existing file and wants to just set the size. We could possible optimise out the call in that case, but the only harm involved is that fh_update prints a warning, so it is easier to remove the warning. Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] knfsd: fixing missing 'expkey' support for fsid type 3Frank Filz2006-06-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Type '3' is used for the fsid in filehandles when the device number of the device holding the filesystem has more than 8 bits in either major or minor. Unfortunately expkey_parse doesn't recognise type 3. Fix this. (Slighty modified from Frank's original) Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] knfsd: improve the test for cross-device-rename in nfsdNeilBrown2006-06-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just testing the i_sb isn't really enough, at least the vfsmnt must be the same. Thanks Al. Cc: Al Viro <viro@ftp.linux.org.uk> Signed-off-by: Neil Brown <neilb@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] SELinux: Add security hook definition for getioprio and insert hooksDavid Quigley2006-06-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new security hook definition for the sys_ioprio_get operation. At present, the SELinux hook function implementation for this hook is identical to the getscheduler implementation but a separate hook is introduced to allow this check to be specialized in the future if necessary. This patch also creates a helper function get_task_ioprio which handles the access check in addition to retrieving the ioprio value for the task. Signed-off-by: David Quigley <dpquigl@tycho.nsa.gov> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: James Morris <jmorris@namei.org> Cc: Jens Axboe <axboe@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] Light weight event countersChristoph Lameter2006-06-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The remaining counters in page_state after the zoned VM counter patches have been applied are all just for show in /proc/vmstat. They have no essential function for the VM. We use a simple increment of per cpu variables. In order to avoid the most severe races we disable preempt. Preempt does not prevent the race between an increment and an interrupt handler incrementing the same statistics counter. However, that race is exceedingly rare, we may only loose one increment or so and there is no requirement (at least not in kernel) that the vm event counters have to be accurate. In the non preempt case this results in a simple increment for each counter. For many architectures this will be reduced by the compiler to a single instruction. This single instruction is atomic for i386 and x86_64. And therefore even the rare race condition in an interrupt is avoided for both architectures in most cases. The patchset also adds an off switch for embedded systems that allows a building of linux kernels without these counters. The implementation of these counters is through inline code that hopefully results in only a single instruction increment instruction being emitted (i386, x86_64) or in the increment being hidden though instruction concurrency (EPIC architectures such as ia64 can get that done). Benefits: - VM event counter operations usually reduce to a single inline instruction on i386 and x86_64. - No interrupt disable, only preempt disable for the preempt case. Preempt disable can also be avoided by moving the counter into a spinlock. - Handling is similar to zoned VM counters. - Simple and easily extendable. - Can be omitted to reduce memory use for embedded use. References: RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=113512330605497&w=2 RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=114988082814934&w=2 local_t http://marc.theaimsgroup.com/?l=linux-kernel&m=114991748606690&w=2 V2 http://marc.theaimsgroup.com/?t=115014808400007&r=1&w=2 V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767022346&w=2 V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115047968808926&w=2 Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] zoned vm counters: conversion of nr_bounce to per zone counterChristoph Lameter2006-06-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conversion of nr_bounce to a per zone counter nr_bounce is only used for proc output. So it could be left as an event counter. However, the event counters may not be accurate and nr_bounce is categorizing types of pages in a zone. So we really need this to also be a per zone counter. [akpm@osdl.org: bugfix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] zoned vm counters: conversion of nr_unstable to per zone counterChristoph Lameter2006-06-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conversion of nr_unstable to a per zone counter We need to do some special modifications to the nfs code since there are multiple cases of disposition and we need to have a page ref for proper accounting. This converts the last critical page state of the VM and therefore we need to remove several functions that were depending on GET_PAGE_STATE_LAST in order to make the kernel compile again. We are only left with event type counters in page state. [akpm@osdl.org: bugfixes] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] zoned vm counters: conversion of nr_writeback to per zone counterChristoph Lameter2006-06-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conversion of nr_writeback to per zone counter. This removes the last page_state counter from arch/i386/mm/pgtable.c so we drop the page_state from there. [akpm@osdl.org: bugfix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] zoned vm counters: conversion of nr_dirty to per zone counterChristoph Lameter2006-06-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This makes nr_dirty a per zone counter. Looping over all processors is avoided during writeback state determination. The counter aggregation for nr_dirty had to be undone in the NFS layer since we summed up the page counts from multiple zones. Someone more familiar with NFS should probably review what I have done. [akpm@osdl.org: bugfix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] zoned vm counters: conversion of nr_pagetables to per zone counterChristoph Lameter2006-06-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | Conversion of nr_page_table_pages to a per zone counter [akpm@osdl.org: bugfix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] zoned vm counters: conversion of nr_slab to per zone counterChristoph Lameter2006-06-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Allows reclaim to access counter without looping over processor counts. - Allows accurate statistics on how many pages are used in a zone by the slab. This may become useful to balance slab allocations over various zones. [akpm@osdl.org: bugfix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] zoned vm counters: split NR_ANON_PAGES off from NR_FILE_MAPPEDChristoph Lameter2006-06-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current NR_FILE_MAPPED is used by zone reclaim and the dirty load calculation as the number of mapped pagecache pages. However, that is not true. NR_FILE_MAPPED includes the mapped anonymous pages. This patch separates those and therefore allows an accurate tracking of the anonymous pages per zone. It then becomes possible to determine the number of unmapped pages per zone and we can avoid scanning for unmapped pages if there are none. Also it may now be possible to determine the mapped/unmapped ratio in get_dirty_limit. Isnt the number of anonymous pages irrelevant in that calculation? Note that this will change the meaning of the number of mapped pages reported in /proc/vmstat /proc/meminfo and in the per node statistics. This may affect user space tools that monitor these counters! NR_FILE_MAPPED works like NR_FILE_DIRTY. It is only valid for pagecache pages. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] zoned vm counters: conversion of nr_pagecache to per zone counterChristoph Lameter2006-06-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently a single atomic variable is used to establish the size of the page cache in the whole machine. The zoned VM counters have the same method of implementation as the nr_pagecache code but also allow the determination of the pagecache size per zone. Remove the special implementation for nr_pagecache and make it a zoned counter named NR_FILE_PAGES. Updates of the page cache counters are always performed with interrupts off. We can therefore use the __ variant here. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] zoned vm counters: convert nr_mapped to per zone counterChristoph Lameter2006-06-30
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nr_mapped is important because it allows a determination of how many pages of a zone are not mapped, which would allow a more efficient means of determining when we need to reclaim memory in a zone. We take the nr_mapped field out of the page state structure and define a new per zone counter named NR_FILE_MAPPED (the anonymous pages will be split off from NR_MAPPED in the next patch). We replace the use of nr_mapped in various kernel locations. This avoids the looping over all processors in try_to_free_pages(), writeback, reclaim (swap + zone reclaim). [akpm@osdl.org: bugfix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | Merge branch 'upstream-linus' of ↵Linus Torvalds2006-06-29
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: ocfs2: remove redundant NULL checks in ocfs2_direct_IO_get_blocks() ocfs2: clean up some osb fields ocfs2: fix init of uuid_net_key ocfs2: silence a debug print ocfs2: silence ENOENT during lookup of broken links ocfs2: Cleanup message prints ocfs2: silence -EEXIST from ocfs2_extent_map_insert/lookup [PATCH] fs/ocfs2/dlm/dlmrecovery.c: make dlm_lockres_master_requery() static ocfs2: warn the user on a dead timeout mismatch ocfs2: OCFS2_FS must depend on SYSFS ocfs2: Compile-time disabling of ocfs2 debugging output. configfs: Clear up a few extra spaces where there should be TABs. configfs: Release memory in configfs_example.
| * | ocfs2: remove redundant NULL checks in ocfs2_direct_IO_get_blocks()Florin Malita2006-06-29
| | | | | | | | | | | | | | | Signed-off-by: Florin Malita <fmalita@gmail.com> Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| * | ocfs2: clean up some osb fieldsMark Fasheh2006-06-29
| | | | | | | | | | | | | | | | | | | | | | | | Get rid of osb->uuid, osb->proc_sub_dir, and osb->osb_id. Those fields were unused, or could easily be removed. As a result, we also no longer need MAX_OSB_ID or ocfs2_globals_lock. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| * | ocfs2: fix init of uuid_net_keyMark Fasheh2006-06-29
| | | | | | | | | | | | | | | | | | ocfs2_initialize_super() should be copying from the beginning of the uuid. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
| * | ocfs2: silence a debug printMark Fasheh2006-06-29
| | | | | | | | | | | | Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>