aboutsummaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAge
...
| * | | | | | | | | | [CIFS] add extra null attr checkSteve French2009-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although attr == NULL can not happen, this makes cifs_set_file_info safer in the future since it may not be obvious that the caller can not set attr to NULL. Signed-off-by: Steve French <sfrench@us.ibm.com>
| * | | | | | | | | | [CIFS] fix build errorSteve French2009-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Steve French <sfrench@us.ibm.com>
| * | | | | | | | | | [CIFS] reopen file via newer posix open protocol operation if availableSteve French2009-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the network connection crashes, and we have to reopen files, preferentially use the newer cifs posix open protocol operation if the server supports it. Signed-off-by: Steve French <sfrench@us.ibm.com>
| * | | | | | | | | | [CIFS] Add new nostrictsync cifs mount option to avoid slow SMB flushSteve French2009-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If this mount option is set, when an application does an fsync call then the cifs client does not send an SMB Flush to the server (to force the server to write all dirty data for this file immediately to disk), although cifs still sends all dirty (cached) file data to the server and waits for the server to respond to the write write. Since SMB Flush can be very slow, and some servers may be reliable enough (to risk delaying slightly flushing the data to disk on the server), turning on this option may be useful to improve performance for applications that fsync too much, at a small risk of server crash. If this mount option is not set, by default cifs will send an SMB flush request (and wait for a response) on every fsync call. Signed-off-by: Steve French <sfrench@us.ibm.com>
| * | | | | | | | | | [CIFS] DFS no longer experimentalSteve French2009-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Also updates some DFS flag definitions Signed-off-by: Steve French <sfrench@us.ibm.com>
| * | | | | | | | | | [CIFS] Send SMB flush in cifs_fsyncSteve French2009-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In contrast to the now-obsolete smbfs, cifs does not send SMB_COM_FLUSH in response to an explicit fsync(2) to guarantee that all volatile data is written to stable storage on the server side, provided the server honors the request (which, to my knowledge, is true for Windows and Samba with 'strict sync' enabled). This patch modifies the cifs_fsync implementation to restore the fsync-behavior of smbfs by triggering SMB_COM_FLUSH after sending outstanding data on the client side to the server. Signed-off-by: Horst Reiterer <horst.reiterer@gmail.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
* | | | | | | | | | | ext3: Avoid starting a transaction in writepage when not necessaryJan Kara2009-03-26
| |_|_|_|_|_|/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't have to start a transaction in writepage() when all the blocks are a properly allocated. Even in ordered mode either the data has been written via write() and they are thus already added to transaction's list or the data was written via mmap and then it's random in which transaction they get written anyway. This should help VM to pageout dirty memory without blocking on transaction commits. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6Linus Torvalds2009-03-26
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (61 commits) Dynamic debug: fix pr_fmt() build error Dynamic debug: allow simple quoting of words dynamic debug: update docs dynamic debug: combine dprintk and dynamic printk sysfs: fix some bin_vm_ops errors kobject: don't block for each kobject_uevent sysfs: only allow one scheduled removal callback per kobj Driver core: Fix device_move() vs. dpm list ordering, v2 Driver core: some cleanup on drivers/base/sys.c Driver core: implement uevent suppress in kobject vcs: hook sysfs devices into object lifetime instead of "binding" driver core: fix passing platform_data driver core: move platform_data into platform_device sysfs: don't block indefinitely for unmapped files. driver core: move knode_bus into private structure driver core: move knode_driver into private structure driver core: move klist_children into private structure driver core: create a private portion of struct device driver core: remove polling for driver_probe_done(v5) sysfs: reference sysfs_dirent from sysfs inodes ... Fixed conflicts in drivers/sh/maple/maple.c manually
| * | | | | | | | | | sysfs: fix some bin_vm_ops errorsHugh Dickins2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 86c9508eb1c0ce5aa07b5cf1d36b60c54efc3d7a "sysfs: don't block indefinitely for unmapped files" in linux-next crashes the PowerMac G5 when X starts up. It's caught out by the way powerpc's pci_mmap of legacy_mem uses shmem_zero_setup(), substituting a new vma->vm_file whose private_data no longer points to the bin_buffer (substitution done because some versions of X crash if that mmap fails). The fix to this is straightforward: the original vm_file is fput() in that case, so this mmap won't block sysfs at all, so just don't switch over to bin_vm_ops if vm_file has changed. But more fixes made before realizing that was the problem:- It should not be an error if bin_page_mkwrite() finds no underlying page_mkwrite(). Check that a file already mmap'ed has the same underlying vm_ops _before_ pointing vma->vm_ops at bin_vm_ops. If the file being mmap'ed is a shmem/tmpfs file, don't fail the mmap on CONFIG_NUMA=y, just because that has a set_policy and get_policy: provide bin_set_policy, bin_get_policy and bin_migrate. Signed-off-by: Hugh Dickins <hugh@veritas.com> Acked-by: Eric Biederman <ebiederm@aristanetworks.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | | | | | | | | | sysfs: only allow one scheduled removal callback per kobjAlex Chiang2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The only way for a sysfs attribute to remove itself (without deadlock) is to use the sysfs_schedule_callback() interface. Vegard Nossum discovered that a poorly written sysfs ->store callback can repeatedly schedule remove callbacks on the same device over and over, e.g. $ while true ; do echo 1 > /sys/devices/.../remove ; done If the 'remove' attribute uses the sysfs_schedule_callback API and also does not protect itself from concurrent accesses, its callback handler will be called multiple times, and will eventually attempt to perform operations on a freed kobject, leading to many problems. Instead of requiring all callers of sysfs_schedule_callback to implement their own synchronization, provide the protection in the infrastructure. Now, sysfs_schedule_callback will only allow one scheduled callback per kobject. On subsequent calls with the same kobject, return -EAGAIN. This is a short term fix. The long term fix is to allow sysfs attributes to remove themselves directly, without any of this callback hokey pokey. [cornelia.huck@de.ibm.com: s390 ccwgroup bits] Reported-by: vegard.nossum@gmail.com Signed-off-by: Alex Chiang <achiang@hp.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | | | | | | | | | Driver core: implement uevent suppress in kobjectMing Lei2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements uevent suppress in kobject and removes it from struct device, based on the following ideas: 1,Uevent sending should be one attribute of kobject, so suppressing it in kobject layer is more natural than in device layer. By this way, we can do it for other objects embedded with kobject. 2,It may save several bytes for each instance of struct device.(On my omap3(32bit ARM) based box, can save 8bytes per device object) This patch also introduces dev_set|get_uevent_suppress() helpers to set and query uevent_suppress attribute in case to help kobject as private part of struct device in future. [This version is against the latest driver-core patch set of Greg,please ignore the last version.] Signed-off-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | | | | | | | | | sysfs: don't block indefinitely for unmapped files.Eric W. Biederman2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Modify sysfs bin files so that we can remove the bin file while they are still mapped. When the kobject is removed we unmap the bin file and arrange for future accesses to the mapping to receive SIGBUS. Implementing this prevents a nasty DOS when pci devices are hot plugged and unplugged. Where if any of their resources were mmaped the kernel could not free up their pci resources or release their pci data structures. [akpm@linux-foundation.org: remove unused var] Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | | | | | | | | | sysfs: reference sysfs_dirent from sysfs inodesEric W. Biederman2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sysfs_dirent serves as both an inode and a directory entry for sysfs. To prevent the sysfs inode numbers from being freed prematurely hold a reference to sysfs_dirent from the sysfs inode. [akpm@linux-foundation.org: add comment] Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Cc: Tejun Heo <tj@kernel.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | | | | | | | | | sysfs: sysfs_add_one WARNs with full path to duplicate filenameAlex Chiang2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sysfs: sysfs_add_one WARNs with full path to duplicate filename As a debugging aid, it can be useful to know the full path to a duplicate file being created in sysfs. We now will display warnings such as: sysfs: cannot create duplicate filename '/foo' when attempting to create multiple files named 'foo' in the sysfs root, or: sysfs: cannot create duplicate filename '/bus/pci/slots/5/foo' when attempting to create multiple files named 'foo' under a given directory in sysfs. The path displayed is always a relative path to sysfs_root. The leading '/' in the path name refers to the sysfs_root mount point, and should not be confused with the "real" '/'. Thanks to Alex Williamson for essentially writing sysfs_pathname. Cc: Alex Williamson <alex.williamson@hp.com> Signed-off-by: Alex Chiang <achiang@hp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | | | | | | | | | sysfs: Take sysfs_mutex when fetching the root inode.Eric W. Biederman2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sysfs_get_inode ultimately calls sysfs_count_nlink when the a directory inode is fectched. sysfs_count_nlink needs to be called under the sysfs_mutex to guard against the unlikely but possible scenario that the root directory is changing as we are counting the number entries in it, and just in general to be consistent. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | | | | | | | | | SYSFS: use standard magic.h for sysfsQinghuang Feng2009-03-24
| | |_|_|_|_|_|/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SYSFS_MAGIC has been added into magic.h, so only use that definition in magic.h to avoid potential consistency problem. Signed-off-by: Qinghuang Feng <qhfeng.kernel@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* | | | | | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmwLinus Torvalds2009-03-26
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: GFS2: Fix freeze issue Fix a minor bug in the previous patch GFS2: Clean up of glops.c GFS2: Fix locking bug in failed shared to exclusive conversion GFS2: Pagecache usage optimization on GFS2 GFS2: fix sparse warning: Should it be static? GFS2: fix sparse warnings: constant is so big it is ... GFS2: Support quota/noquota mount arguments GFS2: Fix alignment issue and tidy gfs2_bitfit GFS2: Add a "demote a glock" interface to sysfs GFS2: Expose UUID via sysfs/uevent GFS2: Support generation of discard requests GFS2: Fix deadlock on journal flush GFS2: Fix error path ref counting for root inode GFS2: Remove unused field from glock GFS2: Merge lock_dlm module into GFS2 GFS2: Remove "double" locking in quota GFS2: change gfs2_quota_scan into a shrinker GFS2: Bring back lvb-related stuff to lock_nolock to support quotas GFS2: Fix remount argument parsing
| * | | | | | | | | | GFS2: Fix freeze issueSteven Whitehouse2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This removes some old code that was causing issues during filesystem freeze. Reported-by: Andrew Price <andy@andrewprice.me.uk> Tested-by: Andrew Price <andy@andrewprice.me.uk> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | | | | | | | | | Fix a minor bug in the previous patchSteven Whitehouse2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The logic requires that we mark the glock dirty in page_mkwrite otherwise we might not flush correctly in the case that no allocation was required in the process of dirying the page. Also we need to set the shared write flag early for the same reason. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | | | | | | | | | GFS2: Clean up of glops.cSteven Whitehouse2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This cleans up a number of bits of code mostly based in glops.c. A couple of simple functions have been merged into the callers to make it more obvious what is going on, the mysterious raising of i_writecount around the truncate_inode_pages() call has been removed. The meta_go_* operations have been renamed rgrp_go_* since that is the only lock type that they are used with. The unused argument of gfs2_read_sb has been removed. Also a bug has been fixed where a check for the rindex inode was in the wrong callback. More comments are added, and the debugging code is improved too. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | | | | | | | | | GFS2: Fix locking bug in failed shared to exclusive conversionBenjamin Marzinski2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After calling out to the dlm, GFS2 sets the new state of a glock to gl_target in gdlm_ast(). However, gl_target is not always the lock state that was requested. If a conversion from shared to exclusive fails, finish_xmote() will call do_xmote() with LM_ST_UNLOCKED, instead of gl->gl_target, so that it can reacquire the lock in exlusive the next time around. In this case, setting the lock to gl_target in gdlm_ast() will make GFS2 think that it has the glock in exclusive mode, when really, it doesn't have the glock locked at all. This patch adds a new field to the gfs2_glock structure, gl_req, to track the mode that was requested. Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | | | | | | | | | GFS2: Pagecache usage optimization on GFS2Hisashi Hifumi2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I introduced "is_partially_uptodate" aops for GFS2. A page can have multiple buffers and even if a page is not uptodate, some buffers can be uptodate on pagesize != blocksize environment. This aops checks that all buffers which correspond to a part of a file that we want to read are uptodate. If so, we do not have to issue actual read IO to HDD even if a page is not uptodate because the portion we want to read are uptodate. "block_is_partially_uptodate" function is already used by ext2/3/4. With the following patch random read/write mixed workloads or random read after random write workloads can be optimized and we can get performance improvement. I did a performance test using the sysbench. #sysbench --num-threads=16 --max-requests=200000 --test=fileio --file-num=1 --file-block-size=8K --file-total-size=2G --file-test-mode=rndrw --file-fsync-freq=0 --file-rw-ratio=1 run -2.6.29-rc6 Test execution summary: total time: 202.6389s total number of events: 200000 total time taken by event execution: 2580.0480 per-request statistics: min: 0.0000s avg: 0.0129s max: 49.5852s approx. 95 percentile: 0.0462s -2.6.29-rc6-patched Test execution summary: total time: 177.8639s total number of events: 200000 total time taken by event execution: 2419.0199 per-request statistics: min: 0.0000s avg: 0.0121s max: 52.4306s approx. 95 percentile: 0.0444s arch: ia64 pagesize: 16k blocksize: 4k Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | | | | | | | | | GFS2: fix sparse warning: Should it be static?Hannes Eder2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: Make symbol static. Fix this sparse warning: fs/gfs2/rgrp.c:188:5: warning: symbol 'gfs2_bitfit' was not declared. Should it be static? Signed-off-by: Hannes Eder <hannes@hanneseder.net> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | | | | | | | | | GFS2: fix sparse warnings: constant is so big it is ...Hannes Eder2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix this sparse warnings: fs/gfs2/rgrp.c:156:23: warning: constant 0xffffffffffffffff is so big it is unsigned long long fs/gfs2/rgrp.c:157:23: warning: constant 0xaaaaaaaaaaaaaaaa is so big it is unsigned long long fs/gfs2/rgrp.c:158:23: warning: constant 0x5555555555555555 is so big it is long long fs/gfs2/rgrp.c:194:20: warning: constant 0x5555555555555555 is so big it is long long fs/gfs2/rgrp.c:204:44: warning: constant 0x5555555555555555 is so big it is long long Signed-off-by: Hannes Eder <hannes@hanneseder.net> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | | | | | | | | | GFS2: Support quota/noquota mount argumentsSteven Whitehouse2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds support for "quota" and "noquota" mount options in addition to the existing "quota=on/off/account" so that we are compatible with the names by which these options are more generally known. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | | | | | | | | | GFS2: Fix alignment issue and tidy gfs2_bitfitSteven Whitehouse2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An alignment issue with the existing bitfit algorithm was reported on IA64. This patch attempts to fix that, and also to tidy up the code a bit. There is now more documentation about how this works and it has survived a number of different tests. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | | | | | | | | | GFS2: Add a "demote a glock" interface to sysfsSteven Whitehouse2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a sysfs file called demote_rq to GFS2's per filesystem directory. Its possible to use this file to demote arbitrary glocks in exactly the same way as if a request had come in from a remote node. This is intended for testing issues relating to caching of data under glocks. Despite that, the interface is generic enough to send requests to any type of glock, but be careful as its not always safe to send an arbitrary message to an arbitrary glock. For that reason and to prevent DoS, this interface is restricted to root only. The messages look like this: <type>:<glocknumber> <mode> Example: echo -n "2:13324 EX" >/sys/fs/gfs2/unity:myfs/demote_rq Which means "please demote inode glock (type 2) number 13324 so that I can get an EX (exclusive) lock". The lock modes are those which would normally be sent by a remote node in its callback so if you want to unlock a glock, you use EX, to demote to shared, use SH or PR (depending on whether you like GFS2 or DLM lock modes better!). If the glock doesn't exist, you'll get -ENOENT returned. If the arguments don't make sense, you'll get -EINVAL returned. The plan is that this interface will be used in combination with the blktrace patch which I recently posted for comments although it is, of course, still useful in its own right. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | | | | | | | | | GFS2: Expose UUID via sysfs/ueventSteven Whitehouse2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since we have a UUID, we ought to expose it to the user via sysfs and uevents. We already have the fs name in both of these places (a combination of the lock proto and lock table name) so if we add the UUID as well, we have a full set. For older filesystems (i.e. those created before mkfs.gfs2 was writing UUIDs by default) the sysfs file will appear zero length, and no UUID env var will be added to the uevents. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | | | | | | | | | GFS2: Support generation of discard requestsSteven Whitehouse2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows GFS2 to generate discard requests for blocks which are no longer useful to the filesystem (i.e. those which have been freed as the result of an unlink operation). The requests are generated at the time which those blocks become available for reuse in the filesystem. In order to use this new feature, you have to specify the "discard" mount option. The code coalesces adjacent blocks into a single extent when generating the discard requests, thus generating the minimum number. If an error occurs when the request has been sent to the block device, then it will print a message and turn off the requests for that filesystem. If the problem is temporary, then you can use remount to turn the option back on again. There is also a nodiscard mount option so that you can use remount to turn discard requests off, if required. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | | | | | | | | | GFS2: Fix deadlock on journal flushSteven Whitehouse2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a deadlock when the journal is flushed and there are dirty inodes other than the one which caused the journal flush. Originally the journal flushing code was trying to obtain the transaction glock while running the flush code for an inode glock. We no longer require the transaction glock at this point in time since we know that any attempt to get the transaction glock from another node will result in a journal flush. So if we are flushing the journal, we can be sure that the transaction lock is still cached from when the transaction was started. By inlining a version of gfs2_trans_begin() (minus the bit which gets the transaction glock) we can avoid the deadlock problems caused if there is a demote request queued up on the transaction glock. In addition I've also moved the umount rwsem so that it covers the glock workqueue, since it all demotions are done by this workqueue now. That fixes a bug on umount which I came across while fixing the original problem. Reported-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | | | | | | | | | GFS2: Fix error path ref counting for root inodeSteven Whitehouse2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We were keeping hold of an extra ref to the root inode in one of the error paths, that resulted in a hang. Reported-by: Nate Straz <nstraz@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Tested-by: Robert Peterson <rpeterso@redhat.com>
| * | | | | | | | | | GFS2: Remove unused field from glockSteven Whitehouse2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The time stamp field is unused in the glock now that we are using a shrinker, so that we can remove it and save sizeof(unsigned long) bytes in each glock. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | | | | | | | | | GFS2: Merge lock_dlm module into GFS2Steven Whitehouse2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the big patch that I've been working on for some time now. There are many reasons for wanting to make this change such as: o Reducing overhead by eliminating duplicated fields between structures o Simplifcation of the code (reduces the code size by a fair bit) o The locking interface is now the DLM interface itself as proposed some time ago. o Fewer lookups of glocks when processing replies from the DLM o Fewer memory allocations/deallocations for each glock o Scope to do further optimisations in the future (but this patch is more than big enough for now!) Please note that (a) this patch relates to the lock_dlm module and not the DLM itself, that is still a separate module; and (b) that we retain the ability to build GFS2 as a standalone single node filesystem with out requiring the DLM. This patch needs a lot of testing, hence my keeping it I restarted my -git tree after the last merge window. That way, this has the maximum exposure before its merged. This is (modulo a few minor bug fixes) the same patch that I've been posting on and off the the last three months and its passed a number of different tests so far. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | | | | | | | | | GFS2: Remove "double" locking in quotaSteven Whitehouse2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We only really need a single spin lock for the quota data, so lets just use the lru lock for now. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Cc: Abhijith Das <adas@redhat.com>
| * | | | | | | | | | GFS2: change gfs2_quota_scan into a shrinkerAbhijith Das2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Deallocation of gfs2_quota_data objects now happens on-demand through a shrinker instead of routinely deallocating through the quotad daemon. Signed-off-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | | | | | | | | | GFS2: Bring back lvb-related stuff to lock_nolock to support quotasAbhijith Das2009-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The quota code uses lvbs and this is currently not implemented in lock_nolock, thereby causing panics when quota is enabled with lock_nolock. This patch adds the relevant bits. Signed-off-by: Abhijith Das <adas@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | | | | | | | | | GFS2: Fix remount argument parsingSteven Whitehouse2009-03-24
| |/ / / / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following patch fixes an issue relating to remount and argument parsing. After this fix is applied, remount becomes atomic in that it either succeeds changing the mount to the new state, or it fails and leaves it in the old state. Previously it was possible for the parsing of options to fail part way though and for the fs to be left in a state where some of the new arguments had been applied, but some had not. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
* | | | | | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2009-03-26
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (71 commits) SELinux: inode_doinit_with_dentry drop no dentry printk SELinux: new permission between tty audit and audit socket SELinux: open perm for sock files smack: fixes for unlabeled host support keys: make procfiles per-user-namespace keys: skip keys from another user namespace keys: consider user namespace in key_permission keys: distinguish per-uid keys in different namespaces integrity: ima iint radix_tree_lookup locking fix TOMOYO: Do not call tomoyo_realpath_init unless registered. integrity: ima scatterlist bug fix smack: fix lots of kernel-doc notation TOMOYO: Don't create securityfs entries unless registered. TOMOYO: Fix exception policy read failure. SELinux: convert the avc cache hash list to an hlist SELinux: code readability with avc_cache SELinux: remove unused av.decided field SELinux: more careful use of avd in avc_has_perm_noaudit SELinux: remove the unused ae.used SELinux: check seqno when updating an avc_node ...
| * | | | | | | | | | Merge branch 'master' into nextJames Morris2009-03-23
| |\| | | | | | | | |
| * | | | | | | | | | Add in_execve flag into task_struct.Kentaro Takeda2009-02-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch allows LSM modules to determine whether current process is in an execve operation or not so that they can behave differently while an execve operation is in progress. This patch is needed by TOMOYO. Please see another patch titled "LSM adapter functions." for backgrounds. Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
| * | | | | | | | | | Merge branch 'master' into nextJames Morris2009-02-05
| |\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: fs/namei.c Manually merged per: diff --cc fs/namei.c index 734f2b5,bbc15c2..0000000 --- a/fs/namei.c +++ b/fs/namei.c @@@ -860,9 -848,8 +849,10 @@@ static int __link_path_walk(const char nd->flags |= LOOKUP_CONTINUE; err = exec_permission_lite(inode); if (err == -EAGAIN) - err = vfs_permission(nd, MAY_EXEC); + err = inode_permission(nd->path.dentry->d_inode, + MAY_EXEC); + if (!err) + err = ima_path_check(&nd->path, MAY_EXEC); if (err) break; @@@ -1525,14 -1506,9 +1509,14 @@@ int may_open(struct path *path, int acc flag &= ~O_TRUNC; } - error = vfs_permission(nd, acc_mode); + error = inode_permission(inode, acc_mode); if (error) return error; + - error = ima_path_check(&nd->path, ++ error = ima_path_check(path, + acc_mode & (MAY_READ | MAY_WRITE | MAY_EXEC)); + if (error) + return error; /* * An append-only file must be opened in append mode for writing. */ Signed-off-by: James Morris <jmorris@namei.org>
| * | | | | | | | | | | integrity: IMA hooksMimi Zohar2009-02-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch replaces the generic integrity hooks, for which IMA registered itself, with IMA integrity hooks in the appropriate places directly in the fs directory. Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
* | | | | | | | | | | | Make relatime defaultMatthew Garrett2009-03-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change the default behaviour of the kernel to use relatime for all filesystems. This can be overridden with the "strictatime" mount option. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | | | | Add a strictatime mount optionMatthew Garrett2009-03-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for explicitly requesting full atime updates. This makes it possible for kernels to default to relatime but still allow userspace to override it. Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | | | | Allow relatime to update atime once a dayMatthew Garrett2009-03-26
| |_|/ / / / / / / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow atime to be updated once per day even with relatime. This lets utilities like tmpreaper (which delete files based on last access time) continue working, making relatime a plausible default for distributions. Signed-off-by: Matthew Garrett <mjg@redhat.com> Reviewed-by: Matthew Wilcox <willy@linux.intel.com> Acked-by: Valerie Aurora Henson <vaurora@redhat.com> Acked-by: Alan Cox <alan@redhat.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | | | Update my email addressGertjan van Wingerde2009-03-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update all previous incarnations of my email address to the correct one. Signed-off-by: Gertjan van Wingerde <gwingerde@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | | | eCryptfs: NULL crypt_stat dereference during lookupTyler Hicks2009-03-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If ecryptfs_encrypted_view or ecryptfs_xattr_metadata were being specified as mount options, a NULL pointer dereference of crypt_stat was possible during lookup. This patch moves the crypt_stat assignment into ecryptfs_lookup_and_interpose_lower(), ensuring that crypt_stat will not be NULL before we attempt to dereference it. Thanks to Dan Carpenter and his static analysis tool, smatch, for finding this bug. Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Acked-by: Dustin Kirkland <kirkland@canonical.com> Cc: Dan Carpenter <error27@gmail.com> Cc: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | | | eCryptfs: Allocate a variable number of pages for file headersTyler Hicks2009-03-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When allocating the memory used to store the eCryptfs header contents, a single, zeroed page was being allocated with get_zeroed_page(). However, the size of an eCryptfs header is either PAGE_CACHE_SIZE or ECRYPTFS_MINIMUM_HEADER_EXTENT_SIZE (8192), whichever is larger, and is stored in the file's private_data->crypt_stat->num_header_bytes_at_front field. ecryptfs_write_metadata_to_contents() was using num_header_bytes_at_front to decide how many bytes should be written to the lower filesystem for the file header. Unfortunately, at least 8K was being written from the page, despite the chance of the single, zeroed page being smaller than 8K. This resulted in random areas of kernel memory being written between the 0x1000 and 0x1FFF bytes offsets in the eCryptfs file headers if PAGE_SIZE was 4K. This patch allocates a variable number of pages, calculated with num_header_bytes_at_front, and passes the number of allocated pages along to ecryptfs_write_metadata_to_contents(). Thanks to Florian Streibelt for reporting the data leak and working with me to find the problem. 2.6.28 is the only kernel release with this vulnerability. Corresponds to CVE-2009-0787 Signed-off-by: Tyler Hicks <tyhicks@linux.vnet.ibm.com> Acked-by: Dustin Kirkland <kirkland@canonical.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Eugene Teo <eugeneteo@kernel.sg> Cc: Greg KH <greg@kroah.com> Cc: dann frazier <dannf@dannf.org> Cc: Serge E. Hallyn <serue@us.ibm.com> Cc: Florian Streibelt <florian@f-streibelt.de> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | | | aio: lookup_ioctx can return the wrong value when looking up a bogus contextJeff Moyer2009-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The libaio test harness turned up a problem whereby lookup_ioctx on a bogus io context was returning the 1 valid io context from the list (harness/cases/3.p). Because of that, an extra put_iocontext was done, and when the process exited, it hit a BUG_ON in the put_iocontext macro called from exit_aio (since we expect a users count of 1 and instead get 0). The problem was introduced by "aio: make the lookup_ioctx() lockless" (commit abf137dd7712132ee56d5b3143c2ff61a72a5faa). Thanks to Zach for pointing out that hlist_for_each_entry_rcu will not return with a NULL tpos at the end of the loop, even if the entry was not found. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Acked-by: Zach Brown <zach.brown@oracle.com> Acked-by: Jens Axboe <jens.axboe@oracle.com> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | | | eventfd: remove fput() call from possible IRQ contextDavide Libenzi2009-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove a source of fput() call from inside IRQ context. Myself, like Eric, wasn't able to reproduce an fput() call from IRQ context, but Jeff said he was able to, with the attached test program. Independently from this, the bug is conceptually there, so we might be better off fixing it. This patch adds an optimization similar to the one we already do on ->ki_filp, on ->ki_eventfd. Playing with ->f_count directly is not pretty in general, but the alternative here would be to add a brand new delayed fput() infrastructure, that I'm not sure is worth it. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Cc: Benjamin LaHaise <bcrl@kvack.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Cc: Zach Brown <zach.brown@oracle.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>