aboutsummaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAge
...
| | * | | | | | | | | ceph: specify supported features in super.hSage Weil2010-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Specify the supported/required feature bits in super.h client code instead of using the definitions from the shared kernel/userspace headers (which will go away shortly). Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | | | | | ceph: clean up fsid mount optionSage Weil2010-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Specify the fsid mount option in hex, not via the major/minor u64 hackery we had before. Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | | | | | ceph: remove unused 'monport' mount optionSage Weil2010-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | | | | | ceph: handle ESTALE properly; on receipt send to authority if it wasn'tGreg Farnum2010-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Greg Farnum <gregf@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | | | | | ceph: add ceph_get_cap_for_mds function.Greg Farnum2010-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Greg Farnum <gregf@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | | | | | ceph: connect to export targets on cap exportSage Weil2010-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we get a cap EXPORT message, make sure we are connected to all export targets to ensure we can handle the matching IMPORT. Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | | | | | ceph: connect to export targets if mds is laggySage Weil2010-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If an MDS we are talking to may have failed, we need to open sessions to its potential export targets to ensure that any in-progress migration that may have involved some of our caps is properly handled. Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | | | | | ceph: introduce helper to connect to mds export targetsSage Weil2010-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are a few cases where we need to open sessions with a given mds's potential export targets. Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | | | | | ceph: only set num_pages in calc_layoutSage Weil2010-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Setting it elsewhere is unnecessary and more fragile. Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | | | | | ceph: do caps accounting per mds_clientYehuda Sadeh2010-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Caps related accounting is now being done per mds client instead of just being global. This prepares ground work for a later revision of the caps preallocated reservation list. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | | | | | ceph: track laggy state of mds from mdsmapSage Weil2010-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | | | | | ceph: code cleanupYehuda Sadeh2010-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mainly fixing minor issues reported by sparse. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | | | | | ceph: skip if no auth cap in flush_snapsSage Weil2010-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we have a capsnap but no auth cap (e.g. because it is migrating to another mds), bail out and do nothing for now. Do NOT remove the capsnap from the flush list. Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | | | | | ceph: simplify caps revocation, fix for multimdsSage Weil2010-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The caps revocation should either initiate writeback, invalidateion, or call check_caps to ack or do the dirty work. The primary question is whether we can get away with only checking the auth cap or whether all caps need to be checked. The old code was doing...something else. At the very least, revocations from non-auth MDSs could break by triggering the "check auth cap only" case. Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | | | | | ceph: simplify add_cap_releasesSage Weil2010-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No functional change, aside from more useful debug output. Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | | | | | ceph: drop unused argumentSage Weil2010-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | | | | | ceph: perform lazy reads when file mode and caps permitSage Weil2010-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the file mode is marked as "lazy," perform cached/buffered reads when the caps permit it. Adjust the rdcache_gen and invalidation logic accordingly so that we manage our cache based on the FILE_CACHE -or- FILE_LAZYIO cap bits. Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | | | | | ceph: perform lazy writes when file mode and caps permitSage Weil2010-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we have marked a file as "lazy" (using the ceph ioctl), perform buffered writes when the MDS caps allow it. Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | | | | | ceph: add LAZYIO ioctl to mark a file description for lazy consistencySage Weil2010-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow an application to mark a file descriptor for lazy file consistency semantics, allowing buffered reads and writes when multiple clients are accessing the same file. Signed-off-by: Sage Weil <sage@newdream.net>
| | * | | | | | | | | ceph: request FILE_LAZYIO cap when LAZY file mode is setSage Weil2010-08-01
| | | |_|_|_|/ / / / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Also clean up the file flags -> file mode -> wanted caps functions while we're at it. This resyncs this file with userspace. Signed-off-by: Sage Weil <sage@newdream.net>
| * | | | | | | | | fs/sysv/super.c: add support for non-PDP11 v7 filesystemsLubomir Rintel2010-08-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds byte order autodetection (of PDP-11 and LE filesystems). No attempt is made to detect big-endian filesystems -- were there any? Tested with PDP-11 v7 filesystems and PC-IX maintenance floppy. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Cc: Christoph Hellwig <hch@lst.de> Cc: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | fs/sysv: v7: adjust sanity checks for some volumesLubomir Rintel2010-08-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Newly mkfs-ed filesystems from Seventh Edition have last modification time set to zero, but are otherwise perfectly valid. Also, tighten up other sanity checks to filter out most filesystems with [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | fs/sysv: add v7 aliasLubomir Rintel2010-08-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So that the module gets autoloaded when a v7 filesystem is mounted. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | afs: destroy work queue on init failureDan Carpenter2010-08-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can clean up the work queue on this error path. This function is called from afs_init(). Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | partitions: fix sometimes unreadable partition stringsAlexey Dobriyan2010-08-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix this garbage happening quite often: ==> sda: scsi 3:0:0:0: CD-ROM TOSHIBA ==> sda1 sda2 sda3 sda4 <sr0: scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray ^^^ Uniform CD-ROM driver Revision: 3.20 sr 3:0:0:0: Attached scsi CD-ROM sr0 ==> sda5 sda6 sda7 > Make "sda: sda1 ..." lines actually lines. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | procfs: simplify conditional processing of fs/proc.o.Robert P. J. Day2010-08-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the entire fs/proc directory is conditionally included based on CONFIG_PROC_FS, it's redundant to check that same variable within that directory. Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | signalfd: fill in ssi_int for posix timers and message queuesNathan Lynch2010-08-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If signalfd is used to consume a signal generated by a POSIX interval timer or POSIX message queue, the ssi_int field does not reflect the data (sigevent->sigev_value) supplied to timer_create(2) or mq_notify(3). (The ssi_ptr field, however, is filled in.) This behavior differs from signalfd's treatment of sigqueue-generated signals -- see the default case in signalfd_copyinfo. It also gives results that differ from the case when a signal is handled conventionally via a sigaction-registered handler. So, set signalfd_siginfo->ssi_int in the remaining cases (__SI_TIMER, __SI_MESGQ) where ssi_ptr is set. akpm: a non-back-compatible change. Merge into -stable to minimise the number of kernels which are in the field and which miss this feature. Signed-off-by: Nathan Lynch <ntl@pobox.com> Acked-by: Davide Libenzi <davidel@xmailserver.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | blkdev: cgroup whitelist permission fixChris Wright2010-08-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The cgroup device whitelist code gets confused when trying to grant permission to a disk partition that is not currently open. Part of blkdev_open() includes __blkdev_get() on the whole disk. Basically, the only ways to reliably allow a cgroup access to a partition on a block device when using the whitelist are to 1) also give it access to the whole block device or 2) make sure the partition is already open in a different context. The patch avoids the cgroup check for the whole disk case when opening a partition. Addresses https://bugzilla.redhat.com/show_bug.cgi?id=589662 Signed-off-by: Chris Wright <chrisw@sous-sol.org> Acked-by: Serge E. Hallyn <serue@us.ibm.com> Tested-by: Serge E. Hallyn <serue@us.ibm.com> Reported-by: Vivek Goyal <vgoyal@redhat.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: "Daniel P. Berrange" <berrange@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | reiserfs: remove unused local `wait'Changli Gao2010-08-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | autofs4: remove unneeded null check in try_to_fill_dentry()Dan Carpenter2010-08-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After 97e7449a7ad: "autofs4: fix indirect mount pending expire race" we no longer assumed that "ino" can be null. The other null checks got removed but this was one was missed. Signed-off-by: Dan Carpenter <error27@gmail.com> Cc: Ian Kent <raven@themaw.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | vfs: use kmalloc() to allocate fdmem if possibleChangli Gao2010-08-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use kmalloc() to allocate fdmem if possible. vmalloc() is used as a fallback solution for fdmem allocation. A new helper function __free_fdtable() is introduced to reduce the lines of code. A potential bug, vfree() a memory allocated by kmalloc(), is fixed. [akpm@linux-foundation.org: use __GFP_NOWARN, uninline alloc_fdmem() and free_fdmem()] Signed-off-by: Changli Gao <xiaosuo@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jiri Slaby <jslaby@suse.cz> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Avi Kivity <avi@redhat.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | vfs: clarify that nonseekable_open() will never failDmitry Torokhov2010-08-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Dmitry Torokhov <dtor@mail.ru> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: John Kacur <jkacur@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | vfs: O_* bit numbers uniqueness checkWu Fengguang2010-08-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The O_* bit numbers are defined in 20+ arch/*, and can silently overlap. Add a compile time check to ensure the uniqueness as suggested by David Miller. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: David Miller <davem@davemloft.net> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Eric Paris <eparis@redhat.com> Cc: Roland Dreier <rdreier@cisco.com> Cc: Jamie Lokier <jamie@shareable.org> Cc: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | | vfs: improve comment describing fget_light()Tony Battersby2010-08-11
| | |_|_|_|/ / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Improve the description of fget_light(), which is currently incorrect about needing a prior refcnt (judging by the way it is actually used). Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | jfs: don't allow os2 xattr namespace overlap with othersDave Kleikamp2010-08-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's currently possible to bypass xattr namespace access rules by prefixing valid xattr names with "os2.", since the os2 namespace stores extended attributes in a legacy format with no prefix. This patch adds checking to deny access to any valid namespace prefix following "os2.". Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Reported-by: Sergey Vlasov <vsu@altlinux.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | | | | | Merge branch 'for-2.6.36' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds2010-08-10
| |\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits) block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n xen-blkfront: fix missing out label blkdev: fix blkdev_issue_zeroout return value block: update request stacking methods to support discards block: fix missing export of blk_types.h writeback: fix bad _bh spinlock nesting drbd: revert "delay probes", feature is being re-implemented differently drbd: Initialize all members of sync_conf to their defaults [Bugz 315] drbd: Disable delay probes for the upcomming release writeback: cleanup bdi_register writeback: add new tracepoints writeback: remove unnecessary init_timer call writeback: optimize periodic bdi thread wakeups writeback: prevent unnecessary bdi threads wakeups writeback: move bdi threads exiting logic to the forker thread writeback: restructure bdi forker loop a little writeback: move last_active to bdi writeback: do not remove bdi from bdi_list writeback: simplify bdi code a little writeback: do not lose wake-ups in bdi threads ... Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and drivers/scsi/scsi_error.c as per Jens.
| | * | | | | | | | writeback: optimize periodic bdi thread wakeupsArtem Bityutskiy2010-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Whe the first inode for a bdi is marked dirty, we wake up the bdi thread which should take care of the periodic background write-out. However, the write-out will actually start only 'dirty_writeback_interval' centisecs later, so we can delay the wake-up. This change was requested by Nick Piggin who pointed out that if we delay the wake-up, we weed out 2 unnecessary contex switches, which matters because '__mark_inode_dirty()' is a hot-path function. This patch introduces a new function - 'bdi_wakeup_thread_delayed()', which sets up a timer to wake-up the bdi thread and returns. So the wake-up is delayed. We also delete the timer in bdi threads just before writing-back. And synchronously delete it when unregistering bdi. At the unregister point the bdi does not have any users, so no one can arm it again. Since now we take 'bdi->wb_lock' in the timer, which can execute in softirq context, we have to use 'spin_lock_bh()' for 'bdi->wb_lock'. This patch makes this change as well. This patch also moves the 'bdi_wb_init()' function down in the file to avoid forward-declaration of 'bdi_wakeup_thread_delayed()'. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| | * | | | | | | | writeback: prevent unnecessary bdi threads wakeupsArtem Bityutskiy2010-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Finally, we can get rid of unnecessary wake-ups in bdi threads, which are very bad for battery-driven devices. There are two types of activities bdi threads do: 1. process bdi works from the 'bdi->work_list' 2. periodic write-back So there are 2 sources of wake-up events for bdi threads: 1. 'bdi_queue_work()' - submits bdi works 2. '__mark_inode_dirty()' - adds dirty I/O to bdi's The former already has bdi wake-up code. The latter does not, and this patch adds it. '__mark_inode_dirty()' is hot-path function, but this patch adds another 'spin_lock(&bdi->wb_lock)' there. However, it is taken only in rare cases when the bdi has no dirty inodes. So adding this spinlock should be fine and should not affect performance. This patch makes sure bdi threads and the forker thread do not wake-up if there is nothing to do. The forker thread will nevertheless wake up at least every 5 min. to check whether it has to kill a bdi thread. This can also be optimized, but is not worth it. This patch also tidies up the warning about unregistered bid, and turns it from an ugly crocodile to a simple 'WARN()' statement. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| | * | | | | | | | writeback: move bdi threads exiting logic to the forker threadArtem Bityutskiy2010-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, bdi threads can decide to exit if there were no useful activities for 5 minutes. However, this causes nasty races: we can easily oops in the 'bdi_queue_work()' if the bdi thread decides to exit while we are waking it up. And even if we do not oops, but the bdi tread exits immediately after we wake it up, we'd lose the wake-up event and have an unnecessary delay (up to 5 secs) in the bdi work processing. This patch makes the forker thread to be the central place which not only creates bdi threads, but also kills them if they were inactive long enough. This better design-wise. Another reason why this change was done is to prepare for the further changes which will prevent the bdi threads from waking up every 5 sec and wasting power. Indeed, when the task does not wake up periodically anymore, it won't be able to exit either. This patch also moves the the 'wake_up_bit()' call from the bdi thread to the forker thread as well. So now the forker thread sets the BDI_pending bit, then forks the task or kills it, then clears the bit and wakes up the waiting process. The only process which may wain on the bit is 'bdi_wb_shutdown()'. This function was changed as well - now it first removes the bdi from the 'bdi_list', then waits on the 'BDI_pending' bit. Once it wakes up, it is guaranteed that the forker thread won't race with it, because the bdi is not visible. Note, the forker thread sets the 'BDI_pending' bit under the 'bdi->wb_lock' which is essential for proper serialization. And additionally, when we change 'bdi->wb.task', we now take the 'bdi->work_lock', to make sure that we do not lose wake-ups which we otherwise would when raced with, say, 'bdi_queue_work()'. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| | * | | | | | | | writeback: move last_active to bdiArtem Bityutskiy2010-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently bdi threads use local variable 'last_active' which stores last time when the bdi thread did some useful work. Move this local variable to 'struct bdi_writeback'. This is just a preparation for the further patches which will make the forker thread decide when bdi threads should be killed. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| | * | | | | | | | writeback: do not remove bdi from bdi_listArtem Bityutskiy2010-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The forker thread removes bdis from 'bdi_list' before forking the bdi thread. But this is wrong for at least 2 reasons. Reason #1: if we temporary remove a bdi from the list, we may miss works which would otherwise be given to us. Reason #2: this is racy; indeed, 'bdi_wb_shutdown()' expects that bdis are always in the 'bdi_list' (see 'bdi_remove_from_list()'), and when it races with the forker thread, it can shut down the bdi thread at the same time as the forker creates it. This patch makes sure the forker thread never removes bdis from 'bdi_list' (which was suggested by Christoph Hellwig). In order to make sure that we do not race with 'bdi_wb_shutdown()', we have to hold the 'bdi_lock' while walking the 'bdi_list' and setting the 'BDI_pending' flag. NOTE! The error path is interesting. Currently, when we fail to create a bdi thread, we move the bdi to the tail of 'bdi_list'. But if we never remove the bdi from the list, we cannot move it to the tail either, because then we can mess up the RCU readers which walk the list. And also, we'll have the race described above in "Reason #2". But I not think that adding to the tail is any important so I just do not do that. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| | * | | | | | | | writeback: do not lose wake-ups in bdi threadsArtem Bityutskiy2010-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, bdi threads ('bdi_writeback_thread()') can lose wake-ups. For example, if 'bdi_queue_work()' is executed after the bdi thread have had finished 'wb_do_writeback()' but before it called 'schedule_timeout_interruptible()'. To fix this issue, we have to check whether we have works to process after we have changed the task state to 'TASK_INTERRUPTIBLE'. This patch also clean-ups handling of the cases when 'dirty_writeback_interval' is zero or non-zero. Additionally, this patch also removes unneeded 'list_empty_careful()' call. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| | * | | | | | | | writeback: harmonize writeback threads namingArtem Bityutskiy2010-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The write-back code mixes words "thread" and "task" for the same things. This is not a big deal, but still an inconsistency. hch: a convention I tend to use and I've seen in various places is to always use _task for the storage of the task_struct pointer, and thread everywhere else. This especially helps with having foo_thread for the actual thread and foo_task for a global variable keeping the task_struct pointer This patch renames: * 'bdi_add_default_flusher_task()' -> 'bdi_add_default_flusher_thread()' * 'bdi_forker_task()' -> 'bdi_forker_thread()' because bdi threads are 'bdi_writeback_thread()', so these names are more consistent. This patch also amends commentaries and makes them refer the forker and bdi threads as "thread", not "task". Also, while on it, make 'bdi_add_default_flusher_thread()' declaration use 'static void' instead of 'void static' and make checkpatch.pl happy. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| | * | | | | | | | coda: fixup clash with block layer REQ_* definesJens Axboe2010-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CODA should not be using defines in the global name space of that nature, prefix them with CODA_. Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| | * | | | | | | | writeback: remove wb in get_next_work_itemMinchan Kim2010-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 83ba7b07 cleans up the writeback. So we don't use wb any more in get_next_work_item. Let's remove unnecessary argument. CC: Christoph Hellwig <hch@lst.de> Signed-off-by: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| | * | | | | | | | splice: fix misuse of SPLICE_F_NONBLOCKMiklos Szeredi2010-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SPLICE_F_NONBLOCK is clearly documented to only affect blocking on the pipe. In __generic_file_splice_read(), however, it causes an EAGAIN if the page is currently being read. This makes it impossible to write an application that only wants failure if the pipe is full. For example if the same process is handling both ends of a pipe and isn't otherwise able to determine whether a splice to the pipe will fill it or not. We could make the read non-blocking on O_NONBLOCK or some other splice flag, but for now this is the simplest fix. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| | * | | | | | | | block: push down BKL into .open and .releaseArnd Bergmann2010-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The open and release block_device_operations are currently called with the BKL held. In order to change that, we must first make sure that all drivers that currently rely on this have no regressions. This blindly pushes the BKL into all .open and .release operations for all block drivers to prepare for the next step. The drivers can subsequently replace the BKL with their own locks or remove it completely when it can be shown that it is not needed. The functions blkdev_get and blkdev_put are the only remaining users of the big kernel lock in the block layer, besides a few uses in the ioctl code, none of which need to serialize with blkdev_{get,put}. Most of these two functions is also under the protection of bdev->bd_mutex, including the actual calls to ->open and ->release, and the common code does not access any global data structures that need the BKL. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| | * | | | | | | | writeback: Add tracing to balance_dirty_pagesDave Chinner2010-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tracing high level background writeback events is good, but it doesn't give the entire picture. Add visibility into write throttling to catch IO dispatched by foreground throttling of processing dirtying lots of pages. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| | * | | | | | | | writeback: Initial tracing supportDave Chinner2010-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Trace queue/sched/exec parts of the writeback loop. This provides insight into when and why flusher threads are scheduled to run. e.g a sync invocation leaves traces like: sync-[...]: writeback_queue: bdi 8:0: sb_dev 8:1 nr_pages=7712 sync_mode=0 kupdate=0 range_cyclic=0 background=0 flush-8:0-[...]: writeback_exec: bdi 8:0: sb_dev 8:1 nr_pages=7712 sync_mode=0 kupdate=0 range_cyclic=0 background=0 This also lays the foundation for adding more writeback tracing to provide deeper insight into the whole writeback path. The original tracing code is from Jens Axboe, though this version is a rewrite as a result of the code being traced changing significantly. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| | * | | | | | | | gcc-4.6: fs: fix unused but set warningsAndi Kleen2010-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No real bugs I believe, just some dead code, and some shut up code. Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Eric Paris <eparis@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>