litmus-rt.git - The LITMUS^RT kernel.

	Commit message (Collapse)	Author	Age
*	Make smp_send_pull_timers() optional.	Bjoern B. Brandenburg	2010-06-09
\| \| \| \| \|	There is currently no need to implement this in ARM. So let's make it optional instead.
*	Add litmus-timer flag for release heap timer	Andrea Bastoni	2010-06-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In PreemptRT hrtimers are normally executed as softirq. When marking the release timer as IRQ safe, an extra __run_hrtimer() call is performed in hrtimer_enqueue_reprogram(). This cannot be done in Litmus as it will result in a deadlock on ready_lock. [ 151.834801] ============================================= [ 151.835015] [ INFO: possible recursive locking detected ] [ 151.835015] 2.6.33.5-rt22-litmus2010 #405 [ 151.835015] --------------------------------------------- [ 151.835015] find/1405 is trying to acquire lock: [ 151.835015] (&rt->ready_lock){-.....}, at: [<ffffffff812142db>] gsnedf_release_jobs+0x2b/0x60 [ 151.835015] [ 151.835015] but task is already holding lock: [ 151.835015] (&rt->ready_lock){-.....}, at: [<ffffffff81213e5c>] gsnedf_schedule+0x5c/0x380 [ 151.835015] [ 151.835015] other info that might help us debug this: [ 151.835015] 5 locks held by find/1405: [ 151.835015] #0: (&sb->s_type->i_mutex_key#7){+.+.+.}, at: [<ffffffff810f364a>] vfs_readdir+0x7a/0xd0 [ 151.835015] #1: (rcu_read_lock){.+.+..}, at: [<ffffffff810f8810>] __d_lookup+0x0/0x1e0 [ 151.835015] #2: (&rq->lock){-...-.}, at: [<ffffffff81480ee7>] __schedule+0x97/0x9e0 [ 151.835015] #3: (&rt->ready_lock){-.....}, at: [<ffffffff81213e5c>] gsnedf_schedule+0x5c/0x380 [ 151.835015] #4: (&rt->tobe_lock){......}, at: [<ffffffff81213a74>] requeue+0x64/0x90 [ 151.835015] stack backtrace: [ 151.835015] Pid: 1405, comm: find Not tainted 2.6.33.5-rt22-litmus2010 #405 [ 151.835015] Call Trace: [ 151.835015] [<ffffffff8106fb52>] __lock_acquire+0x1572/0x1d60 [ 151.835015] [<ffffffff81217509>] ? sched_trace_log_message+0xd9/0x110 [ 151.835015] [<ffffffff8106c01d>] ? trace_hardirqs_off+0xd/0x10 [ 151.835015] [<ffffffff81217538>] ? sched_trace_log_message+0x108/0x110 [ 151.835015] [<ffffffff8107044b>] lock_acquire+0x10b/0x140 [ 151.835015] [<ffffffff812142db>] ? gsnedf_release_jobs+0x2b/0x60 [ 151.835015] [<ffffffff814842d6>] _raw_spin_lock_irqsave+0x46/0x60 [ 151.835015] [<ffffffff812142db>] ? gsnedf_release_jobs+0x2b/0x60 [ 151.835015] [<ffffffff812142db>] gsnedf_release_jobs+0x2b/0x60 [ 151.835015] [<ffffffff81212054>] on_release_timer+0x104/0x140 [ 151.835015] [<ffffffff81211f50>] ? on_release_timer+0x0/0x140 [ 151.835015] [<ffffffff8105edc6>] __run_hrtimer+0x116/0x210 hrtimer_enqueue_reprogram() -- inline function [ 151.835015] [<ffffffff8105f9f4>] __hrtimer_start_range_ns+0x1e4/0x2b0 [ 151.835015] [<ffffffff81211da7>] __add_release+0x417/0x420 [ 151.835015] [<ffffffff81213a86>] requeue+0x76/0x90 [ 151.835015] [<ffffffff81213ba3>] gsnedf_job_arrival+0x13/0x30
*	Merge branch 'linux-preempt-rt' into wip-merge-preempt-2.6.33-rt	Andrea Bastoni	2010-05-31
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Simple merge between 2010.1 (vanilla 2.6.32) and 2.6.33.5-rt22 with conflicts resolved. This commit does not compile, the following main problems are still unresolved: - spinlock -> raw_spinlock (semantics: spinlocks can sleep in -rt) - rwlock and wait_queue_t lock - kfifo API changes - sched_class API changes (get_rr_interval() signature change) Conflicts: Makefile arch/x86/include/asm/unistd_32.h arch/x86/kernel/syscall_table_32.S include/linux/hrtimer.h kernel/hrtimer.c kernel/sched.c kernel/sched_fair.c
\| *	Merge stable/linux-2.6.33.y into rt/2.6.33	Thomas Gleixner	2010-05-31
\| \|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: Makefile Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| \| *	revert "procfs: provide stack information for threads" and its fixup commits	Robin Holt	2010-05-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 34441427aab4bdb3069a4ffcda69a99357abcb2e upstream. Originally, commit d899bf7b ("procfs: provide stack information for threads") attempted to introduce a new feature for showing where the threadstack was located and how many pages are being utilized by the stack. Commit c44972f1 ("procfs: disable per-task stack usage on NOMMU") was applied to fix the NO_MMU case. Commit 89240ba0 ("x86, fs: Fix x86 procfs stack information for threads on 64-bit") was applied to fix a bug in ia32 executables being loaded. Commit 9ebd4eba7 ("procfs: fix /proc/<pid>/stat stack pointer for kernel threads") was applied to fix a bug which had kernel threads printing a userland stack address. Commit 1306d603f ('proc: partially revert "procfs: provide stack information for threads"') was then applied to revert the stack pages being used to solve a significant performance regression. This patch nearly undoes the effect of all these patches. The reason for reverting these is it provides an unusable value in field 28. For x86_64, a fork will result in the task->stack_start value being updated to the current user top of stack and not the stack start address. This unpredictability of the stack_start value makes it worthless. That includes the intended use of showing how much stack space a thread has. Other architectures will get different values. As an example, ia64 gets 0. The do_fork() and copy_process() functions appear to treat the stack_start and stack_size parameters as architecture specific. I only partially reverted c44972f1 ("procfs: disable per-task stack usage on NOMMU") . If I had completely reverted it, I would have had to change mm/Makefile only build pagewalk.o when CONFIG_PROC_PAGE_MONITOR is configured. Since I could not test the builds without significant effort, I decided to not change mm/Makefile. I only partially reverted 89240ba0 ("x86, fs: Fix x86 procfs stack information for threads on 64-bit") . I left the KSTK_ESP() change in place as that seemed worthwhile. Signed-off-by: Robin Holt <holt@sgi.com> Cc: Stefani Seibold <stefani@seibold.net> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Michal Simek <monstr@monstr.eu> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
\| * \|	net: Make [dis/en]able_irq_*_lockdep() RT safe	Thomas Gleixner	2010-05-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The lockdep irqoff protection which is used to prevent lockdep false positives leads to "scheduling while atomic" and "might sleep" bug floods. Make the irq disabling depend on !RT. Reported-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	Merge branch '2.6.33.4' into rt/2.6.33	Thomas Gleixner	2010-05-12
\| \|\\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: Makefile Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| \| *	libata: Fix accesses at LBA28 boundary (old bug, but nasty) (v2)	Mark Lord	2010-05-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 45c4d015a92f72ec47acd0c7557abdc0c8a6499d upstream. Most drives from Seagate, Hitachi, and possibly other brands, do not allow LBA28 access to sector number 0x0fffffff (2^28 - 1). So instead use LBA48 for such accesses. This bug could bite a lot of systems, especially when the user has taken care to align partitions to 4KB boundaries. On misaligned systems, it is less likely to be encountered, since a 4KB read would end at 0x10000000 rather than at 0x0fffffff. Signed-off-by: Mark Lord <mlord@pobox.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
\| \| *	hugetlb: fix infinite loop in get_futex_key() when backed by huge pages	Mel Gorman	2010-05-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 23be7468e8802a2ac1de6ee3eecb3ec7f14dc703 upstream. If a futex key happens to be located within a huge page mapped MAP_PRIVATE, get_futex_key() can go into an infinite loop waiting for a page->mapping that will never exist. See https://bugzilla.redhat.com/show_bug.cgi?id=552257 for more details about the problem. This patch makes page->mapping a poisoned value that includes PAGE_MAPPING_ANON mapped MAP_PRIVATE. This is enough for futex to continue but because of PAGE_MAPPING_ANON, the poisoned value is not dereferenced or used by futex. No other part of the VM should be dereferencing the page->mapping of a hugetlbfs page as its page cache is not on the LRU. This patch fixes the problem with the test case described in the bugzilla. [akpm@linux-foundation.org: mel cant spel] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Darren Hart <darren@dvhart.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
\| * \|	fs: Fix namespace related hangs	John Stultz	2010-04-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Nick converted the dentry->d_mounted counter to a flag, however with namespaces, dentries can be mounted multiple times (and more importantly unmounted multiple times). If a namespace was created and then released, the unmount_tree would remove the DCACHE_MOUNTED flag and that would make d_mountpoint fail, causing the mounts to be lost. This patch coverts it back to a counter, and adds some extra WARN_ONs to make sure things are accounted properly. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org> Cc: Nick Piggin <npiggin@suse.de> LKML-Reference: <1272522942.1967.12.camel@work-vm> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	hugetlb: fix infinite loop in get_futex_key() when backed by huge pages	Mel Gorman	2010-04-28
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a futex key happens to be located within a huge page mapped MAP_PRIVATE, get_futex_key() can go into an infinite loop waiting for a page->mapping that will never exist. See https://bugzilla.redhat.com/show_bug.cgi?id=552257 for more details about the problem. This patch makes page->mapping a poisoned value that includes PAGE_MAPPING_ANON mapped MAP_PRIVATE. This is enough for futex to continue but because of PAGE_MAPPING_ANON, the poisoned value is not dereferenced or used by futex. No other part of the VM should be dereferencing the page->mapping of a hugetlbfs page as its page cache is not on the LRU. This patch fixes the problem with the test case described in the bugzilla. [ upstream commit: 23be7468e8802a2ac1de6ee3eecb3ec7f14dc703 ] [akpm@linux-foundation.org: mel cant spel] Signed-off-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Darren Hart <darren@dvhart.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	Fixup some compilation warnings and errors	John Stultz	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Amit Arora noticed some compile issues with coda, and an fs.h include issue, so so this patch fixes those along with btrfs warnings. Thanks to Amit for the testing! Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	Revert d_count back to an atomic_t	John Stultz	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch reverts the portion of Nick's vfs scalability patch that converts the dentry d_count from an atomic_t to an int protected by the d_lock. This greatly improves vfs scalability with the -rt kernel, as the extra lock contention on the d_lock hurts very badly when CONFIG_PREEMPT_RT is enabled and the spinlocks become rtmutexes. Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	Fix vfsmount_read_lock to work with -rt	John Stultz	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Because vfsmount_read_lock aquires the vfsmount spinlock for the current cpu, it causes problems wiht -rt, as you might migrate between cpus between a lock and unlock. This patch fixes the issue by having the caller pick a cpu, then consistently use that cpu between the lock and unlock. We may migrate inbetween lock and unlock, but that's ok because we're not doing anything cpu specific, other then avoiding contention on the read side across the cpus. Its not pretty, but it works and statistically shouldn't hurt performance. Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	Fixups from 09102009.patch.gz	Nick Piggin	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch is just the delta from Nick's 06102009 and his 09102009 megapatches Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	fs-sb-inodes-percpu	Nick Piggin	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	fs-nr_inodes-percpu	Eric Dumazet	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fs: inode per-cpu nr_inodes counter Avoids cache line ping pongs between cpus and prepare next patch, because updates of nr_inodes dont need inode_lock anymore. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	fs-inode-nr_inodes	Nick Piggin	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	XXX: this should be folded back into the individual locking patches Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	fs-inode_rcu	Nick Piggin	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	RCU free the struct inode. This will allow: - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want to take i_lock no longer need to take sb_inode_list_lock to walk the list in the first place. This will simplify and optimize locking. - eventually, completely write-free RCU path walking. The inode must be consulted for permissions when walking, so a write-free reference (ie. RCU is helpful). - can potentially simplify things a bit in VM land. May not need to take the page lock to get back to the page->mapping. - can remove some nested trylock loops in dcache code todo: convert all filesystems Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	fs-inode_lock-scale-10	Nick Piggin	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Impelemnt lazy inode lru similarly to dcache. This should reduce inode list lock acquisition (todo: measure). Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	fs-inode_lock-scale-8	Nick Piggin	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make inode_hash_lock private by adding a function __remove_inode_hash that can be used by filesystems defining their own drop_inode functions. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	fs-inode_lock-scale-7	Nick Piggin	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove the global inode_lock Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	fs-inode_lock-scale-5	Nick Piggin	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Protect inodes_stat statistics with atomic ops rather than inode_lock. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	fs-inode_lock-scale-6	Nick Piggin	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a new lock, wb_inode_list_lock, to protect i_list and various lists which the inode can be put onto. XXX: haven't audited ocfs2 Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	fs-inode_lock-scale-4	Nick Piggin	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Protect inode->i_count with i_lock, rather than having it atomic. Next step should also be to move things together (eg. the refcount increment into d_instantiate, which will remove a lock/unlock cycle on i_lock). Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	fs-inode_lock-scale-2	Nick Piggin	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a new lock, inode_hash_lock, to protect the inode hash table lists. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	fs-inode_lock-scale	Nick Piggin	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Protect sb->s_inodes with a new lock, sb_inode_list_lock. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	dcache-percpu-nr_dentry	Nick Piggin	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The nr_dentry stat is a globally touched cacheline and atomic operation twice over the lifetime of a dentry. It is used for the benfit of userspace only. We could make a per-cpu counter or something for it, but it is only accessed via proc, so we could use slab stats. XXX: must implement slab routines to return stats for a single cache, and implement the proc handler. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	dcache-split-inode_lock	Nick Piggin	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	dcache_inode_lock can be replaced with per-inode locking. Use existing inode->i_lock for this. This is slightly non-trivial because we sometimes need to find the inode from the dentry, which requires d_inode to be stabilised (either with refcount or d_lock). Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	dcache-chain-hashlock	Nick Piggin	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We can turn the dcache hash locking from a global dcache_hash_lock into per-bucket locking. XXX: should probably use a bit lock in the first bit of the hash pointers to avoid any space bloating (non-atomic unlock means no extra atomics either) Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	fs-dcache_lock-remove	Nick Piggin	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	dcache_lock no longer protects anything (I hope). remove it. This breaks a lot of the tree where I haven't thought about the problem, but it simplifies the dcache.c code quite a bit (and it's also probably a good thing to break unconverted code). So I include this here before making further changes to the locking. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	fs-dcache-scale-i_dentry	Nick Piggin	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a new lock, dcache_inode_lock, to protect the inode's i_dentry list from concurrent modification. d_alias is also protected by d_lock. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	fs-dcache-scale-d_subdirs	Nick Piggin	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Protect d_subdirs and d_child with d_lock, except in filesystems that aren't using dcache_lock for these anyway (eg. using i_mutex). XXX: probably don't need parent lock in inotify (because child lock should stabilize parent). Also, possibly some filesystems don't need so much locking (eg. of child dentry when modifying d_child, so long as parent is locked)... but be on the safe side. Hmm, maybe we should just say d_child list is protected by d_parent->d_lock. d_parent could remain protected with d_lock. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	fs-dcache-scale-d_count	Nick Piggin	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make d_count non-atomic and protect it with d_lock. This allows us to ensure a 0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when we start protecting many other dentry members with d_lock. XXX: This patch does not boot on its own Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	fs-dcache-scale-nr_dentry	Nick Piggin	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make dentry_stat_t.nr_dentry an atomic_t type, and move it from under dcache_lock. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	fs-dcache-scale-d_hash	Nick Piggin	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a new lock, dcache_hash_lock, to protect the dcache hash table from concurrent modification. d_hash is also protected by d_lock. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	fs-mntget-scale	Nick Piggin	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Improve scalability of mntget/mntput by using per-cpu counters protected by the reader side of the brlock vfsmount_lock. mnt_mounted keeps track of whether the vfsmount is actually attached to the tree so we can shortcut expensive checks in mntput. XXX: count_mnt_count needs write lock. Document this and/or revisit locking (eg. look at writers count) Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	fs-vfsmount_lock-scale	Nick Piggin	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use a brlock for the vfsmount lock. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	fs-files_lock-scale	John Stultz	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Improve scalability of files_lock by adding per-cpu, per-sb files lists, protected with per-cpu locking. Effectively turning it into a big-writer lock. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	fs-files_list-improve	John Stultz	2010-04-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Lock tty_files with a new spinlock, tty_files_lock; provide helpers to manipulate the per-sb files list; unexport the files_lock spinlock. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| * \|	Merge branch 'master' of	Thomas Gleixner	2010-04-27
\| \|\\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.33.y into rt/2.6.33 Conflicts: Makefile arch/x86/include/asm/rwsem.h Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
\| \| *	KVM: Increase NR_IOBUS_DEVS limit to 200	Sridhar Samudrala	2010-04-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(Cherry-picked from commit e80e2a60ff7914dae691345a976c80bbbff3ec74) This patch increases the current hardcoded limit of NR_IOBUS_DEVS from 6 to 200. We are hitting this limit when creating a guest with more than 1 virtio-net device using vhost-net backend. Each virtio-net device requires 2 such devices to service notifications from rx/tx queues. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
\| \| *	KVM: fix the handling of dirty bitmaps to avoid overflows	Takuya Yoshikawa	2010-04-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	(Cherry-picked from commit 87bf6e7de1134f48681fd2ce4b7c1ec45458cb6d) Int is not long enough to store the size of a dirty bitmap. This patch fixes this problem with the introduction of a wrapper function to calculate the sizes of dirty bitmaps. Note: in mark_page_dirty(), we have to consider the fact that __set_bit() takes the offset as int, not long. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
\| \| *	module: fix __module_ref_addr()	Mathieu Desnoyers	2010-04-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The __module_ref_addr() problem disappears in 2.6.34-rc kernels because these percpu accesses were re-factored. __module_ref_addr() should use per_cpu_ptr() to obfuscate the pointer (RELOC_HIDE is needed for per cpu pointers). This non-standard per-cpu pointer use has been introduced by commit 720eba31f47aeade8ec130ca7f4353223c49170f It causes a NULL pointer exception on some configurations when CONFIG_TRACING is enabled on 2.6.33. This patch fixes the problem (acknowledged by Randy who reported the bug). It did not appear to hurt previously because most of the accesses were done through local_inc, which probably obfuscated the access enough that no compiler optimizations were done. But with local_read() done when CONFIG_TRACING is active, this becomes a problem. Non-CONFIG_TRACING is probably affected as well (module.c contains local_set and local_read that use __module_ref_addr()), but I guess nobody noticed because we've been lucky enough that the compiler did not generate the inappropriate optimization pattern there. This patch should be queued for the 2.6.29.x through 2.6.33.x stable branches. (tested on 2.6.33.1 x86_64) Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Tested-by: Randy Dunlap <randy.dunlap@oracle.com> CC: Eric Dumazet <dada1@cosmosbay.com> CC: Rusty Russell <rusty@rustcorp.com.au> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Tejun Heo <tj@kernel.org> CC: Ingo Molnar <mingo@elte.hu> CC: Andrew Morton <akpm@linux-foundation.org> CC: Linus Torvalds <torvalds@linux-foundation.org> CC: Greg Kroah-Hartman <gregkh@suse.de> CC: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
\| \| *	x86/PCI: irq and pci_ids patch for Intel Cougar Point DeviceIDs	Seth Heasley	2010-04-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 93da6202264ce1256b04db8008a43882ae62d060 upstream. This patch adds the Intel Cougar Point (PCH) LPC and SMBus Controller DeviceIDs. Signed-off-by: Seth Heasley <seth.heasley@intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
\| \| *	pci: Update pci_set_vga_state() to call arch functions	Mike Travis	2010-04-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 95a8b6efc5d07103583f706c8a5889437d537939 upstream. Update pci_set_vga_state to call arch dependent functions to enable Legacy VGA I/O transactions to be redirected to correct target. [akpm@linux-foundation.org: make pci_register_set_vga_state() __init] Signed-off-by: Mike Travis <travis@sgi.com> LKML-Reference: <201002022238.o12McE1J018723@imap1.linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Robin Holt <holt@sgi.com> Cc: Jack Steiner <steiner@sgi.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: David Airlie <airlied@linux.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Cc: maximilian attems <max@stro.at> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
\| \| *	dm ioctl: introduce flag indicating uevent was generated	Peter Rajnoha	2010-04-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 3abf85b5b5851b5f28d3d8a920ebb844edd08352 upstream. Set a new DM_UEVENT_GENERATED_FLAG when returning from ioctls to indicate that a uevent was actually generated. This tells the userspace caller that it may need to wait for the event to be processed. Signed-off-by: Peter Rajnoha <prajnoha@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
\| \| *	NFSv4: fix delegated locking	Trond Myklebust	2010-04-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 0df5dd4aae211edeeeb84f7f84f6d093406d7c22 upstream. Arnaud Giersch reports that NFSv4 locking is broken when we hold a delegation since commit 8e469ebd6dc32cbaf620e134d79f740bf0ebab79 (NFSv4: Don't allow posix locking against servers that don't support it). According to Arnaud, the lock succeeds the first time he opens the file (since we cannot do a delegated open) but then fails after we start using delegated opens. The following patch fixes it by ensuring that locking behaviour is governed by a per-filesystem capability flag that is initially set, but gets cleared if the server ever returns an OPEN without the NFS4_OPEN_RESULT_LOCKTYPE_POSIX flag being set. Reported-by: Arnaud Giersch <arnaud.giersch@iut-bm.univ-fcomte.fr> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
\| \| *	include/linux/kfifo.h: fix INIT_KFIFO()	David Härdeman	2010-04-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 530cd330dc3865e3107304a6e84fdc332aa72f7d upstream. DECLARE_KFIFO creates a union with a struct kfifo and a buffer array with size [size + sizeof(struct kfifo)]. INIT_KFIFO then sets the buffer pointer in struct kfifo to point to the beginning of the buffer array which means that the first call to kfifo_in will overwrite members of the struct kfifo. Signed-off-by: David Härdeman <david@hardeman.nu> Acked-by: Stefani Seibold <stefani@seibold.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
\| \| *	raw: fsync method is now required	Anton Blanchard	2010-04-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 55ab3a1ff843e3f0e24d2da44e71bffa5d853010 upstream. Commit 148f948ba877f4d3cdef036b1ff6d9f68986706a (vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode) broke the raw driver. We now call through generic_file_aio_write -> generic_write_sync -> vfs_fsync_range. vfs_fsync_range has: if (!fop \|\| !fop->fsync) { ret = -EINVAL; goto out; } But drivers/char/raw.c doesn't set an fsync method. We have two options: fix it or remove the raw driver completely. I'm happy to do either, the fact this has been broken for so long suggests it is rarely used. The patch below adds an fsync method to the raw driver. My knowledge of the block layer is pretty sketchy so this could do with a once over. If we instead decide to remove the raw driver, this patch might still be useful as a backport to 2.6.33 and 2.6.32. Signed-off-by: Anton Blanchard <anton@samba.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <jens.axboe@oracle.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Tested-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>