| Commit message | Author | Age |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When (re)setting the Litmus task policy to BACKGROUND, kfree() is called
while in atomic context. This cannot be done in PreemptRT as it will
deadlock (this commit fixes the trace below).
Properly freeing the rt-task structures while not holding the relevant
locks breaks various assumptions in the code, so these operations cannot
easily be postponed to a moment when we no longer hold the locks.
The solution is a bad hack: if the policy is reset to a BACKGROUND task,
we leak some memory.
Better solutions are very welcome.
[ 52.850018] =======================================================
[ 52.850018] [ INFO: possible circular locking dependency detected ]
[ 52.850018] 2.6.33.5-rt22-litmus2010 #441
[ 52.850018] -------------------------------------------------------
[ 52.850018] longtest_g/1637 is trying to acquire lock:
[ 52.850018] (per_cpu__lock_slab_locked){......}, at: [<ffffffff8142b783>] rt_spin_lock_slowlock+0x33/0x360
[ 52.850018]
[ 52.850018] but task is already holding lock:
[ 52.850018] (&rq->lock){-...-.}, at: [<ffffffff8102ee52>] __sched_setscheduler+0x112/0x4f0
[ 52.850018]
[ 52.850018] which lock already depends on the new lock.
[ 52.850018]
[ 52.850018]
[ 52.850018] the existing dependency chain (in reverse order) is:
[ 52.850018]
[ 52.850018] -> #2 (&rq->lock){-...-.}:
[ 52.850018] [<ffffffff81060954>] __lock_acquire+0x13c4/0x1cd0
[ 52.850018] [<ffffffff810612bc>] lock_acquire+0x5c/0x80
[ 52.850018] [<ffffffff8142cc76>] _raw_spin_lock+0x36/0x50
[ 52.850018] [<ffffffff8102551e>] task_rq_lock+0x5e/0xb0
[ 52.850018] [<ffffffff8102fa44>] try_to_wake_up+0x64/0x420
[ 52.850018] [<ffffffff8102fe65>] wake_up_process_mutex+0x15/0x20
[ 52.850018] [<ffffffff8106708e>] wakeup_next_waiter+0x9e/0x1b0
[ 52.850018] [<ffffffff8142b70f>] rt_spin_lock_slowunlock+0x4f/0x90
[ 52.850018] [<ffffffff8142c419>] rt_spin_unlock+0x49/0x50
[ 52.850018] [<ffffffff8102ceb4>] complete+0x44/0x50
[ 52.850018] [<ffffffff8104d07c>] kthread+0x7c/0xc0
[ 52.850018] [<ffffffff81003214>] kernel_thread_helper+0x4/0x10
[ 52.850018]
[ 52.850018] -> #1 (&p->pi_lock){......}:
[ 52.850018] [<ffffffff81060954>] __lock_acquire+0x13c4/0x1cd0
[ 52.850018] [<ffffffff810612bc>] lock_acquire+0x5c/0x80
[ 52.850018] [<ffffffff8142cc76>] _raw_spin_lock+0x36/0x50
[ 52.850018] [<ffffffff81066859>] task_blocks_on_rt_mutex+0x39/0x210
[ 52.850018] [<ffffffff8142b9c3>] rt_spin_lock_slowlock+0x273/0x360
[ 52.850018] [<ffffffff8142c383>] rt_spin_lock+0x43/0x90
[ 52.850018] [<ffffffff810ab23e>] _slab_irq_disable+0x4e/0x70
[ 52.850018] [<ffffffff810ab93f>] kmem_cache_free+0x1f/0xf0
[ 52.850018] [<ffffffff810b34a1>] file_free_rcu+0x31/0x40
[ 52.850018] [<ffffffff81074a48>] __rcu_process_callbacks+0x128/0x3a0
[ 52.850018] [<ffffffff81074d3b>] rcu_process_callbacks+0x7b/0x90
[ 52.850018] [<ffffffff8103a96f>] run_ksoftirqd+0x14f/0x310
[ 52.850018] [<ffffffff8104d0a6>] kthread+0xa6/0xc0
[ 52.850018] [<ffffffff81003214>] kernel_thread_helper+0x4/0x10
[ 52.850018]
[ 52.850018] -> #0 (per_cpu__lock_slab_locked){......}:
[ 52.850018] [<ffffffff8106120c>] __lock_acquire+0x1c7c/0x1cd0
[ 52.850018] [<ffffffff810612bc>] lock_acquire+0x5c/0x80
[ 52.850018] [<ffffffff8142cd91>] _raw_spin_lock_irqsave+0x41/0x60
[ 52.850018] [<ffffffff8142b783>] rt_spin_lock_slowlock+0x33/0x360
[ 52.850018] [<ffffffff8142c383>] rt_spin_lock+0x43/0x90
[ 52.850018] [<ffffffff810ab23e>] _slab_irq_disable+0x4e/0x70
[ 52.850018] [<ffffffff810ab93f>] kmem_cache_free+0x1f/0xf0
[ 52.850018] [<ffffffff811d1374>] litmus_exit_task+0x84/0x130
[ 52.850018] [<ffffffff8102f11f>] __sched_setscheduler+0x3df/0x4f0
[ 52.850018] [<ffffffff8102f24e>] sched_setscheduler+0xe/0x10
[ 52.850018] [<ffffffff8102f30d>] do_sched_setscheduler+0xbd/0x100
[ 52.850018] [<ffffffff8102f384>] sys_sched_setscheduler+0x14/0x20
[ 52.850018] [<ffffffff8100246b>] system_call_fastpath+0x16/0x1b
[ 52.850018]
[ 52.850018] other info that might help us debug this:
[ 52.850018]
[ 52.850018] 3 locks held by longtest_g/1637:
[ 52.850018] #0: (rcu_read_lock){.+.+..}, at: [<ffffffff8102f2bf>] do_sched_setscheduler+0x6f/0x100
[ 52.850018] #1: (&p->pi_lock){......}, at: [<ffffffff8102ee0e>] __sched_setscheduler+0xce/0x4f0
[ 52.850018] #2: (&rq->lock){-...-.}, at: [<ffffffff8102ee52>] __sched_setscheduler+0x112/0x4f0
[ 52.850018]
[ 52.850018] stack backtrace:
[ 52.850018] Pid: 1637, comm: longtest_g Tainted: G W 2.6.33.5-rt22-litmus2010 #441
[ 52.850018] Call Trace:
[ 52.850018] [<ffffffff8105ef10>] print_circular_bug+0x100/0x110
[ 52.850018] [<ffffffff8106120c>] __lock_acquire+0x1c7c/0x1cd0
[ 52.850018] [<ffffffff81429659>] ? printk+0x67/0x69
[ 52.850018] [<ffffffff8100246b>] ? system_call_fastpath+0x16/0x1b
[ 52.850018] [<ffffffff8100246b>] ? system_call_fastpath+0x16/0x1b
[ 52.850018] [<ffffffff810612bc>] lock_acquire+0x5c/0x80
[ 52.850018] [<ffffffff8142b783>] ? rt_spin_lock_slowlock+0x33/0x360
[ 52.850018] [<ffffffff8142cd91>] _raw_spin_lock_irqsave+0x41/0x60
[ 52.850018] [<ffffffff8142b783>] ? rt_spin_lock_slowlock+0x33/0x360
[ 52.850018] [<ffffffff81005649>] ? dump_trace+0x129/0x330
[ 52.850018] [<ffffffff8142b783>] rt_spin_lock_slowlock+0x33/0x360
[ 52.850018] [<ffffffff8142c383>] rt_spin_lock+0x43/0x90
[ 52.850018] [<ffffffff810ab23e>] _slab_irq_disable+0x4e/0x70
[ 52.850018] [<ffffffff810ab93f>] kmem_cache_free+0x1f/0xf0
[ 52.850018] [<ffffffff811d1374>] litmus_exit_task+0x84/0x130
[ 52.850018] [<ffffffff810242d8>] ? dequeue_task+0x48/0x90
[ 52.850018] [<ffffffff8102f11f>] __sched_setscheduler+0x3df/0x4f0
[ 52.850018] [<ffffffff8102f24e>] sched_setscheduler+0xe/0x10
[ 52.850018] [<ffffffff8102f30d>] do_sched_setscheduler+0xbd/0x100
[ 52.850018] [<ffffffff8102f2bf>] ? do_sched_setscheduler+0x6f/0x100
[ 52.850018] [<ffffffff8142bf02>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 52.850018] [<ffffffff8102f384>] sys_sched_setscheduler+0x14/0x20
[ 52.850018] [<ffffffff8100246b>] system_call_fastpath+0x16/0x1b
|
|
|
|
|
|
|
| |
The wait_queue_t lock used to protect the fmlp_down and fmlp_up operations
in the vanilla kernel is a sleeping lock in PreemptRT, but we call FMLP
operations from atomic contexts. This commit replaces the wait_queue
lock with a raw_spin_lock.
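A minimal sketch of the change described above, assuming a simplified
semaphore layout (the struct and function names below are illustrative,
not the actual litmus/fmlp.c code): the operations serialize on a
raw_spinlock_t of their own instead of the wait queue's sleeping
spinlock_t.

    #include <linux/sched.h>
    #include <linux/spinlock.h>
    #include <linux/wait.h>

    /* Illustrative semaphore layout; the real one lives in the LITMUS tree. */
    struct fmlp_semaphore_sketch {
        raw_spinlock_t     lock;    /* replaces use of sem->wait.lock */
        wait_queue_head_t  wait;    /* waiters still queue here */
        struct task_struct *holder;
    };

    static void fmlp_down_sketch(struct fmlp_semaphore_sketch *sem)
    {
        unsigned long flags;

        /* raw_spin_lock_irqsave() never sleeps, even on PREEMPT_RT,
         * so it is safe in the atomic contexts FMLP runs in. */
        raw_spin_lock_irqsave(&sem->lock, flags);
        /* ... priority-inheritance and queueing logic elided ... */
        raw_spin_unlock_irqrestore(&sem->lock, flags);
    }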
|
|
|
|
|
|
|
|
|
|
|
| |
Feather-Trace rewrites instructions in the kernel's .text segment.
This segment may be write-protected if CONFIG_DEBUG_RODATA is selected.
In this case, fall back to the default flag-based Feather-Trace
implementation. In the future, we could either adopt the ftrace method
of rewriting .text addresses using non-.text mappings or we could
consider replacing Feather-Trace with ftrace altogether.
For now, this patch avoids unexpected runtime errors.
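The selection could be expressed with a preprocessor guard roughly like the
one below (a sketch; the symbol CONFIG_ARCH_HAS_FEATHER_TRACE and the
flag-based fallback macro are assumptions about the LITMUS^RT tree rather
than quotes from it):

    /* include/litmus/feather_trace.h (sketch) */
    #ifndef FEATHER_TRACE_SKETCH_H
    #define FEATHER_TRACE_SKETCH_H

    #if defined(CONFIG_ARCH_HAS_FEATHER_TRACE) && !defined(CONFIG_DEBUG_RODATA)
    /* The architecture rewrites .text and the kernel does not
     * write-protect it: use the fast, binary-rewriting version. */
    #include <asm/feather_trace.h>
    #else
    /* Generic fallback: every event tests a flag at runtime instead of
     * patching a jump into the instruction stream. */
    extern int ft_events_enabled[];   /* illustrative */
    #define ft_event(id, callback) \
        do { if (ft_events_enabled[id]) callback(); } while (0)
    #endif

    #endif /* FEATHER_TRACE_SKETCH_H */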
|
|
|
|
|
| |
There is currently no need to implement this on ARM,
so let's make it optional instead.
|
|
|
|
|
|
|
|
| |
Introduces CONFIG_RELEASE_MASTER and makes release
master support dependent on the new symbol. This is
useful because dedicated interrupt handling only applies
to "large" multicore platforms. This will allow us to
not implement smp_send_pull_timers() for all platforms.
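A sketch of how the new option can gate the code (field and helper names
here are illustrative; only smp_send_pull_timers() is taken from the
message, and its prototype is an assumption):

    #include <linux/types.h>

    #ifdef CONFIG_RELEASE_MASTER
    /* Assumed prototype: ask the given CPU to pull its armed release timers. */
    extern void smp_send_pull_timers(int cpu);
    #endif

    /* Illustrative domain state, not the exact LITMUS^RT rt_domain layout. */
    struct rt_domain_sketch {
    #ifdef CONFIG_RELEASE_MASTER
        int release_master;  /* CPU dedicated to release interrupts, or none */
    #endif
        /* ... release queues, locks, etc. ... */
        int dummy;
    };

    static inline void hint_pull_timers(int cpu)
    {
    #ifdef CONFIG_RELEASE_MASTER
        /* Only platforms that select RELEASE_MASTER must provide
         * smp_send_pull_timers(); others never reference the symbol. */
        smp_send_pull_timers(cpu);
    #endif
    }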
|
|
|
|
|
|
|
|
|
| |
The idea of the Feather-Trace default implementation is that LITMUS^RT should
work without a specialized Feather-Trace implementation present. This was
actually broken.
This commit changes litmus/feather_trace.h to include asm/feather_trace.h
only if it is actually provided by the architecture.
|
|
|
|
|
|
|
|
|
|
|
| |
In PreemptRT the new state TASK_RUNNING_MUTEX should be properly
considered as a running state. A task that is woken up when another task
sleeps on a lock runs with the TASK_RUNNING_MUTEX state to preserve the
state of the blocked task. To exploit these new running windows, we need
to account for TASK_RUNNING_MUTEX as a running state.
This solves the "BAD: migration invariant failed" condition that can be
otherwise observed when tracing.
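A sketch of the accounting change implied here (the helper name and exact
test are illustrative; TASK_RUNNING_MUTEX only exists in PreemptRT kernels,
hence the config guard):

    #include <linux/sched.h>

    /* Treat a task woken into TASK_RUNNING_MUTEX (because a lock owner is
     * sleeping) as running, just like TASK_RUNNING. */
    static inline int litmus_task_is_running(struct task_struct *t)
    {
    #ifdef CONFIG_PREEMPT_RT
        return t->state == TASK_RUNNING || t->state == TASK_RUNNING_MUTEX;
    #else
        return t->state == TASK_RUNNING;
    #endif
    }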
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In PreemptRT, hrtimers are normally executed in softirq context. When the
release timer is marked as IRQ safe, an extra __run_hrtimer() call is
performed in hrtimer_enqueue_reprogram(). This cannot be done in Litmus,
as it results in a deadlock on ready_lock (see the trace below).
[ 151.834801] =============================================
[ 151.835015] [ INFO: possible recursive locking detected ]
[ 151.835015] 2.6.33.5-rt22-litmus2010 #405
[ 151.835015] ---------------------------------------------
[ 151.835015] find/1405 is trying to acquire lock:
[ 151.835015] (&rt->ready_lock){-.....}, at: [<ffffffff812142db>] gsnedf_release_jobs+0x2b/0x60
[ 151.835015]
[ 151.835015] but task is already holding lock:
[ 151.835015] (&rt->ready_lock){-.....}, at: [<ffffffff81213e5c>] gsnedf_schedule+0x5c/0x380
[ 151.835015]
[ 151.835015] other info that might help us debug this:
[ 151.835015] 5 locks held by find/1405:
[ 151.835015] #0: (&sb->s_type->i_mutex_key#7){+.+.+.}, at: [<ffffffff810f364a>] vfs_readdir+0x7a/0xd0
[ 151.835015] #1: (rcu_read_lock){.+.+..}, at: [<ffffffff810f8810>] __d_lookup+0x0/0x1e0
[ 151.835015] #2: (&rq->lock){-...-.}, at: [<ffffffff81480ee7>] __schedule+0x97/0x9e0
[ 151.835015] #3: (&rt->ready_lock){-.....}, at: [<ffffffff81213e5c>] gsnedf_schedule+0x5c/0x380
[ 151.835015] #4: (&rt->tobe_lock){......}, at: [<ffffffff81213a74>] requeue+0x64/0x90
[ 151.835015] stack backtrace:
[ 151.835015] Pid: 1405, comm: find Not tainted 2.6.33.5-rt22-litmus2010 #405
[ 151.835015] Call Trace:
[ 151.835015] [<ffffffff8106fb52>] __lock_acquire+0x1572/0x1d60
[ 151.835015] [<ffffffff81217509>] ? sched_trace_log_message+0xd9/0x110
[ 151.835015] [<ffffffff8106c01d>] ? trace_hardirqs_off+0xd/0x10
[ 151.835015] [<ffffffff81217538>] ? sched_trace_log_message+0x108/0x110
[ 151.835015] [<ffffffff8107044b>] lock_acquire+0x10b/0x140
[ 151.835015] [<ffffffff812142db>] ? gsnedf_release_jobs+0x2b/0x60
[ 151.835015] [<ffffffff814842d6>] _raw_spin_lock_irqsave+0x46/0x60
[ 151.835015] [<ffffffff812142db>] ? gsnedf_release_jobs+0x2b/0x60
[ 151.835015] [<ffffffff812142db>] gsnedf_release_jobs+0x2b/0x60
[ 151.835015] [<ffffffff81212054>] on_release_timer+0x104/0x140
[ 151.835015] [<ffffffff81211f50>] ? on_release_timer+0x0/0x140
[ 151.835015] [<ffffffff8105edc6>] __run_hrtimer+0x116/0x210
hrtimer_enqueue_reprogram() -- inline function
[ 151.835015] [<ffffffff8105f9f4>] __hrtimer_start_range_ns+0x1e4/0x2b0
[ 151.835015] [<ffffffff81211da7>] __add_release+0x417/0x420
[ 151.835015] [<ffffffff81213a86>] requeue+0x76/0x90
[ 151.835015] [<ffffffff81213ba3>] gsnedf_job_arrival+0x13/0x30
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
NO_ENFORCEMENT - A job may execute beyond its declared execution time.
Jobs notify the kernel that they are complete via liblitmus's
sleep_next_period().
QUANTUM_ENFORCEMENT - The kernel terminates a job if its actual execution
time exceeds the declared execution time.
PRECISE_ENFORCEMENT - Hook declared, but not yet implemented. We plan to
support this policy through hrtimers; an error is returned if it is
specified.
This patch from Glenn is integrated early in the development of Litmus
PreemptRT because it is needed to compile the library.
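The user-visible part of this boils down to an enumeration along the
following lines (a sketch of the rt_param interface; the typedef name is an
assumption):

    /* Per-task budget enforcement policy (sketch). */
    typedef enum {
        NO_ENFORCEMENT,       /* job may overrun; completion is signalled
                                 via liblitmus's sleep_next_period()      */
        QUANTUM_ENFORCEMENT,  /* overruns are caught by the kernel        */
        PRECISE_ENFORCEMENT   /* hook only; to be backed by hrtimers,
                                 selecting it currently yields an error   */
    } budget_policy_t;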
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Integrate commit a66246f9e (from the 2.6.34 vanilla porting) to convert
  most of the "I-really-would-like-you-to-spin" spinlocks into raw_spinlocks
  (which really spin in Preempt-RT); see the sketch after this message.
- Fix fdso.c for "i_count", which is no longer an atomic_t.
- The wait_queue_t lock is defined as spinlock_t; it is used in:
  * fmlp.c -- sem->wait.lock
  This can be a problem, as this lock may be used in atomic contexts.
- The wait_queue_t lock is also used in:
  * sync.c -- ts_release.wait.lock
  This lock controls the synchronous releases of tasks and is not used
  in atomic contexts. It should be safe to leave it with the current
  implementation.
This commit also fixes warnings and errors due to the need to include
slab.h when using kmalloc() and friends.
Although this commit compiles, the above-mentioned locks and the spinlocks
in sched_litmus.c still need to be changed.
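For reference, the spinlock conversion mentioned in the first item looks
roughly like this (a generic sketch, not a quote from the tree):

    #include <linux/spinlock.h>

    /* A plain spinlock_t becomes a sleeping rt_mutex under PreemptRT; a
     * raw_spinlock_t keeps spinning, which is what scheduler-side code
     * needs. */
    static DEFINE_RAW_SPINLOCK(example_lock);

    static void example_critical_section(void)
    {
        unsigned long flags;

        raw_spin_lock_irqsave(&example_lock, flags);
        /* short, non-sleeping critical section */
        raw_spin_unlock_irqrestore(&example_lock, flags);
    }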
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Simple merge between 2010.1 (vanilla 2.6.32) and 2.6.33.5-rt22 with
conflicts resolved.
This commit does not compile; the following main problems are still
unresolved:
- spinlock -> raw_spinlock (semantics: spinlocks can sleep in -rt)
- rwlock and wait_queue_t lock
- kfifo API changes
- sched_class API changes (get_rr_interval() signature change)
Conflicts:
Makefile
arch/x86/include/asm/unistd_32.h
arch/x86/kernel/syscall_table_32.S
include/linux/hrtimer.h
kernel/hrtimer.c
kernel/sched.c
kernel/sched_fair.c
|
| |\
| | |
| | |
| | |
| | |
| | |
| | | |
Conflicts:
Makefile
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
commit 34441427aab4bdb3069a4ffcda69a99357abcb2e upstream.
Originally, commit d899bf7b ("procfs: provide stack information for
threads") attempted to introduce a new feature for showing where the
thread stack was located and how many pages are being utilized by the
stack.
Commit c44972f1 ("procfs: disable per-task stack usage on NOMMU") was
applied to fix the NO_MMU case.
Commit 89240ba0 ("x86, fs: Fix x86 procfs stack information for threads on
64-bit") was applied to fix a bug in ia32 executables being loaded.
Commit 9ebd4eba7 ("procfs: fix /proc/<pid>/stat stack pointer for kernel
threads") was applied to fix a bug which had kernel threads printing a
userland stack address.
Commit 1306d603f ('proc: partially revert "procfs: provide stack
information for threads"') was then applied to revert the stack pages
being used to solve a significant performance regression.
This patch nearly undoes the effect of all these patches.
The reason for reverting these is that it provides an unusable value in
field 28. For x86_64, a fork will result in the task->stack_start
value being updated to the current user top of stack and not the stack
start address. This unpredictability of the stack_start value makes
it worthless. That includes the intended use of showing how much stack
space a thread has.
Other architectures will get different values. As an example, ia64
gets 0. The do_fork() and copy_process() functions appear to treat the
stack_start and stack_size parameters as architecture specific.
I only partially reverted c44972f1 ("procfs: disable per-task stack usage
on NOMMU") . If I had completely reverted it, I would have had to change
mm/Makefile only build pagewalk.o when CONFIG_PROC_PAGE_MONITOR is
configured. Since I could not test the builds without significant effort,
I decided not to change mm/Makefile.
I only partially reverted 89240ba0 ("x86, fs: Fix x86 procfs stack
information for threads on 64-bit"). I left the KSTK_ESP() change in
place as that seemed worthwhile.
Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Stefani Seibold <stefani@seibold.net>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
commit f33d7e2d2d113a63772bbc993cdec3b5327f0ef1 upstream.
dma_sync_single_range_for_cpu() and dma_sync_single_range_for_device() use
a wrong address with a partial synchronization.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The lockdep irqoff protection which is used to prevent lockdep false
positives leads to "scheduling while atomic" and "might sleep" bug
floods.
Make the irq disabling depend on !RT.
Reported-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| |\|
| | |
| | |
| | |
| | |
| | |
| | | |
Conflicts:
Makefile
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
[ Upstream commit c0786693404cffd80ca3cb6e75ee7b35186b2825 ]
When we finish processing ASCONF_ACK chunk, we try to send
the next queued ASCONF. This action runs the sctp state
machine recursively and it's not prepared to do so.
kernel BUG at kernel/timer.c:790!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/module/ipv6/initstate
Modules linked in: sha256_generic sctp libcrc32c ipv6 dm_multipath
uinput 8139too i2c_piix4 8139cp mii i2c_core pcspkr virtio_net joydev
floppy virtio_blk virtio_pci [last unloaded: scsi_wait_scan]
Pid: 0, comm: swapper Not tainted 2.6.34-rc4 #15 /Bochs
EIP: 0060:[<c044a2ef>] EFLAGS: 00010286 CPU: 0
EIP is at add_timer+0xd/0x1b
EAX: cecbab14 EBX: 000000f0 ECX: c0957b1c EDX: 03595cf4
ESI: cecba800 EDI: cf276f00 EBP: c0957aa0 ESP: c0957aa0
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process swapper (pid: 0, ti=c0956000 task=c0988ba0 task.ti=c0956000)
Stack:
c0957ae0 d1851214 c0ab62e4 c0ab5f26 0500ffff 00000004 00000005 00000004
<0> 00000000 d18694fd 00000004 1666b892 cecba800 cecba800 c0957b14 00000004
<0> c0957b94 d1851b11 ceda8b00 cecba800 cf276f00 00000001 c0957b14 000000d0
Call Trace:
[<d1851214>] ? sctp_side_effects+0x607/0xdfc [sctp]
[<d1851b11>] ? sctp_do_sm+0x108/0x159 [sctp]
[<d1863386>] ? sctp_pname+0x0/0x1d [sctp]
[<d1861a56>] ? sctp_primitive_ASCONF+0x36/0x3b [sctp]
[<d185657c>] ? sctp_process_asconf_ack+0x2a4/0x2d3 [sctp]
[<d184e35c>] ? sctp_sf_do_asconf_ack+0x1dd/0x2b4 [sctp]
[<d1851ac1>] ? sctp_do_sm+0xb8/0x159 [sctp]
[<d1863334>] ? sctp_cname+0x0/0x52 [sctp]
[<d1854377>] ? sctp_assoc_bh_rcv+0xac/0xe1 [sctp]
[<d1858f0f>] ? sctp_inq_push+0x2d/0x30 [sctp]
[<d186329d>] ? sctp_rcv+0x797/0x82e [sctp]
Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Yuansong Qiao <ysqiao@research.ait.ie>
Signed-off-by: Shuaijun Zhang <szhang@research.ait.ie>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
[ Upstream commit 561b1733a465cf9677356b40c27653dd45f1ac56 ]
sk->sk_data_ready() of sctp socket can be called from both BH and non-BH
contexts, but the default sk->sk_data_ready(), sock_def_readable(), cannot
be used in this case. Therefore, we have to make a new function
sctp_data_ready() to grab sk->sk_data_ready() with BH disabling.
=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.33-rc6 #129
---------------------------------------------------------
sctp_darn/1517 just changed the state of lock:
(clock-AF_INET){++.?..}, at: [<c06aab60>] sock_def_readable+0x20/0x80
but this lock took another, SOFTIRQ-unsafe lock in the past:
(slock-AF_INET){+.-...}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
1 lock held by sctp_darn/1517:
#0: (sk_lock-AF_INET){+.+.+.}, at: [<cdfe363d>] sctp_sendmsg+0x23d/0xc00 [sctp]
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
commit 45c4d015a92f72ec47acd0c7557abdc0c8a6499d upstream.
Most drives from Seagate, Hitachi, and possibly other brands,
do not allow LBA28 access to sector number 0x0fffffff (2^28 - 1).
So instead use LBA48 for such accesses.
This bug could bite a lot of systems, especially when the user has
taken care to align partitions to 4KB boundaries. On misaligned systems,
it is less likely to be encountered, since a 4KB read would end at
0x10000000 rather than at 0x0fffffff.
Signed-off-by: Mark Lord <mlord@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
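The essence of the fix is a boundary check of the following shape (a sketch
modeled on the lba_28_ok() helper in include/linux/ata.h): the ending
sector must stay strictly below 0x0fffffff, otherwise the command is built
with LBA48.

    #include <linux/types.h>

    /* Sector 0x0fffffff itself must not be addressed via LBA28, so the
     * ending block has to be LESS THAN (1 << 28) - 1. */
    static inline int lba_28_ok_sketch(u64 block, u32 n_block)
    {
        return ((block + n_block) < ((1 << 28) - 1)) && (n_block <= 256);
    }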
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
commit 23be7468e8802a2ac1de6ee3eecb3ec7f14dc703 upstream.
If a futex key happens to be located within a huge page mapped
MAP_PRIVATE, get_futex_key() can go into an infinite loop waiting for a
page->mapping that will never exist.
See https://bugzilla.redhat.com/show_bug.cgi?id=552257 for more details
about the problem.
This patch makes page->mapping a poisoned value that includes
PAGE_MAPPING_ANON mapped MAP_PRIVATE. This is enough for futex to
continue but because of PAGE_MAPPING_ANON, the poisoned value is not
dereferenced or used by futex. No other part of the VM should be
dereferencing the page->mapping of a hugetlbfs page as its page cache is
not on the LRU.
This patch fixes the problem with the test case described in the bugzilla.
[akpm@linux-foundation.org: mel cant spel]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Darren Hart <darren@dvhart.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Nick converted the dentry->d_mounted counter to a flag; however, with
namespaces, dentries can be mounted multiple times (and more
importantly unmounted multiple times).
If a namespace was created and then released, the unmount_tree would
remove the DCACHE_MOUNTED flag and that would make d_mountpoint fail,
causing the mounts to be lost.
This patch converts it back to a counter, and adds some extra WARN_ONs
to make sure things are accounted properly.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org>
Cc: Nick Piggin <npiggin@suse.de>
LKML-Reference: <1272522942.1967.12.camel@work-vm>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If a futex key happens to be located within a huge page mapped
MAP_PRIVATE, get_futex_key() can go into an infinite loop waiting for a
page->mapping that will never exist.
See https://bugzilla.redhat.com/show_bug.cgi?id=552257 for more details
about the problem.
This patch makes page->mapping a poisoned value that includes
PAGE_MAPPING_ANON mapped MAP_PRIVATE. This is enough for futex to
continue but because of PAGE_MAPPING_ANON, the poisoned value is not
dereferenced or used by futex. No other part of the VM should be
dereferencing the page->mapping of a hugetlbfs page as its page cache is
not on the LRU.
This patch fixes the problem with the test case described in the bugzilla.
[ upstream commit: 23be7468e8802a2ac1de6ee3eecb3ec7f14dc703 ]
[akpm@linux-foundation.org: mel cant spel]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Darren Hart <darren@dvhart.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Amit Arora noticed some compile issues with coda, and an fs.h include
issue, so this patch fixes those along with btrfs warnings.
Thanks to Amit for the testing!
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch reverts the portion of Nick's vfs scalability patch that
converts the dentry d_count from an atomic_t to an int protected by
the d_lock.
This greatly improves vfs scalability with the -rt kernel, as
the extra lock contention on the d_lock hurts very badly when
CONFIG_PREEMPT_RT is enabled and the spinlocks become rtmutexes.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Because vfsmount_read_lock acquires the vfsmount spinlock for the current cpu,
it causes problems with -rt, as you might migrate between cpus between a
lock and unlock.
This patch fixes the issue by having the caller pick a cpu, then consistently
use that cpu between the lock and unlock. We may migrate in between lock and
unlock, but that's ok because we're not doing anything cpu specific, other
than avoiding contention on the read side across the cpus.
It's not pretty, but it works and statistically shouldn't hurt performance.
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
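A sketch of the resulting calling convention (vfsmount_read_lock() and
vfsmount_read_unlock() are the patch's own helpers; the cpu-argument
prototypes and the surrounding function are assumptions for illustration):

    #include <linux/smp.h>

    /* Assumed prototypes: lock/unlock the per-CPU slot chosen by the caller. */
    extern void vfsmount_read_lock(int cpu);
    extern void vfsmount_read_unlock(int cpu);

    static void example_vfsmount_reader(void)
    {
        /* Pick a CPU once and use the same value for lock and unlock.
         * Migrating in between is harmless: nothing CPU-specific is done,
         * the per-CPU lock only spreads read-side contention. */
        int cpu = raw_smp_processor_id();

        vfsmount_read_lock(cpu);
        /* ... walk the mount hash / lists ... */
        vfsmount_read_unlock(cpu);
    }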
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch is just the delta from Nick's 06102009 and his 09102009 megapatches
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | |
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
fs: inode per-cpu nr_inodes counter
Avoids cache line ping-pong between cpus and prepares for the next patch,
because updates of nr_inodes don't need inode_lock anymore.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
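The usual shape of such a counter is sketched below (names are
illustrative; the point is that the hot path touches only a per-CPU
variable and the rare reader sums over all CPUs):

    #include <linux/percpu.h>
    #include <linux/cpumask.h>

    static DEFINE_PER_CPU(int, nr_inodes_sketch);

    static inline void inodes_inc(void) { this_cpu_inc(nr_inodes_sketch); }
    static inline void inodes_dec(void) { this_cpu_dec(nr_inodes_sketch); }

    /* Slow path, e.g. for /proc/sys/fs/inode-nr: sum the per-CPU parts. */
    static int get_nr_inodes_sketch(void)
    {
        int cpu, sum = 0;

        for_each_possible_cpu(cpu)
            sum += per_cpu(nr_inodes_sketch, cpu);
        return sum < 0 ? 0 : sum;
    }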
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
XXX: this should be folded back into the individual locking patches
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
RCU free the struct inode. This will allow:
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
to take i_lock no longer need to take sb_inode_list_lock to walk the list in
the first place. This will simplify and optimize locking.
- eventually, completely write-free RCU path walking. The inode must be
consulted for permissions when walking, so a write-free reference (i.e.
RCU) is helpful.
- can potentially simplify things a bit in VM land. May not need to take the
page lock to get back to the page->mapping.
- can remove some nested trylock loops in dcache code
todo: convert all filesystems
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
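The core of RCU-freeing the inode is deferring the final kmem_cache_free()
through call_rcu(), roughly as below (the i_rcu field, the cache pointer,
and the callback name are assumptions made for the sketch; mainline later
adopted the same pattern):

    #include <linux/fs.h>
    #include <linux/rcupdate.h>
    #include <linux/slab.h>

    extern struct kmem_cache *inode_cachep_sketch;   /* assumed inode cache */

    static void inode_free_rcu_cb(struct rcu_head *head)
    {
        struct inode *inode = container_of(head, struct inode, i_rcu);

        kmem_cache_free(inode_cachep_sketch, inode);
    }

    static void destroy_inode_sketch(struct inode *inode)
    {
        /* Lock-free walkers may still hold a pointer to this inode;
         * only hand the memory back after a grace period. */
        call_rcu(&inode->i_rcu, inode_free_rcu_cb);
    }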
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Implement lazy inode lru similarly to dcache. This should reduce inode list
lock acquisition (todo: measure).
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Make inode_hash_lock private by adding a function __remove_inode_hash
that can be used by filesystems defining their own drop_inode functions.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
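A sketch of the exported helper (close in spirit to what fs/inode.c ended
up with; the hash lock is now private to inode.c):

    #include <linux/fs.h>
    #include <linux/list.h>
    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(inode_hash_lock);   /* private to inode.c */

    /*
     * Remove an inode from the hash, for filesystems that define their
     * own drop_inode and therefore cannot see inode_hash_lock directly.
     */
    void __remove_inode_hash_sketch(struct inode *inode)
    {
        spin_lock(&inode_hash_lock);
        hlist_del_init(&inode->i_hash);
        spin_unlock(&inode_hash_lock);
    }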
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Remove the global inode_lock
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Protect inodes_stat statistics with atomic ops rather than inode_lock.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add a new lock, wb_inode_list_lock, to protect i_list and various lists
which the inode can be put onto.
XXX: haven't audited ocfs2
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Protect inode->i_count with i_lock, rather than having it atomic.
Next step should also be to move things together (eg. the refcount increment
into d_instantiate, which will remove a lock/unlock cycle on i_lock).
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
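With i_count demoted from atomic_t to a plain integer, taking a reference
becomes an increment under i_lock, roughly as follows (the function name is
illustrative):

    #include <linux/fs.h>
    #include <linux/spinlock.h>

    /* Sketch of igrab-style reference taking after this patch: i_count is
     * a plain int, and i_lock, not an atomic op, serializes the update. */
    static void take_inode_ref_sketch(struct inode *inode)
    {
        spin_lock(&inode->i_lock);
        inode->i_count++;
        spin_unlock(&inode->i_lock);
    }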
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add a new lock, inode_hash_lock, to protect the inode hash table lists.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Protect sb->s_inodes with a new lock, sb_inode_list_lock.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The nr_dentry stat is a globally touched cacheline and atomic operation
twice over the lifetime of a dentry. It is used for the benefit of userspace
only. We could make a per-cpu counter or something for it, but it is only
accessed via proc, so we could use slab stats.
XXX: must implement slab routines to return stats for a single cache, and
implement the proc handler.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
dcache_inode_lock can be replaced with per-inode locking. Use existing
inode->i_lock for this. This is slightly non-trivial because we sometimes
need to find the inode from the dentry, which requires d_inode to be
stabilised (either with refcount or d_lock).
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We can turn the dcache hash locking from a global dcache_hash_lock into
per-bucket locking.
XXX: should probably use a bit lock in the first bit of the hash pointers
to avoid any space bloating (non-atomic unlock means no extra atomics either)
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
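One way to realize the XXX note is a bit-locked hash list, as mainline
later did with <linux/list_bl.h>: bit 0 of each bucket's head pointer
doubles as the lock, so the table needs no extra space. A sketch, assuming
d_hash is an hlist_bl_node as in later kernels:

    #include <linux/dcache.h>
    #include <linux/list_bl.h>

    #define D_HASH_BITS_SKETCH 10
    static struct hlist_bl_head dentry_hash_sketch[1 << D_HASH_BITS_SKETCH];

    static void hash_dentry_sketch(struct dentry *dentry, unsigned int hash)
    {
        struct hlist_bl_head *b =
            &dentry_hash_sketch[hash & ((1 << D_HASH_BITS_SKETCH) - 1)];

        hlist_bl_lock(b);                 /* spins on bit 0 of b->first */
        hlist_bl_add_head(&dentry->d_hash, b);
        hlist_bl_unlock(b);
    }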
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
dcache_lock no longer protects anything (I hope). remove it.
This breaks a lot of the tree where I haven't thought about the problem,
but it simplifies the dcache.c code quite a bit (and it's also probably
a good thing to break unconverted code). So I include this here before
making further changes to the locking.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add a new lock, dcache_inode_lock, to protect the inode's i_dentry list
from concurrent modification. d_alias is also protected by d_lock.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Protect d_subdirs and d_child with d_lock, except in filesystems that aren't
using dcache_lock for these anyway (eg. using i_mutex).
XXX: probably don't need parent lock in inotify (because child lock
should stabilize parent). Also, possibly some filesystems don't need so
much locking (eg. of child dentry when modifying d_child, so long as
parent is locked)... but be on the safe side. Hmm, maybe we should just
say d_child list is protected by d_parent->d_lock. d_parent could remain
protected with d_lock.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
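The resulting ordering for list manipulation is "parent d_lock first, then
child d_lock", with the child taken at a nested lockdep class so recursive
d_lock acquisition does not trip lockdep. A sketch (field names follow
later mainline, where d_child and d_subdirs are plain list_heads):

    #include <linux/dcache.h>
    #include <linux/list.h>
    #include <linux/spinlock.h>

    /* Attach a dentry to its parent's d_subdirs with the parent-then-child
     * d_lock ordering described above. */
    static void attach_child_sketch(struct dentry *parent, struct dentry *child)
    {
        spin_lock(&parent->d_lock);
        spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
        list_add(&child->d_child, &parent->d_subdirs);
        spin_unlock(&child->d_lock);
        spin_unlock(&parent->d_lock);
    }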
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Make d_count non-atomic and protect it with d_lock. This allows us to
ensure a 0 refcount dentry remains 0 without dcache_lock. It is also
fairly natural when we start protecting many other dentry members with
d_lock.
XXX: This patch does not boot on its own
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Make dentry_stat_t.nr_dentry an atomic_t type, and move it from under
dcache_lock.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add a new lock, dcache_hash_lock, to protect the dcache hash table from
concurrent modification. d_hash is also protected by d_lock.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Improve scalability of mntget/mntput by using per-cpu counters protected
by the reader side of the brlock vfsmount_lock. mnt_mounted keeps track of
whether the vfsmount is actually attached to the tree so we can shortcut
expensive checks in mntput.
XXX: count_mnt_count needs write lock. Document this and/or revisit locking
(eg. look at writers count)
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Use a brlock for the vfsmount lock.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Improve scalability of files_lock by adding per-cpu, per-sb files lists,
protected with per-cpu locking. Effectively turning it into a big-writer
lock.
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|