aboutsummaryrefslogtreecommitdiffstats
path: root/init
diff options
context:
space:
mode:
authorSteven Rostedt <rostedt@goodmis.org>2013-05-28 17:32:53 -0400
committerPaul E. McKenney <paulmck@linux.vnet.ibm.com>2013-06-10 16:37:11 -0400
commit016a8d5be6ddcc72ef0432d82d9f6fa34f61b907 (patch)
tree3f89d85633928acfde3a6a390ef62dcda0d3a3f4 /init
parentd62840995a99c9766803d54e9d7923f247a1c1db (diff)
rcu: Don't call wakeup() with rcu_node structure ->lock held
This commit fixes a lockdep-detected deadlock by moving a wake_up() call out from a rnp->lock critical section. Please see below for the long version of this story. On Tue, 2013-05-28 at 16:13 -0400, Dave Jones wrote: > [12572.705832] ====================================================== > [12572.750317] [ INFO: possible circular locking dependency detected ] > [12572.796978] 3.10.0-rc3+ #39 Not tainted > [12572.833381] ------------------------------------------------------- > [12572.862233] trinity-child17/31341 is trying to acquire lock: > [12572.870390] (rcu_node_0){..-.-.}, at: [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0 > [12572.878859] > but task is already holding lock: > [12572.894894] (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0 > [12572.903381] > which lock already depends on the new lock. > > [12572.927541] > the existing dependency chain (in reverse order) is: > [12572.943736] > -> #4 (&ctx->lock){-.-...}: > [12572.960032] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0 > [12572.968337] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80 > [12572.976633] [<ffffffff8113c987>] __perf_event_task_sched_out+0x2e7/0x5e0 > [12572.984969] [<ffffffff81088953>] perf_event_task_sched_out+0x93/0xa0 > [12572.993326] [<ffffffff816ea0bf>] __schedule+0x2cf/0x9c0 > [12573.001652] [<ffffffff816eacfe>] schedule_user+0x2e/0x70 > [12573.009998] [<ffffffff816ecd64>] retint_careful+0x12/0x2e > [12573.018321] > -> #3 (&rq->lock){-.-.-.}: > [12573.034628] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0 > [12573.042930] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80 > [12573.051248] [<ffffffff8108e6a7>] wake_up_new_task+0xb7/0x260 > [12573.059579] [<ffffffff810492f5>] do_fork+0x105/0x470 > [12573.067880] [<ffffffff81049686>] kernel_thread+0x26/0x30 > [12573.076202] [<ffffffff816cee63>] rest_init+0x23/0x140 > [12573.084508] [<ffffffff81ed8e1f>] start_kernel+0x3f1/0x3fe > [12573.092852] [<ffffffff81ed856f>] x86_64_start_reservations+0x2a/0x2c > [12573.101233] [<ffffffff81ed863d>] x86_64_start_kernel+0xcc/0xcf > [12573.109528] > -> #2 (&p->pi_lock){-.-.-.}: > [12573.125675] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0 > [12573.133829] [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90 > [12573.141964] [<ffffffff8108e881>] try_to_wake_up+0x31/0x320 > [12573.150065] [<ffffffff8108ebe2>] default_wake_function+0x12/0x20 > [12573.158151] [<ffffffff8107bbf8>] autoremove_wake_function+0x18/0x40 > [12573.166195] [<ffffffff81085398>] __wake_up_common+0x58/0x90 > [12573.174215] [<ffffffff81086909>] __wake_up+0x39/0x50 > [12573.182146] [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50 > [12573.190119] [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0 > [12573.198023] [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930 > [12573.205860] [<ffffffff8107a91d>] kthread+0xed/0x100 > [12573.213656] [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0 > [12573.221379] > -> #1 (&rsp->gp_wq){..-.-.}: > [12573.236329] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0 > [12573.243783] [<ffffffff816ebe9b>] _raw_spin_lock_irqsave+0x4b/0x90 > [12573.251178] [<ffffffff810868f3>] __wake_up+0x23/0x50 > [12573.258505] [<ffffffff810fc3da>] rcu_start_gp_advanced.isra.11+0x4a/0x50 > [12573.265891] [<ffffffff810fdb09>] rcu_start_future_gp+0x1c9/0x1f0 > [12573.273248] [<ffffffff810fe2c4>] rcu_nocb_kthread+0x114/0x930 > [12573.280564] [<ffffffff8107a91d>] kthread+0xed/0x100 > [12573.287807] [<ffffffff816f4b1c>] ret_from_fork+0x7c/0xb0 Notice the above call chain. rcu_start_future_gp() is called with the rnp->lock held. Then it calls rcu_start_gp_advance, which does a wakeup. You can't do wakeups while holding the rnp->lock, as that would mean that you could not do a rcu_read_unlock() while holding the rq lock, or any lock that was taken while holding the rq lock. This is because... (See below). > [12573.295067] > -> #0 (rcu_node_0){..-.-.}: > [12573.309293] [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0 > [12573.316568] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0 > [12573.323825] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80 > [12573.331081] [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0 > [12573.338377] [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0 > [12573.345648] [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0 > [12573.352942] [<ffffffff8113938e>] find_get_context+0x4e/0x1f0 > [12573.360211] [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0 > [12573.367514] [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10 > [12573.374816] [<ffffffff816f4dd4>] tracesys+0xdd/0xe2 Notice the above trace. perf took its own ctx->lock, which can be taken while holding the rq lock. While holding this lock, it did a rcu_read_unlock(). The perf_lock_task_context() basically looks like: rcu_read_lock(); raw_spin_lock(ctx->lock); rcu_read_unlock(); Now, what looks to have happened, is that we scheduled after taking that first rcu_read_lock() but before taking the spin lock. When we scheduled back in and took the ctx->lock, the following rcu_read_unlock() triggered the "special" code. The rcu_read_unlock_special() takes the rnp->lock, which gives us a possible deadlock scenario. CPU0 CPU1 CPU2 ---- ---- ---- rcu_nocb_kthread() lock(rq->lock); lock(ctx->lock); lock(rnp->lock); wake_up(); lock(rq->lock); rcu_read_unlock(); rcu_read_unlock_special(); lock(rnp->lock); lock(ctx->lock); **** DEADLOCK **** > [12573.382068] > other info that might help us debug this: > > [12573.403229] Chain exists of: > rcu_node_0 --> &rq->lock --> &ctx->lock > > [12573.424471] Possible unsafe locking scenario: > > [12573.438499] CPU0 CPU1 > [12573.445599] ---- ---- > [12573.452691] lock(&ctx->lock); > [12573.459799] lock(&rq->lock); > [12573.467010] lock(&ctx->lock); > [12573.474192] lock(rcu_node_0); > [12573.481262] > *** DEADLOCK *** > > [12573.501931] 1 lock held by trinity-child17/31341: > [12573.508990] #0: (&ctx->lock){-.-...}, at: [<ffffffff811390ed>] perf_lock_task_context+0x7d/0x2d0 > [12573.516475] > stack backtrace: > [12573.530395] CPU: 1 PID: 31341 Comm: trinity-child17 Not tainted 3.10.0-rc3+ #39 > [12573.545357] ffffffff825b4f90 ffff880219f1dbc0 ffffffff816e375b ffff880219f1dc00 > [12573.552868] ffffffff816dfa5d ffff880219f1dc50 ffff88023ce4d1f8 ffff88023ce4ca40 > [12573.560353] 0000000000000001 0000000000000001 ffff88023ce4d1f8 ffff880219f1dcc0 > [12573.567856] Call Trace: > [12573.575011] [<ffffffff816e375b>] dump_stack+0x19/0x1b > [12573.582284] [<ffffffff816dfa5d>] print_circular_bug+0x200/0x20f > [12573.589637] [<ffffffff810b8d36>] __lock_acquire+0x1786/0x1af0 > [12573.596982] [<ffffffff810918f5>] ? sched_clock_cpu+0xb5/0x100 > [12573.604344] [<ffffffff810b9851>] lock_acquire+0x91/0x1f0 > [12573.611652] [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0 > [12573.619030] [<ffffffff816ebc90>] _raw_spin_lock+0x40/0x80 > [12573.626331] [<ffffffff811054ff>] ? rcu_read_unlock_special+0x9f/0x4c0 > [12573.633671] [<ffffffff811054ff>] rcu_read_unlock_special+0x9f/0x4c0 > [12573.640992] [<ffffffff811390ed>] ? perf_lock_task_context+0x7d/0x2d0 > [12573.648330] [<ffffffff810b429e>] ? put_lock_stats.isra.29+0xe/0x40 > [12573.655662] [<ffffffff813095a0>] ? delay_tsc+0x90/0xe0 > [12573.662964] [<ffffffff810760a6>] __rcu_read_unlock+0x96/0xa0 > [12573.670276] [<ffffffff811391b3>] perf_lock_task_context+0x143/0x2d0 > [12573.677622] [<ffffffff81139070>] ? __perf_event_enable+0x370/0x370 > [12573.684981] [<ffffffff8113938e>] find_get_context+0x4e/0x1f0 > [12573.692358] [<ffffffff811403f4>] SYSC_perf_event_open+0x514/0xbd0 > [12573.699753] [<ffffffff8108cd9d>] ? get_parent_ip+0xd/0x50 > [12573.707135] [<ffffffff810b71fd>] ? trace_hardirqs_on_caller+0xfd/0x1c0 > [12573.714599] [<ffffffff81140e49>] SyS_perf_event_open+0x9/0x10 > [12573.721996] [<ffffffff816f4dd4>] tracesys+0xdd/0xe2 This commit delays the wakeup via irq_work(), which is what perf and ftrace use to perform wakeups in critical sections. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Diffstat (limited to 'init')
-rw-r--r--init/Kconfig1
1 files changed, 1 insertions, 0 deletions
diff --git a/init/Kconfig b/init/Kconfig
index 9d3a7887a6d3..2d9b83104dcf 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -431,6 +431,7 @@ choice
431config TREE_RCU 431config TREE_RCU
432 bool "Tree-based hierarchical RCU" 432 bool "Tree-based hierarchical RCU"
433 depends on !PREEMPT && SMP 433 depends on !PREEMPT && SMP
434 select IRQ_WORK
434 help 435 help
435 This option selects the RCU implementation that is 436 This option selects the RCU implementation that is
436 designed for very large SMP system with hundreds or 437 designed for very large SMP system with hundreds or