aboutsummaryrefslogtreecommitdiffstats
path: root/kernel/rcupdate.c
Commit message (Collapse)AuthorAge
* rcu: Delay rcu_barrier() wait until beginning of next CPU-hotunplug operation.Paul E. McKenney2009-08-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ingo Molnar reported this lockup: [ 200.380003] Hangcheck: hangcheck value past margin! [ 248.192003] INFO: task S99local:2974 blocked for more than 120 seconds. [ 248.194532] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 248.202330] S99local D 0000000c 6256 2974 2687 0x00000000 [ 248.208929] 9c7ebe90 00000086 6b67ef8b 0000000c 9f25a610 81a69869 00000001 820b6990 [ 248.216123] 820b6990 820b6990 9c6e4c20 9c6e4eb4 82c78990 00000000 6b993559 0000000c [ 248.220616] 9c7ebe90 8105f22a 9c6e4eb4 9c6e4c20 00000001 9c7ebe98 9c7ebeb4 81a65cb3 [ 248.229990] Call Trace: [ 248.234049] [<81a69869>] ? _spin_unlock_irqrestore+0x22/0x37 [ 248.239769] [<8105f22a>] ? prepare_to_wait+0x48/0x4e [ 248.244796] [<81a65cb3>] rcu_barrier_cpu_hotplug+0xaa/0xc9 [ 248.250343] [<8105f029>] ? autoremove_wake_function+0x0/0x38 [ 248.256063] [<81062cf2>] notifier_call_chain+0x49/0x71 [ 248.261263] [<81062da0>] raw_notifier_call_chain+0x11/0x13 [ 248.266809] [<81a0b475>] _cpu_down+0x272/0x288 [ 248.271316] [<81a0b4d5>] cpu_down+0x4a/0xa2 [ 248.275563] [<81a0c48a>] store_online+0x2a/0x5e [ 248.280156] [<81a0c460>] ? store_online+0x0/0x5e [ 248.284836] [<814ddc35>] sysdev_store+0x20/0x28 [ 248.289429] [<8112e403>] sysfs_write_file+0xb8/0xe3 [ 248.294369] [<8112e34b>] ? sysfs_write_file+0x0/0xe3 [ 248.299396] [<810e4c8f>] vfs_write+0x91/0x120 [ 248.303817] [<810e4dc1>] sys_write+0x40/0x65 [ 248.308150] [<81002d73>] sysenter_do_call+0x12/0x28 This change moves an RCU grace period delay off of the critical path for CPU-hotunplug operations. Since RCU callback migration is only performed on CPU-hotunplug operations, and since the rcu_barrier() race is provoked only by consecutive CPU-hotunplug operations, it is not necessary to delay the end of a given CPU-hotunplug operation. We can instead choose to delay the beginning of the next CPU-hotunplug operation. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josht@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: akpm@linux-foundation.org Cc: mathieu.desnoyers@polymtl.ca Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: hugh.dickins@tiscali.co.uk Cc: benh@kernel.crashing.org LKML-Reference: <20090819060614.GA14383@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* rcu: Simplify RCU CPU-hotplug notificationPaul E. McKenney2009-08-15
| | | | | | | | | | | | | | | | | | | | | | | | | | Use the new cpu_notifier() API to simplify RCU's CPU-hotplug notifiers, collapsing down to a single such notifier. This makes it trivial to provide the notifier-ordering guarantee that rcu_barrier() depends on. Also remove redundant open_softirq() calls from Hierarchical RCU notifier. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: josht@linux.vnet.ibm.com Cc: akpm@linux-foundation.org Cc: mathieu.desnoyers@polymtl.ca Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: hugh.dickins@tiscali.co.uk Cc: benh@kernel.crashing.org LKML-Reference: <12503552312510-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* rcu: Add synchronize_sched_expedited() primitivePaul E. McKenney2009-07-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds the synchronize_sched_expedited() primitive that implements the "big hammer" expedited RCU grace periods. This primitive is placed in kernel/sched.c rather than kernel/rcupdate.c due to its need to interact closely with the migration_thread() kthread. The idea is to wake up this kthread with req->task set to NULL, in response to which the kthread reports the quiescent state resulting from the kthread having been scheduled. Because this patch needs to fallback to the slow versions of the primitives in response to some races with CPU onlining and offlining, a new synchronize_rcu_bh() primitive is added as well. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: akpm@linux-foundation.org Cc: torvalds@linux-foundation.org Cc: davem@davemloft.net Cc: dada1@cosmosbay.com Cc: zbr@ioremap.net Cc: jeff.chua.linux@gmail.com Cc: paulus@samba.org Cc: laijs@cn.fujitsu.com Cc: jengelh@medozas.de Cc: r000n@r000n.net Cc: benh@kernel.crashing.org Cc: mathieu.desnoyers@polymtl.ca LKML-Reference: <12459460982947-git-send-email-> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* RCU: Don't try and predeclare inline funcs as it upsets some versions of gccDavid Howells2009-04-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't try and predeclare inline funcs like this: static inline void wait_migrated_callbacks(void) ... static void _rcu_barrier(enum rcu_barrier type) { ... wait_migrated_callbacks(); } ... static inline void wait_migrated_callbacks(void) { wait_event(rcu_migrate_wq, !atomic_read(&rcu_migrate_type_count)); } as it upsets some versions of gcc under some circumstances: kernel/rcupdate.c: In function `_rcu_barrier': kernel/rcupdate.c:125: sorry, unimplemented: inlining failed in call to 'wait_migrated_callbacks': function body not available kernel/rcupdate.c:152: sorry, unimplemented: called from here This can be dealt with by simply putting the static variables (rcu_migrate_*) at the top, and moving the implementation of the function up so that it replaces its forward declaration. Signed-off-by: David Howells <dhowells@redhat.com> Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* rcu: rcu_barrier VS cpu_hotplug: Ensure callbacks in dead cpu are migrated ↵Lai Jiangshan2009-03-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | to online cpu cpu hotplug may happen asynchronously, some rcu callbacks are maybe still on dead cpu, rcu_barrier() also needs to wait for these rcu callbacks to complete, so we must ensure callbacks in dead cpu are migrated to online cpu. Paul E. McKenney's review: Good stuff, Lai!!! Simpler than any of the approaches that I was considering, and, better yet, independent of the underlying RCU implementation!!! I was initially worried that wake_up() might wake only one of two possible wait_event()s, namely rcu_barrier() and the CPU_POST_DEAD code, but the fact that wait_event() clears WQ_FLAG_EXCLUSIVE avoids that issue. I was also worried about the fact that different RCU implementations have different mappings of call_rcu(), call_rcu_bh(), and call_rcu_sched(), but this is OK as well because we just get an extra (harmless) callback in the case that they map together (for example, Classic RCU has call_rcu_sched() mapping to call_rcu()). Overlap of CPU-hotplug operations is prevented by cpu_add_remove_lock, and any stray callbacks that arrive (for example, from irq handlers running on the dying CPU) either are ahead of the CPU_DYING callbacks on the one hand (and thus accounted for), or happened after the rcu_barrier() started on the other (and thus don't need to be accounted for). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> LKML-Reference: <49C36476.1010400@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* rcu: Teach RCU that idle task is not quiscent state at bootPaul E. McKenney2009-02-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a bug located by Vegard Nossum with the aid of kmemcheck, updated based on review comments from Nick Piggin, Ingo Molnar, and Andrew Morton. And cleans up the variable-name and function-name language. ;-) The boot CPU runs in the context of its idle thread during boot-up. During this time, idle_cpu(0) will always return nonzero, which will fool Classic and Hierarchical RCU into deciding that a large chunk of the boot-up sequence is a big long quiescent state. This in turn causes RCU to prematurely end grace periods during this time. This patch changes the rcutree.c and rcuclassic.c rcu_check_callbacks() function to ignore the idle task as a quiescent state until the system has started up the scheduler in rest_init(), introducing a new non-API function rcu_idle_now_means_idle() to inform RCU of this transition. RCU maintains an internal rcu_idle_cpu_truthful variable to track this state, which is then used by rcu_check_callback() to determine if it should believe idle_cpu(). Because this patch has the effect of disallowing RCU grace periods during long stretches of the boot-up sequence, this patch also introduces Josh Triplett's UP-only optimization that makes synchronize_rcu() be a no-op if num_online_cpus() returns 1. This allows boot-time code that calls synchronize_rcu() to proceed normally. Note, however, that RCU callbacks registered by call_rcu() will likely queue up until later in the boot sequence. Although rcuclassic and rcutree can also use this same optimization after boot completes, rcupreempt must restrict its use of this optimization to the portion of the boot sequence before the scheduler starts up, given that an rcupreempt RCU read-side critical section may be preeempted. In addition, this patch takes Nick Piggin's suggestion to make the system_state global variable be __read_mostly. Changes since v4: o Changes the name of the introduced function and variable to be less emotional. ;-) Changes since v3: o WARN_ON(nr_context_switches() > 0) to verify that RCU switches out of boot-time mode before the first context switch, as suggested by Nick Piggin. Changes since v2: o Created rcu_blocking_is_gp() internal-to-RCU API that determines whether a call to synchronize_rcu() is itself a grace period. o The definition of rcu_blocking_is_gp() for rcuclassic and rcutree checks to see if but a single CPU is online. o The definition of rcu_blocking_is_gp() for rcupreempt checks to see both if but a single CPU is online and if the system is still in early boot. This allows rcupreempt to again work correctly if running on a single CPU after booting is complete. o Added check to rcupreempt's synchronize_sched() for there being but one online CPU. Tested all three variants both SMP and !SMP, booted fine, passed a short rcutorture test on both x86 and Power. Located-by: Vegard Nossum <vegard.nossum@gmail.com> Tested-by: Vegard Nossum <vegard.nossum@gmail.com> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* rcu: eliminate synchronize_rcu_xxx macroPaul E. McKenney2009-01-05
| | | | | | | | | | | | | Impact: cleanup Expand macro into two files. The synchronize_rcu_xxx macro is quite ugly and it's only used by two callers, so expand it instead. This makes this code easier to change. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* rcupdate: fix bug of rcu_barrier*()Lai Jiangshan2008-10-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | current rcu_barrier_bh() is like this: void rcu_barrier_bh(void) { BUG_ON(in_interrupt()); /* Take cpucontrol mutex to protect against CPU hotplug */ mutex_lock(&rcu_barrier_mutex); init_completion(&rcu_barrier_completion); atomic_set(&rcu_barrier_cpu_count, 0); /* * The queueing of callbacks in all CPUs must be atomic with * respect to RCU, otherwise one CPU may queue a callback, * wait for a grace period, decrement barrier count and call * complete(), while other CPUs have not yet queued anything. * So, we need to make sure that grace periods cannot complete * until all the callbacks are queued. */ rcu_read_lock(); on_each_cpu(rcu_barrier_func, (void *)RCU_BARRIER_BH, 1); rcu_read_unlock(); wait_for_completion(&rcu_barrier_completion); mutex_unlock(&rcu_barrier_mutex); } The inconsistency of the code and the comments show a bug here. rcu_read_lock() cannot make sure that "grace periods for RCU_BH cannot complete until all the callbacks are queued". it only make sure that race periods for RCU cannot complete until all the callbacks are queued. so we must use rcu_read_lock_bh() for rcu_barrier_bh(). like this: void rcu_barrier_bh(void) { ...... rcu_read_lock_bh(); on_each_cpu(rcu_barrier_func, (void *)RCU_BARRIER_BH, 1); rcu_read_unlock_bh(); ...... } and also rcu_barrier() rcu_barrier_sched() are implemented like this. it will bring a lot of duplicate code. My patch uses another way to fix this bug, please see the comment of my patch. Thank Paul E. McKenney for he rewrote the comment. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* rcu: fix synchronize_rcu() so that kernel-doc worksRandy Dunlap2008-08-21
| | | | | | | | | | | | | Fix RCU's synchronize_rcu() so that it looks like a C function, enabling it to be recognized as a function with kernel-doc annotation. Warning(linux-2.6.26-git11//kernel/rcupdate.c:81): No description found for parameter 'synchronize_rcu' Warning(linux-2.6.26-git11//kernel/rcupdate.c:81): No description found for parameter 'call_rcu' [akpm@linux-foundation.org: fix comment] Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Merge branch 'generic-ipi-for-linus' of ↵Linus Torvalds2008-07-15
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'generic-ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (22 commits) generic-ipi: more merge fallout generic-ipi: merge fix x86, visws: use mach-default/entry_arch.h x86, visws: fix generic-ipi build generic-ipi: fixlet generic-ipi: fix s390 build bug generic-ipi: fix linux-next tree build failure fix: "smp_call_function: get rid of the unused nonatomic/retry argument" fix: "smp_call_function: get rid of the unused nonatomic/retry argument" fix "smp_call_function: get rid of the unused nonatomic/retry argument" on_each_cpu(): kill unused 'retry' parameter smp_call_function: get rid of the unused nonatomic/retry argument sh: convert to generic helpers for IPI function calls parisc: convert to generic helpers for IPI function calls mips: convert to generic helpers for IPI function calls m32r: convert to generic helpers for IPI function calls arm: convert to generic helpers for IPI function calls alpha: convert to generic helpers for IPI function calls ia64: convert to generic helpers for IPI function calls powerpc: convert to generic helpers for IPI function calls ... Fix trivial conflicts due to rcu updates in kernel/rcupdate.c manually
| * on_each_cpu(): kill unused 'retry' parameterJens Axboe2008-06-26
| | | | | | | | | | | | | | | | | | It's not even passed on to smp_call_function() anymore, since that was removed. So kill it. Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* | rcu: add rcu_barrier_sched() and rcu_barrier_bh()Paul E. McKenney2008-05-19
| | | | | | | | | | | | | | | | | | | | | | | | Add rcu_barrier_sched() and rcu_barrier_bh(). With these in place, rcutorture no longer gives the occasional oops when repeatedly starting and stopping torturing rcu_bh. Also adds the API needed to flush out pre-existing call_rcu_sched() callbacks. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | rcu: add call_rcu_sched()Paul E. McKenney2008-05-19
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fourth cut of patch to provide the call_rcu_sched(). This is again to synchronize_sched() as call_rcu() is to synchronize_rcu(). Should be fine for experimental and -rt use, but not ready for inclusion. With some luck, I will be able to tell Andrew to come out of hiding on the next round. Passes multi-day rcutorture sessions with concurrent CPU hotplugging. Fixes since the first version include a bug that could result in indefinite blocking (spotted by Gautham Shenoy), better resiliency against CPU-hotplug operations, and other minor fixes. Fixes since the second version include reworking grace-period detection to avoid deadlocks that could happen when running concurrently with CPU hotplug, adding Mathieu's fix to avoid the softlockup messages, as well as Mathieu's fix to allow use earlier in boot. Fixes since the third version include a wrong-CPU bug spotted by Andrew, getting rid of the obsolete synchronize_kernel API that somehow snuck back in, merging spin_unlock() and local_irq_restore() in a few places, commenting the code that checks for quiescent states based on interrupting from user-mode execution or the idle loop, removing some inline attributes, and some code-style changes. Known/suspected shortcomings: o I still do not entirely trust the sleep/wakeup logic. Next step will be to use a private snapshot of the CPU online mask in rcu_sched_grace_period() -- if the CPU wasn't there at the start of the grace period, we don't need to hear from it. And the bit about accounting for changes in online CPUs inside of rcu_sched_grace_period() is ugly anyway. o It might be good for rcu_sched_grace_period() to invoke resched_cpu() when a given CPU wasn't responding quickly, but resched_cpu() is declared static... This patch also fixes a long-standing bug in the earlier preemptable-RCU implementation of synchronize_rcu() that could result in loss of concurrent external changes to a task's CPU affinity mask. I still cannot remember who reported this... Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* rcupdate: fix commentPaul E. McKenney2008-02-13
| | | | | | | | | | This comment caused some consternation during fastcall removal. Make it truthful. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Preempt-RCU: fix rcu_barrier for preemptive environment.Paul E. McKenney2008-01-25
| | | | | | | | | | | | Fix rcu_barrier() to work properly in preemptive kernel environment. Also, the ordering of callback must be preserved while moving callbacks to another CPU during CPU hotplug. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Preempt-RCU: reorganize RCU code into rcuclassic.c and rcupdate.cPaul E. McKenney2008-01-25
| | | | | | | | | | | | | This patch re-organizes the RCU code to enable multiple implementations of RCU. Users of RCU continues to include rcupdate.h and the RCU interfaces remain the same. This is in preparation for subsequently merging the preemptible RCU implementation. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Preempt-RCU: Use softirq instead of tasklets forDipankar Sarma2008-01-25
| | | | | | | | | | | | | | | | This patch makes RCU use softirq instead of tasklets. It also adds a memory barrier after raising the softirq inorder to ensure that the cpu sees the most recently updated value of rcu->cur while processing callbacks. The discussion of the related theoretical race pointed out by James Huang can be found here --> http://lkml.org/lkml/2007/11/20/603 Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Reviewed-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* rcu: fix section mismatchRandy Dunlap2008-01-22
| | | | | | | | | | | | | rcu_online_cpu() should be __cpuinit instead of __devinit. WARNING: vmlinux.o(.text+0x4b6d5): Section mismatch: reference to .init.text: (between 'rcu_cpu_notify' and 'wakeme_after_rcu') Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Cc: Sam Ravnborg <sam@ravnborg.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Clean up duplicate includes in kernel/Jesper Juhl2007-10-17
| | | | | | | | | | | This patch cleans up duplicate includes in kernel/ Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Satyam Sharma <ssatyam@cse.iitk.ac.in> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* lockdep: annotate rcu_read_{,un}lock{,_bh}Peter Zijlstra2007-10-11
| | | | | | | | | lockdep annotate rcu_read_{,un}lock{,_bh} in order to catch imbalanced usage. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* Add suspend-related notifications for CPU hotplugRafael J. Wysocki2007-05-09
| | | | | | | | | | | | | | | | | | | | | Since nonboot CPUs are now disabled after tasks and devices have been frozen and the CPU hotplug infrastructure is used for this purpose, we need special CPU hotplug notifications that will help the CPU-hotplug-aware subsystems distinguish normal CPU hotplug events from CPU hotplug events related to a system-wide suspend or resume operation in progress. This patch introduces such notifications and causes them to be used during suspend and resume transitions. It also changes all of the CPU-hotplug-aware subsystems to take these notifications into consideration (for now they are handled in the same way as the corresponding "normal" ones). [oleg@tv-sign.ru: cleanups] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] rcu: add a prefetch() in rcu_do_batch()Eric Dumazet2006-12-07
| | | | | | | | | | | | | | | | | | | | | | On some workloads, (for example when lot of close() syscalls are done), RCU qlen can be quite large, and RCU heads are no longer in cpu cache when rcu_do_batch() is called. This patch adds a prefetch() in rcu_do_batch() to give CPU a hint to bring back cache lines containing 'struct rcu_head's. Most list manipulations macros include prefetch(), but not open coded ones (at least with current C compilers :) ) I got a nice speedup on a trivial benchmark (3.48 us per iteration instead of 3.95 us on a 1.6 GHz Pentium-M) while (1) { pipe(p); close(fd[0]); close(fd[1]);} Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] rcu: simplify/improve batch tuningOleg Nesterov2006-10-04
| | | | | | | | | | | | | | | | | | | | Kill a hard-to-calculate 'rsinterval' boot parameter and per-cpu rcu_data.last_rs_qlen. Instead, it adds adds a flag rcu_ctrlblk.signaled, which records the fact that one of CPUs has sent a resched IPI since the last rcu_start_batch(). Roughly speaking, we need two rcu_start_batch()s in order to move callbacks from ->nxtlist to ->donelist. This means that when ->qlen exceeds qhimark and continues to grow, we should send a resched IPI, and then do it again after we gone through a quiescent state. On the other hand, if it was already sent, we don't need to do it again when another CPU detects overflow of the queue. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] rcu_do_batch: make ->qlen decrement irq safeOleg Nesterov2006-09-13
| | | | | | | | | | | | | rcu_do_batch() decrements rdp->qlen with irqs enabled. This is not good, it can also be modified by call_rcu() from interrupt. Decrement ->qlen once with irqs disabled, after a main loop. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] cpu hotplug: replace __devinit* with __cpuinit* for cpu notificationsChandra Seetharaman2006-07-31
| | | | | | | | | | | | | | | | Few of the callback functions and notifier blocks that are associated with cpu notifications incorrectly have __devinit and __devinitdata. They should be __cpuinit and __cpuinitdata instead. It makes no functional difference but wastes text area when CONFIG_HOTPLUG is enabled and CONFIG_HOTPLUG_CPU is not. This patch fixes all those instances. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Cc: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] lockdep: locking init debugging improvementIngo Molnar2006-07-03
| | | | | | | | | | | | Locking init improvement: - introduce and use __SPIN_LOCK_UNLOCKED for array initializations, to pass in the name string of locks, used by debugging Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] cpu hotplug: revert initdata patch submitted for 2.6.17Chandra Seetharaman2006-06-27
| | | | | | | | | This patch reverts notifier_block changes made in 2.6.17 Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Cc: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] cpu hotplug: revert init patch submitted for 2.6.17Chandra Seetharaman2006-06-27
| | | | | | | | | | | | | | | | | | | | | In 2.6.17, there was a problem with cpu_notifiers and XFS. I provided a band-aid solution to solve that problem. In the process, i undid all the changes you both were making to ensure that these notifiers were available only at init time (unless CONFIG_HOTPLUG_CPU is defined). We deferred the real fix to 2.6.18. Here is a set of patches that fixes the XFS problem cleanly and makes the cpu notifiers available only at init time (unless CONFIG_HOTPLUG_CPU is defined). If CONFIG_HOTPLUG_CPU is defined then cpu notifiers are available at run time. This patch reverts the notifier_call changes made in 2.6.17 Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Cc: Ashok Raj <ashok.raj@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] rcutorture: add call_rcu_bh() operationsPaul E. McKenney2006-06-27
| | | | | | | | | Add operations for the call_rcu_bh() variant of RCU. Also add an rcu_batches_completed_bh() function, which is needed by rcutorture. Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Make RCU API inaccessible to non-GPL Linux kernel modulesPaul E. McKenney2006-06-23
| | | | | | | | | | | Remove synchronize_kernel() (deprecated 2-APR-2005 in http://lkml.org/lkml/2005/4/3/11) and makes the RCU API inaccessible to non-GPL Linux kernel modules (as was announced more than one year ago in http://lkml.org/lkml/2005/4/3/8). Tested on x86 and ppc64. Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] RCU: introduce rcu_needs_cpu() interfaceHeiko Carstens2006-05-15
| | | | | | | | | | | | | | | With "Paul E. McKenney" <paulmck@us.ibm.com> Introduce rcu_needs_cpu() interface. This can be used to tell if there will be a new rcu batch on a cpu soon by looking at the curlist pointer. This can be used to avoid to enter a tickless idle state where the cpu would miss that a new batch is ready when rcu_start_batch would be called on a different cpu. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Remove __devinit and __cpuinit from notifier_call definitionsChandra Seetharaman2006-04-26
| | | | | | | | | | | | | Few of the notifier_chain_register() callers use __init in the definition of notifier_call. It is incorrect as the function definition should be available after the initializations (they do not unregister them during initializations). This patch fixes all such usages to _not_ have the notifier_call __init section. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Remove __devinitdata from notifier block definitionsChandra Seetharaman2006-04-26
| | | | | | | | | | | | | | | | Few of the notifier_chain_register() callers use __devinitdata in the definition of notifier_block data structure. It is incorrect as the data structure should be available after the initializations (they do not unregister them during initializations). This was leading to an oops when notifier_chain_register() call is invoked for those callback chains after initialization. This patch fixes all such usages to _not_ have the notifier_block data structure in the init data section. Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] rcu_process_callbacks: don't cli() while testing ->nxtlistOleg Nesterov2006-03-24
| | | | | | | | | | | | | | | | | __rcu_process_callbacks() disables interrupts to protect itself from call_rcu() which adds new entries to ->nxtlist. However we can check "->nxtlist != NULL" with interrupts enabled, we can't get "false positives" because call_rcu() can only change this condition from 0 to 1. Tested with rcutorture.ko. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Dipankar Sarma <dipankar@in.ibm.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] kernel/rcupdate.c: make two structs staticAdrian Bunk2006-03-23
| | | | | | | | This patch makes two needlessly global structs static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] convert kernel/rcupdate.c:rcu_barrier_sema to mutexIngo Molnar2006-03-23
| | | | | | | | | Convert kernel/rcupdate's rcu_barrier_sema to mutex. Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] add EXPORT_SYMBOL_GPL_FUTURE() to RCU subsystemGreg Kroah-Hartman2006-03-20
| | | | | | | | | As the RCU symbols are going to be changed to GPL in the near future, lets warn users that this is going to happen. Cc: Paul McKenney <paulmck@us.ibm.com> Acked-by: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* [PATCH] rcu batch tuningDipankar Sarma2006-03-08
| | | | | | | | | | | | | | | | | | | | | This patch adds new tunables for RCU queue and finished batches. There are two types of controls - number of completed RCU updates invoked in a batch (blimit) and monitoring for high rate of incoming RCUs on a cpu (qhimark, qlowmark). By default, the per-cpu batch limit is set to a small value. If the input RCU rate exceeds the high watermark, we do two things - force quiescent state on all cpus and set the batch limit of the CPU to INTMAX. Setting batch limit to INTMAX forces all finished RCUs to be processed in one shot. If we have more than INTMAX RCUs queued up, then we have bigger problems anyway. Once the incoming queued RCUs fall below the low watermark, the batch limit is set to the default. Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] rcu: fix hotplug-cpu ->donelist leakOleg Nesterov2006-01-10
| | | | | | | | | | | | | | | | | | | | | | Pointed out by Srivatsa Vaddagiri <vatsa@in.ibm.com>. rcu_do_batch() stops after processing maxbatch callbacks on ->donelist leaving rcu_tasklet in TASKLET_STATE_SCHED state. If CPU_DEAD event happens remaining ->donelist entries are lost, rcu_offline_cpu() kills this tasklet. With this patch ->donelist migrates along with ->curlist and ->nxtlist to the current cpu. Compile tested. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] rcu: join rcu_ctrlblk and rcu_stateOleg Nesterov2006-01-10
| | | | | | | | | | | This patch moves rcu_state into the rcu_ctrlblk. I think there are no reasons why we should have 2 different variables to control rcu state. Every user of rcu_state has also "rcu_ctrlblk *rcp" in the parameter list. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] rcu: don't set ->next_pending in rcu_start_batch()Oleg Nesterov2006-01-09
| | | | | | | | | | I think it is better to set ->next_pending in the caller, when it is needed. This saves one parameter, and this coincides with cpu_quiet() beahaviour, which sets ->completed = ->cur itself. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] rcu: uninline __rcu_pending()Oleg Nesterov2006-01-09
| | | | | | | | | | | | __rcu_pending() is rather fat and called twice from rcu_pending(). rcu_pending() has multiple callers, and not that small too. This patch uninlines both of them. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] rcu file: use atomic primitivesNick Piggin2006-01-08
| | | | | | | | | Use atomic_inc_not_zero for rcu files instead of special case rcuref. Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] RCU signal handlingIngo Molnar2006-01-08
| | | | | | | | | | | | | | RCU tasklist_lock and RCU signal handling: send signals RCU-read-locked instead of tasklist_lock read-locked. This is a scalability improvement on SMP and a preemption-latency improvement under PREEMPT_RCU. Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Acked-by: William Irwin <wli@holomorphy.com> Cc: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Change maxaligned_in_smp alignemnt macros to internodealigned_in_smp ↵Ravikiran G Thirumalai2006-01-08
| | | | | | | | | | | | | | | | | | | | | | | | macros ____cacheline_maxaligned_in_smp is currently used to align critical structures and avoid false sharing. It uses per-arch L1_CACHE_SHIFT_MAX and people find L1_CACHE_SHIFT_MAX useless. However, we have been using ____cacheline_maxaligned_in_smp to align structures on the internode cacheline size. As per Andi's suggestion, following patch kills ____cacheline_maxaligned_in_smp and introduces INTERNODE_CACHE_SHIFT, which defaults to L1_CACHE_SHIFT for all arches. Arches needing L3/Internode cacheline alignment can define INTERNODE_CACHE_SHIFT in the arch asm/cache.h. Patch replaces ____cacheline_maxaligned_in_smp with ____cacheline_internodealigned_in_smp With this patch, L1_CACHE_SHIFT_MAX can be killed Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Shai Fultheim <shai@scalex86.org> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Fix RCU race in access of nohz_cpu_maskSrivatsa Vaddagiri2005-12-12
| | | | | | | | | | | | | | | Accessing nohz_cpu_mask before incrementing rcp->cur is racy. It can cause tickless idle CPUs to be included in rsp->cpumask, which will extend graceperiods unnecessarily. Fix this race. It has been tested using extensions to RCU torture module that forces various CPUs to become idle. Signed-off-by: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Dipankar Sarma <dipankar@in.ibm.com> Cc: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] add rcu_barrier() synchronization pointDipankar Sarma2005-12-12
| | | | | | | | | | | | | | | | | | | This introduces a new interface - rcu_barrier() which waits until all the RCUs queued until this call have been completed. Reiser4 needs this, because we do more than just freeing memory object in our RCU callback: we also remove it from the list hanging off super-block. This means, that before freeing reiser4-specific portion of super-block (during umount) we have to wait until all pending RCU callbacks are executed. The only change of reiser4 made to the original patch, is exporting of rcu_barrier(). Cc: Hans Reiser <reiser@namesys.com> Cc: Vladimir V. Saveliev <vs@namesys.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] RCU torture-testing kernel modulePaul E. McKenney2005-10-30
| | | | | | | | | | | | | | | | | | | | This patch is a rewrite of the one submitted on October 1st, using modules (http://marc.theaimsgroup.com/?l=linux-kernel&m=112819093522998&w=2). This rewrite adds a tristate CONFIG_RCU_TORTURE_TEST, which enables an intense torture test of the RCU infratructure. This is needed due to the continued changes to the RCU infrastructure to accommodate dynamic ticks, CPU hotplug, realtime, and so on. Most of the code is in a separate file that is compiled only if the CONFIG variable is set. Documentation on how to run the test and interpret the output is also included. This code has been tested on i386 and ppc64, and an earlier version of the code has received extensive testing on a number of architectures as part of the PREEMPT_RT patchset. Signed-off-by: "Paul E. McKenney" <paulmck@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] rcu: keep rcu callback event counterEric Dumazet2005-10-17
| | | | | | | | | This makes call_rcu() keep track of how many events there are on the RCU list, and cause a reschedule event when the list gets too long. This helps keep RCU event lists down. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Increase default RCU batching sharplyLinus Torvalds2005-10-17
| | | | | | | | | | | | Dipankar made RCU limit the batch size to improve latency, but that approach is unworkable: it can cause the RCU queues to grow without bounds, since the batch limiter ended up limiting the callbacks. So make the limit much higher, and start planning on instead limiting the batch size by doing RCU callbacks more often if the queue looks like it might be growing too long. Signed-off-by: Linus Torvalds <torvalds@osdl.org>