aboutsummaryrefslogtreecommitdiffstats
path: root/include/linux/cpumask.h
Commit message (Collapse)AuthorAge
* cpumask: introduce new API, without changing anything, v3Rusty Russell2008-11-09
| | | | | | | | | | | | | Impact: cleanup Clean up based on feedback from Andrew Morton and others: - change to inline functions instead of macros - add __init to bootmem method - add a missing debug check Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* cpumask: new API, v2Rusty Russell2008-11-07
| | | | | | | | - add cpumask_of() - add free_bootmem_cpumask_var() Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* cpumask: introduce new API, without changing anythingRusty Russell2008-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: introduce new APIs We want to deprecate cpumasks on the stack, as we are headed for gynormous numbers of CPUs. Eventually, we want to head towards an undefined 'struct cpumask' so they can never be declared on stack. 1) New cpumask functions which take pointers instead of copies. (cpus_* -> cpumask_*) 2) Several new helpers to reduce requirements for temporary cpumasks (cpumask_first_and, cpumask_next_and, cpumask_any_and) 3) Helpers for declaring cpumasks on or offstack for large NR_CPUS (cpumask_var_t, alloc_cpumask_var and free_cpumask_var) 4) 'struct cpumask' for explicitness and to mark new-style code. 5) Make iterator functions stop at nr_cpu_ids (a runtime constant), not NR_CPUS for time efficiency and for smaller dynamic allocations in future. 6) cpumask_copy() so we can allocate less than a full cpumask eventually (for alloc_cpumask_var), and so we can eliminate the 'struct cpumask' definition eventually. 7) work_on_cpu() helper for doing task on a CPU, rather than saving old cpumask for current thread and manipulating it. 8) smp_call_function_many() which is smp_call_function_mask() except taking a cpumask pointer. Note that this patch simply introduces the new functions and leaves the obsolescent ones in place. This is to simplify the transition patches. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* cpumask: statement expressions confuse some versions of gccStephen Rothwell2008-07-30
| | | | | | | | | | | | | | | when you take the address of the result. Noticed on a sparc64 compile using a version 3.4.5 cross compiler. kernel/time/tick-common.c: In function `tick_check_new_device': kernel/time/tick-common.c:210: error: invalid lvalue in unary `&' ... Just make it a regular expression. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cpu masks: optimize and clean up cpumask_of_cpu()Linus Torvalds2008-07-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up and optimize cpumask_of_cpu(), by sharing all the zero words. Instead of stupidly generating all possible i=0...NR_CPUS 2^i patterns creating a huge array of constant bitmasks, realize that the zero words can be shared. In other words, on a 64-bit architecture, we only ever need 64 of these arrays - with a different bit set in one single world (with enough zero words around it so that we can create any bitmask by just offsetting in that big array). And then we just put enough zeroes around it that we can point every single cpumask to be one of those things. So when we have 4k CPU's, instead of having 4k arrays (of 4k bits each, with one bit set in each array - 2MB memory total), we have exactly 64 arrays instead, each 8k bits in size (64kB total). And then we just point cpumask(n) to the right position (which we can calculate dynamically). Once we have the right arrays, getting "cpumask(n)" ends up being: static inline const cpumask_t *get_cpu_mask(unsigned int cpu) { const unsigned long *p = cpu_bit_bitmap[1 + cpu % BITS_PER_LONG]; p -= cpu / BITS_PER_LONG; return (const cpumask_t *)p; } This brings other advantages and simplifications as well: - we are not wasting memory that is just filled with a single bit in various different places - we don't need all those games to re-create the arrays in some dense format, because they're already going to be dense enough. if we compile a kernel for up to 4k CPU's, "wasting" that 64kB of memory is a non-issue (especially since by doing this "overlapping" trick we probably get better cache behaviour anyway). [ mingo@elte.hu: Converted Linus's mails into a commit. See: http://lkml.org/lkml/2008/7/27/156 http://lkml.org/lkml/2008/7/28/320 Also applied a family filter - which also has the side-effect of leaving out the bits where Linus calls me an idio... Oh, never mind ;-) ] Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* cpumask: make cpumask_of_cpu_map genericMike Travis2008-07-26
| | | | | | | | | | | | | | | | | | | | | | If an arch doesn't define cpumask_of_cpu_map, create a generic statically-initialized one for them. This allows removal of the buggy cpumask_of_cpu() macro (&cpumask_of_cpu() gives address of out-of-scope var). An arch with NR_CPUS of 4096 probably wants to allocate this itself based on the actual number of CPUs, since otherwise they're using 2MB of rodata (1024 cpus means 128k). That's what CONFIG_HAVE_CPUMASK_OF_CPU_MAP is for (only x86/64 does so at the moment). In future as we support more CPUs, we'll need to resort to a get_cpu_map()/put_cpu_map() allocation scheme. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jack Steiner <steiner@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Merge branch 'sched/for-linus' of ↵Linus Torvalds2008-07-23
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: hrtick_enabled() should use cpu_active() sched, x86: clean up hrtick implementation sched: fix build error, provide partition_sched_domains() unconditionally sched: fix warning in inc_rt_tasks() to not declare variable 'rq' if it's not needed cpu hotplug: Make cpu_active_map synchronization dependency clear cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment (take 2) sched: rework of "prioritize non-migratable tasks over migratable ones" sched: reduce stack size in isolated_cpu_setup() Revert parts of "ftrace: do not trace scheduler functions" Fixed up conflicts in include/asm-x86/thread_info.h (due to the TIF_SINGLESTEP unification vs TIF_HRTICK_RESCHED removal) and kernel/sched_fair.c (due to cpu_active_map vs for_each_cpu_mask_nr() introduction).
| * cpu hotplug, sched: Introduce cpu_active_map and redo sched domain managment ↵Max Krasnyansky2008-07-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (take 2) This is based on Linus' idea of creating cpu_active_map that prevents scheduler load balancer from migrating tasks to the cpu that is going down. It allows us to simplify domain management code and avoid unecessary domain rebuilds during cpu hotplug event handling. Please ignore the cpusets part for now. It needs some more work in order to avoid crazy lock nesting. Although I did simplfy and unify domain reinitialization logic. We now simply call partition_sched_domains() in all the cases. This means that we're using exact same code paths as in cpusets case and hence the test below cover cpusets too. Cpuset changes to make rebuild_sched_domains() callable from various contexts are in the separate patch (right next after this one). This not only boots but also easily handles while true; do make clean; make -j 8; done and while true; do on-off-cpu 1; done at the same time. (on-off-cpu 1 simple does echo 0/1 > /sys/.../cpu1/online thing). Suprisingly the box (dual-core Core2) is quite usable. In fact I'm typing this on right now in gnome-terminal and things are moving just fine. Also this is running with most of the debug features enabled (lockdep, mutex, etc) no BUG_ONs or lockdep complaints so far. I believe I addressed all of the Dmitry's comments for original Linus' version. I changed both fair and rt balancer to mask out non-active cpus. And replaced cpu_is_offline() with !cpu_active() in the main scheduler code where it made sense (to me). Signed-off-by: Max Krasnyanskiy <maxk@qualcomm.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Gregory Haskins <ghaskins@novell.com> Cc: dmitry.adamushko@gmail.com Cc: pj@sgi.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | cpumask: Provide a generic set of CPUMASK_ALLOC macros, FIXUPMike Travis2008-07-20
| | | | | | | | | | | | | | | | | | | | | | * Rename CPUMASK_VAR --> CPUMASK_PTR (and simplify) * Fix a semantic error in CPUMASK_ALLOC * Add a bit of commentry to cpumask.h Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | cpumask: Provide a generic set of CPUMASK_ALLOC macrosMike Travis2008-07-18
| | | | | | | | | | | | | | | | | | | | * Provide a generic set of CPUMASK_ALLOC macros patterned after the SCHED_CPUMASK_ALLOC macros. This is used where multiple cpumask_t variables are declared on the stack to reduce the amount of stack space required. Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | cpumask: Replace cpumask_of_cpu with cpumask_of_cpu_ptrMike Travis2008-07-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * This patch replaces the dangerous lvalue version of cpumask_of_cpu with new cpumask_of_cpu_ptr macros. These are patterned after the node_to_cpumask_ptr macros. In general terms, if there is a cpumask_of_cpu_map[] then a pointer to the cpumask_of_cpu_map[cpu] entry is used. The cpumask_of_cpu_map is provided when there is a large NR_CPUS count, reducing greatly the amount of code generated and stack space used for cpumask_of_cpu(). The pointer to the cpumask_t value is needed for calling set_cpus_allowed_ptr() to reduce the amount of stack space needed to pass the cpumask_t value. If there isn't a cpumask_of_cpu_map[], then a temporary variable is declared and filled in with value from cpumask_of_cpu(cpu) as well as a pointer variable pointing to this temporary variable. Afterwards, the pointer is used to reference the cpumask value. The compiler will optimize out the extra dereference through the pointer as well as the stack space used for the pointer, resulting in identical code. A good example of the orthogonal usages is in net/sunrpc/svc.c: case SVC_POOL_PERCPU: { unsigned int cpu = m->pool_to[pidx]; cpumask_of_cpu_ptr(cpumask, cpu); *oldmask = current->cpus_allowed; set_cpus_allowed_ptr(current, cpumask); return 1; } case SVC_POOL_PERNODE: { unsigned int node = m->pool_to[pidx]; node_to_cpumask_ptr(nodecpumask, node); *oldmask = current->cpus_allowed; set_cpus_allowed_ptr(current, nodecpumask); return 1; } Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | Revert "cpumask: introduce new APIs"Ingo Molnar2008-07-06
| | | | | | | | | | | | | | | | This reverts commit acb7669c125676e63cf96582455509216c39745e. the wrappers are not needed anymore. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | Merge commit 'v2.6.26-rc9' into cpus4096Ingo Molnar2008-07-06
|\|
| * cpumask: introduce new APIsStephen Rothwell2008-07-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In linux-next there is a commit ("x86: Add performance variants of cpumask operators") which, as part of the 4096 cpu support work adds some new APIs for dealing with cpu masks. Add trivial versions of these now so that subsystems can update in a timely manner and avoid conflicts in linux-next and the next merge window. Cc: Mike Travis <travis@sgi.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | cpumask: make for_each_cpu_mask a bit smallerAlexander van Heukelum2008-05-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The for_each_cpu_mask loop is used quite often in the kernel. It makes use of two functions: first_cpu and next_cpu. This patch changes for_each_cpu_mask to use only the latter. Because next_cpu finds the next eligible cpu _after_ the given one, the iteration variable has to be initialized to -1 and next_cpu has to be called with this value before the first iteration. An x86_64 defconfig kernel (from sched/latest) is about 2500 bytes smaller with this patch applied: text data bss dec hex filename 6222517 917952 749932 7890401 7865e1 vmlinux.orig 6219922 917952 749932 7887806 785bbe vmlinux The same size reduction is seen for defconfig+MAXSMP text data bss dec hex filename 6241772 2563968 1492716 10298456 9d2458 vmlinux.orig 6239211 2563968 1492716 10295895 9d1a57 vmlinux Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | x86: Add performance variants of cpumask operatorsMike Travis2008-05-23
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Increase performance for systems with large count NR_CPUS by limiting the range of the cpumask operators that loop over the bits in a cpumask_t variable. This removes a large amount of wasted cpu cycles. * Add performance variants of the cpumask operators: int cpus_weight_nr(mask) Same using nr_cpu_ids instead of NR_CPUS int first_cpu_nr(mask) Number lowest set bit, or nr_cpu_ids int next_cpu_nr(cpu, mask) Next cpu past 'cpu', or nr_cpu_ids for_each_cpu_mask_nr(cpu, mask) for-loop cpu over mask using nr_cpu_ids * Modify following to use performance variants: #define num_online_cpus() cpus_weight_nr(cpu_online_map) #define num_possible_cpus() cpus_weight_nr(cpu_possible_map) #define num_present_cpus() cpus_weight_nr(cpu_present_map) #define for_each_possible_cpu(cpu) for_each_cpu_mask_nr((cpu), ...) #define for_each_online_cpu(cpu) for_each_cpu_mask_nr((cpu), ...) #define for_each_present_cpu(cpu) for_each_cpu_mask_nr((cpu), ...) * Comment added to include/linux/cpumask.h: Note: The alternate operations with the suffix "_nr" are used to limit the range of the loop to nr_cpu_ids instead of NR_CPUS when NR_CPUS > 64 for performance reasons. If NR_CPUS is <= 64 then most assembler bitmask operators execute faster with a constant range, so the operator will continue to use NR_CPUS. Another consideration is that nr_cpu_ids is initialized to NR_CPUS and isn't lowered until the possible cpus are discovered (including any disabled cpus). So early uses will span the entire range of NR_CPUS. (The net effect is that for systems with 64 or less CPU's there are no functional changes.) For inclusion into sched-devel/latest tree. Based on: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git + sched-devel/latest .../mingo/linux-2.6-sched-devel.git Cc: Paul Jackson <pj@sgi.com> Cc: Christoph Lameter <clameter@sgi.com> Reviewed-by: Paul Jackson <pj@sgi.com> Reviewed-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* cpumask: remove bitmap_scnprintf_len and cpumask_scnprintf_lenPaul Jackson2008-05-13
| | | | | | | | | | | | | | | They aren't used. They were briefly used as part of some other patches to provide an alternative format for displaying some /proc and /sys cpumasks. They probably should have been removed when those other patches were dropped, in favor of a different solution. Signed-off-by: Paul Jackson <pj@sgi.com> Cc: "Mike Travis" <travis@sgi.com> Cc: "Bert Wesarg" <bert.wesarg@googlemail.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mempolicy: add bitmap_onto() and bitmap_fold() operationsPaul Jackson2008-04-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following adds two more bitmap operators, bitmap_onto() and bitmap_fold(), with the usual cpumask and nodemask wrappers. The bitmap_onto() operator computes one bitmap relative to another. If the n-th bit in the origin mask is set, then the m-th bit of the destination mask will be set, where m is the position of the n-th set bit in the relative mask. The bitmap_fold() operator folds a bitmap into a second that has bit m set iff the input bitmap has some bit n set, where m == n mod sz, for the specified sz value. There are two substantive changes between this patch and its predecessor bitmap_relative: 1) Renamed bitmap_relative() to be bitmap_onto(). 2) Added bitmap_fold(). The essential motivation for bitmap_onto() is to provide a mechanism for converting a cpuset-relative CPU or Node mask to an absolute mask. Cpuset relative masks are written as if the current task were in a cpuset whose CPUs or Nodes were just the consecutive ones numbered 0..N-1, for some N. The bitmap_onto() operator is provided in anticipation of adding support for the first such cpuset relative mask, by the mbind() and set_mempolicy() system calls, using a planned flag of MPOL_F_RELATIVE_NODES. These bitmap operators (and their nodemask wrappers, in particular) will be used in code that converts the user specified cpuset relative memory policy to a specific system node numbered policy, given the current mems_allowed of the tasks cpuset. Such cpuset relative mempolicies will address two deficiencies of the existing interface between cpusets and mempolicies: 1) A task cannot at present reliably establish a cpuset relative mempolicy because there is an essential race condition, in that the tasks cpuset may be changed in between the time the task can query its cpuset placement, and the time the task can issue the applicable mbind or set_memplicy system call. 2) A task cannot at present establish what cpuset relative mempolicy it would like to have, if it is in a smaller cpuset than it might have mempolicy preferences for, because the existing interface only allows specifying mempolicies for nodes currently allowed by the cpuset. Cpuset relative mempolicies are useful for tasks that don't distinguish particularly between one CPU or Node and another, but only between how many of each are allowed, and the proper placement of threads and memory pages on the various CPUs and Nodes available. The motivation for the added bitmap_fold() can be seen in the following example. Let's say an application has specified some mempolicies that presume 16 memory nodes, including say a mempolicy that specified MPOL_F_RELATIVE_NODES (cpuset relative) nodes 12-15. Then lets say that application is crammed into a cpuset that only has 8 memory nodes, 0-7. If one just uses bitmap_onto(), this mempolicy, mapped to that cpuset, would ignore the requested relative nodes above 7, leaving it empty of nodes. That's not good; better to fold the higher nodes down, so that some nodes are included in the resulting mapped mempolicy. In this case, the mempolicy nodes 12-15 are taken modulo 8 (the weight of the mems_allowed of the confining cpuset), resulting in a mempolicy specifying nodes 4-7. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Andi Kleen <ak@suse.de> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: <kosaki.motohiro@jp.fujitsu.com> Cc: <ray-lk@madrabbit.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* x86: convert cpumask_of_cpu macro to allocated arrayMike Travis2008-04-19
| | | | | | | | | | | | | | * Here is a simple patch to use an allocated array of cpumasks to represent cpumask_of_cpu() instead of constructing one on the stack. It's based on the Kconfig option "HAVE_CPUMASK_OF_CPU_MAP" which is currently only set for x86_64 SMP. Otherwise the the existing cpumask_of_cpu() is used but has been changed to produce an lvalue so a pointer to it can be used. Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* cpumask: add CPU_MASK_ALL_PTR macroMike Travis2008-04-19
| | | | | | | | | | | | | | | | * Add a static cpumask_t variable "CPU_MASK_ALL_PTR" to use as a pointer reference to CPU_MASK_ALL. This reduces where possible the instances where CPU_MASK_ALL allocates and fills a large array on the stack. Used only if NR_CPUS > BITS_PER_LONG. * Change init/main.c to use new set_cpus_allowed_ptr(). Depends on: [sched-devel]: sched: add new set_cpus_allowed_ptr function Cc: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* cpumask: add cpumask_scnprintf_len functionMike Travis2008-04-19
| | | | | | | | | | | | | Add a new function cpumask_scnprintf_len() to return the number of characters needed to display "len" cpumask bits. The current method of allocating NR_CPUS bytes is incorrect as what's really needed is 9 characters per 32-bit word of cpumask bits (8 hex digits plus the seperator [','] or the terminating NULL.) This function provides the caller the means to allocate the correct string length. Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Mike Travis <travis@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* x86: avoid build warningIngo Molnar2008-01-30
| | | | | | | | | | fix this build warning: include/asm/topology_32.h: In function 'node_to_first_cpu': include/asm/topology_32.h:66: warning: unused variable 'mask' Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* CPU hotplug: fix cpu_is_offline() on !CONFIG_HOTPLUG_CPUIngo Molnar2008-01-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | make randconfig bootup testing found that the cpufreq code crashes on bootup, if the powernow-k8 driver is enabled and if maxcpus=1 passed on the boot line to a !CONFIG_HOTPLUG_CPU kernel. First lockdep found out that there's an inconsistent unlock sequence: ===================================== [ BUG: bad unlock balance detected! ] ------------------------------------- swapper/1 is trying to release lock (&per_cpu(cpu_policy_rwsem, cpu)) at: [<ffffffff806ffd8e>] unlock_policy_rwsem_write+0x3c/0x42 but there are no more locks to release! Call Trace: [<ffffffff806ffd8e>] unlock_policy_rwsem_write+0x3c/0x42 [<ffffffff80251c29>] print_unlock_inbalance_bug+0x104/0x12c [<ffffffff80252f3a>] mark_held_locks+0x56/0x94 [<ffffffff806ffd8e>] unlock_policy_rwsem_write+0x3c/0x42 [<ffffffff807008b6>] cpufreq_add_dev+0x2a8/0x5c4 ... then shortly afterwards the cpufreq code crashed on an assert: ------------[ cut here ]------------ kernel BUG at drivers/cpufreq/cpufreq.c:1068! invalid opcode: 0000 [1] SMP [...] Call Trace: [<ffffffff805145d6>] sysdev_driver_unregister+0x5b/0x91 [<ffffffff806ff520>] cpufreq_register_driver+0x15d/0x1a2 [<ffffffff80cc0596>] powernowk8_init+0x86/0x94 [...] ---[ end trace 1e9219be2b4431de ]--- the bug was caused by maxcpus=1 bootup, which brought up the secondary core as !cpu_online() but !cpu_is_offline() either, which on on !CONFIG_HOTPLUG_CPU is always 0 (include/linux/cpu.h): /* CPUs don't go offline once they're online w/o CONFIG_HOTPLUG_CPU */ static inline int cpu_is_offline(int cpu) { return 0; } but the cpufreq code uses cpu_online() and cpu_is_offline() in a mixed way - the low-level drivers use cpu_online(), while the cpufreq core uses cpu_is_offline(). This opened up the possibility to add the non-initialized sysdev device of the secondary core: cpufreq-core: trying to register driver powernow-k8 cpufreq-core: adding CPU 0 powernow-k8: BIOS error - no PSB or ACPI _PSS objects cpufreq-core: initialization failed cpufreq-core: adding CPU 1 cpufreq-core: initialization failed which then blew up. The fix is to make cpu_is_offline() always the negation of cpu_online(). With that fix applied the kernel boots up fine without crashing: Calling initcall 0xffffffff80cc0510: powernowk8_init+0x0/0x94() powernow-k8: Found 1 AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ processors (1 cpu cores) (version 2.20.00) powernow-k8: BIOS error - no PSB or ACPI _PSS objects initcall 0xffffffff80cc0510: powernowk8_init+0x0/0x94() returned -19. initcall 0xffffffff80cc0510 ran for 19 msecs: powernowk8_init+0x0/0x94() Calling initcall 0xffffffff80cc328f: init_lapic_nmi_sysfs+0x0/0x39() We could fix this by making CPU enumeration aware of max_cpus, but that would be more fragile IMO, and the cpu_online(cpu) != cpu_is_offline(cpu) possibility was quite confusing and a continuous source of bugs too. Most distributions have kernels with CPU hotplug enabled, so this bug remained hidden for a long time. Bug forensics: The broken cpu_is_offline() API variant was introduced via: commit a59d2e4e6977e7b94e003c96a41f07e96cddc340 Author: Rusty Russell <rusty@rustcorp.com.au> Date: Mon Mar 8 06:06:03 2004 -0800 [PATCH] minor cleanups for hotplug CPUs ( this predates linux-2.6.git, this commit is available from Thomas's historic git tree. ) Then 1.5 years later the cpufreq code made use of it: commit c32b6b8e524d2c337767d312814484d9289550cf Author: Ashok Raj <ashok.raj@intel.com> Date: Sun Oct 30 14:59:54 2005 -0800 [PATCH] create and destroy cpufreq sysfs entries based on cpu notifiers + if (cpu_is_offline(cpu)) + return 0; which is a correct use of the subtly broken new API. v2.6.15 then shipped with this bug included. then it took two more years for random-kernel qa to hit it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] Convert highest_possible_processor_id to nr_cpu_idsChristoph Lameter2007-02-20
| | | | | | | | | | | | | | | | | We frequently need the maximum number of possible processors in order to allocate arrays for all processors. So far this was done using highest_possible_processor_id(). However, we do need the number of processors not the highest id. Moreover the number was so far dynamically calculated on each invokation. The number of possible processors does not change when the system is running. We can therefore calculate that number once. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Frederik Deweerdt <frederik.deweerdt@gmail.com> Cc: Neil Brown <neilb@suse.de> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] bitmap: parse input from kernel and user buffersReinette Chatre2006-10-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | lib/bitmap.c:bitmap_parse() is a library function that received as input a user buffer. This seemed to have originated from the way the write_proc function of the /proc filesystem operates. This has been reworked to not use kmalloc and eliminates a lot of get_user() overhead by performing one access_ok before using __get_user(). We need to test if we are in kernel or user space (is_user) and access the buffer differently. We cannot use __get_user() to access kernel addresses in all cases, for example in architectures with separate address space for kernel and user. This function will be useful for other uses as well; for example, taking input for /sysfs instead of /proc, so it was changed to accept kernel buffers. We have this use for the Linux UWB project, as part as the upcoming bandwidth allocator code. Only a few routines used this function and they were changed too. Signed-off-by: Reinette Chatre <reinette.chatre@linux.intel.com> Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com> Cc: Paul Jackson <pj@sgi.com> Cc: Joe Korty <joe.korty@ccur.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] for_each_cpu_mask() warning fixAndrew Morton2006-06-25
| | | | | | | | | | | | | | | | | | | On UP, this: cpumask_t mask = node_to_cpumask(numa_node_id()); for_each_cpu_mask(cpu, mask) does this: mm/readahead.c: In function `node_readahead_aging': mm/readahead.c:850: warning: unused variable `mask' which is unpleasantly fixed by this: Acked-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] remove for_each_cpu()Andrew Morton2006-06-25
| | | | | | | | Convert a few stragglers over to for_each_possible_cpu(), remove for_each_cpu(). Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] for_each_possible_cpu: defines for_each_possible_cpuKAMEZAWA Hiroyuki2006-03-28
| | | | | | | | | | | | | for_each_cpu() is a for-loop over cpu_possible_map. for_each_online_cpu is for-loop cpu over cpu_online_map. .....for_each_cpu() is not sufficiently explicit and can lead to mistakes. This patch adds for_each_possible_cpu() in preparation for the removal of for_each_cpu(). Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] cpumask: uninline any_online_cpu()Andrew Morton2006-03-25
| | | | | | | | | | | | | | | | | text data bss dec hex filename before: 3605597 1363528 363328 5332453 515de5 vmlinux after: 3605295 1363612 363200 5332107 515c8b vmlinux 218 bytes saved. Also, optimise any_online_cpu() out of existence on CONFIG_SMP=n. This function seems inefficient. Can't we simply AND the two masks, then use find_first_bit()? Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] cpumask: uninline highest_possible_processor_id()Andrew Morton2006-03-25
| | | | | | | | | | | Shrinks the only caller (net/bridge/netfilter/ebtables.c) by 174 bytes. Also, optimise highest_possible_processor_id() out of existence on CONFIG_SMP=n. Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] cpumask: uninline next_cpu()Andrew Morton2006-03-25
| | | | | | | | | | | | text data bss dec hex filename before: 3488027 1322496 360128 5170651 4ee5db vmlinux after: 3485112 1322480 359968 5167560 4ed9c8 vmlinux 2931 bytes saved Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] cpumask: uninline first_cpu()Andrew Morton2006-03-25
| | | | | | | | | | text data bss dec hex filename before: 3490577 1322408 360000 5172985 4eeef9 vmlinux after: 3488027 1322496 360128 5170651 4ee5db vmlinux Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] x86: don't initialise cpu_possible_map to all onesAndrew Morton2006-02-10
| | | | | | | | | | | | | | | | | | | | | | | | | Initialising cpu_possible_map to all-ones with CONFIG_HOTPLUG_CPU means that a) All for_each_cpu() loops will iterate across all NR_CPUS CPUs, rather than over possible ones. That can be quite expensive. b) Soon we'll be allocating per-cpu areas only for possible CPUs. So with CPU_MASK_ALL, we'll be wasting memory. I also switched voyager over to not use CPU_MASK_ALL in the non-CPU-hotplug case. Should be OK.. I note that parisc is also using CPU_MASK_ALL. Suggest that it stop doing that. Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Paul Jackson <pj@sgi.com> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Zwane Mwaikambo <zwane@linuxpower.ca> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] remove bogus asm/bug.h includes.Al Viro2006-02-07
| | | | | | | A bunch of asm/bug.h includes are both not needed (since it will get pulled anyway) and bogus (since they are done too early). Removed. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [PATCH] cpusets: bitmap and mask remap operatorsPaul Jackson2005-10-30
| | | | | | | | | | | | | | | | | | | | | In the forthcoming task migration support, a key calculation will be mapping cpu and node numbers from the old set to the new set while preserving cpuset-relative offset. For example, if a task and its pages on nodes 8-11 are being migrated to nodes 24-27, then pages on node 9 (the 2nd node in the old set) should be moved to node 25 (the 2nd node in the new set.) As with other bitmap operations, the proper way to code this is to provide the underlying calculation in lib/bitmap.c, and then to provide the usual cpumask and nodemask wrappers. This patch provides that. These operations are termed 'remap' operations. Both remapping a single bit and a set of bits is supported. Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH]: highest_possible_processor_id() has to be a macroAl Viro2005-10-16
| | | | | | | | | | | | | | | | | | | | ... otherwise, things like alpha and sparc64 break and break badly. They define cpu_possible_map to something else in smp.h *AFTER* having included cpumask.h. If that puppy is a macro, expansion will happen at the actual caller, when we'd already seen #define cpu_possible_map ... and we will get the right thing used. As an inline helper it will be tokenized before we get to that define and that's it; no matter what we define later, it won't affect anything. We get modules with dependency on cpu_possible_map instead of the right symbol (phys_cpu_present_map in case of sparc64), or outright link errors if they are built-in. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: Fix OOPSes on machines with discontiguous cpu numbering.David S. Miller2005-10-13
| | | | | | | | | | | | | | Original patch by Harald Welte, with feedback from Herbert Xu and testing by Sébastien Bernard. EBTABLES, ARP tables, and IP/IP6 tables all assume that cpus are numbered linearly. That is not necessarily true. This patch fixes that up by calculating the largest possible cpu number, and allocating enough per-cpu structure space given that. Signed-off-by: David S. Miller <davem@davemloft.net>
* Linux-2.6.12-rc2v2.6.12-rc2Linus Torvalds2005-04-16
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!