aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* fix idr_find() lockingNadia Derbey2007-10-19
| | | | | | | | | | | | | | | | | | This is a patch that fixes the way idr_find() used to be called in ipc_lock(): in all the paths that don't imply an update of the ipcs idr, it was called without the idr tree being locked. The changes are: . in ipc_ids, the mutex has been changed into a reader/writer semaphore. . ipc_lock() now takes the mutex as a reader during the idr_find(). . a new routine ipc_lock_down() has been defined: it doesn't take the mutex, assuming that it is being held by the caller. This is the routine that is now called in all the update paths. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Acked-by: Jarek Poplawski <jarkao2@o2.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ipc: fix wrong commentsNadia Derbey2007-10-19
| | | | | | | | | This patch fixes the wrong / obsolete comments in the ipc code. Also adds a missing lock around ipc_get_maxid() in shm_get_stat(). Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ipc: inline ipc_buildid()Nadia Derbey2007-10-19
| | | | | | | | | This is a trivial patch that changes the ipc_buildid() routine into a static inline. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ipc: introduce the ipcid_to_idx macroNadia Derbey2007-10-19
| | | | | | | | | This is a trivial patch that changes all the (id % SEQ_MULTIPLIER) into a call to the ipcid_to_idx(id) macro. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Storing ipcs into IDRsNadia Derbey2007-10-19
| | | | | | | | | | | | | | This patch converts casts of struct kern_ipc_perm to . struct msg_queue . struct sem_array . struct shmid_kernel into the equivalent container_of() macro. It improves code maintenance because the code need not change if kern_ipc_perm is no longer at the beginning of the containing struct. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ipc: integrate ipc_checkid() into ipc_lock()Nadia Derbey2007-10-19
| | | | | | | | | | | | This patch introduces a new ipc_lock_check() routine interface: . each time ipc_checkid() is called, this is done after calling ipc_lock(). ipc_checkid() is now called from inside ipc_lock_check(). [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: fix RCU locking] Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ipc: remove the ipc_get() routineNadia Derbey2007-10-19
| | | | | | | | | This is a trivial patch that removes the ipc_get() routine: it is replaced by a call to idr_find(). Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ipc: unify the syscalls codeNadia Derbey2007-10-19
| | | | | | | | | | This patch introduces a change into the sys_msgget(), sys_semget() and sys_shmget() routines: they now share a common code, which is better for maintainability. Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ipc: store ipcs into IDRsNadia Derbey2007-10-19
| | | | | | | | | | | | | | | | | This patch introduces ipcs storage into IDRs. The main changes are: . This ipc_ids structure is changed: the entries array is changed into a root idr structure. . The grow_ary() routine is removed: it is not needed anymore when adding an ipc structure, since we are now using the IDR facility. . The ipc_rmid() routine interface is changed: . there is no need for this routine to return the pointer passed in as argument: it is now declared as a void . since the id is now part of the kern_ipc_perm structure, no need to have it as an argument to the routine Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Add irq protection in the percpu-counters cpu-hotplug-callback pathGautham R Shenoy2007-10-19
| | | | | | | | | | | | | Some of the per-cpu counters and thus their locks are accessed from IRQ contexts. This can cause a deadlock if it interrupts a cpu-offline thread which is transferring a dead-cpu's counts to the global counter. Add appropriate IRQ protection in the cpu-hotplug callback path. Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* CPU HOTPLUG: avoid hotadd when proper possible_map isn't specifiedKAMEZAWA Hiroyuki2007-10-19
| | | | | | | | | | | | | cpu-hot-add should be fail if cpu is not set in cpu_possible_map. If go ahead, the system will panic soon. Especially, arch which requires additional_cpus= parameter should handle this. Tested on ia64. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* hotplug cpu: migrate a task within its cpusetCliff Wickman2007-10-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a cpu is disabled, move_task_off_dead_cpu() is called for tasks that have been running on that cpu. Currently, such a task is migrated: 1) to any cpu on the same node as the disabled cpu, which is both online and among that task's cpus_allowed 2) to any cpu which is both online and among that task's cpus_allowed It is typical of a multithreaded application running on a large NUMA system to have its tasks confined to a cpuset so as to cluster them near the memory that they share. Furthermore, it is typical to explicitly place such a task on a specific cpu in that cpuset. And in that case the task's cpus_allowed includes only a single cpu. This patch would insert a preference to migrate such a task to some cpu within its cpuset (and set its cpus_allowed to its entire cpuset). With this patch, migrate the task to: 1) to any cpu on the same node as the disabled cpu, which is both online and among that task's cpus_allowed 2) to any online cpu within the task's cpuset 3) to any cpu which is both online and among that task's cpus_allowed In order to do this, move_task_off_dead_cpu() must make a call to cpuset_cpus_allowed_locked(), a new subset of cpuset_cpus_allowed(), that will not block. (name change - per Oleg's suggestion) Calls are made to cpuset_lock() and cpuset_unlock() in migration_call() to set the cpuset mutex during the whole migrate_live_tasks() and migrate_dead_tasks() procedure. [akpm@linux-foundation.org: build fix] [pj@sgi.com: Fix indentation and spacing] Signed-off-by: Cliff Wickman <cpw@sgi.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Christoph Lameter <clameter@sgi.com> Cc: Paul Jackson <pj@sgi.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Control groups: Replace "cont" with "cgrp" and other misc renamingPaul Menage2007-10-19
| | | | | | | | | | | | | | | | | Replace "cont" with "cgrp" and other misc renaming This patch finishes some of the names that got missed in the great "task containers" -> "control groups" rename. Primarily it renames the local variable "cont" to "cgrp" in a number of places, and renames the CONT_* enum members to CGRP_*. This patch is not intended to have any effect on the generated code; the output of "objdump -d kernel/cgroup.o" is unchanged. Signed-off-by: Paul Menage <menage@google.com> Acked-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Use task_pid_nr() instead of pid_nr(task_pid())Pavel Emelyanov2007-10-19
| | | | | | | | | | | There are two places that do so - the cgroups subsystem and the autofs code. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Ian Kent <raven@themaw.net> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Use task_pid_nr() in ip_vs_sync.cPavel Emelyanov2007-10-19
| | | | | | | | | | | The sync_master_pid and sync_backup_pid are set in set_sync_pid() and are used later for set/not-set checks and in printk. So it is safe to use the global pid value in this case. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Remove unused variables from fs/proc/base.cPavel Emelyanov2007-10-19
| | | | | | | | | | | When removing the explicit task_struct->pid usage I found that proc_readfd_common() and proc_pident_readdir() get this field, but do not use it at all. So this cleanup is a cheap help with the task_struct->pid isolation. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Use helpers to obtain task pid in printks (arch code)Alexey Dobriyan2007-10-19
| | | | | | | | | | | | | | | One of the easiest things to isolate is the pid printed in kernel log. There was a patch, that made this for arch-independent code, this one makes so for arch/xxx files. It took some time to cross-compile it, but hopefully these are all the printks in arch code. Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Use helpers to obtain task pid in printksPavel Emelyanov2007-10-19
| | | | | | | | | | | | | | | | The task_struct->pid member is going to be deprecated, so start using the helpers (task_pid_nr/task_pid_vnr/task_pid_nr_ns) in the kernel. The first thing to start with is the pid, printed to dmesg - in this case we may safely use task_pid_nr(). Besides, printks produce more (much more) than a half of all the explicit pid usage. [akpm@linux-foundation.org: git-drm went and changed lots of stuff] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Dave Airlie <airlied@linux.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Isolate the explicit usage of signal->pgrpPavel Emelyanov2007-10-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The pgrp field is not used widely around the kernel so it is now marked as deprecated with appropriate comment. The initialization of INIT_SIGNALS is trimmed because a) they are set to 0 automatically; b) gcc cannot properly initialize two anonymous (the second one is the one with the session) unions. In this particular case to make it compile we'd have to add some field initialized right before the .pgrp. This is the same patch as the 1ec320afdc9552c92191d5f89fcd1ebe588334ca one (from Cedric), but for the pgrp field. Some progress report: We have to deprecate the pid, tgid, session and pgrp fields on struct task_struct and struct signal_struct. The session and pgrp are already deprecated. The tgid value is close to being such - the worst known usage in in fs/locks.c and audit code. The pid field deprecation is mainly blocked by numerous printk-s around the kernel that print the tsk->pid to log. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Herbert Poetzl <herbert@13thfloor.at> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Fix tsk->exit_state usageEugene Teo2007-10-19
| | | | | | | | | | | | tsk->exit_state can only be 0, EXIT_ZOMBIE, or EXIT_DEAD. A non-zero test is the same as tsk->exit_state & (EXIT_ZOMBIE | EXIT_DEAD), so just testing tsk->exit_state is sufficient. Signed-off-by: Eugene Teo <eugeneteo@kernel.sg> Cc: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* proc: export a processes resource limits via /proc/pidNeil Horman2007-10-19
| | | | | | | | | | | | | Currently, there exists no method for a process to query the resource limits of another process. They can be inferred via some mechanisms but they cannot be explicitly determined. Given that this information can be usefull to know during the debugging of an application, I've written this patch which exports all of a processes limits via /proc/<pid>/limits. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* remove BITS_TO_TYPE macroJiri Slaby2007-10-19
| | | | | | | | | | | remove BITS_TO_TYPE macro I realized, that it is actually the same as DIV_ROUND_UP, use it instead. [akpm@linux-foundation.org: build fix] Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* FlashPoint, use BIT instead of BITWJiri Slaby2007-10-19
| | | | | | | | | | | FlashPoint, use BIT instead of BITW BITW was an ushort variant of BIT, use BIT instead Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* define global BIT macroJiri Slaby2007-10-19
| | | | | | | | | | | | | | | | | | | | define global BIT macro move all local BIT defines to the new globally define macro. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Kumar Gala <galak@gate.crashing.org> Cc: Dmitry Torokhov <dtor@mail.ru> Cc: Jeff Garzik <jeff@garzik.org> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: "Antonino A. Daplas" <adaplas@pol.net> Cc: Russell King <rmk@arm.linux.org.uk> Acked-by: Ralf Baechle <ralf@linux-mips.org> Cc: "John W. Linville" <linville@tuxdriver.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* get rid of input BIT* duplicate definesJiri Slaby2007-10-19
| | | | | | | | | | | | | | | | | | | | | | get rid of input BIT* duplicate defines use newly global defined macros for input layer. Also remove includes of input.h from non-input sources only for BIT macro definiton. Define the macro temporarily in local manner, all those local definitons will be removed further in this patchset (to not break bisecting). BIT macro will be globally defined (1<<x) Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: <dtor@mail.ru> Acked-by: Jiri Kosina <jkosina@suse.cz> Cc: <lenb@kernel.org> Acked-by: Marcel Holtmann <marcel@holtmann.org> Cc: <perex@suse.cz> Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: <vernux@us.ibm.com> Cc: <malattia@linux.it> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* define first set of BIT* macrosJiri Slaby2007-10-19
| | | | | | | | | | | | | | | define first set of BIT* macros - move BITOP_MASK and BITOP_WORD from asm-generic/bitops/atomic.h to include/linux/bitops.h and rename it to BIT_MASK and BIT_WORD - move BITS_TO_LONGS and BITS_PER_BYTE to bitops.h too and allow easily define another BITS_TO_something (e.g. in event.c) by BITS_TO_TYPE macro Remaining (and common) BIT macro will be defined after all occurences and conflicts will be sorted out in the patches. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* amba-pl011, rename BIT macroJiri Slaby2007-10-19
| | | | | | | | amba-pl011, rename BIT macro Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* s2io, rename BIT macroJiri Slaby2007-10-19
| | | | | | | | | | | | | | | s2io, rename BIT macro BIT macro will be global definiton of (1<<x) Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Ramkrishna Vepa <ram.vepa@neterion.com> Cc: Rastapur Santosh <santosh.rastapur@neterion.com> Cc: Sivakumar Subramani <sivakumar.subramani@neterion.com> Cc: Sreenivasa Honnur <sreenivasa.honnur@neterion.com> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* i2c-pxa, rename BIT macro to PXA_BITJiri Slaby2007-10-19
| | | | | | | | | | | | i2c-pxa, rename BIT macro to PXA_BIT BIT macro will be global definiton of (1 << x) Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Nicolas Pitre <nico@cam.org> Cc: Jean Delvare <khali@linux-fr.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cyber2000fb: checkpatch fixesKrzysztof Helt2007-10-19
| | | | | | | | | | | | This patch fixes errors and warnings pointed out by the checkpatch.pl script. Antonino Daplas replaced BIT with ENCODE_BIT. Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl> Signed-off-by: Antonino Daplas <adaplas@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cyber2000fb, rename BIT macroJiri Slaby2007-10-19
| | | | | | | | | | | | cyber2000fb, rename BIT macro BIT will be global macro for (1 << x) Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: "Antonino A. Daplas" <adaplas@pol.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* forbid asm/bitops.h direct inclusionJiri Slaby2007-10-19
| | | | | | | | | | | | | forbid asm/bitops.h direct inclusion Because of compile errors that may occur after bit changes if asm/bitops.h is included directly without e.g. linux/kernel.h which includes linux/bitops.h, forbid direct inclusion of asm/bitops.h. Thanks to Adrian Bunk. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* remove asm/bitops.h includesJiri Slaby2007-10-19
| | | | | | | | | | | | | remove asm/bitops.h includes including asm/bitops directly may cause compile errors. don't include it and include linux/bitops instead. next patch will deny including asm header directly. Cc: Adrian Bunk <bunk@kernel.org> Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fs/select, remove unused macrosJiri Slaby2007-10-19
| | | | | | | | | | fs/select, remove unused macros this is due to preparation for global BIT macro Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Misc: phantom, improved data passingJiri Slaby2007-10-19
| | | | | | | | | | This new version guarantees amb_bit switch in small enough intervals, so that the device won't stop working in the middle of a movement anymore. However it preserves old (openhaptics) functionality. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Misc: phantom, add comment about openhapticsJiri Slaby2007-10-19
| | | | | | Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Misc: phantom, synchronize_irq() on suspendJiri Slaby2007-10-19
| | | | | | | | | Wait after disabling device's interrupt until the handler finishes its work if still in progress. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Fix cpusets update_cpumaskPaul Menage2007-10-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Cause writes to cpuset "cpus" file to update cpus_allowed for member tasks: - collect batches of tasks under tasklist_lock and then call set_cpus_allowed() on them outside the lock (since this can sleep). - add a simple generic priority heap type to allow efficient collection of batches of tasks to be processed without duplicating or missing any tasks in subsequent batches. - make "cpus" file update a no-op if the mask hasn't changed - fix race between update_cpumask() and sched_setaffinity() by making sched_setaffinity() post-check that it's not running on any cpus outside cpuset_cpus_allowed(). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Paul Menage <menage@google.com> Cc: Paul Jackson <pj@sgi.com> Cc: David Rientjes <rientjes@google.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cpusets: decrustify cpuset mask update codePaul Jackson2007-10-19
| | | | | | | | | | | | | | | | | Decrustify the kernel/cpuset.c 'cpus' and 'mems' updating code. Other than subtle improvements in the consistency of identifying white space at the beginning and end of passed in masks, this doesn't make any visible difference in behaviour. But it's one or two hundred kernel text bytes smaller, and easier to understand. [akpm@linux-foundation.org: coding-style fix] Signed-off-by: Paul Jackson <pj@sgi.com> Reviewed-by: Paul Menage <menage@google.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cpuset sched_load_balance flagPaul Jackson2007-10-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new per-cpuset flag called 'sched_load_balance'. When enabled in a cpuset (the default value) it tells the kernel scheduler that the scheduler should provide the normal load balancing on the CPUs in that cpuset, sometimes moving tasks from one CPU to a second CPU if the second CPU is less loaded and if that task is allowed to run there. When disabled (write "0" to the file) then it tells the kernel scheduler that load balancing is not required for the CPUs in that cpuset. Now even if this flag is disabled for some cpuset, the kernel may still have to load balance some or all the CPUs in that cpuset, if some overlapping cpuset has its sched_load_balance flag enabled. If there are some CPUs that are not in any cpuset whose sched_load_balance flag is enabled, the kernel scheduler will not load balance tasks to those CPUs. Moreover the kernel will partition the 'sched domains' (non-overlapping sets of CPUs over which load balancing is attempted) into the finest granularity partition that it can find, while still keeping any two CPUs that are in the same shed_load_balance enabled cpuset in the same element of the partition. This serves two purposes: 1) It provides a mechanism for real time isolation of some CPUs, and 2) it can be used to improve performance on systems with many CPUs by supporting configurations in which load balancing is not done across all CPUs at once, but rather only done in several smaller disjoint sets of CPUs. This mechanism replaces the earlier overloading of the per-cpuset flag 'cpu_exclusive', which overloading was removed in an earlier patch: cpuset-remove-sched-domain-hooks-from-cpusets See further the Documentation and comments in the code itself. [akpm@linux-foundation.org: don't be weird] Signed-off-by: Paul Jackson <pj@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Uninline the task_xid_nr_ns() callsPavel Emelyanov2007-10-19
| | | | | | | | | | | | | | | | | | Since these are expanded into call to pid_nr_ns() anyway, it's OK to move the whole routine out-of-line. This is a cheap way to save ~100 bytes from vmlinux. Together with the previous two patches, it saves half-a-kilo from the vmlinux. Un-inline other (currently inlined) functions must be done with additional performance testing. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Uninline find_pid etc set of functionsPavel Emelyanov2007-10-19
| | | | | | | | | | | | | | | | | | | | | The find_pid/_vpid/_pid_ns functions are used to find the struct pid by its id, depending on whic id - global or virtual - is used. The find_vpid() is a macro that pushes the current->nsproxy->pid_ns on the stack to call another function - find_pid_ns(). It turned out, that this dereference together with the push itself cause the kernel text size to grow too much. Move all these out-of-line. Together with the previous patch this saves a bit less that 400 bytes from .text section. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Isolate some explicit usage of task->tgidPavel Emelyanov2007-10-19
| | | | | | | | | | | | | | | | | | | | With pid namespaces this field is now dangerous to use explicitly, so hide it behind the helpers. Also the pid and pgrp fields o task_struct and signal_struct are to be deprecated. Unfortunately this patch cannot be sent right now as this leads to tons of warnings, so start isolating them, and deprecate later. Actually the p->tgid == pid has to be changed to has_group_leader_pid(), but Oleg pointed out that in case of posix cpu timers this is the same, and thread_group_leader() is more preferable. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* pid namespaces: remove the struct pid unneeded fieldsPavel Emelyanov2007-10-19
| | | | | | | | | | | | | Since we've switched from using pid->nr to pid->upids->nr some fields on struct pid are no longer needed Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Uninline find_task_by_xxx set of functionsPavel Emelyanov2007-10-19
| | | | | | | | | | | | | | | | | | | | | | | | The find_task_by_something is a set of macros are used to find task by pid depending on what kind of pid is proposed - global or virtual one. All of them are wrappers above the most generic one - find_task_by_pid_type_ns() - and just substitute some args for it. It turned out, that dereferencing the current->nsproxy->pid_ns construction and pushing one more argument on the stack inline cause kernel text size to grow. This patch moves all this stuff out-of-line into kernel/pid.c. Together with the next patch it saves a bit less than 400 bytes from the .text section. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* pid namespaces: changes to show virtual ids to userPavel Emelyanov2007-10-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the largest patch in the set. Make all (I hope) the places where the pid is shown to or get from user operate on the virtual pids. The idea is: - all in-kernel data structures must store either struct pid itself or the pid's global nr, obtained with pid_nr() call; - when seeking the task from kernel code with the stored id one should use find_task_by_pid() call that works with global pids; - when showing pid's numerical value to the user the virtual one should be used, but however when one shows task's pid outside this task's namespace the global one is to be used; - when getting the pid from userspace one need to consider this as the virtual one and use appropriate task/pid-searching functions. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: nuther build fix] [akpm@linux-foundation.org: yet nuther build fix] [akpm@linux-foundation.org: remove unneeded casts] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* pid namespaces: destroy pid namespace on init's deathSukadev Bhattiprolu2007-10-19
| | | | | | | | | | | | | | | Terminate all processes in a namespace when the reaper of the namespace is exiting. We do this by walking the pidmap of the namespace and sending SIGKILL to all processes. Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* pid namespaces: allow signalling cgroup-initSukadev Bhattiprolu2007-10-19
| | | | | | | | | | | | | | | | | | | | Only the global-init process must be special - any other cgroup-init process must be killable to prevent run-away processes in the system. TODO: Ideally we should allow killing the cgroup-init only from parent cgroup and prevent it being killed from within the cgroup. But that is a more complex change and will be addressed by a follow-on patch. For now allow the cgroup-init to be terminated by any process with sufficient privileges. Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* pid namespaces: create a slab-cache for 'struct pid_namespace'Sukadev Bhattiprolu2007-10-19
| | | | | | | | | | | | This will help fixing memory leaks due to bad reference counting. Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* pid namespaces: initialize the namespace's proc_mntPavel Emelyanov2007-10-19
| | | | | | | | | | | | | | | | | | | | | | | | The namespace's proc_mnt must be kern_mount-ed to make this pointer always valid, independently of whether the user space mounted the proc or not. This solves raced in proc_flush_task, etc. with the proc_mnt switching from NULL to not-NULL. The initialization is done after the init's pid is created and hashed to make proc_get_sb() finr it and get for root inode. Sice the namespace holds the vfsmnt, vfsmnt holds the superblock and the superblock holds the namespace we must explicitly break this circle to destroy all the stuff. This is done after the init of the namespace dies. Running a few steps forward - when init exits it will kill all its children, so no proc_mnt will be needed after its death. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com> Cc: Paul Menage <menage@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>