aboutsummaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAge
* Hibernation: Enter platform hibernation state in a consistent wayRafael J. Wysocki2007-10-18
| | | | | | | | | | | | | | | | | | | | | | | Make hibernation_platform_enter() execute the enter-a-sleep-state sequence instead of the mixed shutdown-with-entering-S4 thing. Replace the shutting down of devices done by kernel_shutdown_prepare(), before entering the ACPI S4 sleep state, with suspending them and the shutting down of sysdevs with calling device_power_down(PMSG_SUSPEND) (just like before entering S1 or S3, but the target state is now S4).  Also, disable the nonboot CPUs before entering the sleep state (S4), which generally always is a good idea. This is known to fix the "double disk spin down during hibernation" on some machines, eg. HPC nx6325 (ref. http://lkml.org/lkml/2007/8/7/316 and the following thread).  Moreover, it has been reported to make /sys/class/rtc/rtc0/wakealarm work correctly with hibernation for some users. It also generally causes the hibernation state (ACPI S4) to be entered faster. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Hibernation: Check if ACPI is enabled during restore in the right placeRafael J. Wysocki2007-10-18
| | | | | | | | | | | | | | | | | | | | | | | The following scenario leads to total confusion of the platform firmware on some boxes (eg. HPC nx6325): * Hibernate with ACPI enabled * Resume passing "acpi=off" to the boot kernel To prevent this from happening it's necessary to check if ACPI is enabled (and enable it if that's not the case) _right_ _after_ control has been transfered from the boot kernel to the image kernel, before device_power_up() is called (ie. with interrupts disabled).  Enabling ACPI after calling device_power_up() turns out to be insufficient. For this reason, introduce new hibernation callback ->leave() that will be executed before device_power_up() by the restored image kernel.  To make it work, it also is necessary to move swsusp_suspend() from swsusp.c to disk.c (it's name is changed to "create_image", which is more up to the point). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Hibernation: Arbitrary boot kernel support - generic codeRafael J. Wysocki2007-10-18
| | | | | | | | | | | | | | | | Add the bits needed for supporting arbitrary boot kernels to the common hibernation code. To support arbitrary boot kernels, make it possible to replace the 'struct new_utsname' and the kernel version in the hibernation image header by some architecture specific data that will be used to verify if the image is valid and to restore the image. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* serial: turn serial console suspend a boot rather than compile time optionAndres Salomon2007-10-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, there's a CONFIG_DISABLE_CONSOLE_SUSPEND that allows one to stop the serial console from being suspended when the rest of the machine goes to sleep. This is incredibly useful for debugging power management-related things; however, having it as a compile-time option has proved to be incredibly inconvenient for us (OLPC). There are plenty of times that we want serial console to not suspend, but for the most part we'd like serial console to be suspended. This drops CONFIG_DISABLE_CONSOLE_SUSPEND, and replaces it with a kernel boot parameter (no_console_suspend). By default, the serial console will be suspended along with the rest of the system; by passing 'no_console_suspend' to the kernel during boot, serial console will remain alive during suspend. For now, this is pretty serial console specific; further fixes could be applied to make this work for things like netconsole. Signed-off-by: Andres Salomon <dilinger@debian.org> Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@suspend2.net> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* freezer: measure freezing timeRafael J. Wysocki2007-10-18
| | | | | | | | | Measure the time of the freezing of tasks, even if it doesn't fail. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* freezer: be more verboseRafael J. Wysocki2007-10-18
| | | | | | | | | | | Increase the freezer's verbosity a bit, so that it's easier to read problem reports related to it. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* unexport pm_power_off_prepareAdrian Bunk2007-10-18
| | | | | | | | | This patch removes the unused EXPORT_SYMBOL(pm_power_off_prepare). Signed-off-by: Adrian Bunk <bunk@stusta.de> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* freezer: do not send signals to kernel threadsRafael J. Wysocki2007-10-18
| | | | | | | | | | | | | | | | | | The freezer should not send signals to kernel threads, since that may lead to subtle problems. In particular, commit b74d0deb968e1f85942f17080eace015ce3c332c has changed recalc_sigpending_tsk() so that it doesn't clear TIF_SIGPENDING. For this reason, if the freezer continues to send fake signals to kernel threads and the freezing of kernel threads fails, some of them may be running with TIF_SIGPENDING set forever. Accordingly, recalc_sigpending_tsk() shouldn't set the task's TIF_SIGPENDING flag if TIF_FREEZE is set. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* freezer: prevent new tasks from inheriting TIF_FREEZE setRafael J. Wysocki2007-10-18
| | | | | | | | | | | | Tasks should go to the refrigerator only if explicitly requested to do that by the freezer and not as a result of inheriting the TIF_FREEZE flag set from the parent. Make it happen. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* freezer: do not sync filesystems from freeze_processesRafael J. Wysocki2007-10-18
| | | | | | | | | | | | | | | | | The syncing of filesystems from within the freezer is generally not needed. Also, if there's an ext3 filesystem loopback-mounted from a FUSE one, the syncing results in writes to it and deadlocks. Similarly, it will deadlock if FUSE implements sync. Change freeze_processes() so that it doesn't execute sys_sync() and make the suspend and hibernation code path sync filesystems independently of the freezer. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* PM: Rename hibernation_ops to platform_hibernation_opsRafael J. Wysocki2007-10-18
| | | | | | | | | | | | Rename 'struct hibernation_ops' to 'struct platform_hibernation_ops' in analogy with 'struct platform_suspend_ops'. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Len Brown <lenb@kernel.org> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* PM: Rework struct hibernation_opsRafael J. Wysocki2007-10-18
| | | | | | | | | | | | | | | | | During hibernation we also need to tell the ACPI core that we're going to put the system into the S4 sleep state. For this reason, an additional method in 'struct hibernation_ops' is needed, playing the role of set_target() in 'struct platform_suspend_operations'. Moreover, the role of the .prepare() method is now different, so it's better to introduce another method, that in general may be different from .prepare(), that will be used to prepare the platform for creating the hibernation image (.prepare() is used anyway to notify the platform that we're going to enter the low power state after the image has been saved). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* PM: Make suspend_ops staticRafael J. Wysocki2007-10-18
| | | | | | | | | | | The variable suspend_ops representing the set of global platform-specific suspend-related operations, used by the PM core, need not be exported outside of kernel/power/main.c .  Make it static. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* PM: Rework struct platform_suspend_opsRafael J. Wysocki2007-10-18
| | | | | | | | | | | | | | | | | There is no reason why the .prepare() and .finish() methods in 'struct platform_suspend_ops' should take any arguments, since architectures don't use these methods' argument in any practically meaningful way (ie. either the target system sleep state is conveyed to the platform by .set_target(), or there is only one suspend state supported and it is indicated to the PM core by .valid(), or .prepare() and .finish() aren't defined at all).  There also is no reason why .finish() should return any result. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Len Brown <lenb@kernel.org> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* PM: Rename struct pm_ops and related thingsRafael J. Wysocki2007-10-18
| | | | | | | | | | | | | | | | The name of 'struct pm_ops' suggests that it is related to the power management in general, but in fact it is only related to suspend.  Moreover, its name should indicate what this structure is used for, so it seems reasonable to change it to 'struct platform_suspend_ops'.  In that case, the name of the global variable of this type used by the PM core and the names of related functions should be changed accordingly. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Len Brown <lenb@kernel.org> Cc: Greg KH <greg@kroah.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* make kernel/power/main.c:suspend_enter() staticAdrian Bunk2007-10-18
| | | | | | | | | | suspend_enter() can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* x86: C1E late detection fix. Really switch off lapic timerThomas Gleixner2007-10-17
| | | | | | | | | | | Doh, I completely missed that devices marked DUMMY are not running the set_mode function. So we force broadcasting, but we keep the local APIC timer running. Let the clock event layer mark the device _after_ switching it off. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-schedLinus Torvalds2007-10-17
|\ | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: sched: fix new task startup crash sched: fix !SYSFS build breakage sched: fix improper load balance across sched domain sched: more robust sd-sysctl entry freeing
| * sched: fix new task startup crashSrivatsa Vaddagiri2007-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Child task may be added on a different cpu that the one on which parent is running. In which case, task_new_fair() should check whether the new born task's parent entity should be added as well on the cfs_rq. Patch below fixes the problem in task_new_fair. This could fix the put_prev_task_fair() crashes reported. Reported-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Reported-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * sched: fix !SYSFS build breakageDhaval Giani2007-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When CONFIG_SYSFS is not set, CONFIG_FAIR_USER_SCHED fails to build with kernel/built-in.o: In function `uids_kobject_init': (.init.text+0x1488): undefined reference to `kernel_subsys' kernel/built-in.o: In function `uids_kobject_init': (.init.text+0x1490): undefined reference to `kernel_subsys' kernel/built-in.o: In function `uids_kobject_init': (.init.text+0x1480): undefined reference to `kernel_subsys' kernel/built-in.o: In function `uids_kobject_init': (.init.text+0x1494): undefined reference to `kernel_subsys' This patch fixes this build error. Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * sched: fix improper load balance across sched domainKen Chen2007-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We recently discovered a nasty performance bug in the kernel CPU load balancer where we were hit by 50% performance regression. When tasks are assigned to a subset of CPUs that span across sched_domains (either ccNUMA node or the new multi-core domain) via cpu affinity, kernel fails to perform proper load balance at these domains, due to several logic in find_busiest_group() miss identified busiest sched group within a given domain. This leads to inadequate load balance and causes 50% performance hit. To give you a concrete example, on a dual-core, 2 socket numa system, there are 4 logical cpu, organized as: CPU0 attaching sched-domain: domain 0: span 0003 groups: 0001 0002 domain 1: span 000f groups: 0003 000c CPU1 attaching sched-domain: domain 0: span 0003 groups: 0002 0001 domain 1: span 000f groups: 0003 000c CPU2 attaching sched-domain: domain 0: span 000c groups: 0004 0008 domain 1: span 000f groups: 000c 0003 CPU3 attaching sched-domain: domain 0: span 000c groups: 0008 0004 domain 1: span 000f groups: 000c 0003 If I run 2 tasks with CPU affinity set to 0x5. There are situation where cpu0 has run queue length of 2, and cpu2 will be idle. The kernel load balancer is unable to balance out these two tasks over cpu0 and cpu2 due to at least three logics in find_busiest_group() that heavily bias load balance towards power saving mode. e.g. while determining "busiest" variable, kernel only set it when "sum_nr_running > group_capacity". This test is flawed that "sum_nr_running" is not necessary same as sum-tasks-allowed-to-run-within-the sched-group. The end result is that kernel "think" everything is balanced, but in reality we have an imbalance and thus causing one CPU to be over-subscribed and leaving other idle. There are two other logic in the same function will also causing similar effect. The nastiness of this bug is that kernel not be able to get unstuck in this unfortunate broken state. From what we've seen in our environment, kernel will stuck in imbalanced state for extended period of time and it is also very easy for the kernel to stuck into that state (it's pretty much 100% reproducible for us). So proposing the following fix: add addition logic in find_busiest_group to detect intrinsic imbalance within the busiest group. When such condition is detected, load balance goes into spread mode instead of default grouping mode. Signed-off-by: Ken Chen <kenchen@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * sched: more robust sd-sysctl entry freeingMilton Miller2007-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It occurred to me this morning that the procname field was dynamically allocated and needed to be freed. I started to put in break statements when allocation failed but it was approaching 50% error handling code. I came up with this alternative of looping while entry->mode is set and checking proc_handler instead of ->table. Alternatively, the string version of the domain name and cpu number could be stored the structs. I verified by compiling CONFIG_DEBUG_SLAB and checking the allocation counts after taking a cpuset exclusive and back. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | security/ cleanupsAdrian Bunk2007-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch contains the following cleanups that are now possible: - remove the unused security_operations->inode_xattr_getsuffix - remove the no longer used security_operations->unregister_security - remove some no longer required exit code - remove a bunch of no longer used exports Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | ifdef struct task_struct::securityAlexey Dobriyan2007-10-17
| | | | | | | | | | | | | | | | | | | | | | | | For those who don't care about CONFIG_SECURITY. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: "Serge E. Hallyn" <serge@hallyn.com> Cc: Casey Schaufler <casey@schaufler-ca.com> Cc: James Morris <jmorris@namei.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | migration_call(CPU_DEAD): use spin_lock_irq() instead of task_rq_lock()Oleg Nesterov2007-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change migration_call(CPU_DEAD) to use direct spin_lock_irq() instead of task_rq_lock(rq->idle), rq->idle can't change its task_rq(). This makes the code a bit more symmetrical with migrate_dead_tasks()'s path which uses spin_lock_irq/spin_unlock_irq. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Cliff Wickman <cpw@sgi.com> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | do CPU_DEAD migrating under read_lock(tasklist) instead of ↵Oleg Nesterov2007-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | write_lock_irq(tasklist) Currently move_task_off_dead_cpu() is called under write_lock_irq(tasklist). This means it can't use task_lock() which is needed to improve migrating to take task's ->cpuset into account. Change the code to call move_task_off_dead_cpu() with irqs enabled, and change migrate_live_tasks() to use read_lock(tasklist). This all is a preparation for the futher changes proposed by Cliff Wickman, see http://marc.info/?t=117327786100003 Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Cliff Wickman <cpw@sgi.com> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Srivatsa Vaddagiri <vatsa@in.ibm.com> Cc: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | module: return error when mod_sysfs_init() failedAkinobu Mita2007-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | load_module() returns zero when mod_sysfs_init() fails, then the module loading will succeed accidentally. This patch makes load_module() return error correctly in that case. Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Compile handle_percpu_irq even for uniprocessor kernelsRalf Baechle2007-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Compiling handle_percpu_irq only on uniprocessor generates an artificial special case so a typical use like: set_irq_chip_and_handler(irq, &some_irq_type, handle_percpu_irq); needs to be conditionally compiled only on SMP systems as well and an alternative UP construct is usually needed - for no good reason. This fixes uniprocessor configurations for some MIPS SMP systems. Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | change inotifyfs magic as the same magic is used for futexfsAndrey Mirkin2007-10-17
| | | | | | | | | | | | | | | | | | | | | | Right now futexfs and inotifyfs have one magic 0xBAD1DEA, that looks a little bit confusing. Use 0xBAD1DEA as magic for futexfs and 0x2BAD1DEA as magic for inotifyfs. Signed-off-by: Andrey Mirkin <major@openvz.org> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Use KMEM_CACHE macro to create the nsproxy cachePavel Emelyanov2007-10-17
| | | | | | | | | | | | | | | | | | | | | | The blessed way for standard caches is to use it. Besides, this may give this cache a better alignment. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Cedric Le Goater <clg@fr.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | user.c: #ifdef ->mq_bytesAlexey Dobriyan2007-10-17
| | | | | | | | | | | | | | | | | | | | | | | | For those who deselect POSIX message queues. Reduces SLAB size of user_struct from 64 to 32 bytes here, SLUB size -- from 40 bytes to 32 bytes. [akpm@linux-foundation.org: fix build] Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | user.c: deinlineAlexey Dobriyan2007-10-17
| | | | | | | | | | | | | | | | Save some space because uid_hash_find() has 3 callsites. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | constify string/array kparam tracking structuresJan Beulich2007-10-17
| | | | | | | | | | | | | | | | | | .. in an effort to make read-only whatever can be made, so that CONFIG_DEBUG_RODATA can catch as many issues as possible. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | make kernel/profile.c:time_hook staticAdrian Bunk2007-10-17
| | | | | | | | | | | | | | | | {,un}register_timer_hook() is the API that should be used. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | kernel/sys_ni.c: add dummy sys_ni_syscall() prototypeAdrian Bunk2007-10-17
| | | | | | | | | | | | | | | | | | kernel/sys_ni.c can't #include <linux/syscalls.h> due to cond_syscall(), but let's tell gcc to not warn with -Wmissing-prototypes. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Move PREEMPT_NOTIFIERS into an always-included KconfigAvi Kivity2007-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kconfig.preempt is not included on some archs (for example, m68k). On those archs, the Kconfig machinery complains that KVM selects an undefined symbol PREEMPT_NOTIFIERS (which lives in Kconfig.preempt). So move the offending symbol into a Kconfig file which is included by everyone. Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Avi Kivity <avi@qumranet.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Shrink task_struct if CONFIG_FUTEX=nAlexey Dobriyan2007-10-17
| | | | | | | | | | | | | | | | | | | | robust_list, compat_robust_list, pi_state_list, pi_state_cache are really used if futexes are on. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | add-vmcore: add a prefix "VMCOREINFO_" to the vmcoreinfo macrosKen'ichi Ohmichi2007-10-17
| | | | | | | | | | | | | | | | | | | | | | | | Add a prefix "VMCOREINFO_" to the vmcoreinfo macros. Old vmcoreinfo macros were defined as generic names SYMBOL/SIZE/OFFSET /LENGTH/CONFIG, and it is impossible to grep for them. So these names should be changed. This discussion is the following: http://www.ussg.iu.edu/hypermail/linux/kernel/0709.1/0415.html Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | add-vmcore: add nodemask_t's size and NR_FREE_PAGES's value to vmcoreinfo_dataKen'ichi Ohmichi2007-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | [2/3] Add nodemask_t's size and NR_FREE_PAGES's value to vmcoreinfo_data. The dump filetering command 'makedumpfile'(v1.1.6 or before) had assumed the above values, and it was not good from the reliability viewpoint. So makedumpfile v1.2.0 came to need these values and I created the patch to let the kernel output them. makedumpfile site: https://sourceforge.net/projects/makedumpfile/ Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | add-vmcore: cleanup the coding style according to Andrew's commentsKen'ichi Ohmichi2007-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | [1/3] Cleanup the coding style according to Andrew's comments: http://lists.infradead.org/pipermail/kexec/2007-August/000522.html - vmcoreinfo_append_str() should have suitable __attribute__s so that the compiler can check its use. - vmcoreinfo_max_size should have size_t. - Use get_seconds() instead of xtime.tv_sec. - Use init_uts_ns.name.release instead of UTS_RELEASE. Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Add vmcoreinfoKen'ichi Ohmichi2007-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch set frees the restriction that makedumpfile users should install a vmlinux file (including the debugging information) into each system. makedumpfile command is the dump filtering feature for kdump. It creates a small dumpfile by filtering unnecessary pages for the analysis. To distinguish unnecessary pages, it needs a vmlinux file including the debugging information. These days, the debugging package becomes a huge file, and it is hard to install it into each system. To solve the problem, kdump developers discussed it at lkml and kexec-ml. As the result, we reached the conclusion that necessary information for dump filtering (called "vmcoreinfo") should be embedded into the first kernel file and it should be accessed through /proc/vmcore during the second kernel. (http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.0/1806.html) Dan Aloni created the patch set for the above implementation. (http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.1/1053.html) And I updated it for multi architectures and memory models. (http://lists.infradead.org/pipermail/kexec/2007-August/000479.html) Signed-off-by: Dan Aloni <da-x@monatomic.org> Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Signed-off-by: Bernhard Walle <bwalle@suse.de> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | do_sigaction: don't worry about signal_pending()Oleg Nesterov2007-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | do_sigaction() returns -ERESTARTNOINTR if signal_pending(). The comment says: * If there might be a fatal signal pending on multiple * threads, make sure we take it before changing the action. I think this is not needed. We should only worry about SIGNAL_GROUP_EXIT case, bit it implies a pending SIGKILL which can't be cleared by do_sigaction. Kill this special case. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | exec: RT sub-thread can livelock and monopolize CPU on execOleg Nesterov2007-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | de_thread() yields waiting for ->group_leader to be a zombie. This deadlocks if an rt-prio execer shares the same cpu with ->group_leader. Change the code to use ->group_exit_task/notify_count mechanics. This patch certainly uglifies the code, perhaps someone can suggest something better. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Make rcutorture RNG use temporal entropyPaul E. McKenney2007-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Repost of http://lkml.org/lkml/2007/8/10/472 made available by request. The locking used by get_random_bytes() can conflict with the preempt_disable() and synchronize_sched() form of RCU. This patch changes rcutorture's RNG to gather entropy from the new cpu_clock() interface (relying on interrupts, preemption, daemons, and rcutorture's reader thread's rock-bottom scheduling priority to provide useful entropy), and also adds and EXPORT_SYMBOL_GPL() to make that interface available to GPLed kernel modules such as rcutorture. Passes several hours of rcutorture. [ego@in.ibm.com: Use raw_smp_processor_id() in rcu_random()] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Gautham R Shenoy <ego@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Use num_possible_cpus() instead of NR_CPUS for timer distributionjohn stultz2007-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To avoid lock contention, we distribute the sched_timer calls across the cpus so they do not trigger at the same instant. However, I used NR_CPUS, which can cause needless grouping on small smp systems depending on your kernel config. This patch converts to using num_possible_cpus() so we spread it as evenly as possible on every machine. Briefly tested w/ NR_CPUS=255 and verified reduced contention. Signed-off-by: John Stultz <johnstul@us.ibm.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | kernel/time/timekeeping.c: cleanupsAdrian Bunk2007-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | - remove the no longer required __attribute__((weak)) of xtime_lock - remove the following no longer used EXPORT_SYMBOL's: - xtime - xtime_lock Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | wait_task_stopped/continued: remove unneeded p->signal != NULL checkOleg Nesterov2007-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | The child was found on ->children list under tasklist_lock, it must have a valid ->signal. __exit_signal() both removes the task from parent->children and clears ->signal "atomically" under write_lock(tasklist). Remove unneeded checks. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | __group_complete_signal: eliminate unneeded wakeup of ->group_exit_taskOleg Nesterov2007-10-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Cleanup. __group_complete_signal() wakes up ->group_exit_task twice. The second wakeup's state includes TASK_UNINTERRUPTIBLE, which is not very appropriate. Change the code to pass the "correct" argument to signal_wake_up() and kill now unneeded wake_up_process(). Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | wait_task_zombie: don't fight with non-existing race with a dying ptraceeOleg Nesterov2007-10-17
| | | | | | | | | | | | | | | | | | | | | | | | The "p->exit_signal == -1 && p->ptrace == 0" check and the comment are bogus. We already did exactly the same check in eligible_child(), we did not drop tasklist_lock since then, and both variables need write_lock(tasklist) to be changed. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | zap_other_threads: don't optimize thread_group_empty() caseOleg Nesterov2007-10-17
| | | | | | | | | | | | | | | | | | | | | | Nowadays thread_group_empty() and next_thread() are simple list operations, this optimization doesn't make sense: we are doing exactly same check one line below. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>