aboutsummaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAge
* x86: various changes and cleanups to in_p/out_p delay detailsIngo Molnar2008-01-30
| | | | | | | | | | | | | | | various changes to the in_p/out_p delay details: - add the io_delay=none method - make each method selectable from the kernel config - simplify the delay code a bit by getting rid of an indirect function call - add the /proc/sys/kernel/io_delay_type sysctl - change 'io_delay=standard|alternate' to io_delay=0x80 and io_delay=0xed - make the io delay config not depend on CONFIG_DEBUG_KERNEL Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: "David P. Reed" <dpreed@reed.com>
* time: track accurate idle time with tick_sched.idle_sleeptimeVenki Pallipadi2008-01-30
| | | | | | | | | | | | | | | | | | | | | Current idle time in kstat is based on jiffies and is coarse grained. tick_sched.idle_sleeptime is making some attempt to keep track of idle time in a fine grained manner. But, it is not handling the time spent in interrupts fully. Make tick_sched.idle_sleeptime accurate with respect to time spent on handling interrupts and also add tick_sched.idle_lastupdate, which keeps track of last time when idle_sleeptime was updated. This statistics will be crucial for cpufreq-ondemand governor, which can shed some conservative gaurd band that is uses today while setting the frequency. The ondemand changes that uses the exact idle time is coming soon. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* NTP: correct inconsistent ntp interval/tick_length usagejohn stultz2008-01-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I recently noticed on one of my boxes that when synched with an NTP server, the drift value reported for the system was ~283ppm. While in some cases, clock hardware can be that bad, it struck me as unusual as the system was using the acpi_pm clocksource, which is one of the more trustworthy and accurate clocksources on x86 hardware. I brought up another system and let it sync to the same NTP server, and I noticed a similar 280some ppm drift. In looking at the code, I found that the acpi_pm's constant frequency was being computed correctly at boot-up, however once the system was up, even without the ntp daemon running, the clocksource's frequency was being modified by the clocksource_adjust() function. Digging deeper, I realized that in the code that keeps track of how much the clocksource is skewing from the ntp desired time, we were using different lengths to establish how long an time interval was. The clocksource was being setup with the following interval: NTP_INTERVAL_LENGTH = NSEC_PER_SEC/NTP_INTERVAL_FREQ While the ntp code was using the tick_length_base value: tick_length_base ~= (tick_usec * NSEC_PER_USEC * USER_HZ) /NTP_INTERVAL_FREQ The subtle difference is: (tick_usec * NSEC_PER_USEC * USER_HZ) != NSEC_PER_SEC This difference in calculation was causing the clocksource correction code to apply a correction factor to the clocksource so the two intervals were the same, however this results in the actual frequency of the clocksource to be made incorrect. I believe this difference would affect all clocksources, although to differing degrees depending on the clocksource resolution. The issue was introduced when my HZ free ntp patch landed in 2.6.21-rc1, so my apologies for the mistake, and for not noticing it until now. The following patch, corrects the clocksource's initialization code so it uses the same interval length as the code in ntp.c. After applying this patch, the drift value for the same system went from ~283ppm to only 2.635ppm. I believe this patch to be good, however it does affect all arches and I've only tested on x86, so some caution is advised. I do think it would be a likely candidate for a stable 2.6.24.x release. Any thoughts or feedback would be appreciated. Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86: make clockevents more robustIngo Molnar2008-01-30
| | | | | | | | | detect zero event-device multiplicators - they then cause division-by-zero crashes if a clockevent has been initialized incorrectly. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* clocksource: add unregister function to disable unusable clocksourcesThomas Gleixner2008-01-30
| | | | | | | | | On x86 the PIT might become an unusable clocksource. Add an unregister function to provide a possibilty to remove the PIT from the list of available clock sources. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* clocksource: make clocksource watchdog cycle through online CPUsAndi Kleen2008-01-30
| | | | | | | | | | This way it checks if the clocks are synchronized between CPUs too. This might be able to detect slowly drifting TSCs which only go wrong over longer time. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* clocksource.c: use init_timer_deferrable for clocksource_watchdogParag Warudkar2008-01-30
| | | | | | | | | | | clocksource_watchdog can use a deferrable timer - reduces wakeups from idle per second. Signed-off-by: Parag Warudkar <parag.warudkar@gmail.com> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* time: fold __get_realtime_clock_ts() into getnstimeofday()Geert Uytterhoeven2008-01-30
| | | | | | | | | | | | - getnstimeofday() was just a wrapper around __get_realtime_clock_ts() - Replace calls to __get_realtime_clock_ts() by calls to getnstimeofday() - Fix bogus reference to get_realtime_clock_ts(), which never existed Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* timer: clean up tick-broadcast.cThomas Gleixner2008-01-30
| | | | | | | clean up tick-broadcast.c Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* time: more timer related cleanupsPavel Machek2008-01-30
| | | | | | | | | I was confused by FSEC = 10^15 NSEC statement, plus small whitespace fixes. When there's copyright, there should be GPL. Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* time: timer cleanupsPavel Machek2008-01-30
| | | | | | | | | | | Small cleanups to tick-related code. Wrong preempt count is followed by BUG(), so it is hardly KERN_WARNING. Signed-off-by: Pavel Machek <pavel@suse.cz> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* time: clean hungarian notation from timersPavel Machek2008-01-30
| | | | | | | | | | Clean up hungarian notation from timer code. Signed-off-by: Pavel Machek <pavel@suse.cz> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25Linus Torvalds2008-01-29
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6.25: (1470 commits) [IPV6] ADDRLABEL: Fix double free on label deletion. [PPP]: Sparse warning fixes. [IPV4] fib_trie: remove unneeded NULL check [IPV4] fib_trie: More whitespace cleanup. [NET_SCHED]: Use nla_policy for attribute validation in ematches [NET_SCHED]: Use nla_policy for attribute validation in actions [NET_SCHED]: Use nla_policy for attribute validation in classifiers [NET_SCHED]: Use nla_policy for attribute validation in packet schedulers [NET_SCHED]: sch_api: introduce constant for rate table size [NET_SCHED]: Use typeful attribute parsing helpers [NET_SCHED]: Use typeful attribute construction helpers [NET_SCHED]: Use NLA_PUT_STRING for string dumping [NET_SCHED]: Use nla_nest_start/nla_nest_end [NET_SCHED]: Propagate nla_parse return value [NET_SCHED]: act_api: use PTR_ERR in tcf_action_init/tcf_action_get [NET_SCHED]: act_api: use nlmsg_parse [NET_SCHED]: act_api: fix netlink API conversion bug [NET_SCHED]: sch_netem: use nla_parse_nested_compat [NET_SCHED]: sch_atm: fix format string warning [NETNS]: Add namespace for ICMP replying code. ...
| * [NET]: Remove the empty net_tablePavel Emelyanov2008-01-28
| | | | | | | | | | | | | | | | I have removed all the entries from this table (core_table, ipv4_table and tr_table), so now we can safely drop it. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sysctl: Infrastructure for per namespace sysctlsEric W. Biederman2008-01-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements the basic infrastructure for per namespace sysctls. A list of lists of sysctl headers is added, allowing each namespace to have it's own list of sysctl headers. Each list of sysctl headers has a lookup function to find the first sysctl header in the list, allowing the lists to have a per namespace instance. register_sysct_root is added to tell sysctl.c about additional lists of sysctl_headers. As all of the users are expected to be in kernel no unregister function is provided. sysctl_head_next is updated to walk through the list of lists. __register_sysctl_paths is added to add a new sysctl table on a non-default sysctl list. The only intrusive part of this patch is propagating the information to decided which list of sysctls to use for sysctl_check_table. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Daniel Lezcano <dlezcano@fr.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sysctl: Remember the ctl_table we passed to register_sysctl_pathsEric W. Biederman2008-01-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By doing this we allow users of register_sysctl_paths that build and dynamically allocate their ctl_table to be simpler. This allows them to just remember the ctl_table_header returned from register_sysctl_paths from which they can now find the ctl_table array they need to free. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Daniel Lezcano <dlezcano@fr.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sysctl: Add register_sysctl_paths functionEric W. Biederman2008-01-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are a number of modules that register a sysctl table somewhere deeply nested in the sysctl hierarchy, such as fs/nfs, fs/xfs, dev/cdrom, etc. They all specify several dummy ctl_tables for the path name. This patch implements register_sysctl_path that takes an additional path name, and makes up dummy sysctl nodes for each component. This patch was originally written by Olaf Kirch and brought to my attention and reworked some by Olaf Hering. I have changed a few additional things so the bugs are mine. After converting all of the easy callers Olaf Hering observed allyesconfig ARCH=i386, the patch reduces the final binary size by 9369 bytes. .text +897 .data -7008 text data bss dec hex filename 26959310 4045899 4718592 35723801 2211a19 ../vmlinux-vanilla 26960207 4038891 4718592 35717690 221023a ../O-allyesconfig/vmlinux So this change is both a space savings and a code simplification. CC: Olaf Kirch <okir@suse.de> CC: Olaf Hering <olaf@aepfle.de> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Serge Hallyn <serue@us.ibm.com> Cc: Daniel Lezcano <dlezcano@fr.ibm.com> Cc: Cedric Le Goater <clg@fr.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Module: check to see if we have a built in module with the same nameGreg Kroah-Hartman2008-01-29
| | | | | | | | | | | | | | | | | | | | | | | | When trying to load a module with the same name as a built-in one, a scary kobject backtrace comes up. Prevent that from checking for this condition and warning the user as to what exactly is going on. Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* | module: add module taint on ndiswrapperJon Masters2008-01-29
| | | | | | | | | | | | | | | | | | | | | | | | The struct module taints member is supposed to store per-module taint data. The kernel knows about certain specific external modules that will taint the kernel, such as ndiswrapper. Use of ndiswrapper possibly should set the per-module taint in addition to the global kernel taint flag, unless we're arguing not because wrapper module itself is not what actually causes the kernel to be tainted as such? Signed-off-by: Jon Masters <jcm@jonmasters.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* | module: fix the module name length in param_sysfs_builtinDenis Cheng2008-01-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | the original code use KOBJ_NAME_LEN for built-in module name length, that's defined to 20 in linux/kobject.h, but this is not enough appearntly, many module names are longer than this; #define KOBJ_NAME_LEN 20 another macro is MODULE_NAME_LEN defined in linux/module.h, I think this is enough for module names: #define MODULE_NAME_LEN (64 - sizeof(unsigned long)) Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* | module: make module_address_lookup safeRusty Russell2008-01-29
| | | | | | | | | | | | | | | | module_address_lookup releases preemption then returns a pointer into the module space. The only user (kallsyms) copies the result, so just do that under the preempt disable. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* | module: better OOPS and lockdep coverage for loading modulesRusty Russell2008-01-29
| | | | | | | | | | | | | | | | | | | | If we put the module in the linked list *before* calling into to, we get the module name and functions in the OOPS (is_module_address can find the module). It also helps lockdep in a similar way. Acked-and-tested-by: Joern Engel <joern@lazybastard.org> Tested-by: Erez Zadok <ezk@cs.sunysb.edu> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* | module: Fix gratuitous sprintf in module.cRusty Russell2008-01-29
| | | | | | | | | | | | | | Andrew sent an older version of this patch: we shouldn't use sprintf to copy a string. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* | module: wait for dependent modules doing init.Rusty Russell2008-01-29
| | | | | | | | | | | | | | | | | | There have been reports of modules failing to load because the modules they depend on are still loading. This changes the modules to wait for a reasonable length of time in that case. We time out eventually, because there can be module loops or broken modules. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* | module: Don't report discarded init pages as kernel text.Rusty Russell2008-01-29
|/ | | | | | | | | | | | Current code could cause a bug in symbol_put_addr() if an arch used kmalloc module text: we might think the symbol belongs to the core kernel. The downside is that this might make backtraces through (discarded) init functions harder to read on some archs, but we already have that issue for modules and noone has complained. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* kernel: add CLONE_IO to specifically request sharing of IO contextsJens Axboe2008-01-28
| | | | | | | syslets (or other threads/processes that want io context sharing) can set this to enforce sharing of io context. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* io context sharing: preliminary supportJens Axboe2008-01-28
| | | | | | | Detach task state from ioc, instead keep track of how many processes are accessing the ioc. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* ioprio: move io priority from task_struct to io_contextJens Axboe2008-01-28
| | | | | | | This is where it belongs and then it doesn't take up space for a process that doesn't do IO. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* printk: revert ktime_get() timestampsIngo Molnar2008-01-27
| | | | | | | | | | revert 19ef9309273d26cb005cb23e6a370353dca91099. Kevin Winchester reported a lockup during X startup an bisected it to this commit. Reported-by: Kevin Winchester <kjwinchester@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* [S390] Remove appldata include from sysctl_check.cHeiko Carstens2008-01-26
| | | | | | | Forgot to remove this when removing the appldata binary sysctls. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6Linus Torvalds2008-01-25
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (200 commits) [SCSI] usbstorage: use last_sector_bug flag universally [SCSI] libsas: abstract STP task status into a function [SCSI] ultrastor: clean up inline asm warnings [SCSI] aic7xxx: fix firmware build [SCSI] aacraid: fib context lock for management ioctls [SCSI] ch: remove forward declarations [SCSI] ch: fix device minor number management bug [SCSI] ch: handle class_device_create failure properly [SCSI] NCR5380: fix section mismatch [SCSI] sg: fix /proc/scsi/sg/devices when no SCSI devices [SCSI] IB/iSER: add logical unit reset support [SCSI] don't use __GFP_DMA for sense buffers if not required [SCSI] use dynamically allocated sense buffer [SCSI] scsi.h: add macro for enclosure bit of inquiry data [SCSI] sd: add fix for devices with last sector access problems [SCSI] fix pcmcia compile problem [SCSI] aacraid: add Voodoo Lite class of cards. [SCSI] aacraid: add new driver features flags [SCSI] qla2xxx: Update version number to 8.02.00-k7. [SCSI] qla2xxx: Issue correct MBC_INITIALIZE_FIRMWARE command. ...
| * [SCSI] sysfs: add filter function to groupsJames Bottomley2008-01-23
| | | | | | | | | | | | | | | | | | | | This patch allows the various users of attribute_groups to selectively allow the appearance of group attributes. The primary consumer of this will be the transport classes in which we currently have elaborate attribute selection algorithms to do this same thing. Acked-by: Greg KH <greg@kroah.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
* | sched: keep total / count stats in addition to the max forArjan van de Ven2008-01-25
| | | | | | | | | | | | | | | | | | | | | | | | Right now, the linux kernel (with scheduler statistics enabled) keeps track of the maximum time a process is waiting to be scheduled. While the maximum is a very useful metric, tracking average and total is equally useful (at least for latencytop) to figure out the accumulated effect of scheduler delays. The accumulated effect is important to judge the performance impact of scheduler tuning/behavior. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | sched: fix: don't take a mutex from interrupt contextPeter Zijlstra2008-01-25
| | | | | | | | | | | | | | | | | | print_cfs_stats is callable from interrupt context (sysrq), hence it should not take mutexes. Change it to use RCU since the task group data is RCU freed anyway. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | sched: print backtrace of running tasks tooNick Piggin2008-01-25
| | | | | | | | | | | | | | The attached patch is something really simple that can sometimes help in getting more info out of a hung system. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | printk: use ktime_get()Ingo Molnar2008-01-25
| | | | | | | | | | | | | | | | | | | | | | | | | | printk timestamps: use ktime_get(). Some platforms have a functioning clocksource function only after they are done with early bootup, so delay this until out of SYSTEM_BOOTING state. it's also inherently safe now, as any bugs in this area will be caught by the printk recursion checks. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | softlockup: fix signednessIngo Molnar2008-01-25
| | | | | | | | | | | | | | | | fix softlockup tunables signedness. mark tunables read-mostly. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | sched: latencytop supportArjan van de Ven2008-01-25
| | | | | | | | | | | | | | | | LatencyTOP kernel infrastructure; it measures latencies in the scheduler and tracks it system wide and per process. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | sched: fix goto retry in pick_next_task_rt()Dmitry Adamushko2008-01-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | looking at it one more time: (1) it looks to me that there is no need to call sched_rt_ratio_exceeded() from pick_next_rt_entity() - [ for CONFIG_FAIR_GROUP_SCHED ] queues with rt_rq->rt_throttled are not within this 'tree-like hierarchy' (or whatever we should call it :-) - there is also no need to re-check 'rt_rq->rt_time > ratio' at this point as 'rt_rq->rt_time' couldn't have been increased since the last call to update_curr_rt() (which obviously calls sched_rt_ratio_esceeded()) well, it might be that 'ratio' for this rt_rq has been re-configured (and the period over which this rt_rq was active has not yet been finished)... but I don't think we should really take this into account. (2) now pick_next_rt_entity() must never return NULL, so let's change pick_next_task_rt() accordingly. Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | sched: monitor clock underflows in /proc/sched_debugGuillaume Chazarain2008-01-25
| | | | | | | | | | | | | | We monitor clock overflows, let's also monitor clock underflows. Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | sched: fix rq->clock warps on frequency changesGuillaume Chazarain2008-01-25
| | | | | | | | | | | | | | | | | | | | | | | | sched: fix rq->clock warps on frequency changes Fix 2bacec8c318ca0418c0ee9ac662ee44207765dd4 (sched: touch softlockup watchdog after idling) that reintroduced warps on frequency changes. touch_softlockup_watchdog() calls __update_rq_clock that checks rq->clock for warps, so call it after adjusting rq->clock. Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | sched: fix, always create kernel threads with normal priorityMichal Schmidt2008-01-25
| | | | | | | | | | | | | | | | | | Ensure that the kernel threads are created with the usual nice level and affinity even if kthreadd's properties were changed from the default by root. Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | debug: clean up kernel/profile.cPaolo Ciarrocchi2008-01-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before: total: 25 errors, 13 warnings, 602 lines checked After: total: 0 errors, 2 warnings, 601 lines checked No code changed: kernel/profile.o: text data bss dec hex filename 3048 236 24 3308 cec profile.o.before 3048 236 24 3308 cec profile.o.after md5: 2501d64748a4d350dffb11203e2a5182 profile.o.before.asm 2501d64748a4d350dffb11203e2a5182 profile.o.after.asm Signed-off-by: Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | sched: remove the !PREEMPT_BKL codeIngo Molnar2008-01-25
| | | | | | | | | | | | | | | | remove the !PREEMPT_BKL code. this removes 160 lines of legacy code. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | sched: make PREEMPT_BKL the defaultIngo Molnar2008-01-25
| | | | | | | | | | | | | | | | make PREEMPT_BKL the default. precursor to removal of the !PREEMPT_BKL code. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | debug: track and print last unloaded module in the oops traceArjan van de Ven2008-01-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Based on a suggestion from Andi: In various cases, the unload of a module may leave some bad state around that causes a kernel crash AFTER a module is unloaded; and it's then hard to find which module caused that. This patch tracks the last unloaded module, and prints this as part of the module list in the oops trace. Right now, only the last 1 module is tracked; I expect that this is enough for the vast majority of cases where this information matters; if it turns out that tracking more is important, we can always extend it to that. [ mingo@elte.hu: build fix ] Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | debug: show being-loaded/being-unloaded indicator for modulesArjan van de Ven2008-01-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's rather common that an oops/WARN_ON/BUG happens during the load or unload of a module. Unfortunatly, it's not always easy to see directly which module is being loaded/unloaded from the oops itself. Worse, it's not even always possible to ask the bug reporter, since there are so many components (udev etc) that auto-load modules that there's a good chance that even the reporter doesn't know which module this is. This patch extends the existing "show if it's tainting" print code, which is used as part of printing the modules in the oops/BUG/WARN_ON to include a "+" for "being loaded" and a "-" for "being unloaded". As a result this extension, the "taint_flags()" function gets renamed to "module_flags()" (and takes a module struct as argument, not a taint flags int). Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | sched: rt-watchdog: fix .rlim_max = RLIM_INFINITYPeter Zijlstra2008-01-25
| | | | | | | | | | | | | | | | | | | | | | | | | | Remove the curious logic to set it_sched_expires in the future. It useless because rt.timeout wouldn't be incremented anyway. Explicity check for RLIM_INFINITY as a test programm that had a 1s soft limit and a inf hard limit would SIGKILL at 1s. This is because RLIM_INFINITY+d-1 is d-2. Signed-off-by: Peter Zijlsta <a.p.zijlstra@chello.nl> CC: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | sched: rt-group: reduce reschedulingPeter Zijlstra2008-01-25
| | | | | | | | | | | | | | Only reschedule if the new group has a higher prio task. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | hrtimer: unlock hrtimer_wakeupPeter Zijlstra2008-01-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | hrtimer_wakeup creates a base->lock rq->lock lock dependancy. Avoid this by switching to HRTIMER_CB_IRQSAFE_NO_SOFTIRQ which doesn't hold base->lock. This fully untangles hrtimer locks from the scheduler locks, and allows hrtimer usage in the scheduler proper. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>