aboutsummaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAge
* signal: re-add dead task accumulation stats.Peter Zijlstra2009-02-05
| | | | | | | | | | | | | | | | | | | | | | We're going to split the process wide cpu accounting into two parts: - clocks; which can take all the time they want since they run from user context. - timers; which need constant time tracing but can affort the overhead because they're default off -- and rare. The clock readout will go back to a full sum of the thread group, for this we need to re-add the exit stats that were removed in the initial itimer rework (f06febc9: timers: fix itimer/many thread hang). Furthermore, since that full sum can be rather slow for large thread groups and we have the complete dead task stats, revert the do_notify_parent time computation. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: fix nohz load balancer on cpu offlineSuresh Siddha2009-02-04
| | | | | | | | | | | | | | | | | | | | | | | Christian Borntraeger reports: > After a logical cpu offline, even on a complete idle system, there > is one cpu with full ticks. It turns out that nohz.cpu_mask has the > the offlined cpu still set. > > In select_nohz_load_balancer() we check if the system is completely > idle to turn of load balancing. We compare cpu_online_map with > nohz.cpu_mask. Since cpu_online_map is updated on cpu unplug, > but nohz.cpu_mask is not, the check fails and the scheduler believes > that we need an "idle load balancer" even on a fully idle system. > Since the ilb cpu does not deactivate the timer tick this breaks NOHZ. Fix the select_nohz_load_balancer() to not set the nohz.cpu_mask while a cpu is going offline. Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds2009-02-02
|\ | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched_rt: don't use first_cpu on cpumask created with cpumask_and sched: fix buddie group latency sched: clear buddies more aggressively sched: symmetric sync vs avg_overlap sched: fix sync wakeups cpuset: fix possible deadlock in async_rebuild_sched_domains
| * sched_rt: don't use first_cpu on cpumask created with cpumask_andRusty Russell2009-02-01
| | | | | | | | | | | | | | | | | | cpumask_and() only initializes nr_cpu_ids bits, so the (deprecated) first_cpu() might find one of those uninitialized bits if nr_cpu_ids is less than NR_CPUS (as it can be for CONFIG_CPUMASK_OFFSTACK). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * sched: fix buddie group latencyPeter Zijlstra2009-02-01
| | | | | | | | | | | | | | | | | | | | | | Similar to the previous patch, by not clearing buddies we can select entities past their run quota, which can increase latency. This means we have to clear group buddies as well. Do not use the group clear for pick_next_task(), otherwise that'll get O(n^2). Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * sched: clear buddies more aggressivelyMike Galbraith2009-02-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | It was noticed that a task could get re-elected past its run quota due to buddy affinities. This could increase latency a little. Cure it by more aggresively clearing buddy state. We do so in two situations: - when we force preempt - when we select a buddy to run Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * sched: symmetric sync vs avg_overlapPeter Zijlstra2009-02-01
| | | | | | | | | | | | | | | | Reinstate the weakening of the sync hint if set. This yields a more symmetric usage of avg_overlap. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * sched: fix sync wakeupsPeter Zijlstra2009-02-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pawel Dziekonski reported that the openssl benchmark and his quantum chemistry application both show slowdowns due to the scheduler under-parallelizing execution. The reason are pipe wakeups still doing 'sync' wakeups which overrides the normal buddy wakeup logic - even if waker and wakee are loosely coupled. Fix an inversion of logic in the buddy wakeup code. Reported-by: Pawel Dziekonski <dzieko@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * cpuset: fix possible deadlock in async_rebuild_sched_domainsMiao Xie2009-01-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lockdep reported some possible circular locking info when we tested cpuset on NUMA/fake NUMA box. ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.29-rc1-00224-ga652504 #111 ------------------------------------------------------- bash/2968 is trying to acquire lock: (events){--..}, at: [<ffffffff8024c8cd>] flush_work+0x24/0xd8 but task is already holding lock: (cgroup_mutex){--..}, at: [<ffffffff8026ad1e>] cgroup_lock_live_group+0x12/0x29 which lock already depends on the new lock. ...... ------------------------------------------------------- Steps to reproduce: # mkdir /dev/cpuset # mount -t cpuset xxx /dev/cpuset # mkdir /dev/cpuset/0 # echo 0 > /dev/cpuset/0/cpus # echo 0 > /dev/cpuset/0/mems # echo 1 > /dev/cpuset/0/memory_migrate # cat /dev/zero > /dev/null & # echo $! > /dev/cpuset/0/tasks This is because async_rebuild_sched_domains has the following lock sequence: run_workqueue(async_rebuild_sched_domains) -> do_rebuild_sched_domains -> cgroup_lock But, attaching tasks when memory_migrate is set has following: cgroup_lock_live_group(cgroup_tasks_write) -> do_migrate_pages -> flush_work This patch fixes it by using a separate workqueue thread. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | modules: Use a better scheme for refcountingEric Dumazet2009-02-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current refcounting for modules (done if CONFIG_MODULE_UNLOAD=y) is using a lot of memory. Each 'struct module' contains an [NR_CPUS] array of full cache lines. This patch uses existing infrastructure (percpu_modalloc() & percpu_modfree()) to allocate percpu space for the refcount storage. Instead of wasting NR_CPUS*128 bytes (on i386), we now use nr_cpu_ids*sizeof(local_t) bytes. On a typical distro, where NR_CPUS=8, shiping 2000 modules, we reduce size of module files by about 2 Mbytes. (1Kb per module) Instead of having all refcounters in the same memory node - with TLB misses because of vmalloc() - this new implementation permits to have better NUMA properties, since each CPU will use storage on its preferred node, thanks to percpu storage. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds2009-01-31
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: generic-ipi: use per cpu data for single cpu ipi calls cpumask: convert lib/smp_processor_id to new cpumask ops signals, debug: fix BUG: using smp_processor_id() in preemptible code in print_fatal_signal()
| * | generic-ipi: use per cpu data for single cpu ipi callsSteven Rostedt2009-01-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The smp_call_function can be passed a wait parameter telling it to wait for all the functions running on other CPUs to complete before returning, or to return without waiting. Unfortunately, this is currently just a suggestion and not manditory. That is, the smp_call_function can decide not to return and wait instead. The reason for this is because it uses kmalloc to allocate storage to send to the called CPU and that CPU will free it when it is done. But if we fail to allocate the storage, the stack is used instead. This means we must wait for the called CPU to finish before continuing. Unfortunatly, some callers do no abide by this hint and act as if the non-wait option is mandatory. The MTRR code for instance will deadlock if the smp_call_function is set to wait. This is because the smp_call_function will wait for the other CPUs to finish their called functions, but those functions are waiting on the caller to continue. This patch changes the generic smp_call_function code to use per cpu variables if the allocation of the data fails for a single CPU call. The smp_call_function_many will fall back to the smp_call_function_single if it fails its alloc. The smp_call_function_single is modified to not force the wait state. Since we now are using a single data per cpu we must synchronize the callers to prevent a second caller modifying the data before the first called IPI functions complete. To do so, I added a flag to the call_single_data called CSD_FLAG_LOCK. When the single CPU is called (which can be called when a many call fails an alloc), we set the LOCK bit on this per cpu data. When the caller finishes it clears the LOCK bit. The caller must wait till the LOCK bit is cleared before setting it. When it is cleared, there is no IPI function using it. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Jens Axboe <jens.axboe@oracle.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | signals, debug: fix BUG: using smp_processor_id() in preemptible code in ↵Ed Swierk2009-01-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | print_fatal_signal() With print-fatal-signals=1 on a kernel with CONFIG_PREEMPT=y, sending an unexpected signal to a process causes a BUG: using smp_processor_id() in preemptible code. get_signal_to_deliver() releases the siglock before calling print_fatal_signal(), which calls show_regs(), which calls smp_processor_id(), which is not supposed to be called from a preemptible thread. Make sure show_regs() runs with preemption disabled. Signed-off-by: Ed Swierk <eswierk@aristanetworks.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | Merge branch 'irq-fixes-for-linus' of ↵Linus Torvalds2009-01-31
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: irq: export __set_irq_handler() and handle_level_irq()
| * | | irq: export __set_irq_handler() and handle_level_irq()Ingo Molnar2009-01-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: build fix ARM updates broke x86 allmodconfig builds: ERROR: "__set_irq_handler" [drivers/mfd/pcf50633-core.ko] undefined! ERROR: "handle_level_irq" [drivers/mfd/pcf50633-core.ko] undefined! Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | Merge branch 'timers-fixes-for-linus' of ↵Linus Torvalds2009-01-31
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: hrtimer: prevent negative expiry value after clock_was_set() hrtimers: allow the hot-unplugging of all cpus hrtimers: increase clock min delta threshold while interrupt hanging
| * | | | hrtimer: prevent negative expiry value after clock_was_set()Thomas Gleixner2009-01-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: prevent false positive WARN_ON() in clockevents_program_event() clock_was_set() changes the base->offset of CLOCK_REALTIME and enforces the reprogramming of the clockevent device to expire timers which are based on CLOCK_REALTIME. If the clock change is large enough then the subtraction of the timer expiry value and base->offset can become negative which triggers the warning in clockevents_program_event(). Check the subtraction result and set a negative value to 0. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | hrtimers: allow the hot-unplugging of all cpusSebastien Dugue2009-01-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: fix CPU hotplug hang on Power6 testbox On architectures that support offlining all cpus (at least powerpc/pseries), hot-unpluging the tick_do_timer_cpu can result in a system hang. This comes from the fact that if the cpu going down happens to be the cpu doing the tick, then as the tick_do_timer_cpu handover happens after the cpu is dead (via the CPU_DEAD notification), we're left without ticks, jiffies are frozen and any task relying on timers (msleep, ...) is stuck. That's particularly the case for the cpu looping in __cpu_die() waiting for the dying cpu to be dead. This patch addresses this by having the tick_do_timer_cpu handover happen earlier during the CPU_DYING notification. For this, a new clockevent notification type is introduced (CLOCK_EVT_NOTIFY_CPU_DYING) which is triggered in hrtimer_cpu_notify(). Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net> Cc: <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | hrtimers: increase clock min delta threshold while interrupt hangingFrederic Weisbecker2009-01-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: avoid timer IRQ hanging slow systems While using the function graph tracer on a virtualized system, the hrtimer_interrupt can hang the system on an infinite loop. This can be caused in several situations: - the hardware is very slow and HZ is set too high - something intrusive is slowing the system down (tracing under emulation) ... and the next clock events to program are always before the current time. This patch implements a reasonable compromise: if such a situation is detected, we share the CPUs time in 1/4 to process the hrtimer interrupts. This is enough to let the system running without serious starvation. It has been successfully tested under VirtualBox with 1000 HZ and 100 HZ with function graph tracer launched. On both cases, the clock events were increased until about 25 ms periodic ticks, which means 40 HZ. So we change a hard to debug hang into a warning message and a system that still manages to limp along. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | | Merge branch 'tracing-fixes-for-linus' of ↵Linus Torvalds2009-01-31
|\ \ \ \ \ | |/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, ds, bts: cleanup/fix DS configuration ring-buffer: reset timestamps when ring buffer is reset trace: set max latency variable to zero on default trace: stop all recording to ring buffer on ftrace_dump trace: print ftrace_dump at KERN_EMERG log level ring_buffer: reset write when reserve buffer fail tracing/function-graph-tracer: fix a regression while suspend to disk ring-buffer: fix alignment problem
| * | | | ring-buffer: reset timestamps when ring buffer is resetSteven Rostedt2009-01-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: fix bad times of recent resets The ring buffer needs to reset its timestamps when reseting of the buffer, otherwise the timestamps are stale and might be used to calculate times in the buffer causing funny timestamps to appear. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | trace: set max latency variable to zero on defaultSteven Rostedt2009-01-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: trace max latencies on start of latency tracing This patch sets the max latency to zero whenever one of the irq variant tracers or the wakeup tracer is set to current tracer. Most developers expect to see output when starting up a latency tracer. But since the max_latency is already set to max, and it takes a latency greater than max_latency to be recorded, there is no trace. This is not the expected behavior and has even confused myself. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | trace: stop all recording to ring buffer on ftrace_dumpSteven Rostedt2009-01-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: limit ftrace dump output Currently ftrace_dump only calls ftrace_kill that is a fast way to prevent the function tracer functions from being called (just sets a flag and clears the function to call, nothing else). It is better to also turn off any recording to the ring buffers as well. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | trace: print ftrace_dump at KERN_EMERG log levelSteven Rostedt2009-01-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: fix to print out ftrace_dump when expected I was debugging a hard race condition to only find out that after I hit the race, my log level was not at level to show KERN_INFO. The time it took to trigger the race was wasted because I did not capture the trace. Since ftrace_dump is only called from kernel oops (and only when it is set in the kernel command line to do so), or when a developer adds it to their own local tree, the log level of the print should be at KERN_EMERG to make sure the print appears. ftrace_dump is not called by a normal user setup, and will not add extra unwanted print out to the console. There is no reason it should be at KERN_INFO. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | ring_buffer: reset write when reserve buffer failLai Jiangshan2009-01-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: reset struct buffer_page.write when interrupt storm if struct buffer_page.write is not reset, any succedent committing will corrupted ring_buffer: static inline void rb_set_commit_to_write(struct ring_buffer_per_cpu *cpu_buffer) { ...... cpu_buffer->commit_page->commit = cpu_buffer->commit_page->write; ...... } when "if (RB_WARN_ON(cpu_buffer, next_page == reader_page))", ring_buffer is disabled, but some reserved buffers may haven't been committed. we need reset struct buffer_page.write. when "if (unlikely(next_page == cpu_buffer->commit_page))", ring_buffer is still available, we should not corrupt it. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | tracing/function-graph-tracer: fix a regression while suspend to diskFrederic Weisbecker2009-01-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: fix a crash while kernel image restore When the function graph tracer is running and while suspend to disk, some racy and dangerous things happen against this tracer. The current task will save its registers including the stack pointer which contains the return address hooked by the tracer. But the current task will continue to enter other functions after that to save the memory, and then it will store other return addresses, and finally loose the old depth which matches the return address saved in the old stack (during the registers saving). So on image restore, the code will return to wrong addresses. And there are other things: on restore, the task will have it's "current" pointer overwritten during registers restoring....switching from one task to another... That would be insane to try to trace function graphs at these stages. This patch makes the function graph tracer listening on power events, making it's tracing disabled for the current task (the one that performs the hibernation work) while suspend/resume to disk, making the tracing safe during hibernation. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | ring-buffer: fix alignment problemSteven Rostedt2009-01-20
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: fix to allow some archs to use the ring buffer Commits in the ring buffer are checked by pointer arithmetic. If the calculation is incorrect, then the commits will never take place and the buffer will simply fill up and report an error. Each page in the ring buffer has a small header: struct buffer_data_page { u64 time_stamp; local_t commit; unsigned char data[]; }; Unfortuntely, some of the calculations used sizeof(struct buffer_data_page) to know the size of the header. But this is incorrect on some archs, where sizeof(struct buffer_data_page) does not equal offsetof(struct buffer_data_page, data), and on those archs, the commits are never processed. This patch replaces the sizeof with offsetof. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | cgroup: fix root_count when mount fails due to busy subsystemPaul Menage2009-01-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | root_count was being incremented in cgroup_get_sb() after all error checking was complete, but decremented in cgroup_kill_sb(), which can be called on a superblock that we gave up on due to an error. This patch changes cgroup_kill_sb() to only decrement root_count if the root was previously linked into the list of roots. Signed-off-by: Paul Menage <menage@google.com> Tested-by: Serge Hallyn <serue@us.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | cgroups: add cpu_relax() calls in css_tryget() and cgroup_clear_css_refs()Paul Menage2009-01-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | css_tryget() and cgroup_clear_css_refs() contain polling loops; these loops should have cpu_relax calls in them to reduce cross-cache traffic. Signed-off-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | cgroups: fix lock inconsistency in cgroup_clone()Li Zefan2009-01-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I fixed a bug in cgroup_clone() in Linus' tree in commit 7b574b7 ("cgroups: fix a race between cgroup_clone and umount") without noticing there was a cleanup patch in -mm tree that should be rebased (now commit 104cbd5, "cgroups: use task_lock() for access tsk->cgroups safe in cgroup_clone()"), thus resulted in lock inconsistency. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | cgroups: use hierarchy mutex in creation failure pathKAMEZAWA Hiroyuki2009-01-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now, cgrp->sibling is handled under hierarchy mutex. error route should do so, too. Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Acked-by Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | cpumask fallout: Initialize irq_default_affinity earlierDavid Daney2009-01-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the initialization of irq_default_affinity to early_irq_init as core_initcall is too late. irq_default_affinity can be used in init_IRQ and potentially timer and SMP init as well. All of these happen before core_initcall. Moving the initialization to early_irq_init ensures that it is initialized before it is used. Signed-off-by: David Daney <ddaney@caviumnetworks.com> Acked-by: Mike Travis <travis@sgi.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Make irq_*_affinity depend on CONFIG_GENERIC_HARDIRQS too.David Daney2009-01-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In interrupt.h these functions are declared only if CONFIG_GENERIC_HARDIRQS is set. We should define them under identical conditions. Signed-off-by: David Daney <ddaney@caviumnetworks.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge branch 'hibern_fixes' of ↵Linus Torvalds2009-01-27
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'hibern_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: SATA PIIX: Blacklist system that spins off disks during ACPI power off SATA Sil: Blacklist system that spins off disks during ACPI power off SATA AHCI: Blacklist system that spins off disks during ACPI power off SATA: Blacklisting of systems that spin off disks during ACPI power off DMI: Introduce dmi_first_match to make the interface more flexible Hibernation: Introduce system_entering_hibernation
| * | | | Hibernation: Introduce system_entering_hibernationRafael J. Wysocki2009-01-27
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce boolean function system_entering_hibernation() returning 'true' during the last phase of hibernation, in which devices are being put into low power states and the sleep state (for example, ACPI S4) is finally entered. Some device drivers need such a function to check if the system is in the final phase of hibernation. In particular, some SATA drivers are going to use it for blacklisting systems in which the disks should not be spun down during the last phase of hibernation (the BIOS will do that anyway). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixesLinus Torvalds2009-01-26
|\ \ \ \ | |_|_|/ |/| | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes: kbuild: fix kbuild.txt typos kbuild: print usage with no arguments in scripts/config Revert "kbuild: strip generated symbols from *.ko"
| * | | Revert "kbuild: strip generated symbols from *.ko"Sam Ravnborg2009-01-14
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit ad7a953c522ceb496611d127e51e278bfe0ff483. And commit: ("allow stripping of generated symbols under CONFIG_KALLSYMS_ALL") 9bb482476c6c9d1ae033306440c51ceac93ea80c These stripping patches has caused a set of issues: 1) People have reported compatibility issues with binutils due to lack of support for `--strip-unneeded-symbols' with objcopy 2.15.92.0.2 Reported by: Wenji 2) ccache and distcc no longer works as expeced Reported by: Ted, Roland, + others 3) The installed modules increased a lot in size Reported by: Ted, Davej + others Reported-by: Wenji Huang <wenji.huang@oracle.com> Reported-by: "Theodore Ts'o" <tytso@mit.edu> Reported-by: Dave Jones <davej@redhat.com> Reported-by: Roland McGrath <roland@redhat.com> Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
* | | Merge branch 'sh/for-2.6.29' of ↵Linus Torvalds2009-01-26
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6 * 'sh/for-2.6.29' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (22 commits) dma-coherent: Restore dma_alloc_from_coherent() large alloc fall back policy. dma-coherent: per-device coherent area is in pages, not bytes. sh: fix unaligned and nonexistent address handling nommu: Stub in vm_map_ram()/vm_unmap_ram()/vm_unmap_aliases(). sh: fix sh-sci / early printk build on sh7723 sh: export the sh7343 JPU to user space sh: update defconfigs. serial: sh-sci: Fix up SH7720/SH7721 SCI build. sh: Kill off obsolete busses from arch/sh/Kconfig. sh: sh7785lcr/highlander/hp6xx need linux/irq.h. sh: Migo-R MMC support using spi_gpio and mmc_spi. sh: ap325rxa MMC support using spi_gpio and mmc_spi sh: mach-x3proto: needs linux/irq.h. sh: Drop the BKL from sys_execve() on SH-5. sh: convert rsk7203 to use smsc911x. sh: convert magicpanelr2 platform to use smsc911x. sh: convert ap325rxa platform to use smsc911x. sh: mach-migor: Add tw9910 support. sh: mach-migor: Delete soc_camera_platform setup. sh: mach-migor: Add ov772x support. ...
| * | | dma-coherent: Restore dma_alloc_from_coherent() large alloc fall back policy.Paul Mundt2009-01-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When doing large allocations (larger than the per-device coherent area) the generic memory allocators are silently fallen back on regardless of consideration for the per-device constraints. In the DMA_MEMORY_EXCLUSIVE case falling back on generic memory is not an option, as it tends not to be addressable by the DMA hardware in question. This issue showed up with the 8139too breakage on the Dreamcast, where non-addressable buffers were silently allocated due to the size mismatch calculation -- while it should have simply errored out upon being unable to satisfy the allocation with the given device constraints. This restores fall back behaviour to what it was before the oversized request change caused multiple regressions. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | | dma-coherent: per-device coherent area is in pages, not bytes.Adrian McMenamin2009-01-21
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 58c6d3dfe436eb8cfb451981d8fdc9044eaf42da ("dma-coherent: catch oversized requests to dma_alloc_from_coherent()") attempted to add a sanity check to bail out on allocations larger than the coherent area. Unfortunately when this was implemented, the fact the coherent area is tracked in pages rather than bytes was overlooked, which subsequently broke every single dma_alloc_from_coherent() user, forcing the allocation silently through generic memory instead. Signed-off-by: Adrian McMenamin <adrian@mcmen.demon.co.uk > Signed-off-by: Paul Mundt <lethal@linux-sh.org>
* | | Merge branch 'core-fixes-for-linus' of ↵Linus Torvalds2009-01-26
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: debugobjects: add and use INIT_WORK_ON_STACK rcu: remove duplicate CONFIG_RCU_CPU_STALL_DETECTOR relay: fix lock imbalance in relay_late_setup_files oprofile: fix uninitialized use of struct op_entry rcu: move Kconfig menu softlock: fix false panic which can occur if softlockup_thresh is reduced rcu: add __cpuinit to rcu_init_percpu_data()
| * | | Merge branch 'core/debugobjects' into core/urgentThomas Gleixner2009-01-22
| |\| |
| * | | relay: fix lock imbalance in relay_late_setup_filesJiri Slaby2009-01-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One fail path in relay_late_setup_files() omits mutex_unlock(&relay_channels_mutex); Add it. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | softlock: fix false panic which can occur if softlockup_thresh is reducedMandeep Singh Baines2009-01-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At run-time, if softlockup_thresh is changed to a much lower value, touch_timestamp is likely to be much older than the new softlock_thresh. This will cause a false softlockup to be detected. If softlockup_panic is enabled, the system will panic. The fix is to touch all watchdogs before changing softlockup_thresh. Signed-off-by: Mandeep Singh Baines <msb@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | rcu: add __cpuinit to rcu_init_percpu_data()Lai Jiangshan2009-01-14
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | Impact: reduce memory footprint add __cpuinit to rcu_init_percpu_data(), and this function's text will be discarded after boot when !CONFIG_HOTPLUG_CPU. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | Merge branch 'timers-fixes-for-linus' of ↵Linus Torvalds2009-01-26
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: hrtimers: fix inconsistent lock state on resume in hres_timers_resume time-sched.c: tick_nohz_update_jiffies should be static locking, hpet: annotate false positive warning kernel/fork.c: unused variable 'ret' itimers: remove the per-cpu-ish-ness
| * | | hrtimers: fix inconsistent lock state on resume in hres_timers_resumePeter Zijlstra2009-01-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Andrey Borzenkov reported this lockdep assert: > [17854.688347] ================================= > [17854.688347] [ INFO: inconsistent lock state ] > [17854.688347] 2.6.29-rc2-1avb #1 > [17854.688347] --------------------------------- > [17854.688347] inconsistent {in-hardirq-W} -> {hardirq-on-W} usage. > [17854.688347] pm-suspend/18240 [HC0[0]:SC0[0]:HE1:SE1] takes: > [17854.688347] (&cpu_base->lock){++..}, at: [<c0136fcc>] retrigger_next_event+0x5c/0xa0 > [17854.688347] {in-hardirq-W} state was registered at: > [17854.688347] [<c01443cd>] __lock_acquire+0x79d/0x1930 > [17854.688347] [<c01455bc>] lock_acquire+0x5c/0x80 > [17854.688347] [<c03092e5>] _spin_lock+0x35/0x70 > [17854.688347] [<c0136e61>] hrtimer_run_queues+0x31/0x140 > [17854.688347] [<c0128d98>] run_local_timers+0x8/0x20 > [17854.688347] [<c0128dd3>] update_process_times+0x23/0x60 > [17854.688347] [<c013e274>] tick_periodic+0x24/0x80 > [17854.688347] [<c013e2e2>] tick_handle_periodic+0x12/0x70 > [17854.688347] [<c0104e24>] timer_interrupt+0x14/0x20 > [17854.688347] [<c01607b9>] handle_IRQ_event+0x29/0x60 > [17854.688347] [<c0161c59>] handle_level_irq+0x69/0xe0 > [17854.688347] [<ffffffff>] 0xffffffff > [17854.688347] irq event stamp: 55771 > [17854.688347] hardirqs last enabled at (55771): [<c0309125>] _spin_unlock_irqrestore+0x35/0x60 > [17854.688347] hardirqs last disabled at (55770): [<c0309419>] _spin_lock_irqsave+0x19/0x80 > [17854.688347] softirqs last enabled at (54836): [<c0124f54>] __do_softirq+0xc4/0x110 > [17854.688347] softirqs last disabled at (54831): [<c01049ae>] do_softirq+0x8e/0xe0 > [17854.688347] > [17854.688347] other info that might help us debug this: > [17854.688347] 3 locks held by pm-suspend/18240: > [17854.688347] #0: (&buffer->mutex){--..}, at: [<c01dd4c5>] sysfs_write_file+0x25/0x100 > [17854.688347] #1: (pm_mutex){--..}, at: [<c015056f>] enter_state+0x4f/0x140 > [17854.688347] #2: (dpm_list_mtx){--..}, at: [<c027880f>] device_pm_lock+0xf/0x20 > [17854.688347] > [17854.688347] stack backtrace: > [17854.688347] Pid: 18240, comm: pm-suspend Not tainted 2.6.29-rc2-1avb #1 > [17854.688347] Call Trace: > [17854.688347] [<c0306248>] ? printk+0x18/0x20 > [17854.688347] [<c0141fac>] print_usage_bug+0x16c/0x1d0 > [17854.688347] [<c0142bcf>] mark_lock+0x8bf/0xc90 > [17854.688347] [<c0106b8f>] ? pit_next_event+0x2f/0x40 > [17854.688347] [<c01441b0>] __lock_acquire+0x580/0x1930 > [17854.688347] [<c030916d>] ? _spin_unlock+0x1d/0x20 > [17854.688347] [<c0106b8f>] ? pit_next_event+0x2f/0x40 > [17854.688347] [<c013dd38>] ? clockevents_program_event+0x98/0x160 > [17854.688347] [<c0142fe8>] ? mark_held_locks+0x48/0x90 > [17854.688347] [<c0309125>] ? _spin_unlock_irqrestore+0x35/0x60 > [17854.688347] [<c0143229>] ? trace_hardirqs_on_caller+0x139/0x190 > [17854.688347] [<c014328b>] ? trace_hardirqs_on+0xb/0x10 > [17854.688347] [<c01455bc>] lock_acquire+0x5c/0x80 > [17854.688347] [<c0136fcc>] ? retrigger_next_event+0x5c/0xa0 > [17854.688347] [<c03092e5>] _spin_lock+0x35/0x70 > [17854.688347] [<c0136fcc>] ? retrigger_next_event+0x5c/0xa0 > [17854.688347] [<c0136fcc>] retrigger_next_event+0x5c/0xa0 > [17854.688347] [<c013711a>] hres_timers_resume+0xa/0x10 > [17854.688347] [<c013aa8e>] timekeeping_resume+0xee/0x150 > [17854.688347] [<c0273384>] __sysdev_resume+0x14/0x50 > [17854.688347] [<c0273407>] sysdev_resume+0x47/0x80 > [17854.688347] [<c02791ab>] device_power_up+0xb/0x20 > [17854.688347] [<c015043f>] suspend_devices_and_enter+0xcf/0x150 > [17854.688347] [<c0150c2f>] ? freeze_processes+0x3f/0x90 > [17854.688347] [<c0150614>] enter_state+0xf4/0x140 > [17854.688347] [<c01506dd>] state_store+0x7d/0xc0 > [17854.688347] [<c0150660>] ? state_store+0x0/0xc0 > [17854.688347] [<c0202da4>] kobj_attr_store+0x24/0x30 > [17854.688347] [<c01dd53c>] sysfs_write_file+0x9c/0x100 > [17854.688347] [<c019916c>] vfs_write+0x9c/0x160 > [17854.688347] [<c0103494>] ? restore_nocheck_notrace+0x0/0xe > [17854.688347] [<c01dd4a0>] ? sysfs_write_file+0x0/0x100 > [17854.688347] [<c01992ed>] sys_write+0x3d/0x70 > [17854.688347] [<c0103371>] sysenter_do_call+0x12/0x31 Andrey's analysis: > timekeeping_resume() is called via class ->resume > method; and according to comments in sysdev_resume() and > device_power_up(), they are called with interrupts disabled. > > Looking at suspend_enter, irqs *are* disabled at this point. > > So it actually looks like something (may be some driver) > unconditionally enabled irqs in resume path. Add a debug check to test this theory. If it triggers then it triggers because the resume code calls it with irqs enabled, which is a no-no not just for timekeeping_resume(), but also bad for a number of other resume handlers. Reported-by: Andrey Borzenkov <arvidjaar@mail.ru> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | time-sched.c: tick_nohz_update_jiffies should be staticJaswinder Singh Rajput2009-01-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: cleanup, reduce kernel size a bit, avoid sparse warning Fixes sparse warning: kernel/time/tick-sched.c:137:6: warning: symbol 'tick_nohz_update_jiffies' was not declared. Should it be static? Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | kernel/fork.c: unused variable 'ret'Steven Noonan2009-01-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Removed the unused variable. Signed-off-by: Steven Noonan <steven@uplinklabs.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | Merge commit 'v2.6.29-rc1' into timers/urgentIngo Molnar2009-01-11
| |\ \ \