aboutsummaryrefslogtreecommitdiffstats
path: root/kernel
Commit message (Collapse)AuthorAge
* Merge commit 'tip/tracing/core' into oprofile/coreRobert Richter2010-04-23
|\ | | | | | | | | | | | | Conflicts: drivers/oprofile/cpu_buffer.c Signed-off-by: Robert Richter <robert.richter@amd.com>
| * Merge branch 'linus' into tracing/coreIngo Molnar2010-04-08
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: include/linux/module.h kernel/module.c Semantic conflict: include/trace/events/module.h Merge reason: Resolve the conflict with upstream commit 5fbfb18 ("Fix up possibly racy module refcounting") Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | ring-buffer: Add lost event count to end of sub bufferSteven Rostedt2010-03-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, binary readers of the ring buffer only know where events were lost, but not how many events were lost at that location. This information is available, but it would require adding another field to the sub buffer header to include it. But when a event can not fit at the end of a sub buffer, it is written to the next sub buffer. This means there is a good chance that the buffer may have room to hold this counter. If it does, write the counter at the end of the sub buffer and set another flag in the data size field that states that this information exists. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | tracing: Show the lost events in the trace_pipe outputSteven Rostedt2010-03-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that the ring buffer can keep track of where events are lost. Use this information to the output of trace_pipe: hackbench-3588 [001] 1326.701660: lock_acquire: ffffffff816591e0 read rcu_read_lock hackbench-3588 [001] 1326.701661: lock_acquire: ffff88003f4091f0 &(&dentry->d_lock)->rlock hackbench-3588 [001] 1326.701664: lock_release: ffff88003f4091f0 &(&dentry->d_lock)->rlock CPU:1 [LOST 673 EVENTS] hackbench-3588 [001] 1326.702711: kmem_cache_free: call_site=ffffffff81102b85 ptr=ffff880026d96738 hackbench-3588 [001] 1326.702712: lock_release: ffff88003e1480a8 &mm->mmap_sem hackbench-3588 [001] 1326.702713: lock_acquire: ffff88003e1480a8 &mm->mmap_sem Even works with the function graph tracer: 2) ! 170.098 us | } 2) 4.036 us | rcu_irq_exit(); 2) 3.657 us | idle_cpu(); 2) ! 190.301 us | } CPU:2 [LOST 2196 EVENTS] 2) 0.853 us | } /* cancel_dirty_page */ 2) | remove_from_page_cache() { 2) 1.578 us | _raw_spin_lock_irq(); 2) | __remove_from_page_cache() { Note, it does not work with the iterator "trace" file, since it requires the use of consuming the page from the ring buffer to determine how many events were lost, which the iterator does not do. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | ring-buffer: Add place holder recording of dropped eventsSteven Rostedt2010-03-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, when the ring buffer drops events, it does not record the fact that it did so. It does inform the writer that the event was dropped by returning a NULL event, but it does not put in any place holder where the event was dropped. This is not a trivial thing to add because the ring buffer mostly runs in overwrite (flight recorder) mode. That is, when the ring buffer is full, new data will overwrite old data. In a produce/consumer mode, where new data is simply dropped when the ring buffer is full, it is trivial to add the placeholder for dropped events. When there's more room to write new data, then a special event can be added to notify the reader about the dropped events. But in overwrite mode, any new write can overwrite events. A place holder can not be inserted into the ring buffer since there never may be room. A reader could also come in at anytime and miss the placeholder. Luckily, the way the ring buffer works, the read side can find out if events were lost or not, and how many events. Everytime a write takes place, if it overwrites the header page (the next read) it updates a "overrun" variable that keeps track of the number of lost events. When a reader swaps out a page from the ring buffer, it can record this number, perfom the swap, and then check to see if the number changed, and take the diff if it has, which would be the number of events dropped. This can be stored by the reader and returned to callers of the reader. Since the reader page swap will fail if the writer moved the head page since the time the reader page set up the swap, this gives room to record the overruns without worrying about races. If the reader sets up the pages, records the overrun, than performs the swap, if the swap succeeds, then the overrun variable has not been updated since the setup before the swap. For binary readers of the ring buffer, a flag is set in the header of each sub page (sub buffer) of the ring buffer. This flag is embedded in the size field of the data on the sub buffer, in the 31st bit (the size can be 32 or 64 bits depending on the architecture), but only 27 bits needs to be used for the actual size (less actually). We could add a new field in the sub buffer header to also record the number of events dropped since the last read, but this will change the format of the binary ring buffer a bit too much. Perhaps this change can be made if the information on the number of events dropped is considered important enough. Note, the notification of dropped events is only used by consuming reads or peeking at the ring buffer. Iterating over the ring buffer does not keep this information because the necessary data is only available when a page swap is made, and the iterator does not swap out pages. Cc: Robert Richter <robert.richter@amd.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | tracing: Fix compile error in module tracepoints when MODULE_UNLOAD not setSteven Rostedt2010-03-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If modules are configured in the build but unloading of modules is not, then the refcnt is not defined. Place the get/put module tracepoints under CONFIG_MODULE_UNLOAD since it references this field in the module structure. As a side-effect, this patch also reduces the code when MODULE_UNLOAD is not set, because these unused tracepoints are not created. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | tracing: Remove side effect from module tracepoints that caused a GPFLi Zefan2010-03-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove the @refcnt argument, because it has side-effects, and arguments with side-effects are not skipped by the jump over disabled instrumentation and are executed even when the tracepoint is disabled. This was also causing a GPF as found by Randy Dunlap: Subject: 2.6.33 GP fault only when built with tracing LKML-Reference: <4BA2B69D.3000309@oracle.com> Note, the current 2.6.34-rc has a fix for the actual cause of the GPF, but this fixes one of its triggers. Tested-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4BA97FA7.6040406@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* | | rcu: Make RCU lockdep check the lockdep_recursion variablePaul E. McKenney2010-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The lockdep facility temporarily disables lockdep checking by incrementing the current->lockdep_recursion variable. Such disabling happens in NMIs and in other situations where lockdep might expect to recurse on itself. This patch therefore checks current->lockdep_recursion, disabling RCU lockdep splats when this variable is non-zero. In addition, this patch removes the "likely()", as suggested by Lai Jiangshan. Reported-by: Frederic Weisbecker <fweisbec@gmail.com> Reported-by: David Miller <davem@davemloft.net> Tested-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com Cc: eric.dumazet@gmail.com LKML-Reference: <20100415195039.GA22623@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | PM / Hibernate: user.c, fix SNAPSHOT_SET_SWAP_AREA handlingJiri Slaby2010-04-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When CONFIG_DEBUG_BLOCK_EXT_DEVT is set we decode the device improperly by old_decode_dev and it results in an error while hibernating with s2disk. All users already pass the new device number, so switch to new_decode_dev(). Signed-off-by: Jiri Slaby <jslaby@suse.cz> Reported-and-tested-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: "Rafael J. Wysocki" <rjw@sisk.pl>
* | | Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds2010-04-08
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: Fix sched_getaffinity()
| * | sched: Fix sched_getaffinity()Anton Blanchard2010-04-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | taskset on 2.6.34-rc3 fails on one of my ppc64 test boxes with the following error: sched_getaffinity(0, 16, 0x10029650030) = -1 EINVAL (Invalid argument) This box has 128 threads and 16 bytes is enough to cover it. Commit cd3d8031eb4311e516329aee03c79a08333141f1 (sched: sched_getaffinity(): Allow less than NR_CPUS length) is comparing this 16 bytes agains nr_cpu_ids. Fix it by comparing nr_cpu_ids to the number of bits in the cpumask we pass in. Signed-off-by: Anton Blanchard <anton@samba.org> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Sharyathi Nagesh <sharyath@in.ibm.com> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jack Steiner <steiner@sgi.com> Cc: Russ Anderson <rja@sgi.com> Cc: Mike Travis <travis@sgi.com> LKML-Reference: <20100406070218.GM5594@kryten> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | mm: avoid null-pointer deref in sync_mm_rss()KAMEZAWA Hiroyuki2010-04-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - We weren't zeroing p->rss_stat[] at fork() - Consequently sync_mm_rss() was dereferencing tsk->mm for kernel threads and was oopsing. - Make __sync_task_rss_stat() static, too. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=15648 [akpm@linux-foundation.org: remove the BUG_ON(!mm->rss)] Reported-by: Troels Liebe Bentsen <tlb@rapanden.dk> Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> "Michael S. Tsirkin" <mst@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'irq-core-for-linus' of ↵Linus Torvalds2010-04-06
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: genirq: Force MSI irq handlers to run with interrupts disabled
| * | | genirq: Force MSI irq handlers to run with interrupts disabledThomas Gleixner2010-03-31
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Network folks reported that directing all MSI-X vectors of their multi queue NICs to a single core can cause interrupt stack overflows when enough interrupts fire at the same time. This is caused by the fact that we run interrupt handlers by default with interrupts enabled unless the driver reuqests the interrupt with the IRQF_DISABLED set. The NIC handlers do not set this flag, so simultaneous interrupts can nest unlimited and cause the stack overflow. The only safe counter measure is to run the interrupt handlers with interrupts disabled. We can't switch to this mode in general right now, but it is safe to do so for MSI interrupts. Force IRQF_DISABLED for MSI interrupt handlers. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Andi Kleen <andi@firstfloor.org> Cc: Linus Torvalds <torvalds@osdl.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: David Miller <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: stable@kernel.org
* | | Fix up possibly racy module refcountingNick Piggin2010-04-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Module refcounting is implemented with a per-cpu counter for speed. However there is a race when tallying the counter where a reference may be taken by one CPU and released by another. Reference count summation may then see the decrement without having seen the previous increment, leading to lower than expected count. A module which never has its actual reference drop below 1 may return a reference count of 0 due to this race. Module removal generally runs under stop_machine, which prevents this race causing bugs due to removal of in-use modules. However there are other real bugs in module.c code and driver code (module_refcount is exported) where the callers do not run under stop_machine. Fix this by maintaining running per-cpu counters for the number of module refcount increments and the number of refcount decrements. The increments are tallied after the decrements, so any decrement seen will always have its corresponding increment counted. The final refcount is the difference of the total increments and decrements, preventing a low-refcount from being returned. Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | audit: preface audit printk with auditEric Paris2010-04-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There have been a number of reports of people seeing the message: "name_count maxed, losing inode data: dev=00:05, inode=3185" in dmesg. These usually lead to people reporting problems to the filesystem group who are in turn clueless what they mean. Eventually someone finds me and I explain what is going on and that these come from the audit system. The basics of the problem is that the audit subsystem never expects a single syscall to 'interact' (for some wish washy meaning of interact) with more than 20 inodes. But in fact some operations like loading kernel modules can cause changes to lots of inodes in debugfs. There are a couple real fixes being bandied about including removing the fixed compile time limit of 20 or not auditing changes in debugfs (or both) but neither are small and obvious so I am not sending them for immediate inclusion (I hope Al forwards a real solution next devel window). In the meantime this patch simply adds 'audit' to the beginning of the crap message so if a user sees it, they come blame me first and we can talk about what it means and make sure we understand all of the reasons it can happen and make sure this gets solved correctly in the long run. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'slabh' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/miscLinus Torvalds2010-04-05
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'slabh' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc: eeepc-wmi: include slab.h staging/otus: include slab.h from usbdrv.h percpu: don't implicitly include slab.h from percpu.h kmemcheck: Fix build errors due to missing slab.h include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h iwlwifi: don't include iwl-dev.h from iwl-devtrace.h x86: don't include slab.h from arch/x86/include/asm/pgtable_32.h Fix up trivial conflicts in include/linux/percpu.h due to is_kernel_percpu_address() having been introduced since the slab.h cleanup with the percpu_up.c splitup.
| * \ \ Merge branch 'master' into export-slabhTejun Heo2010-04-04
| |\ \ \ | | | |/ | | |/|
| * | | include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo2010-03-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
* | | | Merge branch 'for-linus' of ↵Linus Torvalds2010-04-05
|\ \ \ \ | |_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: module: add stub for is_module_percpu_address percpu, module: implement and use is_kernel/module_percpu_address() module: encapsulate percpu handling better and record percpu_size
| * | | percpu, module: implement and use is_kernel/module_percpu_address()Tejun Heo2010-03-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lockdep has custom code to check whether a pointer belongs to static percpu area which is somewhat broken. Implement proper is_kernel/module_percpu_address() and replace the custom code. On UP, percpu variables are regular static variables and can't be distinguished from them. Always return %false on UP. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@redhat.com>
| * | | module: encapsulate percpu handling better and record percpu_sizeTejun Heo2010-03-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Better encapsulate module static percpu area handling so that code outsidef of CONFIG_SMP ifdef doesn't deal with mod->percpu directly and add mod->percpu_size and record percpu_size in it. Both percpu fields are compiled out on UP. While at it, mark mod->percpu w/ __percpu. This is to prepare for is_module_percpu_address(). Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Rusty Russell <rusty@rustcorp.com.au>
* | | | Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds2010-04-04
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf: Always build the powerpc perf_arch_fetch_caller_regs version perf: Always build the stub perf_arch_fetch_caller_regs version perf, probe-finder: Build fix on Debian perf/scripts: Tuple was set from long in both branches in python_process_event() perf: Fix 'perf sched record' deadlock perf, x86: Fix callgraphs of 32-bit processes on 64-bit kernels perf, x86: Fix AMD hotplug & constraint initialization x86: Move notify_cpu_starting() callback to a later stage x86,kgdb: Always initialize the hw breakpoint attribute perf: Use hot regs with software sched switch/migrate events perf: Correctly align perf event tracing buffer
| * | | | perf: Always build the stub perf_arch_fetch_caller_regs versionFrederic Weisbecker2010-04-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that software events use perf_arch_fetch_caller_regs() too, we need the stub version to be always built in for archs that don't implement it. Fixes the following build error in PARISC: kernel/built-in.o: In function `perf_event_task_sched_out': (.text.perf_event_task_sched_out+0x54): undefined reference to `perf_arch_fetch_caller_regs' Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org>
| * | | | perf: Fix 'perf sched record' deadlockMike Galbraith2010-04-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | perf sched record can deadlock a box should the holder of handle->data->lock take an interrupt, and then attempt to acquire an rq lock held by a CPU trying to acquire the same lock. Disable interrupts. CPU0 CPU1 sched event with rq->lock held grab handle->data->lock spin on handle->data->lock interrupt try to grab rq->lock Reported-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Tested-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1269598293.6174.8.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | Merge branch 'perf/urgent' of ↵Ingo Molnar2010-04-02
| |\ \ \ \ | | |_|_|/ | |/| | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/urgent
| | * | | perf: Use hot regs with software sched switch/migrate eventsFrederic Weisbecker2010-04-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Scheduler's task migration events don't work because they always pass NULL regs perf_sw_event(). The event hence gets filtered in perf_swevent_add(). Scheduler's context switches events use task_pt_regs() to get the context when the event occured which is a wrong thing to do as this won't give us the place in the kernel where we went to sleep but the place where we left userspace. The result is even more wrong if we switch from a kernel thread. Use the hot regs snapshot for both events as they belong to the non-interrupt/exception based events family. Unlike page faults or so that provide the regs matching the exact origin of the event, we need to save the current context. This makes the task migration event working and fix the context switch callchains and origin ip. Example: perf record -a -e cs Before: 10.91% ksoftirqd/0 0 [k] 0000000000000000 | --- (nil) perf_callchain perf_prepare_sample __perf_event_overflow perf_swevent_overflow perf_swevent_add perf_swevent_ctx_event do_perf_sw_event __perf_sw_event perf_event_task_sched_out schedule run_ksoftirqd kthread kernel_thread_helper After: 23.77% hald-addon-stor [kernel.kallsyms] [k] schedule | --- schedule | |--60.00%-- schedule_timeout | wait_for_common | wait_for_completion | blk_execute_rq | scsi_execute | scsi_execute_req | sr_test_unit_ready | | | |--66.67%-- sr_media_change | | media_changed | | cdrom_media_changed | | sr_block_media_changed | | check_disk_change | | cdrom_open v2: Always build perf_arch_fetch_caller_regs() now that software events need that too. They don't need it from modules, unlike trace events, so we keep the EXPORT_SYMBOL in trace_event_perf.c Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net>
| | * | | perf: Correctly align perf event tracing bufferFrederic Weisbecker2010-04-01
| | |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The trace event buffer used by perf to record raw sample events is typed as an array of char and may then not be aligned to 8 by alloc_percpu(). But we need it to be aligned to 8 in sparc64 because we cast this buffer into a random structure type built by the TRACE_EVENT() macro to store the traces. So if a random 64 bits field is accessed inside, it may be not under an expected good alignment. Use an array of long instead to force the appropriate alignment, and perform a compile time check to ensure the size in byte of the buffer is a multiple of sizeof(long) so that its actual size doesn't get shrinked under us. This fixes unaligned accesses reported while using perf lock in sparc 64. Suggested-by: David Miller <davem@davemloft.net> Suggested-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net> Cc: Steven Rostedt <rostedt@goodmis.org>
* | | | Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds2010-04-04
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched: set_cpus_allowed_ptr(): Don't use rq->migration_thread after unlock sched: Fix proc_sched_set_task()
| * | | | sched: set_cpus_allowed_ptr(): Don't use rq->migration_thread after unlockOleg Nesterov2010-04-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Trivial typo fix. rq->migration_thread can be NULL after task_rq_unlock(), this is why we have "mt" which should be used instead. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20100330165829.GA18284@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | sched: Fix proc_sched_set_task()Mike Galbraith2010-04-02
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Latencytop clearing sum_exec_runtime via proc_sched_set_task() breaks task_times(). Other places in kernel use nvcsw and nivcsw, which are being cleared as well, Clear task statistics only. Reported-by: Török Edwin <edwintorok@gmail.com> Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Cc: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1269940193.19286.14.camel@marge.simson.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | Merge branch 'tracing-fixes-for-linus' of ↵Linus Torvalds2010-04-04
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: ring-buffer: Add missing unlock tracing: Fix lockdep warning in global_clock()
| * | | | ring-buffer: Add missing unlockJulia Lawall2010-03-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In some error handling cases the lock is not unlocked. The return is converted to a goto, to share the unlock at the end of the function. A simplified version of the semantic patch that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r exists@ expression E1; identifier f; @@ f (...) { <+... * spin_lock_irq (E1,...); ... when != E1 * return ...; ...+> } // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> LKML-Reference: <Pine.LNX.4.64.1003291736440.21896@ask.diku.dk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | tracing: Fix lockdep warning in global_clock()Li Zefan2010-03-29
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | # echo 1 > events/enable # echo global > trace_clock ------------[ cut here ]------------ WARNING: at kernel/lockdep.c:3162 check_flags+0xb2/0x190() ... ---[ end trace 3f86734a89416623 ]--- possible reason: unannotated irqs-on. ... There's no reason to use the raw_local_irq_save() in trace_clock_global. The local_irq_save() version is fine, and does not cause the bug in lockdep. Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4BA97FA1.7030606@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* | | | Merge branch 'kgdb-fixes' of ↵Linus Torvalds2010-04-02
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb * 'kgdb-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb: kgdb: Turn off tracing while in the debugger kgdb: use atomic_inc and atomic_dec instead of atomic_set kgdb: eliminate kgdb_wait(), all cpus enter the same way kgdbts,sh: Add in breakpoint pc offset for superh kgdb: have ebin2mem call probe_kernel_write once
| * | | | kgdb: Turn off tracing while in the debuggerJason Wessel2010-04-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel debugger should turn off kernel tracing any time the debugger is active and restore it on resume. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | kgdb: use atomic_inc and atomic_dec instead of atomic_setJason Wessel2010-04-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Memory barriers should be used for the kgdb cpu synchronization. The atomic_set() does not imply a memory barrier. Reported-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
| * | | | kgdb: eliminate kgdb_wait(), all cpus enter the same wayJason Wessel2010-04-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a kgdb architectural change to have all the cpus (master or slave) enter the same function. A cpu that hits an exception (wants to be the master cpu) will call kgdb_handle_exception() from the trap handler and then invoke a kgdb_roundup_cpu() to synchronize the other cpus and bring them into the kgdb_handle_exception() as well. A slave cpu will enter kgdb_handle_exception() from the kgdb_nmicallback() and set the exception state to note that the processor is a slave. Previously the salve cpu would have called kgdb_wait(). This change allows the debug core to change cpus without resuming the system in order to inspect arch specific cpu information. Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
| * | | | kgdb: have ebin2mem call probe_kernel_write onceJason Wessel2010-04-02
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than call probe_kernel_write() one byte at a time, process the whole buffer locally and pass the entire result in one go. This way, architectures that need to do special handling based on the length can do so, or we only end up calling memcpy() once. [sonic.zhang@analog.com: Reported original problem and preliminary patch] Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Sonic Zhang <sonic.zhang@analog.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org>
* | | | Merge branch 'pm-fixes' of ↵Linus Torvalds2010-04-02
|\ \ \ \ | |/ / / |/| | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6 * 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: Freezer: Fix buggy resume test for tasks frozen with cgroup freezer Freezer: Only show the state of tasks refusing to freeze
| * | | Freezer: Fix buggy resume test for tasks frozen with cgroup freezerMatt Helsley2010-03-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the cgroup freezer is used to freeze tasks we do not want to thaw those tasks during resume. Currently we test the cgroup freezer state of the resuming tasks to see if the cgroup is FROZEN. If so then we don't thaw the task. However, the FREEZING state also indicates that the task should remain frozen. This also avoids a problem pointed out by Oren Ladaan: the freezer state transition from FREEZING to FROZEN is updated lazily when userspace reads or writes the freezer.state file in the cgroup filesystem. This means that resume will thaw tasks in cgroups which should be in the FROZEN state if there is no read/write of the freezer.state file to trigger this transition before suspend. NOTE: Another "simple" solution would be to always update the cgroup freezer state during resume. However it's a bad choice for several reasons: Updating the cgroup freezer state is somewhat expensive because it requires walking all the tasks in the cgroup and checking if they are each frozen. Worse, this could easily make resume run in N^2 time where N is the number of tasks in the cgroup. Finally, updating the freezer state from this code path requires trickier locking because of the way locks must be ordered. Instead of updating the freezer state we rely on the fact that lazy updates only manage the transition from FREEZING to FROZEN. We know that a cgroup with the FREEZING state may actually be FROZEN so test for that state too. This makes sense in the resume path even for partially-frozen cgroups -- those that really are FREEZING but not FROZEN. Reported-by: Oren Ladaan <orenl@cs.columbia.edu> Signed-off-by: Matt Helsley <matthltc@us.ibm.com> Cc: stable@kernel.org Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
| * | | Freezer: Only show the state of tasks refusing to freezeXiaotian Feng2010-03-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | show_state will dump all tasks state, so if freezer failed to freeze any task, kernel will dump all tasks state and flood the dmesg log. This patch makes freezer only show state of tasks refusing to freeze. Signed-off-by: Xiaotian Feng <dfeng@redhat.com> Acked-by: Pavel Machek <pavel@ucw.cz> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* | | | Merge branch 'for-linus' of ↵Linus Torvalds2010-03-30
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: CRED: Fix memory leak in error handling
| * | | | CRED: Fix memory leak in error handlingMathieu Desnoyers2010-03-30
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | Fix a memory leak on an OOM condition in prepare_usermodehelper_creds(). Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: James Morris <jmorris@namei.org>
* | | | Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds2010-03-30
|\ \ \ \ | |/ / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Do not free zero sized per cpu areas x86: Make sure free_init_pages() frees pages on page boundary x86: Make smp_locks end with page alignment
| * | | x86: Do not free zero sized per cpu areasIan Campbell2010-03-29
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This avoids an infinite loop in free_early_partial(). Add a warning to free_early_partial() to catch future problems. -v5: put back start > end back into WARN_ONCE() -v6: use one line for warning, suggested by Linus -v7: more tests -v8: remove the function name as suggested by Johannes WARN_ONCE() will print out that function name. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: Joel Becker <joel.becker@oracle.com> Tested-by: Stanislaw Gruszka <sgruszka@redhat.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: David Miller <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> LKML-Reference: <1269830604-26214-4-git-send-email-yinghai@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | SLOW_WORK: CONFIG_SLOW_WORK_PROC should be CONFIG_SLOW_WORK_DEBUGDavid Howells2010-03-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | CONFIG_SLOW_WORK_PROC was changed to CONFIG_SLOW_WORK_DEBUG, but not in all instances. Change the remaining instances. This makes the debugfs file display the time mark and the owner's description again. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | slow-work: use get_ref wrapper instead of directly calling get_refDave Airlie2010-03-29
|/ / | | | | | | | | | | | | | | | | Otherwise we can get an oops if the user has no get_ref/put_ref requirement. Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'for-linus' of ↵Linus Torvalds2010-03-26
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: x86/PCI: truncate _CRS windows with _LEN > _MAX - _MIN + 1 x86/PCI: for host bridge address space collisions, show conflicting resource frv/PCI: remove redundant warnings x86/PCI: remove redundant warnings PCI: don't say we claimed a resource if we failed PCI quirk: Disable MSI on VIA K8T890 systems PCI quirk: RS780/RS880: work around missing MSI initialization PCI quirk: only apply CX700 PCI bus parking quirk if external VT6212L is present PCI: complain about devices that seem to be broken PCI: print resources consistently with %pR PCI: make disabled window printk style match the enabled ones PCI: break out primary/secondary/subordinate for readability PCI: for address space collisions, show conflicting resource resources: add interfaces that return conflict information PCI: cleanup error return for pcix get and set mmrbc functions PCI: fix access of PCI_X_CMD by pcix get and set mmrbc functions PCI: kill off pci_register_set_vga_state() symbol export. PCI: fix return value from pcix_get_max_mmrbc()
| * | resources: add interfaces that return conflict informationBjorn Helgaas2010-03-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | request_resource() and insert_resource() only return success or failure, which no information about what existing resource conflicted with the proposed new reservation. This patch adds request_resource_conflict() and insert_resource_conflict(), which return the conflicting resource. Callers may use this for better error messages or to adjust the new resource and retry the request. Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>