aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* update the comment in kthread_stop()Oleg Nesterov2009-07-27
| | | | | | | | | | | | | Commit 63706172f332fd3f6e7458ebfb35fa6de9c21dc5 ("kthreads: rework kthread_stop()") removed the limitation that the thread function mysr not call do_exit() itself, but forgot to update the comment. Since that commit it is OK to use kthread_stop() even if kthread can exit itself. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* module: use MODULE_SYMBOL_PREFIX with module_layoutMike Frysinger2009-07-27
| | | | | | | | | | | | The check_modstruct_version() needs to look up the symbol "module_layout" in the kernel, but it does so literally and not by a C identifier. The trouble is that it does not include a symbol prefix for those ports that need it (like the Blackfin and H8300 port). So make sure we tack on the MODULE_SYMBOL_PREFIX define to the front of it. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for_linus' of ↵Linus Torvalds2009-07-27
|\ | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: jbd: fix race between write_metadata_buffer and get_write_access ext3: Get rid of extenddisksize parameter of ext3_get_blocks_handle() jbd: Fix a race between checkpointing code and journal_get_write_access() ext3: Fix truncation of symlinks after failed write jbd: Fail to load a journal if it is too short
| * jbd: fix race between write_metadata_buffer and get_write_accessdingdinghua2009-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The function journal_write_metadata_buffer() calls jbd_unlock_bh_state(bh_in) too early; this could potentially allow another thread to call get_write_access on the buffer head, modify the data, and dirty it, and allowing the wrong data to be written into the journal. Fortunately, if we lose this race, the only time this will actually cause filesystem corruption is if there is a system crash or other unclean shutdown of the system before the next commit can take place. Signed-off-by: dingdinghua <dingdinghua85@gmail.com> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Jan Kara <jack@suse.cz>
| * ext3: Get rid of extenddisksize parameter of ext3_get_blocks_handle()Jan Kara2009-07-15
| | | | | | | | | | | | | | | | | | | | | | Get rid of extenddisksize parameter of ext3_get_blocks_handle(). This seems to be a relict from some old days and setting disksize in this function does not make much sence. Currently it was set only by ext3_getblk(). Since the parameter has some effect only if create == 1, it is easy to check that the three callers which end up calling ext3_getblk() with create == 1 (ext3_append, ext3_quota_write, ext3_mkdir) do the right thing and set disksize themselves. Signed-off-by: Jan Kara <jack@suse.cz>
| * jbd: Fix a race between checkpointing code and journal_get_write_access()Jan Kara2009-07-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following race can happen: CPU1 CPU2 checkpointing code checks the buffer, adds it to an array for writeback do_get_write_access() ... lock_buffer() unlock_buffer() flush_batch() submits the buffer for IO __jbd_journal_file_buffer() So a buffer under writeout is returned from do_get_write_access(). Since the filesystem code relies on the fact that journaled buffers cannot be written out, it does not take the buffer lock and so it can modify buffer while it is under writeout. That can lead to a filesystem corruption if we crash at the right moment. The similar problem can happen with the journal_get_create_access() path. We fix the problem by clearing the buffer dirty bit under buffer_lock even if the buffer is on BJ_None list. Actually, we clear the dirty bit regardless the list the buffer is in and warn about the fact if the buffer is already journalled. Thanks for spotting the problem goes to dingdinghua <dingdinghua85@gmail.com>. Reported-by: dingdinghua <dingdinghua85@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
| * ext3: Fix truncation of symlinks after failed writeJan Kara2009-07-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Contents of long symlinks is written via standard write methods. So when the write fails, we add inode to orphan list. But symlinks don't have .truncate method defined so nobody properly removes them from the orphan list (both on disk and in memory). Fix this by calling ext3_truncate() directly instead of calling vmtruncate() (which is saner anyway since we don't need anything vmtruncate() does except from calling .truncate in these paths). We also add inode to orphan list only if ext3_can_truncate() is true (currently, it can be false for symlinks when there are no blocks allocated) - otherwise orphan list processing will complain and ext3_truncate() will not remove inode from on-disk orphan list. Signed-off-by: Jan Kara <jack@suse.cz>
| * jbd: Fail to load a journal if it is too shortJan Kara2009-07-15
| | | | | | | | | | | | | | | | Due to on disk corruption, it can happen that journal is too short. Fail to load it in such case so that we don't oops somewhere later. Reported-by: Nageswara R Sastry <rnsastry@linux.vnet.ibm.com> Signed-off-by: Jan Kara <jack@suse.cz>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6Linus Torvalds2009-07-27
|\ \ | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: [CIFS] fix sparse warning cifs: fix sb->s_maxbytes so that it casts properly to a signed value cifs: disable serverino if server doesn't support it
| * | [CIFS] fix sparse warningSteve French2009-07-22
| | | | | | | | | | | | Signed-off-by: Steve French <sfrench@us.ibm.com>
| * | cifs: fix sb->s_maxbytes so that it casts properly to a signed valueJeff Layton2009-07-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This off-by-one bug causes sendfile() to not work properly. When a task calls sendfile() on a file on a CIFS filesystem, the syscall returns -1 and sets errno to EOVERFLOW. do_sendfile uses s_maxbytes to verify the returned offset of the file. The problem there is that this value is cast to a signed value (loff_t). When this is done on the s_maxbytes value that cifs uses, it becomes negative and the comparisons against it fail. Even though s_maxbytes is an unsigned value, it seems that it's not OK to set it in such a way that it'll end up negative when it's cast to a signed value. These casts happen in other codepaths besides sendfile too, but the VFS is a little hard to follow in this area and I can't be sure if there are other bugs that this will fix. It's not clear to me why s_maxbytes isn't just declared as loff_t in the first place, but either way we still need to fix these values to make sendfile work properly. This is also an opportunity to replace the magic bit-shift values here with the standard #defines for this. This fixes the reproducer program I have that does a sendfile and will probably also fix the situation where apache is serving from a CIFS share. Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
| * | cifs: disable serverino if server doesn't support itJeff Layton2009-07-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A recent regression when dealing with older servers. This bug was introduced when we made serverino the default... When the server can't provide inode numbers, disable it for the mount. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
* | | mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()Benjamin Herrenschmidt2009-07-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mm: Pass virtual address to [__]p{te,ud,md}_free_tlb() Upcoming paches to support the new 64-bit "BookE" powerpc architecture will need to have the virtual address corresponding to PTE page when freeing it, due to the way the HW table walker works. Basically, the TLB can be loaded with "large" pages that cover the whole virtual space (well, sort-of, half of it actually) represented by a PTE page, and which contain an "indirect" bit indicating that this TLB entry RPN points to an array of PTEs from which the TLB can then create direct entries. Thus, in order to invalidate those when PTE pages are deleted, we need the virtual address to pass to tlbilx or tlbivax instructions. The old trick of sticking it somewhere in the PTE page struct page sucks too much, the address is almost readily available in all call sites and almost everybody implemets these as macros, so we may as well add the argument everywhere. I added it to the pmd and pud variants for consistency. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: David Howells <dhowells@redhat.com> [MN10300 & FRV] Acked-by: Nick Piggin <npiggin@suse.de> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> [s390] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Linux 2.6.31-rc4v2.6.31-rc4Linus Torvalds2009-07-22
| | |
* | | Merge branch 'irq-fixes-for-linus' of ↵Linus Torvalds2009-07-22
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: genirq: Fix UP compile failure caused by irq_thread_check_affinity
| * | | genirq: Fix UP compile failure caused by irq_thread_check_affinityBruno Premont2009-07-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since genirq: Delegate irq affinity setting to the irq thread (591d2fb02ea80472d846c0b8507007806bdd69cc) compilation with CONFIG_SMP=n fails with following error: /usr/src/linux-2.6/kernel/irq/manage.c: In function 'irq_thread_check_affinity': /usr/src/linux-2.6/kernel/irq/manage.c:475: error: 'struct irq_desc' has no member named 'affinity' make[4]: *** [kernel/irq/manage.o] Error 1 That commit adds a new function irq_thread_check_affinity() which uses struct irq_desc.affinity which is only available for CONFIG_SMP=y. Move that function under #ifdef CONFIG_SMP. [ tglx@brownpaperbag: compile and boot tested on UP and SMP ] Signed-off-by: Bruno Premont <bonbons@linux-vserver.org> LKML-Reference: <20090722222232.2eb3e1c4@neptune.home> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | | Merge branch 'lockdep-for-linus' of ↵Linus Torvalds2009-07-22
|\ \ \ \ | |/ / / |/| | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep * 'lockdep-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep: lockdep: Fix lockdep annotation for pipe_double_lock()
| * | | lockdep: Fix lockdep annotation for pipe_double_lock()Peter Zijlstra2009-07-22
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The presumed use of the pipe_double_lock() routine is to lock 2 locks in a deadlock free way by ordering the locks by their address. However it fails to keep the specified lock classes in order and explicitly annotates a deadlock. Rectify this. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Miklos Szeredi <mszeredi@suse.cz> LKML-Reference: <1248163763.15751.11098.camel@twins>
* | | Merge branch 'perf-counters-for-linus' of ↵Linus Torvalds2009-07-22
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-perf * 'perf-counters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-perf: (31 commits) perf_counter tools: Give perf top inherit option perf_counter tools: Fix vmlinux symbol generation breakage perf_counter: Detect debugfs location perf_counter: Add tracepoint support to perf list, perf stat perf symbol: C++ demangling perf: avoid structure size confusion by using a fixed size perf_counter: Fix throttle/unthrottle event logging perf_counter: Improve perf stat and perf record option parsing perf_counter: PERF_SAMPLE_ID and inherited counters perf_counter: Plug more stack leaks perf: Fix stack data leak perf_counter: Remove unused variables perf_counter: Make call graph option consistent perf_counter: Add perf record option to log addresses perf_counter: Log vfork as a fork event perf_counter: Synthesize VDSO mmap event perf_counter: Make sure we dont leak kernel memory to userspace perf_counter tools: Fix index boundary check perf_counter: Fix the tracepoint channel to perfcounters perf_counter, x86: Extend perf_counter Pentium M support ...
| * | | perf_counter tools: Give perf top inherit optionMike Galbraith2009-07-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, perf top -p only tracks the pid provided, which isn't very useful for watching forky loads, so give it an inherit option. Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1248165036.9795.10.camel@marge.simson.net>
| * | | perf_counter tools: Fix vmlinux symbol generation breakageMike Galbraith2009-07-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vmlinux meets the criteria for symbol adjustment, which breaks vmlinux generated symbols. Fix this by exempting vmlinux. This is a bit fragile in that someone could change the kernel dso's name, but currently that name is also hardwired. Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1248091298.18702.18.camel@marge.simson.net>
| * | | perf_counter: Detect debugfs locationJason Baron2009-07-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If "/sys/kernel/debug" is not a debugfs mount point, search for the debugfs filesystem in /proc/mounts, but also allows the user to specify '--debugfs-dir=blah' or set the environment variable: 'PERF_DEBUGFS_DIR' Signed-off-by: Jason Baron <jbaron@redhat.com> [ also made it probe "/debug" by default ] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090721181629.GA3094@redhat.com>
| * | | perf_counter: Add tracepoint support to perf list, perf statJason Baron2009-07-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support to 'perf list' and 'perf stat' for kernel tracepoints. The implementation creates a 'for_each_subsystem' and 'for_each_event' for easy iteration over the tracepoints. Signed-off-by: Jason Baron <jbaron@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <426129bf9fcc8ee63bb094cf736e7316a7dcd77a.1248190728.git.jbaron@redhat.com>
| * | | perf symbol: C++ demanglingArnaldo Carvalho de Melo2009-07-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [acme@doppio ~]$ perf report -s comm,dso,symbol -C firefox -d /usr/lib64/xulrunner-1.9.1/libxul.so | grep :: | head 2.21% [.] nsDeque::Push(void*) 1.78% [.] GraphWalker::DoWalk(nsDeque&) 1.30% [.] GCGraphBuilder::AddNode(void*, nsCycleCollectionParticipant*) 1.27% [.] XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) 1.18% [.] imgContainer::DrawFrameTo(gfxIImageFrame*, gfxIImageFrame*, nsRect&) 1.13% [.] nsDeque::PopFront() 1.11% [.] nsGlobalWindow::RunTimeout(nsTimeout*) 0.97% [.] nsXPConnect::Traverse(void*, nsCycleCollectionTraversalCallback&) 0.95% [.] nsJSEventListener::cycleCollection::Traverse(void*, nsCycleCollectionTraversalCallback&) 0.95% [.] nsCOMPtr_base::~nsCOMPtr_base() [acme@doppio ~]$ Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Suggested-by: Clark Williams <williams@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090720171412.GB10410@ghostprotocols.net>
| * | | perf: avoid structure size confusion by using a fixed sizeArjan van de Ven2009-07-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | for some reason, this structure gets compiled as 36 bytes in some files (the ones that alloacte it) but 40 bytes in others (the ones that use it). The cause is an off_t type that gets a different size in different compilation units for some yet-to-be-explained reason. But the effect is disasterous; the size/offset members of the struct are at different offsets, and result in mostly complete garbage. The parser in perf is so robust that this all gets hidden, and after skipping an certain amount of samples, it recovers.... so this bug is not normally noticed. .... except when you want every sample to be exact. Fix this by just using an explicitly sized type. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4A655917.9080504@linux.intel.com>
| * | | perf_counter: Fix throttle/unthrottle event loggingAnton Blanchard2009-07-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now we only print PERF_EVENT_THROTTLE + 1 (ie PERF_EVENT_UNTHROTTLE). Fix this to print both a throttle and unthrottle event. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090722130546.GE9029@kryten>
| * | | perf_counter: Improve perf stat and perf record option parsingAnton Blanchard2009-07-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | perf stat and perf record currently look for all options on the command line. This can lead to some confusion: # perf stat ls -l Error: unknown switch `l' While we can work around this by adding '--' before the command, the git option parsing code can stop at the first non option: # perf stat ls -l Performance counter stats for 'ls -l': .... Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090722130412.GD9029@kryten>
| * | | perf_counter: PERF_SAMPLE_ID and inherited countersPeter Zijlstra2009-07-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Anton noted that for inherited counters the counter-id as provided by PERF_SAMPLE_ID isn't mappable to the id found through PERF_RECORD_ID because each inherited counter gets its own id. His suggestion was to always return the parent counter id, since that is the primary counter id as exposed. However, these inherited counters have a unique identifier so that events like PERF_EVENT_PERIOD and PERF_EVENT_THROTTLE can be specific about which counter gets modified, which is important when trying to normalize the sample streams. This patch removes PERF_EVENT_PERIOD in favour of PERF_SAMPLE_PERIOD, which is more useful anyway, since changing periods became a lot more common than initially thought -- rendering PERF_EVENT_PERIOD the less useful solution (also, PERF_SAMPLE_PERIOD reports the more accurate value, since it reports the value used to trigger the overflow, whereas PERF_EVENT_PERIOD simply reports the requested period changed, which might only take effect on the next cycle). This still leaves us PERF_EVENT_THROTTLE to consider, but since that _should_ be a rare occurrence, and linking it to a primary id is the most useful bit to diagnose the problem, we introduce a PERF_SAMPLE_STREAM_ID, for those few cases where the full reconstruction is important. [Does change the ABI a little, but I see no other way out] Suggested-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1248095846.15751.8781.camel@twins>
| * | | perf_counter: Plug more stack leaksPeter Zijlstra2009-07-22
| | | | | | | | | | | | | | | | | | | | | | | | Per example of Arjan's patch, I went through and found a few more. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
| * | | perf: Fix stack data leakArjan van de Ven2009-07-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the "reserved" field was not initialized to zero, resulting in 4 bytes of stack data leaking to userspace.... Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
| * | | perf_counter: Remove unused variablesPeter Zijlstra2009-07-22
| | | | | | | | | | | | | | | | | | | | | | | | Fix a gcc unused variables warning. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
| * | | Merge commit 'tip/perfcounters/core' into perf-counters-for-linusPeter Zijlstra2009-07-22
| |\ \ \
| | * | | perf_counter, x86: Extend perf_counter Pentium M supportDaniel Qarras2009-07-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I've attached a patch to remove the Pentium M special casing of EMON and as noticed at least with my Pentium M the hardware PMU now works: Performance counter stats for '/bin/ls /var/tmp': 1.809988 task-clock-msecs # 0.125 CPUs 1 context-switches # 0.001 M/sec 0 CPU-migrations # 0.000 M/sec 224 page-faults # 0.124 M/sec 1425648 cycles # 787.656 M/sec 912755 instructions # 0.640 IPC Vince suggested that this code was trying to address erratum Y17 in Pentium-M's: http://download.intel.com/support/processors/mobile/pm/sb/25266532.pdf But that erratum (related to IA32_MISC_ENABLES.7) does not affect perfcounters as we dont use this toggle to disable RDPMC and WRMSR/RDMSR access to performance counters. We keep cr4's bit 8 (X86_CR4_PCE) clear so unprivileged RDPMC access is not allowed anyway. Cc: Vince Weaver <vince@deater.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@googlemail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | perf report: Introduce -n/--show-nr-samplesArnaldo Carvalho de Melo2009-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [acme@doppio pahole]$ perf report -ns comm,dso,symbol -d /lib64/libc-2.10.1.so -C pahole | head -17 21.94% 32101 [.] _int_malloc 20.10% 29402 [.] __GI_strcmp 16.77% 24533 [.] __tsearch 12.61% 18450 [.] malloc_consolidate 6.42% 9394 [.] _int_free 6.28% 9191 [.] __tfind 4.56% 6678 [.] __GI___libc_free 4.46% 6520 [.] _IO_vfprintf_internal 2.59% 3786 [.] __malloc 1.17% 1716 [.] __GI_memcpy [acme@doppio pahole]$ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1247325517-12272-5-git-send-email-acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | perf_counter tools: PLT info is stripped in -debuginfo packagesArnaldo Carvalho de Melo2009-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So we need to get the richer .symtab from the debuginfo packages but the PLT info from the original DSO where we have just the leaner .dynsym symtab. Example: | [acme@doppio pahole]$ perf report --sort comm,dso,symbol > before | [acme@doppio pahole]$ perf report --sort comm,dso,symbol > after | [acme@doppio pahole]$ diff -U1 before after | --- before 2009-07-11 11:04:22.688595741 -0300 | +++ after 2009-07-11 11:04:33.380595676 -0300 | @@ -80,3 +80,2 @@ | 0.07% pahole ./build/pahole [.] pahole_stealer | - 0.06% pahole /usr/lib64/libdw-0.141.so [.] 0x00000000007140 | 0.06% pahole /usr/lib64/libdw-0.141.so [.] __libdw_getabbrev | @@ -91,2 +90,3 @@ | 0.06% pahole [kernel] [k] free_hot_cold_page | + 0.06% pahole /usr/lib64/libdw-0.141.so [.] tfind@plt | 0.05% pahole ./build/libdwarves.so.1.0.0 [.] ftype__add_parameter | @@ -242,2 +242,3 @@ | 0.01% pahole [kernel] [k] account_group_user_time | + 0.01% pahole /usr/lib64/libdw-0.141.so [.] strlen@plt | 0.01% pahole ./build/pahole [.] strcmp@plt | [acme@doppio pahole]$ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1247325517-12272-4-git-send-email-acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | perf report: Make the output more compactArnaldo Carvalho de Melo2009-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we filter by column content we may end up with a column that has the same value for all the lines. So remove that column and tell its unique value on the top, as a comment. Example: [acme@doppio pahole]$ perf report --sort comm,dso,symbol -d ./build/libdwarves.so.1.0.0 -C pahole | head -15 # dso: ./build/libdwarves.so.1.0.0 # comm: pahole # Samples: 58409 # # Overhead Symbol # ........ ...... # 20.93% [.] tag__recode_dwarf_type 14.94% [.] namespace__recode_dwarf_types 10.38% [.] cu__table_add_tag 6.69% [.] __die__process_tag 5.05% [.] die__process_function 4.70% [.] list__for_all_tags 3.68% [.] tag__init 3.48% [.] die__create_new_parameter [acme@doppio pahole]$ Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1247325517-12272-3-git-send-email-acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | strlist: Introduce strlist__entry and strlist__nr_entries methodsArnaldo Carvalho de Melo2009-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The strlist__entry method allows accessing strlists like an array, will be used in the 'perf report' to access the first entry. We now keep the nr_entries so that we can check if we have just one entry, will be used in 'perf report' to improve the output by showing just at the top when we have just, say, one DSO. While at it use nr_entries to optimize strlist__is_empty by not using the far more costly rb_first based implementation. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1247325517-12272-2-git-send-email-acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | perf report: Tidy up reporting of symbols not foundArnaldo Carvalho de Melo2009-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Always printing the level info about if it is in the kernel, hypervisor or userspace as that is in the hist_entry. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1247325517-12272-1-git-send-email-acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | perf report: Adjust column width to the values sampledArnaldo Carvalho de Melo2009-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Auto-adjust column width of perf report output to the longest occuring string length. Example: [acme@doppio pahole]$ perf report --sort comm,dso,symbol | head -13 12.79% pahole /usr/lib64/libdw-0.141.so [.] __libdw_find_attr 8.90% pahole /lib64/libc-2.10.1.so [.] _int_malloc 8.68% pahole /usr/lib64/libdw-0.141.so [.] __libdw_form_val_len 8.15% pahole /lib64/libc-2.10.1.so [.] __GI_strcmp 6.80% pahole /lib64/libc-2.10.1.so [.] __tsearch 5.54% pahole ./build/libdwarves.so.1.0.0 [.] tag__recode_dwarf_type [acme@doppio pahole]$ [acme@doppio pahole]$ perf report --sort comm,dso,symbol -d /lib64/libc-2.10.1.so | head -10 21.92% pahole /lib64/libc-2.10.1.so [.] _int_malloc 20.08% pahole /lib64/libc-2.10.1.so [.] __GI_strcmp 16.75% pahole /lib64/libc-2.10.1.so [.] __tsearch [acme@doppio pahole]$ Also add these extra options to control the new behaviour: -w, --field-width Force each column width to the provided list, for large terminal readability. -t, --field-separator: Use a special separator character and don't pad with spaces, replacing all occurances of this separator in symbol names (and other output) with a '.' character, that thus it's the only non valid separator. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <20090711014728.GH3452@ghostprotocols.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | perf_counter: Stop open coding unclone_ctxPeter Zijlstra2009-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of open coding the unclone context thingy, put it in a common function. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | perf_counter: Clean up global vs counter enablePeter Zijlstra2009-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ingo noticed that both AMD and P6 call x86_pmu_disable_counter() on *_pmu_enable_counter(). This is because we rely on the side effect of that call to program the event config but not touch the EN bit. We change that for AMD by having enable_all() simply write the full config in, and for P6 by explicitly coding it. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | perf_counter: Fix up P6 PMU detailsPeter Zijlstra2009-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The P6 doesn't seem to support cache ref/hit/miss counts, so we extend the generic hardware event codes to have 0 and -1 mean the same thing as for the generic cache events. Furthermore, it turns out the 0 event does not count (that is, its reported that on PPro it actually does count something), therefore use a event configuration that's specified not to count to disable the counters. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * | | perf_counter: Add P6 PMU supportVince Weaver2009-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add basic P6 PMU support. The P6 uses the EVNTSEL0 EN bit to enable/disable both its counters. We use this for the global enable/disable, and clear all config bits (except EN) to disable individual counters. Actual ia32 hardware doesn't support lfence, so use a locked op without side-effect to implement a full barrier. perf stat and perf record seem to function correctly. [a.p.zijlstra@chello.nl: cleanups and complete the enable/disable code] Signed-off-by: Vince Weaver <vince@deater.net> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <Pine.LNX.4.64.0907081718450.2715@pianoman.cluster.toy> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | perf_counter: Make call graph option consistentAnton Blanchard2009-07-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | perf record uses -g for logging call graph data but perf report uses -c to print call graph data. Be consistent and use -g everywhere for call graph data. Also update the help text to reflect the current default - fractal,0.5 Signed-off-by: Anton Blanchard <anton@samba.org> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090716104817.803604373@samba.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | perf_counter: Add perf record option to log addressesAnton Blanchard2009-07-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the -d or --data option to log event addresses (eg page faults). Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090716104817.697698033@samba.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | perf_counter: Log vfork as a fork eventAnton Blanchard2009-07-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now we don't output vfork events. Even though we should always see an exec after a vfork, we may get perfcounter samples between the vfork and exec. These samples can lead to some confusion when parsing perfcounter data. To keep things consistent we should always log a fork event. It will result in a little more log data, but is less confusing to trace parsing tools. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090716104817.589309391@samba.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | perf_counter: Synthesize VDSO mmap eventAnton Blanchard2009-07-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | perf record synthesizes mmap events for the running process. Right now it just catches file mappings, but we can check for the vdso symbol and add that too. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090716104817.517264409@samba.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | perf_counter: Make sure we dont leak kernel memory to userspaceAnton Blanchard2009-07-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are a few places we are leaking tiny amounts of kernel memory to userspace. This happens when writing out strings because we always align the end to 64 bits. To avoid this we should always use an appropriately sized temporary buffer and ensure it is zeroed. Since d_path assembles the string from the end of the buffer backwards, we need to add 64 bits after the buffer to allow for alignment. We also need to copy arch_vma_name to the temporary buffer, because if we use it directly we may end up copying to userspace a number of bytes after the end of the string constant. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090716104817.273972048@samba.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | perf_counter tools: Fix index boundary checkRoel Kluin2009-07-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Keep index within event_type_descriptors[] Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A5A7F0B.4070106@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | perf_counter: Fix the tracepoint channel to perfcountersChris Wilson2009-07-13
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a missed rename in EVENT_PROFILE support so that it gets built and allows tracepoint tracing from the 'perf' tool. Fix a typo in the (never before built & enabled) portion in perf_counter.c as well, and update that code to the attr.config changes as well. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Ben Gamari <bgamari.foss@gmail.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1246869094-21237-1-git-send-email-chris@chris-wilson.co.uk> Signed-off-by: Ingo Molnar <mingo@elte.hu>