aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* KVM: PPC: Add papr_enabled flagAlexander Graf2011-09-25
| | | | | | | | | When running a PAPR guest, some things change. The privilege level drops from hypervisor to supervisor, SDR1 gets treated differently and we interpret hypercalls. For bisectability sake, add the flag now, but only enable it when all the support code is there. Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: PPC: move compute_tlbie_rb to book3s common headerAlexander Graf2011-09-25
| | | | | | | | We need the compute_tlbie_rb in _pr and _hv implementations for papr soon, so let's move it over to a common header file that both implementations can leverage. Signed-off-by: Alexander Graf <agraf@suse.de>
* KVM: Restore missing powerpc API docsAvi Kivity2011-09-25
| | | | | | Commit 371fefd6 lost a doc hunk somehow, restore it. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: APIC: avoid instruction emulation for EOI writesKevin Tian2011-09-25
| | | | | | | | | | | | | | | | Instruction emulation for EOI writes can be skipped, since sane guest simply uses MOV instead of string operations. This is a nice improvement when guest doesn't support x2apic or hyper-V EOI support. a single VM bandwidth is observed with ~8% bandwidth improvement (7.4Gbps->8Gbps), by saving ~5% cycles from EOI emulation. Signed-off-by: Kevin Tian <kevin.tian@intel.com> <Based on earlier work from>: Signed-off-by: Eddie Dong <eddie.dong@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: SVM: Fix TSC MSR read in nested SVMNadav Har'El2011-09-25
| | | | | | | | | | | | | | When the TSC MSR is read by an L2 guest (when L1 allowed this MSR to be read without exit), we need to return L2's notion of the TSC, not L1's. The current code incorrectly returned L1 TSC, because svm_get_msr() was also used in x86.c where this was assumed, but now that these places call the new svm_read_l1_tsc(), the MSR read can be fixed. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Tested-by: Joerg Roedel <joerg.roedel@amd.com> Acked-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: nVMX: Fix nested VMX TSC emulationNadav Har'El2011-09-25
| | | | | | | | | | | | | | | This patch fixes two corner cases in nested (L2) handling of TSC-related issues: 1. Somewhat suprisingly, according to the Intel spec, if L1 allows WRMSR to the TSC MSR without an exit, then this should set L1's TSC value itself - not offset by vmcs12.TSC_OFFSET (like was wrongly done in the previous code). 2. Allow L1 to disable the TSC_OFFSETING control, and then correctly ignore the vmcs12.TSC_OFFSET. Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: L1 TSC handlingNadav Har'El2011-09-25
| | | | | | | | | | | | | | | | | | | | | KVM assumed in several places that reading the TSC MSR returns the value for L1. This is incorrect, because when L2 is running, the correct TSC read exit emulation is to return L2's value. We therefore add a new x86_ops function, read_l1_tsc, to use in places that specifically need to read the L1 TSC, NOT the TSC of the current level of guest. Note that one change, of one line in kvm_arch_vcpu_load, is made redundant by a different patch sent by Zachary Amsden (and not yet applied): kvm_arch_vcpu_load() should not read the guest TSC, and if it didn't, of course we didn't have to change the call of kvm_get_msr() to read_l1_tsc(). [avi: moved callback to kvm_x86_ops tsc block] Signed-off-by: Nadav Har'El <nyh@il.ibm.com> Acked-by: Zachary Amsdem <zamsden@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: nVMX: Document 'nested' parameterSasha Levin2011-09-25
| | | | | | | | | | | Add documentation of the new 'nested' parameter to 'Documentation/kernel-parameters.txt'. Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Nadav Har'El <nyh@il.ibm.com> Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Fix SMEP failure during fetchYang, Wei Y2011-09-25
| | | | | | | | | This patch fix kvm-unit-tests hanging and incorrect PT_ACCESSED_MASK bit set in the case of SMEP fault. The code updated 'eperm' after the variable was checked. Signed-off-by: Yang, Wei <wei.y.yang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: Do not unconditionally read PDPTE from guest memoryAvi Kivity2011-09-25
| | | | | | | | | | | | | | Architecturally, PDPTEs are cached in the PDPTRs when CR3 is reloaded. On SVM, it is not possible to implement this, but on VMX this is possible and was indeed implemented until nested SVM changed this to unconditionally read PDPTEs dynamically. This has noticable impact when running PAE guests. Fix by changing the MMU to read PDPTRs from the cache, falling back to reading from memory for the nested MMU. Signed-off-by: Avi Kivity <avi@redhat.com> Tested-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: trivial: use BUG_ONJulia Lawall2011-09-25
| | | | | | | | | | | | | | | | | | | | Use BUG_ON(x) rather than if(x) BUG(); The semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier x; @@ -if (x) BUG(); +BUG_ON(x); @@ identifier x; @@ -if (!x) BUG(); +BUG_ON(!x); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: x86: report valid microcode update IDMarcelo Tosatti2011-09-25
| | | | | | | | | | | | | | | | Windows Server 2008 SP2 checked build with smp > 1 BSOD's during boot due to lack of microcode update: *** Assertion failed: The system BIOS on this machine does not properly support the processor. The system BIOS did not load any microcode update. A BIOS containing the latest microcode update is needed for system reliability. (CurrentUpdateRevision != 0) *** Source File: d:\longhorn\base\hals\update\intelupd\update.c, line 440 Report a non-zero microcode update signature to make it happy. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: Make x86_decode_insn() return proper macrosTakuya Yoshikawa2011-09-25
| | | | | | | | | | Return EMULATION_OK/FAILED consistently. Also treat instruction fetch errors, not restricted to X86EMUL_UNHANDLEABLE, as EMULATION_FAILED; although this cannot happen in practice, the current logic will continue the emulation even if the decoder fails to fetch the instruction. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: Let compiler know insn_fetch() rarely failsTakuya Yoshikawa2011-09-25
| | | | | | | | Fetching the instruction which was to be executed by the guest cannot fail normally. So compiler should always predict that it will succeed. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: Drop _size argument from insn_fetch()Takuya Yoshikawa2011-09-25
| | | | | | | _type is enough to know the size. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: Use ctxt->_eip directly in do_insn_fetch_byte()Takuya Yoshikawa2011-09-25
| | | | | | | | | | | | | | Instead of passing ctxt->_eip from insn_fetch() call sites, get it from ctxt in do_insn_fetch_byte(). This is done by replacing the argument _eip of insn_fetch() with _ctxt, which should be better than letting the macro use ctxt silently in its body. Though this changes the place where ctxt->_eip is incremented from insn_fetch() to do_insn_fetch_byte(), this does not have any real effect. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Intelligent device lookup on I/O busSasha Levin2011-09-25
| | | | | | | | | | | | | | | | | | | | | | | | | Currently the method of dealing with an IO operation on a bus (PIO/MMIO) is to call the read or write callback for each device registered on the bus until we find a device which handles it. Since the number of devices on a bus can be significant due to ioeventfds and coalesced MMIO zones, this leads to a lot of overhead on each IO operation. Instead of registering devices, we now register ranges which points to a device. Lookup is done using an efficient bsearch instead of a linear search. Performance test was conducted by comparing exit count per second with 200 ioeventfds created on one byte and the guest is trying to access a different byte continuously (triggering usermode exits). Before the patch the guest has achieved 259k exits per second, after the patch the guest does 274k exits per second. Cc: Avi Kivity <avi@redhat.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Use __print_symbolic() for vmexit tracepointsStefan Hajnoczi2011-09-25
| | | | | | | | | | | | | | | | | | | | | | | | | | The vmexit tracepoints format the exit_reason to make it human-readable. Since the exit_reason depends on the instruction set (vmx or svm), formatting is handled with ftrace_print_symbols_seq() by referring to the appropriate exit reason table. However, the ftrace_print_symbols_seq() function is not meant to be used directly in tracepoints since it does not export the formatting table which userspace tools like trace-cmd and perf use to format traces. In practice perf dies when formatting vmexit-related events and trace-cmd falls back to printing the numeric value (with extra formatting code in the kvm plugin to paper over this limitation). Other userspace consumers of vmexit-related tracepoints would be in similar trouble. To avoid significant changes to the kvm_exit tracepoint, this patch moves the vmx and svm exit reason tables into arch/x86/kvm/trace.h and selects the right table with __print_symbolic() depending on the instruction set. Note that __print_symbolic() is designed for exporting the formatting table to userspace and allows trace-cmd and perf to work. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Record instruction set in all vmexit tracepointsStefan Hajnoczi2011-09-25
| | | | | | | | | | | | | The kvm_exit tracepoint recently added the isa argument to aid decoding exit_reason. The semantics of exit_reason depend on the instruction set (vmx or svm) and the isa argument allows traces to be analyzed on other machines. Add the isa argument to kvm_nested_vmexit and kvm_nested_vmexit_inject so these tracepoints can also be self-describing. Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Really fix HV_X64_MSR_APIC_ASSIST_PAGEMike Waychison2011-09-25
| | | | | | | | | | Commit 0945d4b228 tried to fix the get_msr path for the HV_X64_MSR_APIC_ASSIST_PAGE msr, but was poorly tested. We should be returning 0 if the read succeeded, and passing the value back to the caller via the pdata out argument, not returning the value directly. Signed-off-by: Mike Waychison <mikew@google.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86: get_msr support for HV_X64_MSR_APIC_ASSIST_PAGEMike Waychison2011-09-25
| | | | | | | | | | | | | | "get" support for the HV_X64_MSR_APIC_ASSIST_PAGE msr was missing, even though it is explicitly enumerated as something the vmm should save in msrs_to_save and reported to userland via the KVM_GET_MSR_INDEX_LIST ioctl. Add "get" support for HV_X64_MSR_APIC_ASSIST_PAGE. We simply return the guest visible value of this register, which seems to be correct as a set on the register is validated for us already. Signed-off-by: Mike Waychison <mikew@google.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Make coalesced mmio use a device per zoneSasha Levin2011-09-25
| | | | | | | | | | | | | This patch changes coalesced mmio to create one mmio device per zone instead of handling all zones in one device. Doing so enables us to take advantage of existing locking and prevents a race condition between coalesced mmio registration/unregistration and lookups. Suggested-by: Avi Kivity <avi@redhat.com> Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: x86: Raise the hard VCPU count limitSasha Levin2011-09-25
| | | | | | | | | | | | | | | | | | | | | The patch raises the hard limit of VCPU count to 254. This will allow developers to easily work on scalability and will allow users to test high VCPU setups easily without patching the kernel. To prevent possible issues with current setups, KVM_CAP_NR_VCPUS now returns the recommended VCPU limit (which is still 64) - this should be a safe value for everybody, while a new KVM_CAP_MAX_VCPUS returns the hard limit which is now 254. Cc: Avi Kivity <avi@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Suggested-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMIO: Lock coalesced device when checking for available entrySasha Levin2011-09-25
| | | | | | | | | | | | | Move the check whether there are available entries to within the spinlock. This allows working with larger amount of VCPUs and reduces premature exits when using a large number of VCPUs. Cc: Avi Kivity <avi@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: x86: cleanup the code of read/write emulationXiao Guangrong2011-09-25
| | | | | | | Using the read/write operation to remove the same code Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86: abstract the operation for read/write emulationXiao Guangrong2011-09-25
| | | | | | | | | The operations of read emulation and write emulation are very similar, so we can abstract the operation of them, in larter patch, it is used to cleanup the same code Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86: fix broken read emulation spans a page boundaryXiao Guangrong2011-09-25
| | | | | | | | | | | If the range spans a page boundary, the mmio access can be broke, fix it as write emulation. And we already get the guest physical address, so use it to read guest data directly to avoid walking guest page table again Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: fix Src2CL decodeAvi Kivity2011-09-25
| | | | | | | | | | Src2CL decode (used for double width shifts) erronously decodes only bit 3 of %rcx, instead of bits 7:0. Fix by decoding %cl in its entirety. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: fix incorrect return of spteZhao Jin2011-09-25
| | | | | | | | | | __update_clear_spte_slow should return original spte while the current code returns low half of original spte combined with high half of new spte. Signed-off-by: Zhao Jin <cronozhj@gmail.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* Merge branch 'spi/merge' of git://git.secretlab.ca/git/linux-2.6Linus Torvalds2011-09-23
|\ | | | | | | | | | | * 'spi/merge' of git://git.secretlab.ca/git/linux-2.6: spi: Fix WARN when removing spi-fsl-spi module spi/imx: Fix spi-imx when the hardware SPI chipselects are used
| * spi: Fix WARN when removing spi-fsl-spi moduleJeff Harris2011-09-23
| | | | | | | | | | | | | | | | | | | | | | | | | | If CPM mode is not used, the fsl_dummy_rx variable is never allocated. When the cleanup attempts to free it, the reference count is zero and a WARN is generated. The same CPM mode check used in the initialize is applied to the free as well. Tested on 2.6.33 with the previous spi_mpc8xxx driver. The renamed spi-fsl-spi driver looks to have the same problem. Signed-off-by: Jeff Harris <jeff_harris@kentrox.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
| * spi/imx: Fix spi-imx when the hardware SPI chipselects are usedFabio Estevam2011-09-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 22a85e4cd51 (spi/imx: add device tree probe support) broke spi-imx usage when the SPI chipselect is the one internal to the controller. On a mx31pdk board the following error is seen: Registering mxc_nand as whole device ------------[ cut here ]------------ WARNING: at drivers/gpio/gpiolib.c:101 gpio_ensure_requested+0x4c/0xf4() autorequest GPIO-0 Modules linked in: [<c0014410>] (unwind_backtrace+0x0/0xf4) from [<c0025754>] (warn_slowpath_common+0x4c/0x64) [<c0025754>] (warn_slowpath_common+0x4c/0x64) from [<c0025800>] (warn_slowpath_fmt+0x30/0x40) [<c0025800>] (warn_slowpath_fmt+0x30/0x40) from [<c0198688>] (gpio_ensure_requested+0x4c/0xf4) [<c0198688>] (gpio_ensure_requested+0x4c/0xf4) from [<c01988c8>] (gpio_direction_output+0xa0/0x138) [<c01988c8>] (gpio_direction_output+0xa0/0x138) from [<c01ed198>] (spi_imx_setup+0x38/0x4c) [<c01ed198>] (spi_imx_setup+0x38/0x4c) from [<c01eb5d0>] (spi_setup+0x38/0x50) [<c01eb5d0>] (spi_setup+0x38/0x50) from [<c01eb85c>] (spi_add_device+0x94/0x124) [<c01eb85c>] (spi_add_device+0x94/0x124) from [<c01eb960>] (spi_new_device+0x74/0xac) [<c01eb960>] (spi_new_device+0x74/0xac) from [<c01eb9b8>] (spi_match_master_to_boardinfo+0x20/0x40) [<c01eb9b8>] (spi_match_master_to_boardinfo+0x20/0x40) from [<c01eba88>] (spi_register_master+0xb0/0x104) [<c01eba88>] (spi_register_master+0xb0/0x104) from [<c01ec0b4>] (spi_bitbang_start+0x104/0x17c) [<c01ec0b4>] (spi_bitbang_start+0x104/0x17c) from [<c02c2c4c>] (spi_imx_probe+0x2fc/0x404) [<c02c2c4c>] (spi_imx_probe+0x2fc/0x404) from [<c01c2498>] (platform_drv_probe+0x18/0x1c) [<c01c2498>] (platform_drv_probe+0x18/0x1c) from [<c01c1058>] (driver_probe_device+0x78/0x174) [<c01c1058>] (driver_probe_device+0x78/0x174) from [<c01c11e0>] (__driver_attach+0x8c/0x90) [<c01c11e0>] (__driver_attach+0x8c/0x90) from [<c01c0860>] (bus_for_each_dev+0x60/0x8c) [<c01c0860>] (bus_for_each_dev+0x60/0x8c) from [<c01c0088>] (bus_add_driver+0xa0/0x288) [<c01c0088>] (bus_add_driver+0xa0/0x288) from [<c01c179c>] (driver_register+0x78/0x18c) [<c01c179c>] (driver_register+0x78/0x18c) from [<c0008490>] (do_one_initcall+0x34/0x178) [<c0008490>] (do_one_initcall+0x34/0x178) from [<c03a5204>] (kernel_init+0x74/0x118) [<c03a5204>] (kernel_init+0x74/0x118) from [<c000f65c>] (kernel_thread_exit+0x0/0x8) ---[ end trace 759f924b30fd5a44 ]--- Fix this issue by using the original chip select logic and make spi-imx to work again. Tested on a mx31pdk that uses the hardware SPI chipselect pins and also on a mx27pdk that uses GPIO as SPI chipselect. Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
* | scsi: fix qla2xxx printk format warningRandy Dunlap2011-09-23
| | | | | | | | | | | | | | | | | | | | sector_t can be different types, so cast it to its largest possible type. drivers/scsi/qla2xxx/qla_isr.c:1509:5: warning: format '%lx' expects type 'long unsigned int', but argument 5 has type 'sector_t' Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | scsi: SCSI_ISCI needs to select SCSI_SAS_HOST_SMP, fixes build errorRandy Dunlap2011-09-23
| | | | | | | | | | | | | | | | | | | | | | | | SCSI_ISCI needs to select SCSI_SAS_HOST_SMP to ensure that all needed symbols are available to it. Fixes this build error: ERROR: "try_test_sas_gpio_gp_bit" [drivers/scsi/isci/isci.ko] undefined! Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'perf-tools-for-linus' of git://github.com/acmel/linuxLinus Torvalds2011-09-23
|\ \ | | | | | | | | | | | | * 'perf-tools-for-linus' of git://github.com/acmel/linux: perf python: Add missing perf_event__parse_sample 'swapped' parm
| * | perf python: Add missing perf_event__parse_sample 'swapped' parmArnaldo Carvalho de Melo2011-09-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Problem introduced in 936be50, that missed one perf_event__parse_sample user, the python binding. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-ja4phms9618ggi657plyuch2@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | | Merge branch 'perf-tools-for-linus' of git://github.com/acmel/linuxLinus Torvalds2011-09-23
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'perf-tools-for-linus' of git://github.com/acmel/linux: perf tools: Add support for disabling -Werror via WERROR=0 perf top: Fix userspace sample addr map offset perf symbols: Fix issue with binaries using 16-bytes buildids (v2) perf tool: Fix endianness handling of u32 data in samples perf sort: Fix symbol sort output by separating unresolved samples by type perf symbols: Synthesize anonymous mmap events perf record: Create events initially disabled and enable after init perf symbols: Add some heuristics for choosing the best duplicate symbol perf symbols: Preserve symbol scope when parsing /proc/kallsyms perf symbols: /proc/kallsyms does not sort module symbols perf symbols: Fix ppc64 SEGV in dso__load_sym with debuginfo files perf probe: Fix regression of variable finder
| * | perf tools: Add support for disabling -Werror via WERROR=0Darren Hart2011-09-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GCC often introduces new warnings with lots of false positives - breaking -Werror builds. WERROR=0 allows one to build perf without much fuss - while still encouraging people to send patches to avoid the fuss of having to type WERROR=0. Bisecting back to commits that produce a (mostly harmless) warning on some compilers is more difficult. With WERROR=0 one could bisect without worrying about harmless warnings. Cc: Ingo Molnar <mingo@elte.hu> Link: http://lkml.kernel.org/r/eac06c7cc4920e5d4830417d466161fb26c7359c.1315514559.git.dvhart@linux.intel.com Signed-off-by: Darren Hart <dvhart@linux.intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf top: Fix userspace sample addr map offsetArnaldo Carvalho de Melo2011-09-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 'perf top' tool came from the kernel where we had each DSO (vmlinux, modules) loaded just once at a time. But userspace may have DSOs loaded in multiple addresses (shared libraries), requiring that we use the just resolved map instead of the first one found. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-ag53wz0yllpgers0n2w7hchp@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf symbols: Fix issue with binaries using 16-bytes buildids (v2)Stephane Eranian2011-09-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Buildid can vary in size. According to the man page of ld, buildid can be 160 bits (sha1) or 128 bits (md5, uuid). Perf assumes buildid size of 20 bytes (160 bits) regardless. When dealing with md5 buildids, it would thus read more than needed and that would cause mismatches and samples without symbols. This patch fixes this by taking into account the actual buildid size as encoded int he section header. The leftover bytes are also cleared. This second version fixes a minor issue with the memset() base position. Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> Cc: Stephane Eranian <eranian@gmail.com> Link: http://lkml.kernel.org/r/4cc1af3c.8ee7d80a.5a28.ffff868e@mx.google.com Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf tool: Fix endianness handling of u32 data in samplesDavid Ahern2011-09-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, analyzing PPC data files on x86 the cpu field is always 0 and the tid and pid are backwards. For example, analyzing a PPC file on PPC the pid/tid fields show: rsyslogd 1210/1212 and analyzing the same PPC file using an x86 perf binary shows: rsyslogd 1212/1210 The problem is that the swap_op method for samples is perf_event__all64_swap which assumes all elements in the sample_data struct are u64s. cpu, tid and pid are u32s and need to be handled individually. Given that the swap is done before the sample is parsed, the simplest solution is to undo the 64-bit swap of those elements when the sample is parsed and do the proper swap. The RAW data field is generic and perf cannot have programmatic knowledge of how to treat that data. Instead a warning is given to the user. Thanks to Anton Blanchard for providing a data file for a mult-CPU PPC system so I could verify the fix for the CPU fields. v3 -> v4: - fixed use of WARN_ONCE v2 -> v3: - used WARN_ONCE for message regarding raw data - removed struct wrapper around union - fixed whitespace issues v1 -> v2: - added a union for undoing the byte-swap on u64 and redoing swap on u32's to address compiler errors (see git commit 65014ab3) Cc: Anton Blanchard <anton@samba.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1315321946-16993-1-git-send-email-dsahern@gmail.com Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf sort: Fix symbol sort output by separating unresolved samples by typeAnton Blanchard2011-09-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I took a profile that suggested 60% of total CPU time was in the hypervisor: ... 60.20% [H] 0x33d43c 4.43% [k] ._spin_lock_irqsave 1.07% [k] ._spin_lock Using perf stat to get the user/kernel/hypervisor breakdown contradicted this. The problem is we merge all unresolved samples into the one unknown bucket. If add a comparison by sample type to sort__sym_cmp we get the real picture: ... 57.11% [.] 0x80fbf63c 4.43% [k] ._spin_lock_irqsave 1.07% [k] ._spin_lock 0.65% [H] 0x33d43c So it was almost all userspace, not hypervisor as the initial profile suggested. I found another issue while adding this. Symbol sorting sometimes shows multiple entries for the unknown bucket: ... 16.65% [.] 0x6cd3a8 7.25% [.] 0x422460 5.37% [.] yylex 4.79% [.] malloc 4.78% [.] _int_malloc 4.03% [.] _int_free 3.95% [.] hash_source_code_string 2.82% [.] 0x532908 2.64% [.] 0x36b538 0.94% [H] 0x8000000000e132a4 0.82% [H] 0x800000000000e8b0 This happens because we aren't consistent with our sorting. On one hand we check to see if both symbols match and for two unresolved samples sym is NULL so we match: if (left->ms.sym == right->ms.sym) return 0; On the other hand we use sample IP for unresolved samples when comparing against a symbol: ip_l = left->ms.sym ? left->ms.sym->start : left->ip; ip_r = right->ms.sym ? right->ms.sym->start : right->ip; This means unresolved samples end up spread across the rbtree and we can't merge them all. If we use cmp_null all unresolved samples will end up in the one bucket and the output makes more sense: ... 39.12% [.] 0x36b538 5.37% [.] yylex 4.79% [.] malloc 4.78% [.] _int_malloc 4.03% [.] _int_free 3.95% [.] hash_source_code_string 2.26% [H] 0x800000000000e8b0 Acked-by: Eric B Munson <emunson@mgebm.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ian Munsie <imunsie@au1.ibm.com> Link: http://lkml.kernel.org/r/20110831115145.4f598ab2@kryten Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf symbols: Synthesize anonymous mmap eventsAnton Blanchard2011-09-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | perf_event__synthesize_mmap_events does not create anonymous mmap events even though the kernel does. As a result an already running application with dynamically created code will not get profiled - all samples end up in the unknown bucket. This patch skips any entries with '[' in the name to avoid adding events for special regions (eg the vsyscall page). All other executable mmaps are assumed to be anonymous and an event is synthesized. Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Eric B Munson <emunson@mgebm.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Link: http://lkml.kernel.org/r/20110830091506.60b51fe8@kryten Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf record: Create events initially disabled and enable after initDavid Ahern2011-09-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | perf-record currently creates events enabled. When doing a system wide collection (-a arg) this causes data collection for perf's initialization activities -- eg., perf_event__synthesize_threads(). For some events (e.g., context switch S/W event or tracepoints like syscalls) perf's initialization causes a lot of events to be captured frequently generating "Check IO/CPU overload!" warnings on larger systems (e.g., 2 socket, quad core, hyperthreading). perf's initialization phase can be skipped by creating events disabled and then enabling them once the initialization is done. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1314289075-14706-1-git-send-email-dsahern@gmail.com Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf symbols: Add some heuristics for choosing the best duplicate symbolAnton Blanchard2011-09-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Try and pick the best symbol based on a few heuristics: - Prefer a non weak symbol over a weak one - Prefer a global symbol over a non global one - Prefer a symbol with less underscores (idea taken from kallsyms.c) - If all else fails, choose the symbol with the longest name Cc: Eric B Munson <emunson@mgebm.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110824065243.161953371@samba.org Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf symbols: Preserve symbol scope when parsing /proc/kallsymsAnton Blanchard2011-09-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kallsyms__parse capitalises the symbol type, so every symbol is marked global. Remove this and fix symbol_type__is_a to handle both local and global symbols. Cc: Eric B Munson <emunson@mgebm.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110824065243.077125989@samba.org Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf symbols: /proc/kallsyms does not sort module symbolsAnton Blanchard2011-09-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kallsyms__parse assumes that /proc/kallsyms is sorted and sets the end of the previous symbol to the start of the current one. Unfortunately module symbols are not sorted, eg: ffffffffa0081f30 t e1000_clean_rx_irq [e1000e] ffffffffa00817a0 t e1000_alloc_rx_buffers [e1000e] Some symbols end up with a negative length and others have a length larger than they should. This results in confusing perf output. We already have a function to fixup the end of zero length symbols so use that instead. Cc: Eric B Munson <emunson@mgebm.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110824065242.969681349@samba.org Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf symbols: Fix ppc64 SEGV in dso__load_sym with debuginfo filesAnton Blanchard2011-09-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 64bit PowerPC debuginfo files have an empty function descriptor section. I hit a SEGV when perf tried to use this section for symbol resolution. To fix this we need to check the section is valid and we can do this by checking for type SHT_PROGBITS. Cc: <stable@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Eric B Munson <emunson@mgebm.net> Link: http://lkml.kernel.org/r/20110824065242.895239970@samba.org Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | perf probe: Fix regression of variable finderMasami Hiramatsu2011-09-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix to call convert_variable() if previous call does not fail. To call convert_variable, it ensures "ret" is 0. However, since "ret" has the return value of synthesize_perf_probe_arg() which always returns positive value if it succeeded, perf probe doesn't call convert_variable(). This will cause a SEGV when we add an event with arguments. This has to be fixed as it ensures "ret" is greater than 0 (or not negative). This regression has been introduced by my previous patch, f182e3e1. Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: yrl.pp-manager.tt@hitachi.com Link: http://lkml.kernel.org/r/20110820053922.3286.65805.stgit@fedora15 Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | | Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds2011-09-23
|\ \ \ | | | | | | | | | | | | | | | | | | | | * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm/radeon/kms: fix DDIA enable on some rs690 systems Revert "drm/radeon/kms: fix typo in r100_blit_copy"