aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/kvm
Commit message (Collapse)AuthorAge
...
* | | KVM: remove garbage arg to *hardware_{en,dis}ableRadim Krčmář2014-08-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the beggining was on_each_cpu(), which required an unused argument to kvm_arch_ops.hardware_{en,dis}able, but this was soon forgotten. Remove unnecessary arguments that stem from this. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: x86: remove Aligned bit from movntps/movntpdPaolo Bonzini2014-08-29
| | | | | | | | | | | | | | | | | | These are not explicitly aligned, and do not require alignment on AVX. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: x86 emulator: emulate MOVNTDQAlex Williamson2014-08-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Windows 8.1 guest with NVIDIA driver and GPU fails to boot with an emulation failure. The KVM spew suggests the fault is with lack of movntdq emulation (courtesy of Paolo): Code=02 00 00 b8 08 00 00 00 f3 0f 6f 44 0a f0 f3 0f 6f 4c 0a e0 <66> 0f e7 41 f0 66 0f e7 49 e0 48 83 e9 40 f3 0f 6f 44 0a 10 f3 0f 6f 0c 0a 66 0f e7 41 10 $ as -o a.out .section .text .byte 0x66, 0x0f, 0xe7, 0x41, 0xf0 .byte 0x66, 0x0f, 0xe7, 0x49, 0xe0 $ objdump -d a.out 0: 66 0f e7 41 f0 movntdq %xmm0,-0x10(%rcx) 5: 66 0f e7 49 e0 movntdq %xmm1,-0x20(%rcx) Add the necessary emulation. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: vmx: VMXOFF emulation in vm86 should cause #UDNadav Amit2014-08-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unlike VMCALL, the instructions VMXOFF, VMLAUNCH and VMRESUME should cause a UD exception in real-mode or vm86. However, the emulator considers all these instructions the same for the matter of mode checks, and emulation upon exit due to #UD exception. As a result, the hypervisor behaves incorrectly on vm86 mode. VMXOFF, VMLAUNCH or VMRESUME cause on vm86 exit due to #UD. The hypervisor then emulates these instruction and inject #GP to the guest instead of #UD. This patch creates a new group for these instructions and mark only VMCALL as an instruction which can be emulated. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: x86: fix some sparse warningsPaolo Bonzini2014-08-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sparse reports the following easily fixed warnings: arch/x86/kvm/vmx.c:8795:48: sparse: Using plain integer as NULL pointer arch/x86/kvm/vmx.c:2138:5: sparse: symbol vmx_read_l1_tsc was not declared. Should it be static? arch/x86/kvm/vmx.c:6151:48: sparse: Using plain integer as NULL pointer arch/x86/kvm/vmx.c:8851:6: sparse: symbol vmx_sched_in was not declared. Should it be static? arch/x86/kvm/svm.c:2162:5: sparse: symbol svm_read_l1_tsc was not declared. Should it be static? Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: nVMX: nested TPR shadow/threshold emulationWanpeng Li2014-08-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=61411 TPR shadow/threshold feature is important to speed up the Windows guest. Besides, it is a must feature for certain VMM. We map virtual APIC page address and TPR threshold from L1 VMCS. If TPR_BELOW_THRESHOLD VM exit is triggered by L2 guest and L1 interested in, we inject it into L1 VMM for handling. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> [Add PAGE_ALIGNED check, do not write useless virtual APIC page address if TPR shadowing is disabled. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: nVMX: introduce nested_get_vmcs12_pagesWanpeng Li2014-08-29
| | | | | | | | | | | | | | | | | | | | | | | | Introduce function nested_get_vmcs12_pages() to check the valid of nested apic access page and virtual apic page earlier. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | kvm: x86: fix tracing for 32-bitPaolo Bonzini2014-08-25
| | | | | | | | | | | | | | | | | | | | | | | | Fix commit 7b46268d29543e313e731606d845e65c17f232e4, which mistakenly included the new tracepoint under #ifdef CONFIG_X86_64. Reported-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: trace kvm_ple_window grow/shrinkRadim Krčmář2014-08-21
| | | | | | | | | | | | | | | | | | | | | Tracepoint for dynamic PLE window, fired on every potential change. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: VMX: dynamise PLE windowRadim Krčmář2014-08-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Window is increased on every PLE exit and decreased on every sched_in. The idea is that we don't want to PLE exit if there is no preemption going on. We do this with sched_in() because it does not hold rq lock. There are two new kernel parameters for changing the window: ple_window_grow and ple_window_shrink ple_window_grow affects the window on PLE exit and ple_window_shrink does it on sched_in; depending on their value, the window is modifier like this: (ple_window is kvm_intel's global) ple_window_shrink/ | ple_window_grow | PLE exit | sched_in -------------------+--------------------+--------------------- < 1 | = ple_window | = ple_window < ple_window | *= ple_window_grow | /= ple_window_shrink otherwise | += ple_window_grow | -= ple_window_shrink A third new parameter, ple_window_max, controls the maximal ple_window; it is internally rounded down to a closest multiple of ple_window_grow. VCPU's PLE window is never allowed below ple_window. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: VMX: make PLE window per-VCPURadim Krčmář2014-08-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change PLE window into per-VCPU variable, seeded from module parameter, to allow greater flexibility. Brings in a small overhead on every vmentry. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: x86: introduce sched_in to kvm_x86_opsRadim Krčmář2014-08-21
| | | | | | | | | | | | | | | | | | | | | | | | sched_in preempt notifier is available for x86, allow its use in specific virtualization technlogies as well. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: add kvm_arch_sched_inRadim Krčmář2014-08-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce preempt notifiers for architecture specific code. Advantage over creating a new notifier in every arch is slightly simpler code and guaranteed call order with respect to kvm_sched_in. Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: x86: Replace X86_FEATURE_NX offset with the definitionNadav Amit2014-08-21
| | | | | | | | | | | | | | | | | | | | | | | | Replace reference to X86_FEATURE_NX using bit shift with the defined X86_FEATURE_NX. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: emulate: warn on invalid or uninitialized exception numbersPaolo Bonzini2014-08-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These were reported when running Jailhouse on AMD processors. Initialize ctxt->exception.vector with an invalid exception number, and warn if it remained invalid even though the emulator got an X86EMUL_PROPAGATE_FAULT return code. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: emulate: do not return X86EMUL_PROPAGATE_FAULT explicitlyPaolo Bonzini2014-08-20
| | | | | | | | | | | | | | | | | | | | | Always get it through emulate_exception or emulate_ts. This ensures that the ctxt->exception fields have been populated. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: x86: Clarify PMU related features bit manipulationNadav Amit2014-08-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kvm_pmu_cpuid_update makes a lot of bit manuiplation operations, when in fact there are already unions that can be used instead. Changing the bit manipulation to the union for clarity. This patch does not change the functionality. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: vmx: fix ept reserved bits for 1-GByte pageWanpeng Li2014-08-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | EPT misconfig handler in kvm will check which reason lead to EPT misconfiguration after vmexit. One of the reasons is that an EPT paging-structure entry is configured with settings reserved for future functionality. However, the handler can't identify if paging-structure entry of reserved bits for 1-GByte page are configured, since PDPTE which point to 1-GByte page will reserve bits 29:12 instead of bits 7:3 which are reserved for PDPTE that references an EPT Page Directory. This patch fix it by reserve bits 29:12 for 1-GByte page. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: x86: recalculate_apic_map after enabling apicNadav Amit2014-08-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, recalculate_apic_map ignores vcpus whose lapic is software disabled through the spurious interrupt vector. However, once it is re-enabled, the map is not recalculated. Therefore, if the guest OS configured DFR while lapic is software-disabled, the map may be incorrect. This patch recalculates apic map after software enabling the lapic. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: x86: Clear apic tsc-deadline after deadlineNadav Amit2014-08-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Intel SDM 10.5.4.1 says "When the timer generates an interrupt, it disarms itself and clears the IA32_TSC_DEADLINE MSR". This patch clears the MSR upon timer interrupt delivery which delivered on deadline mode. Since the MSR may be reconfigured while an interrupt is pending, causing the new value to be overriden, pending timer interrupts are checked before setting a new deadline. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: x86: #GP when attempts to write reserved bits of Variable Range MTRRsWanpeng Li2014-08-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Section 11.11.2.3 of the SDM mentions "All other bits in the IA32_MTRR_PHYSBASEn and IA32_MTRR_PHYSMASKn registers are reserved; the processor generates a general-protection exception(#GP) if software attempts to write to them". This patch do it in kvm. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: x86: fix check legal type of Variable Range MTRRsWanpeng Li2014-08-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The first entry in each pair(IA32_MTRR_PHYSBASEn) defines the base address and memory type for the range; the second entry(IA32_MTRR_PHYSMASKn) contains a mask used to determine the address range. The legal values for the type field of IA32_MTRR_PHYSBASEn are 0,1,4,5, and 6. However, IA32_MTRR_PHYSMASKn don't have type field. This patch avoid check if the type field is legal for IA32_MTRR_PHYSMASKn. Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | arch/x86: Use RCU_INIT_POINTER(x, NULL) in kvm/vmx.cMonam Agarwal2014-08-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here rcu_assign_pointer() is ensuring that the initialization of a structure is carried out before storing a pointer to that structure. So, rcu_assign_pointer(p, NULL) can always safely be converted to RCU_INIT_POINTER(p, NULL). Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: x86: raise invalid TSS exceptions during a task switchPaolo Bonzini2014-08-19
| | | | | | | | | | | | | | | | | | | | | Conditions that would usually trigger a general protection fault should instead raise #TS. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: x86: drop fpu_activate hookWanpeng Li2014-08-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The only user of the fpu_activate hook was dropped in commit 2d04a05bd7e9 (KVM: x86 emulator: emulate CLTS internally, 2011-04-20). vmx_fpu_activate and svm_fpu_activate are still called on #NM (and for Intel CLTS), but never from common code; hence, there's no need for a hook. Reviewed-by: Yang Zhang <yang.z.zhang@intel.com> Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | | KVM: SVM: add rdmsr support for AMD event registersWei Huang2014-08-19
| |/ |/| | | | | | | | | | | | | | | | | | | | | Current KVM only supports RDMSR for K7_EVNTSEL0 and K7_PERFCTR0 MSRs. Reading the rest MSRs will trigger KVM to inject #GP into guest VM. This causes a warning message "Failed to access perfctr msr (MSR c0010001 is ffffffffffffffff)" on AMD host. This patch adds RDMSR support for all K7_EVNTSELn and K7_PERFCTRn registers and thus supresses the warning message. Signed-off-by: Wei Huang <wehuang@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: x86: do not check CS.DPL against RPL during task switchPaolo Bonzini2014-08-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts the check added by commit 5045b468037d (KVM: x86: check CS.DPL against RPL during task switch, 2014-05-15). Although the CS.DPL=CS.RPL check is mentioned in table 7-1 of the SDM as causing a #TSS exception, it is not mentioned in table 6-6 that lists "invalid TSS conditions" which cause #TSS exceptions. In fact it causes some tests to fail, which pass on bare-metal. Keep the rest of the commit, since we will find new uses for it in 3.18. Reported-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* | KVM: x86: Avoid emulating instructions on #UD mistakenlyNadav Amit2014-08-19
|/ | | | | | | | | | | | | Commit d40a6898e5 mistakenly caused instructions which are not marked as EmulateOnUD to be emulated upon #UD exception. The commit caused the check of whether the instruction flags include EmulateOnUD to never be evaluated. As a result instructions whose emulation is broken may be emulated. This fix moves the evaluation of EmulateOnUD so it would be evaluated. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> [Tweak operand order in &&, remove EmulateOnUD where it's now superfluous. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Merge branch 'akpm' (second patchbomb from Andrew Morton)Linus Torvalds2014-08-08
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge more incoming from Andrew Morton: "Two new syscalls: memfd_create in "shm: add memfd_create() syscall" kexec_file_load in "kexec: implementation of new syscall kexec_file_load" And: - Most (all?) of the rest of MM - Lots of the usual misc bits - fs/autofs4 - drivers/rtc - fs/nilfs - procfs - fork.c, exec.c - more in lib/ - rapidio - Janitorial work in filesystems: fs/ufs, fs/reiserfs, fs/adfs, fs/cramfs, fs/romfs, fs/qnx6. - initrd/initramfs work - "file sealing" and the memfd_create() syscall, in tmpfs - add pci_zalloc_consistent, use it in lots of places - MAINTAINERS maintenance - kexec feature work" * emailed patches from Andrew Morton <akpm@linux-foundation.org: (193 commits) MAINTAINERS: update nomadik patterns MAINTAINERS: update usb/gadget patterns MAINTAINERS: update DMA BUFFER SHARING patterns kexec: verify the signature of signed PE bzImage kexec: support kexec/kdump on EFI systems kexec: support for kexec on panic using new system call kexec-bzImage64: support for loading bzImage using 64bit entry kexec: load and relocate purgatory at kernel load time purgatory: core purgatory functionality purgatory/sha256: provide implementation of sha256 in purgaotory context kexec: implementation of new syscall kexec_file_load kexec: new syscall kexec_file_load() declaration kexec: make kexec_segment user buffer pointer a union resource: provide new functions to walk through resources kexec: use common function for kimage_normal_alloc() and kimage_crash_alloc() kexec: move segment verification code in a separate function kexec: rename unusebale_pages to unusable_pages kernel: build bin2c based on config option CONFIG_BUILD_BIN2C bin2c: move bin2c in scripts/basic shm: wait for pins to be released when sealing ...
| * arch/x86: replace strict_strto callsDaniel Walter2014-08-08
| | | | | | | | | | | | | | | | | | | | | | | | Replace obsolete strict_strto calls with appropriate kstrto calls Signed-off-by: Daniel Walter <dwalter@google.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2014-08-07
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull second round of KVM changes from Paolo Bonzini: "Here are the PPC and ARM changes for KVM, which I separated because they had small conflicts (respectively within KVM documentation, and with 3.16-rc changes). Since they were all within the subsystem, I took care of them. Stephen Rothwell reported some snags in PPC builds, but they are all fixed now; the latest linux-next report was clean. New features for ARM include: - KVM VGIC v2 emulation on GICv3 hardware - Big-Endian support for arm/arm64 (guest and host) - Debug Architecture support for arm64 (arm32 is on Christoffer's todo list) And for PPC: - Book3S: Good number of LE host fixes, enable HV on LE - Book3S HV: Add in-guest debug support This release drops support for KVM on the PPC440. As a result, the PPC merge removes more lines than it adds. :) I also included an x86 change, since Davidlohr tied it to an independent bug report and the reporter quickly provided a Tested-by; there was no reason to wait for -rc2" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (122 commits) KVM: Move more code under CONFIG_HAVE_KVM_IRQFD KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in use KVM: nVMX: Fix nested vmexit ack intr before load vmcs01 KVM: PPC: Enable IRQFD support for the XICS interrupt controller KVM: Give IRQFD its own separate enabling Kconfig option KVM: Move irq notifier implementation into eventfd.c KVM: Move all accesses to kvm::irq_routing into irqchip.c KVM: irqchip: Provide and use accessors for irq routing table KVM: Don't keep reference to irq routing table in irqfd struct KVM: PPC: drop duplicate tracepoint arm64: KVM: fix 64bit CP15 VM access for 32bit guests KVM: arm64: GICv3: mandate page-aligned GICV region arm64: KVM: GICv3: move system register access to msr_s/mrs_s KVM: PPC: PR: Handle FSCR feature deselects KVM: PPC: HV: Remove generic instruction emulation KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr KVM: PPC: Remove DCR handling KVM: PPC: Expose helper functions for data/inst faults KVM: PPC: Separate loadstore emulation from priv emulation KVM: PPC: Handle magic page in kvmppc_ld/st ...
| * KVM: nVMX: fix "acknowledge interrupt on exit" when APICv is in useWanpeng Li2014-08-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After commit 77b0f5d (KVM: nVMX: Ack and write vector info to intr_info if L1 asks us to), "Acknowledge interrupt on exit" behavior can be emulated. To do so, KVM will ask the APIC for the interrupt vector if during a nested vmexit if VM_EXIT_ACK_INTR_ON_EXIT is set. With APICv, kvm_get_apic_interrupt would return -1 and give the following WARNING: Call Trace: [<ffffffff81493563>] dump_stack+0x49/0x5e [<ffffffff8103f0eb>] warn_slowpath_common+0x7c/0x96 [<ffffffffa059709a>] ? nested_vmx_vmexit+0xa4/0x233 [kvm_intel] [<ffffffff8103f11a>] warn_slowpath_null+0x15/0x17 [<ffffffffa059709a>] nested_vmx_vmexit+0xa4/0x233 [kvm_intel] [<ffffffffa0594295>] ? nested_vmx_exit_handled+0x6a/0x39e [kvm_intel] [<ffffffffa0537931>] ? kvm_apic_has_interrupt+0x80/0xd5 [kvm] [<ffffffffa05972ec>] vmx_check_nested_events+0xc3/0xd3 [kvm_intel] [<ffffffffa051ebe9>] inject_pending_event+0xd0/0x16e [kvm] [<ffffffffa051efa0>] vcpu_enter_guest+0x319/0x704 [kvm] To fix this, we cannot rely on the processor's virtual interrupt delivery, because "acknowledge interrupt on exit" must only update the virtual ISR/PPR/IRR registers (and SVI, which is just a cache of the virtual ISR) but it should not deliver the interrupt through the IDT. Thus, KVM has to deliver the interrupt "by hand", similar to the treatment of EOI in commit fc57ac2c9ca8 (KVM: lapic: sync highest ISR to hardware apic on EOI, 2014-05-14). The patch modifies kvm_cpu_get_interrupt to always acknowledge an interrupt; there are only two callers, and the other is not affected because it is never reached with kvm_apic_vid_enabled() == true. Then it modifies apic_set_isr and apic_clear_irr to update SVI and RVI in addition to the registers. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Suggested-by: "Zhang, Yang Z" <yang.z.zhang@intel.com> Tested-by: Liu, RongrongX <rongrongx.liu@intel.com> Tested-by: Felipe Reyes <freyes@suse.com> Fixes: 77b0f5d67ff2781f36831cba79674c3e97bd7acf Cc: stable@vger.kernel.org Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * KVM: nVMX: Fix nested vmexit ack intr before load vmcs01Wanpeng Li2014-08-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An external interrupt will cause a vmexit with reason "external interrupt" when L2 is running. L1 will pick up the interrupt through vmcs12 if L1 set the ack interrupt bit. Commit 77b0f5d (KVM: nVMX: Ack and write vector info to intr_info if L1 asks us to) retrieves the interrupt that belongs to L1 before vmcs01 is loaded. This will lead to problems in the next patch, which would write to SVI of vmcs02 instead of vmcs01 (SVI of vmcs02 doesn't make sense because L2 runs without APICv). Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Tested-by: Liu, RongrongX <rongrongx.liu@intel.com> Tested-by: Felipe Reyes <freyes@suse.com> Fixes: 77b0f5d67ff2781f36831cba79674c3e97bd7acf Cc: stable@vger.kernel.org Signed-off-by: Wanpeng Li <wanpeng.li@linux.intel.com> [Move tracepoint as well. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * KVM: Give IRQFD its own separate enabling Kconfig optionPaul Mackerras2014-08-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the IRQFD code is conditional on CONFIG_HAVE_KVM_IRQ_ROUTING. So that we can have the IRQFD code compiled in without having the IRQ routing code, this creates a new CONFIG_HAVE_KVM_IRQFD, makes the IRQFD code conditional on it instead of CONFIG_HAVE_KVM_IRQ_ROUTING, and makes all the platforms that currently select HAVE_KVM_IRQ_ROUTING also select HAVE_KVM_IRQFD. Signed-off-by: Paul Mackerras <paulus@samba.org> Tested-by: Eric Auger <eric.auger@linaro.org> Tested-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * Merge tag 'signed-kvm-ppc-next' of git://github.com/agraf/linux-2.6 into kvmPaolo Bonzini2014-08-05
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch queue for ppc - 2014-08-01 Highlights in this release include: - BookE: Rework instruction fetch, not racy anymore now - BookE HV: Fix ONE_REG accessors for some in-hardware registers - Book3S: Good number of LE host fixes, enable HV on LE - Book3S: Some misc bug fixes - Book3S HV: Add in-guest debug support - Book3S HV: Preload cache lines on context switch - Remove 440 support Alexander Graf (31): KVM: PPC: Book3s PR: Disable AIL mode with OPAL KVM: PPC: Book3s HV: Fix tlbie compile error KVM: PPC: Book3S PR: Handle hyp doorbell exits KVM: PPC: Book3S PR: Fix ABIv2 on LE KVM: PPC: Book3S PR: Fix sparse endian checks PPC: Add asm helpers for BE 32bit load/store KVM: PPC: Book3S HV: Make HTAB code LE host aware KVM: PPC: Book3S HV: Access guest VPA in BE KVM: PPC: Book3S HV: Access host lppaca and shadow slb in BE KVM: PPC: Book3S HV: Access XICS in BE KVM: PPC: Book3S HV: Fix ABIv2 on LE KVM: PPC: Book3S HV: Enable for little endian hosts KVM: PPC: Book3S: Move vcore definition to end of kvm_arch struct KVM: PPC: Deflect page write faults properly in kvmppc_st KVM: PPC: Book3S: Stop PTE lookup on write errors KVM: PPC: Book3S: Add hack for split real mode KVM: PPC: Book3S: Make magic page properly 4k mappable KVM: PPC: Remove 440 support KVM: Rename and add argument to check_extension KVM: Allow KVM_CHECK_EXTENSION on the vm fd KVM: PPC: Book3S: Provide different CAPs based on HV or PR mode KVM: PPC: Implement kvmppc_xlate for all targets KVM: PPC: Move kvmppc_ld/st to common code KVM: PPC: Remove kvmppc_bad_hva() KVM: PPC: Use kvm_read_guest in kvmppc_ld KVM: PPC: Handle magic page in kvmppc_ld/st KVM: PPC: Separate loadstore emulation from priv emulation KVM: PPC: Expose helper functions for data/inst faults KVM: PPC: Remove DCR handling KVM: PPC: HV: Remove generic instruction emulation KVM: PPC: PR: Handle FSCR feature deselects Alexey Kardashevskiy (1): KVM: PPC: Book3S: Fix LPCR one_reg interface Aneesh Kumar K.V (4): KVM: PPC: BOOK3S: PR: Fix PURR and SPURR emulation KVM: PPC: BOOK3S: PR: Emulate virtual timebase register KVM: PPC: BOOK3S: PR: Emulate instruction counter KVM: PPC: BOOK3S: HV: Update compute_tlbie_rb to handle 16MB base page Anton Blanchard (2): KVM: PPC: Book3S HV: Fix ABIv2 indirect branch issue KVM: PPC: Assembly functions exported to modules need _GLOBAL_TOC() Bharat Bhushan (10): kvm: ppc: bookehv: Added wrapper macros for shadow registers kvm: ppc: booke: Use the shared struct helpers of SRR0 and SRR1 kvm: ppc: booke: Use the shared struct helpers of SPRN_DEAR kvm: ppc: booke: Add shared struct helpers of SPRN_ESR kvm: ppc: booke: Use the shared struct helpers for SPRN_SPRG0-7 kvm: ppc: Add SPRN_EPR get helper function kvm: ppc: bookehv: Save restore SPRN_SPRG9 on guest entry exit KVM: PPC: Booke-hv: Add one reg interface for SPRG9 KVM: PPC: Remove comment saying SPRG1 is used for vcpu pointer KVM: PPC: BOOKEHV: rename e500hv_spr to bookehv_spr Michael Neuling (1): KVM: PPC: Book3S HV: Add H_SET_MODE hcall handling Mihai Caraman (8): KVM: PPC: e500mc: Enhance tlb invalidation condition on vcpu schedule KVM: PPC: e500: Fix default tlb for victim hint KVM: PPC: e500: Emulate power management control SPR KVM: PPC: e500mc: Revert "add load inst fixup" KVM: PPC: Book3e: Add TLBSEL/TSIZE defines for MAS0/1 KVM: PPC: Book3s: Remove kvmppc_read_inst() function KVM: PPC: Allow kvmppc_get_last_inst() to fail KVM: PPC: Bookehv: Get vcpu's last instruction for emulation Paul Mackerras (4): KVM: PPC: Book3S: Controls for in-kernel sPAPR hypercall handling KVM: PPC: Book3S: Allow only implemented hcalls to be enabled or disabled KVM: PPC: Book3S PR: Take SRCU read lock around RTAS kvm_read_guest() call KVM: PPC: Book3S: Make kvmppc_ld return a more accurate error indication Stewart Smith (2): Split out struct kvmppc_vcore creation to separate function Use the POWER8 Micro Partition Prefetch Engine in KVM HV on POWER8 Conflicts: Documentation/virtual/kvm/api.txt
| | * KVM: Rename and add argument to check_extensionAlexander Graf2014-07-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation to make the check_extension function available to VM scope we add a struct kvm * argument to the function header and rename the function accordingly. It will still be called from the /dev/kvm fd, but with a NULL argument for struct kvm *. Signed-off-by: Alexander Graf <agraf@suse.de> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
* | | Merge branch 'timers-core-for-linus' of ↵Linus Torvalds2014-08-05
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer and time updates from Thomas Gleixner: "A rather large update of timers, timekeeping & co - Core timekeeping code is year-2038 safe now for 32bit machines. Now we just need to fix all in kernel users and the gazillion of user space interfaces which rely on timespec/timeval :) - Better cache layout for the timekeeping internal data structures. - Proper nanosecond based interfaces for in kernel users. - Tree wide cleanup of code which wants nanoseconds but does hoops and loops to convert back and forth from timespecs. Some of it definitely belongs into the ugly code museum. - Consolidation of the timekeeping interface zoo. - A fast NMI safe accessor to clock monotonic for tracing. This is a long standing request to support correlated user/kernel space traces. With proper NTP frequency correction it's also suitable for correlation of traces accross separate machines. - Checkpoint/restart support for timerfd. - A few NOHZ[_FULL] improvements in the [hr]timer code. - Code move from kernel to kernel/time of all time* related code. - New clocksource/event drivers from the ARM universe. I'm really impressed that despite an architected timer in the newer chips SoC manufacturers insist on inventing new and differently broken SoC specific timers. [ Ed. "Impressed"? I don't think that word means what you think it means ] - Another round of code move from arch to drivers. Looks like most of the legacy mess in ARM regarding timers is sorted out except for a few obnoxious strongholds. - The usual updates and fixlets all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits) timekeeping: Fixup typo in update_vsyscall_old definition clocksource: document some basic timekeeping concepts timekeeping: Use cached ntp_tick_length when accumulating error timekeeping: Rework frequency adjustments to work better w/ nohz timekeeping: Minor fixup for timespec64->timespec assignment ftrace: Provide trace clocks monotonic timekeeping: Provide fast and NMI safe access to CLOCK_MONOTONIC seqcount: Add raw_write_seqcount_latch() seqcount: Provide raw_read_seqcount() timekeeping: Use tk_read_base as argument for timekeeping_get_ns() timekeeping: Create struct tk_read_base and use it in struct timekeeper timekeeping: Restructure the timekeeper some more clocksource: Get rid of cycle_last clocksource: Move cycle_last validation to core code clocksource: Make delta calculation a function wireless: ath9k: Get rid of timespec conversions drm: vmwgfx: Use nsec based interfaces drm: i915: Use nsec based interfaces timekeeping: Provide ktime_get_raw() hangcheck-timer: Use ktime_get_ns() ...
| * | timekeeping: Create struct tk_read_base and use it in struct timekeeperThomas Gleixner2014-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The members of the new struct are the required ones for the new NMI safe accessor to clcok monotonic. In order to reuse the existing timekeeping code and to make the update of the fast NMI safe timekeepers a simple memcpy use the struct for the timekeeper as well and convert all users. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
| * | clocksource: Get rid of cycle_lastThomas Gleixner2014-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | cycle_last was added to the clocksource to support the TSC validation. We moved that to the core code, so we can get rid of the extra copy. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org>
| * | x86: kvm: Make kvm_get_time_and_clockread() nanoseconds basedThomas Gleixner2014-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert the relevant base data right away to nanoseconds instead of doing the conversion on every readout. Reduces text size by 160 bytes. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Gleb Natapov <gleb@kernel.org> Cc: kvm@vger.kernel.org Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
| * | x86: kvm: Use ktime_get_boot_ns()Thomas Gleixner2014-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the new nanoseconds based interface and get rid of the timespec conversion dance. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Gleb Natapov <gleb@kernel.org> Cc: kvm@vger.kernel.org Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
* | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2014-08-04
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM changes from Paolo Bonzini: "These are the x86, MIPS and s390 changes; PPC and ARM will come in a few days. MIPS and s390 have little going on this release; just bugfixes, some small, some larger. The highlights for x86 are nested VMX improvements (Jan Kiszka), optimizations for old processor (up to Nehalem, by me and Bandan Das), and a lot of x86 emulator bugfixes (Nadav Amit). Stephen Rothwell reported a trivial conflict with the tracing branch" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (104 commits) x86/kvm: Resolve shadow warnings in macro expansion KVM: s390: rework broken SIGP STOP interrupt handling KVM: x86: always exit on EOIs for interrupts listed in the IOAPIC redir table KVM: vmx: remove duplicate vmx_mpx_supported() prototype KVM: s390: Fix memory leak on busy SIGP stop x86/kvm: Resolve shadow warning from min macro kvm: Resolve missing-field-initializers warnings Replace NR_VMX_MSR with its definition KVM: x86: Assertions to check no overrun in MSR lists KVM: x86: set rflags.rf during fault injection KVM: x86: Setting rflags.rf during rep-string emulation KVM: x86: DR6/7.RTM cannot be written KVM: nVMX: clean up nested_release_vmcs12 and code around it KVM: nVMX: fix lifetime issues for vmcs02 KVM: x86: Defining missing x86 vectors KVM: x86: emulator injects #DB when RFLAGS.RF is set KVM: x86: Cleanup of rflags.rf cleaning KVM: x86: Clear rflags.rf on emulated instructions KVM: x86: popf emulation should not change RF KVM: x86: Clearing rflags.rf upon skipped emulated instruction ...
| * | | x86/kvm: Resolve shadow warnings in macro expansionMark D Rustad2014-07-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Resolve shadow warnings that appear in W=2 builds. Instead of using ret to hold the return pointer, save the length in a new variable saved_len and compute the pointer on exit. This also resolves a very technical error, in that ret was declared as a const char *, when it really was a char * const. Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: vmx: remove duplicate vmx_mpx_supported() prototypeChris J Arges2014-07-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove a prototype which was added by both 93c4adc7afe and 36be0b9deb2. Signed-off-by: Chris J Arges <chris.j.arges@canonical.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | x86/kvm: Resolve shadow warning from min macroMark Rustad2014-07-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Resolve a shadow warning generated in W=2 builds by the nested use of the min macro by instead using the min3 macro for the minimum of 3 values. Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | Replace NR_VMX_MSR with its definitionPaolo Bonzini2014-07-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using ARRAY_SIZE directly makes it easier to read the code. While touching the code, replace the division by a multiplication in the recently added BUILD_BUG_ON. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: x86: Assertions to check no overrun in MSR listsNadav Amit2014-07-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently there is no check whether shared MSRs list overrun the allocated size which can results in bugs. In addition there is no check that vmx->guest_msrs has sufficient space to accommodate all the VMX msrs. This patch adds the assertions. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: x86: set rflags.rf during fault injectionNadav Amit2014-07-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | x86 does not automatically set rflags.rf during event injection. This patch does partial job, setting rflags.rf upon fault injection. It does not handle the setting of RF upon interrupt injection on rep-string instruction. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: x86: Setting rflags.rf during rep-string emulationNadav Amit2014-07-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch updates RF for rep-string emulation. The flag is set upon the first iteration, and cleared after the last (if emulated). It is intended to make sure that if a trap (in future data/io #DB emulation) or interrupt is delivered to the guest during the rep-string instruction, RF will be set correctly. RF affects whether instruction breakpoint in the guest is masked. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | KVM: x86: DR6/7.RTM cannot be writtenNadav Amit2014-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Haswell and newer Intel CPUs have support for RTM, and in that case DR6.RTM is not fixed to 1 and DR7.RTM is not fixed to zero. That is not the case in the current KVM implementation. This bug is apparent only if the MOV-DR instruction is emulated or the host also debugs the guest. This patch is a partial fix which enables DR6.RTM and DR7.RTM to be cleared and set respectively. It also sets DR6.RTM upon every debug exception. Obviously, it is not a complete fix, as debugging of RTM is still unsupported. Signed-off-by: Nadav Amit <namit@cs.technion.ac.il> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>