aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/kvm/x86.c
Commit message (Collapse)AuthorAge
...
* KVM: x86 emulator: emulate CLTS internallyAvi Kivity2011-05-22
| | | | | | | | | Avoid using ctxt->vcpu; we can do everything with ->get_cr() and ->set_cr(). A side effect is that we no longer activate the fpu on emulated CLTS; but that should be very rare. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: add and use new callbacks set_idt(), set_gdt()Avi Kivity2011-05-22
| | | | | | Replacing direct calls to realmode_lgdt(), realmode_lidt(). Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: drop vcpu argument from intercept callbackAvi Kivity2011-05-22
| | | | | | Making the emulator caller agnostic. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: drop vcpu argument from cr/dr/cpl/msr callbacksAvi Kivity2011-05-22
| | | | | | Making the emulator caller agnostic. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: drop vcpu argument from segment/gdt/idt callbacksAvi Kivity2011-05-22
| | | | | | | | | Making the emulator caller agnostic. [Takuya Yoshikawa: fix typo leading to LDT failures] Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: drop vcpu argument from pio callbacksAvi Kivity2011-05-11
| | | | | | Making the emulator caller agnostic. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: drop vcpu argument from memory read/write callbacksAvi Kivity2011-05-11
| | | | | | Making the emulator caller agnostic. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: X86: Update last_guest_tsc in vcpu_putJoerg Roedel2011-05-11
| | | | | | | | | | | | The last_guest_tsc is used in vcpu_load to adjust the tsc_offset since tsc-scaling is merged. So the last_guest_tsc needs to be updated in vcpu_put instead of the the last_host_tsc. This is fixed with this patch. Reported-by: Jan Kiszka <jan.kiszka@web.de> Tested-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: fix push of wrong eip when doing softintSerge E. Hallyn2011-05-11
| | | | | | | | | | When doing a soft int, we need to bump eip before pushing it to the stack. Otherwise we'll do the int a second time. [apw@canonical.com: merged eip update as per Jan's recommendation.] Signed-off-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: emulator: do not needlesly sync registers from emulator ctxt to vcpuGleb Natapov2011-05-11
| | | | | | | | | | | | | Currently we sync registers back and forth before/after exiting to userspace for IO, but during IO device model shouldn't need to read/write the registers, so we can as well skip those sync points. The only exaception is broken vmware backdor interface. The new code sync registers content during IO only if registers are read from/written to by userspace in the middle of the IO operation and this almost never happens in practise. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: X86: Implement userspace interface to set virtual_tsc_khzJoerg Roedel2011-05-11
| | | | | | | | | | This patch implements two new vm-ioctls to get and set the virtual_tsc_khz if the machine supports tsc-scaling. Setting the tsc-frequency is only possible before userspace creates any vcpu. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: X86: Delegate tsc-offset calculation to architecture codeJoerg Roedel2011-05-11
| | | | | | | | | | With TSC scaling in SVM the tsc-offset needs to be calculated differently. This patch propagates this calculation into the architecture specific modules so that this complexity can be handled there. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: X86: Make tsc_delta calculation a function of guest tscJoerg Roedel2011-05-11
| | | | | | | | | | | | The calculation of the tsc_delta value to ensure a forward-going tsc for the guest is a function of the host-tsc. This works as long as the guests tsc_khz is equal to the hosts tsc_khz. With tsc-scaling hardware support this is not longer true and the tsc_delta needs to be calculated using guest_tsc values. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: X86: Let kvm-clock report the right tsc frequencyJoerg Roedel2011-05-11
| | | | | | | | This patch changes the kvm_guest_time_update function to use TSC frequency the guest actually has for updating its clock. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: SVM: Add intercept check for emulated cr accessesJoerg Roedel2011-05-11
| | | | | | | | This patch adds all necessary intercept checks for instructions that access the crX registers. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86: Add x86 callback for intercept checkJoerg Roedel2011-05-11
| | | | | | | | This patch adds a callback into kvm_x86_ops so that svm and vmx code can do intercept checks on emulated instructions. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: Don't write-back cpu-state on X86EMUL_INTERCEPTEDJoerg Roedel2011-05-11
| | | | | | | | | This patch prevents the changed CPU state to be written back when the emulator detected that the instruction was intercepted by the guest. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: add framework for instruction interceptsAvi Kivity2011-05-11
| | | | | | | | | | | | | | | | | | | | | | | | When running in guest mode, certain instructions can be intercepted by hardware. This also holds for nested guests running on emulated virtualization hardware, in particular instructions emulated by kvm itself. This patch adds a framework for intercepting instructions. If an instruction is marked for interception, and if we're running in guest mode, a callback is called to check whether an intercept is needed or not. The callback is called at three points in time: immediately after beginning execution, after checking privilge exceptions, and after checking memory exception. This suits the different interception points defined for different instructions and for the various virtualization instruction sets. In addition, a new X86EMUL_INTERCEPT is defined, which any callback or memory access may define, allowing the more complicated intercepts to be implemented in existing callbacks. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86 emulator: define callbacks for using the guest fpu within the emulatorAvi Kivity2011-05-11
| | | | | | Needed for emulating fpu instructions. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: 16-byte mmio supportAvi Kivity2011-05-11
| | | | | | | | | Since sse instructions can issue 16-byte mmios, we need to support them. We can't increase the kvm_run mmio buffer size to 16 bytes without breaking compatibility, so instead we break the large mmios into two smaller 8-byte ones. Since the bus is 64-bit we aren't breaking any atomicity guarantees. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Split mmio completion into a functionAvi Kivity2011-05-11
| | | | | | Make room for sse mmio completions. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: extend in-kernel mmio to handle >8 byte transactionsAvi Kivity2011-05-11
| | | | | | Needed for coalesced mmio using sse. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86: better fix for race between nmi injection and enabling nmi windowGleb Natapov2011-05-11
| | | | | | Fix race between nmi injection and enabling nmi window in a simpler way. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* Revert "KVM: Fix race between nmi injection and enabling nmi window"Marcelo Tosatti2011-05-11
| | | | | | | | This reverts commit f86368493ec038218e8663cc1b6e5393cd8e008a. Simpler fix to follow. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: expose async pf through our standard mechanismGlauber Costa2011-05-11
| | | | | | | | | | | | | As Avi recently mentioned, the new standard mechanism for exposing features is KVM_GET_SUPPORTED_CPUID, not spamming CAPs. For some reason async pf missed that. So expose async_pf here. Signed-off-by: Glauber Costa <glommer@redhat.com> CC: Gleb Natapov <gleb@redhat.com> CC: Avi Kivity <avi@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Use kvm_get_rflags() and kvm_set_rflags() instead of the raw versionsAvi Kivity2011-05-11
| | | | | | | | Some rflags bits are owned by the host, not guest, so we need to use kvm_get_rflags() to strip those bits away or kvm_set_rflags() to add them back. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: move and fix substitue search for missing CPUID entriesAndre Przywara2011-04-06
| | | | | | | | | | | | | | | | | | | If KVM cannot find an exact match for a requested CPUID leaf, the code will try to find the closest match instead of simply confessing it's failure. The implementation was meant to satisfy the CPUID specification, but did not properly check for extended and standard leaves and also didn't account for the index subleaf. Beside that this rule only applies to CPUID intercepts, which is not the only user of the kvm_find_cpuid_entry() function. So fix this algorithm and call it from kvm_emulate_cpuid(). This fixes a crash of newer Linux kernels as KVM guests on AMD Bulldozer CPUs, where bogus values were returned in response to a CPUID intercept. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: fix XSAVE bit scanningAndre Przywara2011-04-06
| | | | | | | | | | | | When KVM scans the 0xD CPUID leaf for propagating the XSAVE save area leaves, it assumes that the leaves are contigious and stops at the first zero one. On AMD hardware there is a gap, though, as LWP uses leaf 62 to announce it's state save area. So lets iterate through all 64 possible leaves and simply skip zero ones to also cover later features. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds2011-03-18
|\ | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Flush TLB if PGD entry is changed in i386 PAE mode x86, dumpstack: Correct stack dump info when frame pointer is available x86: Clean up csum-copy_64.S a bit x86: Fix common misspellings x86: Fix misspelling and align params x86: Use PentiumPro-optimized partial_csum() on VIA C7
| * x86: Fix common misspellingsLucas De Marchi2011-03-18
| | | | | | | | | | | | | | | | | | They were generated by 'codespell' and then manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi> Cc: trivial@kernel.org LKML-Reference: <1300389856-1099-3-git-send-email-lucas.demarchi@profusion.mobi> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | KVM: fix kvmclock regression due to missing clock updateNikola Ciprich2011-03-17
| | | | | | | | | | | | | | | | | | | | commit 387b9f97750444728962b236987fbe8ee8cc4f8c moved kvm_request_guest_time_update(vcpu), breaking 32bit SMP guests using kvm-clock. Fix this by moving (new) clock update function to proper place. Signed-off-by: Nikola Ciprich <nikola.ciprich@linuxbox.cz> Acked-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | KVM: emulator: Fix io permission checking for 64bit guestGleb Natapov2011-03-17
| | | | | | | | | | | | | | | | | | Current implementation truncates upper 32bit of TR base address during IO permission bitmap check. The patch fixes this. Reported-and-tested-by: Francis Moreau <francis.moro@gmail.com> Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | KVM: MMU: move mmu pages calculated out of mmu lockXiao Guangrong2011-03-17
| | | | | | | | | | | | | | | | kvm_mmu_calculate_mmu_pages need to walk all memslots and it's protected by kvm->slots_lock, so move it out of mmu spinlock Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | KVM: better readability of efer_reserved_bitsLai Jiangshan2011-03-17
| | | | | | | | | | | | | | use EFER_SCE, EFER_LME and EFER_LMA instead of magic numbers. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | KVM: Clear async page fault hash after switching to real modeLai Jiangshan2011-03-17
| | | | | | | | | | | | | | | | | | The hash array of async gfns may still contain some left gfns after kvm_clear_async_pf_completion_queue() called, need to clear them. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | KVM: x86: Convert tsc_write_lock to raw_spinlockJan Kiszka2011-03-17
| | | | | | | | | | | | | | | | Code under this lock requires non-preemptibility. Ensure this also over -rt by converting it to raw spinlock. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | KVM: remove isr_ack logic from PICGleb Natapov2011-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | isr_ack logic was added by e48258009d to avoid unnecessary IPIs. Back then it made sense, but now the code checks that vcpu is ready to accept interrupt before sending IPI, so this logic is no longer needed. The patch removes it. Fixes a regression with Debian/Hurd. Signed-off-by: Gleb Natapov <gleb@redhat.com> Reported-and-tested-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | KVM: Convert kvm_lock to raw_spinlockJan Kiszka2011-03-17
| | | | | | | | | | | | | | | | Code under this lock requires non-preemptibility. Ensure this also over -rt by converting it to raw spinlock. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | KVM: Fix race between nmi injection and enabling nmi windowAvi Kivity2011-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The interrupt injection logic looks something like if an nmi is pending, and nmi injection allowed inject nmi if an nmi is pending request exit on nmi window the problem is that "nmi is pending" can be set asynchronously by the PIT; if it happens to fire between the two if statements, we will request an nmi window even though nmi injection is allowed. On SVM, this has disasterous results, since it causes eflags.TF to be set in random guest code. The fix is simple; make nmi_pending synchronous using the standard vcpu->requests mechanism; this ensures the code above is completely synchronous wrt nmi_pending. Signed-off-by: Avi Kivity <avi@redhat.com>
* | KVM: Drop ad-hoc vendor specific instruction restrictionAvi Kivity2011-03-17
| | | | | | | | | | | | | | Use the new support in the emulator, and drop the ad-hoc code in x86.c. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* | KVM: Drop bogus x86_decode_insn() error checkAvi Kivity2011-03-17
| | | | | | | | | | | | | | | | | | | | x86_decode_insn() doesn't return X86EMUL_* values, so the check for X86EMUL_PROPOGATE_FAULT will always fail. There is a proper check later on, so there is no need for a replacement for this code. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* | KVM: x86: release kvmclock page on resetGlauber Costa2011-03-17
| | | | | | | | | | | | | | | | | | | | When a vcpu is reset, kvmclock page keeps being written to this days. This is wrong and inconsistent: a cpu reset should take it to its initial state. Signed-off-by: Glauber Costa <glommer@redhat.com> CC: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* | KVM: x86: handle guest access to BBL_CR_CTL3 MSRjohn cooper2011-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A correction to Intel cpu model CPUID data (patch queued) caused winxp to BSOD when booted with a Penryn model. This was traced to the CPUID "model" field correction from 6 -> 23 (as is proper for a Penryn class of cpu). Only in this case does the problem surface. The cause for this failure is winxp accessing the BBL_CR_CTL3 MSR which is unsupported by current kvm, appears to be a legacy MSR not fully characterized yet existing in current silicon, and is apparently carried forward in MSR space to accommodate vintage code as here. It is not yet conclusive whether this MSR implements any of its legacy functionality or is just an ornamental dud for compatibility. While I found no silicon version specific documentation link to this MSR, a general description exists in Intel's developer's reference which agrees with the functional behavior of other bootloader/kernel code I've examined accessing BBL_CR_CTL3. Regrettably winxp appears to be setting bit #19 called out as "reserved" in the above document. So to minimally accommodate this MSR, kvm msr get will provide the equivalent mock data and kvm msr write will simply toss the guest passed data without interpretation. While this treatment of BBL_CR_CTL3 addresses the immediate problem, the approach may be modified pending clarification from Intel. Signed-off-by: john cooper <john.cooper@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* | KVM: Add "exiting guest mode" stateXiao Guangrong2011-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we keep track of only two states: guest mode and host mode. This patch adds an "exiting guest mode" state that tells us that an IPI will happen soon, so unless we need to wait for the IPI, we can avoid it completely. Also 1: No need atomically to read/write ->mode in vcpu's thread 2: reorganize struct kvm_vcpu to make ->mode and ->requests in the same cache line explicitly Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | KVM: x86: Remove user space triggerable MCE error messageJan Kiszka2011-03-17
| | | | | | | | | | | | | | | | This case is a pure user space error we do not need to record. Moreover, it can be misused to flood the kernel log. Remove it. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* | KVM: fix rcu usage warning in kvm_arch_vcpu_ioctl_set_sregs()Xiao Guangrong2011-03-17
|/ | | | | | | | | | | | | | | | | | | | | | | | Fix: [ 1001.499596] =================================================== [ 1001.499599] [ INFO: suspicious rcu_dereference_check() usage. ] [ 1001.499601] --------------------------------------------------- [ 1001.499604] include/linux/kvm_host.h:301 invoked rcu_dereference_check() without protection! ...... [ 1001.499636] Pid: 6035, comm: qemu-system-x86 Not tainted 2.6.37-rc6+ #62 [ 1001.499638] Call Trace: [ 1001.499644] [] lockdep_rcu_dereference+0x9d/0xa5 [ 1001.499653] [] gfn_to_memslot+0x8d/0xc8 [kvm] [ 1001.499661] [] gfn_to_hva+0x16/0x3f [kvm] [ 1001.499669] [] kvm_read_guest_page+0x1e/0x5e [kvm] [ 1001.499681] [] kvm_read_guest_page_mmu+0x53/0x5e [kvm] [ 1001.499699] [] load_pdptrs+0x3f/0x9c [kvm] [ 1001.499705] [] ? vmx_set_cr0+0x507/0x517 [kvm_intel] [ 1001.499717] [] kvm_arch_vcpu_ioctl_set_sregs+0x1f3/0x3c0 [kvm] [ 1001.499727] [] kvm_vcpu_ioctl+0x6a5/0xbc5 [kvm] Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* Merge branch 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2011-01-13
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (142 commits) KVM: Initialize fpu state in preemptible context KVM: VMX: when entering real mode align segment base to 16 bytes KVM: MMU: handle 'map_writable' in set_spte() function KVM: MMU: audit: allow audit more guests at the same time KVM: Fetch guest cr3 from hardware on demand KVM: Replace reads of vcpu->arch.cr3 by an accessor KVM: MMU: only write protect mappings at pagetable level KVM: VMX: Correct asm constraint in vmcs_load()/vmcs_clear() KVM: MMU: Initialize base_role for tdp mmus KVM: VMX: Optimize atomic EFER load KVM: VMX: Add definitions for more vm entry/exit control bits KVM: SVM: copy instruction bytes from VMCB KVM: SVM: implement enhanced INVLPG intercept KVM: SVM: enhance mov DR intercept handler KVM: SVM: enhance MOV CR intercept handler KVM: SVM: add new SVM feature bit names KVM: cleanup emulate_instruction KVM: move complete_insn_gp() into x86.c KVM: x86: fix CR8 handling KVM guest: Fix kvm clock initialization when it's configured out ...
| * KVM: Initialize fpu state in preemptible contextAvi Kivity2011-01-12
| | | | | | | | | | | | | | | | | | | | | | | | | | init_fpu() (which is indirectly called by the fpu switching code) assumes it is in process context. Rather than makeing init_fpu() use an atomic allocation, which can cause a task to be killed, make sure the fpu is already initialized when we enter the run loop. KVM-Stable-Tag. Reported-and-tested-by: Kirill A. Shutemov <kas@openvz.org> Acked-by: Pekka Enberg <penberg@kernel.org> Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: Fetch guest cr3 from hardware on demandAvi Kivity2011-01-12
| | | | | | | | | | | | | | | | | | | | Instead of syncing the guest cr3 every exit, which is expensince on vmx with ept enabled, sync it only on demand. [sheng: fix incorrect cr3 seen by Windows XP] Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: Replace reads of vcpu->arch.cr3 by an accessorAvi Kivity2011-01-12
| | | | | | | | | | | | This allows us to keep cr3 in the VMCS, later on. Signed-off-by: Avi Kivity <avi@redhat.com>