aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/kvm/vmx.c
Commit message (Collapse)AuthorAge
* KVM: VMX: do not overwrite uptodate vcpu->arch.cr3 on KVM_SET_SREGSMarcelo Tosatti2011-06-19
| | | | | | | | | Only decache guest CR3 value if vcpu->arch.cr3 is stale. Fixes loadvm with live guest. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Tested-by: Markus Schade <markus.schade@gmail.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Cache vmcs segment fieldsAvi Kivity2011-05-22
| | | | | | | | | Since the emulator now checks segment limits and access rights, it generates a lot more accesses to the vmcs segment fields. Undo some of the performance hit by cacheing those fields in a read-only cache (the entire cache is invalidated on any write, or on guest exit). Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Avoid reading %rip unnecessarily when handling exceptionsAvi Kivity2011-05-22
| | | | | | Avoids a VMREAD. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: fix push of wrong eip when doing softintSerge E. Hallyn2011-05-11
| | | | | | | | | | When doing a soft int, we need to bump eip before pushing it to the stack. Otherwise we'll do the int a second time. [apw@canonical.com: merged eip update as per Jan's recommendation.] Signed-off-by: Serge E. Hallyn <serge.hallyn@ubuntu.com> Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Ensure that vmx_create_vcpu always returns proper errorJan Kiszka2011-05-11
| | | | | | | | | | In case certain allocations fail, vmx_create_vcpu may return 0 as error instead of a negative value encoded via ERR_PTR. This causes a NULL pointer dereferencing later on in kvm_vm_ioctl_vcpu_create. Reported-by: Sasha Levin <levinsasha928@gmail.com> Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: X86: Delegate tsc-offset calculation to architecture codeJoerg Roedel2011-05-11
| | | | | | | | | | With TSC scaling in SVM the tsc-offset needs to be calculated differently. This patch propagates this calculation into the architecture specific modules so that this complexity can be handled there. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: X86: Implement call-back to propagate virtual_tsc_khzJoerg Roedel2011-05-11
| | | | | | | | | | | This patch implements a call-back into the architecture code to allow the propagation of changes to the virtual tsc_khz of the vcpu. On SVM it updates the tsc_ratio variable, on VMX it does nothing. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86: Add x86 callback for intercept checkJoerg Roedel2011-05-11
| | | | | | | | This patch adds a callback into kvm_x86_ops so that svm and vmx code can do intercept checks on emulated instructions. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: simplify NMI mask managementAvi Kivity2011-05-11
| | | | | | | | | | | | Use vmx_set_nmi_mask() instead of open-coding management of the hardware bit and the software hint (nmi_known_unmasked). There's a slight change of behaviour when running without hardware virtual NMI support - we now clear the NMI mask if NMI delivery faulted in that case as well. This improves emulation accuracy. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Use cached VM_EXIT_INTR_INFO in handle_exceptionAvi Kivity2011-05-11
| | | | | | vmx_complete_atomic_exit() cached it for us, so we can use it here. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Don't VMREAD VM_EXIT_INTR_INFO unconditionallyAvi Kivity2011-05-11
| | | | | | Only read it if we're going to use it later. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Refactor vmx_complete_atomic_exit()Avi Kivity2011-05-11
| | | | | | | Move the exit reason checks to the front of the function, for early exit in the common case. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Qualify check for host NMIAvi Kivity2011-05-11
| | | | | | | Check for the exit reason first; this allows us, later, to avoid a VMREAD for VM_EXIT_INTR_INFO_FIELD. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Avoid vmx_recover_nmi_blocking() when unneededAvi Kivity2011-05-11
| | | | | | | | When we haven't injected an interrupt, we don't need to recover the nmi blocking state (since the guest can't set it by itself). This allows us to avoid a VMREAD later on. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Cache cplAvi Kivity2011-05-11
| | | | | | | | We may read the cpl quite often in the same vmexit (instruction privilege check, memory access checks for instruction and operands), so we gain a bit if we cache the value. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Optimize vmx_get_cpl()Avi Kivity2011-05-11
| | | | | | | In long mode, vm86 mode is disallowed, so we need not check for it. Reading rflags.vm may require a VMREAD, so it is expensive. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Optimize vmx_get_rflags()Avi Kivity2011-05-11
| | | | | | If called several times within the same exit, return cached results. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Use kvm_get_rflags() and kvm_set_rflags() instead of the raw versionsAvi Kivity2011-05-11
| | | | | | | | Some rflags bits are owned by the host, not guest, so we need to use kvm_get_rflags() to strip those bits away or kvm_set_rflags() to add them back. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: unbreak userspace that does not sets tss addressGleb Natapov2011-03-17
| | | | | | | | | | Commit 6440e5967bc broke old userspaces that do not set tss address before entering vcpu. Unbreak it by setting tss address to a safe value on the first vcpu entry. New userspaces should set tss address, so print warning in case it doesn't. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: fix rcu usage in init_rmode_* functionsXiao Guangrong2011-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | fix: [ 3494.671786] stack backtrace: [ 3494.671789] Pid: 10527, comm: qemu-system-x86 Not tainted 2.6.38-rc6+ #23 [ 3494.671790] Call Trace: [ 3494.671796] [] ? lockdep_rcu_dereference+0x9d/0xa5 [ 3494.671826] [] ? kvm_memslots+0x6b/0x73 [kvm] [ 3494.671834] [] ? gfn_to_memslot+0x16/0x4f [kvm] [ 3494.671843] [] ? gfn_to_hva+0x16/0x27 [kvm] [ 3494.671851] [] ? kvm_write_guest_page+0x31/0x83 [kvm] [ 3494.671861] [] ? kvm_clear_guest_page+0x1a/0x1c [kvm] [ 3494.671867] [] ? vmx_set_tss_addr+0x83/0x122 [kvm_intel] and: [ 8328.789599] stack backtrace: [ 8328.789601] Pid: 18736, comm: qemu-system-x86 Not tainted 2.6.38-rc6+ #23 [ 8328.789603] Call Trace: [ 8328.789609] [] ? lockdep_rcu_dereference+0x9d/0xa5 [ 8328.789621] [] ? kvm_memslots+0x6b/0x73 [kvm] [ 8328.789628] [] ? gfn_to_memslot+0x16/0x4f [kvm] [ 8328.789635] [] ? gfn_to_hva+0x16/0x27 [kvm] [ 8328.789643] [] ? kvm_write_guest_page+0x31/0x83 [kvm] [ 8328.789699] [] ? kvm_clear_guest_page+0x1a/0x1c [kvm] [ 8328.789713] [] ? vmx_create_vcpu+0x316/0x3c8 [kvm_intel] Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86: Remove useless regs_page pointer from kvm_lapicTakuya Yoshikawa2011-03-17
| | | | | | | | | Access to this page is mostly done through the regs member which holds the address to this page. The exceptions are in vmx_vcpu_reset() and kvm_free_lapic() and these both can easily be converted to using regs. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Initialize vm86 TSS only once.Gleb Natapov2011-03-17
| | | | | | | | | | | | Currently vm86 task is initialized on each real mode entry and vcpu reset. Initialization is done by zeroing TSS and updating relevant fields. But since all vcpus are using the same TSS there is a race where one vcpu may use TSS while other vcpu is initializing it, so the vcpu that uses TSS will see wrong TSS content and will behave incorrectly. Fix that by initializing TSS only once. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: update live TR selector if it changes in real modeGleb Natapov2011-03-17
| | | | | | | | | | | | When rmode.vm86 is active TR descriptor is updated with vm86 task values, but selector is left intact. vmx_set_segment() makes sure that if TR register is written into while vm86 is active the new values are saved for use after vm86 is deactivated, but since selector is not updated on vm86 activation/deactivation new value is lost. Fix this by writing new selector into vmcs immediately. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: add the __noclone attribute to vmx_vcpu_runLai Jiangshan2011-03-17
| | | | | | | | | The changelog of 104f226 said "adds the __noclone attribute", but it was missing in its patch. I think it is still needed. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: fix detection of BIOS disabling VMXJoseph Cihula2011-03-17
| | | | | | | | This patch fixes the logic used to detect whether BIOS has disabled VMX, for the case where VMX is enabled only under SMX, but tboot is not active. Signed-off-by: Joseph Cihula <joseph.cihula@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Avoid atomic operation in vmx_vcpu_runAvi Kivity2011-03-17
| | | | | | | | | Instead of exchanging the guest and host rcx, have separate storage for each. This allows us to avoid using the xchg instruction, which is is a little slower than normal operations. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: Simplify saving guest rcx in vmx_vcpu_runAvi Kivity2011-03-17
| | | | | | | | | | | | | | | | | Change push top-of-stack pop guest-rcx pop dummy to pop guest-rcx which is the same thing, only simpler. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: increase ple_gap default to 128Rik van Riel2011-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On some CPUs, a ple_gap of 41 is simply insufficient to ever trigger PLE exits, even with the minimalistic PLE test from kvm-unit-tests. http://git.kernel.org/?p=virt/kvm/kvm-unit-tests.git;a=commitdiff;h=eda71b28fa122203e316483b35f37aaacd42f545 For example, the Xeon X5670 CPU needs a ple_gap of at least 48 in order to get pause loop exits: # modprobe kvm_intel ple_gap=47 # taskset 1 /usr/local/bin/qemu-system-x86_64 \ -device testdev,chardev=log -chardev stdio,id=log \ -kernel x86/vmexit.flat -append ple-round-robin -smp 2 VNC server running on `::1:5900' enabling apic enabling apic ple-round-robin 58298446 # rmmod kvm_intel # modprobe kvm_intel ple_gap=48 # taskset 1 /usr/local/bin/qemu-system-x86_64 \ -device testdev,chardev=log -chardev stdio,id=log \ -kernel x86/vmexit.flat -append ple-round-robin -smp 2 VNC server running on `::1:5900' enabling apic enabling apic ple-round-robin 36616 Increase the ple_gap to 128 to be on the safe side. Signed-off-by: Rik van Riel <riel@redhat.com> Acked-by: Zhai, Edwin <edwin.zhai@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Avoid leaking fake realmode state to userspaceAvi Kivity2011-03-17
| | | | | | | | | | | | | When emulating real mode, we fake some state: - tr.base points to a fake vm86 tss - segment registers are made to conform to vm86 restrictions change vmx_get_segment() not to expose this fake state to userspace; instead, return the original state. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: Save and restore tr selector across mode switchesAvi Kivity2011-03-17
| | | | | | | | | | | | | When emulating real mode we play with tr hidden state, but leave tr.selector alone. That works well, except for save/restore, since loading TR writes it to the hidden state in vmx->rmode. Fix by also saving and restoring the tr selector; this makes things more consistent and allows migration to work during the early boot stages of Windows XP. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: when entering real mode align segment base to 16 bytesGleb Natapov2011-01-12
| | | | | | | | VMX checks that base is equal segment shifted 4 bits left. Otherwise guest entry fails. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Fetch guest cr3 from hardware on demandAvi Kivity2011-01-12
| | | | | | | | | | Instead of syncing the guest cr3 every exit, which is expensince on vmx with ept enabled, sync it only on demand. [sheng: fix incorrect cr3 seen by Windows XP] Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Replace reads of vcpu->arch.cr3 by an accessorAvi Kivity2011-01-12
| | | | | | This allows us to keep cr3 in the VMCS, later on. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Correct asm constraint in vmcs_load()/vmcs_clear()Avi Kivity2011-01-12
| | | | | | | | 'error' is byte sized, so use a byte register constraint. Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: Optimize atomic EFER loadAvi Kivity2011-01-12
| | | | | | | | | When NX is enabled on the host but not on the guest, we use the entry/exit msr load facility, which is slow. Optimize it to use entry/exit efer load, which is ~1200 cycles faster. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: SVM: copy instruction bytes from VMCBAndre Przywara2011-01-12
| | | | | | | | | | | In case of a nested page fault or an intercepted #PF newer SVM implementations provide a copy of the faulting instruction bytes in the VMCB. Use these bytes to feed the instruction emulator and avoid the costly guest instruction fetch in this case. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: cleanup emulate_instructionAndre Przywara2011-01-12
| | | | | | | | | | emulate_instruction had many callers, but only one used all parameters. One parameter was unused, another one is now hidden by a wrapper function (required for a future addition anyway), so most callers use now a shorter parameter list. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: move complete_insn_gp() into x86.cAndre Przywara2011-01-12
| | | | | | | | move the complete_insn_gp() helper function out of the VMX part into the generic x86 part to make it usable by SVM. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: x86: fix CR8 handlingAndre Przywara2011-01-12
| | | | | | | | | | The handling of CR8 writes in KVM is currently somewhat cumbersome. This patch makes it look like the other CR register handlers and fixes a possible issue in VMX, where the RIP would be incremented despite an injected #GP. Signed-off-by: Andre Przywara <andre.przywara@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: add module parameter to avoid trapping HLT instructions (v5)Anthony Liguori2011-01-12
| | | | | | | | | | | | | In certain use-cases, we want to allocate guests fixed time slices where idle guest cycles leave the machine idling. There are many approaches to achieve this but the most direct is to simply avoid trapping the HLT instruction which lets the guest directly execute the instruction putting the processor to sleep. Introduce this as a module-level option for kvm-vmx.ko since if you do this for one guest, you probably want to do it for all. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Return 0 from a failed VMREADAvi Kivity2011-01-12
| | | | | | | | | If we execute VMREAD during reboot we'll just skip over it. Instead of returning garbage, return 0, which has a much smaller chance of confusing the code. Otherwise we risk a flood of debug printk()s which block the reboot process if a serial console or netconsole is enabled. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Add instruction-set-specific exit qualifications to kvm_exit traceAvi Kivity2011-01-12
| | | | | | | | | | | The exit reason alone is insufficient to understand exactly why an exit occured; add ISA-specific trace parameters for additional information. Because fetching these parameters is expensive on vmx, and because these parameters are fetched even if tracing is disabled, we fetch the parameters via a callback instead of as traditional trace arguments. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Record instruction set in kvm_exit tracepointAvi Kivity2011-01-12
| | | | | | | exit_reason's meaning depend on the instruction set; record it so a trace taken on one machine can be interpreted on another. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Fold __vmx_vcpu_run() into vmx_vcpu_run()Avi Kivity2011-01-12
| | | | | | | | | cea15c2 ("KVM: Move KVM context switch into own function") split vmx_vcpu_run() to prevent multiple copies of the context switch from being generated (causing problems due to a label). This patch folds them back together again and adds the __noclone attribute to prevent the label from being duplicated. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Inform user about INTEL_TXT dependencyShane Wang2011-01-12
| | | | | | | | | Inform user to either disable TXT in the BIOS or do TXT launch with tboot before enabling KVM since some BIOSes do not set FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX bit when TXT is enabled. Signed-off-by: Shane Wang <shane.wang@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: Disallow NMI while blocked by STIAvi Kivity2011-01-12
| | | | | | | | | | | | While not mandated by the spec, Linux relies on NMI being blocked by an IF-enabling STI. VMX also refuses to enter a guest in this state, at least on some implementations. Disallow NMI while blocked by STI by checking for the condition, and requesting an interrupt window exit if it occurs. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: handle exit due to INVD in VMXGleb Natapov2011-01-12
| | | | | | | | | | Currently the exit is unhandled, so guest halts with error if it tries to execute INVD instruction. Call into emulator when INVD instruction is executed by a guest instead. This instruction is not needed by ordinary guests, but firmware (like OpenBIOS) use it and fail. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: remove setting of shadow_base_ptes for EPTMarcelo Tosatti2011-01-12
| | | | | | | | | | | | | | The EPT present/writable bits use the same position as normal pagetable bits. Since direct_map passes ACC_ALL to mmu_set_spte, thus always setting the writable bit on sptes, use the generic PT_PRESENT shadow_base_pte. Also pass present/writable error code information from EPT violation to generic pagefault handler. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Move KVM context switch into own functionAndi Kleen2011-01-12
| | | | | | | | | | | | | | | | | | gcc 4.5 with some special options is able to duplicate the VMX context switch asm in vmx_vcpu_run(). This results in a compile error because the inline asm sequence uses an on local label. The non local label is needed because other code wants to set up the return address. This patch moves the asm code into an own function and marks that explicitely noinline to avoid this problem. Better would be probably to just move it into an .S file. The diff looks worse than the change really is, it's all just code movement and no logic change. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: SVM: Do not report xsave in supported cpuidJoerg Roedel2010-12-08
| | | | | | | | | | | | To support xsave properly for the guest the SVM module need software support for it. As long as this is not present do not report the xsave as supported feature in cpuid. As a side-effect this patch moves the bit() helper function into the x86.h file so that it can be used in svm.c too. KVM-Stable-Tag. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>