aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/kvm/vmx.c
Commit message (Collapse)AuthorAge
* KVM: VMX: blocked-by-sti must not defer NMI injectionsJan Kiszka2010-05-13
| | | | | | | | | | | | | | | | | As the processor may not consider GUEST_INTR_STATE_STI as a reason for blocking NMI, it could return immediately with EXIT_REASON_NMI_WINDOW when we asked for it. But as we consider this state as NMI-blocking, we can run into an endless loop. Resolve this by allowing NMI injection if just GUEST_INTR_STATE_STI is active (originally suggested by Gleb). Intel confirmed that this is safe, the processor will never complain about NMI injection in this state. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> KVM-Stable-Tag Acked-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: Save/restore rflags.vm correctly in real modeAvi Kivity2010-04-20
| | | | | | | | | | | | | | | | | | Currently we set eflags.vm unconditionally when entering real mode emulation through virtual-8086 mode, and clear it unconditionally when we enter protected mode. The means that the following sequence KVM_SET_REGS (rflags.vm=1) KVM_SET_SREGS (cr0.pe=1) Ends up with rflags.vm clear due to KVM_SET_SREGS triggering enter_pmode(). Fix by shadowing rflags.vm (and rflags.iopl) correctly while in real mode: reads and writes to those bits access a shadow register instead of the actual register. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo2010-03-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
* KVM: VMX: Update instruction length on intercepted BPJan Kiszka2010-03-01
| | | | | | | | | | | | We intercept #BP while in guest debugging mode. As VM exits due to intercepted exceptions do not necessarily come with valid idt_vectoring, we have to update event_exit_inst_len explicitly in such cases. At least in the absence of migration, this ensures that re-injections of #BP will find and use the correct instruction length. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Cc: stable@kernel.org (2.6.32, 2.6.33) Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Rename VMX_EPT_IGMT_BIT to VMX_EPT_IPAT_BITSheng Yang2010-03-01
| | | | | | | Following the new SDM. Now the bit is named "Ignore PAT memory type". Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Remove redundant test in vmx_set_efer()Julia Lawall2010-03-01
| | | | | | | | | | | | | | | | | | | | | | | | | | msr was tested above, so the second test is not needed. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r@ expression *x; expression e; identifier l; @@ if (x == NULL || ...) { ... when forall return ...; } ... when != goto l; when != x = e when != &x *x == NULL // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Wire up .fpu_activate() callbackAvi Kivity2010-03-01
| | | | Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Remove redundant check in vm_need_virtualize_apic_accesses()Gui Jianfeng2010-03-01
| | | | | | | | flexpriority_enabled implies cpu_has_vmx_virtualize_apic_accesses() returning true, so we don't need this check here. Signed-off-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Trace failed msr reads and writesAvi Kivity2010-03-01
| | | | | | | Record failed msrs reads and writes, and the fact that they failed as well. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: Pass cr0.mp through to the guest when the fpu is activeAvi Kivity2010-03-01
| | | | | | | | | | | | | | When cr0.mp is clear, the guest doesn't expect a #NM in response to a WAIT instruction. Because we always keep cr0.mp set, it will get a #NM, and potentially be confused. Fix by keeping cr0.mp set only when the fpu is inactive, and passing it through when inactive. Reported-by: Lorenzo Martignoni <martignlo@gmail.com> Analyzed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Rename vcpu->shadow_efer to eferAvi Kivity2010-03-01
| | | | | | | None of the other registers have the shadow_ prefix. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Add a helper for checking if the guest is in protected modeAvi Kivity2010-03-01
| | | | | Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Activate fpu on cltsAvi Kivity2010-03-01
| | | | | | | | Assume that if the guest executes clts, it knows what it's doing, and load the guest fpu to prevent an #NM exception. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: Clean up DR6 emulationJan Kiszka2010-03-01
| | | | | | | | As we trap all debug register accesses, we do not need to switch real DR6 at all. Clean up update_exception_bitmap at this chance, too. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: Fix emulation of DR4 and DR5Jan Kiszka2010-03-01
| | | | | | | | Make sure DR4 and DR5 are aliased to DR6 and DR7, respectively, if CR4.DE is not set. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: Fix exceptions of mov to drJan Kiszka2010-03-01
| | | | | | | | | Injecting GP without an error code is a bad idea (causes unhandled guest exits). Moreover, we must not skip the instruction if we injected an exception. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: Remove emulation failure reportSheng Yang2010-03-01
| | | | | | | | | | | | | | | As Avi noted: >There are two problems with the kernel failure report. First, it >doesn't report enough data - registers, surrounding instructions, etc. >that are needed to explain what is going on. Second, it can flood >dmesg, which is a pretty bad thing to do. So we remove the emulation failure report in handle_invalid_guest_state(), and would inspected the guest using userspace tool in the future. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: Give the guest ownership of cr0.ts when the fpu is activeAvi Kivity2010-03-01
| | | | | | | | | | | | If the guest fpu is loaded, there is nothing interesing about cr0.ts; let the guest play with it as it will. This makes context switches between fpu intensive guest processes faster, as we won't trap the clts and cr0 write instructions. [marcelo: fix cr0 read shadow update on fpu deactivation; kills F8 install] Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Lazify fpu activation and deactivationAvi Kivity2010-03-01
| | | | | | | | | | | Defer fpu deactivation as much as possible - if the guest fpu is loaded, keep it loaded until the next heavyweight exit (where we are forced to unload it). This reduces unnecessary exits. We also defer fpu activation on clts; while clts signals the intent to use the fpu, we can't be sure the guest will actually use it. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Allow the guest to own some cr0 bitsAvi Kivity2010-03-01
| | | | | | We will use this later to give the guest ownership of cr0.ts. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Replace read accesses of vcpu->arch.cr0 by an accessorAvi Kivity2010-03-01
| | | | | | | Since we'd like to allow the guest to own a few bits of cr0 at times, we need to know when we access those bits. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: trace clts and lmsw instructions as cr accessesAvi Kivity2010-03-01
| | | | | | clts writes cr0.ts; lmsw writes cr0[0:15] - record that in ftrace. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Enable EPT 1GB page supportSheng Yang2010-03-01
| | | | | Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: x86: Rename gb_page_enable() to get_lpage_level() in kvm_x86_opsSheng Yang2010-03-01
| | | | | | | | | | Then the callback can provide the maximum supported large page level, which is more flexible. Also move the gb page support into x86_64 specific. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Fill out ftrace exit reason stringsAvi Kivity2010-03-01
| | | | | | | Some exit reasons missed their strings; fill out the table. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: convert slots_lock to a mutexMarcelo Tosatti2010-03-01
| | | | Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: switch vcpu context to use SRCUMarcelo Tosatti2010-03-01
| | | | Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: introduce kvm->srcu and convert kvm_set_memory_region to SRCU updateMarcelo Tosatti2010-03-01
| | | | | | | | | | Use two steps for memslot deletion: mark the slot invalid (which stops instantiation of new shadow pages for that slot, but allows destruction), then instantiate the new empty slot. Also simplifies kvm_handle_hva locking. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: modify memslots layout in struct kvmMarcelo Tosatti2010-03-01
| | | | | | | | | Have a pointer to an allocated region inside struct kvm. [alex: fix ppc book 3s] Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: Add instruction rdtscp support for guestSheng Yang2010-03-01
| | | | | | | Before enabling, execution of "rdtscp" in guest would result in #UD. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Add cpuid_update() callback to kvm_x86_opsSheng Yang2010-03-01
| | | | | | | | | | | Sometime, we need to adjust some state in order to reflect guest CPUID setting, e.g. if we don't expose rdtscp to guest, we won't want to enable it on hardware. cpuid_update() is introduced for this purpose. Also export kvm_find_cpuid_entry() for later use. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Remove redundant variableSheng Yang2010-03-01
| | | | | | | It's no longer necessary. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Fold ept_update_paging_mode_cr4() into its callerAvi Kivity2010-03-01
| | | | | | | | | | ept_update_paging_mode_cr4() accesses vcpu->arch.cr4 directly, which usually needs to be accessed via kvm_read_cr4(). In this case, we can't, since cr4 is in the process of being updated. Instead of adding inane comments, fold the function into its caller (vmx_set_cr4), so it can use the not-yet-committed cr4 directly. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: When using ept, allow the guest to own cr4.pgeAvi Kivity2010-03-01
| | | | | | | We make no use of cr4.pge if ept is enabled, but the guest does (to flush global mappings, as with vmap()), so give the guest ownership of this bit. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Make guest cr4 mask more conservativeAvi Kivity2010-03-01
| | | | | | | | Instead of specifying the bits which we want to trap on, specify the bits which we allow the guest to change transparently. This is safer wrt future changes to cr4. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Add accessor for reading cr4 (or some bits of cr4)Avi Kivity2010-03-01
| | | | | | | | | | | Some bits of cr4 can be owned by the guest on vmx, so when we read them, we copy them to the vcpu structure. In preparation for making the set of guest-owned bits dynamic, use helpers to access these bits so we don't need to know where the bit resides. No changes to svm since all bits are host-owned there. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Move some cr[04] related constants to vmx.cAvi Kivity2010-03-01
| | | | | | They have no place in common code. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Trap and invalid MWAIT/MONITOR instructionSheng Yang2010-03-01
| | | | | | | | | | We don't support these instructions, but guest can execute them even if the feature('monitor') haven't been exposed in CPUID. So we would trap and inject a #UD if guest try this way. Cc: stable@kernel.org Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Fix comparison of guest efer with stale host valueAvi Kivity2009-12-03
| | | | | | | | | | | | | | | update_transition_efer() masks out some efer bits when deciding whether to switch the msr during guest entry; for example, NX is emulated using the mmu so we don't need to disable it, and LMA/LME are handled by the hardware. However, with shared msrs, the comparison is made against a stale value; at the time of the guest switch we may be running with another guest's efer. Fix by deferring the mask/compare to the actual point of guest entry. Noted by Marcelo. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Disable unrestricted guest when EPT disabledSheng Yang2009-12-03
| | | | | | | | Otherwise would cause VMEntry failure when using ept=0 on unrestricted guest supported processors. Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: x86: Add KVM_GET/SET_VCPU_EVENTSJan Kiszka2009-12-03
| | | | | | | | | | | | This new IOCTL exports all yet user-invisible states related to exceptions, interrupts, and NMIs. Together with appropriate user space changes, this fixes sporadic problems of vmsave/restore, live migration and system reset. [avi: future-proof abi by adding a flags field] Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Report unexpected simultaneous exceptions as internal errorsAvi Kivity2009-12-03
| | | | | | | | | These happen when we trap an exception when another exception is being delivered; we only expect these with MCEs and page faults. If something unexpected happens, things probably went south and we're better off reporting an internal error and freezing. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Allow internal errors reported to userspace to carry extra dataAvi Kivity2009-12-03
| | | | | | | | | Usually userspace will freeze the guest so we can inspect it, but some internal state is not available. Add extra data to internal error reporting so we can expose it to the debugger. Extra data is specific to the suberror. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Remove vmx->msr_offset_eferAvi Kivity2009-12-03
| | | | | | | This variable is used to communicate between a caller and a callee; switch to a function argument instead. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: move CR3/PDPTR update to vmx_set_cr3Marcelo Tosatti2009-12-03
| | | | | | | | | | | | GUEST_CR3 is updated via kvm_set_cr3 whenever CR3 is modified from outside guest context. Similarly pdptrs are updated via load_pdptrs. Let kvm_set_cr3 perform the update, removing it from the vcpu_run fast path. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Acked-by: Acked-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Use shared msr infrastructureAvi Kivity2009-12-03
| | | | | | | | | | | | | | Instead of reloading syscall MSRs on every preemption, use the new shared msr infrastructure to reload them at the last possible minute (just before exit to userspace). Improves vcpu/idle/vcpu switches by about 2000 cycles (when EFER needs to be reloaded as well). [jan: fix slot index missing indirection] Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Move MSR_KERNEL_GS_BASE out of the vmx autoload msr areaAvi Kivity2009-12-03
| | | | | | | | | Currently MSR_KERNEL_GS_BASE is saved and restored as part of the guest/host msr reloading. Since we wish to lazy-restore all the other msrs, save and reload MSR_KERNEL_GS_BASE explicitly instead of using the common code. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Use macros instead of hex value on cr0 initializationEduardo Habkost2009-12-03
| | | | | | | This should have no effect, it is just to make the code clearer. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: fix handle_pause declarationMarcelo Tosatti2009-12-03
| | | | | | There's no kvm_run argument anymore. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: VMX: Add support for Pause-Loop ExitingZhai, Edwin2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | New NHM processors will support Pause-Loop Exiting by adding 2 VM-execution control fields: PLE_Gap - upper bound on the amount of time between two successive executions of PAUSE in a loop. PLE_Window - upper bound on the amount of time a guest is allowed to execute in a PAUSE loop If the time, between this execution of PAUSE and previous one, exceeds the PLE_Gap, processor consider this PAUSE belongs to a new loop. Otherwise, processor determins the the total execution time of this loop(since 1st PAUSE in this loop), and triggers a VM exit if total time exceeds the PLE_Window. * Refer SDM volume 3b section 21.6.13 & 22.1.3. Pause-Loop Exiting can be used to detect Lock-Holder Preemption, where one VP is sched-out after hold a spinlock, then other VPs for same lock are sched-in to waste the CPU time. Our tests indicate that most spinlocks are held for less than 212 cycles. Performance tests show that with 2X LP over-commitment we can get +2% perf improvement for kernel build(Even more perf gain with more LPs). Signed-off-by: Zhai Edwin <edwin.zhai@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>