path: root/arch/x86/kvm
Commit message / Author / Age
...
* KVM: x86 emulator: simplify sib decodingAvi Kivity2008-07-20
| | | | | | Instead of using sparse switches, use simpler if/else sequences. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: handle undecoded rex.b with r/m = 5 in certain casesAvi Kivity2008-07-20
| | | | | | x86_64 does not decode rex.b in certain cases, where the r/m field = 5. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: emulate nop and xchg reg, acc (opcodes 0x90 - 0x97)Mohammed Gamal2008-07-20
| | | | | Signed-off-by: Mohammed Gamal <m.gamal005@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Use printk_rlimit() instead of reporting emulation failures just onceAvi Kivity2008-07-20
| | | | | | | Emulation failure reports are useful, so allow more than one per the lifetime of the module. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Do not calculate linear rip in emulation failure reportGlauber Costa2008-07-20
| | | | | | | | If we're not gonna do anything (case in which failure is already reported), we do not need to even bother with calculating the linear rip. Signed-off-by: Glauber Costa <gcosta@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: only abort guest entry if timer count goes from 0->1Marcelo Tosatti2008-07-20
| | | | | | | | | | | | | | | | Only abort guest entry if the timer count went from 0->1, since for 1->2 or larger the bit will either be set already or a timer irq will have been injected. Using atomic_inc_and_test() for it also introduces an SMP barrier to the LAPIC version (I thought it was unnecessary because of timer migration, but the guest can be scheduled to a different pCPU between exit and kvm_vcpu_block(), so there is the possibility of a race). Noticed by Avi. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
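The 0->1 rule described above can be sketched in user-space C. This is only an illustration under stated assumptions: it uses C11 `atomic_fetch_add` (which returns the previous value and acts as a full barrier) rather than the kernel's `atomic_inc_and_test()`, and `sketch_timer` and `sketch_timer_tick` are hypothetical names, not KVM code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-timer pending counter. */
struct sketch_timer {
    atomic_int pending;
};

/*
 * Record one more pending tick. Returns true only when the count went
 * from 0 to 1, the one transition where guest entry would be aborted.
 * For 1->2 and beyond the request is already visible, so nothing to do.
 */
static bool sketch_timer_tick(struct sketch_timer *t)
{
    /* atomic_fetch_add returns the value held before the increment. */
    return atomic_fetch_add(&t->pending, 1) == 0;
}
```

Because the increment and the test are a single atomic operation, two CPUs that tick concurrently cannot both observe the 0->1 transition.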
* KVM: Add coalesced MMIO support (x86 part)Laurent Vivier2008-07-20
| | | | | | | | | This patch enables coalesced MMIO for x86 architecture. It defines KVM_MMIO_PAGE_OFFSET and KVM_CAP_COALESCED_MMIO. It enables the compilation of coalesced_mmio.c. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: kvm_io_device: extend in_range() to manage len and write attributeLaurent Vivier2008-07-20
| | | | | | | | | | Modify member in_range() of structure kvm_io_device to pass length and the type of the I/O (write or read). This modification allows kvm_io_device to be used with coalesced MMIO. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Avoid page prefetch on SVMAvi Kivity2008-07-20
| | | | | | | | SVM cannot benefit from page prefetching since guest page fault bypass cannot be made to work there. Avoid accessing the guest page table in this case. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Move nonpaging_prefetch_page()Avi Kivity2008-07-20
| | | | | | In preparation for next patch. No code change. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: implement 'push imm' (opcode 0x68)Avi Kivity2008-07-20
| | | | | | | Encountered in FC6 boot sequence, now that we don't force ss.rpl = 0 during the protected mode transition. Not really necessary, but nice to have. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: simplify push imm8 emulationAvi Kivity2008-07-20
| | | | | | Instead of fetching the data explicitly, use SrcImmByte. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Optimize prefetch_page()Avi Kivity2008-07-20
| | | | | | | Instead of reading each pte individually, read 256 bytes worth of ptes and batch process them. Signed-off-by: Avi Kivity <avi@qumranet.com>
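A minimal sketch of the batching idea, assuming 8-byte ptes and a present bit in bit 0. `sketch_read_guest` is a hypothetical stand-in for whatever routine copies guest memory, and counting present entries is just a placeholder for the real per-pte work.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BATCH_BYTES 256
#define SKETCH_PTE_PRESENT 0x1ull

/* Hypothetical guest-memory reader: copies len bytes starting at off. */
static void sketch_read_guest(const uint8_t *guest, size_t off,
                              void *dst, size_t len)
{
    memcpy(dst, guest + off, len);
}

/* Count present ptes, issuing one read per 256-byte batch (32 ptes). */
static size_t sketch_count_present(const uint8_t *guest_table, size_t n_ptes)
{
    uint64_t batch[SKETCH_BATCH_BYTES / sizeof(uint64_t)];
    const size_t per_batch = sizeof(batch) / sizeof(batch[0]);
    size_t present = 0;

    for (size_t i = 0; i < n_ptes; i += per_batch) {
        /* The last batch may be shorter than 32 entries. */
        size_t n = n_ptes - i < per_batch ? n_ptes - i : per_batch;

        sketch_read_guest(guest_table, i * sizeof(uint64_t),
                          batch, n * sizeof(uint64_t));
        for (size_t j = 0; j < n; j++)
            if (batch[j] & SKETCH_PTE_PRESENT)
                present++;
    }
    return present;
}
```

For a 512-entry table this issues 16 reads instead of 512, which is where the saving would come from if each read carries a fixed cost.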
* KVM: x86 emulator: Add support for mov r, sreg (0x8c) instructionGuillaume Thouvenin2008-07-20
| | | | | | | | Add support for mov r, sreg (0x8c) instruction Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net> Signed-off-by: Laurent Vivier <laurent.vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: Add support for mov seg, r (0x8e) instructionGuillaume Thouvenin2008-07-20
| | | | | | | | | | Add support for mov seg, r (0x8e) instruction. [avi: drop the sreg decoding table in favor of 1:1 encoding] Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net> Signed-off-by: Laurent Vivier <laurent.vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: adds support to mov r,imm (opcode 0xb8) instructionGuillaume Thouvenin2008-07-20
| | | | | | | | Add support to mov r, imm (0xb8) instruction. Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net> Signed-off-by: Laurent Vivier <laurent.vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: add support for jmp far 0xeaGuillaume Thouvenin2008-07-20
| | | | | | | | Add support for jmp far (opcode 0xea) instruction. Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net> Signed-off-by: Laurent Vivier <laurent.vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: Update c->dst.bytes in decode instructionGuillaume Thouvenin2008-07-20
| | | | | | | | | | Update c->dst.bytes in decode instruction instead of instruction itself. It's needed because if c->dst.bytes is equal to 0, the instruction is not emulated. Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net> Signed-off-by: Laurent Vivier <laurent.vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Prefixes segment functions that will be exported with "kvm_"Guillaume Thouvenin2008-07-20
| | | | | | | | | | Prefix functions that will be exported with kvm_. We also prefixed set_segment(), even though it is still static, for consistency. Signed-off-by: Guillaume Thouvenin <guillaume.thouvenin@ext.bull.net> Signed-off-by: Laurent Vivier <laurent.vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MTRR supportAvi Kivity2008-07-20
| | | | | | | Add emulation for the memory type range registers, needed by VMware esx 3.5, and by pci device assignment. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: VMX: Enable NMI with in-kernel irqchipSheng Yang2008-07-20
| | | | | Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: IOAPIC/LAPIC: Enable NMI supportSheng Yang2008-07-20
| | | | | | | [avi: fix ia64 build breakage] Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Remove unnecessary ->decache_regs() callAvi Kivity2008-07-20
| | | | | | | Since we aren't modifying any register, there's no need to decache the register state. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Remove decache_vcpus_on_cpu() and related callbacksAvi Kivity2008-07-20
| | | | | | Obsoleted by the vmx-specific per-cpu list. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: VMX: Add list of potentially locally cached vcpusAvi Kivity2008-07-20
| | | | | | | | | | | | | | | | | | | | | VMX hardware can cache the contents of a vcpu's vmcs. This cache needs to be flushed when migrating a vcpu to another cpu, or (which is the case that interests us here) when disabling hardware virtualization on a cpu. The current implementation of decaching iterates over the list of all vcpus, picks the ones that are potentially cached on the cpu that is being offlined, and flushes the cache. The problem is that it uses mutex_trylock() to gain exclusive access to the vcpu, which fires off a (benign) warning about using the mutex in an interrupt context. To avoid this, and to make things generally nicer, add a new per-cpu list of potentially cached vcpus. This makes the decaching code much simpler. The list is vmx-specific since other hardware doesn't have this issue. [andrea: fix crash on suspend/resume] Signed-off-by: Andrea Arcangeli <andrea@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Handle virtualization instruction #UD faults during rebootAvi Kivity2008-07-20
| | | | | | | | | | | | | | | KVM turns off hardware virtualization extensions during reboot, in order to disassociate the memory used by the virtualization extensions from the processor, and in order to have the system in a consistent state. Unfortunately virtual machines may still be running while this goes on, and once virtualization extensions are turned off, any virtualization instruction will #UD on execution. Fix by adding an exception handler to virtualization instructions; if we get an exception during reboot, we simply spin waiting for the reset to complete. If it's a true exception, BUG() so we can have our stack trace. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Fix false flooding when a pte points to page tableAvi Kivity2008-07-20
| | | | | | | | | | | | | | | | | | | | | | The KVM MMU tries to detect when a speculative pte update is not actually used by demand fault, by checking the accessed bit of the shadow pte. If the shadow pte has not been accessed, we deem that page table flooded and remove the shadow page table, allowing further pte updates to proceed without emulation. However, if the pte itself points at a page table and is only used for write operations, the accessed bit will never be set since all accesses will happen through the emulator. This is exactly what happens with kscand on old (2.4.x) HIGHMEM kernels. The kernel points a kmap_atomic() pte at a page table, and then proceeds with read-modify-write operations to look at the dirty and accessed bits. We get a false flood trigger on the kmap ptes, which results in the mmu spending all its time setting up and tearing down shadows. Fix by setting the shadow accessed bit on emulated accesses. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: VMX: Trivial vmcs_write64() code simplificationAvi Kivity2008-07-20
| | | | Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: SVM: Fake MSR_K7 performance countersChris Lalancette2008-07-20
| | | | | | | | | | | | | | | | | | | | | | Attached is a patch that fixes a guest crash when booting older Linux kernels. The problem stems from the fact that we are currently emulating MSR_K7_EVNTSEL[0-3], but not emulating MSR_K7_PERFCTR[0-3]. Because of this, setup_k7_watchdog() in the Linux kernel receives a GPF when it attempts to write into MSR_K7_PERFCTR, which causes an OOPs. The patch fixes it by just "fake" emulating the appropriate MSRs, throwing away the data in the process. This causes the NMI watchdog to not actually work, but it's not such a big deal in a virtualized environment. When we get a write to one of these counters, we printk_ratelimit() a warning. I decided to print it out for all writes, even if the data is 0; it doesn't seem to make sense to me to special case when data == 0. Tested by myself on a RHEL-4 guest, and Joerg Roedel on a Windows XP 64-bit guest. Signed-off-by: Chris Lalancette <clalance@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
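The "fake" emulation described above amounts to accepting the write and discarding the data. The sketch below illustrates that shape only; `sketch_set_msr` is a hypothetical handler, the return convention (0 for handled, 1 for "would fault") is an assumption, and the MSR index values are given as I recall them and should be treated as illustrative. The warning here is not rate limited, unlike the printk_ratelimit() use the commit mentions.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumed index range for the four K7 performance counter MSRs. */
#define SKETCH_MSR_K7_PERFCTR0 0xc0010004u
#define SKETCH_MSR_K7_PERFCTR3 0xc0010007u

/*
 * Hypothetical MSR write handler. Returns 0 when the write is accepted
 * and 1 when the guest would receive a fault instead.
 */
static int sketch_set_msr(uint32_t index, uint64_t data)
{
    if (index >= SKETCH_MSR_K7_PERFCTR0 && index <= SKETCH_MSR_K7_PERFCTR3) {
        /* Warn for every write, including data == 0, then drop the data. */
        fprintf(stderr, "sketch: ignored wrmsr 0x%x data 0x%llx\n",
                (unsigned)index, (unsigned long long)data);
        return 0;
    }
    return 1; /* not emulated in this sketch */
}
```

Accepting the write keeps the guest's watchdog setup from faulting, at the price that the counters never count.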
* KVM: PIT: support mode 3Aurelien Jarno2008-07-20
| | | | | | | | | | | | | The in-kernel PIT emulation ignores pending timers if operating under mode 3, which for example Hurd uses. This mode should output a square wave, high for (N+1)/2 counts and low for (N-1)/2 counts. As we only care about the resulting interrupts, the period is N, and mode 3 is the same as mode 2 with regard to interrupts. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
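As a worked example of the arithmetic, the sketch below splits a count N into high and low phases. The (N+1)/2 and (N-1)/2 split in the text is exact for odd N; for even N this sketch assumes two equal halves of N/2, which is my reading of general 8254 behavior and not something the commit states. The names are hypothetical.

```c
#include <stdint.h>

/* Phases of one mode 3 output cycle, in input clock counts. */
struct sketch_square_wave {
    uint32_t high_counts;
    uint32_t low_counts;
    uint32_t period;
};

static struct sketch_square_wave sketch_mode3(uint32_t n)
{
    struct sketch_square_wave w;

    w.high_counts = (n + 1) / 2;        /* (N+1)/2 for odd N, N/2 for even N */
    w.low_counts = n - w.high_counts;   /* (N-1)/2 for odd N, N/2 for even N */
    w.period = w.high_counts + w.low_counts; /* always N */
    return w;
}
```

With N = 5 this gives 3 counts high and 2 counts low, so one rising edge every 5 counts, the same interrupt rate mode 2 would produce for the same N.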
* KVM: SVM: add tracing support for TDP page faultsJoerg Roedel2008-07-20
| | | | | | | | To distinguish between real page faults and nested page faults they should be traced as different events. This is implemented by this patch. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: SVM: add missing kvmtrace markersJoerg Roedel2008-07-20
| | | | | | | | This patch adds the missing kvmtrace markers to the svm module of kvm. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: add missing kvmtrace bitsJoerg Roedel2008-07-20
| | | | | | | | This patch adds some kvmtrace bits to the generic x86 code where it is instrumented from SVM. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: SVM: implement dedicated INTR exit handlerJoerg Roedel2008-07-20
| | | | | | | | With an exit handler for INTR intercepts it's possible to account them using kvmtrace. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: SVM: implement dedicated NMI exit handlerJoerg Roedel2008-07-20
| | | | | | | | With an exit handler for NMI intercepts it's possible to account them using kvmtrace. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: VMX: move APIC_ACCESS trace entry to generic codeJoerg Roedel2008-07-20
| | | | | | | | This patch moves the trace entry for APIC accesses from the VMX code to the generic lapic code. This way APIC accesses from SVM will also be traced. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: add statics where possible, function definition in lapic.hHarvey Harrison2008-07-20
| | | | | | | | | | | | | | | | Noticed by sparse: arch/x86/kvm/vmx.c:1583:6: warning: symbol 'vmx_disable_intercept_for_msr' was not declared. Should it be static? arch/x86/kvm/x86.c:3406:5: warning: symbol 'kvm_task_switch_16' was not declared. Should it be static? arch/x86/kvm/x86.c:3429:5: warning: symbol 'kvm_task_switch_32' was not declared. Should it be static? arch/x86/kvm/mmu.c:1968:6: warning: symbol 'kvm_mmu_remove_one_alloc_mmu_page' was not declared. Should it be static? arch/x86/kvm/mmu.c:2014:6: warning: symbol 'mmu_destroy_caches' was not declared. Should it be static? arch/x86/kvm/lapic.c:862:5: warning: symbol 'kvm_lapic_get_base' was not declared. Should it be static? arch/x86/kvm/i8254.c:94:5: warning: symbol 'pit_get_gate' was not declared. Should it be static? arch/x86/kvm/i8254.c:196:5: warning: symbol '__pit_timer_fn' was not declared. Should it be static? arch/x86/kvm/i8254.c:561:6: warning: symbol '__inject_pit_timer_intr' was not declared. Should it be static? Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* on_each_cpu(): kill unused 'retry' parameterJens Axboe2008-06-26
| | | | | | | | | It's not even passed on to smp_call_function() anymore, since that was removed. So kill it. Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* smp_call_function: get rid of the unused nonatomic/retry argumentJens Axboe2008-06-26
| | | | | | | | It's never used and the comments refer to nonatomic and retry interchangeably. So get rid of it. Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* KVM: Make kvm host use the paravirt clocksource structsGerd Hoffmann2008-06-24
| | | | | | | | This patch updates the kvm host code to use the pvclock structs. It also makes the paravirt clock compatible with Xen. Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: VMX: Fix host msr corruption with preemption enabledAvi Kivity2008-06-24
| | | | | | | | | | | | | Switching msrs can occur either synchronously as a result of calls to the msr management functions (usually in response to the guest touching virtualized msrs), or asynchronously when preempting a kvm thread that has guest state loaded. If we're unlucky enough to have the two at the same time, host msrs are corrupted and the machine goes kaput on the next syscall. Most easily triggered by Windows Server 2008, as it does a lot of msr switching during bootup. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Fix oops on guest userspace access to guest pagetableAvi Kivity2008-06-24
| | | | | | | | | | | | | | | | KVM has a heuristic to unshadow guest pagetables when userspace accesses them, on the assumption that most guests do not allow userspace to access pagetables directly. Unfortunately, in addition to unshadowing the pagetables, it also oopses. This never triggers on ordinary guests since sane OSes will clear the pagetables before assigning them to userspace, which will trigger the flood heuristic, unshadowing the pagetables before the first userspace access. One particular guest, though (Xenner) will run the kernel in userspace, triggering the oops. Since the heuristic is incorrect in this case, we can simply remove it. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: large page update_pte issue with non-PAE 32-bit guests (resend)Marcelo Tosatti2008-06-24
| | | | | | | | | | | | | | kvm_mmu_pte_write() does not handle 32-bit non-PAE large page backed guests properly. It will instantiate two 2MB sptes pointing to the same physical 2MB page when a guest large pte update is trapped. Instead of duplicating code to handle this, disallow directory level updates to happen through kvm_mmu_pte_write(), so the two 2MB sptes emulating one guest 4MB pte can be correctly created by the page fault handling path. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
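To make the intended mapping concrete: one guest 4MB page should be backed by two 2MB shadow entries that cover its lower and upper halves, not the same 2MB page twice. The sketch below only computes the two target addresses; the function name and the plain-address representation are hypothetical.

```c
#include <stdint.h>

#define SKETCH_2MB (2ull << 20)
#define SKETCH_4MB (4ull << 20)

/*
 * Given an address inside a 4MB guest page, compute the base addresses
 * that the two 2MB shadow entries should map.
 */
static void sketch_split_4mb(uint64_t addr, uint64_t out[2])
{
    uint64_t base = addr & ~(SKETCH_4MB - 1); /* align down to 4MB */

    out[0] = base;              /* lower half */
    out[1] = base + SKETCH_2MB; /* upper half, not a second copy of out[0] */
}
```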
* KVM: MMU: Fix rmap_write_protect() hugepage iteration bugMarcelo Tosatti2008-06-24
| | | | | | | | rmap_next() does not work correctly after rmap_remove(), as it expects the rmap chains not to change during iteration. Fix (for now) by restarting iteration from the beginning. Signed-off-by: Avi Kivity <avi@qumranet.com>
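The restart-from-the-beginning workaround can be illustrated on an ordinary linked list. This is a generic sketch, not the rmap code: `sketch_node`, `sketch_remove`, and `sketch_drop_writable` are hypothetical, and the `writable` flag merely stands in for whatever condition triggers removal.

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_node {
    int value;
    bool writable;
    struct sketch_node *next;
};

/* Unlink target from the list; cursors into the list are stale afterwards. */
static void sketch_remove(struct sketch_node **head, struct sketch_node *target)
{
    for (struct sketch_node **pp = head; *pp; pp = &(*pp)->next) {
        if (*pp == target) {
            *pp = target->next;
            target->next = NULL;
            return;
        }
    }
}

/* Remove every writable node, restarting from the head after each removal. */
static int sketch_drop_writable(struct sketch_node **head)
{
    int removed = 0;
    struct sketch_node *n = *head;

    while (n) {
        if (n->writable) {
            sketch_remove(head, n);
            removed++;
            n = *head; /* chain changed: restart instead of following n->next */
            continue;
        }
        n = n->next;
    }
    return removed;
}
```

Restarting costs extra passes over the entries already examined, which fits the commit's "for now" wording: it trades efficiency for an iteration that never follows a stale pointer.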
* KVM: close timer injection race window in __vcpu_runMarcelo Tosatti2008-06-24
| | | | | | | | | | | | | | If a timer fires after kvm_inject_pending_timer_irqs() but before local_irq_disable(), the code will enter guest mode and only inject that timer interrupt the next time an unrelated event causes an exit. It would be simpler if the timer->pending irq conversion could be done with IRQs disabled, so that the above problem cannot happen. For now introduce a new vcpu requests bit to cancel guest entry. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Fix race between timer migration and vcpu migrationMarcelo Tosatti2008-06-24
| | | | | | | | | | | | | | | A guest vcpu instance can be scheduled to a different physical CPU between the test for KVM_REQ_MIGRATE_TIMER and local_irq_disable(). If that happens, the timer will only be migrated to the current pCPU on the next exit, meaning that guest LAPIC timer event can be delayed until a host interrupt is triggered. Fix it by cancelling guest entry if any vcpu request is pending. This has the side effect of nicely consolidating vcpu->requests checks. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
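The consolidated check can be pictured as a single test on the whole request word at the last point before entry. The sketch below assumes requests are kept as a bitmask; `sketch_vcpu` and `sketch_try_enter` are hypothetical, and the interrupt-disable step is represented only by a comment.

```c
#include <stdbool.h>

/* Hypothetical vcpu with a bitmask of pending requests. */
struct sketch_vcpu {
    unsigned long requests;
    int entries;
};

/*
 * Final gate before guest entry. In a real entry path interrupts would
 * already be disabled here, so no new request can slip in unnoticed.
 */
static bool sketch_try_enter(struct sketch_vcpu *v)
{
    if (v->requests != 0)
        return false; /* cancel entry; caller handles requests and retries */

    v->entries++; /* stands in for the actual switch to guest mode */
    return true;
}
```

Testing the whole word rather than one bit is what lets a single check cover timer migration and any other pending request.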
* KVM: MMU: Fix is_empty_shadow_page() checkAvi Kivity2008-06-06
| | | | | | The check is only looking at one of two possible empty ptes. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Fix printk() format stringAvi Kivity2008-06-06
| | | | Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: reschedule during shadow teardownAvi Kivity2008-06-06
| | | | | | | Shadows for large guests can take a long time to tear down, so reschedule occasionally to avoid softlockup warnings. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: VMX: Clear CR4.VMXE in hardware_disableEli Collins2008-06-06
| | | | | | | | | | | | | Clear CR4.VMXE in hardware_disable. There's no reason to leave it set after doing a VMXOFF. VMware Workstation 6.5 checks CR4.VMXE as a proxy for whether the CPU is in VMX mode, so leaving VMXE set means we'll refuse to power on. With this change the user can power on after unloading the kvm-intel module. I tested on kvm-67 and kvm-69. Signed-off-by: Eli Collins <ecollins@vmware.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
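At the bit level the change is a single mask operation on the CR4 value. The sketch below works on a plain integer instead of the real control register, with VMXE taken as bit 13 of CR4; the function names are hypothetical, and `sketch_looks_in_vmx` only mimics the "proxy" check the message attributes to VMware Workstation.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CR4_VMXE (1ull << 13)

/* Return the CR4 value with VMXE cleared, as would follow a VMXOFF. */
static uint64_t sketch_clear_vmxe(uint64_t cr4)
{
    return cr4 & ~SKETCH_CR4_VMXE;
}

/* A checker that treats CR4.VMXE as a proxy for "CPU is in VMX mode". */
static bool sketch_looks_in_vmx(uint64_t cr4)
{
    return (cr4 & SKETCH_CR4_VMXE) != 0;
}
```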