path: root/arch/x86/kvm
Commit message (Author; Age)
...
* KVM: MMU: Emulate #PF error code of reserved bits violation (Dong, Eddie; 2009-06-10)
| Detect, indicate, and propagate page faults where reserved bits are set. Take care to handle the different paging modes, each of which has different sets of reserved bits.
|
| [avi: fix pte reserved bits for efer.nxe=0]
|
| Signed-off-by: Eddie Dong <eddie.dong@intel.com>
| Signed-off-by: Avi Kivity <avi@redhat.com>
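The reserved-bit check described above can be sketched in user-space C. This is only an illustration, not the code from the patch: it assumes a 4 KiB PTE in long-mode paging, where bits 51:MAXPHYADDR are reserved and bit 63 (NX) is reserved while EFER.NXE=0. The helper names `rsvd_bits`, `pte_reserved_mask`, and `reserved_fault_error_code` are hypothetical; the error-code bit positions (P = bit 0, RSVD = bit 3) are the architectural ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 page-fault error code bits. */
#define PFERR_PRESENT_MASK (1u << 0) /* fault was caused by a present entry */
#define PFERR_RSVD_MASK    (1u << 3) /* a reserved bit was set in an entry */

#define PTE_PRESENT (1ULL << 0)
#define PTE_NX      (1ULL << 63)

/* Mask with bits lo..hi (inclusive) set; empty when hi < lo. */
static uint64_t rsvd_bits(int lo, int hi)
{
    if (hi < lo)
        return 0;
    return ((1ULL << (hi - lo + 1)) - 1) << lo;
}

/* Hypothetical helper: reserved bits of a long-mode 4 KiB PTE. */
static uint64_t pte_reserved_mask(int maxphyaddr, bool nxe)
{
    uint64_t mask = rsvd_bits(maxphyaddr, 51);

    if (!nxe)
        mask |= PTE_NX; /* NX is reserved while EFER.NXE=0 */
    return mask;
}

/* Returns the #PF error code bits for a reserved-bit violation, or 0. */
static uint32_t reserved_fault_error_code(uint64_t pte, int maxphyaddr, bool nxe)
{
    /* Reserved bits are only checked in entries that are present. */
    if ((pte & PTE_PRESENT) && (pte & pte_reserved_mask(maxphyaddr, nxe)))
        return PFERR_PRESENT_MASK | PFERR_RSVD_MASK;
    return 0;
}
```

With MAXPHYADDR = 36, a present PTE with bit 63 set faults with P and RSVD set when NXE=0, and does not fault on that account when NXE=1.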
* KVM: MMU: Fix comment in page_fault() (Eddie Dong; 2009-06-10)
| The original one is for the code before refactoring.
|
| Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com>
| Signed-off-by: Sheng Yang <sheng@linux.intel.com>
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Correct wrong vmcs field sizes (Sheng Yang; 2009-06-10)
| EXIT_QUALIFICATION and GUEST_LINEAR_ADDRESS are natural width, not 64-bit.
|
| Signed-off-by: Sheng Yang <sheng@linux.intel.com>
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Make flexpriority module parameter reflect hardware capability (Avi Kivity; 2009-06-10)
| If the hardware does not support flexpriority, zero the module parameter.
|
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Fix interrupt unhalting a vcpu when it shouldn't (Gleb Natapov; 2009-06-10)
| kvm_vcpu_block() unhalts the vcpu on an interrupt/timer without checking whether the interrupt window is actually open.
|
| Signed-off-by: Gleb Natapov <gleb@redhat.com>
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Timer event should not unconditionally unhalt vcpu. (Gleb Natapov; 2009-06-10)
| Currently timer events are processed before entering guest mode. Move this processing to the main vcpu event loop, since timer events should be processed even while the vcpu is halted. A timer may cause an interrupt/nmi to be injected, and only then will the vcpu be unhalted.
|
| Signed-off-by: Gleb Natapov <gleb@redhat.com>
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Fold vm_need_ept() into callers (Avi Kivity; 2009-06-10)
| Trivial.
|
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Zero ept module parameter if ept is not present (Avi Kivity; 2009-06-10)
| Allows reading back hardware capability.
|
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Zero the vpid module parameter if vpid is not supported (Avi Kivity; 2009-06-10)
| This allows reading back how the hardware is configured.
|
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Annotate module parameters as __read_mostly (Avi Kivity; 2009-06-10)
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Simplify module parameter names (Avi Kivity; 2009-06-10)
| Instead of 'enable_vpid=1', use a simple 'vpid=1'.
|
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Rename kvm_handle_exit() to vmx_handle_exit() (Avi Kivity; 2009-06-10)
| It is a static vmx-specific function.
|
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: VMX: Make module parameters readable (Avi Kivity; 2009-06-10)
| Useful to see how the module was loaded.
|
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: reuse (pop|push)_irq from svm.c in vmx.c (Gleb Natapov; 2009-06-10)
| The prioritized bit vector manipulation functions are useful in both vmx and svm.
|
| Signed-off-by: Gleb Natapov <gleb@redhat.com>
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: SVM: Remove duplicate code in svm_do_inject_vector() (Gleb Natapov; 2009-06-10)
| svm_do_inject_vector() reimplements pop_irq().
|
| Signed-off-by: Gleb Natapov <gleb@redhat.com>
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86: Ignore reads to EVNTSEL MSRs (Amit Shah; 2009-06-10)
| We ignore writes to the performance counters and performance event selector registers already. Kaspersky antivirus reads the eventsel MSR causing it to crash with the current behaviour.
|
| Return 0 as data when the eventsel registers are read to stop the crash.
|
| Signed-off-by: Amit Shah <amit.shah@redhat.com>
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: do not free active mmu pages in free_mmu_pages() (Gleb Natapov; 2009-06-10)
| free_mmu_pages() should only undo what alloc_mmu_pages() does. Free mmu pages from the generic VM destruction function, kvm_destroy_vm().
|
| Signed-off-by: Gleb Natapov <gleb@redhat.com>
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Device assignment framework rework (Sheng Yang; 2009-06-10)
| After discussion with Marcelo, we decided to rework the device assignment framework together. The old problem is that the kernel logic is unnecessarily complex, so Marcelo suggested splitting it in a more elegant way:
|
| 1. Split host IRQ assignment and guest IRQ assignment, and let userspace determine the combination. Also discard the msi2intx parameter; userspace can specify KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_INTX in assigned_irq->flags to enable MSI to INTx conversion.
|
| 2. Split assigning an IRQ and deassigning an IRQ. Introduce two new ioctls: KVM_ASSIGN_DEV_IRQ and KVM_DEASSIGN_DEV_IRQ.
|
| This patch also fixes the reversed _IOR vs _IOW in the definition (by deprecating the old interface).
|
| [avi: replace homemade bitcount() by hweight_long()]
|
| Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| Signed-off-by: Sheng Yang <sheng@linux.intel.com>
| Signed-off-by: Avi Kivity <avi@redhat.com>
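As a rough user-space illustration of the split host/guest flag scheme, the sketch below checks that a flags word selects at most one host IRQ type and at most one guest IRQ type by counting set bits, which is the job hweight_long() does in the kernel. The numeric flag values, the mask layout, and the "at most one of each" rule are assumptions made for this example; only the names KVM_DEV_IRQ_HOST_MSI and KVM_DEV_IRQ_GUEST_INTX come from the commit message, and `popcount32` and `assign_flags_valid` are hypothetical helpers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values and masks are assumptions for this sketch. */
#define KVM_DEV_IRQ_HOST_INTX  (1u << 0)
#define KVM_DEV_IRQ_HOST_MSI   (1u << 1)
#define KVM_DEV_IRQ_GUEST_INTX (1u << 8)
#define KVM_DEV_IRQ_GUEST_MSI  (1u << 9)
#define KVM_DEV_IRQ_HOST_MASK  0x00ffu
#define KVM_DEV_IRQ_GUEST_MASK 0xff00u

/* Portable population count, standing in for the kernel's hweight_long(). */
static unsigned int popcount32(uint32_t v)
{
    unsigned int n = 0;

    while (v) {
        v &= v - 1; /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Assumed rule: at most one host type and at most one guest type per call. */
static bool assign_flags_valid(uint32_t flags)
{
    return popcount32(flags & KVM_DEV_IRQ_HOST_MASK) <= 1 &&
           popcount32(flags & KVM_DEV_IRQ_GUEST_MASK) <= 1;
}

/* MSI on the host side delivered as INTx to the guest. */
static bool is_msi_to_intx(uint32_t flags)
{
    return flags == (KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_INTX);
}
```

Under these assumptions, the combination that replaces the old msi2intx parameter passes validation, while a request naming two host types at once is rejected.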
* KVM: make 'lapic_timer_ops' and 'kpit_ops' static (Hannes Eder; 2009-06-10)
| Fix these sparse warnings:
|   arch/x86/kvm/lapic.c:916:22: warning: symbol 'lapic_timer_ops' was not declared. Should it be static?
|   arch/x86/kvm/i8254.c:268:22: warning: symbol 'kpit_ops' was not declared. Should it be static?
|
| Signed-off-by: Hannes Eder <hannes@hanneseder.net>
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: APIC: get rid of deliver_bitmask (Gleb Natapov; 2009-06-10)
| Deliver interrupt during destination matching loop.
|
| Signed-off-by: Gleb Natapov <gleb@redhat.com>
| Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>
| Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: change the way how lowest priority vcpu is calculated (Gleb Natapov; 2009-06-10)
| The new way does not require an additional loop over vcpus to find the one with the lowest priority, as one is chosen during delivery bitmap construction.
|
| Signed-off-by: Gleb Natapov <gleb@redhat.com>
| Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: consolidate ioapic/ipi interrupt delivery logic (Gleb Natapov; 2009-06-10)
| Use kvm_apic_match_dest() in kvm_get_intr_delivery_bitmask() instead of duplicating the same code. Use kvm_get_intr_delivery_bitmask() in apic_send_ipi() to figure out ipi destination instead of reimplementing the logic.
|
| Signed-off-by: Gleb Natapov <gleb@redhat.com>
| Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: APIC: kvm_apic_set_irq deliver all kinds of interrupts (Gleb Natapov; 2009-06-10)
| Get rid of ioapic_inj_irq() and ioapic_inj_nmi() functions.
|
| Signed-off-by: Gleb Natapov <gleb@redhat.com>
| Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: MMU: remove call to kvm_mmu_pte_write from walk_addr (Joerg Roedel; 2009-06-10)
| There is no reason to update the shadow pte here because the guest pte is only changed to dirty state.
|
| Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
| Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: unify part of generic timer handling (Marcelo Tosatti; 2009-06-10)
| Hide the internals of vcpu awakening / injection from the in-kernel emulated timers. This makes future changes in this logic easier and decreases the distance to more generic timer handling.
|
| Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: PIT: remove usage of count_load_time for channel 0 (Marcelo Tosatti; 2009-06-10)
| We can infer elapsed time from hrtimer_expires_remaining.
|
| Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: PIT: remove unused scheduled variable (Marcelo Tosatti; 2009-06-10)
| Unused.
|
| Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86: silence preempt warning on kvm_write_guest_time (Matt T. Yourst; 2009-06-10)
| This issue just appeared in kvm-84 when running on 2.6.28.7 (x86-64) with PREEMPT enabled.
|
| We're getting syslog warnings like this many (but not all) times qemu tells KVM to run the VCPU:
|
|   BUG: using smp_processor_id() in preemptible [00000000] code: qemu-system-x86/28938
|   caller is kvm_arch_vcpu_ioctl_run+0x5d1/0xc70 [kvm]
|   Pid: 28938, comm: qemu-system-x86 2.6.28.7-mtyrel-64bit
|   Call Trace:
|     debug_smp_processor_id+0xf7/0x100
|     kvm_arch_vcpu_ioctl_run+0x5d1/0xc70 [kvm]
|     ? __wake_up+0x4e/0x70
|     ? wake_futex+0x27/0x40
|     kvm_vcpu_ioctl+0x2e9/0x5a0 [kvm]
|     enqueue_hrtimer+0x8a/0x110
|     _spin_unlock_irqrestore+0x27/0x50
|     vfs_ioctl+0x31/0xa0
|     do_vfs_ioctl+0x74/0x480
|     sys_futex+0xb4/0x140
|     sys_ioctl+0x99/0xa0
|     system_call_fastpath+0x16/0x1b
|
| As it turns out, the call trace is messed up due to gcc's inlining, but I isolated the problem anyway: kvm_write_guest_time() is being used in a non-thread-safe manner on preemptible kernels.
|
| Basically kvm_write_guest_time()'s body needs to be surrounded by preempt_disable() and preempt_enable(), since the kernel won't let us query any per-CPU data (indirectly using smp_processor_id()) without preemption disabled. The attached patch fixes this issue by disabling preemption inside kvm_write_guest_time().
|
| [marcelo: surround only __get_cpu_var calls since the warning is harmless]
|
| Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: bit ops for deliver_bitmap (Sheng Yang; 2009-06-10)
| It's also convenient when we extend KVM supported vcpu number in the future.
|
| Signed-off-by: Sheng Yang <sheng@linux.intel.com>
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Update intr delivery func to accept unsigned long* bitmap (Sheng Yang; 2009-06-10)
| Would be used with bit ops, and would be easily extended if KVM_MAX_VCPUS is increased.
|
| Signed-off-by: Sheng Yang <sheng@linux.intel.com>
| Signed-off-by: Avi Kivity <avi@redhat.com>
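The appeal of an `unsigned long *` bitmap with bit ops is that the same helpers work for any number of bits. The following is a user-space sketch of that idea, not the kernel's implementation; the names `bitmap_set_bit`, `bitmap_test_bit`, and `bitmap_weight` are hypothetical stand-ins for the kernel's set_bit()/test_bit() family.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Set bit nr in a bitmap stored as an array of unsigned long. */
static void bitmap_set_bit(unsigned long *map, size_t nr)
{
    map[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

/* Test bit nr in the bitmap. */
static bool bitmap_test_bit(const unsigned long *map, size_t nr)
{
    return (map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}

/* Count the set bits among the first nbits bits. */
static size_t bitmap_weight(const unsigned long *map, size_t nbits)
{
    size_t i, n = 0;

    for (i = 0; i < nbits; i++)
        if (bitmap_test_bit(map, i))
            n++;
    return n;
}
```

A caller sizes the array with `BITS_TO_LONGS(max_vcpus)`, so raising the vcpu limit past the width of a single word only changes the array length, not the delivery code.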
* KVM: VMX: Don't intercept MSR_KERNEL_GS_BASE (Avi Kivity; 2009-06-10)
| Windows 2008 accesses this MSR often on context switch intensive workloads; since we run in guest context with the guest MSR value loaded (so swapgs can work correctly), we can simply disable interception of rdmsr/wrmsr for this MSR.
|
| A complication occurs since in legacy mode, we run with the host MSR value loaded. In this case we enable interception. This means we need two MSR bitmaps, one for legacy mode and one for long mode.
|
| Signed-off-by: Avi Kivity <avi@redhat.com>
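The two-bitmap arrangement can be modelled in user space as below. This is a sketch under stated assumptions rather than the patch itself: it uses the VMX MSR-bitmap layout (a 4 KiB page with read-low at 0x000, read-high at 0x400, write-low at 0x800, write-high at 0xc00, where a set bit means "intercept"), starts both bitmaps fully intercepting for simplicity, and the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define MSR_BITMAP_SIZE    4096
#define MSR_KERNEL_GS_BASE 0xc0000102u

/* Clear bit nr in a byte-addressed bitmap. */
static void clear_bit8(uint8_t *base, uint32_t nr)
{
    base[nr / 8] &= (uint8_t)~(1u << (nr % 8));
}

/* Stop intercepting both rdmsr and wrmsr for one MSR. */
static void disable_intercept_for_msr(uint8_t *bitmap, uint32_t msr)
{
    if (msr <= 0x1fffu) {
        clear_bit8(bitmap + 0x000, msr); /* read-low */
        clear_bit8(bitmap + 0x800, msr); /* write-low */
    } else if (msr >= 0xc0000000u && msr <= 0xc0001fffu) {
        msr &= 0x1fffu;
        clear_bit8(bitmap + 0x400, msr); /* read-high */
        clear_bit8(bitmap + 0xc00, msr); /* write-high */
    }
}

/* Assumption: intercept everything by default, then open up long mode only. */
static void init_msr_bitmaps(uint8_t *legacy, uint8_t *longmode)
{
    memset(legacy, 0xff, MSR_BITMAP_SIZE);
    memset(longmode, 0xff, MSR_BITMAP_SIZE);
    disable_intercept_for_msr(longmode, MSR_KERNEL_GS_BASE);
}

/* Pick the bitmap matching the guest's current mode. */
static const uint8_t *active_msr_bitmap(const uint8_t *legacy,
                                        const uint8_t *longmode,
                                        int guest_in_long_mode)
{
    return guest_in_long_mode ? longmode : legacy;
}
```

Keeping two prebuilt pages means a mode switch only has to repoint at the other bitmap instead of rewriting bits on every transition.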
* KVM: VMX: Don't use highmem pages for the msr and pio bitmaps (Avi Kivity; 2009-06-10)
| Highmem pages are a pain, and saving three lowmem pages on i386 isn't worth the extra code.
|
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Fix PDPTR reloading on CR4 writes (Avi Kivity; 2009-05-25)
| The processor is documented to reload the PDPTRs while in PAE mode if any of the CR4 bits PSE, PGE, or PAE change. Linux relies on this behaviour when zapping the low mappings of PAE kernels during boot.
|
| The code already handled changes to CR4.PAE; augment it to also notice changes to PSE and PGE.
|
| This triggered while booting an F11 PAE kernel; the futex initialization code runs before any CR3 reloads and writes to a NULL pointer; the futex subsystem ended up uninitialized, killing PI futexes and pulseaudio which uses them.
|
| Cc: stable@kernel.org
| Signed-off-by: Avi Kivity <avi@redhat.com>
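The rule stated above can be written as a small predicate. This is a sketch, not the kernel code: the CR4 bit positions (PSE = bit 4, PAE = bit 5, PGE = bit 7) are architectural, while the function name and the explicit `paging_enabled`/`long_mode` parameters are assumptions made to keep the example self-contained.

```c
#include <stdbool.h>

#define X86_CR4_PSE (1UL << 4)
#define X86_CR4_PAE (1UL << 5)
#define X86_CR4_PGE (1UL << 7)

/*
 * Decide whether a CR4 write must reload the PDPTRs.
 * PDPTRs only exist in PAE paging, i.e. paging on, CR4.PAE set,
 * and the cpu not in long mode.
 */
static bool cr4_write_needs_pdptr_reload(unsigned long old_cr4,
                                         unsigned long new_cr4,
                                         bool paging_enabled,
                                         bool long_mode)
{
    const unsigned long pdptr_bits = X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE;

    if (!paging_enabled || long_mode)
        return false;
    if (!(new_cr4 & X86_CR4_PAE))
        return false;
    /* Reload if any of the three bits changed, not only PAE. */
    return ((old_cr4 ^ new_cr4) & pdptr_bits) != 0;
}
```

The case the fix adds is the one where PAE stays set and only PGE or PSE toggles, which a PAE-only comparison would miss.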
* KVM: Make paravirt tlb flush also reload the PAE PDPTRs (Avi Kivity; 2009-05-25)
| The paravirt tlb flush may be used not only to flush TLBs, but also to reload the four page-directory-pointer-table entries, as it is used as a replacement for reloading CR3. Change the code to do the entire CR3 reloading dance instead of simply flushing the TLB.
|
| Cc: stable@kernel.org
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: SVM: Remove port 80 passthrough (Avi Kivity; 2009-05-11)
| KVM optimizes guest port 80 accesses by passing them through to the host. Some AMD machines die on port 80 writes, allowing the guest to hard-lock the host.
|
| Remove the port passthrough to avoid the problem.
|
| Cc: stable@kernel.org
| Reported-by: Piotr Jaroszyński <p.jaroszynski@gmail.com>
| Tested-by: Piotr Jaroszyński <p.jaroszynski@gmail.com>
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Make EFER reads safe when EFER does not exist (Avi Kivity; 2009-05-11)
| Some processors don't have EFER; don't oops if userspace wants us to read EFER when we check NX.
|
| Cc: stable@kernel.org
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Fix NX support reporting (Avi Kivity; 2009-05-11)
| NX support is bit 20, not bit 1.
|
| Signed-off-by: Avi Kivity <avi@redhat.com>
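For reference, NX is reported in bit 20 of EDX for CPUID leaf 0x80000001. The check can be sketched as a pure function over an EDX value; the function name is hypothetical, and the caller is assumed to have obtained EDX from that leaf by whatever means is available.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_EXT_FEATURES_LEAF 0x80000001u /* extended feature leaf */
#define CPUID_EDX_NX_BIT        20          /* NX / execute-disable */

/* True if the EDX value from the extended feature leaf reports NX. */
static bool edx_reports_nx(uint32_t edx)
{
    return (edx >> CPUID_EDX_NX_BIT) & 1u;
}
```

Testing bit 1 instead would read an unrelated bit and misreport NX support in both directions.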
* KVM: SVM: Fix cross vendor migration issue with unusable bit (Andre Przywara; 2009-05-11)
| AMD's VMCB does not have an explicit unusable segment descriptor field, so we emulate it by using "not present". This has to be set up before the fixups, because this field is used there.
|
| Signed-off-by: Andre Przywara <andre.przywara@amd.com>
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Unregister cpufreq notifier on unload (Jan Kiszka; 2009-04-22)
| Properly unregister cpufreq notifier on unload if it was registered during init.
|
| Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86: release time_page on vcpu destruction (Joerg Roedel; 2009-04-22)
| Not releasing the time_page causes a leak of that page or the compound page it is situated in.
|
| Cc: stable@kernel.org
| Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
| Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: MMU: disable global page optimization (Marcelo Tosatti; 2009-04-22)
| The complexity of fixing it is not worth the gains, as discussed in http://article.gmane.org/gmane.comp.emulators.kvm.devel/28649.
|
| Cc: stable@kernel.org
| Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| Signed-off-by: Avi Kivity <avi@redhat.com>
* Merge branch 'tracing/core-v2' into tracing-for-linus (Ingo Molnar; 2009-04-01)
|\
| | Conflicts:
| |   include/linux/slub_def.h
| |   lib/Kconfig.debug
| |   mm/slob.c
| |   mm/slub.c
| * Merge branch 'linus' into tracing/blktrace (Ingo Molnar; 2009-02-19)
| |\
| | | Conflicts:
| | |   block/blktrace.c
| | |
| | | Semantic merge:
| | |   kernel/trace/blktrace.c
| | |
| | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * \ Merge branch 'linus' into tracing/kmemtrace2 (Ingo Molnar; 2009-01-06)
| |\ \
| * | | tracing, kvm: change MARKERS to select instead of depends on (Ingo Molnar; 2008-12-30)
| | | | Impact: build fix
| | | |
| | | | fix:
| | | |   kernel/trace/Kconfig:42:error: found recursive dependency: TRACING -> TRACEPOINTS -> MARKERS -> KVM_TRACE -> RELAY -> KMEMTRACE -> TRACING
| | | |
| | | | markers is a facility that should be selected - not depended on by an interactive Kconfig entry.
| | | |
| | | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | KVM: VMX: Don't allow uninhibited access to EFER on i386 (Avi Kivity; 2009-03-24)
| | | | vmx_set_msr() does not allow i386 guests to touch EFER, but they can still do so through the default: label in the switch. If they set EFER_LME, they can oops the host.
| | | |
| | | | Fix by having EFER access through the normal channel (which will check for EFER_LME) even on i386.
| | | |
| | | | Reported-and-tested-by: Benjamin Gilbert <bgilbert@cs.cmu.edu>
| | | | Cc: stable@kernel.org
| | | | Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: Fix missing smp tlb flush in invlpg (Andrea Arcangeli; 2009-03-24)
| | | | When kvm emulates an invlpg instruction, it can drop a shadow pte, but leaves the guest tlbs intact. This can cause memory corruption when swapping out.
| | | |
| | | | Without this, the other cpu can still write to a freed host physical page. If rmap_remove is called, the smp tlb flush must always happen before mmu_lock is released, because the VM will take the mmu_lock before it can finally add the page to the freelist after swapout. The mmu notifier makes it safe to flush the tlb after freeing the page (otherwise it would never be safe), so we can do a single flush for multiple sptes invalidated.
| | | |
| | | | Cc: stable@kernel.org
| | | | Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
| | | | Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
| | | | Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: fix sparse warnings: Should it be static? (Hannes Eder; 2009-03-24)
| | | | Impact: Make symbols static.
| | | |
| | | | Fix these sparse warnings:
| | | |   arch/x86/kvm/mmu.c:992:5: warning: symbol 'mmu_pages_add' was not declared. Should it be static?
| | | |   arch/x86/kvm/mmu.c:1124:5: warning: symbol 'mmu_pages_next' was not declared. Should it be static?
| | | |   arch/x86/kvm/mmu.c:1144:6: warning: symbol 'mmu_pages_clear_parents' was not declared. Should it be static?
| | | |   arch/x86/kvm/x86.c:2037:5: warning: symbol 'kvm_read_guest_virt' was not declared. Should it be static?
| | | |   arch/x86/kvm/x86.c:2067:5: warning: symbol 'kvm_write_guest_virt' was not declared. Should it be static?
| | | |   virt/kvm/irq_comm.c:220:5: warning: symbol 'setup_routing_entry' was not declared. Should it be static?
| | | |
| | | | Signed-off-by: Hannes Eder <hannes@hanneseder.net>
| | | | Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: fix sparse warnings: context imbalance (Hannes Eder; 2009-03-24)
| | | | Impact: Attribute function with __acquires(...) resp. __releases(...).
| | | |
| | | | Fix these sparse warnings:
| | | |   arch/x86/kvm/i8259.c:34:13: warning: context imbalance in 'pic_lock' - wrong count at exit
| | | |   arch/x86/kvm/i8259.c:39:13: warning: context imbalance in 'pic_unlock' - unexpected unlock
| | | |
| | | | Signed-off-by: Hannes Eder <hannes@hanneseder.net>
| | | | Signed-off-by: Avi Kivity <avi@redhat.com>
* | | | KVM: is_long_mode() should check for EFER.LMA (Amit Shah; 2009-03-24)
| | | | is_long_mode currently checks the LongModeEnable bit in EFER instead of the LongModeActive bit. This is wrong, but we survived this till now since it wasn't triggered.
| | | |
| | | | This breaks guests that go from long mode to compatibility mode.
| | | |
| | | | This is noticed on a solaris guest and fixes bug #1842160
| | | |
| | | | Signed-off-by: Amit Shah <amit.shah@qumranet.com>
| | | | Signed-off-by: Avi Kivity <avi@qumranet.com>
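The difference between the two EFER bits can be shown with a short sketch. The bit positions are architectural (LME = bit 8, LMA = bit 10); the functions take the EFER value directly instead of a vcpu, and `is_long_mode_lme` is a hypothetical name for the old, incorrect check, included only for contrast.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFER_LME (1ULL << 8)  /* long mode enable: software request */
#define EFER_LMA (1ULL << 10) /* long mode active: set by the cpu once paging is on */

/* Correct check: the cpu is in long mode only when LMA is set. */
static bool is_long_mode(uint64_t efer)
{
    return (efer & EFER_LMA) != 0;
}

/* Old behaviour, for contrast: LME alone does not mean long mode is active. */
static bool is_long_mode_lme(uint64_t efer)
{
    return (efer & EFER_LME) != 0;
}
```

The two checks disagree whenever LME is set but LMA is clear, for example after long mode has been enabled but before paging is turned on.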