aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
...
| | * | | | | | KVM: s390: Fix possible memory leak in SIGP functionsThomas Huth2014-03-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When kvm_get_vcpu() returned NULL for the destination CPU in __sigp_emergency() or __sigp_external_call(), the memory for the "inti" structure was not released anymore. This patch fixes this issue by moving the check for !dst_vcpu before the kzalloc() call. Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| | * | | | | | KVM: s390: fix calculation of idle_mask array sizeJens Freimann2014-03-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need BITS_TO_LONGS, not sizeof(long) to calculate the correct size. idle_mask is a bitmask, each bit representing the state of a cpu. The desired outcome is an array of unsigned long fields that can fit KVM_MAX_VCPUS bits. We should not use sizeof(long) which returnes the size in bytes, but BITS_TO_LONGS Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| | * | | | | | KVM: s390: randomize sca addressChristian Borntraeger2014-03-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We allocate a page for the 2k sca, so lets use the space to improve hit rate of some internal cpu caches. No need to change the freeing of the page, as this will shift away the page offset bits anyway. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
| * | | | | | | Merge branch 'kvms390-irqfd' of ↵Paolo Bonzini2014-03-24
| |\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next
| | * | | | | | | KVM: Bump KVM_MAX_IRQ_ROUTES for s390Cornelia Huck2014-03-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The maximum number for irq routes is currently 1024, which is a bit on the small size for s390: We support up to 4 x 64k virtual devices with up to 64 queues, and we need one route for each of the queues if we want to operate it via irqfd. Let's bump this to 4k on s390 for now, as this at least covers the saner setups. We need to find a more general solution, though, as we can't just grow the routing table indefinitly. Acked-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
| | * | | | | | | KVM: s390: irq routing for adapter interrupts.Cornelia Huck2014-03-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a new interrupt class for s390 adapter interrupts and enable irqfds for s390. This is depending on a new s390 specific vm capability, KVM_CAP_S390_IRQCHIP, that needs to be enabled by userspace. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
| | * | | | | | | KVM: s390: adapter interrupt sourcesCornelia Huck2014-03-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new interface to register/deregister sources of adapter interrupts identified by an unique id via the flic. Adapters may also be maskable and carry a list of pinned pages. These adapters will be used by irq routing later. Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
| | * | | | | | | KVM: Add per-vm capability enablement.Cornelia Huck2014-03-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow KVM_ENABLE_CAP to act on a vm as well as on a vcpu. This makes more sense when the caller wants to enable a vm-related capability. s390 will be the first user; wire it up. Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
| * | | | | | | | KVM: ioapic: reinject pending interrupts on KVM_SET_IRQCHIPPaolo Bonzini2014-03-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After the previous patches, an interrupt whose bit is set in the IRR register will never be in the LAPIC's IRR and has never been injected on the migration source. So inject it on the destination. This fixes migration of Windows guests without HPET (they use the RTC to trigger the scheduler tick, and lose it after migration). Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | | KVM: ioapic: extract body of kvm_ioapic_set_irqPaolo Bonzini2014-03-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We will reuse it to process a nonzero IRR that is passed to KVM_SET_IRQCHIP. Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | | KVM: ioapic: clear IRR for edge-triggered interrupts at deliveryPaolo Bonzini2014-03-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This ensures that IRR bits are set in the KVM_GET_IRQCHIP result only if the interrupt is still sitting in the IOAPIC. After the next patches, it avoids spurious reinjection of the interrupt when KVM_SET_IRQCHIP is called. Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | | KVM: ioapic: merge ioapic_deliver into ioapic_servicePaolo Bonzini2014-03-21
| |/ / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commonize the handling of masking, which was absent for kvm_ioapic_set_irq. Setting remote_irr does not need a separate function either, and merging the two functions avoids confusion. Reviewed-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | MIPS: KVM: Remove dead code in CP0 emulationJames Hogan2014-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code to check whether rd > MIPS_CP0_DESAVE is dead code, since MIPS_CP0_DESAVE = 31 and rd is already masked with 0x1f. Remove it. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sanjay Lal <sanjayl@kymasys.com> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | MIPS: KVM: Consult HWREna before emulating RDHWRJames Hogan2014-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ability to read hardware registers from userland with the RDHWR instruction should depend upon the corresponding bit of the HWREna register being set, otherwise a reserved instruction exception should be generated. However KVM's current emulation ignores the guest's HWREna and always emulates RDHWR instructions even if the guest OS has disallowed them. Therefore rework the RDHWR emulation code to check for privilege or the corresponding bit in the guest HWREna bit. Also remove the #if 0 case for the UserLocal register. I presume it was there for debug purposes but it seems unnecessary now that the guest can control whether it causes a guest exception. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sanjay Lal <sanjayl@kymasys.com> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | MIPS: KVM: Pass reserved instruction exceptions to guestJames Hogan2014-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously a reserved instruction exception while in guest code would cause a KVM internal error if kvm_mips_handle_ri() didn't recognise the instruction (including a RDHWR from an unrecognised hardware register). However the guest OS should really have the opportunity to catch the exception so that it can take the appropriate actions such as sending a SIGILL to the guest user process or emulating the instruction itself. Therefore in these cases emulate a guest RI exception and only return EMULATE_FAIL if that fails, being careful to revert the PC first in case the exception occurred in a branch delay slot in which case the PC will already point to the branch target. Also turn the printk messages relating to these cases into kvm_debug messages so that they aren't usually visible. This allows crashme to run in the guest without killing the entire VM. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sanjay Lal <sanjayl@kymasys.com> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | MIPS: KVM: asm/kvm_host.h: Clean up whitespaceJames Hogan2014-03-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The whitespace in asm/kvm_host.h is quite inconsistent in places. Clean up the whole file to use tabs more consistently. When you use the --ignore-space-change argument to git diff this patch only changes line wrapping in TLB_IS_GLOBAL and TLB_IS_VALID macros. Signed-off-by: James Hogan <james.hogan@imgtec.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sanjay Lal <sanjayl@kymasys.com> Cc: linux-mips@linux-mips.org Cc: kvm@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | KVM: eventfd: Fix lock order inversion.Cornelia Huck2014-03-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When registering a new irqfd, we call its ->poll method to collect any event that might have previously been pending so that we can trigger it. This is done under the kvm->irqfds.lock, which means the eventfd's ctx lock is taken under it. However, if we get a POLLHUP in irqfd_wakeup, we will be called with the ctx lock held before getting the irqfds.lock to deactivate the irqfd, causing lockdep to complain. Calling the ->poll method does not really need the irqfds.lock, so let's just move it after we've given up the irqfds.lock in kvm_irqfd_assign(). Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | KVM: x86: handle missing MPX in nested virtualizationPaolo Bonzini2014-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When doing nested virtualization, we may be able to read BNDCFGS but still not be allowed to write to GUEST_BNDCFGS in the VMCS. Guard writes to the field with vmx_mpx_supported(), and similarly hide the MSR from userspace if the processor does not support the field. We could work around this with the generic MSR save/load machinery, but there is only a limited number of MSR save/load slots and it is not really worthwhile to waste one for a scenario that should not happen except in the nested virtualization case. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | KVM: x86: Add nested virtualization support for MPXPaolo Bonzini2014-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is simple to do, the "host" BNDCFGS is either 0 or the guest value. However, both controls have to be present. We cannot provide MPX if we only have one of the "load BNDCFGS" or "clear BNDCFGS" controls. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | KVM: x86: introduce kvm_supported_xcr0()Paolo Bonzini2014-03-17
| |/ / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | XSAVE support for KVM is already using host_xcr0 & KVM_SUPPORTED_XCR0 as a "dynamic" version of KVM_SUPPORTED_XCR0. However, this is not enough because the MPX bits should not be presented to the guest unless kvm_x86_ops confirms the support. So, replace all instances of host_xcr0 & KVM_SUPPORTED_XCR0 with a new function kvm_supported_xcr0() that also has this check. Note that here: if (xstate_bv & ~KVM_SUPPORTED_XCR0) return -EINVAL; if (xstate_bv & ~host_cr0) return -EINVAL; the code is equivalent to if ((xstate_bv & ~KVM_SUPPORTED_XCR0) || (xstate_bv & ~host_cr0) return -EINVAL; i.e. "xstate_bv & (~KVM_SUPPORTED_XCR0 | ~host_cr0)" which is in turn equal to "xstate_bv & ~(KVM_SUPPORTED_XCR0 & host_cr0)". So we should also use the new function there. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | Merge tag 'kvm-s390-20140317' of ↵Paolo Bonzini2014-03-17
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD Two patches: - one regression fix for reducing the amount of ucontrol userspace exits - get rid of BUG_ONs in hot inner loops
| | * | | | | | KVM: s390: Optimize ucontrol pathChristian Borntraeger2014-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 7c470539c95630c1f2a10f109e96f249730b75eb (s390/kvm: avoid automatic sie reentry) we will run through the C code of KVM on host interrupts instead of just reentering the guest. This will result in additional ucontrol exits (at least HZ per second). Let handle a 0 intercept in the kernel and dont return to userspace, even if in ucontrol mode. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> CC: stable@vger.kernel.org
| | * | | | | | KVM: s390: Removing untriggerable BUG_ONsDominik Dingel2014-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The BUG_ON in kvm-s390.c is unreachable, as we get the vcpu per common code, which itself does this from the private_data field of the file descriptor, and there is no KVM_UNCREATE_VCPU. The __{set,unset}_cpu_idle BUG_ONs are not triggerable because the vcpu creation code already checks against KVM_MAX_VCPUS. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * | | | | | | KVM: x86 emulator: emulate MOVAPDIgor Mammedov2014-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add emulation for 0x66 prefixed instruction of 0f 28 opcode that has been added earlier. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | | KVM: x86 emulator: emulate MOVAPSIgor Mammedov2014-03-17
| |/ / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | HCK memory driver test fails when testing 32-bit Windows 8.1 with baloon driver. tracing KVM shows error: reason EXIT_ERR rip 0x81c18326 info 0 0 x/10i 0x81c18326-20 0x0000000081c18312: add %al,(%eax) 0x0000000081c18314: add %cl,-0x7127711d(%esi) 0x0000000081c1831a: rolb $0x0,0x80ec(%ecx) 0x0000000081c18321: and $0xfffffff0,%esp 0x0000000081c18324: mov %esp,%esi 0x0000000081c18326: movaps %xmm0,(%esi) 0x0000000081c18329: movaps %xmm1,0x10(%esi) 0x0000000081c1832d: movaps %xmm2,0x20(%esi) 0x0000000081c18331: movaps %xmm3,0x30(%esi) 0x0000000081c18335: movaps %xmm4,0x40(%esi) which points to MOVAPS instruction currently no emulated by KVM. Fix it by adding appropriate entries to opcode table in KVM's emulator. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | Merge branch 'kvm-ppc-fix' into HEADPaolo Bonzini2014-03-14
| |\| | | | |
| * | | | | | kvm: x86: ignore ioapic polarityGabriel L. Somlo2014-03-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both QEMU and KVM have already accumulated a significant number of optimizations based on the hard-coded assumption that ioapic polarity will always use the ActiveHigh convention, where the logical and physical states of level-triggered irq lines always match (i.e., active(asserted) == high == 1, inactive == low == 0). QEMU guests are expected to follow directions given via ACPI and configure the ioapic with polarity 0 (ActiveHigh). However, even when misbehaving guests (e.g. OS X <= 10.9) set the ioapic polarity to 1 (ActiveLow), QEMU will still use the ActiveHigh signaling convention when interfacing with KVM. This patch modifies KVM to completely ignore ioapic polarity as set by the guest OS, enabling misbehaving guests to work alongside those which comply with the ActiveHigh polarity specified by QEMU's ACPI tables. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Gabriel L. Somlo <somlo@cmu.edu> [Move documentation to KVM_IRQ_LINE, add ia64. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: svm: Allow the guest to run with dirty debug registersPaolo Bonzini2014-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When not running in guest-debug mode (i.e. the guest controls the debug registers, having to take an exit for each DR access is a waste of time. If the guest gets into a state where each context switch causes DR to be saved and restored, this can take away as much as 40% of the execution time from the guest. If the guest is running with vcpu->arch.db == vcpu->arch.eff_db, we can let it write freely to the debug registers and reload them on the next exit. We still need to exit on the first access, so that the KVM_DEBUGREG_WONT_EXIT flag is set in switch_db_regs; after that, further accesses to the debug registers will not cause a vmexit. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: svm: set/clear all DR intercepts in one swoopPaolo Bonzini2014-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unlike other intercepts, debug register intercepts will be modified in hot paths if the guest OS is bad or otherwise gets tricked into doing so. Avoid calling recalc_intercepts 16 times for debug registers. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: nVMX: Allow nested guests to run with dirty debug registersPaolo Bonzini2014-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When preparing the VMCS02, the CPU-based execution controls is computed by vmx_exec_control. Turn off DR access exits there, too, if the KVM_DEBUGREG_WONT_EXIT bit is set in switch_db_regs. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: vmx: Allow the guest to run with dirty debug registersPaolo Bonzini2014-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When not running in guest-debug mode (i.e. the guest controls the debug registers, having to take an exit for each DR access is a waste of time. If the guest gets into a state where each context switch causes DR to be saved and restored, this can take away as much as 40% of the execution time from the guest. If the guest is running with vcpu->arch.db == vcpu->arch.eff_db, we can let it write freely to the debug registers and reload them on the next exit. We still need to exit on the first access, so that the KVM_DEBUGREG_WONT_EXIT flag is set in switch_db_regs; after that, further accesses to the debug registers will not cause a vmexit. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: x86: Allow the guest to run with dirty debug registersPaolo Bonzini2014-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When not running in guest-debug mode, the guest controls the debug registers and having to take an exit for each DR access is a waste of time. If the guest gets into a state where each context switch causes DR to be saved and restored, this can take away as much as 40% of the execution time from the guest. After this patch, VMX- and SVM-specific code can set a flag in switch_db_regs, telling vcpu_enter_guest that on the next exit the debug registers might be dirty and need to be reloaded (syncing will be taken care of by a new callback in kvm_x86_ops). This flag can be set on the first access to a debug registers, so that multiple accesses to the debug registers only cause one vmexit. Note that since the guest will be able to read debug registers and enable breakpoints in DR7, we need to ensure that they are synchronized on entry to the guest---including DR6 that was not synced before. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: x86: change vcpu->arch.switch_db_regs to a bit maskPaolo Bonzini2014-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The next patch will add another bit that we can test with the same "if". Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: vmx: we do rely on loading DR7 on entryPaolo Bonzini2014-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, this works even if the bit is not in "min", because the bit is always set in MSR_IA32_VMX_ENTRY_CTLS. Mention it for the sake of documentation, and to avoid surprises if we later switch to MSR_IA32_VMX_TRUE_ENTRY_CTLS. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: x86: Remove return code from enable_irq/nmi_windowJan Kiszka2014-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's no longer possible to enter enable_irq_window in guest mode when L1 intercepts external interrupts and we are entering L2. This is now caught in vcpu_enter_guest. So we can remove the check from the VMX version of enable_irq_window, thus the need to return an error code from both enable_irq_window and enable_nmi_window. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: nVMX: Do not inject NMI vmexits when L2 has a pending interruptJan Kiszka2014-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to SDM 27.2.3, IDT vectoring information will not be valid on vmexits caused by external NMIs. So we have to avoid creating such scenarios by delaying EXIT_REASON_EXCEPTION_NMI injection as long as we have a pending interrupt because that one would be migrated to L1's IDT vectoring info on nested exit. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: nVMX: Fully emulate preemption timerJan Kiszka2014-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We cannot rely on the hardware-provided preemption timer support because we are holding L2 in HLT outside non-root mode. Furthermore, emulating the preemption will resolve tick rate errata on older Intel CPUs. The emulation is based on hrtimer which is started on L2 entry, stopped on L2 exit and evaluated via the new check_nested_events hook. As we no longer rely on hardware features, we can enable both the preemption timer support and value saving unconditionally. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | KVM: nVMX: Rework interception of IRQs and NMIsJan Kiszka2014-03-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the check for leaving L2 on pending and intercepted IRQs or NMIs from the *_allowed handler into a dedicated callback. Invoke this callback at the relevant points before KVM checks if IRQs/NMIs can be injected. The callback has the task to switch from L2 to L1 if needed and inject the proper vmexit events. The rework fixes L2 wakeups from HLT and provides the foundation for preemption timer emulation. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | Merge tag 'kvm-s390-20140306' of ↵Paolo Bonzini2014-03-06
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next One fix for virtio-ccw, fixing a problem introduced with "virtio_ccw: fix vcdev pointer handling issues" and noticed just after it went into git.
| | * | | | | | virtio_ccw: fix hang in set offline processingHeinz Graalfs2014-03-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During set offline processing virtio_grab_drvdata() incorrectly calls dev_set_drvdata() to remove the virtio_ccw_device from the parent ccw_device's driver data. This is wrong and ends up in a hang during virtio_ccw_reset(), as the interrupt handler still has need of the virtio_ccw_device. A new field 'going_away' is introduced in struct virtio_ccw_device to control the usage of the ccw_device's driver data pointer in virtio_grab_drvdata(). Signed-off-by: Heinz Graalfs <graalfs@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
| * | | | | | | Merge tag 'kvm-for-3.15-1' of ↵Paolo Bonzini2014-03-04
| |\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into kvm-next
| | * | | | | | | ARM: KVM: fix warning in mmu.cMarc Zyngier2014-03-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Compiling with THP enabled leads to the following warning: arch/arm/kvm/mmu.c: In function ‘unmap_range’: arch/arm/kvm/mmu.c:177:39: warning: ‘pte’ may be used uninitialized in this function [-Wmaybe-uninitialized] if (kvm_pmd_huge(*pmd) || page_empty(pte)) { ^ Code inspection reveals that these two cases are mutually exclusive, so GCC is a bit overzealous here. Silence it anyway by initializing pte to NULL and testing it later on. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | | | | | | ARM: KVM: trap VM system registers until MMU and caches are ONMarc Zyngier2014-03-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to be able to detect the point where the guest enables its MMU and caches, trap all the VM related system registers. Once we see the guest enabling both the MMU and the caches, we can go back to a saner mode of operation, which is to leave these registers in complete control of the guest. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | | | | | | ARM: KVM: add world-switch for AMAIR{0,1}Marc Zyngier2014-03-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | HCR.TVM traps (among other things) accesses to AMAIR0 and AMAIR1. In order to minimise the amount of surprise a guest could generate by trying to access these registers with caches off, add them to the list of registers we switch/handle. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
| | * | | | | | | ARM: KVM: introduce per-vcpu HYP Configuration RegisterMarc Zyngier2014-03-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So far, KVM/ARM used a fixed HCR configuration per guest, except for the VI/VF/VA bits to control the interrupt in absence of VGIC. With the upcoming need to dynamically reconfigure trapping, it becomes necessary to allow the HCR to be changed on a per-vcpu basis. The fix here is to mimic what KVM/arm64 already does: a per vcpu HCR field, initialized at setup time. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
| | * | | | | | | ARM: KVM: fix ordering of 64bit coprocessor accessesMarc Zyngier2014-03-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 240e99cbd00a (ARM: KVM: Fix 64-bit coprocessor handling) added an ordering dependency for the 64bit registers. The order described is: CRn, CRm, Op1, Op2, 64bit-first. Unfortunately, the implementation is: CRn, 64bit-first, CRm... Move the 64bit test to be last in order to match the documentation. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
| | * | | | | | | ARM: KVM: fix handling of trapped 64bit coprocessor accessesMarc Zyngier2014-03-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 240e99cbd00a (ARM: KVM: Fix 64-bit coprocessor handling) changed the way we match the 64bit coprocessor access from user space, but didn't update the trap handler for the same set of registers. The effect is that a trapped 64bit access is never matched, leading to a fault being injected into the guest. This went unnoticed as we didn't really trap any 64bit register so far. Placing the CRm field of the access into the CRn field of the matching structure fixes the problem. Also update the debug feature to emit the expected string in case of failing match. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com>
| | * | | | | | | ARM: KVM: force cache clean on page fault when caches are offMarc Zyngier2014-03-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order for a guest with caches disabled to observe data written contained in a given page, we need to make sure that page is committed to memory, and not just hanging in the cache (as guest accesses are completely bypassing the cache until it decides to enable it). For this purpose, hook into the coherent_cache_guest_page function and flush the region if the guest SCTLR register doesn't show the MMU and caches as being enabled. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
| | * | | | | | | arm64: KVM: flush VM pages before letting the guest enable cachesMarc Zyngier2014-03-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the guest runs with caches disabled (like in an early boot sequence, for example), all the writes are diectly going to RAM, bypassing the caches altogether. Once the MMU and caches are enabled, whatever sits in the cache becomes suddenly visible, which isn't what the guest expects. A way to avoid this potential disaster is to invalidate the cache when the MMU is being turned on. For this, we hook into the SCTLR_EL1 trapping code, and scan the stage-2 page tables, invalidating the pages/sections that have already been mapped in. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
| | * | | | | | | ARM: KVM: introduce kvm_p*d_addr_endMarc Zyngier2014-03-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The use of p*d_addr_end with stage-2 translation is slightly dodgy, as the IPA is 40bits, while all the p*d_addr_end helpers are taking an unsigned long (arm64 is fine with that as unligned long is 64bit). The fix is to introduce 64bit clean versions of the same helpers, and use them in the stage-2 page table code. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>