aboutsummaryrefslogtreecommitdiffstats
path: root/arch/s390/kvm/kvm-s390.c
Commit message (Collapse)AuthorAge
* KVM: s390: disable RRBM againChristian Borntraeger2015-04-15
| | | | | | | | | | | | | commit b273921356df ("KVM: s390: enable more features that need no hypervisor changes") also enabled RRBM. Turns out that this instruction does need some KVM code, so lets disable that bit again. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Fixes: b273921356df ("KVM: s390: enable more features that need no hypervisor changes") Message-Id: <1429093624-49611-2-git-send-email-borntraeger@de.ibm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2015-04-13
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Paolo Bonzini: "First batch of KVM changes for 4.1 The most interesting bit here is irqfd/ioeventfd support for ARM and ARM64. Summary: ARM/ARM64: fixes for live migration, irqfd and ioeventfd support (enabling vhost, too), page aging s390: interrupt handling rework, allowing to inject all local interrupts via new ioctl and to get/set the full local irq state for migration and introspection. New ioctls to access memory by virtual address, and to get/set the guest storage keys. SIMD support. MIPS: FPU and MIPS SIMD Architecture (MSA) support. Includes some patches from Ralf Baechle's MIPS tree. x86: bugfixes (notably for pvclock, the others are small) and cleanups. Another small latency improvement for the TSC deadline timer" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (146 commits) KVM: use slowpath for cross page cached accesses kvm: mmu: lazy collapse small sptes into large sptes KVM: x86: Clear CR2 on VCPU reset KVM: x86: DR0-DR3 are not clear on reset KVM: x86: BSP in MSR_IA32_APICBASE is writable KVM: x86: simplify kvm_apic_map KVM: x86: avoid logical_map when it is invalid KVM: x86: fix mixed APIC mode broadcast KVM: x86: use MDA for interrupt matching kvm/ppc/mpic: drop unused IRQ_testbit KVM: nVMX: remove unnecessary double caching of MAXPHYADDR KVM: nVMX: checks for address bits beyond MAXPHYADDR on VM-entry KVM: x86: cache maxphyaddr CPUID leaf in struct kvm_vcpu KVM: vmx: pass error code with internal error #2 x86: vdso: fix pvclock races with task migration KVM: remove kvm_read_hva and kvm_read_hva_atomic KVM: x86: optimize delivery of TSC deadline timer interrupt KVM: x86: extract blocking logic from __vcpu_run kvm: x86: fix x86 eflags fixed bit KVM: s390: migrate vcpu interrupt state ...
| * KVM: s390: migrate vcpu interrupt stateJens Freimann2015-03-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support to migrate vcpu interrupts. Two new vcpu ioctls are added which get/set the complete status of pending interrupts in one go. The ioctls are marked as available with the new capability KVM_CAP_S390_IRQ_STATE. We can not use a ONEREG, as the number of pending local interrupts is not constant and depends on the number of CPUs. To retrieve the interrupt state we add an ioctl KVM_S390_GET_IRQ_STATE. Its input parameter is a pointer to a struct kvm_s390_irq_state which has a buffer and length. For all currently pending interrupts, we copy a struct kvm_s390_irq into the buffer and pass it to userspace. To store interrupt state into a buffer provided by userspace, we add an ioctl KVM_S390_SET_IRQ_STATE. It passes a struct kvm_s390_irq_state into the kernel and injects all interrupts contained in the buffer. Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
| * KVM: s390: add ioctl to inject local interruptsJens Freimann2015-03-31
| | | | | | | | | | | | | | | | | | | | | | We have introduced struct kvm_s390_irq a while ago which allows to inject all kinds of interrupts as defined in the Principles of Operation. Add ioctl to inject interrupts with the extended struct kvm_s390_irq Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
| * KVM: s390: deliver floating interrupts in order of priorityJens Freimann2015-03-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes interrupt handling compliant to the z/Architecture Principles of Operation with regard to interrupt priorities. Add a bitmap for pending floating interrupts. Each bit relates to a interrupt type and its list. A turned on bit indicates that a list contains items (interrupts) which need to be delivered. When delivering interrupts on a cpu we can merge the existing bitmap for cpu-local interrupts and floating interrupts and have a single mechanism for delivery. Currently we have one list for all kinds of floating interrupts and a corresponding spin lock. This patch adds a separate list per interrupt type. An exception to this are service signal and machine check interrupts, as there can be only one pending interrupt at a time. Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
| * KVM: s390: enable more features that need no hypervisor changesChristian Borntraeger2015-03-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After some review about what these facilities do, the following facilities will work under KVM and can, therefore, be reported to the guest if the cpu model and the host cpu provide this bit. There are plans underway to make the whole bit thing more readable, but its not yet finished. So here are some last bit changes and we enhance the KVM mask with: 9 The sense-running-status facility is installed in the z/Architecture architectural mode. ---> handled by SIE or KVM 10 The conditional-SSKE facility is installed in the z/Architecture architectural mode. ---> handled by SIE. KVM will retry SIE 13 The IPTE-range facility is installed in the z/Architecture architectural mode. ---> handled by SIE. KVM will retry SIE 36 The enhanced-monitor facility is installed in the z/Architecture architectural mode. ---> handled by SIE 47 The CMPSC-enhancement facility is installed in the z/Architecture architectural mode. ---> handled by SIE 48 The decimal-floating-point zoned-conversion facility is installed in the z/Architecture architectural mode. ---> handled by SIE 49 The execution-hint, load-and-trap, miscellaneous- instruction-extensions and processor-assist ---> handled by SIE 51 The local-TLB-clearing facility is installed in the z/Architecture architectural mode. ---> handled by SIE 52 The interlocked-access facility 2 is installed. ---> handled by SIE 53 The load/store-on-condition facility 2 and load-and- zero-rightmost-byte facility are installed in the z/Architecture architectural mode. ---> handled by SIE 57 The message-security-assist-extension-5 facility is installed in the z/Architecture architectural mode. ---> handled by SIE 66 The reset-reference-bits-multiple facility is installed in the z/Architecture architectural mode. ---> handled by SIE. KVM will retry SIE 80 The decimal-floating-point packed-conversion facility is installed in the z/Architecture architectural mode. ---> handled by SIE Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Tested-by: Michael Mueller <mimu@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
| * KVM: s390: represent SIMD cap in kvm facilityMichael Mueller2015-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch represents capability KVM_CAP_S390_VECTOR_REGISTERS by means of the SIMD facility bit. This allows to a) disable the use of SIMD when used in conjunction with a not-SIMD-aware QEMU, b) to enable SIMD when used with a SIMD-aware version of QEMU and c) finally by means of a QEMU version using the future cpu model ioctls. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com> Tested-by: Eric Farman <farman@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * KVM: s390: drop SIMD bit from kvm_s390_fac_list_maskMichael Mueller2015-03-17
| | | | | | | | | | | | | | | | | | | | | | Setting the SIMD bit in the KVM mask is an issue because it makes the facility visible but not usable to the guest, thus it needs to be removed again. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * KVM: s390: Create ioctl for Getting/Setting guest storage keysJason J. Herne2015-03-17
| | | | | | | | | | | | | | | | | | | | Provide the KVM_S390_GET_SKEYS and KVM_S390_SET_SKEYS ioctl which can be used to get/set guest storage keys. This functionality is needed for live migration of s390 guests that use storage keys. Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * KVM: s390: introduce post handlers for STSIEkaterina Tumanova2015-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Store System Information (STSI) instruction currently collects all information it relays to the caller in the kernel. Some information, however, is only available in user space. An example of this is the guest name: The kernel always sets "KVMGuest", but user space knows the actual guest name. This patch introduces a new exit, KVM_EXIT_S390_STSI, guarded by a capability that can be enabled by user space if it wants to be able to insert such data. User space will be provided with the target buffer and the requested STSI function code. Reviewed-by: Eric Farman <farman@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * KVM: s390: Add MEMOP ioctls for reading/writing guest memoryThomas Huth2015-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | On s390, we've got to make sure to hold the IPTE lock while accessing logical memory. So let's add an ioctl for reading and writing logical memory to provide this feature for userspace, too. The maximum transfer size of this call is limited to 64kB to prevent that the guest can trigger huge copy_from/to_user transfers. QEMU currently only requests up to one or two pages so far, so 16*4kB seems to be a reasonable limit here. Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * KVM: s390: Guest's memory access functions get access registersAlexander Yarygin2015-03-17
| | | | | | | | | | | | | | | | | | | | | | In access register mode, the write_guest() read_guest() and other functions will invoke the access register translation, which requires an ar, designated by one of the instruction fields. Signed-off-by: Alexander Yarygin <yarygin@linux.vnet.ibm.com> Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * KVM: s390: cleanup jump lables in kvm_arch_init_vmDominik Dingel2015-03-17
| | | | | | | | | | | | | | | | | | As all cleanup functions can handle their respective NULL case there is no need to have more than one error jump label. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * KVM: s390: Enable vector support for capable guestEric Farman2015-03-06
| | | | | | | | | | | | | | | | | | | | | | | | | | We finally have all the pieces in place, so let's include the vector facility bit in the mask of available hardware facilities for the guest to recognize. Also, enable the vector functionality in the guest control blocks, to avoid a possible vector data exception that would otherwise occur when a vector instruction is issued by the guest operating system. Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com> Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * KVM: s390: Machine CheckEric Farman2015-03-06
| | | | | | | | | | | | | | | | | | | | | | | | Store additional status in the machine check handler, in order to collect status (such as vector registers) that is not defined by store status. Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com> Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * KVM: s390: Add new SIGP order to kernel countersEric Farman2015-03-06
| | | | | | | | | | | | | | | | | | | | | | The new SIGP order Store Additional Status at Address is totally handled by user space, but we should still record the occurrence of this order in the kernel code. Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com> Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * KVM: s390: Allocate and save/restore vector registersEric Farman2015-03-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Define and allocate space for both the host and guest views of the vector registers for a given vcpu. The 32 vector registers occupy 128 bits each (512 bytes total), but architecturally are paired with 512 additional bytes of reserved space for future expansion. The kvm_sync_regs structs containing the registers are union'ed with 1024 bytes of padding in the common kvm_run struct. The addition of 1024 bytes of new register information clearly exceeds the existing union, so an expansion of that padding is required. When changing environments, we need to appropriately save and restore the vector registers viewed by both the host and guest, into and out of the sync_regs space. The floating point registers overlay the upper half of vector registers 0-15, so there's a bit of data duplication here that needs to be carefully avoided. Signed-off-by: Eric Farman <farman@linux.vnet.ibm.com> Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * KVM: s390: perform vcpu model setup in a functionMichael Mueller2015-03-06
| | | | | | | | | | | | | | | | | | | | | | | | | | The function kvm_s390_vcpu_setup_model() now performs all cpu model realated setup tasks for a vcpu. Besides cpuid and ibc initialization, facility list assignment takes place during the setup step as well. The model setup has been pulled to the begin of vcpu setup to allow kvm facility tests. There is no need to protect the cpu model setup with a lock since the attributes can't be changed anymore as soon the first vcpu is online. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * KVM: s390: Forward PSW to next instruction for addressing exceptionsThomas Huth2015-03-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the SIE exited by a DAT access exceptions which we can not resolve, the guest tried to access a page which is out of bounds and can not be paged-in. In this case we have to signal the bad access by injecting an address exception. However, address exceptions are either suppressing or terminating, i.e. the PSW has to point to the next instruction when the exception is delivered. Since the originating DAT access exception is nullifying, the PSW still points to the offending instruction instead, so we've got to forward the PSW to the next instruction. Having fixed this issue, we can now also enable the TPROT interpretation facility again which had been disabled because of this problem. Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* | kvm: move advertising of KVM_CAP_IRQFD to common codePaolo Bonzini2015-03-10
|/ | | | | | | | | | | | | | | POWER supports irqfds but forgot to advertise them. Some userspace does not check for the capability, but others check it---thus they work on x86 and s390 but not POWER. To avoid that other architectures in the future make the same mistake, let common code handle KVM_CAP_IRQFD the same way as KVM_CAP_IRQFD_RESAMPLE. Reported-and-tested-by: Greg Kurz <gkurz@linux.vnet.ibm.com> Cc: stable@vger.kernel.org Fixes: 297e21053a52f060944e9f0de4c64fad9bcd72fc Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: s390: non-LPAR case obsolete during facilities mask initMichael Mueller2015-03-04
| | | | | | | | With patch "include guest facilities in kvm facility test" it is no longer necessary to have special handling for the non-LPAR case. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: include guest facilities in kvm facility testMichael Mueller2015-03-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | Most facility related decisions in KVM have to take into account: - the facilities offered by the underlying run container (LPAR/VM) - the facilities supported by the KVM code itself - the facilities requested by a guest VM This patch adds the KVM driver requested facilities to the test routine. It additionally renames struct s390_model_fac to kvm_s390_fac and its field names to be more meaningful. The semantics of the facilities stored in the KVM architecture structure is changed. The address arch.model.fac->list now points to the guest facility list and arch.model.fac->mask points to the KVM facility mask. This patch fixes the behaviour of KVM for some facilities for guests that ignore the guest visible facility bits, e.g. guests could use transactional memory intructions on hosts supporting them even if the chosen cpu model would not offer them. The userspace interface is not affected by this change. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: fix in memory copy of facility listsMichael Mueller2015-03-04
| | | | | | | The facility lists were not fully copied. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390/cpacf: Fix kernel bug under z/VMChristian Borntraeger2015-03-04
| | | | | | | | | | | | | | | | | | | | | | | | Under z/VM PQAP might trigger an operation exception if no crypto cards are defined via APVIRTUAL or APDEDICATED. [ 386.098666] Kernel BUG at 0000000000135c56 [verbose debug info unavailable] [ 386.098693] illegal operation: 0001 ilc:2 [#1] SMP [...] [ 386.098751] Krnl PSW : 0704c00180000000 0000000000135c56 (kvm_s390_apxa_installed+0x46/0x98) [...] [ 386.098804] [<000000000013627c>] kvm_arch_init_vm+0x29c/0x358 [ 386.098806] [<000000000012d008>] kvm_dev_ioctl+0xc0/0x460 [ 386.098809] [<00000000002c639a>] do_vfs_ioctl+0x332/0x508 [ 386.098811] [<00000000002c660e>] SyS_ioctl+0x9e/0xb0 [ 386.098814] [<000000000070476a>] system_call+0xd6/0x258 [ 386.098815] [<000003fffc7400a2>] 0x3fffc7400a2 Lets add an extable entry and provide a zeroed config in that case. Reported-by: Stefan Zimmermann <stzi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com> Tested-by: Stefan Zimmermann <stzi@linux.vnet.ibm.com>
* KVM: s390/cpacf: Enable key wrapping by defaultTony Krowiak2015-03-03
| | | | | | | z/VM and LPAR enable key wrapping by default, lets do the same on KVM. Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: add cpu model supportMichael Mueller2015-02-09
| | | | | | | | | | | | | | | | | | | | | | | | This patch enables cpu model support in kvm/s390 via the vm attribute interface. During KVM initialization, the host properties cpuid, IBC value and the facility list are stored in the architecture specific cpu model structure. During vcpu setup, these properties are taken to initialize the related SIE state. This mechanism allows to adjust the properties from user space and thus to implement different selectable cpu models. This patch uses the IBC functionality to block instructions that have not been implemented at the requested CPU type and GA level compared to the full host capability. Userspace has to initialize the cpu model before vcpu creation. A cpu model change of running vcpus is not possible. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: use facilities and cpu_id per KVMMichael Mueller2015-02-09
| | | | | | | | | | | The patch introduces facilities and cpu_ids per virtual machine. Different virtual machines may want to expose different facilities and cpu ids to the guest, so let's make them per-vm instead of global. Signed-off-by: Michael Mueller <mimu@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390/CPACF: Choose crypto control block formatTony Krowiak2015-02-09
| | | | | | | | | | | We need to specify a different format for the crypto control block depending on whether the APXA facility is installed or not. Let's test for it by executing the PQAP(QCI) function and use either a format-1 or a format-2 crypto control block accordingly. This is a host only change for z13 and does not affect the guest view. Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: reenable LPP facilityChristian Borntraeger2015-02-09
| | | | | | | | | | | | | | | | | | | commit 7be81a46695d ("KVM: s390/facilities: allow TOD-CLOCK steering facility bit") accidentially disabled the "load program parameter" facility bit during rebase for upstream submission (my fault). Re-add that bit. As this is only for a performance measurement helper instruction (used by KVM itself) cc stable is not necessary see http://www-01.ibm.com/support/docview.wss?uid=isg26fcd1cc32246f4c8852574ce0044734a (SA23-2260 The Load-Program-Parameter and CPU-Measurement Facilities) for details about LPP and its usecase. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Fixes: 7be81a46695d ("KVM: s390/facilities: allow TOD-CLOCK steering")
* kvm: add halt_poll_ns module parameterPaolo Bonzini2015-02-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch introduces a new module parameter for the KVM module; when it is present, KVM attempts a bit of polling on every HLT before scheduling itself out via kvm_vcpu_block. This parameter helps a lot for latency-bound workloads---in particular I tested it with O_DSYNC writes with a battery-backed disk in the host. In this case, writes are fast (because the data doesn't have to go all the way to the platters) but they cannot be merged by either the host or the guest. KVM's performance here is usually around 30% of bare metal, or 50% if you use cache=directsync or cache=writethrough (these parameters avoid that the guest sends pointless flush requests, and at the same time they are not slow because of the battery-backed cache). The bad performance happens because on every halt the host CPU decides to halt itself too. When the interrupt comes, the vCPU thread is then migrated to a new physical CPU, and in general the latency is horrible because the vCPU thread has to be scheduled back in. With this patch performance reaches 60-65% of bare metal and, more important, 99% of what you get if you use idle=poll in the guest. This means that the tunable gets rid of this particular bottleneck, and more work can be done to improve performance in the kernel or QEMU. Of course there is some price to pay; every time an otherwise idle vCPUs is interrupted by an interrupt, it will poll unnecessarily and thus impose a little load on the host. The above results were obtained with a mostly random value of the parameter (500000), and the load was around 1.5-2.5% CPU usage on one of the host's core for each idle guest vCPU. The patch also adds a new stat, /sys/kernel/debug/kvm/halt_successful_poll, that can be used to tune the parameter. It counts how many HLT instructions received an interrupt during the polling period; each successful poll avoids that Linux schedules the VCPU thread out and back in, and may also avoid a likely trip to C1 and back for the physical CPU. While the VM is idle, a Linux 4 VCPU VM halts around 10 times per second. Of these halts, almost all are failed polls. During the benchmark, instead, basically all halts end within the polling period, except a more or less constant stream of 50 per second coming from vCPUs that are not running the benchmark. The wasted time is thus very low. Things may be slightly different for Windows VMs, which have a ~10 ms timer tick. The effect is also visible on Marcelo's recently-introduced latency test for the TSC deadline timer. Though of course a non-RT kernel has awful latency bounds, the latency of the timer is around 8000-10000 clock cycles compared to 20000-120000 without setting halt_poll_ns. For the TSC deadline timer, thus, the effect is both a smaller average latency and a smaller variance. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: s390/cpacf: Enable/disable protected key functions for kvm guestTony Krowiak2015-01-23
| | | | | | | | | | | | | | | | Created new KVM device attributes for indicating whether the AES and DES/TDES protected key functions are available for programs running on the KVM guest. The attributes are used to set up the controls in the guest SIE block that specify whether programs running on the guest will be given access to the protected key functions available on the s390 hardware. Signed-off-by: Tony Krowiak <akrowiak@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: Michael Mueller <mimu@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> [split MSA4/protected key into two patches]
* KVM: s390: Provide guest TOD Clock Get/Set ControlsJason J. Herne2015-01-23
| | | | | | | | | | | | | | | | Provide controls for setting/getting the guest TOD clock based on the VM attribute interface. Provide TOD and TOD_HIGH vm attributes on s390 for managing guest Time Of Day clock value. TOD_HIGH is presently always set to 0. In the future it will contain a high order expansion of the tod clock value after it overflows the 64-bits of the TOD. Signed-off-by: Jason J. Herne <jjherne@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: forward most SIGP orders to user spaceDavid Hildenbrand2015-01-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | Most SIGP orders are handled partially in kernel and partially in user space. In order to: - Get a correct SIGP SET PREFIX handler that informs user space - Avoid race conditions between concurrently executed SIGP orders - Serialize SIGP orders per VCPU We need to handle all "slow" SIGP orders in user space. The remaining ones to be handled completely in kernel are: - SENSE - SENSE RUNNING - EXTERNAL CALL - EMERGENCY SIGNAL - CONDITIONAL EMERGENCY SIGNAL According to the PoP, they have to be fast. They can be executed without conflicting to the actions of other pending/concurrently executing orders (e.g. STOP vs. START). This patch introduces a new capability that will - when enabled - forward all but the mentioned SIGP orders to user space. The instruction counters in the kernel are still updated. Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: clear the pfault queue if user space sets the invalid tokenDavid Hildenbrand2015-01-23
| | | | | | | | | | | | | We need a way to clear the async pfault queue from user space (e.g. for resets and SIGP SET ARCHITECTURE). This patch simply clears the queue as soon as user space sets the invalid pfault token. The definition of the invalid token is moved to uapi. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: only one external call may be pending at a timeDavid Hildenbrand2015-01-23
| | | | | | | | | | | | | | | | | | | | Only one external call may be pending at a vcpu at a time. For this reason, we have to detect whether the SIGP externcal call interpretation facility is available. If so, all external calls have to be injected using this mechanism. SIGP EXTERNAL CALL orders have to return whether another external call is already pending. This check was missing until now. SIGP SENSE hasn't returned yet in all conditions whether an external call was pending. If a SIGP EXTERNAL CALL irq is to be injected and one is already pending, -EBUSY is returned. Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: a VCPU may only stop when no interrupts are left pendingDavid Hildenbrand2015-01-23
| | | | | | | | | | | | | | As a SIGP STOP is an interrupt with the least priority, it may only result in stop of the vcpu when no other interrupts are left pending. To detect whether a non-stop irq is pending, we need a way to mask out stop irqs from the general kvm_cpu_has_interrupt() function. For this reason, the existing function (with an outdated name) is replaced by kvm_s390_vcpu_has_irq() which allows to mask out pending stop irqs. Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: handle stop irqs without action_bitsDavid Hildenbrand2015-01-23
| | | | | | | | | | | | | | | | | | | | | This patch removes the famous action_bits and moves the handling of SIGP STOP AND STORE STATUS directly into the SIGP STOP interrupt. The new local interrupt infrastructure is used to track pending stop requests. STOP irqs are the only irqs that don't get actively delivered. They remain pending until the stop function is executed (=stop intercept). If another STOP irq is already pending, -EBUSY will now be returned (needed for the SIGP handling code). Migration of pending SIGP STOP (AND STORE STATUS) orders should now be supported out of the box. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: base hrtimer on a monotonic clockDavid Hildenbrand2015-01-23
| | | | | | | | | | | | The hrtimer that handles the wait with enabled timer interrupts should not be disturbed by changes of the host time. This patch changes our hrtimer to be based on a monotonic clock. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: Allow userspace to limit guest memory sizeDominik Dingel2015-01-23
| | | | | | | | | | | | | | | | | With commit c6c956b80bdf ("KVM: s390/mm: support gmap page tables with less than 5 levels") we are able to define a limit for the guest memory size. As we round up the guest size in respect to the levels of page tables we get to guest limits of: 2048 MB, 4096 GB, 8192 TB and 16384 PB. We currently limit the guest size to 16 TB, which means we end up creating a page table structure supporting guest sizes up to 8192 TB. This patch introduces an interface that allows userspace to tune this limit. This may bring performance improvements for small guests. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390: move vcpu specific initalization to a later pointDominik Dingel2015-01-23
| | | | | | | | | | | | | | | | As we will allow in a later patch to recreate gmaps with new limits, we need to make sure that vcpus get their reference for that gmap after they increased the online_vcpu counter, so there is no possible race. While we are doing this, we also can simplify the vcpu_init function, by moving ucontrol specifics to an own function. That way we also start now setting the kvm_valid_regs for the ucontrol path. Reviewed-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: remove unneeded return value of vcpu_postcreateDominik Dingel2015-01-23
| | | | | | | | | | | | The return value of kvm_arch_vcpu_postcreate is not checked in its caller. This is okay, because only x86 provides vcpu_postcreate right now and it could only fail if vcpu_load failed. But that is not possible during KVM_CREATE_VCPU (kvm_arch_vcpu_load is void, too), so just get rid of the unchecked return value. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2014-12-18
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM update from Paolo Bonzini: "3.19 changes for KVM: - spring cleaning: removed support for IA64, and for hardware- assisted virtualization on the PPC970 - ARM, PPC, s390 all had only small fixes For x86: - small performance improvements (though only on weird guests) - usual round of hardware-compliancy fixes from Nadav - APICv fixes - XSAVES support for hosts and guests. XSAVES hosts were broken because the (non-KVM) XSAVES patches inadvertently changed the KVM userspace ABI whenever XSAVES was enabled; hence, this part is going to stable. Guest support is just a matter of exposing the feature and CPUID leaves support" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (179 commits) KVM: move APIC types to arch/x86/ KVM: PPC: Book3S: Enable in-kernel XICS emulation by default KVM: PPC: Book3S HV: Improve H_CONFER implementation KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register KVM: PPC: Book3S HV: Remove code for PPC970 processors KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions KVM: PPC: Book3S HV: Simplify locking around stolen time calculations arch: powerpc: kvm: book3s_paired_singles.c: Remove unused function arch: powerpc: kvm: book3s_pr.c: Remove unused function arch: powerpc: kvm: book3s.c: Remove some unused functions arch: powerpc: kvm: book3s_32_mmu.c: Remove unused function KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked KVM: PPC: Book3S HV: ptes are big endian KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPI KVM: PPC: Book3S HV: Fix KSM memory corruption KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI KVM: PPC: Book3S HV: Fix computation of tlbie operand KVM: PPC: Book3S HV: Add missing HPTE unlock KVM: PPC: BookE: Improve irq inject tracepoint arm/arm64: KVM: Require in-kernel vgic for the arch timers ...
| * KVM: s390: handle pending local interrupts via bitmapJens Freimann2014-11-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adapts handling of local interrupts to be more compliant with the z/Architecture Principles of Operation and introduces a data structure which allows more efficient handling of interrupts. * get rid of li->active flag, use bitmap instead * Keep interrupts in a bitmap instead of a list * Deliver interrupts in the order of their priority as defined in the PoP * Use a second bitmap for sigp emergency requests, as a CPU can have one request pending from every other CPU in the system. Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * KVM: s390: sigp: instruction counters for all sigp ordersDavid Hildenbrand2014-10-28
| | | | | | | | | | | | | | | | | | This patch introduces instruction counters for all known sigp orders and also a separate one for unknown orders that are passed to user space. Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
| * KVM: s390: Make the simple ipte mutex specific to a VM instead of globalThomas Huth2014-10-28
| | | | | | | | | | | | | | | | | | | | | | | | The ipte-locking should be done for each VM seperately, not globally. This way we avoid possible congestions when the simple ipte-lock is used and multiple VMs are running. Suggested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* | s390/mm: recfactor global pgste updatesDominik Dingel2014-10-27
|/ | | | | | | | | | | Replace the s390 specific page table walker for the pgste updates with a call to the common code walk_page_range function. There are now two pte modification functions, one for the reset of the CMMA state and another one for the initialization of the storage keys. Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
* KVM: s390: count vcpu wakeups in stat.halt_wakeupDavid Hildenbrand2014-10-01
| | | | | | | | | This patch introduces the halt_wakeup counter used by common code and uses it to count vcpu wakeups done in s390 arch specific code. Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
* KVM: s390/facilities: allow TOD-CLOCK steering facility bitChristian Borntraeger2014-10-01
| | | | | | | There is nothing to do for KVM to support TOD-CLOCK steering. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
* KVM: s390: register flic ops dynamicallyCornelia Huck2014-09-17
| | | | | | | | | | | Using the new kvm_register_device_ops() interface makes us get rid of an #ifdef in common code. Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* KVM: s390: Limit guest size to 16TBChristian Borntraeger2014-09-10
| | | | | | | | | | | | | | | | Currently we fill up a full 5 level page table to hold the guest mapping. Since commit "support gmap page tables with less than 5 levels" we can do better. Having more than 4 TB might be useful for some testing scenarios, so let's just limit ourselves to 16TB guest size. Having more than that is totally untested as I do not have enough swap space/memory. We continue to allow ucontrol the full size. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>