author | Linus Torvalds <torvalds@linux-foundation.org> | 2013-02-24 16:07:18 -0500 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2013-02-24 16:07:18 -0500 |
commit | 89f883372fa60f604d136924baf3e89ff1870e9e (patch) | |
tree | cb69b0a14957945ba00d3d392bf9ccbbef56f3b8 /Documentation/virtual/kvm | |
parent | 9e2d59ad580d590134285f361a0e80f0e98c0207 (diff) | |
parent | 6b73a96065e89dc9fa75ba4f78b1aa3a3bbd0470 (diff) |
Merge tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Marcelo Tosatti:
"KVM updates for the 3.9 merge window, including x86 real mode
emulation fixes, stronger memory slot interface restrictions, mmu_lock
spinlock hold time reduction, improved handling of large page faults
on shadow, initial APICv HW acceleration support, s390 channel IO
based virtio, amongst others"
* tag 'kvm-3.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (143 commits)
Revert "KVM: MMU: lazily drop large spte"
x86: pvclock kvm: align allocation size to page size
KVM: nVMX: Remove redundant get_vmcs12 from nested_vmx_exit_handled_msr
x86 emulator: fix parity calculation for AAD instruction
KVM: PPC: BookE: Handle alignment interrupts
booke: Added DBCR4 SPR number
KVM: PPC: booke: Allow multiple exception types
KVM: PPC: booke: use vcpu reference from thread_struct
KVM: Remove user_alloc from struct kvm_memory_slot
KVM: VMX: disable apicv by default
KVM: s390: Fix handling of iscs.
KVM: MMU: cleanup __direct_map
KVM: MMU: remove pt_access in mmu_set_spte
KVM: MMU: cleanup mapping-level
KVM: MMU: lazily drop large spte
KVM: VMX: cleanup vmx_set_cr0().
KVM: VMX: add missing exit names to VMX_EXIT_REASONS array
KVM: VMX: disable SMEP feature when guest is in non-paging mode
KVM: Remove duplicate text in api.txt
Revert "KVM: MMU: split kvm_mmu_free_page"
...
Diffstat (limited to 'Documentation/virtual/kvm')
-rw-r--r-- | Documentation/virtual/kvm/api.txt | 108 | ||||
-rw-r--r-- | Documentation/virtual/kvm/mmu.txt | 7 |
2 files changed, 85 insertions, 30 deletions
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index e0fa0ea2b187..119358dfb742 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt | |||
@@ -219,19 +219,6 @@ allocation of vcpu ids. For example, if userspace wants | |||
219 | single-threaded guest vcpus, it should make all vcpu ids be a multiple | 219 | single-threaded guest vcpus, it should make all vcpu ids be a multiple |
220 | of the number of vcpus per vcore. | 220 | of the number of vcpus per vcore. |
221 | 221 | ||
222 | On powerpc using book3s_hv mode, the vcpus are mapped onto virtual | ||
223 | threads in one or more virtual CPU cores. (This is because the | ||
224 | hardware requires all the hardware threads in a CPU core to be in the | ||
225 | same partition.) The KVM_CAP_PPC_SMT capability indicates the number | ||
226 | of vcpus per virtual core (vcore). The vcore id is obtained by | ||
227 | dividing the vcpu id by the number of vcpus per vcore. The vcpus in a | ||
228 | given vcore will always be in the same physical core as each other | ||
229 | (though that might be a different physical core from time to time). | ||
230 | Userspace can control the threading (SMT) mode of the guest by its | ||
231 | allocation of vcpu ids. For example, if userspace wants | ||
232 | single-threaded guest vcpus, it should make all vcpu ids be a multiple | ||
233 | of the number of vcpus per vcore. | ||
234 | |||
235 | For virtual cpus that have been created with S390 user controlled virtual | 222 | For virtual cpus that have been created with S390 user controlled virtual |
236 | machines, the resulting vcpu fd can be memory mapped at page offset | 223 | machines, the resulting vcpu fd can be memory mapped at page offset |
237 | KVM_S390_SIE_PAGE_OFFSET in order to obtain a memory map of the virtual | 224 | KVM_S390_SIE_PAGE_OFFSET in order to obtain a memory map of the virtual |
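Reviewer note on the hunk above: the book3s_hv text that survives this de-duplication describes a simple arithmetic mapping from vcpu ids to virtual cores. A minimal sketch of that arithmetic follows, assuming `threads_per_vcore` holds the value reported for KVM_CAP_PPC_SMT. Both helper names are hypothetical and are not part of the KVM API.

```c
#include <assert.h>

/*
 * Hypothetical helper: the vcore id is the vcpu id divided by the number
 * of vcpus per vcore (the value reported for KVM_CAP_PPC_SMT).
 */
static unsigned int vcore_id(unsigned int vcpu_id, unsigned int threads_per_vcore)
{
    assert(threads_per_vcore > 0);
    return vcpu_id / threads_per_vcore;
}

/*
 * Hypothetical helper: for single-threaded guest vcpus, every vcpu id
 * should be a multiple of the number of vcpus per vcore, so the n-th
 * guest vcpu gets id n * threads_per_vcore and lands alone in vcore n.
 */
static unsigned int single_threaded_vcpu_id(unsigned int n, unsigned int threads_per_vcore)
{
    return n * threads_per_vcore;
}
```

With four vcpus per vcore, vcpu ids 0, 4, 8 map to vcores 0, 1, 2, which is the single-threaded layout the documentation describes.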
@@ -345,7 +332,7 @@ struct kvm_sregs { | |||
345 | __u64 interrupt_bitmap[(KVM_NR_INTERRUPTS + 63) / 64]; | 332 | __u64 interrupt_bitmap[(KVM_NR_INTERRUPTS + 63) / 64]; |
346 | }; | 333 | }; |
347 | 334 | ||
348 | /* ppc -- see arch/powerpc/include/asm/kvm.h */ | 335 | /* ppc -- see arch/powerpc/include/uapi/asm/kvm.h */ |
349 | 336 | ||
350 | interrupt_bitmap is a bitmap of pending external interrupts. At most | 337 | interrupt_bitmap is a bitmap of pending external interrupts. At most |
351 | one bit may be set. This interrupt has been acknowledged by the APIC | 338 | one bit may be set. This interrupt has been acknowledged by the APIC |
@@ -892,12 +879,12 @@ It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr | |||
892 | be identical. This allows large pages in the guest to be backed by large | 879 | be identical. This allows large pages in the guest to be backed by large |
893 | pages in the host. | 880 | pages in the host. |
894 | 881 | ||
895 | The flags field supports two flag, KVM_MEM_LOG_DIRTY_PAGES, which instructs | 882 | The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and |
896 | kvm to keep track of writes to memory within the slot. See KVM_GET_DIRTY_LOG | 883 | KVM_MEM_READONLY. The former can be set to instruct KVM to keep track of |
897 | ioctl. The KVM_CAP_READONLY_MEM capability indicates the availability of the | 884 | writes to memory within the slot. See KVM_GET_DIRTY_LOG ioctl to know how to |
898 | KVM_MEM_READONLY flag. When this flag is set for a memory region, KVM only | 885 | use it. The latter can be set, if KVM_CAP_READONLY_MEM capability allows it, |
899 | allows read accesses. Writes will be posted to userspace as KVM_EXIT_MMIO | 886 | to make a new slot read-only. In this case, writes to this memory will be |
900 | exits. | 887 | posted to userspace as KVM_EXIT_MMIO exits. |
901 | 888 | ||
902 | When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of | 889 | When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of |
903 | the memory region are automatically reflected into the guest. For example, an | 890 | the memory region are automatically reflected into the guest. For example, an |
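Reviewer note on the hunk above: the recommendation that the lower 21 bits of guest_phys_addr and userspace_addr be identical can be checked before a slot is registered. The sketch below is one way to express that check. The helper name and the macro names are hypothetical, and the 2 MiB figure is simply what 21 bits of offset implies.

```c
#include <stdbool.h>
#include <stdint.h>

/* 21 low bits form the offset inside one 2 MiB large page. */
#define LARGE_PAGE_SHIFT 21
#define LARGE_PAGE_MASK  ((UINT64_C(1) << LARGE_PAGE_SHIFT) - 1)

/*
 * Hypothetical helper, not part of the KVM API: returns true when the
 * guest physical address and the userspace address share the same lower
 * 21 bits, so a large page in the guest can be backed by a large page
 * in the host.
 */
static bool slot_allows_large_pages(uint64_t guest_phys_addr, uint64_t userspace_addr)
{
    return ((guest_phys_addr ^ userspace_addr) & LARGE_PAGE_MASK) == 0;
}
```

XOR-ing the two addresses leaves a set bit wherever they differ, so masking the result with the low 21 bits tests only the in-page offset.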
@@ -931,7 +918,7 @@ documentation when it pops into existence). | |||
931 | 4.37 KVM_ENABLE_CAP | 918 | 4.37 KVM_ENABLE_CAP |
932 | 919 | ||
933 | Capability: KVM_CAP_ENABLE_CAP | 920 | Capability: KVM_CAP_ENABLE_CAP |
934 | Architectures: ppc | 921 | Architectures: ppc, s390 |
935 | Type: vcpu ioctl | 922 | Type: vcpu ioctl |
936 | Parameters: struct kvm_enable_cap (in) | 923 | Parameters: struct kvm_enable_cap (in) |
937 | Returns: 0 on success; -1 on error | 924 | Returns: 0 on success; -1 on error |
@@ -1792,6 +1779,7 @@ registers, find a list below: | |||
1792 | PPC | KVM_REG_PPC_VPA_SLB | 128 | 1779 | PPC | KVM_REG_PPC_VPA_SLB | 128 |
1793 | PPC | KVM_REG_PPC_VPA_DTL | 128 | 1780 | PPC | KVM_REG_PPC_VPA_DTL | 128 |
1794 | PPC | KVM_REG_PPC_EPCR | 32 | 1781 | PPC | KVM_REG_PPC_EPCR | 32 |
1782 | PPC | KVM_REG_PPC_EPR | 32 | ||
1795 | 1783 | ||
1796 | ARM registers are mapped using the lower 32 bits. The upper 16 of that | 1784 | ARM registers are mapped using the lower 32 bits. The upper 16 of that |
1797 | is the register group type, or coprocessor number: | 1785 | is the register group type, or coprocessor number: |
@@ -2108,6 +2096,14 @@ KVM_S390_INT_VIRTIO (vm) - virtio external interrupt; external interrupt | |||
2108 | KVM_S390_INT_SERVICE (vm) - sclp external interrupt; sclp parameter in parm | 2096 | KVM_S390_INT_SERVICE (vm) - sclp external interrupt; sclp parameter in parm |
2109 | KVM_S390_INT_EMERGENCY (vcpu) - sigp emergency; source cpu in parm | 2097 | KVM_S390_INT_EMERGENCY (vcpu) - sigp emergency; source cpu in parm |
2110 | KVM_S390_INT_EXTERNAL_CALL (vcpu) - sigp external call; source cpu in parm | 2098 | KVM_S390_INT_EXTERNAL_CALL (vcpu) - sigp external call; source cpu in parm |
2099 | KVM_S390_INT_IO(ai,cssid,ssid,schid) (vm) - compound value to indicate an | ||
2100 | I/O interrupt (ai - adapter interrupt; cssid,ssid,schid - subchannel); | ||
2101 | I/O interruption parameters in parm (subchannel) and parm64 (intparm, | ||
2102 | interruption subclass) | ||
2103 | KVM_S390_MCHK (vm, vcpu) - machine check interrupt; cr 14 bits in parm, | ||
2104 | machine check interrupt code in parm64 (note that | ||
2105 | machine checks needing further payload are not | ||
2106 | supported by this ioctl) | ||
2111 | 2107 | ||
2112 | Note that the vcpu ioctl is asynchronous to vcpu execution. | 2108 | Note that the vcpu ioctl is asynchronous to vcpu execution. |
2113 | 2109 | ||
@@ -2359,8 +2355,8 @@ executed a memory-mapped I/O instruction which could not be satisfied | |||
2359 | by kvm. The 'data' member contains the written data if 'is_write' is | 2355 | by kvm. The 'data' member contains the written data if 'is_write' is |
2360 | true, and should be filled by application code otherwise. | 2356 | true, and should be filled by application code otherwise. |
2361 | 2357 | ||
2362 | NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_DCR | 2358 | NOTE: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_DCR, |
2363 | and KVM_EXIT_PAPR the corresponding | 2359 | KVM_EXIT_PAPR and KVM_EXIT_EPR the corresponding |
2364 | operations are complete (and guest state is consistent) only after userspace | 2360 | operations are complete (and guest state is consistent) only after userspace |
2365 | has re-entered the kernel with KVM_RUN. The kernel side will first finish | 2361 | has re-entered the kernel with KVM_RUN. The kernel side will first finish |
2366 | incomplete operations and then check for pending signals. Userspace | 2362 | incomplete operations and then check for pending signals. Userspace |
@@ -2463,6 +2459,41 @@ The possible hypercalls are defined in the Power Architecture Platform | |||
2463 | Requirements (PAPR) document available from www.power.org (free | 2459 | Requirements (PAPR) document available from www.power.org (free |
2464 | developer registration required to access it). | 2460 | developer registration required to access it). |
2465 | 2461 | ||
2462 | /* KVM_EXIT_S390_TSCH */ | ||
2463 | struct { | ||
2464 | __u16 subchannel_id; | ||
2465 | __u16 subchannel_nr; | ||
2466 | __u32 io_int_parm; | ||
2467 | __u32 io_int_word; | ||
2468 | __u32 ipb; | ||
2469 | __u8 dequeued; | ||
2470 | } s390_tsch; | ||
2471 | |||
2472 | s390 specific. This exit occurs when KVM_CAP_S390_CSS_SUPPORT has been enabled | ||
2473 | and TEST SUBCHANNEL was intercepted. If dequeued is set, a pending I/O | ||
2474 | interrupt for the target subchannel has been dequeued and subchannel_id, | ||
2475 | subchannel_nr, io_int_parm and io_int_word contain the parameters for that | ||
2476 | interrupt. ipb is needed for instruction parameter decoding. | ||
2477 | |||
2478 | /* KVM_EXIT_EPR */ | ||
2479 | struct { | ||
2480 | __u32 epr; | ||
2481 | } epr; | ||
2482 | |||
2483 | On FSL BookE PowerPC chips, the interrupt controller has a fast patch | ||
2484 | interrupt acknowledge path to the core. When the core successfully | ||
2485 | delivers an interrupt, it automatically populates the EPR register with | ||
2486 | the interrupt vector number and acknowledges the interrupt inside | ||
2487 | the interrupt controller. | ||
2488 | |||
2489 | In case the interrupt controller lives in user space, we need to do | ||
2490 | the interrupt acknowledge cycle through it to fetch the next to be | ||
2491 | delivered interrupt vector using this exit. | ||
2492 | |||
2493 | It gets triggered whenever both KVM_CAP_PPC_EPR are enabled and an | ||
2494 | external interrupt has just been delivered into the guest. User space | ||
2495 | should put the acknowledged interrupt vector into the 'epr' field. | ||
2496 | |||
2466 | /* Fix the size of the union. */ | 2497 | /* Fix the size of the union. */ |
2467 | char padding[256]; | 2498 | char padding[256]; |
2468 | }; | 2499 | }; |
@@ -2584,3 +2615,34 @@ For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV: | |||
2584 | where "num_sets" is the tlb_sizes[] value divided by the tlb_ways[] value. | 2615 | where "num_sets" is the tlb_sizes[] value divided by the tlb_ways[] value. |
2585 | - The tsize field of mas1 shall be set to 4K on TLB0, even though the | 2616 | - The tsize field of mas1 shall be set to 4K on TLB0, even though the |
2586 | hardware ignores this value for TLB0. | 2617 | hardware ignores this value for TLB0. |
2618 | |||
2619 | 6.4 KVM_CAP_S390_CSS_SUPPORT | ||
2620 | |||
2621 | Architectures: s390 | ||
2622 | Parameters: none | ||
2623 | Returns: 0 on success; -1 on error | ||
2624 | |||
2625 | This capability enables support for handling of channel I/O instructions. | ||
2626 | |||
2627 | TEST PENDING INTERRUPTION and the interrupt portion of TEST SUBCHANNEL are | ||
2628 | handled in-kernel, while the other I/O instructions are passed to userspace. | ||
2629 | |||
2630 | When this capability is enabled, KVM_EXIT_S390_TSCH will occur on TEST | ||
2631 | SUBCHANNEL intercepts. | ||
2632 | |||
2633 | 6.5 KVM_CAP_PPC_EPR | ||
2634 | |||
2635 | Architectures: ppc | ||
2636 | Parameters: args[0] defines whether the proxy facility is active | ||
2637 | Returns: 0 on success; -1 on error | ||
2638 | |||
2639 | This capability enables or disables the delivery of interrupts through the | ||
2640 | external proxy facility. | ||
2641 | |||
2642 | When enabled (args[0] != 0), every time the guest gets an external interrupt | ||
2643 | delivered, it automatically exits into user space with a KVM_EXIT_EPR exit | ||
2644 | to receive the topmost interrupt vector. | ||
2645 | |||
2646 | When disabled (args[0] == 0), behavior is as if this facility is unsupported. | ||
2647 | |||
2648 | When this capability is enabled, KVM_EXIT_EPR can occur. | ||
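Reviewer note on section 6.5 above: the capability is toggled through the KVM_ENABLE_CAP vcpu ioctl, with args[0] selecting whether the proxy facility is active. A minimal sketch of how userspace might issue that request follows. The two function names are hypothetical, `vcpu_fd` is assumed to come from an earlier KVM_CREATE_VCPU call, and error handling is left to the caller.

```c
#include <linux/kvm.h>
#include <string.h>
#include <sys/ioctl.h>

/*
 * Hypothetical helper: prepare a KVM_ENABLE_CAP request for
 * KVM_CAP_PPC_EPR. args[0] != 0 activates the external proxy facility,
 * args[0] == 0 deactivates it.
 */
static void fill_epr_cap(struct kvm_enable_cap *cap, int enable)
{
    memset(cap, 0, sizeof(*cap));
    cap->cap = KVM_CAP_PPC_EPR;
    cap->args[0] = enable ? 1 : 0;
}

/*
 * Hypothetical wrapper: issue the request on a vcpu file descriptor.
 * Returns 0 on success and -1 on error, as the documentation states.
 */
static int set_epr_proxy(int vcpu_fd, int enable)
{
    struct kvm_enable_cap cap;

    fill_epr_cap(&cap, enable);
    return ioctl(vcpu_fd, KVM_ENABLE_CAP, &cap);
}
```

Once the capability is enabled, the run loop should expect KVM_EXIT_EPR exits and place the acknowledged interrupt vector in the 'epr' field before re-entering KVM_RUN.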
diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt index fa5f1dbc6b23..43fcb761ed16 100644 --- a/Documentation/virtual/kvm/mmu.txt +++ b/Documentation/virtual/kvm/mmu.txt | |||
@@ -187,13 +187,6 @@ Shadow pages contain the following information: | |||
187 | perform a reverse map from a pte to a gfn. When role.direct is set, any | 187 | perform a reverse map from a pte to a gfn. When role.direct is set, any |
188 | element of this array can be calculated from the gfn field when used, in | 188 | element of this array can be calculated from the gfn field when used, in |
189 | this case, the array of gfns is not allocated. See role.direct and gfn. | 189 | this case, the array of gfns is not allocated. See role.direct and gfn. |
190 | slot_bitmap: | ||
191 | A bitmap containing one bit per memory slot. If the page contains a pte | ||
192 | mapping a page from memory slot n, then bit n of slot_bitmap will be set | ||
193 | (if a page is aliased among several slots, then it is not guaranteed that | ||
194 | all slots will be marked). | ||
195 | Used during dirty logging to avoid scanning a shadow page if none if its | ||
196 | pages need tracking. | ||
197 | root_count: | 190 | root_count: |
198 | A counter keeping track of how many hardware registers (guest cr3 or | 191 | A counter keeping track of how many hardware registers (guest cr3 or |
199 | pdptrs) are now pointing at the page. While this counter is nonzero, the | 192 | pdptrs) are now pointing at the page. While this counter is nonzero, the |