author | Linus Torvalds <torvalds@linux-foundation.org> | 2012-10-04 12:30:33 -0400 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2012-10-04 12:30:33 -0400 |
commit | ecefbd94b834fa32559d854646d777c56749ef1c (patch) | |
tree | ca8958900ad9e208a8e5fb7704f1b66dc76131b4 /Documentation | |
parent | ce57e981f2b996aaca2031003b3f866368307766 (diff) | |
parent | 3d11df7abbff013b811d5615320580cd5d9d7d31 (diff) |
Merge tag 'kvm-3.7-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM updates from Avi Kivity:
"Highlights of the changes for this release include support for vfio
level triggered interrupts, improved big real mode support on older
Intels, a streamlined guest page table walker, guest APIC speedups,
PIO optimizations, better overcommit handling, and read-only memory."
* tag 'kvm-3.7-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (138 commits)
KVM: s390: Fix vcpu_load handling in interrupt code
KVM: x86: Fix guest debug across vcpu INIT reset
KVM: Add resampling irqfds for level triggered interrupts
KVM: optimize apic interrupt delivery
KVM: MMU: Eliminate pointless temporary 'ac'
KVM: MMU: Avoid access/dirty update loop if all is well
KVM: MMU: Eliminate eperm temporary
KVM: MMU: Optimize is_last_gpte()
KVM: MMU: Simplify walk_addr_generic() loop
KVM: MMU: Optimize pte permission checks
KVM: MMU: Update accessed and dirty bits after guest pagetable walk
KVM: MMU: Move gpte_access() out of paging_tmpl.h
KVM: MMU: Optimize gpte_access() slightly
KVM: MMU: Push clean gpte write protection out of gpte_access()
KVM: clarify kvmclock documentation
KVM: make processes waiting on vcpu mutex killable
KVM: SVM: Make use of asm.h
KVM: VMX: Make use of asm.h
KVM: VMX: Make lto-friendly
KVM: x86: lapic: Clean up find_highest_vector() and count_vectors()
...
Conflicts:
arch/s390/include/asm/processor.h
arch/x86/kvm/i8259.c
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/virtual/kvm/api.txt | 33 | ||||
-rw-r--r-- | Documentation/virtual/kvm/hypercalls.txt | 66 | ||||
-rw-r--r-- | Documentation/virtual/kvm/msr.txt | 32 | ||||
-rw-r--r-- | Documentation/virtual/kvm/ppc-pv.txt | 22 |
4 files changed, 133 insertions, 20 deletions
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index bf33aaa4c59f..f6ec3a92e621 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt | |||
@@ -857,7 +857,8 @@ struct kvm_userspace_memory_region { | |||
857 | }; | 857 | }; |
858 | 858 | ||
859 | /* for kvm_memory_region::flags */ | 859 | /* for kvm_memory_region::flags */ |
860 | #define KVM_MEM_LOG_DIRTY_PAGES 1UL | 860 | #define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0) |
861 | #define KVM_MEM_READONLY (1UL << 1) | ||
861 | 862 | ||
862 | This ioctl allows the user to create or modify a guest physical memory | 863 | This ioctl allows the user to create or modify a guest physical memory |
863 | slot. When changing an existing slot, it may be moved in the guest | 864 | slot. When changing an existing slot, it may be moved in the guest |
@@ -873,14 +874,17 @@ It is recommended that the lower 21 bits of guest_phys_addr and userspace_addr | |||
873 | be identical. This allows large pages in the guest to be backed by large | 874 | be identical. This allows large pages in the guest to be backed by large |
874 | pages in the host. | 875 | pages in the host. |
875 | 876 | ||
876 | The flags field supports just one flag, KVM_MEM_LOG_DIRTY_PAGES, which | 877 | The flags field supports two flags, KVM_MEM_LOG_DIRTY_PAGES, which instructs |
877 | instructs kvm to keep track of writes to memory within the slot. See | 878 | kvm to keep track of writes to memory within the slot. See KVM_GET_DIRTY_LOG |
878 | the KVM_GET_DIRTY_LOG ioctl. | 879 | ioctl. The KVM_CAP_READONLY_MEM capability indicates the availability of the |
880 | KVM_MEM_READONLY flag. When this flag is set for a memory region, KVM only | ||
881 | allows read accesses. Writes will be posted to userspace as KVM_EXIT_MMIO | ||
882 | exits. | ||
879 | 883 | ||
880 | When the KVM_CAP_SYNC_MMU capability, changes in the backing of the memory | 884 | When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of |
881 | region are automatically reflected into the guest. For example, an mmap() | 885 | the memory region are automatically reflected into the guest. For example, an |
882 | that affects the region will be made visible immediately. Another example | 886 | mmap() that affects the region will be made visible immediately. Another |
883 | is madvise(MADV_DROP). | 887 | example is madvise(MADV_DROP). |
884 | 888 | ||
885 | It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl. | 889 | It is recommended to use this API instead of the KVM_SET_MEMORY_REGION ioctl. |
886 | The KVM_SET_MEMORY_REGION does not allow fine grained control over memory | 890 | The KVM_SET_MEMORY_REGION does not allow fine grained control over memory |
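The flag semantics described in the hunk above can be sketched in C. This is a minimal illustration, not kernel or QEMU code: the two flag values are copied from the hunk, while `build_slot_flags()`, `guest_write_outcome()`, and `enum write_outcome` are hypothetical names introduced here. The sketch assumes the caller has already probed KVM_CAP_READONLY_MEM and passes the result in as a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined in the api.txt hunk. */
#define KVM_MEM_LOG_DIRTY_PAGES (1UL << 0)
#define KVM_MEM_READONLY        (1UL << 1)

/* Hypothetical labels for this sketch; these are not KVM identifiers. */
enum write_outcome {
	WRITE_HITS_MEMORY,   /* the guest write reaches the backing memory */
	WRITE_EXITS_AS_MMIO  /* the guest write is posted to userspace as KVM_EXIT_MMIO */
};

/*
 * Hypothetical helper: compose the flags word for a memory slot.
 * Refuses a read-only request when the (already probed) capability
 * KVM_CAP_READONLY_MEM is not available.
 */
static bool build_slot_flags(bool log_dirty, bool read_only,
			     bool have_readonly_cap, uint32_t *flags_out)
{
	uint32_t flags = 0;

	assert(flags_out);
	if (read_only && !have_readonly_cap)
		return false;
	if (log_dirty)
		flags |= KVM_MEM_LOG_DIRTY_PAGES;
	if (read_only)
		flags |= KVM_MEM_READONLY;
	*flags_out = flags;
	return true;
}

/* Hypothetical helper: what the text says happens to a guest write. */
static enum write_outcome guest_write_outcome(uint32_t flags)
{
	return (flags & KVM_MEM_READONLY) ? WRITE_EXITS_AS_MMIO
					  : WRITE_HITS_MEMORY;
}
```

Because the two flags occupy separate bits, a slot may request dirty logging and read-only protection at the same time; the resulting word would then be passed in the flags field of struct kvm_userspace_memory_region.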
@@ -1946,6 +1950,19 @@ the guest using the specified gsi pin. The irqfd is removed using | |||
1946 | the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd | 1950 | the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd |
1947 | and kvm_irqfd.gsi. | 1951 | and kvm_irqfd.gsi. |
1948 | 1952 | ||
1953 | With KVM_CAP_IRQFD_RESAMPLE, KVM_IRQFD supports a de-assert and notify | ||
1954 | mechanism allowing emulation of level-triggered, irqfd-based | ||
1955 | interrupts. When KVM_IRQFD_FLAG_RESAMPLE is set the user must pass an | ||
1956 | additional eventfd in the kvm_irqfd.resamplefd field. When operating | ||
1957 | in resample mode, posting of an interrupt through kvm_irqfd.fd asserts | ||
1958 | the specified gsi in the irqchip. When the irqchip is resampled, such | ||
1959 | as from an EOI, the gsi is de-asserted and the user is notified via | ||
1960 | kvm_irqfd.resamplefd. It is the user's responsibility to re-queue | ||
1961 | the interrupt if the device making use of it still requires service. | ||
1962 | Note that closing the resamplefd is not sufficient to disable the | ||
1963 | irqfd. The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment | ||
1964 | and need not be specified with KVM_IRQFD_FLAG_DEASSIGN. | ||
1965 | |||
1949 | 4.76 KVM_PPC_ALLOCATE_HTAB | 1966 | 4.76 KVM_PPC_ALLOCATE_HTAB |
1950 | 1967 | ||
1951 | Capability: KVM_CAP_PPC_ALLOC_HTAB | 1968 | Capability: KVM_CAP_PPC_ALLOC_HTAB |
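The resample-mode life cycle described in the KVM_IRQFD hunk above (assert on trigger, de-assert and notify on EOI, re-queue by the user if the device still needs service) can be modelled as a toy state machine. This is only a sketch for reasoning about the ordering of events: every name below is hypothetical, no eventfd or ioctl is involved, and the real kernel implementation is not shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model of one resampling irqfd; not a KVM structure. */
struct resample_model {
	bool gsi_asserted;        /* current level of the gsi in the irqchip */
	unsigned int notifications; /* pending signals on the resamplefd */
};

/* The user (or device) posts an interrupt through the irqfd. */
static void model_trigger(struct resample_model *m)
{
	assert(m);
	m->gsi_asserted = true;
}

/* The irqchip is resampled, for example by a guest EOI. */
static void model_eoi(struct resample_model *m)
{
	assert(m);
	m->gsi_asserted = false;  /* the gsi is de-asserted ... */
	m->notifications++;       /* ... and the user is notified */
}

/*
 * The user consumes one notification and, if the device still requires
 * service, re-queues the interrupt. Returns false if nothing was pending.
 */
static bool model_user_resample(struct resample_model *m, bool device_pending)
{
	assert(m);
	if (m->notifications == 0)
		return false;
	m->notifications--;
	if (device_pending)
		model_trigger(m);
	return true;
}
```

The point of the model is that the level is never left asserted by the kernel side after an EOI; keeping it asserted for a device that is still pending is entirely the user's job.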
diff --git a/Documentation/virtual/kvm/hypercalls.txt b/Documentation/virtual/kvm/hypercalls.txt new file mode 100644 index 000000000000..ea113b5d87a4 --- /dev/null +++ b/Documentation/virtual/kvm/hypercalls.txt | |||
@@ -0,0 +1,66 @@ | |||
1 | Linux KVM Hypercall: | ||
2 | =================== | ||
3 | X86: | ||
4 | KVM Hypercalls have a three-byte sequence of either the vmcall or the vmmcall | ||
5 | instruction. The hypervisor can replace it with instructions that are | ||
6 | guaranteed to be supported. | ||
7 | |||
8 | Up to four arguments may be passed in rbx, rcx, rdx, and rsi respectively. | ||
9 | The hypercall number should be placed in rax and the return value will be | ||
10 | placed in rax. No other registers will be clobbered unless explicitly stated | ||
11 | by the particular hypercall. | ||
12 | |||
13 | S390: | ||
14 | R2-R7 are used for parameters 1-6. In addition, R1 is used for hypercall | ||
15 | number. The return value is written to R2. | ||
16 | |||
17 | S390 uses the diagnose instruction (0x500) as the hypercall, along with the | ||
18 | hypercall number in R1. | ||
19 | |||
20 | PowerPC: | ||
21 | It uses R3-R10 and hypercall number in R11. R4-R11 are used as output registers. | ||
22 | Return value is placed in R3. | ||
23 | |||
24 | KVM hypercalls use 4-byte opcodes that are patched with the 'hypercall-instructions' | ||
25 | property inside the device tree's /hypervisor node. | ||
26 | For more information refer to Documentation/virtual/kvm/ppc-pv.txt | ||
27 | |||
28 | KVM Hypercalls Documentation | ||
29 | =========================== | ||
30 | The template for each hypercall is: | ||
31 | 1. Hypercall name. | ||
32 | 2. Architecture(s) | ||
33 | 3. Status (deprecated, obsolete, active) | ||
34 | 4. Purpose | ||
35 | |||
36 | 1. KVM_HC_VAPIC_POLL_IRQ | ||
37 | ------------------------ | ||
38 | Architecture: x86 | ||
39 | Status: active | ||
40 | Purpose: Trigger guest exit so that the host can check for pending | ||
41 | interrupts on reentry. | ||
42 | |||
43 | 2. KVM_HC_MMU_OP | ||
44 | ------------------------ | ||
45 | Architecture: x86 | ||
46 | Status: deprecated. | ||
47 | Purpose: Support MMU operations such as writing to PTE, | ||
48 | flushing TLB, and releasing PT. | ||
49 | |||
50 | 3. KVM_HC_FEATURES | ||
51 | ------------------------ | ||
52 | Architecture: PPC | ||
53 | Status: active | ||
54 | Purpose: Expose hypercall availability to the guest. On x86 platforms, cpuid | ||
55 | is used to enumerate which hypercalls are available. On PPC, either device tree | ||
56 | based lookup (which is also what EPAPR dictates) or the KVM-specific enumeration | ||
57 | mechanism (which is this hypercall) can be used. | ||
58 | |||
59 | 4. KVM_HC_PPC_MAP_MAGIC_PAGE | ||
60 | ------------------------ | ||
61 | Architecture: PPC | ||
62 | Status: active | ||
63 | Purpose: To enable communication between the hypervisor and guest there is a | ||
64 | shared page that contains parts of supervisor visible register state. | ||
65 | The guest can map this shared page to access its supervisor register through | ||
66 | memory using this hypercall. | ||
diff --git a/Documentation/virtual/kvm/msr.txt b/Documentation/virtual/kvm/msr.txt index 730471048583..6d470ae7b073 100644 --- a/Documentation/virtual/kvm/msr.txt +++ b/Documentation/virtual/kvm/msr.txt | |||
@@ -34,9 +34,12 @@ MSR_KVM_WALL_CLOCK_NEW: 0x4b564d00 | |||
34 | time information and check that they are both equal and even. | 34 | time information and check that they are both equal and even. |
35 | An odd version indicates an in-progress update. | 35 | An odd version indicates an in-progress update. |
36 | 36 | ||
37 | sec: number of seconds for wallclock. | 37 | sec: number of seconds for wallclock at time of boot. |
38 | 38 | ||
39 | nsec: number of nanoseconds for wallclock. | 39 | nsec: number of nanoseconds for wallclock at time of boot. |
40 | |||
41 | In order to get the current wallclock time, the system_time from | ||
42 | MSR_KVM_SYSTEM_TIME_NEW needs to be added. | ||
40 | 43 | ||
41 | Note that although MSRs are per-CPU entities, the effect of this | 44 | Note that although MSRs are per-CPU entities, the effect of this |
42 | particular MSR is global. | 45 | particular MSR is global. |
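The wallclock note in the hunk above says that sec and nsec describe the time of boot and that system_time must be added to obtain the current wallclock. A minimal sketch of that addition follows. The names `struct wall_time`, `current_wallclock()`, and `NSEC_PER_SEC_SKETCH` are hypothetical and local to this example; the sketch assumes system_time has already been read as a nanosecond count.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC_SKETCH 1000000000ULL /* local constant for this sketch */

/* Hypothetical result type; not a KVM structure. */
struct wall_time {
	uint64_t sec;
	uint32_t nsec;
};

/*
 * Add the nanoseconds elapsed since boot (system_time) to the boot-time
 * wallclock (sec, nsec) and normalize so that nsec stays below one second.
 */
static struct wall_time current_wallclock(uint32_t boot_sec, uint32_t boot_nsec,
					  uint64_t system_time_ns)
{
	struct wall_time now;
	uint64_t nsec_sum;

	assert(boot_nsec < NSEC_PER_SEC_SKETCH);
	nsec_sum = (uint64_t)boot_nsec + system_time_ns % NSEC_PER_SEC_SKETCH;
	now.sec = (uint64_t)boot_sec
		+ system_time_ns / NSEC_PER_SEC_SKETCH
		+ nsec_sum / NSEC_PER_SEC_SKETCH;
	now.nsec = (uint32_t)(nsec_sum % NSEC_PER_SEC_SKETCH);
	return now;
}
```

For example, a boot wallclock of 100 s + 900000000 ns combined with a system_time of 1200000000 ns yields 102 s + 100000000 ns.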
@@ -82,20 +85,25 @@ MSR_KVM_SYSTEM_TIME_NEW: 0x4b564d01 | |||
82 | time at the time this structure was last updated. Unit is | 85 | time at the time this structure was last updated. Unit is |
83 | nanoseconds. | 86 | nanoseconds. |
84 | 87 | ||
85 | tsc_to_system_mul: a function of the tsc frequency. One has | 88 | tsc_to_system_mul: multiplier to be used when converting |
86 | to multiply any tsc-related quantity by this value to get | 89 | tsc-related quantity to nanoseconds |
87 | a value in nanoseconds, besides dividing by 2^tsc_shift | ||
88 | 90 | ||
89 | tsc_shift: cycle to nanosecond divider, as a power of two, to | 91 | tsc_shift: shift to be used when converting tsc-related |
90 | allow for shift rights. One has to shift right any tsc-related | 92 | quantity to nanoseconds. This shift will ensure that |
91 | quantity by this value to get a value in nanoseconds, besides | 93 | multiplication with tsc_to_system_mul does not overflow. |
92 | multiplying by tsc_to_system_mul. | 94 | A positive value denotes a left shift, a negative value |
95 | a right shift. | ||
93 | 96 | ||
94 | With this information, guests can derive per-CPU time by | 97 | The conversion from tsc to nanoseconds involves an additional |
95 | doing: | 98 | right shift by 32 bits. With this information, guests can |
99 | derive per-CPU time by doing: | ||
96 | 100 | ||
97 | time = (current_tsc - tsc_timestamp) | 101 | time = (current_tsc - tsc_timestamp) |
98 | time = (time * tsc_to_system_mul) >> tsc_shift | 102 | if (tsc_shift >= 0) |
103 | time <<= tsc_shift; | ||
104 | else | ||
105 | time >>= -tsc_shift; | ||
106 | time = (time * tsc_to_system_mul) >> 32 | ||
99 | time = time + system_time | 107 | time = time + system_time |
100 | 108 | ||
101 | flags: bits in this field indicate extended capabilities | 109 | flags: bits in this field indicate extended capabilities |
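The updated conversion in the hunk above (signed shift first, then multiply by tsc_to_system_mul and shift right by 32) can be written out in C as follows. This is a sketch under stated assumptions: `struct pvclock_fields` is a hypothetical container holding only the fields named in the text, `pvclock_time_ns()` is a hypothetical helper, and the version-check protocol a guest needs around the read is left out. The 64x32-bit multiply is split into two halves so that the intermediate product cannot overflow 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the fields named in msr.txt. */
struct pvclock_fields {
	uint64_t tsc_timestamp;
	uint64_t system_time;       /* nanoseconds */
	uint32_t tsc_to_system_mul;
	int8_t   tsc_shift;         /* >= 0: left shift, < 0: right shift */
};

static uint64_t pvclock_time_ns(const struct pvclock_fields *f,
				uint64_t current_tsc)
{
	uint64_t delta = current_tsc - f->tsc_timestamp;
	uint64_t hi, lo, scaled;

	assert(f->tsc_shift > -64 && f->tsc_shift < 64);
	if (f->tsc_shift >= 0)
		delta <<= f->tsc_shift;
	else
		delta >>= -f->tsc_shift;

	/*
	 * Compute (delta * mul) >> 32 exactly without a 128-bit type:
	 * delta = hi * 2^32 + lo, so the result is
	 * hi * mul + ((lo * mul) >> 32).
	 */
	hi = delta >> 32;
	lo = delta & 0xffffffffu;
	scaled = hi * f->tsc_to_system_mul
	       + ((lo * f->tsc_to_system_mul) >> 32);

	return scaled + f->system_time;
}
```

With tsc_to_system_mul = 0x80000000 (a factor of one half after the 32-bit shift) and tsc_shift = 0, a delta of 1000 cycles converts to 500 ns before system_time is added.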
diff --git a/Documentation/virtual/kvm/ppc-pv.txt b/Documentation/virtual/kvm/ppc-pv.txt index 4911cf95c67e..4cd076febb02 100644 --- a/Documentation/virtual/kvm/ppc-pv.txt +++ b/Documentation/virtual/kvm/ppc-pv.txt | |||
@@ -174,3 +174,25 @@ following: | |||
174 | That way we can inject an arbitrary amount of code as replacement for a single | 174 | That way we can inject an arbitrary amount of code as replacement for a single |
175 | instruction. This allows us to check for pending interrupts when setting EE=1 | 175 | instruction. This allows us to check for pending interrupts when setting EE=1 |
176 | for example. | 176 | for example. |
177 | |||
178 | Hypercall ABIs in KVM on PowerPC | ||
179 | ================================= | ||
180 | 1) KVM hypercalls (ePAPR) | ||
181 | |||
182 | These are ePAPR-compliant hypercall implementations (mentioned above). Even | ||
183 | generic hypercalls are implemented here, like the ePAPR idle hcall. These are | ||
184 | available on all targets. | ||
185 | |||
186 | 2) PAPR hypercalls | ||
187 | |||
188 | PAPR hypercalls are needed to run server PowerPC PAPR guests (-M pseries in QEMU). | ||
189 | These are the same hypercalls that pHyp, the POWER hypervisor, implements. Some of | ||
190 | them are handled in the kernel, some are handled in user space. This is only | ||
191 | available on book3s_64. | ||
192 | |||
193 | 3) OSI hypercalls | ||
194 | |||
195 | Mac-on-Linux is another user of KVM on PowerPC, which has its own hypercalls (long | ||
196 | before KVM). This is supported to maintain compatibility. All these hypercalls get | ||
197 | forwarded to user space. This is only useful on book3s_32, but can be used with | ||
198 | book3s_64 as well. | ||