author     Nadav Har'El <nyh@il.ibm.com>    2011-05-25 16:17:11 -0400
committer  Avi Kivity <avi@redhat.com>      2011-07-12 06:15:22 -0400
commit     823e396558e509b7c3225cd76806f3d6643ff5f8
tree       af8bc2c12b9cc31571d6280621a15b0657907512 /Documentation/virtual/kvm
parent     2844d8490523c5768cc37f21e065c76c45232724
KVM: nVMX: Documentation
This patch includes a brief introduction to the nested vmx feature in the
Documentation/kvm directory. The document also includes a copy of the
vmcs12 structure, as requested by Avi Kivity.
[marcelo: move to Documentation/virtual/kvm]
Signed-off-by: Nadav Har'El <nyh@il.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Diffstat (limited to 'Documentation/virtual/kvm')
-rw-r--r--  Documentation/virtual/kvm/nested-vmx.txt  251
1 files changed, 251 insertions, 0 deletions
diff --git a/Documentation/virtual/kvm/nested-vmx.txt b/Documentation/virtual/kvm/nested-vmx.txt
new file mode 100644
index 000000000000..8ed937de1163
--- /dev/null
+++ b/Documentation/virtual/kvm/nested-vmx.txt
@@ -0,0 +1,251 @@
Nested VMX
==========

Overview
--------

On Intel processors, KVM uses Intel's VMX (Virtual-Machine eXtensions)
to easily and efficiently run guest operating systems. Normally, these guests
*cannot* themselves be hypervisors running their own guests, because in VMX,
guests cannot use VMX instructions.

The "Nested VMX" feature adds this missing capability: running guest
hypervisors (which use VMX) with their own nested guests. It does so by
allowing a guest to use VMX instructions, and correctly and efficiently
emulating them using the single level of VMX available in the hardware.
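
Absent later hardware assists such as VMCS shadowing, a VMX instruction like
VMXON, VMPTRLD or VMREAD executed by L1 is not carried out by the processor;
it causes a VM exit to L0, which must then emulate it. The sketch below is a
hypothetical illustration, not KVM's actual exit handler, of how L0 might
recognize such exits. The numeric values are the basic exit reasons listed in
the Intel SDM (volume 3, appendix C); the enum and function names are made up:

```c
#include <stdbool.h>
#include <stddef.h>

/* Basic VM-exit reasons of the VMX instructions, numbered as in the
 * Intel SDM (vol. 3, appendix C). The enum names are illustrative. */
enum vmx_insn_exit {
	EXIT_VMCLEAR  = 19,
	EXIT_VMLAUNCH = 20,
	EXIT_VMPTRLD  = 21,
	EXIT_VMPTRST  = 22,
	EXIT_VMREAD   = 23,
	EXIT_VMRESUME = 24,
	EXIT_VMWRITE  = 25,
	EXIT_VMXOFF   = 26,
	EXIT_VMXON    = 27,
};

/* Hypothetical helper: true if the exit was caused by a VMX instruction
 * that L0 has to emulate for L1. Bits 15:0 of the exit-reason field
 * hold the basic exit reason. */
static bool is_vmx_insn_exit(unsigned int exit_reason)
{
	unsigned int basic = exit_reason & 0xffff;

	return basic >= EXIT_VMCLEAR && basic <= EXIT_VMXON;
}

/* Hypothetical helper: mnemonic for logging, or NULL for other exits. */
static const char *vmx_insn_name(unsigned int exit_reason)
{
	static const char *const names[] = {
		"VMCLEAR", "VMLAUNCH", "VMPTRLD", "VMPTRST", "VMREAD",
		"VMRESUME", "VMWRITE", "VMXOFF", "VMXON",
	};

	if (!is_vmx_insn_exit(exit_reason))
		return NULL;
	return names[(exit_reason & 0xffff) - EXIT_VMCLEAR];
}
```

For example, an exit with basic reason 23 would be routed to a VMREAD
emulation path, while reason 18 (VMCALL) is a hypercall rather than one of
the VMX state-management instructions and is not matched here.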

We describe the theory behind the nested VMX feature, its implementation and
its performance characteristics in much greater detail in the OSDI 2010 paper
"The Turtles Project: Design and Implementation of Nested Virtualization",
available at:

	http://www.usenix.org/events/osdi10/tech/full_papers/Ben-Yehuda.pdf


Terminology
-----------

Single-level virtualization has two levels: the host (KVM) and the guests.
In nested virtualization, we have three levels: the host (KVM), which we call
L0; the guest hypervisor, which we call L1; and its nested guest, which we
call L2.


Known limitations
-----------------

The current code supports running Linux guests under KVM guests.
Only 64-bit guest hypervisors are supported.

Additional patches for running Windows under guest KVM and Linux under
guest VMware server, as well as support for nested EPT, are currently running
in the lab and will be sent as follow-on patchsets.


Running nested VMX
------------------

The nested VMX feature is disabled by default. It can be enabled by giving
the "nested=1" option to the kvm-intel module.
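
Whether the option took effect can be checked from user space. A minimal
sketch in C, assuming the module exposes the parameter read-only at
/sys/module/kvm_intel/parameters/nested (sysfs spells the module name with an
underscore); the helper names are hypothetical. A bool module parameter reads
back as "Y" or "N", so the sketch accepts that form as well as "1"/"0":

```c
#include <stdbool.h>
#include <stdio.h>

/* Assumed location of the parameter; module names use '_' in sysfs. */
#define NESTED_PARAM_PATH "/sys/module/kvm_intel/parameters/nested"

/* Interpret the text of the "nested" parameter: "Y"/"y"/"1" mean on. */
static bool nested_value_enabled(const char *text)
{
	return text[0] == 'Y' || text[0] == 'y' || text[0] == '1';
}

/* Returns 1 if nested VMX is enabled, 0 if it is disabled, and -1 if the
 * parameter cannot be read (e.g. kvm-intel is not loaded). */
static int read_nested_param(const char *path)
{
	char buf[8] = "";
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return nested_value_enabled(buf) ? 1 : 0;
}
```

A caller would pass NESTED_PARAM_PATH and treat -1 as "unknown" rather than
as "disabled", since an unloaded module simply has no sysfs directory.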

No modifications are required to user space (qemu). However, qemu's default
emulated CPU type (qemu64) does not list the "VMX" CPU feature, so it must be
explicitly enabled by giving qemu one of the following options:

	-cpu host              (emulated CPU has all features of the real CPU)

	-cpu qemu64,+vmx       (add just the vmx feature to a named CPU type)
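
From inside L1, one way to confirm that either option worked is to check the
VMX feature flag, CPUID leaf 1, ECX bit 5, which is the bit these options
expose in the emulated CPU. The sketch assumes GCC or Clang, whose <cpuid.h>
provides __get_cpuid(); the function names are hypothetical:

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* VMX support is reported in CPUID leaf 1, ECX bit 5. */
#define CPUID_1_ECX_VMX (1u << 5)

static bool ecx_has_vmx(unsigned int ecx)
{
	return (ecx & CPUID_1_ECX_VMX) != 0;
}

/* Returns 1 if this (possibly virtual) CPU advertises VMX, 0 if not, and
 * -1 if CPUID leaf 1 cannot be queried (non-x86 build or unsupported). */
static int cpu_advertises_vmx(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return -1;
	return ecx_has_vmx(ecx) ? 1 : 0;
#else
	return -1;
#endif
}
```

Run in an L1 started with the default qemu64 CPU type, this would be expected
to return 0; with "-cpu host" on a VMX-capable host, or with "+vmx", it
should return 1.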


ABIs
----

Nested VMX aims to present a standard and (eventually) fully-functional VMX
implementation for a guest hypervisor to use. As such, the official
specification of the ABI that it provides is Intel's VMX specification,
namely volume 3B of their "Intel 64 and IA-32 Architectures Software
Developer's Manual". Not all of VMX's features are currently fully supported,
but the goal is to eventually support them all, starting with the VMX features
which are used in practice by popular hypervisors (KVM and others).

As a VMX implementation, nested VMX presents a VMCS structure to L1.
As mandated by the spec, other than the two fields revision_id and abort,
this structure is *opaque* to its user, who is not supposed to know or care
about its internal structure. Rather, the structure is accessed through the
VMREAD and VMWRITE instructions.
Still, for debugging purposes, KVM developers might be interested to know the
internals of this structure; this is struct vmcs12 from arch/x86/kvm/vmx.c.
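
Because L1 never touches vmcs12 directly, each VMREAD or VMWRITE names a field
by its 32-bit field encoding, and L0 has to map that encoding onto a member of
struct vmcs12. The sketch below decodes the components of an encoding as the
Intel SDM defines them (bit 0 access type, bits 9:1 index, bits 11:10 field
type, bits 14:13 width). It is an illustration, not the lookup vmx.c uses,
and the type and function names are hypothetical:

```c
#include <stdint.h>

enum vmcs_field_width { VMCS_W16 = 0, VMCS_W64 = 1, VMCS_W32 = 2, VMCS_WNAT = 3 };
enum vmcs_field_type  { VMCS_CONTROL = 0, VMCS_RO_DATA = 1, VMCS_GUEST = 2, VMCS_HOST = 3 };

/* Components of a VMCS field encoding (the VMREAD/VMWRITE operand). */
struct vmcs_field_info {
	unsigned int high;            /* bit 0: 1 = upper half of a 64-bit field */
	unsigned int index;           /* bits 9:1 */
	enum vmcs_field_type type;    /* bits 11:10 */
	enum vmcs_field_width width;  /* bits 14:13 */
};

static struct vmcs_field_info decode_vmcs_field(uint32_t encoding)
{
	struct vmcs_field_info info;

	info.high  = encoding & 1;
	info.index = (encoding >> 1) & 0x1ff;
	info.type  = (enum vmcs_field_type)((encoding >> 10) & 3);
	info.width = (enum vmcs_field_width)((encoding >> 13) & 3);
	return info;
}

/* Size in bytes of the vmcs12 member backing a field of this width,
 * given that natural_width is a u64 in struct vmcs12. */
static unsigned int vmcs12_field_size(enum vmcs_field_width w)
{
	switch (w) {
	case VMCS_W16: return 2;
	case VMCS_W32: return 4;
	default:       return 8; /* 64-bit and natural-width fields */
	}
}
```

Some encodings listed in the SDM: GUEST_RIP is 0x681e (natural width, guest
state), TSC_OFFSET is 0x2010 (64-bit control, with 0x2011 naming its upper
half), VM_EXIT_REASON is 0x4402 (32-bit, read-only data) and HOST_RIP is
0x6c16 (natural width, host state).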

The name "vmcs12" refers to the VMCS that L1 builds for L2. In the code we
also have "vmcs01", the VMCS that L0 built for L1, and "vmcs02", the VMCS
which L0 builds to actually run L2; how this is done is explained in the
aforementioned paper.
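
The naming convention can be read as "vmcsXY is the VMCS that level X uses to
run level Y". A hypothetical sketch of how L0 might keep track of the three
(the struct and function names are illustrative, not KVM's). Since only L0
runs in VMX root operation on the physical CPU, the VMCS the hardware
actually uses is always one that L0 built:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU record in L0 of the VMCSs involved in nesting. */
struct nested_vmcs_set {
	void *vmcs01;        /* L0 -> L1: built by L0, loaded on the CPU */
	void *vmcs02;        /* L0 -> L2: built by L0 to actually run L2 */
	uint64_t vmcs12_gpa; /* L1 -> L2: lives in L1's memory, at the
	                      * guest-physical address L1 gave to VMPTRLD */
};

/* The VMCS loaded on the real CPU: vmcs02 while L2 runs, vmcs01
 * otherwise. In this design vmcs12 is never loaded directly. */
static void *hardware_vmcs(const struct nested_vmcs_set *s, bool l2_running)
{
	return l2_running ? s->vmcs02 : s->vmcs01;
}
```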

For convenience, we repeat the content of struct vmcs12 here. If the internals
of this structure change, this can break live migration across KVM versions.
VMCS12_REVISION (from vmx.c) should be changed if struct vmcs12 or its inner
struct shadow_vmcs is ever changed.

typedef u64 natural_width;
struct __packed vmcs12 {
	/* According to the Intel spec, a VMCS region must start with
	 * these two user-visible fields */
	u32 revision_id;
	u32 abort;

	u32 launch_state; /* set to 0 by VMCLEAR, to 1 by VMLAUNCH */
	u32 padding[7]; /* room for future expansion */

	u64 io_bitmap_a;
	u64 io_bitmap_b;
	u64 msr_bitmap;
	u64 vm_exit_msr_store_addr;
	u64 vm_exit_msr_load_addr;
	u64 vm_entry_msr_load_addr;
	u64 tsc_offset;
	u64 virtual_apic_page_addr;
	u64 apic_access_addr;
	u64 ept_pointer;
	u64 guest_physical_address;
	u64 vmcs_link_pointer;
	u64 guest_ia32_debugctl;
	u64 guest_ia32_pat;
	u64 guest_ia32_efer;
	u64 guest_pdptr0;
	u64 guest_pdptr1;
	u64 guest_pdptr2;
	u64 guest_pdptr3;
	u64 host_ia32_pat;
	u64 host_ia32_efer;
	u64 padding64[8]; /* room for future expansion */
	natural_width cr0_guest_host_mask;
	natural_width cr4_guest_host_mask;
	natural_width cr0_read_shadow;
	natural_width cr4_read_shadow;
	natural_width cr3_target_value0;
	natural_width cr3_target_value1;
	natural_width cr3_target_value2;
	natural_width cr3_target_value3;
	natural_width exit_qualification;
	natural_width guest_linear_address;
	natural_width guest_cr0;
	natural_width guest_cr3;
	natural_width guest_cr4;
	natural_width guest_es_base;
	natural_width guest_cs_base;
	natural_width guest_ss_base;
	natural_width guest_ds_base;
	natural_width guest_fs_base;
	natural_width guest_gs_base;
	natural_width guest_ldtr_base;
	natural_width guest_tr_base;
	natural_width guest_gdtr_base;
	natural_width guest_idtr_base;
	natural_width guest_dr7;
	natural_width guest_rsp;
	natural_width guest_rip;
	natural_width guest_rflags;
	natural_width guest_pending_dbg_exceptions;
	natural_width guest_sysenter_esp;
	natural_width guest_sysenter_eip;
	natural_width host_cr0;
	natural_width host_cr3;
	natural_width host_cr4;
	natural_width host_fs_base;
	natural_width host_gs_base;
	natural_width host_tr_base;
	natural_width host_gdtr_base;
	natural_width host_idtr_base;
	natural_width host_ia32_sysenter_esp;
	natural_width host_ia32_sysenter_eip;
	natural_width host_rsp;
	natural_width host_rip;
	natural_width paddingl[8]; /* room for future expansion */
	u32 pin_based_vm_exec_control;
	u32 cpu_based_vm_exec_control;
	u32 exception_bitmap;
	u32 page_fault_error_code_mask;
	u32 page_fault_error_code_match;
	u32 cr3_target_count;
	u32 vm_exit_controls;
	u32 vm_exit_msr_store_count;
	u32 vm_exit_msr_load_count;
	u32 vm_entry_controls;
	u32 vm_entry_msr_load_count;
	u32 vm_entry_intr_info_field;
	u32 vm_entry_exception_error_code;
	u32 vm_entry_instruction_len;
	u32 tpr_threshold;
	u32 secondary_vm_exec_control;
	u32 vm_instruction_error;
	u32 vm_exit_reason;
	u32 vm_exit_intr_info;
	u32 vm_exit_intr_error_code;
	u32 idt_vectoring_info_field;
	u32 idt_vectoring_error_code;
	u32 vm_exit_instruction_len;
	u32 vmx_instruction_info;
	u32 guest_es_limit;
	u32 guest_cs_limit;
	u32 guest_ss_limit;
	u32 guest_ds_limit;
	u32 guest_fs_limit;
	u32 guest_gs_limit;
	u32 guest_ldtr_limit;
	u32 guest_tr_limit;
	u32 guest_gdtr_limit;
	u32 guest_idtr_limit;
	u32 guest_es_ar_bytes;
	u32 guest_cs_ar_bytes;
	u32 guest_ss_ar_bytes;
	u32 guest_ds_ar_bytes;
	u32 guest_fs_ar_bytes;
	u32 guest_gs_ar_bytes;
	u32 guest_ldtr_ar_bytes;
	u32 guest_tr_ar_bytes;
	u32 guest_interruptibility_info;
	u32 guest_activity_state;
	u32 guest_sysenter_cs;
	u32 host_ia32_sysenter_cs;
	u32 padding32[8]; /* room for future expansion */
	u16 virtual_processor_id;
	u16 guest_es_selector;
	u16 guest_cs_selector;
	u16 guest_ss_selector;
	u16 guest_ds_selector;
	u16 guest_fs_selector;
	u16 guest_gs_selector;
	u16 guest_ldtr_selector;
	u16 guest_tr_selector;
	u16 host_es_selector;
	u16 host_cs_selector;
	u16 host_ss_selector;
	u16 host_ds_selector;
	u16 host_fs_selector;
	u16 host_gs_selector;
	u16 host_tr_selector;
};
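
Of all these fields, L1 may rely only on the first two. The sketch below
models just the leading part of struct vmcs12 with <stdint.h> types and shows
the kind of revision check an emulated VMPTRLD can make: software is expected
to write the VMCS revision identifier it was given (through the
IA32_VMX_BASIC MSR) into revision_id, and a region carrying any other value is
refused. With VMCS12_REVISION bumped on every layout change, a region written
against a different layout would then be refused rather than misread.
EXAMPLE_VMCS12_REVISION is a placeholder, not the real constant:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for VMCS12_REVISION from vmx.c. */
#define EXAMPLE_VMCS12_REVISION 0x12345678u

/* Leading fields of struct vmcs12, as listed above. */
struct vmcs12_header {
	uint32_t revision_id;
	uint32_t abort;
	uint32_t launch_state;
	uint32_t padding[7];
};

/* The header is 40 bytes, so io_bitmap_a starts at offset 40. */
_Static_assert(sizeof(struct vmcs12_header) == 40, "unexpected header size");
_Static_assert(offsetof(struct vmcs12_header, launch_state) == 8,
	       "launch_state follows revision_id and abort");

/* Sketch of an emulated VMPTRLD's check: the first 32 bits of the region
 * L1 points to must hold the expected revision identifier. */
static int vmcs12_revision_matches(const void *region, uint32_t expected)
{
	uint32_t rev;

	memcpy(&rev, region, sizeof(rev)); /* no alignment assumptions */
	return rev == expected;
}
```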


Authors
-------

These patches were written by:
	Abel Gordon, abelg <at> il.ibm.com
	Nadav Har'El, nyh <at> il.ibm.com
	Orit Wasserman, oritw <at> il.ibm.com
	Ben-Ami Yassor, benami <at> il.ibm.com
	Muli Ben-Yehuda, muli <at> il.ibm.com

With contributions by:
	Anthony Liguori, aliguori <at> us.ibm.com
	Mike Day, mdday <at> us.ibm.com
	Michael Factor, factor <at> il.ibm.com
	Zvi Dubitzky, dubi <at> il.ibm.com

And valuable reviews by:
	Avi Kivity, avi <at> redhat.com
	Gleb Natapov, gleb <at> redhat.com
	Marcelo Tosatti, mtosatti <at> redhat.com
	Kevin Tian, kevin.tian <at> intel.com
	and others.