Diffstat (limited to 'Documentation/kvm/mmu.txt'):
-rw-r--r-- Documentation/kvm/mmu.txt | 52
1 file changed, 48 insertions, 4 deletions
diff --git a/Documentation/kvm/mmu.txt b/Documentation/kvm/mmu.txt
index aaed6ab9d7ab..142cc5136650 100644
--- a/Documentation/kvm/mmu.txt
+++ b/Documentation/kvm/mmu.txt
@@ -77,10 +77,10 @@ Memory
 
 Guest memory (gpa) is part of the user address space of the process that is
 using kvm. Userspace defines the translation between guest addresses and user
-addresses (gpa->hva); note that two gpas may alias to the same gva, but not
+addresses (gpa->hva); note that two gpas may alias to the same hva, but not
 vice versa.
 
-These gvas may be backed using any method available to the host: anonymous
+These hvas may be backed using any method available to the host: anonymous
 memory, file backed memory, and device memory. Memory might be paged by the
 host at any time.
 
@@ -161,7 +161,7 @@ Shadow pages contain the following information:
 role.cr4_pae:
 Contains the value of cr4.pae for which the page is valid (e.g. whether
 32-bit or 64-bit gptes are in use).
-role.cr4_nxe:
+role.nxe:
 Contains the value of efer.nxe for which the page is valid.
 role.cr0_wp:
 Contains the value of cr0.wp for which the page is valid.
@@ -180,7 +180,9 @@ Shadow pages contain the following information:
 guest pages as leaves.
 gfns:
 An array of 512 guest frame numbers, one for each present pte. Used to
-perform a reverse map from a pte to a gfn.
+perform a reverse map from a pte to a gfn. When role.direct is set, any
+element of this array can be calculated from the gfn field when used; in
+this case, the array of gfns is not allocated. See role.direct and gfn.
 slot_bitmap:
 A bitmap containing one bit per memory slot. If the page contains a pte
 mapping a page from memory slot n, then bit n of slot_bitmap will be set
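The new gfns text says that, for role.direct pages, each element can be computed from the gfn field instead of being stored. The text does not spell out the formula. The following is a minimal sketch of that computation. It assumes 512 entries per table, so each paging level translates 9 bits of the frame number. ShadowPage and sp_get_gfn() are hypothetical names used only for illustration, not KVM's own structures.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

PTES_PER_PAGE = 512
LEVEL_BITS = 9  # assumption: log2(512), the gfn bits translated per level


@dataclass
class ShadowPage:
    """Hypothetical, simplified model of a shadow page header."""
    gfn: int                               # base frame for direct pages
    level: int                             # 1 = 4k sptes, 2 = 2M sptes, 3 = 1G sptes
    direct: bool                           # models role.direct
    gfns: Optional[Sequence[int]] = None   # per-pte gfns; not allocated when direct


def sp_get_gfn(sp: ShadowPage, index: int) -> int:
    """Return the gfn mapped by pte number `index` of shadow page `sp`."""
    if not 0 <= index < PTES_PER_PAGE:
        raise IndexError(index)
    if not sp.direct:
        # Indirect page: the guest chose each translation, so it is stored.
        return sp.gfns[index]
    # Direct page: the range is linear, so entry `index` starts
    # index * 512**(level - 1) frames after the base gfn.
    return sp.gfn + (index << ((sp.level - 1) * LEVEL_BITS))
```

For example, entry 3 of a direct level-2 page whose gfn is 0x40000 maps 0x40000 + 3 * 512 = 0x40600. This is why the array can be omitted for direct pages.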
@@ -296,6 +298,48 @@ Host translation updates:
 - look up affected sptes through reverse map
 - drop (or update) translations
 
+Emulating cr0.wp
+================
+
+If tdp is not enabled, the host must keep cr0.wp=1 so page write protection
+works for the guest kernel, not just guest userspace. When the guest
+cr0.wp=1, this does not present a problem. However, when the guest cr0.wp=0,
+we cannot map the permissions for gpte.u=1, gpte.w=0 to any spte (the
+semantics require allowing any guest kernel access plus user read access).
+
+We handle this by mapping the permissions to two possible sptes, depending
+on fault type:
+
+- kernel write fault: spte.u=0, spte.w=1 (allows full kernel access,
+  disallows user access)
+- read fault: spte.u=1, spte.w=0 (allows full read access, disallows kernel
+  write access)
+
+(User write faults generate a guest #PF.)
+
+Large pages
+===========
+
+The mmu supports all combinations of large and small guest and host pages.
+Supported page sizes include 4k, 2M, 4M, and 1G. 4M pages are treated as
+two separate 2M pages, on both guest and host, since the mmu always uses PAE
+paging.
+
+To instantiate a large spte, four constraints must be satisfied:
+
+- the spte must point to a large host page
+- the guest pte must be a large pte of at least equivalent size (if tdp is
+  enabled, there is no guest pte and this condition is satisfied)
+- if the spte will be writable, the large page frame must not overlap any
+  write-protected pages
+- the guest page must be wholly contained by a single memory slot
+
+To check the last two conditions, the mmu maintains a ->write_count set of
+arrays for each memory slot and large page size. Every write-protected page
+causes its write_count to be incremented, thus preventing instantiation of
+a large spte. The frames at the end of an unaligned memory slot have
+artificially inflated ->write_counts so they are never mapped by large sptes.
+
 Further reading
 ===============
 
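The cr0.wp emulation added above picks one of two sptes for a gpte with u=1, w=0, depending on the fault type. The sketch below models that choice together with the access check the hardware performs while the host keeps cr0.wp=1. Fault, Spte, spte_for_user_readonly_gpte() and allows() are hypothetical names. SMEP, SMAP and all other permission bits are deliberately ignored.

```python
from enum import Enum, auto
from typing import NamedTuple, Optional


class Fault(Enum):
    KERNEL_READ = auto()
    KERNEL_WRITE = auto()
    USER_READ = auto()
    USER_WRITE = auto()


class Spte(NamedTuple):
    u: bool  # user accessible
    w: bool  # writable


def spte_for_user_readonly_gpte(fault: Fault) -> Optional[Spte]:
    """Choose spte permissions for a gpte with u=1, w=0 while the guest runs
    with cr0.wp=0 and the host keeps cr0.wp=1.

    Returns None when the guest itself forbids the access, i.e. the fault
    must be reflected to the guest as a #PF.
    """
    if fault is Fault.USER_WRITE:
        return None
    if fault is Fault.KERNEL_WRITE:
        # Grant kernel writes; clear u so that user accesses (including the
        # forbidden user writes) still fault.
        return Spte(u=False, w=True)
    # Kernel or user read: everyone may read; kernel writes fault again.
    return Spte(u=True, w=False)


def allows(spte: Spte, fault: Fault) -> bool:
    """Access check with host cr0.wp=1 (SMEP/SMAP ignored): user accesses
    need spte.u, and every write needs spte.w."""
    is_user = fault in (Fault.USER_READ, Fault.USER_WRITE)
    is_write = fault in (Fault.KERNEL_WRITE, Fault.USER_WRITE)
    if is_user and not spte.u:
        return False
    if is_write and not spte.w:
        return False
    return True
```

Neither spte grants more than the guest semantics allow (full kernel access plus user reads). Each one, however, denies an access that the guest does permit. A guest that alternates kernel writes and user reads on the same page therefore keeps faulting and switching between the two sptes.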
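The ->write_count scheme from the large page section can be modelled in a few lines. This sketch assumes a single memory slot and a single large page size (2M, i.e. 512 4k frames). LargePageSlot and its methods are hypothetical names. Following the text, any write-protected page inside a frame blocks a large spte for that frame. The partial frames at either end of an unaligned slot start with an inflated count.

```python
class LargePageSlot:
    """Hypothetical model of one memory slot's ->write_count array for a
    single large page size (2M by default, i.e. 512 4k frames)."""

    def __init__(self, base_gfn: int, npages: int, frames_per_large: int = 512):
        self.base_gfn = base_gfn
        self.npages = npages
        self.frames = frames_per_large
        first = base_gfn // frames_per_large
        last = (base_gfn + npages - 1) // frames_per_large
        self.write_count = [0] * (last - first + 1)
        # A large frame that sticks out of an unaligned slot is not wholly
        # contained in it: inflate its count so it never becomes a large spte.
        if base_gfn % frames_per_large:
            self.write_count[0] += 1
        if (base_gfn + npages) % frames_per_large:
            self.write_count[-1] += 1

    def _index(self, gfn: int) -> int:
        if not self.base_gfn <= gfn < self.base_gfn + self.npages:
            raise ValueError(f"gfn {gfn:#x} is outside this slot")
        return gfn // self.frames - self.base_gfn // self.frames

    def write_protect(self, gfn: int) -> None:
        """Record that one 4k guest page (e.g. a guest page table) is
        write protected."""
        self.write_count[self._index(gfn)] += 1

    def unprotect(self, gfn: int) -> None:
        self.write_count[self._index(gfn)] -= 1

    def can_map_large(self, gfn: int, host_page_large: bool,
                      guest_pte_large: bool) -> bool:
        """Check the constraints for a large spte covering gfn. With tdp
        there is no guest pte, so callers pass guest_pte_large=True."""
        return (host_page_large
                and guest_pte_large
                and self.write_count[self._index(gfn)] == 0)
```

Consider a slot starting at gfn 0x100 with 1024 pages. It spans three 2M frames and starts with counts [1, 0, 1], so only the middle frame (gfns 0x200-0x3ff) can ever be mapped by a large spte.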