aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/kvm/mmu.c
Commit message (Collapse)AuthorAge
* KVM: MMU: Fix potential race setting upper shadow ptes on nonpae hostsAvi Kivity2008-07-20
| | | | | | | | | | | | | The direct mapped shadow code (used for real mode and two dimensional paging) sets upper-level ptes using direct assignment rather than calling set_shadow_pte(). A nonpae host will split this into two writes, which opens up a race if another vcpu accesses the same memory area. Fix by calling set_shadow_pte() instead of assigning directly. Noticed by Izik Eidus. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: improve invalid shadow root page handlingMarcelo Tosatti2008-07-20
| | | | | | | | Harden kvm_mmu_zap_page() against invalid root pages that had been shadowed from memslots that are gone. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: mmu_shrink: kvm_mmu_zap_page requires slots_lock to be heldMarcelo Tosatti2008-07-20
| | | | | | | | | | | | kvm_mmu_zap_page() needs slots lock held (rmap_remove->gfn_to_memslot, for example). Since kvm_lock spinlock is held in mmu_shrink(), do a non-blocking down_read_trylock(). Untested. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Fix printk formatAvi Kivity2008-07-20
| | | | Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: When debug is enabled, make it a run-time parameterAvi Kivity2008-07-20
| | | | Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Avoid page prefetch on SVMAvi Kivity2008-07-20
| | | | | | | | SVM cannot benefit from page prefetching since guest page fault bypass cannot by made to work there. Avoid accessing the guest page table in this case. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Move nonpaging_prefetch_page()Avi Kivity2008-07-20
| | | | | | In preparation for next patch. No code change. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Fix false flooding when a pte points to page tableAvi Kivity2008-07-20
| | | | | | | | | | | | | | | | | | | | | | The KVM MMU tries to detect when a speculative pte update is not actually used by demand fault, by checking the accessed bit of the shadow pte. If the shadow pte has not been accessed, we deem that page table flooded and remove the shadow page table, allowing further pte updates to proceed without emulation. However, if the pte itself points at a page table and only used for write operations, the accessed bit will never be set since all access will happen through the emulator. This is exactly what happens with kscand on old (2.4.x) HIGHMEM kernels. The kernel points a kmap_atomic() pte at a page table, and then proceeds with read-modify-write operations to look at the dirty and accessed bits. We get a false flood trigger on the kmap ptes, which results in the mmu spending all its time setting up and tearing down shadows. Fix by setting the shadow accessed bit on emulated accesses. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: add statics were possible, function definition in lapic.hHarvey Harrison2008-07-20
| | | | | | | | | | | | | | | | Noticed by sparse: arch/x86/kvm/vmx.c:1583:6: warning: symbol 'vmx_disable_intercept_for_msr' was not declared. Should it be static? arch/x86/kvm/x86.c:3406:5: warning: symbol 'kvm_task_switch_16' was not declared. Should it be static? arch/x86/kvm/x86.c:3429:5: warning: symbol 'kvm_task_switch_32' was not declared. Should it be static? arch/x86/kvm/mmu.c:1968:6: warning: symbol 'kvm_mmu_remove_one_alloc_mmu_page' was not declared. Should it be static? arch/x86/kvm/mmu.c:2014:6: warning: symbol 'mmu_destroy_caches' was not declared. Should it be static? arch/x86/kvm/lapic.c:862:5: warning: symbol 'kvm_lapic_get_base' was not declared. Should it be static? arch/x86/kvm/i8254.c:94:5: warning: symbol 'pit_get_gate' was not declared. Should it be static? arch/x86/kvm/i8254.c:196:5: warning: symbol '__pit_timer_fn' was not declared. Should it be static? arch/x86/kvm/i8254.c:561:6: warning: symbol '__inject_pit_timer_intr' was not declared. Should it be static? Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Fix oops on guest userspace access to guest pagetableAvi Kivity2008-06-24
| | | | | | | | | | | | | | | | KVM has a heuristic to unshadow guest pagetables when userspace accesses them, on the assumption that most guests do not allow userspace to access pagetables directly. Unfortunately, in addition to unshadowing the pagetables, it also oopses. This never triggers on ordinary guests since sane OSes will clear the pagetables before assigning them to userspace, which will trigger the flood heuristic, unshadowing the pagetables before the first userspace access. One particular guest, though (Xenner) will run the kernel in userspace, triggering the oops. Since the heuristic is incorrect in this case, we can simply remove it. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: large page update_pte issue with non-PAE 32-bit guests (resend)Marcelo Tosatti2008-06-24
| | | | | | | | | | | | | | kvm_mmu_pte_write() does not handle 32-bit non-PAE large page backed guests properly. It will instantiate two 2MB sptes pointing to the same physical 2MB page when a guest large pte update is trapped. Instead of duplicating code to handle this, disallow directory level updates to happen through kvm_mmu_pte_write(), so the two 2MB sptes emulating one guest 4MB pte can be correctly created by the page fault handling path. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Fix rmap_write_protect() hugepage iteration bugMarcelo Tosatti2008-06-24
| | | | | | | | rmap_next() does not work correctly after rmap_remove(), as it expects the rmap chains not to change during iteration. Fix (for now) by restarting iteration from the beginning. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Fix is_empty_shadow_page() checkAvi Kivity2008-06-06
| | | | | | The check is only looking at one of two possible empty ptes. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: reschedule during shadow teardownAvi Kivity2008-06-06
| | | | | | | Shadows for large guests can take a long time to tear down, so reschedule occasionally to avoid softlockup warnings. Signed-off-by: Avi Kivity <avi@qumranet.com>
* namespacecheck: automated fixesIngo Molnar2008-05-23
| | | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
* KVM: MMU: Allow more than PAGES_PER_HPAGE write protections per large pageAvi Kivity2008-05-04
| | | | | | | | | nonpae guests can call rmap_write_protect twice per page (for page tables) or four times per page (for page directories), triggering a bogus warning. Remove the warning. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: VMX: Enable EPT feature for KVMSheng Yang2008-05-04
| | | | | Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Remove #ifdef CONFIG_X86_64 to support 4 level EPTSheng Yang2008-05-04
| | | | | | | | Currently EPT level is 4 for both pae and x86_64. The patch remove the #ifdef for alloc root_hpa and free root_hpa to support EPT. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Add EPT supportSheng Yang2008-05-04
| | | | | | | Enable kvm_set_spte() to generate EPT entries. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Add kvm_x86_ops get_tdp_level()Sheng Yang2008-05-04
| | | | | | | | The function get_tdp_level() provided the number of tdp level for EPT and NPT rather than the NPT specific macro. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Move some definitions to a header fileSheng Yang2008-05-04
| | | | | | | | Move some definitions to mmu.h in order to allow building common table entries between EPT and non-EPT. Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: kvm_pv_mmu_op should not take mmap_semMarcelo Tosatti2008-04-27
| | | | | | | | | | | kvm_pv_mmu_op should not take mmap_sem. All gfn_to_page() callers down in the MMU processing will take it if necessary, so as it is it can deadlock. Apparently a leftover from the days before slots_lock. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Don't assume struct page for x86Anthony Liguori2008-04-27
| | | | | | | | | | | | | | | | | | | | | | | | This patch introduces a gfn_to_pfn() function and corresponding functions like kvm_release_pfn_dirty(). Using these new functions, we can modify the x86 MMU to no longer assume that it can always get a struct page for any given gfn. We don't want to eliminate gfn_to_page() entirely because a number of places assume they can do gfn_to_page() and then kmap() the results. When we support IO memory, gfn_to_page() will fail for IO pages although gfn_to_pfn() will succeed. This does not implement support for avoiding reference counting for reserved RAM or for IO memory. However, it should make those things pretty straight forward. Since we're only introducing new common symbols, I don't think it will break the non-x86 architectures but I haven't tested those. I've tested Intel, AMD, NPT, and hugetlbfs with Windows and Linux guests. [avi: fix overflow when shifting left pfns by adding casts] Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: prepopulate guest pages after write-protectingMarcelo Tosatti2008-04-27
| | | | | | | | | | | | | | | | Zdenek reported a bug where a looping "dmsetup status" eventually hangs on SMP guests. The problem is that kvm_mmu_get_page() prepopulates the shadow MMU before write protecting the guest page tables. By doing so, it leaves a window open where the guest can mark a pte as present while the host has shadow cached such pte as "notrap". Accesses to such address will fault in the guest without the host having a chance to fix the situation. Fix by moving the write protection before the pte prefetch. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Only mark_page_accessed() if the page was accessed by the guestAvi Kivity2008-04-27
| | | | | | | | | | If the accessed bit is not set, the guest has never accessed this page (at least through this spte), so there's no need to mark the page accessed. This provides more accurate data for the eviction algortithm. Noted by Andrea Arcangeli. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: allow the vm to shrink the kvm mmu shadow cachesIzik Eidus2008-04-27
| | | | | | | Allow the Linux memory manager to reclaim memory in the kvm shadow cache. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: unify slots_lock usageMarcelo Tosatti2008-04-27
| | | | | | | | | | | | | Unify slots_lock acquision around vcpu_run(). This is simpler and less error-prone. Also fix some callsites that were not grabbing the lock properly. [avi: drop slots_lock while in guest mode to avoid holding the lock for indefinite periods] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Introduce and use spte_to_page()Avi Kivity2008-04-27
| | | | | | Encapsulate the pte mask'n'shift in a function. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: fix dirty bit setting when removing write permissionsIzik Eidus2008-04-27
| | | | | | | | | | | | | | When mmu_set_spte() checks if a page related to spte should be release as dirty or clean, it check if the shadow pte was writeble, but in case rmap_write_protect() is called called it is possible for shadow ptes that were writeble to become readonly and therefor mmu_set_spte will release the pages as clean. This patch fix this issue by marking the page as dirty inside rmap_write_protect(). Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Set the accessed bit on non-speculative shadow ptesAvi Kivity2008-04-27
| | | | | | | | | If we populate a shadow pte due to a fault (and not speculatively due to a pte write) then we can set the accessed bit on it, as we know it will be set immediately on the next guest instruction. This saves a read-modify-write operation. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: hypercall based pte updates and TLB flushesMarcelo Tosatti2008-04-27
| | | | | | | | | | | | | | | | | | Hypercall based pte updates are faster than faults, and also allow use of the lazy MMU mode to batch operations. Don't report the feature if two dimensional paging is enabled. [avi: - one mmu_op hypercall instead of one per op - allow 64-bit gpa on hypercall - don't pass host errors (-ENOMEM) to guest] [akpm: warning fix on i386] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: replace remaining __FUNCTION__ occurancesHarvey Harrison2008-04-27
| | | | | | | __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: large page supportMarcelo Tosatti2008-04-27
| | | | | | | | | | | | | | Create large pages mappings if the guest PTE's are marked as such and the underlying memory is hugetlbfs backed. If the largepage contains write-protected pages, a large pte is not used. Gives a consistent 2% improvement for data copies on ram mounted filesystem, without NPT/EPT. Anthony measures a 4% improvement on 4-way kernbench, with NPT. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: ignore zapped root pagetablesMarcelo Tosatti2008-04-27
| | | | | | | | | | | | Mark zapped root pagetables as invalid and ignore such pages during lookup. This is a problem with the cr3-target feature, where a zapped root table fools the faulting code into creating a read-only mapping. The result is a lockup if the instruction can't be emulated. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Cc: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: add TDP support to the KVM MMUJoerg Roedel2008-04-27
| | | | | | | | This patch contains the changes to the KVM MMU necessary for support of the Nested Paging feature in AMD Barcelona and Phenom Processors. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: make the __nonpaging_map function genericJoerg Roedel2008-04-27
| | | | | | | | | The mapping function for the nonpaging case in the softmmu does basically the same as required for Nested Paging. Make this function generic so it can be used for both. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: export information about NPT to generic x86 codeJoerg Roedel2008-04-27
| | | | | | | | | | The generic x86 code has to know if the specific implementation uses Nested Paging. In the generic code Nested Paging is called Two Dimensional Paging (TDP) to avoid confusion with (future) TDP implementations of other vendors. This patch exports the availability of TDP to the generic x86 code. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Decouple mmio from shadow page tablesAvi Kivity2008-04-27
| | | | | | | | | | | | | Currently an mmio guest pte is encoded in the shadow pagetable as a not-present trapping pte, with the SHADOW_IO_MARK bit set. However nothing is ever done with this information, so maintaining it is a useless complication. This patch moves the check for mmio to before shadow ptes are instantiated, so the shadow code is never invoked for ptes that reference mmio. The code is simpler, and with future work, can be made to handle mmio concurrently. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Simplify hash table indexingDong, Eddie2008-04-27
| | | | | Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Update shadow ptes on partial guest pte writesDong, Eddie2008-04-27
| | | | | | | | | | | | | | | | A guest partial guest pte write will leave shadow_trap_nonpresent_pte in spte, which generates a vmexit at the next guest access through that pte. This patch improves this by reading the full guest pte in advance and thus being able to update the spte and eliminate the vmexit. This helps pae guests which use two 32-bit writes to set a single 64-bit pte. [truncation fix by Eric] Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Feng (Eric) Liu <eric.e.liu@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Fix memory leak on guest demand faultsAvi Kivity2008-03-25
| | | | | | | | | | | While backporting 72dc67a69690288538142df73a7e3ac66fea68dc, a gfn_to_page() call was duplicated instead of moved (due to an unrelated patch not being present in mainline). This caused a page reference leak, resulting in a fairly massive memory leak. Fix by removing the extraneous gfn_to_page() call. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: handle page removal with shadow mappingMarcelo Tosatti2008-03-25
| | | | | | | | | | Do not assume that a shadow mapping will always point to the same host frame number. Fixes crash with madvise(MADV_DONTNEED). [avi: move after first printk(), add another printk()] Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Fix is_rmap_pte() with io ptesAvi Kivity2008-03-25
| | | | | | is_rmap_pte() doesn't take into account io ptes, which have the avail bit set. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Fix race when instantiating a shadow pteAvi Kivity2008-03-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For improved concurrency, the guest walk is performed concurrently with other vcpus. This means that we need to revalidate the guest ptes once we have write-protected the guest page tables, at which point they can no longer be modified. The current code attempts to avoid this check if the shadow page table is not new, on the assumption that if it has existed before, the guest could not have modified the pte without the shadow lock. However the assumption is incorrect, as the racing vcpu could have modified the pte, then instantiated the shadow page, before our vcpu regains control: vcpu0 vcpu1 fault walk pte modify pte fault in same pagetable instantiate shadow page lookup shadow page conclude it is old instantiate spte based on stale guest pte We could do something clever with generation counters, but a test run by Marcelo suggests this is unnecessary and we can just do the revalidation unconditionally. The pte will be in the processor cache and the check can be quite fast. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: make MMU_DEBUG compile againMarcelo Tosatti2008-03-04
| | | | | | | the cr3 variable is now inside the vcpu->arch structure. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: remove the usage of the mmap_sem for the protection of the memory slots.Izik Eidus2008-03-04
| | | | | | | | | This patch replaces the mmap_sem lock for the memory slots with a new kvm private lock, it is needed beacuse untill now there were cases where kvm accesses user memory while holding the mmap semaphore. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Fix dirty page setting for pages removed from rmapIzik Eidus2008-01-30
| | | | | | | | | | Right now rmap_remove won't set the page as dirty if the shadow pte pointed to this page had write access and then it became readonly. This patches fixes that, by setting the page as dirty for spte changes from write to readonly access. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Move kvm_free_some_pages() into critical sectionAvi Kivity2008-01-30
| | | | | | | | If some other cpu steals mmu pages between our check and an attempt to allocate, we can run out of mmu pages. Fix by moving the check into the same critical section as the allocation. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Switch to mmu spinlockMarcelo Tosatti2008-01-30
| | | | | | | | | | | Convert the synchronization of the shadow handling to a separate mmu_lock spinlock. Also guard fetch() by mmap_sem in read-mode to protect against alias and memslot changes. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Avoid calling gfn_to_page() in mmu_set_spte()Avi Kivity2008-01-30
| | | | | | | | | | Since gfn_to_page() is a sleeping function, and we want to make the core mmu spinlocked, we need to pass the page from the walker context (which can sleep) to the shadow context (which cannot). [marcelo: avoid recursive locking of mmap_sem] Signed-off-by: Avi Kivity <avi@qumranet.com>