aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* KVM: Per-architecture hypercall definitionsChristian Borntraeger2008-01-30
| | | | | | | | | | Currently kvm provides hypercalls only for x86* architectures. To provide hypercall infrastructure for other kvm architectures I split kvm_para.h into a generic header file and architecture specific definitions. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Split IOAPIC reset function and export for kernel RESETEddie Dong2008-01-30
| | | | | Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Export PIC reset for kernel device resetEddie Dong2008-01-30
| | | | | Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Add a might_sleep() annotation to gfn_to_page()Avi Kivity2008-01-30
| | | | | | This will help trap accesses to guest memory in atomic context. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Move vmx_vcpu_reset() out of vmx_vcpu_setup()Avi Kivity2008-01-30
| | | | | | | | | | Split guest reset code out of vmx_vcpu_setup(). Besides being cleaner, this moves the realmode tss setup (which can sleep) outside vmx_vcpu_setup() (which is executed with preemption enabled). [izik: remove unused variable] Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Portability: Split kvm_vcpu into arch dependent and independent parts ↵Zhang Xiantao2008-01-30
| | | | | | | | | | | (part 1) First step to split kvm_vcpu. Currently, we just use an macro to define the common fields in kvm_vcpu for all archs, and all archs need to define its own kvm_vcpu struct. Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Allocate userspace memory for older userspaceAnthony Liguori2008-01-30
| | | | | | | | | | | | Allocate a userspace buffer for older userspaces. Also eliminate phys_mem buffer. The memset() in kvmctl really kills initial memory usage but swapping works even with old userspaces. A side effect is that maximum guest side is reduced for older userspace on i386. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Use virtual cpu accounting if available for guest times.Christian Borntraeger2008-01-30
| | | | | | | | | | | | | | | | | | ppc and s390 offer the possibility to track process times precisely by looking at cpu timer on every context switch, irq, softirq etc. We can use that infrastructure as well for guest time accounting. We need to account the used time before we change the state. This patch adds a call to account_system_vtime to kvm_guest_enter and kvm_guest exit. If CONFIG_VIRT_CPU_ACCOUNTING is not set, account_system_vtime is defined in hardirq.h as an empty function, which means this patch does not change the behaviour on other platforms. I compile tested this patch on x86 and function tested the patch on s390. Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Partial swapping of guest memoryIzik Eidus2008-01-30
| | | | | | | | | | | | This allows guest memory to be swapped. Pages which are currently mapped via shadow page tables are pinned into memory, but all other pages can be freely swapped. The patch makes gfn_to_page() elevate the page's reference count, and introduces kvm_release_page() that pairs with it. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Make gfn_to_page() always safeIzik Eidus2008-01-30
| | | | | | | | | | In case the page is not present in the guest memory map, return a dummy page the guest can scribble on. This simplifies error checking in its users. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Keep a reverse mapping of non-writable translationsIzik Eidus2008-01-30
| | | | | | | | | | | | The current kvm mmu only reverse maps writable translation. This is used to write-protect a page in case it becomes a pagetable. But with swapping support, we need a reverse mapping of read-only pages as well: when we evict a page, we need to remove any mapping to it, whether writable or not. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Add rmap_next(), a helper for walking kvm rmapsIzik Eidus2008-01-30
| | | | | Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: cmc, clc, cli, stiNitin A Kamble2008-01-30
| | | | | | | | | | Instruction: cmc, clc, cli, sti opcodes: 0xf5, 0xf8, 0xfa, 0xfb respectively. [avi: fix reference to EFLG_IF which is not defined anywhere] Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Simplify page table walkerAvi Kivity2008-01-30
| | | | | | | | | | | | Simplify the walker level loop not to carry so much information from one loop to the next. In addition to being complex, this made kmap_atomic() critical sections difficult to manage. As a result of this change, kmap_atomic() sections are limited to actually touching the guest pte, which allows the other functions called from the walker to do sleepy operations. This will happen when we enable swapping. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: Implement emulation of instruction: inc & decNitin A Kamble2008-01-30
| | | | | | | | | Instructions: inc r16/r32 (opcode 0x40-0x47) dec r16/r32 (opcode 0x48-0x4f) Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Rename KVM_TLB_FLUSH to KVM_REQ_TLB_FLUSHAvi Kivity2008-01-30
| | | | | | We now have a new namespace, KVM_REQ_*, for bits in vcpu->requests. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Move apic timer interrupt backlog processing to common codeAvi Kivity2008-01-30
| | | | | | | | Beside the obvious goodness of making code more common, this prevents a livelock with the next patch which moves interrupt injection out of the critical section. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Add some \n in ioapic_debug()Laurent Vivier2008-01-30
| | | | | | | Add new-line at end of debug strings. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: apic round robin cleanupQing He2008-01-30
| | | | | | | | | | If no apic is enabled in the bitmap of an interrupt delivery with delivery mode of lowest priority, a warning should be reported rather than select a fallback vcpu Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Eddie (Yaozu) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Portability: split kvm_vcpu_ioctlCarsten Otte2008-01-30
| | | | | | | | | | | | | | | | | | | | | | | | This patch splits kvm_vcpu_ioctl into archtecture independent parts, and x86 specific parts which go to kvm_arch_vcpu_ioctl in x86.c. Common ioctls for all architectures are: KVM_RUN, KVM_GET/SET_(S-)REGS, KVM_TRANSLATE, KVM_INTERRUPT, KVM_DEBUG_GUEST, KVM_SET_SIGNAL_MASK, KVM_GET/SET_FPU Note that some PPC chips don't have an FPU, so we might need an #ifdef around KVM_GET/SET_FPU one day. x86 specific ioctls are: KVM_GET/SET_LAPIC, KVM_SET_CPUID, KVM_GET/SET_MSRS An interresting aspect is vcpu_load/vcpu_put. We now have a common vcpu_load/put which does the preemption stuff, and an architecture specific kvm_arch_vcpu_load/put. In the x86 case, this one calls the vmx/svm function defined in kvm_x86_ops. Signed-off-by: Carsten Otte <cotte@de.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> Reviewed-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: When updating the dirty bit, inform the mmu about itAvi Kivity2008-01-30
| | | | | | | Since the mmu uses different shadow pages for dirty large pages and clean large pages, this allows the mmu to drop ptes that are now invalid. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Move dirty bit updates to a separate functionAvi Kivity2008-01-30
| | | | Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Instantiate real-mode shadows as user writable shadowsAvi Kivity2008-01-30
| | | | | | This is consistent with real-mode permissions. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Disable write access on clean large pagesAvi Kivity2008-01-30
| | | | | | | | | By forcing clean huge pages to be read-only, we have separate roles for the shadow of a clean large page and the shadow of a dirty large page. This is necessary because different ptes will be instantiated for the two cases, even for read faults. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Fix nx access bit for huge pagesAvi Kivity2008-01-30
| | | | | | We must set the bit before the shift, otherwise the wrong bit gets set. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Move guest pte dirty bit management to the guest pagetable walkerAvi Kivity2008-01-30
| | | | | | | This is more consistent with the accessed bit management, and makes the dirty bit available earlier for other purposes. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: More struct kvm_vcpu -> struct kvm cleanupsAnthony Liguori2008-01-30
| | | | | | | | | This time, the biggest change is gpa_to_hpa. The translation of GPA to HPA does not depend on the VCPU state unlike GVA to GPA so there's no need to pass in the kvm_vcpu. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Clean up MMU functions to take struct kvm when appropriateAnthony Liguori2008-01-30
| | | | | | | | | | | | Some of the MMU functions take a struct kvm_vcpu even though they affect all VCPUs. This patch cleans up some of them to instead take a struct kvm. This makes things a bit more clear. The main thing that was confusing me was whether certain functions need to be called on all VCPUs. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Move x86 msr handling to new files x86.[ch]Carsten Otte2008-01-30
| | | | | Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Support assigning userspace memory to the guestIzik Eidus2008-01-30
| | | | | | | | | | | Instead of having the kernel allocate memory to the guest, let userspace allocate it and pass the address to the kernel. This is required for s390 support, but also enables features like memory sharing and using hugetlbfs backed memory. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: CodingStyle cleanupMike Day2008-01-30
| | | | | Signed-off-by: Mike D. Day <ncmike@ncultra.org> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Remove gratuitous casts from lapic.cRusty Russell2008-01-30
| | | | | | | Since vcpu->apic is of the correct type, there's not need to cast. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Hoist kvm_create_lapic() into kvm_vcpu_init()Rusty Russell2008-01-30
| | | | | | | | | | | | Move kvm_create_lapic() into kvm_vcpu_init(), rather than having svm and vmx do it. And make it return the error rather than a fairly random -ENOMEM. This also solves the problem that neither svm.c nor vmx.c actually handles the error path properly. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Add kvm_free_lapic() to pair with kvm_create_lapic()Rusty Russell2008-01-30
| | | | | | | | | | | | | Instead of the asymetry of kvm_free_apic, implement kvm_free_lapic(). And guess what? I found a minor bug: we don't need to hrtimer_cancel() from kvm_main.c, because we do that in kvm_free_apic(). Also: 1) kvm_vcpu_uninit should be the reverse order from kvm_vcpu_init. 2) Don't set apic->regs_page to zero before freeing apic. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Allow dynamic allocation of the mmu shadow cache sizeIzik Eidus2008-01-30
| | | | | | | The user is now able to set how many mmu pages will be allocated to the guest. Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Add general accessors to read and write guest memoryIzik Eidus2008-01-30
| | | | | Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Remove the usage of page->private field by rmapIzik Eidus2008-01-30
| | | | | | | | | | | | | When kvm uses user-allocated pages in the future for the guest, we won't be able to use page->private for rmap, since page->rmap is reserved for the filesystem. So we move the rmap base pointers to the memory slot. A side effect of this is that we need to store the gfn of each gpte in the shadow pages, since the memory slot is addressed by gfn, instead of hfn like struct page. Signed-off-by: Izik Eidus <izik@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: VMX: Simplify vcpu_clear()Avi Kivity2008-01-30
| | | | | | | Now that smp_call_function_single() knows how to call a function on the current cpu, there's no need to check explicitly. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: VMX: Don't clear the vmcs if the vcpu is not loaded on any processorAvi Kivity2008-01-30
| | | | | | Noted by Eddie Dong. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: Any legacy prefix after a REX prefix nullifies its effectLaurent Vivier2008-01-30
| | | | | | | | | | | This patch modifies the management of REX prefix according behavior I saw in Xen 3.1. In Xen, this modification has been introduced by Jan Beulich. http://lists.xensource.com/archives/html/xen-changelog/2007-01/msg00081.html Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Purify x86_decode_insn() error case managementLaurent Vivier2008-01-30
| | | | | | | The only valid case is on protected page access, other cases are errors. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86_emulator: no writeback for btQing He2008-01-30
| | | | | Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: Remove no_wb, use dst.type = OP_NONE insteadLaurent Vivier2008-01-30
| | | | | | | Remove no_wb, use dst.type = OP_NONE instead, idea stollen from xen-3.1 Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: remove _eflags and use directly ctxt->eflags.Laurent Vivier2008-01-30
| | | | | | | | | Remove _eflags and use directly ctxt->eflags. Caching eflags is not needed as it is restored to vcpu by kvm_main.c:emulate_instruction() from ctxt->eflags only if emulation doesn't fail. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: split some decoding into functions for readabilityLaurent Vivier2008-01-30
| | | | | | | | To improve readability, move push, writeback, and grp 1a/2/3/4/5/9 emulation parts into functions. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Ignore reserved bits in cr3 in non-pae modeRyan Harper2008-01-30
| | | | | | | | | | This patch removes the fault injected when the guest attempts to set reserved bits in cr3. X86 hardware doesn't generate a fault when setting reserved bits. The result of this patch is that vmware-server, running within a kvm guest, boots and runs memtest from an iso. Signed-off-by: Ryan Harper <ryanh@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Make flooding detection work when guest page faults are bypassedAvi Kivity2008-01-30
| | | | | | | | | When we allow guest page faults to reach the guests directly, we lose the fault tracking which allows us to detect demand paging. So we provide an alternate mechnism by clearing the accessed bit when we set a pte, and checking it later to see if the guest actually used it. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Allow not-present guest page faults to bypass kvmAvi Kivity2008-01-30
| | | | | | | | | | | | | | | | | | | | | | | | | | There are two classes of page faults trapped by kvm: - host page faults, where the fault is needed to allow kvm to install the shadow pte or update the guest accessed and dirty bits - guest page faults, where the guest has faulted and kvm simply injects the fault back into the guest to handle The second class, guest page faults, is pure overhead. We can eliminate some of it on vmx using the following evil trick: - when we set up a shadow page table entry, if the corresponding guest pte is not present, set up the shadow pte as not present - if the guest pte _is_ present, mark the shadow pte as present but also set one of the reserved bits in the shadow pte - tell the vmx hardware not to trap faults which have the present bit clear With this, normal page-not-present faults go directly to the guest, bypassing kvm entirely. Unfortunately, this trick only works on Intel hardware, as AMD lacks a way to discriminate among page faults based on error code. It is also a little risky since it uses reserved bits which might become unreserved in the future, so a module parameter is provided to disable it. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: VMX: Further reduce efer reloadsAvi Kivity2008-01-30
| | | | | | | | | | | | | KVM avoids reloading the efer msr when the difference between the guest and host values consist of the long mode bits (which are switched by hardware) and the NX bit (which is emulated by the KVM MMU). This patch also allows KVM to ignore SCE (syscall enable) when the guest is running in 32-bit mode. This is because the syscall instruction is not available in 32-bit mode on Intel processors, so the SCE bit is effectively meaningless. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Call x86_decode_insn() only when neededLaurent Vivier2008-01-30
| | | | | | | | | Move emulate_ctxt to kvm_vcpu to keep emulate context when we exit from kvm module. Call x86_decode_insn() only when needed. Modify x86_emulate_insn() to not modify the context if it must be re-entered. Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>