aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/kvm/kvm_main.c
Commit message (Collapse)AuthorAge
* KVM: Avoid calling smp_call_function_single() with interrupts disabledAvi Kivity2007-08-19
| | | | | | | | | | | | | | When taking a cpu down, we need to hardware_disable() it. Unfortunately, the CPU_DYING notifier is called with interrupts disabled, which means we can't use smp_call_function_single(). Fortunately, the CPU_DYING notifier is always called on the dying cpu, so we don't need to use the function at all and can simply call hardware_disable() directly. Tested-by: Paolo Ornati <ornati@fastwebnet.it> Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* KVM: Fix removal of nx capability from guest cpuidAvi Kivity2007-07-25
| | | | | | | | | | | Testing the wrong bit caused kvm not to disable nx on the guest when it is disabled on the host (an mmu optimization relies on the nx bits being the same in the guest and host). This allows Windows to boot when nx is disabled on te host (e.g. when host pae is disabled). Signed-off-by: Avi Kivity <avi@qumranet.com>
* Revert "KVM: Avoid useless memory write when possible"Avi Kivity2007-07-25
| | | | | | | | This reverts commit a3c870bdce4d34332ebdba7eb9969592c4c6b243. While it does save useless updates, it (probably) defeats the fork detector, causing a massive performance loss. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Fix unlikely kvm_create vs decache_vcpus_on_cpu raceRusty Russell2007-07-25
| | | | | | | | We add the kvm to the vm_list before initializing the vcpu mutexes, which can be mutex_trylock()'ed by decache_vcpus_on_cpu(). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Correctly handle writes crossing a page boundaryAvi Kivity2007-07-25
| | | | | | | | | | | Writes that are contiguous in virtual memory may not be contiguous in physical memory; so split writes that straddle a page boundary. Thanks to Aurelien for reporting the bug, patient testing, and a fix to this very patch. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: x86 emulator: implement rdmsr and wrmsrAvi Kivity2007-07-20
| | | | | | | Allow real-mode emulation of rdmsr and wrmsr. This allows smp Windows to boot, presumably for its sipi trampoline. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Fix memory slot management functions for guest smpAvi Kivity2007-07-20
| | | | | | | | | | | The memory slot management functions were oriented against vcpu 0, where they should be kvm-wide. This causes hangs starting X on guest smp. Fix by making the functions (and resultant tail in the mmu) non-vcpu-specific. Unfortunately this reduces the efficiency of the mmu object cache a bit. We may have to revisit this later. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Use CPU_DYING for disabling virtualizationAvi Kivity2007-07-16
| | | | | | | | Only at the CPU_DYING stage can we be sure that no user process will be scheduled onto the cpu and oops when trying to use virtualization extensions. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Tune hotplug/suspend IPIsAvi Kivity2007-07-16
| | | | | | | | | The hotplug IPIs can be called from the cpu on which we are currently running on, so use on_cpu(). Similarly, drop on_each_cpu() for the suspend/resume callbacks, as we're in atomic context here and only one cpu is up anyway. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Keep track of which cpus have virtualization enabledAvi Kivity2007-07-16
| | | | | | | | By keeping track of which cpus have virtualization enabled, we prevent double-enable or double-disable during hotplug, which is a very fatal oops. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Clean up #includesAvi Kivity2007-07-16
| | | | | | Remove unnecessary ones, and rearange the remaining in the standard order. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Remove kvmfs in favor of the anonymous inodes sourceAvi Kivity2007-07-16
| | | | | | | | kvm uses a pseudo filesystem, kvmfs, to generate inodes, a job that the new anonymous inodes source does much better. Cc: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Avoid useless memory write when possibleLuca Tettamanti2007-07-16
| | | | | | | | When writing to normal memory and the memory area is unchanged the write can be safely skipped, avoiding the costly kvm_mmu_pte_write. Signed-Off-By: Luca Tettamanti <kronos.it@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Add support for in-kernel pio handlersEddie Dong2007-07-16
| | | | | | | Useful for the PIC and PIT. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Adds support for in-kernel mmio handlersGregory Haskins2007-07-16
| | | | | Signed-off-by: Gregory Haskins <ghaskins@novell.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Flush remote tlbs when reducing shadow pte permissionsAvi Kivity2007-07-16
| | | | | | | | | | | When a vcpu causes a shadow tlb entry to have reduced permissions, it must also clear the tlb on remote vcpus. We do that by: - setting a bit on the vcpu that requests a tlb flush before the next entry - if the vcpu is currently executing, we send an ipi to make sure it exits before we continue Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Keep an upper bound of initialized vcpusAvi Kivity2007-07-16
| | | | | | | That way, we don't need to loop for KVM_MAX_VCPUS for a single vcpu vm. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Move duplicate halt handling code into kvm_main.cAvi Kivity2007-07-16
| | | | | | Will soon have a thid user. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Fix adding an smp virtual machine to the vm listAvi Kivity2007-07-16
| | | | | | | If we add the vm once per vcpu, we corrupt the list if the guest has multiple vcpus. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Fix vcpu freeing for guest smpAvi Kivity2007-07-16
| | | | | | | | A vcpu can pin up to four mmu shadow pages, which means the freeing loop will never terminate. Fix by first unpinning shadow pages on all vcpus, then freeing shadow pages. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Remove unnecessary initialization and checks in mark_page_dirty()Nguyen Anh Quynh2007-07-16
| | | | Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: MMU: Use slab caches for shadow pages and their headersAvi Kivity2007-07-16
| | | | | | Use slab caches instead of a simple custom list. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: VMX: Avoid saving and restoring msr_efer on lightweight vmexitEddie Dong2007-07-16
| | | | | | | | | | | | | | | | | | | MSR_EFER.LME/LMA bits are automatically save/restored by VMX hardware, KVM only needs to save NX/SCE bits at time of heavy weight VM Exit. But clearing NX bits in host envirnment may cause system hang if the host page table is using EXB bits, thus we leave NX bits as it is. If Host NX=1 and guest NX=0, we can do guest page table EXB bits check before inserting a shadow pte (though no guest is expecting to see this kind of gp fault). If host NX=0, we present guest no Execute-Disable feature to guest, thus no host NX=0, guest NX=1 combination. This patch reduces raw vmexit time by ~27%. Me: fix compile warnings on i386. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Implement IA32_EBL_CR_POWERON msrMatthew Gregan2007-07-16
| | | | | | | | | | | | | | Attempting to boot the default 'bsd' kernel of OpenBSD 4.1 i386 in a guest fails early in the kernel init inside p3_get_bus_clock while trying to read the IA32_EBL_CR_POWERON MSR. KVM logs an 'unhandled MSR' message and the guest kernel faults. This patch is sufficient to allow OpenBSD to boot, after which it seems to run fine. I'm not sure if this is the correct solution for dealing with this particular MSR, but it works for me. Signed-off-by: Matthew Gregan <kinetik@flim.org> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Unify kvm_mmu_pre_write() and kvm_mmu_post_write()Avi Kivity2007-07-16
| | | | | | | Instead of calling two functions and repeating expensive checks, call one function and provide it with before/after information. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Avoid saving and restoring some host CPU state on lightweight vmexitAvi Kivity2007-07-16
| | | | | | | | | | | | Many msrs and the like will only be used by the host if we schedule() or return to userspace. Therefore, we avoid saving them if we handle the exit within the kernel, and if a reschedule is not requested. Based on a patch from Eddie Dong <eddie.dong@intel.com> with a couple of fixes by me. Signed-off-by: Yaozu(Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Prevent guest fpu state from leaking into the hostAvi Kivity2007-06-15
| | | | | | | | | The lazy fpu changes did not take into account that some vmexit handlers can sleep. Move loading the guest state into the inner loop so that it can be reloaded if necessary, and move loading the host state into vmx_vcpu_put() so it can be performed whenever we relinquish the vcpu. Signed-off-by: Avi Kivity <avi@qumranet.com>
* Detach sched.h from mm.hAlexey Dobriyan2007-05-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | First thing mm.h does is including sched.h solely for can_do_mlock() inline function which has "current" dereference inside. By dealing with can_do_mlock() mm.h can be detached from sched.h which is good. See below, why. This patch a) removes unconditional inclusion of sched.h from mm.h b) makes can_do_mlock() normal function in mm/mlock.c c) exports can_do_mlock() to not break compilation d) adds sched.h inclusions back to files that were getting it indirectly. e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were getting them indirectly Net result is: a) mm.h users would get less code to open, read, preprocess, parse, ... if they don't need sched.h b) sched.h stops being dependency for significant number of files: on x86_64 allmodconfig touching sched.h results in recompile of 4083 files, after patch it's only 3744 (-8.3%). Cross-compile tested on all arm defconfigs, all mips defconfigs, all powerpc defconfigs, alpha alpha-up arm i386 i386-up i386-defconfig i386-allnoconfig ia64 ia64-up m68k mips parisc parisc-up powerpc powerpc-up s390 s390-up sparc sparc-up sparc64 sparc64-up um-x86_64 x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig as well as my two usual configs. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Add suspend-related notifications for CPU hotplugRafael J. Wysocki2007-05-09
| | | | | | | | | | | | | | | | | | | | | Since nonboot CPUs are now disabled after tasks and devices have been frozen and the CPU hotplug infrastructure is used for this purpose, we need special CPU hotplug notifications that will help the CPU-hotplug-aware subsystems distinguish normal CPU hotplug events from CPU hotplug events related to a system-wide suspend or resume operation in progress. This patch introduces such notifications and causes them to be used during suspend and resume transitions. It also changes all of the CPU-hotplug-aware subsystems to take these notifications into consideration (for now they are handled in the same way as the corresponding "normal" ones). [oleg@tv-sign.ru: cleanups] Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Gautham R Shenoy <ego@in.ibm.com> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* KVM: Don't require explicit indication of completion of mmio or pioAvi Kivity2007-05-03
| | | | | | | | It is illegal not to return from a pio or mmio request without completing it, as mmio or pio is an atomic operation. Therefore, we can simplify the userspace interface by avoiding the completion indication. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Remove extraneous guest entry on mmio readAvi Kivity2007-05-03
| | | | | | | | | | | | | When emulating an mmio read, we actually emulate twice: once to determine the physical address of the mmio, and, after we've exited to userspace to get the mmio value, we emulate again to place the value in the result register and update any flags. But we don't really need to enter the guest again for that, only to take an immediate vmexit. So, if we detect that we're doing an mmio read, emulate a single instruction before entering the guest again. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: VMX: Properly shadow the CR0 register in the vcpu structAnthony Liguori2007-05-03
| | | | | | | | Set all of the host mask bits for CR0 so that we can maintain a proper shadow of CR0. This exposes CR0.TS, paving the way for lazy fpu handling. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Allow passing 64-bit values to the emulated read/write APIAvi Kivity2007-05-03
| | | | | | | This simplifies the API somewhat (by eliminating the special-case cmpxchg8b on i386). Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Per-vcpu statisticsAvi Kivity2007-05-03
| | | | | | | | Make the exit statistics per-vcpu instead of global. This gives a 3.5% boost when running one virtual machine per core on my two socket dual core (4 cores total) machine. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: VMX: Avoid unnecessary vcpu_load()/vcpu_put() cyclesYaozu Dong2007-05-03
| | | | | | | | By checking if a reschedule is needed, we avoid dropping the vcpu. [With changes by me, based on Anthony Liguori's observations] Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Handle guest page faults when emulating mmioAvi Kivity2007-05-03
| | | | | | | | | | | | | | | | | | Usually, guest page faults are detected by the kvm page fault handler, which detects if they are shadow faults, mmio faults, pagetable faults, or normal guest page faults. However, in ceratin circumstances, we can detect a page fault much later. One of these events is the following combination: - A two memory operand instruction (e.g. movsb) is executed. - The first operand is in mmio space (which is the fault reported to kvm) - The second operand is in an ummaped address (e.g. a guest page fault) The Windows 2000 installer does such an access, an promptly hangs. Fix by adding the missing page fault injection on that path. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Use slab caches to allocate mmu data structuresAvi Kivity2007-05-03
| | | | | | | Better leak detection, statistics, memory use, speed -- goodness all around. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Initialize cr0 to indicate an fpu is presentAvi Kivity2007-05-03
| | | | | | | Solaris panics if it sees a cpu with no fpu, and it seems to rely on this bit. Closes sourceforge bug 1698920. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Add fpu get/set operationsAvi Kivity2007-05-03
| | | | | | | These are really helpful when migrating an floating point app to another machine. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Add physical memory aliasing featureAvi Kivity2007-05-03
| | | | | | | | With this, we can specify that accesses to one physical memory range will be remapped to another. This is useful for the vga window at 0xa0000 which is used as a movable window into the (much larger) framebuffer. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Simply gfn_to_page()Avi Kivity2007-05-03
| | | | | | | | | | | | Mapping a guest page to a host page is a common operation. Currently, one has first to find the memory slot where the page belongs (gfn_to_memslot), then locate the page itself (gfn_to_page()). This is clumsy, and also won't work well with memory aliases. So simplify gfn_to_page() not to require memory slot translation first, and instead do it internally. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Handle writes to MCG_STATUS msrSergey Kiselev2007-05-03
| | | | | | | | | | Some older (~2.6.7) kernels write MCG_STATUS register during kernel boot (mce_clear_all() function, called from mce_init()). It's not currently handled by kvm and will cause it to inject a GPF. Following patch adds a "nop" handler for this. Signed-off-by: Sergey Kiselev <sergey.kiselev@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Modify guest segments after potentially switching modesAvi Kivity2007-05-03
| | | | | | | | | The SET_SREGS ioctl modifies both cr0.pe (real mode/protected mode) and guest segment registers. Since segment handling is modified by the mode on Intel procesors, update the segment registers after the mode switch has taken place. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Remove set_cr0_no_modeswitch() arch opAvi Kivity2007-05-03
| | | | | | | | | set_cr0_no_modeswitch() was a hack to avoid corrupting segment registers. As we now cache the protected mode values on entry to real mode, this isn't an issue anymore, and it interferes with reboot (which usually _is_ a modeswitch). Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Avoid guest virtual addresses in string pio userspace interfaceAvi Kivity2007-05-03
| | | | | | | | | | | | | | | The current string pio interface communicates using guest virtual addresses, relying on userspace to translate addresses and to check permissions. This interface cannot fully support guest smp, as the check needs to take into account two pages at one in case an unaligned string transfer straddles a page boundary. Change the interface not to communicate guest addresses at all; instead use a buffer page (mmaped by userspace) and do transfers there. The kernel manages the virtual to physical translation and can perform the checks atomically by taking the appropriate locks. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Future-proof argument-less ioctlsAvi Kivity2007-05-03
| | | | | | | Some ioctls ignore their arguments. By requiring them to be zero now, we allow a nonzero value to have some special meaning in the future. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Allow kernel to select size of mmap() bufferAvi Kivity2007-05-03
| | | | | | | | This allows us to store offsets in the kernel/user kvm_run area, and be sure that userspace has them mapped. As offsets can be outside the kvm_run struct, userspace has no way of knowing how much to mmap. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Add guest mode signal maskAvi Kivity2007-05-03
| | | | | | | | | Allow a special signal mask to be used while executing in guest mode. This allows signals to be used to interrupt a vcpu without requiring signal delivery to a userspace handler, which is quite expensive. Userspace still receives -EINTR and can get the signal via sigwait(). Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Fold kvm_run::exit_type into kvm_run::exit_reasonAvi Kivity2007-05-03
| | | | | | | | | Currently, userspace is told about the nature of the last exit from the guest using two fields, exit_type and exit_reason, where exit_type has just two enumerations (and no need for more). So fold exit_type into exit_reason, reducing the complexity of determining what really happened. Signed-off-by: Avi Kivity <avi@qumranet.com>
* KVM: Allow userspace to process hypercalls which have no kernel handlerAvi Kivity2007-05-03
| | | | | | This is useful for paravirtualized graphics devices, for example. Signed-off-by: Avi Kivity <avi@qumranet.com>