aboutsummaryrefslogtreecommitdiffstats
path: root/virt
Commit message (Collapse)AuthorAge
* KVM: x86: Perform hardware_enable in CPU_STARTING callbackZachary Amsden2010-09-09
| | | | | | | | | | | | | | | | | The CPU_STARTING callback was added upstream with the intention of being used for KVM, specifically for the hardware enablement that must be done before we can run in hardware virt. It had bugs on the x86_64 architecture at the time, where it was called after CPU_ONLINE. The arches have since merged and the bug is gone. It might be noted other features should probably start making use of this callback; microcode updates in particular which might be fixing important erratums would be best applied before beginning to run user tasks. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Convert mask notifiers to use irqchip/pin instead of gsiGleb Natapov2010-08-01
| | | | | | | | | Devices register mask notifier using gsi, but irqchip knows about irqchip/pin, so conversion from irqchip/pin to gsi should be done before looking for mask notifier to call. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Return EFAULT from kvm ioctl when guest accesses bad areaGleb Natapov2010-08-01
| | | | | | | | | | Currently if guest access address that belongs to memory slot but is not backed up by page or page is read only KVM treats it like MMIO access. Remove that capability. It was never part of the interface and should not be relied upon. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: define hwpoison variables staticGleb Natapov2010-08-01
| | | | | | | They are not used outside of the file. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Use u64 for frame data typesJoerg Roedel2010-08-01
| | | | | | | | | | | | For 32bit machines where the physical address width is larger than the virtual address width the frame number types in KVM may overflow. Fix this by changing them to u64. [sfr: fix build on 32-bit ppc] Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Remove unnecessary divide operationsJoerg Roedel2010-08-01
| | | | | | | | | | This patch converts unnecessary divide and modulo operations in the KVM large page related code into logical operations. This allows to convert gfn_t to u64 while not breaking 32 bit builds. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Fix IOMMU memslot reference warningSheng Yang2010-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the following warning. =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- include/linux/kvm_host.h:259 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 no locks held by qemu-system-x86/29679. stack backtrace: Pid: 29679, comm: qemu-system-x86 Not tainted 2.6.35-rc3+ #200 Call Trace: [<ffffffff810a224e>] lockdep_rcu_dereference+0xa8/0xb1 [<ffffffffa018a06f>] kvm_iommu_unmap_memslots+0xc9/0xde [kvm] [<ffffffffa018a0c4>] kvm_iommu_unmap_guest+0x40/0x4e [kvm] [<ffffffffa018f772>] kvm_arch_destroy_vm+0x1a/0x186 [kvm] [<ffffffffa01800d0>] kvm_put_kvm+0x110/0x167 [kvm] [<ffffffffa0180ecc>] kvm_vcpu_release+0x18/0x1c [kvm] [<ffffffff81156f5d>] fput+0x22a/0x3a0 [<ffffffff81152288>] filp_close+0xb4/0xcd [<ffffffff8106599f>] put_files_struct+0x1b7/0x36b [<ffffffff81065830>] ? put_files_struct+0x48/0x36b [<ffffffff8131ee59>] ? do_raw_spin_unlock+0x118/0x160 [<ffffffff81065bc0>] exit_files+0x6d/0x75 [<ffffffff81068348>] do_exit+0x47d/0xc60 [<ffffffff8177e7b5>] ? _raw_spin_unlock_irq+0x30/0x36 [<ffffffff81068bfa>] do_group_exit+0xcf/0x134 [<ffffffff81080790>] get_signal_to_deliver+0x732/0x81d [<ffffffff81095996>] ? cpu_clock+0x4e/0x60 [<ffffffff81002082>] do_notify_resume+0x117/0xc43 [<ffffffff810a2fa3>] ? trace_hardirqs_on+0xd/0xf [<ffffffff81080d79>] ? sys_rt_sigtimedwait+0x2b5/0x3bf [<ffffffff8177d9f2>] ? trace_hardirqs_off_thunk+0x3a/0x3c [<ffffffff81003221>] ? sysret_signal+0x5/0x3d [<ffffffff8100343b>] int_signal+0x12/0x17 Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Fix a race condition for usage of is_hwpoison_address()Huang Ying2010-08-01
| | | | | | | | | | | | is_hwpoison_address accesses the page table, so the caller must hold current->mm->mmap_sem in read mode. So fix its usage in hva_to_pfn of kvm accordingly. Comment is_hwpoison_address to remind other users. Reported-by: Avi Kivity <avi@redhat.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Keep slot ID in memory slot structureAvi Kivity2010-08-01
| | | | | | | May be used for distinguishing between internal and user slots, or for sorting slots in size order. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Add mini-API for vcpu->requestsAvi Kivity2010-08-01
| | | | | | Makes it a little more readable and hackable. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Remove memory alias supportAvi Kivity2010-08-01
| | | | | | | As advertised in feature-removal-schedule.txt. Equivalent support is provided by overlapping memory regions. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: x86: In DM_LOWEST, only deliver interrupts to vcpus with enabled LAPIC'sChris Lalancette2010-08-01
| | | | | | | | Otherwise we might try to deliver a timer interrupt to a cpu that can't possibly handle it. Signed-off-by: Chris Lalancette <clalance@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Fix unused but set warningsAndi Kleen2010-08-01
| | | | | | | No real bugs in this one. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Fix KVM_SET_SIGNAL_MASK with arg == NULLAndi Kleen2010-08-01
| | | | | | | | | | When the user passed in a NULL mask pass this on from the ioctl handler. Found by gcc 4.6's new warnings. Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: cleanup "*new.rmap" typeLai Jiangshan2010-08-01
| | | | | | | The type of '*new.rmap' is not 'struct page *', fix it Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Update Red Hat copyrightsAvi Kivity2010-08-01
| | | | Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Consolidate arch specific vcpu ioctl lockingAvi Kivity2010-08-01
| | | | | | | Now that all arch specific ioctls have centralized locking, it is easy to move it to the central dispatcher. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: move vcpu locking to dispatcher for generic vcpu ioctlsAvi Kivity2010-08-01
| | | | | | | | | All vcpu ioctls need to be locked, so instead of locking each one specifically we lock at the generic dispatcher. This patch only updates generic ioctls and leaves arch specific ioctls alone. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: remove CAP_SYS_RAWIO requirement from kvm_vm_ioctl_assign_irqAlex Williamson2010-08-01
| | | | | | | | | | | Remove this check in an effort to allow kvm guests to run without root privileges. This capability check doesn't seem to add any security since the device needs to have already been added via the assign device ioctl and the io actually occurs through the pci sysfs interface. Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Avoid killing userspace through guest SRAO MCE on unmapped pagesHuang Ying2010-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | In common cases, guest SRAO MCE will cause corresponding poisoned page be un-mapped and SIGBUS be sent to QEMU-KVM, then QEMU-KVM will relay the MCE to guest OS. But it is reported that if the poisoned page is accessed in guest after unmapping and before MCE is relayed to guest OS, userspace will be killed. The reason is as follows. Because poisoned page has been un-mapped, guest access will cause guest exit and kvm_mmu_page_fault will be called. kvm_mmu_page_fault can not get the poisoned page for fault address, so kernel and user space MMIO processing is tried in turn. In user MMIO processing, poisoned page is accessed again, then userspace is killed by force_sig_info. To fix the bug, kvm_mmu_page_fault send HWPOISON signal to QEMU-KVM and do not try kernel and user space MMIO processing for poisoned page. [xiao: fix warning introduced by avi] Reported-by: Max Asbock <masbock@linux.vnet.ibm.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: read apic->irr with ioapic lock heldMarcelo Tosatti2010-06-10
| | | | | | | Read ioapic->irr inside ioapic->lock protected section. KVM-Stable-Tag Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Fix order passed to iommu_unmapJan Kiszka2010-06-09
| | | | | | | | | | This is obviously a left-over from the the old interface taking the size. Apparently a mostly harmless issue with the current iommu_unmap implementation. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Acked-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* Merge branch 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2010-05-21
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'kvm-updates/2.6.35' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (269 commits) KVM: x86: Add missing locking to arch specific vcpu ioctls KVM: PPC: Add missing vcpu_load()/vcpu_put() in vcpu ioctls KVM: MMU: Segregate shadow pages with different cr0.wp KVM: x86: Check LMA bit before set_efer KVM: Don't allow lmsw to clear cr0.pe KVM: Add cpuid.txt file KVM: x86: Tell the guest we'll warn it about tsc stability x86, paravirt: don't compute pvclock adjustments if we trust the tsc x86: KVM guest: Try using new kvm clock msrs KVM: x86: export paravirtual cpuid flags in KVM_GET_SUPPORTED_CPUID KVM: x86: add new KVMCLOCK cpuid feature KVM: x86: change msr numbers for kvmclock x86, paravirt: Add a global synchronization point for pvclock x86, paravirt: Enable pvclock flags in vcpu_time_info structure KVM: x86: Inject #GP with the right rip on efer writes KVM: SVM: Don't allow nested guest to VMMCALL into host KVM: x86: Fix exception reinjection forced to true KVM: Fix wallclock version writing race KVM: MMU: Don't read pdptrs with mmu spinlock held in mmu_alloc_roots KVM: VMX: enable VMXON check with SMX enabled (Intel TXT) ...
| * KVM: Let vcpu structure alignment be determined at runtimeAvi Kivity2010-05-19
| | | | | | | | | | | | | | vmx and svm vcpus have different contents and therefore may have different alignmment requirements. Let each specify its required alignment. Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: Remove test-before-set optimization for dirty bitsTakuya Yoshikawa2010-05-17
| | | | | | | | | | | | | | | | | | | | As Avi pointed out, testing bit part in mark_page_dirty() was important in the days of shadow paging, but currently EPT and NPT has already become common and the chance of faulting a page more that once per iteration is small. So let's remove the test bit to avoid extra access. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: do not call hardware_disable() on CPU_UP_CANCELEDLai Jiangshan2010-05-17
| | | | | | | | | | | | | | | | | | | | | | When CPU_UP_CANCELED, hardware_enable() has not been called at the CPU which is going up because raw_notifier_call_chain(CPU_ONLINE) has not been called for this cpu. Drop the handling for CPU_UP_CANCELED. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: use the correct RCU API for PROVE_RCU=yLai Jiangshan2010-05-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The RCU/SRCU API have already changed for proving RCU usage. I got the following dmesg when PROVE_RCU=y because we used incorrect API. This patch coverts rcu_deference() to srcu_dereference() or family API. =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- arch/x86/kvm/mmu.c:3020 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 2 locks held by qemu-system-x86/8550: #0: (&kvm->slots_lock){+.+.+.}, at: [<ffffffffa011a6ac>] kvm_set_memory_region+0x29/0x50 [kvm] #1: (&(&kvm->mmu_lock)->rlock){+.+...}, at: [<ffffffffa012262d>] kvm_arch_commit_memory_region+0xa6/0xe2 [kvm] stack backtrace: Pid: 8550, comm: qemu-system-x86 Not tainted 2.6.34-rc4-tip-01028-g939eab1 #27 Call Trace: [<ffffffff8106c59e>] lockdep_rcu_dereference+0xaa/0xb3 [<ffffffffa012f6c1>] kvm_mmu_calculate_mmu_pages+0x44/0x7d [kvm] [<ffffffffa012263e>] kvm_arch_commit_memory_region+0xb7/0xe2 [kvm] [<ffffffffa011a5d7>] __kvm_set_memory_region+0x636/0x6e2 [kvm] [<ffffffffa011a6ba>] kvm_set_memory_region+0x37/0x50 [kvm] [<ffffffffa015e956>] vmx_set_tss_addr+0x46/0x5a [kvm_intel] [<ffffffffa0126592>] kvm_arch_vm_ioctl+0x17a/0xcf8 [kvm] [<ffffffff810a8692>] ? unlock_page+0x27/0x2c [<ffffffff810bf879>] ? __do_fault+0x3a9/0x3e1 [<ffffffffa011b12f>] kvm_vm_ioctl+0x364/0x38d [kvm] [<ffffffff81060cfa>] ? up_read+0x23/0x3d [<ffffffff810f3587>] vfs_ioctl+0x32/0xa6 [<ffffffff810f3b19>] do_vfs_ioctl+0x495/0x4db [<ffffffff810e6b2f>] ? fget_light+0xc2/0x241 [<ffffffff810e416c>] ? do_sys_open+0x104/0x116 [<ffffffff81382d6d>] ? retint_swapgs+0xe/0x13 [<ffffffff810f3ba6>] sys_ioctl+0x47/0x6a [<ffffffff810021db>] system_call_fastpath+0x16/0x1b Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: limit the number of pages per memory slotTakuya Yoshikawa2010-05-17
| | | | | | | | | | | | | | | | This patch limits the number of pages per memory slot to make us free from extra care about type issues. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: coalesced_mmio: fix kvm_coalesced_mmio_init()'s error handlingTakuya Yoshikawa2010-05-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kvm_coalesced_mmio_init() keeps to hold the addresses of a coalesced mmio ring page and dev even after it has freed them. Also, if this function fails, though it might be rare, it seems to be suggesting the system's serious state: so we'd better stop the works following the kvm_creat_vm(). This patch clears these problems. We move the coalesced mmio's initialization out of kvm_create_vm(). This seems to be natural because it includes a registration which can be done only when vm is successfully created. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: fix assigned_device_enable_host_msix error handlingjing zhang2010-05-17
| | | | | | | | | | | | | | | | Free IRQ's and disable MSIX upon failure. Cc: Avi Kivity <avi@redhat.com> Signed-off-by: Jing Zhang <zj.barak@gmail.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: fix the errno of ioctl KVM_[UN]REGISTER_COALESCED_MMIO failureWei Yongjun2010-05-17
| | | | | | | | | | | | | | | | This patch change the errno of ioctl KVM_[UN]REGISTER_COALESCED_MMIO from -EINVAL to -ENXIO if no coalesced mmio dev exists. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * KVM: cleanup kvm traceXiao Guangrong2010-05-17
| | | | | | | | | | | | | | | | | | | | | | | | This patch does: - no need call tracepoint_synchronize_unregister() when kvm module is unloaded since ftrace can handle it - cleanup ftrace's macro Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: update gfn_to_hva() to use gfn_to_hva_memslot()Takuya Yoshikawa2010-04-25
| | | | | | | | | | | | | | | | | | | | Marcelo introduced gfn_to_hva_memslot() when he implemented gfn_to_pfn_memslot(). Let's use this for gfn_to_hva() too. Note: also remove parentheses next to return as checkpatch said to do. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
* | Merge branch 'core-iommu-for-linus' of ↵Linus Torvalds2010-05-18
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86/amd-iommu: Add amd_iommu=off command line option iommu-api: Remove iommu_{un}map_range functions x86/amd-iommu: Implement ->{un}map callbacks for iommu-api x86/amd-iommu: Make amd_iommu_iova_to_phys aware of multiple page sizes x86/amd-iommu: Make iommu_unmap_page and fetch_pte aware of page sizes x86/amd-iommu: Make iommu_map_page and alloc_pte aware of page sizes kvm: Change kvm_iommu_map_pages to map large pages VT-d: Change {un}map_range functions to implement {un}map interface iommu-api: Add ->{un}map callbacks to iommu_ops iommu-api: Add iommu_map and iommu_unmap functions iommu-api: Rename ->{un}map function pointers to ->{un}map_range
| * \ Merge branch 'iommu/largepages' into amd-iommu/2.6.35Joerg Roedel2010-05-11
| |\ \ | | |/ | |/| | | | | | | Conflicts: arch/x86/kernel/amd_iommu.c
| | * kvm: Change kvm_iommu_map_pages to map large pagesJoerg Roedel2010-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch changes the implementation of of kvm_iommu_map_pages to map the pages with the host page size into the io virtual address space. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Acked-By: Avi Kivity <avi@redhat.com>
* | | KVM: convert ioapic lock to spinlockMarcelo Tosatti2010-05-13
|/ / | | | | | | | | | | | | | | | | kvm_set_irq is used from non sleepable contexes, so convert ioapic from mutex to spinlock. KVM-Stable-Tag. Tested-by: Ralf Bonenkamp <ralf.bonenkamp@swyx.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* | KVM: Add missing srcu_read_lock() for kvm_mmu_notifier_release()Lai Jiangshan2010-04-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I got this dmesg due to srcu_read_lock() is missing in kvm_mmu_notifier_release(). =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- arch/x86/kvm/x86.h:72 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 2 locks held by qemu-system-x86/3100: #0: (rcu_read_lock){.+.+..}, at: [<ffffffff810d73dc>] __mmu_notifier_release+0x38/0xdf #1: (&(&kvm->mmu_lock)->rlock){+.+...}, at: [<ffffffffa0130a6a>] kvm_mmu_zap_all+0x21/0x5e [kvm] stack backtrace: Pid: 3100, comm: qemu-system-x86 Not tainted 2.6.34-rc3-22949-gbc8a97a-dirty #2 Call Trace: [<ffffffff8106afd9>] lockdep_rcu_dereference+0xaa/0xb3 [<ffffffffa0123a89>] unalias_gfn+0x56/0xab [kvm] [<ffffffffa0119600>] gfn_to_memslot+0x16/0x25 [kvm] [<ffffffffa012ffca>] gfn_to_rmap+0x17/0x6e [kvm] [<ffffffffa01300c1>] rmap_remove+0xa0/0x19d [kvm] [<ffffffffa0130649>] kvm_mmu_zap_page+0x109/0x34d [kvm] [<ffffffffa0130a7e>] kvm_mmu_zap_all+0x35/0x5e [kvm] [<ffffffffa0122870>] kvm_arch_flush_shadow+0x16/0x22 [kvm] [<ffffffffa01189e0>] kvm_mmu_notifier_release+0x15/0x17 [kvm] [<ffffffff810d742c>] __mmu_notifier_release+0x88/0xdf [<ffffffff810d73dc>] ? __mmu_notifier_release+0x38/0xdf [<ffffffff81040848>] ? exit_mm+0xe0/0x115 [<ffffffff810c2cb0>] exit_mmap+0x2c/0x17e [<ffffffff8103c472>] mmput+0x2d/0xd4 [<ffffffff81040870>] exit_mm+0x108/0x115 [...] Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | KVM: fix the handling of dirty bitmaps to avoid overflowsTakuya Yoshikawa2010-04-20
| | | | | | | | | | | | | | | | | | | | | | | | | | Int is not long enough to store the size of a dirty bitmap. This patch fixes this problem with the introduction of a wrapper function to calculate the sizes of dirty bitmaps. Note: in mark_page_dirty(), we have to consider the fact that __set_bit() takes the offset as int, not long. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* | include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo2010-03-30
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
* KVM: Convert kvm->requests_lock to raw_spinlock_tAvi Kivity2010-03-01
| | | | | | | | The code relies on kvm->requests_lock inhibiting preemption. Noted by Jan Kiszka. Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: do not store wqh in irqfdMichael S. Tsirkin2010-03-01
| | | | | | | wqh is unused, so we do not need to store it in irqfd anymore Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: cleanup the failure path of KVM_CREATE_IRQCHIP ioctrlWei Yongjun2010-03-01
| | | | | | | | | If we fail to init ioapic device or the fail to setup the default irq routing, the device register by kvm_create_pic() and kvm_ioapic_init() remain unregister. This patch fixed to do this. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: kvm->arch.vioapic should be NULL if kvm_ioapic_init() failureWei Yongjun2010-03-01
| | | | | | | | kvm->arch.vioapic should be NULL in case of kvm_ioapic_init() failure due to cannot register io dev. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Fix Codestyle in virt/kvm/coalesced_mmio.cJochen Maes2010-03-01
| | | | | | | Fixed 2 codestyle issues in virt/kvm/coalesced_mmio.c Signed-off-by: Jochen Maes <jochen.maes@sejo.be> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: Introduce kvm_host_page_sizeJoerg Roedel2010-03-01
| | | | | | | | | | This patch introduces a generic function to find out the host page size for a given gfn. This function is needed by the kvm iommu code. This patch also simplifies the x86 host_mapping_level function. Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* KVM: enable PCI multiple-segments for pass-through deviceZhai, Edwin2010-03-01
| | | | | | | | | Enable optional parameter (default 0) - PCI segment (or domain) besides BDF, when assigning PCI device to guest. Signed-off-by: Zhai Edwin <edwin.zhai@intel.com> Acked-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: Fix kvm_coalesced_mmio_ring duplicate allocationSheng Yang2010-03-01
| | | | | | | | | The commit 0953ca73 "KVM: Simplify coalesced mmio initialization" allocate kvm_coalesced_mmio_ring in the kvm_coalesced_mmio_init(), but didn't discard the original allocation... Signed-off-by: Sheng Yang <sheng@linux.intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* KVM: fix cleanup_srcu_struct on vm destructionMarcelo Tosatti2010-03-01
| | | | | | | | | | | | | | | | | | | cleanup_srcu_struct on VM destruction remains broken: BUG: unable to handle kernel paging request at ffffffffffffffff IP: [<ffffffff802533d2>] srcu_read_lock+0x16/0x21 RIP: 0010:[<ffffffff802533d2>] [<ffffffff802533d2>] srcu_read_lock+0x16/0x21 Call Trace: [<ffffffffa05354c4>] kvm_arch_vcpu_uninit+0x1b/0x48 [kvm] [<ffffffffa05339c6>] kvm_vcpu_uninit+0x9/0x15 [kvm] [<ffffffffa0569f7d>] vmx_free_vcpu+0x7f/0x8f [kvm_intel] [<ffffffffa05357b5>] kvm_arch_destroy_vm+0x78/0x111 [kvm] [<ffffffffa053315b>] kvm_put_kvm+0xd4/0xfe [kvm] Move it to kvm_arch_destroy_vm. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
* KVM: avoid taking ioapic mutex for non-ioapic EOIsAvi Kivity2010-03-01
| | | | | | | | | | | | | | | | | When the guest acknowledges an interrupt, it sends an EOI message to the local apic, which broadcasts it to the ioapic. To handle the EOI, we need to take the ioapic mutex. On large guests, this causes a lot of contention on this mutex. Since large guests usually don't route interrupts via the ioapic (they use msi instead), this is completely unnecessary. Avoid taking the mutex by introducing a handled_vectors bitmap. Before taking the mutex, check if the ioapic was actually responsible for the acked vector. If not, we can return early. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>