aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
...
| * | KVM: PPC: e500mc: Move r1/r2 restoration very earlyAlexander Graf2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we hit any exception whatsoever in the restore path and r1/r2 aren't the host registers, we don't get a working oops. So it's always a good idea to restore them as early as possible. This time, it actually has practical reasons to do so too, since we need to have the host page fault handler fix up our guest instruction read code. And for that to work we need r1/r2 restored. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: PPC: e500mc: implicitly set MSR_GSAlexander Graf2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When setting MSR for an e500mc guest, we implicitly always set MSR_GS to make sure the guest is in guest state. Since we have this implicit rule there, we don't need to explicitly pass MSR_GS to set_msr(). Remove all explicit setters of MSR_GS. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: PPC: e500mc: Add doorbell emulation supportAlexander Graf2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When one vcpu wants to kick another, it can issue a special IPI instruction called msgsnd. This patch emulates this instruction, its clearing counterpart and the infrastructure required to actually trigger that interrupt inside a guest vcpu. With this patch, SMP guests on e500mc work. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: PPC: e500mc supportScott Wood2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add processor support for e500mc, using hardware virtualization support (GS-mode). Current issues include: - No support for external proxy (coreint) interrupt mode in the guest. Includes work by Ashish Kalra <Ashish.Kalra@freescale.com>, Varun Sethi <Varun.Sethi@freescale.com>, and Liu Yu <yu.liu@freescale.com>. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: PPC: booke: standard PPC floating point supportScott Wood2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | e500mc has a normal PPC FPU, rather than SPE which is found on e500v1/v2. Based on code from Liu Yu <yu.liu@freescale.com>. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: PPC: booke: category E.HV (GS-mode) supportScott Wood2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Chips such as e500mc that implement category E.HV in Power ISA 2.06 provide hardware virtualization features, including a new MSR mode for guest state. The guest OS can perform many operations without trapping into the hypervisor, including transitions to and from guest userspace. Since we can use SRR1[GS] to reliably tell whether an exception came from guest state, instead of messing around with IVPR, we use DO_KVM similarly to book3s. Current issues include: - Machine checks from guest state are not routed to the host handler. - The guest can cause a host oops by executing an emulated instruction in a page that lacks read permission. Existing e500/4xx support has the same problem. Includes work by Ashish Kalra <Ashish.Kalra@freescale.com>, Varun Sethi <Varun.Sethi@freescale.com>, and Liu Yu <yu.liu@freescale.com>. Signed-off-by: Scott Wood <scottwood@freescale.com> [agraf: remove pt_regs usage] Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | powerpc/booke: Provide exception macros with interrupt nameScott Wood2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DO_KVM will need to identify the particular exception type. There is an existing set of arbitrary numbers that Linux passes, but it's an undocumented mess that sort of corresponds to server/classic exception vectors but not really. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: PPC: e500: emulate tlbilxScott Wood2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tlbilx is the new, preferred invalidation instruction. It is not found on e500 prior to e500mc, but there should be no harm in supporting it on all e500. Based on code from Ashish Kalra <Ashish.Kalra@freescale.com>. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: PPC: e500: Track TLB1 entries with a bitmapScott Wood2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than invalidate everything when a TLB1 entry needs to be taken down, keep track of which host TLB1 entries are used for a given guest TLB1 entry, and invalidate just those entries. Based on code from Ashish Kalra <Ashish.Kalra@freescale.com> and Liu Yu <yu.liu@freescale.com>. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: PPC: e500: refactor core-specific TLB codeScott Wood2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The PID handling is e500v1/v2-specific, and is moved to e500.c. The MMU sregs code and kvmppc_core_vcpu_translate will be shared with e500mc, and is moved from e500.c to e500_tlb.c. Partially based on patches from Liu Yu <yu.liu@freescale.com>. Signed-off-by: Scott Wood <scottwood@freescale.com> [agraf: fix bisectability] Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: PPC: e500: clean up arch/powerpc/kvm/e500.hScott Wood2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move vcpu to the beginning of vcpu_e500 to give it appropriate prominence, especially if more fields end up getting added to the end of vcpu_e500 (and vcpu ends up in the middle). Remove gratuitous "extern" and add parameter names to prototypes. Signed-off-by: Scott Wood <scottwood@freescale.com> [agraf: fix bisectability] Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: PPC: e500: merge <asm/kvm_e500.h> into arch/powerpc/kvm/e500.hScott Wood2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Keeping two separate headers for e500-specific things was a pain, and wasn't even organized along any logical boundary. There was TLB stuff in <asm/kvm_e500.h> despite the existence of arch/powerpc/kvm/e500_tlb.h, and nothing in <asm/kvm_e500.h> needed to be referenced from outside arch/powerpc/kvm. Signed-off-by: Scott Wood <scottwood@freescale.com> [agraf: fix bisectability] Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: PPC: e500: rename e500_tlb.h to e500.hScott Wood2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is in preparation for merging in the contents of arch/powerpc/include/asm/kvm_e500.h. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: PPC: booke: Move vm core init/destroy out of booke.cScott Wood2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | e500mc will want to do lpid allocation/deallocation here. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: PPC: booke: add booke-level vcpu load/putScott Wood2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This gives us a place to put load/put actions that correspond to code that is booke-specific but not specific to a particular core. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: PPC: factor out lpid allocator from book3s_64_mmu_hvScott Wood2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | We'll use it on e500mc as well. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | powerpc/e500: split CPU_FTRS_ALWAYS/CPU_FTRS_POSSIBLEScott Wood2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | Split e500 (v1/v2) and e500mc/e5500 to allow optimization of feature checks that differ between the two. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | powerpc/booke: Set CPU_FTR_DEBUG_LVL_EXC on 32-bitScott Wood2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently 32-bit only cares about this for choice of exception vector, which is done in core-specific code. However, KVM will want to distinguish as well. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: Remove unused dirty_bitmap_head and nr_dirty_pagesTakuya Yoshikawa2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we do neither double buffering nor heuristic selection of the write protection method these are not needed anymore. Note: some drivers have their own implementation of set_bit_le() and making it generic needs a bit of work; so we use test_and_set_bit_le() and will later replace it with generic set_bit_le(). Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: Switch to srcu-less get_dirty_log()Takuya Yoshikawa2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have seen some problems of the current implementation of get_dirty_log() which uses synchronize_srcu_expedited() for updating dirty bitmaps; e.g. it is noticeable that this sometimes gives us ms order of latency when we use VGA displays. Furthermore the recent discussion on the following thread "srcu: Implement call_srcu()" http://lkml.org/lkml/2012/1/31/211 also motivated us to implement get_dirty_log() without SRCU. This patch achieves this goal without sacrificing the performance of both VGA and live migration: in practice the new code is much faster than the old one unless we have too many dirty pages. Implementation: The key part of the implementation is the use of xchg() operation for clearing dirty bits atomically. Since this allows us to update only BITS_PER_LONG pages at once, we need to iterate over the dirty bitmap until every dirty bit is cleared again for the next call. Although some people may worry about the problem of using the atomic memory instruction many times to the concurrently accessible bitmap, it is usually accessed with mmu_lock held and we rarely see concurrent accesses: so what we need to care about is the pure xchg() overheads. Another point to note is that we do not use for_each_set_bit() to check which ones in each BITS_PER_LONG pages are actually dirty. Instead we simply use __ffs() in a loop. This is much faster than repeatedly call find_next_bit(). Performance: The dirty-log-perf unit test showed nice improvements, some times faster than before, except for some extreme cases; for such cases the speed of getting dirty page information is much faster than we process it in the userspace. For real workloads, both VGA and live migration, we have observed pure improvements: when the guest was reading a file during live migration, we originally saw a few ms of latency, but with the new method the latency was less than 200us. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: Avoid checking huge page mappings in get_dirty_log()Takuya Yoshikawa2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dropped such mappings when we enabled dirty logging and we will never create new ones until we stop the logging. For this we introduce a new function which can be used to write protect a range of PT level pages: although we do not need to care about a range of pages at this point, the following patch will need this feature to optimize the write protection of many pages. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: MMU: Split the main body of rmap_write_protect() off from othersTakuya Yoshikawa2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We will use this in the following patch to implement another function which needs to write protect pages using the rmap information. Note that there is a small change in debug printing for large pages: we do not differentiate them from others to avoid duplicating code. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | kvmclock: remove unneeded EXPORT macroEric B Munson2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | check_and_clear_guest_paused does not need to be exported as it isn't used by any modules, remove the export. Signed-off-by: Eric B Munson <emunson@mgebm.net> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: fix kvm_vcpu_kick build failure on S390Marcelo Tosatti2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | S390's kvm_vcpu_stat does not contain halt_wakeup member. Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | watchdog: add check for suspended vm in softlockup detectorEric B Munson2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A suspended VM can cause spurious soft lockup warnings. To avoid these, the watchdog now checks if the kernel knows it was stopped by the host and skips the warning if so. When the watchdog is reset successfully, clear the guest paused flag. Signed-off-by: Eric B Munson <emunson@mgebm.net> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: x86: Add ioctl for KVM_KVMCLOCK_CTRLEric B Munson2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that we have a flag that will tell the guest it was suspended, create an interface for that communication using a KVM ioctl. Signed-off-by: Eric B Munson <emunson@mgebm.net> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | kvmclock: Add functions to check if the host has stopped the vmEric B Munson2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a host stops or suspends a VM it will set a flag to show this. The watchdog will use these functions to determine if a softlockup is real, or the result of a suspended VM. Signed-off-by: Eric B Munson <emunson@mgebm.net> asm-generic changes Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | x86: pvclock: Add flag to indicate that a vm was stopped by the hostEric B Munson2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This flag will be used to check if the vm was stopped by the host when a soft lockup was detected. The host will set the flag when it stops the guest. On resume, the guest will check this flag if a soft lockup is detected and skip issuing the warning. Signed-off-by: Eric B Munson <emunson@mgebm.net> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: PPC: Rework wqp conditional codeAlexander Graf2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On PowerPC, we sometimes use a waitqueue per core, not per thread, so we can't always use the vcpu internal waitqueue. This code has been generalized by Christoffer Dall recently, but unfortunately broke compilation for PowerPC. At the time the helper function is defined, struct kvm_vcpu is not declared yet, so we can't dereference it. This patch moves all logic into the generic inline function, at which time we have all information necessary. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: Factor out kvm_vcpu_kick to arch-generic codeChristoffer Dall2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kvm_vcpu_kick function performs roughly the same funcitonality on most all architectures, so we shouldn't have separate copies. PowerPC keeps a pointer to interchanging waitqueues on the vcpu_arch structure and to accomodate this special need a __KVM_HAVE_ARCH_VCPU_GET_WQ define and accompanying function kvm_arch_vcpu_wq have been defined. For all other architectures this is a generic inline that just returns &vcpu->wq; Acked-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: schedule debugfs statistics for removalAvi Kivity2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | Deprecated in favour of tracepoints. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: SVM: count all irq windows exitJason Wang2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Also count the exits of fast-path. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Joerg Roedel <joerg.roedel@amd.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: set upper bounds for iobus dev to limit userspaceAmos Kong2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kvm_io_bus devices are used for ioevent, pit, pic, ioapic, coalesced_mmio. Currently Qemu only emulates one PCI bus, it contains 32 slots, one slot contains 8 functions, maximum of supported PCI devices: 1 * 32 * 8 = 256. One virtio-blk takes one iobus device, one virtio-net(vhost=on) takes two iobus devices. The maximum of coalesced mmio zone is 100, each zone has an iobus devices. So 300 io_bus devices are not enough. Set an upper bounds for kvm_io_range to limit userspace. 1000 is a very large limit and not bloat the typical user. Signed-off-by: Amos Kong <akong@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: resize kvm_io_range array dynamicallyAmos Kong2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes the kvm_io_range array can be resized dynamically. Signed-off-by: Amos Kong <akong@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | KVM: x86: expose Intel cpu new features (HLE, RTM) to guestLiu, Jinsong2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Intel recently release 2 new features, HLE and RTM. Refer to http://software.intel.com/file/41417. This patch expose them to guest. Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | | Merge tag 'stable/for-linus-3.5-rc0-tag' of ↵Linus Torvalds2012-05-24
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen updates from Konrad Rzeszutek Wilk: "Features: * Extend the APIC ops implementation and add IRQ_WORKER vector support so that 'perf' can work properly. * Fix self-ballooning code, and balloon logic when booting as initial domain. * Move array printing code to generic debugfs * Support XenBus domains. * Lazily free grants when a domain is dead/non-existent. * In M2P code use batching calls Bug-fixes: * Fix NULL dereference in allocation failure path (hvc_xen) * Fix unbinding of IRQ_WORKER vector during vCPU hot-unplug * Fix HVM guest resume - we would leak an PIRQ value instead of reusing the existing one." Fix up add-add onflicts in arch/x86/xen/enlighten.c due to addition of apic ipi interface next to the new apic_id functions. * tag 'stable/for-linus-3.5-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen: do not map the same GSI twice in PVHVM guests. hvc_xen: NULL dereference on allocation failure xen: Add selfballoning memory reservation tunable. xenbus: Add support for xenbus backend in stub domain xen/smp: unbind irqworkX when unplugging vCPUs. xen: enter/exit lazy_mmu_mode around m2p_override calls xen/acpi/sleep: Enable ACPI sleep via the __acpi_os_prepare_sleep xen: implement IRQ_WORK_VECTOR handler xen: implement apic ipi interface xen/setup: update VA mapping when releasing memory during setup xen/setup: Combine the two hypercall functions - since they are quite similar. xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0 xen/gnttab: add deferred freeing logic debugfs: Add support to print u32 array in debugfs xen/p2m: An early bootup variant of set_phys_to_machine xen/p2m: Collapse early_alloc_p2m_middle redundant checks. xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument xen/p2m: Move code around to allow for better re-usage.
| * | | xen: do not map the same GSI twice in PVHVM guests.Stefano Stabellini2012-05-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PV on HVM guests map GSIs into event channels. At restore time the event channels are resumed by restore_pirqs. Device drivers might try to register the same GSI again through ACPI at restore time, but the GSI has already been mapped and bound by restore_pirqs. This patch detects these situations and avoids mapping the same GSI multiple times. Without this patch we get: (XEN) irq.c:2235: dom4: pirq 23 or emuirq 28 already mapped and waste a pirq. CC: stable@kernel.org Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | hvc_xen: NULL dereference on allocation failureDan Carpenter2012-05-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If kzalloc() returns a NULL here, we pass a NULL to xencons_disconnect_backend() which will cause an Oops. Also I removed the __GFP_ZERO while I was at it since kzalloc() implies __GFP_ZERO. CC: stable@kernel.org Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | xen: Add selfballoning memory reservation tunable.Jana Saout2012-05-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the memory target in the Xen selfballooning driver is mainly driven by the value of "Committed_AS". However, there are cases in which it is desirable to assign additional memory to be available for the kernel, e.g. for local caches (which are not covered by cleancache), e.g. dcache and inode caches. This adds an additional tunable in the selfballooning driver (accessible via sysfs) which allows the user to specify an additional constant amount of memory to be reserved by the selfballoning driver for the local domain. Signed-off-by: Jana Saout <jana@saout.de> Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | xenbus: Add support for xenbus backend in stub domainDaniel De Graaf2012-05-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add an ioctl to the /dev/xen/xenbus_backend device allowing the xenbus backend to be started after the kernel has booted. This allows xenstore to run in a different domain from the dom0. Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | xen/smp: unbind irqworkX when unplugging vCPUs.Konrad Rzeszutek Wilk2012-05-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The git commit 1ff2b0c303698e486f1e0886b4d9876200ef8ca5 "xen: implement IRQ_WORK_VECTOR handler" added the functionality to have a per-cpu "irqworkX" for the IPI APIC functionality. However it missed the unbind when a vCPU is unplugged resulting in an orphaned per-cpu interrupt line for unplugged vCPU: 30: 216 0 xen-dyn-event hvc_console 31: 810 4 xen-dyn-event eth0 32: 29 0 xen-dyn-event blkif - 36: 0 0 xen-percpu-ipi irqwork2 - 37: 287 0 xen-dyn-event xenbus + 36: 287 0 xen-dyn-event xenbus NMI: 0 0 Non-maskable interrupts LOC: 0 0 Local timer interrupts SPU: 0 0 Spurious interrupts Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | Merge branch 'stable/autoballoon.v5.2' into stable/for-linus-3.5Konrad Rzeszutek Wilk2012-05-07
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * stable/autoballoon.v5.2: xen/setup: update VA mapping when releasing memory during setup xen/setup: Combine the two hypercall functions - since they are quite similar. xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAM xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0 xen/p2m: An early bootup variant of set_phys_to_machine xen/p2m: Collapse early_alloc_p2m_middle redundant checks. xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argument xen/p2m: Move code around to allow for better re-usage.
| | * | | xen/setup: update VA mapping when releasing memory during setupDavid Vrabel2012-05-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In xen_memory_setup(), if a page that is being released has a VA mapping this must also be updated. Otherwise, the page will be not released completely -- it will still be referenced in Xen and won't be freed util the mapping is removed and this prevents it from being reallocated at a different PFN. This was already being done for the ISA memory region in xen_ident_map_ISA() but on many systems this was omitting a few pages as many systems marked a few pages below the ISA memory region as reserved in the e820 map. This fixes errors such as: (XEN) page_alloc.c:1148:d0 Over-allocation for domain 0: 2097153 > 2097152 (XEN) memory.c:133:d0 Could not allocate order=0 extent: id=0 memflags=0 (0 of 17) Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | | xen/setup: Combine the two hypercall functions - since they are quite similar.Konrad Rzeszutek Wilk2012-05-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | They use the same set of arguments, so it is just the matter of using the proper hypercall. Acked-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | | xen/setup: Populate freed MFNs from non-RAM E820 entries and gaps to E820 RAMKonrad Rzeszutek Wilk2012-05-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the Xen hypervisor boots a PV kernel it hands it two pieces of information: nr_pages and a made up E820 entry. The nr_pages value defines the range from zero to nr_pages of PFNs which have a valid Machine Frame Number (MFN) underneath it. The E820 mirrors that (with the VGA hole): BIOS-provided physical RAM map: Xen: 0000000000000000 - 00000000000a0000 (usable) Xen: 00000000000a0000 - 0000000000100000 (reserved) Xen: 0000000000100000 - 0000000080800000 (usable) The fun comes when a PV guest that is run with a machine E820 - that can either be the initial domain or a PCI PV guest, where the E820 looks like the normal thing: BIOS-provided physical RAM map: Xen: 0000000000000000 - 000000000009e000 (usable) Xen: 000000000009ec00 - 0000000000100000 (reserved) Xen: 0000000000100000 - 0000000020000000 (usable) Xen: 0000000020000000 - 0000000020200000 (reserved) Xen: 0000000020200000 - 0000000040000000 (usable) Xen: 0000000040000000 - 0000000040200000 (reserved) Xen: 0000000040200000 - 00000000bad80000 (usable) Xen: 00000000bad80000 - 00000000badc9000 (ACPI NVS) .. With that overlaying the nr_pages directly on the E820 does not work as there are gaps and non-RAM regions that won't be used by the memory allocator. The 'xen_release_chunk' helps with that by punching holes in the P2M (PFN to MFN lookup tree) for those regions and tells us that: Freeing 20000-20200 pfn range: 512 pages freed Freeing 40000-40200 pfn range: 512 pages freed Freeing bad80-badf4 pfn range: 116 pages freed Freeing badf6-bae7f pfn range: 137 pages freed Freeing bb000-100000 pfn range: 282624 pages freed Released 283999 pages of unused memory Those 283999 pages are subtracted from the nr_pages and are returned to the hypervisor. The end result is that the initial domain boots with 1GB less memory as the nr_pages has been subtracted by the amount of pages residing within the PCI hole. It can balloon up to that if desired using 'xl mem-set 0 8092', but the balloon driver is not always compiled in for the initial domain. This patch, implements the populate hypercall (XENMEM_populate_physmap) which increases the the domain with the same amount of pages that were released. The other solution (that did not work) was to transplant the MFN in the P2M tree - the ones that were going to be freed were put in the E820_RAM regions past the nr_pages. But the modifications to the M2P array (the other side of creating PTEs) were not carried away. As the hypervisor is the only one capable of modifying that and the only two hypercalls that would do this are: the update_va_mapping (which won't work, as during initial bootup only PFNs up to nr_pages are mapped in the guest) or via the populate hypercall. The end result is that the kernel can now boot with the nr_pages without having to subtract the 283999 pages. On a 8GB machine, with various dom0_mem= parameters this is what we get: no dom0_mem -Memory: 6485264k/9435136k available (5817k kernel code, 1136060k absent, 1813812k reserved, 2899k data, 696k init) +Memory: 7619036k/9435136k available (5817k kernel code, 1136060k absent, 680040k reserved, 2899k data, 696k init) dom0_mem=3G -Memory: 2616536k/9435136k available (5817k kernel code, 1136060k absent, 5682540k reserved, 2899k data, 696k init) +Memory: 2703776k/9435136k available (5817k kernel code, 1136060k absent, 5595300k reserved, 2899k data, 696k init) dom0_mem=max:3G -Memory: 2696732k/4281724k available (5817k kernel code, 1136060k absent, 448932k reserved, 2899k data, 696k init) +Memory: 2702204k/4281724k available (5817k kernel code, 1136060k absent, 443460k reserved, 2899k data, 696k init) And the 'xm list' or 'xl list' now reflect what the dom0_mem= argument is. Acked-by: David Vrabel <david.vrabel@citrix.com> [v2: Use populate hypercall] [v3: Remove debug printks] [v4: Simplify code] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | | xen/setup: Only print "Freeing XXX-YYY pfn range: Z pages freed" if Z > 0Konrad Rzeszutek Wilk2012-05-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Otherwise we can get these meaningless: Freeing bad80-badf4 pfn range: 0 pages freed We also can do this for the summary ones - no point of printing "Set 0 page(s) to 1-1 mapping" Acked-by: David Vrabel <david.vrabel@citrix.com> [v1: Extended to the summary printks] Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | | xen/p2m: An early bootup variant of set_phys_to_machineKonrad Rzeszutek Wilk2012-04-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During early bootup we can't use alloc_page, so to allocate leaf pages in the P2M we need to use extend_brk. For that we are utilizing the early_alloc_p2m and early_alloc_p2m_middle functions to do the job for us. This function follows the same logic as set_phys_to_machine. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | | xen/p2m: Collapse early_alloc_p2m_middle redundant checks.Konrad Rzeszutek Wilk2012-04-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At the start of the function we were checking for idx != 0 and bailing out. And later calling extend_brk if idx != 0. That is unnecessary so remove that checks. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | | xen/p2m: Allow alloc_p2m_middle to call reserve_brk depending on argumentKonrad Rzeszutek Wilk2012-04-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For identity cases we want to call reserve_brk only on the boundary conditions of the middle P2M (so P2M[x][y][0] = extend_brk). This is to work around identify regions (PCI spaces, gaps in E820) which are not aligned on 2MB regions. However for the case were we want to allocate P2M middle leafs at the early bootup stage, irregardless of this alignment check we need some means of doing that. For that we provide the new argument. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| | * | | xen/p2m: Move code around to allow for better re-usage.Konrad Rzeszutek Wilk2012-04-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are going to be using the early_alloc_p2m (and early_alloc_p2m_middle) code in follow up patches which are not related to setting identity pages. Hence lets move the code out in its own function and rename them as appropiate. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>