aboutsummaryrefslogtreecommitdiffstats
path: root/arch/powerpc/kvm
Commit message (Collapse)AuthorAge
...
| * KVM: PPC: Emulate tw and td instructionsAlexander Graf2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are 4 conditional trapping instructions: tw, twi, td, tdi. The ones with an i take an immediate comparison, the others compare two registers. All of them arrive in the emulator when the condition to trap was successfully fulfilled. Unfortunately, we were only implementing the i versions so far, so let's also add support for the other two. This fixes kernel booting with recents book3s_32 guest kernels. Reported-by: Jörg Sommer <joerg@alea.gnuu.de> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: Pass EA to updating emulation opsAlexander Graf2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When emulating updating load/store instructions (lwzu, stwu, ...) we need to write the effective address of the load/store into a register. Currently, we write the physical address in there, which is very wrong. So instead let's save off where the virtual fault was on MMIO and use that information as value to put into the register. While at it, also move the XOP variants of the above instructions to the new scheme of using the already known vaddr instead of calculating it themselves. Reported-by: Jörg Sommer <joerg@alea.gnuu.de> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: Work around POWER7 DABR corruption problemPaul Mackerras2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It turns out that on POWER7, writing to the DABR can cause a corrupted value to be written if the PMU is active and updating SDAR in continuous sampling mode. To work around this, we make sure that the PMU is inactive and SDAR updates are disabled (via MMCRA) when we are context-switching DABR. When the guest sets DABR via the H_SET_DABR hypercall, we use a slightly different workaround, which is to read back the DABR and write it again if it got corrupted. While we are at it, make it consistent that the saving and restoring of the guest's non-volatile GPRs and the FPRs are done with the guest setup of the PMU active. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * Restore guest CR after exit timing calculationBharat Bhushan2012-04-08
| | | | | | | | | | | | | | | | | | | | | | No instruction which can change Condition Register (CR) should be executed after Guest CR is loaded. So the guest CR is restored after the Exit Timing in lightweight_exit executes cmpw, which can clobber CR. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: Book3S HV: Report stolen time to guest through dispatch trace logPaul Mackerras2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds code to measure "stolen" time per virtual core in units of timebase ticks, and to report the stolen time to the guest using the dispatch trace log (DTL). The guest can register an area of memory for the DTL for a given vcpu. The DTL is a ring buffer where KVM fills in one entry every time it enters the guest for that vcpu. Stolen time is measured as time when the virtual core is not running, either because the vcore is not runnable (e.g. some of its vcpus are executing elsewhere in the kernel or in userspace), or when the vcpu thread that is running the vcore is preempted. This includes time when all the vcpus are idle (i.e. have executed the H_CEDE hypercall), which is OK because the guest accounts stolen time while idle as idle time. Each vcpu keeps a record of how much stolen time has been reported to the guest for that vcpu so far. When we are about to enter the guest, we create a new DTL entry (if the guest vcpu has a DTL) and report the difference between total stolen time for the vcore and stolen time reported so far for the vcpu as the "enqueue to dispatch" time in the DTL entry. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: Book3S HV: Make virtual processor area registration more robustPaul Mackerras2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The PAPR API allows three sorts of per-virtual-processor areas to be registered (VPA, SLB shadow buffer, and dispatch trace log), and furthermore, these can be registered and unregistered for another virtual CPU. Currently we just update the vcpu fields pointing to these areas at the time of registration or unregistration. If this is done on another vcpu, there is the possibility that the target vcpu is using those fields at the time and could end up using a bogus pointer and corrupting memory. This fixes the race by making the target cpu itself do the update, so we can be sure that the update happens at a time when the fields aren't being used. Each area now has a struct kvmppc_vpa which is used to manage these updates. There is also a spinlock which protects access to all of the kvmppc_vpa structs, other than to the pinned_addr fields. (We could have just taken the spinlock when using the vpa, slb_shadow or dtl fields, but that would mean taking the spinlock on every guest entry and exit.) This also changes 'struct dtl' (which was undefined) to 'struct dtl_entry', which is what the rest of the kernel uses. Thanks to Michael Ellerman <michael@ellerman.id.au> for pointing out the need to initialize vcpu->arch.vpa_update_lock. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: Book3S HV: Make secondary threads more robust against stray IPIsPaul Mackerras2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently on POWER7, if we are running the guest on a core and we don't need all the hardware threads, we do nothing to ensure that the unused threads aren't executing in the kernel (other than checking that they are offline). We just assume they're napping and we don't do anything to stop them trying to enter the kernel while the guest is running. This means that a stray IPI can wake up the hardware thread and it will then try to enter the kernel, but since the core is in guest context, it will execute code from the guest in hypervisor mode once it turns the MMU on, which tends to lead to crashes or hangs in the host. This fixes the problem by adding two new one-byte flags in the kvmppc_host_state structure in the PACA which are used to interlock between the primary thread and the unused secondary threads when entering the guest. With these flags, the primary thread can ensure that the unused secondaries are not already in kernel mode (i.e. handling a stray IPI) and then indicate that they should not try to enter the kernel if they do get woken for any reason. Instead they will go into KVM code, find that there is no vcpu to run, acknowledge and clear the IPI and go back to nap mode. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: Save/Restore CR over vcpu_runAlexander Graf2012-04-08
| | | | | | | | | | | | | | | | | | | | On PPC, CR2-CR4 are nonvolatile, thus have to be saved across function calls. We didn't respect that for any architecture until Paul spotted it in his patch for Book3S-HV. This patch saves/restores CR for all KVM capable PPC hosts. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: Book3s: PR: Add SPAPR H_BULK_REMOVE supportMatt Evans2012-04-08
| | | | | | | | | | | | | | | | | | | | | | SPAPR support includes various in-kernel hypercalls, improving performance by cutting out the exit to userspace. H_BULK_REMOVE is implemented in this patch. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: Booke: only prepare to enter when we enterAlexander Graf2012-04-08
| | | | | | | | | | | | | | | | | | | | So far, we've always called prepare_to_enter even when all we did was return to the host. This patch changes that semantic to only call prepare_to_enter when we actually want to get back into the guest. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: booke: Reinject performance monitor interruptsAlexander Graf2012-04-08
| | | | | | | | | | | | | | | | | | | | When we get a performance monitor interrupt, we need to make sure that the host receives it. So reinject it like we reinject the other host destined interrupts. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: booke: expose good state on irq reinjectAlexander Graf2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When reinjecting an interrupt into the host interrupt handler after we're back in host kernel land, we need to tell the kernel where the interrupt happened. We can't tell it that we were in guest state, because that might lead to random code walking host addresses. So instead, we tell it that we came from the interrupt reinject code. This helps getting reasonable numbers out of perf. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: booke: Support perfmon interruptsAlexander Graf2012-04-08
| | | | | | | | | | | | | | | | | | | | When during guest context we get a performance monitor interrupt, we currently bail out and oops. Let's route it to its correct handler instead. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: e500: fix typo in tlb codeAlexander Graf2012-04-08
| | | | | | | | | | | | | | | | | | The tlbncfg registers should be populated with their respective TLB's values. Fix the obvious typo. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: bookehv: remove unused codeAlexander Graf2012-04-08
| | | | | | | | | | | | | | | | | | | | There was some unused code in the exit code path that must have been a leftover from earlier iterations. While it did no harm, it's superfluous and thus should be removed. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: booke: add GS documentation for program interruptAlexander Graf2012-04-08
| | | | | | | | | | | | | | | | | | | | The comment for program interrupts triggered when using bookehv was misleading. Update it to mention why MSR_GS indicates that we have to inject an interrupt into the guest again, not emulate it. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: booke: Readd debug abort code for machine checkAlexander Graf2012-04-08
| | | | | | | | | | | | | | | | | | | | | | When during guest execution we get a machine check interrupt, we don't know how to handle it yet. So let's add the error printing code back again that we dropped accidently earlier and tell user space that something went really wrong. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: bookehv: disable MAS register updates earlyAlexander Graf2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to make sure that no MAS updates happen automatically while we have the guest MAS registers loaded. So move the disabling code a bit higher up so that it covers the full time we have guest values in MAS registers. The race this patch fixes should never occur, but it makes the code a bit more logical to do it this way around. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: bookehv: remove SET_VCPUAlexander Graf2012-04-08
| | | | | | | | | | | | | | | | | | The SET_VCPU macro is a leftover from times when the vcpu struct wasn't stored in the thread on vcpu_load/put. It's not needed anymore. Remove it. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: bookehv: remove negation for CONFIG_64BITAlexander Graf2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead if doing #ifndef CONFIG_64BIT ... #else ... #endif we should rather do #ifdef CONFIG_64BIT ... #else ... #endif which is a lot easier to read. Change the bookehv implementation to stick with this rule. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: bookehv: fix exit timingAlexander Graf2012-04-08
| | | | | | | | | | | | | | | | | | When using exit timing stats, we clobber r9 in the NEED_EMU case, so better move that part down a few lines and fix it that way. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: booke: BOOKE_IRQPRIO_MAX is n+1Alexander Graf2012-04-08
| | | | | | | | | | | | | | | | | | The semantics of BOOKE_IRQPRIO_MAX changed to denote the highest available irqprio + 1, so let's reflect that in the code too. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: booke: rework rescheduling checksAlexander Graf2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of checking whether we should reschedule only when we exited due to an interrupt, let's always check before entering the guest back again. This gets the target more in line with the other archs. Also while at it, generalize the whole thing so that eventually we could have a single kvmppc_prepare_to_enter function for all ppc targets that does signal and reschedule checking for us. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: booke: deliver program int on emulation failureAlexander Graf2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we fail to emulate an instruction for the guest, we better go in and tell it that we failed to emulate it, by throwing an illegal instruction exception. Please beware that we basically never get around to telling the guest that we failed thanks to the debugging code right above it. If user space however decides that it wants to ignore the debug, we would at least do "the right thing" afterwards. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: booke: remove leftover debuggingAlexander Graf2012-04-08
| | | | | | | | | | | | | | | | The e500mc patches left some debug code in that we don't need. Remove it. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: make e500v2 kvm and e500mc cpu mutually exclusiveAlexander Graf2012-04-08
| | | | | | | | | | | | | | | | | | We can't run e500v2 kvm on e500mc kernels, so indicate that by making the 2 options mutually exclusive in kconfig. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: rename CONFIG_KVM_E500 -> CONFIG_KVM_E500V2Alexander Graf2012-04-08
| | | | | | | | | | | | | | | | | | | | The CONFIG_KVM_E500 option really indicates that we're running on a V2 machine, not on a machine of the generic E500 class. So indicate that properly and change the config name accordingly. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: e500mc: add load inst fixupAlexander Graf2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | There's always a chance we're unable to read a guest instruction. The guest could have its TLB mapped execute-, but not readable, something odd happens and our TLB gets flushed. So it's a good idea to be prepared for that case and have a fallback that allows us to fix things up in that case. Add fixup code that keeps guest code from potentially crashing our host kernel. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: e500mc: Move r1/r2 restoration very earlyAlexander Graf2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | If we hit any exception whatsoever in the restore path and r1/r2 aren't the host registers, we don't get a working oops. So it's always a good idea to restore them as early as possible. This time, it actually has practical reasons to do so too, since we need to have the host page fault handler fix up our guest instruction read code. And for that to work we need r1/r2 restored. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: e500mc: implicitly set MSR_GSAlexander Graf2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | When setting MSR for an e500mc guest, we implicitly always set MSR_GS to make sure the guest is in guest state. Since we have this implicit rule there, we don't need to explicitly pass MSR_GS to set_msr(). Remove all explicit setters of MSR_GS. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: e500mc: Add doorbell emulation supportAlexander Graf2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | When one vcpu wants to kick another, it can issue a special IPI instruction called msgsnd. This patch emulates this instruction, its clearing counterpart and the infrastructure required to actually trigger that interrupt inside a guest vcpu. With this patch, SMP guests on e500mc work. Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: e500mc supportScott Wood2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add processor support for e500mc, using hardware virtualization support (GS-mode). Current issues include: - No support for external proxy (coreint) interrupt mode in the guest. Includes work by Ashish Kalra <Ashish.Kalra@freescale.com>, Varun Sethi <Varun.Sethi@freescale.com>, and Liu Yu <yu.liu@freescale.com>. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: booke: standard PPC floating point supportScott Wood2012-04-08
| | | | | | | | | | | | | | | | | | | | | | e500mc has a normal PPC FPU, rather than SPE which is found on e500v1/v2. Based on code from Liu Yu <yu.liu@freescale.com>. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: booke: category E.HV (GS-mode) supportScott Wood2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Chips such as e500mc that implement category E.HV in Power ISA 2.06 provide hardware virtualization features, including a new MSR mode for guest state. The guest OS can perform many operations without trapping into the hypervisor, including transitions to and from guest userspace. Since we can use SRR1[GS] to reliably tell whether an exception came from guest state, instead of messing around with IVPR, we use DO_KVM similarly to book3s. Current issues include: - Machine checks from guest state are not routed to the host handler. - The guest can cause a host oops by executing an emulated instruction in a page that lacks read permission. Existing e500/4xx support has the same problem. Includes work by Ashish Kalra <Ashish.Kalra@freescale.com>, Varun Sethi <Varun.Sethi@freescale.com>, and Liu Yu <yu.liu@freescale.com>. Signed-off-by: Scott Wood <scottwood@freescale.com> [agraf: remove pt_regs usage] Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: e500: emulate tlbilxScott Wood2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | tlbilx is the new, preferred invalidation instruction. It is not found on e500 prior to e500mc, but there should be no harm in supporting it on all e500. Based on code from Ashish Kalra <Ashish.Kalra@freescale.com>. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: e500: Track TLB1 entries with a bitmapScott Wood2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Rather than invalidate everything when a TLB1 entry needs to be taken down, keep track of which host TLB1 entries are used for a given guest TLB1 entry, and invalidate just those entries. Based on code from Ashish Kalra <Ashish.Kalra@freescale.com> and Liu Yu <yu.liu@freescale.com>. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: e500: refactor core-specific TLB codeScott Wood2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The PID handling is e500v1/v2-specific, and is moved to e500.c. The MMU sregs code and kvmppc_core_vcpu_translate will be shared with e500mc, and is moved from e500.c to e500_tlb.c. Partially based on patches from Liu Yu <yu.liu@freescale.com>. Signed-off-by: Scott Wood <scottwood@freescale.com> [agraf: fix bisectability] Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: e500: clean up arch/powerpc/kvm/e500.hScott Wood2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Move vcpu to the beginning of vcpu_e500 to give it appropriate prominence, especially if more fields end up getting added to the end of vcpu_e500 (and vcpu ends up in the middle). Remove gratuitous "extern" and add parameter names to prototypes. Signed-off-by: Scott Wood <scottwood@freescale.com> [agraf: fix bisectability] Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: e500: merge <asm/kvm_e500.h> into arch/powerpc/kvm/e500.hScott Wood2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Keeping two separate headers for e500-specific things was a pain, and wasn't even organized along any logical boundary. There was TLB stuff in <asm/kvm_e500.h> despite the existence of arch/powerpc/kvm/e500_tlb.h, and nothing in <asm/kvm_e500.h> needed to be referenced from outside arch/powerpc/kvm. Signed-off-by: Scott Wood <scottwood@freescale.com> [agraf: fix bisectability] Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: e500: rename e500_tlb.h to e500.hScott Wood2012-04-08
| | | | | | | | | | | | | | | | | | | | This is in preparation for merging in the contents of arch/powerpc/include/asm/kvm_e500.h. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: booke: Move vm core init/destroy out of booke.cScott Wood2012-04-08
| | | | | | | | | | | | | | | | | | e500mc will want to do lpid allocation/deallocation here. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: booke: add booke-level vcpu load/putScott Wood2012-04-08
| | | | | | | | | | | | | | | | | | | | This gives us a place to put load/put actions that correspond to code that is booke-specific but not specific to a particular core. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: PPC: factor out lpid allocator from book3s_64_mmu_hvScott Wood2012-04-08
| | | | | | | | | | | | | | | | | | We'll use it on e500mc as well. Signed-off-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
| * KVM: Factor out kvm_vcpu_kick to arch-generic codeChristoffer Dall2012-04-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kvm_vcpu_kick function performs roughly the same funcitonality on most all architectures, so we shouldn't have separate copies. PowerPC keeps a pointer to interchanging waitqueues on the vcpu_arch structure and to accomodate this special need a __KVM_HAVE_ARCH_VCPU_GET_WQ define and accompanying function kvm_arch_vcpu_wq have been defined. For all other architectures this is a generic inline that just returns &vcpu->wq; Acked-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Christoffer Dall <c.dall@virtualopensystems.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
* | KVM: PPC: Book3S HV: Fix bug leading to deadlock in guest HPT updatesPaul Mackerras2012-05-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When handling the H_BULK_REMOVE hypercall, we were forgetting to invalidate and unlock the hashed page table entry (HPTE) in the case where the page had been paged out. This fixes it by clearing the first doubleword of the HPTE in that case. This fixes a regression introduced in commit a92bce95f0 ("KVM: PPC: Book3S HV: Keep HPTE locked when invalidating"). The effect of the regression is that the host kernel will sometimes hang when under memory pressure. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de>
* | powerpc/kvm: Fix VSID usage in 64-bit "PR" KVMBenjamin Herrenschmidt2012-05-16
| | | | | | | | | | | | | | | | | | | | | | | | The code forgot to scramble the VSIDs the way we normally do and was basically using the "proto VSID" directly with the MMU. This means that in practice, KVM used random VSIDs that could collide with segments used by other user space programs. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [agraf: simplify ppc32 case] Signed-off-by: Alexander Graf <agraf@suse.de>
* | KVM: PPC: Book3S: PR: Fix hsrr codeAlexander Graf2012-05-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When jumping back into the kernel to code that knows that it would be using HSRR registers instead of SRR registers, we need to make sure we pass it all information on where to jump to in HSRR registers. Unfortunately, we used r10 to store the information to distinguish between the HSRR and SRR case. That register got clobbered in between though, rendering the later comparison invalid. Instead, let's use cr1 to store this information. That way we don't need yet another register and everyone's happy. This fixes PR KVM on POWER7 bare metal for me. Signed-off-by: Alexander Graf <agraf@suse.de>
* | KVM: PPC: Fix PR KVM on POWER7 bare metalAlexander Graf2012-05-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running on a system that is HV capable, some interrupts use HSRR SPRs instead of the normal SRR SPRs. These are also used in the Linux handlers to jump back to code after an interrupt got processed. Unfortunately, in our "jump back to the real host handler after we've done the context switch" code, we were only setting the SRR SPRs, rendering Linux to jump back to some invalid IP after it's processed the interrupt. This fixes random crashes on p7 opal mode with PR KVM for me. Signed-off-by: Alexander Graf <agraf@suse.de>
* | KVM: PPC: Book3S: PR: Handle EMUL_ASSISTAlexander Graf2012-05-16
| | | | | | | | | | | | | | | | | | | | | | | | | | In addition to normal "priviledged instruction" traps, we can also receive "emulation assist" traps on newer hardware that has the HV bit set. Handle that one the same way as a privileged instruction, including the instruction fetching. That way we don't execute old instructions that we happen to still leave in that field when an emul assist trap comes. This fixes -M mac99 / -M g3beige on p7 bare metal for me. Signed-off-by: Alexander Graf <agraf@suse.de>
* | KVM: PPC: Book3S HV: Fix refcounting of hugepagesDavid Gibson2012-05-08
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | The H_REGISTER_VPA hcall implementation in HV Power KVM needs to pin some guest memory pages into host memory so that they can be safely accessed from usermode. It does this used get_user_pages_fast(). When the VPA is unregistered, or the VCPUs are cleaned up, these pages are released using put_page(). However, the get_user_pages() is invoked on the specific memory are of the VPA which could lie within hugepages. In case the pinned page is huge, we explicitly find the head page of the compound page before calling put_page() on it. At least with the latest kernel, this is not correct. put_page() already handles finding the correct head page of a compound, and also deals with various counts on the individual tail page which are important for transparent huge pages. We don't support transparent hugepages on Power, but even so, bypassing this count maintenance can lead (when the VM ends) to a hugepage being released back to the pool with a non-zero mapcount on one of the tail pages. This can then lead to a bad_page() when the page is released from the hugepage pool. This removes the explicit compound_head() call to correct this bug. Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Paul Mackerras <paulus@samba.org> Acked-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>