| Commit message (Collapse) | Author | Age |
... | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The last user of udbg_read() was removed in 2005, in commit fca5dcd
"Simplify and clean up the xmon terminal I/O".
Given we haven't needed it for 7 years we can probably drop it.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
| |/ / /
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Don't use 47x only #defines for TLBIVAX or ICBT, supply and use helpers
in ppc-opcode.h
This fixes a compile breakage.
Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This function is used by sparc, powerpc tile and arm64 for compat support.
The patch adds a generic implementation with a wrapper for PowerPC to do
the u32->int sign extension.
The reason for a single patch covering powerpc, tile, sparc and arm64 is
to keep it bisectable, otherwise kernel building may fail with mismatched
function declarations.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com> [for tile]
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |_|/
|/| |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Currently the __define_initcall() macro takes three arguments, fn, id and
level. The level argument is exactly the same as the id argument but
wrapped in quotes. To overcome this need to specify three arguments to
the __define_initcall macro, where one argument is the stringification of
another, we can just use the stringification macro instead.
Signed-off-by: Matthew Leach <matthew@mattleach.net>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Pull KVM updates from Marcelo Tosatti:
"Considerable KVM/PPC work, x86 kvmclock vsyscall support,
IA32_TSC_ADJUST MSR emulation, amongst others."
Fix up trivial conflict in kernel/sched/core.c due to cross-cpu
migration notifier added next to rq migration call-back.
* tag 'kvm-3.8-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (156 commits)
KVM: emulator: fix real mode segment checks in address linearization
VMX: remove unneeded enable_unrestricted_guest check
KVM: VMX: fix DPL during entry to protected mode
x86/kexec: crash_vmclear_local_vmcss needs __rcu
kvm: Fix irqfd resampler list walk
KVM: VMX: provide the vmclear function and a bitmap to support VMCLEAR in kdump
x86/kexec: VMCLEAR VMCSs loaded on all cpus if necessary
KVM: MMU: optimize for set_spte
KVM: PPC: booke: Get/set guest EPCR register using ONE_REG interface
KVM: PPC: bookehv: Add EPCR support in mtspr/mfspr emulation
KVM: PPC: bookehv: Add guest computation mode for irq delivery
KVM: PPC: Make EPCR a valid field for booke64 and bookehv
KVM: PPC: booke: Extend MAS2 EPN mask for 64-bit
KVM: PPC: e500: Mask MAS2 EPN high 32-bits in 32/64 tlbwe emulation
KVM: PPC: Mask ea's high 32-bits in 32/64 instr emulation
KVM: PPC: e500: Add emulation helper for getting instruction ea
KVM: PPC: bookehv64: Add support for interrupt handling
KVM: PPC: bookehv: Remove GET_VCPU macro from exception handler
KVM: PPC: booke: Fix get_tb() compile error on 64-bit
KVM: PPC: e500: Silence bogus GCC warning in tlb code
...
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
In BookE, EPCR is defined and valid when either the HV or the 64bit
category are implemented. Reflect this in the field definition.
Today the only KVM target on 64bit is HV enabled, so there is no
change in actual source code, but this keeps the code closer to the
spec and doesn't build up artificial road blocks for a PR KVM
on 64bit.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Extend MAS2 EPN mask to retain most significant bits on 64-bit hosts.
Use this mask in tlb effective address accessor.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Mask high 32 bits of effective address in emulation layer for guests running
in 32-bit mode.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
[agraf: fix indent]
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Add emulation helper for getting instruction ea and refactor tlb instruction
emulation to use it.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
[agraf: keep rt variable around]
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Add interrupt handling support for 64-bit bookehv hosts. Unify 32 and 64 bit
implementations using a common stack layout and a common execution flow starting
from kvm_handler_common macro. Update documentation for 64-bit input register
values. This patch only address the bolted TLB miss exception handlers version.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
panicking
Currently, if a machine check interrupt happens while we are in the
guest, we exit the guest and call the host's machine check handler,
which tends to cause the host to panic. Some machine checks can be
triggered by the guest; for example, if the guest creates two entries
in the SLB that map the same effective address, and then accesses that
effective address, the CPU will take a machine check interrupt.
To handle this better, when a machine check happens inside the guest,
we call a new function, kvmppc_realmode_machine_check(), while still in
real mode before exiting the guest. On POWER7, it handles the cases
that the guest can trigger, either by flushing and reloading the SLB,
or by flushing the TLB, and then it delivers the machine check interrupt
directly to the guest without going back to the host. On POWER7, the
OPAL firmware patches the machine check interrupt vector so that it
gets control first, and it leaves behind its analysis of the situation
in a structure pointed to by the opal_mc_evt field of the paca. The
kvmppc_realmode_machine_check() function looks at this, and if OPAL
reports that there was no error, or that it has handled the error, we
also go straight back to the guest with a machine check. We have to
deliver a machine check to the guest since the machine check interrupt
might have trashed valid values in SRR0/1.
If the machine check is one we can't handle in real mode, and one that
OPAL hasn't already handled, or on PPC970, we exit the guest and call
the host's machine check handler. We do this by jumping to the
machine_check_fwnmi label, rather than absolute address 0x200, because
we don't want to re-execute OPAL's handler on POWER7. On PPC970, the
two are equivalent because address 0x200 just contains a branch.
Then, if the host machine check handler decides that the system can
continue executing, kvmppc_handle_exit() delivers a machine check
interrupt to the guest -- once again to let the guest know that SRR0/1
have been modified.
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix checkpatch warnings]
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
When we change or remove a HPT (hashed page table) entry, we can do
either a global TLB invalidation (tlbie) that works across the whole
machine, or a local invalidation (tlbiel) that only affects this core.
Currently we do local invalidations if the VM has only one vcpu or if
the guest requests it with the H_LOCAL flag, though the guest Linux
kernel currently doesn't ever use H_LOCAL. Then, to cope with the
possibility that vcpus moving around to different physical cores might
expose stale TLB entries, there is some code in kvmppc_hv_entry to
flush the whole TLB of entries for this VM if either this vcpu is now
running on a different physical core from where it last ran, or if this
physical core last ran a different vcpu.
There are a number of problems on POWER7 with this as it stands:
- The TLB invalidation is done per thread, whereas it only needs to be
done per core, since the TLB is shared between the threads.
- With the possibility of the host paging out guest pages, the use of
H_LOCAL by an SMP guest is dangerous since the guest could possibly
retain and use a stale TLB entry pointing to a page that had been
removed from the guest.
- The TLB invalidations that we do when a vcpu moves from one physical
core to another are unnecessary in the case of an SMP guest that isn't
using H_LOCAL.
- The optimization of using local invalidations rather than global should
apply to guests with one virtual core, not just one vcpu.
(None of this applies on PPC970, since there we always have to
invalidate the whole TLB when entering and leaving the guest, and we
can't support paging out guest memory.)
To fix these problems and simplify the code, we now maintain a simple
cpumask of which cpus need to flush the TLB on entry to the guest.
(This is indexed by cpu, though we only ever use the bits for thread
0 of each core.) Whenever we do a local TLB invalidation, we set the
bits for every cpu except the bit for thread 0 of the core that we're
currently running on. Whenever we enter a guest, we test and clear the
bit for our core, and flush the TLB if it was set.
On initial startup of the VM, and when resetting the HPT, we set all the
bits in the need_tlb_flush cpumask, since any core could potentially have
stale TLB entries from the previous VM to use the same LPID, or the
previous contents of the HPT.
Then, we maintain a count of the number of online virtual cores, and use
that when deciding whether to use a local invalidation rather than the
number of online vcpus. The code to make that decision is extracted out
into a new function, global_invalidates(). For multi-core guests on
POWER7 (i.e. when we are using mmu notifiers), we now never do local
invalidations regardless of the H_LOCAL flag.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This fixes various issues in how we were handling the VSX registers
that exist on POWER7 machines. First, we were running off the end
of the current->thread.fpr[] array. Ultimately this was because the
vcpu->arch.vsr[] array is sized to be able to store both the FP
registers and the extra VSX registers (i.e. 64 entries), but PR KVM
only uses it for the extra VSX registers (i.e. 32 entries).
Secondly, calling load_up_vsx() from C code is a really bad idea,
because it jumps to fast_exception_return at the end, rather than
returning with a blr instruction. This was causing it to jump off
to a random location with random register contents, since it was using
the largely uninitialized stack frame created by kvmppc_load_up_vsx.
In fact, it isn't necessary to call either __giveup_vsx or load_up_vsx,
since giveup_fpu and load_up_fpu handle the extra VSX registers as well
as the standard FP registers on machines with VSX. Also, since VSX
instructions can access the VMX registers and the FP registers as well
as the extra VSX registers, we have to load up the FP and VMX registers
before we can turn on the MSR_VSX bit for the guest. Conversely, if
we save away any of the VSX or FP registers, we have to turn off MSR_VSX
for the guest.
To handle all this, it is more convenient for a single call to
kvmppc_giveup_ext() to handle all the state saving that needs to be done,
so we make it take a set of MSR bits rather than just one, and the switch
statement becomes a series of if statements. Similarly kvmppc_handle_ext
needs to be able to load up more than one set of registers.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This adds basic emulation of the PURR and SPURR registers. We assume
we are emulating a single-threaded core, so these advance at the same
rate as the timebase. A Linux kernel running on a POWER7 expects to
be able to access these registers and is not prepared to handle a
program interrupt on accessing them.
This also adds a very minimal emulation of the DSCR (data stream
control register). Writes are ignored and reads return zero.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
A new ioctl, KVM_PPC_GET_HTAB_FD, returns a file descriptor. Reads on
this fd return the contents of the HPT (hashed page table), writes
create and/or remove entries in the HPT. There is a new capability,
KVM_CAP_PPC_HTAB_FD, to indicate the presence of the ioctl. The ioctl
takes an argument structure with the index of the first HPT entry to
read out and a set of flags. The flags indicate whether the user is
intending to read or write the HPT, and whether to return all entries
or only the "bolted" entries (those with the bolted bit, 0x10, set in
the first doubleword).
This is intended for use in implementing qemu's savevm/loadvm and for
live migration. Therefore, on reads, the first pass returns information
about all HPTEs (or all bolted HPTEs). When the first pass reaches the
end of the HPT, it returns from the read. Subsequent reads only return
information about HPTEs that have changed since they were last read.
A read that finds no changed HPTEs in the HPT following where the last
read finished will return 0 bytes.
The format of the data provides a simple run-length compression of the
invalid entries. Each block of data starts with a header that indicates
the index (position in the HPT, which is just an array), the number of
valid entries starting at that index (may be zero), and the number of
invalid entries following those valid entries. The valid entries, 16
bytes each, follow the header. The invalid entries are not explicitly
represented.
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix documentation]
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This makes a HPTE removal function, kvmppc_do_h_remove(), available
outside book3s_hv_rm_mmu.c. This will be used by the HPT writing
code.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This uses a bit in our record of the guest view of the HPTE to record
when the HPTE gets modified. We use a reserved bit for this, and ensure
that this bit is always cleared in HPTE values returned to the guest.
The recording of modified HPTEs is only done if other code indicates
its interest by setting kvm->arch.hpte_mod_interest to a non-zero value.
The reason for this is that when later commits add facilities for
userspace to read the HPT, the first pass of reading the HPT will be
quicker if there are no (or very few) HPTEs marked as modified,
rather than having most HPTEs marked as modified.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This restructures the code that creates HPT (hashed page table)
entries so that it can be called in situations where we don't have a
struct vcpu pointer, only a struct kvm pointer. It also fixes a bug
where kvmppc_map_vrma() would corrupt the guest R4 value.
Most of the work of kvmppc_virtmode_h_enter is now done by a new
function, kvmppc_virtmode_do_h_enter, which itself calls another new
function, kvmppc_do_h_enter, which contains most of the old
kvmppc_h_enter. The new kvmppc_do_h_enter takes explicit arguments
for the place to return the HPTE index, the Linux page tables to use,
and whether it is being called in real mode, thus removing the need
for it to have the vcpu as an argument.
Currently kvmppc_map_vrma creates the VRMA (virtual real mode area)
HPTEs by calling kvmppc_virtmode_h_enter, which is designed primarily
to handle H_ENTER hcalls from the guest that need to pin a page of
memory. Since H_ENTER returns the index of the created HPTE in R4,
kvmppc_virtmode_h_enter updates the guest R4, corrupting the guest R4
in the case when it gets called from kvmppc_map_vrma on the first
VCPU_RUN ioctl. With this, kvmppc_map_vrma instead calls
kvmppc_virtmode_do_h_enter with the address of a dummy word as the
place to store the HPTE index, thus avoiding corrupting the guest R4.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The new uapi framework splits kernel internal and user space exported
bits of header files more cleanly. Adjust the ePAPR header accordingly.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| |\ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Conflicts:
arch/powerpc/include/asm/Kbuild
arch/powerpc/include/uapi/asm/Kbuild
|
| | |\| |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Merge reason: development work has dependency on kvm patches merged
upstream.
Conflicts:
arch/powerpc/include/asm/Kbuild
arch/powerpc/include/asm/kvm_para.h
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This fixes an error in the inline asm in try_lock_hpte() where we
were erroneously using a register number as an immediate operand.
The bug only affects an error path, and in fact the code will still
work as long as the compiler chooses some register other than r0
for the "bits" variable. Nevertheless it should still be fixed.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Currently the code that accounts stolen time tends to overestimate the
stolen time, and will sometimes report more stolen time in a DTL
(dispatch trace log) entry than has elapsed since the last DTL entry.
This can cause guests to underflow the user or system time measured
for some tasks, leading to ridiculous CPU percentages and total runtimes
being reported by top and other utilities.
In addition, the current code was designed for the previous policy where
a vcore would only run when all the vcpus in it were runnable, and so
only counted stolen time on a per-vcore basis. Now that a vcore can
run while some of the vcpus in it are doing other things in the kernel
(e.g. handling a page fault), we need to count the time when a vcpu task
is preempted while it is not running as part of a vcore as stolen also.
To do this, we bring back the BUSY_IN_HOST vcpu state and extend the
vcpu_load/put functions to count preemption time while the vcpu is
in that state. Handling the transitions between the RUNNING and
BUSY_IN_HOST states requires checking and updating two variables
(accumulated time stolen and time last preempted), so we add a new
spinlock, vcpu->arch.tbacct_lock. This protects both the per-vcpu
stolen/preempt-time variables, and the per-vcore variables while this
vcpu is running the vcore.
Finally, we now don't count time spent in userspace as stolen time.
The task could be executing in userspace on behalf of the vcpu, or
it could be preempted, or the vcpu could be genuinely stopped. Since
we have no way of dividing up the time between these cases, we don't
count any of it as stolen.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Currently the Book3S HV code implements a policy on multi-threaded
processors (i.e. POWER7) that requires all of the active vcpus in a
virtual core to be ready to run before we run the virtual core.
However, that causes problems on reset, because reset stops all vcpus
except vcpu 0, and can also reduce throughput since all four threads
in a virtual core have to wait whenever any one of them hits a
hypervisor page fault.
This relaxes the policy, allowing the virtual core to run as soon as
any vcpu in it is runnable. With this, the KVMPPC_VCPU_STOPPED state
and the KVMPPC_VCPU_BUSY_IN_HOST state have been combined into a single
KVMPPC_VCPU_NOTREADY state, since we no longer need to distinguish
between them.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
If a thread in a virtual core becomes runnable while other threads
in the same virtual core are already running in the guest, it is
possible for the latecomer to join the others on the core without
first pulling them all out of the guest. Currently this only happens
rarely, when a vcpu is first started. This fixes some bugs and
omissions in the code in this case.
First, we need to check for VPA updates for the latecomer and make
a DTL entry for it. Secondly, if it comes along while the master
vcpu is doing a VPA update, we don't need to do anything since the
master will pick it up in kvmppc_run_core. To handle this correctly
we introduce a new vcore state, VCORE_STARTING. Thirdly, there is
a race because we currently clear the hardware thread's hwthread_req
before waiting to see it get to nap. A latecomer thread could have
its hwthread_req cleared before it gets to test it, and therefore
never increment the nap_count, leading to messages about wait_for_nap
timeouts.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
There were a few places where we were traversing the list of runnable
threads in a virtual core, i.e. vc->runnable_threads, without holding
the vcore spinlock. This extends the places where we hold the vcore
spinlock to cover everywhere that we traverse that list.
Since we possibly need to sleep inside kvmppc_book3s_hv_page_fault,
this moves the call of it from kvmppc_handle_exit out to
kvmppc_vcpu_run, where we don't hold the vcore lock.
In kvmppc_vcore_blocked, we don't actually need to check whether
all vcpus are ceded and don't have any pending exceptions, since the
caller has already done that. The caller (kvmppc_run_vcpu) wasn't
actually checking for pending exceptions, so we add that.
The change of if to while in kvmppc_run_vcpu is to make sure that we
never call kvmppc_remove_runnable() when the vcore state is RUNNING or
EXITING.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When a Book3S HV KVM guest is running, we need the host to be in
single-thread mode, that is, all of the cores (or at least all of
the cores where the KVM guest could run) to be running only one
active hardware thread. This is because of the hardware restriction
in POWER processors that all of the hardware threads in the core
must be in the same logical partition. Complying with this restriction
is much easier if, from the host kernel's point of view, only one
hardware thread is active.
This adds two hooks in the SMP hotplug code to allow the KVM code to
make sure that secondary threads (i.e. hardware threads other than
thread 0) cannot come online while any KVM guest exists. The KVM
code still has to check that any core where it runs a guest has the
secondary threads offline, but having done that check it can now be
sure that they will not come online while the guest is running.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| |/ / /
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The new uapi framework splits kernel internal and user space exported
bits of header files more cleanly. Adjust the ePAPR header accordingly.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| |\ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
* 'for-upstream' of http://github.com/agraf/linux-2.6: (56 commits)
arch/powerpc/kvm/e500_tlb.c: fix error return code
KVM: PPC: Book3S HV: Provide a way for userspace to get/set per-vCPU areas
KVM: PPC: Book3S: Get/set guest FP regs using the GET/SET_ONE_REG interface
KVM: PPC: Book3S: Get/set guest SPRs using the GET/SET_ONE_REG interface
KVM: PPC: set IN_GUEST_MODE before checking requests
KVM: PPC: e500: MMU API: fix leak of shared_tlb_pages
KVM: PPC: e500: fix allocation size error on g2h_tlb1_map
KVM: PPC: Book3S HV: Fix calculation of guest phys address for MMIO emulation
KVM: PPC: Book3S HV: Remove bogus update of physical thread IDs
KVM: PPC: Book3S HV: Fix updates of vcpu->cpu
KVM: Move some PPC ioctl definitions to the correct place
KVM: PPC: Book3S HV: Handle memory slot deletion and modification correctly
KVM: PPC: Move kvm->arch.slot_phys into memslot.arch
KVM: PPC: Book3S HV: Take the SRCU read lock before looking up memslots
KVM: PPC: bookehv: Allow duplicate calls of DO_KVM macro
KVM: PPC: BookE: Support FPU on non-hv systems
KVM: PPC: 440: Implement mfdcrx
KVM: PPC: 440: Implement mtdcrx
Document IACx/DACx registers access using ONE_REG API
KVM: PPC: E500: Remove E500_TLB_DIRTY flag
...
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The PAPR paravirtualization interface lets guests register three
different types of per-vCPU buffer areas in its memory for communication
with the hypervisor. These are called virtual processor areas (VPAs).
Currently the hypercalls to register and unregister VPAs are handled
by KVM in the kernel, and userspace has no way to know about or save
and restore these registrations across a migration.
This adds "register" codes for these three areas that userspace can
use with the KVM_GET/SET_ONE_REG ioctls to see what addresses have
been registered, and to register or unregister them. This will be
needed for guest hibernation and migration, and is also needed so
that userspace can unregister them on reset (otherwise we corrupt
guest memory after reboot by writing to the VPAs registered by the
previous kernel).
The "register" for the VPA is a 64-bit value containing the address,
since the length of the VPA is fixed. The "registers" for the SLB
shadow buffer and dispatch trace log (DTL) are 128 bits long,
consisting of the guest physical address in the high (first) 64 bits
and the length in the low 64 bits.
This also fixes a bug where we were calling init_vpa unconditionally,
leading to an oops when unregistering the VPA.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This enables userspace to get and set all the guest floating-point
state using the KVM_[GS]ET_ONE_REG ioctls. The floating-point state
includes all of the traditional floating-point registers and the
FPSCR (floating point status/control register), all the VMX/Altivec
vector registers and the VSCR (vector status/control register), and
on POWER7, the vector-scalar registers (note that each FP register
is the high-order half of the corresponding VSR).
Most of these are implemented in common Book 3S code, except for VSX
on POWER7. Because HV and PR differ in how they store the FP and VSX
registers on POWER7, the code for these cases is not common. On POWER7,
the FP registers are the upper halves of the VSX registers vsr0 - vsr31.
PR KVM stores vsr0 - vsr31 in two halves, with the upper halves in the
arch.fpr[] array and the lower halves in the arch.vsr[] array, whereas
HV KVM on POWER7 stores the whole VSX register in arch.vsr[].
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: fix whitespace, vsx compilation]
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This enables userspace to get and set various SPRs (special-purpose
registers) using the KVM_[GS]ET_ONE_REG ioctls. With this, userspace
can get and set all the SPRs that are part of the guest state, either
through the KVM_[GS]ET_REGS ioctls, the KVM_[GS]ET_SREGS ioctls, or
the KVM_[GS]ET_ONE_REG ioctls.
The SPRs that are added here are:
- DABR: Data address breakpoint register
- DSCR: Data stream control register
- PURR: Processor utilization of resources register
- SPURR: Scaled PURR
- DAR: Data address register
- DSISR: Data storage interrupt status register
- AMR: Authority mask register
- UAMOR: User authority mask override register
- MMCR0, MMCR1, MMCRA: Performance monitor unit control registers
- PMC1..PMC8: Performance monitor unit counter registers
In order to reduce code duplication between PR and HV KVM code, this
moves the kvm_vcpu_ioctl_[gs]et_one_reg functions into book3s.c and
centralizes the copying between user and kernel space there. The
registers that are handled differently between PR and HV, and those
that exist only in one flavor, are handled in kvmppc_[gs]et_one_reg()
functions that are specific to each flavor.
Signed-off-by: Paul Mackerras <paulus@samba.org>
[agraf: minimal style fixes]
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This adds an implementation of kvm_arch_flush_shadow_memslot for
Book3S HV, and arranges for kvmppc_core_commit_memory_region to
flush the dirty log when modifying an existing slot. With this,
we can handle deletion and modification of memory slots.
kvm_arch_flush_shadow_memslot calls kvmppc_core_flush_memslot, which
on Book3S HV now traverses the reverse map chains to remove any HPT
(hashed page table) entries referring to pages in the memslot. This
gets called by generic code whenever deleting a memslot or changing
the guest physical address for a memslot.
We flush the dirty log in kvmppc_core_commit_memory_region for
consistency with what x86 does. We only need to flush when an
existing memslot is being modified, because for a new memslot the
rmap array (which stores the dirty bits) is all zero, meaning that
every page is considered clean already, and when deleting a memslot
we obviously don't care about the dirty bits any more.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Now that we have an architecture-specific field in the kvm_memory_slot
structure, we can use it to store the array of page physical addresses
that we need for Book3S HV KVM on PPC970 processors. This reduces the
size of struct kvm_arch for Book3S HV, and also reduces the size of
struct kvm_arch_memory_slot for other PPC KVM variants since the fields
in it are now only compiled in for Book3S HV.
This necessitates making the kvm_arch_create_memslot and
kvm_arch_free_memslot operations specific to each PPC KVM variant.
That in turn means that we now don't allocate the rmap arrays on
Book3S PR and Book E.
Since we now unpin pages and free the slot_phys array in
kvmppc_core_free_memslot, we no longer need to do it in
kvmppc_core_destroy_vm, since the generic code takes care to free
all the memslots when destroying a VM.
We now need the new memslot to be passed in to
kvmppc_core_prepare_memory_region, since we need to initialize its
arch.slot_phys member on Book3S HV.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The current form of DO_KVM macro restricts its use to one call per input
parameter set. This is caused by kvmppc_resume_\intno\()_\srr1 symbol
definition.
Duplicate calls of DO_KVM are required by distinct implementations of
exeption handlers which are delegated at runtime. Use a rare label number
to avoid conflicts with the calling contexts.
Signed-off-by: Mihai Caraman <mihai.caraman@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
IAC/DAC are defined as 32 bit while they are 64 bit wide. So ONE_REG
interface is added to set/get them.
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This patch adds the watchdog emulation in KVM. The watchdog
emulation is enabled by KVM_ENABLE_CAP(KVM_CAP_PPC_BOOKE_WATCHDOG) ioctl.
The kernel timer are used for watchdog emulation and emulates
h/w watchdog state machine. On watchdog timer expiry, it exit to QEMU
if TCR.WRC is non ZERO. QEMU can reset/shutdown etc depending upon how
it is configured.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
[bharat.bhushan@freescale.com: reworked patch]
Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
[agraf: adjust to new request framework]
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Requests may want to tell us that we need to go back into host state,
so add a return value for the checks.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Today, we disable preemption while inside guest context, because we need
to expose to the world that we are not in a preemptible context. However,
during that time we already have interrupts disabled, which would indicate
that we are in a non-preemptible context.
The reason the checks for irqs_disabled() fail for us though is that we
manually control hard IRQs and ignore all the lazy EE framework. Let's
stop doing that. Instead, let's always use lazy EE to indicate when we
want to disable IRQs, but do a special final switch that gets us into
EE disabled, but soft enabled state. That way when we get back out of
guest state, we are immediately ready to process interrupts.
This simplifies the code drastically and reduces the time that we appear
as preempt disabled.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Now that we have very simple MMU Notifier support for e500 in place,
also add the same simple support to book3s. It gets us one step closer
to actual fast support.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
We need to do the same things when preparing to enter a guest for booke and
book3s_pr cores. Fold the generic code into a generic function that both call.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The e500 target has lived without mmu notifiers ever since it got
introduced, but fails for the user space check on them with hugetlbfs.
So in order to get that one working, implement mmu notifiers in a
reasonably dumb fashion and be happy. On embedded hardware, we almost
never end up with mmu notifier calls, since most people don't overcommit.
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Signed-off-by: Liu Yu <yu.liu@freescale.com>
[varun: 64-bit changes]
Signed-off-by: Varun Sethi <Varun.Sethi@freescale.com>
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
And add a new flag definition in kvm_ppc_pvinfo to indicate
whether the host supports the EV_IDLE hcall.
Signed-off-by: Liu Yu <yu.liu@freescale.com>
[stuart.yoder@freescale.com: cleanup,fixes for conditions allowing idle]
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
[agraf: fix typo]
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial branch from Jiri Kosina:
"Usual stuff -- comment/printk typo fixes, documentation updates, dead
code elimination."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
HOWTO: fix double words typo
x86 mtrr: fix comment typo in mtrr_bp_init
propagate name change to comments in kernel source
doc: Update the name of profiling based on sysfs
treewide: Fix typos in various drivers
treewide: Fix typos in various Kconfig
wireless: mwifiex: Fix typo in wireless/mwifiex driver
messages: i2o: Fix typo in messages/i2o
scripts/kernel-doc: check that non-void fcts describe their return value
Kernel-doc: Convention: Use a "Return" section to describe return values
radeon: Fix typo and copy/paste error in comments
doc: Remove unnecessary declarations from Documentation/accounting/getdelays.c
various: Fix spelling of "asynchronous" in comments.
Fix misspellings of "whether" in comments.
eisa: Fix spelling of "asynchronous".
various: Fix spelling of "registered" in comments.
doc: fix quite a few typos within Documentation
target: iscsi: fix comment typos in target/iscsi drivers
treewide: fix typo of "suport" in various comments and Kconfig
treewide: fix typo of "suppport" in various comments
...
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
"Whether" is misspelled in various comments across the tree; this
fixes them. No code changes.
Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|
| | |_|/ /
| |/| | |
| | | | |
| | | | |
| | | | | |
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
|