litmus-rt-ext-res.git/arch/powerpc, branch EXT-RES

powerpc/tm: Fix FP and VMX register corruption

2017-05-25T13:44:43+00:00

commit f48e91e87e67b56bef63393d1a02c6e22c1d7078 upstream.

In commit dc3106690b20 ("powerpc: tm: Always use fp_state and vr_state
to store live registers"), a section of code was removed that copied
the current state to checkpointed state. That code should not have been
removed.

When an FP (Floating Point) unavailable is taken inside a transaction,
we need to abort the transaction. This is because at the time of the
tbegin, the FP state is bogus so the state stored in the checkpointed
registers is incorrect. To fix this, we treclaim (to get the
checkpointed GPRs) and then copy the thread_struct FP live state into
the checkpointed state. We then trecheckpoint so that the FP state is
correctly restored into the CPU.

The copying of the FP registers from live to checkpointed is what was
missing.

This simplifies the logic slightly from the original patch.
tm_reclaim_thread() will now always write the checkpointed FP
state. Either the checkpointed FP state will be written as part of
the actual treclaim (in tm.S), or it'll be a copy of the live
state. Which one we use is based on MSR[FP] from userspace.

Similarly for VMX.

Fixes: dc3106690b20 ("powerpc: tm: Always use fp_state and vr_state to store live registers")
Signed-off-by: Michael Neuling 
Reviewed-by: cyrilbur@gmail.com
Signed-off-by: Michael Ellerman 
Signed-off-by: Greg Kroah-Hartman

powerpc/64e: Fix hang when debugging programs with relocated kernel

2017-05-25T13:44:43+00:00

commit fd615f69a18a9d4aa5ef02a1dc83f319f75da8e7 upstream.

Debug interrupts can be taken during interrupt entry, since interrupt
entry does not automatically turn them off.  The kernel will check
whether the faulting instruction is between [interrupt_base_book3e,
__end_interrupts], and if so clear MSR[DE] and return.

However, when the kernel is built with CONFIG_RELOCATABLE, it can't use
LOAD_REG_IMMEDIATE(r14,interrupt_base_book3e) and
LOAD_REG_IMMEDIATE(r15,__end_interrupts), as they ignore relocation.
Thus, if the kernel is actually running at a different address than it
was built at, the address comparison will fail, and the exception entry
code will hang at kernel_dbg_exc.

r2(toc) is also not usable here, as r2 still holds data from the
interrupted context, so LOAD_REG_ADDR() doesn't work either.  So we use
the *name@got* to get the EV of two labels directly.

Test programs test.c shows as follows:
int main(int argc, char *argv[])
{
	if (access("/proc/sys/kernel/perf_event_paranoid", F_OK) == -1)
		printf("Kernel doesn't have perf_event support\n");
}

Steps to reproduce the bug, for example:
 1) ./gdb ./test
 2) (gdb) b access
 3) (gdb) r
 4) (gdb) s

Signed-off-by: Liu Hailong 
Signed-off-by: Jiang Xuexin 
Reviewed-by: Jiang Biao 
Reviewed-by: Liu Song 
Reviewed-by: Huang Jian 
[scottwood: cleaned up commit message, and specified bad behavior
 as a hang rather than an oops to correspond to mainline kernel behavior]
Fixes: 1cb6e0649248 ("powerpc/book3e: support CONFIG_RELOCATABLE")
Signed-off-by: Scott Wood 
Signed-off-by: Greg Kroah-Hartman

powerpc/iommu: Do not call PageTransHuge() on tail pages

2017-05-25T13:44:43+00:00

commit e889e96e98e8da97bd39e46b7253615eabe14397 upstream.

The CMA pages migration code does not support compound pages at
the moment so it performs few tests before proceeding to actual page
migration.

One of the tests - PageTransHuge() - has VM_BUG_ON_PAGE(PageTail()) as
it is designed to be called on head pages only. Since we also test for
PageCompound(), and it contains PageTail() and PageHead(), we can
simplify the check by leaving just PageCompound() and therefore avoid
possible VM_BUG_ON_PAGE.

Fixes: 2e5bbb5461f1 ("KVM: PPC: Book3S HV: Migrate pinned pages out of CMA")
Signed-off-by: Alexey Kardashevskiy 
Acked-by: Balbir Singh 
Signed-off-by: Michael Ellerman 
Signed-off-by: Greg Kroah-Hartman

powerpc/pseries: Fix of_node_put() underflow during DLPAR remove

2017-05-25T13:44:43+00:00

commit 68baf692c435339e6295cb470ea5545cbc28160e upstream.

Historically struct device_node references were tracked using a kref embedded as
a struct field. Commit 75b57ecf9d1d ("of: Make device nodes kobjects so they
show up in sysfs") (Mar 2014) refactored device_nodes to be kobjects such that
the device tree could by more simply exposed to userspace using sysfs.

Commit 0829f6d1f69e ("of: device_node kobject lifecycle fixes") (Mar 2014)
followed up these changes to better control the kobject lifecycle and in
particular the referecne counting via of_node_get(), of_node_put(), and
of_node_init().

A result of this second commit was that it introduced an of_node_put() call when
a dynamic node is detached, in of_node_remove(), that removes the initial kobj
reference created by of_node_init().

Traditionally as the original dynamic device node user the pseries code had
assumed responsibilty for releasing this final reference in its platform
specific DLPAR detach code.

This patch fixes a refcount underflow introduced by commit 0829f6d1f6, and
recently exposed by the upstreaming of the recount API.

Messages like the following are no longer seen in the kernel log with this
patch following DLPAR remove operations of cpus and pci devices.

  rpadlpar_io: slot PHB 72 removed
  refcount_t: underflow; use-after-free.
  ------------[ cut here ]------------
  WARNING: CPU: 5 PID: 3335 at lib/refcount.c:128 refcount_sub_and_test+0xf4/0x110

Fixes: 0829f6d1f69e ("of: device_node kobject lifecycle fixes")
Signed-off-by: Tyrel Datwyler 
[mpe: Make change log commit references more verbose]
Signed-off-by: Michael Ellerman 
Signed-off-by: Greg Kroah-Hartman

powerpc/book3s/mce: Move add_taint() later in virtual mode

2017-05-25T13:44:43+00:00

commit d93b0ac01a9ce276ec39644be47001873d3d183c upstream.

machine_check_early() gets called in real mode. The very first time when
add_taint() is called, it prints a warning which ends up calling opal
call (that uses OPAL_CALL wrapper) for writing it to console. If we get a
very first machine check while we are in opal we are doomed. OPAL_CALL
overwrites the PACASAVEDMSR in r13 and in this case when we are done with
MCE handling the original opal call will use this new MSR on it's way
back to opal_return. This usually leads to unexpected behaviour or the
kernel to panic. Instead move the add_taint() call later in the virtual
mode where it is safe to call.

This is broken with current FW level. We got lucky so far for not getting
very first MCE hit while in OPAL. But easily reproducible on Mambo.

Fixes: 27ea2c420cad ("powerpc: Set the correct kernel taint on machine check errors.")
Signed-off-by: Mahesh Salgaonkar 
Signed-off-by: Michael Ellerman 
Signed-off-by: Greg Kroah-Hartman

powerpc/eeh: Avoid use after free in eeh_handle_special_event()

2017-05-25T13:44:43+00:00

commit daeba2956f32f91f3493788ff6ee02fb1b2f02fa upstream.

eeh_handle_special_event() is called when an EEH event is detected but
can't be narrowed down to a specific PE.  This function looks through
every PE to find one in an erroneous state, then calls the regular event
handler eeh_handle_normal_event() once it knows which PE has an error.

However, if eeh_handle_normal_event() found that the PE cannot possibly
be recovered, it will free it, rendering the passed PE stale.
This leads to a use after free in eeh_handle_special_event() as it attempts to
clear the "recovering" state on the PE after eeh_handle_normal_event() returns.

Thus, make sure the PE is valid when attempting to clear state in
eeh_handle_special_event().

Fixes: 8a6b1bc70dbb ("powerpc/eeh: EEH core to handle special event")
Reported-by: Alexey Kardashevskiy 
Signed-off-by: Russell Currey 
Reviewed-by: Gavin Shan 
Signed-off-by: Michael Ellerman 
Signed-off-by: Greg Kroah-Hartman

powerpc/mm: Ensure IRQs are off in switch_mm()

2017-05-25T13:44:43+00:00

commit 9765ad134a00a01cbcc69c78ff6defbfad209bc5 upstream.

powerpc expects IRQs to already be (soft) disabled when switch_mm() is
called, as made clear in the commit message of 9c1e105238c4 ("powerpc: Allow
perf_counters to access user memory at interrupt time").

Aside from any race conditions that might exist between switch_mm() and an IRQ,
there is also an unconditional hard_irq_disable() in switch_slb(). If that isn't
followed at some point by an IRQ enable then interrupts will remain disabled
until we return to userspace.

It is true that when switch_mm() is called from the scheduler IRQs are off, but
not when it's called by use_mm(). Looking closer we see that last year in commit
f98db6013c55 ("sched/core: Add switch_mm_irqs_off() and use it in the scheduler")
this was made more explicit by the addition of switch_mm_irqs_off() which is now
called by the scheduler, vs switch_mm() which is used by use_mm().

Arguably it is a bug in use_mm() to call switch_mm() in a different context than
it expects, but fixing that will take time.

This was discovered recently when vhost started throwing warnings such as:

  BUG: sleeping function called from invalid context at kernel/mutex.c:578
  in_atomic(): 0, irqs_disabled(): 1, pid: 10768, name: vhost-10760
  no locks held by vhost-10760/10768.
  irq event stamp: 10
  hardirqs last  enabled at (9):  _raw_spin_unlock_irq+0x40/0x80
  hardirqs last disabled at (10): switch_slb+0x2e4/0x490
  softirqs last  enabled at (0):  copy_process+0x5e8/0x1260
  softirqs last disabled at (0):  (null)
  Call Trace:
    show_stack+0x88/0x390 (unreliable)
    dump_stack+0x30/0x44
    __might_sleep+0x1c4/0x2d0
    mutex_lock_nested+0x74/0x5c0
    cgroup_attach_task_all+0x5c/0x180
    vhost_attach_cgroups_work+0x58/0x80 [vhost]
    vhost_worker+0x24c/0x3d0 [vhost]
    kthread+0xec/0x100
    ret_from_kernel_thread+0x5c/0xd4

Prior to commit 04b96e5528ca ("vhost: lockless enqueuing") (Aug 2016) the
vhost_worker() would do a spin_unlock_irq() not long after calling use_mm(),
which had the effect of reenabling IRQs. Since that commit removed the locking
in vhost_worker() the body of the vhost_worker() loop now runs with interrupts
off causing the warnings.

This patch addresses the problem by making the powerpc code mirror the x86 code,
ie. we disable interrupts in switch_mm(), and optimise the scheduler case by
defining switch_mm_irqs_off().

Signed-off-by: David Gibson 
[mpe: Flesh out/rewrite change log, add stable]
Signed-off-by: Michael Ellerman 
Signed-off-by: Greg Kroah-Hartman

pstore: Fix flags to enable dumps on powerpc

2017-05-20T12:28:42+00:00

commit 041939c1ec54208b42f5cd819209173d52a29d34 upstream.

After commit c950fd6f201a kernel registers pstore write based on flag set.
Pstore write for powerpc is broken as flags(PSTORE_FLAGS_DMESG) is not set for
powerpc architecture. On panic, kernel doesn't write message to
/fs/pstore/dmesg*(Entry doesn't gets created at all).

This patch enables pstore write for powerpc architecture by setting
PSTORE_FLAGS_DMESG flag.

Fixes: c950fd6f201a ("pstore: Split pstore fragile flags")
Signed-off-by: Ankit Kumar 
Signed-off-by: Kees Cook 
Signed-off-by: Greg Kroah-Hartman

powerpc: Correctly disable latent entropy GCC plugin on prom_init.o

2017-05-14T12:00:14+00:00

commit eac6f8b0c7adb003776dbad9d037ee2fc64f9d62 upstream.

Commit 38addce8b600 ("gcc-plugins: Add latent_entropy plugin") excludes
certain powerpc early boot code from the latent entropy plugin by adding
appropriate CFLAGS. It looks like this was supposed to cover
prom_init.o, but ended up saying init.o (which doesn't exist) instead.
Fix the typo.

Fixes: 38addce8b600 ("gcc-plugins: Add latent_entropy plugin")
Signed-off-by: Andrew Donnellan 
Signed-off-by: Michael Ellerman 
Signed-off-by: Greg Kroah-Hartman

powerpc/ftrace: Fix confusing help text for DISABLE_MPROFILE_KERNEL

2017-05-14T12:00:14+00:00

commit 496e9cb5b2aa2ba303d2bbd08518f9be2219ab4b upstream.

The final paragraph of the help text is reversed. We want to enable
this option by default, and disable it if the toolchain has a working
-mprofile-kernel.

Fixes: 8c50b72a3b4f ("powerpc/ftrace: Add Kconfig & Make glue for mprofile-kernel")
Signed-off-by: Anton Blanchard 
Signed-off-by: Michael Ellerman 
Signed-off-by: Greg Kroah-Hartman