aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* powerpc/64s: Blacklist functions invoked on a trapNaveen N. Rao2017-07-03
| | | | | | | | | | Blacklist all functions involved while handling a trap. We: - convert some of the symbols into private symbols, and - blacklist most functions involved while handling a trap. Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/64s: Un-blacklist system_call() from kprobesNaveen N. Rao2017-07-03
| | | | | | | | | | | It is actually safe to probe system_call() in entry_64.S, but only till we unset MSR_RI. To allow this, add a new symbol system_call_exit() after the mtmsrd and blacklist that. Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/64s: Move system_call() symbol to just after setting MSR_EENaveen N. Rao2017-07-03
| | | | | | | | | | | | | | | | | | It is common to get a PMU interrupt right after the mtmsr instruction that enables interrupts. Due to this, the stack trace profile gets needlessly split across system_call_common() and system_call(). Previously, system_call() symbol was at the current place to hide a few earlier symbols which have since been made private or removed entirely. So, let's move system_call() slightly higher up, right after the mtmsr instruction that enables interrupts. Convert existing references to system_call to a local syscall symbol. Suggested-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/64s: Blacklist system_call() and system_call_common() from kprobesNaveen N. Rao2017-07-03
| | | | | | | | | | | | | Convert some of the symbols into private symbols and blacklist system_call_common() and system_call() from kprobes. We can't take a trap at parts of these functions as either MSR_RI is unset or the kernel stack pointer is not yet setup. Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> [mpe: Don't convert system_call_common to _GLOBAL()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/64s: Convert .L__replay_interrupt_return to a local labelNaveen N. Rao2017-07-03
| | | | | | | | | | | | | | | Commit b48bbb82e2b835 ("powerpc/64s: Don't unbalance the return branch predictor in __replay_interrupt()") introduced __replay_interrupt_return symbol with '.L' prefix in hopes of keeping it private. However, due to the use of LOAD_REG_ADDR(), the assembler kept this symbol visible. Fix the same by instead using the local label '1'. Fixes: Commit b48bbb82e2b835 ("powerpc/64s: Don't unbalance the return branch predictor in __replay_interrupt()") Suggested-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc64/elfv1: Only dereference function descriptor for non-text symbolsNaveen N. Rao2017-07-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, we assume that the function pointer we receive in ppc_function_entry() points to a function descriptor. However, this is not always the case. In particular, assembly symbols without the right annotation do not have an associated function descriptor. Some of these symbols are added to the kprobe blacklist using _ASM_NOKPROBE_SYMBOL(). When such addresses are subsequently processed through arch_deref_entry_point() in populate_kprobe_blacklist(), we see the below errors during bootup: [ 0.663963] Failed to find blacklist at 7d9b02a648029b6c [ 0.663970] Failed to find blacklist at a14d03d0394a0001 [ 0.663972] Failed to find blacklist at 7d5302a6f94d0388 [ 0.663973] Failed to find blacklist at 48027d11e8610178 [ 0.663974] Failed to find blacklist at f8010070f8410080 [ 0.663976] Failed to find blacklist at 386100704801f89d [ 0.663977] Failed to find blacklist at 7d5302a6f94d00b0 Fix this by checking if the function pointer we receive in ppc_function_entry() already points to kernel text. If so, we just return it as is. If not, we assume that this is a function descriptor and proceed to dereference it. Suggested-by: Nicholas Piggin <npiggin@gmail.com> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* cxl: Export library to support IBM XSLChristophe Lombard2017-07-03
| | | | | | | | | | | | | | | | | | This patch exports a in-kernel 'library' API which can be called by other drivers to help interacting with an IBM XSL on a POWER9 system. The XSL (Translation Service Layer) is a stripped down version of the PSL (Power Service Layer) used in some cards such as the Mellanox CX5. Like the PSL, it implements the CAIA architecture, but has a number of differences, mostly in it's implementation dependent registers. The XSL also uses a special DMA cxl mode, which uses a slightly different init sequence for the CAPP and PHB. Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* Merge branch 'fixes' into nextMichael Ellerman2017-07-03
|\ | | | | | | | | | | Merge our fixes branch, a few of them are tripping people up while working on top of next, and we also have a dependency between the CXL fixes and new CXL code we want to merge into next.
| * powerpc/32: Avoid miscompilation w/GCC 4.6.3 - don't inline copy_to/from_user()Michael Ellerman2017-06-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Larry Finger reported that his Powerbook G4 was no longer booting with v4.12-rc, userspace was up but giving weird errors such as: udevd[64]: starting version 175 udevd[64]: Unable to receive ctrl message: Bad address. modprobe: chdir(4.12-rc1): No such file or directory He bisected the problem to commit 3448890c32c3 ("powerpc: get rid of zeroing, switch to RAW_COPY_USER"). Al identified that the problem is actually a miscompilation by GCC 4.6.3, which is exposed by the above commit. Al also pointed out that inlining copy_to/from_user() is probably of little or no benefit, which is correct. Using Anton's copy_to_user benchmark, with a pathological single byte copy, we see a small increase in performance by *removing* inlining: Before (inlined): # time ./copy_to_user -w -l 1 -i 10000000 ( x 3 ) real 0m22.063s real 0m22.059s real 0m22.076s After: # time ./copy_to_user -w -l 1 -i 10000000 ( x 3 ) real 0m21.325s real 0m21.299s real 0m21.364s So as a small performance improvement and to avoid the miscompilation, drop inlining copy_to/from_user() on 32-bit. Fixes: 3448890c32c3 ("powerpc: get rid of zeroing, switch to RAW_COPY_USER") Reported-by: Larry Finger <Larry.Finger@lwfinger.net> Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * cxl: Fixes for Coherent Accelerator Interface Architecture 2.0Christophe Lombard2017-06-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A previous set of patches "cxl: Add support for Coherent Accelerator Interface Architecture 2.0" has introduced a new support for the CAPI cards. These patches have been tested on Simulation environment and quite a bit of them have been tested on real hardware. This patch brings new fixes after a series of tests carried out on new equipment: - Add POWER9 definition. - Re-enable any masked interrupts when the AFU is not activated after resetting the AFU. - Remove the api cxl_is_psl8/9 which is no longer useful. - Do not dump CAPI1 registers. - Rewrite cxl_is_page_fault() function. - Do not register slb callack on P9. Fixes: f24be42aab37 ("cxl: Add psl9 specific code") Signed-off-by: Christophe Lombard <clombard@linux.vnet.ibm.com> Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/64: Initialise thread_info for emergency stacksNicholas Piggin2017-06-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Emergency stacks have their thread_info mostly uninitialised, which in particular means garbage preempt_count values. Emergency stack code runs with interrupts disabled entirely, and is used very rarely, so this has been unnoticed so far. It was found by a proposed new powerpc watchdog that takes a soft-NMI directly from the masked_interrupt handler and using the emergency stack. That crashed at BUG_ON(in_nmi()) in nmi_enter(). preempt_count()s were found to be garbage. To fix this, zero the entire THREAD_SIZE allocation, and initialize the thread_info. Cc: stable@vger.kernel.org Reported-by: Abdul Haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Move it all into setup_64.c, use a function not a macro. Fix crashes on Cell by setting preempt_count to 0 not HARDIRQ_OFFSET] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/powernv/npu-dma: Add explicit flush when sending an ATSDAlistair Popple2017-06-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NPU2 requires an extra explicit flush to an active GPU PID when sending address translation shoot downs (ATSDs) to reliably flush the GPU TLB. This patch adds just such a flush at the end of each sequence of ATSDs. We can safely use PID 0 which is always reserved and active on the GPU. PID 0 is only used for init_mm which will never be a user mm on the GPU. To enforce this we add a check in pnv_npu2_init_context() just in case someone tries to use PID 0 on the GPU. Signed-off-by: Alistair Popple <alistair@popple.id.au> [mpe: Use true/false for bool literals] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/perf: Fix oops when kthread execs user processRavi Bangoria2017-06-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a kthread calls call_usermodehelper() the steps are: 1. allocate current->mm 2. load_elf_binary() 3. populate current->thread.regs While doing this, interrupts are not disabled. If there is a perf interrupt in the middle of this process (i.e. step 1 has completed but not yet reached to step 3) and if perf tries to read userspace regs, kernel oops with following log: Unable to handle kernel paging request for data at address 0x00000000 Faulting instruction address: 0xc0000000000da0fc ... Call Trace: perf_output_sample_regs+0x6c/0xd0 perf_output_sample+0x4e4/0x830 perf_event_output_forward+0x64/0x90 __perf_event_overflow+0x8c/0x1e0 record_and_restart+0x220/0x5c0 perf_event_interrupt+0x2d8/0x4d0 performance_monitor_exception+0x54/0x70 performance_monitor_common+0x158/0x160 --- interrupt: f01 at avtab_search_node+0x150/0x1a0 LR = avtab_search_node+0x100/0x1a0 ... load_elf_binary+0x6e8/0x15a0 search_binary_handler+0xe8/0x290 do_execveat_common.isra.14+0x5f4/0x840 call_usermodehelper_exec_async+0x170/0x210 ret_from_kernel_thread+0x5c/0x7c Fix it by setting abi to PERF_SAMPLE_REGS_ABI_NONE when userspace pt_regs are not set. Fixes: ed4a4ef85cf5 ("powerpc/perf: Add support for sampling interrupt register state") Cc: stable@vger.kernel.org # v4.7+ Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/64s: Handle data breakpoints in Radix modeNaveen N. Rao2017-06-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On Power9, trying to use data breakpoints throws the splat shown below. This is because the check for a data breakpoint in DSISR is in do_hash_page(), which is not called when in Radix mode. Unable to handle kernel paging request for data at address 0xc000000000e19218 Faulting instruction address: 0xc0000000001155e8 cpu 0x0: Vector: 300 (Data Access) at [c0000000ef1e7b20] pc: c0000000001155e8: find_pid_ns+0x48/0xe0 lr: c000000000116ac4: find_task_by_vpid+0x44/0x90 sp: c0000000ef1e7da0 msr: 9000000000009033 dar: c000000000e19218 dsisr: 400000 Move the check to handle_page_fault() so as to catch data breakpoints in both Hash and Radix MMU modes. We have to change the check in do_hash_page() against 0xa410 to use 0xa450, so as to include the value of (DSISR_DABRMATCH << 16). There are two sites that call handle_page_fault() when in Radix, both already pass DSISR in r4. Fixes: caca285e5ab4 ("powerpc/mm/radix: Use STD_MMU_64 to properly isolate hash related code") Cc: stable@vger.kernel.org # v4.7+ Reported-by: Shriya R. Kulkarni <shriykul@in.ibm.com> Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> [mpe: Fix the fall-through case on hash, we need to reload DSISR] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/kprobes: Skip livepatch_handler() for jprobesNaveen N. Rao2017-06-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ftrace_caller() depends on a modified regs->nip to detect if a certain function has been livepatched. However, with KPROBES_ON_FTRACE, it is possible for regs->nip to have been modified by the kprobes pre_handler (jprobes, for instance). In this case, we do not want to invoke the livepatch_handler so as not to consume the livepatch stack. To distinguish between the two (kprobes and livepatch), we check if there is an active kprobe on the current function. If there is, then we know for sure that it must have modified the NIP as we don't support livepatching a kprobe'd function. In this case, we simply skip the livepatch_handler and branch to the new NIP. Otherwise, the livepatch_handler is invoked. Fixes: ead514d5fb30 ("powerpc/kprobes: Add support for KPROBES_ON_FTRACE") Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/ftrace: Pass the correct stack pointer for DYNAMIC_FTRACE_WITH_REGSNaveen N. Rao2017-06-16
| | | | | | | | | | | | | | | | | | | | | | | | | | For DYNAMIC_FTRACE_WITH_REGS, we should be passing-in the original set of registers in pt_regs, to capture the state _before_ ftrace_caller. However, we are instead passing the stack pointer *after* allocating a stack frame in ftrace_caller. Fix this by saving the proper value of r1 in pt_regs. Also, use SAVE_10GPRS() to simplify the code. Fixes: 153086644fd1 ("powerpc/ftrace: Add support for -mprofile-kernel ftrace ABI") Cc: stable@vger.kernel.org # v4.6+ Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/kprobes: Pause function_graph tracing during jprobes handlingNaveen N. Rao2017-06-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a crash when function_graph and jprobes are used together. This is essentially commit 237d28db036e ("ftrace/jprobes/x86: Fix conflict between jprobes and function graph tracing"), but for powerpc. Jprobes breaks function_graph tracing since the jprobe hook needs to use jprobe_return(), which never returns back to the hook, but instead to the original jprobe'd function. The solution is to momentarily pause function_graph tracing before invoking the jprobe hook and re-enable it when returning back to the original jprobe'd function. Fixes: 6794c78243bf ("powerpc64: port of the function graph tracer") Cc: stable@vger.kernel.org # v2.6.30+ Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/debug: Add missing warn flag to WARN_ON's non-builtin pathAlexey Kardashevskiy2017-06-16
| | | | | | | | | | | | | | | | | | | | | | | | | | When trapped on WARN_ON(), report_bug() is expected to return BUG_TRAP_TYPE_WARN so the caller will increment NIP by 4 and continue. The __builtin_constant_p() path of the PPC's WARN_ON() calls (indirectly) __WARN_FLAGS() which has BUGFLAG_WARNING set, however the other branch does not which makes report_bug() report a bug rather than a warning. Fixes: f26dee15103f ("debug: Avoid setting BUGFLAG_WARNING twice") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/xive: Fix offset for store EOI MMIOsBenjamin Herrenschmidt2017-06-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Architecturally we should apply a 0x400 offset for these. Not doing it will break future HW implementations. The offset of 0 is supposed to remain for "triggers" though not all sources support both trigger and store EOI, and in P9 specifically, some sources will treat 0 as a store EOI. But future chips will not. So this makes us use the properly architected offset which should work always. Fixes: 243e25112d06 ("powerpc/xive: Native exploitation of the XIVE interrupt controller") Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/npu-dma: Remove spurious WARN_ON when a PCI device has no of_nodeAlistair Popple2017-06-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 4c3b89effc28 ("powerpc/powernv: Add sanity checks to pnv_pci_get_{gpu|npu}_dev") introduced explicit warnings in pnv_pci_get_npu_dev() when a PCIe device has no associated device-tree node. However not all PCIe devices have an of_node and pnv_pci_get_npu_dev() gets indirectly called at least once for every PCIe device in the system. This results in spurious WARN_ON()'s so remove it. The same situation should not exist for pnv_pci_get_gpu_dev() as any NPU based PCIe device requires a device-tree node. Fixes: 4c3b89effc28 ("powerpc/powernv: Add sanity checks to pnv_pci_get_{gpu|npu}_dev") Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/book3s64: Move PPC_DT_CPU_FTRs and enable it by defaultMichael Ellerman2017-06-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The PPC_DT_CPU_FTRs is a bit misplaced in menuconfig, it shows up with other general kernel options. It's really more at home in the "Platform Support" section, so move it there. Also enable it by default, for Book3s 64. It does mostly nothing unless the device tree properties are found, and we will want it enabled eventually in distro kernels, so turn it on to start getting more testing. Fixes: 5a61ef74f269 ("powerpc/64s: Support new device tree binding for discovering CPU features") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/mm/4k: Limit 4k page size config to 64TB virtual address spaceAneesh Kumar K.V2017-06-08
| | | | | | | | | | | | | | | | | | | | | | Supporting 512TB requires us to do a order 3 allocation for level 1 page table (pgd). This results in page allocation failures with certain workloads. For now limit 4k linux page size config to 64TB. Fixes: f6eedbba7a26 ("powerpc/mm/hash: Increase VA range to 128TB") Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * cxl: Fix error path on bad ioctlFrederic Barrat2017-06-08
| | | | | | | | | | | | | | | | | | | | | | | | | | Fix error path if we can't copy user structure on CXL_IOCTL_START_WORK ioctl. We shouldn't unlock the context status mutex as it was not locked (yet). Fixes: 0712dc7e73e5 ("cxl: Fix issues when unmapping contexts") Cc: stable@vger.kernel.org # v3.19+ Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Reviewed-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/perf: Fix Power9 test_adder fieldsMadhavan Srinivasan2017-06-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 8d911904f3ce4 ('powerpc/perf: Add restrictions to PMC5 in power9 DD1') was added to restrict the use of PMC5 in Power9 DD1. Intention was to disable the use of PMC5 using raw event code. But instead of updating the power9_isa207_pmu structure (used on DD1), the commit incorrectly updated the power9_pmu structure. Fix it. Fixes: 8d911904f3ce ("powerpc/perf: Add restrictions to PMC5 in power9 DD1") Reported-by: Shriya <shriyak@linux.vnet.ibm.com> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Tested-by: Shriya <shriyak@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/numa: Fix percpu allocations to be NUMA awareMichael Ellerman2017-06-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 8c272261194d ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"), we switched to the generic implementation of cpu_to_node(), which uses a percpu variable to hold the NUMA node for each CPU. Unfortunately we neglected to notice that we use cpu_to_node() in the allocation of our percpu areas, leading to a chicken and egg problem. In practice what happens is when we are setting up the percpu areas, cpu_to_node() reports that all CPUs are on node 0, so we allocate all percpu areas on node 0. This is visible in the dmesg output, as all pcpu allocs being in group 0: pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07 pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15 pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23 pcpu-alloc: [0] 24 25 26 27 [0] 28 29 30 31 pcpu-alloc: [0] 32 33 34 35 [0] 36 37 38 39 pcpu-alloc: [0] 40 41 42 43 [0] 44 45 46 47 To fix it we need an early_cpu_to_node() which can run prior to percpu being setup. We already have the numa_cpu_lookup_table we can use, so just plumb it in. With the patch dmesg output shows two groups, 0 and 1: pcpu-alloc: [0] 00 01 02 03 [0] 04 05 06 07 pcpu-alloc: [0] 08 09 10 11 [0] 12 13 14 15 pcpu-alloc: [0] 16 17 18 19 [0] 20 21 22 23 pcpu-alloc: [1] 24 25 26 27 [1] 28 29 30 31 pcpu-alloc: [1] 32 33 34 35 [1] 36 37 38 39 pcpu-alloc: [1] 40 41 42 43 [1] 44 45 46 47 We can also check the data_offset in the paca of various CPUs, with the fix we see: CPU 0: data_offset = 0x0ffe8b0000 CPU 24: data_offset = 0x1ffe5b0000 And we can see from dmesg that CPU 24 has an allocation on node 1: node 0: [mem 0x0000000000000000-0x0000000fffffffff] node 1: [mem 0x0000001000000000-0x0000001fffffffff] Cc: stable@vger.kernel.org # v3.16+ Fixes: 8c272261194d ("powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * cxl: Avoid double free_irq() for psl,slice interruptsVaibhav Jain2017-06-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During an eeh call to cxl_remove can result in double free_irq of psl,slice interrupts. This can happen if perst_reloads_same_image == 1 and call to cxl_configure_adapter() fails during slot_reset callback. In such a case we see a kernel oops with following back-trace: Oops: Kernel access of bad area, sig: 11 [#1] Call Trace: free_irq+0x88/0xd0 (unreliable) cxl_unmap_irq+0x20/0x40 [cxl] cxl_native_release_psl_irq+0x78/0xd8 [cxl] pci_deconfigure_afu+0xac/0x110 [cxl] cxl_remove+0x104/0x210 [cxl] pci_device_remove+0x6c/0x110 device_release_driver_internal+0x204/0x2e0 pci_stop_bus_device+0xa0/0xd0 pci_stop_and_remove_bus_device+0x28/0x40 pci_hp_remove_devices+0xb0/0x150 pci_hp_remove_devices+0x68/0x150 eeh_handle_normal_event+0x140/0x580 eeh_handle_event+0x174/0x360 eeh_event_handler+0x1e8/0x1f0 This patch fixes the issue of double free_irq by checking that variables that hold the virqs (err_hwirq, serr_hwirq, psl_virq) are not '0' before un-mapping and resetting these variables to '0' when they are un-mapped. Cc: stable@vger.kernel.org Signed-off-by: Vaibhav Jain <vaibhav@linux.vnet.ibm.com> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/kernel: Initialize load_tm on task creationBreno Leitao2017-06-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently tsk->thread.load_tm is not initialized in the task creation and can contain garbage on a new task. This is an undesired behaviour, since it affects the timing to enable and disable the transactional memory laziness (disabling and enabling the MSR TM bit, which affects TM reclaim and recheckpoint in the scheduling process). Fixes: 5d176f751ee3 ("powerpc: tm: Enable transactional memory (TM) lazily for userspace") Cc: stable@vger.kernel.org # v4.9+ Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/kernel: Fix FP and vector register restorationBreno Leitao2017-06-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently tsk->thread->load_vec and load_fp are not initialized during task creation, which can lead to garbage values in these variables (non-zero values). These variables will be checked later in restore_math() to validate if the FP and vector registers are being utilized. Since these values might be non-zero, the restore_math() will continue to save the FP and vectors even if they were never utilized by the userspace application. load_fp and load_vec counters will then overflow (they wrap at 255) and the FP and Altivec will be finally disabled, but before that condition is reached (counter overflow) several context switches will have restored FP and vector registers without need, causing a performance degradation. Fixes: 70fe3d980f5f ("powerpc: Restore FPU/VEC/VSX if previously used") Cc: stable@vger.kernel.org # v4.6+ Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Gustavo Romero <gusbromero@gmail.com> Acked-by: Anton Blanchard <anton@samba.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/64: Reclaim CPU_FTR_SUBCOREMichael Ellerman2017-06-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are running low on CPU feature bits, so we only want to use them when it's really necessary. CPU_FTR_SUBCORE is only used in one place, and only in C, so we don't need it in order to make asm patching work. It can only be set on "Power8" CPUs, which in practice means POWER8, POWER8E and POWER8NVL. There are no plans to implement it on future CPUs, but if there ever were we could retrofit it then. Although KVM uses subcores, it never looks at the CPU feature, it either looks at the ISA level or the threads_per_subcore value. So drop the CPU feature and do a PVR check instead. Drop the device tree "subcore" feature as we no longer support doing anything with it, and we will drop it from skiboot too. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/hotplug-mem: Fix missing endian conversion of aa_indexMichael Bringmann2017-06-01
| | | | | | | | | | | | | | | | | | | | | | | | | | When adding or removing memory, the aa_index (affinity value) for the memblock must also be converted to match the endianness of the rest of the 'ibm,dynamic-memory' property. Otherwise, subsequent retrieval of the attribute will likely lead to non-existent nodes, followed by using the default node in the code inappropriately. Fixes: 5f97b2a0d176 ("powerpc/pseries: Implement memory hotplug add in the kernel") Cc: stable@vger.kernel.org # v4.1+ Signed-off-by: Michael Bringmann <mwb@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/sysdev/simple_gpio: Fix oops in gpio save_regs functionChristophe Leroy2017-06-01
| | | | | | | | | | | | | | | | | | | | | | | | | | of_mm_gpiochip_add_data() generates an oops for NULL pointer dereference. of_mm_gpiochip_add_data() calls mm_gc->save_regs() before setting the data, therefore ->save_regs() cannot use gpiochip_get_data() Fixes: 937daafca774 ("powerpc: simple-gpio: use gpiochip data pointer") Cc: stable@vger.kernel.org # v4.7+ Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/spufs: Fix coredump of SPU contextsMichael Ellerman2017-06-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a process dumps core while it has SPU contexts active then we have code to also dump information about the SPU contexts. Unfortunately it's been broken for 3 1/2 years, and we didn't notice. In commit 7b1f4020d0d1 ("spufs: get rid of dump_emit() wrappers") the nread variable was removed and rc used instead. That means when the loop exits successfully, rc has the number of bytes read, but it's then used as the return value for the function, which should return 0 on success. So fix it by setting rc = 0 before returning in the success case. Fixes: 7b1f4020d0d1 ("spufs: get rid of dump_emit() wrappers") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Jeremy Kerr <jk@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * powerpc/64s: Add dt_cpu_ftrs boot time setup optionNicholas Piggin2017-06-01
| | | | | | | | | | | | | | | | | | | | | | | | Provide a dt_cpu_ftrs= cmdline option to disable the dt_cpu_ftrs CPU feature discovery, and fall back to the "cputable" based version. Also allow control of advertising unknown features to userspace and with this parameter, and remove the clunky CONFIG option. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> [mpe: Add explicit early check of bootargs in dt_cpu_ftrs_init()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/dts: Use #include "..." to include local DTMasahiro Yamada2017-07-03
| | | | | | | | | | | | | | | | | | | | | | | | Most of DT files in PowerPC use #include "..." to make pre-processor include DT in the same directory, but we have 3 exceptional files that use #include <...> for that. Fix them to remove -I$(srctree)/arch/$(SRCARCH)/boot/dts path from dtc_cpp_flags. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/perf/hv-24x7: Aggregate result elements on POWER9 SMT8Thiago Jung Bauermann2017-07-02
| | | | | | | | | | | | | | | | | | | | On POWER9 SMT8 the 24x7 API returns two result elements for physical core and virtual CPU events and we need to add their counts to get the final result. Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/perf/hv-24x7: Support v2 of the hypervisor APIThiago Jung Bauermann2017-07-02
| | | | | | | | | | | | | | | | | | POWER9 introduces a new version of the hypervisor API to access the 24x7 perf counters. The new version changed some of the structures used for requests and results. Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/perf/hv-24x7: Minor improvementsThiago Jung Bauermann2017-07-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's an H24x7_DATA_BUFFER_SIZE constant, so use it in init_24x7_request. There's also an HV_PERF_DOMAIN_MAX constant, so use it in h_24x7_event_init. This makes the comment above the check redundant, so remove it. In add_event_to_24x7_request, a statement is terminated with a comma instead of a semicolon. Fix it. In hv-24x7.h, improve comments in struct hv_24x7_result. Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/perf/hv-24x7: Fix return value of hcallsThiago Jung Bauermann2017-07-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The H_GET_24X7_CATALOG_PAGE hcall can return a signed error code, so fix this in the code. The H_GET_24X7_DATA hcall can return a signed error code, so fix this in the code. Also, don't truncate it to 32 bit to use as return value for make_24x7_request. In case of error h_24x7_event_commit_txn passes that return value to generic code, so it should be a proper errno. The other caller of make_24x7_request is single_24x7_request, whose callers don't actually care which error code is returned so they are not affected by this change. Finally, h_24x7_get_value doesn't use the error code from single_24x7_request, so there's no need to store it. Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc-perf/hx-24x7: Don't log failed hcall twiceThiago Jung Bauermann2017-07-02
| | | | | | | | | | | | | | | | | | | | | | | | make_24x7_request already calls log_24x7_hcall if it fails, so callers don't have to do it again. In fact, since the latter is now only called from the former, there's no need for a separate log_24x7_hcall anymore so remove it. Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/perf/hv-24x7: Properly iterate through resultsThiago Jung Bauermann2017-07-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | hv-24x7.h has a comment mentioning that result_buffer->results can't be indexed as a normal array because it may contain results of variable sizes, so fix the loop in h_24x7_event_commit_txn to take the variation into account when iterating through results. Another problem in that loop is that it sets h24x7hw->events[i] to NULL. This assumes that only the i'th result maps to the i'th request, but that is not guaranteed to be true. We need to leave the event in the array so that we don't dereference a NULL pointer in case more than one result maps to one request. We still assume that each result has only one result element, so warn if that assumption is violated. Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/perf/hv-24x7: Fix off-by-one error in request_buffer checkThiago Jung Bauermann2017-07-02
| | | | | | | | | | | | | | | | | | | | | | | | request_buffer can hold 254 requests, so if it already has that number of entries we can't add a new one. Also, define constant to show where the number comes from. Fixes: e3ee15dc5d19 ("powerpc/perf/hv-24x7: Define add_event_to_24x7_request()") Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/perf/hv-24x7: Fix passing of catalog version numberThiago Jung Bauermann2017-07-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | H_GET_24X7_CATALOG_PAGE needs to be passed the version number obtained from the first catalog page obtained previously. This is a 64 bit number, but create_events_from_catalog truncates it to 32-bit. This worked on POWER8, but POWER9 actually uses the upper bits so the call fails with H_P3 because the hypervisor doesn't recognize the version. This patch also adds the hcall return code to the error message, which is helpful when debugging the problem. Fixes: 5c5cd7b50259 ("powerpc/perf/hv-24x7: parse catalog and populate sysfs with events") Reviewed-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Thiago Jung Bauermann <bauerman@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/mm: Enable ZONE_DEVICE on powerpcOliver O'Halloran2017-07-02
| | | | | | | | | | | | | | | | Flip the switch. Running around and screaming "IT'S ALIVE" is optional, but recommended. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/mm: Wire up hpte_removebolted for powernvAnton Blanchard2017-07-02
| | | | | | | | | | | | | | | | | | | | | | | | Adds support for removing bolted (i.e kernel linear mapping) mappings on powernv. This is needed to support memory hot unplug operations which are required for the teardown of DAX/PMEM devices. Reviewed-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/mm: Add devmap support for ppc64Oliver O'Halloran2017-07-02
| | | | | | | | | | | | | | | | | | | | | | Add support for the devmap bit on PTEs and PMDs for PPC64 Book3S. This is used to differentiate device backed memory from transparent huge pages since they are handled in more or less the same manner by the core mm code. Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/vmemmap: Add altmap supportOliver O'Halloran2017-07-02
| | | | | | | | | | | | | | | | | | | | | | | | | | Adds support to powerpc for the altmap feature of ZONE_DEVICE memory. An altmap is a driver provided region that is used to provide the backing storage for the struct pages of ZONE_DEVICE memory. In situations where large amount of ZONE_DEVICE memory is being added to the system the altmap reduces pressure on main system memory by allowing the mm/ metadata to be stored on the device itself rather in main memory. Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/vmemmap: Reshuffle vmemmap_free()Oliver O'Halloran2017-07-02
| | | | | | | | | | | | | | | | | | Removes an indentation level and shuffles some code around to make the following patch cleaner. No functional changes. Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | mm, x86: Add ARCH_HAS_ZONE_DEVICE to KconfigOliver O'Halloran2017-07-02
| | | | | | | | | | | | | | | | | | | | | | | | Currently ZONE_DEVICE depends on X86_64 and this will get unwieldly as new architectures (and platforms) get ZONE_DEVICE support. Move to an arch selected Kconfig option to save us the trouble. Cc: linux-mm@kvack.org Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | powerpc/hugetlbfs: Export HPAGE_SHIFTOliver O'Halloran2017-07-02
| | | | | | | | | | | | | | Export it so it can be referenced inside a module. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | MAINTAINERS: cxl: update maintainershipAndrew Donnellan2017-07-02
| | | | | | | | | | | | | | | | | | | | | | | | | | As Ian's stepping down from his maintainer role now that he's leaving IBM, Frederic has asked me to add myself to the cxl maintainer list. Updating accordingly. Cc: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Cc: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Acked-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com> Acked-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>