aboutsummaryrefslogtreecommitdiffstats
path: root/arch/powerpc/kernel
Commit message (Collapse)AuthorAge
...
| * | | | | powerpc/eeh: Fix build error for cellebGavin Shan2014-05-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 7f52a526f ("powerpc/eeh: Allow to disable EEH") caused following build error with "celleb_defconfig" as being catched by Mikey on linux-next. arch/powerpc/kernel/eeh.c: In function 'eeh_init_proc': arch/powerpc/kernel/eeh.c:1173:37: error: 'powerpc_debugfs_root' \ undeclared (first use in this function) arch/powerpc/kernel/eeh.c:1173:37: note: each undeclared identifier \ is reported only once for each function it appears in Reported-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | | | | Merge branch 'merge' into nextBenjamin Herrenschmidt2014-05-19
| |\ \ \ \ \ | | |/ / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge "merge" branch to get two fairly important bug fixes: powerpc/powernv: Reset root port in firmware powerpc: irq work racing with timer interrupt can result in timer interrupt
| * | | | | Merge remote-tracking branch 'anton/abiv2' into nextBenjamin Herrenschmidt2014-05-05
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This series adds support for building the powerpc 64-bit LE kernel using the new ABI v2. We already supported running ABI v2 userspace programs but this adds support for building the kernel itself using the new ABI.
| | * | | | | powerpc/ftrace: Fix ABIv2 issues with __ftrace_make_callAnton Blanchard2014-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __ftrace_make_call assumed ABIv1 TOC stack offsets, so it broke on ABIv2. While we are here, we can simplify the instruction modification code. Since we always update one instruction there is no need to probe_kernel_write and flush_icache_range, just use patch_branch. Signed-off-by: Anton Blanchard <anton@samba.org>
| | * | | | | powerpc/ftrace: Use module loader helpers to parse trampolinesAnton Blanchard2014-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now we have is_module_trampoline() and module_trampoline_target() we can remove a bunch of intimate kernel module trampoline knowledge from ftrace. Signed-off-by: Anton Blanchard <anton@samba.org>
| | * | | | | powerpc/modules: Create module_trampoline_target()Anton Blanchard2014-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ftrace has way too much knowledge of our kernel module trampoline layout hidden inside it. Create module_trampoline_target() that gives the target address of a kernel module trampoline. Signed-off-by: Anton Blanchard <anton@samba.org>
| | * | | | | powerpc/modules: Create is_module_trampoline()Anton Blanchard2014-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ftrace has way too much knowledge of our kernel module trampoline layout hidden inside it. Create is_module_trampoline() that can abstract this away inside the module loader code. Signed-off-by: Anton Blanchard <anton@samba.org>
| | * | | | | powerpc: ftrace_caller, _mcount is exported to modules so needs _GLOBAL_TOC()Anton Blanchard2014-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When testing the ftrace function tracer, I realised that ftrace_caller and mcount are called from modules and they both call into C, therefore they need the ABIv2 global entry point to establish r2. Signed-off-by: Anton Blanchard <anton@samba.org>
| | * | | | | powerpc: modules: implement stubs for ELFv2 ABI.Rusty Russell2014-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ELFv2 doesn't use function descriptors, because it doesn't need to load a new r2 when calling into a function. On the other hand, you're supposed to use a local entry point for R_PPC_REL24 branches. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| | * | | | | powerpc: modules: skip r2 setup for ELFv2Rusty Russell2014-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ELFv2 doesn't need to set up r2 when calling a function. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| | * | | | | powerpc: modules: use r12 for stub jump address.Rusty Russell2014-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In ELFv2, r12 is supposed to equal to PC on entry to a function. Our stubs use r11, so change swap that with r12. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| | * | | | | powerpc: modules: change r2 save/restore offset for ELFv2 ABI.Rusty Russell2014-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ELFv2 uses a different stack offset (24 vs 40) to save r2. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| | * | | | | powerpc: modules: comment about de-dotifying symbols when using the ELFv2 ABI.Rusty Russell2014-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ELFv2 doesn't use function descriptors, so we don't expect symbols to start with ".". But because depmod and modpost strip ".", and we have the special symbol ".TOC.", we still need to do it. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| | * | | | | powerpc: Handle new ELFv2 module relocationsRusty Russell2014-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new ELF ABI tends to use R_PPC64_REL16_LO and R_PPC64_REL16_HA relocations (PC-relative), so implement them. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| | * | | | | powerpc: Fix up TOC. for modules.Rusty Russell2014-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel resolved the '.TOC.' to a fake symbol, so we need to fix it up to point to our .toc section plus 0x8000. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| | * | | | | powerpc: module: handle MODVERSION for .TOC.Rusty Russell2014-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For the ELFv2 ABI, powerpc introduces a magic symbol ".TOC.". If we don't create a CRC for it (minus the leading ".", since we strip that) we get a modpost warning about missing CRC and the CRC array seems to be displaced by 1 so other CRCs mismatch too. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| | * | | | | powerpc: EXPORT_SYMBOL(.TOC.)Rusty Russell2014-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For the ELFv2 ABI, powerpc introduces a magic symbol ".TOC.". depmod then complains that this doesn't resolve (so does modpost, but we could easily fix that). To export this, we need to use asm. modpost and depmod both strip "." from symbols for the old PPC64 ELFv1 ABI, so we actually export a "TOC.". Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| | * | | | | powerpc: modules implement R_PPC64_TOCSAVE relocation.Rusty Russell2014-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| | * | | | | powerpc: make module stub code endian independentRusty Russell2014-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By representing them as words, rather than chars, we can avoid endian ifdefs. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
| | * | | | | powerpc: Fix SMP issues with ppc64le ABIv2Anton Blanchard2014-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no need to put a function descriptor in __secondary_hold_spinloop. Use ppc_function_entry to get the instruction address and put it in __secondary_hold_spinloop instead. Also fix an issue where we assumed cur_cpu_spec held a function descriptor. Signed-off-by: Anton Blanchard <anton@samba.org>
| | * | | | | powerpc/tm: Fix GOT save offset for ABIv2Anton Blanchard2014-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The r2 TOC/GOT save offset is 40 on ABIv1 and 24 on ABIv2. Signed-off-by: Anton Blanchard <anton@samba.org>
| | * | | | | powerpc/tm: Use STK_PARAMAnton Blanchard2014-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Get rid of the tm specific STACK_PARAM and use STK_PARAM Signed-off-by: Anton Blanchard <anton@samba.org>
| | * | | | | powerpc: Fix kernel thread creation on ABIv2Anton Blanchard2014-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change how we setup registers for ret_from_kernel_thread. In ABIv1, instead of passing a function descriptor in, dereference it and pass the target in directly. Use ppc_global_function_entry to get it right on both ABIv1 and ABIv2. Signed-off-by: Anton Blanchard <anton@samba.org>
| | * | | | | powerpc: Ignore .TOC. relocationsAnton Blanchard2014-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The linker fixes up .TOC. relocations, so prom_init_check.sh should ignore them. Signed-off-by: Anton Blanchard <anton@samba.org>
| | * | | | | powerpc: ABIv2 function calls must place target address in r12Anton Blanchard2014-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To establish addressability quickly, ABIv2 requires the target address of the function being called to be in r12. Fix a number of places in assembly code that we do indirect function calls. We need to avoid function descriptors on ABIv2 too. Signed-off-by: Anton Blanchard <anton@samba.org>
| | * | | | | powerpc: Create DOTSYM to wrap dot symbol usageAnton Blanchard2014-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are a few places we have to use dot symbols with the current ABI - the syscall table and the kvm hcall table. Wrap both of these with a new macro called DOTSYM so it will be easy to transition away from dot symbols in a future ABI. Signed-off-by: Anton Blanchard <anton@samba.org>
| | * | | | | powerpc: Remove dot symbol usage in exception macrosAnton Blanchard2014-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | STD_EXCEPTION_COMMON, STD_EXCEPTION_COMMON_ASYNC and MASKABLE_EXCEPTION branch to the handler, so we can remove the explicit dot symbol and binutils will do the right thing. Signed-off-by: Anton Blanchard <anton@samba.org>
| | * | | | | powerpc: Remove some unnecessary uses of _GLOBAL() and _STATIC()Anton Blanchard2014-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no need to create a function descriptor for functions called locally out of assembly. Signed-off-by: Anton Blanchard <anton@samba.org>
| | * | | | | powerpc: Don't use a function descriptor for system call tableAnton Blanchard2014-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no need to create a function descriptor for the system call table. By using one we force the system call table into the text section and it really belongs in the rodata section. This also removes another use of dot symbols. Signed-off-by: Anton Blanchard <anton@samba.org>
| | * | | | | powerpc: Remove superflous function descriptors in assembly only codeAnton Blanchard2014-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have a number of places where we load the text address of a local function and indirectly branch to it in assembly. Since it is an indirect branch binutils will not know to use the function text address, so that trick wont work. There is no need for these functions to have a function descriptor so we can replace it with a label and remove the dot symbol. Signed-off-by: Anton Blanchard <anton@samba.org>
| | * | | | | powerpc: No need to use dot symbols when branching to a functionAnton Blanchard2014-04-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | binutils is smart enough to know that a branch to a function descriptor is actually a branch to the functions text address. Alan tells me that binutils has been doing this for 9 years. Signed-off-by: Anton Blanchard <anton@samba.org>
| * | | | | | powerpc: memcpy optimization for 64bit LEPhilippe Bergheaud2014-04-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unaligned stores take alignment exceptions on POWER7 running in little-endian. This is a dumb little-endian base memcpy that prevents unaligned stores. Once booted the feature fixup code switches over to the VMX copy loops (which are already endian safe). The question is what we do before that switch over. The base 64bit memcpy takes alignment exceptions on POWER7 so we can't use it as is. Fixing the causes of alignment exception would slow it down, because we'd need to ensure all loads and stores are aligned either through rotate tricks or bytewise loads and stores. Either would be bad for all other 64bit platforms. [ I simplified the loop a bit - Anton ] Signed-off-by: Philippe Bergheaud <felix@linux.vnet.ibm.com> Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | | | | | powerpc/tm: Add checking to treclaim/trechkptMichael Neuling2014-04-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we do a treclaim and we are not in TM suspend mode, it results in a TM bad thing (ie. a 0x700 program check). Similarly if we do a trechkpt and we have an active transaction or TEXASR Failure Summary (FS) is not set, we also take a TM bad thing. This should never happen, but if it does (ie. a kernel bug), the cause is almost impossible to debug as the GPR state is mostly userspace and hence we don't get a call chain. This adds some checks in these cases case a BUG_ON() (in asm) in case we ever hit these cases. It moves the register saving around to preserve r1 till later also. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | | | | | powerpc/tm: Remove unnecessary r1 saveMichael Neuling2014-04-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We save r1 to the scratch SPR and restore it from there after the trechkpt so saving r1 to the paca is not needed. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | | | | | powerpc: powernv: Framework to show the correct clock in /proc/cpuinfoGautham R. Shenoy2014-04-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the code in setup-common.c for powerpc assumes that all clock rates are same in a smp system. This value is cached in the variable named ppc_proc_freq and is the value that is reported in /proc/cpuinfo. However on the PowerNV platform, the clock rate is same only across the threads of the same core. Hence the value that is reported in /proc/cpuinfo is incorrect on PowerNV platforms. We need a better way to query and report the correct value of the processor clock in /proc/cpuinfo. The patch achieves this by creating a machdep_call named get_proc_freq() which is expected to returns the frequency in Hz. The code in show_cpuinfo() can invoke this method to display the correct clock rate on platforms that have implemented this method. On the other powerpc platforms it can use the value cached in ppc_proc_freq. Signed-off-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | | | | | powerpc/pci: Use of_pci_range_parser helper in pci_process_bridge_OF_rangesAndrew Murray2014-04-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch updates the implementation of pci_process_bridge_OF_ranges to use the of_pci_range_parser helpers. Signed-off-by: Andrew Murray <amurray@embedded-bits.co.uk> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | | | | | powerpc/legacy_serial: Support MVME5100 UARTS with shifted registersStephen Chivers2014-04-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support to legacy serial for UARTS with shifted registers. The MVME5100 Single Board Computer is a PowerPC platform that has 16550 style UARTS with register addresses that are 16 bytes apart (shifted by 4). Commit 309257484cc1a592e8ac5fbdd8cd661be2b80bf8 "powerpc: Cleanup udbg_16550 and add support for LPC PIO-only UARTs" added support to udbg_16550 for shifted registers by adding a "stride" parameter to the initialisation operations for Programmed IO and Memory Mapped IO. As a consequence it is now possible to use the services of legacy serial to provide early serial console messages for the MVME5100. An added benefit of this is that the serial console will always be "ttyS0" irrespective of whether the computer is fitted with extra PCI 8250 interface boards or not. I have tested this patch using the four PowerPC platforms available to me: MVME5100 - shifted registers, SAM440EP - unshifted registers, MPC8349 - unshifted registers, MVME4100 - unshifted registers. Signed-off-by: Stephen Chivers <schivers@csc.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | | | | | powerpc/prom: Stop scanning dev-tree for fdump earlyGavin Shan2014-04-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Function early_init_dt_scan_fw_dump() is called to scan the device tree for fdump properties under node "rtas". Any one of them is invalid, we can stop scanning the device tree early by returning "1". It would save a bit time during boot. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | | | | | powerpc/eeh: Can't recover from non-PE-reset caseGavin Shan2014-04-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When PCI_ERS_RESULT_CAN_RECOVER returned from device drivers, the EEH core should enable I/O and DMA for the affected PE. However, it was missed to have DMA enabled in eeh_handle_normal_event(). Besides, the frozen state of the affected PE should be cleared after successful recovery, but we didn't. The patch fixes both of the issues as above. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | | | | | powerpc/pci: Mask linkDown on resetting PCI busGavin Shan2014-04-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The problem was initially reported by Wendy who tried pass through IPR adapter, which was connected to PHB root port directly, to KVM based guest. When doing that, pci_reset_bridge_secondary_bus() was called by VFIO driver and linkDown was detected by the root port. That caused all PEs to be frozen. The patch fixes the issue by routing the reset for the secondary bus of root port to underly firmware. For that, one more weak function pci_reset_secondary_bus() is introduced so that the individual platforms can override that and do specific reset for bridge's secondary bus. Reported-by: Wendy Xiong <wenxiong@linux.vnet.ibm.com> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | | | | | powerpc/eeh: Make the delay for PE reset unifiedGavin Shan2014-04-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Basically, we have 3 types of resets to fulfil PE reset: fundamental, hot and PHB reset. For the later 2 cases, we need PCI bus reset hold and settlement delay as specified by PCI spec. PowerNV and pSeries platforms are running on top of different firmware and some of the delays have been covered by underly firmware (PowerNV). The patch makes the delays unified to be done in backend, instead of EEH core. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | | | | | powerpc/eeh: No hotplug on permanently removed devGavin Shan2014-04-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The issue was detected in a bit complicated test case where we have multiple hierarchical PEs shown as following figure: +-----------------+ | PE#3 p2p#0 | | p2p#1 | +-----------------+ | +-----------------+ | PE#4 pdev#0 | | pdev#1 | +-----------------+ PE#4 (have 2 PCI devices) is the child of PE#3, which has 2 p2p bridges. We accidentally had less-known scenario: PE#4 was removed permanently from the system because of permanent failure (e.g. exceeding the max allowd failure times in last hour), then we detects EEH errors on PE#3 and tried to recover it. However, eeh_dev instances for pdev#0/1 were not detached from PE#4, which was still connected to PE#3. All of that was because of the fact that we rely on count-based pcibios_release_device(), which isn't reliable enough. When doing recovery for PE#3, we still apply hotplug on PE#4 and pdev#0/1, which are not valid any more. Eventually, we run into kernel crash. The patch fixes above issue from two aspects. For unplug, we simply skip those permanently removed PE, whose state is (EEH_PE_STATE_ISOLATED && !EEH_PE_STATE_RECOVERING) and its frozen count should be greater than EEH_MAX_ALLOWED_FREEZES. For plug, we marked all permanently removed EEH devices with EEH_DEV_REMOVED and return 0xFF's on read its PCI config so that PCI core will omit them. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | | | | | powerpc/eeh: Allow to disable EEHGavin Shan2014-04-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch introduces bootarg "eeh=off" to disable EEH functinality. Also, it creates /sys/kerenl/debug/powerpc/eeh_enable to disable or enable EEH functionality. By default, we have the functionality enabled. For PowerNV platform, we will restore to have the conventional mechanism of clearing frozen PE during PCI config access if we're going to disable EEH functionality. Conversely, we will rely on EEH for error recovery. The patch also fixes the issue that we missed to cover the case of disabled EEH functionality in function ioda_eeh_event(). Those events driven by interrupt should be cleared to avoid endless reporting. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | | | | | powerpc/eeh: Cleanup EEH subsystem variablesGavin Shan2014-04-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There're 2 EEH subsystem variables: eeh_subsystem_enabled and eeh_probe_mode. We needn't maintain 2 variables and we can just have one variable and introduce different flags. The patch also introduces additional flag EEH_FORCE_DISABLE, which will be used to disable EEH subsystem via boot parameter ("eeh=off") in future. Besides, the patch also introduces flag EEH_ENABLED, which is changed to disable or enable EEH functionality on the fly through debugfs entry in future. With the patch applied, the creteria to check the enabled EEH functionality is changed to: !EEH_FORCE_DISABLED && EEH_ENABLED : Enabled Other cases : Disabled Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | | | | | powerpc/eeh: Use cached capability for log dumpGavin Shan2014-04-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When calling into eeh_gather_pci_data() on pSeries platform, we possiblly don't have pci_dev instance yet, but eeh_dev is always ready. So we use cached capability from eeh_dev instead of pci_dev for log dump there. In order to keep things unified, we also cache PCI capability positions to eeh_dev for PowerNV as well. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | | | | | powerpc/eeh: Cleanup eeh_gather_pci_data()Gavin Shan2014-04-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch replaces printk(KERN_WARNING ...) with pr_warn() in the function eeh_gather_pci_data(). Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | | | | | powerpc/eeh: Avoid I/O access during PE resetGavin Shan2014-04-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have suffered recrusive frozen PE a lot, which was caused by IO accesses during the PE reset. Ben came up with the good idea to keep frozen PE until recovery (BAR restore) gets done. With that, IO accesses during PE reset are dropped by hardware and wouldn't incur the recrusive frozen PE any more. The patch implements the idea. We don't clear the frozen state until PE reset is done completely. During the period, the EEH core expects unfrozen state from backend to keep going. So we have to reuse EEH_PE_RESET flag, which has been set during PE reset, to return normal state from backend. The side effect is we have to clear frozen state for towice (PE reset and clear it explicitly), but that's harmless. We have some limitations on pHyp. pHyp doesn't allow to enable IO or DMA for unfrozen PE. So we don't enable them on unfrozen PE in eeh_pci_enable(). We have to enable IO before grabbing logs on pHyp. Otherwise, 0xFF's is always returned from PCI config space. Also, we had wrong return value from eeh_pci_enable() for EEH_OPT_THAW_DMA case. The patch fixes it too. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | | | | | powerpc/eeh: Block PCI-CFG access during PE resetGavin Shan2014-04-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We've observed multiple PE reset failures because of PCI-CFG access during that period. Potentially, some device drivers can't support EEH very well and they can't put the device to motionless state before PE reset. So those device drivers might produce PCI-CFG accesses during PE reset. Also, we could have PCI-CFG access from user space (e.g. "lspci"). Since access to frozen PE should return 0xFF's, we can block PCI-CFG access during the period of PE reset so that we won't get recrusive EEH errors. The patch adds flag EEH_PE_RESET, which is kept during PE reset. The PowerNV/pSeries PCI-CFG accessors reuse the flag to block PCI-CFG accordingly. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | | | | | powerpc/eeh: EEH_PE_ISOLATED not reflect HW stateGavin Shan2014-04-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When doing PE reset, EEH_PE_ISOLATED is cleared unconditionally. However, We should remove that if the PE reset has cleared the frozen state successfully. Otherwise, the flag should be kept. The patch fixes the issue. Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | | | | | powerpc/eeh: Remove EEH_PE_PHB_DEADGavin Shan2014-04-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The PE state (for eeh_pe instance) EEH_PE_PHB_DEAD is duplicate to EEH_PE_ISOLATED. Originally, those PHBs (PHB PE) with EEH_PE_PHB_DEAD would be removed from the system. However, it's safe to replace that with EEH_PE_ISOLATED. The patch also clear EEH_PE_RECOVERING after fenced PHB has been handled, either failure or success. It makes the PHB PE state consistent with: PHB functions normally NONE PHB has been removed EEH_PE_ISOLATED PHB fenced, recovery in progress EEH_PE_ISOLATED | RECOVERING Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>