aboutsummaryrefslogtreecommitdiffstats
path: root/arch/powerpc/kernel
Commit message (Collapse)AuthorAge
...
| * | powerpc: Add option to use jump label for cpu_has_feature()Kevin Hao2016-07-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We do binary patching of asm code using CPU features, which is a one-time operation, done during early boot. However checks of CPU features in C code are currently done at run time, even though the set of CPU features can never change after boot. We can optimise this by using jump labels to implement cpu_has_feature(), meaning checks in C code are binary patched into a single nop or branch. For a C sequence along the lines of: if (cpu_has_feature(FOO)) return 2; The generated code before is roughly: ld r9,-27640(r2) ld r9,0(r9) lwz r9,32(r9) cmpwi cr7,r9,0 bge cr7, 1f li r3,2 blr 1: ... After (true): nop li r3,2 blr After (false): b 1f li r3,2 blr 1: ... mpe: Rename MAX_CPU_FEATURES as we already have a #define with that name, and define it simply as a constant, rather than doing tricks with sizeof and NULL pointers. Rename the array to cpu_feature_keys. Use the kconfig we added to guard it. Add BUILD_BUG_ON() if the feature is not a compile time constant. Rewrite the change log. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | powerpc: Move cpu_has_feature() to a separate fileKevin Hao2016-07-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We plan to use jump label for cpu_has_feature(). In order to implement this we need to include the linux/jump_label.h in asm/cputable.h. Unfortunately if we do that it leads to an include loop. The root of the problem seems to be that reg.h needs cputable.h (for CPU_FTRs), and then cputable.h via jump_label.h eventually pulls in hw_irq.h which needs reg.h (for MSR_EE). So move cpu_has_feature() to a separate file on its own. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> [mpe: Rename to cpu_has_feature.h and flesh out change log] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | powerpc/mm: Convert early cpu/mmu feature check to use the new helpersAneesh Kumar K.V2016-07-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This switches early feature checks to use the non static key variant of the function. In later patches we will be switching cpu_has_feature() and mmu_has_feature() to use static keys and we can use them only after static key/jump label is initialized. Any check for feature before jump label init should be done using this new helper. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | powerpc/mm: Make MMU_FTR_RADIX a MMU family featureAneesh Kumar K.V2016-07-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MMU feature bits are defined such that we use the lower half to present MMU family features. Remove the strict split of half and also move Radix to a mmu family feature. Radix introduce a new MMU model and strictly speaking it is a new MMU family. This also free up bits which can be used for individual features later. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | powerpc/64: Do feature patching before MMU initMichael Ellerman2016-07-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Up until now we needed to do the MMU init before feature patching, because part of the MMU init was scanning the device tree and setting and/or clearing some MMU feature bits. Now that we have split that MMU feature modification out into routines called from early_init_devtree() (called earlier) we can now do feature patching before calling MMU init. The advantage of this is it means the remainder of the MMU init runs with the final set of features which will apply for the rest of the life of the system. This means we don't have to special case anything called from MMU init to deal with a changing set of feature bits. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | powerpc/mm: Move disable_radix handling into mmu_early_init_devtree()Michael Ellerman2016-07-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the handling of the disable_radix command line argument into the newly created mmu_early_init_devtree(). It's an MMU option so it's preferable to have it in an mm related file, and it also means platforms that don't support radix don't have to carry the code. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | powerpc/mm: Add mmu_early_init_devtree()Michael Ellerman2016-07-31
| | | | | | | | | | | | | | | | | | Empty for now, but we'll add to it in the next patch. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | | dma-mapping: use unsigned long for dma_attrsKrzysztof Kozlowski2016-08-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The dma-mapping core and the implementations do not change the DMA attributes passed by pointer. Thus the pointer can point to const data. However the attributes do not have to be a bitfield. Instead unsigned long will do fine: 1. This is just simpler. Both in terms of reading the code and setting attributes. Instead of initializing local attributes on the stack and passing pointer to it to dma_set_attr(), just set the bits. 2. It brings safeness and checking for const correctness because the attributes are passed by value. Semantic patches for this change (at least most of them): virtual patch virtual context @r@ identifier f, attrs; @@ f(..., - struct dma_attrs *attrs + unsigned long attrs , ...) { ... } @@ identifier r.f; @@ f(..., - NULL + 0 ) and // Options: --all-includes virtual patch virtual context @r@ identifier f, attrs; type t; @@ t f(..., struct dma_attrs *attrs); @@ identifier r.f; @@ f(..., - NULL + 0 ) Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Acked-by: Vineet Gupta <vgupta@synopsys.com> Acked-by: Robin Murphy <robin.murphy@arm.com> Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> Acked-by: Mark Salter <msalter@redhat.com> [c6x] Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris] Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm] Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Acked-by: Joerg Roedel <jroedel@suse.de> [iommu] Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp] Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core] Acked-by: David Vrabel <david.vrabel@citrix.com> [xen] Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb] Acked-by: Joerg Roedel <jroedel@suse.de> [iommu] Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon] Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k] Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390] Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32] Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc] Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge tag 'pci-v4.8-changes' of ↵Linus Torvalds2016-08-02
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "Highlights: - ARM64 support for ACPI host bridges - new drivers for Axis ARTPEC-6 and Marvell Aardvark - new pci_alloc_irq_vectors() interface for MSI-X, MSI, legacy INTx - pci_resource_to_user() cleanup (more to come) Detailed summary: Enumeration: - Move ecam.h to linux/include/pci-ecam.h (Jayachandran C) - Add parent device field to ECAM struct pci_config_window (Jayachandran C) - Add generic MCFG table handling (Tomasz Nowicki) - Refactor pci_bus_assign_domain_nr() for CONFIG_PCI_DOMAINS_GENERIC (Tomasz Nowicki) - Factor DT-specific pci_bus_find_domain_nr() code out (Tomasz Nowicki) Resource management: - Add devm_request_pci_bus_resources() (Bjorn Helgaas) - Unify pci_resource_to_user() declarations (Bjorn Helgaas) - Implement pci_resource_to_user() with pcibios_resource_to_bus() (microblaze, powerpc, sparc) (Bjorn Helgaas) - Request host bridge window resources (designware, iproc, rcar, xgene, xilinx, xilinx-nwl) (Bjorn Helgaas) - Make PCI I/O space optional on ARM32 (Bjorn Helgaas) - Ignore write combining when mapping I/O port space (Bjorn Helgaas) - Claim bus resources on MIPS PCI_PROBE_ONLY set-ups (Bjorn Helgaas) - Remove unicore32 pci=firmware command line parameter handling (Bjorn Helgaas) - Support I/O resources when parsing host bridge resources (Jayachandran C) - Add helpers to request/release memory and I/O regions (Johannes Thumshirn) - Use pci_(request|release)_mem_regions (NVMe, lpfc, GenWQE, ethernet/intel, alx) (Johannes Thumshirn) - Extend pci=resource_alignment to specify device/vendor IDs (Koehrer Mathias (ETAS/ESW5)) - Add generic pci_bus_claim_resources() (Lorenzo Pieralisi) - Claim bus resources on ARM32 PCI_PROBE_ONLY set-ups (Lorenzo Pieralisi) - Remove ARM32 and ARM64 arch-specific pcibios_enable_device() (Lorenzo Pieralisi) - Add pci_unmap_iospace() to unmap I/O resources (Sinan Kaya) - Remove powerpc __pci_mmap_set_pgprot() (Yinghai Lu) PCI device hotplug: - Allow additional bus numbers for hotplug bridges (Keith Busch) - Ignore interrupts during D3cold (Lukas Wunner) Power management: - Enforce type casting for pci_power_t (Andy Shevchenko) - Don't clear d3cold_allowed for PCIe ports (Mika Westerberg) - Put PCIe ports into D3 during suspend (Mika Westerberg) - Power on bridges before scanning new devices (Mika Westerberg) - Runtime resume bridge before rescan (Mika Westerberg) - Add runtime PM support for PCIe ports (Mika Westerberg) - Remove redundant check of pcie_set_clkpm (Shawn Lin) Virtualization: - Add function 1 DMA alias quirk for Marvell 88SE9182 (Aaron Sierra) - Add DMA alias quirk for Adaptec 3805 (Alex Williamson) - Mark Atheros AR9485 and QCA9882 to avoid bus reset (Chris Blake) - Add ACS quirk for Solarflare SFC9220 (Edward Cree) MSI: - Fix PCI_MSI dependencies (Arnd Bergmann) - Add pci_msix_desc_addr() helper (Christoph Hellwig) - Switch msix_program_entries() to use pci_msix_desc_addr() (Christoph Hellwig) - Make the "entries" argument to pci_enable_msix() optional (Christoph Hellwig) - Provide sensible IRQ vector alloc/free routines (Christoph Hellwig) - Spread interrupt vectors in pci_alloc_irq_vectors() (Christoph Hellwig) Error Handling: - Bind DPC to Root Ports as well as Downstream Ports (Keith Busch) - Remove DPC tristate module option (Keith Busch) - Convert Downstream Port Containment driver to use devm_* functions (Mika Westerberg) Generic host bridge driver: - Select IRQ_DOMAIN (Arnd Bergmann) - Claim bus resources on PCI_PROBE_ONLY set-ups (Lorenzo Pieralisi) ACPI host bridge driver: - Add ARM64 acpi_pci_bus_find_domain_nr() (Tomasz Nowicki) - Add ARM64 ACPI support for legacy IRQs parsing and consolidation with DT code (Tomasz Nowicki) - Implement ARM64 AML accessors for PCI_Config region (Tomasz Nowicki) - Support ARM64 ACPI-based PCI host controller (Tomasz Nowicki) Altera host bridge driver: - Check link status before retrain link (Ley Foon Tan) - Poll for link up status after retraining the link (Ley Foon Tan) Axis ARTPEC-6 host bridge driver: - Add PCI_MSI_IRQ_DOMAIN dependency (Arnd Bergmann) - Add DT binding for Axis ARTPEC-6 PCIe controller (Niklas Cassel) - Add Axis ARTPEC-6 PCIe controller driver (Niklas Cassel) Intel VMD host bridge driver: - Use lock save/restore in interrupt enable path (Jon Derrick) - Select device dma ops to override (Keith Busch) - Initialize list item in IRQ disable (Keith Busch) - Use x86_vector_domain as parent domain (Keith Busch) - Separate MSI and MSI-X vector sharing (Keith Busch) Marvell Aardvark host bridge driver: - Add DT binding for the Aardvark PCIe controller (Thomas Petazzoni) - Add Aardvark PCI host controller driver (Thomas Petazzoni) - Add Aardvark PCIe support for Armada 3700 (Thomas Petazzoni) Microsoft Hyper-V host bridge driver: - Fix interrupt cleanup path (Cathy Avery) - Don't leak buffer in hv_pci_onchannelcallback() (Vitaly Kuznetsov) - Handle all pending messages in hv_pci_onchannelcallback() (Vitaly Kuznetsov) NVIDIA Tegra host bridge driver: - Program PADS_REFCLK_CFG* always, not just on legacy SoCs (Stephen Warren) - Program PADS_REFCLK_CFG* registers with per-SoC values (Stephen Warren) - Use lower-case hex consistently for register definitions (Thierry Reding) - Use generic pci_remap_iospace() rather than ARM32-specific one (Thierry Reding) - Stop setting pcibios_min_mem (Thierry Reding) Renesas R-Car host bridge driver: - Drop gen2 dummy I/O port region (Bjorn Helgaas) TI DRA7xx host bridge driver: - Fix return value in case of error (Christophe JAILLET) Xilinx AXI host bridge driver: - Fix return value in case of error (Christophe JAILLET) Miscellaneous: - Make bus_attr_resource_alignment static (Ben Dooks) - Include <asm/dma.h> for isa_dma_bridge_buggy (Ben Dooks) - MAINTAINERS: Add file patterns for PCI device tree bindings (Geert Uytterhoeven) - Make host bridge drivers explicitly non-modular (Paul Gortmaker)" * tag 'pci-v4.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (125 commits) PCI: xgene: Make explicitly non-modular PCI: thunder-pem: Make explicitly non-modular PCI: thunder-ecam: Make explicitly non-modular PCI: tegra: Make explicitly non-modular PCI: rcar-gen2: Make explicitly non-modular PCI: rcar: Make explicitly non-modular PCI: mvebu: Make explicitly non-modular PCI: layerscape: Make explicitly non-modular PCI: keystone: Make explicitly non-modular PCI: hisi: Make explicitly non-modular PCI: generic: Make explicitly non-modular PCI: designware-plat: Make it explicitly non-modular PCI: artpec6: Make explicitly non-modular PCI: armada8k: Make explicitly non-modular PCI: artpec: Add PCI_MSI_IRQ_DOMAIN dependency PCI: Add ACS quirk for Solarflare SFC9220 arm64: dts: marvell: Add Aardvark PCIe support for Armada 3700 PCI: aardvark: Add Aardvark PCI host controller driver dt-bindings: add DT binding for the Aardvark PCIe controller PCI: tegra: Program PADS_REFCLK_CFG* registers with per-SoC values ...
| * \ \ Merge branch 'pci/msi-affinity' into nextBjorn Helgaas2016-08-01
| |\ \ \ | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/nvme/host/pci.c
| * | | | powerpc/pci: Implement pci_resource_to_user() with pcibios_resource_to_bus()Bjorn Helgaas2016-06-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "User" addresses are shown in /sys/devices/pci.../.../resource and /proc/bus/pci/devices and used as mmap offsets for /proc/bus/pci/BB/DD.F files. For I/O port resources on powerpc, these are PCI bus addresses, i.e., raw BAR values. Previously pci_resource_to_user() computed the user address by subtracting "hose->io_base_virt - _IO_BASE" from the resource start: pci_resource_to_user() if (IO) offset = (unsigned long)hose->io_base_virt - _IO_BASE; *start = rsrc->start - offset; We've already told the PCI core about that "hose->io_base_virt - _IO_BASE" offset: pcibios_setup_phb_resources() res = &hose->io_resource; offset = pcibios_io_space_offset(); /* i.e., "offset = hose->io_base_virt - _IO_BASE" */ pci_add_resource_offset(resources, res, offset); so pcibios_resource_to_bus() knows how to do that translation. No functional change intended. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Yinghai Lu <yinghai@kernel.org>
| * | | | powerpc/pci: Remove __pci_mmap_set_pgprot()Yinghai Lu2016-06-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The powerpc-specific __pci_mmap_set_pgprot() does two things: 1) Disables write combining for I/O port space mappings This only affects procfs mappings. The pci_mmap_resource() sysfs path only requests write combining for resources with IORESOURCE_PREFETCH set, which doesn't include I/O resources. The only way to request write combining for I/O port space mappings was via the PCIIOC_WRITE_COMBINE ioctl and the proc_bus_pci_mmap() path, and we recently changed that path to ignore write combining for I/O, so this code in powerpc is no longer needed. 2) Automatically enables write combining for mappings of prefetchable resources, even if not requested by the user Both procfs (via PCIIOC_MMAP_IS_MEM and PCIIOC_WRITE_COMBINE ioctls) and sysfs (via "resourceN_wc" files, which are created for resources with IORESOURCE_PREFETCH) provide ways for the user to map PCI memory space with write combining. Users that desire write combining should use one of those ways instead of relying on powerpc-specific behavior. Remove the powerpc-specific __pci_mmap_set_pgprot(). The user-visible effect of this change is that powerpc users mapping prefetchable PCI memory space via procfs without PCIIOC_WRITE_COMBINE or via sysfs "resourceN" (not "resourceN_wc") will get regular uncacheable mappings instead of the write combining mappings they used to get. The new behavior matches the behavior on all other arches that support write combining mapping. [bhelgaas: changelog] Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* | | | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2016-08-02
|\ \ \ \ \ | |_|_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM updates from Paolo Bonzini: - ARM: GICv3 ITS emulation and various fixes. Removal of the old VGIC implementation. - s390: support for trapping software breakpoints, nested virtualization (vSIE), the STHYI opcode, initial extensions for CPU model support. - MIPS: support for MIPS64 hosts (32-bit guests only) and lots of cleanups, preliminary to this and the upcoming support for hardware virtualization extensions. - x86: support for execute-only mappings in nested EPT; reduced vmexit latency for TSC deadline timer (by about 30%) on Intel hosts; support for more than 255 vCPUs. - PPC: bugfixes. * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (302 commits) KVM: PPC: Introduce KVM_CAP_PPC_HTM MIPS: Select HAVE_KVM for MIPS64_R{2,6} MIPS: KVM: Reset CP0_PageMask during host TLB flush MIPS: KVM: Fix ptr->int cast via KVM_GUEST_KSEGX() MIPS: KVM: Sign extend MFC0/RDHWR results MIPS: KVM: Fix 64-bit big endian dynamic translation MIPS: KVM: Fail if ebase doesn't fit in CP0_EBase MIPS: KVM: Use 64-bit CP0_EBase when appropriate MIPS: KVM: Set CP0_Status.KX on MIPS64 MIPS: KVM: Make entry code MIPS64 friendly MIPS: KVM: Use kmap instead of CKSEG0ADDR() MIPS: KVM: Use virt_to_phys() to get commpage PFN MIPS: Fix definition of KSEGX() for 64-bit KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD kvm: x86: nVMX: maintain internal copy of current VMCS KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures KVM: arm64: vgic-its: Simplify MAPI error handling KVM: arm64: vgic-its: Make vgic_its_cmd_handle_mapi similar to other handlers KVM: arm64: vgic-its: Turn device_id validation into generic ID validation ...
| * | | | KVM: PPC: Book3S HV: Fix TB corruption in guest exit path on HMI interruptMahesh Salgaonkar2016-06-20
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a guest is assigned to a core it converts the host Timebase (TB) into guest TB by adding guest timebase offset before entering into guest. During guest exit it restores the guest TB to host TB. This means under certain conditions (Guest migration) host TB and guest TB can differ. When we get an HMI for TB related issues the opal HMI handler would try fixing errors and restore the correct host TB value. With no guest running, we don't have any issues. But with guest running on the core we run into TB corruption issues. If we get an HMI while in the guest, the current HMI handler invokes opal hmi handler before forcing guest to exit. The guest exit path subtracts the guest TB offset from the current TB value which may have already been restored with host value by opal hmi handler. This leads to incorrect host and guest TB values. With split-core, things become more complex. With split-core, TB also gets split and each subcore gets its own TB register. When a hmi handler fixes a TB error and restores the TB value, it affects all the TB values of sibling subcores on the same core. On TB errors all the thread in the core gets HMI. With existing code, the individual threads call opal hmi handle independently which can easily throw TB out of sync if we have guest running on subcores. Hence we will need to co-ordinate with all the threads before making opal hmi handler call followed by TB resync. This patch introduces a sibling subcore state structure (shared by all threads in the core) in paca which holds information about whether sibling subcores are in Guest mode or host mode. An array in_guest[] of size MAX_SUBCORE_PER_CORE=4 is used to maintain the state of each subcore. The subcore id is used as index into in_guest[] array. Only primary thread entering/exiting the guest is responsible to set/unset its designated array element. On TB error, we get HMI interrupt on every thread on the core. Upon HMI, this patch will now force guest to vacate the core/subcore. Primary thread from each subcore will then turn off its respective bit from the above bitmap during the guest exit path just after the guest->host partition switch is complete. All other threads that have just exited the guest OR were already in host will wait until all other subcores clears their respective bit. Once all the subcores turn off their respective bit, all threads will will make call to opal hmi handler. It is not necessary that opal hmi handler would resync the TB value for every HMI interrupts. It would do so only for the HMI caused due to TB errors. For rest, it would not touch TB value. Hence to make things simpler, primary thread would call TB resync explicitly once for each core immediately after opal hmi handler instead of subtracting guest offset from TB. TB resync call will restore the TB with host value. Thus we can be sure about the TB state. One of the primary threads exiting the guest will take up the responsibility of calling TB resync. It will use one of the top bits (bit 63) from subcore state flags bitmap to make the decision. The first primary thread (among the subcores) that is able to set the bit will have to call the TB resync. Rest all other threads will wait until TB resync is complete. Once TB resync is complete all threads will then proceed. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
* | | | Merge tag 'powerpc-4.8-1' of ↵Linus Torvalds2016-07-31
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Highlights: - PowerNV PCI hotplug support. - Lots more Power9 support. - eBPF JIT support on ppc64le. - Lots of cxl updates. - Boot code consolidation. Bug fixes: - Fix spin_unlock_wait() from Boqun Feng - Fix stack pointer corruption in __tm_recheckpoint() from Michael Neuling - Fix multiple bugs in memory_hotplug_max() from Bharata B Rao - mm: Ensure "special" zones are empty from Oliver O'Halloran - ftrace: Separate the heuristics for checking call sites from Michael Ellerman - modules: Never restore r2 for a mprofile-kernel style mcount() call from Michael Ellerman - Fix endianness when reading TCEs from Alexey Kardashevskiy - start rtasd before PCI probing from Greg Kurz - PCI: rpaphp: Fix slot registration for multiple slots under a PHB from Tyrel Datwyler - powerpc/mm: Add memory barrier in __hugepte_alloc() from Sukadev Bhattiprolu Cleanups & fixes: - Drop support for MPIC in pseries from Rashmica Gupta - Define and use PPC64_ELF_ABI_v2/v1 from Michael Ellerman - Remove unused symbols in asm-offsets.c from Rashmica Gupta - Fix SRIOV not building without EEH enabled from Russell Currey - Remove kretprobe_trampoline_holder from Thiago Jung Bauermann - Reduce log level of PCI I/O space warning from Benjamin Herrenschmidt - Add array bounds checking to crash_shutdown_handlers from Suraj Jitindar Singh - Avoid -maltivec when using clang integrated assembler from Anton Blanchard - Fix array overrun in ppc_rtas() syscall from Andrew Donnellan - Fix error return value in cmm_mem_going_offline() from Rasmus Villemoes - export cpu_to_core_id() from Mauricio Faria de Oliveira - Remove old symbols from defconfigs from Andrew Donnellan - Update obsolete comments in setup_32.c about entry conditions from Benjamin Herrenschmidt - Add comment explaining the purpose of setup_kdump_trampoline() from Benjamin Herrenschmidt - Merge the RELOCATABLE config entries for ppc32 and ppc64 from Kevin Hao - Remove RELOCATABLE_PPC32 from Kevin Hao - Fix .long's in tlb-radix.c to more meaningful from Balbir Singh Minor cleanups & fixes: - Andrew Donnellan, Anna-Maria Gleixner, Anton Blanchard, Benjamin Herrenschmidt, Bharata B Rao, Christophe Leroy, Colin Ian King, Geliang Tang, Greg Kurz, Madhavan Srinivasan, Michael Ellerman, Michael Ellerman, Stephen Rothwell, Stewart Smith. Freescale updates from Scott: - "Highlights include more 8xx optimizations, device tree updates, and MVME7100 support." PowerNV PCI hotplug from Gavin Shan: - PCI: Add pcibios_setup_bridge() - Override pcibios_setup_bridge() - Remove PCI_RESET_DELAY_US - Move pnv_pci_ioda_setup_opal_tce_kill() around - Increase PE# capacity - Allocate PE# in reverse order - Create PEs in pcibios_setup_bridge() - Setup PE for root bus - Extend PCI bridge resources - Make pnv_ioda_deconfigure_pe() visible - Dynamically release PE - Update bridge windows on PCI plug - Delay populating pdn - Support PCI slot ID - Use PCI slot reset infrastructure - Introduce pnv_pci_get_slot_id() - Functions to get/set PCI slot state - PCI/hotplug: PowerPC PowerNV PCI hotplug driver - Print correct PHB type names Power9 idle support from Shreyas B. Prabhu: - set power_save func after the idle states are initialized - Use PNV_THREAD_WINKLE macro while requesting for winkle - make hypervisor state restore a function - Rename idle_power7.S to idle_book3s.S - Rename reusable idle functions to hardware agnostic names - Make pnv_powersave_common more generic - abstraction for saving SPRs before entering deep idle states - Add platform support for stop instruction - cpuidle/powernv: Use CPUIDLE_STATE_MAX instead of MAX_POWERNV_IDLE_STATES - cpuidle/powernv: cleanup cpuidle-powernv.c - cpuidle/powernv: Add support for POWER ISA v3 idle states - Use deepest stop state when cpu is offlined Power9 PMU from Madhavan Srinivasan: - factor out power8 pmu macros and defines - factor out power8 pmu functions - factor out power8 __init_pmu code - Add power9 event list macros for generic and cache events - Power9 PMU support - Export Power9 generic and cache events to sysfs Power9 preliminary interrupt & PCI support from Benjamin Herrenschmidt: - Add XICS emulation APIs - Move a few exception common handlers to make room - Add support for HV virtualization interrupts - Add mechanism to force a replay of interrupts - Add ICP OPAL backend - Discover IODA3 PHBs - pci: Remove obsolete SW invalidate - opal: Add real mode call wrappers - Rename TCE invalidation calls - Remove SWINV constants and obsolete TCE code - Rework accessing the TCE invalidate register - Fallback to OPAL for TCE invalidations - Use the device-tree to get available range of M64's - Check status of a PHB before using it - pci: Don't try to allocate resources that will be reassigned Other Power9: - Send SIGBUS on unaligned copy and paste from Chris Smart - Large Decrementer support from Oliver O'Halloran - Load Monitor Register Support from Jack Miller Performance improvements from Anton Blanchard: - Avoid load hit store in __giveup_fpu() and __giveup_altivec() - Avoid load hit store in setup_sigcontext() - Remove assembly versions of strcpy, strcat, strlen and strcmp - Align hot loops of some string functions eBPF JIT from Naveen N. Rao: - Fix/enhance 32-bit Load Immediate implementation - Optimize 64-bit Immediate loads - Introduce rotate immediate instructions - A few cleanups - Isolate classic BPF JIT specifics into a separate header - Implement JIT compiler for extended BPF Operator Panel driver from Suraj Jitindar Singh: - devicetree/bindings: Add binding for operator panel on FSP machines - Add inline function to get rc from an ASYNC_COMP opal_msg - Add driver for operator panel on FSP machines Sparse fixes from Daniel Axtens: - make some things static - Introduce asm-prototypes.h - Include headers containing prototypes - Use #ifdef __BIG_ENDIAN__ #else for REG_BYTE - kvm: Clarify __user annotations - Pass endianness to sparse - Make ppc_md.{halt, restart} __noreturn MM fixes & cleanups from Aneesh Kumar K.V: - radix: Update LPCR HR bit as per ISA - use _raw variant of page table accessors - Compile out radix related functions if RADIX_MMU is disabled - Clear top 16 bits of va only on older cpus - Print formation regarding the the MMU mode - hash: Update SDR1 size encoding as documented in ISA 3.0 - radix: Update PID switch sequence - radix: Update machine call back to support new HCALL. - radix: Add LPID based tlb flush helpers - radix: Add a kernel command line to disable radix - Cleanup LPCR defines Boot code consolidation from Benjamin Herrenschmidt: - Move epapr_paravirt_early_init() to early_init_devtree() - cell: Don't use flat device-tree after boot - ge_imp3a: Don't use the flat device-tree after boot - mpc85xx_ds: Don't use the flat device-tree after boot - mpc85xx_rdb: Don't use the flat device-tree after boot - Don't test for machine type in rtas_initialize() - Don't test for machine type in smp_setup_cpu_maps() - dt: Add of_device_compatible_match() - Factor do_feature_fixup calls - Move 64-bit feature fixup earlier - Move 64-bit memory reserves to setup_arch() - Use a cachable DART - Move FW feature probing out of pseries probe() - Put exception configuration in a common place - Remove early allocation of the SMU command buffer - Move MMU backend selection out of platform code - pasemi: Remove IOBMAP allocation from platform probe() - mm/hash: Don't use machine_is() early during boot - Don't test for machine type to detect HEA special case - pmac: Remove spurrious machine type test - Move hash table ops to a separate structure - Ensure that ppc_md is empty before probing for machine type - Move 64-bit probe_machine() to later in the boot process - Move 32-bit probe() machine to later in the boot process - Get rid of ppc_md.init_early() - Move the boot time info banner to a separate function - Move setting of {i,d}cache_bsize to initialize_cache_info() - Move the content of setup_system() to setup_arch() - Move cache info inits to a separate function - Re-order the call to smp_setup_cpu_maps() - Re-order setup_panic() - Make a few boot functions __init - Merge 32-bit and 64-bit setup_arch() Other new features: - tty/hvc: Use IRQF_SHARED for OPAL hvc consoles from Sam Mendoza-Jonas - tty/hvc: Use opal irqchip interface if available from Sam Mendoza-Jonas - powerpc: Add module autoloading based on CPU features from Alastair D'Silva - crypto: vmx - Convert to CPU feature based module autoloading from Alastair D'Silva - Wake up kopald polling thread before waiting for events from Benjamin Herrenschmidt - xmon: Dump ISA 2.06 SPRs from Michael Ellerman - xmon: Dump ISA 2.07 SPRs from Michael Ellerman - Add a parameter to disable 1TB segs from Oliver O'Halloran - powerpc/boot: Add OPAL console to epapr wrappers from Oliver O'Halloran - Assign fixed PHB number based on device-tree properties from Guilherme G. Piccoli - pseries: Add pseries hotplug workqueue from John Allen - pseries: Add support for hotplug interrupt source from John Allen - pseries: Use kernel hotplug queue for PowerVM hotplug events from John Allen - pseries: Move property cloning into its own routine from Nathan Fontenot - pseries: Dynamic add entires to associativity lookup array from Nathan Fontenot - pseries: Auto-online hotplugged memory from Nathan Fontenot - pseries: Remove call to memblock_add() from Nathan Fontenot cxl: - Add set and get private data to context struct from Michael Neuling - make base more explicitly non-modular from Paul Gortmaker - Use for_each_compatible_node() macro from Wei Yongjun - Frederic Barrat - Abstract the differences between the PSL and XSL - Make vPHB device node match adapter's - Philippe Bergheaud - Add mechanism for delivering AFU driver specific events - Ignore CAPI adapters misplaced in switched slots - Refine slice error debug messages - Andrew Donnellan - static-ify variables to fix sparse warnings - PCI/hotplug: pnv_php: export symbols and move struct types needed by cxl - PCI/hotplug: pnv_php: handle OPAL_PCI_SLOT_OFFLINE power state - Add cxl_check_and_switch_mode() API to switch bi-modal cards - remove dead Kconfig options - fix potential NULL dereference in free_adapter() - Ian Munsie - Update process element after allocating interrupts - Add support for CAPP DMA mode - Fix allowing bogus AFU descriptors with 0 maximum processes - Fix allocating a minimum of 2 pages for the SPA - Fix bug where AFU disable operation had no effect - Workaround XSL bug that does not clear the RA bit after a reset - Fix NULL pointer dereference on kernel contexts with no AFU interrupts - powerpc/powernv: Split cxl code out into a separate file - Add cxl_slot_is_supported API - Enable bus mastering for devices using CAPP DMA mode - Move cxl_afu_get / cxl_afu_put to base - Allow a default context to be associated with an external pci_dev - Do not create vPHB if there are no AFU configuration records - powerpc/powernv: Add support for the cxl kernel api on the real phb - Add support for using the kernel API with a real PHB - Add kernel APIs to get & set the max irqs per context - Add preliminary workaround for CX4 interrupt limitation - Add support for interrupts on the Mellanox CX4 - Workaround PE=0 hardware limitation in Mellanox CX4 - powerpc/powernv: Fix pci-cxl.c build when CONFIG_MODULES=n selftests: - Test unaligned copy and paste from Chris Smart - Load Monitor Register Tests from Jack Miller - Cyril Bur - exec() with suspended transaction - Use signed long to read perf_event_paranoid - Fix usage message in context_switch - Fix generation of vector instructions/types in context_switch - Michael Ellerman - Use "Delta" rather than "Error" in normal output - Import Anton's mmap & futex micro benchmarks - Add a test for PROT_SAO" * tag 'powerpc-4.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (263 commits) powerpc/mm: Parenthesise IS_ENABLED() in if condition tty/hvc: Use opal irqchip interface if available tty/hvc: Use IRQF_SHARED for OPAL hvc consoles selftests/powerpc: exec() with suspended transaction powerpc: Improve comment explaining why we modify VRSAVE powerpc/mm: Drop unused externs for hpte_init_beat[_v3]() powerpc/mm: Rename hpte_init_lpar() and move the fallback to a header powerpc/mm: Fix build break when PPC_NATIVE=n crypto: vmx - Convert to CPU feature based module autoloading powerpc: Add module autoloading based on CPU features powerpc/powernv/ioda: Fix endianness when reading TCEs powerpc/mm: Add memory barrier in __hugepte_alloc() powerpc/modules: Never restore r2 for a mprofile-kernel style mcount() call powerpc/ftrace: Separate the heuristics for checking call sites powerpc: Merge 32-bit and 64-bit setup_arch() powerpc/64: Make a few boot functions __init powerpc: Re-order setup_panic() powerpc: Re-order the call to smp_setup_cpu_maps() powerpc/32: Move cache info inits to a separate function powerpc/64: Move the content of setup_system() to setup_arch() ...
| * \ \ \ Merge branch 'next' of ↵Michael Ellerman2016-07-29
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next Freescale updates from Scott: "Highlights include more 8xx optimizations, device tree updates, and MVME7100 support."
| | * | | | powerpc/8xx: Force VIRT_IMMR_BASE to be a positive numberScott Wood2016-07-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The asm-offsets mechanism generates signed numbers, even if the input value is explicitly unsigned. This causes a problem with older binutils (e.g. 2.23), which sign-extend a negative number when @h is applied. Thus, this instruction: cmpli cr0, r11, VIRT_IMMR_BASE@h resulted in this: Error: operand out of range (0xfffffff0 is not between 0x00000000 and 0x0000ffff) By casting to a larger type, we can force the output to be expressed as a positive number. Signed-off-by: Scott Wood <oss@buserror.net> Cc: Christophe Leroy <christophe.leroy@c-s.fr>
| | * | | | powerpc/8xx: add CONFIG_PIN_TLB_IMMRChristophe Leroy2016-07-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CONFIG_PIN_TLB maps IMMR area and the first 24 Mbytes of memory. In some circunstances it might be more interesting to not map IMMR but map 32 Mbytes of memory instead. Therefore we add config option CONFIG_PIN_TLB_IMMR to select if IMMR shall be pinned or not, hence whether we pin 24 or 32 Mbytes of RAM Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
| | * | | | powerpc/8xx: Rework CONFIG_PIN_TLB handlingChristophe Leroy2016-07-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On recent kernels, with some debug options like for instance CONFIG_LOCKDEP, the BSS requires more than 8M memory, allthough the kernel code fits in the first 8M. Today, it is necessary to activate CONFIG_PIN_TLB to get more than 8M at startup, allthough pinning TLB is not necessary for that. We could have inconditionaly mapped 16 or 24M bytes at startup but some old hardware only have 8M and mapping non-existing RAM would be an issue due to speculative accesses. With the preceding patch however, the TLB entries are populated on demand. By setting up the TLB miss handler to handle up to 24M until the handler is patched for the entire memory space, it is possible to allow access up to more memory without mapping non-existing RAM. It is therefore not needed anymore to map memory data at all at startup. It will be handled by the TLB miss handler. One might still want to PIN the IMMR and the first 24M of RAM. It is now possible to do it in the C memory initialisation functions. In addition, we now know how much memory we have when we do it, so we are able to adapt the pining to the real amount of memory available. So boards with less than 24M can now also benefit from PIN_TLB. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
| | * | | | powerpc/8xx: Don't use page table for linear memory spaceChristophe Leroy2016-07-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of using the first level page table to define mappings for the linear memory space, we can use direct mapping from the TLB handling routines. This has several advantages: * No need to read the tables at each TLB miss * No issue in 16k pages mode where the 1st level table maps 64 Mbytes The size of the available linear space is known at system startup. In order to avoid data access at each TLB miss to know the memory size, the TLB routine is patched at startup with the proper size This patch provides a 10%-15% improvment of TLB miss handling for kernel addresses Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
| | * | | | powerpc/8xx: unpin all TLBs before flushingChristophe Leroy2016-07-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Bootloader may have pinned some TLB entries so the kernel must unpin them before flushing TLBs with tlbia otherwise pinned TLB entries won't get flushed Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
| | * | | | powerpc/8xx: Map IMMR area with 512k page at a fixed addressChristophe Leroy2016-07-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Once the linear memory space has been mapped with 8Mb pages, as seen in the related commit, we get 11 millions DTLB missed during the reference 600s period. 77% of the misses are on user addresses and 23% are on kernel addresses (1 fourth for linear address space and 3 fourth for virtual address space) Traditionaly, each driver manages one computer board which has its own components with its own memory maps. But on embedded chips like the MPC8xx, the SOC has all registers located in the same IO area. When looking at ioremaps done during startup, we see that many drivers are re-mapping small parts of the IMMR for their own use and all those small pieces gets their own 4k page, amplifying the number of TLB misses: in our system we get 0xff000000 mapped 31 times and 0xff003000 mapped 9 times. Even if each part of IMMR was mapped only once with 4k pages, it would still be several small mappings towards linear area. This patch maps the IMMR with a single 512k page. With this patch applied, the number of DTLB misses during the 10 min period is reduced to 11.8 millions for a duration of 5.8s, which represents 2% of the non-idle time hence yet another 10% reduction. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
| | * | | | powerpc/8xx: Fix vaddr for IMMR early remapChristophe Leroy2016-07-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Memory: 124428K/131072K available (3748K kernel code, 188K rwdata, 648K rodata, 508K init, 290K bss, 6644K reserved) Kernel virtual memory layout: * 0xfffdf000..0xfffff000 : fixmap * 0xfde00000..0xfe000000 : consistent mem * 0xfddf6000..0xfde00000 : early ioremap * 0xc9000000..0xfddf6000 : vmalloc & ioremap SLUB: HWalign=16, Order=0-3, MinObjects=0, CPUs=1, Nodes=1 Today, IMMR is mapped 1:1 at startup Mapping IMMR 1:1 is just wrong because it may overlap with another area. On most mpc8xx boards it is OK as IMMR is set to 0xff000000 but for instance on EP88xC board, IMMR is at 0xfa200000 which overlaps with VM ioremap area This patch fixes the virtual address for remapping IMMR with the fixmap regardless of the value of IMMR. The size of IMMR area is 256kbytes (CPM at offset 0, security engine at offset 128k) so a 512k page is enough Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
| | * | | | powerpc32: provide VIRT_CPU_ACCOUNTINGChristophe Leroy2016-07-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch provides VIRT_CPU_ACCOUTING to PPC32 architecture. PPC32 doesn't have the PACA structure, so we use the task_info structure to store the accounting data. In order to reuse on PPC32 the PPC64 functions, all u64 data has been replaced by 'unsigned long' so that it is u32 on PPC32 and u64 on PPC64 Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
| * | | | | powerpc: Improve comment explaining why we modify VRSAVEAnton Blanchard2016-07-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The comment explaining why we modify VRSAVE is misleading, glibc does rely on the behaviour. Update the comment. Signed-off-by: Anton Blanchard <anton@samba.org> Reviewed-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc/modules: Never restore r2 for a mprofile-kernel style mcount() callMichael Ellerman2016-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the module loader we process relocations, and for long jumps we generate trampolines (aka stubs). At the call site for one of these trampolines we usually need to generate a load instruction to restore the TOC pointer into r2. There is one exception however, which is calls to mcount() using the mprofile-kernel ABI, they handle the TOC inside the stub, and so for them we do not generate a TOC load. The bug is in how the code in restore_r2() decides if it needs to generate the TOC load. It does so by looking for a nop following the branch, and if it sees a nop, it replaces it with the load. In general the compiler has no reason to generate a nop following the mcount() call and so that check works OK. However if we combine a jump label at the start of a function, with an early return, such that GCC applies the shrink-wrapping optimisation, we can then end up with an mcount call followed immediately by a nop. However the nop is not there for a TOC load, it is for the jump label. That confuses restore_r2() into replacing the jump label nop with a TOC load, which in turn confuses ftrace into replacing the mcount call with a b +8 (fixed in the previous commit). The end result is we jump over the jump label, which if it was supposed to return means we incorrectly run the body of the function. We have seen this in practice with some yet-to-be-merged patches that use jump labels more extensively. The fix is relatively simple, in restore_r2() we check for an mprofile-kernel style mcount() call first, before looking for the presence of a nop. Fixes: 153086644fd1 ("powerpc/ftrace: Add support for -mprofile-kernel ftrace ABI") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc/ftrace: Separate the heuristics for checking call sitesMichael Ellerman2016-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In __ftrace_make_nop() (the 64-bit version), we have code to deal with two ftrace ABIs. There is the original ABI, which looks mostly like a function call, and then the mprofile-kernel ABI which is just a branch. The code tries to handle both cases, by looking for the presence of a load to restore the TOC pointer (PPC_INST_LD_TOC). If we detect the TOC load, we assume the call site is for an mcount() call using the old ABI. That means we patch the mcount() call with a b +8, to branch over the TOC load. However if the kernel was built with mprofile-kernel, then there will never be a call site using the original ftrace ABI. If for some reason we do see a TOC load, then it's there for a good reason, and we should not jump over it. So split the code, using the existing CC_USING_MPROFILE_KERNEL. Kernels built with mprofile-kernel will only look for, and expect, the new ABI, and similarly for the original ABI. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc: Merge 32-bit and 64-bit setup_arch()Benjamin Herrenschmidt2016-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is little enough differences now. mpe: Add a/p/k/setup.h to contain the prototypes and empty versions of functions we need, rather than using weak functions. Add a few other empty versions to avoid as many #ifdefs as possible in the code. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc/64: Make a few boot functions __initBenjamin Herrenschmidt2016-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc: Re-order setup_panic()Benjamin Herrenschmidt2016-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Do it right after probe_machine() since it's about testing ppc_md, and put the test in the common code. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc: Re-order the call to smp_setup_cpu_maps()Benjamin Herrenschmidt2016-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It makes more sense to do it before intializing xmon() as xmon might use the info in there. We do want to register the console early though in case we want some functioning printk's in the cpu map setup. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc/32: Move cache info inits to a separate functionBenjamin Herrenschmidt2016-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Matches 64-bit. Also move the call to the same spot as ppc64 Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc/64: Move the content of setup_system() to setup_arch()Benjamin Herrenschmidt2016-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | And kill setup_system(). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc/64: Move setting of {i,d}cache_bsize to initialize_cache_info()Benjamin Herrenschmidt2016-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Also remove the completely osbolete comment. We *do* look in the device-tree. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc/64: Move the boot time info banner to a separate functionBenjamin Herrenschmidt2016-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc: Get rid of ppc_md.init_early()Benjamin Herrenschmidt2016-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is now called right after platform probe, so the probe function can just do the job. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc: Move 32-bit probe() machine to later in the boot processBenjamin Herrenschmidt2016-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This converts all the 32-bit platforms to use the expanded device-tree which is a pretty mechanical change. Unlike 64-bit, the 32-bit kernel didn't rely on platform initializations to setup the MMU since it sets it up entirely before probe_machine() so the move has comparatively less consequences though it's a bigger patch. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc/64: Move 64-bit probe_machine() to later in the boot processBenjamin Herrenschmidt2016-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We no long need the machine type that early, so we can move probe_machine() to after the device-tree has been expanded. This will allow further consolidation. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc: Ensure that ppc_md is empty before probing for machine typeBenjamin Herrenschmidt2016-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Anything in there will be overwritten, so it helps catching nasty bugs if we check that it's indeed full of NULL's before we do so. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc/mm: Move hash table ops to a separate structureBenjamin Herrenschmidt2016-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Moving probe_machine() to after mmu init will cause the ppc_md fields relative to the hash table management to be overwritten. Since we have essentially disconnected the machine type from the hash backend ops, finish the job by moving them to a different structure. The only callback that didn't quite fix is update_partition_table since this is not specific to hash, so I moved it to a standalone variable for now. We can revisit later if needed. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Fix ppc64e build failure in kexec] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc/64: Move MMU backend selection out of platform codeBenjamin Herrenschmidt2016-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We move it into early_mmu_init() based on firmware features. For PS3, we have to move the setting of these into early_init_devtree(). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc: Put exception configuration in a common placeBenjamin Herrenschmidt2016-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The various calls to establish exception endianness and AIL are now done from a single point using already established CPU and FW feature bits to decide what to do. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc: Move FW feature probing out of pseries probe()Benjamin Herrenschmidt2016-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We move the function itself to pseries/firmware.c and call it along with almost all other flat device-tree parsers from early_init_devtree() Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> [mpe: Move #ifdefs into the header by providing pseries_probe_fw_features()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc: Move 64-bit memory reserves to setup_arch()Benjamin Herrenschmidt2016-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is really no need to do them that early, early_setup() runs before MMU is on, we should do the strict minimum there to get the MMU going. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc: Move 64-bit feature fixup earlierBenjamin Herrenschmidt2016-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make it part of early_setup() as we really want the feature fixups to be applied before we turn on the MMU since they can have an impact on the various assembly path related to MMU management and interrupts. This makes 64-bit match what 32-bit does. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc: Factor do_feature_fixup callsBenjamin Herrenschmidt2016-07-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 32 and 64-bit do a similar set of calls early on, we move it all to a single common function to make the boot code more readable. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc/32: Remove RELOCATABLE_PPC32Kevin Hao2016-07-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is seldom used in the kernel code and can be easily replaced by either RELOCATABLE or PPC32. So there is no reason to keep a separate kernel option for this. Signed-off-by: Kevin Hao <haokexin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc/mm/radix: Add a kernel command line to disable radixAneesh Kumar K.V2016-07-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the kernel command line disable_radix which disable the radix MMU mode even if firmware indicates radix support via ibm,pa-features device tree node. This helps in testing different MMU mode easily. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc/mm: Clear top 16 bits of va only on older cpusAneesh Kumar K.V2016-07-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As per ISA, we need to do this only for architecture version 2.02 and earlier. This continued to work even for 2.07. But let's not do this for anything after 2.02. ISA 3.0 requires these top bits to be not cleared. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc/pci: Don't try to allocate resources that will be reassignedBenjamin Herrenschmidt2016-07-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we know we will reassign all resources, trying (and failing) to allocate them initially is fairly pointless and leads to a lot of scary messages in the kernel log Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>