aboutsummaryrefslogtreecommitdiffstats
path: root/arch/powerpc/include
Commit message (Collapse)AuthorAge
...
* | | | KVM: Use simple waitqueue for vcpu->wqMarcelo Tosatti2016-02-25
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The problem: On -rt, an emulated LAPIC timer instances has the following path: 1) hard interrupt 2) ksoftirqd is scheduled 3) ksoftirqd wakes up vcpu thread 4) vcpu thread is scheduled This extra context switch introduces unnecessary latency in the LAPIC path for a KVM guest. The solution: Allow waking up vcpu thread from hardirq context, thus avoiding the need for ksoftirqd to be scheduled. Normal waitqueues make use of spinlocks, which on -RT are sleepable locks. Therefore, waking up a waitqueue waiter involves locking a sleeping lock, which is not allowed from hard interrupt context. cyclictest command line: This patch reduces the average latency in my tests from 14us to 11us. Daniel writes: Paolo asked for numbers from kvm-unit-tests/tscdeadline_latency benchmark on mainline. The test was run 1000 times on tip/sched/core 4.4.0-rc8-01134-g0905f04: ./x86-run x86/tscdeadline_latency.flat -cpu host with idle=poll. The test seems not to deliver really stable numbers though most of them are smaller. Paolo write: "Anything above ~10000 cycles means that the host went to C1 or lower---the number means more or less nothing in that case. The mean shows an improvement indeed." Before: min max mean std count 1000.000000 1000.000000 1000.000000 1000.000000 mean 5162.596000 2019270.084000 5824.491541 20681.645558 std 75.431231 622607.723969 89.575700 6492.272062 min 4466.000000 23928.000000 5537.926500 585.864966 25% 5163.000000 1613252.750000 5790.132275 16683.745433 50% 5175.000000 2281919.000000 5834.654000 23151.990026 75% 5190.000000 2382865.750000 5861.412950 24148.206168 max 5228.000000 4175158.000000 6254.827300 46481.048691 After min max mean std count 1000.000000 1000.00000 1000.000000 1000.000000 mean 5143.511000 2076886.10300 5813.312474 21207.357565 std 77.668322 610413.09583 86.541500 6331.915127 min 4427.000000 25103.00000 5529.756600 559.187707 25% 5148.000000 1691272.75000 5784.889825 17473.518244 50% 5160.000000 2308328.50000 5832.025000 23464.837068 75% 5172.000000 2393037.75000 5853.177675 24223.969976 max 5222.000000 3922458.00000 6186.720500 42520.379830 [Patch was originaly based on the swait implementation found in the -rt tree. Daniel ported it to mainline's version and gathered the benchmark numbers for tscdeadline_latency test.] Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: linux-rt-users@vger.kernel.org Cc: Boqun Feng <boqun.feng@gmail.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1455871601-27484-4-git-send-email-wagi@monom.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | Merge tag 'powerpc-4.5-2' of ↵Linus Torvalds2016-01-29
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Wire up copy_file_range() syscall from Chandan Rajendra - Simplify module TOC handling from Alan Modra - Remove newly added extra definition of pmd_dirty from Stephen Rothwell - Allow user space to map rtas_rmo_buf from Vasant Hegde - Fix PE location code from Gavin Shan - Remove PPMU_HAS_SSLOT flag for Power8 from Madhavan Srinivasan - Fixup _HPAGE_CHG_MASK from Aneesh Kumar K.V * tag 'powerpc-4.5-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/mm: Fixup _HPAGE_CHG_MASK powerpc/perf: Remove PPMU_HAS_SSLOT flag for Power8 powerpc/eeh: Fix PE location code powerpc/mm: Allow user space to map rtas_rmo_buf powerpc: Remove newly added extra definition of pmd_dirty powerpc: Simplify module TOC handling powerpc: Wire up copy_file_range() syscall
| * | powerpc/mm: Fixup _HPAGE_CHG_MASKAneesh Kumar K.V2016-01-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This was wrongly updated by commit 7aa9a23c69ea ("powerpc, thp: remove infrastructure for handling splitting PMDs") during the last merge window. Fix it up. This could lead to incorrect behaviour in THP and/or mprotect(), at a minimum. Fixes: 7aa9a23c69ea ("powerpc, thp: remove infrastructure for handling splitting PMDs") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | powerpc: Remove newly added extra definition of pmd_dirtyStephen Rothwell2016-01-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit d5d6a443b243 ("arch/powerpc/include/asm/pgtable-ppc64.h: add pmd_[dirty|mkclean] for THP") added a new identical definition of pmd_dirty(). Remove it again. Cc: Minchan Kim <minchan@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | powerpc: Wire up copy_file_range() syscallChandan Rajendra2016-01-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | Test runs on a ppc64 BE guest succeeded using modified fstests. Also tested on ppc64 LE using a home made test - mpe. Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* | | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2016-01-27
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM fixes from Paolo Bonzini: "s390 and POWER bug fixes, plus enabling the KVM-VFIO interface on s390" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM doc: Fix KVM_SMI chapter number KVM: s390: fix memory overwrites when vx is disabled KVM: s390: Enable the KVM-VFIO device KVM: s390: fix guest fprs memory leak KVM: PPC: Fix ONE_REG AltiVec support KVM: PPC: Increase memslots to 512 KVM: PPC: Book3S PR: Remove unused variable 'vcpu_book3s' KVM: PPC: Fix emulation of H_SET_DABR/X on POWER8 KVM: PPC: Book3S HV: Handle unexpected traps in guest entry/exit code better
| * | Merge tag 'kvm-s390-master-4.5-1' of ↵Paolo Bonzini2016-01-26
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: Fixes for kvm/master (targeting 4.5) 1. Fallout of some bigger floating point/vector rework in s390 - memory leak -> stable 4.3+ - memory overwrite -> stable 4.4+ 2. enable KVM-VFIO for s390
| * \ \ Merge branch 'kvm-ppc-next' of ↵Paolo Bonzini2016-01-15
| |\ \ \ | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into HEAD
| | * | | KVM: PPC: Increase memslots to 512Thomas Huth2015-12-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Only using 32 memslots for KVM on powerpc is way too low, you can nowadays hit this limit quite fast by adding a couple of PCI devices and/or pluggable memory DIMMs to the guest. x86 already increased the KVM_USER_MEM_SLOTS to 509, to satisfy 256 pluggable DIMM slots, 3 private slots and 253 slots for other things like PCI devices (i.e. resulting in 256 + 3 + 253 = 512 slots in total). We should do something similar for powerpc, and since we do not use private slots here, we can set the value to 512 directly. While we're at it, also remove the KVM_MEM_SLOTS_NUM definition from the powerpc-specific header since this gets defined in the generic kvm_host.h header anyway. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Paul Mackerras <paulus@samba.org>
* | | | | dma-mapping: always provide the dma_map_ops based implementationChristoph Hellwig2016-01-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the generic implementation to <linux/dma-mapping.h> now that all architectures support it and remove the HAVE_DMA_ATTR Kconfig symbol now that everyone supports them. [valentinrothberg@gmail.com: remove leftovers in Kconfig] Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Cc: Chris Metcalf <cmetcalf@ezchip.com> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Helge Deller <deller@gmx.de> Cc: James Hogan <james.hogan@imgtec.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: Ley Foon Tan <lftan@altera.com> Cc: Mark Salter <msalter@redhat.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Steven Miao <realmz6@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Valentin Rothberg <valentinrothberg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | powerpc/fadump: rename cpu_online_mask member of struct fadump_crash_info_headerRasmus Villemoes2016-01-20
| |_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The four cpumasks cpu_{possible,online,present,active}_bits are exposed readonly via the corresponding const variables cpu_xyz_mask. But they are also accessible for arbitrary writing via the exposed functions set_cpu_xyz. There's quite a bit of code throughout the kernel which iterates over or otherwise accesses these bitmaps, and having the access go via the cpu_xyz_mask variables is nowadays [1] simply a useless indirection. It may be that any problem in CS can be solved by an extra level of indirection, but that doesn't mean every extra indirection solves a problem. In this case, it even necessitates some minor ugliness (see 4/6). Patch 1/6 is new in v2, and fixes a build failure on ppc by renaming a struct member, to avoid problems when the identifier cpu_online_mask becomes a macro later in the series. The next four patches eliminate the cpu_xyz_mask variables by simply exposing the actual bitmaps, after renaming them to discourage direct access - that still happens through cpu_xyz_mask, which are now simply macros with the same type and value as they used to have. After that, there's no longer any reason to have the setter functions be out-of-line: The boolean parameter is almost always a literal true or false, so by making them static inlines they will usually compile to one or two instructions. For a defconfig build on x86_64, bloat-o-meter says we save ~3000 bytes. We also save a little stack (stackdelta says 127 functions have a 16 byte smaller stack frame, while two grow by that amount). Mostly because, when iterating over the mask, gcc typically loads the value of cpu_xyz_mask into a callee-saved register and from there into %rdi before each find_next_bit call - now it can just load the appropriate immediate address into %rdi before each call. [1] See Rusty's kind explanation http://thread.gmane.org/gmane.linux.kernel/2047078/focus=2047722 for some historic context. This patch (of 6): As preparation for eliminating the indirect access to the various global cpu_*_bits bitmaps via the pointer variables cpu_*_mask, rename the cpu_online_mask member of struct fadump_crash_info_header to simply online_mask, thus allowing cpu_online_mask to become a macro. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Acked-by: Michael Ellerman <mpe@ellerman.id.au> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2016-01-18
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull virtio barrier rework+fixes from Michael Tsirkin: "This adds a new kind of barrier, and reworks virtio and xen to use it. Plus some fixes here and there" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (44 commits) checkpatch: add virt barriers checkpatch: check for __smp outside barrier.h checkpatch.pl: add missing memory barriers virtio: make find_vqs() checkpatch.pl-friendly virtio_balloon: fix race between migration and ballooning virtio_balloon: fix race by fill and leak s390: more efficient smp barriers s390: use generic memory barriers xen/events: use virt_xxx barriers xen/io: use virt_xxx barriers xenbus: use virt_xxx barriers virtio_ring: use virt_store_mb sh: move xchg_cmpxchg to a header by itself sh: support 1 and 2 byte xchg virtio_ring: update weak barriers to use virt_xxx Revert "virtio_ring: Update weak barriers to use dma_wmb/rmb" asm-generic: implement virt_xxx memory barriers x86: define __smp_xxx xtensa: define __smp_xxx tile: define __smp_xxx ...
| * | | | powerpc: define __smp_xxxMichael S. Tsirkin2016-01-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This defines __smp_xxx barriers for powerpc for use by virtualization. smp_xxx barriers are removed as they are defined correctly by asm-generic/barriers.h This reduces the amount of arch-specific boiler-plate code. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Boqun Feng <boqun.feng@gmail.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| * | | | powerpc: reuse asm-generic/barrier.hMichael S. Tsirkin2016-01-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On powerpc read_barrier_depends, smp_read_barrier_depends smp_store_mb(), smp_mb__before_atomic and smp_mb__after_atomic match the asm-generic variants exactly. Drop the local definitions and pull in asm-generic/barrier.h instead. This is in preparation to refactoring this code area. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| * | | | lcoking/barriers, arch: Use smp barriers in smp_store_release()Davidlohr Bueso2016-01-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With commit b92b8b35a2e ("locking/arch: Rename set_mb() to smp_store_mb()") it was made clear that the context of this call (and thus set_mb) is strictly for CPU ordering, as opposed to IO. As such all archs should use the smp variant of mb(), respecting the semantics and saving a mandatory barrier on UP. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <linux-arch@vger.kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: dave@stgolabs.net Link: http://lkml.kernel.org/r/1445975631-17047-3-git-send-email-dave@stgolabs.net Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* | | | | kvm: rename pfn_t to kvm_pfn_tDan Williams2016-01-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To date, we have implemented two I/O usage models for persistent memory, PMEM (a persistent "ram disk") and DAX (mmap persistent memory into userspace). This series adds a third, DAX-GUP, that allows DAX mappings to be the target of direct-i/o. It allows userspace to coordinate DMA/RDMA from/to persistent memory. The implementation leverages the ZONE_DEVICE mm-zone that went into 4.3-rc1 (also discussed at kernel summit) to flag pages that are owned and dynamically mapped by a device driver. The pmem driver, after mapping a persistent memory range into the system memmap via devm_memremap_pages(), arranges for DAX to distinguish pfn-only versus page-backed pmem-pfns via flags in the new pfn_t type. The DAX code, upon seeing a PFN_DEV+PFN_MAP flagged pfn, flags the resulting pte(s) inserted into the process page tables with a new _PAGE_DEVMAP flag. Later, when get_user_pages() is walking ptes it keys off _PAGE_DEVMAP to pin the device hosting the page range active. Finally, get_page() and put_page() are modified to take references against the device driver established page mapping. Finally, this need for "struct page" for persistent memory requires memory capacity to store the memmap array. Given the memmap array for a large pool of persistent may exhaust available DRAM introduce a mechanism to allocate the memmap from persistent memory. The new "struct vmem_altmap *" parameter to devm_memremap_pages() enables arch_add_memory() to use reserved pmem capacity rather than the page allocator. This patch (of 18): The core has developed a need for a "pfn_t" type [1]. Move the existing pfn_t in KVM to kvm_pfn_t [2]. [1]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002199.html [2]: https://lists.01.org/pipermail/linux-nvdimm/2015-September/002218.html Signed-off-by: Dan Williams <dan.j.williams@intel.com> Acked-by: Christoffer Dall <christoffer.dall@linaro.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | arch/powerpc/include/asm/pgtable-ppc64.h: add pmd_[dirty|mkclean] for THPMinchan Kim2016-01-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MADV_FREE needs pmd_dirty and pmd_mkclean for detecting recent overwrite of the contents since MADV_FREE syscall is called for THP page. This patch adds pmd_dirty and pmd_mkclean for THP page MADV_FREE support. Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Shaohua Li <shli@kernel.org> Cc: <yalin.wang2010@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Gang <gang.chen.5i5j@gmail.com> Cc: Chris Zankel <chris@zankel.net> Cc: Daniel Micay <danielmicay@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: David S. Miller <davem@davemloft.net> Cc: Helge Deller <deller@gmx.de> Cc: Hugh Dickins <hughd@google.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Jason Evans <je@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mika Penttil <mika.penttila@nextfour.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Rik van Riel <riel@redhat.com> Cc: Roland Dreier <roland@kernel.org> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Shaohua Li <shli@kernel.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | powerpc, thp: remove infrastructure for handling splitting PMDsKirill A. Shutemov2016-01-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With new refcounting we don't need to mark PMDs splitting. Let's drop code to handle this. pmdp_splitting_flush() is not needed too: on splitting PMD we will do pmdp_clear_flush() + set_pte_at(). pmdp_clear_flush() will do IPI as needed for fast_gup. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Jerome Marchand <jmarchan@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Steve Capper <steve.capper@linaro.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Merge tag 'powerpc-4.5-1' of ↵Linus Torvalds2016-01-15
|\ \ \ \ \ | |_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Core: - Ground work for the new Power9 MMU from Aneesh Kumar K.V - Optimise FP/VMX/VSX context switching from Anton Blanchard Misc: - Various cleanups from Krzysztof Kozlowski, John Ogness, Rashmica Gupta, Russell Currey, Gavin Shan, Daniel Axtens, Michael Neuling, Andrew Donnellan - Allow wrapper to work on non-english system from Laurent Vivier - Add rN aliases to the pt_regs_offset table from Rashmica Gupta - Fix module autoload for rackmeter & axonram drivers from Luis de Bethencourt - Include KVM guest test in all interrupt vectors from Paul Mackerras - Fix DSCR inheritance over fork() from Anton Blanchard - Make value-returning atomics & {cmp}xchg* & their atomic_ versions fully ordered from Boqun Feng - Print MSR TM bits in oops messages from Michael Neuling - Add TM signal return & invalid stack selftests from Michael Neuling - Limit EPOW reset event warnings from Vipin K Parashar - Remove the Cell QPACE code from Rashmica Gupta - Append linux_banner to exception information in xmon from Rashmica Gupta - Add selftest to check if VSRs are corrupted from Rashmica Gupta - Remove broken GregorianDay() from Daniel Axtens - Import Anton's context_switch2 benchmark into selftests from Michael Ellerman - Add selftest script to test HMI functionality from Daniel Axtens - Remove obsolete OPAL v2 support from Stewart Smith - Make enter_rtas() private from Michael Ellerman - PPR exception cleanups from Michael Ellerman - Add page soft dirty tracking from Laurent Dufour - Add support for Nvlink NPUs from Alistair Popple - Add support for kexec on 476fpe from Alistair Popple - Enable kernel CPU dlpar from sysfs from Nathan Fontenot - Copy only required pieces of the mm_context_t to the paca from Michael Neuling - Add a kmsg_dumper that flushes OPAL console output on panic from Russell Currey - Implement save_stack_trace_regs() to enable kprobe stack tracing from Steven Rostedt - Add HWCAP bits for Power9 from Michael Ellerman - Fix _PAGE_PTE breaking swapoff from Aneesh Kumar K.V - Fix _PAGE_SWP_SOFT_DIRTY breaking swapoff from Hugh Dickins - scripts/recordmcount.pl: support data in text section on powerpc from Ulrich Weigand - Handle R_PPC64_ENTRY relocations in modules from Ulrich Weigand cxl: - cxl: Fix possible idr warning when contexts are released from Vaibhav Jain - cxl: use correct operator when writing pcie config space values from Andrew Donnellan - cxl: Fix DSI misses when the context owning task exits from Vaibhav Jain - cxl: fix build for GCC 4.6.x from Brian Norris - cxl: use -Werror only with CONFIG_PPC_WERROR from Brian Norris - cxl: Enable PCI device ID for future IBM CXL adapter from Uma Krishnan Freescale: - Freescale updates from Scott: Highlights include moving QE code out of arch/powerpc (to be shared with arm), device tree updates, and minor fixes" * tag 'powerpc-4.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (149 commits) powerpc/module: Handle R_PPC64_ENTRY relocations scripts/recordmcount.pl: support data in text section on powerpc powerpc/powernv: Fix OPAL_CONSOLE_FLUSH prototype and usages powerpc/mm: fix _PAGE_SWP_SOFT_DIRTY breaking swapoff powerpc/mm: Fix _PAGE_PTE breaking swapoff cxl: Enable PCI device ID for future IBM CXL adapter cxl: use -Werror only with CONFIG_PPC_WERROR cxl: fix build for GCC 4.6.x powerpc: Add HWCAP bits for Power9 powerpc/powernv: Reserve PE#0 on NPU powerpc/powernv: Change NPU PE# assignment powerpc/powernv: Fix update of NVLink DMA mask powerpc/powernv: Remove misleading comment in pci.c powerpc: Implement save_stack_trace_regs() to enable kprobe stack tracing powerpc: Fix build break due to paca mm_context_t changes cxl: Fix DSI misses when the context owning task exits MAINTAINERS: Update Scott Wood's e-mail address powerpc/powernv: Fix minor off-by-one error in opal_mce_check_early_recovery() powerpc: Fix style of self-test config prompts powerpc/powernv: Only delay opal_rtc_read() retry when necessary ...
| * | | | Merge branch 'next' of ↵Michael Ellerman2016-01-13
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux into next Freescale updates from Scott: "Highlights include moving QE code out of arch/powerpc (to be shared with arm), device tree updates, and minor fixes."
| | * | | | QE: Move QE from arch/powerpc to drivers/socZhao Qiang2015-12-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ls1 has qe and ls1 has arm cpu. move qe from arch/powerpc to drivers/soc/fsl to adapt to powerpc and arm Signed-off-by: Zhao Qiang <qiang.zhao@freescale.com> Signed-off-by: Scott Wood <scottwood@freescale.com>
| | * | | | QE/CPM: move muram management functions to qe_commonZhao Qiang2015-12-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | QE and CPM have the same muram, they use the same management functions. Now QE support both ARM and PowerPC, it is necessary to move QE to "driver/soc", so move the muram management functions from cpm_common to qe_common for preparing to move QE code to "driver/soc" Signed-off-by: Zhao Qiang <qiang.zhao@freescale.com> Signed-off-by: Scott Wood <scottwood@freescale.com>
| | * | | | CPM/QE: use genalloc to manage CPM/QE muramZhao Qiang2015-12-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use genalloc to manage CPM/QE muram instead of rheap. Signed-off-by: Zhao Qiang <qiang.zhao@freescale.com> Signed-off-by: Scott Wood <scottwood@freescale.com>
| * | | | | powerpc/module: Handle R_PPC64_ENTRY relocationsUlrich Weigand2016-01-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GCC 6 will include changes to generated code with -mcmodel=large, which is used to build kernel modules on powerpc64le. This was necessary because the large model is supposed to allow arbitrary sizes and locations of the code and data sections, but the ELFv2 global entry point prolog still made the unconditional assumption that the TOC associated with any particular function can be found within 2 GB of the function entry point: func: addis r2,r12,(.TOC.-func)@ha addi r2,r2,(.TOC.-func)@l .localentry func, .-func To remove this assumption, GCC will now generate instead this global entry point prolog sequence when using -mcmodel=large: .quad .TOC.-func func: .reloc ., R_PPC64_ENTRY ld r2, -8(r12) add r2, r2, r12 .localentry func, .-func The new .reloc triggers an optimization in the linker that will replace this new prolog with the original code (see above) if the linker determines that the distance between .TOC. and func is in range after all. Since this new relocation is now present in module object files, the kernel module loader is required to handle them too. This patch adds support for the new relocation and implements the same optimization done by the GNU linker. Cc: stable@vger.kernel.org Signed-off-by: Ulrich Weigand <ulrich.weigand@de.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc/powernv: Fix OPAL_CONSOLE_FLUSH prototype and usagesRussell Currey2016-01-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The recently added OPAL API call, OPAL_CONSOLE_FLUSH, originally took no parameters and returned nothing. The call was updated to accept the terminal number to flush, and returned various values depending on the state of the output buffer. The prototype has been updated and its usage in the OPAL kmsg dumper has been modified to support its new behaviour as an incremental flush. Signed-off-by: Russell Currey <ruscur@russell.cc> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc/mm: fix _PAGE_SWP_SOFT_DIRTY breaking swapoffHugh Dickins2016-01-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Swapoff after swapping hangs on the G5, when CONFIG_CHECKPOINT_RESTORE=y but CONFIG_MEM_SOFT_DIRTY is not set. That's because the non-zero _PAGE_SWP_SOFT_DIRTY bit, added by CONFIG_HAVE_ARCH_SOFT_DIRTY=y, is not discounted when CONFIG_MEM_SOFT_DIRTY is not set: so swap ptes cannot be recognized. (I suspect that the peculiar dependence of HAVE_ARCH_SOFT_DIRTY on CHECKPOINT_RESTORE in arch/powerpc/Kconfig comes from an incomplete attempt to solve this problem.) It's true that the relationship between CONFIG_HAVE_ARCH_SOFT_DIRTY and and CONFIG_MEM_SOFT_DIRTY is too confusing, and it's true that swapoff should be made more robust; but nevertheless, fix up the powerpc ifdefs as x86_64 and s390 (which met the same problem) have them, defining the bits as 0 if CONFIG_MEM_SOFT_DIRTY is not set. Fixes: 7207f43665b8 ("powerpc/mm: Add page soft dirty tracking") Signed-off-by: Hugh Dickins <hughd@google.com> Reviewed-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc/mm: Fix _PAGE_PTE breaking swapoffAneesh Kumar K.V2016-01-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Core kernel expects swp_entry_t to consist of only swap type and swap offset. We should not leak pte bits into swp_entry_t. This breaks swapoff which use the swap type and offset to build a swp_entry_t and later compare that to the swp_entry_t obtained from linux page table pte. Leaking pte bits into swp_entry_t breaks that comparison and results in us looping in try_to_unuse. The stack trace can be anywhere below try_to_unuse() in mm/swapfile.c, since swapoff is circling around and around that function, reading from each used swap block into a page, then trying to find where that page belongs, looking at every non-file pte of every mm that ever swapped. Fixes: 6a119eae942c ("powerpc/mm: Add a _PAGE_PTE bit") Reported-by: Hugh Dickins <hughd@google.com> Suggested-by: Hugh Dickins <hughd@google.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Hugh Dickins <hughd@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc: Add HWCAP bits for Power9Michael Ellerman2016-01-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to support Power9 we need two new HWCAP bits. We are merging these ahead of the cputable entry so that glibc can start referring to them. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc: Fix build break due to paca mm_context_t changesMichael Ellerman2016-01-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 2fc251a8dda5 ("powerpc: Copy only required pieces of the mm_context_t to the paca") broke the build for CONFIG_PPC_STD_MMU_64=y and CONFIG_PPC_MM_SLICES=n. That only happens for a kernel built with 4K pages and HUGETLB disabled, which is why we missed it. Fix it by adding a mm_ctx_user_psize member to the paca and populating it in the appropriate places. Fixes: 2fc251a8dda5 ("powerpc: Copy only required pieces of the mm_context_t to the paca") Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc/powernv: Add a kmsg_dumper that flushes console output on panicRussell Currey2015-12-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On BMC machines, console output is controlled by the OPAL firmware and is only flushed when its pollers are called. When the kernel is in a panic state, it no longer calls these pollers and thus console output does not completely flush, causing some output from the panic to be lost. Output is only actually lost when the kernel is configured to not power off or reboot after panic (i.e. CONFIG_PANIC_TIMEOUT is set to 0) since OPAL flushes the console buffer as part of its power down routines. Before this patch, however, only partial output would be printed during the timeout wait. This patch adds a new kmsg_dumper which gets called at panic time to ensure panic output is not lost. It accomplishes this by calling OPAL_CONSOLE_FLUSH in the OPAL API, and if that is not available, the pollers are called enough times to (hopefully) completely flush the buffer. The flushing mechanism will only affect output printed at and before the kmsg_dump call in kernel/panic.c:panic(). As such, the "end Kernel panic" message may still be truncated as follows: >Call Trace: >[c000000f1f603b00] [c0000000008e9458] dump_stack+0x90/0xbc (unreliable) >[c000000f1f603b30] [c0000000008e7e78] panic+0xf8/0x2c4 >[c000000f1f603bc0] [c000000000be4860] mount_block_root+0x288/0x33c >[c000000f1f603c80] [c000000000be4d14] prepare_namespace+0x1f4/0x254 >[c000000f1f603d00] [c000000000be43e8] kernel_init_freeable+0x318/0x350 >[c000000f1f603dc0] [c00000000000bd74] kernel_init+0x24/0x130 >[c000000f1f603e30] [c0000000000095b0] ret_from_kernel_thread+0x5c/0xac >---[ end Kernel panic - not This functionality is implemented as a kmsg_dumper as it seems to be the most sensible way to introduce platform-specific functionality to the panic function. Signed-off-by: Russell Currey <ruscur@russell.cc> Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc: Copy only required pieces of the mm_context_t to the pacaMichael Neuling2015-12-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we copy the whole mm_context_t to the paca but only access a few bits of it. This is wasteful of space paca and also takes quite some time in the hot path of context switching. This patch pulls in only the required bits from the mm_context_t to the paca and on context switch, copies only those. Benchmarking this (On top of Anton's recent MSR context switching changes [1]) using processes and yield shows an improvement of almost 3% on POWER8: http://ozlabs.org/~anton/junkcode/context_switch2.c ./context_switch2 --test=yield --process 0 0 1. https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-October/135700.html Signed-off-by: Michael Neuling <mikey@neuling.org> [mpe: Rename paca fields to be mm_ctx_foo rather than context_foo] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc: Add function to copy mm_context_t to the pacaMichael Neuling2015-12-19
| |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a function to copy the mm->context to the paca. This is only a basic conversion for now but will be used more extensively in the next patch. This also adds #ifdef CONFIG_PPC_BOOK3S around this code since it's not used elsewhere. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | powerpc/powernv: Add support for Nvlink NPUsAlistair Popple2015-12-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NVLink is a high speed interconnect that is used in conjunction with a PCI-E connection to create an interface between CPU and GPU that provides very high data bandwidth. A PCI-E connection to a GPU is used as the control path to initiate and report status of large data transfers sent via the NVLink. On IBM Power systems the NVLink processing unit (NPU) is similar to the existing PHB3. This patch adds support for a new NPU PHB type. DMA operations on the NPU are not supported as this patch sets the TCE translation tables to be the same as the related GPU PCIe device for each NVLink. Therefore all DMA operations are setup and controlled via the PCIe device. EEH is not presently supported for the NPU devices, although it may be added in future. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | powerpc: Add __raw_rm_writeq() functionAlistair Popple2015-12-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move __raw_rm_writeq() from platforms/powernv/pci-ioda.c to include/asm/io.h so that it can be used by other code. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | Revert "powerpc/pci: Remove unused struct pci_dn.pcidev field"Alistair Popple2015-12-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit removed the pcidev field from struct pci_dn as it was no longer in use by the kernel. However to support finding the association of Nvlink devices to GPU devices from the device-tree this field is required. This reverts commit 250c7b277c65 ("powerpc/pci: Remove unused struct pci_dn.pcidev field"). Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | powerpc/mm: Add page soft dirty trackingLaurent Dufour2015-12-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | User space checkpoint and restart tool (CRIU) needs the page's change to be soft tracked. This allows to do a pre checkpoint and then dump only touched pages. This is done by using a newly assigned PTE bit (_PAGE_SOFT_DIRTY) when the page is backed in memory, and a new _PAGE_SWP_SOFT_DIRTY bit when the page is swapped out. To introduce a new PTE _PAGE_SOFT_DIRTY bit value common to hash 4k and hash 64k pte, the bits already defined in hash-*4k.h should be shifted left by one. The _PAGE_SWP_SOFT_DIRTY bit is dynamically put after the swap type in the swap pte. A check is added to ensure that the bit is not overwritten by _PAGE_HPTEFLAGS. Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com> CC: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | powerpc/kernel: Combine vec/loc for STD_EXCEPTION_PSERIESMichael Ellerman2015-12-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The STD_EXCEPTION_PSERIES macro takes both a vector number, and a location (memory address). However both are always identical, so combine them to save repeating ourselves. This does mean an exception handler must always exist at the location in memory that matches its vector number. But that's OK because this is the "STD" macro (standard), which does exactly that. We have other macros for the other cases, eg. STD_EXCEPTION_PSERIES_OOL (out of line). Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | powerpc/kernel: Open code SET_DEFAULT_THREAD_PPRMichael Ellerman2015-12-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is only used in one location, open code it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | powerpc/kernel: Open code HMT_MEDIUM_LOW_HAS_PPRMichael Ellerman2015-12-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | HMT_MEDIUM_LOW_HAS_PPR is only used in once place, open code it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | powerpc/kernel: Drop HMT_MEDIUM_PPR_DISCARDMichael Ellerman2015-12-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | HMT_MEDIUM_PPR_DISCARD is a macro which is present at the start of most of our first level exception handlers. It conditionally executes a HMT_MEDIUM instruction, which sets the processor priority to medium. On on modern systems, ie. Power7 and later, it is nop'ed out at boot. All it does is make the exception vectors more cramped, and consume 4 bytes of icache. On old systems it has the effect of boosting the processor priority at the start of exception processing. If we were previously in the idle loop for example, we may be at low or very low priority. This is desirable as we want to process the exception as fast as possible. However looking closely at the generated code, we see that in all cases we execute another HMT_MEDIUM just four instructions later. With code patching applied, the final code on an old (Power6) system will look like, eg: c000000000000300 <data_access_pSeries>: c000000000000300: 7c 42 13 78 mr r2,r2 <- c000000000000304: 7d b2 43 a6 mtsprg 2,r13 c000000000000308: 7d b1 42 a6 mfsprg r13,1 c00000000000030c: f9 2d 00 80 std r9,128(r13) c000000000000310: 60 00 00 00 nop c000000000000314: 7c 42 13 78 mr r2,r2 <- So I suggest that the added code complexity of HMT_MEDIUM_PPR_DISCARD is not justified by the benefit of boosting the processor priority for the duration of four instructions, and therefore we drop it. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | powerpc/rtas: Make enter_rtas() privateMichael Ellerman2015-12-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are no longer any users of enter_rtas() outside of rtas.c, so make it "private", by moving the declaration inside rtas.c. Hopefully this will encourage people to use one of the wrappers which takes the sharp edges off the RTAS calling sequence. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | powerpc/rtas: Add rtas_call_unlocked()Michael Ellerman2015-12-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most users of RTAS (Run-Time Abstraction Services) use rtas_call(), which deals with locking as well as endian handling. However we have two users outside of rtas.c that can't use rtas_call() because they have different locking requirements. The hotplug CPU code can't take the RTAS lock because the CPU would go offline with the lock held and no other CPUs would be able to call RTAS until the CPU came back online. The xmon code doesn't want to take the lock because it would risk dead locking when we are trying to recover from a crash. Both sites required multiple patches when we added little endian support, proving that programmers can't do endian right. Although that ship has sailed, we can still clean the code up by providing an unlocked version of rtas_call() which avoids the need to open code the logic elsewhere. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | powerpc/powernv: remove FW_FEATURE_OPALv3 and just use FW_FEATURE_OPALStewart Smith2015-12-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Long ago, only in the lab, there was OPALv1 and OPALv2. Now there is just OPALv3, with nobody ever expecting anything on pre-OPALv3 to be cared about or supported by mainline kernels. So, let's remove FW_FEATURE_OPALv3 and instead use FW_FEATURE_OPAL exclusively. Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | powerpc/powernv: Remove OPALv2 firmware define and referencesStewart Smith2015-12-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | OPALv2 only ever existed in the lab and didn't escape to the world. All OPAL systems in the wild are OPALv3. The probability of there being an OPALv2 system still powered on anywhere inside IBM is approximately zero, let alone anyone expecting to run mainline kernels. So, start to remove references to OPALv2. Signed-off-by: Stewart Smith <stewart@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | powerpc: Remove broken GregorianDay()Daniel Axtens2015-12-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GregorianDay() is supposed to calculate the day of the week (tm->tm_wday) for a given day/month/year. In that calcuation it indexed into an array called MonthOffset using tm->tm_mon-1. However tm_mon is zero-based, not one-based, so this is off-by-one. It also means that every January, GregoiranDay() will access element -1 of the MonthOffset array. It also doesn't appear to be a correct algorithm either: see in contrast kernel/time/timeconv.c's time_to_tm function. It's been broken forever, which suggests no-one in userland uses this. It looks like no-one in the kernel uses tm->tm_wday either (see e.g. drivers/rtc/rtc-ds1305.c:319). tm->tm_wday is conventionally set to -1 when not available in hardware so we can simply set it to -1 and drop the function. (There are over a dozen other drivers in drivers/rtc that do this.) Found using UBSAN. Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Andrew Morton <akpm@linux-foundation.org> # as an example of what UBSan finds. Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com> Cc: rtc-linux@googlegroups.com Signed-off-by: Daniel Axtens <dja@axtens.net> Acked-by: Alexandre Belloni <alexandre.belloni@free-electrons.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | Merge tag 'powerpc-4.4-3' into nextMichael Ellerman2015-12-14
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge the two TM fixes we merged in 4.4. We are about to merge selftests for these, and without the fixes the selftests will oops. powerpc fixes for 4.4 #2 - tm: Block signal return from setting invalid MSR state from Michael Neuling - tm: Check for already reclaimed tasks from Michael Neuling
| * | | | | powerpc: Make {cmp}xchg* and their atomic_ versions fully orderedBoqun Feng2015-12-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to memory-barriers.txt, xchg*, cmpxchg* and their atomic_ versions all need to be fully ordered, however they are now just RELEASE+ACQUIRE, which are not fully ordered. So also replace PPC_RELEASE_BARRIER and PPC_ACQUIRE_BARRIER with PPC_ATOMIC_ENTRY_BARRIER and PPC_ATOMIC_EXIT_BARRIER in __{cmp,}xchg_{u32,u64} respectively to guarantee fully ordered semantics of atomic{,64}_{cmp,}xchg() and {cmp,}xchg(), as a complement of commit b97021f85517 ("powerpc: Fix atomic_xxx_return barrier semantics") This patch depends on patch "powerpc: Make value-returning atomics fully ordered" for PPC_ATOMIC_ENTRY_BARRIER definition. Cc: stable@vger.kernel.org # 3.2+ Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc: Make value-returning atomics fully orderedBoqun Feng2015-12-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to memory-barriers.txt: > Any atomic operation that modifies some state in memory and returns > information about the state (old or new) implies an SMP-conditional > general memory barrier (smp_mb()) on each side of the actual > operation ... Which mean these operations should be fully ordered. However on PPC, PPC_ATOMIC_ENTRY_BARRIER is the barrier before the actual operation, which is currently "lwsync" if SMP=y. The leading "lwsync" can not guarantee fully ordered atomics, according to Paul Mckenney: https://lkml.org/lkml/2015/10/14/970 To fix this, we define PPC_ATOMIC_ENTRY_BARRIER as "sync" to guarantee the fully-ordered semantics. This also makes futex atomics fully ordered, which can avoid possible memory ordering problems if userspace code relies on futex system call for fully ordered semantics. Fixes: b97021f85517 ("powerpc: Fix atomic_xxx_return barrier semantics") Cc: stable@vger.kernel.org # 3.2+ Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc/mm: Use H_READ with H_READ_4Aneesh Kumar K.V2015-12-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This will bulk read 4 hash pte slot entries and should reduce the loop Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
| * | | | | powerpc/nohash: we don't use real_pte_t for nohashAneesh Kumar K.V2015-12-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove the related functions and #defines Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>