| Commit message (Collapse) | Author | Age |
... | |
| |\
| | |
| | |
| | |
| | |
| | |
| | | |
Merge reason: merge almost-rc8 into perfcounters/core, which was -rc6
based - to pick up the latest upstream fixes.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Merge reason: this branch was on an -rc4 base, merge it up to -rc6
to get the latest upstream fixes.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| |\ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Merge reason: This brach was on -rc1, refresh it to almost-rc4 to pick up
the latest upstream fixes.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Paul suggested we allow for data addresses to be recorded along with
the traditional IPs as power can provide these.
For now, only the software pagefault events provide data addresses,
but in the future power might as well for some events.
x86 doesn't seem capable of providing this atm.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com>
LKML-Reference: <20090408130409.394816925@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Provide separate sw counters for major and minor page faults.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
We use the generic software counter infrastructure to provide
page fault events.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This is a random collection of added ifdef's around portions of
code that only mak sense on server processors. Using either
CONFIG_PPC_STD_MMU_64 or CONFIG_PPC_BOOK3S as seems appropriate.
This is meant to make the future merging of Book3E 64-bit support
easier.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
For some obscure reason, we only set init_bootmem_done after initializing
bootmem when NUMA isn't enabled. We even document this next to the declaration
of that global in system.h which of course I didn't read before I had to
debug why some WIP code wasn't working properly...
This patch changes it so that we always set it after bootmem is initialized
which should have always been the case... go figure !
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The MMU context_lock can be taken from switch_mm() while the
rq->lock is held. The rq->lock can also be taken from interrupts,
thus if we get interrupted in destroy_context() with the context
lock held and that interrupt tries to take the rq->lock, there's
a possible deadlock scenario with another CPU having the rq->lock
and calling switch_mm() which takes our context lock.
The fix is to always ensure interrupts are off when taking our
context lock. The switch_mm() path is already good so this fixes
the destroy_context() path.
While at it, turn the context lock into a new style spinlock.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This patch fixes a couple of issues that can happen as a result
of steal_context() dropping the context_lock when all possible
PIDs are ineligible for stealing (hopefully an extremely hard to
hit occurence).
This case exposes the possibility of a stale context_mm[] entry
to be seen since destroy_context() doesn't clear it and the free
map isn't re-tested. It also means steal_context() will not notice
a context freed while the lock was help, thus possibly trying to
steal a context when a free one was available.
This fixes it by always returning to the caller from steal_context
when it dropped the lock with a return value that causes the
caller to re-samble the number of free contexts, along with
properly clearing the context_mm[] array for destroyed contexts.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|\ \ \ \ \
| | |_|_|/
| |/| | | |
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The implementation we just revived has issues, such as using a
Kconfig-defined virtual address area in kernel space that nothing
actually carves out (and thus will overlap whatever is there),
or having some dependencies on being self contained in a single
PTE page which adds unnecessary constraints on the kernel virtual
address space.
This fixes it by using more classic PTE accessors and automatically
locating the area for consistent memory, carving an appropriate hole
in the kernel virtual address space, leaving only the size of that
area as a Kconfig option. It also brings some dma-mask related fixes
from the ARM implementation which was almost identical initially but
grew its own fixes.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Make FIXADDR_TOP a compile time constant and cleanup a
couple of definitions relative to the layout of the kernel
address space on ppc32. We also print out that layout at
boot time for debugging purposes.
This is a pre-requisite for properly fixing non-coherent
DMA allocactions.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
(pre-requisite to make the next patches more palatable)
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The recent rework of the MMU PID handling for non-hash CPUs has a
subtle bug in the !SMP "optimized" variant of the PID stealing
function. It clears the PID in the mm context before it calls
local_flush_tlb_mm(). However, the later will not flush anything
if the PID in the context is clear...
Signed-off-by: Hideo Saito <hsaito.ppc@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
| | |_|/
| |/| |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
With CONFIG_DEBUG_VM, an assertion is made when changing the protection
flags of a PTE that the PTE is locked. Huge pages use a different pagetable
format and the assertion is bogus and will always trigger with a bug looking
something like
Unable to handle kernel paging request for data at address 0xf1a00235800006f8
Faulting instruction address: 0xc000000000034a80
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=32 NUMA Maple
Modules linked in: dm_snapshot dm_mirror dm_region_hash
dm_log dm_mod loop evdev ext3 jbd mbcache sg sd_mod ide_pci_generic
pata_amd ata_generic ipr libata tg3 libphy scsi_mod windfarm_pid
windfarm_smu_sat windfarm_max6690_sensor windfarm_lm75_sensor
windfarm_cpufreq_clamp windfarm_core i2c_powermac
NIP: c000000000034a80 LR: c000000000034b18 CTR: 0000000000000003
REGS: c000000003037600 TRAP: 0300 Not tainted (2.6.30-rc3-autokern1)
MSR: 9000000000009032 <EE,ME,IR,DR> CR: 28002484 XER: 200fffff
DAR: f1a00235800006f8, DSISR: 0000000040010000
TASK = c0000002e54cc740[2960] 'map_high_trunca' THREAD: c000000003034000 CPU: 2
GPR00: 4000000000000000 c000000003037880 c000000000895d30 c0000002e5a2e500
GPR04: 00000000a0000000 c0000002edc40880 0000005700000393 0000000000000001
GPR08: f000000011ac0000 01a00235800006e8 00000000000000f5 f1a00235800006e8
GPR12: 0000000028000484 c0000000008dd780 0000000000001000 0000000000000000
GPR16: fffffffffffff000 0000000000000000 00000000a0000000 c000000003037a20
GPR20: c0000002e5f4ece8 0000000000001000 c0000002edc40880 0000000000000000
GPR24: c0000002e5f4ece8 0000000000000000 00000000a0000000 c0000002e5f4ece8
GPR28: 0000005700000393 c0000002e5a2e500 00000000a0000000 c000000003037880
NIP [c000000000034a80] .assert_pte_locked+0xa4/0xd0
LR [c000000000034b18] .ptep_set_access_flags+0x6c/0xb4
Call Trace:
[c000000003037880] [c000000003037990] 0xc000000003037990 (unreliable)
[c000000003037910] [c000000000034b18] .ptep_set_access_flags+0x6c/0xb4
[c0000000030379b0] [c00000000014bef8] .hugetlb_cow+0x124/0x674
[c000000003037b00] [c00000000014c930] .hugetlb_fault+0x4e8/0x6f8
[c000000003037c00] [c00000000013443c] .handle_mm_fault+0xac/0x828
[c000000003037cf0] [c0000000000340a8] .do_page_fault+0x39c/0x584
[c000000003037e30] [c0000000000057b0] handle_page_fault+0x20/0x5c
Instruction dump:
7d29582a 7d200074 7800d182 0b000000 3c004000 3960ffff 780007c6 796b00c4
7d290214 7929a302 1d290068 7d6b4a14 <800b0010> 7c000074 7800d182 0b000000
This patch fixes the problem by not asseting the PTE is locked for VMAs
backed by huge pages.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|/ / /
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This adds the PowerPC 2.06 tlbie mnemonics and keeps backwards
compatibilty for CPUs before 2.06.
Only useful for bare metal systems.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
| |/
|/|
| |
| |
| |
| |
| |
| |
| | |
We're currently choking on mem=4g (and above) due to memory_limit
being specified as an unsigned long. Make memory_limit
phys_addr_t to fix this.
Signed-off-by: Becky Bruce <beckyb@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Previous gcc versions didn't notice this because one of the preceding
#ifs always evaluated to true.
gcc 4.4.0 produced this error:
arch/powerpc/mm/tlb_nohash_low.S:206:6: error: #elif with no expression
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
|
| |
| |
| |
| |
| |
| |
| | |
This reverts commit e9965577406a2148ade97b5e0ce7c448b4ba4ef6. Our HW
guys were able to fix this so it never sees the light of day.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
early_init_mmu_secondary() is called at CPU hotplug time, so it
must be marked as __cpuinit, not __init.
Caused by 757c74d2 ("powerpc/mm: Introduce early_init_mmu() on 64-bit").
Tested-by: Sachin Sant <sachinp@in.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
|
| |
| |
| |
| |
| |
| |
| | |
arch/powerpc/mm/tlb_nohash.c: In function 'flush_tlb_mm':
arch/powerpc/mm/tlb_nohash.c:128: warning: unused variable 'cpu_mask'
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
|
|/
|
|
|
|
|
|
| |
During the ISA 2.06 development the opcode for tlbilx changed and some
early implementations used to old opcode. Add support for a MMU_FTR
fixup to deal with this.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
|
|
|
|
|
|
|
|
| |
This moves some MMU related init code out of setup_64.c into hash_utils_64.c
and calls it early_init_mmu() and early_init_mmu_secondary(). This will
make it easier to plug in a new MMU type.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
| |
We need to use %zu instead of %d when printing a sizeof()
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
| |
This file is only useful on 64-bit, so we name it accordingly.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch tweaks the way some PTE bit combinations are defined, in such a
way that the 32 and 64-bit variant become almost identical and that will
make it easier to bring in a new common pte-* file for the new variant
of the Book3-E support.
The combination of bits defining access to kernel pages are now clearly
separated from the combination used by userspace and the core VM. The
resulting generated code should remain identical unless I made a mistake.
Note: While at it, I removed a non-sensical statement related to CONFIG_KGDB
in ppc_mmu_32.c which could cause kernel mappings to be user accessible when
that option is enabled. Probably something that bitrot.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
|
|
|
|
| |
Makes code futureproof against the impending change to mm->cpu_vm_mask.
It's also a chance to use the new cpumask_ ops which take a pointer
(the older ones are deprecated, but there's no hurry for arch code).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
|
|
| |
While we did add support for _PAGE_SPECIAL on some 32-bit platforms,
we never actually built get_user_pages_fast() on them. This fixes
it which requires a little bit of ifdef'ing around.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
|
|
| |
This adds the necessary bits and pieces to powerpc implementation of
ioremap to benefit from caller tracking in /proc/vmallocinfo, at least
for ioremap's done after mem init as the older ones aren't tracked.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
|
|
| |
The e500mc core supports the new tlbilx instructions that do core
local invalidates and also provide us the ability to take down
all TLB entries matching a given PID.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On 64bit there is a possibility our stack and mmap randomisation will put
the two close enough such that we can't expand our stack to match the ulimit
specified.
To avoid this, start the upper mmap address at 1GB + 128MB below the top of our
address space, so in the worst case we end up with the same ~128MB hole as in
32bit. This works because we randomise the stack over a 1GB range.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
|
|
|
|
|
| |
get_random_int() returns the same value within a 1 jiffy interval. This means
that the mmap and stack regions will almost always end up the same distance
apart, making a relative offset based attack possible.
To fix this, shift the randomness we use for the mmap region by 1 bit.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Randomise mmap start address - 8MB on 32bit and 1GB on 64bit tasks.
Until ppc32 uses the mmap.c functionality, this is ppc64 specific.
Before:
# ./test & cat /proc/${!}/maps|tail -2|head -1
f75fe000-f7fff000 rw-p f75fe000 00:00 0
f75fe000-f7fff000 rw-p f75fe000 00:00 0
f75fe000-f7fff000 rw-p f75fe000 00:00 0
f75fe000-f7fff000 rw-p f75fe000 00:00 0
f75fe000-f7fff000 rw-p f75fe000 00:00 0
After:
# ./test & cat /proc/${!}/maps|tail -2|head -1
f718b000-f7b8c000 rw-p f718b000 00:00 0
f7551000-f7f52000 rw-p f7551000 00:00 0
f6ee7000-f78e8000 rw-p f6ee7000 00:00 0
f74d4000-f7ed5000 rw-p f74d4000 00:00 0
f6e9d000-f789e000 rw-p f6e9d000 00:00 0
Similar for 64bit, but with 1GB of scatter:
# ./test & cat /proc/${!}/maps|tail -2|head -1
fffb97b5000-fffb97b6000 rw-p fffb97b5000 00:00 0
fffce9a3000-fffce9a4000 rw-p fffce9a3000 00:00 0
fffeaaf2000-fffeaaf3000 rw-p fffeaaf2000 00:00 0
fffd88ac000-fffd88ad000 rw-p fffd88ac000 00:00 0
fffbc62e000-fffbc62f000 rw-p fffbc62e000 00:00 0
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
|
| |
Rearrange mmap.c to better match the x86 version.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch reworks the hot_add_scn_to_nid and its supporting functions
to make them easier to understand. There are no functional changes in
this patch and has been tested on machine with memory represented in the
device tree as memory nodes and in the ibm,dynamic-memory property.
My previous patch that introduced support for hotplug memory add on
systems whose memory was represented by the ibm,dynamic-memory property
of the device tree only left the code more unintelligible. This
will hopefully makes things easier to understand.
Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
At the moment we size the hashtable based on 4kB pages / 2, even on a
64kB kernel. This results in a hashtable that is much larger than it
needs to be.
Grab the real page size and size the hashtable based on that
Note: This only has effect on non hypervisor machines.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|\ |
|
| |
| |
| |
| |
| |
| |
| | |
arch/powerpc/mm/fsl_booke_mmu.c: In function 'adjust_total_lowmem':
arch/powerpc/mm/fsl_booke_mmu.c:221: warning: format '%ld' expects type 'long int', but argument 3 has type 'phys_addr_t'
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The Power ISA 2.06 added power of two page sizes to the embedded MMU
architecture. Its done it such a way to be code compatiable with the
existing HW. Made the minor code changes to support both power of two
and power of four page sizes. Also added some new MAS bits and macros
that are defined as part of the 2.06 ISA. Renamed some things to use
the 'Book-3e' concept to convey the new MMU that is based on the
Freescale Book-E MMU programming model.
Note, its still invalid to try and use a page size that isn't supported
by cpu.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
|
|\ \
| | |
| | |
| | |
| | | |
Manual merge of:
arch/powerpc/include/asm/pgtable-ppc32.h
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Fix the powerpc NUMA reserve bootmem page selection logic.
commit 8f64e1f2d1e09267ac926e15090fd505c1c0cbcb (powerpc: Reserve
in bootmem lmb reserved regions that cross NUMA nodes) changed
the logic for how the powerpc LMB reserved regions were converted
to bootmen reserved regions. As the folowing discussion reports,
the new logic was not correct.
mark_reserved_regions_for_nid() goes through each LMB on the
system that specifies a reserved area. It searches for
active regions that intersect with that LMB and are on the
specified node. It attempts to bootmem-reserve only the area
where the active region and the reserved LMB intersect. We
can not reserve things on other nodes as they may not have
bootmem structures allocated, yet.
We base the size of the bootmem reservation on two possible
things. Normally, we just make the reservation start and
stop exactly at the start and end of the LMB.
However, the LMB reservations are not aware of NUMA nodes and
on occasion a single LMB may cross into several adjacent
active regions. Those may even be on different NUMA nodes
and will require separate calls to the bootmem reserve
functions. So, the bootmem reservation must be trimmed to
fit inside the current active region.
That's all fine and dandy, but we trim the reservation
in a page-aligned fashion. That's bad because we start the
reservation at a non-page-aligned address: physbase.
The reservation may only span 2 bytes, but that those bytes
may span two pfns and cause a reserve_size of 2*PAGE_SIZE.
Take the case where you reserve 0x2 bytes at 0x0fff and
where the active region ends at 0x1000. You'll jump into
that if() statment, but node_ar.end_pfn=0x1 and
start_pfn=0x0. You'll end up with a reserve_size=0x1000,
and then call
reserve_bootmem_node(node, physbase=0xfff, size=0x1000);
0x1000 may not be on the same node as 0xfff. Oops.
In almost all the vm code, end_<anything> is not inclusive.
If you have an end_pfn of 0x1234, page 0x1234 is not
included in the range. Using PFN_UP instead of the
(>> >> PAGE_SHIFT) will make this consistent with the other VM
code.
We also need to do math for the reserved size with physbase
instead of start_pfn. node_ar.end_pfn << PAGE_SHIFT is
*precisely* the end of the node. However,
(start_pfn << PAGE_SHIFT) is *NOT* precisely the beginning
of the reserved area. That is, of course, physbase.
If we don't use physbase here, the reserve_size can be
made too large.
From: Dave Hansen <dave@linux.vnet.ibm.com>
Tested-by: Geoff Levand <geoffrey.levand@am.sony.com> Tested on PS3.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The following commit:
commit 64b3d0e8122b422e879b23d42f9e0e8efbbf9744
Author: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: Thu Dec 18 19:13:51 2008 +0000
powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDED
broke setting of the _PAGE_COHERENT bit in the PPC HW PTE. Since we now
actually set _PAGE_COHERENT in the Linux PTE we shouldn't be clearing it
out before we propogate it to the PPC HW PTE.
Reported-by: Martyn Welch <martyn.welch@gefanuc.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch reworks the way we do I and D cache coherency on PowerPC.
The "old" way was split in 3 different parts depending on the processor type:
- Hash with per-page exec support (64-bit and >= POWER4 only) does it
at hashing time, by preventing exec on unclean pages and cleaning pages
on exec faults.
- Everything without per-page exec support (32-bit hash, 8xx, and
64-bit < POWER4) does it for all page going to user space in update_mmu_cache().
- Embedded with per-page exec support does it from do_page_fault() on
exec faults, in a way similar to what the hash code does.
That leads to confusion, and bugs. For example, the method using update_mmu_cache()
is racy on SMP where another processor can see the new PTE and hash it in before
we have cleaned the cache, and then blow trying to execute. This is hard to hit but
I think it has bitten us in the past.
Also, it's inefficient for embedded where we always end up having to do at least
one more page fault.
This reworks the whole thing by moving the cache sync into two main call sites,
though we keep different behaviours depending on the HW capability. The call
sites are set_pte_at() which is now made out of line, and ptep_set_access_flags()
which joins the former in pgtable.c
The base idea for Embedded with per-page exec support, is that we now do the
flush at set_pte_at() time when coming from an exec fault, which allows us
to avoid the double fault problem completely (we can even improve the situation
more by implementing TLB preload in update_mmu_cache() but that's for later).
If for some reason we didn't do it there and we try to execute, we'll hit
the page fault, which will do a minor fault, which will hit ptep_set_access_flags()
to do things like update _PAGE_ACCESSED or _PAGE_DIRTY if needed, we just make
this guys also perform the I/D cache sync for exec faults now. This second path
is the catch all for things that weren't cleaned at set_pte_at() time.
For cpus without per-pag exec support, we always do the sync at set_pte_at(),
thus guaranteeing that when the PTE is visible to other processors, the cache
is clean.
For the 64-bit hash with per-page exec support case, we keep the old mechanism
for now. I'll look into changing it later, once I've reworked a bit how we
use _PAGE_EXEC.
This is also a first step for adding _PAGE_EXEC support for embedded platforms
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We currently place mmaps just below the stack on 32bit, but leave them
in the middle of the address space on 64bit:
00100000-00120000 r-xp 00100000 00:00 0 [vdso]
10000000-10010000 r-xp 00000000 08:06 179534 /tmp/sleep
10010000-10020000 rw-p 00000000 08:06 179534 /tmp/sleep
10020000-10130000 rw-p 10020000 00:00 0 [heap]
40000000000-40000030000 r-xp 00000000 08:06 440743 /lib64/ld-2.9.so
40000030000-40000040000 rw-p 00020000 08:06 440743 /lib64/ld-2.9.so
40000050000-400001f0000 r-xp 00000000 08:06 440671 /lib64/libc-2.9.so
400001f0000-40000200000 r--p 00190000 08:06 440671 /lib64/libc-2.9.so
40000200000-40000220000 rw-p 001a0000 08:06 440671 /lib64/libc-2.9.so
40000220000-40008230000 rw-p 40000220000 00:00 0
fffffbc0000-fffffd10000 rw-p fffffeb0000 00:00 0 [stack]
Right now it isn't an issue, but at some stage we will run into mmap or
hugetlb allocation issues. Using the same layout as 32bit gives us a
some breathing room. This matches what x86-64 is doing too.
00100000-00103000 r-xp 00100000 00:00 0 [vdso]
10000000-10001000 r-xp 00000000 08:06 554894 /tmp/test
10010000-10011000 r--p 00000000 08:06 554894 /tmp/test
10011000-10012000 rw-p 00001000 08:06 554894 /tmp/test
10012000-10113000 rw-p 10012000 00:00 0 [heap]
fffefdf7000-ffff7df8000 rw-p fffefdf7000 00:00 0
ffff7df8000-ffff7f97000 r-xp 00000000 08:06 130591 /lib64/libc-2.9.so
ffff7f97000-ffff7fa6000 ---p 0019f000 08:06 130591 /lib64/libc-2.9.so
ffff7fa6000-ffff7faa000 r--p 0019e000 08:06 130591 /lib64/libc-2.9.so
ffff7faa000-ffff7fc0000 rw-p 001a2000 08:06 130591 /lib64/libc-2.9.so
ffff7fc0000-ffff7fc4000 rw-p ffff7fc0000 00:00 0
ffff7fc4000-ffff7fec000 r-xp 00000000 08:06 130663 /lib64/ld-2.9.so
ffff7fee000-ffff7ff0000 rw-p ffff7fee000 00:00 0
ffff7ffa000-ffff7ffb000 rw-p ffff7ffa000 00:00 0
ffff7ffb000-ffff7ffc000 r--p 00027000 08:06 130663 /lib64/ld-2.9.so
ffff7ffc000-ffff7fff000 rw-p 00028000 08:06 130663 /lib64/ld-2.9.so
ffff7fff000-ffff8000000 rw-p ffff7fff000 00:00 0
fffffc59000-fffffc6e000 rw-p ffffffeb000 00:00 0 [stack]
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Use of_get_cpu_node, which is a superset of numa.c's find_cpu_node in
a less restrictive section (text vs cpuinit).
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
find_min_common_depth()
find_min_common_depth() was checking the property length incorrectly.
The value is in bytes not cells, and it is using the second entry.
Signed-off-By: Milton Miller <miltonm@bga.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
|\ \
| |/
|/| |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
On booke processors, the code that maps low memory only uses up to three
CAM entries, even though there are sixteen and nothing else uses them.
Make this number configurable in the advanced options menu along with max
low memory size. If one wants 1 GB of lowmem, then it's typically
necessary to have four CAM entries.
Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The code that maps kernel low memory would only use page sizes up to 256
MB. On E500v2 pages up to 4 GB are supported.
However, a page must be aligned to a multiple of the page's size. I.e.
256 MB pages must aligned to a 256 MB boundary. This was enforced by a
requirement that the physical and virtual addresses of the start of lowmem
be aligned to 256 MB. Clearly requiring 1GB or 4GB alignment to allow
pages of that size isn't acceptable.
To solve this, I simply have adjust_total_lowmem() take alignment into
account when it decides what size pages to use. Give it PAGE_OFFSET =
0x7000_0000, PHYSICAL_START = 0x3000_0000, and 2GB of RAM, and it will map
pages like this:
PA 0x3000_0000 VA 0x7000_0000 Size 256 MB
PA 0x4000_0000 VA 0x8000_0000 Size 1 GB
PA 0x8000_0000 VA 0xC000_0000 Size 256 MB
PA 0x9000_0000 VA 0xD000_0000 Size 256 MB
PA 0xA000_0000 VA 0xE000_0000 Size 256 MB
Because the lowmem mapping code now takes alignment into account,
PHYSICAL_ALIGN can be lowered from 256 MB to 64 MB. Even lower might be
possible. The lowmem code will work down to 4 kB but it's possible some of
the boot code will fail before then. Poor alignment will force small pages
to be used, which combined with the limited number of TLB1 pages available,
will result in very little memory getting mapped. So alignments less than
64 MB probably aren't very useful anyway.
Signed-off-by: Trent Piepho <tpiepho@freescale.com>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
|