aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/mm/fault.c
Commit message (Collapse)AuthorAge
* Merge branch 'auto-ftrace-next' into tracing/for-linusIngo Molnar2008-07-14
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: arch/x86/kernel/entry_32.S arch/x86/kernel/process_32.c arch/x86/kernel/process_64.c arch/x86/lib/Makefile include/asm-x86/irqflags.h kernel/Makefile kernel/sched.c Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * Merge branch 'linus' into tracing/mmiotraceIngo Molnar2008-07-07
| |\
| * | x86: mmiotrace full patch, preview 1Pekka Paalanen2008-05-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kmmio.c handles the list of mmio probes with callbacks, list of traced pages, and attaching into the page fault handler and die notifier. It arms, traps and disarms the given pages, this is the core of mmiotrace. mmio-mod.c is a user interface, hooking into ioremap functions and registering the mmio probes. It also decodes the required information from trapped mmio accesses via the pre and post callbacks in each probe. Currently, hooking into ioremap functions works by redefining the symbols of the target (binary) kernel module, so that it calls the traced versions of the functions. The most notable changes done since the last discussion are: - kmmio.c is a built-in, not part of the module - direct call from fault.c to kmmio.c, removing all dynamic hooks - prepare for unregistering probes at any time - make kmmio re-initializable and accessible to more than one user - rewrite kmmio locking to remove all spinlocks from page fault path Can I abuse call_rcu() like I do in kmmio.c:unregister_kmmio_probe() or is there a better way? The function called via call_rcu() itself calls call_rcu() again, will this work or break? There I need a second grace period for RCU after the first grace period for page faults. Mmiotrace itself (mmio-mod.c) is still a module, I am going to attack that next. At some point I will start looking into how to make mmiotrace a tracer component of ftrace (thanks for the hint, Ingo). Ftrace should make the user space part of mmiotracing as simple as 'cat /debug/trace/mmio > dump.txt'. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | x86: explicit call to mmiotrace in do_page_fault()Pekka Paalanen2008-05-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The custom page fault handler list is replaced with a single function pointer. All related functions and variables are renamed for mmiotrace. Signed-off-by: Pekka Paalanen <pq@iki.fi> Cc: Christoph Hellwig <hch@infradead.org> Cc: Arjan van de Ven <arjan@infradead.org> Cc: pq@iki.fi Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | x86: add a list for custom page fault handlers.Pekka Paalanen2008-05-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provides kernel modules a way to register custom page fault handlers. On every page fault this will call a list of registered functions. The functions may handle the fault and force do_page_fault() to return immediately. This functionality is similar to the now removed page fault notifiers. Custom page fault handlers are used by debugging and reverse engineering tools. Mmiotrace is one such tool and a patch to add it into the tree will follow. The custom page fault handlers are called earlier in do_page_fault() than the page fault notifiers were. Signed-off-by: Pekka Paalanen <pq@iki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | x86: simplify vmalloc_sync_allJeremy Fitzhardinge2008-07-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vmalloc_sync_all() is only called from register_die_notifier and alloc_vm_area. Neither is on any performance-critical paths, so vmalloc_sync_all() itself is not on any hot paths. Given that the optimisations in vmalloc_sync_all add a fair amount of code and complexity, and are fairly hard to evaluate for correctness, it's better to just remove them to simplify the code rather than worry about its absolute performance. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: xen-devel <xen-devel@lists.xensource.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | |
| \ \
| \ \
| \ \
*---. \ \ Merge branches 'x86/mmio', 'x86/delay', 'x86/idle', 'x86/oprofile', ↵Ingo Molnar2008-07-08
|\ \ \ \ \ | | |_|/ / | |/| | / | | | |/ | | |/| 'x86/debug', 'x86/ptrace' and 'x86/amd-iommu' into x86/devel
| | | * x86: small unifications of address printingVegard Nossum2008-07-01
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'man 3 printf' tells me that %p should be printed as if by %#x, but this is not true for the kernel, which does not use the '0x' prefix for the %p conversion specifier. A small cast to (void *) is also prettier than #ifdef/#else/#endif. Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | / x86: remove unnecessary #ifdef CONFIG_X86_32...#elseGustavo Fernando Padovan2008-07-03
| |/ |/| | | | | | | | | | | | | | | Remove the #ifdef conditional because this comparison is already done in user_mode_vm(). Signed-off-by: Gustavo F. Padovan <gustavo@las.ic.unicamp.br> Cc: akpm@osdl.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | x86: fix endless page faults in mount_block_root for Linux 2.6Henry Nestler2008-06-12
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Page faults in kernel address space between PAGE_OFFSET up to VMALLOC_START should not try to map as vmalloc. Fix rarely endless page faults inside mount_block_root for root filesystem at boot time. All 32bit kernels up to 2.6.25 can fail into this hole. I can not present this under native linux kernel. I see, that the 64bit has fixed the problem. I copied the same lines into 32bit part. Recorded debugs are from coLinux kernel 2.6.22.18 (virtualisation): http://www.henrynestler.com/colinux/testing/pfn-check-0.7.3/20080410-antinx/bug16-recursive-page-fault-endless.txt The physicaly memory was trimmed down to 192MB to better catch the bug. More memory gets the bug more rarely. Details, how every x86 32bit system can fail: Start from "mount_block_root", http://lxr.linux.no/linux/init/do_mounts.c#L297 There the variable "fs_names" got one memory page with 4096 bytes. Variable "p" walks through the existing file system types. The first string is no problem. But, with the second loop in mount_block_root the offset of "p" is not at beginning of page, the offset is for example +9, if "reiserfs" is the first in list. Than calls do_mount_root, and lands in sys_mount. Remember: Variable "type_page" contains now "fs_type+9" and not contains a full page. The sys_mount copies 4096 bytes with function "exact_copy_from_user()": http://lxr.linux.no/linux/fs/namespace.c#L1540 Mostly exist pages after the buffer "fs_names+4096+9" and the page fault handler was not called. No problem. In the case, if the page after "fs_names+4096" is not mapped, the page fault handler was called from http://lxr.linux.no/linux/fs/namespace.c#L1320 The do_page_fault gots an address 0xc03b4000. It's kernel address, address >= TASK_SIZE, but not from vmalloc! It's from "__getname()" alias "kmem_cache_alloc". The "error_code" is 0. "vmalloc_fault" will be call: http://lxr.linux.no/linux/arch/i386/mm/fault.c#L332 "vmalloc_fault" tryed to find the physical page for a non existing virtual memory area. The macro "pte_present" in vmalloc_fault() got a next page fault for 0xc0000ed0 at: http://lxr.linux.no/linux/arch/i386/mm/fault.c#L282 No PTE exist for such virtual address. The page fault handler was trying to sync the physical page for the PTE lockup. This called vmalloc_fault() again for address 0xc000000, and that also was not existing. The endless began... In normal case the cpu would still loop with disabled interrrupts. Under coLinux this was catched by a stack overflow inside printk debugs. Signed-off-by: Henry Nestler <henry.nestler@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86: cleanup - rename VM_MASK to X86_VM_MASKgorcunov@gmail.com2008-04-17
| | | | | | | | | | | This patch renames VM_MASK to X86_VM_MASK (which in turn defined as alias to X86_EFLAGS_VM) to better distinguish from virtual memory flags. We can't just use X86_EFLAGS_VM instead because it is also used for conditional compilation Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* x86: check vmlinux limits, 64-bitIngo Molnar2008-04-17
| | | | | | | these build-time and link-time checks would have prevented the vmlinux size regression. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* x86: prefetch fix #2Ingo Molnar2008-03-27
| | | | | | | | | | | | | Linus noticed a second bug and an uncleanliness: - we'd return on any instruction fetch fault - we'd use both the value of 16 and the PF_INSTR symbol which are the same and make no sense the cleanup nicely unifies this piece of logic. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* x86: fix prefetch workaroundIngo Molnar2008-03-27
| | | | | | | some early Athlon XP's and Opterons generate bogus faults on prefetch instructions. The workaround for this regressed over .24 - reinstate it. Signed-off-by: Ingo Molnar <mingo@elte.hu>
* x86: make dump_pagetable() staticAdrian Bunk2008-02-14
| | | | | | | | dump_pagetable() can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* x86: fix deadlock, make pgd_lock irq-safeIngo Molnar2008-02-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lockdep just caught this one: ================================= [ INFO: inconsistent lock state ] 2.6.24 #38 --------------------------------- inconsistent {in-softirq-W} -> {softirq-on-W} usage. swapper/1 [HC0[0]:SC0[0]:HE1:SE1] takes: (pgd_lock){-+..}, at: [<ffffffff8022a9ea>] mm_init+0x1da/0x250 {in-softirq-W} state was registered at: [<ffffffffffffffff>] 0xffffffffffffffff irq event stamp: 394559 hardirqs last enabled at (394559): [<ffffffff80267f0a>] get_page_from_freelist+0x30a/0x4c0 hardirqs last disabled at (394558): [<ffffffff80267d25>] get_page_from_freelist+0x125/0x4c0 softirqs last enabled at (393952): [<ffffffff80232f8e>] __do_softirq+0xce/0xe0 softirqs last disabled at (393945): [<ffffffff8020c57c>] call_softirq+0x1c/0x30 other info that might help us debug this: no locks held by swapper/1. stack backtrace: Pid: 1, comm: swapper Not tainted 2.6.24 #38 Call Trace: [<ffffffff8024e1fb>] print_usage_bug+0x18b/0x190 [<ffffffff8024f55d>] mark_lock+0x53d/0x560 [<ffffffff8024fffa>] __lock_acquire+0x3ca/0xed0 [<ffffffff80250ba8>] lock_acquire+0xa8/0xe0 [<ffffffff8022a9ea>] ? mm_init+0x1da/0x250 [<ffffffff809bcd10>] _spin_lock+0x30/0x70 [<ffffffff8022a9ea>] mm_init+0x1da/0x250 [<ffffffff8022aa99>] mm_alloc+0x39/0x50 [<ffffffff8028b95a>] bprm_mm_init+0x2a/0x1a0 [<ffffffff8028d12b>] do_execve+0x7b/0x220 [<ffffffff80209776>] sys_execve+0x46/0x70 [<ffffffff8020c214>] kernel_execve+0x64/0xd0 [<ffffffff8020901e>] ? _stext+0x1e/0x20 [<ffffffff802090ba>] init_post+0x9a/0xf0 [<ffffffff809bc5f6>] ? trace_hardirqs_on_thunk+0x35/0x3a [<ffffffff8024f75a>] ? trace_hardirqs_on+0xba/0xd0 [<ffffffff8020c1a8>] ? child_rip+0xa/0x12 [<ffffffff8020bcbc>] ? restore_args+0x0/0x44 [<ffffffff8020c19e>] ? child_rip+0x0/0x12 turns out that pgd_lock has been used on 64-bit x86 in an irq-unsafe way for almost two years, since commit 8c914cb704a11460e. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86: make spurious fault handler aware of large mappingsThomas Gleixner2008-02-06
| | | | | | | | | | | | | In very rare cases, on certain CPUs, we could end up in the spurious fault handler and ignore a large pud/pmd mapping. The resulting pte pointer points into the mapped physical space and dereferencing it will fault recursively. Make the code aware of large mappings and do the permission check on the pmd/pud entry, when a large pud/pmd mapping is detected. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* x86: support gbpages in pagetable dumpAndi Kleen2008-02-04
| | | | | | Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86: reduce ifdef sections in fault.cHarvey Harrison2008-02-04
| | | | | | Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86: fixes for lookup_address argsHarvey Harrison2008-02-01
| | | | | | | Signedness mismatches in level argument. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* x86: use the same pgd_list for PAE and 64-bitJeremy Fitzhardinge2008-01-30
| | | | | | | | | | Use a standard list threaded through page->lru for maintaining the pgd list on PAE. This is the same as 64-bit, and seems saner than using a non-standard list via page->index. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86: shrink some ifdefs in fault.cHarvey Harrison2008-01-30
| | | | | | | | | | | | The change from current to tsk in do_page_fault is safe as this is set at the very beginning of the function. Removes a likely() annotation from the 64-bit version, this could have instead been added to 32-bit. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86: ignore spurious faultsJeremy Fitzhardinge2008-01-30
| | | | | | | | | | | | | | | | | | When changing a kernel page from RO->RW, it's OK to leave stale TLB entries around, since doing a global flush is expensive and they pose no security problem. They can, however, generate a spurious fault, which we should catch and simply return from (which will have the side-effect of reloading the TLB to the current PTE). This can occur when running under Xen, because it frequently changes kernel pages from RW->RO->RW to implement Xen's pagetable semantics. It could also occur when using CONFIG_DEBUG_PAGEALLOC, since it avoids doing a global TLB flush after changing page permissions. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86: remove nx_enabled from fault.cHarvey Harrison2008-01-30
| | | | | | | | | | | On !PAE 32-bit, _PAGE_NX will be 0, making is_prefetch always return early. The test is sufficient on PAE as __supported_pte_mask is updated in the same places as nx_enabled in init_32.c which also takes disable_nx into account. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86: unify fault_32|64.cHarvey Harrison2008-01-30
Unify includes in moved fault.c. Modify Makefiles to pick up unified file. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>