litmus-rt.git - The LITMUS^RT kernel.

	Commit message (Collapse)	Author	Age
*	Add commentary about the new "asmlinkage_protect()" macro	Linus Torvalds	2008-04-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	It's really a pretty ugly thing to need, and some day it will hopefully be obviated by teaching gcc about the magic calling conventions for the low-level system call code, but in the meantime we can at least add big honking comments about why we need these insane and strange macros. I took my comments from my version of the macro, but I ended up deciding to just pick Roland's version of the actual code instead (with his prettier syntax that uses vararg macros). Thus the previous two commits that actually implement it. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	asmlinkage_protect replaces prevent_tail_call	Roland McGrath	2008-04-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The prevent_tail_call() macro works around the problem of the compiler clobbering argument words on the stack, which for asmlinkage functions is the caller's (user's) struct pt_regs. The tail/sibling-call optimization is not the only way that the compiler can decide to use stack argument words as scratch space, which we have to prevent. Other optimizations can do it too. Until we have new compiler support to make "asmlinkage" binding on the compiler's own use of the stack argument frame, we have work around all the manifestations of this issue that crop up. More cases seem to be prevented by also keeping the incoming argument variables live at the end of the function. This makes their original stack slots attractive places to leave those variables, so the compiler tends not clobber them for something else. It's still no guarantee, but it handles some observed cases that prevent_tail_call() did not. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86: fix 64-bit asm NOPS for CONFIG_GENERIC_CPU	Suresh Siddha	2008-04-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ASM_NOP's for 64-bit kernel with CONFIG_GENERIC_CPU is broken with the recent x86 nops merge. They were using GENERIC_NOPS which will truncate the upper 32bits of %rsi, because of the missing 64bit rex prefix. For now, fall back ASM NOPS for generic cpu to K8 NOPS, similar to the code before the wrong x86 nop merge. This should resolve the crash seen by Ingo on a test-system: BUG: unable to handle kernel paging request at 00000000d80d8ee8 IP: [<ffffffff802121af>] save_i387_ia32+0x61/0xd8 PGD b8e0067 PUD 51490067 PMD 0 Oops: 0000 [1] SMP CPU 2 Modules linked in: Pid: 3871, comm: distcc Not tainted 2.6.25-rc7-sched-devel.git-x86-latest.git #359 RIP: 0010:[<ffffffff802121af>] [<ffffffff802121af>] save_i387_ia32+0x61/0xd8 RSP: 0000:ffff81003abd3cb8 EFLAGS: 00010246 RAX: ffff810082e93400 RBX: 00000000ffc37f84 RCX: ffff8100d80d8ee0 RDX: 0000000000000000 RSI: 00000000d80d8ee0 RDI: ffff810082e93400 RBP: 00000000ffc37fdc R08: 00000000ffc37f88 R09: 0000000000000008 R10: ffff81003abd2000 R11: 0000000000000000 R12: ffff810082e93400 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff81011fb12dc0(0063) knlGS:00000000f7f1a6c0 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 00000000d80d8ee8 CR3: 0000000076922000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process distcc (pid: 3871, threadinfo ffff81003abd2000, task ffff8100d80d8ee0) Stack: ffff8100bb670380 ffffffff8026de50 0000000000000118 0000000000000002 0000000000000002 ffff81003abd3e68 ffff81003abd3ed8 ffff81003abd3de8 ffff81003abd3d18 ffffffff80229785 ffff8100d80d8ee0 ffff810001041280 Call Trace: [<ffffffff8026de50>] ? __generic_file_aio_write_nolock+0x343/0x377 [<ffffffff80229785>] ? update_curr+0x54/0x64 [<ffffffff80227cd3>] ? ia32_setup_sigcontext+0x125/0x1d2 [<ffffffff8022839f>] ? ia32_setup_frame+0x73/0x1a5 [<ffffffff8020b2a5>] ? do_notify_resume+0x1aa/0x7db [<ffffffff8024ae8c>] ? getnstimeofday+0x31/0x85 [<ffffffff80249858>] ? ktime_get_ts+0x17/0x48 [<ffffffff80249933>] ? ktime_get+0xc/0x41 [<ffffffff8024973e>] ? hrtimer_nanosleep+0x75/0xd5 [<ffffffff80249261>] ? hrtimer_wakeup+0x0/0x21 [<ffffffff8020bfbc>] ? int_signal+0x12/0x17 [<ffffffff8030e6b3>] ? dummy_file_free_security+0x0/0x1 Code: a6 08 05 00 00 f6 40 14 01 74 34 4c 89 e7 48 0f ae 07 48 8b 86 08 05 00 00 80 78 02 00 79 02 db e2 90 8d b4 26 00 00 00 00 89 f6 <48> 8b 46 08 83 60 14 fe 0f 20 c0 48 83 c8 08 0f 22 c0 eb 07 c6 Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	x86: fix breakage of vSMP irq operations	Ravikiran G Thirumalai	2008-04-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	25-rc* stopped working with CONFIG_X86_VSMP on vSMP machines. Looks like the vsmp irq ops got accidentally removed during merge of x86_64 pvops in 2.6.25. -- commit 6abcd98ffafbff81f0bfd7ee1d129e634af13245 removed vsmp irq ops. Tested with both CONFIG_X86_VSMP and without CONFIG_X86_VSMP, on vSMP and non vSMP x86_64 machines. Please apply. Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	lguest: comment documentation update.	Rusty Russell	2008-03-27
\| \| \| \| \| \| \| \| \|	Took some cycles to re-read the Lguest Journey end-to-end, fix some rot and tighten some phrases. Only comments change. No new jokes, but a couple of recycled old jokes. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
*	rdc321x: GPIO routines bugfixes	Florian Fainelli	2008-03-27
\| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes the use of GPIO routines which are in the PCI configuration space of the RDC321x, therefore reading/writing to this space without spinlock protection can be problematic. We also now request and free GPIOs and support the MGB100 board, previous code was very AR525W-centric. Signed-off-by: Volker Weiss <volker@tintuc.de> Signed-off-by: Florian Fainelli <florian.fainelli@telecomint.eu> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	x86: fix performance drop for glx	Suresh Siddha	2008-03-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fix the 3D performance drop reported at: http://bugzilla.kernel.org/show_bug.cgi?id=10328 fb drivers are using ioremap()/ioremap_nocache(), followed by mtrr_add with WC attribute. Recent changes in page attribute code made both ioremap()/ioremap_nocache() mappings as UC (instead of previous UC-). This breaks the graphics performance, as the effective memory type is UC instead of expected WC. The correct way to fix this is to add ioremap_wc() (which uses UC- in the absence of PAT kernel support and WC with PAT) and change all the fb drivers to use this new ioremap_wc() API. We can take this correct and longer route for post 2.6.25. For now, revert back to the UC- behavior for ioremap/ioremap_nocache. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	x86-32: Pass the full resource data to ioremap()	Linus Torvalds	2008-03-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It appears that 64-bit PCI resources cannot possibly ever have worked on x86-32 even when the RESOURCES_64BIT config option was set, because any driver that tried to [pci_]ioremap() the resource would have been unable to do so because the high 32 bits would have been silently dropped on the floor by the ioremap() routines that only used "unsigned long". Change them to use "resource_size_t" instead, which properly encodes the whole 64-bit resource data if RESOURCES_64BIT is enabled. Acked-by: H. Peter Anvin <hpa@kernel.org> Acked-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86: revert: reserve dma32 early for gart	Thomas Gleixner	2008-03-22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Revert commit f62f1fc9ef94f74fda2b456d935ba2da69fa0a40 Author: Yinghai Lu <yhlu.kernel@gmail.com> Date: Fri Mar 7 15:02:50 2008 -0800 x86: reserve dma32 early for gart The patch has a dependency on bootmem modifications which are not .25 material that late in the -rc cycle. The problem which is addressed by the patch is limited to machines with 256G and more memory booted with NUMA disabled. This is not a .25 regression and the audience which is affected by this problem is very limited, so it's safer to do the revert than pulling in intrusive bootmem changes right now. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	sync_bitops: fix wrong comments [Bug 10247]	Matti Linnanvuori	2008-03-21
\| \| \| \| \| \| \| \|	Fix wrong function name and references to non-x86 architectures. Signed-off-by: Matti Linnanvuori mattilinnanvuori@yahoo.com Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: trim mtrr don't close gap for resource allocation.	Yinghai Lu	2008-03-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	fix the bug reported here: http://bugzilla.kernel.org/show_bug.cgi?id=10232 use update_memory_range() instead of add_memory_range() directly to avoid closing the gap. ( the new code only affects and runs on systems where the MTRR workaround triggers. ) Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: reserve dma32 early for gart	Yinghai Lu	2008-03-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	a system with 256 GB of RAM, when NUMA is disabled crashes the following way: Your BIOS doesn't leave a aperture memory hole Please enable the IOMMU option in the BIOS setup This costs you 64 MB of RAM Cannot allocate aperture memory hole (ffff8101c0000000,65536K) Kernel panic - not syncing: Not enough memory for aperture Pid: 0, comm: swapper Not tainted 2.6.25-rc4-x86-latest.git #33 Call Trace: [<ffffffff84037c62>] panic+0xb2/0x190 [<ffffffff840381fc>] ? release_console_sem+0x7c/0x250 [<ffffffff847b1628>] ? __alloc_bootmem_nopanic+0x48/0x90 [<ffffffff847b0ac9>] ? free_bootmem+0x29/0x50 [<ffffffff847ac1f7>] gart_iommu_hole_init+0x5e7/0x680 [<ffffffff847b255b>] ? alloc_large_system_hash+0x16b/0x310 [<ffffffff84506a2f>] ? _etext+0x0/0x1 [<ffffffff847a2e8c>] pci_iommu_alloc+0x1c/0x40 [<ffffffff847ac795>] mem_init+0x45/0x1a0 [<ffffffff8479ff35>] start_kernel+0x295/0x380 [<ffffffff8479f1c2>] _sinittext+0x1c2/0x230 the root cause is : memmap PMD is too big, [ffffe200e0600000-ffffe200e07fffff] PMD ->ffff81383c000000 on node 0 almost near 4G..., and vmemmap_alloc_block will use up the ram under 4G. solution will be: 1. make memmap allocation get memory above 4G... 2. reserve some dma32 range early before we try to set up memmap for all. and release that before pci_iommu_alloc, so gart or swiotlb could get some range under 4g limit for sure. the patch is using method 2. because method1 may need more code to handle SPARSEMEM and SPASEMEM_VMEMMAP will get Your BIOS doesn't leave a aperture memory hole Please enable the IOMMU option in the BIOS setup This costs you 64 MB of RAM Mapping aperture over 65536 KB of RAM @ 4000000 Memory: 264245736k/268959744k available (8484k kernel code, 4187464k reserved, 4004k data, 724k init) Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: fix {clear,copy}_user_page() declarations in page.h	Chuck Lever	2008-03-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Clean up: eliminate some compiler noise on x86 when building with strict warnings enabled, introduced by commit 345b904c. In file included from include2/asm/thread_info_64.h:12, from include2/asm/thread_info.h:4, from /home/cel/src/linux/nfs-2.6/include/linux/thread_info.h:35, from /home/cel/src/linux/nfs-2.6/include/linux/preempt.h:9, from /home/cel/src/linux/nfs-2.6/include/linux/spinlock.h:49, from /home/cel/src/linux/nfs-2.6/include/linux/mmzone.h:7, from /home/cel/src/linux/nfs-2.6/include/linux/gfp.h:4, from /home/cel/src/linux/nfs-2.6/include/linux/slab.h:14, from /home/cel/src/linux/nfs-2.6/fs/nfsd/nfs4acl.c:40: include2/asm/page.h:55: warning: `inline' is not at beginning of declaration include2/asm/page.h:61: warning: `inline' is not at beginning of declaration Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: cast cmpxchg and cmpxchg_local result for 386 and 486	Mathieu Desnoyers	2008-03-21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	mm/slub.c: In function 'slab_alloc': mm/slub.c:1637: warning: assignment makes pointer from integer without a cast mm/slub.c:1637: warning: assignment makes pointer from integer without a cast mm/slub.c: In function 'slab_free': mm/slub.c:1796: warning: assignment makes pointer from integer without a cast mm/slub.c:1796: warning: assignment makes pointer from integer without a cast A cast is needed in the 386 and 486 code because the type is a pointer. In every other integer case the original cmpxchg code (and the cmpxchg_local which has been copied from it) worked fine, but since we touch a pointer, the type needs to be casted in the cmpxchg_local and cmpxchg macros. The more recent code (586+) does not have this problem (the cast is already there). Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Christoph Lameter <clameter@sgi.com> Cc: Vegard Nossum <vegard.nossum@gmail.com> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: remove quicklists	Thomas Gleixner	2008-03-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	quicklists cause a serious memory leak on 32-bit x86, as documented at: http://bugzilla.kernel.org/show_bug.cgi?id=9991 the reason is that the quicklist pool is a special-purpose cache that grows out of proportion. It is not accounted for anywhere and users have no way to even realize that it's the quicklists that are causing RAM usage spikes. It was supposed to be a relatively small pool, but as demonstrated by KOSAKI Motohiro, they can grow as large as: Quicklists: 1194304 kB given how much trouble this code has caused historically, and given that Andrew objected to its introduction on x86 (years ago), the best option at this point is to remove them. [ any performance benefits of caching constructed pgds should be implemented in a more generic way (possibly within the page allocator), while still allowing constructed pages to be allocated by other workloads. ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	Really unexport asm/page.h	David Woodhouse	2008-03-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit ed7b1889da256977574663689b598d88950bbd23 removed page.h from include/asm-generic/Kbuild so that it shouldn't get exported. However, it was redundantly listed in asm-mn10300/Kbuild and asm-x86/Kbuild too. Remove those as well, so it really stops being exported on those architectures. Also remove the redundant listing of ptrace.h and termios.h from mn10300. Signed-off-by: David Woodhouse <dwmw2@infradead.org> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Kprobes: indicate kretprobe support in Kconfig	Ananth N Mavinakayanahalli	2008-03-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add CONFIG_HAVE_KRETPROBES to the arch/<arch>/Kconfig file for relevant architectures with kprobes support. This facilitates easy handling of in-kernel modules (like samples/kprobes/kretprobe_example.c) that depend on kretprobes being present in the kernel. Thanks to Sam Ravnborg for helping make the patch more lean. Per Mathieu's suggestion, added CONFIG_KRETPROBES and fixed up dependencies. Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Revert "x86: fix pmd_bad and pud_bad to support huge pages"	Linus Torvalds	2008-03-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit cded932b75ab0a5f9181ee3da34a0a488d1a14fd. Arjan bisected down a boot-time hang to this, saying: ".. it prevents the kernel to finish booting on my (Penryn based) laptop. The boot stops right after freeing the init memory." and while it's not clear exactly what triggers it, at this stage we're better off just reverting it while Ingo tries to figure out what went wrong. Requested-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Hans Rosenfeld <hans.rosenfeld@amd.com> Cc: Nish Aravamudan <nish.aravamudan@gmail.com> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86 ptrace: fix ptrace_bts_config structure declaration	Dave Anderson	2008-02-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The 2.6.25 ptrace_bts_config structure in asm-x86/ptrace-abi.h is defined with u32 types: #include <asm/types.h> /* configuration/status structure used in PTRACE_BTS_CONFIG and PTRACE_BTS_STATUS commands. / struct ptrace_bts_config { / requested or actual size of BTS buffer in bytes / u32 size; / bitmask of below flags / u32 flags; / buffer overflow signal / u32 signal; / actual size of bts_struct in bytes */ u32 bts_size; }; #endif But u32 is only accessible in asm-x86/types.h if __KERNEL__, leading to compile errors when ptrace.h is included from user-space. The double-underscore versions that are exported to user-space in asm-x86/types.h should be used instead. Signed-off-by: Dave Anderson <anderson@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	x86: fix pmd_bad and pud_bad to support huge pages	Hans Rosenfeld	2008-02-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I recently stumbled upon a problem in the support for huge pages. If a program using huge pages does not explicitly unmap them, they remain mapped (and therefore, are lost) after the program exits. I observed that the free huge page count in /proc/meminfo decreased when running my program, and it did not increase after the program exited. After running the program a few times, no more huge pages could be allocated. The reason for this seems to be that the x86 pmd_bad and pud_bad consider pmd/pud entries having the PSE bit set invalid. I think there is nothing wrong with this bit being set, it just indicates that the lowest level of translation has been reached. This bit has to be (and is) checked after the basic validity of the entry has been checked, like in this fragment from follow_page() in mm/memory.c: if (pmd_none(pmd) \|\| unlikely(pmd_bad(pmd))) goto no_page_table; if (pmd_huge(*pmd)) { BUG_ON(flags & FOLL_GET); page = follow_huge_pmd(mm, address, pmd, flags & FOLL_WRITE); goto out; } Note that this code currently doesn't work as intended if the pmd refers to a huge page, the pmd_huge() check can not be reached if the page is huge. Extending pmd_bad() (and, for future 1GB page support, pud_bad()) to allow for the PSE bit being set fixes this. For similar reasons, allowing the NX bit being set is necessary, too. I have seen huge pages having the NX bit set in their pmd entry, which would cause the same problem. Signed-Off-By: Hans Rosenfeld <hans.rosenfeld@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	x86: no robust/pi futex for real i386 CPUs	Thomas Gleixner	2008-02-26
\| \| \| \| \| \| \| \|	Real i386 CPUs do not have cmpxchg instructions. Catch it before crashing on an invalid opcode. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: rename KERNEL_TEXT_SIZE => KERNEL_IMAGE_SIZE	Ingo Molnar	2008-02-26
\| \| \| \| \| \| \| \| \| \| \| \|	The KERNEL_TEXT_SIZE constant was mis-named, as we not only map the kernel text but data, bss and init sections as well. That name led me on the wrong path with the KERNEL_TEXT_SIZE regression, because i knew how big of _text_ my images have and i knew about the 40 MB "text" limit so i wrongly thought to be on the safe side of the 40 MB limit with my 29 MB of text, while the total image size was slightly above 40 MB. Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	x86: fix spontaneous reboot with allyesconfig bzImage	Ingo Molnar	2008-02-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	recently the 64-bit allyesconfig bzImage kernel started spontaneously rebooting during early bootup. after a few fun hours spent with early init debugging, it turns out that we've got this rather annoying limit on the size of the kernel image: #define KERNEL_TEXT_SIZE (4010241024) which limit my vmlinux just happened to pass: text data bss dec hex filename 29703744 4222751 8646224 42572719 2899baf vmlinux 40 MB is 42572719 bytes, so my vmlinux was just 1.5% above this limit :-/ So it happily crashed right in head_64.S, which - as we all know - is the most debuggable code in the whole architecture ;-) So increase the limit to allow an up to 128MB kernel image to be mapped. (should anyone be that crazy or lazy) We have a full 4K of pagetable (level2_kernel_pgt) allocated for these mappings already, so there's no RAM overhead and the limit was rather pointless and arbitrary. Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	x86: add comments for NOPs	H. Peter Anvin	2008-02-26
\| \| \| \| \| \| \| \|	Add comments describing the various NOP sequences. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: require family >= 6 if we are using P6 NOPs	H. Peter Anvin	2008-02-26
\| \| \| \| \| \| \| \| \|	The P6 family of NOPs are only available on family >= 6 or above, so enforce that in the boot code. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	lguest: include function prototypes	Harvey Harrison	2008-02-26
\| \| \| \| \| \| \| \| \| \| \|	Added a declaration to asm-x86/lguest.h and moved the extern arrays there as well. As an alternative to including asm/lguest.h directly, an include could be put in linux/lguest.h Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Cc: "rusty@rustcorp.com.au" <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	Remove empty file remnants that were left in the tree by mistake	Linus Torvalds	2008-02-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Noted by various people (Sam, Jeff, Roland..) Commit 58b7983d15a422d9616bdc4e245d5c31dfaefbe2 intended to remove the xfs "Makefile-linux-2.6" file, but it was mistakenly still left in the tree as a empty file, and would cause git to correctly complain about a tracked file being removed after a "make distclean" (which removes empty files as garbage). And the asm-x86/desc_64.h file was supposed to be removed by commit c81c6ca45a69478c7877b729af1942d2b80ef582, but instead stayed around containing just a single newline. Get rid of them both properly. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	x86: add pgd_large() on 64-bit, for consistency	H. Peter Anvin	2008-02-19
\| \| \| \| \| \| \| \| \|	In order to have it at all levels, add pgd_large() which only returns 0. Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: minor cleanup of comments in processor.h	Mike Travis	2008-02-19
\| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Mike Travis <travis@sgi.com> Cc: Christoph Lameter <clameter@sgi.com> Cc: Jack Steiner <steiner@sgi.com> Cc: linux-mm@kvack.org Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: make mxcsr_feature_mask static again	Adrian Bunk	2008-02-19
\| \| \| \| \| \| \| \|	Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Roland McGrath <roland@redhat.com> Cc: hpa@zytor.com Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	remove mca-pentium	Adrian Bunk	2008-02-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch removes the mca-pentium boot option that was a noop. besides the source code cleanup factor, this saves some text as well: arch/x86/kernel/cpu/bugs.o: text data bss dec hex filename 651 77 4 732 2dc bugs.o.before 631 53 4 688 2b0 bugs.o.after Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: zap invalid and unused pmds in early boot	Thomas Gleixner	2008-02-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The early boot code maps KERNEL_TEXT_SIZE (currently 40MB) starting from __START_KERNEL_map. The kernel itself only needs _text to _end mapped in the high alias. On relocatible kernels the ASM setup code adjusts the compile time created high mappings to the relocation. This creates invalid pmd entries for negative offsets: 0xffffffff80000000 -> pmd entry: ffffffffff2001e3 It points outside of the physical address space and is marked present. This starts at the virtual address __START_KERNEL_map and goes up to the point where the first valid physical address (0x0) is mapped. Zap the mappings before _text and after _end right away in early boot. This removes also the invalid entries. Furthermore it simplifies the range check for high aliases. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	x86: include proper prototypes for rodata_test	Harvey Harrison	2008-02-14
\| \| \| \| \| \| \| \| \| \|	extern should not appear in C files. Also, the definitions do not match the prototype currently, not sure what way you want to go with this, I've switched the prototype to return int, but I can see going to the void return as well. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	x86: make dump_pagetable() static	Adrian Bunk	2008-02-14
\| \| \| \| \| \| \| \|	dump_pagetable() can now become static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	x86: fix sigcontext.h user export	Ingo Molnar	2008-02-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Jakub Jelinek reported that some user-space code that relies on kernel headers has built dependency on the sigcontext->eip/rip register names - which have been unified in commit: commit 742fa54a62be6a263df14a553bf832724471dfbe Author: H. Peter Anvin <hpa@zytor.com> Date: Wed Jan 30 13:30:56 2008 +0100 x86: use generic register names in struct sigcontext so give the old layout to user-space. This is not particularly pretty, but it's an ABI so there's no danger of the two definitions getting out of sync. Reported-by: Jakub Jelinek <jakub@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	x86: introduce page pool in cpa	Thomas Gleixner	2008-02-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	DEBUG_PAGEALLOC was not possible on 64-bit due to its early-bootup hardcoded reliance on PSE pages, and the unrobustness of the runtime splitup of large pages. The splitup ended in recursive calls to alloc_pages() when a page for a pte split was requested. Avoid the recursion with a preallocated page pool, which is used to split up large mappings and gets refilled in the return path of kernel_map_pages after the split has been done. The size of the page pool is adjusted to the available memory. This part just implements the page pool and the initialization w/o using it yet. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	x86: construct 32-bit boot time page tables in native format.	Ian Campbell	2008-02-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Specifically the boot time page tables in a CONFIG_X86_PAE=y enabled kernel are in PAE format. early_ioremap is updated to use the standard page table accessors. Clear any mappings beyond max_low_pfn from the boot page tables in native_pagetable_setup_start because the initial mappings can extend beyond the range of physical memory and into the vmalloc area. Derived from patches by Eric Biederman and H. Peter Anvin. [ jeremy@goop.org: PAE swapper_pg_dir needs to be page-sized fix ] Signed-off-by: Ian Campbell <ijc@hellion.org.uk> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Mika PenttilÃÂ¤ <mika.penttila@kolumbus.fi> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: fix sparse warnings in acpi/bus.c	Harvey Harrison	2008-02-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add function definition and extern variables to asm-x86/acpi.h. All of these are used in bus.c in ifdef(CONFIG_X86) sections, so are only added to the x86 include headers. boot.c already includes acpi.h so no changes are needed there. Fixes the following: arch/x86/kernel/acpi/boot.c:83:4: warning: symbol 'acpi_sci_flags' was not declared. Should it be static? arch/x86/kernel/acpi/boot.c:84:5: warning: symbol 'acpi_sci_override_gsi' was not declared. Should it be static? arch/x86/kernel/acpi/boot.c:421:13: warning: symbol 'acpi_pic_sci_set_trigger' was not declared. Should it be static? Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: sparse warnings in pageattr.c	Harvey Harrison	2008-02-09
\| \| \| \| \| \| \| \| \| \|	Adjust the definition of lookup_address to take an unsigned long level argument. Adjust callers in xen/mmu.c that pass in a dummy variable. Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: GEODE: MFGPT: Use "just-in-time" detection for the MFGPT timers	Jordan Crouse	2008-02-09
\| \| \| \| \| \| \| \| \| \| \| \|	There isn't much value to always detecting the MFGPT timers on Geode platforms; detection is only needed when something wants to use the timers. Move the detection code so that it gets called the first time a timer is allocated. Signed-off-by: Jordan Crouse <jordan.crouse@amd.com> Signed-off-by: Andres Salomon <dilinger@debian.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: GEODE: MFGPT: make mfgpt_timer_setup available outside of mfgpt_32.c	Andres Salomon	2008-02-09
\| \| \| \| \| \| \| \| \| \|	We need to be called from elsewhere, and this gets some #ifdefs out of the .c file. Signed-off-by: Andres Salomon <dilinger@debian.org> Signed-off-by: Jordan Crouse <jordan.crouse@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: GEODE: MFGPT: drop module owner usage from MFGPT API	Andres Salomon	2008-02-09
\| \| \| \| \| \| \| \| \| \| \| \| \|	We had planned to use the 'owner' field for allowing re-allocation of MFGPTs; however, doing it by module owner name isn't flexible enough. So, drop this for now. If it turns out that we need timers in modules, we'll need to come up with a scheme that matches the write-once fields of the MFGPTx_SETUP register, and drops ponies from the sky. Signed-off-by: Andres Salomon <dilinger@debian.org> Signed-off-by: Jordan Crouse <jordan.crouse@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
*	x86: fix pgtable_t build breakage	Ingo Molnar	2008-02-08
\| \| \| \| \| \| \| \| \| \| \|	Commit 2f569afd9ced9ebec9a6eb3dbf6f83429be0a7b4 ("CONFIG_HIGHPTE vs. sub-page page tables") caused some build breakage due to pgtable_t only getting declared in the CONFIG_X86_PAE case. Move the declaration outside the PAE section. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	CONFIG_HIGHPTE vs. sub-page page tables.	Martin Schwidefsky	2008-02-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Background: I've implemented 1K/2K page tables for s390. These sub-page page tables are required to properly support the s390 virtualization instruction with KVM. The SIE instruction requires that the page tables have 256 page table entries (pte) followed by 256 page status table entries (pgste). The pgstes are only required if the process is using the SIE instruction. The pgstes are updated by the hardware and by the hypervisor for a number of reasons, one of them is dirty and reference bit tracking. To avoid wasting memory the standard pte table allocation should return 1K/2K (31/64 bit) and 2K/4K if the process is using SIE. Problem: Page size on s390 is 4K, page table size is 1K or 2K. That means the s390 version for pte_alloc_one cannot return a pointer to a struct page. Trouble is that with the CONFIG_HIGHPTE feature on x86 pte_alloc_one cannot return a pointer to a pte either, since that would require more than 32 bit for the return value of pte_alloc_one (and the pte * would not be accessible since its not kmapped). Solution: The only solution I found to this dilemma is a new typedef: a pgtable_t. For s390 pgtable_t will be a (pte ) - to be introduced with a later patch. For everybody else it will be a (struct page ). The additional problem with the initialization of the ptl lock and the NR_PAGETABLE accounting is solved with a constructor pgtable_page_ctor and a destructor pgtable_page_dtor. The page table allocation and free functions need to call these two whenever a page table page is allocated or freed. pmd_populate will get a pgtable_t instead of a struct page pointer. To get the pgtable_t back from a pmd entry that has been installed with pmd_populate a new function pmd_pgtable is added. It replaces the pmd_page call in free_pte_range and apply_to_pte_range. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	asm-*/posix_types.h: scrub __GLIBC__	Mike Frysinger	2008-02-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some arches (like alpha and ia64) already have a clean posix_types.h header. This brings all the others in line by removing all references to __GLIBC__ (and some undocumented __USE_ALL). Signed-off-by: Mike Frysinger <vapier@gentoo.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Ulrich Drepper <drepper@redhat.com> Cc: Roland McGrath <roland@redhat.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	aout: suppress A.OUT library support if !CONFIG_ARCH_SUPPORTS_AOUT	David Howells	2008-02-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Suppress A.OUT library support if CONFIG_ARCH_SUPPORTS_AOUT is not set. Not all architectures support the A.OUT binfmt, so the ELF binfmt should not be permitted to go looking for A.OUT libraries to load in such a case. Not only that, but under such conditions A.OUT core dumps are not produced either. To make this work, this patch also does the following: (1) Makes the existence of the contents of linux/a.out.h contingent on CONFIG_ARCH_SUPPORTS_AOUT. (2) Renames dump_thread() to aout_dump_thread() as it's only called by A.OUT core dumping code. (3) Moves aout_dump_thread() into asm/a.out-core.h and makes it inline. This is then included only where needed. This means that this bit of arch code will be stored in the appropriate A.OUT binfmt module rather than the core kernel. (4) Drops A.OUT support for Blackfin (according to Mike Frysinger it's not needed) and FRV. This patch depends on the previous patch to move STACK_TOP[_MAX] out of asm/a.out.h and into asm/processor.h as they're required whether or not A.OUT format is available. [jdike@addtoit.com: uml: re-remove accidentally restored code] Signed-off-by: David Howells <dhowells@redhat.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	aout: move STACK_TOP[_MAX] to asm/processor.h	David Howells	2008-02-08
\| \| \| \| \| \| \| \| \| \|	Move STACK_TOP[_MAX] out of asm/a.out.h and into asm/processor.h as they're required whether or not A.OUT format is available. Signed-off-by: David Howells <dhowells@redhat.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Add cmpxchg64 and cmpxchg64_local to x86_64	Mathieu Desnoyers	2008-02-07
\| \| \| \| \| \| \| \| \| \| \| \|	Make sure that at least cmpxchg64_local is available on all architectures to use for unsigned long long values. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Andi Kleen <ak@muc.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Sanitize the type of struct user.u_ar0	H. Peter Anvin	2008-02-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	struct user.u_ar0 is defined to contain a pointer offset on all architectures in which it is defined (all architectures which define an a.out format except SPARC.) However, it has a pointer type in the headers, which is pointless -- <asm/user.h> is not exported to userspace, and it just makes the code messy. Redefine the field as "unsigned long" (which is the same size as a pointer on all Linux architectures) and change the setting code to user offsetof() instead of hand-coded arithmetic. Cc: Linux Arch Mailing List <linux-arch@vger.kernel.org> Cc: Bryan Wu <bryan.wu@analog.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Lennert Buytenhek <kernel@wantstofly.org> Cc: Håvard Skinnemoen <hskinnemoen@atmel.com> Cc: Mikael Starvik <starvik@axis.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Tony Luck <tony.luck@intel.com> Cc: Hirokazu Takata <takata@linux-m32r.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Cleanup asm/{elf,page,user}.h: #ifdef __KERNEL__ is no longer needed	Kirill A. Shutemov	2008-02-07
\| \| \| \| \| \| \| \| \| \| \| \| \|	asm/elf.h, asm/page.h and asm/user.h don't export to userspace now, so we can drop #ifdef __KERNEL__ for them. [k.shutemov@gmail.com: remove #ifdef __KERNEL_] Signed-off-by: Kirill A. Shutemov <k.shutemov@gmail.com> Reviewed-by: David Woodhouse <dwmw2@infradead.org> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Kirill A. Shutemov <k.shutemov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>