aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* [PATCH] oom: less memdieNick Piggin2006-12-07
| | | | | | | | | | | Don't cause all threads in all other thread groups to gain TIF_MEMDIE otherwise we'll get a thundering herd eating our memory reserve. This may not be the optimal scheme, but it fits our policy of allowing just one TIF_MEMDIE in the system at once. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] oom: cleanup messagesNick Piggin2006-12-07
| | | | | | | | Clean up the OOM killer messages to be more consistent. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] oom: don't kill unkillable children or siblingsNick Piggin2006-12-07
| | | | | | | | | | Abort the kill if any of our threads have OOM_DISABLE set. Having this test here also prevents any OOM_DISABLE child of the "selected" process from being killed. Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] memory page_alloc zonelist caching reorder structurePaul Jackson2006-12-07
| | | | | | | | | | | | | | | | | | | | | | Rearrange the struct members in the 'struct zonelist_cache' structure, so as to put the readonly (once initialized) z_to_n[] array first, where it will come right after the zones[] array in struct zonelist. This pretty much eliminates the chance that the two frequently written elements of 'struct zonelist_cache', the fullzones bitmap and last_full_zap times, will end up on the same cache line as the performance sensitive, frequently read, never (after init) written zones[] array. Keeping frequently written data off frequently read cache lines is good for performance. Thanks to Rohit Seth for the suggestion. Signed-off-by: Paul Jackson <pj@sgi.com> Cc: Rohit Seth <rohitseth@google.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] memory page_alloc zonelist caching speedupPaul Jackson2006-12-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Optimize the critical zonelist scanning for free pages in the kernel memory allocator by caching the zones that were found to be full recently, and skipping them. Remembers the zones in a zonelist that were short of free memory in the last second. And it stashes a zone-to-node table in the zonelist struct, to optimize that conversion (minimize its cache footprint.) Recent changes: This differs in a significant way from a similar patch that I posted a week ago. Now, instead of having a nodemask_t of recently full nodes, I have a bitmask of recently full zones. This solves a problem that last weeks patch had, which on systems with multiple zones per node (such as DMA zone) would take seeing any of these zones full as meaning that all zones on that node were full. Also I changed names - from "zonelist faster" to "zonelist cache", as that seemed to better convey what we're doing here - caching some of the key zonelist state (for faster access.) See below for some performance benchmark results. After all that discussion with David on why I didn't need them, I went and got some ;). I wanted to verify that I had not hurt the normal case of memory allocation noticeably. At least for my one little microbenchmark, I found (1) the normal case wasn't affected, and (2) workloads that forced scanning across multiple nodes for memory improved up to 10% fewer System CPU cycles and lower elapsed clock time ('sys' and 'real'). Good. See details, below. I didn't have the logic in get_page_from_freelist() for various full nodes and zone reclaim failures correct. That should be fixed up now - notice the new goto labels zonelist_scan, this_zone_full, and try_next_zone, in get_page_from_freelist(). There are two reasons I persued this alternative, over some earlier proposals that would have focused on optimizing the fake numa emulation case by caching the last useful zone: 1) Contrary to what I said before, we (SGI, on large ia64 sn2 systems) have seen real customer loads where the cost to scan the zonelist was a problem, due to many nodes being full of memory before we got to a node we could use. Or at least, I think we have. This was related to me by another engineer, based on experiences from some time past. So this is not guaranteed. Most likely, though. The following approach should help such real numa systems just as much as it helps fake numa systems, or any combination thereof. 2) The effort to distinguish fake from real numa, using node_distance, so that we could cache a fake numa node and optimize choosing it over equivalent distance fake nodes, while continuing to properly scan all real nodes in distance order, was going to require a nasty blob of zonelist and node distance munging. The following approach has no new dependency on node distances or zone sorting. See comment in the patch below for a description of what it actually does. Technical details of note (or controversy): - See the use of "zlc_active" and "did_zlc_setup" below, to delay adding any work for this new mechanism until we've looked at the first zone in zonelist. I figured the odds of the first zone having the memory we needed were high enough that we should just look there, first, then get fancy only if we need to keep looking. - Some odd hackery was needed to add items to struct zonelist, while not tripping up the custom zonelists built by the mm/mempolicy.c code for MPOL_BIND. My usual wordy comments below explain this. Search for "MPOL_BIND". - Some per-node data in the struct zonelist is now modified frequently, with no locking. Multiple CPU cores on a node could hit and mangle this data. The theory is that this is just performance hint data, and the memory allocator will work just fine despite any such mangling. The fields at risk are the struct 'zonelist_cache' fields 'fullzones' (a bitmask) and 'last_full_zap' (unsigned long jiffies). It should all be self correcting after at most a one second delay. - This still does a linear scan of the same lengths as before. All I've optimized is making the scan faster, not algorithmically shorter. It is now able to scan a compact array of 'unsigned short' in the case of many full nodes, so one cache line should cover quite a few nodes, rather than each node hitting another one or two new and distinct cache lines. - If both Andi and Nick don't find this too complicated, I will be (pleasantly) flabbergasted. - I removed the comment claiming we only use one cachline's worth of zonelist. We seem, at least in the fake numa case, to have put the lie to that claim. - I pay no attention to the various watermarks and such in this performance hint. A node could be marked full for one watermark, and then skipped over when searching for a page using a different watermark. I think that's actually quite ok, as it will tend to slightly increase the spreading of memory over other nodes, away from a memory stressed node. =============== Performance - some benchmark results and analysis: This benchmark runs a memory hog program that uses multiple threads to touch alot of memory as quickly as it can. Multiple runs were made, touching 12, 38, 64 or 90 GBytes out of the total 96 GBytes on the system, and using 1, 19, 37, or 55 threads (on a 56 CPU system.) System, user and real (elapsed) timings were recorded for each run, shown in units of seconds, in the table below. Two kernels were tested - 2.6.18-mm3 and the same kernel with this zonelist caching patch added. The table also shows the percentage improvement the zonelist caching sys time is over (lower than) the stock *-mm kernel. number 2.6.18-mm3 zonelist-cache delta (< 0 good) percent GBs N ------------ -------------- ---------------- systime mem threads sys user real sys user real sys user real better 12 1 153 24 177 151 24 176 -2 0 -1 1% 12 19 99 22 8 99 22 8 0 0 0 0% 12 37 111 25 6 112 25 6 1 0 0 -0% 12 55 115 25 5 110 23 5 -5 -2 0 4% 38 1 502 74 576 497 73 570 -5 -1 -6 0% 38 19 426 78 48 373 76 39 -53 -2 -9 12% 38 37 544 83 36 547 82 36 3 -1 0 -0% 38 55 501 77 23 511 80 24 10 3 1 -1% 64 1 917 125 1042 890 124 1014 -27 -1 -28 2% 64 19 1118 138 119 965 141 103 -153 3 -16 13% 64 37 1202 151 94 1136 150 81 -66 -1 -13 5% 64 55 1118 141 61 1072 140 58 -46 -1 -3 4% 90 1 1342 177 1519 1275 174 1450 -67 -3 -69 4% 90 19 2392 199 192 2116 189 176 -276 -10 -16 11% 90 37 3313 238 175 2972 225 145 -341 -13 -30 10% 90 55 1948 210 104 1843 213 100 -105 3 -4 5% Notes: 1) This test ran a memory hog program that started a specified number N of threads, and had each thread allocate and touch 1/N'th of the total memory to be used in the test run in a single loop, writing a constant word to memory, one store every 4096 bytes. Watching this test during some earlier trial runs, I would see each of these threads sit down on one CPU and stay there, for the remainder of the pass, a different CPU for each thread. 2) The 'real' column is not comparable to the 'sys' or 'user' columns. The 'real' column is seconds wall clock time elapsed, from beginning to end of that test pass. The 'sys' and 'user' columns are total CPU seconds spent on that test pass. For a 19 thread test run, for example, the sum of 'sys' and 'user' could be up to 19 times the number of 'real' elapsed wall clock seconds. 3) Tests were run on a fresh, single-user boot, to minimize the amount of memory already in use at the start of the test, and to minimize the amount of background activity that might interfere. 4) Tests were done on a 56 CPU, 28 Node system with 96 GBytes of RAM. 5) Notice that the 'real' time gets large for the single thread runs, even though the measured 'sys' and 'user' times are modest. I'm not sure what that means - probably something to do with it being slow for one thread to be accessing memory along ways away. Perhaps the fake numa system, running ostensibly the same workload, would not show this substantial degradation of 'real' time for one thread on many nodes -- lets hope not. 6) The high thread count passes (one thread per CPU - on 55 of 56 CPUs) ran quite efficiently, as one might expect. Each pair of threads needed to allocate and touch the memory on the node the two threads shared, a pleasantly parallizable workload. 7) The intermediate thread count passes, when asking for alot of memory forcing them to go to a few neighboring nodes, improved the most with this zonelist caching patch. Conclusions: * This zonelist cache patch probably makes little difference one way or the other for most workloads on real numa hardware, if those workloads avoid heavy off node allocations. * For memory intensive workloads requiring substantial off-node allocations on real numa hardware, this patch improves both kernel and elapsed timings up to ten per-cent. * For fake numa systems, I'm optimistic, but will have to leave that up to Rohit Seth to actually test (once I get him a 2.6.18 backport.) Signed-off-by: Paul Jackson <pj@sgi.com> Cc: Rohit Seth <rohitseth@google.com> Cc: Christoph Lameter <clameter@engr.sgi.com> Cc: David Rientjes <rientjes@cs.washington.edu> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Get rid of zone_table[]Christoph Lameter2006-12-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The zone table is mostly not needed. If we have a node in the page flags then we can get to the zone via NODE_DATA() which is much more likely to be already in the cpu cache. In case of SMP and UP NODE_DATA() is a constant pointer which allows us to access an exact replica of zonetable in the node_zones field. In all of the above cases there will be no need at all for the zone table. The only remaining case is if in a NUMA system the node numbers do not fit into the page flags. In that case we make sparse generate a table that maps sections to nodes and use that table to to figure out the node number. This table is sized to fit in a single cache line for the known 32 bit NUMA platform which makes it very likely that the information can be obtained without a cache miss. For sparsemem the zone table seems to be have been fairly large based on the maximum possible number of sections and the number of zones per node. There is some memory saving by removing zone_table. The main benefit is to reduce the cache foootprint of the VM from the frequent lookups of zones. Plus it simplifies the page allocator. [akpm@osdl.org: build fix] Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Dave Hansen <haveblue@us.ibm.com> Cc: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] __unmap_hugepage_range(): add commentChen, Kenneth W2006-12-07
| | | | | | | | Signed-off-by: Ken Chen <kenneth.w.chen@intel.com> Cc: David Gibson <david@gibson.dropbear.id.au> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] memory page alloc minor cleanupsPaul Jackson2006-12-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - s/freeliest/freelist/ spelling fix - Check for NULL *z zone seems useless - even if it could happen, so what? Perhaps we should have a check later on if we are faced with an allocation request that is not allowed to fail - shouldn't that be a serious kernel error, passing an empty zonelist with a mandate to not fail? - Initializing 'z' to zonelist->zones can wait until after the first get_page_from_freelist() fails; we only use 'z' in the wakeup_kswapd() loop, so let's initialize 'z' there, in a 'for' loop. Seems clearer. - Remove superfluous braces around a break - Fix a couple errant spaces - Adjust indentation on the cpuset_zone_allowed() check, to match the lines just before it -- seems easier to read in this case. - Add another set of braces to the zone_watermark_ok logic From: Paul Jackson <pj@sgi.com> Backout one item from a previous "memory page_alloc minor cleanups" patch. Until and unless we are certain that no one can ever pass an empty zonelist to __alloc_pages(), this check for an empty zonelist (or some BUG equivalent) is essential. The code in get_page_from_freelist() blow ups if passed an empty zonelist. Signed-off-by: Paul Jackson <pj@sgi.com> Acked-by: Christoph Lameter <clameter@sgi.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Paul Jackson <pj@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] uml: workqueue build fixAndrew Morton2006-12-07
| | | | | | | | | | arch/um/drivers/chan_kern.c:643: error: conflicting types for 'chan_interrupt' arch/um/include/chan_kern.h:31: error: previous declaration of 'chan_interrupt' Cc: David Howells <dhowells@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] skip data conversion in compat_sys_mount when data_page is NULLAndrey Mirkin2006-12-07
| | | | | | | | | | | | | | | | | | | OpenVZ Linux kernel team has found a problem with mounting in compat mode. Simple command "mount -t smbfs ..." on Fedora Core 5 distro in 32-bit mode leads to oops: Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: compat_sys_mount+0xd6/0x290 Process mount (pid: 14656, veid=300, threadinfo ffff810034d30000, task ffff810034c86bc0) Call Trace: ia32_sysret+0x0/0xa The problem is that data_page pointer can be NULL, so we should skip data conversion in this case. Signed-off-by: Andrey Mirkin <amirkin@openvz.org> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] drm-sis linkage fixAndrew Morton2006-12-07
| | | | | | | | | | | | Fix http://bugzilla.kernel.org/show_bug.cgi?id=7606 WARNING: "drm_sman_set_manager" [drivers/char/drm/sis.ko] undefined! Cc: <daniel-silveira@gee.inatel.br> Cc: Dave Airlie <airlied@linux.ie> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] add bottom_half.hAndrew Morton2006-12-07
| | | | | | | | | | | | | | | | | | | | With CONFIG_SMP=n: drivers/input/ff-memless.c:384: warning: implicit declaration of function 'local_bh_disable' drivers/input/ff-memless.c:393: warning: implicit declaration of function 'local_bh_enable' Really linux/spinlock.h should include linux/interrupt.h. But interrupt.h includes sched.h which will need spinlock.h. So the patch breaks the _bh declarations out into a separate header and includes it in both interrupt.h and spinlock.h. Cc: "Randy.Dunlap" <rdunlap@xenotime.net> Cc: Andi Kleen <ak@suse.de> Cc: <stable@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] A few small additions and corrections to READMEJesper Juhl2006-12-06
| | | | | | | | | | | | | | Here's a small patch which - adds a few archs to the current list of supported platforms. - adds a few missing slashes at the end of URLs. - adds a few references to additional documentation. - adds "make config" to the list of possible configuration targets. - makes a few other minor changes. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> [ Ben Nizette <ben.nizette@iinet.net.au> points out AVR32 arch too ] Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Clean up 'make help' output for documentation targets.Jesper Juhl2006-12-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Here's a patch that cleans up the "make help" output a bit for the documentation targets. Currently the documentation targets are listed completely different than all the other targets : Documentation targets: Linux kernel internal documentation in different formats: xmldocs (XML DocBook), psdocs (Postscript), pdfdocs (PDF) htmldocs (HTML), mandocs (man pages, use installmandocs to install) with this patch they are more in line with the rest of the output : Documentation targets: Linux kernel internal documentation in different formats: htmldocs - HTML installmandocs - install man pages generated by mandocs mandocs - man pages pdfdocs - PDF psdocs - Postscript xmldocs - XML DocBook Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linusLinus Torvalds2006-12-06
|\ | | | | | | | | | | | | | | | | | | * 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: [MIPS] Import updates from i386's i8259.c [MIPS] *-berr: Header inclusions for DEC bus error handlers [MIPS] Compile __do_IRQ() when really needed [MIPS] genirq: use name instead of typename [MIPS] Do not use handle_level_irq for ioasic_dma_irq_type. [MIPS] pte_offset(dir,addr): parenthesis fix
| * [MIPS] Import updates from i386's i8259.cAtsushi Nemoto2006-12-06
| | | | | | | | | | | | | | Import many updates from i386's i8259.c, especially genirq transitions. Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * [MIPS] *-berr: Header inclusions for DEC bus error handlersMaciej W. Rozycki2006-12-06
| | | | | | | | | | | | | | | | A fixup to add missing header inclusions for bus error handlers for DECstation system after the recent switch to get_irq_regs(). Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * [MIPS] Compile __do_IRQ() when really neededFranck Bui-Huu2006-12-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __do_IRQ() is needed only by irq handlers that can't use default handlers defined in kernel/irq/chip.c. For others platforms there's no need to compile this function since it won't be used. For those platforms this patch defines GENERIC_HARDIRQS_NO__DO_IRQ symbol which is used exactly for this purpose. Futhermore for platforms which do not use __do_IRQ(), end() method which is part of the 'irq_chip' structure is not used. This patch simply removes this method in this case. Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * [MIPS] genirq: use name instead of typenameAtsushi Nemoto2006-12-06
| | | | | | | | | | | | | | The "typename" field was obsoleted by the "name" field. Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * [MIPS] Do not use handle_level_irq for ioasic_dma_irq_type.Atsushi Nemoto2006-12-06
| | | | | | | | | | Signed-off-by: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * [MIPS] pte_offset(dir,addr): parenthesis fixFranck Bui-Huu2006-12-06
| | | | | | | | | | | | | | | | | | | | | | This patch adds missing parenthesis around 'dir' argument in pte_offset() macro definition. It also removes an extra space in the definition of pte_offset_kernel() macro. Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* | Merge branch 'release' of ↵Linus Torvalds2006-12-06
|\ \ | | | | | | | | | | | | | | | | | | master.kernel.org:/home/ftp/pub/scm/linux/kernel/git/aegl/linux-2.6 * 'release' of master.kernel.org:/home/ftp/pub/scm/linux/kernel/git/aegl/linux-2.6: [IA64] Fix pci.c kernel compilation breakage.
| * | [IA64] Fix pci.c kernel compilation breakage.Peter Chubb2006-12-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The recent change to convert the is_enabled flag in the PCI device to an atomic count broke the IA64 compilation. As pcibios_disable_device is only ever called if the reference count is zero, convert the if to a BUG_ON. Signed-off-by: Peter Chubb <peterc@gelato.unsw.edu.au> Signed-off-by: Tony Luck <tony.luck@intel.com>
* | | [PATCH] ... and then some more work_struct-induced breakage (ibmvscsi)Al Viro2006-12-06
| | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] ... and more work_struct-induced breakage (mips)Al Viro2006-12-06
| | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] More work_struct induced breakage (s390)Al Viro2006-12-06
| | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | x86[-64]:Remove 'volatile' from atomic_tLinus Torvalds2006-12-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Any code that relies on the volatile would be a bug waiting to happen anyway. Don't encourage people to think that putting 'volatile' on data structures somehow fixes problems. We should always use proper locking (and other serialization) techniques. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | [PATCH] Remove 'volatile' from spinlock_typesArt Haas2006-12-06
|/ / | | | | | | | | | | | | | | | | | | | | | | | | This is a resubmission of patches originally created by Ingo Molnar. The link below is the initial (?) posting of the patch. http://marc.theaimsgroup.com/?l=linux-kernel&m=115217423929806&w=2 Remove 'volatile' from spinlock_types as it causes GCC to generate bad code (see link) and locking should be used on kernel data. Signed-off-by: Art Haas <ahaas@airmail.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] drivers/{char|isdn}: work_struct-induced breakageAl Viro2006-12-06
| | | | | | | | | | | | | | part 1 of fsck-knows-how-many Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] hamradio/dmascc: fix up work_struct-induced breakageAl Viro2006-12-06
| | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | Merge master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6Linus Torvalds2006-12-06
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6: (43 commits) sh: sh775x/titan fixes for irq header changes. sh: update r7780rp defconfig. sh: compile fixes for header cleanup. sh: Fixup pte_mkhuge() build failure. sh: set KBUILD_IMAGE to something sensible. sh: show held locks in stack trace with lockdep. sh: platform_pata support for R7780RP sh: stacktrace/lockdep/irqflags tracing support. sh: Fixup movli.l/movco.l atomic ops for gcc4. sh: dyntick infrastructure. sh: Clock framework tidying. sh: Turn off IRQs around get_timer_offset() calls. sh: Get the PGD right in oops case with 64-bit PTEs. sh: Fix store queue bitmap end. sh: More flexible + SH7780 earlyprintk SCIF support. sh: Fixup various PAGE_SIZE == 4096 assumptions. sh: Fixup 4K irq stacks. sh: dma-api channel capability extensions. sh: Drop name overload in dma-sh. sh: Make dma-isa depend on ISA_DMA_API. ...
| * | sh: sh775x/titan fixes for irq header changes.Jamie Lenehan2006-12-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following moves the creation of IPR interupts into setup-7750.c and updates a few other things to make it all work after the "Drop CPU subtype IRQ headers" commit. It boots and runs fine on my titan board. - adds an ipr_idx to the ipr_data and uses a function in the subtype code to calculate the address of the IPR registers - adds a function to enable individual interrupt mode for externals in the subtype code and calls that from the titan board code instead of doing it directly. - I changed the shift in the ipr_data to be the actual # of bits to shift, instead of the numnber / 4 - made it easier to match with the manual. Signed-off-by: Jamie Lenehan <lenehan@twibble.org> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | sh: update r7780rp defconfig.Paul Mundt2006-12-05
| | | | | | | | | | | | Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | sh: compile fixes for header cleanup.Paul Mundt2006-12-05
| | | | | | | | | | | | | | | | | | | | | Since some header inclusion paths were cleaned up, compilation broke. Add in the headers we need directly to build again. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | sh: Fixup pte_mkhuge() build failure.Paul Mundt2006-12-05
| | | | | | | | | | | | | | | | | | | | | | | | When hugetlbpage support isn't enabled, this can be bogus. Wrap it back in _PAGE_FLAGS_HARD to avoid changes to the base PTE when not aiming for larger sizes. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | sh: set KBUILD_IMAGE to something sensible.Paul Mundt2006-12-05
| | | | | | | | | | | | | | | | | | This was missing for sh too, wire it up.. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | sh: show held locks in stack trace with lockdep.Paul Mundt2006-12-05
| | | | | | | | | | | | | | | | | | Follows the same change as other architectures.. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | sh: platform_pata support for R7780RPPaul Mundt2006-12-05
| | | | | | | | | | | | | | | | | | | | | | | | This adds a platform device for the directly connected CF interface on R7780RP boards, for use with the pata_platform libata driver. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | sh: stacktrace/lockdep/irqflags tracing support.Paul Mundt2006-12-05
| | | | | | | | | | | | | | | | | | Wire up all of the essentials for lockdep.. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | sh: Fixup movli.l/movco.l atomic ops for gcc4.Paul Mundt2006-12-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gcc4 gets a bit pissy about the outputs: include/asm/atomic.h: In function 'atomic_add': include/asm/atomic.h:37: error: invalid lvalue in asm statement include/asm/atomic.h:30: error: invalid lvalue in asm output 1 ... this ended up being a thinko anyways, so just fix it up. Verified for proper behaviour with the older toolchains, too. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | sh: dyntick infrastructure.Paul Mundt2006-12-05
| | | | | | | | | | | | | | | | | | | | | | | | This adds basic NO_IDLE_HZ support to the SH timer API so timers are able to wire it up. Taken from the ARM version, as it fit in to our API with very few changes needed. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | sh: Clock framework tidying.Paul Mundt2006-12-05
| | | | | | | | | | | | | | | | | | | | | | | | This syncs up the SH clock framework with the linux/clk.h API, for which there were only some minor changes required, namely the clk_get() dev_id and subsequent callsites. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | sh: Turn off IRQs around get_timer_offset() calls.Paul Mundt2006-12-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | Since all of the sys_timer sources currently do this on their own within the ->get_offset() path, it's more sensible to just have the caller take care of it when grabbing xtime_lock. Incidentally, this is more in line with what others (ie, ARM) are doing already. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | sh: Get the PGD right in oops case with 64-bit PTEs.Paul Mundt2006-12-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously this was using a static pgd shift in the reporting code, simply flip this to PGDIR_SHIFT which does the right thing depending on varying PTE magnitudes on the SH-X2 MMU. While we're at it, and since it's been recently added, use get_TTB() for fetching the TTB, rather than the open coded instructions. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | sh: Fix store queue bitmap end.Paul Mundt2006-12-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The end of the store queue bitmap is miscalculated when searching for a free range in sq_remap(), missing the PAGE_SHIFT shift that's done in sq_api_init(). This runs in to workloads where we can scan beyond the end of the bitmap. Spotted by Paul Jackson: http://marc.theaimsgroup.com/?l=linux-kernel&m=116493191224097&w Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | sh: More flexible + SH7780 earlyprintk SCIF support.Paul Mundt2006-12-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This makes the early printk support somewhat more flexible, moving the port definition to a config option, and making the port initialization configurable for sh-ipl+g users. At the same time, this allows us to trivially wire up the SH7780 SCIF0, so that's thrown in too more or less for free. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | sh: Fixup various PAGE_SIZE == 4096 assumptions.Paul Mundt2006-12-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There were a number of places that made evil PAGE_SIZE == 4k assumptions that ended up breaking when trying to play with 8k and 64k page sizes, this fixes those up. The most significant change is the way we load THREAD_SIZE, previously this was done via: mov #(THREAD_SIZE >> 8), reg shll8 reg to avoid a memory access and allow the immediate load. With a 64k PAGE_SIZE, we're out of range for the immediate load size without resorting to special instructions available in later ISAs (movi20s and so on). The "workaround" for this is to bump up the shift to 10 and insert a shll2, which gives a bit more flexibility while still being much cheaper than a memory access. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | sh: Fixup 4K irq stacks.Paul Mundt2006-12-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There was a clobber issue with the register we were saving the stack in, so we switch to a register that we handle in the clobber list properly already. This also follows the x86 changes for allowing the softirq checks from hardirq context. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | sh: dma-api channel capability extensions.Mark Glaisher2006-12-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This extends the SH DMA API for allowing handling of DMA channels based off of their respective capabilities. A couple of functions are added to the existing API, the core bits are register_chan_caps() for registering channel capabilities, and request_dma_bycap() for fetching a channel dynamically based off of a capability set. Signed-off-by: Mark Glaisher <mark.glaisher@st.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
| * | sh: Drop name overload in dma-sh.Paul Mundt2006-12-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pass along the dev_id from request_dma() all the way down, rather than inserting an artificial name relating to the TEI line that we were doing before. This makes the line a bit less obvious, but dev_id is the proper behaviour for this regardless. Signed-off-by: Paul Mundt <lethal@linux-sh.org>