aboutsummaryrefslogtreecommitdiffstats
path: root/mm/slab.c
Commit message (Collapse)AuthorAge
* mm/slab.c (non-NUMA): Fix compile warning and clean up codeLinus Torvalds2006-02-05
| | | | | | | | | | | | | | The non-NUMA case would do an unmatched "free_alien_cache()" on an alien pointer that had never been allocated. It might not matter from a code generation standpoint (since in the non-NUMA case, the code doesn't actually _do_ anything), but it not only results in a compiler warning, it's really really ugly too. Fix the compiler warning by just having a matching dummy allocation. That also avoids an unnecessary #ifdef in the code. Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] NUMA slab locking fixes: fix cpu down and up lockingRavikiran G Thirumalai2006-02-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes locking and bugs in cpu_down and cpu_up paths of the NUMA slab allocator. Sonny Rao <sonny@burdell.org> reported problems sometime back on POWER5 boxes, when the last cpu on the nodes were being offlined. We could not reproduce the same on x86_64 because the cpumask (node_to_cpumask) was not being updated on cpu down. Since that issue is now fixed, we can reproduce Sonny's problems on x86_64 NUMA, and here is the fix. The problem earlier was on CPU_DOWN, if it was the last cpu on the node to go down, the array_caches (shared, alien) and the kmem_list3 of the node were being freed (kfree) with the kmem_list3 lock held. If the l3 or the array_caches were to come from the same cache being cleared, we hit on badness. This patch cleans up the locking in cpu_up and cpu_down path. We cannot really free l3 on cpu down because, there is no node offlining yet and even though a cpu is not yet up, node local memory can be allocated for it. So l3s are usually allocated at keme_cache_create and destroyed at kmem_cache_destroy. Hence, we don't need cachep->spinlock protection to get to the cachep->nodelist[nodeid] either. Patch survived onlining and offlining on a 4 core 2 node Tyan box with a 4 dbench process running all the time. Signed-off-by: Alok N Kataria <alokk@calsoftinc.com> Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Cc: Christoph Lameter <christoph@lameter.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] NUMA slab locking fixes: irq disabling from cahep->spinlock to l3 lockRavikiran G Thirumalai2006-02-05
| | | | | | | | | | | | | | | | | Earlier, we had to disable on chip interrupts while taking the cachep->spinlock because, at cache_grow, on every addition of a slab to a slab cache, we incremented colour_next which was protected by the cachep->spinlock, and cache_grow could occur at interrupt context. Since, now we protect the per-node colour_next with the node's list_lock, we do not need to disable on chip interrupts while taking the per-cache spinlock, but we just need to disable interrupts when taking the per-node kmem_list3 list_lock. Signed-off-by: Alok N Kataria <alokk@calsoftinc.com> Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Shai Fultheim <shai@scalex86.org> Cc: Christoph Lameter <christoph@lameter.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] NUMA slab locking fixes: move color_next to l3Ravikiran G Thirumalai2006-02-05
| | | | | | | | | | | | | | | | | | | colour_next is used as an index to add a colouring offset to a new slab in the cache (colour_off * colour_next). Now with the NUMA aware slab allocator, it makes sense to colour slabs added on the same node sequentially with colour_next. This patch moves the colouring index "colour_next" per-node by placing it on kmem_list3 rather than kmem_cache. This also helps simplify locking for CPU up and down paths. Signed-off-by: Alok N Kataria <alokk@calsoftinc.com> Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org> Signed-off-by: Shai Fultheim <shai@scalex86.org> Cc: Christoph Lameter <christoph@lameter.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: fix sparse warningRandy Dunlap2006-02-01
| | | | | | | | mm/slab.c:1522:13: error: incompatible types for operation (&) Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] mm/slab: add kernel-doc for one functionRandy.Dunlap2006-02-01
| | | | | | | | Fix kernel-doc for calculate_slab_order(). Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: fix kzalloc and kstrdup caller report for CONFIG_DEBUG_SLABPekka Enberg2006-02-01
| | | | | | | | | | | | | | Fix kzalloc() and kstrdup() caller report for CONFIG_DEBUG_SLAB. We must pass the caller to __cache_alloc() instead of directly doing __builtin_return_address(0) there; otherwise kzalloc() and kstrdup() are reported as the allocation site instead of the real one. Thanks to Valdis Kletnieks for reporting the problem and Steven Rostedt for the original idea. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: replace kmem_cache_t with struct kmem_cachePekka Enberg2006-02-01
| | | | | | | | Replace uses of kmem_cache_t with proper struct kmem_cache in mm/slab.c. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: rename ac_data to cpu_cache_getPekka Enberg2006-02-01
| | | | | | | | | Rename the ac_data() function to more descriptive cpu_cache_get(). Acked-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: extract virt_to_{cache|slab}Pekka Enberg2006-02-01
| | | | | | | | | | | Introduce virt_to_cache() and virt_to_slab() functions to reduce duplicate code and introduce a proper abstraction should we want to support other kind of mapping for address to slab and cache (eg. for vmalloc() or I/O memory). Acked-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: reduce inliningPekka Enberg2006-02-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | From: Manfred Spraul <manfred@colorfullife.com> Reduce the amount of inline functions in slab to the functions that are used in the hot path: - no inline for debug functions - no __always_inline, inline is already __always_inline - remove inline from a few numa support functions. Before: text data bss dec hex filename 13588 752 48 14388 3834 mm/slab.o (defconfig) 16671 2492 48 19211 4b0b mm/slab.o (numa) After: text data bss dec hex filename 13366 752 48 14166 3756 mm/slab.o (defconfig) 16230 2492 48 18770 4952 mm/slab.o (numa) Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: extract slab_{put|get}_objMatthew Dobson2006-02-01
| | | | | | | | | | | Create two helper functions slab_get_obj() and slab_put_obj() to replace duplicated code in mm/slab.c Signed-off-by: Matthew Dobson <colpatch@us.ibm.com> Acked-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: extract slab_destroy_objs()Matthew Dobson2006-02-01
| | | | | | | | | | | | Create a helper function, slab_destroy_objs() which called from slab_destroy(). This makes slab_destroy() smaller and more readable, and moves ifdefs outside the function body. Signed-off-by: Matthew Dobson <colpatch@us.ibm.com> Acked-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: cache_estimate cleanupSteven Rostedt2006-02-01
| | | | | | | | | | | | | | | | | | | | Clean up cache_estimate() in mm/slab.c and improves the algorithm from O(n) to O(1). We first calculate the maximum number of objects a slab can hold after struct slab and kmem_bufctl_t for each object has been given enough space. After that, to respect alignment rules, we decrease the number of objects if necessary. As required padding is at most align-1 and memory of obj_size is at least align, it is always enough to decrease number of objects by one. The optimization was originally made by Balbir Singh with more improvements from Steven Rostedt. Manfred Spraul provider further modifications: no loop at all for the off-slab case and added comments to explain the background. Acked-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: have index_of bug at compile timeSteven Rostedt2006-02-01
| | | | | | | | | | | | | | I noticed the code for index_of is a creative way of finding the cache index using the compiler to optimize to a single hard coded number. But I couldn't help noticing that it uses two methods to let you know that someone used it wrong. One is at compile time (the correct way), and the other is at run time (not good). Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: minor cleanup to kmem_cache_alloc_nodeChristoph Lameter2006-02-01
| | | | | | | | | | Clean up kmem_cache_alloc_node a bit. Signed-off-by: Christoph Lameter <clameter@sgi.com> Acked-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: distinguish between object and buffer sizeManfred Spraul2006-02-01
| | | | | | | | | | | | | | An object cache has two different object lengths: - the amount of memory available for the user (object size) - the amount of memory allocated internally (buffer size) This patch does some renames to make the code reflect that better. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Use 32 bit division in slab_put_obj()Benjamin LaHaise2006-02-01
| | | | | | | | | | | | | | Improve the performance of slab_put_obj(). Without the cast, gcc considers ptrdiff_t a 64 bit signed integer and ends up emitting code to use a full signed 128 bit divide on EM64T, which is substantially slower than a 32 bit unsigned divide. I noticed this when looking at the profile of a case where the slab balance is just on edge and thrashes back and forth freeing a block. Signed-off-by: Benjamin LaHaise <benjamin.c.lahaise@intel.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] mm: optimize numa policy handling in slab allocatorChristoph Lameter2006-01-18
| | | | | | | | | Move the interrupt check from slab_node into ___cache_alloc and adds an "unlikely()" to avoid pipeline stalls on some architectures. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] NUMA policies in the slab allocator V2Christoph Lameter2006-01-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a regression in 2.6.14 against 2.6.13 that causes an imbalance in memory allocation during bootup. The slab allocator in 2.6.13 is not numa aware and simply calls alloc_pages(). This means that memory policies may control the behavior of alloc_pages(). During bootup the memory policy is set to MPOL_INTERLEAVE resulting in the spreading out of allocations during bootup over all available nodes. The slab allocator in 2.6.13 has only a single list of slab pages. As a result the per cpu slab cache and the spinlock controlled page lists may contain slab entries from off node memory. The slab allocator in 2.6.13 makes no effort to discern the locality of an entry on its lists. The NUMA aware slab allocator in 2.6.14 controls locality of the slab pages explicitly by calling alloc_pages_node(). The NUMA slab allocator manages slab entries by having lists of available slab pages for each node. The per cpu slab cache can only contain slab entries associated with the node local to the processor. This guarantees that the default allocation mode of the slab allocator always assigns local memory if available. Setting MPOL_INTERLEAVE as a default policy during bootup has no effect anymore. In 2.6.14 all node unspecific slab allocations are performed on the boot processor. This means that most of key data structures are allocated on one node. Most processors will have to refer to these structures making the boot node a potential bottleneck. This may reduce performance and cause unnecessary memory pressure on the boot node. This patch implements NUMA policies in the slab layer. There is the need of explicit application of NUMA memory policies by the slab allcator itself since the NUMA slab allocator does no longer let the page_allocator control locality. The check for policies is made directly at the beginning of __cache_alloc using current->mempolicy. The memory policy is already frequently checked by the page allocator (alloc_page_vma() and alloc_page_current()). So it is highly likely that the cacheline is present. For MPOL_INTERLEAVE kmalloc() will spread out each request to one node after another so that an equal distribution of allocations can be obtained during bootup. It is not possible to push the policy check to lower layers of the NUMA slab allocator since the per cpu caches are now only containing slab entries from the current node. If the policy says that the local node is not to be preferred or forbidden then there is no point in checking the slab cache or local list of slab pages. The allocation better be directed immediately to the lists containing slab entries for the allowed set of nodes. This way of applying policy also fixes another strange behavior in 2.6.13. alloc_pages() is controlled by the memory allocation policy of the current process. It could therefore be that one process is running with MPOL_INTERLEAVE and would f.e. obtain a new page following that policy since no slab entries are in the lists anymore. A page can typically be used for multiple slab entries but lets say that the current process is only using one. The other entries are then added to the slab lists. These are now non local entries in the slab lists despite of the possible availability of local pages that would provide faster access and increase the performance of the application. Another process without MPOL_INTERLEAVE may now run and expect a local slab entry from kmalloc(). However, there are still these free slab entries from the off node page obtained from the other process via MPOL_INTERLEAVE in the cache. The process will then get an off node slab entry although other slab entries may be available that are local to that process. This means that the policy if one process may contaminate the locality of the slab caches for other processes. This patch in effect insures that a per process policy is followed for the allocation of slab entries and that there cannot be a memory policy influence from one process to another. A process with default policy will always get a local slab entry if one is available. And the process using memory policies will get its memory arranged as requested. Off-node slab allocation will require the use of spinlocks and will make the use of per cpu caches not possible. A process using memory policies to redirect allocations offnode will have to cope with additional lock overhead in addition to the latency added by the need to access a remote slab entry. Changes V1->V2 - Remove #ifdef CONFIG_NUMA by moving forward declaration into prior #ifdef CONFIG_NUMA section. - Give the function determining the node number to use a saner name. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] sem2mutex: mm/slab.cIngo Molnar2006-01-18
| | | | | | | | | Convert mm/swapfile.c's swapon_sem to swapon_mutex. Signed-off-by: Ingo Molnar <mingo@elte.hu> Cc: Hugh Dickins <hugh@veritas.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] fix/simplify mutex debugging codeDavid Woodhouse2006-01-11
| | | | | | | | | | Let's switch mutex_debug_check_no_locks_freed() to take (addr, len) as arguments instead, since all its callers were just calculating the 'to' address for themselves anyway... (and sometimes doing so badly). Signed-off-by: David Woodhouse <dwmw2@infradead.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] mutex subsystem, more debugging codeIngo Molnar2006-01-09
| | | | | | | | more mutex debugging: check for held locks during memory freeing, task exit, enable sysrq printouts, etc. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Arjan van de Ven <arjan@infradead.org>
* [PATCH] slob: introduce mm/util.c for shared functionsMatt Mackall2006-01-08
| | | | | | | | Add mm/util.c for functions common between SLAB and SLOB. Signed-off-by: Matt Mackall <mpm@selenic.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] mm: clean up local variablesTobias Klauser2006-01-08
| | | | | | | | | | | Clean up a local variable with the same name as a variable in a larger block. Also move a variable into the block where it's actually used. Spotted by http://linuxicc.sourceforge.net/ Signed-off-by: Tobias Klauser <tklauser@nuerscht.ch> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: fix code formattingPekka Enberg2006-01-08
| | | | | | | | | The slab allocator code is inconsistent in coding style and messy. For this patch, I ran Lindent for mm/slab.c and fixed up goofs by hand. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: extract slab order calculation to separate functionPekka Enberg2006-01-08
| | | | | | | | | | | | This patch moves the ugly loop that determines the 'optimal' size (page order) of cache slabs from kmem_cache_create() to a separate function and cleans it up a bit. Thanks to Matthew Wilcox for the help with this patch. Signed-off-by: Matthew Dobson <colpatch@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: extract slabinfo header printing to separate functionPekka Enberg2006-01-08
| | | | | | | | | This patch extracts slabinfo header printing to a separate function print_slabinfo_header() to make s_start() more readable. Signed-off-by: Matthew Dobson <colpatch@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: remove unused align parameter from alloc_percpuPekka Enberg2006-01-08
| | | | | | | | | | | | __alloc_percpu and alloc_percpu both take an 'align' argument which is completely ignored. snmp6_mib_init() in net/ipv6/af_inet6.c attempts to use it, but it will be ignored. Therefore, remove the 'align' argument and fixup the lone caller. Signed-off-by: Matthew Dobson <colpatch@us.ibm.com> Acked-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: remove alloc_pages() callsChristoph Lameter2005-11-13
| | | | | | | | | | | The slab allocator never uses alloc_pages since kmem_getpages() is always called with a valid nodeid. Remove the branch and the code from kmem_getpages() Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: convert cache to page mapping macrosPekka Enberg2005-11-13
| | | | | | | | | | This patch converts object cache <-> page mapping macros to static inline functions to make the more explicit and readable. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* mm/slab.c: fix a comment typoAdrian Bunk2005-11-08
|
* [PATCH] more kernel-doc cleanups, additionsRandy Dunlap2005-11-07
| | | | | | | | | | | Various core kernel-doc cleanups: - add missing function parameters in ipc, irq/manage, kernel/sys, kernel/sysctl, and mm/slab; - move description to just above function for kernel_restart() Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: Use same schedule timeout for all cpus in cache_reapManfred Spraul2005-11-07
| | | | | | | | | | | | | | | Chen noticed that cache_reap uses REAPTIMEOUT_CPUC+smp_processor_id() as the timeout for rescheduling. The "+smp_processor_id()" part is wrong, the timeout should be identical for all cpus: start_cpu_timer already adds a cpu dependant offset to avoid any clustering. The attached patch removes smp_processor_id(). Signed-Off-By: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] mm: rename kmem_cache_s to kmem_cachePekka J Enberg2005-11-07
| | | | | | | | | This patch renames struct kmem_cache_s to kmem_cache so we can start using it instead of kmem_cache_t typedef. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: don't BUG on duplicated cacheAndrew Morton2005-11-07
| | | | | | | | | | | | | | | | | | | | | slab presently goes BUG if someone tries to register an already-registered cache. But this can happen if the user accidentally loads a module which is already statically linked into the kernel. Nuking the kernel is rather a harsh reaction. Change it into a warning, and just fail the kmem_cache_alloc() attempt. If the module is well-behaved, the modprobe will fail and all is well. Notes: - Swaps the ranking of cache_chain_sem and lock_cpu_hotplug(). Doesn't seem important. Acked-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: add additional debugging to detect slabs from the wrong nodeChristoph Lameter2005-10-30
| | | | | | | | | | This patch adds some stack dumps if the slab logic is processing slab blocks from the wrong node. This is necessary in order to detect situations as encountered by Petr. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] gfp_t: mm/* (easy parts)Al Viro2005-10-28
| | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] gfp flags annotations - part 1Al Viro2005-10-08
| | | | | | | | | | | | - added typedef unsigned int __nocast gfp_t; - replaced __nocast uses for gfp flags with gfp_t - it gives exactly the same warnings as far as sparse is concerned, doesn't change generated code (from gcc point of view we replaced unsigned int with typedef) and documents what's going on far better. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] kmalloc_node IRQ safety fixAlok N Kataria2005-09-28
| | | | | | | | | | | | | | In kmalloc_node we are checking if the allocation is for the same node when interrupts are "on". This may lead to an allocation on another node than intended. This patch just shifts the check for the current node in __cache_alloc_node when interrupts are disabled. Signed-off-by: Alok N Kataria <alokk@calsoftinc.com> Acked-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] revert oversized kmalloc checkAndrew Morton2005-09-23
| | | | | | | | | | | | | | | | As davem points out, this wasn't such a great idea. There may be some code which does: size = 1024*1024; while (kmalloc(size, ...) == 0) size /= 2; which will now explode. Cc: "David S. Miller" <davem@davemloft.net> Cc: Christoph Lameter <christoph@lameter.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] __kmalloc: Generate BUG if size requested is too large.Christoph Lameter2005-09-23
| | | | | | | | | | | | | | | | I had an issue on ia64 where I got a bug in kernel/workqueue because kzalloc returned a NULL pointer due to the task structure getting too big for the slab allocator. Usually these cases are caught by the kmalloc macro in include/linux/slab.h. Compilation will fail if a too big value is passed to kmalloc. However, kzalloc uses __kmalloc which has no check for that. This patch makes __kmalloc bug if a too large entity is requested. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: fix handling of pages from foreign NUMA nodesChristoph Lameter2005-09-23
| | | | | | | | | | | | | | | | | | The numa slab allocator may allocate pages from foreign nodes onto the lists for a particular node if a node runs out of memory. Inspecting the slab->nodeid field will not reflect that the page is now in use for the slabs of another node. This patch fixes that issue by adding a node field to free_block so that the caller can indicate which node currently uses a slab. Also removes the check for the current node from kmalloc_cache_node since the process may shift later to another node which may lead to an allocation on another node than intended. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: alpha inlining fixIvan Kokshaysky2005-09-23
| | | | | | | | | | It is essential that index_of() be inlined. But alpha undoes the gcc inlining hackery and index_of() ends up out-of-line. So fiddle with things to make that function inline again. Cc: Richard Henderson <rth@twiddle.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Fix slab BUG_ON() triggered by change in array cache sizeAlok Kataria2005-09-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the new changes that we made in the initialization of the slab allocator, we first setup the cache from which array caches are allocated, and then the cache, from which kmem_list3's are allocated. Now if the array cache comes from a cache in which objsize > 32, (in this instance size-64) then, first size-64 cache will be allocated and then the size-128 (if this is the cache from which kmem_list3's are going to be allocated). So with these new changes, we are not guaranteed that we will be initializing the malloc_sizes array in a serialized order. Thus there is a bug in __find_general_cachep, as we are checking whether the first cache_sizes ptr is NULL. This is replaced by checking whether the array-cache cache is initialized. Attached is a patch which does that. Boots fine on a x86-64, with DEBUG_SPIN, DEBUG_SLAB, and preempt. Attached is a patch which does that. Boots fine on a x86-64, with DEBUG_SPIN, DEBUG_SLAB, and preempt.Thanks & Regards, Alok Signed-off-by: Alok N Kataria <alokk@calsoftinc.com> Signed-off-by: Shobhit Dayal <shobhitdayal.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Christoph Lameter <christoph@lameter.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] mm/slab: fix sparse warningsVictor Fusco2005-09-10
| | | | | | | | | Fix the sparse warning "implicit cast to nocast type" Signed-off-by: Victor Fusco <victor@cetuc.puc-rio.br> Signed-off-by: Domen Puncer <domen@coderock.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] update kfree, vfree, and vunmap kerneldocPekka Enberg2005-09-09
| | | | | | | | | | | This patch clarifies NULL handling of kfree() and vfree(). I addition, wording of calling context restriction for vfree() and vunmap() are changed from "may not" to "must not." Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Acked-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] Numa-aware slab allocator V5Christoph Lameter2005-09-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The NUMA API change that introduced kmalloc_node was accepted for 2.6.12-rc3. Now it is possible to do slab allocations on a node to localize memory structures. This API was used by the pageset localization patch and the block layer localization patch now in mm. The existing kmalloc_node is slow since it simply searches through all pages of the slab to find a page that is on the node requested. The two patches do a one time allocation of slab structures at initialization and therefore the speed of kmalloc node does not matter. This patch allows kmalloc_node to be as fast as kmalloc by introducing node specific page lists for partial, free and full slabs. Slab allocation improves in a NUMA system so that we are seeing a performance gain in AIM7 of about 5% with this patch alone. More NUMA localizations are possible if kmalloc_node operates in an fast way like kmalloc. Test run on a 32p systems with 32G Ram. w/o patch Tasks jobs/min jti jobs/min/task real cpu 1 485.36 100 485.3640 11.99 1.91 Sat Apr 30 14:01:51 2005 100 26582.63 88 265.8263 21.89 144.96 Sat Apr 30 14:02:14 2005 200 29866.83 81 149.3342 38.97 286.08 Sat Apr 30 14:02:53 2005 300 33127.16 78 110.4239 52.71 426.54 Sat Apr 30 14:03:46 2005 400 34889.47 80 87.2237 66.72 568.90 Sat Apr 30 14:04:53 2005 500 35654.34 76 71.3087 81.62 714.55 Sat Apr 30 14:06:15 2005 600 36460.83 75 60.7681 95.77 853.42 Sat Apr 30 14:07:51 2005 700 35957.00 75 51.3671 113.30 990.67 Sat Apr 30 14:09:45 2005 800 33380.65 73 41.7258 139.48 1140.86 Sat Apr 30 14:12:05 2005 900 35095.01 76 38.9945 149.25 1281.30 Sat Apr 30 14:14:35 2005 1000 36094.37 74 36.0944 161.24 1419.66 Sat Apr 30 14:17:17 2005 w/patch Tasks jobs/min jti jobs/min/task real cpu 1 484.27 100 484.2736 12.02 1.93 Sat Apr 30 15:59:45 2005 100 28262.03 90 282.6203 20.59 143.57 Sat Apr 30 16:00:06 2005 200 32246.45 82 161.2322 36.10 282.89 Sat Apr 30 16:00:42 2005 300 37945.80 83 126.4860 46.01 418.75 Sat Apr 30 16:01:28 2005 400 40000.69 81 100.0017 58.20 561.48 Sat Apr 30 16:02:27 2005 500 40976.10 78 81.9522 71.02 696.95 Sat Apr 30 16:03:38 2005 600 41121.54 78 68.5359 84.92 834.86 Sat Apr 30 16:05:04 2005 700 44052.77 78 62.9325 92.48 971.53 Sat Apr 30 16:06:37 2005 800 41066.89 79 51.3336 113.38 1111.15 Sat Apr 30 16:08:31 2005 900 38918.77 79 43.2431 134.59 1252.57 Sat Apr 30 16:10:46 2005 1000 41842.21 76 41.8422 139.09 1392.33 Sat Apr 30 16:13:05 2005 These are measurement taken directly after boot and show a greater improvement than 5%. However, the performance improvements become less over time if the AIM7 runs are repeated and settle down at around 5%. Links to earlier discussions: http://marc.theaimsgroup.com/?t=111094594500003&r=1&w=2 http://marc.theaimsgroup.com/?t=111603406600002&r=1&w=2 Changelog V4-V5: - alloc_arraycache and alloc_aliencache take node parameter instead of cpu - fix initialization so that nodes without cpus are properly handled. - simplify code in kmem_cache_init - patch against Andrews temp mm3 release - Add Shai to credits - fallback to __cache_alloc from __cache_alloc_node if the node's cache is not available yet. Changelog V3-V4: - Patch against 2.6.12-rc5-mm1 - Cleanup patch integrated - More and better use of for_each_node and for_each_cpu - GCC 2.95 fix (do not use [] use [0]) - Correct determination of INDEX_AC - Remove hack to cause an error on platforms that have no CONFIG_NUMA but nodes. - Remove list3_data and list3_data_ptr macros for better readability Changelog V2-V3: - Made to patch against 2.6.12-rc4-mm1 - Revised bootstrap mechanism so that larger size kmem_list3 structs can be supported. Do a generic solution so that the right slab can be found for the internal structs. - use for_each_online_node Changelog V1-V2: - Batching for freeing of wrong-node objects (alien caches) - Locking changes and NUMA #ifdefs as requested by Manfred Signed-off-by: Alok N Kataria <alokk@calsoftinc.com> Signed-off-by: Shobhit Dayal <shobhit@calsoftinc.com> Signed-off-by: Shai Fultheim <Shai@Scalex86.org> Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] introduce and use kzallocPekka J Enberg2005-09-07
| | | | | | | | | | This patch introduces a kzalloc wrapper and converts kernel/ to use it. It saves a little program text. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] slab: removes local_irq_save()/local_irq_restore() pairManfred Spraul2005-09-05
| | | | | | | | | | | | | Proposed by and based on a patch from Eric Dumazet <dada1@cosmosbay.com>: This patch removes unnecessary critical section in ksize() function, as cli/sti are rather expensive on modern CPUS. It additionally adds a docbook entry for ksize() and further simplifies the code. Signed-Off-By: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>