author Christoph Lameter <clameter@sgi.com> 2007-05-06 17:49:36 -0400
committer Linus Torvalds <torvalds@woody.linux-foundation.org> 2007-05-07 15:12:53 -0400
commit 81819f0fc8285a2a5a921c019e3e3d7b6169d225
tree 47e3da44d3ef6c74ceae6c3771b191b46467bb48 /init
parent 543691a6cd70b606dd9bed5e77b120c5d9c5c506
SLUB core

This is a new slab allocator which was motivated by the complexity of the existing code in mm/slab.c. It attempts to address a variety of concerns with the existing implementation.

A. Management of object queues

A particular concern was the complex management of the numerous object queues in SLAB. SLUB has no such queues. Instead, we dedicate a slab to each allocating CPU and take objects directly from that slab rather than queueing them up.

B. Storage overhead of object queues

SLAB object queues exist per node, per CPU. The alien cache queue even has a queue array that contains a queue for each processor on each node. For very large systems the number of queues and the number of objects that may be caught in those queues grows exponentially. On our systems with 1k nodes / processors we have several gigabytes tied up just for storing references to objects for those queues. This does not include the objects that could be on those queues. One fears that the whole memory of the machine could one day be consumed by those queues.

C. SLAB meta data overhead

SLAB has overhead at the beginning of each slab. This means that data cannot be naturally aligned at the beginning of a slab block. SLUB keeps all meta data in the corresponding struct page. Objects can be naturally aligned in the slab. For example, a 128 byte object will be aligned at 128 byte boundaries and can fit tightly into a 4k page with no bytes left over. SLAB cannot do this.

D. SLAB has a complex cache reaper

SLUB does not need a cache reaper for UP systems. On SMP systems the per CPU slab may be pushed back into the partial list, but that operation is simple and does not require an iteration over a list of objects. SLAB expires per CPU, shared and alien object queues during cache reaping, which may cause strange hold offs.

E. SLAB has complex NUMA policy layer support

SLUB pushes NUMA policy handling into the page allocator.
This means that allocation is coarser (SLUB does interleave on a page level) but that situation was also present before 2.6.13. SLAB's application of policies to individual slab objects is certainly a performance concern due to the frequent references to memory policies, which may lead a sequence of objects to come from one node after another. SLUB will get a slab full of objects from one node and then will switch to the next.

F. Reduction of the size of partial slab lists

SLAB has per node partial lists. This means that over time a large number of partial slabs may accumulate on those lists. These can only be reused if allocations occur on specific nodes. SLUB has a global pool of partial slabs and will consume slabs from that pool to decrease fragmentation.

G. Tunables

SLAB has sophisticated tuning abilities for each slab cache. One can manipulate the queue sizes in detail. However, filling the queues still requires the use of the spin lock to check out slabs. SLUB has a global parameter (slub_min_order) for tuning. Increasing the minimum slab order can decrease the locking overhead. The bigger the slab order, the fewer page movements occur between the per CPU and partial lists and the better SLUB will scale.

G. Slab merging

We often have slab caches with similar parameters. SLUB detects those on boot up and merges them into the corresponding general caches. This leads to more effective memory use. About 50% of all caches can be eliminated through slab merging. This will also decrease slab fragmentation because partially allocated slabs can be filled up again. Slab merging can be switched off by specifying slub_nomerge on boot up.

Note that merging can expose heretofore unknown bugs in the kernel because corrupted objects may now be placed differently and corrupt different neighboring objects. Enable sanity checks to find those.

H. Diagnostics

The current slab diagnostics are difficult to use and require a recompilation of the kernel.
SLUB contains debugging code that is always available (but is kept out of the hot code paths). SLUB diagnostics can be enabled via the "slub_debug" option. Parameters can be specified to select a single slab cache or a group of slab caches for diagnostics. This means that the system is running with the usual performance and it is much more likely that race conditions can be reproduced.

I. Resiliency

If basic sanity checks are on, then SLUB is capable of detecting common error conditions and recovering as well as possible to allow the system to continue.

J. Tracing

Tracing can be enabled via the slub_debug=T,<slabcache> option during boot. SLUB will then log all actions on that slabcache and dump the object contents on free.

K. On demand DMA cache creation

Generally DMA caches are not needed. If kmalloc is called with __GFP_DMA, then only the single slabcache that is needed is created. For systems that have no ZONE_DMA requirement the support is completely eliminated.

L. Performance increase

Some benchmarks have shown speed improvements on kernbench in the range of 5-10%. The locking overhead of SLUB is based on the underlying base allocation size. If we can reliably allocate larger order pages then it is possible to increase SLUB performance much further. The anti-fragmentation patches may enable further performance increases.

Tested on:
i386 UP + SMP, x86_64 UP + SMP + NUMA emulation, IA64 NUMA + Simulator

SLUB Boot options

slub_nomerge            Disable merging of slabs
slub_min_order=x        Require a minimum order for slab caches. This increases the managed chunk size and therefore reduces meta data and locking overhead.
slub_min_objects=x      Minimum objects per slab. Default is 8.
slub_max_order=x        Avoid generating slabs larger than order specified.
slub_debug              Enable all diagnostics for all caches
slub_debug=<options>    Enable selective options for all caches
slub_debug=<o>,<cache>  Enable selective options for a certain set of caches

Available Debug options

F   Double Free checking, sanity and resiliency
R   Red zoning
P   Object / padding poisoning
U   Track last free / alloc
T   Trace all allocs / frees (only use for individual slabs).

To use SLUB: Apply this patch and then select SLUB as the default slab allocator.

[hugh@veritas.com: fix an oops-causing locking error]
[akpm@linux-foundation.org: various stupid cleanups and small fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
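
The packing claim in section C and the slub_min_objects / slub_max_order boot options both come down to simple slab-size arithmetic. The C sketch below works through that arithmetic under stated assumptions: a 4k base page, a slab made of 2^order contiguous pages, and hypothetical helper names (objects_per_slab, order_for_min_objects) that are not taken from mm/slub.c. The kernel's own order calculation may weigh further factors, so treat this as an illustration rather than the implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4k base page, matching the example in section C. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Number of objects that fit into a slab of 2^order pages when
 * `header` bytes are reserved at the start of the slab.
 * header == 0 models keeping all meta data outside the slab;
 * header > 0 models in-slab meta data.
 */
static size_t objects_per_slab(size_t object_size, unsigned order, size_t header)
{
    size_t slab_bytes = (size_t)SKETCH_PAGE_SIZE << order;

    assert(object_size > 0);
    if (header >= slab_bytes)
        return 0;
    return (slab_bytes - header) / object_size;
}

/*
 * Smallest order whose slab holds at least `min_objects` objects,
 * capped at `max_order` (hypothetical stand-ins for the roles of
 * slub_min_objects and slub_max_order).
 */
static unsigned order_for_min_objects(size_t object_size, size_t min_objects,
                                      unsigned max_order)
{
    unsigned order = 0;

    while (order < max_order &&
           objects_per_slab(object_size, order, 0) < min_objects)
        order++;
    return order;
}
```

Under these assumptions, objects_per_slab(128, 0, 0) yields 32 objects with no bytes left over, while reserving a hypothetical 64-byte in-slab header leaves room for only 31.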
Diffstat (limited to 'init')
-rw-r--r-- init/Kconfig 53
1 files changed, 40 insertions, 13 deletions
diff --git a/init/Kconfig b/init/Kconfig
index 29d9e47ee0da..7ce952052947 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -478,15 +478,6 @@ config SHMEM
 	  option replaces shmem and tmpfs with the much simpler ramfs code,
 	  which may be appropriate on small systems without swap.
 
-config SLAB
-	default y
-	bool "Use full SLAB allocator" if (EMBEDDED && !SMP && !SPARSEMEM)
-	help
-	  Disabling this replaces the advanced SLAB allocator and
-	  kmalloc support with the drastically simpler SLOB allocator.
-	  SLOB is more space efficient but does not scale well and is
-	  more susceptible to fragmentation.
-
 config VM_EVENT_COUNTERS
 	default y
 	bool "Enable VM event counters for /proc/vmstat" if EMBEDDED
@@ -496,6 +487,46 @@ config VM_EVENT_COUNTERS
 	  on EMBEDDED systems. /proc/vmstat will only show page counts
 	  if VM event counters are disabled.
 
+choice
+	prompt "Choose SLAB allocator"
+	default SLAB
+	help
+	  This option allows to select a slab allocator.
+
+config SLAB
+	bool "SLAB"
+	help
+	  The regular slab allocator that is established and known to work
+	  well in all environments. It organizes chache hot objects in
+	  per cpu and per node queues. SLAB is the default choice for
+	  slab allocator.
+
+config SLUB
+	depends on EXPERIMENTAL && !ARCH_USES_SLAB_PAGE_STRUCT
+	bool "SLUB (Unqueued Allocator)"
+	help
+	  SLUB is a slab allocator that minimizes cache line usage
+	  instead of managing queues of cached objects (SLAB approach).
+	  Per cpu caching is realized using slabs of objects instead
+	  of queues of objects. SLUB can use memory efficiently
+	  way and has enhanced diagnostics.
+
+config SLOB
+#
+# SLOB cannot support SMP because SLAB_DESTROY_BY_RCU does not work
+# properly.
+#
+	depends on EMBEDDED && !SMP && !SPARSEMEM
+	bool "SLOB (Simple Allocator)"
+	help
+	  SLOB replaces the SLAB allocator with a drastically simpler
+	  allocator. SLOB is more space efficient that SLAB but does not
+	  scale well (single lock for all operations) and is more susceptible
+	  to fragmentation. SLOB it is a great choice to reduce
+	  memory usage and code size for embedded systems.
+
+endchoice
+
 endmenu # General setup
 
 config RT_MUTEXES
@@ -511,10 +542,6 @@ config BASE_SMALL
 	default 0 if BASE_FULL
 	default 1 if !BASE_FULL
 
-config SLOB
-	default !SLAB
-	bool
-
 menu "Loadable module support"
 
 config MODULES
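
The choice block added above means that exactly one of SLAB, SLUB and SLOB is selected in any configuration, with SLAB as the default. The C sketch below shows one way code could rely on that property. It is illustrative only: the enum, the two helper functions and the fallback #define are hypothetical and do not come from this patch, and a real kernel build would receive the CONFIG_* macros from its generated configuration header rather than from the fallback shown here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the generated kernel configuration:
 * if no allocator macro was passed in (e.g. via -DCONFIG_SLUB),
 * mirror the "default SLAB" line of the choice block.
 */
#if !defined(CONFIG_SLAB) && !defined(CONFIG_SLUB) && !defined(CONFIG_SLOB)
#define CONFIG_SLAB 1
#endif

/* A choice block selects exactly one entry; reject anything else. */
#if (defined(CONFIG_SLAB) + defined(CONFIG_SLUB) + defined(CONFIG_SLOB)) != 1
#error "exactly one of CONFIG_SLAB, CONFIG_SLUB, CONFIG_SLOB must be set"
#endif

/* Hypothetical names, used only for this illustration. */
enum slab_allocator { ALLOC_SLAB, ALLOC_SLUB, ALLOC_SLOB };

/* Resolve the compile-time selection to a run-time value. */
static enum slab_allocator selected_allocator(void)
{
#if defined(CONFIG_SLUB)
    return ALLOC_SLUB;
#elif defined(CONFIG_SLOB)
    return ALLOC_SLOB;
#else
    return ALLOC_SLAB;
#endif
}

/* Human-readable name matching the Kconfig symbol. */
static const char *allocator_name(enum slab_allocator a)
{
    static const char *const names[] = { "SLAB", "SLUB", "SLOB" };

    assert((size_t)a < sizeof(names) / sizeof(names[0]));
    return names[a];
}
```

Compiled without any -D flag, the sketch falls back to SLAB, which corresponds to the default of the choice block.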