author		Christoph Lameter <clameter@sgi.com>	2007-05-06 17:49:36 -0400
committer	Linus Torvalds <torvalds@woody.linux-foundation.org>	2007-05-07 15:12:53 -0400
commit		81819f0fc8285a2a5a921c019e3e3d7b6169d225 (patch)
tree		47e3da44d3ef6c74ceae6c3771b191b46467bb48 /include/linux
parent		543691a6cd70b606dd9bed5e77b120c5d9c5c506 (diff)
SLUB core
This is a new slab allocator which was motivated by the complexity of the
existing code in mm/slab.c. It attempts to address a variety of concerns
with the existing implementation.

A. Management of object queues

   A particular concern was the complex management of the numerous object
   queues in SLAB. SLUB has no such queues. Instead we dedicate a slab for
   each allocating CPU and use objects from a slab directly instead of
   queueing them up.

B. Storage overhead of object queues

   SLAB object queues exist per node, per CPU. The alien cache queue even
   has a queue array that contains a queue for each processor on each node.
   For very large systems the number of queues, and the number of objects
   that may be caught in those queues, grows quadratically with system
   size. On our systems with 1k nodes / processors we have several
   gigabytes just tied up for storing references to objects in those
   queues. This does not include the objects that could be on those queues.
   One fears that the whole memory of the machine could one day be consumed
   by those queues.

C. SLAB meta data overhead

   SLAB has overhead at the beginning of each slab. This means that data
   cannot be naturally aligned at the beginning of a slab block. SLUB keeps
   all meta data in the corresponding struct page. Objects can be naturally
   aligned in the slab. E.g. a 128 byte object will be aligned at 128 byte
   boundaries and can fit tightly into a 4k page with no bytes left over.
   SLAB cannot do this.

D. SLAB has a complex cache reaper

   SLUB does not need a cache reaper for UP systems. On SMP systems the
   per-CPU slab may be pushed back into the partial list, but that
   operation is simple and does not require an iteration over a list of
   objects. SLAB expires per-CPU, shared and alien object queues during
   cache reaping, which may cause strange holdoffs.

E. SLAB has complex NUMA policy layer support

   SLUB pushes NUMA policy handling into the page allocator. This means
   that allocation is coarser (SLUB does interleave on a page level), but
   that situation was also present before 2.6.13. SLAB's application of
   policies to individual slab objects is certainly a performance concern
   due to the frequent references to memory policies, which may lead a
   sequence of objects to come from one node after another. SLUB will get a
   slab full of objects from one node and then will switch to the next.

F. Reduction of the size of partial slab lists

   SLAB has per-node partial lists. This means that over time a large
   number of partial slabs may accumulate on those lists. These can only be
   reused if allocations occur on specific nodes. SLUB has a global pool of
   partial slabs and will consume slabs from that pool to decrease
   fragmentation.

G. Tunables

   SLAB has sophisticated tuning abilities for each slab cache. One can
   manipulate the queue sizes in detail. However, filling the queues still
   requires the use of a spin lock to check out slabs. SLUB has a global
   parameter (slub_min_order) for tuning. Increasing the minimum slab order
   can decrease the locking overhead. The bigger the slab order, the fewer
   motions of pages between per-CPU and partial lists occur and the better
   SLUB will scale.

H. Slab merging

   We often have slab caches with similar parameters. SLUB detects those on
   boot up and merges them into the corresponding general caches. This
   leads to more effective memory use. About 50% of all caches can be
   eliminated through slab merging. This will also decrease slab
   fragmentation because partially allocated slabs can be filled up again.
   Slab merging can be switched off by specifying slub_nomerge on boot up.
   Note that merging can expose heretofore unknown bugs in the kernel
   because corrupted objects may now be placed differently and corrupt
   different neighboring objects. Enable sanity checks to find those.

I. Diagnostics

   The current SLAB diagnostics are difficult to use and require a
   recompilation of the kernel. SLUB contains debugging code that is always
   available (but is kept out of the hot code paths). SLUB diagnostics can
   be enabled via the "slub_debug" option. Parameters can be specified to
   select a single or a group of slab caches for diagnostics. This means
   that the system is running with the usual performance, and it is much
   more likely that race conditions can be reproduced.

J. Resiliency

   If basic sanity checks are on then SLUB is capable of detecting common
   error conditions and recovering as well as possible to allow the system
   to continue.

K. Tracing

   Tracing can be enabled via the slub_debug=T,<slabcache> option during
   boot. SLUB will then log all actions on that slab cache and dump the
   object contents on free.

L. On demand DMA cache creation

   Generally DMA caches are not needed. If a kmalloc is used with __GFP_DMA
   then the single slab cache that is needed is created on demand. For
   systems that have no ZONE_DMA requirement the support is completely
   eliminated.

M. Performance increase

   Some benchmarks have shown speed improvements on kernbench in the range
   of 5-10%. The locking overhead of SLUB is based on the underlying base
   allocation size. If we can reliably allocate larger order pages then it
   is possible to increase SLUB performance much further. The
   anti-fragmentation patches may enable further performance increases.

Tested on: i386 UP + SMP, x86_64 UP + SMP + NUMA emulation, IA64 NUMA +
Simulator

SLUB boot options

slub_nomerge		Disable merging of slabs
slub_min_order=x	Require a minimum order for slab caches. This
			increases the managed chunk size and therefore
			reduces meta data and locking overhead.
slub_min_objects=x	Minimum objects per slab. Default is 8.
slub_max_order=x	Avoid generating slabs larger than the specified
			order.
slub_debug		Enable all diagnostics for all caches
slub_debug=<options>	Enable selective options for all caches
slub_debug=<o>,<cache>	Enable selective options for a certain set of
			caches

Available debug options:

F	Double free checking, sanity and resiliency
R	Red zoning
P	Object / padding poisoning
U	Track last free / alloc
T	Trace all allocs / frees (only use for individual slabs)

To use SLUB: Apply this patch and then select SLUB as the default slab
allocator.

[hugh@veritas.com: fix an oops-causing locking error]
[akpm@linux-foundation.org: various stupid cleanups and small fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'include/linux')
-rw-r--r--	include/linux/mm_types.h	17
-rw-r--r--	include/linux/poison.h		3
-rw-r--r--	include/linux/slab.h		14
-rw-r--r--	include/linux/slub_def.h	201
4 files changed, 229 insertions(+), 6 deletions(-)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index c3852fd4a1cc..e30687bad075 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -19,10 +19,16 @@ struct page {
 	unsigned long flags;		/* Atomic flags, some possibly
 					 * updated asynchronously */
 	atomic_t _count;		/* Usage count, see below. */
-	atomic_t _mapcount;	/* Count of ptes mapped in mms,
+	union {
+		atomic_t _mapcount;	/* Count of ptes mapped in mms,
 					 * to show when page is mapped
 					 * & limit reverse map searches.
 					 */
+		struct {	/* SLUB uses */
+			short unsigned int inuse;
+			short unsigned int offset;
+		};
+	};
 	union {
 		struct {
 			unsigned long private;	/* Mapping-private opaque data:
@@ -43,8 +49,15 @@ struct page {
 #if NR_CPUS >= CONFIG_SPLIT_PTLOCK_CPUS
 	spinlock_t ptl;
 #endif
+		struct {	/* SLUB uses */
+			struct page *first_page; /* Compound pages */
+			struct kmem_cache *slab; /* Pointer to slab */
+		};
+	};
+	union {
+		pgoff_t index;		/* Our offset within mapping. */
+		void *freelist;		/* SLUB: pointer to free object */
 	};
-	pgoff_t index;		/* Our offset within mapping. */
 	struct list_head lru;		/* Pageout list, eg. active_list
 					 * protected by zone->lru_lock !
 					 */
diff --git a/include/linux/poison.h b/include/linux/poison.h
index 89580b764959..95f518b17684 100644
--- a/include/linux/poison.h
+++ b/include/linux/poison.h
@@ -18,6 +18,9 @@
 #define RED_INACTIVE	0x5A2CF071UL	/* when obj is inactive */
 #define RED_ACTIVE	0x170FC2A5UL	/* when obj is active */
 
+#define	SLUB_RED_INACTIVE	0xbb
+#define	SLUB_RED_ACTIVE		0xcc
+
 /* ...and for poisoning */
 #define	POISON_INUSE	0x5a	/* for use-uninitialised poisoning */
 #define POISON_FREE	0x6b	/* for use-after-free poisoning */
diff --git a/include/linux/slab.h b/include/linux/slab.h
index f9ed9346bfd6..67425c277e12 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -32,6 +32,7 @@ typedef struct kmem_cache kmem_cache_t __deprecated;
 #define SLAB_PANIC		0x00040000UL	/* Panic if kmem_cache_create() fails */
 #define SLAB_DESTROY_BY_RCU	0x00080000UL	/* Defer freeing slabs to RCU */
 #define SLAB_MEM_SPREAD		0x00100000UL	/* Spread some memory over cpuset */
+#define SLAB_TRACE		0x00200000UL	/* Trace allocations and frees */
 
 /* Flags passed to a constructor functions */
 #define SLAB_CTOR_CONSTRUCTOR	0x001UL		/* If not set, then deconstructor */
@@ -42,7 +43,7 @@ typedef struct kmem_cache kmem_cache_t __deprecated;
  * struct kmem_cache related prototypes
  */
 void __init kmem_cache_init(void);
-extern int slab_is_available(void);
+int slab_is_available(void);
 
 struct kmem_cache *kmem_cache_create(const char *, size_t, size_t,
 			unsigned long,
@@ -95,9 +96,14 @@ static inline void *kcalloc(size_t n, size_t size, gfp_t flags)
  * the appropriate general cache at compile time.
  */
 
-#ifdef CONFIG_SLAB
+#if defined(CONFIG_SLAB) || defined(CONFIG_SLUB)
+#ifdef CONFIG_SLUB
+#include <linux/slub_def.h>
+#else
 #include <linux/slab_def.h>
+#endif /* !CONFIG_SLUB */
 #else
+
 /*
  * Fallback definitions for an allocator not wanting to provide
  * its own optimized kmalloc definitions (like SLOB).
@@ -184,7 +190,7 @@ static inline void *__kmalloc_node(size_t size, gfp_t flags, int node)
  * allocator where we care about the real place the memory allocation
  * request comes from.
  */
-#ifdef CONFIG_DEBUG_SLAB
+#if defined(CONFIG_DEBUG_SLAB) || defined(CONFIG_SLUB)
 extern void *__kmalloc_track_caller(size_t, gfp_t, void*);
 #define kmalloc_track_caller(size, flags) \
 	__kmalloc_track_caller(size, flags, __builtin_return_address(0))
@@ -202,7 +208,7 @@ extern void *__kmalloc_track_caller(size_t, gfp_t, void*);
  * standard allocator where we care about the real place the memory
  * allocation request comes from.
 */
-#ifdef CONFIG_DEBUG_SLAB
+#if defined(CONFIG_DEBUG_SLAB) || defined(CONFIG_SLUB)
 extern void *__kmalloc_node_track_caller(size_t, gfp_t, int, void *);
 #define kmalloc_node_track_caller(size, flags, node) \
 	__kmalloc_node_track_caller(size, flags, node, \
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
new file mode 100644
index 000000000000..30b154ce7289
--- /dev/null
+++ b/include/linux/slub_def.h
@@ -0,0 +1,201 @@
+#ifndef _LINUX_SLUB_DEF_H
+#define _LINUX_SLUB_DEF_H
+
+/*
+ * SLUB : A Slab allocator without object queues.
+ *
+ * (C) 2007 SGI, Christoph Lameter <clameter@sgi.com>
+ */
+#include <linux/types.h>
+#include <linux/gfp.h>
+#include <linux/workqueue.h>
+#include <linux/kobject.h>
+
+struct kmem_cache_node {
+	spinlock_t list_lock;	/* Protect partial list and nr_partial */
+	unsigned long nr_partial;
+	atomic_long_t nr_slabs;
+	struct list_head partial;
+};
+
+/*
+ * Slab cache management.
+ */
+struct kmem_cache {
+	/* Used for retriving partial slabs etc */
+	unsigned long flags;
+	int size;		/* The size of an object including meta data */
+	int objsize;		/* The size of an object without meta data */
+	int offset;		/* Free pointer offset. */
+	unsigned int order;
+
+	/*
+	 * Avoid an extra cache line for UP, SMP and for the node local to
+	 * struct kmem_cache.
+	 */
+	struct kmem_cache_node local_node;
+
+	/* Allocation and freeing of slabs */
+	int objects;		/* Number of objects in slab */
+	int refcount;		/* Refcount for slab cache destroy */
+	void (*ctor)(void *, struct kmem_cache *, unsigned long);
+	void (*dtor)(void *, struct kmem_cache *, unsigned long);
+	int inuse;		/* Offset to metadata */
+	int align;		/* Alignment */
+	const char *name;	/* Name (only for display!) */
+	struct list_head list;	/* List of slab caches */
+	struct kobject kobj;	/* For sysfs */
+
+#ifdef CONFIG_NUMA
+	int defrag_ratio;
+	struct kmem_cache_node *node[MAX_NUMNODES];
+#endif
+	struct page *cpu_slab[NR_CPUS];
+};
+
+/*
+ * Kmalloc subsystem.
+ */
+#define KMALLOC_SHIFT_LOW 3
+
+#ifdef CONFIG_LARGE_ALLOCS
+#define KMALLOC_SHIFT_HIGH 25
+#else
+#if !defined(CONFIG_MMU) || NR_CPUS > 512 || MAX_NUMNODES > 256
+#define KMALLOC_SHIFT_HIGH 20
+#else
+#define KMALLOC_SHIFT_HIGH 18
+#endif
+#endif
+
+/*
+ * We keep the general caches in an array of slab caches that are used for
+ * 2^x bytes of allocations.
+ */
+extern struct kmem_cache kmalloc_caches[KMALLOC_SHIFT_HIGH + 1];
+
+/*
+ * Sorry that the following has to be that ugly but some versions of GCC
+ * have trouble with constant propagation and loops.
+ */
+static inline int kmalloc_index(int size)
+{
+	if (size == 0)
+		return 0;
+	if (size > 64 && size <= 96)
+		return 1;
+	if (size > 128 && size <= 192)
+		return 2;
+	if (size <= 8) return 3;
+	if (size <= 16) return 4;
+	if (size <= 32) return 5;
+	if (size <= 64) return 6;
+	if (size <= 128) return 7;
+	if (size <= 256) return 8;
+	if (size <= 512) return 9;
+	if (size <= 1024) return 10;
+	if (size <= 2 * 1024) return 11;
+	if (size <= 4 * 1024) return 12;
+	if (size <= 8 * 1024) return 13;
+	if (size <= 16 * 1024) return 14;
+	if (size <= 32 * 1024) return 15;
+	if (size <= 64 * 1024) return 16;
+	if (size <= 128 * 1024) return 17;
+	if (size <= 256 * 1024) return 18;
+#if KMALLOC_SHIFT_HIGH > 18
+	if (size <= 512 * 1024) return 19;
+	if (size <= 1024 * 1024) return 20;
+#endif
+#if KMALLOC_SHIFT_HIGH > 20
+	if (size <= 2 * 1024 * 1024) return 21;
+	if (size <= 4 * 1024 * 1024) return 22;
+	if (size <= 8 * 1024 * 1024) return 23;
+	if (size <= 16 * 1024 * 1024) return 24;
+	if (size <= 32 * 1024 * 1024) return 25;
+#endif
+	return -1;
+
+/*
+ * What we really wanted to do and cannot do because of compiler issues is:
+ *	int i;
+ *	for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++)
+ *		if (size <= (1 << i))
+ *			return i;
+ */
+}
+
+/*
+ * Find the slab cache for a given combination of allocation flags and size.
+ *
+ * This ought to end up with a global pointer to the right cache
+ * in kmalloc_caches.
+ */
+static inline struct kmem_cache *kmalloc_slab(size_t size)
+{
+	int index = kmalloc_index(size);
+
+	if (index == 0)
+		return NULL;
+
+	if (index < 0) {
+		/*
+		 * Generate a link failure. Would be great if we could
+		 * do something to stop the compile here.
+		 */
+		extern void __kmalloc_size_too_large(void);
+		__kmalloc_size_too_large();
+	}
+	return &kmalloc_caches[index];
+}
+
+#ifdef CONFIG_ZONE_DMA
+#define SLUB_DMA __GFP_DMA
+#else
+/* Disable DMA functionality */
+#define SLUB_DMA 0
+#endif
+
+static inline void *kmalloc(size_t size, gfp_t flags)
+{
+	if (__builtin_constant_p(size) && !(flags & SLUB_DMA)) {
+		struct kmem_cache *s = kmalloc_slab(size);
+
+		if (!s)
+			return NULL;
+
+		return kmem_cache_alloc(s, flags);
+	} else
+		return __kmalloc(size, flags);
+}
+
+static inline void *kzalloc(size_t size, gfp_t flags)
+{
+	if (__builtin_constant_p(size) && !(flags & SLUB_DMA)) {
+		struct kmem_cache *s = kmalloc_slab(size);
+
+		if (!s)
+			return NULL;
+
+		return kmem_cache_zalloc(s, flags);
+	} else
+		return __kzalloc(size, flags);
+}
+
+#ifdef CONFIG_NUMA
+extern void *__kmalloc_node(size_t size, gfp_t flags, int node);
+
+static inline void *kmalloc_node(size_t size, gfp_t flags, int node)
+{
+	if (__builtin_constant_p(size) && !(flags & SLUB_DMA)) {
+		struct kmem_cache *s = kmalloc_slab(size);
+
+		if (!s)
+			return NULL;
+
+		return kmem_cache_alloc_node(s, flags, node);
+	} else
+		return __kmalloc_node(size, flags, node);
+}
+#endif
+
+#endif /* _LINUX_SLUB_DEF_H */