| -rw-r--r-- | Documentation/kernel-parameters.txt | 37 | ||||
| -rw-r--r-- | Documentation/vm/slub.txt | 135 |
2 files changed, 158 insertions, 14 deletions
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index aae2282600ca..ce91560229f5 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt | |||
| @@ -1132,9 +1132,9 @@ and is between 256 and 4096 characters. It is defined in the file | |||
| 1132 | when set. | 1132 | when set. |
| 1133 | Format: <int> | 1133 | Format: <int> |
| 1134 | 1134 | ||
| 1135 | noaliencache [MM, NUMA] Disables the allcoation of alien caches in | 1135 | noaliencache [MM, NUMA, SLAB] Disables the allocation of alien |
| 1136 | the slab allocator. Saves per-node memory, but will | 1136 | caches in the slab allocator. Saves per-node memory, |
| 1137 | impact performance on real NUMA hardware. | 1137 | but will impact performance. |
| 1138 | 1138 | ||
| 1139 | noalign [KNL,ARM] | 1139 | noalign [KNL,ARM] |
| 1140 | 1140 | ||
| @@ -1613,6 +1613,37 @@ and is between 256 and 4096 characters. It is defined in the file | |||
| 1613 | 1613 | ||
| 1614 | slram= [HW,MTD] | 1614 | slram= [HW,MTD] |
| 1615 | 1615 | ||
| 1616 | slub_debug [MM, SLUB] | ||
| 1617 | Enabling slub_debug allows one to determine the culprit | ||
| 1618 | if slab objects become corrupted. Enabling slub_debug | ||
| 1619 | creates guard zones around objects and poisons objects | ||
| 1620 | when not in use. Also tracks the last alloc / free. | ||
| 1621 | For more information see Documentation/vm/slub.txt. | ||
| 1622 | |||
| 1623 | slub_max_order= [MM, SLUB] | ||
| 1624 | Determines the maximum allowed order for slabs. Setting | ||
| 1625 | this too high may cause fragmentation. | ||
| 1626 | For more information see Documentation/vm/slub.txt. | ||
| 1627 | |||
| 1628 | slub_min_objects= [MM, SLUB] | ||
| 1629 | The minimum objects per slab. SLUB will increase the | ||
| 1630 | slab order up to slub_max_order to generate a | ||
| 1631 | sufficiently big slab to satisfy the number of objects. | ||
| 1632 | The higher the number of objects the smaller the overhead | ||
| 1633 | of tracking slabs. | ||
| 1634 | For more information see Documentation/vm/slub.txt. | ||
| 1635 | |||
| 1636 | slub_min_order= [MM, SLUB] | ||
| 1637 | Determines the minimum page order for slabs. Must be | ||
| 1638 | lower than slub_max_order. | ||
| 1639 | For more information see Documentation/vm/slub.txt. | ||
| 1640 | |||
| 1641 | slub_nomerge [MM, SLUB] | ||
| 1642 | Disable merging of slabs of similar size. May be | ||
| 1643 | necessary if there is some reason to distinguish | ||
| 1644 | allocs to different slabs. | ||
| 1645 | For more information see Documentation/vm/slub.txt. | ||
| 1646 | |||
| 1616 | smart2= [HW] | 1647 | smart2= [HW] |
| 1617 | Format: <io1>[,<io2>[,...,<io8>]] | 1648 | Format: <io1>[,<io2>[,...,<io8>]] |
| 1618 | 1649 | ||
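
The interaction between slub_min_objects, slub_min_order and slub_max_order described in the hunk above can be sketched as a small search over page orders. This is a simplified illustration, not the kernel's actual calculation: the function name `pick_slab_order`, the 4 KiB page size, and the fallback rule are assumptions, and the real allocator also weighs wasted space. The defaults (4 objects, maximum order 1) are the values on the new side of the slub.txt hunk in this patch.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def pick_slab_order(object_size, min_objects=4, min_order=0, max_order=1):
    """Pick a page order for a slab holding objects of object_size bytes.

    Hypothetical helper: returns the smallest order in
    [min_order, max_order] whose slab holds at least min_objects objects.
    If no such order exists, min_objects is no longer checked and the
    smallest order (>= min_order) that fits a single object is returned.
    """
    for order in range(min_order, max_order + 1):
        if (PAGE_SIZE << order) // object_size >= min_objects:
            return order

    # Fallback: stop insisting on min_objects, just make one object fit.
    order = min_order
    while (PAGE_SIZE << order) < object_size:
        order += 1
    return order
```

Under these assumptions an 8-byte cache stays at order 0, a 2048-byte cache is raised to order 1 so that four objects fit, and a 4096-byte cache falls back to order 0 because even order 1 cannot hold four objects.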
diff --git a/Documentation/vm/slub.txt b/Documentation/vm/slub.txt index 727c8d81aeaf..1523320abd87 100644 --- a/Documentation/vm/slub.txt +++ b/Documentation/vm/slub.txt | |||
| @@ -1,13 +1,9 @@ | |||
| 1 | Short users guide for SLUB | 1 | Short users guide for SLUB |
| 2 | -------------------------- | 2 | -------------------------- |
| 3 | 3 | ||
| 4 | First of all slub should transparently replace SLAB. If you enable | ||
| 5 | SLUB then everything should work the same (Note the word "should". | ||
| 6 | There is likely not much value in that word at this point). | ||
| 7 | |||
| 8 | The basic philosophy of SLUB is very different from SLAB. SLAB | 4 | The basic philosophy of SLUB is very different from SLAB. SLAB |
| 9 | requires rebuilding the kernel to activate debug options for all | 5 | requires rebuilding the kernel to activate debug options for all |
| 10 | SLABS. SLUB always includes full debugging but its off by default. | 6 | slab caches. SLUB always includes full debugging but it is off by default. |
| 11 | SLUB can enable debugging only for selected slabs in order to avoid | 7 | SLUB can enable debugging only for selected slabs in order to avoid |
| 12 | an impact on overall system performance which may make a bug more | 8 | an impact on overall system performance which may make a bug more |
| 13 | difficult to find. | 9 | difficult to find. |
| @@ -76,13 +72,28 @@ of objects. | |||
| 76 | Careful with tracing: It may spew out lots of information and never stop if | 72 | Careful with tracing: It may spew out lots of information and never stop if |
| 77 | used on the wrong slab. | 73 | used on the wrong slab. |
| 78 | 74 | ||
| 79 | SLAB Merging | 75 | Slab merging |
| 80 | ------------ | 76 | ------------ |
| 81 | 77 | ||
| 82 | If no debugging is specified then SLUB may merge similar slabs together | 78 | If no debug options are specified then SLUB may merge similar slabs together |
| 83 | in order to reduce overhead and increase cache hotness of objects. | 79 | in order to reduce overhead and increase cache hotness of objects. |
| 84 | slabinfo -a displays which slabs were merged together. | 80 | slabinfo -a displays which slabs were merged together. |
| 85 | 81 | ||
| 82 | Slab validation | ||
| 83 | --------------- | ||
| 84 | |||
| 85 | SLUB can validate all objects if the kernel was booted with slub_debug. In | ||
| 86 | order to do so you must have the slabinfo tool. Then you can do | ||
| 87 | |||
| 88 | slabinfo -v | ||
| 89 | |||
| 90 | which will test all objects. Output will be generated to the syslog. | ||
| 91 | |||
| 92 | This also works in a more limited way if the kernel was booted without slub_debug. | ||
| 93 | In that case slabinfo -v simply tests all reachable objects. Usually | ||
| 94 | these are in the cpu slabs and the partial slabs. Full slabs are not | ||
| 95 | tracked by SLUB in a non debug situation. | ||
| 96 | |||
| 86 | Getting more performance | 97 | Getting more performance |
| 87 | ------------------------ | 98 | ------------------------ |
| 88 | 99 | ||
| @@ -91,9 +102,9 @@ list_lock once in a while to deal with partial slabs. That overhead is | |||
| 91 | governed by the order of the allocation for each slab. The allocations | 102 | governed by the order of the allocation for each slab. The allocations |
| 92 | can be influenced by kernel parameters: | 103 | can be influenced by kernel parameters: |
| 93 | 104 | ||
| 94 | slub_min_objects=x (default 8) | 105 | slub_min_objects=x (default 4) |
| 95 | slub_min_order=x (default 0) | 106 | slub_min_order=x (default 0) |
| 96 | slub_max_order=x (default 4) | 107 | slub_max_order=x (default 1) |
| 97 | 108 | ||
| 98 | slub_min_objects allows to specify how many objects must at least fit | 109 | slub_min_objects allows to specify how many objects must at least fit |
| 99 | into one slab in order for the allocation order to be acceptable. | 110 | into one slab in order for the allocation order to be acceptable. |
| @@ -109,5 +120,107 @@ longer be checked. This is useful to avoid SLUB trying to generate | |||
| 109 | super large order pages to fit slub_min_objects of a slab cache with | 120 | super large order pages to fit slub_min_objects of a slab cache with |
| 110 | large object sizes into one high order page. | 121 | large object sizes into one high order page. |
| 111 | 122 | ||
| 112 | 123 | SLUB Debug output | |
| 113 | Christoph Lameter, <clameter@sgi.com>, April 10, 2007 | 124 | ----------------- |
| 125 | |||
| 126 | Here is a sample of slub debug output: | ||
| 127 | |||
| 128 | *** SLUB kmalloc-8: Redzone Active@0xc90f6d20 slab 0xc528c530 offset=3360 flags=0x400000c3 inuse=61 freelist=0xc90f6d58 | ||
| 129 | Bytes b4 0xc90f6d10: 00 00 00 00 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ | ||
| 130 | Object 0xc90f6d20: 31 30 31 39 2e 30 30 35 1019.005 | ||
| 131 | Redzone 0xc90f6d28: 00 cc cc cc . | ||
| 132 | FreePointer 0xc90f6d2c -> 0xc90f6d58 | ||
| 133 | Last alloc: get_modalias+0x61/0xf5 jiffies_ago=53 cpu=1 pid=554 | ||
| 134 | Filler 0xc90f6d50: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ | ||
| 135 | [<c010523d>] dump_trace+0x63/0x1eb | ||
| 136 | [<c01053df>] show_trace_log_lvl+0x1a/0x2f | ||
| 137 | [<c010601d>] show_trace+0x12/0x14 | ||
| 138 | [<c0106035>] dump_stack+0x16/0x18 | ||
| 139 | [<c017e0fa>] object_err+0x143/0x14b | ||
| 140 | [<c017e2cc>] check_object+0x66/0x234 | ||
| 141 | [<c017eb43>] __slab_free+0x239/0x384 | ||
| 142 | [<c017f446>] kfree+0xa6/0xc6 | ||
| 143 | [<c02e2335>] get_modalias+0xb9/0xf5 | ||
| 144 | [<c02e23b7>] dmi_dev_uevent+0x27/0x3c | ||
| 145 | [<c027866a>] dev_uevent+0x1ad/0x1da | ||
| 146 | [<c0205024>] kobject_uevent_env+0x20a/0x45b | ||
| 147 | [<c020527f>] kobject_uevent+0xa/0xf | ||
| 148 | [<c02779f1>] store_uevent+0x4f/0x58 | ||
| 149 | [<c027758e>] dev_attr_store+0x29/0x2f | ||
| 150 | [<c01bec4f>] sysfs_write_file+0x16e/0x19c | ||
| 151 | [<c0183ba7>] vfs_write+0xd1/0x15a | ||
| 152 | [<c01841d7>] sys_write+0x3d/0x72 | ||
| 153 | [<c0104112>] sysenter_past_esp+0x5f/0x99 | ||
| 154 | [<b7f7b410>] 0xb7f7b410 | ||
| 155 | ======================= | ||
| 156 | @@@ SLUB kmalloc-8: Restoring redzone (0xcc) from 0xc90f6d28-0xc90f6d2b | ||
| 157 | |||
| 158 | |||
| 159 | |||
| 160 | If SLUB encounters a corrupted object then it will perform the following | ||
| 161 | actions: | ||
| 162 | |||
| 163 | 1. Isolation and report of the issue | ||
| 164 | |||
| 165 | This will be a message in the system log starting with | ||
| 166 | |||
| 167 | *** SLUB <slab cache affected>: <What went wrong>@<object address> | ||
| 168 | offset=<offset of object into slab> flags=<slabflags> | ||
| 169 | inuse=<objects in use in this slab> freelist=<first free object in slab> | ||
| 170 | |||
| 171 | 2. Report on how the problem was dealt with in order to ensure the continued | ||
| 172 | operation of the system. | ||
| 173 | |||
| 174 | These are messages in the system log beginning with | ||
| 175 | |||
| 176 | @@@ SLUB <slab cache affected>: <corrective action taken> | ||
| 177 | |||
| 178 | |||
| 179 | In the above sample SLUB found that the Redzone of an active object has | ||
| 180 | been overwritten. Here a string of 8 characters was written into a slab that | ||
| 181 | has the length of 8 characters. However, an 8 character string needs a | ||
| 182 | terminating 0. That zero has overwritten the first byte of the Redzone field. | ||
| 183 | After reporting the details of the issue encountered the @@@ SLUB message | ||
| 184 | tells us that SLUB has restored the redzone to its proper value and then | ||
| 185 | system operations continue. | ||
| 186 | |||
| 187 | Various types of lines can follow the *** SLUB line: | ||
| 188 | |||
| 189 | Bytes b4 <address> : <bytes> | ||
| 190 | Show a few bytes before the object where the problem was detected. | ||
| 191 | Can be useful if the corruption does not stop with the start of the | ||
| 192 | object. | ||
| 193 | |||
| 194 | Object <address> : <bytes> | ||
| 195 | The bytes of the object. If the object is inactive then the bytes | ||
| 196 | typically contain poisoning values. Any non-poison value shows a | ||
| 197 | corruption by a write after free. | ||
| 198 | |||
| 199 | Redzone <address> : <bytes> | ||
| 200 | The redzone following the object. The redzone is used to detect | ||
| 201 | writes after the object. All bytes should always have the same | ||
| 202 | value. If there is any deviation then it is due to a write after | ||
| 203 | the object boundary. | ||
| 204 | |||
| 205 | FreePointer <address> -> <address> | ||
| 206 | The pointer to the next free object in the slab. May become | ||
| 207 | corrupted if overwriting continues after the red zone. | ||
| 208 | |||
| 209 | Last alloc: | ||
| 210 | Last free: | ||
| 211 | Shows the address from which the object was allocated/freed last. | ||
| 212 | We note the pid, the time and the CPU that did so. This is usually | ||
| 213 | the most useful information to figure out where things went wrong. | ||
| 214 | Here get_modalias() did a kmalloc(8) instead of a kmalloc(9). | ||
| 215 | |||
| 216 | Filler <address> : <bytes> | ||
| 217 | Unused data to fill up the space in order to get the next object | ||
| 218 | properly aligned. In the debug case we make sure that there are | ||
| 219 | at least 4 bytes of filler. This allows for the detection of writes | ||
| 220 | before the object. | ||
| 221 | |||
| 222 | Following the filler will be a stackdump. That stackdump describes the | ||
| 223 | location where the error was detected. The cause of the corruption is more | ||
| 224 | likely to be found by looking at the information about the last alloc / free. | ||
| 225 | |||
| 226 | Christoph Lameter, <clameter@sgi.com>, May 23, 2007 | ||
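
The redzone failure in the sample report (an 8 character string plus its terminating 0 written into an 8 byte object) can be reproduced in a small user-space model. This is only an illustration of the check-and-restore idea, not kernel code: the helper names are hypothetical, and the layout (an 8 byte object followed by a 4 byte redzone filled with 0xcc) is taken from the sample output above.

```python
REDZONE_BYTE = 0xCC  # value named in the sample's "Restoring redzone (0xcc)" line
OBJECT_SIZE = 8      # kmalloc-8 object from the sample
REDZONE_SIZE = 4     # the sample shows four redzone bytes


def make_slot():
    """Build one object followed by an intact redzone (hypothetical layout)."""
    return bytearray(OBJECT_SIZE) + bytearray([REDZONE_BYTE] * REDZONE_SIZE)


def write_c_string(slot, text):
    """Copy text plus a terminating 0 byte, without any bounds check on the object."""
    data = text.encode("ascii") + b"\x00"
    slot[0:len(data)] = data


def check_redzone(slot):
    """Return the offsets (within the redzone) of bytes that were overwritten."""
    redzone = slot[OBJECT_SIZE:OBJECT_SIZE + REDZONE_SIZE]
    return [i for i, b in enumerate(redzone) if b != REDZONE_BYTE]


def restore_redzone(slot):
    """Put the redzone back to its expected value so operation can continue."""
    slot[OBJECT_SIZE:OBJECT_SIZE + REDZONE_SIZE] = bytes([REDZONE_BYTE]) * REDZONE_SIZE


slot = make_slot()
write_c_string(slot, "1019.005")   # 8 characters + terminating 0 = 9 bytes
damaged = check_redzone(slot)      # first redzone byte is now 0x00
restore_redzone(slot)
```

In this model the overrun damages only the first redzone byte, which mirrors the `00 cc cc cc` redzone line in the sample; a 7 character string would leave the redzone intact.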
