author     Mauro Carvalho Chehab <mchehab+samsung@kernel.org>  2019-04-18 17:35:54 -0400
committer  Mauro Carvalho Chehab <mchehab+samsung@kernel.org>  2019-07-15 08:20:26 -0400
commit     53b9537509654a6267c3f56b4d2e7409b9089686
tree       f239d0c5778ad0757bc60cc99bc7ff9e1de424cb
parent     6baec31591cee0f2f6d446abb81c828499a6ed23
docs: sysctl: convert to ReST

Rename the /proc/sys/ documentation files to ReST, using the README file
as a template for an index.rst, adding the other files there via TOC
markup.

Despite being written at different times with different styles, try to
make them somewhat coherent with a similar look and feel, ensuring that
they'll look nice both as raw text files and via the HTML output
produced by the Sphinx build system.

At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.

Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
-rw-r--r--  Documentation/admin-guide/kernel-parameters.txt  2
-rw-r--r--  Documentation/admin-guide/mm/index.rst  2
-rw-r--r--  Documentation/admin-guide/mm/ksm.rst  2
-rw-r--r--  Documentation/core-api/printk-formats.rst  2
-rw-r--r--  Documentation/networking/ip-sysctl.txt  2
-rw-r--r--  Documentation/sysctl/abi.rst  67
-rw-r--r--  Documentation/sysctl/abi.txt  54
-rw-r--r--  Documentation/sysctl/fs.rst (renamed from Documentation/sysctl/fs.txt)  146
-rw-r--r--  Documentation/sysctl/index.rst (renamed from Documentation/sysctl/README)  36
-rw-r--r--  Documentation/sysctl/kernel.rst (renamed from Documentation/sysctl/kernel.txt)  372
-rw-r--r--  Documentation/sysctl/net.rst (renamed from Documentation/sysctl/net.txt)  141
-rw-r--r--  Documentation/sysctl/sunrpc.rst (renamed from Documentation/sysctl/sunrpc.txt)  13
-rw-r--r--  Documentation/sysctl/user.rst (renamed from Documentation/sysctl/user.txt)  32
-rw-r--r--  Documentation/sysctl/vm.rst (renamed from Documentation/sysctl/vm.txt)  260
-rw-r--r--  Documentation/vm/unevictable-lru.rst  2
-rw-r--r--  kernel/panic.c  2
-rw-r--r--  mm/swap.c  2
17 files changed, 653 insertions, 484 deletions
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 6b2adda1cc03..01123f1de354 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3144,7 +3144,7 @@
 	numa_zonelist_order= [KNL, BOOT] Select zonelist order for NUMA.
 			'node', 'default' can be specified
 			This can be set from sysctl after boot.
-			See Documentation/sysctl/vm.txt for details.
+			See Documentation/sysctl/vm.rst for details.
 
 	ohci1394_dma=early [HW] enable debugging via the ohci1394 driver.
 			See Documentation/debugging-via-ohci1394.txt for more
diff --git a/Documentation/admin-guide/mm/index.rst b/Documentation/admin-guide/mm/index.rst
index ddf8d8d33377..f5e92f33f96e 100644
--- a/Documentation/admin-guide/mm/index.rst
+++ b/Documentation/admin-guide/mm/index.rst
@@ -11,7 +11,7 @@ processes address space and many other cool things.
 Linux memory management is a complex system with many configurable
 settings. Most of these settings are available via ``/proc``
 filesystem and can be quired and adjusted using ``sysctl``. These APIs
-are described in Documentation/sysctl/vm.txt and in `man 5 proc`_.
+are described in Documentation/sysctl/vm.rst and in `man 5 proc`_.
 
 .. _man 5 proc: http://man7.org/linux/man-pages/man5/proc.5.html
 
diff --git a/Documentation/admin-guide/mm/ksm.rst b/Documentation/admin-guide/mm/ksm.rst
index 9303786632d1..7b2b8767c0b4 100644
--- a/Documentation/admin-guide/mm/ksm.rst
+++ b/Documentation/admin-guide/mm/ksm.rst
@@ -59,7 +59,7 @@ MADV_UNMERGEABLE is applied to a range which was never MADV_MERGEABLE.
 
 If a region of memory must be split into at least one new MADV_MERGEABLE
 or MADV_UNMERGEABLE region, the madvise may return ENOMEM if the process
-will exceed ``vm.max_map_count`` (see Documentation/sysctl/vm.txt).
+will exceed ``vm.max_map_count`` (see Documentation/sysctl/vm.rst).
 
 Like other madvise calls, they are intended for use on mapped areas of
 the user address space: they will report ENOMEM if the specified range
diff --git a/Documentation/core-api/printk-formats.rst b/Documentation/core-api/printk-formats.rst
index 75d2bbe9813f..1d8e748f909f 100644
--- a/Documentation/core-api/printk-formats.rst
+++ b/Documentation/core-api/printk-formats.rst
@@ -119,7 +119,7 @@ Kernel Pointers
 
 For printing kernel pointers which should be hidden from unprivileged
 users. The behaviour of %pK depends on the kptr_restrict sysctl - see
-Documentation/sysctl/kernel.txt for more details.
+Documentation/sysctl/kernel.rst for more details.
 
 Unmodified Addresses
 --------------------
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 48c79e78817b..5c3399cde1c4 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -2287,7 +2287,7 @@ addr_scope_policy - INTEGER
 
 
 /proc/sys/net/core/*
-	Please see: Documentation/sysctl/net.txt for descriptions of these entries.
+	Please see: Documentation/sysctl/net.rst for descriptions of these entries.
 
 
 /proc/sys/net/unix/*
diff --git a/Documentation/sysctl/abi.rst b/Documentation/sysctl/abi.rst
new file mode 100644
index 000000000000..599bcde7f0b7
--- /dev/null
+++ b/Documentation/sysctl/abi.rst
@@ -0,0 +1,67 @@
+================================
+Documentation for /proc/sys/abi/
+================================
+
+kernel version 2.6.0.test2
+
+Copyright (c) 2003, Fabian Frederick <ffrederick@users.sourceforge.net>
+
+For general info: index.rst.
+
+------------------------------------------------------------------------------
+
+This path is binary emulation relevant aka personality types aka abi.
+When a process is executed, it's linked to an exec_domain whose
+personality is defined using values available from /proc/sys/abi.
+You can find further details about abi in include/linux/personality.h.
+
+Here are the files featuring in 2.6 kernel:
+
+- defhandler_coff
+- defhandler_elf
+- defhandler_lcall7
+- defhandler_libcso
+- fake_utsname
+- trace
+
+defhandler_coff
+---------------
+
+defined value:
+	PER_SCOSVR3::
+
+		0x0003 | STICKY_TIMEOUTS | WHOLE_SECONDS | SHORT_INODE
+
+defhandler_elf
+--------------
+
+defined value:
+	PER_LINUX::
+
+		0
+
+defhandler_lcall7
+-----------------
+
+defined value :
+	PER_SVR4::
+
+		0x0001 | STICKY_TIMEOUTS | MMAP_PAGE_ZERO,
+
+defhandler_libsco
+-----------------
+
+defined value:
+	PER_SVR4::
+
+		0x0001 | STICKY_TIMEOUTS | MMAP_PAGE_ZERO,
+
+fake_utsname
+------------
+
+Unused
+
+trace
+-----
+
+Unused
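Each defhandler_* value documented in abi.rst is a personality number in the low byte ORed with flag bits. As a minimal sketch, the numeric values can be computed as below. The flag constants are copied by hand from include/uapi/linux/personality.h and are an assumption to check against your tree; the `*_VALUE` macros and the two helper functions are hypothetical names used only for illustration.

```c
/*
 * Flag bits, assumed to match include/uapi/linux/personality.h.
 * They are redeclared here instead of including <sys/personality.h>
 * so that the sketch stays self-contained.
 */
enum {
	MMAP_PAGE_ZERO  = 0x0100000,
	SHORT_INODE     = 0x1000000,
	WHOLE_SECONDS   = 0x2000000,
	STICKY_TIMEOUTS = 0x4000000,
};

/* Hypothetical names for the defhandler_* values listed in abi.rst. */
#define PER_LINUX_VALUE   0x0000u
#define PER_SVR4_VALUE    (0x0001u | STICKY_TIMEOUTS | MMAP_PAGE_ZERO)
#define PER_SCOSVR3_VALUE (0x0003u | STICKY_TIMEOUTS | WHOLE_SECONDS | \
			   SHORT_INODE)

/* The low byte (PER_MASK, 0x00ff) holds the personality number;
 * the bits above it are flags. */
static unsigned int personality_base(unsigned int value)
{
	return value & 0x00ffu;
}

/* Report whether a given flag bit is set in a personality value. */
static int personality_has_flag(unsigned int value, unsigned int flag)
{
	return (value & flag) != 0;
}
```

With these definitions PER_SVR4_VALUE evaluates to 0x04100001 and PER_SCOSVR3_VALUE to 0x07000003, and masking either with 0x00ff recovers the base personality number.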
diff --git a/Documentation/sysctl/abi.txt b/Documentation/sysctl/abi.txt
deleted file mode 100644
index 63f4ebcf652c..000000000000
--- a/Documentation/sysctl/abi.txt
+++ /dev/null
@@ -1,54 +0,0 @@
-Documentation for /proc/sys/abi/* kernel version 2.6.0.test2
-	(c) 2003, Fabian Frederick <ffrederick@users.sourceforge.net>
-
-For general info : README.
-
-==============================================================
-
-This path is binary emulation relevant aka personality types aka abi.
-When a process is executed, it's linked to an exec_domain whose
-personality is defined using values available from /proc/sys/abi.
-You can find further details about abi in include/linux/personality.h.
-
-Here are the files featuring in 2.6 kernel :
-
-- defhandler_coff
-- defhandler_elf
-- defhandler_lcall7
-- defhandler_libcso
-- fake_utsname
-- trace
-
-===========================================================
-defhandler_coff:
-defined value :
-PER_SCOSVR3
-0x0003 | STICKY_TIMEOUTS | WHOLE_SECONDS | SHORT_INODE
-
-===========================================================
-defhandler_elf:
-defined value :
-PER_LINUX
-0
-
-===========================================================
-defhandler_lcall7:
-defined value :
-PER_SVR4
-0x0001 | STICKY_TIMEOUTS | MMAP_PAGE_ZERO,
-
-===========================================================
-defhandler_libsco:
-defined value:
-PER_SVR4
-0x0001 | STICKY_TIMEOUTS | MMAP_PAGE_ZERO,
-
-===========================================================
-fake_utsname:
-Unused
-
-===========================================================
-trace:
-Unused
-
-===========================================================
diff --git a/Documentation/sysctl/fs.txt b/Documentation/sysctl/fs.rst
index ebc679bcb2dc..2a45119e3331 100644
--- a/Documentation/sysctl/fs.txt
+++ b/Documentation/sysctl/fs.rst
@@ -1,10 +1,16 @@
-Documentation for /proc/sys/fs/* kernel version 2.2.10
-	(c) 1998, 1999, Rik van Riel <riel@nl.linux.org>
-	(c) 2009, Shen Feng<shen@cn.fujitsu.com>
+===============================
+Documentation for /proc/sys/fs/
+===============================
 
-For general info and legal blurb, please look in README.
+kernel version 2.2.10
 
-==============================================================
+Copyright (c) 1998, 1999, Rik van Riel <riel@nl.linux.org>
+
+Copyright (c) 2009, Shen Feng<shen@cn.fujitsu.com>
+
+For general info and legal blurb, please look in intro.rst.
+
+------------------------------------------------------------------------------
 
 This file contains documentation for the sysctl files in
 /proc/sys/fs/ and is valid for Linux kernel version 2.2.
@@ -16,9 +22,10 @@ system, it is advisable to read both documentation and source
 before actually making adjustments.
 
 1. /proc/sys/fs
-----------------------------------------------------------
+===============
 
 Currently, these files are in /proc/sys/fs:
+
 - aio-max-nr
 - aio-nr
 - dentry-state
@@ -42,9 +49,9 @@ Currently, these files are in /proc/sys/fs:
 - super-max
 - super-nr
 
-==============================================================
 
-aio-nr & aio-max-nr:
+aio-nr & aio-max-nr
+-------------------
 
 aio-nr is the running total of the number of events specified on the
 io_setup system call for all currently active aio contexts. If aio-nr
@@ -52,21 +59,20 @@ reaches aio-max-nr then io_setup will fail with EAGAIN. Note that
 raising aio-max-nr does not result in the pre-allocation or re-sizing
 of any kernel data structures.
 
-==============================================================
 
-dentry-state:
+dentry-state
+------------
 
-From linux/include/linux/dcache.h:
---------------------------------------------------------------
-struct dentry_stat_t dentry_stat {
-	int nr_dentry;
-	int nr_unused;
-	int age_limit; /* age in seconds */
-	int want_pages; /* pages requested by system */
-	int nr_negative; /* # of unused negative dentries */
-	int dummy; /* Reserved for future use */
-};
---------------------------------------------------------------
+From linux/include/linux/dcache.h::
+
+	struct dentry_stat_t dentry_stat {
+		int nr_dentry;
+		int nr_unused;
+		int age_limit; /* age in seconds */
+		int want_pages; /* pages requested by system */
+		int nr_negative; /* # of unused negative dentries */
+		int dummy; /* Reserved for future use */
+	};
 
 Dentries are dynamically allocated and deallocated.
 
@@ -84,9 +90,9 @@ negative dentries which do not map to any files. Instead,
 they help speeding up rejection of non-existing files provided
 by the users.
 
-==============================================================
 
-dquot-max & dquot-nr:
+dquot-max & dquot-nr
+--------------------
 
 The file dquot-max shows the maximum number of cached disk
 quota entries.
@@ -98,9 +104,9 @@ If the number of free cached disk quotas is very low and
 you have some awesome number of simultaneous system users,
 you might want to raise the limit.
 
-==============================================================
 
-file-max & file-nr:
+file-max & file-nr
+------------------
 
 The value in file-max denotes the maximum number of file-
 handles that the Linux kernel will allocate. When you get lots
@@ -119,18 +125,19 @@ used file handles.
 Attempts to allocate more file descriptors than file-max are
 reported with printk, look for "VFS: file-max limit <number>
 reached".
-==============================================================
 
-nr_open:
+
+nr_open
+-------
 
 This denotes the maximum number of file-handles a process can
 allocate. Default value is 1024*1024 (1048576) which should be
 enough for most machines. Actual limit depends on RLIMIT_NOFILE
 resource limit.
 
-==============================================================
 
-inode-max, inode-nr & inode-state:
+inode-max, inode-nr & inode-state
+---------------------------------
 
 As with file handles, the kernel allocates the inode structures
 dynamically, but can't free them yet.
@@ -157,9 +164,9 @@ preshrink is nonzero when the nr_inodes > inode-max and the
 system needs to prune the inode list instead of allocating
 more.
 
-==============================================================
 
-overflowgid & overflowuid:
+overflowgid & overflowuid
+-------------------------
 
 Some filesystems only support 16-bit UIDs and GIDs, although in Linux
 UIDs and GIDs are 32 bits. When one of these filesystems is mounted
@@ -169,18 +176,18 @@ to a fixed value before being written to disk.
 These sysctls allow you to change the value of the fixed UID and GID.
 The default is 65534.
 
-==============================================================
 
-pipe-user-pages-hard:
+pipe-user-pages-hard
+--------------------
 
 Maximum total number of pages a non-privileged user may allocate for pipes.
 Once this limit is reached, no new pipes may be allocated until usage goes
 below the limit again. When set to 0, no limit is applied, which is the default
 setting.
 
-==============================================================
 
-pipe-user-pages-soft:
+pipe-user-pages-soft
+--------------------
 
 Maximum total number of pages a non-privileged user may allocate for pipes
 before the pipe size gets limited to a single page. Once this limit is reached,
@@ -190,9 +197,9 @@ denied until usage goes below the limit again. The default value allows to
 allocate up to 1024 pipes at their default size. When set to 0, no limit is
 applied.
 
-==============================================================
 
-protected_fifos:
+protected_fifos
+---------------
 
 The intent of this protection is to avoid unintentional writes to
 an attacker-controlled FIFO, where a program expected to create a regular
@@ -208,9 +215,9 @@ When set to "2" it also applies to group writable sticky directories.
 
 This protection is based on the restrictions in Openwall.
 
-==============================================================
 
-protected_hardlinks:
+protected_hardlinks
+--------------------
 
 A long-standing class of security issues is the hardlink-based
 time-of-check-time-of-use race, most commonly seen in world-writable
@@ -228,9 +235,9 @@ already own the source file, or do not have read/write access to it.
 
 This protection is based on the restrictions in Openwall and grsecurity.
 
-==============================================================
 
-protected_regular:
+protected_regular
+-----------------
 
 This protection is similar to protected_fifos, but it
 avoids writes to an attacker-controlled regular file, where a program
@@ -244,9 +251,9 @@ owned by the owner of the directory.
 
 When set to "2" it also applies to group writable sticky directories.
 
-==============================================================
 
-protected_symlinks:
+protected_symlinks
+------------------
 
 A long-standing class of security issues is the symlink-based
 time-of-check-time-of-use race, most commonly seen in world-writable
@@ -264,34 +271,38 @@ follower match, or when the directory owner matches the symlink's owner.
 
 This protection is based on the restrictions in Openwall and grsecurity.
 
-==============================================================
 
 suid_dumpable:
+--------------
 
 This value can be used to query and set the core dump mode for setuid
 or otherwise protected/tainted binaries. The modes are
 
-0 - (default) - traditional behaviour. Any process which has changed
-	privilege levels or is execute only will not be dumped.
-1 - (debug) - all processes dump core when possible. The core dump is
-	owned by the current user and no security is applied. This is
-	intended for system debugging situations only. Ptrace is unchecked.
-	This is insecure as it allows regular users to examine the memory
-	contents of privileged processes.
-2 - (suidsafe) - any binary which normally would not be dumped is dumped
-	anyway, but only if the "core_pattern" kernel sysctl is set to
-	either a pipe handler or a fully qualified path. (For more details
-	on this limitation, see CVE-2006-2451.) This mode is appropriate
-	when administrators are attempting to debug problems in a normal
-	environment, and either have a core dump pipe handler that knows
-	to treat privileged core dumps with care, or specific directory
-	defined for catching core dumps. If a core dump happens without
-	a pipe handler or fully qualifid path, a message will be emitted
-	to syslog warning about the lack of a correct setting.
-
-==============================================================
-
-super-max & super-nr:
+= ========== ===============================================================
+0 (default)  traditional behaviour. Any process which has changed
+             privilege levels or is execute only will not be dumped.
+1 (debug)    all processes dump core when possible. The core dump is
+             owned by the current user and no security is applied. This is
+             intended for system debugging situations only.
+             Ptrace is unchecked.
+             This is insecure as it allows regular users to examine the
+             memory contents of privileged processes.
+2 (suidsafe) any binary which normally would not be dumped is dumped
+             anyway, but only if the "core_pattern" kernel sysctl is set to
+             either a pipe handler or a fully qualified path. (For more
+             details on this limitation, see CVE-2006-2451.) This mode is
+             appropriate when administrators are attempting to debug
+             problems in a normal environment, and either have a core dump
+             pipe handler that knows to treat privileged core dumps with
+             care, or specific directory defined for catching core dumps.
+             If a core dump happens without a pipe handler or fully
+             qualified path, a message will be emitted to syslog warning
+             about the lack of a correct setting.
+= ========== ===============================================================
+
+
+super-max & super-nr
+--------------------
 
 These numbers control the maximum number of superblocks, and
 thus the maximum number of mounted filesystems the kernel
@@ -299,33 +310,33 @@ can have. You only need to increase super-max if you need to
 mount more filesystems than the current value in super-max
 allows you to.
 
-==============================================================
 
-aio-nr & aio-max-nr:
+aio-nr & aio-max-nr
+-------------------
 
 aio-nr shows the current system-wide number of asynchronous io
 requests. aio-max-nr allows you to change the maximum value
 aio-nr can grow to.
 
-==============================================================
 
-mount-max:
+mount-max
+---------
 
 This denotes the maximum number of mounts that may exist
 in a mount namespace.
 
-==============================================================
 
 
 2. /proc/sys/fs/binfmt_misc
-----------------------------------------------------------
+===========================
 
 Documentation for the files in /proc/sys/fs/binfmt_misc is
 in Documentation/admin-guide/binfmt-misc.rst.
 
 
 3. /proc/sys/fs/mqueue - POSIX message queues filesystem
-----------------------------------------------------------
+========================================================
+
 
 The "mqueue" filesystem provides the necessary kernel features to enable the
 creation of a user space library that implements the POSIX message queues
@@ -356,7 +367,7 @@ the default message size value if attr parameter of mq_open(2) is NULL. If it
 exceed msgsize_max, the default value is initialized msgsize_max.
 
 4. /proc/sys/fs/epoll - Configuration options for the epoll interface
---------------------------------------------------------
+=====================================================================
 
 This directory contains configuration options for the epoll(7) interface.
 
@@ -371,4 +382,3 @@ Each "watch" costs roughly 90 bytes on a 32bit kernel, and roughly 160 bytes
 on a 64bit one.
 The current default value for max_user_watches is the 1/32 of the available
 low memory, divided for the "watch" cost in bytes.
-
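The dentry_stat_t layout shown in fs.rst is what userspace sees when it reads /proc/sys/fs/dentry-state. A minimal sketch of parsing that file follows, assuming it prints the six fields as whitespace-separated integers in struct order. The `struct dentry_state` mirror and both helper functions are hypothetical names, not a kernel or libc API.

```c
#include <stdio.h>

/* Userspace mirror of the field order of struct dentry_stat_t above.
 * long is used defensively for the counters (an assumption). */
struct dentry_state {
	long nr_dentry;
	long nr_unused;
	long age_limit;   /* age in seconds */
	long want_pages;  /* pages requested by system */
	long nr_negative; /* # of unused negative dentries */
	long dummy;       /* reserved for future use */
};

/* Parse one line of /proc/sys/fs/dentry-state.
 * Returns 0 on success, -1 if fewer than six numbers were found. */
static int parse_dentry_state(const char *buf, struct dentry_state *ds)
{
	int n = sscanf(buf, "%ld %ld %ld %ld %ld %ld",
		       &ds->nr_dentry, &ds->nr_unused, &ds->age_limit,
		       &ds->want_pages, &ds->nr_negative, &ds->dummy);

	return n == 6 ? 0 : -1;
}

/* Read and parse the live file; returns -1 if it cannot be read. */
static int read_dentry_state(struct dentry_state *ds)
{
	char buf[256];
	FILE *f = fopen("/proc/sys/fs/dentry-state", "r");

	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_dentry_state(buf, ds);
}
```

Keeping the parser separate from the file access lets it be checked against a captured line of output; the `%ld` conversions skip tabs as well as spaces, so the exact separator the kernel prints does not matter.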
diff --git a/Documentation/sysctl/README b/Documentation/sysctl/index.rst
index d5f24ab0ecc3..efbcde8c1c9c 100644
--- a/Documentation/sysctl/README
+++ b/Documentation/sysctl/index.rst
@@ -1,5 +1,12 @@
-Documentation for /proc/sys/ kernel version 2.2.10
-	(c) 1998, 1999, Rik van Riel <riel@nl.linux.org>
+:orphan:
+
+===========================
+Documentation for /proc/sys
+===========================
+
+Copyright (c) 1998, 1999, Rik van Riel <riel@nl.linux.org>
+
+------------------------------------------------------------------------------
 
 'Why', I hear you ask, 'would anyone even _want_ documentation
 for them sysctl files? If anybody really needs it, it's all in
@@ -12,11 +19,12 @@ have the time or knowledge to read the source code.
 Furthermore, the programmers who built sysctl have built it to
 be actually used, not just for the fun of programming it :-)
 
-==============================================================
+------------------------------------------------------------------------------
 
 Legal blurb:
 
 As usual, there are two main things to consider:
+
 1. you get what you pay for
 2. it's free
 
@@ -35,15 +43,17 @@ stories to: <riel@nl.linux.org>
 
 Rik van Riel.
 
-==============================================================
+--------------------------------------------------------------
 
-Introduction:
+Introduction
+============
 
 Sysctl is a means of configuring certain aspects of the kernel
 at run-time, and the /proc/sys/ directory is there so that you
 don't even need special tools to do it!
 In fact, there are only four things needed to use these config
 facilities:
+
 - a running Linux system
 - root access
 - common sense (this is especially hard to come by these days)
@@ -54,7 +64,9 @@ several (arch-dependent?) subdirs. Each subdir is mainly about
 one part of the kernel, so you can do configuration on a piece
 by piece basis, or just some 'thematic frobbing'.
 
-The subdirs are about:
+This documentation is about:
+
+=============== ===============================================================
 abi/            execution domains & personalities
 debug/          <empty>
 dev/            device specific information (eg dev/cdrom/info)
@@ -70,7 +82,19 @@ sunrpc/ SUN Remote Procedure Call (NFS)
 vm/             memory management tuning
                 buffer and cache management
 user/           Per user per user namespace limits
+=============== ===============================================================
 
 These are the subdirs I have on my system. There might be more
 or other subdirs in another setup. If you see another dir, I'd
 really like to hear about it :-)
+
+.. toctree::
+   :maxdepth: 1
+
+   abi
+   fs
+   kernel
+   net
+   sunrpc
+   user
+   vm
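Each subdirectory in the index table corresponds to a prefix of a dotted sysctl name, so a name such as `vm.max_map_count` refers to the file /proc/sys/vm/max_map_count; this dots-to-slashes mapping is the convention used by sysctl(8). The sketch below relies only on that convention and plain file I/O. The helper names are hypothetical, and names whose components themselves contain dots (for example some network interface names) are not handled.

```c
#include <stdio.h>
#include <string.h>

/* Map a dotted sysctl name such as "fs.file-max" onto its /proc/sys path
 * by replacing every '.' with '/'. Returns 0 on success, -1 if the
 * output buffer is too small. */
static int sysctl_name_to_path(const char *name, char *out, size_t outlen)
{
	int n = snprintf(out, outlen, "/proc/sys/%s", name);

	if (n < 0 || (size_t)n >= outlen)
		return -1;
	for (char *p = out + strlen("/proc/sys/"); *p; p++)
		if (*p == '.')
			*p = '/';
	return 0;
}

/* Read a sysctl holding a single integer, e.g. "vm.max_map_count".
 * Returns 0 on success, -1 if the file is missing or not numeric. */
static int sysctl_read_long(const char *name, long *value)
{
	char path[256];
	FILE *f;
	int ok;

	if (sysctl_name_to_path(name, path, sizeof(path)) != 0)
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fscanf(f, "%ld", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```

Changing a value works the same way by opening the path for writing, which normally requires root; the sketch only reads.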
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.rst
index 1b2fe17cd2fa..a0c1d4ce403a 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.rst
@@ -1,10 +1,16 @@
-Documentation for /proc/sys/kernel/* kernel version 2.2.10
-	(c) 1998, 1999, Rik van Riel <riel@nl.linux.org>
-	(c) 2009, Shen Feng<shen@cn.fujitsu.com>
+===================================
+Documentation for /proc/sys/kernel/
+===================================
 
-For general info and legal blurb, please look in README.
+kernel version 2.2.10
 
-==============================================================
+Copyright (c) 1998, 1999, Rik van Riel <riel@nl.linux.org>
+
+Copyright (c) 2009, Shen Feng<shen@cn.fujitsu.com>
+
+For general info and legal blurb, please look in index.rst.
+
+------------------------------------------------------------------------------
 
 This file contains documentation for the sysctl files in
 /proc/sys/kernel/ and is valid for Linux kernel version 2.2.
@@ -101,9 +107,9 @@ show up in /proc/sys/kernel:
 - watchdog_thresh
 - version
 
-==============================================================
 
 acct:
+=====
 
 highwater lowwater frequency
 
@@ -118,18 +124,18 @@ That is, suspend accounting if there left <= 2% free; resume it
 if we got >=4%; consider information about amount of free space
 valid for 30 seconds.
 
-==============================================================
 
 acpi_video_flags:
+=================
 
 flags
 
 See Doc*/kernel/power/video.txt, it allows mode of video boot to be
 set during run time.
 
-==============================================================
 
 auto_msgmni:
+============
 
 This variable has no effect and may be removed in future kernel
 releases. Reading it always returns 0.
@@ -139,9 +145,8 @@ Echoing "1" into this file enabled msgmni automatic recomputing.
 Echoing "0" turned it off. auto_msgmni default value was 1.
 
 
-==============================================================
-
 bootloader_type:
+================
 
 x86 bootloader identification
 
@@ -156,9 +161,9 @@ the value 340 = 0x154.
 See the type_of_loader and ext_loader_type fields in
 Documentation/x86/boot.rst for additional information.
 
-==============================================================
 
 bootloader_version:
+===================
 
 x86 bootloader version
 
@@ -168,27 +173,31 @@ file will contain the value 564 = 0x234.
 See the type_of_loader and ext_loader_ver fields in
 Documentation/x86/boot.rst for additional information.
 
-==============================================================
 
-cap_last_cap
+cap_last_cap:
+=============
 
 Highest valid capability of the running kernel. Exports
 CAP_LAST_CAP from the kernel.
 
-==============================================================
 
 core_pattern:
+=============
 
 core_pattern is used to specify a core dumpfile pattern name.
-. max length 127 characters; default value is "core"
-. core_pattern is used as a pattern template for the output filename;
+
+* max length 127 characters; default value is "core"
+* core_pattern is used as a pattern template for the output filename;
   certain string patterns (beginning with '%') are substituted with
   their actual values.
-. backward compatibility with core_uses_pid:
+* backward compatibility with core_uses_pid:
+
         If core_pattern does not include "%p" (default does not)
         and core_uses_pid is set, then .PID will be appended to
         the filename.
-. corename format specifiers:
+
+* corename format specifiers::
+
         %<NUL>  '%' is dropped
         %%      output one '%'
         %p      pid
@@ -205,13 +214,14 @@ core_pattern is used to specify a core dumpfile pattern name.
         %e      executable filename (may be shortened)
         %E      executable path
         %<OTHER> both are dropped
-. If the first character of the pattern is a '|', the kernel will treat
+
+* If the first character of the pattern is a '|', the kernel will treat
   the rest of the pattern as a command to run. The core dump will be
   written to the standard input of that program instead of to a file.
 
-==============================================================
 
 core_pipe_limit:
+================
 
 This sysctl is only applicable when core_pattern is configured to pipe
 core files to a user space helper (when the first character of
@@ -232,9 +242,9 @@ parallel, but that no waiting will take place (i.e. the collecting
 process is not guaranteed access to /proc/<crashing pid>/). This
 value defaults to 0.
 
-==============================================================
 
 core_uses_pid:
+==============
 
 The default coredump filename is "core". By setting
 core_uses_pid to 1, the coredump filename becomes core.PID.
@@ -242,9 +252,9 @@ If core_pattern does not include "%p" (default does not)
 and core_uses_pid is set, then .PID will be appended to
 the filename.
 
-==============================================================
 
 ctrl-alt-del:
+=============
 
 When the value in this file is 0, ctrl-alt-del is trapped and
 sent to the init(1) program to handle a graceful restart.
@@ -252,14 +262,15 @@ When, however, the value is > 0, Linux's reaction to a Vulcan
 Nerve Pinch (tm) will be an immediate reboot, without even
 syncing its dirty buffers.
 
-Note: when a program (like dosemu) has the keyboard in 'raw'
-mode, the ctrl-alt-del is intercepted by the program before it
-ever reaches the kernel tty layer, and it's up to the program
-to decide what to do with it.
+Note:
+  when a program (like dosemu) has the keyboard in 'raw'
+  mode, the ctrl-alt-del is intercepted by the program before it
+  ever reaches the kernel tty layer, and it's up to the program
+  to decide what to do with it.
 
-==============================================================
 
 dmesg_restrict:
+===============
 
 This toggle indicates whether unprivileged users are prevented
 from using dmesg(8) to view messages from the kernel's log buffer.
@@ -270,18 +281,21 @@ dmesg(8).
 The kernel config option CONFIG_SECURITY_DMESG_RESTRICT sets the
 default value of dmesg_restrict.
 
-==============================================================
 
 domainname & hostname:
+======================
 
 These files can be used to set the NIS/YP domainname and the
 hostname of your box in exactly the same way as the commands
-domainname and hostname, i.e.:
-# echo "darkstar" > /proc/sys/kernel/hostname
-# echo "mydomain" > /proc/sys/kernel/domainname
-has the same effect as
-# hostname "darkstar"
-# domainname "mydomain"
+domainname and hostname, i.e.::
+
+        # echo "darkstar" > /proc/sys/kernel/hostname
+        # echo "mydomain" > /proc/sys/kernel/domainname
+
+has the same effect as::
+
+        # hostname "darkstar"
+        # domainname "mydomain"
 
 Note, however, that the classic darkstar.frop.org has the
 hostname "darkstar" and DNS (Internet Domain Name Server)
@@ -290,8 +304,9 @@ Information Service) or YP (Yellow Pages) domainname. These two
 domain names are in general different. For a detailed discussion
 see the hostname(1) man page.
 
-==============================================================
+
 hardlockup_all_cpu_backtrace:
+=============================
 
 This value controls the hard lockup detector behavior when a hard
 lockup condition is detected as to whether or not to gather further
@@ -301,9 +316,10 @@ will be initiated.
 0: do nothing. This is the default behavior.
 
 1: on detection capture more debug information.
-==============================================================
+
 
 hardlockup_panic:
+=================
 
 This parameter can be used to control whether the kernel panics
 when a hard lockup is detected.
@@ -314,16 +330,16 @@ when a hard lockup is detected.
 See Documentation/lockup-watchdogs.txt for more information. This can
 also be set using the nmi_watchdog kernel parameter.
 
-==============================================================
 
 hotplug:
+========
 
 Path for the hotplug policy agent.
 Default value is "/sbin/hotplug".
 
-==============================================================
 
 hung_task_panic:
+================
 
 Controls the kernel's behavior when a hung task is detected.
 This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
@@ -332,27 +348,28 @@ This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
 
 1: panic immediately.
 
-==============================================================
 
 hung_task_check_count:
+======================
 
 The upper bound on the number of tasks that are checked.
 This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
 
-==============================================================
 
 hung_task_timeout_secs:
+=======================
 
 When a task in D state did not get scheduled
 for more than this value report a warning.
 This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
 
 0: means infinite timeout - no checking done.
+
 Possible values to set are in range {0..LONG_MAX/HZ}.
 
-==============================================================
 
 hung_task_check_interval_secs:
+==============================
 
 Hung task check interval. If hung task checking is enabled
 (see hung_task_timeout_secs), the check is done every
@@ -362,9 +379,9 @@ This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
 0 (default): means use hung_task_timeout_secs as checking interval.
 Possible values to set are in range {0..LONG_MAX/HZ}.
 
-==============================================================
 
 hung_task_warnings:
+===================
 
 The maximum number of warnings to report. During a check interval
 if a hung task is detected, this value is decreased by 1.
@@ -373,9 +390,9 @@ This file shows up if CONFIG_DETECT_HUNG_TASK is enabled.
 
 -1: report an infinite number of warnings.
 
-==============================================================
 
 hyperv_record_panic_msg:
+========================
 
 Controls whether the panic kmsg data should be reported to Hyper-V.
 
@@ -383,9 +400,9 @@ Controls whether the panic kmsg data should be reported to Hyper-V.
 
 1: report the panic kmsg data. This is the default behavior.
 
-==============================================================
 
 kexec_load_disabled:
+====================
 
 A toggle indicating if the kexec_load syscall has been disabled. This
 value defaults to 0 (false: kexec_load enabled), but can be set to 1
@@ -395,9 +412,9 @@ loaded before disabling the syscall, allowing a system to set up (and
 later use) an image without it being altered. Generally used together
 with the "modules_disabled" sysctl.
 
-==============================================================
 
 kptr_restrict:
+==============
 
 This toggle indicates whether restrictions are placed on
 exposing kernel addresses via /proc and other interfaces.
@@ -420,16 +437,16 @@ values to unprivileged users is a concern.
 When kptr_restrict is set to (2), kernel pointers printed using
 %pK will be replaced with 0's regardless of privileges.
 
-==============================================================
 
 l2cr: (PPC only)
+================
 
 This flag controls the L2 cache of G3 processor boards. If
 0, the cache is disabled. Enabled if nonzero.
 
-==============================================================
 
 modules_disabled:
+=================
 
 A toggle value indicating if modules are allowed to be loaded
 in an otherwise modular kernel. This toggle defaults to off
@@ -437,9 +454,9 @@ in an otherwise modular kernel. This toggle defaults to off
 neither loaded nor unloaded, and the toggle cannot be set back
 to false. Generally used with the "kexec_load_disabled" toggle.
 
-==============================================================
 
 msg_next_id, sem_next_id, and shm_next_id:
+==========================================
 
 These three toggles allows to specify desired id for next allocated IPC
 object: message, semaphore or shared memory respectively.
@@ -448,21 +465,22 @@ By default they are equal to -1, which means generic allocation logic.
 Possible values to set are in range {0..INT_MAX}.
 
 Notes:
-1) kernel doesn't guarantee, that new object will have desired id. So,
-it's up to userspace, how to handle an object with "wrong" id.
-2) Toggle with non-default value will be set back to -1 by kernel after
-successful IPC object allocation. If an IPC object allocation syscall
-fails, it is undefined if the value remains unmodified or is reset to -1.
+  1) kernel doesn't guarantee, that new object will have desired id. So,
+     it's up to userspace, how to handle an object with "wrong" id.
+  2) Toggle with non-default value will be set back to -1 by kernel after
+     successful IPC object allocation. If an IPC object allocation syscall
+     fails, it is undefined if the value remains unmodified or is reset to -1.
 
-==============================================================
 
 nmi_watchdog:
+=============
 
 This parameter can be used to control the NMI watchdog
 (i.e. the hard lockup detector) on x86 systems.
 
-   0 - disable the hard lockup detector
-   1 - enable the hard lockup detector
+0 - disable the hard lockup detector
+
+1 - enable the hard lockup detector
 
 The hard lockup detector monitors each CPU for its ability to respond to
 timer interrupts. The mechanism utilizes CPU performance counter registers
@@ -470,15 +488,15 @@ that are programmed to generate Non-Maskable Interrupts (NMIs) periodically
 while a CPU is busy. Hence, the alternative name 'NMI watchdog'.
 
 The NMI watchdog is disabled by default if the kernel is running as a guest
-in a KVM virtual machine. This default can be overridden by adding
+in a KVM virtual machine. This default can be overridden by adding::
 
    nmi_watchdog=1
 
 to the guest kernel command line (see Documentation/admin-guide/kernel-parameters.rst).
 
-==============================================================
 
-numa_balancing
+numa_balancing:
+===============
 
 Enables/disables automatic page fault based NUMA memory
 balancing. Memory is moved automatically to nodes
@@ -500,10 +518,9 @@ faults may be controlled by the numa_balancing_scan_period_min_ms,
 numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms,
 numa_balancing_scan_size_mb, and numa_balancing_settle_count sysctls.
 
-==============================================================
+numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb
+===============================================================================================================================
 
-numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms,
-numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb
 
 Automatic NUMA balancing scans tasks address space and unmaps pages to
 detect if pages are properly placed or if the data should be migrated to a
@@ -539,16 +556,18 @@ rate for each task.
 numa_balancing_scan_size_mb is how many megabytes worth of pages are
 scanned for a given scan.
 
-==============================================================
 
 osrelease, ostype & version:
+============================
+
+::
 
-# cat osrelease
-2.1.88
-# cat ostype
-Linux
-# cat version
-#5 Wed Feb 25 21:49:24 MET 1998
+  # cat osrelease
+  2.1.88
+  # cat ostype
+  Linux
+  # cat version
+  #5 Wed Feb 25 21:49:24 MET 1998
 
 The files osrelease and ostype should be clear enough. Version
 needs a little more clarification however. The '#5' means that
@@ -556,9 +575,9 @@ this is the fifth kernel built from this source base and the
 date behind it indicates the time the kernel was built.
 The only way to tune these values is to rebuild the kernel :-)
 
-==============================================================
 
 overflowgid & overflowuid:
+==========================
 
 if your architecture did not always support 32-bit UIDs (i.e. arm,
 i386, m68k, sh, and sparc32), a fixed UID and GID will be returned to
@@ -568,17 +587,17 @@ actual UID or GID would exceed 65535.
 These sysctls allow you to change the value of the fixed UID and GID.
 The default is 65534.
 
-==============================================================
 
 panic:
+======
 
 The value in this file represents the number of seconds the kernel
 waits before rebooting on a panic. When you use the software watchdog,
 the recommended setting is 60.
 
-==============================================================
 
 panic_on_io_nmi:
+================
 
 Controls the kernel's behavior when a CPU receives an NMI caused by
 an IO error.
@@ -591,20 +610,20 @@ an IO error.
    servers issue this sort of NMI when the dump button is pushed,
    and you can use this option to take a crash dump.
 
-==============================================================
 
 panic_on_oops:
+==============
 
 Controls the kernel's behaviour when an oops or BUG is encountered.
 
 0: try to continue operation
 
-1: panic immediately. If the `panic' sysctl is also non-zero then the
+1: panic immediately. If the `panic` sysctl is also non-zero then the
    machine will be rebooted.
 
-==============================================================
 
 panic_on_stackoverflow:
+=======================
 
 Controls the kernel's behavior when detecting the overflows of
 kernel, IRQ and exception stacks except a user stack.
@@ -614,9 +633,9 @@ This file shows up if CONFIG_DEBUG_STACKOVERFLOW is enabled.
 
 1: panic immediately.
 
-==============================================================
 
 panic_on_unrecovered_nmi:
+=========================
 
 The default Linux behaviour on an NMI of either memory or unknown is
 to continue operation. For many environments such as scientific
@@ -627,9 +646,9 @@ A small number of systems do generate NMI's for bizarre random reasons
 such as power management so the default is off. That sysctl works like
 the existing panic controls already in that directory.
 
-==============================================================
 
 panic_on_warn:
+==============
 
 Calls panic() in the WARN() path when set to 1. This is useful to avoid
 a kernel rebuild when attempting to kdump at the location of a WARN().
@@ -638,25 +657,28 @@ a kernel rebuild when attempting to kdump at the location of a WARN().
 
 1: call panic() after printing out WARN() location.
 
-==============================================================
 
 panic_print:
+============
 
 Bitmask for printing system info when panic happens. User can chose
 combination of the following bits:
 
-bit 0: print all tasks info
-bit 1: print system memory info
-bit 2: print timer info
-bit 3: print locks info if CONFIG_LOCKDEP is on
-bit 4: print ftrace buffer
+===== ========================================
+bit 0 print all tasks info
+bit 1 print system memory info
+bit 2 print timer info
+bit 3 print locks info if CONFIG_LOCKDEP is on
+bit 4 print ftrace buffer
+===== ========================================
+
+So for example to print tasks and memory info on panic, user can::
 
-So for example to print tasks and memory info on panic, user can:
         echo 3 > /proc/sys/kernel/panic_print
 
-==============================================================
 
 panic_on_rcu_stall:
+===================
 
 When set to 1, calls panic() after RCU stall detection messages. This
 is useful to define the root cause of RCU stalls using a vmcore.
@@ -665,9 +687,9 @@ is useful to define the root cause of RCU stalls using a vmcore.
 
 1: panic() after printing RCU stall messages.
 
-==============================================================
 
 perf_cpu_time_max_percent:
+==========================
 
 Hints to the kernel how much CPU time it should be allowed to
 use to handle perf sampling events. If the perf subsystem
@@ -680,10 +702,12 @@ unexpectedly take too long to execute, the NMIs can become
 stacked up next to each other so much that nothing else is
 allowed to execute.
 
-0: disable the mechanism. Do not monitor or correct perf's
+0:
+   disable the mechanism. Do not monitor or correct perf's
    sampling rate no matter how CPU time it takes.
 
-1-100: attempt to throttle perf's sample rate to this
+1-100:
+   attempt to throttle perf's sample rate to this
    percentage of CPU. Note: the kernel calculates an
    "expected" length of each sample event. 100 here means
    100% of that expected length. Even if this is set to
@@ -691,23 +715,30 @@ allowed to execute.
691 length is exceeded. Set to 0 if you truly do not care 715 length is exceeded. Set to 0 if you truly do not care
692 how much CPU is consumed. 716 how much CPU is consumed.
693 717
694==============================================================
695 718
696perf_event_paranoid: 719perf_event_paranoid:
720====================
697 721
698Controls use of the performance events system by unprivileged 722Controls use of the performance events system by unprivileged
699users (without CAP_SYS_ADMIN). The default value is 2. 723users (without CAP_SYS_ADMIN). The default value is 2.
700 724
701 -1: Allow use of (almost) all events by all users 725=== ==================================================================
726 -1 Allow use of (almost) all events by all users
727
702 Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK 728 Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
703>=0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN 729
730>=0 Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
731
704 Disallow raw tracepoint access by users without CAP_SYS_ADMIN 732 Disallow raw tracepoint access by users without CAP_SYS_ADMIN
705>=1: Disallow CPU event access by users without CAP_SYS_ADMIN
706>=2: Disallow kernel profiling by users without CAP_SYS_ADMIN
707 733
708============================================================== 734>=1 Disallow CPU event access by users without CAP_SYS_ADMIN
735
736>=2 Disallow kernel profiling by users without CAP_SYS_ADMIN
737=== ==================================================================
738
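The perf_event_paranoid levels are cumulative: each threshold adds restrictions on top of the lower ones. This is a minimal Python sketch that only mirrors the table. It lists the restrictions that apply to a user without CAP_SYS_ADMIN at a given level. The helper name `paranoid_restrictions` is hypothetical, and the sketch does not query the kernel.

```python
def paranoid_restrictions(level: int) -> list[str]:
    """Restrictions listed for users without CAP_SYS_ADMIN at this
    perf_event_paranoid level (hypothetical helper mirroring the table)."""
    restrictions: list[str] = []
    if level >= 0:
        restrictions.append("ftrace function tracepoint")
        restrictions.append("raw tracepoint access")
    if level >= 1:
        restrictions.append("CPU event access")
    if level >= 2:
        restrictions.append("kernel profiling")
    return restrictions  # level -1: nothing listed as disallowed
```

At the default level of 2, all four restrictions apply.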
709 739
710perf_event_max_stack: 740perf_event_max_stack:
741=====================
711 742
712Controls maximum number of stack frames to copy for (attr.sample_type & 743Controls maximum number of stack frames to copy for (attr.sample_type &
713PERF_SAMPLE_CALLCHAIN) configured events, for instance, when using 744PERF_SAMPLE_CALLCHAIN) configured events, for instance, when using
@@ -718,17 +749,17 @@ enabled, otherwise writing to this file will return -EBUSY.
718 749
719The default value is 127. 750The default value is 127.
720 751
721==============================================================
722 752
723perf_event_mlock_kb: 753perf_event_mlock_kb:
754====================
724 755
725Control size of per-cpu ring buffer not counted agains mlock limit. 756Control size of per-cpu ring buffer not counted agains mlock limit.
726 757
727The default value is 512 + 1 page 758The default value is 512 + 1 page
728 759
729==============================================================
730 760
731perf_event_max_contexts_per_stack: 761perf_event_max_contexts_per_stack:
762==================================
732 763
733Controls maximum number of stack frame context entries for 764Controls maximum number of stack frame context entries for
734(attr.sample_type & PERF_SAMPLE_CALLCHAIN) configured events, for 765(attr.sample_type & PERF_SAMPLE_CALLCHAIN) configured events, for
@@ -739,25 +770,25 @@ enabled, otherwise writing to this file will return -EBUSY.
739 770
740The default value is 8. 771The default value is 8.
741 772
742==============================================================
743 773
744pid_max: 774pid_max:
775========
745 776
746PID allocation wrap value. When the kernel's next PID value 777PID allocation wrap value. When the kernel's next PID value
747reaches this value, it wraps back to a minimum PID value. 778reaches this value, it wraps back to a minimum PID value.
748PIDs of value pid_max or larger are not allocated. 779PIDs of value pid_max or larger are not allocated.
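A simplified model of the pid_max wrap rule is sketched below. The lowest PID handed out after wrapping is taken as a parameter (`min_pid`, hypothetical here). The sketch ignores the fact that real allocation also skips PIDs still in use.

```python
def next_pid(last_pid: int, pid_max: int, min_pid: int) -> int:
    """Next candidate PID after last_pid under a simplified wrap rule.

    PIDs >= pid_max are never allocated, so the counter wraps back to
    min_pid (an assumed parameter) once it would reach pid_max.
    In-use PIDs are not skipped in this sketch.
    """
    candidate = last_pid + 1
    if candidate >= pid_max:
        candidate = min_pid
    return candidate
```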
749 780
750==============================================================
751 781
752ns_last_pid: 782ns_last_pid:
783============
753 784
754The last pid allocated in the current (the one task using this sysctl 785The last pid allocated in the current (the one task using this sysctl
755lives in) pid namespace. When selecting a pid for a next task on fork 786lives in) pid namespace. When selecting a pid for a next task on fork
756kernel tries to allocate a number starting from this one. 787kernel tries to allocate a number starting from this one.
757 788
758==============================================================
759 789
760powersave-nap: (PPC only) 790powersave-nap: (PPC only)
791=========================
761 792
762If set, Linux-PPC will use the 'nap' mode of powersaving, 793If set, Linux-PPC will use the 'nap' mode of powersaving,
763otherwise the 'doze' mode will be used. 794otherwise the 'doze' mode will be used.
@@ -765,6 +796,7 @@ otherwise the 'doze' mode will be used.
765============================================================== 796==============================================================
766 797
767printk: 798printk:
799=======
768 800
769The four values in printk denote: console_loglevel, 801The four values in printk denote: console_loglevel,
770default_message_loglevel, minimum_console_loglevel and 802default_message_loglevel, minimum_console_loglevel and
@@ -774,25 +806,29 @@ These values influence printk() behavior when printing or
774logging error messages. See 'man 2 syslog' for more info on 806logging error messages. See 'man 2 syslog' for more info on
775the different loglevels. 807the different loglevels.
776 808
777- console_loglevel: messages with a higher priority than 809- console_loglevel:
778 this will be printed to the console 810 messages with a higher priority than
779- default_message_loglevel: messages without an explicit priority 811 this will be printed to the console
780 will be printed with this priority 812- default_message_loglevel:
781- minimum_console_loglevel: minimum (highest) value to which 813 messages without an explicit priority
782 console_loglevel can be set 814 will be printed with this priority
783- default_console_loglevel: default value for console_loglevel 815- minimum_console_loglevel:
816 minimum (highest) value to which
817 console_loglevel can be set
818- default_console_loglevel:
819 default value for console_loglevel
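The four printk values are stored as one whitespace-separated line. Syslog levels count downward in importance (0 is the most urgent), so "a higher priority than console_loglevel" means a numerically smaller level. This Python sketch parses the line and applies that comparison. The names `PrintkLevels`, `parse_printk` and `reaches_console` are illustrative only.

```python
from typing import NamedTuple


class PrintkLevels(NamedTuple):
    console_loglevel: int
    default_message_loglevel: int
    minimum_console_loglevel: int
    default_console_loglevel: int


def parse_printk(text: str) -> PrintkLevels:
    """Parse the four whitespace-separated values of /proc/sys/kernel/printk."""
    values = [int(v) for v in text.split()]
    if len(values) != 4:
        raise ValueError(f"expected 4 values, got {len(values)}")
    return PrintkLevels(*values)


def reaches_console(msg_level: int, levels: PrintkLevels) -> bool:
    """True if a message at msg_level has higher priority (lower number)
    than console_loglevel and so is printed to the console."""
    return msg_level < levels.console_loglevel
```

For example, `parse_printk(open("/proc/sys/kernel/printk").read())` reads the live settings.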
784 820
785==============================================================
786 821
787printk_delay: 822printk_delay:
823=============
788 824
789Delay each printk message in printk_delay milliseconds 825Delay each printk message in printk_delay milliseconds
790 826
791Value from 0 - 10000 is allowed. 827Value from 0 - 10000 is allowed.
792 828
793==============================================================
794 829
795printk_ratelimit: 830printk_ratelimit:
831=================
796 832
797Some warning messages are rate limited. printk_ratelimit specifies 833Some warning messages are rate limited. printk_ratelimit specifies
798the minimum length of time between these messages (in jiffies), by 834the minimum length of time between these messages (in jiffies), by
@@ -800,48 +836,52 @@ default we allow one every 5 seconds.
800 836
801A value of 0 will disable rate limiting. 837A value of 0 will disable rate limiting.
802 838
803==============================================================
804 839
805printk_ratelimit_burst: 840printk_ratelimit_burst:
841=======================
806 842
807While long term we enforce one message per printk_ratelimit 843While long term we enforce one message per printk_ratelimit
808seconds, we do allow a burst of messages to pass through. 844seconds, we do allow a burst of messages to pass through.
809printk_ratelimit_burst specifies the number of messages we can 845printk_ratelimit_burst specifies the number of messages we can
810send before ratelimiting kicks in. 846send before ratelimiting kicks in.
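printk_ratelimit and printk_ratelimit_burst combine into an interval/burst scheme. Up to `burst` messages pass in each window, and the window restarts once the interval has elapsed. The Python sketch below models that idea. It is a simplified model, not the kernel's implementation, and it leaves the time unit abstract.

```python
from typing import Optional


class RateLimiter:
    """Allow at most `burst` events per `interval`; interval 0 disables
    limiting. Simplified model of printk_ratelimit/_burst."""

    def __init__(self, interval: float, burst: int):
        self.interval = interval
        self.burst = burst
        self.begin: Optional[float] = None  # start of the current window
        self.printed = 0                    # events allowed in this window

    def allow(self, now: float) -> bool:
        if self.interval == 0:
            return True  # a value of 0 disables rate limiting
        if self.begin is None or now - self.begin >= self.interval:
            self.begin = now  # start a new window
            self.printed = 0
        if self.printed < self.burst:
            self.printed += 1
            return True
        return False
```

With an interval of 5 and a burst of 3, the first three messages at times 0, 1 and 2 pass. A fourth at time 3 is suppressed, and the window reopens at time 5.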
811 847
812==============================================================
813 848
814printk_devkmsg: 849printk_devkmsg:
850===============
815 851
816Control the logging to /dev/kmsg from userspace: 852Control the logging to /dev/kmsg from userspace:
817 853
818ratelimit: default, ratelimited 854ratelimit:
855 default, ratelimited
856
819on: unlimited logging to /dev/kmsg from userspace 857on: unlimited logging to /dev/kmsg from userspace
858
820off: logging to /dev/kmsg disabled 859off: logging to /dev/kmsg disabled
821 860
822The kernel command line parameter printk.devkmsg= overrides this and is 861The kernel command line parameter printk.devkmsg= overrides this and is
823a one-time setting until next reboot: once set, it cannot be changed by 862a one-time setting until next reboot: once set, it cannot be changed by
824this sysctl interface anymore. 863this sysctl interface anymore.
825 864
826==============================================================
827 865
828randomize_va_space: 866randomize_va_space:
867===================
829 868
830This option can be used to select the type of process address 869This option can be used to select the type of process address
831space randomization that is used in the system, for architectures 870space randomization that is used in the system, for architectures
832that support this feature. 871that support this feature.
833 872
8340 - Turn the process address space randomization off. This is the 873== ===========================================================================
8740 Turn the process address space randomization off. This is the
835 default for architectures that do not support this feature anyways, 875 default for architectures that do not support this feature anyways,
836 and kernels that are booted with the "norandmaps" parameter. 876 and kernels that are booted with the "norandmaps" parameter.
837 877
8381 - Make the addresses of mmap base, stack and VDSO page randomized. 8781 Make the addresses of mmap base, stack and VDSO page randomized.
839 This, among other things, implies that shared libraries will be 879 This, among other things, implies that shared libraries will be
840 loaded to random addresses. Also for PIE-linked binaries, the 880 loaded to random addresses. Also for PIE-linked binaries, the
841 location of code start is randomized. This is the default if the 881 location of code start is randomized. This is the default if the
842 CONFIG_COMPAT_BRK option is enabled. 882 CONFIG_COMPAT_BRK option is enabled.
843 883
8442 - Additionally enable heap randomization. This is the default if 8842 Additionally enable heap randomization. This is the default if
845 CONFIG_COMPAT_BRK is disabled. 885 CONFIG_COMPAT_BRK is disabled.
846 886
847 There are a few legacy applications out there (such as some ancient 887 There are a few legacy applications out there (such as some ancient
@@ -854,18 +894,19 @@ that support this feature.
854 Systems with ancient and/or broken binaries should be configured 894 Systems with ancient and/or broken binaries should be configured
855 with CONFIG_COMPAT_BRK enabled, which excludes the heap from process 895 with CONFIG_COMPAT_BRK enabled, which excludes the heap from process
856 address space randomization. 896 address space randomization.
897== ===========================================================================
857 898
858==============================================================
859 899
860reboot-cmd: (Sparc only) 900reboot-cmd: (Sparc only)
901========================
861 902
862??? This seems to be a way to give an argument to the Sparc 903??? This seems to be a way to give an argument to the Sparc
863ROM/Flash boot loader. Maybe to tell it what to do after 904ROM/Flash boot loader. Maybe to tell it what to do after
864rebooting. ??? 905rebooting. ???
865 906
866==============================================================
867 907
868rtsig-max & rtsig-nr: 908rtsig-max & rtsig-nr:
909=====================
869 910
870The file rtsig-max can be used to tune the maximum number 911The file rtsig-max can be used to tune the maximum number
871of POSIX realtime (queued) signals that can be outstanding 912of POSIX realtime (queued) signals that can be outstanding
@@ -873,9 +914,9 @@ in the system.
873 914
874rtsig-nr shows the number of RT signals currently queued. 915rtsig-nr shows the number of RT signals currently queued.
875 916
876==============================================================
877 917
878sched_energy_aware: 918sched_energy_aware:
919===================
879 920
880Enables/disables Energy Aware Scheduling (EAS). EAS starts 921Enables/disables Energy Aware Scheduling (EAS). EAS starts
881automatically on platforms where it can run (that is, 922automatically on platforms where it can run (that is,
@@ -884,17 +925,17 @@ Model available). If your platform happens to meet the
884requirements for EAS but you do not want to use it, change 925requirements for EAS but you do not want to use it, change
885this value to 0. 926this value to 0.
886 927
887==============================================================
888 928
889sched_schedstats: 929sched_schedstats:
930=================
890 931
891Enables/disables scheduler statistics. Enabling this feature 932Enables/disables scheduler statistics. Enabling this feature
892incurs a small amount of overhead in the scheduler but is 933incurs a small amount of overhead in the scheduler but is
893useful for debugging and performance tuning. 934useful for debugging and performance tuning.
894 935
895==============================================================
896 936
897sg-big-buff: 937sg-big-buff:
938============
898 939
899This file shows the size of the generic SCSI (sg) buffer. 940This file shows the size of the generic SCSI (sg) buffer.
900You can't tune it just yet, but you could change it on 941You can't tune it just yet, but you could change it on
@@ -905,9 +946,9 @@ There shouldn't be any reason to change this value. If
905you can come up with one, you probably know what you 946you can come up with one, you probably know what you
906are doing anyway :) 947are doing anyway :)
907 948
908==============================================================
909 949
910shmall: 950shmall:
951=======
911 952
912This parameter sets the total amount of shared memory pages that 953This parameter sets the total amount of shared memory pages that
913can be used system wide. Hence, SHMALL should always be at least 954can be used system wide. Hence, SHMALL should always be at least
@@ -916,20 +957,20 @@ ceil(shmmax/PAGE_SIZE).
916If you are not sure what the default PAGE_SIZE is on your Linux 957If you are not sure what the default PAGE_SIZE is on your Linux
917system, you can run the following command: 958system, you can run the following command:
918 959
919# getconf PAGE_SIZE 960 # getconf PAGE_SIZE
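The lower bound ceil(shmmax/PAGE_SIZE) for shmall can also be computed directly. This minimal Python sketch takes the page size from `os.sysconf("SC_PAGE_SIZE")`, which should match what `getconf PAGE_SIZE` prints. The helper name `min_shmall` is hypothetical.

```python
import os
from typing import Optional


def min_shmall(shmmax: int, page_size: Optional[int] = None) -> int:
    """Smallest shmall (in pages) that still admits one shmmax-byte segment."""
    if page_size is None:
        page_size = os.sysconf("SC_PAGE_SIZE")  # system page size in bytes
    return -(-shmmax // page_size)  # integer ceiling division
```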
920 961
921==============================================================
922 962
923shmmax: 963shmmax:
964=======
924 965
925This value can be used to query and set the run time limit 966This value can be used to query and set the run time limit
926on the maximum shared memory segment size that can be created. 967on the maximum shared memory segment size that can be created.
927Shared memory segments up to 1Gb are now supported in the 968Shared memory segments up to 1Gb are now supported in the
928kernel. This value defaults to SHMMAX. 969kernel. This value defaults to SHMMAX.
929 970
930==============================================================
931 971
932shm_rmid_forced: 972shm_rmid_forced:
973================
933 974
934Linux lets you set resource limits, including how much memory one 975Linux lets you set resource limits, including how much memory one
935process can consume, via setrlimit(2). Unfortunately, shared memory 976process can consume, via setrlimit(2). Unfortunately, shared memory
@@ -948,28 +989,30 @@ need this.
948Note that if you change this from 0 to 1, already created segments 989Note that if you change this from 0 to 1, already created segments
949without users and with a dead originative process will be destroyed. 990without users and with a dead originative process will be destroyed.
950 991
951==============================================================
952 992
953sysctl_writes_strict: 993sysctl_writes_strict:
994=====================
954 995
955Control how file position affects the behavior of updating sysctl values 996Control how file position affects the behavior of updating sysctl values
956via the /proc/sys interface: 997via the /proc/sys interface:
957 998
958 -1 - Legacy per-write sysctl value handling, with no printk warnings. 999 == ======================================================================
1000 -1 Legacy per-write sysctl value handling, with no printk warnings.
959 Each write syscall must fully contain the sysctl value to be 1001 Each write syscall must fully contain the sysctl value to be
960 written, and multiple writes on the same sysctl file descriptor 1002 written, and multiple writes on the same sysctl file descriptor
961 will rewrite the sysctl value, regardless of file position. 1003 will rewrite the sysctl value, regardless of file position.
962 0 - Same behavior as above, but warn about processes that perform writes 1004 0 Same behavior as above, but warn about processes that perform writes
963 to a sysctl file descriptor when the file position is not 0. 1005 to a sysctl file descriptor when the file position is not 0.
964 1 - (default) Respect file position when writing sysctl strings. Multiple 1006 1 (default) Respect file position when writing sysctl strings. Multiple
965 writes will append to the sysctl value buffer. Anything past the max 1007 writes will append to the sysctl value buffer. Anything past the max
966 length of the sysctl value buffer will be ignored. Writes to numeric 1008 length of the sysctl value buffer will be ignored. Writes to numeric
967 sysctl entries must always be at file position 0 and the value must 1009 sysctl entries must always be at file position 0 and the value must
968 be fully contained in the buffer sent in the write syscall. 1010 be fully contained in the buffer sent in the write syscall.
1011 == ======================================================================
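Under the default strict mode, numeric values must be written at file position 0 and fit in a single write. A safe pattern is therefore to open a fresh descriptor and send the whole value in one `write()` call. The Python sketch below follows that pattern. The helper name `write_sysctl` is hypothetical, and writing to a real /proc/sys file normally requires root.

```python
import os


def write_sysctl(path: str, value: str) -> None:
    """Write a sysctl value in a single write() at file position 0."""
    data = value.encode()
    fd = os.open(path, os.O_WRONLY)  # fresh descriptor: position starts at 0
    try:
        written = os.write(fd, data)  # whole value in one syscall
        if written != len(data):
            raise OSError(f"short write to {path}: {written} of {len(data)} bytes")
    finally:
        os.close(fd)
```

For example, `write_sysctl("/proc/sys/kernel/printk_delay", "0")` sets one of the values described earlier.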
969 1012
970==============================================================
971 1013
972softlockup_all_cpu_backtrace: 1014softlockup_all_cpu_backtrace:
1015=============================
973 1016
974This value controls the soft lockup detector thread's behavior 1017This value controls the soft lockup detector thread's behavior
975when a soft lockup condition is detected as to whether or not 1018when a soft lockup condition is detected as to whether or not
@@ -983,13 +1026,14 @@ NMI.
983 1026
9841: on detection capture more debug information. 10271: on detection capture more debug information.
985 1028
986==============================================================
987 1029
988soft_watchdog 1030soft_watchdog:
1031==============
989 1032
990This parameter can be used to control the soft lockup detector. 1033This parameter can be used to control the soft lockup detector.
991 1034
992 0 - disable the soft lockup detector 1035 0 - disable the soft lockup detector
1036
993 1 - enable the soft lockup detector 1037 1 - enable the soft lockup detector
994 1038
995The soft lockup detector monitors CPUs for threads that are hogging the CPUs 1039The soft lockup detector monitors CPUs for threads that are hogging the CPUs
@@ -999,9 +1043,9 @@ interrupts which are needed for the 'watchdog/N' threads to be woken up by
999the watchdog timer function, otherwise the NMI watchdog - if enabled - can 1043the watchdog timer function, otherwise the NMI watchdog - if enabled - can
1000detect a hard lockup condition. 1044detect a hard lockup condition.
1001 1045
1002==============================================================
1003 1046
1004stack_erasing 1047stack_erasing:
1048==============
1005 1049
1006This parameter can be used to control kernel stack erasing at the end 1050This parameter can be used to control kernel stack erasing at the end
1007of syscalls for kernels built with CONFIG_GCC_PLUGIN_STACKLEAK. 1051of syscalls for kernels built with CONFIG_GCC_PLUGIN_STACKLEAK.
@@ -1015,37 +1059,40 @@ compilation sees a 1% slowdown, other systems and workloads may vary.
1015 1059
1016 1: kernel stack erasing is enabled (default), it is performed before 1060 1: kernel stack erasing is enabled (default), it is performed before
1017 returning to the userspace at the end of syscalls. 1061 returning to the userspace at the end of syscalls.
1018============================================================== 1062
1019 1063
1020tainted 1064tainted
1065=======
1021 1066
1022Non-zero if the kernel has been tainted. Numeric values, which can be 1067Non-zero if the kernel has been tainted. Numeric values, which can be
1023ORed together. The letters are seen in "Tainted" line of Oops reports. 1068ORed together. The letters are seen in "Tainted" line of Oops reports.
1024 1069
1025 1 (P): proprietary module was loaded 1070====== ===== ==============================================================
1026 2 (F): module was force loaded 1071 1 `(P)` proprietary module was loaded
1027 4 (S): SMP kernel oops on an officially SMP incapable processor 1072 2 `(F)` module was force loaded
1028 8 (R): module was force unloaded 1073 4 `(S)` SMP kernel oops on an officially SMP incapable processor
1029 16 (M): processor reported a Machine Check Exception (MCE) 1074 8 `(R)` module was force unloaded
1030 32 (B): bad page referenced or some unexpected page flags 1075 16 `(M)` processor reported a Machine Check Exception (MCE)
1031 64 (U): taint requested by userspace application 1076 32 `(B)` bad page referenced or some unexpected page flags
1032 128 (D): kernel died recently, i.e. there was an OOPS or BUG 1077 64 `(U)` taint requested by userspace application
1033 256 (A): an ACPI table was overridden by user 1078 128 `(D)` kernel died recently, i.e. there was an OOPS or BUG
1034 512 (W): kernel issued warning 1079 256 `(A)` an ACPI table was overridden by user
1035 1024 (C): staging driver was loaded 1080 512 `(W)` kernel issued warning
1036 2048 (I): workaround for bug in platform firmware applied 1081 1024 `(C)` staging driver was loaded
1037 4096 (O): externally-built ("out-of-tree") module was loaded 1082 2048 `(I)` workaround for bug in platform firmware applied
1038 8192 (E): unsigned module was loaded 1083 4096 `(O)` externally-built ("out-of-tree") module was loaded
1039 16384 (L): soft lockup occurred 1084 8192 `(E)` unsigned module was loaded
1040 32768 (K): kernel has been live patched 1085 16384 `(L)` soft lockup occurred
1041 65536 (X): Auxiliary taint, defined and used by for distros 1086 32768 `(K)` kernel has been live patched
1042131072 (T): The kernel was built with the struct randomization plugin 1087 65536 `(X)` Auxiliary taint, defined and used by for distros
1088131072 `(T)` The kernel was built with the struct randomization plugin
1089====== ===== ==============================================================
1043 1090
1044See Documentation/admin-guide/tainted-kernels.rst for more information. 1091See Documentation/admin-guide/tainted-kernels.rst for more information.
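Each taint flag is a single bit, and the table lists them in bit order from 1 (P) to 131072 (T). A tainted value can therefore be decoded mechanically. This Python sketch returns the letters of the set bits. It does not reproduce the exact layout of the "Tainted" line in Oops reports, and the names `TAINT_LETTERS` and `decode_taint` are illustrative.

```python
# Flag letters in bit order, from the table above: bit 0 (value 1) is 'P'.
TAINT_LETTERS = "PFSRMBUDAWCIOELKXT"


def decode_taint(value: int) -> str:
    """Return the letters of all taint flags set in value."""
    return "".join(
        letter for bit, letter in enumerate(TAINT_LETTERS) if value & (1 << bit)
    )
```

For example, `decode_taint(int(open("/proc/sys/kernel/tainted").read()))` decodes the running kernel's state. A value of 4097 (1 + 4096) decodes to "PO".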
1045 1092
1046==============================================================
1047 1093
1048threads-max 1094threads-max:
1095============
1049 1096
1050This value controls the maximum number of threads that can be created 1097This value controls the maximum number of threads that can be created
1051using fork(). 1098using fork().
@@ -1055,8 +1102,10 @@ maximum number of threads is created, the thread structures occupy only
1055a part (1/8th) of the available RAM pages. 1102a part (1/8th) of the available RAM pages.
1056 1103
1057The minimum value that can be written to threads-max is 20. 1104The minimum value that can be written to threads-max is 20.
1105
1058The maximum value that can be written to threads-max is given by the 1106The maximum value that can be written to threads-max is given by the
1059constant FUTEX_TID_MASK (0x3fffffff). 1107constant FUTEX_TID_MASK (0x3fffffff).
1108
1060If a value outside of this range is written to threads-max an error 1109If a value outside of this range is written to threads-max an error
1061EINVAL occurs. 1110EINVAL occurs.
1062 1111
@@ -1064,9 +1113,9 @@ The value written is checked against the available RAM pages. If the
1064thread structures would occupy too much (more than 1/8th) of the 1113thread structures would occupy too much (more than 1/8th) of the
1065available RAM pages threads-max is reduced accordingly. 1114available RAM pages threads-max is reduced accordingly.
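The write checks for threads-max come down to a range test followed by a cap. This Python sketch models them. `ram_limit` is a hypothetical stand-in for the number of threads whose structures fit in 1/8th of the available RAM pages, which the kernel computes internally.

```python
import errno
import os

THREADS_MAX_MIN = 20
FUTEX_TID_MASK = 0x3FFFFFFF


def effective_threads_max(requested: int, ram_limit: int) -> int:
    """Model of a threads-max write: reject out-of-range values with
    EINVAL, then reduce the value to the (assumed) RAM-based limit."""
    if not THREADS_MAX_MIN <= requested <= FUTEX_TID_MASK:
        raise OSError(errno.EINVAL, os.strerror(errno.EINVAL))
    return min(requested, ram_limit)
```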
1066 1115
1067==============================================================
1068 1116
1069unknown_nmi_panic: 1117unknown_nmi_panic:
1118==================
1070 1119
1071The value in this file affects behavior of handling NMI. When the 1120The value in this file affects behavior of handling NMI. When the
1072value is non-zero, unknown NMI is trapped and then panic occurs. At 1121value is non-zero, unknown NMI is trapped and then panic occurs. At
@@ -1075,28 +1124,29 @@ that time, kernel debugging information is displayed on console.
1075NMI switch that most IA32 servers have fires unknown NMI up, for 1124NMI switch that most IA32 servers have fires unknown NMI up, for
1076example. If a system hangs up, try pressing the NMI switch. 1125example. If a system hangs up, try pressing the NMI switch.
1077 1126
1078==============================================================
1079 1127
1080watchdog: 1128watchdog:
1129=========
1081 1130
1082This parameter can be used to disable or enable the soft lockup detector 1131This parameter can be used to disable or enable the soft lockup detector
1083_and_ the NMI watchdog (i.e. the hard lockup detector) at the same time. 1132_and_ the NMI watchdog (i.e. the hard lockup detector) at the same time.
1084 1133
1085 0 - disable both lockup detectors 1134 0 - disable both lockup detectors
1135
1086 1 - enable both lockup detectors 1136 1 - enable both lockup detectors
1087 1137
1088The soft lockup detector and the NMI watchdog can also be disabled or 1138The soft lockup detector and the NMI watchdog can also be disabled or
1089enabled individually, using the soft_watchdog and nmi_watchdog parameters. 1139enabled individually, using the soft_watchdog and nmi_watchdog parameters.
1090If the watchdog parameter is read, for example by executing 1140If the watchdog parameter is read, for example by executing::
1091 1141
1092 cat /proc/sys/kernel/watchdog 1142 cat /proc/sys/kernel/watchdog
1093 1143
1094the output of this command (0 or 1) shows the logical OR of soft_watchdog 1144the output of this command (0 or 1) shows the logical OR of soft_watchdog
1095and nmi_watchdog. 1145and nmi_watchdog.
1096 1146
1097==============================================================
1098 1147
1099watchdog_cpumask: 1148watchdog_cpumask:
1149=================
1100 1150
1101This value can be used to control on which cpus the watchdog may run. 1151This value can be used to control on which cpus the watchdog may run.
1102The default cpumask is all possible cores, but if NO_HZ_FULL is 1152The default cpumask is all possible cores, but if NO_HZ_FULL is
@@ -1111,13 +1161,13 @@ if a kernel lockup was suspected on those cores.
1111 1161
1112The argument value is the standard cpulist format for cpumasks, 1162The argument value is the standard cpulist format for cpumasks,
1113so for example to enable the watchdog on cores 0, 2, 3, and 4 you 1163so for example to enable the watchdog on cores 0, 2, 3, and 4 you
1114might say: 1164might say::
1115 1165
1116 echo 0,2-4 > /proc/sys/kernel/watchdog_cpumask 1166 echo 0,2-4 > /proc/sys/kernel/watchdog_cpumask
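The cpulist string used above is a comma-separated list of CPU numbers and inclusive ranges. This minimal Python parser expands such a string into a set of CPUs. It handles only plain numbers and "a-b" ranges, not every form the kernel accepts. The name `parse_cpulist` is hypothetical.

```python
def parse_cpulist(cpulist: str) -> set[int]:
    """Expand a cpulist such as "0,2-4" into {0, 2, 3, 4}."""
    cpus: set[int] = set()
    for part in cpulist.strip().split(","):
        if not part:
            continue  # tolerate empty input or a trailing comma
        if "-" in part:
            first, last = (int(x) for x in part.split("-", 1))
            if first > last:
                raise ValueError(f"bad range {part!r}")
            cpus.update(range(first, last + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return cpus
```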
1117 1167
1118==============================================================
1119 1168
1120watchdog_thresh: 1169watchdog_thresh:
1170================
1121 1171
1122This value can be used to control the frequency of hrtimer and NMI 1172This value can be used to control the frequency of hrtimer and NMI
1123events and the soft and hard lockup thresholds. The default threshold 1173events and the soft and hard lockup thresholds. The default threshold
@@ -1125,5 +1175,3 @@ is 10 seconds.
1125 1175
1126The softlockup threshold is (2 * watchdog_thresh). Setting this 1176The softlockup threshold is (2 * watchdog_thresh). Setting this
1127tunable to zero will disable lockup detection altogether. 1177tunable to zero will disable lockup detection altogether.
1128
1129==============================================================
diff --git a/Documentation/sysctl/net.txt b/Documentation/sysctl/net.rst
index 2ae91d3873bb..a7d44e71019d 100644
--- a/Documentation/sysctl/net.txt
+++ b/Documentation/sysctl/net.rst
@@ -1,12 +1,25 @@
1Documentation for /proc/sys/net/* 1================================
2 (c) 1999 Terrehon Bowden <terrehon@pacbell.net> 2Documentation for /proc/sys/net/
3 Bodo Bauer <bb@ricochet.net> 3================================
4 (c) 2000 Jorge Nerin <comandante@zaralinux.com>
5 (c) 2009 Shen Feng <shen@cn.fujitsu.com>
6 4
7For general info and legal blurb, please look in README. 5Copyright
8 6
9============================================================== 7Copyright (c) 1999
8
9 - Terrehon Bowden <terrehon@pacbell.net>
10 - Bodo Bauer <bb@ricochet.net>
11
12Copyright (c) 2000
13
14 - Jorge Nerin <comandante@zaralinux.com>
15
16Copyright (c) 2009
17
18 - Shen Feng <shen@cn.fujitsu.com>
19
20For general info and legal blurb, please look in index.rst.
21
22------------------------------------------------------------------------------
10 23
11This file contains the documentation for the sysctl files in 24This file contains the documentation for the sysctl files in
12/proc/sys/net 25/proc/sys/net
@@ -17,20 +30,22 @@ see only some of them, depending on your kernel's configuration.
17 30
18 31
19Table : Subdirectories in /proc/sys/net 32Table : Subdirectories in /proc/sys/net
20.............................................................................. 33
21 Directory Content Directory Content 34 ========= =================== = ========== ==================
22 core General parameter appletalk Appletalk protocol 35 Directory Content Directory Content
23 unix Unix domain sockets netrom NET/ROM 36 ========= =================== = ========== ==================
24 802 E802 protocol ax25 AX25 37 core General parameter appletalk Appletalk protocol
25 ethernet Ethernet protocol rose X.25 PLP layer 38 unix Unix domain sockets netrom NET/ROM
26 ipv4 IP version 4 x25 X.25 protocol 39 802 E802 protocol ax25 AX25
27 ipx IPX token-ring IBM token ring 40 ethernet Ethernet protocol rose X.25 PLP layer
28 bridge Bridging decnet DEC net 41 ipv4 IP version 4 x25 X.25 protocol
29 ipv6 IP version 6 tipc TIPC 42 ipx IPX token-ring IBM token ring
30.............................................................................. 43 bridge Bridging decnet DEC net
44 ipv6 IP version 6 tipc TIPC
45 ========= =================== = ========== ==================
31 46
321. /proc/sys/net/core - Network core options 471. /proc/sys/net/core - Network core options
33------------------------------------------------------- 48============================================
34 49
35bpf_jit_enable 50bpf_jit_enable
36-------------- 51--------------
@@ -44,6 +59,7 @@ restricted C into a sequence of BPF instructions. After program load
44through bpf(2) and passing a verifier in the kernel, a JIT will then 59through bpf(2) and passing a verifier in the kernel, a JIT will then
45translate these BPF proglets into native CPU instructions. There are 60translate these BPF proglets into native CPU instructions. There are
46two flavors of JITs, the newer eBPF JIT currently supported on: 61two flavors of JITs, the newer eBPF JIT currently supported on:
62
47 - x86_64 63 - x86_64
48 - x86_32 64 - x86_32
49 - arm64 65 - arm64
@@ -55,6 +71,7 @@ two flavors of JITs, the newer eBPF JIT currently supported on:
55 - riscv 71 - riscv
56 72
57And the older cBPF JIT supported on the following archs: 73And the older cBPF JIT supported on the following archs:
74
58 - mips 75 - mips
59 - ppc 76 - ppc
60 - sparc 77 - sparc
@@ -65,10 +82,11 @@ compile them transparently. Older cBPF JITs can only translate
65tcpdump filters, seccomp rules, etc, but not mentioned eBPF 82tcpdump filters, seccomp rules, etc, but not mentioned eBPF
66programs loaded through bpf(2). 83programs loaded through bpf(2).
67 84
68Values : 85Values:
69 0 - disable the JIT (default value) 86
70 1 - enable the JIT 87 - 0 - disable the JIT (default value)
71 2 - enable the JIT and ask the compiler to emit traces on kernel log. 88 - 1 - enable the JIT
89 - 2 - enable the JIT and ask the compiler to emit traces on kernel log.
72 90
73bpf_jit_harden 91bpf_jit_harden
74-------------- 92--------------
@@ -76,10 +94,12 @@ bpf_jit_harden
76This enables hardening for the BPF JIT compiler. Supported are eBPF 94This enables hardening for the BPF JIT compiler. Supported are eBPF
77JIT backends. Enabling hardening trades off performance, but can 95JIT backends. Enabling hardening trades off performance, but can
78mitigate JIT spraying. 96mitigate JIT spraying.
79Values : 97
80 0 - disable JIT hardening (default value) 98Values:
81 1 - enable JIT hardening for unprivileged users only 99
82 2 - enable JIT hardening for all users 100 - 0 - disable JIT hardening (default value)
101 - 1 - enable JIT hardening for unprivileged users only
102 - 2 - enable JIT hardening for all users
83 103
84bpf_jit_kallsyms 104bpf_jit_kallsyms
85---------------- 105----------------
@@ -89,9 +109,11 @@ addresses to the kernel, meaning they neither show up in traces nor
89in /proc/kallsyms. This enables export of these addresses, which can 109in /proc/kallsyms. This enables export of these addresses, which can
90be used for debugging/tracing. If bpf_jit_harden is enabled, this 110be used for debugging/tracing. If bpf_jit_harden is enabled, this
91feature is disabled. 111feature is disabled.
112
92Values : 113Values :
93 0 - disable JIT kallsyms export (default value) 114
94 1 - enable JIT kallsyms export for privileged users only 115 - 0 - disable JIT kallsyms export (default value)
116 - 1 - enable JIT kallsyms export for privileged users only
95 117
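The three bpf_jit knobs interact, and a small sketch makes that easier to see. It encodes only the rules stated in the text above: kallsyms export requires bpf_jit_kallsyms=1 and is disabled whenever bpf_jit_harden is non-zero. The helper names and lookup tables are hypothetical. On some kernels these files may not be readable by unprivileged users, so the reader returns None rather than failing.

```python
from pathlib import Path
from typing import Optional

# Hypothetical lookup tables mirroring the "Values:" lists above.
JIT_ENABLE = {0: "JIT disabled", 1: "JIT enabled", 2: "JIT enabled, traces on kernel log"}
JIT_HARDEN = {0: "no hardening", 1: "hardening for unprivileged users",
              2: "hardening for all users"}


def kallsyms_exported(harden: int, kallsyms: int) -> bool:
    """Per the text: export needs bpf_jit_kallsyms=1; any non-zero bpf_jit_harden disables it."""
    return kallsyms == 1 and harden == 0


def read_knob(name: str, base: str = "/proc/sys/net/core") -> Optional[int]:
    """Return a sysctl's integer value, or None if it is missing or not readable."""
    try:
        return int(Path(base, name).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    harden, kallsyms = read_knob("bpf_jit_harden"), read_knob("bpf_jit_kallsyms")
    if harden is not None and kallsyms is not None:
        print(JIT_HARDEN.get(harden, "unknown"), "| kallsyms export:",
              kallsyms_exported(harden, kallsyms))
```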
96bpf_jit_limit 118bpf_jit_limit
97------------- 119-------------
@@ -102,7 +124,7 @@ been surpassed. bpf_jit_limit contains the value of the global limit
102in bytes. 124in bytes.
103 125
104dev_weight 126dev_weight
105-------------- 127----------
106 128
107The maximum number of packets that kernel can handle on a NAPI interrupt, 129The maximum number of packets that kernel can handle on a NAPI interrupt,
108it's a Per-CPU variable. For drivers that support LRO or GRO_HW, a hardware 130it's a Per-CPU variable. For drivers that support LRO or GRO_HW, a hardware
@@ -111,7 +133,7 @@ aggregated packet is counted as one packet in this context.
111Default: 64 133Default: 64
112 134
113dev_weight_rx_bias 135dev_weight_rx_bias
114-------------- 136------------------
115 137
116RPS (e.g. RFS, aRFS) processing is competing with the registered NAPI poll function 138RPS (e.g. RFS, aRFS) processing is competing with the registered NAPI poll function
117of the driver for the per softirq cycle netdev_budget. This parameter influences 139of the driver for the per softirq cycle netdev_budget. This parameter influences
@@ -120,19 +142,22 @@ processing during RX softirq cycles. It is further meant for making current
120dev_weight adaptable for asymmetric CPU needs on RX/TX side of the network stack. 142dev_weight adaptable for asymmetric CPU needs on RX/TX side of the network stack.
121(see dev_weight_tx_bias) It is effective on a per CPU basis. Determination is based 143(see dev_weight_tx_bias) It is effective on a per CPU basis. Determination is based
122on dev_weight and is calculated multiplicative (dev_weight * dev_weight_rx_bias). 144on dev_weight and is calculated multiplicative (dev_weight * dev_weight_rx_bias).
145
123Default: 1 146Default: 1
124 147
125dev_weight_tx_bias 148dev_weight_tx_bias
126-------------- 149------------------
127 150
128Scales the maximum number of packets that can be processed during a TX softirq cycle. 151Scales the maximum number of packets that can be processed during a TX softirq cycle.
129Effective on a per CPU basis. Allows scaling of current dev_weight for asymmetric 152Effective on a per CPU basis. Allows scaling of current dev_weight for asymmetric
130net stack processing needs. Be careful to avoid making TX softirq processing a CPU hog. 153net stack processing needs. Be careful to avoid making TX softirq processing a CPU hog.
154
131Calculation is based on dev_weight (dev_weight * dev_weight_tx_bias). 155Calculation is based on dev_weight (dev_weight * dev_weight_tx_bias).
156
132Default: 1 157Default: 1
133 158
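Both biases act as plain multipliers on dev_weight. Below is a minimal sketch of the resulting per-CPU packet budgets for one RX and one TX softirq cycle. `softirq_quotas` is a hypothetical name, and the defaults are the documented 64, 1 and 1.

```python
from typing import Tuple


def softirq_quotas(dev_weight: int = 64, rx_bias: int = 1,
                   tx_bias: int = 1) -> Tuple[int, int]:
    """Return (rx_quota, tx_quota): dev_weight scaled by each bias, as described above."""
    return dev_weight * rx_bias, dev_weight * tx_bias


print(softirq_quotas())            # (64, 64) with the documented defaults
print(softirq_quotas(tx_bias=2))   # (64, 128): TX may now process twice as many packets
```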
134default_qdisc 159default_qdisc
135-------------- 160-------------
136 161
137The default queuing discipline to use for network devices. This allows 162The default queuing discipline to use for network devices. This allows
138overriding the default of pfifo_fast with an alternative. Since the default 163overriding the default of pfifo_fast with an alternative. Since the default
@@ -144,17 +169,21 @@ which require setting up classes and bandwidths. Note that physical multiqueue
144interfaces still use mq as root qdisc, which in turn uses this default for its 169interfaces still use mq as root qdisc, which in turn uses this default for its
145leaves. Virtual devices (like e.g. lo or veth) ignore this setting and instead 170leaves. Virtual devices (like e.g. lo or veth) ignore this setting and instead
146default to noqueue. 171default to noqueue.
172
147Default: pfifo_fast 173Default: pfifo_fast
148 174
149busy_read 175busy_read
150---------------- 176---------
177
151Low latency busy poll timeout for socket reads. (needs CONFIG_NET_RX_BUSY_POLL) 178Low latency busy poll timeout for socket reads. (needs CONFIG_NET_RX_BUSY_POLL)
152Approximate time in us to busy loop waiting for packets on the device queue. 179Approximate time in us to busy loop waiting for packets on the device queue.
153This sets the default value of the SO_BUSY_POLL socket option. 180This sets the default value of the SO_BUSY_POLL socket option.
154Can be set or overridden per socket by setting socket option SO_BUSY_POLL, 181Can be set or overridden per socket by setting socket option SO_BUSY_POLL,
155which is the preferred method of enabling. If you need to enable the feature 182which is the preferred method of enabling. If you need to enable the feature
156globally via sysctl, a value of 50 is recommended. 183globally via sysctl, a value of 50 is recommended.
184
157Will increase power usage. 185Will increase power usage.
186
158Default: 0 (off) 187Default: 0 (off)
159 188
160busy_poll 189busy_poll
@@ -167,7 +196,9 @@ For more than that you probably want to use epoll.
167Note that only sockets with SO_BUSY_POLL set will be busy polled, 196Note that only sockets with SO_BUSY_POLL set will be busy polled,
168so you want to either selectively set SO_BUSY_POLL on those sockets or set 197so you want to either selectively set SO_BUSY_POLL on those sockets or set
169sysctl.net.busy_read globally. 198sysctl.net.busy_read globally.
199
170Will increase power usage. 200Will increase power usage.
201
171Default: 0 (off) 202Default: 0 (off)
172 203
173rmem_default 204rmem_default
@@ -185,6 +216,7 @@ tstamp_allow_data
185Allow processes to receive tx timestamps looped together with the original 216Allow processes to receive tx timestamps looped together with the original
186packet contents. If disabled, transmit timestamp requests from unprivileged 217packet contents. If disabled, transmit timestamp requests from unprivileged
187processes are dropped unless socket option SOF_TIMESTAMPING_OPT_TSONLY is set. 218processes are dropped unless socket option SOF_TIMESTAMPING_OPT_TSONLY is set.
219
188Default: 1 (on) 220Default: 1 (on)
189 221
190 222
@@ -250,19 +282,24 @@ randomly generated.
250Some user space might need to gather its content even if drivers do not 282Some user space might need to gather its content even if drivers do not
251provide ethtool -x support yet. 283provide ethtool -x support yet.
252 284
253myhost:~# cat /proc/sys/net/core/netdev_rss_key 285::
25484:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8: ... (52 bytes total) 286
287 myhost:~# cat /proc/sys/net/core/netdev_rss_key
288 84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8: ... (52 bytes total)
255 289
256File contains nul bytes if no driver ever called netdev_rss_key_fill() function. 290File contains nul bytes if no driver ever called netdev_rss_key_fill() function.
291
257Note: 292Note:
258/proc/sys/net/core/netdev_rss_key contains 52 bytes of key, 293 /proc/sys/net/core/netdev_rss_key contains 52 bytes of key,
259but most drivers only use 40 bytes of it. 294 but most drivers only use 40 bytes of it.
295
296::
260 297
261myhost:~# ethtool -x eth0 298 myhost:~# ethtool -x eth0
262RX flow hash indirection table for eth0 with 8 RX ring(s): 299 RX flow hash indirection table for eth0 with 8 RX ring(s):
263 0: 0 1 2 3 4 5 6 7 300 0: 0 1 2 3 4 5 6 7
264RSS hash key: 301 RSS hash key:
26584:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8:43:e3:c9:0c:fd:17:55:c2:3a:4d:69:ed:f1:42:89 302 84:50:f4:00:a8:15:d1:a7:e9:7f:1d:60:35:c7:47:25:42:97:74:ca:56:bb:b6:a1:d8:43:e3:c9:0c:fd:17:55:c2:3a:4d:69:ed:f1:42:89
266 303
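User space that reads netdev_rss_key directly, as described above, has to parse the colon-separated hex dump itself. The following is a sketch under these assumptions: the function names are hypothetical, an all-zero key is treated as "never filled by any driver", and the 40-byte prefix reflects the "most drivers" remark rather than a universal rule.

```python
def parse_rss_key(text: str) -> bytes:
    """Turn 'aa:bb:cc...' (as printed by /proc/sys/net/core/netdev_rss_key) into bytes."""
    return bytes(int(octet, 16) for octet in text.strip().split(":"))


def summarize_rss_key(key: bytes, driver_len: int = 40) -> dict:
    """Report key length, whether it is still all NUL, and the prefix most drivers use."""
    return {
        "length": len(key),                       # 52 on the kernel described here
        "unset": not any(key),                    # no driver called netdev_rss_key_fill()
        "driver_prefix": key[:driver_len].hex(":"),
    }


if __name__ == "__main__":
    try:
        with open("/proc/sys/net/core/netdev_rss_key") as f:
            info = summarize_rss_key(parse_rss_key(f.read()))
        print(info["length"], "bytes, unset:", info["unset"])
    except (OSError, ValueError):
        pass  # not Linux, or the file is unavailable
```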
267netdev_tstamp_prequeue 304netdev_tstamp_prequeue
268---------------------- 305----------------------
@@ -293,7 +330,7 @@ user space is responsible for creating them if needed.
293Default : 0 (for compatibility reasons) 330Default : 0 (for compatibility reasons)
294 331
295devconf_inherit_init_net 332devconf_inherit_init_net
296---------------------------- 333------------------------
297 334
298Controls if a new network namespace should inherit all current 335Controls if a new network namespace should inherit all current
299settings under /proc/sys/net/{ipv4,ipv6}/conf/{all,default}/. By 336settings under /proc/sys/net/{ipv4,ipv6}/conf/{all,default}/. By
@@ -307,7 +344,7 @@ forced to reset to their default values.
307Default : 0 (for compatibility reasons) 344Default : 0 (for compatibility reasons)
308 345
3092. /proc/sys/net/unix - Parameters for Unix domain sockets 3462. /proc/sys/net/unix - Parameters for Unix domain sockets
310------------------------------------------------------- 347----------------------------------------------------------
311 348
312There is only one file in this directory. 349There is only one file in this directory.
313unix_dgram_qlen limits the max number of datagrams queued in Unix domain 350unix_dgram_qlen limits the max number of datagrams queued in Unix domain
@@ -315,13 +352,13 @@ socket's buffer. It will not take effect unless PF_UNIX flag is specified.
315 352
316 353
3173. /proc/sys/net/ipv4 - IPV4 settings 3543. /proc/sys/net/ipv4 - IPV4 settings
318------------------------------------------------------- 355-------------------------------------
319Please see: Documentation/networking/ip-sysctl.txt and ipvs-sysctl.txt for 356Please see: Documentation/networking/ip-sysctl.txt and ipvs-sysctl.txt for
320descriptions of these entries. 357descriptions of these entries.
321 358
322 359
3234. Appletalk 3604. Appletalk
324------------------------------------------------------- 361------------
325 362
326The /proc/sys/net/appletalk directory holds the Appletalk configuration data 363The /proc/sys/net/appletalk directory holds the Appletalk configuration data
327when Appletalk is loaded. The configurable parameters are: 364when Appletalk is loaded. The configurable parameters are:
@@ -366,7 +403,7 @@ route flags, and the device the route is using.
366 403
367 404
3685. IPX 4055. IPX
369------------------------------------------------------- 406------
370 407
371The IPX protocol has no tunable values in /proc/sys/net. 408The IPX protocol has no tunable values in /proc/sys/net.
372 409
@@ -391,14 +428,16 @@ gives the destination network, the router node (or Directly) and the network
391address of the router (or Connected) for internal networks. 428address of the router (or Connected) for internal networks.
392 429
3936. TIPC 4306. TIPC
394------------------------------------------------------- 431-------
395 432
396tipc_rmem 433tipc_rmem
397---------- 434---------
398 435
399The TIPC protocol now has a tunable for the receive memory, similar to the 436The TIPC protocol now has a tunable for the receive memory, similar to the
400tcp_rmem - i.e. a vector of 3 INTEGERs: (min, default, max) 437tcp_rmem - i.e. a vector of 3 INTEGERs: (min, default, max)
401 438
439::
440
402 # cat /proc/sys/net/tipc/tipc_rmem 441 # cat /proc/sys/net/tipc/tipc_rmem
403 4252725 34021800 68043600 442 4252725 34021800 68043600
404 # 443 #
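Like tcp_rmem, tipc_rmem is read as one whitespace-separated line of three integers. Here is a minimal parsing sketch; `RmemTriplet` and `parse_rmem` are hypothetical names. The sketch reads all three values even though, as the text notes, part of the triplet is currently not used in any meaningful way.

```python
from typing import NamedTuple


class RmemTriplet(NamedTuple):
    minimum: int
    default: int
    maximum: int


def parse_rmem(text: str) -> RmemTriplet:
    """Parse a (min, default, max) vector such as /proc/sys/net/tipc/tipc_rmem."""
    values = [int(v) for v in text.split()]
    if len(values) != 3:
        raise ValueError(f"expected 3 integers, got {len(values)}")
    return RmemTriplet(*values)


print(parse_rmem("4252725 34021800 68043600"))
```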
@@ -409,7 +448,7 @@ is not at this point in time used in any meaningful way, but the triplet is
409preserved in order to be consistent with things like tcp_rmem. 448preserved in order to be consistent with things like tcp_rmem.
410 449
411named_timeout 450named_timeout
412-------------- 451-------------
413 452
414TIPC name table updates are distributed asynchronously in a cluster, without 453TIPC name table updates are distributed asynchronously in a cluster, without
415any form of transaction handling. This means that different race scenarios are 454any form of transaction handling. This means that different race scenarios are
diff --git a/Documentation/sysctl/sunrpc.txt b/Documentation/sysctl/sunrpc.rst
index ae1ecac6f85a..09780a682afd 100644
--- a/Documentation/sysctl/sunrpc.txt
+++ b/Documentation/sysctl/sunrpc.rst
@@ -1,9 +1,14 @@
1Documentation for /proc/sys/sunrpc/* kernel version 2.2.10 1===================================
2 (c) 1998, 1999, Rik van Riel <riel@nl.linux.org> 2Documentation for /proc/sys/sunrpc/
3===================================
3 4
4For general info and legal blurb, please look in README. 5kernel version 2.2.10
5 6
6============================================================== 7Copyright (c) 1998, 1999, Rik van Riel <riel@nl.linux.org>
8
9For general info and legal blurb, please look in index.rst.
10
11------------------------------------------------------------------------------
7 12
8This file contains the documentation for the sysctl files in 13This file contains the documentation for the sysctl files in
9/proc/sys/sunrpc and is valid for Linux kernel version 2.2. 14/proc/sys/sunrpc and is valid for Linux kernel version 2.2.
diff --git a/Documentation/sysctl/user.txt b/Documentation/sysctl/user.rst
index a5882865836e..650eaa03f15e 100644
--- a/Documentation/sysctl/user.txt
+++ b/Documentation/sysctl/user.rst
@@ -1,7 +1,12 @@
1Documentation for /proc/sys/user/* kernel version 4.9.0 1=================================
2 (c) 2016 Eric Biederman <ebiederm@xmission.com> 2Documentation for /proc/sys/user/
3=================================
3 4
4============================================================== 5kernel version 4.9.0
6
7Copyright (c) 2016 Eric Biederman <ebiederm@xmission.com>
8
9------------------------------------------------------------------------------
5 10
6This file contains the documentation for the sysctl files in 11This file contains the documentation for the sysctl files in
7/proc/sys/user. 12/proc/sys/user.
@@ -30,37 +35,44 @@ user namespace does not allow a user to escape their current limits.
30 35
31Currently, these files are in /proc/sys/user: 36Currently, these files are in /proc/sys/user:
32 37
33- max_cgroup_namespaces 38max_cgroup_namespaces
39=====================
34 40
35 The maximum number of cgroup namespaces that any user in the current 41 The maximum number of cgroup namespaces that any user in the current
36 user namespace may create. 42 user namespace may create.
37 43
38- max_ipc_namespaces 44max_ipc_namespaces
45==================
39 46
40 The maximum number of ipc namespaces that any user in the current 47 The maximum number of ipc namespaces that any user in the current
41 user namespace may create. 48 user namespace may create.
42 49
43- max_mnt_namespaces 50max_mnt_namespaces
51==================
44 52
45 The maximum number of mount namespaces that any user in the current 53 The maximum number of mount namespaces that any user in the current
46 user namespace may create. 54 user namespace may create.
47 55
48- max_net_namespaces 56max_net_namespaces
57==================
49 58
50 The maximum number of network namespaces that any user in the 59 The maximum number of network namespaces that any user in the
51 current user namespace may create. 60 current user namespace may create.
52 61
53- max_pid_namespaces 62max_pid_namespaces
63==================
54 64
55 The maximum number of pid namespaces that any user in the current 65 The maximum number of pid namespaces that any user in the current
56 user namespace may create. 66 user namespace may create.
57 67
58- max_user_namespaces 68max_user_namespaces
69===================
59 70
60 The maximum number of user namespaces that any user in the current 71 The maximum number of user namespaces that any user in the current
61 user namespace may create. 72 user namespace may create.
62 73
63- max_uts_namespaces 74max_uts_namespaces
75==================
64 76
65 The maximum number of UTS namespaces that any user in the current 77 The maximum number of UTS namespaces that any user in the current
66 user namespace may create. 78 user namespace may create.
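Each of the files listed above holds a single integer, so a tool can collect them in one pass. The following sketch rests on two assumptions: `read_user_limits` is a hypothetical helper, and files missing on a given kernel are silently skipped.

```python
from pathlib import Path
from typing import Dict

# File names as listed in the text above.
NAMESPACE_LIMITS = (
    "max_cgroup_namespaces", "max_ipc_namespaces", "max_mnt_namespaces",
    "max_net_namespaces", "max_pid_namespaces", "max_user_namespaces",
    "max_uts_namespaces",
)


def read_user_limits(base: str = "/proc/sys/user") -> Dict[str, int]:
    """Return {name: limit} for every listed file that exists and is readable."""
    limits = {}
    for name in NAMESPACE_LIMITS:
        try:
            limits[name] = int(Path(base, name).read_text().strip())
        except (OSError, ValueError):
            continue  # file missing on this kernel, or not readable
    return limits


print(read_user_limits())
```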
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.rst
index c5f0d44433a2..5aceb5cd5ce7 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.rst
@@ -1,10 +1,16 @@
1Documentation for /proc/sys/vm/* kernel version 2.6.29 1===============================
2 (c) 1998, 1999, Rik van Riel <riel@nl.linux.org> 2Documentation for /proc/sys/vm/
3 (c) 2008 Peter W. Morreale <pmorreale@novell.com> 3===============================
4 4
5For general info and legal blurb, please look in README. 5kernel version 2.6.29
6 6
7============================================================== 7Copyright (c) 1998, 1999, Rik van Riel <riel@nl.linux.org>
8
9Copyright (c) 2008 Peter W. Morreale <pmorreale@novell.com>
10
11For general info and legal blurb, please look in index.rst.
12
13------------------------------------------------------------------------------
8 14
9This file contains the documentation for the sysctl files in 15This file contains the documentation for the sysctl files in
10/proc/sys/vm and is valid for Linux kernel version 2.6.29. 16/proc/sys/vm and is valid for Linux kernel version 2.6.29.
@@ -68,9 +74,9 @@ Currently, these files are in /proc/sys/vm:
68- watermark_scale_factor 74- watermark_scale_factor
69- zone_reclaim_mode 75- zone_reclaim_mode
70 76
71==============================================================
72 77
73admin_reserve_kbytes 78admin_reserve_kbytes
79====================
74 80
75The amount of free memory in the system that should be reserved for users 81The amount of free memory in the system that should be reserved for users
76with the capability cap_sys_admin. 82with the capability cap_sys_admin.
@@ -97,25 +103,25 @@ On x86_64 this is about 128MB.
97 103
98Changing this takes effect whenever an application requests memory. 104Changing this takes effect whenever an application requests memory.
99 105
100==============================================================
101 106
102block_dump 107block_dump
108==========
103 109
104block_dump enables block I/O debugging when set to a nonzero value. More 110block_dump enables block I/O debugging when set to a nonzero value. More
105information on block I/O debugging is in Documentation/laptops/laptop-mode.rst. 111information on block I/O debugging is in Documentation/laptops/laptop-mode.rst.
106 112
107==============================================================
108 113
109compact_memory 114compact_memory
115==============
110 116
111Available only when CONFIG_COMPACTION is set. When 1 is written to the file, 117Available only when CONFIG_COMPACTION is set. When 1 is written to the file,
112all zones are compacted such that free memory is available in contiguous 118all zones are compacted such that free memory is available in contiguous
113blocks where possible. This can be important for example in the allocation of 119blocks where possible. This can be important for example in the allocation of
114huge pages although processes will also directly compact memory as required. 120huge pages although processes will also directly compact memory as required.
115 121
116==============================================================
117 122
118compact_unevictable_allowed 123compact_unevictable_allowed
124===========================
119 125
120Available only when CONFIG_COMPACTION is set. When set to 1, compaction is 126Available only when CONFIG_COMPACTION is set. When set to 1, compaction is
121allowed to examine the unevictable lru (mlocked pages) for pages to compact. 127allowed to examine the unevictable lru (mlocked pages) for pages to compact.
@@ -123,21 +129,22 @@ This should be used on systems where stalls for minor page faults are an
123acceptable trade for large contiguous free memory. Set to 0 to prevent 129acceptable trade for large contiguous free memory. Set to 0 to prevent
124compaction from moving pages that are unevictable. Default value is 1. 130compaction from moving pages that are unevictable. Default value is 1.
125 131
126==============================================================
127 132
128dirty_background_bytes 133dirty_background_bytes
134======================
129 135
130Contains the amount of dirty memory at which the background kernel 136Contains the amount of dirty memory at which the background kernel
131flusher threads will start writeback. 137flusher threads will start writeback.
132 138
133Note: dirty_background_bytes is the counterpart of dirty_background_ratio. Only 139Note:
134one of them may be specified at a time. When one sysctl is written it is 140 dirty_background_bytes is the counterpart of dirty_background_ratio. Only
135immediately taken into account to evaluate the dirty memory limits and the 141 one of them may be specified at a time. When one sysctl is written it is
136other appears as 0 when read. 142 immediately taken into account to evaluate the dirty memory limits and the
143 other appears as 0 when read.
137 144
138==============================================================
139 145
140dirty_background_ratio 146dirty_background_ratio
147======================
141 148
142Contains, as a percentage of total available memory that contains free pages 149Contains, as a percentage of total available memory that contains free pages
143and reclaimable pages, the number of pages at which the background kernel 150and reclaimable pages, the number of pages at which the background kernel
@@ -145,9 +152,9 @@ flusher threads will start writing out dirty data.
145 152
146The total available memory is not equal to total system memory. 153The total available memory is not equal to total system memory.
147 154
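The bytes/ratio pair behaves like a single setting with two units. The toy model below makes that concrete: writing one value makes the other read as 0, and the threshold is either the byte count or a percentage of *available* (free plus reclaimable) memory. `DirtyBackground` is a hypothetical name, and the model ignores page granularity and the further adjustments the kernel applies.

```python
class DirtyBackground:
    """Toy model of dirty_background_bytes / dirty_background_ratio."""

    def __init__(self, ratio: int):
        self.ratio, self.bytes = ratio, 0

    def write_bytes(self, value: int) -> None:
        self.bytes, self.ratio = value, 0   # the ratio now reads as 0

    def write_ratio(self, value: int) -> None:
        self.ratio, self.bytes = value, 0   # the byte count now reads as 0

    def threshold(self, available_bytes: int) -> int:
        """Dirty memory at which background flusher threads would start writeback."""
        if self.bytes:
            return self.bytes
        return available_bytes * self.ratio // 100


GiB = 1 << 30
d = DirtyBackground(ratio=10)
print(d.threshold(8 * GiB))       # 10% of 8 GiB of available memory
d.write_bytes(256 << 20)
print(d.ratio, d.threshold(8 * GiB))
```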
148==============================================================
149 155
150dirty_bytes 156dirty_bytes
157===========
151 158
152Contains the amount of dirty memory at which a process generating disk writes 159Contains the amount of dirty memory at which a process generating disk writes
153will itself start writeback. 160will itself start writeback.
@@ -161,18 +168,18 @@ Note: the minimum value allowed for dirty_bytes is two pages (in bytes); any
161value lower than this limit will be ignored and the old configuration will be 168value lower than this limit will be ignored and the old configuration will be
162retained. 169retained.
163 170
164==============================================================
165 171
166dirty_expire_centisecs 172dirty_expire_centisecs
173======================
167 174
168This tunable is used to define when dirty data is old enough to be eligible 175This tunable is used to define when dirty data is old enough to be eligible
169for writeout by the kernel flusher threads. It is expressed in 100'ths 176for writeout by the kernel flusher threads. It is expressed in 100'ths
170of a second. Data which has been dirty in-memory for longer than this 177of a second. Data which has been dirty in-memory for longer than this
171interval will be written out next time a flusher thread wakes up. 178interval will be written out next time a flusher thread wakes up.
172 179
173==============================================================
174 180
175dirty_ratio 181dirty_ratio
182===========
176 183
177Contains, as a percentage of total available memory that contains free pages 184Contains, as a percentage of total available memory that contains free pages
178and reclaimable pages, the number of pages at which a process which is 185and reclaimable pages, the number of pages at which a process which is
@@ -180,9 +187,9 @@ generating disk writes will itself start writing out dirty data.
180 187
181The total available memory is not equal to total system memory. 188The total available memory is not equal to total system memory.
182 189
183==============================================================
184 190
185dirtytime_expire_seconds 191dirtytime_expire_seconds
192========================
186 193
187When a lazytime inode is constantly having its pages dirtied, the inode with 194When a lazytime inode is constantly having its pages dirtied, the inode with
188an updated timestamp will never get chance to be written out. And, if the 195an updated timestamp will never get chance to be written out. And, if the
@@ -192,34 +199,39 @@ eventually gets pushed out to disk. This tunable is used to define when dirty
192inode is old enough to be eligible for writeback by the kernel flusher threads. 199inode is old enough to be eligible for writeback by the kernel flusher threads.
193And, it is also used as the interval to wakeup dirtytime_writeback thread. 200And, it is also used as the interval to wakeup dirtytime_writeback thread.
194 201
195==============================================================
196 202
197dirty_writeback_centisecs 203dirty_writeback_centisecs
204=========================
198 205
199The kernel flusher threads will periodically wake up and write `old' data 206The kernel flusher threads will periodically wake up and write `old` data
200out to disk. This tunable expresses the interval between those wakeups, in 207out to disk. This tunable expresses the interval between those wakeups, in
201100'ths of a second. 208100'ths of a second.
202 209
203Setting this to zero disables periodic writeback altogether. 210Setting this to zero disables periodic writeback altogether.
204 211
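Both dirty_expire_centisecs and dirty_writeback_centisecs are in hundredths of a second, so conversions are easy to get wrong by a factor of 100. The sketch below works through those conversions. The helper names are hypothetical, and the example values are chosen for illustration only.

```python
def centisecs_to_seconds(centisecs: int) -> float:
    """dirty_*_centisecs values are expressed in 100ths of a second."""
    return centisecs / 100


def eligible_for_writeout(dirty_age_s: float, expire_centisecs: int) -> bool:
    """Data dirty for longer than dirty_expire_centisecs goes out on the next flusher wakeup."""
    return dirty_age_s > centisecs_to_seconds(expire_centisecs)


def periodic_writeback_enabled(writeback_centisecs: int) -> bool:
    """Per the text, a dirty_writeback_centisecs of zero disables periodic writeback."""
    return writeback_centisecs != 0


print(centisecs_to_seconds(3000))        # 30.0 seconds
print(eligible_for_writeout(31, 3000))   # True: dirty for 31 s, limit is 30 s
```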
205==============================================================
206 212
207drop_caches 213drop_caches
214===========
208 215
209Writing to this will cause the kernel to drop clean caches, as well as 216Writing to this will cause the kernel to drop clean caches, as well as
210reclaimable slab objects like dentries and inodes. Once dropped, their 217reclaimable slab objects like dentries and inodes. Once dropped, their
211memory becomes free. 218memory becomes free.
212 219
213To free pagecache: 220To free pagecache::
221
214 echo 1 > /proc/sys/vm/drop_caches 222 echo 1 > /proc/sys/vm/drop_caches
215To free reclaimable slab objects (includes dentries and inodes): 223
224To free reclaimable slab objects (includes dentries and inodes)::
225
216 echo 2 > /proc/sys/vm/drop_caches 226 echo 2 > /proc/sys/vm/drop_caches
217To free slab objects and pagecache: 227
228To free slab objects and pagecache::
229
218 echo 3 > /proc/sys/vm/drop_caches 230 echo 3 > /proc/sys/vm/drop_caches
219 231
220This is a non-destructive operation and will not free any dirty objects. 232This is a non-destructive operation and will not free any dirty objects.
221To increase the number of objects freed by this operation, the user may run 233To increase the number of objects freed by this operation, the user may run
222`sync' prior to writing to /proc/sys/vm/drop_caches. This will minimize the 234`sync` prior to writing to /proc/sys/vm/drop_caches. This will minimize the
223number of dirty objects on the system and create more candidates to be 235number of dirty objects on the system and create more candidates to be
224dropped. 236dropped.
225 237
@@ -233,16 +245,16 @@ dropped objects, especially if they were under heavy use. Because of this,
233use outside of a testing or debugging environment is not recommended. 245use outside of a testing or debugging environment is not recommended.
234 246
235You may see informational messages in your kernel log when this file is 247You may see informational messages in your kernel log when this file is
236used: 248used::
237 249
238 cat (1234): drop_caches: 3 250 cat (1234): drop_caches: 3
239 251
240These are informational only. They do not mean that anything is wrong 252These are informational only. They do not mean that anything is wrong
241with your system. To disable them, echo 4 (bit 2) into drop_caches. 253with your system. To disable them, echo 4 (bit 2) into drop_caches.
242 254
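The values written to drop_caches act as bits: 1 selects the pagecache, 2 selects reclaimable slab objects, and 4 (bit 2) silences the log message. The sketch below decodes a value and follows the advice above to run sync first. The function names are hypothetical, and writing to the real file requires root.

```python
import os


def drop_caches_actions(value: int) -> set:
    """Decode a value for /proc/sys/vm/drop_caches into the actions described above."""
    if value not in (1, 2, 3, 4):
        raise ValueError("documented values are 1, 2, 3 and 4")
    actions = set()
    if value & 1:
        actions.add("pagecache")
    if value & 2:
        actions.add("slab")          # reclaimable slab objects, e.g. dentries and inodes
    if value & 4:
        actions.add("silence-log")   # stop the informational kernel log messages
    return actions


def drop_caches(value: int = 3, path: str = "/proc/sys/vm/drop_caches") -> None:
    """Sync first to turn dirty objects into droppable clean ones, then write the value."""
    drop_caches_actions(value)       # validate before touching the system
    os.sync()
    with open(path, "w") as f:
        f.write(f"{value}\n")
```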
243==============================================================
244 255
245extfrag_threshold 256extfrag_threshold
257=================
246 258
247This parameter affects whether the kernel will compact memory or direct 259This parameter affects whether the kernel will compact memory or direct
248reclaim to satisfy a high-order allocation. The extfrag/extfrag_index file in 260reclaim to satisfy a high-order allocation. The extfrag/extfrag_index file in
@@ -254,9 +266,9 @@ implies that the allocation will succeed as long as watermarks are met.
254The kernel will not compact memory in a zone if the 266The kernel will not compact memory in a zone if the
255fragmentation index is <= extfrag_threshold. The default value is 500. 267fragmentation index is <= extfrag_threshold. The default value is 500.
256 268
257==============================================================
258 269
259highmem_is_dirtyable 270highmem_is_dirtyable
271====================
260 272
261Available only for systems with CONFIG_HIGHMEM enabled (32b systems). 273Available only for systems with CONFIG_HIGHMEM enabled (32b systems).
262 274
@@ -274,30 +286,30 @@ OOM killer because some writers (e.g. direct block device writes) can
274only use the low memory and they can fill it up with dirty data without 286only use the low memory and they can fill it up with dirty data without
275any throttling. 287any throttling.
276 288
277==============================================================
278 289
279hugetlb_shm_group 290hugetlb_shm_group
291=================
280 292
281hugetlb_shm_group contains group id that is allowed to create SysV 293hugetlb_shm_group contains group id that is allowed to create SysV
282shared memory segment using hugetlb page. 294shared memory segment using hugetlb page.
283 295
284==============================================================
285 296
286laptop_mode 297laptop_mode
298===========
287 299
288laptop_mode is a knob that controls "laptop mode". All the things that are 300laptop_mode is a knob that controls "laptop mode". All the things that are
289controlled by this knob are discussed in Documentation/laptops/laptop-mode.rst. 301controlled by this knob are discussed in Documentation/laptops/laptop-mode.rst.
290 302
291==============================================================
292 303
293legacy_va_layout 304legacy_va_layout
305================
294 306
295If non-zero, this sysctl disables the new 32-bit mmap layout - the kernel 307If non-zero, this sysctl disables the new 32-bit mmap layout - the kernel
296will use the legacy (2.4) layout for all processes. 308will use the legacy (2.4) layout for all processes.
297 309
298==============================================================
299 310
300lowmem_reserve_ratio 311lowmem_reserve_ratio
312====================
301 313
302For some specialised workloads on highmem machines it is dangerous for 314For some specialised workloads on highmem machines it is dangerous for
303the kernel to allow process memory to be allocated from the "lowmem" 315the kernel to allow process memory to be allocated from the "lowmem"
@@ -308,7 +320,7 @@ And on large highmem machines this lack of reclaimable lowmem memory
308can be fatal. 320can be fatal.
309 321
310So the Linux page allocator has a mechanism which prevents allocations 322So the Linux page allocator has a mechanism which prevents allocations
311which _could_ use highmem from using too much lowmem. This means that 323which *could* use highmem from using too much lowmem. This means that
312a certain amount of lowmem is defended from the possibility of being 324a certain amount of lowmem is defended from the possibility of being
313captured into pinned user memory. 325captured into pinned user memory.
314 326
@@ -316,39 +328,37 @@ captured into pinned user memory.
316mechanism will also defend that region from allocations which could use 328mechanism will also defend that region from allocations which could use
317highmem or lowmem). 329highmem or lowmem).
318 330
319The `lowmem_reserve_ratio' tunable determines how aggressive the kernel is 331The `lowmem_reserve_ratio` tunable determines how aggressive the kernel is
320in defending these lower zones. 332in defending these lower zones.
321 333
322If you have a machine which uses highmem or ISA DMA and your 334If you have a machine which uses highmem or ISA DMA and your
323applications are using mlock(), or if you are running with no swap then 335applications are using mlock(), or if you are running with no swap then
324you probably should change the lowmem_reserve_ratio setting. 336you probably should change the lowmem_reserve_ratio setting.
325 337
326The lowmem_reserve_ratio is an array. You can see them by reading this file. 338The lowmem_reserve_ratio is an array. You can see them by reading this file::
327- 339
328% cat /proc/sys/vm/lowmem_reserve_ratio 340 % cat /proc/sys/vm/lowmem_reserve_ratio
329256 256 32 341 256 256 32
330-
331 342
332But, these values are not used directly. The kernel calculates # of protection 343But, these values are not used directly. The kernel calculates # of protection
333pages for each zones from them. These are shown as array of protection pages 344pages for each zones from them. These are shown as array of protection pages
334in /proc/zoneinfo like followings. (This is an example of x86-64 box). 345in /proc/zoneinfo like followings. (This is an example of x86-64 box).
335Each zone has an array of protection pages like this. 346Each zone has an array of protection pages like this::
336 347
337- 348 Node 0, zone DMA
338Node 0, zone DMA 349 pages free 1355
339 pages free 1355 350 min 3
340 min 3 351 low 3
341 low 3 352 high 4
342 high 4
343 : 353 :
344 : 354 :
345 numa_other 0 355 numa_other 0
346 protection: (0, 2004, 2004, 2004) 356 protection: (0, 2004, 2004, 2004)
347 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 357 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
348 pagesets 358 pagesets
349 cpu: 0 pcp: 0 359 cpu: 0 pcp: 0
350 : 360 :
351- 361
352These protections are added to score to judge whether this zone should be used 362These protections are added to score to judge whether this zone should be used
353for page allocation or should be reclaimed. 363for page allocation or should be reclaimed.
354 364
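The check described here can be sketched in C. This is a simplified model, not the allocator's code. It assumes a zone may serve a request of class index j only while its free pages stay strictly above watermark + protection[j]. The values come from the DMA zone in the /proc/zoneinfo excerpt above: pages free 1355, high watermark 4, and protection (0, 2004, 2004, 2004). The helper name zone_can_serve() is hypothetical.

```c
#include <stdbool.h>

/*
 * Hypothetical helper (not a kernel API): may this zone be used for a
 * request whose class index is 'j'?  'protection' is the zone's
 * protection[] array as printed in /proc/zoneinfo.
 * Equality counts as "not enough": free pages must stay strictly above
 * watermark + protection[j].
 */
static bool zone_can_serve(unsigned long free_pages, unsigned long watermark,
                           const unsigned long *protection, int j)
{
    return free_pages > watermark + protection[j];
}
```

With the excerpt's numbers, a DMA request (j = 0) sees 1355 > 4 + 0 and is served. A normal request (j = 2) sees 1355 <= 4 + 2004 = 2008, so the DMA zone is skipped.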
@@ -359,20 +369,24 @@ not be used because pages_free(1355) is smaller than watermark + protection[2]
359normal page requirement. If requirement is DMA zone(index=0), protection[0] 369normal page requirement. If requirement is DMA zone(index=0), protection[0]
360(=0) is used. 370(=0) is used.
361 371
362zone[i]'s protection[j] is calculated by following expression. 372zone[i]'s protection[j] is calculated by following expression::
363 373
364(i < j): 374 (i < j):
365 zone[i]->protection[j] 375 zone[i]->protection[j]
366 = (total sums of managed_pages from zone[i+1] to zone[j] on the node) 376 = (total sums of managed_pages from zone[i+1] to zone[j] on the node)
367 / lowmem_reserve_ratio[i]; 377 / lowmem_reserve_ratio[i];
368(i = j): 378 (i = j):
369 (should not be protected. = 0; 379 (should not be protected. = 0;
370(i > j): 380 (i > j):
371 (not necessary, but looks 0) 381 (not necessary, but looks 0)
372 382
373The default values of lowmem_reserve_ratio[i] are 383The default values of lowmem_reserve_ratio[i] are
384
385 === ====================================
374 256 (if zone[i] means DMA or DMA32 zone) 386 256 (if zone[i] means DMA or DMA32 zone)
375 32 (others). 387 32 (others)
388 === ====================================
389
376As above expression, they are reciprocal number of ratio. 390As above expression, they are reciprocal number of ratio.
377256 means 1/256. # of protection pages becomes about "0.39%" of total managed 391256 means 1/256. # of protection pages becomes about "0.39%" of total managed
378pages of higher zones on the node. 392pages of higher zones on the node.
@@ -381,9 +395,9 @@ If you would like to protect more pages, smaller values are effective.
381The minimum value is 1 (1/1 -> 100%). The value less than 1 completely 395The minimum value is 1 (1/1 -> 100%). The value less than 1 completely
382disables protection of the pages. 396disables protection of the pages.
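The protection expression above can be evaluated directly. The sketch below fills the protection matrix for one node. Zones are listed lowest first, and the matrix is stored row-major as prot[i * nr + j]. It follows the document's rules literally, including treating a ratio below 1 as "no protection". The function name and the zone sizes used to exercise it are illustrative assumptions, not values from a real machine.

```c
#include <stddef.h>

/*
 * Sketch of zone[i]->protection[j] for one node with 'nr' zones:
 *   i <  j : (sum of managed[i+1] .. managed[j]) / ratio[i]
 *   i >= j : 0
 * managed[k] is zone k's managed_pages (illustrative input).
 * ratio[k] mirrors /proc/sys/vm/lowmem_reserve_ratio.
 * A ratio below 1 yields no protection, as described above.
 */
static void compute_protection(size_t nr, const unsigned long *managed,
                               const long *ratio, unsigned long *prot)
{
    for (size_t i = 0; i < nr; i++) {
        unsigned long sum = 0;

        for (size_t j = 0; j < nr; j++) {
            unsigned long *p = &prot[i * nr + j];

            if (j <= i || ratio[i] < 1) {
                *p = 0;
                continue;
            }
            sum += managed[j];          /* now covers zone[i+1] .. zone[j] */
            *p = sum / (unsigned long)ratio[i];
        }
    }
}
```

Called with three zones (DMA, DMA32, Normal) and the default-style ratios {256, 256, 32}, each non-zero entry comes out at about 1/256 (0.39%) of the managed pages in the zones above. A ratio of 1 would reserve all of them.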
383 397
384==============================================================
385 398
386max_map_count: 399max_map_count:
400==============
387 401
388This file contains the maximum number of memory map areas a process 402This file contains the maximum number of memory map areas a process
389may have. Memory map areas are used as a side-effect of calling 403may have. Memory map areas are used as a side-effect of calling
@@ -396,9 +410,9 @@ e.g., up to one or two maps per allocation.
396 410
397The default value is 65536. 411The default value is 65536.
398 412
399=============================================================
400 413
401memory_failure_early_kill: 414memory_failure_early_kill:
415==========================
402 416
403Control how to kill processes when uncorrected memory error (typically 417Control how to kill processes when uncorrected memory error (typically
404a 2bit error in a memory module) is detected in the background by hardware 418a 2bit error in a memory module) is detected in the background by hardware
@@ -424,9 +438,9 @@ check handling and depends on the hardware capabilities.
424 438
425Applications can override this setting individually with the PR_MCE_KILL prctl 439Applications can override this setting individually with the PR_MCE_KILL prctl
426 440
427==============================================================
428 441
429memory_failure_recovery 442memory_failure_recovery
443=======================
430 444
431Enable memory failure recovery (when supported by the platform) 445Enable memory failure recovery (when supported by the platform)
432 446
@@ -434,9 +448,9 @@ Enable memory failure recovery (when supported by the platform)
434 448
4350: Always panic on a memory failure. 4490: Always panic on a memory failure.
436 450
437==============================================================
438 451
439min_free_kbytes: 452min_free_kbytes
453===============
440 454
441This is used to force the Linux VM to keep a minimum number 455This is used to force the Linux VM to keep a minimum number
442of kilobytes free. The VM uses this number to compute a 456of kilobytes free. The VM uses this number to compute a
@@ -450,9 +464,9 @@ become subtly broken, and prone to deadlock under high loads.
450 464
451Setting this too high will OOM your machine instantly. 465Setting this too high will OOM your machine instantly.
452 466
453=============================================================
454 467
455min_slab_ratio: 468min_slab_ratio
469==============
456 470
457This is available only on NUMA kernels. 471This is available only on NUMA kernels.
458 472
@@ -468,9 +482,9 @@ Note that slab reclaim is triggered in a per zone / node fashion.
468The process of reclaiming slab memory is currently not node specific 482The process of reclaiming slab memory is currently not node specific
469and may not be fast. 483and may not be fast.
470 484
471=============================================================
472 485
473min_unmapped_ratio: 486min_unmapped_ratio
487==================
474 488
475This is available only on NUMA kernels. 489This is available only on NUMA kernels.
476 490
@@ -485,9 +499,9 @@ files and similar are considered.
485 499
486The default is 1 percent. 500The default is 1 percent.
487 501
488==============================================================
489 502
490mmap_min_addr 503mmap_min_addr
504=============
491 505
492This file indicates the amount of address space which a user process will 506This file indicates the amount of address space which a user process will
493be restricted from mmapping. Since kernel null dereference bugs could 507be restricted from mmapping. Since kernel null dereference bugs could
@@ -498,9 +512,9 @@ security module. Setting this value to something like 64k will allow the
498vast majority of applications to work correctly and provide defense in depth 512vast majority of applications to work correctly and provide defense in depth
499against future potential kernel bugs. 513against future potential kernel bugs.
500 514
501==============================================================
502 515
503mmap_rnd_bits: 516mmap_rnd_bits
517=============
504 518
505This value can be used to select the number of bits to use to 519This value can be used to select the number of bits to use to
506determine the random offset to the base address of vma regions 520determine the random offset to the base address of vma regions
@@ -511,9 +525,9 @@ by the architecture's minimum and maximum supported values.
511This value can be changed after boot using the 525This value can be changed after boot using the
512/proc/sys/vm/mmap_rnd_bits tunable 526/proc/sys/vm/mmap_rnd_bits tunable
513 527
514==============================================================
515 528
516mmap_rnd_compat_bits: 529mmap_rnd_compat_bits
530====================
517 531
518This value can be used to select the number of bits to use to 532This value can be used to select the number of bits to use to
519determine the random offset to the base address of vma regions 533determine the random offset to the base address of vma regions
@@ -525,35 +539,35 @@ architecture's minimum and maximum supported values.
525This value can be changed after boot using the 539This value can be changed after boot using the
526/proc/sys/vm/mmap_rnd_compat_bits tunable 540/proc/sys/vm/mmap_rnd_compat_bits tunable
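To get a feel for what a given mmap_rnd_bits or mmap_rnd_compat_bits value buys, the sketch below does two things. It checks a value against an architecture's supported range, and it computes the size of the window a random offset could span. It assumes, which the text above does not state, that the offset is a random number of pages drawn from `bits` bits. The 4 KiB page size and the 28..32 range in the test are only examples, not any particular architecture's limits. Both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Is 'bits' inside the architecture's supported [min_bits, max_bits]? */
static bool rnd_bits_valid(int bits, int min_bits, int max_bits)
{
    return bits >= min_bits && bits <= max_bits;
}

/*
 * Bytes covered by a page-granular random offset with 'bits' bits:
 * 2^bits possible offsets, one page apart.  Assumption: the offset is
 * counted in pages.  Requires bits + log2(page_size) < 64.
 */
static uint64_t rnd_window_bytes(int bits, uint64_t page_size)
{
    return ((uint64_t)1 << bits) * page_size;
}
```

Under these assumptions, 28 bits with 4 KiB pages gives a 1 TiB window, while 8 bits gives only 1 MiB.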
527 541
528==============================================================
529 542
530nr_hugepages 543nr_hugepages
544============
531 545
532Change the minimum size of the hugepage pool. 546Change the minimum size of the hugepage pool.
533 547
534See Documentation/admin-guide/mm/hugetlbpage.rst 548See Documentation/admin-guide/mm/hugetlbpage.rst
535 549
536==============================================================
537 550
538nr_hugepages_mempolicy 551nr_hugepages_mempolicy
552======================
539 553
540Change the size of the hugepage pool at run-time on a specific 554Change the size of the hugepage pool at run-time on a specific
541set of NUMA nodes. 555set of NUMA nodes.
542 556
543See Documentation/admin-guide/mm/hugetlbpage.rst 557See Documentation/admin-guide/mm/hugetlbpage.rst
544 558
545==============================================================
546 559
547nr_overcommit_hugepages 560nr_overcommit_hugepages
561=======================
548 562
549Change the maximum size of the hugepage pool. The maximum is 563Change the maximum size of the hugepage pool. The maximum is
550nr_hugepages + nr_overcommit_hugepages. 564nr_hugepages + nr_overcommit_hugepages.
551 565
552See Documentation/admin-guide/mm/hugetlbpage.rst 566See Documentation/admin-guide/mm/hugetlbpage.rst
553 567
554==============================================================
555 568
556nr_trim_pages 569nr_trim_pages
570=============
557 571
558This is available only on NOMMU kernels. 572This is available only on NOMMU kernels.
559 573
@@ -568,16 +582,17 @@ The default value is 1.
568 582
569See Documentation/nommu-mmap.txt for more information. 583See Documentation/nommu-mmap.txt for more information.
570 584
571==============================================================
572 585
573numa_zonelist_order 586numa_zonelist_order
587===================
574 588
575This sysctl is only for NUMA and it is deprecated. Anything but 589This sysctl is only for NUMA and it is deprecated. Anything but
576Node order will fail! 590Node order will fail!
577 591
578'where the memory is allocated from' is controlled by zonelists. 592'where the memory is allocated from' is controlled by zonelists.
593
579(This documentation ignores ZONE_HIGHMEM/ZONE_DMA32 for simple explanation. 594(This documentation ignores ZONE_HIGHMEM/ZONE_DMA32 for simple explanation.
580 you may be able to read ZONE_DMA as ZONE_DMA32...) 595you may be able to read ZONE_DMA as ZONE_DMA32...)
581 596
582In non-NUMA case, a zonelist for GFP_KERNEL is ordered as following. 597In non-NUMA case, a zonelist for GFP_KERNEL is ordered as following.
583ZONE_NORMAL -> ZONE_DMA 598ZONE_NORMAL -> ZONE_DMA
@@ -585,10 +600,10 @@ This means that a memory allocation request for GFP_KERNEL will
585get memory from ZONE_DMA only when ZONE_NORMAL is not available. 600get memory from ZONE_DMA only when ZONE_NORMAL is not available.
586 601
587In NUMA case, you can think of following 2 types of order. 602In NUMA case, you can think of following 2 types of order.
588Assume 2 node NUMA and below is zonelist of Node(0)'s GFP_KERNEL 603Assume 2 node NUMA and below is zonelist of Node(0)'s GFP_KERNEL::
589 604
590(A) Node(0) ZONE_NORMAL -> Node(0) ZONE_DMA -> Node(1) ZONE_NORMAL 605 (A) Node(0) ZONE_NORMAL -> Node(0) ZONE_DMA -> Node(1) ZONE_NORMAL
591(B) Node(0) ZONE_NORMAL -> Node(1) ZONE_NORMAL -> Node(0) ZONE_DMA. 606 (B) Node(0) ZONE_NORMAL -> Node(1) ZONE_NORMAL -> Node(0) ZONE_DMA.
592 607
593Type(A) offers the best locality for processes on Node(0), but ZONE_DMA 608Type(A) offers the best locality for processes on Node(0), but ZONE_DMA
594will be used before ZONE_NORMAL exhaustion. This increases possibility of 609will be used before ZONE_NORMAL exhaustion. This increases possibility of
@@ -616,9 +631,9 @@ order will be selected.
616Default order is recommended unless this is causing problems for your 631Default order is recommended unless this is causing problems for your
617system/application. 632system/application.
618 633
619==============================================================
620 634
621oom_dump_tasks 635oom_dump_tasks
636==============
622 637
623Enables a system-wide task dump (excluding kernel threads) to be produced 638Enables a system-wide task dump (excluding kernel threads) to be produced
624when the kernel performs an OOM-killing and includes such information as 639when the kernel performs an OOM-killing and includes such information as
@@ -638,9 +653,9 @@ OOM killer actually kills a memory-hogging task.
638 653
639The default value is 1 (enabled). 654The default value is 1 (enabled).
640 655
641==============================================================
642 656
643oom_kill_allocating_task 657oom_kill_allocating_task
658========================
644 659
645This enables or disables killing the OOM-triggering task in 660This enables or disables killing the OOM-triggering task in
646out-of-memory situations. 661out-of-memory situations.
@@ -659,9 +674,9 @@ is used in oom_kill_allocating_task.
659 674
660The default value is 0. 675The default value is 0.
661 676
662==============================================================
663 677
664overcommit_kbytes: 678overcommit_kbytes
679=================
665 680
666When overcommit_memory is set to 2, the committed address space is not 681When overcommit_memory is set to 2, the committed address space is not
667permitted to exceed swap plus this amount of physical RAM. See below. 682permitted to exceed swap plus this amount of physical RAM. See below.
@@ -670,9 +685,9 @@ Note: overcommit_kbytes is the counterpart of overcommit_ratio. Only one
670of them may be specified at a time. Setting one disables the other (which 685of them may be specified at a time. Setting one disables the other (which
671then appears as 0 when read). 686then appears as 0 when read).
672 687
673==============================================================
674 688
675overcommit_memory: 689overcommit_memory
690=================
676 691
677This value contains a flag that enables memory overcommitment. 692This value contains a flag that enables memory overcommitment.
678 693
@@ -695,17 +710,17 @@ The default value is 0.
695See Documentation/vm/overcommit-accounting.rst and 710See Documentation/vm/overcommit-accounting.rst and
696mm/util.c::__vm_enough_memory() for more information. 711mm/util.c::__vm_enough_memory() for more information.
697 712
698==============================================================
699 713
700overcommit_ratio: 714overcommit_ratio
715================
701 716
702When overcommit_memory is set to 2, the committed address 717When overcommit_memory is set to 2, the committed address
703space is not permitted to exceed swap plus this percentage 718space is not permitted to exceed swap plus this percentage
704of physical RAM. See above. 719of physical RAM. See above.
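The overcommit_kbytes and overcommit_ratio descriptions together define the commit limit used when overcommit_memory is 2. That limit is swap plus either a fixed amount or a percentage of physical RAM, and only one of the two tunables is in effect at a time. The sketch below computes that limit in kilobytes under exactly those rules. The real accounting may apply further adjustments that are not modelled here, for example for huge pages. The name commit_limit_kb() is hypothetical.

```c
/*
 * Commit limit in kB when overcommit_memory == 2, per the text:
 *   swap + overcommit_kbytes               if overcommit_kbytes != 0
 *   swap + ram * overcommit_ratio / 100    otherwise
 * Setting one tunable makes the other read back as 0.
 */
static unsigned long long commit_limit_kb(unsigned long long ram_kb,
                                          unsigned long long swap_kb,
                                          unsigned long long overcommit_kbytes,
                                          unsigned int overcommit_ratio)
{
    unsigned long long from_ram = overcommit_kbytes
        ? overcommit_kbytes
        : ram_kb * overcommit_ratio / 100;

    return swap_kb + from_ram;
}
```

For example, 8 GiB of RAM, 2 GiB of swap and overcommit_ratio = 50 give a limit of 2 GiB + 4 GiB = 6 GiB. Setting overcommit_kbytes to 1 GiB instead gives 3 GiB.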
705 720
706==============================================================
707 721
708page-cluster 722page-cluster
723============
709 724
710page-cluster controls the number of pages up to which consecutive pages 725page-cluster controls the number of pages up to which consecutive pages
711are read in from swap in a single attempt. This is the swap counterpart 726are read in from swap in a single attempt. This is the swap counterpart
@@ -725,9 +740,9 @@ Lower values mean lower latencies for initial faults, but at the same time
725extra faults and I/O delays for following faults if they would have been part of 740extra faults and I/O delays for following faults if they would have been part of
726that consecutive pages readahead would have brought in. 741that consecutive pages readahead would have brought in.
727 742
728=============================================================
729 743
730panic_on_oom 744panic_on_oom
745============
731 746
732This enables or disables panic on out-of-memory feature. 747This enables or disables panic on out-of-memory feature.
733 748
@@ -747,14 +762,16 @@ above-mentioned. Even oom happens under memory cgroup, the whole
747system panics. 762system panics.
748 763
749The default value is 0. 764The default value is 0.
765
7501 and 2 are for failover of clustering. Please select either 7661 and 2 are for failover of clustering. Please select either
751according to your policy of failover. 767according to your policy of failover.
768
752panic_on_oom=2+kdump gives you very strong tool to investigate 769panic_on_oom=2+kdump gives you very strong tool to investigate
753why oom happens. You can get snapshot. 770why oom happens. You can get snapshot.
754 771
755=============================================================
756 772
757percpu_pagelist_fraction 773percpu_pagelist_fraction
774========================
758 775
759This is the fraction of pages at most (high mark pcp->high) in each zone that 776This is the fraction of pages at most (high mark pcp->high) in each zone that
760are allocated for each per cpu page list. The min value for this is 8. It 777are allocated for each per cpu page list. The min value for this is 8. It
@@ -770,16 +787,16 @@ The initial value is zero. Kernel does not use this value at boot time to set
770the high water marks for each per cpu page list. If the user writes '0' to this 787the high water marks for each per cpu page list. If the user writes '0' to this
771sysctl, it will revert to this default behavior. 788sysctl, it will revert to this default behavior.
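Because the value is a fraction, a larger number means a smaller per-cpu list. The sketch below shows the high mark implied by a given percpu_pagelist_fraction for a zone of a given size. It honours the documented minimum of 8. For 0 it returns 0, meaning "keep the kernel's default", which is not modelled here. The helper name and its -1 error convention are hypothetical.

```c
/*
 * Hypothetical helper: per-cpu list high mark (pcp->high) implied by
 * 'fraction' for a zone with 'zone_pages' pages.
 *   0        -> 0  (kernel keeps its own default; not modelled)
 *   1 .. 7   -> -1 (below the documented minimum of 8)
 *   >= 8     -> zone_pages / fraction
 */
static long pcp_high_for_fraction(unsigned long zone_pages, int fraction)
{
    if (fraction == 0)
        return 0;
    if (fraction < 8)
        return -1;
    return (long)(zone_pages / (unsigned long)fraction);
}
```

With the minimum value of 8, a zone of 1048576 pages allows at most 131072 pages (1/8th) on any single per-cpu list.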
772 789
773==============================================================
774 790
775stat_interval 791stat_interval
792=============
776 793
777The time interval between which vm statistics are updated. The default 794The time interval between which vm statistics are updated. The default
778is 1 second. 795is 1 second.
779 796
780==============================================================
781 797
782stat_refresh 798stat_refresh
799============
783 800
784Any read or write (by root only) flushes all the per-cpu vm statistics 801Any read or write (by root only) flushes all the per-cpu vm statistics
785into their global totals, for more accurate reports when testing 802into their global totals, for more accurate reports when testing
@@ -790,24 +807,26 @@ as 0) and "fails" with EINVAL if any are found, with a warning in dmesg.
790(At time of writing, a few stats are known sometimes to be found negative, 807(At time of writing, a few stats are known sometimes to be found negative,
791with no ill effects: errors and warnings on these stats are suppressed.) 808with no ill effects: errors and warnings on these stats are suppressed.)
792 809
793==============================================================
794 810
795numa_stat 811numa_stat
812=========
796 813
797This interface allows runtime configuration of numa statistics. 814This interface allows runtime configuration of numa statistics.
798 815
799When page allocation performance becomes a bottleneck and you can tolerate 816When page allocation performance becomes a bottleneck and you can tolerate
800some possible tool breakage and decreased numa counter precision, you can 817some possible tool breakage and decreased numa counter precision, you can
801do: 818do::
819
802 echo 0 > /proc/sys/vm/numa_stat 820 echo 0 > /proc/sys/vm/numa_stat
803 821
804When page allocation performance is not a bottleneck and you want all 822When page allocation performance is not a bottleneck and you want all
805tooling to work, you can do: 823tooling to work, you can do::
824
806 echo 1 > /proc/sys/vm/numa_stat 825 echo 1 > /proc/sys/vm/numa_stat
807 826
808==============================================================
809 827
810swappiness 828swappiness
829==========
811 830
812This control is used to define how aggressive the kernel will swap 831This control is used to define how aggressive the kernel will swap
813memory pages. Higher values will increase aggressiveness, lower values 832memory pages. Higher values will increase aggressiveness, lower values
@@ -817,9 +836,9 @@ than the high water mark in a zone.
817 836
818The default value is 60. 837The default value is 60.
819 838
820==============================================================
821 839
822unprivileged_userfaultfd 840unprivileged_userfaultfd
841========================
823 842
824This flag controls whether unprivileged users can use the userfaultfd 843This flag controls whether unprivileged users can use the userfaultfd
825system calls. Set this to 1 to allow unprivileged users to use the 844system calls. Set this to 1 to allow unprivileged users to use the
@@ -828,9 +847,9 @@ privileged users (with SYS_CAP_PTRACE capability).
828 847
829The default value is 1. 848The default value is 1.
830 849
831==============================================================
832 850
833- user_reserve_kbytes 851user_reserve_kbytes
852===================
834 853
835When overcommit_memory is set to 2, "never overcommit" mode, reserve 854When overcommit_memory is set to 2, "never overcommit" mode, reserve
836min(3% of current process size, user_reserve_kbytes) of free memory. 855min(3% of current process size, user_reserve_kbytes) of free memory.
@@ -846,10 +865,9 @@ Any subsequent attempts to execute a command will result in
846 865
847Changing this takes effect whenever an application requests memory. 866Changing this takes effect whenever an application requests memory.
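The reserve formula above, min(3% of current process size, user_reserve_kbytes), can be written out as follows. This sketch applies the 3% literally; the kernel's own arithmetic may round differently. The name user_reserve_kb() and the 131072 kB figure used below are illustrative.

```c
/*
 * Reserve kept back in overcommit_memory == 2 mode, per the text:
 * min(3% of the current process size, user_reserve_kbytes), in kB.
 */
static unsigned long user_reserve_kb(unsigned long process_kb,
                                     unsigned long user_reserve_kbytes)
{
    unsigned long three_percent = process_kb * 3 / 100;

    return three_percent < user_reserve_kbytes ? three_percent
                                               : user_reserve_kbytes;
}
```

With user_reserve_kbytes = 131072, a 1000000 kB process is limited by the 3% term (30000 kB). A 10000000 kB process is capped at 131072 kB.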
848 867
849==============================================================
850 868
851vfs_cache_pressure 869vfs_cache_pressure
852------------------ 870==================
853 871
854This percentage value controls the tendency of the kernel to reclaim 872This percentage value controls the tendency of the kernel to reclaim
855the memory which is used for caching of directory and inode objects. 873the memory which is used for caching of directory and inode objects.
@@ -867,9 +885,9 @@ performance impact. Reclaim code needs to take various locks to find freeable
867directory and inode objects. With vfs_cache_pressure=1000, it will look for 885directory and inode objects. With vfs_cache_pressure=1000, it will look for
868ten times more freeable objects than there are. 886ten times more freeable objects than there are.
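Treating vfs_cache_pressure as a percentage of the freeable object count reproduces the example above: at 1000, reclaim looks for ten times as many objects as are freeable. The sketch below is a simplified model of that scaling only, not of the shrinkers themselves. The name vfs_scan_target() is hypothetical.

```c
/*
 * Hypothetical helper: number of dentry/inode objects reclaim would look
 * for, reading vfs_cache_pressure as a percentage of the freeable count.
 * 100 -> exactly the freeable count, 1000 -> ten times as many.
 */
static unsigned long long vfs_scan_target(unsigned long long freeable,
                                          unsigned int pressure)
{
    return freeable * pressure / 100;
}
```

Values below 100 shrink the target proportionally, so 50 looks for half of the freeable objects and 0 looks for none.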
869 887
870=============================================================
871 888
872watermark_boost_factor: 889watermark_boost_factor
890======================
873 891
874This factor controls the level of reclaim when memory is being fragmented. 892This factor controls the level of reclaim when memory is being fragmented.
875It defines the percentage of the high watermark of a zone that will be 893It defines the percentage of the high watermark of a zone that will be
@@ -887,9 +905,9 @@ fragmentation events that occurred in the recent past. If this value is
887smaller than a pageblock then a pageblocks worth of pages will be reclaimed 905smaller than a pageblock then a pageblocks worth of pages will be reclaimed
888(e.g. 2MB on 64-bit x86). A boost factor of 0 will disable the feature. 906(e.g. 2MB on 64-bit x86). A boost factor of 0 will disable the feature.
889 907
890=============================================================
891 908
892watermark_scale_factor: 909watermark_scale_factor
910======================
893 911
894This factor controls the aggressiveness of kswapd. It defines the 912This factor controls the aggressiveness of kswapd. It defines the
895amount of memory left in a node/system before kswapd is woken up and 913amount of memory left in a node/system before kswapd is woken up and
@@ -905,20 +923,22 @@ that the number of free pages kswapd maintains for latency reasons is
905too small for the allocation bursts occurring in the system. This knob 923too small for the allocation bursts occurring in the system. This knob
906can then be used to tune kswapd aggressiveness accordingly. 924can then be used to tune kswapd aggressiveness accordingly.
907 925
908==============================================================
909 926
910zone_reclaim_mode: 927zone_reclaim_mode
928=================
911 929
912Zone_reclaim_mode allows someone to set more or less aggressive approaches to 930Zone_reclaim_mode allows someone to set more or less aggressive approaches to
913reclaim memory when a zone runs out of memory. If it is set to zero then no 931reclaim memory when a zone runs out of memory. If it is set to zero then no
914zone reclaim occurs. Allocations will be satisfied from other zones / nodes 932zone reclaim occurs. Allocations will be satisfied from other zones / nodes
915in the system. 933in the system.
916 934
917This is value ORed together of 935This is value OR'ed together of
918 936
9191 = Zone reclaim on 937= ===================================
9202 = Zone reclaim writes dirty pages out 9381 Zone reclaim on
9214 = Zone reclaim swaps pages 9392 Zone reclaim writes dirty pages out
9404 Zone reclaim swaps pages
941= ===================================
922 942
923zone_reclaim_mode is disabled by default. For file servers or workloads 943zone_reclaim_mode is disabled by default. For file servers or workloads
924that benefit from having their data cached, zone_reclaim_mode should be 944that benefit from having their data cached, zone_reclaim_mode should be
@@ -942,5 +962,3 @@ of other processes running on other nodes will not be affected.
942Allowing regular swap effectively restricts allocations to the local 962Allowing regular swap effectively restricts allocations to the local
943node unless explicitly overridden by memory policies or cpuset 963node unless explicitly overridden by memory policies or cpuset
944configurations. 964configurations.
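Since zone_reclaim_mode is the OR of the bits in the table above, a value is built by adding up the behaviours wanted, and each bit can be tested on its own. The enum names below are illustrative, not kernel identifiers.

```c
#include <stdbool.h>

/* Illustrative names for the bits in the zone_reclaim_mode table. */
enum {
    ZONE_RECLAIM_ON    = 1, /* Zone reclaim on */
    ZONE_RECLAIM_WRITE = 2, /* Zone reclaim writes dirty pages out */
    ZONE_RECLAIM_SWAP  = 4, /* Zone reclaim swaps pages */
};

/* Compose a zone_reclaim_mode value by OR'ing the selected bits. */
static int zone_reclaim_mode_value(bool on, bool write_dirty, bool swap)
{
    return (on ? ZONE_RECLAIM_ON : 0) |
           (write_dirty ? ZONE_RECLAIM_WRITE : 0) |
           (swap ? ZONE_RECLAIM_SWAP : 0);
}

/* Does 'mode' have the given bit set? */
static bool zone_reclaim_has(int mode, int bit)
{
    return (mode & bit) != 0;
}
```

For example, zone reclaim plus swapping without writing dirty pages is 1 | 4 = 5, i.e. `echo 5 > /proc/sys/vm/zone_reclaim_mode`.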
945
946============ End of Document =================================
diff --git a/Documentation/vm/unevictable-lru.rst b/Documentation/vm/unevictable-lru.rst
index c6d94118fbcc..8ba656f37cd8 100644
--- a/Documentation/vm/unevictable-lru.rst
+++ b/Documentation/vm/unevictable-lru.rst
@@ -439,7 +439,7 @@ Compacting MLOCKED Pages
439 439
440The unevictable LRU can be scanned for compactable regions and the default 440The unevictable LRU can be scanned for compactable regions and the default
441behavior is to do so. /proc/sys/vm/compact_unevictable_allowed controls 441behavior is to do so. /proc/sys/vm/compact_unevictable_allowed controls
442this behavior (see Documentation/sysctl/vm.txt). Once scanning of the 442this behavior (see Documentation/sysctl/vm.rst). Once scanning of the
443unevictable LRU is enabled, the work of compaction is mostly handled by 443unevictable LRU is enabled, the work of compaction is mostly handled by
444the page migration code and the same work flow as described in MIGRATING 444the page migration code and the same work flow as described in MIGRATING
445MLOCKED PAGES will apply. 445MLOCKED PAGES will apply.
diff --git a/kernel/panic.c b/kernel/panic.c
index 4d9f55bf7d38..e0ea74bbb41d 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -372,7 +372,7 @@ const struct taint_flag taint_flags[TAINT_FLAGS_COUNT] = {
372/** 372/**
373 * print_tainted - return a string to represent the kernel taint state. 373 * print_tainted - return a string to represent the kernel taint state.
374 * 374 *
375 * For individual taint flag meanings, see Documentation/sysctl/kernel.txt 375 * For individual taint flag meanings, see Documentation/sysctl/kernel.rst
376 * 376 *
377 * The string is overwritten by the next call to print_tainted(), 377 * The string is overwritten by the next call to print_tainted(),
378 * but is always NULL terminated. 378 * but is always NULL terminated.
diff --git a/mm/swap.c b/mm/swap.c
index 607c48229a1d..83a2a15f4836 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -8,7 +8,7 @@
8/* 8/*
9 * This file contains the default values for the operation of the 9 * This file contains the default values for the operation of the
10 * Linux VM subsystem. Fine-tuning documentation can be found in 10 * Linux VM subsystem. Fine-tuning documentation can be found in
11 * Documentation/sysctl/vm.txt. 11 * Documentation/sysctl/vm.rst.
12 * Started 18.12.91 12 * Started 18.12.91
13 * Swap aging added 23.2.95, Stephen Tweedie. 13 * Swap aging added 23.2.95, Stephen Tweedie.
14 * Buffermem limits added 12.3.98, Rik van Riel. 14 * Buffermem limits added 12.3.98, Rik van Riel.