Diffstat (limited to 'Documentation/admin-guide/sysctl/kernel.rst')
-rw-r--r-- | Documentation/admin-guide/sysctl/kernel.rst | 1177 |
1 files changed, 1177 insertions, 0 deletions
diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst new file mode 100644 index 000000000000..032c7cd3cede --- /dev/null +++ b/Documentation/admin-guide/sysctl/kernel.rst | |||
@@ -0,0 +1,1177 @@ | |||
1 | =================================== | ||
2 | Documentation for /proc/sys/kernel/ | ||
3 | =================================== | ||
4 | |||
5 | kernel version 2.2.10 | ||
6 | |||
7 | Copyright (c) 1998, 1999, Rik van Riel <riel@nl.linux.org> | ||
8 | |||
9 | Copyright (c) 2009, Shen Feng <shen@cn.fujitsu.com> | ||
10 | |||
11 | For general info and legal blurb, please look in index.rst. | ||
12 | |||
13 | ------------------------------------------------------------------------------ | ||
14 | |||
15 | This file contains documentation for the sysctl files in | ||
16 | /proc/sys/kernel/ and is valid for Linux kernel version 2.2. | ||
17 | |||
18 | The files in this directory can be used to tune and monitor | ||
19 | miscellaneous and general things in the operation of the Linux | ||
20 | kernel. Since some of the files _can_ be used to screw up your | ||
21 | system, it is advisable to read both documentation and source | ||
22 | before actually making adjustments. | ||
23 | |||
24 | Currently, these files might (depending on your configuration) | ||
25 | show up in /proc/sys/kernel: | ||
26 | |||
27 | - acct | ||
28 | - acpi_video_flags | ||
29 | - auto_msgmni | ||
30 | - bootloader_type [ X86 only ] | ||
31 | - bootloader_version [ X86 only ] | ||
32 | - cap_last_cap | ||
33 | - core_pattern | ||
34 | - core_pipe_limit | ||
35 | - core_uses_pid | ||
36 | - ctrl-alt-del | ||
37 | - dmesg_restrict | ||
38 | - domainname | ||
39 | - hostname | ||
40 | - hotplug | ||
41 | - hardlockup_all_cpu_backtrace | ||
42 | - hardlockup_panic | ||
43 | - hung_task_panic | ||
44 | - hung_task_check_count | ||
45 | - hung_task_timeout_secs | ||
46 | - hung_task_check_interval_secs | ||
47 | - hung_task_warnings | ||
48 | - hyperv_record_panic_msg | ||
49 | - kexec_load_disabled | ||
50 | - kptr_restrict | ||
51 | - l2cr [ PPC only ] | ||
52 | - modprobe ==> Documentation/debugging-modules.txt | ||
53 | - modules_disabled | ||
54 | - msg_next_id [ sysv ipc ] | ||
55 | - msgmax | ||
56 | - msgmnb | ||
57 | - msgmni | ||
58 | - nmi_watchdog | ||
59 | - osrelease | ||
60 | - ostype | ||
61 | - overflowgid | ||
62 | - overflowuid | ||
63 | - panic | ||
64 | - panic_on_oops | ||
65 | - panic_on_stackoverflow | ||
66 | - panic_on_unrecovered_nmi | ||
67 | - panic_on_warn | ||
68 | - panic_print | ||
69 | - panic_on_rcu_stall | ||
70 | - perf_cpu_time_max_percent | ||
71 | - perf_event_paranoid | ||
72 | - perf_event_max_stack | ||
73 | - perf_event_mlock_kb | ||
74 | - perf_event_max_contexts_per_stack | ||
75 | - pid_max | ||
76 | - powersave-nap [ PPC only ] | ||
77 | - printk | ||
78 | - printk_delay | ||
79 | - printk_ratelimit | ||
80 | - printk_ratelimit_burst | ||
81 | - pty ==> Documentation/filesystems/devpts.txt | ||
82 | - randomize_va_space | ||
83 | - real-root-dev ==> Documentation/admin-guide/initrd.rst | ||
84 | - reboot-cmd [ SPARC only ] | ||
85 | - rtsig-max | ||
86 | - rtsig-nr | ||
87 | - sched_energy_aware | ||
88 | - seccomp/ ==> Documentation/userspace-api/seccomp_filter.rst | ||
89 | - sem | ||
90 | - sem_next_id [ sysv ipc ] | ||
91 | - sg-big-buff [ generic SCSI device (sg) ] | ||
92 | - shm_next_id [ sysv ipc ] | ||
93 | - shm_rmid_forced | ||
94 | - shmall | ||
95 | - shmmax [ sysv ipc ] | ||
96 | - shmmni | ||
97 | - softlockup_all_cpu_backtrace | ||
98 | - soft_watchdog | ||
99 | - stack_erasing | ||
100 | - stop-a [ SPARC only ] | ||
101 | - sysrq ==> Documentation/admin-guide/sysrq.rst | ||
102 | - sysctl_writes_strict | ||
103 | - tainted ==> Documentation/admin-guide/tainted-kernels.rst | ||
104 | - threads-max | ||
105 | - unknown_nmi_panic | ||
106 | - watchdog | ||
107 | - watchdog_thresh | ||
108 | - version | ||
109 | |||
110 | |||
111 | acct: | ||
112 | ===== | ||
113 | |||
114 | highwater lowwater frequency | ||
115 | |||
116 | If BSD-style process accounting is enabled, these values control | ||
117 | its behaviour. If free space on the filesystem where the log lives | ||
118 | goes below <lowwater>%, accounting suspends. If free space gets | ||
119 | above <highwater>%, accounting resumes. <frequency> determines | ||
120 | how often the amount of free space is checked (value is in | ||
121 | seconds). Default: | ||
122 | 4 2 30 | ||
123 | That is, suspend accounting if <= 2% of the space is free; resume it | ||
124 | if >= 4% is free; consider information about the amount of free space | ||
125 | valid for 30 seconds. | ||
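The suspend/resume rule above can be sketched as follows. This is a minimal illustration, not the kernel's implementation: the `AcctThresholds` type and the `parse_acct` and `next_state` helpers are hypothetical names, and the string passed to `parse_acct` stands in for the contents of /proc/sys/kernel/acct.

```python
from typing import NamedTuple


class AcctThresholds(NamedTuple):
    highwater: int  # resume accounting at or above this % free
    lowwater: int   # suspend accounting at or below this % free
    frequency: int  # seconds for which a free-space reading stays valid


def parse_acct(text: str) -> AcctThresholds:
    """Parse the three whitespace-separated integers of the acct sysctl."""
    high, low, freq = (int(field) for field in text.split())
    return AcctThresholds(high, low, freq)


def next_state(active: bool, free_percent: float, t: AcctThresholds) -> bool:
    """Return whether accounting should be active after a free-space check."""
    if active and free_percent <= t.lowwater:
        return False  # suspend
    if not active and free_percent >= t.highwater:
        return True   # resume
    return active     # between the two marks: keep the current state


defaults = parse_acct("4 2 30")
print(next_state(True, 1.5, defaults))   # suspended: <= 2% free
print(next_state(False, 3.0, defaults))  # stays suspended: below 4% free
print(next_state(False, 4.0, defaults))  # resumed: >= 4% free
```

Keeping the two thresholds apart gives hysteresis: accounting does not flap on and off when free space hovers around a single mark.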
126 | |||
127 | |||
128 | acpi_video_flags: | ||
129 | ================= | ||
130 | |||
131 | flags | ||
132 | |||
133 | See Doc*/kernel/power/video.txt. This file allows the video boot mode to be | ||
134 | set at run time. | ||
135 | |||
136 | |||
137 | auto_msgmni: | ||
138 | ============ | ||
139 | |||
140 | This variable has no effect and may be removed in future kernel | ||
141 | releases. Reading it always returns 0. | ||
142 | Up to Linux 3.17, it enabled/disabled automatic recomputing of msgmni | ||
143 | upon memory add/remove or upon ipc namespace creation/removal. | ||
144 | Echoing "1" into this file enabled msgmni automatic recomputing. | ||
145 | Echoing "0" turned it off. auto_msgmni default value was 1. | ||
146 | |||
147 | |||
148 | bootloader_type: | ||
149 | ================ | ||
150 | |||
151 | x86 bootloader identification | ||
152 | |||
153 | This gives the bootloader type number as indicated by the bootloader, | ||
154 | shifted left by 4, and OR'd with the low four bits of the bootloader | ||
155 | version. The reason for this encoding is that this used to match the | ||
156 | type_of_loader field in the kernel header; the encoding is kept for | ||
157 | backwards compatibility. That is, if the full bootloader type number | ||
158 | is 0x15 and the full version number is 0x234, this file will contain | ||
159 | the value 340 = 0x154. | ||
160 | |||
161 | See the type_of_loader and ext_loader_type fields in | ||
162 | Documentation/x86/boot.rst for additional information. | ||
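The encoding described for bootloader_type can be checked with a short calculation. The helper name `bootloader_type_value` is hypothetical and exists only for this sketch; the inputs are the type and version numbers from the example in the text.

```python
def bootloader_type_value(loader_type: int, loader_version: int) -> int:
    """Combine type and version the way the bootloader_type file reports them."""
    # Type shifted left by 4, OR'd with the low four bits of the version.
    return (loader_type << 4) | (loader_version & 0xF)


value = bootloader_type_value(0x15, 0x234)
print(value, hex(value))  # 340 0x154
```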
163 | |||
164 | |||
165 | bootloader_version: | ||
166 | =================== | ||
167 | |||
168 | x86 bootloader version | ||
169 | |||
170 | The complete bootloader version number. In the example above, this | ||
171 | file will contain the value 564 = 0x234. | ||
172 | |||
173 | See the type_of_loader and ext_loader_ver fields in | ||
174 | Documentation/x86/boot.rst for additional information. | ||
175 | |||
176 | |||
177 | cap_last_cap: | ||
178 | ============= | ||
179 | |||
180 | Highest valid capability of the running kernel. Exports | ||
181 | CAP_LAST_CAP from the kernel. | ||
182 | |||
183 | |||
184 | core_pattern: | ||
185 | ============= | ||
186 | |||
187 | core_pattern is used to specify a core dumpfile pattern name. | ||
188 | |||
189 | * max length 127 characters; default value is "core" | ||
190 | * core_pattern is used as a pattern template for the output filename; | ||
191 | certain string patterns (beginning with '%') are substituted with | ||
192 | their actual values. | ||
193 | * backward compatibility with core_uses_pid: | ||
194 | |||
195 | If core_pattern does not include "%p" (default does not) | ||
196 | and core_uses_pid is set, then .PID will be appended to | ||
197 | the filename. | ||
198 | |||
199 | * corename format specifiers:: | ||
200 | |||
201 |     %<NUL>    '%' is dropped | ||
202 |     %%        output one '%' | ||
203 |     %p        pid | ||
204 |     %P        global pid (init PID namespace) | ||
205 |     %i        tid | ||
206 |     %I        global tid (init PID namespace) | ||
207 |     %u        uid (in initial user namespace) | ||
208 |     %g        gid (in initial user namespace) | ||
209 |     %d        dump mode, matches PR_SET_DUMPABLE and | ||
210 |               /proc/sys/fs/suid_dumpable | ||
211 |     %s        signal number | ||
212 |     %t        UNIX time of dump | ||
213 |     %h        hostname | ||
214 |     %e        executable filename (may be shortened) | ||
215 |     %E        executable path | ||
216 |     %<OTHER>  both are dropped | ||
217 | |||
218 | * If the first character of the pattern is a '|', the kernel will treat | ||
219 | the rest of the pattern as a command to run. The core dump will be | ||
220 | written to the standard input of that program instead of to a file. | ||
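The substitution rules above, including the core_uses_pid fallback, can be sketched in a few lines. This is an illustrative model only, not the kernel's code: `expand_core_pattern` and its `values` dictionary (specifier letter to value) are hypothetical, `values` is assumed to contain at least "p", and piped patterns (leading '|') are not modelled.

```python
def expand_core_pattern(pattern: str, values: dict, core_uses_pid: bool = False) -> str:
    """Expand '%' specifiers in a core_pattern-style template (illustrative only)."""
    out = []
    saw_pid = False
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch != "%":
            out.append(ch)
            i += 1
            continue
        if i + 1 == len(pattern):
            break  # trailing '%' (the %<NUL> case): dropped
        spec = pattern[i + 1]
        if spec == "%":
            out.append("%")
        elif spec in values:
            out.append(str(values[spec]))
            saw_pid = saw_pid or spec == "p"
        # any other specifier: both characters are dropped
        i += 2
    name = "".join(out)
    if core_uses_pid and not saw_pid:
        name += "." + str(values["p"])  # backward compatibility: append .PID
    return name


values = {"p": 1234, "e": "myprog", "s": 11, "h": "darkstar"}
print(expand_core_pattern("core.%e.%p", values))               # core.myprog.1234
print(expand_core_pattern("core", values, core_uses_pid=True))  # core.1234
print(expand_core_pattern("core-%%-%Z-%h%", values))            # core-%--darkstar
```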
221 | |||
222 | |||
223 | core_pipe_limit: | ||
224 | ================ | ||
225 | |||
226 | This sysctl is only applicable when core_pattern is configured to pipe | ||
227 | core files to a user space helper (when the first character of | ||
228 | core_pattern is a '|', see above). When collecting cores via a pipe | ||
229 | to an application, it is occasionally useful for the collecting | ||
230 | application to gather data about the crashing process from its | ||
231 | /proc/pid directory. In order to do this safely, the kernel must wait | ||
232 | for the collecting process to exit, so as not to remove the crashing | ||
233 | process's proc files prematurely. This in turn creates the | ||
234 | possibility that a misbehaving userspace collecting process can block | ||
235 | the reaping of a crashed process simply by never exiting. This sysctl | ||
236 | defends against that. It defines how many concurrent crashing | ||
237 | processes may be piped to user space applications in parallel. If | ||
238 | this value is exceeded, then those crashing processes above that value | ||
239 | are noted via the kernel log and their cores are skipped. 0 is a | ||
240 | special value, indicating that unlimited processes may be captured in | ||
241 | parallel, but that no waiting will take place (i.e. the collecting | ||
242 | process is not guaranteed access to /proc/<crashing pid>/). This | ||
243 | value defaults to 0. | ||
244 | |||
245 | |||
246 | core_uses_pid: | ||
247 | ============== | ||
248 | |||
249 | The default coredump filename is "core". By setting | ||
250 | core_uses_pid to 1, the coredump filename becomes core.PID. | ||
251 | If core_pattern does not include "%p" (default does not) | ||
252 | and core_uses_pid is set, then .PID will be appended to | ||
253 | the filename. | ||
254 | |||
255 | |||
256 | ctrl-alt-del: | ||
257 | ============= | ||
258 | |||
259 | When the value in this file is 0, ctrl-alt-del is trapped and | ||
260 | sent to the init(1) program to handle a graceful restart. | ||
261 | When, however, the value is > 0, Linux's reaction to a Vulcan | ||
262 | Nerve Pinch (tm) will be an immediate reboot, without even | ||
263 | syncing its dirty buffers. | ||
264 | |||
265 | Note: | ||
266 | when a program (like dosemu) has the keyboard in 'raw' | ||
267 | mode, the ctrl-alt-del is intercepted by the program before it | ||
268 | ever reaches the kernel tty layer, and it's up to the program | ||
269 | to decide what to do with it. | ||
270 | |||
271 | |||
272 | dmesg_restrict: | ||
273 | =============== | ||
274 | |||
275 | This toggle indicates whether unprivileged users are prevented | ||
276 | from using dmesg(8) to view messages from the kernel's log buffer. | ||
277 | When dmesg_restrict is set to (0) there are no restrictions. When | ||
278 | dmesg_restrict is set to (1), users must have CAP_SYSLOG to use | ||
279 | dmesg(8). | ||
280 | |||
281 | The kernel config option CONFIG_SECURITY_DMESG_RESTRICT sets the | ||
282 | default value of dmesg_restrict. | ||
283 | |||
284 | |||
285 | domainname & hostname: | ||
286 | ====================== | ||
287 | |||
288 | These files can be used to set the NIS/YP domainname and the | ||
289 | hostname of your box in exactly the same way as the commands | ||
290 | domainname and hostname, i.e.:: | ||
291 | |||
292 | # echo "darkstar" > /proc/sys/kernel/hostname | ||
293 | # echo "mydomain" > /proc/sys/kernel/domainname | ||
294 | |||
295 | has the same effect as:: | ||
296 | |||
297 | # hostname "darkstar" | ||
298 | # domainname "mydomain" | ||
299 | |||
300 | Note, however, that the classic darkstar.frop.org has the | ||
301 | hostname "darkstar" and DNS (Internet Domain Name Server) | ||
302 | domainname "frop.org", not to be confused with the NIS (Network | ||
303 | Information Service) or YP (Yellow Pages) domainname. These two | ||
304 | domain names are in general different. For a detailed discussion | ||
305 | see the hostname(1) man page. | ||
306 | |||
307 | |||
308 | hardlockup_all_cpu_backtrace: | ||
309 | ============================= | ||
310 | |||
311 | This value controls the hard lockup detector behavior when a hard | ||
312 | lockup condition is detected as to whether or not to gather further | ||
313 | debug information. If enabled, arch-specific all-CPU stack dumping | ||
314 | will be initiated. | ||
315 | |||
316 | 0: do nothing. This is the default behavior. | ||
317 | |||
318 | 1: on detection capture more debug information. | ||
319 | |||
320 | |||
321 | hardlockup_panic: | ||
322 | ================= | ||
323 | |||
324 | This parameter can be used to control whether the kernel panics | ||
325 | when a hard lockup is detected. | ||
326 | |||
327 | 0 - don't panic on hard lockup | ||
328 | 1 - panic on hard lockup | ||
329 | |||
330 | See Documentation/admin-guide/lockup-watchdogs.rst for more information. This can | ||
331 | also be set using the nmi_watchdog kernel parameter. | ||
332 | |||
333 | |||
334 | hotplug: | ||
335 | ======== | ||
336 | |||
337 | Path for the hotplug policy agent. | ||
338 | Default value is "/sbin/hotplug". | ||
339 | |||
340 | |||
341 | hung_task_panic: | ||
342 | ================ | ||
343 | |||
344 | Controls the kernel's behavior when a hung task is detected. | ||
345 | This file shows up if CONFIG_DETECT_HUNG_TASK is enabled. | ||
346 | |||
347 | 0: continue operation. This is the default behavior. | ||
348 | |||
349 | 1: panic immediately. | ||
350 | |||
351 | |||
352 | hung_task_check_count: | ||
353 | ====================== | ||
354 | |||
355 | The upper bound on the number of tasks that are checked. | ||
356 | This file shows up if CONFIG_DETECT_HUNG_TASK is enabled. | ||
357 | |||
358 | |||
359 | hung_task_timeout_secs: | ||
360 | ======================= | ||
361 | |||
362 | When a task in D state has not been scheduled | ||
363 | for more than this many seconds, report a warning. | ||
364 | This file shows up if CONFIG_DETECT_HUNG_TASK is enabled. | ||
365 | |||
366 | 0: means infinite timeout - no checking done. | ||
367 | |||
368 | Possible values to set are in range {0..LONG_MAX/HZ}. | ||
369 | |||
370 | |||
371 | hung_task_check_interval_secs: | ||
372 | ============================== | ||
373 | |||
374 | Hung task check interval. If hung task checking is enabled | ||
375 | (see hung_task_timeout_secs), the check is done every | ||
376 | hung_task_check_interval_secs seconds. | ||
377 | This file shows up if CONFIG_DETECT_HUNG_TASK is enabled. | ||
378 | |||
379 | 0 (default): means use hung_task_timeout_secs as checking interval. | ||
380 | Possible values to set are in range {0..LONG_MAX/HZ}. | ||
381 | |||
382 | |||
383 | hung_task_warnings: | ||
384 | =================== | ||
385 | |||
386 | The maximum number of warnings to report. During a check interval | ||
387 | if a hung task is detected, this value is decreased by 1. | ||
388 | When this value reaches 0, no more warnings will be reported. | ||
389 | This file shows up if CONFIG_DETECT_HUNG_TASK is enabled. | ||
390 | |||
391 | -1: report an infinite number of warnings. | ||
392 | |||
393 | |||
394 | hyperv_record_panic_msg: | ||
395 | ======================== | ||
396 | |||
397 | Controls whether the panic kmsg data should be reported to Hyper-V. | ||
398 | |||
399 | 0: do not report panic kmsg data. | ||
400 | |||
401 | 1: report the panic kmsg data. This is the default behavior. | ||
402 | |||
403 | |||
404 | kexec_load_disabled: | ||
405 | ==================== | ||
406 | |||
407 | A toggle indicating if the kexec_load syscall has been disabled. This | ||
408 | value defaults to 0 (false: kexec_load enabled), but can be set to 1 | ||
409 | (true: kexec_load disabled). Once true, kexec can no longer be used, and | ||
410 | the toggle cannot be set back to false. This allows a kexec image to be | ||
411 | loaded before disabling the syscall, allowing a system to set up (and | ||
412 | later use) an image without it being altered. Generally used together | ||
413 | with the "modules_disabled" sysctl. | ||
414 | |||
415 | |||
416 | kptr_restrict: | ||
417 | ============== | ||
418 | |||
419 | This toggle indicates whether restrictions are placed on | ||
420 | exposing kernel addresses via /proc and other interfaces. | ||
421 | |||
422 | When kptr_restrict is set to 0 (the default) the address is hashed before | ||
423 | printing. (This is the equivalent to %p.) | ||
424 | |||
425 | When kptr_restrict is set to (1), kernel pointers printed using the %pK | ||
426 | format specifier will be replaced with 0's unless the user has CAP_SYSLOG | ||
427 | and effective user and group ids are equal to the real ids. This is | ||
428 | because %pK checks are done at read() time rather than open() time, so | ||
429 | if permissions are elevated between the open() and the read() (e.g. via | ||
430 | a setuid binary) then %pK will not leak kernel pointers to unprivileged | ||
431 | users. Note, this is a temporary solution only. The correct long-term | ||
432 | solution is to do the permission checks at open() time. Consider removing | ||
433 | world read permissions from files that use %pK, and using dmesg_restrict | ||
434 | to protect against uses of %pK in dmesg(8) if leaking kernel pointer | ||
435 | values to unprivileged users is a concern. | ||
436 | |||
437 | When kptr_restrict is set to (2), kernel pointers printed using | ||
438 | %pK will be replaced with 0's regardless of privileges. | ||
439 | |||
440 | |||
441 | l2cr: (PPC only) | ||
442 | ================ | ||
443 | |||
444 | This flag controls the L2 cache of G3 processor boards. If | ||
445 | 0, the cache is disabled. Enabled if nonzero. | ||
446 | |||
447 | |||
448 | modules_disabled: | ||
449 | ================= | ||
450 | |||
451 | A toggle value indicating if modules are allowed to be loaded | ||
452 | in an otherwise modular kernel. This toggle defaults to off | ||
453 | (0), but can be set true (1). Once true, modules can be | ||
454 | neither loaded nor unloaded, and the toggle cannot be set back | ||
455 | to false. Generally used with the "kexec_load_disabled" toggle. | ||
456 | |||
457 | |||
458 | msg_next_id, sem_next_id, and shm_next_id: | ||
459 | ========================================== | ||
460 | |||
461 | These three toggles allow the desired id to be specified for the next allocated IPC | ||
462 | object: message, semaphore or shared memory respectively. | ||
463 | |||
464 | By default they are equal to -1, which means generic allocation logic. | ||
465 | Possible values to set are in range {0..INT_MAX}. | ||
466 | |||
467 | Notes: | ||
468 | 1) The kernel doesn't guarantee that the new object will have the desired id. So | ||
469 | it's up to userspace how to handle an object with the "wrong" id. | ||
470 | 2) A toggle with a non-default value will be set back to -1 by the kernel after | ||
471 | a successful IPC object allocation. If an IPC object allocation syscall | ||
472 | fails, it is undefined whether the value remains unmodified or is reset to -1. | ||
473 | |||
474 | |||
475 | nmi_watchdog: | ||
476 | ============= | ||
477 | |||
478 | This parameter can be used to control the NMI watchdog | ||
479 | (i.e. the hard lockup detector) on x86 systems. | ||
480 | |||
481 | 0 - disable the hard lockup detector | ||
482 | |||
483 | 1 - enable the hard lockup detector | ||
484 | |||
485 | The hard lockup detector monitors each CPU for its ability to respond to | ||
486 | timer interrupts. The mechanism utilizes CPU performance counter registers | ||
487 | that are programmed to generate Non-Maskable Interrupts (NMIs) periodically | ||
488 | while a CPU is busy. Hence, the alternative name 'NMI watchdog'. | ||
489 | |||
490 | The NMI watchdog is disabled by default if the kernel is running as a guest | ||
491 | in a KVM virtual machine. This default can be overridden by adding:: | ||
492 | |||
493 | nmi_watchdog=1 | ||
494 | |||
495 | to the guest kernel command line (see Documentation/admin-guide/kernel-parameters.rst). | ||
496 | |||
497 | |||
498 | numa_balancing: | ||
499 | =============== | ||
500 | |||
501 | Enables/disables automatic page fault based NUMA memory | ||
502 | balancing. Memory is moved automatically to nodes | ||
503 | that access it often. | ||
504 | |||
505 | Enables/disables automatic NUMA memory balancing. On NUMA machines, there | ||
506 | is a performance penalty if remote memory is accessed by a CPU. When this | ||
507 | feature is enabled the kernel samples what task thread is accessing memory | ||
508 | by periodically unmapping pages and later trapping a page fault. At the | ||
509 | time of the page fault, it is determined if the data being accessed should | ||
510 | be migrated to a local memory node. | ||
511 | |||
512 | The unmapping of pages and trapping faults incur additional overhead that | ||
513 | ideally is offset by improved memory locality but there is no universal | ||
514 | guarantee. If the target workload is already bound to NUMA nodes then this | ||
515 | feature should be disabled. Otherwise, if the system overhead from the | ||
516 | feature is too high then the rate the kernel samples for NUMA hinting | ||
517 | faults may be controlled by the numa_balancing_scan_period_min_ms, | ||
518 | numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, | ||
519 | numa_balancing_scan_size_mb, and numa_balancing_settle_count sysctls. | ||
520 | |||
521 | numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb | ||
522 | =============================================================================================================================== | ||
523 | |||
524 | |||
525 | Automatic NUMA balancing scans a task's address space and unmaps pages to | ||
526 | detect if pages are properly placed or if the data should be migrated to a | ||
527 | memory node local to where the task is running. Every "scan delay" the task | ||
528 | scans the next "scan size" number of pages in its address space. When the | ||
529 | end of the address space is reached the scanner restarts from the beginning. | ||
530 | |||
531 | In combination, the "scan delay" and "scan size" determine the scan rate. | ||
532 | When "scan delay" decreases, the scan rate increases. The scan delay and | ||
533 | hence the scan rate of every task is adaptive and depends on historical | ||
534 | behaviour. If pages are properly placed then the scan delay increases, | ||
535 | otherwise the scan delay decreases. The "scan size" is not adaptive but | ||
536 | the higher the "scan size", the higher the scan rate. | ||
537 | |||
538 | Higher scan rates incur higher system overhead as page faults must be | ||
539 | trapped and potentially data must be migrated. However, the higher the scan | ||
540 | rate, the more quickly a task's memory is migrated to a local node if the | ||
541 | workload pattern changes and minimises performance impact due to remote | ||
542 | memory accesses. These sysctls control the thresholds for scan delays and | ||
543 | the number of pages scanned. | ||
544 | |||
545 | numa_balancing_scan_period_min_ms is the minimum time in milliseconds to | ||
546 | scan a task's virtual memory. It effectively controls the maximum scanning | ||
547 | rate for each task. | ||
548 | |||
549 | numa_balancing_scan_delay_ms is the starting "scan delay" used for a task | ||
550 | when it initially forks. | ||
551 | |||
552 | numa_balancing_scan_period_max_ms is the maximum time in milliseconds to | ||
553 | scan a task's virtual memory. It effectively controls the minimum scanning | ||
554 | rate for each task. | ||
555 | |||
556 | numa_balancing_scan_size_mb is how many megabytes worth of pages are | ||
557 | scanned for a given scan. | ||
558 | |||
559 | |||
560 | osrelease, ostype & version: | ||
561 | ============================ | ||
562 | |||
563 | :: | ||
564 | |||
565 | # cat osrelease | ||
566 | 2.1.88 | ||
567 | # cat ostype | ||
568 | Linux | ||
569 | # cat version | ||
570 | #5 Wed Feb 25 21:49:24 MET 1998 | ||
571 | |||
572 | The files osrelease and ostype should be clear enough. Version | ||
573 | needs a little more clarification however. The '#5' means that | ||
574 | this is the fifth kernel built from this source base and the | ||
575 | date behind it indicates the time the kernel was built. | ||
576 | The only way to tune these values is to rebuild the kernel :-) | ||
577 | |||
578 | |||
579 | overflowgid & overflowuid: | ||
580 | ========================== | ||
581 | |||
582 | If your architecture did not always support 32-bit UIDs (i.e. arm, | ||
583 | i386, m68k, sh, and sparc32), a fixed UID and GID will be returned to | ||
584 | applications that use the old 16-bit UID/GID system calls, if the | ||
585 | actual UID or GID would exceed 65535. | ||
586 | |||
587 | These sysctls allow you to change the value of the fixed UID and GID. | ||
588 | The default is 65534. | ||
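The mapping seen by callers of the old 16-bit system calls can be sketched as below. The function name `to_16bit_id` is hypothetical and the code only models the rule stated above; it does not query the running kernel.

```python
OVERFLOW_ID_DEFAULT = 65534  # default value of overflowuid and overflowgid


def to_16bit_id(real_id: int, overflow_id: int = OVERFLOW_ID_DEFAULT) -> int:
    """Return the id an old 16-bit UID/GID system call would report."""
    # Ids that fit in 16 bits pass through; larger ones become the fixed id.
    return real_id if real_id <= 65535 else overflow_id


print(to_16bit_id(1000))    # 1000
print(to_16bit_id(100000))  # 65534
```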
589 | |||
590 | |||
591 | panic: | ||
592 | ====== | ||
593 | |||
594 | The value in this file represents the number of seconds the kernel | ||
595 | waits before rebooting on a panic. When you use the software watchdog, | ||
596 | the recommended setting is 60. | ||
597 | |||
598 | |||
599 | panic_on_io_nmi: | ||
600 | ================ | ||
601 | |||
602 | Controls the kernel's behavior when a CPU receives an NMI caused by | ||
603 | an IO error. | ||
604 | |||
605 | 0: try to continue operation (default) | ||
606 | |||
607 | 1: panic immediately. The IO error triggered an NMI. This indicates a | ||
608 | serious system condition which could result in IO data corruption. | ||
609 | Rather than continuing, panicking might be a better choice. Some | ||
610 | servers issue this sort of NMI when the dump button is pushed, | ||
611 | and you can use this option to take a crash dump. | ||
612 | |||
613 | |||
614 | panic_on_oops: | ||
615 | ============== | ||
616 | |||
617 | Controls the kernel's behaviour when an oops or BUG is encountered. | ||
618 | |||
619 | 0: try to continue operation | ||
620 | |||
621 | 1: panic immediately. If the `panic` sysctl is also non-zero then the | ||
622 | machine will be rebooted. | ||
623 | |||
624 | |||
625 | panic_on_stackoverflow: | ||
626 | ======================= | ||
627 | |||
628 | Controls the kernel's behavior when detecting the overflows of | ||
629 | kernel, IRQ and exception stacks except a user stack. | ||
630 | This file shows up if CONFIG_DEBUG_STACKOVERFLOW is enabled. | ||
631 | |||
632 | 0: try to continue operation. | ||
633 | |||
634 | 1: panic immediately. | ||
635 | |||
636 | |||
637 | panic_on_unrecovered_nmi: | ||
638 | ========================= | ||
639 | |||
640 | The default Linux behaviour on an NMI from either a memory error or an unknown | ||
641 | source is to continue operation. For many environments, such as scientific | ||
642 | computing, it is preferable that the box is taken out and the error | ||
643 | dealt with rather than an uncorrected parity/ECC error being propagated. | ||
644 | |||
645 | A small number of systems do generate NMIs for bizarre random reasons | ||
646 | such as power management, so the default is off. This sysctl works like | ||
647 | the existing panic controls already in this directory. | ||
648 | |||
649 | |||
650 | panic_on_warn: | ||
651 | ============== | ||
652 | |||
653 | Calls panic() in the WARN() path when set to 1. This is useful to avoid | ||
654 | a kernel rebuild when attempting to kdump at the location of a WARN(). | ||
655 | |||
656 | 0: only WARN(), default behaviour. | ||
657 | |||
658 | 1: call panic() after printing out WARN() location. | ||
659 | |||
660 | |||
661 | panic_print: | ||
662 | ============ | ||
663 | |||
664 | Bitmask for printing system info when a panic happens. The user can choose a | ||
665 | combination of the following bits: | ||
666 | |||
667 | ===== ======================================== | ||
668 | bit 0 print all tasks info | ||
669 | bit 1 print system memory info | ||
670 | bit 2 print timer info | ||
671 | bit 3 print locks info if CONFIG_LOCKDEP is on | ||
672 | bit 4 print ftrace buffer | ||
673 | ===== ======================================== | ||
674 | |||
675 | So for example to print tasks and memory info on panic, user can:: | ||
676 | |||
677 | echo 3 > /proc/sys/kernel/panic_print | ||
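The value written to panic_print is the OR of the selected bits. A small sketch of composing it follows; the `PanicPrint` enum and its member names are made up for illustration and are not kernel identifiers, though the bit positions follow the table above.

```python
from enum import IntFlag


class PanicPrint(IntFlag):
    TASKS = 1 << 0   # bit 0: print all tasks info
    MEMORY = 1 << 1  # bit 1: print system memory info
    TIMERS = 1 << 2  # bit 2: print timer info
    LOCKS = 1 << 3   # bit 3: print locks info if CONFIG_LOCKDEP is on
    FTRACE = 1 << 4  # bit 4: print ftrace buffer


mask = PanicPrint.TASKS | PanicPrint.MEMORY
print(int(mask))  # 3, the value used in the echo example above
```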
678 | |||
679 | |||
680 | panic_on_rcu_stall: | ||
681 | =================== | ||
682 | |||
683 | When set to 1, calls panic() after RCU stall detection messages. This | ||
684 | is useful to define the root cause of RCU stalls using a vmcore. | ||
685 | |||
686 | 0: do not panic() when RCU stall takes place, default behavior. | ||
687 | |||
688 | 1: panic() after printing RCU stall messages. | ||
689 | |||
690 | |||
691 | perf_cpu_time_max_percent: | ||
692 | ========================== | ||
693 | |||
694 | Hints to the kernel how much CPU time it should be allowed to | ||
695 | use to handle perf sampling events. If the perf subsystem | ||
696 | is informed that its samples are exceeding this limit, it | ||
697 | will drop its sampling frequency to attempt to reduce its CPU | ||
698 | usage. | ||
699 | |||
700 | Some perf sampling happens in NMIs. If these samples | ||
701 | unexpectedly take too long to execute, the NMIs can become | ||
702 | stacked up next to each other so much that nothing else is | ||
703 | allowed to execute. | ||
704 | |||
705 | 0: | ||
706 | disable the mechanism. Do not monitor or correct perf's | ||
707 | sampling rate no matter how much CPU time it takes. | ||
708 | |||
709 | 1-100: | ||
710 | attempt to throttle perf's sample rate to this | ||
711 | percentage of CPU. Note: the kernel calculates an | ||
712 | "expected" length of each sample event. 100 here means | ||
713 | 100% of that expected length. Even if this is set to | ||
714 | 100, you may still see sample throttling if this | ||
715 | length is exceeded. Set to 0 if you truly do not care | ||
716 | how much CPU is consumed. | ||
717 | |||
718 | |||
719 | perf_event_paranoid: | ||
720 | ==================== | ||
721 | |||
722 | Controls use of the performance events system by unprivileged | ||
723 | users (without CAP_SYS_ADMIN). The default value is 2. | ||
724 | |||
725 | ===  ================================================================== | ||
726 |  -1  Allow use of (almost) all events by all users | ||
727 | |||
728 |      Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK | ||
729 | |||
730 | >=0  Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN | ||
731 | |||
732 |      Disallow raw tracepoint access by users without CAP_SYS_ADMIN | ||
733 | |||
734 | >=1  Disallow CPU event access by users without CAP_SYS_ADMIN | ||
735 | |||
736 | >=2  Disallow kernel profiling by users without CAP_SYS_ADMIN | ||
737 | ===  ================================================================== | ||
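The levels in the table are cumulative: each higher value keeps every restriction of the lower ones. The sketch below models that reading. The `RESTRICTIONS` list and the function name are hypothetical, and the code restates the table rather than inspecting a real system.

```python
# (minimum perf_event_paranoid level, what is disallowed without CAP_SYS_ADMIN)
RESTRICTIONS = [
    (0, "ftrace function tracepoint"),
    (0, "raw tracepoint access"),
    (1, "CPU event access"),
    (2, "kernel profiling"),
]


def disallowed_without_cap_sys_admin(level: int) -> list:
    """List what users without CAP_SYS_ADMIN may not use at this level."""
    return [what for threshold, what in RESTRICTIONS if level >= threshold]


print(disallowed_without_cap_sys_admin(-1))  # []
print(disallowed_without_cap_sys_admin(1))
print(disallowed_without_cap_sys_admin(2))
```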
738 | |||
739 | |||
740 | perf_event_max_stack: | ||
741 | ===================== | ||
742 | |||
743 | Controls maximum number of stack frames to copy for (attr.sample_type & | ||
744 | PERF_SAMPLE_CALLCHAIN) configured events, for instance, when using | ||
745 | 'perf record -g' or 'perf trace --call-graph fp'. | ||
746 | |||
747 | This can only be done when no events are in use that have callchains | ||
748 | enabled, otherwise writing to this file will return -EBUSY. | ||
749 | |||
750 | The default value is 127. | ||
751 | |||
752 | |||
753 | perf_event_mlock_kb: | ||
754 | ==================== | ||
755 | |||
756 | Control size of per-cpu ring buffer not counted against mlock limit. | ||
757 | |||
758 | The default value is 512 + 1 page | ||
759 | |||
760 | |||
761 | perf_event_max_contexts_per_stack: | ||
762 | ================================== | ||
763 | |||
764 | Controls maximum number of stack frame context entries for | ||
765 | (attr.sample_type & PERF_SAMPLE_CALLCHAIN) configured events, for | ||
766 | instance, when using 'perf record -g' or 'perf trace --call-graph fp'. | ||
767 | |||
768 | This can only be done when no events are in use that have callchains | ||
769 | enabled, otherwise writing to this file will return -EBUSY. | ||
770 | |||
771 | The default value is 8. | ||
772 | |||
773 | |||
774 | pid_max: | ||
775 | ======== | ||
776 | |||
777 | PID allocation wrap value. When the kernel's next PID value | ||
778 | reaches this value, it wraps back to a minimum PID value. | ||
779 | PIDs of value pid_max or larger are not allocated. | ||
780 | |||
781 | |||
782 | ns_last_pid: | ||
783 | ============ | ||
784 | |||
785 | The last pid allocated in the current pid namespace (the one in which the | ||
786 | task using this sysctl lives). When selecting a pid for the next task on | ||
787 | fork, the kernel tries to allocate a number starting from this one. | ||
788 | |||
789 | |||
790 | powersave-nap: (PPC only) | ||
791 | ========================= | ||
792 | |||
793 | If set, Linux-PPC will use the 'nap' mode of powersaving, | ||
794 | otherwise the 'doze' mode will be used. | ||
795 | |||
796 | ============================================================== | ||
797 | |||
798 | printk: | ||
799 | ======= | ||
800 | |||
801 | The four values in printk denote: console_loglevel, | ||
802 | default_message_loglevel, minimum_console_loglevel and | ||
803 | default_console_loglevel respectively. | ||
804 | |||
805 | These values influence printk() behavior when printing or | ||
806 | logging error messages. See 'man 2 syslog' for more info on | ||
807 | the different loglevels. | ||
808 | |||
809 | - console_loglevel: | ||
810 | messages with a higher priority than | ||
811 | this will be printed to the console | ||
812 | - default_message_loglevel: | ||
813 | messages without an explicit priority | ||
814 | will be printed with this priority | ||
815 | - minimum_console_loglevel: | ||
816 | minimum (highest) value to which | ||
817 | console_loglevel can be set | ||
818 | - default_console_loglevel: | ||
819 | default value for console_loglevel | ||
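As a rough sketch of how the four fields fit together, the Python below parses the contents of the printk file and checks whether a message at a given level would reach the console. The names `PrintkLevels`, `parse_printk` and `reaches_console` are hypothetical, and the sample string used in the note after the block is an assumed input, not a documented default.

```python
from typing import NamedTuple


class PrintkLevels(NamedTuple):
    console_loglevel: int
    default_message_loglevel: int
    minimum_console_loglevel: int
    default_console_loglevel: int


def parse_printk(text: str) -> PrintkLevels:
    """Parse the four whitespace-separated integers read from the printk file."""
    fields = text.split()
    if len(fields) != 4:
        raise ValueError("expected four values, got %d" % len(fields))
    return PrintkLevels(*(int(field) for field in fields))


def reaches_console(levels: PrintkLevels, msg_level: int) -> bool:
    # A numerically lower level means a higher priority, so a message is
    # printed to the console only if its level is below console_loglevel.
    return msg_level < levels.console_loglevel
```

For an assumed input of "4 4 1 7", a message at level 3 would be printed to the console, while a message at level 4 would not.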
820 | |||
821 | |||
822 | printk_delay: | ||
823 | ============= | ||
824 | |||
825 | Delay each printk message by printk_delay milliseconds. | ||
826 | |||
827 | Values from 0 to 10000 are allowed. | ||
828 | |||
829 | |||
830 | printk_ratelimit: | ||
831 | ================= | ||
832 | |||
833 | Some warning messages are rate limited. printk_ratelimit specifies | ||
834 | the minimum length of time between these messages (in seconds). By | ||
835 | default we allow one every 5 seconds. | ||
836 | |||
837 | A value of 0 will disable rate limiting. | ||
838 | |||
839 | |||
840 | printk_ratelimit_burst: | ||
841 | ======================= | ||
842 | |||
843 | While long term we enforce one message per printk_ratelimit | ||
844 | seconds, we do allow a burst of messages to pass through. | ||
845 | printk_ratelimit_burst specifies the number of messages we can | ||
846 | send before ratelimiting kicks in. | ||
847 | |||
848 | |||
849 | printk_devkmsg: | ||
850 | =============== | ||
851 | |||
852 | Control the logging to /dev/kmsg from userspace: | ||
853 | |||
854 | ratelimit: | ||
855 | default, ratelimited | ||
856 | |||
857 | on: unlimited logging to /dev/kmsg from userspace | ||
858 | |||
859 | off: logging to /dev/kmsg disabled | ||
860 | |||
861 | The kernel command line parameter printk.devkmsg= overrides this and is | ||
862 | a one-time setting until next reboot: once set, it cannot be changed by | ||
863 | this sysctl interface anymore. | ||
864 | |||
865 | |||
866 | randomize_va_space: | ||
867 | =================== | ||
868 | |||
869 | This option can be used to select the type of process address | ||
870 | space randomization that is used in the system, for architectures | ||
871 | that support this feature. | ||
872 | |||
873 | == =========================================================================== | ||
874 | 0 Turn the process address space randomization off. This is the | ||
875 | default for architectures that do not support this feature anyway, | ||
876 | and kernels that are booted with the "norandmaps" parameter. | ||
877 | |||
878 | 1 Make the addresses of mmap base, stack and VDSO page randomized. | ||
879 | This, among other things, implies that shared libraries will be | ||
880 | loaded to random addresses. Also for PIE-linked binaries, the | ||
881 | location of code start is randomized. This is the default if the | ||
882 | CONFIG_COMPAT_BRK option is enabled. | ||
883 | |||
884 | 2 Additionally enable heap randomization. This is the default if | ||
885 | CONFIG_COMPAT_BRK is disabled. | ||
886 | |||
887 | There are a few legacy applications out there (such as some ancient | ||
888 | versions of libc.so.5 from 1996) that assume that brk area starts | ||
889 | just after the end of the code+bss. These applications break when | ||
890 | start of the brk area is randomized. There are however no known | ||
891 | non-legacy applications that would be broken this way, so for most | ||
892 | systems it is safe to choose full randomization. | ||
893 | |||
894 | Systems with ancient and/or broken binaries should be configured | ||
895 | with CONFIG_COMPAT_BRK enabled, which excludes the heap from process | ||
896 | address space randomization. | ||
897 | == =========================================================================== | ||
898 | |||
899 | |||
900 | reboot-cmd: (Sparc only) | ||
901 | ======================== | ||
902 | |||
903 | ??? This seems to be a way to give an argument to the Sparc | ||
904 | ROM/Flash boot loader. Maybe to tell it what to do after | ||
905 | rebooting. ??? | ||
906 | |||
907 | |||
908 | rtsig-max & rtsig-nr: | ||
909 | ===================== | ||
910 | |||
911 | The file rtsig-max can be used to tune the maximum number | ||
912 | of POSIX realtime (queued) signals that can be outstanding | ||
913 | in the system. | ||
914 | |||
915 | rtsig-nr shows the number of RT signals currently queued. | ||
916 | |||
917 | |||
918 | sched_energy_aware: | ||
919 | =================== | ||
920 | |||
921 | Enables/disables Energy Aware Scheduling (EAS). EAS starts | ||
922 | automatically on platforms where it can run (that is, | ||
923 | platforms with asymmetric CPU topologies and having an Energy | ||
924 | Model available). If your platform happens to meet the | ||
925 | requirements for EAS but you do not want to use it, change | ||
926 | this value to 0. | ||
927 | |||
928 | |||
929 | sched_schedstats: | ||
930 | ================= | ||
931 | |||
932 | Enables/disables scheduler statistics. Enabling this feature | ||
933 | incurs a small amount of overhead in the scheduler but is | ||
934 | useful for debugging and performance tuning. | ||
935 | |||
936 | |||
937 | sg-big-buff: | ||
938 | ============ | ||
939 | |||
940 | This file shows the size of the generic SCSI (sg) buffer. | ||
941 | You can't tune it just yet, but you could change it at | ||
942 | compile time by editing include/scsi/sg.h and changing | ||
943 | the value of SG_BIG_BUFF. | ||
944 | |||
945 | There shouldn't be any reason to change this value. If | ||
946 | you can come up with one, you probably know what you | ||
947 | are doing anyway :) | ||
948 | |||
949 | |||
950 | shmall: | ||
951 | ======= | ||
952 | |||
953 | This parameter sets the total number of shared memory pages that | ||
954 | can be used system-wide. Hence, SHMALL should always be at least | ||
955 | ceil(shmmax/PAGE_SIZE). | ||
956 | |||
957 | If you are not sure what the default PAGE_SIZE is on your Linux | ||
958 | system, you can run the following command: | ||
959 | |||
960 | # getconf PAGE_SIZE | ||
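The lower bound ceil(shmmax/PAGE_SIZE) can be worked out with integer arithmetic alone. The helper below is a minimal sketch; the name `min_shmall` is hypothetical, and the page size is passed in explicitly rather than assumed.

```python
def min_shmall(shmmax_bytes: int, page_size: int) -> int:
    """Smallest shmall (in pages) that can hold one segment of shmmax bytes."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    # Ceiling division without floating point: -(-a // b) == ceil(a / b).
    return -(-shmmax_bytes // page_size)
```

On a system with 4096-byte pages, a shmmax of 4097 bytes would need at least 2 pages. From Python, the page size reported by getconf is also available as os.sysconf("SC_PAGE_SIZE") on Unix-like systems.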
961 | |||
962 | |||
963 | shmmax: | ||
964 | ======= | ||
965 | |||
966 | This value can be used to query and set the run time limit | ||
967 | on the maximum shared memory segment size that can be created. | ||
968 | Shared memory segments up to 1Gb are now supported in the | ||
969 | kernel. This value defaults to SHMMAX. | ||
970 | |||
971 | |||
972 | shm_rmid_forced: | ||
973 | ================ | ||
974 | |||
975 | Linux lets you set resource limits, including how much memory one | ||
976 | process can consume, via setrlimit(2). Unfortunately, shared memory | ||
977 | segments are allowed to exist without association with any process, and | ||
978 | thus might not be counted against any resource limits. If enabled, | ||
979 | shared memory segments are automatically destroyed when their attach | ||
980 | count becomes zero after a detach or a process termination. It will | ||
981 | also destroy segments that were created, but never attached to, on exit | ||
982 | from the process. The only use left for IPC_RMID is to immediately | ||
983 | destroy an unattached segment. Of course, this breaks the way things are | ||
984 | defined, so some applications might stop working. Note that this | ||
985 | feature will do you no good unless you also configure your resource | ||
986 | limits (in particular, RLIMIT_AS and RLIMIT_NPROC). Most systems don't | ||
987 | need this. | ||
988 | |||
989 | Note that if you change this from 0 to 1, already created segments | ||
990 | without users and with a dead originating process will be destroyed. | ||
991 | |||
992 | |||
993 | sysctl_writes_strict: | ||
994 | ===================== | ||
995 | |||
996 | Control how file position affects the behavior of updating sysctl values | ||
997 | via the /proc/sys interface: | ||
998 | |||
999 | == ====================================================================== | ||
1000 | -1 Legacy per-write sysctl value handling, with no printk warnings. | ||
1001 | Each write syscall must fully contain the sysctl value to be | ||
1002 | written, and multiple writes on the same sysctl file descriptor | ||
1003 | will rewrite the sysctl value, regardless of file position. | ||
1004 | 0 Same behavior as above, but warn about processes that perform writes | ||
1005 | to a sysctl file descriptor when the file position is not 0. | ||
1006 | 1 (default) Respect file position when writing sysctl strings. Multiple | ||
1007 | writes will append to the sysctl value buffer. Anything past the max | ||
1008 | length of the sysctl value buffer will be ignored. Writes to numeric | ||
1009 | sysctl entries must always be at file position 0 and the value must | ||
1010 | be fully contained in the buffer sent in the write syscall. | ||
1011 | == ====================================================================== | ||
1012 | |||
1013 | |||
1014 | softlockup_all_cpu_backtrace: | ||
1015 | ============================= | ||
1016 | |||
1017 | This value controls the soft lockup detector thread's behavior | ||
1018 | when a soft lockup condition is detected as to whether or not | ||
1019 | to gather further debug information. If enabled, each cpu will | ||
1020 | be issued an NMI and instructed to capture a stack trace. | ||
1021 | |||
1022 | This feature is only applicable for architectures which support | ||
1023 | NMI. | ||
1024 | |||
1025 | 0: do nothing. This is the default behavior. | ||
1026 | |||
1027 | 1: on detection capture more debug information. | ||
1028 | |||
1029 | |||
1030 | soft_watchdog: | ||
1031 | ============== | ||
1032 | |||
1033 | This parameter can be used to control the soft lockup detector. | ||
1034 | |||
1035 | 0 - disable the soft lockup detector | ||
1036 | |||
1037 | 1 - enable the soft lockup detector | ||
1038 | |||
1039 | The soft lockup detector monitors CPUs for threads that are hogging the CPUs | ||
1040 | without rescheduling voluntarily, and thus prevent the 'watchdog/N' threads | ||
1041 | from running. The mechanism depends on the CPU's ability to respond to timer | ||
1042 | interrupts, which are needed for the 'watchdog/N' threads to be woken up by | ||
1043 | the watchdog timer function. If a CPU cannot respond to them, the NMI | ||
1044 | watchdog - if enabled - can detect a hard lockup condition. | ||
1045 | |||
1046 | |||
1047 | stack_erasing: | ||
1048 | ============== | ||
1049 | |||
1050 | This parameter can be used to control kernel stack erasing at the end | ||
1051 | of syscalls for kernels built with CONFIG_GCC_PLUGIN_STACKLEAK. | ||
1052 | |||
1053 | That erasing reduces the information which kernel stack leak bugs | ||
1054 | can reveal and blocks some uninitialized stack variable attacks. | ||
1055 | The tradeoff is the performance impact: on a single CPU system kernel | ||
1056 | compilation sees a 1% slowdown; other systems and workloads may vary. | ||
1057 | |||
1058 | 0: kernel stack erasing is disabled; STACKLEAK_METRICS are not updated. | ||
1059 | |||
1060 | 1: kernel stack erasing is enabled (default); it is performed before | ||
1061 | returning to the userspace at the end of syscalls. | ||
1062 | |||
1063 | |||
1064 | tainted: | ||
1065 | ======== | ||
1066 | |||
1067 | Non-zero if the kernel has been tainted. Numeric values, which can be | ||
1068 | ORed together. The letters are seen in "Tainted" line of Oops reports. | ||
1069 | |||
1070 | ====== ===== ============================================================== | ||
1071 | 1 `(P)` proprietary module was loaded | ||
1072 | 2 `(F)` module was force loaded | ||
1073 | 4 `(S)` SMP kernel oops on an officially SMP incapable processor | ||
1074 | 8 `(R)` module was force unloaded | ||
1075 | 16 `(M)` processor reported a Machine Check Exception (MCE) | ||
1076 | 32 `(B)` bad page referenced or some unexpected page flags | ||
1077 | 64 `(U)` taint requested by userspace application | ||
1078 | 128 `(D)` kernel died recently, i.e. there was an OOPS or BUG | ||
1079 | 256 `(A)` an ACPI table was overridden by user | ||
1080 | 512 `(W)` kernel issued warning | ||
1081 | 1024 `(C)` staging driver was loaded | ||
1082 | 2048 `(I)` workaround for bug in platform firmware applied | ||
1083 | 4096 `(O)` externally-built ("out-of-tree") module was loaded | ||
1084 | 8192 `(E)` unsigned module was loaded | ||
1085 | 16384 `(L)` soft lockup occurred | ||
1086 | 32768 `(K)` kernel has been live patched | ||
1087 | 65536 `(X)` Auxiliary taint, defined for and used by distros | ||
1088 | 131072 `(T)` The kernel was built with the struct randomization plugin | ||
1089 | ====== ===== ============================================================== | ||
1090 | |||
1091 | See Documentation/admin-guide/tainted-kernels.rst for more information. | ||
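Because the values in the table are single bits ORed together, a tainted value can be decoded by testing each bit in turn. The sketch below relies only on the letters and bit positions listed in the table above; the names `TAINT_FLAGS` and `decode_taint` are hypothetical.

```python
# Letters in bit order, taken from the table above: bit 0 (value 1) is P,
# bit 1 (value 2) is F, and so on up to bit 17 (value 131072), which is T.
TAINT_FLAGS = "PFSRMBUDAWCIOELKXT"


def decode_taint(value: int) -> str:
    """Return the taint letters whose bits are set in value."""
    return "".join(
        letter
        for bit, letter in enumerate(TAINT_FLAGS)
        if value & (1 << bit)
    )
```

For example, a value of 4097 (4096 + 1) decodes to "PO": a proprietary module and an out-of-tree module were loaded.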
1092 | |||
1093 | |||
1094 | threads-max: | ||
1095 | ============ | ||
1096 | |||
1097 | This value controls the maximum number of threads that can be created | ||
1098 | using fork(). | ||
1099 | |||
1100 | During initialization the kernel sets this value such that even if the | ||
1101 | maximum number of threads is created, the thread structures occupy only | ||
1102 | a part (1/8th) of the available RAM pages. | ||
1103 | |||
1104 | The minimum value that can be written to threads-max is 20. | ||
1105 | |||
1106 | The maximum value that can be written to threads-max is given by the | ||
1107 | constant FUTEX_TID_MASK (0x3fffffff). | ||
1108 | |||
1109 | If a value outside of this range is written to threads-max, an EINVAL | ||
1110 | error occurs. | ||
1111 | |||
1112 | The value written is checked against the available RAM pages. If the | ||
1113 | thread structures would occupy too much (more than 1/8th) of the | ||
1114 | available RAM pages, threads-max is reduced accordingly. | ||
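The range check described above can be sketched as follows. This covers only the static bounds (20 and FUTEX_TID_MASK); the later reduction based on available RAM pages is not modelled, and the names `MIN_THREADS` and `threads_max_in_range` are hypothetical.

```python
FUTEX_TID_MASK = 0x3FFFFFFF  # upper bound quoted in the text above
MIN_THREADS = 20             # lower bound quoted in the text above


def threads_max_in_range(value: int) -> bool:
    """True if writing value to threads-max would pass the range check.

    A value outside this range is rejected with EINVAL. A value inside
    it may still be reduced by the kernel to fit the available RAM.
    """
    return MIN_THREADS <= value <= FUTEX_TID_MASK
```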
1115 | |||
1116 | |||
1117 | unknown_nmi_panic: | ||
1118 | ================== | ||
1119 | |||
1120 | The value in this file affects how unknown NMIs are handled. When the | ||
1121 | value is non-zero, an unknown NMI is trapped and a panic occurs. At | ||
1122 | that time, kernel debugging information is displayed on the console. | ||
1123 | |||
1124 | For example, the NMI switch that most IA32 servers have raises an | ||
1125 | unknown NMI. If a system hangs, try pressing the NMI switch. | ||
1126 | |||
1127 | |||
1128 | watchdog: | ||
1129 | ========= | ||
1130 | |||
1131 | This parameter can be used to disable or enable the soft lockup detector | ||
1132 | _and_ the NMI watchdog (i.e. the hard lockup detector) at the same time. | ||
1133 | |||
1134 | 0 - disable both lockup detectors | ||
1135 | |||
1136 | 1 - enable both lockup detectors | ||
1137 | |||
1138 | The soft lockup detector and the NMI watchdog can also be disabled or | ||
1139 | enabled individually, using the soft_watchdog and nmi_watchdog parameters. | ||
1140 | If the watchdog parameter is read, for example by executing:: | ||
1141 | |||
1142 | cat /proc/sys/kernel/watchdog | ||
1143 | |||
1144 | the output of this command (0 or 1) shows the logical OR of soft_watchdog | ||
1145 | and nmi_watchdog. | ||
1146 | |||
1147 | |||
1148 | watchdog_cpumask: | ||
1149 | ================= | ||
1150 | |||
1151 | This value can be used to control on which cpus the watchdog may run. | ||
1152 | The default cpumask is all possible cores, but if NO_HZ_FULL is | ||
1153 | enabled in the kernel config, and cores are specified with the | ||
1154 | nohz_full= boot argument, those cores are excluded by default. | ||
1155 | Offline cores can be included in this mask, and if the core is later | ||
1156 | brought online, the watchdog will be started based on the mask value. | ||
1157 | |||
1158 | Typically this value would only be touched in the nohz_full case | ||
1159 | to re-enable cores that by default were not running the watchdog, | ||
1160 | if a kernel lockup was suspected on those cores. | ||
1161 | |||
1162 | The argument value is the standard cpulist format for cpumasks, | ||
1163 | so for example to enable the watchdog on cores 0, 2, 3, and 4 you | ||
1164 | might say:: | ||
1165 | |||
1166 | echo 0,2-4 > /proc/sys/kernel/watchdog_cpumask | ||
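To make the cpulist notation concrete, the sketch below expands a string such as the one in the example above into individual core numbers. It is a simplification that handles only single numbers and N-M ranges; any other syntax the kernel may accept is deliberately left out. The name `parse_cpulist` is hypothetical.

```python
def parse_cpulist(text: str) -> list[int]:
    """Expand a simple cpulist such as "0,2-4" into a sorted list of cores."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty string or a stray comma
        first, sep, last = part.partition("-")
        low = int(first)
        high = int(last) if sep else low
        if high < low:
            raise ValueError("bad range: %r" % part)
        cpus.update(range(low, high + 1))
    return sorted(cpus)
```

Applied to "0,2-4" this yields cores 0, 2, 3 and 4, matching the description above.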
1167 | |||
1168 | |||
1169 | watchdog_thresh: | ||
1170 | ================ | ||
1171 | |||
1172 | This value can be used to control the frequency of hrtimer and NMI | ||
1173 | events and the soft and hard lockup thresholds. The default threshold | ||
1174 | is 10 seconds. | ||
1175 | |||
1176 | The softlockup threshold is (2 * watchdog_thresh). Setting this | ||
1177 | tunable to zero will disable lockup detection altogether. | ||