author Eric Paris <eparis@redhat.com> 2014-03-07 11:41:32 -0500
committer Eric Paris <eparis@redhat.com> 2014-03-07 11:41:32 -0500
commit b7d3622a39fde7658170b7f3cf6c6889bb8db30d
tree 64f4e781ecb2a85d675e234072b988560bcd25f1 /Documentation/sysctl
parent f3411cb2b2e396a41ed3a439863f028db7140a34
parent d8ec26d7f8287f5788a494f56e8814210f0e64be
Merge tag 'v3.13' into for-3.15

Linux 3.13

Conflicts:
	include/net/xfrm.h

Simple merge where v3.13 removed 'extern' from definitions and the audit
tree did s/u32/unsigned int/ to the same definitions.
Diffstat (limited to 'Documentation/sysctl')
-rw-r--r-- Documentation/sysctl/kernel.txt 101
-rw-r--r-- Documentation/sysctl/vm.txt 15
2 files changed, 104 insertions, 12 deletions
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 9d4c1d18ad44..26b7ee491df8 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -290,13 +290,24 @@ Default value is "/sbin/hotplug".
 kptr_restrict:
 
 This toggle indicates whether restrictions are placed on
-exposing kernel addresses via /proc and other interfaces. When
-kptr_restrict is set to (0), there are no restrictions. When
-kptr_restrict is set to (1), the default, kernel pointers
-printed using the %pK format specifier will be replaced with 0's
-unless the user has CAP_SYSLOG. When kptr_restrict is set to
-(2), kernel pointers printed using %pK will be replaced with 0's
-regardless of privileges.
+exposing kernel addresses via /proc and other interfaces.
+
+When kptr_restrict is set to (0), the default, there are no restrictions.
+
+When kptr_restrict is set to (1), kernel pointers printed using the %pK
+format specifier will be replaced with 0's unless the user has CAP_SYSLOG
+and effective user and group ids are equal to the real ids. This is
+because %pK checks are done at read() time rather than open() time, so
+if permissions are elevated between the open() and the read() (e.g via
+a setuid binary) then %pK will not leak kernel pointers to unprivileged
+users. Note, this is a temporary solution only. The correct long-term
+solution is to do the permission checks at open() time. Consider removing
+world read permissions from files that use %pK, and using dmesg_restrict
+to protect against uses of %pK in dmesg(8) if leaking kernel pointer
+values to unprivileged users is a concern.
+
+When kptr_restrict is set to (2), kernel pointers printed using
+%pK will be replaced with 0's regardless of privileges.
 
 ==============================================================
 
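Reviewer note: the kptr_restrict rules added in the hunk above can be summarised as a small decision function. This is only a Python sketch of the documented behaviour, not the kernel's implementation. The function name and its parameters are hypothetical, and it assumes the caller already knows the reader's capability and its real and effective ids at read() time.

```python
def pk_shows_real_pointer(kptr_restrict, has_cap_syslog, uid, euid, gid, egid):
    """Return True if a %pK value would be shown unmodified.

    Models the three documented settings only; all names are illustrative.
    """
    if kptr_restrict == 0:
        # No restrictions.
        return True
    if kptr_restrict == 1:
        # CAP_SYSLOG is required, and the effective ids must equal the
        # real ids, so a setuid binary that gained privileges between
        # open() and read() still sees zeroes.
        return has_cap_syslog and uid == euid and gid == egid
    if kptr_restrict == 2:
        # Always replaced with 0's, regardless of privileges.
        return False
    raise ValueError("this sketch only models the values 0, 1 and 2")


# A process whose effective uid was raised to 0 by a setuid binary:
print(pk_shows_real_pointer(1, True, uid=1000, euid=0, gid=1000, egid=1000))
```

The last call models the setuid case from the text: the capability is present, but the real and effective uids differ, so the pointer is hidden.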
@@ -355,6 +366,82 @@ utilize.
 
 ==============================================================
 
+numa_balancing
+
+Enables/disables automatic page fault based NUMA memory
+balancing. Memory is moved automatically to nodes
+that access it often.
+
+Enables/disables automatic NUMA memory balancing. On NUMA machines, there
+is a performance penalty if remote memory is accessed by a CPU. When this
+feature is enabled the kernel samples what task thread is accessing memory
+by periodically unmapping pages and later trapping a page fault. At the
+time of the page fault, it is determined if the data being accessed should
+be migrated to a local memory node.
+
+The unmapping of pages and trapping faults incur additional overhead that
+ideally is offset by improved memory locality but there is no universal
+guarantee. If the target workload is already bound to NUMA nodes then this
+feature should be disabled. Otherwise, if the system overhead from the
+feature is too high then the rate the kernel samples for NUMA hinting
+faults may be controlled by the numa_balancing_scan_period_min_ms,
+numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms,
+numa_balancing_scan_size_mb, numa_balancing_settle_count sysctls and
+numa_balancing_migrate_deferred.
+
+==============================================================
+
+numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms,
+numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb
+
+Automatic NUMA balancing scans tasks address space and unmaps pages to
+detect if pages are properly placed or if the data should be migrated to a
+memory node local to where the task is running. Every "scan delay" the task
+scans the next "scan size" number of pages in its address space. When the
+end of the address space is reached the scanner restarts from the beginning.
+
+In combination, the "scan delay" and "scan size" determine the scan rate.
+When "scan delay" decreases, the scan rate increases. The scan delay and
+hence the scan rate of every task is adaptive and depends on historical
+behaviour. If pages are properly placed then the scan delay increases,
+otherwise the scan delay decreases. The "scan size" is not adaptive but
+the higher the "scan size", the higher the scan rate.
+
+Higher scan rates incur higher system overhead as page faults must be
+trapped and potentially data must be migrated. However, the higher the scan
+rate, the more quickly a tasks memory is migrated to a local node if the
+workload pattern changes and minimises performance impact due to remote
+memory accesses. These sysctls control the thresholds for scan delays and
+the number of pages scanned.
+
+numa_balancing_scan_period_min_ms is the minimum time in milliseconds to
+scan a tasks virtual memory. It effectively controls the maximum scanning
+rate for each task.
+
+numa_balancing_scan_delay_ms is the starting "scan delay" used for a task
+when it initially forks.
+
+numa_balancing_scan_period_max_ms is the maximum time in milliseconds to
+scan a tasks virtual memory. It effectively controls the minimum scanning
+rate for each task.
+
+numa_balancing_scan_size_mb is how many megabytes worth of pages are
+scanned for a given scan.
+
+numa_balancing_settle_count is how many scan periods must complete before
+the schedule balancer stops pushing the task towards a preferred node. This
+gives the scheduler a chance to place the task on an alternative node if the
+preferred node is overloaded.
+
+numa_balancing_migrate_deferred is how many page migrations get skipped
+unconditionally, after a page migration is skipped because a page is shared
+with other tasks. This reduces page migration overhead, and determines
+how much stronger the "move task near its memory" policy scheduler becomes,
+versus the "move memory near its task" memory management policy, for workloads
+with shared memory.
+
+==============================================================
+
 osrelease, ostype & version:
 
 # cat osrelease
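Reviewer note: the relationship between "scan delay", "scan size" and scan rate described in the numa_balancing text above can be illustrated with a little arithmetic. The Python sketch below is a simplified model, assuming one scan of "scan size" megabytes per scan period and a period clamped between the min and max sysctls. The function names and the numeric values are illustrative assumptions, not values taken from this patch, and the kernel's real adaptive logic is more involved.

```python
def clamp_scan_period_ms(period_ms, min_ms, max_ms):
    """Keep an adaptive scan period inside the configured bounds."""
    return max(min_ms, min(period_ms, max_ms))


def scan_rate_mb_per_s(scan_size_mb, scan_period_ms):
    """Megabytes scanned per second if one scan runs every scan period."""
    return scan_size_mb / (scan_period_ms / 1000.0)


# Illustrative settings only (hypothetical, not read from a real system).
SCAN_SIZE_MB = 256
PERIOD_MIN_MS = 1000
PERIOD_MAX_MS = 60000

# A very short requested period is clamped up to the minimum period,
# which therefore caps the maximum scan rate.
fastest = scan_rate_mb_per_s(
    SCAN_SIZE_MB, clamp_scan_period_ms(10, PERIOD_MIN_MS, PERIOD_MAX_MS))

# A very long requested period is clamped down to the maximum period,
# which therefore sets the minimum scan rate.
slowest = scan_rate_mb_per_s(
    SCAN_SIZE_MB, clamp_scan_period_ms(10**9, PERIOD_MIN_MS, PERIOD_MAX_MS))

print(f"max rate: {fastest:.1f} MB/s, min rate: {slowest:.2f} MB/s")
```

Under this model, lowering the minimum period or raising the scan size both raise the ceiling on scanning overhead, which matches the trade-off the text describes.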
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 79a797eb3e87..1fbd4eb7b64a 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -119,8 +119,11 @@ other appears as 0 when read.
 
 dirty_background_ratio
 
-Contains, as a percentage of total system memory, the number of pages at which
-the background kernel flusher threads will start writing out dirty data.
+Contains, as a percentage of total available memory that contains free pages
+and reclaimable pages, the number of pages at which the background kernel
+flusher threads will start writing out dirty data.
+
+The total avaiable memory is not equal to total system memory.
 
 ==============================================================
 
@@ -151,9 +154,11 @@ interval will be written out next time a flusher thread wakes up.
151 154
152dirty_ratio 155dirty_ratio
153 156
154Contains, as a percentage of total system memory, the number of pages at which 157Contains, as a percentage of total available memory that contains free pages
155a process which is generating disk writes will itself start writing out dirty 158and reclaimable pages, the number of pages at which a process which is
156data. 159generating disk writes will itself start writing out dirty data.
160
161The total avaiable memory is not equal to total system memory.
157 162
158============================================================== 163==============================================================
159 164