Diffstat (limited to 'Documentation')
-rw-r--r--  Documentation/sysctl/kernel.txt  76
-rw-r--r--  Documentation/trace/ftrace.txt    6
2 files changed, 81 insertions, 1 deletions
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 9d4c1d18ad44..4273b2d71a27 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -355,6 +355,82 @@ utilize.
 
 ==============================================================
 
+numa_balancing
+
+Enables/disables automatic page fault based NUMA memory
+balancing. Memory is moved automatically to nodes
+that access it often.
+
+Enables/disables automatic NUMA memory balancing. On NUMA machines, there
+is a performance penalty if remote memory is accessed by a CPU. When this
+feature is enabled the kernel samples which task thread is accessing memory
+by periodically unmapping pages and later trapping a page fault. At the
+time of the page fault, it is determined if the data being accessed should
+be migrated to a local memory node.
+
+The unmapping of pages and trapping of faults incur additional overhead that
+ideally is offset by improved memory locality, but there is no universal
+guarantee. If the target workload is already bound to NUMA nodes then this
+feature should be disabled. Otherwise, if the system overhead from the
+feature is too high then the rate at which the kernel samples for NUMA hinting
+faults may be controlled by the numa_balancing_scan_period_min_ms,
+numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms,
+numa_balancing_scan_size_mb, numa_balancing_settle_count and
+numa_balancing_migrate_deferred sysctls.
+
+==============================================================
+
+numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms,
+numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb
+
+Automatic NUMA balancing scans a task's address space and unmaps pages to
+detect if pages are properly placed or if the data should be migrated to a
+memory node local to where the task is running. Every "scan delay" the task
+scans the next "scan size" number of pages in its address space. When the
+end of the address space is reached the scanner restarts from the beginning.
+
+In combination, the "scan delay" and "scan size" determine the scan rate.
+When "scan delay" decreases, the scan rate increases. The scan delay, and
+hence the scan rate, of every task is adaptive and depends on historical
+behaviour. If pages are properly placed then the scan delay increases,
+otherwise the scan delay decreases. The "scan size" is not adaptive, but
+the higher the "scan size", the higher the scan rate.
+
+Higher scan rates incur higher system overhead as page faults must be
+trapped and potentially data must be migrated. However, the higher the scan
+rate, the more quickly a task's memory is migrated to a local node if the
+workload pattern changes, minimising the performance impact of remote
+memory accesses. These sysctls control the thresholds for scan delays and
+the number of pages scanned.
+
+numa_balancing_scan_period_min_ms is the minimum time in milliseconds to
+scan a task's virtual memory. It effectively controls the maximum scanning
+rate for each task.
+
+numa_balancing_scan_delay_ms is the starting "scan delay" used for a task
+when it initially forks.
+
+numa_balancing_scan_period_max_ms is the maximum time in milliseconds to
+scan a task's virtual memory. It effectively controls the minimum scanning
+rate for each task.
+
+numa_balancing_scan_size_mb is how many megabytes' worth of pages are
+scanned for a given scan.
+
+numa_balancing_settle_count is how many scan periods must complete before
+the scheduler balancer stops pushing the task towards a preferred node. This
+gives the scheduler a chance to place the task on an alternative node if the
+preferred node is overloaded.
+
+numa_balancing_migrate_deferred is how many page migrations get skipped
+unconditionally after a page migration is skipped because a page is shared
+with other tasks. This reduces page migration overhead and determines
+how much stronger the scheduler's "move task near its memory" policy becomes,
+versus the "move memory near its task" memory management policy, for workloads
+with shared memory.
+
+==============================================================
+
 osrelease, ostype & version:
 
 # cat osrelease
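The scan-rate relationships described in the kernel.txt hunk can be made concrete with a little arithmetic. The Python sketch below is a simplified model, not the kernel's implementation: it assumes exactly one "scan size" window is covered per "scan delay", and all function and class names are hypothetical. In particular, next_scan_delay uses an invented doubling/halving rule, clamped to caller-supplied bounds that only loosely stand in for numa_balancing_scan_period_min_ms and numa_balancing_scan_period_max_ms. MigrationDeferral restates the documented numa_balancing_migrate_deferred behaviour as a simple counter.

```python
import math


def scan_rate_mb_per_s(scan_size_mb: int, scan_delay_ms: int) -> float:
    """MB scanned per second if one scan_size_mb window is scanned
    every scan_delay_ms (simplified model, not kernel code)."""
    return scan_size_mb * 1000 / scan_delay_ms


def full_pass_time_ms(address_space_mb: int, scan_size_mb: int, scan_delay_ms: int) -> int:
    """Time for the scanner to walk the whole address space once before
    it restarts from the beginning (simplified model)."""
    windows = math.ceil(address_space_mb / scan_size_mb)
    return windows * scan_delay_ms


def next_scan_delay(delay_ms: int, well_placed: bool, min_ms: int, max_ms: int) -> int:
    """HYPOTHETICAL adaptive rule for illustration only: back off (double
    the delay) when pages are already well placed, speed up (halve it)
    otherwise, then clamp to [min_ms, max_ms]."""
    proposed = delay_ms * 2 if well_placed else delay_ms // 2
    return max(min_ms, min(max_ms, proposed))


class MigrationDeferral:
    """Counter model of numa_balancing_migrate_deferred: once a migration
    is skipped because the page is shared, the next `deferred`
    migrations are skipped unconditionally."""

    def __init__(self, deferred: int) -> None:
        self.deferred = deferred
        self.remaining = 0

    def should_migrate(self, page_is_shared: bool) -> bool:
        if self.remaining > 0:
            # Inside a deferral window: skip without looking at the page.
            self.remaining -= 1
            return False
        if page_is_shared:
            # Shared page: skip this migration and open a new window.
            self.remaining = self.deferred
            return False
        return True


if __name__ == "__main__":
    print(scan_rate_mb_per_s(256, 1000))              # 256.0
    print(full_pass_time_ms(1024, 256, 1000))         # 4000
    print(next_scan_delay(1000, False, 1000, 60000))  # 1000 (clamped at the minimum)
```

Halving the delay doubles the rate, and a larger window shortens the full pass; that is the trade-off the sysctls expose, with the lower bound on the delay acting as the cap on scan rate.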
diff --git a/Documentation/trace/ftrace.txt b/Documentation/trace/ftrace.txt
index ea2d35d64d26..bd365988e8d8 100644
--- a/Documentation/trace/ftrace.txt
+++ b/Documentation/trace/ftrace.txt
@@ -655,7 +655,11 @@ explains which is which.
                  read the irq flags variable, an 'X' will always
                  be printed here.
 
-  need-resched: 'N' task need_resched is set, '.' otherwise.
+  need-resched:
+        'N' both TIF_NEED_RESCHED and PREEMPT_NEED_RESCHED are set,
+        'n' only TIF_NEED_RESCHED is set,
+        'p' only PREEMPT_NEED_RESCHED is set,
+        '.' otherwise.
 
   hardirq/softirq:
         'H' - hard irq occurred inside a softirq.
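The new need-resched column in the ftrace.txt hunk encodes two independent flags in one character. The minimal Python sketch below restates that four-way mapping in both directions; the function names are hypothetical, and it only mirrors the documented table rather than reproducing the tracer's own code.

```python
from __future__ import annotations


def need_resched_char(tif_need_resched: bool, preempt_need_resched: bool) -> str:
    """Character shown in the need-resched column for a flag pair."""
    if tif_need_resched and preempt_need_resched:
        return "N"
    if tif_need_resched:
        return "n"
    if preempt_need_resched:
        return "p"
    return "."


def decode_need_resched(ch: str) -> tuple[bool, bool]:
    """Inverse mapping: (TIF_NEED_RESCHED, PREEMPT_NEED_RESCHED) for a column character."""
    table = {
        "N": (True, True),
        "n": (True, False),
        "p": (False, True),
        ".": (False, False),
    }
    if ch not in table:
        raise ValueError(f"unexpected need-resched character: {ch!r}")
    return table[ch]


if __name__ == "__main__":
    for ch in "Nnp.":
        print(ch, decode_need_resched(ch))
```

Round-tripping every flag combination through both functions returns the original pair, which shows that the four characters are unambiguous.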