author Mel Gorman <mgorman@suse.de> 2013-10-07 06:28:55 -0400
committer Ingo Molnar <mingo@kernel.org> 2013-10-09 06:40:20 -0400
commit 598f0ec0bc996e90a806ee9564af919ea5aad401
tree 9df97675a01340285b792be1909a41a02dbe905f /Documentation
parent 7e8d16b6cbccb2f5da579f5085479fb82ba851b8
sched/numa: Set the scan rate proportional to the memory usage of the task being scanned
The NUMA PTE scan rate is controlled with a combination of the
numa_balancing_scan_period_min, numa_balancing_scan_period_max and
numa_balancing_scan_size. This scan rate is independent of the size of the
task and, as an aside, it is further complicated by the fact that
numa_balancing_scan_size controls how many pages are marked pte_numa and not
how much virtual memory is scanned.

In combination, it is almost impossible to meaningfully tune the min and max
scan periods, and reasoning about performance is complex when the time to
complete a full scan is partially a function of the task's memory size. This
patch alters the semantic of the min and max tunables to be about tuning the
length of time it takes to complete a scan of a task's occupied virtual
address space. Conceptually this is a lot easier to understand. There is a
"sanity" check to ensure the scan rate is never extremely fast based on the
amount of virtual memory that should be scanned in a second. The default of
2.5G seems arbitrary but it is to have the maximum scan rate after the patch
roughly match the maximum scan rate before the patch was applied.

On a similar note, numa_scan_period is in milliseconds and not jiffies.
Properly placed pages slow the scanning rate, but adding 10 jiffies to
numa_scan_period means that the rate at which scanning slows depends on HZ,
which is confusing. Get rid of the jiffies_to_msec conversion and treat it
as ms.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-18-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
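The arithmetic described in the commit message can be illustrated with a small userspace sketch. This is not the kernel's code: the function names, the 2560 MB constant (standing in for the "2.5G" cap mentioned above), and the rounding choices are all assumptions made for illustration. It shows how a period meant to cover the whole address space turns into a per-window delay, with a floor that limits how much memory is scanned per second.

```python
# Hypothetical sketch of the scan-period arithmetic; names and rounding
# are illustrative assumptions, not taken from the kernel source.
MAX_SCAN_WINDOW_MB = 2560  # assumed form of the "2.5G per second" sanity cap


def nr_scan_windows(task_mb: int, scan_size_mb: int) -> int:
    """Number of scan_size-sized windows needed to cover the task's memory."""
    if task_mb <= 0:
        return 1  # treat an empty task as needing a single window
    return -(-task_mb // scan_size_mb)  # ceiling division


def scan_delay_ms(task_mb: int, period_ms: int, scan_size_mb: int) -> int:
    """Delay between windows so that a full pass takes about period_ms."""
    # Sanity floor: never cover more than MAX_SCAN_WINDOW_MB in one second.
    windows_per_sec = max(1, MAX_SCAN_WINDOW_MB // scan_size_mb)
    floor_ms = 1000 // windows_per_sec
    per_window_ms = period_ms // nr_scan_windows(task_mb, scan_size_mb)
    return max(floor_ms, per_window_ms)
```

With illustrative values of a 256 MB scan size and a 1000 ms period, a 1024 MB task needs four windows and gets a 250 ms delay between them, while a 10240 MB task would need 25 ms per window and is instead clamped to the 100 ms floor.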
Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/sysctl/kernel.txt | 11
1 files changed, 6 insertions, 5 deletions
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 1428c6659254..8cd7e5fc79da 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -403,15 +403,16 @@ workload pattern changes and minimises performance impact due to remote
 memory accesses. These sysctls control the thresholds for scan delays and
 the number of pages scanned.
 
-numa_balancing_scan_period_min_ms is the minimum delay in milliseconds
-between scans. It effectively controls the maximum scanning rate for
-each task.
+numa_balancing_scan_period_min_ms is the minimum time in milliseconds to
+scan a tasks virtual memory. It effectively controls the maximum scanning
+rate for each task.
 
 numa_balancing_scan_delay_ms is the starting "scan delay" used for a task
 when it initially forks.
 
-numa_balancing_scan_period_max_ms is the maximum delay between scans. It
-effectively controls the minimum scanning rate for each task.
+numa_balancing_scan_period_max_ms is the maximum time in milliseconds to
+scan a tasks virtual memory. It effectively controls the minimum scanning
+rate for each task.
 
 numa_balancing_scan_size_mb is how many megabytes worth of pages are
 scanned for a given scan.
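
To inspect the four tunables documented above on a running system, a minimal sketch along these lines may help. The default directory /proc/sys/kernel is an assumption based on the file being documented (Documentation/sysctl/kernel.txt); the location can differ between kernel versions, and the helper name is hypothetical. Entries that are missing or unreadable are skipped rather than treated as errors.

```python
from pathlib import Path

# Sysctl names as they appear in the documentation text above.
NUMA_SCAN_SYSCTLS = (
    "numa_balancing_scan_period_min_ms",
    "numa_balancing_scan_delay_ms",
    "numa_balancing_scan_period_max_ms",
    "numa_balancing_scan_size_mb",
)


def read_numa_scan_sysctls(base="/proc/sys/kernel"):
    """Return {name: int value} for each tunable found under base.

    base is an assumed location; pass another directory if the kernel
    in use exposes these files elsewhere.
    """
    values = {}
    for name in NUMA_SCAN_SYSCTLS:
        try:
            values[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            continue  # file absent, unreadable, or not an integer
    return values
```

Calling read_numa_scan_sysctls() with no arguments returns an empty dictionary on systems where the files are not present under the assumed directory.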