author | Rik van Riel <riel@redhat.com> | 2013-10-07 06:29:39 -0400 |
---|---|---|
committer | Ingo Molnar <mingo@kernel.org> | 2013-10-09 08:48:21 -0400 |
commit | de1c9ce6f07fec0381a39a9d0b379ea35aa1167f (patch) | |
tree | d96bf1a2b25dfa84d3fe5f6fe00fb780800e3ef3 /Documentation/sysctl | |
parent | 1e3646ffc64b232cb14a5ef01d7b98997c1b73f9 (diff) |
sched/numa: Skip some page migrations after a shared fault

Shared faults can lead to lots of unnecessary page migrations,
slowing down the system, and causing private faults to hit the
per-pgdat migration ratelimit.

This patch adds sysctl numa_balancing_migrate_deferred, which specifies
how many shared page migrations to skip unconditionally, after each page
migration that is skipped because it is a shared fault.

This reduces the number of page migrations back and forth in
shared fault situations. It also gives a strong preference to
the tasks that are already running where most of the memory is,
and to moving the other tasks to near the memory.

Testing this with a much higher scan rate than the default
still seems to result in fewer page migrations than before.
Memory seems to be somewhat better consolidated than previously,
with multi-instance specjbb runs on a 4 node system.
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-62-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Diffstat (limited to 'Documentation/sysctl')
-rw-r--r-- | Documentation/sysctl/kernel.txt | 10 |
1 files changed, 9 insertions, 1 deletions
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 84f17800f8b5..4273b2d71a27 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -375,7 +375,8 @@ feature should be disabled. Otherwise, if the system overhead from the
 feature is too high then the rate the kernel samples for NUMA hinting
 faults may be controlled by the numa_balancing_scan_period_min_ms,
 numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms,
-numa_balancing_scan_size_mb and numa_balancing_settle_count sysctls.
+numa_balancing_scan_size_mb, numa_balancing_settle_count sysctls and
+numa_balancing_migrate_deferred.
 
 ==============================================================
 
@@ -421,6 +422,13 @@ the schedule balancer stops pushing the task towards a preferred node. This
 gives the scheduler a chance to place the task on an alternative node if the
 preferred node is overloaded.
 
+numa_balancing_migrate_deferred is how many page migrations get skipped
+unconditionally, after a page migration is skipped because a page is shared
+with other tasks. This reduces page migration overhead, and determines
+how much stronger the "move task near its memory" policy scheduler becomes,
+versus the "move memory near its task" memory management policy, for workloads
+with shared memory.
+
 ==============================================================
 
 osrelease, ostype & version:
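
The deferral behaviour described in the commit message and in the new kernel.txt paragraph can be pictured with a small userspace model. This is only a sketch, not the kernel's code: the Task class and the should_migrate and count_migrations helpers are hypothetical names invented for illustration, and the decision to reject a migration because the page is shared is passed in as an input rather than modelled. Only shared-fault migrations are considered.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical stand-in for per-task state; not the kernel's task_struct."""
    deferred: int = 0  # shared-fault migrations still to be skipped


def should_migrate(task, shared_fault_rejected, migrate_deferred):
    """Return True if this shared-fault page migration should go ahead.

    shared_fault_rejected: True when the normal policy skips the migration
        because the page is shared (an input here, not modelled).
    migrate_deferred: value of the numa_balancing_migrate_deferred sysctl.
    """
    if task.deferred > 0:
        # Still inside the deferral window: skip unconditionally.
        task.deferred -= 1
        return False
    if shared_fault_rejected:
        # Skipped on its own merits; arm the window for the faults that follow.
        task.deferred = migrate_deferred
        return False
    return True


def count_migrations(rejections, migrate_deferred):
    """Count how many migrations proceed for a sequence of shared faults."""
    task = Task()
    return sum(should_migrate(task, r, migrate_deferred) for r in rejections)
```

In this model, a sequence of five faults in which only the first is rejected as shared yields four migrations with a deferral value of 0, and two with a deferral value of 2, since the two faults after the rejected one are skipped unconditionally.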