diff options
author | Mel Gorman <mgorman@suse.de> | 2015-03-25 18:55:42 -0400 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2015-03-25 19:20:31 -0400 |
commit | 074c238177a75f5e79af3b2cb6a84e54823ef950 (patch) | |
tree | 59fc27cb666e4691083df324e3016749523566c9 /kernel | |
parent | b191f9b106ea1a24a711dbebb2925d3313da5852 (diff) |
mm: numa: slow PTE scan rate if migration failures occur
Dave Chinner reported the following on https://lkml.org/lkml/2015/3/1/226
Across the board the 4.0-rc1 numbers are much slower, and the degradation
is far worse when using the large memory footprint configs. Perf points
straight at the cause - this is from 4.0-rc1 on the "-o bhash=101073" config:
- 56.07% 56.07% [kernel] [k] default_send_IPI_mask_sequence_phys
- default_send_IPI_mask_sequence_phys
- 99.99% physflat_send_IPI_mask
- 99.37% native_send_call_func_ipi
smp_call_function_many
- native_flush_tlb_others
- 99.85% flush_tlb_page
ptep_clear_flush
try_to_unmap_one
rmap_walk
try_to_unmap
migrate_pages
migrate_misplaced_page
- handle_mm_fault
- 99.73% __do_page_fault
trace_do_page_fault
do_async_page_fault
+ async_page_fault
0.63% native_send_call_func_single_ipi
generic_exec_single
smp_call_function_single
This is showing excessive migration activity even though excessive
migrations are meant to get throttled. Normally, the scan rate is tuned
on a per-task basis depending on the locality of faults. However, if
migrations fail for any reason then the PTE scanner may scan faster if
the faults continue to be remote. This means there is higher system CPU
overhead and fault trapping at exactly the time we know that migrations
cannot happen. This patch tracks when migration failures occur and
slows the PTE scanner.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Dave Chinner <david@fromorbit.com>
Tested-by: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Diffstat (limited to 'kernel')
-rw-r--r-- | kernel/sched/fair.c | 8 |
1 files changed, 6 insertions, 2 deletions
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 7ce18f3c097a..bcfe32088b37 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c | |||
@@ -1609,9 +1609,11 @@ static void update_task_scan_period(struct task_struct *p, | |||
1609 | /* | 1609 | /* |
1610 | * If there were no record hinting faults then either the task is | 1610 | * If there were no record hinting faults then either the task is |
1611 | * completely idle or all activity is areas that are not of interest | 1611 | * completely idle or all activity is areas that are not of interest |
1612 | * to automatic numa balancing. Scan slower | 1612 | * to automatic numa balancing. Related to that, if there were failed |
1613 | * migration then it implies we are migrating too quickly or the local | ||
1614 | * node is overloaded. In either case, scan slower | ||
1613 | */ | 1615 | */ |
1614 | if (local + shared == 0) { | 1616 | if (local + shared == 0 || p->numa_faults_locality[2]) { |
1615 | p->numa_scan_period = min(p->numa_scan_period_max, | 1617 | p->numa_scan_period = min(p->numa_scan_period_max, |
1616 | p->numa_scan_period << 1); | 1618 | p->numa_scan_period << 1); |
1617 | 1619 | ||
@@ -2080,6 +2082,8 @@ void task_numa_fault(int last_cpupid, int mem_node, int pages, int flags) | |||
2080 | 2082 | ||
2081 | if (migrated) | 2083 | if (migrated) |
2082 | p->numa_pages_migrated += pages; | 2084 | p->numa_pages_migrated += pages; |
2085 | if (flags & TNF_MIGRATE_FAIL) | ||
2086 | p->numa_faults_locality[2] += pages; | ||
2083 | 2087 | ||
2084 | p->numa_faults[task_faults_idx(NUMA_MEMBUF, mem_node, priv)] += pages; | 2088 | p->numa_faults[task_faults_idx(NUMA_MEMBUF, mem_node, priv)] += pages; |
2085 | p->numa_faults[task_faults_idx(NUMA_CPUBUF, cpu_node, priv)] += pages; | 2089 | p->numa_faults[task_faults_idx(NUMA_CPUBUF, cpu_node, priv)] += pages; |