path: root/arch/x86/include/asm/mmu_context.h
author     Dave Hansen <dave.hansen@linux.intel.com>	2015-01-08 17:30:21 -0500
committer  Thomas Gleixner <tglx@linutronix.de>	2015-01-22 15:11:06 -0500
commit     c922228efeeefa32e57f875764bfa6ca8053a68a (patch)
tree       f41350c7039ba727711d4edc2bae50d8ab8212d7 /arch/x86/include/asm/mmu_context.h
parent     814564a0a1d2faee11ff9de43245d78cb79c85ac (diff)
x86, mpx: Fix potential performance issue on unmaps
The 3.19 merge window saw some TLB modifications merged which caused a performance regression. They were fixed in commit 045bbb9fa.

Once that fix was applied, I also noticed that there was a small but intermittent regression still present. It was not present consistently enough to bisect reliably, but I'm fairly confident that it came from (my own) MPX patches. The source was reading a relatively unused field in the mm_struct via arch_unmap.

I also noted that this code was in the main instruction flow of do_munmap() and probably had more icache impact than we want.

This patch does two things:

1. Adds a static (via Kconfig) and dynamic (via cpuid) check for MPX with cpu_feature_enabled(). This keeps us from reading that cacheline in the mm and trades it for a check of the global CPUID variables at least on CPUs without MPX.

2. Adds an unlikely() to ensure that the MPX call ends up out of the main instruction flow in do_munmap(). I've added a detailed comment about why this was done and why we want it even on systems where MPX is present.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: luto@amacapital.net
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20150108223021.AEEAB987@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
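Point 1 above folds a build-time and a run-time test into a single check. Below is a minimal standalone C sketch of that pattern; the names MPX_CONFIGURED, boot_cpu_has_mpx and mpx_feature_enabled are illustrative stand-ins, not the kernel's actual cpu_feature_enabled() internals.

	/* sketch.c - illustrative only; the real logic lives in
	 * cpu_feature_enabled() in arch/x86/include/asm/cpufeature.h. */
	#include <stdbool.h>
	#include <stdio.h>

	/* Static half: stands in for the Kconfig switch. When 0, the
	 * compiler can discard the feature path entirely. */
	#define MPX_CONFIGURED 1

	/* Dynamic half: stands in for a flag filled in once from CPUID
	 * at boot, so the per-unmap check reads a global variable
	 * instead of a field in the mm_struct. */
	static bool boot_cpu_has_mpx;

	static inline bool mpx_feature_enabled(void)
	{
		return MPX_CONFIGURED && boot_cpu_has_mpx;
	}

	int main(void)
	{
		boot_cpu_has_mpx = false;	/* pretend CPUID said "no MPX" */
		printf("mpx enabled: %d\n", mpx_feature_enabled());
		return 0;
	}

Because MPX_CONFIGURED is a compile-time constant, a build without MPX support gets the whole branch optimized away; on MPX-capable builds the test reads a boot-time global rather than the per-mm cacheline the commit message is trying to avoid.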
Diffstat (limited to 'arch/x86/include/asm/mmu_context.h')
-rw-r--r--  arch/x86/include/asm/mmu_context.h | 20
1 file changed, 19 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/mmu_context.h b/arch/x86/include/asm/mmu_context.h
index 40269a2bf6f9..4b75d591eb5e 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -130,7 +130,25 @@ static inline void arch_bprm_mm_init(struct mm_struct *mm,
 static inline void arch_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
 			      unsigned long start, unsigned long end)
 {
-	mpx_notify_unmap(mm, vma, start, end);
+	/*
+	 * mpx_notify_unmap() goes and reads a rarely-hot
+	 * cacheline in the mm_struct.  That can be expensive
+	 * enough to be seen in profiles.
+	 *
+	 * The mpx_notify_unmap() call and its contents have been
+	 * observed to affect munmap() performance on hardware
+	 * where MPX is not present.
+	 *
+	 * The unlikely() optimizes for the fast case: no MPX
+	 * in the CPU, or no MPX use in the process.  Even if
+	 * we get this wrong (in the unlikely event that MPX
+	 * is widely enabled on some system) the overhead of
+	 * MPX itself (reading bounds tables) is expected to
+	 * overwhelm the overhead of getting this unlikely()
+	 * consistently wrong.
+	 */
+	if (unlikely(cpu_feature_enabled(X86_FEATURE_MPX)))
+		mpx_notify_unmap(mm, vma, start, end);
 }
 
 #endif /* _ASM_X86_MMU_CONTEXT_H */
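For reference, the unlikely() annotation in the new hunk is built on GCC's __builtin_expect; stripped of its branch-profiling variants, the kernel's definition in include/linux/compiler.h is essentially:

	/* Hint that x is usually true (likely) or usually false
	 * (unlikely), so the compiler can place the guarded block
	 * off the straight-line instruction path. */
	#define likely(x)	__builtin_expect(!!(x), 1)
	#define unlikely(x)	__builtin_expect(!!(x), 0)

The !!(x) normalizes any non-zero value to 1 so it compares cleanly against the expected value, and the 0 expectation lets the compiler emit the MPX call as an out-of-line branch, which is exactly the icache effect on do_munmap() that the commit message describes.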