author Shaohua Li <shaohua.li@intel.com> 2010-08-15 21:16:55 -0400
committer H. Peter Anvin <hpa@zytor.com> 2010-08-23 13:04:57 -0400
commit 61c77326d1df079f202fa79403c3ccd8c5966a81
tree 57780e6b94f24f402d1c9036d6e7cf37a359c22f /arch/x86
parent 76be97c1fc945db08aae1f1b746012662d643e97
x86, mm: Avoid unnecessary TLB flush
On x86, the CPU sets the accessed and dirty bits automatically when it accesses memory. By the time we reach the code path of flush_tlb_fix_spurious_fault() below, the dirty bit has already been set in the pte, so there is no need to flush the TLB. The TLB entry on some CPUs might not have the dirty bit set, but this doesn't matter: when those CPUs write to the page, they check the bit automatically, with no software involved.

On the other hand, flushing the TLB at this point is harmful. The test creates one thread per CPU; each thread writes to the same but random address in the same vma range, and we measure the total time. On a 4-socket system, the original time is 1.96s, while with the patch it is 0.8s. On a 2-socket system, there is a 20% time cut too. perf shows that a lot of time is spent sending and handling IPIs for TLB flushes.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
LKML-Reference: <20100816011655.GA362@sli10-desk.sh.intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Andrea Archangeli <aarcange@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
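
The test program itself is not part of this commit. As a rough illustration only, the user-space sketch below shows one possible reading of the description: every thread walks the same pseudo-random sequence of pages in one shared anonymous mapping, so threads race to write the same addresses. The names `run_write_test`, `worker`, `REGION_SIZE` and the fixed seed are hypothetical and do not come from the original test.

```c
#define _GNU_SOURCE             /* for MAP_ANONYMOUS and rand_r() */
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Assumed size of the shared mapping (a single vma); not from the commit. */
#define REGION_SIZE (64UL * 1024 * 1024)

struct worker_arg {
	volatile char *region;  /* base of the shared anonymous mapping */
	size_t page_size;
	size_t npages;
	size_t iterations;
	unsigned int seed;      /* same seed in every thread: same addresses */
};

static void *worker(void *p)
{
	struct worker_arg *a = p;

	for (size_t i = 0; i < a->iterations; i++) {
		/* Pick a pseudo-random page and dirty its first byte. */
		size_t page = (size_t)rand_r(&a->seed) % a->npages;

		a->region[page * a->page_size] = 1;
	}
	return NULL;
}

/* Returns the elapsed wall-clock time in seconds, or -1.0 on failure. */
double run_write_test(int nthreads, size_t iterations)
{
	assert(nthreads > 0);

	size_t page_size = (size_t)sysconf(_SC_PAGESIZE);
	char *region = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
			    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (region == MAP_FAILED)
		return -1.0;

	pthread_t *tids = calloc((size_t)nthreads, sizeof(*tids));
	struct worker_arg *args = calloc((size_t)nthreads, sizeof(*args));
	if (!tids || !args) {
		free(tids);
		free(args);
		munmap(region, REGION_SIZE);
		return -1.0;
	}

	struct timespec t0, t1;
	int started = 0;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < nthreads; i++) {
		args[i] = (struct worker_arg){
			.region = region,
			.page_size = page_size,
			.npages = REGION_SIZE / page_size,
			.iterations = iterations,
			.seed = 12345u,
		};
		if (pthread_create(&tids[i], NULL, worker, &args[i]) != 0)
			break;
		started++;
	}
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	free(tids);
	free(args);
	munmap(region, REGION_SIZE);

	if (started != nthreads)
		return -1.0;
	return (double)(t1.tv_sec - t0.tv_sec) +
	       (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

To approximate the setup described above, one might call `run_write_test()` with the value of `sysconf(_SC_NPROCESSORS_ONLN)` as the thread count and compare the returned time on kernels with and without the patch. The absolute numbers will depend on the machine and on the assumed mapping size and iteration count.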
Diffstat (limited to 'arch/x86')
-rw-r--r-- arch/x86/include/asm/pgtable.h 2
1 files changed, 2 insertions, 0 deletions
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index a34c785c5a63..2d0a33bd2971 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -603,6 +603,8 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm,
 	pte_update(mm, addr, ptep);
 }
 
+#define flush_tlb_fix_spurious_fault(vma, address)
+
 /*
  * clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
  *