author    Vineet Gupta <vgupta@synopsys.com>  2013-04-05 09:08:31 -0400
committer Vineet Gupta <vgupta@synopsys.com>  2013-05-07 04:14:00 -0400
commit    8d56bec2f2945b7e500b413d1bdc24e7dca12877 (patch)
tree      6a777ae12a5661959b055b3f47b77badc59a2a31 /arch/arc
parent    a92a5d0dce5b02fa34792e313b5fe3d7d317b17b (diff)
ARC: [mm] optimize needless full mm TLB flush on munmap
munmap ends up calling tlb_flush(), which for ARC was flushing the entire
TLB unconditionally (by moving the MMU to a new ASID):
do_munmap
   unmap_region
      unmap_vmas
         unmap_single_vma
            unmap_page_range
               tlb_start_vma
               zap_pud_range
               tlb_end_vma()
      tlb_finish_mmu
         tlb_flush()    ---> unconditional flush_tlb_mm()
So even a single-page munmap, a frequent operation while the uClibc dynamic
linker (ldso) is loading dependent shared libraries, would move the ASID
multiple times - needlessly invalidating the pre-faulted TLB entries (and
increasing the rate of ASID wraparound + full TLB flush).
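For context, "moving the MMU to a new ASID" amounts to something like the
sketch below - a simplified paraphrase of ARC's get_new_mmu_context(), not
the verbatim source (the real function also tracks which mm owns each ASID
and runs with IRQs disabled):

static inline void get_new_mmu_context(struct mm_struct *mm)
{
	/*
	 * ASIDs come from a global allocator; retiring the current one
	 * implicitly invalidates every TLB entry tagged with it.
	 */
	if (asid_cache == MAX_ASID) {
		/* wraparound: all ASIDs recycled, so everything must go */
		asid_cache = FIRST_ASID;
		flush_tlb_all();
	} else {
		asid_cache++;
	}

	/* tag the mm with the fresh ASID and program it into the MMU */
	mm->context.asid = asid_cache;
	write_aux_reg(ARC_REG_PID, asid_cache | MMU_ENABLE);
}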
This is now optimised so that tlb_flush() does a full flush only for the
tlb->fullmm cases (i.e. exit/execve). And for those cases, flush_tlb_mm()
is already optimised to be a no-op for mm->mm_users == 0.
So essentially there are no more full mm flushes - except for fork, which
anyhow needs one for properly COW'ing the parent address space.
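That no-op check looks roughly like this (paraphrased and trimmed from
arch/arc/mm/tlb.c of this era, not the verbatim source):

noinline void local_flush_tlb_mm(struct mm_struct *mm)
{
	/*
	 * By exit/execve teardown time the last user of the mm is gone,
	 * so nothing can fault on these mappings again - skip the ASID
	 * move entirely.
	 */
	if (atomic_read(&mm->mm_users) == 0)
		return;

	/* otherwise retire the ASID; stale entries die with it */
	get_new_mmu_context(mm);
}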
munmap now needs to do a TLB range flush instead, which is implemented via
tlb_end_vma().
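Under the hood, the range flush erases TLB entries page by page within the
mm's current ASID, falling back to a full mm flush for large ranges. Again
a paraphrased sketch of local_flush_tlb_range(); the 32-page threshold and
helper names (tlb_entry_erase, NO_ASID) are recalled from the code of this
era and may differ in detail:

void local_flush_tlb_range(struct vm_area_struct *vma,
			   unsigned long start, unsigned long end)
{
	unsigned long flags;

	/*
	 * Past a certain size, retiring the ASID wholesale is cheaper
	 * than probing and erasing entries one page at a time.
	 */
	if (unlikely((end - start) >= PAGE_SIZE * 32)) {
		local_flush_tlb_mm(vma->vm_mm);
		return;
	}

	start &= PAGE_MASK;

	local_irq_save(flags);

	if (vma->vm_mm->context.asid != NO_ASID) {
		while (start < end) {
			/* erase the entry for this vaddr under the mm's ASID */
			tlb_entry_erase(start | (vma->vm_mm->context.asid & 0xff));
			start += PAGE_SIZE;
		}
	}

	local_irq_restore(flags);
}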
Results
-------
1. ASID now consistently moves by 4 during a simple ls (as opposed to 5 or
   7 before).
2. LMBench microbenchmark also shows improvements
                         Basic system parameters
------------------------------------------------------------------------------
Host                 OS Description              Mhz  tlb  cache  mem   scal
                                                     pages line   par   load
                                                           bytes
--------- ------------- ----------------------- ---- ----- ----- ------ ----
3.9-rc5-0 Linux 3.9.0-r 3.9-rc5-0404-gcc-4.4-ba   80    8    64 1.1000    1
3.9-rc5-0 Linux 3.9.0-r 3.9-rc5-0405-avoid-full   80    8    64 1.1200    1

Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host                 OS  Mhz null null      open slct sig  sig  fork exec sh
                             call  I/O stat clos TCP  inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
3.9-rc5-0 Linux 3.9.0-r   80 4.81 8.69 68.6 118. 239. 8.53 31.6 4839 13.K 34.K
3.9-rc5-0 Linux 3.9.0-r   80 4.46 8.36 53.8 91.3 223. 8.12 24.2 4725 13.K 33.K

File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host                 OS   0K File      10K File     Mmap    Prot  Page    100fd
                        Create Delete Create Delete Latency Fault Fault   selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
3.9-rc5-0 Linux 3.9.0-r  314.7  223.2 1054.9  390.2  3615.0 1.590    20.1 126.6
3.9-rc5-0 Linux 3.9.0-r  265.8  183.8 1014.2  314.1  3193.0 6.910    18.8 110.4
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Diffstat (limited to 'arch/arc')
-rw-r--r--  arch/arc/include/asm/tlb.h | 16
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/arch/arc/include/asm/tlb.h b/arch/arc/include/asm/tlb.h
index 3eb2ce0bdfa3..fe91719866a5 100644
--- a/arch/arc/include/asm/tlb.h
+++ b/arch/arc/include/asm/tlb.h
@@ -21,20 +21,28 @@
 
 #ifndef __ASSEMBLY__
 
-#define tlb_flush(tlb)		local_flush_tlb_mm((tlb)->mm)
+#define tlb_flush(tlb)				\
+do {						\
+	if (tlb->fullmm)			\
+		flush_tlb_mm((tlb)->mm);	\
+} while (0)
 
 /*
  * This pair is called at time of munmap/exit to flush cache and TLB entries
  * for mappings being torn down.
  * 1) cache-flush part -implemented via tlb_start_vma( ) can be NOP (for now)
  *    as we don't support aliasing configs in our VIPT D$.
- * 2) tlb-flush part - implemted via tlb_end_vma( ) can be NOP as well-
- *    albiet for difft reasons - its better handled by moving to new ASID
+ * 2) tlb-flush part - implemted via tlb_end_vma( ) flushes the TLB range
  *
  * Note, read http://lkml.org/lkml/2004/1/15/6
  */
 #define tlb_start_vma(tlb, vma)
-#define tlb_end_vma(tlb, vma)
+
+#define tlb_end_vma(tlb, vma)						\
+do {									\
+	if (!tlb->fullmm)						\
+		flush_tlb_range(vma, vma->vm_start, vma->vm_end);	\
+} while (0)
 
 #define __tlb_remove_tlb_entry(tlb, ptep, address)
 