author	Nick Piggin <npiggin@suse.de>	2007-10-12 21:07:38 -0400
committer	Linus Torvalds <torvalds@woody.linux-foundation.org>	2007-10-12 21:41:21 -0400
commit	b6c7347fffa655a3000d9d41640d222c19fc3065 (patch)
tree	ef1789ab0656997f0491e051b92cf833948f2307
parent	4071c718555d955a35e9651f77086096ad87d498 (diff)
x86: optimise barriers
According to the latest memory ordering specification documents from Intel and AMD, both manufacturers are committed to in-order loads from cacheable memory for the x86 architecture. Hence, smp_rmb() may be a simple barrier.

Also according to those documents, and according to existing practice in Linux (e.g. spin_unlock doesn't enforce ordering), stores to cacheable memory are visible in program order too. Special string stores are safe -- their constituent stores may be out of order, but they must complete in order WRT surrounding stores. Nontemporal stores to WB memory can go out of order, and so they should be fenced explicitly to make them appear in-order WRT other stores. Hence, smp_wmb() may be a simple barrier.

http://developer.intel.com/products/processor/manuals/318147.pdf
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf

In userspace microbenchmarks on a Core 2 system, fence instructions range anywhere from around 15 cycles to 50, which may not be totally insignificant in performance-critical paths (code size will go down too); a rough userspace sketch of such a measurement follows the diff below. However, the primary motivation for this is to have the canonical barrier implementation for the x86 architecture.

smp_rmb() on buggy Pentium Pros remains a locked op, which is apparently required.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
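To make the pairing concrete, here is a minimal userspace sketch (not kernel code and not part of this patch) of the publish/consume pattern that smp_wmb()/smp_rmb() are meant to order. The macro definitions below are stand-ins that match what this patch makes the barriers on x86 -- plain compiler barriers -- and the pthread harness is purely illustrative.

/*
 * Producer/consumer sketch: store the data, then the flag; read the
 * flag, then the data.  smp_wmb() orders the two stores, smp_rmb()
 * the two loads.  With this patch both are only compiler barriers on
 * x86, because cacheable loads and stores already stay in program
 * order.  Build with: gcc -O2 -pthread
 */
#include <pthread.h>
#include <stdio.h>

#define barrier()	__asm__ __volatile__("" : : : "memory")
#define smp_wmb()	barrier()	/* x86: stores stay in order */
#define smp_rmb()	barrier()	/* x86: loads stay in order  */

static volatile int data;	/* volatile only so the compiler      */
static volatile int ready;	/* re-reads the flag in the spin loop */

static void *producer(void *arg)
{
	(void)arg;
	data = 42;	/* 1: publish the payload */
	smp_wmb();	/* order the two stores   */
	ready = 1;	/* 2: then raise the flag */
	return NULL;
}

static void *consumer(void *arg)
{
	(void)arg;
	while (!ready)	/* 1: wait for the flag */
		;
	smp_rmb();	/* order the two loads  */
	printf("data = %d\n", data);	/* 2: must print 42 */
	return NULL;
}

int main(void)
{
	pthread_t p, c;

	pthread_create(&c, NULL, consumer, NULL);
	pthread_create(&p, NULL, producer, NULL);
	pthread_join(p, NULL);
	pthread_join(c, NULL);
	return 0;
}

On configurations that keep CONFIG_X86_PPRO_FENCE or CONFIG_X86_OOSTORE, the real kernel macros retain a hardware fence instead, as the hunks below show.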
-rw-r--r--	include/asm-x86/system_32.h	6
-rw-r--r--	include/asm-x86/system_64.h	4
2 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/include/asm-x86/system_32.h b/include/asm-x86/system_32.h
index 8b15bd3057c9..e7e5d426fef5 100644
--- a/include/asm-x86/system_32.h
+++ b/include/asm-x86/system_32.h
@@ -274,7 +274,11 @@ static inline unsigned long get_limit(unsigned long segment)
 
 #ifdef CONFIG_SMP
 #define smp_mb()	mb()
-#define smp_rmb()	rmb()
+#ifdef CONFIG_X86_PPRO_FENCE
+# define smp_rmb()	rmb()
+#else
+# define smp_rmb()	barrier()
+#endif
 #ifdef CONFIG_X86_OOSTORE
 # define smp_wmb()	wmb()
 #else
diff --git a/include/asm-x86/system_64.h b/include/asm-x86/system_64.h
index eff730b11926..5022aecc333d 100644
--- a/include/asm-x86/system_64.h
+++ b/include/asm-x86/system_64.h
@@ -141,8 +141,8 @@ static inline void write_cr8(unsigned long val)
 
 #ifdef CONFIG_SMP
 #define smp_mb()	mb()
-#define smp_rmb()	rmb()
-#define smp_wmb()	wmb()
+#define smp_rmb()	barrier()
+#define smp_wmb()	barrier()
 #define smp_read_barrier_depends()	do {} while(0)
 #else
 #define smp_mb()	barrier()
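As for the 15-50 cycle figure quoted in the changelog, a rough userspace measurement can be made along the following lines. This is an illustrative sketch under stated assumptions (lfence/mfence as the hardware fences roughly behind rmb()/mb(), rdtsc for timing), not the benchmark those numbers came from, and the results vary widely by machine.

/*
 * Rough sketch of a fence-cost microbenchmark: time a loop containing
 * only a compiler barrier against loops containing lfence and mfence.
 * Build with: gcc -O2 -std=gnu99 fence_cost.c
 */
#include <stdint.h>
#include <stdio.h>

static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;

	__asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
	return ((uint64_t)hi << 32) | lo;
}

#define MEASURE(name, insn)						\
do {									\
	uint64_t start = rdtsc();					\
	int i;								\
									\
	for (i = 0; i < 1000000; i++)					\
		__asm__ __volatile__(insn ::: "memory");		\
	printf("%-10s %6.1f cycles/iter\n", name,			\
	       (double)(rdtsc() - start) / 1000000);			\
} while (0)

int main(void)
{
	MEASURE("barrier()", "");	/* compiler barrier only      */
	MEASURE("lfence", "lfence");	/* roughly what rmb() runs    */
	MEASURE("mfence", "mfence");	/* roughly what mb() runs     */
	return 0;
}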