x86: fix performance regression in write() syscall

While the introduction of __copy_from_user_nocache (see commit
0812a579c92fefa57506821fa08e90f47cb6dbdd) may have been an improvement
for sufficiently large writes, there is evidence that it is
detrimental for small writes. Unixbench's fstime test gives the
following results for 256-byte writes with MAX_BLOCK of 2000:

    2.6.29-rc6 (5 samples, each in KB/sec):
        283750, 295200, 294500, 293000, 293300

    2.6.29-rc6 + this patch (5 samples, each in KB/sec):
        313050, 3106750, 293350, 306300, 307900

    2.6.18:
        395700, 342000, 399100, 366050, 359850

See w_test() in src/fstime.c in unixbench version 4.1.0. Basically,
the above test consists of counting how much we can write in this
manner:

    alarm(10);
    while (!sigalarm) {
            for (f_blocks = 0; f_blocks < 2000; ++f_blocks) {
                    write(f, buf, 256);
            }
            lseek(f, 0L, 0);
    }

Note that there are other components to the write syscall regression
that are not addressed here.

Signed-off-by: Salman Qazi <sqazi@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
author: Salman Qazi <sqazi@google.com> 2009-02-23 21:03:04 -0500
committer: Ingo Molnar <mingo@elte.hu> 2009-02-24 11:16:36 -0500
commit: 30d697fa3a25fed809a873b17531a00282dc1234 (patch)
tree: bc17d39779914b4dcf7a7d7b9e08d8b020cbf0b6 /arch/x86/include
parent: cb425afd2183e90a481bb211ff49361a117a3ecc (diff)
1 files changed, 14 insertions, 2 deletions
diff --git a/arch/x86/include/asm/uaccess_64.h b/arch/x86/include/asm/uaccess_64.h
index 84210c479fca..987a2c10fe20 100644
--- a/arch/x86/include/asm/uaccess_64.h
+++ b/arch/x86/include/asm/uaccess_64.h
@@ -192,14 +192,26 @@ static inline int __copy_from_user_nocache(void *dst, const void __user *src,
                                           unsigned size)
 {
        might_sleep();
-        return __copy_user_nocache(dst, src, size, 1);
+        /*
+         * In practice this limit means that large file write()s
+         * which get chunked to 4K copies get handled via
+         * non-temporal stores here. Smaller writes get handled
+         * via regular __copy_from_user():
+         */
+        if (likely(size >= PAGE_SIZE))
+                return __copy_user_nocache(dst, src, size, 1);
+        else
+                return __copy_from_user(dst, src, size);
 }
 static inline int __copy_from_user_inatomic_nocache(void *dst,
                                                    const void __user *src,
                                                    unsigned size)
 {
-        return __copy_user_nocache(dst, src, size, 0);
+        if (likely(size >= PAGE_SIZE))
+                return __copy_user_nocache(dst, src, size, 0);
+        else
+                return __copy_from_user_inatomic(dst, src, size);
 }
 unsigned long
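
For readers who want to repeat the small-write measurement without building unixbench, the loop quoted in the commit message can be approximated in user space. The following is a minimal sketch, not the actual w_test() from src/fstime.c: the function name small_write_kb_per_sec(), its parameters, the error handling and the final unlink() are assumptions made for illustration. It counts bytes written in a fixed interval and reports KB/sec.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Set by the SIGALRM handler when the measurement interval has expired. */
static volatile sig_atomic_t sigalarm;

static void on_alarm(int signo)
{
    (void)signo;
    sigalarm = 1;
}

/*
 * Hypothetical helper (not part of unixbench): write `write_size`-byte
 * blocks to `path`, rewinding after every `max_block` writes, for roughly
 * `seconds` seconds. Returns throughput in KB/sec, or -1 on error.
 */
long small_write_kb_per_sec(const char *path, size_t write_size,
                            int max_block, unsigned seconds)
{
    struct sigaction sa, old_sa;
    long long total = 0;
    char *buf;
    int fd, failed = 0;

    assert(seconds > 0);    /* the result is divided by `seconds` */

    buf = calloc(1, write_size);
    if (buf == NULL)
        return -1;

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    /* Install the alarm handler, remembering the previous disposition. */
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    sigaction(SIGALRM, &sa, &old_sa);

    sigalarm = 0;
    alarm(seconds);
    while (!sigalarm && !failed) {
        /* As in the quoted loop, the flag is only checked between passes. */
        for (int b = 0; b < max_block; ++b) {
            ssize_t n = write(fd, buf, write_size);
            if (n < 0) {
                failed = 1;
                break;
            }
            total += n;     /* count bytes actually written */
        }
        lseek(fd, 0L, SEEK_SET);
    }

    alarm(0);
    sigaction(SIGALRM, &old_sa, NULL);
    close(fd);
    unlink(path);           /* remove the scratch file */
    free(buf);

    return failed ? -1 : (long)(total / 1024 / seconds);
}
```

Calling small_write_kb_per_sec("fstime.tmp", 256, 2000, 10) would correspond to the parameters quoted in the commit message (256-byte writes, MAX_BLOCK of 2000, a 10 second alarm). Absolute numbers depend on the machine and filesystem, so only comparisons between kernels on the same setup are meaningful.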