[PATCH] i386: add sleazy FPU optimization

i386 port of the sLeAZY-fpu feature. Chuck reports that this gives him a +/- 0.4% improvement on his simple benchmark.

x86_64 description follows:

Right now the kernel on x86-64 has a 100% lazy fpu behavior: after *every* context switch a trap is taken for the first FPU use to restore the FPU context lazily. This is of course great for applications that have very sporadic or no FPU use (since then you avoid doing the expensive save/restore all the time). However, for very frequent FPU users... you take an extra trap every context switch.

The patch below adds a simple heuristic to this code: After 5 consecutive context switches of FPU use, the lazy behavior is disabled and the context gets restored every context switch. If the app indeed uses the FPU, the trap is avoided. (The chance of the 6th time slice using FPU after the previous 5 having done so is quite high obviously.)

After 256 switches, this is reset and lazy behavior is returned (until there are 5 consecutive ones again). The reason for this is so that apps that do longer bursts of FPU use still get the lazy behavior back after some time.

Signed-off-by: Chuck Ebbert <76306.1226@compuserve.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Andi Kleen <ak@suse.de>
author: Chuck Ebbert <76306.1226@compuserve.com> 2006-12-06 20:14:01 -0500
committer: Andi Kleen <andi@basil.nowhere.org> 2006-12-06 20:14:01 -0500
commit: acc207616a91a413a50fdd8847a747c4a7324167
tree: 71f603615d7c9da8af47fd89346dce9a2e341456 /arch/i386/kernel/process.c
parent: be44d2aabce2d62f72d5751d1871b6212bf7a1c7
1 files changed, 12 insertions, 0 deletions
diff --git a/arch/i386/kernel/process.c b/arch/i386/kernel/process.c
index dd53c58f64f1..ae924c416b68 100644
--- a/arch/i386/kernel/process.c
+++ b/arch/i386/kernel/process.c
@@ -648,6 +648,11 @@ struct task_struct fastcall * __switch_to(struct task_struct *prev_p, struct tas
        __unlazy_fpu(prev_p);
+        /* we're going to use this soon, after a few expensive things */
+        if (next_p->fpu_counter > 5)
+                prefetch(&next->i387.fxsave);
        /*
         * Reload esp0.
         */
@@ -697,6 +702,13 @@ struct task_struct fastcall * __switch_to(struct task_struct *prev_p, struct tas
        disable_tsc(prev_p, next_p);
+        /* If the task has used fpu the last 5 timeslices, just do a full
+         * restore of the math state immediately to avoid the trap; the
+         * chances of needing FPU soon are obviously high now
+         */
+        if (next_p->fpu_counter > 5)
+                math_state_restore();
        return prev_p;
 }
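
For readers who want to see the heuristic's effect in isolation, here is a minimal user-space sketch of the counter logic described in the commit message. It is a model under stated assumptions, not the kernel code: only `fpu_counter` and the `> 5` comparison come from the patch. The `task_model` struct, the `model_*` helpers and `simulate()` are hypothetical names, and the sketch assumes that the counter is 8 bits wide (so the reset after 256 switches falls out of wraparound), that every restore increments it, and that a timeslice without loaded FPU state clears it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-task model; only fpu_counter appears in the patch. */
struct task_model {
    uint8_t  fpu_counter; /* ASSUMPTION: 8 bits, wraps to 0 after 256 increments */
    bool     fpu_loaded;  /* FPU state is loaded for the current timeslice */
    unsigned traps;       /* lazy-restore traps taken so far */
};

/* Stands in for math_state_restore(): load the state and count the use. */
static void model_restore(struct task_model *t)
{
    t->fpu_loaded = true;
    t->fpu_counter++; /* 255 + 1 wraps to 0, which drops back to lazy mode */
}

/* Switch-in path: same comparison as the patch adds to __switch_to(). */
static void model_switch_in(struct task_model *t)
{
    if (t->fpu_counter > 5)
        model_restore(t); /* eager restore, no trap later in this slice */
}

/* First FPU instruction in a timeslice: traps only if state is not loaded. */
static void model_use_fpu(struct task_model *t)
{
    if (!t->fpu_loaded) {
        t->traps++;
        model_restore(t);
    }
}

/* Switch-out path. ASSUMPTION: a slice without loaded state resets the streak. */
static void model_switch_out(struct task_model *t)
{
    if (!t->fpu_loaded)
        t->fpu_counter = 0;
    t->fpu_loaded = false;
}

/* Run `slices` timeslices of one task and return the number of traps taken. */
unsigned simulate(unsigned slices, bool uses_fpu)
{
    struct task_model t = {0};

    for (unsigned i = 0; i < slices; i++) {
        model_switch_in(&t);
        if (uses_fpu)
            model_use_fpu(&t);
        model_switch_out(&t);
    }
    return t.traps;
}
```

In this model a task that uses the FPU in every timeslice takes six traps per 256 switches instead of 256, because with the `> 5` comparison the seventh consecutive slice is the first one restored eagerly. A task that never touches the FPU keeps its counter at zero and stays fully lazy.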