path: root/arch/x86/kernel/process_32.c
author	Linus Torvalds <torvalds@linux-foundation.org>	2012-02-16 18:45:23 -0500
committer	Linus Torvalds <torvalds@linux-foundation.org>	2012-02-16 18:45:23 -0500
commit	b3b0870ef3ffed72b92415423da864f440f57ad6 (patch)
tree	b3e128019581669d44e6634d3b1bfb169c73598d /arch/x86/kernel/process_32.c
parent	6d59d7a9f5b723a7ac1925c136e93ec83c0c3043 (diff)
i387: do not preload FPU state at task switch time
Yes, taking the trap to re-load the FPU/MMX state is expensive, but so is spending several days looking for a bug in the state save/restore code. And the preload code has some rather subtle interactions with both paravirtualization support and segment state restore, so it's not nearly as simple as it should be.

Also, now that we no longer necessarily depend on a single bit (ie TS_USEDFPU) for keeping track of the state of the FPU, we might be able to do better. If we are really switching between two processes that keep touching the FP state, save/restore is inevitable, but in the case of having one process that does most of the FPU usage, we may actually be able to do much better than the preloading.

In particular, we may be able to keep track of which CPU the process ran on last, and also per CPU keep track of which process' FP state that CPU has. For modern CPUs that don't destroy the FPU contents at save time, that would allow us to do a lazy restore by just re-enabling the existing FPU state - with no restore cost at all!

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
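[Editor's note] The lazy-restore idea in the last paragraph can be illustrated with a small, self-contained user-space C model (not kernel code): each CPU remembers which task's FPU state its registers currently hold, and each task remembers the CPU it last ran on; a reload from memory is only needed when those two records disagree. All names below (fpu_owner, last_cpu, fpu_image, switch_fpu) are made up for illustration and are not the kernel's identifiers.

/* Minimal user-space model of the lazy FPU restore described above. */
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 4

struct task {
	int last_cpu;                 /* CPU this task's FPU state was last loaded on */
	unsigned char fpu_image[512]; /* in-memory copy of the FPU/SSE registers */
};

/* Which task's FPU state is currently live in each CPU's registers. */
static struct task *fpu_owner[NR_CPUS];

/* Stand-in for the real reload of t->fpu_image into the hardware registers. */
static void hw_restore_fpu(struct task *t)
{
	(void)t;
}

/*
 * Called when 'next' is about to run on 'cpu'.  Returns true when the
 * expensive restore from memory could be skipped because the registers
 * already hold next's state.
 */
static bool switch_fpu(int cpu, struct task *next)
{
	if (fpu_owner[cpu] == next && next->last_cpu == cpu)
		return true;              /* lazy case: just re-enable, no reload */

	hw_restore_fpu(next);             /* full reload from next->fpu_image */
	fpu_owner[cpu] = next;
	next->last_cpu = cpu;
	return false;
}

int main(void)
{
	struct task a = { .last_cpu = -1 }, b = { .last_cpu = -1 };

	printf("%d\n", switch_fpu(0, &a)); /* 0: first run of a, must restore */
	printf("%d\n", switch_fpu(0, &b)); /* 0: b's state not on this CPU, restore */
	printf("%d\n", switch_fpu(0, &b)); /* 1: b ran here last, restore skipped */
	return 0;
}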
Diffstat (limited to 'arch/x86/kernel/process_32.c')
-rw-r--r--	arch/x86/kernel/process_32.c	20
1 file changed, 0 insertions, 20 deletions
diff --git a/arch/x86/kernel/process_32.c b/arch/x86/kernel/process_32.c
index 485204f58cda..324cd722b447 100644
--- a/arch/x86/kernel/process_32.c
+++ b/arch/x86/kernel/process_32.c
@@ -299,23 +299,11 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 				 *next = &next_p->thread;
 	int cpu = smp_processor_id();
 	struct tss_struct *tss = &per_cpu(init_tss, cpu);
-	bool preload_fpu;
 
 	/* never put a printk in __switch_to... printk() calls wake_up*() indirectly */
 
-	/*
-	 * If the task has used fpu the last 5 timeslices, just do a full
-	 * restore of the math state immediately to avoid the trap; the
-	 * chances of needing FPU soon are obviously high now
-	 */
-	preload_fpu = tsk_used_math(next_p) && next_p->fpu_counter > 5;
-
 	__unlazy_fpu(prev_p);
 
-	/* we're going to use this soon, after a few expensive things */
-	if (preload_fpu)
-		prefetch(next->fpu.state);
-
 	/*
 	 * Reload esp0.
 	 */
@@ -354,11 +342,6 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	    task_thread_info(next_p)->flags & _TIF_WORK_CTXSW_NEXT))
 		__switch_to_xtra(prev_p, next_p, tss);
 
-	/* If we're going to preload the fpu context, make sure clts
-	   is run while we're batching the cpu state updates. */
-	if (preload_fpu)
-		clts();
-
 	/*
 	 * Leave lazy mode, flushing any hypercalls made here.
 	 * This must be done before restoring TLS segments so
@@ -368,9 +351,6 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
 	 */
 	arch_end_context_switch(next_p);
 
-	if (preload_fpu)
-		__math_state_restore();
-
 	/*
 	 * Restore %gs if needed (which is common)
 	 */