aboutsummaryrefslogtreecommitdiffstats
diff options
context:
space:
mode:
authorSuresh Siddha <suresh.b.siddha@intel.com>2008-06-19 12:41:22 -0400
committerIngo Molnar <mingo@elte.hu>2008-06-20 07:26:18 -0400
commit54481cf88bc59923ea30f2ca345a73c60155e901 (patch)
tree1965fad9a3a207a619b5506722208f93788210b8
parentffe6e1da86d21d7855495b5a772c93f050258f6e (diff)
x86: fix NULL pointer deref in __switch_to
I am able to reproduce the oops reported by Simon in __switch_to() with lguest. My debug showed that there is at least one lguest specific issue (which should be present in 2.6.25 and before aswell) and it got exposed with a kernel oops with the recent fpu dynamic allocation patches. In addition to the previous possible scenario (with fpu_counter), in the presence of lguest, it is possible that the cpu's TS bit it still set and the lguest launcher task's thread_info has TS_USEDFPU still set. This is because of the way the lguest launcher handling the guest's TS bit. (look at lguest_set_ts() in lguest_arch_run_guest()). This can result in a DNA fault while doing unlazy_fpu() in __switch_to(). This will end up causing a DNA fault in the context of new process thats getting context switched in (as opossed to handling DNA fault in the context of lguest launcher/helper process). This is wrong in both pre and post 2.6.25 kernels. In the recent 2.6.26-rc series, this is showing up as NULL pointer dereferences or sleeping function called from atomic context(__switch_to()), as we free and dynamically allocate the FPU context for the newly created threads. Older kernels might show some FPU corruption for processes running inside of lguest. With the appended patch, my test system is running for more than 50 mins now. So atleast some of your oops (hopefully all!) should get fixed. Please give it a try. I will spend more time with this fix tomorrow. Reported-by: Simon Holm Thøgersen <odie@cs.aau.dk> Reported-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
-rw-r--r--drivers/lguest/x86/core.c15
1 files changed, 9 insertions, 6 deletions
diff --git a/drivers/lguest/x86/core.c b/drivers/lguest/x86/core.c
index 5126d5d9ea0e..2e554a4ab337 100644
--- a/drivers/lguest/x86/core.c
+++ b/drivers/lguest/x86/core.c
@@ -176,7 +176,7 @@ void lguest_arch_run_guest(struct lg_cpu *cpu)
176 * we set it now, so we can trap and pass that trap to the Guest if it 176 * we set it now, so we can trap and pass that trap to the Guest if it
177 * uses the FPU. */ 177 * uses the FPU. */
178 if (cpu->ts) 178 if (cpu->ts)
179 lguest_set_ts(); 179 unlazy_fpu(current);
180 180
181 /* SYSENTER is an optimized way of doing system calls. We can't allow 181 /* SYSENTER is an optimized way of doing system calls. We can't allow
182 * it because it always jumps to privilege level 0. A normal Guest 182 * it because it always jumps to privilege level 0. A normal Guest
@@ -196,6 +196,10 @@ void lguest_arch_run_guest(struct lg_cpu *cpu)
196 * trap made the switcher code come back, and an error code which some 196 * trap made the switcher code come back, and an error code which some
197 * traps set. */ 197 * traps set. */
198 198
199 /* Restore SYSENTER if it's supposed to be on. */
200 if (boot_cpu_has(X86_FEATURE_SEP))
201 wrmsr(MSR_IA32_SYSENTER_CS, __KERNEL_CS, 0);
202
199 /* If the Guest page faulted, then the cr2 register will tell us the 203 /* If the Guest page faulted, then the cr2 register will tell us the
200 * bad virtual address. We have to grab this now, because once we 204 * bad virtual address. We have to grab this now, because once we
201 * re-enable interrupts an interrupt could fault and thus overwrite 205 * re-enable interrupts an interrupt could fault and thus overwrite
@@ -203,13 +207,12 @@ void lguest_arch_run_guest(struct lg_cpu *cpu)
203 if (cpu->regs->trapnum == 14) 207 if (cpu->regs->trapnum == 14)
204 cpu->arch.last_pagefault = read_cr2(); 208 cpu->arch.last_pagefault = read_cr2();
205 /* Similarly, if we took a trap because the Guest used the FPU, 209 /* Similarly, if we took a trap because the Guest used the FPU,
206 * we have to restore the FPU it expects to see. */ 210 * we have to restore the FPU it expects to see.
211 * math_state_restore() may sleep and we may even move off to
212 * a different CPU. So all the critical stuff should be done
213 * before this. */
207 else if (cpu->regs->trapnum == 7) 214 else if (cpu->regs->trapnum == 7)
208 math_state_restore(); 215 math_state_restore();
209
210 /* Restore SYSENTER if it's supposed to be on. */
211 if (boot_cpu_has(X86_FEATURE_SEP))
212 wrmsr(MSR_IA32_SYSENTER_CS, __KERNEL_CS, 0);
213} 216}
214 217
215/*H:130 Now we've examined the hypercall code; our Guest can make requests. 218/*H:130 Now we've examined the hypercall code; our Guest can make requests.