aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/kernel/process_64.c
Commit message (Collapse)AuthorAge
...
| | * | | | | | x86: prevent stale state of c1e_mask across CPU offline/onlineThomas Gleixner2008-09-23
| | | |_|_|/ / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Impact: hang which happens across CPU offline/online on AMD C1E systems. When a CPU goes offline then the corresponding bit in the broadcast mask is cleared. For AMD C1E enabled CPUs we do not reenable the broadcast when the CPU comes online again as we do not clear the corresponding bit in the c1e_mask, which keeps track which CPUs have been switched to broadcast already. So on those !$@#& machines we never switch back to broadcasting after a CPU offline/online cycle. Clear the bit when the CPU plays dead. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | | x86: build fix for !CONFIG_SMPAlex Nixon2008-09-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move reset_lazy_tlbstate into tlb_32.c, and define noop versions of play_dead() in process_{32,64}.c when !CONFIG_SMP. Signed-off-by: Alex Nixon <alex.nixon@citrix.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | x86: unify x86_32 and x86_64 play_dead into one functionAlex Nixon2008-08-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the new play_dead into smpboot.c, as it fits more cleanly in there alongside other CONFIG_HOTPLUG functions. Separate out the common code into its own function. Signed-off-by: Alex Nixon <alex.nixon@citrix.com> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | x86: add cpu hotplug hooks into smp_opsAlex Nixon2008-08-25
| |/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Alex Nixon <alex.nixon@citrix.com> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | / / x86: invalidate caches before going into suspendMark Langsdorf2008-08-15
| | |_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a CPU core is shut down, all of its caches need to be flushed to prevent stale data from causing errors if the core is resumed. Current Linux suspend code performs an assignment after the flush, which can add dirty data back to the cache.  On some AMD platforms, additional speculative reads have caused crashes on resume because of this dirty data. Relocate the cache flush to be the very last thing done before halting.  Tie into an assembly line so the compile will not reorder it.  Add some documentation explaining what is going on and why we're doing this. Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com> Acked-by: Mark Borden <mark.borden@amd.com> Acked-by: Michael Hohmuth <michael.hohmuth@amd.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | Merge branch 'timers-fixes-for-linus' of ↵Linus Torvalds2008-07-24
| |\ \ \ \ | | |_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: nohz: adjust tick_nohz_stop_sched_tick() call of s390 as well nohz: prevent tick stop outside of the idle loop
| | * | | Merge branch 'linus' into timers/nohzIngo Molnar2008-07-18
| | |\ \ \
| | * | | | nohz: prevent tick stop outside of the idle loopThomas Gleixner2008-07-18
| | | |/ / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jack Ren and Eric Miao tracked down the following long standing problem in the NOHZ code: scheduler switch to idle task enable interrupts Window starts here ----> interrupt happens (does not set NEED_RESCHED) irq_exit() stops the tick ----> interrupt happens (does set NEED_RESCHED) return from schedule() cpu_idle(): preempt_disable(); Window ends here The interrupts can happen at any point inside the race window. The first interrupt stops the tick, the second one causes the scheduler to rerun and switch away from idle again and we end up with the tick disabled. The fact that it needs two interrupts where the first one does not set NEED_RESCHED and the second one does made the bug obscure and extremly hard to reproduce and analyse. Kudos to Jack and Eric. Solution: Limit the NOHZ functionality to the idle loop to make sure that we can not run into such a situation ever again. cpu_idle() { preempt_disable(); while(1) { tick_nohz_stop_sched_tick(1); <- tell NOHZ code that we are in the idle loop while (!need_resched()) halt(); tick_nohz_restart_sched_tick(); <- disables NOHZ mode preempt_enable_no_resched(); schedule(); preempt_disable(); } } In hindsight we should have done this forever, but ... /me grabs a large brown paperbag. Debugged-by: Jack Ren <jack.ren@marvell.com>, Debugged-by: eric miao <eric.y.miao@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | x86: clean up formatting of __switch_toJeremy Fitzhardinge2008-07-16
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | process_64.c:__switch_to has some very old strange formatting, some of it dating back to pre-git. Fix it up. No functional changes. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | Merge branch 'auto-ftrace-next' into tracing/for-linusIngo Molnar2008-07-14
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: arch/x86/kernel/entry_32.S arch/x86/kernel/process_32.c arch/x86/kernel/process_64.c arch/x86/lib/Makefile include/asm-x86/irqflags.h kernel/Makefile kernel/sched.c Signed-off-by: Ingo Molnar <mingo@elte.hu>
| | * \ \ Merge branch 'linus' into tracing/ftraceIngo Molnar2008-06-23
| | |\ \ \ | | | | |/ | | | |/|
| | * | | Merge branch 'linus' into tracing/ftraceIngo Molnar2008-06-16
| | |\ \ \
| | * | | | ftrace: trace irq disabled critical timingsSteven Rostedt2008-05-23
| | | |_|/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds latency tracing for critical timings (how long interrupts are disabled for). "irqsoff" is added to /debugfs/tracing/available_tracers Note: tracing_max_latency also holds the max latency for irqsoff (in usecs). (default to large number so one must start latency tracing) tracing_thresh threshold (in usecs) to always print out if irqs off is detected to be longer than stated here. If irq_thresh is non-zero, then max_irq_latency is ignored. Here's an example of a trace with ftrace_enabled = 0 ======= preemption latency trace v1.1.5 on 2.6.24-rc7 Signed-off-by: Ingo Molnar <mingo@elte.hu> -------------------------------------------------------------------- latency: 100 us, #3/3, CPU#1 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2) ----------------- | task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0) ----------------- => started at: _spin_lock_irqsave+0x2a/0xb7 => ended at: _spin_unlock_irqrestore+0x32/0x5f _------=> CPU# / _-----=> irqs-off | / _----=> need-resched || / _---=> hardirq/softirq ||| / _--=> preempt-depth |||| / ||||| delay cmd pid ||||| time | caller \ / ||||| \ | / swapper-0 1d.s3 0us+: _spin_lock_irqsave+0x2a/0xb7 (e1000_update_stats+0x47/0x64c [e1000]) swapper-0 1d.s3 100us : _spin_unlock_irqrestore+0x32/0x5f (e1000_update_stats+0x641/0x64c [e1000]) swapper-0 1d.s3 100us : trace_hardirqs_on_caller+0x75/0x89 (_spin_unlock_irqrestore+0x32/0x5f) vim:ft=help ======= And this is a trace with ftrace_enabled == 1 ======= preemption latency trace v1.1.5 on 2.6.24-rc7 -------------------------------------------------------------------- latency: 102 us, #12/12, CPU#1 | (M:rt VP:0, KP:0, SP:0 HP:0 #P:2) ----------------- | task: swapper-0 (uid:0 nice:0 policy:0 rt_prio:0) ----------------- => started at: _spin_lock_irqsave+0x2a/0xb7 => ended at: _spin_unlock_irqrestore+0x32/0x5f _------=> CPU# / _-----=> irqs-off | / _----=> need-resched || / _---=> hardirq/softirq ||| / _--=> preempt-depth |||| / ||||| delay cmd pid ||||| time | caller \ / ||||| \ | / swapper-0 1dNs3 0us+: _spin_lock_irqsave+0x2a/0xb7 (e1000_update_stats+0x47/0x64c [e1000]) swapper-0 1dNs3 46us : e1000_read_phy_reg+0x16/0x225 [e1000] (e1000_update_stats+0x5e2/0x64c [e1000]) swapper-0 1dNs3 46us : e1000_swfw_sync_acquire+0x10/0x99 [e1000] (e1000_read_phy_reg+0x49/0x225 [e1000]) swapper-0 1dNs3 46us : e1000_get_hw_eeprom_semaphore+0x12/0xa6 [e1000] (e1000_swfw_sync_acquire+0x36/0x99 [e1000]) swapper-0 1dNs3 47us : __const_udelay+0x9/0x47 (e1000_read_phy_reg+0x116/0x225 [e1000]) swapper-0 1dNs3 47us+: __delay+0x9/0x50 (__const_udelay+0x45/0x47) swapper-0 1dNs3 97us : preempt_schedule+0xc/0x84 (__delay+0x4e/0x50) swapper-0 1dNs3 98us : e1000_swfw_sync_release+0xc/0x55 [e1000] (e1000_read_phy_reg+0x211/0x225 [e1000]) swapper-0 1dNs3 99us+: e1000_put_hw_eeprom_semaphore+0x9/0x35 [e1000] (e1000_swfw_sync_release+0x50/0x55 [e1000]) swapper-0 1dNs3 101us : _spin_unlock_irqrestore+0xe/0x5f (e1000_update_stats+0x641/0x64c [e1000]) swapper-0 1dNs3 102us : _spin_unlock_irqrestore+0x32/0x5f (e1000_update_stats+0x641/0x64c [e1000]) swapper-0 1dNs3 102us : trace_hardirqs_on_caller+0x75/0x89 (_spin_unlock_irqrestore+0x32/0x5f) vim:ft=help ======= Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | x86: save %fs and %gs before load_TLS() and arch_leave_lazy_cpu_mode()Jeremy Fitzhardinge2008-07-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We must do this because load_TLS() may need to clear %fs and %gs. (e.g. under Xen). Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: xen-devel <xen-devel@lists.xensource.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | x86, 64-bit: __switch_to(): move arch_leave_lazy_cpu_mode() to the right placeJeremy Fitzhardinge2008-07-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We must leave lazy mode before switching the %fs and %gs selectors. Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: xen-devel <xen-devel@lists.xensource.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | x86: remove open-coded save/load segment operationsJeremy Fitzhardinge2008-07-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This removes a pile of buggy open-coded implementations of savesegment and loadsegment. (They are buggy because they don't have memory barriers to prevent them from being reordered with respect to memory accesses.) Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: xen-devel <xen-devel@lists.xensource.com> Cc: Stephen Tweedie <sct@redhat.com> Cc: Eduardo Habkost <ehabkost@redhat.com> Cc: Mark McLoughlin <markmc@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | Merge commit 'v2.6.26-rc9' into x86/cpuIngo Molnar2008-07-08
| |\ \ \ \ | | | |_|/ | | |/| |
| * | | | x86: move more common idle functions/variables to process.cThomas Gleixner2008-06-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | more unification. Should cause no change in functionality. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | x86: simplify idle selectionThomas Gleixner2008-06-10
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | default_idle is selected in cpu_idle(), when no other idle routine is selected. Select it in select_idle_routine() when mwait is not selected. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | Merge branch 'linus' into stackprotectorIngo Molnar2008-06-25
|\ \ \ \ | | |/ / | |/| |
| * | | x86: fix NULL pointer deref in __switch_toSuresh Siddha2008-06-19
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patrick McHardy reported a crash: > > I get this oops once a day, its apparently triggered by something > > run by cron, but the process is a different one each time. > > > > Kernel is -git from yesterday shortly before the -rc6 release > > (last commit is the usb-2.6 merge, the x86 patches are missing), > > .config is attached. > > > > I'll retry with current -git, but the patches that have gone in > > since I last updated don't look related. > > > > [62060.043009] BUG: unable to handle kernel NULL pointer dereference at > > 000001ff > > [62060.043009] IP: [<c0102a9b>] __switch_to+0x2f/0x118 > > [62060.043009] *pde = 00000000 > > [62060.043009] Oops: 0002 [#1] PREEMPT Vegard Nossum analyzed it: > This decodes to > > 0: 0f ae 00 fxsave (%eax) > > so it's related to the floating-point context. This is the exact > location of the crash: > > $ addr2line -e arch/x86/kernel/process_32.o -i ab0 > include/asm/i387.h:232 > include/asm/i387.h:262 > arch/x86/kernel/process_32.c:595 > > ...so it looks like prev_task->thread.xstate->fxsave has become NULL. > Or maybe it never had any other value. Somehow (as described below) TS_USEDFPU is set but the fpu is not allocated or freed. Another possible FPU pre-emption issue with the sleazy FPU optimization which was benign before but not so anymore, with the dynamic FPU allocation patch. New task is getting exec'd and it is prempted at the below point. flush_thread() { ... /* * Forget coprocessor state.. */ clear_fpu(tsk); <----- Preemption point clear_used_math(); ... } Now when it context switches in again, as the used_math() is still set and fpu_counter can be > 5, we will do a math_state_restore() which sets the task's TS_USEDFPU. After it continues from the above preemption point it does clear_used_math() and much later free_thread_xstate(). Now, at the next context switch, it is quite possible that xstate is null, used_math() is not set and TS_USEDFPU is still set. This will trigger unlazy_fpu() causing kernel oops. Fix this by clearing tsk's fpu_counter before clearing task's fpu. Reported-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | x86, fpu: fix CONFIG_PREEMPT=y corruption of application's FPU stackSuresh Siddha2008-06-04
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Jürgen Mell reported an FPU state corruption bug under CONFIG_PREEMPT, and bisected it to commit v2.6.19-1363-gacc2076, "i386: add sleazy FPU optimization". Add tsk_used_math() checks to prevent calling math_state_restore() which can sleep in the case of !tsk_used_math(). This prevents making a blocking call in __switch_to(). Apparently "fpu_counter > 5" check is not enough, as in some signal handling and fork/exec scenarios, fpu_counter > 5 and !tsk_used_math() is possible. It's a side effect though. This is the failing scenario: process 'A' in save_i387_ia32() just after clear_used_math() Got an interrupt and pre-empted out. At the next context switch to process 'A' again, kernel tries to restore the math state proactively and sees a fpu_counter > 0 and !tsk_used_math() This results in init_fpu() during the __switch_to()'s math_state_restore() And resulting in fpu corruption which will be saved/restored (save_i387_fxsave and restore_i387_fxsave) during the remaining part of the signal handling after the context switch. Bisected-by: Jürgen Mell <j.mell@t-online.de> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Tested-by: Jürgen Mell <j.mell@t-online.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org
* | x86: fix the stackprotector canary of the boot CPUIngo Molnar2008-05-26
| | | | | | | | | | | | Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | stackprotector: add boot_init_stack_canary()Ingo Molnar2008-05-26
| | | | | | | | | | | | | | | | | | add the boot_init_stack_canary() and make the secondary idle threads use it. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | x86: fix canary of the boot CPU's idle taskIngo Molnar2008-05-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the boot CPU's idle task has a zero stackprotector canary value. this is a special task that is never forked, so the fork code does not randomize its canary. Do it when we hit cpu_idle(). Academic sidenote: this means that the early init code runs with a zero canary and hence the canary becomes predictable for this short, boot-only amount of time. Although attack vectors against early init code are very rare, it might make sense to move this initialization to an earlier point. (to one of the early init functions that never return - such as start_kernel()) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | x86: setup stack canary for the idle threadsArjan van de Ven2008-05-26
| | | | | | | | | | | | | | | | | | | | | | | | | | The idle threads for non-boot CPUs are a bit special in how they are created; the result is that these don't have the stack canary set up properly in their PDA. Easiest fix is to just always set the PDA up correctly when entering the idle thread; this is a NOP for the boot cpu. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | x86: fix stackprotector canary updates during context switchesIngo Molnar2008-05-26
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fix a bug noticed and fixed by pageexec@freemail.hu. if built with -fstack-protector-all then we'll have canary checks built into the __switch_to() function. That does not work well with the canary-switching code there: while we already use the %rsp of the new task, we still call __switch_to() whith the previous task's canary value in the PDA, hence the __switch_to() ssp prologue instructions will store the previous canary. Then we update the PDA and upon return from __switch_to() the canary check triggers and we panic. so update the canary after we have called __switch_to(), where we are at the same stackframe level as the last stackframe of the next (and now freshly current) task. Note: this means that we call __switch_to() [and its sub-functions] still with the old canary, but that is not a problem, both the previous and the next task has a high-quality canary. The only (mostly academic) disadvantage is that the canary of one task may leak onto the stack of another task, increasing the risk of information leaks, were an attacker able to read the stack of specific tasks (but not that of others). To solve this we'll have to reorganize the way we switch tasks, and move the PDA setting into the switch_to() assembly code. That will happen in another patch. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* fix idle (arch, acpi and apm) and lockdepPeter Zijlstra2008-04-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | OK, so 25-mm1 gave a lockdep error which made me look into this. The first thing that I noticed was the horrible mess; the second thing I saw was hacks like: 71e93d15612c61c2e26a169567becf088e71b8ff The problem is that arch idle routines are somewhat inconsitent with their IRQ state handling and instead of fixing _that_, we go paper over the problem. So the thing I've tried to do is set a standard for idle routines and fix them all up to adhere to that. So the rules are: idle routines are entered with IRQs disabled idle routines will exit with IRQs enabled Nearly all already did this in one form or another. Merge the 32 and 64 bit bits so they no longer have different bugs. As for the actual lockdep warning; __sti_mwait() did a plainly un-annotated irq-enable. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Tested-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* "make namespacecheck" fixesIngo Molnar2008-04-24
| | | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
* x86, fpu: lazy allocation of FPU area - v5Suresh Siddha2008-04-19
| | | | | | | | | | | | | Only allocate the FPU area when the application actually uses FPU, i.e., in the first lazy FPU trap. This could save memory for non-fpu using apps. for example: on my system after boot, there are around 300 processes, with only 17 using FPU. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86, fpu: split FPU state from task struct - v5Suresh Siddha2008-04-19
| | | | | | | | | | | | | | | | | | | Split the FPU save area from the task struct. This allows easy migration of FPU context, and it's generally cleaner. It also allows the following two optimizations: 1) only allocate when the application actually uses FPU, so in the first lazy FPU trap. This could save memory for non-fpu using apps. Next patch does this lazy allocation. 2) allocate the right size for the actual cpu rather than 512 bytes always. Patches enabling xsave/xrstor support (coming shortly) will take advantage of this. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86: implement prctl PR_GET_TSC and PR_SET_TSCErik Bosman2008-04-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch implements the PR_GET_TSC and PR_SET_TSC prctl() commands on the x86 platform (both 32 and 64 bit.) These commands control the ability to read the timestamp counter from userspace (the RDTSC instruction.) While the RDTSC instuction is a useful profiling tool, it is also the source of some non-determinism in ring-3. For deterministic replay applications it is useful to be able to trap and emulate (and record the outcome of) this instruction. This patch uses code earlier used to disable the timestamp counter for the SECCOMP framework. A side-effect of this patch is that the SECCOMP environment will now also disable the timestamp counter on x86_64 due to the addition of the TIF_NOTSC define on this platform. The code which enables/disables the RDTSC instruction during context switches is in the __switch_to_xtra function, which already handles other unusual conditions, so normal performance should not have to suffer from this change. Signed-off-by: Erik Bosman <ejbosman@cs.vu.nl> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86: improve default idleIngo Molnar2008-04-17
| | | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
* x86: prevent unconditional writes to DebugCtl MSRJan Beulich2008-04-17
| | | | | | | | | | | | | | Otherwise, enabling (or better, subsequent disabling) of single stepping would cause a kernel oops on CPUs not having this MSR. The patch could have been added a conditional to the MSR write in user_disable_single_step(), but centralizing the updates seems safer and (looking forward) better manageable. Signed-off-by: Jan Beulich <jbeulich@novell.com> Cc: Markus Metzger <markus.t.metzger@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* x86: de-macro start_thread()Ingo Molnar2008-04-17
| | | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
* x86: Simplify cpu_idle_waitVenki Pallipadi2008-04-10
| | | | | | | | | | | | | | | | | | | | | | | This patch also resolves hangs on boot: http://lkml.org/lkml/2008/2/23/263 http://bugzilla.kernel.org/show_bug.cgi?id=10093 The bug was causing once-in-few-reboots 10-15 sec wait during boot on certain laptops. Earlier commit 40d6a146629b98d8e322b6f9332b182c7cbff3df added smp_call_function in cpu_idle_wait() to kick cpus that are in tickless idle. Looking at cpu_idle_wait code at that time, code seemed to be over-engineered for a case which is rarely used (while changing idle handler). Below is a simplified version of cpu_idle_wait, which just makes a dummy smp_call_function to all cpus, to make them come out of old idle handler and start using the new idle handler. It eliminates code in the idle loop to handle cpu_idle_wait. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* x86: disable BTS ptrace extensions for nowIngo Molnar2008-02-29
| | | | | | | | | | | | | | | | revert the BTS ptrace extension for now. based on general objections from Roland McGrath: http://lkml.org/lkml/2008/2/21/323 we'll let the BTS functionality cook some more and re-enable it in v2.6.26. We'll leave the dead code around to help the development of this code. (X86_BTS is not defined at the moment) Signed-off-by: Ingo Molnar <mingo@elte.hu>
* x86: fix execve with -fstack-protectIngo Molnar2008-02-26
| | | | | | | | | | | | | | | | | | | | pointed out by pageexec@freemail.hu: > what happens here is that gcc treats the argument area as owned by the > callee, not the caller and is allowed to do certain tricks. for ssp it > will make a copy of the struct passed by value into the local variable > area and pass *its* address down, and it won't copy it back into the > original instance stored in the argument area. > > so once sys_execve returns, the pt_regs passed by value hasn't at all > changed and its default content will cause a nice double fault (FWIW, > this part took me the longest to debug, being down with cold didn't > help it either ;). To fix this we pass in pt_regs by pointer. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* aout: remove unnecessary inclusions of {asm, linux}/a.out.hDavid Howells2008-02-08
| | | | | | | | | | Remove now unnecessary inclusions of {asm,linux}/a.out.h. [akpm@linux-foundation.org: fix alpha build] Signed-off-by: David Howells <dhowells@redhat.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* x86: fix section mismatch warning in process_*.cSam Ravnborg2008-01-30
| | | | | | | | | | | | | Fix the following warning: WARNING: arch/x86/kernel/built-in.o(.text+0x3): Section mismatch: reference to .cpuinit.data:force_mwait in 'mwait_usable' [Seen on 64 bit only but similar pattern exist on 32 bit so fix it there too] mwait_usable() were only used by a function annotated __cpuinit so annotate mwait_usable() with __cpuinit to fix the warning. Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86: remove unneded castsJan Engelhardt2008-01-30
| | | | | | | | x86: remove unneeded casts Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86: move warning message of polling idle and HT enabledHiroshi Shimamoto2008-01-30
| | | | | | | | | | | | The warning message at idle_setup() is never shown because smp_num_sibling hasn't been updated at this point yet. Move this polling idle and HT enabled warning to select_idle_routine(). I also implement this warning on 64-bit kernel. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86: use the correct cpuid method to detect MWAIT support for C statesAndi Kleen2008-01-30
| | | | | | | | | | | | | | | | | | | | Previously there was a AMD specific quirk to handle the case of AMD Fam10h MWAIT not supporting any C states. But it turns out that CPUID already has ways to detectly detect that without using special quirks. The new code simply checks if MWAIT supports at least C1 and doesn't use it if it doesn't. No more vendor specific code. Note this is does not simply clear MWAIT because MWAIT can be still useful even without C states. Credit goes to Ben Serebrin for pointing out the (nearly) obvious. Cc: "Andreas Herrmann" <andreas.herrmann3@amd.com> Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86: make printk_address regs->ip always reliableArjan van de Ven2008-01-30
| | | | | | | | | | printk_address()'s second parameter is the reliability indication, not the ebp. If we're printing regs->ip we're reliable by definition, so pass a 1 here. Signed-off-by: Arjan van de Ven Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86: add the capability to print fuzzy backtracesArjan van de Ven2008-01-30
| | | | | | | | | | | | | | For enhancing the 32 bit EBP based backtracer, I need the capability for the backtracer to tell it's customer that an entry is either reliable or unreliable, and the backtrace printing code then needs to print the unreliable ones slightly different. This patch adds the basic capability, the next patch will add a user of this capability. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86: move out tick_nohz_stop_sched_tick() call from the loopHiroshi Shimamoto2008-01-30
| | | | | | | | | Move out tick_nohz_stop_sched_tick() call from the loop in cpu_idle same as 32-bit version. Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86: x86 user_regset cleanupRoland McGrath2008-01-30
| | | | | | | | | This removes a bunch of dead code that is no longer needed now that the user_regset interfaces are being used for all these jobs. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86: provide 64-bit with a load_sp0 function.Glauber de Oliveira Costa2008-01-30
| | | | | | | | | | | | Paravirt guests need to inform the underlying hypervisor whenever the sp0 tss field changes. i386 already has such a function, and we use it for x86_64 too. There's an unnecessary (for 64-bit) msr handling part in the original version, and it is placed around an ifdef. Making no more sense in processor_32.h, it is moved to the common header Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86: unify tss_structGlauber de Oliveira Costa2008-01-30
| | | | | | | | | | | Although slighly different, the tss_struct is very similar in x86_64 and i386. The really different part, which matchs the hardware vision of it, is now called x86_hw_tss, and each of the architectures provides yours. It's then used as a field in the outter tss_struct. Signed-off-by: Glauber de Oliveira Costa <gcosta@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86, ptrace: remove bad commentMarkus Metzger2008-01-30
| | | | | | | | Remove no longer correct comment. Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>