author | Andy Lutomirski <luto@kernel.org> | 2015-04-01 17:26:34 -0400 |
---|---|---|
committer | Ingo Molnar <mingo@kernel.org> | 2015-04-02 05:09:54 -0400 |
commit | 7ea24169097d3d3a3eab2dcc5773bc43fd5593e7 | |
tree | 5c8663df0bc9f134b4d97b283280784aa846ca3b | |
parent | 80313b3078fcd2ca51970880d90757f05879a193 |
x86/asm/entry/64: Disable opportunistic SYSRET if regs->flags has TF set
When I wrote the opportunistic SYSRET code, I missed an important difference
between SYSRET and IRET.
Both instructions are capable of setting EFLAGS.TF, but they behave differently
when doing so:
 - IRET will not issue a #DB trap after execution when it sets TF.
   This is critical -- otherwise you'd never be able to make forward progress when
   returning to userspace.
 - SYSRET, on the other hand, will trap with #DB immediately after
   returning to CPL3, and the next instruction will never execute.
This breaks anything that opportunistically SYSRETs to a user
context with TF set. For example, running this code with TF set
and a SIGTRAP handler loaded never gets past 'post_nop':
	extern unsigned char post_nop[];
	asm volatile ("pushfq\n\t"
		      "popq %%r11\n\t"
		      "nop\n\t"
		      "post_nop:"
		      : : "c" (post_nop) : "r11");
In my defense, I can't find this documented in the AMD or Intel manual.
Fix it by using IRET to restore TF.
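
The difference between the two instructions can be sketched as a toy model in C. This is only an illustration of the behavior described above, not kernel code: the enum, the function name, and the idea of counting "returns" are all assumptions made for the example. It assumes every #DB is handled and the kernel then returns to the interrupted RIP with TF still set, using the same return instruction each time.

```c
/* Hypothetical names: a toy model of the described behavior, not kernel code. */
enum return_insn { RET_IRET, RET_SYSRET };

/*
 * Count how many user instructions retire over `returns` consecutive
 * kernel-to-user returns with TF set.
 *
 * Assumed model, taken from the description above:
 *   - IRET: one user instruction executes, then #DB fires.
 *   - SYSRET: #DB fires immediately, before any user instruction executes.
 */
unsigned int insns_retired_with_tf(enum return_insn insn, unsigned int returns)
{
	unsigned int retired = 0;

	for (unsigned int i = 0; i < returns; i++) {
		if (insn == RET_IRET)
			retired++;	/* forward progress: one instruction per return */
		/* RET_SYSRET: trap first, nothing retires on this return */
	}
	return retired;
}
```

Under this model the IRET path retires one instruction per return, while the SYSRET path retires none no matter how many times the kernel returns, which is the livelock the fix avoids.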
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 2a23c6b8a9c4 ("x86_64, entry: Use sysret to return to userspace when possible")
Link: http://lkml.kernel.org/r/9472f1ca4c19a38ecda45bba9c91b7168135fcfa.1427923514.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-rw-r--r-- | arch/x86/kernel/entry_64.S | 16 |
1 files changed, 15 insertions, 1 deletions
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 2babb393915e..f0095a76c182 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -799,7 +799,21 @@ retint_swapgs:		/* return to user-space */
 	cmpq %r11,(EFLAGS-ARGOFFSET)(%rsp)	/* R11 == RFLAGS */
 	jne opportunistic_sysret_failed
 
-	testq $X86_EFLAGS_RF,%r11		/* sysret can't restore RF */
+	/*
+	 * SYSRET can't restore RF. SYSRET can restore TF, but unlike IRET,
+	 * restoring TF results in a trap from userspace immediately after
+	 * SYSRET. This would cause an infinite loop whenever #DB happens
+	 * with register state that satisfies the opportunistic SYSRET
+	 * conditions. For example, single-stepping this user code:
+	 *
+	 *           movq $stuck_here,%rcx
+	 *           pushfq
+	 *           popq %r11
+	 *   stuck_here:
+	 *
+	 * would never get past 'stuck_here'.
+	 */
+	testq $(X86_EFLAGS_RF|X86_EFLAGS_TF), %r11
 	jnz opportunistic_sysret_failed
 
 	/* nothing to check for RSP */
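
For readers less used to the assembly, the new flag test can be paraphrased in C roughly as follows. This is a sketch only: the function name `sysret_flags_ok` is hypothetical, and it mirrors just the `testq`/`jnz` pair from the hunk, not the other opportunistic SYSRET conditions such as the R11 == RFLAGS comparison. The bit positions are the architectural ones (TF is bit 8, RF is bit 16 of RFLAGS).

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural RFLAGS bits; the macro names follow the kernel's spelling. */
#define X86_EFLAGS_TF UINT64_C(0x00000100)	/* trap flag, bit 8 */
#define X86_EFLAGS_RF UINT64_C(0x00010000)	/* resume flag, bit 16 */

/*
 * Hypothetical helper: returns true when the saved user RFLAGS value
 * passes the flag test, i.e. neither RF nor TF is set. A false result
 * corresponds to the jump to opportunistic_sysret_failed, after which
 * the return to userspace goes through IRET instead.
 */
bool sysret_flags_ok(uint64_t rflags)
{
	return (rflags & (X86_EFLAGS_RF | X86_EFLAGS_TF)) == 0;
}
```

Before this change only RF was rejected, so a saved RFLAGS value with TF set could still take the SYSRET path.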