aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/kernel/cpu/common.c
Commit message (Collapse)AuthorAge
* Merge branch 'perf-core-for-linus' of ↵Linus Torvalds2015-04-14
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf changes from Ingo Molnar: "Core kernel changes: - One of the more interesting features in this cycle is the ability to attach eBPF programs (user-defined, sandboxed bytecode executed by the kernel) to kprobes. This allows user-defined instrumentation on a live kernel image that can never crash, hang or interfere with the kernel negatively. (Right now it's limited to root-only, but in the future we might allow unprivileged use as well.) (Alexei Starovoitov) - Another non-trivial feature is per event clockid support: this allows, amongst other things, the selection of different clock sources for event timestamps traced via perf. This feature is sought by people who'd like to merge perf generated events with external events that were measured with different clocks: - cluster wide profiling - for system wide tracing with user-space events, - JIT profiling events etc. Matching perf tooling support is added as well, available via the -k, --clockid <clockid> parameter to perf record et al. (Peter Zijlstra) Hardware enablement kernel changes: - x86 Intel Processor Trace (PT) support: which is a hardware tracer on steroids, available on Broadwell CPUs. The hardware trace stream is directly output into the user-space ring-buffer, using the 'AUX' data format extension that was added to the perf core to support hardware constraints such as the necessity to have the tracing buffer physically contiguous. This patch-set was developed for two years and this is the result. A simple way to make use of this is to use BTS tracing, the PT driver emulates BTS output - available via the 'intel_bts' PMU. More explicit PT specific tooling support is in the works as well - will probably be ready by 4.2. (Alexander Shishkin, Peter Zijlstra) - x86 Intel Cache QoS Monitoring (CQM) support: this is a hardware feature of Intel Xeon CPUs that allows the measurement and allocation/partitioning of caches to individual workloads. These kernel changes expose the measurement side as a new PMU driver, which exposes various QoS related PMU events. (The partitioning change is work in progress and is planned to be merged as a cgroup extension.) (Matt Fleming, Peter Zijlstra; CPU feature detection by Peter P Waskiewicz Jr) - x86 Intel Haswell LBR call stack support: this is a new Haswell feature that allows the hardware recording of call chains, plus tooling support. To activate this feature you have to enable it via the new 'lbr' call-graph recording option: perf record --call-graph lbr perf report or: perf top --call-graph lbr This hardware feature is a lot faster than stack walk or dwarf based unwinding, but has some limitations: - It reuses the current LBR facility, so LBR call stack and branch record can not be enabled at the same time. - It is only available for user-space callchains. (Yan, Zheng) - x86 Intel Broadwell CPU support and various event constraints and event table fixes for earlier models. (Andi Kleen) - x86 Intel HT CPUs event scheduling workarounds. This is a complex CPU bug affecting the SNB,IVB,HSW families that results in counter value corruption. The mitigation code is automatically enabled and is transparent. (Maria Dimakopoulou, Stephane Eranian) The perf tooling side had a ton of changes in this cycle as well, so I'm only able to list the user visible changes here, in addition to the tooling changes outlined above: User visible changes affecting all tools: - Improve support of compressed kernel modules (Jiri Olsa) - Save DSO loading errno to better report errors (Arnaldo Carvalho de Melo) - Bash completion for subcommands (Yunlong Song) - Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri Olsa) - Support missing -f to override perf.data file ownership. (Yunlong Song) - Show the first event with an invalid filter (David Ahern, Arnaldo Carvalho de Melo) User visible changes in individual tools: 'perf data': New tool for converting perf.data to other formats, initially for the CTF (Common Trace Format) from LTTng (Jiri Olsa, Sebastian Siewior) 'perf diff': Add --kallsyms option (David Ahern) 'perf list': Allow listing events with 'tracepoint' prefix (Yunlong Song) Sort the output of the command (Yunlong Song) 'perf kmem': Respect -i option (Jiri Olsa) Print big numbers using thousands' group (Namhyung Kim) Allow -v option (Namhyung Kim) Fix alignment of slab result table (Namhyung Kim) 'perf probe': Support multiple probes on different binaries on the same command line (Masami Hiramatsu) Support unnamed union/structure members data collection. (Masami Hiramatsu) Check kprobes blacklist when adding new events. (Masami Hiramatsu) 'perf record': Teach 'perf record' about perf_event_attr.clockid (Peter Zijlstra) Support recording running/enabled time (Andi Kleen) 'perf sched': Improve the performance of 'perf sched replay' on high CPU core count machines (Yunlong Song) 'perf report' and 'perf top': Allow annotating entries in callchains in the hists browser (Arnaldo Carvalho de Melo) Indicate which callchain entries are annotated in the TUI hists browser (Arnaldo Carvalho de Melo) Add pid/tid filtering to 'report' and 'script' commands (David Ahern) Consider PERF_RECORD_ events with cpumode == 0 in 'perf top', removing one cause of long term memory usage buildup, i.e. not processing PERF_RECORD_EXIT events (Arnaldo Carvalho de Melo) 'perf stat': Report unsupported events properly (Suzuki K. Poulose) Output running time and run/enabled ratio in CSV mode (Andi Kleen) 'perf trace': Handle legacy syscalls tracepoints (David Ahern, Arnaldo Carvalho de Melo) Only insert blank duration bracket when tracing syscalls (Arnaldo Carvalho de Melo) Filter out the trace pid when no threads are specified (Arnaldo Carvalho de Melo) Dump stack on segfaults (Arnaldo Carvalho de Melo) No need to explicitely enable evsels for workload started from perf, let it be enabled via perf_event_attr.enable_on_exec, removing some events that take place in the 'perf trace' before a workload is really started by it. (Arnaldo Carvalho de Melo) Allow mixing with tracepoints and suppressing plain syscalls. (Arnaldo Carvalho de Melo) There's also been a ton of infrastructure work done, such as the split-out of perf's build system into tools/build/ and other changes - see the shortlog and changelog for details" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (358 commits) perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init() perf evlist: Fix type for references to data_head/tail perf probe: Check the orphaned -x option perf probe: Support multiple probes on different binaries perf buildid-list: Fix segfault when show DSOs with hits perf tools: Fix cross-endian analysis perf tools: Fix error path to do closedir() when synthesizing threads perf tools: Fix synthesizing fork_event.ppid for non-main thread perf tools: Add 'I' event modifier for exclude_idle bit perf report: Don't call map__kmap if map is NULL. perf tests: Fix attr tests perf probe: Fix ARM 32 building error perf tools: Merge all perf_event_attr print functions perf record: Add clockid parameter perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10 perf sched replay: Support using -f to override perf.data file ownership perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task perf sched replay: Fix the segmentation fault problem caused by pr_err in threads perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations ...
| * Merge branch 'perf/x86' into perf/core, because it's readyIngo Molnar2015-03-27
| |\ | | | | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * x86: Add support for Intel Cache QoS Monitoring (CQM) detectionPeter P Waskiewicz Jr2015-02-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for the new Cache QoS Monitoring (CQM) feature found in future Intel Xeon processors. It includes the new values to track CQM resources to the cpuinfo_x86 structure, plus the CPUID detection routines for CQM. CQM allows a process, or set of processes, to be tracked by the CPU to determine the cache usage of that task group. Using this data from the CPU, software can be written to extract this data and report cache usage and occupancy for a particular process, or group of processes. More information about Cache QoS Monitoring can be found in the Intel (R) x86 Architecture Software Developer Manual, section 17.14. Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Borislav Petkov <bp@suse.de> Cc: Chris Webb <chris@arachsys.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Igor Mammedov <imammedo@redhat.com> Cc: Jacob Shin <jacob.w.shin@gmail.com> Cc: Jan Beulich <JBeulich@suse.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kanaka Juvva <kanaka.d.juvva@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Honeyman <stevenhoneyman@gmail.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com> Link: http://lkml.kernel.org/r/1422038748-21397-5-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | x86/asm/entry/64: Use a define for an invalid segment selectorBorislav Petkov2015-04-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... instead of a naked number, for better readability. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428054130-25847-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | x86/asm/entry/64: Fix MSR_IA32_SYSENTER_CS MSR valueBorislav Petkov2015-04-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit: d56fe4bf5f3c ("x86/asm/entry/64: Always set up SYSENTER MSRs") missed to add "ULL" to the 0 and wrmsrl_safe() complains: arch/x86/kernel/cpu/common.c: In function ‘syscall_init’: arch/x86/kernel/cpu/common.c:1226:2: warning: right shift count >= width of type wrmsrl_safe(MSR_IA32_SYSENTER_CS, 0); Fix it. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1428054130-25847-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | x86/asm/entry/32: Stop caching MSR_IA32_SYSENTER_ESP in tss.sp1Andy Lutomirski2015-04-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We write a stack pointer to MSR_IA32_SYSENTER_ESP exactly once, and we unnecessarily cache the value in tss.sp1. We never read the cached value. Remove all of the caching. It serves no purpose. Suggested-by: Denys Vlasenko <dvlasenk@redhat.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/05a0163eb33ef5208363f0015496855da7cebadd.1428002830.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | x86/asm/entry/64: Fix comment about SYSENTER MSRsDenys Vlasenko2015-03-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The comment is ancient, it dates to the time when only AMD's x86_64 implementation existed. AMD wasn't (and still isn't) supporting SYSENTER, so these writes were "just in case" back then. This has changed: Intel's x86_64 appeared, and Intel does support SYSENTER in long mode. "Some future 64-bit CPU" is here already. The code may appear "buggy" for AMD as it stands, since MSR_IA32_SYSENTER_EIP is only 32-bit for AMD CPUs. Writing a kernel function's address to it would drop high bits. Subsequent use of this MSR for branch via SYSENTER seem to allow user to transition to CPL0 while executing his code. Scary, eh? Explain why that is not a bug: because SYSENTER insn would not work on AMD CPU. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1427453956-21931-1-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | x86/asm/entry/64: Always set up SYSENTER MSRsIngo Molnar2015-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On CONFIG_IA32_EMULATION=y kernels we set up MSR_IA32_SYSENTER_CS/ESP/EIP, but on !CONFIG_IA32_EMULATION kernels we leave them unchanged. Clear them to make sure the instruction is disabled properly. SYSCALL is set up properly in both cases. Acked-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | x86/asm/entry: Get rid of KERNEL_STACK_OFFSETDenys Vlasenko2015-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PER_CPU_VAR(kernel_stack) was set up in a way where it points five stack slots below the top of stack. Presumably, it was done to avoid one "sub $5*8,%rsp" in syscall/sysenter code paths, where iret frame needs to be created by hand. Ironically, none of them benefits from this optimization, since all of them need to allocate additional data on stack (struct pt_regs), so they still have to perform subtraction. This patch eliminates KERNEL_STACK_OFFSET. PER_CPU_VAR(kernel_stack) now points directly to top of stack. pt_regs allocations are adjusted to allocate iret frame as well. Hopefully we can merge it later with 32-bit specific PER_CPU_VAR(cpu_current_top_of_stack) variable... Net result in generated code is that constants in several insns are changed. This change is necessary for changing struct pt_regs creation in SYSCALL64 code path from MOV to PUSH instructions. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1426785469-15125-2-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | x86/asm/entry/64: Fold syscall32_cpu_init() into its sole userDenys Vlasenko2015-03-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Having syscall32/sysenter32 initialization in a separate tiny function, called from within a function that is already syscall init specific, serves no real purpose. Its existense also caused an unintended effect of having wrmsrl(MSR_CSTAR) performed twice: once we set it to a dummy function returning -ENOSYS, and immediately after (if CONFIG_IA32_EMULATION), we set it to point to the proper syscall32 entry point, ia32_cstar_target. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Acked-by: Andy Lutomirski <luto@amacapital.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | x86/asm/entry: Document and clean up the enable_sep_cpu() and ↵Ingo Molnar2015-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syscall32_cpu_init() functions Clean up the flow and document the functions a bit better. Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | x86/asm/entry/32: Document the 32-bit SYSENTER "emergency stack" betterDenys Vlasenko2015-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before the patch, the 'tss_struct::stack' field was not referenced anywhere. It was used only to set SYSENTER's stack to point after the last byte of tss_struct, thus the trailing field, stack[64], was used. But grep would not know it. You can comment it out, compile, and kernel will even run until an unlucky NMI corrupts io_bitmap[] (which is also not easily detectable). This patch changes code so that the purpose and usage of this field is not mysterious anymore, and can be easily grepped for. This does change generated code, for a subtle reason: since tss_struct is ____cacheline_aligned, there happens to be 5 longs of padding at the end. Old code was using the padding too; new code will strictly use it only for SYSENTER_stack[]. Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Drewry <wad@chromium.org> Link: http://lkml.kernel.org/r/1425912738-559-2-git-send-email-dvlasenk@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | x86/asm/entry: Replace this_cpu_sp0() with current_top_of_stack() and fix it ↵Andy Lutomirski2015-03-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | on x86_32 I broke 32-bit kernels. The implementation of sp0 was correct as far as I can tell, but sp0 was much weirder on x86_32 than I realized. It has the following issues: - Init's sp0 is inconsistent with everything else's: non-init tasks are offset by 8 bytes. (I have no idea why, and the comment is unhelpful.) - vm86 does crazy things to sp0. Fix it up by replacing this_cpu_sp0() with current_top_of_stack() and using a new percpu variable to track the top of the stack on x86_32. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 75182b1632a8 ("x86/asm/entry: Switch all C consumers of kernel_stack to this_cpu_sp0()") Link: http://lkml.kernel.org/r/d09dbe270883433776e0cbee3c7079433349e96d.1425692936.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | x86/asm/entry: Rename 'init_tss' to 'cpu_tss'Andy Lutomirski2015-03-06
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It has nothing to do with init -- there's only one TSS per cpu. Other names considered include: - current_tss: Confusing because we never switch the tss. - singleton_tss: Too long. This patch was generated with 's/init_tss/cpu_tss/g'. Followup patches will fix INIT_TSS and INIT_TSS_IST by hand. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/da29fb2a793e4f649d93ce2d1ed320ebe8516262.1425611534.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | x86: Init per-cpu shadow copy of CR4 on 32-bit CPUs tooSteven Rostedt2015-02-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit: 1e02ce4cccdc ("x86: Store a per-cpu shadow copy of CR4") added a shadow CR4 such that reads and writes that do not modify the CR4 execute much faster than always reading the register itself. The change modified cpu_init() in common.c, so that the shadow CR4 gets initialized before anything uses it. Unfortunately, there's two cpu_init()s in common.c. There's one for 64-bit and one for 32-bit. The commit only added the shadow init to the 64-bit path, but the 32-bit path needs the init too. Link: http://lkml.kernel.org/r/20150227125208.71c36402@gandalf.local.home Fixes: 1e02ce4cccdc "x86: Store a per-cpu shadow copy of CR4" Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Andy Lutomirski <luto@amacapital.net> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150227145019.2bdd4354@gandalf.local.home Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | Merge branch 'perf-core-for-linus' of ↵Linus Torvalds2015-02-16
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf updates from Ingo Molnar: "This series tightens up RDPMC permissions: currently even highly sandboxed x86 execution environments (such as seccomp) have permission to execute RDPMC, which may leak various perf events / PMU state such as timing information and other CPU execution details. This 'all is allowed' RDPMC mode is still preserved as the (non-default) /sys/devices/cpu/rdpmc=2 setting. The new default is that RDPMC access is only allowed if a perf event is mmap-ed (which is needed to correctly interpret RDPMC counter values in any case). As a side effect of these changes CR4 handling is cleaned up in the x86 code and a shadow copy of the CR4 value is added. The extra CR4 manipulation adds ~ <50ns to the context switch cost between rdpmc-capable and rdpmc-non-capable mms" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks perf/x86: Only allow rdpmc if a perf_event is mapped perf: Pass the event to arch_perf_update_userpage() perf: Add pmu callbacks to track event mapping and unmapping x86: Add a comment clarifying LDT context switching x86: Store a per-cpu shadow copy of CR4 x86: Clean up cr4 manipulation
| * x86: Store a per-cpu shadow copy of CR4Andy Lutomirski2015-02-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Context switches and TLB flushes can change individual bits of CR4. CR4 reads take several cycles, so store a shadow copy of CR4 in a per-cpu variable. To avoid wasting a cache line, I added the CR4 shadow to cpu_tlbstate, which is already touched in switch_mm. The heaviest users of the cr4 shadow will be switch_mm and __switch_to_xtra, and __switch_to_xtra is called shortly after switch_mm during context switch, so the cacheline is likely to be hot. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Kees Cook <keescook@chromium.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/3a54dd3353fffbf84804398e00dfdc5b7c1afd7d.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86: Clean up cr4 manipulationAndy Lutomirski2015-02-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CR4 manipulation was split, seemingly at random, between direct (write_cr4) and using a helper (set/clear_in_cr4). Unfortunately, the set_in_cr4 and clear_in_cr4 helpers also poke at the boot code, which only a small subset of users actually wanted. This patch replaces all cr4 access in functions that don't leave cr4 exactly the way they found it with new helpers cr4_set_bits, cr4_clear_bits, and cr4_set_bits_and_update_boot. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/495a10bdc9e67016b8fd3945700d46cfd5c12c2f.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | Merge branch 'x86-cleanups-for-linus' of ↵Linus Torvalds2015-02-09
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: "Misc cleanups" * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/rtc: Remove duplicate const specifier x86, early_serial_console: Remove unnecessary check x86, early_serial_console: Remove unused macro XMTRDY x86, setup: Rename BOOT_ISDIGIT_H to BOOT_CTYPE_H x86, CPU: Fix trivial printk formatting issues with dmesg
| * | x86, CPU: Fix trivial printk formatting issues with dmesgSteven Honeyman2015-01-10
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dmesg (from util-linux) currently has two methods for reading the kernel message ring buffer: /dev/kmsg and syslog(2). Since kernel 3.5.0 kmsg has been the default, which escapes control characters (e.g. new lines) before they are shown. This change means that when dmesg is using /dev/kmsg, a 2 line printk makes the output messy, because the second line does not get a timestamp. For example: [ 0.012863] CPU0: Thermal monitoring enabled (TM1) [ 0.012869] Last level iTLB entries: 4KB 1024, 2MB 1024, 4MB 1024 Last level dTLB entries: 4KB 1024, 2MB 1024, 4MB 1024, 1GB 4 [ 0.012958] Freeing SMP alternatives memory: 28K (ffffffff81d86000 - ffffffff81d8d000) [ 0.014961] dmar: Host address width 39 Because printk.c intentionally escapes control characters, they should not be there in the first place. This patch fixes two occurrences of this. Signed-off-by: Steven Honeyman <stevenhoneyman@gmail.com> Link: https://lkml.kernel.org/r/1414856696-8094-1-git-send-email-stevenhoneyman@gmail.com [ Boris: make cpu_detect_tlb() static, while at it. ] Signed-off-by: Borislav Petkov <bp@suse.de>
* / x86/x2apic: Split enable and setup functionThomas Gleixner2015-01-22
|/ | | | | | | | | | | | | | | | | enable_x2apic() is a convoluted unreadable mess because it is used for both enablement in early boot and for setup in cpu_init(). Split the code into x2apic_enable() for enablement and x2apic_setup() for setup of (secondary cpus). Make use of the new state tracking to simplify the logic. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/r/20150115211703.129287153@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* Merge branch 'x86-vdso-for-linus' of ↵Linus Torvalds2014-12-10
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 vdso updates from Ingo Molnar: "Various vDSO updates from Andy Lutomirski, mostly cleanups and reorganization to improve maintainability, but also some micro-optimizations and robustization changes" * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86_64/vsyscall: Restore orig_ax after vsyscall seccomp x86_64: Add a comment explaining the TASK_SIZE_MAX guard page x86_64,vsyscall: Make vsyscall emulation configurable x86_64, vsyscall: Rewrite comment and clean up headers in vsyscall code x86_64, vsyscall: Turn vsyscalls all the way off when vsyscall==none x86,vdso: Use LSL unconditionally for vgetcpu x86: vdso: Fix build with older gcc x86_64/vdso: Clean up vgetcpu init and merge the vdso initcalls x86_64/vdso: Remove jiffies from the vvar page x86/vdso: Make the PER_CPU segment 32 bits x86/vdso: Make the PER_CPU segment start out accessed x86/vdso: Change the PER_CPU segment to use struct desc_struct x86_64/vdso: Move getcpu code from vsyscall_64.c to vdso/vma.c x86_64/vsyscall: Move all of the gate_area code to vsyscall_64.c
| * x86,vdso: Use LSL unconditionally for vgetcpuAndy Lutomirski2014-11-03
| | | | | | | | | | | | | | | | | | | | LSL is faster than RDTSCP and works everywhere; there's no need to switch between them depending on CPU. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Andi Kleen <andi@firstfloor.org> Link: http://lkml.kernel.org/r/72f73d5ec4514e02bba345b9759177ef03742efb.1414706021.git.luto@amacapital.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | x86: Require exact match for 'noxsave' command line optionDave Hansen2014-11-16
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have some very similarly named command-line options: arch/x86/kernel/cpu/common.c:__setup("noxsave", x86_xsave_setup); arch/x86/kernel/cpu/common.c:__setup("noxsaveopt", x86_xsaveopt_setup); arch/x86/kernel/cpu/common.c:__setup("noxsaves", x86_xsaves_setup); __setup() is designed to match options that take arguments, like "foo=bar" where you would have: __setup("foo", x86_foo_func...); The problem is that "noxsave" actually _matches_ "noxsaves" in the same way that "foo" matches "foo=bar". If you boot an old kernel that does not know about "noxsaves" with "noxsaves" on the command line, it will interpret the argument as "noxsave", which is not what you want at all. This makes the "noxsave" handler only return success when it finds an *exact* match. [ tglx: We really need to make __setup() more robust. ] Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Hansen <dave@sr71.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: x86@kernel.org Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20141111220133.FE053984@viggo.jf.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* Merge branch 'for-3.18-consistent-ops' of ↵Linus Torvalds2014-10-15
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu consistent-ops changes from Tejun Heo: "Way back, before the current percpu allocator was implemented, static and dynamic percpu memory areas were allocated and handled separately and had their own accessors. The distinction has been gone for many years now; however, the now duplicate two sets of accessors remained with the pointer based ones - this_cpu_*() - evolving various other operations over time. During the process, we also accumulated other inconsistent operations. This pull request contains Christoph's patches to clean up the duplicate accessor situation. __get_cpu_var() uses are replaced with with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr(). Unfortunately, the former sometimes is tricky thanks to C being a bit messy with the distinction between lvalues and pointers, which led to a rather ugly solution for cpumask_var_t involving the introduction of this_cpu_cpumask_var_ptr(). This converts most of the uses but not all. Christoph will follow up with the remaining conversions in this merge window and hopefully remove the obsolete accessors" * 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits) irqchip: Properly fetch the per cpu offset percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write. percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t Revert "powerpc: Replace __get_cpu_var uses" percpu: Remove __this_cpu_ptr clocksource: Replace __this_cpu_ptr with raw_cpu_ptr sparc: Replace __get_cpu_var uses avr32: Replace __get_cpu_var with __this_cpu_write blackfin: Replace __get_cpu_var uses tile: Use this_cpu_ptr() for hardware counters tile: Replace __get_cpu_var uses powerpc: Replace __get_cpu_var uses alpha: Replace __get_cpu_var ia64: Replace __get_cpu_var uses s390: cio driver &__get_cpu_var replacements s390: Replace __get_cpu_var uses mips: Replace __get_cpu_var uses MIPS: Replace __get_cpu_var uses in FPU emulator. arm: Replace __this_cpu_ptr with raw_cpu_ptr ...
| * x86: Replace __get_cpu_var usesChristoph Lameter2014-08-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processors percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment. __get_cpu_var() is defined as : #define __get_cpu_var(var) (*this_cpu_ptr(&(var))) __get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables. This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and less registers are used when code is generated. Transformations done to __get_cpu_var() 1. Determine the address of the percpu instance of the current processor. DEFINE_PER_CPU(int, y); int *x = &__get_cpu_var(y); Converts to int *x = this_cpu_ptr(&y); 2. Same as #1 but this time an array structure is involved. DEFINE_PER_CPU(int, y[20]); int *x = __get_cpu_var(y); Converts to int *x = this_cpu_ptr(y); 3. Retrieve the content of the current processors instance of a per cpu variable. DEFINE_PER_CPU(int, y); int x = __get_cpu_var(y) Converts to int x = __this_cpu_read(y); 4. Retrieve the content of a percpu struct DEFINE_PER_CPU(struct mystruct, y); struct mystruct x = __get_cpu_var(y); Converts to memcpy(&x, this_cpu_ptr(&y), sizeof(x)); 5. Assignment to a per cpu variable DEFINE_PER_CPU(int, y) __get_cpu_var(y) = x; Converts to __this_cpu_write(y, x); 6. Increment/Decrement etc of a per cpu variable DEFINE_PER_CPU(int, y); __get_cpu_var(y)++ Converts to __this_cpu_inc(y) Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Acked-by: H. Peter Anvin <hpa@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* | Merge branch 'akpm' (patches from Andrew Morton)Linus Torvalds2014-10-13
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge second patch-bomb from Andrew Morton: - a few hotfixes - drivers/dma updates - MAINTAINERS updates - Quite a lot of lib/ updates - checkpatch updates - binfmt updates - autofs4 - drivers/rtc/ - various small tweaks to less used filesystems - ipc/ updates - kernel/watchdog.c changes * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (135 commits) mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared kernel/param: consolidate __{start,stop}___param[] in <linux/moduleparam.h> ia64: remove duplicate declarations of __per_cpu_start[] and __per_cpu_end[] frv: remove unused declarations of __start___ex_table and __stop___ex_table kvm: ensure hard lockup detection is disabled by default kernel/watchdog.c: control hard lockup detection default staging: rtl8192u: use %*pEn to escape buffer staging: rtl8192e: use %*pEn to escape buffer staging: wlan-ng: use %*pEhp to print SN lib80211: remove unused print_ssid() wireless: hostap: proc: print properly escaped SSID wireless: ipw2x00: print SSID via %*pE wireless: libertas: print esaped string via %*pE lib/vsprintf: add %*pE[achnops] format specifier lib / string_helpers: introduce string_escape_mem() lib / string_helpers: refactoring the test suite lib / string_helpers: move documentation to c-file include/linux: remove strict_strto* definitions arch/x86/mm/numa.c: fix boot failure when all nodes are hotpluggable fs: check bh blocknr earlier when searching lru ...
| * | arch/x86/kernel/cpu/common.c: fix unused symbol warningAndrew Morton2014-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | x86_64 allnoconfig: arch/x86/kernel/cpu/common.c:968: warning: 'syscall32_cpu_init' defined but not used Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2014-10-13
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc smaller fixes that missed the v3.17 cycle" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/build: Add arch/x86/purgatory/ make generated files to gitignore x86: Fix section conflict for numachip x86: Reject x32 executables if x32 ABI not supported x86_64, entry: Filter RFLAGS.NT on entry from userspace x86, boot, kaslr: Fix nuisance warning on 32-bit builds
| * | | x86_64, entry: Filter RFLAGS.NT on entry from userspaceAndy Lutomirski2014-10-06
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The NT flag doesn't do anything in long mode other than causing IRET to #GP. Oddly, CPL3 code can still set NT using popf. Entry via hardware or software interrupt clears NT automatically, so the only relevant entries are fast syscalls. If user code causes kernel code to run with NT set, then there's at least some (small) chance that it could cause trouble. For example, user code could cause a call to EFI code with NT set, and who knows what would happen? Apparently some games on Wine sometimes do this (!), and, if an IRET return happens, they will segfault. That segfault cannot be handled, because signal delivery fails, too. This patch programs the CPU to clear NT on entry via SYSCALL (both 32-bit and 64-bit, by my reading of the AMD APM), and it clears NT in software on entry via SYSENTER. To save a few cycles, this borrows a trick from Jan Beulich in Xen: it checks whether NT is set before trying to clear it. As a result, it seems to have very little effect on SYSENTER performance on my machine. There's another minor bug fix in here: it looks like the CFI annotations were wrong if CONFIG_AUDITSYSCALL=n. Testers beware: on Xen, SYSENTER with NT set turns into a GPF. I haven't touched anything on 32-bit kernels. The syscall mask change comes from a variant of this patch by Anish Bhatt. Note to stable maintainers: there is no known security issue here. A misguided program can set NT and cause the kernel to try and fail to deliver SIGSEGV, crashing the program. This patch fixes Far Cry on Wine: https://bugs.winehq.org/show_bug.cgi?id=33275 Cc: <stable@vger.kernel.org> Reported-by: Anish Bhatt <anish@chelsio.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/395749a5d39a29bd3e4b35899cf3a3c1340e5595.1412189265.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* | | Merge branch 'x86-cpufeature-for-linus' of ↵Linus Torvalds2014-10-13
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpufeature updates from Ingo Molnar: "This tree includes the following changes: - Introduce DISABLED_MASK to list disabled CPU features, to simplify CPU feature handling and avoid excessive #ifdefs - Remove the lightly used cpu_has_pae() primitive" * 'x86-cpufeature-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Add more disabled features x86: Introduce disabled-features x86: Axe the lightly-used cpu_has_pae
| * | x86: Add more disabled featuresDave Hansen2014-09-11
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The original motivation for these patches was for an Intel CPU feature called MPX. The patch to add a disabled feature for it will go in with the other parts of the support. But, in the meantime, there are a few other features than MPX that we can make assumptions about at compile-time based on compile options. Add them to disabled-features.h and check them with cpu_feature_enabled(). Note that this gets rid of the last things that needed an #ifdef CONFIG_X86_64 in cpufeature.h. Yay! Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: http://lkml.kernel.org/r/20140911211524.C0EC332A@viggo.jf.intel.com Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* | Merge branch 'x86-boot-for-linus' of ↵Linus Torvalds2014-10-13
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 bootup updates from Ingo Molnar: "The changes in this cycle were: - Fix rare SMP-boot hang (mostly in virtual environments) - Fix build warning with certain (rare) toolchains" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/relocs: Make per_cpu_load_addr static x86/smpboot: Initialize secondary CPU only if master CPU will wait for it
| * | x86/smpboot: Initialize secondary CPU only if master CPU will wait for itIgor Mammedov2014-09-16
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hang is observed on virtual machines during CPU hotplug, especially in big guests with many CPUs. (It reproducible more often if host is over-committed). It happens because master CPU gives up waiting on secondary CPU and allows it to run wild. As result AP causes locking or crashing system. For example as described here: https://lkml.org/lkml/2014/3/6/257 If master CPU have sent STARTUP IPI successfully, and AP signalled to master CPU that it's ready to start initialization, make master CPU wait indefinitely till AP is onlined. To ensure that AP won't ever run wild, make it wait at early startup till master CPU confirms its intention to wait for AP. If AP doesn't respond in 10 seconds, the master CPU will timeout and cancel AP onlining. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Acked-by: Toshi Kani <toshi.kani@hp.com> Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/1403266991-12233-1-git-send-email-imammedo@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* / x86: Support compiling out human-friendly processor feature namesJosh Triplett2014-08-17
|/ | | | | | | | | | | | | | | The table mapping CPUID bits to human-readable strings takes up a non-trivial amount of space, and only exists to support /proc/cpuinfo and a couple of kernel messages. Since programs depend on the format of /proc/cpuinfo, force inclusion of the table when building with /proc support; otherwise, support omitting that table to save space, in which case the kernel messages will print features numerically instead. In addition to saving 1408 bytes out of vmlinux, this also saves 1373 bytes out of the uncompressed setup code, which contributes directly to the size of bzImage. Signed-off-by: Josh Triplett <josh@joshtriplett.org>
* Merge branch 'x86-xsave-for-linus' of ↵Linus Torvalds2014-08-13
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/xsave changes from Peter Anvin: "This is a patchset to support the XSAVES instruction required to support context switch of supervisor-only features in upcoming silicon. This patchset missed the 3.16 merge window, which is why it is based on 3.15-rc7" * 'x86-xsave-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, xsave: Add forgotten inline annotation x86/xsaves: Clean up code in xstate offsets computation in xsave area x86/xsave: Make it clear that the XSAVE macros use (%edi)/(%rdi) Define kernel API to get address of each state in xsave area x86/xsaves: Enable xsaves/xrstors x86/xsaves: Call booting time xsaves and xrstors in setup_init_fpu_buf x86/xsaves: Save xstate to task's xsave area in __save_fpu during booting time x86/xsaves: Add xsaves and xrstors support for booting time x86/xsaves: Clear reserved bits in xsave header x86/xsaves: Use xsave/xrstor for saving and restoring user space context x86/xsaves: Use xsaves/xrstors for context switch x86/xsaves: Use xsaves/xrstors to save and restore xsave area x86/xsaves: Define a macro for handling xsave/xrstor instruction fault x86/xsaves: Define macros for xsave instructions x86/xsaves: Change compacted format xsave area header x86/alternative: Add alternative_input_2 to support alternative with two features and input x86/xsaves: Add a kernel parameter noxsaves to disable xsaves/xrstors
| * x86/xsaves: Add a kernel parameter noxsaves to disable xsaves/xrstorsFenghua Yu2014-05-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a kernel parameter noxsaves to disable xsaves/xrstors feature. The kernel will fall back to use xsaveopt and xrstor to save and restor xstates. By using this parameter, xsave area occupies more memory because standard form of xsave area in xsaveopt/xrstor occupies more memory than compacted form of xsave area. This patch adds a description of the kernel parameter noxsaveopt in doc. The code to support the parameter noxsaveopt has been in the kernel before. This patch just adds the description of this parameter in the doc. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-4-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* | Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds2014-08-04
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm changes from Ingo Molnar: "The main change in this cycle is the rework of the TLB range flushing code, to simplify, fix and consolidate the code. By Dave Hansen" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Set TLB flush tunable to sane value (33) x86/mm: New tunable for single vs full TLB flush x86/mm: Add tracepoints for TLB flushes x86/mm: Unify remote INVLPG code x86/mm: Fix missed global TLB flush stat x86/mm: Rip out complicated, out-of-date, buggy TLB flushing x86/mm: Clean up the TLB flushing code x86/smep: Be more informative when signalling an SMEP fault
| * | x86/mm: Rip out complicated, out-of-date, buggy TLB flushingDave Hansen2014-07-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I think the flush_tlb_mm_range() code that tries to tune the flush sizes based on the CPU needs to get ripped out for several reasons: 1. It is obviously buggy. It uses mm->total_vm to judge the task's footprint in the TLB. It should certainly be using some measure of RSS, *NOT* ->total_vm since only resident memory can populate the TLB. 2. Haswell, and several other CPUs are missing from the intel_tlb_flushall_shift_set() function. Thus, it has been demonstrated to bitrot quickly in practice. 3. It is plain wrong in my vm: [ 0.037444] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0 [ 0.037444] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0 [ 0.037444] tlb_flushall_shift: 6 Which leads to it to never use invlpg. 4. The assumptions about TLB refill costs are wrong: http://lkml.kernel.org/r/1337782555-8088-3-git-send-email-alex.shi@intel.com (more on this in later patches) 5. I can not reproduce the original data: https://lkml.org/lkml/2012/5/17/59 I believe the sample times were too short. Running the benchmark in a loop yields times that vary quite a bit. Note that this leaves us with a static ceiling of 1 page. This is a conservative, dumb setting, and will be revised in a later patch. This also removes the code which attempts to predict whether we are flushing data or instructions. We expect instruction flushes to be relatively rare and not worth tuning for explicitly. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: http://lkml.kernel.org/r/20140731154055.ABC88E89@viggo.jf.intel.com Acked-by: Rik van Riel <riel@redhat.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* | | Merge tag 'v3.16-rc1' into x86/cpufeatureH. Peter Anvin2014-06-18
|\ \ \ | |_|/ |/| | | | | | | | | | | Linux 3.16-rc1 Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
| * | Merge branch 'perf-core-for-linus' of ↵Linus Torvalds2014-06-12
| |\ \ | | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more perf updates from Ingo Molnar: "A second round of perf updates: - wide reaching kprobes sanitization and robustization, with the hope of fixing all 'probe this function crashes the kernel' bugs, by Masami Hiramatsu. - uprobes updates from Oleg Nesterov: tmpfs support, corner case fixes and robustization work. - perf tooling updates and fixes from Jiri Olsa, Namhyung Ki, Arnaldo et al: * Add support to accumulate hist periods (Namhyung Kim) * various fixes, refactorings and enhancements" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits) perf: Differentiate exec() and non-exec() comm events perf: Fix perf_event_comm() vs. exec() assumption uprobes/x86: Rename arch_uprobe->def to ->defparam, minor comment updates perf/documentation: Add description for conditional branch filter perf/x86: Add conditional branch filtering support perf/tool: Add conditional branch filter 'cond' to perf record perf: Add new conditional branch filter 'PERF_SAMPLE_BRANCH_COND' uprobes: Teach copy_insn() to support tmpfs uprobes: Shift ->readpage check from __copy_insn() to uprobe_register() perf/x86: Use common PMU interrupt disabled code perf/ARM: Use common PMU interrupt disabled code perf: Disable sampled events if no PMU interrupt perf: Fix use after free in perf_remove_from_context() perf tools: Fix 'make help' message error perf record: Fix poll return value propagation perf tools: Move elide bool into perf_hpp_fmt struct perf tools: Remove elide setup for SORT_MODE__MEMORY mode perf tools: Fix "==" into "=" in ui_browser__warning assignment perf tools: Allow overriding sysfs and proc finding with env var perf tools: Consider header files outside perf directory in tags target ...
| | * kprobes, x86: Prohibit probing on debug_stack_*()Masami Hiramatsu2014-04-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prohibit probing on debug_stack_reset and debug_stack_set_zero. Since the both functions are called from TRACE_IRQS_ON/OFF_DEBUG macros which run in int3 ist entry, probing it may cause a soft lockup. This happens when the kernel built with CONFIG_DYNAMIC_FTRACE=y and CONFIG_TRACE_IRQFLAGS=y. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Borislav Petkov <bp@suse.de> Cc: Jan Beulich <JBeulich@suse.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Link: http://lkml.kernel.org/r/20140417081712.26341.32994.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | x86, vdso: Move the vvar and hpet mappings next to the 64-bit vDSOAndy Lutomirski2014-05-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | This makes the 64-bit and x32 vdsos use the same mechanism as the 32-bit vdso. Most of the churn is deleting all the old fixmap code. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/8af87023f57f6bb96ec8d17fce3f88018195b49b.1399317206.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
| * | x86, vdso: Move syscall and sysenter setup into kernel/cpu/common.cAndy Lutomirski2014-05-05
| |/ | | | | | | | | | | | | | | | | | | | | | | | | This code is used during CPU setup, and it isn't strictly speaking related to the 32-bit vdso. It's easier to understand how this works when the code is closer to its callers. This also lets syscall32_cpu_init be static, which might save some trivial amount of kernel text. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/4e466987204e232d7b55a53ff6b9739f12237461.1399317206.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* / x86/xsaves: Detect xsaves/xrstors featureFenghua Yu2014-05-29
|/ | | | | | | | | | | | | | | | | | | | | Detect the xsaveopt, xsavec, xgetbv, and xsaves features in processor extended state enumberation sub-leaf (eax=0x0d, ecx=1): Bit 00: XSAVEOPT is available Bit 01: Supports XSAVEC and the compacted form of XRSTOR if set Bit 02: Supports XGETBV with ECX = 1 if set Bit 03: Supports XSAVES/XRSTORS and IA32_XSS if set The above features are defined in the new word 10 in cpu features. The IA32_XSS MSR (index DA0H) contains a state-component bitmap that specifies the state components that software has enabled xsaves and xrstors to manage. If the bit corresponding to a state component is clear in XCR0 | IA32_XSS, xsaves and xrstors will not operate on that state component, regardless of the value of the instruction mask. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1401387164-43416-3-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* Merge branch 'x86-threadinfo-for-linus' of ↵Linus Torvalds2014-04-01
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 threadinfo changes from Ingo Molnar: "The main change here is the consolidation/unification of 32 and 64 bit thread_info handling methods, from Steve Rostedt" * 'x86-threadinfo-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, threadinfo: Redo "x86: Use inline assembler to get sp" x86: Clean up dumpstack_64.c code x86: Keep thread_info on thread stack in x86_32 x86: Prepare removal of previous_esp from i386 thread_info structure x86: Nuke GET_THREAD_INFO_WITH_ESP() macro for i386 x86: Nuke the supervisor_stack field in i386 thread_info
| * x86: Keep thread_info on thread stack in x86_32Steven Rostedt2014-03-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | x86_64 uses a per_cpu variable kernel_stack to always point to the thread stack of current. This is where the thread_info is stored and is accessed from this location even when the irq or exception stack is in use. This removes the complexity of having to maintain the thread info on the stack when interrupts are running and having to copy the preempt_count and other fields to the interrupt stack. x86_32 uses the old method of copying the thread_info from the thread stack to the exception stack just before executing the exception. Having the two different requires #ifdefs and also the x86_32 way is a bit of a pain to maintain. By converting x86_32 to the same method of x86_64, we can remove #ifdefs, clean up the x86_32 code a little, and remove the overhead of the copy. Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Brian Gerst <brgerst@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20110806012354.263834829@goodmis.org Link: http://lkml.kernel.org/r/20140206144321.852942014@goodmis.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* | x86, cpufeature: If we disable CLFLUSH, we should disable CLFLUSHOPTH. Peter Anvin2014-02-27
| | | | | | | | | | | | | | | | | | If we explicitly disable the use of CLFLUSH, we should disable the use of CLFLUSHOPT as well. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/n/tip-jtdv7btppr4jgzxm3sxx1e74@git.kernel.org
* | x86, cpufeature: Rename X86_FEATURE_CLFLSH to X86_FEATURE_CLFLUSHH. Peter Anvin2014-02-27
|/ | | | | | | | | | We call this "clflush" in /proc/cpuinfo, and have cpu_has_clflush()... let's be consistent and just call it that. Cc: Gleb Natapov <gleb@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Alan Cox <alan@linux.intel.com> Link: http://lkml.kernel.org/n/tip-mlytfzjkvuf739okyn40p8a5@git.kernel.org
* x86, smap: Don't enable SMAP if CONFIG_X86_SMAP is disabledH. Peter Anvin2014-02-13
| | | | | | | | | | | | | If SMAP support is not compiled into the kernel, don't enable SMAP in CR4 -- in fact, we should clear it, because the kernel doesn't contain the proper STAC/CLAC instructions for SMAP support. Found by Fengguang Wu's test system. Reported-by: Fengguang Wu <fengguang.wu@intel.com> Link: http://lkml.kernel.org/r/20140213124550.GA30497@localhost Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> # v3.7+