aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/include
Commit message (Collapse)AuthorAge
...
| | * | | | | KVM: x86: adjust kvm_mmu_page member to save 8 bytesWei Yang2018-10-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On a 64bits machine, struct is naturally aligned with 8 bytes. Since kvm_mmu_page member *unsync* and *role* are less then 4 bytes, we can rearrange the sequence to compace the struct. As the comment shows, *role* and *gfn* are used to key the shadow page. In order to keep the comment valid, this patch moves the *unsync* up and exchange the position of *role* and *gfn*. From /proc/slabinfo, it shows the size of kvm_mmu_page is 8 bytes less and with one more object per slap after applying this patch. # name <active_objs> <num_objs> <objsize> <objperslab> kvm_mmu_page_header 0 0 168 24 kvm_mmu_page_header 0 0 160 25 Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| | * | | | | KVM: nVMX: Clear reserved bits of #DB exit qualificationJim Mattson2018-10-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to volume 3 of the SDM, bits 63:15 and 12:4 of the exit qualification field for debug exceptions are reserved (cleared to 0). However, the SDM is incorrect about bit 16 (corresponding to DR6.RTM). This bit should be set if a debug exception (#DB) or a breakpoint exception (#BP) occurred inside an RTM region while advanced debugging of RTM transactional regions was enabled. Note that this is the opposite of DR6.RTM, which "indicates (when clear) that a debug exception (#DB) or breakpoint exception (#BP) occurred inside an RTM region while advanced debugging of RTM transactional regions was enabled." There is still an issue with stale DR6 bits potentially being misreported for the current debug exception. DR6 should not have been modified before vectoring the #DB exception, and the "new DR6 bits" should be available somewhere, but it was and they aren't. Fixes: b96fb439774e1 ("KVM: nVMX: fixes to nested virt interrupt injection") Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| * | | | | | Merge branch 'timers-core-for-linus' of ↵Linus Torvalds2018-10-25
| |\ \ \ \ \ \ | | |_|_|_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timekeeping updates from Thomas Gleixner: "The timers and timekeeping departement provides: - Another large y2038 update with further preparations for providing the y2038 safe timespecs closer to the syscalls. - An overhaul of the SHCMT clocksource driver - SPDX license identifier updates - Small cleanups and fixes all over the place" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) tick/sched : Remove redundant cpu_online() check clocksource/drivers/dw_apb: Add reset control clocksource: Remove obsolete CLOCKSOURCE_OF_DECLARE clocksource/drivers: Unify the names to timer-* format clocksource/drivers/sh_cmt: Add R-Car gen3 support dt-bindings: timer: renesas: cmt: document R-Car gen3 support clocksource/drivers/sh_cmt: Properly line-wrap sh_cmt_of_table[] initializer clocksource/drivers/sh_cmt: Fix clocksource width for 32-bit machines clocksource/drivers/sh_cmt: Fixup for 64-bit machines clocksource/drivers/sh_tmu: Convert to SPDX identifiers clocksource/drivers/sh_mtu2: Convert to SPDX identifiers clocksource/drivers/sh_cmt: Convert to SPDX identifiers clocksource/drivers/renesas-ostm: Convert to SPDX identifiers clocksource: Convert to using %pOFn instead of device_node.name tick/broadcast: Remove redundant check RISC-V: Request newstat syscalls y2038: signal: Change rt_sigtimedwait to use __kernel_timespec y2038: socket: Change recvmmsg to use __kernel_timespec y2038: sched: Change sched_rr_get_interval to use __kernel_timespec y2038: utimes: Rework #ifdef guards for compat syscalls ...
| | * | | | | y2038: utimes: Rework #ifdef guards for compat syscallsArnd Bergmann2018-08-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After changing over to 64-bit time_t syscalls, many architectures will want compat_sys_utimensat() but not respective handlers for utime(), utimes() and futimesat(). This adds a new __ARCH_WANT_SYS_UTIME32 to complement __ARCH_WANT_SYS_UTIME. For now, all 64-bit architectures that support CONFIG_COMPAT set it, but future 64-bit architectures will not (tile would not have needed it either, but got removed). As older 32-bit architectures get converted to using CONFIG_64BIT_TIME, they will have to use __ARCH_WANT_SYS_UTIME32 instead of __ARCH_WANT_SYS_UTIME. Architectures using the generic syscall ABI don't need either of them as they never had a utime syscall. Since the compat_utimbuf structure is now required outside of CONFIG_COMPAT, I'm moving it into compat_time.h. Signed-off-by: Arnd Bergmann <arnd@arndb.de> --- changed from last version: - renamed __ARCH_WANT_COMPAT_SYS_UTIME to __ARCH_WANT_SYS_UTIME32
| | * | | | | asm-generic: Remove unneeded __ARCH_WANT_SYS_LLSEEK macroArnd Bergmann2018-08-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sys_llseek sytem call is needed on all 32-bit architectures and none of the 64-bit ones, so we can remove the __ARCH_WANT_SYS_LLSEEK guard and simplify the include/asm-generic/unistd.h header further. Since 32-bit tasks can run either natively or in compat mode on 64-bit architectures, we have to check for both !CONFIG_64BIT and CONFIG_COMPAT. There are a few 64-bit architectures that also reference sys_llseek in their 64-bit ABI (e.g. sparc), but I verified that those all select CONFIG_COMPAT, so the #if check is still correct here. It's a bit odd to include it in the syscall table though, as it's the same as sys_lseek() on 64-bit, but with strange calling conventions. Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
| | * | | | | asm-generic: Move common compat types to asm-generic/compat.hArnd Bergmann2018-08-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While converting compat system call handlers to work on 32-bit architectures, I found a number of types used in those handlers that are identical between all architectures. Let's move all the identical ones into asm-generic/compat.h to avoid having to add even more identical definitions of those types. For unknown reasons, mips defines __compat_gid32_t, __compat_uid32_t and compat_caddr_t as signed, while all others have them unsigned. This seems to be a mistake, but I'm leaving it alone here. The other types all differ by size or alignment on at least on architecture. compat_aio_context_t is currently defined in linux/compat.h but also needed for compat_sys_io_getevents(), so let's move it into the same place. While we still have not decided whether the 32-bit time handling will always use the compat syscalls, or in which form, I think this is a useful cleanup that we can merge regardless. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
| | * | | | | y2038: Remove newstat family from default syscall setArnd Bergmann2018-08-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have four generations of stat() syscalls: - the oldstat syscalls that are only used on the older architectures - the newstat family that is used on all 64-bit architectures but lacked support for large files on 32-bit architectures. - the stat64 family that is used mostly on 32-bit architectures to replace newstat - statx() to replace all of the above, adding 64-bit timestamps among other things. We already compile stat64 only on those architectures that need it, but newstat is always built, including on those that don't reference it. This adds a new __ARCH_WANT_NEW_STAT symbol along the lines of __ARCH_WANT_OLD_STAT and __ARCH_WANT_STAT64 to control compilation of newstat. All architectures that need it use an explict define, the others now get a little bit smaller, and future architecture (including 64-bit targets) won't ever see it. Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
| * | | | | | Merge branch 'siginfo-linus' of ↵Linus Torvalds2018-10-24
| |\ \ \ \ \ \ | | |_|_|_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull siginfo updates from Eric Biederman: "I have been slowly sorting out siginfo and this is the culmination of that work. The primary result is in several ways the signal infrastructure has been made less error prone. The code has been updated so that manually specifying SEND_SIG_FORCED is never necessary. The conversion to the new siginfo sending functions is now complete, which makes it difficult to send a signal without filling in the proper siginfo fields. At the tail end of the patchset comes the optimization of decreasing the size of struct siginfo in the kernel from 128 bytes to about 48 bytes on 64bit. The fundamental observation that enables this is by definition none of the known ways to use struct siginfo uses the extra bytes. This comes at the cost of a small user space observable difference. For the rare case of siginfo being injected into the kernel only what can be copied into kernel_siginfo is delivered to the destination, the rest of the bytes are set to 0. For cases where the signal and the si_code are known this is safe, because we know those bytes are not used. For cases where the signal and si_code combination is unknown the bits that won't fit into struct kernel_siginfo are tested to verify they are zero, and the send fails if they are not. I made an extensive search through userspace code and I could not find anything that would break because of the above change. If it turns out I did break something it will take just the revert of a single change to restore kernel_siginfo to the same size as userspace siginfo. Testing did reveal dependencies on preferring the signo passed to sigqueueinfo over si->signo, so bit the bullet and added the complexity necessary to handle that case. Testing also revealed bad things can happen if a negative signal number is passed into the system calls. Something no sane application will do but something a malicious program or a fuzzer might do. So I have fixed the code that performs the bounds checks to ensure negative signal numbers are handled" * 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (80 commits) signal: Guard against negative signal numbers in copy_siginfo_from_user32 signal: Guard against negative signal numbers in copy_siginfo_from_user signal: In sigqueueinfo prefer sig not si_signo signal: Use a smaller struct siginfo in the kernel signal: Distinguish between kernel_siginfo and siginfo signal: Introduce copy_siginfo_from_user and use it's return value signal: Remove the need for __ARCH_SI_PREABLE_SIZE and SI_PAD_SIZE signal: Fail sigqueueinfo if si_signo != sig signal/sparc: Move EMT_TAGOVF into the generic siginfo.h signal/unicore32: Use force_sig_fault where appropriate signal/unicore32: Generate siginfo in ucs32_notify_die signal/unicore32: Use send_sig_fault where appropriate signal/arc: Use force_sig_fault where appropriate signal/arc: Push siginfo generation into unhandled_exception signal/ia64: Use force_sig_fault where appropriate signal/ia64: Use the force_sig(SIGSEGV,...) in ia64_rt_sigreturn signal/ia64: Use the generic force_sigsegv in setup_frame signal/arm/kvm: Use send_sig_mceerr signal/arm: Use send_sig_fault where appropriate signal/arm: Use force_sig_fault where appropriate ...
| | * | | | | signal: Distinguish between kernel_siginfo and siginfoEric W. Biederman2018-10-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Linus recently observed that if we did not worry about the padding member in struct siginfo it is only about 48 bytes, and 48 bytes is much nicer than 128 bytes for allocating on the stack and copying around in the kernel. The obvious thing of only adding the padding when userspace is including siginfo.h won't work as there are sigframe definitions in the kernel that embed struct siginfo. So split siginfo in two; kernel_siginfo and siginfo. Keeping the traditional name for the userspace definition. While the version that is used internally to the kernel and ultimately will not be padded to 128 bytes is called kernel_siginfo. The definition of struct kernel_siginfo I have put in include/signal_types.h A set of buildtime checks has been added to verify the two structures have the same field offsets. To make it easy to verify the change kernel_siginfo retains the same size as siginfo. The reduction in size comes in a following change. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
| | * | | | | signal: Remove the need for __ARCH_SI_PREABLE_SIZE and SI_PAD_SIZEEric W. Biederman2018-10-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rework the defintion of struct siginfo so that the array padding struct siginfo to SI_MAX_SIZE can be placed in a union along side of the rest of the struct siginfo members. The result is that we no longer need the __ARCH_SI_PREAMBLE_SIZE or SI_PAD_SIZE definitions. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
| | * | | | | signal/x86: Move mpx siginfo generation into do_boundsEric W. Biederman2018-09-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This separates the logic of generating the signal from the logic of gathering the information about the bounds violation. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
| | * | | | | signal/x86: In trace_mpx_bounds_register_exception add __user annotationsEric W. Biederman2018-09-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The value passed in to addr_referenced is of type void __user *, so update the addr_referenced parameter in trace_mpx_bounds_register_exception to match. Also update the addr_referenced paramater in TP_STRUCT__entry as it again holdes the same value. I don't know why this was missed earlier but sparse was complaining when testing test branch so fix this now. Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
| | * | | | | signal: Simplify tracehook_report_syscall_exitEric W. Biederman2018-09-19
| | |/ / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace user_single_step_siginfo with user_single_step_report that allocates siginfo structure on the stack and sends it. This allows tracehook_report_syscall_exit to become a simple if statement that calls user_single_step_report or ptrace_report_syscall depending on the value of step. Update the default helper function now called user_single_step_report to explicitly set si_code to SI_USER and to set si_uid and si_pid to 0. The default helper has always been doing this (using memset) but it was far from obvious. The powerpc helper can now just call force_sig_fault. The x86 helper can now just call send_sigtrap. Unfortunately the default implementation of user_single_step_report can not use force_sig_fault as it does not use a SIGTRAP si_code. So it has to carefully setup the siginfo and use use force_sig_info. The net result is code that is easier to understand and simpler to maintain. Ref: 85ec7fd9f8e5 ("ptrace: introduce user_single_step_siginfo() helper") Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* | | | | | x86/cpufeatures: Enumerate MOVDIR64B instructionFenghua Yu2018-10-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MOVDIR64B moves 64-bytes as direct-store with 64-bytes write atomicity. Direct store is implemented by using write combining (WC) for writing data directly into memory without caching the data. In low latency offload (e.g. Non-Volatile Memory, etc), MOVDIR64B writes work descriptors (and data in some cases) to device-hosted work-queues atomically without cache pollution. Availability of the MOVDIR64B instruction is indicated by the presence of the CPUID feature flag MOVDIR64B (CPUID.0x07.0x0:ECX[bit 28]). Please check the latest Intel Architecture Instruction Set Extensions and Future Features Programming Reference for more details on the CPUID feature MOVDIR64B flag. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi V Shankar <ravi.v.shankar@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1540418237-125817-3-git-send-email-fenghua.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | x86/cpufeatures: Enumerate MOVDIRI instructionFenghua Yu2018-10-25
|/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MOVDIRI moves doubleword or quadword from register to memory through direct store which is implemented by using write combining (WC) for writing data directly into memory without caching the data. Programmable agents can handle streaming offload (e.g. high speed packet processing in network). Hardware implements a doorbell (tail pointer) register that is updated by software when adding new work-elements to the streaming offload work-queue. MOVDIRI can be used as the doorbell write which is a 4-byte or 8-byte uncachable write to MMIO. MOVDIRI has lower overhead than other ways to write the doorbell. Availability of the MOVDIRI instruction is indicated by the presence of the CPUID feature flag MOVDIRI(CPUID.0x07.0x0:ECX[bit 27]). Please check the latest Intel Architecture Instruction Set Extensions and Future Features Programming Reference for more details on the CPUID feature MOVDIRI flag. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Ashok Raj <ashok.raj@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi V Shankar <ravi.v.shankar@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1540418237-125817-2-git-send-email-fenghua.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | Merge branch 'x86-vdso-for-linus' of ↵Linus Torvalds2018-10-23
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 vdso updates from Ingo Molnar: "Two main changes: - Cleanups, simplifications and CLOCK_TAI support (Thomas Gleixner) - Improve code generation (Andy Lutomirski)" * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/vdso: Rearrange do_hres() to improve code generation x86/vdso: Document vgtod_ts better x86/vdso: Remove "memory" clobbers in the vDSO syscall fallbacks x66/vdso: Add CLOCK_TAI support x86/vdso: Move cycle_last handling into the caller x86/vdso: Simplify the invalid vclock case x86/vdso: Replace the clockid switch case x86/vdso: Collapse coarse functions x86/vdso: Collapse high resolution functions x86/vdso: Introduce and use vgtod_ts x86/vdso: Use unsigned int consistently for vsyscall_gtod_data:: Seq x86/vdso: Enforce 64bit clocksource x86/time: Implement clocksource_arch_init() clocksource: Provide clocksource_arch_init()
| * | | | | x86/vdso: Document vgtod_ts betterAndy Lutomirski2018-10-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After reading do_hres() and do_course() and scratching my head a bit, I figured out why the arithmetic is strange. Document it. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/f66f53d81150bbad47d7b282c9207a71a3ce1c16.1538689401.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | x66/vdso: Add CLOCK_TAI supportThomas Gleixner2018-10-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the storage array in place it's now trivial to support CLOCK_TAI in the vdso. Extend the base time storage array and add the update code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Matt Rickard <matt@softrans.com.au> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Boyd <sboyd@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: devel@linuxdriverproject.org Cc: virtualization@lists.linux-foundation.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Juergen Gross <jgross@suse.com> Link: https://lkml.kernel.org/r/20180917130707.823878601@linutronix.de
| * | | | | x86/vdso: Introduce and use vgtod_tsThomas Gleixner2018-10-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's desired to support more clocks in the VDSO, e.g. CLOCK_TAI. This results either in indirect calls due to the larger switch case, which then requires retpolines or when the compiler is forced to avoid jump tables it results in even more conditionals. To avoid both variants which are bad for performance the high resolution functions and the coarse grained functions will be collapsed into one for each. That requires to store the clock specific base time in an array. Introcude struct vgtod_ts for storage and convert the data store, the update function and the individual clock functions over to use it. The new storage does not longer use gtod_long_t for seconds depending on 32 or 64 bit compile because this needs to be the full 64bit value even for 32bit when a Y2038 function is added. No point in keeping the distinction alive in the internal representation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Matt Rickard <matt@softrans.com.au> Cc: Stephen Boyd <sboyd@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: devel@linuxdriverproject.org Cc: virtualization@lists.linux-foundation.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Juergen Gross <jgross@suse.com> Link: https://lkml.kernel.org/r/20180917130707.324679401@linutronix.de
| * | | | | x86/vdso: Use unsigned int consistently for vsyscall_gtod_data:: SeqThomas Gleixner2018-10-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sequence count in vgtod_data is unsigned int, but the call sites use unsigned long, which is a pointless exercise. Fix the call sites and replace 'unsigned' with unsinged 'int' while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Matt Rickard <matt@softrans.com.au> Cc: Stephen Boyd <sboyd@kernel.org> Cc: John Stultz <john.stultz@linaro.org> Cc: Florian Weimer <fweimer@redhat.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: devel@linuxdriverproject.org Cc: virtualization@lists.linux-foundation.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Juergen Gross <jgross@suse.com> Link: https://lkml.kernel.org/r/20180917130707.236250416@linutronix.de
* | | | | | Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds2018-10-23
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 pti updates from Ingo Molnar: "The main changes: - Make the IBPB barrier more strict and add STIBP support (Jiri Kosina) - Micro-optimize and clean up the entry code (Andy Lutomirski) - ... plus misc other fixes" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/speculation: Propagate information about RSB filling mitigation to sysfs x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation x86/speculation: Apply IBPB more strictly to avoid cross-process data leak x86/speculation: Add RETPOLINE_AMD support to the inline asm CALL_NOSPEC variant x86/CPU: Fix unused variable warning when !CONFIG_IA32_EMULATION x86/pti/64: Remove the SYSCALL64 entry trampoline x86/entry/64: Use the TSS sp2 slot for SYSCALL/SYSRET scratch space x86/entry/64: Document idtentry
| * | | | | | x86/speculation: Add RETPOLINE_AMD support to the inline asm CALL_NOSPEC variantZhenzhong Duan2018-09-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ..so that they match their asm counterpart. Add the missing ANNOTATE_NOSPEC_ALTERNATIVE in CALL_NOSPEC, while at it. Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Wang YanQing <udknight@gmail.com> Cc: dhaval.giani@oracle.com Cc: srinivas.eeda@oracle.com Link: http://lkml.kernel.org/r/c3975665-173e-4d70-8dee-06c926ac26ee@default
| * | | | | | x86/pti/64: Remove the SYSCALL64 entry trampolineAndy Lutomirski2018-09-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SYSCALL64 trampoline has a couple of nice properties: - The usual sequence of SWAPGS followed by two GS-relative accesses to set up RSP is somewhat slow because the GS-relative accesses need to wait for SWAPGS to finish. The trampoline approach allows RIP-relative accesses to set up RSP, which avoids the stall. - The trampoline avoids any percpu access before CR3 is set up, which means that no percpu memory needs to be mapped in the user page tables. This prevents using Meltdown to read any percpu memory outside the cpu_entry_area and prevents using timing leaks to directly locate the percpu areas. The downsides of using a trampoline may outweigh the upsides, however. It adds an extra non-contiguous I$ cache line to system calls, and it forces an indirect jump to transfer control back to the normal kernel text after CR3 is set up. The latter is because x86 lacks a 64-bit direct jump instruction that could jump from the trampoline to the entry text. With retpolines enabled, the indirect jump is extremely slow. Change the code to map the percpu TSS into the user page tables to allow the non-trampoline SYSCALL64 path to work under PTI. This does not add a new direct information leak, since the TSS is readable by Meltdown from the cpu_entry_area alias regardless. It does allow a timing attack to locate the percpu area, but KASLR is more or less a lost cause against local attack on CPUs vulnerable to Meltdown regardless. As far as I'm concerned, on current hardware, KASLR is only useful to mitigate remote attacks that try to attack the kernel without first gaining RCE against a vulnerable user process. On Skylake, with CONFIG_RETPOLINE=y and KPTI on, this reduces syscall overhead from ~237ns to ~228ns. There is a possible alternative approach: Move the trampoline within 2G of the entry text and make a separate copy for each CPU. This would allow a direct jump to rejoin the normal entry path. There are pro's and con's for this approach: + It avoids a pipeline stall - It executes from an extra page and read from another extra page during the syscall. The latter is because it needs to use a relative addressing mode to find sp1 -- it's the same *cacheline*, but accessed using an alias, so it's an extra TLB entry. - Slightly more memory. This would be one page per CPU for a simple implementation and 64-ish bytes per CPU or one page per node for a more complex implementation. - More code complexity. The current approach is chosen for simplicity and because the alternative does not provide a significant benefit, which makes it worth. [ tglx: Added the alternative discussion to the changelog ] Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/8c7c6e483612c3e4e10ca89495dc160b1aa66878.1536015544.git.luto@kernel.org
| * | | | | | x86/entry/64: Use the TSS sp2 slot for SYSCALL/SYSRET scratch spaceAndy Lutomirski2018-09-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the non-trampoline SYSCALL64 path, a percpu variable is used to temporarily store the user RSP value. Instead of a separate variable, use the otherwise unused sp2 slot in the TSS. This will improve cache locality, as the sp1 slot is already used in the same code to find the kernel stack. It will also simplify a future change to make the non-trampoline path work in PTI mode. Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lkml.kernel.org/r/08e769a0023dbad4bac6f34f3631dbaf8ad59f4f.1536015544.git.luto@kernel.org
* | | | | | | Merge branch 'x86-paravirt-for-linus' of ↵Linus Torvalds2018-10-23
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 paravirt updates from Ingo Molnar: "Two main changes: - Remove no longer used parts of the paravirt infrastructure and put large quantities of paravirt ops under a new config option PARAVIRT_XXL=y, which is selected by XEN_PV only. (Joergen Gross) - Enable PV spinlocks on Hyperv (Yi Sun)" * 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/hyperv: Enable PV qspinlock for Hyper-V x86/hyperv: Add GUEST_IDLE_MSR support x86/paravirt: Clean up native_patch() x86/paravirt: Prevent redefinition of SAVE_FLAGS macro x86/xen: Make xen_reservation_lock static x86/paravirt: Remove unneeded mmu related paravirt ops bits x86/paravirt: Move the Xen-only pv_mmu_ops under the PARAVIRT_XXL umbrella x86/paravirt: Move the pv_irq_ops under the PARAVIRT_XXL umbrella x86/paravirt: Move the Xen-only pv_cpu_ops under the PARAVIRT_XXL umbrella x86/paravirt: Move items in pv_info under PARAVIRT_XXL umbrella x86/paravirt: Introduce new config option PARAVIRT_XXL x86/paravirt: Remove unused paravirt bits x86/paravirt: Use a single ops structure x86/paravirt: Remove clobbers from struct paravirt_patch_site x86/paravirt: Remove clobbers parameter from paravirt patch functions x86/paravirt: Make paravirt_patch_call() and paravirt_patch_jmp() static x86/xen: Add SPDX identifier in arch/x86/xen files x86/xen: Link platform-pci-unplug.o only if CONFIG_XEN_PVHVM x86/xen: Move pv specific parts of arch/x86/xen/mmu.c to mmu_pv.c x86/xen: Move pv irq related functions under CONFIG_XEN_PV umbrella
| * | | | | | | x86/hyperv: Enable PV qspinlock for Hyper-VYi Sun2018-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement the required wait and kick callbacks to support PV spinlocks in Hyper-V guests. [ tglx: Document the requirement for disabling interrupts in the wait() callback. Remove goto and unnecessary includes. Add prototype for hv_vcpu_is_preempted(). Adapted to pending paravirt changes. ] Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Michael Kelley (EOSG) <Michael.H.Kelley@microsoft.com> Cc: chao.p.peng@intel.com Cc: chao.gao@intel.com Cc: isaku.yamahata@intel.com Cc: tianyu.lan@microsoft.com Link: https://lkml.kernel.org/r/1538987374-51217-3-git-send-email-yi.y.sun@linux.intel.com
| * | | | | | | x86/hyperv: Add GUEST_IDLE_MSR supportYi Sun2018-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hyper-V may expose a HV_X64_MSR_GUEST_IDLE MSR via HYPERV_CPUID_FEATURES. Reading this MSR triggers the host to transition the guest vCPU into an idle state. This state can be exited via an IPI even if the read in the guest happened from an interrupt disabled section. Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Cc: chao.p.peng@intel.com Cc: chao.gao@intel.com Cc: isaku.yamahata@intel.com Cc: tianyu.lan@microsoft.com Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Link: https://lkml.kernel.org/r/1538028104-114050-2-git-send-email-yi.y.sun@linux.intel.com
| * | | | | | | x86/paravirt: Prevent redefinition of SAVE_FLAGS macroJuergen Gross2018-09-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The PARAVIRT_XXL changes introduced a redefinition of SAVE_FLAGS under certain configurations. Cure it Fixes: 6da63eb241a0 ("x86/paravirt: Move the pv_irq_ops under the PARAVIRT_XXL umbrella"). Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180905053720.13710-1-jgross@suse.com
| * | | | | | | x86/paravirt: Remove unneeded mmu related paravirt ops bitsJuergen Gross2018-09-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no need to have 32-bit code for CONFIG_PGTABLE_LEVELS >= 4. Remove it. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: virtualization@lists.linux-foundation.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: boris.ostrovsky@oracle.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180828074026.820-16-jgross@suse.com
| * | | | | | | x86/paravirt: Move the Xen-only pv_mmu_ops under the PARAVIRT_XXL umbrellaJuergen Gross2018-09-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most of the paravirt ops defined in pv_mmu_ops are for Xen PV guests only. Define them only if CONFIG_PARAVIRT_XXL is set. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: virtualization@lists.linux-foundation.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: boris.ostrovsky@oracle.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180828074026.820-15-jgross@suse.com
| * | | | | | | x86/paravirt: Move the pv_irq_ops under the PARAVIRT_XXL umbrellaJuergen Gross2018-09-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All of the paravirt ops defined in pv_irq_ops are for Xen PV guests or VSMP only. Define them only if CONFIG_PARAVIRT_XXL is set. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: virtualization@lists.linux-foundation.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: boris.ostrovsky@oracle.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180828074026.820-14-jgross@suse.com
| * | | | | | | x86/paravirt: Move the Xen-only pv_cpu_ops under the PARAVIRT_XXL umbrellaJuergen Gross2018-09-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most of the paravirt ops defined in pv_cpu_ops are for Xen PV guests only. Define them only if CONFIG_PARAVIRT_XXL is set. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: virtualization@lists.linux-foundation.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: boris.ostrovsky@oracle.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180828074026.820-13-jgross@suse.com
| * | | | | | | x86/paravirt: Move items in pv_info under PARAVIRT_XXL umbrellaJuergen Gross2018-09-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All items but name in pv_info are needed by Xen PV only. Define them with CONFIG_PARAVIRT_XXL set only. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: virtualization@lists.linux-foundation.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: boris.ostrovsky@oracle.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180828074026.820-12-jgross@suse.com
| * | | | | | | x86/paravirt: Remove unused paravirt bitsJuergen Gross2018-09-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The macros ENABLE_INTERRUPTS_SYSEXIT, GET_CR0_INTO_EAX and PARAVIRT_ADJUST_EXCEPTION_FRAME are used nowhere. Remove their definitions. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: virtualization@lists.linux-foundation.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: boris.ostrovsky@oracle.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180828074026.820-10-jgross@suse.com
| * | | | | | | x86/paravirt: Use a single ops structureJuergen Gross2018-09-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of using six globally visible paravirt ops structures combine them in a single structure, keeping the original structures as sub-structures. This avoids the need to assemble struct paravirt_patch_template at runtime on the stack each time apply_paravirt() is being called (i.e. when loading a module). [ tglx: Made the struct and the initializer tabular for readability sake ] Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: virtualization@lists.linux-foundation.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: boris.ostrovsky@oracle.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180828074026.820-9-jgross@suse.com
| * | | | | | | x86/paravirt: Remove clobbers from struct paravirt_patch_siteJuergen Gross2018-09-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no need any longer to store the clobbers in struct paravirt_patch_site. Remove clobbers from the struct and from the related macros. While at it fix some lines longer than 80 characters. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: virtualization@lists.linux-foundation.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: boris.ostrovsky@oracle.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180828074026.820-8-jgross@suse.com
| * | | | | | | x86/paravirt: Remove clobbers parameter from paravirt patch functionsJuergen Gross2018-09-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The clobbers parameter from paravirt_patch_default() et al isn't used any longer. Remove it. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: virtualization@lists.linux-foundation.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: boris.ostrovsky@oracle.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180828074026.820-7-jgross@suse.com
| * | | | | | | x86/paravirt: Make paravirt_patch_call() and paravirt_patch_jmp() staticJuergen Gross2018-09-03
| |/ / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | paravirt_patch_call() and paravirt_patch_jmp() are used in paravirt.c only. Convert them to static. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: virtualization@lists.linux-foundation.org Cc: akataria@vmware.com Cc: rusty@rustcorp.com.au Cc: boris.ostrovsky@oracle.com Cc: hpa@zytor.com Link: https://lkml.kernel.org/r/20180828074026.820-6-jgross@suse.com
* | | | | | | Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds2018-10-23
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: "Lots of changes in this cycle: - Lots of CPA (change page attribute) optimizations and related cleanups (Thomas Gleixner, Peter Zijstra) - Make lazy TLB mode even lazier (Rik van Riel) - Fault handler cleanups and improvements (Dave Hansen) - kdump, vmcore: Enable kdumping encrypted memory with AMD SME enabled (Lianbo Jiang) - Clean up VM layout documentation (Baoquan He, Ingo Molnar) - ... plus misc other fixes and enhancements" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits) x86/stackprotector: Remove the call to boot_init_stack_canary() from cpu_startup_entry() x86/mm: Kill stray kernel fault handling comment x86/mm: Do not warn about PCI BIOS W+X mappings resource: Clean it up a bit resource: Fix find_next_iomem_res() iteration issue resource: Include resource end in walk_*() interfaces x86/kexec: Correct KEXEC_BACKUP_SRC_END off-by-one error x86/mm: Remove spurious fault pkey check x86/mm/vsyscall: Consider vsyscall page part of user address space x86/mm: Add vsyscall address helper x86/mm: Fix exception table comments x86/mm: Add clarifying comments for user addr space x86/mm: Break out user address space handling x86/mm: Break out kernel address space handling x86/mm: Clarify hardware vs. software "error_code" x86/mm/tlb: Make lazy TLB mode lazier x86/mm/tlb: Add freed_tables element to flush_tlb_info x86/mm/tlb: Add freed_tables argument to flush_tlb_mm_range smp,cpumask: introduce on_each_cpu_cond_mask smp: use __cpumask_set_cpu in on_each_cpu_cond ...
| * | | | | | | x86/kexec: Correct KEXEC_BACKUP_SRC_END off-by-one errorBjorn Helgaas2018-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The only use of KEXEC_BACKUP_SRC_END is as an argument to walk_system_ram_res(): int crash_load_segments(struct kimage *image) { ... walk_system_ram_res(KEXEC_BACKUP_SRC_START, KEXEC_BACKUP_SRC_END, image, determine_backup_region); walk_system_ram_res() expects "start, end" arguments that are inclusive, i.e., the range to be walked includes both the start and end addresses. KEXEC_BACKUP_SRC_END was previously defined as (640 * 1024UL), which is the first address *past* the desired 0-640KB range. Define KEXEC_BACKUP_SRC_END as (640 * 1024UL - 1) so the KEXEC_BACKUP_SRC region is [0-0x9ffff], not [0-0xa0000]. Fixes: dd5f726076cc ("kexec: support for kexec on panic using new system call") Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> CC: "H. Peter Anvin" <hpa@zytor.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: Brijesh Singh <brijesh.singh@amd.com> CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org> CC: Ingo Molnar <mingo@redhat.com> CC: Lianbo Jiang <lijiang@redhat.com> CC: Takashi Iwai <tiwai@suse.de> CC: Thomas Gleixner <tglx@linutronix.de> CC: Tom Lendacky <thomas.lendacky@amd.com> CC: Vivek Goyal <vgoyal@redhat.com> CC: baiyaowei@cmss.chinamobile.com CC: bhe@redhat.com CC: dan.j.williams@intel.com CC: dyoung@redhat.com CC: kexec@lists.infradead.org Link: http://lkml.kernel.org/r/153805811578.1157.6948388946904655969.stgit@bhelgaas-glaptop.roam.corp.google.com
| * | | | | | | x86/mm/tlb: Add freed_tables element to flush_tlb_infoRik van Riel2018-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pass the information on to native_flush_tlb_others. No functional changes. Cc: npiggin@gmail.com Cc: mingo@kernel.org Cc: will.deacon@arm.com Cc: songliubraving@fb.com Cc: kernel-team@fb.com Cc: hpa@zytor.com Cc: luto@kernel.org Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180926035844.1420-7-riel@surriel.com
| * | | | | | | x86/mm/tlb: Add freed_tables argument to flush_tlb_mm_rangeRik van Riel2018-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add an argument to flush_tlb_mm_range to indicate whether page tables are about to be freed after this TLB flush. This allows for an optimization of flush_tlb_mm_range to skip CPUs in lazy TLB mode. No functional changes. Cc: npiggin@gmail.com Cc: mingo@kernel.org Cc: will.deacon@arm.com Cc: songliubraving@fb.com Cc: kernel-team@fb.com Cc: luto@kernel.org Cc: hpa@zytor.com Signed-off-by: Rik van Riel <riel@surriel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180926035844.1420-6-riel@surriel.com
| * | | | | | | x86/mm/tlb: Always use lazy TLB modeRik van Riel2018-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On most workloads, the number of context switches far exceeds the number of TLB flushes sent. Optimizing the context switches, by always using lazy TLB mode, speeds up those workloads. This patch results in about a 1% reduction in CPU use on a two socket Broadwell system running a memcache like workload. Cc: npiggin@gmail.com Cc: efault@gmx.de Cc: will.deacon@arm.com Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-team@fb.com Cc: hpa@zytor.com Cc: luto@kernel.org Tested-by: Song Liu <songliubraving@fb.com> Signed-off-by: Rik van Riel <riel@surriel.com> (cherry picked from commit 95b0e6357d3e4e05349668940d7ff8f3b7e7e11e) Acked-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180716190337.26133-7-riel@surriel.com
| * | | | | | | x86/mm: Page size aware flush_tlb_mm_range()Peter Zijlstra2018-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the new tlb_get_unmap_shift() to determine the stride of the INVLPG loop. Cc: Nick Piggin <npiggin@gmail.com> Cc: Will Deacon <will.deacon@arm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| * | | | | | | x86/KASLR: Update KERNEL_IMAGE_SIZE descriptionBaoquan He2018-10-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently CONFIG_RANDOMIZE_BASE=y is set by default, which makes some of the old comments above the KERNEL_IMAGE_SIZE definition out of date. Update them to the current state of affairs. Signed-off-by: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@surriel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: corbet@lwn.net Cc: linux-doc@vger.kernel.org Cc: thgarnie@google.com Link: http://lkml.kernel.org/r/20181006084327.27467-2-bhe@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | x86/ioremap: Add an ioremap_encrypted() helperLianbo Jiang2018-10-06
| | |_|_|_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When SME is enabled, the memory is encrypted in the first kernel. In this case, SME also needs to be enabled in the kdump kernel, and we have to remap the old memory with the memory encryption mask. The case of concern here is if SME is active in the first kernel, and it is active too in the kdump kernel. There are four cases to be considered: a. dump vmcore It is encrypted in the first kernel, and needs be read out in the kdump kernel. b. crash notes When dumping vmcore, the people usually need to read useful information from notes, and the notes is also encrypted. c. iommu device table It's encrypted in the first kernel, kdump kernel needs to access its content to analyze and get information it needs. d. mmio of AMD iommu not encrypted in both kernels Add a new bool parameter @encrypted to __ioremap_caller(). If set, memory will be remapped with the SME mask. Add a new function ioremap_encrypted() to explicitly pass in a true value for @encrypted. Use ioremap_encrypted() for the above a, b, c cases. [ bp: cleanup commit message, extern defs in io.h and drop forgotten include. ] Signed-off-by: Lianbo Jiang <lijiang@redhat.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Cc: kexec@lists.infradead.org Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: akpm@linux-foundation.org Cc: dan.j.williams@intel.com Cc: bhelgaas@google.com Cc: baiyaowei@cmss.chinamobile.com Cc: tiwai@suse.de Cc: brijesh.singh@amd.com Cc: dyoung@redhat.com Cc: bhe@redhat.com Cc: jroedel@suse.de Link: https://lkml.kernel.org/r/20180927071954.29615-2-lijiang@redhat.com
* | | | | | | Merge branch 'x86-grub2-for-linus' of ↵Linus Torvalds2018-10-23
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 grub2 updates from Ingo Molnar: "This extends the x86 boot protocol to include an address for the RSDP table - utilized by Xen currently. Matching Grub2 patches are pending as well. (Juergen Gross)" * 'x86-grub2-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/acpi, x86/boot: Take RSDP address for boot params if available x86/boot: Add ACPI RSDP address to setup_header x86/xen: Fix boot loader version reported for PVH guests
| * | | | | | | x86/acpi, x86/boot: Take RSDP address for boot params if availableJuergen Gross2018-10-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case the RSDP address in struct boot_params is specified don't try to find the table by searching, but take the address directly as set by the boot loader. Signed-off-by: Juergen Gross <jgross@suse.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Baoquan He <bhe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Jia Zhang <qianyue.zj@alibaba-inc.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Pavel Tatashin <pasha.tatashin@oracle.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boris.ostrovsky@oracle.com Cc: linux-kernel@vger.kernel.org Cc: linux-pm@vger.kernel.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20181010061456.22238-4-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | x86/boot: Add ACPI RSDP address to setup_headerJuergen Gross2018-10-10
| | |_|_|_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Xen PVH guests receive the address of the RSDP table from Xen. In order to support booting a Xen PVH guest via Grub2 using the standard x86 boot entry we need a way for Grub2 to pass the RSDP address to the kernel. For this purpose expand the struct setup_header to hold the physical address of the RSDP address. Being zero means it isn't specified and has to be located the legacy way (searching through low memory or EBDA). While documenting the new setup_header layout and protocol version 2.14 add the missing documentation of protocol version 2.13. There are Grub2 versions in several distros with a downstream patch violating the boot protocol by writing past the end of setup_header. This requires another update of the boot protocol to enable the kernel to distinguish between a specified RSDP address and one filled with garbage by such a broken Grub2. From protocol 2.14 on Grub2 will write the version it is supporting (but never a higher value than found to be supported by the kernel) ored with 0x8000 to the version field of setup_header. This enables the kernel to know up to which field Grub2 has written information to. All fields after that are supposed to be clobbered. Signed-off-by: Juergen Gross <jgross@suse.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: boris.ostrovsky@oracle.com Cc: bp@alien8.de Cc: corbet@lwn.net Cc: linux-doc@vger.kernel.org Cc: xen-devel@lists.xenproject.org Link: http://lkml.kernel.org/r/20181010061456.22238-3-jgross@suse.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | | | | Merge branch 'x86-cpu-for-linus' of ↵Linus Torvalds2018-10-23
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Ingo Molnar: "The main changes in this cycle were: - Add support for the "Dhyana" x86 CPUs by Hygon: these are licensed based on the AMD Zen architecture, and are built and sold in China, for domestic datacenter use. The code is pretty close to AMD support, mostly with a few quirks and enumeration differences. (Pu Wen) - Enable CPUID support on Cyrix 6x86/6x86L processors" * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tools/cpupower: Add Hygon Dhyana support cpufreq: Add Hygon Dhyana support ACPI: Add Hygon Dhyana support x86/xen: Add Hygon Dhyana support to Xen x86/kvm: Add Hygon Dhyana support to KVM x86/mce: Add Hygon Dhyana support to the MCA infrastructure x86/bugs: Add Hygon Dhyana to the respective mitigation machinery x86/apic: Add Hygon Dhyana support x86/pci, x86/amd_nb: Add Hygon Dhyana support to PCI and northbridge x86/amd_nb: Check vendor in AMD-only functions x86/alternative: Init ideal_nops for Hygon Dhyana x86/events: Add Hygon Dhyana support to PMU infrastructure x86/smpboot: Do not use BSP INIT delay and MWAIT to idle on Dhyana x86/cpu/mtrr: Support TOP_MEM2 and get MTRR number x86/cpu: Get cache info and setup cache cpumap for Hygon Dhyana x86/cpu: Create Hygon Dhyana architecture support file x86/CPU: Change query logic so CPUID is enabled before testing x86/CPU: Use correct macros for Cyrix calls