aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/kernel/cpu
Commit message (Collapse)AuthorAge
* cpufreq: x86: Make scaling_cur_freq behave more as expectedRafael J. Wysocki2017-07-30
| | | | | | | | | | | | | | | | | | | | | | | After commit f8475cef9008 "x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF" the scaling_cur_freq policy attribute in sysfs only behaves as expected on x86 with APERF/MPERF registers available when it is read from at least twice in a row. The value returned by the first read may not be meaningful, because the computations in there use cached values from the previous iteration of aperfmperf_snapshot_khz() which may be stale. To prevent that from happening, modify arch_freq_get_on_cpu() to call aperfmperf_snapshot_khz() twice, with a short delay between these calls, if the previous invocation of aperfmperf_snapshot_khz() was too far back in the past (specifically, more that 1s ago). Also, as pointed out by Doug Smythies, aperf_delta is limited now and the multiplication of it by cpu_khz won't overflow, so simplify the s->khz computations too. Fixes: f8475cef9008 "x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF" Reported-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* x86/cpu: Use indirect call to measure performance in init_amd_k6()Mikulas Patocka2017-07-16
| | | | | | | | | | | | | | | | | | | | | | This old piece of code is supposed to measure the performance of indirect calls to determine if the processor is buggy or not, however the compiler optimizer turns it into a direct call. Use the OPTIMIZER_HIDE_VAR() macro to thwart the optimization, so that a real indirect call is generated. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1707110737530.8746@file01.intranet.prod.int.rdu2.redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* Merge tag 'pm-4.13-rc1' of ↵Linus Torvalds2017-07-04
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "The big ticket items here are the rework of suspend-to-idle in order to add proper support for power button wakeup from it on recent Dell laptops and the rework of interfaces exporting the current CPU frequency on x86. In addition to that, support for a few new pieces of hardware is added, the PCI/ACPI device wakeup infrastructure is simplified significantly and the wakeup IRQ framework is fixed to unbreak the IRQ bus locking infrastructure. Also, there are some functional improvements for intel_pstate, tools updates and small fixes and cleanups all over. Specifics: - Rework suspend-to-idle to allow it to take wakeup events signaled by the EC into account on ACPI-based platforms in order to properly support power button wakeup from suspend-to-idle on recent Dell laptops (Rafael Wysocki). That includes the core suspend-to-idle code rework, support for the Low Power S0 _DSM interface, and support for the ACPI INT0002 Virtual GPIO device from Hans de Goede (required for USB keyboard wakeup from suspend-to-idle to work on some machines). - Stop trying to export the current CPU frequency via /proc/cpuinfo on x86 as that is inaccurate and confusing (Len Brown). - Rework the way in which the current CPU frequency is exported by the kernel (over the cpufreq sysfs interface) on x86 systems with the APERF and MPERF registers by always using values read from these registers, when available, to compute the current frequency regardless of which cpufreq driver is in use (Len Brown). - Rework the PCI/ACPI device wakeup infrastructure to remove the questionable and artificial distinction between "devices that can wake up the system from sleep states" and "devices that can generate wakeup signals in the working state" from it, which allows the code to be simplified quite a bit (Rafael Wysocki). - Fix the wakeup IRQ framework by making it use SRCU instead of RCU which doesn't allow sleeping in the read-side critical sections, but which in turn is expected to be allowed by the IRQ bus locking infrastructure (Thomas Gleixner). - Modify some computations in the intel_pstate driver to avoid rounding errors resulting from them (Srinivas Pandruvada). - Reduce the overhead of the intel_pstate driver in the HWP (hardware-managed P-states) mode and when the "performance" P-state selection algorithm is in use by making it avoid registering scheduler callbacks in those cases (Len Brown). - Rework the energy_performance_preference sysfs knob in intel_pstate by changing the values that correspond to different symbolic hint names used by it (Len Brown). - Make it possible to use more than one cpuidle driver at the same time on ARM (Daniel Lezcano). - Make it possible to prevent the cpuidle menu governor from using the 0 state by disabling it via sysfs (Nicholas Piggin). - Add support for FFH (Fixed Functional Hardware) MWAIT in ACPI C1 on AMD systems (Yazen Ghannam). - Make the CPPC cpufreq driver take the lowest nonlinear performance information into account (Prashanth Prakash). - Add support for hi3660 to the cpufreq-dt driver, fix the imx6q driver and clean up the sfi, exynos5440 and intel_pstate drivers (Colin Ian King, Krzysztof Kozlowski, Octavian Purdila, Rafael Wysocki, Tao Wang). - Fix a few minor issues in the generic power domains (genpd) framework and clean it up somewhat (Krzysztof Kozlowski, Mikko Perttunen, Viresh Kumar). - Fix a couple of minor issues in the operating performance points (OPP) framework and clean it up somewhat (Viresh Kumar). - Fix a CONFIG dependency in the hibernation core and clean it up slightly (Balbir Singh, Arvind Yadav, BaoJun Luo). - Add rk3228 support to the rockchip-io adaptive voltage scaling (AVS) driver (David Wu). - Fix an incorrect bit shift operation in the RAPL power capping driver (Adam Lessnau). - Add support for the EPP field in the HWP (hardware managed P-states) control register, HWP.EPP, to the x86_energy_perf_policy tool and update msr-index.h with HWP.EPP values (Len Brown). - Fix some minor issues in the turbostat tool (Len Brown). - Add support for AMD family 0x17 CPUs to the cpupower tool and fix a minor issue in it (Sherry Hurwitz). - Assorted cleanups, mostly related to the constification of some data structures (Arvind Yadav, Joe Perches, Kees Cook, Krzysztof Kozlowski)" * tag 'pm-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (69 commits) cpufreq: Update scaling_cur_freq documentation cpufreq: intel_pstate: Clean up after performance governor changes PM: hibernate: constify attribute_group structures. cpuidle: menu: allow state 0 to be disabled intel_idle: Use more common logging style PM / Domains: Fix missing default_power_down_ok comment PM / Domains: Fix unsafe iteration over modified list of domains PM / Domains: Fix unsafe iteration over modified list of domain providers PM / Domains: Fix unsafe iteration over modified list of device links PM / Domains: Handle safely genpd_syscore_switch() call on non-genpd device PM / Domains: Call driver's noirq callbacks PM / core: Drop run_wake flag from struct dev_pm_info PCI / PM: Simplify device wakeup settings code PCI / PM: Drop pme_interrupt flag from struct pci_dev ACPI / PM: Consolidate device wakeup settings code ACPI / PM: Drop run_wake from struct acpi_device_wakeup_flags PM / QoS: constify *_attribute_group. PM / AVS: rockchip-io: add io selectors and supplies for rk3228 powercap/RAPL: prevent overridding bits outside of the mask PM / sysfs: Constify attribute groups ...
| * x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERFLen Brown2017-06-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The goal of this change is to give users a uniform and meaningful result when they read /sys/...cpufreq/scaling_cur_freq on modern x86 hardware, as compared to what they get today. Modern x86 processors include the hardware needed to accurately calculate frequency over an interval -- APERF, MPERF, and the TSC. Here we provide an x86 routine to make this calculation on supported hardware, and use it in preference to any driver driver-specific cpufreq_driver.get() routine. MHz is computed like so: MHz = base_MHz * delta_APERF / delta_MPERF MHz is the average frequency of the busy processor over a measurement interval. The interval is defined to be the time between successive invocations of aperfmperf_khz_on_cpu(), which are expected to to happen on-demand when users read sysfs attribute cpufreq/scaling_cur_freq. As with previous methods of calculating MHz, idle time is excluded. base_MHz above is from TSC calibration global "cpu_khz". This x86 native method to calculate MHz returns a meaningful result no matter if P-states are controlled by hardware or firmware and/or if the Linux cpufreq sub-system is or is-not installed. When this routine is invoked more frequently, the measurement interval becomes shorter. However, the code limits re-computation to 10ms intervals so that average frequency remains meaningful. Discerning users are encouraged to take advantage of the turbostat(8) utility, which can gracefully handle concurrent measurement intervals of arbitrary length. Signed-off-by: Len Brown <len.brown@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| * x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"Len Brown2017-06-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cpufreq_quick_get() allows cpufreq drivers to over-ride cpu_khz that is otherwise reported in x86 /proc/cpuinfo "cpu MHz". There are four problems with this scheme, any of them is sufficient justification to delete it. 1. Depending on which cpufreq driver is loaded, the behavior of this field is different. 2. Distros complain that they have to explain to users why and how this field changes. Distros have requested a constant. 3. The two major providers of this information, acpi_cpufreq and intel_pstate, both "get it wrong" in different ways. acpi_cpufreq lies to the user by telling them that they are running at whatever frequency was last requested by software. intel_pstate lies to the user by telling them that they are running at the average frequency computed over an undefined measurement. But an average computed over an undefined interval, is itself, undefined... 4. On modern processors, user space utilities, such as turbostat(1), are more accurate and more precise, while supporing concurrent measurement over arbitrary intervals. Users who have been consulting /proc/cpuinfo to track changing CPU frequency will be dissapointed that it no longer wiggles -- perhaps being unaware of the limitations of the information they have been consuming. Yes, they can change their scripts to look in sysfs cpufreq/scaling_cur_frequency. Here they will find the same data of dubious quality here removed from /proc/cpuinfo. The value in sysfs will be addressed in a subsequent patch to address issues 1-3, above. Issue 4 will remain -- users that really care about accurate frequency information should not be using either proc or sysfs kernel interfaces. They should be using using turbostat(8), or a similar purpose-built analysis tool. Signed-off-by: Len Brown <len.brown@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | Merge branch 'ras-core-for-linus' of ↵Linus Torvalds2017-07-03
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS updates from Thomas Gleixner: "The RAS updates for the 4.13 merge window: - Cleanup of the MCE injection facility (Borsilav Petkov) - Rework of the AMD/SMCA handling (Yazen Ghannam) - Enhancements for ACPI/APEI to handle new notitication types (Shiju Jose) - atomic_t to refcount_t conversion (Elena Reshetova) - A few fixes and enhancements all over the place" * 'ras-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: RAS/CEC: Check the correct variable in the debugfs error handling x86/mce: Always save severity in machine_check_poll() x86/MCE, xen/mcelog: Make /dev/mcelog registration messages more precise x86/mce: Update bootlog description to reflect behavior on AMD x86/mce: Don't disable MCA banks when offlining a CPU on AMD x86/mce/mce-inject: Preset the MCE injection struct x86/mce: Clean up include files x86/mce: Get rid of register_mce_write_callback() x86/mce: Merge mce_amd_inj into mce-inject x86/mce/AMD: Use saved threshold block info in interrupt handler x86/mce/AMD: Use msr_stat when clearing MCA_STATUS x86/mce/AMD: Carve out SMCA bank configuration x86/mce/AMD: Redo error logging from APIC LVT interrupt handlers x86/mce: Convert threshold_bank.cpus from atomic_t to refcount_t RAS: Make local function parse_ras_param() static ACPI/APEI: Handle GSIV and GPIO notification types
| * | x86/mce: Always save severity in machine_check_poll()Yazen Ghannam2017-06-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The MCE severity gives a hint as to how to handle the error. The notifier blocks can then use the severity to decide on an action. It's not necessary for machine_check_poll() to filter errors for the notifier chain, since each block will check its own set of conditions before handling an error. Also, there isn't any urgency for machine_check_poll() to make decisions based on severity like in do_machine_check(). If we can assume that a severity is set then we can use it in more notifier blocks. For example, the CEC block could check for a "KEEP" severity rather than checking bits in the status. This isn't possible now since the severity is not set except for "DEFFRRED/UCNA" errors with a valid address. Save the severity since we have it, and let the notifier blocks decide if they want to do anything. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1498074402-98633-1-git-send-email-Yazen.Ghannam@amd.com
| * | x86/MCE, xen/mcelog: Make /dev/mcelog registration messages more preciseJuergen Gross2017-06-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running under Xen as dom0, /dev/mcelog is being provided by Xen instead of the normal mcelog character device of the MCE core. Convert an error message being issued by the MCE core in this case to an informative message that Xen has registered the device. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: xen-devel@lists.xenproject.org Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170614084059.19294-1-jgross@suse.com
| * | x86/mce: Update bootlog description to reflect behavior on AMDYazen Ghannam2017-06-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The bootlog option is only disabled by default on AMD Fam10h and older systems. Update bootlog description to say this. Change the family value to hex to avoid confusion. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170613162835.30750-9-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | x86/mce: Don't disable MCA banks when offlining a CPU on AMDYazen Ghannam2017-06-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | AMD systems have non-core, shared MCA banks within a die. These banks are controlled by a master CPU per die. If this CPU is offlined then all the shared banks are disabled in addition to the CPU's core banks. Also, Fam17h systems may have SMT enabled. The MCA_CTL register is shared between SMT thread siblings. If a CPU is offlined then all its sibling's MCA banks are also disabled. Extend the existing vendor check to AMD too. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> [ Fix up comment. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170613162835.30750-8-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | x86/mce/mce-inject: Preset the MCE injection structBorislav Petkov2017-06-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Populate the MCE injection struct before doing initial injection so that values which don't change have sane defaults. Tested-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Yazen Ghannam <yazen.ghannam@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20170613162835.30750-7-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | x86/mce: Clean up include filesBorislav Petkov2017-06-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Not really needed. Tested-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Yazen Ghannam <yazen.ghannam@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20170613162835.30750-6-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | x86/mce: Get rid of register_mce_write_callback()Borislav Petkov2017-06-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make the mcelog call a notifier which lands in the injector module and does the injection. This allows for mce-inject to be a normal kernel module now. Tested-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Yazen Ghannam <yazen.ghannam@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20170613162835.30750-5-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | x86/mce: Merge mce_amd_inj into mce-injectBorislav Petkov2017-06-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reuse mce_amd_inj's debugfs interface so that mce-inject can benefit from it too. The old functionality is still preserved under CONFIG_X86_MCELOG_LEGACY. Tested-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Yazen Ghannam <yazen.ghannam@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20170613162835.30750-4-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | x86/mce/AMD: Use saved threshold block info in interrupt handlerYazen Ghannam2017-06-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the amd_threshold_interrupt() handler, we loop through every possible block in each bank and rediscover the block's address and if it's valid, e.g. valid, counter present and not locked. However, we already have the address saved in the threshold blocks list for each CPU and bank. The list only contains blocks that have passed all the valid checks. Besides the redundancy, there's also a smp_call_function* in get_block_address() which causes a warning when servicing the interrupt: WARNING: CPU: 0 PID: 0 at kernel/smp.c:281 smp_call_function_single+0xdd/0xf0 ... Call Trace: <IRQ> rdmsr_safe_on_cpu() get_block_address.isra.2() amd_threshold_interrupt() smp_threshold_interrupt() threshold_interrupt() because we do get called in an interrupt handler *with* interrupts disabled, which can result in a deadlock. Drop the redundant valid checks and move the overflow check, logging and block reset into a separate function. Check the first block then iterate over the rest. This procedure is needed since the first block is used as the head of the list. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170613162835.30750-3-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | x86/mce/AMD: Use msr_stat when clearing MCA_STATUSYazen Ghannam2017-06-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The value of MCA_STATUS is used as the MSR when clearing MCA_STATUS. This may cause the following warning: unchecked MSR access error: WRMSR to 0x11b (tried to write 0x0000000000000000) Call Trace: <IRQ> smp_threshold_interrupt() threshold_interrupt() Use msr_stat instead which has the MSR address. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Fixes: 37d43acfd79f ("x86/mce/AMD: Redo error logging from APIC LVT interrupt handlers") Link: http://lkml.kernel.org/r/20170613162835.30750-2-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | Merge tag 'v4.12-rc5' into ras/core, to pick up fixesIngo Molnar2017-06-14
| |\| | | | | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | x86/mce/AMD: Carve out SMCA bank configurationYazen Ghannam2017-05-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Scalable MCA systems have a new MCA_CONFIG register that we use to configure each bank. We currently use this when we set up thresholding. However, this is logically separate. Group all SMCA-related initialization into a single function. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1493147772-2721-2-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | x86/mce/AMD: Redo error logging from APIC LVT interrupt handlersYazen Ghannam2017-05-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have support for the new SMCA MCA_DE{STAT,ADDR} registers in Linux. So we've used these registers in place of MCA_{STATUS,ADDR} on SMCA systems. However, the guidance for current SMCA implementations of is to continue using MCA_{STATUS,ADDR} and to use MCA_DE{STAT,ADDR} only if a Deferred error was not found in the former registers. If we logged a Deferred error in MCA_STATUS then we should also clear MCA_DESTAT. This also means we shouldn't clear MCA_CONFIG[LogDeferredInMcaStat]. Rework __log_error() to only log an error and add helpers for the different error types being logged from the corresponding interrupt handlers. Boris: carve out common functionality into a _log_error_bank(). Cleanup comments, check MCi_STATUS bits before reading MSRs. Streamline flow. Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1493147772-2721-1-git-send-email-Yazen.Ghannam@amd.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | x86/mce: Convert threshold_bank.cpus from atomic_t to refcount_tElena Reshetova2017-05-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Suggested-by: Kees Cook <keescook@chromium.org> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Reviewed-by: David Windsor <dwindsor@gmail.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1492695536-5947-1-git-send-email-elena.reshetova@intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | Merge branch 'smp-hotplug-for-linus' of ↵Linus Torvalds2017-07-03
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull SMP hotplug updates from Thomas Gleixner: "This update is primarily a cleanup of the CPU hotplug locking code. The hotplug locking mechanism is an open coded RWSEM, which allows recursive locking. The main problem with that is the recursive nature as it evades the full lockdep coverage and hides potential deadlocks. The rework replaces the open coded RWSEM with a percpu RWSEM and establishes full lockdep coverage that way. The bulk of the changes fix up recursive locking issues and address the now fully reported potential deadlocks all over the place. Some of these deadlocks have been observed in the RT tree, but on mainline the probability was low enough to hide them away." * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits) cpu/hotplug: Constify attribute_group structures powerpc: Only obtain cpu_hotplug_lock if called by rtasd ARM/hw_breakpoint: Fix possible recursive locking for arch_hw_breakpoint_init cpu/hotplug: Remove unused check_for_tasks() function perf/core: Don't release cred_guard_mutex if not taken cpuhotplug: Link lock stacks for hotplug callbacks acpi/processor: Prevent cpu hotplug deadlock sched: Provide is_percpu_thread() helper cpu/hotplug: Convert hotplug locking to percpu rwsem s390: Prevent hotplug rwsem recursion arm: Prevent hotplug rwsem recursion arm64: Prevent cpu hotplug rwsem recursion kprobes: Cure hotplug lock ordering issues jump_label: Reorder hotplug lock and jump_label_lock perf/tracing/cpuhotplug: Fix locking order ACPI/processor: Use cpu_hotplug_disable() instead of get_online_cpus() PCI: Replace the racy recursion prevention PCI: Use cpu_hotplug_disable() instead of get_online_cpus() perf/x86/intel: Drop get_online_cpus() in intel_snb_check_microcode() x86/perf: Drop EXPORT of perf_check_microcode ...
| * | | x86/mtrr: Remove get_online_cpus() from mtrr_save_state()Sebastian Andrzej Siewior2017-05-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mtrr_save_state() is invoked from native_cpu_up() which is in the context of a CPU hotplug operation and therefor calling get_online_cpus() is pointless. While this works in the current get_online_cpus() implementation it prevents from converting the hotplug locking to percpu rwsems. Remove it. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20170524081547.651378834@linutronix.de
* | | | Merge branch 'x86-microcode-for-linus' of ↵Linus Torvalds2017-07-03
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 microcode updates from Ingo Molnar: "The main changes are a fix early microcode application for resume-from-RAM, plus a 32-bit initrd placement fix - by Borislav Petkov" * 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/microcode: Make a couple of symbols static x86/microcode/intel: Save pointer to ucode patch for early AP loading x86/microcode: Look for the initrd at the correct address on 32-bit
| * | | | x86/microcode: Make a couple of symbols staticColin Ian King2017-06-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The helper function __load_ucode_amd() and pointer intel_ucode_patch do not need to be in global scope, so make them static. Fixes those sparse warnings: "symbol '__load_ucode_amd' was not declared. Should it be static?" "symbol 'intel_ucode_patch' was not declared. Should it be static?" Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170622095736.11937-1-colin.king@canonical.com
| * | | | x86/microcode/intel: Save pointer to ucode patch for early AP loadingBorislav Petkov2017-06-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Normally, when the initrd is gone, we can't search it for microcode blobs to apply anymore. For that we need to stash away the patch in our own storage. And save_microcode_in_initrd_intel() looks like the proper place to do that from. So in order for early loading to work, invalidate the intel_ucode_patch pointer to the patch *before* scanning the initrd one last time. If the scanning code finds a microcode patch, it will assign that pointer again, this time with our own storage's address. This way, early microcode application during resume-from-RAM works too, even after the initrd is long gone. Tested-by: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170614140626.4462-2-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | x86/microcode: Look for the initrd at the correct address on 32-bitBorislav Petkov2017-06-20
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Early during boot, the BSP finds the ramdisk's position from boot_params but by the time the APs get to boot, the BSP has continued in the mean time and has potentially managed to relocate that ramdisk. And in that case, the APs need to find the ramdisk at its new position, in *physical* memory as they're running before paging has been enabled. Thus, get the updated physical location of the ramdisk which is in the relocated_ramdisk variable. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170614140626.4462-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | Merge branch 'x86-hyperv-for-linus' of ↵Linus Torvalds2017-07-03
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 hyperv updates from Ingo Molnar: "Avoid boot time TSC calibration on Hyper-V hosts, to improve calibration robustness. (Vitaly Kuznetsov)" * 'x86-hyperv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/hyperv: Read TSC frequency from a synthetic MSR x86/hyperv: Check frequency MSRs presence according to the specification
| * | | | x86/hyperv: Read TSC frequency from a synthetic MSRVitaly Kuznetsov2017-06-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It was found that SMI_TRESHOLD of 50000 is not enough for Hyper-V guests in nested environment and falling back to counting jiffies is not an option for Gen2 guests as they don't have PIT. As Hyper-V provides TSC frequency in a synthetic MSR we can just use this information instead of doing a error prone calibration. Reported-and-tested-by: Ladi Prosek <lprosek@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Jork Loeser <jloeser@microsoft.com> Cc: devel@linuxdriverproject.org Cc: "K. Y. Srinivasan" <kys@microsoft.com> Link: http://lkml.kernel.org/r/20170622100730.18112-3-vkuznets@redhat.com
| * | | | x86/hyperv: Check frequency MSRs presence according to the specificationVitaly Kuznetsov2017-06-22
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hyper-V TLFS specifies two bits which should be checked before accessing frequency MSRs: - AccessFrequencyMsrs (BIT(11) in EAX) which indicates if we have access to frequency MSRs. - FrequencyMsrsAvailable (BIT(8) in EDX) which indicates is these MSRs are present. Rename and specify these bits accordingly. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Ladi Prosek <lprosek@redhat.com> Cc: Jork Loeser <jloeser@microsoft.com> Cc: devel@linuxdriverproject.org Cc: "K. Y. Srinivasan" <kys@microsoft.com> Link: http://lkml.kernel.org/r/20170622100730.18112-2-vkuznets@redhat.com
* / / / x86/intel_rdt: Fix memory leak on mount failureVikas Shivappa2017-06-30
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If mount fails, the kn_info directory is not freed causing memory leak. Add the missing error handling path. Fixes: 4e978d06dedb ("x86/intel_rdt: Add "info" files to resctrl file system") Signed-off-by: Vikas Shivappa <vikas.shivappa@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: ravi.v.shankar@intel.com Cc: tony.luck@intel.com Cc: fenghua.yu@intel.com Cc: peterz@infradead.org Cc: vikas.shivappa@intel.com Cc: andi.kleen@intel.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1498503368-20173-3-git-send-email-vikas.shivappa@linux.intel.com
* | | x86/microcode/intel: Clear patch pointer before jettisoning the initrdDominik Brodowski2017-06-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During early boot, load_ucode_intel_ap() uses __load_ucode_intel() to obtain a pointer to the relevant microcode patch (embedded in the initrd), and stores this value in 'intel_ucode_patch' to speed up the microcode patch application for subsequent CPUs. On resuming from suspend-to-RAM, however, load_ucode_ap() calls load_ucode_intel_ap() for each non-boot-CPU. By then the initramfs is long gone so the pointer stored in 'intel_ucode_patch' no longer points to a valid microcode patch. Clear that pointer so that we effectively fall back to the CPU hotplug notifier callbacks to update the microcode. Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net> [ Edit and massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # 4.10.. Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170607095819.9754-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | x86/cpu/cyrix: Add alternative Device ID of Geode GX1 SoCChristian Sünkenberg2017-06-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A SoC variant of Geode GX1, notably NSC branded SC1100, seems to report an inverted Device ID in its DIR0 configuration register, specifically 0xb instead of the expected 0x4. Catch this presumably quirky version so it's properly recognized as GX1 and has its cache switched to write-back mode, which provides a significant performance boost in most workloads. SC1100's datasheet "Geode™ SC1100 Information Appliance On a Chip", states in section 1.1.7.1 "Device ID" that device identification values are specified in SC1100's device errata. These, however, seem to not have been publicly released. Wading through a number of boot logs and /proc/cpuinfo dumps found on pastebin and blogs, this patch should mostly be relevant for a number of now admittedly aging Soekris NET4801 and PC Engines WRAP devices, the latter being the platform this issue was discovered on. Performance impact was verified using "openssl speed", with write-back caching scaling throughput between -3% and +41%. Signed-off-by: Christian Sünkenberg <christian.suenkenberg@student.kit.edu> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1496596719.26725.14.camel@student.kit.edu Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | x86/microcode/AMD: Change load_microcode_amd()'s param to bool to fix ↵Borislav Petkov2017-05-29
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | preemptibility bug With CONFIG_DEBUG_PREEMPT enabled, I get: BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 caller is debug_smp_processor_id CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.12.0-rc2+ #2 Call Trace: dump_stack check_preemption_disabled debug_smp_processor_id save_microcode_in_initrd_amd ? microcode_init save_microcode_in_initrd ... because, well, it says it above, we're using smp_processor_id() in preemptible code. But passing the CPU number is not really needed. It is only used to determine whether we're on the BSP, and, if so, to save the microcode patch for early loading. [ We don't absolutely need to do it on the BSP but we do that customarily there. ] Instead, convert that function parameter to a boolean which denotes whether the patch should be saved or not, thereby avoiding the use of smp_processor_id() in preemptible code. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170528200414.31305-1-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | x86/MCE: Export memory_error()Borislav Petkov2017-05-21
|/ | | | | | | | | | | | | | | | | | | | | Export the function which checks whether an MCE is a memory error to other users so that we can reuse the logic. Drop the boot_cpu_data use, while at it, as mce.cpuvendor already has the CPU vendor in there. Integrate a piece from a patch from Vishal Verma <vishal.l.verma@intel.com> to export it for modules (nfit). The main reason we're exporting it is that the nfit handler nfit_handle_mce() needs to detect a memory error properly before doing its recovery actions. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20170519093915.15413-2-bp@alien8.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* Tigran has movedAndrew Morton2017-05-12
| | | | | | Cc: Tigran Aivazian <aivazian.tigran@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2017-05-12
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: - two boot crash fixes - unwinder fixes - kexec related kernel direct mappings enhancements/fixes - more Clang support quirks - minor cleanups - Documentation fixes" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/intel_rdt: Fix a typo in Documentation x86/build: Don't add -maccumulate-outgoing-args w/o compiler support x86/boot/32: Fix UP boot on Quark and possibly other platforms x86/mm/32: Set the '__vmalloc_start_set' flag in initmem_init() x86/kexec/64: Use gbpages for identity mappings if available x86/mm: Add support for gbpages to kernel_ident_mapping_init() x86/boot: Declare error() as noreturn x86/mm/kaslr: Use the _ASM_MUL macro for multiplication to work around Clang incompatibility x86/mm: Fix boot crash caused by incorrect loop count calculation in sync_global_pgds() x86/asm: Don't use RBP as a temporary register in csum_partial_copy_generic() x86/microcode/AMD: Remove redundant NULL check on mc
| * Merge branch 'linus' into x86/urgent, to pick up dependent commitsIngo Molnar2017-05-05
| |\ | | | | | | | | | | | | | | | | | | We are going to fix a bug introduced by a more recent commit, so refresh the tree. Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * \ Merge branch 'x86/microcode' into x86/urgent, to pick up cleanupIngo Molnar2017-05-01
| |\ \ | | | | | | | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | x86/microcode/AMD: Remove redundant NULL check on mcColin Ian King2017-03-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mc is a pointer to the static u8 array amd_ucode_patch and therefore can never be null, so the check is redundant. Remove it. Detected by CoverityScan, CID#1372871 ("Logically Dead Code") Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: kernel-janitors@vger.kernel.org Cc: x86-ml <x86@kernel.org> Link: http://lkml.kernel.org/r/20170315171010.17536-1-colin.king@canonical.com Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | | Merge tag 'for-linus-4.12b-rc0c-tag' of ↵Linus Torvalds2017-05-12
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "This contains two fixes for booting under Xen introduced during this merge window and two fixes for older problems, where one is just much more probable due to another merge window change" * tag 'for-linus-4.12b-rc0c-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: adjust early dom0 p2m handling to xen hypervisor behavior x86/amd: don't set X86_BUG_SYSRET_SS_ATTRS when running under Xen xen/x86: Do not call xen_init_time_ops() until shared_info is initialized x86/xen: fix xsave capability setting
| * | | | x86/amd: don't set X86_BUG_SYSRET_SS_ATTRS when running under XenJuergen Gross2017-05-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running as Xen pv guest X86_BUG_SYSRET_SS_ATTRS must not be set on AMD cpus. This bug/feature bit is kind of special as it will be used very early when switching threads. Setting the bit and clearing it a little bit later leaves a critical window where things can go wrong. This time window has enlarged a little bit by using setup_clear_cpu_cap() instead of the hypervisor's set_cpu_features callback. It seems this larger window now makes it rather easy to hit the problem. The proper solution is to never set the bit in case of Xen. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Juergen Gross <jgross@suse.com>
* | | | | x86: use set_memory.h headerLaura Abbott2017-05-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | set_memory_* functions have moved to set_memory.h. Switch to this explicitly. Link: http://lkml.kernel.org/r/1488920133-27229-6-git-send-email-labbott@redhat.com Signed-off-by: Laura Abbott <labbott@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Merge tag 'char-misc-4.12-rc1' of ↵Linus Torvalds2017-05-04
|\ \ \ \ \ | |/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver updates from Greg KH: "Here is the big set of new char/misc driver drivers and features for 4.12-rc1. There's lots of new drivers added this time around, new firmware drivers from Google, more auxdisplay drivers, extcon drivers, fpga drivers, and a bunch of other driver updates. Nothing major, except if you happen to have the hardware for these drivers, and then you will be happy :) All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (136 commits) firmware: google memconsole: Fix return value check in platform_memconsole_init() firmware: Google VPD: Fix return value check in vpd_platform_init() goldfish_pipe: fix build warning about using too much stack. goldfish_pipe: An implementation of more parallel pipe fpga fr br: update supported version numbers fpga: region: release FPGA region reference in error path fpga altera-hps2fpga: disable/unprepare clock on error in alt_fpga_bridge_probe() mei: drop the TODO from samples firmware: Google VPD sysfs driver firmware: Google VPD: import lib_vpd source files misc: lkdtm: Add volatile to intentional NULL pointer reference eeprom: idt_89hpesx: Add OF device ID table misc: ds1682: Add OF device ID table misc: tsl2550: Add OF device ID table w1: Remove unneeded use of assert() and remove w1_log.h w1: Use kernel common min() implementation uio_mf624: Align memory regions to page size and set correct offsets uio_mf624: Refactor memory info initialization uio: Allow handling of non page-aligned memory regions hangcheck-timer: Fix typo in comment ...
| * | | | Drivers: hv: Issue explicit EOI when autoeoi is not enabledK. Y. Srinivasan2017-04-08
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When auto EOI is not enabled; issue an explicit EOI for hyper-v interrupts. Fixes: 6c248aad81c8 ("Drivers: hv: Base autoeoi enablement based on hypervisor hints") Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | | x86/cpu: remove hypervisor specific set_cpu_featuresJuergen Gross2017-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no user of x86_hyper->set_cpu_features() any more. Remove it. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* | | | vmware: set cpu capabilities during platform initializationJuergen Gross2017-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no need to set the same capabilities for each cpu individually. This can be done for all cpus in platform initialization. Cc: Alok Kataria <akataria@vmware.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: virtualization@lists.linux-foundation.org Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Acked-by: Alok Kataria <akataria@vmware.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* | | | x86/xen: add CONFIG_XEN_PV to KconfigVitaly Kuznetsov2017-05-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All code to support Xen PV will get under this new option. For the beginning, check for it in the common code. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* | | | x86/xen: separate PV and HVM hypervisorsVitaly Kuznetsov2017-05-02
| |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As a preparation to splitting the code we need to untangle it: x86_hyper_xen -> x86_hyper_xen_hvm and x86_hyper_xen_pv xen_platform() -> xen_platform_hvm() and xen_platform_pv() xen_cpu_up_prepare() -> xen_cpu_up_prepare_pv() and xen_cpu_up_prepare_hvm() xen_cpu_dead() -> xen_cpu_dead_pv() and xen_cpu_dead_pv_hvm() Add two parameters to xen_cpuhp_setup() to pass proper cpu_up_prepare and cpu_dead hooks. xen_set_cpu_features() is now PV-only so the redundant xen_pv_domain() check can be dropped. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com>
* | | Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds2017-05-02
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: "The main x86 MM changes in this cycle were: - continued native kernel PCID support preparation patches to the TLB flushing code (Andy Lutomirski) - various fixes related to 32-bit compat syscall returning address over 4Gb in applications, launched from 64-bit binaries - motivated by C/R frameworks such as Virtuozzo. (Dmitry Safonov) - continued Intel 5-level paging enablement: in particular the conversion of x86 GUP to the generic GUP code. (Kirill A. Shutemov) - x86/mpx ABI corner case fixes/enhancements (Joerg Roedel) - ... plus misc updates, fixes and cleanups" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits) mm, zone_device: Replace {get, put}_zone_device_page() with a single reference to fix pmem crash x86/mm: Fix flush_tlb_page() on Xen x86/mm: Make flush_tlb_mm_range() more predictable x86/mm: Remove flush_tlb() and flush_tlb_current_task() x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly() x86/mm/64: Fix crash in remove_pagetable() Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation" x86/boot/e820: Remove a redundant self assignment x86/mm: Fix dump pagetables for 4 levels of page tables x86/mpx, selftests: Only check bounds-vs-shadow when we keep shadow x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space Revert "x86/mm/numa: Remove numa_nodemask_from_meminfo()" x86/espfix: Add support for 5-level paging x86/kasan: Extend KASAN to support 5-level paging x86/mm: Add basic defines/helpers for CONFIG_X86_5LEVEL=y x86/paravirt: Add 5-level support to the paravirt code x86/mm: Define virtual memory map for 5-level paging x86/asm: Remove __VIRTUAL_MASK_SHIFT==47 assert x86/boot: Detect 5-level paging support x86/mm/numa: Remove numa_nodemask_from_meminfo() ...
| * \ \ Merge branch 'x86/boot' into x86/mm, to avoid conflictIngo Molnar2017-04-11
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's a conflict between ongoing level-5 paging support and the E820 rewrite. Since the E820 rewrite is essentially ready, merge it into x86/mm to reduce tree conflicts. Signed-off-by: Ingo Molnar <mingo@kernel.org>