aboutsummaryrefslogtreecommitdiffstats
path: root/arch/x86/kernel/cpu/mcheck
Commit message (Collapse)AuthorAge
...
| | * x86/MCE: Make mce_panic() fatal machine check msg in the same patternDerek Che2015-02-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is another mce_panic call with "Fatal machine check on current CPU" in the same mce.c file, why not keep them all in same pattern mce_panic("Fatal machine check on current CPU", &m, msg); Signed-off-by: Derek Che <drc@yahoo-inc.com> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
| | * x86/MCE/intel: Cleanup CMCI storm logicBorislav Petkov2015-02-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Initially, this started with the yet another report about a race condition in the CMCI storm adaptive period length thing. Yes, we have to admit, it is fragile and error prone. So let's simplify it. The simpler logic is: now, after we enter storm mode, we go straight to polling with CMCI_STORM_INTERVAL, i.e. once a second. We remain in storm mode as long as we see errors being logged while polling. Theoretically, if we see an uninterrupted error stream, we will remain in storm mode indefinitely and keep polling the MSRs. However, when the storm is actually a burst of errors, once we have logged them all, we back out of it after ~5 mins of polling and no more errors logged. If we encounter an error during those 5 minutes, we reset the polling interval to 5 mins. Making machine_check_poll() return a bool and denoting whether it has seen an error or not lets us simplify a bunch of code and move the storm handling private to mce_intel.c. Some minor cleanups while at it. Reported-by: Calvin Owens <calvinowens@fb.com> Tested-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1417746575-23299-1-git-send-email-calvinowens@fb.com Signed-off-by: Borislav Petkov <bp@suse.de>
| | * x86/MCE/AMD: Drop bogus const modifier from AMD's bank4_names()Jan Beulich2015-02-19
| | | | | | | | | | | | | | | | | | | | | | | | The compiler validly warns about it being ignored. Signed-off-by: Jan Beulich <jbeulich@suse.com> Link: http://lkml.kernel.org/r/54C21511020000780005890E@mail.emea.novell.com Signed-off-by: Borislav Petkov <bp@suse.de>
* | | Merge tag 'please-pull-fixmcelog' of ↵Linus Torvalds2015-02-17
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull mcelog regression fix from Tony Luck: "Fix regression - functions on the mce notifier chain should not be able to decide that an event should not be logged" * tag 'please-pull-fixmcelog' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: x86/mce: Fix regression. All error records should report via /dev/mcelog
| * | | x86/mce: Fix regression. All error records should report via /dev/mcelogTony Luck2015-02-09
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I'm getting complaints from validation teams that have updated their Linux kernels from ancient versions to current. They don't see the error logs they expect. I tell the to unload any EDAC drivers[1], and things start working again. The problem is that we short-circuit the logging process if any function on the decoder chain claims to have dealt with the problem: ret = atomic_notifier_call_chain(&x86_mce_decoder_chain, 0, m); if (ret == NOTIFY_STOP) return; The logic we used when we added this code was that we did not want to confuse users with double reports of the same error. But it turns out users are not confused - they are upset that they don't see a log where their tools used to find a log. I could also get into a long description of how the consumer of this log does more than just decode model specific details of the error. It keeps counts, tracks thresholds, takes actions and runs scripts that can alert administrators to problems. [1] We've recently compounded the problem because the acpi_extlog driver also registers for this notifier and also returns NOTIFY_STOP. Signed-off-by: Tony Luck <tony.luck@intel.com>
* | | Merge branch 'perf-core-for-linus' of ↵Linus Torvalds2015-02-16
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 perf updates from Ingo Molnar: "This series tightens up RDPMC permissions: currently even highly sandboxed x86 execution environments (such as seccomp) have permission to execute RDPMC, which may leak various perf events / PMU state such as timing information and other CPU execution details. This 'all is allowed' RDPMC mode is still preserved as the (non-default) /sys/devices/cpu/rdpmc=2 setting. The new default is that RDPMC access is only allowed if a perf event is mmap-ed (which is needed to correctly interpret RDPMC counter values in any case). As a side effect of these changes CR4 handling is cleaned up in the x86 code and a shadow copy of the CR4 value is added. The extra CR4 manipulation adds ~ <50ns to the context switch cost between rdpmc-capable and rdpmc-non-capable mms" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Add /sys/devices/cpu/rdpmc=2 to allow rdpmc for all tasks perf/x86: Only allow rdpmc if a perf_event is mapped perf: Pass the event to arch_perf_update_userpage() perf: Add pmu callbacks to track event mapping and unmapping x86: Add a comment clarifying LDT context switching x86: Store a per-cpu shadow copy of CR4 x86: Clean up cr4 manipulation
| * | x86: Clean up cr4 manipulationAndy Lutomirski2015-02-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CR4 manipulation was split, seemingly at random, between direct (write_cr4) and using a helper (set/clear_in_cr4). Unfortunately, the set_in_cr4 and clear_in_cr4 helpers also poke at the boot code, which only a small subset of users actually wanted. This patch replaces all cr4 access in functions that don't leave cr4 exactly the way they found it with new helpers cr4_set_bits, cr4_clear_bits, and cr4_set_bits_and_update_boot. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/495a10bdc9e67016b8fd3945700d46cfd5c12c2f.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | Merge branch 'x86-ras-for-linus' of ↵Linus Torvalds2015-02-09
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS update from Ingo Molnar: "The changes in this cycle were: - allow mmcfg access to APEI error injection handlers - improve MCE error messages - smaller cleanups" * 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, mce: Fix sparse errors x86, mce: Improve timeout error messages ACPI, EINJ: Enhance error injection tolerance level
| * | x86, mce: Fix sparse errorsBorislav Petkov2014-12-22
| | | | | | | | | | | | | | | | | | Make stuff used in mce.c only, static. Signed-off-by: Borislav Petkov <bp@suse.de>
| * | x86, mce: Improve timeout error messagesAndy Lutomirski2014-12-22
| |/ | | | | | | | | | | | | | | | | There are four different possible types of timeouts. Distinguish them in the logs to help debug them. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/0fa6d2653a54a01c48b43a3583caf950ea99606e.1419178397.git.luto@amacapital.net Signed-off-by: Borislav Petkov <bp@suse.de>
* | x86, mce: Get rid of TIF_MCE_NOTIFY and associated mce tricksLuck, Tony2015-01-07
| | | | | | | | | | | | | | | | | | | | We now switch to the kernel stack when a machine check interrupts during user mode. This means that we can perform recovery actions in the tail of do_machine_check() Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
* | x86, traps: Track entry into and exit from IST contextAndy Lutomirski2015-01-02
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently pretend that IST context is like standard exception context, but this is incorrect. IST entries from userspace are like standard exceptions except that they use per-cpu stacks, so they are atomic. IST entries from kernel space are like NMIs from RCU's perspective -- they are not quiescent states even if they interrupted the kernel during a quiescent state. Add and use ist_enter and ist_exit to track IST context. Even though x86_32 has no IST stacks, we track these interrupts the same way. This fixes two issues: - Scheduling from an IST interrupt handler will now warn. It would previously appear to work as long as we got lucky and nothing overwrote the stack frame. (I don't know of any bugs in this that would trigger the warning, but it's good to be on the safe side.) - RCU handling in IST context was dangerous. As far as I know, only machine checks were likely to trigger this, but it's good to be on the safe side. Note that the machine check handlers appears to have been missing any context tracking at all before this patch. Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Signed-off-by: Andy Lutomirski <luto@amacapital.net>
* x86/mce: Spell "panicked" correctlyBorislav Petkov2014-12-08
| | | | | | | | | | We need the additional "k" to make it a hard-c: https://en.wiktionary.org/wiki/panicked Signed-off-by: Borislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/1417642605-15730-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86, mce: Support memory error recovery for both UCNA and Deferred error in ↵Chen Yucong2014-11-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | machine_check_poll Uncorrected no action required (UCNA) - is a uncorrected recoverable machine check error that is not signaled via a machine check exception and, instead, is reported to system software as a corrected machine check error. UCNA errors indicate that some data in the system is corrupted, but the data has not been consumed and the processor state is valid and you may continue execution on this processor. UCNA errors require no action from system software to continue execution. Note that UCNA errors are supported by the processor only when IA32_MCG_CAP[24] (MCG_SER_P) is set. -- Intel SDM Volume 3B Deferred errors are errors that cannot be corrected by hardware, but do not cause an immediate interruption in program flow, loss of data integrity, or corruption of processor state. These errors indicate that data has been corrupted but not consumed. Hardware writes information to the status and address registers in the corresponding bank that identifies the source of the error if deferred errors are enabled for logging. Deferred errors are not reported via machine check exceptions; they can be seen by polling the MCi_STATUS registers. -- AMD64 APM Volume 2 Above two items, both UCNA and Deferred errors belong to detected errors, but they can't be corrected by hardware, and this is very similar to Software Recoverable Action Optional (SRAO) errors. Therefore, we can take some actions that have been used for handling SRAO errors to handle UCNA and Deferred errors. Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Chen Yucong <slaoub@gmail.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* x86, mce, severity: Extend the the mce_severity mechanism to handle ↵Chen Yucong2014-11-19
| | | | | | | | | | | | | | | | | | | | | | UCNA/DEFERRED error Until now, the mce_severity mechanism can only identify the severity of UCNA error as MCE_KEEP_SEVERITY. Meanwhile, it is not able to filter out DEFERRED error for AMD platform. This patch extends the mce_severity mechanism for handling UCNA/DEFERRED error. In order to do this, the patch introduces a new severity level - MCE_UCNA/DEFERRED_SEVERITY. In addition, mce_severity is specific to machine check exception, and it will check MCIP/EIPV/RIPV bits. In order to use mce_severity mechanism in non-exception context, the patch also introduces a new argument (is_excp) for mce_severity. `is_excp' is used to explicitly specify the calling context of mce_severity. Reviewed-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Signed-off-by: Chen Yucong <slaoub@gmail.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* x86, MCE, AMD: Assign interrupt handler only when bank supports itChen Yucong2014-11-01
| | | | | | | | | | | | There are some AMD CPU models which have thresholding banks but which cannot generate a thresholding interrupt. This is denoted by the bit MCi_MISC[IntP]. Make sure to check that bit before assigning the thresholding interrupt handler. Signed-off-by: Chen Yucong <slaoub@gmail.com> [ Boris: save an indentation level and rewrite commit message. ] Link: http://lkml.kernel.org/r/1412662128.28440.18.camel@debian Signed-off-by: Borislav Petkov <bp@suse.de>
* x86, MCE, AMD: Drop software-defined bank in error thresholdingBorislav Petkov2014-10-21
| | | | | | | | | | | | | | | | | | | | | | Aravind had the good question about why we're assigning a software-defined bank when reporting error thresholding errors instead of simply using the bank which reports the last error causing the overflow. Digging through git history, it pointed to 95268664390b ("[PATCH] x86_64: mce_amd support for family 0x10 processors") which added that functionality. The problem with this, however, is that tools don't know about software-defined banks and get puzzled. So drop that K8_MCE_THRESHOLD_BASE and simply use the hw bank reporting the thresholding interrupt. Save us a couple of MSR reads while at it. Reported-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> Link: https://lkml.kernel.org/r/5435B206.60402@amd.com Signed-off-by: Borislav Petkov <bp@suse.de>
* x86, MCE, AMD: Move invariant code out from loop bodyChen Yucong2014-10-21
| | | | | | | | | | Assigning to mce_threshold_vector is loop-invariant code in mce_amd_feature_init(). So do it only once, out of loop body. Signed-off-by: Chen Yucong <slaoub@gmail.com> Link: http://lkml.kernel.org/r/1412263212.8085.6.camel@debian [ Boris: commit message corrections. ] Signed-off-by: Borislav Petkov <bp@suse.de>
* x86, MCE, AMD: Correct thresholding error loggingChen Yucong2014-10-21
| | | | | | | | | | | | | mce_setup() does not gather the content of IA32_MCG_STATUS, so it should be read explicitly. Moreover, we need to clear IA32_MCx_STATUS to avoid that mce_log() logs the processed threshold event again at next time. But we do the logging ourselves and machine_check_poll() is completely useless there. So kill it. Signed-off-by: Chen Yucong <slaoub@gmail.com> Signed-off-by: Borislav Petkov <bp@suse.de>
* x86, MCE, AMD: Use macros to compute bank MSRsChen Yucong2014-10-21
| | | | | | | | | | | Avoid open coded calculations for bank MSRs by hiding the index of higher bank MSRs in well-defined macros. No semantic changes. Signed-off-by: Chen Yucong <slaoub@gmail.com> Link: http://lkml.kernel.org/r/1411438561-24319-1-git-send-email-slaoub@gmail.com Signed-off-by: Borislav Petkov <bp@suse.de>
* Merge branch 'for-3.18-consistent-ops' of ↵Linus Torvalds2014-10-15
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu consistent-ops changes from Tejun Heo: "Way back, before the current percpu allocator was implemented, static and dynamic percpu memory areas were allocated and handled separately and had their own accessors. The distinction has been gone for many years now; however, the now duplicate two sets of accessors remained with the pointer based ones - this_cpu_*() - evolving various other operations over time. During the process, we also accumulated other inconsistent operations. This pull request contains Christoph's patches to clean up the duplicate accessor situation. __get_cpu_var() uses are replaced with with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr(). Unfortunately, the former sometimes is tricky thanks to C being a bit messy with the distinction between lvalues and pointers, which led to a rather ugly solution for cpumask_var_t involving the introduction of this_cpu_cpumask_var_ptr(). This converts most of the uses but not all. Christoph will follow up with the remaining conversions in this merge window and hopefully remove the obsolete accessors" * 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits) irqchip: Properly fetch the per cpu offset percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write. percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t Revert "powerpc: Replace __get_cpu_var uses" percpu: Remove __this_cpu_ptr clocksource: Replace __this_cpu_ptr with raw_cpu_ptr sparc: Replace __get_cpu_var uses avr32: Replace __get_cpu_var with __this_cpu_write blackfin: Replace __get_cpu_var uses tile: Use this_cpu_ptr() for hardware counters tile: Replace __get_cpu_var uses powerpc: Replace __get_cpu_var uses alpha: Replace __get_cpu_var ia64: Replace __get_cpu_var uses s390: cio driver &__get_cpu_var replacements s390: Replace __get_cpu_var uses mips: Replace __get_cpu_var uses MIPS: Replace __get_cpu_var uses in FPU emulator. arm: Replace __this_cpu_ptr with raw_cpu_ptr ...
| * x86: Replace __get_cpu_var usesChristoph Lameter2014-08-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processors percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment. __get_cpu_var() is defined as : #define __get_cpu_var(var) (*this_cpu_ptr(&(var))) __get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables. This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and less registers are used when code is generated. Transformations done to __get_cpu_var() 1. Determine the address of the percpu instance of the current processor. DEFINE_PER_CPU(int, y); int *x = &__get_cpu_var(y); Converts to int *x = this_cpu_ptr(&y); 2. Same as #1 but this time an array structure is involved. DEFINE_PER_CPU(int, y[20]); int *x = __get_cpu_var(y); Converts to int *x = this_cpu_ptr(y); 3. Retrieve the content of the current processors instance of a per cpu variable. DEFINE_PER_CPU(int, y); int x = __get_cpu_var(y) Converts to int x = __this_cpu_read(y); 4. Retrieve the content of a percpu struct DEFINE_PER_CPU(struct mystruct, y); struct mystruct x = __get_cpu_var(y); Converts to memcpy(&x, this_cpu_ptr(&y), sizeof(x)); 5. Assignment to a per cpu variable DEFINE_PER_CPU(int, y) __get_cpu_var(y) = x; Converts to __this_cpu_write(y, x); 6. Increment/Decrement etc of a per cpu variable DEFINE_PER_CPU(int, y); __get_cpu_var(y)++ Converts to __this_cpu_inc(y) Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Acked-by: H. Peter Anvin <hpa@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* | x86/mce: Avoid showing repetitive message from intel_init_thermal()Rakib Mullick2014-09-19
|/ | | | | | | | | | | | | | | | | | | | intel_init_thermal() is called from a) at the time of system initializing and b) at the time of system resume to initialize thermal monitoring. In case when thermal monitoring is handled by SMI, we get to know it via printk(). Currently it gives the message at both cases, but its okay if we get it only once and no need to get the same message at every time system resumes. So, limit showing this message only at system boot time by avoid showing at system resume and reduce abusing kernel log buffer. Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1411068135.5121.10.camel@localhost.localdomain Signed-off-by: Ingo Molnar <mingo@kernel.org>
* arch/x86: replace strict_strto callsDaniel Walter2014-08-08
| | | | | | | | | | | | Replace obsolete strict_strto calls with appropriate kstrto calls Signed-off-by: Daniel Walter <dwalter@google.com> Acked-by: Borislav Petkov <bp@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* x86: MCE: Add raw_lock conversion againThomas Gleixner2014-08-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit ea431643d6c3 ("x86/mce: Fix CMCI preemption bugs") breaks RT by the completely unrelated conversion of the cmci_discover_lock to a regular (non raw) spinlock. This lock was annotated in commit 59d958d2c7de ("locking, x86: mce: Annotate cmci_discover_lock as raw") with a proper explanation why. The argument for converting the lock back to a regular spinlock was: - it does percpu ops without disabling preemption. Preemption is not disabled due to the mistaken use of a raw spinlock. Which is complete nonsense. The raw_spinlock is disabling preemption in the same way as a regular spinlock. In mainline spinlock maps to raw_spinlock, in RT spinlock becomes a "sleeping" lock. raw_spinlock has on RT exactly the same semantics as in mainline. And because this lock is taken in non preemptible context it must be raw on RT. Undo the locking brainfart. Reported-by: Clark Williams <williams@redhat.com> Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* x86, MCE: Robustify mcheck_init_deviceBorislav Petkov2014-06-24
| | | | | | | | | | | | | | | BorisO reports that misc_register() fails often on xen. The current code unregisters the CPU hotplug notifier in that case. If then a CPU is offlined and onlined back again, we end up with a second timer running on that CPU, leading to soft lockups and system hangs. So let's leave the hotcpu notifier always registered - even if mce_device_create failed for some cores and never unreg it so that we can deal with the timer handling accordingly. Reported-and-Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/1403274493-1371-1-git-send-email-boris.ostrovsky@oracle.com Signed-off-by: Borislav Petkov <bp@suse.de>
* x86, MCE: Kill CPU_POST_DEADBorislav Petkov2014-06-22
| | | | | | | | | In conjunction with cleaning up CPU hotplug, we want to get rid of CPU_POST_DEAD. Kill this instance here and rediscover CMCI banks at the end of CPU_DEAD. Link: http://lkml.kernel.org/r/http://lkml.kernel.org/r/1400750624-19238-1-git-send-email-bp@alien8.de Signed-off-by: Borislav Petkov <bp@suse.de>
* hwpoison: remove unused global variable in do_machine_check()Chen Yucong2014-06-04
| | | | | | | | | | | | | | | Remove an unused global variable mce_entry and relative operations in do_machine_check(). Signed-off-by: Chen Yucong <slaoub@gmail.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mce: Panic when a core has reached a timeoutBorislav Petkov2014-05-30
| | | | | | | | | There is very little and maybe practically nothing we can do to recover from a system where at least one core has reached a timeout during the whole monarch cores gathering. So panic when that happens. Link: http://lkml.kernel.org/r/20140523091041.GA21332@pd.tnic Signed-off-by: Borislav Petkov <bp@suse.de>
* x86/mce: Improve mcheck_init_device() error handlingMathieu Souchaud2014-05-30
| | | | | | | | Check return code of every function called by mcheck_init_device(). Signed-off-by: Mathieu Souchaud <mattieu.souchaud@free.fr> Link: http://lkml.kernel.org/r/1399151031-19905-1-git-send-email-mattieu.souchaud@free.fr Signed-off-by: Borislav Petkov <bp@suse.de>
* asmlinkage, x86: Add explicit __visible to arch/x86/*Andi Kleen2014-05-05
| | | | | | | | | | | As requested by Linus add explicit __visible to the asmlinkage users. This marks all functions visible to assembler. Tree sweep for arch/x86/* Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1398984278-29319-3-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* x86/mce: Fix CMCI preemption bugsIngo Molnar2014-04-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following commit: 27f6c573e0f7 ("x86, CMCI: Add proper detection of end of CMCI storms") Added two preemption bugs: - machine_check_poll() does a get_cpu_var() without a matching put_cpu_var(), which causes preemption imbalance and crashes upon bootup. - it does percpu ops without disabling preemption. Preemption is not disabled due to the mistaken use of a raw spinlock. To fix these bugs fix the imbalance and change cmci_discover_lock to a regular spinlock. Reported-by: Owen Kibel <qmewlo@gmail.com> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Chen, Gong <gong.chen@linux.intel.com> Cc: Josh Boyer <jwboyer@fedoraproject.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Todorov <atodorov@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/n/tip-jtjptvgigpfkpvtQxpEk1at2@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> -- arch/x86/kernel/cpu/mcheck/mce.c | 4 +--- arch/x86/kernel/cpu/mcheck/mce_intel.c | 18 +++++++++--------- 2 files changed, 10 insertions(+), 12 deletions(-)
* Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2014-04-11
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "This is a collection of minor fixes for x86, plus the IRET information leak fix (forbid the use of 16-bit segments in 64-bit mode)" NOTE! We may have to relax the "forbid the use of 16-bit segments in 64-bit mode" part, since there may be people who still run and depend on 16-bit Windows binaries under Wine. But I'm taking this in the current unconditional form for now to see who (if anybody) screams bloody murder. Maybe nobody cares. And maybe we'll have to update it with some kind of runtime enablement (like our vm.mmap_min_addr tunable that people who run dosemu/qemu/wine already need to tweak). * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels efi: Pass correct file handle to efi_file_{read,close} x86/efi: Correct EFI boot stub use of code32_start x86/efi: Fix boot failure with EFI stub x86/platform/hyperv: Handle VMBUS driver being a module x86/apic: Reinstate error IRQ Pentium erratum 3AP workaround x86, CMCI: Add proper detection of end of CMCI storms
| * x86, CMCI: Add proper detection of end of CMCI stormsChen, Gong2014-03-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When CMCI storm persists for a long time(at least beyond predefined threshold. It's 30 seconds for now), we can watch CMCI storm is detected immediately after it subsides. ... Dec 10 22:04:29 kernel: CMCI storm detected: switching to poll mode Dec 10 22:04:59 kernel: CMCI storm subsided: switching to interrupt mode Dec 10 22:04:59 kernel: CMCI storm detected: switching to poll mode Dec 10 22:05:29 kernel: CMCI storm subsided: switching to interrupt mode ... The problem is that our logic that determines that the storm has ended is incorrect. We announce the end, re-enable interrupts and realize that the storm is still going on, so we switch back to polling mode. Rinse, repeat. When a storm happens we disable signaling of errors via CMCI and begin polling machine check banks instead. If we find any logged errors, then we need to set a per-cpu flag so that our per-cpu tests that check whether the storm is ongoing will see that errors are still being logged independently of whether mce_notify_irq() says that the error has been fully processed. cmci_clear() is not the right tool to disable a bank. It disables the interrupt for the bank as desired, but it also clears the bit for this bank in "mce_banks_owned" so we will skip the bank when polling (so we fail to see that the storm continues because we stop looking). New cmci_storm_disable_banks() just disables the interrupt while allowing polling to continue. Reported-by: William Dauchy <wdauchy@gmail.com> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* | x86, therm_throt.c: Remove unused therm_cpu_lockSrivatsa S. Bhat2014-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After fixing the CPU hotplug callback registration code, the callbacks invoked for each online CPU, during the initialization phase in thermal_throttle_init_device(), can no longer race with the actual CPU hotplug notifier callbacks (in thermal_throttle_cpu_callback). Hence the therm_cpu_lock is unnecessary now. Remove it. Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | x86, therm_throt.c: Fix CPU hotplug callback registrationSrivatsa S. Bhat2014-03-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Subsystems that want to register CPU hotplug callbacks, as well as perform initialization for the CPUs that are already online, often do it as shown below: get_online_cpus(); for_each_online_cpu(cpu) init_cpu(cpu); register_cpu_notifier(&foobar_cpu_notifier); put_online_cpus(); This is wrong, since it is prone to ABBA deadlocks involving the cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently with CPU hotplug operations). Instead, the correct and race-free way of performing the callback registration is: cpu_notifier_register_begin(); for_each_online_cpu(cpu) init_cpu(cpu); /* Note the use of the double underscored version of the API */ __register_cpu_notifier(&foobar_cpu_notifier); cpu_notifier_register_done(); Fix the thermal throttle code in x86 by using this latter form of callback registration. Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* | x86, mce: Fix CPU hotplug callback registrationSrivatsa S. Bhat2014-03-20
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Subsystems that want to register CPU hotplug callbacks, as well as perform initialization for the CPUs that are already online, often do it as shown below: get_online_cpus(); for_each_online_cpu(cpu) init_cpu(cpu); register_cpu_notifier(&foobar_cpu_notifier); put_online_cpus(); This is wrong, since it is prone to ABBA deadlocks involving the cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently with CPU hotplug operations). Instead, the correct and race-free way of performing the callback registration is: cpu_notifier_register_begin(); for_each_online_cpu(cpu) init_cpu(cpu); /* Note the use of the double underscored version of the API */ __register_cpu_notifier(&foobar_cpu_notifier); cpu_notifier_register_done(); Fix the mce code in x86 by using this latter form of callback registration. Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
* Merge branch 'x86-ras-for-linus' of ↵Linus Torvalds2014-01-20
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS changes from Ingo Molnar: - SCI reporting for other error types not only correctable ones - GHES cleanups - Add the functionality to override error reporting agents as some machines are sporting a new extended error logging capability which, if done properly in the BIOS, makes a corresponding EDAC module redundant - PCIe AER tracepoint severity levels fix - Error path correction for the mce device init - MCE timer fix - Add more flexibility to the error injection (EINJ) debugfs interface * 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, mce: Fix mce_start_timer semantics ACPI, APEI, GHES: Cleanup ghes memory error handling ACPI, APEI: Cleanup alignment-aware accesses ACPI, APEI, GHES: Do not report only correctable errors with SCI ACPI, APEI, EINJ: Changes to the ACPI/APEI/EINJ debugfs interface ACPI, eMCA: Combine eMCA/EDAC event reporting priority EDAC, sb_edac: Modify H/W event reporting policy EDAC: Add an edac_report parameter to EDAC PCI, AER: Fix severity usage in aer trace event x86, mce: Call put_device on device_register failure
| * x86, mce: Fix mce_start_timer semanticsBorislav Petkov2014-01-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So mce_start_timer() has a 'cpu' argument which is supposed to mean to start a timer on that cpu. However, the code currently starts a timer on the *current* cpu the function runs on and causes the sanity-check in mce_timer_fn to fire: WARNING: CPU: 0 PID: 0 at arch/x86/kernel/cpu/mcheck/mce.c:1286 mce_timer_fn because it is running on the wrong cpu. This was triggered by Prarit Bhargava <prarit@redhat.com> by offlining all the cpus in succession. Then, we were fiddling with the CMCI storm settings when starting the timer whereas there's no need for that - if there's storm happening on this newly restarted cpu, we're going to be in normal CMCI mode initially and then when the CMCI interrupt starts firing, we're going to go to the polling mode with the timer real soon. Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Prarit Bhargava <prarit@redhat.com> Cc: Tony Luck <tony.luck@intel.com> Reviewed-by: Chen, Gong <gong.chen@linux.intel.com> Link: http://lkml.kernel.org/r/1387722156-5511-1-git-send-email-prarit@redhat.com
| * ACPI, APEI, GHES: Do not report only correctable errors with SCIChen, Gong2013-12-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently SCI is employed to handle corrected errors - memory corrected errors, more specifically but in fact SCI still can be used to handle any errors, e.g. uncorrected or even fatal ones if enabled by the BIOS. Enable logging for those kinds of errors too. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1385363701-12387-1-git-send-email-gong.chen@linux.intel.com [ Boris: massage commit message, rename function arg. ] Signed-off-by: Borislav Petkov <bp@suse.de>
| * x86, mce: Call put_device on device_register failureLevente Kurusa2013-11-30
| | | | | | | | | | | | | | | | | | | | This patch adds a call to put_device() when the device_register() call has failed. This is required so that the last reference to the device is given up. Signed-off-by: Levente Kurusa <levex@linux.com> Link: http://lkml.kernel.org/r/5298F900.9000208@linux.com Signed-off-by: Borislav Petkov <bp@suse.de>
* | x86: Delete non-required instances of include <linux/init.h>Paul Gortmaker2014-01-07
|/ | | | | | | | | | | | | None of these files are actually using any __init type directives and hence don't need to include <linux/init.h>. Most are just a left over from __devinit and __cpuinit removal, or simply due to code getting copied from one driver to the next. [ hpa: undid incorrect removal from arch/x86/kernel/head_32.S ] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Link: http://lkml.kernel.org/r/1389054026-12947-1-git-send-email-paul.gortmaker@windriver.com Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* ACPI, APEI, CPER: Add UEFI 2.4 support for memory errorChen, Gong2013-10-23
| | | | | | | | | | | | | | In latest UEFI spec(by now it is 2.4) memory error definition for CPER (UEFI 2.4 Appendix N Common Platform Error Record) adds some new fields. These fields help people to locate memory error to an actual DIMM location. Original-author: Tony Luck <tony.luck@intel.com> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <m.chehab@samsung.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* Merge tag 'please-pull-mce-f-bit' of ↵Ingo Molnar2013-08-12
|\ | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras Pull MCE-uncorrected-error fix from Tony Luck: "Bit 12 may or may not be set in MCi_STATUS.MCACOD when an uncorrected error is reported. Ignore it when checking error signatures." Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * x86/mce: Fix mce regression from recent cleanupTony Luck2013-07-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 33d7885b594e169256daef652e8d3527b2298e75 x86/mce: Update MCE severity condition check We simplified the rules to recognise each classification of recoverable machine check combining the instruction and data fetch rules into a single entry based on clarifications in the June 2013 SDM that all recoverable events would be reported on the unaffected processor with MCG_STATUS.EIPV=0 and MCG_STATUS.RIPV=1. Unfortunately the simplified rule has a couple of bugs. Fix them here. Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
* | Merge branch 'x86/mce' into x86/rasIngo Molnar2013-08-12
|\ \ | |/ |/| | | | | | | Pursue a single RAS/MCE topic branch on x86. Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * mce: acpi/apei: Honour Firmware First for MCA banks listed in APEI HEST CMCNaveen N. Rao2013-07-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Corrected Machine Check structure (CMC) in HEST has a flag which can be set by the firmware to indicate to the OS that it prefers to process the corrected error events first. In this scenario, the OS is expected to not monitor for corrected errors (through CMCI/polling). Instead, the firmware notifies the OS on corrected error events through GHES. Linux already has support for GHES. This patch adds support for parsing CMC structure and to disable CMCI/polling if the firmware first flag is set. Further, the list of machine check bank structures at the end of CMC is used to determine which MCA banks function in FF mode, so that we continue to monitor error events on the other banks. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com>
* | x86: delete __cpuinit usage from all x86 filesPaul Gortmaker2013-07-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. Note that some harmless section mismatch warnings may result, since notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c) are flagged as __cpuinit -- so if we remove the __cpuinit from arch specific callers, we will also get section mismatch warnings. As an intermediate step, we intend to turn the linux/init.h cpuinit content into no-ops as early as possible, since that will get rid of these warnings. In any case, they are temporary and harmless. This removes all the arch/x86 uses of the __cpuinit macros from all C files. x86 only had the one __CPUINIT used in assembly files, and it wasn't paired off with a .previous or a __FINIT, so we can delete it directly w/o any corresponding additional change there. [1] https://lkml.org/lkml/2013/5/20/589 Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* | Merge branch 'next' of ↵Linus Torvalds2013-07-11
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux Pull thermal management updates from Zhang Rui: "There are not too many changes this time, except two new platform thermal drivers, ti-soc-thermal driver and x86_pkg_temp_thermal driver, and a couple of small fixes. Highlights: - move the ti-soc-thermal driver out of the staging tree to the thermal tree. - introduce the x86_pkg_temp_thermal driver. This driver registers CPU digital temperature package level sensor as a thermal zone. - small fixes/cleanups including removing redundant use of platform_set_drvdata() and of_match_ptr for all platform thermal drivers" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (34 commits) thermal: cpu_cooling: fix stub function thermal: ti-soc-thermal: use standard GPIO DT bindings thermal: MAINTAINERS: Add git tree path for SoC specific updates thermal: fix x86_pkg_temp_thermal.c build and Kconfig Thermal: Documentation for x86 package temperature thermal driver Thermal: CPU Package temperature thermal thermal: consider emul_temperature while computing trend thermal: ti-soc-thermal: add DT example for DRA752 chip thermal: ti-soc-thermal: add dra752 chip to device table thermal: ti-soc-thermal: add thermal data for DRA752 chips thermal: ti-soc-thermal: remove usage of IS_ERR_OR_NULL thermal: ti-soc-thermal: freeze FSM while computing trend thermal: ti-soc-thermal: remove external heat while extrapolating hotspot thermal: ti-soc-thermal: update DT reference for OMAP5430 x86, mcheck, therm_throt: Process package thresholds thermal: cpu_cooling: fix 'descend' check in get_property() Thermal: spear: Remove redundant use of of_match_ptr Thermal: kirkwood: Remove redundant use of of_match_ptr Thermal: dove: Remove redundant use of of_match_ptr Thermal: armada: Remove redundant use of of_match_ptr ...
| * | x86, mcheck, therm_throt: Process package thresholdsSrinivas Pandruvada2013-06-12
| |/ | | | | | | | | | | | | | | | | | | | | Added callback registration for package threshold reports. Also added a callback to check the rate control implemented in callback or not. If there is no rate control implemented, then there is a default rate control similar to core threshold notification by delaying for CHECK_INTERVAL (5 minutes) between reports. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Zhang Rui <rui.zhang@intel.com>