| Commit message (Collapse) | Author | Age |
... | |
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux
Pull thermal power-limit update from Tony Luck:
"Thermal limit warnings are too scary and cause unnecessary concern"
* tag 'please-pull-mce-therm' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
x86 thermal: Disable power limit notification interrupt by default
x86 thermal: Delete power-limit-notification console messages
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The package power limit notification interrupt is primarily for
system diagnosis, and should not be blindly enabled on every
system by default -- particuarly since Linux does nothing in the
handler except count how many times it has been called...
Add a new kernel cmdline parameter "int_pln_enable" for situations where
users want to oberve these events via existing system counters:
$ grep TRM /proc/interrupts
$ grep . /sys/devices/system/cpu/cpu*/thermal_throttle/*
https://bugzilla.kernel.org/show_bug.cgi?id=36182
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Package power limits are common on some systems under some conditions --
so printing console messages when limits are reached
causes unnecessary customer concern and support calls.
Note that even with these console messages gone,
the events can still be observed via system counters:
$ grep TRM /proc/interrupts
Shows total thermal interrupts, which includes both power
limit notifications and thermal throttling interrupts.
$ grep . /sys/devices/system/cpu/cpu*/thermal_throttle/*
Will show what caused those interrupts, core and package
throttling and power limit notifications.
https://bugzilla.kernel.org/show_bug.cgi?id=36182
Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 tracing updates from Ingo Molnar:
"This tree adds IRQ vector tracepoints that are named after the handler
and which output the vector #, based on a zero-overhead approach that
relies on changing the IDT entries, by Seiji Aguchi.
The new tracepoints look like this:
# perf list | grep -i irq_vector
irq_vectors:local_timer_entry [Tracepoint event]
irq_vectors:local_timer_exit [Tracepoint event]
irq_vectors:reschedule_entry [Tracepoint event]
irq_vectors:reschedule_exit [Tracepoint event]
irq_vectors:spurious_apic_entry [Tracepoint event]
irq_vectors:spurious_apic_exit [Tracepoint event]
irq_vectors:error_apic_entry [Tracepoint event]
irq_vectors:error_apic_exit [Tracepoint event]
[...]"
* 'x86-tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tracing: Add config option checking to the definitions of mce handlers
trace,x86: Do not call local_irq_save() in load_current_idt()
trace,x86: Move creation of irq tracepoints from apic.c to irq.c
x86, trace: Add irq vector tracepoints
x86: Rename variables for debugging
x86, trace: Introduce entering/exiting_irq()
tracing: Add DEFINE_EVENT_FN() macro
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
[Purpose of this patch]
As Vaibhav explained in the thread below, tracepoints for irq vectors
are useful.
http://www.spinics.net/lists/mm-commits/msg85707.html
<snip>
The current interrupt traces from irq_handler_entry and irq_handler_exit
provide when an interrupt is handled. They provide good data about when
the system has switched to kernel space and how it affects the currently
running processes.
There are some IRQ vectors which trigger the system into kernel space,
which are not handled in generic IRQ handlers. Tracing such events gives
us the information about IRQ interaction with other system events.
The trace also tells where the system is spending its time. We want to
know which cores are handling interrupts and how they are affecting other
processes in the system. Also, the trace provides information about when
the cores are idle and which interrupts are changing that state.
<snip>
On the other hand, my usecase is tracing just local timer event and
getting a value of instruction pointer.
I suggested to add an argument local timer event to get instruction pointer before.
But there is another way to get it with external module like systemtap.
So, I don't need to add any argument to irq vector tracepoints now.
[Patch Description]
Vaibhav's patch shared a trace point ,irq_vector_entry/irq_vector_exit, in all events.
But there is an above use case to trace specific irq_vector rather than tracing all events.
In this case, we are concerned about overhead due to unwanted events.
So, add following tracepoints instead of introducing irq_vector_entry/exit.
so that we can enable them independently.
- local_timer_vector
- reschedule_vector
- call_function_vector
- call_function_single_vector
- irq_work_entry_vector
- error_apic_vector
- thermal_apic_vector
- threshold_apic_vector
- spurious_apic_vector
- x86_platform_ipi_vector
Also, introduce a logic switching IDT at enabling/disabling time so that a time penalty
makes a zero when tracepoints are disabled. Detailed explanations are as follows.
- Create trace irq handlers with entering_irq()/exiting_irq().
- Create a new IDT, trace_idt_table, at boot time by adding a logic to
_set_gate(). It is just a copy of original idt table.
- Register the new handlers for tracpoints to the new IDT by introducing
macros to alloc_intr_gate() called at registering time of irq_vector handlers.
- Add checking, whether irq vector tracing is on/off, into load_current_idt().
This has to be done below debug checking for these reasons.
- Switching to debug IDT may be kicked while tracing is enabled.
- On the other hands, switching to trace IDT is kicked only when debugging
is disabled.
In addition, the new IDT is created only when CONFIG_TRACING is enabled to avoid being
used for other purposes.
Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/51C323ED.5050708@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When implementing tracepoints in interrupt handers, if the tracepoints are
simply added in the performance sensitive path of interrupt handers,
it may cause potential performance problem due to the time penalty.
To solve the problem, an idea is to prepare non-trace/trace irq handers and
switch their IDTs at the enabling/disabling time.
So, let's introduce entering_irq()/exiting_irq() for pre/post-
processing of each irq handler.
A way to use them is as follows.
Non-trace irq handler:
smp_irq_handler()
{
entering_irq(); /* pre-processing of this handler */
__smp_irq_handler(); /*
* common logic between non-trace and trace handlers
* in a vector.
*/
exiting_irq(); /* post-processing of this handler */
}
Trace irq_handler:
smp_trace_irq_handler()
{
entering_irq(); /* pre-processing of this handler */
trace_irq_entry(); /* tracepoint for irq entry */
__smp_irq_handler(); /*
* common logic between non-trace and trace handlers
* in a vector.
*/
trace_irq_exit(); /* tracepoint for irq exit */
exiting_irq(); /* post-processing of this handler */
}
If tracepoints can place outside entering_irq()/exiting_irq() as follows,
it looks cleaner.
smp_trace_irq_handler()
{
trace_irq_entry();
smp_irq_handler();
trace_irq_exit();
}
But it doesn't work.
The problem is with irq_enter/exit() being called. They must be called before
trace_irq_enter/exit(), because of the rcu_irq_enter() must be called before
any tracepoints are used, as tracepoints use rcu to synchronize.
As a possible alternative, we may be able to call irq_enter() first as follows
if irq_enter() can nest.
smp_trace_irq_hander()
{
irq_entry();
trace_irq_entry();
smp_irq_handler();
trace_irq_exit();
irq_exit();
}
But it doesn't work, either.
If irq_enter() is nested, it may have a time penalty because it has to check if it
was already called or not. The time penalty is not desired in performance sensitive
paths even if it is tiny.
Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Link: http://lkml.kernel.org/r/51C3238D.9040706@hds.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras
Pull MCE cleanup from Tony Luck:
"Changes to simplify the SDM means that we can also simplify
the code for SRAR (software recoverable action required) errors."
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| |/
| |
| |
| |
| |
| |
| |
| |
| | |
Update some SRAR severity conditions check to make it clearer,
according to latest Intel SDM Vol 3(June 2013), table 15-20.
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras
Pull MCE updates from Tony Luck:
"Better comments so we understand our existing machine check
bank bitmaps - prelude to adding another bitmap soon."
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
the MCA subsystem
There is some confusion about the 'mce_poll_banks' and 'mce_banks_owned'
per-cpu bitmaps. Provide comments so that we all know exactly what these
are used for, and why.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|/
|
|
|
|
|
| |
Fix the typo in MCJ_IRQ_BRAODCAST.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|\
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/ras
Pull clean up of the cmci_rediscover code to fix problems found by Dave Jones,
from Tony Luck.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Dave Jones reports that offlining a CPU leads to this trace:
numa_remove_cpu cpu 1 node 0: mask now 0,2-3
smpboot: CPU 1 is now offline
BUG: using smp_processor_id() in preemptible [00000000] code:
cpu-offline.sh/10591
caller is cmci_rediscover+0x6a/0xe0
Pid: 10591, comm: cpu-offline.sh Not tainted 3.9.0-rc3+ #2
Call Trace:
[<ffffffff81333bbd>] debug_smp_processor_id+0xdd/0x100
[<ffffffff8101edba>] cmci_rediscover+0x6a/0xe0
[<ffffffff815f5b9f>] mce_cpu_callback+0x19d/0x1ae
[<ffffffff8160ea66>] notifier_call_chain+0x66/0x150
[<ffffffff8107ad7e>] __raw_notifier_call_chain+0xe/0x10
[<ffffffff8104c2e3>] cpu_notify+0x23/0x50
[<ffffffff8104c31e>] cpu_notify_nofail+0xe/0x20
[<ffffffff815ef082>] _cpu_down+0x302/0x350
[<ffffffff815ef106>] cpu_down+0x36/0x50
[<ffffffff815f1c9d>] store_online+0x8d/0xd0
[<ffffffff813edc48>] dev_attr_store+0x18/0x30
[<ffffffff81226eeb>] sysfs_write_file+0xdb/0x150
[<ffffffff811adfb2>] vfs_write+0xa2/0x170
[<ffffffff811ae16c>] sys_write+0x4c/0xa0
[<ffffffff81613019>] system_call_fastpath+0x16/0x1b
However, a look at cmci_rediscover shows that it can be simplified quite
a bit, apart from solving the above issue. It invokes functions that
take spin locks with interrupts disabled, and hence it can run in atomic
context. Also, it is run in the CPU_POST_DEAD phase, so the dying CPU
is already dead and out of the cpu_online_mask. So take these points into
account and simplify the code, and thereby also fix the above issue.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Currently number of error reporting register banks is hardcoded to
6 on AMD processors. This may break in virtualized scenarios when
a hypervisor prefers to report fewer banks than what the physical
HW provides.
Since number of supported banks is reported in MSR_IA32_MCG_CAP[7:0]
that's what we should use.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1363295441-1859-3-git-send-email-boris.ostrovsky@oracle.com
[ reverse NULL ptr test logic ]
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|/
|
|
|
|
|
|
|
| |
Use helper function instead of an array to report whether register
bank is shared. Currently only bank 4 (northbridge) is shared.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1363295441-1859-2-git-send-email-boris.ostrovsky@oracle.com
Signed-off-by: Borislav Petkov <bp@suse.de>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module update from Rusty Russell:
"The sweeping change is to make add_taint() explicitly indicate whether
to disable lockdep, but it's a mechanical change."
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
MODSIGN: Add option to not sign modules during modules_install
MODSIGN: Add -s <signature> option to sign-file
MODSIGN: Specify the hash algorithm on sign-file command line
MODSIGN: Simplify Makefile with a Kconfig helper
module: clean up load_module a little more.
modpost: Ignore ARC specific non-alloc sections
module: constify within_module_*
taint: add explicit flag to show whether lock dep is still OK.
module: printk message when module signature fail taints kernel.
|
| |
| |
| |
| |
| |
| |
| | |
Fix up all callers as they were before, with make one change: an
unsigned module taints the kernel, but doesn't turn off lockdep.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There's no need to test whether a (delayed) work item in pending
before queueing, flushing or cancelling it. Most uses are unnecessary
and quite a few of them are buggy.
Remove unnecessary pending tests from x86/mce. Only compile tested.
v2: Local var work removed from mce_schedule_work() as suggested by
Borislav.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac@vger.kernel.org
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 RAS update from Ingo Molnar:
"Rework all config variables used throughout the MCA code and collect
them together into a mca_config struct. This keeps them tightly and
neatly packed together instead of spilled all over the place.
Then, convert those which are used as booleans into real booleans and
save some space. These bits are exposed via
/sys/devices/system/machinecheck/machinecheck*/"
* 'x86-ras-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, MCA: Finish mca_config conversion
x86, MCA: Convert the next three variables batch
x86, MCA: Convert rip_msr, mce_bootlog, monarch_timeout
x86, MCA: Convert dont_log_ce, banks and tolerant
drivers/base: Add a DEVICE_BOOL_ATTR macro
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
mce_ser, mce_bios_cmci_threshold and mce_disabled are the last three
bools which need conversion. Move them to the mca_config struct and
adjust usage sites accordingly.
Signed-off-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Move them into the mca_config struct and adjust code touching them
accordingly.
Signed-off-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Move above configuration variables into struct mca_config and adjust
usage places accordingly.
Signed-off-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Move those MCA configuration variables into struct mca_config and adjust
the places they're used accordingly.
Signed-off-by: Borislav Petkov <bp@alien8.de>
Acked-by: Tony Luck <tony.luck@intel.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/urgent
Pull MCE fix from Tony Luck:
"Fix problem in CMCI rediscovery code that was illegally
migrating worker threads to other cpus."
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
cmci_rediscover() used set_cpus_allowed_ptr() to change the current process's
running cpu, and migrate itself to the dest cpu. But worker processes are not
allowed to be migrated. If current is a worker, the worker will be migrated to
another cpu, but the corresponding worker_pool is still on the original cpu.
In this case, the following BUG_ON in try_to_wake_up_local() will be triggered:
BUG_ON(rq != this_rq());
This will cause the kernel panic. The call trace is like the following:
[ 6155.451107] ------------[ cut here ]------------
[ 6155.452019] kernel BUG at kernel/sched/core.c:1654!
......
[ 6155.452019] RIP: 0010:[<ffffffff810add15>] [<ffffffff810add15>] try_to_wake_up_local+0x115/0x130
......
[ 6155.452019] Call Trace:
[ 6155.452019] [<ffffffff8166fc14>] __schedule+0x764/0x880
[ 6155.452019] [<ffffffff81670059>] schedule+0x29/0x70
[ 6155.452019] [<ffffffff8166de65>] schedule_timeout+0x235/0x2d0
[ 6155.452019] [<ffffffff810db57d>] ? mark_held_locks+0x8d/0x140
[ 6155.452019] [<ffffffff810dd463>] ? __lock_release+0x133/0x1a0
[ 6155.452019] [<ffffffff81671c50>] ? _raw_spin_unlock_irq+0x30/0x50
[ 6155.452019] [<ffffffff810db8f5>] ? trace_hardirqs_on_caller+0x105/0x190
[ 6155.452019] [<ffffffff8166fefb>] wait_for_common+0x12b/0x180
[ 6155.452019] [<ffffffff810b0b30>] ? try_to_wake_up+0x2f0/0x2f0
[ 6155.452019] [<ffffffff8167002d>] wait_for_completion+0x1d/0x20
[ 6155.452019] [<ffffffff8110008a>] stop_one_cpu+0x8a/0xc0
[ 6155.452019] [<ffffffff810abd40>] ? __migrate_task+0x1a0/0x1a0
[ 6155.452019] [<ffffffff810a6ab8>] ? complete+0x28/0x60
[ 6155.452019] [<ffffffff810b0fd8>] set_cpus_allowed_ptr+0x128/0x130
[ 6155.452019] [<ffffffff81036785>] cmci_rediscover+0xf5/0x140
[ 6155.452019] [<ffffffff816643c0>] mce_cpu_callback+0x18d/0x19d
[ 6155.452019] [<ffffffff81676187>] notifier_call_chain+0x67/0x150
[ 6155.452019] [<ffffffff810a03de>] __raw_notifier_call_chain+0xe/0x10
[ 6155.452019] [<ffffffff81070470>] __cpu_notify+0x20/0x40
[ 6155.452019] [<ffffffff810704a5>] cpu_notify_nofail+0x15/0x30
[ 6155.452019] [<ffffffff81655182>] _cpu_down+0x262/0x2e0
[ 6155.452019] [<ffffffff81655236>] cpu_down+0x36/0x50
[ 6155.452019] [<ffffffff813d3eaa>] acpi_processor_remove+0x50/0x11e
[ 6155.452019] [<ffffffff813a6978>] acpi_device_remove+0x90/0xb2
[ 6155.452019] [<ffffffff8143cbec>] __device_release_driver+0x7c/0xf0
[ 6155.452019] [<ffffffff8143cd6f>] device_release_driver+0x2f/0x50
[ 6155.452019] [<ffffffff813a7870>] acpi_bus_remove+0x32/0x6d
[ 6155.452019] [<ffffffff813a7932>] acpi_bus_trim+0x87/0xee
[ 6155.452019] [<ffffffff813a7a21>] acpi_bus_hot_remove_device+0x88/0x16b
[ 6155.452019] [<ffffffff813a33ee>] acpi_os_execute_deferred+0x27/0x34
[ 6155.452019] [<ffffffff81090589>] process_one_work+0x219/0x680
[ 6155.452019] [<ffffffff81090528>] ? process_one_work+0x1b8/0x680
[ 6155.452019] [<ffffffff813a33c7>] ? acpi_os_wait_events_complete+0x23/0x23
[ 6155.452019] [<ffffffff810923be>] worker_thread+0x12e/0x320
[ 6155.452019] [<ffffffff81092290>] ? manage_workers+0x110/0x110
[ 6155.452019] [<ffffffff81098396>] kthread+0xc6/0xd0
[ 6155.452019] [<ffffffff8167c4c4>] kernel_thread_helper+0x4/0x10
[ 6155.452019] [<ffffffff81671f30>] ? retint_restore_args+0x13/0x13
[ 6155.452019] [<ffffffff810982d0>] ? __init_kthread_worker+0x70/0x70
[ 6155.452019] [<ffffffff8167c4c0>] ? gs_change+0x13/0x13
This patch removes the set_cpus_allowed_ptr() call, and put the cmci rediscover
jobs onto all the other cpus using system_wq. This could bring some delay for
the jobs.
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
|/
|
|
|
|
|
|
| |
Move to private email and put in maintained status.
Signed-off-by: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/1351532410-4887-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
From Borislav Petkov <bp@amd64.org>:
Below is a RAS fix which reverts the addition of a sysfs attribute
which we agreed is not needed, post-factum. And this should go in now
because that sysfs attribute is going to end up in 3.7 otherwise and
thus exposed to userspace; removing it then would be a lot harder.
This is done as a merge rather than a simple patch/cherry-pick since
the baseline for this patch was not in the previous x86/urgent.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
450cc201038f3 ("x86/mce: Provide boot argument to honour bios-set CMCI
threshold") added the bios_cmci_threshold sysfs attribute which was
supposed to communicate to userspace tools that BIOS CMCI threshold has
been honoured.
However, this info is not of any importance to userspace - it should
rather get the actual error count it has been thresholded already from
MCi_STATUS[38:52].
So drop this before it becomes a used interface (good thing we caught
this early in 3.7-rc1, right after the merge window closed).
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/20121017105940.GA14590@x1.osrc.amd.com
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The ACPI spec doesn't provide for a way for the bios to pass down
recommended thresholds to the OS on a _per-bank_ basis. This patch adds
a new boot option, which if passed, tells Linux to use CMCI thresholds
set by the bios.
As fail-safe, we initialize threshold to 1 if some banks have not been
initialized by the bios and warn the user.
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
| |\
| | |
| | |
| | |
| | |
| | | |
Merge Linux v3.6-rc6, to refresh this tree.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
On Intel systems corrected machine check interrupts (CMCI) may be sent to
multiple logical processors; possibly to all processors on the affected
socket (SDM Volume 3B "15.5.1 CMCI Local APIC Interface"). This means
that a persistent error (such as a stuck bit in ECC memory) may cause
a storm of interrupts that greatly hinders or prevents forward progress
(probably on many processors).
To solve this we keep track of the rate at which each processor sees
CMCI. If we exceed a threshold, we disable CMCI delivery and switch to
polling the machine check banks. If the storm subsides (none of the
affected processors see any more errors for a complete poll interval) we
re-enable CMCI.
[Tony: Added console messages when storm begins/ends and increased storm
threshold from 5 to 15 so we have a few more logged entries before we
disable interrupts and start dropping reports]
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
cmci_discover() works out which machine check banks support CMCI, and
which of those are shared by multiple logical processors. It uses this
information to ensure that exactly one cpu is designated the owner of
each bank so that when interrupts are broadcast to multiple cpus, only one
of them will look in a shared bank to log the error and clear the bank.
At boot time cmci_discover() performs this task silently. But during
certain cpu hotplug operations it prints out a set of summary lines
like this:
CPU 35 MCA banks CMCI:0 CMCI:1 CMCI:3 CMCI:5 CMCI:6 CMCI:7 CMCI:8 CMCI:9 CMCI:10 CMCI:11
CPU 1 MCA banks CMCI:0 CMCI:1 CMCI:3
CPU 39 MCA banks CMCI:0 CMCI:1 CMCI:3
CPU 38 MCA banks CMCI:0 CMCI:1 CMCI:3
CPU 32 MCA banks CMCI:0 CMCI:1 CMCI:3
CPU 37 MCA banks CMCI:0 CMCI:1 CMCI:3
CPU 36 MCA banks CMCI:0 CMCI:1 CMCI:3
CPU 34 MCA banks CMCI:0 CMCI:1 CMCI:3
The value of these messages seems very low. A user might painstakingly
cross-check against the data sheet for a processor to ensure that all
CMCI supported banks are correctly reported, but this seems improbable.
If users really wanted to do this, we should print the information at
boot time too.
Remove the messages.
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
No point in having double cases if we can simply mask the FROZEN bit
out.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Split timer init function into the init and the start part, so the
start part can replace the open coded version in CPU_DOWN_FAILED.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
raise_mce() fiddles with global state, but lacks any kind of
serialization.
Add a mutex around the raise_mce() call, so concurrent writers do not
stomp on each other toes.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
raise_mce() has a code path which does not disable preemption when the
raise_local() is called. The per cpu variable access in raise_local()
depends on preemption being disabled to be functional. So that code
path was either never tested or never tested with CONFIG_DEBUG_PREEMPT
enabled.
Add the missing preempt_disable/enable() pair around the call.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Chen Gong <gong.chen@linux.intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
|
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When booting on a federated multi-server system (NumaScale), the
processor Northbridge lookup returns NULL; add guards to prevent this
causing an oops.
On those systems, the northbridge is accessed through MMIO and the
"normal" northbridge enumeration in amd_nb.c doesn't work since we're
generating the northbridge ID from the initial APIC ID and the last
is not unique on those systems. Long story short, we end up without
northbridge descriptors.
Signed-off-by: Daniel J Blueman <daniel@numascale-asia.com>
Cc: stable@vger.kernel.org # 3.6
Link: http://lkml.kernel.org/r/1349073725-14093-1-git-send-email-daniel@numascale-asia.com
[ Boris: beef up commit message ]
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|
|\ \
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Various fixes"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86-64, kcmp: The kcmp system call can be common
arch/x86/kernel/kdebugfs.c: Ensure a consistent return value in error case
x86/mce: Add quirk for instruction recovery on Sandy Bridge processors
x86/mce: Move MCACOD defines from mce-severity.c to <asm/mce.h>
x86/ioapic: Fix NULL pointer dereference on CPU hotplug after disabling irqs
x86, nops: Missing break resulting in incorrect selection on Intel
x86: CONFIG_CC_STACKPROTECTOR=y is no longer experimental
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Sandy Bridge processors follow the SDM (Vol 3B, Table 15-20) and
set both the RIPV and EIPV bits in the MCG_STATUS register to
zero for machine checks during instruction fetch. This is more
than a little counter-intuitive and means that Linux cannot
recover from these errors. Rather than insert special case code
at several places in mce.c and mce-severity.c, we pretend the
EIPV bit was set for just this case early in processing the
machine check.
Acked-by: Borislav Petkov <bp@amd64.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Chen Gong <gong.chen@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Link: http://lkml.kernel.org/r/180a06f3f357cf9f78259ae443a082b14a29535b.1343078495.git.tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We will need some of these values in mce.c. Move them to the
appropriate header file so they are available.
Acked-by: Borislav Petkov <bp@amd64.org>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Cc: Chen Gong <gong.chen@linux.intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Link: http://lkml.kernel.org/r/0ccfb1af5fe35e537b7cd8e4d448bf7d851dbfb9.1343078495.git.tony.luck@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Pull Xen update from Konrad Rzeszutek Wilk:
"Features:
* Performance improvement to lower the amount of traps the hypervisor
has to do 32-bit guests. Mainly for setting PTE entries and
updating TLS descriptors.
* MCE polling driver to collect hypervisor MCE buffer and present
them to /dev/mcelog.
* Physical CPU online/offline support. When an privileged guest is
booted it is present with virtual CPUs, which might have an 1:1 to
physical CPUs but usually don't. This provides mechanism to
offline/online physical CPUs.
Bug-fixes for:
* Coverity found fixes in the console and ACPI processor driver.
* PVonHVM kexec fixes along with some cleanups.
* Pages that fall within E820 gaps and non-RAM regions (and had been
released to hypervisor) would be populated back, but potentially in
non-RAM regions."
* tag 'stable/for-linus-3.6-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen: populate correct number of pages when across mem boundary (v2)
xen PVonHVM: move shared_info to MMIO before kexec
xen: simplify init_hvm_pv_info
xen: remove cast from HYPERVISOR_shared_info assignment
xen: enable platform-pci only in a Xen guest
xen/pv-on-hvm kexec: shutdown watches from old kernel
xen/x86: avoid updating TLS descriptors if they haven't changed
xen/x86: add desc_equal() to compare GDT descriptors
xen/mm: zero PTEs for non-present MFNs in the initial page table
xen/mm: do direct hypercall in xen_set_pte() if batching is unavailable
xen/hvc: Fix up checks when the info is allocated.
xen/acpi: Fix potential memory leak.
xen/mce: add .poll method for mcelog device driver
xen/mce: schedule a workqueue to avoid sleep in atomic context
xen/pcpu: Xen physical cpus online/offline sys interface
xen/mce: Register native mce handler as vMCE bounce back point
x86, MCE, AMD: Adjust initcall sequence for xen
xen/mce: Add mcelog support for Xen platform
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
there are 3 funcs which need to be _initcalled in a logic sequence:
1. xen_late_init_mcelog
2. mcheck_init_device
3. threshold_init_device
xen_late_init_mcelog must register xen_mce_chrdev_device before
native mce_chrdev_device registration if running under xen platform;
mcheck_init_device should be inited before threshold_init_device to
initialize mce_device, otherwise a a NULL ptr dereference will cause panic.
so we use following _initcalls
1. device_initcall(xen_late_init_mcelog);
2. device_initcall_sync(mcheck_init_device);
3. late_initcall(threshold_init_device);
when running under xen, the initcall order is 1,2,3;
on baremetal, we skip 1 and we do only 2 and 3.
Acked-and-tested-by: Borislav Petkov <bp@amd64.org>
Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When MCA error occurs, it would be handled by Xen hypervisor first,
and then the error information would be sent to initial domain for logging.
This patch gets error information from Xen hypervisor and convert
Xen format error into Linux format mcelog. This logic is basically
self-contained, not touching other kernel components.
By using tools like mcelog tool users could read specific error information,
like what they did under native Linux.
To test follow directions outlined in Documentation/acpi/apei/einj.txt
Acked-and-tested-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Ke, Liping <liping.ke@intel.com>
Signed-off-by: Jiang, Yunhong <yunhong.jiang@intel.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|\ \ \
| |_|/
|/| |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/mce changes from Ingo Molnar:
"This tree improves the AMD thresholding bank code and includes a
memory fault signal handling fixlet."
* 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mce: Fix siginfo_t->si_addr value for non-recoverable memory faults
x86, MCE, AMD: Update copyrights and boilerplate
x86, MCE, AMD: Give proper names to the thresholding banks
x86, MCE, AMD: Make error_count read only
x86, MCE, AMD: Cleanup reading of error_count
x86, MCE, AMD: Print decimal thresholding values
x86, MCE, AMD: Move shared bank to node descriptor
x86, MCE, AMD: Remove local_allocate_... wrapper
x86, MCE, AMD: Remove shared banks sysfs linking
x86, amd_nb: Export model 0x10 and later PCI id
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/mce
Merge memory fault handling fix from Tony Luck.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | |/
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In commit dad1743e5993f1 ("x86/mce: Only restart instruction after machine
check recovery if it is safe") we fixed mce_notify_process() to force a
signal to the current process if it was not restartable (RIPV bit not
set in MCG_STATUS). But doing it here means that the process doesn't
get told the virtual address of the fault via siginfo_t->si_addr. This
would prevent application level recovery from the fault.
Make a new MF_MUST_KILL flag bit for memory_failure() et al. to use so
that we will provide the right information with the signal.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: stable@kernel.org # 3.4+
|
| |\|
| | |
| | |
| | |
| | |
| | | |
Merge Linux 3.5-rc6 before merging more code.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras into x86/mce
Pull in AMD MCE thresholding fixes for v3.6, from Borislav Petkov:
" Those are a bunch of patches which give the MCE thresholding code a
hard look and a scrubbing to remove a couple of annoyances like sysfs
warnings when running CPU off-/online tests and the threshold_bank4 node
under /sys/devices/system/machinecheck/ is a symlink.
It also gives proper names to the thresholding banks instead of simply
enumerating them, like this:
/sys/devices/system/machinecheck/machinecheck0/
|-- bank0
|-- bank1
|-- bank2
|-- bank3
|-- bank4
|-- bank5
|-- bank6
|-- check_interval
|-- cmci_disabled
|-- combined_unit
| |-- combined_unit
| |-- error_count
| |-- threshold_limit
|-- dont_log_ce
|-- execution_unit
| |-- execution_unit
| |-- error_count
| |-- threshold_limit
|-- ignore_ce
|-- insn_fetch
| |-- insn_fetch
| |-- error_count
| |-- threshold_limit
|-- load_store
| |-- load_store
| |-- error_count
| |-- threshold_limit
|-- monarch_timeout
|-- northbridge
| |-- dram
| | |-- error_count
| | |-- interrupt_enable
| | |-- threshold_limit
| |-- ht_links
| | |-- error_count
| | |-- interrupt_enable
| | |-- threshold_limit
| |-- l3_cache
| |-- error_count
| |-- interrupt_enable
| |-- threshold_limit
...
It is tested on all our families >= K8."
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Jacob is doing something else now so add myself as the loser who
provides support.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Having the banks numbered is ok but having real names which mean
something to the user makes a lot more sense:
/sys/devices/system/machinecheck/machinecheck0/
|-- bank0
|-- bank1
|-- bank2
|-- bank3
|-- bank4
|-- bank5
|-- bank6
|-- check_interval
|-- cmci_disabled
|-- combined_unit
| |-- combined_unit
| |-- error_count
| |-- threshold_limit
|-- dont_log_ce
|-- execution_unit
| |-- execution_unit
| |-- error_count
| |-- threshold_limit
|-- ignore_ce
|-- insn_fetch
| |-- insn_fetch
| |-- error_count
| |-- threshold_limit
|-- load_store
| |-- load_store
| |-- error_count
| |-- threshold_limit
|-- monarch_timeout
|-- northbridge
| |-- dram
| | |-- error_count
| | |-- interrupt_enable
| | |-- threshold_limit
| |-- ht_links
| | |-- error_count
| | |-- interrupt_enable
| | |-- threshold_limit
| |-- l3_cache
| |-- error_count
| |-- interrupt_enable
| |-- threshold_limit
...
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
|