aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2009-06-13 16:14:51 -0400
committerLinus Torvalds <torvalds@linux-foundation.org>2009-06-13 16:14:51 -0400
commita2ee2981ae2a7046b10980feae9f4ab813877106 (patch)
treeed75db7830b9ef1342659d36d2775954ce96b79f /Documentation
parent7603ef03a22a33d36d3c75d7c1aca1f957671ad3 (diff)
parent0d5959723e1db3fd7323c198a50c16cecf96c7a9 (diff)
Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (80 commits) x86, mce: Add boot options for corrected errors x86, mce: Fix mce printing x86, mce: fix for mce counters x86, mce: support action-optional machine checks x86, mce: define MCE_VECTOR x86, mce: rename mce_notify_user to mce_notify_irq x86: fix panic with interrupts off (needed for MCE) x86, mce: export MCE severities coverage via debugfs x86, mce: implement new status bits x86, mce: print header/footer only once for multiple MCEs x86, mce: default to panic timeout for machine checks x86, mce: improve mce_get_rip x86, mce: make non Monarch panic message "Fatal machine check" too x86, mce: switch x86 machine check handler to Monarch election. x86, mce: implement panic synchronization x86, mce: implement bootstrapping for machine check wakeups x86, mce: check early in exception handler if panic is needed x86, mce: add table driven machine check grading x86, mce: remove TSC print heuristic x86, mce: log corrected errors when panicing ...
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/Changes15
-rw-r--r--Documentation/feature-removal-schedule.txt10
-rw-r--r--Documentation/x86/x86_64/boot-options.txt44
-rw-r--r--Documentation/x86/x86_64/machinecheck8
4 files changed, 69 insertions, 8 deletions
diff --git a/Documentation/Changes b/Documentation/Changes
index b95082be4d5e..d21b3b5aa543 100644
--- a/Documentation/Changes
+++ b/Documentation/Changes
@@ -48,6 +48,7 @@ o procps 3.2.0 # ps --version
48o oprofile 0.9 # oprofiled --version 48o oprofile 0.9 # oprofiled --version
49o udev 081 # udevinfo -V 49o udev 081 # udevinfo -V
50o grub 0.93 # grub --version 50o grub 0.93 # grub --version
51o mcelog 0.6
51 52
52Kernel compilation 53Kernel compilation
53================== 54==================
@@ -276,6 +277,16 @@ before running exportfs or mountd. It is recommended that all NFS
276services be protected from the internet-at-large by a firewall where 277services be protected from the internet-at-large by a firewall where
277that is possible. 278that is possible.
278 279
280mcelog
281------
282
283In Linux 2.6.31+ the i386 kernel needs to run the mcelog utility
284as a regular cronjob similar to the x86-64 kernel to process and log
285machine check events when CONFIG_X86_NEW_MCE is enabled. Machine check
286events are errors reported by the CPU. Processing them is strongly encouraged.
287All x86-64 kernels since 2.6.4 require the mcelog utility to
288process machine checks.
289
279Getting updated software 290Getting updated software
280======================== 291========================
281 292
@@ -365,6 +376,10 @@ FUSE
365---- 376----
366o <http://sourceforge.net/projects/fuse> 377o <http://sourceforge.net/projects/fuse>
367 378
379mcelog
380------
381o <ftp://ftp.kernel.org/pub/linux/utils/cpu/mce/mcelog/>
382
368Networking 383Networking
369********** 384**********
370 385
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index de491a3e2313..ec9ef5d0d7b3 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -437,3 +437,13 @@ Why: Superseded by tdfxfb. I2C/DDC support used to live in a separate
437 driver but this caused driver conflicts. 437 driver but this caused driver conflicts.
438Who: Jean Delvare <khali@linux-fr.org> 438Who: Jean Delvare <khali@linux-fr.org>
439 Krzysztof Helt <krzysztof.h1@wp.pl> 439 Krzysztof Helt <krzysztof.h1@wp.pl>
440
441----------------------------
442
443What: CONFIG_X86_OLD_MCE
444When: 2.6.32
445Why: Remove the old legacy 32bit machine check code. This has been
446 superseded by the newer machine check code from the 64bit port,
447 but the old version has been kept around for easier testing. Note this
448 doesn't impact the old P5 and WinChip machine check handlers.
449Who: Andi Kleen <andi@firstfloor.org>
diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt
index 2db5893d6c97..29a6ff8bc7d3 100644
--- a/Documentation/x86/x86_64/boot-options.txt
+++ b/Documentation/x86/x86_64/boot-options.txt
@@ -5,21 +5,51 @@ only the AMD64 specific ones are listed here.
5 5
6Machine check 6Machine check
7 7
8 mce=off disable machine check 8 Please see Documentation/x86/x86_64/machinecheck for sysfs runtime tunables.
9 mce=bootlog Enable logging of machine checks left over from booting. 9
10 Disabled by default on AMD because some BIOS leave bogus ones. 10 mce=off
11 If your BIOS doesn't do that it's a good idea to enable though 11 Disable machine check
12 to make sure you log even machine check events that result 12 mce=no_cmci
13 in a reboot. On Intel systems it is enabled by default. 13 Disable CMCI(Corrected Machine Check Interrupt) that
14 Intel processor supports. Usually this disablement is
15 not recommended, but it might be handy if your hardware
16 is misbehaving.
17 Note that you'll get more problems without CMCI than with
18 due to the shared banks, i.e. you might get duplicated
19 error logs.
20 mce=dont_log_ce
21 Don't make logs for corrected errors. All events reported
22 as corrected are silently cleared by OS.
23 This option will be useful if you have no interest in any
24 of corrected errors.
25 mce=ignore_ce
26 Disable features for corrected errors, e.g. polling timer
27 and CMCI. All events reported as corrected are not cleared
28 by OS and remained in its error banks.
29 Usually this disablement is not recommended, however if
30 there is an agent checking/clearing corrected errors
31 (e.g. BIOS or hardware monitoring applications), conflicting
32 with OS's error handling, and you cannot deactivate the agent,
33 then this option will be a help.
34 mce=bootlog
35 Enable logging of machine checks left over from booting.
36 Disabled by default on AMD because some BIOS leave bogus ones.
37 If your BIOS doesn't do that it's a good idea to enable though
38 to make sure you log even machine check events that result
39 in a reboot. On Intel systems it is enabled by default.
14 mce=nobootlog 40 mce=nobootlog
15 Disable boot machine check logging. 41 Disable boot machine check logging.
16 mce=tolerancelevel (number) 42 mce=tolerancelevel[,monarchtimeout] (number,number)
43 tolerance levels:
17 0: always panic on uncorrected errors, log corrected errors 44 0: always panic on uncorrected errors, log corrected errors
18 1: panic or SIGBUS on uncorrected errors, log corrected errors 45 1: panic or SIGBUS on uncorrected errors, log corrected errors
19 2: SIGBUS or log uncorrected errors, log corrected errors 46 2: SIGBUS or log uncorrected errors, log corrected errors
20 3: never panic or SIGBUS, log all errors (for testing only) 47 3: never panic or SIGBUS, log all errors (for testing only)
21 Default is 1 48 Default is 1
22 Can be also set using sysfs which is preferable. 49 Can be also set using sysfs which is preferable.
50 monarchtimeout:
51 Sets the time in us to wait for other CPUs on machine checks. 0
52 to disable.
23 53
24 nomce (for compatibility with i386): same as mce=off 54 nomce (for compatibility with i386): same as mce=off
25 55
diff --git a/Documentation/x86/x86_64/machinecheck b/Documentation/x86/x86_64/machinecheck
index a05e58e7b159..b1fb30273286 100644
--- a/Documentation/x86/x86_64/machinecheck
+++ b/Documentation/x86/x86_64/machinecheck
@@ -41,7 +41,9 @@ check_interval
41 the polling interval. When the poller stops finding MCEs, it 41 the polling interval. When the poller stops finding MCEs, it
42 triggers an exponential backoff (poll less often) on the polling 42 triggers an exponential backoff (poll less often) on the polling
43 interval. The check_interval variable is both the initial and 43 interval. The check_interval variable is both the initial and
44 maximum polling interval. 44 maximum polling interval. 0 means no polling for corrected machine
45 check errors (but some corrected errors might be still reported
46 in other ways)
45 47
46tolerant 48tolerant
47 Tolerance level. When a machine check exception occurs for a non 49 Tolerance level. When a machine check exception occurs for a non
@@ -67,6 +69,10 @@ trigger
67 Program to run when a machine check event is detected. 69 Program to run when a machine check event is detected.
68 This is an alternative to running mcelog regularly from cron 70 This is an alternative to running mcelog regularly from cron
69 and allows to detect events faster. 71 and allows to detect events faster.
72monarch_timeout
73 How long to wait for the other CPUs to machine check too on a
74 exception. 0 to disable waiting for other CPUs.
75 Unit: us
70 76
71TBD document entries for AMD threshold interrupt configuration 77TBD document entries for AMD threshold interrupt configuration
72 78