author | Linus Torvalds <torvalds@linux-foundation.org> | 2009-06-13 16:14:51 -0400 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2009-06-13 16:14:51 -0400 |
commit | a2ee2981ae2a7046b10980feae9f4ab813877106 (patch) | |
tree | ed75db7830b9ef1342659d36d2775954ce96b79f /Documentation | |
parent | 7603ef03a22a33d36d3c75d7c1aca1f957671ad3 (diff) | |
parent | 0d5959723e1db3fd7323c198a50c16cecf96c7a9 (diff) |
Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (80 commits)
x86, mce: Add boot options for corrected errors
x86, mce: Fix mce printing
x86, mce: fix for mce counters
x86, mce: support action-optional machine checks
x86, mce: define MCE_VECTOR
x86, mce: rename mce_notify_user to mce_notify_irq
x86: fix panic with interrupts off (needed for MCE)
x86, mce: export MCE severities coverage via debugfs
x86, mce: implement new status bits
x86, mce: print header/footer only once for multiple MCEs
x86, mce: default to panic timeout for machine checks
x86, mce: improve mce_get_rip
x86, mce: make non Monarch panic message "Fatal machine check" too
x86, mce: switch x86 machine check handler to Monarch election.
x86, mce: implement panic synchronization
x86, mce: implement bootstrapping for machine check wakeups
x86, mce: check early in exception handler if panic is needed
x86, mce: add table driven machine check grading
x86, mce: remove TSC print heuristic
x86, mce: log corrected errors when panicing
...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/Changes | 15 | ||||
-rw-r--r-- | Documentation/feature-removal-schedule.txt | 10 | ||||
-rw-r--r-- | Documentation/x86/x86_64/boot-options.txt | 44 | ||||
-rw-r--r-- | Documentation/x86/x86_64/machinecheck | 8 |
4 files changed, 69 insertions, 8 deletions
diff --git a/Documentation/Changes b/Documentation/Changes | |||
index b95082be4d5e..d21b3b5aa543 100644 | |||
--- a/Documentation/Changes | |||
+++ b/Documentation/Changes | |||
@@ -48,6 +48,7 @@ o procps 3.2.0 # ps --version | |||
48 | o oprofile 0.9 # oprofiled --version | 48 | o oprofile 0.9 # oprofiled --version |
49 | o udev 081 # udevinfo -V | 49 | o udev 081 # udevinfo -V |
50 | o grub 0.93 # grub --version | 50 | o grub 0.93 # grub --version |
51 | o mcelog 0.6 | ||
51 | 52 | ||
52 | Kernel compilation | 53 | Kernel compilation |
53 | ================== | 54 | ================== |
@@ -276,6 +277,16 @@ before running exportfs or mountd. It is recommended that all NFS | |||
276 | services be protected from the internet-at-large by a firewall where | 277 | services be protected from the internet-at-large by a firewall where |
277 | that is possible. | 278 | that is possible. |
278 | 279 | ||
280 | mcelog | ||
281 | ------ | ||
282 | |||
283 | In Linux 2.6.31+ the i386 kernel needs to run the mcelog utility | ||
284 | as a regular cronjob similar to the x86-64 kernel to process and log | ||
285 | machine check events when CONFIG_X86_NEW_MCE is enabled. Machine check | ||
286 | events are errors reported by the CPU. Processing them is strongly encouraged. | ||
287 | All x86-64 kernels since 2.6.4 require the mcelog utility to | ||
288 | process machine checks. | ||
289 | |||
279 | Getting updated software | 290 | Getting updated software |
280 | ======================== | 291 | ======================== |
281 | 292 | ||
@@ -365,6 +376,10 @@ FUSE | |||
365 | ---- | 376 | ---- |
366 | o <http://sourceforge.net/projects/fuse> | 377 | o <http://sourceforge.net/projects/fuse> |
367 | 378 | ||
379 | mcelog | ||
380 | ------ | ||
381 | o <ftp://ftp.kernel.org/pub/linux/utils/cpu/mce/mcelog/> | ||
382 | |||
368 | Networking | 383 | Networking |
369 | ********** | 384 | ********** |
370 | 385 | ||
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt | |||
index de491a3e2313..ec9ef5d0d7b3 100644 | |||
--- a/Documentation/feature-removal-schedule.txt | |||
+++ b/Documentation/feature-removal-schedule.txt | |||
@@ -437,3 +437,13 @@ Why: Superseded by tdfxfb. I2C/DDC support used to live in a separate | |||
437 | driver but this caused driver conflicts. | 437 | driver but this caused driver conflicts. |
438 | Who: Jean Delvare <khali@linux-fr.org> | 438 | Who: Jean Delvare <khali@linux-fr.org> |
439 | Krzysztof Helt <krzysztof.h1@wp.pl> | 439 | Krzysztof Helt <krzysztof.h1@wp.pl> |
440 | |||
441 | ---------------------------- | ||
442 | |||
443 | What: CONFIG_X86_OLD_MCE | ||
444 | When: 2.6.32 | ||
445 | Why: Remove the old legacy 32bit machine check code. This has been | ||
446 | superseded by the newer machine check code from the 64bit port, | ||
447 | but the old version has been kept around for easier testing. Note this | ||
448 | doesn't impact the old P5 and WinChip machine check handlers. | ||
449 | Who: Andi Kleen <andi@firstfloor.org> | ||
diff --git a/Documentation/x86/x86_64/boot-options.txt b/Documentation/x86/x86_64/boot-options.txt | |||
index 2db5893d6c97..29a6ff8bc7d3 100644 | |||
--- a/Documentation/x86/x86_64/boot-options.txt | |||
+++ b/Documentation/x86/x86_64/boot-options.txt | |||
@@ -5,21 +5,51 @@ only the AMD64 specific ones are listed here. | |||
5 | 5 | ||
6 | Machine check | 6 | Machine check |
7 | 7 | ||
8 | mce=off disable machine check | 8 | Please see Documentation/x86/x86_64/machinecheck for sysfs runtime tunables. |
9 | mce=bootlog Enable logging of machine checks left over from booting. | 9 | |
10 | Disabled by default on AMD because some BIOS leave bogus ones. | 10 | mce=off |
11 | If your BIOS doesn't do that it's a good idea to enable though | 11 | Disable machine check |
12 | to make sure you log even machine check events that result | 12 | mce=no_cmci |
13 | in a reboot. On Intel systems it is enabled by default. | 13 | Disable CMCI(Corrected Machine Check Interrupt) that |
14 | Intel processor supports. Usually this disablement is | ||
15 | not recommended, but it might be handy if your hardware | ||
16 | is misbehaving. | ||
17 | Note that you'll get more problems without CMCI than with | ||
18 | due to the shared banks, i.e. you might get duplicated | ||
19 | error logs. | ||
20 | mce=dont_log_ce | ||
21 | Don't make logs for corrected errors. All events reported | ||
22 | as corrected are silently cleared by OS. | ||
23 | This option will be useful if you have no interest in any | ||
24 | of corrected errors. | ||
25 | mce=ignore_ce | ||
26 | Disable features for corrected errors, e.g. polling timer | ||
27 | and CMCI. All events reported as corrected are not cleared | ||
28 | by OS and remained in its error banks. | ||
29 | Usually this disablement is not recommended, however if | ||
30 | there is an agent checking/clearing corrected errors | ||
31 | (e.g. BIOS or hardware monitoring applications), conflicting | ||
32 | with OS's error handling, and you cannot deactivate the agent, | ||
33 | then this option will be a help. | ||
34 | mce=bootlog | ||
35 | Enable logging of machine checks left over from booting. | ||
36 | Disabled by default on AMD because some BIOS leave bogus ones. | ||
37 | If your BIOS doesn't do that it's a good idea to enable though | ||
38 | to make sure you log even machine check events that result | ||
39 | in a reboot. On Intel systems it is enabled by default. | ||
14 | mce=nobootlog | 40 | mce=nobootlog |
15 | Disable boot machine check logging. | 41 | Disable boot machine check logging. |
16 | mce=tolerancelevel (number) | 42 | mce=tolerancelevel[,monarchtimeout] (number,number) |
43 | tolerance levels: | ||
17 | 0: always panic on uncorrected errors, log corrected errors | 44 | 0: always panic on uncorrected errors, log corrected errors |
18 | 1: panic or SIGBUS on uncorrected errors, log corrected errors | 45 | 1: panic or SIGBUS on uncorrected errors, log corrected errors |
19 | 2: SIGBUS or log uncorrected errors, log corrected errors | 46 | 2: SIGBUS or log uncorrected errors, log corrected errors |
20 | 3: never panic or SIGBUS, log all errors (for testing only) | 47 | 3: never panic or SIGBUS, log all errors (for testing only) |
21 | Default is 1 | 48 | Default is 1 |
22 | Can be also set using sysfs which is preferable. | 49 | Can be also set using sysfs which is preferable. |
50 | monarchtimeout: | ||
51 | Sets the time in us to wait for other CPUs on machine checks. 0 | ||
52 | to disable. | ||
23 | 53 | ||
24 | nomce (for compatibility with i386): same as mce=off | 54 | nomce (for compatibility with i386): same as mce=off |
25 | 55 | ||
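
Reviewer note: the new `mce=` syntax in the hunk above accepts either a keyword or `tolerancelevel[,monarchtimeout]`. The following is a minimal Python sketch of how that grammar could be read, written only to illustrate the documented forms. It is not the kernel's parser; the names `MceOption` and `parse_mce_option` are hypothetical, and rejecting tolerance levels outside 0 to 3 is a choice made by this sketch, not something the documentation states.

```python
from dataclasses import dataclass
from typing import Optional

# Keyword forms of mce= listed in the boot-options.txt hunk above.
KEYWORDS = {"off", "no_cmci", "dont_log_ce", "ignore_ce", "bootlog", "nobootlog"}


@dataclass
class MceOption:
    """Hypothetical container for one parsed mce= value."""
    keyword: Optional[str] = None
    tolerant: Optional[int] = None
    monarch_timeout_us: Optional[int] = None


def _parse_number(text: str, what: str) -> int:
    # Accept plain ASCII decimal digits only.
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"{what} must be a non-negative integer, got {text!r}")
    return int(text)


def parse_mce_option(value: str) -> MceOption:
    """Parse the text following 'mce=' (illustrative sketch only)."""
    if value in KEYWORDS:
        return MceOption(keyword=value)

    level_text, comma, timeout_text = value.partition(",")
    level = _parse_number(level_text, "tolerance level")
    if level > 3:
        # Assumption of this sketch: only the four documented levels are valid.
        raise ValueError(f"tolerance level must be 0..3, got {level}")

    timeout = None
    if comma:
        # Documented unit is microseconds; 0 disables waiting for other CPUs.
        timeout = _parse_number(timeout_text, "monarch timeout")
    return MceOption(tolerant=level, monarch_timeout_us=timeout)
```

For example, `parse_mce_option("2,500000")` yields tolerance level 2 with a monarch timeout of 500000 us, while `parse_mce_option("ignore_ce")` yields only the keyword.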
diff --git a/Documentation/x86/x86_64/machinecheck b/Documentation/x86/x86_64/machinecheck | |||
index a05e58e7b159..b1fb30273286 100644 | |||
--- a/Documentation/x86/x86_64/machinecheck | |||
+++ b/Documentation/x86/x86_64/machinecheck | |||
@@ -41,7 +41,9 @@ check_interval | |||
41 | the polling interval. When the poller stops finding MCEs, it | 41 | the polling interval. When the poller stops finding MCEs, it |
42 | triggers an exponential backoff (poll less often) on the polling | 42 | triggers an exponential backoff (poll less often) on the polling |
43 | interval. The check_interval variable is both the initial and | 43 | interval. The check_interval variable is both the initial and |
44 | maximum polling interval. | 44 | maximum polling interval. 0 means no polling for corrected machine |
45 | check errors (but some corrected errors might be still reported | ||
46 | in other ways) | ||
45 | 47 | ||
46 | tolerant | 48 | tolerant |
47 | Tolerance level. When a machine check exception occurs for a non | 49 | Tolerance level. When a machine check exception occurs for a non |
@@ -67,6 +69,10 @@ trigger | |||
67 | Program to run when a machine check event is detected. | 69 | Program to run when a machine check event is detected. |
68 | This is an alternative to running mcelog regularly from cron | 70 | This is an alternative to running mcelog regularly from cron |
69 | and allows to detect events faster. | 71 | and allows to detect events faster. |
72 | monarch_timeout | ||
73 | How long to wait for the other CPUs to machine check too on a | ||
74 | exception. 0 to disable waiting for other CPUs. | ||
75 | Unit: us | ||
70 | 76 | ||
71 | TBD document entries for AMD threshold interrupt configuration | 77 | TBD document entries for AMD threshold interrupt configuration |
72 | 78 | ||
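
Reviewer note: the machinecheck hunks above document `check_interval`, `tolerant`, and the new `monarch_timeout` tunable, including the special meaning of 0. The Python sketch below shows one way a script might read those files and flag the zero cases. The directory `/sys/devices/system/machinecheck/machinecheck0` is an assumption, since the hunks name the tunables but not their path, and the sketch also assumes each file holds a single decimal integer. The function names are hypothetical.

```python
from pathlib import Path

# Assumed sysfs directory for CPU 0; not stated in the hunks above.
DEFAULT_DIR = Path("/sys/devices/system/machinecheck/machinecheck0")

# Tunables named in the machinecheck documentation hunks.
TUNABLES = ("check_interval", "tolerant", "monarch_timeout")


def read_tunables(base: Path = DEFAULT_DIR) -> dict:
    """Return {name: value} for each tunable file that exists under base."""
    values = {}
    for name in TUNABLES:
        path = base / name
        if path.is_file():
            # Assumption: the file contains one decimal integer.
            values[name] = int(path.read_text().strip())
    return values


def describe(values: dict) -> list:
    """Explain the documented special case of 0 for two of the tunables."""
    notes = []
    if values.get("check_interval") == 0:
        notes.append("polling for corrected machine check errors is disabled")
    if values.get("monarch_timeout") == 0:
        notes.append("not waiting for other CPUs on a machine check")
    return notes


if __name__ == "__main__":
    current = read_tunables()
    for name, value in current.items():
        print(f"{name} = {value}")
    for note in describe(current):
        print(f"note: {note}")
```

Taking the directory as a parameter keeps the sketch usable on machines where the files live elsewhere or do not exist; missing files are skipped rather than treated as errors.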