diff options
Diffstat (limited to 'Documentation/drivers/edac/edac.txt')
-rw-r--r-- | Documentation/drivers/edac/edac.txt | 152 |
1 files changed, 27 insertions, 125 deletions
diff --git a/Documentation/drivers/edac/edac.txt b/Documentation/drivers/edac/edac.txt index 70d96a62e5e1..7b3d969d2964 100644 --- a/Documentation/drivers/edac/edac.txt +++ b/Documentation/drivers/edac/edac.txt | |||
@@ -35,15 +35,14 @@ the vendor should tie the parity status bits to 0 if they do not intend | |||
35 | to generate parity. Some vendors do not do this, and thus the parity bit | 35 | to generate parity. Some vendors do not do this, and thus the parity bit |
36 | can "float" giving false positives. | 36 | can "float" giving false positives. |
37 | 37 | ||
38 | The PCI Parity EDAC device has the ability to "skip" known flaky | 38 | [There are patches in the kernel queue which will allow for storage of |
39 | cards during the parity scan. These are set by the parity "blacklist" | 39 | quirks of PCI devices reporting false parity positives. The 2.6.18 |
40 | interface in the sysfs for PCI Parity. (See the PCI section in the sysfs | 40 | kernel should have those patches included. When that becomes available, |
41 | section below.) There is also a parity "whitelist" which is used as | 41 | then EDAC will be patched to utilize that information to "skip" such |
42 | an explicit list of devices to scan, while the blacklist is a list | 42 | devices.] |
43 | of devices to skip. | ||
44 | 43 | ||
45 | EDAC will have future error detectors that will be added or integrated | 44 | EDAC will have future error detectors that will be integrated with |
46 | into EDAC in the following list: | 45 | EDAC or added to it, in the following list: |
47 | 46 | ||
48 | MCE Machine Check Exception | 47 | MCE Machine Check Exception |
49 | MCA Machine Check Architecture | 48 | MCA Machine Check Architecture |
@@ -93,22 +92,24 @@ EDAC lives in the /sys/devices/system/edac directory. Within this directory | |||
93 | there currently reside 2 'edac' components: | 92 | there currently reside 2 'edac' components: |
94 | 93 | ||
95 | mc memory controller(s) system | 94 | mc memory controller(s) system |
96 | pci PCI status system | 95 | pci PCI control and status system |
97 | 96 | ||
98 | 97 | ||
99 | ============================================================================ | 98 | ============================================================================ |
100 | Memory Controller (mc) Model | 99 | Memory Controller (mc) Model |
101 | 100 | ||
102 | First a background on the memory controller's model abstracted in EDAC. | 101 | First a background on the memory controller's model abstracted in EDAC. |
103 | Each mc device controls a set of DIMM memory modules. These modules are | 102 | Each 'mc' device controls a set of DIMM memory modules. These modules are |
104 | laid out in a Chip-Select Row (csrowX) and Channel table (chX). There can | 103 | laid out in a Chip-Select Row (csrowX) and Channel table (chX). There can |
105 | be multiple csrows and two channels. | 104 | be multiple csrows and multiple channels. |
106 | 105 | ||
107 | Memory controllers allow for several csrows, with 8 csrows being a typical value. | 106 | Memory controllers allow for several csrows, with 8 csrows being a typical value. |
108 | Yet, the actual number of csrows depends on the electrical "loading" | 107 | Yet, the actual number of csrows depends on the electrical "loading" |
109 | of a given motherboard, memory controller and DIMM characteristics. | 108 | of a given motherboard, memory controller and DIMM characteristics. |
110 | 109 | ||
111 | Dual channels allows for 128 bit data transfers to the CPU from memory. | 110 | Dual channels allows for 128 bit data transfers to the CPU from memory. |
111 | Some newer chipsets allow for more than 2 channels, like Fully Buffered DIMMs | ||
112 | (FB-DIMMs). The following example will assume 2 channels: | ||
112 | 113 | ||
113 | 114 | ||
114 | Channel 0 Channel 1 | 115 | Channel 0 Channel 1 |
@@ -234,23 +235,15 @@ Polling period control file: | |||
234 | The time period, in milliseconds, for polling for error information. | 235 | The time period, in milliseconds, for polling for error information. |
235 | Too small a value wastes resources. Too large a value might delay | 236 | Too small a value wastes resources. Too large a value might delay |
236 | necessary handling of errors and might loose valuable information for | 237 | necessary handling of errors and might loose valuable information for |
237 | locating the error. 1000 milliseconds (once each second) is about | 238 | locating the error. 1000 milliseconds (once each second) is the current |
238 | right for most uses. | 239 | default. Systems which require all the bandwidth they can get, may |
240 | increase this. | ||
239 | 241 | ||
240 | LOAD TIME: module/kernel parameter: poll_msec=[0|1] | 242 | LOAD TIME: module/kernel parameter: poll_msec=[0|1] |
241 | 243 | ||
242 | RUN TIME: echo "1000" >/sys/devices/system/edac/mc/poll_msec | 244 | RUN TIME: echo "1000" >/sys/devices/system/edac/mc/poll_msec |
243 | 245 | ||
244 | 246 | ||
245 | Module Version read-only attribute file: | ||
246 | |||
247 | 'mc_version' | ||
248 | |||
249 | The EDAC CORE module's version and compile date are shown here to | ||
250 | indicate what EDAC is running. | ||
251 | |||
252 | |||
253 | |||
254 | ============================================================================ | 247 | ============================================================================ |
255 | 'mcX' DIRECTORIES | 248 | 'mcX' DIRECTORIES |
256 | 249 | ||
@@ -284,35 +277,6 @@ Seconds since last counter reset control file: | |||
284 | 277 | ||
285 | 278 | ||
286 | 279 | ||
287 | DIMM capability attribute file: | ||
288 | |||
289 | 'edac_capability' | ||
290 | |||
291 | The EDAC (Error Detection and Correction) capabilities/modes of | ||
292 | the memory controller hardware. | ||
293 | |||
294 | |||
295 | DIMM Current Capability attribute file: | ||
296 | |||
297 | 'edac_current_capability' | ||
298 | |||
299 | The EDAC capabilities available with the hardware | ||
300 | configuration. This may not be the same as "EDAC capability" | ||
301 | if the correct memory is not used. If a memory controller is | ||
302 | capable of EDAC, but DIMMs without check bits are in use, then | ||
303 | Parity, SECDED, S4ECD4ED capabilities will not be available | ||
304 | even though the memory controller might be capable of those | ||
305 | modes with the proper memory loaded. | ||
306 | |||
307 | |||
308 | Memory Type supported on this controller attribute file: | ||
309 | |||
310 | 'supported_mem_type' | ||
311 | |||
312 | This attribute file displays the memory type, usually | ||
313 | buffered and unbuffered DIMMs. | ||
314 | |||
315 | |||
316 | Memory Controller name attribute file: | 280 | Memory Controller name attribute file: |
317 | 281 | ||
318 | 'mc_name' | 282 | 'mc_name' |
@@ -321,16 +285,6 @@ Memory Controller name attribute file: | |||
321 | that is being utilized. | 285 | that is being utilized. |
322 | 286 | ||
323 | 287 | ||
324 | Memory Controller Module name attribute file: | ||
325 | |||
326 | 'module_name' | ||
327 | |||
328 | This attribute file displays the memory controller module name, | ||
329 | version and date built. The name of the memory controller | ||
330 | hardware - some drivers work with multiple controllers and | ||
331 | this field shows which hardware is present. | ||
332 | |||
333 | |||
334 | Total memory managed by this memory controller attribute file: | 288 | Total memory managed by this memory controller attribute file: |
335 | 289 | ||
336 | 'size_mb' | 290 | 'size_mb' |
@@ -432,6 +386,9 @@ Memory Type attribute file: | |||
432 | 386 | ||
433 | This attribute file will display what type of memory is currently | 387 | This attribute file will display what type of memory is currently |
434 | on this csrow. Normally, either buffered or unbuffered memory. | 388 | on this csrow. Normally, either buffered or unbuffered memory. |
389 | Examples: | ||
390 | Registered-DDR | ||
391 | Unbuffered-DDR | ||
435 | 392 | ||
436 | 393 | ||
437 | EDAC Mode of operation attribute file: | 394 | EDAC Mode of operation attribute file: |
@@ -446,8 +403,13 @@ Device type attribute file: | |||
446 | 403 | ||
447 | 'dev_type' | 404 | 'dev_type' |
448 | 405 | ||
449 | This attribute file will display what type of DIMM device is | 406 | This attribute file will display what type of DRAM device is |
450 | being utilized. Example: x4 | 407 | being utilized on this DIMM. |
408 | Examples: | ||
409 | x1 | ||
410 | x2 | ||
411 | x4 | ||
412 | x8 | ||
451 | 413 | ||
452 | 414 | ||
453 | Channel 0 CE Count attribute file: | 415 | Channel 0 CE Count attribute file: |
@@ -522,10 +484,10 @@ SYSTEM LOGGING | |||
522 | If logging for UEs and CEs are enabled then system logs will have | 484 | If logging for UEs and CEs are enabled then system logs will have |
523 | error notices indicating errors that have been detected: | 485 | error notices indicating errors that have been detected: |
524 | 486 | ||
525 | MC0: CE page 0x283, offset 0xce0, grain 8, syndrome 0x6ec3, row 0, | 487 | EDAC MC0: CE page 0x283, offset 0xce0, grain 8, syndrome 0x6ec3, row 0, |
526 | channel 1 "DIMM_B1": amd76x_edac | 488 | channel 1 "DIMM_B1": amd76x_edac |
527 | 489 | ||
528 | MC0: CE page 0x1e5, offset 0xfb0, grain 8, syndrome 0xb741, row 0, | 490 | EDAC MC0: CE page 0x1e5, offset 0xfb0, grain 8, syndrome 0xb741, row 0, |
529 | channel 1 "DIMM_B1": amd76x_edac | 491 | channel 1 "DIMM_B1": amd76x_edac |
530 | 492 | ||
531 | 493 | ||
@@ -610,64 +572,4 @@ Parity Count: | |||
610 | 572 | ||
611 | 573 | ||
612 | 574 | ||
613 | PCI Device Whitelist: | ||
614 | |||
615 | 'pci_parity_whitelist' | ||
616 | |||
617 | This control file allows for an explicit list of PCI devices to be | ||
618 | scanned for parity errors. Only devices found on this list will | ||
619 | be examined. The list is a line of hexadecimal VENDOR and DEVICE | ||
620 | ID tuples: | ||
621 | |||
622 | 1022:7450,1434:16a6 | ||
623 | |||
624 | One or more can be inserted, separated by a comma. | ||
625 | |||
626 | To write the above list doing the following as one command line: | ||
627 | |||
628 | echo "1022:7450,1434:16a6" | ||
629 | > /sys/devices/system/edac/pci/pci_parity_whitelist | ||
630 | |||
631 | |||
632 | |||
633 | To display what the whitelist is, simply 'cat' the same file. | ||
634 | |||
635 | |||
636 | PCI Device Blacklist: | ||
637 | |||
638 | 'pci_parity_blacklist' | ||
639 | |||
640 | This control file allows for a list of PCI devices to be | ||
641 | skipped for scanning. | ||
642 | The list is a line of hexadecimal VENDOR and DEVICE ID tuples: | ||
643 | |||
644 | 1022:7450,1434:16a6 | ||
645 | |||
646 | One or more can be inserted, separated by a comma. | ||
647 | |||
648 | To write the above list doing the following as one command line: | ||
649 | |||
650 | echo "1022:7450,1434:16a6" | ||
651 | > /sys/devices/system/edac/pci/pci_parity_blacklist | ||
652 | |||
653 | |||
654 | To display what the whitelist currently contains, | ||
655 | simply 'cat' the same file. | ||
656 | |||
657 | ======================================================================= | 575 | ======================================================================= |
658 | |||
659 | PCI Vendor and Devices IDs can be obtained with the lspci command. Using | ||
660 | the -n option lspci will display the vendor and device IDs. The system | ||
661 | administrator will have to determine which devices should be scanned or | ||
662 | skipped. | ||
663 | |||
664 | |||
665 | |||
666 | The two lists (white and black) are prioritized. blacklist is the lower | ||
667 | priority and will NOT be utilized when a whitelist has been set. | ||
668 | Turn OFF a whitelist by an empty echo command: | ||
669 | |||
670 | echo > /sys/devices/system/edac/pci/pci_parity_whitelist | ||
671 | |||
672 | and any previous blacklist will be utilized. | ||
673 | |||