author     David Teigland <teigland@redhat.com>     2006-01-20 03:59:41 -0500
committer  Steven Whitehouse <steve@chygwyn.com>    2006-01-20 03:59:41 -0500
commit     044399b2cb6ad2d7f63cfca945268853d7443a4d
tree       5f96eb307b0389ac0b919a4744a40862b615e9da /Documentation/drivers/edac
parent     901359256b2666f52a3a7d3f31927677e91b3a2a
parent     18a4144028f056b77d6576d4eb284246e9c7ea97
Merge branch 'master'
Diffstat (limited to 'Documentation/drivers/edac')
-rw-r--r--  Documentation/drivers/edac/edac.txt  673
1 files changed, 673 insertions, 0 deletions
diff --git a/Documentation/drivers/edac/edac.txt b/Documentation/drivers/edac/edac.txt
new file mode 100644
index 000000000000..d37191fe5681
--- /dev/null
+++ b/Documentation/drivers/edac/edac.txt
@@ -0,0 +1,673 @@


EDAC - Error Detection And Correction

Written by Doug Thompson <norsk5@xmission.com>
7 Dec 2005


EDAC was written by:
        Thayne Harbaugh,
        modified by Dave Peterson, Doug Thompson, et al.,
        from the bluesmoke.sourceforge.net project.


============================================================================
EDAC PURPOSE

The goal of the 'edac' kernel module is to detect and report errors that
occur within the computer system. In the initial release, memory Correctable
Errors (CE) and Uncorrectable Errors (UE) are the primary errors being
harvested.

Detecting CE events, then harvesting and reporting them, CAN be a predictor
of future UE events. When CEs occur, the system can continue to operate,
but with a reduced safety margin. Preventive maintenance and proactive
replacement of memory DIMMs exhibiting CEs can reduce the likelihood of the
dreaded UE events and system 'panics'.


In addition, PCI Bus Parity and SERR errors are scanned for on PCI devices
in order to determine whether errors are occurring on data transfers.
The presence of PCI Parity errors must be taken with a grain of salt.
Several add-in adapters do NOT follow the PCI specification with regard
to parity generation and reporting. The specification says the vendor
should tie the parity status bits to 0 if they do not intend to generate
parity. Some vendors do not do this, so the parity bit can "float",
giving false positives.

The PCI Parity EDAC device has the ability to "skip" known flaky
cards during the parity scan. These are set by the parity "blacklist"
interface in the sysfs for PCI Parity. (See the PCI section in the sysfs
section below.) There is also a parity "whitelist", which is an explicit
list of devices to scan, while the blacklist is a list of devices to skip.

Future error detectors that will be added to or integrated into EDAC
include:

        MCE     Machine Check Exception
        MCA     Machine Check Architecture
        NMI     NMI notification of ECC errors
        MSRs    Machine Specific Register error cases
        and other mechanisms.

These errors are usually bus errors, ECC errors, thermal throttling
and the like.


============================================================================
EDAC VERSIONING

EDAC is composed of a "core" module (edac_mc.ko) and several Memory
Controller (MC) driver modules. On a given system, the CORE is loaded
along with one MC driver. Both the CORE and the MC driver have individual
versions that reflect the current release level of their respective
modules. Thus, to "report" on what version a system is running, one must
report both the CORE's and the MC driver's versions.
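As a sketch, both versions can be collected from the sysfs files described further below: 'mc_version' in the 'mc' directory for the CORE, and 'module_name' in each 'mcX' directory for the MC driver. The function name edac_versions and the EDAC_ROOT override are hypothetical, added only so the sketch can be pointed at a copy of the tree:

```sh
# Sketch: report the EDAC core version and each MC driver's version.
# edac_versions and EDAC_ROOT are hypothetical; EDAC_ROOT defaults to
# the real sysfs location described in this document.
edac_versions() {
    root="${EDAC_ROOT:-/sys/devices/system/edac}"

    # Core module version and compile date ('mc_version' in 'mc')
    printf 'core: %s\n' "$(cat "$root/mc/mc_version")"

    # Driver module name, version and build date ('module_name' in each mcX)
    for mc in "$root"/mc/mc[0-9]*; do
        [ -d "$mc" ] || continue    # glob matched nothing
        printf '%s: %s\n' "$(basename "$mc")" "$(cat "$mc/module_name")"
    done
}
```

Running edac_versions after loading a driver prints a "core:" line followed by one line per memory controller; the exact text depends on the modules in use.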


LOADING

If 'edac' was statically linked with the kernel, then no loading is
necessary. If 'edac' was built as modules, then simply modprobe the
'edac' pieces that you need. Modprobing a hardware-specific module should
also load, through module dependencies, the necessary core modules.

Example:

        $> modprobe amd76x_edac

loads both the amd76x_edac.ko memory controller module and the edac_mc.ko
core module.


============================================================================
EDAC sysfs INTERFACE

EDAC presents a 'sysfs' interface for control purposes and for reporting
errors and attributes.

EDAC lives in the /sys/devices/system/edac directory. Within this directory
there currently reside two 'edac' components:

        mc      memory controller(s) system
        pci     PCI status system


============================================================================
Memory Controller (mc) Model

First, some background on the memory controller model abstracted in EDAC.
Each mc device controls a set of DIMM memory modules. These modules are
laid out in a Chip-Select Row (csrowX) and Channel (chX) table. There can
be multiple csrows and two channels.

Memory controllers allow for several csrows, with 8 csrows being a typical
value. However, the actual number of csrows depends on the electrical
"loading" of a given motherboard, memory controller and DIMM characteristics.

Dual channels allow for 128-bit data transfers from memory to the CPU.


                  Channel 0   Channel 1
        =================================
        csrow0  |  DIMM_A0  |  DIMM_B0  |
        csrow1  |  DIMM_A0  |  DIMM_B0  |
        =================================

        =================================
        csrow2  |  DIMM_A1  |  DIMM_B1  |
        csrow3  |  DIMM_A1  |  DIMM_B1  |
        =================================

In the above example table, there are 4 physical slots on the motherboard
for memory DIMMs:

        DIMM_A0
        DIMM_B0
        DIMM_A1
        DIMM_B1

Labels for these slots are usually silk screened on the motherboard. Slots
labeled 'A' are channel 0 in this example. Slots labeled 'B' are channel 1.
Notice that there are two csrows possible on a physical DIMM. These csrows
receive their csrow assignment based on the slot into which the memory
DIMM is placed. Thus, when one DIMM is placed in each channel, the csrows
span both DIMMs.

Memory DIMMs come single or dual "ranked". A rank is a populated csrow.
Thus, two single-ranked DIMMs placed in slots DIMM_A0 and DIMM_B0 above
will have one csrow, csrow0; csrow1 will be empty. On the other hand,
when two dual-ranked DIMMs are similarly placed, both csrow0 and csrow1
will be populated. The pattern repeats itself for csrow2 and csrow3.

This layout is reflected in the directory tree of EDAC's sysfs
interface. Starting in directory /sys/devices/system/edac/mc, each memory
controller is represented by its own 'mcX' directory, where 'X' is the
index of the MC.


        ..../edac/mc/
                   |
                   |->mc0
                   |->mc1
                   |->mc2
                   ....

Under each 'mcX' directory, each csrow is represented by its own 'csrowX'
directory, where 'X' is the csrow index:


        .../mc/mc0/
                |
                |->csrow0
                |->csrow2
                |->csrow3
                ....

Notice that there is no csrow1, which indicates that csrow0 is composed
of single-ranked DIMMs. This should also apply in both channels in order
for dual-channel mode to be operational. Since both csrow2 and csrow3 are
populated, this indicates a dual-ranked set of DIMMs for channels 0 and 1.


Within each of the 'mc', 'mcX' and 'csrowX' directories are several
EDAC control and attribute files.
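The csrow layout can be read back from a running system. The sketch below walks every 'mcX' directory and prints each 'csrowX' directory together with its 'size_mb' value (described in the 'csrowX' section). The function name edac_csrows and the EDAC_ROOT override are hypothetical. An unpopulated csrow, such as csrow1 in the example above, simply does not appear in the output:

```sh
# Sketch: list every populated csrow of every memory controller.
# edac_csrows and EDAC_ROOT are hypothetical; EDAC_ROOT defaults to
# the real sysfs location described in this document.
edac_csrows() {
    root="${EDAC_ROOT:-/sys/devices/system/edac}"
    for mc in "$root"/mc/mc[0-9]*; do
        [ -d "$mc" ] || continue
        for row in "$mc"/csrow[0-9]*; do
            [ -d "$row" ] || continue   # no csrows under this controller
            printf '%s/%s %s MB\n' "$(basename "$mc")" "$(basename "$row")" \
                "$(cat "$row/size_mb")"
        done
    done
}
```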


============================================================================
DIRECTORY 'mc'

The 'mc' directory holds the overall EDAC system control and attribute
files:


Panic on UE control file:

        'panic_on_ue'

        When enabled, an uncorrectable error will cause a machine panic.
        This is usually desirable. It is a bad idea to continue when an
        uncorrectable error occurs: it is indeterminate what was left
        uncorrected, and the operating system context might be so mangled
        that continuing will lead to further corruption. If the kernel has
        MCE configured, then EDAC will never notice the UE.

        LOAD TIME: module/kernel parameter: panic_on_ue=[0|1]

        RUN TIME: echo "1" >/sys/devices/system/edac/mc/panic_on_ue


Log UE control file:

        'log_ue'

        Generate kernel messages describing uncorrectable errors. These
        errors are reported through the system message log system. UE
        statistics will be accumulated even when UE logging is disabled.

        LOAD TIME: module/kernel parameter: log_ue=[0|1]

        RUN TIME: echo "1" >/sys/devices/system/edac/mc/log_ue


Log CE control file:

        'log_ce'

        Generate kernel messages describing correctable errors. These
        errors are reported through the system message log system.
        CE statistics will be accumulated even when CE logging is disabled.

        LOAD TIME: module/kernel parameter: log_ce=[0|1]

        RUN TIME: echo "1" >/sys/devices/system/edac/mc/log_ce


Polling period control file:

        'poll_msec'

        The time period, in milliseconds, for polling for error information.
        Too small a value wastes resources. Too large a value might delay
        necessary handling of errors and might lose valuable information
        for locating the error. 1000 milliseconds (once each second) is
        about right for most uses.

        LOAD TIME: module/kernel parameter: poll_msec=<milliseconds>

        RUN TIME: echo "1000" >/sys/devices/system/edac/mc/poll_msec


Module Version read-only attribute file:

        'mc_version'

        The EDAC CORE module's version and compile date are shown here to
        indicate what EDAC is running.
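The four 'mc' control files can be inspected and set together. This is a minimal sketch: the function names and the EDAC_ROOT override are hypothetical, and writing to the real sysfs files normally requires root:

```sh
# Sketch: show and set the global EDAC controls in the 'mc' directory.
# Function names and EDAC_ROOT are hypothetical; EDAC_ROOT defaults to
# the real sysfs location described in this document.
edac_show_controls() {
    root="${EDAC_ROOT:-/sys/devices/system/edac}"
    for f in panic_on_ue log_ue log_ce poll_msec; do
        printf '%s=%s\n' "$f" "$(cat "$root/mc/$f")"
    done
}

# Arguments: panic_on_ue log_ue log_ce poll_msec
edac_set_controls() {
    root="${EDAC_ROOT:-/sys/devices/system/edac}"
    echo "$1" > "$root/mc/panic_on_ue"
    echo "$2" > "$root/mc/log_ue"
    echo "$3" > "$root/mc/log_ce"
    echo "$4" > "$root/mc/poll_msec"    # milliseconds
}
```

For example, edac_set_controls 1 1 1 1000 enables panic on UE, both kinds of logging and a one-second poll. Assuming the core module accepts the load-time parameters listed above, modprobe edac_mc panic_on_ue=1 poll_msec=1000 would apply similar settings when the module loads.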



============================================================================
'mcX' DIRECTORIES


In 'mcX' directories are EDAC control and attribute files for
this 'X' instance of the memory controllers:


Counter reset control file:

        'reset_counters'

        This write-only control file will zero all the statistical counters
        for UE and CE errors. Zeroing the counters will also reset the timer
        indicating how long it has been since the counters were last zeroed.
        This is useful for computing error rates. Since the counters are
        always reset at driver initialization time, no module/kernel
        parameter is available.

        RUN TIME: echo "anything" >/sys/devices/system/edac/mc/mc0/reset_counters

                This resets the counters on memory controller 0.


Seconds since last counter reset attribute file:

        'seconds_since_reset'

        This attribute file displays how many seconds have elapsed since the
        last counter reset. This can be used with the error counters to
        measure error rates.
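Combining 'seconds_since_reset' with the 'ce_count' attribute described later in this section gives an error rate. A minimal sketch computing correctable errors per hour for one controller; the function name and the EDAC_ROOT override are hypothetical:

```sh
# Sketch: correctable errors per hour since the last counter reset.
# edac_ce_per_hour and EDAC_ROOT are hypothetical; EDAC_ROOT defaults
# to the real sysfs location described in this document.
# Argument: controller directory name, e.g. mc0
edac_ce_per_hour() {
    root="${EDAC_ROOT:-/sys/devices/system/edac}"
    mc="$root/mc/$1"
    ce=$(cat "$mc/ce_count")
    secs=$(cat "$mc/seconds_since_reset")
    if [ "$secs" -eq 0 ]; then
        echo 0      # no time has elapsed yet; avoid dividing by zero
        return
    fi
    # Integer arithmetic, so the rate is rounded down
    echo $(( ce * 3600 / secs ))
}
```

edac_ce_per_hour mc0 prints a whole number; writing to 'reset_counters' starts a new measurement window.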



DIMM capability attribute file:

        'edac_capability'

        The EDAC (Error Detection and Correction) capabilities/modes of
        the memory controller hardware.


DIMM Current Capability attribute file:

        'edac_current_capability'

        The EDAC capabilities available with the current hardware
        configuration. This may not be the same as "EDAC capability"
        if the correct memory is not used. If a memory controller is
        capable of EDAC, but DIMMs without check bits are in use, then
        Parity, SECDED, S4ECD4ED capabilities will not be available
        even though the memory controller might be capable of those
        modes with the proper memory loaded.


Memory Type supported on this controller attribute file:

        'supported_mem_type'

        This attribute file displays the memory types supported by this
        controller, usually buffered and unbuffered DIMMs.


Memory Controller name attribute file:

        'mc_name'

        This attribute file displays the type of memory controller
        that is being used.


Memory Controller Module name attribute file:

        'module_name'

        This attribute file displays the memory controller module name,
        version and build date. It also names the memory controller
        hardware: some drivers work with multiple controllers, and
        this field shows which hardware is present.


Total memory managed by this memory controller attribute file:

        'size_mb'

        This attribute file displays the amount of memory, in megabytes,
        that this memory controller instance manages.


Total Uncorrectable Errors count attribute file:

        'ue_count'

        This attribute file displays the total count of uncorrectable
        errors that have occurred on this memory controller. If panic_on_ue
        is set, this counter will not have a chance to increment,
        since EDAC will panic the system.


Total UE count that had no information attribute file:

        'ue_noinfo_count'

        This attribute file displays the number of UEs that have
        occurred with no information as to which DIMM slot is having
        errors.


Total Correctable Errors count attribute file:

        'ce_count'

        This attribute file displays the total count of correctable
        errors that have occurred on this memory controller. This
        count is very important to examine. CEs provide early
        indications that a DIMM is beginning to fail. This count
        field should be monitored for non-zero values, and any such
        values should be reported to the system administrator.


Total CE count that had no information attribute file:

        'ce_noinfo_count'

        This attribute file displays the number of CEs that have
        occurred with no information as to which DIMM slot is having
        errors. Memory is handicapped, but operational, yet no
        information is available to indicate which slot the failing
        memory is in. This count field should also be monitored for
        non-zero values.
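Because these counters should be watched for non-zero values, a small check can run periodically from cron or a monitoring agent. This is only a sketch; the function name and the EDAC_ROOT override are hypothetical. It prints one warning per non-zero counter and returns a non-zero status if anything was found:

```sh
# Sketch: warn about any non-zero UE/CE counter on any memory controller.
# edac_check_counts and EDAC_ROOT are hypothetical; EDAC_ROOT defaults
# to the real sysfs location described in this document.
# Returns 1 if any counter is non-zero, 0 otherwise.
edac_check_counts() {
    root="${EDAC_ROOT:-/sys/devices/system/edac}"
    rc=0
    for mc in "$root"/mc/mc[0-9]*; do
        [ -d "$mc" ] || continue
        for f in ce_count ce_noinfo_count ue_count ue_noinfo_count; do
            n=$(cat "$mc/$f")
            if [ "$n" -ne 0 ]; then
                echo "WARNING: $(basename "$mc") $f=$n"
                rc=1
            fi
        done
    done
    return $rc
}
```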

Device Symlink:

        'device'

        Symlink to the memory controller device.



============================================================================
'csrowX' DIRECTORIES

In the 'csrowX' directories are EDAC control and attribute files for
this 'X' instance of csrow:


Total Uncorrectable Errors count attribute file:

        'ue_count'

        This attribute file displays the total count of uncorrectable
        errors that have occurred on this csrow. If panic_on_ue is set,
        this counter will not have a chance to increment, since EDAC
        will panic the system.


Total Correctable Errors count attribute file:

        'ce_count'

        This attribute file displays the total count of correctable
        errors that have occurred on this csrow. This
        count is very important to examine. CEs provide early
        indications that a DIMM is beginning to fail. This count
        field should be monitored for non-zero values, and any such
        values should be reported to the system administrator.


Total memory managed by this csrow attribute file:

        'size_mb'

        This attribute file displays the amount of memory, in megabytes,
        that this csrow contains.


Memory Type attribute file:

        'mem_type'

        This attribute file will display what type of memory is currently
        on this csrow. Normally, this is either buffered or unbuffered
        memory.


EDAC Mode of operation attribute file:

        'edac_mode'

        This attribute file will display what type of error detection
        and correction is being used.


Device type attribute file:

        'dev_type'

        This attribute file will display what type of DIMM device is
        being used. Example: x4


Channel 0 CE Count attribute file:

        'ch0_ce_count'

        This attribute file will display the count of CEs on this
        DIMM located in channel 0.


Channel 0 UE Count attribute file:

        'ch0_ue_count'

        This attribute file will display the count of UEs on this
        DIMM located in channel 0.


Channel 0 DIMM Label control file:

        'ch0_dimm_label'

        This control file allows this DIMM to have a label assigned
        to it. With this label in the module, when errors occur
        the output can provide the DIMM label in the system log.
        This becomes vital for panic events to isolate the
        cause of the UE event.

        DIMM Labels must be assigned after booting, with information
        that correctly identifies the physical slot with its
        silk screen label. This information is currently very
        motherboard specific and determination of this information
        must occur in userland at this time.


Channel 1 CE Count attribute file:

        'ch1_ce_count'

        This attribute file will display the count of CEs on this
        DIMM located in channel 1.


Channel 1 UE Count attribute file:

        'ch1_ue_count'

        This attribute file will display the count of UEs on this
        DIMM located in channel 1.
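The per-channel counters and labels of one controller can be gathered into a single report, one line per csrow and channel. A sketch assuming the two-channel ch0_*/ch1_* layout described in this section; the function name and the EDAC_ROOT override are hypothetical:

```sh
# Sketch: per-DIMM CE/UE report for one memory controller.
# edac_dimm_report and EDAC_ROOT are hypothetical; EDAC_ROOT defaults
# to the real sysfs location described in this document.
# Argument: controller directory name, e.g. mc0
edac_dimm_report() {
    root="${EDAC_ROOT:-/sys/devices/system/edac}"
    for row in "$root/mc/$1"/csrow[0-9]*; do
        [ -d "$row" ] || continue
        for ch in 0 1; do
            # Skip a channel whose attribute files are absent
            [ -f "$row/ch${ch}_ce_count" ] || continue
            printf '%s ch%s label="%s" ce=%s ue=%s\n' \
                "$(basename "$row")" "$ch" \
                "$(cat "$row/ch${ch}_dimm_label")" \
                "$(cat "$row/ch${ch}_ce_count")" \
                "$(cat "$row/ch${ch}_ue_count")"
        done
    done
}
```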


Channel 1 DIMM Label control file:

        'ch1_dimm_label'

        This control file allows this DIMM to have a label assigned
        to it. With this label in the module, when errors occur
        the output can provide the DIMM label in the system log.
        This becomes vital for panic events to isolate the
        cause of the UE event.

        DIMM Labels must be assigned after booting, with information
        that correctly identifies the physical slot with its
        silk screen label. This information is currently very
        motherboard specific and determination of this information
        must occur in userland at this time.
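Since labels must be assigned from userland after boot, a boot script can write them. The sketch below hard-codes the slot mapping of the example board from the Memory Controller Model section (csrow0/1 on DIMM_A0 and DIMM_B0, csrow2/3 on DIMM_A1 and DIMM_B1). That table, the function name and the EDAC_ROOT override are all hypothetical; a real board needs its own mapping. Unpopulated csrows are skipped:

```sh
# Sketch: assign silk-screen labels for the example board in this document.
# The csrow/channel -> label table is hypothetical; replace it with the
# mapping for the actual motherboard. EDAC_ROOT defaults to real sysfs.
# Argument: controller directory name, e.g. mc0
edac_set_labels() {
    root="${EDAC_ROOT:-/sys/devices/system/edac}"
    mc="$root/mc/$1"
    # Each entry: csrow channel label
    while read -r row ch label; do
        f="$mc/csrow$row/ch${ch}_dimm_label"
        if [ -e "$f" ]; then        # skip csrows not populated here
            echo "$label" > "$f"
        fi
    done <<'EOF'
0 0 DIMM_A0
0 1 DIMM_B0
1 0 DIMM_A0
1 1 DIMM_B0
2 0 DIMM_A1
2 1 DIMM_B1
3 0 DIMM_A1
3 1 DIMM_B1
EOF
}
```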


============================================================================
SYSTEM LOGGING

If logging for UEs and CEs is enabled, then system logs will have
error notices indicating errors that have been detected:

        MC0: CE page 0x283, offset 0xce0, grain 8, syndrome 0x6ec3, row 0,
        channel 1 "DIMM_B1": amd76x_edac

        MC0: CE page 0x1e5, offset 0xfb0, grain 8, syndrome 0xb741, row 0,
        channel 1 "DIMM_B1": amd76x_edac


The structure of the message is (values from the first example above):

        the memory controller           (MC0)
        Error type                      (CE)
        memory page                     (0x283)
        offset in the page              (0xce0)
        the byte granularity            (grain 8)
            or resolution of the error
        the error syndrome              (0x6ec3)
        memory row                      (row 0)
        memory channel                  (channel 1)
        DIMM label, if set prior        ("DIMM_B1")
        and then an optional, driver-specific message that may
            have additional information.

UEs and CEs with no information will lack all fields except the memory
controller, the error type, a notice of "no info" and an optional,
driver-specific error message.
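Because each message names the controller, error type and DIMM label, the log can be summarized to see which DIMM is accumulating errors. A sketch using standard sed, sort and uniq; the function name is hypothetical. It assumes each message sits on a single log line in the format above (any syslog prefix before "MC" is ignored) and skips "no info" messages, which carry no channel or label:

```sh
# Sketch: count EDAC messages per controller, error type and DIMM label.
# edac_log_summary is hypothetical; it reads log text on stdin and
# prints lines like "MC0 CE DIMM_B1: 2".
edac_log_summary() {
    # Keep only "MCn: CE|UE ... channel N "LABEL"" messages -> "MCn CE LABEL"
    sed -n 's/.*\(MC[0-9][0-9]*\): \([CU]E\) .*channel [0-9][0-9]* "\([^"]*\)".*/\1 \2 \3/p' |
        sort | uniq -c |
        # Turn uniq's "   COUNT KEY" into "KEY: COUNT"
        sed 's/^ *\([0-9][0-9]*\) \(.*\)/\2: \1/'
}
```

For example, grep 'MC[0-9]' /var/log/messages | edac_log_summary would summarize a syslog file; the log path varies between distributions.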



============================================================================
PCI Bus Parity Detection


On Header Type 00 devices, the primary status register is checked for
any parity error, regardless of whether parity is enabled on the device
(the spec indicates parity is generated in some cases). On Header Type 01
bridges, the secondary status register is also checked to see whether a
parity error occurred on the bus on the other side of the bridge.
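As an illustration of the rule above, the same registers can be inspected from userland with setpci from pciutils. The offsets come from the PCI specification (Status at 0x06, Header Type at 0x0E, Secondary Status at 0x1E, where bit 15 of a status word means "Detected Parity Error"). This is a sketch, not how the EDAC module itself reads them, and the function names are hypothetical:

```sh
# Sketch: check a device's status registers for the PCI parity error bit.
# has_parity_error and pci_parity_check are hypothetical helpers.

# $1 = 16-bit status word in hex without 0x (setpci's output format)
has_parity_error() {
    [ $(( 0x$1 & 0x8000 )) -ne 0 ]      # bit 15: Detected Parity Error
}

# $1 = PCI slot, e.g. 00:1e.0 (reading config space usually needs root)
pci_parity_check() {
    slot=$1
    if has_parity_error "$(setpci -s "$slot" 06.w)"; then
        echo "$slot: parity error (primary status)"
    fi
    # Low 7 bits of Header Type select the layout; 01 is a PCI-PCI bridge
    htype=$(( 0x$(setpci -s "$slot" 0e.b) & 0x7f ))
    if [ "$htype" -eq 1 ] && has_parity_error "$(setpci -s "$slot" 1e.w)"; then
        echo "$slot: parity error (secondary bus)"
    fi
}
```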


SYSFS CONFIGURATION

Under /sys/devices/system/edac/pci are control and attribute files as follows:


Enable/Disable PCI Parity checking control file:

        'check_pci_parity'

        This control file enables or disables the PCI Bus Parity scanning
        operation. Writing a 1 to this file enables the scanning. Writing
        a 0 to this file disables the scanning.

        Enable:
        echo "1" >/sys/devices/system/edac/pci/check_pci_parity

        Disable:
        echo "0" >/sys/devices/system/edac/pci/check_pci_parity


Panic on PCI PARITY Error:

        'panic_on_pci_parity'

        This control file enables or disables panicking when a parity
        error has been detected.

        LOAD TIME: module/kernel parameter: panic_on_pci_parity=[0|1]

        Enable:
        echo "1" >/sys/devices/system/edac/pci/panic_on_pci_parity

        Disable:
        echo "0" >/sys/devices/system/edac/pci/panic_on_pci_parity


Parity Count:

        'pci_parity_count'

        This attribute file will display the number of parity errors that
        have been detected.


PCI Device Whitelist:

        'pci_parity_whitelist'

        This control file allows for an explicit list of PCI devices to be
        scanned for parity errors. Only devices found on this list will
        be examined. The list is a line of hexadecimal VENDOR and DEVICE
        ID tuples:

        1022:7450,1434:16a6

        One or more can be inserted, separated by a comma.

        To write the above list, run the following as one command line:

        echo "1022:7450,1434:16a6" > /sys/devices/system/edac/pci/pci_parity_whitelist

        To display what the whitelist currently contains, simply 'cat'
        the same file.


PCI Device Blacklist:

        'pci_parity_blacklist'

        This control file allows for a list of PCI devices to be
        skipped for scanning.
        The list is a line of hexadecimal VENDOR and DEVICE ID tuples:

        1022:7450,1434:16a6

        One or more can be inserted, separated by a comma.

        To write the above list, run the following as one command line:

        echo "1022:7450,1434:16a6" > /sys/devices/system/edac/pci/pci_parity_blacklist

        To display what the blacklist currently contains, simply 'cat'
        the same file.

=======================================================================

PCI Vendor and Device IDs can be obtained with the lspci command. With
the -n option, lspci will display the vendor and device IDs. The system
administrator will have to determine which devices should be scanned or
skipped.
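The output of lspci -n can be turned into the comma-separated VENDOR:DEVICE list that the whitelist and blacklist files accept. A minimal sketch, assuming the usual "slot class: vendor:device (rev xx)" line layout of lspci -n; the function name is hypothetical:

```sh
# Sketch: build a "vvvv:dddd,vvvv:dddd" list from `lspci -n` output on stdin.
# pci_id_list is hypothetical; duplicate IDs are removed.
pci_id_list() {
    awk '{ print $3 }' |    # third field is vendor:device
        sort -u |
        paste -s -d, -      # join the lines with commas
}
```

For example, lspci -n -s 00:0b.0 | pci_id_list > /sys/devices/system/edac/pci/pci_parity_blacklist would blacklist the device in slot 00:0b.0 (a hypothetical address).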



The two lists (white and black) are prioritized: the blacklist has the
lower priority and will NOT be used when a whitelist has been set.
Turn OFF a whitelist with an empty echo command:

        echo > /sys/devices/system/edac/pci/pci_parity_whitelist

and any previously set blacklist will be used.

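The priority rule can be checked from a script. A sketch with a hypothetical function name and EDAC_ROOT override; it assumes that an unset list reads back as empty, and that with neither list set every device is scanned:

```sh
# Sketch: report which PCI parity list is currently in effect.
# pci_parity_filter and EDAC_ROOT are hypothetical; EDAC_ROOT defaults
# to the real sysfs location described in this document.
pci_parity_filter() {
    dir="${EDAC_ROOT:-/sys/devices/system/edac}/pci"
    white=$(cat "$dir/pci_parity_whitelist")
    black=$(cat "$dir/pci_parity_blacklist")
    if [ -n "$white" ]; then
        echo "whitelist: scan only $white"      # whitelist wins
    elif [ -n "$black" ]; then
        echo "blacklist: skip $black"
    else
        # Assumption: with no list set, all devices are scanned
        echo "no list: scan all devices"
    fi
}
```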