-rw-r--r--  Documentation/drivers/edac/edac.txt  673
-rw-r--r--  MAINTAINERS  9
-rw-r--r--  arch/i386/kernel/quirks.c  9
-rw-r--r--  drivers/Kconfig  2
-rw-r--r--  drivers/Makefile  1
-rw-r--r--  drivers/edac/Kconfig  102
-rw-r--r--  drivers/edac/Makefile  18
-rw-r--r--  drivers/edac/amd76x_edac.c  2
-rw-r--r--  drivers/edac/e752x_edac.c  14
-rw-r--r--  drivers/edac/e7xxx_edac.c  2
-rw-r--r--  drivers/edac/edac_mc.c  2209
-rw-r--r--  drivers/edac/edac_mc.h  448
-rw-r--r--  drivers/edac/i82860_edac.c  2
-rw-r--r--  drivers/edac/i82875p_edac.c  2
-rw-r--r--  drivers/edac/r82600_edac.c  2
-rw-r--r--  include/asm-i386/atomic.h  12
-rw-r--r--  include/asm-i386/edac.h  18
-rw-r--r--  include/asm-x86_64/atomic.h  12
-rw-r--r--  include/asm-x86_64/edac.h  18
19 files changed, 3515 insertions, 40 deletions
diff --git a/Documentation/drivers/edac/edac.txt b/Documentation/drivers/edac/edac.txt
new file mode 100644
index 000000000000..d37191fe5681
--- /dev/null
+++ b/Documentation/drivers/edac/edac.txt
@@ -0,0 +1,673 @@


EDAC - Error Detection And Correction

Written by Doug Thompson <norsk5@xmission.com>
7 Dec 2005


EDAC was written by:
        Thayne Harbaugh,
        modified by Dave Peterson, Doug Thompson, et al,
        from the bluesmoke.sourceforge.net project.


============================================================================
EDAC PURPOSE

The goal of the 'edac' kernel module is to detect and report errors that occur
within the computer system. In the initial release, memory Correctable Errors
(CE) and Uncorrectable Errors (UE) are the primary errors being harvested.

Detecting CE events, then harvesting those events and reporting them,
CAN be a predictor of future UE events. With CE events, the system can
continue to operate, but with less safety. Preventive maintenance and
proactive part replacement of memory DIMMs exhibiting CEs can reduce
the likelihood of the dreaded UE events and system 'panics'.


In addition, PCI devices are scanned for PCI Bus Parity and SERR errors
in order to determine if errors are occurring on data transfers.
The presence of PCI Parity errors must be examined with a grain of salt.
There are several add-in adapters that do NOT follow the PCI specification
with regard to Parity generation and reporting. The specification says
the vendor should tie the parity status bits to 0 if they do not intend
to generate parity. Some vendors do not do this, and thus the parity bit
can "float", giving false positives.

The PCI Parity EDAC device has the ability to "skip" known flaky
cards during the parity scan. These are set by the parity "blacklist"
interface in the sysfs for PCI Parity. (See the PCI section in the sysfs
section below.) There is also a parity "whitelist" which is used as
an explicit list of devices to scan, while the blacklist is a list
of devices to skip.

Future error detectors will be added to or integrated into EDAC,
drawing on the following mechanisms:

        MCE     Machine Check Exception
        MCA     Machine Check Architecture
        NMI     NMI notification of ECC errors
        MSRs    Machine Specific Register error cases
        and other mechanisms.

These errors are usually bus errors, ECC errors, thermal throttling
and the like.


============================================================================
EDAC VERSIONING

EDAC is composed of a "core" module (edac_mc.ko) and several Memory
Controller (MC) driver modules. On a given system, the CORE
is loaded and one MC driver will be loaded. Both the CORE and
the MC driver have individual versions that reflect the current release
level of their respective modules. Thus, to "report" on what version
a system is running, one must report both the CORE's and the
MC driver's versions.


LOADING

If 'edac' was statically linked with the kernel then no loading is
necessary. If 'edac' was built as modules then simply modprobe the
'edac' pieces that you need. You should be able to modprobe
hardware-specific modules and have the dependencies load the necessary core
modules.

Example:

$> modprobe amd76x_edac

loads both the amd76x_edac.ko memory controller module and the edac_mc.ko
core module.


============================================================================
EDAC sysfs INTERFACE

EDAC presents a 'sysfs' interface for control, reporting and attribute
reporting purposes.

EDAC lives in the /sys/devices/system/edac directory. Within this directory
there currently reside 2 'edac' components:

        mc      memory controller(s) system
        pci     PCI status system


============================================================================
Memory Controller (mc) Model

First a background on the memory controller's model abstracted in EDAC.
Each mc device controls a set of DIMM memory modules. These modules are
laid out in a Chip-Select Row (csrowX) and Channel table (chX). There can
be multiple csrows and two channels.

Memory controllers allow for several csrows, with 8 csrows being a typical value.
Yet, the actual number of csrows depends on the electrical "loading"
of a given motherboard, memory controller and DIMM characteristics.

Dual channels allow for 128 bit data transfers to the CPU from memory.


                Channel 0       Channel 1
        ===================================
        csrow0  | DIMM_A0       | DIMM_B0 |
        csrow1  | DIMM_A0       | DIMM_B0 |
        ===================================

        ===================================
        csrow2  | DIMM_A1       | DIMM_B1 |
        csrow3  | DIMM_A1       | DIMM_B1 |
        ===================================

In the above example table there are 4 physical slots on the motherboard
for memory DIMMs:

        DIMM_A0
        DIMM_B0
        DIMM_A1
        DIMM_B1

Labels for these slots are usually silk screened on the motherboard. Slots
labeled 'A' are channel 0 in this example. Slots labeled 'B'
are channel 1. Notice that there are two csrows possible on a
physical DIMM. These csrows are allocated their csrow assignment
based on the slot into which the memory DIMM is placed. Thus, when 1 DIMM
is placed in each Channel, the csrows cross both DIMMs.

Memory DIMMs come single or dual "ranked". A rank is a populated csrow.
Thus, 2 single ranked DIMMs, placed in slots DIMM_A0 and DIMM_B0 above
will have 1 csrow, csrow0. csrow1 will be empty. On the other hand,
when 2 dual ranked DIMMs are similarly placed, then both csrow0 and
csrow1 will be populated. The pattern repeats itself for csrow2 and
csrow3.
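
The csrow/channel table above can be read as a simple lookup. The sketch
below is illustrative only: 'slot_label' is a hypothetical helper, and the
mapping it encodes (channel 0 is 'A', channel 1 is 'B', two csrows per
physical DIMM) is just the example table, not a rule that holds for every
motherboard.

```sh
# Hypothetical helper: map (csrow, channel) to the silk-screen label used
# in the example table. Real boards may label and wire slots differently.
slot_label() {
    csrow=$1
    channel=$2
    case "$channel" in
        0) bank=A ;;
        1) bank=B ;;
        *) echo "unknown channel: $channel" >&2; return 1 ;;
    esac
    # Two csrows (ranks) share one physical DIMM:
    # csrow0/csrow1 -> slot 0, csrow2/csrow3 -> slot 1.
    echo "DIMM_${bank}$((csrow / 2))"
}
```

For instance, 'slot_label 3 1' names the slot holding csrow3 on channel 1
in the example layout.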

The representation of the above is reflected in the directory tree
in EDAC's sysfs interface. Starting in directory
/sys/devices/system/edac/mc each memory controller will be represented
by its own 'mcX' directory, where 'X' is the index of the MC.


        ..../edac/mc/
                   |
                   |->mc0
                   |->mc1
                   |->mc2
                   ....

Under each 'mcX' directory each csrow is in turn represented by its own
'csrowX' directory, where 'X' is the csrow index:


        .../mc/mc0/
                |
                |->csrow0
                |->csrow2
                |->csrow3
                ....

Notice that there is no csrow1, which indicates that csrow0 is
composed of single ranked DIMMs. This should also apply in both
Channels, in order to have dual-channel mode be operational. Since
both csrow2 and csrow3 are populated, this indicates a dual ranked
set of DIMMs for channels 0 and 1.


Within each of the 'mc', 'mcX' and 'csrowX' directories are several
EDAC control and attribute files.


============================================================================
DIRECTORY 'mc'

In directory 'mc' are EDAC system overall control and attribute files:


Panic on UE control file:

        'panic_on_ue'

        An uncorrectable error will cause a machine panic. This is usually
        desirable. It is a bad idea to continue when an uncorrectable error
        occurs - it is indeterminate what was uncorrected and the operating
        system context might be so mangled that continuing will lead to further
        corruption. If the kernel has MCE configured, then EDAC will never
        notice the UE.

        LOAD TIME: module/kernel parameter: panic_on_ue=[0|1]

        RUN TIME:  echo "1" >/sys/devices/system/edac/mc/panic_on_ue


Log UE control file:

        'log_ue'

        Generate kernel messages describing uncorrectable errors. These errors
        are reported through the system message log. UE statistics
        will be accumulated even when UE logging is disabled.

        LOAD TIME: module/kernel parameter: log_ue=[0|1]

        RUN TIME: echo "1" >/sys/devices/system/edac/mc/log_ue


Log CE control file:

        'log_ce'

        Generate kernel messages describing correctable errors. These
        errors are reported through the system message log.
        CE statistics will be accumulated even when CE logging is disabled.

        LOAD TIME: module/kernel parameter: log_ce=[0|1]

        RUN TIME: echo "1" >/sys/devices/system/edac/mc/log_ce


Polling period control file:

        'poll_msec'

        The time period, in milliseconds, for polling for error information.
        Too small a value wastes resources. Too large a value might delay
        necessary handling of errors and might lose valuable information for
        locating the error. 1000 milliseconds (once each second) is about
        right for most uses.

        LOAD TIME: module/kernel parameter: poll_msec=<milliseconds>

        RUN TIME: echo "1000" >/sys/devices/system/edac/mc/poll_msec


Module Version read-only attribute file:

        'mc_version'

        The EDAC CORE module's version and compile date are shown here to
        indicate what EDAC is running.
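
The run-time controls above are all plain writes to files in the 'mc'
directory, so a script can wrap them with a small validity check. This is a
minimal sketch: 'edac_mc_set' is a hypothetical name, and the checks (0 or 1
for the three switches, a decimal number for poll_msec) are taken from the
descriptions above rather than from the driver source.

```sh
# Hypothetical wrapper around the 'mc' control files described above.
#   $1  mc directory, normally /sys/devices/system/edac/mc
#   $2  control file name
#   $3  value to write
edac_mc_set() {
    mc_root=$1
    name=$2
    value=$3
    case "$name" in
        panic_on_ue|log_ue|log_ce)
            case "$value" in
                0|1) ;;
                *) echo "$name expects 0 or 1" >&2; return 1 ;;
            esac ;;
        poll_msec)
            case "$value" in
                ''|*[!0-9]*)
                    echo "poll_msec expects milliseconds" >&2; return 1 ;;
            esac ;;
        *)
            echo "unknown control file: $name" >&2; return 1 ;;
    esac
    echo "$value" > "$mc_root/$name"
}
```

Run as root, 'edac_mc_set /sys/devices/system/edac/mc poll_msec 1000' is
equivalent to the RUN TIME echo shown for poll_msec.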



============================================================================
'mcX' DIRECTORIES


In 'mcX' directories are EDAC control and attribute files for
this 'X' instance of the memory controllers:


Counter reset control file:

        'reset_counters'

        This write-only control file will zero all the statistical counters
        for UE and CE errors. Zeroing the counters will also reset the timer
        indicating how long since the last counter zero. This is useful
        for computing errors/time. Since the counters are always reset at
        driver initialization time, no module/kernel parameter is available.

        RUN TIME: echo "anything" >/sys/devices/system/edac/mc/mc0/reset_counters

        This resets the counters on memory controller 0.


Seconds since last counter reset attribute file:

        'seconds_since_reset'

        This attribute file displays how many seconds have elapsed since the
        last counter reset. This can be used with the error counters to
        measure error rates.
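
As a sketch of the error-rate idea, the function below divides a
controller's correctable error count by the elapsed time. 'ce_per_hour' is a
hypothetical helper; it assumes the 'ce_count' file (described further down
in this section) and 'seconds_since_reset' each hold a single decimal
number, and it uses integer arithmetic, so the result is rounded down.

```sh
# Print correctable errors per hour for one memory controller directory,
# e.g. /sys/devices/system/edac/mc/mc0.
ce_per_hour() {
    mc_dir=$1
    ce=$(cat "$mc_dir/ce_count") || return 1
    secs=$(cat "$mc_dir/seconds_since_reset") || return 1
    # Right after a counter reset no time has elapsed; avoid dividing by 0.
    if [ "$secs" -eq 0 ]; then
        echo 0
        return 0
    fi
    echo $(( ce * 3600 / secs ))
}
```

A monitoring script could call this periodically and write to
'reset_counters' afterwards to start a fresh measurement window.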



DIMM capability attribute file:

        'edac_capability'

        The EDAC (Error Detection and Correction) capabilities/modes of
        the memory controller hardware.


DIMM Current Capability attribute file:

        'edac_current_capability'

        The EDAC capabilities available with the hardware
        configuration. This may not be the same as "EDAC capability"
        if the correct memory is not used. If a memory controller is
        capable of EDAC, but DIMMs without check bits are in use, then
        Parity, SECDED, S4ECD4ED capabilities will not be available
        even though the memory controller might be capable of those
        modes with the proper memory loaded.


Memory Type supported on this controller attribute file:

        'supported_mem_type'

        This attribute file displays the memory type, usually
        buffered and unbuffered DIMMs.


Memory Controller name attribute file:

        'mc_name'

        This attribute file displays the type of memory controller
        that is being utilized.


Memory Controller Module name attribute file:

        'module_name'

        This attribute file displays the memory controller module name,
        version and date built. The name of the memory controller
        hardware - some drivers work with multiple controllers and
        this field shows which hardware is present.


Total memory managed by this memory controller attribute file:

        'size_mb'

        This attribute file displays the amount of memory, in megabytes,
        that this instance of memory controller manages.


Total Uncorrectable Errors count attribute file:

        'ue_count'

        This attribute file displays the total count of uncorrectable
        errors that have occurred on this memory controller. If panic_on_ue
        is set this counter will not have a chance to increment,
        since EDAC will panic the system.


Total UE count that had no information attribute file:

        'ue_noinfo_count'

        This attribute file displays the number of UEs that
        have occurred with no information as to which DIMM
        slot is having errors.


Total Correctable Errors count attribute file:

        'ce_count'

        This attribute file displays the total count of correctable
        errors that have occurred on this memory controller. This
        count is very important to examine. CEs provide early
        indications that a DIMM is beginning to fail. This count
        field should be monitored for non-zero values, and any such
        values reported to the system administrator.


Total CE count that had no information attribute file:

        'ce_noinfo_count'

        This attribute file displays the number of CEs that
        have occurred with no information as to which DIMM slot
        is having errors. Memory is handicapped, but operational,
        yet no information is available to indicate which slot
        the failing memory is in. This count field should also
        be monitored for non-zero values.

Device Symlink:

        'device'

        Symlink to the memory controller device



============================================================================
'csrowX' DIRECTORIES

In the 'csrowX' directories are EDAC control and attribute files for
this 'X' instance of csrow:


Total Uncorrectable Errors count attribute file:

        'ue_count'

        This attribute file displays the total count of uncorrectable
        errors that have occurred on this csrow. If panic_on_ue is set
        this counter will not have a chance to increment, since EDAC
        will panic the system.


Total Correctable Errors count attribute file:

        'ce_count'

        This attribute file displays the total count of correctable
        errors that have occurred on this csrow. This
        count is very important to examine. CEs provide early
        indications that a DIMM is beginning to fail. This count
        field should be monitored for non-zero values, and any such
        values reported to the system administrator.
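
Monitoring for non-zero values can be as simple as walking the csrow
directories of one controller. The following is a sketch under the
assumption that each 'csrowX/ce_count' holds a single decimal number;
'csrows_with_ce' is a hypothetical name.

```sh
# Print "csrowN <count>" for every csrow under one mcX directory whose
# ce_count is non-zero, e.g. for /sys/devices/system/edac/mc/mc0.
csrows_with_ce() {
    mc_dir=$1
    for dir in "$mc_dir"/csrow*; do
        # Also skips the unexpanded pattern when no csrow directory exists.
        [ -r "$dir/ce_count" ] || continue
        count=$(cat "$dir/ce_count")
        if [ "$count" -gt 0 ]; then
            echo "${dir##*/} $count"
        fi
    done
}
```

Any output from this function is a candidate for a report to the system
administrator; the per-channel 'chX_ce_count' files described below narrow
the problem down to one DIMM.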


Total memory managed by this csrow attribute file:

        'size_mb'

        This attribute file displays the amount of memory, in megabytes,
        that this csrow contains.


Memory Type attribute file:

        'mem_type'

        This attribute file will display what type of memory is currently
        on this csrow. Normally, either buffered or unbuffered memory.


EDAC Mode of operation attribute file:

        'edac_mode'

        This attribute file will display what type of Error detection
        and correction is being utilized.


Device type attribute file:

        'dev_type'

        This attribute file will display what type of DIMM device is
        being utilized. Example:  x4


Channel 0 CE Count attribute file:

        'ch0_ce_count'

        This attribute file will display the count of CEs on this
        DIMM located in channel 0.


Channel 0 UE Count attribute file:

        'ch0_ue_count'

        This attribute file will display the count of UEs on this
        DIMM located in channel 0.


Channel 0 DIMM Label control file:

        'ch0_dimm_label'

        This control file allows this DIMM to have a label assigned
        to it. With this label in the module, when errors occur
        the output can provide the DIMM label in the system log.
        This becomes vital for panic events to isolate the
        cause of the UE event.

        DIMM Labels must be assigned after booting, with information
        that correctly identifies the physical slot with its
        silk screen label. This information is currently very
        motherboard specific and determination of this information
        must occur in userland at this time.


Channel 1 CE Count attribute file:

        'ch1_ce_count'

        This attribute file will display the count of CEs on this
        DIMM located in channel 1.


Channel 1 UE Count attribute file:

        'ch1_ue_count'

        This attribute file will display the count of UEs on this
        DIMM located in channel 1.


Channel 1 DIMM Label control file:

        'ch1_dimm_label'

        This control file allows this DIMM to have a label assigned
        to it. With this label in the module, when errors occur
        the output can provide the DIMM label in the system log.
        This becomes vital for panic events to isolate the
        cause of the UE event.

        DIMM Labels must be assigned after booting, with information
        that correctly identifies the physical slot with its
        silk screen label. This information is currently very
        motherboard specific and determination of this information
        must occur in userland at this time.
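
Because labels have to be assigned from userland after boot, a boot script
is the natural place for them. The sketch below only wraps the write to the
'chX_dimm_label' file; 'set_dimm_label' is a hypothetical helper, and the
label text itself has to come from knowledge of the specific motherboard.

```sh
# Assign a silk-screen label to one DIMM.
#   $1  csrow directory, e.g. /sys/devices/system/edac/mc/mc0/csrow0
#   $2  channel number (0 or 1)
#   $3  label text, e.g. DIMM_A0
set_dimm_label() {
    csrow_dir=$1
    channel=$2
    label=$3
    case "$channel" in
        0|1) ;;
        *) echo "channel must be 0 or 1" >&2; return 1 ;;
    esac
    printf '%s\n' "$label" > "$csrow_dir/ch${channel}_dimm_label"
}
```

For the example table in the memory controller model section, csrow0 on
channel 1 would be labelled with
'set_dimm_label /sys/devices/system/edac/mc/mc0/csrow0 1 DIMM_B0'.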


============================================================================
SYSTEM LOGGING

If logging for UEs and CEs is enabled then system logs will have
error notices indicating errors that have been detected:

MC0: CE page 0x283, offset 0xce0, grain 8, syndrome 0x6ec3, row 0,
channel 1 "DIMM_B1": amd76x_edac

MC0: CE page 0x1e5, offset 0xfb0, grain 8, syndrome 0xb741, row 0,
channel 1 "DIMM_B1": amd76x_edac


The structure of the message, using the first example, is:
        the memory controller                   (MC0)
        Error type                              (CE)
        memory page                             (0x283)
        offset in the page                      (0xce0)
        the byte granularity                    (grain 8)
                or resolution of the error
        the error syndrome                      (0x6ec3)
        memory row                              (row 0)
        memory channel                          (channel 1)
        DIMM label, if set prior                (DIMM_B1)
        and then an optional, driver-specific message that may
                have additional information.

Both UEs and CEs with no info will lack all but memory controller,
error type, a notice of "no info" and then an optional,
driver-specific error message.
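
The field layout above lends itself to mechanical parsing. The sketch below
handles only CE messages in exactly the form of the two examples, and it
assumes the message arrives as one log line (the examples above are wrapped
for display). 'parse_edac_ce_line' is a hypothetical helper; lines that do
not match produce no output.

```sh
# Turn one CE log line into "key=value" pairs. Any prefix in front of
# "MCn:" (timestamp, host name and so on) is ignored.
parse_edac_ce_line() {
    printf '%s\n' "$1" | sed -n \
        's/^.*MC\([0-9][0-9]*\): CE page \(0x[0-9a-f]*\), offset \(0x[0-9a-f]*\), grain \([0-9]*\), syndrome \(0x[0-9a-f]*\), row \([0-9]*\), channel \([0-9]*\) "\([^"]*\)".*/mc=\1 page=\2 offset=\3 grain=\4 syndrome=\5 row=\6 channel=\7 label=\8/p'
}
```

Fed the first example message, this yields the controller index, page,
offset, grain, syndrome, row, channel and DIMM label as separate fields that
a log watcher can count per label.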



============================================================================
PCI Bus Parity Detection


On Header Type 00 devices the primary status is looked at
for any parity error regardless of whether Parity is enabled on the
device. (The spec indicates parity is generated in some cases).
On Header Type 01 bridges, the secondary status register is also
looked at to see if a parity error occurred on the bus on the other
side of the bridge.
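
To make the status check concrete, the sketch below decodes the
error-related bits of a 16-bit PCI status word. The bit positions are those
of the PCI status register (bit 15 Detected Parity Error, bit 14 Signaled
System Error, bit 8 Master Data Parity Error). Which of these bits EDAC
itself tests is not stated in this text, and 'pci_status_flags' is a
hypothetical helper that expects the status value to have been read already.

```sh
# Decode error bits of a PCI status word given as a number (hex allowed,
# e.g. 0x8000). Prints a space-separated list of the flags that are set.
pci_status_flags() {
    status=$(( $1 ))
    flags=""
    if [ $(( status & 0x8000 )) -ne 0 ]; then
        flags="$flags detected_parity"
    fi
    if [ $(( status & 0x4000 )) -ne 0 ]; then
        flags="$flags signaled_serr"
    fi
    if [ $(( status & 0x0100 )) -ne 0 ]; then
        flags="$flags master_data_parity"
    fi
    # Drop the leading space left by the first appended flag.
    echo "${flags# }"
}
```

An empty result means none of the three error bits is set in the value
passed in.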


SYSFS CONFIGURATION

Under /sys/devices/system/edac/pci are control and attribute files as follows:


Enable/Disable PCI Parity checking control file:

        'check_pci_parity'


        This control file enables or disables the PCI Bus Parity scanning
        operation. Writing a 1 to this file enables the scanning. Writing
        a 0 to this file disables the scanning.

        Enable:
        echo "1" >/sys/devices/system/edac/pci/check_pci_parity

        Disable:
        echo "0" >/sys/devices/system/edac/pci/check_pci_parity



Panic on PCI PARITY Error:

        'panic_on_pci_parity'


        This control file enables or disables panicking when a parity
        error has been detected.


        module/kernel parameter: panic_on_pci_parity=[0|1]

        Enable:
        echo "1" >/sys/devices/system/edac/pci/panic_on_pci_parity

        Disable:
        echo "0" >/sys/devices/system/edac/pci/panic_on_pci_parity


Parity Count:

        'pci_parity_count'

        This attribute file will display the number of parity errors that
        have been detected.



PCI Device Whitelist:

        'pci_parity_whitelist'

        This control file allows for an explicit list of PCI devices to be
        scanned for parity errors. Only devices found on this list will
        be examined. The list is a line of hexadecimal VENDOR and DEVICE
        ID tuples:

        1022:7450,1434:16a6

        One or more can be inserted, separated by a comma.

        To write the above list, run the following as one command line:

        echo "1022:7450,1434:16a6" > /sys/devices/system/edac/pci/pci_parity_whitelist



        To display what the whitelist is, simply 'cat' the same file.


PCI Device Blacklist:

        'pci_parity_blacklist'

        This control file allows for a list of PCI devices to be
        skipped for scanning.
        The list is a line of hexadecimal VENDOR and DEVICE ID tuples:

        1022:7450,1434:16a6

        One or more can be inserted, separated by a comma.

        To write the above list, run the following as one command line:

        echo "1022:7450,1434:16a6" > /sys/devices/system/edac/pci/pci_parity_blacklist


        To display what the blacklist currently contains,
        simply 'cat' the same file.

=======================================================================

PCI Vendor and Device IDs can be obtained with the lspci command. Using
the -n option lspci will display the vendor and device IDs. The system
administrator will have to determine which devices should be scanned or
skipped.
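
A list in the format the whitelist and blacklist files expect can be built
from 'lspci -n' output. The sketch below assumes each output line contains
exactly one token of the form xxxx:xxxx (four lowercase hex digits, a colon,
four more) holding the vendor and device ID; 'pci_ids_csv' is a hypothetical
helper that reads such lines on standard input.

```sh
# Read "lspci -n" style lines on stdin and print the unique vendor:device
# IDs as one comma-separated line.
pci_ids_csv() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9a-f][0-9a-f][0-9a-f][0-9a-f]:[0-9a-f][0-9a-f][0-9a-f][0-9a-f]$/) {
                print $i
                break
            }
    }' | sort -u | paste -sd, -
}
```

'lspci -n | pci_ids_csv' lists every device in the system. The
administrator still has to trim that list to the devices that should really
be scanned or skipped before writing it to either file.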



The two lists (white and black) are prioritized. The blacklist has the lower
priority and will NOT be utilized when a whitelist has been set.
Turn OFF a whitelist by an empty echo command:

        echo > /sys/devices/system/edac/pci/pci_parity_whitelist

and any previous blacklist will be utilized.
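
The priority rule can be written down as a small decision function. This is
a sketch of the rule as described here, not of the kernel code:
'pci_parity_should_scan' is a hypothetical name, and IDs are compared as
plain strings, so the case of the hex digits must match.

```sh
# Decide whether a device would be scanned for parity errors.
#   $1  vendor:device ID, e.g. 1022:7450
#   $2  whitelist as a comma-separated string (may be empty)
#   $3  blacklist as a comma-separated string (may be empty)
# Returns 0 if the device is scanned, 1 if it is skipped.
pci_parity_should_scan() {
    id=$1
    whitelist=$2
    blacklist=$3
    if [ -n "$whitelist" ]; then
        # A whitelist is set: only listed devices are scanned and the
        # blacklist is ignored entirely.
        case ",$whitelist," in
            *",$id,"*) return 0 ;;
            *) return 1 ;;
        esac
    fi
    # No whitelist: scan everything that is not blacklisted.
    case ",$blacklist," in
        *",$id,"*) return 1 ;;
    esac
    return 0
}
```

With both lists empty every device is scanned, which matches the behaviour
described for a system where neither file has been written.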
diff --git a/MAINTAINERS b/MAINTAINERS
index e6dbb21a8e5b..3f8a90ac47d7 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -867,6 +867,15 @@ L: ebtables-devel@lists.sourceforge.net
 W:	http://ebtables.sourceforge.net/
 S:	Maintained
 
+EDAC-CORE
+P:	Doug Thompson
+M:	norsk5@xmission.com, dthompson@linuxnetworx.com
+P:	Dave Peterson
+M:	dsp@llnl.gov, dave_peterson@pobox.com
+L:	bluesmoke-devel@lists.sourceforge.net
+W:	bluesmoke.sourceforge.net
+S:	Maintained
+
 EEPRO100 NETWORK DRIVER
 P:	Andrey V. Savochkin
 M:	saw@saw.sw.com.sg
diff --git a/arch/i386/kernel/quirks.c b/arch/i386/kernel/quirks.c
index aaf89cb2bc51..87ccdac84928 100644
--- a/arch/i386/kernel/quirks.c
+++ b/arch/i386/kernel/quirks.c
@@ -25,8 +25,7 @@ static void __devinit quirk_intel_irqbalance(struct pci_dev *dev)
 
 	/* enable access to config space*/
 	pci_read_config_byte(dev, 0xf4, &config);
-	config |= 0x2;
-	pci_write_config_byte(dev, 0xf4, config);
+	pci_write_config_byte(dev, 0xf4, config|0x2);
 
 	/* read xTPR register */
 	raw_pci_ops->read(0, 0, 0x40, 0x4c, 2, &word);
@@ -42,9 +41,9 @@ static void __devinit quirk_intel_irqbalance(struct pci_dev *dev)
 #endif
 	}
 
-	config &= ~0x2;
-	/* disable access to config space*/
-	pci_write_config_byte(dev, 0xf4, config);
+	/* put back the original value for config space*/
+	if (!(config & 0x2))
+		pci_write_config_byte(dev, 0xf4, config);
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_E7320_MCH, quirk_intel_irqbalance);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_E7525_MCH, quirk_intel_irqbalance);
diff --git a/drivers/Kconfig b/drivers/Kconfig
index 283c089537bc..bddf431bbb72 100644
--- a/drivers/Kconfig
+++ b/drivers/Kconfig
@@ -68,4 +68,6 @@ source "drivers/infiniband/Kconfig"
 
 source "drivers/sn/Kconfig"
 
+source "drivers/edac/Kconfig"
+
 endmenu
diff --git a/drivers/Makefile b/drivers/Makefile
index 7c45050ecd03..619dd964c51c 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -63,6 +63,7 @@ obj-$(CONFIG_PHONE) += telephony/
 obj-$(CONFIG_MD)		+= md/
 obj-$(CONFIG_BT)		+= bluetooth/
 obj-$(CONFIG_ISDN)		+= isdn/
+obj-$(CONFIG_EDAC)		+= edac/
 obj-$(CONFIG_MCA)		+= mca/
 obj-$(CONFIG_EISA)		+= eisa/
 obj-$(CONFIG_CPU_FREQ)		+= cpufreq/
diff --git a/drivers/edac/Kconfig b/drivers/edac/Kconfig
new file mode 100644
index 000000000000..4819e7fc00dd
--- /dev/null
+++ b/drivers/edac/Kconfig
@@ -0,0 +1,102 @@
#
# EDAC Kconfig
# Copyright (c) 2003 Linux Networx
# Licensed and distributed under the GPL
#
# $Id: Kconfig,v 1.4.2.7 2005/07/08 22:05:38 dsp_llnl Exp $
#

menu 'EDAC - error detection and reporting (RAS)'

config EDAC
	tristate "EDAC core system error reporting"
	depends on X86
	default y
	help
	  EDAC is designed to report errors in the core system.
	  These are low-level errors that are reported in the CPU or
	  supporting chipset: memory errors, cache errors, PCI errors,
	  thermal throttling, etc. If unsure, select 'Y'.


comment "Reporting subsystems"
	depends on EDAC

config EDAC_DEBUG
	bool "Debugging"
	depends on EDAC
	help
	  This turns on debugging information for the entire EDAC
	  sub-system. You can insert the module with "debug_level=x";
	  currently there are four debug levels (x=0,1,2,3 from low to high).
	  Usually you should select 'N'.

config EDAC_MM_EDAC
	tristate "Main Memory EDAC (Error Detection And Correction) reporting"
	depends on EDAC
	default y
	help
	  Some systems are able to detect and correct errors in main
	  memory. EDAC can report statistics on memory error
	  detection and correction (EDAC - or commonly referred to as ECC
	  errors). EDAC will also try to decode where these errors
	  occurred so that a particular failing memory module can be
	  replaced. If unsure, select 'Y'.


config EDAC_AMD76X
	tristate "AMD 76x (760, 762, 768)"
	depends on EDAC_MM_EDAC && PCI
	help
	  Support for error detection and correction on the AMD 76x
	  series of chipsets used with the Athlon processor.

config EDAC_E7XXX
	tristate "Intel e7xxx (e7205, e7500, e7501, e7505)"
	depends on EDAC_MM_EDAC && PCI
	help
	  Support for error detection and correction on the Intel
	  E7205, E7500, E7501 and E7505 server chipsets.

config EDAC_E752X
	tristate "Intel e752x (e7520, e7525, e7320)"
	depends on EDAC_MM_EDAC && PCI
	help
	  Support for error detection and correction on the Intel
	  E7520, E7525, E7320 server chipsets.

config EDAC_I82875P
	tristate "Intel 82875p (D82875P, E7210)"
	depends on EDAC_MM_EDAC && PCI
	help
	  Support for error detection and correction on the Intel
	  DP82785P and E7210 server chipsets.

config EDAC_I82860
	tristate "Intel 82860"
	depends on EDAC_MM_EDAC && PCI
	help
	  Support for error detection and correction on the Intel
	  82860 chipset.

config EDAC_R82600
	tristate "Radisys 82600 embedded chipset"
	depends on EDAC_MM_EDAC
	help
	  Support for error detection and correction on the Radisys
	  82600 embedded chipset.

choice
	prompt "Error detecting method"
	depends on EDAC
	default EDAC_POLL

config EDAC_POLL
	bool "Poll for errors"
	depends on EDAC
	help
	  Poll the chipset periodically to detect errors.

endchoice

endmenu
diff --git a/drivers/edac/Makefile b/drivers/edac/Makefile
new file mode 100644
index 000000000000..93137fdab4b3
--- /dev/null
+++ b/drivers/edac/Makefile
@@ -0,0 +1,18 @@
#
# Makefile for the Linux kernel EDAC drivers.
#
# Copyright 02 Jul 2003, Linux Networx (http://lnxi.com)
# This file may be distributed under the terms of the
# GNU General Public License.
#
# $Id: Makefile,v 1.4.2.3 2005/07/08 22:05:38 dsp_llnl Exp $


obj-$(CONFIG_EDAC_MM_EDAC)		+= edac_mc.o
obj-$(CONFIG_EDAC_AMD76X)		+= amd76x_edac.o
obj-$(CONFIG_EDAC_E7XXX)		+= e7xxx_edac.o
obj-$(CONFIG_EDAC_E752X)		+= e752x_edac.o
obj-$(CONFIG_EDAC_I82875P)		+= i82875p_edac.o
obj-$(CONFIG_EDAC_I82860)		+= i82860_edac.o
obj-$(CONFIG_EDAC_R82600)		+= r82600_edac.o

diff --git a/drivers/edac/amd76x_edac.c b/drivers/edac/amd76x_edac.c
index 8e2b1295e70c..2fcc8120b53c 100644
--- a/drivers/edac/amd76x_edac.c
+++ b/drivers/edac/amd76x_edac.c
@@ -338,7 +338,7 @@ static struct pci_driver amd76x_driver = {
 	.id_table = amd76x_pci_tbl,
 };
 
-int __init amd76x_init(void)
+static int __init amd76x_init(void)
 {
 	return pci_register_driver(&amd76x_driver);
 }
diff --git a/drivers/edac/e752x_edac.c b/drivers/edac/e752x_edac.c
index 959f584f5687..770a5a633079 100644
--- a/drivers/edac/e752x_edac.c
+++ b/drivers/edac/e752x_edac.c
@@ -13,7 +13,7 @@
13 * Wang Zhenyu at intel.com 13 * Wang Zhenyu at intel.com
14 * Dave Jiang at mvista.com 14 * Dave Jiang at mvista.com
15 * 15 *
16 * $Id: bluesmoke_e752x.c,v 1.5.2.11 2005/10/05 00:43:44 dsp_llnl Exp $ 16 * $Id: edac_e752x.c,v 1.5.2.11 2005/10/05 00:43:44 dsp_llnl Exp $
17 * 17 *
18 */ 18 */
19 19
@@ -376,14 +376,14 @@ static inline void process_threshold_ce(struct mem_ctl_info *mci, u16 error,
376 mci->mc_idx); 376 mci->mc_idx);
377} 377}
378 378
379char *global_message[11] = { 379static char *global_message[11] = {
380 "PCI Express C1", "PCI Express C", "PCI Express B1", 380 "PCI Express C1", "PCI Express C", "PCI Express B1",
381 "PCI Express B", "PCI Express A1", "PCI Express A", 381 "PCI Express B", "PCI Express A1", "PCI Express A",
382 "DMA Controler", "HUB Interface", "System Bus", 382 "DMA Controler", "HUB Interface", "System Bus",
383 "DRAM Controler", "Internal Buffer" 383 "DRAM Controler", "Internal Buffer"
384}; 384};
385 385
386char *fatal_message[2] = { "Non-Fatal ", "Fatal " }; 386static char *fatal_message[2] = { "Non-Fatal ", "Fatal " };
387 387
388static void do_global_error(int fatal, u32 errors) 388static void do_global_error(int fatal, u32 errors)
389{ 389{
@@ -405,7 +405,7 @@ static inline void global_error(int fatal, u32 errors, int *error_found,
405 do_global_error(fatal, errors); 405 do_global_error(fatal, errors);
406} 406}
407 407
408char *hub_message[7] = { 408static char *hub_message[7] = {
409 "HI Address or Command Parity", "HI Illegal Access", 409 "HI Address or Command Parity", "HI Illegal Access",
410 "HI Internal Parity", "Out of Range Access", 410 "HI Internal Parity", "Out of Range Access",
411 "HI Data Parity", "Enhanced Config Access", 411 "HI Data Parity", "Enhanced Config Access",
@@ -432,7 +432,7 @@ static inline void hub_error(int fatal, u8 errors, int *error_found,
432 do_hub_error(fatal, errors); 432 do_hub_error(fatal, errors);
433} 433}
434 434
435char *membuf_message[4] = { 435static char *membuf_message[4] = {
436 "Internal PMWB to DRAM parity", 436 "Internal PMWB to DRAM parity",
437 "Internal PMWB to System Bus Parity", 437 "Internal PMWB to System Bus Parity",
438 "Internal System Bus or IO to PMWB Parity", 438 "Internal System Bus or IO to PMWB Parity",
@@ -458,6 +458,7 @@ static inline void membuf_error(u8 errors, int *error_found, int handle_error)
 		do_membuf_error(errors);
 }
 
+#if 0
 char *sysbus_message[10] = {
 	"Addr or Request Parity",
 	"Data Strobe Glitch",
@@ -469,6 +470,7 @@ char *sysbus_message[10] = {
 	"Memory Parity",
 	"IO Subsystem Parity"
 };
+#endif /* 0 */
 
 static void do_sysbus_error(int fatal, u32 errors)
 {
@@ -1044,7 +1046,7 @@ static struct pci_driver e752x_driver = {
 };
 
 
-int __init e752x_init(void)
+static int __init e752x_init(void)
 {
 	int pci_rc;
 
diff --git a/drivers/edac/e7xxx_edac.c b/drivers/edac/e7xxx_edac.c
index 066be43f5ece..d5e320dfc66f 100644
--- a/drivers/edac/e7xxx_edac.c
+++ b/drivers/edac/e7xxx_edac.c
@@ -537,7 +537,7 @@ static struct pci_driver e7xxx_driver = {
 };
 
 
-int __init e7xxx_init(void)
+static int __init e7xxx_init(void)
 {
 	return pci_register_driver(&e7xxx_driver);
 }
diff --git a/drivers/edac/edac_mc.c b/drivers/edac/edac_mc.c
new file mode 100644
index 000000000000..4be9bd0a1267
--- /dev/null
+++ b/drivers/edac/edac_mc.c
@@ -0,0 +1,2209 @@
1/*
2 * edac_mc kernel module
3 * (C) 2005 Linux Networx (http://lnxi.com)
4 * This file may be distributed under the terms of the
5 * GNU General Public License.
6 *
7 * Written by Thayne Harbaugh
8 * Based on work by Dan Hollis <goemon at anime dot net> and others.
9 * http://www.anime.net/~goemon/linux-ecc/
10 *
11 * Modified by Dave Peterson and Doug Thompson
12 *
13 */
14
15
16#include <linux/config.h>
17#include <linux/version.h>
18#include <linux/module.h>
19#include <linux/proc_fs.h>
20#include <linux/kernel.h>
21#include <linux/types.h>
22#include <linux/smp.h>
23#include <linux/init.h>
24#include <linux/sysctl.h>
25#include <linux/highmem.h>
26#include <linux/timer.h>
27#include <linux/slab.h>
28#include <linux/jiffies.h>
29#include <linux/spinlock.h>
30#include <linux/list.h>
31#include <linux/sysdev.h>
32#include <linux/ctype.h>
33
34#include <asm/uaccess.h>
35#include <asm/page.h>
36#include <asm/edac.h>
37
38#include "edac_mc.h"
39
40#define EDAC_MC_VERSION "edac_mc Ver: 2.0.0 " __DATE__
41
42#ifdef CONFIG_EDAC_DEBUG
43/* Values of 0 to 4 will generate output */
44int edac_debug_level = 1;
45EXPORT_SYMBOL(edac_debug_level);
46#endif
47
48/* EDAC controls, settable by module parameter and sysfs */
49static int log_ue = 1;
50static int log_ce = 1;
51static int panic_on_ue = 1;
52static int poll_msec = 1000;
53
54static int check_pci_parity = 0; /* default NO check PCI parity */
55static int panic_on_pci_parity; /* default no panic on PCI Parity */
56static atomic_t pci_parity_count = ATOMIC_INIT(0);
57
58/* lock to memory controller's control array */
59static DECLARE_MUTEX(mem_ctls_mutex);
60static struct list_head mc_devices = LIST_HEAD_INIT(mc_devices);
61
62/* Structure of the whitelist and blacklist arrays */
63struct edac_pci_device_list {
64 unsigned int vendor; /* Vendor ID */
65 unsigned int device; /* Device ID */
66};
67
68
69#define MAX_LISTED_PCI_DEVICES 32
70
71/* List of PCI devices (vendor-id:device-id) that should be skipped */
72static struct edac_pci_device_list pci_blacklist[MAX_LISTED_PCI_DEVICES];
73static int pci_blacklist_count;
74
75/* List of PCI devices (vendor-id:device-id) that should be scanned */
76static struct edac_pci_device_list pci_whitelist[MAX_LISTED_PCI_DEVICES];
77static int pci_whitelist_count;
78
79/* START sysfs data and methods */
80
81static const char *mem_types[] = {
82 [MEM_EMPTY] = "Empty",
83 [MEM_RESERVED] = "Reserved",
84 [MEM_UNKNOWN] = "Unknown",
85 [MEM_FPM] = "FPM",
86 [MEM_EDO] = "EDO",
87 [MEM_BEDO] = "BEDO",
88 [MEM_SDR] = "Unbuffered-SDR",
89 [MEM_RDR] = "Registered-SDR",
90 [MEM_DDR] = "Unbuffered-DDR",
91 [MEM_RDDR] = "Registered-DDR",
92 [MEM_RMBS] = "RMBS"
93};
94
95static const char *dev_types[] = {
96 [DEV_UNKNOWN] = "Unknown",
97 [DEV_X1] = "x1",
98 [DEV_X2] = "x2",
99 [DEV_X4] = "x4",
100 [DEV_X8] = "x8",
101 [DEV_X16] = "x16",
102 [DEV_X32] = "x32",
103 [DEV_X64] = "x64"
104};
105
106static const char *edac_caps[] = {
107 [EDAC_UNKNOWN] = "Unknown",
108 [EDAC_NONE] = "None",
109 [EDAC_RESERVED] = "Reserved",
110 [EDAC_PARITY] = "PARITY",
111 [EDAC_EC] = "EC",
112 [EDAC_SECDED] = "SECDED",
113 [EDAC_S2ECD2ED] = "S2ECD2ED",
114 [EDAC_S4ECD4ED] = "S4ECD4ED",
115 [EDAC_S8ECD8ED] = "S8ECD8ED",
116 [EDAC_S16ECD16ED] = "S16ECD16ED"
117};
118
119
120/* sysfs object: /sys/devices/system/edac */
121static struct sysdev_class edac_class = {
122 set_kset_name("edac"),
123};
124
125/* sysfs objects:
126 * /sys/devices/system/edac/mc
127 * /sys/devices/system/edac/pci
128 */
129static struct kobject edac_memctrl_kobj;
130static struct kobject edac_pci_kobj;
131
132/*
133 * /sys/devices/system/edac/mc;
134 * data structures and methods
135 */
136static ssize_t memctrl_string_show(void *ptr, char *buffer)
137{
138 char *value = (char*) ptr;
139 return sprintf(buffer, "%s\n", value);
140}
141
142static ssize_t memctrl_int_show(void *ptr, char *buffer)
143{
144 int *value = (int*) ptr;
145 return sprintf(buffer, "%d\n", *value);
146}
147
148static ssize_t memctrl_int_store(void *ptr, const char *buffer, size_t count)
149{
150 int *value = (int*) ptr;
151
152 if (isdigit(*buffer))
153 *value = simple_strtoul(buffer, NULL, 0);
154
155 return count;
156}
157
158struct memctrl_dev_attribute {
159 struct attribute attr;
160 void *value;
161 ssize_t (*show)(void *,char *);
162 ssize_t (*store)(void *, const char *, size_t);
163};
164
165/* Set of show/store abstract level functions for memory control object */
166static ssize_t
167memctrl_dev_show(struct kobject *kobj, struct attribute *attr, char *buffer)
168{
169 struct memctrl_dev_attribute *memctrl_dev;
170 memctrl_dev = (struct memctrl_dev_attribute*)attr;
171
172 if (memctrl_dev->show)
173 return memctrl_dev->show(memctrl_dev->value, buffer);
174 return -EIO;
175}
176
177static ssize_t
178memctrl_dev_store(struct kobject *kobj, struct attribute *attr,
179 const char *buffer, size_t count)
180{
181 struct memctrl_dev_attribute *memctrl_dev;
182 memctrl_dev = (struct memctrl_dev_attribute*)attr;
183
184 if (memctrl_dev->store)
185 return memctrl_dev->store(memctrl_dev->value, buffer, count);
186 return -EIO;
187}
188
189static struct sysfs_ops memctrlfs_ops = {
190 .show = memctrl_dev_show,
191 .store = memctrl_dev_store
192};
193
194#define MEMCTRL_ATTR(_name,_mode,_show,_store) \
195struct memctrl_dev_attribute attr_##_name = { \
196 .attr = {.name = __stringify(_name), .mode = _mode }, \
197 .value = &_name, \
198 .show = _show, \
199 .store = _store, \
200};
201
202#define MEMCTRL_STRING_ATTR(_name,_data,_mode,_show,_store) \
203struct memctrl_dev_attribute attr_##_name = { \
204 .attr = {.name = __stringify(_name), .mode = _mode }, \
205 .value = _data, \
206 .show = _show, \
207 .store = _store, \
208};
209
210/* 'mc' attribute file */
211MEMCTRL_STRING_ATTR(mc_version,EDAC_MC_VERSION,S_IRUGO,memctrl_string_show,NULL);
212
213/* 'mc' control files */
214MEMCTRL_ATTR(panic_on_ue,S_IRUGO|S_IWUSR,memctrl_int_show,memctrl_int_store);
215MEMCTRL_ATTR(log_ue,S_IRUGO|S_IWUSR,memctrl_int_show,memctrl_int_store);
216MEMCTRL_ATTR(log_ce,S_IRUGO|S_IWUSR,memctrl_int_show,memctrl_int_store);
217MEMCTRL_ATTR(poll_msec,S_IRUGO|S_IWUSR,memctrl_int_show,memctrl_int_store);
218
219
220/* Base Attributes of the memory ECC object */
221static struct memctrl_dev_attribute *memctrl_attr[] = {
222 &attr_panic_on_ue,
223 &attr_log_ue,
224 &attr_log_ce,
225 &attr_poll_msec,
226 &attr_mc_version,
227 NULL,
228};
229
230/* Main MC kobject release() function */
231static void edac_memctrl_master_release(struct kobject *kobj)
232{
233 debugf1("EDAC MC: " __FILE__ ": %s()\n", __func__);
234}
235
236static struct kobj_type ktype_memctrl = {
237 .release = edac_memctrl_master_release,
238 .sysfs_ops = &memctrlfs_ops,
239 .default_attrs = (struct attribute **) memctrl_attr,
240};
241
242
243/* Initialize the main sysfs entries for edac:
244 * /sys/devices/system/edac
245 *
246 * and children
247 *
248 * Return: 0 SUCCESS
249 * !0 FAILURE
250 */
251static int edac_sysfs_memctrl_setup(void)
252{
253 int err=0;
254
255 debugf1("MC: " __FILE__ ": %s()\n", __func__);
256
257 /* create the /sys/devices/system/edac directory */
258 err = sysdev_class_register(&edac_class);
259 if (!err) {
260 /* Init the MC's kobject */
261 memset(&edac_memctrl_kobj, 0, sizeof (edac_memctrl_kobj));
262 kobject_init(&edac_memctrl_kobj);
263
264 edac_memctrl_kobj.parent = &edac_class.kset.kobj;
265 edac_memctrl_kobj.ktype = &ktype_memctrl;
266
267 /* generate sysfs "..../edac/mc" */
268 err = kobject_set_name(&edac_memctrl_kobj,"mc");
269 if (!err) {
270 /* FIXME: maybe new sysdev_create_subdir() */
271 err = kobject_register(&edac_memctrl_kobj);
272 if (err) {
273 debugf1("Failed to register '.../edac/mc'\n");
274 } else {
275 debugf1("Registered '.../edac/mc' kobject\n");
276 }
277 }
278 } else {
279 debugf1(KERN_WARNING __FILE__ " %s() error=%d\n", __func__, err);
280 }
281
282 return err;
283}
284
285/*
286 * MC teardown:
287 * the '..../edac/mc' kobject followed by '..../edac' itself
288 */
289static void edac_sysfs_memctrl_teardown(void)
290{
291 debugf0("MC: " __FILE__ ": %s()\n", __func__);
292
293 /* Unregister the MC's kobject */
294 kobject_unregister(&edac_memctrl_kobj);
295
296 /* release the master edac mc kobject */
297 kobject_put(&edac_memctrl_kobj);
298
299 /* Unregister the 'edac' object */
300 sysdev_class_unregister(&edac_class);
301}
302
303/*
304 * /sys/devices/system/edac/pci;
305 * data structures and methods
306 */
307
308struct list_control {
309 struct edac_pci_device_list *list;
310 int *count;
311};
312
313/* Output the list as: vendor_id:device_id<,vendor_id:device_id> */
314static ssize_t edac_pci_list_string_show(void *ptr, char *buffer)
315{
316 struct list_control *listctl;
317 struct edac_pci_device_list *list;
318 char *p = buffer;
319 int len=0;
320 int i;
321
322 listctl = ptr;
323 list = listctl->list;
324
325 for (i = 0; i < *(listctl->count); i++, list++ ) {
326 if (len > 0)
327 len += snprintf(p + len, (PAGE_SIZE-len), ",");
328
329 len += snprintf(p + len,
330 (PAGE_SIZE-len),
331 "%x:%x",
332 list->vendor,list->device);
333 }
334
335 len += snprintf(p + len,(PAGE_SIZE-len), "\n");
336
337 return (ssize_t) len;
338}
339
340/**
341 *
342 * Scan string from **s to **e looking for one 'vendor:device' tuple
343 * where each field is a hex value
344 *
345 * return 0 if an entry is NOT found
346 * return 1 if an entry is found
347 * fill in *vendor_id and *device_id with values found
348 *
349 * In both cases, make sure *s has been moved forward toward *e
350 */
351static int parse_one_device(const char **s,const char **e,
352 unsigned int *vendor_id, unsigned int *device_id)
353{
354 const char *runner, *p;
355
356 /* if null byte, we are done */
357 if (!**s) {
358 (*s)++; /* keep *s moving */
359 return 0;
360 }
361
362 /* skip over newlines & whitespace */
363 if ((**s == '\n') || isspace(**s)) {
364 (*s)++;
365 return 0;
366 }
367
368 if (!isxdigit(**s)) {
369 (*s)++;
370 return 0;
371 }
372
373 /* parse vendor_id */
374 runner = *s;
375 while (runner < *e) {
376 /* scan for vendor:device delimiter */
377 if (*runner == ':') {
378 *vendor_id = simple_strtol((char*) *s, (char**) &p, 16);
379 runner = p + 1;
380 break;
381 }
382 runner++;
383 }
384
385 if (!isxdigit(*runner)) {
386 *s = ++runner;
387 return 0;
388 }
389
390 /* parse device_id */
391 if (runner < *e) {
392 *device_id = simple_strtol((char*)runner, (char**)&p, 16);
393 runner = p;
394 }
395
396 *s = runner;
397
398 return 1;
399}
400
401static ssize_t edac_pci_list_string_store(void *ptr, const char *buffer,
402 size_t count)
403{
404 struct list_control *listctl;
405 struct edac_pci_device_list *list;
406 unsigned int vendor_id, device_id;
407 const char *s, *e;
408 int *index;
409
410 s = (char*)buffer;
411 e = s + count;
412
413 listctl = ptr;
414 list = listctl->list;
415 index = listctl->count;
416
417 *index = 0;
418 while (*index < MAX_LISTED_PCI_DEVICES) {
419
420 if (parse_one_device(&s,&e,&vendor_id,&device_id)) {
421 list[ *index ].vendor = vendor_id;
422 list[ *index ].device = device_id;
423 (*index)++;
424 }
425
426 /* check whether all data has been consumed */
427 if (s >= e)
428 break;
429 }
430
431 return count;
432}
433
434static ssize_t edac_pci_int_show(void *ptr, char *buffer)
435{
436 int *value = ptr;
437 return sprintf(buffer,"%d\n",*value);
438}
439
440static ssize_t edac_pci_int_store(void *ptr, const char *buffer, size_t count)
441{
442 int *value = ptr;
443
444 if (isdigit(*buffer))
445 *value = simple_strtoul(buffer,NULL,0);
446
447 return count;
448}
449
450struct edac_pci_dev_attribute {
451 struct attribute attr;
452 void *value;
453 ssize_t (*show)(void *,char *);
454 ssize_t (*store)(void *, const char *,size_t);
455};
456
457/* Set of show/store abstract level functions for PCI Parity object */
458static ssize_t edac_pci_dev_show(struct kobject *kobj, struct attribute *attr,
459 char *buffer)
460{
461 struct edac_pci_dev_attribute *edac_pci_dev;
462 edac_pci_dev= (struct edac_pci_dev_attribute*)attr;
463
464 if (edac_pci_dev->show)
465 return edac_pci_dev->show(edac_pci_dev->value, buffer);
466 return -EIO;
467}
468
469static ssize_t edac_pci_dev_store(struct kobject *kobj, struct attribute *attr,
470 const char *buffer, size_t count)
471{
472 struct edac_pci_dev_attribute *edac_pci_dev;
473 edac_pci_dev= (struct edac_pci_dev_attribute*)attr;
474
475 if (edac_pci_dev->store)
476 return edac_pci_dev->store(edac_pci_dev->value, buffer, count);
477 return -EIO;
478}
479
480static struct sysfs_ops edac_pci_sysfs_ops = {
481 .show = edac_pci_dev_show,
482 .store = edac_pci_dev_store
483};
484
485
486#define EDAC_PCI_ATTR(_name,_mode,_show,_store) \
487struct edac_pci_dev_attribute edac_pci_attr_##_name = { \
488 .attr = {.name = __stringify(_name), .mode = _mode }, \
489 .value = &_name, \
490 .show = _show, \
491 .store = _store, \
492};
493
494#define EDAC_PCI_STRING_ATTR(_name,_data,_mode,_show,_store) \
495struct edac_pci_dev_attribute edac_pci_attr_##_name = { \
496 .attr = {.name = __stringify(_name), .mode = _mode }, \
497 .value = _data, \
498 .show = _show, \
499 .store = _store, \
500};
501
502static struct list_control pci_whitelist_control = {
503 .list = pci_whitelist,
504 .count = &pci_whitelist_count
505};
506
507static struct list_control pci_blacklist_control = {
508 .list = pci_blacklist,
509 .count = &pci_blacklist_count
510};
511
512/* whitelist attribute */
513EDAC_PCI_STRING_ATTR(pci_parity_whitelist,
514 &pci_whitelist_control,
515 S_IRUGO|S_IWUSR,
516 edac_pci_list_string_show,
517 edac_pci_list_string_store);
518
519EDAC_PCI_STRING_ATTR(pci_parity_blacklist,
520 &pci_blacklist_control,
521 S_IRUGO|S_IWUSR,
522 edac_pci_list_string_show,
523 edac_pci_list_string_store);
524
525/* PCI Parity control files */
526EDAC_PCI_ATTR(check_pci_parity,S_IRUGO|S_IWUSR,edac_pci_int_show,edac_pci_int_store);
527EDAC_PCI_ATTR(panic_on_pci_parity,S_IRUGO|S_IWUSR,edac_pci_int_show,edac_pci_int_store);
528EDAC_PCI_ATTR(pci_parity_count,S_IRUGO,edac_pci_int_show,NULL);
529
530/* Base attributes of the PCI parity object */
531static struct edac_pci_dev_attribute *edac_pci_attr[] = {
532 &edac_pci_attr_check_pci_parity,
533 &edac_pci_attr_panic_on_pci_parity,
534 &edac_pci_attr_pci_parity_count,
535 &edac_pci_attr_pci_parity_whitelist,
536 &edac_pci_attr_pci_parity_blacklist,
537 NULL,
538};
539
540/* No memory to release */
541static void edac_pci_release(struct kobject *kobj)
542{
543 debugf1("EDAC PCI: " __FILE__ ": %s()\n", __func__);
544}
545
546static struct kobj_type ktype_edac_pci = {
547 .release = edac_pci_release,
548 .sysfs_ops = &edac_pci_sysfs_ops,
549 .default_attrs = (struct attribute **) edac_pci_attr,
550};
551
552/**
553 * edac_sysfs_pci_setup()
554 *
555 */
556static int edac_sysfs_pci_setup(void)
557{
558 int err;
559
560 debugf1("MC: " __FILE__ ": %s()\n", __func__);
561
562 memset(&edac_pci_kobj, 0, sizeof(edac_pci_kobj));
563
564 kobject_init(&edac_pci_kobj);
565 edac_pci_kobj.parent = &edac_class.kset.kobj;
566 edac_pci_kobj.ktype = &ktype_edac_pci;
567
568 err = kobject_set_name(&edac_pci_kobj, "pci");
569 if (!err) {
570 /* Instantiate the pci object */
571 /* FIXME: maybe new sysdev_create_subdir() */
572 err = kobject_register(&edac_pci_kobj);
573 if (err)
574 debugf1("Failed to register '.../edac/pci'\n");
575 else
576 debugf1("Registered '.../edac/pci' kobject\n");
577 }
578 return err;
579}
580
581
582static void edac_sysfs_pci_teardown(void)
583{
584 debugf0("MC: " __FILE__ ": %s()\n", __func__);
585
586 kobject_unregister(&edac_pci_kobj);
587 kobject_put(&edac_pci_kobj);
588}
589
590/* EDAC sysfs CSROW data structures and methods */
591
592/* Set of more detailed csrow<id> attribute show/store functions */
593static ssize_t csrow_ch0_dimm_label_show(struct csrow_info *csrow, char *data)
594{
595 ssize_t size = 0;
596
597 if (csrow->nr_channels > 0) {
598 size = snprintf(data, EDAC_MC_LABEL_LEN,"%s\n",
599 csrow->channels[0].label);
600 }
601 return size;
602}
603
604static ssize_t csrow_ch1_dimm_label_show(struct csrow_info *csrow, char *data)
605{
606 ssize_t size = 0;
607
608 if (csrow->nr_channels > 1) {
609 size = snprintf(data, EDAC_MC_LABEL_LEN, "%s\n",
610 csrow->channels[1].label);
611 }
612 return size;
613}
614
615static ssize_t csrow_ch0_dimm_label_store(struct csrow_info *csrow,
616 const char *data, size_t size)
617{
618 ssize_t max_size = 0;
619
620 if (csrow->nr_channels > 0) {
621 max_size = min((ssize_t)size,(ssize_t)EDAC_MC_LABEL_LEN-1);
622 strncpy(csrow->channels[0].label, data, max_size);
623 csrow->channels[0].label[max_size] = '\0';
624 }
625 return size;
626}
627
628static ssize_t csrow_ch1_dimm_label_store(struct csrow_info *csrow,
629 const char *data, size_t size)
630{
631 ssize_t max_size = 0;
632
633 if (csrow->nr_channels > 1) {
634 max_size = min((ssize_t)size,(ssize_t)EDAC_MC_LABEL_LEN-1);
635 strncpy(csrow->channels[1].label, data, max_size);
636 csrow->channels[1].label[max_size] = '\0';
637 }
638 return size;
639}
640
641static ssize_t csrow_ue_count_show(struct csrow_info *csrow, char *data)
642{
643 return sprintf(data,"%u\n", csrow->ue_count);
644}
645
646static ssize_t csrow_ce_count_show(struct csrow_info *csrow, char *data)
647{
648 return sprintf(data,"%u\n", csrow->ce_count);
649}
650
651static ssize_t csrow_ch0_ce_count_show(struct csrow_info *csrow, char *data)
652{
653 ssize_t size = 0;
654
655 if (csrow->nr_channels > 0) {
656 size = sprintf(data,"%u\n", csrow->channels[0].ce_count);
657 }
658 return size;
659}
660
661static ssize_t csrow_ch1_ce_count_show(struct csrow_info *csrow, char *data)
662{
663 ssize_t size = 0;
664
665 if (csrow->nr_channels > 1) {
666 size = sprintf(data,"%u\n", csrow->channels[1].ce_count);
667 }
668 return size;
669}
670
671static ssize_t csrow_size_show(struct csrow_info *csrow, char *data)
672{
673 return sprintf(data,"%u\n", PAGES_TO_MiB(csrow->nr_pages));
674}
675
676static ssize_t csrow_mem_type_show(struct csrow_info *csrow, char *data)
677{
678 return sprintf(data,"%s\n", mem_types[csrow->mtype]);
679}
680
681static ssize_t csrow_dev_type_show(struct csrow_info *csrow, char *data)
682{
683 return sprintf(data,"%s\n", dev_types[csrow->dtype]);
684}
685
686static ssize_t csrow_edac_mode_show(struct csrow_info *csrow, char *data)
687{
688 return sprintf(data,"%s\n", edac_caps[csrow->edac_mode]);
689}
690
691struct csrowdev_attribute {
692 struct attribute attr;
693 ssize_t (*show)(struct csrow_info *,char *);
694 ssize_t (*store)(struct csrow_info *, const char *,size_t);
695};
696
697#define to_csrow(k) container_of(k, struct csrow_info, kobj)
698#define to_csrowdev_attr(a) container_of(a, struct csrowdev_attribute, attr)
699
700/* Set of show/store higher level functions for csrow objects */
701static ssize_t csrowdev_show(struct kobject *kobj, struct attribute *attr,
702 char *buffer)
703{
704 struct csrow_info *csrow = to_csrow(kobj);
705 struct csrowdev_attribute *csrowdev_attr = to_csrowdev_attr(attr);
706
707 if (csrowdev_attr->show)
708 return csrowdev_attr->show(csrow, buffer);
709 return -EIO;
710}
711
712static ssize_t csrowdev_store(struct kobject *kobj, struct attribute *attr,
713 const char *buffer, size_t count)
714{
715 struct csrow_info *csrow = to_csrow(kobj);
716 struct csrowdev_attribute * csrowdev_attr = to_csrowdev_attr(attr);
717
718 if (csrowdev_attr->store)
719 return csrowdev_attr->store(csrow, buffer, count);
720 return -EIO;
721}
722
723static struct sysfs_ops csrowfs_ops = {
724 .show = csrowdev_show,
725 .store = csrowdev_store
726};
727
728#define CSROWDEV_ATTR(_name,_mode,_show,_store) \
729struct csrowdev_attribute attr_##_name = { \
730 .attr = {.name = __stringify(_name), .mode = _mode }, \
731 .show = _show, \
732 .store = _store, \
733};
734
735/* csrow<id>/attribute files */
736CSROWDEV_ATTR(size_mb,S_IRUGO,csrow_size_show,NULL);
737CSROWDEV_ATTR(dev_type,S_IRUGO,csrow_dev_type_show,NULL);
738CSROWDEV_ATTR(mem_type,S_IRUGO,csrow_mem_type_show,NULL);
739CSROWDEV_ATTR(edac_mode,S_IRUGO,csrow_edac_mode_show,NULL);
740CSROWDEV_ATTR(ue_count,S_IRUGO,csrow_ue_count_show,NULL);
741CSROWDEV_ATTR(ce_count,S_IRUGO,csrow_ce_count_show,NULL);
742CSROWDEV_ATTR(ch0_ce_count,S_IRUGO,csrow_ch0_ce_count_show,NULL);
743CSROWDEV_ATTR(ch1_ce_count,S_IRUGO,csrow_ch1_ce_count_show,NULL);
744
745/* control/attribute files */
746CSROWDEV_ATTR(ch0_dimm_label,S_IRUGO|S_IWUSR,
747 csrow_ch0_dimm_label_show,
748 csrow_ch0_dimm_label_store);
749CSROWDEV_ATTR(ch1_dimm_label,S_IRUGO|S_IWUSR,
750 csrow_ch1_dimm_label_show,
751 csrow_ch1_dimm_label_store);
752
753
754/* Attributes of the CSROW<id> object */
755static struct csrowdev_attribute *csrow_attr[] = {
756 &attr_dev_type,
757 &attr_mem_type,
758 &attr_edac_mode,
759 &attr_size_mb,
760 &attr_ue_count,
761 &attr_ce_count,
762 &attr_ch0_ce_count,
763 &attr_ch1_ce_count,
764 &attr_ch0_dimm_label,
765 &attr_ch1_dimm_label,
766 NULL,
767};
768
769
770/* No memory to release */
771static void edac_csrow_instance_release(struct kobject *kobj)
772{
773 debugf1("EDAC MC: " __FILE__ ": %s()\n", __func__);
774}
775
776static struct kobj_type ktype_csrow = {
777 .release = edac_csrow_instance_release,
778 .sysfs_ops = &csrowfs_ops,
779 .default_attrs = (struct attribute **) csrow_attr,
780};
781
782/* Create a CSROW object under the specified edac_mc_device */
783static int edac_create_csrow_object(struct kobject *edac_mci_kobj,
784 struct csrow_info *csrow, int index )
785{
786 int err = 0;
787
788 debugf0("MC: " __FILE__ ": %s()\n", __func__);
789
790 memset(&csrow->kobj, 0, sizeof(csrow->kobj));
791
792 /* generate ..../edac/mc/mc<id>/csrow<index> */
793
794 kobject_init(&csrow->kobj);
795 csrow->kobj.parent = edac_mci_kobj;
796 csrow->kobj.ktype = &ktype_csrow;
797
798 /* name this instance of csrow<id> */
799 err = kobject_set_name(&csrow->kobj,"csrow%d",index);
800 if (!err) {
801 /* Instantiate the csrow object */
802 err = kobject_register(&csrow->kobj);
803 if (err)
804 debugf0("Failed to register CSROW%d\n",index);
805 else
806 debugf0("Registered CSROW%d\n",index);
807 }
808
809 return err;
810}
811
812/* sysfs data structures and methods for the MCI kobjects */
813
814static ssize_t mci_reset_counters_store(struct mem_ctl_info *mci,
815 const char *data, size_t count )
816{
817 int row, chan;
818
819 mci->ue_noinfo_count = 0;
820 mci->ce_noinfo_count = 0;
821 mci->ue_count = 0;
822 mci->ce_count = 0;
823 for (row = 0; row < mci->nr_csrows; row++) {
824 struct csrow_info *ri = &mci->csrows[row];
825
826 ri->ue_count = 0;
827 ri->ce_count = 0;
828 for (chan = 0; chan < ri->nr_channels; chan++)
829 ri->channels[chan].ce_count = 0;
830 }
831 mci->start_time = jiffies;
832
833 return count;
834}
835
836static ssize_t mci_ue_count_show(struct mem_ctl_info *mci, char *data)
837{
838 return sprintf(data,"%d\n", mci->ue_count);
839}
840
841static ssize_t mci_ce_count_show(struct mem_ctl_info *mci, char *data)
842{
843 return sprintf(data,"%d\n", mci->ce_count);
844}
845
846static ssize_t mci_ce_noinfo_show(struct mem_ctl_info *mci, char *data)
847{
848 return sprintf(data,"%d\n", mci->ce_noinfo_count);
849}
850
851static ssize_t mci_ue_noinfo_show(struct mem_ctl_info *mci, char *data)
852{
853 return sprintf(data,"%d\n", mci->ue_noinfo_count);
854}
855
856static ssize_t mci_seconds_show(struct mem_ctl_info *mci, char *data)
857{
858 return sprintf(data,"%ld\n", (jiffies - mci->start_time) / HZ);
859}
860
861static ssize_t mci_mod_name_show(struct mem_ctl_info *mci, char *data)
862{
863 return sprintf(data,"%s %s\n", mci->mod_name, mci->mod_ver);
864}
865
866static ssize_t mci_ctl_name_show(struct mem_ctl_info *mci, char *data)
867{
868 return sprintf(data,"%s\n", mci->ctl_name);
869}
870
871static int mci_output_edac_cap(char *buf, unsigned long edac_cap)
872{
873 char *p = buf;
874 int bit_idx;
875
876 for (bit_idx = 0; bit_idx < 8 * sizeof(edac_cap); bit_idx++) {
877 if ((edac_cap >> bit_idx) & 0x1)
878 p += sprintf(p, "%s ", edac_caps[bit_idx]);
879 }
880
881 return p - buf;
882}
883
884static ssize_t mci_edac_capability_show(struct mem_ctl_info *mci, char *data)
885{
886 char *p = data;
887
888 p += mci_output_edac_cap(p,mci->edac_ctl_cap);
889 p += sprintf(p, "\n");
890
891 return p - data;
892}
893
894static ssize_t mci_edac_current_capability_show(struct mem_ctl_info *mci,
895 char *data)
896{
897 char *p = data;
898
899 p += mci_output_edac_cap(p,mci->edac_cap);
900 p += sprintf(p, "\n");
901
902 return p - data;
903}
904
905static int mci_output_mtype_cap(char *buf, unsigned long mtype_cap)
906{
907 char *p = buf;
908 int bit_idx;
909
910 for (bit_idx = 0; bit_idx < 8 * sizeof(mtype_cap); bit_idx++) {
911 if ((mtype_cap >> bit_idx) & 0x1)
912 p += sprintf(p, "%s ", mem_types[bit_idx]);
913 }
914
915 return p - buf;
916}
917
918static ssize_t mci_supported_mem_type_show(struct mem_ctl_info *mci, char *data)
919{
920 char *p = data;
921
922 p += mci_output_mtype_cap(p,mci->mtype_cap);
923 p += sprintf(p, "\n");
924
925 return p - data;
926}
927
928static ssize_t mci_size_mb_show(struct mem_ctl_info *mci, char *data)
929{
930 int total_pages, csrow_idx;
931
932 for (total_pages = csrow_idx = 0; csrow_idx < mci->nr_csrows;
933 csrow_idx++) {
934 struct csrow_info *csrow = &mci->csrows[csrow_idx];
935
936 if (!csrow->nr_pages)
937 continue;
938 total_pages += csrow->nr_pages;
939 }
940
941 return sprintf(data,"%u\n", PAGES_TO_MiB(total_pages));
942}
943
944struct mcidev_attribute {
945 struct attribute attr;
946 ssize_t (*show)(struct mem_ctl_info *,char *);
947 ssize_t (*store)(struct mem_ctl_info *, const char *,size_t);
948};
949
950#define to_mci(k) container_of(k, struct mem_ctl_info, edac_mci_kobj)
951#define to_mcidev_attr(a) container_of(a, struct mcidev_attribute, attr)
952
953static ssize_t mcidev_show(struct kobject *kobj, struct attribute *attr,
954 char *buffer)
955{
956 struct mem_ctl_info *mem_ctl_info = to_mci(kobj);
957 struct mcidev_attribute * mcidev_attr = to_mcidev_attr(attr);
958
959 if (mcidev_attr->show)
960 return mcidev_attr->show(mem_ctl_info, buffer);
961 return -EIO;
962}
963
964static ssize_t mcidev_store(struct kobject *kobj, struct attribute *attr,
965 const char *buffer, size_t count)
966{
967 struct mem_ctl_info *mem_ctl_info = to_mci(kobj);
968 struct mcidev_attribute * mcidev_attr = to_mcidev_attr(attr);
969
970 if (mcidev_attr->store)
971 return mcidev_attr->store(mem_ctl_info, buffer, count);
972 return -EIO;
973}
974
975static struct sysfs_ops mci_ops = {
976 .show = mcidev_show,
977 .store = mcidev_store
978};
979
980#define MCIDEV_ATTR(_name,_mode,_show,_store) \
981struct mcidev_attribute mci_attr_##_name = { \
982 .attr = {.name = __stringify(_name), .mode = _mode }, \
983 .show = _show, \
984 .store = _store, \
985};
986
987/* Control file */
988MCIDEV_ATTR(reset_counters,S_IWUSR,NULL,mci_reset_counters_store);
989
990/* Attribute files */
991MCIDEV_ATTR(mc_name,S_IRUGO,mci_ctl_name_show,NULL);
992MCIDEV_ATTR(module_name,S_IRUGO,mci_mod_name_show,NULL);
993MCIDEV_ATTR(edac_capability,S_IRUGO,mci_edac_capability_show,NULL);
994MCIDEV_ATTR(size_mb,S_IRUGO,mci_size_mb_show,NULL);
995MCIDEV_ATTR(seconds_since_reset,S_IRUGO,mci_seconds_show,NULL);
996MCIDEV_ATTR(ue_noinfo_count,S_IRUGO,mci_ue_noinfo_show,NULL);
997MCIDEV_ATTR(ce_noinfo_count,S_IRUGO,mci_ce_noinfo_show,NULL);
998MCIDEV_ATTR(ue_count,S_IRUGO,mci_ue_count_show,NULL);
999MCIDEV_ATTR(ce_count,S_IRUGO,mci_ce_count_show,NULL);
1000MCIDEV_ATTR(edac_current_capability,S_IRUGO,
1001 mci_edac_current_capability_show,NULL);
1002MCIDEV_ATTR(supported_mem_type,S_IRUGO,
1003 mci_supported_mem_type_show,NULL);
1004
1005
1006static struct mcidev_attribute *mci_attr[] = {
1007 &mci_attr_reset_counters,
1008 &mci_attr_module_name,
1009 &mci_attr_mc_name,
1010 &mci_attr_edac_capability,
1011 &mci_attr_edac_current_capability,
1012 &mci_attr_supported_mem_type,
1013 &mci_attr_size_mb,
1014 &mci_attr_seconds_since_reset,
1015 &mci_attr_ue_noinfo_count,
1016 &mci_attr_ce_noinfo_count,
1017 &mci_attr_ue_count,
1018 &mci_attr_ce_count,
1019 NULL
1020};
1021
1022
1023/*
1024 * Release of a MC controlling instance
1025 */
1026static void edac_mci_instance_release(struct kobject *kobj)
1027{
1028 struct mem_ctl_info *mci;
1029 mci = container_of(kobj,struct mem_ctl_info,edac_mci_kobj);
1030
1031 debugf0("MC: " __FILE__ ": %s() idx=%d calling kfree\n",
1032 __func__, mci->mc_idx);
1033
1034 kfree(mci);
1035}
1036
1037static struct kobj_type ktype_mci = {
1038 .release = edac_mci_instance_release,
1039 .sysfs_ops = &mci_ops,
1040 .default_attrs = (struct attribute **) mci_attr,
1041};
1042
1043#define EDAC_DEVICE_SYMLINK "device"
1044
1045/*
1046 * Create a new Memory Controller kobject instance,
1047 * mc<id> under the 'mc' directory
1048 *
1049 * Return:
1050 * 0 Success
1051 * !0 Failure
1052 */
1053static int edac_create_sysfs_mci_device(struct mem_ctl_info *mci)
1054{
1055 int i;
1056 int err;
1057 struct csrow_info *csrow;
1058 struct kobject *edac_mci_kobj=&mci->edac_mci_kobj;
1059
1060 debugf0("MC: " __FILE__ ": %s() idx=%d\n", __func__, mci->mc_idx);
1061
1062 memset(edac_mci_kobj, 0, sizeof(*edac_mci_kobj));
1063 kobject_init(edac_mci_kobj);
1064
1065 /* set the name of the mc<id> object */
1066 err = kobject_set_name(edac_mci_kobj,"mc%d",mci->mc_idx);
1067 if (err)
1068 return err;
1069
1070 /* link to our parent the '..../edac/mc' object */
1071 edac_mci_kobj->parent = &edac_memctrl_kobj;
1072 edac_mci_kobj->ktype = &ktype_mci;
1073
1074 /* register the mc<id> kobject */
1075 err = kobject_register(edac_mci_kobj);
1076 if (err)
1077 return err;
1078
1079 /* create a symlink for the device */
1080 err = sysfs_create_link(edac_mci_kobj, &mci->pdev->dev.kobj,
1081 EDAC_DEVICE_SYMLINK);
1082 if (err) {
1083 kobject_unregister(edac_mci_kobj);
1084 return err;
1085 }
1086
1087 /* Make directories for each CSROW object
1088 * under the mc<id> kobject
1089 */
1090 for (i = 0; i < mci->nr_csrows; i++) {
1091
1092 csrow = &mci->csrows[i];
1093
1094 /* Only expose populated CSROWs */
1095 if (csrow->nr_pages > 0) {
1096 err = edac_create_csrow_object(edac_mci_kobj,csrow,i);
1097 if (err)
1098 goto fail;
1099 }
1100 }
1101
1102 /* Mark this MCI instance as having sysfs entries */
1103 mci->sysfs_active = MCI_SYSFS_ACTIVE;
1104
1105 return 0;
1106
1107
1108 /* CSROW error: back out what has already been registered */
1109fail:
1110 for ( i--; i >= 0; i--) {
1111 if (mci->csrows[i].nr_pages > 0) {
1112 kobject_unregister(&mci->csrows[i].kobj);
1113 kobject_put(&mci->csrows[i].kobj);
1114 }
1115 }
1116
1117 kobject_unregister(edac_mci_kobj);
1118 kobject_put(edac_mci_kobj);
1119
1120 return err;
1121}
1122
1123/*
1124 * remove a Memory Controller instance
1125 */
1126static void edac_remove_sysfs_mci_device(struct mem_ctl_info *mci)
1127{
1128 int i;
1129
1130 debugf0("MC: " __FILE__ ": %s()\n", __func__);
1131
1132 /* remove all csrow kobjects */
1133 for (i = 0; i < mci->nr_csrows; i++) {
1134 if (mci->csrows[i].nr_pages > 0) {
1135 kobject_unregister(&mci->csrows[i].kobj);
1136 kobject_put(&mci->csrows[i].kobj);
1137 }
1138 }
1139
1140 sysfs_remove_link(&mci->edac_mci_kobj, EDAC_DEVICE_SYMLINK);
1141
1142 kobject_unregister(&mci->edac_mci_kobj);
1143 kobject_put(&mci->edac_mci_kobj);
1144}
1145
1146/* END OF sysfs data and methods */
1147
1148#ifdef CONFIG_EDAC_DEBUG
1149
1150EXPORT_SYMBOL(edac_mc_dump_channel);
1151
1152void edac_mc_dump_channel(struct channel_info *chan)
1153{
1154 debugf4("\tchannel = %p\n", chan);
1155 debugf4("\tchannel->chan_idx = %d\n", chan->chan_idx);
1156 debugf4("\tchannel->ce_count = %d\n", chan->ce_count);
1157 debugf4("\tchannel->label = '%s'\n", chan->label);
1158 debugf4("\tchannel->csrow = %p\n\n", chan->csrow);
1159}
1160
1161
1162EXPORT_SYMBOL(edac_mc_dump_csrow);
1163
1164void edac_mc_dump_csrow(struct csrow_info *csrow)
1165{
1166 debugf4("\tcsrow = %p\n", csrow);
1167 debugf4("\tcsrow->csrow_idx = %d\n", csrow->csrow_idx);
1168 debugf4("\tcsrow->first_page = 0x%lx\n",
1169 csrow->first_page);
1170 debugf4("\tcsrow->last_page = 0x%lx\n", csrow->last_page);
1171 debugf4("\tcsrow->page_mask = 0x%lx\n", csrow->page_mask);
1172 debugf4("\tcsrow->nr_pages = 0x%x\n", csrow->nr_pages);
1173 debugf4("\tcsrow->nr_channels = %d\n",
1174 csrow->nr_channels);
1175 debugf4("\tcsrow->channels = %p\n", csrow->channels);
1176 debugf4("\tcsrow->mci = %p\n\n", csrow->mci);
1177}
1178
1179
1180EXPORT_SYMBOL(edac_mc_dump_mci);
1181
1182void edac_mc_dump_mci(struct mem_ctl_info *mci)
1183{
1184 debugf3("\tmci = %p\n", mci);
1185 debugf3("\tmci->mtype_cap = %lx\n", mci->mtype_cap);
1186 debugf3("\tmci->edac_ctl_cap = %lx\n", mci->edac_ctl_cap);
1187 debugf3("\tmci->edac_cap = %lx\n", mci->edac_cap);
1188 debugf4("\tmci->edac_check = %p\n", mci->edac_check);
1189 debugf3("\tmci->nr_csrows = %d, csrows = %p\n",
1190 mci->nr_csrows, mci->csrows);
1191 debugf3("\tpdev = %p\n", mci->pdev);
1192 debugf3("\tmod_name:ctl_name = %s:%s\n",
1193 mci->mod_name, mci->ctl_name);
1194 debugf3("\tpvt_info = %p\n\n", mci->pvt_info);
1195}
1196
1197
1198#endif /* CONFIG_EDAC_DEBUG */
1199
1200/* 'ptr' points to a possibly unaligned item X such that sizeof(X) is 'size'.
1201 * Adjust 'ptr' so that its alignment is at least as stringent as what the
1202 * compiler would provide for X and return the aligned result.
1203 *
1204 * If 'size' is a constant, the compiler will optimize this whole function
1205 * down to either a no-op or the addition of a constant to the value of 'ptr'.
1206 */
1207static inline char * align_ptr (void *ptr, unsigned size)
1208{
1209 unsigned align, r;
1210
1211 /* Here we assume that the alignment of a "long long" is the most
1212 * stringent alignment that the compiler will ever provide by default.
1213 * As far as I know, this is a reasonable assumption.
1214 */
1215 if (size > sizeof(long))
1216 align = sizeof(long long);
1217 else if (size > sizeof(int))
1218 align = sizeof(long);
1219 else if (size > sizeof(short))
1220 align = sizeof(int);
1221 else if (size > sizeof(char))
1222 align = sizeof(short);
1223 else
1224 return (char *) ptr;
1225
1226 r = ((unsigned long) ptr) % align;
1227
1228 if (r == 0)
1229 return (char *) ptr;
1230
1231 return (char *) (((unsigned long) ptr) + align - r);
1232}
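
/*
 * Illustrative sketch only, not part of this patch: the rounding step that
 * the align_ptr() comment describes, written as a stand-alone user-space
 * helper. The name align_up() and the power-of-two restriction are
 * assumptions made for the example. Note that the remainder is taken from
 * the address itself, since that is what determines alignment.
 */

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round 'addr' up to the next multiple of 'align'.
 * 'align' must be a nonzero power of two (checked by the assert).
 */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
	uintptr_t r;

	assert(align != 0 && (align & (align - 1)) == 0);

	r = addr % align;
	if (r == 0)
		return addr;	/* already aligned, nothing to add */

	return addr + align - r;	/* skip forward past the remainder */
}
```

/* With this helper, align_up(13, 8) yields 16 and an address that is
 * already a multiple of 8 is returned unchanged.
 */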
1233
1234
1235EXPORT_SYMBOL(edac_mc_alloc);
1236
1237/**
1238 * edac_mc_alloc: Allocate a struct mem_ctl_info structure
1239 * @sz_pvt: size of private storage needed
1240 * @nr_csrows: Number of CSROWS needed for this MC
1241 * @nr_chans: Number of channels for the MC
1242 *
1243 * Everything is kmalloc'ed as one big chunk - more efficient.
1244 * This can only be used if all structures have the same lifetime - otherwise
1245 * you have to allocate and initialize your own structures.
1246 *
1247 * Use edac_mc_free() to free mc structures allocated by this function.
1248 *
1249 * Returns:
1250 * NULL allocation failed
1251 * struct mem_ctl_info pointer
1252 */
1253struct mem_ctl_info *edac_mc_alloc(unsigned sz_pvt, unsigned nr_csrows,
1254 unsigned nr_chans)
1255{
1256 struct mem_ctl_info *mci;
1257 struct csrow_info *csi, *csrow;
1258 struct channel_info *chi, *chp, *chan;
1259 void *pvt;
1260 unsigned size;
1261 int row, chn;
1262
1263 /* Figure out the offsets of the various items from the start of an mc
1264 * structure. We want the alignment of each item to be at least as
1265 * stringent as what the compiler would provide if we could simply
1266 * hardcode everything into a single struct.
1267 */
1268 mci = (struct mem_ctl_info *) 0;
1269 csi = (struct csrow_info *)align_ptr(&mci[1], sizeof(*csi));
1270 chi = (struct channel_info *)
1271 align_ptr(&csi[nr_csrows], sizeof(*chi));
1272 pvt = align_ptr(&chi[nr_chans * nr_csrows], sz_pvt);
1273 size = ((unsigned long) pvt) + sz_pvt;
1274
1275 if ((mci = kmalloc(size, GFP_KERNEL)) == NULL)
1276 return NULL;
1277
1278 /* Adjust pointers so they point within the memory we just allocated
1279 * rather than an imaginary chunk of memory located at address 0.
1280 */
1281 csi = (struct csrow_info *) (((char *) mci) + ((unsigned long) csi));
1282 chi = (struct channel_info *) (((char *) mci) + ((unsigned long) chi));
1283 pvt = sz_pvt ? (((char *) mci) + ((unsigned long) pvt)) : NULL;
1284
1285 memset(mci, 0, size); /* clear all fields */
1286
1287 mci->csrows = csi;
1288 mci->pvt_info = pvt;
1289 mci->nr_csrows = nr_csrows;
1290
1291 for (row = 0; row < nr_csrows; row++) {
1292 csrow = &csi[row];
1293 csrow->csrow_idx = row;
1294 csrow->mci = mci;
1295 csrow->nr_channels = nr_chans;
1296 chp = &chi[row * nr_chans];
1297 csrow->channels = chp;
1298
1299 for (chn = 0; chn < nr_chans; chn++) {
1300 chan = &chp[chn];
1301 chan->chan_idx = chn;
1302 chan->csrow = csrow;
1303 }
1304 }
1305
1306 return mci;
1307}
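
/*
 * Illustrative sketch only, not part of this patch: the "one big chunk"
 * layout used by edac_mc_alloc(), reduced to a user-space example. The
 * demo_* types, round_up_to() and the use of calloc() in place of
 * kmalloc()+memset() are assumptions made for the example; the real
 * structures also carry a per-row channel array, which is omitted here.
 */

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_row {			/* stand-in for struct csrow_info */
	int idx;
};

struct demo_ctl {			/* stand-in for struct mem_ctl_info */
	unsigned nr_rows;
	struct demo_row *rows;
	void *pvt;
};

/* Round 'off' up to the next multiple of 'align' (align must be nonzero) */
static size_t round_up_to(size_t off, size_t align)
{
	assert(align != 0);
	return (off + align - 1) / align * align;
}

/* Allocate the controller, its row array and the private area in one block */
static struct demo_ctl *demo_alloc(size_t sz_pvt, unsigned nr_rows)
{
	size_t rows_off, pvt_off, total;
	struct demo_ctl *ctl;
	unsigned i;

	/* Compute the offset of each item from the start of the block */
	rows_off = round_up_to(sizeof(*ctl), _Alignof(struct demo_row));
	pvt_off = round_up_to(rows_off + nr_rows * sizeof(struct demo_row),
			      _Alignof(max_align_t));
	total = pvt_off + sz_pvt;

	ctl = calloc(1, total);		/* zero-filled, like the memset() */
	if (ctl == NULL)
		return NULL;

	/* Turn the offsets into pointers inside the block just allocated */
	ctl->nr_rows = nr_rows;
	ctl->rows = (struct demo_row *) ((char *) ctl + rows_off);
	ctl->pvt = sz_pvt ? (char *) ctl + pvt_off : NULL;

	for (i = 0; i < nr_rows; i++)
		ctl->rows[i].idx = (int) i;

	return ctl;
}
```

/* Because everything lives in one block, a single free() (kfree() in the
 * kernel) releases the controller, its rows and the private data together.
 */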
1308
1309
1310EXPORT_SYMBOL(edac_mc_free);
1311
1312/**
1313 * edac_mc_free: Free a previously allocated 'mci' structure
1314 * @mci: pointer to a struct mem_ctl_info structure
1315 *
1316 * Free up a previously allocated mci structure
1317 * A MCI structure can be in 2 states after being allocated
1318 * by edac_mc_alloc().
1319 * 1) Allocated in a MC driver's probe, but not yet committed
1320 * 2) Allocated and committed, by a call to edac_mc_add_mc()
1321 * edac_mc_add_mc() is the function that adds the sysfs entries
1322 * thus, this free function must determine which state the 'mci'
1323 * structure is in, then either free it directly or
1324 * perform kobject cleanup by calling edac_remove_sysfs_mci_device().
1325 *
1326 * VOID Return
1327 */
1328void edac_mc_free(struct mem_ctl_info *mci)
1329{
1330 /* only if sysfs entries for this mci instance exist
1331 * do we remove them and defer the actual kfree via
1332 * the kobject 'release()' callback.
1333 *
1334 * Otherwise, do a straight kfree now.
1335 */
1336 if (mci->sysfs_active == MCI_SYSFS_ACTIVE)
1337 edac_remove_sysfs_mci_device(mci);
1338 else
1339 kfree(mci);
1340}
1341
1342
1343
1344EXPORT_SYMBOL(edac_mc_find_mci_by_pdev);
1345
1346struct mem_ctl_info *edac_mc_find_mci_by_pdev(struct pci_dev *pdev)
1347{
1348 struct mem_ctl_info *mci;
1349 struct list_head *item;
1350
1351 debugf3("MC: " __FILE__ ": %s()\n", __func__);
1352
1353 list_for_each(item, &mc_devices) {
1354 mci = list_entry(item, struct mem_ctl_info, link);
1355
1356 if (mci->pdev == pdev)
1357 return mci;
1358 }
1359
1360 return NULL;
1361}
1362
1363static int add_mc_to_global_list (struct mem_ctl_info *mci)
1364{
1365 struct list_head *item, *insert_before;
1366 struct mem_ctl_info *p;
1367 int i;
1368
1369 if (list_empty(&mc_devices)) {
1370 mci->mc_idx = 0;
1371 insert_before = &mc_devices;
1372 } else {
1373 if (edac_mc_find_mci_by_pdev(mci->pdev)) {
1374 printk(KERN_WARNING
1375 "EDAC MC: %s (%s) %s %s already assigned %d\n",
1376 mci->pdev->dev.bus_id, pci_name(mci->pdev),
1377 mci->mod_name, mci->ctl_name, mci->mc_idx);
1378 return 1;
1379 }
1380
1381 insert_before = NULL;
1382 i = 0;
1383
1384 list_for_each(item, &mc_devices) {
1385 p = list_entry(item, struct mem_ctl_info, link);
1386
1387 if (p->mc_idx != i) {
1388 insert_before = item;
1389 break;
1390 }
1391
1392 i++;
1393 }
1394
1395 mci->mc_idx = i;
1396
1397 if (insert_before == NULL)
1398 insert_before = &mc_devices;
1399 }
1400
1401 list_add_tail_rcu(&mci->link, insert_before);
1402 return 0;
1403}
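
/*
 * Illustrative sketch only, not part of this patch: how the loop in
 * add_mc_to_global_list() picks the lowest unused mc_idx. The sketch
 * assumes the indexes are kept in ascending order without duplicates,
 * and uses a plain array where the kernel code walks the mc_devices
 * list. lowest_free_index() is a hypothetical name.
 */

```c
#include <assert.h>
#include <stddef.h>

/* Return the lowest index that does not appear in 'used'.
 * 'used' holds 'n' indexes sorted ascending with no duplicates.
 */
static int lowest_free_index(const int *used, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (used[i] != (int) i)
			break;	/* gap found: index i is free */
	}

	/* Either the first gap, or n if 0..n-1 are all taken */
	return (int) i;
}
```

/* For the indexes {0, 1, 3} the result is 2, so a controller removed from
 * the middle of the list has its slot reused by the next one added.
 */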
1404
1405
1406
1407EXPORT_SYMBOL(edac_mc_add_mc);
1408
1409/**
1410 * edac_mc_add_mc: Insert the 'mci' structure into the mci global list
1411 * @mci: pointer to the mci structure to be added to the list
1412 *
1413 * Return:
1414 * 0 Success
1415 * !0 Failure
1416 */
1417
1418/* FIXME - should a warning be printed if no error detection? correction? */
1419int edac_mc_add_mc(struct mem_ctl_info *mci)
1420{
1421 int rc = 1;
1422
1423 debugf0("MC: " __FILE__ ": %s()\n", __func__);
1424#ifdef CONFIG_EDAC_DEBUG
1425 if (edac_debug_level >= 3)
1426 edac_mc_dump_mci(mci);
1427 if (edac_debug_level >= 4) {
1428 int i;
1429
1430 for (i = 0; i < mci->nr_csrows; i++) {
1431 int j;
1432 edac_mc_dump_csrow(&mci->csrows[i]);
1433 for (j = 0; j < mci->csrows[i].nr_channels; j++)
1434 edac_mc_dump_channel(&mci->csrows[i].
1435 channels[j]);
1436 }
1437 }
1438#endif
1439 down(&mem_ctls_mutex);
1440
1441 if (add_mc_to_global_list(mci))
1442 goto finish;
1443
1444 /* set load time so that error rate can be tracked */
1445 mci->start_time = jiffies;
1446
1447 if (edac_create_sysfs_mci_device(mci)) {
1448 printk(KERN_WARNING
1449 "EDAC MC%d: failed to create sysfs device\n",
1450 mci->mc_idx);
1451 /* FIXME - should there be an error code and unwind? */
1452 goto finish;
1453 }
1454
1455 /* Report action taken */
1456 printk(KERN_INFO
1457 "EDAC MC%d: Giving out device to %s %s: PCI %s\n",
1458 mci->mc_idx, mci->mod_name, mci->ctl_name,
1459 pci_name(mci->pdev));
1460
1461
1462 rc = 0;
1463
1464finish:
1465 up(&mem_ctls_mutex);
1466 return rc;
1467}
1468
1469
1470
1471static void complete_mc_list_del (struct rcu_head *head)
1472{
1473 struct mem_ctl_info *mci;
1474
1475 mci = container_of(head, struct mem_ctl_info, rcu);
1476 INIT_LIST_HEAD(&mci->link);
1477 complete(&mci->complete);
1478}
1479
1480static void del_mc_from_global_list (struct mem_ctl_info *mci)
1481{
1482 list_del_rcu(&mci->link);
1483 init_completion(&mci->complete);
1484 call_rcu(&mci->rcu, complete_mc_list_del);
1485 wait_for_completion(&mci->complete);
1486}
1487
1488EXPORT_SYMBOL(edac_mc_del_mc);
1489
1490/**
1491 * edac_mc_del_mc: Remove the specified mci structure from global list
1492 * @mci: Pointer to struct mem_ctl_info structure
1493 *
1494 * Returns:
1495 * 0 Success
1496 * 1 Failure
1497 */
1498int edac_mc_del_mc(struct mem_ctl_info *mci)
1499{
1500 int rc = 1;
1501
1502 debugf0("MC%d: " __FILE__ ": %s()\n", mci->mc_idx, __func__);
1503 down(&mem_ctls_mutex);
1504 del_mc_from_global_list(mci);
1505 printk(KERN_INFO
1506 "EDAC MC%d: Removed device %d for %s %s: PCI %s\n",
1507 mci->mc_idx, mci->mc_idx, mci->mod_name, mci->ctl_name,
1508 pci_name(mci->pdev));
1509 rc = 0;
1510 up(&mem_ctls_mutex);
1511
1512 return rc;
1513}
1514
1515
1516EXPORT_SYMBOL(edac_mc_scrub_block);
1517
1518void edac_mc_scrub_block(unsigned long page, unsigned long offset,
1519 u32 size)
1520{
1521 struct page *pg;
1522 void *virt_addr;
1523 unsigned long flags = 0;
1524
1525 debugf3("MC: " __FILE__ ": %s()\n", __func__);
1526
1527 /* ECC error page was not in our memory. Ignore it. */
1528 if(!pfn_valid(page))
1529 return;
1530
1531 /* Find the actual page structure then map it and fix */
1532 pg = pfn_to_page(page);
1533
1534 if (PageHighMem(pg))
1535 local_irq_save(flags);
1536
1537 virt_addr = kmap_atomic(pg, KM_BOUNCE_READ);
1538
1539 /* Perform architecture specific atomic scrub operation */
1540 atomic_scrub(virt_addr + offset, size);
1541
1542 /* Unmap and complete */
1543 kunmap_atomic(virt_addr, KM_BOUNCE_READ);
1544
1545 if (PageHighMem(pg))
1546 local_irq_restore(flags);
1547}
1548
1549
1550/* FIXME - should return -1 */
1551EXPORT_SYMBOL(edac_mc_find_csrow_by_page);
1552
1553int edac_mc_find_csrow_by_page(struct mem_ctl_info *mci,
1554 unsigned long page)
1555{
1556 struct csrow_info *csrows = mci->csrows;
1557 int row, i;
1558
1559 debugf1("MC%d: " __FILE__ ": %s(): 0x%lx\n", mci->mc_idx, __func__,
1560 page);
1561 row = -1;
1562
1563 for (i = 0; i < mci->nr_csrows; i++) {
1564 struct csrow_info *csrow = &csrows[i];
1565
1566 if (csrow->nr_pages == 0)
1567 continue;
1568
1569 debugf3("MC%d: " __FILE__
1570 ": %s(): first(0x%lx) page(0x%lx)"
1571 " last(0x%lx) mask(0x%lx)\n", mci->mc_idx,
1572 __func__, csrow->first_page, page,
1573 csrow->last_page, csrow->page_mask);
1574
1575 if ((page >= csrow->first_page) &&
1576 (page <= csrow->last_page) &&
1577 ((page & csrow->page_mask) ==
1578 (csrow->first_page & csrow->page_mask))) {
1579 row = i;
1580 break;
1581 }
1582 }
1583
1584 if (row == -1)
1585 printk(KERN_ERR
1586 "EDAC MC%d: could not look up page error address %lx\n",
1587 mci->mc_idx, (unsigned long) page);
1588
1589 return row;
1590}
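
/*
 * Illustrative sketch only, not part of this patch: the range-and-mask
 * test from edac_mc_find_csrow_by_page() as a stand-alone function. The
 * demo_csrow type is a hypothetical stand-in carrying only the fields the
 * test needs. Reading page_mask as an interleave selector is an
 * assumption made for the example.
 */

```c
#include <assert.h>

struct demo_csrow {
	unsigned long first_page;	/* first page number in the row */
	unsigned long last_page;	/* last page number in the row */
	unsigned long page_mask;	/* bits that must match first_page */
	unsigned long nr_pages;		/* 0 means the row is unpopulated */
};

/* Return the index of the row containing 'page', or -1 if none matches */
static int demo_find_row(const struct demo_csrow *rows, int nr_rows,
			 unsigned long page)
{
	int i;

	for (i = 0; i < nr_rows; i++) {
		const struct demo_csrow *r = &rows[i];

		if (r->nr_pages == 0)
			continue;	/* skip unpopulated rows */

		if (page >= r->first_page && page <= r->last_page &&
		    (page & r->page_mask) == (r->first_page & r->page_mask))
			return i;
	}

	return -1;
}
```

/* With two rows covering the same range and a mask of 0x1, even pages
 * resolve to the row whose first_page is even and odd pages to the other.
 */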
1591
1592
1593EXPORT_SYMBOL(edac_mc_handle_ce);
1594
1595/* FIXME - settable log (warning/emerg) levels */
1596/* FIXME - integrate with evlog: http://evlog.sourceforge.net/ */
1597void edac_mc_handle_ce(struct mem_ctl_info *mci,
1598 unsigned long page_frame_number,
1599 unsigned long offset_in_page,
1600 unsigned long syndrome, int row, int channel,
1601 const char *msg)
1602{
1603 unsigned long remapped_page;
1604
1605 debugf3("MC%d: " __FILE__ ": %s()\n", mci->mc_idx, __func__);
1606
1607 /* FIXME - maybe make panic on INTERNAL ERROR an option */
1608 if (row >= mci->nr_csrows || row < 0) {
1609 /* something is wrong */
1610 printk(KERN_ERR
1611 "EDAC MC%d: INTERNAL ERROR: row out of range (%d >= %d)\n",
1612 mci->mc_idx, row, mci->nr_csrows);
1613 edac_mc_handle_ce_no_info(mci, "INTERNAL ERROR");
1614 return;
1615 }
1616 if (channel >= mci->csrows[row].nr_channels || channel < 0) {
1617 /* something is wrong */
1618 printk(KERN_ERR
1619 "EDAC MC%d: INTERNAL ERROR: channel out of range "
1620 "(%d >= %d)\n",
1621 mci->mc_idx, channel, mci->csrows[row].nr_channels);
1622 edac_mc_handle_ce_no_info(mci, "INTERNAL ERROR");
1623 return;
1624 }
1625
1626 if (log_ce)
1627 /* FIXME - put in DIMM location */
1628 printk(KERN_WARNING
1629 "EDAC MC%d: CE page 0x%lx, offset 0x%lx,"
1630 " grain %d, syndrome 0x%lx, row %d, channel %d,"
1631 " label \"%s\": %s\n", mci->mc_idx,
1632 page_frame_number, offset_in_page,
1633 mci->csrows[row].grain, syndrome, row, channel,
1634 mci->csrows[row].channels[channel].label, msg);
1635
1636 mci->ce_count++;
1637 mci->csrows[row].ce_count++;
1638 mci->csrows[row].channels[channel].ce_count++;
1639
1640 if (mci->scrub_mode & SCRUB_SW_SRC) {
1641 /*
1642 * Some MC's can remap memory so that it is still available
1643 * at a different address when PCI devices map into memory.
1644 * MC's that can't do this lose the memory where PCI devices
1645 * are mapped. This mapping is MC dependent and so we call
1646 * back into the MC driver for it to map the MC page to
1647 * a physical (CPU) page which can then be mapped to a virtual
1648 * page - which can then be scrubbed.
1649 */
1650 remapped_page = mci->ctl_page_to_phys ?
1651 mci->ctl_page_to_phys(mci, page_frame_number) :
1652 page_frame_number;
1653
1654 edac_mc_scrub_block(remapped_page, offset_in_page,
1655 mci->csrows[row].grain);
1656 }
1657}
1658
1659
1660EXPORT_SYMBOL(edac_mc_handle_ce_no_info);
1661
1662void edac_mc_handle_ce_no_info(struct mem_ctl_info *mci,
1663 const char *msg)
1664{
1665 if (log_ce)
1666 printk(KERN_WARNING
1667 "EDAC MC%d: CE - no information available: %s\n",
1668 mci->mc_idx, msg);
1669 mci->ce_noinfo_count++;
1670 mci->ce_count++;
1671}
1672
1673
1674EXPORT_SYMBOL(edac_mc_handle_ue);
1675
1676void edac_mc_handle_ue(struct mem_ctl_info *mci,
1677 unsigned long page_frame_number,
1678 unsigned long offset_in_page, int row,
1679 const char *msg)
1680{
1681 int len = EDAC_MC_LABEL_LEN * 4;
1682 char labels[len + 1];
1683 char *pos = labels;
1684 int chan;
1685 int chars;
1686
1687 debugf3("MC%d: " __FILE__ ": %s()\n", mci->mc_idx, __func__);
1688
1689 /* FIXME - maybe make panic on INTERNAL ERROR an option */
1690 if (row >= mci->nr_csrows || row < 0) {
1691 /* something is wrong */
1692 printk(KERN_ERR
1693 "EDAC MC%d: INTERNAL ERROR: row out of range (%d >= %d)\n",
1694 mci->mc_idx, row, mci->nr_csrows);
1695 edac_mc_handle_ue_no_info(mci, "INTERNAL ERROR");
1696 return;
1697 }
1698
1699 chars = snprintf(pos, len + 1, "%s",
1700 mci->csrows[row].channels[0].label);
1701 len -= chars;
1702 pos += chars;
1703 for (chan = 1; (chan < mci->csrows[row].nr_channels) && (len > 0);
1704 chan++) {
1705 chars = snprintf(pos, len + 1, ":%s",
1706 mci->csrows[row].channels[chan].label);
1707 len -= chars;
1708 pos += chars;
1709 }
1710
1711 if (log_ue)
1712 printk(KERN_EMERG
1713 "EDAC MC%d: UE page 0x%lx, offset 0x%lx, grain %d, row %d,"
1714 " labels \"%s\": %s\n", mci->mc_idx,
1715 page_frame_number, offset_in_page,
1716 mci->csrows[row].grain, row, labels, msg);
1717
1718 if (panic_on_ue)
1719 panic
1720 ("EDAC MC%d: UE page 0x%lx, offset 0x%lx, grain %d, row %d,"
1721 " labels \"%s\": %s\n", mci->mc_idx,
1722 page_frame_number, offset_in_page,
1723 mci->csrows[row].grain, row, labels, msg);
1724
1725 mci->ue_count++;
1726 mci->csrows[row].ue_count++;
1727}
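
/*
 * Illustrative sketch only, not part of this patch: joining channel
 * labels with ':' into a fixed-size buffer, as edac_mc_handle_ue() does
 * for its log message. join_labels() is a hypothetical user-space helper;
 * it relies on the standard C snprintf() returning the length that would
 * have been written, and stops cleanly on truncation.
 */

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write "l0:l1:..." into 'buf' (capacity 'size', including the NUL).
 * Returns the number of characters stored, not counting the NUL.
 */
static size_t join_labels(char *buf, size_t size,
			  const char *const *labels, size_t n)
{
	size_t used = 0;
	size_t i;

	assert(buf != NULL);

	if (size == 0)
		return 0;

	buf[0] = '\0';

	for (i = 0; i < n && used < size - 1; i++) {
		int w = snprintf(buf + used, size - used, "%s%s",
				 i ? ":" : "", labels[i]);

		if (w < 0)
			break;			/* output error */

		if ((size_t) w >= size - used) {
			used = size - 1;	/* output was truncated */
			break;
		}

		used += (size_t) w;
	}

	return used;
}
```

/* Tracking the space left as an unsigned count and clamping on truncation
 * keeps the write position inside the buffer at all times.
 */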
1728
1729
1730EXPORT_SYMBOL(edac_mc_handle_ue_no_info);
1731
1732void edac_mc_handle_ue_no_info(struct mem_ctl_info *mci,
1733 const char *msg)
1734{
1735 if (panic_on_ue)
1736 panic("EDAC MC%d: Uncorrected Error", mci->mc_idx);
1737
1738 if (log_ue)
1739 printk(KERN_WARNING
1740 "EDAC MC%d: UE - no information available: %s\n",
1741 mci->mc_idx, msg);
1742 mci->ue_noinfo_count++;
1743 mci->ue_count++;
1744}
1745
1746
1747#ifdef CONFIG_PCI
1748
1749static u16 get_pci_parity_status(struct pci_dev *dev, int secondary)
1750{
1751 int where;
1752 u16 status;
1753
1754 where = secondary ? PCI_SEC_STATUS : PCI_STATUS;
1755 pci_read_config_word(dev, where, &status);
1756
1757 /* If we get back 0xFFFF then we must suspect that the card has been pulled but
1758 the Linux PCI layer has not yet finished cleaning up. We don't want to report
1759 on such devices */
1760
1761 if (status == 0xFFFF) {
1762 u32 sanity;
1763 pci_read_config_dword(dev, 0, &sanity);
1764 if (sanity == 0xFFFFFFFF)
1765 return 0;
1766 }
1767 status &= PCI_STATUS_DETECTED_PARITY | PCI_STATUS_SIG_SYSTEM_ERROR |
1768 PCI_STATUS_PARITY;
1769
1770 if (status)
1771 /* reset only the bits we are interested in */
1772 pci_write_config_word(dev, where, status);
1773
1774 return status;
1775}
1776
1777typedef void (*pci_parity_check_fn_t) (struct pci_dev *dev);
1778
1779/* Clear any PCI parity errors logged by this device. */
1780static void edac_pci_dev_parity_clear( struct pci_dev *dev )
1781{
1782 u8 header_type;
1783
1784 get_pci_parity_status(dev, 0);
1785
1786 /* read the device TYPE, looking for bridges */
1787 pci_read_config_byte(dev, PCI_HEADER_TYPE, &header_type);
1788
1789 if ((header_type & 0x7F) == PCI_HEADER_TYPE_BRIDGE)
1790 get_pci_parity_status(dev, 1);
1791}
1792
1793/*
1794 * PCI Parity polling
1795 *
1796 */
1797static void edac_pci_dev_parity_test(struct pci_dev *dev)
1798{
1799 u16 status;
1800 u8 header_type;
1801
1802 /* read the STATUS register on this device
1803 */
1804 status = get_pci_parity_status(dev, 0);
1805
1806 debugf2("PCI STATUS= 0x%04x %s\n", status, dev->dev.bus_id );
1807
1808 /* check the status reg for errors */
1809 if (status) {
1810 if (status & (PCI_STATUS_SIG_SYSTEM_ERROR))
1811 printk(KERN_CRIT
1812 "EDAC PCI- "
1813 "Signaled System Error on %s\n",
1814 pci_name (dev));
1815
1816 if (status & (PCI_STATUS_PARITY)) {
1817 printk(KERN_CRIT
1818 "EDAC PCI- "
1819 "Master Data Parity Error on %s\n",
1820 pci_name (dev));
1821
1822 atomic_inc(&pci_parity_count);
1823 }
1824
1825 if (status & (PCI_STATUS_DETECTED_PARITY)) {
1826 printk(KERN_CRIT
1827 "EDAC PCI- "
1828 "Detected Parity Error on %s\n",
1829 pci_name (dev));
1830
1831 atomic_inc(&pci_parity_count);
1832 }
1833 }
1834
1835 /* read the device TYPE, looking for bridges */
1836 pci_read_config_byte(dev, PCI_HEADER_TYPE, &header_type);
1837
1838 debugf2("PCI HEADER TYPE= 0x%02x %s\n", header_type, dev->dev.bus_id );
1839
1840 if ((header_type & 0x7F) == PCI_HEADER_TYPE_BRIDGE) {
1841 /* On bridges, need to examine secondary status register */
1842 status = get_pci_parity_status(dev, 1);
1843
1844 debugf2("PCI SEC_STATUS= 0x%04x %s\n",
1845 status, dev->dev.bus_id );
1846
1847 /* check the secondary status reg for errors */
1848 if (status) {
1849 if (status & (PCI_STATUS_SIG_SYSTEM_ERROR))
1850 printk(KERN_CRIT
1851 "EDAC PCI-Bridge- "
1852 "Signaled System Error on %s\n",
1853 pci_name (dev));
1854
1855 if (status & (PCI_STATUS_PARITY)) {
1856 printk(KERN_CRIT
1857 "EDAC PCI-Bridge- "
1858 "Master Data Parity Error on %s\n",
1859 pci_name (dev));
1860
1861 atomic_inc(&pci_parity_count);
1862 }
1863
1864 if (status & (PCI_STATUS_DETECTED_PARITY)) {
1865 printk(KERN_CRIT
1866 "EDAC PCI-Bridge- "
1867 "Detected Parity Error on %s\n",
1868 pci_name (dev));
1869
1870 atomic_inc(&pci_parity_count);
1871 }
1872 }
1873 }
1874}
1875
1876/*
1877 * check_dev_on_list: Scan for a PCI device on a white/black list
1878 * @list: an EDAC &edac_pci_device_list white/black list pointer
1879 * @free_index: index of next free entry on the list
1880 * @dev: PCI Device pointer
1881 *
1882 * see if list contains the device.
1883 *
1884 * Returns: 0 not found
1885 * 1 found on list
1886 */
1887static int check_dev_on_list(struct edac_pci_device_list *list, int free_index,
1888 struct pci_dev *dev)
1889{
1890 int i;
1891 int rc = 0; /* Assume not found */
1892 unsigned short vendor=dev->vendor;
1893 unsigned short device=dev->device;
1894
1895 /* Scan the list, looking for a vendor/device match
1896 */
1897 for (i = 0; i < free_index; i++, list++ ) {
1898 if ( (list->vendor == vendor ) &&
1899 (list->device == device )) {
1900 rc = 1;
1901 break;
1902 }
1903 }
1904
1905 return rc;
1906}
1907
1908/*
1909 * pci_dev parity list iterator
1910 * Scan the PCI device list for one iteration, looking for SERRORs,
1911 * Master Parity ERRORs or Parity ERRORs on primary or secondary devices
1912 */
1913static inline void edac_pci_dev_parity_iterator(pci_parity_check_fn_t fn)
1914{
1915 struct pci_dev *dev=NULL;
1916
1917 /* request for kernel access to the next PCI device, if any,
1918 * and while we are looking at it have its reference count
1919 * bumped until we are done with it
1920 */
1921 while((dev = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, dev)) != NULL) {
1922
1923 /* if whitelist exists then it has priority, so only scan those
1924 * devices on the whitelist
1925 */
1926 if (pci_whitelist_count > 0 ) {
1927 if (check_dev_on_list(pci_whitelist,
1928 pci_whitelist_count, dev))
1929 fn(dev);
1930 } else {
1931 /*
1932 * if no whitelist, then check if this device is
1933 * blacklisted
1934 */
1935 if (!check_dev_on_list(pci_blacklist,
1936 pci_blacklist_count, dev))
1937 fn(dev);
1938 }
1939 }
1940}
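
/*
 * Illustrative sketch only, not part of this patch: the whitelist and
 * blacklist policy applied by edac_pci_dev_parity_iterator(), reduced to
 * a pure function. demo_id, on_list() and should_scan() are hypothetical
 * names; the rule itself (a non-empty whitelist takes priority, otherwise
 * anything not blacklisted is scanned) is taken from the code above.
 */

```c
#include <assert.h>

struct demo_id {
	unsigned short vendor;
	unsigned short device;
};

/* Return 1 if 'id' matches one of the first 'n' entries of 'list' */
static int on_list(const struct demo_id *list, int n, struct demo_id id)
{
	int i;

	for (i = 0; i < n; i++) {
		if (list[i].vendor == id.vendor &&
		    list[i].device == id.device)
			return 1;
	}

	return 0;
}

/* Decide whether a device should be checked for parity errors */
static int should_scan(const struct demo_id *wl, int nwl,
		       const struct demo_id *bl, int nbl,
		       struct demo_id id)
{
	assert(nwl >= 0 && nbl >= 0);

	/* A non-empty whitelist wins: scan only what it names */
	if (nwl > 0)
		return on_list(wl, nwl, id);

	/* No whitelist: scan everything that is not blacklisted */
	return !on_list(bl, nbl, id);
}
```

/* Note that once a whitelist exists the blacklist is never consulted. */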
1941
1942static void do_pci_parity_check(void)
1943{
1944 unsigned long flags;
1945 int before_count;
1946
1947 debugf3("MC: " __FILE__ ": %s()\n", __func__);
1948
1949 if (!check_pci_parity)
1950 return;
1951
1952 before_count = atomic_read(&pci_parity_count);
1953
1954 /* scan all PCI devices looking for a Parity Error on devices and
1955 * bridges
1956 */
1957 local_irq_save(flags);
1958 edac_pci_dev_parity_iterator(edac_pci_dev_parity_test);
1959 local_irq_restore(flags);
1960
1961 /* Only if operator has selected panic on PCI Error */
1962 if (panic_on_pci_parity) {
1963 /* If the count is different 'after' from 'before' */
1964 if (before_count != atomic_read(&pci_parity_count))
1965 panic("EDAC: PCI Parity Error");
1966 }
1967}
1968
1969
1970static inline void clear_pci_parity_errors(void)
1971{
1972 /* Clear any PCI bus parity errors that devices initially have logged
1973 * in their registers.
1974 */
1975 edac_pci_dev_parity_iterator(edac_pci_dev_parity_clear);
1976}
1977
1978
1979#else /* CONFIG_PCI */
1980
1981
1982static inline void do_pci_parity_check(void)
1983{
1984 /* no-op */
1985}
1986
1987
1988static inline void clear_pci_parity_errors(void)
1989{
1990 /* no-op */
1991}
1992
1993
1994#endif /* CONFIG_PCI */
1995
1996/*
1997 * Iterate over all MC instances and check for ECC, et al, errors
1998 */
1999static inline void check_mc_devices (void)
2000{
2001 unsigned long flags;
2002 struct list_head *item;
2003 struct mem_ctl_info *mci;
2004
2005 debugf3("MC: " __FILE__ ": %s()\n", __func__);
2006
2007 /* during poll, have interrupts off */
2008 local_irq_save(flags);
2009
2010 list_for_each(item, &mc_devices) {
2011 mci = list_entry(item, struct mem_ctl_info, link);
2012
2013 if (mci->edac_check != NULL)
2014 mci->edac_check(mci);
2015 }
2016
2017 local_irq_restore(flags);
2018}
2019
2020
2021/*
2022 * Check MC status every poll_msec.
2023 * Check PCI status every poll_msec as well.
2024 *
2025 * This is where the work gets done for edac.
2026 *
2027 * SMP safe, doesn't use NMI, and auto-rate-limits.
2028 */
2029static void do_edac_check(void)
2030{
2031
2032 debugf3("MC: " __FILE__ ": %s()\n", __func__);
2033
2034 check_mc_devices();
2035
2036 do_pci_parity_check();
2037}
2038
2039
2040/*
2041 * EDAC thread state information
2042 */
2043struct bs_thread_info
2044{
2045 struct task_struct *task;
2046 struct completion *event;
2047 char *name;
2048 void (*run)(void);
2049};
2050
2051static struct bs_thread_info bs_thread;
2052
2053/*
2054 * edac_kernel_thread
2055 * This the kernel thread that processes edac operations
2056 * in a normal thread environment
2057 */
2058static int edac_kernel_thread(void *arg)
2059{
2060 struct bs_thread_info *thread = (struct bs_thread_info *) arg;
2061
2062 /* detach thread */
2063 daemonize(thread->name);
2064
2065 current->exit_signal = SIGCHLD;
2066 allow_signal(SIGKILL);
2067 thread->task = current;
2068
2069 /* indicate to starting task we have started */
2070 complete(thread->event);
2071
2072 /* loop forever, until we are told to stop */
2073 while(thread->run != NULL) {
2074 void (*run)(void);
2075
2076 /* call the function to check the memory controllers */
2077 run = thread->run;
2078 if (run)
2079 run();
2080
2081 if (signal_pending(current))
2082 flush_signals(current);
2083
2084 /* ensure we are interruptible */
2085 set_current_state(TASK_INTERRUPTIBLE);
2086
2087 /* go to sleep for the interval */
2088 schedule_timeout((HZ * poll_msec) / 1000);
2089 try_to_freeze();
2090 }
2091
2092 /* notify waiter that we are exiting */
2093 complete(thread->event);
2094
2095 return 0;
2096}
2097
2098/*
2099 * edac_mc_init
2100 * module initialization entry point
2101 */
2102static int __init edac_mc_init(void)
2103{
2104 int ret;
2105 struct completion event;
2106
2107 printk(KERN_INFO "MC: " __FILE__ " version " EDAC_MC_VERSION "\n");
2108
2109 /*
2110 * Harvest and clear any boot/initialization PCI parity errors
2111 *
2112 * FIXME: This only clears errors logged by devices present at time of
2113 * module initialization. We should also do an initial clear
2114 * of each newly hotplugged device.
2115 */
2116 clear_pci_parity_errors();
2117
2118 /* perform check for first time to harvest boot leftovers */
2119 do_edac_check();
2120
2121 /* Create the MC sysfs entries */
2122 if (edac_sysfs_memctrl_setup()) {
2123 printk(KERN_ERR "EDAC MC: Error initializing sysfs code\n");
2124 return -ENODEV;
2125 }
2126
2127 /* Create the PCI parity sysfs entries */
2128 if (edac_sysfs_pci_setup()) {
2129 edac_sysfs_memctrl_teardown();
2130 printk(KERN_ERR "EDAC PCI: Error initializing sysfs code\n");
2131 return -ENODEV;
2132 }
2133
2134 /* Create our kernel thread */
2135 init_completion(&event);
2136 bs_thread.event = &event;
2137 bs_thread.name = "kedac";
2138 bs_thread.run = do_edac_check;
2139
2140 /* create our kernel thread */
2141 ret = kernel_thread(edac_kernel_thread, &bs_thread, CLONE_KERNEL);
2142 if (ret < 0) {
2143 /* remove the sysfs entries */
2144 edac_sysfs_memctrl_teardown();
2145 edac_sysfs_pci_teardown();
2146 return -ENOMEM;
2147 }
2148
2149 /* wait for our kernel thread to ack that it is up and running */
2150 wait_for_completion(&event);
2151
2152 return 0;
2153}
2154
2155
2156/*
2157 * edac_mc_exit()
2158 * module exit/termination function
2159 */
2160static void __exit edac_mc_exit(void)
2161{
2162 struct completion event;
2163
2164 debugf0("MC: " __FILE__ ": %s()\n", __func__);
2165
2166 init_completion(&event);
2167 bs_thread.event = &event;
2168
2169 /* As soon as ->run is set to NULL, the task could disappear,
2170 * so we need to hold tasklist_lock until we have sent the signal
2171 */
2172 read_lock(&tasklist_lock);
2173 bs_thread.run = NULL;
2174 send_sig(SIGKILL, bs_thread.task, 1);
2175 read_unlock(&tasklist_lock);
2176 wait_for_completion(&event);
2177
2178 /* tear down the sysfs device */
2179 edac_sysfs_memctrl_teardown();
2180 edac_sysfs_pci_teardown();
2181}
2182
2183
2184
2185
2186module_init(edac_mc_init);
2187module_exit(edac_mc_exit);
2188
2189MODULE_LICENSE("GPL");
2190MODULE_AUTHOR("Linux Networx (http://lnxi.com) Thayne Harbaugh et al\n"
2191 "Based on work by Dan Hollis et al");
2192MODULE_DESCRIPTION("Core library routines for MC reporting");
2193
2194module_param(panic_on_ue, int, 0644);
2195MODULE_PARM_DESC(panic_on_ue, "Panic on uncorrected error: 0=off 1=on");
2196module_param(check_pci_parity, int, 0644);
2197MODULE_PARM_DESC(check_pci_parity, "Check for PCI bus parity errors: 0=off 1=on");
2198module_param(panic_on_pci_parity, int, 0644);
2199MODULE_PARM_DESC(panic_on_pci_parity, "Panic on PCI Bus Parity error: 0=off 1=on");
2200module_param(log_ue, int, 0644);
2201MODULE_PARM_DESC(log_ue, "Log uncorrectable error to console: 0=off 1=on");
2202module_param(log_ce, int, 0644);
2203MODULE_PARM_DESC(log_ce, "Log correctable error to console: 0=off 1=on");
2204module_param(poll_msec, int, 0644);
2205MODULE_PARM_DESC(poll_msec, "Polling period in milliseconds");
2206#ifdef CONFIG_EDAC_DEBUG
2207module_param(edac_debug_level, int, 0644);
2208MODULE_PARM_DESC(edac_debug_level, "Debug level");
2209#endif
diff --git a/drivers/edac/edac_mc.h b/drivers/edac/edac_mc.h
new file mode 100644
index 000000000000..75ecf484a43a
--- /dev/null
+++ b/drivers/edac/edac_mc.h
@@ -0,0 +1,448 @@
1/*
2 * MC kernel module
3 * (C) 2003 Linux Networx (http://lnxi.com)
4 * This file may be distributed under the terms of the
5 * GNU General Public License.
6 *
7 * Written by Thayne Harbaugh
8 * Based on work by Dan Hollis <goemon at anime dot net> and others.
9 * http://www.anime.net/~goemon/linux-ecc/
10 *
11 * NMI handling support added by
12 * Dave Peterson <dsp@llnl.gov> <dave_peterson@pobox.com>
13 *
14 * $Id: edac_mc.h,v 1.4.2.10 2005/10/05 00:43:44 dsp_llnl Exp $
15 *
16 */
17
18
19#ifndef _EDAC_MC_H_
20#define _EDAC_MC_H_
21
22
23#include <linux/config.h>
24#include <linux/kernel.h>
25#include <linux/types.h>
26#include <linux/module.h>
27#include <linux/spinlock.h>
28#include <linux/smp.h>
29#include <linux/pci.h>
30#include <linux/time.h>
31#include <linux/nmi.h>
32#include <linux/rcupdate.h>
33#include <linux/completion.h>
34#include <linux/kobject.h>
35
36
37#define EDAC_MC_LABEL_LEN 31
38#define MC_PROC_NAME_MAX_LEN 7
39
40#if PAGE_SHIFT < 20
41#define PAGES_TO_MiB( pages ) ( ( pages ) >> ( 20 - PAGE_SHIFT ) )
42#else /* PAGE_SHIFT >= 20 */
43#define PAGES_TO_MiB( pages ) ( ( pages ) << ( PAGE_SHIFT - 20 ) )
44#endif
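
/*
 * Illustrative sketch only, not part of this patch: what PAGES_TO_MiB()
 * computes in the common PAGE_SHIFT < 20 case. The page shift of 12
 * (4 KiB pages) and the demo_* names are assumptions made for the
 * example; in the kernel PAGE_SHIFT is supplied by the architecture.
 */

```c
#include <assert.h>

#define DEMO_PAGE_SHIFT 12	/* assumed: 4 KiB pages */

/* Convert a page count to whole MiB; a partial MiB is rounded down */
static unsigned long demo_pages_to_mib(unsigned long pages)
{
	/* The macro below is only valid while the shift is non-negative */
	assert(DEMO_PAGE_SHIFT < 20);

	/* 1 MiB = 2^20 bytes, so divide by 2^(20 - page shift) */
	return pages >> (20 - DEMO_PAGE_SHIFT);
}
```

/* With 4 KiB pages, 256 pages make 1 MiB and 262144 pages make 1024 MiB. */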
45
46#ifdef CONFIG_EDAC_DEBUG
47extern int edac_debug_level;
48#define edac_debug_printk(level, fmt, args...) \
49do { if (level <= edac_debug_level) printk(KERN_DEBUG fmt, ##args); } while(0)
50#define debugf0( ... ) edac_debug_printk(0, __VA_ARGS__ )
51#define debugf1( ... ) edac_debug_printk(1, __VA_ARGS__ )
52#define debugf2( ... ) edac_debug_printk(2, __VA_ARGS__ )
53#define debugf3( ... ) edac_debug_printk(3, __VA_ARGS__ )
54#define debugf4( ... ) edac_debug_printk(4, __VA_ARGS__ )
55#else /* !CONFIG_EDAC_DEBUG */
56#define debugf0( ... )
57#define debugf1( ... )
58#define debugf2( ... )
59#define debugf3( ... )
60#define debugf4( ... )
61#endif /* !CONFIG_EDAC_DEBUG */
62
63
64#define bs_xstr(s) bs_str(s)
65#define bs_str(s) #s
66#define BS_MOD_STR bs_xstr(KBUILD_BASENAME)
67
68#define BIT(x) (1 << (x))
69
70#define PCI_VEND_DEV(vend, dev) PCI_VENDOR_ID_ ## vend, PCI_DEVICE_ID_ ## vend ## _ ## dev
71
72/* memory devices */
73enum dev_type {
74 DEV_UNKNOWN = 0,
75 DEV_X1,
76 DEV_X2,
77 DEV_X4,
78 DEV_X8,
79 DEV_X16,
80 DEV_X32, /* Do these parts exist? */
81 DEV_X64 /* Do these parts exist? */
82};
83
84#define DEV_FLAG_UNKNOWN BIT(DEV_UNKNOWN)
85#define DEV_FLAG_X1 BIT(DEV_X1)
86#define DEV_FLAG_X2 BIT(DEV_X2)
87#define DEV_FLAG_X4 BIT(DEV_X4)
88#define DEV_FLAG_X8 BIT(DEV_X8)
89#define DEV_FLAG_X16 BIT(DEV_X16)
90#define DEV_FLAG_X32 BIT(DEV_X32)
91#define DEV_FLAG_X64 BIT(DEV_X64)
92
93/* memory types */
94enum mem_type {
95 MEM_EMPTY = 0, /* Empty csrow */
96 MEM_RESERVED, /* Reserved csrow type */
97 MEM_UNKNOWN, /* Unknown csrow type */
98 MEM_FPM, /* Fast page mode */
99 MEM_EDO, /* Extended data out */
100 MEM_BEDO, /* Burst Extended data out */
101 MEM_SDR, /* Single data rate SDRAM */
102 MEM_RDR, /* Registered single data rate SDRAM */
103 MEM_DDR, /* Double data rate SDRAM */
104 MEM_RDDR, /* Registered Double data rate SDRAM */
105 MEM_RMBS /* Rambus DRAM */
106};
107
108#define MEM_FLAG_EMPTY BIT(MEM_EMPTY)
109#define MEM_FLAG_RESERVED BIT(MEM_RESERVED)
110#define MEM_FLAG_UNKNOWN BIT(MEM_UNKNOWN)
111#define MEM_FLAG_FPM BIT(MEM_FPM)
112#define MEM_FLAG_EDO BIT(MEM_EDO)
113#define MEM_FLAG_BEDO BIT(MEM_BEDO)
114#define MEM_FLAG_SDR BIT(MEM_SDR)
115#define MEM_FLAG_RDR BIT(MEM_RDR)
116#define MEM_FLAG_DDR BIT(MEM_DDR)
117#define MEM_FLAG_RDDR BIT(MEM_RDDR)
118#define MEM_FLAG_RMBS BIT(MEM_RMBS)
119
120
121/* chipset Error Detection and Correction capabilities and mode */
122enum edac_type {
123 EDAC_UNKNOWN = 0, /* Unknown if ECC is available */
124 EDAC_NONE, /* Doesn't support ECC */
125 EDAC_RESERVED, /* Reserved ECC type */
126 EDAC_PARITY, /* Detects parity errors */
127 EDAC_EC, /* Error Checking - no correction */
128 EDAC_SECDED, /* Single bit error correction, Double detection */
129 EDAC_S2ECD2ED, /* Chipkill x2 devices - do these exist? */
130 EDAC_S4ECD4ED, /* Chipkill x4 devices */
131 EDAC_S8ECD8ED, /* Chipkill x8 devices */
132 EDAC_S16ECD16ED, /* Chipkill x16 devices */
133};
134
135#define EDAC_FLAG_UNKNOWN BIT(EDAC_UNKNOWN)
136#define EDAC_FLAG_NONE BIT(EDAC_NONE)
137#define EDAC_FLAG_PARITY BIT(EDAC_PARITY)
138#define EDAC_FLAG_EC BIT(EDAC_EC)
139#define EDAC_FLAG_SECDED BIT(EDAC_SECDED)
140#define EDAC_FLAG_S2ECD2ED BIT(EDAC_S2ECD2ED)
141#define EDAC_FLAG_S4ECD4ED BIT(EDAC_S4ECD4ED)
142#define EDAC_FLAG_S8ECD8ED BIT(EDAC_S8ECD8ED)
143#define EDAC_FLAG_S16ECD16ED BIT(EDAC_S16ECD16ED)
144
145
146/* scrubbing capabilities */
147enum scrub_type {
148 SCRUB_UNKNOWN = 0, /* Unknown if scrubber is available */
149 SCRUB_NONE, /* No scrubber */
150 SCRUB_SW_PROG, /* SW progressive (sequential) scrubbing */
151 SCRUB_SW_SRC, /* Software scrub only errors */
152 SCRUB_SW_PROG_SRC, /* Progressive software scrub from an error */
153 SCRUB_SW_TUNABLE, /* Software scrub frequency is tunable */
154 SCRUB_HW_PROG, /* HW progressive (sequential) scrubbing */
155 SCRUB_HW_SRC, /* Hardware scrub only errors */
156 SCRUB_HW_PROG_SRC, /* Progressive hardware scrub from an error */
157 SCRUB_HW_TUNABLE /* Hardware scrub frequency is tunable */
158};
159
160#define SCRUB_FLAG_SW_PROG BIT(SCRUB_SW_PROG)
161#define SCRUB_FLAG_SW_SRC BIT(SCRUB_SW_SRC)
162#define SCRUB_FLAG_SW_PROG_SRC BIT(SCRUB_SW_PROG_SRC)
163#define SCRUB_FLAG_SW_TUN BIT(SCRUB_SW_TUNABLE)
164#define SCRUB_FLAG_HW_PROG BIT(SCRUB_HW_PROG)
165#define SCRUB_FLAG_HW_SRC BIT(SCRUB_HW_SRC)
166#define SCRUB_FLAG_HW_PROG_SRC BIT(SCRUB_HW_PROG_SRC)
167#define SCRUB_FLAG_HW_TUN BIT(SCRUB_HW_TUNABLE)
168
169enum mci_sysfs_status {
170 MCI_SYSFS_INACTIVE = 0, /* sysfs entries NOT registered */
171 MCI_SYSFS_ACTIVE /* sysfs entries ARE registered */
172};
173
174/* FIXME - should have notify capabilities: NMI, LOG, PROC, etc */
175
176/*
177 * There are several things to be aware of that aren't at all obvious:
178 *
179 *
180 * SOCKETS, SOCKET SETS, BANKS, ROWS, CHIP-SELECT ROWS, CHANNELS, etc..
181 *
182 * These are some of the many terms that are thrown about that don't always
183 * mean what people think they mean (Inconceivable!). In the interest of
184 * creating a common ground for discussion, terms and their definitions
185 * will be established.
186 *
187 * Memory devices: The individual chip on a memory stick. These devices
188 * commonly output 4 and 8 bits each. Grouping several
189 * of these in parallel provides 64 bits which is common
190 * for a memory stick.
191 *
192 * Memory Stick: A printed circuit board that aggregates multiple
193 * memory devices in parallel. This is the atomic
194 * memory component that is purchasable by Joe consumer
195 * and loaded into a memory socket.
196 *
197 * Socket: A physical connector on the motherboard that accepts
198 * a single memory stick.
199 *
200 * Channel: Set of memory devices on a memory stick that must be
201 * grouped in parallel with one or more additional
202 * channels from other memory sticks. This parallel
203 * grouping of the output from multiple channels is
204 * necessary for the smallest granularity of memory access.
205 * Some memory controllers are capable of single channel -
206 * which means that memory sticks can be loaded
207 * individually. Other memory controllers are only
208 * capable of dual channel - which means that memory
209 * sticks must be loaded as pairs (see "socket set").
210 *
211 * Chip-select row: All of the memory devices that are selected together
212 * for a single, minimum grain of memory access.
213 * This selects all of the parallel memory devices across
214 * all of the parallel channels. Common chip-select rows
215 * for single channel are 64 bits, for dual channel 128
216 * bits.
217 *
218 * Single-Ranked stick: A single-ranked stick has 1 chip-select row of memory.
219 * Motherboards commonly drive two chip-select pins to
220 * a memory stick. A single-ranked stick will occupy
221 * only one of those rows. The other will be unused.
222 *
223 * Double-Ranked stick: A double-ranked stick has two chip-select rows which
224 * access different sets of memory devices. The two
225 * rows cannot be accessed concurrently.
226 *
227 * Double-sided stick: DEPRECATED TERM, see Double-Ranked stick.
228 * A double-sided stick has two chip-select rows which
229 * access different sets of memory devices. The two
230 * rows cannot be accessed concurrently. "Double-sided"
231 * is irrespective of the memory devices being mounted
232 * on both sides of the memory stick.
233 *
234 * Socket set: All of the memory sticks that are required for
235 * a single memory access or all of the memory sticks
236 * spanned by a chip-select row. A single socket set
237 * has two chip-select rows and if double-sided sticks
238 * are used these will occupy those chip-select rows.
239 *
240 * Bank: This term is avoided because it is unclear when
241 * needing to distinguish between chip-select rows and
242 * socket sets.
243 *
244 * Controller pages:
245 *
246 * Physical pages:
247 *
248 * Virtual pages:
249 *
250 *
251 * STRUCTURE ORGANIZATION AND CHOICES
252 *
253 *
254 *
255 * PS - I enjoyed writing all that about as much as you enjoyed reading it.
256 */
257
258
259struct channel_info {
260 int chan_idx; /* channel index */
261 u32 ce_count; /* Correctable Errors for this CHANNEL */
262 char label[EDAC_MC_LABEL_LEN + 1]; /* DIMM label on motherboard */
263 struct csrow_info *csrow; /* the parent */
264};
265
266
267struct csrow_info {
268 unsigned long first_page; /* first page number in dimm */
269 unsigned long last_page; /* last page number in dimm */
270 unsigned long page_mask; /* used for interleaving -
271 0UL for non intlv */
272 u32 nr_pages; /* number of pages in csrow */
273 u32 grain; /* granularity of reported error in bytes */
274 int csrow_idx; /* the chip-select row */
275 enum dev_type dtype; /* memory device type */
276 u32 ue_count; /* Uncorrectable Errors for this csrow */
277 u32 ce_count; /* Correctable Errors for this csrow */
278 enum mem_type mtype; /* memory csrow type */
279 enum edac_type edac_mode; /* EDAC mode for this csrow */
280 struct mem_ctl_info *mci; /* the parent */
281
282 struct kobject kobj; /* sysfs kobject for this csrow */
283
284 /* FIXME the number of CHANNELs might need to become dynamic */
285 u32 nr_channels;
286 struct channel_info *channels;
287};
288
289
290struct mem_ctl_info {
291 struct list_head link; /* for global list of mem_ctl_info structs */
292 unsigned long mtype_cap; /* memory types supported by mc */
293 unsigned long edac_ctl_cap; /* Mem controller EDAC capabilities */
294 unsigned long edac_cap; /* configuration capabilities - this is
295 closely related to edac_ctl_cap. The
296 difference is that the controller
297 may be capable of s4ecd4ed which would
298 be listed in edac_ctl_cap, but if
299 channels aren't capable of s4ecd4ed then the
300 edac_cap would not have that capability. */
301 unsigned long scrub_cap; /* chipset scrub capabilities */
302 enum scrub_type scrub_mode; /* current scrub mode */
303
304 enum mci_sysfs_status sysfs_active; /* status of sysfs */
305
306 /* pointer to edac checking routine */
307 void (*edac_check) (struct mem_ctl_info * mci);
308 /*
309 * Remaps memory pages: controller pages to physical pages.
310 * For most MC's, this will be NULL.
311 */
312 /* FIXME - why not send the phys page to begin with? */
313 unsigned long (*ctl_page_to_phys) (struct mem_ctl_info * mci,
314 unsigned long page);
315 int mc_idx;
316 int nr_csrows;
317 struct csrow_info *csrows;
318 /*
319 * FIXME - what about controllers on other busses? - IDs must be
320 * unique. pdev pointer should be sufficiently unique, but
321 * BUS:SLOT.FUNC numbers may not be unique.
322 */
323 struct pci_dev *pdev;
324 const char *mod_name;
325 const char *mod_ver;
326 const char *ctl_name;
327 char proc_name[MC_PROC_NAME_MAX_LEN + 1];
328 void *pvt_info;
329 u32 ue_noinfo_count; /* Uncorrectable Errors w/o info */
330 u32 ce_noinfo_count; /* Correctable Errors w/o info */
331 u32 ue_count; /* Total Uncorrectable Errors for this MC */
332 u32 ce_count; /* Total Correctable Errors for this MC */
333 unsigned long start_time; /* mci load start time (in jiffies) */
334
335 /* this stuff is for safe removal of mc devices from global list while
336 * NMI handlers may be traversing list
337 */
338 struct rcu_head rcu;
339 struct completion complete;
340
341 /* edac sysfs device control */
342 struct kobject edac_mci_kobj;
343};
344
345
346
347/* write all or some bits in a byte-register*/
348static inline void pci_write_bits8(struct pci_dev *pdev, int offset,
349 u8 value, u8 mask)
350{
351 if (mask != 0xff) {
352 u8 buf;
353 pci_read_config_byte(pdev, offset, &buf);
354 value &= mask;
355 buf &= ~mask;
356 value |= buf;
357 }
358 pci_write_config_byte(pdev, offset, value);
359}
360
361
362/* write all or some bits in a word-register*/
363static inline void pci_write_bits16(struct pci_dev *pdev, int offset,
364 u16 value, u16 mask)
365{
366 if (mask != 0xffff) {
367 u16 buf;
368 pci_read_config_word(pdev, offset, &buf);
369 value &= mask;
370 buf &= ~mask;
371 value |= buf;
372 }
373 pci_write_config_word(pdev, offset, value);
374}
375
376
377/* write all or some bits in a dword-register*/
378static inline void pci_write_bits32(struct pci_dev *pdev, int offset,
379 u32 value, u32 mask)
380{
381 if (mask != 0xffffffff) {
382 u32 buf;
383 pci_read_config_dword(pdev, offset, &buf);
384 value &= mask;
385 buf &= ~mask;
386 value |= buf;
387 }
388 pci_write_config_dword(pdev, offset, value);
389}
390
391
392#ifdef CONFIG_EDAC_DEBUG
393void edac_mc_dump_channel(struct channel_info *chan);
394void edac_mc_dump_mci(struct mem_ctl_info *mci);
395void edac_mc_dump_csrow(struct csrow_info *csrow);
396#endif /* CONFIG_EDAC_DEBUG */
397
398extern int edac_mc_add_mc(struct mem_ctl_info *mci);
399extern int edac_mc_del_mc(struct mem_ctl_info *mci);
400
401extern int edac_mc_find_csrow_by_page(struct mem_ctl_info *mci,
402 unsigned long page);
403
404extern struct mem_ctl_info *edac_mc_find_mci_by_pdev(struct pci_dev
405 *pdev);
406
407extern void edac_mc_scrub_block(unsigned long page,
408 unsigned long offset, u32 size);
409
410/*
411 * The no info errors are used when error overflows are reported.
412 * There are a limited number of error logging registers that can
413 * be exhausted. When all registers are exhausted and an additional
414 * error occurs then an error overflow register records that an
415 * error occurred and the type of error, but doesn't have any
416 * further information. The ce/ue versions make for cleaner
417 * reporting logic and function interface - reduces conditional
418 * statement clutter and extra function arguments.
419 */
420extern void edac_mc_handle_ce(struct mem_ctl_info *mci,
421 unsigned long page_frame_number,
422 unsigned long offset_in_page,
423 unsigned long syndrome,
424 int row, int channel, const char *msg);
425
426extern void edac_mc_handle_ce_no_info(struct mem_ctl_info *mci,
427 const char *msg);
428
429extern void edac_mc_handle_ue(struct mem_ctl_info *mci,
430 unsigned long page_frame_number,
431 unsigned long offset_in_page,
432 int row, const char *msg);
433
434extern void edac_mc_handle_ue_no_info(struct mem_ctl_info *mci,
435 const char *msg);
436
437/*
438 * This kmalloc's and initializes all the structures.
439 * Can't be used if all structures don't have the same lifetime.
440 */
441extern struct mem_ctl_info *edac_mc_alloc(unsigned sz_pvt,
442 unsigned nr_csrows, unsigned nr_chans);
443
444/* Free an mc previously allocated by edac_mc_alloc() */
445extern void edac_mc_free(struct mem_ctl_info *mci);
446
447
448#endif /* _EDAC_MC_H_ */
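Note on the pci_write_bits8/16/32 helpers in edac_mc.h: each one reads the register, clears the bits selected by the mask, merges in the new value and writes the result back, skipping the read when the mask covers the whole register. The merge step can be sketched in plain user-space C as below. `merge_bits32` is a hypothetical name used only for illustration; it is not part of the patch, and no PCI config access is performed.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of edac_mc.h: combine the current register
 * contents `old` with `value`, changing only the bits set in `mask`.
 * This mirrors the "value &= mask; buf &= ~mask; value |= buf;" sequence
 * used by the pci_write_bits*() helpers.
 */
static uint32_t merge_bits32(uint32_t old, uint32_t value, uint32_t mask)
{
	return (old & ~mask) | (value & mask);
}
```

With a mask of 0xff only the low byte of the register changes. The full-width mask for a 32-bit register is 0xffffffff; in that case the old contents no longer matter, which is why the read can be skipped.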
diff --git a/drivers/edac/i82860_edac.c b/drivers/edac/i82860_edac.c
index bfb7ae02e379..52596e75f9c2 100644
--- a/drivers/edac/i82860_edac.c
+++ b/drivers/edac/i82860_edac.c
@@ -253,7 +253,7 @@ static struct pci_driver i82860_driver = {
253 .id_table = i82860_pci_tbl, 253 .id_table = i82860_pci_tbl,
254}; 254};
255 255
256int __init i82860_init(void) 256static int __init i82860_init(void)
257{ 257{
258 int pci_rc; 258 int pci_rc;
259 259
diff --git a/drivers/edac/i82875p_edac.c b/drivers/edac/i82875p_edac.c
index 79d14dfbcbdc..009c08fe5d69 100644
--- a/drivers/edac/i82875p_edac.c
+++ b/drivers/edac/i82875p_edac.c
@@ -483,7 +483,7 @@ static struct pci_driver i82875p_driver = {
483}; 483};
484 484
485 485
486int __init i82875p_init(void) 486static int __init i82875p_init(void)
487{ 487{
488 int pci_rc; 488 int pci_rc;
489 489
diff --git a/drivers/edac/r82600_edac.c b/drivers/edac/r82600_edac.c
index b6399a542560..e90892831b90 100644
--- a/drivers/edac/r82600_edac.c
+++ b/drivers/edac/r82600_edac.c
@@ -381,7 +381,7 @@ static struct pci_driver r82600_driver = {
381}; 381};
382 382
383 383
384int __init r82600_init(void) 384static int __init r82600_init(void)
385{ 385{
386 return pci_register_driver(&r82600_driver); 386 return pci_register_driver(&r82600_driver);
387} 387}
diff --git a/include/asm-i386/atomic.h b/include/asm-i386/atomic.h
index e2c00c95a5e1..de649d3aa2d4 100644
--- a/include/asm-i386/atomic.h
+++ b/include/asm-i386/atomic.h
@@ -255,17 +255,5 @@ __asm__ __volatile__(LOCK "orl %0,%1" \
255#define smp_mb__before_atomic_inc() barrier() 255#define smp_mb__before_atomic_inc() barrier()
256#define smp_mb__after_atomic_inc() barrier() 256#define smp_mb__after_atomic_inc() barrier()
257 257
258/* ECC atomic, DMA, SMP and interrupt safe scrub function */
259
260static __inline__ void atomic_scrub(unsigned long *virt_addr, u32 size)
261{
262 u32 i;
263 for (i = 0; i < size / 4; i++, virt_addr++)
264 /* Very carefully read and write to memory atomically
265 * so we are interrupt, DMA and SMP safe.
266 */
267 __asm__ __volatile__("lock; addl $0, %0"::"m"(*virt_addr));
268}
269
270#include <asm-generic/atomic.h> 258#include <asm-generic/atomic.h>
271#endif 259#endif
diff --git a/include/asm-i386/edac.h b/include/asm-i386/edac.h
new file mode 100644
index 000000000000..3e7dd0ab68ce
--- /dev/null
+++ b/include/asm-i386/edac.h
@@ -0,0 +1,18 @@
1#ifndef ASM_EDAC_H
2#define ASM_EDAC_H
3
4/* ECC atomic, DMA, SMP and interrupt safe scrub function */
5
6static __inline__ void atomic_scrub(void *va, u32 size)
7{
8 unsigned long *virt_addr = va;
9 u32 i;
10
11 for (i = 0; i < size / 4; i++, virt_addr++)
12 /* Very carefully read and write to memory atomically
13 * so we are interrupt, DMA and SMP safe.
14 */
15 __asm__ __volatile__("lock; addl $0, %0"::"m"(*virt_addr));
16}
17
18#endif
diff --git a/include/asm-x86_64/atomic.h b/include/asm-x86_64/atomic.h
index 4048508c4f40..4b5cd553e772 100644
--- a/include/asm-x86_64/atomic.h
+++ b/include/asm-x86_64/atomic.h
@@ -426,17 +426,5 @@ __asm__ __volatile__(LOCK "orl %0,%1" \
426#define smp_mb__before_atomic_inc() barrier() 426#define smp_mb__before_atomic_inc() barrier()
427#define smp_mb__after_atomic_inc() barrier() 427#define smp_mb__after_atomic_inc() barrier()
428 428
429/* ECC atomic, DMA, SMP and interrupt safe scrub function */
430
431static __inline__ void atomic_scrub(u32 *virt_addr, u32 size)
432{
433 u32 i;
434 for (i = 0; i < size / 4; i++, virt_addr++)
435 /* Very carefully read and write to memory atomically
436 * so we are interrupt, DMA and SMP safe.
437 */
438 __asm__ __volatile__("lock; addl $0, %0"::"m"(*virt_addr));
439}
440
441#include <asm-generic/atomic.h> 429#include <asm-generic/atomic.h>
442#endif 430#endif
diff --git a/include/asm-x86_64/edac.h b/include/asm-x86_64/edac.h
new file mode 100644
index 000000000000..cad1cd42b4ee
--- /dev/null
+++ b/include/asm-x86_64/edac.h
@@ -0,0 +1,18 @@
1#ifndef ASM_EDAC_H
2#define ASM_EDAC_H
3
4/* ECC atomic, DMA, SMP and interrupt safe scrub function */
5
6static __inline__ void atomic_scrub(void *va, u32 size)
7{
8 unsigned int *virt_addr = va;
9 u32 i;
10
11 for (i = 0; i < size / 4; i++, virt_addr++)
12 /* Very carefully read and write to memory atomically
13 * so we are interrupt, DMA and SMP safe.
14 */
15 __asm__ __volatile__("lock; addl $0, %0"::"m"(*virt_addr));
16}
17
18#endif
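Note on atomic_scrub(): the loop walks the buffer in 4-byte words, so it runs size / 4 times and leaves any trailing bytes untouched, and each step is a read-modify-write that stores back the value it read. A rough user-space analogue using C11 atomics is sketched below. `scrub_sketch` is a hypothetical name, and this is only an illustration of the loop bounds and the value-preserving update: a compiler may lower a fetch-add of zero in ways that do not produce a locked write to memory, so it is not a substitute for the inline assembly in the patch.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical user-space sketch, not kernel code: atomically add zero to
 * each 32-bit word covered by `size` bytes. The contents are unchanged.
 * Returns the number of words touched (size / 4, rounded down).
 */
static size_t scrub_sketch(_Atomic uint32_t *words, size_t size)
{
	size_t i, n = size / 4;

	for (i = 0; i < n; i++)
		atomic_fetch_add(&words[i], 0);

	return n;
}
```

For a 10-byte request the sketch touches two words and ignores the remaining two bytes, matching the size / 4 bound in the original loop.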