author Linus Torvalds <torvalds@linux-foundation.org> 2012-07-30 12:53:50 -0400
committer Linus Torvalds <torvalds@linux-foundation.org> 2012-07-30 12:53:50 -0400
commit 8da8533dfb0929c5ea5d9fdf60ea6d3ffa02127d
tree 1f9fe13e150dae31cf48ac3a88d5003040c2ec98 /Documentation
parent f50f118c4974f7c2208a54f96452165ffb880471
parent c2078e4c9120e7b38b1a02cd9fc6dd4f792110bf

Merge git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac

Pull EDAC patches from Mauro Carvalho Chehab:

 - the second part of the EDAC rework:

     - Add the sysfs nodes that export the real memory layout, instead
       of the fake one (needed to properly represent Intel memory
       controllers since 2002)

     - convert EDAC MC to use "struct device" instead of creating the
       sysfs nodes via the kobj API

 - adds a tracepoint to represent memory errors

 - some cleanup patches

 - some fixes at i5000, i5400 and EDAC core

 - a new EDAC driver for Calxeda.

* git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (33 commits)
  edac i5000, i5400: fix pointer math in i5000_get_mc_regs()
  edac: allow specifying the error count with fake_inject
  edac: add support for Calxeda highbank L2 cache ecc
  edac: add support for Calxeda highbank memory controller
  edac: create top-level debugfs directory
  sb_edac: properly handle error count
  i7core_edac: properly handle error count
  edac: edac_mc_handle_error(): add an error_count parameter
  edac: remove arch-specific parameter for the error handler
  amd64_edac: Don't pass driver name as an error parameter
  edac_mc: check for allocation failure in edac_mc_alloc()
  edac: Increase version to 3.0.0
  edac_mc: Cleanup per-dimm_info debug messages
  edac: Convert debugfX to edac_dbg(X,
  edac: Use more normal debugging macro style
  edac: Don't add __func__ or __FILE__ for debugf[0-9] msgs
  Edac: Add ABI Documentation for the new device nodes
  edac: move documentation ABI to ABI/testing/sysfs-devices-edac
  i7core_edac: change the mem allocation scheme to make Documentation/kobject.txt happy
  edac: change the mem allocation scheme to make Documentation/kobject.txt happy
  ...

Diffstat (limited to 'Documentation')
-rw-r--r--  Documentation/ABI/testing/sysfs-devices-edac                 140
-rw-r--r--  Documentation/devicetree/bindings/arm/calxeda/l2ecc.txt       15
-rw-r--r--  Documentation/devicetree/bindings/arm/calxeda/mem-ctrlr.txt   14
-rw-r--r--  Documentation/edac.txt                                       112
4 files changed, 177 insertions, 104 deletions
diff --git a/Documentation/ABI/testing/sysfs-devices-edac b/Documentation/ABI/testing/sysfs-devices-edac
new file mode 100644
index 00000000000..30ee78aaed7
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-devices-edac
@@ -0,0 +1,140 @@
+What:		/sys/devices/system/edac/mc/mc*/reset_counters
+Date:		January 2006
+Contact:	linux-edac@vger.kernel.org
+Description:	This write-only control file will zero all the statistical
+		counters for UE and CE errors on the given memory controller.
+		Zeroing the counters will also reset the timer indicating how
+		long since the last counter were reset. This is useful for
+		computing errors/time.  Since the counters are always reset
+		at driver initialization time, no module/kernel parameter
+		is available.
+
+What:		/sys/devices/system/edac/mc/mc*/seconds_since_reset
+Date:		January 2006
+Contact:	linux-edac@vger.kernel.org
+Description:	This attribute file displays how many seconds have elapsed
+		since the last counter reset. This can be used with the error
+		counters to measure error rates.
+
+What:		/sys/devices/system/edac/mc/mc*/mc_name
+Date:		January 2006
+Contact:	linux-edac@vger.kernel.org
+Description:	This attribute file displays the type of memory controller
+		that is being utilized.
+
+What:		/sys/devices/system/edac/mc/mc*/size_mb
+Date:		January 2006
+Contact:	linux-edac@vger.kernel.org
+Description:	This attribute file displays, in count of megabytes, of memory
+		that this memory controller manages.
+
+What:		/sys/devices/system/edac/mc/mc*/ue_count
+Date:		January 2006
+Contact:	linux-edac@vger.kernel.org
+Description:	This attribute file displays the total count of uncorrectable
+		errors that have occurred on this memory controller. If
+		panic_on_ue is set, this counter will not have a chance to
+		increment, since EDAC will panic the system
+
+What:		/sys/devices/system/edac/mc/mc*/ue_noinfo_count
+Date:		January 2006
+Contact:	linux-edac@vger.kernel.org
+Description:	This attribute file displays the number of UEs that have
+		occurred on this memory controller with no information as to
+		which DIMM slot is having errors.
+
+What:		/sys/devices/system/edac/mc/mc*/ce_count
+Date:		January 2006
+Contact:	linux-edac@vger.kernel.org
+Description:	This attribute file displays the total count of correctable
+		errors that have occurred on this memory controller. This
+		count is very important to examine. CEs provide early
+		indications that a DIMM is beginning to fail. This count
+		field should be monitored for non-zero values and report
+		such information to the system administrator.
+
+What:		/sys/devices/system/edac/mc/mc*/ce_noinfo_count
+Date:		January 2006
+Contact:	linux-edac@vger.kernel.org
+Description:	This attribute file displays the number of CEs that
+		have occurred on this memory controller wherewith no
+		information as to which DIMM slot is having errors. Memory is
+		handicapped, but operational, yet no information is available
+		to indicate which slot the failing memory is in. This count
+		field should be also be monitored for non-zero values.
+
+What:		/sys/devices/system/edac/mc/mc*/sdram_scrub_rate
+Date:		February 2007
+Contact:	linux-edac@vger.kernel.org
+Description:	Read/Write attribute file that controls memory scrubbing.
+		The scrubbing rate used by the memory controller is set by
+		writing a minimum bandwidth in bytes/sec to the attribute file.
+		The rate will be translated to an internal value that gives at
+		least the specified rate.
+		Reading the file will return the actual scrubbing rate employed.
+		If configuration fails or memory scrubbing is not implemented,
+		the value of the attribute file will be -1.
+
+What:		/sys/devices/system/edac/mc/mc*/max_location
+Date:		April 2012
+Contact:	Mauro Carvalho Chehab <mchehab@redhat.com>
+		linux-edac@vger.kernel.org
+Description:	This attribute file displays the information about the last
+		available memory slot in this memory controller. It is used by
+		userspace tools in order to display the memory filling layout.
+
+What:		/sys/devices/system/edac/mc/mc*/(dimm|rank)*/size
+Date:		April 2012
+Contact:	Mauro Carvalho Chehab <mchehab@redhat.com>
+		linux-edac@vger.kernel.org
+Description:	This attribute file will display the size of dimm or rank.
+		For dimm*/size, this is the size, in MB of the DIMM memory
+		stick. For rank*/size, this is the size, in MB for one rank
+		of the DIMM memory stick. On single rank memories (1R), this
+		is also the total size of the dimm. On dual rank (2R) memories,
+		this is half the size of the total DIMM memories.
+
+What:		/sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_dev_type
+Date:		April 2012
+Contact:	Mauro Carvalho Chehab <mchehab@redhat.com>
+		linux-edac@vger.kernel.org
+Description:	This attribute file will display what type of DRAM device is
+		being utilized on this DIMM (x1, x2, x4, x8, ...).
+
+What:		/sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_edac_mode
+Date:		April 2012
+Contact:	Mauro Carvalho Chehab <mchehab@redhat.com>
+		linux-edac@vger.kernel.org
+Description:	This attribute file will display what type of Error detection
+		and correction is being utilized. For example: S4ECD4ED would
+		mean a Chipkill with x4 DRAM.
+
+What:		/sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_label
+Date:		April 2012
+Contact:	Mauro Carvalho Chehab <mchehab@redhat.com>
+		linux-edac@vger.kernel.org
+Description:	This control file allows this DIMM to have a label assigned
+		to it. With this label in the module, when errors occur
+		the output can provide the DIMM label in the system log.
+		This becomes vital for panic events to isolate the
+		cause of the UE event.
+		DIMM Labels must be assigned after booting, with information
+		that correctly identifies the physical slot with its
+		silk screen label. This information is currently very
+		motherboard specific and determination of this information
+		must occur in userland at this time.
+
+What:		/sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_location
+Date:		April 2012
+Contact:	Mauro Carvalho Chehab <mchehab@redhat.com>
+		linux-edac@vger.kernel.org
+Description:	This attribute file will display the location (csrow/channel,
+		branch/channel/slot or channel/slot) of the dimm or rank.
+
+What:		/sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_mem_type
+Date:		April 2012
+Contact:	Mauro Carvalho Chehab <mchehab@redhat.com>
+		linux-edac@vger.kernel.org
+Description:	This attribute file will display what type of memory is
+		currently on this csrow. Normally, either buffered or
+		unbuffered memory (for example, Unbuffered-DDR3).
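
The ABI entries above say that seconds_since_reset "can be used with the error counters to measure error rates". The following is a minimal userspace sketch of that calculation and is not part of the patch. The function names (read_int, error_rates) and the per-hour unit are illustrative assumptions; only the attribute file names and the /sys/devices/system/edac/mc location come from the documentation above.

```python
from pathlib import Path

# Location taken from the "What:" entries in the ABI file above.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_int(path):
    """Read a sysfs attribute that holds a single decimal integer."""
    return int(Path(path).read_text().strip())


def error_rates(mc_dir):
    """Return CE/UE counts and errors per hour for one mcX directory.

    The per-hour rates are None when no time has elapsed since the
    last counter reset, to avoid dividing by zero.
    """
    mc_dir = Path(mc_dir)
    seconds = read_int(mc_dir / "seconds_since_reset")
    counts = {name: read_int(mc_dir / name) for name in ("ce_count", "ue_count")}
    hours = seconds / 3600
    result = dict(counts, seconds_since_reset=seconds)
    for name, count in counts.items():
        key = name.replace("_count", "_per_hour")
        result[key] = count / hours if seconds > 0 else None
    return result


if __name__ == "__main__":
    # Prints nothing on systems where no EDAC driver is loaded.
    for mc in sorted(EDAC_MC_ROOT.glob("mc[0-9]*")):
        print(mc.name, error_rates(mc))
```

Taking the directory as a parameter keeps the calculation testable against a scratch directory that mimics the sysfs layout.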
diff --git a/Documentation/devicetree/bindings/arm/calxeda/l2ecc.txt b/Documentation/devicetree/bindings/arm/calxeda/l2ecc.txt
new file mode 100644
index 00000000000..94e642a33db
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/calxeda/l2ecc.txt
@@ -0,0 +1,15 @@
+Calxeda Highbank L2 cache ECC
+
+Properties:
+- compatible : Should be "calxeda,hb-sregs-l2-ecc"
+- reg : Address and size for ECC error interrupt clear registers.
+- interrupts : Should be single bit error interrupt, then double bit error
+	interrupt.
+
+Example:
+
+	sregs@fff3c200 {
+		compatible = "calxeda,hb-sregs-l2-ecc";
+		reg = <0xfff3c200 0x100>;
+		interrupts = <0 71 4 0 72 4>;
+	};
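
The binding above orders the "interrupts" property as the single bit error interrupt followed by the double bit error interrupt. The sketch below is not part of the patch. It shows how the flat cell list in the example splits into those two entries, assuming three cells per interrupt specifier; the binding text does not state that size, so it is an assumption here, and split_interrupts is a hypothetical helper name.

```python
def split_interrupts(cells, cells_per_interrupt=3):
    """Group a flat devicetree 'interrupts' cell list into specifiers.

    cells_per_interrupt=3 is an assumption; the real value comes from the
    interrupt parent's #interrupt-cells property.
    """
    if cells_per_interrupt <= 0 or len(cells) % cells_per_interrupt != 0:
        raise ValueError("cell count is not a multiple of the specifier size")
    return [
        tuple(cells[i:i + cells_per_interrupt])
        for i in range(0, len(cells), cells_per_interrupt)
    ]


# Cells copied from the example node above: <0 71 4 0 72 4>.
single_bit, double_bit = split_interrupts([0, 71, 4, 0, 72, 4])
print("single bit error interrupt:", single_bit)
print("double bit error interrupt:", double_bit)
```

Under that assumption the first specifier carries interrupt number 71 and the second 72, matching the order the binding requires.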
diff --git a/Documentation/devicetree/bindings/arm/calxeda/mem-ctrlr.txt b/Documentation/devicetree/bindings/arm/calxeda/mem-ctrlr.txt
new file mode 100644
index 00000000000..f770ac0893d
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/calxeda/mem-ctrlr.txt
@@ -0,0 +1,14 @@
+Calxeda DDR memory controller
+
+Properties:
+- compatible : Should be "calxeda,hb-ddr-ctrl"
+- reg : Address and size for DDR controller registers.
+- interrupts : Interrupt for DDR controller.
+
+Example:
+
+	memory-controller@fff00000 {
+		compatible = "calxeda,hb-ddr-ctrl";
+		reg = <0xfff00000 0x1000>;
+		interrupts = <0 91 4>;
+	};
diff --git a/Documentation/edac.txt b/Documentation/edac.txt
index 03df2b02033..56c7e936430 100644
--- a/Documentation/edac.txt
+++ b/Documentation/edac.txt
@@ -232,116 +232,20 @@ EDAC control and attribute files.
 
 
 In 'mcX' directories are EDAC control and attribute files for
-this 'X' instance of the memory controllers:
-
-
-Counter reset control file:
-
-	'reset_counters'
-
-	This write-only control file will zero all the statistical counters
-	for UE and CE errors.  Zeroing the counters will also reset the timer
-	indicating how long since the last counter zero.  This is useful
-	for computing errors/time.  Since the counters are always reset at
-	driver initialization time, no module/kernel parameter is available.
-
-	RUN TIME: echo "anything" >/sys/devices/system/edac/mc/mc0/counter_reset
-
-		This resets the counters on memory controller 0
-
-
-Seconds since last counter reset control file:
-
-	'seconds_since_reset'
-
-	This attribute file displays how many seconds have elapsed since the
-	last counter reset. This can be used with the error counters to
-	measure error rates.
-
-
-
-Memory Controller name attribute file:
-
-	'mc_name'
-
-	This attribute file displays the type of memory controller
-	that is being utilized.
-
-
-Total memory managed by this memory controller attribute file:
-
-	'size_mb'
-
-	This attribute file displays, in count of megabytes, of memory
-	that this instance of memory controller manages.
-
-
-Total Uncorrectable Errors count attribute file:
-
-	'ue_count'
-
-	This attribute file displays the total count of uncorrectable
-	errors that have occurred on this memory controller. If panic_on_ue
-	is set this counter will not have a chance to increment,
-	since EDAC will panic the system.
-
-
-Total UE count that had no information attribute fileY:
-
-	'ue_noinfo_count'
-
-	This attribute file displays the number of UEs that have occurred
-	with no information as to which DIMM slot is having errors.
-
-
-Total Correctable Errors count attribute file:
-
-	'ce_count'
-
-	This attribute file displays the total count of correctable
-	errors that have occurred on this memory controller. This
-	count is very important to examine. CEs provide early
-	indications that a DIMM is beginning to fail. This count
-	field should be monitored for non-zero values and report
-	such information to the system administrator.
-
-
-Total Correctable Errors count attribute file:
-
-	'ce_noinfo_count'
-
-	This attribute file displays the number of CEs that
-	have occurred wherewith no information as to which DIMM slot
-	is having errors. Memory is handicapped, but operational,
-	yet no information is available to indicate which slot
-	the failing memory is in. This count field should be also
-	be monitored for non-zero values.
-
-Device Symlink:
-
-	'device'
-
-	Symlink to the memory controller device.
-
-Sdram memory scrubbing rate:
-
-	'sdram_scrub_rate'
-
-	Read/Write attribute file that controls memory scrubbing. The scrubbing
-	rate is set by writing a minimum bandwidth in bytes/sec to the attribute
-	file. The rate will be translated to an internal value that gives at
-	least the specified rate.
-
-	Reading the file will return the actual scrubbing rate employed.
-
-	If configuration fails or memory scrubbing is not implemented, accessing
-	that attribute will fail.
+this 'X' instance of the memory controllers.
 
+For a description of the sysfs API, please see:
+	Documentation/ABI/testing/sysfs/devices-edac
 
 
 ============================================================================
 'csrowX' DIRECTORIES
 
+When CONFIG_EDAC_LEGACY_SYSFS is enabled, the sysfs will contain the
+csrowX directories. As this API doesn't work properly for Rambus, FB-DIMMs
+and modern Intel Memory Controllers, this is being deprecated in favor
+of dimmX directories.
+
 In the 'csrowX' directories are EDAC control and attribute files for
 this 'X' instance of csrow:
 