author | Linus Torvalds <torvalds@linux-foundation.org> | 2012-07-30 12:53:50 -0400 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2012-07-30 12:53:50 -0400 |
commit | 8da8533dfb0929c5ea5d9fdf60ea6d3ffa02127d (patch) | |
tree | 1f9fe13e150dae31cf48ac3a88d5003040c2ec98 /Documentation | |
parent | f50f118c4974f7c2208a54f96452165ffb880471 (diff) | |
parent | c2078e4c9120e7b38b1a02cd9fc6dd4f792110bf (diff) |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac
Pull EDAC patches from Mauro Carvalho Chehab:
- the second part of the EDAC rework:
- Add the sysfs nodes that exports the real memory layout, instead
of the fake one (needed to properly represent Intel memory
controllers since 2002)
- convert EDAC MC to use "struct device" instead of creating the
sysfs nodes via the kobj API
- adds a tracepoint to represent memory errors
- some cleanup patches
- some fixes at i5000, i5400 and EDAC core
- a new EDAC driver for Calxeda.
* git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-edac: (33 commits)
edac i5000, i5400: fix pointer math in i5000_get_mc_regs()
edac: allow specifying the error count with fake_inject
edac: add support for Calxeda highbank L2 cache ecc
edac: add support for Calxeda highbank memory controller
edac: create top-level debugfs directory
sb_edac: properly handle error count
i7core_edac: properly handle error count
edac: edac_mc_handle_error(): add an error_count parameter
edac: remove arch-specific parameter for the error handler
amd64_edac: Don't pass driver name as an error parameter
edac_mc: check for allocation failure in edac_mc_alloc()
edac: Increase version to 3.0.0
edac_mc: Cleanup per-dimm_info debug messages
edac: Convert debugfX to edac_dbg(X,
edac: Use more normal debugging macro style
edac: Don't add __func__ or __FILE__ for debugf[0-9] msgs
Edac: Add ABI Documentation for the new device nodes
edac: move documentation ABI to ABI/testing/sysfs-devices-edac
i7core_edac: change the mem allocation scheme to make Documentation/kobject.txt happy
edac: change the mem allocation scheme to make Documentation/kobject.txt happy
...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/ABI/testing/sysfs-devices-edac | 140 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/arm/calxeda/l2ecc.txt | 15 | ||||
-rw-r--r-- | Documentation/devicetree/bindings/arm/calxeda/mem-ctrlr.txt | 14 | ||||
-rw-r--r-- | Documentation/edac.txt | 112 |
4 files changed, 177 insertions, 104 deletions
diff --git a/Documentation/ABI/testing/sysfs-devices-edac b/Documentation/ABI/testing/sysfs-devices-edac
new file mode 100644
index 00000000000..30ee78aaed7
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-devices-edac
@@ -0,0 +1,140 @@
1 | What: /sys/devices/system/edac/mc/mc*/reset_counters | ||
2 | Date: January 2006 | ||
3 | Contact: linux-edac@vger.kernel.org | ||
4 | Description: This write-only control file will zero all the statistical | ||
5 | counters for UE and CE errors on the given memory controller. | ||
6 | Zeroing the counters will also reset the timer indicating how | ||
7 | long since the last counter were reset. This is useful for | ||
8 | computing errors/time. Since the counters are always reset | ||
9 | at driver initialization time, no module/kernel parameter | ||
10 | is available. | ||
11 | |||
12 | What: /sys/devices/system/edac/mc/mc*/seconds_since_reset | ||
13 | Date: January 2006 | ||
14 | Contact: linux-edac@vger.kernel.org | ||
15 | Description: This attribute file displays how many seconds have elapsed | ||
16 | since the last counter reset. This can be used with the error | ||
17 | counters to measure error rates. | ||
18 | |||
19 | What: /sys/devices/system/edac/mc/mc*/mc_name | ||
20 | Date: January 2006 | ||
21 | Contact: linux-edac@vger.kernel.org | ||
22 | Description: This attribute file displays the type of memory controller | ||
23 | that is being utilized. | ||
24 | |||
25 | What: /sys/devices/system/edac/mc/mc*/size_mb | ||
26 | Date: January 2006 | ||
27 | Contact: linux-edac@vger.kernel.org | ||
28 | Description: This attribute file displays, in count of megabytes, of memory | ||
29 | that this memory controller manages. | ||
30 | |||
31 | What: /sys/devices/system/edac/mc/mc*/ue_count | ||
32 | Date: January 2006 | ||
33 | Contact: linux-edac@vger.kernel.org | ||
34 | Description: This attribute file displays the total count of uncorrectable | ||
35 | errors that have occurred on this memory controller. If | ||
36 | panic_on_ue is set, this counter will not have a chance to | ||
37 | increment, since EDAC will panic the system | ||
38 | |||
39 | What: /sys/devices/system/edac/mc/mc*/ue_noinfo_count | ||
40 | Date: January 2006 | ||
41 | Contact: linux-edac@vger.kernel.org | ||
42 | Description: This attribute file displays the number of UEs that have | ||
43 | occurred on this memory controller with no information as to | ||
44 | which DIMM slot is having errors. | ||
45 | |||
46 | What: /sys/devices/system/edac/mc/mc*/ce_count | ||
47 | Date: January 2006 | ||
48 | Contact: linux-edac@vger.kernel.org | ||
49 | Description: This attribute file displays the total count of correctable | ||
50 | errors that have occurred on this memory controller. This | ||
51 | count is very important to examine. CEs provide early | ||
52 | indications that a DIMM is beginning to fail. This count | ||
53 | field should be monitored for non-zero values and report | ||
54 | such information to the system administrator. | ||
55 | |||
56 | What: /sys/devices/system/edac/mc/mc*/ce_noinfo_count | ||
57 | Date: January 2006 | ||
58 | Contact: linux-edac@vger.kernel.org | ||
59 | Description: This attribute file displays the number of CEs that | ||
60 | have occurred on this memory controller wherewith no | ||
61 | information as to which DIMM slot is having errors. Memory is | ||
62 | handicapped, but operational, yet no information is available | ||
63 | to indicate which slot the failing memory is in. This count | ||
64 | field should be also be monitored for non-zero values. | ||
65 | |||
66 | What: /sys/devices/system/edac/mc/mc*/sdram_scrub_rate | ||
67 | Date: February 2007 | ||
68 | Contact: linux-edac@vger.kernel.org | ||
69 | Description: Read/Write attribute file that controls memory scrubbing. | ||
70 | The scrubbing rate used by the memory controller is set by | ||
71 | writing a minimum bandwidth in bytes/sec to the attribute file. | ||
72 | The rate will be translated to an internal value that gives at | ||
73 | least the specified rate. | ||
74 | Reading the file will return the actual scrubbing rate employed. | ||
75 | If configuration fails or memory scrubbing is not implemented, | ||
76 | the value of the attribute file will be -1. | ||
77 | |||
78 | What: /sys/devices/system/edac/mc/mc*/max_location | ||
79 | Date: April 2012 | ||
80 | Contact: Mauro Carvalho Chehab <mchehab@redhat.com> | ||
81 | linux-edac@vger.kernel.org | ||
82 | Description: This attribute file displays the information about the last | ||
83 | available memory slot in this memory controller. It is used by | ||
84 | userspace tools in order to display the memory filling layout. | ||
85 | |||
86 | What: /sys/devices/system/edac/mc/mc*/(dimm|rank)*/size | ||
87 | Date: April 2012 | ||
88 | Contact: Mauro Carvalho Chehab <mchehab@redhat.com> | ||
89 | linux-edac@vger.kernel.org | ||
90 | Description: This attribute file will display the size of dimm or rank. | ||
91 | For dimm*/size, this is the size, in MB of the DIMM memory | ||
92 | stick. For rank*/size, this is the size, in MB for one rank | ||
93 | of the DIMM memory stick. On single rank memories (1R), this | ||
94 | is also the total size of the dimm. On dual rank (2R) memories, | ||
95 | this is half the size of the total DIMM memories. | ||
96 | |||
97 | What: /sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_dev_type | ||
98 | Date: April 2012 | ||
99 | Contact: Mauro Carvalho Chehab <mchehab@redhat.com> | ||
100 | linux-edac@vger.kernel.org | ||
101 | Description: This attribute file will display what type of DRAM device is | ||
102 | being utilized on this DIMM (x1, x2, x4, x8, ...). | ||
103 | |||
104 | What: /sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_edac_mode | ||
105 | Date: April 2012 | ||
106 | Contact: Mauro Carvalho Chehab <mchehab@redhat.com> | ||
107 | linux-edac@vger.kernel.org | ||
108 | Description: This attribute file will display what type of Error detection | ||
109 | and correction is being utilized. For example: S4ECD4ED would | ||
110 | mean a Chipkill with x4 DRAM. | ||
111 | |||
112 | What: /sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_label | ||
113 | Date: April 2012 | ||
114 | Contact: Mauro Carvalho Chehab <mchehab@redhat.com> | ||
115 | linux-edac@vger.kernel.org | ||
116 | Description: This control file allows this DIMM to have a label assigned | ||
117 | to it. With this label in the module, when errors occur | ||
118 | the output can provide the DIMM label in the system log. | ||
119 | This becomes vital for panic events to isolate the | ||
120 | cause of the UE event. | ||
121 | DIMM Labels must be assigned after booting, with information | ||
122 | that correctly identifies the physical slot with its | ||
123 | silk screen label. This information is currently very | ||
124 | motherboard specific and determination of this information | ||
125 | must occur in userland at this time. | ||
126 | |||
127 | What: /sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_location | ||
128 | Date: April 2012 | ||
129 | Contact: Mauro Carvalho Chehab <mchehab@redhat.com> | ||
130 | linux-edac@vger.kernel.org | ||
131 | Description: This attribute file will display the location (csrow/channel, | ||
132 | branch/channel/slot or channel/slot) of the dimm or rank. | ||
133 | |||
134 | What: /sys/devices/system/edac/mc/mc*/(dimm|rank)*/dimm_mem_type | ||
135 | Date: April 2012 | ||
136 | Contact: Mauro Carvalho Chehab <mchehab@redhat.com> | ||
137 | linux-edac@vger.kernel.org | ||
138 | Description: This attribute file will display what type of memory is | ||
139 | currently on this csrow. Normally, either buffered or | ||
140 | unbuffered memory (for example, Unbuffered-DDR3). | ||
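The ABI entries above say that seconds_since_reset "can be used with the error counters to measure error rates". A minimal sketch of that calculation follows, assuming the attribute files hold plain decimal integers as described. The helper names read_counter and error_rates are hypothetical and not part of EDAC or of any userspace tool.

```python
from pathlib import Path


def read_counter(mc_dir, name):
    """Return the integer stored in one attribute file of an mcX directory."""
    return int((Path(mc_dir) / name).read_text().strip())


def error_rates(mc_dir):
    """Compute CE and UE rates (errors per hour) since the last counter reset.

    mc_dir is an mcX directory, for example /sys/devices/system/edac/mc/mc0.
    The rates are None when no time has elapsed since the reset.
    """
    seconds = read_counter(mc_dir, "seconds_since_reset")
    ce = read_counter(mc_dir, "ce_count")
    ue = read_counter(mc_dir, "ue_count")

    result = {"ce_count": ce, "ue_count": ue,
              "ce_per_hour": None, "ue_per_hour": None}
    if seconds > 0:
        hours = seconds / 3600.0
        result["ce_per_hour"] = ce / hours
        result["ue_per_hour"] = ue / hours
    return result
```

A monitoring script could call error_rates() on each mc* directory under /sys/devices/system/edac/mc and alert on a non-zero ce_count, which is the use the ce_count description recommends.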
diff --git a/Documentation/devicetree/bindings/arm/calxeda/l2ecc.txt b/Documentation/devicetree/bindings/arm/calxeda/l2ecc.txt
new file mode 100644
index 00000000000..94e642a33db
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/calxeda/l2ecc.txt
@@ -0,0 +1,15 @@
1 | Calxeda Highbank L2 cache ECC | ||
2 | |||
3 | Properties: | ||
4 | - compatible : Should be "calxeda,hb-sregs-l2-ecc" | ||
5 | - reg : Address and size for ECC error interrupt clear registers. | ||
6 | - interrupts : Should be single bit error interrupt, then double bit error | ||
7 | interrupt. | ||
8 | |||
9 | Example: | ||
10 | |||
11 | sregs@fff3c200 { | ||
12 | compatible = "calxeda,hb-sregs-l2-ecc"; | ||
13 | reg = <0xfff3c200 0x100>; | ||
14 | interrupts = <0 71 4 0 72 4>; | ||
15 | }; | ||
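The l2ecc binding above requires the single bit error interrupt to be listed before the double bit error interrupt, and its example carries six cells. The sketch below shows how a consumer might split that list by position. It assumes three cells per interrupt specifier, which the binding text does not state; the assumption is inferred from the six-cell example. The function names are hypothetical.

```python
# Order required by the binding: single bit error first, then double bit error.
L2ECC_ORDER = ("single-bit error", "double-bit error")


def split_interrupts(cells, cells_per_interrupt=3):
    """Group a flat 'interrupts' cell list into one tuple per interrupt.

    cells_per_interrupt=3 is an assumption, not something the binding states.
    """
    if len(cells) % cells_per_interrupt != 0:
        raise ValueError("cell count is not a multiple of the specifier size")
    return [tuple(cells[i:i + cells_per_interrupt])
            for i in range(0, len(cells), cells_per_interrupt)]


def label_l2ecc_interrupts(cells):
    """Map each specifier to its meaning using the order the binding requires."""
    specs = split_interrupts(cells)
    if len(specs) != len(L2ECC_ORDER):
        raise ValueError("expected exactly two interrupt specifiers")
    return dict(zip(L2ECC_ORDER, specs))
```

Applied to the example node, label_l2ecc_interrupts([0, 71, 4, 0, 72, 4]) pairs the first three cells with the single bit error interrupt and the last three with the double bit error interrupt.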
diff --git a/Documentation/devicetree/bindings/arm/calxeda/mem-ctrlr.txt b/Documentation/devicetree/bindings/arm/calxeda/mem-ctrlr.txt
new file mode 100644
index 00000000000..f770ac0893d
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/calxeda/mem-ctrlr.txt
@@ -0,0 +1,14 @@
1 | Calxeda DDR memory controller | ||
2 | |||
3 | Properties: | ||
4 | - compatible : Should be "calxeda,hb-ddr-ctrl" | ||
5 | - reg : Address and size for DDR controller registers. | ||
6 | - interrupts : Interrupt for DDR controller. | ||
7 | |||
8 | Example: | ||
9 | |||
10 | memory-controller@fff00000 { | ||
11 | compatible = "calxeda,hb-ddr-ctrl"; | ||
12 | reg = <0xfff00000 0x1000>; | ||
13 | interrupts = <0 91 4>; | ||
14 | }; | ||
diff --git a/Documentation/edac.txt b/Documentation/edac.txt
index 03df2b02033..56c7e936430 100644
--- a/Documentation/edac.txt
+++ b/Documentation/edac.txt
@@ -232,116 +232,20 @@ EDAC control and attribute files.
232 | 232 | ||
233 | 233 | ||
234 | In 'mcX' directories are EDAC control and attribute files for | 234 | In 'mcX' directories are EDAC control and attribute files for |
235 | this 'X' instance of the memory controllers: | 235 | this 'X' instance of the memory controllers. |
236 | |||
237 | |||
238 | Counter reset control file: | ||
239 | |||
240 | 'reset_counters' | ||
241 | |||
242 | This write-only control file will zero all the statistical counters | ||
243 | for UE and CE errors. Zeroing the counters will also reset the timer | ||
244 | indicating how long since the last counter zero. This is useful | ||
245 | for computing errors/time. Since the counters are always reset at | ||
246 | driver initialization time, no module/kernel parameter is available. | ||
247 | |||
248 | RUN TIME: echo "anything" >/sys/devices/system/edac/mc/mc0/counter_reset | ||
249 | |||
250 | This resets the counters on memory controller 0 | ||
251 | |||
252 | |||
253 | Seconds since last counter reset control file: | ||
254 | |||
255 | 'seconds_since_reset' | ||
256 | |||
257 | This attribute file displays how many seconds have elapsed since the | ||
258 | last counter reset. This can be used with the error counters to | ||
259 | measure error rates. | ||
260 | |||
261 | |||
262 | |||
263 | Memory Controller name attribute file: | ||
264 | |||
265 | 'mc_name' | ||
266 | |||
267 | This attribute file displays the type of memory controller | ||
268 | that is being utilized. | ||
269 | |||
270 | |||
271 | Total memory managed by this memory controller attribute file: | ||
272 | |||
273 | 'size_mb' | ||
274 | |||
275 | This attribute file displays, in count of megabytes, of memory | ||
276 | that this instance of memory controller manages. | ||
277 | |||
278 | |||
279 | Total Uncorrectable Errors count attribute file: | ||
280 | |||
281 | 'ue_count' | ||
282 | |||
283 | This attribute file displays the total count of uncorrectable | ||
284 | errors that have occurred on this memory controller. If panic_on_ue | ||
285 | is set this counter will not have a chance to increment, | ||
286 | since EDAC will panic the system. | ||
287 | |||
288 | |||
289 | Total UE count that had no information attribute fileY: | ||
290 | |||
291 | 'ue_noinfo_count' | ||
292 | |||
293 | This attribute file displays the number of UEs that have occurred | ||
294 | with no information as to which DIMM slot is having errors. | ||
295 | |||
296 | |||
297 | Total Correctable Errors count attribute file: | ||
298 | |||
299 | 'ce_count' | ||
300 | |||
301 | This attribute file displays the total count of correctable | ||
302 | errors that have occurred on this memory controller. This | ||
303 | count is very important to examine. CEs provide early | ||
304 | indications that a DIMM is beginning to fail. This count | ||
305 | field should be monitored for non-zero values and report | ||
306 | such information to the system administrator. | ||
307 | |||
308 | |||
309 | Total Correctable Errors count attribute file: | ||
310 | |||
311 | 'ce_noinfo_count' | ||
312 | |||
313 | This attribute file displays the number of CEs that | ||
314 | have occurred wherewith no information as to which DIMM slot | ||
315 | is having errors. Memory is handicapped, but operational, | ||
316 | yet no information is available to indicate which slot | ||
317 | the failing memory is in. This count field should be also | ||
318 | be monitored for non-zero values. | ||
319 | |||
320 | Device Symlink: | ||
321 | |||
322 | 'device' | ||
323 | |||
324 | Symlink to the memory controller device. | ||
325 | |||
326 | Sdram memory scrubbing rate: | ||
327 | |||
328 | 'sdram_scrub_rate' | ||
329 | |||
330 | Read/Write attribute file that controls memory scrubbing. The scrubbing | ||
331 | rate is set by writing a minimum bandwidth in bytes/sec to the attribute | ||
332 | file. The rate will be translated to an internal value that gives at | ||
333 | least the specified rate. | ||
334 | |||
335 | Reading the file will return the actual scrubbing rate employed. | ||
336 | |||
337 | If configuration fails or memory scrubbing is not implemented, accessing | ||
338 | that attribute will fail. | ||
339 | 236 | ||
237 | For a description of the sysfs API, please see: | ||
238 | Documentation/ABI/testing/sysfs/devices-edac | ||
340 | 239 | ||
341 | 240 | ||
342 | ============================================================================ | 241 | ============================================================================ |
343 | 'csrowX' DIRECTORIES | 242 | 'csrowX' DIRECTORIES |
344 | 243 | ||
244 | When CONFIG_EDAC_LEGACY_SYSFS is enabled, the sysfs will contain the | ||
245 | csrowX directories. As this API doesn't work properly for Rambus, FB-DIMMs | ||
246 | and modern Intel Memory Controllers, this is being deprecated in favor | ||
247 | of dimmX directories. | ||
248 | |||
345 | In the 'csrowX' directories are EDAC control and attribute files for | 249 | In the 'csrowX' directories are EDAC control and attribute files for |
346 | this 'X' instance of csrow: | 250 | this 'X' instance of csrow: |
347 | 251 | ||