author | Changbin Du <changbin.du@gmail.com> | 2019-05-14 10:47:30 -0400 |
---|---|---|
committer | Bjorn Helgaas <bhelgaas@google.com> | 2019-05-30 18:54:33 -0400 |
commit | 4e37f055a92e4a813b29fb196a05a6a826abb790 (patch) | |
tree | e8e80c425bf65ba5cd6ecd485cd8a0af70c7f2ca | |
parent | 8a01fa64348aaaf54b3eef9728bfe2654e7bdd88 (diff) |
Documentation: PCI: convert pcieaer-howto.txt to reST
Convert plain text documentation to reStructuredText format and add it to
Sphinx TOC tree. No essential content change.
Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
-rw-r--r-- | Documentation/PCI/index.rst | 1 | ||||
-rw-r--r-- | Documentation/PCI/pcieaer-howto.rst (renamed from Documentation/PCI/pcieaer-howto.txt) | 156 |
2 files changed, 101 insertions, 56 deletions
diff --git a/Documentation/PCI/index.rst b/Documentation/PCI/index.rst
index 92e62d0fc9e6..f54b65b1ca5f 100644
--- a/Documentation/PCI/index.rst
+++ b/Documentation/PCI/index.rst
@@ -14,3 +14,4 @@ Linux PCI Bus Subsystem | |||
14 | msi-howto | 14 | msi-howto |
15 | acpi-info | 15 | acpi-info |
16 | pci-error-recovery | 16 | pci-error-recovery |
17 | pcieaer-howto | ||
diff --git a/Documentation/PCI/pcieaer-howto.txt b/Documentation/PCI/pcieaer-howto.rst
index 48ce7903e3c6..18bdefaafd1a 100644
--- a/Documentation/PCI/pcieaer-howto.txt
+++ b/Documentation/PCI/pcieaer-howto.rst
@@ -1,21 +1,29 @@ | |||
1 | The PCI Express Advanced Error Reporting Driver Guide HOWTO | 1 | .. SPDX-License-Identifier: GPL-2.0 |
2 | T. Long Nguyen <tom.l.nguyen@intel.com> | 2 | .. include:: <isonum.txt> |
3 | Yanmin Zhang <yanmin.zhang@intel.com> | ||
4 | 07/29/2006 | ||
5 | 3 | ||
4 | =========================================================== | ||
5 | The PCI Express Advanced Error Reporting Driver Guide HOWTO | ||
6 | =========================================================== | ||
6 | 7 | ||
7 | 1. Overview | 8 | :Authors: - T. Long Nguyen <tom.l.nguyen@intel.com> |
9 | - Yanmin Zhang <yanmin.zhang@intel.com> | ||
8 | 10 | ||
9 | 1.1 About this guide | 11 | :Copyright: |copy| 2006 Intel Corporation |
12 | |||
13 | Overview | ||
14 | =========== | ||
15 | |||
16 | About this guide | ||
17 | ---------------- | ||
10 | 18 | ||
11 | This guide describes the basics of the PCI Express Advanced Error | 19 | This guide describes the basics of the PCI Express Advanced Error |
12 | Reporting (AER) driver and provides information on how to use it, as | 20 | Reporting (AER) driver and provides information on how to use it, as |
13 | well as how to enable the drivers of endpoint devices to conform with | 21 | well as how to enable the drivers of endpoint devices to conform with |
14 | PCI Express AER driver. | 22 | PCI Express AER driver. |
15 | 23 | ||
16 | 1.2 Copyright (C) Intel Corporation 2006. | ||
17 | 24 | ||
18 | 1.3 What is the PCI Express AER Driver? | 25 | What is the PCI Express AER Driver? |
26 | ----------------------------------- | ||
19 | 27 | ||
20 | PCI Express error signaling can occur on the PCI Express link itself | 28 | PCI Express error signaling can occur on the PCI Express link itself |
21 | or on behalf of transactions initiated on the link. PCI Express | 29 | or on behalf of transactions initiated on the link. PCI Express |
@@ -30,17 +38,19 @@ The PCI Express AER driver provides the infrastructure to support PCI | |||
30 | Express Advanced Error Reporting capability. The PCI Express AER | 38 | Express Advanced Error Reporting capability. The PCI Express AER |
31 | driver provides three basic functions: | 39 | driver provides three basic functions: |
32 | 40 | ||
33 | - Gathers the comprehensive error information if errors occurred. | 41 | - Gathers the comprehensive error information if errors occurred. |
34 | - Reports error to the users. | 42 | - Reports error to the users. |
35 | - Performs error recovery actions. | 43 | - Performs error recovery actions. |
36 | 44 | ||
37 | AER driver only attaches root ports which support PCI-Express AER | 45 | AER driver only attaches root ports which support PCI-Express AER |
38 | capability. | 46 | capability. |
39 | 47 | ||
40 | 48 | ||
41 | 2. User Guide | 49 | User Guide |
50 | ========== | ||
42 | 51 | ||
43 | 2.1 Include the PCI Express AER Root Driver into the Linux Kernel | 52 | Include the PCI Express AER Root Driver into the Linux Kernel |
53 | ------------------------------------------------------------- | ||
44 | 54 | ||
45 | The PCI Express AER Root driver is a Root Port service driver attached | 55 | The PCI Express AER Root driver is a Root Port service driver attached |
46 | to the PCI Express Port Bus driver. If a user wants to use it, the driver | 56 | to the PCI Express Port Bus driver. If a user wants to use it, the driver |
@@ -48,7 +58,8 @@ has to be compiled. Option CONFIG_PCIEAER supports this capability. It | |||
48 | depends on CONFIG_PCIEPORTBUS, so pls. set CONFIG_PCIEPORTBUS=y and | 58 | depends on CONFIG_PCIEPORTBUS, so pls. set CONFIG_PCIEPORTBUS=y and |
49 | CONFIG_PCIEAER = y. | 59 | CONFIG_PCIEAER = y. |
50 | 60 | ||
51 | 2.2 Load PCI Express AER Root Driver | 61 | Load PCI Express AER Root Driver |
62 | -------------------------------- | ||
52 | 63 | ||
53 | Some systems have AER support in firmware. Enabling Linux AER support at | 64 | Some systems have AER support in firmware. Enabling Linux AER support at |
54 | the same time the firmware handles AER may result in unpredictable | 65 | the same time the firmware handles AER may result in unpredictable |
@@ -56,30 +67,34 @@ behavior. Therefore, Linux does not handle AER events unless the firmware | |||
56 | grants AER control to the OS via the ACPI _OSC method. See the PCI FW 3.0 | 67 | grants AER control to the OS via the ACPI _OSC method. See the PCI FW 3.0 |
57 | Specification for details regarding _OSC usage. | 68 | Specification for details regarding _OSC usage. |
58 | 69 | ||
59 | 2.3 AER error output | 70 | AER error output |
71 | ---------------- | ||
60 | 72 | ||
61 | When a PCIe AER error is captured, an error message will be output to | 73 | When a PCIe AER error is captured, an error message will be output to |
62 | console. If it's a correctable error, it is output as a warning. | 74 | console. If it's a correctable error, it is output as a warning. |
63 | Otherwise, it is printed as an error. So users could choose different | 75 | Otherwise, it is printed as an error. So users could choose different |
64 | log level to filter out correctable error messages. | 76 | log level to filter out correctable error messages. |
65 | 77 | ||
66 | Below shows an example: | 78 | Below shows an example:: |
67 | 0000:50:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0500(Requester ID) | 79 | |
68 | 0000:50:00.0: device [8086:0329] error status/mask=00100000/00000000 | 80 | 0000:50:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0500(Requester ID) |
69 | 0000:50:00.0: [20] Unsupported Request (First) | 81 | 0000:50:00.0: device [8086:0329] error status/mask=00100000/00000000 |
70 | 0000:50:00.0: TLP Header: 04000001 00200a03 05010000 00050100 | 82 | 0000:50:00.0: [20] Unsupported Request (First) |
83 | 0000:50:00.0: TLP Header: 04000001 00200a03 05010000 00050100 | ||
71 | 84 | ||
72 | In the example, 'Requester ID' means the ID of the device who sends | 85 | In the example, 'Requester ID' means the ID of the device who sends |
73 | the error message to root port. Pls. refer to pci express specs for | 86 | the error message to root port. Pls. refer to pci express specs for |
74 | other fields. | 87 | other fields. |
75 | 88 | ||
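Note on the example output in this hunk: the status word ``00100000`` and the ``[20] Unsupported Request`` line appear to describe the same thing, since 0x00100000 has only bit 20 set. The following is a minimal userspace sketch of that arithmetic, not kernel code. The helper name ``aer_status_first_bit`` is hypothetical and is not part of the AER driver.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel API): return the index of the lowest
 * set bit in a 32-bit AER status word, or -1 if no bit is set.
 */
static int aer_status_first_bit(uint32_t status)
{
	for (int bit = 0; bit < 32; bit++) {
		/* Test one bit position at a time, starting from bit 0. */
		if (status & (UINT32_C(1) << bit))
			return bit;
	}
	return -1;
}
```

Passing the status value from the example, ``0x00100000``, yields 20, which matches the bracketed index printed in the log.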
76 | 2.4 AER Statistics / Counters | 89 | AER Statistics / Counters |
90 | ------------------------- | ||
77 | 91 | ||
78 | When PCIe AER errors are captured, the counters / statistics are also exposed | 92 | When PCIe AER errors are captured, the counters / statistics are also exposed |
79 | in the form of sysfs attributes which are documented at | 93 | in the form of sysfs attributes which are documented at |
80 | Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats | 94 | Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats |
81 | 95 | ||
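Note on the statistics section in this hunk: the sysfs attributes themselves are described in the referenced ABI document, not in this patch. The sketch below assumes each attribute is plain text with one ``Name Value`` pair per line (for example ``BadTLP 3``), and shows how a userspace tool might pick one counter out of such a buffer. The function name ``aer_counter_lookup`` and the sample counter names are illustrative assumptions; check the ABI document for the real attribute layout.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: look up one counter in text shaped like
 * "Name Value\n" per line (an assumed layout, see the ABI document).
 * Returns 0 and stores the value on success, -1 if the name is absent.
 */
static int aer_counter_lookup(const char *text, const char *name,
			      unsigned long *value)
{
	char key[64];
	unsigned long val;

	while (*text) {
		/* Parse "<name> <number>" from the start of the current line. */
		if (sscanf(text, "%63s %lu", key, &val) == 2 &&
		    strcmp(key, name) == 0) {
			*value = val;
			return 0;
		}

		/* Advance to the character after the next newline, if any. */
		const char *nl = strchr(text, '\n');
		if (!nl)
			break;
		text = nl + 1;
	}
	return -1;
}
```

A caller would read the attribute file into a buffer first and then pass that buffer to the helper. Working on a string rather than a file keeps the parsing step easy to test without AER-capable hardware.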
82 | 3. Developer Guide | 96 | Developer Guide |
97 | =============== | ||
83 | 98 | ||
84 | To enable AER aware support requires a software driver to configure | 99 | To enable AER aware support requires a software driver to configure |
85 | the AER capability structure within its device and to provide callbacks. | 100 | the AER capability structure within its device and to provide callbacks. |
@@ -120,7 +135,8 @@ hierarchy and links. These errors do not include any device specific | |||
120 | errors because device specific errors will still get sent directly to | 135 | errors because device specific errors will still get sent directly to |
121 | the device driver. | 136 | the device driver. |
122 | 137 | ||
123 | 3.1 Configure the AER capability structure | 138 | Configure the AER capability structure |
139 | -------------------------------------- | ||
124 | 140 | ||
125 | AER aware drivers of PCI Express component need change the device | 141 | AER aware drivers of PCI Express component need change the device |
126 | control registers to enable AER. They also could change AER registers, | 142 | control registers to enable AER. They also could change AER registers, |
@@ -128,9 +144,11 @@ including mask and severity registers. Helper function | |||
128 | pci_enable_pcie_error_reporting could be used to enable AER. See | 144 | pci_enable_pcie_error_reporting could be used to enable AER. See |
129 | section 3.3. | 145 | section 3.3. |
130 | 146 | ||
131 | 3.2. Provide callbacks | 147 | Provide callbacks |
148 | ----------------- | ||
132 | 149 | ||
133 | 3.2.1 callback reset_link to reset pci express link | 150 | callback reset_link to reset pci express link |
151 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
134 | 152 | ||
135 | This callback is used to reset the pci express physical link when a | 153 | This callback is used to reset the pci express physical link when a |
136 | fatal error happens. The root port aer service driver provides a | 154 | fatal error happens. The root port aer service driver provides a |
@@ -140,13 +158,15 @@ upstream ports should provide their own reset_link functions. | |||
140 | 158 | ||
141 | In struct pcie_port_service_driver, a new pointer, reset_link, is | 159 | In struct pcie_port_service_driver, a new pointer, reset_link, is |
142 | added. | 160 | added. |
161 | :: | ||
143 | 162 | ||
144 | pci_ers_result_t (*reset_link) (struct pci_dev *dev); | 163 | pci_ers_result_t (*reset_link) (struct pci_dev *dev); |
145 | 164 | ||
146 | Section 3.2.2.2 provides more detailed info on when to call | 165 | Section 3.2.2.2 provides more detailed info on when to call |
147 | reset_link. | 166 | reset_link. |
148 | 167 | ||
149 | 3.2.2 PCI error-recovery callbacks | 168 | PCI error-recovery callbacks |
169 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
150 | 170 | ||
151 | The PCI Express AER Root driver uses error callbacks to coordinate | 171 | The PCI Express AER Root driver uses error callbacks to coordinate |
152 | with downstream device drivers associated with a hierarchy in question | 172 | with downstream device drivers associated with a hierarchy in question |
@@ -161,7 +181,8 @@ definitions of the callbacks. | |||
161 | 181 | ||
162 | Below sections specify when to call the error callback functions. | 182 | Below sections specify when to call the error callback functions. |
163 | 183 | ||
164 | 3.2.2.1 Correctable errors | 184 | Correctable errors |
185 | ~~~~~~~~~~~~~~~~~~ | ||
165 | 186 | ||
166 | Correctable errors pose no impacts on the functionality of | 187 | Correctable errors pose no impacts on the functionality of |
167 | the interface. The PCI Express protocol can recover without any | 188 | the interface. The PCI Express protocol can recover without any |
@@ -169,13 +190,16 @@ software intervention or any loss of data. These errors do not | |||
169 | require any recovery actions. The AER driver clears the device's | 190 | require any recovery actions. The AER driver clears the device's |
170 | correctable error status register accordingly and logs these errors. | 191 | correctable error status register accordingly and logs these errors. |
171 | 192 | ||
172 | 3.2.2.2 Non-correctable (non-fatal and fatal) errors | 193 | Non-correctable (non-fatal and fatal) errors |
194 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
173 | 195 | ||
174 | If an error message indicates a non-fatal error, performing link reset | 196 | If an error message indicates a non-fatal error, performing link reset |
175 | at upstream is not required. The AER driver calls error_detected(dev, | 197 | at upstream is not required. The AER driver calls error_detected(dev, |
176 | pci_channel_io_normal) to all drivers associated within a hierarchy in | 198 | pci_channel_io_normal) to all drivers associated within a hierarchy in |
177 | question. for example, | 199 | question. for example:: |
178 | EndPoint<==>DownstreamPort B<==>UpstreamPort A<==>RootPort. | 200 | |
201 | EndPoint<==>DownstreamPort B<==>UpstreamPort A<==>RootPort | ||
202 | |||
179 | If Upstream port A captures an AER error, the hierarchy consists of | 203 | If Upstream port A captures an AER error, the hierarchy consists of |
180 | Downstream port B and EndPoint. | 204 | Downstream port B and EndPoint. |
181 | 205 | ||
@@ -199,53 +223,72 @@ function. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER and | |||
199 | reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes | 223 | reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes |
200 | to mmio_enabled. | 224 | to mmio_enabled. |
201 | 225 | ||
202 | 3.3 helper functions | 226 | helper functions |
227 | ---------------- | ||
228 | :: | ||
229 | |||
230 | int pci_enable_pcie_error_reporting(struct pci_dev *dev); | ||
203 | 231 | ||
204 | 3.3.1 int pci_enable_pcie_error_reporting(struct pci_dev *dev); | ||
205 | pci_enable_pcie_error_reporting enables the device to send error | 232 | pci_enable_pcie_error_reporting enables the device to send error |
206 | messages to root port when an error is detected. Note that devices | 233 | messages to root port when an error is detected. Note that devices |
207 | don't enable the error reporting by default, so device drivers need | 234 | don't enable the error reporting by default, so device drivers need |
208 | call this function to enable it. | 235 | call this function to enable it. |
209 | 236 | ||
210 | 3.3.2 int pci_disable_pcie_error_reporting(struct pci_dev *dev); | 237 | :: |
238 | |||
239 | int pci_disable_pcie_error_reporting(struct pci_dev *dev); | ||
240 | |||
211 | pci_disable_pcie_error_reporting disables the device to send error | 241 | pci_disable_pcie_error_reporting disables the device to send error |
212 | messages to root port when an error is detected. | 242 | messages to root port when an error is detected. |
213 | 243 | ||
214 | 3.3.3 int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev); | 244 | :: |
245 | |||
246 | int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);` | ||
247 | |||
215 | pci_cleanup_aer_uncorrect_error_status cleanups the uncorrectable | 248 | pci_cleanup_aer_uncorrect_error_status cleanups the uncorrectable |
216 | error status register. | 249 | error status register. |
217 | 250 | ||
218 | 3.4 Frequent Asked Questions | 251 | Frequent Asked Questions |
252 | ------------------------ | ||
219 | 253 | ||
220 | Q: What happens if a PCI Express device driver does not provide an | 254 | Q: |
221 | error recovery handler (pci_driver->err_handler is equal to NULL)? | 255 | What happens if a PCI Express device driver does not provide an |
256 | error recovery handler (pci_driver->err_handler is equal to NULL)? | ||
222 | 257 | ||
223 | A: The devices attached with the driver won't be recovered. If the | 258 | A: |
224 | error is fatal, kernel will print out warning messages. Please refer | 259 | The devices attached with the driver won't be recovered. If the |
225 | to section 3 for more information. | 260 | error is fatal, kernel will print out warning messages. Please refer |
261 | to section 3 for more information. | ||
226 | 262 | ||
227 | Q: What happens if an upstream port service driver does not provide | 263 | Q: |
228 | callback reset_link? | 264 | What happens if an upstream port service driver does not provide |
265 | callback reset_link? | ||
229 | 266 | ||
230 | A: Fatal error recovery will fail if the errors are reported by the | 267 | A: |
231 | upstream ports who are attached by the service driver. | 268 | Fatal error recovery will fail if the errors are reported by the |
269 | upstream ports who are attached by the service driver. | ||
232 | 270 | ||
233 | Q: How does this infrastructure deal with driver that is not PCI | 271 | Q: |
234 | Express aware? | 272 | How does this infrastructure deal with driver that is not PCI |
273 | Express aware? | ||
235 | 274 | ||
236 | A: This infrastructure calls the error callback functions of the | 275 | A: |
237 | driver when an error happens. But if the driver is not aware of | 276 | This infrastructure calls the error callback functions of the |
238 | PCI Express, the device might not report its own errors to root | 277 | driver when an error happens. But if the driver is not aware of |
239 | port. | 278 | PCI Express, the device might not report its own errors to root |
279 | port. | ||
240 | 280 | ||
241 | Q: What modifications will that driver need to make it compatible | 281 | Q: |
242 | with the PCI Express AER Root driver? | 282 | What modifications will that driver need to make it compatible |
283 | with the PCI Express AER Root driver? | ||
243 | 284 | ||
244 | A: It could call the helper functions to enable AER in devices and | 285 | A: |
245 | cleanup uncorrectable status register. Pls. refer to section 3.3. | 286 | It could call the helper functions to enable AER in devices and |
287 | cleanup uncorrectable status register. Pls. refer to section 3.3. | ||
246 | 288 | ||
247 | 289 | ||
248 | 4. Software error injection | 290 | Software error injection |
291 | ======================== | ||
249 | 292 | ||
250 | Debugging PCIe AER error recovery code is quite difficult because it | 293 | Debugging PCIe AER error recovery code is quite difficult because it |
251 | is hard to trigger real hardware errors. Software based error | 294 | is hard to trigger real hardware errors. Software based error |
@@ -261,6 +304,7 @@ After reboot with new kernel or insert the module, a device file named | |||
261 | 304 | ||
262 | Then, you need a user space tool named aer-inject, which can be gotten | 305 | Then, you need a user space tool named aer-inject, which can be gotten |
263 | from: | 306 | from: |
307 | |||
264 | https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/ | 308 | https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/ |
265 | 309 | ||
266 | More information about aer-inject can be found in the document comes | 310 | More information about aer-inject can be found in the document comes |