aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation/PCI
diff options
context:
space:
mode:
authorFrederic Weisbecker <fweisbec@gmail.com>2009-09-23 17:08:43 -0400
committerFrederic Weisbecker <fweisbec@gmail.com>2009-09-23 17:08:43 -0400
commitd7a4b414eed51f1653bb05ebe84122bf9a7ae18b (patch)
treebd6603a0c27de4c138a1767871897e9cd3e1a1d2 /Documentation/PCI
parent1f0ab40976460bc4673fa204ce917a725185d8f2 (diff)
parenta724eada8c2a7b62463b73ccf73fd0bb6e928aeb (diff)
Merge commit 'linus/master' into tracing/kprobes
Conflicts: kernel/trace/Makefile kernel/trace/trace.h kernel/trace/trace_event_types.h kernel/trace/trace_export.c Merge reason: Sync with latest significant tracing core changes.
Diffstat (limited to 'Documentation/PCI')
-rw-r--r--Documentation/PCI/pci-error-recovery.txt119
1 files changed, 77 insertions, 42 deletions
diff --git a/Documentation/PCI/pci-error-recovery.txt b/Documentation/PCI/pci-error-recovery.txt
index 6650af432523..e83f2ea76415 100644
--- a/Documentation/PCI/pci-error-recovery.txt
+++ b/Documentation/PCI/pci-error-recovery.txt
@@ -4,15 +4,17 @@
4 February 2, 2006 4 February 2, 2006
5 5
6 Current document maintainer: 6 Current document maintainer:
7 Linas Vepstas <linas@austin.ibm.com> 7 Linas Vepstas <linasvepstas@gmail.com>
8 updated by Richard Lary <rlary@us.ibm.com>
9 and Mike Mason <mmlnx@us.ibm.com> on 27-Jul-2009
8 10
9 11
10Many PCI bus controllers are able to detect a variety of hardware 12Many PCI bus controllers are able to detect a variety of hardware
11PCI errors on the bus, such as parity errors on the data and address 13PCI errors on the bus, such as parity errors on the data and address
12busses, as well as SERR and PERR errors. Some of the more advanced 14busses, as well as SERR and PERR errors. Some of the more advanced
13chipsets are able to deal with these errors; these include PCI-E chipsets, 15chipsets are able to deal with these errors; these include PCI-E chipsets,
14and the PCI-host bridges found on IBM Power4 and Power5-based pSeries 16and the PCI-host bridges found on IBM Power4, Power5 and Power6-based
15boxes. A typical action taken is to disconnect the affected device, 17pSeries boxes. A typical action taken is to disconnect the affected device,
16halting all I/O to it. The goal of a disconnection is to avoid system 18halting all I/O to it. The goal of a disconnection is to avoid system
17corruption; for example, to halt system memory corruption due to DMA's 19corruption; for example, to halt system memory corruption due to DMA's
18to "wild" addresses. Typically, a reconnection mechanism is also 20to "wild" addresses. Typically, a reconnection mechanism is also
@@ -37,10 +39,11 @@ is forced by the need to handle multi-function devices, that is,
37devices that have multiple device drivers associated with them. 39devices that have multiple device drivers associated with them.
38In the first stage, each driver is allowed to indicate what type 40In the first stage, each driver is allowed to indicate what type
39of reset it desires, the choices being a simple re-enabling of I/O 41of reset it desires, the choices being a simple re-enabling of I/O
40or requesting a hard reset (a full electrical #RST of the PCI card). 42or requesting a slot reset.
41If any driver requests a full reset, that is what will be done.
42 43
43After a full reset and/or a re-enabling of I/O, all drivers are 44If any driver requests a slot reset, that is what will be done.
45
46After a reset and/or a re-enabling of I/O, all drivers are
44again notified, so that they may then perform any device setup/config 47again notified, so that they may then perform any device setup/config
45that may be required. After these have all completed, a final 48that may be required. After these have all completed, a final
46"resume normal operations" event is sent out. 49"resume normal operations" event is sent out.
@@ -101,7 +104,7 @@ if it implements any, it must implement error_detected(). If a callback
101is not implemented, the corresponding feature is considered unsupported. 104is not implemented, the corresponding feature is considered unsupported.
102For example, if mmio_enabled() and resume() aren't there, then it 105For example, if mmio_enabled() and resume() aren't there, then it
103is assumed that the driver is not doing any direct recovery and requires 106is assumed that the driver is not doing any direct recovery and requires
104a reset. If link_reset() is not implemented, the card is assumed as 107a slot reset. If link_reset() is not implemented, the card is assumed to
105not care about link resets. Typically a driver will want to know about 108not care about link resets. Typically a driver will want to know about
106a slot_reset(). 109a slot_reset().
107 110
@@ -111,7 +114,7 @@ sequence described below.
111 114
112STEP 0: Error Event 115STEP 0: Error Event
113------------------- 116-------------------
114PCI bus error is detect by the PCI hardware. On powerpc, the slot 117A PCI bus error is detected by the PCI hardware. On powerpc, the slot
115is isolated, in that all I/O is blocked: all reads return 0xffffffff, 118is isolated, in that all I/O is blocked: all reads return 0xffffffff,
116all writes are ignored. 119all writes are ignored.
117 120
@@ -139,7 +142,7 @@ The driver must return one of the following result codes:
139 a chance to extract some diagnostic information (see 142 a chance to extract some diagnostic information (see
140 mmio_enable, below). 143 mmio_enable, below).
141 - PCI_ERS_RESULT_NEED_RESET: 144 - PCI_ERS_RESULT_NEED_RESET:
142 Driver returns this if it can't recover without a hard 145 Driver returns this if it can't recover without a
143 slot reset. 146 slot reset.
144 - PCI_ERS_RESULT_DISCONNECT: 147 - PCI_ERS_RESULT_DISCONNECT:
145 Driver returns this if it doesn't want to recover at all. 148 Driver returns this if it doesn't want to recover at all.
@@ -169,11 +172,11 @@ is STEP 6 (Permanent Failure).
169 172
170>>> The current powerpc implementation doesn't much care if the device 173>>> The current powerpc implementation doesn't much care if the device
171>>> attempts I/O at this point, or not. I/O's will fail, returning 174>>> attempts I/O at this point, or not. I/O's will fail, returning
172>>> a value of 0xff on read, and writes will be dropped. If the device 175>>> a value of 0xff on read, and writes will be dropped. If more than
173>>> driver attempts more than 10K I/O's to a frozen adapter, it will 176>>> EEH_MAX_FAILS I/O's are attempted to a frozen adapter, EEH
174>>> assume that the device driver has gone into an infinite loop, and 177>>> assumes that the device driver has gone into an infinite loop
175>>> it will panic the kernel. There doesn't seem to be any other 178>>> and prints an error to syslog. A reboot is then required to
176>>> way of stopping a device driver that insists on spinning on I/O. 179>>> get the device working again.
177 180
178STEP 2: MMIO Enabled 181STEP 2: MMIO Enabled
179------------------- 182-------------------
@@ -182,15 +185,14 @@ DMA), and then calls the mmio_enabled() callback on all affected
182device drivers. 185device drivers.
183 186
184This is the "early recovery" call. IOs are allowed again, but DMA is 187This is the "early recovery" call. IOs are allowed again, but DMA is
185not (hrm... to be discussed, I prefer not), with some restrictions. This 188not, with some restrictions. This is NOT a callback for the driver to
186is NOT a callback for the driver to start operations again, only to 189start operations again, only to peek/poke at the device, extract diagnostic
187peek/poke at the device, extract diagnostic information, if any, and 190information, if any, and eventually do things like trigger a device local
188eventually do things like trigger a device local reset or some such, 191reset or some such, but not restart operations. This callback is made if
189but not restart operations. This is callback is made if all drivers on 192all drivers on a segment agree that they can try to recover and if no automatic
190a segment agree that they can try to recover and if no automatic link reset 193link reset was performed by the HW. If the platform can't just re-enable IOs
191was performed by the HW. If the platform can't just re-enable IOs without 194without a slot reset or a link reset, it will not call this callback, and
192a slot reset or a link reset, it wont call this callback, and instead 195instead will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset)
193will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset)
194 196
195>>> The following is proposed; no platform implements this yet: 197>>> The following is proposed; no platform implements this yet:
196>>> Proposal: All I/O's should be done _synchronously_ from within 198>>> Proposal: All I/O's should be done _synchronously_ from within
@@ -228,9 +230,6 @@ proceeds to either STEP3 (Link Reset) or to STEP 5 (Resume Operations).
228If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform 230If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform
229proceeds to STEP 4 (Slot Reset) 231proceeds to STEP 4 (Slot Reset)
230 232
231>>> The current powerpc implementation does not implement this callback.
232
233
234STEP 3: Link Reset 233STEP 3: Link Reset
235------------------ 234------------------
236The platform resets the link, and then calls the link_reset() callback 235The platform resets the link, and then calls the link_reset() callback
@@ -253,16 +252,33 @@ The platform then proceeds to either STEP 4 (Slot Reset) or STEP 5
253 252
254>>> The current powerpc implementation does not implement this callback. 253>>> The current powerpc implementation does not implement this callback.
255 254
256
257STEP 4: Slot Reset 255STEP 4: Slot Reset
258------------------ 256------------------
259The platform performs a soft or hard reset of the device, and then
260calls the slot_reset() callback.
261 257
262A soft reset consists of asserting the adapter #RST line and then 258In response to a return value of PCI_ERS_RESULT_NEED_RESET, the
259the platform will peform a slot reset on the requesting PCI device(s).
260The actual steps taken by a platform to perform a slot reset
261will be platform-dependent. Upon completion of slot reset, the
262platform will call the device slot_reset() callback.
263
264Powerpc platforms implement two levels of slot reset:
265soft reset(default) and fundamental(optional) reset.
266
267Powerpc soft reset consists of asserting the adapter #RST line and then
263restoring the PCI BAR's and PCI configuration header to a state 268restoring the PCI BAR's and PCI configuration header to a state
264that is equivalent to what it would be after a fresh system 269that is equivalent to what it would be after a fresh system
265power-on followed by power-on BIOS/system firmware initialization. 270power-on followed by power-on BIOS/system firmware initialization.
271Soft reset is also known as hot-reset.
272
273Powerpc fundamental reset is supported by PCI Express cards only
274and results in device's state machines, hardware logic, port states and
275configuration registers to initialize to their default conditions.
276
277For most PCI devices, a soft reset will be sufficient for recovery.
278Optional fundamental reset is provided to support a limited number
279of PCI Express PCI devices for which a soft reset is not sufficient
280for recovery.
281
266If the platform supports PCI hotplug, then the reset might be 282If the platform supports PCI hotplug, then the reset might be
267performed by toggling the slot electrical power off/on. 283performed by toggling the slot electrical power off/on.
268 284
@@ -274,10 +290,12 @@ may result in hung devices, kernel panics, or silent data corruption.
274 290
275This call gives drivers the chance to re-initialize the hardware 291This call gives drivers the chance to re-initialize the hardware
276(re-download firmware, etc.). At this point, the driver may assume 292(re-download firmware, etc.). At this point, the driver may assume
277that he card is in a fresh state and is fully functional. In 293that the card is in a fresh state and is fully functional. The slot
278particular, interrupt generation should work normally. 294is unfrozen and the driver has full access to PCI config space,
295memory mapped I/O space and DMA. Interrupts (Legacy, MSI, or MSI-X)
296will also be available.
279 297
280Drivers should not yet restart normal I/O processing operations 298Drivers should not restart normal I/O processing operations
281at this point. If all device drivers report success on this 299at this point. If all device drivers report success on this
282callback, the platform will call resume() to complete the sequence, 300callback, the platform will call resume() to complete the sequence,
283and let the driver restart normal I/O processing. 301and let the driver restart normal I/O processing.
@@ -302,11 +320,21 @@ driver performs device init only from PCI function 0:
302 - PCI_ERS_RESULT_DISCONNECT 320 - PCI_ERS_RESULT_DISCONNECT
303 Same as above. 321 Same as above.
304 322
323Drivers for PCI Express cards that require a fundamental reset must
324set the needs_freset bit in the pci_dev structure in their probe function.
325For example, the QLogic qla2xxx driver sets the needs_freset bit for certain
326PCI card types:
327
328+ /* Set EEH reset type to fundamental if required by hba */
329+ if (IS_QLA24XX(ha) || IS_QLA25XX(ha) || IS_QLA81XX(ha))
330+ pdev->needs_freset = 1;
331+
332
305Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent 333Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent
306Failure). 334Failure).
307 335
308>>> The current powerpc implementation does not currently try a 336>>> The current powerpc implementation does not try a power-cycle
309>>> power-cycle reset if the driver returned PCI_ERS_RESULT_DISCONNECT. 337>>> reset if the driver returned PCI_ERS_RESULT_DISCONNECT.
310>>> However, it probably should. 338>>> However, it probably should.
311 339
312 340
@@ -348,7 +376,7 @@ software errors.
348 376
349Conclusion; General Remarks 377Conclusion; General Remarks
350--------------------------- 378---------------------------
351The way those callbacks are called is platform policy. A platform with 379The way the callbacks are called is platform policy. A platform with
352no slot reset capability may want to just "ignore" drivers that can't 380no slot reset capability may want to just "ignore" drivers that can't
353recover (disconnect them) and try to let other cards on the same segment 381recover (disconnect them) and try to let other cards on the same segment
354recover. Keep in mind that in most real life cases, though, there will 382recover. Keep in mind that in most real life cases, though, there will
@@ -361,8 +389,8 @@ That is, the recovery API only requires that:
361 389
362 - There is no guarantee that interrupt delivery can proceed from any 390 - There is no guarantee that interrupt delivery can proceed from any
363device on the segment starting from the error detection and until the 391device on the segment starting from the error detection and until the
364resume callback is sent, at which point interrupts are expected to be 392slot_reset callback is called, at which point interrupts are expected
365fully operational. 393to be fully operational.
366 394
367 - There is no guarantee that interrupt delivery is stopped, that is, 395 - There is no guarantee that interrupt delivery is stopped, that is,
368a driver that gets an interrupt after detecting an error, or that detects 396a driver that gets an interrupt after detecting an error, or that detects
@@ -381,16 +409,23 @@ anyway :)
381>>> Implementation details for the powerpc platform are discussed in 409>>> Implementation details for the powerpc platform are discussed in
382>>> the file Documentation/powerpc/eeh-pci-error-recovery.txt 410>>> the file Documentation/powerpc/eeh-pci-error-recovery.txt
383 411
384>>> As of this writing, there are six device drivers with patches 412>>> As of this writing, there is a growing list of device drivers with
385>>> implementing error recovery. Not all of these patches are in 413>>> patches implementing error recovery. Not all of these patches are in
386>>> mainline yet. These may be used as "examples": 414>>> mainline yet. These may be used as "examples":
387>>> 415>>>
388>>> drivers/scsi/ipr.c 416>>> drivers/scsi/ipr
389>>> drivers/scsi/sym53cxx_2 417>>> drivers/scsi/sym53c8xx_2
418>>> drivers/scsi/qla2xxx
419>>> drivers/scsi/lpfc
420>>> drivers/next/bnx2.c
390>>> drivers/next/e100.c 421>>> drivers/next/e100.c
391>>> drivers/net/e1000 422>>> drivers/net/e1000
423>>> drivers/net/e1000e
392>>> drivers/net/ixgb 424>>> drivers/net/ixgb
425>>> drivers/net/ixgbe
426>>> drivers/net/cxgb3
393>>> drivers/net/s2io.c 427>>> drivers/net/s2io.c
428>>> drivers/net/qlge
394 429
395The End 430The End
396------- 431-------