author     Keith Busch <keith.busch@intel.com>  2018-09-20 12:27:12 -0400
committer  Bjorn Helgaas <bhelgaas@google.com>  2018-09-26 15:23:14 -0400
commit     bdb5ac85777de67c909c9ad4327f03f7648b543f
tree       06ffd1c0d73efa4807579a2a5b9bc04da8fcca1b /Documentation/PCI
parent     c4eed62a214330908eec11b0dc170d34fa50b412
PCI/ERR: Handle fatal error recovery

We don't need to be paranoid about the topology changing while handling an
error. If the device has changed in a hotplug capable slot, we can rely on
the presence detection handling to react to a changing topology.

Restore the fatal error handling behavior that existed before merging DPC
with AER with 7e9084b36740 ("PCI/AER: Handle ERR_FATAL with removal and
re-enumeration of devices").

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Sinan Kaya <okaya@kernel.org>

Diffstat (limited to 'Documentation/PCI')
-rw-r--r--  Documentation/PCI/pci-error-recovery.txt | 35
1 files changed, 10 insertions, 25 deletions
diff --git a/Documentation/PCI/pci-error-recovery.txt b/Documentation/PCI/pci-error-recovery.txt
index 688b69121e82..0b6bb3ef449e 100644
--- a/Documentation/PCI/pci-error-recovery.txt
+++ b/Documentation/PCI/pci-error-recovery.txt
@@ -110,7 +110,7 @@ The actual steps taken by a platform to recover from a PCI error
 event will be platform-dependent, but will follow the general
 sequence described below.
 
-STEP 0: Error Event: ERR_NONFATAL
+STEP 0: Error Event
 -------------------
 A PCI bus error is detected by the PCI hardware. On powerpc, the slot
 is isolated, in that all I/O is blocked: all reads return 0xffffffff,
@@ -228,7 +228,13 @@ proceeds to either STEP3 (Link Reset) or to STEP 5 (Resume Operations).
 If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform
 proceeds to STEP 4 (Slot Reset)
 
-STEP 3: Slot Reset
+STEP 3: Link Reset
+------------------
+The platform resets the link. This is a PCI-Express specific step
+and is done whenever a fatal error has been detected that can be
+"solved" by resetting the link.
+
+STEP 4: Slot Reset
 ------------------
 
 In response to a return value of PCI_ERS_RESULT_NEED_RESET, the
@@ -314,7 +320,7 @@ Failure).
 >>> However, it probably should.
 
 
-STEP 4: Resume Operations
+STEP 5: Resume Operations
 -------------------------
 The platform will call the resume() callback on all affected device
 drivers if all drivers on the segment have returned
@@ -326,7 +332,7 @@ a result code.
 At this point, if a new error happens, the platform will restart
 a new error recovery sequence.
 
-STEP 5: Permanent Failure
+STEP 6: Permanent Failure
 -------------------------
 A "permanent failure" has occurred, and the platform cannot recover
 the device. The platform will call error_detected() with a
@@ -349,27 +355,6 @@ errors. See the discussion in powerpc/eeh-pci-error-recovery.txt
 for additional detail on real-life experience of the causes of
 software errors.
 
-STEP 0: Error Event: ERR_FATAL
--------------------
-PCI bus error is detected by the PCI hardware. On powerpc, the slot is
-isolated, in that all I/O is blocked: all reads return 0xffffffff, all
-writes are ignored.
-
-STEP 1: Remove devices
---------------------
-Platform removes the devices depending on the error agent, it could be
-this port for all subordinates or upstream component (likely downstream
-port)
-
-STEP 2: Reset link
---------------------
-The platform resets the link. This is a PCI-Express specific step and is
-done whenever a fatal error has been detected that can be "solved" by
-resetting the link.
-
-STEP 3: Re-enumerate the devices
---------------------
-Initiates the re-enumeration.
 
 Conclusion; General Remarks
 ---------------------------