2 files changed, 261 insertions, 6 deletions
diff --git a/Documentation/filesystems/sysfs-pci.txt b/Documentation/filesystems/sysfs-pci.txt
index 988a62fae11f..7ba2baa165ff 100644
--- a/Documentation/filesystems/sysfs-pci.txt
+++ b/Documentation/filesystems/sysfs-pci.txt
@@ -1,4 +1,5 @@
 Accessing PCI device resources through sysfs
+--------------------------------------------
 sysfs, usually mounted at /sys, provides access to PCI resources on platforms
 that support it.  For example, a given bus might look like this:
@@ -47,14 +48,21 @@ files, each with their own function.
  binary - file contains binary data
  cpumask - file contains a cpumask type
-The read only files are informational, writes to them will be ignored.
-Writable files can be used to perform actions on the device (e.g. changing
-config space, detaching a device).  mmapable files are available via an
-mmap of the file at offset 0 and can be used to do actual device programming
-from userspace.  Note that some platforms don't support mmapping of certain
-resources, so be sure to check the return value from any attempted mmap.
+The read only files are informational, writes to them will be ignored, with
+the exception of the 'rom' file.  Writable files can be used to perform
+actions on the device (e.g. changing config space, detaching a device).
+mmapable files are available via an mmap of the file at offset 0 and can be
+used to do actual device programming from userspace.  Note that some platforms
+don't support mmapping of certain resources, so be sure to check the return
+value from any attempted mmap.
+The 'rom' file is special in that it provides read-only access to the device's
+ROM file, if available.  It's disabled by default, however, so applications
+should write the string "1" to the file to enable it before attempting a read
+call, and disable it following the access by writing "0" to the file.
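The enable/read/disable sequence for the 'rom' file could look roughly like
the userspace sketch below.  The function name read_pci_rom() and the example
path are illustrative assumptions, not part of the patch; the trailing newline
mimics what "echo 1 > rom" would write.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to len bytes of a device's ROM through its sysfs 'rom' file.
 * rom_path is the attribute to use (hypothetical example:
 * "/sys/bus/pci/devices/0000:01:00.0/rom").
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t read_pci_rom(const char *rom_path, void *buf, size_t len)
{
	ssize_t n;
	int fd = open(rom_path, O_RDWR);

	if (fd < 0)
		return -1;

	/* Enable the ROM before reading, as "echo 1 > rom" would. */
	if (pwrite(fd, "1\n", 2, 0) != 2) {
		close(fd);
		return -1;
	}

	/* Read the image from the start of the file. */
	n = pread(fd, buf, len, 0);

	/* Disable the ROM again, whether or not the read succeeded. */
	if (pwrite(fd, "0\n", 2, 0) != 2)
		n = -1;

	close(fd);
	return n;
}
```

A caller would pass a buffer at least as large as the ROM image it expects and
must check for a -1 return, since not every device exposes a ROM.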
 Accessing legacy resources through sysfs
+----------------------------------------
 Legacy I/O port and ISA memory resources are also provided in sysfs if the
 underlying platform supports them.  They're located in the PCI class heirarchy,
@@ -75,6 +83,7 @@ simply dereference the returned pointer (after checking for errors of course)
 to access legacy memory space.
 Supporting PCI access on new platforms
+--------------------------------------
 In order to support PCI resource mapping as described above, Linux platform
 code must define HAVE_PCI_MMAP and provide a pci_mmap_page_range function.
diff --git a/Documentation/pci-error-recovery.txt b/Documentation/pci-error-recovery.txt
new file mode 100644
index 000000000000..d089967e4948
--- /dev/null
+++ b/Documentation/pci-error-recovery.txt
@@ -0,0 +1,246 @@
+                       PCI Error Recovery
+                       ------------------
+                         May 31, 2005
+               Current document maintainer:
+           Linas Vepstas <linas@austin.ibm.com>
+Some PCI bus controllers are able to detect certain "hard" PCI errors
+on the bus, such as parity errors on the data and address busses, as
+well as SERR and PERR errors.  These chipsets are then able to disable
+I/O to/from the affected device, so that, for example, a bad DMA
+address doesn't end up corrupting system memory.  These same chipsets
+are also able to reset the affected PCI device, and return it to
+working condition.  This document describes a generic API for
+performing error recovery.
+The core idea is that after a PCI error has been detected, there must
+be a way for the kernel to coordinate with all affected device drivers
+so that the pci card can be made operational again, possibly after
+performing a full electrical #RST of the PCI card.  The API below
+provides a generic way for device drivers to be notified of PCI
+errors, and to be notified of, and respond to, a reset sequence.
+Preliminary sketch of API, cut-n-pasted-n-modified email from
+Ben Herrenschmidt, circa 5 April 2005
+The error recovery API support is exposed to the driver in the form of
+a structure of function pointers pointed to by a new field in struct
+pci_driver. The absence of this pointer in pci_driver denotes a
+"non-aware" driver; behaviour for these is platform dependent.
+Platforms like ppc64 can try to simulate pci hotplug remove/add.
+The definition of "pci_error_token" is not covered here. It is based on
+Seto's work on the synchronous error detection. We still need to define
+functions for extracting information out of an opaque error token. This is
+separate from this API.
+This structure has the form:
+struct pci_error_handlers
+{
+        int (*error_detected)(struct pci_dev *dev, pci_error_token error);
+        int (*mmio_enabled)(struct pci_dev *dev);
+        int (*resume)(struct pci_dev *dev);
+        int (*link_reset)(struct pci_dev *dev);
+        int (*slot_reset)(struct pci_dev *dev);
+};
+A driver doesn't have to implement all of these callbacks. The
+only mandatory one is error_detected(). If a callback is not
+implemented, the corresponding feature is considered unsupported.
+For example, if mmio_enabled() and resume() aren't there, then the
+driver is assumed not to do any direct recovery and to require
+a reset. If link_reset() is not implemented, the card is assumed not
+to care about link resets, in which case, if recovery is supported,
+the core can try to recover (but not call slot_reset() unless it really
+did reset the slot). If slot_reset() is not supported, link_reset() can
+be called instead on a slot reset.
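As a rough illustration of the optional-callback rule, the sketch below fills
in the structure for a hypothetical driver "foo" that only supports recovery
through a slot reset.  The stand-in definitions of struct pci_dev and
pci_error_token, the numeric values of the PCIERR_RESULT_* codes, and the
helper supports_direct_recovery() are assumptions made so the example compiles
in userspace; none of them come from the patch.

```c
#include <stddef.h>

/* Userspace stand-ins for kernel types (assumptions for this sketch). */
struct pci_dev { int id; };
typedef int pci_error_token;

/* Result codes named in the text; the numeric values are assumed. */
enum pcierr_result {
	PCIERR_RESULT_CAN_RECOVER = 1,
	PCIERR_RESULT_NEED_RESET,
	PCIERR_RESULT_DISCONNECT,
	PCIERR_RESULT_RECOVERED,
};

struct pci_error_handlers {
	int (*error_detected)(struct pci_dev *dev, pci_error_token error);
	int (*mmio_enabled)(struct pci_dev *dev);
	int (*resume)(struct pci_dev *dev);
	int (*link_reset)(struct pci_dev *dev);
	int (*slot_reset)(struct pci_dev *dev);
};

/* Hypothetical driver "foo": stop I/O, then ask for a slot reset. */
static int foo_error_detected(struct pci_dev *dev, pci_error_token error)
{
	(void)dev;
	(void)error;
	return PCIERR_RESULT_NEED_RESET;
}

/* Re-initialize the hardware after the reset and report success. */
static int foo_slot_reset(struct pci_dev *dev)
{
	(void)dev;
	return PCIERR_RESULT_RECOVERED;
}

/* Restart normal operation; the result is not taken into account. */
static int foo_resume(struct pci_dev *dev)
{
	(void)dev;
	return 0;
}

/* mmio_enabled and link_reset stay NULL, i.e. unsupported. */
const struct pci_error_handlers foo_err_handlers = {
	.error_detected = foo_error_detected,
	.slot_reset     = foo_slot_reset,
	.resume         = foo_resume,
};

/* Core-side check: can this driver recover without a reset? */
int supports_direct_recovery(const struct pci_error_handlers *h)
{
	return h != NULL && h->mmio_enabled != NULL && h->resume != NULL;
}
```

Leaving a pointer NULL is how the driver signals that a feature is
unsupported, so the core only needs a NULL check before each call.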
+The first call will always be:
+        1) error_detected()
+        Error detected. This is sent once after an error has been detected. At
+this point, the device might not be accessible anymore depending on the
+platform (the slot will be isolated on ppc64). The driver may already
+have "noticed" the error because of a failing IO, but this is the proper
+"synchronisation point", that is, it gives the driver a chance to
+clean up and wait for pending work (timers, etc.) to
+complete; it can take semaphores, schedule, and so on: anything but touch
+the device. Within this function and after it returns, the driver
+shouldn't do any new IOs. Called in task context. This is sort of a
+"quiesce" point. See note about interrupts at the end of this doc.
+        Result codes:
+                - PCIERR_RESULT_CAN_RECOVER:
+                  Driver returns this if it thinks it might be able to recover
+                  the HW by just banging IOs or if it wants to be given
+                  a chance to extract some diagnostic information (see
+                  below).
+                - PCIERR_RESULT_NEED_RESET:
+                  Driver returns this if it thinks it can't recover unless the
+                  slot is reset.
+                - PCIERR_RESULT_DISCONNECT:
+                  Driver returns this if it thinks it won't recover at all.
+                  (Whether this detaches the driver or just leaves it
+                  dangling is still to be decided.)
+So at this point, we have called error_detected() for all drivers
+on the segment that had the error. On ppc64, the slot is isolated. What
+happens now typically depends on the result from the drivers. If all
+drivers on the segment/slot return PCIERR_RESULT_CAN_RECOVER, we would
+re-enable IOs on the slot (or do nothing special if the platform doesn't
+isolate slots) and call 2). If not, and we can reset slots, we go to 4);
+if neither, we have a dead slot. If it's a hotplug slot, we might
+"simulate" reset by triggering HW unplug/replug though.
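The decision taken after all error_detected() calls have returned can be
sketched as a small function.  This follows the paragraph above literally; the
names after_error_detected() and enum next_step, and the numeric values of the
result codes, are assumptions for illustration, and the still-undecided
handling of PCIERR_RESULT_DISCONNECT is not modelled separately.

```c
#include <stddef.h>

/* Result codes named in the text; the numeric values are assumed. */
enum pcierr_result {
	PCIERR_RESULT_CAN_RECOVER = 1,
	PCIERR_RESULT_NEED_RESET,
	PCIERR_RESULT_DISCONNECT,
};

/* What the platform does next (hypothetical names). */
enum next_step {
	STEP_MMIO_ENABLED,	/* step 2): re-enable IOs, call mmio_enabled() */
	STEP_SLOT_RESET,	/* step 4): reset the slot, call slot_reset() */
	STEP_DEAD_SLOT,		/* no recovery possible */
};

/*
 * results[] holds the error_detected() return value of each driver on
 * the affected segment; can_reset_slots says whether the platform is
 * able to reset the slot.
 */
enum next_step after_error_detected(const int *results, size_t n,
				    int can_reset_slots)
{
	size_t i;

	/* Early recovery is only tried if every driver agrees to it. */
	for (i = 0; i < n; i++)
		if (results[i] != PCIERR_RESULT_CAN_RECOVER)
			break;
	if (i == n)
		return STEP_MMIO_ENABLED;

	if (can_reset_slots)
		return STEP_SLOT_RESET;

	return STEP_DEAD_SLOT;
}
```

A single driver asking for a reset is enough to move the whole segment past
the early-recovery step.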
+>>> Current ppc64 implementation assumes that a device driver will
+>>> *not* schedule or semaphore in this routine; the current ppc64
+>>> implementation uses one kernel thread to notify all devices;
+>>> thus, if one device sleeps/schedules, all devices are affected.
+>>> Doing better requires complex multi-threaded logic in the error
+>>> recovery implementation (e.g. waiting for all notification threads
+>>> to "join" before proceeding with recovery.)  This seems excessively
+>>> complex and not worth implementing.
+>>> The current ppc64 implementation doesn't much care if the device
+>>> attempts i/o at this point, or not.  I/O's will fail, returning
+>>> a value of 0xff on read, and writes will be dropped. If the device
+>>> driver attempts more than 10K I/O's to a frozen adapter, the platform
+>>> will assume that the device driver has gone into an infinite loop,
+>>> and it will panic the kernel.
+        2) mmio_enabled()
+        This is the "early recovery" call. IOs are allowed again, but DMA is
+not (hrm... to be discussed, I prefer not), with some restrictions. This
+is NOT a callback for the driver to start operations again, only to
+peek/poke at the device, extract diagnostic information, if any, and
+possibly do things like trigger a device local reset or some such,
+but not restart operations. This is sent if all drivers on a segment
+agree that they can try to recover and no automatic link reset was
+performed by the HW. If the platform can't just re-enable IOs without
+a slot reset or a link reset, it doesn't call this callback and goes
+directly to 3) or 4). All IOs should be done _synchronously_ from
+within this callback; errors triggered by them will be returned via
+the normal pci_check_whatever() API, and no new error_detected() callback
+will be issued due to an error happening here. However, such an error
+might cause IOs to be re-blocked for the whole segment, and thus
+invalidate the recovery that other devices on the same segment might
+have done, forcing the whole segment into one of the next states,
+that is link reset or slot reset.
+        Result codes:
+                - PCIERR_RESULT_RECOVERED
+                  Driver returns this if it thinks the device is fully
+                  functional and thinks it is ready to start
+                  normal driver operations again. There is no
+                  guarantee that the driver will actually be
+                  allowed to proceed, as another driver on the
+                  same segment might have failed and thus triggered a
+                  slot reset on platforms that support it.
+                - PCIERR_RESULT_NEED_RESET
+                  Driver returns this if it thinks the device is not
+                  recoverable in its current state and it needs a slot
+                  reset to proceed.
+                - PCIERR_RESULT_DISCONNECT
+                  Same as above. Total failure: no recovery even after
+                  reset, driver dead. (To be defined more precisely)
+>>> The current ppc64 implementation does not implement this callback.
+        3) link_reset()
+        This is called after the link has been reset. This is typically
+a PCI Express specific state at this point and is done whenever a
+non-fatal error has been detected that can be "solved" by resetting
+the link. This call informs the driver of the reset and the driver
+should check if the device appears to be in working condition.
+This function acts a bit like 2) mmio_enabled(), in that the driver
+is not supposed to restart normal driver I/O operations right away.
+Instead, it should just "probe" the device to check its recoverability
+status. If all is right, then the core will call resume() once all
+drivers have ack'd link_reset().
+        Result codes:
+                (identical to mmio_enabled)
+>>> The current ppc64 implementation does not implement this callback.
+        4) slot_reset()
+        This is called after the slot has been soft or hard reset by the
+platform.  A soft reset consists of asserting the adapter #RST line
+and then restoring the PCI BARs and PCI configuration header. If the
+platform supports PCI hotplug, then it might instead perform a hard
+reset by toggling power on the slot off/on. This call gives drivers
+the chance to re-initialize the hardware (re-download firmware, etc.),
+but drivers shouldn't restart normal I/O processing operations at
+this point.  (See note about interrupts; interrupts aren't guaranteed
+to be delivered until the resume() callback has been called). If all
+device drivers report success on this callback, the platform will call
+resume() to complete the error handling and let the driver restart
+normal I/O processing.
+A driver can still return a critical failure for this function if
+it can't get the device operational after reset.  If the platform
+previously tried a soft reset, it might now try a hard reset (power
+cycle) and then call slot_reset() again.  If the device still can't
+be recovered, there is nothing more that can be done;  the platform
+will typically report a "permanent failure" in such a case.  The
+device will be considered "dead" in this case.
+        Result codes:
+                - PCIERR_RESULT_DISCONNECT
+                Same as above.
+>>> The current ppc64 implementation does not try a power-cycle reset
+>>> if the driver returned PCIERR_RESULT_DISCONNECT. However, it should.
+        5) resume()
+        This is called if all drivers on the segment have returned
+PCIERR_RESULT_RECOVERED from one of the 3 previous callbacks.
+That basically tells the driver to restart activity, that everything
+is back and running. No result code is taken into account here. If
+a new error happens, it will restart a new error handling process.
+That's it. I think this covers all the possibilities. The way those
+callbacks are called is platform policy. A platform with no slot reset
+capability for example may want to just "ignore" drivers that can't
+recover (disconnect them) and try to let other cards on the same segment
+recover. Keep in mind that in most real life cases, though, there will
+be only one driver per segment.
+Now, there is a note about interrupts. If you get an interrupt and your
+device is dead or has been isolated, there is a problem :)
+After much thinking, I decided to leave that to the platform. That is,
+the recovery API only specifies that:
+ - There is no guarantee that interrupt delivery can proceed from any
+device on the segment starting from the error detection and until the
+restart callback is sent, at which point interrupts are expected to be
+fully operational.
+ - There is no guarantee that interrupt delivery is stopped, that is, a
+driver that gets an interrupt after detecting an error, or that detects
+an error within the interrupt handler such that it prevents proper
+ack'ing of the interrupt (and thus removal of the source) should just
+return IRQ_NOTHANDLED. It's up to the platform to deal with that
+condition, typically by masking the irq source during the duration of
+the error handling. It is expected that the platform "knows" which
+interrupts are routed to error-management capable slots and can deal
+with temporarily disabling that irq number during error processing (this
+isn't terribly complex). That means some IRQ latency for other devices
+sharing the interrupt, but there is simply no other way. High end
+platforms aren't supposed to share interrupts between many devices
+anyway :)
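A driver-side interrupt handler that follows the second rule might be shaped
like the sketch below.  It is a userspace mock, not kernel code: struct
foo_dev, its in_error_recovery flag and foo_interrupt() are hypothetical, and
irqreturn_t is redefined locally.  The "not handled" return that the text
calls IRQ_NOTHANDLED is spelled IRQ_NONE here, which is the name the kernel
uses for that value.

```c
/* Local stand-in for the kernel's interrupt return type. */
typedef enum { IRQ_NONE = 0, IRQ_HANDLED = 1 } irqreturn_t;

/* Hypothetical per-device state. */
struct foo_dev {
	int in_error_recovery;	/* set in error_detected(), cleared in resume() */
	unsigned int irq_count;	/* interrupts actually serviced */
};

irqreturn_t foo_interrupt(int irq, void *dev_id)
{
	struct foo_dev *foo = dev_id;

	(void)irq;

	/*
	 * The device may be isolated: do not touch it, do not try to ack
	 * the interrupt, and leave the condition to the platform.
	 */
	if (foo->in_error_recovery)
		return IRQ_NONE;

	/* Normal path: the device registers would be serviced here. */
	foo->irq_count++;
	return IRQ_HANDLED;
}
```

The flag would be set as early as possible in error_detected() and cleared
only once resume() has been called, matching the window in which interrupt
delivery is not guaranteed.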
+Revised: 31 May 2005 Linas Vepstas <linas@austin.ibm.com>
