[SCSI] libfc: remote port gets stuck in restart state without really restarting

We ran into a scenario where a remote port goes into RESTART state but never gets added to the scsi transport. The running vmcore showed the following:

a) The port was in RESTART state
b) rdata->event was STOP
c) No work was scheduled to fc_rport_work for the remote port

After this point, shut/no-shut of the remote port did not cause the port to get re-discovered. The port would move between DELETE and RESTART states, but the event would always be STOP, no work would get scheduled to fc_rport_work, and the port would not get added to scsi_transport.

The problem is that rdata->event is not set to NONE after a port is restarted. After this point, no more work gets scheduled for the remote port, since new work is scheduled only if rdata->event is NONE when the new event is posted. So the event and state keep changing, but fc_rport_work does not get scheduled to actually handle the event.

Here is a sequence of state transitions that explains the above observation:

1) Port is first in READY state, event is NONE.
2) RSCN on shut: port goes to DELETE, event is STOP.
3) Before fc_rport_work runs, RSCN on no-shut: port goes to RESTART, event is still STOP.
4) fc_rport_work runs, removes the port from the transport, sees the state as RESTART, and begins the PLOGI state machine. The event remains STOP (the event is NOT changed to NONE; this is the bug).
5) The PLOGI state machine completes, port state goes to READY, and event goes to READY, but no work is scheduled since the event was STOP (non-NONE) before. fc_rport_work is not scheduled, and the port remains in READY state but is not added to the transport. Things are broken at this point: the libfc rport is ready, but no transport rport has been created.
6) Now a shut causes the port state to change to DELETE and the event to change to STOP; no work gets scheduled.
7) A no-shut causes the port state to change to RESTART; the event remains STOP and no work gets scheduled.

(6) and (7) now get repeated every time we do shut/no-shut. There is no way to get out of this state. An fcc reset does not help either. The only way out is to unload and reload the module.

Fix is to set rdata->event to NONE while processing the STOP/LOGO/FAILED events, inside the discovery and rport locks.

Signed-off-by: Abhijeet Joglekar <abjoglek@cisco.com>
Signed-off-by: Robert Love <robert.w.love@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
author: Abhijeet Joglekar <abjoglek@cisco.com> 2009-12-10 12:59:20 -0500
committer: James Bottomley <James.Bottomley@suse.de> 2009-12-12 17:29:47 -0500
commit: 5543c72e2bbb30e5ba5938b18ec26617b8b3fb04 (patch)
tree: eca32103b2d2b70f8aa8499144a8db2c40b00645 /drivers
parent: 83e7332941e3e2621502aadb0e5c8a3b11fd1197 (diff)
1 files changed, 1 insertions, 0 deletions
diff --git a/drivers/scsi/libfc/fc_rport.c b/drivers/scsi/libfc/fc_rport.c
index 35ca0e72df46..02300523b234 100644
--- a/drivers/scsi/libfc/fc_rport.c
+++ b/drivers/scsi/libfc/fc_rport.c
@@ -310,6 +310,7 @@ static void fc_rport_work(struct work_struct *work)
                                restart = 1;
                        else
                                list_del(&rdata->peers);
+                        rdata->event = RPORT_EV_NONE;
                        mutex_unlock(&rdata->rp_mutex);
                        mutex_unlock(&lport->disc.disc_mutex);
                }
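
The stuck-port sequence from the commit message can be reproduced with a small user-space model. This is only a sketch, not kernel code: every `model_*` name, the struct, and the enums are hypothetical simplifications, the PLOGI exchange is collapsed into a single synchronous step, and locking is omitted. The one rule taken from the description is that work is queued only when no event is already pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the libfc rport states and events (hypothetical). */
enum model_state { ST_READY, ST_DELETE, ST_RESTART };
enum model_event { EV_NONE, EV_READY, EV_STOP };

struct model_rport {
    enum model_state state;
    enum model_event event;
    bool work_pending;  /* models fc_rport_work being queued */
    bool in_transport;  /* models the rport being known to the scsi transport */
};

/* Post an event; work is queued only if no event was already pending. */
static void model_post_event(struct model_rport *rp, enum model_event ev)
{
    if (rp->event == EV_NONE)
        rp->work_pending = true;
    rp->event = ev;
}

/* RSCN caused by a shut: the port is marked for deletion. */
static void model_shut(struct model_rport *rp)
{
    rp->state = ST_DELETE;
    model_post_event(rp, EV_STOP);
}

/* RSCN caused by a no-shut while deletion is pending: only the state changes. */
static void model_no_shut(struct model_rport *rp)
{
    assert(rp->state == ST_DELETE);
    rp->state = ST_RESTART;
}

/* Models one run of the work handler; apply_fix selects the patched behaviour. */
static void model_rport_work(struct model_rport *rp, bool apply_fix)
{
    bool restart;

    if (!rp->work_pending)
        return;
    rp->work_pending = false;

    switch (rp->event) {
    case EV_READY:
        rp->event = EV_NONE;
        rp->in_transport = true;
        break;
    case EV_STOP:
        rp->in_transport = false;
        restart = (rp->state == ST_RESTART);
        if (apply_fix)
            rp->event = EV_NONE;  /* the one-line fix */
        if (restart) {
            /* Login collapsed into one step: port becomes READY again. */
            rp->state = ST_READY;
            model_post_event(rp, EV_READY);
        }
        break;
    case EV_NONE:
        break;
    }
}

/* Steps 1 to 5 of the commit message: shut, no-shut, then two work runs. */
static struct model_rport model_run(bool apply_fix)
{
    struct model_rport rp = { ST_READY, EV_NONE, false, true };

    model_shut(&rp);
    model_no_shut(&rp);
    model_rport_work(&rp, apply_fix);  /* handles STOP, restarts login */
    model_rport_work(&rp, apply_fix);  /* handles READY, if it was queued */
    return rp;
}
```

Calling `model_run(false)` leaves the modelled port in READY state with a READY event but not in the transport, and later `model_shut`/`model_no_shut` calls never queue work again. Calling `model_run(true)` ends with the port in the transport and the event back at NONE.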