aboutsummaryrefslogtreecommitdiffstats
path: root/fs
diff options
context:
space:
mode:
authorDavid Teigland <teigland@redhat.com>2007-01-15 11:28:22 -0500
committerSteven Whitehouse <swhiteho@redhat.com>2007-02-05 13:36:58 -0500
commit222d396092acc11b4af03bede309aa066945e920 (patch)
treeec8e0ac0183faad1167af24ff33ba23ba0db85b1 /fs
parenta1bc86e6bddd34362ca08a3a4d898eb4b5c15215 (diff)
[DLM] fix master recovery
If master recovery happens on an rsb in one recovery sequence, then that sequence is aborted before lock recovery happens, then in the next sequence, we rely on the previous master recovery (which may now be invalid due to another node ignoring a lookup result) and go on do to the lock recovery where we get stuck due to an invalid master value. recovery cycle begins: master of rsb X has left nodes A and B send node C an rcom lookup for X to find the new master C gets lookup from B first, sets B as new master, and sends reply back to B C gets lookup from A next, and sends reply back to A saying B is master A gets lookup reply from C and sets B as the new master in the rsb recovery cycle on A, B and C is aborted to start a new recovery B gets lookup reply from C and ignores it since there's a new recovery recovery cycle begins: some other node has joined B doesn't think it's the master of X so it doesn't rebuild it in the directory C looks up the master of X, no one is master, so it becomes new master B looks up the master of X, finds it's C A believes that B is the master of X, so it sends its lock to B B sends an error back to A A resends this repeats forever, the incorrect master value on A is never corrected The fix is to do master recovery on an rsb that still has the NEW_MASTER flag set from an earlier recovery sequence, and therefore didn't complete lock recovery. Signed-off-by: David Teigland <teigland@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Diffstat (limited to 'fs')
-rw-r--r--fs/dlm/recover.c4
1 files changed, 3 insertions, 1 deletions
diff --git a/fs/dlm/recover.c b/fs/dlm/recover.c
index a7fa4cb6cd16..c2cc7694cd16 100644
--- a/fs/dlm/recover.c
+++ b/fs/dlm/recover.c
@@ -397,7 +397,9 @@ int dlm_recover_masters(struct dlm_ls *ls)
397 397
398 if (dlm_no_directory(ls)) 398 if (dlm_no_directory(ls))
399 count += recover_master_static(r); 399 count += recover_master_static(r);
400 else if (!is_master(r) && dlm_is_removed(ls, r->res_nodeid)) { 400 else if (!is_master(r) &&
401 (dlm_is_removed(ls, r->res_nodeid) ||
402 rsb_flag(r, RSB_NEW_MASTER))) {
401 recover_master(r); 403 recover_master(r);
402 count++; 404 count++;
403 } 405 }