tipc: Ensure both nodes recognize loss of contact between them

Enhances TIPC to ensure that a node that loses contact with a neighboring node does not allow contact to be re-established until it sees that its peer has also recognized the loss of contact.

Previously, nodes that were connected by two or more links could encounter a situation in which node A would lose contact with node B on all of its links, purge its name table of names published by B, and then fail to repopulate those names once contact with B was restored. This would happen because B was able to re-establish one or more links so quickly that it never reached a point where it had no links to A -- meaning that B never saw a loss of contact with A, and consequently didn't re-publish its names to A.

This problem is now prevented by enhancing the cleanup done by TIPC following a loss of contact with a neighboring node to ensure that node A ignores all messages sent by B until it receives a LINK_PROTOCOL message that indicates B has lost contact with A, thereby preventing the (re)establishment of links between the nodes. The loss of contact is recognized when a RESET or ACTIVATE message is received that has a "redundant link exists" field of 0, indicating that B's sending link endpoint is in a reset state and that B has no other working links.

Additionally, TIPC now suppresses the sending of (most) link protocol messages to a neighboring node while it is cleaning up after an earlier loss of contact with that node. This stops the peer node from prematurely activating its link endpoint, which would prevent TIPC from later activating its own end. TIPC still allows outgoing RESET messages to occur during cleanup, to avoid problems if its own node recognizes the loss of contact first and tries to notify the peer of the situation.

Finally, TIPC now recognizes an impending loss of contact with a peer node as soon as it receives a RESET message on a working link that is the peer's only link to the node, and ensures that the link protocol suppression mentioned above goes into effect right away -- that is, even before its own link endpoints have failed. This is necessary to ensure correct operation when there are redundant links between the nodes, since otherwise TIPC would send an ACTIVATE message upon receiving a RESET on its first link and only begin suppressing when a RESET on its second link was received, instead of initiating suppression with the first RESET message as it needs to.

Note: The reworked cleanup code also eliminates a check that prevented a link endpoint's discovery object from responding to incoming messages while stale name table entries are being purged. This check is now unnecessary and would have slowed down re-establishment of communication between the nodes in some situations.

Signed-off-by: Allan Stephens <allan.stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
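
Illustration (not part of the patch): the two link-level rules described in the message above can be modeled as a pair of small predicates. This is only a sketch under stated assumptions. The enum and both function names are hypothetical, and the real logic lives in the link code rather than in node.c. The message says "(most)" protocol messages are suppressed and names only RESET as still allowed, so the sketch treats RESET as the sole exception.

```c
#include <stdbool.h>

/* Hypothetical message kinds for this sketch. Only RESET and ACTIVATE are
 * named in the commit message; SKETCH_STATE stands in for any other link
 * protocol message.
 */
enum sketch_proto_msg { SKETCH_STATE, SKETCH_RESET, SKETCH_ACTIVATE };

/* While cleanup after a loss of contact is in progress, outgoing link
 * protocol messages are suppressed, except RESET, which must still go out
 * so the peer can be told about the loss.
 */
static bool sketch_may_send(bool cleanup_in_progress,
			    enum sketch_proto_msg type)
{
	return !cleanup_in_progress || type == SKETCH_RESET;
}

/* A RESET received on a working link whose "redundant link exists" field
 * is 0 means the peer has no other working link, so suppression has to
 * start immediately, before the local link endpoints have failed.
 */
static bool sketch_reset_starts_suppression(enum sketch_proto_msg type,
					    bool link_working,
					    bool redundant_link_exists)
{
	return type == SKETCH_RESET && link_working && !redundant_link_exists;
}
```

With redundant links, the second predicate is what makes the first RESET trigger suppression: without it, the node would answer that RESET with an ACTIVATE and only start suppressing on the RESET for its second link.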
author: Allan Stephens <allan.stephens@windriver.com> 2011-05-27 11:00:51 -0400
committer: Paul Gortmaker <paul.gortmaker@windriver.com> 2011-09-17 22:55:03 -0400
commit: b4b5610223f17790419b03eaa962b0e3ecf930d7 (patch)
tree: a8cdae892e3e2eac0ea1e5493cded8394f8a5d6d /net/tipc/node.c
parent: 4b3743ef2ca67e1f8ef7e9d4c551d6ba6ee85584 (diff)
1 files changed, 6 insertions, 5 deletions
diff --git a/net/tipc/node.c b/net/tipc/node.c
index d75432f5e726..27b4bb0cca6c 100644
--- a/net/tipc/node.c
+++ b/net/tipc/node.c
@@ -112,6 +112,7 @@ struct tipc_node *tipc_node_create(u32 addr)
                        break;
        }
        list_add_tail(&n_ptr->list, &temp_node->list);
+        n_ptr->block_setup = WAIT_PEER_DOWN;
        tipc_num_nodes++;
@@ -312,7 +313,7 @@ static void node_established_contact(struct tipc_node *n_ptr)
        }
 }
-static void node_cleanup_finished(unsigned long node_addr)
+static void node_name_purge_complete(unsigned long node_addr)
 {
        struct tipc_node *n_ptr;
@@ -320,7 +321,7 @@ static void node_cleanup_finished(unsigned long node_addr)
        n_ptr = tipc_node_find(node_addr);
        if (n_ptr) {
                tipc_node_lock(n_ptr);
-                n_ptr->cleanup_required = 0;
+                n_ptr->block_setup &= ~WAIT_NAMES_GONE;
                tipc_node_unlock(n_ptr);
        }
        read_unlock_bh(&tipc_net_lock);
@@ -371,10 +372,10 @@ static void node_lost_contact(struct tipc_node *n_ptr)
        /* Notify subscribers */
        tipc_nodesub_notify(n_ptr);
-        /* Prevent re-contact with node until all cleanup is done */
+        /* Prevent re-contact with node until cleanup is done */
-        n_ptr->cleanup_required = 1;
+        n_ptr->block_setup = WAIT_PEER_DOWN | WAIT_NAMES_GONE;
-        tipc_k_signal((Handler)node_cleanup_finished, n_ptr->addr);
+        tipc_k_signal((Handler)node_name_purge_complete, n_ptr->addr);
 }
 struct sk_buff *tipc_node_get_nodes(const void *req_tlv_area, int req_tlv_space)
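
Illustration (not part of the patch): the block_setup bookkeeping changed in the hunks above can be read as a small bitmask state machine. The following is a minimal, self-contained model, assuming that link setup is permitted only once every flag is clear. The flag names come from the diff, but their numeric values, the struct, and all model_* functions are hypothetical stand-ins. In particular, the step that clears WAIT_PEER_DOWN is not in node.c and is modeled here only from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag names appear in the diff; the numeric values are assumed. */
#define WAIT_PEER_DOWN  0x0001u	/* peer has not yet reported losing contact */
#define WAIT_NAMES_GONE 0x0002u	/* stale name table entries not yet purged */

/* Hypothetical stand-in for the single field of struct tipc_node used here. */
struct model_node {
	uint32_t block_setup;
};

/* Mirrors tipc_node_create(): a new node starts out waiting for the peer. */
static void model_node_create(struct model_node *n)
{
	n->block_setup = WAIT_PEER_DOWN;
}

/* Mirrors node_lost_contact(): both conditions must clear before re-contact. */
static void model_node_lost_contact(struct model_node *n)
{
	n->block_setup = WAIT_PEER_DOWN | WAIT_NAMES_GONE;
}

/* Mirrors node_name_purge_complete(): the name table purge has finished. */
static void model_name_purge_complete(struct model_node *n)
{
	n->block_setup &= ~WAIT_NAMES_GONE;
}

/* Assumed from the commit message: a RESET or ACTIVATE whose "redundant
 * link exists" field is 0 shows that the peer has also lost contact.
 */
static void model_peer_msg_seen(struct model_node *n,
				bool redundant_link_exists)
{
	if (!redundant_link_exists)
		n->block_setup &= ~WAIT_PEER_DOWN;
}

/* Assumption: links may be set up only when no flag remains set. */
static bool model_setup_allowed(const struct model_node *n)
{
	return n->block_setup == 0;
}
```

Using one bitmask instead of the old cleanup_required boolean lets the two conditions clear independently and in either order. Re-contact stays blocked until both the local name purge and the peer's acknowledgment of the loss have happened.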