path: root/net/ipv4/tcp.c
author: Yaogong Wang <wygivan@google.com> 2016-09-07 17:49:28 -0400
committer: David S. Miller <davem@davemloft.net> 2016-09-08 20:25:58 -0400
commit: 9f5afeae51526b3ad7b7cb21ee8b145ce6ea7a7a
tree: f434343314c30020025c7a84507107c4aad60fd4 /net/ipv4/tcp.c
parent: 3b61075be0929569e4de8b905ae6628d3285442f
tcp: use an RB tree for ooo receive queue
Over the years, TCP BDP has increased by several orders of magnitude,
and some people are considering to reach the 2 Gbytes limit.

Even with current window scale limit of 14, ~1 Gbytes maps to ~740,000
MSS.

In presence of packet losses (or reorders), TCP stores incoming packets
into an out of order queue, and number of skbs sitting there waiting for
the missing packets to be received can be in the 10^5 range.

Most packets are appended to the tail of this queue, and when
packets can finally be transferred to receive queue, we scan the queue
from its head.

However, in presence of heavy losses, we might have to find an arbitrary
point in this queue, involving a linear scan for every incoming packet,
throwing away cpu caches.

This patch converts it to a RB tree, to get bounded latencies.

Yaogong wrote a preliminary patch about 2 years ago.
Eric did the rebase, added ofo_last_skb cache, polishing and tests.

Tested with network dropping between 1 and 10 % packets, with good
success (about 30 % increase of throughput in stress tests)

Next step would be to also use an RB tree for the write queue at sender
side ;)

Signed-off-by: Yaogong Wang <wygivan@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-By: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
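
The figures in the commit message can be checked with a little arithmetic. The sketch below is not part of the patch; it assumes an MSS of 1448 bytes (a 1500-byte MTU with IPv4 and TCP timestamps), which the message does not state, and it uses the textbook bound that a red-black tree with n nodes has height at most 2*log2(n+1). The helper names are made up for illustration.

```c
#include <stdint.h>

/* Number of full-sized segments needed to fill a window of `bytes`. */
static unsigned long segments_in_window(uint64_t bytes, unsigned int mss)
{
	return (unsigned long)(bytes / mss);
}

/* Smallest k such that 2^k >= n, for n >= 1. */
static unsigned int ceil_log2(unsigned long n)
{
	unsigned int bits = 0;
	unsigned long v = 1;

	while (v < n) {
		v <<= 1;
		bits++;
	}
	return bits;
}

/*
 * Upper bound on the number of nodes visited when descending a
 * red-black tree of n nodes: height <= 2 * log2(n + 1).
 * Rounding the logarithm up keeps this a valid (slightly loose) bound.
 */
static unsigned int rb_max_height(unsigned long n)
{
	return 2 * ceil_log2(n + 1);
}

/* Worst case for the old list: every queued skb may be visited. */
static unsigned long list_max_scan(unsigned long n)
{
	return n;
}
```

Under the assumed MSS, `segments_in_window(1ULL << 30, 1448)` gives 741534, in line with the "~740,000 MSS" quoted above. For 10^5 queued skbs, `list_max_scan(100000)` is 100000 steps per incoming packet, while `rb_max_height(100000)` is 34, which is the "bounded latencies" argument in numbers.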
Diffstat (limited to 'net/ipv4/tcp.c')
-rw-r--r-- net/ipv4/tcp.c 4
1 files changed, 2 insertions, 2 deletions
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 77311a92275c..a13fcb369f52 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -380,7 +380,7 @@ void tcp_init_sock(struct sock *sk)
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
 
-	__skb_queue_head_init(&tp->out_of_order_queue);
+	tp->out_of_order_queue = RB_ROOT;
 	tcp_init_xmit_timers(sk);
 	tcp_prequeue_init(tp);
 	INIT_LIST_HEAD(&tp->tsq_node);
@@ -2243,7 +2243,7 @@ int tcp_disconnect(struct sock *sk, int flags)
 	tcp_clear_xmit_timers(sk);
 	__skb_queue_purge(&sk->sk_receive_queue);
 	tcp_write_queue_purge(sk);
-	__skb_queue_purge(&tp->out_of_order_queue);
+	skb_rbtree_purge(&tp->out_of_order_queue);
 
 	inet->inet_dport = 0;
 
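
The two hunks above swap list initialisation and list purge for their tree equivalents. As a rough userspace illustration of that init/insert/purge lifecycle, here is a minimal sketch. It is not the kernel code: `seg_node`, `seg_root`, `SEG_ROOT`, `seg_insert` and `seg_purge` are hypothetical names, and the tree is a plain binary search tree with no red-black rebalancing (the kernel relies on `<linux/rbtree.h>` for that). Only the wraparound-safe sequence comparison mirrors the kernel's `before()` helper.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an skb queued out of order, keyed by sequence number. */
struct seg_node {
	uint32_t seq;
	struct seg_node *left, *right;
};

/* Hypothetical stand-in for struct rb_root: an empty tree is a NULL pointer. */
struct seg_root {
	struct seg_node *node;
};
#define SEG_ROOT ((struct seg_root){ NULL })

/* Wraparound-safe "a is before b", same idea as the kernel's before(). */
static int seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* Insert by descending from the root; returns 0, or -1 on duplicate/allocation failure. */
static int seg_insert(struct seg_root *root, uint32_t seq)
{
	struct seg_node **link = &root->node;
	struct seg_node *n;

	while (*link) {
		if ((*link)->seq == seq)
			return -1;
		link = seq_before(seq, (*link)->seq) ? &(*link)->left
						     : &(*link)->right;
	}
	n = calloc(1, sizeof(*n));
	if (!n)
		return -1;
	n->seq = seq;
	*link = n;
	return 0;
}

/* Free children before their parent (post-order); returns the number of nodes freed. */
static unsigned int free_subtree(struct seg_node *n)
{
	unsigned int freed;

	if (!n)
		return 0;
	freed = free_subtree(n->left) + free_subtree(n->right);
	free(n);
	return freed + 1;
}

/* Drop every queued node and leave the root empty, ready for reuse. */
static unsigned int seg_purge(struct seg_root *root)
{
	unsigned int freed = free_subtree(root->node);

	*root = SEG_ROOT;
	return freed;
}
```

Resetting the root at the end of the purge matters for the `tcp_disconnect()` path, where the socket may be reused: the queue ends up in the same empty state that `tcp_init_sock()` establishes.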