path: root/net/ipv4/tcp_input.c
author: Cong Wang <amwang@redhat.com> 2013-06-14 21:39:18 -0400
committer: David S. Miller <davem@davemloft.net> 2013-06-20 02:06:51 -0400
commit: bcefe17cffd06efdda3e7ad679ea743236e6271a
tree: 5a21d15192afc50529274bc614541cacbf4fa73f /net/ipv4/tcp_input.c
parent: 2c0740e4e122239bcf6127fd2063733c5fb20c93
tcp: introduce a per-route knob for quick ack
In previous discussions, I tried to find some reasonable heuristics for delayed ACK, however this seems not possible, according to Eric:

"ACKS might also be delayed because of bidirectional traffic, and is more controlled by the application response time. TCP stack can not easily estimate it."

"ACK can be incredibly useful to recover from losses in a short time. The vast majority of TCP sessions are small lived, and we send one ACK per received segment anyway at beginning or retransmits to let the sender smoothly increase its cwnd, so an auto-tuning facility wont help them that much."

and according to David:

"ACKs are the only information we have to detect loss. And, for the same reasons that TCP VEGAS is fundamentally broken, we cannot measure the pipe or some other receiver-side-visible piece of information to determine when it's "safe" to stretch ACK. And even if it's "safe", we should not do it so that losses are accurately detected and we don't spuriously retransmit. The only way to know when the bandwidth increases is to "test" it, by sending more and more packets until drops happen. That's why all successful congestion control algorithms must operate on explicited tested pieces of information. Similarly, it's not really possible to universally know if it's safe to stretch ACK or not."

It still makes sense to enable or disable quick ack mode like what TCP_QUICKACK does.

Similar to the TCP_QUICKACK option, but for people who can't modify the source code and still want to control TCP delayed ACK behavior. As David suggested, this should belong to per-path scope, since different paths may want different behaviors.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Rick Jones <rick.jones2@hp.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Graf <tgraf@suug.ch>
CC: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
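For comparison, the per-socket option the message refers to is spelled TCP_QUICKACK in the Linux headers and is set with setsockopt(), which requires changing the application. The following is a minimal user-space sketch of that existing interface, not part of this patch; the helper name set_quickack is hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: switch quick ACK mode on (on = 1) or off (on = 0)
 * for a single TCP socket. Returns 0 on success, -1 on error with errno
 * set, exactly as setsockopt() does.
 */
static int set_quickack(int fd, int on)
{
	return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &on, sizeof(on));
}
```

According to tcp(7), this flag is not permanent: the stack may enter or leave quick ACK mode again later, so applications that rely on it typically set it again as needed. A per-route metric avoids touching the application at all.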
Diffstat (limited to 'net/ipv4/tcp_input.c')
-rw-r--r-- net/ipv4/tcp_input.c | 5
1 file changed, 4 insertions, 1 deletion
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 46271cdcf088..28af45abe062 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3717,6 +3717,7 @@ void tcp_reset(struct sock *sk)
 static void tcp_fin(struct sock *sk)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
+	const struct dst_entry *dst;
 
 	inet_csk_schedule_ack(sk);
 
@@ -3728,7 +3729,9 @@ static void tcp_fin(struct sock *sk)
 	case TCP_ESTABLISHED:
 		/* Move to CLOSE_WAIT */
 		tcp_set_state(sk, TCP_CLOSE_WAIT);
-		inet_csk(sk)->icsk_ack.pingpong = 1;
+		dst = __sk_dst_get(sk);
+		if (!dst || !dst_metric(dst, RTAX_QUICKACK))
+			inet_csk(sk)->icsk_ack.pingpong = 1;
 		break;
 
 	case TCP_CLOSE_WAIT:
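
The condition added in the TCP_ESTABLISHED case can be modeled outside the kernel as a small predicate. This is only an illustrative sketch: struct route_model and fin_sets_pingpong are hypothetical names standing in for struct dst_entry and the inline check, and the quickack field stands in for dst_metric(dst, RTAX_QUICKACK).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dst_entry; only the metric is modeled. */
struct route_model {
	unsigned int quickack;	/* models dst_metric(dst, RTAX_QUICKACK) */
};

/*
 * Returns true when a FIN received in TCP_ESTABLISHED would still set
 * icsk_ack.pingpong = 1 under the patched logic: either no route is
 * cached on the socket, or the route's quickack metric is zero.
 */
static bool fin_sets_pingpong(const struct route_model *dst)
{
	return dst == NULL || dst->quickack == 0;
}
```

With a NULL route or a zero metric the function returns true, matching the behavior before the patch. Only a route with a nonzero quickack metric skips the pingpong assignment.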