author | Eric Dumazet <edumazet@google.com> | 2014-02-26 17:02:48 -0500 |
---|---|---|
committer | David S. Miller <davem@davemloft.net> | 2014-02-26 17:08:40 -0500 |
commit | 740b0f1841f6e39085b711d41db9ffb07198682b | |
tree | 7befd549fc20c51bff4c79790ad4520fcc0e324e /net/ipv4/tcp.c | |
parent | 363ec392352e55c61ce2799c3f15f89f9429bba7 | |
tcp: switch rtt estimations to usec resolution
Upcoming congestion controls for TCP require usec resolution for RTT
estimations. Millisecond resolution is simply not enough these days.

FQ/pacing in DC environments also requires this change, for finer control
and to remove the bimodal behavior caused by the current hack in
tcp_update_pacing_rate() for 'small rtt'.

TCP_CONG_RTT_STAMP is no longer needed.

As Julian Anastasov pointed out, we need to keep user compatibility:
tcp_metrics used to export RTT and RTTVAR in msec resolution,
so we added RTT_US and RTTVAR_US. An iproute2 patch is needed
to use the new attributes if provided by the kernel.
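The compatibility rule described above (use the usec attribute when the kernel provides it, otherwise fall back to the msec one) can be sketched as follows. This is only an illustration: `struct rtt_metric` and `rtt_metric_usec()` are hypothetical names, the struct stands for values a tool has already decoded, and any fixed-point scaling of the exported attributes is deliberately not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, already-decoded view of one RTT metric in a userspace tool. */
struct rtt_metric {
	bool has_usec;		/* newer kernel supplied the usec attribute */
	uint32_t usec;
	bool has_msec;		/* legacy msec attribute is present */
	uint32_t msec;
};

/* Return the RTT in usec, preferring the finer-grained value when present. */
uint64_t rtt_metric_usec(const struct rtt_metric *m)
{
	if (m->has_usec)
		return m->usec;
	if (m->has_msec)
		return (uint64_t)m->msec * 1000;	/* old kernels: msec only */
	return 0;					/* metric not reported */
}
```

An old binary that only knows the msec attribute keeps working unchanged, which is the point of adding new attributes instead of redefining the existing ones.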
In this example, the ss command displays an srtt of 32 usecs (10Gbit link):

lpk51:~# ./ss -i dst lpk52
Netid  State  Recv-Q Send-Q  Local Address:Port  Peer Address:Port
tcp    ESTAB  0      1       10.246.11.51:42959  10.246.11.52:64614
         cubic wscale:6,6 rto:201 rtt:0.032/0.001 ato:40 mss:1448 cwnd:10 send 3620.0Mbps pacing_rate 7240.0Mbps unacked:1 rcv_rtt:993 rcv_space:29559
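The send figure in this output can be reproduced from cwnd, mss and rtt, which illustrates why the resolution matters. A rough sketch, assuming the rate is simply cwnd * mss * 8 / rtt; `send_rate_bps()` is a hypothetical helper, not the code ss uses:

```c
#include <stdint.h>

/* Bits per second when cwnd segments of mss bytes are sent once per RTT. */
uint64_t send_rate_bps(uint32_t cwnd, uint32_t mss, uint32_t rtt_us)
{
	if (rtt_us == 0)
		return 0;	/* no RTT sample yet */
	/* Fine for realistic values; extreme inputs could overflow 64 bits. */
	return (uint64_t)cwnd * mss * 8 * 1000000ULL / rtt_us;
}
```

With cwnd 10, mss 1448 and rtt 32 usec this gives 3620000000 bit/s, matching the 3620.0Mbps shown, and the pacing_rate in the output is exactly twice that figure. If the same RTT were recorded as 1 msec, the same formula would give only 115.84 Mbps.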
The updated iproute2 ip command displays:

lpk51:~# ./ip tcp_metrics | grep 10.246.11.52
10.246.11.52 age 561.914sec cwnd 10 rtt 274us rttvar 213us source 10.246.11.51

The old binary displays:

lpk51:~# ip tcp_metrics | grep 10.246.11.52
10.246.11.52 age 561.914sec cwnd 10 rtt 250us rttvar 125us source 10.246.11.51
With help from Julian Anastasov, Stephen Hemminger and Yuchung Cheng
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Larry Brakmo <brakmo@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
Diffstat (limited to 'net/ipv4/tcp.c')
-rw-r--r-- | net/ipv4/tcp.c | 8 |
1 files changed, 4 insertions, 4 deletions
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index bed379c7abcd..7374905b3701 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -387,7 +387,7 @@ void tcp_init_sock(struct sock *sk)
 	INIT_LIST_HEAD(&tp->tsq_node);
 
 	icsk->icsk_rto = TCP_TIMEOUT_INIT;
-	tp->mdev = TCP_TIMEOUT_INIT;
+	tp->mdev_us = jiffies_to_usecs(TCP_TIMEOUT_INIT);
 
 	/* So many TCP implementations out there (incorrectly) count the
 	 * initial SYN frame in their delayed-ACK and congestion control
@@ -2339,7 +2339,7 @@ int tcp_disconnect(struct sock *sk, int flags)
 
 	sk->sk_shutdown = 0;
 	sock_reset_flag(sk, SOCK_DONE);
-	tp->srtt = 0;
+	tp->srtt_us = 0;
 	if ((tp->write_seq += tp->max_window + 2) == 0)
 		tp->write_seq = 1;
 	icsk->icsk_backoff = 0;
@@ -2783,8 +2783,8 @@ void tcp_get_info(const struct sock *sk, struct tcp_info *info)
 
 	info->tcpi_pmtu = icsk->icsk_pmtu_cookie;
 	info->tcpi_rcv_ssthresh = tp->rcv_ssthresh;
-	info->tcpi_rtt = jiffies_to_usecs(tp->srtt)>>3;
-	info->tcpi_rttvar = jiffies_to_usecs(tp->mdev)>>2;
+	info->tcpi_rtt = tp->srtt_us >> 3;
+	info->tcpi_rttvar = tp->mdev_us >> 2;
 	info->tcpi_snd_ssthresh = tp->snd_ssthresh;
 	info->tcpi_snd_cwnd = tp->snd_cwnd;
 	info->tcpi_advmss = tp->advmss;
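
The tcp_get_info() change relies on the fixed-point convention visible in its shifts: the smoothed RTT is read back with `>> 3` and the deviation with `>> 2`, so the stored fields hold 8 times and 4 times the real values. A minimal sketch of an estimator that keeps its state in that form, assuming the classic 1/8 and 1/4 gains; `struct rtt_est` and the function names are hypothetical, and this is a simplification rather than the kernel's tcp_rtt_estimator():

```c
#include <stdint.h>

/* Hypothetical stand-in for the two tcp_sock fields touched by the diff. */
struct rtt_est {
	uint32_t srtt_us;	/* smoothed RTT in usec, scaled by 8 */
	uint32_t mdev_us;	/* mean deviation in usec, scaled by 4 */
};

/* Feed one RTT sample (in usec) into the estimator. */
void rtt_est_sample(struct rtt_est *e, uint32_t sample_us)
{
	int64_t err;

	if (e->srtt_us == 0) {
		/* First sample: srtt = sample, mdev = sample / 2. */
		e->srtt_us = sample_us << 3;
		e->mdev_us = sample_us << 1;
		return;
	}

	/* Error against the current unscaled estimate. */
	err = (int64_t)sample_us - (e->srtt_us >> 3);
	/* Adding err to the x8 value moves the real srtt by err / 8. */
	e->srtt_us = (uint32_t)(e->srtt_us + err);

	if (err < 0)
		err = -err;
	err -= e->mdev_us >> 2;
	/* Adding to the x4 value moves the real mdev by (|err| - mdev) / 4. */
	e->mdev_us = (uint32_t)(e->mdev_us + err);
}

/* Unscaled values, mirroring the shifts used for tcpi_rtt and tcpi_rttvar. */
uint32_t rtt_est_rtt_us(const struct rtt_est *e)
{
	return e->srtt_us >> 3;
}

uint32_t rtt_est_rttvar_us(const struct rtt_est *e)
{
	return e->mdev_us >> 2;
}
```

Because the state is already in usec, reporting needs only the shifts, with no jiffies_to_usecs() conversion, which is what the last hunk does.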