author Neal Cardwell <ncardwell@google.com> 2017-12-11 18:42:53 -0500
committer David S. Miller <davem@davemloft.net> 2017-12-13 13:59:21 -0500
commit b4f70c3d4ec32a2ff4c62e1e2da0da5f55fe12bd
tree 2bb3c45c52f11fbaf4dd51e5529c2fb6e86bab5c /net/ipv4/tcp_output.c
parent 039af9c66b93154b493e3088a36b251b99c9b3c4
tcp: allow TLP in ECN CWR
This patch enables tail loss probe (TLP) in cwnd reduction (CWR) state to detect potential losses. Prior to this patch, since the sender uses PRR (Proportional Rate Reduction) to determine the cwnd in CWR state, the combination of CWR+PRR plus tcp_tso_should_defer() could cause unnecessary stalls upon losses: PRR makes cwnd so gentle that tcp_tso_should_defer() defers sending to wait for more ACKs. The ACKs may not come due to packet losses.

Disallowing TLP when there is unused cwnd had the primary effect of disallowing TLP when there is TSO deferral, Nagle deferral, or when we hit the rwin limit. This is because basically every application write() or incoming ACK causes us to run tcp_write_xmit() to see if we can send more, and then, if we sent something, we call tcp_schedule_loss_probe() to see if we should schedule a TLP. At that point, there are a few common reasons why some cwnd budget could still be unused:

(a) rwin (receive window) limit
(b) Nagle check
(c) TSO deferral
(d) TSQ (TCP Small Queues)

For (d), after the next packet tx completion the TSQ mechanism will allow us to send more packets, so we don't really need a TLP (in practice it shouldn't matter whether we schedule one or not). But for (a), (b), and (c), the sender won't send any more packets until it gets another ACK. If the whole flight was lost, or all the ACKs were lost, then we won't get any more ACKs, and ideally we should schedule and send a TLP to get more feedback. In particular, for a long time we have wanted some kind of timer for TSO deferral, and this would at least give us one.

Reported-by: Steve Ibanez <sibanez@stanford.edu>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Nandita Dukkipati <nanditad@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
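The first paragraph attributes the stall to PRR shrinking the sending budget in CWR. A minimal sketch of the proportional part of PRR as described in RFC 6937 shows why: new budget is released only in proportion to newly delivered data, so if ACKs stop arriving, the budget stops growing. The struct, its field names, and prr_sndcnt() are hypothetical illustrations (prior_cwnd stands in for RFC 6937's RecoverFS), not the kernel's tcp_cwnd_reduction(); the RFC's reduction-bound branch is omitted.

```c
#include <stdint.h>

/* Hypothetical PRR bookkeeping, modeled loosely on RFC 6937. */
struct prr_state {
	uint32_t ssthresh;      /* target cwnd at the end of the reduction */
	uint32_t prior_cwnd;    /* cwnd when the reduction began; must be nonzero */
	uint32_t prr_delivered; /* packets newly delivered (ACKed/SACKed) since then */
	uint32_t prr_out;       /* packets transmitted since then */
};

/*
 * Packets the sender may transmit right now under the proportional part
 * of PRR: ceil(prr_delivered * ssthresh / prior_cwnd) - prr_out,
 * clamped at zero. Without new deliveries (no ACKs), the result never grows.
 */
static uint32_t prr_sndcnt(const struct prr_state *s)
{
	uint64_t allowed = ((uint64_t)s->prr_delivered * s->ssthresh +
			    s->prior_cwnd - 1) / s->prior_cwnd;

	return allowed > s->prr_out ? (uint32_t)(allowed - s->prr_out) : 0;
}
```

With ssthresh = 7 and prior_cwnd = 10, one delivered packet permits one transmission, but after four deliveries and three transmissions the budget is back to zero. A budget of only a packet or so is the kind of small window that tcp_tso_should_defer() may hold back while waiting for more ACKs, which is the stall the commit message describes.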
Diffstat (limited to 'net/ipv4/tcp_output.c')
-rw-r--r-- net/ipv4/tcp_output.c 9
1 files changed, 3 insertions, 6 deletions
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index a4d214c7b506..04be9f833927 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2414,15 +2414,12 @@ bool tcp_schedule_loss_probe(struct sock *sk, bool advancing_rto)
 
 	early_retrans = sock_net(sk)->ipv4.sysctl_tcp_early_retrans;
 	/* Schedule a loss probe in 2*RTT for SACK capable connections
-	 * in Open state, that are either limited by cwnd or application.
+	 * not in loss recovery, that are either limited by cwnd or application.
 	 */
 	if ((early_retrans != 3 && early_retrans != 4) ||
 	    !tp->packets_out || !tcp_is_sack(tp) ||
-	    icsk->icsk_ca_state != TCP_CA_Open)
-		return false;
-
-	if ((tp->snd_cwnd > tcp_packets_in_flight(tp)) &&
-	     !tcp_write_queue_empty(sk))
+	    (icsk->icsk_ca_state != TCP_CA_Open &&
+	     icsk->icsk_ca_state != TCP_CA_CWR))
 		return false;
 
 	/* Probe timeout is 2*rtt. Add minimum RTO to account
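The hunk changes only the early-return gate at the top of tcp_schedule_loss_probe(). A standalone model of that gate, before and after the patch, makes the two behavioral changes explicit: CWR joins Open as an eligible state, and spare cwnd with queued data no longer suppresses the probe. The enum, the struct, and both functions are hypothetical stand-ins for the kernel's TCP_CA_* states and socket fields; the rest of the function (probe timeout computation and timer arming) is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel's congestion-avoidance states (TCP_CA_*). */
enum ca_state { CA_OPEN, CA_DISORDER, CA_CWR, CA_RECOVERY, CA_LOSS };

/* Hypothetical snapshot of the sender fields the gate reads. */
struct tlp_input {
	int early_retrans;      /* value of the tcp_early_retrans sysctl */
	uint32_t packets_out;   /* packets sent but not yet acknowledged */
	bool sack;              /* connection negotiated SACK */
	enum ca_state ca_state;
	uint32_t snd_cwnd;
	uint32_t in_flight;     /* result of tcp_packets_in_flight() */
	bool write_queue_empty;
};

static bool tlp_base_ok(const struct tlp_input *in)
{
	return (in->early_retrans == 3 || in->early_retrans == 4) &&
	       in->packets_out && in->sack;
}

/* Gate before the patch: Open only, and no TLP while cwnd has room and
 * data is queued (TSO/Nagle deferral or an rwin limit). */
static bool tlp_allowed_before(const struct tlp_input *in)
{
	if (!tlp_base_ok(in) || in->ca_state != CA_OPEN)
		return false;
	if (in->snd_cwnd > in->in_flight && !in->write_queue_empty)
		return false;
	return true;
}

/* Gate after the patch: Open or CWR; unused cwnd no longer blocks TLP. */
static bool tlp_allowed_after(const struct tlp_input *in)
{
	return tlp_base_ok(in) &&
	       (in->ca_state == CA_OPEN || in->ca_state == CA_CWR);
}
```

For a CWR sender with one packet of spare cwnd and data waiting in the write queue (the TSO-deferral case from the commit message), the old gate refuses a probe and the new one allows it; Recovery and Loss remain excluded in both versions.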