author Florian Westphal <fw@strlen.de> 2014-11-03 11:35:03 -0500
committer David S. Miller <davem@davemloft.net> 2014-11-04 16:06:09 -0500
commit f7b3bec6f5167efaf56b756abfafb924cb1d3050
tree 511fb5930fd9d2eaeb040c901287dbc82c8d4c0d /net/ipv4/tcp_output.c
parent f1673381b1481a409238d4552a0700d490c5b36c
net: allow setting ecn via routing table

This patch allows setting ECN on a per-route basis in case the sysctl
tcp_ecn is not set to 1. In other words, when ECN is set for specific
routes, it provides a tcp_ecn=1 behaviour for that route while the rest
of the stack acts according to the global settings.

One can use 'ip route change dev $dev $net features ecn' to toggle this.

Having a more fine-grained per-route setting can be beneficial for various
reasons, for example, 1) within data centers, or 2) local ISPs may deploy
ECN support for their own video/streaming services [1], etc.

There was a recent measurement study/paper [2] which scanned Alexa's
publicly available top million websites list from vantage points in the
US, Europe and Asia:

Half of the Alexa list will now happily use ECN (tcp_ecn=2, most likely
attributable to commit 255cac91c3 ("tcp: extend ECN sysctl to allow
server-side only ECN") ;)); on-path breaks in connectivity were found in
about 1 in 10,000 cases. Timeouts rather than receiving back RSTs were
much more common in the negotiation phase (and mostly seen in the Alexa
middle band, ranks around 50k-150k): of 12 thousand hosts on which there
_may_ be ECN-linked connection failures, only 79 failed with RST while
_not_ failing with RST when ECN was not requested.

It's unclear, though, how much equipment in the wild actually marks CE
when buffers start to fill up.

We thought about a fallback to non-ECN for retransmitted SYNs as another
global option (which could perhaps one day be made default), but as Eric
points out, there's much more work needed to detect broken middleboxes.

Two examples Eric mentioned are buggy firewalls that accept only a single
SYN per flow, and middleboxes that successfully let an ECN flow establish,
but later mark CE for all packets (so cwnd converges to 1).

 [1] http://www.ietf.org/proceedings/89/slides/slides-89-tsvarea-1.pdf, p.15
 [2] http://ecn.ethz.ch/

Joint work with Daniel Borkmann.

Reference: http://thread.gmane.org/gmane.linux.network/335797
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

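
The decision described in the message (global sysctl first, then the congestion control requirement, then the per-route feature) can be sketched as a small userspace model. This is only an illustration, not kernel code: the function name syn_requests_ecn and its plain int/bool parameters are hypothetical stand-ins for the socket, sysctl and dst_entry state that the real tcp_ecn_send_syn() consults.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the SYN-time ECN decision.
 *
 *   sysctl_tcp_ecn    - value of the global tcp_ecn sysctl (0, 1 or 2)
 *   ca_needs_ecn      - the congestion control module requires ECN
 *   has_route         - a cached route exists for the socket
 *   route_feature_ecn - that route carries the "ecn" feature
 *
 * Returns true when an outgoing SYN should request ECN.
 */
bool syn_requests_ecn(int sysctl_tcp_ecn, bool ca_needs_ecn,
		      bool has_route, bool route_feature_ecn)
{
	bool use_ecn = sysctl_tcp_ecn == 1 || ca_needs_ecn;

	/* The route is consulted only when the global checks say no. */
	if (!use_ecn && has_route && route_feature_ecn)
		use_ecn = true;

	return use_ecn;
}

/* Small self-check covering the cases named in the commit message. */
void syn_requests_ecn_self_check(void)
{
	/* tcp_ecn=1: ECN is requested regardless of the route. */
	assert(syn_requests_ecn(1, false, false, false));
	/* tcp_ecn=2, route flagged: behaves like tcp_ecn=1 for that route. */
	assert(syn_requests_ecn(2, false, true, true));
	/* tcp_ecn=2, route not flagged: global behaviour, no ECN on SYN. */
	assert(!syn_requests_ecn(2, false, true, false));
	/* No cached route: the feature flag cannot apply. */
	assert(!syn_requests_ecn(0, false, false, true));
}
```

In this model a route flagged with the ecn feature turns a tcp_ecn=0 or tcp_ecn=2 host into an ECN initiator for that destination only, which matches the "tcp_ecn=1 behaviour for that route" wording above.
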
Diffstat (limited to 'net/ipv4/tcp_output.c')
-rw-r--r-- net/ipv4/tcp_output.c 13
1 file changed, 11 insertions, 2 deletions
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index a3d453b94747..0b88158dd4a7 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -333,10 +333,19 @@ static void tcp_ecn_send_synack(struct sock *sk, struct sk_buff *skb)
 static void tcp_ecn_send_syn(struct sock *sk, struct sk_buff *skb)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
+	bool use_ecn = sock_net(sk)->ipv4.sysctl_tcp_ecn == 1 ||
+		       tcp_ca_needs_ecn(sk);
+
+	if (!use_ecn) {
+		const struct dst_entry *dst = __sk_dst_get(sk);
+
+		if (dst && dst_feature(dst, RTAX_FEATURE_ECN))
+			use_ecn = true;
+	}
 
 	tp->ecn_flags = 0;
-	if (sock_net(sk)->ipv4.sysctl_tcp_ecn == 1 ||
-	    tcp_ca_needs_ecn(sk)) {
+
+	if (use_ecn) {
 		TCP_SKB_CB(skb)->tcp_flags |= TCPHDR_ECE | TCPHDR_CWR;
 		tp->ecn_flags = TCP_ECN_OK;
 		if (tcp_ca_needs_ecn(sk))