path: root/net
author	David S. Miller <davem@davemloft.net>	2014-11-05 16:34:47 -0500
committer	David S. Miller <davem@davemloft.net>	2014-11-05 16:34:47 -0500
commit	1d76c1d028975df8488d1ae18a76f268eb5efa93 (patch)
tree	01bfc4d3ef16fe7e5a4da0be1e7f3fd432e7495f /net
parent	890b7916d0965829ad1c457aa61f049a210c19f8 (diff)
parent	a8d31c128bf574bed2fa29e0512b24d446018a50 (diff)
Merge branch 'gue-next'
Tom Herbert says:

====================
gue: Remote checksum offload

This patch set implements remote checksum offload for GUE, a mechanism
that provides checksum offload of encapsulated packets using rudimentary
offload capabilities found in most Network Interface Card (NIC) devices.
The outer UDP header checksum is enabled in packets and, with some
additional meta information in the GUE header, a receiver is able to
deduce the checksum to be set for an inner encapsulated packet.
Effectively this offloads the computation of the inner checksum. Enabling
the outer checksum in encapsulation has the additional advantage that it
covers more of the packet than the inner checksum, including the
encapsulation headers.

Remote checksum offload is described in:
http://tools.ietf.org/html/draft-herbert-remotecsumoffload-01

The GUE transmit and receive paths are modified to support the remote
checksum offload option. The option contains a checksum offset and a
checksum start, which are directly derived from values set in the stack
when doing CHECKSUM_PARTIAL. On receipt of the option, the operation is
to calculate the packet checksum from "start" to the end of the packet
(normally derived for checksum complete), and then set the resultant
value at checksum "offset" (the checksum field has already been primed
with the pseudo header). This emulates a NIC that implements
NETIF_F_HW_CSUM.

The primary purpose of this feature is to eliminate the cost of
performing the checksum calculation over a packet when encapsulating.

In this patch set:
  - Move fou_build_header into fou.c and split it into a couple of
    functions
  - Enable offloading of the outer UDP checksum in encapsulation
  - Change udp_offload to support remote checksum offload; this includes
    a new GSO type and ensuring that encapsulated layers (TCP) don't try
    to set a checksum covered by RCO
  - TX support for RCO with GUE. This is configured through ip_tunnel
    and sets the option on transmit when the packet being encapsulated
    is CHECKSUM_PARTIAL
  - RX support for RCO with GUE for the normal and GRO paths, including
    resolving the offloaded checksum

v2: Address comments from davem: move accounting for the private option
field in gue_encap_hlen to the patch in which we add the remote checksum
offload option.

Testing:

I ran performance numbers using netperf TCP_STREAM and TCP_RR with 200
streams, comparing GUE with and without remote checksum offload (doing
checksum-unnecessary to complete conversion in both cases). These were
run on mlnx4 and bnx2x. Some mlnx4 results are below.
GRE/GUE
  TCP_STREAM
    IPv4, with remote checksum offload
      9.71% TX CPU utilization
      7.42% RX CPU utilization
      36380 Mbps
    IPv4, without remote checksum offload
      12.40% TX CPU utilization
      7.36% RX CPU utilization
      36591 Mbps
  TCP_RR
    IPv4, with remote checksum offload
      77.79% CPU utilization
      91/144/216 90/95/99% latencies
      1.95127e+06 tps
    IPv4, without remote checksum offload
      78.70% CPU utilization
      89/152/297 90/95/99% latencies
      1.95458e+06 tps

IPIP/GUE
  TCP_STREAM
    With remote checksum offload
      10.30% TX CPU utilization
      7.43% RX CPU utilization
      36486 Mbps
    Without remote checksum offload
      12.47% TX CPU utilization
      7.49% RX CPU utilization
      36694 Mbps
  TCP_RR
    With remote checksum offload
      77.80% CPU utilization
      87/153/270 90/95/99% latencies
      1.98735e+06 tps
    Without remote checksum offload
      77.98% CPU utilization
      87/150/287 90/95/99% latencies
      1.98737e+06 tps

SIT/GUE
  TCP_STREAM
    With remote checksum offload
      9.68% TX CPU utilization
      7.36% RX CPU utilization
      35971 Mbps
    Without remote checksum offload
      12.95% TX CPU utilization
      8.04% RX CPU utilization
      36177 Mbps
  TCP_RR
    With remote checksum offload
      79.32% CPU utilization
      94/158/295 90/95/99% latencies
      1.88842e+06 tps
    Without remote checksum offload
      80.23% CPU utilization
      94/149/226 90/95/99% latencies
      1.90338e+06 tps

VXLAN
  TCP_STREAM
    35.03% TX CPU utilization
    20.85% RX CPU utilization
    36230 Mbps
  TCP_RR
    77.36% CPU utilization
    84/146/270 90/95/99% latencies
    2.08063e+06 tps

We can also look at CPU time in csum_partial using perf (with the bnx2x
setup). For GRE with TCP_STREAM I see:

  With remote checksum offload
    0.33% TX
    1.81% RX
  Without remote checksum offload
    6.00% TX
    0.51% RX

I suspect that the time in csum_partial noticeably increases with remote
checksum offload for RX because we take the cache miss on the
encapsulated header in that function. By similar reasoning, if on the TX
side the packet were not in cache (say we did a splice from a file whose
data was never touched by the CPU), the CPU savings for TX would probably
be more pronounced.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
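To make the receive-side operation concrete: what the receiver does with
the GUE option is exactly the fix-up a NETIF_F_HW_CSUM device would
perform. Below is a minimal userspace sketch of that arithmetic; it is
illustrative only, not kernel code, and the buffer layout and helper
names are assumptions for the example.

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit ones-complement sum over a byte range (Internet checksum core). */
static uint32_t csum_bytes(uint32_t sum, const uint8_t *p, size_t len)
{
	while (len > 1) {
		sum += ((uint32_t)p[0] << 8) | p[1];
		p += 2;
		len -= 2;
	}
	if (len)
		sum += (uint32_t)p[0] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return sum;
}

/* Emulate NETIF_F_HW_CSUM: checksum from csum_start to the end of the
 * packet and write the folded complement at csum_start + csum_offset.
 * The 16-bit field there is assumed to have been primed with the
 * pseudo-header checksum, as the remote checksum offload option expects.
 */
static void remcsum_fixup(uint8_t *pkt, size_t len,
			  size_t csum_start, size_t csum_offset)
{
	uint16_t v = (uint16_t)~csum_bytes(0, pkt + csum_start,
					   len - csum_start);

	pkt[csum_start + csum_offset] = v >> 8;
	pkt[csum_start + csum_offset + 1] = v & 0xff;
}

int main(void)
{
	uint8_t pkt[20] = { 0xde, 0xad, 0xbe, 0xef, 1, 2, 3, 4, 5, 6,
			    7, 8, 9, 10, 11, 12, 13, 14, 15, 16 };

	/* Checksum field at offset 6 within the inner header; primed with
	 * 0 here for simplicity (a real sender primes it with the
	 * pseudo-header checksum).
	 */
	pkt[4 + 6] = 0;
	pkt[4 + 6 + 1] = 0;
	remcsum_fixup(pkt, sizeof(pkt), 4, 6);

	/* A verifier summing the same range must now see all-ones. */
	assert(csum_bytes(0, pkt + 4, sizeof(pkt) - 4) == 0xffff);
	return 0;
}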
Diffstat (limited to 'net')
-rw-r--r--net/core/skbuff.c4
-rw-r--r--net/ipv4/Kconfig9
-rw-r--r--net/ipv4/af_inet.c1
-rw-r--r--net/ipv4/fou.c388
-rw-r--r--net/ipv4/ip_tunnel.c61
-rw-r--r--net/ipv4/tcp_offload.c1
-rw-r--r--net/ipv4/udp_offload.c66
-rw-r--r--net/ipv6/ip6_offload.c1
-rw-r--r--net/ipv6/udp_offload.c1
9 files changed, 421 insertions(+), 111 deletions(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index e48e5c02e877..700189604f3d 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3013,7 +3013,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 		if (nskb->len == len + doffset)
 			goto perform_csum_check;
 
-		if (!sg) {
+		if (!sg && !nskb->remcsum_offload) {
 			nskb->ip_summed = CHECKSUM_NONE;
 			nskb->csum = skb_copy_and_csum_bits(head_skb, offset,
 							    skb_put(nskb, len),
@@ -3085,7 +3085,7 @@ skip_fraglist:
 	nskb->truesize += nskb->data_len;
 
 perform_csum_check:
-	if (!csum) {
+	if (!csum && !nskb->remcsum_offload) {
 		nskb->csum = skb_checksum(nskb, doffset,
 					  nskb->len - doffset, 0);
 		nskb->ip_summed = CHECKSUM_NONE;
diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index e682b48e0709..bd2901604842 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -322,6 +322,15 @@ config NET_FOU
 	  network mechanisms and optimizations for UDP (such as ECMP
 	  and RSS) can be leveraged to provide better service.
 
+config NET_FOU_IP_TUNNELS
+	bool "IP: FOU encapsulation of IP tunnels"
+	depends on NET_IPIP || NET_IPGRE || IPV6_SIT
+	select NET_FOU
+	---help---
+	  Allow configuration of FOU or GUE encapsulation for IP tunnels.
+	  When this option is enabled IP tunnels can be configured to use
+	  FOU or GUE encapsulation.
+
 config GENEVE
 	tristate "Generic Network Virtualization Encapsulation (Geneve)"
 	depends on INET
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 8b7fe5b03906..ed2c672c5b01 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1222,6 +1222,7 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
 			       SKB_GSO_TCPV6 |
 			       SKB_GSO_UDP_TUNNEL |
 			       SKB_GSO_UDP_TUNNEL_CSUM |
+			       SKB_GSO_TUNNEL_REMCSUM |
 			       SKB_GSO_MPLS |
 			       0)))
 		goto out;
diff --git a/net/ipv4/fou.c b/net/ipv4/fou.c
index 32e78924e246..740ae099a0d9 100644
--- a/net/ipv4/fou.c
+++ b/net/ipv4/fou.c
@@ -38,21 +38,17 @@ static inline struct fou *fou_from_sock(struct sock *sk)
 	return sk->sk_user_data;
 }
 
-static int fou_udp_encap_recv_deliver(struct sk_buff *skb,
-				      u8 protocol, size_t len)
+static void fou_recv_pull(struct sk_buff *skb, size_t len)
 {
 	struct iphdr *iph = ip_hdr(skb);
 
 	/* Remove 'len' bytes from the packet (UDP header and
-	 * FOU header if present), modify the protocol to the one
-	 * we found, and then call rcv_encap.
+	 * FOU header if present).
 	 */
 	iph->tot_len = htons(ntohs(iph->tot_len) - len);
 	__skb_pull(skb, len);
 	skb_postpull_rcsum(skb, udp_hdr(skb), len);
 	skb_reset_transport_header(skb);
-
-	return -protocol;
 }
 
 static int fou_udp_recv(struct sock *sk, struct sk_buff *skb)
@@ -62,16 +58,78 @@ static int fou_udp_recv(struct sock *sk, struct sk_buff *skb)
 	if (!fou)
 		return 1;
 
-	return fou_udp_encap_recv_deliver(skb, fou->protocol,
-					  sizeof(struct udphdr));
+	fou_recv_pull(skb, sizeof(struct udphdr));
+
+	return -fou->protocol;
+}
+
+static struct guehdr *gue_remcsum(struct sk_buff *skb, struct guehdr *guehdr,
+				  void *data, int hdrlen, u8 ipproto)
+{
+	__be16 *pd = data;
+	u16 start = ntohs(pd[0]);
+	u16 offset = ntohs(pd[1]);
+	u16 poffset = 0;
+	u16 plen;
+	__wsum csum, delta;
+	__sum16 *psum;
+
+	if (skb->remcsum_offload) {
+		/* Already processed in GRO path */
+		skb->remcsum_offload = 0;
+		return guehdr;
+	}
+
+	if (start > skb->len - hdrlen ||
+	    offset > skb->len - hdrlen - sizeof(u16))
+		return NULL;
+
+	if (unlikely(skb->ip_summed != CHECKSUM_COMPLETE))
+		__skb_checksum_complete(skb);
+
+	plen = hdrlen + offset + sizeof(u16);
+	if (!pskb_may_pull(skb, plen))
+		return NULL;
+	guehdr = (struct guehdr *)&udp_hdr(skb)[1];
+
+	if (ipproto == IPPROTO_IP && sizeof(struct iphdr) < plen) {
+		struct iphdr *ip = (struct iphdr *)(skb->data + hdrlen);
+
+		/* If next header happens to be IP we can skip that for the
+		 * checksum calculation since the IP header checksum is zero
+		 * if correct.
+		 */
+		poffset = ip->ihl * 4;
+	}
+
+	csum = csum_sub(skb->csum, skb_checksum(skb, poffset + hdrlen,
+						start - poffset - hdrlen, 0));
+
+	/* Set derived checksum in packet */
+	psum = (__sum16 *)(skb->data + hdrlen + offset);
+	delta = csum_sub(csum_fold(csum), *psum);
+	*psum = csum_fold(csum);
+
+	/* Adjust skb->csum since we changed the packet */
+	skb->csum = csum_add(skb->csum, delta);
+
+	return guehdr;
+}
+
+static int gue_control_message(struct sk_buff *skb, struct guehdr *guehdr)
+{
+	/* No support yet */
+	kfree_skb(skb);
+	return 0;
 }
 
 static int gue_udp_recv(struct sock *sk, struct sk_buff *skb)
 {
 	struct fou *fou = fou_from_sock(sk);
-	size_t len;
+	size_t len, optlen, hdrlen;
 	struct guehdr *guehdr;
-	struct udphdr *uh;
+	void *data;
+	u16 doffset = 0;
 
 	if (!fou)
 		return 1;
@@ -80,25 +138,61 @@ static int gue_udp_recv(struct sock *sk, struct sk_buff *skb)
 	if (!pskb_may_pull(skb, len))
 		goto drop;
 
-	uh = udp_hdr(skb);
-	guehdr = (struct guehdr *)&uh[1];
+	guehdr = (struct guehdr *)&udp_hdr(skb)[1];
+
+	optlen = guehdr->hlen << 2;
+	len += optlen;
 
-	len += guehdr->hlen << 2;
 	if (!pskb_may_pull(skb, len))
 		goto drop;
 
-	uh = udp_hdr(skb);
-	guehdr = (struct guehdr *)&uh[1];
+	/* guehdr may change after pull */
+	guehdr = (struct guehdr *)&udp_hdr(skb)[1];
 
-	if (guehdr->version != 0)
-		goto drop;
+	hdrlen = sizeof(struct guehdr) + optlen;
 
-	if (guehdr->flags) {
-		/* No support yet */
+	if (guehdr->version != 0 || validate_gue_flags(guehdr, optlen))
 		goto drop;
+
+	hdrlen = sizeof(struct guehdr) + optlen;
+
+	ip_hdr(skb)->tot_len = htons(ntohs(ip_hdr(skb)->tot_len) - len);
+
+	/* Pull UDP header now, skb->data points to guehdr */
+	__skb_pull(skb, sizeof(struct udphdr));
+
+	/* Pull csum through the guehdr now. This can be used if
+	 * there is a remote checksum offload.
+	 */
+	skb_postpull_rcsum(skb, udp_hdr(skb), len);
+
+	data = &guehdr[1];
+
+	if (guehdr->flags & GUE_FLAG_PRIV) {
+		__be32 flags = *(__be32 *)(data + doffset);
+
+		doffset += GUE_LEN_PRIV;
+
+		if (flags & GUE_PFLAG_REMCSUM) {
+			guehdr = gue_remcsum(skb, guehdr, data + doffset,
+					     hdrlen, guehdr->proto_ctype);
+			if (!guehdr)
+				goto drop;
+
+			data = &guehdr[1];
+
+			doffset += GUE_PLEN_REMCSUM;
+		}
 	}
 
-	return fou_udp_encap_recv_deliver(skb, guehdr->next_hdr, len);
+	if (unlikely(guehdr->control))
+		return gue_control_message(skb, guehdr);
+
+	__skb_pull(skb, hdrlen);
+	skb_reset_transport_header(skb);
+
+	return -guehdr->proto_ctype;
+
 drop:
 	kfree_skb(skb);
 	return 0;
@@ -147,6 +241,66 @@ out_unlock:
 	return err;
 }
 
+static struct guehdr *gue_gro_remcsum(struct sk_buff *skb, unsigned int off,
+				      struct guehdr *guehdr, void *data,
+				      size_t hdrlen, u8 ipproto)
+{
+	__be16 *pd = data;
+	u16 start = ntohs(pd[0]);
+	u16 offset = ntohs(pd[1]);
+	u16 poffset = 0;
+	u16 plen;
+	void *ptr;
+	__wsum csum, delta;
+	__sum16 *psum;
+
+	if (skb->remcsum_offload)
+		return guehdr;
+
+	if (start > skb_gro_len(skb) - hdrlen ||
+	    offset > skb_gro_len(skb) - hdrlen - sizeof(u16) ||
+	    !NAPI_GRO_CB(skb)->csum_valid || skb->remcsum_offload)
+		return NULL;
+
+	plen = hdrlen + offset + sizeof(u16);
+
+	/* Pull checksum that will be written */
+	if (skb_gro_header_hard(skb, off + plen)) {
+		guehdr = skb_gro_header_slow(skb, off + plen, off);
+		if (!guehdr)
+			return NULL;
+	}
+
+	ptr = (void *)guehdr + hdrlen;
+
+	if (ipproto == IPPROTO_IP &&
+	    (hdrlen + sizeof(struct iphdr) < plen)) {
+		struct iphdr *ip = (struct iphdr *)(ptr + hdrlen);
+
+		/* If next header happens to be IP we can skip
+		 * that for the checksum calculation since the
+		 * IP header checksum is zero if correct.
+		 */
+		poffset = ip->ihl * 4;
+	}
+
+	csum = csum_sub(NAPI_GRO_CB(skb)->csum,
+			csum_partial(ptr + poffset, start - poffset, 0));
+
+	/* Set derived checksum in packet */
+	psum = (__sum16 *)(ptr + offset);
+	delta = csum_sub(csum_fold(csum), *psum);
+	*psum = csum_fold(csum);
+
+	/* Adjust skb->csum since we changed the packet */
+	skb->csum = csum_add(skb->csum, delta);
+	NAPI_GRO_CB(skb)->csum = csum_add(NAPI_GRO_CB(skb)->csum, delta);
+
+	skb->remcsum_offload = 1;
+
+	return guehdr;
+}
+
 static struct sk_buff **gue_gro_receive(struct sk_buff **head,
 					struct sk_buff *skb)
 {
@@ -154,38 +308,64 @@ static struct sk_buff **gue_gro_receive(struct sk_buff **head,
 	const struct net_offload *ops;
 	struct sk_buff **pp = NULL;
 	struct sk_buff *p;
-	u8 proto;
 	struct guehdr *guehdr;
-	unsigned int hlen, guehlen;
-	unsigned int off;
+	size_t len, optlen, hdrlen, off;
+	void *data;
+	u16 doffset = 0;
 	int flush = 1;
 
 	off = skb_gro_offset(skb);
-	hlen = off + sizeof(*guehdr);
+	len = off + sizeof(*guehdr);
+
 	guehdr = skb_gro_header_fast(skb, off);
-	if (skb_gro_header_hard(skb, hlen)) {
-		guehdr = skb_gro_header_slow(skb, hlen, off);
+	if (skb_gro_header_hard(skb, len)) {
+		guehdr = skb_gro_header_slow(skb, len, off);
 		if (unlikely(!guehdr))
 			goto out;
 	}
 
-	proto = guehdr->next_hdr;
+	optlen = guehdr->hlen << 2;
+	len += optlen;
 
-	rcu_read_lock();
-	offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
-	ops = rcu_dereference(offloads[proto]);
-	if (WARN_ON(!ops || !ops->callbacks.gro_receive))
-		goto out_unlock;
+	if (skb_gro_header_hard(skb, len)) {
+		guehdr = skb_gro_header_slow(skb, len, off);
+		if (unlikely(!guehdr))
+			goto out;
+	}
 
-	guehlen = sizeof(*guehdr) + (guehdr->hlen << 2);
+	if (unlikely(guehdr->control) || guehdr->version != 0 ||
+	    validate_gue_flags(guehdr, optlen))
+		goto out;
 
-	hlen = off + guehlen;
-	if (skb_gro_header_hard(skb, hlen)) {
-		guehdr = skb_gro_header_slow(skb, hlen, off);
-		if (unlikely(!guehdr))
-			goto out_unlock;
+	hdrlen = sizeof(*guehdr) + optlen;
+
+	/* Adjust NAPI_GRO_CB(skb)->csum to account for guehdr,
+	 * this is needed if there is a remote checksum offload.
+	 */
+	skb_gro_postpull_rcsum(skb, guehdr, hdrlen);
+
+	data = &guehdr[1];
+
+	if (guehdr->flags & GUE_FLAG_PRIV) {
+		__be32 flags = *(__be32 *)(data + doffset);
+
+		doffset += GUE_LEN_PRIV;
+
+		if (flags & GUE_PFLAG_REMCSUM) {
+			guehdr = gue_gro_remcsum(skb, off, guehdr,
+						 data + doffset, hdrlen,
+						 guehdr->proto_ctype);
+			if (!guehdr)
+				goto out;
+
+			data = &guehdr[1];
+
+			doffset += GUE_PLEN_REMCSUM;
+		}
 	}
 
+	skb_gro_pull(skb, hdrlen);
+
 	flush = 0;
 
 	for (p = *head; p; p = p->next) {
@@ -197,7 +377,7 @@ static struct sk_buff **gue_gro_receive(struct sk_buff **head,
 		guehdr2 = (struct guehdr *)(p->data + off);
 
 		/* Compare base GUE header to be equal (covers
-		 * hlen, version, next_hdr, and flags.
+		 * hlen, version, proto_ctype, and flags.
 		 */
 		if (guehdr->word != guehdr2->word) {
 			NAPI_GRO_CB(p)->same_flow = 0;
@@ -212,10 +392,11 @@ static struct sk_buff **gue_gro_receive(struct sk_buff **head,
 		}
 	}
 
-	skb_gro_pull(skb, guehlen);
-
-	/* Adjusted NAPI_GRO_CB(skb)->csum after skb_gro_pull()*/
-	skb_gro_postpull_rcsum(skb, guehdr, guehlen);
+	rcu_read_lock();
+	offloads = NAPI_GRO_CB(skb)->is_ipv6 ? inet6_offloads : inet_offloads;
+	ops = rcu_dereference(offloads[guehdr->proto_ctype]);
+	if (WARN_ON(!ops || !ops->callbacks.gro_receive))
+		goto out_unlock;
 
 	pp = ops->callbacks.gro_receive(head, skb);
 
@@ -236,7 +417,7 @@ static int gue_gro_complete(struct sk_buff *skb, int nhoff)
 	u8 proto;
 	int err = -ENOENT;
 
-	proto = guehdr->next_hdr;
+	proto = guehdr->proto_ctype;
 
 	guehlen = sizeof(*guehdr) + (guehdr->hlen << 2);
 
@@ -487,6 +668,125 @@ static const struct genl_ops fou_nl_ops[] = {
 	},
 };
 
+static void fou_build_udp(struct sk_buff *skb, struct ip_tunnel_encap *e,
+			  struct flowi4 *fl4, u8 *protocol, __be16 sport)
+{
+	struct udphdr *uh;
+
+	skb_push(skb, sizeof(struct udphdr));
+	skb_reset_transport_header(skb);
+
+	uh = udp_hdr(skb);
+
+	uh->dest = e->dport;
+	uh->source = sport;
+	uh->len = htons(skb->len);
+	uh->check = 0;
+	udp_set_csum(!(e->flags & TUNNEL_ENCAP_FLAG_CSUM), skb,
+		     fl4->saddr, fl4->daddr, skb->len);
+
+	*protocol = IPPROTO_UDP;
+}
+
+int fou_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
+		     u8 *protocol, struct flowi4 *fl4)
+{
+	bool csum = !!(e->flags & TUNNEL_ENCAP_FLAG_CSUM);
+	int type = csum ? SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
+	__be16 sport;
+
+	skb = iptunnel_handle_offloads(skb, csum, type);
+
+	if (IS_ERR(skb))
+		return PTR_ERR(skb);
+
+	sport = e->sport ? : udp_flow_src_port(dev_net(skb->dev),
+					       skb, 0, 0, false);
+	fou_build_udp(skb, e, fl4, protocol, sport);
+
+	return 0;
+}
+EXPORT_SYMBOL(fou_build_header);
+
+int gue_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
+		     u8 *protocol, struct flowi4 *fl4)
+{
+	bool csum = !!(e->flags & TUNNEL_ENCAP_FLAG_CSUM);
+	int type = csum ? SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
+	struct guehdr *guehdr;
+	size_t hdrlen, optlen = 0;
+	__be16 sport;
+	void *data;
+	bool need_priv = false;
+
+	if ((e->flags & TUNNEL_ENCAP_FLAG_REMCSUM) &&
+	    skb->ip_summed == CHECKSUM_PARTIAL) {
+		csum = false;
+		optlen += GUE_PLEN_REMCSUM;
+		type |= SKB_GSO_TUNNEL_REMCSUM;
+		need_priv = true;
+	}
+
+	optlen += need_priv ? GUE_LEN_PRIV : 0;
+
+	skb = iptunnel_handle_offloads(skb, csum, type);
+
+	if (IS_ERR(skb))
+		return PTR_ERR(skb);
+
+	/* Get source port (based on flow hash) before skb_push */
+	sport = e->sport ? : udp_flow_src_port(dev_net(skb->dev),
+					       skb, 0, 0, false);
+
+	hdrlen = sizeof(struct guehdr) + optlen;
+
+	skb_push(skb, hdrlen);
+
+	guehdr = (struct guehdr *)skb->data;
+
+	guehdr->control = 0;
+	guehdr->version = 0;
+	guehdr->hlen = optlen >> 2;
+	guehdr->flags = 0;
+	guehdr->proto_ctype = *protocol;
+
+	data = &guehdr[1];
+
+	if (need_priv) {
+		__be32 *flags = data;
+
+		guehdr->flags |= GUE_FLAG_PRIV;
+		*flags = 0;
+		data += GUE_LEN_PRIV;
+
+		if (type & SKB_GSO_TUNNEL_REMCSUM) {
+			u16 csum_start = skb_checksum_start_offset(skb);
+			__be16 *pd = data;
+
+			if (csum_start < hdrlen)
+				return -EINVAL;
+
+			csum_start -= hdrlen;
+			pd[0] = htons(csum_start);
+			pd[1] = htons(csum_start + skb->csum_offset);
+
+			if (!skb_is_gso(skb)) {
+				skb->ip_summed = CHECKSUM_NONE;
+				skb->encapsulation = 0;
+			}
+
+			*flags |= GUE_PFLAG_REMCSUM;
+			data += GUE_PLEN_REMCSUM;
+		}
+
+	}
+
+	fou_build_udp(skb, e, fl4, protocol, sport);
+
+	return 0;
+}
+EXPORT_SYMBOL(gue_build_header);
+
 static int __init fou_init(void)
 {
 	int ret;
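A note on the csum_sub()/csum_add() pattern in gue_remcsum() and
gue_gro_remcsum() above: both rewrite a 16-bit checksum field inside a
packet whose ones-complement sum is already cached (skb->csum,
NAPI_GRO_CB(skb)->csum), so the cache is patched with delta = new - old
instead of re-summing the packet. A self-contained sketch of that
identity (illustrative only, plain C rather than the kernel helpers):

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit ones-complement accumulator to 16 bits. */
static uint16_t csum_fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Ones-complement sum of n 16-bit words. */
static uint16_t csum_words(const uint16_t *w, size_t n)
{
	uint32_t sum = 0;

	while (n--)
		sum += *w++;
	return csum_fold16(sum);
}

int main(void)
{
	uint16_t pkt[8] = { 0x1234, 0xaaaa, 0x0f0f, 7, 8, 9, 10, 11 };
	uint16_t cached = csum_words(pkt, 8);	/* like skb->csum */
	uint16_t oldv = pkt[3], newv = 0xbeef;

	pkt[3] = newv;	/* rewrite one field in place */

	/* delta = newv - oldv in ones-complement arithmetic; adding
	 * ~oldv plays the role of csum_sub(), newv of csum_add().
	 */
	cached = csum_fold16((uint32_t)cached + newv + (uint16_t)~oldv);

	assert(cached == csum_words(pkt, 8));	/* cache still consistent */
	return 0;
}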
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 0bb8e141eacc..c3587e1c8b82 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -56,7 +56,10 @@
 #include <net/netns/generic.h>
 #include <net/rtnetlink.h>
 #include <net/udp.h>
-#include <net/gue.h>
+
+#if IS_ENABLED(CONFIG_NET_FOU)
+#include <net/fou.h>
+#endif
 
 #if IS_ENABLED(CONFIG_IPV6)
 #include <net/ipv6.h>
@@ -494,10 +497,12 @@ static int ip_encap_hlen(struct ip_tunnel_encap *e)
 	switch (e->type) {
 	case TUNNEL_ENCAP_NONE:
 		return 0;
+#if IS_ENABLED(CONFIG_NET_FOU)
 	case TUNNEL_ENCAP_FOU:
-		return sizeof(struct udphdr);
+		return fou_encap_hlen(e);
 	case TUNNEL_ENCAP_GUE:
-		return sizeof(struct udphdr) + sizeof(struct guehdr);
+		return gue_encap_hlen(e);
+#endif
 	default:
 		return -EINVAL;
 	}
@@ -526,60 +531,18 @@ int ip_tunnel_encap_setup(struct ip_tunnel *t,
 }
 EXPORT_SYMBOL_GPL(ip_tunnel_encap_setup);
 
-static int fou_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
-			    size_t hdr_len, u8 *protocol, struct flowi4 *fl4)
-{
-	struct udphdr *uh;
-	__be16 sport;
-	bool csum = !!(e->flags & TUNNEL_ENCAP_FLAG_CSUM);
-	int type = csum ? SKB_GSO_UDP_TUNNEL_CSUM : SKB_GSO_UDP_TUNNEL;
-
-	skb = iptunnel_handle_offloads(skb, csum, type);
-
-	if (IS_ERR(skb))
-		return PTR_ERR(skb);
-
-	/* Get length and hash before making space in skb */
-
-	sport = e->sport ? : udp_flow_src_port(dev_net(skb->dev),
-					       skb, 0, 0, false);
-
-	skb_push(skb, hdr_len);
-
-	skb_reset_transport_header(skb);
-	uh = udp_hdr(skb);
-
-	if (e->type == TUNNEL_ENCAP_GUE) {
-		struct guehdr *guehdr = (struct guehdr *)&uh[1];
-
-		guehdr->version = 0;
-		guehdr->hlen = 0;
-		guehdr->flags = 0;
-		guehdr->next_hdr = *protocol;
-	}
-
-	uh->dest = e->dport;
-	uh->source = sport;
-	uh->len = htons(skb->len);
-	uh->check = 0;
-	udp_set_csum(!(e->flags & TUNNEL_ENCAP_FLAG_CSUM), skb,
-		     fl4->saddr, fl4->daddr, skb->len);
-
-	*protocol = IPPROTO_UDP;
-
-	return 0;
-}
-
 int ip_tunnel_encap(struct sk_buff *skb, struct ip_tunnel *t,
 		    u8 *protocol, struct flowi4 *fl4)
 {
 	switch (t->encap.type) {
 	case TUNNEL_ENCAP_NONE:
 		return 0;
+#if IS_ENABLED(CONFIG_NET_FOU)
 	case TUNNEL_ENCAP_FOU:
+		return fou_build_header(skb, &t->encap, protocol, fl4);
 	case TUNNEL_ENCAP_GUE:
-		return fou_build_header(skb, &t->encap, t->encap_hlen,
-					protocol, fl4);
+		return gue_build_header(skb, &t->encap, protocol, fl4);
+#endif
 	default:
 		return -EINVAL;
 	}
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 5b90f2f447a5..a1b2a5624f91 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -97,6 +97,7 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 				       SKB_GSO_MPLS |
 				       SKB_GSO_UDP_TUNNEL |
 				       SKB_GSO_UDP_TUNNEL_CSUM |
+				       SKB_GSO_TUNNEL_REMCSUM |
 				       0) ||
 				     !(type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))))
 		goto out;
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 6480cea7aa53..0a5a70d0e84c 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -29,7 +29,7 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
 	netdev_features_t features,
 	struct sk_buff *(*gso_inner_segment)(struct sk_buff *skb,
 					     netdev_features_t features),
-	__be16 new_protocol)
+	__be16 new_protocol, bool is_ipv6)
 {
 	struct sk_buff *segs = ERR_PTR(-EINVAL);
 	u16 mac_offset = skb->mac_header;
@@ -39,7 +39,10 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
 	netdev_features_t enc_features;
 	int udp_offset, outer_hlen;
 	unsigned int oldlen;
-	bool need_csum;
+	bool need_csum = !!(skb_shinfo(skb)->gso_type &
+			    SKB_GSO_UDP_TUNNEL_CSUM);
+	bool remcsum = !!(skb_shinfo(skb)->gso_type & SKB_GSO_TUNNEL_REMCSUM);
+	bool offload_csum = false, dont_encap = (need_csum || remcsum);
 
 	oldlen = (u16)~skb->len;
 
@@ -52,10 +55,13 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
 	skb_set_network_header(skb, skb_inner_network_offset(skb));
 	skb->mac_len = skb_inner_network_offset(skb);
 	skb->protocol = new_protocol;
+	skb->encap_hdr_csum = need_csum;
+	skb->remcsum_offload = remcsum;
 
-	need_csum = !!(skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL_CSUM);
-	if (need_csum)
-		skb->encap_hdr_csum = 1;
+	/* Try to offload checksum if possible */
+	offload_csum = !!(need_csum &&
+			  (skb->dev->features &
+			   (is_ipv6 ? NETIF_F_V6_CSUM : NETIF_F_V4_CSUM)));
 
 	/* segment inner packet. */
 	enc_features = skb->dev->hw_enc_features & features;
@@ -72,11 +78,21 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
 	do {
 		struct udphdr *uh;
 		int len;
-
-		skb_reset_inner_headers(skb);
-		skb->encapsulation = 1;
+		__be32 delta;
+
+		if (dont_encap) {
+			skb->encapsulation = 0;
+			skb->ip_summed = CHECKSUM_NONE;
+		} else {
+			/* Only set up inner headers if we might be offloading
+			 * inner checksum.
+			 */
+			skb_reset_inner_headers(skb);
+			skb->encapsulation = 1;
+		}
 
 		skb->mac_len = mac_len;
+		skb->protocol = protocol;
 
 		skb_push(skb, outer_hlen);
 		skb_reset_mac_header(skb);
@@ -86,19 +102,36 @@ static struct sk_buff *__skb_udp_tunnel_segment(struct sk_buff *skb,
 		uh = udp_hdr(skb);
 		uh->len = htons(len);
 
-		if (need_csum) {
-			__be32 delta = htonl(oldlen + len);
+		if (!need_csum)
+			continue;
 
-			uh->check = ~csum_fold((__force __wsum)
-					       ((__force u32)uh->check +
-					       (__force u32)delta));
+		delta = htonl(oldlen + len);
+
+		uh->check = ~csum_fold((__force __wsum)
+				       ((__force u32)uh->check +
+				       (__force u32)delta));
+		if (offload_csum) {
+			skb->ip_summed = CHECKSUM_PARTIAL;
+			skb->csum_start = skb_transport_header(skb) - skb->head;
+			skb->csum_offset = offsetof(struct udphdr, check);
+		} else if (remcsum) {
+			/* Need to calculate checksum from scratch,
+			 * inner checksums are never offloaded when doing
+			 * remote checksum offload.
+			 */
+
+			skb->csum = skb_checksum(skb, udp_offset,
+						 skb->len - udp_offset,
+						 0);
+			uh->check = csum_fold(skb->csum);
+			if (uh->check == 0)
+				uh->check = CSUM_MANGLED_0;
+		} else {
 			uh->check = gso_make_checksum(skb, ~uh->check);
 
 			if (uh->check == 0)
 				uh->check = CSUM_MANGLED_0;
 		}
-
-		skb->protocol = protocol;
 	} while ((skb = skb->next));
 out:
 	return segs;
@@ -134,7 +167,7 @@ struct sk_buff *skb_udp_tunnel_segment(struct sk_buff *skb,
 	}
 
 	segs = __skb_udp_tunnel_segment(skb, features, gso_inner_segment,
-					protocol);
+					protocol, is_ipv6);
 
 out_unlock:
 	rcu_read_unlock();
@@ -172,6 +205,7 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
 	if (unlikely(type & ~(SKB_GSO_UDP | SKB_GSO_DODGY |
 			      SKB_GSO_UDP_TUNNEL |
 			      SKB_GSO_UDP_TUNNEL_CSUM |
+			      SKB_GSO_TUNNEL_REMCSUM |
 			      SKB_GSO_IPIP |
 			      SKB_GSO_GRE | SKB_GSO_GRE_CSUM |
 			      SKB_GSO_MPLS) ||
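The delta = htonl(oldlen + len) fix-up in __skb_udp_tunnel_segment()
above works because oldlen holds the ones-complement of the original
length ((u16)~skb->len), so adding oldlen + len to the stored checksum
substitutes each segment's new length without re-summing the payload;
this is the incremental-update identity of RFC 1624,
HC' = ~(~HC + ~m + m'). A self-contained check of that identity
(illustrative only, not the kernel code path):

#include <assert.h>
#include <stdint.h>

static uint16_t fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* RFC 1624: update checksum HC when a covered 16-bit field changes
 * from m to m_new, without re-summing the rest of the data.
 */
static uint16_t csum_update(uint16_t hc, uint16_t m, uint16_t m_new)
{
	return (uint16_t)~fold((uint32_t)(uint16_t)~hc +
			       (uint16_t)~m + m_new);
}

int main(void)
{
	/* Data covered by the checksum: three words, one of which (the
	 * "length") changes, mirroring the GSO length fix-up.
	 */
	uint16_t words[3] = { 0x1700, 0x002a, 0xc0de };	/* 0x002a = old len */
	uint16_t hc = (uint16_t)~fold((uint32_t)words[0] + words[1] + words[2]);

	words[1] = 0x0014;	/* new, shorter segment length */
	hc = csum_update(hc, 0x002a, 0x0014);

	/* Full recomputation must agree with the incremental update. */
	assert(hc == (uint16_t)~fold((uint32_t)words[0] + words[1] + words[2]));
	return 0;
}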
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index a071563a7e6e..e9767079a360 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -78,6 +78,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 		     SKB_GSO_SIT |
 		     SKB_GSO_UDP_TUNNEL |
 		     SKB_GSO_UDP_TUNNEL_CSUM |
+		     SKB_GSO_TUNNEL_REMCSUM |
 		     SKB_GSO_MPLS |
 		     SKB_GSO_TCPV6 |
 		     0)))
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index 6b8f543f6ac6..637ba2e438b7 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -42,6 +42,7 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
 			      SKB_GSO_DODGY |
 			      SKB_GSO_UDP_TUNNEL |
 			      SKB_GSO_UDP_TUNNEL_CSUM |
+			      SKB_GSO_TUNNEL_REMCSUM |
 			      SKB_GSO_GRE |
 			      SKB_GSO_GRE_CSUM |
 			      SKB_GSO_IPIP |