net-gre-gro: Add GRE support to the GRO stack

This patch builds on top of commit 299603e8370a93dd5d8e8d800f0dff1ce2c53d36 ("net-gro: Prepare GRO stack for the upcoming tunneling support") to add support for standard GRE (RFC1701/RFC2784/RFC2890) to the GRO stack. It also serves as an example for supporting other encapsulation protocols in the GRO stack in the future.

The patch supports version 0 and all the flags (key, csum, seq#) but will flush any pkt with the S (seq#) flag. This is because the S flag is not supported by GSO, and a GRO pkt may end up in the forwarding path, thus requiring GSO support to break it up correctly.

Currently the "packet_offload" structure only contains L3 (ETH_P_IP/ETH_P_IPV6) GRO offload support, so the encapped pkts are limited to IP pkts (i.e., w/o L2 hdr). But support for other protocol types can be easily added, as can support for GRE variations like NVGRE.

The patch also supports csum offload. Specifically, if the csum flag is on and the h/w is capable of checksumming the payload (CHECKSUM_COMPLETE), the code will take advantage of the csum computed by the h/w when validating the GRE csum.

Note that commit 60769a5dcd8755715c7143b4571d5c44f01796f1 ("ipv4: gre: add GRO capability") already introduces GRO capability to IPv4 GRE tunnels, using the gro_cells infrastructure. But there GRO is done after the GRE hdr has been removed (i.e., decapped). This patch applies GRO when pkts first come in (before hitting the GRE tunnel code). There is some performance advantage to applying GRO as early as possible. This approach is also transparent to other subsystems like Open vSwitch, where GRE decap is handled outside of the IP stack, making it harder for the gro_cells stuff to apply. On the other hand, some NICs are still not capable of hashing on the inner hdr of a GRE pkt (RSS). In that case the GRO processing of pkts from the same remote host will all happen on the same CPU and the performance may be suboptimal.

I'm including some rough preliminary performance numbers below. Note that the performance will be highly dependent on traffic load and mix, as usual. Moreover it also depends on NIC offload features, hence the following is by no means a comprehensive study. Local testing and tuning will be needed to decide the best setting.

All tests spawned 50 copies of netperf TCP_STREAM and ran for 30 secs:

  super_netperf 50 -H 192.168.1.18 -l 30

An IP GRE tunnel with only the key flag on is configured, e.g.:

  ip tunnel add gre1 mode gre local 10.246.17.18 remote 10.246.17.17 ttl 255 key 123

The GRO support for pkts AFTER decap is controlled through the device feature of the GRE device (e.g., ethtool -K gre1 gro on/off).

1.1 ethtool -K gre1 gro off; ethtool -K eth0 gro off
    thruput: 9.16Gbps
    CPU utilization: 19%

1.2 ethtool -K gre1 gro on; ethtool -K eth0 gro off
    thruput: 5.9Gbps
    CPU utilization: 15%

1.3 ethtool -K gre1 gro off; ethtool -K eth0 gro on
    thruput: 9.26Gbps
    CPU utilization: 12-13%

1.4 ethtool -K gre1 gro on; ethtool -K eth0 gro on
    thruput: 9.26Gbps
    CPU utilization: 10%

The following tests were performed on a different NIC that is capable of csum offload, i.e., the h/w is capable of computing the IP payload csum (CHECKSUM_COMPLETE).

2.1 ethtool -K gre1 gro on (hence will use gro_cells)

2.1.1 ethtool -K eth0 gro off; csum offload disabled
      thruput: 8.53Gbps
      CPU utilization: 9%

2.1.2 ethtool -K eth0 gro off; csum offload enabled
      thruput: 8.97Gbps
      CPU utilization: 7-8%

2.1.3 ethtool -K eth0 gro on; csum offload disabled
      thruput: 8.83Gbps
      CPU utilization: 5-6%

2.1.4 ethtool -K eth0 gro on; csum offload enabled
      thruput: 8.98Gbps
      CPU utilization: 5%

2.2 ethtool -K gre1 gro off

2.2.1 ethtool -K eth0 gro off; csum offload disabled
      thruput: 5.93Gbps
      CPU utilization: 9%

2.2.2 ethtool -K eth0 gro off; csum offload enabled
      thruput: 5.62Gbps
      CPU utilization: 8%

2.2.3 ethtool -K eth0 gro on; csum offload disabled
      thruput: 7.69Gbps
      CPU utilization: 8%

2.2.4 ethtool -K eth0 gro on; csum offload enabled
      thruput: 8.96Gbps
      CPU utilization: 5-6%

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
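
Editorial illustration (not part of the patch): the flag policy described in the commit message can be sketched in plain userspace C. The sketch assumes the flags word has already been converted to host byte order and uses the RFC 2784/2890 bit positions; the kernel code works on big-endian values with its own GRE_* constants. All SKETCH_ and sketch_ names are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* GRE flag bits in host byte order, following the RFC 2784/2890 header
 * layout. The SKETCH_ names are illustrative, not kernel identifiers.
 */
#define SKETCH_GRE_CSUM    0x8000u
#define SKETCH_GRE_KEY     0x2000u
#define SKETCH_GRE_SEQ     0x1000u
#define SKETCH_GRE_SECTION 4u	/* base header and each optional field */

/* Return the GRE header length in bytes for a packet that may be
 * aggregated, or 0 if the packet has to be flushed. Any bit other than
 * C and K forces a flush, which covers the S flag and non-zero versions.
 */
unsigned int sketch_gre_gro_hdr_len(uint16_t flags)
{
	unsigned int len = SKETCH_GRE_SECTION;

	if (flags & ~(SKETCH_GRE_CSUM | SKETCH_GRE_KEY))
		return 0;
	if (flags & SKETCH_GRE_CSUM)
		len += SKETCH_GRE_SECTION;
	if (flags & SKETCH_GRE_KEY)
		len += SKETCH_GRE_SECTION;
	return len;
}

/* Usage: a few representative flag words. */
void sketch_gre_gro_hdr_len_demo(void)
{
	assert(sketch_gre_gro_hdr_len(0) == 4);
	assert(sketch_gre_gro_hdr_len(SKETCH_GRE_KEY) == 8);
	assert(sketch_gre_gro_hdr_len(SKETCH_GRE_CSUM | SKETCH_GRE_KEY) == 12);
	/* The S flag is rejected even when combined with accepted flags. */
	assert(sketch_gre_gro_hdr_len(SKETCH_GRE_KEY | SKETCH_GRE_SEQ) == 0);
}
```

Returning 0 for "flush" keeps the sketch to a single function; the patch itself tracks this with a separate flush variable.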
author: Jerry Chu <hkchu@google.com> 2014-01-07 13:23:19 -0500
committer: David S. Miller <davem@davemloft.net> 2014-01-07 16:21:31 -0500
commit: bf5a755f5e9186406bbf50f4087100af5bd68e40 (patch)
tree: c971c1aafbcb999a65b5f088bf2627c48006072a /net/ipv4/gre_offload.c
parent: cdb3f4a31b64c3a1c6eef40bc01ebc9594c58a8c (diff)
1 files changed, 160 insertions, 0 deletions
diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
index 9138cfb10140..746a7b10d434 100644
--- a/net/ipv4/gre_offload.c
+++ b/net/ipv4/gre_offload.c
@@ -116,10 +116,170 @@ out:
        return segs;
 }
+/* Compute the whole skb csum in s/w and store it, then verify GRO csum
+ * starting from gro_offset.
+ */
+static __sum16 gro_skb_checksum(struct sk_buff *skb)
+{
+        __sum16 sum;
+        skb->csum = skb_checksum(skb, 0, skb->len, 0);
+        NAPI_GRO_CB(skb)->csum = csum_sub(skb->csum,
+                csum_partial(skb->data, skb_gro_offset(skb), 0));
+        sum = csum_fold(NAPI_GRO_CB(skb)->csum);
+        if (unlikely(skb->ip_summed == CHECKSUM_COMPLETE)) {
+                if (unlikely(!sum))
+                        netdev_rx_csum_fault(skb->dev);
+        } else
+                skb->ip_summed = CHECKSUM_COMPLETE;
+        return sum;
+}
+static struct sk_buff **gre_gro_receive(struct sk_buff **head,
+                                        struct sk_buff *skb)
+{
+        struct sk_buff **pp = NULL;
+        struct sk_buff *p;
+        const struct gre_base_hdr *greh;
+        unsigned int hlen, grehlen;
+        unsigned int off;
+        int flush = 1;
+        struct packet_offload *ptype;
+        __be16 type;
+        off = skb_gro_offset(skb);
+        hlen = off + sizeof(*greh);
+        greh = skb_gro_header_fast(skb, off);
+        if (skb_gro_header_hard(skb, hlen)) {
+                greh = skb_gro_header_slow(skb, hlen, off);
+                if (unlikely(!greh))
+                        goto out;
+        }
+        /* Only support version 0 and K (key), C (csum) flags. Note that
+         * although the support for the S (seq#) flag can be added easily
+         * for GRO, this is problematic for GSO hence can not be enabled
+         * here because a GRO pkt may end up in the forwarding path, thus
+         * requiring GSO support to break it up correctly.
+         */
+        if ((greh->flags & ~(GRE_KEY|GRE_CSUM)) != 0)
+                goto out;
+        type = greh->protocol;
+        rcu_read_lock();
+        ptype = gro_find_receive_by_type(type);
+        if (ptype == NULL)
+                goto out_unlock;
+        grehlen = GRE_HEADER_SECTION;
+        if (greh->flags & GRE_KEY)
+                grehlen += GRE_HEADER_SECTION;
+        if (greh->flags & GRE_CSUM)
+                grehlen += GRE_HEADER_SECTION;
+        hlen = off + grehlen;
+        if (skb_gro_header_hard(skb, hlen)) {
+                greh = skb_gro_header_slow(skb, hlen, off);
+                if (unlikely(!greh))
+                        goto out_unlock;
+        }
+        if (greh->flags & GRE_CSUM) { /* Need to verify GRE csum first */
+                __sum16 csum = 0;
+                if (skb->ip_summed == CHECKSUM_COMPLETE)
+                        csum = csum_fold(NAPI_GRO_CB(skb)->csum);
+                /* Don't trust csum error calculated/reported by h/w */
+                if (skb->ip_summed == CHECKSUM_NONE || csum != 0)
+                        csum = gro_skb_checksum(skb);
+                /* GRE CSUM is the 1's complement of the 1's complement sum
+                 * of the GRE hdr plus payload so it should add up to 0xffff
+                 * (and 0 after csum_fold()) just like the IPv4 hdr csum.
+                 */
+                if (csum)
+                        goto out_unlock;
+        }
+        flush = 0;
+        for (p = *head; p; p = p->next) {
+                const struct gre_base_hdr *greh2;
+                if (!NAPI_GRO_CB(p)->same_flow)
+                        continue;
+                /* The following checks are needed to ensure only pkts
+                 * from the same tunnel are considered for aggregation.
+                 * The criteria for "the same tunnel" includes:
+                 * 1) same version (we only support version 0 here)
+                 * 2) same protocol (we only support ETH_P_IP for now)
+                 * 3) same set of flags
+                 * 4) same key if the key field is present.
+                 */
+                greh2 = (struct gre_base_hdr *)(p->data + off);
+                if (greh2->flags != greh->flags ||
+                    greh2->protocol != greh->protocol) {
+                        NAPI_GRO_CB(p)->same_flow = 0;
+                        continue;
+                }
+                if (greh->flags & GRE_KEY) {
+                        /* compare keys */
+                        if (*(__be32 *)(greh2+1) != *(__be32 *)(greh+1)) {
+                                NAPI_GRO_CB(p)->same_flow = 0;
+                                continue;
+                        }
+                }
+        }
+        skb_gro_pull(skb, grehlen);
+        /* Adjusted NAPI_GRO_CB(skb)->csum after skb_gro_pull()*/
+        skb_gro_postpull_rcsum(skb, greh, grehlen);
+        pp = ptype->callbacks.gro_receive(head, skb);
+out_unlock:
+        rcu_read_unlock();
+out:
+        NAPI_GRO_CB(skb)->flush |= flush;
+        return pp;
+}
+int gre_gro_complete(struct sk_buff *skb, int nhoff)
+{
+        struct gre_base_hdr *greh = (struct gre_base_hdr *)(skb->data + nhoff);
+        struct packet_offload *ptype;
+        unsigned int grehlen = sizeof(*greh);
+        int err = -ENOENT;
+        __be16 type;
+        type = greh->protocol;
+        if (greh->flags & GRE_KEY)
+                grehlen += GRE_HEADER_SECTION;
+        if (greh->flags & GRE_CSUM)
+                grehlen += GRE_HEADER_SECTION;
+        rcu_read_lock();
+        ptype = gro_find_complete_by_type(type);
+        if (ptype != NULL)
+                err = ptype->callbacks.gro_complete(skb, nhoff + grehlen);
+        rcu_read_unlock();
+        return err;
+}
 static const struct net_offload gre_offload = {
        .callbacks = {
                .gso_send_check = gre_gso_send_check,
                .gso_segment = gre_gso_segment,
+                .gro_receive = gre_gro_receive,
+                .gro_complete = gre_gro_complete,
        },
 };
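
Editorial illustration (not part of the patch): the comment in gre_gro_receive() states that the GRE csum, summed together with the GRE hdr and payload, should come to 0 after folding. The following userspace sketch shows that property on a hand-built packet. It is a simplified stand-in under stated assumptions, not the kernel's csum_partial()/csum_fold() implementation: it reads the buffer as big-endian 16-bit words, is only meant for small buffers, and every sketch_ name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add a byte buffer to a running ones' complement sum, reading it as
 * big-endian 16-bit words. The 32-bit accumulator is enough for the
 * small buffers used here; carries are handled when folding.
 */
uint32_t sketch_csum_add(const uint8_t *buf, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)buf[i] << 8) | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;	/* pad odd byte with 0 */
	return sum;
}

/* Fold the carries into 16 bits and complement the result. */
uint16_t sketch_csum_fold(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* A packet verifies when the folded sum over hdr + payload is 0. */
int sketch_gre_csum_ok(const uint8_t *pkt, size_t len)
{
	return sketch_csum_fold(sketch_csum_add(pkt, len, 0)) == 0;
}

/* Usage: build a GRE pkt with the C flag, fill in its csum, verify it,
 * then corrupt one payload byte and verify again.
 * Returns 1 if the good pkt passes and the corrupted one fails.
 */
int sketch_gre_csum_demo(void)
{
	uint8_t pkt[12] = {
		0x80, 0x00, 0x08, 0x00,	/* C flag, version 0, proto 0x0800 */
		0x00, 0x00, 0x00, 0x00,	/* csum placeholder, reserved */
		0xde, 0xad, 0xbe, 0xef	/* stand-in payload */
	};
	uint16_t c = sketch_csum_fold(sketch_csum_add(pkt, sizeof(pkt), 0));
	int good, bad;

	pkt[4] = (uint8_t)(c >> 8);
	pkt[5] = (uint8_t)(c & 0xff);
	good = sketch_gre_csum_ok(pkt, sizeof(pkt));

	pkt[9] ^= 0x01;			/* single-bit corruption */
	bad = sketch_gre_csum_ok(pkt, sizeof(pkt));

	return good && !bad;
}
```

In the patch, a h/w-provided CHECKSUM_COMPLETE value is used for this check when available, and the s/w path in gro_skb_checksum() is taken when there is no h/w csum or the h/w result indicates an error.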