From a699d65ec4ff82245d2c4fdfb8ed1d64776d756e Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@primarydata.com>
Date: Mon, 10 Feb 2014 16:28:52 -0500
Subject: SUNRPC: Don't create a gss auth cache unless rpc.gssd is running

An infinite loop is caused when nfs4_establish_lease() fails
with -EACCES. This causes nfs4_handle_reclaim_lease_error()
to sleep a bit and resets the NFS4CLNT_LEASE_EXPIRED bit.
This in turn causes nfs4_state_manager() to try and
reestablished the lease, again, again, again...

The problem is a valid RPCSEC_GSS client is being created when
rpc.gssd is not running.

Link: http://lkml.kernel.org/r/1392066375-16502-1-git-send-email-steved@redhat.com
Fixes: 0ea9de0ea6a4 (sunrpc: turn warn_gssd() log message into a dprintk())
Reported-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 net/sunrpc/auth_gss/auth_gss.c | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'net')

diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index 6c0513a7f992..44a61e8fda6f 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -991,6 +991,8 @@ gss_create_new(struct rpc_auth_create_args *args, struct rpc_clnt *clnt)
 	gss_auth->service = gss_pseudoflavor_to_service(gss_auth->mech, flavor);
 	if (gss_auth->service == 0)
 		goto err_put_mech;
+	if (!gssd_running(gss_auth->net))
+		goto err_put_mech;
 	auth = &gss_auth->rpc_auth;
 	auth->au_cslack = GSS_CRED_SLACK >> 2;
 	auth->au_rslack = GSS_VERF_SLACK >> 2;
-- 
cgit v1.2.2


From 32769814d54a5a360b83811b4039c776ec953c71 Mon Sep 17 00:00:00 2001
From: Arik Nemtsov <arik@wizery.com>
Date: Tue, 11 Feb 2014 12:27:19 +0200
Subject: mac80211: fix sched_scan restart on recovery

In case we were not suspended, the reconfig function returns without
configuring the scheduled scan.

Signed-off-by: Arik Nemtsov <arikx.nemtsov@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
 net/mac80211/util.c | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

(limited to 'net')

diff --git a/net/mac80211/util.c b/net/mac80211/util.c
index 676dc0967f37..1d1bb7084c05 100644
--- a/net/mac80211/util.c
+++ b/net/mac80211/util.c
@@ -1740,6 +1740,26 @@ int ieee80211_reconfig(struct ieee80211_local *local)
 	ieee80211_wake_queues_by_reason(hw, IEEE80211_MAX_QUEUE_MAP,
 					IEEE80211_QUEUE_STOP_REASON_SUSPEND);
 
+	/*
+	 * Reconfigure sched scan if it was interrupted by FW restart or
+	 * suspend.
+	 */
+	mutex_lock(&local->mtx);
+	sched_scan_sdata = rcu_dereference_protected(local->sched_scan_sdata,
+						lockdep_is_held(&local->mtx));
+	if (sched_scan_sdata && local->sched_scan_req)
+		/*
+		 * Sched scan stopped, but we don't want to report it. Instead,
+		 * we're trying to reschedule.
+		 */
+		if (__ieee80211_request_sched_scan_start(sched_scan_sdata,
+							 local->sched_scan_req))
+			sched_scan_stopped = true;
+	mutex_unlock(&local->mtx);
+
+	if (sched_scan_stopped)
+		cfg80211_sched_scan_stopped(local->hw.wiphy);
+
 	/*
 	 * If this is for hw restart things are still running.
 	 * We may want to change that later, however.
@@ -1768,26 +1788,6 @@ int ieee80211_reconfig(struct ieee80211_local *local)
 	WARN_ON(1);
 #endif
 
-	/*
-	 * Reconfigure sched scan if it was interrupted by FW restart or
-	 * suspend.
-	 */
-	mutex_lock(&local->mtx);
-	sched_scan_sdata = rcu_dereference_protected(local->sched_scan_sdata,
-						lockdep_is_held(&local->mtx));
-	if (sched_scan_sdata && local->sched_scan_req)
-		/*
-		 * Sched scan stopped, but we don't want to report it. Instead,
-		 * we're trying to reschedule.
-		 */
-		if (__ieee80211_request_sched_scan_start(sched_scan_sdata,
-							 local->sched_scan_req))
-			sched_scan_stopped = true;
-	mutex_unlock(&local->mtx);
-
-	if (sched_scan_stopped)
-		cfg80211_sched_scan_stopped(local->hw.wiphy);
-
 	return 0;
 }
 
-- 
cgit v1.2.2


From c368ddaa9ad79fdffde4756804321feba6725c75 Mon Sep 17 00:00:00 2001
From: Eytan Lifshitz <eytan.lifshitz@intel.com>
Date: Thu, 6 Feb 2014 21:01:32 +0200
Subject: mac80211: fix memory leak

In case ieee80211_prep_connection() fails to dereference
sdata->vif.chanctx_conf, the function returns and doesn't
free new_sta. fixed.

Signed-off-by: Eytan Lifshitz <eytan.lifshitz@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
 net/mac80211/mlme.c | 1 +
 1 file changed, 1 insertion(+)

(limited to 'net')

diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index fc1d82465b3c..57d5482b10fa 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -3753,6 +3753,7 @@ static int ieee80211_prep_connection(struct ieee80211_sub_if_data *sdata,
 		chanctx_conf = rcu_dereference(sdata->vif.chanctx_conf);
 		if (WARN_ON(!chanctx_conf)) {
 			rcu_read_unlock();
+			sta_info_free(local, new_sta);
 			return -EINVAL;
 		}
 		rate_flags = ieee80211_chandef_rate_flags(&chanctx_conf->def);
-- 
cgit v1.2.2


From 06ea0bfe6e6043cb56a78935a19f6f8ebc636226 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@primarydata.com>
Date: Tue, 11 Feb 2014 09:15:54 -0500
Subject: SUNRPC: Fix races in xs_nospace()

When a send failure occurs due to the socket being out of buffer space,
we call xs_nospace() in order to have the RPC task wait until the
socket has drained enough to make it worth while trying again.
The current patch fixes a race in which the socket is drained before
we get round to setting up the machinery in xs_nospace(), and which
is reported to cause hangs.

Link: http://lkml.kernel.org/r/20140210170315.33dfc621@notabene.brown
Fixes: a9a6b52ee1ba (SUNRPC: Don't start the retransmission timer...)
Reported-by: Neil Brown <neilb@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 net/sunrpc/xprtsock.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

(limited to 'net')

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 817a1e523969..0addefca8e77 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -510,6 +510,7 @@ static int xs_nospace(struct rpc_task *task)
 	struct rpc_rqst *req = task->tk_rqstp;
 	struct rpc_xprt *xprt = req->rq_xprt;
 	struct sock_xprt *transport = container_of(xprt, struct sock_xprt, xprt);
+	struct sock *sk = transport->inet;
 	int ret = -EAGAIN;
 
 	dprintk("RPC: %5u xmit incomplete (%u left of %u)\n",
@@ -527,7 +528,7 @@ static int xs_nospace(struct rpc_task *task)
 			 * window size
 			 */
 			set_bit(SOCK_NOSPACE, &transport->sock->flags);
-			transport->inet->sk_write_pending++;
+			sk->sk_write_pending++;
 			/* ...and wait for more buffer space */
 			xprt_wait_for_buffer_space(task, xs_nospace_callback);
 		}
@@ -537,6 +538,9 @@ static int xs_nospace(struct rpc_task *task)
 	}
 
 	spin_unlock_bh(&xprt->transport_lock);
+
+	/* Race breaker in case memory is freed before above code is called */
+	sk->sk_write_space(sk);
 	return ret;
 }
 
-- 
cgit v1.2.2


From 628356791b04ea988fee070f66a748a823d001bb Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@primarydata.com>
Date: Tue, 11 Feb 2014 13:56:54 -0500
Subject: SUNRPC: Fix potential memory scribble in xprt_free_bc_request()

The call to xprt_free_allocation() will call list_del() on
req->rq_bc_pa_list, which is not attached to a list.
This patch moves the list_del() out of xprt_free_allocation()
and into those callers that need it.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 net/sunrpc/backchannel_rqst.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

(limited to 'net')

diff --git a/net/sunrpc/backchannel_rqst.c b/net/sunrpc/backchannel_rqst.c
index 890a29912d5a..e860d4f7ed2a 100644
--- a/net/sunrpc/backchannel_rqst.c
+++ b/net/sunrpc/backchannel_rqst.c
@@ -64,7 +64,6 @@ static void xprt_free_allocation(struct rpc_rqst *req)
 	free_page((unsigned long)xbufp->head[0].iov_base);
 	xbufp = &req->rq_snd_buf;
 	free_page((unsigned long)xbufp->head[0].iov_base);
-	list_del(&req->rq_bc_pa_list);
 	kfree(req);
 }
 
@@ -168,8 +167,10 @@ out_free:
 	/*
 	 * Memory allocation failed, free the temporary list
 	 */
-	list_for_each_entry_safe(req, tmp, &tmp_list, rq_bc_pa_list)
+	list_for_each_entry_safe(req, tmp, &tmp_list, rq_bc_pa_list) {
+		list_del(&req->rq_bc_pa_list);
 		xprt_free_allocation(req);
+	}
 
 	dprintk("RPC:       setup backchannel transport failed\n");
 	return -ENOMEM;
@@ -198,6 +199,7 @@ void xprt_destroy_backchannel(struct rpc_xprt *xprt, unsigned int max_reqs)
 	xprt_dec_alloc_count(xprt, max_reqs);
 	list_for_each_entry_safe(req, tmp, &xprt->bc_pa_list, rq_bc_pa_list) {
 		dprintk("RPC:        req=%p\n", req);
+		list_del(&req->rq_bc_pa_list);
 		xprt_free_allocation(req);
 		if (--max_reqs == 0)
 			break;
-- 
cgit v1.2.2


From 1bf4bbb4024dcdab5e57634dd8ae1072d42a53ac Mon Sep 17 00:00:00 2001
From: Felix Fietkau <nbd@openwrt.org>
Date: Tue, 11 Feb 2014 16:02:47 +0100
Subject: mac80211: send control port protocol frames to the VO queue

Improves reliability of wifi connections with WPA, since authentication
frames are prioritized over normal traffic and also typically exempt
from aggregation.

Cc: stable@vger.kernel.org
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
 net/mac80211/wme.c | 5 +++++
 1 file changed, 5 insertions(+)

(limited to 'net')

diff --git a/net/mac80211/wme.c b/net/mac80211/wme.c
index 21211c60ca98..d51422c778de 100644
--- a/net/mac80211/wme.c
+++ b/net/mac80211/wme.c
@@ -154,6 +154,11 @@ u16 ieee80211_select_queue(struct ieee80211_sub_if_data *sdata,
 		return IEEE80211_AC_BE;
 	}
 
+	if (skb->protocol == sdata->control_port_protocol) {
+		skb->priority = 7;
+		return ieee80211_downgrade_queue(sdata, skb);
+	}
+
 	/* use the data classifier to determine what 802.1d tag the
 	 * data frame has */
 	rcu_read_lock();
-- 
cgit v1.2.2


From ce898ecb5a3c0027855dcee21ed99690b867d017 Mon Sep 17 00:00:00 2001
From: Patrick McHardy <kaber@trash.net>
Date: Wed, 12 Feb 2014 21:13:16 +0000
Subject: netfilter: nft_reject_inet: fix unintended fall-through in
 switch-statatement

For IPv4 packets, we call both IPv4 and IPv6 reject.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_reject_inet.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'net')

diff --git a/net/netfilter/nft_reject_inet.c b/net/netfilter/nft_reject_inet.c
index 8a310f239c93..b718a52a4654 100644
--- a/net/netfilter/nft_reject_inet.c
+++ b/net/netfilter/nft_reject_inet.c
@@ -21,9 +21,9 @@ static void nft_reject_inet_eval(const struct nft_expr *expr,
 {
 	switch (pkt->ops->pf) {
 	case NFPROTO_IPV4:
-		nft_reject_ipv4_eval(expr, data, pkt);
+		return nft_reject_ipv4_eval(expr, data, pkt);
 	case NFPROTO_IPV6:
-		nft_reject_ipv6_eval(expr, data, pkt);
+		return nft_reject_ipv6_eval(expr, data, pkt);
 	}
 }
 
-- 
cgit v1.2.2


From 06efbd6d5694b2e3cde176f724ba572d57709616 Mon Sep 17 00:00:00 2001
From: Paul Bolle <pebolle@tiscali.nl>
Date: Wed, 12 Feb 2014 10:53:01 +0100
Subject: netfilter: nft_meta: fix typo "CONFIG_NET_CLS_ROUTE"

There are two checks for CONFIG_NET_CLS_ROUTE, but the corresponding
Kconfig symbol was dropped in v2.6.39. Since the code guards access to
dst_entry.tclassid it seems CONFIG_IP_ROUTE_CLASSID should be used
instead.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_meta.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'net')

diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
index e8254ad2e5a9..425cf39af890 100644
--- a/net/netfilter/nft_meta.c
+++ b/net/netfilter/nft_meta.c
@@ -116,7 +116,7 @@ static void nft_meta_get_eval(const struct nft_expr *expr,
 				 skb->sk->sk_socket->file->f_cred->fsgid);
 		read_unlock_bh(&skb->sk->sk_callback_lock);
 		break;
-#ifdef CONFIG_NET_CLS_ROUTE
+#ifdef CONFIG_IP_ROUTE_CLASSID
 	case NFT_META_RTCLASSID: {
 		const struct dst_entry *dst = skb_dst(skb);
 
@@ -199,7 +199,7 @@ static int nft_meta_init_validate_get(uint32_t key)
 	case NFT_META_OIFTYPE:
 	case NFT_META_SKUID:
 	case NFT_META_SKGID:
-#ifdef CONFIG_NET_CLS_ROUTE
+#ifdef CONFIG_IP_ROUTE_CLASSID
 	case NFT_META_RTCLASSID:
 #endif
 #ifdef CONFIG_NETWORK_SECMARK
-- 
cgit v1.2.2


From 2b7a79bae2dc0327af2352e1d1793b9d752648aa Mon Sep 17 00:00:00 2001
From: FX Le Bail <fx.lebail@yahoo.com>
Date: Tue, 11 Feb 2014 15:49:25 +0100
Subject: netfilter: nf_nat_snmp_basic: fix duplicates in if/else branches

The solution was found by Patrick in 2.4 kernel sources.

Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Francois-Xavier Le Bail <fx.lebail@yahoo.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/ipv4/netfilter/nf_nat_snmp_basic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'net')

diff --git a/net/ipv4/netfilter/nf_nat_snmp_basic.c b/net/ipv4/netfilter/nf_nat_snmp_basic.c
index d551e31b416e..7c676671329d 100644
--- a/net/ipv4/netfilter/nf_nat_snmp_basic.c
+++ b/net/ipv4/netfilter/nf_nat_snmp_basic.c
@@ -1198,8 +1198,8 @@ static int snmp_translate(struct nf_conn *ct,
 		map.to = NOCT1(&ct->tuplehash[!dir].tuple.dst.u3.ip);
 	} else {
 		/* DNAT replies */
-		map.from = NOCT1(&ct->tuplehash[dir].tuple.src.u3.ip);
-		map.to = NOCT1(&ct->tuplehash[!dir].tuple.dst.u3.ip);
+		map.from = NOCT1(&ct->tuplehash[!dir].tuple.src.u3.ip);
+		map.to = NOCT1(&ct->tuplehash[dir].tuple.dst.u3.ip);
 	}
 
 	if (map.from == map.to)
-- 
cgit v1.2.2


From 9eb2ddb48ce3a7bd745c14a933112994647fa3cd Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@primarydata.com>
Date: Sun, 16 Feb 2014 12:14:13 -0500
Subject: SUNRPC: Ensure that gss_auth isn't freed before its upcall messages

Fix a race in which the RPC client is shutting down while the
gss daemon is processing a downcall. If the RPC client manages to
shut down before the gss daemon is done, then the struct gss_auth
used in gss_release_msg() may have already been freed.

Link: http://lkml.kernel.org/r/1392494917.71728.YahooMailNeo@web140002.mail.bf1.yahoo.com
Reported-by: John <da_audiophile@yahoo.com>
Reported-by: Borislav Petkov <bp@alien8.de>
Cc: stable@vger.kernel.org # 3.12+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 net/sunrpc/auth_gss/auth_gss.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

(limited to 'net')

diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index 44a61e8fda6f..1ba1fd114912 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -108,6 +108,7 @@ struct gss_auth {
 static DEFINE_SPINLOCK(pipe_version_lock);
 static struct rpc_wait_queue pipe_version_rpc_waitqueue;
 static DECLARE_WAIT_QUEUE_HEAD(pipe_version_waitqueue);
+static void gss_put_auth(struct gss_auth *gss_auth);
 
 static void gss_free_ctx(struct gss_cl_ctx *);
 static const struct rpc_pipe_ops gss_upcall_ops_v0;
@@ -320,6 +321,7 @@ gss_release_msg(struct gss_upcall_msg *gss_msg)
 	if (gss_msg->ctx != NULL)
 		gss_put_ctx(gss_msg->ctx);
 	rpc_destroy_wait_queue(&gss_msg->rpc_waitqueue);
+	gss_put_auth(gss_msg->auth);
 	kfree(gss_msg);
 }
 
@@ -500,6 +502,7 @@ gss_alloc_msg(struct gss_auth *gss_auth,
 		if (err)
 			goto err_free_msg;
 	};
+	kref_get(&gss_auth->kref);
 	return gss_msg;
 err_free_msg:
 	kfree(gss_msg);
@@ -1063,6 +1066,12 @@ gss_free_callback(struct kref *kref)
 	gss_free(gss_auth);
 }
 
+static void
+gss_put_auth(struct gss_auth *gss_auth)
+{
+	kref_put(&gss_auth->kref, gss_free_callback);
+}
+
 static void
 gss_destroy(struct rpc_auth *auth)
 {
@@ -1084,7 +1093,7 @@ gss_destroy(struct rpc_auth *auth)
 	gss_auth->gss_pipe[1] = NULL;
 	rpcauth_destroy_credcache(auth);
 
-	kref_put(&gss_auth->kref, gss_free_callback);
+	gss_put_auth(gss_auth);
 }
 
 /*
@@ -1255,7 +1264,7 @@ gss_destroy_nullcred(struct rpc_cred *cred)
 	call_rcu(&cred->cr_rcu, gss_free_cred_callback);
 	if (ctx)
 		gss_put_ctx(ctx);
-	kref_put(&gss_auth->kref, gss_free_callback);
+	gss_put_auth(gss_auth);
 }
 
 static void
-- 
cgit v1.2.2


From e9776d0f4adee8877145672f6416b06b57f2dc27 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@primarydata.com>
Date: Sun, 16 Feb 2014 13:28:01 -0500
Subject: SUNRPC: Fix a pipe_version reference leak

In gss_alloc_msg(), if the call to gss_encode_v1_msg() fails, we
want to release the reference to the pipe_version that was obtained
earlier in the function.

Fixes: 9d3a2260f0f4b (SUNRPC: Fix buffer overflow checking in...)
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 net/sunrpc/auth_gss/auth_gss.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

(limited to 'net')

diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index 1ba1fd114912..36e431ee1c90 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -500,10 +500,12 @@ gss_alloc_msg(struct gss_auth *gss_auth,
 	default:
 		err = gss_encode_v1_msg(gss_msg, service_name, gss_auth->target_name);
 		if (err)
-			goto err_free_msg;
+			goto err_put_pipe_version;
 	};
 	kref_get(&gss_auth->kref);
 	return gss_msg;
+err_put_pipe_version:
+	put_pipe_version(gss_auth->net);
 err_free_msg:
 	kfree(gss_msg);
 err:
-- 
cgit v1.2.2


From 478b360a47b71f3b5030eacd3aae6acb1a32c2b6 Mon Sep 17 00:00:00 2001
From: Florian Westphal <fw@strlen.de>
Date: Sat, 15 Feb 2014 23:48:45 +0100
Subject: netfilter: nf_tables: fix nf_trace always-on with XT_TRACE=n

When using nftables with CONFIG_NETFILTER_XT_TARGET_TRACE=n, we get
lots of "TRACE: filter:output:policy:1 IN=..." warnings as several
places will leave skb->nf_trace uninitialised.

Unlike iptables tracing functionality is not conditional in nftables,
so always copy/zero nf_trace setting when nftables is enabled.

Move this into __nf_copy() helper.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/core/skbuff.c     | 3 ---
 net/ipv4/ip_output.c  | 3 ---
 net/ipv6/ip6_output.c | 3 ---
 3 files changed, 9 deletions(-)

(limited to 'net')

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5976ef0846bd..5d6236d9fdce 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -707,9 +707,6 @@ static void __copy_skb_header(struct sk_buff *new, const struct sk_buff *old)
 	new->mark		= old->mark;
 	new->skb_iif		= old->skb_iif;
 	__nf_copy(new, old);
-#if IS_ENABLED(CONFIG_NETFILTER_XT_TARGET_TRACE)
-	new->nf_trace		= old->nf_trace;
-#endif
 #ifdef CONFIG_NET_SCHED
 	new->tc_index		= old->tc_index;
 #ifdef CONFIG_NET_CLS_ACT
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 8971780aec7c..73c6b63bba74 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -422,9 +422,6 @@ static void ip_copy_metadata(struct sk_buff *to, struct sk_buff *from)
 	to->tc_index = from->tc_index;
 #endif
 	nf_copy(to, from);
-#if IS_ENABLED(CONFIG_NETFILTER_XT_TARGET_TRACE)
-	to->nf_trace = from->nf_trace;
-#endif
 #if defined(CONFIG_IP_VS) || defined(CONFIG_IP_VS_MODULE)
 	to->ipvs_property = from->ipvs_property;
 #endif
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index ef02b26ccf81..4cfbe0f3793a 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -517,9 +517,6 @@ static void ip6_copy_metadata(struct sk_buff *to, struct sk_buff *from)
 	to->tc_index = from->tc_index;
 #endif
 	nf_copy(to, from);
-#if IS_ENABLED(CONFIG_NETFILTER_XT_TARGET_TRACE)
-	to->nf_trace = from->nf_trace;
-#endif
 	skb_copy_secmark(to, from);
 }
 
-- 
cgit v1.2.2


From f627ed91d85ed7a189ec8b3b045a0d831e1655e2 Mon Sep 17 00:00:00 2001
From: Nikolay Aleksandrov <nikolay@redhat.com>
Date: Sun, 16 Feb 2014 14:01:58 +0100
Subject: netfilter: nf_tables: check if payload length is a power of 2

Add a check if payload's length is a power of 2 when selecting ops.
The fast ops were meant for well aligned loads, also this fixes a
small bug when using a length of 3 with some offsets which causes
only 1 byte to be loaded because the fast ops are chosen.

Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_payload.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

(limited to 'net')

diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c
index a2aeb318678f..85daa84bfdfe 100644
--- a/net/netfilter/nft_payload.c
+++ b/net/netfilter/nft_payload.c
@@ -135,7 +135,8 @@ nft_payload_select_ops(const struct nft_ctx *ctx,
 	if (len == 0 || len > FIELD_SIZEOF(struct nft_data, data))
 		return ERR_PTR(-EINVAL);
 
-	if (len <= 4 && IS_ALIGNED(offset, len) && base != NFT_PAYLOAD_LL_HEADER)
+	if (len <= 4 && is_power_of_2(len) && IS_ALIGNED(offset, len) &&
+	    base != NFT_PAYLOAD_LL_HEADER)
 		return &nft_payload_fast_ops;
 	else
 		return &nft_payload_ops;
-- 
cgit v1.2.2


From 0eba801b64cc8284d9024c7ece30415a2b981a72 Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso <pablo@netfilter.org>
Date: Sun, 16 Feb 2014 12:15:43 +0100
Subject: netfilter: ctnetlink: force null nat binding on insert

Quoting Andrey Vagin:
  When a conntrack is created  by kernel, it is initialized (sets
  IPS_{DST,SRC}_NAT_DONE_BIT bits in nf_nat_setup_info) and only then it
  is added in hashes (__nf_conntrack_hash_insert), so one conntract
  can't be initialized from a few threads concurrently.

  ctnetlink can add an uninitialized conntrack (w/o
  IPS_{DST,SRC}_NAT_DONE_BIT) in hashes, then a few threads can look up
  this conntrack and start initialize it concurrently. It's dangerous,
  because BUG can be triggered from nf_nat_setup_info.

Fix this race by always setting up nat, even if no CTA_NAT_ attribute
was requested before inserting the ct into the hash table. In absence
of CTA_NAT_ attribute, a null binding is created.

This alters current behaviour: Before this patch, the first packet
matching the newly injected conntrack would be run through the nat
table since nf_nat_initialized() returns false.  IOW, this forces
ctnetlink users to specify the desired nat transformation on ct
creation time.

Thanks for Florian Westphal, this patch is based on his original
patch to address this problem, including this patch description.

Reported-By: Andrey Vagin <avagin@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Florian Westphal <fw@strlen.de>
---
 net/netfilter/nf_conntrack_netlink.c | 35 +++++++++-------------
 net/netfilter/nf_nat_core.c          | 56 ++++++++++++++++++++++--------------
 2 files changed, 49 insertions(+), 42 deletions(-)

(limited to 'net')

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index bb322d0beb48..b9f0e0374322 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1310,27 +1310,22 @@ ctnetlink_change_status(struct nf_conn *ct, const struct nlattr * const cda[])
 }
 
 static int
-ctnetlink_change_nat(struct nf_conn *ct, const struct nlattr * const cda[])
+ctnetlink_setup_nat(struct nf_conn *ct, const struct nlattr * const cda[])
 {
 #ifdef CONFIG_NF_NAT_NEEDED
 	int ret;
 
-	if (cda[CTA_NAT_DST]) {
-		ret = ctnetlink_parse_nat_setup(ct,
-						NF_NAT_MANIP_DST,
-						cda[CTA_NAT_DST]);
-		if (ret < 0)
-			return ret;
-	}
-	if (cda[CTA_NAT_SRC]) {
-		ret = ctnetlink_parse_nat_setup(ct,
-						NF_NAT_MANIP_SRC,
-						cda[CTA_NAT_SRC]);
-		if (ret < 0)
-			return ret;
-	}
-	return 0;
+	ret = ctnetlink_parse_nat_setup(ct, NF_NAT_MANIP_DST,
+					cda[CTA_NAT_DST]);
+	if (ret < 0)
+		return ret;
+
+	ret = ctnetlink_parse_nat_setup(ct, NF_NAT_MANIP_SRC,
+					cda[CTA_NAT_SRC]);
+	return ret;
 #else
+	if (!cda[CTA_NAT_DST] && !cda[CTA_NAT_SRC])
+		return 0;
 	return -EOPNOTSUPP;
 #endif
 }
@@ -1659,11 +1654,9 @@ ctnetlink_create_conntrack(struct net *net, u16 zone,
 			goto err2;
 	}
 
-	if (cda[CTA_NAT_SRC] || cda[CTA_NAT_DST]) {
-		err = ctnetlink_change_nat(ct, cda);
-		if (err < 0)
-			goto err2;
-	}
+	err = ctnetlink_setup_nat(ct, cda);
+	if (err < 0)
+		goto err2;
 
 	nf_ct_acct_ext_add(ct, GFP_ATOMIC);
 	nf_ct_tstamp_ext_add(ct, GFP_ATOMIC);
diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index d3f5cd6dd962..52ca952b802c 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -432,15 +432,15 @@ nf_nat_setup_info(struct nf_conn *ct,
 }
 EXPORT_SYMBOL(nf_nat_setup_info);
 
-unsigned int
-nf_nat_alloc_null_binding(struct nf_conn *ct, unsigned int hooknum)
+static unsigned int
+__nf_nat_alloc_null_binding(struct nf_conn *ct, enum nf_nat_manip_type manip)
 {
 	/* Force range to this IP; let proto decide mapping for
 	 * per-proto parts (hence not IP_NAT_RANGE_PROTO_SPECIFIED).
 	 * Use reply in case it's already been mangled (eg local packet).
 	 */
 	union nf_inet_addr ip =
-		(HOOK2MANIP(hooknum) == NF_NAT_MANIP_SRC ?
+		(manip == NF_NAT_MANIP_SRC ?
 		ct->tuplehash[IP_CT_DIR_REPLY].tuple.dst.u3 :
 		ct->tuplehash[IP_CT_DIR_REPLY].tuple.src.u3);
 	struct nf_nat_range range = {
@@ -448,7 +448,13 @@ nf_nat_alloc_null_binding(struct nf_conn *ct, unsigned int hooknum)
 		.min_addr	= ip,
 		.max_addr	= ip,
 	};
-	return nf_nat_setup_info(ct, &range, HOOK2MANIP(hooknum));
+	return nf_nat_setup_info(ct, &range, manip);
+}
+
+unsigned int
+nf_nat_alloc_null_binding(struct nf_conn *ct, unsigned int hooknum)
+{
+	return __nf_nat_alloc_null_binding(ct, HOOK2MANIP(hooknum));
 }
 EXPORT_SYMBOL_GPL(nf_nat_alloc_null_binding);
 
@@ -702,9 +708,9 @@ static const struct nla_policy nat_nla_policy[CTA_NAT_MAX+1] = {
 
 static int
 nfnetlink_parse_nat(const struct nlattr *nat,
-		    const struct nf_conn *ct, struct nf_nat_range *range)
+		    const struct nf_conn *ct, struct nf_nat_range *range,
+		    const struct nf_nat_l3proto *l3proto)
 {
-	const struct nf_nat_l3proto *l3proto;
 	struct nlattr *tb[CTA_NAT_MAX+1];
 	int err;
 
@@ -714,38 +720,46 @@ nfnetlink_parse_nat(const struct nlattr *nat,
 	if (err < 0)
 		return err;
 
-	rcu_read_lock();
-	l3proto = __nf_nat_l3proto_find(nf_ct_l3num(ct));
-	if (l3proto == NULL) {
-		err = -EAGAIN;
-		goto out;
-	}
 	err = l3proto->nlattr_to_range(tb, range);
 	if (err < 0)
-		goto out;
+		return err;
 
 	if (!tb[CTA_NAT_PROTO])
-		goto out;
+		return 0;
 
-	err = nfnetlink_parse_nat_proto(tb[CTA_NAT_PROTO], ct, range);
-out:
-	rcu_read_unlock();
-	return err;
+	return nfnetlink_parse_nat_proto(tb[CTA_NAT_PROTO], ct, range);
 }
 
+/* This function is called under rcu_read_lock() */
 static int
 nfnetlink_parse_nat_setup(struct nf_conn *ct,
 			  enum nf_nat_manip_type manip,
 			  const struct nlattr *attr)
 {
 	struct nf_nat_range range;
+	const struct nf_nat_l3proto *l3proto;
 	int err;
 
-	err = nfnetlink_parse_nat(attr, ct, &range);
+	/* Should not happen, restricted to creating new conntracks
+	 * via ctnetlink.
+	 */
+	if (WARN_ON_ONCE(nf_nat_initialized(ct, manip)))
+		return -EEXIST;
+
+	/* Make sure that L3 NAT is there by when we call nf_nat_setup_info to
+	 * attach the null binding, otherwise this may oops.
+	 */
+	l3proto = __nf_nat_l3proto_find(nf_ct_l3num(ct));
+	if (l3proto == NULL)
+		return -EAGAIN;
+
+	/* No NAT information has been passed, allocate the null-binding */
+	if (attr == NULL)
+		return __nf_nat_alloc_null_binding(ct, manip);
+
+	err = nfnetlink_parse_nat(attr, ct, &range, l3proto);
 	if (err < 0)
 		return err;
-	if (nf_nat_initialized(ct, manip))
-		return -EEXIST;
 
 	return nf_nat_setup_info(ct, &range, manip);
 }
-- 
cgit v1.2.2


From 50c11eb9982554e9f99b7bab322c517cbe5ce1a1 Mon Sep 17 00:00:00 2001
From: Inbal Hacohen <Inbal.Hacohen@intel.com>
Date: Wed, 12 Feb 2014 09:32:27 +0200
Subject: cfg80211: bugfix in regulatory user hint process

After processing hint_user, we would want to schedule the
timeout work only if we are actually waiting to CRDA. This happens
when the status is not "IGNORE" nor "ALREADY_SET".

Signed-off-by: Inbal Hacohen <Inbal.Hacohen@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
 net/wireless/reg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'net')

diff --git a/net/wireless/reg.c b/net/wireless/reg.c
index 9b897fca7487..4c50c21d6f52 100644
--- a/net/wireless/reg.c
+++ b/net/wireless/reg.c
@@ -1700,7 +1700,7 @@ static void reg_process_hint(struct regulatory_request *reg_request)
 		return;
 	case NL80211_REGDOM_SET_BY_USER:
 		treatment = reg_process_hint_user(reg_request);
-		if (treatment == REG_REQ_OK ||
+		if (treatment == REG_REQ_IGNORE ||
 		    treatment == REG_REQ_ALREADY_SET)
 			return;
 		schedule_delayed_work(&reg_timeout, msecs_to_jiffies(3142));
-- 
cgit v1.2.2


From ebe44f350e15d6142d4d74cbaec0dad976c36753 Mon Sep 17 00:00:00 2001
From: "David S. Miller" <davem@davemloft.net>
Date: Thu, 20 Feb 2014 02:14:23 -0500
Subject: ip_tunnel: Move ip_tunnel_get_stats64 into ip_tunnel_core.c

net/built-in.o:(.rodata+0x1707c): undefined reference to `ip_tunnel_get_stats64'

Reported-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/ip_tunnel.c      | 46 ----------------------------------------------
 net/ipv4/ip_tunnel_core.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+), 46 deletions(-)

(limited to 'net')

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 50228be5c17b..08f8cf99b3a2 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -119,52 +119,6 @@ static struct rtable *tunnel_rtable_get(struct ip_tunnel *t, u32 cookie)
 	return (struct rtable *)dst;
 }
 
-/* Often modified stats are per cpu, other are shared (netdev->stats) */
-struct rtnl_link_stats64 *ip_tunnel_get_stats64(struct net_device *dev,
-						struct rtnl_link_stats64 *tot)
-{
-	int i;
-
-	for_each_possible_cpu(i) {
-		const struct pcpu_sw_netstats *tstats =
-						   per_cpu_ptr(dev->tstats, i);
-		u64 rx_packets, rx_bytes, tx_packets, tx_bytes;
-		unsigned int start;
-
-		do {
-			start = u64_stats_fetch_begin_bh(&tstats->syncp);
-			rx_packets = tstats->rx_packets;
-			tx_packets = tstats->tx_packets;
-			rx_bytes = tstats->rx_bytes;
-			tx_bytes = tstats->tx_bytes;
-		} while (u64_stats_fetch_retry_bh(&tstats->syncp, start));
-
-		tot->rx_packets += rx_packets;
-		tot->tx_packets += tx_packets;
-		tot->rx_bytes   += rx_bytes;
-		tot->tx_bytes   += tx_bytes;
-	}
-
-	tot->multicast = dev->stats.multicast;
-
-	tot->rx_crc_errors = dev->stats.rx_crc_errors;
-	tot->rx_fifo_errors = dev->stats.rx_fifo_errors;
-	tot->rx_length_errors = dev->stats.rx_length_errors;
-	tot->rx_frame_errors = dev->stats.rx_frame_errors;
-	tot->rx_errors = dev->stats.rx_errors;
-
-	tot->tx_fifo_errors = dev->stats.tx_fifo_errors;
-	tot->tx_carrier_errors = dev->stats.tx_carrier_errors;
-	tot->tx_dropped = dev->stats.tx_dropped;
-	tot->tx_aborted_errors = dev->stats.tx_aborted_errors;
-	tot->tx_errors = dev->stats.tx_errors;
-
-	tot->collisions  = dev->stats.collisions;
-
-	return tot;
-}
-EXPORT_SYMBOL_GPL(ip_tunnel_get_stats64);
-
 static bool ip_tunnel_key_match(const struct ip_tunnel_parm *p,
 				__be16 flags, __be32 key)
 {
diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 6156f4ef5e91..8d69626f2206 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -148,3 +148,49 @@ error:
 	return ERR_PTR(err);
 }
 EXPORT_SYMBOL_GPL(iptunnel_handle_offloads);
+
+/* Often modified stats are per cpu, other are shared (netdev->stats) */
+struct rtnl_link_stats64 *ip_tunnel_get_stats64(struct net_device *dev,
+						struct rtnl_link_stats64 *tot)
+{
+	int i;
+
+	for_each_possible_cpu(i) {
+		const struct pcpu_sw_netstats *tstats =
+						   per_cpu_ptr(dev->tstats, i);
+		u64 rx_packets, rx_bytes, tx_packets, tx_bytes;
+		unsigned int start;
+
+		do {
+			start = u64_stats_fetch_begin_bh(&tstats->syncp);
+			rx_packets = tstats->rx_packets;
+			tx_packets = tstats->tx_packets;
+			rx_bytes = tstats->rx_bytes;
+			tx_bytes = tstats->tx_bytes;
+		} while (u64_stats_fetch_retry_bh(&tstats->syncp, start));
+
+		tot->rx_packets += rx_packets;
+		tot->tx_packets += tx_packets;
+		tot->rx_bytes   += rx_bytes;
+		tot->tx_bytes   += tx_bytes;
+	}
+
+	tot->multicast = dev->stats.multicast;
+
+	tot->rx_crc_errors = dev->stats.rx_crc_errors;
+	tot->rx_fifo_errors = dev->stats.rx_fifo_errors;
+	tot->rx_length_errors = dev->stats.rx_length_errors;
+	tot->rx_frame_errors = dev->stats.rx_frame_errors;
+	tot->rx_errors = dev->stats.rx_errors;
+
+	tot->tx_fifo_errors = dev->stats.tx_fifo_errors;
+	tot->tx_carrier_errors = dev->stats.tx_carrier_errors;
+	tot->tx_dropped = dev->stats.tx_dropped;
+	tot->tx_aborted_errors = dev->stats.tx_aborted_errors;
+	tot->tx_errors = dev->stats.tx_errors;
+
+	tot->collisions  = dev->stats.collisions;
+
+	return tot;
+}
+EXPORT_SYMBOL_GPL(ip_tunnel_get_stats64);
-- 
cgit v1.2.2


From 1d147bfa64293b2723c4fec50922168658e613ba Mon Sep 17 00:00:00 2001
From: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Date: Thu, 20 Feb 2014 09:22:11 +0200
Subject: mac80211: fix AP powersave TX vs. wakeup race

There is a race between the TX path and the STA wakeup: while
a station is sleeping, mac80211 buffers frames until it wakes
up, then the frames are transmitted. However, the RX and TX
path are concurrent, so the packet indicating wakeup can be
processed while a packet is being transmitted.

This can lead to a situation where the buffered frames list
is emptied on the one side, while a frame is being added on
the other side, as the station is still seen as sleeping in
the TX path.

As a result, the newly added frame will not be send anytime
soon. It might be sent much later (and out of order) when the
station goes to sleep and wakes up the next time.

Additionally, it can lead to the crash below.

Fix all this by synchronising both paths with a new lock.
Both path are not fastpath since they handle PS situations.

In a later patch we'll remove the extra skb queue locks to
reduce locking overhead.

BUG: unable to handle kernel
NULL pointer dereference at 000000b0
IP: [<ff6f1791>] ieee80211_report_used_skb+0x11/0x3e0 [mac80211]
*pde = 00000000
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
EIP: 0060:[<ff6f1791>] EFLAGS: 00210282 CPU: 1
EIP is at ieee80211_report_used_skb+0x11/0x3e0 [mac80211]
EAX: e5900da0 EBX: 00000000 ECX: 00000001 EDX: 00000000
ESI: e41d00c0 EDI: e5900da0 EBP: ebe458e4 ESP: ebe458b0
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 000000b0 CR3: 25a78000 CR4: 000407d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process iperf (pid: 3934, ti=ebe44000 task=e757c0b0 task.ti=ebe44000)
iwlwifi 0000:02:00.0: I iwl_pcie_enqueue_hcmd Sending command LQ_CMD (#4e), seq: 0x0903, 92 bytes at 3[3]:9
Stack:
 e403b32c ebe458c4 00200002 00200286 e403b338 ebe458cc c10960bb e5900da0
 ff76a6ec ebe458d8 00000000 e41d00c0 e5900da0 ebe458f0 ff6f1b75 e403b210
 ebe4598c ff723dc1 00000000 ff76a6ec e597c978 e403b758 00000002 00000002
Call Trace:
 [<ff6f1b75>] ieee80211_free_txskb+0x15/0x20 [mac80211]
 [<ff723dc1>] invoke_tx_handlers+0x1661/0x1780 [mac80211]
 [<ff7248a5>] ieee80211_tx+0x75/0x100 [mac80211]
 [<ff7249bf>] ieee80211_xmit+0x8f/0xc0 [mac80211]
 [<ff72550e>] ieee80211_subif_start_xmit+0x4fe/0xe20 [mac80211]
 [<c149ef70>] dev_hard_start_xmit+0x450/0x950
 [<c14b9aa9>] sch_direct_xmit+0xa9/0x250
 [<c14b9c9b>] __qdisc_run+0x4b/0x150
 [<c149f732>] dev_queue_xmit+0x2c2/0xca0

Cc: stable@vger.kernel.org
Reported-by: Yaara Rozenblum <yaara.rozenblum@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Reviewed-by: Stanislaw Gruszka <sgruszka@redhat.com>
[reword commit log, use a separate lock]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
 net/mac80211/sta_info.c |  4 ++++
 net/mac80211/sta_info.h |  7 +++----
 net/mac80211/tx.c       | 15 +++++++++++++++
 3 files changed, 22 insertions(+), 4 deletions(-)

(limited to 'net')

diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index decd30c1e290..62a5f0889583 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -330,6 +330,7 @@ struct sta_info *sta_info_alloc(struct ieee80211_sub_if_data *sdata,
 	rcu_read_unlock();
 
 	spin_lock_init(&sta->lock);
+	spin_lock_init(&sta->ps_lock);
 	INIT_WORK(&sta->drv_unblock_wk, sta_unblock);
 	INIT_WORK(&sta->ampdu_mlme.work, ieee80211_ba_session_work);
 	mutex_init(&sta->ampdu_mlme.mtx);
@@ -1109,6 +1110,8 @@ void ieee80211_sta_ps_deliver_wakeup(struct sta_info *sta)
 
 	skb_queue_head_init(&pending);
 
+	/* sync with ieee80211_tx_h_unicast_ps_buf */
+	spin_lock(&sta->ps_lock);
 	/* Send all buffered frames to the station */
 	for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) {
 		int count = skb_queue_len(&pending), tmp;
@@ -1128,6 +1131,7 @@ void ieee80211_sta_ps_deliver_wakeup(struct sta_info *sta)
 	}
 
 	ieee80211_add_pending_skbs_fn(local, &pending, clear_sta_ps_flags, sta);
+	spin_unlock(&sta->ps_lock);
 
 	/* This station just woke up and isn't aware of our SMPS state */
 	if (!ieee80211_smps_is_restrictive(sta->known_smps_mode,
diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h
index d77ff7090630..d3a6d8208f2f 100644
--- a/net/mac80211/sta_info.h
+++ b/net/mac80211/sta_info.h
@@ -267,6 +267,7 @@ struct ieee80211_tx_latency_stat {
  * @drv_unblock_wk: used for driver PS unblocking
  * @listen_interval: listen interval of this station, when we're acting as AP
  * @_flags: STA flags, see &enum ieee80211_sta_info_flags, do not use directly
+ * @ps_lock: used for powersave (when mac80211 is the AP) related locking
  * @ps_tx_buf: buffers (per AC) of frames to transmit to this station
  *	when it leaves power saving state or polls
  * @tx_filtered: buffers (per AC) of frames we already tried to
@@ -356,10 +357,8 @@ struct sta_info {
 	/* use the accessors defined below */
 	unsigned long _flags;
 
-	/*
-	 * STA powersave frame queues, no more than the internal
-	 * locking required.
-	 */
+	/* STA powersave lock and frame queues */
+	spinlock_t ps_lock;
 	struct sk_buff_head ps_tx_buf[IEEE80211_NUM_ACS];
 	struct sk_buff_head tx_filtered[IEEE80211_NUM_ACS];
 	unsigned long driver_buffered_tids;
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 97a02d3f7d87..4080c615636f 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -478,6 +478,20 @@ ieee80211_tx_h_unicast_ps_buf(struct ieee80211_tx_data *tx)
 		       sta->sta.addr, sta->sta.aid, ac);
 		if (tx->local->total_ps_buffered >= TOTAL_MAX_TX_BUFFER)
 			purge_old_ps_buffers(tx->local);
+
+		/* sync with ieee80211_sta_ps_deliver_wakeup */
+		spin_lock(&sta->ps_lock);
+		/*
+		 * STA woke up the meantime and all the frames on ps_tx_buf have
+		 * been queued to pending queue. No reordering can happen, go
+		 * ahead and Tx the packet.
+		 */
+		if (!test_sta_flag(sta, WLAN_STA_PS_STA) &&
+		    !test_sta_flag(sta, WLAN_STA_PS_DRIVER)) {
+			spin_unlock(&sta->ps_lock);
+			return TX_CONTINUE;
+		}
+
 		if (skb_queue_len(&sta->ps_tx_buf[ac]) >= STA_MAX_TX_BUFFER) {
 			struct sk_buff *old = skb_dequeue(&sta->ps_tx_buf[ac]);
 			ps_dbg(tx->sdata,
@@ -492,6 +506,7 @@ ieee80211_tx_h_unicast_ps_buf(struct ieee80211_tx_data *tx)
 		info->flags |= IEEE80211_TX_INTFL_NEED_TXPROCESSING;
 		info->flags &= ~IEEE80211_TX_TEMPORARY_FLAGS;
 		skb_queue_tail(&sta->ps_tx_buf[ac], tx->skb);
+		spin_unlock(&sta->ps_lock);
 
 		if (!timer_pending(&local->sta_cleanup))
 			mod_timer(&local->sta_cleanup,
-- 
cgit v1.2.2


From 5108ca828017120981880eeec8a9ec369334a899 Mon Sep 17 00:00:00 2001
From: Johannes Berg <johannes.berg@intel.com>
Date: Mon, 17 Feb 2014 20:49:03 +0100
Subject: mac80211: insert stations before adding to driver

There's a race condition in mac80211 because we add stations
to the internal lists after adding them to the driver, which
means that (for example) the following can happen:
 1. a station connects and is added
 2. first, it is added to the driver
 3. then, it is added to the mac80211 lists

If the station goes to sleep between steps 2 and 3, and the
firmware/hardware records it as being asleep, mac80211 will
never instruct the driver to wake it up again as it never
realized it went to sleep since the RX path discarded the
frame as a "spurious class 3 frame", no station entry was
present yet.

Fix this by adding the station in software first, and only
then adding it to the driver. That way, any state that the
driver changes will be reflected properly in mac80211's
station state. The problematic part is the roll-back if the
driver fails to add the station, in that case a bit more is
needed. To not make that overly complex prevent starting BA
sessions in the meantime.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
 net/mac80211/sta_info.c | 30 ++++++++++++++++++++++++------
 1 file changed, 24 insertions(+), 6 deletions(-)

(limited to 'net')

diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 62a5f0889583..ffc1ee6a2ec1 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -91,7 +91,7 @@ static int sta_info_hash_del(struct ieee80211_local *local,
 	return -ENOENT;
 }
 
-static void cleanup_single_sta(struct sta_info *sta)
+static void __cleanup_single_sta(struct sta_info *sta)
 {
 	int ac, i;
 	struct tid_ampdu_tx *tid_tx;
@@ -139,7 +139,14 @@ static void cleanup_single_sta(struct sta_info *sta)
 		ieee80211_purge_tx_queue(&local->hw, &tid_tx->pending);
 		kfree(tid_tx);
 	}
+}
 
+static void cleanup_single_sta(struct sta_info *sta)
+{
+	struct ieee80211_sub_if_data *sdata = sta->sdata;
+	struct ieee80211_local *local = sdata->local;
+
+	__cleanup_single_sta(sta);
 	sta_info_free(local, sta);
 }
 
@@ -488,21 +495,26 @@ static int sta_info_insert_finish(struct sta_info *sta) __acquires(RCU)
 		goto out_err;
 	}
 
-	/* notify driver */
-	err = sta_info_insert_drv_state(local, sdata, sta);
-	if (err)
-		goto out_err;
-
 	local->num_sta++;
 	local->sta_generation++;
 	smp_mb();
 
+	/* simplify things and don't accept BA sessions yet */
+	set_sta_flag(sta, WLAN_STA_BLOCK_BA);
+
 	/* make the station visible */
 	sta_info_hash_add(local, sta);
 
 	list_add_rcu(&sta->list, &local->sta_list);
 
+	/* notify driver */
+	err = sta_info_insert_drv_state(local, sdata, sta);
+	if (err)
+		goto out_remove;
+
 	set_sta_flag(sta, WLAN_STA_INSERTED);
+	/* accept BA sessions now */
+	clear_sta_flag(sta, WLAN_STA_BLOCK_BA);
 
 	ieee80211_recalc_min_chandef(sdata);
 	ieee80211_sta_debugfs_add(sta);
@@ -523,6 +535,12 @@ static int sta_info_insert_finish(struct sta_info *sta) __acquires(RCU)
 		mesh_accept_plinks_update(sdata);
 
 	return 0;
+ out_remove:
+	sta_info_hash_del(local, sta);
+	list_del_rcu(&sta->list);
+	local->num_sta--;
+	synchronize_net();
+	__cleanup_single_sta(sta);
  out_err:
 	mutex_unlock(&local->sta_mtx);
 	rcu_read_lock();
-- 
cgit v1.2.2


From e3685e03b40f5ec7926d9a75bf63467fc4071df9 Mon Sep 17 00:00:00 2001
From: Johannes Berg <johannes.berg@intel.com>
Date: Thu, 20 Feb 2014 11:19:58 +0100
Subject: mac80211: fix station wakeup powersave race

Consider the following (relatively unlikely) scenario:
 1) station goes to sleep while frames are buffered in driver
 2) driver blocks wakeup (until no more frames are buffered)
 3) station wakes up again
 4) driver unblocks wakeup

In this case, the current mac80211 code will do the following:
 1) WLAN_STA_PS_STA set
 2) WLAN_STA_PS_DRIVER set
 3) - nothing -
 4) WLAN_STA_PS_DRIVER cleared

As a result, no frames will be delivered to the client, even
though it is awake, until it sends another frame to us that
triggers ieee80211_sta_ps_deliver_wakeup() in sta_ps_end().

Since we now take the PS spinlock, we can fix this while at
the same time removing the complexity with the pending skb
queue function. This was broken since my commit 50a9432daeec
("mac80211: fix powersaving clients races") due to removing
the clearing of WLAN_STA_PS_STA in the RX path.

While at it, fix a cleanup path issue when a station is
removed while the driver is still blocking its wakeup.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
 net/mac80211/ieee80211_i.h | 10 ++--------
 net/mac80211/rx.c          |  7 +++++++
 net/mac80211/sta_info.c    | 32 ++++++++++++++------------------
 net/mac80211/util.c        |  8 ++------
 4 files changed, 25 insertions(+), 32 deletions(-)

(limited to 'net')

diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 3701930c6649..5e44e3179e02 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -1692,14 +1692,8 @@ void ieee80211_stop_queue_by_reason(struct ieee80211_hw *hw, int queue,
 void ieee80211_propagate_queue_wake(struct ieee80211_local *local, int queue);
 void ieee80211_add_pending_skb(struct ieee80211_local *local,
 			       struct sk_buff *skb);
-void ieee80211_add_pending_skbs_fn(struct ieee80211_local *local,
-				   struct sk_buff_head *skbs,
-				   void (*fn)(void *data), void *data);
-static inline void ieee80211_add_pending_skbs(struct ieee80211_local *local,
-					      struct sk_buff_head *skbs)
-{
-	ieee80211_add_pending_skbs_fn(local, skbs, NULL, NULL);
-}
+void ieee80211_add_pending_skbs(struct ieee80211_local *local,
+				struct sk_buff_head *skbs);
 void ieee80211_flush_queues(struct ieee80211_local *local,
 			    struct ieee80211_sub_if_data *sdata);
 
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index c24ca0d0f469..3e57f96c9666 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -1128,6 +1128,13 @@ static void sta_ps_end(struct sta_info *sta)
 	       sta->sta.addr, sta->sta.aid);
 
 	if (test_sta_flag(sta, WLAN_STA_PS_DRIVER)) {
+		/*
+		 * Clear the flag only if the other one is still set
+		 * so that the TX path won't start TX'ing new frames
+		 * directly ... In the case that the driver flag isn't
+		 * set ieee80211_sta_ps_deliver_wakeup() will clear it.
+		 */
+		clear_sta_flag(sta, WLAN_STA_PS_STA);
 		ps_dbg(sta->sdata, "STA %pM aid %d driver-ps-blocked\n",
 		       sta->sta.addr, sta->sta.aid);
 		return;
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index ffc1ee6a2ec1..a023b432143b 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -99,7 +99,8 @@ static void __cleanup_single_sta(struct sta_info *sta)
 	struct ieee80211_local *local = sdata->local;
 	struct ps_data *ps;
 
-	if (test_sta_flag(sta, WLAN_STA_PS_STA)) {
+	if (test_sta_flag(sta, WLAN_STA_PS_STA) ||
+	    test_sta_flag(sta, WLAN_STA_PS_DRIVER)) {
 		if (sta->sdata->vif.type == NL80211_IFTYPE_AP ||
 		    sta->sdata->vif.type == NL80211_IFTYPE_AP_VLAN)
 			ps = &sdata->bss->ps;
@@ -109,6 +110,7 @@ static void __cleanup_single_sta(struct sta_info *sta)
 			return;
 
 		clear_sta_flag(sta, WLAN_STA_PS_STA);
+		clear_sta_flag(sta, WLAN_STA_PS_DRIVER);
 
 		atomic_dec(&ps->num_sta_ps);
 		sta_info_recalc_tim(sta);
@@ -1090,10 +1092,14 @@ struct ieee80211_sta *ieee80211_find_sta(struct ieee80211_vif *vif,
 }
 EXPORT_SYMBOL(ieee80211_find_sta);
 
-static void clear_sta_ps_flags(void *_sta)
+/* powersave support code */
+void ieee80211_sta_ps_deliver_wakeup(struct sta_info *sta)
 {
-	struct sta_info *sta = _sta;
 	struct ieee80211_sub_if_data *sdata = sta->sdata;
+	struct ieee80211_local *local = sdata->local;
+	struct sk_buff_head pending;
+	int filtered = 0, buffered = 0, ac;
+	unsigned long flags;
 	struct ps_data *ps;
 
 	if (sdata->vif.type == NL80211_IFTYPE_AP ||
@@ -1104,20 +1110,6 @@ static void clear_sta_ps_flags(void *_sta)
 	else
 		return;
 
-	clear_sta_flag(sta, WLAN_STA_PS_DRIVER);
-	if (test_and_clear_sta_flag(sta, WLAN_STA_PS_STA))
-		atomic_dec(&ps->num_sta_ps);
-}
-
-/* powersave support code */
-void ieee80211_sta_ps_deliver_wakeup(struct sta_info *sta)
-{
-	struct ieee80211_sub_if_data *sdata = sta->sdata;
-	struct ieee80211_local *local = sdata->local;
-	struct sk_buff_head pending;
-	int filtered = 0, buffered = 0, ac;
-	unsigned long flags;
-
 	clear_sta_flag(sta, WLAN_STA_SP);
 
 	BUILD_BUG_ON(BITS_TO_LONGS(IEEE80211_NUM_TIDS) > 1);
@@ -1148,9 +1140,13 @@ void ieee80211_sta_ps_deliver_wakeup(struct sta_info *sta)
 		buffered += tmp - count;
 	}
 
-	ieee80211_add_pending_skbs_fn(local, &pending, clear_sta_ps_flags, sta);
+	ieee80211_add_pending_skbs(local, &pending);
+	clear_sta_flag(sta, WLAN_STA_PS_DRIVER);
+	clear_sta_flag(sta, WLAN_STA_PS_STA);
 	spin_unlock(&sta->ps_lock);
 
+	atomic_dec(&ps->num_sta_ps);
+
 	/* This station just woke up and isn't aware of our SMPS state */
 	if (!ieee80211_smps_is_restrictive(sta->known_smps_mode,
 					   sdata->smps_mode) &&
diff --git a/net/mac80211/util.c b/net/mac80211/util.c
index 1d1bb7084c05..b8700d417a9c 100644
--- a/net/mac80211/util.c
+++ b/net/mac80211/util.c
@@ -435,9 +435,8 @@ void ieee80211_add_pending_skb(struct ieee80211_local *local,
 	spin_unlock_irqrestore(&local->queue_stop_reason_lock, flags);
 }
 
-void ieee80211_add_pending_skbs_fn(struct ieee80211_local *local,
-				   struct sk_buff_head *skbs,
-				   void (*fn)(void *data), void *data)
+void ieee80211_add_pending_skbs(struct ieee80211_local *local,
+				struct sk_buff_head *skbs)
 {
 	struct ieee80211_hw *hw = &local->hw;
 	struct sk_buff *skb;
@@ -461,9 +460,6 @@ void ieee80211_add_pending_skbs_fn(struct ieee80211_local *local,
 		__skb_queue_tail(&local->pending[queue], skb);
 	}
 
-	if (fn)
-		fn(data);
-
 	for (i = 0; i < hw->queues; i++)
 		__ieee80211_wake_queue(hw, i,
 			IEEE80211_QUEUE_STOP_REASON_SKB_ADD);
-- 
cgit v1.2.2


From 876fc03aaa4d8cc81331119d80d3e9bfbf1164e8 Mon Sep 17 00:00:00 2001
From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Wed, 19 Feb 2014 13:33:23 +0100
Subject: ip6_vti: Fix build when NET_IP_TUNNEL is not set.

Since commit 469bdcefdc47a ip6_vti uses ip_tunnel_get_stats64(),
so we need to select NET_IP_TUNNEL to have this function available.

Fixes: 469bdcefdc ("ipv6: fix the use of pcpu_tstats in ip6_vti.c")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/ipv6/Kconfig | 1 +
 1 file changed, 1 insertion(+)

(limited to 'net')

diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig
index d92e5586783e..438a73aa777c 100644
--- a/net/ipv6/Kconfig
+++ b/net/ipv6/Kconfig
@@ -138,6 +138,7 @@ config INET6_XFRM_MODE_ROUTEOPTIMIZATION
 config IPV6_VTI
 tristate "Virtual (secure) IPv6: tunneling"
 	select IPV6_TUNNEL
+	select NET_IP_TUNNEL
 	depends on INET6_XFRM_MODE_TUNNEL
 	---help---
 	Tunneling means encapsulating data of one protocol type within
-- 
cgit v1.2.2


From 35ea790d7883dd660208f78eae50ebfd6b8bd14a Mon Sep 17 00:00:00 2001
From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Wed, 19 Feb 2014 13:33:23 +0100
Subject: xfrm: Fix NULL pointer dereference on sub policy usage

xfrm_state_sort() takes the unsorted states from the src array
and stores them into the dst array. We try to get the namespace
from the dst array which is empty at this time, so take the
namespace from the src array instead.

Fixes: 283bc9f35bbbc ("xfrm: Namespacify xfrm state/policy locks")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_state.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'net')

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index a26b7aa79475..97d117b80ba4 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1451,7 +1451,7 @@ xfrm_state_sort(struct xfrm_state **dst, struct xfrm_state **src, int n,
 {
 	int err = 0;
 	struct xfrm_state_afinfo *afinfo = xfrm_state_get_afinfo(family);
-	struct net *net = xs_net(*dst);
+	struct net *net = xs_net(*src);
 
 	if (!afinfo)
 		return -EAFNOSUPPORT;
-- 
cgit v1.2.2


From 8c0cba22e196122d26c92980943474eb53db8deb Mon Sep 17 00:00:00 2001
From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Wed, 19 Feb 2014 13:33:24 +0100
Subject: xfrm: Take xfrm_state_lock in xfrm_migrate_state_find

A comment on xfrm_migrate_state_find() says that xfrm_state_lock
is held. This is apparently not the case, but we need it to
traverse through the state lists.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_state.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

(limited to 'net')

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 97d117b80ba4..c101023be3d2 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1215,11 +1215,12 @@ out:
 	return NULL;
 }
 
-/* net->xfrm.xfrm_state_lock is held */
 struct xfrm_state *xfrm_migrate_state_find(struct xfrm_migrate *m, struct net *net)
 {
 	unsigned int h;
-	struct xfrm_state *x;
+	struct xfrm_state *x = NULL;
+
+	spin_lock_bh(&net->xfrm.xfrm_state_lock);
 
 	if (m->reqid) {
 		h = xfrm_dst_hash(net, &m->old_daddr, &m->old_saddr,
@@ -1236,7 +1237,7 @@ struct xfrm_state *xfrm_migrate_state_find(struct xfrm_migrate *m, struct net *n
 					     m->old_family))
 				continue;
 			xfrm_state_hold(x);
-			return x;
+			break;
 		}
 	} else {
 		h = xfrm_src_hash(net, &m->old_daddr, &m->old_saddr,
@@ -1251,11 +1252,13 @@ struct xfrm_state *xfrm_migrate_state_find(struct xfrm_migrate *m, struct net *n
 					     m->old_family))
 				continue;
 			xfrm_state_hold(x);
-			return x;
+			break;
 		}
 	}
 
-	return NULL;
+	spin_unlock_bh(&net->xfrm.xfrm_state_lock);
+
+	return x;
 }
 EXPORT_SYMBOL(xfrm_migrate_state_find);
 
-- 
cgit v1.2.2


From ee5c23176fcc820f7a56d3e86001532af0d59b1e Mon Sep 17 00:00:00 2001
From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Wed, 19 Feb 2014 13:33:24 +0100
Subject: xfrm: Clone states properly on migration

We loose a lot of information of the original state if we
clone it with xfrm_state_clone(). In particular, there is
no crypto algorithm attached if the original state uses
an aead algorithm. This patch add the missing information
to the clone state.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_state.c | 8 ++++++++
 net/xfrm/xfrm_user.c  | 5 -----
 2 files changed, 8 insertions(+), 5 deletions(-)

(limited to 'net')

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index c101023be3d2..40f1b3e92e78 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1159,6 +1159,11 @@ static struct xfrm_state *xfrm_state_clone(struct xfrm_state *orig, int *errp)
 	}
 	x->props.aalgo = orig->props.aalgo;
 
+	if (orig->aead) {
+		x->aead = xfrm_algo_aead_clone(orig->aead);
+		if (!x->aead)
+			goto error;
+	}
 	if (orig->ealg) {
 		x->ealg = xfrm_algo_clone(orig->ealg);
 		if (!x->ealg)
@@ -1201,6 +1206,9 @@ static struct xfrm_state *xfrm_state_clone(struct xfrm_state *orig, int *errp)
 	x->props.flags = orig->props.flags;
 	x->props.extra_flags = orig->props.extra_flags;
 
+	x->tfcpad = orig->tfcpad;
+	x->replay_maxdiff = orig->replay_maxdiff;
+	x->replay_maxage = orig->replay_maxage;
 	x->curlft.add_time = orig->curlft.add_time;
 	x->km.state = orig->km.state;
 	x->km.seq = orig->km.seq;
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 1ae3ec7c18b0..c274179d60a2 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -32,11 +32,6 @@
 #include <linux/in6.h>
 #endif
 
-static inline int aead_len(struct xfrm_algo_aead *alg)
-{
-	return sizeof(*alg) + ((alg->alg_key_len + 7) / 8);
-}
-
 static int verify_one_alg(struct nlattr **attrs, enum xfrm_attr_type_t type)
 {
 	struct nlattr *rt = attrs[type];
-- 
cgit v1.2.2


From cf71d2bc0b8a473209d5c770ce560853bd720d14 Mon Sep 17 00:00:00 2001
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Thu, 20 Feb 2014 10:19:31 +0100
Subject: sit: fix panic with route cache in ip tunnels

Bug introduced by commit 7d442fab0a67 ("ipv4: Cache dst in tunnels").

Because sit code does not call ip_tunnel_init(), the dst_cache was not
initialized.

CC: Tom Herbert <therbert@google.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/ip_tunnel.c |  7 ++++---
 net/ipv6/sit.c       | 19 +++++++++++++++++++
 2 files changed, 23 insertions(+), 3 deletions(-)

(limited to 'net')

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 08f8cf99b3a2..78a89e61925d 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -93,13 +93,14 @@ static void tunnel_dst_reset(struct ip_tunnel *t)
 	tunnel_dst_set(t, NULL);
 }
 
-static void tunnel_dst_reset_all(struct ip_tunnel *t)
+void ip_tunnel_dst_reset_all(struct ip_tunnel *t)
 {
 	int i;
 
 	for_each_possible_cpu(i)
 		__tunnel_dst_set(per_cpu_ptr(t->dst_cache, i), NULL);
 }
+EXPORT_SYMBOL(ip_tunnel_dst_reset_all);
 
 static struct rtable *tunnel_rtable_get(struct ip_tunnel *t, u32 cookie)
 {
@@ -713,7 +714,7 @@ static void ip_tunnel_update(struct ip_tunnel_net *itn,
 		if (set_mtu)
 			dev->mtu = mtu;
 	}
-	tunnel_dst_reset_all(t);
+	ip_tunnel_dst_reset_all(t);
 	netdev_state_change(dev);
 }
 
@@ -1042,7 +1043,7 @@ void ip_tunnel_uninit(struct net_device *dev)
 	if (itn->fb_tunnel_dev != dev)
 		ip_tunnel_del(netdev_priv(dev));
 
-	tunnel_dst_reset_all(tunnel);
+	ip_tunnel_dst_reset_all(tunnel);
 }
 EXPORT_SYMBOL_GPL(ip_tunnel_uninit);
 
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 3dfbcf1dcb1c..b4d74c86586c 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -475,6 +475,7 @@ static void ipip6_tunnel_uninit(struct net_device *dev)
 		ipip6_tunnel_unlink(sitn, tunnel);
 		ipip6_tunnel_del_prl(tunnel, NULL);
 	}
+	ip_tunnel_dst_reset_all(tunnel);
 	dev_put(dev);
 }
 
@@ -1082,6 +1083,7 @@ static void ipip6_tunnel_update(struct ip_tunnel *t, struct ip_tunnel_parm *p)
 		t->parms.link = p->link;
 		ipip6_tunnel_bind_dev(t->dev);
 	}
+	ip_tunnel_dst_reset_all(t);
 	netdev_state_change(t->dev);
 }
 
@@ -1112,6 +1114,7 @@ static int ipip6_tunnel_update_6rd(struct ip_tunnel *t,
 	t->ip6rd.relay_prefix = relay_prefix;
 	t->ip6rd.prefixlen = ip6rd->prefixlen;
 	t->ip6rd.relay_prefixlen = ip6rd->relay_prefixlen;
+	ip_tunnel_dst_reset_all(t);
 	netdev_state_change(t->dev);
 	return 0;
 }
@@ -1271,6 +1274,7 @@ ipip6_tunnel_ioctl (struct net_device *dev, struct ifreq *ifr, int cmd)
 			err = ipip6_tunnel_add_prl(t, &prl, cmd == SIOCCHGPRL);
 			break;
 		}
+		ip_tunnel_dst_reset_all(t);
 		netdev_state_change(dev);
 		break;
 
@@ -1326,6 +1330,9 @@ static const struct net_device_ops ipip6_netdev_ops = {
 
 static void ipip6_dev_free(struct net_device *dev)
 {
+	struct ip_tunnel *tunnel = netdev_priv(dev);
+
+	free_percpu(tunnel->dst_cache);
 	free_percpu(dev->tstats);
 	free_netdev(dev);
 }
@@ -1375,6 +1382,12 @@ static int ipip6_tunnel_init(struct net_device *dev)
 		u64_stats_init(&ipip6_tunnel_stats->syncp);
 	}
 
+	tunnel->dst_cache = alloc_percpu(struct ip_tunnel_dst);
+	if (!tunnel->dst_cache) {
+		free_percpu(dev->tstats);
+		return -ENOMEM;
+	}
+
 	return 0;
 }
 
@@ -1405,6 +1418,12 @@ static int __net_init ipip6_fb_tunnel_init(struct net_device *dev)
 		u64_stats_init(&ipip6_fb_stats->syncp);
 	}
 
+	tunnel->dst_cache = alloc_percpu(struct ip_tunnel_dst);
+	if (!tunnel->dst_cache) {
+		free_percpu(dev->tstats);
+		return -ENOMEM;
+	}
+
 	dev_hold(dev);
 	rcu_assign_pointer(sitn->tunnels_wc[0], tunnel);
 	return 0;
-- 
cgit v1.2.2


From 7cce3b75682ff898c935c17d186983cbf3ed393e Mon Sep 17 00:00:00 2001
From: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com>
Date: Thu, 20 Feb 2014 14:13:04 +0100
Subject: net: sctp: Potentially-Failed state should not be reached from
 unconfirmed state

In current implementation it is possible to reach PF state from unconfirmed.
We can interpret sctp-failover-02 in a way that PF state is meant to be reached
only from active state, in the end, this is when entering PF state makes sense.
Here are few quotes from sctp-failover-02, but regardless of these, same
understanding can be reached from whole section 5:

Section 5.1, quickfailover guide:
    "The PF state is an intermediate state between Active and Failed states."

    "Each time the T3-rtx timer expires on an active or idle
    destination, the error counter of that destination address will
    be incremented.  When the value in the error counter exceeds
    PFMR, the endpoint should mark the destination transport address as PF."

There are several concrete reasons for such interpretation. For start, rfc4960
does not take into concern quickfailover algorithm. Therefore, quickfailover
must comply to 4960. Point where this compliance can be argued is following
behavior:
When PF is entered, association overall error counter is incremented for each
missed HB. This is contradictory to rfc4960, as address, while in unconfirmed
state, is subjected to probing, and while it is probed, it should not increment
association overall error counter. This has as a consequence that we might end
up in situation in which we drop association due path failure on unconfirmed
address, in case we have wrong configuration in a way:
Association.Max.Retrans == Path.Max.Retrans.

Another reason is that entering PF from unconfirmed will cause a loss of address
confirmed event when address is once (if) confirmed. This is fine from failover
guide point of view, but it is not consistent with behavior preceding failover
implementation and recommendation from 4960:

5.4.  Path Verification
   Whenever a path is confirmed, an indication MAY be given to the upper
   layer.

Signed-off-by: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/sctp/sm_sideeffect.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

(limited to 'net')

diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index bd859154000e..5d6883ff00c3 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -495,11 +495,12 @@ static void sctp_do_8_2_transport_strike(sctp_cmd_seq_t *commands,
 	}
 
 	/* If the transport error count is greater than the pf_retrans
-	 * threshold, and less than pathmaxrtx, then mark this transport
-	 * as Partially Failed, ee SCTP Quick Failover Draft, secon 5.1,
-	 * point 1
+	 * threshold, and less than pathmaxrtx, and if the current state
+	 * is not SCTP_UNCONFIRMED, then mark this transport as Partially
+	 * Failed, see SCTP Quick Failover Draft, section 5.1
 	 */
 	if ((transport->state != SCTP_PF) &&
+	   (transport->state != SCTP_UNCONFIRMED) &&
 	   (asoc->pf_retrans < transport->pathmaxrxt) &&
 	   (transport->error_count > asoc->pf_retrans)) {
 
-- 
cgit v1.2.2


From 9fe7ed474956944443eec57c5f75be12e10da84e Mon Sep 17 00:00:00 2001
From: Ying Xue <ying.xue@windriver.com>
Date: Thu, 20 Feb 2014 11:32:49 +0800
Subject: tipc: remove all enabled flags from all tipc components

When tipc module is inserted, many tipc components are initialized
one by one. During the initialization period, if one of them is
failed, tipc_core_stop() will be called to stop all components
whatever corresponding components are created or not. To avoid to
release uncreated ones, relevant components have to add necessary
enabled flags indicating whether they are created or not.

But in the initialization stage, if one component is unsuccessfully
created, we will just destroy successfully created components before
the failed component instead of all components. All enabled flags
defined in components, in turn, become redundant. Additionally it's
also unnecessary to identify whether table.types is NULL in
tipc_nametbl_stop() because name stable has been definitely created
successfully when tipc_nametbl_stop() is called.

Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/tipc/core.c       | 71 ++++++++++++++++++++++++++++++++++++---------------
 net/tipc/name_table.c |  3 ---
 net/tipc/netlink.c    |  8 ------
 net/tipc/ref.c        |  3 ---
 net/tipc/server.c     |  5 ----
 net/tipc/server.h     |  2 --
 net/tipc/socket.c     |  8 ------
 7 files changed, 50 insertions(+), 50 deletions(-)

(limited to 'net')

diff --git a/net/tipc/core.c b/net/tipc/core.c
index f9e88d8b04ca..cfd9cc150110 100644
--- a/net/tipc/core.c
+++ b/net/tipc/core.c
@@ -122,30 +122,59 @@ static void tipc_core_stop(void)
  */
 static int tipc_core_start(void)
 {
-	int res;
+	int err;
 
 	get_random_bytes(&tipc_random, sizeof(tipc_random));
 
-	res = tipc_handler_start();
-	if (!res)
-		res = tipc_ref_table_init(tipc_max_ports, tipc_random);
-	if (!res)
-		res = tipc_nametbl_init();
-	if (!res)
-		res = tipc_netlink_start();
-	if (!res)
-		res = tipc_socket_init();
-	if (!res)
-		res = tipc_register_sysctl();
-	if (!res)
-		res = tipc_subscr_start();
-	if (!res)
-		res = tipc_cfg_init();
-	if (res) {
-		tipc_handler_stop();
-		tipc_core_stop();
-	}
-	return res;
+	err = tipc_handler_start();
+	if (err)
+		goto out_handler;
+
+	err = tipc_ref_table_init(tipc_max_ports, tipc_random);
+	if (err)
+		goto out_reftbl;
+
+	err = tipc_nametbl_init();
+	if (err)
+		goto out_nametbl;
+
+	err = tipc_netlink_start();
+	if (err)
+		goto out_netlink;
+
+	err = tipc_socket_init();
+	if (err)
+		goto out_socket;
+
+	err = tipc_register_sysctl();
+	if (err)
+		goto out_sysctl;
+
+	err = tipc_subscr_start();
+	if (err)
+		goto out_subscr;
+
+	err = tipc_cfg_init();
+	if (err)
+		goto out_cfg;
+
+	return 0;
+out_cfg:
+	tipc_subscr_stop();
+out_subscr:
+	tipc_unregister_sysctl();
+out_sysctl:
+	tipc_socket_stop();
+out_socket:
+	tipc_netlink_stop();
+out_netlink:
+	tipc_nametbl_stop();
+out_nametbl:
+	tipc_ref_table_stop();
+out_reftbl:
+	tipc_handler_stop();
+out_handler:
+	return err;
 }
 
 static int __init tipc_init(void)
diff --git a/net/tipc/name_table.c b/net/tipc/name_table.c
index 92a1533af4e0..48302be175ce 100644
--- a/net/tipc/name_table.c
+++ b/net/tipc/name_table.c
@@ -945,9 +945,6 @@ void tipc_nametbl_stop(void)
 {
 	u32 i;
 
-	if (!table.types)
-		return;
-
 	/* Verify name table is empty, then release it */
 	write_lock_bh(&tipc_nametbl_lock);
 	for (i = 0; i < TIPC_NAMETBL_SIZE; i++) {
diff --git a/net/tipc/netlink.c b/net/tipc/netlink.c
index 9f72a6376362..3aaf73de9e2d 100644
--- a/net/tipc/netlink.c
+++ b/net/tipc/netlink.c
@@ -83,8 +83,6 @@ static struct genl_ops tipc_genl_ops[] = {
 	},
 };
 
-static int tipc_genl_family_registered;
-
 int tipc_netlink_start(void)
 {
 	int res;
@@ -94,16 +92,10 @@ int tipc_netlink_start(void)
 		pr_err("Failed to register netlink interface\n");
 		return res;
 	}
-
-	tipc_genl_family_registered = 1;
 	return 0;
 }
 
 void tipc_netlink_stop(void)
 {
-	if (!tipc_genl_family_registered)
-		return;
-
 	genl_unregister_family(&tipc_genl_family);
-	tipc_genl_family_registered = 0;
 }
diff --git a/net/tipc/ref.c b/net/tipc/ref.c
index 2a2a938dc22c..de3d593e2fee 100644
--- a/net/tipc/ref.c
+++ b/net/tipc/ref.c
@@ -126,9 +126,6 @@ int tipc_ref_table_init(u32 requested_size, u32 start)
  */
 void tipc_ref_table_stop(void)
 {
-	if (!tipc_ref_table.entries)
-		return;
-
 	vfree(tipc_ref_table.entries);
 	tipc_ref_table.entries = NULL;
 }
diff --git a/net/tipc/server.c b/net/tipc/server.c
index b635ca347a87..373979789a73 100644
--- a/net/tipc/server.c
+++ b/net/tipc/server.c
@@ -573,7 +573,6 @@ int tipc_server_start(struct tipc_server *s)
 		kmem_cache_destroy(s->rcvbuf_cache);
 		return ret;
 	}
-	s->enabled = 1;
 	return ret;
 }
 
@@ -583,10 +582,6 @@ void tipc_server_stop(struct tipc_server *s)
 	int total = 0;
 	int id;
 
-	if (!s->enabled)
-		return;
-
-	s->enabled = 0;
 	spin_lock_bh(&s->idr_lock);
 	for (id = 0; total < s->idr_in_use; id++) {
 		con = idr_find(&s->conn_idr, id);
diff --git a/net/tipc/server.h b/net/tipc/server.h
index 98b23f20bc0f..be817b0b547e 100644
--- a/net/tipc/server.h
+++ b/net/tipc/server.h
@@ -56,7 +56,6 @@
  * @name: server name
  * @imp: message importance
  * @type: socket type
- * @enabled: identify whether server is launched or not
  */
 struct tipc_server {
 	struct idr conn_idr;
@@ -74,7 +73,6 @@ struct tipc_server {
 	const char name[TIPC_SERVER_NAME_LEN];
 	int imp;
 	int type;
-	int enabled;
 };
 
 int tipc_conn_sendmsg(struct tipc_server *s, int conid,
diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index aab4948f0aff..a4cf274455aa 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -70,8 +70,6 @@ static const struct proto_ops msg_ops;
 static struct proto tipc_proto;
 static struct proto tipc_proto_kern;
 
-static int sockets_enabled;
-
 /*
  * Revised TIPC socket locking policy:
  *
@@ -2027,8 +2025,6 @@ int tipc_socket_init(void)
 		proto_unregister(&tipc_proto);
 		goto out;
 	}
-
-	sockets_enabled = 1;
  out:
 	return res;
 }
@@ -2038,10 +2034,6 @@ int tipc_socket_init(void)
  */
 void tipc_socket_stop(void)
 {
-	if (!sockets_enabled)
-		return;
-
-	sockets_enabled = 0;
 	sock_unregister(tipc_family_ops.family);
 	proto_unregister(&tipc_proto);
 }
-- 
cgit v1.2.2


From 970122fdf4b2d79c708022f2fdc0ab3840311d87 Mon Sep 17 00:00:00 2001
From: Ying Xue <ying.xue@windriver.com>
Date: Thu, 20 Feb 2014 11:32:50 +0800
Subject: tipc: make bearer set up in module insertion stage

Accidentally a side effect is involved by commit 6e967adf7(tipc:
relocate common functions from media to bearer). Now tipc stack
handler of receiving packets from netdevices as well as netdevice
notification handler are registered when bearer is enabled rather
than tipc module initialization stage, but the two handlers are
both unregistered in tipc module exit phase. If tipc module is
inserted and then immediately removed, the following warning
message will appear:

"dev_remove_pack: ffffffffa0380940 not found"

This is because in module insertion stage tipc stack packet handler
is not registered at all, but in module exit phase dev_remove_pack()
needs to remove it. Of course, dev_remove_pack() cannot find tipc
protocol handler from the kernel protocol handler list so that the
warning message is printed out.

But if registering the two handlers is adjusted from enabling bearer
phase into inserting module stage, the warning message will be
eliminated. Due to this change, tipc_core_start_net() and
tipc_core_stop_net() can be deleted as well.

Reported-by: Wang Weidong <wangweidong1@huawei.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/tipc/bearer.c |  7 ++++++-
 net/tipc/config.c |  2 +-
 net/tipc/core.c   | 38 +++++++++-----------------------------
 net/tipc/core.h   |  1 -
 4 files changed, 16 insertions(+), 32 deletions(-)

(limited to 'net')

diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
index a38c89969c68..574b86193b15 100644
--- a/net/tipc/bearer.c
+++ b/net/tipc/bearer.c
@@ -610,8 +610,13 @@ static struct notifier_block notifier = {
 
 int tipc_bearer_setup(void)
 {
+	int err;
+
+	err = register_netdevice_notifier(&notifier);
+	if (err)
+		return err;
 	dev_add_pack(&tipc_packet_type);
-	return register_netdevice_notifier(&notifier);
+	return 0;
 }
 
 void tipc_bearer_cleanup(void)
diff --git a/net/tipc/config.c b/net/tipc/config.c
index c301a9a592d8..e74eef2e7490 100644
--- a/net/tipc/config.c
+++ b/net/tipc/config.c
@@ -181,7 +181,7 @@ static struct sk_buff *cfg_set_own_addr(void)
 	if (tipc_own_addr)
 		return tipc_cfg_reply_error_string(TIPC_CFG_NOT_SUPPORTED
 						   " (cannot change node address once assigned)");
-	tipc_core_start_net(addr);
+	tipc_net_start(addr);
 	return tipc_cfg_reply_none();
 }
 
diff --git a/net/tipc/core.c b/net/tipc/core.c
index cfd9cc150110..80c20647b3d2 100644
--- a/net/tipc/core.c
+++ b/net/tipc/core.c
@@ -76,38 +76,14 @@ struct sk_buff *tipc_buf_acquire(u32 size)
 	return skb;
 }
 
-/**
- * tipc_core_stop_net - shut down TIPC networking sub-systems
- */
-static void tipc_core_stop_net(void)
-{
-	tipc_net_stop();
-	tipc_bearer_cleanup();
-}
-
-/**
- * start_net - start TIPC networking sub-systems
- */
-int tipc_core_start_net(unsigned long addr)
-{
-	int res;
-
-	tipc_net_start(addr);
-	res = tipc_bearer_setup();
-	if (res < 0)
-		goto err;
-	return res;
-
-err:
-	tipc_core_stop_net();
-	return res;
-}
-
 /**
  * tipc_core_stop - switch TIPC from SINGLE NODE to NOT RUNNING mode
  */
 static void tipc_core_stop(void)
 {
+	tipc_handler_stop();
+	tipc_net_stop();
+	tipc_bearer_cleanup();
 	tipc_netlink_stop();
 	tipc_cfg_stop();
 	tipc_subscr_stop();
@@ -158,7 +134,13 @@ static int tipc_core_start(void)
 	if (err)
 		goto out_cfg;
 
+	err = tipc_bearer_setup();
+	if (err)
+		goto out_bearer;
+
 	return 0;
+out_bearer:
+	tipc_cfg_stop();
 out_cfg:
 	tipc_subscr_stop();
 out_subscr:
@@ -203,8 +185,6 @@ static int __init tipc_init(void)
 
 static void __exit tipc_exit(void)
 {
-	tipc_handler_stop();
-	tipc_core_stop_net();
 	tipc_core_stop();
 	pr_info("Deactivated\n");
 }
diff --git a/net/tipc/core.h b/net/tipc/core.h
index 5569d96b4da3..4dfe137587bb 100644
--- a/net/tipc/core.h
+++ b/net/tipc/core.h
@@ -90,7 +90,6 @@ extern int tipc_random __read_mostly;
 /*
  * Routines available to privileged subsystems
  */
-int tipc_core_start_net(unsigned long);
 int tipc_handler_start(void);
 void tipc_handler_stop(void);
 int tipc_netlink_start(void);
-- 
cgit v1.2.2


From f5ddcbbb40aa0ba7fbfe22355d287603dbeeaaac Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet@google.com>
Date: Thu, 20 Feb 2014 10:09:18 -0800
Subject: net-tcp: fastopen: fix high order allocations

This patch fixes two bugs in fastopen :

1) The tcp_sendmsg(...,  @size) argument was ignored.

   Code was relying on user not fooling the kernel with iovec mismatches

2) When MTU is about 64KB, tcp_send_syn_data() attempts order-5
allocations, which are likely to fail when memory gets fragmented.

Fixes: 783237e8daf13 ("net-tcp: Fast Open client - sending SYN-data")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Tested-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/tcp.c        | 8 +++++---
 net/ipv4/tcp_output.c | 7 ++++++-
 2 files changed, 11 insertions(+), 4 deletions(-)

(limited to 'net')

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 9f3a2db9109e..97c8f5620c43 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1044,7 +1044,8 @@ void tcp_free_fastopen_req(struct tcp_sock *tp)
 	}
 }
 
-static int tcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg, int *size)
+static int tcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg,
+				int *copied, size_t size)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	int err, flags;
@@ -1059,11 +1060,12 @@ static int tcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg, int *size)
 	if (unlikely(tp->fastopen_req == NULL))
 		return -ENOBUFS;
 	tp->fastopen_req->data = msg;
+	tp->fastopen_req->size = size;
 
 	flags = (msg->msg_flags & MSG_DONTWAIT) ? O_NONBLOCK : 0;
 	err = __inet_stream_connect(sk->sk_socket, msg->msg_name,
 				    msg->msg_namelen, flags);
-	*size = tp->fastopen_req->copied;
+	*copied = tp->fastopen_req->copied;
 	tcp_free_fastopen_req(tp);
 	return err;
 }
@@ -1083,7 +1085,7 @@ int tcp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 
 	flags = msg->msg_flags;
 	if (flags & MSG_FASTOPEN) {
-		err = tcp_sendmsg_fastopen(sk, msg, &copied_syn);
+		err = tcp_sendmsg_fastopen(sk, msg, &copied_syn, size);
 		if (err == -EINPROGRESS && copied_syn > 0)
 			goto out;
 		else if (err)
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 3be16727f058..09805817627b 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2908,7 +2908,12 @@ static int tcp_send_syn_data(struct sock *sk, struct sk_buff *syn)
 	space = __tcp_mtu_to_mss(sk, inet_csk(sk)->icsk_pmtu_cookie) -
 		MAX_TCP_OPTION_SPACE;
 
-	syn_data = skb_copy_expand(syn, skb_headroom(syn), space,
+	space = min_t(size_t, space, fo->size);
+
+	/* limit to order-0 allocations */
+	space = min_t(size_t, space, SKB_MAX_HEAD(MAX_TCP_HEADER));
+
+	syn_data = skb_copy_expand(syn, MAX_TCP_HEADER, space,
 				   sk->sk_allocation);
 	if (syn_data == NULL)
 		goto fallback;
-- 
cgit v1.2.2


From b194c1f1dbd5f2671e49e0ac801b1b78dc7de93b Mon Sep 17 00:00:00 2001
From: Jiri Pirko <jiri@resnulli.us>
Date: Fri, 21 Feb 2014 14:52:57 +0100
Subject: neigh: fix setting of default gc_* values

This patch fixes bug introduced by:
commit 1d4c8c29841b9991cdf3c7cc4ba7f96a94f104ca
"neigh: restore old behaviour of default parms values"

The thing is that in neigh_sysctl_register, extra1 and extra2 which were
previously set for NEIGH_VAR_GC_* are overwritten. That leads to
nonsense int limits for gc_* variables. So fix this by not touching
extra* fields for gc_* variables.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/core/neighbour.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'net')

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index b9e9e0d38672..e1aa0f36cfe7 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -3046,7 +3046,7 @@ int neigh_sysctl_register(struct net_device *dev, struct neigh_parms *p,
 	if (!t)
 		goto err;
 
-	for (i = 0; i < ARRAY_SIZE(t->neigh_vars); i++) {
+	for (i = 0; i < NEIGH_VAR_GC_INTERVAL; i++) {
 		t->neigh_vars[i].data += (long) p;
 		t->neigh_vars[i].extra1 = dev;
 		t->neigh_vars[i].extra2 = p;
-- 
cgit v1.2.2


From 4c47af4d5eb2c2f78f886079a3920a7078a6f0a0 Mon Sep 17 00:00:00 2001
From: Daniel Borkmann <dborkman@redhat.com>
Date: Thu, 20 Feb 2014 20:51:06 +0100
Subject: net: sctp: rework multihoming retransmission path selection to
 rfc4960

Problem statement: 1) both paths (primary path1 and alternate
path2) are up after the association has been established i.e.,
HB packets are normally exchanged, 2) path2 gets inactive after
path_max_retrans * max_rto timed out (i.e. path2 is down completely),
3) now, if a transmission times out on the only surviving/active
path1 (any ~1sec network service impact could cause this like
a channel bonding failover), then the retransmitted packets are
sent over the inactive path2; this happens with partial failover
and without it.

Besides not being optimal in the above scenario, a small failure
or timeout in the only existing path has the potential to cause
long delays in the retransmission (depending on RTO_MAX) until
the still active path is reselected. Further, when the T3-timeout
occurs, we have active_patch == retrans_path, and even though the
timeout occurred on the initial transmission of data, not a
retransmit, we end up updating retransmit path.

RFC4960, section 6.4. "Multi-Homed SCTP Endpoints" states under
6.4.1. "Failover from an Inactive Destination Address" the
following:

  Some of the transport addresses of a multi-homed SCTP endpoint
  may become inactive due to either the occurrence of certain
  error conditions (see Section 8.2) or adjustments from the
  SCTP user.

  When there is outbound data to send and the primary path
  becomes inactive (e.g., due to failures), or where the SCTP
  user explicitly requests to send data to an inactive
  destination transport address, before reporting an error to
  its ULP, the SCTP endpoint should try to send the data to an
  alternate __active__ destination transport address if one
  exists.

  When retransmitting data that timed out, if the endpoint is
  multihomed, it should consider each source-destination address
  pair in its retransmission selection policy. When retransmitting
  timed-out data, the endpoint should attempt to pick the most
  divergent source-destination pair from the original
  source-destination pair to which the packet was transmitted.

  Note: Rules for picking the most divergent source-destination
  pair are an implementation decision and are not specified
  within this document.

So, we should first reconsider to take the current active
retransmission transport if we cannot find an alternative
active one. If all of that fails, we can still round robin
through unkown, partial failover, and inactive ones in the
hope to find something still suitable.

Commit 4141ddc02a92 ("sctp: retran_path update bug fix") broke
that behaviour by selecting the next inactive transport when
no other active transport was found besides the current assoc's
peer.retran_path. Before commit 4141ddc02a92, we would have
traversed through the list until we reach our peer.retran_path
again, and in case that is still in state SCTP_ACTIVE, we would
take it and return. Only if that is not the case either, we
take the next inactive transport.

Besides all that, another issue is that transports in state
SCTP_UNKNOWN could be preferred over transports in state
SCTP_ACTIVE in case a SCTP_ACTIVE transport appears after
SCTP_UNKNOWN in the transport list yielding a weaker transport
state to be used in retransmission.

This patch mostly reverts 4141ddc02a92, but also rewrites
this function to introduce more clarity and strictness into
the code. A strict priority of transport states is enforced
in this patch, hence selection is active > unkown > partial
failover > inactive.

Fixes: 4141ddc02a92 ("sctp: retran_path update bug fix")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
Acked-by: Vlad Yasevich <yasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/sctp/associola.c | 129 +++++++++++++++++++++++++++++++--------------------
 1 file changed, 79 insertions(+), 50 deletions(-)

(limited to 'net')

diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index f558433537b8..ee13d28d39d1 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -1239,78 +1239,107 @@ void sctp_assoc_update(struct sctp_association *asoc,
 }
 
 /* Update the retran path for sending a retransmitted packet.
- * Round-robin through the active transports, else round-robin
- * through the inactive transports as this is the next best thing
- * we can try.
+ * See also RFC4960, 6.4. Multi-Homed SCTP Endpoints:
+ *
+ *   When there is outbound data to send and the primary path
+ *   becomes inactive (e.g., due to failures), or where the
+ *   SCTP user explicitly requests to send data to an
+ *   inactive destination transport address, before reporting
+ *   an error to its ULP, the SCTP endpoint should try to send
+ *   the data to an alternate active destination transport
+ *   address if one exists.
+ *
+ *   When retransmitting data that timed out, if the endpoint
+ *   is multihomed, it should consider each source-destination
+ *   address pair in its retransmission selection policy.
+ *   When retransmitting timed-out data, the endpoint should
+ *   attempt to pick the most divergent source-destination
+ *   pair from the original source-destination pair to which
+ *   the packet was transmitted.
+ *
+ *   Note: Rules for picking the most divergent source-destination
+ *   pair are an implementation decision and are not specified
+ *   within this document.
+ *
+ * Our basic strategy is to round-robin transports in priorities
+ * according to sctp_state_prio_map[] e.g., if no such
+ * transport with state SCTP_ACTIVE exists, round-robin through
+ * SCTP_UNKNOWN, etc. You get the picture.
  */
-void sctp_assoc_update_retran_path(struct sctp_association *asoc)
+static const u8 sctp_trans_state_to_prio_map[] = {
+	[SCTP_ACTIVE]	= 3,	/* best case */
+	[SCTP_UNKNOWN]	= 2,
+	[SCTP_PF]	= 1,
+	[SCTP_INACTIVE] = 0,	/* worst case */
+};
+
+static u8 sctp_trans_score(const struct sctp_transport *trans)
 {
-	struct sctp_transport *t, *next;
-	struct list_head *head = &asoc->peer.transport_addr_list;
-	struct list_head *pos;
+	return sctp_trans_state_to_prio_map[trans->state];
+}
 
-	if (asoc->peer.transport_count == 1)
-		return;
+static struct sctp_transport *sctp_trans_elect_best(struct sctp_transport *curr,
+						    struct sctp_transport *best)
+{
+	if (best == NULL)
+		return curr;
 
-	/* Find the next transport in a round-robin fashion. */
-	t = asoc->peer.retran_path;
-	pos = &t->transports;
-	next = NULL;
+	return sctp_trans_score(curr) > sctp_trans_score(best) ? curr : best;
+}
 
-	while (1) {
-		/* Skip the head. */
-		if (pos->next == head)
-			pos = head->next;
-		else
-			pos = pos->next;
+void sctp_assoc_update_retran_path(struct sctp_association *asoc)
+{
+	struct sctp_transport *trans = asoc->peer.retran_path;
+	struct sctp_transport *trans_next = NULL;
 
-		t = list_entry(pos, struct sctp_transport, transports);
+	/* We're done as we only have the one and only path. */
+	if (asoc->peer.transport_count == 1)
+		return;
+	/* If active_path and retran_path are the same and active,
+	 * then this is the only active path. Use it.
+	 */
+	if (asoc->peer.active_path == asoc->peer.retran_path &&
+	    asoc->peer.active_path->state == SCTP_ACTIVE)
+		return;
 
-		/* We have exhausted the list, but didn't find any
-		 * other active transports.  If so, use the next
-		 * transport.
-		 */
-		if (t == asoc->peer.retran_path) {
-			t = next;
+	/* Iterate from retran_path's successor back to retran_path. */
+	for (trans = list_next_entry(trans, transports); 1;
+	     trans = list_next_entry(trans, transports)) {
+		/* Manually skip the head element. */
+		if (&trans->transports == &asoc->peer.transport_addr_list)
+			continue;
+		if (trans->state == SCTP_UNCONFIRMED)
+			continue;
+		trans_next = sctp_trans_elect_best(trans, trans_next);
+		/* Active is good enough for immediate return. */
+		if (trans_next->state == SCTP_ACTIVE)
 			break;
-		}
-
-		/* Try to find an active transport. */
-
-		if ((t->state == SCTP_ACTIVE) ||
-		    (t->state == SCTP_UNKNOWN)) {
+		/* We've reached the end, time to update path. */
+		if (trans == asoc->peer.retran_path)
 			break;
-		} else {
-			/* Keep track of the next transport in case
-			 * we don't find any active transport.
-			 */
-			if (t->state != SCTP_UNCONFIRMED && !next)
-				next = t;
-		}
 	}
 
-	if (t)
-		asoc->peer.retran_path = t;
-	else
-		t = asoc->peer.retran_path;
+	if (trans_next != NULL)
+		asoc->peer.retran_path = trans_next;
 
-	pr_debug("%s: association:%p addr:%pISpc\n", __func__, asoc,
-		 &t->ipaddr.sa);
+	pr_debug("%s: association:%p updated new path to addr:%pISpc\n",
+		 __func__, asoc, &asoc->peer.retran_path->ipaddr.sa);
 }
 
-/* Choose the transport for sending retransmit packet.  */
-struct sctp_transport *sctp_assoc_choose_alter_transport(
-	struct sctp_association *asoc, struct sctp_transport *last_sent_to)
+struct sctp_transport *
+sctp_assoc_choose_alter_transport(struct sctp_association *asoc,
+				  struct sctp_transport *last_sent_to)
 {
 	/* If this is the first time packet is sent, use the active path,
 	 * else use the retran path. If the last packet was sent over the
 	 * retran path, update the retran path and use it.
 	 */
-	if (!last_sent_to)
+	if (last_sent_to == NULL) {
 		return asoc->peer.active_path;
-	else {
+	} else {
 		if (last_sent_to == asoc->peer.retran_path)
 			sctp_assoc_update_retran_path(asoc);
+
 		return asoc->peer.retran_path;
 	}
 }
-- 
cgit v1.2.2


From 916e4cf46d0204806c062c8c6c4d1f633852c5b6 Mon Sep 17 00:00:00 2001
From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Fri, 21 Feb 2014 02:55:35 +0100
Subject: ipv6: reuse ip6_frag_id from ip6_ufo_append_data

Currently we generate a new fragmentation id on UFO segmentation. It
is pretty hairy to identify the correct net namespace and dst there.
Especially tunnels use IFF_XMIT_DST_RELEASE and thus have no skb_dst
available at all.

This causes unreliable or very predictable ipv6 fragmentation id
generation while segmentation.

Luckily we already have pregenerated the ip6_frag_id in
ip6_ufo_append_data and can use it here.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv6/udp_offload.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'net')

diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index e7359f9eaa8d..b261ee8b83fc 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -113,7 +113,7 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
 		fptr = (struct frag_hdr *)(skb_network_header(skb) + unfrag_ip6hlen);
 		fptr->nexthdr = nexthdr;
 		fptr->reserved = 0;
-		ipv6_select_ident(fptr, (struct rt6_info *)skb_dst(skb));
+		fptr->identification = skb_shinfo(skb)->ip6_frag_id;
 
 		/* Fragment the skb. ipv6 header and the remaining fields of the
 		 * fragment header are updated in ipv6_gso_segment()
-- 
cgit v1.2.2


From 44a589ca2de4d7980f8b335481a09bf818065dbe Mon Sep 17 00:00:00 2001
From: Amitkumar Karwar <akarwar@marvell.com>
Date: Thu, 6 Feb 2014 11:28:31 -0800
Subject: NFC: NCI: Fix NULL pointer dereference

The check should be for setup function pointer.

This patch fixes NULL pointer dereference issue for NCI
based NFC driver which doesn't define setup handler.

Signed-off-by: Amitkumar Karwar <akarwar@marvell.com>
Signed-off-by: Bing Zhao <bzhao@marvell.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
---
 net/nfc/nci/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'net')

diff --git a/net/nfc/nci/core.c b/net/nfc/nci/core.c
index 46bda010bf11..56db888b1cd5 100644
--- a/net/nfc/nci/core.c
+++ b/net/nfc/nci/core.c
@@ -301,7 +301,7 @@ static int nci_open_device(struct nci_dev *ndev)
 	rc = __nci_request(ndev, nci_reset_req, 0,
 			   msecs_to_jiffies(NCI_RESET_TIMEOUT));
 
-	if (ndev->ops->setup(ndev))
+	if (ndev->ops->setup)
 		ndev->ops->setup(ndev);
 
 	if (!rc) {
-- 
cgit v1.2.2


From 963a1852fbac4f75a2d938fa2e734ef1e6d4c044 Mon Sep 17 00:00:00 2001
From: Johannes Berg <johannes.berg@intel.com>
Date: Fri, 21 Feb 2014 20:34:34 +0100
Subject: mac80211: don't validate unchanged AP bandwidth while tracking

The MLME code in mac80211 must track whether or not the AP changed
bandwidth, but if there's no change while tracking it shouldn't do
anything, otherwise regulatory updates can make it impossible to
connect to certain APs if the regulatory database doesn't match the
information from the AP. See the precise scenario described in the
code.

This still leaves some possible problems with CSA or if the AP
actually changed bandwidth, but those cases are less common and
won't completely prevent using it.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=70881

Cc: stable@vger.kernel.org
Reported-and-tested-by: Nate Carlson <kernel@natecarlson.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
 net/mac80211/mlme.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

(limited to 'net')

diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index 57d5482b10fa..c415f00cd6b6 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -271,6 +271,28 @@ ieee80211_determine_chantype(struct ieee80211_sub_if_data *sdata,
 	ret = 0;
 
 out:
+	/*
+	 * When tracking the current AP, don't do any further checks if the
+	 * new chandef is identical to the one we're currently using for the
+	 * connection. This keeps us from playing ping-pong with regulatory,
+	 * without it the following can happen (for example):
+	 *  - connect to an AP with 80 MHz, world regdom allows 80 MHz
+	 *  - AP advertises regdom US
+	 *  - CRDA loads regdom US with 80 MHz prohibited (old database)
+	 *  - the code below detects an unsupported channel, downgrades, and
+	 *    we disconnect from the AP in the caller
+	 *  - disconnect causes CRDA to reload world regdomain and the game
+	 *    starts anew.
+	 * (see https://bugzilla.kernel.org/show_bug.cgi?id=70881)
+	 *
+	 * It seems possible that there are still scenarios with CSA or real
+	 * bandwidth changes where a this could happen, but those cases are
+	 * less common and wouldn't completely prevent using the AP.
+	 */
+	if (tracking &&
+	    cfg80211_chandef_identical(chandef, &sdata->vif.bss_conf.chandef))
+		return ret;
+
 	/* don't print the message below for VHT mismatch if VHT is disabled */
 	if (ret & IEEE80211_STA_DISABLE_VHT)
 		vht_chandef = *chandef;
-- 
cgit v1.2.2


From d10473d4e3f9d1b81b50a60c8465d6f59a095c46 Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet@google.com>
Date: Sat, 22 Feb 2014 22:25:57 -0800
Subject: tcp: reduce the bloat caused by tcp_is_cwnd_limited()

tcp_is_cwnd_limited() allows GSO/TSO enabled flows to increase
their cwnd to allow a full size (64KB) TSO packet to be sent.

Non GSO flows only allow an extra room of 3 MSS.

For most flows with a BDP below 10 MSS, this results in a bloat
of cwnd reaching 90, and an inflate of RTT.

Thanks to TSO auto sizing, we can restrict the bloat to the number
of MSS contained in a TSO packet (tp->xmit_size_goal_segs), to keep
original intent without performance impact.

Because we keep cwnd small, it helps to keep TSO packet size to their
optimal value.

Example for a 10Mbit flow, with low TCP Small queue limits (no more than
2 skb in qdisc/device tx ring)

Before patch :

lpk51:~# ./ss -i dst lpk52:44862 | grep cwnd
         cubic wscale:6,6 rto:215 rtt:15.875/2.5 mss:1448 cwnd:96
ssthresh:96
send 70.1Mbps unacked:14 rcv_space:29200

After patch :

lpk51:~# ./ss -i dst lpk52:52916 | grep cwnd
         cubic wscale:6,6 rto:206 rtt:5.206/0.036 mss:1448 cwnd:15
ssthresh:14
send 33.4Mbps unacked:4 rcv_space:29200

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Van Jacobson <vanj@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/tcp_cong.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

(limited to 'net')

diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index ad37bf18ae4b..2388275adb9b 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -290,8 +290,7 @@ bool tcp_is_cwnd_limited(const struct sock *sk, u32 in_flight)
 	left = tp->snd_cwnd - in_flight;
 	if (sk_can_gso(sk) &&
 	    left * sysctl_tcp_tso_win_divisor < tp->snd_cwnd &&
-	    left * tp->mss_cache < sk->sk_gso_max_size &&
-	    left < sk->sk_gso_max_segs)
+	    left < tp->xmit_size_goal_segs)
 		return true;
 	return left <= tcp_max_tso_deferred_mss(tp);
 }
-- 
cgit v1.2.2


From 092008abeed1e4168c08826262695ea67657e1aa Mon Sep 17 00:00:00 2001
From: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Date: Fri, 14 Feb 2014 08:54:00 +0100
Subject: cfg80211: regulatory: reset regdomain in case of error

Reset regdomain to world regdomain in case
of errors in set_regdom() function.

This will fix a problem with such scenario:
- iw reg set US
- iw reg set 00
- iw reg set US
The last step always fail and we get deadlock
in kernel regulatory code. Next setting new
regulatory wasn't possible due to:

Pending regulatory request, waiting for it to be processed...

Signed-off-by: Janusz Dziedzic <janusz.dziedzic@tieto.com>
Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
 net/wireless/reg.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

(limited to 'net')

diff --git a/net/wireless/reg.c b/net/wireless/reg.c
index 4c50c21d6f52..f0541370e68e 100644
--- a/net/wireless/reg.c
+++ b/net/wireless/reg.c
@@ -2373,6 +2373,7 @@ static int reg_set_rd_country_ie(const struct ieee80211_regdomain *rd,
 int set_regdom(const struct ieee80211_regdomain *rd)
 {
 	struct regulatory_request *lr;
+	bool user_reset = false;
 	int r;
 
 	if (!reg_is_valid_request(rd->alpha2)) {
@@ -2389,6 +2390,7 @@ int set_regdom(const struct ieee80211_regdomain *rd)
 		break;
 	case NL80211_REGDOM_SET_BY_USER:
 		r = reg_set_rd_user(rd, lr);
+		user_reset = true;
 		break;
 	case NL80211_REGDOM_SET_BY_DRIVER:
 		r = reg_set_rd_driver(rd, lr);
@@ -2402,8 +2404,14 @@ int set_regdom(const struct ieee80211_regdomain *rd)
 	}
 
 	if (r) {
-		if (r == -EALREADY)
+		switch (r) {
+		case -EALREADY:
 			reg_set_request_processed();
+			break;
+		default:
+			/* Back to world regulatory in case of errors */
+			restore_regulatory_settings(user_reset);
+		}
 
 		kfree(rd);
 		return r;
-- 
cgit v1.2.2


From 91a48a2e85a3b70ce10ead34b4ab5347f8d215c9 Mon Sep 17 00:00:00 2001
From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Mon, 24 Feb 2014 00:48:05 +0100
Subject: ipv4: ipv6: better estimate tunnel header cut for correct ufo
 handling

Currently the UFO fragmentation process does not correctly handle inner
UDP frames.

(The following tcpdumps are captured on the parent interface with ufo
disabled while tunnel has ufo enabled, 2000 bytes payload, mtu 1280,
both sit device):

IPv6:
16:39:10.031613 IP (tos 0x0, ttl 64, id 3208, offset 0, flags [DF], proto IPv6 (41), length 1300)
    192.168.122.151 > 1.1.1.1: IP6 (hlim 64, next-header Fragment (44) payload length: 1240) 2001::1 > 2001::8: frag (0x00000001:0|1232) 44883 > distinct: UDP, length 2000
16:39:10.031709 IP (tos 0x0, ttl 64, id 3209, offset 0, flags [DF], proto IPv6 (41), length 844)
    192.168.122.151 > 1.1.1.1: IP6 (hlim 64, next-header Fragment (44) payload length: 784) 2001::1 > 2001::8: frag (0x00000001:0|776) 58979 > 46366: UDP, length 5471

We can see that fragmentation header offset is not correctly updated.
(fragmentation id handling is corrected by 916e4cf46d0204 ("ipv6: reuse
ip6_frag_id from ip6_ufo_append_data")).

IPv4:
16:39:57.737761 IP (tos 0x0, ttl 64, id 3209, offset 0, flags [DF], proto IPIP (4), length 1296)
    192.168.122.151 > 1.1.1.1: IP (tos 0x0, ttl 64, id 57034, offset 0, flags [none], proto UDP (17), length 1276)
    192.168.99.1.35961 > 192.168.99.2.distinct: UDP, length 2000
16:39:57.738028 IP (tos 0x0, ttl 64, id 3210, offset 0, flags [DF], proto IPIP (4), length 792)
    192.168.122.151 > 1.1.1.1: IP (tos 0x0, ttl 64, id 57035, offset 0, flags [none], proto UDP (17), length 772)
    192.168.99.1.13531 > 192.168.99.2.20653: UDP, length 51109

In this case fragmentation id is incremented and offset is not updated.

First, I aligned inet_gso_segment and ipv6_gso_segment:
* align naming of flags
* ipv6_gso_segment: setting skb->encapsulation is unnecessary, as we
  always ensure that the state of this flag is left untouched when
  returning from upper gso segmenation function
* ipv6_gso_segment: move skb_reset_inner_headers below updating the
  fragmentation header data, we don't care for updating fragmentation
  header data
* remove currently unneeded comment indicating skb->encapsulation might
  get changed by upper gso_segment callback (gre and udp-tunnel reset
  encapsulation after segmentation on each fragment)

If we encounter an IPIP or SIT gso skb we now check for the protocol ==
IPPROTO_UDP and that we at least have already traversed another ip(6)
protocol header.

The reason why we have to special case GSO_IPIP and GSO_SIT is that
we reset skb->encapsulation to 0 while skb_mac_gso_segment the inner
protocol of GSO_UDP_TUNNEL or GSO_GRE packets.

Reported-by: Wolfgang Walter <linux@stwm.de>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/af_inet.c     |  7 +++++--
 net/ipv6/ip6_offload.c | 20 ++++++++++++--------
 2 files changed, 17 insertions(+), 10 deletions(-)

(limited to 'net')

diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index ecd2c3f245ce..19ab78aca547 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -1296,8 +1296,11 @@ static struct sk_buff *inet_gso_segment(struct sk_buff *skb,
 
 	segs = ERR_PTR(-EPROTONOSUPPORT);
 
-	/* Note : following gso_segment() might change skb->encapsulation */
-	udpfrag = !skb->encapsulation && proto == IPPROTO_UDP;
+	if (skb->encapsulation &&
+	    skb_shinfo(skb)->gso_type & (SKB_GSO_SIT|SKB_GSO_IPIP))
+		udpfrag = proto == IPPROTO_UDP && encap;
+	else
+		udpfrag = proto == IPPROTO_UDP && !skb->encapsulation;
 
 	ops = rcu_dereference(inet_offloads[proto]);
 	if (likely(ops && ops->callbacks.gso_segment))
diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c
index 1e8683b135bb..59f95affceb0 100644
--- a/net/ipv6/ip6_offload.c
+++ b/net/ipv6/ip6_offload.c
@@ -89,7 +89,7 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 	unsigned int unfrag_ip6hlen;
 	u8 *prevhdr;
 	int offset = 0;
-	bool tunnel;
+	bool encap, udpfrag;
 	int nhoff;
 
 	if (unlikely(skb_shinfo(skb)->gso_type &
@@ -110,8 +110,8 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 	if (unlikely(!pskb_may_pull(skb, sizeof(*ipv6h))))
 		goto out;
 
-	tunnel = SKB_GSO_CB(skb)->encap_level > 0;
-	if (tunnel)
+	encap = SKB_GSO_CB(skb)->encap_level > 0;
+	if (encap)
 		features = skb->dev->hw_enc_features & netif_skb_features(skb);
 	SKB_GSO_CB(skb)->encap_level += sizeof(*ipv6h);
 
@@ -121,6 +121,12 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 
 	proto = ipv6_gso_pull_exthdrs(skb, ipv6h->nexthdr);
 
+	if (skb->encapsulation &&
+	    skb_shinfo(skb)->gso_type & (SKB_GSO_SIT|SKB_GSO_IPIP))
+		udpfrag = proto == IPPROTO_UDP && encap;
+	else
+		udpfrag = proto == IPPROTO_UDP && !skb->encapsulation;
+
 	ops = rcu_dereference(inet6_offloads[proto]);
 	if (likely(ops && ops->callbacks.gso_segment)) {
 		skb_reset_transport_header(skb);
@@ -133,13 +139,9 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 	for (skb = segs; skb; skb = skb->next) {
 		ipv6h = (struct ipv6hdr *)(skb_mac_header(skb) + nhoff);
 		ipv6h->payload_len = htons(skb->len - nhoff - sizeof(*ipv6h));
-		if (tunnel) {
-			skb_reset_inner_headers(skb);
-			skb->encapsulation = 1;
-		}
 		skb->network_header = (u8 *)ipv6h - skb->head;
 
-		if (!tunnel && proto == IPPROTO_UDP) {
+		if (udpfrag) {
 			unfrag_ip6hlen = ip6_find_1stfragopt(skb, &prevhdr);
 			fptr = (struct frag_hdr *)((u8 *)ipv6h + unfrag_ip6hlen);
 			fptr->frag_off = htons(offset);
@@ -148,6 +150,8 @@ static struct sk_buff *ipv6_gso_segment(struct sk_buff *skb,
 			offset += (ntohs(ipv6h->payload_len) -
 				   sizeof(struct frag_hdr));
 		}
+		if (encap)
+			skb_reset_inner_headers(skb);
 	}
 
 out:
-- 
cgit v1.2.2


From 46833a86f7ab30101096d81117dd250bfae74c6f Mon Sep 17 00:00:00 2001
From: Mike Pecovnik <mike.pecovnik@gmail.com>
Date: Mon, 24 Feb 2014 21:11:16 +0100
Subject: net: Fix permission check in netlink_connect()

netlink_sendmsg() was changed to prevent non-root processes from sending
messages with dst_pid != 0.
netlink_connect() however still only checks if nladdr->nl_groups is set.
This patch modifies netlink_connect() to check for the same condition.

Signed-off-by: Mike Pecovnik <mike.pecovnik@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/netlink/af_netlink.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'net')

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index fdf51353cf78..04748ab649c2 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1489,8 +1489,8 @@ static int netlink_connect(struct socket *sock, struct sockaddr *addr,
 	if (addr->sa_family != AF_NETLINK)
 		return -EINVAL;
 
-	/* Only superuser is allowed to send multicasts */
-	if (nladdr->nl_groups && !netlink_capable(sock, NL_CFG_F_NONROOT_SEND))
+	if ((nladdr->nl_groups || nladdr->nl_pid) &&
+	    !netlink_capable(sock, NL_CFG_F_NONROOT_SEND))
 		return -EPERM;
 
 	if (!nlk->portid)
-- 
cgit v1.2.2


From 3a9016f97fdc8bfbb26ff36ba8f3dc9162eb691b Mon Sep 17 00:00:00 2001
From: Steffen Klassert <steffen.klassert@secunet.com>
Date: Wed, 19 Feb 2014 10:07:34 +0100
Subject: xfrm: Fix unlink race when policies are deleted.

When a policy is unlinked from the lists in thread context,
the xfrm timer can fire before we can mark this policy as dead.
So reinitialize the bydst hlist, then hlist_unhashed() will
notice that this policy is not linked and will avoid a
doulble unlink of that policy.

Reported-by: Xianpeng Zhao <673321875@qq.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
---
 net/xfrm/xfrm_policy.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'net')

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 4b98b25793c5..1d5c7bf29938 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1158,7 +1158,7 @@ static struct xfrm_policy *__xfrm_policy_unlink(struct xfrm_policy *pol,
 	if (hlist_unhashed(&pol->bydst))
 		return NULL;
 
-	hlist_del(&pol->bydst);
+	hlist_del_init(&pol->bydst);
 	hlist_del(&pol->byidx);
 	list_del(&pol->walk.all);
 	net->xfrm.policy_count[dir]--;
-- 
cgit v1.2.2


From 9a9bfd032f0207dd6e63c84b7676b58f160af04b Mon Sep 17 00:00:00 2001
From: Eric Dumazet <edumazet@google.com>
Date: Tue, 25 Feb 2014 04:31:03 -0800
Subject: net: tcp: use NET_INC_STATS()

While LINUX_MIB_TCPSPURIOUS_RTX_HOSTQUEUES can only be incremented
in tcp_transmit_skb() from softirq (incoming message or timer
activation), it is better to use NET_INC_STATS() instead of
NET_INC_STATS_BH() as tcp_transmit_skb() can be called from process
context.

This will avoid copy/paste confusion when/if we want to add
other SNMP counters in tcp_transmit_skb()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/tcp_output.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'net')

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 09805817627b..d718482fd11c 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -864,8 +864,8 @@ static int tcp_transmit_skb(struct sock *sk, struct sk_buff *skb, int clone_it,
 
 		if (unlikely(skb->fclone == SKB_FCLONE_ORIG &&
 			     fclone->fclone == SKB_FCLONE_CLONE))
-			NET_INC_STATS_BH(sock_net(sk),
-					 LINUX_MIB_TCPSPURIOUS_RTX_HOSTQUEUES);
+			NET_INC_STATS(sock_net(sk),
+				      LINUX_MIB_TCPSPURIOUS_RTX_HOSTQUEUES);
 
 		if (unlikely(skb_cloned(skb)))
 			skb = pskb_copy(skb, gfp_mask);
-- 
cgit v1.2.2


From 724b9e1d75ab3401aaa081bd4efb440c1b3509db Mon Sep 17 00:00:00 2001
From: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Date: Wed, 26 Feb 2014 21:43:42 +0900
Subject: sch_tbf: Fix potential memory leak in tbf_change().

The allocated child qdisc is not freed in error conditions.
Defer the allocation after user configuration turns out to be
valid and acceptable.

Fixes: cc106e441a63b ("net: sched: tbf: fix the calculation of max_size")
Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/sched/sch_tbf.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

(limited to 'net')

diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 1cb413fead89..4f505a006896 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -334,18 +334,6 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt)
 			qdisc_put_rtab(qdisc_get_rtab(&qopt->peakrate,
 						      tb[TCA_TBF_PTAB]));
 
-	if (q->qdisc != &noop_qdisc) {
-		err = fifo_set_limit(q->qdisc, qopt->limit);
-		if (err)
-			goto done;
-	} else if (qopt->limit > 0) {
-		child = fifo_create_dflt(sch, &bfifo_qdisc_ops, qopt->limit);
-		if (IS_ERR(child)) {
-			err = PTR_ERR(child);
-			goto done;
-		}
-	}
-
 	buffer = min_t(u64, PSCHED_TICKS2NS(qopt->buffer), ~0U);
 	mtu = min_t(u64, PSCHED_TICKS2NS(qopt->mtu), ~0U);
 
@@ -390,6 +378,18 @@ static int tbf_change(struct Qdisc *sch, struct nlattr *opt)
 		goto done;
 	}
 
+	if (q->qdisc != &noop_qdisc) {
+		err = fifo_set_limit(q->qdisc, qopt->limit);
+		if (err)
+			goto done;
+	} else if (qopt->limit > 0) {
+		child = fifo_create_dflt(sch, &bfifo_qdisc_ops, qopt->limit);
+		if (IS_ERR(child)) {
+			err = PTR_ERR(child);
+			goto done;
+		}
+	}
+
 	sch_tree_lock(sch);
 	if (child) {
 		qdisc_tree_decrease_qlen(q->qdisc, q->qdisc->q.qlen);
-- 
cgit v1.2.2


From cb664981607a6b5b3d670ad57bbda893b2528d96 Mon Sep 17 00:00:00 2001
From: Johannes Berg <johannes.berg@intel.com>
Date: Thu, 27 Feb 2014 20:47:53 +0100
Subject: mac80211: fix association to 20/40 MHz VHT networks

When a VHT network uses 20 or 40 MHz as per the HT operation
information, the channel center frequency segment 0 field in
the VHT operation information is reserved, so ignore it.

This fixes association with such networks when the AP puts 0
into the field, previously we'd disconnect due to an invalid
channel with the message
wlan0: AP VHT information is invalid, disable VHT

Cc: stable@vger.kernel.org
Fixes: f2d9d270c15ae ("mac80211: support VHT association")
Reported-by: Tim Nelson <tim.l.nelson@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
 net/mac80211/mlme.c | 1 +
 1 file changed, 1 insertion(+)

(limited to 'net')

diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index c415f00cd6b6..245dce969b31 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -222,6 +222,7 @@ ieee80211_determine_chantype(struct ieee80211_sub_if_data *sdata,
 	switch (vht_oper->chan_width) {
 	case IEEE80211_VHT_CHANWIDTH_USE_HT:
 		vht_chandef.width = chandef->width;
+		vht_chandef.center_freq1 = chandef->center_freq1;
 		break;
 	case IEEE80211_VHT_CHANWIDTH_80MHZ:
 		vht_chandef.width = NL80211_CHAN_WIDTH_80;
-- 
cgit v1.2.2


From bf439b3154ce49d81a79b14f9fab18af99018ae2 Mon Sep 17 00:00:00 2001
From: Lorenzo Colitti <lorenzo@google.com>
Date: Thu, 27 Feb 2014 13:38:26 +0900
Subject: net: ipv6: ping: Use socket mark in routing lookup

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv6/ping.c | 1 +
 1 file changed, 1 insertion(+)

(limited to 'net')

diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c
index fb9beb78f00b..587bbdcb22b4 100644
--- a/net/ipv6/ping.c
+++ b/net/ipv6/ping.c
@@ -135,6 +135,7 @@ int ping_v6_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	fl6.flowi6_proto = IPPROTO_ICMPV6;
 	fl6.saddr = np->saddr;
 	fl6.daddr = *daddr;
+	fl6.flowi6_mark = sk->sk_mark;
 	fl6.fl6_icmp_type = user_icmph.icmp6_type;
 	fl6.fl6_icmp_code = user_icmph.icmp6_code;
 	security_sk_classify_flow(sk, flowi6_to_flowi(&fl6));
-- 
cgit v1.2.2


From feff9ab2e7fa773b6a3965f77375fe89f7fd85cf Mon Sep 17 00:00:00 2001
From: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Date: Thu, 27 Feb 2014 17:14:41 +0800
Subject: neigh: recompute reachabletime before returning from
 neigh_periodic_work()

If the neigh table's entries is less than gc_thresh1, the function
will return directly, and the reachabletime will not be recompute,
so the reachabletime can be guessed.

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/core/neighbour.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

(limited to 'net')

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index e1aa0f36cfe7..e16129019c66 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -766,9 +766,6 @@ static void neigh_periodic_work(struct work_struct *work)
 	nht = rcu_dereference_protected(tbl->nht,
 					lockdep_is_held(&tbl->lock));
 
-	if (atomic_read(&tbl->entries) < tbl->gc_thresh1)
-		goto out;
-
 	/*
 	 *	periodically recompute ReachableTime from random function
 	 */
@@ -781,6 +778,9 @@ static void neigh_periodic_work(struct work_struct *work)
 				neigh_rand_reach_time(NEIGH_VAR(p, BASE_REACHABLE_TIME));
 	}
 
+	if (atomic_read(&tbl->entries) < tbl->gc_thresh1)
+		goto out;
+
 	for (i = 0 ; i < (1 << nht->hash_shift); i++) {
 		np = &nht->hash_buckets[i];
 
-- 
cgit v1.2.2


From accfe0e356327da5bd53da8852b93fc22de9b5fc Mon Sep 17 00:00:00 2001
From: Hans Schillstrom <hans@schillstrom.com>
Date: Thu, 27 Feb 2014 12:57:58 +0100
Subject: ipv6: ipv6_find_hdr restore prev functionality

The commit 9195bb8e381d81d5a315f911904cdf0cfcc919b8 ("ipv6: improve
ipv6_find_hdr() to skip empty routing headers") broke ipv6_find_hdr().

When a target is specified like IPPROTO_ICMPV6 ipv6_find_hdr()
returns -ENOENT when it's found, not the header as expected.

A part of IPVS is broken and possible also nft_exthdr_eval().
When target is -1 which it is most cases, it works.

This patch exits the do while loop if the specific header is found
so the nexthdr could be returned as expected.

Reported-by: Art -kwaak- van Breemen <ard@telegraafnet.nl>
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
CC:Ansis Atteka <aatteka@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv6/exthdrs_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'net')

diff --git a/net/ipv6/exthdrs_core.c b/net/ipv6/exthdrs_core.c
index 140748debc4a..8af3eb57f438 100644
--- a/net/ipv6/exthdrs_core.c
+++ b/net/ipv6/exthdrs_core.c
@@ -212,7 +212,7 @@ int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
 		found = (nexthdr == target);
 
 		if ((!ipv6_ext_hdr(nexthdr)) || nexthdr == NEXTHDR_NONE) {
-			if (target < 0)
+			if (target < 0 || found)
 				break;
 			return -ENOENT;
 		}
-- 
cgit v1.2.2


From 821047c4055cca833c4674f172a9d73003563eb6 Mon Sep 17 00:00:00 2001
From: Oliver Hartkopp <socketcan@hartkopp.net>
Date: Sat, 1 Mar 2014 15:31:53 +0100
Subject: can: remove CAN FD compatibility for CAN 2.0 sockets

In commit e2d265d3b587 (canfd: add support for CAN FD in CAN_RAW sockets)
CAN FD frames with a payload length up to 8 byte are passed to legacy
sockets where the CAN FD support was not enabled by the application.

After some discussions with developers at a fair this well meant feature
leads to confusion as no clean switch for CAN / CAN FD is provided to the
application programmer. Additionally a compatibility like this for legacy
CAN_RAW sockets requires some compatibility handling for the sending, e.g.
make CAN2.0 frames a CAN FD frame with BRS at transmission time (?!?).

This will become a mess when people start to develop applications with
real CAN FD hardware. This patch reverts the bad compatibility code
together with the documentation describing the removed feature.

Acked-by: Stephane Grosjean <s.grosjean@peak-system.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
---
 net/can/raw.c | 26 +++++---------------------
 1 file changed, 5 insertions(+), 21 deletions(-)

(limited to 'net')

diff --git a/net/can/raw.c b/net/can/raw.c
index 8be757cca2ec..081e81fd017f 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -121,13 +121,9 @@ static void raw_rcv(struct sk_buff *oskb, void *data)
 	if (!ro->recv_own_msgs && oskb->sk == sk)
 		return;
 
-	/* do not pass frames with DLC > 8 to a legacy socket */
-	if (!ro->fd_frames) {
-		struct canfd_frame *cfd = (struct canfd_frame *)oskb->data;
-
-		if (unlikely(cfd->len > CAN_MAX_DLEN))
-			return;
-	}
+	/* do not pass non-CAN2.0 frames to a legacy socket */
+	if (!ro->fd_frames && oskb->len != CAN_MTU)
+		return;
 
 	/* clone the given skb to be able to enqueue it into the rcv queue */
 	skb = skb_clone(oskb, GFP_ATOMIC);
@@ -738,9 +734,7 @@ static int raw_recvmsg(struct kiocb *iocb, struct socket *sock,
 		       struct msghdr *msg, size_t size, int flags)
 {
 	struct sock *sk = sock->sk;
-	struct raw_sock *ro = raw_sk(sk);
 	struct sk_buff *skb;
-	int rxmtu;
 	int err = 0;
 	int noblock;
 
@@ -751,20 +745,10 @@ static int raw_recvmsg(struct kiocb *iocb, struct socket *sock,
 	if (!skb)
 		return err;
 
-	/*
-	 * when serving a legacy socket the DLC <= 8 is already checked inside
-	 * raw_rcv(). Now check if we need to pass a canfd_frame to a legacy
-	 * socket and cut the possible CANFD_MTU/CAN_MTU length to CAN_MTU
-	 */
-	if (!ro->fd_frames)
-		rxmtu = CAN_MTU;
-	else
-		rxmtu = skb->len;
-
-	if (size < rxmtu)
+	if (size < skb->len)
 		msg->msg_flags |= MSG_TRUNC;
 	else
-		size = rxmtu;
+		size = skb->len;
 
 	err = memcpy_toiovec(msg->msg_iov, skb->data, size);
 	if (err < 0) {
-- 
cgit v1.2.2


From de39d7a4f3693c4247135cbce42716bf2a113577 Mon Sep 17 00:00:00 2001
From: Dan Carpenter <dan.carpenter@oracle.com>
Date: Sat, 1 Mar 2014 02:18:57 +0300
Subject: hsr: off by one sanity check in hsr_register_frame_in()

This is a sanity check and we never pass invalid values so this patch
doesn't change anything.  However the node->time_in[] array has
HSR_MAX_SLAVE (2) elements and not HSR_MAX_DEV (3).

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/hsr/hsr_framereg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'net')

diff --git a/net/hsr/hsr_framereg.c b/net/hsr/hsr_framereg.c
index 327060c6c874..7ae0d7f6dbd0 100644
--- a/net/hsr/hsr_framereg.c
+++ b/net/hsr/hsr_framereg.c
@@ -297,7 +297,7 @@ static bool seq_nr_after(u16 a, u16 b)
 
 void hsr_register_frame_in(struct node_entry *node, enum hsr_dev_idx dev_idx)
 {
-	if ((dev_idx < 0) || (dev_idx >= HSR_MAX_DEV)) {
+	if ((dev_idx < 0) || (dev_idx >= HSR_MAX_SLAVE)) {
 		WARN_ONCE(1, "%s: Invalid dev_idx (%d)\n", __func__, dev_idx);
 		return;
 	}
-- 
cgit v1.2.2


From c84a57113f59486e6688be1cd443b96e3118efa0 Mon Sep 17 00:00:00 2001
From: Yuchung Cheng <ycheng@google.com>
Date: Fri, 28 Feb 2014 16:42:26 -0800
Subject: tcp: fix bogus RTT on special retransmission

RTT may be bogus with tall loss probe (TLP) when a packet
is retransmitted and latter (s)acked without TCPCB_SACKED_RETRANS flag.

For example, TLP calls __tcp_retransmit_skb() instead of
tcp_retransmit_skb(). The skb timestamps are updated but the sacked
flag is not marked with TCPCB_SACKED_RETRANS. As a result we'll
get bogus RTT in tcp_clean_rtx_queue() or in tcp_sacktag_one() on
spurious retransmission.

The fix is to apply the sticky flag TCP_EVER_RETRANS to enforce Karn's
check on RTT sampling. However this will disable F-RTO if timeout occurs
after TLP, by resetting undo_marker in tcp_enter_loss(). We relax this
check to only if any pending retransmists are still in-flight.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/tcp_input.c  |  3 ++-
 net/ipv4/tcp_output.c | 11 ++++++++---
 2 files changed, 10 insertions(+), 4 deletions(-)

(limited to 'net')

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 227cba79fa6b..eeaac399420d 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1945,8 +1945,9 @@ void tcp_enter_loss(struct sock *sk, int how)
 		if (skb == tcp_send_head(sk))
 			break;
 
-		if (TCP_SKB_CB(skb)->sacked & TCPCB_RETRANS)
+		if (TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_RETRANS)
 			tp->undo_marker = 0;
+
 		TCP_SKB_CB(skb)->sacked &= (~TCPCB_TAGBITS)|TCPCB_SACKED_ACKED;
 		if (!(TCP_SKB_CB(skb)->sacked&TCPCB_SACKED_ACKED) || how) {
 			TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_ACKED;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index d718482fd11c..f0eb4e337ec8 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2337,6 +2337,7 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	unsigned int cur_mss;
+	int err;
 
 	/* Inconslusive MTU probe */
 	if (icsk->icsk_mtup.probe_size) {
@@ -2400,11 +2401,15 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
 		     skb_headroom(skb) >= 0xFFFF)) {
 		struct sk_buff *nskb = __pskb_copy(skb, MAX_TCP_HEADER,
 						   GFP_ATOMIC);
-		return nskb ? tcp_transmit_skb(sk, nskb, 0, GFP_ATOMIC) :
-			      -ENOBUFS;
+		err = nskb ? tcp_transmit_skb(sk, nskb, 0, GFP_ATOMIC) :
+			     -ENOBUFS;
 	} else {
-		return tcp_transmit_skb(sk, skb, 1, GFP_ATOMIC);
+		err = tcp_transmit_skb(sk, skb, 1, GFP_ATOMIC);
 	}
+
+	if (likely(!err))
+		TCP_SKB_CB(skb)->sacked |= TCPCB_EVER_RETRANS;
+	return err;
 }
 
 int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
-- 
cgit v1.2.2


From 10ddceb22bab11dab10ba645c7df2e4a8e7a5db5 Mon Sep 17 00:00:00 2001
From: Xin Long <lucien.xin@gmail.com>
Date: Mon, 3 Mar 2014 20:18:36 +0800
Subject: ip_tunnel:multicast process cause panic due to skb->_skb_refdst NULL
 pointer

when ip_tunnel process multicast packets, it may check if the packet is looped
back packet though 'rt_is_output_route(skb_rtable(skb))' in ip_tunnel_rcv(),
but before that , skb->_skb_refdst has been dropped in iptunnel_pull_header(),
so which leads to a panic.

fix the bug: https://bugzilla.kernel.org/show_bug.cgi?id=70681

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/ipv4/ip_tunnel_core.c | 1 -
 1 file changed, 1 deletion(-)

(limited to 'net')

diff --git a/net/ipv4/ip_tunnel_core.c b/net/ipv4/ip_tunnel_core.c
index 8d69626f2206..6f847dd56dbc 100644
--- a/net/ipv4/ip_tunnel_core.c
+++ b/net/ipv4/ip_tunnel_core.c
@@ -108,7 +108,6 @@ int iptunnel_pull_header(struct sk_buff *skb, int hdr_len, __be16 inner_proto)
 	nf_reset(skb);
 	secpath_reset(skb);
 	skb_clear_hash_if_not_l4(skb);
-	skb_dst_drop(skb);
 	skb->vlan_tci = 0;
 	skb_set_queue_mapping(skb, 0);
 	skb->pkt_type = PACKET_HOST;
-- 
cgit v1.2.2


From ec0223ec48a90cb605244b45f7c62de856403729 Mon Sep 17 00:00:00 2001
From: Daniel Borkmann <dborkman@redhat.com>
Date: Mon, 3 Mar 2014 17:23:04 +0100
Subject: net: sctp: fix sctp_sf_do_5_1D_ce to verify if we/peer is AUTH
 capable

RFC4895 introduced AUTH chunks for SCTP; during the SCTP
handshake RANDOM; CHUNKS; HMAC-ALGO are negotiated (CHUNKS
being optional though):

  ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
  <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
  -------------------- COOKIE-ECHO -------------------->
  <-------------------- COOKIE-ACK ---------------------

A special case is when an endpoint requires COOKIE-ECHO
chunks to be authenticated:

  ---------- INIT[RANDOM; CHUNKS; HMAC-ALGO] ---------->
  <------- INIT-ACK[RANDOM; CHUNKS; HMAC-ALGO] ---------
  ------------------ AUTH; COOKIE-ECHO ---------------->
  <-------------------- COOKIE-ACK ---------------------

RFC4895, section 6.3. Receiving Authenticated Chunks says:

  The receiver MUST use the HMAC algorithm indicated in
  the HMAC Identifier field. If this algorithm was not
  specified by the receiver in the HMAC-ALGO parameter in
  the INIT or INIT-ACK chunk during association setup, the
  AUTH chunk and all the chunks after it MUST be discarded
  and an ERROR chunk SHOULD be sent with the error cause
  defined in Section 4.1. [...] If no endpoint pair shared
  key has been configured for that Shared Key Identifier,
  all authenticated chunks MUST be silently discarded. [...]

  When an endpoint requires COOKIE-ECHO chunks to be
  authenticated, some special procedures have to be followed
  because the reception of a COOKIE-ECHO chunk might result
  in the creation of an SCTP association. If a packet arrives
  containing an AUTH chunk as a first chunk, a COOKIE-ECHO
  chunk as the second chunk, and possibly more chunks after
  them, and the receiver does not have an STCB for that
  packet, then authentication is based on the contents of
  the COOKIE-ECHO chunk. In this situation, the receiver MUST
  authenticate the chunks in the packet by using the RANDOM
  parameters, CHUNKS parameters and HMAC_ALGO parameters
  obtained from the COOKIE-ECHO chunk, and possibly a local
  shared secret as inputs to the authentication procedure
  specified in Section 6.3. If authentication fails, then
  the packet is discarded. If the authentication is successful,
  the COOKIE-ECHO and all the chunks after the COOKIE-ECHO
  MUST be processed. If the receiver has an STCB, it MUST
  process the AUTH chunk as described above using the STCB
  from the existing association to authenticate the
  COOKIE-ECHO chunk and all the chunks after it. [...]

Commit bbd0d59809f9 introduced the possibility to receive
and verification of AUTH chunk, including the edge case for
authenticated COOKIE-ECHO. On reception of COOKIE-ECHO,
the function sctp_sf_do_5_1D_ce() handles processing,
unpacks and creates a new association if it passed sanity
checks and also tests for authentication chunks being
present. After a new association has been processed, it
invokes sctp_process_init() on the new association and
walks through the parameter list it received from the INIT
chunk. It checks SCTP_PARAM_RANDOM, SCTP_PARAM_HMAC_ALGO
and SCTP_PARAM_CHUNKS, and copies them into asoc->peer
meta data (peer_random, peer_hmacs, peer_chunks) in case
sysctl -w net.sctp.auth_enable=1 is set. If in INIT's
SCTP_PARAM_SUPPORTED_EXT parameter SCTP_CID_AUTH is set,
peer_random != NULL and peer_hmacs != NULL the peer is to be
assumed asoc->peer.auth_capable=1, in any other case
asoc->peer.auth_capable=0.

Now, if in sctp_sf_do_5_1D_ce() chunk->auth_chunk is
available, we set up a fake auth chunk and pass that on to
sctp_sf_authenticate(), which at latest in
sctp_auth_calculate_hmac() reliably dereferences a NULL pointer
at position 0..0008 when setting up the crypto key in
crypto_hash_setkey() by using asoc->asoc_shared_key that is
NULL as condition key_id == asoc->active_key_id is true if
the AUTH chunk was injected correctly from remote. This
happens no matter what net.sctp.auth_enable sysctl says.

The fix is to check for net->sctp.auth_enable and for
asoc->peer.auth_capable before doing any operations like
sctp_sf_authenticate() as no key is activated in
sctp_auth_asoc_init_active_key() for each case.

Now as RFC4895 section 6.3 states that if the used HMAC-ALGO
passed from the INIT chunk was not used in the AUTH chunk, we
SHOULD send an error; however in this case it would be better
to just silently discard such a maliciously prepared handshake
as we didn't even receive a parameter at all. Also, as our
endpoint has no shared key configured, section 6.3 says that
MUST silently discard, which we are doing from now onwards.

Before calling sctp_sf_pdiscard(), we need not only to free
the association, but also the chunk->auth_chunk skb, as
commit bbd0d59809f9 created a skb clone in that case.

I have tested this locally by using netfilter's nfqueue and
re-injecting packets into the local stack after maliciously
modifying the INIT chunk (removing RANDOM; HMAC-ALGO param)
and the SCTP packet containing the COOKIE_ECHO (injecting
AUTH chunk before COOKIE_ECHO). Fixed with this patch applied.

Fixes: bbd0d59809f9 ("[SCTP]: Implement the receive and verification of AUTH chunk")
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Vlad Yasevich <yasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
---
 net/sctp/sm_statefuns.c | 7 +++++++
 1 file changed, 7 insertions(+)

(limited to 'net')

diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index 591b44d3b7de..ae65b6b5973a 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -758,6 +758,13 @@ sctp_disposition_t sctp_sf_do_5_1D_ce(struct net *net,
 		struct sctp_chunk auth;
 		sctp_ierror_t ret;
 
+		/* Make sure that we and the peer are AUTH capable */
+		if (!net->sctp.auth_enable || !new_asoc->peer.auth_capable) {
+			kfree_skb(chunk->auth_chunk);
+			sctp_association_free(new_asoc);
+			return sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
+		}
+
 		/* set-up our fake chunk so that we can process it */
 		auth.skb = chunk->auth_chunk;
 		auth.asoc = chunk->asoc;
-- 
cgit v1.2.2