aboutsummaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAge
* Merge branch 'for-linus' of ↵Linus Torvalds2014-11-03
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph fixes from Sage Weil: "There is a GFP flag fix from Mike Christie, an error code fix from Jan, and fixes for two unnecessary allocations (kmalloc and workqueue) from Ilya. All are well tested. Ilya has one other fix on the way but it didn't get tested in time" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: eliminate unnecessary allocation in process_one_ticket() rbd: Fix error recovery in rbd_obj_read_sync() libceph: use memalloc flags for net IO rbd: use a single workqueue for all devices
| * libceph: eliminate unnecessary allocation in process_one_ticket()Ilya Dryomov2014-10-31
| | | | | | | | | | | | | | | | | | | | | | Commit c27a3e4d667f ("libceph: do not hard code max auth ticket len") while fixing a buffer overlow tried to keep the same as much of the surrounding code as possible and introduced an unnecessary kmalloc() in the unencrypted ticket path. It is likely to fail on huge tickets, so get rid of it. Signed-off-by: Ilya Dryomov <idryomov@redhat.com> Reviewed-by: Sage Weil <sage@redhat.com>
| * libceph: use memalloc flags for net IOMike Christie2014-10-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch has ceph's lib code use the memalloc flags. If the VM layer needs to write data out to free up memory to handle new allocation requests, the block layer must be able to make forward progress. To handle that requirement we use structs like mempools to reserve memory for objects like bios and requests. The problem is when we send/receive block layer requests over the network layer, net skb allocations can fail and the system can lock up. To solve this, the memalloc related flags were added. NBD, iSCSI and NFS uses these flags to tell the network/vm layer that it should use memory reserves to fullfill allcation requests for structs like skbs. I am running ceph in a bunch of VMs in my laptop, so this patch was not tested very harshly. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
* | irda: stop calling sk_prot->disconnect() on connection failureLinus Torvalds2014-11-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sk_prot is irda's own set of protocol handlers, so irda should statically know what that function is anyway, without using an indirect pointer. And as it happens, we know *exactly* what that pointer is statically: it's NULL, because irda doesn't define a disconnect operation. So calling that function is doubly wrong, and will just cause an oops. Reported-by: Martin Lang <mlg.hessigheim@gmail.com> Cc: Samuel Ortiz <samuel@sortiz.org> Cc: David Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | net: ethtool: Return -EOPNOTSUPP if user space tries to read EEPROM with lengh 0Guenter Roeck2014-10-31
| | | | | | | | | | | | | | | | | | | | | | | | | | If a driver supports reading EEPROM but no EEPROM is installed in the system, the driver's get_eeprom_len function returns 0. ethtool will subsequently try to read that zero-length EEPROM anyway. If the driver does not support EEPROM access at all, this operation will return -EOPNOTSUPP. If the driver does support EEPROM access but no EEPROM is installed, the operation will return -EINVAL. Return -EOPNOTSUPP in both cases for consistency. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* | mpls: Allow mpls_gso to be built as modulePravin B Shelar2014-10-31
| | | | | | | | | | | | | | | | | | | | Kconfig already allows mpls to be built as module. Following patch fixes Makefile to do same. CC: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | mpls: Fix mpls_gso handler.Pravin B Shelar2014-10-31
| | | | | | | | | | | | | | | | | | mpls gso handler needs to pull skb after segmenting skb. CC: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2014-10-31
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== netfilter/ipvs fixes for net The following patchset contains fixes for netfilter/ipvs. This round of fixes is larger than usual at this stage, specifically because of the nf_tables bridge reject fixes that I would like to see in 3.18. The patches are: 1) Fix a null-pointer dereference that may occur when logging errors. This problem was introduced by 4a4739d56b0 ("ipvs: Pull out crosses_local_route_boundary logic") in v3.17-rc5. 2) Update hook mask in nft_reject_bridge so we can also filter out packets from there. This fixes 36d2af5 ("netfilter: nf_tables: allow to filter from prerouting and postrouting"), which needs this chunk to work. 3) Two patches to refactor common code to forge the IPv4 and IPv6 reject packets from the bridge. These are required by the nf_tables reject bridge fix. 4) Fix nft_reject_bridge by avoiding the use of the IP stack to reject packets from the bridge. The idea is to forge the reject packets and inject them to the original port via br_deliver() which is now exported for that purpose. 5) Restrict nft_reject_bridge to bridge prerouting and input hooks. the original skbuff may cloned after prerouting when the bridge stack needs to flood it to several bridge ports, it is too late to reject the traffic. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | netfilter: nft_reject_bridge: restrict reject to prerouting and inputPablo Neira Ayuso2014-10-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | Restrict the reject expression to the prerouting and input bridge hooks. If we allow this to be used from forward or any other later bridge hook, if the frame is flooded to several ports, we'll end up sending several reject packets, one per cloned packet. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nft_reject_bridge: don't use IP stack to reject trafficPablo Neira Ayuso2014-10-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the packet is received via the bridge stack, this cannot reject packets from the IP stack. This adds functions to build the reject packet and send it from the bridge stack. Comments and assumptions on this patch: 1) Validate the IPv4 and IPv6 headers before further processing, given that the packet comes from the bridge stack, we cannot assume they are clean. Truncated packets are dropped, we follow similar approach in the existing iptables match/target extensions that need to inspect layer 4 headers that is not available. This also includes packets that are directed to multicast and broadcast ethernet addresses. 2) br_deliver() is exported to inject the reject packet via bridge localout -> postrouting. So the approach is similar to what we already do in the iptables reject target. The reject packet is sent to the bridge port from which we have received the original packet. 3) The reject packet is forged based on the original packet. The TTL is set based on sysctl_ip_default_ttl for IPv4 and per-net ipv6.devconf_all hoplimit for IPv6. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nf_reject_ipv6: split nf_send_reset6() in smaller functionsPablo Neira Ayuso2014-10-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | That can be reused by the reject bridge expression to build the reject packet. The new functions are: * nf_reject_ip6_tcphdr_get(): to sanitize and to obtain the TCP header. * nf_reject_ip6hdr_put(): to build the IPv6 header. * nf_reject_ip6_tcphdr_put(): to build the TCP header. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nf_reject_ipv4: split nf_send_reset() in smaller functionsPablo Neira Ayuso2014-10-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | That can be reused by the reject bridge expression to build the reject packet. The new functions are: * nf_reject_ip_tcphdr_get(): to sanitize and to obtain the TCP header. * nf_reject_iphdr_put(): to build the IPv4 header. * nf_reject_ip_tcphdr_put(): to build the TCP header. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nf_tables_bridge: update hook_mask to allow {pre,post}routingPablo Neira Ayuso2014-10-31
| | | | | | | | | | | | | | | Fixes: 36d2af5 ("netfilter: nf_tables: allow to filter from prerouting and postrouting") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | ipvs: Avoid null-pointer deref in debug codeAlex Gartrell2014-10-27
| | | | | | | | | | | | | | | | | | | | | | | | Use daddr instead of reaching into dest. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alex Gartrell <agartrell@fb.com> Signed-off-by: Simon Horman <horms@verge.net.au>
* | | drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packetsBen Hutchings2014-10-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | UFO is now disabled on all drivers that work with virtio net headers, but userland may try to send UFO/IPv6 packets anyway. Instead of sending with ID=0, we should select identifiers on their behalf (as we used to). Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Fixes: 916e4cf46d02 ("ipv6: reuse ip6_frag_id from ip6_ufo_append_data") Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: skb_fclone_busy() needs to detect orphaned skbEric Dumazet2014-10-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some drivers are unable to perform TX completions in a bound time. They instead call skb_orphan() Problem is skb_fclone_busy() has to detect this case, otherwise we block TCP retransmits and can freeze unlucky tcp sessions on mostly idle hosts. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 1f3279ae0c13 ("tcp: avoid retransmits of TCP packets hanging in host queues") Signed-off-by: David S. Miller <davem@davemloft.net>
* | | gre: Use inner mac length when computing tunnel lengthTom Herbert2014-10-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, skb_inner_network_header is used but this does not account for Ethernet header for ETH_P_TEB. Use skb_inner_mac_header which handles TEB and also should work with IP encapsulation in which case inner mac and inner network headers are the same. Tested: Ran TCP_STREAM over GRE, worked as expected. Signed-off-by: Tom Herbert <therbert@google.com> Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: Do not cache routing failures due to disabled forwarding.Nicolas Cavallari2014-10-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we cache them, the kernel will reuse them, independently of whether forwarding is enabled or not. Which means that if forwarding is disabled on the input interface where the first routing request comes from, then that unreachable result will be cached and reused for other interfaces, even if forwarding is enabled on them. The opposite is also true. This can be verified with two interfaces A and B and an output interface C, where B has forwarding enabled, but not A and trying ip route get $dst iif A from $src && ip route get $dst iif B from $src Signed-off-by: Nicolas Cavallari <nicolas.cavallari@green-communications.fr> Reviewed-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | inet: frags: remove the WARN_ON from inet_evict_bucketNikolay Aleksandrov2014-10-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The WARN_ON in inet_evict_bucket can be triggered by a valid case: inet_frag_kill and inet_evict_bucket can be running in parallel on the same queue which means that there has been at least one more ref added by a previous inet_frag_find call, but inet_frag_kill can delete the timer before inet_evict_bucket which will cause the WARN_ON() there to trigger since we'll have refcnt!=1. Now, this case is valid because the queue is being "killed" for some reason (removed from the chain list and its timer deleted) so it will get destroyed in the end by one of the inet_frag_put() calls which reaches 0 i.e. refcnt is still valid. CC: Florian Westphal <fw@strlen.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McLean <chutzpah@gentoo.org> Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue") Reported-by: Patrick McLean <chutzpah@gentoo.org> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | inet: frags: fix a race between inet_evict_bucket and inet_frag_killNikolay Aleksandrov2014-10-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the evictor is running it adds some chosen frags to a local list to be evicted once the chain lock has been released but at the same time the *frag_queue can be running for some of the same queues and it may call inet_frag_kill which will wait on the chain lock and will then delete the queue from the wrong list since it was added in the eviction one. The fix is simple - check if the queue has the evict flag set under the chain lock before deleting it, this is safe because the evict flag is set only under that lock and having the flag set also means that the queue has been detached from the chain list, so no need to delete it again. An important note to make is that we're safe w.r.t refcnt because inet_frag_kill and inet_evict_bucket will sync on the del_timer operation where only one of the two can succeed (or if the timer is executing - none of them), the cases are: 1. inet_frag_kill succeeds in del_timer - then the timer ref is removed, but inet_evict_bucket will not add this queue to its expire list but will restart eviction in that chain 2. inet_evict_bucket succeeds in del_timer - then the timer ref is kept until the evictor "expires" the queue, but inet_frag_kill will remove the initial ref and will set INET_FRAG_COMPLETE which will make the frag_expire fn just to remove its ref. In the end all of the queue users will do an inet_frag_put and the one that reaches 0 will free it. The refcount balance should be okay. CC: Florian Westphal <fw@strlen.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McLean <chutzpah@gentoo.org> Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue") Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Patrick McLean <chutzpah@gentoo.org> Tested-by: Patrick McLean <chutzpah@gentoo.org> Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Reviewed-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv6: notify userspace when we added or changed an ipv6 tokenLubomir Rintel2014-10-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NetworkManager might want to know that it changed when the router advertisement arrives. Signed-off-by: Lubomir Rintel <lkundrak@v3.sk> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Daniel Borkmann <dborkman@redhat.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | sch_pie: schedule the timer after all init succeedWANG Cong2014-10-29
| | | | | | | | | | | | | | | | | | | | | Cc: Vijay Subramanian <vijaynsu@cisco.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com>
* | | Merge tag 'master-2014-10-27' of ↵David S. Miller2014-10-28
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless John W. Linville says: ==================== pull request: wireless 2014-10-28 Please pull this batch of fixes intended for the 3.18 stream! For the mac80211 bits, Johannes says: "Here are a few fixes for the wireless stack: one fixes the RTS rate, one for a debugfs file, one to return the correct channel to userspace, a sanity check for a userspace value and the remaining two are just documentation fixes." For the iwlwifi bits, Emmanuel says: "I revert here a patch that caused interoperability issues. dvm gets a fix for a bug that was reported by many users. Two minor fixes for BT Coex and platform power fix that helps reducing latency when the PCIe link goes to low power states." In addition... Felix Fietkau adds a couple of ath code fixes related to regulatory rule enforcement. Hauke Mehrtens fixes a build break with bcma when CONFIG_OF_ADDRESS is not set. Karsten Wiese provides a trio of minor fixes for rtl8192cu. Kees Cook prevents a potential information leak in rtlwifi. Larry Finger also brings a trio of minor fixes for rtlwifi. Rafał Miłecki adds a device ID to the bcma bus driver. Rickard Strandqvist offers some strn* -> strl* changes in brcmfmac to eliminate non-terminated string issues. Sujith Manoharan avoids some ath9k stalls by enabling HW queue control only for MCC. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * \ \ Merge tag 'mac80211-for-john-2014-10-23' of ↵John W. Linville2014-10-27
| |\ \ \ | | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg <johannes@sipsolutions.net> says: "Here are a few fixes for the wireless stack: one fixes the RTS rate, one for a debugfs file, one to return the correct channel to userspace, a sanity check for a userspace value and the remaining two are just documentation fixes." Signed-off-by: John W. Linville <linville@tuxdriver.com>
| | * | mac80211: minstrels: fix buffer overflow in HT debugfs rc_statsKarl Beldan2014-10-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ATM an HT rc_stats line is 106 chars. Times 8(MCS_GROUP_RATES)*3(SS)*2(GI)*2(BW) + CCK(4), i.e. x100, this is well above the current 8192 - sizeof(*ms) currently allocated. Fix this by squeezing the output as follows (not that we're short on memory but this also improves readability and range, the new format adds one more digit to *ok/*cum and ok/cum): - Before (HT) (106 ch): type rate throughput ewma prob this prob retry this succ/attempt success attempts CCK/LP 5.5M 0.0 0.0 0.0 0 0( 0) 0 0 HT20/LGI ABCDP MCS0 0.0 0.0 0.0 1 0( 0) 0 0 - After (75 ch): type rate tpt eprob *prob ret *ok(*cum) ok( cum) CCK/LP 5.5M 0.0 0.0 0.0 0 0( 0) 0( 0) HT20/LGI ABCDP MCS0 0.0 0.0 0.0 1 0( 0) 0( 0) - Align non-HT format Before (non-HT) (83 ch): rate throughput ewma prob this prob this succ/attempt success attempts ABCDP 6 0.0 0.0 0.0 0( 0) 0 0 54 0.0 0.0 0.0 0( 0) 0 0 - After (61 ch): rate tpt eprob *prob *ok(*cum) ok( cum) ABCDP 1 0.0 0.0 0.0 0( 0) 0( 0) 54 0.0 0.0 0.0 0( 0) 0( 0) *This also adds dynamic checks for overflow, lowers the size of the non-HT request (allowing > 30 entries) and replaces the buddy-rounded allocations (s/sizeof(*ms) + 8192/8192). Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com> Acked-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| | * | mac80211: fix typo in starting baserate for rts_cts_rate_idxKarl Beldan2014-10-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It affects non-(V)HT rates and can lead to selecting an rts_cts rate that is not a basic rate or way superior to the reference rate (ATM rates[0] used for the 1st attempt of the protected frame data). E.g, assuming drivers register growing (bitrate) sorted tables of ieee80211_rate-s, having : - rates[0].idx == d'2 and basic_rates == b'10100 will select rts_cts idx b'10011 & ~d'(BIT(2)-1), i.e. 1, likewise - rates[0].idx == d'2 and basic_rates == b'10001 will select rts_cts idx b'10000 The first is not a basic rate and the second is > rates[0]. Also, wrt severity of the addressed misbehavior, ATM we only have one rts_cts_rate_idx rather than one per rate table entry, so this idx might still point to bitrates > rates[1..MAX_RATES]. Fixes: 5253ffb8c9e1 ("mac80211: always pick a basic rate to tx RTS/CTS for pre-HT rates") Cc: stable@vger.kernel.org Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| | * | nl80211: sanity check the channel switch counter valueLuciano Coelho2014-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The nl80211 channel switch count attribute (NL80211_ATTR_CH_SWITCH_COUNT) is specified as u32, but the specification uses u8 for the counter. To make sure strange things don't happen without informing the user, sanity check the value and return -EINVAL if it doesn't fit in u8. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| | * | net: rfkill: kernel-doc warning fixesFabian Frederick2014-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Correct the kernel-doc, the parameter is called "blocked" not "state". Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| | * | mac80211: return the vif's chandef in ieee80211_cfg_get_channel()Luciano Coelho2014-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The chandef of the channel context a vif is using may be different than the chandef of the vif itself. For instance, the bandwidth used by the vif may be narrower than the one configured in the channel context. To avoid confusion, return the vif's chandef in ieee80211_cfg_get_channel() instead of the chandef of the channel context. Signed-off-by: Luciano Coelho <luciano.coelho@intel.com> Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
| | * | mac80211: fix warning on htmldocs for last_tdls_pkt_timeLiad Kaufman2014-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Forgot to add an entry to the struct description of sta_info. Signed-off-by: Liad Kaufman <liad.kaufman@intel.com> Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
* | | | net: dsa: Error out on tagging protocol mismatchesAndrew Lunn2014-10-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If there is a mismatch between enabled tagging protocols and the protocol the switch supports, error out, rather than continue with a situation which is unlikely to work. Signed-off-by: Andrew Lunn <andrew@lunn.ch> cc: alexander.h.duyck@intel.com Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | bpf: split eBPF out of NETAlexei Starovoitov2014-10-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | introduce two configs: - hidden CONFIG_BPF to select eBPF interpreter that classic socket filters depend on - visible CONFIG_BPF_SYSCALL (default off) that tracing and sockets can use that solves several problems: - tracing and others that wish to use eBPF don't need to depend on NET. They can use BPF_SYSCALL to allow loading from userspace or select BPF to use it directly from kernel in NET-less configs. - in 3.18 programs cannot be attached to events yet, so don't force it on - when the rest of eBPF infra is there in 3.19+, it's still useful to switch it off to minimize kernel size bloat-o-meter on x64 shows: add/remove: 0/60 grow/shrink: 0/2 up/down: 0/-15601 (-15601) tested with many different config combinations. Hopefully didn't miss anything. Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller2014-10-27
|\ \ \ \ | | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for your net tree, they are: 1) Allow to recycle a TCP port in conntrack when the change role from server to client, from Marcelo Leitner. 2) Fix possible off by one access in ip_set_nfnl_get_byindex(), patch from Dan Carpenter. 3) alloc_percpu returns NULL on error, no need for IS_ERR() in nf_tables chain statistic updates. From Sabrina Dubroca. 4) Don't compile ip options in bridge netfilter, this mangles the packet and bridge should not alter layer >= 3 headers when forwarding packets. Patch from Herbert Xu and tested by Florian Westphal. 5) Account the final NLMSG_DONE message when calculating the size of the nflog netlink batches. Patch from Florian Westphal. 6) Fix a possible netlink attribute length overflow with large packets. Again from Florian Westphal. 7) Release the skbuff if nfnetlink_log fails to put the final NLMSG_DONE message. This fixes a leak on error. This shouldn't ever happen though, otherwise this means we miscalculate the netlink batch size, so spot a warning if this ever happens so we can track down the problem. This patch from Houcheng Lin. 8) Look at the right list when recycling targets in the nft_compat, patch from Arturo Borrero. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | netfilter: nft_compat: fix wrong target lookup in nft_target_select_ops()Arturo Borrero2014-10-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code looks for an already loaded target, and the correct list to search is nft_target_list, not nft_match_list. Signed-off-by: Arturo Borrero Gonzalez <arturo.borrero.glez@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: nf_log: release skbuff on nlmsg put failureHoucheng Lin2014-10-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kernel should reserve enough room in the skb so that the DONE message can always be appended. However, in case of e.g. new attribute erronously not being size-accounted for, __nfulnl_send() will still try to put next nlmsg into this full skbuf, causing the skb to be stuck forever and blocking delivery of further messages. Fix issue by releasing skb immediately after nlmsg_put error and WARN() so we can track down the cause of such size mismatch. [ fw@strlen.de: add tailroom/len info to WARN ] Signed-off-by: Houcheng Lin <houcheng@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: nfnetlink_log: fix maximum packet length logged to userspaceFlorian Westphal2014-10-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | don't try to queue payloads > 0xffff - NLA_HDRLEN, it does not work. The nla length includes the size of the nla struct, so anything larger results in u16 integer overflow. This patch is similar to 9cefbbc9c8f9abe (netfilter: nfnetlink_queue: cleanup copy_range usage). Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: nf_log: account for size of NLMSG_DONE attributeFlorian Westphal2014-10-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently neither account for the nlattr size, nor do we consider the size of the trailing NLMSG_DONE when allocating nlmsg skb. This can result in nflog to stop working, as __nfulnl_send() re-tries sending forever if it failed to append NLMSG_DONE (which will never work if buffer is not large enough). Reported-by: Houcheng Lin <houcheng@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | bridge: Do not compile options in br_parse_ip_optionsHerbert Xu2014-10-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 462fb2af9788a82a534f8184abfde31574e1cfa0 bridge : Sanitize skb before it enters the IP stack broke when IP options are actually used because it mangles the skb as if it entered the IP stack which is wrong because the bridge is supposed to operate below the IP stack. Since nobody has actually requested for parsing of IP options this patch fixes it by simply reverting to the previous approach of ignoring all IP options, i.e., zeroing the IPCB. If and when somebody who uses IP options and actually needs them to be parsed by the bridge complains then we can revisit this. Reported-by: David Newall <davidn@davidnewall.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: nf_tables: check for NULL in nf_tables_newchain pcpu stats allocationSabrina Dubroca2014-10-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | alloc_percpu returns NULL on failure, not a negative error code. Fixes: ff3cd7b3c922 ("netfilter: nf_tables: refactor chain statistic routines") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: ipset: off by one in ip_set_nfnl_get_byindex()Dan Carpenter2014-10-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ->ip_set_list[] array is initialized in ip_set_net_init() and it has ->ip_set_max elements so this check should be >= instead of > otherwise we are off by one. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | netfilter: nf_conntrack: allow server to become a client in TW handlingMarcelo Leitner2014-10-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a port that was used to listen for inbound connections gets closed and reused for outgoing connections (like rsh ends up doing for stderr flow), current we may reject the SYN/ACK packet for the new connection because tcp_conntracks states forbirds a port to become a client while there is still a TIME_WAIT entry in there for it. As TCP may expire the TIME_WAIT socket in 60s and conntrack's timeout for it is 120s, there is a ~60s window that the application can end up opening a port that conntrack will end up blocking. This patch fixes this by simply allowing such state transition: if we see a SYN, in TIME_WAIT state, on REPLY direction, move it to sSS. Note that the rest of the code already handles this situation, more specificly in tcp_packet(), first switch clause. Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | | | net: napi_reuse_skb() should check pfmemallocEric Dumazet2014-10-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Do not reuse skb if it was pfmemalloc tainted, otherwise future frame might be dropped anyway. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Roman Gushchin <klamm@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | tcp: md5: do not use alloc_percpu()Eric Dumazet2014-10-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | percpu tcp_md5sig_pool contains memory blobs that ultimately go through sg_set_buf(). -> sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf)); This requires that whole area is in a physically contiguous portion of memory. And that @buf is not backed by vmalloc(). Given that alloc_percpu() can use vmalloc() areas, this does not fit the requirements. Replace alloc_percpu() by a static DEFINE_PER_CPU() as tcp_md5sig_pool is small anyway, there is no gain to dynamically allocate it. Signed-off-by: Eric Dumazet <edumazet@google.com> Fixes: 765cf9976e93 ("tcp: md5: remove one indirection level in tcp_md5sig_pool") Reported-by: Crestez Dan Leonard <cdleonard@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net: fix saving TX flow hash in sock for outgoing connectionsSathya Perla2014-10-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The commit "net: Save TX flow hash in sock and set in skbuf on xmit" introduced the inet_set_txhash() and ip6_set_txhash() routines to calculate and record flow hash(sk_txhash) in the socket structure. sk_txhash is used to set skb->hash which is used to spread flows across multiple TXQs. But, the above routines are invoked before the source port of the connection is created. Because of this all outgoing connections that just differ in the source port get hashed into the same TXQ. This patch fixes this problem for IPv4/6 by invoking the the above routines after the source port is available for the socket. Fixes: b73c3d0e4("net: Save TX flow hash in sock and set in skbuf on xmit") Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | xfrm6: fix a potential use after free in xfrm6_policy.cLi RongQing2014-10-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pskb_may_pull() maybe change skb->data and make nh and exthdr pointer oboslete, so recompute the nd and exthdr Signed-off-by: Li RongQing <roy.qing.li@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net: tso: fix unaligned access to crafted TCP header in helper APIKarl Beldan2014-10-22
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The crafted header start address is from a driver supplied buffer, which one can reasonably expect to be aligned on a 4-bytes boundary. However ATM the TSO helper API is only used by ethernet drivers and the tcp header will then be aligned to a 2-bytes only boundary from the header start address. Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com> Cc: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: sched: initialize bstats syncpSabrina Dubroca2014-10-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | Use netdev_alloc_pcpu_stats to allocate percpu stats and initialize syncp. Fixes: 22e0f8b9322c "net: sched: make bstats per cpu and estimator RCU safe" Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Cong Wang <cwang@twopensource.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | netlink: Re-add locking to netlink_lookup() and seq walkerThomas Graf2014-10-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The synchronize_rcu() in netlink_release() introduces unacceptable latency. Reintroduce minimal lookup so we can drop the synchronize_rcu() until socket destruction has been RCUfied. Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Reported-by: Steinar H. Gunderson <sgunderson@bigfoot.com> Reported-and-tested-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tipc: fix lockdep warning when intra-node messages are deliveredYing Xue2014-10-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When running tipcTC&tipcTS test suite, below lockdep unsafe locking scenario is reported: [ 1109.997854] [ 1109.997988] ================================= [ 1109.998290] [ INFO: inconsistent lock state ] [ 1109.998575] 3.17.0-rc1+ #113 Not tainted [ 1109.998762] --------------------------------- [ 1109.998762] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 1109.998762] swapper/7/0 [HC0[0]:SC1[1]:HE1:SE0] takes: [ 1109.998762] (slock-AF_TIPC){+.?...}, at: [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] {SOFTIRQ-ON-W} state was registered at: [ 1109.998762] [<ffffffff810a4770>] __lock_acquire+0x6a0/0x1d80 [ 1109.998762] [<ffffffff810a6555>] lock_acquire+0x95/0x1e0 [ 1109.998762] [<ffffffff81a2d1ce>] _raw_spin_lock+0x3e/0x80 [ 1109.998762] [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] [<ffffffffa0004fe8>] tipc_link_xmit+0xa8/0xc0 [tipc] [ 1109.998762] [<ffffffffa000ec6f>] tipc_sendmsg+0x15f/0x550 [tipc] [ 1109.998762] [<ffffffffa000f165>] tipc_connect+0x105/0x140 [tipc] [ 1109.998762] [<ffffffff817676ee>] SYSC_connect+0xae/0xc0 [ 1109.998762] [<ffffffff81767b7e>] SyS_connect+0xe/0x10 [ 1109.998762] [<ffffffff817a9788>] compat_SyS_socketcall+0xb8/0x200 [ 1109.998762] [<ffffffff81a306e5>] sysenter_dispatch+0x7/0x1f [ 1109.998762] irq event stamp: 241060 [ 1109.998762] hardirqs last enabled at (241060): [<ffffffff8105a4ad>] __local_bh_enable_ip+0x6d/0xd0 [ 1109.998762] hardirqs last disabled at (241059): [<ffffffff8105a46f>] __local_bh_enable_ip+0x2f/0xd0 [ 1109.998762] softirqs last enabled at (241020): [<ffffffff81059a52>] _local_bh_enable+0x22/0x50 [ 1109.998762] softirqs last disabled at (241021): [<ffffffff8105a626>] irq_exit+0x96/0xc0 [ 1109.998762] [ 1109.998762] other info that might help us debug this: [ 1109.998762] Possible unsafe locking scenario: [ 1109.998762] [ 1109.998762] CPU0 [ 1109.998762] ---- [ 1109.998762] lock(slock-AF_TIPC); [ 1109.998762] <Interrupt> [ 1109.998762] lock(slock-AF_TIPC); [ 1109.998762] [ 1109.998762] *** DEADLOCK *** [ 1109.998762] [ 1109.998762] 2 locks held by swapper/7/0: [ 1109.998762] #0: (rcu_read_lock){......}, at: [<ffffffff81782dc9>] __netif_receive_skb_core+0x69/0xb70 [ 1109.998762] #1: (rcu_read_lock){......}, at: [<ffffffffa0001c90>] tipc_l2_rcv_msg+0x40/0x260 [tipc] [ 1109.998762] [ 1109.998762] stack backtrace: [ 1109.998762] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.17.0-rc1+ #113 [ 1109.998762] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007 [ 1109.998762] ffffffff82745830 ffff880016c03828 ffffffff81a209eb 0000000000000007 [ 1109.998762] ffff880017b3cac0 ffff880016c03888 ffffffff81a1c5ef 0000000000000001 [ 1109.998762] ffff880000000001 ffff880000000000 ffffffff81012d4f 0000000000000000 [ 1109.998762] Call Trace: [ 1109.998762] <IRQ> [<ffffffff81a209eb>] dump_stack+0x4e/0x68 [ 1109.998762] [<ffffffff81a1c5ef>] print_usage_bug+0x1f1/0x202 [ 1109.998762] [<ffffffff81012d4f>] ? save_stack_trace+0x2f/0x50 [ 1109.998762] [<ffffffff810a406c>] mark_lock+0x28c/0x2f0 [ 1109.998762] [<ffffffff810a3440>] ? print_irq_inversion_bug.part.46+0x1f0/0x1f0 [ 1109.998762] [<ffffffff810a467d>] __lock_acquire+0x5ad/0x1d80 [ 1109.998762] [<ffffffff810a70dd>] ? trace_hardirqs_on+0xd/0x10 [ 1109.998762] [<ffffffff8108ace8>] ? sched_clock_cpu+0x98/0xc0 [ 1109.998762] [<ffffffff8108ad2b>] ? local_clock+0x1b/0x30 [ 1109.998762] [<ffffffff810a10dc>] ? lock_release_holdtime.part.29+0x1c/0x1a0 [ 1109.998762] [<ffffffff8108aa05>] ? sched_clock_local+0x25/0x90 [ 1109.998762] [<ffffffffa000dec0>] ? tipc_sk_get+0x60/0x80 [tipc] [ 1109.998762] [<ffffffff810a6555>] lock_acquire+0x95/0x1e0 [ 1109.998762] [<ffffffffa0011969>] ? tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] [<ffffffff810a6fb6>] ? trace_hardirqs_on_caller+0xa6/0x1c0 [ 1109.998762] [<ffffffff81a2d1ce>] _raw_spin_lock+0x3e/0x80 [ 1109.998762] [<ffffffffa0011969>] ? tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] [<ffffffffa000dec0>] ? tipc_sk_get+0x60/0x80 [tipc] [ 1109.998762] [<ffffffffa0011969>] tipc_sk_rcv+0x49/0x2b0 [tipc] [ 1109.998762] [<ffffffffa00076bd>] tipc_rcv+0x5ed/0x960 [tipc] [ 1109.998762] [<ffffffffa0001d1c>] tipc_l2_rcv_msg+0xcc/0x260 [tipc] [ 1109.998762] [<ffffffffa0001c90>] ? tipc_l2_rcv_msg+0x40/0x260 [tipc] [ 1109.998762] [<ffffffff81783345>] __netif_receive_skb_core+0x5e5/0xb70 [ 1109.998762] [<ffffffff81782dc9>] ? __netif_receive_skb_core+0x69/0xb70 [ 1109.998762] [<ffffffff81784eb9>] ? dev_gro_receive+0x259/0x4e0 [ 1109.998762] [<ffffffff817838f6>] __netif_receive_skb+0x26/0x70 [ 1109.998762] [<ffffffff81783acd>] netif_receive_skb_internal+0x2d/0x1f0 [ 1109.998762] [<ffffffff81785518>] napi_gro_receive+0xd8/0x240 [ 1109.998762] [<ffffffff815bf854>] e1000_clean_rx_irq+0x2c4/0x530 [ 1109.998762] [<ffffffff815c1a46>] e1000_clean+0x266/0x9c0 [ 1109.998762] [<ffffffff8108ad2b>] ? local_clock+0x1b/0x30 [ 1109.998762] [<ffffffff8108aa05>] ? sched_clock_local+0x25/0x90 [ 1109.998762] [<ffffffff817842b1>] net_rx_action+0x141/0x310 [ 1109.998762] [<ffffffff810bd710>] ? handle_fasteoi_irq+0xe0/0x150 [ 1109.998762] [<ffffffff81059fa6>] __do_softirq+0x116/0x4d0 [ 1109.998762] [<ffffffff8105a626>] irq_exit+0x96/0xc0 [ 1109.998762] [<ffffffff81a30d07>] do_IRQ+0x67/0x110 [ 1109.998762] [<ffffffff81a2ee2f>] common_interrupt+0x6f/0x6f [ 1109.998762] <EOI> [<ffffffff8100d2b7>] ? default_idle+0x37/0x250 [ 1109.998762] [<ffffffff8100d2b5>] ? default_idle+0x35/0x250 [ 1109.998762] [<ffffffff8100dd1f>] arch_cpu_idle+0xf/0x20 [ 1109.998762] [<ffffffff810999fd>] cpu_startup_entry+0x27d/0x4d0 [ 1109.998762] [<ffffffff81034c78>] start_secondary+0x188/0x1f0 When intra-node messages are delivered from one process to another process, tipc_link_xmit() doesn't disable BH before it directly calls tipc_sk_rcv() on process context to forward messages to destination socket. Meanwhile, if messages delivered by remote node arrive at the node and their destinations are also the same socket, tipc_sk_rcv() running on process context might be preempted by tipc_sk_rcv() running BH context. As a result, the latter cannot obtain the socket lock as the lock was obtained by the former, however, the former has no chance to be run as the latter is owning the CPU now, so headlock happens. To avoid it, BH should be always disabled in tipc_sk_rcv(). Signed-off-by: Ying Xue <ying.xue@windriver.com> Reviewed-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tipc: fix a potential deadlockYing Xue2014-10-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Locking dependency detected below possible unsafe locking scenario: CPU0 CPU1 T0: tipc_named_rcv() tipc_rcv() T1: [grab nametble write lock]* [grab node lock]* T2: tipc_update_nametbl() tipc_node_link_up() T3: tipc_nodesub_subscribe() tipc_nametbl_publish() T4: [grab node lock]* [grab nametble write lock]* The opposite order of holding nametbl write lock and node lock on above two different paths may result in a deadlock. If we move the the updating of the name table after link state named out of node lock, the reverse order of holding locks will be eliminated, and as a result, the deadlock risk. Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>