aboutsummaryrefslogtreecommitdiffstats
path: root/net/ipv4
Commit message (Collapse)AuthorAge
* ipv4: Fix fib_trie rebalancingJarek Poplawski2009-06-15
| | | | | | | | | | | | | | | | | | While doing trie_rebalance(): resize(), inflate(), halve() RCU free tnodes before updating their parents. It depends on RCU delaying the real destruction, but if RCU readers start after call_rcu() and before parent update they could access freed memory. It is currently prevented with preempt_disable() on the update side, but it's not safe, except maybe classic RCU, plus it conflicts with memory allocations with GFP_KERNEL flag used from these functions. This patch explicitly delays freeing of tnodes by adding them to the list, which is flushed after the update is finished. Reported-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* PIM-SM: namespace changesTom Goff2009-06-14
| | | | | | | | | | | | | | | | IPv4: - make PIM register vifs netns local - set the netns when a PIM register vif is created - make PIM available in all network namespaces (if CONFIG_IP_PIMSM_V2) by adding the protocol handler when multicast routing is initialized IPv6: - make PIM register vifs netns local - make PIM available in all network namespaces (if CONFIG_IPV6_PIMSM_V2) by adding the protocol handler when multicast routing is initialized Signed-off-by: Tom Goff <thomas.goff@boeing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: update ARPD help textTimo Teräs2009-06-14
| | | | | | | | | | | | Removed the statements about ARP cache size as this config option does not affect it. The cache size is controlled by neigh_table gc thresholds. Remove also expiremental and obsolete markings as the API originally intended for arp caching is useful for implementing ARP-like protocols (e.g. NHRP) in user space and has been there for a long enough time. Signed-off-by: Timo Teras <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: use a deferred timer in rt_check_expireEric Dumazet2009-06-14
| | | | | | | | | | | | | | | | For the sake of power saver lovers, use a deferrable timer to fire rt_check_expire() As some big routers cache equilibrium depends on garbage collection done in time, we take into account elapsed time between two rt_check_expire() invocations to adjust the amount of slots we have to check. Based on an initial idea and patch from Tero Kristo Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Tero Kristo <tero.kristo@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netfilter: ip_tables: fix build errorPatrick McHardy2009-06-11
| | | | | | | | | | | | Fix build error introduced by commit bb70dfa5 (netfilter: xtables: consolidate comefrom debug cast access): net/ipv4/netfilter/ip_tables.c: In function 'ipt_do_table': net/ipv4/netfilter/ip_tables.c:421: error: 'comefrom' undeclared (first use in this function) net/ipv4/netfilter/ip_tables.c:421: error: (Each undeclared identifier is reported only once net/ipv4/netfilter/ip_tables.c:421: error: for each function it appears in.) Signed-off-by: Patrick McHardy <kaber@trash.net>
* Merge branch 'master' of ↵Patrick McHardy2009-06-11
|\ | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
| * net: No more expensive sock_hold()/sock_put() on each txEric Dumazet2009-06-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One of the problem with sock memory accounting is it uses a pair of sock_hold()/sock_put() for each transmitted packet. This slows down bidirectional flows because the receive path also needs to take a refcount on socket and might use a different cpu than transmit path or transmit completion path. So these two atomic operations also trigger cache line bounces. We can see this in tx or tx/rx workloads (media gateways for example), where sock_wfree() can be in top five functions in profiles. We use this sock_hold()/sock_put() so that sock freeing is delayed until all tx packets are completed. As we also update sk_wmem_alloc, we could offset sk_wmem_alloc by one unit at init time, until sk_free() is called. Once sk_free() is called, we atomic_dec_and_test(sk_wmem_alloc) to decrement initial offset and atomicaly check if any packets are in flight. skb_set_owner_w() doesnt call sock_hold() anymore sock_wfree() doesnt call sock_put() anymore, but check if sk_wmem_alloc reached 0 to perform the final freeing. Drawback is that a skb->truesize error could lead to unfreeable sockets, or even worse, prematurely calling __sk_free() on a live socket. Nice speedups on SMP. tbench for example, going from 2691 MB/s to 2711 MB/s on my 8 cpu dev machine, even if tbench was not really hitting sk_refcnt contention point. 5 % speedup on a UDP transmit workload (depends on number of flows), lowering TX completion cpu usage. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: Fix extra semi-colon in skb_walk_frags() changes.David S. Miller2009-06-09
| | | | | | | | | | | | Noticed by Jesper Dangaard Brouer Signed-off-by: David S. Miller <davem@davemloft.net>
| * netfilter: Use frag list abstraction interfaces.David S. Miller2009-06-09
| | | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipv4: Use frag list abstraction interfaces.David S. Miller2009-06-09
| | | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * ipv4: remove ip_mc_drop_socket() declaration from af_inet.c.Rami Rosen2009-06-04
| | | | | | | | | | | | | | | | | | ip_mc_drop_socket() method is declared in linux/igmp.h, which is included anyhow in af_inet.c. So there is no need for this declaration. This patch removes it from af_inet.c. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: skb->dst accessorsEric Dumazet2009-06-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Define three accessors to get/set dst attached to a skb struct dst_entry *skb_dst(const struct sk_buff *skb) void skb_dst_set(struct sk_buff *skb, struct dst_entry *dst) void skb_dst_drop(struct sk_buff *skb) This one should replace occurrences of : dst_release(skb->dst) skb->dst = NULL; Delete skb->dst field Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: skb->rtable accessorEric Dumazet2009-06-03
| | | | | | | | | | | | | | | | | | | | | | Define skb_rtable(const struct sk_buff *skb) accessor to get rtable from skb Delete skb->rtable field Setting rtable is not allowed, just set dst instead as rtable is an alias. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'master' of ↵David S. Miller2009-06-03
| |\ | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/forcedeth.c
| | * tcp: tcp_vegas ssthresh bugfixDoug Leith2009-05-26
| | | | | | | | | | | | | | | | | | | | | This patch fixes ssthresh accounting issues in tcp_vegas when cwnd decreases Signed-off-by: Doug Leith <doug.leith@nuim.ie> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv4: New multicast-all socket optionNivedita Singhvi2009-06-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After some discussion offline with Christoph Lameter and David Stevens regarding multicast behaviour in Linux, I'm submitting a slightly modified patch from the one Christoph submitted earlier. This patch provides a new socket option IP_MULTICAST_ALL. In this case, default behaviour is _unchanged_ from the current Linux standard. The socket option is set by default to provide original behaviour. Sockets wishing to receive data only from multicast groups they join explicitly will need to clear this socket option. Signed-off-by: Nivedita Singhvi <niv@us.ibm.com> Signed-off-by: Christoph Lameter<cl@linux.com> Acked-by: David Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: ipv4/ip_sockglue.c cleanupsEric Dumazet2009-06-02
| | | | | | | | | | | | | | | | | | | | | Pure cleanups Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tcp: fix loop in ofo handling code and reduce its complexityIlpo Järvinen2009-05-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Somewhat luckily, I was looking into these parts with very fine comb because I've made somewhat similar changes on the same area (conflicts that arose weren't that lucky though). The loop was very much overengineered recently in commit 915219441d566 (tcp: Use SKB queue and list helpers instead of doing it by-hand), while it basically just wants to know if there are skbs after 'skb'. Also it got broken because skb1 = skb->next got translated into skb1 = skb1->next (though abstracted) improperly. Note that 'skb1' is pointing to previous sk_buff than skb or NULL if at head. Two things went wrong: - We'll kfree 'skb' on the first iteration instead of the skbuff following 'skb' (it would require required SACK reneging to recover I think). - The list head case where 'skb1' is NULL is checked too early and the loop won't execute whereas it previously did. Conclusion, mostly revert the recent changes which makes the cset very messy looking but using proper accessor in the previous-like version. The effective changes against the original can be viewed with: git-diff 915219441d566f1da0caa0e262be49b666159e17^ \ net/ipv4/tcp_input.c | sed -n -e '57,70 p' Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: unset IFF_XMIT_DST_RELEASE in ipgre_tunnel_setup()Eric Dumazet2009-05-29
| | | | | | | | | | | | | | | | | | | | | | | | ipgre_tunnel_xmit() might need skb->dst, so tell dev_hard_start_xmit() to no release it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: unset IFF_XMIT_DST_RELEASE in ipip_tunnel_setup()Eric Dumazet2009-05-29
| | | | | | | | | | | | | | | | | | | | | | | | ipip_tunnel_xmit() might need skb->dst, so tell dev_hard_start_xmit() to no release it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | tcp: Use SKB queue and list helpers instead of doing it by-hand.David S. Miller2009-05-29
| | | | | | | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | netfilter: nf_ct_icmp: keep the ICMP ct entries longerJan Kasprzak2009-06-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current conntrack code kills the ICMP conntrack entry as soon as the first reply is received. This is incorrect, as we then see only the first ICMP echo reply out of several possible duplicates as ESTABLISHED, while the rest will be INVALID. Also this unnecessarily increases the conntrackd traffic on H-A firewalls. Make all the ICMP conntrack entries (including the replied ones) last for the default of nf_conntrack_icmp{,v6}_timeout seconds. Signed-off-by: Jan "Yenya" Kasprzak <kas@fi.muni.cz> Signed-off-by: Patrick McHardy <kaber@trash.net>
* | | netfilter: ipt_MASQUERADE: remove redundant rwlockFlorian Westphal2009-06-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The lock "protects" an assignment and a comparision of an integer. When the caller of device_cmp() evaluates the result, nat->masq_index may already have been changed (regardless if the lock is there or not). So, the lock either has to be held during nf_ct_iterate_cleanup(), or can be removed. This does the latter. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Patrick McHardy <kaber@trash.net>
* | | netfilter: x_tables: added hook number into match extension parameter structure.Evgeniy Polyakov2009-06-04
| | | | | | | | | | | | | | | Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net> Signed-off-by: Patrick McHardy <kaber@trash.net>
* | | netfilter: conntrack: simplify event caching systemPablo Neira Ayuso2009-06-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch simplifies the conntrack event caching system by removing several events: * IPCT_[*]_VOLATILE, IPCT_HELPINFO and IPCT_NATINFO has been deleted since the have no clients. * IPCT_COUNTER_FILLING which is a leftover of the 32-bits counter days. * IPCT_REFRESH which is not of any use since we always include the timeout in the messages. After this patch, the existing events are: * IPCT_NEW, IPCT_RELATED and IPCT_DESTROY, that are used to identify addition and deletion of entries. * IPCT_STATUS, that notes that the status bits have changes, eg. IPS_SEEN_REPLY and IPS_ASSURED. * IPCT_PROTOINFO, that reports that internal protocol information has changed, eg. the TCP, DCCP and SCTP protocol state. * IPCT_HELPER, that a helper has been assigned or unassigned to this entry. * IPCT_MARK and IPCT_SECMARK, that reports that the mark has changed, this covers the case when a mark is set to zero. * IPCT_NATSEQADJ, to report that there's updates in the NAT sequence adjustment. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | | Merge branch 'master' of git://dev.medozas.de/linuxPatrick McHardy2009-06-02
|\ \ \ | |/ / |/| |
| * | netfilter: xtables: consolidate comefrom debug cast accessJan Engelhardt2009-05-08
| | | | | | | | | | | | Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
| * | netfilter: xtables: remove another level of indentJan Engelhardt2009-05-08
| | | | | | | | | | | | Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
| * | netfilter: xtables: remove some gotoJan Engelhardt2009-05-08
| | | | | | | | | | | | | | | | | | Combining two ifs, and goto is easily gone. Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
| * | netfilter: xtables: reduce indent level by oneJan Engelhardt2009-05-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | Cosmetic only. Transformation applied: -if (foo) { long block; } else { short block; } +if (!foo) { short block; continue; } long block; Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
| * | netfilter: xtables: consolidate open-coded logicJan Engelhardt2009-05-08
| | | | | | | | | | | | Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
| * | netfilter: xtables: fix const inconsistencyJan Engelhardt2009-05-08
| | | | | | | | | | | | Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
| * | netfilter: xtables: remove redundant castsJan Engelhardt2009-05-08
| | | | | | | | | | | | Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
| * | netfilter: xtables: use NFPROTO_ in standard targetsJan Engelhardt2009-05-08
| | | | | | | | | | | | Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
| * | netfilter: queue: use NFPROTO_ for queue callsitesJan Engelhardt2009-05-08
| | | | | | | | | | | | | | | | | | af is an nfproto. Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
| * | netfilter: xtables: use NFPROTO_ for xt_proto_init callsitesJan Engelhardt2009-05-08
| | | | | | | | | | | | Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
* | | tcp: Do not check flush when comparing options for GROHerbert Xu2009-05-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | There is no need to repeatedly check flush when comparing TCP options for GRO as it will be false 99% of the time where it matters. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: Use 32-bit loads for ID and length in GROHerbert Xu2009-05-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | This patch optimises the IPv4 GRO code by using 32-bit loads (instead of 16-bit ones) on the ID and length checks in the receive function. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | gro: Avoid unnecessary comparison after skb_gro_headerHerbert Xu2009-05-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For the overwhelming majority of cases, skb_gro_header's return value cannot be NULL. Yet we must check it because of its current form. This patch splits it up into multiple functions in order to avoid this. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tcp: Optimise len/mss comparisonHerbert Xu2009-05-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of checking len > mss || len == 0, we can accomplish both by checking (len - 1) > mss using the unsigned wraparound. At nearly a million times a second, this might just help. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tcp: Remove unnecessary window comparisons for GROHerbert Xu2009-05-27
| | | | | | | | | | | | | | | | | | | | | | | | The window has already been checked as part of the flag word so there is no need to check it explicitly. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tcp: Optimise GRO port comparisonsHerbert Xu2009-05-27
| | | | | | | | | | | | | | | | | | | | | | | | Instead of doing two 16-bit operations for the source/destination ports, we can do one 32-bit operation to take care both. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge branch 'master' of ↵David S. Miller2009-05-25
|\ \ \ | | |/ | |/| | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/ath/ath5k/phy.c drivers/net/wireless/iwlwifi/iwl-agn.c drivers/net/wireless/iwlwifi/iwl3945-base.c
| * | ipv4: Fix oops with FIB_TRIERobert Olsson2009-05-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It seems we can fix this by disabling preemption while we re-balance the trie. This is with the CONFIG_CLASSIC_RCU. It's been stress-tested at high loads continuesly taking a full BGP table up/down via iproute -batch. Note. fib_trie is not updated for CONFIG_PREEMPT_RCU Reported-by: Andrei Popa Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: fix rtable leak in net/ipv4/route.cEric Dumazet2009-05-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Alexander V. Lukyanov found a regression in 2.6.29 and made a complete analysis found in http://bugzilla.kernel.org/show_bug.cgi?id=13339 Quoted here because its a perfect one : begin_of_quotation 2.6.29 patch has introduced flexible route cache rebuilding. Unfortunately the patch has at least one critical flaw, and another problem. rt_intern_hash calculates rthi pointer, which is later used for new entry insertion. The same loop calculates cand pointer which is used to clean the list. If the pointers are the same, rtable leak occurs, as first the cand is removed then the new entry is appended to it. This leak leads to unregister_netdevice problem (usage count > 0). Another problem of the patch is that it tries to insert the entries in certain order, to facilitate counting of entries distinct by all but QoS parameters. Unfortunately, referencing an existing rtable entry moves it to list beginning, to speed up further lookups, so the carefully built order is destroyed. For the first problem the simplest patch it to set rthi=0 when rthi==cand, but it will also destroy the ordering. end_of_quotation Problematic commit is 1080d709fb9d8cd4392f93476ee46a9d6ea05a5b (net: implement emergency route cache rebulds when gc_elasticity is exceeded) Trying to keep dst_entries ordered is too complex and breaks the fact that order should depend on the frequency of use for garbage collection. A possible fix is to make rt_intern_hash() simpler, and only makes rt_check_expire() a litle bit smarter, being able to cope with an arbitrary entries order. The added loop is running on cache hot data, while cpu is prefetching next object, so should be unnoticied. Reported-and-analyzed-by: Alexander V. Lukyanov <lav@yar.ru> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: fix length computation in rt_check_expire()Eric Dumazet2009-05-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rt_check_expire() computes average and standard deviation of chain lengths, but not correclty reset length to 0 at beginning of each chain. This probably gives overflows for sum2 (and sum) on loaded machines instead of meaningful results. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | ipv4: make default for INET_LRO consistent with help textFrans Pop2009-05-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | Commit e81963b1 ("ipv4: Make INET_LRO a bool instead of tristate.") changed this config from tristate to bool. Add default so that it is consistent with the help text. Signed-off-by: Frans Pop <elendil@planet.nl> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: Remove unused parameter from fill method in fib_rules_ops.Rami Rosen2009-05-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The netlink message header (struct nlmsghdr) is an unused parameter in fill method of fib_rules_ops struct. This patch removes this parameter from this method and fixes the places where this method is called. (include/net/fib_rules.h) Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: teach ipconfig about the MTU option in DHCPChris Friesen2009-05-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The DHCP spec allows the server to specify the MTU. This can be useful for netbooting with UDP-based NFS-root on a network using jumbo frames. This patch allows the kernel IP autoconfiguration to handle this option correctly. It would be possible to use initramfs and add a script to set the MTU, but that seems like a complicated solution if no initramfs is otherwise necessary, and would bloat the kernel image more than this code would. This patch was originally submitted to LKML in 2003 by Hans-Peter Jansen. Signed-off-by: Chris Friesen <cfriesen@nortel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: Fix devinet_sysctl_forwardEric W. Biederman2009-05-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sysctls are unregistered with the rntl_lock held making it unsafe to unconditionally grab the the rtnl_lock. Instead we need to call rtnl_trylock and restart the system call if we can not grab it. Otherwise we could deadlock at unregistration time. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>