aboutsummaryrefslogtreecommitdiffstats
path: root/net/ipv6
Commit message (Collapse)AuthorAge
...
* | | | | | tcp: do not set queue_mapping on SYNACKEric Dumazet2015-10-19
| |/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | At the time of commit fff326990789 ("tcp: reflect SYN queue_mapping into SYNACK packets") we had little ways to cope with SYN floods. We no longer need to reflect incoming skb queue mappings, and instead can pick a TX queue based on cpu cooking the SYNACK, with normal XPS affinities. Note that all SYNACK retransmits were picking TX queue 0, this no longer is a win given that SYNACK rtx are now distributed on all cpus. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | tcp/dccp: add inet_csk_reqsk_queue_drop_and_put() helperEric Dumazet2015-10-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let's reduce the confusion about inet_csk_reqsk_queue_drop() : In many cases we also need to release reference on request socket, so add a helper to do this, reducing code size and complexity. Fixes: 4bdc3d66147b ("tcp/dccp: fix behavior of stale SYN_RECV request sockets") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | tcp/dccp: fix behavior of stale SYN_RECV request socketsEric Dumazet2015-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a TCP/DCCP listener is closed, its pending SYN_RECV request sockets become stale, meaning 3WHS can not complete. But current behavior is wrong : incoming packets finding such stale sockets are dropped. We need instead to cleanup the request socket and perform another lookup : - Incoming ACK will give a RST answer, - SYN rtx might find another listener if available. - We expedite cleanup of request sockets and old listener socket. Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | net: Add VRF support to IPv6 stackDavid Ahern2015-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As with IPv4 support for VRFs added to IPv6 stack by replacing hardcoded table ids with possibly device specific ones and manipulating the oif in the flowi6. The flow flags are used to skip oif compare in nexthop lookups if the device is enslaved to a VRF via the L3 master device. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | net: Export fib6_get_table and nd_tblDavid Ahern2015-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | ipv6 route: use err pointers instead of returning pointer by referenceRoopa Prabhu2015-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes ip6_route_info_create return err pointer instead of returning the rt pointer by reference as suggested by Dave Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | ipv6: Pass struct net into nf_ct_frag6_gatherEric W. Biederman2015-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function nf_ct_frag6_gather is called on both the input and the output paths of the networking stack. In particular ipv6_defrag which calls nf_ct_frag6_gather is called from both the the PRE_ROUTING chain on input and the LOCAL_OUT chain on output. The addition of a net parameter makes it explicit which network namespace the packets are being reassembled in, and removes the need for nf_ct_frag6_gather to guess. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | net: shrink struct sock and request_sock by 8 bytesEric Dumazet2015-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One 32bit hole is following skc_refcnt, use it. skc_incoming_cpu can also be an union for request_sock rcv_wnd. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | net: SO_INCOMING_CPU setsockopt() supportEric Dumazet2015-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SO_INCOMING_CPU as added in commit 2c8c56e15df3 was a getsockopt() command to fetch incoming cpu handling a particular TCP flow after accept() This commits adds setsockopt() support and extends SO_REUSEPORT selection logic : If a TCP listener or UDP socket has this option set, a packet is delivered to this socket only if CPU handling the packet matches the specified one. This allows to build very efficient TCP servers, using one listener per RX queue, as the associated TCP listener should only accept flows handled in softirq by the same cpu. This provides optimal NUMA behavior and keep cpu caches hot. Note that __inet_lookup_listener() still has to iterate over the list of all listeners. Following patch puts sk_refcnt in a different cache line to let this iteration hit only shared and read mostly cache lines. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | dst: Pass net into dst->outputEric W. Biederman2015-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The network namespace is already passed into dst_output pass it into dst->output lwt->output and friends. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | ipv4, ipv6: Pass net into ip_local_out and ip6_local_outEric W. Biederman2015-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | ipv4, ipv6: Pass net into __ip_local_out and __ip6_local_outEric W. Biederman2015-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | ipv6: Merge ip6_local_out and ip6_local_out_skEric W. Biederman2015-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stop hidding the sk parameter with an inline helper function and make all of the callers pass it, so that it is clear what the function is doing. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | ipv6: Merge __ip6_local_out and __ip6_local_out_skEric W. Biederman2015-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Only __ip6_local_out_sk has callers so rename __ip6_local_out_sk __ip6_local_out and remove the previous __ip6_local_out. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | dst: Pass a sk into .local_outEric W. Biederman2015-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For consistency with the other similar methods in the kernel pass a struct sock into the dst_ops .local_out method. Simplifying the socket passing case is needed a prequel to passing a struct net reference into .local_out. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | net: Pass net into dst_output and remove dst_output_okfnEric W. Biederman2015-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace dst_output_okfn with dst_output Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | net: Fix vti use case with oif in dst lookups for IPv6David Ahern2015-10-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It occurred to me yesterday that 741a11d9e4103 ("net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is set") means that xfrm6_dst_lookup needs the FLOWI_FLAG_SKIP_NH_OIF flag set. This latest commit causes the oif to be considered in lookups which is known to break vti. This explains why 58189ca7b274 did not the IPv6 change at the time it was submitted. Fixes: 42a7b32b73d6 ("xfrm: Add oif to dst lookups") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | Merge branch 'master' of ↵David S. Miller2015-10-05
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/net-next Eric W. Biederman says: ==================== net: Pass net through ip fragmention This is the next installment of my work to pass struct net through the output path so the code does not need to guess how to figure out which network namespace it is in, and ultimately routes can have output devices in another network namespace. This round focuses on passing net through ip fragmentation which we seem to call from about everywhere. That is the main ip output paths, the bridge netfilter code, and openvswitch. This has to happend at once accross the tree as function pointers are involved. First some prep work is done, then ipv4 and ipv6 are converted and then temporary helper functions are removed. ==================== Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | ipv6: Add missing newline to __xfrm6_output_finishEric W. Biederman2015-10-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a newline between variable declarations and the code. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
| * | | | | ipv6: Pass struct net through ip6_fragmentEric W. Biederman2015-09-30
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
* | | | | | ipv6: use ktime_t for internal timestampsArnd Bergmann2015-10-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ipv6 mip6 implementation is one of only a few users of the skb_get_timestamp() function in the kernel, which is both unsafe on 32-bit architectures because of the 2038 overflow, and slightly less efficient than the skb_get_ktime() based approach. This converts the function call and the mip6_report_rate_limiter structure that stores the time stamp, eliminating all uses of timeval in the ipv6 code. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | tcp: avoid two atomic ops for syncookiesEric Dumazet2015-10-05
| |/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | inet_reqsk_alloc() is used to allocate a temporary request in order to generate a SYNACK with a cookie. Then later, syncookie validation also uses a temporary request. These paths already took a reference on listener refcount, we can avoid a couple of atomic operations. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | tcp: do not lock listener to process SYN packetsEric Dumazet2015-10-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Everything should now be ready to finally allow SYN packets processing without holding listener lock. Tested: 3.5 Mpps SYNFLOOD. Plenty of cpu cycles available. Next bottleneck is the refcount taken on listener, that could be avoided if we remove SLAB_DESTROY_BY_RCU strict semantic for listeners, and use regular RCU. 13.18% [kernel] [k] __inet_lookup_listener 9.61% [kernel] [k] tcp_conn_request 8.16% [kernel] [k] sha_transform 5.30% [kernel] [k] inet_reqsk_alloc 4.22% [kernel] [k] sock_put 3.74% [kernel] [k] tcp_make_synack 2.88% [kernel] [k] ipt_do_table 2.56% [kernel] [k] memcpy_erms 2.53% [kernel] [k] sock_wfree 2.40% [kernel] [k] tcp_v4_rcv 2.08% [kernel] [k] fib_table_lookup 1.84% [kernel] [k] tcp_openreq_init_rwin Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | tcp: attach SYNACK messages to request sockets instead of listenerEric Dumazet2015-10-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a listen backlog is very big (to avoid syncookies), then the listener sk->sk_wmem_alloc is the main source of false sharing, as we need to touch it twice per SYNACK re-transmit and TX completion. (One SYN packet takes listener lock once, but up to 6 SYNACK are generated) By attaching the skb to the request socket, we remove this source of contention. Tested: listen(fd, 10485760); // single listener (no SO_REUSEPORT) 16 RX/TX queue NIC Sustain a SYNFLOOD attack of ~320,000 SYN per second, Sending ~1,400,000 SYNACK per second. Perf profiles now show listener spinlock being next bottleneck. 20.29% [kernel] [k] queued_spin_lock_slowpath 10.06% [kernel] [k] __inet_lookup_established 5.12% [kernel] [k] reqsk_timer_handler 3.22% [kernel] [k] get_next_timer_interrupt 3.00% [kernel] [k] tcp_make_synack 2.77% [kernel] [k] ipt_do_table 2.70% [kernel] [k] run_timer_softirq 2.50% [kernel] [k] ip_finish_output 2.04% [kernel] [k] cascade Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | tcp/dccp: install syn_recv requests into ehash tableEric Dumazet2015-10-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In this patch, we insert request sockets into TCP/DCCP regular ehash table (where ESTABLISHED and TIMEWAIT sockets are) instead of using the per listener hash table. ACK packets find SYN_RECV pseudo sockets without having to find and lock the listener. In nominal conditions, this halves pressure on listener lock. Note that this will allow for SO_REUSEPORT refinements, so that we can select a listener using cpu/numa affinities instead of the prior 'consistent hash', since only SYN packets will apply this selection logic. We will shrink listen_sock in the following patch to ease code review. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Ying Cai <ycai@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | tcp/dccp: remove inet_csk_reqsk_queue_added() timeout argumentEric Dumazet2015-10-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is no longer used. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | tcp: get_openreq[46]() changesEric Dumazet2015-10-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When request sockets are no longer in a per listener hash table but on regular TCP ehash, we need to access listener uid through req->rsk_listener get_openreq6() also gets a const for its request socket argument. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | tcp: cleanup tcp_v[46]_inbound_md5_hash()Eric Dumazet2015-10-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We'll soon have to call tcp_v[46]_inbound_md5_hash() twice. Also add const attribute to the socket, as it might be the unlocked listener for SYN packets. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | tcp: call sk_mark_napi_id() on the child, not the listenerEric Dumazet2015-10-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a typo : We want to store the NAPI id on child socket. Presumably nobody really uses busy polling, on short lived flows. Fixes: 3d97379a67486 ("tcp: move sk_mark_napi_id() at the right place") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2015-10-02
|\ \ \ \ \ | |/ / / / |/| / / / | |/ / / | | | | | | | | | | | | | | | | | | | | Conflicts: net/dsa/slave.c net/dsa/slave.c simply had overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: ipv6: Add RT6_LOOKUP_F_IFACE flag if oif is setDavid Ahern2015-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Wolfgang reported that IPv6 stack is ignoring oif in output route lookups: With ipv6, ip -6 route get always returns the specific route. $ ip -6 r 2001:db8:e2::1 dev enp2s0 proto kernel metric 256 2001:db8:e2::/64 dev enp2s0 metric 1024 2001:db8:e3::1 dev enp3s0 proto kernel metric 256 2001:db8:e3::/64 dev enp3s0 metric 1024 fe80::/64 dev enp3s0 proto kernel metric 256 default via 2001:db8:e3::255 dev enp3s0 metric 1024 $ ip -6 r get 2001:db8:e2::100 2001:db8:e2::100 from :: dev enp2s0 src 2001:db8:e3::1 metric 0 cache $ ip -6 r get 2001:db8:e2::100 oif enp3s0 2001:db8:e2::100 from :: dev enp2s0 src 2001:db8:e3::1 metric 0 cache The stack does consider the oif but a mismatch in rt6_device_match is not considered fatal because RT6_LOOKUP_F_IFACE is not set in the flags. Cc: Wolfgang Nothdurft <netdev@linux-dude.de> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller2015-09-30
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following pull request contains Netfilter/IPVS updates for net-next containing 90 patches from Eric Biederman. The main goal of this batch is to avoid recurrent lookups for the netns pointer, that happens over and over again in our Netfilter/IPVS code. The idea consists of passing netns pointer from the hook state to the relevant functions and objects where this may be needed. You can find more information on the IPVS updates from Simon Horman's commit merge message: c3456026adc0 ("Merge tag 'ipvs2-for-v4.4' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs-next"). Exceptionally, this time, I'm not posting the patches again on netdev, Eric already Cc'ed this mailing list in the original submission. If you need me to make, just let me know. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | ipv6: Pass struct net into ip6_route_me_harderEric W. Biederman2015-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't make ip6_route_me_harder guess which network namespace it is routing in, pass the network namespace in. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | | | netfilter: Push struct net down into nf_afinfo.rerouteEric W. Biederman2015-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The network namespace is needed when routing a packet. Stop making nf_afinfo.reroute guess which network namespace is the proper namespace to route the packet in. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | | | | tcp: fix tcp_v6_md5_do_lookup prototypeEric Dumazet2015-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tcp_v6_md5_do_lookup() now takes a const socket, even if CONFIG_TCP_MD5SIG is not set. Fixes: b83e3deb974c ("tcp: md5: constify tcp_md5_do_lookup() socket argument") From: Eric Dumazet <edumazet@google.com> Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | net: Replace vrf_master_ifindex{, _rcu} with l3mdev equivalentsDavid Ahern2015-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace calls to vrf_master_ifindex_rcu and vrf_master_ifindex with either l3mdev_master_ifindex_rcu or l3mdev_master_ifindex. The pattern: oif = vrf_master_ifindex(dev) ? : dev->ifindex; is replaced with oif = l3mdev_fib_oif(dev); And remove the now unused vrf macros. Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | tcp: prepare fastopen code for upcoming listener changesEric Dumazet2015-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While auditing TCP stack for upcoming 'lockless' listener changes, I found I had to change fastopen_init_queue() to properly init the object before publishing it. Otherwise an other cpu could try to lock the spinlock before it gets properly initialized. Instead of adding appropriate barriers, just remove dynamic memory allocations : - Structure is 28 bytes on 64bit arches. Using additional 8 bytes for holding a pointer seems overkill. - Two listeners can share same cache line and performance would suffer. If we really want to save few bytes, we would instead dynamically allocate whole struct request_sock_queue in the future. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | tcp: constify tcp_v{4|6}_route_req() sock argumentEric Dumazet2015-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These functions do not change the listener socket. Goal is to make sure tcp_conn_request() is not messing with listener in a racy way. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | tcp: cookie_init_sequence() cleanupsEric Dumazet2015-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some common IPv4/IPv6 code can be factorized. Also constify cookie_init_sequence() socket argument. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | tcp/dccp: constify syn_recv_sock() method sock argumentEric Dumazet2015-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We'll soon no longer hold listener socket lock, these functions do not modify the socket in any way. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | dccp: use inet6_csk_route_req() helperEric Dumazet2015-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before changing dccp_v6_request_recv_sock() sock argument to const, we need to get rid of security_sk_classify_flow(), and it seems doable by reusing inet6_csk_route_req() helper. We need to add a proto parameter to inet6_csk_route_req(), not assume it is TCP. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | tcp: remove tcp_rcv_state_process() tcp_hdr argumentEric Dumazet2015-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Factorize code to get tcp header from skb. It makes no sense to duplicate code in callers. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | tcp: remove unused len argument from tcp_rcv_state_process()Eric Dumazet2015-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Once we realize tcp_rcv_synsent_state_process() does not use its 'len' argument and we get rid of it, then it becomes clear this argument is no longer used in tcp_rcv_state_process() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | tcp/dccp: constify send_synack and send_reset socket argumentEric Dumazet2015-09-29
| |_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | None of these functions need to change the socket, make it const. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | net: Remove redundant oif checks in rt6_device_matchDavid Ahern2015-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The oif has already been checked that it is non-zero; the 2 additional checks on oif within that if (oif) {...} block are redundant. CC: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2015-09-26
|\ \ \ \ | | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: net/ipv4/arp.c The net/ipv4/arp.c conflict was one commit adding a new local variable while another commit was deleting one. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | ip6_tunnel: Reduce log level in ip6_tnl_err() to debugMatt Bennett2015-09-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently error log messages in ip6_tnl_err are printed at 'warn' level. This is different to other tunnel types which don't print any messages. These log messages don't provide any information that couldn't be deduced with networking tools. Also it can be annoying to have one end of the tunnel go down and have the logs fill with pointless messages such as "Path to destination invalid or inactive!". This patch reduces the log level of these messages to 'dbg' level to bring the visible behaviour into line with other tunnel types. Signed-off-by: Matt Bennett <matt.bennett@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | ip6_gre: Reduce log level in ip6gre_err() to debugMatt Bennett2015-09-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently error log messages in ip6gre_err are printed at 'warn' level. This is different to most other tunnel types which don't print any messages. These log messages don't provide any information that couldn't be deduced with networking tools. Also it can be annoying to have one end of the tunnel go down and have the logs fill with pointless messages such as "Path to destination invalid or inactive!". This patch reduces the log level of these messages to 'dbg' level to bring the visible behaviour into line with other tunnel types. Signed-off-by: Matt Bennett <matt.bennett@alliedtelesis.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: Fix behaviour of unreachable, blackhole and prohibit routesNikola Forró2015-09-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Man page of ip-route(8) says following about route types: unreachable - these destinations are unreachable. Packets are dis‐ carded and the ICMP message host unreachable is generated. The local senders get an EHOSTUNREACH error. blackhole - these destinations are unreachable. Packets are dis‐ carded silently. The local senders get an EINVAL error. prohibit - these destinations are unreachable. Packets are discarded and the ICMP message communication administratively prohibited is generated. The local senders get an EACCES error. In the inet6 address family, this was correct, except the local senders got ENETUNREACH error instead of EHOSTUNREACH in case of unreachable route. In the inet address family, all three route types generated ICMP message net unreachable, and the local senders got ENETUNREACH error. In both address families all three route types now behave consistently with documentation. Signed-off-by: Nikola Forró <nforro@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | ipv6: ip6_fragment: fix headroom tests and skb leakFlorian Westphal2015-09-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | David Woodhouse reports skb_under_panic when we try to push ethernet header to fragmented ipv6 skbs: skbuff: skb_under_panic: text:c1277f1e len:1294 put:14 head:dec98000 data:dec97ffc tail:0xdec9850a end:0xdec98f40 dev:br-lan [..] ip6_finish_output2+0x196/0x4da David further debugged this: [..] offending fragments were arriving here with skb_headroom(skb)==10. Which is reasonable, being the Solos ADSL card's header of 8 bytes followed by 2 bytes of PPP frame type. The problem is that if netfilter ipv6 defragmentation is used, skb_cow() in ip6_forward will only see reassembled skb. Therefore, headroom is overestimated by 8 bytes (we pulled fragment header) and we don't check the skbs in the frag_list either. We can't do these checks in netfilter defrag since outdev isn't known yet. Furthermore, existing tests in ip6_fragment did not consider the fragment or ipv6 header size when checking headroom of the fraglist skbs. While at it, also fix a skb leak on memory allocation -- ip6_fragment must consume the skb. I tested this e1000 driver hacked to not allocate additional headroom (we end up in slowpath, since LL_RESERVED_SPACE is 16). If 2 bytes of headroom are allocated, fastpath is taken (14 byte ethernet header was pulled, so 16 byte headroom available in all fragments). Reported-by: David Woodhouse <dwmw2@infradead.org> Diagnosed-by: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Florian Westphal <fw@strlen.de> Tested-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>