aboutsummaryrefslogtreecommitdiffstats
path: root/net/ipv4
Commit message (Collapse)AuthorAge
...
* | ipv4: Kill dst_copy_metrics() call from ipv4_blackhole_route().David S. Miller2012-07-11
| | | | | | | | | | | | | | | | Blackhole routes have a COW metrics operation that returns NULL always, therefore this dst_copy_metrics() call did absolutely nothing. Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv4: Enforce max MTU metric at route insertion time.David S. Miller2012-07-11
| | | | | | | | | | | | Rather than at every struct rtable creation. Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv4: Maintain redirect and PMTU info in struct rtable again.David S. Miller2012-07-11
| | | | | | | | | | | | | | Maintaining this in the inetpeer entries was not the right way to do this at all. Signed-off-by: David S. Miller <davem@davemloft.net>
* | rtnetlink: Remove ts/tsage args to rtnl_put_cacheinfo().David S. Miller2012-07-11
| | | | | | | | | | | | Nobody provides non-zero values any longer. Signed-off-by: David S. Miller <davem@davemloft.net>
* | inet: Kill FLOWI_FLAG_PRECOW_METRICS.David S. Miller2012-07-11
| | | | | | | | | | | | | | | | No longer needed. TCP writes metrics, but now in it's own special cache that does not dirty the route metrics. Therefore there is no longer any reason to pre-cow metrics in this way. Signed-off-by: David S. Miller <davem@davemloft.net>
* | inet: Minimize use of cached route inetpeer.David S. Miller2012-07-11
| | | | | | | | | | | | | | | | | | | | | | | | Only use it in the absolutely required cases: 1) COW'ing metrics 2) ipv4 PMTU 3) ipv4 redirects Signed-off-by: David S. Miller <davem@davemloft.net>
* | inet: Remove ->get_peer() method.David S. Miller2012-07-11
| | | | | | | | | | | | No longer used. Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: Remove tw->tw_peerDavid S. Miller2012-07-11
| | | | | | | | | | | | No longer used. Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: Move timestamps from inetpeer to metrics cache.David S. Miller2012-07-11
| | | | | | | | | | | | With help from Lin Ming. Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Don't report route RTT metric value in cache dumps.David S. Miller2012-07-11
| | | | | | | | | | | | | | We don't maintain it dynamically any longer, so reporting it would be extremely misleading. Report zero instead. Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: Maintain dynamic metrics in local cache.David S. Miller2012-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Maintain a local hash table of TCP dynamic metrics blobs. Computed TCP metrics are no longer maintained in the route metrics. The table uses RCU and an extremely simple hash so that it has low latency and low overhead. A simple hash is legitimate because we only make metrics blobs for fully established connections. Some tweaking of the default hash table sizes, metric timeouts, and the hash chain length limit certainly could use some tweaking. But the basic design seems sound. With help from Eric Dumazet and Joe Perches. Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: Abstract back handling peer aliveness test into helper function.David S. Miller2012-07-10
| | | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: Move dynamnic metrics handling into seperate file.David S. Miller2012-07-10
| | | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'master' of git://1984.lsi.us.es/nf-nextDavid S. Miller2012-07-07
|\ \
| * | netfilter: nf_conntrack: generalize nf_ct_l4proto_netPablo Neira Ayuso2012-07-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch generalizes nf_ct_l4proto_net by splitting it into chunks and moving the corresponding protocol part to where it really belongs to. To clarify, note that we follow two different approaches to support per-net depending if it's built-in or run-time loadable protocol tracker. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Gao feng <gaofeng@cn.fujitsu.com>
| * | netfilter: nf_ct_icmp: add icmp_kmemdup[_compat]_sysctl_table functionGao feng2012-06-27
| | | | | | | | | | | | | | | | | | | | | | | | Split sysctl function into smaller chucks to cleanup code and prepare patches to reduce ifdef pollution. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| * | netfilter: nf_conntrack: prepare l4proto->init_net cleanupGao feng2012-06-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | l4proto->init contain quite redundant code. We can simplify this by adding a new parameter l3proto. This patch prepares that code simplification. Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | | ipv4: Avoid overhead when no custom FIB rules are installed.David S. Miller2012-07-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the user hasn't actually installed any custom rules, or fiddled with the default ones, don't go through the whole FIB rules layer. It's just pure overhead. Instead do what we do with CONFIG_IP_MULTIPLE_TABLES disabled, check the individual tables by hand, one by one. Also, move fib_num_tclassid_users into the ipv4 network namespace. Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: defer fib_compute_spec_dst() callEric Dumazet2012-07-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ip_options_compile() can avoid calling fib_compute_spec_dst() by default, and perform the call only if needed. David suggested to add a helper to make the call only once. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: No need to set generic neighbour pointer.David S. Miller2012-07-05
| | | | | | | | | | | | | | | | | | Nobody reads it any longer. Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: Add optional SKB arg to dst_ops->neigh_lookup().David S. Miller2012-07-05
| | | | | | | | | | | | | | | | | | | | | Causes the handler to use the daddr in the ipv4/ipv6 header when the route gateway is unspecified (local subnet). Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: Do delayed neigh confirmation.David S. Miller2012-07-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a dst_confirm() happens, mark the confirmation as pending in the dst. Then on the next packet out, when we have the neigh in-hand, do the update. This removes the dependency in dst_confirm() of dst's having an attached neigh. While we're here, remove the explicit 'dst' NULL check, all except 2 or 3 call sites ensure it's not NULL. So just fix those cases up. Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: Don't report neigh uptodate state in rtcache procfs.David S. Miller2012-07-05
| | | | | | | | | | | | | | | | | | | | | | | | Soon routes will not have a cached neigh attached, nor will we be able to necessarily go directly to a neigh from an arbitrary route. Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: Make neigh lookups directly in output packet path.David S. Miller2012-07-05
| | | | | | | | | | | | | | | | | | Do not use the dst cached neigh, we'll be getting rid of that. Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: Fix crashes in ip_options_compile().David S. Miller2012-07-04
| | | | | | | | | | | | | | | | | | | | | | | | The spec_dst uses should be guarded by skb_rtable() being non-NULL not just the SKB being non-null. Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | netlink: add netlink_kernel_cfg parameter to netlink_kernel_createPablo Neira Ayuso2012-06-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the following structure: struct netlink_kernel_cfg { unsigned int groups; void (*input)(struct sk_buff *skb); struct mutex *cb_mutex; }; That can be passed to netlink_kernel_create to set optional configurations for netlink kernel sockets. I've populated this structure by looking for NULL and zero parameters at the existing code. The remaining parameters that always need to be set are still left in the original interface. That includes optional parameters for the netlink socket creation. This allows easy extensibility of this interface in the future. This patch also adapts all callers to use this new interface. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: Elide fib_validate_source() completely when possible.David S. Miller2012-06-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If rpfilter is off (or the SKB has an IPSEC path) and there are not tclassid users, we don't have to do anything at all when fib_validate_source() is invoked besides setting the itag to zero. We monitor tclassid uses with a counter (modified only under RTNL and marked __read_mostly) and we protect the fib_validate_source() real work with a test against this counter and whether rpfilter is to be done. Having a way to know whether we need no tclassid processing or not also opens the door for future optimized rpfilter algorithms that do not perform full FIB lookups. Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: Remove extraneous assignment of dst->tclassid.David S. Miller2012-06-29
| | | | | | | | | | | | | | | | | | We already set it several lines above. Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: Adjust in_dev handling in fib_validate_source()David S. Miller2012-06-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | Checking for in_dev being NULL is pointless. In fact, all of our callers have in_dev precomputed already, so just pass it in and remove the NULL checking. Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: Fix bugs in fib_compute_spec_dst().David S. Miller2012-06-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Based upon feedback from Julian Anastasov. 1) Use route flags to determine multicast/broadcast, not the packet flags. 2) Leave saddr unspecified in flow key. 3) Adjust how we invoke inet_select_addr(). Pass ip_hdr(skb)->saddr as second arg, and if it was zeronet use link scope. 4) Use loopback as input interface in flow key. Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: Kill rt->rt_spec_dst, no longer used.David S. Miller2012-06-28
| | | | | | | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: Create and use fib_compute_spec_dst() helper.David S. Miller2012-06-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The specific destination is the host we direct unicast replies to. Usually this is the original packet source address, but if we are responding to a multicast or broadcast packet we have to use something different. Specifically we must use the source address we would use if we were to send a packet to the unicast source of the original packet. The routing cache precomputes this value, but we want to remove that precomputation because it creates a hard dependency on the expensive rpfilter source address validation which we'd like to make cheaper. There are only three places where this matters: 1) ICMP replies. 2) pktinfo CMSG 3) IP options Now there will be no real users of rt->rt_spec_dst and we can simply remove it altogether. Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: Show that ip_send_reply() is purely unicast routine.David S. Miller2012-06-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | Rename it to ip_send_unicast_reply() and add explicit 'saddr' argument. This removed one of the few users of rt->rt_spec_dst. Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: Kill early demux method return value.David S. Miller2012-06-28
| | | | | | | | | | | | | | | | | | It's completely unnecessary. Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Revert "ipv4: tcp: dont cache unconfirmed intput dst"David S. Miller2012-06-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit c074da2810c118b3812f32d6754bd9ead2f169e7. This change has several unwanted side effects: 1) Sockets will cache the DST_NOCACHE route in sk->sk_rx_dst and we'll thus never create a real cached route. 2) All TCP traffic will use DST_NOCACHE and never use the routing cache at all. Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: skb_free_datagram_locked() doesnt drop all packetsEric Dumazet2012-06-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | dropwatch wrongly diagnose all received UDP packets as drops. This patch removes trace_kfree_skb() done in skb_free_datagram_locked(). Locations calling skb_free_datagram_locked() should do it on their own. As a result, drops are accounted on the right function. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipmr: Do not use RTA_PUT() macrosThomas Graf2012-06-27
| | | | | | | | | | | | | | | | | | | | | | | | Also fix a needless skb tailroom check for a 4 bytes area after after each rtnexthop block. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | inet_diag: Do not use RTA_PUT() macrosThomas Graf2012-06-27
| | | | | | | | | | | | | | | | | | | | | | | | Also, no need to trim on nlmsg_put() failure, nothing has been added yet. We also want to use nlmsg_end(), nlmsg_new() and nlmsg_free(). Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: tcp: dont cache unconfirmed intput dstEric Dumazet2012-06-27
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DDOS synflood attacks hit badly IP route cache. On typical machines, this cache is allowed to hold up to 8 Millions dst entries, 256 bytes for each, for a total of 2GB of memory. rt_garbage_collect() triggers and tries to cleanup things. Eventually route cache is disabled but machine is under fire and might OOM and crash. This patch exploits the new TCP early demux, to set a nocache boolean in case incoming TCP frame is for a not yet ESTABLISHED or TIMEWAIT socket. This 'nocache' boolean is then used in case dst entry is not found in route cache, to create an unhashed dst entry (DST_NOCACHE) SYN-cookie-ACK sent use a similar mechanism (ipv4: tcp: dont cache output dst for syncookies), so after this patch, a machine is able to absorb a DDOS synflood attack without polluting its IP route cache. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | netfilter: ipt_ULOG: Move away from NLMSG_PUT().David S. Miller2012-06-27
| | | | | | | | | | | | And use nlmsg_data() while we're here too. Signed-off-by: David S. Miller <davem@davemloft.net>
* | inet_diag: Move away from NLMSG_PUT().David S. Miller2012-06-27
| | | | | | | | | | | | | | And use nlmsg_data() while we're here too, and remove useless casts. Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv4: Cache ip_error() routes even when not forwarding.David S. Miller2012-06-26
| | | | | | | | | | | | | | | | | | | | | | | | And account for the fact that, when we are not forwarding, we should bump statistic counters rather than emit an ICMP response. RP-filter rejected lookups are still not cached. Since -EHOSTUNREACH and -ENETUNREACH can now no longer be seen in ip_rcv_finish(), remove those checks. Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv4: Remove unnecessary code from rt_check_expire().David S. Miller2012-06-26
| | | | | | | | | | | | | | | | IPv4 routing cache entries no longer use dst->expires, because the metrics, PMTU, and redirect information are stored in the inetpeer cache. Signed-off-by: David S. Miller <davem@davemloft.net>
* | tcp: Fix bug in tcp socket early demuxVijay Subramanian2012-06-24
| | | | | | | | | | | | | | | | | | The dest port for the call to __inet_lookup_established() in TCP early demux code is passed with the wrong endian-ness. This causes the lookup to fail leading to early demux not being used. Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'master' of git://1984.lsi.us.es/nf-nextDavid S. Miller2012-06-23
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo says: ==================== The following four patches provide Netfilter fixes for the cthelper infrastructure that was recently merged mainstream, they are: * two fixes for compilation breakage with two different configurations: - CONFIG_NF_NAT=m and CONFIG_NF_CT_NETLINK=y - NF_CONNTRACK_EVENTS=n and CONFIG_NETFILTER_NETLINK_QUEUE_CT=y * two fixes for sparse warnings. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| * | netfilter: nfnetlink_queue: fix compilation with CONFIG_NF_NAT=m and ↵Pablo Neira Ayuso2012-06-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CONFIG_NF_CT_NETLINK=y LD init/built-in.o net/built-in.o:(.data+0x4408): undefined reference to `nf_nat_tcp_seq_adjust' make: *** [vmlinux] Error 1 This patch adds a new pointer hook (nfq_ct_nat_hook) similar to other existing in Netfilter to solve our complicated configuration dependencies. Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | | ipv4: tcp: dont cache output dst for syncookiesEric Dumazet2012-06-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't cache output dst for syncookies, as this adds pressure on IP route cache and rcu subsystem for no gain. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Hans Schillstrom <hans.schillstrom@ericsson.com> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: Add sysctl knob to control early socket demuxAlexander Duyck2012-06-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change is meant to add a control for disabling early socket demux. The main motivation behind this patch is to provide an option to disable the feature as it adds an additional cost to routing that reduces overall throughput by up to 5%. For example one of my systems went from 12.1Mpps to 11.6 after the early socket demux was added. It looks like the reason for the regression is that we are now having to perform two lookups, first the one for an established socket, and then the one for the routing table. By adding this patch and toggling the value for ip_early_demux to 0 I am able to get back to the 12.1Mpps I was previously seeing. [ Move local variables in ip_rcv_finish() down into the basic block in which they are actually used. -DaveM ] Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tcp: Validate route interface in early demux.David S. Miller2012-06-21
| | | | | | | | | | | | | | | | | | Otherwise we might violate reverse path filtering. Signed-off-by: David S. Miller <davem@davemloft.net>
* | | inetpeer: inetpeer_invalidate_tree() cleanupEric Dumazet2012-06-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No need to use cmpxchg() in inetpeer_invalidate_tree() since we hold base lock. Also use correct rcu annotations to remove sparse errors (CONFIG_SPARSE_RCU_POINTER=y) net/ipv4/inetpeer.c:144:19: error: incompatible types in comparison expression (different address spaces) net/ipv4/inetpeer.c:149:20: error: incompatible types in comparison expression (different address spaces) net/ipv4/inetpeer.c:595:10: error: incompatible types in comparison expression (different address spaces) Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>