aboutsummaryrefslogtreecommitdiffstats
path: root/net/ipv4
Commit message (Collapse)AuthorAge
* ipv4: Rearrange how ip_route_newports() gets port keys.David S. Miller2011-02-24
| | | | | | | | | | | | | | | | | | | ip_route_newports() is the only place in the entire kernel that cares about the port members in the routing cache entry's lookup flow key. Therefore the only reason we store an entire flow inside of the struct rtentry is for this one special case. Rewrite ip_route_newports() such that: 1) The caller passes in the original port values, so we don't need to use the rth->fl.fl_ip_{s,d}port values to remember them. 2) The lookup flow is constructed by hand instead of being copied from the routing cache entry's flow. Signed-off-by: David S. Miller <davem@davemloft.net>
* xfrm: Const'ify address arguments to ->dst_lookup()David S. Miller2011-02-24
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* xfrm: Const'ify tmpl and address arguments to ->init_temprop()David S. Miller2011-02-24
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* xfrm: Mark flowi arg to ->init_tempsel() const.David S. Miller2011-02-22
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* xfrm: Mark flowi arg to ->fill_dst() const.David S. Miller2011-02-22
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* xfrm: Mark flowi arg to ->get_tos() const.David S. Miller2011-02-22
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: Remove debug macro of TCP_CHECK_TIMERShan Wei2011-02-20
| | | | | | | | | | Now, TCP_CHECK_TIMER is not used for debuging, it does nothing. And, it has been there for several years, maybe 6 years. Remove it to keep code clearer. Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of ↵David S. Miller2011-02-19
|\ | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: Documentation/feature-removal-schedule.txt drivers/net/e1000e/netdev.c net/xfrm/xfrm_policy.c
| * tcp: fix inet_twsk_deschedule()Eric Dumazet2011-02-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Eric W. Biederman reported a lockdep splat in inet_twsk_deschedule() This is caused by inet_twsk_purge(), run from process context, and commit 575f4cd5a5b6394577 (net: Use rcu lookups in inet_twsk_purge.) removed the BH disabling that was necessary. Add the BH disabling but fine grained, right before calling inet_twsk_deschedule(), instead of whole function. With help from Linus Torvalds and Eric W. Biederman Reported-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Daniel Lezcano <daniel.lezcano@free.fr> CC: Pavel Emelyanov <xemul@openvz.org> CC: Arnaldo Carvalho de Melo <acme@redhat.com> CC: stable <stable@kernel.org> (# 2.6.33+) Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: provide default_advmss() methods to blackhole dst_opsEric Dumazet2011-02-18
| | | | | | | | | | | | | | | | | | | | Commit 0dbaee3b37e118a (net: Abstract default ADVMSS behind an accessor.) introduced a possible crash in tcp_connect_init(), when dst->default_advmss() is called from dst_metric_advmss() Reported-by: George Spelvin <linux@horizon.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * arp_notify: unconditionally send gratuitous ARP for NETDEV_NOTIFY_PEERS.Ian Campbell2011-02-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NETDEV_NOTIFY_PEER is an explicit request by the driver to send a link notification while NETDEV_UP/NETDEV_CHANGEADDR generate link notifications as a sort of side effect. In the later cases the sysctl option is present because link notification events can have undesired effects e.g. if the link is flapping. I don't think this applies in the case of an explicit request from a driver. This patch makes NETDEV_NOTIFY_PEER unconditional, if preferred we could add a new sysctl for this case which defaults to on. This change causes Xen post-migration ARP notifications (which cause switches to relearn their MAC tables etc) to be sent by default. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * ip_gre: Add IPPROTO_GRE to flowi in ipgre_tunnel_xmitSteffen Klassert2011-02-11
| | | | | | | | | | | | | | | | | | | | | | Commit 5811662b15db018c740c57d037523683fd3e6123 ("net: use the macros defined for the members of flowi") accidentally removed the setting of IPPROTO_GRE from the struct flowi in ipgre_tunnel_xmit. This patch restores it. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv4: Implement __ip_dev_find using new interface address hash.David S. Miller2011-02-18
| | | | | | | | | | | | Much quicker than going through the FIB tables. Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv4: Add hash table of interface addresses.David S. Miller2011-02-18
| | | | | | | | | | | | | | | | This will be used to optimize __ip_dev_find() and friends. With help from Eric Dumazet. Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv4: Use const'ify fib_result deep in the route call chains.David S. Miller2011-02-17
| | | | | | | | | | | | | | | | The only troublesome bit here is __mkroute_output which wants to override res->fi and res->type, compute those in local variables instead. Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv4: Avoid use of signed integers in fib_trie code.David S. Miller2011-02-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GCC emits all kinds of crazy zero extensions when we go from signed int, to unsigned short, etc. etc. This transformation has to be legal because: 1) In tkey_extract_bits() in mask_pfx(), the values are used to perform shifts, on which negative values are undefined by C. 2) In fib_table_lookup() we perform comparisons with unsigned values, constants, and additions. None of which should encounter negative values. Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Add initial_ref arg to dst_alloc().David S. Miller2011-02-17
| | | | | | | | | | | | | | | | | | | | This allows avoiding multiple writes to the initial __refcnt. The most simplest cases of wanting an initial reference of "1" in ipv4 and ipv6 have been converted, the rest have been left along and kept at the existing "0". Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv4: Consolidate ipv4 dst allocation logic.David S. Miller2011-02-17
| | | | | | | | | | | | | | This also allows us to combine all the dst->flags settings and avoid read/modify/write sequences to this struct member. Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv4: Move rcu_read_{lock,unlock}() into ip_route_output_slow().David S. Miller2011-02-17
| | | | | | | | | | | | Simplifies tail of __ip_route_output_key(). Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv4: Simplify output route creation call sequence.David S. Miller2011-02-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | There's a lot of redundancy and unnecessary stack frames in the output route creation path. 1) Make __mkroute_output() return error pointers. 2) Eliminate ip_mkroute_output() entirely, made possible by #1. 3) Call __mkroute_output() directly and handling the returning error pointers in ip_route_output_slow(). Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv4: Cache learned redirect information in inetpeer.David S. Miller2011-02-15
| | | | | | | | | | | | | | | | | | | | Note that we do not generate the redirect netevent any longer, because we don't create a new cached route. Instead, once the new neighbour is bound to the cached route, we emit a neigh update event instead. Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv4: Cache learned PMTU information in inetpeer.David S. Miller2011-02-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The general idea is that if we learn new PMTU information, we bump the peer genid. This triggers the dst_ops->check() code to validate and if necessary propagate the new PMTU value into the metrics. Learned PMTU information self-expires. This means that it is not necessary to kill a cached route entry just because the PMTU information is too old. As a consequence: 1) When the path appears unreachable (dst_ops->link_failure or dst_ops->negative_advice) we unwind the PMTU state if it is out of date, instead of killing the cached route. A redirected route will still be invalidated in these situations. 2) rt_check_expire(), rt_worker_func(), et al. are no longer necessary at all. Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv4: fix rcu lock imbalance in fib_select_default()Eric Dumazet2011-02-14
| | | | | | | | | | | | | | | | | | Commit 0c838ff1ade7 (ipv4: Consolidate all default route selection implementations.) forgot to remove one rcu_read_unlock() from fib_select_default(). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | inet: Create a mechanism for upward inetpeer propagation into routes.David S. Miller2011-02-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we didn't have a routing cache, we would not be able to properly propagate certain kinds of dynamic path attributes, for example PMTU information and redirects. The reason is that if we didn't have a routing cache, then there would be no way to lookup all of the active cached routes hanging off of sockets, tunnels, IPSEC bundles, etc. Consider the case where we created a cached route, but no inetpeer entry existed and also we were not asked to pre-COW the route metrics and therefore did not force the creation a new inetpeer entry. If we later get a PMTU message, or a redirect, and store this information in a new inetpeer entry, there is no way to teach that cached route about the newly existing inetpeer entry. The facilities implemented here handle this problem. First we create a generation ID. When we create a cached route of any kind, we remember the generation ID at the time of attachment. Any time we force-create an inetpeer entry in response to new path information, we bump that generation ID. The dst_ops->check() callback is where the knowledge of this event is propagated. If the global generation ID does not equal the one stored in the cached route, and the cached route has not attached to an inetpeer yet, we look it up and attach if one is found. Now that we've updated the cached route's information, we update the route's generation ID too. This clears the way for implementing PMTU and redirects directly in the inetpeer cache. There is absolutely no need to consult cached route information in order to maintain this information. At this point nothing bumps the inetpeer genids, that comes in the later changes which handle PMTUs and redirects using inetpeers. Signed-off-by: David S. Miller <davem@davemloft.net>
* | inetpeer: Add redirect and PMTU discovery cached info.David S. Miller2011-02-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Validity of the cached PMTU information is indicated by it's expiration value being non-zero, just as per dst->expires. The scheme we will use is that we will remember the pre-ICMP value held in the metrics or route entry, and then at expiration time we will restore that value. In this way PMTU expiration does not kill off the cached route as is done currently. Redirect information is permanent, or at least until another redirect is received. Signed-off-by: David S. Miller <davem@davemloft.net>
* | inetpeer: Abstract address representation further.David S. Miller2011-02-10
| | | | | | | | | | | | | | | | | | | | | | Future changes will add caching information, and some of these new elements will be addresses. Since the family is implicit via the ->daddr.family member, replicating the family in ever address we store is entirely redundant. Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Kill NETEVENT_PMTU_UPDATE.David S. Miller2011-02-08
| | | | | | | | | | | | | | Nobody actually does anything in response to the event, so just kill it off. Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipsec: allow to align IPv4 AH on 32 bitsNicolas Dichtel2011-02-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Linux IPv4 AH stack aligns the AH header on a 64 bit boundary (like in IPv6). This is not RFC compliant (see RFC4302, Section 3.3.3.2.1), it should be aligned on 32 bits. For most of the authentication algorithms, the ICV size is 96 bits. The AH header alignment on 32 or 64 bits gives the same results. However for SHA-256-128 for instance, the wrong 64 bit alignment results in adding useless padding in IPv4 AH, which is forbidden by the RFC. To avoid breaking backward compatibility, we use a new flag (XFRM_STATE_ALIGN4) do change original behavior. Initial patch from Dang Hongwu <hongwu.dang@6wind.com> and Christophe Gouault <christophe.gouault@6wind.com>. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | inetpeer: Move ICMP rate limiting state into inet_peer entries.David S. Miller2011-02-04
| | | | | | | | | | | | | | | | | | | | Like metrics, the ICMP rate limiting bits are cached state about a destination. So move it into the inet_peer entries. If an inet_peer cannot be bound (the reason is memory allocation failure or similar), the policy is to allow. Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipv4: Don't miss existing cached metrics in new routes.David S. Miller2011-02-04
| | | | | | | | | | | | | | | | | | | | | | Always lookup to see if we have an existing inetpeer entry for a route. Let FLOWI_FLAG_PRECOW_METRICS merely influence the "create" argument to rt_bind_peer(). Also, call rt_bind_peer() unconditionally since it is not possible for rt->peer to be non-NULL at this point. Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'master' of ↵David S. Miller2011-02-04
|\| | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
| * net: Support compat SIOCGETVIFCNT ioctl in ipv4.David S. Miller2011-02-03
| | | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: Fix bug in compat SIOCGETSGCNT handling.David S. Miller2011-02-03
| | | | | | | | | | | | | | | | | | | | | | | | | | Commit 709b46e8d90badda1898caea50483c12af178e96 ("net: Add compat ioctl support for the ipv4 multicast ioctl SIOCGETSGCNT") added the correct plumbing to handle SIOCGETSGCNT properly. However, whilst definiting a proper "struct compat_sioc_sg_req" it isn't actually used in ipmr_compat_ioctl(). Correct this oversight. Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'master' of ↵David S. Miller2011-02-02
| |\ | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6
| | * netfilter: arpt_mangle: fix return values of checkentryPablo Neira Ayuso2011-02-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In 135367b "netfilter: xtables: change xt_target.checkentry return type", the type returned by checkentry was changed from boolean to int, but the return values where not adjusted. arptables: Input/output error This broke arptables with the mangle target since it returns true under success, which is interpreted by xtables as >0, thus returning EIO. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | net: Add default_mtu() methods to blackhole dst_opsRoland Dreier2011-01-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an IPSEC SA is still being set up, __xfrm_lookup() will return -EREMOTE and so ip_route_output_flow() will return a blackhole route. This can happen in a sndmsg call, and after d33e455337ea ("net: Abstract default MTU metric calculation behind an accessor.") this leads to a crash in ip_append_data() because the blackhole dst_ops have no default_mtu() method and so dst_mtu() calls a NULL pointer. Fix this by adding default_mtu() methods (that simply return 0, matching the old behavior) to the blackhole dst_ops. The IPv4 part of this patch fixes a crash that I saw when using an IPSEC VPN; the IPv6 part is untested because I don't have an IPv6 VPN, but it looks to be needed as well. Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: Fix fib_trie build in some configurations.David S. Miller2011-02-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we end up including include/linux/node.h (either explicitly or implicitly) that header has a definition of "structt node" too. So rename the one we use in fib_trie to "rt_trie_node" to avoid the conflict. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | tcp: Increase the initial congestion window to 10.David S. Miller2011-02-02
| | | | | | | | | | | | | | | Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Nandita Dukkipati <nanditad@google.com>
* | | ipv4: Rename fib_hash_* locals in fib_semantics.cDavid S. Miller2011-02-01
| | | | | | | | | | | | | | | | | | | | | To avoid confusion with the recently deleted fib_hash.c code, use "fib_info_hash_*" instead of plain "fib_hash_*". Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: Update some fib_hash centric interface names.David S. Miller2011-02-01
| | | | | | | | | | | | | | | | | | | | | fib_hash_init() --> fib_trie_init() fib_hash_table() --> fib_trie_table() Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: Remove fib_hash.David S. Miller2011-02-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The time has finally come to remove the hash based routing table implementation in ipv4. FIB Trie is mature, well tested, and I've done an audit of it's code to confirm that it implements insert, delete, and lookup with the same identical semantics as fib_hash did. If there are any semantic differences found in fib_trie, we should simply fix them. I've placed the trie statistic config option under advanced router configuration. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Stephen Hemminger <shemminger@vyatta.com>
* | | ipv4: Consolidate all default route selection implementations.David S. Miller2011-01-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Both fib_trie and fib_hash have a local implementation of fib_table_select_default(). This is completely unnecessary code duplication. Since we now remember the fib_table and the head of the fib alias list of the default route, we can implement one single generic version of this routine. Looking at the fib_hash implementation you may get the impression that it's possible for there to be multiple top-level routes in the table for the default route. The truth is, it isn't, the insert code will only allow one entry to exist in the zero prefix hash table, because all keys evaluate to zero and all keys in a hash table must be unique. Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: Remember FIB alias list head and table in lookup results.David S. Miller2011-01-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | This will be used later to implement fib_select_default() in a completely generic manner, instead of the current situation where the default route is re-looked up in the TRIE/HASH table and then the available aliases are analyzed. Signed-off-by: David S. Miller <davem@davemloft.net>
* | | Merge branch 'master' of ↵David S. Miller2011-01-31
|\| | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
| * | net: Add compat ioctl support for the ipv4 multicast ioctl SIOCGETSGCNTEric W. Biederman2011-01-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SIOCGETSGCNT is not a unique ioctl value as it it maps tio SIOCPROTOPRIVATE +1, which unfortunately means the existing infrastructure for compat networking ioctls is insufficient. A trivial compact ioctl implementation would conflict with: SIOCAX25ADDUID SIOCAIPXPRISLT SIOCGETSGCNT_IN6 SIOCGETSGCNT SIOCRSSCAUSE SIOCX25SSUBSCRIP SIOCX25SDTEFACILITIES To make this work I have updated the compat_ioctl decode path to mirror the the normal ioctl decode path. I have added an ipv4 inet_compat_ioctl function so that I can have ipv4 specific compat ioctls. I have added a compat_ioctl function into struct proto so I can break out ioctls by which kind of ip socket I am using. I have added a compat_raw_ioctl function because SIOCGETSGCNT only works on raw sockets. I have added a ipmr_compat_ioctl that mirrors the normal ipmr_ioctl. This was necessary because unfortunately the struct layout for the SIOCGETSGCNT has unsigned longs in it so changes between 32bit and 64bit kernels. This change was sufficient to run a 32bit ip multicast routing daemon on a 64bit kernel. Reported-by: Bill Fenner <fenner@aristanetworks.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: If fib metrics are default, no need to grab ref to FIB info.David S. Miller2011-01-28
| | | | | | | | | | | | | | | | | | | | | | | | The fib metric memory in this case is static in the kernel image, so we don't need to reference count it since it's never going to go away on us. Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: Attach FIB info to dst_default_metrics when possibleDavid S. Miller2011-01-28
| | | | | | | | | | | | | | | | | | | | | If there are no explicit metrics attached to a route, hook fi->fib_info up to dst_default_metrics. Signed-off-by: David S. Miller <davem@davemloft.net>
* | | ipv4: Allocate fib metrics dynamically.David S. Miller2011-01-28
| | | | | | | | | | | | | | | | | | | | | This is the initial gateway towards super-sharing metrics if they are all set to zero for a route. Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: Pre-COW metrics for TCP.David S. Miller2011-01-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | TCP is going to record metrics for the connection, so pre-COW the route metrics at route cache entry creation time. This avoids several atomic operations that have to occur if we COW the metrics after the entry reaches global visibility. Signed-off-by: David S. Miller <davem@davemloft.net>
* | | net: Store ipv4/ipv6 COW'd metrics in inetpeer cache.David S. Miller2011-01-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Please note that the IPSEC dst entry metrics keep using the generic metrics COW'ing mechanism using kmalloc/kfree. This gives the IPSEC routes an opportunity to use metrics which are unique to their encapsulated paths. Signed-off-by: David S. Miller <davem@davemloft.net>