aboutsummaryrefslogtreecommitdiffstats
path: root/net/ipv4
Commit message (Collapse)AuthorAge
* [NET]: Use u32 for routing table IDsPatrick McHardy2006-09-22
| | | | | | | | | | Use u32 for routing table IDs in net/ipv4 and net/decnet in preparation of support for a larger number of routing tables. net/ipv6 already uses u32 everywhere and needs no further changes. No functional changes are made by this patch. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Remove unnecessary config.h includes from net/Dave Jones2006-09-22
| | | | | | | config.h is automatically included by kbuild these days. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Use network-order dport for all visible inet_lookup_*Herbert Xu2006-09-22
| | | | | | | | | | | | | | | | Right now most inet_lookup_* functions take a host-order hnum instead of a network-order dport because that's how it is represented internally. This means that users of these functions have to be careful about using the right byte-order. To add more confusion, inet_lookup takes a network-order dport unlike all other functions. So this patch changes all visible inet_lookup functions to take a dport and move all dport->hnum conversion inside them. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] fib: convert reader/writer to spinlockStephen Hemminger2006-09-22
| | | | | | | | | Ther is no point in using a more expensive reader/writer lock for a low contention lock like the fib_info_lock. The only reader case is in handling route redirects. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Uninline inet_lookup_listenerHerbert Xu2006-09-22
| | | | | | | | By modern standards this function is way too big to be inlined. It's even bigger than __inet_lookup_listener :) Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [INET]: Remove is_setbyuser patchLouis Nyffenegger2006-09-22
| | | | | | | | | The value is_setbyuser from struct ip_options is never used and set only one time (http://linux-net.osdl.org/index.php/TODO#IPV4). This little patch removes it from the kernel source. Signed-off-by: Louis Nyffenegger <louis.nyffenegger@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Kill fib4_rules_clean().David S. Miller2006-09-22
| | | | | | As noted by Adrian Bunk this function is totally unused. Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Make code static.Adrian Bunk2006-09-22
| | | | | | | This patch makes needlessly global code static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: Get rid of HW checksum invalidationPatrick McHardy2006-09-22
| | | | | | | Update hardware checksums incrementally to avoid breaking GSO. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETEPatrick McHardy2006-09-22
| | | | | | | | | | | Replace CHECKSUM_HW by CHECKSUM_PARTIAL (for outgoing packets, whose checksum still needs to be completed) and CHECKSUM_COMPLETE (for incoming packets, device supplied full checksum). Patch originally from Herbert Xu, updated by myself for 2.6.18-rc3. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: netbios conntrack: fix compilePatrick McHardy2006-09-22
| | | | | | | | Fix compile breakage caused by move of IFA_F_SECONDARY to new header file. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPv4]: Move interface address bits to linux/if_addr.hThomas Graf2006-09-22
| | | | | Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Convert address dumping to new netlink apiThomas Graf2006-09-22
| | | | | Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Convert address deletion to new netlink apiThomas Graf2006-09-22
| | | | | | | | Fixes various unvalidated netlink attributes causing memory corruptions when left empty by userspace. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Convert address addition to new netlink apiThomas Graf2006-09-22
| | | | | | | | | | Adds rtm_to_ifaddr() transforming a netlink message to a struct in_ifaddr. Fixes various unvalidated netlink attributes causing memory corruptions when left empty by userspace applications. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Use Protocol Independant Policy Routing Rules FrameworkThomas Graf2006-09-22
| | | | | Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NetLabel]: CIPSOv4 enginePaul Moore2006-09-22
| | | | | | | | | | | | | | | | | | | | | Add support for the Commercial IP Security Option (CIPSO) to the IPv4 network stack. CIPSO has become a de-facto standard for trusted/labeled networking amongst existing Trusted Operating Systems such as Trusted Solaris, HP-UX CMW, etc. This implementation is designed to be used with the NetLabel subsystem to provide explicit packet labeling to LSM developers. The CIPSO/IPv4 packet labeling works by the LSM calling a NetLabel API function which attaches a CIPSO label (IPv4 option) to a given socket; this in turn attaches the CIPSO label to every packet leaving the socket without any extra processing on the outbound side. On the inbound side the individual packet's sk_buff is examined through a call to a NetLabel API function to determine if a CIPSO/IPv4 label is present and if so the security attributes of the CIPSO label are returned to the caller of the NetLabel API function. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NetLabel]: core network changesPaul Moore2006-09-22
| | | | | | | | Changes to the core network stack to support the NetLabel subsystem. This includes changes to the IPv4 option handling to support CIPSO labels. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [MLSXFRM]: Auto-labeling of child socketsVenkat Yekkirala2006-09-22
| | | | | | | | | | | | | This automatically labels the TCP, Unix stream, and dccp child sockets as well as openreqs to be at the same MLS level as the peer. This will result in the selection of appropriately labeled IPSec Security Associations. This also uses the sock's sid (as opposed to the isec sid) in SELinux enforcement of secmark in rcv_skb and postroute_last hooks. Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [MLSXFRM]: Add flow labelingVenkat Yekkirala2006-09-22
| | | | | | | | | | | | | | | | | | | | | | This labels the flows that could utilize IPSec xfrms at the points the flows are defined so that IPSec policy and SAs at the right label can be used. The following protos are currently not handled, but they should continue to be able to use single-labeled IPSec like they currently do. ipmr ip_gre ipip igmp sit sctp ip6_tunnel (IPv6 over IPv6 tunnel device) decnet Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [CRYPTO] users: Use crypto_comp and crypto_has_*Herbert Xu2006-09-20
| | | | | | | This patch converts all users to use the new crypto_comp type and the crypto_has_* functions. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* [IPSEC]: Use HMAC template and hash interfaceHerbert Xu2006-09-20
| | | | | | | | | | | | | This patch converts IPsec to use the new HMAC template. The names of existing simple digest algorithms may still be used to refer to their HMAC composites. The same structure can be used by other MACs such as AES-XCBC-MAC. This patch also switches from the digest interface to hash. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPSEC] ESP: Use block ciphers where applicableHerbert Xu2006-09-20
| | | | | | | | This patch converts IPSec/ESP to use the new block cipher type where applicable. Similar to the HMAC conversion, existing algorithm names have been kept for compatibility. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* [IPV4] fib_trie: missing ntohl() when calling fib_semantic_match()Al Viro2006-09-19
| | | | | | | | fib_trie.c::check_leaf() passes host-endian where fib_semantic_match() expects (and stores into) net-endian. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP] tcp-lp: bug fix for oops in 2.6.18-rc6Wong Hoi Sing Edison2006-09-18
| | | | | | | | | Sorry that the patch submited yesterday still contain a small bug. This version have already been test for hours with BT connections. The oops is now difficult to reproduce. Signed-off-by: Wong Hoi Sing Edison <hswong3i@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPVS]: remove the debug option go ip_vs_ftpSimon Horman2006-09-18
| | | | | | | | This patch makes the debuging behaviour of this code more consistent with the rest of IPVS. Signed-Off-By: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPVS]: Make sure ip_vs_ftp ports are validSimon Horman2006-09-18
| | | | | | | | | I'm not entirely sure what happens in the case of a valid port, at best it'll be silently ignored. This patch ignores them a little more verbosely. Signed-Off-By: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPVS]: auto-help for ip_vs_ftpSimon Horman2006-09-18
| | | | | | | Fill in a help message for the ports option to ip_vs_ftp Signed-Off-By: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Turn ABC off.Stephen Hemminger2006-09-18
| | | | | | | | | | Turn Appropriate Byte Count off by default because it unfairly penalizes applications that do small writes. Add better documentation to describe what it is so users will understand why they might want to turn it on. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Fix SNMPv2 "ipFragFails" counter errorWei Dong2006-08-31
| | | | | | | | | | | | | | | | | | | | | | | | When I tested Linux kernel 2.6.17.7 about statistics "ipFragFails",found that this counter couldn't increase correctly. The criteria is RFC2011: RFC2011 ipFragFails OBJECT-TYPE SYNTAX Counter32 MAX-ACCESS read-only STATUS current DESCRIPTION "The number of IP datagrams that have been discarded because they needed to be fragmented at this entity but could not be, e.g., because their Don't Fragment flag was set." ::= { ip 18 } When I send big IP packet to a router with DF bit set to 1 which need to be fragmented, and router just sends an ICMP error message ICMP_FRAG_NEEDED but no increments for this counter(in the function ip_fragment). Signed-off-by: Wei Dong <weid@nanjing-fnst.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Two RFC3465 Appropriate Byte Count fixes.Daikichi Osuga2006-08-30
| | | | | | | | 1) fix slow start after retransmit timeout 2) fix case of L=2*SMSS acked bytes comparison Signed-off-by: Daikichi Osuga <osugad@s1.nttdocomo.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Limit window scaling if window is clamped.Stephen Hemminger2006-08-22
| | | | | | | | | | | | | | | This small change allows for easy per-route workarounds for broken hosts or middleboxes that are not compliant with TCP standards for window scaling. Rather than having to turn off window scaling globally. This patch allows reducing or disabling window scaling if window clamp is present. Example: Mark Lord reported a problem with 2.6.17 kernel being unable to access http://www.everymac.com # ip route add 216.145.246.23/32 via 10.8.0.1 window 65535 Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: arp_tables: fix table locking in arpt_do_tablePatrick McHardy2006-08-22
| | | | | | | | table->private might change because of ruleset changes, don't use it without holding the lock. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: ip_tables: fix table locking in ipt_do_tablePatrick McHardy2006-08-17
| | | | | | | | table->private might change because of ruleset changes, don't use it without holding the lock. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: ctnetlink: fix deadlock in table dumpingPatrick McHardy2006-08-17
| | | | | | | | ip_conntrack_put must not be called while holding ip_conntrack_lock since destroy_conntrack takes it again. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: severe locking bug in fib_semantics.cAlexey Kuznetsov2006-08-17
| | | | | | | | | | | | | Found in 2.4 by Yixin Pan <yxpan@hotmail.com>. > When I read fib_semantics.c of Linux-2.4.32, write_lock(&fib_info_lock) = > is used in fib_release_info() instead of write_lock_bh(&fib_info_lock). = > Is the following case possible: a BH interrupts fib_release_info() while = > holding the write lock, and calls ip_check_fib_default() which calls = > read_lock(&fib_info_lock), and spin forever. Signed-off-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
* [MCAST]: Fix filter leak on device removal.David L Stevens2006-08-17
| | | | | | | | | | | This fixes source filter leakage when a device is removed and a process leaves the group thereafter. This also includes corresponding fixes for IPv6 multicast source filters on device removal. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Possible leak of multicast source filter sctructureMichal Ruzicka2006-08-17
| | | | | | | | | | There is a leak of a socket's multicast source filter list structure on closing a socket with a multicast source filter set on an interface that does not exist any more. Signed-off-by: Michal Ruzicka <michal.ruzicka@comstar.cz> Acked-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [INET]: Use pskb_trim_unique when trimming paged unique skbsHerbert Xu2006-08-13
| | | | | | | | | | | | | | | | | The IPv4/IPv6 datagram output path was using skb_trim to trim paged packets because they know that the packet has not been cloned yet (since the packet hasn't been given to anything else in the system). This broke because skb_trim no longer allows paged packets to be trimmed. Paged packets must be given to one of the pskb_trim functions instead. This patch adds a new pskb_trim_unique function to cover the IPv4/IPv6 datagram output path scenario and replaces the corresponding skb_trim calls with it. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: ulog: fix panic on SMP kernelsMark Huang2006-08-13
| | | | | | | | | | | | | | | | Fix kernel panic on various SMP machines. The culprit is a null ub->skb in ulog_send(). If ulog_timer() has already been scheduled on one CPU and is spinning on the lock, and ipt_ulog_packet() flushes the queue on another CPU by calling ulog_send() right before it exits, there will be no skbuff when ulog_timer() acquires the lock and calls ulog_send(). Cancelling the timer in ulog_send() doesn't help because it has already been scheduled and is running on the first CPU. Similar problem exists in ebt_ulog.c and nfnetlink_log.c. Signed-off-by: Mark Huang <mlhuang@cs.princeton.edu> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: {arp,ip,ip6}_tables: proper error recovery in init pathPatrick McHardy2006-08-13
| | | | | | | | | | Neither of {arp,ip,ip6}_tables cleans up behind itself when something goes wrong during initialization. Noticed by Rennie deGraaf <degraaf@cpsc.ucalgary.ca> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: xt_hashlimit: fix limit off-by-onePatrick McHardy2006-08-13
| | | | | | | | | | Hashlimit doesn't account for the first packet, which is inconsistent with the limit match. Reported by ryan.castellucci@gmail.com, netfilter bugzilla #500. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Fix botched memory leak fix to tcpprobe_read().David S. Miller2006-08-13
| | | | | | | | | | | | Somehow I clobbered James's original fix and only my subsequent compiler warning change went in for that changeset. Get the real fix in there. Noticed by Jesper Juhl. Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: SNMPv2 tcpOutSegs counter errorWei Yongjun2006-08-08
| | | | | | | Do not count retransmitted segments. Signed-off-by: Wei Yongjun <yjwei@nanjing-fnst.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Limit rt cache size properly.Kirill Korotaev2006-08-07
| | | | | | | | | | | | | | | | | | | From: Kirill Korotaev <dev@sw.ru> During OpenVZ stress testing we found that UDP traffic with random src can generate too much excessive rt hash growing leading finally to OOM and kernel panics. It was found that for 4GB i686 system (having 1048576 total pages and 225280 normal zone pages) kernel allocates the following route hash: syslog: IP route cache hash table entries: 262144 (order: 8, 1048576 bytes) => ip_rt_max_size = 4194304 entries, i.e. max rt size is 4194304 * 256b = 1Gb of RAM > normal_zone Attached the patch which removes HASH_HIGHMEM flag from alloc_large_system_hash() call. Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Fixes IW > 2 cases when TCP is application limitedIlpo Järvinen2006-08-05
| | | | | | | | | Whenever a transfer is application limited, we are allowed at least initial window worth of data per window unless cwnd is previously less than that. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Fix more per-cpu typosAlexey Dobriyan2006-08-02
| | | | | | Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [AF_UNIX]: Kernel memory leak fix for af_unix datagram getpeersec patchCatherine Zhang2006-08-02
| | | | | | | | | | | | | | | | | | | | From: Catherine Zhang <cxzhang@watson.ibm.com> This patch implements a cleaner fix for the memory leak problem of the original unix datagram getpeersec patch. Instead of creating a security context each time a unix datagram is sent, we only create the security context when the receiver requests it. This new design requires modification of the current unix_getsecpeer_dgram LSM hook and addition of two new hooks, namely, secid_to_secctx and release_secctx. The former retrieves the security context and the latter releases it. A hook is required for releasing the security context because it is up to the security module to decide how that's done. In the case of Selinux, it's a simple kfree operation. Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV6]: SNMPv2 "ipv6IfStatsOutFragCreates" counter errorWei Dong2006-08-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When I tested linux kernel 2.6.71.7 about statistics "ipv6IfStatsOutFragCreates", and found that it couldn't increase correctly. The criteria is RFC 2465: ipv6IfStatsOutFragCreates OBJECT-TYPE SYNTAX Counter32 MAX-ACCESS read-only STATUS current DESCRIPTION "The number of output datagram fragments that have been generated as a result of fragmentation at this output interface." ::= { ipv6IfStatsEntry 15 } I think there are two issues in Linux kernel. 1st: RFC2465 specifies the counter is "The number of output datagram fragments...". I think increasing this counter after output a fragment successfully is better. And it should not be increased even though a fragment is created but failed to output. 2nd: If we send a big ICMP/ICMPv6 echo request to a host, and receive ICMP/ICMPv6 echo reply consisted of some fragments. As we know that in Linux kernel first fragmentation occurs in ICMP layer(maybe saying transport layer is better), but this is not the "real" fragmentation,just do some "pre-fragment" -- allocate space for date, and form a frag_list, etc. The "real" fragmentation happens in IP layer -- set offset and MF flag and so on. So I think in "fast path" for ip_fragment/ip6_fragment, if we send a fragment which "pre-fragment" by upper layer we should also increase "ipv6IfStatsOutFragCreates". Signed-off-by: Wei Dong <weid@nanjing-fnst.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: xt_hashlimit/xt_string: missing string validationPatrick McHardy2006-08-02
| | | | | | | | | The hashlimit table name and the textsearch algorithm need to be terminated, the textsearch pattern length must not exceed the maximum size. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>