aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* [TCP]: TCP Illinois congestion control (rev3)Stephen Hemminger2007-04-26
| | | | | | | | | | | This is an implementation of TCP Illinois invented by Shao Liu at University of Illinois. It is a another variant of Reno which adapts the alpha and beta parameters based on RTT. The basic idea is to increase window less rapidly as delay approaches the maximum. See the papers and talks to get a more complete description. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Get rid of netdev_nitStephen Hemminger2007-04-26
| | | | | | | | It isn't any faster to test a boolean global variable than do a simple check for empty list. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [PPPOE]: Fix device tear-down notification.Michal Ostrowski2007-04-26
| | | | | | | | | | | | | | | | | | | | pppoe_flush_dev() kicks all sockets bound to a device that is going down. In doing so, locks must be taken in the right order consistently (sock lock, followed by the pppoe_hash_lock). However, the scan process is based on us holding the sock lock. So, when something is found in the scan we must release the lock we're holding and grab the sock lock. This patch fixes race conditions between this code and pppoe_release(), both of which perform similar functions but would naturally prefer to grab locks in opposing orders. Both code paths are now going after these locks in a consistent manner. pppoe_hash_lock protects the contents of the "pppox_sock" objects that reside inside the hash. Thus, NULL'ing out the pppoe_dev field should be done under the protection of this lock. Signed-off-by: Michal Ostrowski <mostrows@earthlink.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [PPPOE]: memory leak when socket is release()d before PPPIOCGCHAN has been ↵Florian Zumbiehl2007-04-26
| | | | | | | | | | | | | | | | | | | | | | | | called on it below you find a patch that fixes a memory leak when a PPPoE socket is release()d after it has been connect()ed, but before the PPPIOCGCHAN ioctl ever has been called on it. This is somewhat of a security problem, too, since PPPoE sockets can be created by any user, so any user can easily allocate all the machine's RAM to non-swappable address space and thus DoS the system. Is there any specific reason for PPPoE sockets being available to any unprivileged process, BTW? After all, you need a packet socket for the discovery stage anyway, so it's unlikely that any unprivileged process will ever need to create a PPPoE socket, no? Allocating all session IDs for a known AC is a kind of DoS, too, after all - with Juniper ERXes, this is really easy, actually, since they don't ever assign session ids above 8000 ... Signed-off-by: Florian Zumbiehl <florz@florz.de> Acked-by: Michal Ostrowski <mostrows@earthlink.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [PPPOE]: race between interface going down and connect()Florian Zumbiehl2007-04-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | below you find a patch that (hopefully) fixes a race between an interface going down and a connect() to a peer on that interface. Before, connect() would determine that an interface is up, then the interface could go down and all entries referring to that interface in the item_hash_table would be marked as ZOMBIEs and their references to the device would be freed, and after that, connect() would put a new entry into the hash table referring to the device that meanwhile is down already - which also would cause unregister_netdevice() to wait until the socket has been release()d. This patch does not suffice if we are not allowed to accept connect()s referring to a device that we already acked a NETDEV_GOING_DOWN for (that is: all references are only guaranteed to be freed after NETDEV_DOWN has been acknowledged, not necessarily after the NETDEV_GOING_DOWN already). And if we are allowed to, we could avoid looking through the hash table upon NETDEV_GOING_DOWN completely and only do that once we get the NETDEV_DOWN ... mostrows: pppoe_flush_dev is called on NETDEV_GOING_DOWN and NETDEV_DOWN to deal with this "late connect" issue. Ideally one would hope to notify users at the "NETDEV_GOING_DOWN" phase (just to pretend to be nice). However, it is the NETDEV_DOWN scan that takes all the responsibility for ensuring nobody is hanging around at that time. Signed-off-by: Florian Zumbiehl <florz@florz.de> Acked-by: Michal Ostrowski <mostrows@earthlink.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [PPPoE]: miscellaneous smaller cleanupsFlorian Zumbiehl2007-04-26
| | | | | | | | | | below is a patch that just removes dead code/initializers without any effect (first access is an assignment) that I stumbled accross while reading the source. Signed-off-by: Florian Zumbiehl <florz@florz.de> Acked-by: Michal Ostrowski <mostrows@earthlink.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET] skbuff: skb_store_bits const is backwardsStephen Hemminger2007-04-26
| | | | | | | | | Getting warnings becuase skb_store_bits has skb as constant, but the function overwrites it. Looks like const was on the wrong side. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [BRIDGE]: Fix warning in net-2.6.22Stephen Hemminger2007-04-26
| | | | | | | The following is leftover from earlier change in net-2.6.22. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [AX25/NETROM/ROSE]: Convert to use modern wait queue APIRalf Baechle2007-04-26
| | | | | Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [AF_PACKET]: Add option to return orig_dev to userspace.Peter P. Waskiewicz Jr2007-04-26
| | | | | | | | | | | | | | | Add a packet socket option to allow the orig_dev index to be returned to userspace when passing traffic through a decapsulated device, such as the bonding driver. This is very useful for layer 2 traffic being able to report which physical device actually received the traffic, instead of having the encapsulating device hide that information. The new option is called PACKET_ORIGDEV. Signed-off-by: Peter P. Waskiewicz Jr. <peter.p.waskiewicz.jr@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV6] SNMP: Export statistics via netlink without CONFIG_PROC_FS.YOSHIFUJI Hideaki2007-04-26
| | | | | Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4] SNMP: Move some statistic bits to net/ipv4/proc.c.YOSHIFUJI Hideaki2007-04-26
| | | | | | | This also fixes memory leak in error path. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV6] SNMP: Move some statistic bits to net/ipv6/proc.c.YOSHIFUJI Hideaki2007-04-26
| | | | | Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV6] SNMP: Netlink interface.YOSHIFUJI Hideaki2007-04-26
| | | | | Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [INET]: Add IP(V6)_PMTUDISC_RPOBEJohn Heffner2007-04-26
| | | | | | | | | | | Add IP(V6)_PMTUDISC_PROBE value for IP(V6)_MTU_DISCOVER. This option forces us not to fragment, but does not make use of the kernel path MTU discovery. That is, it allows for user-mode MTU probing (or, packetization-layer path MTU discovery). This is particularly useful for diagnostic utilities, like traceroute/tracepath. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV6]: MTU discovery check in ip6_fragment()John Heffner2007-04-26
| | | | | | | | | Adds a check in ip6_fragment() mirroring ip_fragment() for packets that we can't fragment, and sends an ICMP Packet Too Big message in response. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET_SCHED]: ingress: switch back to using ingress_lockPatrick McHardy2007-04-26
| | | | | | | | | | | | Switch ingress queueing back to use ingress_lock. qdisc_lock_tree now locks both the ingress and egress qdiscs on the device. All changes to data that might be used on both ingress and egress needs to be protected by using qdisc_lock_tree instead of manually taking dev->queue_lock. Additionally the qdisc stats_lock needs to be initialized to ingress_lock for ingress qdiscs. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET_SCHED]: Eliminate qdisc_tree_lockPatrick McHardy2007-04-26
| | | | | | | | | Since we're now holding the rtnl during the entire dump operation, we can remove qdisc_tree_lock, whose only purpose is to protect dump callbacks from concurrent changes to the qdisc tree. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETLINK]: don't reinitialize callback mutexPatrick McHardy2007-04-26
| | | | | | | | | Don't reinitialize the callback mutex the netlink_kernel_create caller handed in, it is supposed to already be initialized and could already be held by someone. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [RTNETLINK]: Remove unnecessary locking in dump callbacksPatrick McHardy2007-04-26
| | | | | | | | | | Since we're now holding the rtnl during the entire dump operation, we can remove additional locking for rtnl protected data. This patch does that for all simple cases (dev_base_lock for dev_base walking, RCU protection for FIB rule dumping). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [RTNETLINK]: Hold rtnl_mutex during netlink dump callbacksPatrick McHardy2007-04-26
| | | | | | | | | Hold rtnl_mutex during the entire netlink dump operation. This allows to simplify locking in the dump callbacks, since they can now rely on that no concurrent changes happen. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETLINK]: Switch cb_lock spinlock to mutex and allow to override itPatrick McHardy2007-04-26
| | | | | | | | | | Switch cb_lock to mutex and allow netlink kernel users to override it with a subsystem specific mutex for consistent locking in dump callbacks. All netlink_dump_start users have been audited not to rely on any side-effects of the previously used spinlock. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: ipt_ULOG: add compat conversion functionsPatrick McHardy2007-04-26
| | | | | Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: nfnetlink_log: remove fallback to group 0Patrick McHardy2007-04-26
| | | | | | | | | | Don't fallback to group 0 if no instance can be found for the given group. This potentially confuses the listener and is not what the user configured. Also remove the ring buffer spamming that happens when rules are set up before the logging daemon is started. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: {eb,ip6,ip}t_LOG: remove remains of LOG target overloadingPatrick McHardy2007-04-26
| | | | | | | | | All LOG targets always use their internal logging function nowadays, so remove the incorrect error message and handle real errors (!= -EEXIST) by failing to load. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: nf_nat: use HW checksumming when possiblePatrick McHardy2007-04-26
| | | | | | | | When mangling packets forwarded to a HW checksumming capable device, offload recalculation of the checksum instead of doing it in software. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: ebt_arp: add gratuitous arp filteringBart De Schuymer2007-04-26
| | | | | | | | | | | | | The attached patch adds gratuitous arp filtering, more precisely: it allows checking that the IPv4 source address matches the IPv4 destination address inside the ARP header. It also adds a check for the hardware address type when matching MAC addresses (nothing critical, just for better consistency). Signed-off-by: Bart De Schuymer <bdschuym@pandora.be> Acked-by: Carl-Daniel Hailfinger <c-d.hailfinger.devel.2006@gmx.net> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: bridge-nf: filter bridged IPv4/IPv6 encapsulated in pppoe trafficMichael Milner2007-04-26
| | | | | | | | | | | The attached patch by Michael Milner adds support for using iptables and ip6tables on bridged traffic encapsulated in ppoe frames, similar to what's already supported for vlan. Signed-off-by: Michael Milner <milner@blissisland.ca> Signed-off-by: Bart De Schuymer <bdschuym@pandora.be> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [DCCP]: Complete documentation of dccp_sockGerrit Renker2007-04-26
| | | | | | | | | This fills in missing documentation for dccp_sock fields. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [DCCP]: Debug statements for Elapsed Time optionGerrit Renker2007-04-26
| | | | | | | | | | This prints the value of the parsed Elapsed Time when received via a Timestamp Echo option [RFC 4342, 13.3]. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [DCCP]: Fix bug in the calculation of very low sending ratesGerrit Renker2007-04-26
| | | | | | | | | | | | | | | | | | | | | | | This fixes an error in the calculation of t_ipi when X converges towards very low sending rates (between 1 and 64 bytes per second). Although this case may not sound likely, it can be reproduced by connecting, hitting enter (1 byte sent) and waiting for some time, during which the nofeedback timer halves the sending rate until finally it reaches the region 1..64 bytes/sec. Computing X is handled correctly (tested separately); but by dividing X _before_ entering the calculation of t_ipi, X becomes zero as a result. This in turn triggers a BUG condition caught in scaled_div(). Fixed by replacing with equivalent statement and explicit typecast for good measure. Calculation verified and effect of patch tested - reduced never below 1 byte per 64 seconds afterwards, i.e. not allowing divide-by-zero. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [S390]: Fix build on 31-bit.David S. Miller2007-04-26
| | | | | | | | | | | | Allow s390 to properly override the generic __div64_32() implementation by: 1) Using obj-y for div64.o in s390's makefile instead of lib-y 2) Adding the weak attribute to the generic implementation. Signed-off-by: David S. Miller <davem@davemloft.net>
* [SK_BUFF]: Fix missing offset adjustment in skb_copy_expandPatrick McHardy2007-04-26
| | | | | | | | skb_copy_expand changes the headroom, so it needs to adjust the header offsets by the difference between the old and the new value. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: loopback driver can use loopback_dev integrated net_device_statsEric Dumazet2007-04-26
| | | | | | | | | | Rusty added a new 'stats' field to struct net_device. loopback driver can use it instead of declaring another struct net_device_stats This saves some memory. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* bridge: check kmem_cache_create() errorAkinobu Mita2007-04-26
| | | | | | | | This patch checks kmem_cache_create() error and aborts loading module on failure. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
* bridge: allow changing hardware address to any valid addressStephen Hemminger2007-04-26
| | | | | | | For case of bridging pseudo devices, the get created/destroyed (Xen) need to allow setting address to any valid value. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
* bridge: change when netlink events go to STPStephen Hemminger2007-04-26
| | | | | | | Need to tell STP daemon about more events, like any time a device is added even when it is down. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
* bridge: add support for user mode STPStephen Hemminger2007-04-26
| | | | | | | | | | This patchset based on work by Aji_Srinivas@emc.com provides allows spanning tree to be controled from userspace. Like hotplug, it uses call_usermodehelper when spanning tree is enabled so there is no visible API change. If call to start usermode STP fails it falls back to existing kernel STP. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
* bridge: add sysfs hook to flush forwarding tableStephen Hemminger2007-04-26
| | | | | | | | | | The RSTP daemon needs to be able to flush all dynamic forwarding entries in the case of topology change. This is a temporary interface. It will change to a netlink interface before RSTP daemon is officially released. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
* bridge: simpler hash with saltStephen Hemminger2007-04-26
| | | | | | | | Instead of hashing the whole Ethernet address, it should be faster to just use the last 4 bytes. Add a random salt value to the hash to make it more difficult to construct worst case DoS hash chains. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
* bridge: don't route packets while learningStephen Hemminger2007-04-26
| | | | | | | | | While in the STP learning state, don't route packets; wait until forwarding delay has expired. The purpose of the forwarding delay is to detect loops in the network, and if a brouter started up and started forwarding, it could cause a flood. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
* bridge: eliminate call by referenceStephen Hemminger2007-04-26
| | | | | | | | Change the bridging hook to be simple function with return value rather than modifying the skb argument. This could generate better code and is cleaner. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
* [NET]: Treat CHECKSUM_PARTIAL as CHECKSUM_UNNECESSARYHerbert Xu2007-04-26
| | | | | | | | | When a transmitted packet is looped back directly, CHECKSUM_PARTIAL maps to the semantics of CHECKSUM_UNNECESSARY. Therefore we should treat it as such in the stack. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETDRV]: Perform missing csum_offset conversionsHerbert Xu2007-04-26
| | | | | | | | | | | | | | When csum_offset was introduced we did a conversion from csum to csum_offset where applicable. A couple of drivers were missed in this process. It was harmless to begin with since the two fields coincided. Now that we've made them different with the addition of csum_start, the missed drivers must be converted or they can't send packets out at all that require checksum offload. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Use csum_start offset instead of skb_transport_headerHerbert Xu2007-04-26
| | | | | | | | | | | | | | | | | | | | | | | | | The skb transport pointer is currently used to specify the start of the checksum region for transmit checksum offload. Unfortunately, the same pointer is also used during receive side processing. This creates a problem when we want to retransmit a received packet with partial checksums since the skb transport pointer would be overwritten. This patch solves this problem by creating a new 16-bit csum_start offset value to replace the skb transport header for the purpose of checksums. This offset is calculated from skb->head so that it does not have to change when skb->data changes. No extra space is required since csum_offset itself fits within a 16-bit word so we can use the other 16 bits for csum_start. For backwards compatibility, just before we push a packet with partial checksums off into the device driver, we set the skb transport header to what it would have been under the old scheme. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [XFRM]: beet: fix worst case header_len calculationPatrick McHardy2007-04-26
| | | | | | | | | | | | | esp_init_state doesn't account for the beet pseudo header in the header_len calculation, which may result in undersized skbs hitting xfrm4_beet_output, causing unnecessary reallocations in ip_finish_output2. The skbs should still always have enough room to avoid causing skb_under_panic in skb_push since we have at least 16 bytes available from LL_RESERVED_SPACE in xfrm_state_check_space. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [XFRM]: Optimize MTU calculationPatrick McHardy2007-04-26
| | | | | | | | | | | Replace the probing based MTU estimation, which usually takes 2-3 iterations to find a fitting value and may underestimate the MTU, by an exact calculation. Also fix underestimation of the XFRM trailer_len, which causes unnecessary reallocations. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [XFRM]: esp: fix skb_tail_pointer conversion bugPatrick McHardy2007-04-26
| | | | | | | | | | | | | | | Fix incorrect switch of "trailer" skb by "skb" during skb_tail_pointer conversion: - *(u8*)(trailer->tail - 1) = top_iph->protocol; + *(skb_tail_pointer(skb) - 1) = top_iph->protocol; - *(u8 *)(trailer->tail - 1) = *skb_network_header(skb); + *(skb_tail_pointer(skb) - 1) = *skb_network_header(skb); Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [SK_BUFF]: Fix missing offset adjustment in pskb_expand_headPatrick McHardy2007-04-26
| | | | | | | | Since we're increasing the headroom, the header offsets need to be increased by the same amount as well. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV6] FIB6RULE: Find source address during looking up route.YOSHIFUJI Hideaki2007-04-26
| | | | | | | | | | | When looking up route for destination with rules with source address restrictions, we may need to find a source address for the traffic if not given. Based on patch from Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>