aboutsummaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAge
* [NET]: Use %lx for netdev->features sysfs formatting.David S. Miller2005-05-29
| | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV6]: Clear up user copy warning in flowlabel code.David S. Miller2005-05-29
| | | | | | | | | We are intentionally ignoring the copy_to_user() value, make it clear to the compiler too. Noted by Jeff Garzik. Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Add ethtool support for NETIF_F_HW_CSUM.Jon Mason2005-05-29
| | | | | Signed-off-by: Jon Mason <jdmason@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Kill MULTIPATHHOLDROUTE flag.Pravin B. Shelar2005-05-29
| | | | | | | | | | | It cannot work properly, so just ignore it in drr and rr multipath algorithms just like the random multipath algorithm does. Suggested by Herbert Xu. Signed-off by: Pravin B. Shelar <pravins@calsoftinc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Primary and secondary addressesHarald Welte2005-05-29
| | | | | | | | | Add an option to make secondary IP addresses get promoted when primary IP addresses are removed from the device. It defaults to off to preserve existing behavior. Signed-off-by: Harald Welte <laforge@gnumonks.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [BRIDGE]: receive path optimizationStephen Hemminger2005-05-29
| | | | | | | | | | | | This improves the bridge local receive path by avoiding going through another softirq. The bridge receive path is already being called from a netif_receive_skb() there is no point in going through another receiveq round trip. Recursion is limited because bridge can never be a port of a bridge so handle_bridge() always returns. Signed-off-by: David S. Miller <davem@davemloft.net>
* [BRIDGE]: prevent bad forwarding table updatesStephen Hemminger2005-05-29
| | | | | | | | | | Avoid poisoning of the bridge forwarding table by frames that have been dropped by filtering. This prevents spoofed source addresses on hostile side of bridge from causing packet leakage, a small but possible security risk. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [BRIDGE]: set features based on enslaved devicesStephen Hemminger2005-05-29
| | | | | | | | | Make features of the bridge pseudo-device be a subset of the underlying devices. Motivated by Xen and others who use bridging to do failover. Signed-off-by: Catalin BOIE <catab at umrella.ro> Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [BRIDGE]: make dev->features unsignedStephen Hemminger2005-05-29
| | | | | | | | The features field in netdevice is really a bitmask, and bitmask's should be unsigned. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [BRIDGE]: features change notificationStephen Hemminger2005-05-29
| | | | | | | | | Resend of earlier patch (no changes) from Catalin used to provide device feature change notification. Signed-off-by: Catalin BOIE <catab at umbrella.ro> Acked-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TOKENRING]: net/802/tr.c: s/struct rif_cache_s/struct rif_cache/Alexey Dobriyan2005-05-26
| | | | | | | "_s" suffix is certainly of hungarian origin. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TOKENRING]: be'ify trh_hdr, trllc, rif_cache_sAlexey Dobriyan2005-05-26
| | | | | Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* From: Kazunori Miyazawa <kazunori@miyazawa.org>Hideaki YOSHIFUJI2005-05-26
| | | | | | | | | | [XFRM] Call dst_check() with appropriate cookie This fixes infinite loop issue with IPv6 tunnel mode. Signed-off-by: Kazunori Miyazawa <kazunori@miyazawa.org> Signed-off-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [PKT_SCHED] netem: allow random reordering (with fix)Stephen Hemminger2005-05-26
| | | | | | | | | | | | Here is a fixed up version of the reorder feature of netem. It is the same as the earlier patch plus with the bugfix from Julio merged in. Has expected backwards compatibility behaviour. Go ahead and merge this one, the TCP strangeness I was seeing was due to the reordering bug, and previous version of TSO patch. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [PKT_SCHED] netem: use only inner qdisc -- no private skbuff queueStephen Hemminger2005-05-26
| | | | | | | | | | | | | | | | | | | | Netem works better if there if packets are just queued in the inner discipline rather than having a separate delayed queue. Change to use the dequeue/requeue to peek like TBF does. By doing this potential qlen problems with the old method are avoided. The problems happened when the netem_run that moved packets from the inner discipline to the nested discipline failed (because inner queue was full). This happened in dequeue, so the effective qlen of the netem would be decreased (because of the drop), but there was no way to keep the outer qdisc (caller of netem dequeue) in sync. The problem window is still there since this patch doesn't address the issue of requeue failing in netem_dequeue, but that shouldn't happen since the sequence dequeue/requeue should always work. Long term correct fix is to implement qdisc->peek in all the qdisc's to allow for this (needed by several other qdisc's as well). Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [PKT_SCHED]: netem: reinsert for duplicationStephen Hemminger2005-05-26
| | | | | | | | | Handle duplication of packets in netem by re-inserting at top of qdisc tree. This avoid problems with qlen accounting with nested qdisc. This recursion requires no additional locking but will potentially increase stack depth. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV6]: Fix xfrm tunnel oops with large packetsHerbert Xu2005-05-23
| | | | | | Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Fix stretch ACK performance killer when doing ucopy.David S. Miller2005-05-23
| | | | | | | | | | | | | | | | When we are doing ucopy, we try to defer the ACK generation to cleanup_rbuf(). This works most of the time very well, but if the ucopy prequeue is large, this ACKing behavior kills performance. With TSO, it is possible to fill the prequeue so large that by the time the ACK is sent and gets back to the sender, most of the window has emptied of data and performance suffers significantly. This behavior does help in some cases, so we should think about re-enabling this trick in the future, using some kind of limit in order to avoid the bug case. Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETLINK]: Defer socket destruction a bitTommy S. Christensen2005-05-19
| | | | | | | | | | | | | | In netlink_broadcast() we're sending shared skb's to netlink listeners when possible (saves some copying). This is OK, since we hold the only other reference to the skb. However, this implies that we must drop our reference on the skb, before allowing a receiving socket to disappear. Otherwise, the socket buffer accounting is disrupted. Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETLINK]: Move broadcast skb_orphan to the skb_get path.Tommy S. Christensen2005-05-19
| | | | | | | | Cloned packets don't need the orphan call. Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETLINK]: Fix race with recvmsg().Tommy S. Christensen2005-05-19
| | | | | | | | | | | | | | | | | | | | | | This bug causes: assertion (!atomic_read(&sk->sk_rmem_alloc)) failed at net/netlink/af_netlink.c (122) What's happening is that: 1) The skb is sent to socket 1. 2) Someone does a recvmsg on socket 1 and drops the ref on the skb. Note that the rmalloc is not returned at this point since the skb is still referenced. 3) The same skb is now sent to socket 2. This version of the fix resurrects the skb_orphan call that was moved out, last time we had 'shared-skb troubles'. It is practically a no-op in the common case, but still prevents the possible race with recvmsg. Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPSEC]: Verify key payload in verify_one_algoHerbert Xu2005-05-19
| | | | | | | | We need to verify that the payload contains enough data so that attach_one_algo can copy alg_key_len bits from the payload. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPSEC]: Fixed alg_key_len usage in attach_one_algoHerbert Xu2005-05-19
| | | | | | | | | The variable alg_key_len is in bits and not bytes. The function attach_one_algo is currently using it as if it were in bytes. This causes it to read memory which may not be there. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NETFILTER]: Do not be clever about SKB ownership in ip_ct_gather_frags().David S. Miller2005-05-19
| | | | | | | Just do an skb_orphan() and be done with it. Based upon discussions with Herbert Xu on netdev. Signed-off-by: David S. Miller <davem@davemloft.net>
* [IP_VS]: Remove extra __ip_vs_conn_put() for incoming ICMP.Julian Anastasov2005-05-19
| | | | | | | | Remove extra __ip_vs_conn_put for incoming ICMP in direct routing mode. Mark de Vries reports that IPVS connections are not leaked anymore. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
* [AF_UNIX]: Use lookup_create().Christoph Hellwig2005-05-19
| | | | | | | | | | | currently it opencodes it, but that's in the way of chaning the lookup_hash interface. I'd prefer to disallow modular af_unix over exporting lookup_create, but I'll leave that to you. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4/IPV6] Ensure all frag_list members have NULL skHerbert Xu2005-05-19
| | | | | | | | | | | | | | | | | | | | | | | | | Having frag_list members which holds wmem of an sk leads to nightmares with partially cloned frag skb's. The reason is that once you unleash a skb with a frag_list that has individual sk ownerships into the stack you can never undo those ownerships safely as they may have been cloned by things like netfilter. Since we have to undo them in order to make skb_linearize happy this approach leads to a dead-end. So let's go the other way and make this an invariant: For any skb on a frag_list, skb->sk must be NULL. That is, the socket ownership always belongs to the head skb. It turns out that the implementation is actually pretty simple. The above invariant is actually violated in the following patch for a short duration inside ip_fragment. This is OK because the offending frag_list member is either destroyed at the end of the slow path without being sent anywhere, or it is detached from the frag_list before being sent. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [XFRM]: skb_cow_data() does not set proper owner for new skbs.Evgeniy Polyakov2005-05-19
| | | | | | | | | | | | | | | | | | | | | It looks like skb_cow_data() does not set proper owner for newly created skb. If we have several fragments for skb and some of them are shared(?) or cloned (like in async IPsec) there might be a situation when we require recreating skb and thus using skb_copy() for it. Newly created skb has neither a destructor nor a socket assotiated with it, which must be copied from the old skb. As far as I can see, current code sets destructor and socket for the first one skb only and uses truesize of the first skb only to increment sk_wmem_alloc value. If above "analysis" is correct then attached patch fixes that. Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [PATCH] update Ross Biro bouncing email addressJesper Juhl2005-05-05
| | | | | | | | | Ross moved. Remove the bad email address so people will find the correct one in ./CREDITS. Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [IPV4]: multipath_wrandom.c GPF fixesPatrick McHardy2005-05-05
| | | | | | | multipath_wrandom needs to use GFP_ATOMIC. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [ATALK]: Add alloc_ltalkdev().Christoph Hellwig2005-05-05
| | | | | | | | | this matches the API used by other link layer like ethernet or token ring. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV6]: Fix OOPS when using IPV6_ADDRFORMArnaldo Carvalho de Melo2005-05-05
| | | | | | | | | | This causes sk->sk_prot to change, which makes the socket release free the sock into the wrong SLAB cache. Fix this by introducing sk_prot_creator so that we always remember where the sock came from. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [DECNET]: Fix build after C99 netlink initializer change.Rafael J. Wysocki2005-05-05
| | | | Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge with master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6.gitDavid Woodhouse2005-05-05
|\
| * [PATCH] ISA DMA Kconfig fixes - part 4 (irda)Al Viro2005-05-04
| | | | | | | | | | | | | | | | | | | | | | | | * net/irda/irda_device.c::irda_setup_dma() made conditional on ISA_DMA_API (it uses helpers in question and irda is usable on platforms that don't have them at all - think of USB IRDA, for example). * irda drivers that depend on ISA DMA marked as dependent on ISA_DMA_API Signed-off-by: Al Viro <viro@parcelfarce.linux.theplanet.co.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
| * [PKT_SCHED]: Action repeatJ Hadi Salim2005-05-03
| | | | | | | | | | | | | | | | Long standing bug. Policy to repeat an action never worked. Signed-off-by: J Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [IPSEC]: Store idev entriesHerbert Xu2005-05-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I found a bug that stopped IPsec/IPv6 from working. About a month ago IPv6 started using rt6i_idev->dev on the cached socket dst entries. If the cached socket dst entry is IPsec, then rt6i_idev will be NULL. Since we want to look at the rt6i_idev of the original route in this case, the easiest fix is to store rt6i_idev in the IPsec dst entry just as we do for a number of other IPv6 route attributes. Unfortunately this means that we need some new code to handle the references to rt6i_idev. That's why this patch is bigger than it would otherwise be. I've also done the same thing for IPv4 since it is conceivable that once these idev attributes start getting used for accounting, we probably need to dereference them for IPv4 IPsec entries too. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [PKT_SCHED]: netetm: adjust parent qlen when duplicatingStephen Hemminger2005-05-03
| | | | | | | | | | | | | | | | | | Fix qlen underrun when doing duplication with netem. If netem is used as leaf discipline, then the parent needs to be tweaked when packets are duplicated. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [PKT_SCHED]: netetm: make qdisc friendly to outer disciplinesStephen Hemminger2005-05-03
| | | | | | | | | | | | | | | | | | Netem currently dumps packets into the queue when timer expires. This patch makes work by self-clocking (more like TBF). It fixes a bug when 0 delay is requested (only doing loss or duplication). Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [PKT_SCHED]: netetm: trap infinite loop hange on qlen underflowStephen Hemminger2005-05-03
| | | | | | | | | | | | | | | | | | Due to bugs in netem (fixed by later patches), it is possible to get qdisc qlen to go negative. If this happens the CPU ends up spinning forever in qdisc_run(). So add a BUG_ON() to trap it. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [NETFILTER]: Drop conntrack reference in ip_dev_loopback_xmit()Patrick McHardy2005-05-03
| | | | | | | | | | Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [NETFILTER]: Fix nf_debug_ip_local_deliver()Patrick McHardy2005-05-03
| | | | | | | | | | Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [NET]: Disable queueing when carrier is lost.Tommy S. Christensen2005-05-03
| | | | | | | | | | | | | | | | | | | | | | | | Some network drivers call netif_stop_queue() when detecting loss of carrier. This leads to packets being queued up at the qdisc level for an unbound period of time. In order to prevent this effect, the core networking stack will now cease to queue packets for any device, that is operationally down (i.e. the queue is flushed and disabled). Signed-off-by: Tommy S. Christensen <tommy.christensen@tpack.net> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [XFRM/RTNETLINK]: Decrement qlen properly in {xfrm_,rt}netlink_rcv().David S. Miller2005-05-03
| | | | | | | | | | | | | | | | | | | | | | If we free up a partially processed packet because it's skb->len dropped to zero, we need to decrement qlen because we are dropping out of the top-level loop so it will do the decrement for us. Spotted by Herbert Xu. Signed-off-by: David S. Miller <davem@davemloft.net>
| * [NETLINK]: Fix infinite loops in synchronous netlink changes.David S. Miller2005-05-03
| | | | | | | | | | | | | | The qlen should continue to decrement, even if we pop partially processed SKBs back onto the receive queue. Signed-off-by: David S. Miller <davem@davemloft.net>
| * [NETLINK]: Synchronous message processing.Herbert Xu2005-05-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let's recap the problem. The current asynchronous netlink kernel message processing is vulnerable to these attacks: 1) Hit and run: Attacker sends one or more messages and then exits before they're processed. This may confuse/disable the next netlink user that gets the netlink address of the attacker since it may receive the responses to the attacker's messages. Proposed solutions: a) Synchronous processing. b) Stream mode socket. c) Restrict/prohibit binding. 2) Starvation: Because various netlink rcv functions were written to not return until all messages have been processed on a socket, it is possible for these functions to execute for an arbitrarily long period of time. If this is successfully exploited it could also be used to hold rtnl forever. Proposed solutions: a) Synchronous processing. b) Stream mode socket. Firstly let's cross off solution c). It only solves the first problem and it has user-visible impacts. In particular, it'll break user space applications that expect to bind or communicate with specific netlink addresses (pid's). So we're left with a choice of synchronous processing versus SOCK_STREAM for netlink. For the moment I'm sticking with the synchronous approach as suggested by Alexey since it's simpler and I'd rather spend my time working on other things. However, it does have a number of deficiencies compared to the stream mode solution: 1) User-space to user-space netlink communication is still vulnerable. 2) Inefficient use of resources. This is especially true for rtnetlink since the lock is shared with other users such as networking drivers. The latter could hold the rtnl while communicating with hardware which causes the rtnetlink user to wait when it could be doing other things. 3) It is still possible to DoS all netlink users by flooding the kernel netlink receive queue. The attacker simply fills the receive socket with a single netlink message that fills up the entire queue. The attacker then continues to call sendmsg with the same message in a loop. Point 3) can be countered by retransmissions in user-space code, however it is pretty messy. In light of these problems (in particular, point 3), we should implement stream mode netlink at some point. In the mean time, here is a patch that implements synchronous processing. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [NETLINK]: cb_lock does not needs ref count on skHerbert Xu2005-05-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Here is a little optimisation for the cb_lock used by netlink_dump. While fixing that race earlier, I noticed that the reference count held by cb_lock is completely useless. The reason is that in order to obtain the protection of the reference count, you have to take the cb_lock. But the only way to take the cb_lock is through dereferencing the socket. That is, you must already possess a reference count on the socket before you can take advantage of the reference count held by cb_lock. As a corollary, we can remve the reference count held by the cb_lock. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [PKT_SCHED]: HTB: Drop packet when direct queue is fullAsim Shankar2005-05-03
| | | | | | | | | | | | | | | | | | | | htb_enqueue(): Free skb and return NET_XMIT_DROP if a packet is destined for the direct_queue but the direct_queue is full. (Before this: erroneously returned NET_XMIT_SUCCESS even though the packet was not enqueued) Signed-off-by: Asim Shankar <asimshankar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [TCP]: Optimize check in port-allocation code, v6 version.Folkert van Heusden2005-05-03
| | | | | | | | | | Signed-off-by: Folkert van Heusden <folkert@vanheusden.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [TCP]: Optimize check in port-allocation code.Folkert van Heusden2005-05-03
| | | | | | | | | | Signed-off-by: Folkert van Heusden <folkert@vanheusden.com> Signed-off-by: David S. Miller <davem@davemloft.net>