Commit message  Author  Age
* bonding: send IPv6 neighbor advertisement on failoverBrian Haley2008-11-06
    This patch adds better IPv6 failover support for bonding devices,
    especially when in active-backup mode and there are only IPv6 addresses
    configured, as reported by Alex Sidorenko.

    - Creates a new file, drivers/net/bonding/bond_ipv6.c, for the
      IPv6-specific routines. Both regular bonds and VLANs over bonds
      are supported.

    - Adds a new tunable, num_unsol_na, to limit the number of unsolicited
      IPv6 Neighbor Advertisements that are sent on a failover event.
      Default is 1.

    - Creates two new IPv6 neighbor discovery functions:

          ndisc_build_skb()
          ndisc_send_skb()

      These were required to support VLANs, since we have to be able to
      add the VLAN id to the skb and ndisc_send_na() and friends shouldn't
      be asked to do this. These two routines are basically __ndisc_send()
      split into two pieces, in a slightly different order.

    - Updates Documentation/networking/bonding.txt and bumps the rev of
      bond support to 3.4.0.

    On failover, this new code will generate one packet:

    - An unsolicited IPv6 Neighbor Advertisement, which helps the switch
      learn that the address has moved to the new slave.

    Testing has shown that sending just the NA results in pretty good
    behavior when in active-backup mode; I saw no lost ping packets, for
    example.

    Signed-off-by: Brian Haley <brian.haley@hp.com>
    Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
    Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* pkt_sched: Fix qdisc len in qdisc_peek_dequeued()Jarek Poplawski2008-11-05
| | | | | | | | | | A packet dequeued and stored as gso_skb in qdisc_peek_dequeued() should be seen as part of the queue for sch->q.qlen queries until it's really dequeued with qdisc_dequeue_peeked(), so qlen needs additional updating in these functions. (Updating qstats.backlog shouldn't matter here.) Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
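    The accounting rule described above can be sketched in user space. This
    is only a toy model under stated assumptions: struct toyq and the toyq_*
    helpers are hypothetical names, packets are plain ints, and nothing here
    is the kernel's actual code. It shows a peek parking the head packet
    while the reported length still counts it, until the parked packet is
    really handed out.

```c
#include <assert.h>
#include <stddef.h>

#define TOYQ_CAP 8

/* Hypothetical toy queue; not the kernel's struct Qdisc. */
struct toyq {
	int ring[TOYQ_CAP];
	size_t head;      /* index of the oldest packet in ring[] */
	size_t stored;    /* packets physically held in ring[] */
	size_t qlen;      /* length reported to callers */
	int peeked;       /* packet parked by a peek */
	int has_peeked;   /* non-zero while 'peeked' is valid */
};

static int toyq_enqueue(struct toyq *q, int pkt)
{
	if (q->stored == TOYQ_CAP)
		return 0;
	q->ring[(q->head + q->stored) % TOYQ_CAP] = pkt;
	q->stored++;
	q->qlen++;
	return 1;
}

/* Plain dequeue from the ring: this is where qlen normally drops. */
static int toyq_raw_dequeue(struct toyq *q, int *pkt)
{
	if (q->stored == 0)
		return 0;
	*pkt = q->ring[q->head];
	q->head = (q->head + 1) % TOYQ_CAP;
	q->stored--;
	q->qlen--;
	return 1;
}

/* Peek: park the head packet, but keep counting it in qlen. */
static int toyq_peek(struct toyq *q, int *pkt)
{
	if (!q->has_peeked && toyq_raw_dequeue(q, &q->peeked)) {
		q->has_peeked = 1;
		q->qlen++;	/* still part of the queue */
	}
	if (q->has_peeked)
		*pkt = q->peeked;
	return q->has_peeked;
}

/* Hand out the parked packet if there is one; only now does qlen drop. */
static int toyq_dequeue_peeked(struct toyq *q, int *pkt)
{
	if (q->has_peeked) {
		assert(q->qlen > 0);
		*pkt = q->peeked;
		q->has_peeked = 0;
		q->qlen--;
		return 1;
	}
	return toyq_raw_dequeue(q, pkt);
}
```

    In this model, a length query made between toyq_peek() and
    toyq_dequeue_peeked() still sees the parked packet.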
* net: Don't leak packets when a netns is going downEric W. Biederman2008-11-05
    I have been tracking for a while a case where, when the network
    namespace exits, the cleanup gets stuck in an endless process of:

    unregister_netdevice: waiting for lo to become free. Usage count = 3
    unregister_netdevice: waiting for lo to become free. Usage count = 3
    unregister_netdevice: waiting for lo to become free. Usage count = 3
    unregister_netdevice: waiting for lo to become free. Usage count = 3
    unregister_netdevice: waiting for lo to become free. Usage count = 3
    unregister_netdevice: waiting for lo to become free. Usage count = 3
    unregister_netdevice: waiting for lo to become free. Usage count = 3

    It turns out that if you listen on a multicast address, an unsubscribe
    packet is sent when the network device goes down. If you shut down
    the network namespace without carefully cleaning up, this can trigger
    the unsubscribe packet to be sent over the loopback interface while
    the network namespace is going down.

    All of which is fine, except when we drop the packet and forget to
    free it, leaking the skb and the dst entry attached to it. As it turns
    out, the dst entry holds a reference to the idev, which holds the dev
    and keeps everything from being cleaned up. Yuck!

    Fix my earlier thinko by adding the needed kfree_skb(), and everything
    cleans up beautifully.

    Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Guarantee the proper ordering of the loopback device.  Eric W. Biederman  2008-11-05
    I was recently hunting a bug that occurred in network namespace
    cleanup. In looking at the code it became apparent that we have, and
    will continue to have, cases where if we have anything going on in a
    network namespace there will be assumptions that the loopback device
    is present. Things like sending igmp unsubscribe messages when we
    bring down network devices invoke the routing code, which assumes
    that at least the loopback driver is present.

    Therefore, to avoid magic initcall ordering hackery that is hard to
    follow and hard to get right, insert a call to register the loopback
    device directly from net_dev_init(). This guarantees that the loopback
    device is the first device registered and the last network device to
    go away.

    Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* netns: Delete virtual interfaces during namespace cleanupEric W. Biederman2008-11-05
    When physical devices are inside a network namespace and that network
    namespace terminates, we cannot make them go away. We have to keep
    them, and moving them to the initial network namespace is the best we
    can do.

    For virtual devices left in a network namespace that is exiting, we
    have no need to preserve them, and we now have the infrastructure that
    allows us to delete them. So delete virtual devices when we exit a
    network namespace. This keeps the user-space cleanup needed after a
    network namespace exits much more tractable.

    Acked-by: Daniel Lezcano <dlezcano@fr.ibm.com>
    Acked-by: Pavel Emelyanov <xemul@openvz.org>
    Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* net: sk_free_datagram() should use sk_mem_reclaim_partial()Eric Dumazet2008-11-05
| | | | | | | | | | | | | | | | | | | | | | I noticed a contention on udp_memory_allocated on regular UDP applications. While tcp_memory_allocated is seldom used, it appears each incoming UDP frame is currently touching udp_memory_allocated when queued, and when received by application. One possible solution is to use sk_mem_reclaim_partial() instead of sk_mem_reclaim(), so that we keep a small reserve (less than one page) of memory for each UDP socket. We did something very similar on TCP side in commit 9993e7d313e80bdc005d09c7def91903e0068f07 ([TCP]: Do not purge sk_forward_alloc entirely in tcp_delack_timer()) A more complex solution would need to convert prot->memory_allocated to use a percpu_counter with batches of 64 or 128 pages. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
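    The effect of keeping a small per-socket reserve can be illustrated
    with a toy model. Everything below is an assumption made for
    illustration: the toy_* names, the 4096-byte quantum, and the exact
    reclaim thresholds are hypothetical and are not taken from the kernel
    source. The model only counts how often a shared counter is written
    when every frame is charged on receive and uncharged when the
    application reads it.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_QUANTUM 4096u	/* assumed accounting unit: one page */

/* Hypothetical per-socket state; not the kernel's struct sock. */
struct toy_sock {
	size_t forward_alloc;	/* bytes already reserved for this socket */
};

static long toy_shared_pages;		/* stand-in for the shared counter */
static unsigned long toy_shared_writes;	/* how often it was written */

/* Charge 'size' bytes, pulling whole pages from the shared pool if needed. */
static void toy_charge(struct toy_sock *sk, size_t size)
{
	if (sk->forward_alloc < size) {
		size_t pages = (size - sk->forward_alloc + TOY_QUANTUM - 1)
			       / TOY_QUANTUM;

		toy_shared_pages += (long)pages;
		toy_shared_writes++;
		sk->forward_alloc += pages * TOY_QUANTUM;
	}
	assert(sk->forward_alloc >= size);
	sk->forward_alloc -= size;
}

static void toy_uncharge(struct toy_sock *sk, size_t size)
{
	sk->forward_alloc += size;
}

/*
 * Full reclaim hands back whole pages as soon as one is available;
 * partial reclaim (assumed threshold) tolerates a reserve of one page.
 */
static void toy_reclaim(struct toy_sock *sk, int partial)
{
	size_t keep = partial ? TOY_QUANTUM : TOY_QUANTUM - 1;

	if (sk->forward_alloc > keep) {
		toy_shared_pages -= (long)(sk->forward_alloc / TOY_QUANTUM);
		toy_shared_writes++;
		sk->forward_alloc %= TOY_QUANTUM;
	}
}

/* Receive and consume 'frames' frames; return shared-counter writes. */
static unsigned long toy_run(int frames, size_t frame_size, int partial)
{
	struct toy_sock sk = { 0 };
	int i;

	toy_shared_pages = 0;
	toy_shared_writes = 0;
	for (i = 0; i < frames; i++) {
		toy_charge(&sk, frame_size);	/* frame queued */
		toy_uncharge(&sk, frame_size);	/* frame read by the app */
		toy_reclaim(&sk, partial);
	}
	return toy_shared_writes;
}
```

    In this model, full reclaim writes the shared counter twice per frame,
    while the partial variant writes it once and then works out of the
    per-socket reserve.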
* dccp: Cleanup routines for feature negotiationGerrit Renker2008-11-05
| | | | | | | | | | This inserts the required de-allocation routines for memory allocated by feature negotiation in the socket destructors, replacing dccp_feat_clean() in one instance. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
* dccp: Per-socket initialisation of feature negotiationGerrit Renker2008-11-05
| | | | | | | | | | | | | | | This provides feature-negotiation initialisation for both DCCP sockets and DCCP request_sockets, to support feature negotiation during connection setup. It also resolves a FIXME regarding the congestion control initialisation. Thanks to Wei Yongjun for help with the IPv6 side of this patch. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
* dccp: List management for new feature negotiationGerrit Renker2008-11-05
| | | | | | | | | | | This adds list initial fields and list management functions for the new feature negotiation implementation. Thanks to Arnaldo for suggestions and improvements. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Signed-off-by: David S. Miller <davem@davemloft.net>
* dccp: Implement lookup table for feature-negotiation informationGerrit Renker2008-11-05
| | | | | | | | | | | | A lookup table for feature-negotiation information, extracted from RFC 4340/42, is provided by this patch. All currently known features can be found in this table, along with their feature location, their default value, and type. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* dccp: Basic data structure for feature negotiationGerrit Renker2008-11-05
    This patch prepares for the new and extended feature-negotiation
    routines.

    The following feature-negotiation data structures are provided:
     * a container for the various (SP or NN) values,
     * symbolic state names to track feature states,
     * an entry struct which holds all current information together,
     * elementary functions to fill in and process these structures.

    Entry structs are arranged as FIFO for the following reason: RFC 4340
    specifies that if multiple options of the same type are present, they
    are processed in the order of their appearance in the packet; which
    means that this order needs to be preserved in the local data
    structure (the later insertion code also respects this order).

    The struct list_head has been chosen for the following reasons: the
    most frequent operations are
     * add new entry at tail (when receiving Change or setting socket
       options);
     * delete entry (when Confirm has been received);
     * deep copy of entire list (cloning from listening socket onto
       request socket).

    The NN value has been set to 64 bit, which is a currently sufficient
    upper limit (Sequence Window feature has 48 bit).

    Thanks to Arnaldo, who contributed the streamlined layout of the
    entry struct.

    Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
    Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
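    A rough user-space sketch of the layout described above follows. The
    names (union feat_val, struct feat_entry, struct feat_fifo and the
    feat_* helpers) are hypothetical, and a singly linked list with a tail
    pointer stands in for the kernel's struct list_head. It only
    illustrates the two points made in the message: one container for
    either kind of value, and tail insertion so that entries stay in order
    of arrival.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One container for either kind of feature value (hypothetical layout). */
union feat_val {
	uint64_t nn;		/* NN value: 64 bits cover the 48-bit case */
	struct {
		uint8_t *vec;	/* SP value: list of preferences */
		uint8_t len;
	} sp;
};

struct feat_entry {
	union feat_val val;
	uint8_t feat_num;	/* which feature this entry describes */
	struct feat_entry *next;
};

/* FIFO: new entries go to the tail, so arrival order is preserved. */
struct feat_fifo {
	struct feat_entry *head;
	struct feat_entry *tail;
};

/* Append an NN-valued entry at the tail; returns NULL on allocation failure. */
static struct feat_entry *feat_push_nn(struct feat_fifo *f, uint8_t feat_num,
				       uint64_t nn)
{
	struct feat_entry *e;

	assert(f != NULL);
	e = calloc(1, sizeof(*e));
	if (!e)
		return NULL;
	e->feat_num = feat_num;
	e->val.nn = nn;
	if (f->tail)
		f->tail->next = e;
	else
		f->head = e;
	f->tail = e;
	return e;
}

/* Free every entry and leave the FIFO empty. */
static void feat_purge(struct feat_fifo *f)
{
	while (f->head) {
		struct feat_entry *e = f->head;

		f->head = e->next;
		free(e);
	}
	f->tail = NULL;
}
```

    Walking from head to tail then visits entries in the order they were
    added, which is the property the FIFO arrangement is meant to keep.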
* net: #ifdef ->sk_securityAlexey Dobriyan2008-11-04
| | | | | Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/: Kill now superfluous ->last_rx stores.David S. Miller2008-11-04
| | | | | | | | The generic packet receive code takes care of setting netdev->last_rx when necessary, for the sake of the bonding ARP monitor. Signed-off-by: David S. Miller <davem@davemloft.net>
* netem: eliminate unneeded return valuesStephen Hemminger2008-11-04
| | | | | | | | All these individual parsing functions never return an error, so they can be void. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* drivers/net: Kill now superfluous ->last_rx stores.David S. Miller2008-11-04
| | | | | | | | | | | | | The generic packet receive code takes care of setting netdev->last_rx when necessary, for the sake of the bonding ARP monitor. Drivers need not do it any more. Some cases had to be skipped over because the drivers were making use of the ->last_rx value themselves. Signed-off-by: David S. Miller <davem@davemloft.net>
* net: remove two duplicated #includeJianjun Kong2008-11-03
| | | | | | | | Removed duplicated #include <rdma/ib_verbs.h> in net/9p/trans_rdma.c and #include <linux/thread_info.h> in net/socket.c Signed-off-by: Jianjun Kong <jianjun@zeuux.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: '&' reduxAlexey Dobriyan2008-11-03
    I want to compile out proc_* and sysctl_* handlers totally and stub
    them to NULL depending on config options; however, usage of & will
    prevent this, since taking the address of a NULL pointer will break
    compilation.

    So, drop & in front of every ->proc_handler and every ->strategy
    handler; it was never needed in fact.

    Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
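    The compilation problem can be reproduced in a few lines. This is a
    minimal sketch with made-up names (CONFIG_DEMO_SYSCTL, demo_handler,
    struct ctl_entry, call_entry are all hypothetical): once a handler is
    stubbed out with a macro that expands to NULL, a bare function name
    still works as an initializer, whereas "&demo_handler" would expand to
    "&NULL" and be rejected by the compiler.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*proc_handler_t)(int);

#ifdef CONFIG_DEMO_SYSCTL		/* hypothetical config switch */
static int demo_handler(int v)
{
	return v + 1;
}
#else
#define demo_handler NULL		/* handler compiled out */
#endif

/* Hypothetical table entry; not the kernel's struct ctl_table. */
struct ctl_entry {
	const char *name;
	proc_handler_t proc_handler;
};

static struct ctl_entry demo_table[] = {
	/*
	 * Writing "&demo_handler" here would become "&NULL" when the
	 * handler is stubbed out, which does not compile. The bare name
	 * works in both configurations.
	 */
	{ "demo", demo_handler },
};

/* Call the handler if one is present; -1 means "compiled out". */
static int call_entry(const struct ctl_entry *e, int v)
{
	assert(e != NULL);
	return e->proc_handler ? e->proc_handler(v) : -1;
}
```

    A function name already decays to a pointer to that function, so
    dropping the & changes nothing when the handler is present.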
* bonding, net: Move last_rx update into bonding recv logicJay Vosburgh2008-11-03
| | | | | | | | | | | | | The only user of the net_device->last_rx field is bonding. This patch adds a conditional update of last_rx to the bonding special logic in skb_bond_should_drop, causing last_rx to only be updated when the ARP monitor is running. This frees network device drivers from the necessity of updating last_rx, which can have cache line thrash issues. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: increase receive packet quantumStephen Hemminger2008-11-03
| | | | | | | | | | | This patch gets about 1.25% back on tbench regression. My change to NAPI for multiqueue support changed the time limit on network receive processing. Under sustained loads like tbench, this can cause the receiver to reschedule prematurely. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* printk: ipv4 address digits printed in reverse orderHarvey Harrison2008-11-03
| | | | | | | | | | | put_dec_trunc prints the digits in reverse order and is reversed inside number(). Continue using put_dec_trunc, but reverse each quad in ip4_addr_string. [Noticed by Julius Volz] Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
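    The per-quad reversal can be sketched as follows. This is a standalone
    illustration, not the kernel's code: put_dec_reversed() is a
    hypothetical stand-in for a helper that emits decimal digits least
    significant first, and ip4_to_string() is a made-up name for the
    formatting routine.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Emit the decimal digits of v, least significant first, and return a
 * pointer one past the last digit written (hypothetical helper).
 */
static char *put_dec_reversed(char *buf, unsigned int v)
{
	do {
		*buf++ = (char)('0' + v % 10);
		v /= 10;
	} while (v);
	return buf;
}

/*
 * Format addr as a dotted quad into out (at least 16 bytes) and return
 * the string length. Each quad is produced reversed in a scratch buffer
 * and copied out back to front, so the digits end up in reading order.
 */
static size_t ip4_to_string(char *out, const unsigned char addr[4])
{
	char *start = out;
	char tmp[3];		/* 255 needs at most three digits */
	int i;

	assert(out != NULL);
	for (i = 0; i < 4; i++) {
		char *end = put_dec_reversed(tmp, addr[i]);

		while (end != tmp)
			*out++ = *--end;
		if (i < 3)
			*out++ = '.';
	}
	*out = '\0';
	return (size_t)(out - start);
}
```

    Copying the scratch buffer forward instead of backward would print
    192 as 291, which is the kind of output the subject line describes.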
* IPVS: Remove supports_ipv6 scheduler flagJulius Volz2008-11-03
| | | | | | | | Remove the 'supports_ipv6' scheduler flag since all schedulers now support IPv6. Signed-off-by: Julius Volz <julius.volz@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* IPVS: Add IPv6 support to LBLC/LBLCR schedulersJulius Volz2008-11-03
| | | | | | | | | | | Add IPv6 support to LBLC and LBLCR schedulers. These were the last schedulers without IPv6 support, but we might want to keep the supports_ipv6 flag in the case of future schedulers without IPv6 support. Signed-off-by: Julius Volz <julius.volz@gmail.com> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* tg3: Update version to 3.95Matt Carlson2008-11-03
| | | | | | | | This patch updates the version to 3.95. Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* broadcom: Add support for BCM50610Matt Carlson2008-11-03
| | | | | | | | | This patch adds the BCM50610 to the list of phys supported by the broadcom driver. Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* broadcom: Refine expansion register access routineMatt Carlson2008-11-03
| | | | | | | | | This patch makes the expansion register access routines a little more formal. They will be used by the following bcm50610 support patch. Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* broadcom: Add flow control supportMatt Carlson2008-11-03
| | | | | | | | This patch adds flow control support to Broadcom phys. Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tg3: 5785 enhancementsMatt Carlson2008-11-03
| | | | | | | | This patch refines support for the 5785 device. Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tg3: Refine phylib supportMatt Carlson2008-11-03
    This patch refines the phylib support in the tg3 driver. The patch
    does the following things:

    * Rename tg3_mdio_config() to tg3_mdio_config_5785(). The 5785 will be
      the only device that will use it, so the name might as well reflect
      that.

    * Fix a memory leak if mdiobus_register() fails.

    * Add code to deal with phy device detection failures.

    * Add code to correct the supported list of phy features based on the
      MAC <=> PHY interface.

    Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
    Signed-off-by: Michael Chan <mchan@broadcom.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* tg3: Allow WOL for phylib controlled Broadcom physMatt Carlson2008-11-03
| | | | | | | | | | This patch allows WOL to be enabled for Broadcom phys under phylib control. The only exception is the AC131, which has a completely different register set. Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tg3: Refine power management and WOL codeMatt Carlson2008-11-03
| | | | | | | | | | | | | | | | | | | | | Commit 12dac0756d357325b107fe6ec24921ec38661839 ("tg3: adapt tg3 to use reworked PCI PM code") introduced the new PCI PM API to the tg3 driver. The patch was understandably conservative, so this patch elaborates on that work. The patch starts by creating a single point in tg3_set_power_state() to decide whether or not to enable WOL. The rest of the code in tg3_set_power_state() was then pivoted to use the result of this decision. The patch then makes sure the device is allowed to wakeup before reporting whether or not WOL is currently enabled. The final hunks of the patch consolidate where the WOL capability and WOL enabled flags are set to a single location. Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tg3: Move phylib report to end of tg3_init_oneMatt Carlson2008-11-03
    Currently, phylib reports appear with an eth%d prefix. Move the line
    after register_netdev() and place it alongside the other informative
    messages. Update nearby informative messages accordingly.

    Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
    Signed-off-by: Michael Chan <mchan@broadcom.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* tg3: Do not enable APE on bcm5700Matt Carlson2008-11-03
| | | | | | | | | | | With older versions of the NVRAM format, the driver may mistakenly determine that APE is enabled. Make sure this doesn't happen by restricting the ENABLE_APE check to devices known to have more recent NVRAM image formats. Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tg3: Reclaim TG3_FLG3_5761_5784_AX_FIXES flagMatt Carlson2008-11-03
    This patch reclaims the TG3_FLG3_5761_5784_AX_FIXES flag. It is only
    used twice, in non-fast paths. This patch also consolidates some other
    places where specific 5784 AX chip revisions can be generalized.

    Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
    Signed-off-by: Michael Chan <mchan@broadcom.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* tg3: Preserve LAA when device control is releasedMatt Carlson2008-11-03
| | | | | | | | | | | This patch moves the __tg3_set_mac_addr() function earlier in the file listing, to avoid a function prototype, and calls the function to restore the LAA after a driver unload chip reset. With this code in place, the administrator can wake the machine using the LAA. Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tg3: Preserve DASH connectivity when WOL enabledMatt Carlson2008-11-03
    DASH firmware runs on the APE side of the chip, but it requires a few
    MAC registers to be programmed correctly. When WOL is enabled and
    management firmware is disabled, incoming packets are evaluated and
    discarded at the chip's rule processor. When management firmware is
    enabled, the hardware must be informed that there are agents further
    up the stack that still use the incoming frames. Normally management
    firmware will configure the MAC correctly on its own, but there can be
    cases where the setting could get clobbered by the driver. The first
    hunk of this patch preserves this setting.

    The second hunk of this patch wipes out the driver present signature
    of the APE memory space. By doing so, the DASH firmware can assume
    driver absent behavior.

    Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
    Signed-off-by: Michael Chan <mchan@broadcom.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* tg3: Use pci_ioremap_bar()Matt Carlson2008-11-03
| | | | | | | | | | | | | This patch replaces the existing APE register mapping code with a call to pci_ioremap_bar(). The code that maps the main device register space did not undergo a similar change because the information derived from the pci_resource_start() and pci_resource_len() is still used to populate the (optional) mem_start and mem_end netdevice members. Replace hardcoded constants where appropriate. Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tg3: Add 5761S supportMatt Carlson2008-11-03
| | | | | | | | This patch adds support for the 5761S chip variants. Signed-off-by: Matt Carlson <mcarlson@broadcom.com> Signed-off-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* pkt_sched: sch_generic: Kfree gso_skb in qdisc_reset()Jarek Poplawski2008-11-03
| | | | | | | | | | | Since gso_skb is re-used for qdisc_peek_dequeued(), and this skb is counted in the qdisc->q.qlen, it has to be kfreed during qdisc_reset() when qlen is zeroed. With help from David S. Miller <davem@davemloft.net> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: clean up net/ipv4/tcp_ipv4.cJianjun Kong2008-11-03
| | | | | Signed-off-by: Jianjun Kong <jianjun@zeuux.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: clean up net/ipv4/devinet.cJianjun Kong2008-11-03
| | | | | Signed-off-by: Jianjun Kong <jianjun@zeuux.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: clean up net/ipv4/pararp.cJianjun Kong2008-11-03
| | | | | Signed-off-by: Jianjun Kong <jianjun@zeuux.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: clean up net/ipv4/ip_fragment.c tcp_timer.c ip_input.cJianjun Kong2008-11-03
| | | | | Signed-off-by: Jianjun Kong <jianjun@zeuux.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: clean up net/ipv4/ipmr.cJianjun Kong2008-11-03
| | | | | Signed-off-by: Jianjun Kong <jianjun@zeuux.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: clean up net/ipv4/ip_sockglue.c tcp_output.cJianjun Kong2008-11-03
| | | | | Signed-off-by: Jianjun Kong <jianjun@zeuux.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: clean up net/ipv4/igmp.cJianjun Kong2008-11-03
| | | | | Signed-off-by: Jianjun Kong <jianjun@zeuux.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: clean up net/ipv4/fib_frontend.c fib_hash.c ip_gre.cJianjun Kong2008-11-03
| | | | | Signed-off-by: Jianjun Kong <jianjun@zeuux.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: clean up net/ipv4/ipip.c raw.c tcp.c tcp_minisocks.c tcp_yeah.c xfrm4_policy.c  Jianjun Kong  2008-11-03
    Signed-off-by: Jianjun Kong <jianjun@zeuux.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* net: clean up net/ipv4/ah4.c esp4.c fib_semantics.c inet_connection_sock.c inetpeer.c ip_output.c  Jianjun Kong  2008-11-03
    Signed-off-by: Jianjun Kong <jianjun@zeuux.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* sunrpc: Fix build warning due to typo in %pI4 format changes.David S. Miller2008-11-03
| | | | | | Noticed by Stephen Hemminger. Signed-off-by: David S. Miller <davem@davemloft.net>
* IPVS: Add IPv6 support to SH and DH schedulersJulius Volz2008-11-03
| | | | | | | | | | Add IPv6 support to SH and DH schedulers. I hope this simple IPv6 address hashing is good enough. The 128 bit are just XORed into 32 before hashing them like an IPv4 address. Signed-off-by: Julius Volz <julius.volz@gmail.com> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
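    The folding step described above is small enough to sketch. Assume the
    128-bit address is available as four 32-bit words; fold_ipv6() and
    hash_bucket() are hypothetical names, and the multiplicative hash used
    to pick a bucket is an assumption for illustration rather than a
    statement about what the schedulers do.

```c
#include <assert.h>
#include <stdint.h>

/* XOR the four 32-bit words of an IPv6 address down to one 32-bit key. */
static uint32_t fold_ipv6(const uint32_t addr[4])
{
	return addr[0] ^ addr[1] ^ addr[2] ^ addr[3];
}

/*
 * Map a 32-bit key to a bucket in a table of 2^bits entries using a
 * multiplicative hash (assumed here; any 32-bit hash would do).
 */
static uint32_t hash_bucket(uint32_t key, unsigned int bits)
{
	assert(bits >= 1 && bits <= 32);
	return (uint32_t)(key * UINT32_C(2654435761)) >> (32 - bits);
}
```

    One consequence of plain XOR folding is that words which repeat cancel
    out: an address whose four words are a, a, b, b folds to zero, so such
    addresses all land in the same bucket. That is the sort of weakness
    the "good enough" remark in the message hedges against.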