aboutsummaryrefslogtreecommitdiffstats
path: root/net/ipv4
Commit message (Collapse)AuthorAge
...
| * | | | | | TCPCT part 1e: implement socket option TCP_COOKIE_TRANSACTIONSWilliam Allen Simpson2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide per socket control of the TCP cookie option and SYN/SYNACK data. This is a straightforward re-implementation of an earlier (year-old) patch that no longer applies cleanly, with permission of the original author (Adam Langley): http://thread.gmane.org/gmane.linux.network/102586 The principle difference is using a TCP option to carry the cookie nonce, instead of a user configured offset in the data. Allocations have been rearranged to avoid requiring GFP_ATOMIC. Requires: net: TCP_MSS_DEFAULT, TCP_MSS_DESIRED TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS TCPCT part 1d: define TCP cookie option, extend existing struct's Signed-off-by: William.Allen.Simpson@gmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | TCPCT part 1d: define TCP cookie option, extend existing struct'sWilliam Allen Simpson2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Data structures are carefully composed to require minimal additions. For example, the struct tcp_options_received cookie_plus variable fits between existing 16-bit and 8-bit variables, requiring no additional space (taking alignment into consideration). There are no additions to tcp_request_sock, and only 1 pointer in tcp_sock. This is a significantly revised implementation of an earlier (year-old) patch that no longer applies cleanly, with permission of the original author (Adam Langley): http://thread.gmane.org/gmane.linux.network/102586 The principle difference is using a TCP option to carry the cookie nonce, instead of a user configured offset in the data. This is more flexible and less subject to user configuration error. Such a cookie option has been suggested for many years, and is also useful without SYN data, allowing several related concepts to use the same extension option. "Re: SYN floods (was: does history repeat itself?)", September 9, 1996. http://www.merit.net/mail.archives/nanog/1996-09/msg00235.html "Re: what a new TCP header might look like", May 12, 1998. ftp://ftp.isi.edu/end2end/end2end-interest-1998.mail These functions will also be used in subsequent patches that implement additional features. Requires: TCPCT part 1a: add request_values parameter for sending SYNACK TCPCT part 1b: generate Responder Cookie secret TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONS Signed-off-by: William.Allen.Simpson@gmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | TCPCT part 1c: sysctl_tcp_cookie_size, socket option TCP_COOKIE_TRANSACTIONSWilliam Allen Simpson2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Define sysctl (tcp_cookie_size) to turn on and off the cookie option default globally, instead of a compiled configuration option. Define per socket option (TCP_COOKIE_TRANSACTIONS) for setting constant data values, retrieving variable cookie values, and other facilities. Move inline tcp_clear_options() unchanged from net/tcp.h to linux/tcp.h, near its corresponding struct tcp_options_received (prior to changes). This is a straightforward re-implementation of an earlier (year-old) patch that no longer applies cleanly, with permission of the original author (Adam Langley): http://thread.gmane.org/gmane.linux.network/102586 These functions will also be used in subsequent patches that implement additional features. Requires: net: TCP_MSS_DEFAULT, TCP_MSS_DESIRED Signed-off-by: William.Allen.Simpson@gmail.com Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | TCPCT part 1b: generate Responder Cookie secretWilliam Allen Simpson2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Define (missing) hash message size for SHA1. Define hashing size constants specific to TCP cookies. Add new function: tcp_cookie_generator(). Maintain global secret values for tcp_cookie_generator(). This is a significantly revised implementation of earlier (15-year-old) Photuris [RFC-2522] code for the KA9Q cooperative multitasking platform. Linux RCU technique appears to be well-suited to this application, though neither of the circular queue items are freed. These functions will also be used in subsequent patches that implement additional features. Signed-off-by: William.Allen.Simpson@gmail.com Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | TCPCT part 1a: add request_values parameter for sending SYNACKWilliam Allen Simpson2009-12-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add optional function parameters associated with sending SYNACK. These parameters are not needed after sending SYNACK, and are not used for retransmission. Avoids extending struct tcp_request_sock, and avoids allocating kernel memory. Also affects DCCP as it uses common struct request_sock_ops, but this parameter is currently reserved for future use. Signed-off-by: William.Allen.Simpson@gmail.com Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | Merge branch 'master' of ↵David S. Miller2009-12-02
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/mac80211/ht.c
| | * | | | | | ip_fragment: also adjust skb->truesize for packets not owned by a socketPatrick McHardy2009-12-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a large packet gets reassembled by ip_defrag(), the head skb accounts for all the fragments in skb->truesize. If this packet is refragmented again, skb->truesize is not re-adjusted to reflect only the head size since its not owned by a socket. If the head fragment then gets recycled and reused for another received fragment, it might exceed the defragmentation limits due to its large truesize value. skb_recycle_check() explicitly checks for linear skbs, so any recycled skb should reflect its true size in skb->truesize. Change ip_fragment() to also adjust the truesize value of skbs not owned by a socket. Reported-and-tested-by: Ben Menchaca <ben@bigfootnetworks.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | | | | tcp: tcp_disconnect() should clear window_clampEric Dumazet2009-11-30
| | | |/ / / / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NFS can reuse its TCP socket after calling tcp_disconnect(). We noticed window scaling was not negotiated in SYN packet of next connection request. Fix is to clear tp->window_clamp in tcp_disconnect(). Reported-by: Krzysztof Oledzki <ole@ans.pl> Tested-by: Krzysztof Oledzki <ole@ans.pl> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | net: Simplify ipip pernet operations.Eric W. Biederman2009-12-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Take advantage of the new pernet automatic storage management, and stop using compatibility network namespace functions. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | net: Simplify ip_gre pernet operations.Eric W. Biederman2009-12-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Take advantage of the new pernet automatic storage management, and stop using compatibility network namespace functions. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | net: NETDEV_UNREGISTER_PERNET -> NETDEV_UNREGISTER_BATCHEric W. Biederman2009-12-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The motivation for an additional notifier in batched netdevice notification (rt_do_flush) only needs to be called once per batch not once per namespace. For further batching improvements I need a guarantee that the netdevices are unregistered in order allowing me to unregister an all of the network devices in a network namespace at the same time with the guarantee that the loopback device is really and truly unregistered last. Additionally it appears that we moved the route cache flush after the final synchronize_net, which seems wrong and there was no explanation. So I have restored the original location of the final synchronize_net. Cc: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | net: Move && and || to end of previous lineJoe Perches2009-11-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Not including net/atm/ Compiled tested x86 allyesconfig only Added a > 80 column line or two, which I ignored. Existing checkpatch plaints willfully, cheerfully ignored. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | xfrm: Use the user specified truncation length in ESP and AHMartin Willi2009-11-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of using the hardcoded truncation for authentication algorithms, use the truncation length specified on xfrm_state. Signed-off-by: Martin Willi <martin@strongswan.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | net: convert /proc/net/rt_acct to seq_fileAlexey Dobriyan2009-11-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Rewrite statistics accumulation to be in terms of structure fields, not raw u32 additions. Keep them in same order, though. This is the last user of create_proc_read_entry() in net/, please NAK all new ones as well as all new ->write_proc, ->read_proc and create_proc_entry() users. Cc me if there are problems. :-) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | net: use net_eq to compare netsOctavian Purdila2009-11-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Generated with the following semantic patch @@ struct net *n1; struct net *n2; @@ - n1 == n2 + net_eq(n1, n2) @@ struct net *n1; struct net *n2; @@ - n1 != n2 + !net_eq(n1, n2) applied over {include,net,drivers/net}. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | net/ipv4: Move && and || to end of previous lineJoe Perches2009-11-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On Sun, 2009-11-22 at 16:31 -0800, David Miller wrote: > It should be of the form: > if (x && > y) > > or: > if (x && y) > > Fix patches, rather than complaints, for existing cases where things > do not follow this pattern are certainly welcome. Also collapsed some multiple tabs to single space. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | tcp: Don't make syn cookies initial setting depend on CONFIG_SYSCTLDavid S. Miller2009-11-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | That's extremely non-intuitive, noticed by William Allen Simpson. And let's make the default be on, it's been suggested by a lot of people so we'll give it a try. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | netns: net_identifiers should be read_mostlyEric Dumazet2009-11-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | ipv4: factorize cache clearing for batched unregister operationsOctavian Purdila2009-11-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | Merge branch 'master' of ↵David S. Miller2009-11-17
| |\ \ \ \ \ \ | | | |_|_|_|/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/can/Kconfig
| * | | | | | inetpeer: Optimize inet_getid()Eric Dumazet2009-11-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While investigating for network latencies, I found inet_getid() was a contention point for some workloads, as inet_peer_idlock is shared by all inet_getid() users regardless of peers. One way to fix this is to make ip_id_count an atomic_t instead of __u16, and use atomic_add_return(). In order to keep sizeof(struct inet_peer) = 64 on 64bit arches tcp_ts_stamp is also converted to __u32 instead of "unsigned long". Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | ipv4: speedup inet_dump_ifaddr()Eric Dumazet2009-11-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stephen Hemminger a écrit : > On Thu, 12 Nov 2009 15:11:36 +0100 > Eric Dumazet <eric.dumazet@gmail.com> wrote: > >> When handling large number of netdevices, inet_dump_ifaddr() >> is very slow because it has O(N^2) complexity. >> >> Instead of scanning one single list, we can use the NETDEV_HASHENTRIES >> sub lists of the dev_index hash table, and RCU lookups. >> >> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > > You might be able to make RCU critical section smaller by moving > it into loop. > Indeed. But we dump at most one skb (<= 8192 bytes ?), so rcu_read_lock holding time is small, unless we meet many netdevices without addresses. I wonder if its really common... Thanks [PATCH net-next-2.6] ipv4: speedup inet_dump_ifaddr() When handling large number of netdevices, inet_dump_ifaddr() is very slow because it has O(N2) complexity. Instead of scanning one single list, we can use the NETDEV_HASHENTRIES sub lists of the dev_index hash table, and RCU lookups. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | igmp: Use next_net_device_rcu()Eric Dumazet2009-11-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to use next_det_device_rcu() in RCU protected section. We also can avoid in_dev_get()/in_dev_put() overhead (code size mainly) in rcu_read_lock() sections. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | net: TCP_MSS_DEFAULT, TCP_MSS_DESIREDWilliam Allen Simpson2009-11-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Define two symbols needed in both kernel and user space. Remove old (somewhat incorrect) kernel variant that wasn't used in most cases. Default should apply to both RMSS and SMSS (RFC2581). Replace numeric constants with defined symbols. Stand-alone patch, originally developed for TCPCT. Signed-off-by: William.Allen.Simpson@gmail.com Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | IPV4: use rcu to walk list of devices in IGMPstephen hemminger2009-11-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This also needs to be optimized for large number of devices. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | udp: bind() optimisationEric Dumazet2009-11-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | UDP bind() can be O(N^2) in some pathological cases. Thanks to secondary hash tables, we can make it O(N) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | Merge branch 'master' of ↵David S. Miller2009-11-09
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/can/usb/ems_usb.c
| * | | | | | | udp: multicast RX should increment SNMP/sk_drops counter in allocation failuresEric Dumazet2009-11-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When skb_clone() fails, we should increment sk_drops and SNMP counters. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | ipv4: udp: Optimise multicast receptionEric Dumazet2009-11-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | UDP multicast rx path is a bit complex and can hold a spinlock for a long time. Using a small (32 or 64 entries) stack of socket pointers can help to perform expensive operations (skb_clone(), udp_queue_rcv_skb()) outside of the lock, in most cases. It's also a base for a future RCU conversion of multicast recption. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Lucian Adrian Grijincu <lgrijincu@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | ipv4: udp: optimize unicast RX pathEric Dumazet2009-11-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We first locate the (local port) hash chain head If few sockets are in this chain, we proceed with previous lookup algo. If too many sockets are listed, we take a look at the secondary (port, address) hash chain we added in previous patch. We choose the shortest chain and proceed with a RCU lookup on the elected chain. But, if we chose (port, address) chain, and fail to find a socket on given address, we must try another lookup on (port, INADDR_ANY) chain to find socket not bound to a particular IP. -> No extra cost for typical setups, where the first lookup will probabbly be performed. RCU lookups everywhere, we dont acquire spinlock. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | udp: secondary hash on (local port, local address)Eric Dumazet2009-11-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Extends udp_table to contain a secondary hash table. socket anchor for this second hash is free, because UDP doesnt use skc_bind_node : We define an union to hold both skc_bind_node & a new hlist_nulls_node udp_portaddr_node udp_lib_get_port() inserts sockets into second hash chain (additional cost of one atomic op) udp_lib_unhash() deletes socket from second hash chain (additional cost of one atomic op) Note : No spinlock lockdep annotation is needed, because lock for the secondary hash chain is always get after lock for primary hash chain. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | udp: split sk_hash into two u16 hashesEric Dumazet2009-11-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Union sk_hash with two u16 hashes for udp (no extra memory taken) One 16 bits hash on (local port) value (the previous udp 'hash') One 16 bits hash on (local address, local port) values, initialized but not yet used. This second hash is using jenkin hash for better distribution. Because the 'port' is xored later, a partial hash is performed on local address + net_hash_mix(net) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | udp: add a counter into udp_hslotEric Dumazet2009-11-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds a counter in udp_hslot to keep an accurate count of sockets present in chain. This will permit to upcoming UDP lookup algo to chose the shortest chain when secondary hash is added. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | net: Support specifying the network namespace upon device creation.Eric W. Biederman2009-11-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no good reason to not support userspace specifying the network namespace during device creation, and it makes it easier to create a network device and pass it to a child network namespace with a well known name. We have to be careful to ensure that the target network namespace for the new device exists through the life of the call. To keep that logic clear I have factored out the network namespace grabbing logic into rtnl_link_get_net. In addtion we need to continue to pass the source network namespace to the rtnl_link_ops.newlink method so that we can find the base device source network namespace. Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
| * | | | | | | Merge branch 'master' of ↵David S. Miller2009-11-06
| |\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/usb/cdc_ether.c All CDC ethernet devices of type USB_CLASS_COMM need to use '&mbm_info'. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | ip_frag: dont touch device refcountEric Dumazet2009-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When sending fragmentation expiration ICMP V4/V6 messages, we can avoid touching device refcount, thanks to RCU Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | net: check kern before calling security subsystemEric Paris2009-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before calling capable(CAP_NET_RAW) check if this operations is on behalf of the kernel or on behalf of userspace. Do not do the security check if it is on behalf of the kernel. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | net: pass kern to net_proto_family create functionEric Paris2009-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The generic __sock_create function has a kern argument which allows the security system to make decisions based on if a socket is being created by the kernel or by userspace. This patch passes that flag to the net_proto_family specific create function, so it can do the same thing. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | net: drop capability from protocol definitionsEric Paris2009-11-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | struct can_proto had a capability field which wasn't ever used. It is dropped entirely. struct inet_protosw had a capability field which can be more clearly expressed in the code by just checking if sock->type = SOCK_RAW. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | tcp: Use defaults when no route options are availableGilad Ben-Yossef2009-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Trying to parse the option of a SYN packet that we have no route entry for should just use global wide defaults for route entry options. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Tested-by: Valdis.Kletnieks@vt.edu Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | tcp: Do not call IPv4 specific func in tcp_check_reqGilad Ben-Yossef2009-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calling IPv4 specific inet_csk_route_req in tcp_check_req is a bad idea and crashes machine on IPv6 connections, as reported by Valdis Kletnieks Also, all we are really interested in is the timestamp option in the header, so calling tcp_parse_options() with the "estab" set to false flag is an overkill as it tries to parse half a dozen other TCP options. We know whether timestamp should be enabled or not using data from request_sock. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Tested-by: Valdis.Kletnieks@vt.edu Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | net: net/ipv4/devinet.c cleanupsEric Dumazet2009-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As pointed by Stephen Rothwell, commit c6d14c84 added a warning : net/ipv4/devinet.c: In function 'inet_select_addr': net/ipv4/devinet.c:902: warning: label 'out' defined but not used delete unused 'out' label and do some cleanups as well Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | net: Introduce for_each_netdev_rcu() iteratorEric Dumazet2009-11-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds RCU management to the list of netdevices. Convert some for_each_netdev() users to RCU version, if it can avoid read_lock-ing dev_base_lock Ie: read_lock(&dev_base_loack); for_each_netdev(net, dev) some_action(); read_unlock(&dev_base_lock); becomes : rcu_read_lock(); for_each_netdev_rcu(net, dev) some_action(); rcu_read_unlock(); Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | icmp: icmp_send() can avoid a dev_put()Eric Dumazet2009-11-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can avoid touching device refcount in icmp_send(), using dev_get_by_index_rcu() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | ipv4: inetdev_by_index() switch to RCUEric Dumazet2009-11-02
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use dev_get_by_index_rcu() instead of __dev_get_by_index() and dev_base_lock rwlock Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | Merge branch 'master' of ↵David S. Miller2009-10-30
| |\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
| * | | | | | | | | net,socket: introduce DECLARE_SOCKADDR helper to catch overflow at build timeCyrill Gorcunov2009-10-29
| | |_|_|_|_|/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | proto_ops->getname implies copying protocol specific data into storage unit (particulary to __kernel_sockaddr_storage). So when we implement new protocol support we should keep such a detail in mind (which is easy to forget about). Lets introduce DECLARE_SOCKADDR helper which check if storage unit is not overfowed at build time. Eventually inet_getname is switched to use DECLARE_SOCKADDR (to show example of usage). Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | net: Cleanup redundant tests on unsignedroel kluin2009-10-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | optlen is unsigned so the `< 0' test is never true. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | Allow disabling of DSACK TCP option per routeGilad Ben-Yossef2009-10-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add and use no DSCAK bit in the features field. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: Ori Finkelman <ori@comsleep.com> Sigend-off-by: Yony Amit <yony@comsleep.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | | Allow to turn off TCP window scale opt per routeGilad Ben-Yossef2009-10-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add and use no window scale bit in the features field. Note that this is not the same as setting a window scale of 0 as would happen with window limit on route. Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com> Sigend-off-by: Ori Finkelman <ori@comsleep.com> Sigend-off-by: Yony Amit <yony@comsleep.com> Signed-off-by: David S. Miller <davem@davemloft.net>