aboutsummaryrefslogtreecommitdiffstats
path: root/net
Commit message (Collapse)AuthorAge
* [IPV4]: Update icmp sysctl docs and disable broadcast ECHO/TIMESTAMP by defaultDavid S. Miller2005-10-03
| | | | | | | It's not a good idea to be smurf'able by default. The few people who need this can turn it on. Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Get rid of bogus __in_put_dev in pktgenHerbert Xu2005-10-03
| | | | | | | | This patch gets rid of a bogus __in_dev_put() in pktgen.c. This was spotted by Suzanne Wood. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Replace __in_dev_get with __in_dev_get_rcu/rtnlHerbert Xu2005-10-03
| | | | | | | | | | | | | | | | | | The following patch renames __in_dev_get() to __in_dev_get_rtnl() and introduces __in_dev_get_rcu() to cover the second case. 1) RCU with refcnt should use in_dev_get(). 2) RCU without refcnt should use __in_dev_get_rcu(). 3) All others must hold RTNL and use __in_dev_get_rtnl(). There is one exception in net/ipv4/route.c which is in fact a pre-existing race condition. I've marked it as such so that we remember to fix it. This patch is based on suggestions and prior work by Suzanne Wood and Paul McKenney. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV6]: Fix leak added by udp connect dst caching fix.David S. Miller2005-10-03
| | | | | | Based upon a patch from Mitsuru KANDA <mk@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV6]: Fix ipv6 fragment ID selection at slow pathYan Zheng2005-10-03
| | | | | Signed-Off-By: Yan Zheng <yanzheng@21cn.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [IPV4]: Fix "Proxy ARP seems broken"Herbert Xu2005-10-03
| | | | | | | | | | | | | | | | | | | | | | | | | Meelis Roos <mroos@linux.ee> wrote: > RK> My firewall setup relies on proxyarp working. However, with 2.6.14-rc3, > RK> it appears to be completely broken. The firewall is 212.18.232.186, > > Same here with some kernel between 14-rc2 and 14-rc3 - no reposnse to > ARP on a proxyarp gateway. Sorry, no exact revison and no more debugging > yet since it'a a production gateway. The breakage is caused by the change to use the CB area for flagging whether a packet has been queued due to proxy_delay. This area gets cleared every time arp_rcv gets called. Unfortunately packets delayed due to proxy_delay also go through arp_rcv when they are reprocessed. In fact, I can't think of a reason why delayed proxy packets should go through netfilter again at all. So the easiest solution is to bypass that and go straight to arp_process. This is essentially what would've happened before netfilter support was added to ARP. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Fix "sysctl_net.c:36: error: 'core_table' undeclared here"Russell King2005-10-03
| | | | | | | | | | | | | | | | | | | | | | | During the build for ARM machine type "fortunet", this error occurred: CC net/sysctl_net.o net/sysctl_net.c:36: error: 'core_table' undeclared here (not in a function) It appears that the following configuration settings cause this error due to a missing include: CONFIG_SYSCTL=y CONFIG_NET=y # CONFIG_INET is not set core_table appears to be declared in net/sock.h. if CONFIG_INET were defined, net/sock.h would have been included via: sysctl_net.c -> net/ip.h -> linux/ip.h -> net/sock.h so include it directly. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* [INET]: speedup inet (tcp/dccp) lookupsEric Dumazet2005-10-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Arnaldo and I agreed it could be applied now, because I have other pending patches depending on this one (Thank you Arnaldo) (The other important patch moves skc_refcnt in a separate cache line, so that the SMP/NUMA performance doesnt suffer from cache line ping pongs) 1) First some performance data : -------------------------------- tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established() The most time critical code is : sk_for_each(sk, node, &head->chain) { if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif)) goto hit; /* You sunk my battleship! */ } The sk_for_each() does use prefetch() hints but only the begining of "struct sock" is prefetched. As INET_MATCH first comparison uses inet_sk(__sk)->daddr, wich is far away from the begining of "struct sock", it has to bring into CPU cache cold cache line. Each iteration has to use at least 2 cache lines. This can be problematic if some chains are very long. 2) The goal ----------- The idea I had is to change things so that INET_MATCH() may return FALSE in 99% of cases only using the data already in the CPU cache, using one cache line per iteration. 3) Description of the patch --------------------------- Adds a new 'unsigned int skc_hash' field in 'struct sock_common', filling a 32 bits hole on 64 bits platform. struct sock_common { unsigned short skc_family; volatile unsigned char skc_state; unsigned char skc_reuse; int skc_bound_dev_if; struct hlist_node skc_node; struct hlist_node skc_bind_node; atomic_t skc_refcnt; + unsigned int skc_hash; struct proto *skc_prot; }; Store in this 32 bits field the full hash, not masked by (ehash_size - 1) Using this full hash as the first comparison done in INET_MATCH permits us immediatly skip the element without touching a second cache line in case of a miss. Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to sk_hash and tw_hash) already contains the slot number if we mask with (ehash_size - 1) File include/net/inet_hashtables.h 64 bits platforms : #define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\ (((__sk)->sk_hash == (__hash)) ((*((__u64 *)&(inet_sk(__sk)->daddr)))== (__cookie)) && \ ((*((__u32 *)&(inet_sk(__sk)->dport))) == (__ports)) && \ (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif)))) 32bits platforms: #define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\ (((__sk)->sk_hash == (__hash)) && \ (inet_sk(__sk)->daddr == (__saddr)) && \ (inet_sk(__sk)->rcv_saddr == (__daddr)) && \ (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif)))) - Adds a prefetch(head->chain.first) in __inet_lookup_established()/__tcp_v4_check_established() and __inet6_lookup_established()/__tcp_v6_check_established() and __dccp_v4_check_established() to bring into cache the first element of the list, before the {read|write}_lock(&head->lock); Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* [NET]: Fix packet timestamping.Herbert Xu2005-10-03
| | | | | | | | | | | | | | | | | | | I've found the problem in general. It affects any 64-bit architecture. The problem occurs when you change the system time. Suppose that when you boot your system clock is forward by a day. This gets recorded down in skb_tv_base. You then wind the clock back by a day. From that point onwards the offset will be negative which essentially overflows the 32-bit variables they're stored in. In fact, why don't we just store the real time stamp in those 32-bit variables? After all, we're not going to overflow for quite a while yet. When we do overflow, we'll need a better solution of course. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* [ATM]: [lec] reset retry counter when new arp issuedScott Talbert2005-09-29
| | | | | | From: Scott Talbert <scott.talbert@lmco.com> Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
* [ATM]: [lec] attempt to support cisco failoverScott Talbert2005-09-29
| | | | | | From: Scott Talbert <scott.talbert@lmco.com> Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Don't over-clamp window in tcp_clamp_window()Alexey Kuznetsov2005-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | From: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Handle better the case where the sender sends full sized frames initially, then moves to a mode where it trickles out small amounts of data at a time. This known problem is even mentioned in the comments above tcp_grow_window() in tcp_input.c, specifically: ... * The scheme does not work when sender sends good segments opening * window and then starts to feed us spagetti. But it should work * in common situations. Otherwise, we have to rely on queue collapsing. ... When the sender gives full sized frames, the "struct sk_buff" overhead from each packet is small. So we'll advertize a larger window. If the sender moves to a mode where small segments are sent, this ratio becomes tilted to the other extreme and we start overrunning the socket buffer space. tcp_clamp_window() tries to address this, but it's clamping of tp->window_clamp is a wee bit too aggressive for this particular case. Fix confirmed by Ion Badulescu. Signed-off-by: David S. Miller <davem@davemloft.net>
* [TCP]: Revert 6b251858d377196b8cea20e65cae60f584a42735David S. Miller2005-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | But retain the comment fix. Alexey Kuznetsov has explained the situation as follows: -------------------- I think the fix is incorrect. Look, the RFC function init_cwnd(mss) is not continuous: f.e. for mss=1095 it needs initial window 1095*4, but for mss=1096 it is 1096*3. We do not know exactly what mss sender used for calculations. If we advertised 1096 (and calculate initial window 3*1096), the sender could limit it to some value < 1096 and then it will need window his_mss*4 > 3*1096 to send initial burst. See? So, the honest function for inital rcv_wnd derived from tcp_init_cwnd() is: init_rcv_wnd(mss)= min { init_cwnd(mss1)*mss1 for mss1 <= mss } It is something sort of: if (mss < 1096) return mss*4; if (mss < 1096*2) return 1096*4; return mss*2; (I just scrablled a graph of piece of paper, it is difficult to see or to explain without this) I selected it differently giving more window than it is strictly required. Initial receive window must be large enough to allow sender following to the rfc (or just setting initial cwnd to 2) to send initial burst. But besides that it is arbitrary, so I decided to give slack space of one segment. Actually, the logic was: If mss is low/normal (<=ethernet), set window to receive more than initial burst allowed by rfc under the worst conditions i.e. mss*4. This gives slack space of 1 segment for ethernet frames. For msses slighlty more than ethernet frame, take 3. Try to give slack space of 1 frame again. If mss is huge, force 2*mss. No slack space. Value 1460*3 is really confusing. Minimal one is 1096*2, but besides that it is an arbitrary value. It was meant to be ~4096. 1460*3 is just the magic number from RFC, 1460*3 = 1095*4 is the magic :-), so that I guess hands typed this themselves. -------------------- Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2005-09-29
|\
| * [NET]: Fix reversed logic in eth_type_trans().David S. Miller2005-09-29
| | | | | | | | | | | | I got the second compare_eth_addr() test reversed, oops. Signed-off-by: David S. Miller <davem@davemloft.net>
| * [ATM]: fix bug in atm address list handlingMartin Whitaker2005-09-28
| | | | | | | | | | From: Martin Whitaker <atm@martin-whitaker.co.uk> Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
| * [ATM]: track and close listen sockets when sigd exitsChas Williams2005-09-28
| | | | | | | | Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
| * [ATM]: net/atm/ioctl.c: autoload pppoatm and br2684Roman Kagan2005-09-28
| | | | | | | | | | Signed-off-by: Roman Kagan <rkagan@mail.ru> Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil>
| * [TCP]: Fix init_cwnd calculations in tcp_select_initial_window()David S. Miller2005-09-28
| | | | | | | | | | | | | | Match it up to what RFC2414 really specifies. Noticed by Rick Jones. Signed-off-by: David S. Miller <davem@davemloft.net>
| * [APPLETALK]: Fix broadcast bug.Oliver Dawid2005-09-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | From: Oliver Dawid <oliver@helios.de> we found a bug in net/appletalk/ddp.c concerning broadcast packets. In kernel 2.4 it was working fine. The bug first occured 4 years ago when switching to new SNAP layer handling. This bug can be splitted up into a sending(1) and reception(2) problem: Sending(1) In kernel 2.4 broadcast packets were sent to a matching ethernet device and atalk_rcv() was called to receive it as "loopback" (so loopback packets were shortcutted and handled in DDP layer). When switching to the new SNAP structure, this shortcut was removed and the loopback packet was send to SNAP layer. The author forgot to replace the remote device pointer by the loopback device pointer before sending the packet to SNAP layer (by calling ddp_dl->request() ) therfor the packet was not sent back by underlying layers to ddp's atalk_rcv(). Reception(2) In atalk_rcv() a packet received by this loopback mechanism contains now the (rigth) loopback device pointer (in Kernel 2.4 it was the (wrong) remote ethernet device pointer) and therefor no matching socket will be found to deliver this packet to. Because a broadcast packet should be send to the first matching socket (as it is done in many other protocols (?)), we removed the network comparison in broadcast case. Below you will find a patch to correct this bug. Its diffed to kernel 2.6.14-rc1 Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [NET]: Slightly optimize ethernet address comparison.David S. Miller2005-09-27
| | | | | | | | | | | | | | | | | | We know the thing is at least 2-byte aligned, so take advantage of that instead of invoking memcmp() which results in truly horrifically inefficient code because it can't assume anything about alignment. Signed-off-by: David S. Miller <davem@davemloft.net>
| * [ROSE]: fix typo (regeistration)Alexey Dobriyan2005-09-27
| | | | | | | | | | Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [ROSE]: check rose_ndevs earlierAlexey Dobriyan2005-09-27
| | | | | | | | | | | | | | | | * Don't bother with proto registering if rose_ndevs is bad. * Make escape structure more coherent. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [ROSE]: return sane -E* from rose_proto_init()Alexey Dobriyan2005-09-27
| | | | | | | | | | Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [ROSE]: do proto_unregister() on exit pathsAlexey Dobriyan2005-09-27
| | | | | | | | | | Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [NET]: Fix module reference counts for loadable protocol modulesFrank Filz2005-09-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I have been experimenting with loadable protocol modules, and ran into several issues with module reference counting. The first issue was that __module_get failed at the BUG_ON check at the top of the routine (checking that my module reference count was not zero) when I created the first socket. When sk_alloc() is called, my module reference count was still 0. When I looked at why sctp didn't have this problem, I discovered that sctp creates a control socket during module init (when the module ref count is not 0), which keeps the reference count non-zero. This section has been updated to address the point Stephen raised about checking the return value of try_module_get(). The next problem arose when my socket init routine returned an error. This resulted in my module reference count being decremented below 0. My socket ops->release routine was also being called. The issue here is that sock_release() calls the ops->release routine and decrements the ref count if sock->ops is not NULL. Since the socket probably didn't get correctly initialized, this should not be done, so we will set sock->ops to NULL because we will not call try_module_get(). While searching for another bug, I also noticed that sys_accept() has a possibility of doing a module_put() when it did not do an __module_get so I re-ordered the call to security_socket_accept(). Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [NET]: Prefetch dev->qdisc_lock in dev_queue_xmit()Eric Dumazet2005-09-27
| | | | | | | | | | | | | | We know the lock is going to be taken. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * [NET]: Use non-recursive algorithm in skb_copy_datagram_iovec()Daniel Phillips2005-09-27
| | | | | | | | | | | | | | | | Use iteration instead of recursion. Fraglists within fraglists should never occur, so we BUG check this. Signed-off-by: Daniel Phillips <phillips@istop.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | [PATCH] proc_mkdir() should be used to create procfs directoriesAl Viro2005-09-29
|/ | | | | | | | A bunch of create_proc_dir_entry() calls creating directories had crept in since the last sweep; converted to proc_mkdir(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [NEIGH]: Add debugging check when adding timers.David S. Miller2005-09-27
| | | | | | | | If we double-add a neighbour entry timer, which should be impossible but has been reported, dump the current state of the entry so that we can debug this. Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge master.kernel.org:/pub/scm/linux/kernel/git/acme/llc-2.6David S. Miller2005-09-26
|\
| * [LLC]: fix llc_ui_recvmsg, making it behave like tcp_recvmsgArnaldo Carvalho de Melo2005-09-22
| | | | | | | | | | | | In fact it is an exact copy of the parts that makes sense to LLC :-) Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
| * [LLC]: Fix the accept pathArnaldo Carvalho de Melo2005-09-22
| | | | | | | | | | | | | | | | | | | | Borrowing the structure of TCP/IP for this. On the receive of new connections I was bh_lock_socking the _new_ sock, not the listening one, duh, now it survives the ssh connections storm I've been using to test this specific bug. Also fixes send side skb sock accounting. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
| * [LLC]: Fix sparse warningsArnaldo Carvalho de Melo2005-09-22
| | | | | | | | Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
| * [TR]: Set correct frame type for SNAP packetsJochen Friedrich2005-09-22
| | | | | | | | | | Signed-off-by: Jochen Friedrich <jochen@scram.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
| * [LLC]: Fix llc_fixup_skb() bugJochen Friedrich2005-09-22
| | | | | | | | | | | | | | | | llc_fixup_skb() had a bug dropping 3 bytes packets (like UA frames). Token ring doesn't pad these frames. Signed-off-by: Jochen Friedrich <jochen@scram.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
| * [LLC]: Fix for Bugzilla ticket #5157Jochen Friedrich2005-09-22
| | | | | | | | | | Signed-off-by: Jochen Friedrich <jochen@scram.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
| * [LLC]: Fix for Bugzilla ticket #5156Jochen Friedrich2005-09-22
| | | | | | | | | | Signed-off-by: Jochen Friedrich <jochen@scram.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
| * [LLC]: Use refcounting with struct llc_sapArnaldo Carvalho de Melo2005-09-22
| | | | | | | | | | Signed-off-by: Jochen Friedrich <jochen@scram.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
| * [LLC]: Do better struct sock accounting on skbsArnaldo Carvalho de Melo2005-09-22
| | | | | | | | | | Signed-off-by: Jochen Friedrich <jochen@scram.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
| * [LLC]: Use sk_wait_dataArnaldo Carvalho de Melo2005-09-22
| | | | | | | | | | Signed-off-by: Jochen Friedrich <jochen@scram.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
| * [LLC]: Use some more likely/unlikelyArnaldo Carvalho de Melo2005-09-22
| | | | | | | | | | Signed-off-by: Jochen Friedrich <jochen@scram.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
| * [LLC]: Add sysctl support for the LLC timeoutsArnaldo Carvalho de Melo2005-09-22
| | | | | | | | | | Signed-off-by: Jochen Friedrich <jochen@scram.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
| * [LLC]: Use the sk_wait_event primitiveArnaldo Carvalho de Melo2005-09-22
| | | | | | | | | | Signed-off-by: Jochen Friedrich <jochen@scram.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
| * [LLC]: Convert llc_ui_wait_for_ functions to use prepare_to_wait/finish_waitArnaldo Carvalho de Melo2005-09-22
| | | | | | | | | | | | | | And make it look more like the similar routines in the TCP/IP source code. Signed-off-by: Jochen Friedrich <jochen@scram.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
| * [LLC]: Remove unused functions from llc_c_ev.cArnaldo Carvalho de Melo2005-09-22
| | | | | | | | | | Signed-off-by: Jochen Friedrich <jochen@scram.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
| * [LLC]: Use const in llc_c_ev.cArnaldo Carvalho de Melo2005-09-22
| | | | | | | | | | Signed-off-by: Jochen Friedrich <jochen@scram.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
| * [LLC]: Help the compiler with likely/unlikely, saving some more bytesArnaldo Carvalho de Melo2005-09-22
| | | | | | | | | | Signed-off-by: Jochen Friedrich <jochen@scram.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
| * [LLC]: Mark llc_find_next_offset as __init, saving some more bytesArnaldo Carvalho de Melo2005-09-22
| | | | | | | | | | Signed-off-by: Jochen Friedrich <jochen@scram.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
| * [LLC]: Update comments for llc_ui_bind and llc_ui_autobind to match new ↵Arnaldo Carvalho de Melo2005-09-22
| | | | | | | | | | | | | | behaviour Signed-off-by: Jochen Friedrich <jochen@scram.de> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>