litmus-rt.git - The LITMUS^RT kernel.

	Commit message (Collapse)	Author	Age
*	[NETFILTER]: Fix ip6_table.c build with NETFILTER_DEBUG enabled.	Andrew Morton	2005-10-15
\| \| \| \| \|	Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[TCP]: Ratelimit debugging warning.	Herbert Xu	2005-10-13
\| \| \| \| \| \| \|	Better safe than sorry. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[NET]: Disable NET_SCH_CLK_CPU for SMP x86 hosts	Andi Kleen	2005-10-13
\| \| \| \| \| \| \| \| \| \|	Opterons with frequency scaling have fully unsynchronized TSCs running at different frequencies, so using TSCs there is not a good idea. Also some other x86 boxes have this problem. gettimeofday should be good enough, so just disable it. Signed-off-by: Andi Kleen <ak@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[NETFILTER]: Fix OOPSes on machines with discontiguous cpu numbering.	David S. Miller	2005-10-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Original patch by Harald Welte, with feedback from Herbert Xu and testing by Sébastien Bernard. EBTABLES, ARP tables, and IP/IP6 tables all assume that cpus are numbered linearly. That is not necessarily true. This patch fixes that up by calculating the largest possible cpu number, and allocating enough per-cpu structure space given that. Signed-off-by: David S. Miller <davem@davemloft.net>
*	[TCP]: Add code to help track down "BUG at net/ipv4/tcp_output.c:438!"	Herbert Xu	2005-10-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the second report of this bug. Unfortunately the first reporter hasn't been able to reproduce it since to provide more debugging info. So let's apply this patch for 2.6.14 to 1) Make this non-fatal. 2) Provide the info we need to track it down. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[BRIDGE]: fix race on bridge del if	Stephen Hemminger	2005-10-12
\| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes the RCU race on bridge delete interface. Basically, the network device has to be detached from the bridge in the first step (pre-RCU), rather than later. At that point, no more bridge traffic will come in, and the other code will not think that network device is part of a bridge. This should also fix the XEN test problems. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[TWSK]: Grab the module refcount for timewait sockets	Arnaldo Carvalho de Melo	2005-10-11
\| \| \| \| \| \| \| \|	This is required to avoid unloading a module that has active timewait sockets, such as DCCP. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[DCCP]: Transition from PARTOPEN to OPEN when receiving DATA packets	Arnaldo Carvalho de Melo	2005-10-11
\| \| \| \| \| \| \| \|	Noticed by Andrea Bittau, that provided a patch that was modified to not transition from RESPOND to OPEN when receiving DATA packets. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[CCID]: Check if ccid is NULL in the hc_[tr]x_exit functions	Arnaldo Carvalho de Melo	2005-10-11
\| \| \| \| \| \| \| \| \|	For consistency with ccid_exit and to fix a bug when IP_DCCP_UNLOAD_HACK is enabled as the control sock is not associated to any CCID. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[NETFILTER] ctnetlink: add support to change protocol info	Pablo Neira Ayuso	2005-10-11
\| \| \| \| \| \| \| \| \|	This patch add support to change the state of the private protocol information via conntrack_netlink. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[NETFILTER] ctnetlink: allow userspace to change TCP state	Pablo Neira Ayuso	2005-10-11
\| \| \| \| \| \| \| \| \| \| \|	This patch adds the ability of changing the state a TCP connection. I know that this must be used with care but it's required to provide a complete conntrack creation via conntrack_netlink. So I'll document this aspect on the upcoming docs. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[NETFILTER]: Use only 32bit counters for CONNTRACK_ACCT	Harald Welte	2005-10-11
\| \| \| \| \| \| \| \| \| \|	Initially we used 64bit counters for conntrack-based accounting, since we had no event mechanism to tell userspace that our counters are about to overflow. With nfnetlink_conntrack, we now have such a event mechanism and thus can save 16bytes per connection. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[IPSEC] Fix block size/MTU bugs in ESP	Herbert Xu	2005-10-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes the following bugs in ESP: * Fix transport mode MTU overestimate. This means that the inner MTU is smaller than it needs be. Worse yet, given an input MTU which is a multiple of 4 it will always produce an estimate which is not a multiple of 4. For example, given a standard ESP/3DES/MD5 transform and an MTU of 1500, the resulting MTU for transport mode is 1462 when it should be 1464. The reason for this is because IP header lengths are always a multiple of 4 for IPv4 and 8 for IPv6. * Ensure that the block size is at least 4. This is required by RFC2406 and corresponds to what the esp_output function does. At the moment this only affects crypto_null as its block size is 1. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[IPSEC]: Use ALIGN macro in ESP	Herbert Xu	2005-10-11
\| \| \| \| \| \| \|	This patch uses the macro ALIGN in all the applicable spots for ESP. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[NETFILTER] ctnetlink: add one nesting level for TCP state	Pablo Neira Ayuso	2005-10-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To keep consistency, the TCP private protocol information is nested attributes under CTA_PROTOINFO_TCP. This way the sequence of attributes to access the TCP state information looks like here below: CTA_PROTOINFO CTA_PROTOINFO_TCP CTA_PROTOINFO_TCP_STATE instead of: CTA_PROTOINFO CTA_PROTOINFO_TCP_STATE Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[NETFILTER] ctnetlink: ICMP ID is not mandatory	Pablo Neira Ayuso	2005-10-10
\| \| \| \| \| \| \| \| \| \| \|	The ID is only required by ICMP type 8 (echo), so it's not mandatory for all sort of ICMP connections. This patch makes mandatory only the type and the code for ICMP netlink messages. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[NETFILTER] conntrack_netlink: Fix endian issue with status from userspace	Harald Welte	2005-10-10
\| \| \| \| \| \| \| \| \|	When we send "status" from userspace, we forget to convert the endianness. This patch adds the reqired conversion. Thanks to Pablo Neira for discovering this. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[NETFILTER] nfnetlink: use highest bit of nfa_type to indicate nested TLV	Harald Welte	2005-10-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As Henrik Nordstrom pointed out, all our efforts with "split endian" (i.e. host byte order tags, net byte order values) are useless, unless a parser can determine whether an attribute is nested or not. This patch steals the highest bit of nfattr.nfa_type to indicate whether the data payload contains a nested nfattr (1) or not (0). This will break userspace compatibility, but luckily no kernel with nfnetlink was released so far. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[NETFILTER] ipt_ULOG: Mark ipt_ULOG as OBSOLETE	Harald Welte	2005-10-10
\| \| \| \| \| \| \| \| \|	Similar to nfnetlink_queue and ip_queue, we mark ipt_ULOG as obsolete. This should have been part of the original nfnetlink_log merge, but I somehow missed it. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[NETFILTER] PPTP helper: Add missing Kconfig dependency	Harald Welte	2005-10-10
\| \| \| \| \| \| \|	PPTP should not be selectable without conntrack enabled Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[PATCH] gfp flags annotations - part 1	Al Viro	2005-10-08
\| \| \| \| \| \| \| \| \| \| \| \|	- added typedef unsigned int __nocast gfp_t; - replaced __nocast uses for gfp flags with gfp_t - it gives exactly the same warnings as far as sparse is concerned, doesn't change generated code (from gcc point of view we replaced unsigned int with typedef) and documents what's going on far better. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
*	[ATM]: [br2684] if we free the skb, we should return 0	Jean-Denis Boyer	2005-10-07
\| \| \| \| \| \|	From: "Jean-Denis Boyer" <jdboyer@mediatrix.com> Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[ATM]: add support for LECS addresses learned from network	Eric Kinzie	2005-10-07
\| \| \| \| \| \|	From: Eric Kinzie <ekinzie@cmf.nrl.navy.mil> Signed-off-by: Chas Williams <chas@cmf.nrl.navy.mil> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[SCTP] Fix sctp_get{pl}addrs() API to work with 32-bit apps on 64-bit kernels.	Ivan Skytte Jørgensen	2005-10-07
\| \| \| \| \| \| \| \| \|	The old socket options are marked with a _OLD suffix so that the existing 32-bit apps on 32-bit kernels do not break. Signed-off-by: Ivan Skytte Jørgensen <isj-sctp@i1.dk> Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[AX.25]: Fix packet socket crash	Ralf Baechle	2005-10-05
\| \| \| \| \| \| \| \| \| \|	Since changeset 98a82febb6340466824c3a453738d4fbd05db81a AX.25 is passing received IP and ARP packets to the stack through netif_rx() but we don't set the skb->mac.raw to right value which may result in a crash with applications that use a packet socket. Signed-off-by: Ralf Baechle DL5RB <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[IPSEC]: Document that policy direction is derived from the index.	Herbert Xu	2005-10-05
\| \| \| \| \| \| \| \| \| \| \|	Here is a patch that adds a helper called xfrm_policy_id2dir to document the fact that the policy direction can be and is derived from the index. This is based on a patch by YOSHIFUJI Hideaki and 210313105@suda.edu.cn. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[IPV6]: Fix NS handing for proxy/anycast address	YOSHIFUJI Hideaki	2005-10-05
\| \| \| \| \| \| \| \| \| \|	Timer set up by pneigh_enqueue() ended up calling ndisc_rcv() via pndisc_redo(), which clears LOCALLY_ENQUEUED flag in NEIGH_CB(skb) and NS was queued again. Let's call ndisc_recv_ns() directly to avoid the loop. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[TCP]: BIC coding bug in Linux 2.6.13	Stephen Hemminger	2005-10-05
\| \| \| \| \| \| \| \| \| \|	Missing parenthesis in causes BIC to be slow in increasing congestion window. Spotted by Injong Rhee. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[MCAST] ipv6: Fix address size in grec_size	Yan Zheng	2005-10-05
\| \| \| \| \| \| \| \|	Signed-Off-By: Yan Zheng <yanzheng@21cn.com> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[XFRM]: fix sparse gfp nocast warnings	Randy Dunlap	2005-10-05
\| \| \| \| \| \| \| \|	Fix implicit nocast warnings in xfrm code: net/xfrm/xfrm_policy.c:232:47: warning: implicit cast to nocast type Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[RPC]: fix sparse gfp nocast warnings	Randy Dunlap	2005-10-05
\| \| \| \| \| \| \| \| \| \| \|	Fix nocast sparse warnings: net/rxrpc/call.c:2013:25: warning: implicit cast to nocast type net/rxrpc/connection.c:538:46: warning: implicit cast to nocast type net/sunrpc/sched.c:730:36: warning: implicit cast to nocast type net/sunrpc/sched.c:734:56: warning: implicit cast to nocast type Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[AF_KEY]: fix sparse gfp nocast warnings	Randy Dunlap	2005-10-05
\| \| \| \| \| \| \| \| \|	Fix implicit nocast warnings in net/key code: net/key/af_key.c:195:27: warning: implicit cast to nocast type net/key/af_key.c:1439:28: warning: implicit cast to nocast type Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[NETFILTER]: fix sparse gfp nocast warnings	Randy Dunlap	2005-10-05
\| \| \| \| \| \| \| \|	Fix implicit nocast warnings in nfnetlink code: net/netfilter/nfnetlink.c:204:43: warning: implicit cast to nocast type Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[IPVS]: fix sparse gfp nocast warnings	Randy Dunlap	2005-10-05
\| \| \| \| \| \| \| \| \| \|	From: Randy Dunlap <rdunlap@xenotime.net> Fix implicit nocast warnings in ip_vs code: net/ipv4/ipvs/ip_vs_app.c:631:54: warning: implicit cast to nocast type Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[DECNET]: fix sparse gfp nocast warnings	Randy Dunlap	2005-10-05
\| \| \| \| \| \| \| \| \| \|	Fix implicit nocast warnings in decnet code: net/decnet/af_decnet.c:458:40: warning: implicit cast to nocast type net/decnet/dn_nsp_out.c:125:35: warning: implicit cast to nocast type net/decnet/dn_nsp_out.c:219:29: warning: implicit cast to nocast type Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[ATM]: fix sparse gfp nocast warnings	Randy Dunlap	2005-10-05
\| \| \| \| \| \| \| \| \| \| \|	Fix implicit nocast warnings in atm code: net/atm/atm_misc.c:35:44: warning: implicit cast to nocast type drivers/atm/fore200e.c:183:33: warning: implicit cast to nocast type Also use kzalloc() instead of kmalloc(). Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[NETFILTER]: Fix Kconfig typo	Horst H. von Brand	2005-10-04
\| \| \| \| \|	Signed-off-by: Horst H. von Brand <vonbrand@inf.utfsm.cl> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[IPV4]: fib_trie root-node expansion	Robert Olsson	2005-10-04
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The patch below introduces special thresholds to keep root node in the trie large. This gives a flatter tree at the cost of a modest memory increase. Overall it seems to be gain and this was also proposed by one the authors of the paper in recent a seminar. Main table after loading 123 k routes. Aver depth: 3.30 Max depth: 9 Root-node size 12 bits Total size: 4044 kB With the patch: Aver depth: 2.78 Max depth: 8 Root-node size 15 bits Total size: 4150 kB An increase of 8-10% was seen in forwading performance for an rDoS attack. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[IPV6]: Fix infinite loop in udp_v6_get_port().	YOSHIFUJI Hideaki	2005-10-04
\| \| \| \| \|	Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[PATCH] ieee80211: fix gfp flags type	Randy Dunlap	2005-10-04
\| \| \| \| \| \| \| \|	Fix implicit nocast warnings in ieee80211 code, including __nocast: net/ieee80211/ieee80211_tx.c:215:9: warning: implicit cast to nocast type Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
*	[PATCH] ieee80211: fix gfp flags type	Randy Dunlap	2005-10-03
\| \| \| \| \| \| \| \|	Fix implicit nocast warnings in ieee80211 code: net/ieee80211/ieee80211_tx.c:215:9: warning: implicit cast to nocast type Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
*	[IPV4]: Update icmp sysctl docs and disable broadcast ECHO/TIMESTAMP by default	David S. Miller	2005-10-03
\| \| \| \| \| \| \|	It's not a good idea to be smurf'able by default. The few people who need this can turn it on. Signed-off-by: David S. Miller <davem@davemloft.net>
*	[IPV4]: Get rid of bogus __in_put_dev in pktgen	Herbert Xu	2005-10-03
\| \| \| \| \| \| \| \|	This patch gets rid of a bogus __in_dev_put() in pktgen.c. This was spotted by Suzanne Wood. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[IPV4]: Replace __in_dev_get with __in_dev_get_rcu/rtnl	Herbert Xu	2005-10-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The following patch renames __in_dev_get() to __in_dev_get_rtnl() and introduces __in_dev_get_rcu() to cover the second case. 1) RCU with refcnt should use in_dev_get(). 2) RCU without refcnt should use __in_dev_get_rcu(). 3) All others must hold RTNL and use __in_dev_get_rtnl(). There is one exception in net/ipv4/route.c which is in fact a pre-existing race condition. I've marked it as such so that we remember to fix it. This patch is based on suggestions and prior work by Suzanne Wood and Paul McKenney. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[IPV6]: Fix leak added by udp connect dst caching fix.	David S. Miller	2005-10-03
\| \| \| \| \| \|	Based upon a patch from Mitsuru KANDA <mk@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[IPV6]: Fix ipv6 fragment ID selection at slow path	Yan Zheng	2005-10-03
\| \| \| \| \|	Signed-Off-By: Yan Zheng <yanzheng@21cn.com> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[IPV4]: Fix "Proxy ARP seems broken"	Herbert Xu	2005-10-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Meelis Roos <mroos@linux.ee> wrote: > RK> My firewall setup relies on proxyarp working. However, with 2.6.14-rc3, > RK> it appears to be completely broken. The firewall is 212.18.232.186, > > Same here with some kernel between 14-rc2 and 14-rc3 - no reposnse to > ARP on a proxyarp gateway. Sorry, no exact revison and no more debugging > yet since it'a a production gateway. The breakage is caused by the change to use the CB area for flagging whether a packet has been queued due to proxy_delay. This area gets cleared every time arp_rcv gets called. Unfortunately packets delayed due to proxy_delay also go through arp_rcv when they are reprocessed. In fact, I can't think of a reason why delayed proxy packets should go through netfilter again at all. So the easiest solution is to bypass that and go straight to arp_process. This is essentially what would've happened before netfilter support was added to ARP. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[NET]: Fix "sysctl_net.c:36: error: 'core_table' undeclared here"	Russell King	2005-10-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	During the build for ARM machine type "fortunet", this error occurred: CC net/sysctl_net.o net/sysctl_net.c:36: error: 'core_table' undeclared here (not in a function) It appears that the following configuration settings cause this error due to a missing include: CONFIG_SYSCTL=y CONFIG_NET=y # CONFIG_INET is not set core_table appears to be declared in net/sock.h. if CONFIG_INET were defined, net/sock.h would have been included via: sysctl_net.c -> net/ip.h -> linux/ip.h -> net/sock.h so include it directly. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[INET]: speedup inet (tcp/dccp) lookups	Eric Dumazet	2005-10-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Arnaldo and I agreed it could be applied now, because I have other pending patches depending on this one (Thank you Arnaldo) (The other important patch moves skc_refcnt in a separate cache line, so that the SMP/NUMA performance doesnt suffer from cache line ping pongs) 1) First some performance data : -------------------------------- tcp_v4_rcv() wastes a lot of time in __inet_lookup_established() The most time critical code is : sk_for_each(sk, node, &head->chain) { if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif)) goto hit; /* You sunk my battleship! / } The sk_for_each() does use prefetch() hints but only the begining of "struct sock" is prefetched. As INET_MATCH first comparison uses inet_sk(__sk)->daddr, wich is far away from the begining of "struct sock", it has to bring into CPU cache cold cache line. Each iteration has to use at least 2 cache lines. This can be problematic if some chains are very long. 2) The goal ----------- The idea I had is to change things so that INET_MATCH() may return FALSE in 99% of cases only using the data already in the CPU cache, using one cache line per iteration. 3) Description of the patch --------------------------- Adds a new 'unsigned int skc_hash' field in 'struct sock_common', filling a 32 bits hole on 64 bits platform. struct sock_common { unsigned short skc_family; volatile unsigned char skc_state; unsigned char skc_reuse; int skc_bound_dev_if; struct hlist_node skc_node; struct hlist_node skc_bind_node; atomic_t skc_refcnt; + unsigned int skc_hash; struct proto skc_prot; }; Store in this 32 bits field the full hash, not masked by (ehash_size - 1) Using this full hash as the first comparison done in INET_MATCH permits us immediatly skip the element without touching a second cache line in case of a miss. Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to sk_hash and tw_hash) already contains the slot number if we mask with (ehash_size - 1) File include/net/inet_hashtables.h 64 bits platforms : #define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\ (((__sk)->sk_hash == (__hash)) ((((__u64 )&(inet_sk(__sk)->daddr)))== (__cookie)) && \ ((((__u32 )&(inet_sk(__sk)->dport))) == (__ports)) && \ (!((__sk)->sk_bound_dev_if) \|\| ((__sk)->sk_bound_dev_if == (__dif)))) 32bits platforms: #define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\ (((__sk)->sk_hash == (__hash)) && \ (inet_sk(__sk)->daddr == (__saddr)) && \ (inet_sk(__sk)->rcv_saddr == (__daddr)) && \ (!((__sk)->sk_bound_dev_if) \|\| ((__sk)->sk_bound_dev_if == (__dif)))) - Adds a prefetch(head->chain.first) in __inet_lookup_established()/__tcp_v4_check_established() and __inet6_lookup_established()/__tcp_v6_check_established() and __dccp_v4_check_established() to bring into cache the first element of the list, before the {read\|write}_lock(&head->lock); Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>
*	[NET]: Fix packet timestamping.	Herbert Xu	2005-10-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I've found the problem in general. It affects any 64-bit architecture. The problem occurs when you change the system time. Suppose that when you boot your system clock is forward by a day. This gets recorded down in skb_tv_base. You then wind the clock back by a day. From that point onwards the offset will be negative which essentially overflows the 32-bit variables they're stored in. In fact, why don't we just store the real time stamp in those 32-bit variables? After all, we're not going to overflow for quite a while yet. When we do overflow, we'll need a better solution of course. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>