litmus-rt.git - The LITMUS^RT kernel.

	Commit message	Author	Age
*	net: sch_netem: Fix an inconsistency in ingress netem timestamps.	Jarek Poplawski	2009-04-20
	Alex Sidorenko reported:
	"while experimenting with 'netem' we have found some strange behaviour. It seemed that ingress delay as measured by 'ping' command shows up on some hosts but not on others. After some investigation I have found that the problem is that skbuff->tstamp field value depends on whether there are any packet sniffers enabled. That is:
	- if any ptype_all handler is registered, the tstamp field is as expected
	- if there are no ptype_all handlers, the tstamp field does not show the delay"
	This patch prevents unnecessary update of tstamp in dev_queue_xmit_nit() on ingress path (with act_mirred) adding a check, so minimal overhead on the fast path, but only when sniffers etc. are active.
	Since netem at ingress seems to logically emulate a network before a host, tstamp is zeroed to trigger the update and pretend delays are from the outside.
	Reported-by: Alex Sidorenko <alexandre.sidorenko@hp.com>
	Tested-by: Alex Sidorenko <alexandre.sidorenko@hp.com>
	Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
*	ax25: proc uid file misses header	Alan Cox	2009-04-20
\| \| \| \| \| \| \| \| \| \| \|	This has been broken for a while. I happened to catch it testing because one app "knew" that the top line of the calls data was the policy line and got confused. Put the header back. Signed-off-by: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
*	vlan/macvlan: fix NULL pointer dereferences in ethtool handlers	Patrick McHardy	2009-04-17
\| \| \| \| \| \| \| \| \|	Check whether the underlying device provides a set of ethtool ops before checking for individual handlers to avoid NULL pointer dereferences. Reported-by: Art van Breemen <ard@telegraafnet.nl> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
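The two-level check described above can be sketched in user-space C. This is only an illustration: `sketch_dev`, `sketch_ethtool_ops`, `sketch_get_link` and `sketch_link_up` are hypothetical stand-ins, not the kernel's structures or the code of the actual patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct sketch_dev;

/* Hypothetical ops table: the table itself and each handler may be NULL. */
struct sketch_ethtool_ops {
    int (*get_link)(struct sketch_dev *dev);
};

struct sketch_dev {
    const struct sketch_ethtool_ops *ethtool_ops;
};

/* Example handler used by a lower device that does implement get_link. */
int sketch_link_up(struct sketch_dev *dev)
{
    (void)dev;
    return 1;
}

/* Forward a query from a stacked (vlan/macvlan-like) device to the lower one. */
int sketch_get_link(struct sketch_dev *lower)
{
    assert(lower != NULL);
    /* First make sure the ops table exists, then check the handler itself. */
    if (lower->ethtool_ops == NULL || lower->ethtool_ops->get_link == NULL)
        return -EOPNOTSUPP;
    return lower->ethtool_ops->get_link(lower);
}
```

Checking only the handler pointer would dereference a NULL ops table on devices that provide none, which is the failure mode the commit message describes.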
*	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6	David S. Miller	2009-04-17
\|\ \| \| \| \| \| \|
\| *	mac80211: validate TIM IE length	Johannes Berg	2009-04-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The TIM IE must not be shorter than 4 bytes, so verify that when parsing it. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
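A length check of this kind might look like the following user-space sketch. The minimum of 4 bytes comes from the commit message; the field layout (DTIM count, DTIM period, bitmap control, then at least one byte of partial virtual bitmap) and all `sketch_` names are assumptions made for illustration, not the mac80211 parser.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_TIM_MIN_LEN 4 /* minimum payload length, per the commit message */

/* Hypothetical parsed view of a TIM element payload. */
struct sketch_tim {
    uint8_t dtim_count;
    uint8_t dtim_period;
    uint8_t bitmap_ctrl;
    const uint8_t *virtual_map; /* points into the caller's buffer */
    size_t virtual_map_len;
};

/* Returns 0 on success, -1 if the payload is too short to be a valid TIM. */
int sketch_parse_tim(const uint8_t *payload, size_t len, struct sketch_tim *out)
{
    assert(payload != NULL && out != NULL);
    if (len < SKETCH_TIM_MIN_LEN)
        return -1; /* reject before touching any field */
    out->dtim_count = payload[0];
    out->dtim_period = payload[1];
    out->bitmap_ctrl = payload[2];
    out->virtual_map = payload + 3;
    out->virtual_map_len = len - 3;
    return 0;
}
```

Validating the length up front means later code can read the three fixed bytes and the first bitmap byte without further bounds checks.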
\| *	cfg80211: do not replace BSS structs	Johannes Berg	2009-04-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead, allocate extra IE memory if necessary. Normally, this isn't even necessary since there's enough space. This is a better way of correcting the "held BSS can disappear" issue, but also a lot more code. It is also necessary for proper auth/assoc BSS handling in the future. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
\| *	cfg80211: copy hold when replacing BSS	Johannes Berg	2009-04-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When we receive a probe response frame we can replace the BSS struct in our list -- but if that struct is held then we need to hold the new one as well. We really should fix this completely and not replace the struct, but this is a bandaid for now. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>
\| *	mac80211: avoid crashing when no scan sdata	Johannes Berg	2009-04-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using the scan_sdata variable here is terribly wrong, if there has never been a scan then we fail. However, we need a bandaid... Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Cc: stable@kernel.org [2.6.29] Signed-off-by: John W. Linville <linville@tuxdriver.com>
\| *	mac80211: Fragmentation threshold (typo)	Gerrit Renker	2009-04-16
	ieee80211_ioctl_siwfrag() sets the fragmentation_threshold to 2352 when frame fragmentation is to be disabled, yet the corresponding 'get' function tests for 2353 bytes instead. This causes user-space tools to display a fragmentation threshold of 2352 bytes even if fragmentation has been disabled.
	Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
	Signed-off-by: John W. Linville <linville@tuxdriver.com>
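The mismatch between the 'set' and 'get' paths is easiest to avoid when both share one constant. The sketch below is a simplified user-space model under that assumption; `sketch_frag_req`, `sketch_set_frag` and `sketch_get_frag` are hypothetical names, and only the value 2352 is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_FRAG_DISABLED 2352 /* "fragmentation off" value, per the commit message */

/* Hypothetical request structure, loosely modelled on a wext parameter. */
struct sketch_frag_req {
    int value;
    bool disabled;
};

static int sketch_frag_threshold = SKETCH_FRAG_DISABLED;

void sketch_set_frag(const struct sketch_frag_req *req)
{
    assert(req != NULL);
    sketch_frag_threshold = req->disabled ? SKETCH_FRAG_DISABLED : req->value;
}

void sketch_get_frag(struct sketch_frag_req *req)
{
    assert(req != NULL);
    req->value = sketch_frag_threshold;
    /* Compare against the same constant the setter stores. */
    req->disabled = (sketch_frag_threshold >= SKETCH_FRAG_DISABLED);
}
```

With a literal 2353 in the getter, the stored 2352 would never compare as "disabled", which matches the symptom reported above.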
\| *	mac80211: quiet beacon loss messages	Michael Buesch	2009-04-16
	On Sunday 05 April 2009 11:29:38 Michael Buesch wrote:
	> On Sunday 05 April 2009 11:23:59 Jaswinder Singh Rajput wrote:
	> > With latest linus tree I am getting, .config file attached:
	> >
	> > [ 22.895051] r8169: eth0: link down
	> > [ 22.897564] ADDRCONF(NETDEV_UP): eth0: link is not ready
	> > [ 22.928047] ADDRCONF(NETDEV_UP): wlan0: link is not ready
	> > [ 22.982292] libvirtd used greatest stack depth: 4200 bytes left
	> > [ 63.709879] wlan0: authenticate with AP 00:11:95:9e:df:f6
	> > [ 63.712096] wlan0: authenticated
	> > [ 63.712127] wlan0: associate with AP 00:11:95:9e:df:f6
	> > [ 63.726831] wlan0: RX AssocResp from 00:11:95:9e:df:f6 (capab=0x471 status=0 aid=1)
	> > [ 63.726855] wlan0: associated
	> > [ 63.730093] ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
	> > [ 74.296087] wlan0: no IPv6 routers present
	> > [ 79.349044] wlan0: beacon loss from AP 00:11:95:9e:df:f6 - sending probe request
	> > [ 119.358200] wlan0: beacon loss from AP 00:11:95:9e:df:f6 - sending probe request
	> > [ 179.354292] wlan0: beacon loss from AP 00:11:95:9e:df:f6 - sending probe request
	> > [ 259.366044] wlan0: beacon loss from AP 00:11:95:9e:df:f6 - sending probe request
	> > [ 359.348292] wlan0: beacon loss from AP 00:11:95:9e:df:f6 - sending probe request
	> > [ 361.953459] packagekitd used greatest stack depth: 4160 bytes left
	> > [ 478.824258] wlan0: beacon loss from AP 00:11:95:9e:df:f6 - sending probe request
	> > [ 598.813343] wlan0: beacon loss from AP 00:11:95:9e:df:f6 - sending probe request
	> > [ 718.817292] wlan0: beacon loss from AP 00:11:95:9e:df:f6 - sending probe request
	> > [ 838.824567] wlan0: beacon loss from AP 00:11:95:9e:df:f6 - sending probe request
	> > [ 958.815402] wlan0: beacon loss from AP 00:11:95:9e:df:f6 - sending probe request
	> > [ 1078.848434] wlan0: beacon loss from AP 00:11:95:9e:df:f6 - sending probe request
	> > [ 1198.822913] wlan0: beacon loss from AP 00:11:95:9e:df:f6 - sending probe request
	> > [ 1318.824931] wlan0: beacon loss from AP 00:11:95:9e:df:f6 - sending probe request
	> > [ 1438.814157] wlan0: beacon loss from AP 00:11:95:9e:df:f6 - sending probe request
	> > [ 1558.827336] wlan0: beacon loss from AP 00:11:95:9e:df:f6 - sending probe request
	> > [ 1678.823011] wlan0: beacon loss from AP 00:11:95:9e:df:f6 - sending probe request
	> > [ 1798.830589] wlan0: beacon loss from AP 00:11:95:9e:df:f6 - sending probe request
	> > [ 1918.828044] wlan0: beacon loss from AP 00:11:95:9e:df:f6 - sending probe request
	> > [ 2038.827224] wlan0: beacon loss from AP 00:11:95:9e:df:f6 - sending probe request
	> > [ 2116.517152] wlan0: beacon loss from AP 00:11:95:9e:df:f6 - sending probe request
	> > [ 2158.840243] wlan0: beacon loss from AP 00:11:95:9e:df:f6 - sending probe request
	> > [ 2278.827427] wlan0: beacon loss from AP 00:11:95:9e:df:f6 - sending probe request
	> >
	> I think this message should only show if CONFIG_MAC80211_VERBOSE_DEBUG is set.
	> It's kind of expected that we lose a beacon once in a while, so we shouldn't print
	> verbose messages to the kernel log (even if they are KERN_DEBUG).
	>
	> And besides that, I think one can easily remotely trigger this message and flood the logs.
	> So it should probably _also_ be ratelimited. Something like this:
	Signed-off-by: Michael Buesch <mb@bu3sch.de>
\| *	mac80211: correct wext transmit power handler	Johannes Berg	2009-04-16
	Wext makes no assumptions about the contents of data->txpower.fixed and data->txpower.value when data->txpower.disabled is set, so do not update the user-requested power level while disabling.
	Also, when wext configures a really _fixed_ power output [1], we should reject it instead of limiting it to the regulatory constraint. If the user wants to set a _limit_ [2] then we should honour that.
	[1] iwconfig wlan0 txpower 20dBm fixed
	[2] iwconfig wlan0 txpower 10dBm
	This fixes http://www.intellinuxwireless.org/bugzilla/show_bug.cgi?id=1942
	Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
	Signed-off-by: John W. Linville <linville@tuxdriver.com>
\| *	mac80211: Fix bug in getting rx status for frames pending in reorder buffer	Vasanthakumar Thiagarajan	2009-04-16
	Currently the rx status for frames which are completed from the reorder buffer is taken from its cb area, which is not always right: cb does not hold the rx status when the driver uses mac80211's non-irq rx handler to pass its received frames. This results in dropping almost all frames from the reorder buffer when security is enabled, by doing double decryption (first in hw, second in sw because of the wrong rx status). This patch copies the rx status into the cb area before the frame is put into the reorder buffer. After this patch, there is a significant improvement in throughput with ath9k + WPA2(AES).
	Signed-off-by: Vasanthakumar Thiagarajan <vasanth@atheros.com>
	Acked-by: Johannes Berg <johannes@sipsolutions.net>
	Cc: stable@kernel.org
	Signed-off-by: John W. Linville <linville@tuxdriver.com>
\| *	cfg80211: fix NULL pointer deference in reg_device_remove()	Luis R. Rodriguez	2009-04-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We won't ever get here as regulatory_hint_core() can only fail on -ENOMEM and in that case we don't initialize cfg80211 but this is technically correct code. This is actually good for stable, where we don't check for -ENOMEM failure on __regulatory_hint()'s failure. Cc: stable@kernel.org Reported-by: Quentin Armitage <Quentin@armitage.org.uk> Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
* \|	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6	David S. Miller	2009-04-17
\|\ \ \| \| \| \| \| \| \| \| \|
\| * \|	netfilter: nfnetlink: return ENOMEM if we fail to create netlink socket	Pablo Neira Ayuso	2009-04-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With this patch, nfnetlink returns -ENOMEM instead of -EPERM if we fail to create the nfnetlink netlink socket during the module loading. This is exactly what rtnetlink does in this case. Ideally, it would be better if we propagate the error that has happened in netlink_kernel_create(), however, this function still does not implement this yet. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
\| * \|	netfilter: ctnetlink: report error if event message allocation fails	Pablo Neira Ayuso	2009-04-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes an inconsistency that results in no error reports to user-space listeners if we fail to allocate the event message. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
* \| \|	gro: Fix use after free in tcp_gro_receive	Herbert Xu	2009-04-17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	After calling skb_gro_receive skb->len can no longer be relied on since if the skb was merged using frags, then its pages will have been removed and the length reduced. This caused tcp_gro_receive to prematurely end merging which resulted in suboptimal performance with ixgbe. The fix is to store skb->len on the stack. Reported-by: Mark Wagner <mwagner@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
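The "store the length on the stack" fix can be modelled in a few lines of user-space C. This is an illustrative sketch only: `sketch_pkt`, `sketch_merge` and `sketch_receive` are hypothetical, and the flush condition (`len < mss`) is an assumption standing in for whatever test the real tcp_gro_receive applies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb: only the length matters here. */
struct sketch_pkt {
    unsigned int len;
};

/* Merging moves pkt's payload into head, so pkt->len shrinks afterwards. */
void sketch_merge(struct sketch_pkt *head, struct sketch_pkt *pkt)
{
    head->len += pkt->len;
    pkt->len = 0;
}

/* Returns 1 if merging should stop because the segment was shorter than mss. */
int sketch_receive(struct sketch_pkt *head, struct sketch_pkt *pkt, unsigned int mss)
{
    assert(head != NULL && pkt != NULL);
    unsigned int len = pkt->len; /* snapshot taken before the merge */

    sketch_merge(head, pkt);
    /* Use the snapshot: pkt->len is no longer meaningful at this point. */
    return len < mss;
}
```

Had the last line read `pkt->len < mss`, every merged packet would look short and merging would end after the first segment, which is the premature flush the commit message reports.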
* \| \|	can: Network Drop Monitor: Make use of consume_skb() in af_can.c	Oliver Hartkopp	2009-04-17
	Since commit ead2ceb0ec9f85cff19c43b5cdb2f8a054484431 ("Network Drop Monitor: Adding kfree_skb_clean for non-drops and modifying end-of-line points for skbs"), so-called end-of-line points for skbs should use consume_skb() to free the socket buffer.
	In contrast to consume_skb(), the function kfree_skb() is intended for unexpected skb drops, e.g. in error conditions, which can now trigger the network drop monitor if it is enabled.
	This patch moves the skb end-of-line point in af_can.c to use consume_skb().
	Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
	Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \|	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6	David S. Miller	2009-04-16
\|\\| \| \| \|/ \|/\| \| \|
\| *	netfilter: nf_nat: add support for persistent mappings	Patrick McHardy	2009-04-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The removal of the SAME target accidentally removed one feature that is not available from the normal NAT targets so far, having multi-range mappings that use the same mapping for each connection from a single client. The current behaviour is to choose the address from the range based on source and destination IP, which breaks when communicating with sites having multiple addresses that require all connections to originate from the same IP address. Introduce a IP_NAT_RANGE_PERSISTENT option that controls whether the destination address is taken into account for selecting addresses. http://bugzilla.kernel.org/show_bug.cgi?id=12954 Signed-off-by: Patrick McHardy <kaber@trash.net>
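The effect of a persistent option on address selection can be illustrated with a small user-space sketch. Everything here is hypothetical: `sketch_pick_addr`, `sketch_mix` and the multiplicative hash are stand-ins chosen for brevity and are not the hashing that nf_nat actually uses; only the idea of leaving the destination out of the selection key comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mixing function; the kernel uses its own hash. */
static uint32_t sketch_mix(uint32_t key)
{
    uint32_t h = key * 2654435761u;
    return h ^ (h >> 16);
}

/* Pick an address from [min_ip, max_ip] for a connection src -> dst. */
uint32_t sketch_pick_addr(uint32_t src, uint32_t dst,
                          uint32_t min_ip, uint32_t max_ip, bool persistent)
{
    assert(min_ip <= max_ip);
    /* Persistent mode ignores the destination, so one client always
     * maps to the same address regardless of where it connects. */
    uint32_t key = persistent ? src : (src ^ dst);
    uint32_t span = max_ip - min_ip + 1;

    if (span == 0) /* range covers the whole 32-bit space */
        return sketch_mix(key);
    return min_ip + sketch_mix(key) % span;
}
```

In the non-persistent mode two connections from the same client to different destinations may be given different addresses from the range, which is the behaviour that breaks sites expecting a single originating IP.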
\| *	netfilter: nf_conntrack: fix crash when unloading helpers	Patrick McHardy	2009-04-15
	Commit ea781f197d (netfilter: nf_conntrack: use SLAB_DESTROY_BY_RCU and get rid of call_rcu()) was missing one conversion to the hlist_nulls functions, causing a crash when unloading conntrack helper modules.
	Reported-and-tested-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
	Signed-off-by: Patrick McHardy <kaber@trash.net>
\| *	netfilter: nf_log regression fix	Eric Dumazet	2009-04-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit ca735b3aaa945626ba65a3e51145bfe4ecd9e222 'netfilter: use a linked list of loggers' introduced an array of list_head in "struct nf_logger", but forgot to initialize it in nf_log_register(). This resulted in oops when calling nf_log_unregister() at module unload time. Reported-and-tested-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Eric Leblond <eric@inl.fr> Signed-off-by: Patrick McHardy <kaber@trash.net>
* \|	packet: avoid warnings when high-order page allocation fails	Eric Dumazet	2009-04-15
	Latest tcpdump/libpcap triggers annoying messages because of high order page allocation failures (when lowmem exhausted or fragmented).
	These allocation errors are correctly handled so could be silent.
	[22660.208901] tcpdump: page allocation failure. order:5, mode:0xc0d0
	[22660.208921] Pid: 13866, comm: tcpdump Not tainted 2.6.30-rc2 #170
	[22660.208936] Call Trace:
	[22660.208950] [<c04e2b46>] ? printk+0x18/0x1a
	[22660.208965] [<c02760f7>] __alloc_pages_internal+0x357/0x460
	[22660.208980] [<c0276251>] __get_free_pages+0x21/0x40
	[22660.208995] [<c04cc835>] packet_set_ring+0x105/0x3d0
	[22660.209009] [<c04ccd1d>] packet_setsockopt+0x21d/0x4d0
	[22660.209025] [<c0270400>] ? filemap_fault+0x0/0x450
	[22660.209040] [<c0449e34>] sys_setsockopt+0x54/0xa0
	[22660.209053] [<c044b97f>] sys_socketcall+0xef/0x270
	[22660.209067] [<c0202e34>] sysenter_do_call+0x12/0x26
	Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Revert "rose: zero length frame filtering in af_rose.c"	David S. Miller	2009-04-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit 244f46ae6e9e18f6fc0be7d1f49febde4762c34b. Alan Cox did the research, and just like the other radio protocols zero-length frames have meaning because at the top level ROSE is X.25 PLP. So this zero-length filtering is invalid. Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	gro: Restore correct value to gso_size	Herbert Xu	2009-04-14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since everybody has been focusing on baremetal GRO performance no one noticed when I added a bug that zapped gso_size for all GRO packets. This only gets picked up when you forward the skb out of an interface. Thanks to Mark Wagner for noticing this bug when testing kvm. Reported-by: Mark Wagner <mwagner@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	ipv6:remove useless check	Yang Hongyang	2009-04-14
	After switch (rthdr->type) {...}, the check below is completely useless, because:
	- if the type is 2, then hdrlen must be 2 and segments_left must be 1, so the check is clearly redundant;
	- if the type is not 2, then we goto sticky_done, so the check is useless too.
	Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
	Reviewed-by: Shan Wei <shanwei@cn.fujitsu.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	tcp: fix >2 iw selection	Ilpo Järvinen	2009-04-14
	A long-standing feature in tcp_init_metrics() is that any of its "goto reset" jumps prevents the call to tcp_init_cwnd().
	Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
	Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	netsched: Allow meta match on vlan tag on receive	Stephen Hemminger	2009-04-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When vlan acceleration is used on receive, the vlan tag is maintained outside of the skb data. The existing vlan tag match only works on TX path because it uses vlan_get_tag which tests for VLAN_HW_TX_ACCEL. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	gro: Normalise skb before bypassing GRO on netpoll VLAN path	Herbert Xu	2009-04-13
	When we detect netpoll RX on the GRO VLAN path we bail out and call the normal VLAN receive handler. However, the packet needs to be normalised by calling eth_type_trans since that's what the normal path expects (normally the GRO path does the fixup).
	This patch adds the necessary call to vlan_gro_frags.
	Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
	Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	ipv6: Fix NULL pointer dereference with time-wait sockets	Vlad Yasevich	2009-04-11
	Commit b2f5e7cd3dee2ed721bf0675e1a1ddebb849aee6 (ipv6: Fix conflict resolutions during ipv6 binding) introduced a regression where time-wait sockets were not treated correctly. This resulted in the following:
	BUG: unable to handle kernel NULL pointer dereference at 0000000000000062
	IP: [<ffffffff805d7d61>] ipv4_rcv_saddr_equal+0x61/0x70
	...
	Call Trace:
	[<ffffffffa033847b>] ipv6_rcv_saddr_equal+0x1bb/0x250 [ipv6]
	[<ffffffffa03505a8>] inet6_csk_bind_conflict+0x88/0xd0 [ipv6]
	[<ffffffff805bb18e>] inet_csk_get_port+0x1ee/0x400
	[<ffffffffa0319b7f>] inet6_bind+0x1cf/0x3a0 [ipv6]
	[<ffffffff8056d17c>] ? sockfd_lookup_light+0x3c/0xd0
	[<ffffffff8056ed49>] sys_bind+0x89/0x100
	[<ffffffff80613ea2>] ? trace_hardirqs_on_thunk+0x3a/0x3c
	[<ffffffff8020bf9b>] system_call_fastpath+0x16/0x1b
	Tested-by: Brian Haley <brian.haley@hp.com>
	Tested-by: Ed Tomlinson <edt@aei.ca>
	Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	tr: fix leakage of device in net/802/tr.c	Wei Yongjun	2009-04-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add dev_put() after dev_get_by_index() to avoid leakage of device. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	net: netif_device_attach/detach should start/stop all queues	Alexander Duyck	2009-04-11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently netif_device_attach/detach are only stopping one queue. They should be starting and stopping all the queues on a given device. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6	David S. Miller	2009-04-08
\|\\| \| \| \| \| \| \|
\| *	netfilter: ctnetlink: fix regression in expectation handling	Pablo Neira Ayuso	2009-04-06
	This patch fixes a regression (introduced by myself in commit 19abb7b: netfilter: ctnetlink: deliver events for conntracks changed from userspace) that results in an expectation re-insertion since __nf_ct_expect_check() may return 0 for expectation timer refreshing.
	This patch also removes an unnecessary refcount bump that pretended to avoid a possible race condition with event delivery and expectation timers (as said, not needed since we hold a reference to the object until we finish the expectation setup). This also merges nf_ct_expect_related_report() and nf_ct_expect_related() which look basically the same.
	Reported-by: Patrick McHardy <kaber@trash.net>
	Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
	Signed-off-by: Patrick McHardy <kaber@trash.net>
\| *	netfilter: fix selection of "LED" target in netfilter	Alex Riesen	2009-04-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	It's plural, not LED_TRIGGERS. Signed-off-by: Alex Riesen <fork0@users.sourceforge.net> Signed-off-by: Patrick McHardy <kaber@trash.net>
\| *	netfilter: ip6tables regression fix	Eric Dumazet	2009-04-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 7845447 (netfilter: iptables: lock free counters) broke ip6_tables by unconditionally returning ENOMEM in alloc_counters(), Reported-by: Graham Murray <graham@gmurray.org.uk> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
* \|	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	Linus Torvalds	2009-04-06
	* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
	  b44: Use kernel DMA addresses for the kernel DMA API
	  forcedeth: Fix resume from hibernation regression.
	  xfrm: fix fragmentation on inter family tunnels
	  ibm_newemac: Fix dangerous struct assumption
	  gigaset: documentation update
	  gigaset: in file ops, check for device disconnect before anything else
	  bas_gigaset: use tasklet_hi_schedule for timing critical tasklets
	  net/802/fddi.c: add MODULE_LICENSE
	  smsc911x: remove unused #include <linux/version.h>
	  axnet_cs: fix phy_id detection for bogus Asix chip.
	  bnx2: Use request_firmware()
	  b44: Fix sizes passed to b44_sync_dma_desc_for_{device,cpu}()
	  socket: use percpu_add() while updating sockets_in_use
	  virtio_net: Set the mac config only when VIRITO_NET_F_MAC
	  myri_sbus: use request_firmware
	  e1000: fix loss of multicast packets
	  vxge: should include tcp.h
	Conflict in firmware/WHENCE (SCSI vs net firmware)
\| * \|	xfrm: fix fragmentation on inter family tunnels	Steffen Klassert	2009-04-06
	If an ipv4 packet (not locally generated, with the IP_DF flag not set) bigger than the mtu size is supposed to go via a xfrm ipv6 tunnel, the packet size check in xfrm4_tunnel_check_size() is omitted and ipv6 drops the packet without sending a notice to the original sender of the ipv4 packet.
	Another issue is that ipv4 connection tracking does reassembling of incoming fragmented packets. If such a reassembled packet is supposed to go via a xfrm ipv6 tunnel it will be dropped, even if the original sender did proper fragmentation.
	According to RFC 2473 (section 7) tunnel ipv6 packets resulting from the encapsulation of an original packet are considered as locally generated packets. If such a packet passed the checks in xfrm{4,6}_tunnel_check_size() fragmentation is allowed according to RFC 2473 (section 7.1/7.2).
	This patch sets skb->local_df in xfrm6_prepare_output() to achieve fragmentation in this case.
	Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net/802/fddi.c: add MODULE_LICENSE	Adrian Bunk	2009-04-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds the missing MODULE_LICENSE("GPL"). Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	socket: use percpu_add() while updating sockets_in_use	Eric Dumazet	2009-04-04
	sock_alloc() currently uses following code to update sockets_in_use
	get_cpu_var(sockets_in_use)++;
	put_cpu_var(sockets_in_use);
	This translates to :
	c0436274: b8 01 00 00 00        mov $0x1,%eax
	c0436279: e8 42 40 df ff        call c022a2c0 <add_preempt_count>
	c043627e: bb 20 4f 6a c0        mov $0xc06a4f20,%ebx
	c0436283: e8 18 ca f0 ff        call c0342ca0 <debug_smp_processor_id>
	c0436288: 03 1c 85 60 4a 65 c0  add -0x3f9ab5a0(,%eax,4),%ebx
	c043628f: ff 03                 incl (%ebx)
	c0436291: b8 01 00 00 00        mov $0x1,%eax
	c0436296: e8 75 3f df ff        call c022a210 <sub_preempt_count>
	c043629b: 89 e0                 mov %esp,%eax
	c043629d: 25 00 e0 ff ff        and $0xffffe000,%eax
	c04362a2: f6 40 08 08           testb $0x8,0x8(%eax)
	c04362a6: 75 07                 jne c04362af <sock_alloc+0x7f>
	c04362a8: 8d 46 d8              lea -0x28(%esi),%eax
	c04362ab: 5b                    pop %ebx
	c04362ac: 5e                    pop %esi
	c04362ad: c9                    leave
	c04362ae: c3                    ret
	c04362af: e8 cc 5d 09 00        call c04cc080 <preempt_schedule>
	c04362b4: 8d 74 26 00           lea 0x0(%esi,%eiz,1),%esi
	c04362b8: eb ee                 jmp c04362a8 <sock_alloc+0x78>
	While percpu_add(sockets_in_use, 1) translates to a single instruction :
	c0436275: 64 83 05 20 5f 6a c0  addl $0x1,%fs:0xc06a5f20
	Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
	Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \|	Merge branch 'for-2.6.30' of git://linux-nfs.org/~bfields/linux	Linus Torvalds	2009-04-06
	* 'for-2.6.30' of git://linux-nfs.org/~bfields/linux: (81 commits)
	  nfsd41: define nfsd4_set_statp as noop for !CONFIG_NFSD_V4
	  nfsd41: define NFSD_DRC_SIZE_SHIFT in set_max_drc
	  nfsd41: Documentation/filesystems/nfs41-server.txt
	  nfsd41: CREATE_EXCLUSIVE4_1
	  nfsd41: SUPPATTR_EXCLCREAT attribute
	  nfsd41: support for 3-word long attribute bitmask
	  nfsd: dynamically skip encoded fattr bitmap in _nfsd4_verify
	  nfsd41: pass writable attrs mask to nfsd4_decode_fattr
	  nfsd41: provide support for minor version 1 at rpc level
	  nfsd41: control nfsv4.1 svc via /proc/fs/nfsd/versions
	  nfsd41: add OPEN4_SHARE_ACCESS_WANT nfs4_stateid bmap
	  nfsd41: access_valid
	  nfsd41: clientid handling
	  nfsd41: check encode size for sessions maxresponse cached
	  nfsd41: stateid handling
	  nfsd: pass nfsd4_compound_state* to nfs4_preprocess_{state,seq}id_op
	  nfsd41: destroy_session operation
	  nfsd41: non-page DRC for solo sequence responses
	  nfsd41: Add a create session replay cache
	  nfsd41: create_session operation
	  ...
\| * \|	nfsd: don't use the deferral service, return NFS4ERR_DELAY	Andy Adamson	2009-04-03
	On an NFSv4.1 server cache miss that causes an upcall, NFS4ERR_DELAY will be returned. It is up to the NFSv4.1 client to resend only the operations that have not been processed.
	Initialize rq_usedeferral to 1 in svc_process(). It will be turned off in nfsd4_proc_compound() only when NFSv4.1 Sessions are used.
	Note: this isn't an adequate solution on its own. It's acceptable as a way to get some minimal 4.1 up and working, but we're going to have to find a way to avoid returning DELAY in all common cases before 4.1 can really be considered ready.
	Signed-off-by: Andy Adamson <andros@netapp.com>
	Signed-off-by: Benny Halevy <bhalevy@panasas.com>
	[nfsd41: reverse rq_nodeferral negative logic]
	Signed-off-by: Benny Halevy <bhalevy@panasas.com>
	[sunrpc: initialize rq_usedeferral]
	Signed-off-by: Andy Adamson <andros@netapp.com>
	Signed-off-by: Benny Halevy <bhalevy@panasas.com>
	Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
\| * \|	sunrpc/svc.c: Remove unused line 'rqstp->rq_server = serv;' in svc_process	ideawu	2009-03-27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is no need to set rqstp->rq_server to serv, while serv is initialized as rqstp->rq_server at previous line. And between these two lines, there is no change to rqstp->rq_server. Signed-off-by: ideawu <ideawu@163.com> Reviewed-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
\| * \|	svcrpc: take advantage of tcp autotuning	Olga Kornievskaia	2009-03-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allow the NFSv4 server to make use of TCP autotuning behaviour, which was previously disabled by setting the sk_userlocks variable. Set the receive buffers to be big enough to receive the whole RPC request, and set this for the listening socket, not the accept socket. Remove the code that readjusts the receive/send buffer sizes for the accepted socket. Previously this code was used to influence the TCP window management behaviour, which is no longer needed when autotuning is enabled. This can improve IO bandwidth on networks with high bandwidth-delay products, where a large tcp window is required. It also simplifies performance tuning, since getting adequate tcp buffers previously required increasing the number of nfsd threads. Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu> Cc: Jim Rees <rees@umich.edu> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
\| * \|	knfsd: add file to export stats about nfsd pools	Greg Banks	2009-03-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add /proc/fs/nfsd/pool_stats to export to userspace various statistics about the operation of rpc server thread pools. This patch is based on a forward-ported version of knfsd-add-pool-thread-stats which has been shipping in the SGI "Enhanced NFS" product since 2006 and which was previously posted: http://article.gmane.org/gmane.linux.nfs/10375 It has also been updated thus: * moved EXPORT_SYMBOL() to near the function it exports * made the new struct seq_operations const * used SEQ_START_TOKEN instead of ((void *)1) * merged fix from SGI PV 990526 "sunrpc: use dprintk instead of printk in svc_pool_stats_*()" by Harshula Jayasuriya. * merged fix from SGI PV 964001 "Crash reading pool_stats before nfsds are started". Signed-off-by: Greg Banks <gnb@sgi.com> Signed-off-by: Harshula Jayasuriya <harshula@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
\| * \|	knfsd: avoid overloading the CPU scheduler with enormous load averages	Greg Banks	2009-03-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Avoid overloading the CPU scheduler with enormous load averages when handling high call-rate NFS loads. When the knfsd bottom half is made aware of an incoming call by the socket layer, it tries to choose an nfsd thread and wake it up. As long as there are idle threads, one will be woken up. If there are lot of nfsd threads (a sensible configuration when the server is disk-bound or is running an HSM), there will be many more nfsd threads than CPUs to run them. Under a high call-rate low service-time workload, the result is that almost every nfsd is runnable, but only a handful are actually able to run. This situation causes two significant problems: 1. The CPU scheduler takes over 10% of each CPU, which is robbing the nfsd threads of valuable CPU time. 2. At a high enough load, the nfsd threads starve userspace threads of CPU time, to the point where daemons like portmap and rpc.mountd do not schedule for tens of seconds at a time. Clients attempting to mount an NFS filesystem timeout at the very first step (opening a TCP connection to portmap) because portmap cannot wake up from select() and call accept() in time. Disclaimer: these effects were observed on a SLES9 kernel, modern kernels' schedulers may behave more gracefully. The solution is simple: keep in each svc_pool a counter of the number of threads which have been woken but have not yet run, and do not wake any more if that count reaches an arbitrary small threshold. 
Testing was on a 4 CPU 4 NIC Altix using 4 IRIX clients, each with 16 synthetic client threads simulating an rsync (i.e. recursive directory listing) workload reading from an i386 RH9 install image (161480 regular files in 10841 directories) on the server. That tree is small enough to fit in the server's RAM, so no disk traffic was involved. This setup gives a sustained call rate in excess of 60000 calls/sec before being CPU-bound on the server. The server was running 128 nfsds. Profiling showed schedule() taking 6.7% of every CPU, and __wake_up() taking 5.2%. This patch drops those contributions to 3.0% and 2.2%. Load average was over 120 before the patch, and 20.9 after. This patch is a forward-ported version of knfsd-avoid-nfsd-overload which has been shipping in the SGI "Enhanced NFS" product since 2006. It has been posted before: http://article.gmane.org/gmane.linux.nfs/10374 Signed-off-by: Greg Banks <gnb@sgi.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
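The wake-throttling idea described in this commit message can be illustrated with a minimal userspace sketch. It is not the patch itself: `struct pool_sketch`, `pool_try_wake()`, `pool_thread_running()` and the `WAKE_THRESHOLD` value are hypothetical names and numbers chosen for illustration, and the locking the kernel would need around the counters is left out.

```c
#include <stdbool.h>

/* Assumed value: the commit message only says "an arbitrary small threshold". */
#define WAKE_THRESHOLD 5

/* Hypothetical, simplified stand-in for the per-pool state in svc_pool. */
struct pool_sketch {
    int idle_threads; /* threads asleep and available to be woken */
    int nwaking;      /* threads woken that have not yet run */
};

/*
 * Called when a new request arrives. Returns true if a thread was woken.
 * When enough threads are already on their way to the CPU, the request is
 * left queued instead of making yet another thread runnable.
 */
bool pool_try_wake(struct pool_sketch *p)
{
    if (p->idle_threads == 0)
        return false;               /* nobody to wake */
    if (p->nwaking >= WAKE_THRESHOLD)
        return false;               /* throttle: too many pending wakeups */
    p->idle_threads--;
    p->nwaking++;
    return true;
}

/* Called by a woken thread once it actually gets to run. */
void pool_thread_running(struct pool_sketch *p)
{
    if (p->nwaking > 0)
        p->nwaking--;
}
```

With 128 idle threads and a burst of 100 incoming calls, this sketch wakes only `WAKE_THRESHOLD` threads; further wakeups become possible as each woken thread reports that it is running.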
* \| \|	Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask	Linus Torvalds	2009-04-05
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-cpumask: (36 commits) cpumask: remove cpumask allocation from idle_balance, fix numa, cpumask: move numa_node_id default implementation to topology.h, fix cpumask: remove cpumask allocation from idle_balance x86: cpumask: x86 mmio-mod.c use cpumask_var_t for downed_cpus x86: cpumask: update 32-bit APM not to mug current->cpus_allowed x86: microcode: cleanup x86: cpumask: use work_on_cpu in arch/x86/kernel/microcode_core.c cpumask: fix CONFIG_CPUMASK_OFFSTACK=y cpu hotunplug crash numa, cpumask: move numa_node_id default implementation to topology.h cpumask: convert node_to_cpumask_map[] to cpumask_var_t cpumask: remove x86 cpumask_t uses. cpumask: use cpumask_var_t in uv_flush_tlb_others. cpumask: remove cpumask_t assignment from vector_allocation_domain() cpumask: make Xen use the new operators. cpumask: clean up summit's send_IPI functions cpumask: use new cpumask functions throughout x86 x86: unify cpu_callin_mask/cpu_callout_mask/cpu_initialized_mask/cpu_sibling_setup_mask cpumask: convert struct cpuinfo_x86's llc_shared_map to cpumask_var_t cpumask: convert node_to_cpumask_map[] to cpumask_var_t x86: unify 32 and 64-bit node_to_cpumask_map ...
\| * \ \	Merge branch 'cpumask-for-linus' of ↵	Rusty Russell	2009-03-30
\| \|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip Conflicts: arch/x86/include/asm/topology.h drivers/oprofile/buffer_sync.c (Both cases: changed in Linus' tree, removed in Ingo's).
\| \| * \ \	Merge branch 'linus' into cpumask-for-linus	Ingo Molnar	2009-03-30
\| \| \|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Conflicts: arch/x86/kernel/cpu/common.c
\| \| * \| \| \|	cpumask: replace node_to_cpumask with cpumask_of_node.	Rusty Russell	2009-03-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Impact: cleanup node_to_cpumask (and the blecherous node_to_cpumask_ptr which contained a declaration) are replaced now everyone implements cpumask_of_node. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>