litmus-rt.git - The LITMUS^RT kernel.

	Commit message (Collapse)	Author	Age
*	quota: manage reserved space when quota is not active [v2]	Dmitry Monakhov	2010-04-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit c469070aea5a0ada45a836937c776fd3083dae2b upstream. Since we implemented generic reserved space management interface, then it is possible to account reserved space even when quota is not active (similar to i_blocks/i_bytes). Without this patch following testcase result in massive comlain from WARN_ON in dquot_claim_space() TEST_CASE: mount /dev/sdb /mnt -oquota dd if=/dev/zero of=/mnt/test bs=1M count=1 quotaon /mnt # fs_reserved_spave == 1Mb # quota_reserved_space == 0, because quota was disabled dd if=/dev/zero of=/mnt/test seek=1 bs=1M count=1 # fs_reserved_spave == 2Mb # quota_reserved_space == 1Mb sync # ->dquot_claim_space() -> WARN_ON Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	skbuff: remove unused dma_head & dma_maps fields	Alexander Duyck	2010-04-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	[ Upstream commit 03e6d819c2cb2cc8ce5642669a0a7c72336ee7a2 ] The dma map fields in the skb_shared_info structure no longer has any users and can be dropped since it is making the skb_shared_info unecessarily larger. Running slabtop show that we were using 4K slabs for the skb->head on x86_64 w/ an allocation size of 1522. It turns out that the dma_head and dma_maps array made skb_shared large enough that we had crossed over the 2k boundary with standard frames and as such we were using 4k blocks of memory for all skbs. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	net: Potential null skb->dev dereference	Eric Dumazet	2010-04-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	[ Upstream commit 0641e4fbf2f824faee00ea74c459a088d94905fd ] When doing "ifenslave -d bond0 eth0", there is chance to get NULL dereference in netif_receive_skb(), because dev->master suddenly becomes NULL after we tested it. We should use ACCESS_ONCE() to avoid this (or rcu_dereference()) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	netlink: fix unaligned access in nla_get_be64()	Pablo Neira Ayuso	2010-04-01
\| \| \| \| \| \| \| \| \| \| \| \|	[ Upstream commit f5d410f2ea7ba340f11815a56e05b9fa9421c421 ] This patch fixes a unaligned access in nla_get_be64() that was introduced by myself in a17c859849402315613a0015ac8fbf101acf0cc1. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	netfilter: ctnetlink: fix reliable event delivery if message building fails	Pablo Neira Ayuso	2010-04-01
\| \| \| \| \| \| \| \| \| \| \| \| \|	[ Upstream commit 37b7ef7203240b3aba577bb1ff6765fe15225976 ] This patch fixes a bug that allows to lose events when reliable event delivery mode is used, ie. if NETLINK_BROADCAST_SEND_ERROR and NETLINK_RECV_NO_ENOBUFS socket options are set. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	netlink: fix NETLINK_RECV_NO_ENOBUFS in netlink_set_err()	Pablo Neira Ayuso	2010-04-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	[ Upstream commit 1a50307ba1826e4da0024e64b245ce4eadf7688a ] Currently, ENOBUFS errors are reported to the socket via netlink_set_err() even if NETLINK_RECV_NO_ENOBUFS is set. However, that should not happen. This fixes this problem and it changes the prototype of netlink_set_err() to return the number of sockets that have set the NETLINK_RECV_NO_ENOBUFS socket option. This return value is used in the next patch in these bugfix series. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	ipsec: Fix bogus bundle flowi	Herbert Xu	2010-04-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	[ Upstream commit 87c1e12b5eeb7b30b4b41291bef8e0b41fc3dde9 ] When I merged the bundle creation code, I introduced a bogus flowi value in the bundle. Instead of getting from the caller, it was instead set to the flow in the route object, which is totally different. The end result is that the bundles we created never match, and we instead end up with an ever growing bundle list. Thanks to Jamal for find this problem. Reported-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	net: add __must_check to sk_add_backlog	Zhu Yi	2010-04-01
\| \| \| \| \| \| \| \| \| \| \| \|	[ Upstream commit 4045635318538d3ddd2007720412fdc4b08f6a62 ] Add the "__must_check" tag to sk_add_backlog() so that any failure to check and drop packets will be warned about. Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	net: backlog functions rename	Zhu Yi	2010-04-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	[ Upstream commit a3a858ff18a72a8d388e31ab0d98f7e944841a62 ] sk_add_backlog -> __sk_add_backlog sk_add_backlog_limited -> sk_add_backlog Signed-off-by: Zhu Yi <yi.zhu@intel.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	net: add limit for socket backlog	Zhu Yi	2010-04-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	[ Upstream commit 8eae939f1400326b06d0c9afe53d2a484a326871 ] We got system OOM while running some UDP netperf testing on the loopback device. The case is multiple senders sent stream UDP packets to a single receiver via loopback on local host. Of course, the receiver is not able to handle all the packets in time. But we surprisingly found that these packets were not discarded due to the receiver's sk->sk_rcvbuf limit. Instead, they are kept queuing to sk->sk_backlog and finally ate up all the memory. We believe this is a secure hole that a none privileged user can crash the system. The root cause for this problem is, when the receiver is doing __release_sock() (i.e. after userspace recv, kernel udp_recvmsg -> skb_free_datagram_locked -> release_sock), it moves skbs from backlog to sk_receive_queue with the softirq enabled. In the above case, multiple busy senders will almost make it an endless loop. The skbs in the backlog end up eat all the system memory. The issue is not only for UDP. Any protocols using socket backlog is potentially affected. The patch adds limit for socket backlog so that the backlog size cannot be expanded endlessly. Reported-by: Alex Shi <alex.shi@intel.com> Cc: David Miller <davem@davemloft.net> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi> Cc: Patrick McHardy <kaber@trash.net> Cc: Vlad Yasevich <vladislav.yasevich@hp.com> Cc: Sridhar Samudrala <sri@us.ibm.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: Allan Stephens <allan.stephens@windriver.com> Cc: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: Zhu Yi <yi.zhu@intel.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	mac80211: Retry null data frame for power save	Vivek Natarajan	2010-04-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 375177bf35efc08e1bd37bbda4cc0c8cc4db8500 upstream. Even if the null data frame is not acked by the AP, mac80211 goes into power save. This might lead to loss of frames from the AP. Prevent this by restarting dynamic_ps_timer when ack is not received for null data frames. Cc: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Vivek Natarajan <vnatarajan@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	if_tunnel.h: add missing ams/byteorder.h include	Paulius Zaleckas	2010-04-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 9bf35c8dddd56f7f247a27346f74f5adc18071f4 upstream. When compiling userspace application which includes if_tunnel.h and uses GRE_* defines you will get undefined reference to __cpu_to_be16. Fix this by adding missing #include <asm/byteorder.h> Signed-off-by: Paulius Zaleckas <paulius.zaleckas@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	kfifo: fix KFIFO_INIT in include/linux/kfifo.h	David Härdeman	2010-04-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 4c87684d32e8f95715d53039dcd2d998dc63d1eb upstream. include/linux/kfifo.h first defines and then undefines __kfifo_initializer which is used by INIT_KFIFO (which is also a macro, so building a module which uses INIT_KFIFO will fail). Signed-off-by: David Härdeman <david@hardeman.nu> Acked-by: Stefani Seibold <stefani@seibold.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	tty: Take a 256 byte padding into account when buffering below sub-page units	Mel Gorman	2010-04-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 352fa6ad16b89f8ffd1a93b4419b1a8f2259feab upstream. The TTY layer takes some care to ensure that only sub-page allocations are made with interrupts disabled. It does this by setting a goal of "TTY_BUFFER_PAGE" to allocate. Unfortunately, while TTY_BUFFER_PAGE takes the size of tty_buffer into account, it fails to account that tty_buffer_find() rounds the buffer size out to the next 256 byte boundary before adding on the size of the tty_buffer. This patch adjusts the TTY_BUFFER_PAGE calculation to take into account the size of the tty_buffer and the padding. Once applied, tty_buffer_alloc() should not require high-order allocations. Signed-off-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	tty: Keep the default buffering to sub-page units	Alan Cox	2010-04-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit d9661adfb8e53a7647360140af3b92284cbe52d4 upstream. We allocate during interrupts so while our buffering is normally diced up small anyway on some hardware at speed we can pressure the VM excessively for page pairs. We don't really need big buffers to be linear so don't try so hard. In order to make this work well we will tidy up excess callers to request_room, which cannot itself enforce this break up. Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	perf: Provide generic perf_sample_data initialization	Peter Zijlstra	2010-04-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This makes it easier to extend perf_sample_data and fixes a bug on arm and sparc, which failed to set ->raw to NULL, which can cause crashes when combined with PERF_SAMPLE_RAW. It also optimizes PowerPC and tracepoint, because the struct initialization is forced to zero out the whole structure. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Jean Pihet <jpihet@mvista.com> Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: Jamie Iles <jamie.iles@picochip.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Stephane Eranian <eranian@google.com> LKML-Reference: <20100304140100.315416040@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	KVM: x86: Add KVM_CAP_X86_ROBUST_SINGLESTEP	Jan Kiszka	2010-04-01
\| \| \| \| \| \| \| \| \| \| \| \|	Commit d2be1651b736002e0c76d7095d6c0ba77b4a897c upstream. This marks the guest single-step API improvement of 94fe45da and 91586a3b with a capability flag to allow reliable detection by user space. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	mac80211: Fix HT rate control configuration	Sujith	2010-04-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 4fa004373133ece3d9b1c0a7e243b0e53760b165 upstream. Handling HT configuration changes involved setting the channel with the new HT parameters and then issuing a rate_update() notification to the driver. This behavior changed after the off-channel changes. Now, the channel is not updated with the new HT params in enable_ht() - instead, it is now done when the scan work terminates. This results in the driver depending on stale information, defaulting to non-HT mode always. Fix this by passing the new channel type to the driver. Signed-off-by: Sujith <Sujith.Manoharan@atheros.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	decompress: fix new decompressor for PIC	Russell King	2010-04-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 5ceaa2f39bfa73c4398cd01e78f1c3ebde3d3383 upstream. The ARM kernel decompressor wants to be able to relocate r/w data independently from the rest of the image, and we do this by ensuring that r/w data has global visibility. Define STATIC_RW_DATA to be empty to achieve this. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Alain Knaff <alain@knaff.lu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	sched: Fix sched_mv_power_savings for !SMT	Vaidyanathan Srinivasan	2010-03-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 28f5318167adf23b16c844b9c2253f355cb21796 upstream. Fix for sched_mc_powersavigs for pre-Nehalem platforms. Child sched domain should clear SD_PREFER_SIBLING if parent will have SD_POWERSAVINGS_BALANCE because they are contradicting. Sets the flags correctly based on sched_mc_power_savings. Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20100208100555.GD2931@dirshya.in.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	x86: Avoid race condition in pci_enable_msix()	Brandon Phiilps	2010-03-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit ced5b697a76d325e7a7ac7d382dbbb632c765093 upstream. Keep chip_data in create_irq_nr and destroy_irq. When two drivers are setting up MSI-X at the same time via pci_enable_msix() there is a race. See this dmesg excerpt: [ 85.170610] ixgbe 0000:02:00.1: irq 97 for MSI/MSI-X [ 85.170611] alloc irq_desc for 99 on node -1 [ 85.170613] igb 0000:08:00.1: irq 98 for MSI/MSI-X [ 85.170614] alloc kstat_irqs on node -1 [ 85.170616] alloc irq_2_iommu on node -1 [ 85.170617] alloc irq_desc for 100 on node -1 [ 85.170619] alloc kstat_irqs on node -1 [ 85.170621] alloc irq_2_iommu on node -1 [ 85.170625] ixgbe 0000:02:00.1: irq 99 for MSI/MSI-X [ 85.170626] alloc irq_desc for 101 on node -1 [ 85.170628] igb 0000:08:00.1: irq 100 for MSI/MSI-X [ 85.170630] alloc kstat_irqs on node -1 [ 85.170631] alloc irq_2_iommu on node -1 [ 85.170635] alloc irq_desc for 102 on node -1 [ 85.170636] alloc kstat_irqs on node -1 [ 85.170639] alloc irq_2_iommu on node -1 [ 85.170646] BUG: unable to handle kernel NULL pointer dereference at 0000000000000088 As you can see igb and ixgbe are both alternating on create_irq_nr() via pci_enable_msix() in their probe function. ixgbe: While looping through irq_desc_ptrs[] via create_irq_nr() ixgbe choses irq_desc_ptrs[102] and exits the loop, drops vector_lock and calls dynamic_irq_init. Then it sets irq_desc_ptrs[102]->chip_data = NULL via dynamic_irq_init(). igb: Grabs the vector_lock now and starts looping over irq_desc_ptrs[] via create_irq_nr(). It gets to irq_desc_ptrs[102] and does this: cfg_new = irq_desc_ptrs[102]->chip_data; if (cfg_new->vector != 0) continue; This hits the NULL deref. Another possible race exists via pci_disable_msix() in a driver or in the number of error paths that call free_msi_irqs(): destroy_irq() dynamic_irq_cleanup() which sets desc->chip_data = NULL ...race window... desc->chip_data = cfg; Remove the save and restore code for cfg in create_irq_nr() and destroy_irq() and take the desc->lock when checking the irq_cfg. Reported-and-analyzed-by: Brandon Philips <bphilips@suse.de> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-3-git-send-email-yinghai@kernel.org> Signed-off-by: Brandon Phililps <bphilips@suse.de> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	netdevice.h: check for CONFIG_WLAN instead of CONFIG_WLAN_80211	John W. Linville	2010-03-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit caf66e581172dc5032bb84841a91bc7b77ad9876 upstream. In "wireless: remove WLAN_80211 and WLAN_PRE80211 from Kconfig" I inadvertantly missed a line in include/linux/netdevice.h. I thereby effectively reverted "net: Set LL_MAX_HEADER properly for wireless." by accident. :-( Now we should check there for CONFIG_WLAN instead. Signed-off-by: John W. Linville <linville@tuxdriver.com> Reported-by: Christoph Egger <siccegge@stud.informatik.uni-erlangen.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	skbuff: align sk_buff::cb to 64 bit and close some potential holes	Felix Fietkau	2010-03-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit da3f5cf1f8ebb0fab5c5fd09adb189166594ad6c upstream. The alignment requirement for 64-bit load/store instructions on ARM is implementation defined. Some CPUs (such as Marvell Feroceon) do not generate an exception, if such an instruction is executed with an address that is not 64 bit aligned. In such a case, the Feroceon corrupts adjacent memory, which showed up in my tests as a crash in the rx path of ath9k that only occured with CONFIG_XFRM set. This crash happened, because the first field of the mac80211 rx status info in the cb is an u64, and changing it corrupted the skb->sp field. This patch also closes some potential pre-existing holes in the sk_buff struct surrounding the cb[] area. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	tracing: Fix ftrace_event_call alignment for use with gcc 4.5	Jeff Mahoney	2010-03-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 86c38a31aa7f2dd6e74a262710bf8ebf7455acc5 upstream. GCC 4.5 introduces behavior that forces the alignment of structures to use the largest possible value. The default value is 32 bytes, so if some structures are defined with a 4-byte alignment and others aren't declared with an alignment constraint at all - it will align at 32-bytes. For things like the ftrace events, this results in a non-standard array. When initializing the ftrace subsystem, we traverse the _ftrace_events section and call the initialization callback for each event. When the structures are misaligned, we could be treating another part of the structure (or the zeroed out space between them) as a function pointer. This patch forces the alignment for all the ftrace_event_call structures to 4 bytes. Without this patch, the kernel fails to boot very early when built with gcc 4.5. It's trivial to check the alignment of the members of the array, so it might be worthwhile to add something to the build system to do that automatically. Unfortunately, that only covers this case. I've asked one of the gcc developers about adding a warning when this condition is seen. Signed-off-by: Jeff Mahoney <jeffm@suse.com> LKML-Reference: <4B85770B.6010901@suse.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	perf: Reimplement frequency driven sampling	Peter Zijlstra	2010-03-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit abd50713944c8ea9e0af5b7bffa0aacae21cc91a upstream. There was a bug in the old period code that caused intel_pmu_enable_all() or native_write_msr_safe() to show up quite high in the profiles. In staring at that code it made my head hurt, so I rewrote it in a hopefully simpler fashion. Its now fully symetric between tick and overflow driven adjustments and uses less data to boot. The only complication is that it basically wants to do a u128 division. The code approximates that in a rather simple truncate until it fits fashion, taking care to balance the terms while truncating. This version does not generate that sampling artefact. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM	Wu Fengguang	2010-03-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	commit 0141450f66c3c12a3aaa869748caa64241885cdf upstream. This fixes inefficient page-by-page reads on POSIX_FADV_RANDOM. POSIX_FADV_RANDOM used to set ra_pages=0, which leads to poor performance: a 16K read will be carried out in 4 _sync_ 1-page reads. In other places, ra_pages==0 means - it's ramfs/tmpfs/hugetlbfs/sysfs/configfs - some IO error happened where multi-page read IO won't help or should be avoided. POSIX_FADV_RANDOM actually want a different semantics: to disable the heuristic readahead algorithm, and to use a dumb one which faithfully submit read IO for whatever application requests. So introduce a flag FMODE_RANDOM for POSIX_FADV_RANDOM. Note that the random hint is not likely to help random reads performance noticeably. And it may be too permissive on huge request size (its IO size is not limited by read_ahead_kb). In Quentin's report (http://lkml.org/lkml/2009/12/24/145), the overall (NFS read) performance of the application increased by 313%! Tested-by: Quentin Barnes <qbarnes+nfs@yahoo-inc.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com> Cc: Nick Piggin <npiggin@suse.de> Cc: Andi Kleen <andi@firstfloor.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: David Howells <dhowells@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: <qbarnes+nfs@yahoo-inc.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	Revert "block: improve queue_should_plug() by looking at IO depths"	Jens Axboe	2010-02-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit fb1e75389bd06fd5987e9cda1b4e0305c782f854. "Benjamin S." <sbenni@gmx.de> reports that the patch in question causes a big drop in sequential throughput for him, dropping from 200MB/sec down to only 70MB/sec. Needs to be investigated more fully, for now lets just revert the offending commit. Conflicts: include/linux/blkdev.h Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
*	Merge branch 'for-linus' of ↵	Linus Torvalds	2010-02-20
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: CacheFiles: Fix a race in cachefiles_delete_object() vs rename vfs: don't call ima_file_check() unconditionally in nfsd_open() fs: inode - remove 8 bytes of padding on 64bits allowing 1 more objects/slab under slub Switch proc/self to nd_set_link() fix LOOKUP_FOLLOW on automount "symlinks"
\| *	fs: inode - remove 8 bytes of padding on 64bits allowing 1 more objects/slab ↵	Richard Kennedy	2010-02-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	under slub This removes 8 bytes of padding from struct inode on 64bit builds, and so allows 1 more object/slab in the inode_cache when using slub. Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk> ---- patch against 2.6.33-rc8 compiled & tested on x86_64 AMDX2 I've been running this patch for over a week with no obvious problems regards Richard Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* \|	ARM: 5951/1: ARM: fix documentation of the PrimeCell bus	Linus Walleij	2010-02-20
\|/ \| \| \| \| \| \| \|	This fixes the filepath encoded in <linux/amba/bus.h> and adds some documentation as to what this bus really means. Signed-off-by: Linus Walleij <linus.walleij@stericsson.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
*	Merge branch 'for-linus' of ↵	Linus Torvalds	2010-02-18
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: add KEY_RFKILL Input: i8042 - fix KBC jam during hibernate
\| *	Input: add KEY_RFKILL	Matthew Garrett	2010-02-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Most laptops have keys that are intended to toggle all device state, not just wifi. These are currently generally mapped to KEY_WLAN. As a result, rfkill will only kill or enable wifi in response to the key press. This confuses users and can make it difficult for them to enable bluetooth and wwan devices. This patch adds a new keycode, KEY_RFKILL. It indicates that the system should toggle the state of all rfkillable devices. Signed-off-by: Matthew Garrett <mjg@redhat.com> Acked-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
* \|	kfifo: Don't use integer as NULL pointer	Anton Vorontsov	2010-02-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes following sparse warnings: include/linux/kfifo.h:127:25: warning: Using plain integer as NULL pointer kernel/kfifo.c:83:21: warning: Using plain integer as NULL pointer Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com> Acked-by: Stefani Seibold <stefani@seibold.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* \|	Merge branch 'perf-fixes-for-linus' of ↵	Linus Torvalds	2010-02-15
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf top: Fix help text alignment perf: Fix hypervisor sample reporting perf: Make bp_len type to u64 generic across the arch
\| * \|	perf: Make bp_len type to u64 generic across the arch	Mahesh Salgaonkar	2010-02-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change 'bp_len' type to __u64 to make it work across archs as the s390 architecture watch point length can be upto 2^64. reference: http://lkml.org/lkml/2010/1/25/212 This is an ABI change that is not backward compatible with the previous hardware breakpoint info layout integrated in this development cycle, a rebuilt of perf tools is necessary for versions based on 2.6.33-rc1 - 2.6.33-rc6 to work with a kernel based on this patch. Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: "K. Prasad" <prasad@linux.vnet.ibm.com> Cc: Maneesh Soni <maneesh@in.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Martin <schwidefsky@de.ibm.com> LKML-Reference: <20100130045518.GA20776@in.ibm.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
* \| \|	Merge remote branch 'nouveau/for-airlied' of nouveau-2.6	Dave Airlie	2010-02-10
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* 'nouveau/for-airlied' of /home/airlied/kernel/drm-next: nouveau: fix state detection with switchable graphics drm/nouveau: move dereferences after null checks drm/nv50: make the pgraph irq handler loop like the pre-nv50 version drm/nv50: delete ramfc object after disabling fifo, not before drm/nv50: avoid unloading pgraph context when ctxprog is running drm/nv50: align size of buffer object to the right boundaries. drm/nv50: disregard dac outputs in nv50_sor_dpms() drm/nv50: prevent multiple init tables being parsed at the same time drm/nouveau: make dp auxch xfer len check for reads only drm/nv40: make INIT_COMPUTE_MEM a NOP, just like nv50 drm/nouveau: Add proper vgaarb support. drm/nouveau: Fix fbcon on mixed pre-NV50 + NV50 multicard. drivers/gpu/drm/nouveau/nouveau_grctx.c: correct NULL test drm/nouveau: call ttm_bo_wait with the bo lock held to prevent hang drm/nouveau: Fixup semaphores on pre-nv50 cards. drm/nouveau: Add getparam to get available PGRAPH units. drm/nouveau: Add module options to disable acceleration. drm/nouveau: fix non-vram notifier blocks
\| * \| \|	drm/nouveau: Add getparam to get available PGRAPH units.	Marcin Kościelnicki	2010-02-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On nv50, this will be needed by applications using CUDA to know how much stack/local memory to allocate. Signed-off-by: Marcin Kościelnicki <koriakin@0x04.net> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
\| * \| \|	Merge remote branch 'korg/drm-radeon-testing' into drm-testing	Dave Airlie	2010-01-07
\| \|\ \ \
* \| \| \| \|	drm/vmwgfx: Drop scanout flag compat and add execbuf ioctl parameter ↵	Jakob Bornecrantz	2010-02-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	members. Bumps major. Even if this bumps the version to 1 it does not mean the driver is out of staging. From what we know this is the last backwards incompatible change to the driver. Signed-off-by: Jakob Bornecrantz <jakob@vmware.com> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
* \| \| \| \|	drm/vmwgfx: Update the user-space interface.	Thomas Hellstrom	2010-02-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When time-based throttling is implemented, we need to bump minor. When the old way of detecting scanout is removed, we need to bump major. In the meantime, this change should not break existing user-space. Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Jakob Bornecrantz <jakob@vmware.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
* \| \| \| \|	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	Linus Torvalds	2010-02-10
\|\ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (29 commits) drivers/net: Correct NULL test MAINTAINERS: networking drivers - Add git net-next tree net/sched: Fix module name in Kconfig cxgb3: fix GRO checksum check dst: call cond_resched() in dst_gc_task() netfilter: nf_conntrack: fix hash resizing with namespaces netfilter: xtables: compat out of scope fix netfilter: nf_conntrack: restrict runtime expect hashsize modifications netfilter: nf_conntrack: per netns nf_conntrack_cachep netfilter: nf_conntrack: fix memory corruption with multiple namespaces Bluetooth: Keep a copy of each HID device's report descriptor pktgen: Fix freezing problem igb: make certain to reassign legacy interrupt vectors after reset irda: add missing BKL in irnet_ppp ioctl irda: unbalanced lock_kernel in irnet_ppp ixgbe: Fix return of invalid txq ixgbe: Fix ixgbe_tx_map error path netxen: protect resource cleanup by rtnl lock netxen: fix tx timeout recovery for NX2031 chip Bluetooth: Enter active mode before establishing a SCO link. ...
\| * \| \| \| \|	netfilter: nf_conntrack: fix hash resizing with namespaces	Patrick McHardy	2010-02-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As noticed by Jon Masters <jonathan@jonmasters.org>, the conntrack hash size is global and not per namespace, but modifiable at runtime through /sys/module/nf_conntrack/hashsize. Changing the hash size will only resize the hash in the current namespace however, so other namespaces will use an invalid hash size. This can cause crashes when enlarging the hashsize, or false negative lookups when shrinking it. Move the hash size into the per-namespace data and only use the global hash size to initialize the per-namespace value when instanciating a new namespace. Additionally restrict hash resizing to init_net for now as other namespaces are not handled currently. Cc: stable@kernel.org Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \| \| \| \|	netfilter: nf_conntrack: per netns nf_conntrack_cachep	Eric Dumazet	2010-02-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	nf_conntrack_cachep is currently shared by all netns instances, but because of SLAB_DESTROY_BY_RCU special semantics, this is wrong. If we use a shared slab cache, one object can instantly flight between one hash table (netns ONE) to another one (netns TWO), and concurrent reader (doing a lookup in netns ONE, 'finding' an object of netns TWO) can be fooled without notice, because no RCU grace period has to be observed between object freeing and its reuse. We dont have this problem with UDP/TCP slab caches because TCP/UDP hashtables are global to the machine (and each object has a pointer to its netns). If we use per netns conntrack hash tables, we also must use per netns conntrack slab caches, to guarantee an object can not escape from one namespace to another one. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> [Patrick: added unique slab name allocation] Cc: stable@kernel.org Signed-off-by: Patrick McHardy <kaber@trash.net>
* \| \| \| \| \|	ima: rename ima_path_check to ima_file_check	Mimi Zohar	2010-02-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ima_path_check actually deals with files! call it ima_file_check instead. Signed-off-by: Eric Paris <eparis@redhat.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* \| \| \| \| \|	fix ima breakage	Mimi Zohar	2010-02-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The "Untangling ima mess, part 2 with counters" patch messed up the counters. Based on conversations with Al Viro, this patch streamlines ima_path_check() by removing the counter maintaince. The counters are now updated independently, from measuring the file, in __dentry_open() and alloc_file() by calling ima_counts_get(). ima_path_check() is called from nfsd and do_filp_open(). It also did not measure all files that should have been measured. Reason: ima_path_check() got bogus value passed as mask. [AV: mea culpa] [AV: add missing nfsd bits] Signed-off-by: Mimi Zohar <zohar@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* \| \| \| \| \|	Merge branch 'upstream-linus' of ↵	Linus Torvalds	2010-02-05
\|\ \ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: [libata] Call flush_dcache_page after PIO data transfers in libata-sff.c ahci: add Acer G725 to broken suspend list libata: fix ata_id_logical_per_physical_sectors libata-scsi passthru: fix bug which truncated LBA48 return values
\| * \| \| \| \| \|	libata: fix ata_id_logical_per_physical_sectors	Christoph Hellwig	2010-02-04
\| \| \|_\|_\|/ / \| \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The value we get from the low byte of the ATA_ID_SECTOR_SIZE word is not not a plain multiple, but the log of it, so fix the helper to give the correct answer. Without this we'll get an incorrect minimal I/O size in the block limits VPD page for 4k sector drives. Also change the return value of ata_id_logical_per_physical_sectors to u16 for the unlikely case of very large logical sectors. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* \| \| \| \| \|	percpu: add __percpu for sparse	Stephen Rothwell	2010-02-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is to make the annotation of percpu variables during the next merge window less painfull. Extracted from a patch by Rusty Russell. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \| \| \| \|	Merge branch 'core-fixes-for-linus' of ↵	Linus Torvalds	2010-02-04
\|\ \ \ \ \ \ \| \|/ / / / / \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: futex: Handle futex value corruption gracefully futex: Handle user space corruption gracefully futex_lock_pi() key refcnt fix softlockup: Add sched_clock_tick() to avoid kernel warning on kgdb resume
\| * \| \| \| \|	softlockup: Add sched_clock_tick() to avoid kernel warning on kgdb resume	Jason Wessel	2010-02-01
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is set, sched_clock() gets the time from hardware such as the TSC on x86. In this configuration kgdb will report a softlock warning message on resuming or detaching from a debug session. Sequence of events in the problem case: 1) "cpu sched clock" and "hardware time" are at 100 sec prior to a call to kgdb_handle_exception() 2) Debugger waits in kgdb_handle_exception() for 80 sec and on exit the following is called ... touch_softlockup_watchdog() --> __raw_get_cpu_var(touch_timestamp) = 0; 3) "cpu sched clock" = 100s (it was not updated, because the interrupt was disabled in kgdb) but the "hardware time" = 180 sec 4) The first timer interrupt after resuming from kgdb_handle_exception updates the watchdog from the "cpu sched clock" update_process_times() { ... run_local_timers() --> softlockup_tick() --> check (touch_timestamp == 0) (it is "YES" here, we have set "touch_timestamp = 0" at kgdb) --> __touch_softlockup_watchdog() *(A)--> reset "touch_timestamp" to "get_timestamp()" (Here, the "touch_timestamp" will still be set to 100s.) ... scheduler_tick() (B)--> sched_clock_tick() (update "cpu sched clock" to "hardware time" = 180s) ... } 5) The Second timer interrupt handler appears to have a large jump and trips the softlockup warning. update_process_times() { ... run_local_timers() --> softlockup_tick() --> "cpu sched clock" - "touch_timestamp" = 180s-100s > 60s --> printk "soft lockup error messages" ... } note: **(A) reset "touch_timestamp" to "get_timestamp(this_cpu)" Why is "touch_timestamp" 100 sec, instead of 180 sec? When CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is set, the call trace of get_timestamp() is: get_timestamp(this_cpu) -->cpu_clock(this_cpu) -->sched_clock_cpu(this_cpu) -->__update_sched_clock(sched_clock_data, now) The __update_sched_clock() function uses the GTOD tick value to create a window to normalize the "now" values. So if "now" value is too big for sched_clock_data, it will be ignored. The fix is to invoke sched_clock_tick() to update "cpu sched clock" in order to recover from this state. This is done by introducing the function touch_softlockup_watchdog_sync(). This allows kgdb to request that the sched clock is updated when the watchdog thread runs the first time after a resume from kgdb. [yong.zhang0@gmail.com: Use per cpu instead of an array] Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Dongdong Deng <Dongdong.Deng@windriver.com> Cc: kgdb-bugreport@lists.sourceforge.net Cc: peterz@infradead.org LKML-Reference: <1264631124-4837-2-git-send-email-jason.wessel@windriver.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>