aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
...
| * | | | net: fix tx queue selection for bridged devices implementing select_queueHelmut Schaa2010-09-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a net device is implementing the select_queue callback and is part of a bridge, frames coming from the bridge already have a tx queue associated to the socket (introduced in commit a4ee3ce3293dc931fab19beb472a8bde1295aebe, "net: Use sk_tx_queue_mapping for connected sockets"). The call to sk_tx_queue_get will then return the tx queue used by the bridge instead of calling the select_queue callback. In case of mac80211 this broke QoS which is implemented by using the select_queue callback. Furthermore it introduced problems with rt2x00 because frames with the same TID and RA sometimes appeared on different tx queues which the hw cannot handle correctly. Fix this by always calling select_queue first if it is available and only afterwards use the socket tx queue mapping. Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | bonding: Fix jiffies overflow problems (again)Jiri Bohac2010-09-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The time_before_eq()/time_after_eq() functions operate on unsigned long and only work if the difference between the two compared values is smaller than half the range of unsigned long (31 bits on i386). Some of the variables (slave->jiffies, dev->trans_start, dev->last_rx) used by bonding store a copy of jiffies and may not be updated for a long time. With HZ=1000, time_before_eq()/time_after_eq() will start giving bad results after ~25 days. jiffies will never be before slave->jiffies, dev->trans_start, dev->last_rx by more than possibly a couple ticks caused by preemption of this code. This allows us to detect/prevent these overflows by replacing time_before_eq()/time_after_eq() with time_in_range(). Signed-off-by: Jiri Bohac <jbohac@suse.cz> Signed-off-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | stmmac: fix sleep inside atomicGiuseppe Cavallaro2010-09-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We cannot use spinlock when kmalloc is invoked with GFP_KERNEL flag because it can sleep. So this patch reviews the usage of spinlock within the stmmac_resume function avoing this bug. Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Reported-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | cls_cgroup: Fix rcu lockdep warningLi Zefan2010-09-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dave reported an rcu lockdep warning on 2.6.35.4 kernel task->cgroups and task->cgroups->subsys[i] are protected by RCU. So we avoid accessing invalid pointers here. This might happen, for example, when you are deref-ing those pointers while someone move @task from one cgroup to another. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | be2net: remove a BUG_ON in be_cmds.cAjit Khaparde2010-09-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Async notifications other than link status are possible in certain configurations. Remove the BUG_ON in the mcc completion processing path. Signed-off-by: Ajit Khaparde <ajitk@serverengines.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | be2net: fix a bug in UE detection logicAjit Khaparde2010-09-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ONLINE registers can return 0xFFFFFFFF on more than one occassion. On systems that care, reading these registers could lead to problems. So the new code decides that the ASIC has encountered and error by reading the UE_STATUS_LOW/HIGH registers. AND them with the mask values and a non-zero result indicates an error. Signed-off-by: Ajit Khaparde <ajitk@serverengines.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | be2net: fix net-snmp error because of wrong packet statsAjit Khaparde2010-09-03
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Wrong packet statistics for multicast Rx was causing net-snmp error messages every 15 seconds. Instead of picking the multicast stats from hardware, now maintain it in the driver itself. Signed-off-by: Ajit Khaparde <ajitk@serverengines.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6Linus Torvalds2010-09-11
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6: sparc: Kill all BKL usage.
| * | | | sparc: Kill all BKL usage.David S. Miller2010-09-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | They were all bogus artifacts and completely unnecessary. Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds2010-09-11
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, tsc: Fix a preemption leak in restore_sched_clock_state() sched: Move sched_avg_update() to update_cpu_load()
| * | | | | x86, tsc: Fix a preemption leak in restore_sched_clock_state()Peter Zijlstra2010-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Doh, a real life genuine preemption leak.. This caused a suspend failure. Reported-bisected-and-tested-by-the-invaluable: Jeff Chua <jeff.chua.linux@gmail.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Nico Schottelius <nico-linux-20100709@schottelius.org> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Florian Pritz <flo@xssn.at> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Len Brown <lenb@kernel.org> Cc: <stable@kernel.org> # Greg, please apply after: cd7240c ("x86, tsc, sched: Recompute cyc2ns_offset's during resume from") sleep states LKML-Reference: <1284150773.402.122.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | sched: Move sched_avg_update() to update_cpu_load()Suresh Siddha2010-09-09
| | |_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently sched_avg_update() (which updates rt_avg stats in the rq) is getting called from scale_rt_power() (in the load balance context) which doesn't take rq->lock. Fix it by moving the sched_avg_update() to more appropriate update_cpu_load() where the CFS load gets updated as well. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1282596171.2694.3.camel@sbsiddha-MOBL3> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | | Merge branch 'drm-intel-fixes' of ↵Linus Torvalds2010-09-10
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ickle/drm-intel * 'drm-intel-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ickle/drm-intel: drm/i915: don't enable self-refresh on Ironlake drm/i915: Double check that the wait_request is not pending before warning Revert "drm/i915: Warn if we run out of FIFO space for a mode" Revert "drm/i915: Allow LVDS on pipe A on gen4+" Revert "drm/i915: Enable RC6 on Ironlake."
| * | | | | drm/i915: don't enable self-refresh on IronlakeJesse Barnes2010-09-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't know how to enable it safely, especially as outputs turn on and off. When disabling LP1 we also need to make sure LP2 and 3 are already disabled. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=29173 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=29082 Reported-by: Chris Lord <chris@linux.intel.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Tested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: stable@kernel.org Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
| * | | | | drm/i915: Double check that the wait_request is not pending before warningChris Wilson2010-09-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we are busy, then we may have woken up the wait_request handler but not yet serviced it before the hang check fires. So in hang check, double check that the i915_gem_do_wait_request() is still pending the wake-up before declaring all hope lost. Fixes regression with e78d73b16bcde921c9cf458d2e4de8e4fc2518f3. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=30073 Reported-and-tested-by: Sitsofe Wheeler <sitsofe@yahoo.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
| * | | | | Revert "drm/i915: Warn if we run out of FIFO space for a mode"Chris Wilson2010-09-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit b9421ae8f30958deea98d71477b4a77a066856b4. This warning was so prelevant, even for apparently working machines, that it was just causing fear, anxiety and panic. The root cause still remains, so we will add some better debugging when we focus on fixing it. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=17021 Reported-by: Maciej Rutecki <maciej.rutecki@gmail.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
| * | | | | Revert "drm/i915: Allow LVDS on pipe A on gen4+"Chris Wilson2010-09-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 0f3ee801b332d6ff22285386675fe5aaedf035c3. Enabling LVDS on pipe A was causing excessive wakeups on otherwise idle systems due to i915 interrupts. So restrict the LVDS to pipe B once more, whilst the issue is properly diagnosed. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=16307 Reported-and-tested-by: Enrico Bandiello <enban@postal.uv.es> Poked-by: Florian Mickler <florian@mickler.org> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Adam Jackson <ajax@redhat.com> Cc: stable@kernel.org
| * | | | | Revert "drm/i915: Enable RC6 on Ironlake."Chris Wilson2010-09-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit ce17178094f368d9e3f39b2cb4303da5ed633dd4. This commit has been independently bisected a few times as being the cause of a s2ram failure. Reported-and-tested-by: Kyle McMartin <kyle@mcmartin.ca> Reported-and-tested-by: Andy Isaacson <adi@hexapodia.org> Cc: Zou Nan hai <nanhai.zou@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
* | | | | | Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds2010-09-10
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: log IO completion workqueue is a high priority queue xfs: prevent reading uninitialized stack memory
| * | | | | | xfs: log IO completion workqueue is a high priority queueDave Chinner2010-09-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The workqueue implementation in 2.6.36-rcX has changed, resulting in the workqueues no longer having dedicated threads for work processing. This has caused severe livelocks under heavy parallel create workloads because the log IO completions have been getting held up behind metadata IO completions. Hence log commits would stall, memory allocation would stall because pages could not be cleaned, and lock contention on the AIL during inode IO completion processing was being seen to slow everything down even further. By making the log Io completion workqueue a high priority workqueue, they are queued ahead of all data/metadata IO completions and processed before the data/metadata completions. Hence the log never gets stalled, and operations needed to clean memory can continue as quickly as possible. This avoids the livelock conditions and allos the system to keep running under heavy load as per normal. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Alex Elder <aelder@sgi.com>
| * | | | | | xfs: prevent reading uninitialized stack memoryDan Rosenberg2010-09-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The XFS_IOC_FSGETXATTR ioctl allows unprivileged users to read 12 bytes of uninitialized stack memory, because the fsxattr struct declared on the stack in xfs_ioc_fsgetxattr() does not alter (or zero) the 12-byte fsx_pad member before copying it back to the user. This patch takes care of it. Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Alex Elder <aelder@sgi.com>
* | | | | | | x86, tsc: Fix a preemption leak in restore_sched_clock_state()Peter Zijlstra2010-09-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A real life genuine preemption leak.. Reported-and-tested-by: Jeff Chua <jeff.chua.linux@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | execve: make responsive to SIGKILL with large argumentsRoland McGrath2010-09-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An execve with a very large total of argument/environment strings can take a really long time in the execve system call. It runs uninterruptibly to count and copy all the strings. This change makes it abort the exec quickly if sent a SIGKILL. Note that this is the conservative change, to interrupt only for SIGKILL, by using fatal_signal_pending(). It would be perfectly correct semantics to let any signal interrupt the string-copying in execve, i.e. use signal_pending() instead of fatal_signal_pending(). We'll save that change for later, since it could have user-visible consequences, such as having a timer set too quickly make it so that an execve can never complete, though it always happened to work before. Signed-off-by: Roland McGrath <roland@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | execve: improve interactivity with large argumentsRoland McGrath2010-09-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a preemption point during the copying of the argument and environment strings for execve, in copy_strings(). There is already a preemption point in the count() loop, so this doesn't add any new points in the abstract sense. When the total argument+environment strings are very large, the time spent copying them can be much more than a normal user time slice. So this change improves the interactivity of the rest of the system when one process is doing an execve with very large arguments. Signed-off-by: Roland McGrath <roland@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | setup_arg_pages: diagnose excessive argument sizeRoland McGrath2010-09-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The CONFIG_STACK_GROWSDOWN variant of setup_arg_pages() does not check the size of the argument/environment area on the stack. When it is unworkably large, shift_arg_pages() hits its BUG_ON. This is exploitable with a very large RLIMIT_STACK limit, to create a crash pretty easily. Check that the initial stack is not too large to make it possible to map in any executable. We're not checking that the actual executable (or intepreter, for binfmt_elf) will fit. So those mappings might clobber part of the initial stack mapping. But that is just userland lossage that userland made happen, not a kernel problem. Signed-off-by: Roland McGrath <roland@redhat.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | Merge branch 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2010-09-10
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: Perform hardware_enable in CPU_STARTING callback KVM: i8259: fix migration KVM: fix i8259 oops when no vcpus are online KVM: x86 emulator: fix regression with cmpxchg8b on i386 hosts
| * | | | | | | KVM: x86: Perform hardware_enable in CPU_STARTING callbackZachary Amsden2010-09-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The CPU_STARTING callback was added upstream with the intention of being used for KVM, specifically for the hardware enablement that must be done before we can run in hardware virt. It had bugs on the x86_64 architecture at the time, where it was called after CPU_ONLINE. The arches have since merged and the bug is gone. It might be noted other features should probably start making use of this callback; microcode updates in particular which might be fixing important erratums would be best applied before beginning to run user tasks. Signed-off-by: Zachary Amsden <zamsden@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| * | | | | | | KVM: i8259: fix migrationGleb Natapov2010-09-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Top of kvm_kpic_state structure should have the same memory layout as kvm_pic_state since it is copied by memcpy. Signed-off-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Avi Kivity <avi@redhat.com>
| * | | | | | | KVM: fix i8259 oops when no vcpus are onlineAvi Kivity2010-09-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If there are no vcpus, found will be NULL. Check before doing anything with it. Signed-off-by: Avi Kivity <avi@redhat.com>
| * | | | | | | KVM: x86 emulator: fix regression with cmpxchg8b on i386 hostsAvi Kivity2010-09-08
| | |_|/ / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | operand::val and operand::orig_val are 32-bit on i386, whereas cmpxchg8b operands are 64-bit. Fix by adding val64 and orig_val64 union members to struct operand, and using them where needed. Signed-off-by: Avi Kivity <avi@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
* | | | | | | Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds2010-09-10
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: tracing: t_start: reset FTRACE_ITER_HASH in case of seek/pread perf symbols: Fix multiple initialization of symbol system perf: Fix CPU hotplug perf, trace: Fix module leak tracing/kprobe: Fix handling of C-unlike argument names tracing/kprobes: Fix handling of argument names perf probe: Fix handling of arguments names perf probe: Fix return probe support tracing/kprobe: Fix a memory leak in error case tracing: Do not allow llseek to set_ftrace_filter
| * \ \ \ \ \ \ Merge branch 'tip/perf/urgent' of ↵Ingo Molnar2010-09-10
| |\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent
| | * | | | | | | tracing: t_start: reset FTRACE_ITER_HASH in case of seek/preadChris Wright2010-09-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Be sure to avoid entering t_show() with FTRACE_ITER_HASH set without having properly started the iterator to iterate the hash. This case is degenerate and, as discovered by Robert Swiecki, can cause t_hash_show() to misuse a pointer. This causes a NULL ptr deref with possible security implications. Tracked as CVE-2010-3079. Cc: Robert Swiecki <swiecki@google.com> Cc: Eugene Teo <eugene@redhat.com> Cc: <stable@kernel.org> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| * | | | | | | | Merge branch 'perf/urgent' of ↵Ingo Molnar2010-09-10
| |\ \ \ \ \ \ \ \ | | |/ / / / / / / | |/| | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux-2.6 into perf/urgent
| | * | | | | | | perf symbols: Fix multiple initialization of symbol systemJovi Zhang2010-09-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By returning immediately if it was already initialized, do it as well at symbol__exit, refusing multiple deinitializations. This fixes problems in the kmem, sched and timechart commands. Reported-by: Davidlohr Bueso <dave@gnu.org> Cc: Davidlohr Bueso <dave@gnu.org> Signed-off-by: Jovi Zhang <bookjovi@gmail.com> LKML-Reference: AANLkTi=9Cn=R8SPMCRp5z+gEjXbaBHeb-AaOtRbuwwcn@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | | | | perf: Fix CPU hotplugPeter Zijlstra2010-09-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since we have UP_PREPARE, we should also have UP_CANCELED. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: paulus <paulus@samba.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | | | perf, trace: Fix module leakLi Zefan2010-09-09
| |/ / / / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 1c024eca (perf, trace: Optimize tracepoints by using per-tracepoint-per-cpu hlist to track events) caused a module refcount leak. Reported-And-Tested-by: Avi Kivity <avi@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <4C7E1F12.8030304@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | | tracing/kprobe: Fix handling of C-unlike argument namesMasami Hiramatsu2010-09-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Check the argument name whether it is invalid (not C-like symbol name). This makes event format simple. Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> LKML-Reference: <20100827113912.22882.62313.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | | | tracing/kprobes: Fix handling of argument namesMasami Hiramatsu2010-09-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Set "argN" name for each argument automatically if it has no specified name. Since dynamic trace event(kprobe_events) accepts special characters for its argument, its format can show those special characters (e.g. '$', '%', '+'). However, perf can't parse those format because of the character (especially '%') mess up the format. This sets "argX" name for those arguments if user omitted the argument names. E.g. # echo 'p do_fork %ax IP=%ip $stack' > tracing/kprobe_events # cat tracing/kprobe_events p:kprobes/p_do_fork_0 do_fork arg1=%ax IP=%ip arg3=$stack Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> LKML-Reference: <20100827113906.22882.59312.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | | | perf probe: Fix handling of arguments namesMasami Hiramatsu2010-09-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't make argument names from raw parameters (means the parameters are written in kprobe-tracer syntax), because the argument syntax may include special characters. Just leave it, then kprobe-tracer gives a new name. Reported-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20100827113859.22882.75598.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | | | perf probe: Fix return probe supportMasami Hiramatsu2010-09-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a bug to support %return probe syntax again. Previous commit 4235b04 has a bug which disables the %return syntax on perf probe. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20100827113852.22882.87447.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | | | tracing/kprobe: Fix a memory leak in error caseMasami Hiramatsu2010-09-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a memory leak which happens when a field name conflicts with others. In error case, free_trace_probe() will free all arguments until nr_args, so this increments nr_args the begining of the loop instead of the end. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> LKML-Reference: <20100827113846.22882.12670.stgit@ltc236.sdl.hitachi.co.jp> Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | | | | tracing: Do not allow llseek to set_ftrace_filterSteven Rostedt2010-09-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reading the file set_ftrace_filter does three things. 1) shows whether or not filters are set for the function tracer 2) shows what functions are set for the function tracer 3) shows what triggers are set on any functions 3 is independent from 1 and 2. The way this file currently works is that it is a state machine, and as you read it, it may change state. But this assumption breaks when you use lseek() on the file. The state machine gets out of sync and the t_show() may use the wrong pointer and cause a kernel oops. Luckily, this will only kill the app that does the lseek, but the app dies while holding a mutex. This prevents anyone else from using the set_ftrace_filter file (or any other function tracing file for that matter). A real fix for this is to rewrite the code, but that is too much for a -rc release or stable. This patch simply disables llseek on the set_ftrace_filter() file for now, and we can do the proper fix for the next major release. Reported-by: Robert Swiecki <swiecki@google.com> Cc: Chris Wright <chrisw@sous-sol.org> Cc: Tavis Ormandy <taviso@google.com> Cc: Eugene Teo <eugene@redhat.com> Cc: vendor-sec@lst.de Cc: <stable@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
* | | | | | | | KEYS: Fix bug in keyctl_session_to_parent() if parent has no session keyringDavid Howells2010-09-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a bug in keyctl_session_to_parent() whereby it tries to check the ownership of the parent process's session keyring whether or not the parent has a session keyring [CVE-2010-2960]. This results in the following oops: BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0 IP: [<ffffffff811ae4dd>] keyctl_session_to_parent+0x251/0x443 ... Call Trace: [<ffffffff811ae2f3>] ? keyctl_session_to_parent+0x67/0x443 [<ffffffff8109d286>] ? __do_fault+0x24b/0x3d0 [<ffffffff811af98c>] sys_keyctl+0xb4/0xb8 [<ffffffff81001eab>] system_call_fastpath+0x16/0x1b if the parent process has no session keyring. If the system is using pam_keyinit then it mostly protected against this as all processes derived from a login will have inherited the session keyring created by pam_keyinit during the log in procedure. To test this, pam_keyinit calls need to be commented out in /etc/pam.d/. Reported-by: Tavis Ormandy <taviso@cmpxchg8b.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Tavis Ormandy <taviso@cmpxchg8b.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | KEYS: Fix RCU no-lock warning in keyctl_session_to_parent()David Howells2010-09-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's an protected access to the parent process's credentials in the middle of keyctl_session_to_parent(). This results in the following RCU warning: =================================================== [ INFO: suspicious rcu_dereference_check() usage. ] --------------------------------------------------- security/keys/keyctl.c:1291 invoked rcu_dereference_check() without protection! other info that might help us debug this: rcu_scheduler_active = 1, debug_locks = 0 1 lock held by keyctl-session-/2137: #0: (tasklist_lock){.+.+..}, at: [<ffffffff811ae2ec>] keyctl_session_to_parent+0x60/0x236 stack backtrace: Pid: 2137, comm: keyctl-session- Not tainted 2.6.36-rc2-cachefs+ #1 Call Trace: [<ffffffff8105606a>] lockdep_rcu_dereference+0xaa/0xb3 [<ffffffff811ae379>] keyctl_session_to_parent+0xed/0x236 [<ffffffff811af77e>] sys_keyctl+0xb4/0xb6 [<ffffffff81001eab>] system_call_fastpath+0x16/0x1b The code should take the RCU read lock to make sure the parents credentials don't go away, even though it's holding a spinlock and has IRQ disabled. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds2010-09-10
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: Range check cpu in blk_cpu_to_group scatterlist: prevent invalid free when alloc fails writeback: Fix lost wake-up shutting down writeback thread writeback: do not lose wakeup events when forking bdi threads cciss: fix reporting of max queue depth since init block: switch s390 tape_block and mg_disk to elevator_change() block: add function call to switch the IO scheduler from a driver fs/bio-integrity.c: return -ENOMEM on kmalloc failure bio-integrity.c: remove dependency on __GFP_NOFAIL BLOCK: fix bio.bi_rw handling block: put dev->kobj in blk_register_queue fail path cciss: handle allocation failure cfq-iosched: Documentation help for new tunables cfq-iosched: blktrace print per slice sector stats cfq-iosched: Implement tunable group_idle cfq-iosched: Do group share accounting in IOPS when slice_idle=0 cfq-iosched: Do not idle if slice_idle=0 cciss: disable doorbell reset on reset_devices blkio: Fix return code for mkdir calls
| * | | | | | | | block: Range check cpu in blk_cpu_to_groupBrian King2010-09-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While testing CPU DLPAR, the following problem was discovered. We were DLPAR removing the first CPU, which in this case was logical CPUs 0-3. CPUs 0-2 were already marked offline and we were in the process of offlining CPU 3. After marking the CPU inactive and offline in cpu_disable, but before the cpu was completely idle (cpu_die), we ended up in __make_request on CPU 3. There we looked at the topology map to see which CPU to complete the I/O on and found no CPUs in the cpu_sibling_map. This resulted in the block layer setting the completion cpu to be NR_CPUS, which then caused an oops when we tried to complete the I/O. Fix this by sanity checking the value we return from blk_cpu_to_group to be a valid cpu value. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | scatterlist: prevent invalid free when alloc failsJeffrey Carlyle2010-08-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When alloc fails, free_table is being called. Depending on the number of bytes requested, we determine if we are going to call _get_free_page() or kmalloc(). When alloc fails, our math is wrong (due to sg_size - 1), and the last buffer is wrongfully assumed to have been allocated by kmalloc. Hence, kfree gets called and a panic occurs. Signed-off-by: Jeffrey Carlyle <jeff.carlyle@motorola.com> Signed-off-by: Olusanya Soyannwo <c23746@motorola.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | writeback: Fix lost wake-up shutting down writeback threadJ. Bruce Fields2010-08-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Setting the task state here may cause us to miss the wake up from kthread_stop(), so we need to recheck kthread_should_stop() or risk sleeping forever in the following schedule(). Symptom was an indefinite hang on an NFSv4 mount. (NFSv4 may create multiple mounts in a temporary namespace while traversing the mount path, and since the temporary namespace is immediately destroyed, it may end up destroying a mount very soon after it was created, possibly making this race more likely.) INFO: task mount.nfs4:4314 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. mount.nfs4 D 0000000000000000 2880 4314 4313 0x00000000 ffff88001ed6da28 0000000000000046 ffff88001ed6dfd8 ffff88001ed6dfd8 ffff88001ed6c000 ffff88001ed6c000 ffff88001ed6c000 ffff88001e5003a0 ffff88001ed6dfd8 ffff88001e5003a8 ffff88001ed6c000 ffff88001ed6dfd8 Call Trace: [<ffffffff8196090d>] schedule_timeout+0x1cd/0x2e0 [<ffffffff8106a31c>] ? mark_held_locks+0x6c/0xa0 [<ffffffff819639a0>] ? _raw_spin_unlock_irq+0x30/0x60 [<ffffffff8106a5fd>] ? trace_hardirqs_on_caller+0x14d/0x190 [<ffffffff819671fe>] ? sub_preempt_count+0xe/0xd0 [<ffffffff8195fc80>] wait_for_common+0x120/0x190 [<ffffffff81033c70>] ? default_wake_function+0x0/0x20 [<ffffffff8195fdcd>] wait_for_completion+0x1d/0x20 [<ffffffff810595fa>] kthread_stop+0x4a/0x150 [<ffffffff81061a60>] ? thaw_process+0x70/0x80 [<ffffffff810cc68a>] bdi_unregister+0x10a/0x1a0 [<ffffffff81229dc9>] nfs_put_super+0x19/0x20 [<ffffffff810ee8c4>] generic_shutdown_super+0x54/0xe0 [<ffffffff810ee9b6>] kill_anon_super+0x16/0x60 [<ffffffff8122d3b9>] nfs4_kill_super+0x39/0x90 [<ffffffff810eda45>] deactivate_locked_super+0x45/0x60 [<ffffffff810edfb9>] deactivate_super+0x49/0x70 [<ffffffff81108294>] mntput_no_expire+0x84/0xe0 [<ffffffff811084ef>] release_mounts+0x9f/0xc0 [<ffffffff81108575>] put_mnt_ns+0x65/0x80 [<ffffffff8122cc56>] nfs_follow_remote_path+0x1e6/0x420 [<ffffffff8122cfbf>] nfs4_try_mount+0x6f/0xd0 [<ffffffff8122d0c2>] nfs4_get_sb+0xa2/0x360 [<ffffffff810edcb8>] vfs_kern_mount+0x88/0x1f0 [<ffffffff810ede92>] do_kern_mount+0x52/0x130 [<ffffffff81963d9a>] ? _lock_kernel+0x6a/0x170 [<ffffffff81108e9e>] do_mount+0x26e/0x7f0 [<ffffffff81106b3a>] ? copy_mount_options+0xea/0x190 [<ffffffff811094b8>] sys_mount+0x98/0xf0 [<ffffffff810024d8>] system_call_fastpath+0x16/0x1b 1 lock held by mount.nfs4/4314: #0: (&type->s_umount_key#24){+.+...}, at: [<ffffffff810edfb1>] deactivate_super+0x41/0x70 Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com> Acked-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
| * | | | | | | | writeback: do not lose wakeup events when forking bdi threadsArtem Bityutskiy2010-08-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the following issue: INFO: task mount.nfs4:1120 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. mount.nfs4 D 00000000fffc6a21 0 1120 1119 0x00000000 ffff880235643948 0000000000000046 ffffffff00000000 ffffffff00000000 ffff880235643fd8 ffff880235314760 00000000001d44c0 ffff880235643fd8 00000000001d44c0 00000000001d44c0 00000000001d44c0 00000000001d44c0 Call Trace: [<ffffffff813bc747>] schedule_timeout+0x34/0xf1 [<ffffffff813bc530>] ? wait_for_common+0x3f/0x130 [<ffffffff8106b50b>] ? trace_hardirqs_on+0xd/0xf [<ffffffff813bc5c3>] wait_for_common+0xd2/0x130 [<ffffffff8104159c>] ? default_wake_function+0x0/0xf [<ffffffff813beaa0>] ? _raw_spin_unlock+0x26/0x2a [<ffffffff813bc6bb>] wait_for_completion+0x18/0x1a [<ffffffff81101a03>] sync_inodes_sb+0xca/0x1bc [<ffffffff811056a6>] __sync_filesystem+0x47/0x7e [<ffffffff81105798>] sync_filesystem+0x47/0x4b [<ffffffff810e7ffd>] generic_shutdown_super+0x22/0xd2 [<ffffffff810e80f8>] kill_anon_super+0x11/0x4f [<ffffffffa00d06d7>] nfs4_kill_super+0x3f/0x72 [nfs] [<ffffffff810e7b68>] deactivate_locked_super+0x21/0x41 [<ffffffff810e7fd6>] deactivate_super+0x40/0x45 [<ffffffff810fc66c>] mntput_no_expire+0xb8/0xed [<ffffffff810fc73b>] release_mounts+0x9a/0xb0 [<ffffffff810fc7bb>] put_mnt_ns+0x6a/0x7b [<ffffffffa00d0fb2>] nfs_follow_remote_path+0x19a/0x296 [nfs] [<ffffffffa00d11ca>] nfs4_try_mount+0x75/0xaf [nfs] [<ffffffffa00d1790>] nfs4_get_sb+0x276/0x2ff [nfs] [<ffffffff810e7dba>] vfs_kern_mount+0xb8/0x196 [<ffffffff810e7ef6>] do_kern_mount+0x48/0xe8 [<ffffffff810fdf68>] do_mount+0x771/0x7e8 [<ffffffff810fe062>] sys_mount+0x83/0xbd [<ffffffff810089c2>] system_call_fastpath+0x16/0x1b The reason of this hang was a race condition: when the flusher thread is forking a bdi thread, we use 'kthread_run()', so we run it _before_ we make it visible in 'bdi->wb.task'. The bdi thread runs, does all works, and goes sleep. 'bdi->wb.task' is still NULL. And this is a dangerous time window. If at this time someone queues a work for this bdi, he does not see the bdi thread and wakes up the forker thread instead! But the forker has already forked this bdi thread, but just did not make it visible yet! The result is that we lose the wake up event for this bdi thread and the NFS4 code waits forever. To fix the problem, we should use 'ktrhead_create()' for creating bdi threads, then make them visible in 'bdi->wb.task', and only after this wake them up. This is exactly what this patch does. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>