aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
...
| * vmscan: shrinker->nr updates race and go wrongDave Chinner2012-08-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1031926 commit acf92b485cccf028177f46918e045c0c4e80ee10 upstream. Stable note: Not tracked in Bugzilla. This patch reduces excessive reclaim of slab objects reducing the amount of information that has to be brought back in from disk. shrink_slab() allows shrinkers to be called in parallel so the struct shrinker can be updated concurrently. It does not provide any exclusio for such updates, so we can get the shrinker->nr value increasing or decreasing incorrectly. As a result, when a shrinker repeatedly returns a value of -1 (e.g. a VFS shrinker called w/ GFP_NOFS), the shrinker->nr goes haywire, sometimes updating with the scan count that wasn't used, sometimes losing it altogether. Worse is when a shrinker does work and that update is lost due to racy updates, which means the shrinker will do the work again! Fix this by making the total_scan calculations independent of shrinker->nr, and making the shrinker->nr updates atomic w.r.t. to other updates via cmpxchg loops. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * vmscan: add shrink_slab tracepointsDave Chinner2012-08-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1031926 commit 095760730c1047c69159ce88021a7fa3833502c8 upstream. Stable note: This patch makes later patches easier to apply but otherwise has little to justify it. It is a diagnostic patch that was part of a series addressing excessive slab shrinking after GFP_NOFS failures. There is detailed information on the series' motivation at https://lkml.org/lkml/2011/6/2/42 . It is impossible to understand what the shrinkers are actually doing without instrumenting the code, so add a some tracepoints to allow insight to be gained. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * vmscan: clear ZONE_CONGESTED for zone with good watermarkShaohua Li2012-08-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1031926 commit 439423f6894aa0dec22187526827456f5004baed upstream. Stable note: Not tracked in Bugzilla. kswapd is responsible for clearing ZONE_CONGESTED after it balances a zone and this patch fixes a bug where that was failing to happen. Without this patch, processes can stall in wait_iff_congested unnecessarily. For users, this can look like an interactivity stall but some workloads would see it as sudden drop in throughput. ZONE_CONGESTED is only cleared in kswapd, but pages can be freed in any task. It's possible ZONE_CONGESTED isn't cleared in some cases: 1. the zone is already balanced just entering balance_pgdat() for order-0 because concurrent tasks free memory. In this case, later check will skip the zone as it's balanced so the flag isn't cleared. 2. high order balance fallbacks to order-0. quote from Mel: At the end of balance_pgdat(), kswapd uses the following logic; If reclaiming at high order { for each zone { if all_unreclaimable skip if watermark is not met order = 0 loop again /* watermark is met */ clear congested } } i.e. it clears ZONE_CONGESTED if it the zone is balanced. if not, it restarts balancing at order-0. However, if the higher zones are balanced for order-0, kswapd will miss clearing ZONE_CONGESTED as that only happens after a zone is shrunk. This can mean that wait_iff_congested() stalls unnecessarily. This patch makes kswapd clear ZONE_CONGESTED during its initial highmem->dma scan for zones that are already balanced. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Acked-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * mm: vmscan: fix force-scanning small targets without swapJohannes Weiner2012-08-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1031926 commit a4d3e9e76337059406fcf3ead288c0df22a790e9 upstream. Stable note: Not tracked in Bugzilla. This patch augments an earlier commit that avoids scanning priority being artificially raised. The older fix was particularly important for small memcgs to avoid calling wait_iff_congested() unnecessarily. Without swap, anonymous pages are not scanned. As such, they should not count when considering force-scanning a small target if there is no swap. Otherwise, targets are not force-scanned even when their effective scan number is zero and the other conditions--kswapd/memcg--apply. This fixes 246e87a93934 ("memcg: fix get_scan_count() for small targets"). [akpm@linux-foundation.org: fix comment] Signed-off-by: Johannes Weiner <jweiner@redhat.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: Michal Hocko <mhocko@suse.cz> Cc: Ying Han <yinghan@google.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * mm: reduce the amount of work done when updating min_free_kbytesMel Gorman2012-08-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1031926 commit 938929f14cb595f43cd1a4e63e22d36cab1e4a1f upstream. Stable note: Fixes https://bugzilla.novell.com/show_bug.cgi?id=726210 . Large machines with 1TB or more of RAM take a long time to boot without this patch and may spew out soft lockup warnings. When min_free_kbytes is updated, some pageblocks are marked MIGRATE_RESERVE. Ordinarily, this work is unnoticable as it happens early in boot but on large machines with 1TB of memory, this has been reported to delay boot times, probably due to the NUMA distances involved. The bulk of the work is due to calling calling pageblock_is_reserved() an unnecessary amount of times and accessing far more struct page metadata than is necessary. This patch significantly reduces the amount of work done by setup_zone_migrate_reserve() improving boot times on 1TB machines. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * mm: memory hotplug: Check if pages are correctly reserved on a per-section basisMel Gorman2012-08-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1031926 commit 2bbcb8788311a40714b585fc11b51da6ffa2ab92 upstream. Stable note: Fixes https://bugzilla.novell.com/show_bug.cgi?id=721039 . Without the patch, memory hot-add can fail for kernel configurations that do not set CONFIG_SPARSEMEM_VMEMMAP. (Resending as I am not seeing it in -next so maybe it got lost) mm: memory hotplug: Check if pages are correctly reserved on a per-section basis It is expected that memory being brought online is PageReserved similar to what happens when the page allocator is being brought up. Memory is onlined in "memory blocks" which consist of one or more sections. Unfortunately, the code that verifies PageReserved is currently assuming that the memmap backing all these pages is virtually contiguous which is only the case when CONFIG_SPARSEMEM_VMEMMAP is set. As a result, memory hot-add is failing on those configurations with the message; kernel: section number XXX page number 256 not reserved, was it already online? This patch updates the PageReserved check to lookup struct page once per section to guarantee the correct struct page is being checked. [Check pages within sections properly: rientjes@google.com] [original patch by: nfont@linux.vnet.ibm.com] Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Tested-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * mm/vmstat.c: cache align vm_statDimitri Sivanich2012-08-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1031926 commit a1cb2c60ddc98ff4e5246f410558805401ceee67 upstream. Stable note: Not tracked on Bugzilla. This patch is known to make a big difference to tmpfs performance on larger machines. This was found to adversely affect tmpfs I/O performance. Tests run on a 640 cpu UV system. With 120 threads doing parallel writes, each to different tmpfs mounts: No patch: ~300 MB/sec With vm_stat alignment: ~430 MB/sec Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> Acked-by: Christoph Lameter <cl@gentwo.org> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * dm raid1: fix crash with mirror recovery and discardMikulas Patocka2012-08-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1031926 commit 751f188dd5ab95b3f2b5f2f467c38aae5a2877eb upstream. This patch fixes a crash when a discard request is sent during mirror recovery. Firstly, some background. Generally, the following sequence happens during mirror synchronization: - function do_recovery is called - do_recovery calls dm_rh_recovery_prepare - dm_rh_recovery_prepare uses a semaphore to limit the number simultaneously recovered regions (by default the semaphore value is 1, so only one region at a time is recovered) - dm_rh_recovery_prepare calls __rh_recovery_prepare, __rh_recovery_prepare asks the log driver for the next region to recover. Then, it sets the region state to DM_RH_RECOVERING. If there are no pending I/Os on this region, the region is added to quiesced_regions list. If there are pending I/Os, the region is not added to any list. It is added to the quiesced_regions list later (by dm_rh_dec function) when all I/Os finish. - when the region is on quiesced_regions list, there are no I/Os in flight on this region. The region is popped from the list in dm_rh_recovery_start function. Then, a kcopyd job is started in the recover function. - when the kcopyd job finishes, recovery_complete is called. It calls dm_rh_recovery_end. dm_rh_recovery_end adds the region to recovered_regions or failed_recovered_regions list (depending on whether the copy operation was successful or not). The above mechanism assumes that if the region is in DM_RH_RECOVERING state, no new I/Os are started on this region. When I/O is started, dm_rh_inc_pending is called, which increases reg->pending count. When I/O is finished, dm_rh_dec is called. It decreases reg->pending count. If the count is zero and the region was in DM_RH_RECOVERING state, dm_rh_dec adds it to the quiesced_regions list. Consequently, if we call dm_rh_inc_pending/dm_rh_dec while the region is in DM_RH_RECOVERING state, it could be added to quiesced_regions list multiple times or it could be added to this list when kcopyd is copying data (it is assumed that the region is not on any list while kcopyd does its jobs). This results in memory corruption and crash. There already exist bypasses for REQ_FLUSH requests: REQ_FLUSH requests do not belong to any region, so they are always added to the sync list in do_writes. dm_rh_inc_pending does not increase count for REQ_FLUSH requests. In mirror_end_io, dm_rh_dec is never called for REQ_FLUSH requests. These bypasses avoid the crash possibility described above. These bypasses were improperly implemented for REQ_DISCARD when the mirror target gained discard support in commit 5fc2ffeabb9ee0fc0e71ff16b49f34f0ed3d05b4 (dm raid1: support discard). In do_writes, REQ_DISCARD requests is always added to the sync queue and immediately dispatched (even if the region is in DM_RH_RECOVERING). However, dm_rh_inc and dm_rh_dec is called for REQ_DISCARD resusts. So it violates the rule that no I/Os are started on DM_RH_RECOVERING regions, and causes the list corruption described above. This patch changes it so that REQ_DISCARD requests follow the same path as REQ_FLUSH. This avoids the crash. Reference: https://bugzilla.redhat.com/837607 Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * UBIFS: fix a bug in empty space fix-upArtem Bityutskiy2012-08-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1031926 commit c6727932cfdb13501108b16c38463c09d5ec7a74 upstream. UBIFS has a feature called "empty space fix-up" which is a quirk to work-around limitations of dumb flasher programs. Namely, of those flashers that are unable to skip NAND pages full of 0xFFs while flashing, resulting in empty space at the end of half-filled eraseblocks to be unusable for UBIFS. This feature is relatively new (introduced in v3.0). The fix-up routine (fixup_free_space()) is executed only once at the very first mount if the superblock has the 'space_fixup' flag set (can be done with -F option of mkfs.ubifs). It basically reads all the UBIFS data and metadata and writes it back to the same LEB. The routine assumes the image is pristine and does not have anything in the journal. There was a bug in 'fixup_free_space()' where it fixed up the log incorrectly. All but one LEB of the log of a pristine file-system are empty. And one contains just a commit start node. And 'fixup_free_space()' just unmapped this LEB, which resulted in wiping the commit start node. As a result, some users were unable to mount the file-system next time with the following symptom: UBIFS error (pid 1): replay_log_leb: first log node at LEB 3:0 is not CS node UBIFS error (pid 1): replay_log_leb: log error detected while replaying the log at LEB 3:0 The root-cause of this bug was that 'fixup_free_space()' wrongly assumed that the beginning of empty space in the log head (c->lhead_offs) was known on mount. However, it is not the case - it was always 0. UBIFS does not store in it the master node and finds out by scanning the log on every mount. The fix is simple - just pass commit start node size instead of 0 to 'fixup_leb()'. Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@linux.intel.com> Reported-by: Iwo Mergler <Iwo.Mergler@netcommwireless.com> Tested-by: Iwo Mergler <Iwo.Mergler@netcommwireless.com> Reported-by: James Nute <newten82@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * MIPS: Properly align the .data..init_task section.David Daney2012-08-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1031926 commit 7b1c0d26a8e272787f0f9fcc5f3e8531df3b3409 upstream. Improper alignment can lead to unbootable systems and/or random crashes. [ralf@linux-mips.org: This is a lond standing bug since 6eb10bc9e2deab06630261cd05c4cb1e9a60e980 (kernel.org) rsp. c422a10917f75fd19fa7fe070aaaa23e384dae6f (lmo) [MIPS: Clean up linker script using new linker script macros.] so dates back to 2.6.32.] Signed-off-by: David Daney <david.daney@cavium.com> Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/3881/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * mm: fix lost kswapd wakeup in kswapd_stop()Aaditya Kumar2012-08-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1031926 commit 1c7e7f6c0703d03af6bcd5ccc11fc15d23e5ecbe upstream. Offlining memory may block forever, waiting for kswapd() to wake up because kswapd() does not check the event kthread->should_stop before sleeping. The proper pattern, from Documentation/memory-barriers.txt, is: --- waker --- event_indicated = 1; wake_up_process(event_daemon); --- sleeper --- for (;;) { set_current_state(TASK_UNINTERRUPTIBLE); if (event_indicated) break; schedule(); } set_current_state() may be wrapped by: prepare_to_wait(); In the kswapd() case, event_indicated is kthread->should_stop. === offlining memory (waker) === kswapd_stop() kthread_stop() kthread->should_stop = 1 wake_up_process() wait_for_completion() === kswapd_try_to_sleep (sleeper) === kswapd_try_to_sleep() prepare_to_wait() . . schedule() . . finish_wait() The schedule() needs to be protected by a test of kthread->should_stop, which is wrapped by kthread_should_stop(). Reproducer: Do heavy file I/O in background. Do a memory offline/online in a tight loop Signed-off-by: Aaditya Kumar <aaditya.kumar@ap.sony.com> Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Minchan Kim <minchan@kernel.org> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * ntp: Fix STA_INS/DEL clearing bugJohn Stultz2012-08-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1031926 commit 6b1859dba01c7d512b72d77e3fd7da8354235189 upstream. In commit 6b43ae8a619d17c4935c3320d2ef9e92bdeed05d, I introduced a bug that kept the STA_INS or STA_DEL bit from being cleared from time_status via adjtimex() without forcing STA_PLL first. Usually once the STA_INS is set, it isn't cleared until the leap second is applied, so its unlikely this affected anyone. However during testing I noticed it took some effort to cancel a leap second once STA_INS was set. Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1342156917-25092-2-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * cifs: always update the inode cache with the results from a FIND_*Jeff Layton2012-08-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1031926 commit cd60042cc1392e79410dc8de9e9c1abb38a29e57 upstream. When we get back a FIND_FIRST/NEXT result, we have some info about the dentry that we use to instantiate a new inode. We were ignoring and discarding that info when we had an existing dentry in the cache. Fix this by updating the inode in place when we find an existing dentry and the uniqueid is the same. Reported-and-Tested-by: Andrew Bartlett <abartlet@samba.org> Reported-by: Bill Robertson <bill_robertson@debortoli.com.au> Reported-by: Dion Edwards <dion_edwards@debortoli.com.au> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * UBUNTU: SAUCE: rds_ib_send() -- prevent local pings triggering BUG_ON()Andy Whitcroft2012-08-13
| | | | | | | | | | | | | | | | | | | | | | | | Pining localhost on an infiniband connection can trigger a BUG_ON() and cause a denial of service. Fix identified by comparison of the RHEL source rpms. CVE-2012-2372 BugLink: http://bugs.launchpad.net/bugs/1016299 Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
| * UBUNTU: Bump ABILuis Henriques2012-08-13
| | | | | | | | | | Ignore: yes Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
| * UBUNTU: Start new releaseLuis Henriques2012-08-13
| | | | | | | | | | Ignore: yes Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
| * UBUNTU: Ubuntu-3.0.0-24.40Luis Henriques2012-07-23
| | | | | | | | Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
| * Linux 3.0.38Greg Kroah-Hartman2012-07-23
| | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1026850
| * timekeeping: Add missing update call in timekeeping_resume()Thomas Gleixner2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1026850 This is a backport of 3e997130bd2e8c6f5aaa49d6e3161d4d29b43ab0 The leap second rework unearthed another issue of inconsistent data. On timekeeping_resume() the timekeeper data is updated, but nothing calls timekeeping_update(), so now the update code in the timer interrupt sees stale values. This has been the case before those changes, but then the timer interrupt was using stale data as well so this went unnoticed for quite some time. Add the missing update call, so all the data is consistent everywhere. Reported-by: Andreas Schwab <schwab@linux-m68k.org> Reported-and-tested-by: "Rafael J. Wysocki" <rjw@sisk.pl> Reported-and-tested-by: Martin Steigerwald <Martin@lichtvoll.de> Cc: John Stultz <johnstul@us.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>, Cc: Prarit Bhargava <prarit@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * hrtimer: Update hrtimer base offsets each hrtimer_interruptJohn Stultz2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1026850 This is a backport of 5baefd6d84163443215f4a99f6a20f054ef11236 The update of the hrtimer base offsets on all cpus cannot be made atomically from the timekeeper.lock held and interrupt disabled region as smp function calls are not allowed there. clock_was_set(), which enforces the update on all cpus, is called either from preemptible process context in case of do_settimeofday() or from the softirq context when the offset modification happened in the timer interrupt itself due to a leap second. In both cases there is a race window for an hrtimer interrupt between dropping timekeeper lock, enabling interrupts and clock_was_set() issuing the updates. Any interrupt which arrives in that window will see the new time but operate on stale offsets. So we need to make sure that an hrtimer interrupt always sees a consistent state of time and offsets. ktime_get_update_offsets() allows us to get the current monotonic time and update the per cpu hrtimer base offsets from hrtimer_interrupt() to capture a consistent state of monotonic time and the offsets. The function replaces the existing ktime_get() calls in hrtimer_interrupt(). The overhead of the new function vs. ktime_get() is minimal as it just adds two store operations. This ensures that any changes to realtime or boottime offsets are noticed and stored into the per-cpu hrtimer base structures, prior to any hrtimer expiration and guarantees that timers are not expired early. Signed-off-by: John Stultz <johnstul@us.ibm.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1341960205-56738-8-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * timekeeping: Provide hrtimer update functionThomas Gleixner2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1026850 This is a backport of f6c06abfb3972ad4914cef57d8348fcb2932bc3b To finally fix the infamous leap second issue and other race windows caused by functions which change the offsets between the various time bases (CLOCK_MONOTONIC, CLOCK_REALTIME and CLOCK_BOOTTIME) we need a function which atomically gets the current monotonic time and updates the offsets of CLOCK_REALTIME and CLOCK_BOOTTIME with minimalistic overhead. The previous patch which provides ktime_t offsets allows us to make this function almost as cheap as ktime_get() which is going to be replaced in hrtimer_interrupt(). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <johnstul@us.ibm.com> Link: http://lkml.kernel.org/r/1341960205-56738-7-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * hrtimers: Move lock held region in hrtimer_interrupt()Thomas Gleixner2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1026850 This is a backport of 196951e91262fccda81147d2bcf7fdab08668b40 We need to update the base offsets from this code and we need to do that under base->lock. Move the lock held region around the ktime_get() calls. The ktime_get() calls are going to be replaced with a function which gets the time and the offsets atomically. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Prarit Bhargava <prarit@redhat.com> Signed-off-by: John Stultz <johnstul@us.ibm.com> Link: http://lkml.kernel.org/r/1341960205-56738-6-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * timekeeping: Maintain ktime_t based offsets for hrtimersThomas Gleixner2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1026850 This is a backport of 5b9fe759a678e05be4937ddf03d50e950207c1c0 We need to update the hrtimer clock offsets from the hrtimer interrupt context. To avoid conversions from timespec to ktime_t maintain a ktime_t based representation of those offsets in the timekeeper. This puts the conversion overhead into the code which updates the underlying offsets and provides fast accessible values in the hrtimer interrupt. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1341960205-56738-4-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * timekeeping: Fix leapsecond triggered load spike issueJohn Stultz2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1026850 This is a backport of 4873fa070ae84a4115f0b3c9dfabc224f1bc7c51 The timekeeping code misses an update of the hrtimer subsystem after a leap second happened. Due to that timers based on CLOCK_REALTIME are either expiring a second early or late depending on whether a leap second has been inserted or deleted until an operation is initiated which causes that update. Unless the update happens by some other means this discrepancy between the timekeeping and the hrtimer data stays forever and timers are expired either early or late. The reported immediate workaround - $ data -s "`date`" - is causing a call to clock_was_set() which updates the hrtimer data structures. See: http://www.sheeri.com/content/mysql-and-leap-second-high-cpu-and-fix Add the missing clock_was_set() call to update_wall_time() in case of a leap second event. The actual update is deferred to softirq context as the necessary smp function call cannot be invoked from hard interrupt context. Signed-off-by: John Stultz <johnstul@us.ibm.com> Reported-by: Jan Engelhardt <jengelh@inai.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1341960205-56738-3-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * hrtimer: Provide clock_was_set_delayed()John Stultz2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1026850 This is a backport of f55a6faa384304c89cfef162768e88374d3312cb clock_was_set() cannot be called from hard interrupt context because it calls on_each_cpu(). For fixing the widely reported leap seconds issue it is necessary to call it from hard interrupt context, i.e. the timer tick code, which does the timekeeping updates. Provide a new function which denotes it in the hrtimer cpu base structure of the cpu on which it is called and raise the hrtimer softirq. We then execute the clock_was_set() notificiation from softirq context in run_hrtimer_softirq(). The hrtimer softirq is rarely used, so polling the flag there is not a performance issue. [ tglx: Made it depend on CONFIG_HIGH_RES_TIMERS. We really should get rid of all this ifdeffery ASAP ] Signed-off-by: John Stultz <johnstul@us.ibm.com> Reported-by: Jan Engelhardt <jengelh@inai.de> Reviewed-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Prarit Bhargava <prarit@redhat.com> Link: http://lkml.kernel.org/r/1341960205-56738-2-git-send-email-johnstul@us.ibm.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * time: Move common updates to a functionThomas Gleixner2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1026850 This is a backport of cc06268c6a87db156af2daed6e96a936b955cc82 While not a bugfix itself, it allows following fixes to backport in a more straightforward manner. CC: Thomas Gleixner <tglx@linutronix.de> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * timekeeping: Fix CLOCK_MONOTONIC inconsistency during leapsecondJohn Stultz2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1026850 This is a backport of fad0c66c4bb836d57a5f125ecd38bed653ca863a which resolves a bug the previous commit. Commit 6b43ae8a61 (ntp: Fix leap-second hrtimer livelock) broke the leapsecond update of CLOCK_MONOTONIC. The missing leapsecond update to wall_to_monotonic causes discontinuities in CLOCK_MONOTONIC. Adjust wall_to_monotonic when NTP inserted a leapsecond. Reported-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: John Stultz <john.stultz@linaro.org> Tested-by: Richard Cochran <richardcochran@gmail.com> Link: http://lkml.kernel.org/r/1338400497-12420-1-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * ntp: Correct TAI offset during leap secondRichard Cochran2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1026850 This is a backport of dd48d708ff3e917f6d6b6c2b696c3f18c019feed When repeating a UTC time value during a leap second (when the UTC time should be 23:59:60), the TAI timescale should not stop. The kernel NTP code increments the TAI offset one second too late. This patch fixes the issue by incrementing the offset during the leap second itself. Signed-off-by: Richard Cochran <richardcochran@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * ntp: Fix leap-second hrtimer livelockJohn Stultz2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1026850 This is a backport of 6b43ae8a619d17c4935c3320d2ef9e92bdeed05d This should have been backported when it was commited, but I mistook the problem as requiring the ntp_lock changes that landed in 3.4 in order for it to occur. Unfortunately the same issue can happen (with only one cpu) as follows: do_adjtimex() write_seqlock_irq(&xtime_lock); process_adjtimex_modes() process_adj_status() ntp_start_leap_timer() hrtimer_start() hrtimer_reprogram() tick_program_event() clockevents_program_event() ktime_get() seq = req_seqbegin(xtime_lock); [DEADLOCK] This deadlock will no always occur, as it requires the leap_timer to force a hrtimer_reprogram which only happens if its set and there's no sooner timer to expire. NOTE: This patch, being faithful to the original commit, introduces a bug (we don't update wall_to_monotonic), which will be resovled by backporting a following fix. Original commit message below: Since commit 7dffa3c673fbcf835cd7be80bb4aec8ad3f51168 the ntp subsystem has used an hrtimer for triggering the leapsecond adjustment. However, this can cause a potential livelock. Thomas diagnosed this as the following pattern: CPU 0 CPU 1 do_adjtimex() spin_lock_irq(&ntp_lock); process_adjtimex_modes(); timer_interrupt() process_adj_status(); do_timer() ntp_start_leap_timer(); write_lock(&xtime_lock); hrtimer_start(); update_wall_time(); hrtimer_reprogram(); ntp_tick_length() tick_program_event() spin_lock(&ntp_lock); clockevents_program_event() ktime_get() seq = req_seqbegin(xtime_lock); This patch tries to avoid the problem by reverting back to not using an hrtimer to inject leapseconds, and instead we handle the leapsecond processing in the second_overflow() function. The downside to this change is that on systems that support highres timers, the leap second processing will occur on a HZ tick boundary, (ie: ~1-10ms, depending on HZ) after the leap second instead of possibly sooner (~34us in my tests w/ x86_64 lapic). This patch applies on top of tip/timers/core. CC: Sasha Levin <levinsasha928@gmail.com> CC: Thomas Gleixner <tglx@linutronix.de> Reported-by: Sasha Levin <levinsasha928@gmail.com> Diagnoised-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Sasha Levin <levinsasha928@gmail.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * cfg80211: check iface combinations only when iface is runningMichal Kazior2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1026850 commit f8cdddb8d61d16a156229f0910f7ecfc7a82c003 upstream. Don't validate interface combinations on a stopped interface. Otherwise we might end up being able to create a new interface with a certain type, but won't be able to change an existing interface into that type. This also skips some other functions when interface is stopped and changing interface type. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> [Fixes regression introduced by cherry pick of 463454b5dbd8] Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * tcp: drop SYN+FIN messagesEric Dumazet2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1026850 commit fdf5af0daf8019cec2396cdef8fb042d80fe71fa upstream. Denys Fedoryshchenko reported that SYN+FIN attacks were bringing his linux machines to their limits. Dont call conn_request() if the TCP flags includes SYN flag Reported-by: Denys Fedoryshchenko <denys@visp.net.lb> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * Input: xpad - add Andamiro Pump It Up padYuri Khan2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1026850 commit e76b8ee25e034ab601b525abb95cea14aa167ed3 upstream. I couldn't find the vendor ID in any of the online databases, but this mat has a Pump It Up logo on the top side of the controller compartment, and a disclaimer stating that Andamiro will not be liable on the bottom. Signed-off-by: Yuri Khan <yurivkhan@gmail.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * e1000e: Correct link check logic for 82571 serdesTushar Dave2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1026850 commit d0efa8f23a644f7cb7d1f8e78dd9a223efa412a3 upstream. SYNCH bit and IV bit of RXCW register are sticky. Before examining these bits, RXCW should be read twice to filter out one-time false events and have correct values for these bits. Incorrect values of these bits in link check logic can cause weird link stability issues if auto-negotiation fails. Reported-by: Dean Nelson <dnelson@redhat.com> Signed-off-by: Tushar Dave <tushar.n.dave@intel.com> Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * rt2x00usb: fix indexes ordering on RX queue kickStanislaw Gruszka2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1026850 commit efd821182cec8c92babef6e00a95066d3252fda4 upstream. On rt2x00_dmastart() we increase index specified by Q_INDEX and on rt2x00_dmadone() we increase index specified by Q_INDEX_DONE. So entries between Q_INDEX_DONE and Q_INDEX are those we currently process in the hardware. Entries between Q_INDEX and Q_INDEX_DONE are those we can submit to the hardware. According to that fix rt2x00usb_kick_queue(), as we need to submit RX entries that are not processed by the hardware. It worked before only for empty queue, otherwise was broken. Note that for TX queues indexes ordering are ok. We need to kick entries that have filled skb, but was not submitted to the hardware, i.e. started from Q_INDEX_DONE and have ENTRY_DATA_PENDING bit set. From practical standpoint this fixes RX queue stall, usually reproducible in AP mode, like for example reported here: https://bugzilla.redhat.com/show_bug.cgi?id=828824 Reported-and-tested-by: Franco Miceli <fmiceli@plan.ceibal.edu.uy> Reported-and-tested-by: Tom Horsley <horsley1953@gmail.com> Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com> Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * fifo: Do not restart open() if it already found a partnerAnders Kaseorg2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1026850 commit 05d290d66be6ef77a0b962ebecf01911bd984a78 upstream. If a parent and child process open the two ends of a fifo, and the child immediately exits, the parent may receive a SIGCHLD before its open() returns. In that case, we need to make sure that open() will return successfully after the SIGCHLD handler returns, instead of throwing EINTR or being restarted. Otherwise, the restarted open() would incorrectly wait for a second partner on the other end. The following test demonstrates the EINTR that was wrongly thrown from the parent’s open(). Change .sa_flags = 0 to .sa_flags = SA_RESTART to see a deadlock instead, in which the restarted open() waits for a second reader that will never come. (On my systems, this happens pretty reliably within about 5 to 500 iterations. Others report that it manages to loop ~forever sometimes; YMMV.) #include <sys/stat.h> #include <sys/types.h> #include <sys/wait.h> #include <fcntl.h> #include <signal.h> #include <stdio.h> #include <stdlib.h> #include <unistd.h> #define CHECK(x) do if ((x) == -1) {perror(#x); abort();} while(0) void handler(int signum) {} int main() { struct sigaction act = {.sa_handler = handler, .sa_flags = 0}; CHECK(sigaction(SIGCHLD, &act, NULL)); CHECK(mknod("fifo", S_IFIFO | S_IRWXU, 0)); for (;;) { int fd; pid_t pid; putc('.', stderr); CHECK(pid = fork()); if (pid == 0) { CHECK(fd = open("fifo", O_RDONLY)); _exit(0); } CHECK(fd = open("fifo", O_WRONLY)); CHECK(close(fd)); CHECK(waitpid(pid, NULL, 0)); } } This is what I suspect was causing the Git test suite to fail in t9010-svn-fe.sh: http://bugs.debian.org/678852 Signed-off-by: Anders Kaseorg <andersk@mit.edu> Reviewed-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * intel_ips: blacklist HP ProBook laptopsTakashi Iwai2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1026850 commit 88ca518b0bb4161e5f20f8a1d9cc477cae294e54 upstream. intel_ips driver spews the warning message "ME failed to update for more than 1s, likely hung" at each second endlessly on HP ProBook laptops with IronLake. As this has never worked, better to blacklist the driver for now. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * ARM: SAMSUNG: fix race in s3c_adc_start for ADCTodd Poynor2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1026850 commit 8265981bb439f3ecc5356fb877a6c2a6636ac88a upstream. Checking for adc->ts_pend already claimed should be done with the lock held. Signed-off-by: Todd Poynor <toddpoynor@google.com> Acked-by: Ben Dooks <ben-linux@fluff.org> Signed-off-by: Kukjin Kim <kgene.kim@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * mtd: nandsim: don't open code a do_div helperHerton Ronaldo Krzesinski2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1026850 commit 596fd46268634082314b3af1ded4612e1b7f3f03 upstream. We don't need to open code the divide function, just use div_u64 that already exists and do the same job. While this is a straightforward clean up, there is more to that, the real motivation for this. While building on a cross compiling environment in armel, using gcc 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5), I was getting the following build error: ERROR: "__aeabi_uldivmod" [drivers/mtd/nand/nandsim.ko] undefined! After investigating with objdump and hand built assembly version generated with the compiler, I narrowed __aeabi_uldivmod as being generated from the divide function. When nandsim.c is built with -fno-inline-functions-called-once, that happens when CONFIG_DEBUG_SECTION_MISMATCH is enabled, the do_div optimization in arch/arm/include/asm/div64.h doesn't work as expected with the open coded divide function: even if the do_div we are using doesn't have a constant divisor, the compiler still includes the else parts of the optimized do_div macro, and translates the divisions there to use __aeabi_uldivmod, instead of only calling __do_div_asm -> __do_div64 and optimizing/removing everything else out. So to reproduce, gcc 4.6 plus CONFIG_DEBUG_SECTION_MISMATCH=y and CONFIG_MTD_NAND_NANDSIM=m should do it, building on armel. After this change, the compiler does the intended thing even with -fno-inline-functions-called-once, and optimizes out as expected the constant handling in the optimized do_div on arm. As this also avoids a build issue, I'm marking for Stable, as I think is applicable for this case. Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com> Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * media: dvb-core: Release semaphore on error path dvb_register_device()Santosh Nayak2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1026850 commit 82163edcdfa4eb3d74516cc8e9f38dd3d039b67d upstream. There is a missing "up_write()" here. Semaphore should be released before returning error value. Signed-off-by: Santosh Nayak <santoshprasadnayak@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * block: fix infinite loop in __getblk_slowJeff Moyer2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1026850 commit 91f68c89d8f35fe98ea04159b9a3b42d0149478f upstream. Commit 080399aaaf35 ("block: don't mark buffers beyond end of disk as mapped") exposed a bug in __getblk_slow that causes mount to hang as it loops infinitely waiting for a buffer that lies beyond the end of the disk to become uptodate. The problem was initially reported by Torsten Hilbrich here: https://lkml.org/lkml/2012/6/18/54 and also reported independently here: http://www.sysresccd.org/forums/viewtopic.php?f=13&t=4511 and then Richard W.M. Jones and Marcos Mello noted a few separate bugzillas also associated with the same issue. This patch has been confirmed to fix: https://bugzilla.redhat.com/show_bug.cgi?id=835019 The main problem is here, in __getblk_slow: for (;;) { struct buffer_head * bh; int ret; bh = __find_get_block(bdev, block, size); if (bh) return bh; ret = grow_buffers(bdev, block, size); if (ret < 0) return NULL; if (ret == 0) free_more_memory(); } __find_get_block does not find the block, since it will not be marked as mapped, and so grow_buffers is called to fill in the buffers for the associated page. I believe the for (;;) loop is there primarily to retry in the case of memory pressure keeping grow_buffers from succeeding. However, we also continue to loop for other cases, like the block lying beond the end of the disk. So, the fix I came up with is to only loop when grow_buffers fails due to memory allocation issues (return value of 0). The attached patch was tested by myself, Torsten, and Rich, and was found to resolve the problem in call cases. Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Reported-and-Tested-by: Torsten Hilbrich <torsten.hilbrich@secunet.com> Tested-by: Richard W.M. Jones <rjones@redhat.com> Reviewed-by: Josh Boyer <jwboyer@redhat.com> [ Jens is on vacation, taking this directly - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * hwmon: (it87) Preserve configuration register bits on initJean Delvare2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1026850 commit 41002f8dd5938d5ad1d008ce5bfdbfe47fa7b4e8 upstream. We were accidentally losing one bit in the configuration register on device initialization. It was reported to freeze one specific system right away. Properly preserve all bits we don't explicitly want to change in order to prevent that. Reported-by: Stevie Trujillo <stevie.trujillo@gmail.com> Signed-off-by: Jean Delvare <khali@linux-fr.org> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * UBUNTU: [Config] getabis enhancementsStefan Bader2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | - Allow to define DEBIAN to be run standalone - Limit the amount of time wget waits for connections - Prevent exposing some errors when called by python script - Print architecture of downloaded packages Ignore: yes Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Herton Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
| * UBUNTU: SAUCE: sched: Fix race in task_group()Stefan Bader2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Stefan reported a crash on a kernel before a3e5d1091c1 ("sched: Don't call task_group() too many times in set_task_rq()"), he found the reason to be that the multiple task_group() invocations in set_task_rq() returned different values. Looking at all that I found a lack of serialization and plain wrong comments. The below tries to fix it using an extra pointer which is updated under the appropriate scheduler locks. Its not pretty, but I can't really see another way given how all the cgroup stuff works. Reported-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> BugLink: http://bugs.launchpad.net/bugs/999755 [backported from patch posted on mailing list] Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Herton Krzesinski <herton.krzesinski@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
| * x86, mce, therm_throt: Don't report power limit and package level thermal ↵Fenghua Yu2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | throttle events in mcelog BugLink: http://bugs.launchpad.net/bugs/930288 Thermal throttle and power limit events are not defined as MCE errors in x86 architecture and should not generate MCE errors in mcelog. Current kernel generates fake software defined MCE errors for these events. This may confuse users because they may think the machine has real MCE errors while actually only thermal throttle or power limit events happen. To make it worse, buggy firmware on some platforms may falsely generate the events. Therefore, kernel reports MCE errors which users think as real hardware errors. Although the firmware bugs should be fixed, on the other hand, kernel should not report MCE errors either. So mcelog is not a good mechanism to report these events. To report the events, we count them in respective counters (core_power_limit_count, package_power_limit_count, core_throttle_count, and package_throttle_count) in /sys/devices/system/cpu/cpu#/thermal_throttle/. Users can check the counters for each event on each CPU. Please note that all CPU's on one package report duplicate counters. It's user application's responsibity to retrieve a package level counter for one package. This patch doesn't report package level power limit, core level power limit, and package level thermal throttle events in mcelog. When the events happen, only report them in respective counters in sysfs. Since core level thermal throttle has been legacy code in kernel for a while and users accepted it as MCE error in mcelog, core level thermal throttle is still reported in mcelog. In the mean time, the event is counted in a counter in sysfs as well. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Acked-by: Borislav Petkov <bp@amd64.org> Acked-by: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/20111215001945.GA21009@linux-os.sc.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> (cherry picked from commit 29e9bf1841e4f9df13b4992a716fece7087dd237) Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
| * Linux 3.0.37Greg Kroah-Hartman2012-07-23
| | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1025406
| * ACPI: Remove one board specific WARN when ignoring timer overridingFeng Tang2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1025406 commit 7f68b4c2e158019c2ec494b5cfbd9c83b4e5b253 upstream. Current WARN msg is only for the ati_ixp4x0 board, while this function is used by mulitple platforms. So this one board specific warning is not appropriate any more. Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * ACPI: Make acpi_skip_timer_override cover all source_irq==0 casesFeng Tang2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1025406 commit ae10ccdc3093486f8c2369d227583f9d79f628e5 upstream. Currently when acpi_skip_timer_override is set, it only cover the (source_irq == 0 && global_irq == 2) cases. While there is also platform which need use this option and its global_irq is not 2. This patch will extend acpi_skip_timer_override to cover all timer overriding cases as long as the source irq is 0. This is the first part of a fix to kernel bug bugzilla 40002: "IRQ 0 assigned to VGA" https://bugzilla.kernel.org/show_bug.cgi?id=40002 Reported-and-tested-by: Szymon Kowalczyk <fazerxlo@o2.pl> Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * mm: Hold a file reference in madvise_removeAndy Lutomirski2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1025406 commit 9ab4233dd08036fe34a89c7dc6f47a8bf2eb29eb upstream. Otherwise the code races with munmap (causing a use-after-free of the vma) or with close (causing a use-after-free of the struct file). The bug was introduced by commit 90ed52ebe481 ("[PATCH] holepunch: fix mmap_sem i_mutex deadlock") [bwh: Backported to 3.2: - Adjust context - madvise_remove() calls vmtruncate_range(), not do_fallocate()] [luto: Backported to 3.0: Adjust context] Cc: Hugh Dickins <hugh@veritas.com> Cc: Miklos Szeredi <mszeredi@suse.cz> Cc: Badari Pulavarty <pbadari@us.ibm.com> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * fs: ramfs: file-nommu: add SetPageUptodate()Bob Liu2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1025406 commit fea9f718b3d68147f162ed2d870183ce5e0ad8d8 upstream. There is a bug in the below scenario for !CONFIG_MMU: 1. create a new file 2. mmap the file and write to it 3. read the file can't get the correct value Because sys_read() -> generic_file_aio_read() -> simple_readpage() -> clear_page() which causes the page to be zeroed. Add SetPageUptodate() to ramfs_nommu_expand_for_mapping() so that generic_file_aio_read() do not call simple_readpage(). Signed-off-by: Bob Liu <lliubbo@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greg Ungerer <gerg@uclinux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
| * mm, thp: abort compaction if migration page cannot be charged to memcgDavid Rientjes2012-07-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BugLink: http://bugs.launchpad.net/bugs/1025406 commit 4bf2bba3750f10aa9e62e6949bc7e8329990f01b upstream. If page migration cannot charge the temporary page to the memcg, migrate_pages() will return -ENOMEM. This isn't considered in memory compaction however, and the loop continues to iterate over all pageblocks trying to isolate and migrate pages. If a small number of very large memcgs happen to be oom, however, these attempts will mostly be futile leading to an enormous amout of cpu consumption due to the page migration failures. This patch will short circuit and fail memory compaction if migrate_pages() returns -ENOMEM. COMPACT_PARTIAL is returned in case some migrations were successful so that the page allocator will retry. Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Mel Gorman <mgorman@suse.de> Cc: Minchan Kim <minchan@kernel.org> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>