aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* cgroup: make ->pre_destroy() return voidTejun Heo2012-11-05
| | | | | | | | | | | | All ->pre_destory() implementations return 0 now, which is the only allowed return value. Make it return void. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Michal Hocko <mhocko@suse.cz> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Vivek Goyal <vgoyal@redhat.com>
* hugetlb: do not fail in hugetlb_cgroup_pre_destroyMichal Hocko2012-11-05
| | | | | | | | | | | | Now that pre_destroy callbacks are called from the context where neither any task can attach the group nor any children group can be added there is no other way to fail from hugetlb_pre_destroy. Signed-off-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Tejun Heo <tj@kernel.org> Reviewed-by: Glauber Costa <glommer@parallels.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* memcg: make mem_cgroup_reparent_charges non failingMichal Hocko2012-11-05
| | | | | | | | | | | | | | | | Now that pre_destroy callbacks are called from the context where neither any task can attach the group nor any children group can be added there is no other way to fail from mem_cgroup_pre_destroy. mem_cgroup_pre_destroy doesn't have to take a reference to memcg's css because all css' are marked dead already. tj: Remove now unused local variable @cgrp from mem_cgroup_reparent_charges(). Signed-off-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Glauber Costa <glommer@parallels.com> Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* cgroup: remove CGRP_WAIT_ON_RMDIR, cgroup_exclude_rmdir() and ↵Tejun Heo2012-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | cgroup_release_and_wakeup_rmdir() CGRP_WAIT_ON_RMDIR is another kludge which was added to make cgroup destruction rollback somewhat working. cgroup_rmdir() used to drain CSS references and CGRP_WAIT_ON_RMDIR and the associated waitqueue and helpers were used to allow the task performing rmdir to wait for the next relevant event. Unfortunately, the wait is visible to controllers too and the mechanism got exposed to memcg by 887032670d ("cgroup avoid permanent sleep at rmdir"). Now that the draining and retries are gone, CGRP_WAIT_ON_RMDIR is unnecessary. Remove it and all the mechanisms supporting it. Note that memcontrol.c changes are essentially revert of 887032670d ("cgroup avoid permanent sleep at rmdir"). Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Balbir Singh <bsingharora@gmail.com>
* cgroup: deactivate CSS's and mark cgroup dead before invoking ->pre_destroy()Tejun Heo2012-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because ->pre_destroy() could fail and can't be called under cgroup_mutex, cgroup destruction did something very ugly. 1. Grab cgroup_mutex and verify it can be destroyed; fail otherwise. 2. Release cgroup_mutex and call ->pre_destroy(). 3. Re-grab cgroup_mutex and verify it can still be destroyed; fail otherwise. 4. Continue destroying. In addition to being ugly, it has been always broken in various ways. For example, memcg ->pre_destroy() expects the cgroup to be inactive after it's done but tasks can be attached and detached between #2 and #3 and the conditions that memcg verified in ->pre_destroy() might no longer hold by the time control reaches #3. Now that ->pre_destroy() is no longer allowed to fail. We can switch to the following. 1. Grab cgroup_mutex and verify it can be destroyed; fail otherwise. 2. Deactivate CSS's and mark the cgroup removed thus preventing any further operations which can invalidate the verification from #1. 3. Release cgroup_mutex and call ->pre_destroy(). 4. Re-grab cgroup_mutex and continue destroying. After this change, controllers can safely assume that ->pre_destroy() will only be called only once for a given cgroup and, once ->pre_destroy() is called, the cgroup will stay dormant till it's destroyed. This removes the only reason ->pre_destroy() can fail - new task being attached or child cgroup being created inbetween. Error out path is removed and ->pre_destroy() invocation is open coded in cgroup_rmdir(). v2: cgroup_call_pre_destroy() removal moved to this patch per Michal. Commit message updated per Glauber. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Glauber Costa <glommer@parallels.com>
* cgroup: use cgroup_lock_live_group(parent) in cgroup_create()Tejun Heo2012-11-05
| | | | | | | | | | | | | | | | | | | | | This patch makes cgroup_create() fail if @parent is marked removed. This is to prepare for further updates to cgroup_rmdir() path. Note that this change isn't strictly necessary. cgroup can only be created via mkdir and the removed marking and dentry removal happen without releasing cgroup_mutex, so cgroup_create() can never race with cgroup_rmdir(). Even after the scheduled updates to cgroup_rmdir(), cgroup_mkdir() and cgroup_rmdir() are synchronized by i_mutex rendering the added liveliness check unnecessary. Do it anyway such that locking is contained inside cgroup proper and we don't get nasty surprises if we ever grow another caller of cgroup_create(). Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Li Zefan <lizefan@huawei.com>
* cgroup: kill CSS_REMOVEDTejun Heo2012-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CSS_REMOVED is one of the several contortions which were necessary to support css reference draining on cgroup removal. All css->refcnts which need draining should be deactivated and verified to equal zero atomically w.r.t. css_tryget(). If any one isn't zero, all refcnts needed to be re-activated and css_tryget() shouldn't fail in the process. This was achieved by letting css_tryget() busy-loop until either the refcnt is reactivated (failed removal attempt) or CSS_REMOVED is set (committing to removal). Now that css refcnt draining is no longer used, there's no need for atomic rollback mechanism. css_tryget() simply can look at the reference count and fail if it's deactivated - it's never getting re-activated. This patch removes CSS_REMOVED and updates __css_tryget() to fail if the refcnt is deactivated. As deactivation and removal are a single step now, they no longer need to be protected against css_tryget() happening from irq context. Remove local_irq_disable/enable() from cgroup_rmdir(). Note that this removes css_is_removed() whose only user is VM_BUG_ON() in memcontrol.c. We can replace it with a check on the refcnt but given that the only use case is a debug assert, I think it's better to simply unexport it. v2: Comment updated and explanation on local_irq_disable/enable() added per Michal Hocko. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Li Zefan <lizefan@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Balbir Singh <bsingharora@gmail.com>
* cgroup: kill cgroup_subsys->__DEPRECATED_clear_css_refsTejun Heo2012-11-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 2ef37d3fe4 ("memcg: Simplify mem_cgroup_force_empty_list error handling") removed the last user of __DEPRECATED_clear_css_refs. This patch removes __DEPRECATED_clear_css_refs and mechanisms to support it. * Conditionals dependent on __DEPRECATED_clear_css_refs removed. * cgroup_clear_css_refs() can no longer fail. All that needs to be done are deactivating refcnts, setting CSS_REMOVED and putting the base reference on each css. Remove cgroup_clear_css_refs() and the failure path, and open-code the loops into cgroup_rmdir(). This patch keeps the two for_each_subsys() loops separate while open coding them. They can be merged now but there are scheduled changes which need them to be separate, so keep them separate to reduce the amount of churn. local_irq_save/restore() from cgroup_clear_css_refs() are replaced with local_irq_disable/enable() for simplicity. This is safe as cgroup_rmdir() is always called with IRQ enabled. Note that this IRQ switching is necessary to ensure that css_tryget() isn't called from IRQ context on the same CPU while lower context is between CSS deactivation and setting CSS_REMOVED as css_tryget() would hang forever in such cases waiting for CSS to be re-activated or CSS_REMOVED set. This will go away soon. v2: cgroup_call_pre_destroy() removal dropped per Michal. Commit message updated to explain local_irq_disable/enable() conversion. Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: Li Zefan <lizefan@huawei.com>
* memcg: Simplify mem_cgroup_force_empty_list error handlingMichal Hocko2012-10-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mem_cgroup_force_empty_list currently tries to remove all pages from the given LRU. To prevent from temoporary failures (EBUSY returned by mem_cgroup_move_parent) it uses a margin to the current LRU pages and returns the true if there are still some pages left on the list. If we consider that mem_cgroup_move_parent fails only when it is racing with somebody else removing (uncharging) the page or when the page is migrated then it is obvious that all those failures are only temporal and so we can safely retry later. Let's get rid of the safety margin and make the loop really wait for the empty LRU. The caller should still make sure that all charges have been removed from the res_counter because mem_cgroup_replace_page_cache might add a page to the LRU after the list_empty check (it doesn't touch res_counter though). This catches most of the cases except for shmem which might call mem_cgroup_replace_page_cache with a page which is not charged and on the LRU yet but this was the case also without this patch. In order to fix this we need a guarantee that try_get_mem_cgroup_from_page falls back to the current mm's cgroup so it needs css_tryget to fail. This will be fixed up in a later patch because it needs a help from cgroup core (pre_destroy has to be called after css is cleared). Although mem_cgroup_pre_destroy can still fail (if a new task or a new sub-group appears) there is no reason to retry pre_destroy callback from the cgroup core. This means that __DEPRECATED_clear_css_refs has lost its meaning and it can be removed. Changes since v2 - remove __DEPRECATED_clear_css_refs Changes since v1 - use kerndoc - be more specific about mem_cgroup_move_parent possible failures Signed-off-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Tejun Heo <tj@kernel.org> Reviewed-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* memcg: root_cgroup cannot reach mem_cgroup_move_parentMichal Hocko2012-10-29
| | | | | | | | | | | | | | The root cgroup cannot be destroyed so we never hit it down the mem_cgroup_pre_destroy path and mem_cgroup_force_empty_write shouldn't even try to do anything if called for the root. This means that mem_cgroup_move_parent doesn't have to bother with the root cgroup and it can assume it can always move charges upwards. Signed-off-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Tejun Heo <tj@kernel.org> Reviewed-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* memcg: split mem_cgroup_force_empty into reclaiming and reparenting partsMichal Hocko2012-10-29
| | | | | | | | | | | | | | | | | | | | mem_cgroup_force_empty did two separate things depending on free_all parameter from the very beginning. It either reclaimed as many pages as possible and moved the rest to the parent or just moved charges to the parent. The first variant is used as memory.force_empty callback while the later is used from the mem_cgroup_pre_destroy. The whole games around gotos are far from being nice and there is no reason to keep those two functions inside one. Let's split them and also move the responsibility for css reference counting to their callers to make to code easier. This patch doesn't have any functional changes. Signed-off-by: Michal Hocko <mhocko@suse.cz> Reviewed-by: Tejun Heo <tj@kernel.org> Reviewed-by: Glauber Costa <glommer@parallels.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* Linux 3.6Linus Torvalds2012-09-30
|
* vfs: dcache: fix deadlock in tree traversalMiklos Szeredi2012-09-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IBM reported a deadlock in select_parent(). This was found to be caused by taking rename_lock when already locked when restarting the tree traversal. There are two cases when the traversal needs to be restarted: 1) concurrent d_move(); this can only happen when not already locked, since taking rename_lock protects against concurrent d_move(). 2) racing with final d_put() on child just at the moment of ascending to parent; rename_lock doesn't protect against this rare race, so it can happen when already locked. Because of case 2, we need to be able to handle restarting the traversal when rename_lock is already held. This patch fixes all three callers of try_to_ascend(). IBM reported that the deadlock is gone with this patch. [ I rewrote the patch to be smaller and just do the "goto again" if the lock was already held, but credit goes to Miklos for the real work. - Linus ] Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'iommu-fixes-v3.6-rc7' of ↵Linus Torvalds2012-09-29
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU fixes from Joerg Roedel: "Two small patches: * One patch to fix the function declarations for !CONFIG_IOMMU_API. This is causing build errors in linux-next and should be fixed for v3.6. * Another patch to fix an IOMMU group related NULL pointer dereference." * tag 'iommu-fixes-v3.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/amd: Fix wrong assumption in iommu-group specific code iommu: static inline iommu group stub functions
| * iommu/amd: Fix wrong assumption in iommu-group specific codeJoerg Roedel2012-09-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new IOMMU groups code in the AMD IOMMU driver makes the assumption that there is a pci_dev struct available for all device-ids listed in the IVRS ACPI table. Unfortunatly this assumption is not true and so this code causes a NULL pointer dereference at boot on some systems. Fix it by making sure the given pointer is never NULL when passed to the group specific code. The real fix is larger and will be queued for v3.7. Reported-by: Florian Dazinger <florian@dazinger.net> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
| * iommu: static inline iommu group stub functionsAlex Williamson2012-09-25
| | | | | | | | | | Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* | Merge git://git.infradead.org/users/willy/linux-nvmeLinus Torvalds2012-09-29
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NVMe driver fixes from Matthew Wilcox: "Now that actual hardware has been released (don't have any yet myself), people are starting to want some of these fixes merged." Willy doesn't have hardware? Guys... * git://git.infradead.org/users/willy/linux-nvme: NVMe: Cancel outstanding IOs on queue deletion NVMe: Free admin queue memory on initialisation failure NVMe: Use ida for nvme device instance NVMe: Fix whitespace damage in nvme_init NVMe: handle allocation failure in nvme_map_user_pages() NVMe: Fix uninitialized iod compiler warning NVMe: Do not set IO queue depth beyond device max NVMe: Set block queue max sectors NVMe: use namespace id for nvme_get_features NVMe: replace nvme_ns with nvme_dev for user admin NVMe: Fix nvme module init when nvme_major is set NVMe: Set request queue logical block size
| * | NVMe: Cancel outstanding IOs on queue deletionMatthew Wilcox2012-08-07
| | | | | | | | | | | | | | | | | | | | | If the device is hot-unplugged while there are active commands, we should time out the I/Os so that upper layers don't just see the I/Os disappear. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
| * | NVMe: Free admin queue memory on initialisation failureMatthew Wilcox2012-08-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the adapter fails initialisation, the memory allocated for the admin queue may not be freed. Split the memory freeing part of nvme_free_queue() into nvme_free_queue_mem() and call it in the case of initialisation failure. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Reported-by: Vishal Verma <vishal.l.verma@intel.com>
| * | NVMe: Use ida for nvme device instanceQuoc-Son Anh2012-07-31
| | | | | | | | | | | | | | | Signed-off-by: Quoc-Son Anh <quoc-sonx.anh@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
| * | NVMe: Fix whitespace damage in nvme_initMatthew Wilcox2012-07-31
| | | | | | | | | | | | | | | | | | | | | Commit 5c42ea1643 used spaces instead of tabs. Also remove the unnecessary initialisation of the 'result' variable. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
| * | NVMe: handle allocation failure in nvme_map_user_pages()Dan Carpenter2012-07-31
| | | | | | | | | | | | | | | | | | | | | We should return here and avoid a NULL dereference. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
| * | NVMe: Fix uninitialized iod compiler warningKeith Busch2012-07-27
| | | | | | | | | | | | | | | Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
| * | NVMe: Do not set IO queue depth beyond device maxKeith Busch2012-07-27
| | | | | | | | | | | | | | | | | | | | | | | | Set the depth for IO queues to the device's maximum supported queue entries if the requested depth exceeds the device's capabilities. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
| * | NVMe: Set block queue max sectorsKeith Busch2012-07-26
| | | | | | | | | | | | | | | | | | | | | | | | Set the max hw sectors in a namespace's request queue if the nvme device has a max data transfer size. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
| * | NVMe: use namespace id for nvme_get_featuresKeith Busch2012-07-26
| | | | | | | | | | | | | | | | | | | | | | | | The specification does not provide a use for command dword11 in the NVMe Get Features command, but does use the NSID for some features. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
| * | NVMe: replace nvme_ns with nvme_dev for user adminKeith Busch2012-07-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | The function nvme_user_admin_command does not require a namespace to proceed. Replace with the nvme_dev structure so that it can be called from contexts that do not have a namespace. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
| * | NVMe: Fix nvme module init when nvme_major is setKeith Busch2012-07-26
| | | | | | | | | | | | | | | | | | | | | | | | register_blkdev returns 0 when given a valid major number. Reported-by:Ross Zwisler <ross.zwisler@intel.com> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
| * | NVMe: Set request queue logical block sizeKeith Busch2012-07-25
| | | | | | | | | | | | | | | | | | | | | | | | Sets the request queue logical block size with the block size of the namespace. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* | | mtdchar: fix offset overflow detectionLinus Torvalds2012-09-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sasha Levin has been running trinity in a KVM tools guest, and was able to trigger the BUG_ON() at arch/x86/mm/pat.c:279 (verifying the range of the memory type). The call trace showed that it was mtdchar_mmap() that created an invalid remap_pfn_range(). The problem is that mtdchar_mmap() does various really odd and subtle things with the vma page offset etc, and uses the wrong types (and the wrong overflow) detection for it. For example, the page offset may well be 32-bit on a 32-bit architecture, but after shifting it up by PAGE_SHIFT, we need to use a potentially 64-bit resource_size_t to correctly hold the full value. Also, we need to check that the vma length plus offset doesn't overflow before we check that it is smaller than the length of the mtdmap region. This fixes things up and tries to make the code a bit easier to read. Reported-and-tested-by: Sasha Levin <levinsasha928@gmail.com> Acked-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Artem Bityutskiy <dedekind1@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: linux-mtd@lists.infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2012-09-28
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David S Miller: 1) Netfilter xt_limit module can use uninitialized rules, from Jan Engelhardt. 2) Wei Yongjun has found several more spots where error pointers were treated as NULL/non-NULL and vice versa. 3) bnx2x was converted to pci_io{,un}map() but one remaining plain iounmap() got missed. From Neil Horman. 4) Due to a fence-post type error in initialization of inetpeer entries (which is where we store the ICMP rate limiting information), we can erroneously drop ICMPs if the inetpeer was created right around when jiffies wraps. Fix from Nicolas Dichtel. 5) smsc75xx resume fix from Steve Glendinnig. 6) LAN87xx smsc chips need an explicit hardware init, from Marek Vasut. 7) qlcnic uses msleep() with locks held, fix from Narendra K. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: netdev: octeon: fix return value check in octeon_mgmt_init_phy() inetpeer: fix token initialization qlcnic: Fix scheduling while atomic bug bnx2: Clean up remaining iounmap net: phy: smsc: Implement PHY config_init for LAN87xx smsc75xx: fix resume after device reset netdev: pasemi: fix return value check in pasemi_mac_phy_init() team: fix return value check l2tp: fix return value check netfilter: xt_limit: have r->cost != 0 case work
| * | | netdev: octeon: fix return value check in octeon_mgmt_init_phy()Wei Yongjun2012-09-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case of error, the function of_phy_connect() returns NULL pointer not ERR_PTR(). The IS_ERR() test in the return value check should be replaced with NULL test. dpatch engine is used to auto generate this patch. (https://github.com/weiyj/dpatch) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | inetpeer: fix token initializationNicolas Dichtel2012-09-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When jiffies wraps around (for example, 5 minutes after the boot, see INITIAL_JIFFIES) and peer has just been created, now - peer->rate_last can be < XRLIM_BURST_FACTOR * timeout, so token is not set to the maximum value, thus some icmp packets can be unexpectedly dropped. Fix this case by initializing last_rate to 60 seconds in the past. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | qlcnic: Fix scheduling while atomic bugNarendra K2012-09-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the device close path, 'qlcnic_fw_destroy_ctx' and 'qlcnic_poll_rsp' call msleep. But 'qlcnic_fw_destroy_ctx' and 'qlcnic_poll_rsp' are called with 'adapter->tx_clean_lock' spin lock held resulting in scheduling while atomic bug causing the following trace. I observed that the commit 012dc19a45b2b9cc2ebd14aaa401cf782c2abba4 from John Fastabend addresses a similar issue in ixgbevf driver. Adopting the same approach used in the commit, this patch uses mdelay to address the issue. [79884.999115] BUG: scheduling while atomic: ip/30846/0x00000002 [79885.005562] INFO: lockdep is turned off. [79885.009958] Modules linked in: qlcnic fuse nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE bnep bluetooth rfkill ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables iptable_nat nf_nat iptable_mangle ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack iptable_filter ip_tables dcdbas coretemp kvm_intel kvm iTCO_wdt ixgbe iTCO_vendor_support crc32c_intel ghash_clmulni_intel nfsd microcode sb_edac pcspkr edac_core dca bnx2x shpchp auth_rpcgss nfs_acl lpc_ich mfd_core mdio lockd libcrc32c wmi acpi_pad acpi_power_meter sunrpc uinput sd_mod sr_mod cdrom crc_t10dif ahci libahci libata megaraid_sas usb_storage dm_mirror dm_region_hash dm_log dm_mod [last unloaded: qlcnic] [79885.083608] Pid: 30846, comm: ip Tainted: G W O 3.6.0-rc7+ #1 [79885.090805] Call Trace: [79885.093569] [<ffffffff816764d8>] __schedule_bug+0x68/0x76 [79885.099699] [<ffffffff8168358e>] __schedule+0x99e/0xa00 [79885.105634] [<ffffffff81683929>] schedule+0x29/0x70 [79885.111186] [<ffffffff81680def>] schedule_timeout+0x16f/0x350 [79885.117724] [<ffffffff811afb7a>] ? init_object+0x4a/0x90 [79885.123770] [<ffffffff8107c190>] ? __internal_add_timer+0x140/0x140 [79885.130873] [<ffffffff81680fee>] schedule_timeout_uninterruptible+0x1e/0x20 [79885.138773] [<ffffffff8107e830>] msleep+0x20/0x30 [79885.144159] [<ffffffffa04c7fbf>] qlcnic_issue_cmd+0xef/0x290 [qlcnic] [79885.151478] [<ffffffffa04c8265>] qlcnic_fw_cmd_destroy_rx_ctx+0x55/0x90 [qlcnic] [79885.159868] [<ffffffffa04c92fd>] qlcnic_fw_destroy_ctx+0x2d/0xa0 [qlcnic] [79885.167576] [<ffffffffa04bf2ed>] __qlcnic_down+0x11d/0x180 [qlcnic] [79885.174708] [<ffffffffa04bf6f8>] qlcnic_close+0x18/0x20 [qlcnic] [79885.181547] [<ffffffff8153b4c5>] __dev_close_many+0x95/0xe0 [79885.187899] [<ffffffff8153b548>] __dev_close+0x38/0x50 [79885.193761] [<ffffffff81545101>] __dev_change_flags+0xa1/0x180 [79885.200419] [<ffffffff81545298>] dev_change_flags+0x28/0x70 [79885.206779] [<ffffffff815531b8>] do_setlink+0x378/0xa00 [79885.212731] [<ffffffff81354fe1>] ? nla_parse+0x31/0xe0 [79885.218612] [<ffffffff815558ee>] rtnl_newlink+0x37e/0x560 [79885.224768] [<ffffffff812cfa19>] ? selinux_capable+0x39/0x50 [79885.231217] [<ffffffff812cbf98>] ? security_capable+0x18/0x20 [79885.237765] [<ffffffff81555114>] rtnetlink_rcv_msg+0x114/0x2f0 [79885.244412] [<ffffffff81551f87>] ? rtnl_lock+0x17/0x20 [79885.250280] [<ffffffff81551f87>] ? rtnl_lock+0x17/0x20 [79885.256148] [<ffffffff81555000>] ? __rtnl_unlock+0x20/0x20 [79885.262413] [<ffffffff81570fc1>] netlink_rcv_skb+0xa1/0xb0 [79885.268661] [<ffffffff81551fb5>] rtnetlink_rcv+0x25/0x40 [79885.274727] [<ffffffff815708bd>] netlink_unicast+0x19d/0x220 [79885.281146] [<ffffffff81570c45>] netlink_sendmsg+0x305/0x3f0 [79885.287595] [<ffffffff8152b188>] ? sock_update_classid+0x148/0x2e0 [79885.294650] [<ffffffff81525c2c>] sock_sendmsg+0xbc/0xf0 [79885.300600] [<ffffffff8152600c>] __sys_sendmsg+0x3ac/0x3c0 [79885.306853] [<ffffffff8109be23>] ? up_read+0x23/0x40 [79885.312510] [<ffffffff816896cc>] ? do_page_fault+0x2bc/0x570 [79885.318968] [<ffffffff81191854>] ? sys_brk+0x44/0x150 [79885.324715] [<ffffffff811c458c>] ? fget_light+0x24c/0x520 [79885.330875] [<ffffffff815286f9>] sys_sendmsg+0x49/0x90 [79885.336707] [<ffffffff8168e429>] system_call_fastpath+0x16/0x1b Signed-off-by: Narendra K <narendra_k@dell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | bnx2: Clean up remaining iounmapNeil Horman2012-09-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit c0357e975afdbbedab5c662d19bef865f02adc17 modified bnx2 to switch from using ioremap/iounmap to pci_iomap/pci_iounmap. They missed a spot in the error path of bnx2_init_one though. This patch just cleans that up. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Michael Chan <mcan@broadcom.com> CC: "David S. Miller" <davem@davemloft.net> Acked-by: Michael Chan <mchan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | net: phy: smsc: Implement PHY config_init for LAN87xxMarek Vasut2012-09-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The LAN8710/LAN8720 chips do have broken the "FlexPWR" smart power-saving capability. Enabling it leads to the PHY not being able to detect Link when cold-started without cable connected. Thus, make sure this is disabled. Signed-off-by: Marek Vasut <marex@denx.de> Cc: Christian Hohnstaedt <chohnstaedt@innominate.com> Cc: David S. Miller <davem@davemloft.net> Cc: Fabio Estevam <fabio.estevam@freescale.com> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Otavio Salvador <otavio@ossystems.com.br> Acked-by: Otavio Salvador <otavio@ossystems.com.br> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | smsc75xx: fix resume after device resetSteve Glendinning2012-09-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On some systems this device fails to properly resume after suspend, this patch fixes it by running the usbnet_resume handler. I suspect this also fixes this bug: http://code.google.com/p/chromium-os/issues/detail?id=31871 Signed-off-by: Steve Glendinning <steve.glendinning@shawell.net> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | netdev: pasemi: fix return value check in pasemi_mac_phy_init()Wei Yongjun2012-09-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case of error, the function of_phy_connect() returns NULL pointer not ERR_PTR(). The IS_ERR() test in the return value check should be replaced with NULL test. dpatch engine is used to auto generate this patch. (https://github.com/weiyj/dpatch) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | team: fix return value checkWei Yongjun2012-09-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case of error, the function genlmsg_put() returns NULL pointer not ERR_PTR(). The IS_ERR() test in the return value check should be replaced with NULL test. dpatch engine is used to auto generate this patch. (https://github.com/weiyj/dpatch) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | l2tp: fix return value checkWei Yongjun2012-09-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In case of error, the function genlmsg_put() returns NULL pointer not ERR_PTR(). The IS_ERR() test in the return value check should be replaced with NULL test. dpatch engine is used to auto generate this patch. (https://github.com/weiyj/dpatch) Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | Merge branch 'master' of git://1984.lsi.us.es/nfDavid S. Miller2012-09-27
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pablo Neira Ayuso says: ==================== If time allows, I'd appreciate if you can take the following fix for the xt_limit match. As Jan indicates, random things may occur while using the xt_limit match due to use of uninitialized memory. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
| | * | | netfilter: xt_limit: have r->cost != 0 case workJan Engelhardt2012-09-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit v2.6.19-rc1~1272^2~41 tells us that r->cost != 0 can happen when a running state is saved to userspace and then reinstated from there. Make sure that private xt_limit area is initialized with correct values. Otherwise, random matchings due to use of uninitialized memory. Signed-off-by: Jan Engelhardt <jengelh@inai.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
* | | | | Merge branch 'for-linus' of ↵Linus Torvalds2012-09-28
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs fixes from Al Viro: "A couple of fixes; one for automount/lazy umount race, another a classic "we don't protect the refcount transition to zero with the lock that protects looking for object in hash" kind of crap in lockd." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: close the race in nlmsvc_free_block() do_add_mount()/umount -l races
| * | | | | close the race in nlmsvc_free_block()Al Viro2012-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | we need to grab mutex before the reference counter reaches 0 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | do_add_mount()/umount -l racesAl Viro2012-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | normally we deal with lock_mount()/umount races by checking that mountpoint to be is still in our namespace after lock_mount() has been done. However, do_add_mount() skips that check when called with MNT_SHRINKABLE in flags (i.e. from finish_automount()). The reason is that ->mnt_ns may be a temporary namespace created exactly to contain automounts a-la NFS4 referral handling. It's not the namespace of the caller, though, so check_mnt() would fail here. We still need to check that ->mnt_ns is non-NULL in that case, though. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | | | Merge branch 'for-linus-3.6-rc-final' of ↵Linus Torvalds2012-09-28
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML fixes from Richard Weinberger. * 'for-linus-3.6-rc-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: um: Preinclude include/linux/kern_levels.h um: Fix IPC on um um: kill thread->forking um: let signal_delivered() do SIGTRAP on singlestepping into handler um: don't leak floating point state and segment registers on execve() um: take cleaning singlestep to start_thread()
| * | | | | | um: Preinclude include/linux/kern_levels.hGeert Uytterhoeven2012-09-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The userspace part of UML uses the asm-offsets.h generator mechanism to create definitions for UM_KERN_<LEVEL> that match the in-kernel KERN_<LEVEL> constant definitions. As of commit 04d2c8c83d0e3ac5f78aeede51babb3236200112 ("printk: convert the format for KERN_<LEVEL> to a 2 byte pattern"), KERN_<LEVEL> is no longer expanded to the literal '"<LEVEL>"', but to '"\001" "LEVEL"', i.e. it contains two parts. However, the combo of DEFINE_STR() in arch/x86/um/shared/sysdep/kernel-offsets.h and sed-y in Kbuild doesn't support string literals consisting of multiple parts. Hence for all UM_KERN_<LEVEL> definitions, only the SOH character is retained in the actual definition, while the remainder ends up in the comment. E.g. in include/generated/asm-offsets.h we get #define UM_KERN_INFO "\001" /* "6" KERN_INFO */ instead of #define UM_KERN_INFO "\001" "6" /* KERN_INFO */ This causes spurious '^A' output in some kernel messages: Calibrating delay loop... 4640.76 BogoMIPS (lpj=23203840) pid_max: default: 32768 minimum: 301 Mount-cache hash table entries: 256 ^AChecking that host ptys support output SIGIO...Yes ^AChecking that host ptys support SIGIO on close...No, enabling workaround ^AUsing 2.6 host AIO NET: Registered protocol family 16 bio: create slab <bio-0> at 0 Switching to clocksource itimer To fix this: - Move the mapping from UM_KERN_<LEVEL> to KERN_<LEVEL> from arch/um/include/shared/common-offsets.h to arch/um/include/shared/user.h, which is preincluded for all userspace parts, - Preinclude include/linux/kern_levels.h for all userspace parts, to obtain the in-kernel KERN_<LEVEL> constant definitions. This doesn't violate the kernel/userspace separation, as include/linux/kern_levels.h is self-contained and doesn't expose any other kernel internals. - Remove the now unused STR() and DEFINE_STR() macros. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | | | um: Fix IPC on umRichard Weinberger2012-09-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit c1d7e01d (ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION) forgot UML and broke IPC on it. Also UML has to select ARCH_WANT_IPC_PARSE_VERSION usin Kconfig. Reported-and-tested-by: <Toralf Förster toralf.foerster@gmx.de> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | | | um: kill thread->forkingAl Viro2012-09-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | we only use that to tell copy_thread() done by syscall from that done by kernel_thread(). However, it's easier to do simply by checking PF_KTHREAD in thread flags. Merge sys_clone() guts for 32bit and 64bit, while we are at it... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | | | | um: let signal_delivered() do SIGTRAP on singlestepping into handlerAl Viro2012-09-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... rather than duplicating that in sigframe setup code (and doing that inconsistently, at that) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>