aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* Merge branch 'for-linus' of ↵Tejun Heo2014-09-24
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block into for-3.18 This is to receive 0a30288da1ae ("blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe") which implements __percpu_ref_kill_expedited() to work around SCSI blk-mq stall. The commit reverted and patches to implement proper fix will be added. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de>
| * blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probeTejun Heo2014-09-24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | blk-mq uses percpu_ref for its usage counter which tracks the number of in-flight commands and used to synchronously drain the queue on freeze. percpu_ref shutdown takes measureable wallclock time as it involves a sched RCU grace period. This means that draining a blk-mq takes measureable wallclock time. One would think that this shouldn't matter as queue shutdown should be a rare event which takes place asynchronously w.r.t. userland. Unfortunately, SCSI probing involves synchronously setting up and then tearing down a lot of request_queues back-to-back for non-existent LUNs. This means that SCSI probing may take more than ten seconds when scsi-mq is used. This will be properly fixed by implementing a mechanism to keep q->mq_usage_counter in atomic mode till genhd registration; however, that involves rather big updates to percpu_ref which is difficult to apply late in the devel cycle (v3.17-rc6 at the moment). As a stop-gap measure till the proper fix can be implemented in the next cycle, this patch introduces __percpu_ref_kill_expedited() and makes blk_mq_freeze_queue() use it. This is heavy-handed but should work for testing the experimental SCSI blk-mq implementation. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Christoph Hellwig <hch@infradead.org> Link: http://lkml.kernel.org/g/20140919113815.GA10791@lst.de Fixes: add703fda981 ("blk-mq: use percpu_ref for mq usage count") Cc: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| * Merge tag 'rdma-for-linus' of ↵Linus Torvalds2014-09-23
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband/rdma fixes from Roland Dreier: "Last late set of InfiniBand/RDMA fixes for 3.17: - fixes for the new memory region re-registration support - iSER initiator error path fixes - grab bag of small fixes for the qib and ocrdma hardware drivers - larger set of fixes for mlx4, especially in RoCE mode" * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (26 commits) IB/mlx4: Fix VF mac handling in RoCE IB/mlx4: Do not allow APM under RoCE IB/mlx4: Don't update QP1 in native mode IB/mlx4: Avoid accessing netdevice when building RoCE qp1 header mlx4: Fix mlx4 reg/unreg mac to work properly with 0-mac addresses IB/core: When marshaling uverbs path, clear unused fields IB/mlx4: Avoid executing gid task when device is being removed IB/mlx4: Fix lockdep splat for the iboe lock IB/mlx4: Get upper dev addresses as RoCE GIDs when port comes up IB/mlx4: Reorder steps in RoCE GID table initialization IB/mlx4: Don't duplicate the default RoCE GID IB/mlx4: Avoid null pointer dereference in mlx4_ib_scan_netdevs() IB/iser: Bump version to 1.4.1 IB/iser: Allow bind only when connection state is UP IB/iser: Fix RX/TX CQ resource leak on error flow RDMA/ocrdma: Use right macro in query AH RDMA/ocrdma: Resolve L2 address when creating user AH mlx4: Correct error flows in rereg_mr IB/qib: Correct reference counting in debugfs qp_stats IPoIB: Remove unnecessary port query ...
| | *-------. Merge branches 'core', 'ipoib', 'iser', 'mlx4', 'ocrdma' and 'qib' into for-nextRoland Dreier2014-09-22
| | |\ \ \ \ \
| | | | | | | * IB/qib: Correct reference counting in debugfs qp_statsMike Marciniszyn2014-09-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This particular reference count is not needed with the rcu protection, and the current code leaks a reference count, causing a hang in qib_qp_destroy(). Cc: <stable@vger.kernel.org> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | * IB/qib: Change get_user_pages() usage to always NULL vmasMike Marciniszyn2014-09-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The static helper routine, __qib_get_user_pages(), accepts a vma arg, but current use always passes NULL. This has caused some confusion associated with the correct use of this argument, but since the current use case doesn't require the flexiblity, the best thing to do is to simplfy the code to always pass NULL to get_user_pages(). Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | * IB/ipath: Change get_user_pages() usage to always NULL vmasMike Marciniszyn2014-09-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The static helper routine, __ipath_get_user_pages(), accepts a vma arg, but current use always passes NULL. This has caused some confusion associated with the correct use of this argument, but since the current use case doesn't require the flexiblity, the best thing to do is to simplfy the code to always pass NULL to get_user_pages(). Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | * | RDMA/ocrdma: Use right macro in query AHdevesh.sharma@emulex.com2014-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ocrdma_query_ah() does not use correct macro, and checks the wrong bit for the validity of address handle in vector table. Fix this. Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | * | RDMA/ocrdma: Resolve L2 address when creating user AHdevesh.sharma@emulex.com2014-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because of IP-based GIDs, userspace AHs must have MAC and VLAN ID resolved separately. Presently, user AHs are broken for ocrdma. This patch resolves L2 addresses while creating user AH and obtains the right DMAC and VLAN ID before creating AH. Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | * | RDMA/ocrdma: Do not skip setting deferred_armDevesh Sharma2014-09-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When ib_request_notify_cq() is called for the first time, ocrdma tries to skip setting deffered_arm flag. This may lead CQ to an un-armed state thus never generating a CQ event and leaving consumer hung. This patch removes the part of code that skips setting deferred_arm. Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com> Signed-off-by: Mitesh Ahuja <mitesh.ahuja@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | * | RDMA/ocrdma: Report correct value of max_fast_reg_page_list_lenDevesh Sharma2014-09-19
| | | | | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix ocrdma_query_device() to report correct value of max_fast_reg_page_list_len. Signed-off-by: Devesh Sharma <devesh.sharma@emulex.com> Signed-off-by: Selvin Xavier <selvin.xavier@emulex.com> Signed-off-by: Mitesh Ahuja <mitesh.ahuja@emulex.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | * | IB/mlx4: Fix VF mac handling in RoCEJack Morgenstein2014-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We had several problems here. First, a race condition on QP1 mac handling between mlx4_ib_update_qps and mlx4_ib_modify_qp, which is fixed by taking the qp mutex in mlx4_ib_update_qps. Also, qp->pri.smac_port was not updated in mlx4_ib_update_qps. Last, in __mlx4_ib_modify_qp we did not properly handle the case where the mac is zero, but port is non-zero. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | * | IB/mlx4: Do not allow APM under RoCEJack Morgenstein2014-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Automatic Path Migration is not supported under RoCE. Therefore, return a "not-supported" error if the caller attempts to set an alternate path in a QP context. In addition, if there are no IB ports configured, do not report APM capability in the device flags returned by mlx4_ib_query_device. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | * | IB/mlx4: Don't update QP1 in native modeJack Morgenstein2014-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For native functions (non-SR-IOV), there's no reason to update the smac_index, as QP1 is a GSI QP. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | * | IB/mlx4: Avoid accessing netdevice when building RoCE qp1 headerJack Morgenstein2014-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The source MAC is needed in RoCE when building the QP1 header. Currently, this is obtained from the source net device. However, the net device may not yet exist, or can be destroyed in parallel to this QP1 send operation (e.g through the VPI port change flow) so accessing it may cause a kernel crash. To fix this, we maintain a source MAC cache per port for the net device in struct mlx4_ib_roce. This cached MAC is initialized to be the default MAC address obtained during HCA initialization via QUERY_PORT. This cached MAC is updated via the netdev event notifier handler. Since the cached MAC is held in an atomic64 object, we do not need locking when accessing it. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | * | mlx4: Fix mlx4 reg/unreg mac to work properly with 0-mac addressesJack Morgenstein2014-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a chance that the VF mlx4 RoCE driver (mlx4_ib) may see a 0-mac as the current default MAC address when a RoCE interface first comes up. In this case, the RoCE driver registers the 0-mac to get its MAC index -- used in the INIT2RTR transition when it creates its proxy Q1 qp's. If we do not allow QP1 to be created, the RoCE driver will not come up. If we do not register the 0-mac, but simply use a random mac-index, QP1 will attempt to send packets with an someone's else source MAC which will get the system into more troubled. Since a 0-mac was previously used to indicate a free slot, this leads to errors, both when the 0-mac is registered and when it is unregistered. The required fix is to check in addition that the slot containing the 0-mac has a reference count of zero. Additionally, when comparing MAC addresses, need to mask out the 2 MSBs of the u64 mac on both sides of the comparison. Note that when the EN driver (mlx4_en) comes up, it set itself a proper mac --> the RoCE driver gets to be notified on that and further handing is done with the update qp command, as was added by commit 9433c188915c ("IB/mlx4: Invoke UPDATE_QP for proxy QP1 on MAC changes"). Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | * | IB/core: When marshaling uverbs path, clear unused fieldsMatan Barak2014-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When marsheling a user path to the kernel struct ib_sa_path, need to zero smac, dmac and set the vlan id to the "no vlan" value. Fixes: dd5f03beb4f7 ("IB/core: Ethernet L2 attributes in verbs/cm structures") Reported-by: Aleksey Senin <alekseys@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | * | IB/mlx4: Avoid executing gid task when device is being removedMoni Shoua2014-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When device is being removed (e.g during VPI port link type change from ETH to IB), tasks for gid table changes should not be executed. Flush the current queue of tasks and block further tasks from entering the queue. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | * | IB/mlx4: Fix lockdep splat for the iboe lockJack Morgenstein2014-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Chuck Lever reported the following stack trace: ================================= [ INFO: inconsistent lock state ] 3.16.0-rc2-00024-g2e78883 #17 Tainted: G E --------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. swapper/0/0 [HC0[0]:SC1[1]:HE1:SE0] takes: (&(&iboe->lock)->rlock){+.?...}, at: [<ffffffffa065f68b>] mlx4_ib_addr_event+0xdb/0x1a0 [mlx4_ib] {SOFTIRQ-ON-W} state was registered at: [<ffffffff810b3110>] mark_irqflags+0x110/0x170 [<ffffffff810b4806>] __lock_acquire+0x2c6/0x5b0 [<ffffffff810b4bd9>] lock_acquire+0xe9/0x120 [<ffffffff815f7f6e>] _raw_spin_lock+0x3e/0x80 [<ffffffffa0661084>] mlx4_ib_scan_netdevs+0x34/0x260 [mlx4_ib] [<ffffffffa06612db>] mlx4_ib_netdev_event+0x2b/0x40 [mlx4_ib] [<ffffffff81522219>] register_netdevice_notifier+0x99/0x1e0 [<ffffffffa06626e3>] mlx4_ib_add+0x743/0xbc0 [mlx4_ib] [<ffffffffa05ec168>] mlx4_add_device+0x48/0xa0 [mlx4_core] [<ffffffffa05ec2c3>] mlx4_register_interface+0x73/0xb0 [mlx4_core] [<ffffffffa05c505e>] cm_req_handler+0x13e/0x460 [ib_cm] [<ffffffff810002e2>] do_one_initcall+0x112/0x1c0 [<ffffffff810e8264>] do_init_module+0x34/0x190 [<ffffffff810ea62f>] load_module+0x5cf/0x740 [<ffffffff810ea939>] SyS_init_module+0x99/0xd0 [<ffffffff815f8fd2>] system_call_fastpath+0x16/0x1b irq event stamp: 336142 hardirqs last enabled at (336142): [<ffffffff810612f5>] __local_bh_enable_ip+0xb5/0xc0 hardirqs last disabled at (336141): [<ffffffff81061296>] __local_bh_enable_ip+0x56/0xc0 softirqs last enabled at (336004): [<ffffffff8106123a>] _local_bh_enable+0x4a/0x50 softirqs last disabled at (336005): [<ffffffff810617a4>] irq_exit+0x44/0xd0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&iboe->lock)->rlock); <Interrupt> lock(&(&iboe->lock)->rlock); *** DEADLOCK *** The above problem was caused by the spin lock being taken both in the process context and in a soft-irq context (in a netdev notifier handler). The required fix is to use spin_lock/unlock_bh() instead of spin_lock/unlock on the iboe lock. Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | * | IB/mlx4: Get upper dev addresses as RoCE GIDs when port comes upMoni Shoua2014-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a RoCE port becomes active and the netdev of the port has upper device (e.g bond/team), GIDs derived from the upper dev should appear in the port's RoCE GID table. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | * | IB/mlx4: Reorder steps in RoCE GID table initializationMoni Shoua2014-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's no need to reset the gid table twice and we need to do it only for Ethernet ports. Also, no need to actively scan ndetdevs since it's being done immediatly after we register netdev notifiers. Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | * | IB/mlx4: Don't duplicate the default RoCE GIDMoni Shoua2014-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When reading the IPv6 addresses from the net-device, make sure to avoid adding a duplicate entry to the GID table because of equality between the default GID we generate and the default IPv6 link-local address of the device. Fixes: acc4fccf4eff ("IB/mlx4: Make sure GID index 0 is always occupied") Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | * | IB/mlx4: Avoid null pointer dereference in mlx4_ib_scan_netdevs()Moni Shoua2014-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When Ethernet netdev is not present for a port (e.g. when the link layer type of the port is InfiniBand) it's possible to dereference a null pointer when we do netdevice scanning. To fix that, we move a section of code that needs to run only when netdev is present to a proper if () statement. Fixes: ad4885d279b6 ("IB/mlx4: Build the port IBoE GID table properly under bonding") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | * | mlx4: Correct error flows in rereg_mrMatan Barak2014-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch addresses feedback from Sagi Grimberg on the rereg_mr implementation of mlx4. The following are fixed: 1. Set the correct pd_flags 2. Make sure we change the iova and size MR fields only after successful write and allocation of the MTTs. 3. Make the error checking more robust Fixes: e630664c8383 ("mlx4_core: Add helper functions to support MR re-registration") Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | * | IB/mlx4: Disable TSO for Connect-X rev. A0 HCAsMarkus Stockhausen2014-09-19
| | | | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | According to <http://marc.info/?t=138347640900004&r=1&w=2>, revision A0 of Connect-X does not correctly assemble TSO packets. Disable that feature on that hardware revision. Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | * | IB/iser: Bump version to 1.4.1Or Gerlitz2014-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | * | IB/iser: Allow bind only when connection state is UPSagi Grimberg2014-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to fail the bind operation if the iser connection state != UP (started teardown) and this should be done under the state lock. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | * | IB/iser: Fix RX/TX CQ resource leak on error flowRoi Dayan2014-09-22
| | | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When failing to allocate TX CQ we already allocated RX CQ, so we need to make sure we release it. Also, when failing to register notification to the RX CQ we currently leak both RX and TX CQs of the current index, fix that too. Signed-off-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | * / IPoIB: Remove unnecessary port queryAlex Estrin2014-09-19
| | | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are two queries for port attributes one after another. A second call is not needed since port_attr structure already holds the data. Reviewed-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Alex Estrin <alex.estrin@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | * / IB: ib_umem_release() should decrement mm->pinned_vm from ib_umem_getShawn Bohrer2014-09-19
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In debugging an application that receives -ENOMEM from ib_reg_mr(), I found that ib_umem_get() can fail because the pinned_vm count has wrapped causing it to always be larger than the lock limit even with RLIMIT_MEMLOCK set to RLIM_INFINITY. The wrapping of pinned_vm occurs because the process that calls ib_reg_mr() will have its mm->pinned_vm count incremented. Later a different process with a different mm_struct than the one that allocated the ib_umem struct ends up releasing it which results in decrementing the new processes mm->pinned_vm count past zero and wrapping. I'm not entirely sure what circumstances cause a different process to release the ib_umem than the one that allocated it but the kernel stack trace of the freeing process from my situation looks like the following: Call Trace: [<ffffffff814d64b1>] dump_stack+0x19/0x1b [<ffffffffa0b522a5>] ib_umem_release+0x1f5/0x200 [ib_core] [<ffffffffa0b90681>] mlx4_ib_destroy_qp+0x241/0x440 [mlx4_ib] [<ffffffffa0b4d93c>] ib_destroy_qp+0x12c/0x170 [ib_core] [<ffffffffa0cc7129>] ib_uverbs_close+0x259/0x4e0 [ib_uverbs] [<ffffffff81141cba>] __fput+0xba/0x240 [<ffffffff81141e4e>] ____fput+0xe/0x10 [<ffffffff81060894>] task_work_run+0xc4/0xe0 [<ffffffff810029e5>] do_notify_resume+0x95/0xa0 [<ffffffff814e3dd0>] int_signal+0x12/0x17 The following patch fixes the issue by storing the pid struct of the process that calls ib_umem_get() so that ib_umem_release and/or ib_umem_account() can properly decrement the pinned_vm count of the correct mm_struct. Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com> Reviewed-by: Shachar Raindel <raindel@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | Merge tag 'sound-3.17-rc7' of ↵Linus Torvalds2014-09-23
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "One fix is about a buggy computation in PCM API function Clemens spotted out, but the impact must be really small as no one really uses it in user-space side. The rest are a trivial fix for a HD-audio model and a USB-audio device-specific regression fix, so all look fairly safe to apply" * tag 'sound-3.17-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: snd-usb-caiaq: Fix LED commands for Kore controller ALSA: pcm: fix fifo_size frame calculation ALSA: hda - Add fixup model name lookup for Lemote A1205
| | * | ALSA: snd-usb-caiaq: Fix LED commands for Kore controllerDaniel Mack2014-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | KoreController and KoreController2 need an EP1_CMD_DIMM_LEDS command to set their LEDs, not EP1_CMD_WRITE_IO. Signed-off-by: Daniel Mack <daniel@zonque.org> Reported-and-tested-by: Brad Wilson <brad.wilson.00@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
| | * | ALSA: pcm: fix fifo_size frame calculationClemens Ladisch2014-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The calculated frame size was wrong because snd_pcm_format_physical_width() actually returns the number of bits, not bytes. Use snd_pcm_format_size() instead, which not only returns bytes, but also simplifies the calculation. Fixes: 8bea869c5e56 ("ALSA: PCM midlevel: improve fifo_size handling") Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
| | * | ALSA: hda - Add fixup model name lookup for Lemote A1205Huacai Chen2014-09-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lemote A1004 is already added in commit a2dd933d01f (ALSA: hda - Add fixup name lookup for CX5051 and 5066 codecs), but Lemote A1205 has missing. Signed-off-by: Huacai Chen <chenhc@lemote.com> Cc: <stable@vger.kernel.org> # 3.15+ Signed-off-by: Takashi Iwai <tiwai@suse.de>
| * | | Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds2014-09-23
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull final block fixes from Jens Axboe: "This week and last we've been fixing some corner cases related to blk-mq, mostly. I ended up pulling most of that out of for-linus yesterday, which is why the branch looks fresh. The rest were postponed for 3.18. This pull request contains: - Fix from Christoph, avoiding a stack overflow when FUA insertion would recursive infinitely. - Fix from David Hildenbrand on races between the timeout handler and uninitialized requests. Fixes a real issue that virtio_blk has run into. - A few fixes from me: - Ensure that request deadline/timeout is ordered before the request is marked as started. - A potential oops on out-of-memory, when we scale the queue depth of the device and retry. - A hang fix on requeue from SCSI, where the hardware queue would be stopped when we attempt to re-run it (and hence nothing would happen, stalling progress). - A fix for commit 2da78092, where the cleanup path was moved to RCU, but a debug might_sleep() was inadvertently left in the code. This causes warnings for people" * 'for-linus' of git://git.kernel.dk/linux-block: genhd: fix leftover might_sleep() in blk_free_devt() blk-mq: use blk_mq_start_hw_queues() when running requeue work blk-mq: fix potential oops on out-of-memory in __blk_mq_alloc_rq_maps() blk-mq: avoid infinite recursion with the FUA flag blk-mq: Avoid race condition with uninitialized requests blk-mq: request deadline must be visible before marking rq as started
| | * | | genhd: fix leftover might_sleep() in blk_free_devt()Jens Axboe2014-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 2da78092 changed the locking from a mutex to a spinlock, so we now longer sleep in this context. But there was a leftover might_sleep() in there, which now triggers since we do the final free from an RCU callback. Get rid of it. Reported-by: Pontus Fuchs <pontus.fuchs@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | | blk-mq: use blk_mq_start_hw_queues() when running requeue workJens Axboe2014-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When requests are retried due to hw or sw resource shortages, we often stop the associated hardware queue. So ensure that we restart the queues when running the requeue work, otherwise the queue run will be a no-op. Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | | blk-mq: fix potential oops on out-of-memory in __blk_mq_alloc_rq_maps()Jens Axboe2014-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __blk_mq_alloc_rq_maps() can be invoked multiple times, if we scale back the queue depth if we are low on memory. So don't clear set->tags when we fail, this is handled directly in the parent function, blk_mq_alloc_tag_set(). Reported-by: Robert Elliott <Elliott@hp.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | | blk-mq: avoid infinite recursion with the FUA flagChristoph Hellwig2014-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We should not insert requests into the flush state machine from blk_mq_insert_request. All incoming flush requests come through blk_{m,s}q_make_request and are handled there, while blk_execute_rq_nowait should only be called for BLOCK_PC requests. All other callers deal with requests that already went through the flush statemchine and shouldn't be reinserted into it. Reported-by: Robert Elliott <Elliott@hp.com> Debugged-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | | blk-mq: Avoid race condition with uninitialized requestsDavid Hildenbrand2014-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch should fix the bug reported in https://lkml.org/lkml/2014/9/11/249. We have to initialize at least the atomic_flags and the cmd_flags when allocating storage for the requests. Otherwise blk_mq_timeout_check() might dereference uninitialized pointers when racing with the creation of a request. Also move the reset of cmd_flags for the initializing code to the point where a request is freed. So we will never end up with pending flush request indicators that might trigger dereferences of invalid pointers in blk_mq_timeout_check(). Cc: stable@vger.kernel.org Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com> Reported-by: Paulo De Rezende Pinatti <ppinatti@linux.vnet.ibm.com> Tested-by: Paulo De Rezende Pinatti <ppinatti@linux.vnet.ibm.com> Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| | * | | blk-mq: request deadline must be visible before marking rq as startedJens Axboe2014-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we start the request, we set the deadline and flip the bits marking the request as started and non-complete. However, it's important that the deadline store is ordered before flipping the bits, otherwise we could have a small window where the request is marked started but with an invalid deadline. This can confuse the timeout handling. Suggested-by: Ming Lei <tom.leiming@gmail.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | Merge branch 'parisc-3.17-7' of ↵Linus Torvalds2014-09-23
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fixes from Helge Deller: "We avoid using -mfast-indirect-calls for 64bit kernel builds to prevent building an unbootable kernel due to latest gcc changes. In the pdc_stable/firmware-access driver we fix a few possible stack overflows and we now call secure_computing_strict() instead of secure_computing() which fixes upcoming SECCOMP patches in the for-next trees" * 'parisc-3.17-7' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: parisc: Only use -mfast-indirect-calls option for 32-bit kernel builds parisc: pdc_stable.c: Avoid potential stack overflows parisc: pdc_stable.c: Cleaning up unnecessary use of memset in conjunction with strncpy parisc: ptrace: use secure_computing_strict()
| | * | | | parisc: Only use -mfast-indirect-calls option for 32-bit kernel buildsJohn David Anglin2014-09-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In spite of what the GCC manual says, the -mfast-indirect-calls has never been supported in the 64-bit parisc compiler. Indirect calls have always been done using function descriptors irrespective of the -mfast-indirect-calls option. Recently, it was noticed that a function descriptor was always requested when the -mfast-indirect-calls option was specified. This caused problems when the option was used in application code and doesn't make any sense because the whole point of the option is to avoid using a function descriptor for indirect calls. Fixing this broke 64-bit kernel builds. I will fix GCC but for now we need the attached change. This results in the same kernel code as before. Signed-off-by: John David Anglin <dave.anglin@bell.net> Cc: stable@vger.kernel.org # v3.0+ Signed-off-by: Helge Deller <deller@gmx.de>
| | * | | | parisc: pdc_stable.c: Avoid potential stack overflowsHelge Deller2014-09-21
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Helge Deller <deller@gmx.de>
| | * | | | parisc: pdc_stable.c: Cleaning up unnecessary use of memset in conjunction ↵Rickard Strandqvist2014-09-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | with strncpy Using memset before strncpy just to ensure a trailing null character is an unnecessary double writing of a string Patch modified by Helge Deller to additionally reduce stack usage. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Signed-off-by: Helge Deller <deller@gmx.de>
| | * | | | parisc: ptrace: use secure_computing_strict()Helge Deller2014-09-21
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Helge Deller <deller@gmx.de>
| * | | | | Merge tag 'please-pull-defconfig' of ↵Linus Torvalds2014-09-23
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux Pull ia64 defconfig update from Tony Luck: "Need to rebuild defconfig files to cope with removal of "select NET" in drivers/scsi/Kconfig" * tag 'please-pull-defconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux: [IA64] refresh arch/ia64/configs/* using "make savedefconfig"
| | * | | | | [IA64] refresh arch/ia64/configs/* using "make savedefconfig"Tony Luck2014-09-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Prompted by a change to drivers/scsi/Kconfig which used to do a "select NET" but now does a "depends on NET". This meant that some configurations ended up without CONFIG_NET=y Signed-off-by Tony Luck <tony.luck@intel.com>
| * | | | | | Merge tag 'hwmon-for-linus' of ↵Linus Torvalds2014-09-23
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: - Fix a resource leak in tmp103 driver - Add support for two more processors to fam15h_power driver - Also fix a bug in the same driver to only report the power level on chips which actually support reporting it * tag 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (tmp103) Fix resource leak bug in tmp103 temperature sensor driver hwmon: (fam15h_power) Add support for two more processors hwmon: (fam15h_power) Make actual power reporting conditional
| | * | | | | | hwmon: (tmp103) Fix resource leak bug in tmp103 temperature sensor driversundarjdev2014-09-22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tmp103 temperature sensor driver registers with the hwmon framework by calling hwmon_device_register_with_groups but does not have a .remove method to call hwmon_device_unregister to unregister from the framework when the device is no longer needed. Fix this by calling devm_hwmon_device_register_with_groups. Signed-off-by: Sundar J Dev <sundarjayakumardev@gmail.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net>