aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband
Commit message (Collapse)AuthorAge
*-. Merge branches 'cma', 'mlx4' and 'qib' into for-nextRoland Dreier2011-12-19
|\ \
| | * IB/qib: Correct sense on freectxts increment and decrementMike Marciniszyn2011-12-19
| | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 53ab1c64983 ("IB/qib: Correct nfreectxts for multiple HCAs") reversed the increments and decrements of dd->nfreectxts. Fix it. Reviewed-by: Ram Vepa <ram.vepa@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | IB/mlx4: Fix shutdown crash accessing a non-existent bitmapRoland Dreier2011-12-06
| |/ | | | | | | | | | | | | | | | | | | | | | | Commit cfcde11c3d7a ("IB/mlx4: Use flow counters on IBoE ports") added code that sets elements of counters[] to -1 if no counter is allocated, but then goes ahead and passes every entry to mlx4_counter_free() on shutdown. This is a bad idea, especially if MLX4_DEV_CAP_FLAG_COUNTERS isn't set so there isn't even an underlying bitmap to free from. Tested-by: Sean Hefty <sean.hefty@intel.com> Cc: <stable@kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
* / RDMA/cma: Verify private data lengthSean Hefty2011-12-19
|/ | | | | | | | | | | | | private_data_len is defined as a u8. If the user specifies a large private_data size (> 220 bytes), we will calculate a total length that exceeds 255, resulting in private_data_len wrapping back to 0. This can lead to overwriting random kernel memory. Avoid this by verifying that the resulting size fits into a u8. Reported-by: B. Thery <benjamin.thery@bull.net> Addresses: <http://bugs.openfabrics.org/bugzilla/show_bug.cgi?id=2335> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
*---. Merge branches 'cxgb4', 'ipoib', 'misc' and 'qib' into for-nextRoland Dreier2011-11-29
|\ \ \
| | | * IB/qib: Fix over-scheduling of QSFP workMike Marciniszyn2011-11-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't over-schedule QSFP work on driver initialization. It could end up being run simultaneously on two different CPUs resulting in bad EEPROM reads. In combination with setting the physical IB link state prior to the IBC being brought out of reset, this can cause the link state machine to start training early with wrong settings. Signed-off-by: Mitko Haralanov <mitko@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | * IB/qib: Don't use schedule_work()Mike Marciniszyn2011-11-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It was mistakenly introduced by dde05cbdf8b1 ("IB/qib: Hold links until tuning data is available"). Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | * | IB: Fix RCU lockdep splatsEric Dumazet2011-11-29
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit f2c31e32b37 ("net: fix NULL dereferences in check_peer_redir()") forgot to take care of infiniband uses of dst neighbours. Many thanks to Marc Aurele who provided a nice bug report and feedback. Reported-by: Marc Aurele La France <tsi@ualberta.ca> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: <stable@kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * / IB/ipoib: Prevent hung task or softlockup processing multicast responseMike Marciniszyn2011-11-29
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This following can occur with ipoib when processing a multicast reponse: BUG: soft lockup - CPU#0 stuck for 67s! [ib_mad1:982] Modules linked in: ... CPU 0: Modules linked in: ... Pid: 982, comm: ib_mad1 Not tainted 2.6.32-131.0.15.el6.x86_64 #1 ProLiant DL160 G5 RIP: 0010:[<ffffffff814ddb27>] [<ffffffff814ddb27>] _spin_unlock_irqrestore+0x17/0x20 RSP: 0018:ffff8802119ed860 EFLAGS: 00000246 0000000000000004 RBX: ffff8802119ed860 RCX: 000000000000a299 RDX: ffff88021086c700 RSI: 0000000000000246 RDI: 0000000000000246 RBP: ffffffff8100bc8e R08: ffff880210ac229c R09: 0000000000000000 R10: ffff88021278aab8 R11: 0000000000000000 R12: ffff8802119ed860 R13: ffffffff8100be6e R14: 0000000000000001 R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00000000006d4840 CR3: 0000000209aa5000 CR4: 00000000000406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: [<ffffffffa032c247>] ? ipoib_mcast_send+0x157/0x480 [ib_ipoib] [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20 [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20 [<ffffffffa03283d4>] ? ipoib_path_lookup+0x124/0x2d0 [ib_ipoib] [<ffffffffa03286fc>] ? ipoib_start_xmit+0x17c/0x430 [ib_ipoib] [<ffffffff8141e758>] ? dev_hard_start_xmit+0x2c8/0x3f0 [<ffffffff81439d0a>] ? sch_direct_xmit+0x15a/0x1c0 [<ffffffff81423098>] ? dev_queue_xmit+0x388/0x4d0 [<ffffffffa032d6b7>] ? ipoib_mcast_join_finish+0x2c7/0x510 [ib_ipoib] [<ffffffffa032dab8>] ? ipoib_mcast_sendonly_join_complete+0x1b8/0x1f0 [ib_ipoib] [<ffffffffa02a0946>] ? mcast_work_handler+0x1a6/0x710 [ib_sa] [<ffffffffa015f01e>] ? ib_send_mad+0xfe/0x3c0 [ib_mad] [<ffffffffa00f6c93>] ? ib_get_cached_lmc+0xa3/0xb0 [ib_core] [<ffffffffa02a0f9b>] ? join_handler+0xeb/0x200 [ib_sa] [<ffffffffa029e4fc>] ? ib_sa_mcmember_rec_callback+0x5c/0xa0 [ib_sa] [<ffffffffa029e79c>] ? recv_handler+0x3c/0x70 [ib_sa] [<ffffffffa01603a4>] ? ib_mad_completion_handler+0x844/0x9d0 [ib_mad] [<ffffffffa015fb60>] ? ib_mad_completion_handler+0x0/0x9d0 [ib_mad] [<ffffffff81088830>] ? worker_thread+0x170/0x2a0 [<ffffffff8108e160>] ? autoremove_wake_function+0x0/0x40 [<ffffffff810886c0>] ? worker_thread+0x0/0x2a0 [<ffffffff8108ddf6>] ? kthread+0x96/0xa0 [<ffffffff8100c1ca>] ? child_rip+0xa/0x20 Coinciding with stack trace is the following message: ib0: ib_address_create failed The code below in ipoib_mcast_join_finish() will note the above failure in the address handle but otherwise continue: ah = ipoib_create_ah(dev, priv->pd, &av); if (!ah) { ipoib_warn(priv, "ib_address_create failed\n"); } else { The while loop at the bottom of ipoib_mcast_join_finish() will attempt to send queued multicast packets in mcast->pkt_queue and eventually end up in ipoib_mcast_send(): if (!mcast->ah) { if (skb_queue_len(&mcast->pkt_queue) < IPOIB_MAX_MCAST_QUEUE) skb_queue_tail(&mcast->pkt_queue, skb); else { ++dev->stats.tx_dropped; dev_kfree_skb_any(skb); } My read is that the code will requeue the packet and return to the ipoib_mcast_join_finish() while loop and the stage is set for the "hung" task diagnostic as the while loop never sees a non-NULL ah, and will do nothing to resolve. There are GFP_ATOMIC allocates in the provider routines, so this is possible and should be dealt with. The test that induced the failure is associated with a host SM on the same server during a shutdown. This patch causes ipoib_mcast_join_finish() to exit with an error which will flush the queued mcast packets. Nothing is done to unwind the QP attached state so that subsequent sends from above will retry the join. Reviewed-by: Ram Vepa <ram.vepa@qlogic.com> Reviewed-by: Gary Leshner <gary.leshner@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* | RDMA/cxgb4: Fix retry with MPAv1 logic for MPAv2Kumar Sanghvi2011-11-28
| | | | | | | | | | | | | | | | | | Fix logic so that we don't retry with MPAv1 once we have done that already. Otherwise, we end up retrying with MPAv1 even when its not needed on getting peer aborts - and this could lead to kernel panic. Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* | RDMA/cxgb4: Fix iw_cxgb4 count_rcqes() logicJonathan Lallinger2011-11-28
|/ | | | | | | | | | Fix another place in the code where logic dealing with the t4_cqe was using the wrong QID. This fixes the counting logic so that it tests against the SQ QID instead of the RQ QID when counting RCQES. Signed-off by: Jonathan Lallinger <jonathan@ogc.us> Signed-off by: Steve Wise <swise@ogc.us> Signed-off-by: Roland Dreier <roland@purestorage.com>
* Merge branch 'modsplit-Oct31_2011' of ↵Linus Torvalds2011-11-06
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits) Revert "tracing: Include module.h in define_trace.h" irq: don't put module.h into irq.h for tracking irqgen modules. bluetooth: macroize two small inlines to avoid module.h ip_vs.h: fix implicit use of module_get/module_put from module.h nf_conntrack.h: fix up fallout from implicit moduleparam.h presence include: replace linux/module.h with "struct module" wherever possible include: convert various register fcns to macros to avoid include chaining crypto.h: remove unused crypto_tfm_alg_modname() inline uwb.h: fix implicit use of asm/page.h for PAGE_SIZE pm_runtime.h: explicitly requires notifier.h linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h miscdevice.h: fix up implicit use of lists and types stop_machine.h: fix implicit use of smp.h for smp_processor_id of: fix implicit use of errno.h in include/linux/of.h of_platform.h: delete needless include <linux/module.h> acpi: remove module.h include from platform/aclinux.h miscdevice.h: delete unnecessary inclusion of module.h device_cgroup.h: delete needless include <linux/module.h> net: sch_generic remove redundant use of <linux/module.h> net: inet_timewait_sock doesnt need <linux/module.h> ... Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in - drivers/media/dvb/frontends/dibx000_common.c - drivers/media/video/{mt9m111.c,ov6650.c} - drivers/mfd/ab3550-core.c - include/linux/dmaengine.h
| * infiniband: add moduleparam.h to drivers/infiniband as requiredPaul Gortmaker2011-10-31
| | | | | | | | | | | | | | | | These files were getting the moduleparam infrastructure from the implicit presence of module.h being everywhere, but that is going away soon. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
| * infiniband: add in export.h for files using EXPORT_SYMBOL/THIS_MODULEPaul Gortmaker2011-10-31
| | | | | | | | | | | | | | | | | | These were getting it implicitly via device.h --> module.h but we are going to stop that when we clean up the headers. Fix these in advance so the tree remains biscect-clean. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
| * infiniband: Fix up module files that need to include module.hPaul Gortmaker2011-10-31
| | | | | | | | | | | | | | | | They had been getting it implicitly via device.h but we can't rely on that for the future, due to a pending cleanup so fix it now. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
| * infiniband: Fix up users implicitly relying on getting stat.hPaul Gortmaker2011-10-31
| | | | | | | | | | | | | | | | | | | | | | | | | | They get it via module.h (via device.h) but we want to clean that up. When we do, we'll get things like: CC [M] drivers/infiniband/core/sysfs.o sysfs.c:361: error: 'S_IRUGO' undeclared here (not in a function) sysfs.c:654: error: 'S_IWUSR' undeclared here (not in a function) so add in the stat header it is using explicitly in advance. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
| |
| \
*-. \ Merge branches 'iser', 'mthca' and 'qib' into for-nextRoland Dreier2011-11-04
|\ \ \
| | * | IB/qib: Fix panic in RC error flushing logicMike Marciniszyn2011-11-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following panic can occur when flushing a QP: RIP: 0010:[<ffffffffa0168e8b>] [<ffffffffa0168e8b>] qib_send_complete+0x3b/0x190 [ib_qib] RSP: 0018:ffff8803cdc6fc90 EFLAGS: 00010046 RAX: 0000000000000000 RBX: ffff8803d84ba000 RCX: 0000000000000000 RDX: 0000000000000005 RSI: ffffc90015a53430 RDI: ffff8803d84ba000 RBP: ffff8803cdc6fce0 R08: ffff8803cdc6fc90 R09: 0000000000000001 R10: 00000000ffffffff R11: 0000000000000000 R12: ffff8803d84ba0c0 R13: ffff8803d84ba5cc R14: 0000000000000800 R15: 0000000000000246 FS: 0000000000000000(0000) GS:ffff880036600000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 0000000000000034 CR3: 00000003e44f9000 CR4: 00000000000406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process qib/0 (pid: 1350, threadinfo ffff8803cdc6e000, task ffff88042728a100) Stack: 53544c5553455201 0000000100000005 0000000000000000 ffff8803d84ba000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000001 ffff8803cdc6fd30 ffffffffa0165d7a Call Trace: [<ffffffffa0165d7a>] qib_make_rc_req+0x36a/0xe80 [ib_qib] [<ffffffffa0165a10>] ? qib_make_rc_req+0x0/0xe80 [ib_qib] [<ffffffffa01698b3>] qib_do_send+0xf3/0xb60 [ib_qib] [<ffffffff814db757>] ? thread_return+0x4e/0x777 [<ffffffffa01697c0>] ? qib_do_send+0x0/0xb60 [ib_qib] [<ffffffff81088bf0>] worker_thread+0x170/0x2a0 [<ffffffff8108e530>] ? autoremove_wake_function+0x0/0x40 [<ffffffff81088a80>] ? worker_thread+0x0/0x2a0 [<ffffffff8108e1c6>] kthread+0x96/0xa0 [<ffffffff8100c1ca>] child_rip+0xa/0x20 [<ffffffff8108e130>] ? kthread+0x0/0xa0 [<ffffffff8100c1c0>] ? child_rip+0x0/0x20 RIP [<ffffffffa0168e8b>] qib_send_complete+0x3b/0x190 [ib_qib] The RC error state flush logic in qib_make_rc_req() could return all of the acked wqes and potentially have emptied the queue. It would then unconditionally try return a flush completion via qib_send_complete() for an invalid wqe, or worse a valid one that is not queued. The panic results when the completion code tries to maintain an MR reference count for a NULL MR. This fix modifies logic to only send one completion per qib_make_rc_req() call and changing the completion status from IB_WC_SUCCESS to IB_WC_WR_FLUSH_ERR as the completions progress. The outer loop will call as many times as necessary to flush the queue. Reviewed-by: Ram Vepa <ram.vepa@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * | | IB/mthca: Fix buddy->num_free allocation sizeRoland Dreier2011-11-03
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The num_free field of mthca_buddy has a type of array of unsigned int while it was allocated as an array of pointers. On 64-bit platforms this allocates twice more than required. Fix this by allocating the correct size for the type. This is the same bug just fixed in mlx4 by Eli Cohen <eli@mellanox.co.il>. Signed-off-by: Roland Dreier <roland@purestorage.com>
* | | IB/iser: DMA unmap TX bufs used for iSCSI/iSER headersOr Gerlitz2011-11-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | The current driver never does DMA unmapping on these buffers. Fix that by adding DMA unmapping to the task cleanup callback, and DMA mapping to the task init function (drop the headers_initialized micro-optimization). Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* | | IB/iser: Use separate buffers for the login request/responseOr Gerlitz2011-11-04
|/ / | | | | | | | | | | | | | | | | | | | | | | The driver counted on the transactional nature of iSCSI login/text flows and used the same buffer for both the request and the response. We also went further and did DMA mapping only once, with DMA_FROM_DEVICE, which violates the DMA mapping API. Fix that by using different buffers, one for requests and one for responses, and use the correct DMA mapping direction for each. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* | Merge branch 'for-linus' of ↵Linus Torvalds2011-11-01
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (62 commits) mlx4_core: Deprecate log_num_vlan module param IB/mlx4: Don't set VLAN in IBoE WQEs' control segment IB/mlx4: Enable 4K mtu for IBoE RDMA/cxgb4: Mark QP in error before disabling the queue in firmware RDMA/cxgb4: Serialize calls to CQ's comp_handler RDMA/cxgb3: Serialize calls to CQ's comp_handler IB/qib: Fix issue with link states and QSFP cables IB/mlx4: Configure extended active speeds mlx4_core: Add extended port capabilities support IB/qib: Hold links until tuning data is available IB/qib: Clean up checkpatch issue IB/qib: Remove s_lock around header validation IB/qib: Precompute timeout jiffies to optimize latency IB/qib: Use RCU for qpn lookup IB/qib: Eliminate divide/mod in converting idx to egr buf pointer IB/qib: Decode path MTU optimization IB/qib: Optimize RC/UC code by IB operation IPoIB: Use the right function to do DMA unmap pages RDMA/cxgb4: Use correct QID in insert_recv_cqe() RDMA/cxgb4: Make sure flush CQ entries are collected on connection close ...
| | \
| | \
| | \
| | \
| | \
| | \
| | \
| | \
| | \
| | \
| | \
| | \
| | \
| | \
| | \
| | \
| | \
| | \
| | \
| | \
| *-------------------. \ Merge branches 'amso1100', 'cma', 'cxgb3', 'cxgb4', 'fdr', 'ipath', 'ipoib', ↵Roland Dreier2011-11-01
| |\ \ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 'misc', 'mlx4', 'misc', 'nes', 'qib' and 'xrc' into for-next
| | | | | | | | | | | | * | RDMA/uverbs: Export ib_open_qp() capability to user spaceSean Hefty2011-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow processes that share the same XRC domain to open an existing shareable QP. This permits those processes to receive events on the shared QP and transfer ownership, so that any process may modify the QP. The latter allows the creating process to exit, while a remaining process can still transition it for path migration purposes. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | | | | | * | RDMA/core: Export ib_open_qp() to share XRC TGT QPsSean Hefty2011-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | XRC TGT QPs are shared resources among multiple processes. Since the creating process may exit, allow other processes which share the same XRC domain to open an existing QP. This allows us to transfer ownership of an XRC TGT QP to another process. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | | | | | * | IB/mlx4: Add support for XRC QPsSean Hefty2011-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support the creation of XRC INI and TGT QPs. To handle the case where a CQ or PD is not provided, we allocate them internally with the xrcd. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | | | | | * | IB/mlx4: Add support for XRC SRQsSean Hefty2011-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow the user to create XRC SRQs. This patch is based on a patch from Jack Morgenstrein <jackm@dev.mellanox.co.il>. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | | | | | * | IB/mlx4: Add support for XRC domainsSean Hefty2011-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support creating and destroying XRC domains. Any sharing of the XRCD is managed above the low-level driver. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | | | | | * | IB/cm: Do not automatically disconnect XRC TGT QPsSean Hefty2011-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because an XRC TGT QP can end up being shared among multiple processes, don't have the ib_cm automatically send a DREQ when the userspace process that owns the ib_cm_id exits. Disconnect can be initiated by the user directly; otherwise, the owner of the XRC INI QP controls the connection. Note that as a result of the process exiting, the ib_cm will stop tracking the XRC connection on the target side. For the purposes of disconnecting, this isn't a big deal. The ib_cm will respond to the DREQ appropriately. For other messages, mainly LAP, the CM will reject the request, since there's no one available to route the request to. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | | | | | * | RDMA/cma: Support XRC QPsSean Hefty2011-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow users to connect XRC QPs through the rdma_cm. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | | | | | * | RDMA/ucm: Allow user to specify QP type when creating idSean Hefty2011-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow the user to indicate the QP type separately from the port space when allocating an rdma_cm_id. With RDMA_PS_IB, there is no longer a 1:1 relationship between the QP type and port space, so we need to switch on the QP type to select between UD and connected QPs. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | | | | | * | RDMA/cm: Define new RDMA port space specific to IBSean Hefty2011-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add RDMA_PS_IB. XRC QP types will use the IB port space when operating over the RDMA CM. For the 'IP protocol' field value, we select 0x3F, which is listed as being for 'any local network'. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | | | | | * | IB/cm: Update XRC support based on XRC annex errataSean Hefty2011-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The XRC annex was updated to have XRC behave more like RD. Specifically, the XRC TGT QPN moves from the local QPN to local EECN field. Lookup of SRQN is done using the REQ/REP protocol. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | | | | | * | IB/cm: Update protocol to support XRCSean Hefty2011-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update the REQ and REP messages to support XRC connection setup according to the XRC Annex. Several existing fields must be set to 0 or 1 when connecting XRC QPs, and a reserved field is changed to an extended transport type. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | | | | | * | RDMA/uverbs: Export XRC TGT QPs to user spaceSean Hefty2011-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow user space to operate on XRC TGT QPs the same way as other types of QPs, with one notable exception: since XRC TGT QPs may be shared among multiple processes, the XRC TGT QP is allowed to exist beyond the lifetime of the creating process. The process that creates the QP is allowed to destroy it, but if the process exits without destroying the QP, then the QP will be left bound to the lifetime of the XRCD. TGT QPs are not associated with CQs or a PD. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | | | | | * | RDMA/uverbs: Export XRC INI QPs to userspaceSean Hefty2011-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | XRC INI QPs are similar to send only RC QPs. Allow user space to create INI QPs. Note that INI QPs do not require receive CQs. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | | | | | * | RDMA/uverbs: Export XRC SRQs to user spaceSean Hefty2011-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We require additional information to create XRC SRQs than we can exchange using the existing create SRQ ABI. Provide an enhanced create ABI for extended SRQ types. Based on patches by Jack Morgenstein <jackm@dev.mellanox.co.il> and Roland Dreier <roland@purestorage.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | | | | | * | RDMA/uverbs: Export XRC domains to user spaceSean Hefty2011-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow user space to create XRC domains. Because XRCDs are expected to be shared among multiple processes, we use inodes to identify an XRCD. Based on patches by Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | | | | | * | RDMA/verbs: Cleanup XRC TGT QPs when destroying XRCDSean Hefty2011-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | XRC TGT QPs are intended to be shared among multiple users and processes. Allow the destruction of an XRC TGT QP to be done explicitly through ib_destroy_qp() or when the XRCD is destroyed. To support destroying an XRC TGT QP, we need to track TGT QPs with the XRCD. When the XRCD is destroyed, all tracked XRC TGT QPs are also cleaned up. To avoid stale reference issues, if a user is holding a reference on a TGT QP, we increment a reference count on the QP. The user releases the reference by calling ib_release_qp. This releases any access to the QP from a user above verbs, but allows the QP to continue to exist until destroyed by the XRCD. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | | | | | * | RDMA/core: Add XRC QPsSean Hefty2011-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | XRC ("eXtended reliable connected") is an IB transport that provides better scalability by allowing senders to specify which shared receive queue (SRQ) should be used to receive a message, which essentially allows one transport context (QP connection) to serve multiple destinations (as long as they share an adapter, of course). XRC communication is between an initiator (INI) QP and a target (TGT) QP. Target QPs are associated with SRQs through an XRCD. An XRC TGT QP behaves like a receive-only RD QP. XRC INI QPs behave similarly to RC QPs, except that work requests posted to an XRC INI QP must specify the remote SRQ that is the target of the work request. We define two new QP types for XRC, to distinguish between INI and TGT QPs, and update the core layer to support XRC QPs. This patch is derived from work by Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | | | | | * | RDMA/core: Add XRC SRQ typeSean Hefty2011-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | XRC ("eXtended reliable connected") is an IB transport that provides better scalability by allowing senders to specify which shared receive queue (SRQ) should be used to receive a message, which essentially allows one transport context (QP connection) to serve multiple destinations (as long as they share an adapter, of course). XRC defines SRQs that are specifically used by XRC connections. Expand the SRQ code to support XRC SRQs. An XRC SRQ is currently restricted to only XRC use according to the IB XRC Annex. Portions of this patch were derived from work by Jack Morgenstein <jackm@dev.mellanox.co.il>. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | | | | | * | RDMA/core: Add SRQ type fieldSean Hefty2011-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, there is only a single ("basic") type of SRQ, but with XRC support we will add a second. Prepare for this by defining an SRQ type and setting all current users to IB_SRQT_BASIC. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | | | | | * | RDMA/core: Add XRC domain supportSean Hefty2011-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | XRC ("eXtended reliable connected") is an IB transport that provides better scalability by allowing senders to specify which shared receive queue (SRQ) should be used to receive a message, which essentially allows one transport context (QP connection) to serve multiple destinations (as long as they share an adapter, of course). A few new concepts are introduced to support this. This patch adds: - A new device capability flag, IB_DEVICE_XRC, which low-level drivers set to indicate that a device supports XRC. - A new object type, XRC domains (struct ib_xrcd), and new verbs ib_alloc_xrcd()/ib_dealloc_xrcd(). XRCDs are used to limit which XRC SRQs an incoming message can target. This patch is derived from work by Jack Morgenstein <jackm@dev.mellanox.co.il>. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | | | | * | | IB/qib: Fix issue with link states and QSFP cablesMitko Haralanov2011-10-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix an issue where the link would come up after replugging a cable even if it has been DISABLED manually. Signed-off-by: Mitko Haralanov <mitko@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | | | | * | | IB/qib: Hold links until tuning data is availableMitko Haralanov2011-10-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hold the link state machine until the tuning data is read from the QSFP EEPROM so correct tuning settings are applied before the state machine attempts to bring the link up. Link is also held on cable unplug in case a different cable is used. Signed-off-by: Mitko Haralanov <mitko@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | | | | * | | IB/qib: Clean up checkpatch issueMike Marciniszyn2011-10-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This was probably present from initial submission. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | | | | * | | IB/qib: Remove s_lock around header validationMike Marciniszyn2011-10-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Review of qib_ruc_check_hdr() shows that the s_lock is not required in the normal case. The r_lock is held in all cases, and protects the qp fields that are read. The s_lock will be needed to around the call to qib_migrate_qp() to insure that the send engine sees a consistent set of fields. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | | | | * | | IB/qib: Precompute timeout jiffies to optimize latencyMike Marciniszyn2011-10-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A new field is added to qib_qp called timeout_jiffies. It is initialized upon create and modify. The field is now used instead of a computation based on qp->timeout. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | | | | * | | IB/qib: Use RCU for qpn lookupMike Marciniszyn2011-10-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The heavy weight spinlock in qib_lookup_qpn() is replaced with RCU. The hash list itself is now accessed via jhash functions instead of mod. The changes should benefit multiple receive contexts in different processors by not contending for the lock just to read the hash structures. The patch also adds a lookaside_qp (pointer) and a lookaside_qpn in the context. The interrupt handler will test the current packet's qpn against lookaside_qpn if the lookaside_qp pointer is non-NULL. The pointer is NULL'ed when the interrupt handler exits. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | | | | | | | | | * | | IB/qib: Eliminate divide/mod in converting idx to egr buf pointerMike Marciniszyn2011-10-21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The context init now saves a shift from rcvegrbufs_perchunk rcvegrbufs_perchunk_shift using ilog2. A BUG_ON() protects the power of 2 assumption. Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>