aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/hw/cxgb4/cq.c
Commit message (Collapse)AuthorAge
* iw_cxgb4: invalidate the mr when posting a read_w_inv wrSteve Wise2016-11-16
| | | | | | | | | Also, rearrange things a bit to have a common c4iw_invalidate_mr() function used everywhere that we need to invalidate. Fixes: 49b53a93a64a ("iw_cxgb4: add fast-path for small REG_MR operations") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* Merge tag 'for-linus' of ↵Linus Torvalds2016-10-09
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull main rdma updates from Doug Ledford: "This is the main pull request for the rdma stack this release. The code has been through 0day and I had it tagged for linux-next testing for a couple days. Summary: - updates to mlx5 - updates to mlx4 (two conflicts, both minor and easily resolved) - updates to iw_cxgb4 (one conflict, not so obvious to resolve, proper resolution is to keep the code in cxgb4_main.c as it is in Linus' tree as attach_uld was refactored and moved into cxgb4_uld.c) - improvements to uAPI (moved vendor specific API elements to uAPI area) - add hns-roce driver and hns and hns-roce ACPI reset support - conversion of all rdma code away from deprecated create_singlethread_workqueue - security improvement: remove unsafe ib_get_dma_mr (breaks lustre in staging)" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (75 commits) staging/lustre: Disable InfiniBand support iw_cxgb4: add fast-path for small REG_MR operations cxgb4: advertise support for FR_NSMR_TPTE_WR IB/core: correctly handle rdma_rw_init_mrs() failure IB/srp: Fix infinite loop when FMR sg[0].offset != 0 IB/srp: Remove an unused argument IB/core: Improve ib_map_mr_sg() documentation IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets IB/mthca: Move user vendor structures IB/nes: Move user vendor structures IB/ocrdma: Move user vendor structures IB/mlx4: Move user vendor structures IB/cxgb4: Move user vendor structures IB/cxgb3: Move user vendor structures IB/mlx5: Move and decouple user vendor structures IB/{core,hw}: Add constant for node_desc ipoib: Make ipoib_warn ratelimited IB/mlx4/alias_GUID: Remove deprecated create_singlethread_workqueue IB/ipoib_verbs: Remove deprecated create_singlethread_workqueue IB/ipoib: Remove deprecated create_singlethread_workqueue ...
| * iw_cxgb4: add fast-path for small REG_MR operationsSteve Wise2016-10-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When processing a REG_MR work request, if fw supports the FW_RI_NSMR_TPTE_WR work request, and if the page list for this registration is <= 2 pages, and the current state of the mr is INVALID, then use FW_RI_NSMR_TPTE_WR to pass down a fully populated TPTE for FW to write. This avoids FW having to do an async read of the TPTE blocking the SQ until the read completes. To know if the current MR state is INVALID or not, iw_cxgb4 must track the state of each fastreg MR. The c4iw_mr struct state is updated as REG_MR and LOCAL_INV WRs are posted and completed, when a reg_mr is destroyed, and when RECV completions are processed that include a local invalidation. This optimization increases small IO IOPS for both iSER and NVMF. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | iw_cxgb4: Fix cxgb4 arm CQ logic w/IB_CQ_REPORT_MISSED_EVENTSBharat Potnuri2016-08-23
|/ | | | | | | | | | | | | | | Current cxgb4 arm CQ logic ignores IB_CQ_REPORT_MISSED_EVENTS for request completion notification on a CQ. Due to this ib_poll_handler() assumes all events polled and avoids further iopoll scheduling. This patch adds logic to cxgb4 ib_req_notify_cq() handler to check if CQ is not empty and return accordingly. Based on the return value of ib_req_notify_cq() handler, ib_poll_handler() will schedule a run of iopoll handler. Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* RDMA/iw_cxgb4: Low resource fixes for Completion queueHariprasad S2016-06-23
| | | | | | | | | | Pre-allocate buffers to deallocate completion queue, so that completion queue is deallocated during RDMA termination when system is running out of memory. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* RDMA/iw_cxgb4: Fix bar2 virt addr calculation for T4 chipsHariprasad S2016-04-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For T4, kernel mode qps don't use the user doorbell. User mode qps during flow control db ringing are forced into kernel, where user doorbell is treated as kernel doorbell and proper bar2 offset in bar2 virtual space is calculated, which incase of T4 is a bogus address, causing a kernel panic due to illegal write during doorbell ringing. In case of T4, kernel mode qp bar2 virtual address should be 0. Added T4 check during bar2 virtual address calculation to return 0. Fixed Bar2 range checks based on bar2 physical address. The below oops will be fixed <1>BUG: unable to handle kernel paging request at 000000000002aa08 <1>IP: [<ffffffffa011d800>] c4iw_uld_control+0x4e0/0x880 [iw_cxgb4] <4>PGD 1416a8067 PUD 15bf35067 PMD 0 <4>Oops: 0002 [#1] SMP <4>last sysfs file: /sys/devices/pci0000:00/0000:00:03.0/0000:02:00.4/infiniband/cxgb4_0/node_guid <4>CPU 5 <4>Modules linked in: rdma_ucm rdma_cm ib_cm ib_sa ib_mad ib_uverbs ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge autofs4 target_core_iblock target_core_file target_core_pscsi target_core_mod configfs bnx2fc cnic uio fcoe libfcoe libfc scsi_transport_fc scsi_tgt 8021q garp stp llc cpufreq_ondemand acpi_cpufreq freq_table mperf vhost_net macvtap macvlan tun kvm uinput microcode iTCO_wdt iTCO_vendor_support sg joydev serio_raw i2c_i801 i2c_core lpc_ich mfd_core e1000e ptp pps_core ioatdma dca i7core_edac edac_core shpchp ext3 jbd mbcache sd_mod crc_t10dif pata_acpi ata_generic ata_piix iw_cxgb4 iw_cm ib_core ib_addr cxgb4 ipv6 dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan] <4> Supermicro X8ST3/X8ST3 <4>RIP: 0010:[<ffffffffa011d800>] [<ffffffffa011d800>] c4iw_uld_control+0x4e0/0x880 [iw_cxgb4] <4>RSP: 0000:ffff880155a03db0 EFLAGS: 00010006 <4>RAX: 000000000000001d RBX: ffff88013ae5fc00 RCX: ffff880155adb180 <4>RDX: 000000000002aa00 RSI: 0000000000000001 RDI: ffff88013ae5fdf8 <4>RBP: ffff880155a03e10 R08: 0000000000000000 R09: 0000000000000001 <4>R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 <4>R13: 000000000000001d R14: ffff880156414ab0 R15: ffffe8ffffc05b88 <4>FS: 0000000000000000(0000) GS:ffff8800282a0000(0000) knlGS:0000000000000000 <4>CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b <4>CR2: 000000000002aa08 CR3: 000000015bd0e000 CR4: 00000000000007e0 <4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 <4>DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 <4>Process cxgb4 (pid: 394, threadinfo ffff880155a00000, task ffff880156414ab0) <4>Stack: <4> ffff880156415068 ffff880155adb180 ffff880155a03df0 ffffffffa00a344b <4><d> 00000000000003e8 ffff880155920000 0000000000000004 ffff880155920000 <4><d> ffff88015592d438 ffffffffa00a3860 ffff880155a03fd8 ffffe8ffffc05b88 <4>Call Trace: <4> [<ffffffffa00a344b>] ? enable_txq_db+0x2b/0x80 [cxgb4] <4> [<ffffffffa00a3860>] ? process_db_full+0x0/0xa0 [cxgb4] <4> [<ffffffffa00a38a6>] process_db_full+0x46/0xa0 [cxgb4] <4> [<ffffffff8109fda0>] worker_thread+0x170/0x2a0 <4> [<ffffffff810a6aa0>] ? autoremove_wake_function+0x0/0x40 <4> [<ffffffff8109fc30>] ? worker_thread+0x0/0x2a0 <4> [<ffffffff810a660e>] kthread+0x9e/0xc0 <4> [<ffffffff8100c28a>] child_rip+0xa/0x20 <4> [<ffffffff810a6570>] ? kthread+0x0/0xc0 <4> [<ffffffff8100c280>] ? child_rip+0x0/0x20 <4>Code: e9 ba 00 00 00 66 0f 1f 44 00 00 44 8b 05 29 07 02 00 45 85 c0 0f 85 71 02 00 00 8b 83 70 01 00 00 45 0f b7 ed c1 e0 0f 44 09 e8 <89> 42 08 0f ae f8 66 c7 83 82 01 00 00 00 00 44 0f b7 ab dc 01 <1>RIP [<ffffffffa011d800>] c4iw_uld_control+0x4e0/0x880 [iw_cxgb4] <4> RSP <ffff880155a03db0> <4>CR2: 000000000002aa08` Based on original work by Bharat Potnuri <bharat@chelsio.com> Fixes: 74217d4c6a4fb0d8 ("iw_cxgb4: support for bar2 qid densities exceeding the page size") Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Reviewed-by: Leon Romanovsky <leon@leon.nu> Signed-off-by: Doug Ledford <dledford@redhat.com>
* iw_cxgb4: add queue drain functionsSteve Wise2016-02-29
| | | | | | | | | | | | | Add completion objects, named sq_drained and rq_drained, to the c4iw_qp struct. The queue-specific completion object is signaled when the last CQE is drained from the CQ for that queue. Add c4iw_drain_sq() to block until qp->rq_drained is completed. Add c4iw_drain_rq() to block until qp->sq_drained is completed. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB: remove in-kernel support for memory windowsChristoph Hellwig2015-12-23
| | | | | | | | | | | | Remove the unused ib_allow_mw and ib_bind_mw functions, remove the unused IB_WR_BIND_MW and IB_WC_BIND_MW opcodes and move ib_dealloc_mw into the uverbs module. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> [core] Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* iw_cxgb4: Remove old FRWR APISagi Grimberg2015-10-28
| | | | | | | | No ULP uses it anymore, go ahead and remove it. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
* iw_cxgb4: gracefully handle unknown CQE status errorsHariprasad S2015-07-28
| | | | | | | | | | c4iw_poll_cq_on() shouldn't fail the poll operation just because the CQE status is unknown. Rather, it should map this to the "fatal error" status and log the anomaly. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* IB/core: Change provider's API of create_cq to be extendibleMatan Barak2015-06-12
| | | | | | | | | | | | | | | Add a new ib_cq_init_attr structure which contains the previous cqe (minimum number of CQ entries) and comp_vector (completion vector) in addition to a new flags field. All vendors' create_cq callbacks are changed in order to work with the new API. This commit does not change any functionality. Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com> to patch #2 Signed-off-by: Doug Ledford <dledford@redhat.com>
* iw_cxgb4: support for bar2 qid densities exceeding the page sizeHariprasad S2015-06-11
| | | | | | | | | | | | | | | Handle this configuration: Queues Per Page * SGE BAR2 Queue Register Area Size > Page Size Use cxgb4_bar2_sge_qregs() to obtain the proper location within the bar2 region for a given qid. Rework the DB and GTS write functions to make use of this bar2 info. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* iw_cxgb4: use BAR2 GTS register for T5 kernel mode CQsHariprasad S2015-05-05
| | | | | | | | | For T5, we must not use the kdb/kgts registers, in order avoid db drops under extreme loads. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* iw_cxgb4: 32b platform fixesHariprasad S2015-05-05
| | | | | | | | | | | | | | | - get_dma_mr() was using ~0UL which is should be ~0ULL. This causes the DMA MR to get setup incorrectly in hardware. - wr_log_show() needed a 64b divide function div64_u64() instead of doing division directly. - fixed warnings about recasting a pointer to a u64 Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* iw_cxgb4: Cleanup register defines/MACROS defined in t4fw_ri_api.hHariprasad Shenai2015-01-16
| | | | | | | Cleanup all the MACROS that are defined in t4fw_ri_api.h and affected files Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* iw_cxgb4: Cleanup register defines/MACROS defined in t4.hHariprasad Shenai2015-01-16
| | | | | | | Cleanup all the MACROS defined in t4.h and the affected files Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* cxgb4: Cleanup macros so they follow the same style and look consistent, part 2Hariprasad Shenai2014-11-10
| | | | | | | | | | | | | | | Various patches have ended up changing the style of the symbolic macros/register defines to different style. As a result, the current kernel.org files are a mix of different macro styles. Since this macro/register defines is used by different drivers a few patch series have ended up adding duplicate macro/register define entries with different styles. This makes these register define/macro files a complete mess and we want to make them clean and consistent. This patch cleans up a part of it. Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* iw_cxgb4: advertise the correct device max attributesHariprasad Shenai2014-07-21
| | | | | | | | | | | Advertise the actual max limits for things like qp depths, number of qps, cqs, etc. Clean up the queue allocation for qps and cqs. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* cxgb4/iw_cxgb4: work request logging featureHariprasad Shenai2014-07-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit enhances the iwarp driver to optionally keep a log of rdma work request timining data for kernel mode QPs. If iw_cxgb4 module option c4iw_wr_log is set to non-zero, each work request is tracked and timing data maintained in a rolling log that is 4096 entries deep by default. Module option c4iw_wr_log_size_order allows specifing a log2 size to use instead of the default order of 12 (4096 entries). Both module options are read-only and must be passed in at module load time to set them. IE: modprobe iw_cxgb4 c4iw_wr_log=1 c4iw_wr_log_size_order=10 The timing data is viewable via the iw_cxgb4 debugfs file "wr_log". Writing anything to this file will clear all the timing data. Data tracked includes: - The host time when the work request was posted, just before ringing the doorbell. The host time when the completion was polled by the application. This is also the time the log entry is created. The delta of these two times is the amount of time took processing the work request. - The qid of the EQ used to post the work request. - The work request opcode. - The cqe wr_id field. For sq completions requests this is the swsqe index. For recv completions this is the MSN of the ingress SEND. This value can be used to match log entries from this log with firmware flowc event entries. - The sge timestamp value just before ringing the doorbell when posting, the sge timestamp value just after polling the completion, and CQE.timestamp field from the completion itself. With these three timestamps we can track the latency from post to poll, and the amount of time the completion resided in the CQ before being reaped by the application. With debug firmware, the sge timestamp is also logged by firmware in its flowc history so that we can compute the latency from posting the work request until the firmware sees it. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* iw_cxgb4: Detect Ing. Padding Boundary at run-timeHariprasad Shenai2014-07-15
| | | | | | | | | Updates iw_cxgb4 to determine the Ingress Padding Boundary from cxgb4_lld_info, and take subsequent actions. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds2014-06-12
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking updates from David Miller: 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov. 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J Benniston. 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn Mork. 4) BPF now has a "random" opcode, from Chema Gonzalez. 5) Add more BPF documentation and improve test framework, from Daniel Borkmann. 6) Support TCP fastopen over ipv6, from Daniel Lee. 7) Add software TSO helper functions and use them to support software TSO in mvneta and mv643xx_eth drivers. From Ezequiel Garcia. 8) Support software TSO in fec driver too, from Nimrod Andy. 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli. 10) Handle broadcasts more gracefully over macvlan when there are large numbers of interfaces configured, from Herbert Xu. 11) Allow more control over fwmark used for non-socket based responses, from Lorenzo Colitti. 12) Do TCP congestion window limiting based upon measurements, from Neal Cardwell. 13) Support busy polling in SCTP, from Neal Horman. 14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru. 15) Bridge promisc mode handling improvements from Vlad Yasevich. 16) Don't use inetpeer entries to implement ID generation any more, it performs poorly, from Eric Dumazet. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits) rtnetlink: fix userspace API breakage for iproute2 < v3.9.0 tcp: fixing TLP's FIN recovery net: fec: Add software TSO support net: fec: Add Scatter/gather support net: fec: Increase buffer descriptor entry number net: fec: Factorize feature setting net: fec: Enable IP header hardware checksum net: fec: Factorize the .xmit transmit function bridge: fix compile error when compiling without IPv6 support bridge: fix smatch warning / potential null pointer dereference via-rhine: fix full-duplex with autoneg disable bnx2x: Enlarge the dorq threshold for VFs bnx2x: Check for UNDI in uncommon branch bnx2x: Fix 1G-baseT link bnx2x: Fix link for KR with swapped polarity lane sctp: Fix sk_ack_backlog wrap-around problem net/core: Add VF link state control policy net/fsl: xgmac_mdio is dependent on OF_MDIO net/fsl: Make xgmac_mdio read error message useful net_sched: drr: warn when qdisc is not work conserving ...
| * iw_cxgb4: Allocate and use IQs specifically for indirect interruptsHariprasad Shenai2014-06-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently indirect interrupts for RDMA CQs funnel through the LLD's RDMA RXQs, which also handle direct interrupts for offload CPLs during RDMA connection setup/teardown. The intended T4 usage model, however, is to have indirect interrupts flow through dedicated IQs. IE not to mix indirect interrupts with CPL messages in an IQ. This patch adds the concept of RDMA concentrator IQs, or CIQs, setup and maintained by the LLD and exported to iw_cxgb4 for use when creating CQs. RDMA CPLs will flow through the LLD's RDMA RXQs, and CQ interrupts flow through the CIQs. Design: cxgb4 creates and exports an array of CIQs for the RDMA ULD. These IQs are sized according to the max available CQs available at adapter init. In addition, these IQs don't need FL buffers since they only service indirect interrupts. One CIQ is setup per RX channel similar to the RDMA RXQs. iw_cxgb4 will utilize these CIQs based on the vector value passed into create_cq(). The num_comp_vectors advertised by iw_cxgb4 will be the number of CIQs configured, and thus the vector value will be the index into the array of CIQs. Based on original work by Steve Wise <swise@opengridcomputing.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | RDMA/cxgb4: Add missing padding at end of struct c4iw_create_cq_respYann Droneaud2014-05-30
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The i386 ABI disagrees with most other ABIs regarding alignment of data types larger than 4 bytes: on most ABIs a padding must be added at end of the structures, while it is not required on i386. So for most ABI struct c4iw_create_cq_resp gets implicitly padded to be aligned on a 8 bytes multiple, while for i386, such padding is not added. The tool pahole can be used to find such implicit padding: $ pahole --anon_include \ --nested_anon_include \ --recursive \ --class_name c4iw_create_cq_resp \ drivers/infiniband/hw/cxgb4/iw_cxgb4.o Then, structure layout can be compared between i386 and x86_64: +++ obj-i386/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 11:43:05.547432195 +0100 --- obj-x86_64/drivers/infiniband/hw/cxgb4/iw_cxgb4.o.pahole.txt 2014-03-28 10:55:10.990133017 +0100 @@ -14,9 +13,8 @@ struct c4iw_create_cq_resp { __u32 size; /* 28 4 */ __u32 qid_mask; /* 32 4 */ - /* size: 36, cachelines: 1, members: 6 */ - /* last cacheline: 36 bytes */ + /* size: 40, cachelines: 1, members: 6 */ + /* padding: 4 */ + /* last cacheline: 40 bytes */ }; This ABI disagreement will make an x86_64 kernel try to write past the buffer provided by an i386 binary. When boundary check will be implemented, the x86_64 kernel will refuse to write past the i386 userspace provided buffer and the uverbs will fail. If the structure is on a page boundary and the next page is not mapped, ib_copy_to_udata() will fail and the uverb will fail. This patch adds an explicit padding at end of structure c4iw_create_cq_resp, and, like 92b0ca7cb149 ("IB/mlx5: Fix stack info leak in mlx5_ib_alloc_ucontext()"), makes function c4iw_create_cq() not writting this padding field to userspace. This way, x86_64 kernel will be able to write struct c4iw_create_cq_resp as expected by unpatched and patched i386 libcxgb4. Link: http://marc.info/?i=cover.1399309513.git.ydroneaud@opteya.com Cc: <stable@vger.kernel.org> Fixes: cfdda9d764362 ("RDMA/cxgb4: Add driver for Chelsio T4 RNIC") Fixes: e24a72a3302a6 ("RDMA/cxgb4: Fix four byte info leak in c4iw_create_cq()") Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* RDMA/cxgb4: Use uninitialized_var()Steve Wise2014-04-11
| | | | | Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* RDMA/cxgb4: SQ flush fixSteve Wise2014-04-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a race when moving a QP from RTS->CLOSING where a SQ work request could be posted after the FW receives the RDMA_RI/FINI WR. The SQ work request will never get processed, and should be completed with FLUSHED status. Function c4iw_flush_sq(), however was dropping the oldest SQ work request when in CLOSING or IDLE states, instead of completing the pending work request. If that oldest pending work request was actually complete and has a CQE in the CQ, then when that CQE is proceessed in poll_cq, we'll BUG_ON() due to the inconsistent SQ/CQ state. This is a very small timing hole and has only been hit once so far. The fix is two-fold: 1) c4iw_flush_sq() MUST always flush all non-completed WRs with FLUSHED status regardless of the QP state. 2) In c4iw_modify_rc_qp(), always set the "in error" bit on the queue before moving the state out of RTS. This ensures that the state transition will not happen while another thread is in post_rc_send(), because set_state() and post_rc_send() both aquire the qp spinlock. Also, once we transition the state out of RTS, subsequent calls to post_rc_send() will fail because the "in error" bit is set. I don't think this fully closes the race where the FW can get a FINI followed a SQ work request being posted (because they are posted to differente EQs), but the #1 fix will handle the issue by flushing the SQ work request. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* RDMA/cxgb4: Ignore read reponse type 1 CQEsSteve Wise2014-03-24
| | | | | | | | These are generated by HW in some error cases and need to be silently discarded. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* RDMA/cxgb4: Fix incorrect BUG_ON conditionsSteve Wise2014-03-20
| | | | | | | Based on original work from Jay Hernandez <jay@chelsio.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* RDMA/cxgb4: Cap CQ size at T4_MAX_IQ_SIZESteve Wise2014-03-20
| | | | | Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* RDMA/cxgb4: Fix four byte info leak in c4iw_create_cq()Dan Carpenter2014-03-20
| | | | | | | | There is a four byte hole at the end of the "uresp" struct after the ->qid_mask member. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* RDMA/cxgb4: Fix accounting for unsignaled SQ WRs to deal with wrapSteve Wise2013-08-13
| | | | | | | | | When determining how many WRs are completed with a signaled CQE, correctly deal with queue wraps. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* RDMA/cxgb4: Fix QP flush logicSteve Wise2013-08-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes following fixes in QP flush logic: - correctly flushes unsignaled WRs followed by a signaled WR - supports for flushing a CQ bound to multiple QPs - resets cidx_flush if a active queue starts getting HW CQEs again - marks WQ in error when we leave RTS. This was only being done for user queues, but we need it for kernel queues too so that post_send/post_recv will start returning the appropriate error synchronously - eats unsignaled read resp CQEs. HW always inserts CQEs so we must silently discard them if the read work request was unsignaled. - handles QP flushes with pending SW CQEs. The flush and out of order completion logic has a bug where if out of order completions are flushed but not yet polled by the consumer and the qp is then flushed then we end up inserting duplicate completions. - c4iw_flush_sq() should only flush wrs that have not already been flushed. Since we already track where in the SQ we've flushed via sq.cidx_flush, just start at that point and flush any remaining. This bug only caused a problem in the presence of unsignaled work requests. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Vipul Pandya <vipul@chelsio.com> [ Fixed sparse warning due to htonl/ntohl confusion. - Roland ] Signed-off-by: Roland Dreier <roland@purestorage.com>
* RDMA/cxgb4: Fix iw_cxgb4 count_rcqes() logicJonathan Lallinger2011-11-28
| | | | | | | | | | Fix another place in the code where logic dealing with the t4_cqe was using the wrong QID. This fixes the counting logic so that it tests against the SQ QID instead of the RQ QID when counting RCQES. Signed-off by: Jonathan Lallinger <jonathan@ogc.us> Signed-off by: Steve Wise <swise@ogc.us> Signed-off-by: Roland Dreier <roland@purestorage.com>
* RDMA/cxgb4: Serialize calls to CQ's comp_handlerKumar Sanghvi2011-10-31
| | | | | | | | | | | | | | | | | | | | Commit 01e7da6ba53c ("RDMA/cxgb4: Make sure flush CQ entries are collected on connection close") introduced a potential problem where a CQ's comp_handler can get called simultaneously from different places in the iw_cxgb4 driver. This does not comply with Documentation/infiniband/core_locking.txt, which states that at a given point of time, there should be only one callback per CQ should be active. This problem was reported by Parav Pandit <Parav.Pandit@Emulex.Com>. Based on discussion between Parav Pandit and Steve Wise, this patch fixes the above problem by serializing the calls to a CQ's comp_handler using a spin_lock. Reported-by: Parav Pandit <Parav.Pandit@Emulex.Com> Signed-off-by: Kumar Sanghvi <kumaras@chelsio.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* RDMA/cxgb4: Use correct QID in insert_recv_cqe()Jonathan Lallinger2011-10-14
| | | | | | | | | | | When creating flushed receive CQEs, set the QPID field in the t4_cqe to the SQ QID and not the RQ QID. Otherwise the poll code will not find the correct QP context. Signed-off by: Jonathan Lallinger <jonathan@ogc.us> Signed-off by: Steve Wise <swise@ogc.us> Signed-off-by: Roland Dreier <roland@purestorage.com>
* RDMA/cxgb4: Don't exceed hw IQ depth limit for user CQsSteve Wise2011-06-17
| | | | | | | | | | | | Memory allocated for user CQs gets rounded up to the next page boundary. And after rounding, we recalculate the resulting IQ depth and we need to make sure we don't exceed the HW limits. This bug can result a much smaller CQ allocated than was expected if the HW size field is exceeded, resulting in CQ overflow failures. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* RDMA/cxgb4: Centralize the wait logicSteve Wise2010-09-28
| | | | | Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* RDMA/cxgb4: Ignore TERMINATE CQEsSteve Wise2010-09-28
| | | | | | | T4 incorrectly inserts TERM CQEs into the CQ. Silently ignore them. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* RDMA/cxgb4: Fix warnings about casts to/from pointers of different sizesRoland Dreier2010-09-27
| | | | | | | | | | | | | | | | | | | | | | | | | Fix: drivers/infiniband/hw/cxgb4/qp.c: In function ‘create_qp’: drivers/infiniband/hw/cxgb4/qp.c:147: warning: cast from pointer to integer of different size drivers/infiniband/hw/cxgb4/qp.c: In function ‘rdma_fini’: drivers/infiniband/hw/cxgb4/qp.c:988: warning: cast from pointer to integer of different size drivers/infiniband/hw/cxgb4/qp.c: In function ‘rdma_init’: drivers/infiniband/hw/cxgb4/qp.c:1063: warning: cast from pointer to integer of different size drivers/infiniband/hw/cxgb4/mem.c: In function ‘write_adapter_mem’: drivers/infiniband/hw/cxgb4/mem.c:74: warning: cast from pointer to integer of different size drivers/infiniband/hw/cxgb4/cq.c: In function ‘destroy_cq’: drivers/infiniband/hw/cxgb4/cq.c:58: warning: cast from pointer to integer of different size drivers/infiniband/hw/cxgb4/cq.c: In function ‘create_cq’: drivers/infiniband/hw/cxgb4/cq.c:135: warning: cast from pointer to integer of different size drivers/infiniband/hw/cxgb4/cm.c: In function ‘fw6_msg’: drivers/infiniband/hw/cxgb4/cm.c:2326: warning: cast to pointer from integer of different size by casting pointers to unsigned long instead of u64. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* RDMA/cxgb4: Remove dependency on __GFP_NOFAILDavid Rientjes2010-07-21
| | | | | | | | | The alloc_skb() in various allocations are failable, so remove __GFP_NOFAIL from their masks. Signed-off-by: David Rientjes <rientjes@google.com> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* RDMA/cxgb4: Avoid false GTS CIDX_INC overflowsSteve Wise2010-07-06
| | | | | | | | | | | | | | | | | | | | | The T4 IQ hw design assumes CIDX_INC credits will be returned on a regular basis and always before the CIDX counter crosses over the PIDX counter. For RDMA CQs, however, returning CIDX_INC credits is only needed and desired when and if the CQ is armed for notification. This can lead to a GTS write returning credits that causes the HW to reject the credit update because it causes CIDX to pass PIDX. Once this happens, the CIDX/PIDX counters get out of whack and an application can miss a notification and get stuck blocked awaiting a notification. To avoid this, we allocate the HW IQ 2x times the requested size. This seems to avoid the false overflow failures. If we see more issues with this, then we'll have to add code in the poll path to return credits periodically like when the amount reaches 1/2 the queue depth). I would like to avoid this as it adds a PCI write transaction for applications that never arm the CQ (like most MPIs). Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* RDMA/cxgb4: Use the DMA state API instead of the pci equivalentsFUJITA Tomonori2010-07-06
| | | | | | | | | | | | | | | This replace the PCI DMA state API (include/linux/pci-dma.h) with the DMA equivalents since the PCI DMA state API will be obsolete. No functional change. For further information about the background: http://marc.info/?l=linux-netdev&m=127037540020276&w=2 Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* RDMA/cxgb4: Optimize CQ overflow detectionSteve Wise2010-05-25
| | | | | | | | | | | 1) save the timestamp flit in the cq when we consume a CQE. 2) always compare the saved flit with the previous entry flit when reading the next CQE entry. If the flits don't compare, then we have overflowed. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* RDMA/cxgb4: CQ size must be IQ size - 2Steve Wise2010-05-25
| | | | | | | | We need 1 extra entry for the status page and 1 to always have 1 free entry to detect when the queue is full. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* RDMA/cxgb4: Add driver for Chelsio T4 RNICSteve Wise2010-04-21
Add an RDMA/iWARP driver for Chelsio T4 Ethernet adapters. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>