aboutsummaryrefslogtreecommitdiffstats
path: root/net/sunrpc
Commit message (Collapse)AuthorAge
...
| * xprtrdma: Refactor __fmr_dma_unmap()Chuck Lever2016-05-17
| | | | | | | | | | | | | | | | | | | | | | | | Separate the DMA unmap operation from freeing the MW. In a subsequent patch they will not always be done at the same time, and they are not related operations (except by order; freeing the MW must be the last step during invalidation). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Move fr_xprt and fr_worker to struct rpcrdma_mwChuck Lever2016-05-17
| | | | | | | | | | | | | | | | | | | | | | In a subsequent patch, the fr_xprt and fr_worker fields will be needed by another memory registration mode. Move them into the generic rpcrdma_mw structure that wraps struct rpcrdma_frmr. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Refactor the FRWR recovery workerChuck Lever2016-05-17
| | | | | | | | | | | | | | | | | | | | Maintain the order of invalidation and DMA unmapping when doing a background MR reset. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Reset MRs in frwr_op_unmap_sync()Chuck Lever2016-05-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | frwr_op_unmap_sync() is now invoked in a workqueue context, the same as __frwr_queue_recovery(). There's no need to defer MR reset if posting LOCAL_INV MRs fails. This means that even when ib_post_send() fails (which should occur very rarely) the invalidation and DMA unmapping steps are still done in the correct order. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Save I/O direction in struct rpcrdma_frwrChuck Lever2016-05-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the the I/O direction field from rpcrdma_mr_seg into the rpcrdma_frmr. This makes it possible to DMA-unmap the frwr long after an RPC has exited and its rpcrdma_mr_seg array has been released and re-used. This might occur if an RPC times out while waiting for a new connection to be established. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Rename rpcrdma_frwr::sg and sg_nentsChuck Lever2016-05-17
| | | | | | | | | | | | | | | | | | | | Clean up: Follow same naming convention as other fields in struct rpcrdma_frwr. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Use core ib_drain_qp() APIChuck Lever2016-05-17
| | | | | | | | | | | | | | | | | | | | | | Clean up: Replace rpcrdma_flush_cqs() and rpcrdma_clean_cqs() with the new ib_drain_qp() API. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-By: Leon Romanovsky <leonro@mellanox.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Remove rpcrdma_create_chunks()Chuck Lever2016-05-17
| | | | | | | | | | | | | | | | | | rpcrdma_create_chunks() has been replaced, and can be removed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Allow Read list and Reply chunk simultaneouslyChuck Lever2016-05-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | rpcrdma_marshal_req() makes a simplifying assumption: that NFS operations with large Call messages have small Reply messages, and vice versa. Therefore with RPC-over-RDMA, only one chunk type is ever needed for each Call/Reply pair, because one direction needs chunks, the other direction will always fit inline. In fact, this assumption is asserted in the code: if (rtype != rpcrdma_noch && wtype != rpcrdma_noch) { dprintk("RPC: %s: cannot marshal multiple chunk lists\n", __func__); return -EIO; } But RPCGSS_SEC breaks this assumption. Because krb5i and krb5p perform data transformation on RPC messages before they are transmitted, direct data placement techniques cannot be used, thus RPC messages must be sent via a Long call in both directions. All such calls are sent with a Position Zero Read chunk, and all such replies are handled with a Reply chunk. Thus the client must provide every Call/Reply pair with both a Read list and a Reply chunk. Without any special security in effect, NFSv4 WRITEs may now also use the Read list and provide a Reply chunk. The marshal_req logic was preventing that, meaning an NFSv4 WRITE with a large payload that included a GETATTR result larger than the inline threshold would fail. The code that encodes each chunk list is now completely contained in its own function. There is some code duplication, but the trade-off is that the overall logic should be more clear. Note that all three chunk lists now share the rl_segments array. Some additional per-req accounting is necessary to track this usage. For the same reasons that the above simplifying assumption has held true for so long, I don't expect more array elements are needed at this time. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Update comments in rpcrdma_marshal_req()Chuck Lever2016-05-17
| | | | | | | | | | | | | | | | | | | | Update documenting comments to reflect code changes over the past year. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Avoid using Write list for small NFS READ requestsChuck Lever2016-05-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid the latency and interrupt overhead of registering a Write chunk when handling NFS READ requests of a few hundred bytes or less. This change does not interoperate with Linux NFS/RDMA servers that do not have commit 9d11b51ce7c1 ('svcrdma: Fix send_reply() scatter/gather set-up'). Commit 9d11b51ce7c1 was introduced in v4.3, and is included in 4.2.y, 4.1.y, and 3.18.y. Oracle bug 22925946 has been filed to request that the above fix be included in the Oracle Linux UEK4 NFS/RDMA server. Red Hat bugzillas 1327280 and 1327554 have been filed to request that RHEL NFS/RDMA server backports include the above fix. Workaround: Replace the "proto=rdma,port=20049" mount options with "proto=tcp" until commit 9d11b51ce7c1 is applied to your NFS server. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Prevent inline overflowChuck Lever2016-05-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When deciding whether to send a Call inline, rpcrdma_marshal_req doesn't take into account header bytes consumed by chunk lists. This results in Call messages on the wire that are sometimes larger than the inline threshold. Likewise, when a Write list or Reply chunk is in play, the server's reply has to emit an RDMA Send that includes a larger-than-minimal RPC-over-RDMA header. The actual size of a Call message cannot be estimated until after the chunk lists have been registered. Thus the size of each RPC-over-RDMA header can be estimated only after chunks are registered; but the decision to register chunks is based on the size of that header. Chicken, meet egg. The best a client can do is estimate header size based on the largest header that might occur, and then ensure that inline content is always smaller than that. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Limit number of RDMA segments in RPC-over-RDMA headersChuck Lever2016-05-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Send buffer space is shared between the RPC-over-RDMA header and an RPC message. A large RPC-over-RDMA header means less space is available for the associated RPC message, which then has to be moved via an RDMA Read or Write. As more segments are added to the chunk lists, the header increases in size. Typical modern hardware needs only a few segments to convey the maximum payload size, but some devices and registration modes may need a lot of segments to convey data payload. Sometimes so many are needed that the remaining space in the Send buffer is not enough for the RPC message. Sending such a message usually fails. To ensure a transport can always make forward progress, cap the number of RDMA segments that are allowed in chunk lists. This prevents less-capable devices and memory registrations from consuming a large portion of the Send buffer by reducing the maximum data payload that can be conveyed with such devices. For now I choose an arbitrary maximum of 8 RDMA segments. This allows a maximum size RPC-over-RDMA header to fit nicely in the current 1024 byte inline threshold with over 700 bytes remaining for an inline RPC message. The current maximum data payload of NFS READ or WRITE requests is one megabyte. To convey that payload on a client with 4KB pages, each chunk segment would need to handle 32 or more data pages. This is well within the capabilities of FMR. For physical registration, the maximum payload size on platforms with 4KB pages is reduced to 32KB. For FRWR, a device's maximum page list depth would need to be at least 34 to support the maximum 1MB payload. A device with a smaller maximum page list depth means the maximum data payload is reduced when using that device. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * xprtrdma: Bound the inline threshold valuesChuck Lever2016-05-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the sysctls that allow setting the inline threshold allow any value to be set. Small values only make the transport run slower. The default 1KB setting is as low as is reasonable. And the logic that decides how to divide a Send buffer between RPC-over-RDMA header and RPC message assumes (but does not check) that the lower bound is not crazy (say, 57 bytes). Send and receive buffers share a page with some control information. Values larger than about 3KB can't be supported, currently. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * sunrpc: Advertise maximum backchannel payload sizeChuck Lever2016-05-17
| | | | | | | | | | | | | | | | | | | | RPC-over-RDMA transports have a limit on how large a backward direction (backchannel) RPC message can be. Ensure that the NFSv4.x CREATE_SESSION operation advertises this limit to servers. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * sunrpc: add rpc_lookup_generic_credWeston Andros Adamson2016-05-09
| | | | | | | | | | | | | | | | | | | | Add function rpc_lookup_generic_cred, which allows lookups of a generic credential that's not current_cred(). [jlayton: add gfp_t parm] Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * sunrpc: plumb gfp_t parm into crcreate operationJeff Layton2016-05-09
| | | | | | | | | | | | | | | | | | | | We need to be able to call the generic_cred creator from different contexts. Add a gfp_t parm to the crcreate operation and to rpcauth_lookup_credcache. For now, we just push the gfp_t parms up one level to the *_lookup_cred functions. Signed-off-by: Jeff Layton <jeff.layton@primarydata.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
| * SUNRPC: init xdr_stream for zero iov_len, page_lenBenjamin Coddington2016-05-09
| | | | | | | | | | | | | | | | | | An xdr_buf with head[0].iov_len = 0 and page_len = 0 will cause xdr_init_decode() to incorrectly setup the xdr_stream. Specifically, xdr->end is never initialized. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* | Merge tag 'nfsd-4.7' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2016-05-24
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd updates from Bruce Fields: "A very quiet cycle for nfsd, mainly just an RDMA update from Chuck Lever" * tag 'nfsd-4.7' of git://linux-nfs.org/~bfields/linux: sunrpc: fix stripping of padded MIC tokens svcrpc: autoload rdma module svcrdma: Generalize svc_rdma_xdr_decode_req() svcrdma: Eliminate code duplication in svc_rdma_recvfrom() svcrdma: Drain QP before freeing svcrdma_xprt svcrdma: Post Receives only for forward channel requests svcrdma: Remove superfluous line from rdma_read_chunks() svcrdma: svc_rdma_put_context() is invoked twice in Send error path svcrdma: Do not add XDR padding to xdr_buf page vector svcrdma: Support IPv6 with NFS/RDMA nfsd: handle seqid wraparound in nfsd4_preprocess_layout_stateid Remove unnecessary allocation
| * | sunrpc: fix stripping of padded MIC tokensTomáš Trnka2016-05-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The length of the GSS MIC token need not be a multiple of four bytes. It is then padded by XDR to a multiple of 4 B, but unwrap_integ_data() would previously only trim mic.len + 4 B. The remaining up to three bytes would then trigger a check in nfs4svc_decode_compoundargs(), leading to a "garbage args" error and mount failure: nfs4svc_decode_compoundargs: compound not properly padded! nfsd: failed to decode arguments! This would prevent older clients using the pre-RFC 4121 MIC format (37-byte MIC including a 9-byte OID) from mounting exports from v3.9+ servers using krb5i. The trimming was introduced by commit 4c190e2f913f ("sunrpc: trim off trailing checksum before returning decrypted or integrity authenticated buffer"). Fixes: 4c190e2f913f "unrpc: trim off trailing checksum..." Signed-off-by: Tomáš Trnka <ttrnka@mail.muni.cz> Cc: stable@vger.kernel.org Acked-by: Jeff Layton <jlayton@poochiereds.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | svcrpc: autoload rdma moduleJ. Bruce Fields2016-05-23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This should fix failures like: # rpc.nfsd --rdma rpc.nfsd: Unable to request RDMA services: Protocol not supported Reported-by: Steve Dickson <steved@redhat.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | svcrdma: Generalize svc_rdma_xdr_decode_req()Chuck Lever2016-05-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up: Pass in just the piece of the svc_rqst that is needed here. While we're in the area, add an informative documenting comment. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | svcrdma: Eliminate code duplication in svc_rdma_recvfrom()Chuck Lever2016-05-13
| | | | | | | | | | | | | | | | | | | | | Clean up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | svcrdma: Drain QP before freeing svcrdma_xprtChuck Lever2016-05-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the server has forced a disconnect, the associated QP has not been moved to the Error state, and thus Receives are still posted. Ensure Receives (and any other outstanding WRs) are drained to release resources that can be freed during teardown of the svcrdma_xprt. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | svcrdma: Post Receives only for forward channel requestsChuck Lever2016-05-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since backward direction support was added, the rq_depth was increased to accommodate both forward and backward Receives. But only forward Receives need to be posted after a connection has been accepted. Receives for backward replies are posted as needed by svc_rdma_bc_sendto(). This doesn't break anything, but it means some resources are wasted. Fixes: 03fe9931536f ('svcrdma: Define maximum number of ...') Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | svcrdma: Remove superfluous line from rdma_read_chunks()Chuck Lever2016-05-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up: svc_rdma_get_read_chunk() already returns a pointer to the Read list. No need to set "ch" again to the value it already contains. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | svcrdma: svc_rdma_put_context() is invoked twice in Send error pathChuck Lever2016-05-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | Get a fresh op_ctxt in send_reply() instead of in svc_rdma_sendto(). This ensures that svc_rdma_put_context() is invoked only once if send_reply() fails. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | svcrdma: Do not add XDR padding to xdr_buf page vectorChuck Lever2016-05-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An xdr_buf has a head, a vector of pages, and a tail. Each RPC request is presented to the NFS server contained in an xdr_buf. The RDMA transport would like to supply the NFS server with only the NFS WRITE payload bytes in the page vector. In some common cases, that would allow the NFS server to swap those pages right into the target file's page cache. Have the transport's RDMA Read logic put XDR pad bytes in the tail iovec, and not in the pages that hold the data payload. The NFSv3 WRITE XDR decoder is finicky about the lengths involved, so make sure it is looking in the correct places when computing the total length of the incoming NFS WRITE request. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | svcrdma: Support IPv6 with NFS/RDMAShirley Ma2016-05-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allow both IPv4 and IPv6 to bind same port at the same time, restricts use of the IPv6 socket to IPv6 communication. Changes from v1: - Check rdma_set_afonly return value (suggested by Leon Romanovsky) Changes from v2: - Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Shirley Ma <shirley.ma@oracle.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | Remove unnecessary allocationJ. Bruce Fields2016-05-03
| |/ | | | | | | | | Reported-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | Merge tag 'for-linus' of ↵Linus Torvalds2016-05-20
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull rdma updates from Doug Ledford: "Primary 4.7 merge window changes - Updates to the new Intel X722 iWARP driver - Updates to the hfi1 driver - Fixes for the iw_cxgb4 driver - Misc core fixes - Generic RDMA READ/WRITE API addition - SRP updates - Misc ipoib updates - Minor mlx5 updates" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (148 commits) IB/mlx5: Fire the CQ completion handler from tasklet net/mlx5_core: Use tasklet for user-space CQ completion events IB/core: Do not require CAP_NET_ADMIN for packet sniffing IB/mlx4: Fix unaligned access in send_reply_to_slave IB/mlx5: Report Scatter FCS device capability when supported IB/mlx5: Add Scatter FCS support for Raw Packet QP IB/core: Add Scatter FCS create flag IB/core: Add Raw Scatter FCS device capability IB/core: Add extended device capability flags i40iw: pass hw_stats by reference rather than by value i40iw: Remove unnecessary synchronize_irq() before free_irq() i40iw: constify i40iw_vf_cqp_ops structure IB/mlx5: Add UARs write-combining and non-cached mapping IB/mlx5: Allow mapping the free running counter on PROT_EXEC IB/mlx4: Use list_for_each_entry_safe IB/SA: Use correct free function IB/core: Fix a potential array overrun in CMA and SA agent IB/core: Remove unnecessary check in ibnl_rcv_msg IB/IWPM: Fix a potential skb leak RDMA/nes: replace custom print_hex_dump() ...
| * | IB/core: Enhance ib_map_mr_sg()Bart Van Assche2016-05-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SRP initiator allows to set max_sectors to a value that exceeds the largest amount of data that can be mapped at once with an mlx4 HCA using fast registration and a page size of 4 KB. Hence modify ib_map_mr_sg() such that it can map partial sg-elements. If an sg-element has been mapped partially, let the caller know which fraction has been mapped by adjusting *sg_offset. Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Tested-by: Laurence Oberman <loberman@redhat.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Doug Ledford <dledford@redhat.com>
| * | IB/core: Add passing an offset into the SG to ib_map_mr_sgChristoph Hellwig2016-05-13
| |/ | | | | | | | | | | | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Steve Wise <swise@opengridcomputing.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
* | sunrpc: set SOCK_FASYNCEric Dumazet2016-05-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sunrpc is using SOCKWQ_ASYNC_NOSPACE without setting SOCK_FASYNC, so the recent optimizations done in sk_set_bit() and sk_clear_bit() broke it. There is still the risk that a subsequent sock_fasync() call would clear SOCK_FASYNC, but sunrpc does not use this yet. Fixes: 9317bb69824e ("net: SOCKWQ_ASYNC_NOSPACE optimizations") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Jiri Pirko <jiri@resnulli.us> Reported-by: Huang, Ying <ying.huang@intel.com> Tested-by: Jiri Pirko <jiri@resnulli.us> Tested-by: Huang, Ying <ying.huang@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: udp: rename UDP_INC_STATS_BH()Eric Dumazet2016-04-27
| | | | | | | | | | | | | | | | Rename UDP_INC_STATS_BH() to __UDP_INC_STATS(), and UDP6_INC_STATS_BH() to __UDP6_INC_STATS() Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2016-04-23
|\| | | | | | | | | | | | | | | | | | | | | Conflicts were two cases of simple overlapping changes, nothing serious. In the UDP case, we need to add a hlist_add_tail_rcu() to linux/rculist.h, because we've moved UDP socket handling away from using nulls lists. Signed-off-by: David S. Miller <davem@davemloft.net>
| * Merge branch 'linus' of ↵Linus Torvalds2016-04-14
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes an NFS regression caused by the skcipher/hash conversion in sunrpc. It also fixes a build problem in certain configurations with bcm63xx" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: hwrng: bcm63xx - fix device tree compilation sunrpc: Fix skcipher/shash conversion
| | * sunrpc: Fix skcipher/shash conversionHerbert Xu2016-04-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The skcpiher/shash conversion introduced a number of bugs in the sunrpc code: 1) Missing calls to skcipher_request_set_tfm lead to crashes. 2) The allocation size of shash_desc is too small which leads to memory corruption. Fixes: 3b5cf20cf439 ("sunrpc: Use skcipher and ahash/shash") Reported-by: J. Bruce Fields <bfields@fieldses.org> Tested-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | | sock: tigthen lockdep checks for sock_owned_by_userHannes Frederic Sowa2016-04-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sock_owned_by_user should not be used without socket lock held. It seems to be a common practice to check .owned before lock reclassification, so provide a little help to abstract this check away. Cc: linux-cifs@vger.kernel.org Cc: linux-bluetooth@vger.kernel.org Cc: linux-nfs@vger.kernel.org Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | sunrpc: do not pull udp headers on receiveWillem de Bruijn2016-04-11
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit e6afc8ace6dd modified the udp receive path by pulling the udp header before queuing an skbuff onto the receive queue. Sunrpc also calls skb_recv_datagram to dequeue an skb from a udp socket. Modify this receive path to also no longer expect udp headers. Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing") Reported-by: Franklin S Cooper Jr. <fcooper@ti.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Tested-by: Thierry Reding <treding@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usageKirill A. Shutemov2016-04-04
| | | | | | | | | | | | | | | | | | Mostly direct substitution with occasional adjustment or removing outdated comments. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macrosKirill A. Shutemov2016-04-04
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'nfsd-4.6' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2016-03-24
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd updates from Bruce Fields: "Various bugfixes, a RDMA update from Chuck Lever, and support for a new pnfs layout type from Christoph Hellwig. The new layout type is a variant of the block layout which uses SCSI features to offer improved fencing and device identification. (Also: note this pull request also includes the client side of SCSI layout, with Trond's permission.)" * tag 'nfsd-4.6' of git://linux-nfs.org/~bfields/linux: sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a race nfsd: recover: fix memory leak nfsd: fix deadlock secinfo+readdir compound nfsd4: resfh unused in nfsd4_secinfo svcrdma: Use new CQ API for RPC-over-RDMA server send CQs svcrdma: Use new CQ API for RPC-over-RDMA server receive CQs svcrdma: Remove close_out exit path svcrdma: Hook up the logic to return ERR_CHUNK svcrdma: Use correct XID in error replies svcrdma: Make RDMA_ERROR messages work rpcrdma: Add RPCRDMA_HDRLEN_ERR svcrdma: svc_rdma_post_recv() should close connection on error svcrdma: Close connection when a send error occurs nfsd: Lower NFSv4.1 callback message size limit svcrdma: Do not send Write chunk XDR pad with inline content svcrdma: Do not write xdr_buf::tail in a Write chunk svcrdma: Find client-provided write and reply chunks once per reply nfsd: Update NFS server comments related to RDMA support nfsd: Fix a memory leak when meeting unsupported state_protect_how4 nfsd4: fix bad bounds checking
| * sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a raceNeilBrown2016-03-17
| | | | | | | | | | | | | | | | | | | | | | | | | | sunrpc_cache_pipe_upcall() can detect a race if CACHE_PENDING is no longer set. In this case it aborts the queuing of the upcall. However it has already taken a new counted reference on "h" and doesn't "put" it, even though it frees the data structure holding the reference. So let's delay the "cache_get" until we know we need it. Fixes: f9e1aedc6c79 ("sunrpc/cache: remove races with queuing an upcall.") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * svcrdma: Use new CQ API for RPC-over-RDMA server send CQsChuck Lever2016-03-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calling ib_poll_cq() to sort through WCs during a completion is a common pattern amongst RDMA consumers. Since commit 14d3a3b2498e ("IB: add a proper completion queue abstraction"), WC sorting can be handled by the IB core. By converting to this new API, svcrdma is made a better neighbor to other RDMA consumers, as it allows the core to schedule the delivery of completions more fairly amongst all active consumers. This new API also aims each completion at a function that is specific to the WR's opcode. Thus the ctxt->wr_op field and the switch in process_context is replaced by a set of methods that handle each completion type. Because each ib_cqe carries a pointer to a completion method, the core can now post operations on a consumer's QP, and handle the completions itself. The server's rdma_stat_sq_poll and rdma_stat_sq_prod metrics are no longer updated. As a clean up, the cq_event_handler, the dto_tasklet, and all associated locking is removed, as they are no longer referenced or used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * svcrdma: Use new CQ API for RPC-over-RDMA server receive CQsChuck Lever2016-03-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Calling ib_poll_cq() to sort through WCs during a completion is a common pattern amongst RDMA consumers. Since commit 14d3a3b2498e ("IB: add a proper completion queue abstraction"), WC sorting can be handled by the IB core. By converting to this new API, svcrdma is made a better neighbor to other RDMA consumers, as it allows the core to schedule the delivery of completions more fairly amongst all active consumers. Because each ib_cqe carries a pointer to a completion method, the core can now post operations on a consumer's QP, and handle the completions itself. svcrdma receive completions no longer use the dto_tasklet. Each polled Receive WC is now handled individually in soft IRQ context. The server transport's rdma_stat_rq_poll and rdma_stat_rq_prod metrics are no longer updated. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * svcrdma: Remove close_out exit pathChuck Lever2016-03-01
| | | | | | | | | | | | | | | | | | | | Clean up: close_out is reached only when ctxt == NULL and XPT_CLOSE is already set. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * svcrdma: Hook up the logic to return ERR_CHUNKChuck Lever2016-03-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RFC 5666 Section 4.2 states: > When the peer detects an RPC-over-RDMA header version that it does > not support (currently this document defines only version 1), it > replies with an error code of ERR_VERS, and provides the low and > high inclusive version numbers it does, in fact, support. And: > When other decoding errors are detected in the header or chunks, > either an RPC decode error MAY be returned or the RPC/RDMA error > code ERR_CHUNK MUST be returned. The Linux NFS server does throw ERR_VERS when a client sends it a request whose rdma_version is not "one." But it does not return ERR_CHUNK when a header decoding error occurs. It just drops the request. To improve protocol extensibility, it should reject invalid values in the rdma_proc field instead of treating them all like RDMA_MSG. Otherwise clients can't detect when the server doesn't support new rdma_proc values. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * svcrdma: Use correct XID in error repliesChuck Lever2016-03-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When constructing an error reply, svc_rdma_xdr_encode_error() needs to view the client's request message so it can get the failing request's XID. svc_rdma_xdr_decode_req() is supposed to return a pointer to the client's request header. But if it fails to decode the client's message (and thus an error reply is needed) it does not return the pointer. The server then sends a bogus XID in the error reply. Instead, unconditionally generate the pointer to the client's header in svc_rdma_recvfrom(), and pass that pointer to both functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * svcrdma: Make RDMA_ERROR messages workChuck Lever2016-03-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix several issues with svc_rdma_send_error(): - Post a receive buffer to replace the one that was consumed by the incoming request - Posting a send should use DMA_TO_DEVICE, not DMA_FROM_DEVICE - No need to put_page _and_ free pages in svc_rdma_put_context - Make sure the sge is set up completely in case the error path goes through svc_rdma_unmap_dma() - Replace the use of ENOSYS, which has a reserved meaning Related fixes in svc_rdma_recvfrom(): - Don't leak the ctxt associated with the incoming request - Don't close the connection after sending an error reply - Let svc_rdma_send_error() figure out the right header error code As a last clean up, move svc_rdma_send_error() to svc_rdma_sendto.c with other similar functions. There is some common logic in these functions that could someday be combined to reduce code duplication. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>