| Commit message (Collapse) | Author | Age |
... | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Problem reported by Avneesh Pant <avneesh.pant@oracle.com>:
It looks like we are triggering a bug in RDMA CM/UCM interaction.
The bug specifically hits when we have an incoming connection
request and the connecting process dies BEFORE the passive end of
the connection can process the request i.e. it does not call
rdma_get_cm_event() to retrieve the initial connection event. We
were able to triage this further and have some additional
information now.
In the example below when P1 dies after issuing a connect request
as the CM id is being destroyed all outstanding connects (to P2)
are sent a reject message. We see this reject message being
received on the passive end and the appropriate CM ID created for
the initial connection message being retrieved in cm_match_req().
The problem is in the ucma_event_handler() code when this reject
message is delivered to it and the initial connect message itself
HAS NOT been delivered to the client. In fact the client has not
even called rdma_cm_get_event() at this stage so we haven't
allocated a new ctx in ucma_get_event() and updated the new
connection CM_ID to point to the new UCMA context.
This results in the reject message not being dropped in
ucma_event_handler() for the new connection request as the
(if (!ctx->uid)) block is skipped since the ctx it refers to is
the listen CM id context which does have a valid UID associated
with it (I believe the new CMID for the connection initially
uses the listen CMID -> context when it is created in
cma_new_conn_id). Thus the assumption that new events for a
connection can get dropped in ucma_event_handler() is incorrect
IF the initial connect request has not been retrieved in the
first case. We end up getting a CM Reject event on the listen CM
ID and our upper layer code asserts (in fact this event does not
even have the listen_id set as that only gets set up librdmacm
for connect requests).
The solution is to verify that the cm_id being reported in the event
is the same as the cm_id referenced by the ucma context. A mismatch
indicates that the ucma context corresponds to the listen. This fix
was validated by using a modified version of librdmacm that was able
to verify the problem and see that the reject message was indeed
dropped after this patch was applied.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
As a simple optimization that should speed up the vast majority of
connect attemps on IB devices, when we are searching for the GID of an
incoming connection in the cached GID lists of devices, search the
device that received the incoming connection request first. If we
don't find it there, then move on to other devices.
This reduces the time to perform 10,000 connections considerably.
Prior to this patch, a bad run of cmtime would look like this:
connect : 12399.26 12351.10 8609.00 1239.93
With this patch, it looks more like this:
connect : 5864.86 5799.80 8876.00 586.49
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The cma_acquire_dev function was changed by commit 3c86aa70bf67
("RDMA/cm: Add RDMA CM support for IBoE devices") to use find_gid_port()
because multiport devices might have either IB or IBoE formatted gids.
The old function assumed that all ports on the same device used the
same GID format.
However, when it was changed to use find_gid_port(), we inadvertently
lost usage of the GID cache. This turned out to be a very costly
change. In our testing, each iteration through each index of the GID
table takes roughly 35us. When you have multiple devices in a system,
and the GID you are looking for is on one of the later devices, the
code loops through all of the GID indexes on all of the early devices
before it finally succeeds on the target device. This pathological
search behavior combined with 35us per GID table index retrieval
results in results such as the following from the cmtime application
that's part of the latest librdmacm git repo:
ib1:
step total ms max ms min us us / conn
create id : 29.42 0.04 1.00 2.94
bind addr : 186705.66 19.00 18556.00 18670.57
resolve addr : 41.93 9.68 619.00 4.19
resolve route: 486.93 0.48 101.00 48.69
create qp : 4021.95 6.18 330.00 402.20
connect : 68350.39 68588.17 24632.00 6835.04
disconnect : 1460.43 252.65-1862269.00 146.04
destroy : 41.16 0.04 2.00 4.12
ib0:
step total ms max ms min us us / conn
create id : 28.61 0.68 1.00 2.86
bind addr : 2178.86 2.95 201.00 217.89
resolve addr : 51.26 16.85 845.00 5.13
resolve route: 620.08 0.43 92.00 62.01
create qp : 3344.40 6.36 273.00 334.44
connect : 6435.99 6368.53 7844.00 643.60
disconnect : 5095.38 321.90 757.00 509.54
destroy : 37.13 0.02 2.00 3.71
Clearly, both the bind address and connect operations suffer
a huge penalty for being anything other than the default
GID on the first port in the system.
After applying this patch, the numbers now look like this:
ib1:
step total ms max ms min us us / conn
create id : 30.15 0.03 1.00 3.01
bind addr : 80.27 0.04 7.00 8.03
resolve addr : 43.02 13.53 589.00 4.30
resolve route: 482.90 0.45 100.00 48.29
create qp : 3986.55 5.80 330.00 398.66
connect : 7141.53 7051.29 5005.00 714.15
disconnect : 5038.85 193.63 918.00 503.88
destroy : 37.02 0.04 2.00 3.70
ib0:
step total ms max ms min us us / conn
create id : 34.27 0.05 1.00 3.43
bind addr : 26.45 0.04 1.00 2.64
resolve addr : 38.25 10.54 760.00 3.82
resolve route: 604.79 0.43 97.00 60.48
create qp : 3314.95 6.34 273.00 331.49
connect : 12399.26 12351.10 8609.00 1239.93
disconnect : 5096.76 270.72 1015.00 509.68
destroy : 37.10 0.03 2.00 3.71
It's worth noting that we still suffer a bit of a penalty on
connect to the wrong device, but the penalty is much less than
it used to be. Follow on patches deal with this penalty.
Many thanks to Neil Horman for helping to track the source of
slow function that allowed us to track down the fact that
the original patch I mentioned above backed out cache usage
and identify just how much that impacted the system.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
On top of commit 366cddb40 "IB/rdma_cm: TOS <=> UP mapping for IBoE", add
support for case vlan egress map is used.
When the IBoE session is being set over a vlan, inherit the socket priority
to vlan priority mapping which was configured for the vlan device egress map.
Signed-off-by: Eyal Perry <eyalpe@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Conflicts:
drivers/net/usb/qmi_wwan.c
include/net/dst.h
Trivial merge conflicts, both were overlapping changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The create_flow/destroy_flow uverbs and the associated extensions to
the user-kernel verbs ABI are under review and are too experimental to
freeze at this point.
So userspace is not exposed to experimental features and an uinstable
ABI, temporarily disable this for v3.12 (with a Kconfig option behind
staging to reenable it if desired).
The feature will be enabled after proper cleanup for v3.13.
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=cover.1381351016.git.ydroneaud@opteya.com
Link: http://marc.info/?i=cover.1381177342.git.ydroneaud@opteya.com
[ Add a Kconfig option to reenable these verbs. - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Move sysctl_local_ports from a global variable into struct netns_ipv4.
- Modify inet_get_local_port_range to take a struct net, and update all
of the callers.
- Move the initialization of sysctl_local_ports into
sysctl_net_ipv4.c:ipv4_sysctl_init_net from inet_connection_sock.c
v2:
- Ensure indentation used tabs
- Fixed ip.h so it applies cleanly to todays net-next
v3:
- Compile fixes of strange callers of inet_get_local_port_range.
This patch now successfully passes an allmodconfig build.
Removed manual inlining of inet_get_local_port_range in ipv4_local_port_range
Originally-by: Samya <samya@twitter.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull main batch of InfiniBand/RDMA changes from Roland Dreier:
- Large ocrdma HW driver update: add "fast register" work requests,
fixes, cleanups
- Add receive flow steering support for raw QPs
- Fix IPoIB neighbour race that leads to crash
- iSER updates including support for using "fast register" memory
registration
- IPv6 support for iWARP
- XRC transport fixes
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (54 commits)
RDMA/ocrdma: Fix compiler warning about int/pointer size mismatch
IB/iser: Fix redundant pointer check in dealloc flow
IB/iser: Fix possible memory leak in iser_create_frwr_pool()
IB/qib: Move COUNTER_MASK definition within qib_mad.h header guards
RDMA/ocrdma: Fix passing wrong opcode to modify_srq
RDMA/ocrdma: Fill PVID in UMC case
RDMA/ocrdma: Add ABI versioning support
RDMA/ocrdma: Consider multiple SGES in case of DPP
RDMA/ocrdma: Fix for displaying proper link speed
RDMA/ocrdma: Increase STAG array size
RDMA/ocrdma: Dont use PD 0 for userpace CQ DB
RDMA/ocrdma: FRMA code cleanup
RDMA/ocrdma: For ERX2 irrespective of Qid, num_posted offset is 24
RDMA/ocrdma: Fix to work with even a single MSI-X vector
RDMA/ocrdma: Remove the MTU check based on Ethernet MTU
RDMA/ocrdma: Add support for fast register work requests (FRWR)
RDMA/ocrdma: Create IRD queue fix
IB/core: Better checking of userspace values for receive flow steering
IB/mlx4: Add receive flow steering support
IB/core: Export ib_create/destroy_flow through uverbs
...
|
| |\ \ \
| | | | |
| | | | |
| | | | | |
'qib' into for-next
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Added reference counting mechanism for XRC target QPs between
ib_uqp_object and its ib_uxrcd_object. This prevents closing an XRC
domain that is still attached to a QP. In addition, add missing code
in ib_uverbs_destroy_srq() to handle ib_uxrcd_object reference
counting correctly when destroying an xsrq.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | |/
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Fix a potential race when event occurrs on a target XRC QP and in the
middle of reporting that on its shared qps, one of them is destroyed
by user space application. Also add note for kernel consumers in
ib_verbs.h that they must not destroy the QP from within the handler.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
- Don't allow unsupported comp_mask values, user should check
ibv_query_device to know which features are supported.
- Add a check in ib_uverbs_create_flow() to verify the size passed
from the user space.
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Implement ib_uverbs_create_flow() and ib_uverbs_destroy_flow() to
support flow steering for user space applications.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Add infrastructure to support extended uverbs capabilities in a
forward/backward manner. Uverbs command opcodes which are based on
the verbs extensions approach should be greater or equal to
IB_USER_VERBS_CMD_THRESHOLD. They have new header format and
processed a bit differently.
Whenever a specific IB_USER_VERBS_CMD_XXX is extended, which practically means
it needs to have additional arguments, we will be able to add them without creating
a completely new IB_USER_VERBS_CMD_YYY command or bumping the uverbs ABI version.
This patch for itself doesn't provide the whole scheme which is also dependent
on adding a comp_mask field to each extended uverbs command struct.
The new header framework allows for future extension of the CMD arguments
(ib_uverbs_cmd_hdr.in_words, ib_uverbs_cmd_hdr.out_words) for an existing
new command (that is a command that supports the new uverbs command header format
suggested in this patch) w/o bumping ABI version and with maintaining backward
and formward compatibility to new and old libibverbs versions.
In the uverbs command we are passing both uverbs arguments and the provider arguments.
We split the ib_uverbs_cmd_hdr.in_words to ib_uverbs_cmd_hdr.in_words which will now carry only
uverbs input argument struct size and ib_uverbs_cmd_hdr.provider_in_words that will carry
the provider input argument size. Same goes for the response (the uverbs CMD output argument).
For example take the create_cq call and the mlx4_ib provider:
The uverbs layer gets libibverb's struct ibv_create_cq (named struct ib_uverbs_create_cq
in the kernel), mlx4_ib gets libmlx4's struct mlx4_create_cq (which includes struct
ibv_create_cq and is named struct mlx4_ib_create_cq in the kernel) and
in_words = sizeof(mlx4_create_cq)/4 .
Thus ib_uverbs_cmd_hdr.in_words carry both uverbs plus mlx4_ib input argument sizes,
where uverbs assumes it knows the size of its input argument - struct ibv_create_cq.
Now, if we wish to add a variable to struct ibv_create_cq, we can add a comp_mask field
to the struct which is basically bit field indicating which fields exists in the struct
(as done for the libibverbs API extension), but we need a way to tell what is the total
size of the struct and not assume the struct size is predefined (since we may get different
struct sizes from different user libibverbs versions). So we know at which point the
provider input argument (struct mlx4_create_cq) begins. Same goes for extending the
provider struct mlx4_create_cq. Thus we split the ib_uverbs_cmd_hdr.in_words to
ib_uverbs_cmd_hdr.in_words which will now carry only uverbs input argument struct size and
ib_uverbs_cmd_hdr.provider_in_words that will carry the provider (mlx4_ib) input argument size.
Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | |/
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The RDMA stack allows for applications to create IB_QPT_RAW_PACKET
QPs, which receive plain Ethernet packets, specifically packets that
don't carry any QPN to be matched by the receiving side. Applications
using these QPs must be provided with a method to program some
steering rule with the HW so packets arriving at the local port can be
routed to them.
This patch adds ib_create_flow(), which allow providing a flow
specification for a QP. When there's a match between the
specification and a received packet, the packet is forwarded to that
QP, in a the same way one uses ib_attach_multicast() for IB UD
multicast handling.
Flow specifications are provided as instances of struct ib_flow_spec_yyy,
which describe L2, L3 and L4 headers. Currently specs for Ethernet, IPv4,
TCP and UDP are defined. Flow specs are made of values and masks.
The input to ib_create_flow() is a struct ib_flow_attr, which contains
a few mandatory control elements and optional flow specs.
struct ib_flow_attr {
enum ib_flow_attr_type type;
u16 size;
u16 priority;
u32 flags;
u8 num_of_specs;
u8 port;
/* Following are the optional layers according to user request
* struct ib_flow_spec_yyy
* struct ib_flow_spec_zzz
*/
};
As these specs are eventually coming from user space, they are defined and
used in a way which allows adding new spec types without kernel/user ABI
change, just with a little API enhancement which defines the newly added spec.
The flow spec structures are defined with TLV (Type-Length-Value)
entries, which allows calling ib_create_flow() with a list of variable
length of optional specs.
For the actual processing of ib_flow_attr the driver uses the number
of specs and the size mandatory fields along with the TLV nature of
the specs.
Steering rules processing order is according to the domain over which
the rule is set and the rule priority. All rules set by user space
applicatations fall into the IB_FLOW_DOMAIN_USER domain, other domains
could be used by future IPoIB RFS and Ethetool flow-steering interface
implementation. Lower numerical value for the priority field means
higher priority.
The returned value from ib_create_flow() is a struct ib_flow, which
contains a database pointer (handle) provided by the HW driver to be
used when calling ib_destroy_flow().
Applications that offload TCP/IP traffic can also be written over IB
UD QPs. The ib_create_flow() / ib_destroy_flow() API is designed to
support UD QPs too. A HW driver can set IB_DEVICE_MANAGED_FLOW_STEERING
to denote support for flow steering.
The ib_flow_attr enum type supports usage of flow steering for promiscuous
and sniffer purposes:
IB_FLOW_ATTR_NORMAL - "regular" rule, steering according to rule specification
IB_FLOW_ATTR_ALL_DEFAULT - default unicast and multicast rule, receive
all Ethernet traffic which isn't steered to any QP
IB_FLOW_ATTR_MC_DEFAULT - same as IB_FLOW_ATTR_ALL_DEFAULT but only for multicast
IB_FLOW_ATTR_SNIFFER - sniffer rule, receive all port traffic
ALL_DEFAULT and MC_DEFAULT rules options are valid only for Ethernet link type.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Modify the type of local_addr and remote_addr fields in struct
iw_cm_id from struct sockaddr_in to struct sockaddr_storage to hold
IPv6 and IPv4 addresses uniformly.
Change the references of local_addr and remote_addr in cxgb4, cxgb3,
nes and amso drivers to match this. However to be able to actully run
traffic over IPv6, low-level drivers have to add code to support this.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
[ Fix unused variable warnings when INFINIBAND_NES_DEBUG not set.
- Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|\ \
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull PTR_RET() removal patches from Rusty Russell:
"PTR_RET() is a weird name, and led to some confusing usage. We ended
up with PTR_ERR_OR_ZERO(), and replacing or fixing all the usages.
This has been sitting in linux-next for a whole cycle"
[ There are still some PTR_RET users scattered about, with some of them
possibly being new, but most of them existing in Rusty's tree too. We
have that
#define PTR_RET(p) PTR_ERR_OR_ZERO(p)
thing in <linux/err.h>, so they continue to work for now - Linus ]
* tag 'PTR_RET-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
GFS2: Replace PTR_RET with PTR_ERR_OR_ZERO
Btrfs: volume: Replace PTR_RET with PTR_ERR_OR_ZERO
drm/cma: Replace PTR_RET with PTR_ERR_OR_ZERO
sh_veu: Replace PTR_RET with PTR_ERR_OR_ZERO
dma-buf: Replace PTR_RET with PTR_ERR_OR_ZERO
drivers/rtc: Replace PTR_RET with PTR_ERR_OR_ZERO
mm/oom_kill: remove weird use of ERR_PTR()/PTR_ERR().
staging/zcache: don't use PTR_RET().
remoteproc: don't use PTR_RET().
pinctrl: don't use PTR_RET().
acpi: Replace weird use of PTR_RET.
s390: Replace weird use of PTR_RET.
PTR_RET is now PTR_ERR_OR_ZERO(): Replace most.
PTR_RET is now PTR_ERR_OR_ZERO
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Sweep of the simple cases.
Cc: netdev@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: Julia Lawall <julia.lawall@lip6.fr>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
|
| | | |
| \ | |
|\ \ \
| | |/
| |/|
| | | |
'nes', 'ocrdma' and 'qib' into for-next
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Currently, QP1 is created using pkey index 0. This patch simply looks
for the index containing the default pkey, rather than hard-coding
pkey index 0.
This change will have no effect in native mode, since QP0 and QP1 are
created before the SM configures the port, so pkey table will still be
the default table defined by the IB Spec, in C10-123: "If non-volatile
storage is not used to hold P_Key Table contents, then if a PM
(Partition Manager) is not present, and prior to PM initialization of
the P_Key Table, the P_Key Table must act as if it contains a single
valid entry, at P_Key_ix = 0, containing the default partition
key. All other entries in the P_Key Table must be invalid."
Thus, in the native mode case, the driver will find the default pkey
at index 0 (so it will be no different than the hard-coding).
However, in SR-IOV mode, for VFs, the pkey table may be
paravirtualized, so that the VF's pkey index zero may not necessarily
be mapped to the real pkey index 0. For VFs, therefore, it is
important to find the virtual index which maps to the real default
pkey.
This commit does the following for QP1 creation:
1. Find the pkey index containing the default pkey, and use that index
if found. ib_find_pkey() returns the index of the
limited-membership default pkey (0x7FFF) if the full-member default
pkey is not in the table.
2. If neither form of the default pkey is found, use pkey index 0
(previous behavior).
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Calling cma_save_ib_info() for CM SIDR REQs results in a crash
accessing an invalid path record pointer.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If a application is using AF_IB with a UD QP, but does not provide any
private data, we will end up accessing invalid memory. Check for this
case and handle it appropriately.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Building cma.o triggers this gcc warning:
drivers/infiniband/core/cma.c: In function ‘rdma_resolve_addr’:
drivers/infiniband/core/cma.c:465:23: warning: ‘port’ may be used uninitialized in this function [-Wmaybe-uninitialized]
drivers/infiniband/core/cma.c:426:5: note: ‘port’ was declared here
This is a false positive, as "port" will always be initialized if we're
at "found". But if we assign to "id_priv->id.port_num" directly, we can
drop "port". That will, obviously, silence gcc.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull InfiniBand/RDMA changes from Roland Dreier:
- AF_IB (native IB addressing) for CMA from Sean Hefty
- new mlx5 driver for Mellanox Connect-IB adapters (including post
merge request fixes)
- SRP fixes from Bart Van Assche (including fix to first merge request)
- qib HW driver updates
- resurrection of ocrdma HW driver development
- uverbs conversion to create fds with O_CLOEXEC set
- other small changes and fixes
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits)
mlx5: Return -EFAULT instead of -EPERM
IB/qib: Log all SDMA errors unconditionally
IB/qib: Fix module-level leak
mlx5_core: Adjust hca_cap.uar_page_sz to conform to Connect-IB spec
IB/srp: Let srp_abort() return FAST_IO_FAIL if TL offline
IB/uverbs: Use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd()
mlx5_core: Fixes for sparse warnings
IB/mlx5: Make profile[] static in main.c
mlx5: Fix parameter type of health_handler_t
mlx5: Add driver for Mellanox Connect-IB adapters
IB/core: Add reserved values to enums for low-level driver use
IB/srp: Bump driver version and release date
IB/srp: Make HCA completion vector configurable
IB/srp: Maintain a single connection per I_T nexus
IB/srp: Fail I/O fast if target offline
IB/srp: Skip host settle delay
IB/srp: Avoid skipping srp_reset_host() after a transport error
IB/srp: Fix remove_one crash due to resource exhaustion
IB/qib: New transmitter tunning settings for Dell 1.1 backplane
IB/core: Fix error return code in add_port()
...
|
| |\ \
| | | |
| | | |
| | | | |
into for-next
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The macro get_unused_fd() is used to allocate a file descriptor with
default flags. Those default flags (0) can be "unsafe": O_CLOEXEC must
be used by default to not leak file descriptor across exec().
Replace calls to get_unused_fd() in uverbs with calls to
get_unused_fd_flags(O_CLOEXEC). Inheriting uverbs fds across exec()
cannot be used to do anything useful.
Based on a patch/suggestion from Yann Droneaud <ydroneaud@opteya.com>.
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | |/
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Fix to return -ENOMEM in the add_port() error handling case instead of
0, as done elsewhere in this function.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Report AF_IB source and destination addresses through netlink
interface.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Allow user space applications to join multicast groups using MGIDs
directly. MGIDs may be passed using AF_IB addresses. Since the
current multicast join command only supports addresses as large as
sockaddr_in6, define a new structure for joining addresses specified
using sockaddr_ib.
Since AF_IB allows the user to specify the qkey when resolving a
remote UD QP address, when joining the multicast group use the qkey
value, if one has been assigned.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Allow user space applications to call resolve_addr using AF_IB. To
support sockaddr_ib, we need to define a new structure capable of
handling the larger address size.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Support user space binding to addresses using AF_IB. Since
sockaddr_ib is larger than sockaddr_in6, we need to define a larger
structure when binding using AF_IB. This time we use sockaddr_storage
to cover future cases.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Several commands into the RDMA CM from user space are restricted to
supporting addresses which fit into a sockaddr_in6 structure: bind
address, resolve address, and join multicast.
With the addition of AF_IB, we need to support addresses which are
larger than sockaddr_in6. This will be done by adding new commands
that exchange address information using sockaddr_storage. However, to
support existing applications, we maintain the current commands and
structures, but rename them to indicate that they only support IPv4
and v6 addresses.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Part of address resolution is mapping IP addresses to IB GIDs. With
the changes to support querying larger addresses and more path records,
also provide a way to query IB GIDs after resolution completes.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Allow the rdma_ucm to query the IB service ID formed or allocated by
the rdma_cm by exporting the cma_get_service_id() functionality.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The current query_route call can return up to two path records. The
assumption being that one is the primary path, with optional support
for an alternate path. In both cases, the paths are assumed to be
reversible and are used to send CM MADs.
With the ability to manually set IB path data, the rdma cm can
eventually be capable of using up to 6 paths per connection:
forward primary, reverse primary,
forward alternate, reverse alternate,
reversible primary path for CM MADs
reversible alternate path for CM MADs.
(It is unclear at this time if IB routing will complicate this) In
order to handle more flexible routing topologies, add a new command to
report any number of paths.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Allow converting from struct ib_sa_path_rec to the IB defined SA path
record wire format. This will be used to report path data from the
rdma cm into user space.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The sockaddr structure for AF_IB is larger than sockaddr_in6. The
rdma cm user space ABI uses the latter to exchange address information
between user space and the kernel.
To support querying for larger addresses, define a new query command
that exchanges data using sockaddr_storage, rather than sockaddr_in6.
Unlike the existing query_route command, the new command only returns
address information. Route (i.e. path record) data is separated.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If an rdma_cm_id is bound to AF_IB, with a wild card address, only
listen on IB devices.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Allow the user to specify the qkey when using AF_IB. The qkey is
added to struct rdma_ucm_conn_param in place of a reserved field, but
for backwards compatability, is only accessed if the associated
rdma_cm_id is using AF_IB.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If the source or destination address is AF_IB, then do not reserve a
portion of the private data in the IB CM REQ or SIDR REQ messages for
the cma header. Instead, all private data should be exported to the
user. When AF_IB is used, the rdma cm does not have sufficient
information to fill in the cma header. Additionally, this will be
necessary to support any IB connection through the rdma cm interface,
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
With the removal of SDP related code, we can merge cma_get_net_info()
with cma_save_net_info(), since we're only ever dealing with a single
header format.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The SDP protocol was never merged upstream. Remove unused SDP related
code from the RDMA CM.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
cma_get_service_id() forms the service ID based on the port space and
port number of the rdma_cm_id. Extend the call to support AF_IB,
which contains the service ID directly. This will be needed to
support any arbitrary SID.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Allow rdma_resolve_route() to handle the case where the user specified
the source and destination addresses using AF_IB.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Allow the user to specify the remote address using AF_IB format. When
AF_IB is used, the remote address simply needs to be recorded, and no
resolution using ARP is done. The local address may still need to be
matched with a local IB device.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If a user specifies AF_IB as the source address for a loopback
connection, limit the resolution to IB devices only.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Provide inline helpers to extract source and destination address data
from the rdma_cm_id.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
cma_resolve_loopback is called after an rdma_cm_id has been
bound to a specific sa_family and port. Once the
source sa_family for the id has been set, do not modify it.
Only the actual IP address portion of the source address
needs to be set.
As part of this fix, we can simplify setting the source address
by moving the loopback address assignment from cma_resolve_loopback
to cma_bind_loopback. cma_bind_loopback is only invoked when
the source address is the loopback address.
Finally, add loopback support for AF_IB as part of the change.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Modify rdma_bind_addr to allow the user to specify AF_IB when binding
to a device. AF_IB indicates that the user is not mapping an IP
address to the native IB addressing. (The mapping may have already
been done, or is not needed)
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|