aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/ulp/ipoib
Commit message (Collapse)AuthorAge
* Merge tag 'rdma-for-linus' of ↵Linus Torvalds2013-05-08
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull InfiniBand/RDMA changes from Roland Dreier: - XRC transport fixes - Fix DHCP on IPoIB - mlx4 preparations for flow steering - iSER fixes - miscellaneous other fixes * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (23 commits) IB/iser: Add support for iser CM REQ additional info IB/iser: Return error to upper layers on EAGAIN registration failures IB/iser: Move informational messages from error to info level IB/iser: Add module version mlx4_core: Expose a few helpers to fill DMFS HW strucutures mlx4_core: Directly expose fields of DMFS HW rule control segment mlx4_core: Change a few DMFS fields names to match firmare spec mlx4: Match DMFS promiscuous field names to firmware spec mlx4_core: Move DMFS HW structs to common header file IB/mlx4: Set link type for RAW PACKET QPs in the QP context IB/mlx4: Disable VLAN stripping for RAW PACKET QPs mlx4_core: Reduce warning message for SRQ_LIMIT event to debug level RDMA/iwcm: Don't touch cmid after dropping reference IB/qib: Correct qib_verbs_register_sysfs() error handling IB/ipath: Correct ipath_verbs_register_sysfs() error handling RDMA/cxgb4: Fix SQ allocation when on-chip SQ is disabled SRPT: Fix odd use of WARN_ON() IPoIB: Fix ipoib_hard_header() return value RDMA: Rename random32() to prandom_u32() RDMA/cxgb3: Fix uninitialized variable ...
| *-. Merge branches 'cxgb4', 'ipoib', 'iser', 'misc', 'mlx4', 'qib' and 'srp' ↵Roland Dreier2013-05-08
| |\ \ | | | | | | | | | | | | into for-next
| | | * RDMA: Rename random32() to prandom_u32()Akinobu Mita2013-04-17
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use more preferable function name which implies using a pseudo-random number generator. Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: Steve Wise <swise@chelsio.com> Cc: linux-rdma@vger.kernel.org Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | * IPoIB: Fix ipoib_hard_header() return valueDoug Ledford2013-04-17
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If you have a patched up dhcp server (and dhclient), they will use AF_PACKET/SOCK_DGRAM pair to send dhcp packets over IPoIB. However, when testing an upstream kernel, this has been broken for a very long time (I tested 2.6.34, 2.6.38, 3.0, 3.1, 3.8, HEAD). It turns out that the hard_header routine in ipoib is not following the API and is returning 0 even when it pushed data onto the skb. This then causes af_packet.c to overwrite the header just pushed with data from user space. Fixing this gets DHCP working on IPoIB. Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* | drivers/infiniband/hw: rename random32() to prandom_u32()Andrew Morton2013-05-07
| | | | | | | | | | | | | | | | Use preferable function name which implies using a pseudo-random number generator. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | IPoIB: add support for TIPC protocolPatrick McHardy2013-04-17
|/ | | | | | | | | | | | | | | | | | Support TIPC in the IPoIB driver. Since IPoIB now keeps track of its own neighbour entries and doesn't require the packet to have a dst_entry anymore, the only necessary changes are to: - not drop multicast TIPC packets because of the unknown ethernet type - handle unicast TIPC packets similar to IPv4/IPv6 unicast packets in ipoib_start_xmit(). An alternative would be to remove all ethertype limitations since they're not necessary anymore, all TIPC needs to know about is ARP and RARP since it wants to always perform "path find", even if a path is already known. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* IPoIB: Fix send lockup due to missed TX completionMike Marciniszyn2013-03-22
| | | | | | | | | | | | | | | | | | | | | | Commit f0dc117abdfa ("IPoIB: Fix TX queue lockup with mixed UD/CM traffic") attempts to solve an issue where unprocessed UD send completions can deadlock the netdev. The patch doesn't fully resolve the issue because if more than half the tx_outstanding's were UD and all of the destinations are RC reachable, arming the CQ doesn't solve the issue. This patch uses the IB_CQ_REPORT_MISSED_EVENTS on the ib_req_notify_cq(). If the rc is above 0, the UD send cq completion callback is called directly to re-arm the send completion timer. This issue is seen in very large parallel filesystem deployments and the patch has been shown to correct the issue. Cc: <stable@vger.kernel.org> Reviewed-by: Dean Luick <dean.luick@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IPoIB: Free ipoib neigh on path record failure so path rec queries are retriedRoland Dreier2013-02-25
| | | | | | | | | | | | | | If IPoIB fails to look up a path record (eg if it tries during an SM failover when one SM is dead but the new one hasn't taken over yet), the driver ends up with a neighbour structure but no address handle (AH). There's no mechanism to recover from this: any further packets sent to this destination will be silently dumped in ipoib_start_xmit(). Fix this by freeing the neighbour structures when a path rec query fails, so that the next packet queued to be sent will trigger a new path record query. Signed-off-by: Roland Dreier <roland@purestorage.com>
* IPoIB: Don't attempt to release resources on error flowItai Garbi2013-02-19
| | | | | | | | | If the ipoib client info isn't found on the _remove_one callback, we must not attempt to scan the returned null list. Found by Coverity. Signed-off-by: Itai Garbi <igarbi@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IPoIB: Add version and firmware info to ethtool reportingYan Burman2013-02-19
| | | | | | | | | Implement version info as well as report firmware version and bus info of the underlying IB HW device. Signed-off-by: Yan Burman <yanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IPoIB: Fix ipoib_neigh hashing to use the correct daddr octetsShlomo Pongratz2013-02-19
| | | | | | | | | | | | | | | | The hash function introduced in commit b63b70d877 ("IPoIB: Use a private hash table for path lookup in xmit path") was designd to use the 3 octets of the IPoIB HW address that holds the remote QPN. However, this currently isn't the case on little-endian machines, because the the code there uses the flags part (octet[0]) and not the last octet of the QPN (octet[3]). Fix this. The fix caused a checkpatch warning on line over 80 characters, to solve that changed the name of the temp variable that holds the daddr. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IPoIB: Fix crash due to skb double destructShlomo Pongratz2013-02-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | After commit b13912bbb4a2 ("IPoIB: Call skb_dst_drop() once skb is enqueued for sending"), using connected mode and running multithreaded iperf for long time, ie iperf -c <IP> -P 16 -t 3600 results in a crash. After the above-mentioned patch, the driver is calling skb_orphan() and skb_dst_drop() after calling post_send() in ipoib_cm.c::ipoib_cm_send() (also in ipoib_ib.c::ipoib_send()) The problem with this is, as is written in a comment in both routines, "it's entirely possible that the completion handler will run before we execute anything after the post_send()." This leads to running the skb cleanup routines simultaneously in two different contexts. The solution is to always perform the skb_orphan() and skb_dst_drop() before queueing the send work request. If an error occurs, then it will be no different than the regular case where dev_free_skb_any() in the completion path, which is assumed to be after these two routines. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IPoIB: Call skb_dst_drop() once skb is enqueued for sendingRoland Dreier2012-12-19
| | | | | | | | | | | | | | | | | | | | | Currently, IPoIB delays collecting send completions for TX packets in order to batch work more efficiently. It does skb_orphan() right after queuing the packets so that destructors run early, to avoid problems like holding socket send buffers for too long (since we might not collect a send completion until a long time after the packet is actually sent). However, IPoIB clears IFF_XMIT_DST_RELEASE because it actually looks at skb_dst() to update the PMTU when it gets a too-long packet. This means that the packets sitting in the TX ring with uncollected send completions are holding a reference on the dst. We've seen this lead to pathological behavior with respect to route and neighbour GC. The easy fix for this is to call skb_dst_drop() when we call skb_orphan(). Also, give packets sent via connected mode (CM) the same skb_orphan() / skb_dst_drop() treatment that packets sent via datagram mode get. Signed-off-by: Roland Dreier <roland@purestorage.com>
* IPoIB: Fix build with CONFIG_INFINIBAND_IPOIB_CM=nRoland Dreier2012-10-03
| | | | | | | | | | | | | | | | | | | With the new netlink support in commit 862096a8bbf8 ("IB/ipoib: Add more rtnl_link_ops callbacks") we need ipoib_set_mode() to be available even if connected mode isn't built. Move the function from ipoib_cm.c to ipoib_main.c (and make a few CM-related macros available unconditonally). This fixes the build error drivers/built-in.o: In function 'ipoib_changelink': ipoib_netlink.c:(.text+0x6a5fc9): undefined reference to 'ipoib_set_mode' ipoib_netlink.c:(.text+0x6a5fe3): undefined reference to 'ipoib_set_mode' when CONFIG_INFINIBAND_IPOIB_CM isn't set. Reported-by: Randy Dunlap <rdunlap@xenotime.net> Reported-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
* Merge tag 'rdma-for-linus' of ↵Linus Torvalds2012-10-02
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull infiniband updates from Roland Dreier: "First batch of InfiniBand/RDMA changes for the 3.7 merge window: - mlx4 IB support for SR-IOV - A couple of SRP initiator fixes - Batch of nes hardware driver fixes - Fix for long-standing use-after-free crash in IPoIB - Other miscellaneous fixes" This merge also removes a new use of __cancel_delayed_work(), and replaces it with the regular cancel_delayed_work() that is now irq-safe thanks to the workqueue updates. That said, I suspect the sequence in question should probably use "mod_delayed_work()". I just did the minimal "don't use deprecated functions" fixup, though. * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (45 commits) IB/qib: Fix local access validation for user MRs mlx4_core: Disable SENSE_PORT for multifunction devices mlx4_core: Clean up enabling of SENSE_PORT for older (ConnectX-1/-2) HCAs mlx4_core: Stash PCI ID driver_data in mlx4_priv structure IB/srp: Avoid having aborted requests hang IB/srp: Fix use-after-free in srp_reset_req() IB/qib: Add a qib driver version RDMA/nes: Fix compilation error when nes_debug is enabled RDMA/nes: Print hardware resource type RDMA/nes: Fix for crash when TX checksum offload is off RDMA/nes: Cosmetic changes RDMA/nes: Fix for incorrect MSS when TSO is on RDMA/nes: Fix incorrect resolving of the loopback MAC address mlx4_core: Fix crash on uninitialized priv->cmd.slave_sem mlx4_core: Trivial cleanups to driver log messages mlx4_core: Trivial readability fix: "0X30" -> "0x30" IB/mlx4: Create paravirt contexts for VFs when master IB driver initializes mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculations mlx4: Paravirtualize Node Guids for slaves mlx4: Activate SR-IOV mode for IB ...
| * IPoIB: Fix use-after-free of multicast objectPatrick McHardy2012-09-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix a crash in ipoib_mcast_join_task(). (with help from Or Gerlitz) Commit c8c2afe360b7 ("IPoIB: Use rtnl lock/unlock when changing device flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which is run from the ipoib_workqueue, and hence the workqueue can't be flushed from the context of ipoib_stop(). In the current code, ipoib_stop() (which doesn't flush the workqueue) calls ipoib_mcast_dev_flush(), which goes and deletes all the multicast entries. This takes place without any synchronization with a possible running instance of ipoib_mcast_join_task() for the same ipoib device, leading to a crash due to NULL pointer dereference. Fix this by making sure that the workqueue is flushed before ipoib_mcast_dev_flush() is called. To make that possible, we move the RTNL-lock wrapped code to ipoib_mcast_join_finish(). Signed-off-by: Patrick McHardy <kaber@trash.net> Cc: <stable@vger.kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
* | IB/ipoib: Add more rtnl_link_ops callbacksOr Gerlitz2012-10-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the rtnl_link_ops changelink and fill_info callbacks, through which the admin can now set/get the driver mode, etc policies. Maintain the proprietary sysfs entries only for legacy childs. For child devices, set dev->iflink to point to the parent device ifindex, such that user space tools can now correctly show the uplink relation as done for vlan, macvlan, etc devices. Pointed out by Patrick McHardy <kaber@trash.net> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2012-09-28
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/team/team.c drivers/net/usb/qmi_wwan.c net/batman-adv/bat_iv_ogm.c net/ipv4/fib_frontend.c net/ipv4/route.c net/l2tp/l2tp_netlink.c The team, fib_frontend, route, and l2tp_netlink conflicts were simply overlapping changes. qmi_wwan and bat_iv_ogm were of the "use HEAD" variety. With help from Antonio Quartulli. Signed-off-by: David S. Miller <davem@davemloft.net>
| * IPoIB: Fix AB-BA deadlock when deleting neighboursShlomo Pongratz2012-09-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lockdep points out a circular locking dependency betwwen the ipoib device priv spinlock (priv->lock) and the neighbour table rwlock (ntbl->rwlock). In the normal path, ie neigbour garbage collection task, the neigh table rwlock is taken first and then if the neighbour needs to be deleted, priv->lock is taken. However in some error paths, such as in ipoib_cm_handle_tx_wc(), priv->lock is taken first and then ipoib_neigh_free routine is called which in turn takes the neighbour table ntbl->rwlock. The solution is to get rid the neigh table rwlock completely and use only priv->lock. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| * IPoIB: Fix memory leak in the neigh table deletion flowShlomo Pongratz2012-09-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the neighbours hash table is empty when unloading the module, then ipoib_flush_neighs(), the cleanup routine, isn't called and the memory used for the hash table itself leaked. To fix this, ipoib_flush_neighs() is allways called, and another completion object is added to signal when the table is freed. Once invoked, ipoib_flush_neighs() flushes all the neighbours (if there are any), calls the the hash table RCU free routine, which now signals completion of the deletion process, and waits for the last neighbour to be freed. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* | IB/ipoib: Add rtnl_link_ops supportOr Gerlitz2012-09-20
|/ | | | | | | | | | | | | | | | | | Add rtnl_link_ops to IPoIB, with the first usage being child device create/delete through them. Childs devices are now either legacy ones, created/deleted through the ipoib sysfs entries, or RTNL ones. Adding support for RTNL childs involved refactoring of ipoib_vlan_add which is now used by both the sysfs and the link_ops code. Also, added ndo_uninit entry to support calling unregister_netdevice_queue from the rtnl dellink entry. This required removal of calls to ipoib_dev_cleanup from the driver in flows which use unregister_netdevice, since the networking core will invoke ipoib_uninit which does exactly that. Signed-off-by: Erez Shitrit <erezsh@mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* IB/ipoib: Fix RCU pointer dereference of wrong objectShlomo Pongratz2012-08-14
| | | | | | | | | | | Commit b63b70d87741 ("IPoIB: Use a private hash table for path lookup in xmit path") introduced a bug where in ipoib_neigh_free() (which is called from a few errors flows in the driver), rcu_dereference() is invoked with the wrong pointer object, which results in a crash. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IB/ipoib: Add missing locking when CM object is deletedShlomo Pongratz2012-08-14
| | | | | | | | | | | | | | | | Commit b63b70d87741 ("IPoIB: Use a private hash table for path lookup in xmit path") introduced a bug where in ipoib_cm_destroy_tx() a CM object is moved between lists without any supported locking. Under a stress test, this eventually leads to list corruption and a crash. Previously when this routine was called, callers were taking the device priv lock. Currently this function is called from the RCU callback associated with neighbour deletion. Fix the race by taking the same lock we used to before. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IPoIB: Use a private hash table for path lookup in xmit pathShlomo Pongratz2012-07-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dave Miller <davem@davemloft.net> provided a detailed description of why the way IPoIB is using neighbours for its own ipoib_neigh struct is buggy: Any time an ipoib_neigh is changed, a sequence like the following is made: spin_lock_irqsave(&priv->lock, flags); /* * It's safe to call ipoib_put_ah() inside * priv->lock here, because we know that * path->ah will always hold one more reference, * so ipoib_put_ah() will never do more than * decrement the ref count. */ if (neigh->ah) ipoib_put_ah(neigh->ah); list_del(&neigh->list); ipoib_neigh_free(dev, neigh); spin_unlock_irqrestore(&priv->lock, flags); ipoib_path_lookup(skb, n, dev); This doesn't work, because you're leaving a stale pointer to the freed up ipoib_neigh in the special neigh->ha pointer cookie. Yes, it even fails with all the locking done to protect _changes_ to *ipoib_neigh(n), and with the code in ipoib_neigh_free() that NULLs out the pointer. The core issue is that read side calls to *to_ipoib_neigh(n) are not being synchronized at all, they are performed without any locking. So whether we hold the lock or not when making changes to *ipoib_neigh(n) you still can have threads see references to freed up ipoib_neigh objects. cpu 1 cpu 2 n = *ipoib_neigh() *ipoib_neigh() = NULL kfree(n) n->foo == OOPS [..] Perhaps the ipoib code can have a private path database it manages entirely itself, which holds all the necessary information and is looked up by some generic key which is available easily at transmit time and does not involve generic neighbour entries. See <http://marc.info/?l=linux-rdma&m=132812793105624&w=2> and <http://marc.info/?l=linux-rdma&w=2&r=1&s=allows+references+to+freed+memory&q=b> for the full discussion. This patch aims to solve the race conditions found in the IPoIB driver. The patch removes the connection between the core networking neighbour structure and the ipoib_neigh structure. In addition to avoiding the race described above, it allows us to handle SKBs carrying IP packets that don't have any associated neighbour. We add an ipoib_neigh hash table with N buckets where the key is the destination hardware address. The ipoib_neigh is fetched from the hash table and instead of the stashed location in the neighbour structure. The hash table uses both RCU and reference counting to guarantee that no ipoib_neigh instance is ever deleted while in use. Fetching the ipoib_neigh structure instance from the hash also makes the special code in ipoib_start_xmit that handles remote and local bonding failover redundant. Aged ipoib_neigh instances are deleted by a garbage collection task that runs every M seconds and deletes every ipoib_neigh instance that was idle for at least 2*M seconds. The deletion is safe since the ipoib_neigh instances are protected using RCU and reference count mechanisms. The number of buckets (N) and frequency of running the GC thread (M), are taken from the exported arb_tbl. Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* Merge tag 'rdma-for-3.6' of ↵Linus Torvalds2012-07-24
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband Pull InfiniBand/RDMA changes from Roland Dreier: - Updates to the qib low-level driver - First chunk of changes for SR-IOV support for mlx4 IB - RDMA CM support for IPv6-only binding - Other misc cleanups and fixes Fix up some add-add conflicts in include/linux/mlx4/device.h and drivers/net/ethernet/mellanox/mlx4/main.c * tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (30 commits) IB/qib: checkpatch fixes IB/qib: Add congestion control agent implementation IB/qib: Reduce sdma_lock contention IB/qib: Fix an incorrect log message IB/qib: Fix QP RCU sparse warnings mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct and paravirtualize them mlx4_core: Allow guests to have IB ports mlx4_core: Implement mechanism for reserved Q_Keys net/mlx4_core: Free ICM table in case of error IB/cm: Destroy idr as part of the module init error flow mlx4_core: Remove double function declarations IB/mlx4: Fill the masked_atomic_cap attribute in query device IB/mthca: Fill in sq_sig_type in query QP IB/mthca: Warning about event for non-existent QPs should show event type IB/qib: Fix sparse RCU warnings in qib_keys.c net/mlx4_core: Initialize IB port capabilities for all slaves mlx4: Use port management change event instead of smp_snoop IB/qib: RCU locking for MR validation IB/qib: Avoid returning EBUSY from MR deregister IB/qib: Fix UC MR refs for immediate operations ...
| * IB: Use IS_ENABLED(CONFIG_IPV6)Roland Dreier2012-07-08
| | | | | | | | | | | | Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) Signed-off-by: Roland Dreier <roland@purestorage.com>
* | net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}()David S. Miller2012-07-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This will be used so that we can compose a full flow key. Even though we have a route in this context, we need more. In the future the routes will be without destination address, source address, etc. keying. One ipv4 route will cover entire subnets, etc. In this environment we have to have a way to possess persistent storage for redirects and PMTU information. This persistent storage will exist in the FIB tables, and that's why we'll need to be able to rebuild a full lookup flow key here. Using that flow key will do a fib_lookup() and create/update the persistent entry. Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2012-07-11
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: net/batman-adv/bridge_loop_avoidance.c net/batman-adv/bridge_loop_avoidance.h net/batman-adv/soft-interface.c net/mac80211/mlme.c With merge help from Antonio Quartulli (batman-adv) and Stephen Rothwell (drivers/net/usb/qmi_wwan.c). The net/mac80211/mlme.c conflict seemed easy enough, accounting for a conversion to some new tracing macros. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | IPoIB: fix skb truesize underestimatiomEric Dumazet2012-07-11
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Or Gerlitz reported triggering of WARN_ON_ONCE(delta < len); in skb_try_coalesce() This warning tracks drivers that incorrectly set skb->truesize IPoIB indeed allocates a full page to store a fragment, but only accounts in skb->truesize the used part of the page (frame length) This patch fixes skb truesize underestimation, and also fixes a performance issue, because RX skbs have not enough tailroom to allow IP and TCP stacks to pull their header in skb linear part without an expensive call to pskb_expand_head() Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Or Gerlitz <ogerlitz@mellanox.com> Cc: Erez Shitrit <erezsh@mellanox.com> Cc: Shlomo Pongartz <shlomop@mellanox.com> Cc: Roland Dreier <roland@purestorage.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipoib: Need to do dst_neigh_lookup_skb() outside of priv->lock.David S. Miller2012-07-06
| | | | | | | | | | | | | | Otherwise local_bh_enable() complains. Reported-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | ipoib: Convert over to dev_lookup_neigh_skb().David S. Miller2012-07-05
|/ | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* IB: Change CQE "csum_ok" field to a bit flagOr Gerlitz2012-03-08
| | | | | | | | | | | | | | | | | Use a bit in wc_flags rather then a whole integer to hold the "checksum OK" flag. By itself, this change doesn't reduce the size of struct ib_wc on 64bit machines -- it stays on 56 bytes because of padding. However, it will allow to add more fields in the future without enlarging the struct. Also, it will let us have a unified approach with future libibverbs checksum offload reporting, because a bit flag doesn't break the library ABI. This patch was suggested during conversation with Liran Liss <liranl@mellanox.com>. Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL addressesRoland Dreier2012-02-08
| | | | | | | | | | | | | Commit a0417fa3a18a ("net: Make qdisc_skb_cb upper size bound explicit.") made it possible for a netdev driver to use skb->cb between its header_ops.create method and its .ndo_start_xmit method. Use this in ipoib_hard_header() to stash away the LL address (GID + QPN), instead of the "ipoib_pseudoheader" hack. This allows IPoIB to stop lying about its hard_header_len, which will let us fix the L2 check for GRO. Signed-off-by: Roland Dreier <roland@purestorage.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* infiniband: ipoib: Sanitize neighbour handling in ipoib_main.cDavid Miller2011-12-05
| | | | | | | | | | | | | | Reduce the number of dst_get_neighbour_noref() calls within a single call chain. Primarily by passing the neighbour pointer down to the helper functions. Handle dst_get_neighbour_noref() returning NULL in ipoib_start_xmit() by incrementing the dropped counter and freeing the packet. We don't want it to fall through into the ARP/RARP/multicast handling, since that should only happen when skb_dst() is NULL. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Roland Dreier <roland@purestorage.com>
* net: Rename dst_get_neighbour{, _raw} to dst_get_neighbour_noref{, _raw}.David Miller2011-12-05
| | | | | | | | To reflect the fact that a refrence is not obtained to the resulting neighbour entry. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Roland Dreier <roland@purestorage.com>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2011-12-02
|\
| *-. Merge branches 'cxgb4', 'ipoib', 'misc' and 'qib' into for-nextRoland Dreier2011-11-29
| |\ \
| | | * IB: Fix RCU lockdep splatsEric Dumazet2011-11-29
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit f2c31e32b37 ("net: fix NULL dereferences in check_peer_redir()") forgot to take care of infiniband uses of dst neighbours. Many thanks to Marc Aurele who provided a nice bug report and feedback. Reported-by: Marc Aurele La France <tsi@ualberta.ca> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Miller <davem@davemloft.net> Cc: <stable@kernel.org> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | * IB/ipoib: Prevent hung task or softlockup processing multicast responseMike Marciniszyn2011-11-29
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This following can occur with ipoib when processing a multicast reponse: BUG: soft lockup - CPU#0 stuck for 67s! [ib_mad1:982] Modules linked in: ... CPU 0: Modules linked in: ... Pid: 982, comm: ib_mad1 Not tainted 2.6.32-131.0.15.el6.x86_64 #1 ProLiant DL160 G5 RIP: 0010:[<ffffffff814ddb27>] [<ffffffff814ddb27>] _spin_unlock_irqrestore+0x17/0x20 RSP: 0018:ffff8802119ed860 EFLAGS: 00000246 0000000000000004 RBX: ffff8802119ed860 RCX: 000000000000a299 RDX: ffff88021086c700 RSI: 0000000000000246 RDI: 0000000000000246 RBP: ffffffff8100bc8e R08: ffff880210ac229c R09: 0000000000000000 R10: ffff88021278aab8 R11: 0000000000000000 R12: ffff8802119ed860 R13: ffffffff8100be6e R14: 0000000000000001 R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b CR2: 00000000006d4840 CR3: 0000000209aa5000 CR4: 00000000000406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Call Trace: [<ffffffffa032c247>] ? ipoib_mcast_send+0x157/0x480 [ib_ipoib] [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20 [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20 [<ffffffffa03283d4>] ? ipoib_path_lookup+0x124/0x2d0 [ib_ipoib] [<ffffffffa03286fc>] ? ipoib_start_xmit+0x17c/0x430 [ib_ipoib] [<ffffffff8141e758>] ? dev_hard_start_xmit+0x2c8/0x3f0 [<ffffffff81439d0a>] ? sch_direct_xmit+0x15a/0x1c0 [<ffffffff81423098>] ? dev_queue_xmit+0x388/0x4d0 [<ffffffffa032d6b7>] ? ipoib_mcast_join_finish+0x2c7/0x510 [ib_ipoib] [<ffffffffa032dab8>] ? ipoib_mcast_sendonly_join_complete+0x1b8/0x1f0 [ib_ipoib] [<ffffffffa02a0946>] ? mcast_work_handler+0x1a6/0x710 [ib_sa] [<ffffffffa015f01e>] ? ib_send_mad+0xfe/0x3c0 [ib_mad] [<ffffffffa00f6c93>] ? ib_get_cached_lmc+0xa3/0xb0 [ib_core] [<ffffffffa02a0f9b>] ? join_handler+0xeb/0x200 [ib_sa] [<ffffffffa029e4fc>] ? ib_sa_mcmember_rec_callback+0x5c/0xa0 [ib_sa] [<ffffffffa029e79c>] ? recv_handler+0x3c/0x70 [ib_sa] [<ffffffffa01603a4>] ? ib_mad_completion_handler+0x844/0x9d0 [ib_mad] [<ffffffffa015fb60>] ? ib_mad_completion_handler+0x0/0x9d0 [ib_mad] [<ffffffff81088830>] ? worker_thread+0x170/0x2a0 [<ffffffff8108e160>] ? autoremove_wake_function+0x0/0x40 [<ffffffff810886c0>] ? worker_thread+0x0/0x2a0 [<ffffffff8108ddf6>] ? kthread+0x96/0xa0 [<ffffffff8100c1ca>] ? child_rip+0xa/0x20 Coinciding with stack trace is the following message: ib0: ib_address_create failed The code below in ipoib_mcast_join_finish() will note the above failure in the address handle but otherwise continue: ah = ipoib_create_ah(dev, priv->pd, &av); if (!ah) { ipoib_warn(priv, "ib_address_create failed\n"); } else { The while loop at the bottom of ipoib_mcast_join_finish() will attempt to send queued multicast packets in mcast->pkt_queue and eventually end up in ipoib_mcast_send(): if (!mcast->ah) { if (skb_queue_len(&mcast->pkt_queue) < IPOIB_MAX_MCAST_QUEUE) skb_queue_tail(&mcast->pkt_queue, skb); else { ++dev->stats.tx_dropped; dev_kfree_skb_any(skb); } My read is that the code will requeue the packet and return to the ipoib_mcast_join_finish() while loop and the stage is set for the "hung" task diagnostic as the while loop never sees a non-NULL ah, and will do nothing to resolve. There are GFP_ATOMIC allocates in the provider routines, so this is possible and should be dealt with. The test that induced the failure is associated with a host SM on the same server during a shutdown. This patch causes ipoib_mcast_join_finish() to exit with an error which will flush the queued mcast packets. Nothing is done to unwind the QP attached state so that subsequent sends from above will retry the join. Reviewed-by: Ram Vepa <ram.vepa@qlogic.com> Reviewed-by: Gary Leshner <gary.leshner@qlogic.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
* | neigh: Add infrastructure for allocating device neigh privates.David Miller2011-11-30
| | | | | | | | | | | | | | | | | | netdev->neigh_priv_len records the private area length. This will trigger for neigh_table objects which set tbl->entry_size to zero, and the first instances of this will be forthcoming. Signed-off-by: David S. Miller <davem@davemloft.net>
* | infiniband: Update net drivers for netdev_features_t changes.David S. Miller2011-11-16
|/ | | | Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'modsplit-Oct31_2011' of ↵Linus Torvalds2011-11-06
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits) Revert "tracing: Include module.h in define_trace.h" irq: don't put module.h into irq.h for tracking irqgen modules. bluetooth: macroize two small inlines to avoid module.h ip_vs.h: fix implicit use of module_get/module_put from module.h nf_conntrack.h: fix up fallout from implicit moduleparam.h presence include: replace linux/module.h with "struct module" wherever possible include: convert various register fcns to macros to avoid include chaining crypto.h: remove unused crypto_tfm_alg_modname() inline uwb.h: fix implicit use of asm/page.h for PAGE_SIZE pm_runtime.h: explicitly requires notifier.h linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h miscdevice.h: fix up implicit use of lists and types stop_machine.h: fix implicit use of smp.h for smp_processor_id of: fix implicit use of errno.h in include/linux/of.h of_platform.h: delete needless include <linux/module.h> acpi: remove module.h include from platform/aclinux.h miscdevice.h: delete unnecessary inclusion of module.h device_cgroup.h: delete needless include <linux/module.h> net: sch_generic remove redundant use of <linux/module.h> net: inet_timewait_sock doesnt need <linux/module.h> ... Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in - drivers/media/dvb/frontends/dibx000_common.c - drivers/media/video/{mt9m111.c,ov6650.c} - drivers/mfd/ab3550-core.c - include/linux/dmaengine.h
| * infiniband: add moduleparam.h to drivers/infiniband as requiredPaul Gortmaker2011-10-31
| | | | | | | | | | | | | | | | These files were getting the moduleparam infrastructure from the implicit presence of module.h being everywhere, but that is going away soon. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
| * infiniband: add in export.h for files using EXPORT_SYMBOL/THIS_MODULEPaul Gortmaker2011-10-31
| | | | | | | | | | | | | | | | | | These were getting it implicitly via device.h --> module.h but we are going to stop that when we clean up the headers. Fix these in advance so the tree remains biscect-clean. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* | Merge branch 'for-linus' of ↵Linus Torvalds2011-11-01
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (62 commits) mlx4_core: Deprecate log_num_vlan module param IB/mlx4: Don't set VLAN in IBoE WQEs' control segment IB/mlx4: Enable 4K mtu for IBoE RDMA/cxgb4: Mark QP in error before disabling the queue in firmware RDMA/cxgb4: Serialize calls to CQ's comp_handler RDMA/cxgb3: Serialize calls to CQ's comp_handler IB/qib: Fix issue with link states and QSFP cables IB/mlx4: Configure extended active speeds mlx4_core: Add extended port capabilities support IB/qib: Hold links until tuning data is available IB/qib: Clean up checkpatch issue IB/qib: Remove s_lock around header validation IB/qib: Precompute timeout jiffies to optimize latency IB/qib: Use RCU for qpn lookup IB/qib: Eliminate divide/mod in converting idx to egr buf pointer IB/qib: Decode path MTU optimization IB/qib: Optimize RC/UC code by IB operation IPoIB: Use the right function to do DMA unmap pages RDMA/cxgb4: Use correct QID in insert_recv_cqe() RDMA/cxgb4: Make sure flush CQ entries are collected on connection close ...
| *---. Merge branches 'amso1100', 'cma', 'cxgb3', 'cxgb4', 'fdr', 'ipath', 'ipoib', ↵Roland Dreier2011-11-01
| |\ \ \ | | | | | | | | | | | | | | | 'misc', 'mlx4', 'misc', 'nes', 'qib' and 'xrc' into for-next
| | | | * RDMA/core: Add SRQ type fieldSean Hefty2011-10-13
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, there is only a single ("basic") type of SRQ, but with XRC support we will add a second. Prepare for this by defining an SRQ type and setting all current users to IB_SRQT_BASIC. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | | * IPoIB: Use the right function to do DMA unmap pagesDotan Barak2011-10-18
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | Pages that were mapped using ib_dma_map_page() should be unmapped using ib_dma_unmap_page(). Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>
| | * IPoIB: Handle extended rates in debugfsMarcel Apfelbaum2011-10-11
| |/ | | | | | | | | | | | | | | Use new function ib_rate_to_mbps() to handle printing rate in debugfs, so that we handle extended rates. Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il> Signed-off-by: Roland Dreier <roland@purestorage.com>
* | net: add skb frag size accessorsEric Dumazet2011-10-19
| | | | | | | | | | | | | | | | | | | | | | To ease skb->truesize sanitization, its better to be able to localize all references to skb frags size. Define accessors : skb_frag_size() to fetch frag size, and skb_frag_size_{set|add|sub}() to manipulate it. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>