aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/infiniband/ulp/ipoib/ipoib_ib.c
Commit message (Collapse)AuthorAge
* IPoIB: Trivial formatting cleanupsRoland Dreier2008-01-25
| | | | | | Fix whitespace blunders, convert "foo* bar" to "foo *bar", etc. Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IPoIB/cm: Use common CQ for CM send completionsMichael S. Tsirkin2007-10-20
| | | | | | | | | | | | | | | | Use the same CQ for CM send completions as for all other IPoIB completions. This means all completions are processed via the same NAPI polling routine. This should help reduce the number of interrupts for bi-directional traffic (such as TCP) and fixes "driver is hogging interrupts" errors reported for IPoIB send side, e.g. <https://bugs.openfabrics.org/show_bug.cgi?id=508> To do this, keep a per-interface counter of outstanding send WRs, and stop the interface when this counter reaches the send queue size to avoid CQ overruns. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IPoIB: Use round_jiffies() for ah_reap_taskAnton Blanchard2007-10-16
| | | | | | | | Use round_jiffies() to align the 1 second ah_reap_task with other work and potentially save power by sleeping cores for longer. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* Merge branch 'for-linus' of ↵Linus Torvalds2007-10-11
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (87 commits) mlx4_core: Fix section mismatches IPoIB: Allow setting policy to ignore multicast groups IB/mthca: Mark error paths as unlikely() in post_srq_recv functions IB/ipath: Minor fix to ordering of freeing and zeroing of tid pages. IB/ipath: Remove redundant link state checks IB/ipath: Fix IB_EVENT_PORT_ERR event IB/ipath: Better handling of unexpected GPIO interrupts IB/ipath: Maintain active time on all chips IB/ipath: Fix QHT7040 serial number check IB/ipath: Indicate a couple of chip bugs to userspace IB/ipath: iba6110 rev4 no longer needs recv header overrun workaround IB/ipath: Use counters in ipath_poll and cleanup interrupts in ipath_close IB/ipath: Remove duplicate copy of LMC IB/ipath: Add ability to set the LMC via the sysfs debugging interface IB/ipath: Optimize completion queue entry insertion and polling IB/ipath: Implement IB_EVENT_QP_LAST_WQE_REACHED IB/ipath: Generate flush CQE when QP is in error state IB/ipath: Remove redundant code IB/ipath: Future proof eeprom checksum code (contents reading) IB/ipath: UC RDMA WRITE with IMMEDIATE doesn't send the immediate ...
| * IPoIB: Make sure no receives are handled when stopping deviceRoland Dreier2007-10-09
| | | | | | | | | | | | | | | | | | | | The current IPoIB code might process receive completions from ipoib_drain_cq() when bringing down the interface. This could cause packets to be passed up the stack without the device's poll method being called. Avoid this by setting the status of any successful completions to IB_WC_WR_FLUSH_ERR. Signed-off-by: Roland Dreier <rolandd@cisco.com>
* | [IPoIB]: Convert to netdevice internal statsRoland Dreier2007-10-10
| | | | | | | | | | | | | | | | | | Use the stats member of struct netdevice in IPoIB, so we can save memory by deleting the stats member of struct ipoib_dev_priv, and save code by deleting ipoib_get_stats(). Signed-off-by: Roland Dreier <rolandd@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | [NET]: Make NAPI polling independent of struct net_device objects.Stephen Hemminger2007-10-10
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several devices have multiple independant RX queues per net device, and some have a single interrupt doorbell for several queues. In either case, it's easier to support layouts like that if the structure representing the poll is independant from the net device itself. The signature of the ->poll() call back goes from: int foo_poll(struct net_device *dev, int *budget) to int foo_poll(struct napi_struct *napi, int budget) The caller is returned the number of RX packets processed (or the number of "NAPI credits" consumed if you want to get abstract). The callee no longer messes around bumping dev->quota, *budget, etc. because that is all handled in the caller upon return. The napi_struct is to be embedded in the device driver private data structures. Furthermore, it is the driver's responsibility to disable all NAPI instances in it's ->stop() device close handler. Since the napi_struct is privatized into the driver's private data structures, only the driver knows how to get at all of the napi_struct instances it may have per-device. With lots of help and suggestions from Rusty Russell, Roland Dreier, Michael Chan, Jeff Garzik, and Jamal Hadi Salim. Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra, Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan. [ Ported to current tree and all drivers converted. Integrated Stephen's follow-on kerneldoc additions, and restored poll_list handling to the old style to fix mutual exclusion issues. -DaveM ] Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* IPoIB: Recycle loopback skbs instead of freeing and reallocatingRoland Dreier2007-07-10
| | | | | | | | | | | | | | InfiniBand HCAs replicate multicast packets back to the QP that sent them if that QP is attached to the destination multicast group. This means that IPoIB multicasts are often replicated back to the receive queue of the interface that generated them. To avoid confusing the network stack, we drop these duplicates within the IPoIB driver. However, there's no reason to free the skb that received the duplicate and then immediately allocate a new skb to post to the receive queue. We can be more efficient and just repost the same skb. Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IPoIB/cm: Drain cq in ipoib_cm_dev_stop()Michael S. Tsirkin2007-05-24
| | | | | | | | | Since NAPI polling is disabled while ipoib_cm_dev_stop() is running, ipoib_cm_dev_stop() must poll the CQ itself in order to see the packets draining. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IB/ipoib: Fix typos in error messagesMichael S. Tsirkin2007-05-21
| | | | | | | Trivial error message fixups. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IPoIB: Handle P_Key table reorderingYosef Etigin2007-05-19
| | | | | | | | | | | | SM reconfiguration or failover possibly causes a shuffling of the values in the P_Key table. Right now, IPoIB only queries for the P_Key index once when it creates the device QP, and hence there are problems if the index of a P_Key value changes. Fix this by using the PKEY_CHANGE event to trigger a recheck of the P_Key index. Signed-off-by: Yosef Etigin <yosefe@voltaire.com> Acked-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IPoIB: Convert to NAPIRoland Dreier2007-05-07
| | | | | | | | | Convert the IP-over-InfiniBand network device driver over to using NAPI to handle completions for the main CQ. This covers all receives as well as datagram mode sends; send completions for connected mode connections are still handled from interrupt context. Signed-off-by: Roland Dreier <rolandd@cisco.com>
* Merge branch 'for-linus' of ↵Linus Torvalds2007-04-27
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband: (49 commits) IB: Set class_dev->dev in core for nice device symlink IB/ehca: Implement modify_port IB/umad: Clarify documentation of transaction ID IPoIB/cm: spin_lock_irqsave() -> spin_lock_irq() replacements IB/mad: Change SMI to use enums rather than magic return codes IB/umad: Implement GRH handling for sent/received MADs IB/ipoib: Use ib_init_ah_from_path to initialize ah_attr IB/sa: Set src_path_bits correctly in ib_init_ah_from_path() IB/ucm: Simplify ib_ucm_event() RDMA/ucma: Simplify ucma_get_event() IB/mthca: Simplify CQ cleaning in mthca_free_qp() IB/mthca: Fix mthca_write_mtt() on HCAs with hidden memory IB/mthca: Update HCA firmware revisions IB/ipath: Fix WC format drift between user and kernel space IB/ipath: Check that a UD work request's address handle is valid IB/ipath: Remove duplicate stuff from ipath_verbs.h IB/ipath: Check reserved memory keys IB/ipath: Fix unit selection when all CPU affinity bits set IB/ipath: Don't allow QPs 0 and 1 to be opened multiple times IB/ipath: Disable IB link earlier in shutdown sequence ...
| * IPoIB: Remove pointless opcode field from debugging outputRoland Dreier2007-04-18
| | | | | | | | | | | | | | | | | | There's no point in printing the opcode field in the completion handling debugging output, since the type of completion is already printed at the beginning of the line. In fact the opcode field is not even defined for completions with a status other than success. Signed-off-by: Roland Dreier <rolandd@cisco.com>
* | [SK_BUFF]: Introduce skb_reset_mac_header(skb)Arnaldo Carvalho de Melo2007-04-26
|/ | | | | | | | | | | | For the common, open coded 'skb->mac.raw = skb->data' operation, so that we can later turn skb->mac.raw into a offset, reducing the size of struct sk_buff in 64bit land while possibly keeping it as a pointer on 32bit. This one touches just the most simple case, next will handle the slightly more "complex" cases. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* IB/ipoib: Fix thinko in packet length checksMichael S. Tsirkin2007-03-22
| | | | | | | | | | | | | | | | | | The packet length checks in ipoib are broken: we add 4 bytes (IPoIB encapsulation header) when sending a packet, not 20 bytes (hardware address length) to each packet. Therefore, if connected mode is enabled so that the interface MTU is larger than the multicast MTU, IPoIB may end up trying to send too-long multicast packets. For example, multicast is broken if a message of size 2048 bytes is sent on an interface with UD MTU 2048, because 2048 is bigger than the real limit of 2044 but the code tests against the wrong limit of 2060. This patch fixes <https://bugs.openfabrics.org/show_bug.cgi?id=418>, submitted by Scott Weitzenkamp <sweitzen@cisco.com>. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IPoIB: Connected mode experimental supportMichael S. Tsirkin2007-02-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | The following patch adds experimental support for IPoIB connected mode, as defined by the draft from the IETF ipoib working group. The idea is to increase performance by increasing the MTU from the maximum of 2K (theoretically 4K) supported by IPoIB on top of UD. With this code, I'm able to get 800MByte/sec or more with netperf without options on a Mellanox 4x back-to-back DDR system. Some notes on code: 1. SRQ is used for scalability to large cluster sizes 2. Only RC connections are used (UC does not support SRQ now) 3. Retry count is set to 0 since spec draft warns against retries 4. Each connection is used for data transfers in only 1 direction, so each connection is either active(TX) or passive (RX). 2 sides that want to communicate create 2 connections. 5. Each active (TX) connection has a separate CQ for send completions - this keeps the code simple without CQ resize and other tricks 6. To detect stale passive side connections (where the remote side is down), we keep an LRU list of passive connections (updated once per second per connection) and destroy a connection after it has been unused for several seconds. The LRU rule makes it possible to avoid scanning connections that have recently been active. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IPoIB: Use the new verbs DMA mapping functionsRalph Campbell2006-12-12
| | | | | | | | Convert IPoIB to use the new DMA mapping functions for kernel verbs consumers. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* WorkStruct: make allyesconfigDavid Howells2006-11-22
| | | | | | Fix up for make allyesconfig. Signed-Off-By: David Howells <dhowells@redhat.com>
* IPoIB: Check for DMA mapping error for TX packetsRoland Dreier2006-10-10
| | | | Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IPoIB: Add some likely/unlikely annotations in hot pathEli Cohen2006-09-22
| | | | | Signed-off-by: Eli Cohen <eli@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IPoIB: Rejoin all multicast groups after a port eventEli Cohen2006-09-22
| | | | | | | | | | | | | When ipoib_ib_dev_flush() is called because of a port event, the driver needs to rejoin all multicast groups, since the flush will call ipoib_mcast_dev_flush() (via ipoib_ib_dev_down()). Otherwise no (non-broadcast) multicast groups will be rejoined until the networking core calls ->set_multicast_list again, and so multicast reception will be broken for potentially a long time. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IPoIB: Refactor completion handlingRoland Dreier2006-09-22
| | | | | | | | Split up ipoib_ib_handle_wc() into ipoib_ib_handle_rx_wc() and ipoib_ib_handle_tx_wc() to make the code easier to read. This will also help implement NAPI in the future. Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IPoIB: Avoid using stale last_send counter when reaping AHsRoland Dreier2006-06-17
| | | | | | | | | | | | The comparisons of priv->tx_tail to ah->last_send in ipoib_free_ah() and ipoib_post_receive() are slightly unsafe, because priv->tx_lock is not held and hence a stale value of ah->last_send might be used, which would lead to freeing an AH before the driver was really done with it. The simple way to fix this is to the optimization of early free from ipoib_free_ah() and unconditionally queue AHs for reaping, and then take priv->tx_lock in __ipoib_reap_ah(). Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IPoIB: Fix AH leak at interface downEli Cohen2006-06-05
| | | | | | | | | | | | | | | When ipoib_stop() is called it first calls netif_stop_queue() to stop the kernel from passing more packets to the network driver. However, the completion handler may call netif_wake_queue() re-enabling packet transfer. This might result in leaks (we see AH leaks which we think can be attributed to this bug) as new packets get posted while the interface is going down. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Michael Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IPoIB: Make send and receive queue sizes tunableShirley Ma2006-04-10
| | | | | | | | | | | Make IPoIB's send and receive queue sizes tunable via module parameters ("send_queue_size" and "recv_queue_size"). This allows the queue sizes to be enlarged to fix disastrously bad performance on some platforms and workloads, without bloating memory usage when large queues aren't needed. Signed-off-by: Shirley Ma <xma@us.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IPoIB: P_Key change event handlingLeonid Arsh2006-03-24
| | | | | | | | | | | | | | | | | | | | | | | | This patch causes the network interface to respond to P_Key change events correctly. As a result, you'll see a child interface in the "RUNNING" state (netif_carrier_on()) only when the corresponding P_Key is configured by the SM. When SM removes a P_Key, the "RUNNING" state will be disabled for the corresponding network interface. To implement this, I added IB_EVENT_PKEY_CHANGE event handling. To prevent flushing the device before the device is open by the "delay open" mechanism, I added an additional device flag called IPOIB_FLAG_INITIALIZED. This also prevents the child network interface from trying to join to multicast groups until the PKEY is configured. We used to get error messages like: ib0.f2f2: couldn't attach QP to multicast group ff12:401b:f2f2:0:0:0:ffff:ffff in this case. To fix this, I just check IPOIB_FLAG_OPER_UP flag in ipoib_set_mcast_list(). Signed-off-by: Leonid Arsh <leonida@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IPoIB: Pass correct pointer when flushing child interfacesLeonid Arsh2006-03-24
| | | | | | | ipoib_ib_dev_flush() should get passed cpriv->dev, not &cpriv->dev. Signed-off-by: Leonid Arsh <leonida@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IPoIB: Move ipoib_ib_dev_flush() to ipoib workqueueJack Morgenstein2006-03-20
| | | | | | | | | | | | | | Move ipoib_ib_dev_flush() to ipoib's workqueue. This keeps it ordered with respect to other work scheduled by the ipoib driver. This fixes problems with races, for example: - ipoib_ib_dev_flush() has started running because of an IB event - user does ifconfig ib0 down - ipoib_mcast_stop_thread() gets called twice and waits for the same completion twice Signed-off-by: Jack Morgenstein <jackm@mellanox.co.il> Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IPoIB: Clean up if posting receives failsEli Cohen2006-03-20
| | | | | | | | | | If posting receives in ipoib_ib_dev_open() fails, call ipoib_ib_dev_stop() to move the device's QP back to the RESET state so that we can try again later. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IB: convert from semaphores to mutexesIngo Molnar2006-01-13
| | | | | | | | semaphore to mutex conversion by Ingo and Arjan's script. Signed-off-by: Ingo Molnar <mingo@elte.hu> [ Sanity-checked on real IB hardware ] Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IPoIB: Fix memory leak of multicast group structuresEli Cohen2006-01-12
| | | | | | | | | | | The current handling of multicast groups in IPoIB ends up never freeing send-only multicast groups. It turns out the logic was much more complicated than it needed to be; we can fix this bug and completely kill ipoib_mcast_dev_down() at the same time. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* IPoIB: protect child list in ipoib_ib_dev_flushMichael S. Tsirkin2005-11-29
| | | | | | | race condition: ipoib_ib_dev_flush is accessing child list without locks. Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* [IPoIB] don't compile debug code if debugging isn't enabledRoland Dreier2005-11-02
| | | | | | | | | Don't build ipoib_mcast_iter_ functions if CONFIG_INFINIBAND_IPOIB_DEBUG is not enabled -- their only callers will not be built either. Also move the prototype for ipoib_open() to ipoib.h to fix a sparse warning. Signed-off-by: Roland Dreier <rolandd@cisco.com>
* [IPoIB] cleanups: fix comment, remove useless variablesRoland Dreier2005-10-31
| | | | | | | | Minor cleanups: fix a misleading comment, and get rid of attr_mask variables that are only used to hold constants (just use the constants directly). Signed-off-by: Roland Dreier <rolandd@cisco.com>
* [IPoIB] Drop RX packets when out of memoryRoland Dreier2005-10-28
| | | | | | | | | | Change the way IPoIB handles RX packets when it can't allocate a new receive skbuff. If the allocation of a new receive skb fails, we now drop the packet we just received and repost the original receive skb. This means that the receive ring always stays full and we don't have to monkey around with trying to schedule a refill task for later. Signed-off-by: Roland Dreier <rolandd@cisco.com>
* [IPoIB] Rename ipoib_create_qp() -> ipoib_init_qp() and fix error cleanupRoland Dreier2005-10-17
| | | | | | | | | | ipoib_create_qp() no longer creates IPoIB's QP, so it shouldn't destroy the QP on failure -- that unwinding happens elsewhere, so the current code can cause a double free. While we're at it, the function's name should match what it actually does, so rename it to ipoib_init_qp(). Signed-off-by: Roland Dreier <rolandd@cisco.com>
* [PATCH] IPoIB: Don't flush workqueue from within workqueueRoland Dreier2005-09-20
| | | | | | | | | | | ipoib_mcast_restart_task() is always called from within the single-threaded IPoIB workqueue, so flushing the workqueue from within the function can lead to a recursion overflow. But since we're running in a single-threaded workqueue, we're already synchronized against other items in the workqueue, so just get rid of the flush in ipoib_mcast_restart_task(). Signed-off-by: Roland Dreier <rolandd@cisco.com>
* [PATCH] IB: move include files to include/rdmaRoland Dreier2005-08-26
| | | | | | | | Move the InfiniBand headers from drivers/infiniband/include to include/rdma. This allows InfiniBand-using code to live elsewhere, and lets us remove the ugly EXTRA_CFLAGS include path from the InfiniBand Makefiles. Signed-off-by: Roland Dreier <rolandd@cisco.com>
* [PATCH] IB: Add copyright noticesRoland Dreier2005-08-26
| | | | | | | Make some lawyers happy and add copyright notices for people who forgot to include them when they actually touched the code. Signed-off-by: Roland Dreier <rolandd@cisco.com>
* [IB/ipoib]: Fix unsigned comparisons to handle wraparoundRoland Dreier2005-07-27
| | | | | | Fix handling of tx_head/tx_tail comparisons to handle wraparound. Signed-off-by: Roland Dreier <rolandd@cisco.com>
* [PATCH] IPoIB: set skb->mac.raw on receiveHal Rosenstock2005-04-16
| | | | | | | | | | Set skb->mac.raw on receive. This fixes crashes when this is dereferenced, for example by netfilter or when PF_PACKET is used. Signed-off-by: Hal Rosenstock <halr@voltaire.com> Signed-off-by: Roland Dreier <roland@topspin.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Linux-2.6.12-rc2v2.6.12-rc2Linus Torvalds2005-04-16
Initial git repository build. I'm not bothering with the full history, even though we have it. We can create a separate "historical" git archive of that later if we want to, and in the meantime it's about 3.2GB when imported into git - space that would just make the early git days unnecessarily complicated, when we don't have a lot of good infrastructure for it. Let it rip!