litmus-rt.git - The LITMUS^RT kernel.

	Commit message (Collapse)	Author	Age
*	IB/mthca: Clear ICM pages before handing to FW	Eli Cohen	2008-06-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Current memfree FW has a bug which in some cases, assumes that ICM pages passed to it are cleared. This patch uses __GFP_ZERO to allocate all ICM pages passed to the FW. Once firmware with a fix is released, we can make the workaround conditional on firmware version. This fixes the bug reported by Arthur Kepner <akepner@sgi.com> here: http://lists.openfabrics.org/pipermail/general/2008-May/050026.html Cc: <stable@kernel.org> Signed-off-by: Eli Cohen <eli@mellanox.co.il> [ Rewritten to be a one-liner using __GFP_ZERO instead of vmap()ing ICM memory and memset()ing it to 0. - Roland ] Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/uverbs: Fix check of is_closed flag check in ib_uverbs_async_handler()	Jack Morgenstein	2008-06-18
\| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 1ae5c187 ("IB/uverbs: Don't store struct file * for event files") changed the way that closed files are handled in the uverbs code. However, after the conversion, is_closed flag is checked incorrectly in ib_uverbs_async_handler(). As a result, no async events are ever passed to applications. Found by: Ronni Zimmerman <ronniz@mellanox.co.il> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	RDMA/nes: Fix off-by-one in nes_reg_user_mr() error path	Roland Dreier	2008-06-10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	nes_reg_user_mr() should fail if page_count becomes >= 1024 * 512 rather than just testing for strict >, because page_count is essentially used as an index into an array with 1024 * 512 entries, so allowing the loop to continue with page_count == 1024 * 512 means that memory after the end of the array is corrupted. This leads to a crash triggerable by a userspace application that requests registration of a too-big region. Also get rid of the call to pci_free_consistent() here to avoid corrupting state with a double free, since the same memory will be freed in the code jumped to at reg_user_mr_err. Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/core: Remove IB_DEVICE_SEND_W_INV capability flag	Roland Dreier	2008-06-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In 2.6.26, we added some support for send with invalidate work requests, including a device capability flag to indicate whether a device supports such requests. However, the support was incomplete: the completion structure was not extended with a field for the key contained in incoming send with invalidate requests. Full support for memory management extensions (send with invalidate, local invalidate, fast register through a send queue, etc) is planned for 2.6.27. Since send with invalidate is not very useful by itself, just remove the IB_DEVICE_SEND_W_INV bit before the 2.6.26 final release; we will add an IB_DEVICE_MEM_MGT_EXTENSIONS bit in 2.6.27, which makes things simpler for applications, since they will not have quite as confusing an array of fine-grained bits to check. Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/umem: Avoid sign problems when demoting npages to integer	Roland Dreier	2008-06-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On a 64-bit architecture, if ib_umem_get() is called with a size value that is so big that npages is negative when cast to int, then the length of the page list passed to get_user_pages(), namely min_t(int, npages, PAGE_SIZE / sizeof (struct page *)) will be negative, and get_user_pages() will immediately return 0 (at least since 900cf086, "Be more robust about bad arguments in get_user_pages()"). This leads to an infinite loop in ib_umem_get(), since the code boils down to: while (npages) { ret = get_user_pages(...); npages -= ret; } Fix this by taking the minimum as unsigned longs, so that the value of npages is never truncated. The impact of this bug isn't too severe, since the value of npages is checked against RLIMIT_MEMLOCK, so a process would need to have an astronomical limit or have CAP_IPC_LOCK to be able to trigger this, and such a process could already cause lots of mischief. But it does let buggy userspace code cause a kernel lock-up; for example I hit this with code that passes a negative value into a memory registartion function where it is promoted to a huge u64 value. Cc: <stable@kernel.org> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/ipath: Fix SM trap forwarding	Ralph Campbell	2008-06-06
\| \| \| \| \| \| \| \| \|	SM/SMA traps received by the ipath driver should be forwarded to the SM if it is running on the host. The ib_ipath driver was incorrectly replying with "bad method." Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/ehca: Reject send WRs only for RESET, INIT and RTR state	Joachim Fenkes	2008-06-06
\| \| \| \| \|	Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/ipath: Fix device capability flags	Ralph Campbell	2008-05-26
\| \| \| \| \| \| \| \| \|	The driver supports a few features (RNR NAK, port active event, SRQ resize) that were not reported in the device capability flags. This patch fixes that. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/ipath: Avoid test_bit() on u64 SDMA status value	Roland Dreier	2008-05-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Gabriel C <nix.or.die@googlemail.com> pointed out that when the x86 bitops are updated to operate on unsigned long, the code in sdma_abort_task() will produce warnings: drivers/infiniband/hw/ipath/ipath_sdma.c: In function 'sdma_abort_task': drivers/infiniband/hw/ipath/ipath_sdma.c:267: warning: passing argument 2 of 'constant_test_bit' from incompatible pointer type and so on, because it uses test_bit() to operation on a u64 value (returned by ipath_read_kref64() for a hardware register). Fix up these warnings by converting the test_bit() operations to &ing with appropriate symbolic defines of the bits within the hardware register. This has the benign side-effect of making the code more self-documenting as well. Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	Merge branch 'for-linus' of ↵	Linus Torvalds	2008-05-23
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: IB/mad: Fix kernel crash when .process_mad() returns SUCCESS\|CONSUMED IPoIB: Test for NULL broadcast object in ipiob_mcast_join_finish() MAINTAINERS: Add cxgb3 and iw_cxgb3 NIC and iWARP driver entries IB/mlx4: Fix creation of kernel QP with max number of send s/g entries IB/mthca: Fix max_sge value returned by query_device RDMA/cxgb3: Fix uninitialized variable warning in iwch_post_send() IB/mlx4: Fix uninitialized-var warning in mlx4_ib_post_send() IB/ipath: Fix UC receive completion opcode for RDMA WRITE with immediate IB/ipath: Fix printk format for ipath_sdma_status
\| *	IB/mad: Fix kernel crash when .process_mad() returns SUCCESS\|CONSUMED	Dave Olson	2008-05-23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a low-level driver returns IB_MAD_RESULT_SUCCESS \| IB_MAD_RESULT_CONSUMED, handle_outgoing_dr_smp() doesn't clean up properly. The fix is to kfree the local data and break, rather than falling through. This was observed with the ipath driver, but could happen with any driver. This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1027>. Signed-off-by: Dave Olson <dave.olson@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| *	IPoIB: Test for NULL broadcast object in ipiob_mcast_join_finish()	Jack Morgenstein	2008-05-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We saw a kernel oops in our regression testing when a multicast "join finish" occurred just after the interface was -- this is <https://bugs.openfabrics.org/show_bug.cgi?id=1040>. The test randomly causes the HCA physical port to go down then up. The cause of this is that ipoib_mcast_join_finish() processing happen just after ipoib_mcast_dev_flush() was invoked (in which case the broadcast pointer is NULL). This patch tests for and handles the case where priv->broadcast is NULL. Cc: <stable@kernel.org> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| *	IB/mlx4: Fix creation of kernel QP with max number of send s/g entries	Roland Dreier	2008-05-20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When creating a kernel QP where the consumer asked for a send queue with lots of scatter/gater entries, set_kernel_sq_size() incorrectly returned an error if the send queue stride is larger than the hardware's maximum send work request descriptor size. This is not a problem; the only issue is to make sure that the actual descriptors used do not overflow the maximum descriptor size, so check this instead. Clamp the returned max_send_sge value to be no bigger than what query_device returns for the max_sge to avoid confusing hapless users, even if the hardware is capable of handling a few more s/g entries. This bug caused NFS/RDMA mounts to fail when the server adapter used the mlx4 driver. Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| *	IB/mthca: Fix max_sge value returned by query_device	Roland Dreier	2008-05-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The mthca driver returns the maximum number of scatter/gather entries returned by the firmware as the max_sge value when device properties are queried. However, the firmware also reports a limit on the maximum descriptor size allowed, and because mthca takes into account the worst case send request overhead when checking whether to allow a QP to be created, the largest number of scatter/gather entries that can be used with mthca may be limited by the maximum descriptor size rather than just by the actual s/g entry limit. This means that applications cannot actually create QPs with max_send_sge equal to the limit returned by ib_query_device(). Fix this by checking if the maximum descriptor size imposes a lower limit and if so returning that lower limit. Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| *	RDMA/cxgb3: Fix uninitialized variable warning in iwch_post_send()	Roland Dreier	2008-05-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	drivers/infiniband/hw/cxgb3/iwch_qp.c: In function 'iwch_post_send': drivers/infiniband/hw/cxgb3/iwch_qp.c:232: warning: 't3_wr_flit_cnt' may be used uninitialized in this function This is what akpm describes as "the dopey gcc-doesn't-know-that-foo(&var)-writes-to-var problem." Signed-off-by: Roland Dreier <rolandd@cisco.com> Acked-by: Steve Wise <swise@opengridcomputing.com>
\| *	IB/mlx4: Fix uninitialized-var warning in mlx4_ib_post_send()	Andrew Morton	2008-05-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	drivers/infiniband/hw/mlx4/qp.c: In function 'mlx4_ib_post_send': drivers/infiniband/hw/mlx4/qp.c:1460: warning: 'seglen' may be used uninitialized in this function This is the dopey gcc-doesn't-know-that-foo(&var)-writes-to-var problem. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| *	IB/ipath: Fix UC receive completion opcode for RDMA WRITE with immediate	Ralph Campbell	2008-05-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When I fixed the RC receive completion opcode in 2bfc8e9e ("IB/ipath: Return the correct opcode for RDMA WRITE with immediate"), I forgot to fix UC, which had the same problem for RDMA write with immediate returning the wrong opcode. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| *	IB/ipath: Fix printk format for ipath_sdma_status	Roland Dreier	2008-05-15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit f018c7e1 ("IB/ipath: Change ipath_devdata.ipath_sdma_status to be unsigned long") changed ipath_sdma_status to be unsigned long, but left a few debug messages that printed it out with a %016llx format, which generates the warnings drivers/infiniband/hw/ipath/ipath_sdma.c:348: warning: format '%016llx' expects type 'long long unsigned int', but argument 3 has type 'long unsigned int' drivers/infiniband/hw/ipath/ipath_sdma.c:618: warning: format '%016llx' expects type 'long long unsigned int', but argument 3 has type 'long unsigned int' Fix this by changing the format used to print out the value to %08lx (8 hex digits are now sufficient, because the highest bit used is 31). Warnings reported by Randy Dunlap <randy.dunlap@oracle.com>. Signed-off-by: Roland Dreier <rolandd@cisco.com>
* \|	IB: fix race in device_create	Greg Kroah-Hartman	2008-05-20
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is a race from when a device is created with device_create() and then the drvdata is set with a call to dev_set_drvdata() in which a sysfs file could be open, yet the drvdata will be NULL, causing all sorts of bad things to happen. This patch fixes the problem by using the new function, device_create_drvdata(). Cc: Kay Sievers <kay.sievers@vrfy.org> Reviewed-by: Roland Dreier <rolandd@cisco.com> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	RDMA/cxgb3: Wrap the software send queue pointer as needed on flush	Steve Wise	2008-05-13
\| \| \| \| \| \| \| \|	cxio_flush_sq() was failing to wrap around the software send queue causing garbage completion entries on a flush operation. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/ipath: Change ipath_devdata.ipath_sdma_status to be unsigned long	Roland Dreier	2008-05-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Andrew Morton <akpm@linux-foundation.org> pointed out that bitops should take an unsigned long * arg. However, the ipath driver was doing bitops on struct ipath_devdata.ipath_sdma_status, which is u64. Change this member to unsigned long to avoid tons of warnings when x86 fixes the bitops to take unsigned long * instead of void *. Also, change the IPATH_SDMA_RUNNING and IPATH_SDMA_SHUTDOWN bit numbers to 30 and 31 (instead of 62 and 63) so that we're not setting another booby trap for someone who tries to make ipath work on a 32-bit architecture. Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/ipath: Make ipath_portdata work with struct pid * not pid_t	Pavel Emelyanov	2008-05-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The official reason is "with the presence of pid namespaces in the kernel using pid_t-s inside one is no longer safe." But the reason I fix this right now is the following: About a month ago (when 2.6.25 was not yet released) there still was a one last caller of a to-be-deprecated-soon function find_pid() - the kill_proc() function, which in turn was only used by nfs callback code. During the last merge window, this last caller was finally eliminated by some NFS patch(es) and I was about to finally kill this kill_proc() and find_pid(), but found, that I was late and the kill_proc is now called from the ipath driver since commit 58411d1c ("IB/ipath: Head of Line blocking vs forward progress of user apps"). So here's a patch that fixes this code to use struct pid * and (!) the kill_pid routine. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/ipath: Fix RDMA read response sequence checking	Ralph Campbell	2008-05-13
\| \| \| \| \| \| \| \| \| \|	If an out of sequence RDMA read response middle or last packet is received, we should only resend the RDMA read request on the first out of sequence packet and drop subsequent out of sequence packets otherwise, we get "too many retries". Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/ipath: Fix many locking issues when switching to error state	Ralph Campbell	2008-05-13
\| \| \| \| \| \| \| \| \| \|	The send DMA hardware queue voided a number of prior assumptions about when a send is complete which led to completions being generated out of order. There were also a number of locking issues when switching the QP to the error or reset states, and we implement the IB_QPS_SQD state. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/ipath: Fix RC and UC error handling	Ralph Campbell	2008-05-13
\| \| \| \| \| \| \| \| \| \|	When errors are detected in RC, the QP should transition to the IB_QPS_ERR state, not the IB_QPS_SQE state. Also, when the error is on the responder side, the receive work completion error was incorrect (remote vs. local). Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	RDMA/nes: Fix up nes_lro_max_aggr module parameter	Roland Dreier	2008-05-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix some bugs with the max_aggr module parameter added with LRO support: - The module parameter value ignored and not actually used to set lro_mgr.max_aggr. - MODULE_PARM_DESC had a typo "_mro_" instead of "_lro_" so it didn't end up describing the actual module parameter. - The nes_lro_max_aggr variable was declared as unsigned, but the module_param line said "int" instead of "uint" for the type. - The default value for the parameter was stuck in the permissions field of module_param, which led to nonsensical permissions for the file under /sys/module/iw_nes/param. - The parameter was used in only one file but defined in another, which led to the variable being global for no good reason. Move everything related to the parameter to the file nes_hw.c where it is actually used. Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/ehca: Wait for async events to finish before destroying QP	Stefan Roscher	2008-05-07
\| \| \| \| \| \| \| \|	This is necessary because, in a multicore environment, a race between uverbs async handler and destroy QP could occur. Signed-off-by: Stefan Roscher <stefan.roscher at de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/ipath: Fix SDMA error recovery in absence of link status change	John Gregor	2008-05-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	What's fixed: in ipath_cancel_sends() We need to unconditionally set ABORTING. So, swap the tests so the set_bit() isn't shadowed by the &&. If we've disarmed the piobufs, then we need to unconditionally set DISARMED. So, move it out from the overly protective if at the bottom. in sdma_abort_task() Abort_task was written knowing that the SDMA engine would always be reset (and restarted) on error. A recent change broke that fundamental assumption by taking the restart portion and making it conditional on a link status change. But, SDMA can go boom without a link status change in some conditions. Signed-off-by: John Gregor <john.gregor@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/ipath: Need to always request and handle PIO avail interrupts	Dave Olson	2008-05-07
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that we always use PIO for vl15 on 7220, we could get stuck forever if we happened to run out of PIO buffers from the verbs code, because the setup code wouldn't run; the interrupt was also ignored if SDMA was supported. We also have to reduce the pio update threshold if we have fewer kernel buffers than the existing threshold. Clean up the initialization a bit to get ordering safer and more sensible, and use the existing ipath_chg_kernavail call to do init, rather than doing it separately. Drop unnecessary clearing of pio buffer on pio parity error. Drop incorrect updating of pioavailshadow when exitting freeze mode (software state may not match chip state if buffer has been allocated and not yet written). If we couldn't get a kernel buffer for a while, make sure we are in sync with hardware, mainly to handle the exitting freeze case. Signed-off-by: Dave Olson <dave.olson@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/ipath: Fix count of packets received by kernel	Michael Albaugh	2008-05-07
\| \| \| \| \| \| \| \| \| \| \|	The loop in ipath_kreceive() that processes packets increments the loop-index 'i' once too often, because the exit condition does not depend on it, and is checked after the increment. By adding a check for !last to the iterator in the for loop, we correct that in a way that is not so likely to be re-broken by changes in the loop body. Signed-off-by: Michael Albaugh <micheal.albaugh@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/ipath: Return the correct opcode for RDMA WRITE with immediate	Ralph Campbell	2008-05-07
\| \| \| \| \| \| \| \|	This patch fixes a bug in the RC responder which generates a completion entry with the wrong opcode when an RDMA WRITE with immediate is received. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/ipath: Fix bug that can leave sends disabled after freeze recovery	Dave Olson	2008-05-07
\| \| \| \| \| \| \| \| \|	The semantics of cancel_sends changed, but the code using it was missed. Don't leave sends and pioavail updates disabled, and add a comment as to why the force update is needed. Signed-off-by: Dave Olson <dave.olson@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/ipath: Only increment SSN if WQE is put on send queue	Ralph Campbell	2008-05-07
\| \| \| \| \| \| \| \| \| \|	If a send work request has immediate errors and is not put on the send queue, we shouldn't update any of the QP state. The increment of the SSN wasn't obeying this. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/ipath: Only warn about prototype chip during init	Michael Albaugh	2008-05-07
\| \| \| \| \| \| \| \| \| \| \|	We warn about prototype chips, but the function that checks for support is also called as a result of a get_portinfo request, which can clutter the logs. Restrict warning to only appear during initialization. Signed-off-by: Michael Albaugh <michael.albaugh@qlogic.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	RDMA/cxgb3: Fix severe limit on userspace memory registration size	Roland Dreier	2008-05-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, iw_cxgb3 is severely limited on the amount of userspace memory that can be registered in in a single memory region, which causes big problems for applications that expect to be able to register 100s of MB. The problem is that the driver uses a single kmalloc()ed buffer to hold the physical buffer list (PBL) for the entire memory region during registration, which means that 8 bytes of contiguous memory are required for each page of memory being registered. For example, a 64 MB registration will require 128 KB of contiguous memory with 4 KB pages, and it unlikely that such an allocation will succeed on a busy system. This is purely a driver problem: the temporary page list buffer is not needed by the hardware, so we can fix this by writing the PBL to the hardware in page-sized chunks rather than all at once. We do this by splitting the memory registration operation up into several steps: - Allocate PBL space in adapter memory for the full registration - Copy PBL to adapter memory in chunks - Allocate STag and enable memory region This also allows several other cleanups to the __cxio_tpt_op() interface and related parts of the driver. This change leaves the reregister memory region and memory window operations broken, but they already didn't work due to other longstanding bugs, so fixing them will be left to a later patch. Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	RDMA/cxgb3: Don't add PBL memory to gen_pool in chunks	Roland Dreier	2008-05-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Current iw_cxgb3 code adds PBL memory to the driver's gen_pool in 2 MB chunks. This limits the largest single allocation that can be done to the same size, which means that with 4 KB pages, each of which takes 8 bytes of PBL memory, the largest memory region that can be allocated is 1 GB (256K PBL entries * 4 KB/entry). Remove this limit by adding all the PBL memory in a single gen_pool chunk, if possible. Add code that falls back to smaller chunks if gen_pool_add() fails, which can happen if there is not sufficient contiguous lowmem for the internal gen_pool bitmap. Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/ehca: Fix function return types	Stefan Roscher	2008-05-05
\| \| \| \| \| \| \| \|	Also remove duplicate assignment of local_ca_ack_delay and change min_t check for local_ca_ack_delay to u8 instead of int. Signed-off-by: Stefan Roscher <stefan.roscher at de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	RDMA/cxgb3: Bump up the MPA connection setup timeout.	Steve Wise	2008-05-02
\| \| \| \| \| \| \|	Testing on large clusters shows its way too short at 10 secs. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	RDMA/cxgb3: Silently ignore close reply after abort.	Steve Wise	2008-05-02
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove bad BUG_ON() that can trigger in correct operation from close_con_rpl(). It is possible to get a close_rpl message on a dead connection. The sequence is: - host refs ep for close exchange - host posts close_req - hw posts PEER_ABORT from incoming RST - host marks ep DEAD - host posts ABORT_RPL and releases ep resources - hw posts CLOSE_RPL - host derefs ep and ep freed. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	RDMA/cxgb3: QP flush fixes	Steve Wise	2008-05-02
\| \| \| \| \| \| \| \| \| \| \| \|	- Flush the QP only after the HW disables the connection. Currently we flush the QP when transitioning to CLOSING. This exposes a race condition where the HW can complete a RECV WR, for instance, -and- the SW can flush that same WR. - Only call CQ event handlers on flush IFF we actually flushed something. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/ipoib: Fix transmit queue stalling forever	Eli Cohen	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit f56bcd80 ("IPoIB: Use separate CQ for UD send completions") introduced a bug where the transmit queue could get stopped and never woken up. The problem is that send completions are only polled at the end of the xmit function, so if the send queue fills up and the xmit path stops the queue, then there is no way for send completions to ever get polled, and so the transmit queue stays stopped forever. Fix this by arming the send CQ just before posting the last send request that fills the send queue. Then, when the completion event handler is called, drain the send CQ. Since it is possible that not enough send completions are in the CQ, verify that the the net queue has been woken up after draining the send CQ, and if not arm a timer and drain again at the timer function. Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/mlx4: Fix off-by-one errors in calls to mlx4_ib_free_cq_buf()	Roland Dreier	2008-04-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When I merged bbf8eed1 ("IB/mlx4: Add support for resizing CQs") I changed things around so that mlx4_ib_alloc_cq_buf() and mlx4_ib_free_cq_buf() were used everywhere they could be. However, I screwed up the number of entries passed into mlx4_ib_alloc_cq_buf() in a couple places -- the function bumps the number of entries internally, so the caller shouldn't add 1 as well. Passing a too-big value for the number of entries to mlx4_ib_free_cq_buf() can cause the cleanup to go off the end of an array and corrupt allocator state in interesting ways. Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	RDMA/nes: Formatting cleanup	Glenn Streiff	2008-04-29
\| \| \| \| \| \| \| \| \| \| \| \|	Various cleanups: - Change // to /* .. */ - Place whitespace around binary operators. - Trim down a few long lines. - Some minor alignment formatting for better readability. - Remove some silly tabs. Signed-off-by: Glenn Streiff <gstreiff@neteffect.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	RDMA/nes: Add support for SFP+ PHY	Eric Schneider	2008-04-29
\| \| \| \| \| \| \| \| \|	This patch enables the iw_nes module for NetEffect RNICs to support additional PHYs including SFP+ (referred to as ARGUS in the code). Signed-off-by: Eric Schneider <eric.schneider@neteffect.com> Signed-off-by: Glenn Streiff <gstreiff@neteffect.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	RDMA/nes: Use LRO	Faisal Latif	2008-04-29
\| \| \| \| \| \|	Signed-off-by: Faisal Latif <flatif@neteffect.com. Signed-off-by: Glenn Streiff <gstreiff@neteffect.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IPoIB: Copy child MTU from parent	Eli Cohen	2008-04-29
\| \| \| \| \| \| \| \| \| \| \| \| \|	When creating a child interface, copy the MTU information from the parent. Otherwise when the child's multicast join completes, the MTU will not be updated since the code does dev->mtu = min(priv->mcast_mtu, priv->admin_mtu); and priv->admin_mtu will be set to 0. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/mthca: Avoid changing userspace ABI to handle DMA write barrier attribute	Roland Dreier	2008-04-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit cb9fbc5c ("IB: expand ib_umem_get() prototype") changed the mthca userspace ABI to provide a way for userspace to indicate which memory regions need the DMA write barrier attribute. However, it is possible to handle this without breaking existing userspace, by having the mthca kernel driver recognize whether it is talking to old or new userspace, depending on the size of the register MR structure passed in. The only potential drawback of this is that is allows old userspace (which has a bug with DMA ordering on large SGI Altix systems) to continue to run on new kernels, but the advantage of allowing old userspace to continue to work on unaffected systems seems to outweigh this, and we can print a warning to push people to upgrade their userspace. Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/mthca: Avoid recycling old FMR R_Keys too soon	Olaf Kirch	2008-04-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a FMR is unmapped, mthca resets the map count to 0, and clears the upper part of the R_Key which is used as the sequence counter. This poses a problem for RDS, which uses ib_fmr_unmap as a fence operation. RDS assumes that after issuing an unmap, the old R_Keys will be invalid for a "reasonable" period of time. For instance, Oracle processes uses shared memory buffers allocated from a pool of buffers. When a process dies, we want to reclaim these buffers -- but we must make sure there are no pending RDMA operations to/from those buffers. The only way to achieve that is by using unmap and sync the TPT. However, when the sequence count is reset on unmap, there is a high likelihood that a new mapping will be given the same R_Key that was issued a few milliseconds ago. To prevent this, don't reset the sequence count when unmapping a FMR. Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/ehca: Allocate event queue size depending on max number of CQs and QPs	Stefan Roscher	2008-04-29
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a lot of QPs fall into Error state at once and the EQ of the respective HCA is too small, it might overrun, causing the eHCA driver to stop processing completion events and calling the application's completion handlers, effectively causing traffic to stop. Fix this by limiting available QPs and CQs to a customizable max count, and determining EQ size based on these counts and a worst-case assumption. Signed-off-by: Stefan Roscher <stefan.roscher@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IPoIB: Use separate CQ for UD send completions	Eli Cohen	2008-04-29
\| \| \| \| \| \| \| \| \| \|	Use a dedicated CQ for UD send completions. Also, do not arm the UD send CQ, which reduces the number of interrupts generated. This patch farther reduces overhead by not calling poll CQ for every posted send WR -- it does polls only when there 16 or more outstanding work requests. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>