litmus-rt.git - The LITMUS^RT kernel.

	Commit message (Collapse)	Author	Age
*	IB/fmr_pool: ib_fmr_pool_flush() should flush all dirty FMRs	Olaf Kirch	2008-01-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a FMR is released via ib_fmr_pool_unmap(), the FMR usually ends up on the free_list rather than the dirty_list (because we allow a certain number of remappings before actually requiring a flush). However, ib_fmr_batch_release() only looks at dirty_list when flushing out old mappings. This means that when ib_fmr_pool_flush() is used to force a flush of the FMR pool, some dirty FMRs that have not reached their maximum remap count will not actually be flushed. Fix this by flushing all FMRs that have been used at least once in ib_fmr_batch_release(). Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/fmr_pool: Flush serial numbers can get out of sync	Olaf Kirch	2008-01-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Normally, the serial numbers for flush requests and flushes executed for an FMR pool should be in sync. However, if the FMR pool flushes dirty FMRs because the dirty_watermark was reached, we wake up the cleanup thread and let it do its stuff. As a side effect, the cleanup thread increments pool->flush_ser, which leaves it one higher than pool->req_ser. The next time the user calls ib_flush_fmr_pool(), the cleanup thread will be woken up, but ib_flush_fmr_pool() won't wait for the flush to complete because flush_ser is already past req_ser. This means the FMRs that the user expects to be flushed may not have all been flushed when the function returns. Fix this by telling the cleanup thread to do work exclusively by incrementing req_ser, and by moving the comparison of dirty_len and dirty_watermark into ib_fmr_pool_unmap(). Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
*	IB/umad: Simplify and fix locking	Roland Dreier	2008-01-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In addition to being overly complex, the locking in user_mad.c is broken: there were multiple reports of deadlocks and lockdep warnings. In particular it seems that a single thread may end up trying to take the same rwsem for reading more than once, which is explicitly forbidden in the comments in <linux/rwsem.h>. To solve this, we change the locking to use plain mutexes instead of rwsems. There is one mutex per open file, which protects the contents of the struct ib_umad_file, including the array of agents and list of queued packets; and there is one mutex per struct ib_umad_port, which protects the contents, including the list of open files. We never hold the file mutex across calls to functions like ib_unregister_mad_agent(), which can call back into other ib_umad code to queue a packet, and we always hold the port mutex as long as we need to make sure that a device is not hot-unplugged from under us. This even makes things nicer for users of the -rt patch, since we remove calls to downgrade_write() (which is not implemented in -rt). Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	RDMA/cma: Override default responder_resources with user value	Sean Hefty	2008-01-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	By default, the responder_resources parameter is set to that received in a connection request. The passive side may override this value when accepting the connection. Use the value provided by the passive side when transitioning the QP to RTR state, rather than the value given in the connect request. Without this change, the RTR transition may fail if the passive side supports fewer responder_resources than that in the request. For code consistency and to protect against QP destruction, restructure overriding initiator_depth to match how responder_resources is set. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IPoIB: improve IPv4/IPv6 to IB mcast mapping functions	Rolf Manderscheid	2008-01-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	An IPoIB subnet on an IB fabric that spans multiple IB subnets can't use link-local scope in multicast GIDs. The existing routines that map IP/IPv6 multicast addresses into IB link-level addresses hard-code the scope to link-local, and they also leave the partition key field uninitialised. This patch adds a parameter (the link-level broadcast address) to the mapping routines, allowing them to initialise both the scope and the P_Key appropriately, and fixes up the call sites. The next step will be to add a way to configure the scope for an IPoIB interface. Signed-off-by: Rolf Manderscheid <rvm@obsidianresearch.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	RDMA/cma: add support for rdma_migrate_id()	Sean Hefty	2008-01-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is based on user feedback from Doug Ledford at RedHat: Events that occur on an rdma_cm_id are reported to userspace through an event channel. Connection request events are reported on the event channel associated with the listen. When the connection is accepted, a new rdma_cm_id is created and automatically uses the listen event channel. This is suboptimal where the user only wants listen events on that channel. Additionally, it may be desirable to have events related to connection establishment use a different event channel than those related to already established connections. Allow the user to migrate an rdma_cm_id between event channels. All pending events associated with the rdma_cm_id are moved to the new event channel. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	RDMA/cma: Reenable device removal on passive side	Vladimir Sokolovsky	2008-01-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Enable conn_id remove on the passive side after connection establishment. This corrects an issue where the IB driver can't be unloaded after running applications over RDS. The 'dev_remove' counter does not reach 0 for established connections on the passive side. This problem is limited to device removal, and only occurs on the passive side if there are established connections. Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.co.il> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/mad: Fix incorrect access to items on local_list	Sean Hefty	2008-01-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In cancel_mads(), MADs are moved from the wait_list and local_list to a cancel_list for processing. However, the structures on these two lists are not the same. The wait_list references struct ib_mad_send_wr_private, but local_list references struct ib_mad_local_private. Cancel_mads() treats all items moved to the cancel_list as struct ib_mad_send_wr_private. This leads to a system crash when requests are moved from the local_list to the cancel_list. Fix this by leaving local_list alone. All requests on the local_list have completed are just awaiting processing by a queued worker thread. Bug (crash) reported by Dotan Barak <dotanb@dev.mellanox.co.il>. Problem with local_list access reported by Robert Reynolds <rreynolds@opengridcomputing.com>. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/cm: Add basic performance counters	Sean Hefty	2008-01-25
\| \| \| \| \| \| \| \| \| \|	Add performance/debug counters to track sent/received messages, retries, and duplicates. Counters are tracked per CM message type, per port. The counters are always enabled, so intrusive state tracking is not done. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/mad: Report number of times a mad was retried	Sean Hefty	2008-01-25
\| \| \| \| \| \| \| \| \| \| \|	To allow ULPs to tune timeout values and capture retry statistics, report the number of times that a mad send operation was retried. For RMPP mads, report the total number of times that the any portion (send window) of the send operation was retried. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/multicast: Report errors on multicast groups if P_key changes	Sean Hefty	2008-01-25
\| \| \| \| \| \| \| \|	P_key changes can invalidate multicast groups. Report errors on all multicast groups affected by a pkey change. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/mad: Enable loopback of DR SMP responses from userspace	Steve Welch	2008-01-25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The local loopback of an outgoing DR SMP response is limited to those that originate at the driver specific SMA implementation during the driver specific process_mad() function. This patch enables a returning DR SMP originating in userspace (or elsewhere) to be delivered to the local managment stack. In this specific case the driver process_mad() function does not consume or process the MAD, so a reponse mad has not be created and the original MAD must manually be copied to the MAD buffer that is to be handed off to the local agent. Signed-off-by: Steve Welch <swelch@systemfabricworks.com> Acked-by: Hal Rosenstock <hal@xsigo.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/mad: Remove redundant NULL pointer check in ib_mad_recv_done_handler()	Ralph Campbell	2008-01-25
\| \| \| \| \| \| \| \| \| \| \|	In ib_mad_recv_done_handler(), the response pointer is checked for NULL after allocating it. It is then checked again in the local process_mad() path but there is no possibility of it changing in between. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Acked-by: Hal Rosenstock <hal@xsigo.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	RDMA/iwcm: Set initiator depth and responder resources to device max values	Steve Wise	2008-01-25
\| \| \| \| \| \| \| \|	Set the initiator depth and responder resources to the device max values for new connect request events in the iWARP connection manager. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	Kobject: convert drivers/* from kobject_unregister() to kobject_put()	Greg Kroah-Hartman	2008-01-24
\| \| \| \| \| \| \| \| \| \| \|	There is no need for kobject_unregister() anymore, thanks to Kay's kobject cleanup changes, so replace all instances of it with kobject_put(). Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	Kobject: change drivers/infiniband to use kobject_init_and_add	Greg Kroah-Hartman	2008-01-24
\| \| \| \| \| \| \| \| \| \| \| \|	Stop using kobject_register, as this way we can control the sending of the uevent properly, after everything is properly initialized. Cc: Roland Dreier <rolandd@cisco.com> Cc: Sean Hefty <mshefty@ichips.intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	Merge branch 'for-linus' of ↵	Linus Torvalds	2007-10-30
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: IB/fmr_pool: Stop ib_fmr threads from contributing to load average IB/ipath: Fix incorrect use of sizeof on msg buffer (function argument) IB/ipath: Limit length checksummed in eeprom IB/ipath: Fix a race where s_last is updated without lock held IB/mlx4: Lock SQ lock in mlx4_ib_post_send() IPoIB/cm: Fix receive QP cleanup
\| *	IB/fmr_pool: Stop ib_fmr threads from contributing to load average	Anton Blanchard	2007-10-30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I noticed my machine was at a constant load average of 1. This was because ib_create_fmr_pool calls kthread_create but does not immediately wake the thread up. Change to using kthread_run so we enter ib_fmr_cleanup_thread(), set TASK_INTERRUPTIBLE, then go to sleep. Signed-off-by: Roland Dreier <rolandd@cisco.com>
* \|	SG: Change sg_set_page() to take length and offset argument	Jens Axboe	2007-10-24
\|/ \| \| \| \| \| \| \| \| \|	Most drivers need to set length and offset as well, so may as well fold those three lines into one. Add sg_assign_page() for those two locations that only needed to set the page, where the offset/length is set outside of the function context. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
*	Merge branch 'for-linus' of ↵	Linus Torvalds	2007-10-23
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: mlx4_core: Increase command timeout for INIT_HCA to 10 seconds IPoIB/cm: Use common CQ for CM send completions IB/uverbs: Fix checking of userspace object ownership IB/mlx4: Sanity check userspace send queue sizes IPoIB: Rewrite "if (!likely(...))" as "if (unlikely(!(...)))" IB/ehca: Enable large page MRs by default IB/ehca: Change meaning of hca_cap_mr_pgsize IB/ehca: Fix ehca_encode_hwpage_size() and alloc_fmr() IB/ehca: Fix masking error in {,re}reg_phys_mr() IB/ehca: Supply QP token for SRQ base QPs IPoIB: Use round_jiffies() for ah_reap_task RDMA/cma: Fix deadlock destroying listen requests RDMA/cma: Add locking around QP accesses IB/mthca: Avoid alignment traps when writing doorbells mlx4_core: Kill mlx4_write64_raw()
\| *	IB/uverbs: Fix checking of userspace object ownership	Roland Dreier	2007-10-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 9ead190b ("IB/uverbs: Don't serialize with ib_uverbs_idr_mutex") rewrote how userspace objects are looked up in the uverbs module's idrs, and introduced a severe bug in the process: there is no checking that an operation is being performed by the right process any more. Fix this by adding the missing check of uobj->context in __idr_get_uobj(). Apparently everyone is being very careful to only touch their own objects, because this bug was introduced in June 2006 in 2.6.18, and has gone undetected until now. Cc: stable <stable@kernel.org> Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| *	RDMA/cma: Fix deadlock destroying listen requests	Sean Hefty	2007-10-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Deadlock condition reported by Kanoj Sarcar <kanoj@netxen.com>. The deadlock occurs when a connection request arrives at the same time that a wildcard listen is being destroyed. A wildcard listen maintains per device listen requests for each RDMA device in the system. The per device listens are automatically added and removed when RDMA devices are inserted or removed from the system. When a wildcard listen is destroyed, rdma_destroy_id() acquires the rdma_cm's device mutex ('lock') to protect against hot-plug events adding or removing per device listens. It then tries to destroy the per device listens by calling ib_destroy_cm_id() or iw_destroy_cm_id(). It does this while holding the device mutex. However, if the underlying iw/ib CM reports a connection request while this is occurring, the rdma_cm callback function will try to acquire the same device mutex. Since we're in a callback, the ib_destroy_cm_id() or iw_destroy_cm_id() calls will block until their callback thread returns, but the callback is blocked waiting for the device mutex. Fix this by re-working how per device listens are destroyed. Use rdma_destroy_id(), which avoids the deadlock, in place of cma_destroy_listen(). Additional synchronization is added to handle device hot-plug events and ensure that the id is not destroyed twice. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| *	RDMA/cma: Add locking around QP accesses	Sean Hefty	2007-10-16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If a user allocates a QP on an rdma_cm_id, the rdma_cm will automatically transition the QP through its states (RTR, RTS, error, etc.) While the QP state transitions are occurring, the QP itself must remain valid. Provide locking around the QP pointer to prevent its destruction while accessing the pointer. This fixes an issue reported by Olaf Kirch from Oracle that resulted in a system crash: "An incoming connection arrives and we decide to tear down the nascent connection. The remote ends decides to do the same. We start to shut down the connection, and call rdma_destroy_qp on our cm_id. ... Now apparently a 'connect reject' message comes in from the other host, and cma_ib_handler() is called with an event of IB_CM_REJ_RECEIVED. It calls cma_modify_qp_err, which for some odd reason tries to modify the exact same QP we just destroyed." Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
* \|	[SG] Update drivers to use sg helpers	Jens Axboe	2007-10-22
\| \| \| \| \| \| \| \|	Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* \|	[INET]: Justification for local port range robustness.	Anton Arapov	2007-10-19
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is a justifying patch for Stephen's patches. Stephen's patches disallows using a port range of one single port and brakes the meaning of the 'remaining' variable, in some places it has different meaning. My patch gives back the sense of 'remaining' variable. It should mean how many ports are remaining and nothing else. Also my patch allows using a single port. I sure we must be able to use mentioned port range, this does not restricted by documentation and does not brake current behavior. usefull links: Patches posted by Stephen Hemminger http://marc.info/?l=linux-netdev&m=119206106218187&w=2 http://marc.info/?l=linux-netdev&m=119206109918235&w=2 Andrew Morton's comment http://marc.info/?l=linux-kernel&m=119248225007737&w=2 1. Allows using a port range of one single port. 2. Gives back sense of 'remaining' variable. Signed-off-by: Anton Arapov <aarapov@redhat.com> Acked-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	Driver core: change add_uevent_var to use a struct	Kay Sievers	2007-10-12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This changes the uevent buffer functions to use a struct instead of a long list of parameters. It does no longer require the caller to do the proper buffer termination and size accounting, which is currently wrong in some places. It fixes a known bug where parts of the uevent environment are overwritten because of wrong index calculations. Many thanks to Mathieu Desnoyers for finding bugs and improving the error handling. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Cc: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
*	Merge branch 'for-linus' of ↵	Linus Torvalds	2007-10-11
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (87 commits) mlx4_core: Fix section mismatches IPoIB: Allow setting policy to ignore multicast groups IB/mthca: Mark error paths as unlikely() in post_srq_recv functions IB/ipath: Minor fix to ordering of freeing and zeroing of tid pages. IB/ipath: Remove redundant link state checks IB/ipath: Fix IB_EVENT_PORT_ERR event IB/ipath: Better handling of unexpected GPIO interrupts IB/ipath: Maintain active time on all chips IB/ipath: Fix QHT7040 serial number check IB/ipath: Indicate a couple of chip bugs to userspace IB/ipath: iba6110 rev4 no longer needs recv header overrun workaround IB/ipath: Use counters in ipath_poll and cleanup interrupts in ipath_close IB/ipath: Remove duplicate copy of LMC IB/ipath: Add ability to set the LMC via the sysfs debugging interface IB/ipath: Optimize completion queue entry insertion and polling IB/ipath: Implement IB_EVENT_QP_LAST_WQE_REACHED IB/ipath: Generate flush CQE when QP is in error state IB/ipath: Remove redundant code IB/ipath: Future proof eeprom checksum code (contents reading) IB/ipath: UC RDMA WRITE with IMMEDIATE doesn't send the immediate ...
\| *	RDMA/cma: Queue IB CM MRAs to avoid unnecessary remote retries	Sean Hefty	2007-10-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Automatically queue MRA message to decrease the number of retries sent by the remote side during connection establishment. This also has the effect of increasing the overall connection timeout without using a longer retry time in the case of dropped packets. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| *	IB/cm: Modify interface to send MRAs in response to duplicate messages	Sean Hefty	2007-10-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The IB CM provides a message received acknowledged (MRA) message that can be sent to indicate that a REQ or REP message has been received, but will require more time to process than the timeout specified by those messages. In many cases, the application may not know how long it will take to respond to a CM message, but the majority of the time, it will usually respond before a retry has been sent. Rather than sending an MRA in response to all messages just to handle the case where a longer timeout is needed, it is more efficient to queue the MRA for sending in case a duplicate message is received. This avoids sending an MRA when it is not needed, but limits the number of times that a REQ or REP will be resent. It also provides for a simpler implementation than generating the MRA based on a timer event. (That is, trying to send the MRA after receiving the first REQ or REP if a response has not been generated, so that it is received at the remote side before a duplicate REQ or REP has been received) Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| *	IB/uverbs: Make ib_uverbs_release_event_file() static	Roland Dreier	2007-10-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ib_uverbs_release_event_file() is only used in uverbs_main.c, so make it static to that file. Also move the definition before the first use, so a forward declaration is not needed. Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| *	IB/umad: Fix bit ordering and 32-on-64 problems on big endian systems	Roland Dreier	2007-10-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The declaration of struct ib_user_mad_reg_req.method_mask[] exported to userspace was an array of __u32, but the kernel internally treated it as a bitmap made up of longs. This makes a difference for 64-bit big-endian kernels, where numbering the bits in an array of__u32 gives: \|31.....0\|63....31\|95....64\|127...96\| while numbering the bits in an array of longs gives: \|63..............0\|127............64\| 64-bit userspace can handle this by just treating method_mask[] as an array of longs, but 32-bit userspace is really stuck: the meaning of the bits in method_mask[] depends on whether the kernel is 32-bit or 64-bit, and there's no sane way for userspace to know that. Fix this by updating <rdma/ib_user_mad.h> to make it clear that method_mask[] is an array of longs, and using a compat_ioctl method to convert to an array of 64-bit longs to handle the 32-on-64 problem. This fixes the interface description to match existing behavior (so working binaries continue to work) in almost all situations, and gives consistent semantics in the case of 32-bit userspace that can run on either a 32-bit or 64-bit kernel, so that the same binary can work for both 32-on-32 and 32-on-64 systems. Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| *	IB/umad: Add P_Key index support	Roland Dreier	2007-10-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add support for setting the P_Key index of sent MADs and getting the P_Key index of received MADs. This requires a change to the layout of the ABI structure struct ib_user_mad_hdr, so to avoid breaking compatibility, we default to the old (unchanged) ABI and add a new ioctl IB_USER_MAD_ENABLE_PKEY that allows applications that are aware of the new ABI to opt into using it. We plan on switching to the new ABI by default in a year or so, and this patch adds a warning that is printed when an application uses the old ABI, to push people towards converting to the new ABI. Signed-off-by: Roland Dreier <rolandd@cisco.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Hal Rosenstock <hal@xsigo.com>
\| *	IB/core: Fix handling of multicast response failures	Ralph Campbell	2007-10-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I was looking at the code for multicast.c and noticed that ib_sa_join_multicast() calls queue_join() which puts the request at the front of the group->pending_list. If this is a second request, it seems like it would interfere with process_join_error() since group->last_join won't point to the member at the head of the pending_list. The sequence would thus be: 1. ib_sa_join_multicast() puts member1 on head of pending_list and starts work thread 2. mcast_work_handler() calls send_join() which sets group->last_join to member1 3. ib_sa_join_multicast() puts member2 on head of pending_list 4. join operation for member1 receives failures response from SA. 5. join_handler() is called with error status 6. process_join_error() fails to process member1 since it doesn't match the first entry in the group->pending_list. The impact is that the failed join request is tossed. The second request is processed, and after it completes, the original request ends up being retried. This change also results in join requests being processed in FIFO order. Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| *	RDMA/cma: Use neigh_event_send() to start neighbour discovery	Steve Wise	2007-10-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Calling arp_send() to initiate neighbour discovery (ND) doesn't do the full ND protocol. Namely, it doesn't handle retransmitting the arp request if it is dropped. The function neigh_event_send() does all this. Without doing full ND, RDMA address resolution fails in the presence of dropped ARP broadcast packets. Signed-off-by: Steve Wise <swise@opengridcomputing.com> Acked-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| *	IB/umem: Add hugetlb flag to struct ib_umem	Joachim Fenkes	2007-10-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	During ib_umem_get(), determine whether all pages from the memory region are hugetlb pages and report this in the "hugetlb" member. Low-level drivers can use this information if they need it. Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| *	RDMA/ucma: Allow user space to set service type	Sean Hefty	2007-10-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Export the ability to set the type of service to user space. Model the interface after setsockopt. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| *	RDMA/cma: Add ability to specify type of service	Sean Hefty	2007-10-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Provide support to specify a type of service for a communication identifier. A new function call is used when dealing with IPv4 addresses. For IPv6 addresses, the ToS is specified through the traffic class field in the sockaddr_in6 structure. Signed-off-by: Sean Hefty <sean.hefty@intel.com> [ The comments Eitan Zahavi and myself have made over the v1 post at <http://lists.openfabrics.org/pipermail/general/2007-August/039247.html> were fully addressed. ] Reviewed-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| *	IB/sa: Add new QoS fields to path record	Sean Hefty	2007-10-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The QoS annex defines new fields for path records. Add them to the ib_sa for consumers that want to use them. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Reviewed-by: Or Gerlitz <ogerlitz@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| *	IB/sa: Error handling thinko fix	Ali Ayoub	2007-10-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	ib_create_send_mad() returns an error code pointer on error, not NULL. Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| *	IB/fmr_pool: Clean up some error messages in fmr_pool.c	Anton Blanchard	2007-10-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A number of printks in fmr_pool.c dont have newlines, eg: fmr_create failed for FMR 0<5>FS-Cache: Loaded Fix them up. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Roland Dreier <rolandd@cisco.com>
\| *	IB: find_first_zero_bit() takes unsigned pointer	Roland Dreier	2007-10-09
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix sparse warning drivers/infiniband/core/device.c:142:6: warning: incorrect type in argument 1 (different signedness) drivers/infiniband/core/device.c:142:6: expected unsigned long const addr drivers/infiniband/core/device.c:142:6: got long [assigned] inuse by making the local variable inuse unsigned. Does not affect generated code at all. Signed-off-by: Roland Dreier <rolandd@cisco.com>
* \|	[INET]: local port range robustness	Stephen Hemminger	2007-10-10
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Expansion of original idea from Denis V. Lunev <den@openvz.org> Add robustness and locking to the local_port_range sysctl. 1. Enforce that low < high when setting. 2. Use seqlock to ensure atomic update. The locking might seem like overkill, but there are cases where sysadmin might want to change value in the middle of a DoS attack. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
*	IB: Move the macro IB_UMEM_MAX_PAGE_CHUNK() to umem.c	Dotan Barak	2007-08-03
\| \| \| \| \| \| \| \| \| \|	After moving the definition of struct ib_umem_chunk from ib_verbs.h to ib_umem.h there isn't any reason for the macro IB_UMEM_MAX_PAGE_CHUNK to stay in ib_verbs.h. Move the macro to umem.c, the only place where it is used. Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/mad: Fix address handle leak in mad_rmpp	Sean Hefty	2007-08-03
\| \| \| \| \| \| \| \| \| \| \|	The address handle associated with dual-sided RMPP direction switch ACKs is never destroyed. Free the AH for ACKs which fall into this category. Problem was reported by Dotan Barak <dotanb@dev.mellanox.co.il>. Signed-off-by: Sean Hefty <sean.hefty@intel.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/mad: agent_send_response() should be void	Hal Rosenstock	2007-08-03
\| \| \| \| \| \| \| \|	Nothing looks at the return value of agent_send_response(), so there's no point in returning anything. Signed-off-by: Hal Rosenstock <hal.rosenstock@gmail.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/mad: Fix memory leak in switch handling in ib_mad_recv_done_handler()	Hal Rosenstock	2007-08-03
\| \| \| \| \| \| \| \| \| \|	If agent_send_response() returns an error, we shouldn't do anything differently than if it succeeds; setting response to NULL just means that the response buffer gets leaked. Signed-off-by: Suresh Shelvapille <suri@baymicrosystems.com> Signed-off-by: Hal Rosenstock <hal.rosenstock@gmail.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/mad: Fix error path if response alloc fails in ib_mad_recv_done_handler()	Hal Rosenstock	2007-08-03
\| \| \| \| \| \| \| \| \| \| \| \|	If ib_mad_recv_done_handler() fails to allocate response, then it just printed a warning and continued, which leads to an oops if the MAD is being handled for a switch device, because the switch code uses response without checking for NULL. Fix this by bailing out of the function if the allocation fails. Signed-off-by: Suresh Shelvapille <suri@baymicrosystems.com> Signed-off-by: Hal Rosenstock <hal.rosenstock@gmail.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/sa: Don't need to check for default P_Key twice	Roland Dreier	2007-08-03
\| \| \| \| \| \| \|	Now that ib_find_pkey() ignores the membership bit of P_Keys, there's no need for ib_sa to look for both 0x7fff and 0xffff in a port's P_Key table. Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	IB/core: Ignore membership bit in ib_find_pkey()	Moni Shoua	2007-08-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	ib_find_pkey() is used as a replacement for ib_find_cached_pkey(), and the original function ignored the membership bit when searching for a P_Key, so ib_find_pkey() should ignore the bit too. In particular, IPoIB turns on the P_Key membership bit of limited membership P_Keys when creating a child interface and looks for the full membership P_key. This broke if a port was a partial member of a partition when IPoIB switched from ib_find_cached_pkey() to ib_find_pkey(), and this change fixes things again. Signed-off-by: Moni Shoua <monis@voltaire.com> Signed-off-by: Roland Dreier <rolandd@cisco.com>
*	mm: Remove slab destructors from kmem_cache_create().	Paul Mundt	2007-07-19
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Slab destructors were no longer supported after Christoph's c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>