<feed xmlns='http://www.w3.org/2005/Atom'>
<title>litmus-rt.git/drivers/infiniband/ulp/ipoib, branch v2.6.28-rc9</title>
<subtitle>The LITMUS^RT kernel.</subtitle>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt.git/'/>
<entry>
<title>IPoIB: Fix crash in path_rec_completion()</title>
<updated>2008-11-12T18:24:39+00:00</updated>
<author>
<name>Yossi Etigin</name>
<email>yosefe@Voltaire.COM</email>
</author>
<published>2008-11-12T18:24:39+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt.git/commit/?id=ff79ae80837cf45cb703b34824dd3862d2ddcb24'/>
<id>ff79ae80837cf45cb703b34824dd3862d2ddcb24</id>
<content type='text'>
Fix a crash in path_rec_completion() during an SM up/down loop.  If
more than one path record request is issued, the first completion
releases path-&gt;done, allowing ipoib_flush_paths() to free the path,
and thus corrupting it for the second completion.

Commit ee1e2c82 ("IPoIB: Refresh paths instead of flushing them on SM
change events") added the field path-&gt;valid and changed the test "if
(!path)" to "if (!path || !path-&gt;valid)".  This change made it
possible for a path with an outstanding query to pass the test and
issue another query on the same path.  Having two queries on the same
path leads to a crash.

This fixes &lt;https://bugs.openfabrics.org/show_bug.cgi?id=1325&gt;.

Signed-off-by: Yossi Etigin &lt;yosefe@voltaire.com&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Fix a crash in path_rec_completion() during an SM up/down loop.  If
more than one path record request is issued, the first completion
releases path-&gt;done, allowing ipoib_flush_paths() to free the path,
and thus corrupting it for the second completion.

Commit ee1e2c82 ("IPoIB: Refresh paths instead of flushing them on SM
change events") added the field path-&gt;valid and changed the test "if
(!path)" to "if (!path || !path-&gt;valid)".  This change made it
possible for a path with an outstanding query to pass the test and
issue another query on the same path.  Having two queries on the same
path leads to a crash.

This fixes &lt;https://bugs.openfabrics.org/show_bug.cgi?id=1325&gt;.

Signed-off-by: Yossi Etigin &lt;yosefe@voltaire.com&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IPoIB: Fix hang in ipoib_flush_paths()</title>
<updated>2008-11-12T18:24:38+00:00</updated>
<author>
<name>Yossi Etigin</name>
<email>yosefe@Voltaire.COM</email>
</author>
<published>2008-11-12T18:24:38+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt.git/commit/?id=93a3ab939ba90e00e193f0bad98f43fbdfbd925d'/>
<id>93a3ab939ba90e00e193f0bad98f43fbdfbd925d</id>
<content type='text'>
ipoib_flush_paths() can hang during an SM up/down loop: if
path_rec_start() fails (for instance, because there is no sm_ah), the
path is still added to the path list by neigh_add_path().  Then,
ipoib_flush_paths() will wait for path-&gt;done, but it will never
complete because the request was not issued at all.  Fix this by
completing path-&gt;done if issuing the query fails.

This fixes &lt;https://bugs.openfabrics.org/show_bug.cgi?id=1329&gt;.

Signed-off-by: Yossi Etigin &lt;yosefe@voltaire.com&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
ipoib_flush_paths() can hang during an SM up/down loop: if
path_rec_start() fails (for instance, because there is no sm_ah), the
path is still added to the path list by neigh_add_path().  Then,
ipoib_flush_paths() will wait for path-&gt;done, but it will never
complete because the request was not issued at all.  Fix this by
completing path-&gt;done if issuing the query fails.

This fixes &lt;https://bugs.openfabrics.org/show_bug.cgi?id=1329&gt;.

Signed-off-by: Yossi Etigin &lt;yosefe@voltaire.com&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IPoIB: Don't enable NAPI when it's already enabled</title>
<updated>2008-11-12T18:24:36+00:00</updated>
<author>
<name>Yossi Etigin</name>
<email>yosefe@Voltaire.COM</email>
</author>
<published>2008-11-12T18:24:36+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt.git/commit/?id=fe25c56190bbc0951d7c53b4ccd148e669d69938'/>
<id>fe25c56190bbc0951d7c53b4ccd148e669d69938</id>
<content type='text'>
If a P_Key is not present when an interface is created, ipoib_open()
will return after doing napi_enable().  ipoib_open() will be called
again from ipoib_pkey_poll() when the P_Key appears, after NAPI has
already been enabled, and try to enable it again. This triggers a
BUG_ON() in napi_enable().

Fix this by moving the call to napi_enable() to after the test for
P_Key presence.

Signed-off-by: Yossi Etigin &lt;yosefe@voltaire.com&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
If a P_Key is not present when an interface is created, ipoib_open()
will return after doing napi_enable().  ipoib_open() will be called
again from ipoib_pkey_poll() when the P_Key appears, after NAPI has
already been enabled, and try to enable it again. This triggers a
BUG_ON() in napi_enable().

Fix this by moving the call to napi_enable() to after the test for
P_Key presence.

Signed-off-by: Yossi Etigin &lt;yosefe@voltaire.com&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IPoIB: Set netdev offload features properly for child (VLAN) interfaces</title>
<updated>2008-10-22T22:49:49+00:00</updated>
<author>
<name>Or Gerlitz</name>
<email>ogerlitz@voltaire.com</email>
</author>
<published>2008-10-22T22:49:49+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt.git/commit/?id=83bb63f62bda28be88b21216fbb59838a10f2348'/>
<id>83bb63f62bda28be88b21216fbb59838a10f2348</id>
<content type='text'>
Child devices were created without any offload features set, fix this by
moving the code that computes the features into generic function which is
now called through non-child and child device creation.

Signed-off-by: Or Gerlitz &lt;ogerlitz@voltaire.com&gt;

-- v1 has a bug where the 'result' flag in ipoib_vlan_add may be used uninitialized
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Child devices were created without any offload features set, fix this by
moving the code that computes the features into generic function which is
now called through non-child and child device creation.

Signed-off-by: Or Gerlitz &lt;ogerlitz@voltaire.com&gt;

-- v1 has a bug where the 'result' flag in ipoib_vlan_add may be used uninitialized
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IPoIB: Clean up ethtool support</title>
<updated>2008-10-22T22:49:29+00:00</updated>
<author>
<name>Or Gerlitz</name>
<email>ogerlitz@voltaire.com</email>
</author>
<published>2008-10-22T22:49:29+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt.git/commit/?id=70c9c0db549245a49cabf42d5a74688077254d46'/>
<id>70c9c0db549245a49cabf42d5a74688077254d46</id>
<content type='text'>
Add a get_rx_csum method.  Remove the driver's own get_tso method, as
the ethtool kernel code uses the default one if nothing is provided.

Signed-off-by: Or Gerlitz &lt;ogerlitz@voltaire.com&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Add a get_rx_csum method.  Remove the driver's own get_tso method, as
the ethtool kernel code uses the default one if nothing is provided.

Signed-off-by: Or Gerlitz &lt;ogerlitz@voltaire.com&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IPoIB: Always initialize poll_timer to avoid crash on unload</title>
<updated>2008-10-10T22:58:52+00:00</updated>
<author>
<name>Roland Dreier</name>
<email>rolandd@cisco.com</email>
</author>
<published>2008-10-10T22:58:52+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt.git/commit/?id=2767840a5ca73fde62b25e0209aa9269ec4fa7c7'/>
<id>2767840a5ca73fde62b25e0209aa9269ec4fa7c7</id>
<content type='text'>
ipoib_ib_dev_stop() does del_timer_sync(&amp;priv-&gt;poll_timer), but if a
P_key for an interface is not found, poll_timer is not initialized, so
this leads to a crash or hang.  Fix this by moving where poll_timer is
initialized to ipoib_ib_dev_init(), which is always called.

This fixes &lt;https://bugs.openfabrics.org/show_bug.cgi?id=1172&gt;.

Debugged-by: Yosef Etigin &lt;yosefe@voltaire.com&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
ipoib_ib_dev_stop() does del_timer_sync(&amp;priv-&gt;poll_timer), but if a
P_key for an interface is not found, poll_timer is not initialized, so
this leads to a crash or hang.  Fix this by moving where poll_timer is
initialized to ipoib_ib_dev_init(), which is always called.

This fixes &lt;https://bugs.openfabrics.org/show_bug.cgi?id=1172&gt;.

Debugged-by: Yosef Etigin &lt;yosefe@voltaire.com&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IPoIB: Use netif_tx_lock() and get rid of private tx_lock, LLTX</title>
<updated>2008-09-30T17:36:21+00:00</updated>
<author>
<name>Roland Dreier</name>
<email>rolandd@cisco.com</email>
</author>
<published>2008-09-30T17:36:21+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt.git/commit/?id=943c246e9ba9078a61b6bcc5b4a8131ce8befb64'/>
<id>943c246e9ba9078a61b6bcc5b4a8131ce8befb64</id>
<content type='text'>
Currently, IPoIB is an LLTX driver that uses its own IRQ-disabling
tx_lock.  Not only do we want to get rid of LLTX, this actually causes
problems because of the skb_orphan() done with this tx_lock held: some
skb destructors expect to be run with interrupts enabled.

The simplest fix for this is to get rid of the driver-private tx_lock
and stop using LLTX.  We kill off priv-&gt;tx_lock and use
netif_tx_lock[_bh]() instead; the patch to do this is a tiny bit
tricky because we need to update places that take priv-&gt;lock inside
the tx_lock to disable IRQs, rather than relying on tx_lock having
already disabled IRQs.

Also, there are a couple of places where we need to disable BHs to
make sure we have a consistent context to call netif_tx_lock() (since
we no longer can use _irqsave() variants), and we also have to change
ipoib_send_comp_handler() to call drain_tx_cq() through a timer rather
than directly, because ipoib_send_comp_handler() runs in interrupt
context and drain_tx_cq() must run in BH context so it can call
netif_tx_lock().

Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently, IPoIB is an LLTX driver that uses its own IRQ-disabling
tx_lock.  Not only do we want to get rid of LLTX, this actually causes
problems because of the skb_orphan() done with this tx_lock held: some
skb destructors expect to be run with interrupts enabled.

The simplest fix for this is to get rid of the driver-private tx_lock
and stop using LLTX.  We kill off priv-&gt;tx_lock and use
netif_tx_lock[_bh]() instead; the patch to do this is a tiny bit
tricky because we need to update places that take priv-&gt;lock inside
the tx_lock to disable IRQs, rather than relying on tx_lock having
already disabled IRQs.

Also, there are a couple of places where we need to disable BHs to
make sure we have a consistent context to call netif_tx_lock() (since
we no longer can use _irqsave() variants), and we also have to change
ipoib_send_comp_handler() to call drain_tx_cq() through a timer rather
than directly, because ipoib_send_comp_handler() runs in interrupt
context and drain_tx_cq() must run in BH context so it can call
netif_tx_lock().

Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IPoIB: Fix crash when path record fails after path flush</title>
<updated>2008-09-25T22:26:15+00:00</updated>
<author>
<name>Roland Dreier</name>
<email>rolandd@cisco.com</email>
</author>
<published>2008-09-25T22:26:15+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt.git/commit/?id=c9da4bad5b80c3d9884e2c6ad8d2091252c32d5e'/>
<id>c9da4bad5b80c3d9884e2c6ad8d2091252c32d5e</id>
<content type='text'>
Commit ee1e2c82 ("IPoIB: Refresh paths instead of flushing them on SM
change events") changed how paths are flushed on an SM event.  This
change introduces a problem if the path record query triggered by
fails, causing path-&gt;ah to become NULL.  A later successful path query
will then trigger WARN_ON() in path_rec_completion(), and crash
because path-&gt;ah has already been freed, so the ipoib_put_ah() inside
the lock in path_rec_completion() may actually drop the last reference
(contrary to the comment that claims this is safe).

Fix this by updating path-&gt;ah and freeing old_ah only when the path
record query is successful.  This prevents the neighbour AH and that
path AH from getting out of sync.

This fixes &lt;https://bugs.openfabrics.org/show_bug.cgi?id=1194&gt;

Reported-by: Rabah Salem &lt;ravah@mellanox.com&gt;
Debugged-by: Eli Cohen &lt;eli@mellanox.co.il&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit ee1e2c82 ("IPoIB: Refresh paths instead of flushing them on SM
change events") changed how paths are flushed on an SM event.  This
change introduces a problem if the path record query triggered by
fails, causing path-&gt;ah to become NULL.  A later successful path query
will then trigger WARN_ON() in path_rec_completion(), and crash
because path-&gt;ah has already been freed, so the ipoib_put_ah() inside
the lock in path_rec_completion() may actually drop the last reference
(contrary to the comment that claims this is safe).

Fix this by updating path-&gt;ah and freeing old_ah only when the path
record query is successful.  This prevents the neighbour AH and that
path AH from getting out of sync.

This fixes &lt;https://bugs.openfabrics.org/show_bug.cgi?id=1194&gt;

Reported-by: Rabah Salem &lt;ravah@mellanox.com&gt;
Debugged-by: Eli Cohen &lt;eli@mellanox.co.il&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IPoIB: Fix deadlock on RTNL between bcast join comp and ipoib_stop()</title>
<updated>2008-09-16T18:57:45+00:00</updated>
<author>
<name>Yossi Etigin</name>
<email>yossi.openib@gmail.com</email>
</author>
<published>2008-09-16T18:57:45+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt.git/commit/?id=e8224e4b804b4fd26723191c1891101a5959bb8a'/>
<id>e8224e4b804b4fd26723191c1891101a5959bb8a</id>
<content type='text'>
Taking rtnl_lock in ipoib_mcast_join_complete() causes a deadlock with
ipoib_stop().  We avoid it by scheduling the piece of code that takes
the lock on ipoib_workqueue instead of executing it directly.  This
works because we only flush the ipoib_workqueue with the RTNL not held.

The deadlock happens because ipoib_stop() calls ipoib_ib_dev_down()
which calls ipoib_mcast_dev_flush(), which calls ipoib_mcast_free(),
which calls ipoib_mcast_leave(). The latter calls
ib_sa_free_multicast(), and this waits until the multicast completion
handler finishes.  This handler is ipoib_mcast_join_complete(), which
waits for the rtnl_lock(), which was already taken by ipoib_stop().

This bug was introduced in commit a77a57a1 ("IPoIB: Fix deadlock on
RTNL in ipoib_stop()").

Signed-off-by: Yossi Etigin &lt;yosefe@voltaire.com&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Taking rtnl_lock in ipoib_mcast_join_complete() causes a deadlock with
ipoib_stop().  We avoid it by scheduling the piece of code that takes
the lock on ipoib_workqueue instead of executing it directly.  This
works because we only flush the ipoib_workqueue with the RTNL not held.

The deadlock happens because ipoib_stop() calls ipoib_ib_dev_down()
which calls ipoib_mcast_dev_flush(), which calls ipoib_mcast_free(),
which calls ipoib_mcast_leave(). The latter calls
ib_sa_free_multicast(), and this waits until the multicast completion
handler finishes.  This handler is ipoib_mcast_join_complete(), which
waits for the rtnl_lock(), which was already taken by ipoib_stop().

This bug was introduced in commit a77a57a1 ("IPoIB: Fix deadlock on
RTNL in ipoib_stop()").

Signed-off-by: Yossi Etigin &lt;yosefe@voltaire.com&gt;
Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>IPoIB: Fix deadlock on RTNL in ipoib_stop()</title>
<updated>2008-08-19T22:01:32+00:00</updated>
<author>
<name>Roland Dreier</name>
<email>rolandd@cisco.com</email>
</author>
<published>2008-08-19T22:01:32+00:00</published>
<link rel='alternate' type='text/html' href='http://rtsrv.cs.unc.edu/cgit/cgit.cgi/litmus-rt.git/commit/?id=a77a57a1a22afc31891d95879fe3cf2ab03838b0'/>
<id>a77a57a1a22afc31891d95879fe3cf2ab03838b0</id>
<content type='text'>
Commit c8c2afe3 ("IPoIB: Use rtnl lock/unlock when changing device
flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which
is run from the ipoib_workqueue.  However, ipoib_stop() (which is run
inside rtnl_lock()) flushes this workqueue, which leads to a deadlock
if the join task is pending.

Fix this by simply not flushing the workqueue from ipoib_stop().  It
turns out that we really don't care about workqueue tasks running
during or after ipoib_stop(), as long as we make sure to flush the
workqueue before unregistering a netdev.

This fixes &lt;https://bugs.openfabrics.org/show_bug.cgi?id=1114&gt;.

Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Commit c8c2afe3 ("IPoIB: Use rtnl lock/unlock when changing device
flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which
is run from the ipoib_workqueue.  However, ipoib_stop() (which is run
inside rtnl_lock()) flushes this workqueue, which leads to a deadlock
if the join task is pending.

Fix this by simply not flushing the workqueue from ipoib_stop().  It
turns out that we really don't care about workqueue tasks running
during or after ipoib_stop(), as long as we make sure to flush the
workqueue before unregistering a netdev.

This fixes &lt;https://bugs.openfabrics.org/show_bug.cgi?id=1114&gt;.

Signed-off-by: Roland Dreier &lt;rolandd@cisco.com&gt;
</pre>
</div>
</content>
</entry>
</feed>
