| Commit message (Collapse) | Author | Age |
... | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Using TX and RX CQs attached to the same vector might
create a throttling effect coming from the serial processing
of a work-queue. Use one CQ instead, it will do better in interrupt
processing and it provides a simpler code. Also, We get rid of
redundant isert_rx_wq.
Next we can remove the atomic post_send_buf_count from the IO path.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
A pre-step before going to a single CQ.
Also this makes the code a little more simple to
read.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Nit, uintptr_t is designed for pointer casting, use it.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
As a pre-step to a single CQ, we unite the error completion
handlers to a single handler.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
It is disabled at the moment, we will get that back
in once the target is more stable.
This reverts commit 95b60f0
"Add support for completion interrupt coalescing"
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Currently we have no way to tell that the target stack is in shutdown
sequence. In case we have open connections, the initiator immediately
attempts to reconnect in a DDOS attack style, so we may end up
terminating the iser enabled network portal while it's np_accept_list
still have pending connections.
The workaround is simply release all the connections in the list.
A proper fix will be to start shutdown sequence by shutting the
network portal to avoid initiator immediate reconnect attempts.
But the temporary work around seems to work at this point, so I think
we can do this for now...
Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
iSER will report supported protection operations based on
the tpg attribute t10_pi settings and HCA PI offload capabilities.
If the HCA does not support PI offload or tpg attribute t10_pi is
not set, we fall to SW PI mode.
In order to do that, we move iscsit_get_sup_prot_ops after connection
tpg assignment.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.14+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Fallback to software mode DIF if HCA does not support
PI (without crashing obviously). It is still possible to
run with backend protection and an unprotected frontend,
so looking at the command prot_op is not enough. Check
device PI capability on a per-IO basis (isert_prot_cmd
inline static) to determine if we need to handle protection
information.
Trace:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffffa037f8b1>] isert_reg_sig_mr+0x351/0x3b0 [ib_isert]
Call Trace:
[<ffffffff812b003a>] ? swiotlb_map_sg_attrs+0x7a/0x130
[<ffffffffa038184d>] isert_reg_rdma+0x2fd/0x370 [ib_isert]
[<ffffffff8108f2ec>] ? idle_balance+0x6c/0x2c0
[<ffffffffa0382b68>] isert_put_datain+0x68/0x210 [ib_isert]
[<ffffffffa02acf5b>] lio_queue_data_in+0x2b/0x30 [iscsi_target_mod]
[<ffffffffa02306eb>] target_complete_ok_work+0x21b/0x310 [target_core_mod]
[<ffffffff8106ece2>] process_one_work+0x182/0x3b0
[<ffffffff8106fda0>] worker_thread+0x120/0x3c0
[<ffffffff8106fc80>] ? maybe_create_worker+0x190/0x190
[<ffffffff8107594e>] kthread+0xce/0xf0
[<ffffffff81075880>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff8159a22c>] ret_from_fork+0x7c/0xb0
[<ffffffff81075880>] ? kthread_freezable_should_stop+0x70/0x70
Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.14+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This patch converts to allocate PI contexts dynamically in order
avoid a potentially bogus np->tpg_np and associated NULL pointer
dereference in isert_connect_request() during iser-target endpoint
shutdown with multiple network portals.
Also, there is really no need to allocate these at connection
establishment since it is not guaranteed that all the IOs on
that connection will be to a PI formatted device.
We can do it in a lazy fashion so the initial burst will have a
transient slow down, but very fast all IOs will allocate a PI
context.
Squashed:
iser-target: Centralize PI context handling code
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.14+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
In situations such as bond failover, The new session establishment
implicitly invokes the termination of the old connection.
So, we don't want to wait for the old connection wait_conn to completely
terminate before we accept the new connection and post a login response.
The solution is to deffer the comp_wait completion and the conn_put to
a work so wait_conn will effectively be non-blocking (flush errors are
assumed to come very fast).
We allocate isert_release_wq with WQ_UNBOUND and WQ_UNBOUND_MAX_ACTIVE
to spread the concurrency of release works.
Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The np listener cm_id will also get ADDR_CHANGE event
upcall (in case it is bound to a specific IP). Handle
it correctly by creating a new cm_id and implicitly
destroy the old one.
Since this is the second event a listener np cm_id may
encounter, we move the np cm_id event handling to a
routine.
Squashed:
iser-target: Move cma_id setup to a function
Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Take isert_conn pointer from cm_id->qp->qp_context. This
will allow us to know that the cm_id context is always
the network portal. This will make the cm_id event check
(connection or network portal) more reliable.
In order to avoid a NULL dereference in cma_id->qp->qp_context
we destroy the qp after we destroy the cm_id (and make the
dereference safe). session stablishment/teardown sequences
can happen in parallel, we should take into account that
connected_handler might race with connection teardown flow.
Also, protect isert_conn->conn_device->active_qps decrement
within the error patch during QP creation failure and the
normal teardown path in isert_connect_release().
Squashed:
iser-target: Decrement completion context active_qps in error flow
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
There is no point in accepting a new CM request only
when we are completely done with the last iscsi login.
Instead we accept immediately, this will also cause the
CM connection to reach connected state and the initiator
is allowed to send the first login. We mark that we got
the initial login and let iscsi layer pick it up when it
gets there.
This reduces the parallel login sequence by a factor of
more then 4 (and more for multi-login) and also prevents
the initiator (who does all logins in parallel) from
giving up on login timeout expiration.
In order to support multiple login requests sequence (CHAP)
we call isert_rx_login_req from isert_rx_completion insead
of letting isert_get_login_rx call it.
Squashed:
iser-target: Use kref_get_unless_zero in connected_handler
iser-target: Acquire conn_mutex when changing connection state
iser-target: Reject connect request in failure path
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
ISER_CONN_UP state is not sufficient to know if
we should wait for completion of flush errors and
disconnected_handler event.
Instead, split it to 2 states:
- ISER_CONN_UP: Got to CM connected phase, This state
indicates that we need to wait for a CM disconnect
event before going to teardown.
- ISER_CONN_FULL_FEATURE: Got to full feature phase
after we posted login response, This state indicates
that we posted recv buffers and we need to wait for
flush completions before going to teardown.
Also avoid deffering disconnected handler to a work,
and handle it within disconnected handler.
More work here is needed to handle DEVICE_REMOVAL event
correctly (cleanup all resources).
Squashed:
iser-target: Don't deffer disconnected handler to a work
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Since commit 0fc4ea701fcf ("Target/iser: Don't put isert_conn inside
disconnected handler") we put the conn kref in isert_wait_conn, so we
need .wait_conn to be invoked also in the error path.
Introduce call to isert_conn_terminate (called under lock)
which transitions the connection state to TERMINATING and calls
rdma_disconnect. If the state is already teminating, just bail
out back (temination started).
Also, make sure to destroy the connection when getting a connect
error event if didn't get to connected (state UP). Same for the
handling of REJECTED and UNREACHABLE cma events.
Squashed:
iscsi-target: Add call to wait_conn in establishment error flow
Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
|
| | | | | |
| \ \ \ | |
| \ \ \ | |
| \ \ \ | |
|\ \ \ \ \ \
| |_|_|_|/ /
|/| | | | |
| | | | | | |
'srp' into for-next
|
| |_|/ / /
|/| | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
In case the last argument of the connection string is processed as a
string (destination GID for example).
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Following few recent Block integrity updates, we align the iSER data
integrity offload settings with:
- Deprecate pi_guard module param
- Expose support for DIX type 0.
- Use scsi_transfer_length for the transfer length
- Get pi_interval, ref_tag, ref_remap, bg_type and
check_mask setting from scsi_cmnd
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Use likely() for wc.status == IB_WC_SUCCESS
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
And fix a checkpatch warning.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
No reason to settle with four, can use the min between device max comp
vectors and number of cores.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
It is enough to check mem_h pointer assignment, mem_h == NULL will
indicate that buffer is not registered using mr.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Eliminates code duplication.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When closing the connection, we should first terminate the connection
(in case it was not previously terminated) to guarantee the QP is in
error state and we are done with servicing IO. Only then go ahead with
tasks cleanup via iscsi_conn_stop.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
In certain scenarios (target kill with live IO) scsi TMFs may race
with iser RDMA teardown, which might cause NULL dereference on iser IB
device handle (which might have been freed). In this case we take a
conditional lock for TMFs and check the connection state (avoid
introducing lock contention in the IO path). This is indeed best
effort approach, but sufficient to survive multi targets sudden death
while heavy IO is inflight.
While we are on it, add a nice kernel-doc style documentation.
Reported-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
If rdma_cm error event comes after ep_poll but before conn_bind, we
should protect against dereferncing the device (which may have been
terminated) in session_create and conn_create (already protected)
callbacks.
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Use uintptr_t to handle wr_id casting, which was found by Kbuild test
robot and smatch. Also remove an internal definition of variable which
potentially shadows an external one (and make sparse happy).
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Fix a regression was introduced in commit 6df5a128f0fd ("IB/iser:
Suppress scsi command send completions").
The sig_count was wrongly set to be static variable, thus it is
possible that we won't reach to (sig_count % ISER_SIGNAL_BATCH) == 0
condition (due to races) and the send queue will be overflowed.
Instead keep sig_count per connection. We don't need it to be atomic
as we are safe under the iscsi session frwd_lock taken by libiscsi on
the queuecommand path.
Fixes: 6df5a128f0fd ("IB/iser: Suppress scsi command send completions")
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
When creating a connection QP we choose the least used CQ and inc the
number of active QPs on that. If we fail to create the QP, we need to
decrement the active QPs counter.
Reported-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
No real need to wait for TIMEWAIT_EXIT before we destroy the RDMA
resources (also TIMEAWAIT_EXIT is not guarenteed to always arrive). As
for the cma_id, only destroy it if the state is not DOWN where in this
case, conn_release is already running and we don't want to compete.
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
In case of the HCA going into catasrophic error flow, the
beacon post_send is likely to fail, so surely there will
be no completion for it.
In this case, use a best effort approach and don't wait for beacon
completion if we failed to post the send.
Reported-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| |/ / /
|/| | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Re-adjust max CQEs per CQ and max send_wr per QP according
to the resource limits supported by underlying hardware.
Signed-off-by: Minh Tran <minhduc.tran@emulex.com>
Signed-off-by: Jayamohan Kallickal <jayamohan.kallickal@emulex.com>
Acked-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Various places in the IPoIB code had a deadlock related to flushing
the ipoib workqueue. Now that we have per device workqueues and a
specific flush workqueue, there is no longer a deadlock issue with
flushing the device specific workqueues and we can do so unilaterally.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
We used to pass a flush variable to mcast_stop_thread to indicate if
we should flush the workqueue or not. This was due to some code
trying to flush a workqueue that it was currently running on which is
a no-no. Now that we have per-device work queues, and now that
ipoib_mcast_restart_task has taken the fact that it is queued on a
single thread workqueue with all of the ipoib_mcast_join_task's and
therefore has no need to stop the join task while it runs, we can do
away with the flush parameter and unilaterally flush always.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
During my recent work on the rtnl lock deadlock in the IPoIB driver, I
saw that even once I fixed the apparent races for a single device, as
soon as that device had any children, new races popped up. It turns
out that this is because no matter how well we protect against races
on a single device, the fact that all devices use the same workqueue,
and flush_workqueue() flushes *everything* from that workqueue, we can
have one device in the middle of a down and holding the rtnl lock and
another totally unrelated device needing to run mcast_restart_task,
which wants the rtnl lock and will loop trying to take it unless is
sees its own FLAG_ADMIN_UP flag go away. Because the unrelated
interface will never see its own ADMIN_UP flag drop, the interface
going down will deadlock trying to flush the queue. There are several
possible solutions to this problem:
Make carrier_on_task and mcast_restart_task try to take the rtnl for
some set period of time and if they fail, then bail. This runs the
real risk of dropping work on the floor, which can end up being its
own separate kind of deadlock.
Set some global flag in the driver that says some device is in the
middle of going down, letting all tasks know to bail. Again, this can
drop work on the floor. I suppose if our own ADMIN_UP flag doesn't go
away, then maybe after a few tries on the rtnl lock we can queue our
own task back up as a delayed work and return and avoid dropping work
on the floor that way. But I'm not 100% convinced that we won't cause
other problems.
Or the method this patch attempts to use, which is when we bring an
interface up, create a workqueue specifically for that interface, so
that when we take it back down, we are flushing only those tasks
associated with our interface. In addition, keep the global
workqueue, but now limit it to only flush tasks. In this way, the
flush tasks can always flush the device specific work queues without
having deadlock issues.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
In preparation for using per device work queues, we need to move the
start of the neighbor thread task to after ipoib_ib_dev_init and move
the destruction of the neighbor task to before ipoib_ib_dev_cleanup.
Otherwise we will end up freeing our workqueue with work possibly
still on it.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Our mcast_dev_flush routine and our mcast_restart_task can race
against each other. In particular, they both hold the priv->lock
while manipulating the rbtree and while removing mcast entries from
the multicast_list and while adding entries to the remove_list, but
they also both drop their locks prior to doing the actual removes.
The mcast_dev_flush routine is run entirely under the rtnl lock and so
has at least some locking. The actual race condition is like this:
Thread 1 Thread 2
ifconfig ib0 up
start multicast join for broadcast
multicast join completes for broadcast
start to add more multicast joins
call mcast_restart_task to add new entries
ifconfig ib0 down
mcast_dev_flush
mcast_leave(mcast A)
mcast_leave(mcast A)
As mcast_leave calls ib_sa_multicast_leave, and as member in
core/multicast.c is ref counted, we run into an unbalanced refcount
issue. To avoid stomping on each others removes, take the rtnl lock
specifically when we are deleting the entries from the remove list.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Commit a9c8ba5884 ("IPoIB: Fix usage of uninitialized multicast
objects") added a new flag MCAST_JOIN_STARTED, but was not very strict
in how it was used. We didn't always initialize the completion struct
before we set the flag, and we didn't always call complete on the
completion struct from all paths that complete it. This made it less
than totally effective, and certainly made its use confusing. And in
the flush function we would use the presence of this flag to signal
that we should wait on the completion struct, but we never cleared
this flag, ever. This is further muddied by the fact that we overload
the MCAST_FLAG_BUSY flag to mean two different things: we have a join
in flight, and we have succeeded in getting an ib_sa_join_multicast.
In order to make things clearer and aid in resolving the rtnl deadlock
bug I've been chasing, I cleaned this up a bit.
1) Remove the MCAST_JOIN_STARTED flag entirely
2) Un-overload MCAST_FLAG_BUSY so it now only means a join is in-flight
3) Test on mcast->mc directly to see if we have completed
ib_sa_join_multicast (using IS_ERR_OR_NULL)
4) Make sure that before setting MCAST_FLAG_BUSY we always initialize
the mcast->done completion struct
5) Make sure that before calling complete(&mcast->done), we always clear
the MCAST_FLAG_BUSY bit
6) Take the mcast_mutex before we call ib_sa_multicast_join and also
take the mutex in our join callback. This forces
ib_sa_multicast_join to return and set mcast->mc before we process
the callback. This way, our callback can safely clear mcast->mc
if there is an error on the join and we will do the right thing as
a result in mcast_dev_flush.
7) Because we need the mutex to synchronize mcast->mc, we can no
longer call mcast_sendonly_join directly from mcast_send and
instead must add sendonly join processing to the mcast_join_task
A number of different races are resolved with these changes. These
races existed with the old MCAST_FLAG_BUSY usage, the
MCAST_JOIN_STARTED flag was an attempt to address them, and while it
helped, a determined effort could still trip things up.
One race looks something like this:
Thread 1 Thread 2
ib_sa_join_multicast (as part of running restart mcast task)
alloc member
call callback
ifconfig ib0 down
wait_for_completion
callback call completes
wait_for_completion in
mcast_dev_flush completes
mcast->mc is PTR_ERR_OR_NULL
so we skip ib_sa_leave_multicast
return from callback
return from ib_sa_join_multicast
set mcast->mc = return from ib_sa_multicast
We now have a permanently unbalanced join/leave issue that trips up the
refcounting in core/multicast.c
Another like this:
Thread 1 Thread 2 Thread 3
ib_sa_multicast_join
ifconfig ib0 down
priv->broadcast = NULL
join_complete
wait_for_completion
mcast->mc is not yet set, so don't clear
return from ib_sa_join_multicast and set mcast->mc
complete
return -EAGAIN (making mcast->mc invalid)
call ib_sa_multicast_leave
on invalid mcast->mc, hang
forever
By holding the mutex around ib_sa_multicast_join and taking the mutex
early in the callback, we force mcast->mc to be valid at the time we
run the callback. This allows us to clear mcast->mc if there is an
error and the join is going to fail. We do this before we complete
the mcast. In this way, mcast_dev_flush always sees consistent state
in regards to mcast->mc membership at the time that the
wait_for_completion() returns.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
We blindly assume that we can just take the rtnl lock and that will
prevent races with downing this interface. Unfortunately, that's not
the case. In ipoib_mcast_stop_thread() we will call flush_workqueue()
in an attempt to clear out all remaining instances of ipoib_join_task.
But, since this task is put on the same workqueue as the join task,
the flush_workqueue waits on this thread too. But this thread is
deadlocked on the rtnl lock. The better thing here is to use trylock
and loop on that until we either get the lock or we see that
FLAG_ADMIN_UP has been cleared, in which case we don't need to do
anything anyway and we just return.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|/ / /
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Setting the MTU can safely be moved to the carrier_on_task, which keeps
us from needing to take the rtnl lock in the join_finish section.
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|\ \ \
| | |/
| |/|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Conflicts:
drivers/scsi/scsi_debug.c
Agreed and tested resolution to a merge problem between a fix in scsi_debug
and a driver update
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Drop the now unused reason argument from the ->change_queue_depth method.
Also add a return value to scsi_adjust_queue_depth, and rename it to
scsi_change_queue_depth now that it can be used as the default
->change_queue_depth implementation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Hannes Reinecke <hare@suse.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We won't ever queue more commands than the host allows. Instead of
letting drivers either reject or ignore this case handle it in
common code. Note that various driver use internal constant or
variables that are assigned to both shost->can_queue and checked
in ->change_queue_depth - I did remove those checks as well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Hannes Reinecke <hare@suse.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
All drivers use the implementation for ramping the queue up and down, so
instead of overloading the change_queue_depth method call the
implementation diretly if the driver opts into it by setting the
track_queue_depth flag in the host template.
Note that a few drivers validated the new queue depth in their
change_queue_depth method, but as we never go over the queue depth
set during slave_configure or the sysfs file this isn't nessecary
and can safely be removed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Venkatesh Srinivas <venkateshs@google.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
At least LID reassignment can trigger a race condition in the SRP
initiator driver, namely the receive completion handler trying to
post a request on a QP during or after QP destruction and before
the CQ's have been destroyed. Avoid this race by modifying a QP
into the error state and by waiting until all receive completions
have been processed before destroying a QP.
Reported-by: Max Gurtuvoy <maxg@mellanox.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Improve performance by using multiple RDMA/RC channels per SCSI
host for communication with an SRP target. About the
implementation:
- Introduce a loop over all channels in the code that uses
target->ch.
- Set the SRP_MULTICHAN_MULTI flag during login for the creation
of the second and subsequent channels.
- RDMA completion vectors are chosen such that RDMA completion
interrupts are handled by the CPU socket that submitted the I/O
request. As one can see in this patch it has been assumed if a
system contains n CPU sockets and m RDMA completion vectors
have been assigned to an RDMA HCA that IRQ affinity has been
configured such that completion vectors [i*m/n..(i+1)*m/n) are
bound to CPU socket i with 0 <= i < n.
- Modify srp_free_ch_ib() and srp_free_req_data() such that it
becomes safe to invoke these functions after the corresponding
allocation function failed.
- Add a ch_count sysfs attribute per target port.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Since the block layer already contains functionality to assign
a tag to each request, use that functionality instead of
reimplementing that functionality in the SRP initiator driver.
This change makes the free_reqs list superfluous. Hence remove
that list.
[hch: updated to use .use_blk_tags instead scsi_activate_tcq]
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Changes in this patch:
- Move channel variables into a new structure (struct srp_rdma_ch).
- Add an srp_target_port pointer, 'lock' and 'comp_vector' members
in struct srp_rdma_ch.
- Add code to initialize these three new member variables.
- Many boring "target->" into "ch->" changes.
- The cm_id and completion handler context pointers are now of type
srp_rdma_ch * instead of srp_target_port *.
- Three kzalloc(a * b, f) calls have been changed into kcalloc(a, b, f)
to avoid that this patch would trigger a checkpatch warning.
- Two casts from u64 into unsigned long long have been left out
because these are superfluous. Since considerable time u64 is
defined as unsigned long long for all architectures supported by
the Linux kernel.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Introduce the srp_target_port member variables 'sgid' and 'pkey'.
Change the type of 'orig_dgid' from __be16[8] into union ib_gid.
This patch does not change any functionality but makes the
"Separate target and channel variables" patch easier to verify.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|