aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/block/drbd
Commit message (Collapse)AuthorAge
...
| * drbd: poison free'd device, resource and connection structsLars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | | | | | | | Now that we have additional asynchronous kref_get/kref_put via debugfs, make sure we catch access after free. Poison struct drbd_device, drbd_connection and drbd_resource before kfree() with 0xfd, 0xfc, and 0xf2, respectively. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: also keep track of trim -> zero-out fallback peer_requestsLars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | To be able to find and present such zero-out fallback peer_requests in debugfs, we add those to "active_ee", once that list drained. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: consistently use list_add_tail for peer_request trackingLars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | | | Keep the epoch entry lists (active_ee, read_ee, sync_ee, ...) consistently "oldest first". That way finding the oldest not yet successfully processed request is simply list_first_entry_or_null. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: drop drbd_md_flushLars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | | | | | The only user of drbd_md_flush was bm_rw(), and it is always followed by either a drbd_md_sync(), or an al_write_transaction(), which, if so configured, both end up submiting a FLUSH|FUA request anyways. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: add drbd_queue_work_if_unqueued helperLars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We sometimes do if (list_empty(&w.list)) drbd_queue_work(&q, &w.list); Removal (list_del_init) may happen outside all locks, after all pending work entries have been moved to an on-stack local work list. For not dynamically allocated, but embeded, work structs, we must avoid to re-add until it really was removed. Move that list_empty check inside the spin_lock(&q->q_lock) within the helper function, and change to list_empty_careful(). This may have been the reason for a list_add corruption inside drbd_queue_work(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: drbd_rs_number_requests: fix unit mismatch in comparisonLars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | | | | | | | | | We try to limit the number of "in-flight" resync requests. One condition for that is the amount of requested data should not exceed half of what can be covered by our "max-buffers" setting. However we compared number of 4k pages with number of in-flight 512 Byte sectors, and this extra throttle triggered much earlier than intended. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: cosmetic: change all printk(level, ...) to pr_<level>(...)Lars Ellenberg2014-07-10
| | | | | | | | | | | | | | Cosmetic change only. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: clear CRASHED_PRIMARY only after successful resyncLars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | | | | | If we lost a disk during the first resync after primary crash, we could have prematurely cleared the CRASHED_PRIMARY flag. Testing on C_CONNECTED is not what we meant there, but testing for both peers to become D_UP_TO_DATE. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: improve resync request throttling due to sendbuf sizeLars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | If we throttle resync because the socket sendbuffer is filling up, tell TCP about it, so it may expand the sendbuffer for us. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * block: Convert last uses of __FUNCTION__ to __func__Joe Perches2014-07-10
| | | | | | | | | | | | | | | | Just about all of these have been converted to __func__, so convert the last uses. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drivers/block: Use RCU_INIT_POINTER(x, NULL) in drbd/drbd_state.cMonam Agarwal2014-07-10
| | | | | | | | | | | | | | | | | | | | | | | | This patch replaces rcu_assign_pointer(x, NULL) with RCU_INIT_POINTER(x, NULL) The rcu_assign_pointer() ensures that the initialization of a structure is carried out before storing a pointer to that structure. And in the case of the NULL pointer, there is no structure to initialize. So, rcu_assign_pointer(p, NULL) can be safely converted to RCU_INIT_POINTER(p, NULL) Signed-off-by: Monam Agarwal <monamagarwal123@gmail.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: short-circuit in maybe_pull_aheadLars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | If we already "pulled ahead", we can short-circuit, and avoid logging the same messages over and over again. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: application writes may set-in-sync in protocol != CLars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | | | | | | | If "dirty" blocks are written to during resync, that brings them in-sync. By explicitly requesting write-acks during resync even in protocol != C, we now can actually respect this. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: New net configuration option socket-check-timeoutPhilipp Reisner2014-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In setups involving a DRBD-proxy and connections that experience a lot of buffer-bloat it might be necessary to set ping-timeout to an unusual high value. By default DRBD uses the same value to wait if a newly established TCP-connection is stable. Since the DRBD-proxy is usually located in the same data center such a long wait time may hinder DRBD's connect process. In such setups socket-check-timeout should be set to at least to the round trip time between DRBD and DRBD-proxy. I.e. in most cases to 1. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: Limit the time we are waiting for the first packet on an accepted socketPhilipp Reisner2014-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Before the patch 'drbd: Keep the listening socket open while trying to connect to the peer' the newly created socket inherited the receive timeout from the listen socket. The listen socket had a receive timeout of connect-intervall +- 30% random jitter. The real issue is that after the mentioned patch we had no timeout at all. Now use 4 times the ping-timeout. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: implement csums-after-crash-onlyLars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Checksum based resync trades CPU cycles for network bandwidth, in situations where we expect much of the to-be-resynced blocks to be actually identical on both sides already. In a "network hickup" scenario, it won't help: all to-be-resynced blocks will typically be different. The use case is for the resync of *potentially* different blocks after crash recovery -- the crash recovery had marked larger areas (those covered by the activity log) as need-to-be-resynced, just in case. Most of those blocks will be identical. This option makes it possible to configure checksum based resync, but only actually use it for the first resync after primary crash. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: don't implicitly resize Diskless node beyond end of deviceLars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During handshake, we compare backend sizes, and user set limits, and agree on what device size we are going to expose. We remember that last-agreed-size in our meta data. But if we come up diskless, we have to accept what the peer presents us with. We used to accept the peers maximum potential capacity (backend size), which is wrong, and could lead to IO errors due to access beyond end of device. Instead, we need to accept the peer's current size. Unless that is communicated as 0, in which case we accept the backend size, or the user set limit, if set. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: fix bogus resync stats in /proc/drbdLars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We intentionally do not serialize /proc/drbd access with internal state changes or statistic updates. Because of that, cat /proc/drbd may race with resync just being finished, still see the sync state, and find information about number of blocks still to go, but then find the total number of blocks within this resync has just been reset to 0 when accessing it. This now produces bogus numbers in the resync speed estimates. Fix by accessing all relevant data only once, and fixing it up if "still to go" happens to be more than "total". Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: Remove unnecessary/unused codeAndreas Gruenbacher2014-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Get rid of dump_stack() debug statements. There is no point whatsoever in registering and unregistering a reboot notifier that doesn't do anything. The intention was to switch to an "emergency read-only" mode, so we won't have to resync the full activity log just because we had been Primary before the reboot. Once we have that implemented, we may re-introduce the reboot notifier. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: silence -Wmissing-prototypes warningsLars Ellenberg2014-07-10
| | | | | | | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: drop wrong debugging aidLars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | | | | | | | The textual representation of resync extents in /proc/drbd presented with proc_details >= 3 was wrong, it used bitnumbers as bitmasks. It was not particularly useful either, and I doubt anyone has even tried to look at it in the last few years. Drop it. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: get rid of drbd_queue_work_frontLars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The last user was al_write_transaction, if called with "delegate", and the last user to call it with "delegate = true" was the receiver thread, which has no need to delegate, but can call it himself. Finally drop the delegate parameter, drop the extra w_al_write_transaction callback, and drop drbd_queue_work_front. Do not (yet) change dequeue_work_item to dequeue_work_batch, though. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: use drbd_device_post_work() in more placesLars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | | | | | | | This replaces the md_sync_work member of struct drbd_device by a new MD_SYNC "work bit" in device->flags. This replaces the resync_start_work member of struct drbd_device by a new RS_START "work bit" in device->flags. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: make sure disk cleanup happens in worker contextLars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The recent fix to put_ldev() (correct ordering of access to local_cnt and state.disk; memory barrier in __drbd_set_state) guarantees that the cleanup happens exactly once. However it does not yet guarantee that the cleanup happens from worker context, the last put_ldev() may still happen from atomic context, which must not happen: blkdev_put() may sleep. Fix this by scheduling the cleanup to the worker instead, using a couple more bits in device->flags and a new helper, drbd_device_post_work(). Generalized the "resync progress" work to cover these new work bits. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: close race when detaching from diskLars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BUG: unable to handle kernel NULL pointer dereference at 0000000000000058 IP: bd_release+0x21/0x70 Process drbd_w_t7146 Call Trace: close_bdev_exclusive drbd_free_ldev [drbd] drbd_ldev_destroy [drbd] w_after_state_ch [drbd] Race probably went like this: state.disk = D_FAILED ... first one to hit zero during D_FAILED: put_ldev() /* ----------------> 0 */ i = atomic_dec_return() if (i == 0) if (state.disk == D_FAILED) schedule_work(go_diskless) /* 1 <------ */ get_ldev_if_state() go_diskless() do_some_pre_cleanup() corresponding put_ldev(): force_state(D_DISKLESS) /* 0 <------ */ i = atomic_dec_return() if (i == 0) atomic_inc() /* ---------> 1 */ state.disk = D_DISKLESS schedule_work(after_state_ch) /* execution pre-empted by IRQ ? */ after_state_ch() put_ldev() i = atomic_dec_return() /* 0 */ if (i == 0) if (state.disk == D_DISKLESS) if (state.disk == D_DISKLESS) drbd_ldev_destroy() drbd_ldev_destroy(); Trying to fix this by checking the disk state *before* the atomic_dec_return(), which implies memory barriers, and by inserting extra memory barriers around the state assignment in __drbd_set_state(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: explicitly submit meta data requests with REQ_NOIDLELars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | | | | | For some reason we have assumed NOIDLE was implied by one of the other flags we set. It is not (anymore?). Explicitly set REQ_NOIDLE for synchronous meta data updates, or we can seriously starve random writes when using CFQ. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: move set_disk_ro() to after we persisted the new roleLars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | | | | | This probably does not have any real life impact, but we should first persist any potentially new UUID and other meta data flags, as well as our new role, before we allow/disallow write access. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: trigger tcp_push_pending_frames() for PING and PING_ACKLars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | | | This should reduce latency for such in-DRBD-protocol "pings", and may help reduce spurious disconnect/reconnect cycles due to "PingAck did not arrive in time." Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: re-add lost conf_mutex protection in drbd_set_roleLars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The conf_update mutex used to be held while clearing the net_conf->discard_my_data flag inside drbd_set_role. It was moved into drbd_adm_set_role with drbd: allow parallel promote/demote actions but then replaced at that location by the newly introduced adm_mutex with drbd: Fix a potential deadlock in drbdsetup, introduce resource->adm_mutex And I simply forgot to put it back in at the original location. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: stop the meta data sync timer before open coded meta data syncLars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | | | If we re-write all meta data due to resize, we have open-coded write-out of our meta data super block. Stop the md_sync_timer, it would just trigger scary but in this case spurious "timer expired" messages. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: fix resync finished detectionLars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes one recent regresion, and one long existing bug. The bug: drbd_try_clear_on_disk_bm() assumed that all "count" bits have to be accounted in the resync extent corresponding to the start sector. Since we allow application requests to cross our "extent" boundaries, this assumption is no longer true, resulting in possible misaccounting, scary messages ("BAD! sector=12345s enr=6 rs_left=-7 rs_failed=0 count=58 cstate=..."), and potentially, if the last bit to be cleared during resync would reside in previously misaccounted resync extent, the resync would never be recognized as finished, but would be "stalled" forever, even though all blocks are in sync again and all bits have been cleared... The regression was introduced by drbd: get rid of atomic update on disk bitmap works For an "empty" resync (rs_total == 0), we must not "finish" the resync on the SyncSource before the SyncTarget knows all relevant information (sync uuid). We need to wait for the full round-trip, the SyncTarget will then explicitly notify us. Also for normal, non-empty resyncs (rs_total > 0), the resync-finished condition needs to be tested before the schedule() in wait_for_work, or it is likely to be missed. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: fix a race stopping the worker threadLars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | We may implicitly call drbd_send() from inside wait_for_work(), via maybe_send_barrier(). If the "stop" signal was send just before that, drbd_send() would call flush_signals(), and we would run an unbounded schedule() afterwards. Fix: check for thread_state == RUNNING before we schedule() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: get rid of atomic update on disk bitmap worksLars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just trigger the occasional lazy bitmap write-out during resync from the central wait_for_work() helper. Previously, during resync, bitmap pages would be written out separately, synchronously, one at a time, at least 8 times each (every 512 bytes worth of bitmap cleared). Now we trigger "merge friendly" bulk write out of all cleared pages every two seconds during resync, and once the resync is finished. Most pages will be written out only once. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: allow write-ordering policy to be bumped up againLars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, once you disabled flushes as a means of enforcing write-ordering, you'd need to detach/re-attach to enable them again. Allow drbdsetup disk-options to re-enable previously disabled write-ordering policy options at runtime. While at it fix RCU in drbd_bump_write_ordering() max_allowed_wo() uses rcu_dereference, therefore it must be called within rcu_read_lock()/rcu_read_unlock() Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: refactor use of first_peer_device()Lars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | Reduce the number of calls to first_peer_device(). Instead, call first_peer_device() just once to assign a local variable peer_device. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: reduce number of spinlock drop/re-aquire cyclesLars Ellenberg2014-07-10
| | | | | | | | | | | | | | | | | | Instead of dropping and re-aquiring the spinlock around the submit, just remember that we want to submit, and do that only once we have dropped the spinlock for good. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: rename drbd_free_bc() to drbd_free_ldev()Philipp Reisner2014-07-10
| | | | | | | | | | | | | | Since the member of drbd_device is called ldev Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: device->ldev is not guaranteed on an D_ATTACHING diskPhilipp Reisner2014-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some parts of the code assumed that get_ldev_if_state(device, D_ATTACHING) is sufficient to access the ldev member of the device object. That was wrong. ldev may not be there or might be freed at any time if the device has a disk state of D_ATTACHING. bm_rw() Documented that drbd_bm_read() is only called from drbd_adm_attach. drbd_bm_write() is only called when a reference is held, and it is documented that a caller has to hold a reference before calling drbd_bm_write() drbd_bm_write_page() Use get_ldev() instead of get_ldev_if_state(device, D_ATTACHING) drbd_bmio_set_n_write() No longer use get_ldev_if_state(device, D_ATTACHING). All callers hold a reference to ldev now. drbd_bmio_clear_n_write() All callers where holding a reference of ldev anyways. Remove the misleading get_ldev_if_state(device, D_ATTACHING) drbd_reconsider_max_bio_size() Removed the get_ldev_if_state(device, D_ATTACHING). All callers now pass a struct drbd_backing_dev* when they have a proper reference, or a NULL pointer. Before this fix, the receiver could trigger a NULL pointer deref when in drbd_reconsider_max_bio_size() drbd_bump_write_ordering() Used get_ldev_if_state(device, D_ATTACHING) with the wrong assumption. Remove it, and allow the caller to pass in a struct drbd_backing_dev* when the caller knows that accessing this bdev is safe. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
| * drbd: Move write_ordering from connection to resourcePhilipp Reisner2014-07-10
| | | | | | | | | | Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
* | drbd: fix regression 'out of mem, failed to invoke fence-peer helper'Lars Ellenberg2014-07-10
|/ | | | | | | | | | | | | | Since linux kernel 3.13, kthread_run() internally uses wait_for_completion_killable(). We sometimes may use kthread_run() while we still have a signal pending, which we used to kick our threads out of potentially blocking network functions, causing kthread_run() to mistake that as a new fatal signal and fail. Fix: flush_signals() before kthread_run(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* drbd: fix NULL pointer deref in blk_add_request_payloadLars Ellenberg2014-06-25
| | | | | | | | | | Discards don't have any payload. But the scsi layer still expects a bio_vec it can use internally, see sd_setup_discard_cmnd() and blk_add_request_payload(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* drbd: use list_first_entry_or_null in first_peer_device/first_connectionLars Ellenberg2014-04-30
| | | | | | | | | | | | | If there are no peer_devices or connections, I'd rather have NULL than some "arbitrary" address pretending to point to a struct. Helps to avoid hard to debug symptoms, in case we ever try to use and dereference a drbd_connection or drbd_peer_device where we in fact don't have any connection at all. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* drbd: Allow attaching of a newly created device to any backing devicePhilipp Reisner2014-04-30
| | | | | | | | | | A newly created device was never exposed before, i.e. has a exposed_data_uuid of 0. Then it is valid to attach to any current_uuid of a backing device (of course also to a newly created one (4)) Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* drbd: Test cstate while holding req_lockPhilipp Reisner2014-04-30
| | | | | | | | | | | In case a connection transitions into C_TIMEOUT within the timer function (request_timer_fn()) we need to make sure that the receiver thread (potentially running on a different CPU) sees the updated cstate later on. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* drbd: use blk_set_stacking_limits()Philipp Reisner2014-04-30
| | | | | | | | ...instead directly assigning to q->limits.discard_zeroes_data Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* drbd: evaluate disk and network timeout on different requestsLars Ellenberg2014-04-30
| | | | | | | | | | | | | | | | | | | | | Just because it is the oldest not yet completed request does not make it the oldest request waiting for disk. Or waiting for the peer. And we completely missed already completed requests that would still hold references to activity log extents, waiting only for the barrier ack. Find two oldest not yet completely processed requests, one that is still waiting for local completion, and one that is still waiting for some response from the peer. These may or may not be the same request object. Then separately apply the network and disk timeouts, respectively. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* drbd: Fix a hole in the challange-response connection authenticationPhilipp Reisner2014-04-30
| | | | | | | | | | | | | | | | | In the implementation as it was, the two peers sent each other a challenge, and expects the challenge hashed with the shared secret back. A attacker could simply wait for the challenge of the peer, and send the same challenge back. Then it waits for the response, and sends the same response back. Prevent this by not accepting a challenge from the peer that is the same as the challenge sent to the peer. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* drbd: always implicitly close last epoch when idleLars Ellenberg2014-04-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Once our sender thread needs to wait_for_work(), and actually needs to schedule(), just before we do that, we already check if it is useful to implicitly close the last epoch. The condition was too strict: only implicitly close the epoch, if there have been no new (write) requests at all. The assumption was that if there were new requests, they would always be communicated one way or another, and would send necessary epoch separating barriers explicitly. This is not always true, e.g. when becoming diskless, or while explicitly starting a full resync. The last communicated epoch could stay open for a long time, locking down corresponding activity log extents. It is safe to always implicitly send that last barrier, as soon as we determin that there cannot be more requests in the last communicated epoch, even if there have been (uncommunicated) new requests in new epochs meanwhile. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* drbd: add back some fairness to AL transactionsLars Ellenberg2014-04-30
| | | | | | | | | | | | | | | | | | | | | | | | | | When batching more updates to the activity log into single transactions, we lost the ability for new requests to force themselves into the active set: all preparation steps became non-blocking, and if all currently hot extents keep busy, they could starve out new incoming requests to cold extents for quite a while. This can only happen if your IO backend accepts more IO operations per average DRBD replication round trip time than you have al-extents configured. If we have incoming requests to cold extents, at least do one blocking update per transaction. In an artificial worst-case workload on SSD with an asynchronous 600 ms replication link, with al-extents = 7 (the minimum we allow), and concurrent full resynch, without this patch, some write requests have been observed to be starved for 40 seconds. With this patch, application observed a worst case latency of twice the replication round trip time. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* drbd: keep max-bio size during detach/attach on disconnected primaryLars Ellenberg2014-04-30
| | | | | | | | | | | | | | | We want to store in persistent meta data what the peer DRBD can handle, which, due to spreading requests to multiple bios, may be more than its backing device can handle. Otherwise, if a disconnected Primary temporarily loses access to its local data as well, we may accidentally shrink the max-bio setting, portentially causing already assembled, but not yet processed, application bios to be spuriously failed due to device limits. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>