aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/md
Commit message (Collapse)AuthorAge
...
| * dm crypt: use per-bio dataMikulas Patocka2014-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change dm-crypt so that it uses auxiliary data allocated with the bio. Dm-crypt requires two allocations per request - struct dm_crypt_io and struct ablkcipher_request (with other data appended to it). It previously only used mempool allocations. Some requests may require more dm_crypt_ios and ablkcipher_requests, however most requests need just one of each of these two structures to complete. This patch changes it so that the first dm_crypt_io and ablkcipher_request are allocated with the bio (using target per_bio_data_size option). If the request needs additional values, they are allocated from the mempool. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm table: make dm_table_supports_discards staticMikulas Patocka2014-08-01
| | | | | | | | | | | | | | | | | | The function dm_table_supports_discards is only called from dm-table.c:dm_table_set_restrictions(). So move it above dm_table_set_restrictions and make it static. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm cache metadata: use dm-space-map-metadata.h defined size limitsMike Snitzer2014-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | Commit 7d48935e cleaned up the persistent-data's space-map-metadata limits by elevating them to dm-space-map-metadata.h. Update dm-cache-metadata to use these same limits. The calculation for DM_CACHE_METADATA_MAX_SECTORS didn't account for the sizeof the disk_bitmap_header. So the supported maximum metadata size is a bit smaller (reduced from 33423360 to 33292800 sectors). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com>
| * dm cache: fail migrations in the do_worker error pathJoe Thornber2014-08-01
| | | | | | | | | | Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm cache: simplify deferred set reference count incrementsJoe Thornber2014-08-01
| | | | | | | | | | | | | | | | | | | | | | Factor out inc_and_issue and inc_ds helpers to simplify deferred set reference count increments. Also cleanup cache_map to consistently call cell_defer and inc_ds when the bio is DM_MAPIO_REMAPPED. No functional change. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm thin: relax external origin size constraintsJoe Thornber2014-08-01
| | | | | | | | | | | | | | | | | | | | Track the size of any external origin. Previously the external origin's size had to be a multiple of the thin-pool's block size, that is no longer a requirement. In addition, snapshots that are larger than the external origin are now supported. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm thin: switch to an atomic_t for tracking pending new block preparationsJoe Thornber2014-08-01
| | | | | | | | | | | | | | | | | | Previously we used separate boolean values to track quiescing and copying actions. By switching to an atomic_t we can support blocks that need a partial copy and partial zero. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm mpath: eliminate pg_ready() wrapperMike Snitzer2014-08-01
| | | | | | | | | | | | | | | | pg_ready() is not comprehensive in its logic and only serves to obfuscate code. Replace pg_ready() with the appropriate logic in multipath_map(). Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm io: simplify dec_count and sync_ioJoe Thornber2014-08-01
| | | | | | | | | | | | | | | | | | | | | | Remove the io struct off the stack in sync_io() and allocate it from the mempool like is done in async_io(). dec_count() now always calls a callback function and always frees the io struct back to the mempool (so sync_io and async_io share this pattern). Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* | Merge branch 'for-3.17/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds2014-08-14
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block driver changes from Jens Axboe: "Nothing out of the ordinary here, this pull request contains: - A big round of fixes for bcache from Kent Overstreet, Slava Pestov, and Surbhi Palande. No new features, just a lot of fixes. - The usual round of drbd updates from Andreas Gruenbacher, Lars Ellenberg, and Philipp Reisner. - virtio_blk was converted to blk-mq back in 3.13, but now Ming Lei has taken it one step further and added support for actually using more than one queue. - Addition of an explicit SG_FLAG_Q_AT_HEAD for block/bsg, to compliment the the default behavior of adding to the tail of the queue. From Douglas Gilbert" * 'for-3.17/drivers' of git://git.kernel.dk/linux-block: (86 commits) bcache: Drop unneeded blk_sync_queue() calls bcache: add mutex lock for bch_is_open bcache: Correct printing of btree_gc_max_duration_ms bcache: try to set b->parent properly bcache: fix memory corruption in init error path bcache: fix crash with incomplete cache set bcache: Fix more early shutdown bugs bcache: fix use-after-free in btree_gc_coalesce() bcache: Fix an infinite loop in journal replay bcache: fix crash in bcache_btree_node_alloc_fail tracepoint bcache: bcache_write tracepoint was crashing bcache: fix typo in bch_bkey_equal_header bcache: Allocate bounce buffers with GFP_NOWAIT bcache: Make sure to pass GFP_WAIT to mempool_alloc() bcache: fix uninterruptible sleep in writeback thread bcache: wait for buckets when allocating new btree root bcache: fix crash on shutdown in passthrough mode bcache: fix lockdep warnings on shutdown bcache allocator: send discards with correct size bcache: Fix to remove the rcu_sched stalls. ...
| * | bcache: Drop unneeded blk_sync_queue() callsKent Overstreet2014-08-04
| | | | | | | | | | | | | | | | | | | | | | | | this is needed for the queue/block device we created (it's done by blk_cleanup_queue() which we do call) - but calling it for the block devices we only opened is pointless. Change-Id: I53dfded14ed15b9581d10ca8399d5e1b3abbf9f2
| * | bcache: add mutex lock for bch_is_openJianjian Huo2014-08-04
| | | | | | | | | | | | | | | | | | | | | Since bch_is_open will iterate linked list bch_cache_sets and uncached_devices, it needs bch_register_lock. Signed-off-by: Jianjian Huo <samuel.huo@gmail.com>
| * | bcache: Correct printing of btree_gc_max_duration_msSurbhi Palande2014-08-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | time_stats::btree_gc_max_duration_mc is not bit shifted by 8 Fixes BUG #138 Change-Id: I44fc6e1d0579674016acc533f1a546b080e5371a Signed-off-by: Surbhi Palande <sap@daterainc.com>
| * | bcache: try to set b->parent properlySlava Pestov2014-08-04
| | | | | | | | | | | | | | | | | | | | | bcache_flash_dev.ktest would reliably crash with 8k and 16k bucket size before; now it passes. Change-Id: Ib542232235e39298c3a7548fe52b645cabb823d1
| * | bcache: fix memory corruption in init error pathSlava Pestov2014-08-04
| | | | | | | | | | | | | | | | | | | | | | | | If register_cache_set() failed, we would touch ca->set after it had already been freed. Also, fix an assertion to catch this. Change-Id: I748e5f5b223e2d9b2602075dec2f997cced2394d
| * | bcache: fix crash with incomplete cache setSlava Pestov2014-08-04
| | | | | | | | | | | | Change-Id: I6abde52afe917633480caaf4e2518f42a816d886
| * | bcache: Fix more early shutdown bugsKent Overstreet2014-08-04
| | | | | | | | | | | | Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * | bcache: fix use-after-free in btree_gc_coalesce()Slava Pestov2014-08-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we goto out_nocoalesce after we free new_nodes[0], we end up freeing new_nodes[0] again. This was generating a lockdep warning. The fix is to set new_nodes[0] to NULL, since the out_nocoalesce path safely ignores NULL entries in the new_nodes array. This regression was introduced in 2d7f9531. Change-Id: I76564d7257800583214376b4bacf236cda90c89c
| * | bcache: Fix an infinite loop in journal replayKent Overstreet2014-08-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | When running with multiple cache devices, if one of the devices has a completely empty journal but we'd already found some journal entries on a previosu device we'd go into an infinite loop. Change-Id: I1dcdc0d738192746de28f40e8b08825b0dea5e2b Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * | bcache: fix crash in bcache_btree_node_alloc_fail tracepointSlava Pestov2014-08-04
| | | | | | | | | | | | | | | | | | 'b' was NULL. Change-Id: Icac0fd04afa2d23f213d96d51afd53374e6dd0c0
| * | bcache: bcache_write tracepoint was crashingSlava Pestov2014-08-04
| | | | | | | | | | | | Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * | bcache: fix typo in bch_bkey_equal_headerSlava Pestov2014-08-04
| | | | | | | | | | | | Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * | bcache: Allocate bounce buffers with GFP_NOWAITKent Overstreet2014-08-04
| | | | | | | | | | | | | | | | | | | | | There's no point in blocking on these allocations, since our fallback paths will probably go faster than blocking. Change-Id: I733ca202c25cb36bde02607a0a60552229a4241c
| * | bcache: Make sure to pass GFP_WAIT to mempool_alloc()Kent Overstreet2014-08-04
| | | | | | | | | | | | | | | | | | | | | | | | this was very wrong - mempool_alloc() only guarantees success with GFP_WAIT. bcache uses GFP_NOWAIT in various other places where we have a fallback, circuits must've gotten crossed when writing this code or something. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * | bcache: fix uninterruptible sleep in writeback threadSlava Pestov2014-08-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There were two issues here: - writeback thread did not start until the device first became dirty - writeback thread used uninterruptible sleep once running Without this patch I see kernel warnings printed and a load average of 1.52 after booting my test VM. With this patch the warnings are gone and the load average is near 0.00 as expected. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * | bcache: wait for buckets when allocating new btree rootSlava Pestov2014-08-04
| | | | | | | | | | | | | | | | | | | | | | | | Tested: - sometimes bcache_tier test would hang on startup with a failure to allocate the btree root -- no longer seeing this Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * | bcache: fix crash on shutdown in passthrough modeSlava Pestov2014-08-04
| | | | | | | | | | | | We never started the writeback thread in this case, so don't stop it.
| * | bcache: fix lockdep warnings on shutdownSlava Pestov2014-08-04
| | |
| * | bcache allocator: send discards with correct sizeSlava Pestov2014-08-04
| | |
| * | bcache: Fix to remove the rcu_sched stalls.Surbhi Palande2014-08-04
| | | | | | | | | | | | | | | | | | | | | | | | while loop was executing infinitely. This fix ends the while loop gracefully. Signed-off-by: Surbhi Palande <sap@daterainc.com> Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * | bcache: Fix a journal replay bugKent Overstreet2014-08-04
| | | | | | | | | | | | | | | | | | | | | journal replay wansn't validating pointers with bch_extent_invalid() before derefing, fixed Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * | bcache: Fix a bug when detachingKent Overstreet2014-08-04
| | | | | | | | | | | | | | | | | | | | | After detaching a backing device from a cache set, a bit wasn't getting reset meaning the second detach wouldn't work correctly. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* | | Merge tag 'md/3.17' of git://neil.brown.name/mdLinus Torvalds2014-08-11
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull md updates from Neil Brown: "Most interesting is that md devices (major == 9) with minor numbers of 512 or more will no longer be created simply by opening a block device file. They can only be created by writing to /sys/module/md_mod/parameters/new_array The 'auto-create-on-open' semantic is cumbersome and we need to start moving away from it" * tag 'md/3.17' of git://neil.brown.name/md: md: don't allow bitmap file to be added to raid0/linear. md/raid0: check for bitmap compatability when changing raid levels. md: Recovery speed is wrong md: disable probing for md devices 512 and over. md/raid1,raid10: always abort recover on write error.
| * | | md: don't allow bitmap file to be added to raid0/linear.NeilBrown2014-08-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An array can only accept a bitmap if it will call bitmap_daemon_work periodically, which means it needs a thread running. If there is no thread, don't allow a bitmap to be added. Signed-off-by: NeilBrown <neilb@suse.de>
| * | | md/raid0: check for bitmap compatability when changing raid levels.NeilBrown2014-08-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | If an array has a bitmap, then it cannot be converted to raid0. Reported-by: Xiao Ni <xni@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
| * | | md: Recovery speed is wrongXiao Ni2014-08-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we calculate the speed of recovery, the numerator that contains the recovery done sectors. It's need to subtract the sectors which don't finish recovery. Signed-off-by: Xiao Ni <xni@redhat.com> Signed-off-by: NeilBrown <neilb@suse.de>
| * | | md: disable probing for md devices 512 and over.NeilBrown2014-07-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The way md devices are traditionally created in the kernel is simply to open the device with the desired major/minor number. This can be problematic as some support tools, notably udev and programs run by udev, can open a device just to see what is there, and find that it has created something. It is easy for a race to cause udev to open an md device just after it was destroy, causing it to suddenly re-appear. For some time we have had an alternate way to create md devices echo md_somename > /sys/modules/md_mod/paramaters/new_array This will always use a minor number of 512 or higher, which mdadm normally avoids. Using this makes the creation-by-opening unnecessary, but does not disable it, so it is still there to cause problems. This patch disable probing for devices with a major of 9 (MD_MAJOR) and a minor of 512 and up. This devices created by writing to new_array cannot be re-created by opening the node in /dev. Signed-off-by: NeilBrown <neilb@suse.de>
| * | | md/raid1,raid10: always abort recover on write error.NeilBrown2014-07-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we don't abort recovery on a write error if the write error to the recovering device was triggerd by normal IO (as opposed to recovery IO). This means that for one bitmap region, the recovery might write to the recovering device for a few sectors, then not bother for subsequent sectors (as it never writes to failed devices). In this case the bitmap bit will be cleared, but it really shouldn't. The result is that if the recovering device fails and is then re-added (after fixing whatever hardware problem triggerred the failure), the second recovery won't redo the region it was in the middle of, so some of the device will not be recovered properly. If we abort the recovery, the region being processes will be cancelled (bit not cleared) and the whole region will be retried. As the bug can result in data corruption the patch is suitable for -stable. For kernels prior to 3.11 there is a conflict in raid10.c which will require care. Original-from: jiao hui <jiaohui@bwstor.com.cn> Reported-and-tested-by: jiao hui <jiaohui@bwstor.com.cn> Signed-off-by: NeilBrown <neilb@suse.de> Cc: stable@vger.kernel.org
* | | | Merge branch 'sched-core-for-linus' of ↵Linus Torvalds2014-08-04
|\ \ \ \ | |_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: - Move the nohz kick code out of the scheduler tick to a dedicated IPI, from Frederic Weisbecker. This necessiated quite some background infrastructure rework, including: * Clean up some irq-work internals * Implement remote irq-work * Implement nohz kick on top of remote irq-work * Move full dynticks timer enqueue notification to new kick * Move multi-task notification to new kick * Remove unecessary barriers on multi-task notification - Remove proliferation of wait_on_bit() action functions and allow wait_on_bit_action() functions to support a timeout. (Neil Brown) - Another round of sched/numa improvements, cleanups and fixes. (Rik van Riel) - Implement fast idling of CPUs when the system is partially loaded, for better scalability. (Tim Chen) - Restructure and fix the CPU hotplug handling code that may leave cfs_rq and rt_rq's throttled when tasks are migrated away from a dead cpu. (Kirill Tkhai) - Robustify the sched topology setup code. (Peterz Zijlstra) - Improve sched_feat() handling wrt. static_keys (Jason Baron) - Misc fixes. * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits) sched/fair: Fix 'make xmldocs' warning caused by missing description sched: Use macro for magic number of -1 for setparam sched: Robustify topology setup sched: Fix sched_setparam() policy == -1 logic sched: Allow wait_on_bit_action() functions to support a timeout sched: Remove proliferation of wait_on_bit() action functions sched/numa: Revert "Use effective_load() to balance NUMA loads" sched: Fix static_key race with sched_feat() sched: Remove extra static_key*() function indirection sched/rt: Fix replenish_dl_entity() comments to match the current upstream code sched: Transform resched_task() into resched_curr() sched/deadline: Kill task_struct->pi_top_task sched: Rework check_for_tasks() sched/rt: Enqueue just unthrottled rt_rq back on the stack in __disable_runtime() sched/fair: Disable runtime_enabled on dying rq sched/numa: Change scan period code to match intent sched/numa: Rework best node setting in task_numa_migrate() sched/numa: Examine a task move when examining a task swap sched/numa: Simplify task_numa_compare() sched/numa: Use effective_load() to balance NUMA loads ...
| * | | Merge branch 'sched/urgent' into sched/core, to merge fixes before applying ↵Ingo Molnar2014-07-28
| |\| | | | | | | | | | | | | | | | | | | | | | new changes Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | sched: Remove proliferation of wait_on_bit() action functionsNeilBrown2014-07-16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current "wait_on_bit" interface requires an 'action' function to be provided which does the actual waiting. There are over 20 such functions, many of them identical. Most cases can be satisfied by one of just two functions, one which uses io_schedule() and one which just uses schedule(). So: Rename wait_on_bit and wait_on_bit_lock to wait_on_bit_action and wait_on_bit_lock_action to make it explicit that they need an action function. Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io which are *not* given an action function but implicitly use a standard one. The decision to error-out if a signal is pending is now made based on the 'mode' argument rather than being encoded in the action function. All instances of the old wait_on_bit and wait_on_bit_lock which can use the new version have been changed accordingly and their action functions have been discarded. wait_on_bit{_lock} does not return any specific error code in the event of a signal so the caller must check for non-zero and interpolate their own error code as appropriate. The wait_on_bit() call in __fscache_wait_on_invalidate() was ambiguous as it specified TASK_UNINTERRUPTIBLE but used fscache_wait_bit_interruptible as an action function. David Howells confirms this should be uniformly "uninterruptible" The main remaining user of wait_on_bit{,_lock}_action is NFS which needs to use a freezer-aware schedule() call. A comment in fs/gfs2/glock.c notes that having multiple 'action' functions is useful as they display differently in the 'wchan' field of 'ps'. (and /proc/$PID/wchan). As the new bit_wait{,_io} functions are tagged "__sched", they will not show up at all, but something higher in the stack. So the distinction will still be visible, only with different function names (gds2_glock_wait versus gfs2_glock_dq_wait in the gfs2/glock.c case). Since first version of this patch (against 3.15) two new action functions appeared, on in NFS and one in CIFS. CIFS also now uses an action function that makes the same freezer aware schedule call as NFS. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: David Howells <dhowells@redhat.com> (fscache, keys) Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2) Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Steve French <sfrench@samba.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | dm cache: fix race affecting dirty block countAnssi Hannula2014-08-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | nr_dirty is updated without locking, causing it to drift so that it is non-zero (either a small positive integer, or a very large one when an underflow occurs) even when there are no actual dirty blocks. This was due to a race between the workqueue and map function accessing nr_dirty in parallel without proper protection. People were seeing under runs due to a race on increment/decrement of nr_dirty, see: https://lkml.org/lkml/2014/6/3/648 Fix this by using an atomic_t for nr_dirty. Reported-by: roma1390@gmail.com Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
* | | | dm bufio: fully initialize shrinkerGreg Thelen2014-08-01
| |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1d3d4437eae1 ("vmscan: per-node deferred work") added a flags field to struct shrinker assuming that all shrinkers were zero filled. The dm bufio shrinker is not zero filled, which leaves arbitrary kmalloc() data in flags. So far the only defined flags bit is SHRINKER_NUMA_AWARE. But there are proposed patches which add other bits to shrinker.flags (e.g. memcg awareness). Rather than simply initializing the shrinker, this patch uses kzalloc() when allocating the dm_bufio_client to ensure that the embedded shrinker and any other similar structures are zeroed. This fixes theoretical over aggressive shrinking of dm bufio objects. If the uninitialized dm_bufio_client.shrinker.flags contains SHRINKER_NUMA_AWARE then shrink_slab() would call the dm shrinker for each numa node rather than just once. This has been broken since 3.12. Signed-off-by: Greg Thelen <gthelen@google.com> Acked-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # v3.12+
* | | Merge tag 'dm-3.16-fixes-2' of ↵Linus Torvalds2014-07-18
|\ \ \ | |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: "Fix the dm-thinp and dm-cache targets to disallow changing the data device's block size" * tag 'dm-3.16-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm cache metadata: do not allow the data block size to change dm thin metadata: do not allow the data block size to change
| * | dm cache metadata: do not allow the data block size to changeMike Snitzer2014-07-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The block size for the dm-cache's data device must remained fixed for the life of the cache. Disallow any attempt to change the cache's data block size. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Cc: stable@vger.kernel.org
| * | dm thin metadata: do not allow the data block size to changeMike Snitzer2014-07-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The block size for the thin-pool's data device must remained fixed for the life of the thin-pool. Disallow any attempt to change the thin-pool's data block size. It should be noted that attempting to change the data block size via thin-pool table reload will be ignored as a side-effect of the thin-pool handover that the thin-pool target does during thin-pool table reload. Here is an example outcome of attempting to load a thin-pool table that reduced the thin-pool's data block size from 1024K to 512K. Before: kernel: device-mapper: thin: 253:4: growing the data device from 204800 to 409600 blocks After: kernel: device-mapper: thin metadata: changing the data block size (from 2048 to 1024) is not supported kernel: device-mapper: table: 253:4: thin-pool: Error creating metadata object kernel: device-mapper: ioctl: error adding target to table Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Joe Thornber <ejt@redhat.com> Cc: stable@vger.kernel.org
* | | Merge tag 'dm-3.16-fixes' of ↵Linus Torvalds2014-07-11
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - Fix DM multipath IO hang regression from 3.15 due to logic bug in multipath_busy. This impacted cable-pull testing and also the ability to boot with IPR SCSI on a POWER8 box. - Fix possible deadlock with deferred device removal by using a new dedicated workqueue rather than using the system workqueue. - Fix NULL pointer crash due to race condition in dm-io's wake up code for sync_io by using a completion. - Update dm-crypt and dm-zero author name following legal name change; this is important to Jana so I didn't see any reason to hold it back. * tag 'dm-3.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm mpath: fix IO hang due to logic bug in multipath_busy dm io: fix a race condition in the wake up code for sync_io dm crypt, dm zero: update author name following legal name change dm: allocate a special workqueue for deferred device removal
| * | dm mpath: fix IO hang due to logic bug in multipath_busyJun'ichi Nomura2014-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit e80991773 ("dm mpath: push back requests instead of queueing") modified multipath_busy() to return true if !pg_ready(). pg_ready() checks the current state of the multipath device and may return false even if a new IO is needed to change the state. Bart Van Assche reported that he had multipath IO lockup when he was performing cable pull tests. Analysis showed that the multipath device had a single path group with both paths active, but that the path group itself was not active. During the multipath device state transitions 'queue_io' got set but nothing could clear it. Clearing 'queue_io' only happens in __choose_pgpath(), but it won't be called if multipath_busy() returns true due to pg_ready() returning false when 'queue_io' is set. As such the !pg_ready() check in multipath_busy() is wrong because new IO will not be sent to multipath target and the multipath state change won't happen. That results in multipath IO lockup. The intent of multipath_busy() is to avoid unnecessary cycles of dequeue + request_fn + requeue if it is known that the multipath device will requeue. Such "busy" situations would be: - path group is being activated - there is no path and the multipath is setup to requeue if no path Fix multipath_busy() to return "busy" early only for these specific situations. Reported-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # v3.15
| * | dm io: fix a race condition in the wake up code for sync_ioJoe Thornber2014-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's a race condition between the atomic_dec_and_test(&io->count) in dec_count() and the waking of the sync_io() thread. If the thread is spuriously woken immediately after the decrement it may exit, making the on stack io struct invalid, yet the dec_count could still be using it. Fix this race by using a completion in sync_io() and dec_count(). Reported-by: Minfei Huang <huangminfei@ucloud.cn> Signed-off-by: Joe Thornber <thornber@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Acked-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org
| * | dm crypt, dm zero: update author name following legal name changeJana Saout2014-07-10
| | | | | | | | | | | | | | | Signed-off-by: Jana Saout <jana@saout.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>