aboutsummaryrefslogtreecommitdiffstats
path: root/drivers/md/dm-cache-metadata.c
Commit message (Collapse)AuthorAge
* dm cache: fix a lock-inversionJoe Thornber2014-04-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When suspending a cache the policy is walked and the individual policy hints written to the metadata via sync_metadata(). This led to this lock order: policy->lock cache_metadata->root_lock When loading the cache target the policy is populated while the metadata lock is held: cache_metadata->root_lock policy->lock Fix this potential lock-inversion (ABBA) deadlock in sync_metadata() by ensuring the cache_metadata root_lock is held whilst all the hints are written, rather than being repeatedly locked while policy->lock is held (as was the case with each callout that policy_walk_mappings() made to the old save_hint() method). Found by turning on the CONFIG_PROVE_LOCKING ("Lock debugging: prove locking correctness") build option. However, it is not clear how the LOCKDEP reported paths can lead to a deadlock since the two paths, suspending a target and loading a target, never occur at the same time. But that doesn't mean the same lock-inversion couldn't have occurred elsewhere. Reported-by: Marian Csontos <mcsontos@redhat.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
* dm: take care to copy the space map roots before locking the superblockJoe Thornber2014-03-27
| | | | | | | | | | | | In theory copying the space map root can fail, but in practice it never does because we're careful to check what size buffer is needed. But make certain we're able to copy the space map roots before locking the superblock. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org # drop dm-era and dm-cache changes as needed
* dm transaction manager: fix corruption due to non-atomic transaction commitJoe Thornber2014-03-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The persistent-data library used by dm-thin, dm-cache, etc is transactional. If anything goes wrong, such as an io error when writing new metadata or a power failure, then we roll back to the last transaction. Atomicity when committing a transaction is achieved by: a) Never overwriting data from the previous transaction. b) Writing the superblock last, after all other metadata has hit the disk. This commit and the following commit ("dm: take care to copy the space map roots before locking the superblock") fix a bug associated with (b). When committing it was possible for the superblock to still be written in spite of an io error occurring during the preceeding metadata flush. With these commits we're careful not to take the write lock out on the superblock until after the metadata flush has completed. Change the transaction manager's semantics for dm_tm_commit() to assume all data has been flushed _before_ the single superblock that is passed in. As a prerequisite, split the block manager's block unlocking and flushing by simplifying dm_bm_flush_and_unlock() to dm_bm_flush(). Now the unlocking must be done separately. This issue was discovered by forcing io errors at the crucial time using dm-flakey. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
* dm cache: remove remainder of distinct discard block sizeHeinz Mauelshagen2014-03-27
| | | | | | | | | | | | | Discard block size not being equal to cache block size causes data corruption by erroneously avoiding migrations in issue_copy() because the discard state is being cleared for a group of cache blocks when it should not. Completely remove all code that enabled a distinction between the cache block size and discard block size. Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm cache metadata: check the metadata version when reading the superblockJoe Thornber2013-11-11
| | | | | | | Need to check the version to verify on-disk metadata is supported. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm cache: add passthrough modeJoe Thornber2013-11-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | "Passthrough" is a dm-cache operating mode (like writethrough or writeback) which is intended to be used when the cache contents are not known to be coherent with the origin device. It behaves as follows: * All reads are served from the origin device (all reads miss the cache) * All writes are forwarded to the origin device; additionally, write hits cause cache block invalidates This mode decouples cache coherency checks from cache device creation, largely to avoid having to perform coherency checks while booting. Boot scripts can create cache devices in passthrough mode and put them into service (mount cached filesystems, for example) without having to worry about coherency. Coherency that exists is maintained, although the cache will gradually cool as writes take place. Later, applications can perform coherency checks, the nature of which will depend on the type of the underlying storage. If coherency can be verified, the cache device can be transitioned to writethrough or writeback mode while still warm; otherwise, the cache contents can be discarded prior to transitioning to the desired operating mode. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com> Signed-off-by: Morgan Mears <Morgan.Mears@netapp.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm cache: cache shrinking supportJoe Thornber2013-11-11
| | | | | | | | Allow a cache to shrink if the blocks being removed from the cache are not dirty. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm cache metadata: return bool from __superblock_all_zeroesJoe Thornber2013-11-09
| | | | | Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm cache: replace memcpy with struct assignmentJoe Thornber2013-05-10
| | | | | | | Use struct assignment rather than memcpy in dm cache. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm cache: policy ignore hints if generated by different versionMike Snitzer2013-03-20
| | | | | | | | | | | | | | | When reading the dm cache metadata from disk, ignore the policy hints unless they were generated by the same major version number of the same policy module. The hints are considered to be private data belonging to the specific module that generated them and there is no requirement for them to make sense to different versions of the policy that generated them. Policy modules are all required to work fine if no previous hints are supplied (or if existing hints are lost). Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm cache: policy change version from string to integer setMike Snitzer2013-03-20
| | | | | | | | | | Separate dm cache policy version string into 3 unsigned numbers corresponding to major, minor and patchlevel and store them at the end of the on-disk metadata so we know which version of the policy generated the hints in case a future version wants to use them differently. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm cache: metadata clear dirty bits on clean shutdownJoe Thornber2013-03-20
| | | | | | | | | | | | When writing the dirty bitset to the metadata device on a clean shutdown, clear the dirty bits. Previously they were left indicating the cache was dirty. This led to confusion about whether there really was dirty data in the cache or not. (This was a harmless bug.) Reported-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
* dm: add cache targetJoe Thornber2013-03-01
Add a target that allows a fast device such as an SSD to be used as a cache for a slower device such as a disk. A plug-in architecture was chosen so that the decisions about which data to migrate and when are delegated to interchangeable tunable policy modules. The first general purpose module we have developed, called "mq" (multiqueue), follows in the next patch. Other modules are under development. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Heinz Mauelshagen <mauelshagen@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>