aboutsummaryrefslogtreecommitdiffstats
path: root/fs/xfs
Commit message (Collapse)AuthorAge
...
| * | | | Merge branch 'xfs-misc-fixes-4.6-2' into for-nextDave Chinner2016-03-06
| |\ \ \ \
| | * | | | xfs: remove xfs_trans_get_block_resChristoph Hellwig2016-03-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Just use the t_blk_res field directly instead of obsfucating the reference by a macro. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | | xfs: fix up inode32/64 (re)mount handlingEric Sandeen2016-03-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | inode32/inode64 allocator behavior with respect to mount, remount and growfs is a little tricky. The inode32 mount option should only enable the inode32 allocator heuristics if the filesystem is large enough for 64-bit inodes to exist. Today, it has this behavior on the initial mount, but a remount with inode32 unconditionally changes the allocation heuristics, even for a small fs. Also, an inode32 mounted small filesystem should transition to the inode32 allocator if the filesystem is subsequently grown to a sufficient size. Today that does not happen. This patch consolidates xfs_set_inode32 and xfs_set_inode64 into a single new function, and moves the "is the maximum inode number big enough to matter" test into that function, so it doesn't rely on the caller to get it right - which remount did not do, previously. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | | xfs: fix format specifier , should be %llx and not %lluColin Ian King2016-03-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | busyp->bno is printed with a %llu format specifier when the intention is to print a hexadecimal value. Trivial fix to use %llx instead. Found with smatch static analysis: fs/xfs/xfs_discard.c:229 xfs_discard_extents() warn: '0x' prefix is confusing together with '%llu' specifier Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | | xfs: sanitize remount optionsEric Sandeen2016-03-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Perform basic sanitization of remount options by passing the option string and a dummy mount structure through xfs_parseargs and returning the result. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | | xfs: convert mount option parsing to tokensEric Sandeen2016-03-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This should be a no-op change, just switch to token parsing like every other respectable filesystem does. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | | xfs: fix two memory leaks in xfs_attr_list.c error pathsMateusz Guzik2016-03-01
| | | |/ / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This plugs 2 trivial leaks in xfs_attr_shortform_list and xfs_attr3_leaf_list_int. Signed-off-by: Mateusz Guzik <mguzik@redhat.com> Cc: <stable@vger.kernel.org> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | | | Merge branch 'xfs-dax-fixes-4.6' into for-nextDave Chinner2016-03-06
| |\ \ \ \
| | * | | | xfs: XFS_DIFLAG2_DAX limited by PAGE_SIZEDave Chinner2016-02-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the block size of a filesystem is not at least PAGE_SIZEd, then at this point in time DAX cannot be used due to the fact we can't guarantee extents are page sized or aligned without further work. Hence disallow setting the DAX flag on an inode if the block size is too small. Also, be defensive and check the block size when reading an inode in off disk. In future, we want to allow DAX to work on any filesystem, so this is temporary while we sort of the correct conbination of extent size hints and allocation alignment configurations needed to guarantee page sized and aligned extent allocation for DAX enabled files. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | | xfs: dynamically switch modes when XFS_DIFLAG2_DAX is set/clearedDave Chinner2016-02-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we set or clear the XFS_DIFLAG2_DAX flag, we should also set/clear the S_DAX flag in the VFS inode. To do this, we need to ensure that we first flush and remove any cached entries in the radix tree to ensure the correct data access method is used when we next try to read or write data. We ahve to be especially careful here to lock out page faults so they don't race with the flush and invalidation before we change the access mode. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | | xfs: S_DAX is only for regular filesDave Chinner2016-02-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Only regular files can use DAX for data operations, so we should restrict setting it on the VFS inode to regular files. Setting it on metadata inodes may cause the VFS to do the wrong thing for such inodes, so avoid potential problems by restricting the scope of the flag to what we know is supported. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | | xfs: XFS_DIFLAG_DAX is only for regular files or directoriesDave Chinner2016-02-29
| | |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Only file data can use DAX, so we should onyl be able to set this flag on regular files. However, the flag also serves as an "inherit" flag at file create time when set on directories, so limit the FS_IOC_FSSETXATTR ioctl to only set this flag on regular files and directories. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | | | Merge branch 'xfs-writepage-rework-4.6' into for-nextDave Chinner2016-03-06
| |\ \ \ \
| | * | | | xfs: ioends require logically contiguous file offsetsDarrick J. Wong2016-03-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to create a new ioend if the current writepage call isn't logically contiguous with the range contained in the previous ioend. Hopefully writepage gets called in order of increasing file offset. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | | xfs: don't chain ioends during writepage submissionDave Chinner2016-02-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we can build a long ioend chain during ->writepages that gets attached to the writepage context. IO submission only then occurs when we finish all the writepage processing. This means we can have many ioends allocated and pending, and this violates the mempool guarantees that we need to give about forwards progress. i.e. we really should only have one ioend being built at a time, otherwise we may drain the mempool trying to allocate a new ioend and that blocks submission, completion and freeing of ioends that are already in progress. To prevent this situation from happening, we need to submit ioends for IO as soon as they are ready for dispatch rather than queuing them for later submission. This means the ioends have bios built immediately and they get queued on any plug that is current active. Hence if we schedule away from writeback, the ioends that have been built will make forwards progress due to the plug flushing on context switch. This will also prevent context switches from creating unnecessary IO submission latency. We can't completely avoid having nested IO allocation - when we have a block size smaller than a page size, we still need to hold the ioend submission until after we have marked the current page dirty. Hence we may need multiple ioends to be held while the current page is completely mapped and made ready for IO dispatch. We cannot avoid this problem - the current code already has this ioend chaining within a page so we can mostly ignore that it occurs. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | | xfs: factor mapping out of xfs_do_writepageDave Chinner2016-02-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Separate out the bufferhead based mapping from the writepage code so that we have a clear separation of the page operations and the bufferhead state. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | | xfs: xfs_cluster_write is redundantDave Chinner2016-02-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xfs_cluster_write() is not necessary now that xfs_vm_writepages() aggregates writepage calls across a single mapping. This means we no longer need to do page lookups in xfs_cluster_write, so writeback only needs to look up th epage cache once per page being written. This also removes a large amount of mostly duplicate code between xfs_do_writepage() and xfs_convert_page(). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | | xfs: Introduce writeback context for writepagesDave Chinner2016-02-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xfs_vm_writepages() calls generic_writepages to writeback a range of a file, but then xfs_vm_writepage() clusters pages itself as it does not have any context it can pass between->writepage calls from __write_cache_pages(). Introduce a writeback context for xfs_vm_writepages() and call __write_cache_pages directly with our own writepage callback so that we can pass that context to each writepage invocation. This encapsulates the current mapping, whether it is valid or not, the current ioend and it's IO type and the ioend chain being built. This requires us to move the ioend submission up to the level where the writepage context is declared. This does mean we do not submit IO until we packaged the entire writeback range, but with the block plugging in the writepages call this is the way IO is submitted, anyway. It also means that we need to handle discontiguous page ranges. If the pages sent down by write_cache_pages to the writepage callback are discontiguous, we need to detect this and put each discontiguous page range into individual ioends. This is needed to ensure that the ioend accurately represents the range of the file that it covers so that file size updates during IO completion set the size correctly. Failure to take into account the discontiguous ranges results in files being too small when writeback patterns are non-sequential. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | | xfs: remove xfs_cancel_ioendDave Chinner2016-02-15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently have code to cancel ioends being built because we change bufferhead state as we build the ioend. On error, this needs to be unwound and so we have cancelling code that walks the buffers on the ioend chain and undoes these state changes. However, the IO submission path already handles state changes for buffers when a submission error occurs, so we don't really need a separate cancel function to do this - we can simply submit the ioend chain with the specific error and it will be cancelled rather than submitted. Hence we can remove the explicit cancel code and just rely on submission to deal with the error correctly. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | | xfs: remove nonblocking mode from xfs_vm_writepageDave Chinner2016-02-15
| | |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove the nonblocking optimisation done for mapping lookups during writeback. It's not clear that leaving a hole in the writeback range just because we couldn't get a lock is really a win, as it makes us do another small random IO later on rather than a large sequential IO now. As this gets in the way of sane error handling later on, just remove for the moment and we can re-introduce an equivalent optimisation in future if we see problems due to extent map lock contention. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | | | Merge branch 'xfs-buf-macro-cleanup-4.6' into for-nextDave Chinner2016-03-06
| |\ \ \ \ | | | |/ / | | |/| |
| | * | | xfs: remove XFS_BUF_ZEROFLAGS macroDave Chinner2016-02-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The places where we use this macro already clear unnecessary IO flags (e.g. through xfs_bwrite()) or never have unexpected IO flags set on them in the first place (e.g. iclog buffers). Remove the macro from these locations, and where necessary clear only the specific flags that are conditional in the current buffer context. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | xfs: remove XBF_STALE flag wrapper macrosDave Chinner2016-02-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | They only set/clear/check a flag, no need for obfuscating this with a macro. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | xfs: remove XBF_WRITE flag wrapper macrosDave Chinner2016-02-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | They only set/clear/check a flag, no need for obfuscating this with a macro. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | xfs: remove XBF_READ flag wrapper macrosDave Chinner2016-02-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | They only set/clear/check a flag, no need for obfuscating this with a macro. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | xfs: remove XBF_ASYNC flag wrapper macrosDave Chinner2016-02-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | They only set/clear/check a flag, no need for obfuscating this with a macro. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | xfs: remove XBF_DONE flag wrapper macrosDave Chinner2016-02-09
| | |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | They only set/clear/check a flag, no need for obfuscating this with a macro. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | | Merge branch 'xfs-gut-icdinode-4.6' into for-nextDave Chinner2016-03-06
| |\ \ \
| | * | | xfs: mode di_mode to vfs inodeDave Chinner2016-02-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the di_mode value from the xfs_icdinode to the VFS inode, reducing the xfs_icdinode byte another 2 bytes and collapsing another 2 byte hole in the structure. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | xfs: move di_changecount to VFS inodeDave Chinner2016-02-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can store the di_changecount in the i_version field of the VFS inode and remove another 8 bytes from the xfs_icdinode. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | xfs: move inode generation count to VFS inodeDave Chinner2016-02-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull another 4 bytes out of the xfs_icdinode. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | xfs: use vfs inode nlink field everywhereDave Chinner2016-02-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The VFS tracks the inode nlink just like the xfs_icdinode. We can remove the variable from the icdinode and use the VFS inode variable everywhere, reducing the size of the xfs_icdinode by a further 4 bytes. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | xfs: reinitialise recycled VFS inode correctlyDave Chinner2016-02-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are going to keep certain on-disk information in the VFS inode rather than in a separate XFS specific stucture, so we have to be careful of the VFS code clearing that information when we re-initialise reclaimable cached inodes during lookup. If we don't do this, then we lose critical information from the inode and that results in corruption being detected. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | xfs: move v1 inode conversion to xfs_inode_from_diskDave Chinner2016-02-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So we don't have to carry an di_onlink variable around anymore, move the inode conversion from v1 inode format to v2 inode format into xfs_inode_from_disk(). This means we can remove the di_onlink fields from the struct xfs_icdinode. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | xfs: cull unnecessary icdinode fieldsDave Chinner2016-02-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that the struct xfs_icdinode is not directly related to the on-disk format, we can cull things in it we really don't need to store: - magic number never changes - padding is not necessary - next_unlinked is never used - inode number is redundant - uuid is redundant - lsn is accessed directly from dinode - inode CRC is only accessed directly from dinode Hence we can remove these from the struct xfs_icdinode and redirect the code that uses them to the xfs_dinode appripriately. This reduces the size of the struct icdinode from 152 bytes to 88 bytes, and removes a fair chunk of unnecessary code, too. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | xfs: remove timestamps from incore inodeDave Chinner2016-02-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The struct xfs_inode has two copies of the current timestamps in it, one in the vfs inode and one in the struct xfs_icdinode. Now that we no longer log the struct xfs_icdinode directly, we don't need to keep the timestamps in this structure. instead we can copy them straight out of the VFS inode when formatting the inode log item or the on-disk inode. This reduces the struct xfs_inode in size by 24 bytes. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | xfs: introduce inode log format objectDave Chinner2016-02-09
| | |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently carry around and log an entire inode core in the struct xfs_inode. A lot of the information in the inode core is duplicated in the VFS inode, but we cannot remove this duplication of infomration because the inode core is logged directly in xfs_inode_item_format(). Add a new function xfs_inode_item_format_core() that copies the inode core data into a struct xfs_icdinode that is pulled directly from the log vector buffer. This means we no longer directly copy the inode core, but copy the structures one member at a time. This will be slightly less efficient than copying, but will allow us to remove duplicate and unnecessary items from the struct xfs_inode. To enable us to do this, call the new structure a xfs_log_dinode, so that we know it's different to the physical xfs_dinode and the in-core xfs_icdinode. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | | Merge branch 'xfs-misc-fixes-4.6' into for-nextDave Chinner2016-03-06
| |\ \ \
| | * | | xfs: fix xfs_log_ticket leak in xfs_end_io() after fs shutdownBrian Foster2016-02-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the filesystem has shut down, xfs_end_io() currently sets an error on the ioend and proceeds to ioend destruction. The ioend might contain a truncate transaction if the I/O extended the size of the file. This transaction is only cleaned up in xfs_setfilesize_ioend(), however, which is skipped in this case. This results in an xfs_log_ticket leak message when the associate cache slab is destroyed (e.g., on rmmod). This was originally reproduced by xfs/141 on a distro kernel. The problem is reproducible on an upstream kernel, but not easily detected in current upstream if the xfs_log_ticket cache happens to be merged with another cache. This can be reproduced more deterministically with the 'slab_nomerge' kernel boot option. Update xfs_end_io() to proceed with normal end I/O processing after an error is set on an ioend due to fs shutdown. The I/O type-based processing is already designed to handle an I/O error and ensure that the ioend is cleaned up correctly. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | xfs: clean up unwritten buffers on write failureBrian Foster2016-02-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The xfs_vm_write_failed() handler is currently responsible for cleaning up any delalloc blocks over the range of a failed write beyond EOF. Failure to do so results in warning messages and other inconsistencies between buffer and extent state. The ->releasepage() handler currently warns in the event of a page being released with either unwritten or delalloc buffers, as neither is ever expected by the time a page is released. As has been reproduced by generic/083 on a -bsize=1k fs, it is currently possible to trigger the ->releasepage() warning for a page with unwritten buffers when a filesystem is near ENOSPC. This is reproduced by the following sequence: $ mkfs.xfs -f -b size=1k -d size=100m <dev> $ mount <dev> /mnt/ $ $ xfs_io -fc "falloc -k 0 1k" /mnt/file $ dd if=/dev/zero of=/mnt/enospc conv=notrunc oflag=append $ $ xfs_io -c "pwrite 512 1k" /mnt/file $ xfs_io -d -c "pwrite 16k 1k" /mnt/file The first pwrite command attempts a block unaligned write across an unwritten block and a hole. The delalloc for the hole fails with ENOSPC and the subsequent error handling does not clean up the unwritten buffer that was instantiated during the first ->get_block() call. The second pwrite triggers a warning as part of the inode mapping invalidation that occurs prior to direct I/O. The releasepage() handler detects the unwritten buffer at this time, warns and prevents the release of the page. To deal with this problem, update xfs_vm_write_failed() to clean up unwritten as well as delalloc buffers that are beyond EOF and within the range of the failed write. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | xfs: move struct xfs_attr_shortform to xfs_da_format.hDarrick J. Wong2016-02-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the shortform attr structure definition to the same place as the other attribute structure definitions for consistency and also so that xfs/122 verifies the structure size. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | xfs: Make xfsaild freezeable againMichal Hocko2016-02-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Hendik has reported suspend failures due to xfsaild blocking the freezer to settle down. Jan 17 19:59:56 linux-6380 kernel: PM: Syncing filesystems ... done. Jan 17 19:59:56 linux-6380 kernel: PM: Preparing system for sleep (mem) Jan 17 19:59:56 linux-6380 kernel: Freezing user space processes ... (elapsed 0.001 seconds) done. Jan 17 19:59:56 linux-6380 kernel: Freezing remaining freezable tasks ... Jan 17 19:59:56 linux-6380 kernel: Freezing of tasks failed after 20.002 seconds (1 tasks refusing to freeze, wq_busy=0): Jan 17 19:59:56 linux-6380 kernel: xfsaild/dm-5 S 00000000 0 1293 2 0x00000080 Jan 17 19:59:56 linux-6380 kernel: f0ef5f00 00000046 00000200 00000000 ffff9022 c02d3800 00000000 00000032 Jan 17 19:59:56 linux-6380 kernel: ee0b2400 00000032 f71e0d00 f36fabc0 f0ef2d00 f0ef6000 f0ef2d00 f12f90c0 Jan 17 19:59:56 linux-6380 kernel: f0ef5f0c c0844e44 00000000 f0ef5f6c f811e0be 00000000 00000000 f0ef2d00 Jan 17 19:59:56 linux-6380 kernel: Call Trace: Jan 17 19:59:56 linux-6380 kernel: [<c0844e44>] schedule+0x34/0x90 Jan 17 19:59:56 linux-6380 kernel: [<f811e0be>] xfsaild+0x5de/0x600 [xfs] Jan 17 19:59:56 linux-6380 kernel: [<c0286cbb>] kthread+0x9b/0xb0 Jan 17 19:59:56 linux-6380 kernel: [<c0848a79>] ret_from_kernel_thread+0x21/0x38 The issue has been there for quite some time but it has been made visible by only by 24ba16bb3d49 ("xfs: clear PF_NOFREEZE for xfsaild kthread") because the suspend started seeing xfsaild. The above commit has missed that the !xfs_ail_min branch might call schedule with TASK_INTERRUPTIBLE without calling try_to_freeze so the pm suspend would wake up the kernel thread over and over again without any progress. What we want here is to use freezable_schedule instead to hide the thread from the suspend. While we are here also change schedule_timeout to freezable variant to prevent from spurious wakeups by suspend. [dchinner: re-add set_freezeable call so the freezer will account properly for this kthread. ] Reported-by: Hendrik Woltersdorf <hendrikw@arcor.de> Signed-off-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | xfs: remove unused function definitionsEric Sandeen2016-02-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Old leftovers. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | xfs: move buffer invalidation to xfs_btree_free_blockChristoph Hellwig2016-02-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... instead of leaving it in the methods. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | xfs: factor btree block freeing into a helperChristoph Hellwig2016-02-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | xfs: handle errors from ->free_blocks in xfs_btree_kill_irootChristoph Hellwig2016-02-07
| | |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * | | Merge branch 'xfs-dio-fix-4.6' into for-nextDave Chinner2016-03-06
| |\ \ \
| | * | | xfs: fold xfs_vm_do_dio into xfs_vm_direct_IOChristoph Hellwig2016-02-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| | * | | xfs: don't use ioends for direct write completionsChristoph Hellwig2016-02-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We only need to communicate two bits of information to the direct I/O completion handler: (1) do we need to convert any unwritten extents in the range (2) do we need to check if we need to update the inode size based on the range passed to the completion handler We can use the private data passed to the get_block handler and the completion handler as a simple bitmask to communicate this information instead of the current complicated infrastructure reusing the ioends from the buffer I/O path, and thus avoiding a memory allocation and a context switch for any non-trivial direct write. As a nice side effect we also decouple the direct I/O path implementation from that of the buffered I/O path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com>
| | * | | direct-io: always call ->end_io if non-NULLChristoph Hellwig2016-02-07
| | |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This way we can pass back errors to the file system, and allow for cleanup required for all direct I/O invocations. Also allow the ->end_io handlers to return errors on their own, so that I/O completion errors can be passed on to the callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>