aboutsummaryrefslogtreecommitdiffstats
path: root/fs/btrfs/extent_map.c
Commit message (Collapse)AuthorAge
* Btrfs: Add zlib compression supportChris Mason2008-10-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a large change for adding compression on reading and writing, both for inline and regular extents. It does some fairly large surgery to the writeback paths. Compression is off by default and enabled by mount -o compress. Even when the -o compress mount option is not used, it is possible to read compressed extents off the disk. If compression for a given set of pages fails to make them smaller, the file is flagged to avoid future compression attempts later. * While finding delalloc extents, the pages are locked before being sent down to the delalloc handler. This allows the delalloc handler to do complex things such as cleaning the pages, marking them writeback and starting IO on their behalf. * Inline extents are inserted at delalloc time now. This allows us to compress the data before inserting the inline extent, and it allows us to insert an inline extent that spans multiple pages. * All of the in-memory extent representations (extent_map.c, ordered-data.c etc) are changed to record both an in-memory size and an on disk size, as well as a flag for compression. From a disk format point of view, the extent pointers in the file are changed to record the on disk size of a given extent and some encoding flags. Space in the disk format is allocated for compression encoding, as well as encryption and a generic 'other' field. Neither the encryption or the 'other' field are currently used. In order to limit the amount of data read for a single random read in the file, the size of a compressed extent is limited to 128k. This is a software only limit, the disk format supports u64 sized compressed extents. In order to limit the ram consumed while processing extents, the uncompressed size of a compressed extent is limited to 256k. This is a software only limit and will be subject to tuning later. Checksumming is still done on compressed extents, and it is done on the uncompressed version of the data. This way additional encodings can be layered on without having to figure out which encoding to checksum. Compression happens at delalloc time, which is basically singled threaded because it is usually done by a single pdflush thread. This makes it tricky to spread the compression load across all the cpus on the box. We'll have to look at parallel pdflush walks of dirty inodes at a later time. Decompression is hooked into readpages and it does spread across CPUs nicely. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: add and improve commentsChris Mason2008-09-29
| | | | | | | | | | | This improves the comments at the top of many functions. It didn't dive into the guts of functions because I was trying to avoid merging problems with the new allocator and back reference work. extent-tree.c and volumes.c were both skipped, and there is definitely more work todo in cleaning and commenting the code. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Fix add_extent_mapping to check for duplicates across the whole rangeChris Mason2008-09-25
| | | | | | | | | | | | add_extent_mapping was allowing the insertion of overlapping extents. This never used to happen because it only inserted the extents from disk and those were never overlapping. But, with the data=ordered code, the disk and memory representations of the file are not the same. add_extent_mapping needs to ensure a new extent does not overlap before it inserts. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Use assert_spin_locked instead of spin_trylockDavid Woodhouse2008-09-25
| | | | | | On UP systems spin_trylock always succeeds Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Fix some data=ordered related data corruptionsChris Mason2008-09-25
| | | | | | | | | | | | | | | | | | Stress testing was showing data checksum errors, most of which were caused by a lookup bug in the extent_map tree. The tree was caching the last pointer returned, and searches would check the last pointer first. But, search callers also expect the search to return the very first matching extent in the range, which wasn't always true with the last pointer usage. For now, the code to cache the last return value is just removed. It is easy to fix, but I think lookups are rare enough that it isn't required anymore. This commit also replaces do_sync_mapping_range with a local copy of the related functions. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Keep extent mappings in ram until pending ordered extents are doneChris Mason2008-09-25
| | | | | | | | It was possible for stale mappings from disk to be used instead of the new pending ordered extent. This adds a flag to the extent map struct to keep it pinned until the pending ordered extent is actually on disk. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: New data=ordered implementationChris Mason2008-09-25
| | | | | | | | | | | | | | | | | | | | | | | | The old data=ordered code would force commit to wait until all the data extents from the transaction were fully on disk. This introduced large latencies into the commit and stalled new writers in the transaction for a long time. The new code changes the way data allocations and extents work: * When delayed allocation is filled, data extents are reserved, and the extent bit EXTENT_ORDERED is set on the entire range of the extent. A struct btrfs_ordered_extent is allocated an inserted into a per-inode rbtree to track the pending extents. * As each page is written EXTENT_ORDERED is cleared on the bytes corresponding to that page. * When all of the bytes corresponding to a single struct btrfs_ordered_extent are written, The previously reserved extent is inserted into the FS btree and into the extent allocation trees. The checksums for the file data are also updated. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: kerneldoc comments for extent_map.cChristoph Hellwig2008-09-25
| | | | | | | Add kerneldoc comments for all exported functions. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Chris Mason <chris.mason@oracle.com>
* btrfs: fix strange indentation in lookup_extent_mappingChristoph Hellwig2008-09-25
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Split the extent_map code into two partsChris Mason2008-09-25
| | | | | | | | | | | | | | There is now extent_map for mapping offsets in the file to disk and extent_io for state tracking, IO submission and extent_bufers. The new extent_map code shifts from [start,end] pairs to [start,len], and pushes the locking out into the caller. This allows a few performance optimizations and is easier to use. A number of extent_map usage bugs were fixed, mostly with failing to remove extent_map entries when changing the file. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Fix hole insertion corner casesChris Mason2008-09-25
| | | | | | | | | | There were a few places that could cause duplicate extent insertion, this adjusts the code that creates holes to avoid it. lookup_extent_map is changed to correctly return all of the extents in a range, even when there are none matching at the start of the range. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Fix for test_range_bitYan2008-09-25
| | | | | | | test_range_bit doesn't properly handle the case: there's a hole at the end of the range and there's no other extent_state after the range. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Remove verbose WARN_ONChris Mason2008-09-25
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Fix extent_buffer usage when nodesize != leafsizeChris Mason2008-09-25
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Remove extent_map debugging messageChris Mason2008-09-25
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Fix an off by one in the extent_map prepare write codeChris Mason2008-09-25
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Implement basic support for -ENOSPCChris Mason2008-09-25
| | | | | | | | | | This is intended to prevent accidentally filling the drive. A determined user can still make things oops. It includes some accounting of the current bytes under delayed allocation, but this will change as things get optimized Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Fix delayed allocation to avoid missing delalloc extentsChris Mason2008-09-25
| | | | | | find_lock_delalloc_range could exit out too early Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Back port to 2.6.18-el kernelsChris Mason2008-09-25
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: section mismatch warningsChristian Hesse2008-09-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | --Boundary-00=_CcOWHFYK4T+JwSj Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Content-Disposition: inline Hello everybody, compiling btrfs into the kernel results in section mismatch warnings. __exit functions are called where they are not allowed to. The attached patch fixes this for me. Not sure if it is correct though. Signed-off-by: Christian Hesse <mail@earthworm.de> -- Regards, Chris --Boundary-00=_CcOWHFYK4T+JwSj Content-Type: text/x-diff; charset="iso-8859-1"; name="btrfs-section_mismatches.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="btrfs-section_mismatches.patch" Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add efficient dirty accounting to the extent_map treeChris Mason2008-09-25
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Limit btree writeback to prevent seeksChris Mason2008-09-25
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Handle writeback under high memory pressure betterChris Mason2008-09-25
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Make sure page mapping dirty tag is properly clearedChris Mason2008-09-25
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Avoid fragmentation from parallel delalloc fillingChris Mason2008-09-25
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Return value checking in module initWyatt Banks2008-09-25
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Fix extent bit range testingChris Mason2008-09-25
| | | | | | | It could return the bit as set when there was actually a hole at the very end of the range. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add readpages supportChris Mason2008-09-25
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Avoid extent_buffer lru corruptionChris Mason2008-09-25
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Fix failure cleanups when allocating extent buffers failChris Mason2008-09-25
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add writepages supportChris Mason2008-09-25
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: small fixes for find_lock_delalloc_range.Yan2008-09-25
| | | | | | | | There is a 'finish_wait', but no 'prepare_to_wait' . So I think that the 'prepare_to_wait' is missing. The second change is according to the name of variable. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Fix a number of inline extent problems that Yan Zheng reported.Chris Mason2008-09-25
| | | | | | | | | | | | | | | | | | The fixes do a number of things: 1) Most btrfs_drop_extent callers will try to leave the inline extents in place. It can truncate bytes off the beginning of the inline extent if required. 2) writepage can now update the inline extent, allowing mmap writes to go directly into the inline extent. 3) btrfs_truncate_in_transaction truncates inline extents 4) extent_map.c fixed to not merge inline extent mappings and hole mappings together Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Fix PAGE_CACHE_SHIFT shifts on 32 bit machinesChris Mason2008-09-25
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Fix extent_map leak in extent_bmapYan2008-09-25
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Off by one fixes in extent_map.cYan2008-09-25
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Avoid recursive KM_USER1 mappings in copy_extent_bufferChris Mason2008-09-25
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: CPU usage optimizations in push and the extent_map codeChris Mason2008-09-25
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Fix read/write_extent_buffer to use KM_USER1 instead of KM_USER0Chris Mason2008-09-25
| | | | | | This avoids recursive use of KM_USER0 during btrfs_file_write Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Fix bi_end_io() functions on > 2.6.23 kernelsJens Axboe2008-09-25
| | | | | | | It now returns void and it is never called for partial completions, so the bio->bi_size check must go. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* btrfs: 32-bit type problemsJens Axboe2008-09-25
| | | | | | An assorted set of casts to get rid of the warnings on 32-bit archs. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add back file data checksummingChris Mason2008-09-25
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add back metadata checksummingChris Mason2008-09-25
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: extent_map optimizations to cut down on CPU usageChris Mason2008-09-25
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add an extent buffer LRU to reduce radix tree hitsChris Mason2008-09-25
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Fix allocation routines to avoid intermixing data and metadata ↵Chris Mason2008-09-25
| | | | | | allocations Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Use an array of pages in the extent buffers to reduce the cost of ↵Chris Mason2008-09-25
| | | | | | find_get_page Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Cache extent buffer mappingsChris Mason2008-09-25
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Allow tree blocks larger than the page sizeChris Mason2008-09-25
| | | | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Change the remaining radix trees used by extent-tree.c to extent_map ↵Chris Mason2008-09-25
| | | | | | trees Signed-off-by: Chris Mason <chris.mason@oracle.com>