path: root/fs/btrfs
Commit message (author, date)
...
* Btrfs: fix metadata dirty throttling limits (Chris Mason, 2009-06-10)
| Once a metadata block has been written, it must be recowed, so the btrfs dirty balancing call has a check to make sure a fair amount of metadata was actually dirty before it started writing it back to disk. A previous commit had changed the dirty tracking for metadata without updating the btrfs dirty balancing checks. This commit switches it to use the correct counter.
| Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: reduce mount -o ssd CPU usage (Chris Mason, 2009-06-10)
| The block allocator in SSD mode will try to find groups of free blocks that are close together. This commit makes it loop less on a given group size before bumping it. The end result is that we are less likely to fill small holes in the available free space, but we don't waste as much CPU building the large cluster used by ssd mode.
| Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: balance btree more often (Chris Mason, 2009-06-10)
| With the new back reference code, the cost of a balance has gone down in terms of the number of back reference updates done. This commit makes us more aggressively balance leaves and nodes as they become less full.
| Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: stop avoiding balancing at the end of the transaction. (Chris Mason, 2009-06-10)
| When the delayed reference code was added, some checks were added to avoid extra balancing while the delayed references were being flushed. This made for less efficient btrees, but it reduced the chances of loops where no forward progress was made because the balances made more delayed ref updates.
|
| With the new dead root removal code and the mixed back references, the extent allocation tree is no longer using precise back refs, and the delayed reference updates don't carry the risk of looping forever anymore. So, the balance avoidance is no longer required.
| Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Mixed back reference (FORWARD ROLLING FORMAT CHANGE) (Yan Zheng, 2009-06-10)
| This commit introduces a new kind of back reference for btrfs metadata. Once a filesystem has been mounted with this commit, IT WILL NO LONGER BE MOUNTABLE BY OLDER KERNELS.
|
| When a tree block in a subvolume tree is cow'd, the reference counts of all extents it points to are increased by one. At transaction commit time, the old root of the subvolume is recorded in a "dead root" data structure, and the btree it points to is later walked, dropping reference counts and freeing any blocks where the reference count goes to 0. The increments done during cow and decrements done after commit cancel out, and the walk is a very expensive way to go about freeing the blocks that are no longer referenced by the new btree root.
|
| This commit reduces the transaction overhead by avoiding the need for dead root records. When a non-shared tree block is cow'd, we free the old block at once, and the new block inherits the old block's references. When a tree block with reference count > 1 is cow'd, we increase the reference counts of all extents the new block points to by one, and decrease the old block's reference count by one.
|
| This dead tree avoidance code removes the need to modify the reference counts of lower level extents when a non-shared tree block is cow'd. But we still need to update the back ref for all pointers in the block. This is because the location of the block is recorded in the back ref item.
|
| We can solve this by introducing a new type of back ref. The new back ref provides information about the pointer's key, level and in which tree the pointer lives. This information allows us to find the pointer by searching the tree.
|
| The shortcoming of the new back ref is that it only works for pointers in tree blocks referenced by their owner trees. This is mostly a problem for snapshots, where resolving one of these fuzzy back references would be O(number_of_snapshots) and quite slow. The solution used here is to use the fuzzy back references in the common case where a given tree block is only referenced by one root, and use the full back references when multiple roots have a reference on a given block.
|
| This commit adds a per-subvolume red-black tree to keep track of cached inodes. The red-black tree helps the balancing code to find cached inodes whose inode numbers are within a given range.
|
| This commit improves the balancing code by introducing several data structures to keep the state of balancing. The most important one is the back ref cache. It caches how the upper level tree blocks are referenced. This greatly reduces the overhead of checking back refs. The improved balancing code scales significantly better with a large number of snapshots.
|
| This is a very large commit and was written in a number of pieces. But, they depend heavily on the disk format change and were squashed together to make sure git bisect didn't end up in a bad state wrt space balancing or the format change.
| Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
| Signed-off-by: Chris Mason <chris.mason@oracle.com>
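The two copy-on-write rules described in this commit message can be sketched in user-space C. This is only an illustration of the reference-count arithmetic, not the btrfs implementation: `struct extent`, `struct tree_block`, `MAX_PTRS` and `cow_block()` are hypothetical names, and a freed block is modelled simply as `refs == 0`.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PTRS 4                 /* hypothetical fan-out for the sketch */

struct extent {                    /* anything a tree block points to */
    int refs;
};

struct tree_block {
    int refs;                      /* references held on this block; 0 = freed */
    size_t nr_ptrs;
    struct extent *ptrs[MAX_PTRS];
};

/* Copy `old` into `new_block`, applying the two rules from the message. */
static void cow_block(struct tree_block *old, struct tree_block *new_block)
{
    assert(old != NULL && new_block != NULL && old->refs >= 1);

    *new_block = *old;             /* the copy points at the same extents */
    new_block->refs = 1;

    if (old->refs == 1) {
        /* Non-shared: free the old block at once. The new block inherits
         * its references, so the lower-level counts are left untouched. */
        old->refs = 0;
        old->nr_ptrs = 0;
    } else {
        /* Shared: the new copy adds a reference to every extent it points
         * to, and the old block loses one reference. */
        for (size_t i = 0; i < new_block->nr_ptrs; i++)
            new_block->ptrs[i]->refs++;
        old->refs--;
    }
}
```

In the non-shared case no lower-level counter changes at all, which is the saving the message describes; only the shared case pays for a walk over the block's pointers.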
* btrfs: Fix set/clear_extent_bit for 'end == (u64)-1' (Yan Zheng, 2009-06-10)
| There is some code like 'start = state->end + 1;' in set_extent_bit and clear_extent_bit. It overflows when end == (u64)-1.
| Signed-off-by: Yan Zheng <zheng.yan@oracle.com>
| Signed-off-by: Chris Mason <chris.mason@oracle.com>
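The wrap-around is easy to reproduce outside the kernel. The sketch below is a minimal illustration, assuming a hypothetical helper `next_start()` that is not part of btrfs; it refuses to advance when `end` is already the largest 64-bit value, because `end + 1` would wrap to 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the first offset after a range whose last
 * byte is `end`. Returns false, leaving *start untouched, when end is
 * (u64)-1, since end + 1 would wrap around to 0.
 */
static bool next_start(uint64_t end, uint64_t *start)
{
    assert(start != NULL);
    if (end == UINT64_MAX)
        return false;              /* nothing lies beyond (u64)-1 */
    *start = end + 1;
    return true;
}
```

A loop that walks extent states with an unchecked `start = end + 1` would restart from offset 0 in that case; a guard like the one above lets it terminate instead.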
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable (Linus Torvalds, 2009-06-05)
|\  * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
| |   Btrfs: Fix oops and use after free during space balancing
| |   Btrfs: set device->total_disk_bytes when adding new device
| * Btrfs: Fix oops and use after free during space balancing (Chris Mason, 2009-06-04)
| | The btrfs allocator uses list_for_each to walk the available block groups when searching for free blocks. It starts off with a hint to help find the best block group for a given allocation.
| |
| | The hint is resolved into a block group, but we don't properly check to make sure the block group we find isn't in the middle of being freed due to filesystem shrinking or balancing. If it is being freed, the list pointers in it are bogus and can't be trusted. But, the code happily goes along and uses them in the list_for_each loop, leading to all kinds of fun.
| |
| | The fix used here is to check to make sure the block group we find really is on the list before we use it. list_del_init is used when removing it from the list, so we can do a proper check.
| |
| | The allocation clustering code has a similar bug where it will trust the block group in the current free space cluster. If our allocation flags have changed (going from single spindle dup to raid1 for example) because the drives in the FS have changed, we're not allowed to use the old block group any more.
| |
| | The fix used here is to check the current cluster against the current allocation flags.
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
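Why list_del_init makes "is this block group still on the list?" a safe question can be shown with a small user-space sketch. It re-implements a tiny circular list rather than using the kernel's list.h; the `sk_` helpers, `struct block_group` and `hint_is_usable()` are hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void sk_init_list_head(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static void sk_list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink the node and point it back at itself, as list_del_init does. */
static void sk_list_del_init(struct list_head *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    sk_init_list_head(n);
}

struct block_group {
    struct list_head list;
};

/* A hinted block group may only seed the walk if it is still linked in. */
static bool hint_is_usable(const struct block_group *bg)
{
    assert(bg != NULL);
    return bg->list.next != &bg->list;   /* self-pointing means "removed" */
}
```

Because a removed node points at itself, the check never has to follow stale pointers into freed memory; a plain unlink that left the old pointers in place would give no such guarantee.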
| * Btrfs: set device->total_disk_bytes when adding new device (Yan Zheng, 2009-06-04)
| | It was not being properly initialized, and so the size saved to disk was not correct.
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable (Linus Torvalds, 2009-05-14)
|\| * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
| |   Btrfs: Spelling fix in btrfs_lookup_first_block_group comments
| |   Btrfs: make show_options result match actual option names
| |   Btrfs: remove outdated comment in btrfs_ioctl_resize()
| |   Btrfs: remove some WARN_ONs in the IO failure path
| |   Btrfs: Don't loop forever on metadata IO failures
| |   Btrfs: init inode ordered_data_close flag properly
| * Btrfs: Spelling fix in btrfs_lookup_first_block_group comments (Sankar P, 2009-05-14)
| | Signed-off-by: Sankar P <sankar.curiosity@gmail.com>
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: make show_options result match actual option names (Sage Weil, 2009-05-14)
| | The notreelog and flushoncommit mount options were being printed slightly differently.
| | Signed-off-by: Sage Weil <sage@newdream.net>
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: remove outdated comment in btrfs_ioctl_resize() (Li Hong, 2009-05-14)
| | In Li Zefan's commit dae7b665cf6d6e6e733f1c9c16cf55547dd37e33, a combined call of kmalloc() and copy_from_user() was replaced by memdup_user(). So btrfs_ioctl_resize() doesn't use GFP_NOFS any more.
| | Signed-off-by: Li Hong <lihong.hi@gmail.com>
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: remove some WARN_ONs in the IO failure path (Chris Mason, 2009-05-14)
| | These debugging WARN_ONs make too much console noise during regular IO failures. An IO failure will still generate a number of messages as we verify checksums etc, but these two are not needed.
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: Don't loop forever on metadata IO failures (Chris Mason, 2009-05-14)
| | When a btrfs metadata read fails, the first thing we try to do is find a good copy on another mirror of the block. If this fails, read_tree_block() ends up returning a buffer that isn't up to date.
| |
| | The btrfs btree reading code was reworked to drop locks and repeat the search when IO was done, but the changes didn't add a check for failed reads. The end result was looping forever on buffers that were never going to become up to date.
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: init inode ordered_data_close flag properly (Chris Mason, 2009-05-14)
| | This flag is used to decide when we need to send a given file through the ordered code to make sure it is fully written before a transaction commits. It was not being properly set to zero when the inode was being set up.
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* | Convert obvious places to deactivate_locked_super() (Al Viro, 2009-05-09)
| | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable (Linus Torvalds, 2009-04-27)
|\| * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
| |   Btrfs: look for acls during btrfs_read_locked_inode
| |   Btrfs: fix acl caching
| |   Btrfs: Fix a bunch of printk() warnings.
| |   Btrfs: Fix a trivial warning using max() of u64 vs ULL.
| |   Btrfs: remove unused btrfs_bit_radix slab
| |   Btrfs: ratelimit IO error printks
| |   Btrfs: remove #if 0 code
| |   Btrfs: When shrinking, only update disk size on success
| |   Btrfs: fix deadlocks and stalls on dead root removal
| |   Btrfs: fix fallocate deadlock on inode extent lock
| |   Btrfs: kill btrfs_cache_create
| |   Btrfs: don't export symbols
| |   Btrfs: simplify makefile
| |   Btrfs: try to keep a healthy ratio of metadata vs data block groups
| * Btrfs: look for acls during btrfs_read_locked_inode (Chris Mason, 2009-04-27)
| | This changes btrfs_read_locked_inode() to peek ahead in the btree for acl items. If it is certain a given inode has no acls, it will set the in memory acl fields to null to avoid acl lookups completely.
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: fix acl caching (Chris Mason, 2009-04-27)
| | Linus noticed the btrfs code to cache acls wasn't properly caching a NULL acl when the inode didn't have any acls. This meant the common case of no acls resulted in expensive btree searches every time the kernel checked permissions (which is quite often).
| |
| | This is a modified version of Linus' original patch:
| |
| |   Properly set initial acl fields to BTRFS_ACL_NOT_CACHED in the inode. This forces an acl lookup when permission checks are done.
| |
| |   Fix btrfs_get_acl to avoid lookups and locking when the inode acl fields are set to null.
| |
| |   Fix btrfs_get_acl to use the right return value from __btrfs_getxattr when deciding to cache a NULL acl. It was storing a NULL acl when __btrfs_getxattr returned -ENOENT, but __btrfs_getxattr was actually returning -ENODATA for this case.
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
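The distinction between "not looked up yet" and "looked up, and there is no acl" can be sketched with a sentinel pointer. This is a user-space illustration under stated assumptions, not the btrfs code: `ACL_NOT_CACHED` stands in for BTRFS_ACL_NOT_CACHED, and `struct inode_sketch`, its `on_disk_acl` field (standing in for the btree search) and `get_acl()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct acl {
    int entries;
};

/* Sentinel that is distinct from NULL and from every real acl. */
static struct acl not_cached_marker;
#define ACL_NOT_CACHED (&not_cached_marker)

struct inode_sketch {
    struct acl *cached_acl;   /* ACL_NOT_CACHED, NULL ("no acl"), or an acl */
    struct acl *on_disk_acl;  /* what the expensive lookup would find */
    int lookups;              /* counts expensive lookups, for illustration */
};

static void inode_init(struct inode_sketch *inode, struct acl *on_disk)
{
    inode->cached_acl = ACL_NOT_CACHED;   /* force a lookup on first use */
    inode->on_disk_acl = on_disk;
    inode->lookups = 0;
}

static struct acl *get_acl(struct inode_sketch *inode)
{
    assert(inode != NULL);
    if (inode->cached_acl != ACL_NOT_CACHED)
        return inode->cached_acl;         /* may be NULL: known to have none */

    inode->lookups++;                     /* stands in for the btree search */
    inode->cached_acl = inode->on_disk_acl;
    return inode->cached_acl;
}
```

With NULL itself used as the "not cached" state, an inode without acls would repeat the lookup on every permission check, which is the behaviour the commit describes fixing.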
| * Btrfs: Fix a bunch of printk() warnings. (Joel Becker, 2009-04-27)
| | Just happened to notice a bunch of %llu vs u64 warnings. Here's a patch to cast them all.
| | Signed-off-by: Joel Becker <joel.becker@oracle.com>
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: Fix a trivial warning using max() of u64 vs ULL. (Joel Becker, 2009-04-27)
| | A small warning popped up on ia64 because inode-map.c was comparing a u64 object id with the ULL FIRST_FREE_OBJECTID. My first thought was that all the OBJECTID constants should contain the u64 cast because btrfs code deals entirely in u64s. But then I saw how large that was, and figured I'd just fix the max() call.
| | Signed-off-by: Joel Becker <joel.becker@oracle.com>
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: remove unused btrfs_bit_radix slab (Chris Mason, 2009-04-27)
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: ratelimit IO error printks (Chris Mason, 2009-04-27)
| | Btrfs has printks for various IO errors, including bad checksums and mismatches between what we expect the block headers to contain and what we actually find on the disk.
| |
| | Longer term we need a real reporting mechanism for this, but for now printk is going to have to do.
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: remove #if 0 code (Chris Mason, 2009-04-27)
| | Btrfs had some old code sitting around under #if 0; this drops it.
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: When shrinking, only update disk size on success (Chris Ball, 2009-04-27)
| | Previously, we updated a device's size prior to attempting a shrink operation. This patch moves the device resizing logic to only happen if the shrink completes successfully. In the process, it introduces a new field to btrfs_device -- disk_total_bytes -- to track the on-disk size.
| | Signed-off-by: Chris Ball <cjb@laptop.org>
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: fix deadlocks and stalls on dead root removal (Chris Mason, 2009-04-24)
| | After a transaction commit, the old roots of the subvol btrees are sent through snapshot removal. This is what actually frees up any blocks replaced by COW, and anything the old blocks pointed to.
| |
| | Snapshot deletion will pause when a transaction commit has started, which helps to avoid a huge amount of delayed reference count updates piling up as the transaction is trying to close.
| |
| | But, this pause happens after the snapshot deletion process has asked other procs on the system to throttle back a bit so that it can make progress.
| |
| | We don't want to throttle everyone while we're waiting for the transaction commit. It leads to deadlocks in the user transaction ioctls used by Ceph and makes things slower in general.
| |
| | This patch changes things to avoid the throttling while we sleep.
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: fix fallocate deadlock on inode extent lock (Chris Mason, 2009-04-24)
| | The btrfs fallocate call takes an extent lock on the entire range being fallocated, and then runs through insert_reserved_extent on each extent as they are allocated.
| |
| | The problem with this is that btrfs_drop_extents may decide to try and take the same extent lock fallocate was already holding. The solution used here is to push down knowledge of the range that is already locked going into btrfs_drop_extents.
| |
| | It turns out that at least one other caller had the same bug.
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: kill btrfs_cache_create (Christoph Hellwig, 2009-04-24)
| | Just use kmem_cache_create directly.
| | Signed-off-by: Christoph Hellwig <hch@lst.de>
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: don't export symbols (Christoph Hellwig, 2009-04-24)
| | Currently the extent_map code is only for btrfs so don't export its symbols.
| | Signed-off-by: Christoph Hellwig <hch@lst.de>
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: simplify makefile (Christoph Hellwig, 2009-04-24)
| | Get rid of the hacks for building out of tree, and always use += for assigning to the object lists.
| | Signed-off-by: Christoph Hellwig <hch@lst.de>
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: try to keep a healthy ratio of metadata vs data block groups (Josef Bacik, 2009-04-24)
| | This patch makes the chunk allocator keep a good ratio of metadata vs data block groups. By default for every 8 data block groups, we'll allocate 1 metadata chunk, or about 12% of the disk will be allocated for metadata. This can be changed by specifying the metadata_ratio mount option.
| |
| | This is simply the number of data block groups that have to be allocated to force a metadata chunk allocation.
| |
| | By making sure we allocate metadata chunks more often, we are less likely to get into situations where the whole disk has been allocated as data block groups.
| | Signed-off-by: Josef Bacik <jbacik@redhat.com>
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
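The counting rule described here (force one metadata chunk after every N data block groups) can be sketched as a small counter. This is an illustration only, not the chunk allocator's code: `struct chunk_alloc_state` and `note_data_chunk()` are hypothetical, and treating a ratio of 0 as "never force" is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

struct chunk_alloc_state {
    unsigned metadata_ratio;       /* data block groups per forced metadata chunk */
    unsigned data_since_metadata;  /* data block groups since the last forced one */
};

/*
 * Record one data block group allocation. Returns 1 when a metadata chunk
 * allocation should be forced now, 0 otherwise.
 */
static int note_data_chunk(struct chunk_alloc_state *s)
{
    assert(s != NULL);
    if (s->metadata_ratio == 0)
        return 0;                  /* assumption: 0 disables the forcing */
    if (++s->data_since_metadata < s->metadata_ratio)
        return 0;
    s->data_since_metadata = 0;    /* start counting towards the next one */
    return 1;
}
```

With the default ratio of 8 from the message, the 8th, 16th, 24th, ... data block groups each trigger a metadata chunk.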
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable (Linus Torvalds, 2009-04-21)
|\| * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
| |   Btrfs: fix btrfs fallocate oops and deadlock
| |   Btrfs: use the right node in reada_for_balance
| |   Btrfs: fix oops on page->mapping->host during writepage
| |   Btrfs: add a priority queue to the async thread helpers
| |   Btrfs: use WRITE_SYNC for synchronous writes
| * Btrfs: fix btrfs fallocate oops and deadlock (Chris Mason, 2009-04-21)
| | Btrfs fallocate was incorrectly starting a transaction with a lock held on the extent_io tree for the file, which could deadlock. Strictly speaking it was using join_transaction which would be safe, but it is better to move the transaction outside of the lock.
| |
| | When preallocated extents are overwritten, btrfs_mark_buffer_dirty was being called on an unlocked buffer. This was triggering an assertion and oops because the lock is supposed to be held.
| |
| | The bug was calling btrfs_mark_buffer_dirty on a leaf after btrfs_del_item had been run. btrfs_del_item takes care of dirtying things, so the solution is to skip the btrfs_mark_buffer_dirty call in this case.
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: use the right node in reada_for_balance (Chris Mason, 2009-04-20)
| | reada_for_balance was using the wrong index into the path node array, so it wasn't reading the right blocks. We never directly used the results of the read done by this function because the btree search is started over at the end.
| |
| | This fixes reada_for_balance to reada in the correct node and to avoid searching past the last slot in the node. It also makes sure to hold the parent lock while we are finding the nodes to read.
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: fix oops on page->mapping->host during writepage (Chris Mason, 2009-04-20)
| | The extent_io writepage call updates the writepage index in the inode as it makes progress. But, it was doing the update after unlocking the page, which isn't legal because page->mapping can't be trusted once the page is unlocked.
| |
| | This led to an oops, especially common with compression turned on. The fix here is to update the writeback index before unlocking the page.
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: add a priority queue to the async thread helpers (Chris Mason, 2009-04-20)
| | Btrfs is using WRITE_SYNC_PLUG to send down synchronous IOs with a higher priority. But, the checksumming helper threads prevent it from being fully effective.
| |
| | There are two problems. First, a big queue of pending checksumming will delay the synchronous IO behind other lower priority writes. Second, the checksumming uses an ordered async work queue. The ordering makes sure that IOs are sent to the block layer in the same order they are sent to the checksumming threads. Usually this gives us less seeky IO.
| |
| | But, when we start mixing IO priorities, the lower priority IO can delay the higher priority IO. This patch solves both problems by adding a high priority list to the async helper threads, and a new btrfs_set_work_high_prio(), which is used to put a new async work item onto the higher priority list.
| |
| | The ordering is still done on high priority IO, but all of the high priority bios are ordered separately from the low priority bios. This ordering is purely an IO optimization, it is not involved in data or metadata integrity.
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
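The scheduling idea (two lists, each kept in submission order, with the high priority list always drained first) can be sketched with two fixed-size FIFOs. This is a single-threaded user-space illustration, not the async-thread code: `struct fifo`, `struct worker_queues`, `queue_work()` and `next_work()` are hypothetical names, and work items are reduced to non-negative integer ids.

```c
#include <assert.h>
#include <stddef.h>

#define QUEUE_CAP 16               /* arbitrary capacity for the sketch */

struct fifo {
    int items[QUEUE_CAP];
    size_t head, tail;
};

static int fifo_empty(const struct fifo *q)
{
    return q->head == q->tail;
}

static int fifo_push(struct fifo *q, int id)
{
    if (q->tail == QUEUE_CAP)
        return -1;                 /* full; the sketch never reuses slots */
    q->items[q->tail++] = id;
    return 0;
}

static int fifo_pop(struct fifo *q)
{
    assert(!fifo_empty(q));
    return q->items[q->head++];
}

struct worker_queues {
    struct fifo high;              /* e.g. work for synchronous writes */
    struct fifo normal;
};

static void worker_queues_init(struct worker_queues *w)
{
    w->high.head = w->high.tail = 0;
    w->normal.head = w->normal.tail = 0;
}

/* Queue a work id (must be >= 0) on the list matching its priority. */
static int queue_work(struct worker_queues *w, int id, int high_prio)
{
    return fifo_push(high_prio ? &w->high : &w->normal, id);
}

/* Next work id: high priority first, FIFO within each list; -1 when idle. */
static int next_work(struct worker_queues *w)
{
    if (!fifo_empty(&w->high))
        return fifo_pop(&w->high);
    if (!fifo_empty(&w->normal))
        return fifo_pop(&w->normal);
    return -1;
}
```

Items queued as normal 1, normal 2, high 10, normal 3, high 11 are handed out as 10, 11, 1, 2, 3: the high priority items jump the queue, but order within each class is preserved, matching the separate ordering the message describes.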
| * Btrfs: use WRITE_SYNC for synchronous writes (Chris Mason, 2009-04-20)
| | Part of reducing fsync/O_SYNC/O_DIRECT latencies is using WRITE_SYNC for writes we plan on waiting on in the near future. This patch mirrors recent changes in other filesystems and the generic code to use WRITE_SYNC when WB_SYNC_ALL is passed and to use WRITE_SYNC for other latency critical writes.
| |
| | Btrfs uses async worker threads for checksumming before the write is done, and then again to actually submit the bios. The bio submission code just runs a per-device list of bios that need to be sent down the pipe.
| |
| | This list is split into low priority and high priority lists so the WRITE_SYNC IO happens first.
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
* | btrfs: use memdup_user() (Li Zefan, 2009-04-20)
|/  Remove open-coded memdup_user().
|
|   Note this changes some GFP_NOFS to GFP_KERNEL: since copy_from_user() may cause a page fault, it's pointless to pass GFP_NOFS to kmalloc().
|   Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable (Linus Torvalds, 2009-04-03)
|\  * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
| |   Btrfs: BUG to BUG_ON changes
| |   Btrfs: remove dead code
| |   Btrfs: remove dead code
| |   Btrfs: fix typos in comments
| |   Btrfs: remove unused ftrace include
| |   Btrfs: fix __ucmpdi2 compile bug on 32 bit builds
| |   Btrfs: free inode struct when btrfs_new_inode fails
| |   Btrfs: fix race in worker_loop
| |   Btrfs: add flushoncommit mount option
| |   Btrfs: notreelog mount option
| |   Btrfs: introduce btrfs_show_options
| |   Btrfs: rework allocation clustering
| |   Btrfs: Optimize locking in btrfs_next_leaf()
| |   Btrfs: break up btrfs_search_slot into smaller pieces
| |   Btrfs: kill the pinned_mutex
| |   Btrfs: kill the block group alloc mutex
| |   Btrfs: clean up find_free_extent
| |   Btrfs: free space cache cleanups
| |   Btrfs: unplug in the async bio submission threads
| |   Btrfs: keep processing bios for a given bdev if our proc is batching
| * Btrfs: BUG to BUG_ON changes (Stoyan Gaydarov, 2009-04-02)
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: remove dead code (Dan Carpenter, 2009-04-02)
| | Remove an unneeded return statement and conditional
| | Signed-off-by: Dan Carpenter <error27@gmail.com>
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: remove dead code (Dan Carpenter, 2009-04-02)
| | merge is always NULL at this point.
| | Signed-off-by: Dan Carpenter <error27@gmail.com>
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: fix typos in comments (Wu Fengguang, 2009-04-02)
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: remove unused ftrace include (Jim Owens, 2009-04-02)
| | Signed-off-by: jim owens <jowens@hp.com>
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: fix __ucmpdi2 compile bug on 32 bit builds (Heiko Carstens, 2009-04-03)
| | We get this on 32 bit builds:
| |
| |   fs/built-in.o: In function `extent_fiemap':
| |   (.text+0x1019f2): undefined reference to `__ucmpdi2'
| |
| | Happens because of a switch statement with a 64 bit argument. Convert this to an if statement to fix this.
| | Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: free inode struct when btrfs_new_inode fails (Shen Feng, 2009-04-02)
| | btrfs_new_inode doesn't call iput to free the inode when it fails.
| | Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: fix race in worker_loop (Amit Gud, 2009-04-02)
| | Need to check kthread_should_stop after schedule_timeout() before calling schedule(). This causes threads to sleep with potentially no one to wake them up causing mount(2) to hang in btrfs_stop_workers waiting for threads to stop.
| | Signed-off-by: Amit Gud <gud@ksu.edu>
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: add flushoncommit mount option (Sage Weil, 2009-04-02)
| | The 'flushoncommit' mount option forces any data dirtied by a write in a prior transaction to commit as part of the current commit. This makes the committed state a fully consistent view of the file system from the application's perspective (i.e., it includes all completed file system operations). This was previously the behavior only when a snapshot is created.
| |
| | This is used by Ceph to ensure that completed writes make it to the platter along with the metadata operations they are bound to (by BTRFS_IOC_TRANS_{START,END}).
| | Signed-off-by: Sage Weil <sage@newdream.net>
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>
| * Btrfs: notreelog mount option (Sage Weil, 2009-04-02)
| | Add a 'notreelog' mount option to disable the tree log (used by fsync, O_SYNC writes). This is much slower, but the tree logging produces inconsistent views into the FS for ceph.
| | Signed-off-by: Sage Weil <sage@newdream.net>
| | Signed-off-by: Chris Mason <chris.mason@oracle.com>