aboutsummaryrefslogtreecommitdiffstats
path: root/fs/ext4
Commit message (Collapse)AuthorAge
* ext4: Properly count journal credits for long symlinksEric Sandeen2011-08-11
| | | | | | | | | | | | | | | | | | | | | Commit df5e6223407e ("ext4: fix deadlock in ext4_symlink() in ENOSPC conditions") recalculated the number of credits needed for a long symlink, in the process of splitting it into two transactions. However, the first credit calculation under-counted because if selinux is enabled, credits are needed to create the selinux xattr as well. Overrunning the reservation will result in an OOPS in jbd2_journal_dirty_metadata() due to this assert: J_ASSERT_JH(jh, handle->h_buffer_credits > 0); Fix this by increasing the reservation size. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* ext4: use kzalloc in ext4_kzalloc()Mathias Krause2011-08-03
| | | | | | | | | Commit 9933fc0i (ext4: introduce ext4_kvmalloc(), ext4_kzalloc(), and ext4_kvfree()) intruduced wrappers around k*alloc/vmalloc but introduced a typo for ext4_kzalloc() by not using kzalloc() but kmalloc(). Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* Merge branch 'for_linus' of ↵Linus Torvalds2011-08-01
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (60 commits) ext4: prevent memory leaks from ext4_mb_init_backend() on error path ext4: use EXT4_BAD_INO for buddy cache to avoid colliding with valid inode # ext4: use ext4_msg() instead of printk in mballoc ext4: use ext4_kvzalloc()/ext4_kvmalloc() for s_group_desc and s_group_info ext4: introduce ext4_kvmalloc(), ext4_kzalloc(), and ext4_kvfree() ext4: use the correct error exit path in ext4_init_inode_table() ext4: add missing kfree() on error return path in add_new_gdb() ext4: change umode_t in tracepoint headers to be an explicit __u16 ext4: fix races in ext4_sync_parent() ext4: Fix overflow caused by missing cast in ext4_fallocate() ext4: add action of moving index in ext4_ext_rm_idx for Punch Hole ext4: simplify parameters of reserve_backup_gdb() ext4: simplify parameters of add_new_gdb() ext4: remove lock_buffer in bclean() and setup_new_group_blocks() ext4: simplify journal handling in setup_new_group_blocks() ext4: let setup_new_group_blocks() set multiple bits at a time ext4: fix a typo in ext4_group_extend() ext4: let ext4_group_add_blocks() handle 0 blocks quickly ext4: let ext4_group_add_blocks() return an error code ext4: rename ext4_add_groupblocks() to ext4_group_add_blocks() ... Fix up conflict in fs/ext4/inode.c: commit aacfc19c626e ("fs: simplify the blockdev_direct_IO prototype") had changed the ext4_ind_direct_IO() function for the new simplified calling convention, while commit dae1e52cb126 ("ext4: move ext4_ind_* functions from inode.c to indirect.c") moved the function to another file.
| * ext4: prevent memory leaks from ext4_mb_init_backend() on error pathYu Jian2011-08-01
| | | | | | | | | | | | | | | | | | | | | | In ext4_mb_init(), if the s_locality_group allocation fails it will currently cause the allocations made in ext4_mb_init_backend() to be leaked. Moving the ext4_mb_init_backend() allocation after the s_locality_group allocation avoids that problem. Signed-off-by: Yu Jian <yujian@whamcloud.com> Signed-off-by: Andreas Dilger <adilger@whamcloud.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: use EXT4_BAD_INO for buddy cache to avoid colliding with valid inode #Yu Jian2011-08-01
| | | | | | | | | | | | Signed-off-by: Yu Jian <yujian@whamcloud.com> Signed-off-by: Andreas Dilger <adilger@whamcloud.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: use ext4_msg() instead of printk in mballocTheodore Ts'o2011-08-01
| | | | | | | | Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: use ext4_kvzalloc()/ext4_kvmalloc() for s_group_desc and s_group_infoTheodore Ts'o2011-08-01
| | | | | | | | Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: introduce ext4_kvmalloc(), ext4_kzalloc(), and ext4_kvfree()Theodore Ts'o2011-08-01
| | | | | | | | | | | | | | | | Introduce new helper functions which try kmalloc, and then fall back to vmalloc if necessary, and use them for allocating and deallocating s_flex_groups. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: use the correct error exit path in ext4_init_inode_table()Yongqiang Yang2011-08-01
| | | | | | | | | | | | | | | | | | This patch lets ext4_init_inode_table() handle errors right. ext4_init_inode_table() should down_write() alloc_sem which has been up_write()ed and stop the started journal handle. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: add missing kfree() on error return path in add_new_gdb()Dan Carpenter2011-07-30
| | | | | | | | | | | | | | | | | | We added some more error handling in b40971426a "ext4: add error checking to calls to ext4_handle_dirty_metadata()". But we need to call kfree() as well to avoid a memory leak. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: fix races in ext4_sync_parent()Theodore Ts'o2011-07-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix problems if fsync() races against a rename of a parent directory as pointed out by Al Viro in his own inimitable way: >While we are at it, could somebody please explain what the hell is ext4 >doing in >static int ext4_sync_parent(struct inode *inode) >{ > struct writeback_control wbc; > struct dentry *dentry = NULL; > int ret = 0; > > while (inode && ext4_test_inode_state(inode, EXT4_STATE_NEWENTRY)) { > ext4_clear_inode_state(inode, EXT4_STATE_NEWENTRY); > dentry = list_entry(inode->i_dentry.next, > struct dentry, d_alias); > if (!dentry || !dentry->d_parent || !dentry->d_parent->d_inode) > break; > inode = dentry->d_parent->d_inode; > ret = sync_mapping_buffers(inode->i_mapping); > ... >Note that dentry obviously can't be NULL there. dentry->d_parent is never >NULL. And dentry->d_parent would better not be negative, for crying out >loud! What's worse, there's no guarantees that dentry->d_parent will >remain our parent over that sync_mapping_buffers() *and* that inode won't >just be freed under us (after rename() and memory pressure leading to >eviction of what used to be our dentry->d_parent)...... Reported-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Fix overflow caused by missing cast in ext4_fallocate()Utako Kusaka2011-07-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The logical block number in map.l_blk is a __u32, and so before we shift it left, by the block size, we neeed cast it to a 64-bit size. Otherwise i_size can be corrupted on an ENOSPC. # df -T /mnt/mp1 Filesystem Type 1K-blocks Used Available Use% Mounted on /dev/sda6 ext4 9843276 153056 9190200 2% /mnt/mp1 # fallocate -o 0 -l 2199023251456 /mnt/mp1/testfile fallocate: /mnt/mp1/testfile: fallocate failed: No space left on device # stat /mnt/mp1/testfile File: `/mnt/mp1/testfile' Size: 4293656576 Blocks: 19380440 IO Block: 4096 regular file Device: 806h/2054d Inode: 12 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root) Access: 2011-07-25 13:01:31.414490496 +0900 Modify: 2011-07-25 13:01:31.414490496 +0900 Change: 2011-07-25 13:01:31.454490495 +0900 Signed-off-by: Utako Kusaka <u-kusaka@wm.jp.nec.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> -- fs/ext4/extents.c | 2 +- 1 files changed, 1 insertions(+), 1 deletions(-)
| * ext4: add action of moving index in ext4_ext_rm_idx for Punch HoleRobin Dong2011-07-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The old function ext4_ext_rm_idx is used only for truncate case because it just remove last index in extent-index-block. When punching hole, it usually needed to remove "middle" index, therefore we must move indexes which after it forward. (I create a file with 1 depth extent tree and punch hole in the middle of it, the last index in index-block strangly gone, so I find out this bug) Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: simplify parameters of reserve_backup_gdb()Yongqiang Yang2011-07-27
| | | | | | | | | | | | | | The reserve_backup_gdb() function only needs the block group number; there's no need to pass a pointer to struct ext4_new_group_data to it. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
| * ext4: simplify parameters of add_new_gdb()Yongqiang Yang2011-07-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | add_new_gdb() only needs the block group number; there is no need to pass a pointer to struct ext4_new_group_data to add_new_gdb(). Instead of filling in a pointer the struct buffer_head in add_new_gdb(), it's simpler to have the caller fetch it from the s_group_desc[] array. [Fixed error path to handle the case where struct buffer_head *primary hasn't been set yet. -- Ted] Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: remove lock_buffer in bclean() and setup_new_group_blocks()Yongqiang Yang2011-07-27
| | | | | | | | | | | | | | | | There is no need to lock the buffers since no one else should be touching these buffers besides the file system. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: simplify journal handling in setup_new_group_blocks()Yongqiang Yang2011-07-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch simplifies journal handling in setup_new_group_blocks(). In previous code, block bitmap is modified everywhere in setup_new_group_blocks(), ext4_get_write_access() in extend_or_restart_transaction() is used to guarantee that the block bitmap stays in the new handle, this makes things complicated. The previous commit changed things so that the modifications on the block bitmap are batched and done by ext4_set_bits() at the end of the for loop. This allows us to simplify things. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: let setup_new_group_blocks() set multiple bits at a timeYongqiang Yang2011-07-26
| | | | | | | | | | | | | | | | Rename mb_set_bits() to ext4_set_bits() and make it a global function so that setup_new_group_blocks() can use it. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: fix a typo in ext4_group_extend()Yongqiang Yang2011-07-26
| | | | | | | | | | Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: let ext4_group_add_blocks() handle 0 blocks quicklyYongqiang Yang2011-07-26
| | | | | | | | | | | | | | | | If ext4_group_add_blocks() is called with 0 block, make it return 0 without doing any extra work. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: let ext4_group_add_blocks() return an error codeYongqiang Yang2011-07-26
| | | | | | | | | | | | | | | | This patch lets ext4_group_add_blocks() return an error code if it fails, so that upper functions can handle error correctly. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: rename ext4_add_groupblocks() to ext4_group_add_blocks()Yongqiang Yang2011-07-26
| | | | | | | | | | Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: prevent a fs with errors from being resizedYongqiang Yang2011-07-26
| | | | | | | | | | | | | | | | A filesystem with errors is not allowed to being resized, otherwise, it is easy to destroy the filesystem. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: prevent parallel resizers by atomic bit opsYongqiang Yang2011-07-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Before this patch, parallel resizers are allowed and protected by a mutex lock, actually, there is no need to support parallel resizer, so this patch prevents parallel resizers by atmoic bit ops, like lock_page() and unlock_page() do. To do this, the patch removed the mutex lock s_resize_lock from struct ext4_sb_info and added a unsigned long field named s_resize_flags which inidicates if there is a resizer. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: fix data corruption in inodes with journalled dataJan Kara2011-07-26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When journalling data for an inode (either because it is a symlink or because the filesystem is mounted in data=journal mode), ext4_evict_inode() can discard unwritten data by calling truncate_inode_pages(). This is because we don't mark the buffer / page dirty when journalling data but only add the buffer to the running transaction and thus mm does not know there are still unwritten data. Fix the problem by carefully tracking transaction containing inode's data, committing this transaction, and writing uncheckpointed buffers when inode should be reaped. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: correct comment for ext4_ext_check_cacheRobin Dong2011-07-23
| | | | | | | | | | | | | | The comment for ext4_ext_check_cache has a litte mistake. Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: correct the debug message in ext4_ext_insert_extentRobin Dong2011-07-23
| | | | | | | | | | | | | | | | The debug message in ext4_ext_insert_extent before moving extent is incorrect (the "from xx to xx"). Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: remove unused argument in ext4_ext_next_leaf_blockRobin Dong2011-07-23
| | | | | | | | | | | | | | | | The argument "inode" in function ext4_ext_next_allocated_block looks useless, so clean it. Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: remove ac_repeats from ext4_allocation_contextTao Ma2011-07-23
| | | | | | | | | | | | | | ac_repeats isn't referenced in the mballoc code. So remove it. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: don't increment s_mb_buddies_generated in ext4_mb_releaseTao Ma2011-07-23
| | | | | | | | | | | | | | | | In ext4_mb_release, we use s_mb_buddies_generated++. Although the output is OK, but I don't think we need this extra ++. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: remove unnecessary ext4_get_group_info in ext4_mb_load_buddyTao Ma2011-07-23
| | | | | | | | | | | | | | | | | | ext4_mb_load_buddy() calls ext4_get_group_info() for setting both "grp" and "e4b->bd_info", but it could do "e4b->bd_info = grp". Reported-by: Andreas Dilger <adilger@whamcloud.com> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: avoid eh_entries overflow before insert extent_idxRobin Dong2011-07-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | If eh_entries is equal to (or greater than) eh_max, the operation of inserting new extent_idx will make number of entries overflow. So check eh_entries before inserting the new extent_idx. Although there is no bug case according the code (function ext4_ext_insert_index is called by ext4_ext_split and ext4_ext_split is called only if the index block has free space), the right logic should be "lookup the capacity before insertion". Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: avoid wasted extent cache lookup if !PUNCH_OUT_EXTRobin Dong2011-07-17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch avoids an extraneous lookup of the extent cache in ext4_ext_map_blocks() when the flag EXT4_GET_BLOCKS_PUNCH_OUT_EXT is absent. The existing logic was performing the lookup but not making use of the result. The patch simply reverses the order of evaluation in the condition. Since ext4_ext_in_cache() does not initialize newex on misses, bypassing its invocation does not introduce any new issue in this regard. Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Eric Gouriou <egouriou@google.com>
| * ext4: remove unneeded parameter to ext4_ext_remove_space()Allison Henderson2011-07-17
| | | | | | | | | | | | | | | | This patch removes the extra parameter in ext4_ext_remove_space() which is no longer needed. Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: punch hole optimizations: skip un-needed extent lookupAllison Henderson2011-07-17
| | | | | | | | | | | | | | | | | | | | This patch optimizes the punch hole operation by skipping the tree walking code that is used by truncate. Since punch hole is done through map blocks, the path to the extent is already known in this function, so we do not need to look it up again. Signed-off-by: Allison Henderson <achender@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: ignore a stripe width of 1Dan Ehrenberg2011-07-17
| | | | | | | | | | | | | | | | | | If the stripe width was set to 1, then this patch will ignore that stripe width and ext4 will act as if the stripe width were 0 with respect to optimizing allocations. Signed-off-by: Dan Ehrenberg <dehrenberg@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: make the preallocation size be a multiple of stripe sizeDan Ehrenberg2011-07-17
| | | | | | | | | | | | | | | | | | | | | | | | Previously, if a stripe width was provided, then it would be used as the preallocation granularity, with no santiy checking and no way to override this. Now, mb_prealloc_size defaults to the smallest multiple of stripe size that is greater than or equal to the old default mb_prealloc_size, and this can be overridden with the sysfs interface. Signed-off-by: Dan Ehrenberg <dehrenberg@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: fix compilation with -DDX_DEBUGBernd Schubert2011-07-16
| | | | | | | | | | | | | | | | Compilation of ext4/namei.c brought up an error and warning messages when compiled with -DDX_DEBUG Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: remove unnecessary comments in ext4_orphan_add()Lukas Czerner2011-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The comment from Al Viro about possible race in the ext4_orphan_add() is not justified. There is no race possible as we always have either i_mutex locked, or the inode can not be referenced from outside hence the J_ASSERS should not be hit from the reason described in comment. This commit replaces it with notion that we are holding i_mutex so it should not be possible for i_nlink to be changed while waiting for s_orphan_lock. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Fix a double free of sbi->s_group_info in ext4_mb_init_backendTao Ma2011-07-11
| | | | | | | | | | | | | | | | | | | | | | | | If we meet with an error in ext4_mb_add_groupinfo, we kfree sbi->s_group_info[group >> EXT4_DESC_PER_BLOCK_BITS(sb)], but fail to reset it to NULL. So the caller ext4_mb_init_backend will try to kfree it again and causes a double free. So fix it by resetting it to NULL. Some typo in comments of mballoc.c are also changed. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: fix a race which could leak memory in ext4_groupinfo_create_slab()Tao Ma2011-07-11
| | | | | | | | | | | | | | | | | | | | In ext4_groupinfo_create_slab, we create ext4_groupinfo_caches within ext4_grpinfo_slab_create_mutex, but set it outside the lock, and there does exist some case that we may create it twice and causes a memory leak. So set it before we call mutex_unlock. Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: avoid unneeded ext4_ext_next_leaf_block() while inserting extentsRobin Dong2011-07-11
| | | | | | | | | | | | | | | | Optimize ext4_ext_insert_extent() by avoiding ext4_ext_next_leaf_block() when the result is not used/needed. Signed-off-by: Robin Dong <sanbai@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: remove redundant goto in ext4_ext_insert_extent()Robin Dong2011-07-11
| | | | | | | | | | | | | | | | | | | | If eh->eh_entries is smaller than eh->eh_max, the routine will go to the "repeat" and then go to "has_space" directlly , since argument "depth" and "eh" are not even changed. Therefore, goto "has_space" directly and remove redundant "repeat" tag. Signed-off-by: Robin Dong <sanbai@taobao.com>
| * ext4: Change the wrong param comment for ext4_trim_all_freeTao Ma2011-07-11
| | | | | | | | | | | | | | | | | | at ext4_trim_all_free() comment, there is no longer an @e4b parameter, instead it is @group. Reported-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Speed up FITRIM by recording flags in ext4_group_infoTao Ma2011-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In ext4, when FITRIM is called every time, we iterate all the groups and do trim one by one. It is a bit time wasting if the group has been trimmed and there is no change since the last trim. So this patch adds a new flag in ext4_group_info->bb_state to indicate that the group has been trimmed, and it will be cleared if some blocks is freed(in release_blocks_on_commit). Another trim_minlen is added in ext4_sb_info to record the last minlen we use to trim the volume, so that if the caller provide a small one, we will go on the trim regardless of the bb_state. A simple test with my intel x25m ssd: df -h shows: /dev/sdb1 40G 21G 17G 56% /mnt/ext4 Block size: 4096 run the FITRIM with the following parameter: range.start = 0; range.len = UINT64_MAX; range.minlen = 1048576; without the patch: [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a real 0m5.505s user 0m0.000s sys 0m1.224s [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a real 0m5.359s user 0m0.000s sys 0m1.178s [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a real 0m5.228s user 0m0.000s sys 0m1.151s with the patch: [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a real 0m5.625s user 0m0.000s sys 0m1.269s [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a real 0m0.002s user 0m0.000s sys 0m0.001s [root@boyu-tm linux-2.6]# time ./ftrim /mnt/ext4/a real 0m0.002s user 0m0.000s sys 0m0.001s A big improvement for the 2nd and 3rd run. Even after I delete some big image files, it is still much faster than iterating the whole disk. [root@boyu-tm test]# time ./ftrim /mnt/ext4/a real 0m1.217s user 0m0.000s sys 0m0.196s Cc: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Add new ext4 trim tracepointsTao Ma2011-07-11
| | | | | | | | | | | | | | | | Add ext4_trim_extent and ext4_trim_all_free. Reviewed-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: speed up group trim with the right free block countTao Ma2011-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we trim some free blocks in a group of ext4, we need to calculate the free blocks properly and check whether there are enough freed blocks left for us to trim. Current solution will only calculate free spaces if they are large for a trim which isn't appropriate. Let us see a small example: a group has 1.5M free which are 300k, 300k, 300k, 300k, 300k. And minblocks is 1M. With current solution, we have to iterate the whole group since these 300k will never be subtracted from 1.5M. But actually we should exit after we find the first 2 free spaces since the left 3 chunks only sum up to 900K if we subtract the first 600K although they can't be trimed. Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: fix trim length underflow with small trim lengthTao Ma2011-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In 0f0a25b, we adjust 'len' with s_first_data_block - start, but it could underflow in case blocksize=1K, fstrim_range.len=512 and fstrim_range.start = 0. In this case, when we run the code: len -= first_data_blk - start; len will be underflow to -1ULL. In the end, although we are safe that last_group check later will limit the trim to the whole volume, but that isn't what the user really want. So this patch fix it. It also adds the check for 'start' like ext3 so that we can break immediately if the start is invalid. Cc: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: add tracepoint for ext4_journal_startTheodore Ts'o2011-07-10
| | | | | | | | | | | | This will help debug who is responsible for starting a jbd2 transaction. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: free allocated and pre-allocated blocks when check_eofblocks_fl failsJiaying Zhang2011-07-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Upon corrupted inode or disk failures, we may fail after we already allocate some blocks from the inode or take some blocks from the inode's preallocation list, but before we successfully insert the corresponding extent to the extent tree. In this case, we should free any allocated blocks and discard the inode's preallocated blocks because the entries in the inode's preallocation list may be in an inconsistent state. Signed-off-by: Jiaying Zhang <jiayingz@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org