aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* Merge branch 'for_linus' of ↵Linus Torvalds2008-07-15
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (61 commits) ext4: Documention update for new ordered mode and delayed allocation ext4: do not set extents feature from the kernel ext4: Don't allow nonextenst mount option for large filesystem ext4: Enable delalloc by default. ext4: delayed allocation i_blocks fix for stat ext4: fix delalloc i_disksize early update issue ext4: Handle page without buffers in ext4_*_writepage() ext4: Add ordered mode support for delalloc ext4: Invert lock ordering of page_lock and transaction start in delalloc mm: Add range_cont mode for writeback ext4: delayed allocation ENOSPC handling percpu_counter: new function percpu_counter_sum_and_set ext4: Add delayed allocation support in data=writeback mode vfs: add hooks for ext4's delayed allocation support jbd2: Remove data=ordered mode support using jbd buffer heads ext4: Use new framework for data=ordered mode in JBD2 jbd2: Implement data=ordered mode handling via inodes vfs: export filemap_fdatawrite_range() ext4: Fix lock inversion in ext4_ext_truncate() ext4: Invert the locking order of page_lock and transaction start ...
| * ext4: Documention update for new ordered mode and delayed allocationMingming Cao2008-07-11
| | | | | | | | | | | | | | Adding some documentations for delayed allocation and new ordered mode. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: do not set extents feature from the kernelEric Sandeen2008-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We've talked for a while about getting rid of any feature- setting from the kernel; this gets rid of the code which would set the INCOMPAT_EXTENTS flag on the first file write when mounted as ext4[dev]. With this patch, if the extents feature is not already set on disk, then mounting as ext4 will fall back to noextents with a warning, and if -o extents is explicitly requested, the mount will fail, also with warning. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Don't allow nonextenst mount option for large filesystemAneesh Kumar K.V2008-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | The block mapped inode format can address only blocks within 2**32. This causes a number of issues, the biggest of which is that the block allocator needs to be taught that certain inodes can not utilize block numbers > 2**32. So until this is fixed, it is simplest to fail mounting of file systems with more than 2**32 blocks if the -o noextents option is given. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Enable delalloc by default.Aneesh Kumar K.V2008-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Enable delalloc by default to ensure it gets sufficient testing and because it makes the filesystem much more efficient. Add a nodealalloc option to disable delayed allocation, and update ext4_show_options to show delayed allocation off if it is disabled. If the data=journal mount option is used, disable delayed allocation since the delalloc code doesn't support data=journal yet. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Mingming Cao <cmm@us.ibm.com>
| * ext4: delayed allocation i_blocks fix for statMingming Cao2008-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now i_blocks is not getting updated until the blocks are actually allocaed on disk. This means with delayed allocation, right after files are copied, "ls -sF" shoes the file as taking 0 blocks on disk. "du" also shows the files taking zero space, which is highly confusing to the user. Since delayed allocation already keeps track of per-inode total number of blocks that are subject to delayed allocation, this patch fix this by using that to adjust the value returned by stat(2). When real block allocation is done, the i_blocks will get updated. Since the reserved blocks for delayed allocation will be decreased, this will be keep value returned by stat(2) consistent. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: fix delalloc i_disksize early update issueMingming Cao2008-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ext4_da_write_end() used walk_page_buffers() with a callback function of ext4_bh_unmapped_or_delay() to check if it extended the file size without allocating any blocks (since in this case i_disksize needs to be updated). However, this is didn't work proprely because the buffer head has not been marked dirty yet --- this is done later in block_commit_write() --- which caused ext4_bh_unmapped_or_delay() to always return false. In addition, walk_page_buffers() checks all of the buffer heads covering the page, and the only buffer_head that should be checked is the one covering the end of the write. Otherwise, given a 1k blocksize filesystem and a 4k page size, the buffer head covering the first 1k stripe of the file could be unmapped (because it was a sparse file), and the second or third buffer_head covering that page could be mapped, and using walk_page_buffers() would fail in this case since it would stop at the first unmapped buffer_head and return true. The core problem is that walk_page_buffers() was intended to do work in a callback function, and a non-zero return value indicated a failure, which termined the walk of the buffer heads covering the page. It was not intended to be used with a boolean function, such as ext4_bh_unmapped_or_delay(). Add addtional fix from Aneesh to protect i_disksize update rave with truncate. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Handle page without buffers in ext4_*_writepage()Aneesh Kumar K.V2008-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It can happen that buffers are removed from the page before it gets marked dirty and then is passed to writepage(). In writepage() we just initialize the buffers and check whether they are mapped and non delay. If they are mapped and non delay we write the page. Otherwise we mark them dirty. With this change we don't do block allocation at all in ext4_*_write_page. writepage() can get called under many condition and with a locking order of journal_start -> lock_page, we should not try to allocate blocks in writepage() which get called after taking page lock. writepage() can get called via shrink_page_list even with a journal handle which was created for doing inode update. For example when doing ext4_da_write_begin we create a journal handle with credit 1 expecting a i_disksize update for the inode. But ext4_da_write_begin can cause shrink_page_list via _grab_page_cache. So having a valid handle via ext4_journal_current_handle is not a guarantee that we can use the handle for block allocation in writepage, since we shouldn't be using credits that had been reserved for other updates. That it could result in we running out of credits when we update inodes. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Add ordered mode support for delallocAneesh Kumar K.V2008-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This provides a new ordered mode implementation which gets rid of using buffer heads to enforce the ordering between metadata change with the related data chage. Instead, in the new ordering mode, it keeps track of all of the inodes touched by each transaction on a list, and when that transaction is committed, it flushes all of the dirty pages for those inodes. In addition, the new ordered mode reverses the lock ordering of the page lock and transaction lock, which provides easier support for delayed allocation. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Invert lock ordering of page_lock and transaction start in delallocMingming Cao2008-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the reverse locking, we need to start a transation before taking the page lock, so in ext4_da_writepages() we need to break the write-out into chunks, and restart the journal for each chunck to ensure the write-out fits in a single transaction. Updated patch from Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> which fixes delalloc sync hang with journal lock inversion, and address the performance regression issue. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * mm: Add range_cont mode for writebackAneesh Kumar K.V2008-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Filesystems like ext4 needs to start a new transaction in the writepages for block allocation. This happens with delayed allocation and there is limit to how many credits we can request from the journal layer. So we call write_cache_pages multiple times with wbc->nr_to_write set to the maximum possible value limitted by the max journal credits available. Add a new mode to writeback that enables us to handle this behaviour. In the new mode we update the wbc->range_start to point to the new offset to be written. Next call to call to write_cache_pages will start writeout from specified range_start offset. In the new mode we also limit writing to the specified wbc->range_end. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: delayed allocation ENOSPC handlingMingming Cao2008-07-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch does block reservation for delayed allocation, to avoid ENOSPC later at page flush time. Blocks(data and metadata) are reserved at da_write_begin() time, the freeblocks counter is updated by then, and the number of reserved blocks is store in per inode counter. At the writepage time, the unused reserved meta blocks are returned back. At unlink/truncate time, reserved blocks are properly released. Updated fix from Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> to fix the oldallocator block reservation accounting with delalloc, added lock to guard the counters and also fix the reservation for meta blocks. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * percpu_counter: new function percpu_counter_sum_and_setMingming Cao2008-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Delayed allocation need to check free blocks at every write time. percpu_counter_read_positive() is not quit accurate. delayed allocation need a more accurate accounting, but using percpu_counter_sum_positive() is frequently is quite expensive. This patch added a new function to update center counter when sum per-cpu counter, to increase the accurate rate for next percpu_counter_read() and require less calling expensive percpu_counter_sum(). Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Add delayed allocation support in data=writeback modeAlex Tomas2008-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Updated with fixes from Mingming Cao <cmm@us.ibm.com> to unlock and release the page from page cache if the delalloc write_begin failed, and properly handle preallocated blocks. Also added a fix to clear buffer_delay in block_write_full_page() after allocating a delayed buffer. Updated with fixes from Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> to update i_disksize properly and to add bmap support for delayed allocation. Updated with a fix from Valerie Clement <valerie.clement@bull.net> to avoid filesystem corruption when the filesystem is mounted with the delalloc option and blocksize < pagesize. Signed-off-by: Alex Tomas <alex@clusterfs.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Dave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
| * vfs: add hooks for ext4's delayed allocation supportAlex Tomas2008-07-11
| | | | | | | | | | | | | | | | | | | | | | | | Export mpage_bio_submit() and __mpage_writepage() for the benefit of ext4's delayed allocation support. Also change __block_write_full_page so that if buffers that have the BH_Delay flag set it will call get_block() to get the physical block allocated, just as in the !BH_Mapped case. Signed-off-by: Alex Tomas <alex@clusterfs.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * jbd2: Remove data=ordered mode support using jbd buffer headsJan Kara2008-07-11
| | | | | | | | Signed-off-by: Jan Kara <jack@suse.cz>
| * ext4: Use new framework for data=ordered mode in JBD2Jan Kara2008-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch makes ext4 use inode-based implementation of data=ordered mode in JBD2. It allows us to unify some data=ordered and data=writeback paths (especially writepage since we don't have to start a transaction anymore) and remove some buffer walking. Updated fix from Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> to fix file system hang due to corrupt jinode values. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * jbd2: Implement data=ordered mode handling via inodesJan Kara2008-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds necessary framework into JBD2 to be able to track inodes with each transaction and write-out their dirty data during transaction commit time. This new ordered mode brings all sorts of advantages such as possibility to get rid of journal heads and buffer heads for data buffers in ordered mode, better ordering of writes on transaction commit, simplification of some JBD code, no more anonymous pages when truncate of data being committed happens. Also with this new ordered mode, delayed allocation on ordered mode is much simpler. Signed-off-by: Jan Kara <jack@suse.cz>
| * vfs: export filemap_fdatawrite_range()Jan Kara2008-07-11
| | | | | | | | | | | | | | Make filemap_fdatawrite_range() function public, so that it can later be used in ordered mode rewrite by JBD/JBD2. Signed-off-by: Jan Kara <jack@suse.cz>
| * ext4: Fix lock inversion in ext4_ext_truncate()Jan Kara2008-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We cannot call ext4_orphan_add() from under i_data_sem because that causes a lock ordering violation between i_data_sem and and the superblock lock. Updated with Aneesh's locking order fix Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Invert the locking order of page_lock and transaction startJan Kara2008-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This changes are needed to support data=ordered mode handling via inodes. This enables us to get rid of the journal heads and buffer heads for data buffers in the ordered mode. With the changes, during tranasaction commit we writeout the inode pages using the writepages()/writepage(). That implies we take page lock during transaction commit. This can cause a deadlock with the locking order page_lock -> jbd2_journal_start, since the jbd2_journal_start can wait for the journal_commit to happen and the journal_commit now needs to take the page lock. To avoid this dead lock reverse the locking order. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * vfs: Move mark_inode_dirty() from under page lock in generic_write_end()Jan Kara2008-07-11
| | | | | | | | | | | | | | | | | | | | | | | | There's no need to call mark_inode_dirty() under page lock in generic_write_end(). It unnecessarily makes hold time of page lock longer and more importantly it forces locking order of page lock and transaction start for journaling filesystems. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Use page_mkwrite vma_operations to get mmap write notification.Aneesh Kumar K.V2008-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | We would like to get notified when we are doing a write on mmap section. This is needed with respect to preallocated area. We split the preallocated area into initialzed extent and uninitialzed extent in the call back. This let us handle ENOSPC better. Otherwise we get ENOSPC in the writepage and that would result in data loss. The changes are also needed to handle ENOSPC when writing to an mmap section of files with holes. Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Documentation updates.Jose R. Santos2008-07-11
| | | | | | | | | | | | | | | | Some of the information in Documentation/filesystems/ext4.txt is out of date and in need of an update. Signed-off-by: Jose R. Santos <jrs@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: fix online resize with mballocFrederic Bohe2008-07-11
| | | | | | | | | | | | | | | | | | | | Update group infos when updating a group's descriptor. Add group infos when adding a group's descriptor. Refresh cache pages used by mb_alloc when changes occur. This will probably need modifications when META_BG resizing will be allowed. Signed-off-by: Frederic Bohe <frederic.bohe@bull.net> Signed-off-by: Mingming Cao <cmm@us.ibm.com>
| * ext4: use atomic functions to set bh_stateEric Sandeen2008-07-11
| | | | | | | | | | | | | | | | | | | | Use the BUFFER_FNS functions (set_buffer_foo) to set buffer head state atomically instead of nonatomic __set_bit(). Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Set journal pointer to NULL when journal is releasedJan Kara2008-07-11
| | | | | | | | | | | | | | | | | | | | | | Set sbi->s_journal to NULL after we call journal_destroy(). This will be later needed because after journal_destroy() is called, ext4_clear_inode() can still be called for some inodes (e.g. root inode) and we'll need to detect there that journal doesn't exists anymore. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: mballoc avoid use root reserved blocks for non root allocationMingming Cao2008-07-11
| | | | | | | | | | | | | | | | | | mballoc allocation missed check for blocks reserved for root users. Add ext4_has_free_blocks() check before allocation. Also modified ext4_has_free_blocks() to support multiple block allocation request. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: call blkdev_issue_flush on fsyncEric Sandeen2008-07-11
| | | | | | | | | | | | | | | | | | | | | | | | To ensure that bits are truly on-disk after an fsync, we should call blkdev_issue_flush if barriers are supported. Inspired by an old thread on barriers, by reiserfs & xfs which do the same, and by a patch SuSE ships with their kernel Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: cleanup block allocatorAneesh Kumar K.V2008-07-11
| | | | | | | | | | | | | | | | | | Move the code related to block allocation to a single function and add helper funtions to differient allocation for data and meta data blocks Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Use inode preallocation with -o noextentsAneesh Kumar K.V2008-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When mballoc is enabled, block allocation for old block-based files are allocated using mballoc allocator instead of old block-based allocator. The old ext3 block reservation is turned off when mballoc is turned on. However, the in-core preallocation is not enabled for block-based/ non-extent based file block allocation. This result in performance regression, as now we don't have "reservation" ore in-core preallocation to prevent interleaved fragmentation in multiple writes workload. This patch fix this by enable per inode in-core preallocation for non extent files when mballoc is used. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: fix ext4_init_block_bitmap() for metablock block groupAkinobu Mita2008-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When meta_bg feature is enabled and s_first_meta_bg != 0, ext4_init_block_bitmap() miscalculates the number of block used by the group descriptor table (0 or 1 for metablock block group) This patch fixes this by using ext4_bg_num_gdb() Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Stephen Tweedie <sct@redhat.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Andreas Dilger <adilger@sun.com>
| * ext4: Fix sparse warningAneesh Kumar K.V2008-07-11
| | | | | | | | | | Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: cleanup never-used magic numbers from htree codeLi Zefan2008-07-11
| | | | | | | | | | | | | | | | | | | | | | dx_root_limit() will had some dead code which forced it to always return 20, and dx_node_limit to always return 22 for debugging purposes. Remove it. Acked-by: Andreas Dilger <adilger@sun.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Fix ext4_ext_journal_restart() to reflect errors up to the callerShen Feng2008-07-11
| | | | | | | | | | | | | | | | | | | | | | Fix ext4_ext_journal_restart() so it returns any errors reported by ext4_journal_extend() and ext4_journal_restart(). Signed-off-by: Shen Feng <shen@cn.fujitsu.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Make ext4_ext_find_extent fills ext_path completelyShen Feng2008-07-11
| | | | | | | | | | | | | | | | | | When pos=0 or depth, the fields of ext4_ext_path is are not completely filled. This patch also removes some unnecessary code. Signed-off-by: Shen Feng <shen@cn.fujitsu.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: return error when calling ext4_ext_split failedShen Feng2008-07-11
| | | | | | | | | | | | | | | | | | | | ext4_ext_create_new_leaf must return error when its calling to ext4_ext_split failed. Signed-off-by: Shen Feng <shen@cn.fujitsu.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Update i_disksize properly when allocating from fallocate area.Aneesh Kumar K.V2008-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When allocating unitialized space at the end of file which had been preallocated with the FALLOC_FL_KEEP_SIZE option, the file size is not updated at that time. But the later we are not updating the file size when writing to that preallocated space. These changes are for code correctness. This patch allows us to update the i_disksize at the write_end() callback of filesystem properly. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: remove quota allocation when ext4_mb_new_blocks failsShen Feng2008-07-11
| | | | | | | | | | | | | | | | | | | | Quota allocation is not removed when ext4_mb_new_blocks calls kmem_cache_alloc failed. Also make sure the allocation context is freed on the error path. Signed-off-by: Shen Feng <shen@cn.fujitsu.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: remove redundant code in ext4_fill_super()Li Zefan2008-07-11
| | | | | | | | | | | | | | | | The previous sb_min_blocksize() has already set the block size. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * jbd2: fix race between jbd2_journal_try_to_free_buffers() and jbd2 commit ↵Mingming Cao2008-07-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | transaction journal_try_to_free_buffers() could race with jbd commit transaction when the later is holding the buffer reference while waiting for the data buffer to flush to disk. If the caller of journal_try_to_free_buffers() request tries hard to release the buffers, it will treat the failure as error and return back to the caller. We have seen the directo IO failed due to this race. Some of the caller of releasepage() also expecting the buffer to be dropped when passed with GFP_KERNEL mask to the releasepage()->journal_try_to_free_buffers(). With this patch, if the caller is passing the GFP_KERNEL to indicating this call could wait, in case of try_to_free_buffers() failed, let's waiting for journal_commit_transaction() to finish commit the current committing transaction , then try to free those buffers again with journal locked. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Reviewed-by: Badari Pulavarty <pbadari@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: New inode allocation for FLEX_BG meta-data groups.Jose R. Santos2008-07-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch mostly controls the way inode are allocated in order to make ialloc aware of flex_bg block group grouping. It achieves this by bypassing the Orlov allocator when block group meta-data are packed toghether through mke2fs. Since the impact on the block allocator is minimal, this patch should have little or no effect on other block allocation algorithms. By controlling the inode allocation, it can basically control where the initial search for new block begins and thus indirectly manipulate the block allocator. This allocator favors data and meta-data locality so the disk will gradually be filled from block group zero upward. This helps improve performance by reducing seek time. Since the group of inode tables within one flex_bg are treated as one giant inode table, uninitialized block groups would not need to partially initialize as many inode table as with Orlov which would help fsck time as the filesystem usage goes up. Signed-off-by: Jose R. Santos <jrs@us.ibm.com> Signed-off-by: Valerie Clement <valerie.clement@bull.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * jbd2: Add commit time into the commit blockTheodore Ts'o2008-07-11
| | | | | | | | | | | | | | | | Carlo Wood has demonstrated that it's possible to recover deleted files from the journal. Something that will make this easier is if we can put the time of the commit into commit block. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: replace __FUNCTION__ occurrencesStoyan Gaydarov2008-07-13
| | | | | | | | | | | | | | | | | | | | | | __FUNCTION__ is gcc-specific, use __func__ instead Signed-off-by: Stoyan Gaydarov <stoyboyker@gmail.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Mingming Cao <cmm@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: fix error processing in mb_free_blocksShen Feng2008-07-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The error processing of the return value of mb_free_blocks is meanless because it only returns 0. This fix includes - make mb_free_blocks return void - remove the error processing part in callers - unlock group before calling ext4_error in mb_free_blocks Signed-off-by: Shen Feng <shen@cn.fujitsu.com> Cc: Mingming Cao <cmm@us.ibm.com> Cc: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: error proc entry creation when the fs/ext4 is not correctly createdShen Feng2008-07-13
| | | | | | | | | | | | | | | | | | | | When the directory fs/ext4 is not correctly created under proc, the entry under this directory should not be created. Signed-off-by: Shen Feng <shen@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: fix build failure if DX_DEBUG is enabledLi Zefan2008-07-11
| | | | | | | | | | | | | | | | | | ext4_next_entry() is used by the debugging function dx_show_leaf(), so it must be defined before that function. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Remove unused variable from ext4_show_optionsTheodore Ts'o2008-07-11
| | | | | | | | Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Rename read_block_bitmap() to ext4_read_block_bitmap()Theodore Ts'o2008-07-11
| | | | | | | | | | | | | | Since this a non-static function, make it be ext4 specific to avoid conflicts with potentially other filesystems. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: remove double definitions of xattr macrosShen Feng2008-07-11
| | | | | | | | | | | | | | | | | | remove the definitions of macros XATTR_TRUSTED_PREFIX and XATTR_USER_PREFIX since they are defined in linux/xattr.h Signed-off-by: Shen Feng <shen@cn.fujitsu.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>