aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
* md: fix input truncation in safe_delay_store()Dan Williams2008-10-16
| | | | | | | | | | safe_delay_store() currently truncates the last character of input since it tells strlcpy that the buffer can only hold 'len' characters, off by one. sysfs already null terminates the buffer, so just increase the last argument to strlcpy. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* md: check for memory allocation failure in faulty personalitySven Wegener2008-10-15
| | | | | | | | It's a fault injection module, but I don't think we should oops here. Signed-off-by: Sven Wegener <sven.wegener@stealer.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Neil Brown <neilb@suse.de>
* md: build failure due to missing delay.hStephen Rothwell2008-10-15
| | | | | | | | | | | | | | | | | | | | | Today's linux-next build (powerpc ppc64_defconfig) failed like this: drivers/md/raid1.c: In function 'sync_request': drivers/md/raid1.c:1759: error: implicit declaration of function 'msleep_interruptible' make[3]: *** [drivers/md/raid1.o] Error 1 make[3]: *** Waiting for unfinished jobs.... drivers/md/raid10.c: In function 'sync_request': drivers/md/raid10.c:1749: error: implicit declaration of function 'msleep_interruptible' make[3]: *** [drivers/md/raid10.o] Error 1 drivers/md/md.c: In function 'md_do_sync': drivers/md/md.c:5915: error: implicit declaration of function 'msleep' Caused by commit 6caa3b0bbdb474647f6bdd8a958ffc46f78d8d58 ("md: Remove unnecessary #includes, #defines, and function declarations"). I added the following patch. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NeilBrown <neilb@suse.de>
* md: Relax minimum size restrictions on chunk_size.NeilBrown2008-10-12
| | | | | | | | | | | | | | | | | | Currently, the 'chunk_size' of an array must be at-least PAGE_SIZE. This makes moving an array to a machine with a larger PAGE_SIZE, or changing the kernel to use a larger PAGE_SIZE, can stop an array from working. For RAID10 and RAID4/5/6, this is non-trivial to fix as the resync process works on whole pages at a time, and assumes them to be wholly within a stripe. For other raid personalities, this restriction is not needed at all and can be dropped. So remove the test on chunk_size from common can, and add it in just the places where it is needed: raid10 and raid4/5/6. Signed-off-by: NeilBrown <neilb@suse.de>
* md: remove space after function name in declaration and call.NeilBrown2008-10-12
| | | | | | | | | | | | Having function (args) instead of function(args) make is harder to search for calls of particular functions. So remove all those spaces. Signed-off-by: NeilBrown <neilb@suse.de>
* md: Remove unnecessary #includes, #defines, and function declarations.NeilBrown2008-10-12
| | | | | | A lot of cruft has gathered over the years. Time to remove it. Signed-off-by: NeilBrown <neilb@suse.de>
* md: Convert remaining 1k representations in linear.c to sectors.Andre Noll2008-10-12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch renames hash_spacing and preshift to spacing and sector_shift respectively with the following change of semantics: Case 1: (sizeof(sector_t) <= sizeof(u32)). ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In this case, we have sector_shift = preshift = 0 and spacing = 2 * hash_spacing. Hence, the index for the hash table which is computed by the new code in which_dev() as sector / spacing equals the old value which was (sector/2) / hash_spacing. Note also that the value of nb_zone stays the same because both sz and base double. Case 2: (sizeof(sector_t) > sizeof(u32)). ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ (aka the shifting dance case). Here we have sector_shift = preshift + 1 and spacing = 2 * hash_spacing during the computation of nb_zone and curr_sector, but spacing = hash_spacing in which_dev() because in the last hunk of the patch for linear.c we shift down conf->spacing (= 2 * hash_spacing) by one more bit than in the old code. Hence in the computation of nb_zone, sz and base have the same value as before, so nb_zone is not affected. Also curr_sector in the next hunk stays the same. In which_dev() the hash table index is computed as (sector >> sector_shift) / spacing In view of sector_shift = preshift + 1 and spacing = hash_spacing, this equals ((sector/2) >> preshift) / hash_spacing which is the value computed by the old code. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
* md: linear.c: Make two local variables sector-based.Andre Noll2008-10-12
| | | | | | | | This is a preparation for representing also the remaining fields of struct linear_private_data as sectors. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
* md: linear: Represent dev_info->size and dev_info->offset in sectors.Andre Noll2008-10-12
| | | | | | | Rename them to num_sectors and start_sector which is more descriptive. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
* md: linear.c: Remove broken debug code.Andre Noll2008-10-12
| | | | | | | conf->smallest_size is undefined since day one of the git repo.. Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
* md: linear.c: Remove pointless initialization of curr_offset.Andre Noll2008-10-12
| | | | | Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
* md: linear.c: Fix typo in comment.Andre Noll2008-10-12
| | | | | Signed-off-by: Andre Noll <maan@systemlinux.org> Signed-off-by: NeilBrown <neilb@suse.de>
* md: Don't try to set an array to 'read-auto' if it is already in that state.NeilBrown2008-10-12
| | | | | | | | | | 'read-auto' is a variant of 'readonly' which will switch to writable on the first write attempt. Calling do_md_stop to set the array readonly when it is already readonly returns an error. So make sure not to do that. Signed-off-by: NeilBrown <neilb@suse.de>
* md: Allow metadata_version to be updated for externally managed metadata.NeilBrown2008-10-12
| | | | | | | | | | | | | | For externally managed metadata, the 'metadata_version' sysfs attribute is really just a channel for user-space programs to communicate about how the array is being managed. It can be useful for this to be changed while the array is active. Normally changes to metadata_version are not permitted while the array is active. Change that so that if the metadata is externally managed, the metadata_version can be changed to a different flavour of external management. Signed-off-by: NeilBrown <neilb@suse.de>
* md: Fix rdev_size_store with size == 0Chris Webb2008-10-12
| | | | | | | | | | | | | Fix rdev_size_store with size == 0. size == 0 means to use the largest size allowed by the underlying device and is used when modifying an active array. This fixes a regression introduced by commit d7027458d68b2f1752a28016dcf2ffd0a7e8f567 Cc: <stable@kernel.org> Signed-off-by: Chris Webb <chris@arachsys.com> Signed-off-by: NeilBrown <neilb@suse.de>
* Merge branch 'for_linus' of ↵Linus Torvalds2008-10-11
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (43 commits) ext4: Rename ext4dev to ext4 ext4: Avoid double dirtying of super block in ext4_put_super() Update ext4 MAINTAINERS file Hook ext4 to the vfs fiemap interface. generic block based fiemap implementation ocfs2: fiemap support vfs: vfs-level fiemap interface ext4: fix xattr deadlock jbd2: Fix buffer head leak when writing the commit block ext4: Add debugging markers that can be used by systemtap jbd2: abort instead of waiting for nonexistent transaction ext4: fix initialization of UNINIT bitmap blocks ext4: Remove old legacy block allocator ext4: Use readahead when reading an inode from the inode table ext4: Improve the documentation for ext4's /proc tunables ext4: Combine proc file handling into a single set of functions ext4: move /proc setup and teardown out of mballoc.c ext4: Don't use 'struct dentry' for internal lookups ext4/jbd2: Avoid WARN() messages when failing to write to the superblock ext4: use percpu data structures for lg_prealloc_list ...
| * ext4: Rename ext4dev to ext4Theodore Ts'o2008-10-10
| | | | | | | | | | | | | | | | The ext4 filesystem is getting stable enough that it's time to drop the "dev" prefix. Also remove the requirement for the TEST_FILESYS flag. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Avoid double dirtying of super block in ext4_put_super()Andi Kleen2008-10-06
| | | | | | | | | | | | | | | | | | | | While reading code I noticed that ext4_put_super() dirties the superblock bh twice. It is always done in ext4_commit_super() too. Remove the redundant dirty operation. Should be a nop semantically. Signed-off-by: Andi Kleen <ak@linux.intel.com>
| * Update ext4 MAINTAINERS fileTheodore Ts'o2008-10-06
| | | | | | | | | | | | | | | | The ext4 entry was copied from ext3 and was never correct. Update it so that Theodore Ts'o is listed as the maintainer, and point the website to http://ext4.wiki.kernel.org. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * Hook ext4 to the vfs fiemap interface.Eric Sandeen2008-10-07
| | | | | | | | | | | | | | | | | | | | ext4_ext_walk_space() was reinstated to be used for iterating over file extents with a callback; it is used by the ext4 fiemap implementation. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-ext4@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org
| * generic block based fiemap implementationJosef Bacik2008-10-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Any block based fs (this patch includes ext3) just has to declare its own fiemap() function and then call this generic function with its own get_block_t. This works well for block based filesystems that will map multiple contiguous blocks at one time, but will work for filesystems that only map one block at a time, you will just end up with an "extent" for each block. One gotcha is this will not play nicely where there is hole+data after the EOF. This function will assume its hit the end of the data as soon as it hits a hole after the EOF, so if there is any data past that it will not pick that up. AFAIK no block based fs does this anyway, but its in the comments of the function anyway just in case. Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-fsdevel@vger.kernel.org
| * ocfs2: fiemap supportMark Fasheh2008-10-03
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Plug ocfs2 into ->fiemap. Some portions of ocfs2_get_clusters() had to be refactored so that the extent cache can be skipped in favor of going directly to the on-disk records. This makes it easier for us to determine which extent is the last one in the btree. Also, I'm not sure we want to be caching fiemap lookups anyway as they're not directly related to data read/write. Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: ocfs2-devel@oss.oracle.com Cc: linux-fsdevel@vger.kernel.org
| * vfs: vfs-level fiemap interfaceMark Fasheh2008-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Basic vfs-level fiemap infrastructure, which sets up a new ->fiemap inode operation. Userspace can get extent information on a file via fiemap ioctl. As input, the fiemap ioctl takes a struct fiemap which includes an array of struct fiemap_extent (fm_extents). Size of the extent array is passed as fm_extent_count and number of extents returned will be written into fm_mapped_extents. Offset and length fields on the fiemap structure (fm_start, fm_length) describe a logical range which will be searched for extents. All extents returned will at least partially contain this range. The actual extent offsets and ranges returned will be unmodified from their offset and range on-disk. The fiemap ioctl returns '0' on success. On error, -1 is returned and errno is set. If errno is equal to EBADR, then fm_flags will contain those flags which were passed in which the kernel did not understand. On all other errors, the contents of fm_extents is undefined. As fiemap evolved, there have been many authors of the vfs patch. As far as I can tell, the list includes: Kalpak Shah <kalpak.shah@sun.com> Andreas Dilger <adilger@sun.com> Eric Sandeen <sandeen@redhat.com> Mark Fasheh <mfasheh@suse.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Michael Kerrisk <mtk.manpages@googlemail.com> Cc: linux-api@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org
| * ext4: fix xattr deadlockKalpak Shah2008-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | ext4_xattr_set_handle() eventually ends up calling ext4_mark_inode_dirty() which tries to expand the inode by shifting the EAs. This leads to the xattr_sem being downed again and leading to a deadlock. This patch makes sure that if ext4_xattr_set_handle() is in the call-chain, ext4_mark_inode_dirty() will not expand the inode. Signed-off-by: Kalpak Shah <kalpak.shah@sun.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * jbd2: Fix buffer head leak when writing the commit blockTheodore Ts'o2008-10-06
| | | | | | | | | | | | | | | | | | Also make sure the buffer heads are marked clean before submitting bh for writing. The previous code was marking the buffer head dirty, which would have forced an unneeded write (and seek) to the journal for no good reason. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Add debugging markers that can be used by systemtapTheodore Ts'o2008-10-05
| | | | | | | | | | | | | | This debugging markers are designed to debug problems such as the random filesystem latency problems reported by Arjan. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * jbd2: abort instead of waiting for nonexistent transactionDuane Griffin2008-10-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The __jbd2_log_wait_for_space function sits in a loop checkpointing transactions until there is sufficient space free in the journal. However, if there are no transactions to be processed (e.g. because the free space calculation is wrong due to a corrupted filesystem) it will never progress. Check for space being required when no transactions are outstanding and abort the journal instead of endlessly looping. This patch fixes the bug reported by Sami Liedes at: http://bugzilla.kernel.org/show_bug.cgi?id=10976 Signed-off-by: Duane Griffin <duaneg@dghda.com> Cc: Sami Liedes <sliedes@cc.hut.fi> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: fix initialization of UNINIT bitmap blocksFrederic Bohe2008-10-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a bug which caused on-line resizing of filesystems with a 1k blocksize to fail. The root cause of this bug was the fact that if an uninitalized bitmap block gets read in by userspace (which e2fsprogs does try to avoid, but can happen when the blocksize is less than the pagesize and an adjacent blocks is read into memory) ext4_read_block_bitmap() was erroneously depending on the buffer uptodate flag to decide whether it needed to initialize the bitmap block in memory --- i.e., to set the standard set of blocks in use by a block group (superblock, bitmaps, inode table, etc.). Essentially, ext4_read_block_bitmap() assumed it was the only routine that might try to read a block containing a block bitmap, which is simply not true. To fix this, ext4_read_block_bitmap() and ext4_read_inode_bitmap() must always initialize uninitialized bitmap blocks. Once a block or inode is allocated out of that bitmap, it will be marked as initialized in the block group descriptor, so in general this won't result any extra unnecessary work. Signed-off-by: Frederic Bohe <frederic.bohe@bull.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Remove old legacy block allocatorTheodore Ts'o2008-10-10
| | | | | | | | Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Use readahead when reading an inode from the inode tableTheodore Ts'o2008-10-09
| | | | | | | | | | | | | | | | | | | | | | | | With modern hard drives, reading 64k takes roughly the same time as reading a 4k block. So request readahead for adjacent inode table blocks to reduce the time it takes when iterating over directories (especially when doing this in htree sort order) in a cold cache case. With this patch, the time it takes to run "git status" on a kernel tree after flushing the caches via "echo 3 > /proc/sys/vm/drop_caches" is reduced by 21%. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Improve the documentation for ext4's /proc tunablesTheodore Ts'o2008-10-09
| | | | | | | | | | | | Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: Alex Tomas <bzzz@sun.com> Cc: Andreas Dilger <adilger@sun.com>
| * ext4: Combine proc file handling into a single set of functionsTheodore Ts'o2008-09-23
| | | | | | | | | | | | | | | | | | Previously mballoc created a separate set of functions for each proc file. This combines the tunables into a single set of functions which gets used for all of the per-superblock proc files, saving approximately 2k of compiled object code. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: move /proc setup and teardown out of mballoc.cTheodore Ts'o2008-09-23
| | | | | | | | | | | | | | ...and into the core setup/teardown code in fs/ext4/super.c so that other parts of ext4 can define tuning parameters. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Don't use 'struct dentry' for internal lookupsTheodore Ts'o2008-09-22
| | | | | | | | | | | | | | | | | | | | | | | | This is a port of a patch from Linus which fixes a 200+ byte stack usage problem in ext4_get_parent(). It's more efficient to pass down only the actual parts of the dentry that matter: the parent inode and the name, instead of allocating a struct dentry on the stack. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4/jbd2: Avoid WARN() messages when failing to write to the superblockTheodore Ts'o2008-10-06
| | | | | | | | | | | | This fixes some very common warnings reported by kerneloops.org Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: use percpu data structures for lg_prealloc_listEric Sandeen2008-09-13
| | | | | | | | | | | | | | | | | | | | lg_prealloc_list seems to cry out for a per-cpu data structure; on a large smp system I think this should be better. I've lightly tested this change on a 4-cpu system. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Acked-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Renumber EXT4_IOC_MIGRATETheodore Ts'o2008-09-13
| | | | | | | | | | | | | | | | | | | | | | | | Pick an ioctl number for EXT4_IOC_MIGRATE that won't conflict with other ext4 ioctl's. Since there haven't been any major userspace users of this ioctl, we can afford to change this now, to avoid potential problems later. Also, reorder the ioctl numbers in ext4.h to avoid this sort of mistake in the future. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: hook the ext3 migration interface to the EXT4_IOC_SETFLAGS ioctlAneesh Kumar K.V2008-10-08
| | | | | | | | | | | | | | | | | | | | This patch hooks the ext3 to ext4 migrate interface to EXT4_IOC_SETFLAGS ioctl. The userspace interface is via chattr +e. We only allow setting extent flags. Clearing extent flag (migrating from ext4 to ext3) is not supported. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: elevate write count for migrate ioctlAneesh Kumar K.V2008-09-13
| | | | | | | | | | | | | | | | The migrate ioctl writes to the filsystem, so we need to elevate the write count. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: add missing unlock in ext4_check_descriptors() on error pathLi Zefan2008-09-08
| | | | | | | | | | | | | | | | | | If there group descriptors are corrupted we need unlock the block group lock before returning from the function; else we will oops when freeing a spinlock which is still being held. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * jbd2: clean up how the journal device name is printedTheodore Ts'o2008-09-16
| | | | | | | | | | | | | | | | | | | | | | | | | | Calculate the journal device name once and stash it away in the journal_s structure. This avoids needing to call bdevname() everywhere and reduces stack usage by not needing to allocate an on-stack buffer. In addition, we eliminate the '/' that can appear in device names (e.g. "cciss/c0d0p9" --- see kernel bugzilla #11321) that can cause problems when creating proc directory names, and include the inode number to support ocfs2 which creates multiple journals with different inode numbers. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: fix #11321: create /proc/ext4/*/stats more carefullyAlexey Dobriyan2008-09-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ext4 creates per-suberblock directory in /proc/ext4/ . Name used as basis is taken from bdevname, which, surprise, can contain slash. However, proc while allowing to use proc_create("a/b", parent) form of PDE creation, assumes that parent/a was already created. bdevname in question is 'cciss/c0d0p9', directory is not created and all this stuff goes directly into /proc (which is real bug). Warning comes when _second_ partition is mounted. http://bugzilla.kernel.org/show_bug.cgi?id=11321 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * Update flex_bg free blocks and free inodes counters when resizing.Frederic Bohe2008-09-08
| | | | | | | | | | | | | | | | This fixes a bug which prevented the newly created inodes after a resize from being used on filesystems with flex_bg. Signed-off-by: Frederic Bohe <frederic.bohe@bull.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Avoid printk floods in the face of directory corruptionEric Sandeen2008-10-09
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Note: some people thinks this represents a security bug, since it might make the system go away while it is printing a large number of console messages, especially if a serial console is involved. Hence, it has been assigned CVE-2008-3528, but it requires that the attacker either has physical access to your machine to insert a USB disk with a corrupted filesystem image (at which point why not just hit the power button), or is otherwise able to convince the system administrator to mount an arbitrary filesystem image (at which point why not just include a setuid shell or world-writable hard disk device file or some such). Me, I think they're just being silly. --tytso Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-ext4@vger.kernel.org Cc: Eugene Teo <eugeneteo@kernel.sg>
| * ext4: Properly update i_disksize.Aneesh Kumar K.V2008-09-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With delayed allocation we use i_data_sem to update i_disksize. We need to update i_disksize only if the new size specified is greater than the current value and we need to make sure we don't race with other i_disksize update. With delayed allocation we will switch to the write_begin function for non-delayed allocation if we are low on free blocks. This means the write_begin function for non-delayed allocation also needs to use the same locking. We also need to check and update i_disksize even if the new size is less that inode.i_size because of delayed allocation. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: truncate block allocated on a failed ext4_write_beginAneesh Kumar K.V2008-09-13
| | | | | | | | | | | | | | | | | | | | For blocksize < pagesize we need to remove blocks that got allocated in block_write_begin() if we fail with ENOSPC for later blocks. block_write_begin() internally does this if it allocated pages locally. This makes sure we don't have blocks outside inode.i_size during ENOSPC. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Retry block allocation if we have free blocks leftAneesh Kumar K.V2008-09-08
| | | | | | | | | | | | | | | | | | | | | | | | When we truncate files, the meta-data blocks released are not reused untill we commit the truncate transaction. That means delayed get_block request will return ENOSPC even if we have free blocks left. Force a journal commit and retry block allocation if we get ENOSPC with free blocks left. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Don't add the inode to journal handle until after the block is allocatedAneesh Kumar K.V2008-09-08
| | | | | | | | | | | | | | | | | | | | Make sure we don't add the inode to the journal handle until after the block allocation, so that a journal commit will not include the inode in case of block allocation failure. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Fix ext4 nomballoc allocator for ENOSPCAneesh Kumar K.V2008-09-08
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We run into ENOSPC error on nonmballoc ext4, even when there is free blocks on the filesystem. The patch includes two changes: a) Set reservation to NULL if we trying to allocate near group_target_block from the goal group if the free block in the group is less than windows. This should give us a better chance to allocate near group_target_block. This also ensures that if we are not allocating near group_target_block then we don't trun off reservation. This should enable us to allocate with reservation from other groups that have large free blocks count. b) we don't need to check the window size if the block reservation is off. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * ext4: Signed arithmetic fixAneesh Kumar K.V2008-10-08
| | | | | | | | | | | | | | | | | | | | This patch converts some usage of ext4_fsblk_t to s64. This is needed so that some of the sign conversion works as expected in if loops. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>