aboutsummaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAge
* ext4: tighten restrictions on inode flagsDuane Griffin2009-02-15
| | | | | | | | | | | | | | | | | | At the moment there are few restrictions on which flags may be set on which inodes. Specifically DIRSYNC may only be set on directories and IMMUTABLE and APPEND may not be set on links. Tighten that to disallow TOPDIR being set on non-directories and only NODUMP and NOATIME to be set on non-regular file, non-directories. Introduces a flags masking function which masks flags based on mode and use it during inode creation and when flags are set via the ioctl to facilitate future consistency. Signed-off-by: Duane Griffin <duaneg@dghda.com> Acked-by: Andreas Dilger <adilger@sun.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: don't inherit inappropriate inode flags from parentDuane Griffin2009-02-15
| | | | | | | | | | | | | | | | | At present INDEX and EXTENTS are the only flags that new ext4 inodes do NOT inherit from their parent. In addition prevent the flags DIRTY, ECOMPR, IMAGIC, TOPDIR, HUGE_FILE and EXT_MIGRATE from being inherited. List inheritable flags explicitly to prevent future flags from accidentally being inherited. This fixes the TOPDIR flag inheritance bug reported at http://bugzilla.kernel.org/show_bug.cgi?id=9866. Signed-off-by: Duane Griffin <duaneg@dghda.com> Acked-by: Andreas Dilger <adilger@sun.com> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: allocate ->s_blockgroup_lock separatelyPekka Enberg2009-02-15
| | | | | | | | | | | | | | | | | | As spotted by kmemtrace, struct ext4_sb_info is 17664 bytes on 64-bit which makes it a very bad fit for SLAB allocators. The culprit of the wasted memory is ->s_blockgroup_lock which can be as big as 16 KB when NR_CPUS >= 32. To fix that, allocate ->s_blockgroup_lock, which fits nicely in a order 2 page in the worst case, separately. This shinks down struct ext4_sb_info enough to fit a 2 KB slab cache so now we allocate 16 KB + 2 KB instead of 32 KB saving 14 KB of memory. Acked-by: Andreas Dilger <adilger@sun.com> Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Cc: <linux-ext4@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: New rec_len encoding for very large blocksizesWei Yongjun2009-02-14
| | | | | | | | | | | | | | | | | The rec_len field in the directory entry is 16 bits, so to encode blocksizes larger than 64k becomes problematic. This patch allows us to supprot block sizes up to 256k, by using the low 2 bits to extend the range of rec_len to 2**18-1 (since valid rec_len sizes must be a multiple of 4). We use the convention that a rec_len of 0 or 65535 means the filesystem block size, for compatibility with older kernels. It's unlikely we'll see VM pages of up to 256k, but at some point we might find that the Linux VM has been enhanced to support filesystem block sizes > than the VM page size, at which point it might be useful for some applications to allow very large filesystem block sizes. Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: Use unsigned int for blocksize in dx_make_map() and dx_pack_dirents()Theodore Ts'o2009-02-14
| | | | | Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: remove call to ext4_group_desc() in ext4_group_used_meta_blocks()Theodore Ts'o2009-02-06
| | | | | | | | | | | | | | | The static function ext4_group_used_meta_blocks() only has one caller, who already has access to the block group's group descriptor. So it's better to have ext4_init_block_bitmap() pass the group descriptor to ext4_group_used_meta_blocks(), so it doesn't need to call ext4_group_desc(). Previously this function did not check if ext4_group_desc() returned NULL due to an error, potentially causing a kernel OOPS report. This avoids the issue entirely. Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: Remove stale block allocator references from ext4.hMike Snitzer2009-02-06
| | | | | | | | | Remove some leftovers from when the old block allocator was removed (c2ea3fde). ext4_sb_info is now a bit lighter. Also remove a dangling read_block_bitmap() prototype. Signed-off-by: Mike Snitzer <snitzer@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* fix setuid sometimes wouldn'tHugh Dickins2009-03-28
| | | | | | | | | | | | | | | | | | | | | | | | check_unsafe_exec() also notes whether the fs_struct is being shared by more threads than will get killed by the exec, and if so sets LSM_UNSAFE_SHARE to make bprm_set_creds() careful about euid. But /proc/<pid>/cwd and /proc/<pid>/root lookups make transient use of get_fs_struct(), which also raises that sharing count. This might occasionally cause a setuid program not to change euid, in the same way as happened with files->count (check_unsafe_exec also looks at sighand->count, but /proc doesn't raise that one). We'd prefer exec not to unshare fs_struct: so fix this in procfs, replacing get_fs_struct() by get_fs_path(), which does path_get while still holding task_lock, instead of raising fs->count. Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: stable@kernel.org ___ fs/proc/base.c | 50 +++++++++++++++-------------------------------- 1 file changed, 16 insertions(+), 34 deletions(-) Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fix setuid sometimes doesn'tHugh Dickins2009-03-28
| | | | | | | | | | | | | | | | | | | | | | | | | | Joe Malicki reports that setuid sometimes doesn't: very rarely, a setuid root program does not get root euid; and, by the way, they have a health check running lsof every few minutes. Right, check_unsafe_exec() notes whether the files_struct is being shared by more threads than will get killed by the exec, and if so sets LSM_UNSAFE_SHARE to make bprm_set_creds() careful about euid. But /proc/<pid>/fd and /proc/<pid>/fdinfo lookups make transient use of get_files_struct(), which also raises that sharing count. There's a rather simple fix for this: exec's check on files->count has been redundant ever since 2.6.1 made it unshare_files() (except while compat_do_execve() omitted to do so) - just remove that check. [Note to -stable: this patch will not apply before 2.6.29: earlier releases should just remove the files->count line from unsafe_exec().] Reported-by: Joe Malicki <jmalicki@metacarta.com> Narrowed-down-by: Michael Itz <mitz@metacarta.com> Tested-by: Joe Malicki <jmalicki@metacarta.com> Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* compat_do_execve should unshare_filesHugh Dickins2009-03-28
| | | | | | | | | | | | | | | | | | 2.6.26's commit fd8328be874f4190a811c58cd4778ec2c74d2c05 "sanitize handling of shared descriptor tables in failing execve()" moved the unshare_files() from flush_old_exec() and several binfmts to the head of do_execve(); but forgot to make the same change to compat_do_execve(), leaving a CLONE_FILES files_struct shared across exec from a 32-bit process on a 64-bit kernel. It's arguable whether the files_struct really ought to be unshared across exec; but 2.6.1 made that so to stop the loading binary's fd leaking into other threads, and a 32-bit process on a 64-bit kernel ought to behave in the same way as 32 on 32 and 64 on 64. Signed-off-by: Hugh Dickins <hugh@veritas.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for-linus' of ↵Linus Torvalds2009-03-27
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (37 commits) fs: avoid I_NEW inodes Merge code for single and multiple-instance mounts Remove get_init_pts_sb() Move common mknod_ptmx() calls into caller Parse mount options just once and copy them to super block Unroll essentials of do_remount_sb() into devpts vfs: simple_set_mnt() should return void fs: move bdev code out of buffer.c constify dentry_operations: rest constify dentry_operations: configfs constify dentry_operations: sysfs constify dentry_operations: JFS constify dentry_operations: OCFS2 constify dentry_operations: GFS2 constify dentry_operations: FAT constify dentry_operations: FUSE constify dentry_operations: procfs constify dentry_operations: ecryptfs constify dentry_operations: CIFS constify dentry_operations: AFS ...
| * fs: avoid I_NEW inodesNick Piggin2009-03-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To be on the safe side, it should be less fragile to exclude I_NEW inodes from inode list scans by default (unless there is an important reason to have them). Normally they will get excluded (eg. by zero refcount or writecount etc), however it is a bit fragile for list walkers to know exactly what parts of the inode state is set up and valid to test when in I_NEW. So along these lines, move I_NEW checks upward as well (sometimes taking I_FREEING etc checks with them too -- this shouldn't be a problem should it?) Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * Merge code for single and multiple-instance mountsSukadev Bhattiprolu2009-03-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | new_pts_mount() (including the get_sb_nodev()), shares a lot of code with init_pts_mount(). The only difference between them is the 'test-super' function passed into sget(). Move all common code into devpts_get_sb() and remove the new_pts_mount() and init_pts_mount() functions, Changelog[v3]: [Serge Hallyn]: Remove unnecessary printk()s Changelog[v2]: (Christoph Hellwig): Merge code in 'do_pts_mount()' into devpts_get_sb() Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Tested-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * Remove get_init_pts_sb()Sukadev Bhattiprolu2009-03-27
| | | | | | | | | | | | | | | | | | | | With mknod_ptmx() moved to devpts_get_sb(), init_pts_mount() becomes a wrapper around get_init_pts_sb(). Remove get_init_pts_sb() and fold code into init_pts_mount(). Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * Move common mknod_ptmx() calls into callerSukadev Bhattiprolu2009-03-27
| | | | | | | | | | | | | | | | | | | | We create 'ptmx' node in both single-instance and multiple-instance mounts. So devpts_get_sb() can call mknod_ptmx() once rather than have both modes calling mknod_ptmx() separately. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * Parse mount options just once and copy them to super blockSukadev Bhattiprolu2009-03-27
| | | | | | | | | | | | | | | | | | Since all the mount option parsing is done in devpts, we could do it just once and pass it around in devpts functions and eventually store it in the super block. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * Unroll essentials of do_remount_sb() into devptsSukadev Bhattiprolu2009-03-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On remount, devpts fs only needs to parse the mount options. Users cannot directly create/dirty files in /dev/pts so the MS_RDONLY flag and shrinking the dcache does not really apply to devpts. So effectively on remount, devpts only parses the mount options and updates these options in its super block. As such, we could replace do_remount_sb() call with a direct parse_mount_options(). Doing so enables subsequent patches to avoid parsing the mount options twice and simplify the code. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * vfs: simple_set_mnt() should return voidSukadev Bhattiprolu2009-03-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | simple_set_mnt() is defined as returning 'int' but always returns 0. Callers assume simple_set_mnt() never fails and don't properly cleanup if it were to _ever_ fail. For instance, get_sb_single() and get_sb_nodev() should: up_write(sb->s_unmount); deactivate_super(sb); if simple_set_mnt() fails. Since simple_set_mnt() never fails, would be cleaner if it did not return anything. [akpm@linux-foundation.org: fix build] Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: Serge Hallyn <serue@us.ibm.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * fs: move bdev code out of buffer.cNick Piggin2009-03-27
| | | | | | | | | | | | | | Move some block device related code out from buffer.c and put it in block_dev.c. I'm trying to move non-buffer_head code out of buffer.c Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * constify dentry_operations: restAl Viro2009-03-27
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * constify dentry_operations: configfsAl Viro2009-03-27
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * constify dentry_operations: sysfsAl Viro2009-03-27
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * constify dentry_operations: JFSAl Viro2009-03-27
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * constify dentry_operations: OCFS2Al Viro2009-03-27
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * constify dentry_operations: GFS2Al Viro2009-03-27
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * constify dentry_operations: FATAl Viro2009-03-27
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * constify dentry_operations: FUSEAl Viro2009-03-27
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * constify dentry_operations: procfsAl Viro2009-03-27
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * constify dentry_operations: ecryptfsAl Viro2009-03-27
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * constify dentry_operations: CIFSAl Viro2009-03-27
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * constify dentry_operations: AFSAl Viro2009-03-27
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * constify dentry_operations: autofs, autofs4Al Viro2009-03-27
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * constify dentry_operations: 9pAl Viro2009-03-27
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * constify dentry_operations: misc filesystemsAl Viro2009-03-27
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * constify dentry_operations: NFSAl Viro2009-03-27
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * devpts: Must release s_umount on errorSukadev Bhattiprolu2009-03-27
| | | | | | | | | | | | | | | | | | We should drop the ->s_umount mutex if an error occurs after the sget()/grab_super() call. This was introduced when adding support for multiple instances of devpts and noticed during a code review/reorg. Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * do_pipe cleanup: drop its last user in arch/alpha/Cheng Renquan2009-03-27
| | | | | | | | | | | | | | | | | | The last user of do_pipe is in arch/alpha/, after replacing it with do_pipe_flags, the do_pipe can be totally dropped. Signed-off-by: Cheng Renquan <crquan@gmail.com> Acked-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * ufs: copy symlink data into the correct union memberDuane Griffin2009-03-27
| | | | | | | | | | | | | | | | | | | | Copy symlink data into the union member it is accessed through. Although this shouldn't make a difference to behaviour it makes the code easier to follow and grep through. It may also prevent problems if the struct/union definitions change in the future. Signed-off-by: Duane Griffin <duaneg@dghda.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * ufs: ensure fast symlinks are NUL-terminatedDuane Griffin2009-03-27
| | | | | | | | | | | | | | | | Ensure fast symlink targets are NUL-terminated, even if corrupted on-disk. Signed-off-by: Duane Griffin <duaneg@dghda.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * ufs: don't truncate longer ufs2 fast symlinksDuane Griffin2009-03-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ufs2 fast symlinks can be twice as long as ufs ones, however the code was using the ufs size in various places. Fix that so ufs2 symlinks over 60 characters aren't truncated. Note that we copy the entire area instead of using the maxsymlinklen field from the superblock. This way we will be more robust against corruption (of the superblock). While we are at it, use memcpy instead of open-coding it with for loops. Signed-off-by: Duane Griffin <duaneg@dghda.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * ufs: validate maximum fast symlink size from superblockDuane Griffin2009-03-27
| | | | | | | | | | | | | | | | | | The maximum fast symlink size is set in the superblock of certain types of UFS filesystem. Before using it we need to check that it isn't longer than the available space we have in the inode. Signed-off-by: Duane Griffin <duaneg@dghda.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * cleanup may_openChristoph Hellwig2009-03-27
| | | | | | | | | | | | | | | | | | Add a switch for the various i_mode fmt cases, and remove the comment about writeability of devices nodes - that part is handled in inode_permission and comment on (briefly) there. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * cleanup d_add_ciChristoph Hellwig2009-03-27
| | | | | | | | | | | | | | | | | | Make sure that comments describe what's going on and not how, and always use __d_instantiate instead of two separate branches, one with d_instantiate and one with __d_instantiate. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * generic compat_sys_ustatChristoph Hellwig2009-03-27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to a different size of ino_t ustat needs a compat handler, but currently only x86 and mips provide one. Add a generic compat_sys_ustat and switch all architectures over to it. Instead of doing various user copy hacks compat_sys_ustat just reimplements sys_ustat as it's trivial. This was suggested by Arnd Bergmann. Found by Eric Sandeen when running xfstests/017 on ppc64, which causes stack smashing warnings on RHEL/Fedora due to the too large amount of data writen by the syscall. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * affs: fix missing unlocks in affs_remove_linkChristoph Hellwig2009-03-27
| | | | | | | | | | | | | | | | In two error cases affs_remove_link doesn't call affs_unlock_dir to release the i_hash_lock semaphore. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge branch 'for_linus' of ↵Linus Torvalds2009-03-27
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6: (27 commits) ext2: Zero our b_size in ext2_quota_read() trivial: fix typos/grammar errors in fs/Kconfig quota: Coding style fixes quota: Remove superfluous inlines quota: Remove uppercase aliases for quota functions. nfsd: Use lowercase names of quota functions jfs: Use lowercase names of quota functions udf: Use lowercase names of quota functions ufs: Use lowercase names of quota functions reiserfs: Use lowercase names of quota functions ext4: Use lowercase names of quota functions ext3: Use lowercase names of quota functions ext2: Use lowercase names of quota functions ramfs: Remove quota call vfs: Use lowercase names of quota functions quota: Remove dqbuf_t and other cleanups quota: Remove NODQUOT macro quota: Make global quota locks cacheline aligned quota: Move quota files into separate directory ext4: quota reservation for delayed allocation ...
| * | ext2: Zero our b_size in ext2_quota_read()Manish Katiyar2009-03-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ext2_quota_read() doesn't initialize tmp_bh.b_size before calling ext2_get_block() where we access it. Since it is a local variable it might contain some garbage. Make sure it is filled with reasonable value before passing. Signed-off-by: Manish Katiyar <mkatiyar@gmail.com> Signed-off-by: Jan Kara <jack@suse.cz>
| * | trivial: fix typos/grammar errors in fs/KconfigMatt LaPlante2009-03-25
| | | | | | | | | | | | | | | | | | | | | Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Jan Kara <jack@suse.cz>
| * | quota: Coding style fixesJan Kara2009-03-25
| | | | | | | | | | | | | | | | | | | | | Wrap long lines, remove assignments from conditions, rewrite two overcomplicated for loops. Signed-off-by: Jan Kara <jack@suse.cz>
| * | quota: Remove superfluous inlinesJan Kara2009-03-25
| | | | | | | | | | | | | | | | | | | | | Remove inlines of large functions to decrease code size (saved 1543 bytes). Signed-off-by: Jan Kara <jack@suse.cz>