aboutsummaryrefslogtreecommitdiffstats
path: root/fs/ext4/super.c
Commit message (Collapse)AuthorAge
* Merge tag 'ext4_for_linus' of ↵Linus Torvalds2014-12-12
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Lots of bugs fixes, including Zheng and Jan's extent status shrinker fixes, which should improve CPU utilization and potential soft lockups under heavy memory pressure, and Eric Whitney's bigalloc fixes" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (26 commits) ext4: ext4_da_convert_inline_data_to_extent drop locked page after error ext4: fix suboptimal seek_{data,hole} extents traversial ext4: ext4_inline_data_fiemap should respect callers argument ext4: prevent fsreentrance deadlock for inline_data ext4: forbid journal_async_commit in data=ordered mode jbd2: remove unnecessary NULL check before iput() ext4: Remove an unnecessary check for NULL before iput() ext4: remove unneeded code in ext4_unlink ext4: don't count external journal blocks as overhead ext4: remove never taken branch from ext4_ext_shift_path_extents() ext4: create nojournal_checksum mount option ext4: update comments regarding ext4_delete_inode() ext4: cleanup GFP flags inside resize path ext4: introduce aging to extent status tree ext4: cleanup flag definitions for extent status tree ext4: limit number of scanned extents in status tree shrinker ext4: move handling of list of shrinkable inodes into extent status code ext4: change LRU to round-robin in extent status tree shrinker ext4: cache extent hole in extent status tree for ext4_da_map_blocks() ext4: fix block reservation for bigalloc filesystems ...
| * ext4: forbid journal_async_commit in data=ordered modeJan Kara2014-11-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Option journal_async_commit breaks gurantees of data=ordered mode as it sends only a single cache flush after writing a transaction commit block. Thus even though the transaction including the commit block is fully stored on persistent storage, file data may still linger in drives caches and will be lost on power failure. Since all checksums match on journal recovery, we replay the transaction thus possibly exposing stale user data. To fix this data exposure issue, remove the possibility to use journal_async_commit in data=ordered mode. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: don't count external journal blocks as overheadEric Sandeen2014-11-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This was fixed for ext3 with: e6d8fb3 ext3: Count internal journal as bsddf overhead in ext3_statfs but was never fixed for ext4. With a large external journal and no used disk blocks, df comes out negative without this, as journal blocks are added to the overhead & subtracted from used blocks unconditionally. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: create nojournal_checksum mount optionDarrick J. Wong2014-11-25
| | | | | | | | | | | | | | | | | | | | Create a mount option to disable journal checksumming (because the metadata_csum feature turns it on by default now), and fix remount not to allow changing the journal checksumming option, since changing the mount options has no effect on the journal. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: cleanup GFP flags inside resize pathDmitry Monakhov2014-11-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We must use GFP_NOFS instead GFP_KERNEL inside ext4_mb_add_groupinfo and ext4_calculate_overhead() because they are called from inside a journal transaction. Call trace: ioctl ->ext4_group_add ->journal_start ->ext4_setup_new_descs ->ext4_mb_add_groupinfo -> GFP_KERNEL ->ext4_flex_group_add ->ext4_update_super ->ext4_calculate_overhead -> GFP_KERNEL ->journal_stop Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: limit number of scanned extents in status tree shrinkerJan Kara2014-11-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we scan extent status trees of inodes until we reclaim nr_to_scan extents. This can however require a lot of scanning when there are lots of delayed extents (as those cannot be reclaimed). Change shrinker to work as shrinkers are supposed to and *scan* only nr_to_scan extents regardless of how many extents did we actually reclaim. We however need to be careful and avoid scanning each status tree from the beginning - that could lead to a situation where we would not be able to reclaim anything at all when first nr_to_scan extents in the tree are always unreclaimable. We remember with each inode offset where we stopped scanning and continue from there when we next come across the inode. Note that we also need to update places calling __es_shrink() manually to pass reasonable nr_to_scan to have a chance of reclaiming anything and not just 1. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: move handling of list of shrinkable inodes into extent status codeJan Kara2014-11-25
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently callers adding extents to extent status tree were responsible for adding the inode to the list of inodes with freeable extents. This is error prone and puts list handling in unnecessarily many places. Just add inode to the list automatically when the first non-delay extent is added to the tree and remove inode from the list when the last non-delay extent is removed. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: change LRU to round-robin in extent status tree shrinkerZheng Liu2014-11-25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In this commit we discard the lru algorithm for inodes with extent status tree because it takes significant effort to maintain a lru list in extent status tree shrinker and the shrinker can take a long time to scan this lru list in order to reclaim some objects. We replace the lru ordering with a simple round-robin. After that we never need to keep a lru list. That means that the list needn't be sorted if the shrinker can not reclaim any objects in the first round. Cc: Andreas Dilger <adilger.kernel@dilger.ca> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: kill ext4_kvfree()Al Viro2014-11-20
| | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* | ext4: Convert to private i_dquot fieldJan Kara2014-11-10
|/ | | | | | | CC: linux-ext4@vger.kernel.org Acked-by: "Theodore Ts'o" <tytso@mit.edu> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
* ext4: remove extent status procfs files if journal load failsDarrick J. Wong2014-10-30
| | | | | | | | | If we can't load the journal, remove the procfs files for the extent status information file to avoid leaking resources. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
* ext4: disallow changing journal_csum option during remountDarrick J. Wong2014-10-30
| | | | | | | | | | ext4 does not permit changing the metadata or journal checksum feature flag while mounted. Until we decide to support that, don't allow a remount to change the journal_csum flag (right now we silently fail to change anything). Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: enable journal checksum when metadata checksum feature enabledDarrick J. Wong2014-10-30
| | | | | | | | | If metadata checksumming is turned on for the FS, we need to tell the journal to use checksumming too. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
* Merge tag 'ext4_for_linus' of ↵Linus Torvalds2014-10-20
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "A large number of cleanups and bug fixes, with some (minor) journal optimizations" [ This got sent to me before -rc1, but was stuck in my spam folder. - Linus ] * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (67 commits) ext4: check s_chksum_driver when looking for bg csum presence ext4: move error report out of atomic context in ext4_init_block_bitmap() ext4: Replace open coded mdata csum feature to helper function ext4: delete useless comments about ext4_move_extents ext4: fix reservation overflow in ext4_da_write_begin ext4: add ext4_iget_normal() which is to be used for dir tree lookups ext4: don't orphan or truncate the boot loader inode ext4: grab missed write_count for EXT4_IOC_SWAP_BOOT ext4: optimize block allocation on grow indepth ext4: get rid of code duplication ext4: fix over-defensive complaint after journal abort ext4: fix return value of ext4_do_update_inode ext4: fix mmap data corruption when blocksize < pagesize vfs: fix data corruption when blocksize < pagesize for mmaped data ext4: fold ext4_nojournal_sops into ext4_sops ext4: support freezing ext2 (nojournal) file systems ext4: fold ext4_sync_fs_nojournal() into ext4_sync_fs() ext4: don't check quota format when there are no quota files jbd2: simplify calling convention around __jbd2_journal_clean_checkpoint_list jbd2: avoid pointless scanning of checkpoint lists ...
| * ext4: check s_chksum_driver when looking for bg csum presenceDarrick J. Wong2014-10-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert the ext4_has_group_desc_csum predicate to look for a checksum driver instead of the metadata_csum flag and change the bg checksum calculation function to look for GDT_CSUM before taking the crc16 path. Without this patch, if we mount with ^uninit_bg,^metadata_csum and later metadata_csum gets turned on by accident, the block group checksum functions will incorrectly assume that checksumming is enabled (metadata_csum) but that crc16 should be used (!s_chksum_driver). This is totally wrong, so fix the predicate and the checksum formula selection. (Granted, if the metadata_csum feature bit gets enabled on a live FS then something underhanded is going on, but we could at least avoid writing garbage into the on-disk fields.) Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Dmitry Monakhov <dmonakhov@openvz.org> Cc: stable@vger.kernel.org
| * ext4: Replace open coded mdata csum feature to helper functionDmitry Monakhov2014-10-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Besides the fact that this replacement improves code readability it also protects from errors caused direct EXT4_S(sb)->s_es manipulation which may result attempt to use uninitialized csum machinery. #Testcase_BEGIN IMG=/dev/ram0 MNT=/mnt mkfs.ext4 $IMG mount $IMG $MNT #Enable feature directly on disk, on mounted fs tune2fs -O metadata_csum $IMG # Provoke metadata update, likey result in OOPS touch $MNT/test umount $MNT #Testcase_END # Replacement script @@ expression E; @@ - EXT4_HAS_RO_COMPAT_FEATURE(E, EXT4_FEATURE_RO_COMPAT_METADATA_CSUM) + ext4_has_metadata_csum(E) https://bugzilla.kernel.org/show_bug.cgi?id=82201 Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
| * ext4: add ext4_iget_normal() which is to be used for dir tree lookupsTheodore Ts'o2014-10-05
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If there is a corrupted file system which has directory entries that point at reserved, metadata inodes, prohibit them from being used by treating them the same way we treat Boot Loader inodes --- that is, mark them to be bad inodes. This prohibits them from being opened, deleted, or modified via chmod, chown, utimes, etc. In particular, this prevents a corrupted file system which has a directory entry which points at the journal inode from being deleted and its blocks released, after which point Much Hilarity Ensues. Reported-by: Sami Liedes <sami.liedes@iki.fi> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
| * ext4: fold ext4_nojournal_sops into ext4_sopsTheodore Ts'o2014-09-18
| | | | | | | | | | | | | | There's no longer any need to have a separate set of super_operations for nojournal mode. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: support freezing ext2 (nojournal) file systemsTheodore Ts'o2014-09-18
| | | | | | | | | | | | | | | | | | Through an oversight, when we added nojournal support to ext4, we didn't add support to allow file system freezing. This is relatively easy to add, so let's do it. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reported-by: Dexuan Cui <decui@microsoft.com>
| * ext4: fold ext4_sync_fs_nojournal() into ext4_sync_fs()Theodore Ts'o2014-09-18
| | | | | | | | | | | | | | This allows us to eliminate duplicate code, and eventually allow us to also fold ext4_sops and ext4_nojournal_sops together. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: don't check quota format when there are no quota filesJan Kara2014-09-18
| | | | | | | | | | | | | | | | | | | | | | The check whether quota format is set even though there are no quota files with journalled quota is pointless and it actually makes it impossible to turn off journalled quotas (as there's no way to unset journalled quota format). Just remove the check. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: explicitly inform user about orphan list cleanupDmitry Monakhov2014-09-16
| | | | | | | | | | | | | | | | Production fs likely compiled/mounted w/o jbd debugging, so orphan list clearing will be silent. Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: validate external journal superblock checksumDarrick J. Wong2014-09-11
| | | | | | | | | | | | | | | | | | | | If the external journal device has metadata_csum enabled, verify that the superblock checksum matches the block before we try to mount. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * jbd2: fix journal checksum feature flag handlingDarrick J. Wong2014-09-11
| | | | | | | | | | | | | | | | | | | | Clear all three journal checksum feature flags before turning on whichever journal checksum options we want. Rearrange the error checking so that newer flags get complained about first. Reported-by: TR Reardon <thomas_reardon@hotmail.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: provide separate operations for sysfs feature filesLukas Czerner2014-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently sysfs feature files uses ext4_attr_ops as the file operations to show/store data. However the feature files is not supposed to contain any data at all, the sole existence of the file means that the module support the feature. Moreover, none of the sysfs feature attributes actually register show/store functions so that would not be a problem. However if a sysfs feature attribute register a show or store function we might be in trouble because the kobject in this case is _not_ embedded in the ext4_sb_info structure as ext4_attr_show/store expect. So just to be safe, provide separate empty sysfs_ops to use in ext4_feat_ktype. This might safe us from potential problems in the future. As a bonus we can "store" something more descriptive than nothing in the files, so let it contain "enabled" to make it clear that the feature is really present in the module. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: add sysfs entry showing whether the fs contains errorsLukas Czerner2014-09-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently there is no easy way to tell that the mounted file system contains errors other than checking for log messages, or reading the information directly from superblock. This patch adds new sysfs entries: errors_count (number of fs errors we encounter) first_error_time (unix timestamp for the first error we see) last_error_time (unix timestamp for the last error we see) If the file system is not marked as containing errors then any of the file will return 0. Otherwise it will contain valid information. More details about the errors should as always be found in the logs. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: don't use MAXQUOTAS valueJan Kara2014-09-11
| | | | | | | | | | | | | | | | | | | | MAXQUOTAS value defines maximum number of quota types VFS supports. This isn't necessarily the number of types ext4 supports. Although ext4 will support project quotas, use ext4 private definition for consistency with other filesystems. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: use non-movable memory for the ext4 superblockGioh Kim2014-09-04
| | | | | | | | | | | | | | | | | | | | | | Since the ext4 superblock is not released until the file system is unmounted, allocate the buffer cache entry for the ext4 superblock out of the non-moveable are to allow page migrations and thus CMA allocations to more easily succeed if the CMA area is limited. Signed-off-by: Gioh Kim <gioh.kim@lge.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
| * ext4: track extent status tree shrinker delay staticticsZheng Liu2014-09-01
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This commit adds some statictics in extent status tree shrinker. The purpose to add these is that we want to collect more details when we encounter a stall caused by extent status tree shrinker. Here we count the following statictics: stats: the number of all objects on all extent status trees the number of reclaimable objects on lru list cache hits/misses the last sorted interval the number of inodes on lru list average: scan time for shrinking some objects the number of shrunk objects maximum: the inode that has max nr. of objects on lru list the maximum scan time for shrinking some objects The output looks like below: $ cat /proc/fs/ext4/sda1/es_shrinker_info stats: 28228 objects 6341 reclaimable objects 5281/631 cache hits/misses 586 ms last sorted interval 250 inodes on lru list average: 153 us scan time 128 shrunk objects maximum: 255 inode (255 objects, 198 reclaimable) 125723 us max scan time If the lru list has never been sorted, the following line will not be printed: 586ms last sorted interval If there is an empty lru list, the following lines also will not be printed: 250 inodes on lru list ... maximum: 255 inode (255 objects, 198 reclaimable) 0 us max scan time Meanwhile in this commit a new trace point is defined to print some details in __ext4_es_shrink(). Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Zheng Liu <wenqing.lz@taobao.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: enable block_validity by defaultDarrick J. Wong2014-09-01
| | | | | | | | | | | | | | | | | | Enable by default the block_validity feature, which checks for collisions between newly allocated blocks and critical system metadata. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: convert ext4_bread() to use the ERR_PTR conventionTheodore Ts'o2014-08-29
| | | | | | | | Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* | Merge branch 'for-linus' of ↵Tejun Heo2014-09-24
|\| | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block into for-3.18 This is to receive 0a30288da1ae ("blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe") which implements __percpu_ref_kill_expedited() to work around SCSI blk-mq stall. The commit reverted and patches to implement proper fix will be added. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de>
| * jbd2: fix descriptor block size handling errors with journal_csumDarrick J. Wong2014-08-28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It turns out that there are some serious problems with the on-disk format of journal checksum v2. The foremost is that the function to calculate descriptor tag size returns sizes that are too big. This causes alignment issues on some architectures and is compounded by the fact that some parts of jbd2 use the structure size (incorrectly) to determine the presence of a 64bit journal instead of checking the feature flags. Therefore, introduce journal checksum v3, which enlarges the descriptor block tag format to allow for full 32-bit checksums of journal blocks, fix the journal tag function to return the correct sizes, and fix the jbd2 recovery code to use feature flags to determine 64bitness. Add a few function helpers so we don't have to open-code quite so many pieces. Switching to a 16-byte block size was found to increase journal size overhead by a maximum of 0.1%, to convert a 32-bit journal with no checksumming to a 32-bit journal with checksum v3 enabled. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Reported-by: TR Reardon <thomas_reardon@hotmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
* | percpu_counter: add @gfp to percpu_counter_init()Tejun Heo2014-09-07
|/ | | | | | | | | | | | | | | | | | | | | | | | Percpu allocator now supports allocation mask. Add @gfp to percpu_counter_init() so that !GFP_KERNEL allocation masks can be used with percpu_counters too. We could have left percpu_counter_init() alone and added percpu_counter_init_gfp(); however, the number of users isn't that high and introducing _gfp variants to all percpu data structures would be quite ugly, so let's just do the conversion. This is the one with the most users. Other percpu data structures are a lot easier to convert. This patch doesn't make any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Jan Kara <jack@suse.cz> Acked-by: "David S. Miller" <davem@davemloft.net> Cc: x86@kernel.org Cc: Jens Axboe <axboe@kernel.dk> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org>
* ext4: rearrange initialization to fix EXT4FS_DEBUGTheodore Ts'o2014-07-15
| | | | | | | | | | | | | | | | | | | | The EXT4FS_DEBUG is a *very* developer specific #ifdef designed for ext4 developers only. (You have to modify fs/ext4/ext4.h to enable it.) Rearrange how we initialize data structures to avoid calling ext4_count_free_clusters() until the multiblock allocator has been initialized. This also allows us to only call ext4_count_free_clusters() once, and simplifies the code somewhat. (Thanks to Chen Gang <gang.chen.5i5j@gmail.com> for pointing out a !CONFIG_SMP compile breakage in the original patch.) Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Lukas Czerner <lczerner@redhat.com>
* ext4: revert commit which was causing fs corruption after journal replaysTheodore Ts'o2014-07-11
| | | | | | | | | | | | | | | | | | | | | | | Commit 007649375f6af2 ("ext4: initialize multi-block allocator before checking block descriptors") causes the block group descriptor's count of the number of free blocks to become inconsistent with the number of free blocks in the allocation bitmap. This is a harmless form of fs corruption, but it causes the kernel to potentially remount the file system read-only, or to panic, depending on the file systems's error behavior. Thanks to Eric Whitney for his tireless work to reproduce and to find the guilty commit. Fixes: 007649375f6af2 ("ext4: initialize multi-block allocator before checking block descriptors" Cc: stable@vger.kernel.org # 3.15 Reported-by: David Jander <david@protonic.nl> Reported-by: Matteo Croce <technoboy85@gmail.com> Tested-by: Eric Whitney <enwlinux@gmail.com> Suggested-by: Eric Whitney <enwlinux@gmail.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* ext4: disable synchronous transaction batching if max_batch_time==0Eric Sandeen2014-07-05
| | | | | | | | | | | | | | The mount manpage says of the max_batch_time option, This optimization can be turned off entirely by setting max_batch_time to 0. But the code doesn't do that. So fix the code to do that. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
* ext4: clarify error count warning messagesTheodore Ts'o2014-07-05
| | | | | | | | | | | Make it clear that values printed are times, and that it is error since last fsck. Also add note about fsck version required. Signed-off-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Cc: stable@vger.kernel.org
* ext4: add missing BUFFER_TRACE before ext4_journal_get_write_accessliang xie2014-05-12
| | | | | | | Make them more consistently Signed-off-by: xieliang <xieliang@xiaomi.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: remove unnecessary double parenthesesLukas Czerner2014-05-12
| | | | | Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: make local functions staticStephen Hemminger2014-05-12
| | | | | | | | I have been running make namespacecheck to look for unneeded globals, and found these in ext4. Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: find the group descriptors on a 1k-block bigalloc,meta_bg filesystemDarrick J. Wong2014-05-12
| | | | | | | | | | | On a filesystem with a 1k block size, the group descriptors live in block 2, not block 1. If the filesystem has bigalloc,meta_bg set, however, the calculation of the group descriptor table location does not take this into account and returns the wrong block number. Fix the calculation to return the correct value for this case. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: add a new spinlock i_raw_lock to protect the ext4's raw inodeTheodore Ts'o2014-04-21
| | | | | | | | To avoid potential data races, use a spinlock which protects the raw (on-disk) inode. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
* ext4: rename uninitialized extents to unwrittenLukas Czerner2014-04-20
| | | | | | | | | | | | | | | | | | | | | | | Currently in ext4 there is quite a mess when it comes to naming unwritten extents. Sometimes we call it uninitialized and sometimes we refer to it as unwritten. The right name for the extent which has been allocated but does not contain any written data is _unwritten_. Other file systems are using this name consistently, even the buffer head state refers to it as unwritten. We need to fix this confusion in ext4. This commit changes every reference to an uninitialized extent (meaning allocated but unwritten) to unwritten extent. This includes comments, function names and variable names. It even covers abbreviation of the word uninitialized (such as uninit) and some misspellings. This commit does not change any of the code paths at all. This has been confirmed by comparing md5sums of the assembly code of each object file after all the function names were stripped from it. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: initialize multi-block allocator before checking block descriptorsAzat Khuzhin2014-04-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With EXT4FS_DEBUG ext4_count_free_clusters() will call ext4_read_block_bitmap() without s_group_info initialized, so we need to initialize multi-block allocator before. And dependencies that must be solved, to allow this: - multi-block allocator needs in group descriptors - need to install s_op before initializing multi-block allocator, because in ext4_mb_init_backend() new inode is created. - initialize number of group desc blocks (s_gdb_count) otherwise number of clusters returned by ext4_free_clusters_after_init() is not correct. (see ext4_bg_num_gdb_nometa()) Here is the stack backtrace: (gdb) bt #0 ext4_get_group_info (group=0, sb=0xffff880079a10000) at ext4.h:2430 #1 ext4_validate_block_bitmap (sb=sb@entry=0xffff880079a10000, desc=desc@entry=0xffff880056510000, block_group=block_group@entry=0, bh=bh@entry=0xffff88007bf2b2d8) at balloc.c:358 #2 0xffffffff81232202 in ext4_wait_block_bitmap (sb=sb@entry=0xffff880079a10000, block_group=block_group@entry=0, bh=bh@entry=0xffff88007bf2b2d8) at balloc.c:476 #3 0xffffffff81232eaf in ext4_read_block_bitmap (sb=sb@entry=0xffff880079a10000, block_group=block_group@entry=0) at balloc.c:489 #4 0xffffffff81232fc0 in ext4_count_free_clusters (sb=sb@entry=0xffff880079a10000) at balloc.c:665 #5 0xffffffff81259ffa in ext4_check_descriptors (first_not_zeroed=<synthetic pointer>, sb=0xffff880079a10000) at super.c:2143 #6 ext4_fill_super (sb=sb@entry=0xffff880079a10000, data=<optimized out>, data@entry=0x0 <irq_stack_union>, silent=silent@entry=0) at super.c:3851 ... Signed-off-by: Azat Khuzhin <a3at.mail@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: optimize Hurd tests when reading/writing inodesTheodore Ts'o2014-03-24
| | | | | | | | | | | Set a in-memory superblock flag to indicate whether the file system is designed to support the Hurd. Also, add a sanity check to make sure the 64-bit feature is not set for Hurd file systems, since i_file_acl_high conflicts with a Hurd-specific field. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* ext4: each filesystem creates and uses its own mb_cacheT Makphaibulchoke2014-03-18
| | | | | | | | | | | | | | | This patch adds new interfaces to create and destory cache, ext4_xattr_create_cache() and ext4_xattr_destroy_cache(), and remove the cache creation and destory calls from ex4_init_xattr() and ext4_exitxattr() in fs/ext4/xattr.c. fs/ext4/super.c has been changed so that when a filesystem is mounted a cache is allocated and attched to its ext4_sb_info structure. fs/mbcache.c has been changed so that only one slab allocator is allocated and used by all mbcache structures. Signed-off-by: T. Makphaibulchoke <tmac@hp.com>
* ext4: only call sync_filesystm() when remounting read-onlyTheodore Ts'o2014-03-13
| | | | | | This is the only time it is required for ext4. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
* fs: push sync_filesystem() down to the file system's remount_fs()Theodore Ts'o2014-03-13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, the no-op "mount -o mount /dev/xxx" operation when the file system is already mounted read-write causes an implied, unconditional syncfs(). This seems pretty stupid, and it's certainly documented or guaraunteed to do this, nor is it particularly useful, except in the case where the file system was mounted rw and is getting remounted read-only. However, it's possible that there might be some file systems that are actually depending on this behavior. In most file systems, it's probably fine to only call sync_filesystem() when transitioning from read-write to read-only, and there are some file systems where this is not needed at all (for example, for a pseudo-filesystem or something like romfs). Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: linux-fsdevel@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: Artem Bityutskiy <dedekind1@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Evgeniy Dushistov <dushistov@mail.ru> Cc: Jan Kara <jack@suse.cz> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Anders Larsen <al@alarsen.net> Cc: Phillip Lougher <phillip@squashfs.org.uk> Cc: Kees Cook <keescook@chromium.org> Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: Petr Vandrovec <petr@vandrovec.name> Cc: xfs@oss.sgi.com Cc: linux-btrfs@vger.kernel.org Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Cc: codalist@coda.cs.cmu.edu Cc: linux-ext4@vger.kernel.org Cc: linux-f2fs-devel@lists.sourceforge.net Cc: fuse-devel@lists.sourceforge.net Cc: cluster-devel@redhat.com Cc: linux-mtd@lists.infradead.org Cc: jfs-discussion@lists.sourceforge.net Cc: linux-nfs@vger.kernel.org Cc: linux-nilfs@vger.kernel.org Cc: linux-ntfs-dev@lists.sourceforge.net Cc: ocfs2-devel@oss.oracle.com Cc: reiserfs-devel@vger.kernel.org
* ext4: Add __init marking to init_inodecacheFabian Frederick2014-02-17
| | | | | | | init_inodecache is only called by __init init_ext4_fs. Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>