aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAge
...
| * | arch/tile: move user_exit() to early kernel entry sequenceChris Metcalf2016-01-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This ensures that we always notify context tracking that we have exited from user space no matter how we enter the kernel. It is similar to how arm64 handles context tracking, for example. This allows the removal of all the exception_enter() calls that were added in commit 49e4e15619cd ("tile: support CONTEXT_TRACKING and thus NOHZ_FULL"). Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
| * | tile: fix bug in setting PT_FLAGS_DISABLE_IRQ on kernel entryChris Metcalf2016-01-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This flag value is saved in ptregs and used to decide whether to disable irqs when returning from the kernel. Commit 1168df528fe4 ("tile: don't assume user privilege is zero") performed a bad merge from some KVM-enabled code that had not yet been upstreamed. The only issue with the old code is that we will read the interrupt mask in more conditions than we need to (e.g., coming from user space when user space has the Interrupt Critical Section bit set, or coming from a guest kernel), which is a slow multi-cycle operation. This change saves those few cycles in the common case. Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
| * | tile: fix tilepro casts for readl, writel, etcChris Metcalf2016-01-18
| | | | | | | | | | | | | | | | | | | | | | | | Missing parentheses could cause an argument of the form "integer + pointer" to get cast to "(long)integer + pointer" and remain a pointer type, causing compiler warnings. Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
| * | tile: fix a -Wframe-larger-than warningChris Metcalf2016-01-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | The warning occurs in setup.c, where it is known that it can't be a problem, but it's still a good idea to silence the warning. The onstack array is converted from an s32 to a u8, which still is plenty of range for the values being managed there. Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
| * | tile: include the syscall number in the backtraceChris Metcalf2016-01-18
| | | | | | | | | | | | | | | | | | | | | | | | This information is easily available in the backtrace data and can be helpful when trying to figure out the backtrace, particularly if we're early in kernel entry or late in kernel exit. Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
| * | MAINTAINERS: add git URL for tileFengguang Wu2016-01-18
| | | | | | | | | | | | | | | Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
| * | arch/tile: adopt prepare_exit_to_usermode() model from x86Chris Metcalf2016-01-18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change is a prerequisite change for TASK_ISOLATION but also stands on its own for readability and maintainability. The existing tile do_work_pending() was called in a loop from assembly on the slow path; this change moves the loop into C code as well. For the x86 version see commit c5c46f59e4e7 ("x86/entry: Add new, comprehensible entry and exit handlers written in C"). This change exposes a pre-existing bug on the older tilepro platform; the singlestep processing is done last, but on tilepro (unlike tilegx) we enable interrupts while doing that processing, so we could in theory miss a signal or other asynchronous event. A future change could fix this by breaking the singlestep work into a "prepare" step done in the main loop, and a "trigger" step done after exiting the loop. Since this change is intended as purely a restructuring change, we call out the bug explicitly now, but don't yet fix it. Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
| * | tile/jump_label: add jump label support for TILE-GxZhigang Lu2016-01-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the arch-specific code to support jump label for TILE-Gx. This code shares NOP instruction with ftrace, so we move it to a common header file. Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com> Signed-off-by: Zhigang Lu <zlu@ezchip.com> Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
| * | tile: define a macro ktext_writable_addr to get writable kernel text addressZhigang Lu2016-01-04
| | | | | | | | | | | | | | | | | | | | | | | | | | | It is used by kgdb, ftrace, kprobe and jump label, so we factor this out into a helper routine. Reviewed-by: Chris Metcalf <cmetcalf@ezchip.com> Signed-off-by: Zhigang Lu <zlu@ezchip.com> Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
* | | Merge branch 'for-linus' of ↵Linus Torvalds2016-01-18
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32 Pull AVR32 updates from Hans-Christian Noren Egtvedt. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/egtvedt/linux-avr32: mmc: atmel: get rid of struct mci_dma_data mmc: atmel-mci: restore dma on AVR32 avr32: wire up missing syscalls avr32: wire up accept4 syscall
| * | | mmc: atmel: get rid of struct mci_dma_dataMans Rullgard2016-01-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As struct mci_dma_data is now only used by AVR32, it is nothing but pointless indirection. Replace it with struct dw_dma_slave in the AVR32 platform code and with a void pointer elsewhere. Signed-off-by: Mans Rullgard <mans@mansr.com> Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> Acked-by: Ludovic Desroches <ludovic.desroches@atmel.com> Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
| * | | mmc: atmel-mci: restore dma on AVR32Mans Rullgard2016-01-14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit ecb89f2f5f3e7 ("mmc: atmel-mci: remove compat for non DT board when requesting dma chan") broke dma on AVR32 and any other boards not using DT. This restores a fallback mechanism for such cases. Signed-off-by: Mans Rullgard <mans@mansr.com> Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> Acked-by: Ludovic Desroches <ludovic.desroches@atmel.com> Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
| * | | avr32: wire up missing syscallsHans-Christian Egtvedt2016-01-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds three missing syscalls to AVR32: __NR_userfaultfd __NR_membarrier __NR_mlock2 Signed-off-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
| * | | avr32: wire up accept4 syscallMans Rullgard2016-01-11
| | |/ | |/| | | | | | | | | | | | | | | | The accept4 syscall is missing on AVR32. Fix this. Signed-off-by: Mans Rullgard <mans@mansr.com> Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
* | | Merge branch 'for-linus-4.5' of ↵Linus Torvalds2016-01-18
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs updates from Chris Mason: "This has our usual assortment of fixes and cleanups, but the biggest change included is Omar Sandoval's free space tree. It's not the default yet, mounting -o space_cache=v2 enables it and sets a readonly compat bit. The tree can actually be deleted and regenerated if there are any problems, but it has held up really well in testing so far. For very large filesystems (30T+) our existing free space caching code can end up taking a huge amount of time during commits. The new tree based code is faster and less work overall to update as the commit progresses. Omar worked on this during the summer and we'll hammer on it in production here at FB over the next few months" * 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (73 commits) Btrfs: fix fitrim discarding device area reserved for boot loader's use Btrfs: Check metadata redundancy on balance btrfs: statfs: report zero available if metadata are exhausted btrfs: preallocate path for snapshot creation at ioctl time btrfs: allocate root item at snapshot ioctl time btrfs: do an allocation earlier during snapshot creation btrfs: use smaller type for btrfs_path locks btrfs: use smaller type for btrfs_path lowest_level btrfs: use smaller type for btrfs_path reada btrfs: cleanup, use enum values for btrfs_path reada btrfs: constify static arrays btrfs: constify remaining structs with function pointers btrfs tests: replace whole ops structure for free space tests btrfs: use list_for_each_entry* in backref.c btrfs: use list_for_each_entry_safe in free-space-cache.c btrfs: use list_for_each_entry* in check-integrity.c Btrfs: use linux/sizes.h to represent constants btrfs: cleanup, remove stray return statements btrfs: zero out delayed node upon allocation btrfs: pass proper enum type to start_transaction() ...
| * \ \ Merge branch 'for-chris-4.5' of ↵Chris Mason2016-01-11
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.5 Signed-off-by: Chris Mason <clm@fb.com>
| | * | | Btrfs: fix fitrim discarding device area reserved for boot loader's useFilipe Manana2016-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As of the 4.3 kernel release, the fitrim ioctl can now discard any region of a disk that is not allocated to any chunk/block group, including the first megabyte which is used for our primary superblock and by the boot loader (grub for example). Fix this by not allowing to trim/discard any region in the device starting with an offset not greater than min(alloc_start_mount_option, 1Mb), just as it was not possible before 4.3. A reproducer test case for xfstests follows. seq=`basename $0` seqres=$RESULT_DIR/$seq echo "QA output created by $seq" tmp=/tmp/$$ status=1 # failure is the default! trap "_cleanup; exit \$status" 0 1 2 3 15 _cleanup() { cd / rm -f $tmp.* } # get standard environment, filters and checks . ./common/rc . ./common/filter # real QA test starts here _need_to_be_root _supported_fs btrfs _supported_os Linux _require_scratch rm -f $seqres.full _scratch_mkfs >>$seqres.full 2>&1 # Write to the [0, 64Kb[ and [68Kb, 1Mb[ ranges of the device. These ranges are # reserved for a boot loader to use (GRUB for example) and btrfs should never # use them - neither for allocating metadata/data nor should trim/discard them. # The range [64Kb, 68Kb[ is used for the primary superblock of the filesystem. $XFS_IO_PROG -c "pwrite -S 0xfd 0 64K" $SCRATCH_DEV | _filter_xfs_io $XFS_IO_PROG -c "pwrite -S 0xfd 68K 956K" $SCRATCH_DEV | _filter_xfs_io # Now mount the filesystem and perform a fitrim against it. _scratch_mount _require_batched_discard $SCRATCH_MNT $FSTRIM_PROG $SCRATCH_MNT # Now unmount the filesystem and verify the content of the ranges was not # modified (no trim/discard happened on them). _scratch_unmount echo "Content of the ranges [0, 64Kb] and [68Kb, 1Mb[ after fitrim:" od -t x1 -N $((64 * 1024)) $SCRATCH_DEV od -t x1 -j $((68 * 1024)) -N $((956 * 1024)) $SCRATCH_DEV status=0 exit Reported-by: Vincent Petry <PVince81@yahoo.fr> Reported-by: Andrei Borzenkov <arvidjaar@gmail.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=109341 Fixes: 499f377f49f0 (btrfs: iterate over unused chunk space in FITRIM) Cc: stable@vger.kernel.org # 4.3+ Signed-off-by: Filipe Manana <fdmanana@suse.com>
| | * | | Btrfs: fix transaction handle leak on failure to create hard linkFilipe Manana2016-01-06
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we failed to create a hard link we were not always releasing the the transaction handle we got before, resulting in a memory leak and preventing any other tasks from being able to commit the current transaction. Fix this by always releasing our transaction handle. Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
| | * | | Btrfs: fix number of transaction units required to create symlinkFilipe Manana2015-12-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We weren't accounting for the insertion of an inline extent item for the symlink inode nor that we need to update the parent inode item (through the call to btrfs_add_nondir()). So fix this by including two more transaction units. Signed-off-by: Filipe Manana <fdmanana@suse.com>
| | * | | Btrfs: don't leave dangling dentry if symlink creation failedFilipe Manana2015-12-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we are creating a symlink we might fail with an error after we created its inode and added the corresponding directory indexes to its parent inode. In this case we end up never removing the directory indexes because the inode eviction handler, called for our symlink inode on the final iput(), only removes items associated with the symlink inode and not with the parent inode. Example: $ mkfs.btrfs -f /dev/sdi $ mount /dev/sdi /mnt $ touch /mnt/foo $ ln -s /mnt/foo /mnt/bar ln: failed to create symbolic link ‘bar’: Cannot allocate memory $ umount /mnt $ btrfsck /dev/sdi Checking filesystem on /dev/sdi UUID: d5acb5ba-31bd-42da-b456-89dca2e716e1 checking extents checking free space cache checking fs roots root 5 inode 258 errors 2001, no inode item, link count wrong unresolved ref dir 256 index 3 namelen 3 name bar filetype 7 errors 4, no inode ref found 131073 bytes used err is 1 total csum bytes: 0 total tree bytes: 131072 total fs tree bytes: 32768 total extent tree bytes: 16384 btree space waste bytes: 124305 file data blocks allocated: 262144 referenced 262144 btrfs-progs v4.2.3 So fix this by adding the directory index entries as the very last step of symlink creation. Signed-off-by: Filipe Manana <fdmanana@suse.com>
| | * | | Btrfs: send, don't BUG_ON() when an empty symlink is foundFilipe Manana2015-12-31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a symlink is successfully created it always has an inline extent containing the source path. However if an error happens when creating the symlink, we can leave in the subvolume's tree a symlink inode without any such inline extent item - this happens if after btrfs_symlink() calls btrfs_end_transaction() and before it calls the inode eviction handler (through the final iput() call), the transaction gets committed and a crash happens before the eviction handler gets called, or if a snapshot of the subvolume is made before the eviction handler gets called. Sadly we can't just avoid this by making btrfs_symlink() call btrfs_end_transaction() after it calls the eviction handler, because the later can commit the current transaction before it removes any items from the subvolume tree (if it encounters ENOSPC errors while reserving space for removing all the items). So make send fail more gracefully, with an -EIO error, and print a message to dmesg/syslog informing that there's an empty symlink inode, so that the user can delete the empty symlink or do something else about it. Reported-by: Stephen R. van den Berg <srb@cuci.nl> Signed-off-by: Filipe Manana <fdmanana@suse.com>
| | * | | Btrfs: fix race between free space endio workers and space cache writeoutFilipe Manana2015-12-30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While running a stress test I ran into the following trace/transaction abort: [471626.672243] ------------[ cut here ]------------ [471626.673322] WARNING: CPU: 9 PID: 19107 at fs/btrfs/extent-tree.c:3740 btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs]() [471626.675492] BTRFS: Transaction aborted (error -2) [471626.676748] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc i2c_piix [471626.688802] CPU: 14 PID: 19107 Comm: fsstress Tainted: G W 4.3.0-rc5-btrfs-next-17+ #1 [471626.690148] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014 [471626.691901] 0000000000000000 ffff880016037cf0 ffffffff812566f4 ffff880016037d38 [471626.695009] ffff880016037d28 ffffffff8104d0a6 ffffffffa040c84e 00000000fffffffe [471626.697490] ffff88011fe855f8 ffff88000c484cb0 ffff88000d195000 ffff880016037d90 [471626.699201] Call Trace: [471626.699804] [<ffffffff812566f4>] dump_stack+0x4e/0x79 [471626.701049] [<ffffffff8104d0a6>] warn_slowpath_common+0x9f/0xb8 [471626.702542] [<ffffffffa040c84e>] ? btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs] [471626.704326] [<ffffffff8104d107>] warn_slowpath_fmt+0x48/0x50 [471626.705636] [<ffffffffa0403717>] ? write_one_cache_group.isra.32+0x77/0x82 [btrfs] [471626.707048] [<ffffffffa040c84e>] btrfs_write_dirty_block_groups+0x17c/0x214 [btrfs] [471626.708616] [<ffffffffa048a50a>] commit_cowonly_roots+0x1d7/0x25a [btrfs] [471626.709950] [<ffffffffa041e34a>] btrfs_commit_transaction+0x4c4/0x991 [btrfs] [471626.711286] [<ffffffff81081c61>] ? signal_pending_state+0x31/0x31 [471626.712611] [<ffffffffa03f6df4>] btrfs_sync_fs+0x145/0x1ad [btrfs] [471626.715610] [<ffffffff811962a2>] ? SyS_tee+0x226/0x226 [471626.716718] [<ffffffff811962c2>] sync_fs_one_sb+0x20/0x22 [471626.717672] [<ffffffff8116fc01>] iterate_supers+0x75/0xc2 [471626.718800] [<ffffffff8119669a>] sys_sync+0x52/0x80 [471626.719990] [<ffffffff8147cd97>] entry_SYSCALL_64_fastpath+0x12/0x6f [471626.721835] ---[ end trace baf57f43d76693f4 ]--- [471626.722954] BTRFS: error (device sdc) in btrfs_write_dirty_block_groups:3740: errno=-2 No such entry This is a very rare situation and it happened due to a race between a free space endio worker and writing the space caches for dirty block groups at a transaction's commit critical section. The steps leading to this are: 1) A task calls btrfs_commit_transaction() and starts the writeout of the space caches for all currently dirty block groups (i.e. it calls btrfs_start_dirty_block_groups()); 2) The previous step starts writeback for space caches; 3) When the writeback finishes it queues jobs for free space endio work queue (fs_info->endio_freespace_worker) that execute btrfs_finish_ordered_io(); 4) The task committing the transaction sets the transaction's state to TRANS_STATE_COMMIT_DOING and shortly after calls btrfs_write_dirty_block_groups(); 5) A free space endio job joins the transaction, through btrfs_join_transaction_nolock(), and updates a free space inode item in the root tree through btrfs_update_inode_fallback(); 6) Updating the free space inode item resulted in COWing one or more nodes/leaves of the root tree, and that resulted in creating a new metadata block group, which gets added to the transaction's list of dirty block groups (this is a very rare case); 7) The free space endio job has not released yet its transaction handle at this point, so the new metadata block group was not yet fully created (didn't go through btrfs_create_pending_block_groups() yet); 8) The transaction commit task sees the new metadata block group in the transaction's list of dirty block groups and processes it. When it attempts to update the block group's block group item in the extent tree, through write_one_cache_group(), it isn't able to find it and aborts the transaction with error -ENOENT - this is because the free space endio job hasn't yet released its transaction handle (which calls btrfs_create_pending_block_groups()) and therefore the block group item was not yet added to the extent tree. Fix this waiting for free space endio jobs if we fail to find a block group item in the extent tree and then retry once updating the block group item. Signed-off-by: Filipe Manana <fdmanana@suse.com>
| * | | | Merge branch 'misc-cleanups-4.5' of ↵Chris Mason2016-01-11
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5 Signed-off-by: Chris Mason <clm@fb.com>
| | * | | | btrfs: use list_for_each_entry* in backref.cGeliang Tang2016-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use list_for_each_entry*() to simplify the code. Signed-off-by: Geliang Tang <geliangtang@163.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| | * | | | btrfs: use list_for_each_entry_safe in free-space-cache.cGeliang Tang2016-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use list_for_each_entry_safe() instead of list_for_each_safe() to simplify the code. Signed-off-by: Geliang Tang <geliangtang@163.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| | * | | | btrfs: use list_for_each_entry* in check-integrity.cGeliang Tang2016-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use list_for_each_entry*() instead of list_for_each*() to simplify the code. Signed-off-by: Geliang Tang <geliangtang@163.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| | * | | | Btrfs: use linux/sizes.h to represent constantsByongho Lee2016-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We use many constants to represent size and offset value. And to make code readable we use '256 * 1024 * 1024' instead of '268435456' to represent '256MB'. However we can make far more readable with 'SZ_256MB' which is defined in the 'linux/sizes.h'. So this patch replaces 'xxx * 1024 * 1024' kind of expression with single 'SZ_xxxMB' if 'xxx' is a power of 2 then 'xxx * SZ_1M' if 'xxx' is not a power of 2. And I haven't touched to '4096' & '8192' because it's more intuitive than 'SZ_4KB' & 'SZ_8KB'. Signed-off-by: Byongho Lee <bhlee.kernel@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
| | * | | | btrfs: cleanup, remove stray return statementsDavid Sterba2016-01-07
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: David Sterba <dsterba@suse.com>
| | * | | | btrfs: zero out delayed node upon allocationAlexandru Moise2016-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's slightly cleaner to zero-out the delayed node upon allocation than to do it by hand in btrfs_init_delayed_node() for a few members Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
| | * | | | btrfs: pass proper enum type to start_transaction()Alexandru Moise2016-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| | * | | | btrfs: switch __btrfs_fs_incompat return type from int to boolAlexandru Moise2016-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conform to __btrfs_fs_incompat() cast-to-bool (!!) by explicitly returning boolean not int. Signed-off-by: Alexandru Moise <00moses.alexander00@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
| | * | | | btrfs: remove unused inode argument from uncompress_inline()Byongho Lee2016-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The inode argument is never used from the beginning, so remove it. Signed-off-by: Byongho Lee <bhlee.kernel@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | | | | Merge branch 'misc-for-4.5' of ↵Chris Mason2016-01-11
| |\ \ \ \ \ | | |_|/ / / | |/| | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.5
| | * | | | Btrfs: Check metadata redundancy on balanceSam Tygier2016-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When converting a filesystem via balance check that metadata mode is at least as redundant as the data mode. For example give warning when: -dconvert=raid1 -mconvert=single Signed-off-by: Sam Tygier <samtygier@yahoo.co.uk> [ minor message reformatting ] Signed-off-by: David Sterba <dsterba@suse.com>
| | * | | | btrfs: statfs: report zero available if metadata are exhaustedDavid Sterba2016-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is one ENOSPC case that's very confusing. There's Available greater than zero but no file operation succeds (besides removing files). This happens when the metadata are exhausted and there's no possibility to allocate another chunk. In this scenario it's normal that there's still some space in the data chunk and the calculation in df reflects that in the Avail value. To at least give some clue about the ENOSPC situation, let statfs report zero value in Avail, even if there's still data space available. Current: /dev/sdb1 4.0G 3.3G 719M 83% /mnt/test New: /dev/sdb1 4.0G 3.3G 0 100% /mnt/test We calculate the remaining metadata space minus global reserve. If this is (supposedly) smaller than zero, there's no space. But this does not hold in practice, the exhausted state happens where's still some positive delta. So we apply some guesswork and compare the delta to a 4M threshold. (Practically observed delta was 2M.) We probably cannot calculate the exact threshold value because this depends on the internal reservations requested by various operations, so some operations that consume a few metadata will succeed even if the Avail is zero. But this is better than the other way around. Signed-off-by: David Sterba <dsterba@suse.com>
| | * | | | btrfs: preallocate path for snapshot creation at ioctl timeDavid Sterba2016-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can also preallocate btrfs_path that's used during pending snapshot creation and avoid another late ENOMEM failure. Signed-off-by: David Sterba <dsterba@suse.com>
| | * | | | btrfs: allocate root item at snapshot ioctl timeDavid Sterba2016-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The actual snapshot creation is delayed until transaction commit. If we cannot get enough memory for the root item there, we have to fail the whole transaction commit which is bad. So we'll allocate the memory at the ioctl call and pass it along with the pending_snapshot struct. The potential ENOMEM will be returned to the caller of snapshot ioctl. Signed-off-by: David Sterba <dsterba@suse.com>
| | * | | | btrfs: do an allocation earlier during snapshot creationDavid Sterba2016-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can allocate pending_snapshot earlier and do not have to do cleanup in case of failure. Signed-off-by: David Sterba <dsterba@suse.com>
| | * | | | btrfs: use smaller type for btrfs_path locksDavid Sterba2016-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The values of btrfs_path::locks are 0 to 4, fit into a u8. Let's see: * overall size of btrfs_path drops down from 136 to 112 (-24 bytes), * better packing in a slab page +6 objects * the whole structure now fits to 2 cachelines * slight decrease in code size: text data bss dec hex filename 938731 43670 23144 1005545 f57e9 fs/btrfs/btrfs.ko.before 938203 43670 23144 1005017 f55d9 fs/btrfs/btrfs.ko.after (and the generated assembly does not change much) The main purpose is to decrease the size of the structure without affecting performance. The byte access is usually well behaving accross arches, the locks are not accessed frequently and sometimes just compared to zero. Note for further size reduction attempts: the slots could be made u16 but this might generate worse code on some arches (non-byte and non-int access). Also the range of operations on slots is wider compared to locks and the potential performance drop should be evaluated first. Signed-off-by: David Sterba <dsterba@suse.com>
| | * | | | btrfs: use smaller type for btrfs_path lowest_levelDavid Sterba2016-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The level is 0..7, we can use smaller type. The size of btrfs_path is now 136 bytes from 144, which is +2 objects that fit into a 4k slab. Signed-off-by: David Sterba <dsterba@suse.com>
| | * | | | btrfs: use smaller type for btrfs_path readaDavid Sterba2016-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The possible values for reada are all positive and bounded, we can later save some bytes by storing it in u8. Signed-off-by: David Sterba <dsterba@suse.com>
| | * | | | btrfs: cleanup, use enum values for btrfs_path readaDavid Sterba2016-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the integers by enums for better readability. The value 2 does not have any meaning since a717531942f488209dded30f6bc648167bcefa72 "Btrfs: do less aggressive btree readahead" (2009-01-22). Signed-off-by: David Sterba <dsterba@suse.com>
| | * | | | btrfs: constify static arraysDavid Sterba2016-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are a few statically initialized arrays that can be made const. The remaining (like file_system_type, sysfs attributes or prop handlers) do not allow that due to type mismatch when passed to the APIs or because the structures are modified through other members. Signed-off-by: David Sterba <dsterba@suse.com>
| | * | | | btrfs: constify remaining structs with function pointersDavid Sterba2016-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * struct extent_io_ops * struct btrfs_free_space_op Signed-off-by: David Sterba <dsterba@suse.com>
| | * | | | btrfs tests: replace whole ops structure for free space testsDavid Sterba2016-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Preparatory work for making btrfs_free_space_op constant. In test_steal_space_from_bitmap_to_extent, we substitute use_bitmap with own version thus preventing constification. We can rework it so we replace the whole structure with the correct function pointers. Signed-off-by: David Sterba <dsterba@suse.com>
| | * | | | btrfs: don't use slab cache for struct btrfs_delalloc_workDavid Sterba2016-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although we prefer to use separate caches for various structs, it seems better not to do that for struct btrfs_delalloc_work. Objects of this type are allocated rarely, when transaction commit calls btrfs_start_delalloc_roots, requesting delayed iputs. The objects are temporary (with some IO involved) but still allocated and freed within __start_delalloc_inodes. Memory allocation failure is handled. The slab cache is empty most of the time (observed on several systems), so if we need to allocate a new slab object, the first one has to allocate a full page. In a potential case of low memory conditions this might fail with higher probability compared to using the generic slab caches. Signed-off-by: David Sterba <dsterba@suse.com>
| | * | | | btrfs: drop duplicate prefix from scrub workqueuesDavid Sterba2016-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The helper btrfs_alloc_workqueue will add the "btrfs-" prefix. Signed-off-by: David Sterba <dsterba@suse.com>
| | * | | | btrfs: verbose error when we find an unexpected item in sys_arrayDavid Sterba2016-01-07
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: David Sterba <dsterba@suse.com>
| | * | | | btrfs: handle invalid num_stripes in sys_arrayDavid Sterba2016-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can handle the special case of num_stripes == 0 directly inside btrfs_read_sys_array. The BUG_ON in btrfs_chunk_item_size is there to catch other unhandled cases where we fail to validate external data. A crafted or corrupted image crashes at mount time: BTRFS: device fsid 9006933e-2a9a-44f0-917f-514252aeec2c devid 1 transid 7 /dev/loop0 BTRFS info (device loop0): disk space caching is enabled BUG: failure at fs/btrfs/ctree.h:337/btrfs_chunk_item_size()! Kernel panic - not syncing: BUG! CPU: 0 PID: 313 Comm: mount Not tainted 4.2.5-00657-ge047887-dirty #25 Stack: 637af890 60062489 602aeb2e 604192ba 60387961 00000011 637af8a0 6038a835 637af9c0 6038776b 634ef32b 00000000 Call Trace: [<6001c86d>] show_stack+0xfe/0x15b [<6038a835>] dump_stack+0x2a/0x2c [<6038776b>] panic+0x13e/0x2b3 [<6020f099>] btrfs_read_sys_array+0x25d/0x2ff [<601cfbbe>] open_ctree+0x192d/0x27af [<6019c2c1>] btrfs_mount+0x8f5/0xb9a [<600bc9a7>] mount_fs+0x11/0xf3 [<600d5167>] vfs_kern_mount+0x75/0x11a [<6019bcb0>] btrfs_mount+0x2e4/0xb9a [<600bc9a7>] mount_fs+0x11/0xf3 [<600d5167>] vfs_kern_mount+0x75/0x11a [<600d710b>] do_mount+0xa35/0xbc9 [<600d7557>] SyS_mount+0x95/0xc8 [<6001e884>] handle_syscall+0x6b/0x8e Reported-by: Jiri Slaby <jslaby@suse.com> Reported-by: Vegard Nossum <vegard.nossum@oracle.com> CC: stable@vger.kernel.org # 3.19+ Signed-off-by: David Sterba <dsterba@suse.com>
| | * | | | btrfs: better packing of btrfs_delayed_extent_opDavid Sterba2016-01-07
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | btrfs_delayed_extent_op can be packed in a better way, it's 40 bytes now and has 8 unused bytes. Reducing the level type to u8 makes it possible to squeeze it to the padding byte after key. The bitfields were switched to bool as there's space to store the full byte without increasing the whole structure, besides that the generated assembly is smaller. struct btrfs_delayed_extent_op { struct btrfs_disk_key key; /* 0 17 */ u8 level; /* 17 1 */ bool update_key; /* 18 1 */ bool update_flags; /* 19 1 */ bool is_data; /* 20 1 */ /* XXX 3 bytes hole, try to pack */ u64 flags_to_set; /* 24 8 */ /* size: 32, cachelines: 1, members: 6 */ /* sum members: 29, holes: 1, sum holes: 3 */ /* last cacheline: 32 bytes */ }; The final size is 32 bytes which gives +26 object per slab page. text data bss dec hex filename 938811 43670 23144 1005625 f5839 fs/btrfs/btrfs.ko.before 938747 43670 23144 1005561 f57f9 fs/btrfs/btrfs.ko.after Signed-off-by: David Sterba <dsterba@suse.com>