| Commit message (Collapse) | Author | Age |
|\
| | |
Pull IPMI driver updates from Corey Minyard:
"Some minor fixes and cleanups, nothing big.
In for-next for a while and I've done some extensive beating on the
driver since I have it working in qemu and can do creatively cruel
things to it"
* tag 'for-linus-3.20-1' of git://git.code.sf.net/p/openipmi/linux-ipmi:
ipmi: Fix a memory ordering issue
ipmi: Remove uses of return value of seq_printf
ipmi: Use is_visible callback for conditional sysfs entries
ipmi: Free ipmi_recv_msg messages from the linked list on close
ipmi: avoid gcc warning
ipmi: Update timespec usage to timespec64
ipmi: Cleanup DEBUG_TIMING ifdef usage
drivers:char:ipmi: Remove unneeded FIXME comment in the file,ipmi_si_intf.c
char: ipmi: Remove obsolete cleanup for clientdata
ipmi: Remove a FIXME for slab conversion
|
| | |
From a locking point of view it is safe to check waiting_msg without
a lock, but there is a memory ordering issue that can cause it to
not be set correctly when viewed from another processor. We
already claim a lock right after that, so move the check inside
the lock to enforce the memory ordering.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
|
| | |
The seq_printf-like functions will soon be changed to return void.
Convert these uses to check seq_has_overflowed instead.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
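The conversion pattern can be illustrated with a small Python model (this is
not kernel code). SeqBuf, its printf method, has_overflowed and show_status
are hypothetical stand-ins for a seq_file buffer, seq_printf,
seq_has_overflowed and a caller; the assumption modeled is that the print
call returns nothing and the caller asks the buffer afterwards whether it
overflowed:

```python
class SeqBuf:
    """Hypothetical stand-in for a fixed-size seq_file buffer."""

    def __init__(self, size):
        self.size = size
        self.data = ""
        self.count = 0  # bytes used; count == size marks overflow

    def printf(self, fmt, *args):
        """Append formatted text; returns nothing, like a void seq_printf."""
        text = fmt % args
        if self.count + len(text) < self.size:
            self.data += text
            self.count += len(text)
        else:
            # Text does not fit: mark the buffer as overflowed.
            self.count = self.size

    def has_overflowed(self):
        return self.count == self.size


def show_status(buf, name, value):
    """Caller pattern: print first, then check for overflow."""
    buf.printf("%s: %d\n", name, value)
    return -1 if buf.has_overflowed() else 0
```

The caller no longer depends on what the print call returns, so the print
functions are free to become void.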
|
| | |
Instead of manual calls of device_create_file() and
device_remove_file(), implement the condition in the is_visible callback
for the attribute group and put these entries into the group, too.
This simplifies the code and avoids possible races.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
|
| | |
This adds a loop over the elements of the linked list recv_msgs, using
list_for_each_entry_safe, in order to free the messages in this list. The
safe version of this macro is used to prevent use-after-free bugs when
deleting the element we are currently on: it holds a pointer to the
element after the current one while the current one is freed with
ipmi_free_recv_msg inside the loop.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
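The idea behind the safe traversal can be sketched in Python (a model only,
not the kernel macro). Msg, free_msg, free_all and build are hypothetical
names; free_msg clears the node's link to stand in for memory that must not
be touched after it is freed:

```python
class Msg:
    """Hypothetical stand-in for a queued receive message."""

    def __init__(self, payload):
        self.payload = payload
        self.next = None
        self.freed = False


def free_msg(msg):
    """Model of freeing: the node's link must not be read afterwards."""
    msg.freed = True
    msg.next = None  # poison the link, as freed memory may be reused


def free_all(head):
    """Free every node, saving the successor before freeing the current one."""
    count = 0
    cur = head
    while cur is not None:
        nxt = cur.next  # saved first: cur.next is invalid after free_msg()
        free_msg(cur)
        cur = nxt
        count += 1
    return count


def build(payloads):
    """Build a linked list and return (head, list of all nodes)."""
    nodes = [Msg(p) for p in payloads]
    for a, b in zip(nodes, nodes[1:]):
        a.next = b
    return (nodes[0] if nodes else None), nodes
```

Reading cur.next after free_msg(cur) would stop the walk after the first
node in this model, which is the class of bug the safe variant avoids.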
|
| | |
A new harmless warning has come up on ARM builds with gcc-4.9:
drivers/char/ipmi/ipmi_msghandler.c: In function 'smi_send.isra.11':
include/linux/spinlock.h:372:95: warning: 'flags' may be used uninitialized in this function [-Wmaybe-uninitialized]
raw_spin_unlock_irqrestore(&lock->rlock, flags);
^
drivers/char/ipmi/ipmi_msghandler.c:1490:16: note: 'flags' was declared here
unsigned long flags;
^
This could be worked around by initializing the 'flags' variable, but it
seems better to rework the code to avoid this.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 7ea0ed2b5be81 ("ipmi: Make the message handler easier to use for SMI interfaces")
Signed-off-by: Corey Minyard <cminyard@mvista.com>
|
| | |
As part of the internal y2038 cleanup, this patch removes
timespec usage in the ipmi driver, replacing it with timespec64.
Cc: openipmi-developer@lists.sourceforge.net
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Corey Minyard <minyard@mvista.com>
|
| | |
The driver uses #ifdef DEBUG_TIMING in order to conditionally print out
timestamped debug messages. Unfortunately it adds the ifdefs all over the
usage sites.
This patch cleans it up by adding a debug_timestamp() function which
is compiled out if DEBUG_TIMING isn't present. This cleans up all
the ugly ifdefs in the function logic.
Cc: openipmi-developer@lists.sourceforge.net
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Corey Minyard <minyard@mvista.com>
|
| | |
Removes a no longer needed FIXME comment in the function acpi_gpe_irq_setup
in the file ipmi_si_intf.c. The comment is no longer needed because we are
clearly passing the correct level, ACPI_GPE_LEVEL_TRIGGERED, to the installer
function acpi_install_gpe_handler: there has been no breakage after years of
using this ACPI level with acpi_install_gpe_handler.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
|
| | |
A few new i2c-drivers came into the kernel which clear the clientdata-pointer
on exit or error. This is obsolete now; the core will do it.
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
|
| | |
There can't be more than a few IPMI messages allocated at any one time,
so converting the messages to slabs would be a waste. So just remove
the FIXME.
Suggested-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Corey Minyard <cminyard@mvista.com>
|
|\ \
| |/
|/|
| | |
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal
Pull more thermal management updates from Zhang Rui:
"Specifics:
- Exynos thermal driver refactoring. Several cleanups, code
optimization, unused symbols removal, and unused feature removal in
Exynos thermal driver. Thanks Lukasz for this effort.
- Exynos thermal driver support to OF thermal. After the code
refactoring, the driver earned the support to OF thermal. Chip
thermal data were moved from driver code to DTS, reducing the code
footprint. Thanks Lukasz for this.
- After receiving the OF thermal support, the exynos thermal driver
now must allow modular build. Thanks Arnd for detecting, reporting
and fixing this.
- Exynos thermal driver support to Exynos 7 SoC. Thanks Abhilash for
this.
- Accurate temperature reporting on Rockchip thermal driver, thanks
to Caesar.
- Fix on how OF thermal enables its zones, thanks Lukasz for fixing.
- Fixes in OF thermal examples under Documentation/. Thanks Srinivas
for fixing"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
thermal: exynos: Add TMU support for Exynos7 SoC
dts: Documentation: Add documentation for Exynos7 SoC thermal bindings
cpufreq: exynos: allow modular build
thermal: Fix examples in DT documentation
thermal: exynos: Correct sanity check at exynos_report_trigger() function
thermal: Kconfig: Remove config for not used EXYNOS_THERMAL_CORE
thermal: exynos: Remove exynos_tmu_data.c file
thermal: rockchip: make temperature reporting much more accurate
thermal: exynos: Remove exynos_thermal_common.[c|h] files
thermal: samsung: core: Exynos TMU rework to use device tree for configuration
dts: Documentation: Update exynos-thermal.txt example for Exynos5440
dts: Documentation: Extending documentation entry for exynos-thermal
cpufreq: exynos: Use device tree to determine if cpufreq cooling should be registered
thermal: exynos: Modify exynos thermal code to use device tree for cpu cooling configuration
thermal: exynos: Provide thermal_exynos.h file to be included in device tree files
thermal: exynos: cosmetic: Correct comment format
thermal: of: Enable thermal_zoneX when sensor is correctly added
|
| | |
Add registers, bit fields and compatible strings for Exynos7 TMU
(Thermal Management Unit). Following are a few of the differences
in the Exynos7 TMU from earlier SoCs:
- 8 trigger levels
- Different bit offsets and more registers for the rising
and falling thresholds.
- New power down detection bit in the TMU_CONTROL register
which does not update the CURRENT_TEMP0 when tmu power down
is detected.
- Change in bit offset for the NEXT_DATA field of EMUL_CON
register. EMUL_CON register address has also changed.
- INTSTAT and INTCLEAR registers present in earlier SoCs
have been combined into one INTPEND register. The register
address for INTCLEAR and INTPEND is also different.
- Since there are 8 rising/falling interrupts as against
at most 4 in earlier SoCs the INTEN bit offsets are different.
- Multiple probe support which is handled by a TMU_CONTROL1
register (No support for this in the current patch).
This patch adds special clock support required only for Exynos7. It
also updates the "code_to_temp" prototype as Exynos7 has 9 bit
code-temp mapping.
Acked-by: Lukasz Majewski <l.majewski@samsung.com>
Tested-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Abhilash Kesavan <a.kesavan@samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
| | |
Add documentation for exynos7 thermal bindings including compatible
name and special clock properties.
Acked-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Abhilash Kesavan <a.kesavan@samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
| | |
The exynos cpufreq driver code recently gained a dependency on the
cooling code, which may be a loadable module. This breaks an ARM
allmodconfig build:
drivers/built-in.o: In function `exynos_cpufreq_probe':
:(.text+0x1748e8): undefined reference to `of_cpufreq_cooling_register'
To avoid this problem, change cpufreq Kconfig to allow the drivers
to be loadable modules as well and enforce a dependency on the
thermal module.
This change, in order to allow module builds of this cpufreq
driver, builds the driver as a single module instead of
several modules. It also keeps the proper platform
dependencies, so the driver won't load on platforms where
it is not supposed to. The user can still build support
for all platforms, or select which platforms (s)he wants
(as originally), except that the driver can now be a
module.
In addition, the driver is still only built for those configs
that expect it, and it won't compile or load on platforms where
it is not supposed to. The change also moves the config
ARM_EXYNOS_CPU_FREQ_BOOST_SW closer to this driver, so it looks
better in menuconfig.
We intentionally change ARM_EXYNOS5440_CPUFREQ to be tristate too, to
avoid future troubles.
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Kukjin Kim <kgene@kernel.org>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-pm@vger.kernel.org
Cc: linux-samsung-soc@vger.kernel.org
Fixes: e725d26c4857 ("cpufreq: exynos: Use device tree to determine if cpufreq cooling should be registered")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
| | |
There are various issues with the examples in this documentation: some
of the DT labels are invalid, and the referenced macro THERMAL_NO_LIMITS
is not available.
This patch attempts to fix these errors in the documentation.
Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
| | |
Up till now, by mistake, the wrong variable was tested against NULL.
Since exynos_report_trigger() is always called with a valid p pointer,
it is only necessary to check if a valid thermal zone device is passed.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
| | |
After removing the exynos_thermal_common.[c|h] files, CONFIG_EXYNOS_THERMAL_CORE
is not needed anymore.
This patch removes this entry from Kconfig.
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
| | |
Data already present in the exynos_tmu_data.c file has been moved to the
appropriate device tree files.
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
| | |
In general, the kernel should report temperature readings exactly as
reported by the hardware. The cpu / gpu thermal driver works in 5 degree
increments, but we ought to be more accurate. The temperature is now
computed by linear interpolation between the entries in the table.
Test:
$ md5sum /dev/zero &
$ while true; do grep "" /sys/class/thermal/thermal_zone[1-2]/temp;
sleep .5; done
For example, we can get results such as:
/sys/class/thermal/thermal_zone1/temp:39994
/sys/class/thermal/thermal_zone2/temp:39086
/sys/class/thermal/thermal_zone1/temp:39994
/sys/class/thermal/thermal_zone2/temp:39540
/sys/class/thermal/thermal_zone1/temp:39540
/sys/class/thermal/thermal_zone2/temp:39540
/sys/class/thermal/thermal_zone1/temp:39540
/sys/class/thermal/thermal_zone2/temp:39994
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Caesar Wang <wxt@rock-chips.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
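The interpolation between table entries can be sketched as follows. This is
a Python model under stated assumptions, not the driver code: TABLE and its
values are made up for illustration (the real code-to-temperature table is
not shown in this log), and code_to_temp is a hypothetical name:

```python
# Hypothetical table: (raw ADC code, temperature in millidegrees Celsius),
# sorted by code, with entries 5 degrees apart.
TABLE = [
    (1000, 30000),
    (1100, 35000),
    (1200, 40000),
    (1300, 45000),
]


def code_to_temp(code, table=TABLE):
    """Interpolate linearly between the two table entries around `code`."""
    # Clamp readings that fall outside the table.
    if code <= table[0][0]:
        return table[0][1]
    if code >= table[-1][0]:
        return table[-1][1]
    for (c_lo, t_lo), (c_hi, t_hi) in zip(table, table[1:]):
        if c_lo <= code <= c_hi:
            # Integer arithmetic, as code without floating point would use.
            return t_lo + (t_hi - t_lo) * (code - c_lo) // (c_hi - c_lo)
```

With this made-up table, a code halfway between two entries yields a
temperature halfway between them instead of snapping to a 5 degree step.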
|
| | |
After defining all necessary Exynos data in the device tree and heavy
reuse of of-thermal.c, those files can be removed.
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
| | |
This patch brings support for providing configuration via device tree.
Previously this data was hardcoded in the exynos_tmu_data.c file.
Such an approach was not scalable and very often required copying the
whole data.
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
| | |
Updating exynos-thermal.txt documentation entry for Exynos5440
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
| | |
Properties necessary for providing Exynos thermal configuration via device
tree.
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
| | |
cpufreq: exynos: Use device tree to determine if cpufreq cooling should be registered
With the thermal subsystem rework it is necessary to tune the current cpufreq
code to use cpu frequency change as a potential cooling device.
Now the cpu cooling device is registered only when the proper nodes and
properties are available in the device tree. Lack of them, however, will not
prevent cpufreq from operating normally.
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
| | |
thermal: exynos: Modify exynos thermal code to use device tree for cpu cooling configuration
Up till now exynos_tmu_data.c was used for storing CPU cooling configuration
data. Now the Exynos thermal core code uses device tree to get this data.
For this purpose generic thermal code for configuring CPU cooling was
used.
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
| | |
thermal: exynos: Provide thermal_exynos.h file to be included in device tree files
This patch is a preparatory patch to be able to read Exynos thermal
configuration from the device tree.
It turned out that DTC is not able to interpret enums properly and hence
it is necessary to #define those values explicitly.
For this reason the ./include/dt-bindings/thermal/thermal_exynos.h file
has been introduced.
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
| | |
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
| | |
Up till now the thermal_zone mode was by default "disabled". With this
patch the default is changed to "enabled".
One can read the mode at:
/sys/class/thermal/thermal_zone0/mode
Tested-by: Javi Merino <javi.merino@arm.com>
Reported-by: Abhilash Kesavan <a.kesavan@samsung.com>
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
| | |
Since _PAGE_PROTNONE aliases _PAGE_GLOBAL it is only valid if
_PAGE_PRESENT is clear. Make pte_protnone() and pmd_protnone() check
for this.
This fixes a 64-bit Xen PV guest regression introduced by 8a0516ed8b90
("mm: convert p[te|md]_numa users to p[te|md]_protnone_numa"). Any
userspace process would endlessly fault.
In a 64-bit PV guest, userspace page table entries have _PAGE_GLOBAL set
by the hypervisor. This meant that any fault on a present userspace
entry (e.g., a write to a read-only mapping) would be misinterpreted as
a NUMA hinting fault and the fault would not be correctly handled,
resulting in the access endlessly faulting.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \
| | | |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
"This pull is mostly cleanups and fixes:
- The raid5/6 cleanups from Zhao Lei fixup some long standing warts
in the code and add improvements on top of the scrubbing support
from 3.19.
- Josef has round one of our ENOSPC fixes coming from large btrfs
clusters here at FB.
- Dave Sterba continues a long series of cleanups (thanks Dave), and
Filipe continues hammering on corner cases in fsync and others
This all was held up a little trying to track down a use-after-free in
btrfs raid5/6. It's not clear yet if this is just made easier to
trigger with this pull or if it's a new bug from the raid5/6 cleanups.
Dave Sterba is the only one to trigger it so far, but he has a
consistent way to reproduce, so we'll get it nailed shortly"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (68 commits)
Btrfs: don't remove extents and xattrs when logging new names
Btrfs: fix fsync data loss after adding hard link to inode
Btrfs: fix BUG_ON in btrfs_orphan_add() when delete unused block group
Btrfs: account for large extents with enospc
Btrfs: don't set and clear delalloc for O_DIRECT writes
Btrfs: only adjust outstanding_extents when we do a short write
btrfs: Fix out-of-space bug
Btrfs: scrub, fix sleep in atomic context
Btrfs: fix scheduler warning when syncing log
Btrfs: Remove unnecessary placeholder in btrfs_err_code
btrfs: cleanup init for list in free-space-cache
btrfs: delete chunk allocation attemp when setting block group ro
btrfs: clear bio reference after submit_one_bio()
Btrfs: fix scrub race leading to use-after-free
Btrfs: add missing cleanup on sysfs init failure
Btrfs: fix race between transaction commit and empty block group removal
btrfs: add more checks to btrfs_read_sys_array
btrfs: cleanup, rename a few variables in btrfs_read_sys_array
btrfs: add checks for sys_chunk_array sizes
btrfs: more superblock checks, lower bounds on devices and sectorsize/nodesize
...
|
| | | |
If we are recording in the tree log that an inode has new names (new hard
links were added), we would drop items, belonging to the inode, that we
shouldn't:
1) When the flag BTRFS_INODE_NEEDS_FULL_SYNC is set in the inode's runtime
flags, we ended up dropping all the extent and xattr items that were
previously logged. This was done only in memory, since logging a new
name doesn't imply syncing the log;
2) When the flag BTRFS_INODE_COPY_EVERYTHING is set in the inode's runtime
flags, we ended up dropping all the xattr items that were previously
logged. Like the case before, this was done only in memory because
logging a new name doesn't imply syncing the log.
This led to some surprises in scenarios such as the following:
1) write some extents to an inode;
2) fsync the inode;
3) truncate the inode or delete/modify some of its xattrs
4) add a new hard link for that inode
5) fsync some other file, to force the log tree to be durably persisted
6) power failure happens
The next time the fs is mounted, the fsync log replay code is executed,
and the resulting file doesn't have the content it had when the last fsync
against it was performed; instead it has content matching what it had
when the last transaction commit happened.
So change the behaviour such that when a new name is logged, only the inode
item and reference items are processed.
This is easy to reproduce with the test I just made for xfstests, whose
main body is:
_scratch_mkfs >> $seqres.full 2>&1
_init_flakey
_mount_flakey
# Create our test file with some data.
$XFS_IO_PROG -f -c "pwrite -S 0xaa -b 8K 0 8K" \
$SCRATCH_MNT/foo | _filter_xfs_io
# Make sure the file is durably persisted.
sync
# Append some data to our file, to increase its size.
$XFS_IO_PROG -f -c "pwrite -S 0xcc -b 4K 8K 4K" \
$SCRATCH_MNT/foo | _filter_xfs_io
# Fsync the file, so from this point on if a crash/power failure happens, our
# new data is guaranteed to be there next time the fs is mounted.
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo
# Now shrink our file to 5000 bytes.
$XFS_IO_PROG -c "truncate 5000" $SCRATCH_MNT/foo
# Now do an expanding truncate to a size larger than what we had when we last
# fsync'ed our file. This is just to verify that after power failure and
# replaying the fsync log, our file matches what it was when we last fsync'ed
# it - 12Kb size, first 8Kb of data had a value of 0xaa and the last 4Kb of
# data had a value of 0xcc.
$XFS_IO_PROG -c "truncate 32K" $SCRATCH_MNT/foo
# Add one hard link to our file. This made btrfs drop all of our file's
# metadata from the fsync log, including the metadata relative to the
# extent we just wrote and fsync'ed. This change was made only to the fsync
# log in memory, so adding the hard link alone doesn't change the persisted
# fsync log. This happened because the previous truncates set the runtime
# flag BTRFS_INODE_NEEDS_FULL_SYNC in the btrfs inode structure.
ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link
# Now make sure the in memory fsync log is durably persisted.
# Creating and fsync'ing another file will do it.
# After this our persisted fsync log will no longer have metadata for our file
# foo that points to the extent we wrote and fsync'ed before.
touch $SCRATCH_MNT/bar
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/bar
# As expected, before the crash/power failure, we should be able to see a file
# with a size of 32Kb, with its first 5000 bytes having the value 0xaa and all
# the remaining bytes with value 0x00.
echo "File content before:"
od -t x1 $SCRATCH_MNT/foo
# Simulate a crash/power loss.
_load_flakey_table $FLAKEY_DROP_WRITES
_unmount_flakey
_load_flakey_table $FLAKEY_ALLOW_WRITES
_mount_flakey
# After mounting the fs again, the fsync log was replayed.
# The expected result is to see a file with a size of 12Kb, with its first 8Kb
# of data having the value 0xaa and its last 4Kb of data having a value of 0xcc.
# The btrfs bug used to leave the file as it used to be as of the last
# transaction commit - that is, with a size of 8Kb with all bytes having a
# value of 0xaa.
echo "File content after:"
od -t x1 $SCRATCH_MNT/foo
The test case for xfstests follows soon.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
| | | |
We have a scenario where after the fsync log replay we can lose file data
that had been previously fsync'ed if we added a hard link for our inode
and after that we sync'ed the fsync log (for example by fsync'ing some
other file or directory).
This is because when adding a hard link we updated the inode item in the
log tree with an i_size value of 0. At that point the new inode item was
in memory only and a subsequent fsync log replay would not make us lose
the file data. However if after adding the hard link we sync the log tree
to disk, by fsync'ing some other file or directory for example, we ended
up losing the file data after log replay, because the inode item in the
persisted log tree had an i_size of zero.
This is easy to reproduce, and the following excerpt from my test for
xfstests shows this:
_scratch_mkfs >> $seqres.full 2>&1
_init_flakey
_mount_flakey
# Create one file with data and fsync it.
# This made the btrfs fsync log persist the data and the inode metadata with
# a correct inode->i_size (4096 bytes).
$XFS_IO_PROG -f -c "pwrite -S 0xaa -b 4K 0 4K" -c "fsync" \
$SCRATCH_MNT/foo | _filter_xfs_io
# Now add one hard link to our file. This made the btrfs code update the fsync
# log, in memory only, with an inode metadata having a size of 0.
ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link
# Now force persistence of the fsync log to disk, for example, by fsyncing some
# other file.
touch $SCRATCH_MNT/bar
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/bar
# Before a power loss or crash, we could read the 4Kb of data from our file as
# expected.
echo "File content before:"
od -t x1 $SCRATCH_MNT/foo
# Simulate a crash/power loss.
_load_flakey_table $FLAKEY_DROP_WRITES
_unmount_flakey
_load_flakey_table $FLAKEY_ALLOW_WRITES
_mount_flakey
# After the fsync log replay, because the fsync log had a value of 0 for our
# inode's i_size, we could no longer read the 4Kb of data that we previously
# wrote and fsync'ed. The size of the file became 0 after the fsync log replay.
echo "File content after:"
od -t x1 $SCRATCH_MNT/foo
Another alternative test, that doesn't need to fsync an inode in the same
transaction it was created, is:
_scratch_mkfs >> $seqres.full 2>&1
_init_flakey
_mount_flakey
# Create our test file with some data.
$XFS_IO_PROG -f -c "pwrite -S 0xaa -b 8K 0 8K" \
$SCRATCH_MNT/foo | _filter_xfs_io
# Make sure the file is durably persisted.
sync
# Append some data to our file, to increase its size.
$XFS_IO_PROG -f -c "pwrite -S 0xcc -b 4K 8K 4K" \
$SCRATCH_MNT/foo | _filter_xfs_io
# Fsync the file, so from this point on if a crash/power failure happens, our
# new data is guaranteed to be there next time the fs is mounted.
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo
# Add one hard link to our file. This made btrfs write into the in memory fsync
# log a special inode with generation 0 and an i_size of 0 too. Note that this
# didn't update the inode in the fsync log on disk.
ln $SCRATCH_MNT/foo $SCRATCH_MNT/foo_link
# Now make sure the in memory fsync log is durably persisted.
# Creating and fsync'ing another file will do it.
touch $SCRATCH_MNT/bar
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/bar
# As expected, before the crash/power failure, we should be able to read the
# 12Kb of file data.
echo "File content before:"
od -t x1 $SCRATCH_MNT/foo
# Simulate a crash/power loss.
_load_flakey_table $FLAKEY_DROP_WRITES
_unmount_flakey
_load_flakey_table $FLAKEY_ALLOW_WRITES
_mount_flakey
# After mounting the fs again, the fsync log was replayed.
# The btrfs fsync log replay code didn't update the i_size of the persisted
# inode because the inode item in the log had a special generation with a
# value of 0 (and it couldn't know the correct i_size, since that inode item
# had a 0 i_size too). This made the last 4Kb of file data inaccessible and
# effectively lost.
echo "File content after:"
od -t x1 $SCRATCH_MNT/foo
This isn't a new issue/regression. This problem has been around since the
log tree code was added in 2008:
Btrfs: Add a write ahead tree log to optimize synchronous operations
(commit e02119d5a7b4396c5a872582fddc8bd6d305a70a)
Test cases for xfstests follow soon.
CC: <stable@vger.kernel.org>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
| | | |
Removing a large number of block groups in a transaction may hit the
BUG_ON() in btrfs_orphan_add(). That is because btrfs_orphan_reserve_metadata()
grabs its metadata reservation from the transaction handle, and
btrfs_delete_unused_bgs() didn't reserve metadata for the transaction handle
when deleting an unused block group.
The problem can be reproduced with the following script:
mntpath=/btrfs
loopdev=/dev/loop0
filepath=/home/forrest/image
umount $mntpath
losetup -d $loopdev
truncate --size 1000g $filepath
losetup $loopdev $filepath
mkfs.btrfs -f $loopdev
mount $loopdev $mntpath
for j in `seq 1 1 1000`; do
fallocate -l 1g $mntpath/$j
done
# wait for the cleaner thread to remove the unused block groups
sleep 300
The call trace that results from the BUG_ON() is:
[ 613.093084] ------------[ cut here ]------------
[ 613.097928] kernel BUG at fs/btrfs/inode.c:3142!
[ 613.105855] invalid opcode: 0000 [#1] SMP
[ 613.112702] Modules linked in: coretemp(E) crc32_pclmul(E) ghash_clmulni_intel(E) aesni_intel(E) snd_ens1371(E) snd_ac97_codec(E) aes_x86_64(E) lrw(E) gf128mul(E) glue_helper(E) ppdev(E) ac97_bus(E) ablk_helper(E) gameport(E) cryptd(E) snd_rawmidi(E) snd_seq_device(E) snd_pcm(E) vmw_balloon(E) snd_timer(E) snd(E) soundcore(E) serio_raw(E) vmwgfx(E) ttm(E) drm_kms_helper(E) drm(E) vmw_vmci(E) parport_pc(E) shpchp(E) i2c_piix4(E) mac_hid(E) lp(E) parport(E) btrfs(E) xor(E) raid6_pq(E) hid_generic(E) usbhid(E) hid(E) psmouse(E) ahci(E) libahci(E) e1000(E) mptspi(E) mptscsih(E) mptbase(E) floppy(E) vmw_pvscsi(E) vmxnet3(E)
[ 613.144196] CPU: 0 PID: 1480 Comm: btrfs-cleaner Tainted: G E 3.19.0-rc7-custom #2
[ 613.148501] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 613.152694] task: ffff880035cdb1a0 ti: ffff880039cf4000 task.ti: ffff880039cf4000
[ 613.154969] RIP: 0010:[<ffffffffa01441c2>] [<ffffffffa01441c2>] btrfs_orphan_add+0x1d2/0x1e0 [btrfs]
[ 613.157780] RSP: 0018:ffff880039cf7c48 EFLAGS: 00010286
[ 613.159560] RAX: 00000000ffffffe4 RBX: ffff88003bd981a0 RCX: ffff88003c9e4000
[ 613.161904] RDX: 0000000000002244 RSI: 0000000000040000 RDI: ffff88003c9e4138
[ 613.164264] RBP: ffff880039cf7c88 R08: 000060ffc0000850 R09: 0000000000000000
[ 613.166507] R10: ffff88003bc4b7a0 R11: ffffea0000eb6740 R12: ffff88003c9c0000
[ 613.168681] R13: ffff88003c102160 R14: ffff88003c9c0458 R15: 0000000000000001
[ 613.170932] FS: 0000000000000000(0000) GS:ffff88003f600000(0000) knlGS:0000000000000000
[ 613.173316] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 613.175227] CR2: 00007f6343537000 CR3: 0000000036329000 CR4: 00000000000407f0
[ 613.177554] Stack:
[ 613.178712] ffff880039cf7c88 ffffffffa0182a54 ffff88003c9e4b04 ffff88003c9c7800
[ 613.181297] ffff88003bc4b7a0 ffff88003bd981a0 ffff88003c8db200 ffff88003c2fcc60
[ 613.183782] ffff880039cf7d18 ffffffffa012da97 ffff88003bc4b7a4 ffff88003bc4b7a0
[ 613.186171] Call Trace:
[ 613.187493] [<ffffffffa0182a54>] ? lookup_free_space_inode+0x44/0x100 [btrfs]
[ 613.189801] [<ffffffffa012da97>] btrfs_remove_block_group+0x137/0x740 [btrfs]
[ 613.192126] [<ffffffffa0166912>] btrfs_remove_chunk+0x672/0x780 [btrfs]
[ 613.194267] [<ffffffffa012e2ff>] btrfs_delete_unused_bgs+0x25f/0x280 [btrfs]
[ 613.196567] [<ffffffffa0135e4c>] cleaner_kthread+0x12c/0x190 [btrfs]
[ 613.198687] [<ffffffffa0135d20>] ? check_leaf+0x350/0x350 [btrfs]
[ 613.200758] [<ffffffff8108f232>] kthread+0xd2/0xf0
[ 613.202616] [<ffffffff8108f160>] ? kthread_create_on_node+0x180/0x180
[ 613.204738] [<ffffffff8175dabc>] ret_from_fork+0x7c/0xb0
[ 613.206652] [<ffffffff8108f160>] ? kthread_create_on_node+0x180/0x180
[ 613.208741] Code: ff ff 0f 1f 80 00 00 00 00 89 45 c8 3e 80 63 80 fd 48 89 df e8 d0 23 fe ff 8b 45 c8 e9 14 ff ff ff b8 f4 ff ff ff e9 12 ff ff ff <0f> 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48
[ 613.216562] RIP [<ffffffffa01441c2>] btrfs_orphan_add+0x1d2/0x1e0 [btrfs]
[ 613.218828] RSP <ffff880039cf7c48>
[ 613.220382] ---[ end trace 71073106deb8a457 ]---
This patch replaces btrfs_join_transaction() with btrfs_start_transaction() in
btrfs_delete_unused_bgs() to prevent the BUG_ON() in btrfs_orphan_add().
Signed-off-by: Forrest Liu <forrestl@synology.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
| | | |
On our gluster boxes we stream large tar balls of backups onto our fses. With
160gb of ram this means we get really large contiguous ranges of dirty data, but
the way our ENOSPC stuff works is that as long as it's contiguous we only hold
metadata reservation for one extent. The problem is we limit our extents to
128mb, so we'll end up with at least 800 extents, and our enospc accounting is
quite a bit lower than what we need. To keep track of this, make sure we
increase outstanding_extents for every multiple of the max extent size so we can
be sure to have enough reserved metadata space. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
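The accounting gap described in this message comes down to a rounding-up division. The sketch below is a userspace model, not the kernel code: the helper name count_max_extents and the MAX_EXTENT_SIZE constant are assumptions for illustration, using the 128 MiB limit quoted above.

```c
#include <stdint.h>

/* Assumed extent cap, taken from the commit text: 128 MiB per extent. */
#define MAX_EXTENT_SIZE (128ULL * 1024 * 1024)

/*
 * Hypothetical helper: number of extents needed to cover `bytes` of
 * contiguous dirty data, rounding up to whole extents.
 */
uint64_t count_max_extents(uint64_t bytes)
{
	return (bytes + MAX_EXTENT_SIZE - 1) / MAX_EXTENT_SIZE;
}
```

For example, 100 GiB of contiguous dirty data works out to 800 extents under this model, while accounting for the range as a single extent would reserve metadata for only one.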
|
| | | |
We do this to get the space accounting, but this is just needless churn on the
io_tree, so just drop setting/clearing delalloc and just drop the reserved data
space when we have a successful allocation. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
| | | |
We have this weird dance where we always inc outstanding_extents when we do an
O_DIRECT write, even if we allocate the entire range. To get around this we
also drop the metadata space if we successfully write. This is an unnecessary
dance; we only need to jack up outstanding_extents if we don't satisfy the
entire range request in get_blocks_direct, otherwise we are good using our
original reservation. So drop the unconditional inc and the drop of the
metadata space that we have for the unconditional inc. Thanks,
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
| | | |
Btrfs reports NO_SPACE after we create and remove files several times,
and we can't write to the filesystem until we mount it again.
Steps to reproduce:
1: Create a single-dev btrfs fs with default options
2: Write a file into it that takes up most of the fs space
3: Delete that file
4: Wait about 100s to let the chunks be removed
5: goto 2
A script like the following reproduces it:
#!/bin/bash
# Recommend 1.2G space, too large disk will make test slow
DEV="/dev/sda16"
MNT="/mnt/tmp"
dev_size="$(lsblk -bn -o SIZE "$DEV")" || exit 2
file_size_m=$((dev_size * 75 / 100 / 1024 / 1024))
echo "Loop write ${file_size_m}M file on $((dev_size / 1024 / 1024))M dev"
for ((i = 0; i < 10; i++)); do umount "$MNT" 2>/dev/null; done
echo "mkfs $DEV"
mkfs.btrfs -f "$DEV" >/dev/null || exit 2
echo "mount $DEV $MNT"
mount "$DEV" "$MNT" || exit 2
for ((loop_i = 0; loop_i < 20; loop_i++)); do
    echo
    echo "loop $loop_i"
    echo "dd file..."
    cmd=(dd if=/dev/zero of="$MNT"/file0 bs=1M count="$file_size_m")
    "${cmd[@]}" 2>/dev/null || {
        # NO_SPACE error triggered
        echo "dd failed: ${cmd[*]}"
        exit 1
    }
    echo "rm file..."
    rm -f "$MNT"/file0 || exit 2
    for ((i = 0; i < 10; i++)); do
        df "$MNT" | tail -1
        sleep 10
    done
done
Reason:
It is triggered by commit: 47ab2a6c689913db23ccae38349714edf8365e0a
which is used to remove empty block groups automatically, but the
root cause is not in that patch. The earlier code worked well because
btrfs did not need to create and delete chunks so many times, with
such complexity.
The bug above has several causes, any of which can trigger it.
Reason1:
When we remove some contiguous chunks but leave other chunks after them,
the freed disk space should be reusable when chunks are recreated, but in
the current code only the first creation succeeds.
Fixed by Forrest Liu <forrestl@synology.com> in:
Btrfs: fix find_free_dev_extent() malfunction in case device tree has hole
Reason2:
contains_pending_extent() returns a wrong value in its calculation.
Fixed by Forrest Liu <forrestl@synology.com> in:
Btrfs: fix find_free_dev_extent() malfunction in case device tree has hole
Reason3:
btrfs_check_data_free_space() tries to commit the transaction and retry
allocating a chunk when the first allocation fails, but space_info->full
is set by the first allocation and prevents the second allocation in the retry.
Fixed in this patch by clearing space_info->full when committing the transaction.
Tested several times with the script above.
Changelog v3->v4:
use a lightweight int instead of atomic_t to record have_remove_bgs in
the transaction, suggested by:
Josef Bacik <jbacik@fb.com>
Changelog v2->v3:
v2 fixed the bug by adding more transaction commits, but we
only need to reclaim space when we really have no space for a
new chunk, noticed by:
Filipe David Manana <fdmanana@gmail.com>
Actually, our code already has this type of commit-and-retry;
we only need to make it work with removed block groups.
v3 fixes the bug this way.
Changelog v1->v2:
v1 would introduce a new bug when deleting and creating a chunk in the same
disk space in the same transaction, noticed by:
Filipe David Manana <fdmanana@gmail.com>
v2 fixes this bug by committing the transaction after removing block groups.
Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Suggested-by: Filipe David Manana <fdmanana@gmail.com>
Suggested-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
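The interaction in Reason3 can be modelled in a few lines. This is a simplified userspace sketch under stated assumptions, not the btrfs implementation: struct space_model, its fields and all three functions are hypothetical, with only the names full and have_remove_bgs borrowed from the message. The clear_full parameter switches between the old behaviour (the flag stays set) and the fixed one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical model of the state involved. */
struct space_model {
	uint64_t free_bytes;     /* unallocated device space usable right now */
	uint64_t pending_bytes;  /* space of removed block groups, released at commit */
	bool full;               /* sticky "no more chunks can be allocated" flag */
	bool have_remove_bgs;    /* block groups were removed in this transaction */
};

/* Try to allocate a chunk; a failure marks the space as full. */
bool alloc_chunk(struct space_model *s, uint64_t chunk_bytes)
{
	if (s->full)
		return false;
	if (s->free_bytes < chunk_bytes) {
		s->full = true;
		return false;
	}
	s->free_bytes -= chunk_bytes;
	return true;
}

/* Commit: release the space of removed block groups, optionally clear `full`. */
void commit_transaction(struct space_model *s, bool clear_full)
{
	s->free_bytes += s->pending_bytes;
	s->pending_bytes = 0;
	if (clear_full && s->have_remove_bgs)
		s->full = false;
	s->have_remove_bgs = false;
}

/* Allocate; on failure commit once and retry. */
bool alloc_with_retry(struct space_model *s, uint64_t chunk_bytes, bool clear_full)
{
	if (alloc_chunk(s, chunk_bytes))
		return true;
	commit_transaction(s, clear_full);
	return alloc_chunk(s, chunk_bytes);
}
```

In this model the retry only succeeds when the commit clears the flag; otherwise the second attempt is rejected before it even looks at the space that the commit released.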
|
| | | |
My previous patch "Btrfs: fix scrub race leading to use-after-free"
introduced the possibility to sleep in an atomic context, which happens
when the scrub_lock mutex is held at the time scrub_pending_bio_dec()
is called - this function can be called under an atomic context.
Chris ran into this in a debug kernel which gave the following trace:
[ 1928.950319] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:621
[ 1928.967334] in_atomic(): 1, irqs_disabled(): 0, pid: 149670, name: fsstress
[ 1928.981324] INFO: lockdep is turned off.
[ 1928.989244] CPU: 24 PID: 149670 Comm: fsstress Tainted: G W 3.19.0-rc7-mason+ #41
[ 1929.006418] Hardware name: ZTSYSTEMS Echo Ridge T4 /A9DRPF-10D, BIOS 1.07 05/10/2012
[ 1929.022207] ffffffff81a22cf8 ffff881076e03b78 ffffffff816b8dd9 ffff881076e03b78
[ 1929.037267] ffff880d8e828710 ffff881076e03ba8 ffffffff810856c4 ffff881076e03bc8
[ 1929.052315] 0000000000000000 000000000000026d ffffffff81a22cf8 ffff881076e03bd8
[ 1929.067381] Call Trace:
[ 1929.072344] <IRQ> [<ffffffff816b8dd9>] dump_stack+0x4f/0x6e
[ 1929.083968] [<ffffffff810856c4>] ___might_sleep+0x174/0x230
[ 1929.095352] [<ffffffff810857d2>] __might_sleep+0x52/0x90
[ 1929.106223] [<ffffffff816bb68f>] mutex_lock_nested+0x2f/0x3b0
[ 1929.117951] [<ffffffff810ab37d>] ? trace_hardirqs_on+0xd/0x10
[ 1929.129708] [<ffffffffa05dc838>] scrub_pending_bio_dec+0x38/0x70 [btrfs]
[ 1929.143370] [<ffffffffa05dd0e0>] scrub_parity_bio_endio+0x50/0x70 [btrfs]
[ 1929.157191] [<ffffffff812fa603>] bio_endio+0x53/0xa0
[ 1929.167382] [<ffffffffa05f96bc>] rbio_orig_end_io+0x7c/0xa0 [btrfs]
[ 1929.180161] [<ffffffffa05f97ba>] raid_write_parity_end_io+0x5a/0x80 [btrfs]
[ 1929.194318] [<ffffffff812fa603>] bio_endio+0x53/0xa0
[ 1929.204496] [<ffffffff8130401b>] blk_update_request+0x1eb/0x450
[ 1929.216569] [<ffffffff81096e58>] ? trigger_load_balance+0x78/0x500
[ 1929.229176] [<ffffffff8144c74d>] scsi_end_request+0x3d/0x1f0
[ 1929.240740] [<ffffffff8144ccac>] scsi_io_completion+0xac/0x5b0
[ 1929.252654] [<ffffffff81441c50>] scsi_finish_command+0xf0/0x150
[ 1929.264725] [<ffffffff8144d317>] scsi_softirq_done+0x147/0x170
[ 1929.276635] [<ffffffff8130ace6>] blk_done_softirq+0x86/0xa0
[ 1929.288014] [<ffffffff8105d92e>] __do_softirq+0xde/0x600
[ 1929.298885] [<ffffffff8105df6d>] irq_exit+0xbd/0xd0
(...)
Fix this by using a reference count on the scrub context structure
instead of locking the scrub_lock mutex.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
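The reference-count approach mentioned in this message can be sketched in userspace with C11 atomics. The names below (struct ctx, ctx_alloc, ctx_get, ctx_put, ctx_frees) are hypothetical and the structure is a bare stand-in for the scrub context; the kernel uses its own atomic primitives. The point is that dropping a reference is a single atomic operation that never sleeps, so it is safe where taking a mutex is not.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for the scrub context; only the refcount matters here. */
struct ctx {
	atomic_int refs;
};

/* Counts frees so the behaviour can be observed. */
int ctx_frees;

struct ctx *ctx_alloc(void)
{
	struct ctx *c = malloc(sizeof(*c));

	if (c)
		atomic_init(&c->refs, 1);	/* reference owned by the creator */
	return c;
}

/* Take an extra reference, e.g. for an I/O that is still in flight. */
void ctx_get(struct ctx *c)
{
	atomic_fetch_add(&c->refs, 1);
}

/* Drop a reference; whoever drops the last one frees. Never sleeps. */
void ctx_put(struct ctx *c)
{
	if (atomic_fetch_sub(&c->refs, 1) == 1) {
		free(c);
		ctx_frees++;
	}
}
```

With this scheme a completion path would hold its own reference while it touches the context and drop it last, so the waiter dropping its reference first cannot free the structure underneath it.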
|
| | | |
We try to lock a mutex while the current task state is not TASK_RUNNING,
which results in the following warning when CONFIG_DEBUG_LOCK_ALLOC=y:
[30736.772501] ------------[ cut here ]------------
[30736.774545] WARNING: CPU: 9 PID: 19972 at kernel/sched/core.c:7300 __might_sleep+0x8b/0xa8()
[30736.783453] do not call blocking ops when !TASK_RUNNING; state=2 set at [<ffffffff8107499b>] prepare_to_wait+0x43/0x89
[30736.786261] Modules linked in: dm_flakey dm_mod crc32c_generic btrfs xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop parport_pc psmouse parport pcspkr microcode serio_raw evdev processor thermal_sys i2c_piix4 i2c_core button ext4 crc16 jbd2 mbcache sg sr_mod cdrom sd_mod ata_generic virtio_scsi floppy ata_piix libata virtio_pci virtio_ring e1000 virtio scsi_mod
[30736.794323] CPU: 9 PID: 19972 Comm: fsstress Not tainted 3.19.0-rc7-btrfs-next-5+ #1
[30736.795821] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[30736.798788] 0000000000000009 ffff88042743fbd8 ffffffff814248ed ffff88043d32f2d8
[30736.800504] ffff88042743fc28 ffff88042743fc18 ffffffff81045338 0000000000000001
[30736.802131] ffffffff81064514 ffffffff817c52d1 000000000000026d 0000000000000000
[30736.803676] Call Trace:
[30736.804256] [<ffffffff814248ed>] dump_stack+0x4c/0x65
[30736.805245] [<ffffffff81045338>] warn_slowpath_common+0xa1/0xbb
[30736.806360] [<ffffffff81064514>] ? __might_sleep+0x8b/0xa8
[30736.807391] [<ffffffff81045398>] warn_slowpath_fmt+0x46/0x48
[30736.808511] [<ffffffff8107499b>] ? prepare_to_wait+0x43/0x89
[30736.809620] [<ffffffff8107499b>] ? prepare_to_wait+0x43/0x89
[30736.810691] [<ffffffff81064514>] __might_sleep+0x8b/0xa8
[30736.811703] [<ffffffff81426eaf>] mutex_lock_nested+0x2f/0x3a0
[30736.812889] [<ffffffff8107bfa1>] ? trace_hardirqs_on_caller+0x18f/0x1ab
[30736.814138] [<ffffffff8107bfca>] ? trace_hardirqs_on+0xd/0xf
[30736.819878] [<ffffffffa038cfff>] wait_for_writer.isra.12+0x91/0xaa [btrfs]
[30736.821260] [<ffffffff810748bd>] ? signal_pending_state+0x31/0x31
[30736.822410] [<ffffffffa0391f0a>] btrfs_sync_log+0x160/0x947 [btrfs]
[30736.823574] [<ffffffff8107bfa1>] ? trace_hardirqs_on_caller+0x18f/0x1ab
[30736.824847] [<ffffffff8107bfca>] ? trace_hardirqs_on+0xd/0xf
[30736.825972] [<ffffffffa036e555>] btrfs_sync_file+0x2b0/0x319 [btrfs]
[30736.827684] [<ffffffff8117901a>] vfs_fsync_range+0x21/0x23
[30736.828932] [<ffffffff81179038>] vfs_fsync+0x1c/0x1e
[30736.829917] [<ffffffff8117928b>] do_fsync+0x34/0x4e
[30736.830862] [<ffffffff811794b3>] SyS_fsync+0x10/0x14
[30736.831819] [<ffffffff8142a512>] system_call_fastpath+0x12/0x17
[30736.832982] ---[ end trace c0b57df60d32ae5c ]---
Fix this by acquiring the mutex after calling finish_wait(), which sets the
task's state to TASK_RUNNING.
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
| | | |
"notused" is not necessary. Setting the first entry to 1 is enough.
Signed-off-by: Takeuchi Satoru <takeuchi_satoru@jp.fujitsu.com>
Cc: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
|
| | | |
o remove an unnecessary INIT_LIST_HEAD after LIST_HEAD
o merge a declaration & INIT_LIST_HEAD pair into one LIST_HEAD
Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
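Why the two forms in this cleanup are interchangeable can be seen from the macro itself. The block below carries minimal userspace copies of the kernel list primitives so that it compiles on its own; the two wrapper functions (two_step_is_empty, one_step_is_empty) are hypothetical and exist only to contrast the styles.

```c
/* Minimal userspace copies of the kernel list primitives, for illustration. */
struct list_head {
	struct list_head *next, *prev;
};

#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define LIST_HEAD(name) struct list_head name = LIST_HEAD_INIT(name)

void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

int list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Before: a declaration plus a separate runtime initialisation. */
int two_step_is_empty(void)
{
	struct list_head head;

	INIT_LIST_HEAD(&head);
	return list_empty(&head);
}

/* After: LIST_HEAD declares and initialises in one statement. */
int one_step_is_empty(void)
{
	LIST_HEAD(head);

	return list_empty(&head);
}
```

Both functions end up with a head that points at itself, which is why calling INIT_LIST_HEAD on a variable already declared with LIST_HEAD is redundant.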
|
| | | |
The test below currently fails:
mkfs.ext4 -F /dev/sda
btrfs-convert /dev/sda
mount /dev/sda /mnt
btrfs device add -f /dev/sdb /mnt
btrfs balance start -v -dconvert=raid1 -mconvert=raid1 /mnt
The reason is that there are some block groups with usage 0, but the whole
disk has no free space to allocate a new chunk, so we can't even set such
a block group readonly. This patch removes the chunk allocation when
setting a block group ro. For META, we already have a reservation. But for
SYSTEM, we don't have one, so the check_system_chunk is still required.
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
| | | |
After submit_one_bio(), `bio' can go away. However, submit_extent_page()
leaves `bio' referenced if submit_one_bio() failed (e.g. -ENOMEM on OOM).
This causes an invalid paging request the next time submit_extent_page()
is called.
I reproduced the ENOMEM case with the following script (it needs
CONFIG_FAIL_PAGE_ALLOC and CONFIG_FAULT_INJECTION_DEBUG_FS).
#!/bin/bash
dmesgout=dmesg.txt
start=100000
end=300000
step=1000
# btrfs options
device=/dev/vdb1
directory=/mnt/btrfs
# fault-injection options
percent=100
times=3
mkdir -p $directory || exit 1
mount -o compress $device $directory || exit 1
rm -f $directory/file || exit 1
dd if=/dev/zero of=$directory/file bs=1M count=512 || exit 1
for interval in `seq $start $step $end`; do
    dmesg -C
    echo 1 > /proc/sys/vm/drop_caches
    sync
    export FAILCMD_TYPE=fail_page_alloc
    ./failcmd.sh -p $percent -t $times -i $interval \
        --ignore-gfp-highmem=N --ignore-gfp-wait=N --min-order=0 \
        -- \
        cat $directory/file > /dev/null
    dmesg > ${dmesgout}
    if grep -q BUG: ${dmesgout}; then
        cat ${dmesgout}
        exit 1
    fi
done
umount $directory
exit 0
Signed-off-by: Naohiro Aota <naota@elisp.net>
Tested-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
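The class of bug described in this message, a cached pointer that outlives the object it points to, can be shown with a small userspace model. Everything here is hypothetical (struct fake_bio, submit_one, flush_cached_bio) and only mirrors the shape of the problem; it is not the btrfs code or the actual patch.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bio; the real structure is far richer. */
struct fake_bio {
	int sectors;
};

/* Consumes (frees) the bio whether or not submission succeeds. */
int submit_one(struct fake_bio *bio, int fail)
{
	free(bio);
	return fail ? -ENOMEM : 0;
}

/*
 * Submit the cached bio and forget it. Clearing *bio_ret unconditionally
 * means a failed submission cannot leave a dangling pointer behind for
 * the next call to reuse.
 */
int flush_cached_bio(struct fake_bio **bio_ret, int fail)
{
	int ret = 0;

	if (*bio_ret) {
		ret = submit_one(*bio_ret, fail);
		*bio_ret = NULL;
	}
	return ret;
}
```

If the pointer were cleared only on success, a second call after a failure would hand an already freed object to submit_one(), which is the kind of access that shows up as an invalid paging request.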
|
| | | |
While running a scrub on a kernel with CONFIG_DEBUG_PAGEALLOC=y, I got
the following trace:
[68127.807663] BUG: unable to handle kernel paging request at ffff8803f8947a50
[68127.807663] IP: [<ffffffff8107da31>] do_raw_spin_lock+0x94/0x122
[68127.807663] PGD 3003067 PUD 43e1f5067 PMD 43e030067 PTE 80000003f8947060
[68127.807663] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[68127.807663] Modules linked in: dm_flakey dm_mod crc32c_generic btrfs xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop parport_pc processor parpo
[68127.807663] CPU: 2 PID: 3081 Comm: kworker/u8:5 Not tainted 3.18.0-rc6-btrfs-next-3+ #4
[68127.807663] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[68127.807663] Workqueue: btrfs-btrfs-scrub btrfs_scrub_helper [btrfs]
[68127.807663] task: ffff880101fc5250 ti: ffff8803f097c000 task.ti: ffff8803f097c000
[68127.807663] RIP: 0010:[<ffffffff8107da31>] [<ffffffff8107da31>] do_raw_spin_lock+0x94/0x122
[68127.807663] RSP: 0018:ffff8803f097fbb8 EFLAGS: 00010093
[68127.807663] RAX: 0000000028dd386c RBX: ffff8803f8947a50 RCX: 0000000028dd3854
[68127.807663] RDX: 0000000000000018 RSI: 0000000000000002 RDI: 0000000000000001
[68127.807663] RBP: ffff8803f097fbd8 R08: 0000000000000004 R09: 0000000000000001
[68127.807663] R10: ffff880102620980 R11: ffff8801f3e8c900 R12: 000000000001d390
[68127.807663] R13: 00000000cabd13c8 R14: ffff8803f8947800 R15: ffff88037c574f00
[68127.807663] FS: 0000000000000000(0000) GS:ffff88043dd00000(0000) knlGS:0000000000000000
[68127.807663] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[68127.807663] CR2: ffff8803f8947a50 CR3: 00000000b6481000 CR4: 00000000000006e0
[68127.807663] Stack:
[68127.807663] ffffffff823942a8 ffff8803f8947a50 ffff8802a3416f80 0000000000000000
[68127.807663] ffff8803f097fc18 ffffffff8141e7c0 ffffffff81072948 000000000034f314
[68127.807663] ffff8803f097fc08 0000000000000292 ffff8803f097fc48 ffff8803f8947a50
[68127.807663] Call Trace:
[68127.807663] [<ffffffff8141e7c0>] _raw_spin_lock_irqsave+0x4b/0x55
[68127.807663] [<ffffffff81072948>] ? __wake_up+0x22/0x4b
[68127.807663] [<ffffffff81072948>] __wake_up+0x22/0x4b
[68127.807663] [<ffffffffa0392327>] scrub_pending_bio_dec+0x32/0x36 [btrfs]
[68127.807663] [<ffffffffa0395e70>] scrub_bio_end_io_worker+0x5a3/0x5c9 [btrfs]
[68127.807663] [<ffffffff810e0c7c>] ? time_hardirqs_off+0x15/0x28
[68127.807663] [<ffffffff81078106>] ? trace_hardirqs_off_caller+0x4c/0xb9
[68127.807663] [<ffffffffa0372a7c>] normal_work_helper+0xf1/0x238 [btrfs]
[68127.807663] [<ffffffffa0372d3d>] btrfs_scrub_helper+0x12/0x14 [btrfs]
[68127.807663] [<ffffffff810582d2>] process_one_work+0x1e4/0x3b6
[68127.807663] [<ffffffff81078180>] ? trace_hardirqs_off+0xd/0xf
[68127.807663] [<ffffffff81058dc9>] worker_thread+0x1fb/0x2a8
[68127.807663] [<ffffffff81058bce>] ? rescuer_thread+0x219/0x219
[68127.807663] [<ffffffff8105cd75>] kthread+0xdb/0xe3
[68127.807663] [<ffffffff8105cc9a>] ? __kthread_parkme+0x67/0x67
[68127.807663] [<ffffffff8141f1ec>] ret_from_fork+0x7c/0xb0
[68127.807663] [<ffffffff8105cc9a>] ? __kthread_parkme+0x67/0x67
[68127.807663] Code: 39 c2 75 14 8d 8a 00 00 01 00 89 d0 f0 0f b1 0b 39 d0 0f 84 81 00 00 00 4c 69 2d 27 86 99 00 fa 00 00 00 45 31 e4 4d 39 ec 74 2b <8b> 13 89 d0 c1 e8 10 66 39 c2 75
[68127.807663] RIP [<ffffffff8107da31>] do_raw_spin_lock+0x94/0x122
[68127.807663] RSP <ffff8803f097fbb8>
[68127.807663] CR2: ffff8803f8947a50
[68127.807663] ---[ end trace d7045aac00a66cd8 ]---
This is due to a race that can happen in a very tiny time window and is
illustrated by the following sequence diagram:
         CPU 1                                   CPU 2
                                          btrfs_scrub_dev()
scrub_bio_end_io_worker()
  scrub_pending_bio_dec()
    atomic_dec(&sctx->bios_in_flight)
                                            wait sctx->bios_in_flight == 0
                                            wait sctx->workers_pending == 0
                                            mutex_lock(&fs_info->scrub_lock)
                                            (...)
                                            mutex_unlock(&fs_info->scrub_lock)
                                            scrub_free_ctx(sctx)
                                              kfree(sctx)
    wake_up(&sctx->list_wait)
      __wake_up()
        spin_lock_irqsave(&sctx->list_wait->lock, flags)
Another variation of this scenario that results in the same use-after-free
issue is:
         CPU 1                                   CPU 2
                                          btrfs_scrub_dev()
                                            wait sctx->bios_in_flight == 0
scrub_bio_end_io_worker()
  scrub_pending_bio_dec()
    __wake_up(&sctx->list_wait)
      spin_lock_irqsave(&sctx->list_wait->lock, flags)
      default_wake_function()
        wake up task at CPU 2
                                            wait sctx->workers_pending == 0
                                            mutex_lock(&fs_info->scrub_lock)
                                            (...)
                                            mutex_unlock(&fs_info->scrub_lock)
                                            scrub_free_ctx(sctx)
                                              kfree(sctx)
      spin_unlock_irqrestore(&sctx->list_wait->lock, flags)
Fix this by holding the scrub lock while doing the wakeup.
This isn't a recent regression; the issue has been around since the scrub
feature was added (2011, commit a2de733c78fa7af51ba9670482fa7d392aa67c57).
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
| | | |
If we failed during initialization of sysfs, we weren't unregistering the
top level btrfs sysfs entry nor the debugfs stuff.
Not unregistering the top level sysfs entry makes future attempts to reload
the btrfs module impossible and the following is reported in dmesg:
[ 2246.451296] WARNING: CPU: 3 PID: 10999 at fs/sysfs/dir.c:486 sysfs_warn_dup+0x91/0xb0()
[ 2246.451298] sysfs: cannot create duplicate filename '/fs/btrfs'
[ 2246.451298] Modules linked in: btrfs(+) raid6_pq xor bnep rfcomm bluetooth binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc parport_pc parport psmouse serio_raw pcspkr evbug i2c_piix4 e1000 floppy [last unloaded: btrfs]
[ 2246.451310] CPU: 3 PID: 10999 Comm: modprobe Tainted: G W 3.13.0-fdm-btrfs-next-24+ #7
[ 2246.451311] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 2246.451312] 0000000000000009 ffff8800d353fa08 ffffffff816f1da6 0000000000000410
[ 2246.451314] ffff8800d353fa58 ffff8800d353fa48 ffffffff8104a32c ffff88020821a290
[ 2246.451316] ffff88020821a290 ffff88020821a290 ffff8802148f0000 ffff8800d353fb80
[ 2246.451318] Call Trace:
[ 2246.451322] [<ffffffff816f1da6>] dump_stack+0x4e/0x68
[ 2246.451324] [<ffffffff8104a32c>] warn_slowpath_common+0x8c/0xc0
[ 2246.451325] [<ffffffff8104a416>] warn_slowpath_fmt+0x46/0x50
[ 2246.451328] [<ffffffff81367dc5>] ? strlcat+0x65/0x90
(....)
This fixes the following change:
btrfs: add simple debugfs interface
commit 1bae30982bc86ab66d61ccb6e22792593b45d44d
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
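The usual shape of such a fix is to undo the earlier registration steps, in reverse order, when a later step fails. The sketch below is a userspace model with hypothetical names throughout (register_top_level, register_debugfs, create_feature_files, init_model); the real initialization steps and their order are not shown in this message and are assumed here.

```c
/* Flags recording what is currently registered; purely illustrative. */
int sysfs_registered;
int debugfs_registered;

int register_top_level(void) { sysfs_registered = 1; return 0; }
void unregister_top_level(void) { sysfs_registered = 0; }
int register_debugfs(void) { debugfs_registered = 1; return 0; }
void unregister_debugfs(void) { debugfs_registered = 0; }

/* Stand-in for a later init step that may fail. */
int create_feature_files(int fail) { return fail ? -1 : 0; }

/* On failure, undo the earlier steps in reverse order. */
int init_model(int fail_late_step)
{
	int ret;

	ret = register_top_level();
	if (ret)
		return ret;
	ret = register_debugfs();
	if (ret)
		goto out_top;
	ret = create_feature_files(fail_late_step);
	if (ret)
		goto out_debugfs;
	return 0;

out_debugfs:
	unregister_debugfs();
out_top:
	unregister_top_level();
	return ret;
}
```

Without the two labels, a failure in the last step would leave the top-level entry registered, and a later attempt to register the same name would then be a duplicate.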
|
| | | |
Committing a transaction can race with automatic removal of empty block
groups (cleaner kthread), leading to a BUG_ON() in the transaction
commit code while running btrfs_finish_extent_commit(). The following
sequence diagram shows how it can happen:
         CPU 1                                   CPU 2
btrfs_commit_transaction()
  fs_info->running_transaction = NULL
  btrfs_finish_extent_commit()
    find_first_extent_bit()
      -> found range for block group X
         in fs_info->freed_extents[]
                                          btrfs_delete_unused_bgs()
                                            -> found block group X
                                            Removed block group X's range
                                            from fs_info->freed_extents[]
                                            btrfs_remove_chunk()
                                              btrfs_remove_block_group(bg X)
    unpin_extent_range(bg X range)
      btrfs_lookup_block_group(bg X)
        -> returns NULL
          -> BUG_ON()
The trace that results from the BUG_ON() is:
[48665.187808] ------------[ cut here ]------------
[48665.188032] kernel BUG at fs/btrfs/extent-tree.c:5675!
[48665.188032] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
[48665.188032] Modules linked in: dm_flakey dm_mod crc32c_generic btrfs xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop parport_pc evdev microcode
[48665.197388] CPU: 2 PID: 31211 Comm: kworker/u32:16 Tainted: G W 3.19.0-rc5-btrfs-next-4+ #1
[48665.197388] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[48665.197388] Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
[48665.197388] task: ffff880222011810 ti: ffff8801b56a4000 task.ti: ffff8801b56a4000
[48665.197388] RIP: 0010:[<ffffffffa0350d05>] [<ffffffffa0350d05>] unpin_extent_range+0x6a/0x1ba [btrfs]
[48665.197388] RSP: 0018:ffff8801b56a7b88 EFLAGS: 00010246
[48665.197388] RAX: 0000000000000000 RBX: ffff8802143a6000 RCX: ffff8802220120c8
[48665.197388] RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffff8800a3c140b0
[48665.197388] RBP: ffff8801b56a7bd8 R08: 0000000000000003 R09: 0000000000000000
[48665.197388] R10: 0000000000000000 R11: 000000000000bbac R12: 0000000012e8e000
[48665.197388] R13: ffff8800a3c14000 R14: 0000000000000000 R15: 0000000000000000
[48665.197388] FS: 0000000000000000(0000) GS:ffff88023ec40000(0000) knlGS:0000000000000000
[48665.197388] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[48665.197388] CR2: 00007f065e42f270 CR3: 0000000206f70000 CR4: 00000000000006e0
[48665.197388] Stack:
[48665.197388] ffff8801b56a7bd8 0000000012ea0000 01ff8800a3c14138 0000000012e9ffff
[48665.197388] ffff880141df3dd8 ffff8802143a6000 ffff8800a3c14138 ffff880141df3df0
[48665.197388] ffff880141df3dd8 0000000000000000 ffff8801b56a7c08 ffffffffa0354227
[48665.197388] Call Trace:
[48665.197388] [<ffffffffa0354227>] btrfs_finish_extent_commit+0xb0/0xd9 [btrfs]
[48665.197388] [<ffffffffa0366b4b>] btrfs_commit_transaction+0x791/0x92c [btrfs]
[48665.197388] [<ffffffffa0352432>] flush_space+0x43d/0x452 [btrfs]
[48665.197388] [<ffffffff814295c3>] ? _raw_spin_unlock+0x28/0x33
[48665.197388] [<ffffffffa035255f>] btrfs_async_reclaim_metadata_space+0x118/0x164 [btrfs]
[48665.197388] [<ffffffff81059917>] ? process_one_work+0x14b/0x3ab
[48665.197388] [<ffffffff810599ac>] process_one_work+0x1e0/0x3ab
[48665.197388] [<ffffffff81079fa9>] ? trace_hardirqs_off+0xd/0xf
[48665.197388] [<ffffffff8105a55b>] worker_thread+0x210/0x2d0
[48665.197388] [<ffffffff8105a34b>] ? rescuer_thread+0x2c3/0x2c3
[48665.197388] [<ffffffff8105e5c0>] kthread+0xef/0xf7
[48665.197388] [<ffffffff81429682>] ? _raw_spin_unlock_irq+0x2d/0x39
[48665.197388] [<ffffffff8105e4d1>] ? __kthread_parkme+0xad/0xad
[48665.197388] [<ffffffff81429dec>] ret_from_fork+0x7c/0xb0
[48665.197388] [<ffffffff8105e4d1>] ? __kthread_parkme+0xad/0xad
[48665.197388] Code: 85 f6 74 14 49 8b 06 49 03 46 09 49 39 c4 72 1d 4c 89 f7 e8 83 ec ff ff 4c 89 e6 4c 89 ef e8 1e f1 ff ff 48 85 c0 49 89 c6 75 02 <0f> 0b 49 8b 1e 49 03 5e 09 48 8b
[48665.197388] RIP [<ffffffffa0350d05>] unpin_extent_range+0x6a/0x1ba [btrfs]
[48665.197388] RSP <ffff8801b56a7b88>
[48665.272246] ---[ end trace b9c6ab9957521376 ]---
Fix this by ensuring that unpinning the block group's range in
btrfs_finish_extent_commit() is done in a synchronized fashion
with removing the block group's range from freed_extents[]
in btrfs_delete_unused_bgs().
This race got introduced with the change:
Btrfs: remove empty block groups automatically
commit 47ab2a6c689913db23ccae38349714edf8365e0a
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
|
| | | |
Verify that the sys_array has enough bytes to read the next item.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
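A bounds check of this kind, applied while walking a packed array, might look like the sketch below. The layout is an assumption made up for illustration (a 4-byte length header followed by that many payload bytes) and is not the on-disk btrfs sys_array format; count_items is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical packed layout: each item is a 4-byte length header followed
 * by that many payload bytes. This is NOT the on-disk btrfs format.
 * Returns the number of items, or -1 if an item would run past `size`.
 */
int count_items(const uint8_t *array, size_t size)
{
	size_t cur = 0;
	int items = 0;

	while (cur < size) {
		uint32_t len;

		/* Is there room for the header? */
		if (size - cur < sizeof(len))
			return -1;
		memcpy(&len, array + cur, sizeof(len));
		cur += sizeof(len);

		/* Is there room for the payload the header announces? */
		if (size - cur < len)
			return -1;
		cur += len;
		items++;
	}
	return items;
}
```

Comparing against the remaining bytes (size - cur) rather than computing cur + len avoids an overflow in the check itself when the header holds a very large length.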
|
| | | |
There's a pointer to a buffer, an integer offset and an offset passed as a
pointer; try to find matching names for them.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
|
| | | |
Verify that the size lies between the possible minimum and maximum; validity
of the contents is checked in btrfs_read_sys_array.
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>
|