diff options
author | Chris Mason <chris.mason@oracle.com> | 2011-03-12 07:08:42 -0500 |
---|---|---|
committer | Chris Mason <chris.mason@oracle.com> | 2011-03-12 07:08:42 -0500 |
commit | 36e39c40b3facc9b489a13f1d301fc53ff6960a3 (patch) | |
tree | e009d85998f89ef06d1d96515e3856fa074c4f4f /fs/btrfs/ctree.h | |
parent | 7e6b6465e6efbca3985258996be9c189da96c8bf (diff) |
Btrfs: break out of shrink_delalloc earlier
Josef had changed shrink_delalloc to exit after three shrink
attempts, which wasn't quite enough because new writers could
race in and steal free space.
But it also fixed deadlocks and stalls as we tried to recover
delalloc reservations. The code was tweaked to loop 1024
times, and would reset the counter any time a small amount
of progress was made. This was too drastic, and with a
lot of writers we can end up stuck in shrink_delalloc forever.
The shrink_delalloc loop is fairly complex because the caller is looping
too, and the caller will go ahead and force a transaction commit to make
sure we reclaim space.
This reworks things to exit shrink_delalloc when we've forced some
writeback and the delalloc reservations have gone down. This means
the writeback has not just started but has also finished at
least some of the metadata changes required to reclaim delalloc
space.
If we've got this wrong, we're returning ENOSPC too early, which
is a big improvement over the current behavior of hanging the machine.
Test 224 in xfstests hammers on this nicely, and with 1000 writers
trying to fill a 1GB drive we get our first ENOSPC at 93% full. The
other writers are able to continue until we get 100%.
This is a worst case test for btrfs because the 1000 writers are doing
small IO, and the small FS size means we don't have a lot of room
for metadata chunks.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Diffstat (limited to 'fs/btrfs/ctree.h')
-rw-r--r-- | fs/btrfs/ctree.h | 9 |
1 files changed, 9 insertions, 0 deletions
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index 28188a786da0..8b4b9d158a0a 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h | |||
@@ -729,6 +729,15 @@ struct btrfs_space_info { | |||
729 | u64 disk_total; /* total bytes on disk, takes mirrors into | 729 | u64 disk_total; /* total bytes on disk, takes mirrors into |
730 | account */ | 730 | account */ |
731 | 731 | ||
732 | /* | ||
733 | * we bump reservation progress every time we decrement | ||
734 | * bytes_reserved. This way people waiting for reservations | ||
735 | * know something good has happened and they can check | ||
736 | * for progress. The number here isn't to be trusted, it | ||
737 | * just shows reclaim activity | ||
738 | */ | ||
739 | unsigned long reservation_progress; | ||
740 | |||
732 | int full; /* indicates that we cannot allocate any more | 741 | int full; /* indicates that we cannot allocate any more |
733 | chunks for this space */ | 742 | chunks for this space */ |
734 | int force_alloc; /* set if we need to force a chunk alloc for | 743 | int force_alloc; /* set if we need to force a chunk alloc for |