| Commit message (Collapse) | Author | Age |
... | |
|
|
|
|
|
|
| |
bcache_flash_dev.ktest would reliably crash with 8k and 16k bucket size
before; now it passes.
Change-Id: Ib542232235e39298c3a7548fe52b645cabb823d1
|
|
|
|
|
|
|
|
| |
If register_cache_set() failed, we would touch ca->set after
it had already been freed. Also, fix an assertion to catch
this.
Change-Id: I748e5f5b223e2d9b2602075dec2f997cced2394d
|
|
|
|
| |
Change-Id: I6abde52afe917633480caaf4e2518f42a816d886
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
|
|
|
|
|
|
|
| |
If we goto out_nocoalesce after we free new_nodes[0], we end up freeing
new_nodes[0] again. This was generating a lockdep warning. The fix is
to set new_nodes[0] to NULL, since the out_nocoalesce path safely
ignores NULL entries in the new_nodes array.
This regression was introduced in 2d7f9531.
Change-Id: I76564d7257800583214376b4bacf236cda90c89c
|
|
|
|
|
|
|
|
|
| |
When running with multiple cache devices, if one of the devices has a completely
empty journal but we'd already found some journal entries on a previosu device
we'd go into an infinite loop.
Change-Id: I1dcdc0d738192746de28f40e8b08825b0dea5e2b
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
|
|
| |
'b' was NULL.
Change-Id: Icac0fd04afa2d23f213d96d51afd53374e6dd0c0
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
|
|
|
| |
There's no point in blocking on these allocations, since our fallback paths will
probably go faster than blocking.
Change-Id: I733ca202c25cb36bde02607a0a60552229a4241c
|
|
|
|
|
|
|
|
| |
this was very wrong - mempool_alloc() only guarantees success with GFP_WAIT.
bcache uses GFP_NOWAIT in various other places where we have a fallback,
circuits must've gotten crossed when writing this code or something.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There were two issues here:
- writeback thread did not start until the device first became dirty
- writeback thread used uninterruptible sleep once running
Without this patch I see kernel warnings printed and a load average of
1.52 after booting my test VM. With this patch the warnings are gone and
the load average is near 0.00 as expected.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
|
|
|
|
| |
Tested:
- sometimes bcache_tier test would hang on startup with a failure
to allocate the btree root -- no longer seeing this
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
| |
We never started the writeback thread in this case, so don't stop it.
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
while loop was executing infinitely.
This fix ends the while loop gracefully.
Signed-off-by: Surbhi Palande <sap@daterainc.com>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
|
|
|
| |
journal replay wansn't validating pointers with bch_extent_invalid() before
derefing, fixed
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
|
|
|
| |
After detaching a backing device from a cache set, a bit wasn't getting
reset meaning the second detach wouldn't work correctly.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Mostly scripted conversion of the smp_mb__* barriers.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
| |
Uninlined nested functions can cause crashes when using ftrace, as they don't
follow the normal calling convention and confuse the ftrace function graph
tracer as it examines the stack.
Also, nested functions are supported as a gcc extension, but may fail on other
compilers (e.g. llvm).
Signed-off-by: John Sheu <john.sheu@gmail.com>
|
|
|
|
|
|
|
|
| |
gc_gen was a temporary used to recalculate last_gc, but since we only need
bucket->last_gc when gc isn't running (gc_mark_valid = 1), we can just update
last_gc directly.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
|
|
|
|
| |
This was originally added as at optimization that for various reasons isn't
needed anymore, but it does add a lot of nasty corner cases (and it was
responsible for some recently fixed bugs). Just get rid of it now.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This changes the bucket allocation reserves to use _real_ reserves - separate
freelists - instead of watermarks, which if nothing else makes the current code
saner to reason about and is going to be important in the future when we add
support for multiple btrees.
It also adds btree_check_reserve(), which checks (and locks) the reserves for
both bucket allocation and memory allocation for btree nodes; the old code just
kinda sorta assumed that since (e.g. for btree node splits) it had the root
locked and that meant no other threads could try to make use of the same
reserve; this technically should have been ok for memory allocation (we should
always have a reserve for memory allocation (the btree node cache is used as a
reserve and we preallocate it)), but multiple btrees will mean that locking the
root won't be sufficient anymore, and for the bucket allocation reserve it was
technically possible for the old code to deadlock.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
|
|
|
|
| |
With the locking rework in the last patch, this shouldn't be needed anymore -
btree_node_write_work() only takes b->write_lock which is never held for very
long.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add a new lock, b->write_lock, which is required to actually modify - or write -
a btree node; this lock is only held for short durations.
This means we can write out a btree node without taking b->lock, which _is_ held
for long durations - solving a deadlock when btree_flush_write() (from the
journalling code) is called with a btree node locked.
Right now just occurs in bch_btree_set_root(), but with an upcoming journalling
rework is going to happen a lot more.
This also turns b->lock is now more of a read/intent lock instead of a
read/write lock - but not completely, since it still blocks readers. May turn it
into a real intent lock at some point in the future.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This isn't a bulletproof fix; btree_node_free() -> bch_bucket_free() puts the
bucket on the unused freelist, where it can be reused right away without any
ordering requirements. It would be better to wait on at least a journal write to
go down before reusing the bucket. bch_btree_set_root() does this, and inserting
into non leaf nodes is completely synchronous so we should be ok, but future
patches are just going to get rid of the unused freelist - it was needed in the
past for various reasons but shouldn't be anymore.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
|
|
|
| |
This means the garbage collection code can better check for data and metadata
pointers to the same buckets.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
|
|
|
| |
This will potentially save us an allocation when we've got inode/dirent bkeys
that don't fit in the keylist's inline keys.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
|
|
| |
Break down data into clean data/dirty data/metadata.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
|
|
|
| |
Change the invalidate tracepoint to indicate how much data we're invalidating,
and change the alloc tracepoints to indicate what offset they're for.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
|
|
| |
This hasn't been used or even enabled in ages.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
| |
Signed-off-by: Nicholas Swenson <nks@daterainc.com>
|
|
|
|
|
|
| |
Avoid a potential null pointer deref (e.g. from check keys for cache misses)
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Deadlock happened because a foreground write slept, waiting for a bucket
to be allocated. Normally the gc would mark buckets available for invalidation.
But the moving_gc was stuck waiting for outstanding writes to complete.
These writes used the bcache_wq, the same queue foreground writes used.
This fix gives moving_gc its own work queue, so it was still finish moving
even if foreground writes are stuck waiting for allocation. It also makes
work queue a parameter to the data_insert path, so moving_gc can use its
workqueue for writes.
Signed-off-by: Nicholas Swenson <nks@daterainc.com>
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
|
|
| |
blk_stack_limits() doesn't like a discard granularity of 0.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The on disk bucket gens are allowed to be out of date, when we reuse buckets
that didn't have any live data in them. To deal with this, the initial gc has to
update the bucket gen when we find a pointer gen newer than the bucket's gen.
Unfortunately we weren't doing this for pointers in the journal that we're about
to replay.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
|
|
|
| |
The code to fixup incorrect bucket prios incorrectly did not skip btree node
freeing keys
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
|
|
|
|
| |
On recovery we weren't correctly keeping track of what journal buckets had open
journal entries, thus it was possible for them to be overwritten until we'd
written all new journal entries.
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
| |
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
|
|
| |
Shutdown wasn't cancelling/waiting on journal_write_work()
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
|
|
|
| |
The code was using sectors to count the number of sectors it was zeroing... but
then it passed it to bio_advance()... after it had been set to 0. Amusing...
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
|
|
|
|
|
|
| |
Use a bigger hammer this time
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: linux-stable <stable@vger.kernel.org>
|
|\
| |
| |
| | |
into for-linus
|
| |
| |
| |
| | |
Signed-off-by: Nicholas Swenson <nks@daterainc.com>
|
| |
| |
| |
| | |
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The BUG_ON at the end of __bch_btree_mark_key can be triggered due to
an integer overflow error:
BITMASK(GC_SECTORS_USED, struct bucket, gc_mark, 2, 13);
...
SET_GC_SECTORS_USED(g, min_t(unsigned,
GC_SECTORS_USED(g) + KEY_SIZE(k),
(1 << 14) - 1));
BUG_ON(!GC_SECTORS_USED(g));
In bcache.h, the SECTORS_USED bitfield is defined to be 13 bits wide.
While the SET_ code tries to ensure that the field doesn't overflow by
clamping it to (1<<14)-1 == 16383, this is incorrect because 16383
requires 14 bits. Therefore, if GC_SECTORS_USED() + KEY_SIZE() =
8192, the SET_ statement tries to store 8192 into a 13-bit field. In
a 13-bit field, 8192 becomes zero, thus triggering the BUG_ON.
Therefore, create a field width constant and a max value constant, and
use those to create the bitfield and check the inputs to
SET_GC_SECTORS_USED. Arguably the BITMASK() template ought to have
BUG_ON checks for too-large values, but that's a separate patch.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
|
|\|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull block IO driver changes from Jens Axboe:
- bcache update from Kent Overstreet.
- two bcache fixes from Nicholas Swenson.
- cciss pci init error fix from Andrew.
- underflow fix in the parallel IDE pg_write code from Dan Carpenter.
I'm sure the 1 (or 0) users of that are now happy.
- two PCI related fixes for sx8 from Jingoo Han.
- floppy init fix for first block read from Jiri Kosina.
- pktcdvd error return miss fix from Julia Lawall.
- removal of IRQF_SHARED from the SEGA Dreamcast CD-ROM code from
Michael Opdenacker.
- comment typo fix for the loop driver from Olaf Hering.
- potential oops fix for null_blk from Raghavendra K T.
- two fixes from Sam Bradshaw (Micron) for the mtip32xx driver, fixing
an OOM problem and a problem with handling security locked conditions
* 'for-3.14/drivers' of git://git.kernel.dk/linux-block: (47 commits)
mg_disk: Spelling s/finised/finished/
null_blk: Null pointer deference problem in alloc_page_buffers
mtip32xx: Correctly handle security locked condition
mtip32xx: Make SGL container per-command to eliminate high order dma allocation
drivers/block/loop.c: fix comment typo in loop_config_discard
drivers/block/cciss.c:cciss_init_one(): use proper errnos
drivers/block/paride/pg.c: underflow bug in pg_write()
drivers/block/sx8.c: remove unnecessary pci_set_drvdata()
drivers/block/sx8.c: use module_pci_driver()
floppy: bail out in open() if drive is not responding to block0 read
bcache: Fix auxiliary search trees for key size > cacheline size
bcache: Don't return -EINTR when insert finished
bcache: Improve bucket_prio() calculation
bcache: Add bch_bkey_equal_header()
bcache: update bch_bkey_try_merge
bcache: Move insert_fixup() to btree_keys_ops
bcache: Convert sorting to btree_keys
bcache: Convert debug code to btree_keys
bcache: Convert btree_iter to struct btree_keys
bcache: Refactor bset_tree sysfs stats
...
|
| |
| |
| |
| | |
Signed-off-by: Kent Overstreet <kmo@daterainc.com>
|