bcache: Documentation updates

Signed-off-by: Kent Overstreet <koverstreet@google.com>
author: Kent Overstreet <koverstreet@google.com> 2013-03-27 15:24:17 -0400
committer: Kent Overstreet <koverstreet@google.com> 2013-04-08 16:33:48 -0400
commit: 7b41b51a705ec0eb5f88060c9f724c8bc0e79eab (patch)
tree: 94f9705bad438d8710b7d67baef1334ccb6819fa /Documentation/bcache.txt
parent: cc0f4eaa61817aaea6e61a820f3f1c500a5542b1 (diff)
1 files changed, 88 insertions, 0 deletions
diff --git a/Documentation/bcache.txt b/Documentation/bcache.txt
index 533307d52c87..77db8809bd96 100644
--- a/Documentation/bcache.txt
+++ b/Documentation/bcache.txt
@@ -101,6 +101,94 @@ but all the cached data will be invalidated. If there was dirty data in the
 cache, don't expect the filesystem to be recoverable - you will have massive
 filesystem corruption, though ext4's fsck does work miracles.
+ERROR HANDLING:
+
+Bcache tries to transparently handle IO errors to/from the cache device without
+affecting normal operation; if it sees too many errors (the threshold is
+configurable, and defaults to 0) it shuts down the cache device and switches all
+the backing devices to passthrough mode.
+
+ - For reads from the cache, if they error we just retry the read from the
+   backing device.
+
+ - For writethrough writes, if the write to the cache errors we just switch to
+   invalidating the data at that LBA in the cache (i.e. the same thing we do for
+   a write that bypasses the cache).
+
+ - For writeback writes, we currently pass that error back up to the
+   filesystem/userspace. This could be improved - we could retry it as a write
+   that skips the cache so we don't have to error the write.
+
+ - When we detach, we first try to flush any dirty data (if we were running in
+   writeback mode). It currently doesn't do anything intelligent if it fails to
+   read some of the dirty data, though.
+
+TROUBLESHOOTING PERFORMANCE:
+
+Bcache has a bunch of config options and tunables. The defaults are intended to
+be reasonable for typical desktop and server workloads, but they're not what you
+want for getting the best possible numbers when benchmarking.
+
+ - Bad write performance
+
+   If write performance is not what you expected, you probably wanted to be
+   running in writeback mode, which isn't the default (not due to a lack of
+   maturity, but simply because in writeback mode you'll lose data if something
+   happens to your SSD).
+
+   # echo writeback > /sys/block/bcache0/bcache/cache_mode
+
+ - Bad performance, or traffic not going to the SSD that you'd expect
+
+   By default, bcache doesn't cache everything. It tries to skip sequential IO -
+   because you really want to be caching the random IO, and if you copy a 10
+   gigabyte file you probably don't want that pushing 10 gigabytes of randomly
+   accessed data out of your cache.
+
+   But if you want to benchmark reads from cache, and you start out with fio
+   writing an 8 gigabyte test file, that sequential write will bypass the
+   cache - so you want to disable the sequential cutoff.
+
+   # echo 0 > /sys/block/bcache0/bcache/sequential_cutoff
+
+   To set it back to the default (4 MB), do
+
+   # echo 4M > /sys/block/bcache0/bcache/sequential_cutoff
+
+ - Traffic's still going to the spindle/still getting cache misses
+
+   In the real world, SSDs don't always keep up with disks - particularly with
+   slower SSDs, many disks being cached by one SSD, or mostly sequential IO. So
+   you want to avoid being bottlenecked by the SSD and having it slow everything
+   down.
+
+   To avoid that, bcache tracks latency to the cache device, and gradually
+   throttles traffic if the latency exceeds a threshold (it does this by
+   cranking down the sequential bypass).
+
+   You can disable this if you need to by setting the thresholds to 0:
+
+   # echo 0 > /sys/fs/bcache/<cache set>/congested_read_threshold_us
+   # echo 0 > /sys/fs/bcache/<cache set>/congested_write_threshold_us
+
+   The default is 2000 us (2 milliseconds) for reads, and 20000 us (20
+   milliseconds) for writes.
+
+ - Still getting cache misses, of the same data
+
+   One last issue that sometimes trips people up is actually an old bug, due to
+   the way cache coherency is handled for cache misses. If a btree node is full,
+   a cache miss won't be able to insert a key for the new data and the data
+   won't be written to the cache.
+
+   In practice this isn't an issue because as soon as a write comes along it'll
+   cause the btree node to be split, and it takes almost no write traffic to
+   keep this from being noticeable (especially since bcache's btree nodes are
+   huge and index large regions of the device). But when you're benchmarking,
+   if you're trying to warm the cache by reading a bunch of data and there's no
+   other traffic - that can be a problem.
+
+   Solution: warm the cache by doing writes, or use the testing branch (there's
+   a fix for the issue there).
+
 SYSFS - BACKING DEVICE:
 attach
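
The benchmarking advice in the patch above can be collected into a small shell helper. This is only a sketch: the function names (`write_attr`, `bcache_bench_tune`, `bcache_bench_restore`) and the idea of passing the sysfs directories as arguments are assumptions made here for illustration, and it assumes `cache_mode` lives in the same per-device directory as `sequential_cutoff`. Only the attribute names and default values come from the patch text.

```shell
#!/bin/sh
# Sketch: apply the benchmark-oriented settings described in the patch.
# Function names and directory arguments are hypothetical; only the sysfs
# attribute names and the stated defaults are taken from the patch text.

# write_attr VALUE FILE: write VALUE to an attribute file, failing loudly
# if the file does not exist (for example a wrong device or cache set path).
write_attr() {
    if [ ! -e "$2" ]; then
        echo "missing attribute: $2" >&2
        return 1
    fi
    printf '%s\n' "$1" > "$2"
}

# bcache_bench_tune BDEV_DIR CSET_DIR
#   BDEV_DIR: per-device directory (assumed: /sys/block/bcache0/bcache)
#   CSET_DIR: cache set directory  (/sys/fs/bcache/<cache set>)
# Enables writeback, disables the sequential cutoff, and disables
# congestion throttling by zeroing both thresholds.
bcache_bench_tune() {
    write_attr writeback "$1/cache_mode" &&
    write_attr 0 "$1/sequential_cutoff" &&
    write_attr 0 "$2/congested_read_threshold_us" &&
    write_attr 0 "$2/congested_write_threshold_us"
}

# bcache_bench_restore BDEV_DIR CSET_DIR
# Puts back the defaults the patch states (4M, 2000 us, 20000 us).
# cache_mode is left alone because the patch does not name its default.
bcache_bench_restore() {
    write_attr 4M "$1/sequential_cutoff" &&
    write_attr 2000 "$2/congested_read_threshold_us" &&
    write_attr 20000 "$2/congested_write_threshold_us"
}
```

Run as root, a call would look like `bcache_bench_tune /sys/block/bcache0/bcache "/sys/fs/bcache/<cache set>"` with the real cache set directory substituted. Taking the directories as arguments keeps the helper usable for devices other than bcache0 and lets it be tried against a scratch directory first.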