author     Kent Overstreet <koverstreet@google.com>  2013-03-27 15:24:17 -0400
committer  Kent Overstreet <koverstreet@google.com>  2013-04-08 16:33:48 -0400
commit     7b41b51a705ec0eb5f88060c9f724c8bc0e79eab
tree       94f9705bad438d8710b7d67baef1334ccb6819fa /Documentation/bcache.txt
parent     cc0f4eaa61817aaea6e61a820f3f1c500a5542b1
bcache: Documentation updates
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Diffstat (limited to 'Documentation/bcache.txt')
-rw-r--r-- Documentation/bcache.txt 88
1 files changed, 88 insertions, 0 deletions
diff --git a/Documentation/bcache.txt b/Documentation/bcache.txt
index 533307d52c87..77db8809bd96 100644
--- a/Documentation/bcache.txt
+++ b/Documentation/bcache.txt
@@ -101,6 +101,94 @@ but all the cached data will be invalidated. If there was dirty data in the
cache, don't expect the filesystem to be recoverable - you will have massive
filesystem corruption, though ext4's fsck does work miracles.

ERROR HANDLING:

Bcache tries to transparently handle IO errors to/from the cache device without
affecting normal operation; if it sees too many errors (the threshold is
configurable, and defaults to 0) it shuts down the cache device and switches all
the backing devices to passthrough mode.

 - For reads from the cache, if they error we just retry the read from the
   backing device.

 - For writethrough writes, if the write to the cache errors we just switch to
   invalidating the data at that LBA in the cache (i.e. the same thing we do
   for a write that bypasses the cache).

 - For writeback writes, we currently pass that error back up to the
   filesystem/userspace. This could be improved - we could retry it as a write
   that skips the cache so we don't have to error the write.

 - When we detach, we first try to flush any dirty data (if we were running in
   writeback mode). It currently doesn't do anything intelligent if it fails to
   read some of the dirty data, though.
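
The rules above can be summarized as a small state machine. What follows is a
minimal Python model, not bcache's kernel code. The class and method names
(CacheErrorPolicy, read_error, write_error) are hypothetical. The model assumes
the cache device is shut down once the error count exceeds the configured limit.

```python
from __future__ import annotations

from enum import Enum


class Mode(Enum):
    WRITETHROUGH = "writethrough"
    WRITEBACK = "writeback"
    PASSTHROUGH = "passthrough"  # cache device shut down, IO goes to backing device


class CacheErrorPolicy:
    """Toy model of the error-handling rules above (hypothetical names)."""

    def __init__(self, mode: Mode, error_limit: int) -> None:
        self.mode = mode
        self.error_limit = error_limit  # errors tolerated before shutdown
        self.errors = 0
        self.invalidated: set[int] = set()  # LBAs whose cached copy was dropped

    def _count_error(self) -> None:
        self.errors += 1
        if self.errors > self.error_limit:
            # Too many errors: stop using the cache device altogether.
            self.mode = Mode.PASSTHROUGH

    def read_error(self, lba: int) -> str:
        # A failed read from the cache is retried from the backing device.
        self._count_error()
        return "retry from backing device"

    def write_error(self, lba: int) -> str:
        was_writeback = self.mode is Mode.WRITEBACK  # decide before counting
        self._count_error()
        if was_writeback:
            # Writeback: the error is passed back up to the caller.
            raise OSError(f"cache write failed at LBA {lba}")
        # Writethrough: invalidate the cached copy, as for a bypassed write.
        self.invalidated.add(lba)
        return "invalidated"
```

The writeback path raises instead of invalidating. This mirrors the current
behaviour described above, not the suggested improvement of retrying as a
bypass write.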

TROUBLESHOOTING PERFORMANCE:

Bcache has a bunch of config options and tunables. The defaults are intended to
be reasonable for typical desktop and server workloads, but they're not what you
want for getting the best possible numbers when benchmarking.

 - Bad write performance

   If write performance is not what you expected, you probably wanted to be
   running in writeback mode, which isn't the default (not due to a lack of
   maturity, but simply because in writeback mode you'll lose data if something
   happens to your SSD).

   # echo writeback > /sys/block/bcache0/bcache/cache_mode
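
   To confirm which mode a device is in, read the attribute back. The sketch
   below makes two assumptions. The first is that the file lists every
   available mode on one line with the active one in square brackets
   (e.g. "writethrough [writeback] writearound none"). The second is that it
   sits in the backing device's bcache directory, next to sequential_cutoff.
   read_cache_mode and active_cache_mode are hypothetical helpers.

```python
import re
from pathlib import Path


def active_cache_mode(listing: str) -> str:
    """Return the bracketed (active) entry from a cache_mode listing."""
    match = re.search(r"\[([^\]]+)\]", listing)
    if match is None:
        raise ValueError(f"no active mode in {listing!r}")
    return match.group(1)


def read_cache_mode(dev: str = "bcache0") -> str:
    # Assumed location; adjust if your kernel exposes cache_mode elsewhere.
    attr = Path(f"/sys/block/{dev}/bcache/cache_mode")
    return active_cache_mode(attr.read_text())
```

   After the echo above, read_cache_mode() would be expected to return
   "writeback", given the assumptions stated here.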

 - Bad performance, or traffic not going to the SSD that you'd expect

   By default, bcache doesn't cache everything. It tries to skip sequential IO,
   because you really want to be caching the random IO, and if you copy a 10
   gigabyte file you probably don't want that pushing 10 gigabytes of randomly
   accessed data out of your cache.

   But if you want to benchmark reads from cache, and you start out with fio
   writing an 8 gigabyte test file, you'll want to disable that:

   # echo 0 > /sys/block/bcache0/bcache/sequential_cutoff

   To set it back to the default (4 MB), do

   # echo 4M > /sys/block/bcache0/bcache/sequential_cutoff
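
   The bypass decision can be pictured with a simplified, single-stream Python
   model. This is not bcache's actual detection code. A request that starts
   where the previous one ended extends the current run, and once the run
   reaches the cutoff the IO skips the cache. A cutoff of 0 disables the
   bypass, as with the first echo above. The names are hypothetical, and
   parse_size assumes the k/M/G suffixes are powers of 1024.

```python
from dataclasses import dataclass

DEFAULT_CUTOFF = 4 << 20  # the 4M default mentioned above, in bytes


def parse_size(text: str) -> int:
    """Parse values such as '4M' or '0' (assumes 1024-based k/M/G suffixes)."""
    text = text.strip()
    shifts = {"k": 10, "m": 20, "g": 30}
    suffix = text[-1:].lower()
    if suffix in shifts:
        return int(text[:-1]) << shifts[suffix]
    return int(text)


@dataclass
class SequentialTracker:
    """Single-stream model of the sequential bypass (hypothetical names)."""

    cutoff: int = DEFAULT_CUTOFF  # bytes; 0 disables the bypass
    next_offset: int = -1         # byte offset where the previous request ended
    run_bytes: int = 0            # size of the current sequential run

    def should_bypass(self, offset: int, length: int) -> bool:
        if offset == self.next_offset:
            self.run_bytes += length  # continues the previous request
        else:
            self.run_bytes = length   # a new, non-contiguous access
        self.next_offset = offset + length
        # Long sequential runs go straight to the backing device.
        return self.cutoff != 0 and self.run_bytes >= self.cutoff
```

   In this model, a large file copy is cached only until its run reaches the
   cutoff, while scattered small reads always keep resetting the run.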

 - Traffic's still going to the spindle/still getting cache misses

   In the real world, SSDs don't always keep up with disks - particularly with
   slower SSDs, many disks being cached by one SSD, or mostly sequential IO. So
   you want to avoid being bottlenecked by the SSD and having it slow everything
   down.

   To avoid that, bcache tracks latency to the cache device, and gradually
   throttles traffic if the latency exceeds a threshold (it does this by
   cranking down the sequential bypass).

   You can disable this if you need to by setting the thresholds to 0:

   # echo 0 > /sys/fs/bcache/<cache set>/congested_read_threshold_us
   # echo 0 > /sys/fs/bcache/<cache set>/congested_write_threshold_us

   The default is 2000 us (2 milliseconds) for reads, and 20000 us (20
   milliseconds) for writes.
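
   One way to picture "gradually throttles" is a running average of cache
   latency that shrinks the effective sequential cutoff as it climbs past the
   threshold. The Python sketch below is purely illustrative: the moving
   average, its weight and the proportional formula are assumptions, not
   bcache's algorithm. Only the default thresholds and the "0 disables" rule
   come from the text above.

```python
from dataclasses import dataclass


@dataclass
class CongestionThrottle:
    """Illustrative latency-based throttle (hypothetical names and formula)."""

    read_threshold_us: int = 2000    # default of congested_read_threshold_us
    write_threshold_us: int = 20000  # default of congested_write_threshold_us
    weight: float = 0.25             # share of each new sample in the average
    avg_read_us: float = 0.0
    avg_write_us: float = 0.0

    def record(self, latency_us: float, is_write: bool) -> None:
        # Exponentially weighted moving average of cache-device latency.
        if is_write:
            self.avg_write_us += self.weight * (latency_us - self.avg_write_us)
        else:
            self.avg_read_us += self.weight * (latency_us - self.avg_read_us)

    def effective_cutoff(self, cutoff: int, is_write: bool) -> int:
        threshold = self.write_threshold_us if is_write else self.read_threshold_us
        avg = self.avg_write_us if is_write else self.avg_read_us
        if threshold == 0 or avg <= threshold:
            return cutoff  # throttling disabled, or latency is acceptable
        # The further latency overshoots, the smaller the sequential cutoff gets,
        # so more IO bypasses the SSD.
        return int(cutoff * threshold / avg)
```

   Lowering the cutoff is what "cranking down the sequential bypass" amounts
   to here: shorter sequential runs now qualify for bypass, which takes load
   off a struggling SSD.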

 - Still getting cache misses, of the same data

   One last issue that sometimes trips people up is actually an old bug, due to
   the way cache coherency is handled for cache misses. If a btree node is full,
   a cache miss won't be able to insert a key for the new data and the data
   won't be written to the cache.

   In practice this isn't an issue because as soon as a write comes along it'll
   cause the btree node to be split, so even a trickle of write traffic is
   enough to make this unnoticeable (especially since bcache's btree nodes are
   huge and index large regions of the device). But when you're benchmarking,
   if you're trying to warm the cache by reading a bunch of data and there's no
   other traffic, that can be a problem.

   Solution: warm the cache by doing writes, or use the testing branch (there's
   a fix for the issue there).
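
   The failure mode can be reproduced with a toy index. In this hypothetical
   Python model, each leaf holds at most "capacity" keys. The read-miss path
   may only insert into a leaf that has room, while the write path is allowed
   to split a full leaf. Real bcache btree nodes are far larger and the insert
   path differs; the model only shows the asymmetry described above.

```python
from __future__ import annotations

import bisect


class ToyLeafIndex:
    """Hypothetical model: each leaf is a sorted list of at most `capacity` keys."""

    def __init__(self, capacity: int) -> None:
        if capacity < 2:
            raise ValueError("capacity must be at least 2 so a split frees room")
        self.capacity = capacity
        self.leaves: list[list[int]] = [[]]  # leaves cover ascending key ranges

    def _leaf_index(self, key: int) -> int:
        # Pick the last leaf whose first key is <= key (or the first leaf).
        starts = [leaf[0] if leaf else 0 for leaf in self.leaves]
        return max(bisect.bisect_right(starts, key) - 1, 0)

    def _insert(self, key: int, may_split: bool) -> bool:
        i = self._leaf_index(key)
        leaf = self.leaves[i]
        if key in leaf:
            return True
        if len(leaf) >= self.capacity:
            if not may_split:
                return False  # read-miss path: full leaf, data is not cached
            mid = len(leaf) // 2
            self.leaves[i:i + 1] = [leaf[:mid], leaf[mid:]]  # split the leaf
            i = self._leaf_index(key)
        bisect.insort(self.leaves[i], key)
        return True

    def cache_read_miss(self, key: int) -> bool:
        return self._insert(key, may_split=False)

    def write(self, key: int) -> bool:
        return self._insert(key, may_split=True)
```

   With capacity 4, filling a leaf with four writes makes the next read miss
   fail. One more write into that leaf splits it, after which the same read
   miss succeeds. This is why warming the cache with writes works around the
   problem.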

SYSFS - BACKING DEVICE:

attach