author | Marc MERLIN <marc@merlins.org> | 2016-03-12 02:04:19 -0500 |
---|---|---|
committer | Jonathan Corbet <corbet@lwn.net> | 2016-06-23 09:57:03 -0400 |
commit | c9b2ffc02287771e70eb27132c47df6d5c7d91fb (patch) | |
tree | 97f4caf20b461288499f88456b4eb5c512b4ab8c | |
parent | ebc88ef05c825024a5d95285459b8c842c095c0f (diff) |
bcache: documentation updates and corrections
Bcache documentation updates:
- Added new HOWTO/COOKBOOK section
- fixed a few typos
- /sys/block/bcache0/cache_mode is /sys/block/bcache0/bcache/cache_mode
Signed-off-by: Marc MERLIN <marc@merlins.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
-rw-r--r-- | Documentation/bcache.txt | 160 |
1 files changed, 152 insertions, 8 deletions
diff --git a/Documentation/bcache.txt b/Documentation/bcache.txt index 32b6c3189d98..0aba405d3368 100644 --- a/Documentation/bcache.txt +++ b/Documentation/bcache.txt | |||
@@ -1,4 +1,4 @@ | |||
1 | Say you've got a big slow raid 6, and an X-25E or three. Wouldn't it be | 1 | Say you've got a big slow raid 6, and an ssd or three. Wouldn't it be |
2 | nice if you could use them as cache... Hence bcache. | 2 | nice if you could use them as cache... Hence bcache. |
3 | 3 | ||
4 | Wiki and git repositories are at: | 4 | Wiki and git repositories are at: |
@@ -8,7 +8,7 @@ Wiki and git repositories are at: | |||
8 | 8 | ||
9 | It's designed around the performance characteristics of SSDs - it only allocates | 9 | It's designed around the performance characteristics of SSDs - it only allocates |
10 | in erase block sized buckets, and it uses a hybrid btree/log to track cached | 10 | in erase block sized buckets, and it uses a hybrid btree/log to track cached |
11 | extants (which can be anywhere from a single sector to the bucket size). It's | 11 | extents (which can be anywhere from a single sector to the bucket size). It's |
12 | designed to avoid random writes at all costs; it fills up an erase block | 12 | designed to avoid random writes at all costs; it fills up an erase block |
13 | sequentially, then issues a discard before reusing it. | 13 | sequentially, then issues a discard before reusing it. |
14 | 14 | ||
@@ -55,7 +55,10 @@ immediately. Without udev, you can manually register devices like this: | |||
55 | Registering the backing device makes the bcache device show up in /dev; you can | 55 | Registering the backing device makes the bcache device show up in /dev; you can |
56 | now format it and use it as normal. But the first time using a new bcache | 56 | now format it and use it as normal. But the first time using a new bcache |
57 | device, it'll be running in passthrough mode until you attach it to a cache. | 57 | device, it'll be running in passthrough mode until you attach it to a cache. |
58 | See the section on attaching. | 58 | If you are thinking about using bcache later, it is recommended to set up all your |
59 | slow devices as bcache backing devices without a cache, and you can choose to add | ||
60 | a caching device later. | ||
61 | See 'ATTACHING' section below. | ||
59 | 62 | ||
60 | The devices show up as: | 63 | The devices show up as: |
61 | 64 | ||
@@ -72,12 +75,14 @@ To get started: | |||
72 | mount /dev/bcache0 /mnt | 75 | mount /dev/bcache0 /mnt |
73 | 76 | ||
74 | You can control bcache devices through sysfs at /sys/block/bcache<N>/bcache . | 77 | You can control bcache devices through sysfs at /sys/block/bcache<N>/bcache . |
78 | You can also control them through /sys/fs/bcache/<cset-uuid>/ . | ||
75 | 79 | ||
76 | Cache devices are managed as sets; multiple caches per set isn't supported yet | 80 | Cache devices are managed as sets; multiple caches per set isn't supported yet |
77 | but will allow for mirroring of metadata and dirty data in the future. Your new | 81 | but will allow for mirroring of metadata and dirty data in the future. Your new |
78 | cache set shows up as /sys/fs/bcache/<UUID> | 82 | cache set shows up as /sys/fs/bcache/<UUID> |
79 | 83 | ||
80 | ATTACHING: | 84 | ATTACHING |
85 | --------- | ||
81 | 86 | ||
82 | After your cache device and backing device are registered, the backing device | 87 | After your cache device and backing device are registered, the backing device |
83 | must be attached to your cache set to enable caching. Attaching a backing | 88 | must be attached to your cache set to enable caching. Attaching a backing |
@@ -105,7 +110,8 @@ but all the cached data will be invalidated. If there was dirty data in the | |||
105 | cache, don't expect the filesystem to be recoverable - you will have massive | 110 | cache, don't expect the filesystem to be recoverable - you will have massive |
106 | filesystem corruption, though ext4's fsck does work miracles. | 111 | filesystem corruption, though ext4's fsck does work miracles. |
107 | 112 | ||
108 | ERROR HANDLING: | 113 | ERROR HANDLING |
114 | -------------- | ||
109 | 115 | ||
110 | Bcache tries to transparently handle IO errors to/from the cache device without | 116 | Bcache tries to transparently handle IO errors to/from the cache device without |
111 | affecting normal operation; if it sees too many errors (the threshold is | 117 | affecting normal operation; if it sees too many errors (the threshold is |
@@ -127,7 +133,143 @@ the backing devices to passthrough mode. | |||
127 | writeback mode). It currently doesn't do anything intelligent if it fails to | 133 | writeback mode). It currently doesn't do anything intelligent if it fails to |
128 | read some of the dirty data, though. | 134 | read some of the dirty data, though. |
129 | 135 | ||
130 | TROUBLESHOOTING PERFORMANCE: | 136 | |
137 | HOWTO/COOKBOOK | ||
138 | -------------- | ||
139 | |||
140 | A) Your bcache doesn't start. | ||
141 | Starting a bcache with a missing caching device | ||
142 | |||
143 | Registering the backing device doesn't help, it's already there, you just need | ||
144 | to force it to run without the cache: | ||
145 | host:~# echo /dev/sdb1 > /sys/fs/bcache/register | ||
146 | [ 119.844831] bcache: register_bcache() error opening /dev/sdb1: device already registered | ||
147 | |||
148 | Next, you try to register your caching device if it's present. However if it's | ||
149 | absent, or registration fails for some reason, you can still start your bcache | ||
150 | without its cache, like so: | ||
151 | host:/sys/block/sdb/sdb1/bcache# echo 1 > running | ||
152 | |||
153 | |||
154 | B) Bcache not finding its cache and not starting | ||
155 | |||
156 | This does not work: | ||
157 | host:/sys/block/md5/bcache# echo 0226553a-37cf-41d5-b3ce-8b1e944543a8 > attach | ||
158 | [ 1933.455082] bcache: bch_cached_dev_attach() Couldn't find uuid for md5 in set | ||
159 | [ 1933.478179] bcache: __cached_dev_store() Can't attach 0226553a-37cf-41d5-b3ce-8b1e944543a8 | ||
160 | [ 1933.478179] : cache set not found | ||
161 | |||
162 | In this case, the caching device was simply not registered at boot or | ||
163 | disappeared and came back, and needs to be (re-)registered: | ||
164 | host:/sys/block/md5/bcache# echo /dev/sdh2 > /sys/fs/bcache/register | ||
165 | |||
166 | |||
167 | C) Corrupt bcache caching device crashes the kernel on startup/boot | ||
168 | |||
169 | You'll have to wipe the caching device, start the backing device without the | ||
170 | cache, and you can re-attach the cleaned up caching device then. This does | ||
171 | require booting with a kernel/rescue media where bcache is disabled | ||
172 | since it will otherwise try to access your device and probably crash | ||
173 | again before you have a chance to wipe it. | ||
174 | (or if you plan ahead, compile a backup kernel with bcache disabled and keep it | ||
175 | in your grub config for a rainy day) | ||
176 | If bcache is not available in the kernel, a filesystem on the backing device is | ||
177 | still available at an 8KiB offset. You can reach it either via a loopdev of the | ||
178 | backing device created with --offset 8K, or by temporarily increasing the start | ||
179 | sector of the partition by 16 (512-byte sectors). | ||
180 | |||
181 | This is how you wipe the caching device: | ||
182 | host:~# wipefs -a /dev/sdh2 | ||
183 | 16 bytes were erased at offset 0x1018 (bcache) | ||
184 | they were: c6 85 73 f6 4e 1a 45 ca 82 65 f5 7f 48 ba 6d 81 | ||
185 | |||
186 | After you boot back with bcache enabled, you recreate the cache and attach it: | ||
187 | host:~# make-bcache -C /dev/sdh2 | ||
188 | UUID: 7be7e175-8f4c-4f99-94b2-9c904d227045 | ||
189 | Set UUID: 5bc072a8-ab17-446d-9744-e247949913c1 | ||
190 | version: 0 | ||
191 | nbuckets: 106874 | ||
192 | block_size: 1 | ||
193 | bucket_size: 1024 | ||
194 | nr_in_set: 1 | ||
195 | nr_this_dev: 0 | ||
196 | first_bucket: 1 | ||
197 | [ 650.511912] bcache: run_cache_set() invalidating existing data | ||
198 | [ 650.549228] bcache: register_cache() registered cache device sdh2 | ||
199 | |||
200 | start backing device with missing cache: | ||
201 | host:/sys/block/md5/bcache# echo 1 > running | ||
202 | |||
203 | attach new cache: | ||
204 | host:/sys/block/md5/bcache# echo 5bc072a8-ab17-446d-9744-e247949913c1 > attach | ||
205 | [ 865.276616] bcache: bch_cached_dev_attach() Caching md5 as bcache0 on set 5bc072a8-ab17-446d-9744-e247949913c1 | ||
206 | |||
207 | |||
208 | D) Remove or replace a caching device | ||
209 | |||
210 | host:/sys/block/sda/sda7/bcache# echo 1 > detach | ||
211 | [ 695.872542] bcache: cached_dev_detach_finish() Caching disabled for sda7 | ||
212 | |||
213 | host:~# wipefs -a /dev/nvme0n1p4 | ||
214 | wipefs: error: /dev/nvme0n1p4: probing initialization failed: Device or resource busy | ||
215 | Ooops, it's disabled, but not unregistered, so it's still protected | ||
216 | |||
217 | We need to go and unregister it: | ||
218 | host:/sys/fs/bcache/b7ba27a1-2398-4649-8ae3-0959f57ba128# ls -l cache0 | ||
219 | lrwxrwxrwx 1 root root 0 Feb 25 18:33 cache0 -> ../../../devices/pci0000:00/0000:00:1d.0/0000:70:00.0/nvme/nvme0/nvme0n1/nvme0n1p4/bcache/ | ||
220 | host:/sys/fs/bcache/b7ba27a1-2398-4649-8ae3-0959f57ba128# echo 1 > stop | ||
221 | kernel: [ 917.041908] bcache: cache_set_free() Cache set b7ba27a1-2398-4649-8ae3-0959f57ba128 unregistered | ||
222 | |||
223 | Now we can wipe it: | ||
224 | host:~# wipefs -a /dev/nvme0n1p4 | ||
225 | /dev/nvme0n1p4: 16 bytes were erased at offset 0x00001018 (bcache): c6 85 73 f6 4e 1a 45 ca 82 65 f5 7f 48 ba 6d 81 | ||
226 | |||
227 | |||
228 | E) dmcrypt and bcache | ||
229 | |||
230 | First set up bcache unencrypted and then install dmcrypt on top of /dev/bcache<N> | ||
231 | This will work faster than if you dmcrypt both the backing and caching | ||
232 | devices and then install bcache on top. | ||
233 | |||
234 | |||
235 | F) Stop/free a registered bcache to wipe and/or recreate it | ||
236 | (or maybe you need to free up all bcache references so that you can have fdisk | ||
237 | run and re-register a changed partition table, which won't work if there are any | ||
238 | active backing or caching devices left on it) | ||
239 | |||
240 | 1) Is it present in /dev/bcache* ? (there are times where it won't be) | ||
241 | If so, it's easy: | ||
242 | host:/sys/block/bcache0/bcache# echo 1 > stop | ||
243 | |||
244 | 2) But if your backing device is gone, this won't work: | ||
245 | host:/sys/block/bcache0# cd bcache | ||
246 | bash: cd: bcache: No such file or directory | ||
247 | |||
248 | In this case, you may have to unregister the dmcrypt block device that | ||
249 | references this bcache to free it up: | ||
250 | host:~# dmsetup remove oldds1 | ||
251 | bcache: bcache_device_free() bcache0 stopped | ||
252 | bcache: cache_set_free() Cache set 5bc072a8-ab17-446d-9744-e247949913c1 unregistered | ||
253 | |||
254 | This causes the backing bcache to be removed from /sys/fs/bcache and then it can | ||
255 | be reused | ||
256 | |||
257 | 3) In other cases, you can also look in /sys/fs/bcache/: | ||
258 | host:/sys/fs/bcache# ls -l */{cache?,bdev?} | ||
259 | lrwxrwxrwx 1 root root 0 Mar 5 09:39 0226553a-37cf-41d5-b3ce-8b1e944543a8/bdev1 -> ../../../devices/virtual/block/dm-1/bcache/ | ||
260 | lrwxrwxrwx 1 root root 0 Mar 5 09:39 0226553a-37cf-41d5-b3ce-8b1e944543a8/cache0 -> ../../../devices/virtual/block/dm-4/bcache/ | ||
261 | lrwxrwxrwx 1 root root 0 Mar 5 09:39 5bc072a8-ab17-446d-9744-e247949913c1/cache0 -> ../../../devices/pci0000:00/0000:00:01.0/0000:01:00.0/ata10/host9/target9:0:0/9:0:0:0/block/sdl/sdl2/bcache/ | ||
262 | |||
263 | The device names will show which UUID is relevant, cd in that directory | ||
264 | and stop the cache: | ||
265 | host:/sys/fs/bcache/5bc072a8-ab17-446d-9744-e247949913c1# echo 1 > stop | ||
266 | this will free up bcache references and let you reuse the partition for other | ||
267 | purposes. | ||
268 | |||
269 | |||
270 | |||
271 | TROUBLESHOOTING PERFORMANCE | ||
272 | --------------------------- | ||
131 | 273 | ||
132 | Bcache has a bunch of config options and tunables. The defaults are intended to | 274 | Bcache has a bunch of config options and tunables. The defaults are intended to |
133 | be reasonable for typical desktop and server workloads, but they're not what you | 275 | be reasonable for typical desktop and server workloads, but they're not what you |
@@ -140,7 +282,7 @@ want for getting the best possible numbers when benchmarking. | |||
140 | maturity, but simply because in writeback mode you'll lose data if something | 282 | maturity, but simply because in writeback mode you'll lose data if something |
141 | happens to your SSD) | 283 | happens to your SSD) |
142 | 284 | ||
143 | # echo writeback > /sys/block/bcache0/cache_mode | 285 | # echo writeback > /sys/block/bcache0/bcache/cache_mode |
144 | 286 | ||
145 | - Bad performance, or traffic not going to the SSD that you'd expect | 287 | - Bad performance, or traffic not going to the SSD that you'd expect |
146 | 288 | ||
@@ -193,7 +335,9 @@ want for getting the best possible numbers when benchmarking. | |||
193 | Solution: warm the cache by doing writes, or use the testing branch (there's | 335 | Solution: warm the cache by doing writes, or use the testing branch (there's |
194 | a fix for the issue there). | 336 | a fix for the issue there). |
195 | 337 | ||
196 | SYSFS - BACKING DEVICE: | 338 | |
339 | SYSFS - BACKING DEVICE | ||
340 | ---------------------- | ||
197 | 341 | ||
198 | Available at /sys/block/<bdev>/bcache, /sys/block/bcache*/bcache and | 342 | Available at /sys/block/<bdev>/bcache, /sys/block/bcache*/bcache and |
199 | (if attached) /sys/fs/bcache/<cset-uuid>/bdev* | 343 | (if attached) /sys/fs/bcache/<cset-uuid>/bdev* |
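
Section C of the HOWTO added by this patch says that a filesystem on the backing device sits at an 8KiB offset, reachable through a loop device created with --offset 8K or by moving the partition start forward by 16 sectors of 512 bytes. The sketch below only checks that arithmetic and prints a candidate losetup command instead of running it. The function names, the read-only flag and the /dev/sdb1 path are illustrative assumptions, and the 8KiB figure is taken from the text; a backing device formatted with a different data offset would need a different value.

```shell
#!/bin/sh
# Sketch only: nothing here touches a block device.

# Offset of the filesystem inside the backing device, per the HOWTO text (8KiB).
DATA_OFFSET_BYTES=8192
# Sector size the HOWTO assumes when it says "16 (512byte sectors)".
SECTOR_BYTES=512

# Print how many sectors the partition start would have to move forward.
offset_sectors() {
    echo $(( DATA_OFFSET_BYTES / SECTOR_BYTES ))
}

# Print (do not execute) a losetup invocation that would expose the filesystem.
# $1 is the backing device, supplied by the caller (hypothetical: /dev/sdb1).
print_losetup_cmd() {
    printf 'losetup --find --show --read-only --offset %s %s\n' \
        "$DATA_OFFSET_BYTES" "$1"
}

offset_sectors
print_losetup_cmd /dev/sdb1
```

Running the printed command as root should return a loop device name that can then be mounted read-only; that step is deliberately left out of the sketch so it stays safe to run anywhere.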