author | Kent Overstreet <koverstreet@google.com> | 2013-03-27 15:24:17 -0400 |
---|---|---|
committer | Kent Overstreet <koverstreet@google.com> | 2013-04-08 16:33:48 -0400 |
commit | 7b41b51a705ec0eb5f88060c9f724c8bc0e79eab (patch) | |
tree | 94f9705bad438d8710b7d67baef1334ccb6819fa /Documentation/bcache.txt | |
parent | cc0f4eaa61817aaea6e61a820f3f1c500a5542b1 (diff) |
bcache: Documentation updates
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Diffstat (limited to 'Documentation/bcache.txt')
-rw-r--r-- | Documentation/bcache.txt | 88 |
1 files changed, 88 insertions, 0 deletions
diff --git a/Documentation/bcache.txt b/Documentation/bcache.txt
index 533307d52c87..77db8809bd96 100644
--- a/Documentation/bcache.txt
+++ b/Documentation/bcache.txt
@@ -101,6 +101,94 @@ but all the cached data will be invalidated. If there was dirty data in the | |||
101 | cache, don't expect the filesystem to be recoverable - you will have massive | 101 | cache, don't expect the filesystem to be recoverable - you will have massive |
102 | filesystem corruption, though ext4's fsck does work miracles. | 102 | filesystem corruption, though ext4's fsck does work miracles. |
103 | 103 | ||
104 | ERROR HANDLING: | ||
105 | |||
106 | Bcache tries to transparently handle IO errors to/from the cache device without | ||
107 | affecting normal operation; if it sees too many errors (the threshold is | ||
108 | configurable, and defaults to 0) it shuts down the cache device and switches all | ||
109 | the backing devices to passthrough mode. | ||
110 | |||
111 | - For reads from the cache, if they error we just retry the read from the | ||
112 | backing device. | ||
113 | |||
114 | - For writethrough writes, if the write to the cache errors we just switch to | ||
115 | invalidating the data at that LBA in the cache (i.e. the same thing we do for | |
116 | a write that bypasses the cache). | |
117 | |||
118 | - For writeback writes, we currently pass that error back up to the | ||
119 | filesystem/userspace. This could be improved - we could retry it as a write | ||
120 | that skips the cache so we don't have to error the write. | ||
121 | |||
122 | - When we detach, we first try to flush any dirty data (if we were running in | ||
123 | writeback mode). It currently doesn't do anything intelligent if it fails to | ||
124 | read some of the dirty data, though. | ||
125 | |||
126 | TROUBLESHOOTING PERFORMANCE: | ||
127 | |||
128 | Bcache has a bunch of config options and tunables. The defaults are intended to | ||
129 | be reasonable for typical desktop and server workloads, but they're not what you | ||
130 | want for getting the best possible numbers when benchmarking. | ||
131 | |||
132 | - Bad write performance | ||
133 | |||
134 | If write performance is not what you expected, you probably wanted to be | ||
135 | running in writeback mode, which isn't the default (not due to a lack of | ||
136 | maturity, but simply because in writeback mode you'll lose data if something | ||
137 | happens to your SSD). | |
138 | |||
139 | # echo writeback > /sys/block/bcache0/cache_mode | ||
140 | |||
141 | - Bad performance, or traffic not going to the SSD that you'd expect | ||
142 | |||
143 | By default, bcache doesn't cache everything. It tries to skip sequential IO - | ||
144 | because you really want to be caching the random IO, and if you copy a 10 | ||
145 | gigabyte file you probably don't want that pushing 10 gigabytes of randomly | ||
146 | accessed data out of your cache. | ||
147 | |||
148 | But if you want to benchmark reads from cache, and you start out with fio | |
149 | writing an 8 gigabyte test file, you want to disable that. | |
150 | |||
151 | # echo 0 > /sys/block/bcache0/bcache/sequential_cutoff | ||
152 | |||
153 | To set it back to the default (4 MB), do | |
154 | |||
155 | # echo 4M > /sys/block/bcache0/bcache/sequential_cutoff | ||
156 | |||
157 | - Traffic's still going to the spindle/still getting cache misses | ||
158 | |||
159 | In the real world, SSDs don't always keep up with disks - particularly with | ||
160 | slower SSDs, many disks being cached by one SSD, or mostly sequential IO. So | ||
161 | you want to avoid being bottlenecked by the SSD and having it slow everything | ||
162 | down. | ||
163 | |||
164 | To avoid that, bcache tracks latency to the cache device, and gradually | |
165 | throttles traffic if the latency exceeds a threshold (it does this by | ||
166 | cranking down the sequential bypass). | ||
167 | |||
168 | You can disable this if you need to by setting the thresholds to 0: | ||
169 | |||
170 | # echo 0 > /sys/fs/bcache/<cache set>/congested_read_threshold_us | ||
171 | # echo 0 > /sys/fs/bcache/<cache set>/congested_write_threshold_us | ||
172 | |||
173 | The default is 2000 us (2 milliseconds) for reads, and 20000 us (20 milliseconds) for writes. | |
174 | |||
175 | - Still getting cache misses, of the same data | ||
176 | |||
177 | One last issue that sometimes trips people up is actually an old bug, due to | ||
178 | the way cache coherency is handled for cache misses. If a btree node is full, | ||
179 | a cache miss won't be able to insert a key for the new data and the data | ||
180 | won't be written to the cache. | ||
181 | |||
182 | In practice this isn't an issue because as soon as a write comes along it'll | ||
183 | cause the btree node to be split, and you need almost no write traffic for | ||
184 | this to not show up enough to be noticeable (especially since bcache's btree | |
185 | nodes are huge and index large regions of the device). But when you're | ||
186 | benchmarking, if you're trying to warm the cache by reading a bunch of data | ||
187 | and there's no other traffic - that can be a problem. | ||
188 | |||
189 | Solution: warm the cache by doing writes, or use the testing branch (there's | ||
190 | a fix for the issue there). | ||
191 | |||
104 | SYSFS - BACKING DEVICE: | 192 | SYSFS - BACKING DEVICE: |
105 | 193 | ||
106 | attach | 194 | attach |