diff options
author | Joe Thornber <thornber@redhat.com> | 2011-10-31 16:19:11 -0400 |
---|---|---|
committer | Alasdair G Kergon <agk@redhat.com> | 2011-10-31 16:19:11 -0400 |
commit | 3241b1d3e0aaafbfcd320f4d71ade629728cc4f4 (patch) | |
tree | 499461f724d4db3d7118641f4a20f5be23549edd | |
parent | 95d402f057f2e208e4631893f6cd4a59c7c05e41 (diff) |
dm: add persistent data library
The persistent-data library offers a re-usable framework for the storage
and management of on-disk metadata in device-mapper targets.
It's used by the thin-provisioning target in the next patch and in an
upcoming hierarchical storage target.
For further information, please read
Documentation/device-mapper/persistent-data.txt
Signed-off-by: Joe Thornber <thornber@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
22 files changed, 5709 insertions, 0 deletions
diff --git a/Documentation/device-mapper/persistent-data.txt b/Documentation/device-mapper/persistent-data.txt new file mode 100644 index 000000000000..0e5df9b04ad2 --- /dev/null +++ b/Documentation/device-mapper/persistent-data.txt | |||
@@ -0,0 +1,84 @@ | |||
1 | Introduction | ||
2 | ============ | ||
3 | |||
4 | The more-sophisticated device-mapper targets require complex metadata | ||
5 | that is managed in kernel. In late 2010 we were seeing that various | ||
6 | different targets were rolling their own data strutures, for example: | ||
7 | |||
8 | - Mikulas Patocka's multisnap implementation | ||
9 | - Heinz Mauelshagen's thin provisioning target | ||
10 | - Another btree-based caching target posted to dm-devel | ||
11 | - Another multi-snapshot target based on a design of Daniel Phillips | ||
12 | |||
13 | Maintaining these data structures takes a lot of work, so if possible | ||
14 | we'd like to reduce the number. | ||
15 | |||
16 | The persistent-data library is an attempt to provide a re-usable | ||
17 | framework for people who want to store metadata in device-mapper | ||
18 | targets. It's currently used by the thin-provisioning target and an | ||
19 | upcoming hierarchical storage target. | ||
20 | |||
21 | Overview | ||
22 | ======== | ||
23 | |||
24 | The main documentation is in the header files which can all be found | ||
25 | under drivers/md/persistent-data. | ||
26 | |||
27 | The block manager | ||
28 | ----------------- | ||
29 | |||
30 | dm-block-manager.[hc] | ||
31 | |||
32 | This provides access to the data on disk in fixed sized-blocks. There | ||
33 | is a read/write locking interface to prevent concurrent accesses, and | ||
34 | keep data that is being used in the cache. | ||
35 | |||
36 | Clients of persistent-data are unlikely to use this directly. | ||
37 | |||
38 | The transaction manager | ||
39 | ----------------------- | ||
40 | |||
41 | dm-transaction-manager.[hc] | ||
42 | |||
43 | This restricts access to blocks and enforces copy-on-write semantics. | ||
44 | The only way you can get hold of a writable block through the | ||
45 | transaction manager is by shadowing an existing block (ie. doing | ||
46 | copy-on-write) or allocating a fresh one. Shadowing is elided within | ||
47 | the same transaction so performance is reasonable. The commit method | ||
48 | ensures that all data is flushed before it writes the superblock. | ||
49 | On power failure your metadata will be as it was when last committed. | ||
50 | |||
51 | The Space Maps | ||
52 | -------------- | ||
53 | |||
54 | dm-space-map.h | ||
55 | dm-space-map-metadata.[hc] | ||
56 | dm-space-map-disk.[hc] | ||
57 | |||
58 | On-disk data structures that keep track of reference counts of blocks. | ||
59 | Also acts as the allocator of new blocks. Currently two | ||
60 | implementations: a simpler one for managing blocks on a different | ||
61 | device (eg. thinly-provisioned data blocks); and one for managing | ||
62 | the metadata space. The latter is complicated by the need to store | ||
63 | its own data within the space it's managing. | ||
64 | |||
65 | The data structures | ||
66 | ------------------- | ||
67 | |||
68 | dm-btree.[hc] | ||
69 | dm-btree-remove.c | ||
70 | dm-btree-spine.c | ||
71 | dm-btree-internal.h | ||
72 | |||
73 | Currently there is only one data structure, a hierarchical btree. | ||
74 | There are plans to add more. For example, something with an | ||
75 | array-like interface would see a lot of use. | ||
76 | |||
77 | The btree is 'hierarchical' in that you can define it to be composed | ||
78 | of nested btrees, and take multiple keys. For example, the | ||
79 | thin-provisioning target uses a btree with two levels of nesting. | ||
80 | The first maps a device id to a mapping tree, and that in turn maps a | ||
81 | virtual block to a physical block. | ||
82 | |||
83 | Values stored in the btrees can have arbitrary size. Keys are always | ||
84 | 64bits, although nesting allows you to use multiple keys. | ||
diff --git a/drivers/md/persistent-data/Kconfig b/drivers/md/persistent-data/Kconfig new file mode 100644 index 000000000000..ceb359050a59 --- /dev/null +++ b/drivers/md/persistent-data/Kconfig | |||
@@ -0,0 +1,8 @@ | |||
1 | config DM_PERSISTENT_DATA | ||
2 | tristate | ||
3 | depends on BLK_DEV_DM && EXPERIMENTAL | ||
4 | select LIBCRC32C | ||
5 | select DM_BUFIO | ||
6 | ---help--- | ||
7 | Library providing immutable on-disk data structure support for | ||
8 | device-mapper targets such as the thin provisioning target. | ||
diff --git a/drivers/md/persistent-data/Makefile b/drivers/md/persistent-data/Makefile new file mode 100644 index 000000000000..cfa95f662230 --- /dev/null +++ b/drivers/md/persistent-data/Makefile | |||
@@ -0,0 +1,11 @@ | |||
1 | obj-$(CONFIG_DM_PERSISTENT_DATA) += dm-persistent-data.o | ||
2 | dm-persistent-data-objs := \ | ||
3 | dm-block-manager.o \ | ||
4 | dm-space-map-checker.o \ | ||
5 | dm-space-map-common.o \ | ||
6 | dm-space-map-disk.o \ | ||
7 | dm-space-map-metadata.o \ | ||
8 | dm-transaction-manager.o \ | ||
9 | dm-btree.o \ | ||
10 | dm-btree-remove.o \ | ||
11 | dm-btree-spine.o | ||
diff --git a/drivers/md/persistent-data/dm-block-manager.c b/drivers/md/persistent-data/dm-block-manager.c new file mode 100644 index 000000000000..0317ecdc6e53 --- /dev/null +++ b/drivers/md/persistent-data/dm-block-manager.c | |||
@@ -0,0 +1,620 @@ | |||
1 | /* | ||
2 | * Copyright (C) 2011 Red Hat, Inc. | ||
3 | * | ||
4 | * This file is released under the GPL. | ||
5 | */ | ||
6 | #include "dm-block-manager.h" | ||
7 | #include "dm-persistent-data-internal.h" | ||
8 | #include "../dm-bufio.h" | ||
9 | |||
10 | #include <linux/crc32c.h> | ||
11 | #include <linux/module.h> | ||
12 | #include <linux/slab.h> | ||
13 | #include <linux/rwsem.h> | ||
14 | #include <linux/device-mapper.h> | ||
15 | #include <linux/stacktrace.h> | ||
16 | |||
17 | #define DM_MSG_PREFIX "block manager" | ||
18 | |||
19 | /*----------------------------------------------------------------*/ | ||
20 | |||
21 | /* | ||
22 | * This is a read/write semaphore with a couple of differences. | ||
23 | * | ||
24 | * i) There is a restriction on the number of concurrent read locks that | ||
25 | * may be held at once. This is just an implementation detail. | ||
26 | * | ||
27 | * ii) Recursive locking attempts are detected and return EINVAL. A stack | ||
28 | * trace is also emitted for the previous lock aquisition. | ||
29 | * | ||
30 | * iii) Priority is given to write locks. | ||
31 | */ | ||
32 | #define MAX_HOLDERS 4 | ||
33 | #define MAX_STACK 10 | ||
34 | |||
35 | typedef unsigned long stack_entries[MAX_STACK]; | ||
36 | |||
37 | struct block_lock { | ||
38 | spinlock_t lock; | ||
39 | __s32 count; | ||
40 | struct list_head waiters; | ||
41 | struct task_struct *holders[MAX_HOLDERS]; | ||
42 | |||
43 | #ifdef CONFIG_DM_DEBUG_BLOCK_STACK_TRACING | ||
44 | struct stack_trace traces[MAX_HOLDERS]; | ||
45 | stack_entries entries[MAX_HOLDERS]; | ||
46 | #endif | ||
47 | }; | ||
48 | |||
49 | struct waiter { | ||
50 | struct list_head list; | ||
51 | struct task_struct *task; | ||
52 | int wants_write; | ||
53 | }; | ||
54 | |||
55 | static unsigned __find_holder(struct block_lock *lock, | ||
56 | struct task_struct *task) | ||
57 | { | ||
58 | unsigned i; | ||
59 | |||
60 | for (i = 0; i < MAX_HOLDERS; i++) | ||
61 | if (lock->holders[i] == task) | ||
62 | break; | ||
63 | |||
64 | BUG_ON(i == MAX_HOLDERS); | ||
65 | return i; | ||
66 | } | ||
67 | |||
68 | /* call this *after* you increment lock->count */ | ||
69 | static void __add_holder(struct block_lock *lock, struct task_struct *task) | ||
70 | { | ||
71 | unsigned h = __find_holder(lock, NULL); | ||
72 | #ifdef CONFIG_DM_DEBUG_BLOCK_STACK_TRACING | ||
73 | struct stack_trace *t; | ||
74 | #endif | ||
75 | |||
76 | get_task_struct(task); | ||
77 | lock->holders[h] = task; | ||
78 | |||
79 | #ifdef CONFIG_DM_DEBUG_BLOCK_STACK_TRACING | ||
80 | t = lock->traces + h; | ||
81 | t->nr_entries = 0; | ||
82 | t->max_entries = MAX_STACK; | ||
83 | t->entries = lock->entries[h]; | ||
84 | t->skip = 2; | ||
85 | save_stack_trace(t); | ||
86 | #endif | ||
87 | } | ||
88 | |||
89 | /* call this *before* you decrement lock->count */ | ||
90 | static void __del_holder(struct block_lock *lock, struct task_struct *task) | ||
91 | { | ||
92 | unsigned h = __find_holder(lock, task); | ||
93 | lock->holders[h] = NULL; | ||
94 | put_task_struct(task); | ||
95 | } | ||
96 | |||
97 | static int __check_holder(struct block_lock *lock) | ||
98 | { | ||
99 | unsigned i; | ||
100 | #ifdef CONFIG_DM_DEBUG_BLOCK_STACK_TRACING | ||
101 | static struct stack_trace t; | ||
102 | static stack_entries entries; | ||
103 | #endif | ||
104 | |||
105 | for (i = 0; i < MAX_HOLDERS; i++) { | ||
106 | if (lock->holders[i] == current) { | ||
107 | DMERR("recursive lock detected in pool metadata"); | ||
108 | #ifdef CONFIG_DM_DEBUG_BLOCK_STACK_TRACING | ||
109 | DMERR("previously held here:"); | ||
110 | print_stack_trace(lock->traces + i, 4); | ||
111 | |||
112 | DMERR("subsequent aquisition attempted here:"); | ||
113 | t.nr_entries = 0; | ||
114 | t.max_entries = MAX_STACK; | ||
115 | t.entries = entries; | ||
116 | t.skip = 3; | ||
117 | save_stack_trace(&t); | ||
118 | print_stack_trace(&t, 4); | ||
119 | #endif | ||
120 | return -EINVAL; | ||
121 | } | ||
122 | } | ||
123 | |||
124 | return 0; | ||
125 | } | ||
126 | |||
127 | static void __wait(struct waiter *w) | ||
128 | { | ||
129 | for (;;) { | ||
130 | set_task_state(current, TASK_UNINTERRUPTIBLE); | ||
131 | |||
132 | if (!w->task) | ||
133 | break; | ||
134 | |||
135 | schedule(); | ||
136 | } | ||
137 | |||
138 | set_task_state(current, TASK_RUNNING); | ||
139 | } | ||
140 | |||
141 | static void __wake_waiter(struct waiter *w) | ||
142 | { | ||
143 | struct task_struct *task; | ||
144 | |||
145 | list_del(&w->list); | ||
146 | task = w->task; | ||
147 | smp_mb(); | ||
148 | w->task = NULL; | ||
149 | wake_up_process(task); | ||
150 | } | ||
151 | |||
152 | /* | ||
153 | * We either wake a few readers or a single writer. | ||
154 | */ | ||
155 | static void __wake_many(struct block_lock *lock) | ||
156 | { | ||
157 | struct waiter *w, *tmp; | ||
158 | |||
159 | BUG_ON(lock->count < 0); | ||
160 | list_for_each_entry_safe(w, tmp, &lock->waiters, list) { | ||
161 | if (lock->count >= MAX_HOLDERS) | ||
162 | return; | ||
163 | |||
164 | if (w->wants_write) { | ||
165 | if (lock->count > 0) | ||
166 | return; /* still read locked */ | ||
167 | |||
168 | lock->count = -1; | ||
169 | __add_holder(lock, w->task); | ||
170 | __wake_waiter(w); | ||
171 | return; | ||
172 | } | ||
173 | |||
174 | lock->count++; | ||
175 | __add_holder(lock, w->task); | ||
176 | __wake_waiter(w); | ||
177 | } | ||
178 | } | ||
179 | |||
180 | static void bl_init(struct block_lock *lock) | ||
181 | { | ||
182 | int i; | ||
183 | |||
184 | spin_lock_init(&lock->lock); | ||
185 | lock->count = 0; | ||
186 | INIT_LIST_HEAD(&lock->waiters); | ||
187 | for (i = 0; i < MAX_HOLDERS; i++) | ||
188 | lock->holders[i] = NULL; | ||
189 | } | ||
190 | |||
191 | static int __available_for_read(struct block_lock *lock) | ||
192 | { | ||
193 | return lock->count >= 0 && | ||
194 | lock->count < MAX_HOLDERS && | ||
195 | list_empty(&lock->waiters); | ||
196 | } | ||
197 | |||
198 | static int bl_down_read(struct block_lock *lock) | ||
199 | { | ||
200 | int r; | ||
201 | struct waiter w; | ||
202 | |||
203 | spin_lock(&lock->lock); | ||
204 | r = __check_holder(lock); | ||
205 | if (r) { | ||
206 | spin_unlock(&lock->lock); | ||
207 | return r; | ||
208 | } | ||
209 | |||
210 | if (__available_for_read(lock)) { | ||
211 | lock->count++; | ||
212 | __add_holder(lock, current); | ||
213 | spin_unlock(&lock->lock); | ||
214 | return 0; | ||
215 | } | ||
216 | |||
217 | get_task_struct(current); | ||
218 | |||
219 | w.task = current; | ||
220 | w.wants_write = 0; | ||
221 | list_add_tail(&w.list, &lock->waiters); | ||
222 | spin_unlock(&lock->lock); | ||
223 | |||
224 | __wait(&w); | ||
225 | put_task_struct(current); | ||
226 | return 0; | ||
227 | } | ||
228 | |||
229 | static int bl_down_read_nonblock(struct block_lock *lock) | ||
230 | { | ||
231 | int r; | ||
232 | |||
233 | spin_lock(&lock->lock); | ||
234 | r = __check_holder(lock); | ||
235 | if (r) | ||
236 | goto out; | ||
237 | |||
238 | if (__available_for_read(lock)) { | ||
239 | lock->count++; | ||
240 | __add_holder(lock, current); | ||
241 | r = 0; | ||
242 | } else | ||
243 | r = -EWOULDBLOCK; | ||
244 | |||
245 | out: | ||
246 | spin_unlock(&lock->lock); | ||
247 | return r; | ||
248 | } | ||
249 | |||
250 | static void bl_up_read(struct block_lock *lock) | ||
251 | { | ||
252 | spin_lock(&lock->lock); | ||
253 | BUG_ON(lock->count <= 0); | ||
254 | __del_holder(lock, current); | ||
255 | --lock->count; | ||
256 | if (!list_empty(&lock->waiters)) | ||
257 | __wake_many(lock); | ||
258 | spin_unlock(&lock->lock); | ||
259 | } | ||
260 | |||
261 | static int bl_down_write(struct block_lock *lock) | ||
262 | { | ||
263 | int r; | ||
264 | struct waiter w; | ||
265 | |||
266 | spin_lock(&lock->lock); | ||
267 | r = __check_holder(lock); | ||
268 | if (r) { | ||
269 | spin_unlock(&lock->lock); | ||
270 | return r; | ||
271 | } | ||
272 | |||
273 | if (lock->count == 0 && list_empty(&lock->waiters)) { | ||
274 | lock->count = -1; | ||
275 | __add_holder(lock, current); | ||
276 | spin_unlock(&lock->lock); | ||
277 | return 0; | ||
278 | } | ||
279 | |||
280 | get_task_struct(current); | ||
281 | w.task = current; | ||
282 | w.wants_write = 1; | ||
283 | |||
284 | /* | ||
285 | * Writers given priority. We know there's only one mutator in the | ||
286 | * system, so ignoring the ordering reversal. | ||
287 | */ | ||
288 | list_add(&w.list, &lock->waiters); | ||
289 | spin_unlock(&lock->lock); | ||
290 | |||
291 | __wait(&w); | ||
292 | put_task_struct(current); | ||
293 | |||
294 | return 0; | ||
295 | } | ||
296 | |||
297 | static void bl_up_write(struct block_lock *lock) | ||
298 | { | ||
299 | spin_lock(&lock->lock); | ||
300 | __del_holder(lock, current); | ||
301 | lock->count = 0; | ||
302 | if (!list_empty(&lock->waiters)) | ||
303 | __wake_many(lock); | ||
304 | spin_unlock(&lock->lock); | ||
305 | } | ||
306 | |||
307 | static void report_recursive_bug(dm_block_t b, int r) | ||
308 | { | ||
309 | if (r == -EINVAL) | ||
310 | DMERR("recursive acquisition of block %llu requested.", | ||
311 | (unsigned long long) b); | ||
312 | } | ||
313 | |||
314 | /*----------------------------------------------------------------*/ | ||
315 | |||
316 | /* | ||
317 | * Block manager is currently implemented using dm-bufio. struct | ||
318 | * dm_block_manager and struct dm_block map directly onto a couple of | ||
319 | * structs in the bufio interface. I want to retain the freedom to move | ||
320 | * away from bufio in the future. So these structs are just cast within | ||
321 | * this .c file, rather than making it through to the public interface. | ||
322 | */ | ||
323 | static struct dm_buffer *to_buffer(struct dm_block *b) | ||
324 | { | ||
325 | return (struct dm_buffer *) b; | ||
326 | } | ||
327 | |||
328 | static struct dm_bufio_client *to_bufio(struct dm_block_manager *bm) | ||
329 | { | ||
330 | return (struct dm_bufio_client *) bm; | ||
331 | } | ||
332 | |||
333 | dm_block_t dm_block_location(struct dm_block *b) | ||
334 | { | ||
335 | return dm_bufio_get_block_number(to_buffer(b)); | ||
336 | } | ||
337 | EXPORT_SYMBOL_GPL(dm_block_location); | ||
338 | |||
339 | void *dm_block_data(struct dm_block *b) | ||
340 | { | ||
341 | return dm_bufio_get_block_data(to_buffer(b)); | ||
342 | } | ||
343 | EXPORT_SYMBOL_GPL(dm_block_data); | ||
344 | |||
345 | struct buffer_aux { | ||
346 | struct dm_block_validator *validator; | ||
347 | struct block_lock lock; | ||
348 | int write_locked; | ||
349 | }; | ||
350 | |||
351 | static void dm_block_manager_alloc_callback(struct dm_buffer *buf) | ||
352 | { | ||
353 | struct buffer_aux *aux = dm_bufio_get_aux_data(buf); | ||
354 | aux->validator = NULL; | ||
355 | bl_init(&aux->lock); | ||
356 | } | ||
357 | |||
358 | static void dm_block_manager_write_callback(struct dm_buffer *buf) | ||
359 | { | ||
360 | struct buffer_aux *aux = dm_bufio_get_aux_data(buf); | ||
361 | if (aux->validator) { | ||
362 | aux->validator->prepare_for_write(aux->validator, (struct dm_block *) buf, | ||
363 | dm_bufio_get_block_size(dm_bufio_get_client(buf))); | ||
364 | } | ||
365 | } | ||
366 | |||
367 | /*---------------------------------------------------------------- | ||
368 | * Public interface | ||
369 | *--------------------------------------------------------------*/ | ||
370 | struct dm_block_manager *dm_block_manager_create(struct block_device *bdev, | ||
371 | unsigned block_size, | ||
372 | unsigned cache_size, | ||
373 | unsigned max_held_per_thread) | ||
374 | { | ||
375 | return (struct dm_block_manager *) | ||
376 | dm_bufio_client_create(bdev, block_size, max_held_per_thread, | ||
377 | sizeof(struct buffer_aux), | ||
378 | dm_block_manager_alloc_callback, | ||
379 | dm_block_manager_write_callback); | ||
380 | } | ||
381 | EXPORT_SYMBOL_GPL(dm_block_manager_create); | ||
382 | |||
383 | void dm_block_manager_destroy(struct dm_block_manager *bm) | ||
384 | { | ||
385 | return dm_bufio_client_destroy(to_bufio(bm)); | ||
386 | } | ||
387 | EXPORT_SYMBOL_GPL(dm_block_manager_destroy); | ||
388 | |||
389 | unsigned dm_bm_block_size(struct dm_block_manager *bm) | ||
390 | { | ||
391 | return dm_bufio_get_block_size(to_bufio(bm)); | ||
392 | } | ||
393 | EXPORT_SYMBOL_GPL(dm_bm_block_size); | ||
394 | |||
395 | dm_block_t dm_bm_nr_blocks(struct dm_block_manager *bm) | ||
396 | { | ||
397 | return dm_bufio_get_device_size(to_bufio(bm)); | ||
398 | } | ||
399 | |||
400 | static int dm_bm_validate_buffer(struct dm_block_manager *bm, | ||
401 | struct dm_buffer *buf, | ||
402 | struct buffer_aux *aux, | ||
403 | struct dm_block_validator *v) | ||
404 | { | ||
405 | if (unlikely(!aux->validator)) { | ||
406 | int r; | ||
407 | if (!v) | ||
408 | return 0; | ||
409 | r = v->check(v, (struct dm_block *) buf, dm_bufio_get_block_size(to_bufio(bm))); | ||
410 | if (unlikely(r)) | ||
411 | return r; | ||
412 | aux->validator = v; | ||
413 | } else { | ||
414 | if (unlikely(aux->validator != v)) { | ||
415 | DMERR("validator mismatch (old=%s vs new=%s) for block %llu", | ||
416 | aux->validator->name, v ? v->name : "NULL", | ||
417 | (unsigned long long) | ||
418 | dm_bufio_get_block_number(buf)); | ||
419 | return -EINVAL; | ||
420 | } | ||
421 | } | ||
422 | |||
423 | return 0; | ||
424 | } | ||
425 | int dm_bm_read_lock(struct dm_block_manager *bm, dm_block_t b, | ||
426 | struct dm_block_validator *v, | ||
427 | struct dm_block **result) | ||
428 | { | ||
429 | struct buffer_aux *aux; | ||
430 | void *p; | ||
431 | int r; | ||
432 | |||
433 | p = dm_bufio_read(to_bufio(bm), b, (struct dm_buffer **) result); | ||
434 | if (unlikely(IS_ERR(p))) | ||
435 | return PTR_ERR(p); | ||
436 | |||
437 | aux = dm_bufio_get_aux_data(to_buffer(*result)); | ||
438 | r = bl_down_read(&aux->lock); | ||
439 | if (unlikely(r)) { | ||
440 | dm_bufio_release(to_buffer(*result)); | ||
441 | report_recursive_bug(b, r); | ||
442 | return r; | ||
443 | } | ||
444 | |||
445 | aux->write_locked = 0; | ||
446 | |||
447 | r = dm_bm_validate_buffer(bm, to_buffer(*result), aux, v); | ||
448 | if (unlikely(r)) { | ||
449 | bl_up_read(&aux->lock); | ||
450 | dm_bufio_release(to_buffer(*result)); | ||
451 | return r; | ||
452 | } | ||
453 | |||
454 | return 0; | ||
455 | } | ||
456 | EXPORT_SYMBOL_GPL(dm_bm_read_lock); | ||
457 | |||
458 | int dm_bm_write_lock(struct dm_block_manager *bm, | ||
459 | dm_block_t b, struct dm_block_validator *v, | ||
460 | struct dm_block **result) | ||
461 | { | ||
462 | struct buffer_aux *aux; | ||
463 | void *p; | ||
464 | int r; | ||
465 | |||
466 | p = dm_bufio_read(to_bufio(bm), b, (struct dm_buffer **) result); | ||
467 | if (unlikely(IS_ERR(p))) | ||
468 | return PTR_ERR(p); | ||
469 | |||
470 | aux = dm_bufio_get_aux_data(to_buffer(*result)); | ||
471 | r = bl_down_write(&aux->lock); | ||
472 | if (r) { | ||
473 | dm_bufio_release(to_buffer(*result)); | ||
474 | report_recursive_bug(b, r); | ||
475 | return r; | ||
476 | } | ||
477 | |||
478 | aux->write_locked = 1; | ||
479 | |||
480 | r = dm_bm_validate_buffer(bm, to_buffer(*result), aux, v); | ||
481 | if (unlikely(r)) { | ||
482 | bl_up_write(&aux->lock); | ||
483 | dm_bufio_release(to_buffer(*result)); | ||
484 | return r; | ||
485 | } | ||
486 | |||
487 | return 0; | ||
488 | } | ||
489 | EXPORT_SYMBOL_GPL(dm_bm_write_lock); | ||
490 | |||
491 | int dm_bm_read_try_lock(struct dm_block_manager *bm, | ||
492 | dm_block_t b, struct dm_block_validator *v, | ||
493 | struct dm_block **result) | ||
494 | { | ||
495 | struct buffer_aux *aux; | ||
496 | void *p; | ||
497 | int r; | ||
498 | |||
499 | p = dm_bufio_get(to_bufio(bm), b, (struct dm_buffer **) result); | ||
500 | if (unlikely(IS_ERR(p))) | ||
501 | return PTR_ERR(p); | ||
502 | if (unlikely(!p)) | ||
503 | return -EWOULDBLOCK; | ||
504 | |||
505 | aux = dm_bufio_get_aux_data(to_buffer(*result)); | ||
506 | r = bl_down_read_nonblock(&aux->lock); | ||
507 | if (r < 0) { | ||
508 | dm_bufio_release(to_buffer(*result)); | ||
509 | report_recursive_bug(b, r); | ||
510 | return r; | ||
511 | } | ||
512 | aux->write_locked = 0; | ||
513 | |||
514 | r = dm_bm_validate_buffer(bm, to_buffer(*result), aux, v); | ||
515 | if (unlikely(r)) { | ||
516 | bl_up_read(&aux->lock); | ||
517 | dm_bufio_release(to_buffer(*result)); | ||
518 | return r; | ||
519 | } | ||
520 | |||
521 | return 0; | ||
522 | } | ||
523 | |||
524 | int dm_bm_write_lock_zero(struct dm_block_manager *bm, | ||
525 | dm_block_t b, struct dm_block_validator *v, | ||
526 | struct dm_block **result) | ||
527 | { | ||
528 | int r; | ||
529 | struct buffer_aux *aux; | ||
530 | void *p; | ||
531 | |||
532 | p = dm_bufio_new(to_bufio(bm), b, (struct dm_buffer **) result); | ||
533 | if (unlikely(IS_ERR(p))) | ||
534 | return PTR_ERR(p); | ||
535 | |||
536 | memset(p, 0, dm_bm_block_size(bm)); | ||
537 | |||
538 | aux = dm_bufio_get_aux_data(to_buffer(*result)); | ||
539 | r = bl_down_write(&aux->lock); | ||
540 | if (r) { | ||
541 | dm_bufio_release(to_buffer(*result)); | ||
542 | return r; | ||
543 | } | ||
544 | |||
545 | aux->write_locked = 1; | ||
546 | aux->validator = v; | ||
547 | |||
548 | return 0; | ||
549 | } | ||
550 | |||
551 | int dm_bm_unlock(struct dm_block *b) | ||
552 | { | ||
553 | struct buffer_aux *aux; | ||
554 | aux = dm_bufio_get_aux_data(to_buffer(b)); | ||
555 | |||
556 | if (aux->write_locked) { | ||
557 | dm_bufio_mark_buffer_dirty(to_buffer(b)); | ||
558 | bl_up_write(&aux->lock); | ||
559 | } else | ||
560 | bl_up_read(&aux->lock); | ||
561 | |||
562 | dm_bufio_release(to_buffer(b)); | ||
563 | |||
564 | return 0; | ||
565 | } | ||
566 | EXPORT_SYMBOL_GPL(dm_bm_unlock); | ||
567 | |||
568 | int dm_bm_unlock_move(struct dm_block *b, dm_block_t n) | ||
569 | { | ||
570 | struct buffer_aux *aux; | ||
571 | |||
572 | aux = dm_bufio_get_aux_data(to_buffer(b)); | ||
573 | |||
574 | if (aux->write_locked) { | ||
575 | dm_bufio_mark_buffer_dirty(to_buffer(b)); | ||
576 | bl_up_write(&aux->lock); | ||
577 | } else | ||
578 | bl_up_read(&aux->lock); | ||
579 | |||
580 | dm_bufio_release_move(to_buffer(b), n); | ||
581 | return 0; | ||
582 | } | ||
583 | |||
584 | int dm_bm_flush_and_unlock(struct dm_block_manager *bm, | ||
585 | struct dm_block *superblock) | ||
586 | { | ||
587 | int r; | ||
588 | |||
589 | r = dm_bufio_write_dirty_buffers(to_bufio(bm)); | ||
590 | if (unlikely(r)) | ||
591 | return r; | ||
592 | r = dm_bufio_issue_flush(to_bufio(bm)); | ||
593 | if (unlikely(r)) | ||
594 | return r; | ||
595 | |||
596 | dm_bm_unlock(superblock); | ||
597 | |||
598 | r = dm_bufio_write_dirty_buffers(to_bufio(bm)); | ||
599 | if (unlikely(r)) | ||
600 | return r; | ||
601 | r = dm_bufio_issue_flush(to_bufio(bm)); | ||
602 | if (unlikely(r)) | ||
603 | return r; | ||
604 | |||
605 | return 0; | ||
606 | } | ||
607 | |||
608 | u32 dm_bm_checksum(const void *data, size_t len, u32 init_xor) | ||
609 | { | ||
610 | return crc32c(~(u32) 0, data, len) ^ init_xor; | ||
611 | } | ||
612 | EXPORT_SYMBOL_GPL(dm_bm_checksum); | ||
613 | |||
614 | /*----------------------------------------------------------------*/ | ||
615 | |||
616 | MODULE_LICENSE("GPL"); | ||
617 | MODULE_AUTHOR("Joe Thornber <dm-devel@redhat.com>"); | ||
618 | MODULE_DESCRIPTION("Immutable metadata library for dm"); | ||
619 | |||
620 | /*----------------------------------------------------------------*/ | ||
diff --git a/drivers/md/persistent-data/dm-block-manager.h b/drivers/md/persistent-data/dm-block-manager.h new file mode 100644 index 000000000000..924833d2dfa6 --- /dev/null +++ b/drivers/md/persistent-data/dm-block-manager.h | |||
@@ -0,0 +1,123 @@ | |||
1 | /* | ||
2 | * Copyright (C) 2011 Red Hat, Inc. | ||
3 | * | ||
4 | * This file is released under the GPL. | ||
5 | */ | ||
6 | |||
7 | #ifndef _LINUX_DM_BLOCK_MANAGER_H | ||
8 | #define _LINUX_DM_BLOCK_MANAGER_H | ||
9 | |||
10 | #include <linux/types.h> | ||
11 | #include <linux/blkdev.h> | ||
12 | |||
13 | /*----------------------------------------------------------------*/ | ||
14 | |||
15 | /* | ||
16 | * Block number. | ||
17 | */ | ||
18 | typedef uint64_t dm_block_t; | ||
19 | struct dm_block; | ||
20 | |||
21 | dm_block_t dm_block_location(struct dm_block *b); | ||
22 | void *dm_block_data(struct dm_block *b); | ||
23 | |||
24 | /*----------------------------------------------------------------*/ | ||
25 | |||
26 | /* | ||
27 | * @name should be a unique identifier for the block manager, no longer | ||
28 | * than 32 chars. | ||
29 | * | ||
30 | * @max_held_per_thread should be the maximum number of locks, read or | ||
31 | * write, that an individual thread holds at any one time. | ||
32 | */ | ||
33 | struct dm_block_manager; | ||
34 | struct dm_block_manager *dm_block_manager_create( | ||
35 | struct block_device *bdev, unsigned block_size, | ||
36 | unsigned cache_size, unsigned max_held_per_thread); | ||
37 | void dm_block_manager_destroy(struct dm_block_manager *bm); | ||
38 | |||
39 | unsigned dm_bm_block_size(struct dm_block_manager *bm); | ||
40 | dm_block_t dm_bm_nr_blocks(struct dm_block_manager *bm); | ||
41 | |||
42 | /*----------------------------------------------------------------*/ | ||
43 | |||
44 | /* | ||
45 | * The validator allows the caller to verify newly-read data and modify | ||
46 | * the data just before writing, e.g. to calculate checksums. It's | ||
47 | * important to be consistent with your use of validators. The only time | ||
48 | * you can change validators is if you call dm_bm_write_lock_zero. | ||
49 | */ | ||
50 | struct dm_block_validator { | ||
51 | const char *name; | ||
52 | void (*prepare_for_write)(struct dm_block_validator *v, struct dm_block *b, size_t block_size); | ||
53 | |||
54 | /* | ||
55 | * Return 0 if the checksum is valid or < 0 on error. | ||
56 | */ | ||
57 | int (*check)(struct dm_block_validator *v, struct dm_block *b, size_t block_size); | ||
58 | }; | ||
59 | |||
60 | /*----------------------------------------------------------------*/ | ||
61 | |||
62 | /* | ||
63 | * You can have multiple concurrent readers or a single writer holding a | ||
64 | * block lock. | ||
65 | */ | ||
66 | |||
67 | /* | ||
68 | * dm_bm_lock() locks a block and returns through @result a pointer to | ||
69 | * memory that holds a copy of that block. If you have write-locked the | ||
70 | * block then any changes you make to memory pointed to by @result will be | ||
71 | * written back to the disk sometime after dm_bm_unlock is called. | ||
72 | */ | ||
73 | int dm_bm_read_lock(struct dm_block_manager *bm, dm_block_t b, | ||
74 | struct dm_block_validator *v, | ||
75 | struct dm_block **result); | ||
76 | |||
77 | int dm_bm_write_lock(struct dm_block_manager *bm, dm_block_t b, | ||
78 | struct dm_block_validator *v, | ||
79 | struct dm_block **result); | ||
80 | |||
81 | /* | ||
82 | * The *_try_lock variants return -EWOULDBLOCK if the block isn't | ||
83 | * available immediately. | ||
84 | */ | ||
85 | int dm_bm_read_try_lock(struct dm_block_manager *bm, dm_block_t b, | ||
86 | struct dm_block_validator *v, | ||
87 | struct dm_block **result); | ||
88 | |||
89 | /* | ||
90 | * Use dm_bm_write_lock_zero() when you know you're going to | ||
91 | * overwrite the block completely. It saves a disk read. | ||
92 | */ | ||
93 | int dm_bm_write_lock_zero(struct dm_block_manager *bm, dm_block_t b, | ||
94 | struct dm_block_validator *v, | ||
95 | struct dm_block **result); | ||
96 | |||
97 | int dm_bm_unlock(struct dm_block *b); | ||
98 | |||
99 | /* | ||
100 | * An optimisation; we often want to copy a block's contents to a new | ||
101 | * block. eg, as part of the shadowing operation. It's far better for | ||
102 | * bufio to do this move behind the scenes than hold 2 locks and memcpy the | ||
103 | * data. | ||
104 | */ | ||
105 | int dm_bm_unlock_move(struct dm_block *b, dm_block_t n); | ||
106 | |||
107 | /* | ||
108 | * It's a common idiom to have a superblock that should be committed last. | ||
109 | * | ||
110 | * @superblock should be write-locked on entry. It will be unlocked during | ||
111 | * this function. All dirty blocks are guaranteed to be written and flushed | ||
112 | * before the superblock. | ||
113 | * | ||
114 | * This method always blocks. | ||
115 | */ | ||
116 | int dm_bm_flush_and_unlock(struct dm_block_manager *bm, | ||
117 | struct dm_block *superblock); | ||
118 | |||
119 | u32 dm_bm_checksum(const void *data, size_t len, u32 init_xor); | ||
120 | |||
121 | /*----------------------------------------------------------------*/ | ||
122 | |||
123 | #endif /* _LINUX_DM_BLOCK_MANAGER_H */ | ||
diff --git a/drivers/md/persistent-data/dm-btree-internal.h b/drivers/md/persistent-data/dm-btree-internal.h new file mode 100644 index 000000000000..d279c768f8f1 --- /dev/null +++ b/drivers/md/persistent-data/dm-btree-internal.h | |||
@@ -0,0 +1,137 @@ | |||
1 | /* | ||
2 | * Copyright (C) 2011 Red Hat, Inc. | ||
3 | * | ||
4 | * This file is released under the GPL. | ||
5 | */ | ||
6 | |||
7 | #ifndef DM_BTREE_INTERNAL_H | ||
8 | #define DM_BTREE_INTERNAL_H | ||
9 | |||
10 | #include "dm-btree.h" | ||
11 | |||
12 | /*----------------------------------------------------------------*/ | ||
13 | |||
14 | /* | ||
15 | * We'll need 2 accessor functions for n->csum and n->blocknr | ||
16 | * to support dm-btree-spine.c in that case. | ||
17 | */ | ||
18 | |||
19 | enum node_flags { | ||
20 | INTERNAL_NODE = 1, | ||
21 | LEAF_NODE = 1 << 1 | ||
22 | }; | ||
23 | |||
24 | /* | ||
25 | * Every btree node begins with this structure. Make sure it's a multiple | ||
26 | * of 8-bytes in size, otherwise the 64bit keys will be mis-aligned. | ||
27 | */ | ||
28 | struct node_header { | ||
29 | __le32 csum; | ||
30 | __le32 flags; | ||
31 | __le64 blocknr; /* Block this node is supposed to live in. */ | ||
32 | |||
33 | __le32 nr_entries; | ||
34 | __le32 max_entries; | ||
35 | __le32 value_size; | ||
36 | __le32 padding; | ||
37 | } __packed; | ||
38 | |||
39 | struct node { | ||
40 | struct node_header header; | ||
41 | __le64 keys[0]; | ||
42 | } __packed; | ||
43 | |||
44 | |||
45 | void inc_children(struct dm_transaction_manager *tm, struct node *n, | ||
46 | struct dm_btree_value_type *vt); | ||
47 | |||
48 | int new_block(struct dm_btree_info *info, struct dm_block **result); | ||
49 | int unlock_block(struct dm_btree_info *info, struct dm_block *b); | ||
50 | |||
51 | /* | ||
52 | * Spines keep track of the rolling locks. There are 2 variants, read-only | ||
53 | * and one that uses shadowing. These are separate structs to allow the | ||
54 | * type checker to spot misuse, for example accidentally calling read_lock | ||
55 | * on a shadow spine. | ||
56 | */ | ||
57 | struct ro_spine { | ||
58 | struct dm_btree_info *info; | ||
59 | |||
60 | int count; | ||
61 | struct dm_block *nodes[2]; | ||
62 | }; | ||
63 | |||
64 | void init_ro_spine(struct ro_spine *s, struct dm_btree_info *info); | ||
65 | int exit_ro_spine(struct ro_spine *s); | ||
66 | int ro_step(struct ro_spine *s, dm_block_t new_child); | ||
67 | struct node *ro_node(struct ro_spine *s); | ||
68 | |||
69 | struct shadow_spine { | ||
70 | struct dm_btree_info *info; | ||
71 | |||
72 | int count; | ||
73 | struct dm_block *nodes[2]; | ||
74 | |||
75 | dm_block_t root; | ||
76 | }; | ||
77 | |||
78 | void init_shadow_spine(struct shadow_spine *s, struct dm_btree_info *info); | ||
79 | int exit_shadow_spine(struct shadow_spine *s); | ||
80 | |||
81 | int shadow_step(struct shadow_spine *s, dm_block_t b, | ||
82 | struct dm_btree_value_type *vt); | ||
83 | |||
84 | /* | ||
85 | * The spine must have at least one entry before calling this. | ||
86 | */ | ||
87 | struct dm_block *shadow_current(struct shadow_spine *s); | ||
88 | |||
89 | /* | ||
90 | * The spine must have at least two entries before calling this. | ||
91 | */ | ||
92 | struct dm_block *shadow_parent(struct shadow_spine *s); | ||
93 | |||
94 | int shadow_has_parent(struct shadow_spine *s); | ||
95 | |||
96 | int shadow_root(struct shadow_spine *s); | ||
97 | |||
98 | /* | ||
99 | * Some inlines. | ||
100 | */ | ||
101 | static inline __le64 *key_ptr(struct node *n, uint32_t index) | ||
102 | { | ||
103 | return n->keys + index; | ||
104 | } | ||
105 | |||
106 | static inline void *value_base(struct node *n) | ||
107 | { | ||
108 | return &n->keys[le32_to_cpu(n->header.max_entries)]; | ||
109 | } | ||
110 | |||
111 | /* | ||
112 | * FIXME: Now that value size is stored in node we don't need the third parm. | ||
113 | */ | ||
114 | static inline void *value_ptr(struct node *n, uint32_t index, size_t value_size) | ||
115 | { | ||
116 | BUG_ON(value_size != le32_to_cpu(n->header.value_size)); | ||
117 | return value_base(n) + (value_size * index); | ||
118 | } | ||
119 | |||
120 | /* | ||
121 | * Assumes the values are suitably-aligned and converts to core format. | ||
122 | */ | ||
123 | static inline uint64_t value64(struct node *n, uint32_t index) | ||
124 | { | ||
125 | __le64 *values_le = value_base(n); | ||
126 | |||
127 | return le64_to_cpu(values_le[index]); | ||
128 | } | ||
129 | |||
130 | /* | ||
131 | * Searching for a key within a single node. | ||
132 | */ | ||
133 | int lower_bound(struct node *n, uint64_t key); | ||
134 | |||
135 | extern struct dm_block_validator btree_node_validator; | ||
136 | |||
137 | #endif /* DM_BTREE_INTERNAL_H */ | ||
diff --git a/drivers/md/persistent-data/dm-btree-remove.c b/drivers/md/persistent-data/dm-btree-remove.c new file mode 100644 index 000000000000..65fd85ec6514 --- /dev/null +++ b/drivers/md/persistent-data/dm-btree-remove.c | |||
@@ -0,0 +1,566 @@ | |||
1 | /* | ||
2 | * Copyright (C) 2011 Red Hat, Inc. | ||
3 | * | ||
4 | * This file is released under the GPL. | ||
5 | */ | ||
6 | |||
7 | #include "dm-btree.h" | ||
8 | #include "dm-btree-internal.h" | ||
9 | #include "dm-transaction-manager.h" | ||
10 | |||
11 | #include <linux/module.h> | ||
12 | |||
13 | /* | ||
14 | * Removing an entry from a btree | ||
15 | * ============================== | ||
16 | * | ||
17 | * A very important constraint for our btree is that no node, except the | ||
18 | * root, may have fewer than a certain number of entries. | ||
19 | * (MIN_ENTRIES <= nr_entries <= MAX_ENTRIES). | ||
20 | * | ||
21 | * Ensuring this is complicated by the way we want to only ever hold the | ||
22 | * locks on 2 nodes concurrently, and only change nodes in a top to bottom | ||
23 | * fashion. | ||
24 | * | ||
25 | * Each node may have a left or right sibling. When decending the spine, | ||
26 | * if a node contains only MIN_ENTRIES then we try and increase this to at | ||
27 | * least MIN_ENTRIES + 1. We do this in the following ways: | ||
28 | * | ||
29 | * [A] No siblings => this can only happen if the node is the root, in which | ||
30 | * case we copy the childs contents over the root. | ||
31 | * | ||
32 | * [B] No left sibling | ||
33 | * ==> rebalance(node, right sibling) | ||
34 | * | ||
35 | * [C] No right sibling | ||
36 | * ==> rebalance(left sibling, node) | ||
37 | * | ||
38 | * [D] Both siblings, total_entries(left, node, right) <= DEL_THRESHOLD | ||
39 | * ==> delete node adding it's contents to left and right | ||
40 | * | ||
41 | * [E] Both siblings, total_entries(left, node, right) > DEL_THRESHOLD | ||
42 | * ==> rebalance(left, node, right) | ||
43 | * | ||
44 | * After these operations it's possible that the our original node no | ||
45 | * longer contains the desired sub tree. For this reason this rebalancing | ||
46 | * is performed on the children of the current node. This also avoids | ||
47 | * having a special case for the root. | ||
48 | * | ||
49 | * Once this rebalancing has occurred we can then step into the child node | ||
50 | * for internal nodes. Or delete the entry for leaf nodes. | ||
51 | */ | ||
52 | |||
53 | /* | ||
54 | * Some little utilities for moving node data around. | ||
55 | */ | ||
56 | static void node_shift(struct node *n, int shift) | ||
57 | { | ||
58 | uint32_t nr_entries = le32_to_cpu(n->header.nr_entries); | ||
59 | uint32_t value_size = le32_to_cpu(n->header.value_size); | ||
60 | |||
61 | if (shift < 0) { | ||
62 | shift = -shift; | ||
63 | BUG_ON(shift > nr_entries); | ||
64 | BUG_ON((void *) key_ptr(n, shift) >= value_ptr(n, shift, value_size)); | ||
65 | memmove(key_ptr(n, 0), | ||
66 | key_ptr(n, shift), | ||
67 | (nr_entries - shift) * sizeof(__le64)); | ||
68 | memmove(value_ptr(n, 0, value_size), | ||
69 | value_ptr(n, shift, value_size), | ||
70 | (nr_entries - shift) * value_size); | ||
71 | } else { | ||
72 | BUG_ON(nr_entries + shift > le32_to_cpu(n->header.max_entries)); | ||
73 | memmove(key_ptr(n, shift), | ||
74 | key_ptr(n, 0), | ||
75 | nr_entries * sizeof(__le64)); | ||
76 | memmove(value_ptr(n, shift, value_size), | ||
77 | value_ptr(n, 0, value_size), | ||
78 | nr_entries * value_size); | ||
79 | } | ||
80 | } | ||
81 | |||
82 | static void node_copy(struct node *left, struct node *right, int shift) | ||
83 | { | ||
84 | uint32_t nr_left = le32_to_cpu(left->header.nr_entries); | ||
85 | uint32_t value_size = le32_to_cpu(left->header.value_size); | ||
86 | BUG_ON(value_size != le32_to_cpu(right->header.value_size)); | ||
87 | |||
88 | if (shift < 0) { | ||
89 | shift = -shift; | ||
90 | BUG_ON(nr_left + shift > le32_to_cpu(left->header.max_entries)); | ||
91 | memcpy(key_ptr(left, nr_left), | ||
92 | key_ptr(right, 0), | ||
93 | shift * sizeof(__le64)); | ||
94 | memcpy(value_ptr(left, nr_left, value_size), | ||
95 | value_ptr(right, 0, value_size), | ||
96 | shift * value_size); | ||
97 | } else { | ||
98 | BUG_ON(shift > le32_to_cpu(right->header.max_entries)); | ||
99 | memcpy(key_ptr(right, 0), | ||
100 | key_ptr(left, nr_left - shift), | ||
101 | shift * sizeof(__le64)); | ||
102 | memcpy(value_ptr(right, 0, value_size), | ||
103 | value_ptr(left, nr_left - shift, value_size), | ||
104 | shift * value_size); | ||
105 | } | ||
106 | } | ||
107 | |||
108 | /* | ||
109 | * Delete a specific entry from a leaf node. | ||
110 | */ | ||
111 | static void delete_at(struct node *n, unsigned index) | ||
112 | { | ||
113 | unsigned nr_entries = le32_to_cpu(n->header.nr_entries); | ||
114 | unsigned nr_to_copy = nr_entries - (index + 1); | ||
115 | uint32_t value_size = le32_to_cpu(n->header.value_size); | ||
116 | BUG_ON(index >= nr_entries); | ||
117 | |||
118 | if (nr_to_copy) { | ||
119 | memmove(key_ptr(n, index), | ||
120 | key_ptr(n, index + 1), | ||
121 | nr_to_copy * sizeof(__le64)); | ||
122 | |||
123 | memmove(value_ptr(n, index, value_size), | ||
124 | value_ptr(n, index + 1, value_size), | ||
125 | nr_to_copy * value_size); | ||
126 | } | ||
127 | |||
128 | n->header.nr_entries = cpu_to_le32(nr_entries - 1); | ||
129 | } | ||
130 | |||
131 | static unsigned del_threshold(struct node *n) | ||
132 | { | ||
133 | return le32_to_cpu(n->header.max_entries) / 3; | ||
134 | } | ||
135 | |||
136 | static unsigned merge_threshold(struct node *n) | ||
137 | { | ||
138 | /* | ||
139 | * The extra one is because we know we're potentially going to | ||
140 | * delete an entry. | ||
141 | */ | ||
142 | return 2 * (le32_to_cpu(n->header.max_entries) / 3) + 1; | ||
143 | } | ||
144 | |||
145 | struct child { | ||
146 | unsigned index; | ||
147 | struct dm_block *block; | ||
148 | struct node *n; | ||
149 | }; | ||
150 | |||
151 | static struct dm_btree_value_type le64_type = { | ||
152 | .context = NULL, | ||
153 | .size = sizeof(__le64), | ||
154 | .inc = NULL, | ||
155 | .dec = NULL, | ||
156 | .equal = NULL | ||
157 | }; | ||
158 | |||
159 | static int init_child(struct dm_btree_info *info, struct node *parent, | ||
160 | unsigned index, struct child *result) | ||
161 | { | ||
162 | int r, inc; | ||
163 | dm_block_t root; | ||
164 | |||
165 | result->index = index; | ||
166 | root = value64(parent, index); | ||
167 | |||
168 | r = dm_tm_shadow_block(info->tm, root, &btree_node_validator, | ||
169 | &result->block, &inc); | ||
170 | if (r) | ||
171 | return r; | ||
172 | |||
173 | result->n = dm_block_data(result->block); | ||
174 | |||
175 | if (inc) | ||
176 | inc_children(info->tm, result->n, &le64_type); | ||
177 | |||
178 | *((__le64 *) value_ptr(parent, index, sizeof(__le64))) = | ||
179 | cpu_to_le64(dm_block_location(result->block)); | ||
180 | |||
181 | return 0; | ||
182 | } | ||
183 | |||
184 | static int exit_child(struct dm_btree_info *info, struct child *c) | ||
185 | { | ||
186 | return dm_tm_unlock(info->tm, c->block); | ||
187 | } | ||
188 | |||
189 | static void shift(struct node *left, struct node *right, int count) | ||
190 | { | ||
191 | if (!count) | ||
192 | return; | ||
193 | |||
194 | if (count > 0) { | ||
195 | node_shift(right, count); | ||
196 | node_copy(left, right, count); | ||
197 | } else { | ||
198 | node_copy(left, right, count); | ||
199 | node_shift(right, count); | ||
200 | } | ||
201 | |||
202 | left->header.nr_entries = | ||
203 | cpu_to_le32(le32_to_cpu(left->header.nr_entries) - count); | ||
204 | BUG_ON(le32_to_cpu(left->header.nr_entries) > le32_to_cpu(left->header.max_entries)); | ||
205 | |||
206 | right->header.nr_entries = | ||
207 | cpu_to_le32(le32_to_cpu(right->header.nr_entries) + count); | ||
208 | BUG_ON(le32_to_cpu(right->header.nr_entries) > le32_to_cpu(right->header.max_entries)); | ||
209 | } | ||
210 | |||
211 | static void __rebalance2(struct dm_btree_info *info, struct node *parent, | ||
212 | struct child *l, struct child *r) | ||
213 | { | ||
214 | struct node *left = l->n; | ||
215 | struct node *right = r->n; | ||
216 | uint32_t nr_left = le32_to_cpu(left->header.nr_entries); | ||
217 | uint32_t nr_right = le32_to_cpu(right->header.nr_entries); | ||
218 | |||
219 | if (nr_left + nr_right <= merge_threshold(left)) { | ||
220 | /* | ||
221 | * Merge | ||
222 | */ | ||
223 | node_copy(left, right, -nr_right); | ||
224 | left->header.nr_entries = cpu_to_le32(nr_left + nr_right); | ||
225 | delete_at(parent, r->index); | ||
226 | |||
227 | /* | ||
228 | * We need to decrement the right block, but not it's | ||
229 | * children, since they're still referenced by left. | ||
230 | */ | ||
231 | dm_tm_dec(info->tm, dm_block_location(r->block)); | ||
232 | } else { | ||
233 | /* | ||
234 | * Rebalance. | ||
235 | */ | ||
236 | unsigned target_left = (nr_left + nr_right) / 2; | ||
237 | unsigned shift_ = nr_left - target_left; | ||
238 | BUG_ON(le32_to_cpu(left->header.max_entries) <= nr_left - shift_); | ||
239 | BUG_ON(le32_to_cpu(right->header.max_entries) <= nr_right + shift_); | ||
240 | shift(left, right, nr_left - target_left); | ||
241 | *key_ptr(parent, r->index) = right->keys[0]; | ||
242 | } | ||
243 | } | ||
244 | |||
245 | static int rebalance2(struct shadow_spine *s, struct dm_btree_info *info, | ||
246 | unsigned left_index) | ||
247 | { | ||
248 | int r; | ||
249 | struct node *parent; | ||
250 | struct child left, right; | ||
251 | |||
252 | parent = dm_block_data(shadow_current(s)); | ||
253 | |||
254 | r = init_child(info, parent, left_index, &left); | ||
255 | if (r) | ||
256 | return r; | ||
257 | |||
258 | r = init_child(info, parent, left_index + 1, &right); | ||
259 | if (r) { | ||
260 | exit_child(info, &left); | ||
261 | return r; | ||
262 | } | ||
263 | |||
264 | __rebalance2(info, parent, &left, &right); | ||
265 | |||
266 | r = exit_child(info, &left); | ||
267 | if (r) { | ||
268 | exit_child(info, &right); | ||
269 | return r; | ||
270 | } | ||
271 | |||
272 | return exit_child(info, &right); | ||
273 | } | ||
274 | |||
275 | static void __rebalance3(struct dm_btree_info *info, struct node *parent, | ||
276 | struct child *l, struct child *c, struct child *r) | ||
277 | { | ||
278 | struct node *left = l->n; | ||
279 | struct node *center = c->n; | ||
280 | struct node *right = r->n; | ||
281 | |||
282 | uint32_t nr_left = le32_to_cpu(left->header.nr_entries); | ||
283 | uint32_t nr_center = le32_to_cpu(center->header.nr_entries); | ||
284 | uint32_t nr_right = le32_to_cpu(right->header.nr_entries); | ||
285 | uint32_t max_entries = le32_to_cpu(left->header.max_entries); | ||
286 | |||
287 | unsigned target; | ||
288 | |||
289 | BUG_ON(left->header.max_entries != center->header.max_entries); | ||
290 | BUG_ON(center->header.max_entries != right->header.max_entries); | ||
291 | |||
292 | if (((nr_left + nr_center + nr_right) / 2) < merge_threshold(center)) { | ||
293 | /* | ||
294 | * Delete center node: | ||
295 | * | ||
296 | * We dump as many entries from center as possible into | ||
297 | * left, then the rest in right, then rebalance2. This | ||
298 | * wastes some cpu, but I want something simple atm. | ||
299 | */ | ||
300 | unsigned shift = min(max_entries - nr_left, nr_center); | ||
301 | |||
302 | BUG_ON(nr_left + shift > max_entries); | ||
303 | node_copy(left, center, -shift); | ||
304 | left->header.nr_entries = cpu_to_le32(nr_left + shift); | ||
305 | |||
306 | if (shift != nr_center) { | ||
307 | shift = nr_center - shift; | ||
308 | BUG_ON((nr_right + shift) >= max_entries); | ||
309 | node_shift(right, shift); | ||
310 | node_copy(center, right, shift); | ||
311 | right->header.nr_entries = cpu_to_le32(nr_right + shift); | ||
312 | } | ||
313 | *key_ptr(parent, r->index) = right->keys[0]; | ||
314 | |||
315 | delete_at(parent, c->index); | ||
316 | r->index--; | ||
317 | |||
318 | dm_tm_dec(info->tm, dm_block_location(c->block)); | ||
319 | __rebalance2(info, parent, l, r); | ||
320 | |||
321 | return; | ||
322 | } | ||
323 | |||
324 | /* | ||
325 | * Rebalance | ||
326 | */ | ||
327 | target = (nr_left + nr_center + nr_right) / 3; | ||
328 | BUG_ON(target > max_entries); | ||
329 | |||
330 | /* | ||
331 | * Adjust the left node | ||
332 | */ | ||
333 | shift(left, center, nr_left - target); | ||
334 | |||
335 | /* | ||
336 | * Adjust the right node | ||
337 | */ | ||
338 | shift(center, right, target - nr_right); | ||
339 | *key_ptr(parent, c->index) = center->keys[0]; | ||
340 | *key_ptr(parent, r->index) = right->keys[0]; | ||
341 | } | ||
342 | |||
343 | static int rebalance3(struct shadow_spine *s, struct dm_btree_info *info, | ||
344 | unsigned left_index) | ||
345 | { | ||
346 | int r; | ||
347 | struct node *parent = dm_block_data(shadow_current(s)); | ||
348 | struct child left, center, right; | ||
349 | |||
350 | /* | ||
351 | * FIXME: fill out an array? | ||
352 | */ | ||
353 | r = init_child(info, parent, left_index, &left); | ||
354 | if (r) | ||
355 | return r; | ||
356 | |||
357 | r = init_child(info, parent, left_index + 1, ¢er); | ||
358 | if (r) { | ||
359 | exit_child(info, &left); | ||
360 | return r; | ||
361 | } | ||
362 | |||
363 | r = init_child(info, parent, left_index + 2, &right); | ||
364 | if (r) { | ||
365 | exit_child(info, &left); | ||
366 | exit_child(info, ¢er); | ||
367 | return r; | ||
368 | } | ||
369 | |||
370 | __rebalance3(info, parent, &left, ¢er, &right); | ||
371 | |||
372 | r = exit_child(info, &left); | ||
373 | if (r) { | ||
374 | exit_child(info, ¢er); | ||
375 | exit_child(info, &right); | ||
376 | return r; | ||
377 | } | ||
378 | |||
379 | r = exit_child(info, ¢er); | ||
380 | if (r) { | ||
381 | exit_child(info, &right); | ||
382 | return r; | ||
383 | } | ||
384 | |||
385 | r = exit_child(info, &right); | ||
386 | if (r) | ||
387 | return r; | ||
388 | |||
389 | return 0; | ||
390 | } | ||
391 | |||
392 | static int get_nr_entries(struct dm_transaction_manager *tm, | ||
393 | dm_block_t b, uint32_t *result) | ||
394 | { | ||
395 | int r; | ||
396 | struct dm_block *block; | ||
397 | struct node *n; | ||
398 | |||
399 | r = dm_tm_read_lock(tm, b, &btree_node_validator, &block); | ||
400 | if (r) | ||
401 | return r; | ||
402 | |||
403 | n = dm_block_data(block); | ||
404 | *result = le32_to_cpu(n->header.nr_entries); | ||
405 | |||
406 | return dm_tm_unlock(tm, block); | ||
407 | } | ||
408 | |||
409 | static int rebalance_children(struct shadow_spine *s, | ||
410 | struct dm_btree_info *info, uint64_t key) | ||
411 | { | ||
412 | int i, r, has_left_sibling, has_right_sibling; | ||
413 | uint32_t child_entries; | ||
414 | struct node *n; | ||
415 | |||
416 | n = dm_block_data(shadow_current(s)); | ||
417 | |||
418 | if (le32_to_cpu(n->header.nr_entries) == 1) { | ||
419 | struct dm_block *child; | ||
420 | dm_block_t b = value64(n, 0); | ||
421 | |||
422 | r = dm_tm_read_lock(info->tm, b, &btree_node_validator, &child); | ||
423 | if (r) | ||
424 | return r; | ||
425 | |||
426 | memcpy(n, dm_block_data(child), | ||
427 | dm_bm_block_size(dm_tm_get_bm(info->tm))); | ||
428 | r = dm_tm_unlock(info->tm, child); | ||
429 | if (r) | ||
430 | return r; | ||
431 | |||
432 | dm_tm_dec(info->tm, dm_block_location(child)); | ||
433 | return 0; | ||
434 | } | ||
435 | |||
436 | i = lower_bound(n, key); | ||
437 | if (i < 0) | ||
438 | return -ENODATA; | ||
439 | |||
440 | r = get_nr_entries(info->tm, value64(n, i), &child_entries); | ||
441 | if (r) | ||
442 | return r; | ||
443 | |||
444 | if (child_entries > del_threshold(n)) | ||
445 | return 0; | ||
446 | |||
447 | has_left_sibling = i > 0; | ||
448 | has_right_sibling = i < (le32_to_cpu(n->header.nr_entries) - 1); | ||
449 | |||
450 | if (!has_left_sibling) | ||
451 | r = rebalance2(s, info, i); | ||
452 | |||
453 | else if (!has_right_sibling) | ||
454 | r = rebalance2(s, info, i - 1); | ||
455 | |||
456 | else | ||
457 | r = rebalance3(s, info, i - 1); | ||
458 | |||
459 | return r; | ||
460 | } | ||
461 | |||
462 | static int do_leaf(struct node *n, uint64_t key, unsigned *index) | ||
463 | { | ||
464 | int i = lower_bound(n, key); | ||
465 | |||
466 | if ((i < 0) || | ||
467 | (i >= le32_to_cpu(n->header.nr_entries)) || | ||
468 | (le64_to_cpu(n->keys[i]) != key)) | ||
469 | return -ENODATA; | ||
470 | |||
471 | *index = i; | ||
472 | |||
473 | return 0; | ||
474 | } | ||
475 | |||
476 | /* | ||
477 | * Prepares for removal from one level of the hierarchy. The caller must | ||
478 | * call delete_at() to remove the entry at index. | ||
479 | */ | ||
480 | static int remove_raw(struct shadow_spine *s, struct dm_btree_info *info, | ||
481 | struct dm_btree_value_type *vt, dm_block_t root, | ||
482 | uint64_t key, unsigned *index) | ||
483 | { | ||
484 | int i = *index, r; | ||
485 | struct node *n; | ||
486 | |||
487 | for (;;) { | ||
488 | r = shadow_step(s, root, vt); | ||
489 | if (r < 0) | ||
490 | break; | ||
491 | |||
492 | /* | ||
493 | * We have to patch up the parent node, ugly, but I don't | ||
494 | * see a way to do this automatically as part of the spine | ||
495 | * op. | ||
496 | */ | ||
497 | if (shadow_has_parent(s)) { | ||
498 | __le64 location = cpu_to_le64(dm_block_location(shadow_current(s))); | ||
499 | memcpy(value_ptr(dm_block_data(shadow_parent(s)), i, sizeof(__le64)), | ||
500 | &location, sizeof(__le64)); | ||
501 | } | ||
502 | |||
503 | n = dm_block_data(shadow_current(s)); | ||
504 | |||
505 | if (le32_to_cpu(n->header.flags) & LEAF_NODE) | ||
506 | return do_leaf(n, key, index); | ||
507 | |||
508 | r = rebalance_children(s, info, key); | ||
509 | if (r) | ||
510 | break; | ||
511 | |||
512 | n = dm_block_data(shadow_current(s)); | ||
513 | if (le32_to_cpu(n->header.flags) & LEAF_NODE) | ||
514 | return do_leaf(n, key, index); | ||
515 | |||
516 | i = lower_bound(n, key); | ||
517 | |||
518 | /* | ||
519 | * We know the key is present, or else | ||
520 | * rebalance_children would have returned | ||
521 | * -ENODATA | ||
522 | */ | ||
523 | root = value64(n, i); | ||
524 | } | ||
525 | |||
526 | return r; | ||
527 | } | ||
528 | |||
529 | int dm_btree_remove(struct dm_btree_info *info, dm_block_t root, | ||
530 | uint64_t *keys, dm_block_t *new_root) | ||
531 | { | ||
532 | unsigned level, last_level = info->levels - 1; | ||
533 | int index = 0, r = 0; | ||
534 | struct shadow_spine spine; | ||
535 | struct node *n; | ||
536 | |||
537 | init_shadow_spine(&spine, info); | ||
538 | for (level = 0; level < info->levels; level++) { | ||
539 | r = remove_raw(&spine, info, | ||
540 | (level == last_level ? | ||
541 | &info->value_type : &le64_type), | ||
542 | root, keys[level], (unsigned *)&index); | ||
543 | if (r < 0) | ||
544 | break; | ||
545 | |||
546 | n = dm_block_data(shadow_current(&spine)); | ||
547 | if (level != last_level) { | ||
548 | root = value64(n, index); | ||
549 | continue; | ||
550 | } | ||
551 | |||
552 | BUG_ON(index < 0 || index >= le32_to_cpu(n->header.nr_entries)); | ||
553 | |||
554 | if (info->value_type.dec) | ||
555 | info->value_type.dec(info->value_type.context, | ||
556 | value_ptr(n, index, info->value_type.size)); | ||
557 | |||
558 | delete_at(n, index); | ||
559 | } | ||
560 | |||
561 | *new_root = shadow_root(&spine); | ||
562 | exit_shadow_spine(&spine); | ||
563 | |||
564 | return r; | ||
565 | } | ||
566 | EXPORT_SYMBOL_GPL(dm_btree_remove); | ||
diff --git a/drivers/md/persistent-data/dm-btree-spine.c b/drivers/md/persistent-data/dm-btree-spine.c new file mode 100644 index 000000000000..d9a7912ee8ee --- /dev/null +++ b/drivers/md/persistent-data/dm-btree-spine.c | |||
@@ -0,0 +1,244 @@ | |||
1 | /* | ||
2 | * Copyright (C) 2011 Red Hat, Inc. | ||
3 | * | ||
4 | * This file is released under the GPL. | ||
5 | */ | ||
6 | |||
7 | #include "dm-btree-internal.h" | ||
8 | #include "dm-transaction-manager.h" | ||
9 | |||
10 | #include <linux/device-mapper.h> | ||
11 | |||
12 | #define DM_MSG_PREFIX "btree spine" | ||
13 | |||
14 | /*----------------------------------------------------------------*/ | ||
15 | |||
16 | #define BTREE_CSUM_XOR 121107 | ||
17 | |||
18 | static int node_check(struct dm_block_validator *v, | ||
19 | struct dm_block *b, | ||
20 | size_t block_size); | ||
21 | |||
22 | static void node_prepare_for_write(struct dm_block_validator *v, | ||
23 | struct dm_block *b, | ||
24 | size_t block_size) | ||
25 | { | ||
26 | struct node *n = dm_block_data(b); | ||
27 | struct node_header *h = &n->header; | ||
28 | |||
29 | h->blocknr = cpu_to_le64(dm_block_location(b)); | ||
30 | h->csum = cpu_to_le32(dm_bm_checksum(&h->flags, | ||
31 | block_size - sizeof(__le32), | ||
32 | BTREE_CSUM_XOR)); | ||
33 | |||
34 | BUG_ON(node_check(v, b, 4096)); | ||
35 | } | ||
36 | |||
37 | static int node_check(struct dm_block_validator *v, | ||
38 | struct dm_block *b, | ||
39 | size_t block_size) | ||
40 | { | ||
41 | struct node *n = dm_block_data(b); | ||
42 | struct node_header *h = &n->header; | ||
43 | size_t value_size; | ||
44 | __le32 csum_disk; | ||
45 | uint32_t flags; | ||
46 | |||
47 | if (dm_block_location(b) != le64_to_cpu(h->blocknr)) { | ||
48 | DMERR("node_check failed blocknr %llu wanted %llu", | ||
49 | le64_to_cpu(h->blocknr), dm_block_location(b)); | ||
50 | return -ENOTBLK; | ||
51 | } | ||
52 | |||
53 | csum_disk = cpu_to_le32(dm_bm_checksum(&h->flags, | ||
54 | block_size - sizeof(__le32), | ||
55 | BTREE_CSUM_XOR)); | ||
56 | if (csum_disk != h->csum) { | ||
57 | DMERR("node_check failed csum %u wanted %u", | ||
58 | le32_to_cpu(csum_disk), le32_to_cpu(h->csum)); | ||
59 | return -EILSEQ; | ||
60 | } | ||
61 | |||
62 | value_size = le32_to_cpu(h->value_size); | ||
63 | |||
64 | if (sizeof(struct node_header) + | ||
65 | (sizeof(__le64) + value_size) * le32_to_cpu(h->max_entries) > block_size) { | ||
66 | DMERR("node_check failed: max_entries too large"); | ||
67 | return -EILSEQ; | ||
68 | } | ||
69 | |||
70 | if (le32_to_cpu(h->nr_entries) > le32_to_cpu(h->max_entries)) { | ||
71 | DMERR("node_check failed, too many entries"); | ||
72 | return -EILSEQ; | ||
73 | } | ||
74 | |||
75 | /* | ||
76 | * The node must be either INTERNAL or LEAF. | ||
77 | */ | ||
78 | flags = le32_to_cpu(h->flags); | ||
79 | if (!(flags & INTERNAL_NODE) && !(flags & LEAF_NODE)) { | ||
80 | DMERR("node_check failed, node is neither INTERNAL or LEAF"); | ||
81 | return -EILSEQ; | ||
82 | } | ||
83 | |||
84 | return 0; | ||
85 | } | ||
86 | |||
87 | struct dm_block_validator btree_node_validator = { | ||
88 | .name = "btree_node", | ||
89 | .prepare_for_write = node_prepare_for_write, | ||
90 | .check = node_check | ||
91 | }; | ||
92 | |||
93 | /*----------------------------------------------------------------*/ | ||
94 | |||
95 | static int bn_read_lock(struct dm_btree_info *info, dm_block_t b, | ||
96 | struct dm_block **result) | ||
97 | { | ||
98 | return dm_tm_read_lock(info->tm, b, &btree_node_validator, result); | ||
99 | } | ||
100 | |||
101 | static int bn_shadow(struct dm_btree_info *info, dm_block_t orig, | ||
102 | struct dm_btree_value_type *vt, | ||
103 | struct dm_block **result) | ||
104 | { | ||
105 | int r, inc; | ||
106 | |||
107 | r = dm_tm_shadow_block(info->tm, orig, &btree_node_validator, | ||
108 | result, &inc); | ||
109 | if (!r && inc) | ||
110 | inc_children(info->tm, dm_block_data(*result), vt); | ||
111 | |||
112 | return r; | ||
113 | } | ||
114 | |||
115 | int new_block(struct dm_btree_info *info, struct dm_block **result) | ||
116 | { | ||
117 | return dm_tm_new_block(info->tm, &btree_node_validator, result); | ||
118 | } | ||
119 | |||
120 | int unlock_block(struct dm_btree_info *info, struct dm_block *b) | ||
121 | { | ||
122 | return dm_tm_unlock(info->tm, b); | ||
123 | } | ||
124 | |||
125 | /*----------------------------------------------------------------*/ | ||
126 | |||
127 | void init_ro_spine(struct ro_spine *s, struct dm_btree_info *info) | ||
128 | { | ||
129 | s->info = info; | ||
130 | s->count = 0; | ||
131 | s->nodes[0] = NULL; | ||
132 | s->nodes[1] = NULL; | ||
133 | } | ||
134 | |||
135 | int exit_ro_spine(struct ro_spine *s) | ||
136 | { | ||
137 | int r = 0, i; | ||
138 | |||
139 | for (i = 0; i < s->count; i++) { | ||
140 | int r2 = unlock_block(s->info, s->nodes[i]); | ||
141 | if (r2 < 0) | ||
142 | r = r2; | ||
143 | } | ||
144 | |||
145 | return r; | ||
146 | } | ||
147 | |||
148 | int ro_step(struct ro_spine *s, dm_block_t new_child) | ||
149 | { | ||
150 | int r; | ||
151 | |||
152 | if (s->count == 2) { | ||
153 | r = unlock_block(s->info, s->nodes[0]); | ||
154 | if (r < 0) | ||
155 | return r; | ||
156 | s->nodes[0] = s->nodes[1]; | ||
157 | s->count--; | ||
158 | } | ||
159 | |||
160 | r = bn_read_lock(s->info, new_child, s->nodes + s->count); | ||
161 | if (!r) | ||
162 | s->count++; | ||
163 | |||
164 | return r; | ||
165 | } | ||
166 | |||
167 | struct node *ro_node(struct ro_spine *s) | ||
168 | { | ||
169 | struct dm_block *block; | ||
170 | |||
171 | BUG_ON(!s->count); | ||
172 | block = s->nodes[s->count - 1]; | ||
173 | |||
174 | return dm_block_data(block); | ||
175 | } | ||
176 | |||
177 | /*----------------------------------------------------------------*/ | ||
178 | |||
179 | void init_shadow_spine(struct shadow_spine *s, struct dm_btree_info *info) | ||
180 | { | ||
181 | s->info = info; | ||
182 | s->count = 0; | ||
183 | } | ||
184 | |||
185 | int exit_shadow_spine(struct shadow_spine *s) | ||
186 | { | ||
187 | int r = 0, i; | ||
188 | |||
189 | for (i = 0; i < s->count; i++) { | ||
190 | int r2 = unlock_block(s->info, s->nodes[i]); | ||
191 | if (r2 < 0) | ||
192 | r = r2; | ||
193 | } | ||
194 | |||
195 | return r; | ||
196 | } | ||
197 | |||
198 | int shadow_step(struct shadow_spine *s, dm_block_t b, | ||
199 | struct dm_btree_value_type *vt) | ||
200 | { | ||
201 | int r; | ||
202 | |||
203 | if (s->count == 2) { | ||
204 | r = unlock_block(s->info, s->nodes[0]); | ||
205 | if (r < 0) | ||
206 | return r; | ||
207 | s->nodes[0] = s->nodes[1]; | ||
208 | s->count--; | ||
209 | } | ||
210 | |||
211 | r = bn_shadow(s->info, b, vt, s->nodes + s->count); | ||
212 | if (!r) { | ||
213 | if (!s->count) | ||
214 | s->root = dm_block_location(s->nodes[0]); | ||
215 | |||
216 | s->count++; | ||
217 | } | ||
218 | |||
219 | return r; | ||
220 | } | ||
221 | |||
222 | struct dm_block *shadow_current(struct shadow_spine *s) | ||
223 | { | ||
224 | BUG_ON(!s->count); | ||
225 | |||
226 | return s->nodes[s->count - 1]; | ||
227 | } | ||
228 | |||
229 | struct dm_block *shadow_parent(struct shadow_spine *s) | ||
230 | { | ||
231 | BUG_ON(s->count != 2); | ||
232 | |||
233 | return s->count == 2 ? s->nodes[0] : NULL; | ||
234 | } | ||
235 | |||
236 | int shadow_has_parent(struct shadow_spine *s) | ||
237 | { | ||
238 | return s->count >= 2; | ||
239 | } | ||
240 | |||
241 | int shadow_root(struct shadow_spine *s) | ||
242 | { | ||
243 | return s->root; | ||
244 | } | ||
diff --git a/drivers/md/persistent-data/dm-btree.c b/drivers/md/persistent-data/dm-btree.c new file mode 100644 index 000000000000..e0638be53ea4 --- /dev/null +++ b/drivers/md/persistent-data/dm-btree.c | |||
@@ -0,0 +1,805 @@ | |||
1 | /* | ||
2 | * Copyright (C) 2011 Red Hat, Inc. | ||
3 | * | ||
4 | * This file is released under the GPL. | ||
5 | */ | ||
6 | |||
7 | #include "dm-btree-internal.h" | ||
8 | #include "dm-space-map.h" | ||
9 | #include "dm-transaction-manager.h" | ||
10 | |||
11 | #include <linux/module.h> | ||
12 | #include <linux/device-mapper.h> | ||
13 | |||
14 | #define DM_MSG_PREFIX "btree" | ||
15 | |||
16 | /*---------------------------------------------------------------- | ||
17 | * Array manipulation | ||
18 | *--------------------------------------------------------------*/ | ||
19 | static void memcpy_disk(void *dest, const void *src, size_t len) | ||
20 | __dm_written_to_disk(src) | ||
21 | { | ||
22 | memcpy(dest, src, len); | ||
23 | __dm_unbless_for_disk(src); | ||
24 | } | ||
25 | |||
26 | static void array_insert(void *base, size_t elt_size, unsigned nr_elts, | ||
27 | unsigned index, void *elt) | ||
28 | __dm_written_to_disk(elt) | ||
29 | { | ||
30 | if (index < nr_elts) | ||
31 | memmove(base + (elt_size * (index + 1)), | ||
32 | base + (elt_size * index), | ||
33 | (nr_elts - index) * elt_size); | ||
34 | |||
35 | memcpy_disk(base + (elt_size * index), elt, elt_size); | ||
36 | } | ||
37 | |||
38 | /*----------------------------------------------------------------*/ | ||
39 | |||
40 | /* makes the assumption that no two keys are the same. */ | ||
41 | static int bsearch(struct node *n, uint64_t key, int want_hi) | ||
42 | { | ||
43 | int lo = -1, hi = le32_to_cpu(n->header.nr_entries); | ||
44 | |||
45 | while (hi - lo > 1) { | ||
46 | int mid = lo + ((hi - lo) / 2); | ||
47 | uint64_t mid_key = le64_to_cpu(n->keys[mid]); | ||
48 | |||
49 | if (mid_key == key) | ||
50 | return mid; | ||
51 | |||
52 | if (mid_key < key) | ||
53 | lo = mid; | ||
54 | else | ||
55 | hi = mid; | ||
56 | } | ||
57 | |||
58 | return want_hi ? hi : lo; | ||
59 | } | ||
60 | |||
61 | int lower_bound(struct node *n, uint64_t key) | ||
62 | { | ||
63 | return bsearch(n, key, 0); | ||
64 | } | ||
65 | |||
66 | void inc_children(struct dm_transaction_manager *tm, struct node *n, | ||
67 | struct dm_btree_value_type *vt) | ||
68 | { | ||
69 | unsigned i; | ||
70 | uint32_t nr_entries = le32_to_cpu(n->header.nr_entries); | ||
71 | |||
72 | if (le32_to_cpu(n->header.flags) & INTERNAL_NODE) | ||
73 | for (i = 0; i < nr_entries; i++) | ||
74 | dm_tm_inc(tm, value64(n, i)); | ||
75 | else if (vt->inc) | ||
76 | for (i = 0; i < nr_entries; i++) | ||
77 | vt->inc(vt->context, | ||
78 | value_ptr(n, i, vt->size)); | ||
79 | } | ||
80 | |||
81 | static int insert_at(size_t value_size, struct node *node, unsigned index, | ||
82 | uint64_t key, void *value) | ||
83 | __dm_written_to_disk(value) | ||
84 | { | ||
85 | uint32_t nr_entries = le32_to_cpu(node->header.nr_entries); | ||
86 | __le64 key_le = cpu_to_le64(key); | ||
87 | |||
88 | if (index > nr_entries || | ||
89 | index >= le32_to_cpu(node->header.max_entries)) { | ||
90 | DMERR("too many entries in btree node for insert"); | ||
91 | __dm_unbless_for_disk(value); | ||
92 | return -ENOMEM; | ||
93 | } | ||
94 | |||
95 | __dm_bless_for_disk(&key_le); | ||
96 | |||
97 | array_insert(node->keys, sizeof(*node->keys), nr_entries, index, &key_le); | ||
98 | array_insert(value_base(node), value_size, nr_entries, index, value); | ||
99 | node->header.nr_entries = cpu_to_le32(nr_entries + 1); | ||
100 | |||
101 | return 0; | ||
102 | } | ||
103 | |||
104 | /*----------------------------------------------------------------*/ | ||
105 | |||
106 | /* | ||
107 | * We want 3n entries (for some n). This works more nicely for repeated | ||
108 | * insert remove loops than (2n + 1). | ||
109 | */ | ||
110 | static uint32_t calc_max_entries(size_t value_size, size_t block_size) | ||
111 | { | ||
112 | uint32_t total, n; | ||
113 | size_t elt_size = sizeof(uint64_t) + value_size; /* key + value */ | ||
114 | |||
115 | block_size -= sizeof(struct node_header); | ||
116 | total = block_size / elt_size; | ||
117 | n = total / 3; /* rounds down */ | ||
118 | |||
119 | return 3 * n; | ||
120 | } | ||
121 | |||
122 | int dm_btree_empty(struct dm_btree_info *info, dm_block_t *root) | ||
123 | { | ||
124 | int r; | ||
125 | struct dm_block *b; | ||
126 | struct node *n; | ||
127 | size_t block_size; | ||
128 | uint32_t max_entries; | ||
129 | |||
130 | r = new_block(info, &b); | ||
131 | if (r < 0) | ||
132 | return r; | ||
133 | |||
134 | block_size = dm_bm_block_size(dm_tm_get_bm(info->tm)); | ||
135 | max_entries = calc_max_entries(info->value_type.size, block_size); | ||
136 | |||
137 | n = dm_block_data(b); | ||
138 | memset(n, 0, block_size); | ||
139 | n->header.flags = cpu_to_le32(LEAF_NODE); | ||
140 | n->header.nr_entries = cpu_to_le32(0); | ||
141 | n->header.max_entries = cpu_to_le32(max_entries); | ||
142 | n->header.value_size = cpu_to_le32(info->value_type.size); | ||
143 | |||
144 | *root = dm_block_location(b); | ||
145 | return unlock_block(info, b); | ||
146 | } | ||
147 | EXPORT_SYMBOL_GPL(dm_btree_empty); | ||
148 | |||
149 | /*----------------------------------------------------------------*/ | ||
150 | |||
151 | /* | ||
152 | * Deletion uses a recursive algorithm, since we have limited stack space | ||
153 | * we explicitly manage our own stack on the heap. | ||
154 | */ | ||
155 | #define MAX_SPINE_DEPTH 64 | ||
156 | struct frame { | ||
157 | struct dm_block *b; | ||
158 | struct node *n; | ||
159 | unsigned level; | ||
160 | unsigned nr_children; | ||
161 | unsigned current_child; | ||
162 | }; | ||
163 | |||
164 | struct del_stack { | ||
165 | struct dm_transaction_manager *tm; | ||
166 | int top; | ||
167 | struct frame spine[MAX_SPINE_DEPTH]; | ||
168 | }; | ||
169 | |||
170 | static int top_frame(struct del_stack *s, struct frame **f) | ||
171 | { | ||
172 | if (s->top < 0) { | ||
173 | DMERR("btree deletion stack empty"); | ||
174 | return -EINVAL; | ||
175 | } | ||
176 | |||
177 | *f = s->spine + s->top; | ||
178 | |||
179 | return 0; | ||
180 | } | ||
181 | |||
182 | static int unprocessed_frames(struct del_stack *s) | ||
183 | { | ||
184 | return s->top >= 0; | ||
185 | } | ||
186 | |||
187 | static int push_frame(struct del_stack *s, dm_block_t b, unsigned level) | ||
188 | { | ||
189 | int r; | ||
190 | uint32_t ref_count; | ||
191 | |||
192 | if (s->top >= MAX_SPINE_DEPTH - 1) { | ||
193 | DMERR("btree deletion stack out of memory"); | ||
194 | return -ENOMEM; | ||
195 | } | ||
196 | |||
197 | r = dm_tm_ref(s->tm, b, &ref_count); | ||
198 | if (r) | ||
199 | return r; | ||
200 | |||
201 | if (ref_count > 1) | ||
202 | /* | ||
203 | * This is a shared node, so we can just decrement it's | ||
204 | * reference counter and leave the children. | ||
205 | */ | ||
206 | dm_tm_dec(s->tm, b); | ||
207 | |||
208 | else { | ||
209 | struct frame *f = s->spine + ++s->top; | ||
210 | |||
211 | r = dm_tm_read_lock(s->tm, b, &btree_node_validator, &f->b); | ||
212 | if (r) { | ||
213 | s->top--; | ||
214 | return r; | ||
215 | } | ||
216 | |||
217 | f->n = dm_block_data(f->b); | ||
218 | f->level = level; | ||
219 | f->nr_children = le32_to_cpu(f->n->header.nr_entries); | ||
220 | f->current_child = 0; | ||
221 | } | ||
222 | |||
223 | return 0; | ||
224 | } | ||
225 | |||
226 | static void pop_frame(struct del_stack *s) | ||
227 | { | ||
228 | struct frame *f = s->spine + s->top--; | ||
229 | |||
230 | dm_tm_dec(s->tm, dm_block_location(f->b)); | ||
231 | dm_tm_unlock(s->tm, f->b); | ||
232 | } | ||
233 | |||
234 | int dm_btree_del(struct dm_btree_info *info, dm_block_t root) | ||
235 | { | ||
236 | int r; | ||
237 | struct del_stack *s; | ||
238 | |||
239 | s = kmalloc(sizeof(*s), GFP_KERNEL); | ||
240 | if (!s) | ||
241 | return -ENOMEM; | ||
242 | s->tm = info->tm; | ||
243 | s->top = -1; | ||
244 | |||
245 | r = push_frame(s, root, 1); | ||
246 | if (r) | ||
247 | goto out; | ||
248 | |||
249 | while (unprocessed_frames(s)) { | ||
250 | uint32_t flags; | ||
251 | struct frame *f; | ||
252 | dm_block_t b; | ||
253 | |||
254 | r = top_frame(s, &f); | ||
255 | if (r) | ||
256 | goto out; | ||
257 | |||
258 | if (f->current_child >= f->nr_children) { | ||
259 | pop_frame(s); | ||
260 | continue; | ||
261 | } | ||
262 | |||
263 | flags = le32_to_cpu(f->n->header.flags); | ||
264 | if (flags & INTERNAL_NODE) { | ||
265 | b = value64(f->n, f->current_child); | ||
266 | f->current_child++; | ||
267 | r = push_frame(s, b, f->level); | ||
268 | if (r) | ||
269 | goto out; | ||
270 | |||
271 | } else if (f->level != (info->levels - 1)) { | ||
272 | b = value64(f->n, f->current_child); | ||
273 | f->current_child++; | ||
274 | r = push_frame(s, b, f->level + 1); | ||
275 | if (r) | ||
276 | goto out; | ||
277 | |||
278 | } else { | ||
279 | if (info->value_type.dec) { | ||
280 | unsigned i; | ||
281 | |||
282 | for (i = 0; i < f->nr_children; i++) | ||
283 | info->value_type.dec(info->value_type.context, | ||
284 | value_ptr(f->n, i, info->value_type.size)); | ||
285 | } | ||
286 | f->current_child = f->nr_children; | ||
287 | } | ||
288 | } | ||
289 | |||
290 | out: | ||
291 | kfree(s); | ||
292 | return r; | ||
293 | } | ||
294 | EXPORT_SYMBOL_GPL(dm_btree_del); | ||
295 | |||
296 | /*----------------------------------------------------------------*/ | ||
297 | |||
298 | static int btree_lookup_raw(struct ro_spine *s, dm_block_t block, uint64_t key, | ||
299 | int (*search_fn)(struct node *, uint64_t), | ||
300 | uint64_t *result_key, void *v, size_t value_size) | ||
301 | { | ||
302 | int i, r; | ||
303 | uint32_t flags, nr_entries; | ||
304 | |||
305 | do { | ||
306 | r = ro_step(s, block); | ||
307 | if (r < 0) | ||
308 | return r; | ||
309 | |||
310 | i = search_fn(ro_node(s), key); | ||
311 | |||
312 | flags = le32_to_cpu(ro_node(s)->header.flags); | ||
313 | nr_entries = le32_to_cpu(ro_node(s)->header.nr_entries); | ||
314 | if (i < 0 || i >= nr_entries) | ||
315 | return -ENODATA; | ||
316 | |||
317 | if (flags & INTERNAL_NODE) | ||
318 | block = value64(ro_node(s), i); | ||
319 | |||
320 | } while (!(flags & LEAF_NODE)); | ||
321 | |||
322 | *result_key = le64_to_cpu(ro_node(s)->keys[i]); | ||
323 | memcpy(v, value_ptr(ro_node(s), i, value_size), value_size); | ||
324 | |||
325 | return 0; | ||
326 | } | ||
327 | |||
328 | int dm_btree_lookup(struct dm_btree_info *info, dm_block_t root, | ||
329 | uint64_t *keys, void *value_le) | ||
330 | { | ||
331 | unsigned level, last_level = info->levels - 1; | ||
332 | int r = -ENODATA; | ||
333 | uint64_t rkey; | ||
334 | __le64 internal_value_le; | ||
335 | struct ro_spine spine; | ||
336 | |||
337 | init_ro_spine(&spine, info); | ||
338 | for (level = 0; level < info->levels; level++) { | ||
339 | size_t size; | ||
340 | void *value_p; | ||
341 | |||
342 | if (level == last_level) { | ||
343 | value_p = value_le; | ||
344 | size = info->value_type.size; | ||
345 | |||
346 | } else { | ||
347 | value_p = &internal_value_le; | ||
348 | size = sizeof(uint64_t); | ||
349 | } | ||
350 | |||
351 | r = btree_lookup_raw(&spine, root, keys[level], | ||
352 | lower_bound, &rkey, | ||
353 | value_p, size); | ||
354 | |||
355 | if (!r) { | ||
356 | if (rkey != keys[level]) { | ||
357 | exit_ro_spine(&spine); | ||
358 | return -ENODATA; | ||
359 | } | ||
360 | } else { | ||
361 | exit_ro_spine(&spine); | ||
362 | return r; | ||
363 | } | ||
364 | |||
365 | root = le64_to_cpu(internal_value_le); | ||
366 | } | ||
367 | exit_ro_spine(&spine); | ||
368 | |||
369 | return r; | ||
370 | } | ||
371 | EXPORT_SYMBOL_GPL(dm_btree_lookup); | ||
372 | |||
373 | /* | ||
374 | * Splits a node by creating a sibling node and shifting half the nodes | ||
375 | * contents across. Assumes there is a parent node, and it has room for | ||
376 | * another child. | ||
377 | * | ||
378 | * Before: | ||
379 | * +--------+ | ||
380 | * | Parent | | ||
381 | * +--------+ | ||
382 | * | | ||
383 | * v | ||
384 | * +----------+ | ||
385 | * | A ++++++ | | ||
386 | * +----------+ | ||
387 | * | ||
388 | * | ||
389 | * After: | ||
390 | * +--------+ | ||
391 | * | Parent | | ||
392 | * +--------+ | ||
393 | * | | | ||
394 | * v +------+ | ||
395 | * +---------+ | | ||
396 | * | A* +++ | v | ||
397 | * +---------+ +-------+ | ||
398 | * | B +++ | | ||
399 | * +-------+ | ||
400 | * | ||
401 | * Where A* is a shadow of A. | ||
402 | */ | ||
403 | static int btree_split_sibling(struct shadow_spine *s, dm_block_t root, | ||
404 | unsigned parent_index, uint64_t key) | ||
405 | { | ||
406 | int r; | ||
407 | size_t size; | ||
408 | unsigned nr_left, nr_right; | ||
409 | struct dm_block *left, *right, *parent; | ||
410 | struct node *ln, *rn, *pn; | ||
411 | __le64 location; | ||
412 | |||
413 | left = shadow_current(s); | ||
414 | |||
415 | r = new_block(s->info, &right); | ||
416 | if (r < 0) | ||
417 | return r; | ||
418 | |||
419 | ln = dm_block_data(left); | ||
420 | rn = dm_block_data(right); | ||
421 | |||
422 | nr_left = le32_to_cpu(ln->header.nr_entries) / 2; | ||
423 | nr_right = le32_to_cpu(ln->header.nr_entries) - nr_left; | ||
424 | |||
425 | ln->header.nr_entries = cpu_to_le32(nr_left); | ||
426 | |||
427 | rn->header.flags = ln->header.flags; | ||
428 | rn->header.nr_entries = cpu_to_le32(nr_right); | ||
429 | rn->header.max_entries = ln->header.max_entries; | ||
430 | rn->header.value_size = ln->header.value_size; | ||
431 | memcpy(rn->keys, ln->keys + nr_left, nr_right * sizeof(rn->keys[0])); | ||
432 | |||
433 | size = le32_to_cpu(ln->header.flags) & INTERNAL_NODE ? | ||
434 | sizeof(uint64_t) : s->info->value_type.size; | ||
435 | memcpy(value_ptr(rn, 0, size), value_ptr(ln, nr_left, size), | ||
436 | size * nr_right); | ||
437 | |||
438 | /* | ||
439 | * Patch up the parent | ||
440 | */ | ||
441 | parent = shadow_parent(s); | ||
442 | |||
443 | pn = dm_block_data(parent); | ||
444 | location = cpu_to_le64(dm_block_location(left)); | ||
445 | __dm_bless_for_disk(&location); | ||
446 | memcpy_disk(value_ptr(pn, parent_index, sizeof(__le64)), | ||
447 | &location, sizeof(__le64)); | ||
448 | |||
449 | location = cpu_to_le64(dm_block_location(right)); | ||
450 | __dm_bless_for_disk(&location); | ||
451 | |||
452 | r = insert_at(sizeof(__le64), pn, parent_index + 1, | ||
453 | le64_to_cpu(rn->keys[0]), &location); | ||
454 | if (r) | ||
455 | return r; | ||
456 | |||
457 | if (key < le64_to_cpu(rn->keys[0])) { | ||
458 | unlock_block(s->info, right); | ||
459 | s->nodes[1] = left; | ||
460 | } else { | ||
461 | unlock_block(s->info, left); | ||
462 | s->nodes[1] = right; | ||
463 | } | ||
464 | |||
465 | return 0; | ||
466 | } | ||
467 | |||
468 | /* | ||
469 | * Splits a node by creating two new children beneath the given node. | ||
470 | * | ||
471 | * Before: | ||
472 | * +----------+ | ||
473 | * | A ++++++ | | ||
474 | * +----------+ | ||
475 | * | ||
476 | * | ||
477 | * After: | ||
478 | * +------------+ | ||
479 | * | A (shadow) | | ||
480 | * +------------+ | ||
481 | * | | | ||
482 | * +------+ +----+ | ||
483 | * | | | ||
484 | * v v | ||
485 | * +-------+ +-------+ | ||
486 | * | B +++ | | C +++ | | ||
487 | * +-------+ +-------+ | ||
488 | */ | ||
489 | static int btree_split_beneath(struct shadow_spine *s, uint64_t key) | ||
490 | { | ||
491 | int r; | ||
492 | size_t size; | ||
493 | unsigned nr_left, nr_right; | ||
494 | struct dm_block *left, *right, *new_parent; | ||
495 | struct node *pn, *ln, *rn; | ||
496 | __le64 val; | ||
497 | |||
498 | new_parent = shadow_current(s); | ||
499 | |||
500 | r = new_block(s->info, &left); | ||
501 | if (r < 0) | ||
502 | return r; | ||
503 | |||
504 | r = new_block(s->info, &right); | ||
505 | if (r < 0) { | ||
506 | /* FIXME: put left */ | ||
507 | return r; | ||
508 | } | ||
509 | |||
510 | pn = dm_block_data(new_parent); | ||
511 | ln = dm_block_data(left); | ||
512 | rn = dm_block_data(right); | ||
513 | |||
514 | nr_left = le32_to_cpu(pn->header.nr_entries) / 2; | ||
515 | nr_right = le32_to_cpu(pn->header.nr_entries) - nr_left; | ||
516 | |||
517 | ln->header.flags = pn->header.flags; | ||
518 | ln->header.nr_entries = cpu_to_le32(nr_left); | ||
519 | ln->header.max_entries = pn->header.max_entries; | ||
520 | ln->header.value_size = pn->header.value_size; | ||
521 | |||
522 | rn->header.flags = pn->header.flags; | ||
523 | rn->header.nr_entries = cpu_to_le32(nr_right); | ||
524 | rn->header.max_entries = pn->header.max_entries; | ||
525 | rn->header.value_size = pn->header.value_size; | ||
526 | |||
527 | memcpy(ln->keys, pn->keys, nr_left * sizeof(pn->keys[0])); | ||
528 | memcpy(rn->keys, pn->keys + nr_left, nr_right * sizeof(pn->keys[0])); | ||
529 | |||
530 | size = le32_to_cpu(pn->header.flags) & INTERNAL_NODE ? | ||
531 | sizeof(__le64) : s->info->value_type.size; | ||
532 | memcpy(value_ptr(ln, 0, size), value_ptr(pn, 0, size), nr_left * size); | ||
533 | memcpy(value_ptr(rn, 0, size), value_ptr(pn, nr_left, size), | ||
534 | nr_right * size); | ||
535 | |||
536 | /* new_parent should just point to l and r now */ | ||
537 | pn->header.flags = cpu_to_le32(INTERNAL_NODE); | ||
538 | pn->header.nr_entries = cpu_to_le32(2); | ||
539 | pn->header.max_entries = cpu_to_le32( | ||
540 | calc_max_entries(sizeof(__le64), | ||
541 | dm_bm_block_size( | ||
542 | dm_tm_get_bm(s->info->tm)))); | ||
543 | pn->header.value_size = cpu_to_le32(sizeof(__le64)); | ||
544 | |||
545 | val = cpu_to_le64(dm_block_location(left)); | ||
546 | __dm_bless_for_disk(&val); | ||
547 | pn->keys[0] = ln->keys[0]; | ||
548 | memcpy_disk(value_ptr(pn, 0, sizeof(__le64)), &val, sizeof(__le64)); | ||
549 | |||
550 | val = cpu_to_le64(dm_block_location(right)); | ||
551 | __dm_bless_for_disk(&val); | ||
552 | pn->keys[1] = rn->keys[0]; | ||
553 | memcpy_disk(value_ptr(pn, 1, sizeof(__le64)), &val, sizeof(__le64)); | ||
554 | |||
555 | /* | ||
556 | * rejig the spine. This is ugly, since it knows too | ||
557 | * much about the spine | ||
558 | */ | ||
559 | if (s->nodes[0] != new_parent) { | ||
560 | unlock_block(s->info, s->nodes[0]); | ||
561 | s->nodes[0] = new_parent; | ||
562 | } | ||
563 | if (key < le64_to_cpu(rn->keys[0])) { | ||
564 | unlock_block(s->info, right); | ||
565 | s->nodes[1] = left; | ||
566 | } else { | ||
567 | unlock_block(s->info, left); | ||
568 | s->nodes[1] = right; | ||
569 | } | ||
570 | s->count = 2; | ||
571 | |||
572 | return 0; | ||
573 | } | ||
574 | |||
575 | static int btree_insert_raw(struct shadow_spine *s, dm_block_t root, | ||
576 | struct dm_btree_value_type *vt, | ||
577 | uint64_t key, unsigned *index) | ||
578 | { | ||
579 | int r, i = *index, top = 1; | ||
580 | struct node *node; | ||
581 | |||
582 | for (;;) { | ||
583 | r = shadow_step(s, root, vt); | ||
584 | if (r < 0) | ||
585 | return r; | ||
586 | |||
587 | node = dm_block_data(shadow_current(s)); | ||
588 | |||
589 | /* | ||
590 | * We have to patch up the parent node, ugly, but I don't | ||
591 | * see a way to do this automatically as part of the spine | ||
592 | * op. | ||
593 | */ | ||
594 | if (shadow_has_parent(s) && i >= 0) { /* FIXME: second clause unness. */ | ||
595 | __le64 location = cpu_to_le64(dm_block_location(shadow_current(s))); | ||
596 | |||
597 | __dm_bless_for_disk(&location); | ||
598 | memcpy_disk(value_ptr(dm_block_data(shadow_parent(s)), i, sizeof(uint64_t)), | ||
599 | &location, sizeof(__le64)); | ||
600 | } | ||
601 | |||
602 | node = dm_block_data(shadow_current(s)); | ||
603 | |||
604 | if (node->header.nr_entries == node->header.max_entries) { | ||
605 | if (top) | ||
606 | r = btree_split_beneath(s, key); | ||
607 | else | ||
608 | r = btree_split_sibling(s, root, i, key); | ||
609 | |||
610 | if (r < 0) | ||
611 | return r; | ||
612 | } | ||
613 | |||
614 | node = dm_block_data(shadow_current(s)); | ||
615 | |||
616 | i = lower_bound(node, key); | ||
617 | |||
618 | if (le32_to_cpu(node->header.flags) & LEAF_NODE) | ||
619 | break; | ||
620 | |||
621 | if (i < 0) { | ||
622 | /* change the bounds on the lowest key */ | ||
623 | node->keys[0] = cpu_to_le64(key); | ||
624 | i = 0; | ||
625 | } | ||
626 | |||
627 | root = value64(node, i); | ||
628 | top = 0; | ||
629 | } | ||
630 | |||
631 | if (i < 0 || le64_to_cpu(node->keys[i]) != key) | ||
632 | i++; | ||
633 | |||
634 | *index = i; | ||
635 | return 0; | ||
636 | } | ||
637 | |||
638 | static int insert(struct dm_btree_info *info, dm_block_t root, | ||
639 | uint64_t *keys, void *value, dm_block_t *new_root, | ||
640 | int *inserted) | ||
641 | __dm_written_to_disk(value) | ||
642 | { | ||
643 | int r, need_insert; | ||
644 | unsigned level, index = -1, last_level = info->levels - 1; | ||
645 | dm_block_t block = root; | ||
646 | struct shadow_spine spine; | ||
647 | struct node *n; | ||
648 | struct dm_btree_value_type le64_type; | ||
649 | |||
650 | le64_type.context = NULL; | ||
651 | le64_type.size = sizeof(__le64); | ||
652 | le64_type.inc = NULL; | ||
653 | le64_type.dec = NULL; | ||
654 | le64_type.equal = NULL; | ||
655 | |||
656 | init_shadow_spine(&spine, info); | ||
657 | |||
658 | for (level = 0; level < (info->levels - 1); level++) { | ||
659 | r = btree_insert_raw(&spine, block, &le64_type, keys[level], &index); | ||
660 | if (r < 0) | ||
661 | goto bad; | ||
662 | |||
663 | n = dm_block_data(shadow_current(&spine)); | ||
664 | need_insert = ((index >= le32_to_cpu(n->header.nr_entries)) || | ||
665 | (le64_to_cpu(n->keys[index]) != keys[level])); | ||
666 | |||
667 | if (need_insert) { | ||
668 | dm_block_t new_tree; | ||
669 | __le64 new_le; | ||
670 | |||
671 | r = dm_btree_empty(info, &new_tree); | ||
672 | if (r < 0) | ||
673 | goto bad; | ||
674 | |||
675 | new_le = cpu_to_le64(new_tree); | ||
676 | __dm_bless_for_disk(&new_le); | ||
677 | |||
678 | r = insert_at(sizeof(uint64_t), n, index, | ||
679 | keys[level], &new_le); | ||
680 | if (r) | ||
681 | goto bad; | ||
682 | } | ||
683 | |||
684 | if (level < last_level) | ||
685 | block = value64(n, index); | ||
686 | } | ||
687 | |||
688 | r = btree_insert_raw(&spine, block, &info->value_type, | ||
689 | keys[level], &index); | ||
690 | if (r < 0) | ||
691 | goto bad; | ||
692 | |||
693 | n = dm_block_data(shadow_current(&spine)); | ||
694 | need_insert = ((index >= le32_to_cpu(n->header.nr_entries)) || | ||
695 | (le64_to_cpu(n->keys[index]) != keys[level])); | ||
696 | |||
697 | if (need_insert) { | ||
698 | if (inserted) | ||
699 | *inserted = 1; | ||
700 | |||
701 | r = insert_at(info->value_type.size, n, index, | ||
702 | keys[level], value); | ||
703 | if (r) | ||
704 | goto bad_unblessed; | ||
705 | } else { | ||
706 | if (inserted) | ||
707 | *inserted = 0; | ||
708 | |||
709 | if (info->value_type.dec && | ||
710 | (!info->value_type.equal || | ||
711 | !info->value_type.equal( | ||
712 | info->value_type.context, | ||
713 | value_ptr(n, index, info->value_type.size), | ||
714 | value))) { | ||
715 | info->value_type.dec(info->value_type.context, | ||
716 | value_ptr(n, index, info->value_type.size)); | ||
717 | } | ||
718 | memcpy_disk(value_ptr(n, index, info->value_type.size), | ||
719 | value, info->value_type.size); | ||
720 | } | ||
721 | |||
722 | *new_root = shadow_root(&spine); | ||
723 | exit_shadow_spine(&spine); | ||
724 | |||
725 | return 0; | ||
726 | |||
727 | bad: | ||
728 | __dm_unbless_for_disk(value); | ||
729 | bad_unblessed: | ||
730 | exit_shadow_spine(&spine); | ||
731 | return r; | ||
732 | } | ||
733 | |||
734 | int dm_btree_insert(struct dm_btree_info *info, dm_block_t root, | ||
735 | uint64_t *keys, void *value, dm_block_t *new_root) | ||
736 | __dm_written_to_disk(value) | ||
737 | { | ||
738 | return insert(info, root, keys, value, new_root, NULL); | ||
739 | } | ||
740 | EXPORT_SYMBOL_GPL(dm_btree_insert); | ||
741 | |||
742 | int dm_btree_insert_notify(struct dm_btree_info *info, dm_block_t root, | ||
743 | uint64_t *keys, void *value, dm_block_t *new_root, | ||
744 | int *inserted) | ||
745 | __dm_written_to_disk(value) | ||
746 | { | ||
747 | return insert(info, root, keys, value, new_root, inserted); | ||
748 | } | ||
749 | EXPORT_SYMBOL_GPL(dm_btree_insert_notify); | ||
750 | |||
751 | /*----------------------------------------------------------------*/ | ||
752 | |||
753 | static int find_highest_key(struct ro_spine *s, dm_block_t block, | ||
754 | uint64_t *result_key, dm_block_t *next_block) | ||
755 | { | ||
756 | int i, r; | ||
757 | uint32_t flags; | ||
758 | |||
759 | do { | ||
760 | r = ro_step(s, block); | ||
761 | if (r < 0) | ||
762 | return r; | ||
763 | |||
764 | flags = le32_to_cpu(ro_node(s)->header.flags); | ||
765 | i = le32_to_cpu(ro_node(s)->header.nr_entries); | ||
766 | if (!i) | ||
767 | return -ENODATA; | ||
768 | else | ||
769 | i--; | ||
770 | |||
771 | *result_key = le64_to_cpu(ro_node(s)->keys[i]); | ||
772 | if (next_block || flags & INTERNAL_NODE) | ||
773 | block = value64(ro_node(s), i); | ||
774 | |||
775 | } while (flags & INTERNAL_NODE); | ||
776 | |||
777 | if (next_block) | ||
778 | *next_block = block; | ||
779 | return 0; | ||
780 | } | ||
781 | |||
782 | int dm_btree_find_highest_key(struct dm_btree_info *info, dm_block_t root, | ||
783 | uint64_t *result_keys) | ||
784 | { | ||
785 | int r = 0, count = 0, level; | ||
786 | struct ro_spine spine; | ||
787 | |||
788 | init_ro_spine(&spine, info); | ||
789 | for (level = 0; level < info->levels; level++) { | ||
790 | r = find_highest_key(&spine, root, result_keys + level, | ||
791 | level == info->levels - 1 ? NULL : &root); | ||
792 | if (r == -ENODATA) { | ||
793 | r = 0; | ||
794 | break; | ||
795 | |||
796 | } else if (r) | ||
797 | break; | ||
798 | |||
799 | count++; | ||
800 | } | ||
801 | exit_ro_spine(&spine); | ||
802 | |||
803 | return r ? r : count; | ||
804 | } | ||
805 | EXPORT_SYMBOL_GPL(dm_btree_find_highest_key); | ||
diff --git a/drivers/md/persistent-data/dm-btree.h b/drivers/md/persistent-data/dm-btree.h new file mode 100644 index 000000000000..ae02c84410ff --- /dev/null +++ b/drivers/md/persistent-data/dm-btree.h | |||
@@ -0,0 +1,145 @@ | |||
1 | /* | ||
2 | * Copyright (C) 2011 Red Hat, Inc. | ||
3 | * | ||
4 | * This file is released under the GPL. | ||
5 | */ | ||
6 | #ifndef _LINUX_DM_BTREE_H | ||
7 | #define _LINUX_DM_BTREE_H | ||
8 | |||
9 | #include "dm-block-manager.h" | ||
10 | |||
11 | struct dm_transaction_manager; | ||
12 | |||
13 | /*----------------------------------------------------------------*/ | ||
14 | |||
15 | /* | ||
16 | * Annotations used to check on-disk metadata is handled as little-endian. | ||
17 | */ | ||
18 | #ifdef __CHECKER__ | ||
19 | # define __dm_written_to_disk(x) __releases(x) | ||
20 | # define __dm_reads_from_disk(x) __acquires(x) | ||
21 | # define __dm_bless_for_disk(x) __acquire(x) | ||
22 | # define __dm_unbless_for_disk(x) __release(x) | ||
23 | #else | ||
24 | # define __dm_written_to_disk(x) | ||
25 | # define __dm_reads_from_disk(x) | ||
26 | # define __dm_bless_for_disk(x) | ||
27 | # define __dm_unbless_for_disk(x) | ||
28 | #endif | ||
29 | |||
30 | /*----------------------------------------------------------------*/ | ||
31 | |||
32 | /* | ||
33 | * Manipulates hierarchical B+ trees with 64-bit keys and arbitrary-sized | ||
34 | * values. | ||
35 | */ | ||
36 | |||
37 | /* | ||
38 | * Infomation about the values stored within the btree. | ||
39 | */ | ||
40 | struct dm_btree_value_type { | ||
41 | void *context; | ||
42 | |||
43 | /* | ||
44 | * The size in bytes of each value. | ||
45 | */ | ||
46 | uint32_t size; | ||
47 | |||
48 | /* | ||
49 | * Any of these methods can be safely set to NULL if you do not | ||
50 | * need the corresponding feature. | ||
51 | */ | ||
52 | |||
53 | /* | ||
54 | * The btree is making a duplicate of the value, for instance | ||
55 | * because previously-shared btree nodes have now diverged. | ||
56 | * @value argument is the new copy that the copy function may modify. | ||
57 | * (Probably it just wants to increment a reference count | ||
58 | * somewhere.) This method is _not_ called for insertion of a new | ||
59 | * value: It is assumed the ref count is already 1. | ||
60 | */ | ||
61 | void (*inc)(void *context, void *value); | ||
62 | |||
63 | /* | ||
64 | * This value is being deleted. The btree takes care of freeing | ||
65 | * the memory pointed to by @value. Often the del function just | ||
66 | * needs to decrement a reference count somewhere. | ||
67 | */ | ||
68 | void (*dec)(void *context, void *value); | ||
69 | |||
70 | /* | ||
71 | * A test for equality between two values. When a value is | ||
72 | * overwritten with a new one, the old one has the dec method | ||
73 | * called _unless_ the new and old value are deemed equal. | ||
74 | */ | ||
75 | int (*equal)(void *context, void *value1, void *value2); | ||
76 | }; | ||
77 | |||
78 | /* | ||
79 | * The shape and contents of a btree. | ||
80 | */ | ||
81 | struct dm_btree_info { | ||
82 | struct dm_transaction_manager *tm; | ||
83 | |||
84 | /* | ||
85 | * Number of nested btrees. (Not the depth of a single tree.) | ||
86 | */ | ||
87 | unsigned levels; | ||
88 | struct dm_btree_value_type value_type; | ||
89 | }; | ||
90 | |||
91 | /* | ||
92 | * Set up an empty tree. O(1). | ||
93 | */ | ||
94 | int dm_btree_empty(struct dm_btree_info *info, dm_block_t *root); | ||
95 | |||
96 | /* | ||
97 | * Delete a tree. O(n) - this is the slow one! It can also block, so | ||
98 | * please don't call it on an IO path. | ||
99 | */ | ||
100 | int dm_btree_del(struct dm_btree_info *info, dm_block_t root); | ||
101 | |||
102 | /* | ||
103 | * All the lookup functions return -ENODATA if the key cannot be found. | ||
104 | */ | ||
105 | |||
106 | /* | ||
107 | * Tries to find a key that matches exactly. O(ln(n)) | ||
108 | */ | ||
109 | int dm_btree_lookup(struct dm_btree_info *info, dm_block_t root, | ||
110 | uint64_t *keys, void *value_le); | ||
111 | |||
112 | /* | ||
113 | * Insertion (or overwrite an existing value). O(ln(n)) | ||
114 | */ | ||
115 | int dm_btree_insert(struct dm_btree_info *info, dm_block_t root, | ||
116 | uint64_t *keys, void *value, dm_block_t *new_root) | ||
117 | __dm_written_to_disk(value); | ||
118 | |||
119 | /* | ||
120 | * A variant of insert that indicates whether it actually inserted or just | ||
121 | * overwrote. Useful if you're keeping track of the number of entries in a | ||
122 | * tree. | ||
123 | */ | ||
124 | int dm_btree_insert_notify(struct dm_btree_info *info, dm_block_t root, | ||
125 | uint64_t *keys, void *value, dm_block_t *new_root, | ||
126 | int *inserted) | ||
127 | __dm_written_to_disk(value); | ||
128 | |||
129 | /* | ||
130 | * Remove a key if present. This doesn't remove empty sub trees. Normally | ||
131 | * subtrees represent a separate entity, like a snapshot map, so this is | ||
132 | * correct behaviour. O(ln(n)). | ||
133 | */ | ||
134 | int dm_btree_remove(struct dm_btree_info *info, dm_block_t root, | ||
135 | uint64_t *keys, dm_block_t *new_root); | ||
136 | |||
137 | /* | ||
138 | * Returns < 0 on failure. Otherwise the number of key entries that have | ||
139 | * been filled out. Remember trees can have zero entries, and as such have | ||
140 | * no highest key. | ||
141 | */ | ||
142 | int dm_btree_find_highest_key(struct dm_btree_info *info, dm_block_t root, | ||
143 | uint64_t *result_keys); | ||
144 | |||
145 | #endif /* _LINUX_DM_BTREE_H */ | ||
diff --git a/drivers/md/persistent-data/dm-persistent-data-internal.h b/drivers/md/persistent-data/dm-persistent-data-internal.h new file mode 100644 index 000000000000..c49e26fff36c --- /dev/null +++ b/drivers/md/persistent-data/dm-persistent-data-internal.h | |||
@@ -0,0 +1,19 @@ | |||
1 | /* | ||
2 | * Copyright (C) 2011 Red Hat, Inc. | ||
3 | * | ||
4 | * This file is released under the GPL. | ||
5 | */ | ||
6 | |||
7 | #ifndef _DM_PERSISTENT_DATA_INTERNAL_H | ||
8 | #define _DM_PERSISTENT_DATA_INTERNAL_H | ||
9 | |||
10 | #include "dm-block-manager.h" | ||
11 | |||
12 | static inline unsigned dm_hash_block(dm_block_t b, unsigned hash_mask) | ||
13 | { | ||
14 | const unsigned BIG_PRIME = 4294967291UL; | ||
15 | |||
16 | return (((unsigned) b) * BIG_PRIME) & hash_mask; | ||
17 | } | ||
18 | |||
19 | #endif /* _PERSISTENT_DATA_INTERNAL_H */ | ||
diff --git a/drivers/md/persistent-data/dm-space-map-checker.c b/drivers/md/persistent-data/dm-space-map-checker.c new file mode 100644 index 000000000000..bb44a937fe63 --- /dev/null +++ b/drivers/md/persistent-data/dm-space-map-checker.c | |||
@@ -0,0 +1,437 @@ | |||
1 | /* | ||
2 | * Copyright (C) 2011 Red Hat, Inc. | ||
3 | * | ||
4 | * This file is released under the GPL. | ||
5 | */ | ||
6 | |||
7 | #include "dm-space-map-checker.h" | ||
8 | |||
9 | #include <linux/device-mapper.h> | ||
10 | |||
11 | #ifdef CONFIG_DM_DEBUG_SPACE_MAPS | ||
12 | |||
13 | #define DM_MSG_PREFIX "space map checker" | ||
14 | |||
15 | /*----------------------------------------------------------------*/ | ||
16 | |||
17 | struct count_array { | ||
18 | dm_block_t nr; | ||
19 | dm_block_t nr_free; | ||
20 | |||
21 | uint32_t *counts; | ||
22 | }; | ||
23 | |||
24 | static int ca_get_count(struct count_array *ca, dm_block_t b, uint32_t *count) | ||
25 | { | ||
26 | if (b >= ca->nr) | ||
27 | return -EINVAL; | ||
28 | |||
29 | *count = ca->counts[b]; | ||
30 | return 0; | ||
31 | } | ||
32 | |||
33 | static int ca_count_more_than_one(struct count_array *ca, dm_block_t b, int *r) | ||
34 | { | ||
35 | if (b >= ca->nr) | ||
36 | return -EINVAL; | ||
37 | |||
38 | *r = ca->counts[b] > 1; | ||
39 | return 0; | ||
40 | } | ||
41 | |||
42 | static int ca_set_count(struct count_array *ca, dm_block_t b, uint32_t count) | ||
43 | { | ||
44 | uint32_t old_count; | ||
45 | |||
46 | if (b >= ca->nr) | ||
47 | return -EINVAL; | ||
48 | |||
49 | old_count = ca->counts[b]; | ||
50 | |||
51 | if (!count && old_count) | ||
52 | ca->nr_free++; | ||
53 | |||
54 | else if (count && !old_count) | ||
55 | ca->nr_free--; | ||
56 | |||
57 | ca->counts[b] = count; | ||
58 | return 0; | ||
59 | } | ||
60 | |||
61 | static int ca_inc_block(struct count_array *ca, dm_block_t b) | ||
62 | { | ||
63 | if (b >= ca->nr) | ||
64 | return -EINVAL; | ||
65 | |||
66 | ca_set_count(ca, b, ca->counts[b] + 1); | ||
67 | return 0; | ||
68 | } | ||
69 | |||
70 | static int ca_dec_block(struct count_array *ca, dm_block_t b) | ||
71 | { | ||
72 | if (b >= ca->nr) | ||
73 | return -EINVAL; | ||
74 | |||
75 | BUG_ON(ca->counts[b] == 0); | ||
76 | ca_set_count(ca, b, ca->counts[b] - 1); | ||
77 | return 0; | ||
78 | } | ||
79 | |||
80 | static int ca_create(struct count_array *ca, struct dm_space_map *sm) | ||
81 | { | ||
82 | int r; | ||
83 | dm_block_t nr_blocks; | ||
84 | |||
85 | r = dm_sm_get_nr_blocks(sm, &nr_blocks); | ||
86 | if (r) | ||
87 | return r; | ||
88 | |||
89 | ca->nr = nr_blocks; | ||
90 | ca->nr_free = nr_blocks; | ||
91 | ca->counts = kzalloc(sizeof(*ca->counts) * nr_blocks, GFP_KERNEL); | ||
92 | if (!ca->counts) | ||
93 | return -ENOMEM; | ||
94 | |||
95 | return 0; | ||
96 | } | ||
97 | |||
98 | static int ca_load(struct count_array *ca, struct dm_space_map *sm) | ||
99 | { | ||
100 | int r; | ||
101 | uint32_t count; | ||
102 | dm_block_t nr_blocks, i; | ||
103 | |||
104 | r = dm_sm_get_nr_blocks(sm, &nr_blocks); | ||
105 | if (r) | ||
106 | return r; | ||
107 | |||
108 | BUG_ON(ca->nr != nr_blocks); | ||
109 | |||
110 | DMWARN("Loading debug space map from disk. This may take some time"); | ||
111 | for (i = 0; i < nr_blocks; i++) { | ||
112 | r = dm_sm_get_count(sm, i, &count); | ||
113 | if (r) { | ||
114 | DMERR("load failed"); | ||
115 | return r; | ||
116 | } | ||
117 | |||
118 | ca_set_count(ca, i, count); | ||
119 | } | ||
120 | DMWARN("Load complete"); | ||
121 | |||
122 | return 0; | ||
123 | } | ||
124 | |||
125 | static int ca_extend(struct count_array *ca, dm_block_t extra_blocks) | ||
126 | { | ||
127 | dm_block_t nr_blocks = ca->nr + extra_blocks; | ||
128 | uint32_t *counts = kzalloc(sizeof(*counts) * nr_blocks, GFP_KERNEL); | ||
129 | if (!counts) | ||
130 | return -ENOMEM; | ||
131 | |||
132 | memcpy(counts, ca->counts, sizeof(*counts) * ca->nr); | ||
133 | kfree(ca->counts); | ||
134 | ca->nr = nr_blocks; | ||
135 | ca->nr_free += extra_blocks; | ||
136 | ca->counts = counts; | ||
137 | return 0; | ||
138 | } | ||
139 | |||
140 | static int ca_commit(struct count_array *old, struct count_array *new) | ||
141 | { | ||
142 | if (old->nr != new->nr) { | ||
143 | BUG_ON(old->nr > new->nr); | ||
144 | ca_extend(old, new->nr - old->nr); | ||
145 | } | ||
146 | |||
147 | BUG_ON(old->nr != new->nr); | ||
148 | old->nr_free = new->nr_free; | ||
149 | memcpy(old->counts, new->counts, sizeof(*old->counts) * old->nr); | ||
150 | return 0; | ||
151 | } | ||
152 | |||
153 | static void ca_destroy(struct count_array *ca) | ||
154 | { | ||
155 | kfree(ca->counts); | ||
156 | } | ||
157 | |||
158 | /*----------------------------------------------------------------*/ | ||
159 | |||
160 | struct sm_checker { | ||
161 | struct dm_space_map sm; | ||
162 | |||
163 | struct count_array old_counts; | ||
164 | struct count_array counts; | ||
165 | |||
166 | struct dm_space_map *real_sm; | ||
167 | }; | ||
168 | |||
169 | static void sm_checker_destroy(struct dm_space_map *sm) | ||
170 | { | ||
171 | struct sm_checker *smc = container_of(sm, struct sm_checker, sm); | ||
172 | |||
173 | dm_sm_destroy(smc->real_sm); | ||
174 | ca_destroy(&smc->old_counts); | ||
175 | ca_destroy(&smc->counts); | ||
176 | kfree(smc); | ||
177 | } | ||
178 | |||
179 | static int sm_checker_get_nr_blocks(struct dm_space_map *sm, dm_block_t *count) | ||
180 | { | ||
181 | struct sm_checker *smc = container_of(sm, struct sm_checker, sm); | ||
182 | int r = dm_sm_get_nr_blocks(smc->real_sm, count); | ||
183 | if (!r) | ||
184 | BUG_ON(smc->old_counts.nr != *count); | ||
185 | return r; | ||
186 | } | ||
187 | |||
188 | static int sm_checker_get_nr_free(struct dm_space_map *sm, dm_block_t *count) | ||
189 | { | ||
190 | struct sm_checker *smc = container_of(sm, struct sm_checker, sm); | ||
191 | int r = dm_sm_get_nr_free(smc->real_sm, count); | ||
192 | if (!r) { | ||
193 | /* | ||
194 | * Slow, but we know it's correct. | ||
195 | */ | ||
196 | dm_block_t b, n = 0; | ||
197 | for (b = 0; b < smc->old_counts.nr; b++) | ||
198 | if (smc->old_counts.counts[b] == 0 && | ||
199 | smc->counts.counts[b] == 0) | ||
200 | n++; | ||
201 | |||
202 | if (n != *count) | ||
203 | DMERR("free block counts differ, checker %u, sm-disk:%u", | ||
204 | (unsigned) n, (unsigned) *count); | ||
205 | } | ||
206 | return r; | ||
207 | } | ||
208 | |||
209 | static int sm_checker_new_block(struct dm_space_map *sm, dm_block_t *b) | ||
210 | { | ||
211 | struct sm_checker *smc = container_of(sm, struct sm_checker, sm); | ||
212 | int r = dm_sm_new_block(smc->real_sm, b); | ||
213 | |||
214 | if (!r) { | ||
215 | BUG_ON(*b >= smc->old_counts.nr); | ||
216 | BUG_ON(smc->old_counts.counts[*b] != 0); | ||
217 | BUG_ON(*b >= smc->counts.nr); | ||
218 | BUG_ON(smc->counts.counts[*b] != 0); | ||
219 | ca_set_count(&smc->counts, *b, 1); | ||
220 | } | ||
221 | |||
222 | return r; | ||
223 | } | ||
224 | |||
225 | static int sm_checker_inc_block(struct dm_space_map *sm, dm_block_t b) | ||
226 | { | ||
227 | struct sm_checker *smc = container_of(sm, struct sm_checker, sm); | ||
228 | int r = dm_sm_inc_block(smc->real_sm, b); | ||
229 | int r2 = ca_inc_block(&smc->counts, b); | ||
230 | BUG_ON(r != r2); | ||
231 | return r; | ||
232 | } | ||
233 | |||
234 | static int sm_checker_dec_block(struct dm_space_map *sm, dm_block_t b) | ||
235 | { | ||
236 | struct sm_checker *smc = container_of(sm, struct sm_checker, sm); | ||
237 | int r = dm_sm_dec_block(smc->real_sm, b); | ||
238 | int r2 = ca_dec_block(&smc->counts, b); | ||
239 | BUG_ON(r != r2); | ||
240 | return r; | ||
241 | } | ||
242 | |||
243 | static int sm_checker_get_count(struct dm_space_map *sm, dm_block_t b, uint32_t *result) | ||
244 | { | ||
245 | struct sm_checker *smc = container_of(sm, struct sm_checker, sm); | ||
246 | uint32_t result2 = 0; | ||
247 | int r = dm_sm_get_count(smc->real_sm, b, result); | ||
248 | int r2 = ca_get_count(&smc->counts, b, &result2); | ||
249 | |||
250 | BUG_ON(r != r2); | ||
251 | if (!r) | ||
252 | BUG_ON(*result != result2); | ||
253 | return r; | ||
254 | } | ||
255 | |||
256 | static int sm_checker_count_more_than_one(struct dm_space_map *sm, dm_block_t b, int *result) | ||
257 | { | ||
258 | struct sm_checker *smc = container_of(sm, struct sm_checker, sm); | ||
259 | int result2 = 0; | ||
260 | int r = dm_sm_count_is_more_than_one(smc->real_sm, b, result); | ||
261 | int r2 = ca_count_more_than_one(&smc->counts, b, &result2); | ||
262 | |||
263 | BUG_ON(r != r2); | ||
264 | if (!r) | ||
265 | BUG_ON(!(*result) && result2); | ||
266 | return r; | ||
267 | } | ||
268 | |||
269 | static int sm_checker_set_count(struct dm_space_map *sm, dm_block_t b, uint32_t count) | ||
270 | { | ||
271 | struct sm_checker *smc = container_of(sm, struct sm_checker, sm); | ||
272 | uint32_t old_rc; | ||
273 | int r = dm_sm_set_count(smc->real_sm, b, count); | ||
274 | int r2; | ||
275 | |||
276 | BUG_ON(b >= smc->counts.nr); | ||
277 | old_rc = smc->counts.counts[b]; | ||
278 | r2 = ca_set_count(&smc->counts, b, count); | ||
279 | BUG_ON(r != r2); | ||
280 | |||
281 | return r; | ||
282 | } | ||
283 | |||
284 | static int sm_checker_commit(struct dm_space_map *sm) | ||
285 | { | ||
286 | struct sm_checker *smc = container_of(sm, struct sm_checker, sm); | ||
287 | int r; | ||
288 | |||
289 | r = dm_sm_commit(smc->real_sm); | ||
290 | if (r) | ||
291 | return r; | ||
292 | |||
293 | r = ca_commit(&smc->old_counts, &smc->counts); | ||
294 | if (r) | ||
295 | return r; | ||
296 | |||
297 | return 0; | ||
298 | } | ||
299 | |||
300 | static int sm_checker_extend(struct dm_space_map *sm, dm_block_t extra_blocks) | ||
301 | { | ||
302 | struct sm_checker *smc = container_of(sm, struct sm_checker, sm); | ||
303 | int r = dm_sm_extend(smc->real_sm, extra_blocks); | ||
304 | if (r) | ||
305 | return r; | ||
306 | |||
307 | return ca_extend(&smc->counts, extra_blocks); | ||
308 | } | ||
309 | |||
310 | static int sm_checker_root_size(struct dm_space_map *sm, size_t *result) | ||
311 | { | ||
312 | struct sm_checker *smc = container_of(sm, struct sm_checker, sm); | ||
313 | return dm_sm_root_size(smc->real_sm, result); | ||
314 | } | ||
315 | |||
316 | static int sm_checker_copy_root(struct dm_space_map *sm, void *copy_to_here_le, size_t len) | ||
317 | { | ||
318 | struct sm_checker *smc = container_of(sm, struct sm_checker, sm); | ||
319 | return dm_sm_copy_root(smc->real_sm, copy_to_here_le, len); | ||
320 | } | ||
321 | |||
322 | /*----------------------------------------------------------------*/ | ||
323 | |||
324 | static struct dm_space_map ops_ = { | ||
325 | .destroy = sm_checker_destroy, | ||
326 | .get_nr_blocks = sm_checker_get_nr_blocks, | ||
327 | .get_nr_free = sm_checker_get_nr_free, | ||
328 | .inc_block = sm_checker_inc_block, | ||
329 | .dec_block = sm_checker_dec_block, | ||
330 | .new_block = sm_checker_new_block, | ||
331 | .get_count = sm_checker_get_count, | ||
332 | .count_is_more_than_one = sm_checker_count_more_than_one, | ||
333 | .set_count = sm_checker_set_count, | ||
334 | .commit = sm_checker_commit, | ||
335 | .extend = sm_checker_extend, | ||
336 | .root_size = sm_checker_root_size, | ||
337 | .copy_root = sm_checker_copy_root | ||
338 | }; | ||
339 | |||
340 | struct dm_space_map *dm_sm_checker_create(struct dm_space_map *sm) | ||
341 | { | ||
342 | int r; | ||
343 | struct sm_checker *smc; | ||
344 | |||
345 | if (!sm) | ||
346 | return NULL; | ||
347 | |||
348 | smc = kmalloc(sizeof(*smc), GFP_KERNEL); | ||
349 | if (!smc) | ||
350 | return NULL; | ||
351 | |||
352 | memcpy(&smc->sm, &ops_, sizeof(smc->sm)); | ||
353 | r = ca_create(&smc->old_counts, sm); | ||
354 | if (r) { | ||
355 | kfree(smc); | ||
356 | return NULL; | ||
357 | } | ||
358 | |||
359 | r = ca_create(&smc->counts, sm); | ||
360 | if (r) { | ||
361 | ca_destroy(&smc->old_counts); | ||
362 | kfree(smc); | ||
363 | return NULL; | ||
364 | } | ||
365 | |||
366 | smc->real_sm = sm; | ||
367 | |||
368 | r = ca_load(&smc->counts, sm); | ||
369 | if (r) { | ||
370 | ca_destroy(&smc->counts); | ||
371 | ca_destroy(&smc->old_counts); | ||
372 | kfree(smc); | ||
373 | return NULL; | ||
374 | } | ||
375 | |||
376 | r = ca_commit(&smc->old_counts, &smc->counts); | ||
377 | if (r) { | ||
378 | ca_destroy(&smc->counts); | ||
379 | ca_destroy(&smc->old_counts); | ||
380 | kfree(smc); | ||
381 | return NULL; | ||
382 | } | ||
383 | |||
384 | return &smc->sm; | ||
385 | } | ||
386 | EXPORT_SYMBOL_GPL(dm_sm_checker_create); | ||
387 | |||
388 | struct dm_space_map *dm_sm_checker_create_fresh(struct dm_space_map *sm) | ||
389 | { | ||
390 | int r; | ||
391 | struct sm_checker *smc; | ||
392 | |||
393 | if (!sm) | ||
394 | return NULL; | ||
395 | |||
396 | smc = kmalloc(sizeof(*smc), GFP_KERNEL); | ||
397 | if (!smc) | ||
398 | return NULL; | ||
399 | |||
400 | memcpy(&smc->sm, &ops_, sizeof(smc->sm)); | ||
401 | r = ca_create(&smc->old_counts, sm); | ||
402 | if (r) { | ||
403 | kfree(smc); | ||
404 | return NULL; | ||
405 | } | ||
406 | |||
407 | r = ca_create(&smc->counts, sm); | ||
408 | if (r) { | ||
409 | ca_destroy(&smc->old_counts); | ||
410 | kfree(smc); | ||
411 | return NULL; | ||
412 | } | ||
413 | |||
414 | smc->real_sm = sm; | ||
415 | return &smc->sm; | ||
416 | } | ||
417 | EXPORT_SYMBOL_GPL(dm_sm_checker_create_fresh); | ||
418 | |||
419 | /*----------------------------------------------------------------*/ | ||
420 | |||
421 | #else | ||
422 | |||
423 | struct dm_space_map *dm_sm_checker_create(struct dm_space_map *sm) | ||
424 | { | ||
425 | return sm; | ||
426 | } | ||
427 | EXPORT_SYMBOL_GPL(dm_sm_checker_create); | ||
428 | |||
429 | struct dm_space_map *dm_sm_checker_create_fresh(struct dm_space_map *sm) | ||
430 | { | ||
431 | return sm; | ||
432 | } | ||
433 | EXPORT_SYMBOL_GPL(dm_sm_checker_create_fresh); | ||
434 | |||
435 | /*----------------------------------------------------------------*/ | ||
436 | |||
437 | #endif | ||
diff --git a/drivers/md/persistent-data/dm-space-map-checker.h b/drivers/md/persistent-data/dm-space-map-checker.h new file mode 100644 index 000000000000..444dccf6688c --- /dev/null +++ b/drivers/md/persistent-data/dm-space-map-checker.h | |||
@@ -0,0 +1,26 @@ | |||
1 | /* | ||
2 | * Copyright (C) 2011 Red Hat, Inc. | ||
3 | * | ||
4 | * This file is released under the GPL. | ||
5 | */ | ||
6 | |||
7 | #ifndef SNAPSHOTS_SPACE_MAP_CHECKER_H | ||
8 | #define SNAPSHOTS_SPACE_MAP_CHECKER_H | ||
9 | |||
10 | #include "dm-space-map.h" | ||
11 | |||
12 | /*----------------------------------------------------------------*/ | ||
13 | |||
14 | /* | ||
15 | * This space map wraps a real on-disk space map, and verifies all of its | ||
16 | * operations. It uses a lot of memory, so only use if you have a specific | ||
17 | * problem that you're debugging. | ||
18 | * | ||
19 | * Ownership of @sm passes. | ||
20 | */ | ||
21 | struct dm_space_map *dm_sm_checker_create(struct dm_space_map *sm); | ||
22 | struct dm_space_map *dm_sm_checker_create_fresh(struct dm_space_map *sm); | ||
23 | |||
24 | /*----------------------------------------------------------------*/ | ||
25 | |||
26 | #endif | ||
diff --git a/drivers/md/persistent-data/dm-space-map-common.c b/drivers/md/persistent-data/dm-space-map-common.c new file mode 100644 index 000000000000..df2494c06cdc --- /dev/null +++ b/drivers/md/persistent-data/dm-space-map-common.c | |||
@@ -0,0 +1,705 @@ | |||
1 | /* | ||
2 | * Copyright (C) 2011 Red Hat, Inc. | ||
3 | * | ||
4 | * This file is released under the GPL. | ||
5 | */ | ||
6 | |||
7 | #include "dm-space-map-common.h" | ||
8 | #include "dm-transaction-manager.h" | ||
9 | |||
10 | #include <linux/bitops.h> | ||
11 | #include <linux/device-mapper.h> | ||
12 | |||
13 | #define DM_MSG_PREFIX "space map common" | ||
14 | |||
15 | /*----------------------------------------------------------------*/ | ||
16 | |||
17 | /* | ||
18 | * Index validator. | ||
19 | */ | ||
20 | #define INDEX_CSUM_XOR 160478 | ||
21 | |||
22 | static void index_prepare_for_write(struct dm_block_validator *v, | ||
23 | struct dm_block *b, | ||
24 | size_t block_size) | ||
25 | { | ||
26 | struct disk_metadata_index *mi_le = dm_block_data(b); | ||
27 | |||
28 | mi_le->blocknr = cpu_to_le64(dm_block_location(b)); | ||
29 | mi_le->csum = cpu_to_le32(dm_bm_checksum(&mi_le->padding, | ||
30 | block_size - sizeof(__le32), | ||
31 | INDEX_CSUM_XOR)); | ||
32 | } | ||
33 | |||
34 | static int index_check(struct dm_block_validator *v, | ||
35 | struct dm_block *b, | ||
36 | size_t block_size) | ||
37 | { | ||
38 | struct disk_metadata_index *mi_le = dm_block_data(b); | ||
39 | __le32 csum_disk; | ||
40 | |||
41 | if (dm_block_location(b) != le64_to_cpu(mi_le->blocknr)) { | ||
42 | DMERR("index_check failed blocknr %llu wanted %llu", | ||
43 | le64_to_cpu(mi_le->blocknr), dm_block_location(b)); | ||
44 | return -ENOTBLK; | ||
45 | } | ||
46 | |||
47 | csum_disk = cpu_to_le32(dm_bm_checksum(&mi_le->padding, | ||
48 | block_size - sizeof(__le32), | ||
49 | INDEX_CSUM_XOR)); | ||
50 | if (csum_disk != mi_le->csum) { | ||
51 | DMERR("index_check failed csum %u wanted %u", | ||
52 | le32_to_cpu(csum_disk), le32_to_cpu(mi_le->csum)); | ||
53 | return -EILSEQ; | ||
54 | } | ||
55 | |||
56 | return 0; | ||
57 | } | ||
58 | |||
59 | static struct dm_block_validator index_validator = { | ||
60 | .name = "index", | ||
61 | .prepare_for_write = index_prepare_for_write, | ||
62 | .check = index_check | ||
63 | }; | ||
64 | |||
65 | /*----------------------------------------------------------------*/ | ||
66 | |||
67 | /* | ||
68 | * Bitmap validator | ||
69 | */ | ||
70 | #define BITMAP_CSUM_XOR 240779 | ||
71 | |||
72 | static void bitmap_prepare_for_write(struct dm_block_validator *v, | ||
73 | struct dm_block *b, | ||
74 | size_t block_size) | ||
75 | { | ||
76 | struct disk_bitmap_header *disk_header = dm_block_data(b); | ||
77 | |||
78 | disk_header->blocknr = cpu_to_le64(dm_block_location(b)); | ||
79 | disk_header->csum = cpu_to_le32(dm_bm_checksum(&disk_header->not_used, | ||
80 | block_size - sizeof(__le32), | ||
81 | BITMAP_CSUM_XOR)); | ||
82 | } | ||
83 | |||
84 | static int bitmap_check(struct dm_block_validator *v, | ||
85 | struct dm_block *b, | ||
86 | size_t block_size) | ||
87 | { | ||
88 | struct disk_bitmap_header *disk_header = dm_block_data(b); | ||
89 | __le32 csum_disk; | ||
90 | |||
91 | if (dm_block_location(b) != le64_to_cpu(disk_header->blocknr)) { | ||
92 | DMERR("bitmap check failed blocknr %llu wanted %llu", | ||
93 | le64_to_cpu(disk_header->blocknr), dm_block_location(b)); | ||
94 | return -ENOTBLK; | ||
95 | } | ||
96 | |||
97 | csum_disk = cpu_to_le32(dm_bm_checksum(&disk_header->not_used, | ||
98 | block_size - sizeof(__le32), | ||
99 | BITMAP_CSUM_XOR)); | ||
100 | if (csum_disk != disk_header->csum) { | ||
101 | DMERR("bitmap check failed csum %u wanted %u", | ||
102 | le32_to_cpu(csum_disk), le32_to_cpu(disk_header->csum)); | ||
103 | return -EILSEQ; | ||
104 | } | ||
105 | |||
106 | return 0; | ||
107 | } | ||
108 | |||
109 | static struct dm_block_validator dm_sm_bitmap_validator = { | ||
110 | .name = "sm_bitmap", | ||
111 | .prepare_for_write = bitmap_prepare_for_write, | ||
112 | .check = bitmap_check | ||
113 | }; | ||
114 | |||
115 | /*----------------------------------------------------------------*/ | ||
116 | |||
117 | #define ENTRIES_PER_WORD 32 | ||
118 | #define ENTRIES_SHIFT 5 | ||
119 | |||
120 | static void *dm_bitmap_data(struct dm_block *b) | ||
121 | { | ||
122 | return dm_block_data(b) + sizeof(struct disk_bitmap_header); | ||
123 | } | ||
124 | |||
125 | #define WORD_MASK_HIGH 0xAAAAAAAAAAAAAAAAULL | ||
126 | |||
127 | static unsigned bitmap_word_used(void *addr, unsigned b) | ||
128 | { | ||
129 | __le64 *words_le = addr; | ||
130 | __le64 *w_le = words_le + (b >> ENTRIES_SHIFT); | ||
131 | |||
132 | uint64_t bits = le64_to_cpu(*w_le); | ||
133 | uint64_t mask = (bits + WORD_MASK_HIGH + 1) & WORD_MASK_HIGH; | ||
134 | |||
135 | return !(~bits & mask); | ||
136 | } | ||
137 | |||
138 | static unsigned sm_lookup_bitmap(void *addr, unsigned b) | ||
139 | { | ||
140 | __le64 *words_le = addr; | ||
141 | __le64 *w_le = words_le + (b >> ENTRIES_SHIFT); | ||
142 | unsigned hi, lo; | ||
143 | |||
144 | b = (b & (ENTRIES_PER_WORD - 1)) << 1; | ||
145 | hi = !!test_bit_le(b, (void *) w_le); | ||
146 | lo = !!test_bit_le(b + 1, (void *) w_le); | ||
147 | return (hi << 1) | lo; | ||
148 | } | ||
149 | |||
150 | static void sm_set_bitmap(void *addr, unsigned b, unsigned val) | ||
151 | { | ||
152 | __le64 *words_le = addr; | ||
153 | __le64 *w_le = words_le + (b >> ENTRIES_SHIFT); | ||
154 | |||
155 | b = (b & (ENTRIES_PER_WORD - 1)) << 1; | ||
156 | |||
157 | if (val & 2) | ||
158 | __set_bit_le(b, (void *) w_le); | ||
159 | else | ||
160 | __clear_bit_le(b, (void *) w_le); | ||
161 | |||
162 | if (val & 1) | ||
163 | __set_bit_le(b + 1, (void *) w_le); | ||
164 | else | ||
165 | __clear_bit_le(b + 1, (void *) w_le); | ||
166 | } | ||
167 | |||
168 | static int sm_find_free(void *addr, unsigned begin, unsigned end, | ||
169 | unsigned *result) | ||
170 | { | ||
171 | while (begin < end) { | ||
172 | if (!(begin & (ENTRIES_PER_WORD - 1)) && | ||
173 | bitmap_word_used(addr, begin)) { | ||
174 | begin += ENTRIES_PER_WORD; | ||
175 | continue; | ||
176 | } | ||
177 | |||
178 | if (!sm_lookup_bitmap(addr, begin)) { | ||
179 | *result = begin; | ||
180 | return 0; | ||
181 | } | ||
182 | |||
183 | begin++; | ||
184 | } | ||
185 | |||
186 | return -ENOSPC; | ||
187 | } | ||
188 | |||
189 | /*----------------------------------------------------------------*/ | ||
190 | |||
191 | static int sm_ll_init(struct ll_disk *ll, struct dm_transaction_manager *tm) | ||
192 | { | ||
193 | ll->tm = tm; | ||
194 | |||
195 | ll->bitmap_info.tm = tm; | ||
196 | ll->bitmap_info.levels = 1; | ||
197 | |||
198 | /* | ||
199 | * Because the new bitmap blocks are created via a shadow | ||
200 | * operation, the old entry has already had its reference count | ||
201 | * decremented and we don't need the btree to do any bookkeeping. | ||
202 | */ | ||
203 | ll->bitmap_info.value_type.size = sizeof(struct disk_index_entry); | ||
204 | ll->bitmap_info.value_type.inc = NULL; | ||
205 | ll->bitmap_info.value_type.dec = NULL; | ||
206 | ll->bitmap_info.value_type.equal = NULL; | ||
207 | |||
208 | ll->ref_count_info.tm = tm; | ||
209 | ll->ref_count_info.levels = 1; | ||
210 | ll->ref_count_info.value_type.size = sizeof(uint32_t); | ||
211 | ll->ref_count_info.value_type.inc = NULL; | ||
212 | ll->ref_count_info.value_type.dec = NULL; | ||
213 | ll->ref_count_info.value_type.equal = NULL; | ||
214 | |||
215 | ll->block_size = dm_bm_block_size(dm_tm_get_bm(tm)); | ||
216 | |||
217 | if (ll->block_size > (1 << 30)) { | ||
218 | DMERR("block size too big to hold bitmaps"); | ||
219 | return -EINVAL; | ||
220 | } | ||
221 | |||
222 | ll->entries_per_block = (ll->block_size - sizeof(struct disk_bitmap_header)) * | ||
223 | ENTRIES_PER_BYTE; | ||
224 | ll->nr_blocks = 0; | ||
225 | ll->bitmap_root = 0; | ||
226 | ll->ref_count_root = 0; | ||
227 | |||
228 | return 0; | ||
229 | } | ||
230 | |||
231 | int sm_ll_extend(struct ll_disk *ll, dm_block_t extra_blocks) | ||
232 | { | ||
233 | int r; | ||
234 | dm_block_t i, nr_blocks, nr_indexes; | ||
235 | unsigned old_blocks, blocks; | ||
236 | |||
237 | nr_blocks = ll->nr_blocks + extra_blocks; | ||
238 | old_blocks = dm_sector_div_up(ll->nr_blocks, ll->entries_per_block); | ||
239 | blocks = dm_sector_div_up(nr_blocks, ll->entries_per_block); | ||
240 | |||
241 | nr_indexes = dm_sector_div_up(nr_blocks, ll->entries_per_block); | ||
242 | if (nr_indexes > ll->max_entries(ll)) { | ||
243 | DMERR("space map too large"); | ||
244 | return -EINVAL; | ||
245 | } | ||
246 | |||
247 | for (i = old_blocks; i < blocks; i++) { | ||
248 | struct dm_block *b; | ||
249 | struct disk_index_entry idx; | ||
250 | |||
251 | r = dm_tm_new_block(ll->tm, &dm_sm_bitmap_validator, &b); | ||
252 | if (r < 0) | ||
253 | return r; | ||
254 | idx.blocknr = cpu_to_le64(dm_block_location(b)); | ||
255 | |||
256 | r = dm_tm_unlock(ll->tm, b); | ||
257 | if (r < 0) | ||
258 | return r; | ||
259 | |||
260 | idx.nr_free = cpu_to_le32(ll->entries_per_block); | ||
261 | idx.none_free_before = 0; | ||
262 | |||
263 | r = ll->save_ie(ll, i, &idx); | ||
264 | if (r < 0) | ||
265 | return r; | ||
266 | } | ||
267 | |||
268 | ll->nr_blocks = nr_blocks; | ||
269 | return 0; | ||
270 | } | ||
271 | |||
272 | int sm_ll_lookup_bitmap(struct ll_disk *ll, dm_block_t b, uint32_t *result) | ||
273 | { | ||
274 | int r; | ||
275 | dm_block_t index = b; | ||
276 | struct disk_index_entry ie_disk; | ||
277 | struct dm_block *blk; | ||
278 | |||
279 | b = do_div(index, ll->entries_per_block); | ||
280 | r = ll->load_ie(ll, index, &ie_disk); | ||
281 | if (r < 0) | ||
282 | return r; | ||
283 | |||
284 | r = dm_tm_read_lock(ll->tm, le64_to_cpu(ie_disk.blocknr), | ||
285 | &dm_sm_bitmap_validator, &blk); | ||
286 | if (r < 0) | ||
287 | return r; | ||
288 | |||
289 | *result = sm_lookup_bitmap(dm_bitmap_data(blk), b); | ||
290 | |||
291 | return dm_tm_unlock(ll->tm, blk); | ||
292 | } | ||
293 | |||
294 | int sm_ll_lookup(struct ll_disk *ll, dm_block_t b, uint32_t *result) | ||
295 | { | ||
296 | __le32 le_rc; | ||
297 | int r = sm_ll_lookup_bitmap(ll, b, result); | ||
298 | |||
299 | if (r) | ||
300 | return r; | ||
301 | |||
302 | if (*result != 3) | ||
303 | return r; | ||
304 | |||
305 | r = dm_btree_lookup(&ll->ref_count_info, ll->ref_count_root, &b, &le_rc); | ||
306 | if (r < 0) | ||
307 | return r; | ||
308 | |||
309 | *result = le32_to_cpu(le_rc); | ||
310 | |||
311 | return r; | ||
312 | } | ||
313 | |||
314 | int sm_ll_find_free_block(struct ll_disk *ll, dm_block_t begin, | ||
315 | dm_block_t end, dm_block_t *result) | ||
316 | { | ||
317 | int r; | ||
318 | struct disk_index_entry ie_disk; | ||
319 | dm_block_t i, index_begin = begin; | ||
320 | dm_block_t index_end = dm_sector_div_up(end, ll->entries_per_block); | ||
321 | |||
322 | /* | ||
323 | * FIXME: Use shifts | ||
324 | */ | ||
325 | begin = do_div(index_begin, ll->entries_per_block); | ||
326 | end = do_div(end, ll->entries_per_block); | ||
327 | |||
328 | for (i = index_begin; i < index_end; i++, begin = 0) { | ||
329 | struct dm_block *blk; | ||
330 | unsigned position; | ||
331 | uint32_t bit_end; | ||
332 | |||
333 | r = ll->load_ie(ll, i, &ie_disk); | ||
334 | if (r < 0) | ||
335 | return r; | ||
336 | |||
337 | if (le32_to_cpu(ie_disk.nr_free) == 0) | ||
338 | continue; | ||
339 | |||
340 | r = dm_tm_read_lock(ll->tm, le64_to_cpu(ie_disk.blocknr), | ||
341 | &dm_sm_bitmap_validator, &blk); | ||
342 | if (r < 0) | ||
343 | return r; | ||
344 | |||
345 | bit_end = (i == index_end - 1) ? end : ll->entries_per_block; | ||
346 | |||
347 | r = sm_find_free(dm_bitmap_data(blk), | ||
348 | max_t(unsigned, begin, le32_to_cpu(ie_disk.none_free_before)), | ||
349 | bit_end, &position); | ||
350 | if (r == -ENOSPC) { | ||
351 | /* | ||
352 | * This might happen because we started searching | ||
353 | * part way through the bitmap. | ||
354 | */ | ||
355 | dm_tm_unlock(ll->tm, blk); | ||
356 | continue; | ||
357 | |||
358 | } else if (r < 0) { | ||
359 | dm_tm_unlock(ll->tm, blk); | ||
360 | return r; | ||
361 | } | ||
362 | |||
363 | r = dm_tm_unlock(ll->tm, blk); | ||
364 | if (r < 0) | ||
365 | return r; | ||
366 | |||
367 | *result = i * ll->entries_per_block + (dm_block_t) position; | ||
368 | return 0; | ||
369 | } | ||
370 | |||
371 | return -ENOSPC; | ||
372 | } | ||
373 | |||
374 | int sm_ll_insert(struct ll_disk *ll, dm_block_t b, | ||
375 | uint32_t ref_count, enum allocation_event *ev) | ||
376 | { | ||
377 | int r; | ||
378 | uint32_t bit, old; | ||
379 | struct dm_block *nb; | ||
380 | dm_block_t index = b; | ||
381 | struct disk_index_entry ie_disk; | ||
382 | void *bm_le; | ||
383 | int inc; | ||
384 | |||
385 | bit = do_div(index, ll->entries_per_block); | ||
386 | r = ll->load_ie(ll, index, &ie_disk); | ||
387 | if (r < 0) | ||
388 | return r; | ||
389 | |||
390 | r = dm_tm_shadow_block(ll->tm, le64_to_cpu(ie_disk.blocknr), | ||
391 | &dm_sm_bitmap_validator, &nb, &inc); | ||
392 | if (r < 0) { | ||
393 | DMERR("dm_tm_shadow_block() failed"); | ||
394 | return r; | ||
395 | } | ||
396 | ie_disk.blocknr = cpu_to_le64(dm_block_location(nb)); | ||
397 | |||
398 | bm_le = dm_bitmap_data(nb); | ||
399 | old = sm_lookup_bitmap(bm_le, bit); | ||
400 | |||
401 | if (ref_count <= 2) { | ||
402 | sm_set_bitmap(bm_le, bit, ref_count); | ||
403 | |||
404 | r = dm_tm_unlock(ll->tm, nb); | ||
405 | if (r < 0) | ||
406 | return r; | ||
407 | |||
408 | #if 0 | ||
409 | /* FIXME: dm_btree_remove doesn't handle this yet */ | ||
410 | if (old > 2) { | ||
411 | r = dm_btree_remove(&ll->ref_count_info, | ||
412 | ll->ref_count_root, | ||
413 | &b, &ll->ref_count_root); | ||
414 | if (r) | ||
415 | return r; | ||
416 | } | ||
417 | #endif | ||
418 | |||
419 | } else { | ||
420 | __le32 le_rc = cpu_to_le32(ref_count); | ||
421 | |||
422 | sm_set_bitmap(bm_le, bit, 3); | ||
423 | r = dm_tm_unlock(ll->tm, nb); | ||
424 | if (r < 0) | ||
425 | return r; | ||
426 | |||
427 | __dm_bless_for_disk(&le_rc); | ||
428 | r = dm_btree_insert(&ll->ref_count_info, ll->ref_count_root, | ||
429 | &b, &le_rc, &ll->ref_count_root); | ||
430 | if (r < 0) { | ||
431 | DMERR("ref count insert failed"); | ||
432 | return r; | ||
433 | } | ||
434 | } | ||
435 | |||
436 | if (ref_count && !old) { | ||
437 | *ev = SM_ALLOC; | ||
438 | ll->nr_allocated++; | ||
439 | ie_disk.nr_free = cpu_to_le32(le32_to_cpu(ie_disk.nr_free) - 1); | ||
440 | if (le32_to_cpu(ie_disk.none_free_before) == bit) | ||
441 | ie_disk.none_free_before = cpu_to_le32(bit + 1); | ||
442 | |||
443 | } else if (old && !ref_count) { | ||
444 | *ev = SM_FREE; | ||
445 | ll->nr_allocated--; | ||
446 | ie_disk.nr_free = cpu_to_le32(le32_to_cpu(ie_disk.nr_free) + 1); | ||
447 | ie_disk.none_free_before = cpu_to_le32(min(le32_to_cpu(ie_disk.none_free_before), bit)); | ||
448 | } | ||
449 | |||
450 | return ll->save_ie(ll, index, &ie_disk); | ||
451 | } | ||
452 | |||
453 | int sm_ll_inc(struct ll_disk *ll, dm_block_t b, enum allocation_event *ev) | ||
454 | { | ||
455 | int r; | ||
456 | uint32_t rc; | ||
457 | |||
458 | r = sm_ll_lookup(ll, b, &rc); | ||
459 | if (r) | ||
460 | return r; | ||
461 | |||
462 | return sm_ll_insert(ll, b, rc + 1, ev); | ||
463 | } | ||
464 | |||
465 | int sm_ll_dec(struct ll_disk *ll, dm_block_t b, enum allocation_event *ev) | ||
466 | { | ||
467 | int r; | ||
468 | uint32_t rc; | ||
469 | |||
470 | r = sm_ll_lookup(ll, b, &rc); | ||
471 | if (r) | ||
472 | return r; | ||
473 | |||
474 | if (!rc) | ||
475 | return -EINVAL; | ||
476 | |||
477 | return sm_ll_insert(ll, b, rc - 1, ev); | ||
478 | } | ||
479 | |||
480 | int sm_ll_commit(struct ll_disk *ll) | ||
481 | { | ||
482 | return ll->commit(ll); | ||
483 | } | ||
484 | |||
485 | /*----------------------------------------------------------------*/ | ||
486 | |||
487 | static int metadata_ll_load_ie(struct ll_disk *ll, dm_block_t index, | ||
488 | struct disk_index_entry *ie) | ||
489 | { | ||
490 | memcpy(ie, ll->mi_le.index + index, sizeof(*ie)); | ||
491 | return 0; | ||
492 | } | ||
493 | |||
494 | static int metadata_ll_save_ie(struct ll_disk *ll, dm_block_t index, | ||
495 | struct disk_index_entry *ie) | ||
496 | { | ||
497 | memcpy(ll->mi_le.index + index, ie, sizeof(*ie)); | ||
498 | return 0; | ||
499 | } | ||
500 | |||
501 | static int metadata_ll_init_index(struct ll_disk *ll) | ||
502 | { | ||
503 | int r; | ||
504 | struct dm_block *b; | ||
505 | |||
506 | r = dm_tm_new_block(ll->tm, &index_validator, &b); | ||
507 | if (r < 0) | ||
508 | return r; | ||
509 | |||
510 | memcpy(dm_block_data(b), &ll->mi_le, sizeof(ll->mi_le)); | ||
511 | ll->bitmap_root = dm_block_location(b); | ||
512 | |||
513 | return dm_tm_unlock(ll->tm, b); | ||
514 | } | ||
515 | |||
516 | static int metadata_ll_open(struct ll_disk *ll) | ||
517 | { | ||
518 | int r; | ||
519 | struct dm_block *block; | ||
520 | |||
521 | r = dm_tm_read_lock(ll->tm, ll->bitmap_root, | ||
522 | &index_validator, &block); | ||
523 | if (r) | ||
524 | return r; | ||
525 | |||
526 | memcpy(&ll->mi_le, dm_block_data(block), sizeof(ll->mi_le)); | ||
527 | return dm_tm_unlock(ll->tm, block); | ||
528 | } | ||
529 | |||
530 | static dm_block_t metadata_ll_max_entries(struct ll_disk *ll) | ||
531 | { | ||
532 | return MAX_METADATA_BITMAPS; | ||
533 | } | ||
534 | |||
535 | static int metadata_ll_commit(struct ll_disk *ll) | ||
536 | { | ||
537 | int r, inc; | ||
538 | struct dm_block *b; | ||
539 | |||
540 | r = dm_tm_shadow_block(ll->tm, ll->bitmap_root, &index_validator, &b, &inc); | ||
541 | if (r) | ||
542 | return r; | ||
543 | |||
544 | memcpy(dm_block_data(b), &ll->mi_le, sizeof(ll->mi_le)); | ||
545 | ll->bitmap_root = dm_block_location(b); | ||
546 | |||
547 | return dm_tm_unlock(ll->tm, b); | ||
548 | } | ||
549 | |||
550 | int sm_ll_new_metadata(struct ll_disk *ll, struct dm_transaction_manager *tm) | ||
551 | { | ||
552 | int r; | ||
553 | |||
554 | r = sm_ll_init(ll, tm); | ||
555 | if (r < 0) | ||
556 | return r; | ||
557 | |||
558 | ll->load_ie = metadata_ll_load_ie; | ||
559 | ll->save_ie = metadata_ll_save_ie; | ||
560 | ll->init_index = metadata_ll_init_index; | ||
561 | ll->open_index = metadata_ll_open; | ||
562 | ll->max_entries = metadata_ll_max_entries; | ||
563 | ll->commit = metadata_ll_commit; | ||
564 | |||
565 | ll->nr_blocks = 0; | ||
566 | ll->nr_allocated = 0; | ||
567 | |||
568 | r = ll->init_index(ll); | ||
569 | if (r < 0) | ||
570 | return r; | ||
571 | |||
572 | r = dm_btree_empty(&ll->ref_count_info, &ll->ref_count_root); | ||
573 | if (r < 0) | ||
574 | return r; | ||
575 | |||
576 | return 0; | ||
577 | } | ||
578 | |||
579 | int sm_ll_open_metadata(struct ll_disk *ll, struct dm_transaction_manager *tm, | ||
580 | void *root_le, size_t len) | ||
581 | { | ||
582 | int r; | ||
583 | struct disk_sm_root *smr = root_le; | ||
584 | |||
585 | if (len < sizeof(struct disk_sm_root)) { | ||
586 | DMERR("sm_metadata root too small"); | ||
587 | return -ENOMEM; | ||
588 | } | ||
589 | |||
590 | r = sm_ll_init(ll, tm); | ||
591 | if (r < 0) | ||
592 | return r; | ||
593 | |||
594 | ll->load_ie = metadata_ll_load_ie; | ||
595 | ll->save_ie = metadata_ll_save_ie; | ||
596 | ll->init_index = metadata_ll_init_index; | ||
597 | ll->open_index = metadata_ll_open; | ||
598 | ll->max_entries = metadata_ll_max_entries; | ||
599 | ll->commit = metadata_ll_commit; | ||
600 | |||
601 | ll->nr_blocks = le64_to_cpu(smr->nr_blocks); | ||
602 | ll->nr_allocated = le64_to_cpu(smr->nr_allocated); | ||
603 | ll->bitmap_root = le64_to_cpu(smr->bitmap_root); | ||
604 | ll->ref_count_root = le64_to_cpu(smr->ref_count_root); | ||
605 | |||
606 | return ll->open_index(ll); | ||
607 | } | ||
608 | |||
609 | /*----------------------------------------------------------------*/ | ||
610 | |||
611 | static int disk_ll_load_ie(struct ll_disk *ll, dm_block_t index, | ||
612 | struct disk_index_entry *ie) | ||
613 | { | ||
614 | return dm_btree_lookup(&ll->bitmap_info, ll->bitmap_root, &index, ie); | ||
615 | } | ||
616 | |||
617 | static int disk_ll_save_ie(struct ll_disk *ll, dm_block_t index, | ||
618 | struct disk_index_entry *ie) | ||
619 | { | ||
620 | __dm_bless_for_disk(ie); | ||
621 | return dm_btree_insert(&ll->bitmap_info, ll->bitmap_root, | ||
622 | &index, ie, &ll->bitmap_root); | ||
623 | } | ||
624 | |||
625 | static int disk_ll_init_index(struct ll_disk *ll) | ||
626 | { | ||
627 | return dm_btree_empty(&ll->bitmap_info, &ll->bitmap_root); | ||
628 | } | ||
629 | |||
630 | static int disk_ll_open(struct ll_disk *ll) | ||
631 | { | ||
632 | /* nothing to do */ | ||
633 | return 0; | ||
634 | } | ||
635 | |||
636 | static dm_block_t disk_ll_max_entries(struct ll_disk *ll) | ||
637 | { | ||
638 | return -1ULL; | ||
639 | } | ||
640 | |||
641 | static int disk_ll_commit(struct ll_disk *ll) | ||
642 | { | ||
643 | return 0; | ||
644 | } | ||
645 | |||
646 | int sm_ll_new_disk(struct ll_disk *ll, struct dm_transaction_manager *tm) | ||
647 | { | ||
648 | int r; | ||
649 | |||
650 | r = sm_ll_init(ll, tm); | ||
651 | if (r < 0) | ||
652 | return r; | ||
653 | |||
654 | ll->load_ie = disk_ll_load_ie; | ||
655 | ll->save_ie = disk_ll_save_ie; | ||
656 | ll->init_index = disk_ll_init_index; | ||
657 | ll->open_index = disk_ll_open; | ||
658 | ll->max_entries = disk_ll_max_entries; | ||
659 | ll->commit = disk_ll_commit; | ||
660 | |||
661 | ll->nr_blocks = 0; | ||
662 | ll->nr_allocated = 0; | ||
663 | |||
664 | r = ll->init_index(ll); | ||
665 | if (r < 0) | ||
666 | return r; | ||
667 | |||
668 | r = dm_btree_empty(&ll->ref_count_info, &ll->ref_count_root); | ||
669 | if (r < 0) | ||
670 | return r; | ||
671 | |||
672 | return 0; | ||
673 | } | ||
674 | |||
675 | int sm_ll_open_disk(struct ll_disk *ll, struct dm_transaction_manager *tm, | ||
676 | void *root_le, size_t len) | ||
677 | { | ||
678 | int r; | ||
679 | struct disk_sm_root *smr = root_le; | ||
680 | |||
681 | if (len < sizeof(struct disk_sm_root)) { | ||
682 | DMERR("sm_metadata root too small"); | ||
683 | return -ENOMEM; | ||
684 | } | ||
685 | |||
686 | r = sm_ll_init(ll, tm); | ||
687 | if (r < 0) | ||
688 | return r; | ||
689 | |||
690 | ll->load_ie = disk_ll_load_ie; | ||
691 | ll->save_ie = disk_ll_save_ie; | ||
692 | ll->init_index = disk_ll_init_index; | ||
693 | ll->open_index = disk_ll_open; | ||
694 | ll->max_entries = disk_ll_max_entries; | ||
695 | ll->commit = disk_ll_commit; | ||
696 | |||
697 | ll->nr_blocks = le64_to_cpu(smr->nr_blocks); | ||
698 | ll->nr_allocated = le64_to_cpu(smr->nr_allocated); | ||
699 | ll->bitmap_root = le64_to_cpu(smr->bitmap_root); | ||
700 | ll->ref_count_root = le64_to_cpu(smr->ref_count_root); | ||
701 | |||
702 | return ll->open_index(ll); | ||
703 | } | ||
704 | |||
705 | /*----------------------------------------------------------------*/ | ||
diff --git a/drivers/md/persistent-data/dm-space-map-common.h b/drivers/md/persistent-data/dm-space-map-common.h new file mode 100644 index 000000000000..8f220821a9a9 --- /dev/null +++ b/drivers/md/persistent-data/dm-space-map-common.h | |||
@@ -0,0 +1,126 @@ | |||
1 | /* | ||
2 | * Copyright (C) 2011 Red Hat, Inc. | ||
3 | * | ||
4 | * This file is released under the GPL. | ||
5 | */ | ||
6 | |||
7 | #ifndef DM_SPACE_MAP_COMMON_H | ||
8 | #define DM_SPACE_MAP_COMMON_H | ||
9 | |||
10 | #include "dm-btree.h" | ||
11 | |||
12 | /*----------------------------------------------------------------*/ | ||
13 | |||
14 | /* | ||
15 | * Low level disk format | ||
16 | * | ||
17 | * Bitmap btree | ||
18 | * ------------ | ||
19 | * | ||
20 | * Each value stored in the btree is an index_entry. This points to a | ||
21 | * block that is used as a bitmap. Within the bitmap hold 2 bits per | ||
22 | * entry, which represent UNUSED = 0, REF_COUNT = 1, REF_COUNT = 2 and | ||
23 | * REF_COUNT = many. | ||
24 | * | ||
25 | * Refcount btree | ||
26 | * -------------- | ||
27 | * | ||
28 | * Any entry that has a ref count higher than 2 gets entered in the ref | ||
29 | * count tree. The leaf values for this tree is the 32-bit ref count. | ||
30 | */ | ||
31 | |||
32 | struct disk_index_entry { | ||
33 | __le64 blocknr; | ||
34 | __le32 nr_free; | ||
35 | __le32 none_free_before; | ||
36 | } __packed; | ||
37 | |||
38 | |||
39 | #define MAX_METADATA_BITMAPS 255 | ||
40 | struct disk_metadata_index { | ||
41 | __le32 csum; | ||
42 | __le32 padding; | ||
43 | __le64 blocknr; | ||
44 | |||
45 | struct disk_index_entry index[MAX_METADATA_BITMAPS]; | ||
46 | } __packed; | ||
47 | |||
48 | struct ll_disk; | ||
49 | |||
50 | typedef int (*load_ie_fn)(struct ll_disk *ll, dm_block_t index, struct disk_index_entry *result); | ||
51 | typedef int (*save_ie_fn)(struct ll_disk *ll, dm_block_t index, struct disk_index_entry *ie); | ||
52 | typedef int (*init_index_fn)(struct ll_disk *ll); | ||
53 | typedef int (*open_index_fn)(struct ll_disk *ll); | ||
54 | typedef dm_block_t (*max_index_entries_fn)(struct ll_disk *ll); | ||
55 | typedef int (*commit_fn)(struct ll_disk *ll); | ||
56 | |||
57 | struct ll_disk { | ||
58 | struct dm_transaction_manager *tm; | ||
59 | struct dm_btree_info bitmap_info; | ||
60 | struct dm_btree_info ref_count_info; | ||
61 | |||
62 | uint32_t block_size; | ||
63 | uint32_t entries_per_block; | ||
64 | dm_block_t nr_blocks; | ||
65 | dm_block_t nr_allocated; | ||
66 | |||
67 | /* | ||
68 | * bitmap_root may be a btree root or a simple index. | ||
69 | */ | ||
70 | dm_block_t bitmap_root; | ||
71 | |||
72 | dm_block_t ref_count_root; | ||
73 | |||
74 | struct disk_metadata_index mi_le; | ||
75 | load_ie_fn load_ie; | ||
76 | save_ie_fn save_ie; | ||
77 | init_index_fn init_index; | ||
78 | open_index_fn open_index; | ||
79 | max_index_entries_fn max_entries; | ||
80 | commit_fn commit; | ||
81 | }; | ||
82 | |||
83 | struct disk_sm_root { | ||
84 | __le64 nr_blocks; | ||
85 | __le64 nr_allocated; | ||
86 | __le64 bitmap_root; | ||
87 | __le64 ref_count_root; | ||
88 | } __packed; | ||
89 | |||
90 | #define ENTRIES_PER_BYTE 4 | ||
91 | |||
92 | struct disk_bitmap_header { | ||
93 | __le32 csum; | ||
94 | __le32 not_used; | ||
95 | __le64 blocknr; | ||
96 | } __packed; | ||
97 | |||
98 | enum allocation_event { | ||
99 | SM_NONE, | ||
100 | SM_ALLOC, | ||
101 | SM_FREE, | ||
102 | }; | ||
103 | |||
104 | /*----------------------------------------------------------------*/ | ||
105 | |||
106 | int sm_ll_extend(struct ll_disk *ll, dm_block_t extra_blocks); | ||
107 | int sm_ll_lookup_bitmap(struct ll_disk *ll, dm_block_t b, uint32_t *result); | ||
108 | int sm_ll_lookup(struct ll_disk *ll, dm_block_t b, uint32_t *result); | ||
109 | int sm_ll_find_free_block(struct ll_disk *ll, dm_block_t begin, | ||
110 | dm_block_t end, dm_block_t *result); | ||
111 | int sm_ll_insert(struct ll_disk *ll, dm_block_t b, uint32_t ref_count, enum allocation_event *ev); | ||
112 | int sm_ll_inc(struct ll_disk *ll, dm_block_t b, enum allocation_event *ev); | ||
113 | int sm_ll_dec(struct ll_disk *ll, dm_block_t b, enum allocation_event *ev); | ||
114 | int sm_ll_commit(struct ll_disk *ll); | ||
115 | |||
116 | int sm_ll_new_metadata(struct ll_disk *ll, struct dm_transaction_manager *tm); | ||
117 | int sm_ll_open_metadata(struct ll_disk *ll, struct dm_transaction_manager *tm, | ||
118 | void *root_le, size_t len); | ||
119 | |||
120 | int sm_ll_new_disk(struct ll_disk *ll, struct dm_transaction_manager *tm); | ||
121 | int sm_ll_open_disk(struct ll_disk *ll, struct dm_transaction_manager *tm, | ||
122 | void *root_le, size_t len); | ||
123 | |||
124 | /*----------------------------------------------------------------*/ | ||
125 | |||
126 | #endif /* DM_SPACE_MAP_COMMON_H */ | ||
diff --git a/drivers/md/persistent-data/dm-space-map-disk.c b/drivers/md/persistent-data/dm-space-map-disk.c new file mode 100644 index 000000000000..aeff7852cf79 --- /dev/null +++ b/drivers/md/persistent-data/dm-space-map-disk.c | |||
@@ -0,0 +1,335 @@ | |||
1 | /* | ||
2 | * Copyright (C) 2011 Red Hat, Inc. | ||
3 | * | ||
4 | * This file is released under the GPL. | ||
5 | */ | ||
6 | |||
7 | #include "dm-space-map-checker.h" | ||
8 | #include "dm-space-map-common.h" | ||
9 | #include "dm-space-map-disk.h" | ||
10 | #include "dm-space-map.h" | ||
11 | #include "dm-transaction-manager.h" | ||
12 | |||
13 | #include <linux/list.h> | ||
14 | #include <linux/slab.h> | ||
15 | #include <linux/module.h> | ||
16 | #include <linux/device-mapper.h> | ||
17 | |||
18 | #define DM_MSG_PREFIX "space map disk" | ||
19 | |||
20 | /*----------------------------------------------------------------*/ | ||
21 | |||
22 | /* | ||
23 | * Space map interface. | ||
24 | */ | ||
25 | struct sm_disk { | ||
26 | struct dm_space_map sm; | ||
27 | |||
28 | struct ll_disk ll; | ||
29 | struct ll_disk old_ll; | ||
30 | |||
31 | dm_block_t begin; | ||
32 | dm_block_t nr_allocated_this_transaction; | ||
33 | }; | ||
34 | |||
35 | static void sm_disk_destroy(struct dm_space_map *sm) | ||
36 | { | ||
37 | struct sm_disk *smd = container_of(sm, struct sm_disk, sm); | ||
38 | |||
39 | kfree(smd); | ||
40 | } | ||
41 | |||
42 | static int sm_disk_extend(struct dm_space_map *sm, dm_block_t extra_blocks) | ||
43 | { | ||
44 | struct sm_disk *smd = container_of(sm, struct sm_disk, sm); | ||
45 | |||
46 | return sm_ll_extend(&smd->ll, extra_blocks); | ||
47 | } | ||
48 | |||
49 | static int sm_disk_get_nr_blocks(struct dm_space_map *sm, dm_block_t *count) | ||
50 | { | ||
51 | struct sm_disk *smd = container_of(sm, struct sm_disk, sm); | ||
52 | *count = smd->old_ll.nr_blocks; | ||
53 | |||
54 | return 0; | ||
55 | } | ||
56 | |||
57 | static int sm_disk_get_nr_free(struct dm_space_map *sm, dm_block_t *count) | ||
58 | { | ||
59 | struct sm_disk *smd = container_of(sm, struct sm_disk, sm); | ||
60 | *count = (smd->old_ll.nr_blocks - smd->old_ll.nr_allocated) - smd->nr_allocated_this_transaction; | ||
61 | |||
62 | return 0; | ||
63 | } | ||
64 | |||
65 | static int sm_disk_get_count(struct dm_space_map *sm, dm_block_t b, | ||
66 | uint32_t *result) | ||
67 | { | ||
68 | struct sm_disk *smd = container_of(sm, struct sm_disk, sm); | ||
69 | return sm_ll_lookup(&smd->ll, b, result); | ||
70 | } | ||
71 | |||
72 | static int sm_disk_count_is_more_than_one(struct dm_space_map *sm, dm_block_t b, | ||
73 | int *result) | ||
74 | { | ||
75 | int r; | ||
76 | uint32_t count; | ||
77 | |||
78 | r = sm_disk_get_count(sm, b, &count); | ||
79 | if (r) | ||
80 | return r; | ||
81 | |||
82 | return count > 1; | ||
83 | } | ||
84 | |||
85 | static int sm_disk_set_count(struct dm_space_map *sm, dm_block_t b, | ||
86 | uint32_t count) | ||
87 | { | ||
88 | int r; | ||
89 | uint32_t old_count; | ||
90 | enum allocation_event ev; | ||
91 | struct sm_disk *smd = container_of(sm, struct sm_disk, sm); | ||
92 | |||
93 | r = sm_ll_insert(&smd->ll, b, count, &ev); | ||
94 | if (!r) { | ||
95 | switch (ev) { | ||
96 | case SM_NONE: | ||
97 | break; | ||
98 | |||
99 | case SM_ALLOC: | ||
100 | /* | ||
101 | * This _must_ be free in the prior transaction | ||
102 | * otherwise we've lost atomicity. | ||
103 | */ | ||
104 | smd->nr_allocated_this_transaction++; | ||
105 | break; | ||
106 | |||
107 | case SM_FREE: | ||
108 | /* | ||
109 | * It's only free if it's also free in the last | ||
110 | * transaction. | ||
111 | */ | ||
112 | r = sm_ll_lookup(&smd->old_ll, b, &old_count); | ||
113 | if (r) | ||
114 | return r; | ||
115 | |||
116 | if (!old_count) | ||
117 | smd->nr_allocated_this_transaction--; | ||
118 | break; | ||
119 | } | ||
120 | } | ||
121 | |||
122 | return r; | ||
123 | } | ||
124 | |||
125 | static int sm_disk_inc_block(struct dm_space_map *sm, dm_block_t b) | ||
126 | { | ||
127 | int r; | ||
128 | enum allocation_event ev; | ||
129 | struct sm_disk *smd = container_of(sm, struct sm_disk, sm); | ||
130 | |||
131 | r = sm_ll_inc(&smd->ll, b, &ev); | ||
132 | if (!r && (ev == SM_ALLOC)) | ||
133 | /* | ||
134 | * This _must_ be free in the prior transaction | ||
135 | * otherwise we've lost atomicity. | ||
136 | */ | ||
137 | smd->nr_allocated_this_transaction++; | ||
138 | |||
139 | return r; | ||
140 | } | ||
141 | |||
142 | static int sm_disk_dec_block(struct dm_space_map *sm, dm_block_t b) | ||
143 | { | ||
144 | int r; | ||
145 | uint32_t old_count; | ||
146 | enum allocation_event ev; | ||
147 | struct sm_disk *smd = container_of(sm, struct sm_disk, sm); | ||
148 | |||
149 | r = sm_ll_dec(&smd->ll, b, &ev); | ||
150 | if (!r && (ev == SM_FREE)) { | ||
151 | /* | ||
152 | * It's only free if it's also free in the last | ||
153 | * transaction. | ||
154 | */ | ||
155 | r = sm_ll_lookup(&smd->old_ll, b, &old_count); | ||
156 | if (r) | ||
157 | return r; | ||
158 | |||
159 | if (!old_count) | ||
160 | smd->nr_allocated_this_transaction--; | ||
161 | } | ||
162 | |||
163 | return r; | ||
164 | } | ||
165 | |||
166 | static int sm_disk_new_block(struct dm_space_map *sm, dm_block_t *b) | ||
167 | { | ||
168 | int r; | ||
169 | enum allocation_event ev; | ||
170 | struct sm_disk *smd = container_of(sm, struct sm_disk, sm); | ||
171 | |||
172 | /* FIXME: we should loop round a couple of times */ | ||
173 | r = sm_ll_find_free_block(&smd->old_ll, smd->begin, smd->old_ll.nr_blocks, b); | ||
174 | if (r) | ||
175 | return r; | ||
176 | |||
177 | smd->begin = *b + 1; | ||
178 | r = sm_ll_inc(&smd->ll, *b, &ev); | ||
179 | if (!r) { | ||
180 | BUG_ON(ev != SM_ALLOC); | ||
181 | smd->nr_allocated_this_transaction++; | ||
182 | } | ||
183 | |||
184 | return r; | ||
185 | } | ||
186 | |||
187 | static int sm_disk_commit(struct dm_space_map *sm) | ||
188 | { | ||
189 | int r; | ||
190 | dm_block_t nr_free; | ||
191 | struct sm_disk *smd = container_of(sm, struct sm_disk, sm); | ||
192 | |||
193 | r = sm_disk_get_nr_free(sm, &nr_free); | ||
194 | if (r) | ||
195 | return r; | ||
196 | |||
197 | r = sm_ll_commit(&smd->ll); | ||
198 | if (r) | ||
199 | return r; | ||
200 | |||
201 | memcpy(&smd->old_ll, &smd->ll, sizeof(smd->old_ll)); | ||
202 | smd->begin = 0; | ||
203 | smd->nr_allocated_this_transaction = 0; | ||
204 | |||
205 | r = sm_disk_get_nr_free(sm, &nr_free); | ||
206 | if (r) | ||
207 | return r; | ||
208 | |||
209 | return 0; | ||
210 | } | ||
211 | |||
212 | static int sm_disk_root_size(struct dm_space_map *sm, size_t *result) | ||
213 | { | ||
214 | *result = sizeof(struct disk_sm_root); | ||
215 | |||
216 | return 0; | ||
217 | } | ||
218 | |||
219 | static int sm_disk_copy_root(struct dm_space_map *sm, void *where_le, size_t max) | ||
220 | { | ||
221 | struct sm_disk *smd = container_of(sm, struct sm_disk, sm); | ||
222 | struct disk_sm_root root_le; | ||
223 | |||
224 | root_le.nr_blocks = cpu_to_le64(smd->ll.nr_blocks); | ||
225 | root_le.nr_allocated = cpu_to_le64(smd->ll.nr_allocated); | ||
226 | root_le.bitmap_root = cpu_to_le64(smd->ll.bitmap_root); | ||
227 | root_le.ref_count_root = cpu_to_le64(smd->ll.ref_count_root); | ||
228 | |||
229 | if (max < sizeof(root_le)) | ||
230 | return -ENOSPC; | ||
231 | |||
232 | memcpy(where_le, &root_le, sizeof(root_le)); | ||
233 | |||
234 | return 0; | ||
235 | } | ||
236 | |||
237 | /*----------------------------------------------------------------*/ | ||
238 | |||
239 | static struct dm_space_map ops = { | ||
240 | .destroy = sm_disk_destroy, | ||
241 | .extend = sm_disk_extend, | ||
242 | .get_nr_blocks = sm_disk_get_nr_blocks, | ||
243 | .get_nr_free = sm_disk_get_nr_free, | ||
244 | .get_count = sm_disk_get_count, | ||
245 | .count_is_more_than_one = sm_disk_count_is_more_than_one, | ||
246 | .set_count = sm_disk_set_count, | ||
247 | .inc_block = sm_disk_inc_block, | ||
248 | .dec_block = sm_disk_dec_block, | ||
249 | .new_block = sm_disk_new_block, | ||
250 | .commit = sm_disk_commit, | ||
251 | .root_size = sm_disk_root_size, | ||
252 | .copy_root = sm_disk_copy_root | ||
253 | }; | ||
254 | |||
255 | static struct dm_space_map *dm_sm_disk_create_real( | ||
256 | struct dm_transaction_manager *tm, | ||
257 | dm_block_t nr_blocks) | ||
258 | { | ||
259 | int r; | ||
260 | struct sm_disk *smd; | ||
261 | |||
262 | smd = kmalloc(sizeof(*smd), GFP_KERNEL); | ||
263 | if (!smd) | ||
264 | return ERR_PTR(-ENOMEM); | ||
265 | |||
266 | smd->begin = 0; | ||
267 | smd->nr_allocated_this_transaction = 0; | ||
268 | memcpy(&smd->sm, &ops, sizeof(smd->sm)); | ||
269 | |||
270 | r = sm_ll_new_disk(&smd->ll, tm); | ||
271 | if (r) | ||
272 | goto bad; | ||
273 | |||
274 | r = sm_ll_extend(&smd->ll, nr_blocks); | ||
275 | if (r) | ||
276 | goto bad; | ||
277 | |||
278 | r = sm_disk_commit(&smd->sm); | ||
279 | if (r) | ||
280 | goto bad; | ||
281 | |||
282 | return &smd->sm; | ||
283 | |||
284 | bad: | ||
285 | kfree(smd); | ||
286 | return ERR_PTR(r); | ||
287 | } | ||
288 | |||
289 | struct dm_space_map *dm_sm_disk_create(struct dm_transaction_manager *tm, | ||
290 | dm_block_t nr_blocks) | ||
291 | { | ||
292 | struct dm_space_map *sm = dm_sm_disk_create_real(tm, nr_blocks); | ||
293 | return dm_sm_checker_create_fresh(sm); | ||
294 | } | ||
295 | EXPORT_SYMBOL_GPL(dm_sm_disk_create); | ||
296 | |||
297 | static struct dm_space_map *dm_sm_disk_open_real( | ||
298 | struct dm_transaction_manager *tm, | ||
299 | void *root_le, size_t len) | ||
300 | { | ||
301 | int r; | ||
302 | struct sm_disk *smd; | ||
303 | |||
304 | smd = kmalloc(sizeof(*smd), GFP_KERNEL); | ||
305 | if (!smd) | ||
306 | return ERR_PTR(-ENOMEM); | ||
307 | |||
308 | smd->begin = 0; | ||
309 | smd->nr_allocated_this_transaction = 0; | ||
310 | memcpy(&smd->sm, &ops, sizeof(smd->sm)); | ||
311 | |||
312 | r = sm_ll_open_disk(&smd->ll, tm, root_le, len); | ||
313 | if (r) | ||
314 | goto bad; | ||
315 | |||
316 | r = sm_disk_commit(&smd->sm); | ||
317 | if (r) | ||
318 | goto bad; | ||
319 | |||
320 | return &smd->sm; | ||
321 | |||
322 | bad: | ||
323 | kfree(smd); | ||
324 | return ERR_PTR(r); | ||
325 | } | ||
326 | |||
327 | struct dm_space_map *dm_sm_disk_open(struct dm_transaction_manager *tm, | ||
328 | void *root_le, size_t len) | ||
329 | { | ||
330 | return dm_sm_checker_create( | ||
331 | dm_sm_disk_open_real(tm, root_le, len)); | ||
332 | } | ||
333 | EXPORT_SYMBOL_GPL(dm_sm_disk_open); | ||
334 | |||
335 | /*----------------------------------------------------------------*/ | ||
diff --git a/drivers/md/persistent-data/dm-space-map-disk.h b/drivers/md/persistent-data/dm-space-map-disk.h new file mode 100644 index 000000000000..447a0a9a2d9f --- /dev/null +++ b/drivers/md/persistent-data/dm-space-map-disk.h | |||
@@ -0,0 +1,25 @@ | |||
1 | /* | ||
2 | * Copyright (C) 2011 Red Hat, Inc. | ||
3 | * | ||
4 | * This file is released under the GPL. | ||
5 | */ | ||
6 | |||
7 | #ifndef _LINUX_DM_SPACE_MAP_DISK_H | ||
8 | #define _LINUX_DM_SPACE_MAP_DISK_H | ||
9 | |||
10 | #include "dm-block-manager.h" | ||
11 | |||
12 | struct dm_space_map; | ||
13 | struct dm_transaction_manager; | ||
14 | |||
15 | /* | ||
16 | * Unfortunately we have to use two-phase construction due to the cycle | ||
17 | * between the tm and sm. | ||
18 | */ | ||
19 | struct dm_space_map *dm_sm_disk_create(struct dm_transaction_manager *tm, | ||
20 | dm_block_t nr_blocks); | ||
21 | |||
22 | struct dm_space_map *dm_sm_disk_open(struct dm_transaction_manager *tm, | ||
23 | void *root, size_t len); | ||
24 | |||
25 | #endif /* _LINUX_DM_SPACE_MAP_DISK_H */ | ||
diff --git a/drivers/md/persistent-data/dm-space-map-metadata.c b/drivers/md/persistent-data/dm-space-map-metadata.c new file mode 100644 index 000000000000..e89ae5e7a519 --- /dev/null +++ b/drivers/md/persistent-data/dm-space-map-metadata.c | |||
@@ -0,0 +1,596 @@ | |||
1 | /* | ||
2 | * Copyright (C) 2011 Red Hat, Inc. | ||
3 | * | ||
4 | * This file is released under the GPL. | ||
5 | */ | ||
6 | |||
7 | #include "dm-space-map.h" | ||
8 | #include "dm-space-map-common.h" | ||
9 | #include "dm-space-map-metadata.h" | ||
10 | |||
11 | #include <linux/list.h> | ||
12 | #include <linux/slab.h> | ||
13 | #include <linux/device-mapper.h> | ||
14 | |||
15 | #define DM_MSG_PREFIX "space map metadata" | ||
16 | |||
17 | /*----------------------------------------------------------------*/ | ||
18 | |||
19 | /* | ||
20 | * Space map interface. | ||
21 | * | ||
22 | * The low level disk format is written using the standard btree and | ||
23 | * transaction manager. This means that performing disk operations may | ||
24 | * cause us to recurse into the space map in order to allocate new blocks. | ||
25 | * For this reason we have a pool of pre-allocated blocks large enough to | ||
26 | * service any metadata_ll_disk operation. | ||
27 | */ | ||
28 | |||
29 | /* | ||
30 | * FIXME: we should calculate this based on the size of the device. | ||
31 | * Only the metadata space map needs this functionality. | ||
32 | */ | ||
33 | #define MAX_RECURSIVE_ALLOCATIONS 1024 | ||
34 | |||
35 | enum block_op_type { | ||
36 | BOP_INC, | ||
37 | BOP_DEC | ||
38 | }; | ||
39 | |||
40 | struct block_op { | ||
41 | enum block_op_type type; | ||
42 | dm_block_t block; | ||
43 | }; | ||
44 | |||
45 | struct sm_metadata { | ||
46 | struct dm_space_map sm; | ||
47 | |||
48 | struct ll_disk ll; | ||
49 | struct ll_disk old_ll; | ||
50 | |||
51 | dm_block_t begin; | ||
52 | |||
53 | unsigned recursion_count; | ||
54 | unsigned allocated_this_transaction; | ||
55 | unsigned nr_uncommitted; | ||
56 | struct block_op uncommitted[MAX_RECURSIVE_ALLOCATIONS]; | ||
57 | }; | ||
58 | |||
59 | static int add_bop(struct sm_metadata *smm, enum block_op_type type, dm_block_t b) | ||
60 | { | ||
61 | struct block_op *op; | ||
62 | |||
63 | if (smm->nr_uncommitted == MAX_RECURSIVE_ALLOCATIONS) { | ||
64 | DMERR("too many recursive allocations"); | ||
65 | return -ENOMEM; | ||
66 | } | ||
67 | |||
68 | op = smm->uncommitted + smm->nr_uncommitted++; | ||
69 | op->type = type; | ||
70 | op->block = b; | ||
71 | |||
72 | return 0; | ||
73 | } | ||
74 | |||
75 | static int commit_bop(struct sm_metadata *smm, struct block_op *op) | ||
76 | { | ||
77 | int r = 0; | ||
78 | enum allocation_event ev; | ||
79 | |||
80 | switch (op->type) { | ||
81 | case BOP_INC: | ||
82 | r = sm_ll_inc(&smm->ll, op->block, &ev); | ||
83 | break; | ||
84 | |||
85 | case BOP_DEC: | ||
86 | r = sm_ll_dec(&smm->ll, op->block, &ev); | ||
87 | break; | ||
88 | } | ||
89 | |||
90 | return r; | ||
91 | } | ||
92 | |||
93 | static void in(struct sm_metadata *smm) | ||
94 | { | ||
95 | smm->recursion_count++; | ||
96 | } | ||
97 | |||
98 | static int out(struct sm_metadata *smm) | ||
99 | { | ||
100 | int r = 0; | ||
101 | |||
102 | /* | ||
103 | * If we're not recursing then very bad things are happening. | ||
104 | */ | ||
105 | if (!smm->recursion_count) { | ||
106 | DMERR("lost track of recursion depth"); | ||
107 | return -ENOMEM; | ||
108 | } | ||
109 | |||
110 | if (smm->recursion_count == 1 && smm->nr_uncommitted) { | ||
111 | while (smm->nr_uncommitted && !r) { | ||
112 | smm->nr_uncommitted--; | ||
113 | r = commit_bop(smm, smm->uncommitted + | ||
114 | smm->nr_uncommitted); | ||
115 | if (r) | ||
116 | break; | ||
117 | } | ||
118 | } | ||
119 | |||
120 | smm->recursion_count--; | ||
121 | |||
122 | return r; | ||
123 | } | ||
124 | |||
125 | /* | ||
126 | * When using the out() function above, we often want to combine an error | ||
127 | * code for the operation run in the recursive context with that from | ||
128 | * out(). | ||
129 | */ | ||
130 | static int combine_errors(int r1, int r2) | ||
131 | { | ||
132 | return r1 ? r1 : r2; | ||
133 | } | ||
134 | |||
135 | static int recursing(struct sm_metadata *smm) | ||
136 | { | ||
137 | return smm->recursion_count; | ||
138 | } | ||
139 | |||
140 | static void sm_metadata_destroy(struct dm_space_map *sm) | ||
141 | { | ||
142 | struct sm_metadata *smm = container_of(sm, struct sm_metadata, sm); | ||
143 | |||
144 | kfree(smm); | ||
145 | } | ||
146 | |||
147 | static int sm_metadata_extend(struct dm_space_map *sm, dm_block_t extra_blocks) | ||
148 | { | ||
149 | DMERR("doesn't support extend"); | ||
150 | return -EINVAL; | ||
151 | } | ||
152 | |||
153 | static int sm_metadata_get_nr_blocks(struct dm_space_map *sm, dm_block_t *count) | ||
154 | { | ||
155 | struct sm_metadata *smm = container_of(sm, struct sm_metadata, sm); | ||
156 | |||
157 | *count = smm->ll.nr_blocks; | ||
158 | |||
159 | return 0; | ||
160 | } | ||
161 | |||
162 | static int sm_metadata_get_nr_free(struct dm_space_map *sm, dm_block_t *count) | ||
163 | { | ||
164 | struct sm_metadata *smm = container_of(sm, struct sm_metadata, sm); | ||
165 | |||
166 | *count = smm->old_ll.nr_blocks - smm->old_ll.nr_allocated - | ||
167 | smm->allocated_this_transaction; | ||
168 | |||
169 | return 0; | ||
170 | } | ||
171 | |||
172 | static int sm_metadata_get_count(struct dm_space_map *sm, dm_block_t b, | ||
173 | uint32_t *result) | ||
174 | { | ||
175 | int r, i; | ||
176 | struct sm_metadata *smm = container_of(sm, struct sm_metadata, sm); | ||
177 | unsigned adjustment = 0; | ||
178 | |||
179 | /* | ||
180 | * We may have some uncommitted adjustments to add. This list | ||
181 | * should always be really short. | ||
182 | */ | ||
183 | for (i = 0; i < smm->nr_uncommitted; i++) { | ||
184 | struct block_op *op = smm->uncommitted + i; | ||
185 | |||
186 | if (op->block != b) | ||
187 | continue; | ||
188 | |||
189 | switch (op->type) { | ||
190 | case BOP_INC: | ||
191 | adjustment++; | ||
192 | break; | ||
193 | |||
194 | case BOP_DEC: | ||
195 | adjustment--; | ||
196 | break; | ||
197 | } | ||
198 | } | ||
199 | |||
200 | r = sm_ll_lookup(&smm->ll, b, result); | ||
201 | if (r) | ||
202 | return r; | ||
203 | |||
204 | *result += adjustment; | ||
205 | |||
206 | return 0; | ||
207 | } | ||
208 | |||
209 | static int sm_metadata_count_is_more_than_one(struct dm_space_map *sm, | ||
210 | dm_block_t b, int *result) | ||
211 | { | ||
212 | int r, i, adjustment = 0; | ||
213 | struct sm_metadata *smm = container_of(sm, struct sm_metadata, sm); | ||
214 | uint32_t rc; | ||
215 | |||
216 | /* | ||
217 | * We may have some uncommitted adjustments to add. This list | ||
218 | * should always be really short. | ||
219 | */ | ||
220 | for (i = 0; i < smm->nr_uncommitted; i++) { | ||
221 | struct block_op *op = smm->uncommitted + i; | ||
222 | |||
223 | if (op->block != b) | ||
224 | continue; | ||
225 | |||
226 | switch (op->type) { | ||
227 | case BOP_INC: | ||
228 | adjustment++; | ||
229 | break; | ||
230 | |||
231 | case BOP_DEC: | ||
232 | adjustment--; | ||
233 | break; | ||
234 | } | ||
235 | } | ||
236 | |||
237 | if (adjustment > 1) { | ||
238 | *result = 1; | ||
239 | return 0; | ||
240 | } | ||
241 | |||
242 | r = sm_ll_lookup_bitmap(&smm->ll, b, &rc); | ||
243 | if (r) | ||
244 | return r; | ||
245 | |||
246 | if (rc == 3) | ||
247 | /* | ||
248 | * We err on the side of caution, and always return true. | ||
249 | */ | ||
250 | *result = 1; | ||
251 | else | ||
252 | *result = rc + adjustment > 1; | ||
253 | |||
254 | return 0; | ||
255 | } | ||
256 | |||
257 | static int sm_metadata_set_count(struct dm_space_map *sm, dm_block_t b, | ||
258 | uint32_t count) | ||
259 | { | ||
260 | int r, r2; | ||
261 | enum allocation_event ev; | ||
262 | struct sm_metadata *smm = container_of(sm, struct sm_metadata, sm); | ||
263 | |||
264 | if (smm->recursion_count) { | ||
265 | DMERR("cannot recurse set_count()"); | ||
266 | return -EINVAL; | ||
267 | } | ||
268 | |||
269 | in(smm); | ||
270 | r = sm_ll_insert(&smm->ll, b, count, &ev); | ||
271 | r2 = out(smm); | ||
272 | |||
273 | return combine_errors(r, r2); | ||
274 | } | ||
275 | |||
276 | static int sm_metadata_inc_block(struct dm_space_map *sm, dm_block_t b) | ||
277 | { | ||
278 | int r, r2 = 0; | ||
279 | enum allocation_event ev; | ||
280 | struct sm_metadata *smm = container_of(sm, struct sm_metadata, sm); | ||
281 | |||
282 | if (recursing(smm)) | ||
283 | r = add_bop(smm, BOP_INC, b); | ||
284 | else { | ||
285 | in(smm); | ||
286 | r = sm_ll_inc(&smm->ll, b, &ev); | ||
287 | r2 = out(smm); | ||
288 | } | ||
289 | |||
290 | return combine_errors(r, r2); | ||
291 | } | ||
292 | |||
293 | static int sm_metadata_dec_block(struct dm_space_map *sm, dm_block_t b) | ||
294 | { | ||
295 | int r, r2 = 0; | ||
296 | enum allocation_event ev; | ||
297 | struct sm_metadata *smm = container_of(sm, struct sm_metadata, sm); | ||
298 | |||
299 | if (recursing(smm)) | ||
300 | r = add_bop(smm, BOP_DEC, b); | ||
301 | else { | ||
302 | in(smm); | ||
303 | r = sm_ll_dec(&smm->ll, b, &ev); | ||
304 | r2 = out(smm); | ||
305 | } | ||
306 | |||
307 | return combine_errors(r, r2); | ||
308 | } | ||
309 | |||
310 | static int sm_metadata_new_block_(struct dm_space_map *sm, dm_block_t *b) | ||
311 | { | ||
312 | int r, r2 = 0; | ||
313 | enum allocation_event ev; | ||
314 | struct sm_metadata *smm = container_of(sm, struct sm_metadata, sm); | ||
315 | |||
316 | r = sm_ll_find_free_block(&smm->old_ll, smm->begin, smm->old_ll.nr_blocks, b); | ||
317 | if (r) | ||
318 | return r; | ||
319 | |||
320 | smm->begin = *b + 1; | ||
321 | |||
322 | if (recursing(smm)) | ||
323 | r = add_bop(smm, BOP_INC, *b); | ||
324 | else { | ||
325 | in(smm); | ||
326 | r = sm_ll_inc(&smm->ll, *b, &ev); | ||
327 | r2 = out(smm); | ||
328 | } | ||
329 | |||
330 | if (!r) | ||
331 | smm->allocated_this_transaction++; | ||
332 | |||
333 | return combine_errors(r, r2); | ||
334 | } | ||
335 | |||
336 | static int sm_metadata_new_block(struct dm_space_map *sm, dm_block_t *b) | ||
337 | { | ||
338 | int r = sm_metadata_new_block_(sm, b); | ||
339 | if (r) | ||
340 | DMERR("out of metadata space"); | ||
341 | return r; | ||
342 | } | ||
343 | |||
344 | static int sm_metadata_commit(struct dm_space_map *sm) | ||
345 | { | ||
346 | int r; | ||
347 | struct sm_metadata *smm = container_of(sm, struct sm_metadata, sm); | ||
348 | |||
349 | r = sm_ll_commit(&smm->ll); | ||
350 | if (r) | ||
351 | return r; | ||
352 | |||
353 | memcpy(&smm->old_ll, &smm->ll, sizeof(smm->old_ll)); | ||
354 | smm->begin = 0; | ||
355 | smm->allocated_this_transaction = 0; | ||
356 | |||
357 | return 0; | ||
358 | } | ||
359 | |||
360 | static int sm_metadata_root_size(struct dm_space_map *sm, size_t *result) | ||
361 | { | ||
362 | *result = sizeof(struct disk_sm_root); | ||
363 | |||
364 | return 0; | ||
365 | } | ||
366 | |||
367 | static int sm_metadata_copy_root(struct dm_space_map *sm, void *where_le, size_t max) | ||
368 | { | ||
369 | struct sm_metadata *smm = container_of(sm, struct sm_metadata, sm); | ||
370 | struct disk_sm_root root_le; | ||
371 | |||
372 | root_le.nr_blocks = cpu_to_le64(smm->ll.nr_blocks); | ||
373 | root_le.nr_allocated = cpu_to_le64(smm->ll.nr_allocated); | ||
374 | root_le.bitmap_root = cpu_to_le64(smm->ll.bitmap_root); | ||
375 | root_le.ref_count_root = cpu_to_le64(smm->ll.ref_count_root); | ||
376 | |||
377 | if (max < sizeof(root_le)) | ||
378 | return -ENOSPC; | ||
379 | |||
380 | memcpy(where_le, &root_le, sizeof(root_le)); | ||
381 | |||
382 | return 0; | ||
383 | } | ||
384 | |||
385 | static struct dm_space_map ops = { | ||
386 | .destroy = sm_metadata_destroy, | ||
387 | .extend = sm_metadata_extend, | ||
388 | .get_nr_blocks = sm_metadata_get_nr_blocks, | ||
389 | .get_nr_free = sm_metadata_get_nr_free, | ||
390 | .get_count = sm_metadata_get_count, | ||
391 | .count_is_more_than_one = sm_metadata_count_is_more_than_one, | ||
392 | .set_count = sm_metadata_set_count, | ||
393 | .inc_block = sm_metadata_inc_block, | ||
394 | .dec_block = sm_metadata_dec_block, | ||
395 | .new_block = sm_metadata_new_block, | ||
396 | .commit = sm_metadata_commit, | ||
397 | .root_size = sm_metadata_root_size, | ||
398 | .copy_root = sm_metadata_copy_root | ||
399 | }; | ||
400 | |||
401 | /*----------------------------------------------------------------*/ | ||
402 | |||
403 | /* | ||
404 | * When a new space map is created that manages its own space. We use | ||
405 | * this tiny bootstrap allocator. | ||
406 | */ | ||
407 | static void sm_bootstrap_destroy(struct dm_space_map *sm) | ||
408 | { | ||
409 | } | ||
410 | |||
411 | static int sm_bootstrap_extend(struct dm_space_map *sm, dm_block_t extra_blocks) | ||
412 | { | ||
413 | DMERR("boostrap doesn't support extend"); | ||
414 | |||
415 | return -EINVAL; | ||
416 | } | ||
417 | |||
418 | static int sm_bootstrap_get_nr_blocks(struct dm_space_map *sm, dm_block_t *count) | ||
419 | { | ||
420 | struct sm_metadata *smm = container_of(sm, struct sm_metadata, sm); | ||
421 | |||
422 | return smm->ll.nr_blocks; | ||
423 | } | ||
424 | |||
425 | static int sm_bootstrap_get_nr_free(struct dm_space_map *sm, dm_block_t *count) | ||
426 | { | ||
427 | struct sm_metadata *smm = container_of(sm, struct sm_metadata, sm); | ||
428 | |||
429 | *count = smm->ll.nr_blocks - smm->begin; | ||
430 | |||
431 | return 0; | ||
432 | } | ||
433 | |||
434 | static int sm_bootstrap_get_count(struct dm_space_map *sm, dm_block_t b, | ||
435 | uint32_t *result) | ||
436 | { | ||
437 | struct sm_metadata *smm = container_of(sm, struct sm_metadata, sm); | ||
438 | |||
439 | return b < smm->begin ? 1 : 0; | ||
440 | } | ||
441 | |||
442 | static int sm_bootstrap_count_is_more_than_one(struct dm_space_map *sm, | ||
443 | dm_block_t b, int *result) | ||
444 | { | ||
445 | *result = 0; | ||
446 | |||
447 | return 0; | ||
448 | } | ||
449 | |||
450 | static int sm_bootstrap_set_count(struct dm_space_map *sm, dm_block_t b, | ||
451 | uint32_t count) | ||
452 | { | ||
453 | DMERR("boostrap doesn't support set_count"); | ||
454 | |||
455 | return -EINVAL; | ||
456 | } | ||
457 | |||
458 | static int sm_bootstrap_new_block(struct dm_space_map *sm, dm_block_t *b) | ||
459 | { | ||
460 | struct sm_metadata *smm = container_of(sm, struct sm_metadata, sm); | ||
461 | |||
462 | /* | ||
463 | * We know the entire device is unused. | ||
464 | */ | ||
465 | if (smm->begin == smm->ll.nr_blocks) | ||
466 | return -ENOSPC; | ||
467 | |||
468 | *b = smm->begin++; | ||
469 | |||
470 | return 0; | ||
471 | } | ||
472 | |||
473 | static int sm_bootstrap_inc_block(struct dm_space_map *sm, dm_block_t b) | ||
474 | { | ||
475 | struct sm_metadata *smm = container_of(sm, struct sm_metadata, sm); | ||
476 | |||
477 | return add_bop(smm, BOP_INC, b); | ||
478 | } | ||
479 | |||
480 | static int sm_bootstrap_dec_block(struct dm_space_map *sm, dm_block_t b) | ||
481 | { | ||
482 | struct sm_metadata *smm = container_of(sm, struct sm_metadata, sm); | ||
483 | |||
484 | return add_bop(smm, BOP_DEC, b); | ||
485 | } | ||
486 | |||
487 | static int sm_bootstrap_commit(struct dm_space_map *sm) | ||
488 | { | ||
489 | return 0; | ||
490 | } | ||
491 | |||
492 | static int sm_bootstrap_root_size(struct dm_space_map *sm, size_t *result) | ||
493 | { | ||
494 | DMERR("boostrap doesn't support root_size"); | ||
495 | |||
496 | return -EINVAL; | ||
497 | } | ||
498 | |||
499 | static int sm_bootstrap_copy_root(struct dm_space_map *sm, void *where, | ||
500 | size_t max) | ||
501 | { | ||
502 | DMERR("boostrap doesn't support copy_root"); | ||
503 | |||
504 | return -EINVAL; | ||
505 | } | ||
506 | |||
507 | static struct dm_space_map bootstrap_ops = { | ||
508 | .destroy = sm_bootstrap_destroy, | ||
509 | .extend = sm_bootstrap_extend, | ||
510 | .get_nr_blocks = sm_bootstrap_get_nr_blocks, | ||
511 | .get_nr_free = sm_bootstrap_get_nr_free, | ||
512 | .get_count = sm_bootstrap_get_count, | ||
513 | .count_is_more_than_one = sm_bootstrap_count_is_more_than_one, | ||
514 | .set_count = sm_bootstrap_set_count, | ||
515 | .inc_block = sm_bootstrap_inc_block, | ||
516 | .dec_block = sm_bootstrap_dec_block, | ||
517 | .new_block = sm_bootstrap_new_block, | ||
518 | .commit = sm_bootstrap_commit, | ||
519 | .root_size = sm_bootstrap_root_size, | ||
520 | .copy_root = sm_bootstrap_copy_root | ||
521 | }; | ||
522 | |||
523 | /*----------------------------------------------------------------*/ | ||
524 | |||
525 | struct dm_space_map *dm_sm_metadata_init(void) | ||
526 | { | ||
527 | struct sm_metadata *smm; | ||
528 | |||
529 | smm = kmalloc(sizeof(*smm), GFP_KERNEL); | ||
530 | if (!smm) | ||
531 | return ERR_PTR(-ENOMEM); | ||
532 | |||
533 | memcpy(&smm->sm, &ops, sizeof(smm->sm)); | ||
534 | |||
535 | return &smm->sm; | ||
536 | } | ||
537 | |||
538 | int dm_sm_metadata_create(struct dm_space_map *sm, | ||
539 | struct dm_transaction_manager *tm, | ||
540 | dm_block_t nr_blocks, | ||
541 | dm_block_t superblock) | ||
542 | { | ||
543 | int r; | ||
544 | dm_block_t i; | ||
545 | enum allocation_event ev; | ||
546 | struct sm_metadata *smm = container_of(sm, struct sm_metadata, sm); | ||
547 | |||
548 | smm->begin = superblock + 1; | ||
549 | smm->recursion_count = 0; | ||
550 | smm->allocated_this_transaction = 0; | ||
551 | smm->nr_uncommitted = 0; | ||
552 | |||
553 | memcpy(&smm->sm, &bootstrap_ops, sizeof(smm->sm)); | ||
554 | |||
555 | r = sm_ll_new_metadata(&smm->ll, tm); | ||
556 | if (r) | ||
557 | return r; | ||
558 | |||
559 | r = sm_ll_extend(&smm->ll, nr_blocks); | ||
560 | if (r) | ||
561 | return r; | ||
562 | |||
563 | memcpy(&smm->sm, &ops, sizeof(smm->sm)); | ||
564 | |||
565 | /* | ||
566 | * Now we need to update the newly created data structures with the | ||
567 | * allocated blocks that they were built from. | ||
568 | */ | ||
569 | for (i = superblock; !r && i < smm->begin; i++) | ||
570 | r = sm_ll_inc(&smm->ll, i, &ev); | ||
571 | |||
572 | if (r) | ||
573 | return r; | ||
574 | |||
575 | return sm_metadata_commit(sm); | ||
576 | } | ||
577 | |||
578 | int dm_sm_metadata_open(struct dm_space_map *sm, | ||
579 | struct dm_transaction_manager *tm, | ||
580 | void *root_le, size_t len) | ||
581 | { | ||
582 | int r; | ||
583 | struct sm_metadata *smm = container_of(sm, struct sm_metadata, sm); | ||
584 | |||
585 | r = sm_ll_open_metadata(&smm->ll, tm, root_le, len); | ||
586 | if (r) | ||
587 | return r; | ||
588 | |||
589 | smm->begin = 0; | ||
590 | smm->recursion_count = 0; | ||
591 | smm->allocated_this_transaction = 0; | ||
592 | smm->nr_uncommitted = 0; | ||
593 | |||
594 | memcpy(&smm->old_ll, &smm->ll, sizeof(smm->old_ll)); | ||
595 | return 0; | ||
596 | } | ||
diff --git a/drivers/md/persistent-data/dm-space-map-metadata.h b/drivers/md/persistent-data/dm-space-map-metadata.h new file mode 100644 index 000000000000..39bba0801cf2 --- /dev/null +++ b/drivers/md/persistent-data/dm-space-map-metadata.h | |||
@@ -0,0 +1,33 @@ | |||
1 | /* | ||
2 | * Copyright (C) 2011 Red Hat, Inc. | ||
3 | * | ||
4 | * This file is released under the GPL. | ||
5 | */ | ||
6 | |||
7 | #ifndef DM_SPACE_MAP_METADATA_H | ||
8 | #define DM_SPACE_MAP_METADATA_H | ||
9 | |||
10 | #include "dm-transaction-manager.h" | ||
11 | |||
12 | /* | ||
13 | * Unfortunately we have to use two-phase construction due to the cycle | ||
14 | * between the tm and sm. | ||
15 | */ | ||
16 | struct dm_space_map *dm_sm_metadata_init(void); | ||
17 | |||
18 | /* | ||
19 | * Create a fresh space map. | ||
20 | */ | ||
21 | int dm_sm_metadata_create(struct dm_space_map *sm, | ||
22 | struct dm_transaction_manager *tm, | ||
23 | dm_block_t nr_blocks, | ||
24 | dm_block_t superblock); | ||
25 | |||
26 | /* | ||
27 | * Open from a previously-recorded root. | ||
28 | */ | ||
29 | int dm_sm_metadata_open(struct dm_space_map *sm, | ||
30 | struct dm_transaction_manager *tm, | ||
31 | void *root_le, size_t len); | ||
32 | |||
33 | #endif /* DM_SPACE_MAP_METADATA_H */ | ||
diff --git a/drivers/md/persistent-data/dm-space-map.h b/drivers/md/persistent-data/dm-space-map.h new file mode 100644 index 000000000000..1cbfc6b1638a --- /dev/null +++ b/drivers/md/persistent-data/dm-space-map.h | |||
@@ -0,0 +1,134 @@ | |||
1 | /* | ||
2 | * Copyright (C) 2011 Red Hat, Inc. | ||
3 | * | ||
4 | * This file is released under the GPL. | ||
5 | */ | ||
6 | |||
7 | #ifndef _LINUX_DM_SPACE_MAP_H | ||
8 | #define _LINUX_DM_SPACE_MAP_H | ||
9 | |||
10 | #include "dm-block-manager.h" | ||
11 | |||
12 | /* | ||
13 | * struct dm_space_map keeps a record of how many times each block in a device | ||
14 | * is referenced. It needs to be fixed on disk as part of the transaction. | ||
15 | */ | ||
16 | struct dm_space_map { | ||
17 | void (*destroy)(struct dm_space_map *sm); | ||
18 | |||
19 | /* | ||
20 | * You must commit before allocating the newly added space. | ||
21 | */ | ||
22 | int (*extend)(struct dm_space_map *sm, dm_block_t extra_blocks); | ||
23 | |||
24 | /* | ||
25 | * Extensions do not appear in this count until after commit has | ||
26 | * been called. | ||
27 | */ | ||
28 | int (*get_nr_blocks)(struct dm_space_map *sm, dm_block_t *count); | ||
29 | |||
30 | /* | ||
31 | * Space maps must never allocate a block from the previous | ||
32 | * transaction, in case we need to rollback. This complicates the | ||
33 | * semantics of get_nr_free(), it should return the number of blocks | ||
34 | * that are available for allocation _now_. For instance you may | ||
35 | * have blocks with a zero reference count that will not be | ||
36 | * available for allocation until after the next commit. | ||
37 | */ | ||
38 | int (*get_nr_free)(struct dm_space_map *sm, dm_block_t *count); | ||
39 | |||
40 | int (*get_count)(struct dm_space_map *sm, dm_block_t b, uint32_t *result); | ||
41 | int (*count_is_more_than_one)(struct dm_space_map *sm, dm_block_t b, | ||
42 | int *result); | ||
43 | int (*set_count)(struct dm_space_map *sm, dm_block_t b, uint32_t count); | ||
44 | |||
45 | int (*commit)(struct dm_space_map *sm); | ||
46 | |||
47 | int (*inc_block)(struct dm_space_map *sm, dm_block_t b); | ||
48 | int (*dec_block)(struct dm_space_map *sm, dm_block_t b); | ||
49 | |||
50 | /* | ||
51 | * new_block will increment the returned block. | ||
52 | */ | ||
53 | int (*new_block)(struct dm_space_map *sm, dm_block_t *b); | ||
54 | |||
55 | /* | ||
56 | * The root contains all the information needed to fix the space map. | ||
57 | * Generally this info is small, so squirrel it away in a disk block | ||
58 | * along with other info. | ||
59 | */ | ||
60 | int (*root_size)(struct dm_space_map *sm, size_t *result); | ||
61 | int (*copy_root)(struct dm_space_map *sm, void *copy_to_here_le, size_t len); | ||
62 | }; | ||
63 | |||
64 | /*----------------------------------------------------------------*/ | ||
65 | |||
66 | static inline void dm_sm_destroy(struct dm_space_map *sm) | ||
67 | { | ||
68 | sm->destroy(sm); | ||
69 | } | ||
70 | |||
71 | static inline int dm_sm_extend(struct dm_space_map *sm, dm_block_t extra_blocks) | ||
72 | { | ||
73 | return sm->extend(sm, extra_blocks); | ||
74 | } | ||
75 | |||
76 | static inline int dm_sm_get_nr_blocks(struct dm_space_map *sm, dm_block_t *count) | ||
77 | { | ||
78 | return sm->get_nr_blocks(sm, count); | ||
79 | } | ||
80 | |||
81 | static inline int dm_sm_get_nr_free(struct dm_space_map *sm, dm_block_t *count) | ||
82 | { | ||
83 | return sm->get_nr_free(sm, count); | ||
84 | } | ||
85 | |||
86 | static inline int dm_sm_get_count(struct dm_space_map *sm, dm_block_t b, | ||
87 | uint32_t *result) | ||
88 | { | ||
89 | return sm->get_count(sm, b, result); | ||
90 | } | ||
91 | |||
92 | static inline int dm_sm_count_is_more_than_one(struct dm_space_map *sm, | ||
93 | dm_block_t b, int *result) | ||
94 | { | ||
95 | return sm->count_is_more_than_one(sm, b, result); | ||
96 | } | ||
97 | |||
98 | static inline int dm_sm_set_count(struct dm_space_map *sm, dm_block_t b, | ||
99 | uint32_t count) | ||
100 | { | ||
101 | return sm->set_count(sm, b, count); | ||
102 | } | ||
103 | |||
104 | static inline int dm_sm_commit(struct dm_space_map *sm) | ||
105 | { | ||
106 | return sm->commit(sm); | ||
107 | } | ||
108 | |||
109 | static inline int dm_sm_inc_block(struct dm_space_map *sm, dm_block_t b) | ||
110 | { | ||
111 | return sm->inc_block(sm, b); | ||
112 | } | ||
113 | |||
114 | static inline int dm_sm_dec_block(struct dm_space_map *sm, dm_block_t b) | ||
115 | { | ||
116 | return sm->dec_block(sm, b); | ||
117 | } | ||
118 | |||
119 | static inline int dm_sm_new_block(struct dm_space_map *sm, dm_block_t *b) | ||
120 | { | ||
121 | return sm->new_block(sm, b); | ||
122 | } | ||
123 | |||
124 | static inline int dm_sm_root_size(struct dm_space_map *sm, size_t *result) | ||
125 | { | ||
126 | return sm->root_size(sm, result); | ||
127 | } | ||
128 | |||
129 | static inline int dm_sm_copy_root(struct dm_space_map *sm, void *copy_to_here_le, size_t len) | ||
130 | { | ||
131 | return sm->copy_root(sm, copy_to_here_le, len); | ||
132 | } | ||
133 | |||
134 | #endif /* _LINUX_DM_SPACE_MAP_H */ | ||
diff --git a/drivers/md/persistent-data/dm-transaction-manager.c b/drivers/md/persistent-data/dm-transaction-manager.c new file mode 100644 index 000000000000..728e89a3f978 --- /dev/null +++ b/drivers/md/persistent-data/dm-transaction-manager.c | |||
@@ -0,0 +1,400 @@ | |||
1 | /* | ||
2 | * Copyright (C) 2011 Red Hat, Inc. | ||
3 | * | ||
4 | * This file is released under the GPL. | ||
5 | */ | ||
6 | #include "dm-transaction-manager.h" | ||
7 | #include "dm-space-map.h" | ||
8 | #include "dm-space-map-checker.h" | ||
9 | #include "dm-space-map-disk.h" | ||
10 | #include "dm-space-map-metadata.h" | ||
11 | #include "dm-persistent-data-internal.h" | ||
12 | |||
13 | #include <linux/module.h> | ||
14 | #include <linux/slab.h> | ||
15 | #include <linux/device-mapper.h> | ||
16 | |||
17 | #define DM_MSG_PREFIX "transaction manager" | ||
18 | |||
19 | /*----------------------------------------------------------------*/ | ||
20 | |||
21 | struct shadow_info { | ||
22 | struct hlist_node hlist; | ||
23 | dm_block_t where; | ||
24 | }; | ||
25 | |||
26 | /* | ||
27 | * It would be nice if we scaled with the size of transaction. | ||
28 | */ | ||
29 | #define HASH_SIZE 256 | ||
30 | #define HASH_MASK (HASH_SIZE - 1) | ||
31 | |||
32 | struct dm_transaction_manager { | ||
33 | int is_clone; | ||
34 | struct dm_transaction_manager *real; | ||
35 | |||
36 | struct dm_block_manager *bm; | ||
37 | struct dm_space_map *sm; | ||
38 | |||
39 | spinlock_t lock; | ||
40 | struct hlist_head buckets[HASH_SIZE]; | ||
41 | }; | ||
42 | |||
43 | /*----------------------------------------------------------------*/ | ||
44 | |||
45 | static int is_shadow(struct dm_transaction_manager *tm, dm_block_t b) | ||
46 | { | ||
47 | int r = 0; | ||
48 | unsigned bucket = dm_hash_block(b, HASH_MASK); | ||
49 | struct shadow_info *si; | ||
50 | struct hlist_node *n; | ||
51 | |||
52 | spin_lock(&tm->lock); | ||
53 | hlist_for_each_entry(si, n, tm->buckets + bucket, hlist) | ||
54 | if (si->where == b) { | ||
55 | r = 1; | ||
56 | break; | ||
57 | } | ||
58 | spin_unlock(&tm->lock); | ||
59 | |||
60 | return r; | ||
61 | } | ||
62 | |||
63 | /* | ||
64 | * This can silently fail if there's no memory. We're ok with this since | ||
65 | * creating redundant shadows causes no harm. | ||
66 | */ | ||
67 | static void insert_shadow(struct dm_transaction_manager *tm, dm_block_t b) | ||
68 | { | ||
69 | unsigned bucket; | ||
70 | struct shadow_info *si; | ||
71 | |||
72 | si = kmalloc(sizeof(*si), GFP_NOIO); | ||
73 | if (si) { | ||
74 | si->where = b; | ||
75 | bucket = dm_hash_block(b, HASH_MASK); | ||
76 | spin_lock(&tm->lock); | ||
77 | hlist_add_head(&si->hlist, tm->buckets + bucket); | ||
78 | spin_unlock(&tm->lock); | ||
79 | } | ||
80 | } | ||
81 | |||
82 | static void wipe_shadow_table(struct dm_transaction_manager *tm) | ||
83 | { | ||
84 | struct shadow_info *si; | ||
85 | struct hlist_node *n, *tmp; | ||
86 | struct hlist_head *bucket; | ||
87 | int i; | ||
88 | |||
89 | spin_lock(&tm->lock); | ||
90 | for (i = 0; i < HASH_SIZE; i++) { | ||
91 | bucket = tm->buckets + i; | ||
92 | hlist_for_each_entry_safe(si, n, tmp, bucket, hlist) | ||
93 | kfree(si); | ||
94 | |||
95 | INIT_HLIST_HEAD(bucket); | ||
96 | } | ||
97 | |||
98 | spin_unlock(&tm->lock); | ||
99 | } | ||
100 | |||
101 | /*----------------------------------------------------------------*/ | ||
102 | |||
103 | static struct dm_transaction_manager *dm_tm_create(struct dm_block_manager *bm, | ||
104 | struct dm_space_map *sm) | ||
105 | { | ||
106 | int i; | ||
107 | struct dm_transaction_manager *tm; | ||
108 | |||
109 | tm = kmalloc(sizeof(*tm), GFP_KERNEL); | ||
110 | if (!tm) | ||
111 | return ERR_PTR(-ENOMEM); | ||
112 | |||
113 | tm->is_clone = 0; | ||
114 | tm->real = NULL; | ||
115 | tm->bm = bm; | ||
116 | tm->sm = sm; | ||
117 | |||
118 | spin_lock_init(&tm->lock); | ||
119 | for (i = 0; i < HASH_SIZE; i++) | ||
120 | INIT_HLIST_HEAD(tm->buckets + i); | ||
121 | |||
122 | return tm; | ||
123 | } | ||
124 | |||
125 | struct dm_transaction_manager *dm_tm_create_non_blocking_clone(struct dm_transaction_manager *real) | ||
126 | { | ||
127 | struct dm_transaction_manager *tm; | ||
128 | |||
129 | tm = kmalloc(sizeof(*tm), GFP_KERNEL); | ||
130 | if (tm) { | ||
131 | tm->is_clone = 1; | ||
132 | tm->real = real; | ||
133 | } | ||
134 | |||
135 | return tm; | ||
136 | } | ||
137 | EXPORT_SYMBOL_GPL(dm_tm_create_non_blocking_clone); | ||
138 | |||
139 | void dm_tm_destroy(struct dm_transaction_manager *tm) | ||
140 | { | ||
141 | kfree(tm); | ||
142 | } | ||
143 | EXPORT_SYMBOL_GPL(dm_tm_destroy); | ||
144 | |||
145 | int dm_tm_pre_commit(struct dm_transaction_manager *tm) | ||
146 | { | ||
147 | int r; | ||
148 | |||
149 | if (tm->is_clone) | ||
150 | return -EWOULDBLOCK; | ||
151 | |||
152 | r = dm_sm_commit(tm->sm); | ||
153 | if (r < 0) | ||
154 | return r; | ||
155 | |||
156 | return 0; | ||
157 | } | ||
158 | EXPORT_SYMBOL_GPL(dm_tm_pre_commit); | ||
159 | |||
160 | int dm_tm_commit(struct dm_transaction_manager *tm, struct dm_block *root) | ||
161 | { | ||
162 | if (tm->is_clone) | ||
163 | return -EWOULDBLOCK; | ||
164 | |||
165 | wipe_shadow_table(tm); | ||
166 | |||
167 | return dm_bm_flush_and_unlock(tm->bm, root); | ||
168 | } | ||
169 | EXPORT_SYMBOL_GPL(dm_tm_commit); | ||
170 | |||
171 | int dm_tm_new_block(struct dm_transaction_manager *tm, | ||
172 | struct dm_block_validator *v, | ||
173 | struct dm_block **result) | ||
174 | { | ||
175 | int r; | ||
176 | dm_block_t new_block; | ||
177 | |||
178 | if (tm->is_clone) | ||
179 | return -EWOULDBLOCK; | ||
180 | |||
181 | r = dm_sm_new_block(tm->sm, &new_block); | ||
182 | if (r < 0) | ||
183 | return r; | ||
184 | |||
185 | r = dm_bm_write_lock_zero(tm->bm, new_block, v, result); | ||
186 | if (r < 0) { | ||
187 | dm_sm_dec_block(tm->sm, new_block); | ||
188 | return r; | ||
189 | } | ||
190 | |||
191 | /* | ||
192 | * New blocks count as shadows in that they don't need to be | ||
193 | * shadowed again. | ||
194 | */ | ||
195 | insert_shadow(tm, new_block); | ||
196 | |||
197 | return 0; | ||
198 | } | ||
199 | |||
200 | static int __shadow_block(struct dm_transaction_manager *tm, dm_block_t orig, | ||
201 | struct dm_block_validator *v, | ||
202 | struct dm_block **result) | ||
203 | { | ||
204 | int r; | ||
205 | dm_block_t new; | ||
206 | struct dm_block *orig_block; | ||
207 | |||
208 | r = dm_sm_new_block(tm->sm, &new); | ||
209 | if (r < 0) | ||
210 | return r; | ||
211 | |||
212 | r = dm_sm_dec_block(tm->sm, orig); | ||
213 | if (r < 0) | ||
214 | return r; | ||
215 | |||
216 | r = dm_bm_read_lock(tm->bm, orig, v, &orig_block); | ||
217 | if (r < 0) | ||
218 | return r; | ||
219 | |||
220 | r = dm_bm_unlock_move(orig_block, new); | ||
221 | if (r < 0) { | ||
222 | dm_bm_unlock(orig_block); | ||
223 | return r; | ||
224 | } | ||
225 | |||
226 | return dm_bm_write_lock(tm->bm, new, v, result); | ||
227 | } | ||
228 | |||
229 | int dm_tm_shadow_block(struct dm_transaction_manager *tm, dm_block_t orig, | ||
230 | struct dm_block_validator *v, struct dm_block **result, | ||
231 | int *inc_children) | ||
232 | { | ||
233 | int r; | ||
234 | |||
235 | if (tm->is_clone) | ||
236 | return -EWOULDBLOCK; | ||
237 | |||
238 | r = dm_sm_count_is_more_than_one(tm->sm, orig, inc_children); | ||
239 | if (r < 0) | ||
240 | return r; | ||
241 | |||
242 | if (is_shadow(tm, orig) && !*inc_children) | ||
243 | return dm_bm_write_lock(tm->bm, orig, v, result); | ||
244 | |||
245 | r = __shadow_block(tm, orig, v, result); | ||
246 | if (r < 0) | ||
247 | return r; | ||
248 | insert_shadow(tm, dm_block_location(*result)); | ||
249 | |||
250 | return r; | ||
251 | } | ||
252 | |||
253 | int dm_tm_read_lock(struct dm_transaction_manager *tm, dm_block_t b, | ||
254 | struct dm_block_validator *v, | ||
255 | struct dm_block **blk) | ||
256 | { | ||
257 | if (tm->is_clone) | ||
258 | return dm_bm_read_try_lock(tm->real->bm, b, v, blk); | ||
259 | |||
260 | return dm_bm_read_lock(tm->bm, b, v, blk); | ||
261 | } | ||
262 | |||
263 | int dm_tm_unlock(struct dm_transaction_manager *tm, struct dm_block *b) | ||
264 | { | ||
265 | return dm_bm_unlock(b); | ||
266 | } | ||
267 | EXPORT_SYMBOL_GPL(dm_tm_unlock); | ||
268 | |||
269 | void dm_tm_inc(struct dm_transaction_manager *tm, dm_block_t b) | ||
270 | { | ||
271 | /* | ||
272 | * The non-blocking clone doesn't support this. | ||
273 | */ | ||
274 | BUG_ON(tm->is_clone); | ||
275 | |||
276 | dm_sm_inc_block(tm->sm, b); | ||
277 | } | ||
278 | EXPORT_SYMBOL_GPL(dm_tm_inc); | ||
279 | |||
280 | void dm_tm_dec(struct dm_transaction_manager *tm, dm_block_t b) | ||
281 | { | ||
282 | /* | ||
283 | * The non-blocking clone doesn't support this. | ||
284 | */ | ||
285 | BUG_ON(tm->is_clone); | ||
286 | |||
287 | dm_sm_dec_block(tm->sm, b); | ||
288 | } | ||
289 | EXPORT_SYMBOL_GPL(dm_tm_dec); | ||
290 | |||
291 | int dm_tm_ref(struct dm_transaction_manager *tm, dm_block_t b, | ||
292 | uint32_t *result) | ||
293 | { | ||
294 | if (tm->is_clone) | ||
295 | return -EWOULDBLOCK; | ||
296 | |||
297 | return dm_sm_get_count(tm->sm, b, result); | ||
298 | } | ||
299 | |||
300 | struct dm_block_manager *dm_tm_get_bm(struct dm_transaction_manager *tm) | ||
301 | { | ||
302 | return tm->bm; | ||
303 | } | ||
304 | |||
305 | /*----------------------------------------------------------------*/ | ||
306 | |||
307 | static int dm_tm_create_internal(struct dm_block_manager *bm, | ||
308 | dm_block_t sb_location, | ||
309 | struct dm_block_validator *sb_validator, | ||
310 | size_t root_offset, size_t root_max_len, | ||
311 | struct dm_transaction_manager **tm, | ||
312 | struct dm_space_map **sm, | ||
313 | struct dm_block **sblock, | ||
314 | int create) | ||
315 | { | ||
316 | int r; | ||
317 | struct dm_space_map *inner; | ||
318 | |||
319 | inner = dm_sm_metadata_init(); | ||
320 | if (IS_ERR(inner)) | ||
321 | return PTR_ERR(inner); | ||
322 | |||
323 | *tm = dm_tm_create(bm, inner); | ||
324 | if (IS_ERR(*tm)) { | ||
325 | dm_sm_destroy(inner); | ||
326 | return PTR_ERR(*tm); | ||
327 | } | ||
328 | |||
329 | if (create) { | ||
330 | r = dm_bm_write_lock_zero(dm_tm_get_bm(*tm), sb_location, | ||
331 | sb_validator, sblock); | ||
332 | if (r < 0) { | ||
333 | DMERR("couldn't lock superblock"); | ||
334 | goto bad1; | ||
335 | } | ||
336 | |||
337 | r = dm_sm_metadata_create(inner, *tm, dm_bm_nr_blocks(bm), | ||
338 | sb_location); | ||
339 | if (r) { | ||
340 | DMERR("couldn't create metadata space map"); | ||
341 | goto bad2; | ||
342 | } | ||
343 | |||
344 | *sm = dm_sm_checker_create(inner); | ||
345 | if (!*sm) | ||
346 | goto bad2; | ||
347 | |||
348 | } else { | ||
349 | r = dm_bm_write_lock(dm_tm_get_bm(*tm), sb_location, | ||
350 | sb_validator, sblock); | ||
351 | if (r < 0) { | ||
352 | DMERR("couldn't lock superblock"); | ||
353 | goto bad1; | ||
354 | } | ||
355 | |||
356 | r = dm_sm_metadata_open(inner, *tm, | ||
357 | dm_block_data(*sblock) + root_offset, | ||
358 | root_max_len); | ||
359 | if (r) { | ||
360 | DMERR("couldn't open metadata space map"); | ||
361 | goto bad2; | ||
362 | } | ||
363 | |||
364 | *sm = dm_sm_checker_create(inner); | ||
365 | if (!*sm) | ||
366 | goto bad2; | ||
367 | } | ||
368 | |||
369 | return 0; | ||
370 | |||
371 | bad2: | ||
372 | dm_tm_unlock(*tm, *sblock); | ||
373 | bad1: | ||
374 | dm_tm_destroy(*tm); | ||
375 | dm_sm_destroy(inner); | ||
376 | return r; | ||
377 | } | ||
378 | |||
379 | int dm_tm_create_with_sm(struct dm_block_manager *bm, dm_block_t sb_location, | ||
380 | struct dm_block_validator *sb_validator, | ||
381 | struct dm_transaction_manager **tm, | ||
382 | struct dm_space_map **sm, struct dm_block **sblock) | ||
383 | { | ||
384 | return dm_tm_create_internal(bm, sb_location, sb_validator, | ||
385 | 0, 0, tm, sm, sblock, 1); | ||
386 | } | ||
387 | EXPORT_SYMBOL_GPL(dm_tm_create_with_sm); | ||
388 | |||
389 | int dm_tm_open_with_sm(struct dm_block_manager *bm, dm_block_t sb_location, | ||
390 | struct dm_block_validator *sb_validator, | ||
391 | size_t root_offset, size_t root_max_len, | ||
392 | struct dm_transaction_manager **tm, | ||
393 | struct dm_space_map **sm, struct dm_block **sblock) | ||
394 | { | ||
395 | return dm_tm_create_internal(bm, sb_location, sb_validator, root_offset, | ||
396 | root_max_len, tm, sm, sblock, 0); | ||
397 | } | ||
398 | EXPORT_SYMBOL_GPL(dm_tm_open_with_sm); | ||
399 | |||
400 | /*----------------------------------------------------------------*/ | ||
diff --git a/drivers/md/persistent-data/dm-transaction-manager.h b/drivers/md/persistent-data/dm-transaction-manager.h new file mode 100644 index 000000000000..6da784871db4 --- /dev/null +++ b/drivers/md/persistent-data/dm-transaction-manager.h | |||
@@ -0,0 +1,130 @@ | |||
1 | /* | ||
2 | * Copyright (C) 2011 Red Hat, Inc. | ||
3 | * | ||
4 | * This file is released under the GPL. | ||
5 | */ | ||
6 | |||
7 | #ifndef _LINUX_DM_TRANSACTION_MANAGER_H | ||
8 | #define _LINUX_DM_TRANSACTION_MANAGER_H | ||
9 | |||
10 | #include "dm-block-manager.h" | ||
11 | |||
12 | struct dm_transaction_manager; | ||
13 | struct dm_space_map; | ||
14 | |||
15 | /*----------------------------------------------------------------*/ | ||
16 | |||
17 | /* | ||
18 | * This manages the scope of a transaction. It also enforces immutability | ||
19 | * of the on-disk data structures by limiting access to writeable blocks. | ||
20 | * | ||
21 | * Clients should not fiddle with the block manager directly. | ||
22 | */ | ||
23 | |||
24 | void dm_tm_destroy(struct dm_transaction_manager *tm); | ||
25 | |||
26 | /* | ||
27 | * The non-blocking version of a transaction manager is intended for use in | ||
28 | * fast path code that needs to do lookups e.g. a dm mapping function. | ||
29 | * You create the non-blocking variant from a normal tm. The interface is | ||
30 | * the same, except that most functions will just return -EWOULDBLOCK. | ||
31 | * Methods that return void yet may block should not be called on a clone | ||
32 | * viz. dm_tm_inc, dm_tm_dec. Call dm_tm_destroy() as you would with a normal | ||
33 | * tm when you've finished with it. You may not destroy the original prior | ||
34 | * to clones. | ||
35 | */ | ||
36 | struct dm_transaction_manager *dm_tm_create_non_blocking_clone(struct dm_transaction_manager *real); | ||
37 | |||
38 | /* | ||
39 | * We use a 2-phase commit here. | ||
40 | * | ||
41 | * i) In the first phase the block manager is told to start flushing, and | ||
42 | * the changes to the space map are written to disk. You should interrogate | ||
43 | * your particular space map to get detail of its root node etc. to be | ||
44 | * included in your superblock. | ||
45 | * | ||
46 | * ii) @root will be committed last. You shouldn't use more than the | ||
47 | * first 512 bytes of @root if you wish the transaction to survive a power | ||
48 | * failure. You *must* have a write lock held on @root for both stage (i) | ||
49 | * and (ii). The commit will drop the write lock. | ||
50 | */ | ||
51 | int dm_tm_pre_commit(struct dm_transaction_manager *tm); | ||
52 | int dm_tm_commit(struct dm_transaction_manager *tm, struct dm_block *root); | ||
53 | |||
54 | /* | ||
55 | * These methods are the only way to get hold of a writeable block. | ||
56 | */ | ||
57 | |||
58 | /* | ||
59 | * dm_tm_new_block() is pretty self-explanatory. Make sure you do actually | ||
60 | * write to the whole of @data before you unlock, otherwise you could get | ||
61 | * a data leak. (The other option is for tm_new_block() to zero new blocks | ||
62 | * before handing them out, which will be redundant in most, if not all, | ||
63 | * cases). | ||
64 | * Zeroes the new block and returns with write lock held. | ||
65 | */ | ||
66 | int dm_tm_new_block(struct dm_transaction_manager *tm, | ||
67 | struct dm_block_validator *v, | ||
68 | struct dm_block **result); | ||
69 | |||
70 | /* | ||
71 | * dm_tm_shadow_block() allocates a new block and copies the data from @orig | ||
72 | * to it. It then decrements the reference count on original block. Use | ||
73 | * this to update the contents of a block in a data structure, don't | ||
74 | * confuse this with a clone - you shouldn't access the orig block after | ||
75 | * this operation. Because the tm knows the scope of the transaction it | ||
76 | * can optimise requests for a shadow of a shadow to a no-op. Don't forget | ||
77 | * to unlock when you've finished with the shadow. | ||
78 | * | ||
79 | * The @inc_children flag is used to tell the caller whether it needs to | ||
80 | * adjust reference counts for children. (Data in the block may refer to | ||
81 | * other blocks.) | ||
82 | * | ||
83 | * Shadowing implicitly drops a reference on @orig so you must not have | ||
84 | * it locked when you call this. | ||
85 | */ | ||
86 | int dm_tm_shadow_block(struct dm_transaction_manager *tm, dm_block_t orig, | ||
87 | struct dm_block_validator *v, | ||
88 | struct dm_block **result, int *inc_children); | ||
89 | |||
90 | /* | ||
91 | * Read access. You can lock any block you want. If there's a write lock | ||
92 | * on it outstanding then it'll block. | ||
93 | */ | ||
94 | int dm_tm_read_lock(struct dm_transaction_manager *tm, dm_block_t b, | ||
95 | struct dm_block_validator *v, | ||
96 | struct dm_block **result); | ||
97 | |||
98 | int dm_tm_unlock(struct dm_transaction_manager *tm, struct dm_block *b); | ||
99 | |||
100 | /* | ||
101 | * Functions for altering the reference count of a block directly. | ||
102 | */ | ||
103 | void dm_tm_inc(struct dm_transaction_manager *tm, dm_block_t b); | ||
104 | |||
105 | void dm_tm_dec(struct dm_transaction_manager *tm, dm_block_t b); | ||
106 | |||
107 | int dm_tm_ref(struct dm_transaction_manager *tm, dm_block_t b, | ||
108 | uint32_t *result); | ||
109 | |||
110 | struct dm_block_manager *dm_tm_get_bm(struct dm_transaction_manager *tm); | ||
111 | |||
112 | /* | ||
113 | * A little utility that ties the knot by producing a transaction manager | ||
114 | * that has a space map managed by the transaction manager... | ||
115 | * | ||
116 | * Returns a tm that has an open transaction to write the new disk sm. | ||
117 | * Caller should store the new sm root and commit. | ||
118 | */ | ||
119 | int dm_tm_create_with_sm(struct dm_block_manager *bm, dm_block_t sb_location, | ||
120 | struct dm_block_validator *sb_validator, | ||
121 | struct dm_transaction_manager **tm, | ||
122 | struct dm_space_map **sm, struct dm_block **sblock); | ||
123 | |||
124 | int dm_tm_open_with_sm(struct dm_block_manager *bm, dm_block_t sb_location, | ||
125 | struct dm_block_validator *sb_validator, | ||
126 | size_t root_offset, size_t root_max_len, | ||
127 | struct dm_transaction_manager **tm, | ||
128 | struct dm_space_map **sm, struct dm_block **sblock); | ||
129 | |||
130 | #endif /* _LINUX_DM_TRANSACTION_MANAGER_H */ | ||