author Artem B. Bityutskiy <dedekind@linutronix.de> 2006-06-27 04:22:22 -0400
committer Frank Haverkamp <haver@vnet.ibm.com> 2007-04-27 07:23:33 -0400
commit 801c135ce73d5df1caf3eca35b66a10824ae0707
tree eaf6e7859650557192533b70746479de686c56e1 /drivers/mtd/ubi/wl.c
parent de46c33745f5e2ad594c72f2cf5f490861b16ce1
UBI: Unsorted Block Images

UBI (Latin: "where?") manages multiple logical volumes on a single
flash device, specifically supporting NAND flash devices. UBI provides
a flexible partitioning concept which still allows for wear-levelling
across the whole flash device.

In a sense, UBI may be compared to the Logical Volume Manager (LVM).
Whereas LVM maps logical sector numbers to physical HDD sector numbers,
UBI maps logical eraseblocks to physical eraseblocks.

More information may be found at
http://www.linux-mtd.infradead.org/doc/ubi.html

Partitioning/Re-partitioning

A UBI volume occupies a certain number of eraseblocks. This is limited
by a configured maximum volume size, which could also be viewed as the
partition size. Each individual UBI volume's size can be changed
independently of the other UBI volumes, provided that the sum of all
volume sizes doesn't exceed a certain limit.

UBI supports dynamic volumes and static volumes. Static volumes are
read-only and their contents are protected by CRC checksums.

Bad eraseblocks handling

UBI transparently handles bad eraseblocks. When a physical eraseblock
becomes bad, it is substituted by a good physical eraseblock, and the
user does not even notice this.

Scrubbing

On NAND flash, bit flips can occur on any write operation, and
sometimes also on read. If bit flips persist on the device, at first
they can still be corrected by ECC, but once they accumulate,
correction will become impossible. Thus it is best to actively scrub
the affected eraseblock, by first copying it to a free eraseblock and
then erasing the original. The UBI layer performs this type of
scrubbing under the covers, transparently to the UBI volume users.

Erase Counts

UBI maintains an erase count header per eraseblock. This frees
higher-level layers (like file systems) from doing this and allows for
centralized erase count management instead. The erase counts are used
by the wear-levelling algorithm in the UBI layer. The algorithm itself
is exchangeable.
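
The mapping and erase-count ideas above can be sketched in a few lines
of user-space C. This is only an illustration under stated assumptions,
not UBI's implementation: every name below (demo_ubi, demo_map_leb, and
so on) is hypothetical, the device has eight eraseblocks, and the policy
is simply "take the good free block with the lowest erase counter".

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PEB_COUNT 8	/* hypothetical device size, in eraseblocks */
#define DEMO_UNMAPPED  (-1)

/* Per-PEB bookkeeping; all names here are illustrative, not UBI's own */
struct demo_peb {
	unsigned int ec;	/* erase counter */
	int bad;		/* non-zero once the block is marked bad */
	int in_use;		/* non-zero while a LEB is mapped to it */
};

struct demo_ubi {
	struct demo_peb pebs[DEMO_PEB_COUNT];
	int leb_to_peb[DEMO_PEB_COUNT];	/* LEB number -> PEB number */
};

static void demo_init(struct demo_ubi *u)
{
	for (int i = 0; i < DEMO_PEB_COUNT; i++) {
		u->pebs[i].ec = 0;
		u->pebs[i].bad = 0;
		u->pebs[i].in_use = 0;
		u->leb_to_peb[i] = DEMO_UNMAPPED;
	}
}

/* Return the free, good PEB with the lowest erase counter, or -1 */
static int demo_pick_free(const struct demo_ubi *u)
{
	int best = -1;

	for (int i = 0; i < DEMO_PEB_COUNT; i++) {
		if (u->pebs[i].bad || u->pebs[i].in_use)
			continue;
		if (best < 0 || u->pebs[i].ec < u->pebs[best].ec)
			best = i;
	}
	return best;
}

/* Map an unmapped logical eraseblock; returns the chosen PEB or -1 */
static int demo_map_leb(struct demo_ubi *u, int lnum)
{
	int pnum = demo_pick_free(u);

	if (pnum < 0)
		return -1;
	u->pebs[pnum].in_use = 1;
	u->leb_to_peb[lnum] = pnum;
	return pnum;
}

/* Unmap a LEB: the old PEB is erased, so its erase counter grows */
static void demo_unmap_leb(struct demo_ubi *u, int lnum)
{
	int pnum = u->leb_to_peb[lnum];

	if (pnum == DEMO_UNMAPPED)
		return;
	u->pebs[pnum].ec += 1;
	u->pebs[pnum].in_use = 0;
	u->leb_to_peb[lnum] = DEMO_UNMAPPED;
}
```

A caller would run demo_init() once, then call demo_map_leb() and
demo_unmap_leb() with logical eraseblock numbers only. Because the
caller never sees physical numbers, a block flagged as bad is skipped
without the caller noticing, and repeated rewrites of one logical block
spread over several physical ones.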

Booting from NAND

For booting directly from NAND flash the hardware must at least be
capable of fetching and executing a small portion of the NAND flash.
Some NAND flash controllers have this kind of support. They usually
limit the window to a few kilobytes in eraseblock 0. This "initial
program loader" (IPL) must then contain sufficient logic to load and
execute the next boot phase.

Due to bad eraseblocks, which may be randomly scattered over the flash
device, it is problematic to store the "secondary program loader" (SPL)
statically. Also, due to bit-flips it may become corrupted over time.
UBI solves this problem gracefully by storing the SPL in a small static
UBI volume.

UBI volumes vs. static partitions

UBI volumes are still very similar to static MTD partitions:

  * both consist of eraseblocks (logical eraseblocks in case of UBI
    volumes, and physical eraseblocks in case of static partitions);
  * both support three basic operations - read, write, erase.

But UBI volumes have the following advantages over traditional static
MTD partitions:

  * there are no eraseblock wear-leveling constraints in case of UBI
    volumes, so the user need not care about this;
  * there are no bit-flips and bad eraseblocks in case of UBI volumes.

So, UBI volumes may be considered as flash devices with relaxed
restrictions.

Where can it be found?

Documentation, kernel code and applications can be found in the MTD
gits.

What are the applications for?

The applications help to create binary flash images for two purposes:
pfi files (partial flash images) for in-system update of UBI volumes,
and plain binary images, with or without OOB data in case of NAND, for
a manufacturing step. Furthermore, some tools are being and will be
created that allow flash content analysis after a system has crashed.

Who did UBI?

The original ideas on which UBI is based were developed by Andreas
Arnez, Frank Haverkamp and Thomas Gleixner. Josh W. Boyer and some
others were involved too. The implementation of the kernel layer was
done by Artem B. Bityutskiy. The user-space applications and tools were
written by Oliver Lohmann with contributions from Frank Haverkamp,
Andreas Arnez, and Artem. Joern Engel contributed a patch which
modifies JFFS2 so that it can be run on a UBI volume. Thomas Gleixner
did modifications to the NAND layer. Alexander Schmidt did some testing
work and made core functionality improvements.

Signed-off-by: Artem B. Bityutskiy <dedekind@linutronix.de>
Signed-off-by: Frank Haverkamp <haver@vnet.ibm.com>
Diffstat (limited to 'drivers/mtd/ubi/wl.c')
-rw-r--r-- drivers/mtd/ubi/wl.c 1671
1 files changed, 1671 insertions, 0 deletions
diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c
new file mode 100644
index 000000000000..9ecaf77eca9e
--- /dev/null
+++ b/drivers/mtd/ubi/wl.c
@@ -0,0 +1,1671 @@
/*
 * Copyright (c) International Business Machines Corp., 2006
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See
 * the GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 *
 * Authors: Artem Bityutskiy (Битюцкий Артём), Thomas Gleixner
 */

/*
 * UBI wear-leveling unit.
 *
 * This unit is responsible for wear-leveling. It works in terms of physical
 * eraseblocks and erase counters and knows nothing about logical eraseblocks,
 * volumes, etc. From this unit's perspective all physical eraseblocks are of
 * two types - used and free. Used physical eraseblocks are those that were
 * obtained via the 'ubi_wl_get_peb()' function, and free physical eraseblocks
 * are those that were put by the 'ubi_wl_put_peb()' function.
 *
 * Physical eraseblocks returned by 'ubi_wl_get_peb()' have only the erase
 * counter header. The rest of the physical eraseblock contains only 0xFF
 * bytes.
 *
 * When physical eraseblocks are returned to the WL unit by means of the
 * 'ubi_wl_put_peb()' function, they are scheduled for erasure. The erasure is
 * done asynchronously in context of the per-UBI device background thread,
 * which is also managed by the WL unit.
 *
 * The wear-leveling is ensured by means of moving the contents of used
 * physical eraseblocks with low erase counter to free physical eraseblocks
 * with high erase counter.
 *
 * The 'ubi_wl_get_peb()' function accepts data type hints which help to pick
 * an "optimal" physical eraseblock. For example, when it is known that the
 * physical eraseblock will be "put" soon because it contains short-term data,
 * the WL unit may pick a free physical eraseblock with low erase counter, and
 * so forth.
 *
 * If the WL unit fails to erase a physical eraseblock, it marks it as bad.
 *
 * This unit is also responsible for scrubbing. If a bit-flip is detected in a
 * physical eraseblock, it has to be moved. Technically this is the same as
 * moving it for wear-leveling reasons.
 *
 * As noted above, for this unit all physical eraseblocks are either "free"
 * or "used". Free eraseblocks are kept in the @wl->free RB-tree, while used
 * eraseblocks are kept in a set of different RB-trees: @wl->used,
 * @wl->prot.pnum, @wl->prot.aec, and @wl->scrub.
 *
 * Note, in this implementation, we keep a small in-RAM object for each physical
 * eraseblock. This is surely not a scalable solution. But it appears to be good
 * enough for moderately large flashes and it is simple. In future, one may
 * re-work this unit and make it more scalable.
 *
 * At the moment this unit does not utilize the sequence number, which was
 * introduced relatively recently. But it would be wise to do this because the
 * sequence number of a logical eraseblock characterizes how old it is. For
 * example, when we move a PEB with low erase counter, and we need to pick the
 * target PEB, we pick a PEB with the highest EC if our PEB is "old" and we
 * pick target PEB with an average EC if our PEB is not very "old". This leaves
 * room for future re-works of the WL unit.
 *
 * FIXME: looks too complex, should be simplified (later).
 */

#include <linux/slab.h>
#include <linux/crc32.h>
#include <linux/freezer.h>
#include <linux/kthread.h>
#include "ubi.h"

/* Number of physical eraseblocks reserved for wear-leveling purposes */
#define WL_RESERVED_PEBS 1

/*
 * Number of erase cycles for which short term, unknown, and long term
 * physical eraseblocks are protected.
 */
#define ST_PROTECTION 16
#define U_PROTECTION 10
#define LT_PROTECTION 4

/*
 * Maximum difference between two erase counters. If this threshold is
 * exceeded, the WL unit starts moving data from used physical eraseblocks with
 * low erase counter to free physical eraseblocks with high erase counter.
 */
#define UBI_WL_THRESHOLD CONFIG_MTD_UBI_WL_THRESHOLD

/*
 * When a physical eraseblock is moved, the WL unit has to pick the target
 * physical eraseblock to move to. The simplest way would be just to pick the
 * one with the highest erase counter. But in certain workloads this could lead
 * to an unlimited wear of one or a few physical eraseblocks. Indeed, imagine a
 * situation when the picked physical eraseblock is constantly erased after the
 * data is written to it. So, we have a constant which limits the highest erase
 * counter of the free physical eraseblock to pick. Namely, the WL unit does
 * not pick eraseblocks with erase counter greater than the lowest erase
 * counter plus %WL_FREE_MAX_DIFF.
 */
#define WL_FREE_MAX_DIFF (2*UBI_WL_THRESHOLD)

/*
 * Maximum number of consecutive background thread failures which is enough to
 * switch to read-only mode.
 */
#define WL_MAX_FAILURES 32

/**
 * struct ubi_wl_entry - wear-leveling entry.
 * @rb: link in the corresponding RB-tree
 * @ec: erase counter
 * @pnum: physical eraseblock number
 *
 * Each physical eraseblock has a corresponding &struct ubi_wl_entry object
 * which may be kept in different RB-trees.
 */
struct ubi_wl_entry {
	struct rb_node rb;
	int ec;
	int pnum;
};

/**
 * struct ubi_wl_prot_entry - PEB protection entry.
 * @rb_pnum: link in the @wl->prot.pnum RB-tree
 * @rb_aec: link in the @wl->prot.aec RB-tree
 * @abs_ec: the absolute erase counter value when the protection ends
 * @e: the wear-leveling entry of the physical eraseblock under protection
 *
 * When the WL unit returns a physical eraseblock, the physical eraseblock is
 * protected from being moved for some "time". For this reason, the physical
 * eraseblock is not directly moved from the @wl->free tree to the @wl->used
 * tree. There is one more tree in between where this physical eraseblock is
 * temporarily stored (@wl->prot).
 *
 * All this protection stuff is needed because:
 *  o we don't want to move physical eraseblocks just after we have given them
 *    to the user; instead, we first want to let users fill them up with data;
 *
 *  o there is a chance that the user will put the physical eraseblock very
 *    soon, so it makes sense not to move it for some time, but wait; this is
 *    especially important in case of "short term" physical eraseblocks.
 *
 * Physical eraseblocks stay protected only for limited time. But the "time" is
 * measured in erase cycles in this case. This is implemented with help of the
 * absolute erase counter (@wl->abs_ec). When it reaches certain value, the
 * physical eraseblocks are moved from the protection trees (@wl->prot.*) to
 * the @wl->used tree.
 *
 * Protected physical eraseblocks are searched by physical eraseblock number
 * (when they are put) and by the absolute erase counter (to check if it is
 * time to move them to the @wl->used tree). So there are actually 2 RB-trees
 * storing the protected physical eraseblocks: @wl->prot.pnum and
 * @wl->prot.aec. They are referred to as the "protection" trees. The
 * first one is indexed by the physical eraseblock number. The second one is
 * indexed by the absolute erase counter. Both trees store
 * &struct ubi_wl_prot_entry objects.
 *
 * Each physical eraseblock has 2 main states: free and used. The former state
 * corresponds to the @wl->free tree. The latter state is split into several
 * sub-states:
 *  o the WL movement is allowed (@wl->used tree);
 *  o the WL movement is temporarily prohibited (@wl->prot.pnum and
 *    @wl->prot.aec trees);
 *  o scrubbing is needed (@wl->scrub tree).
 *
 * Depending on the sub-state, wear-leveling entries of the used physical
 * eraseblocks may be kept in one of those trees.
 */
struct ubi_wl_prot_entry {
	struct rb_node rb_pnum;
	struct rb_node rb_aec;
	unsigned long long abs_ec;
	struct ubi_wl_entry *e;
};

/**
 * struct ubi_work - UBI work description data structure.
 * @list: a link in the list of pending works
 * @func: worker function
 * @priv: private data of the worker function
 *
 * @e: physical eraseblock to erase
 * @torture: if the physical eraseblock has to be tortured
 *
 * The @func pointer points to the worker function. If the @cancel argument is
 * not zero, the worker has to free the resources and exit immediately. The
 * worker has to return zero in case of success and a negative error code in
 * case of failure.
 */
struct ubi_work {
	struct list_head list;
	int (*func)(struct ubi_device *ubi, struct ubi_work *wrk, int cancel);
	/* The below fields are only relevant to erasure works */
	struct ubi_wl_entry *e;
	int torture;
};

#ifdef CONFIG_MTD_UBI_DEBUG_PARANOID
static int paranoid_check_ec(const struct ubi_device *ubi, int pnum, int ec);
static int paranoid_check_in_wl_tree(struct ubi_wl_entry *e,
				     struct rb_root *root);
#else
#define paranoid_check_ec(ubi, pnum, ec) 0
#define paranoid_check_in_wl_tree(e, root)
#endif

/* Slab cache for wear-leveling entries */
static struct kmem_cache *wl_entries_slab;

/**
 * tree_empty - a helper function to check if an RB-tree is empty.
 * @root: the root of the tree
 *
 * This function returns non-zero if the RB-tree is empty and zero if not.
 */
static inline int tree_empty(struct rb_root *root)
{
	return root->rb_node == NULL;
}

/**
 * wl_tree_add - add a wear-leveling entry to a WL RB-tree.
 * @e: the wear-leveling entry to add
 * @root: the root of the tree
 *
 * Note, we use (erase counter, physical eraseblock number) pairs as keys in
 * the @ubi->used and @ubi->free RB-trees.
 */
static void wl_tree_add(struct ubi_wl_entry *e, struct rb_root *root)
{
	struct rb_node **p, *parent = NULL;

	p = &root->rb_node;
	while (*p) {
		struct ubi_wl_entry *e1;

		parent = *p;
		e1 = rb_entry(parent, struct ubi_wl_entry, rb);

		if (e->ec < e1->ec)
			p = &(*p)->rb_left;
		else if (e->ec > e1->ec)
			p = &(*p)->rb_right;
		else {
			ubi_assert(e->pnum != e1->pnum);
			if (e->pnum < e1->pnum)
				p = &(*p)->rb_left;
			else
				p = &(*p)->rb_right;
		}
	}

	rb_link_node(&e->rb, parent, p);
	rb_insert_color(&e->rb, root);
}
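
Reader's note, not part of the patch: the (erase counter, physical eraseblock number) key ordering used by wl_tree_add() can be tried out in user space with a plain comparator. The sketch below is an assumption-laden illustration: struct demo_entry, demo_entry_cmp() and demo_sort_entries() are hypothetical names, and qsort() on an array stands in for the kernel RB-tree.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-in for the (ec, pnum) part of struct ubi_wl_entry */
struct demo_entry {
	int ec;		/* erase counter: primary key */
	int pnum;	/* physical eraseblock number: tie-breaker */
};

/* Order by erase counter first, then by physical eraseblock number */
static int demo_entry_cmp(const void *a, const void *b)
{
	const struct demo_entry *x = a;
	const struct demo_entry *y = b;

	if (x->ec != y->ec)
		return x->ec < y->ec ? -1 : 1;
	if (x->pnum != y->pnum)
		return x->pnum < y->pnum ? -1 : 1;
	return 0;
}

/* Sort entries into the order an in-order walk of such a tree would give */
static void demo_sort_entries(struct demo_entry *v, size_t n)
{
	qsort(v, n, sizeof(*v), demo_entry_cmp);
}
```

Since no two entries share a pnum, the pair is unique, which is why the tie-break branch in wl_tree_add() can assert that the numbers differ.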


/*
 * Helper functions to add and delete wear-leveling entries from different
 * trees.
 */

static void free_tree_add(struct ubi_device *ubi, struct ubi_wl_entry *e)
{
	wl_tree_add(e, &ubi->free);
}
static inline void used_tree_add(struct ubi_device *ubi,
				 struct ubi_wl_entry *e)
{
	wl_tree_add(e, &ubi->used);
}
static inline void scrub_tree_add(struct ubi_device *ubi,
				  struct ubi_wl_entry *e)
{
	wl_tree_add(e, &ubi->scrub);
}
static inline void free_tree_del(struct ubi_device *ubi,
				 struct ubi_wl_entry *e)
{
	paranoid_check_in_wl_tree(e, &ubi->free);
	rb_erase(&e->rb, &ubi->free);
}
static inline void used_tree_del(struct ubi_device *ubi,
				 struct ubi_wl_entry *e)
{
	paranoid_check_in_wl_tree(e, &ubi->used);
	rb_erase(&e->rb, &ubi->used);
}
static inline void scrub_tree_del(struct ubi_device *ubi,
				  struct ubi_wl_entry *e)
{
	paranoid_check_in_wl_tree(e, &ubi->scrub);
	rb_erase(&e->rb, &ubi->scrub);
}

/**
 * do_work - do one pending work.
 * @ubi: UBI device description object
 *
 * This function returns zero in case of success and a negative error code in
 * case of failure.
 */
static int do_work(struct ubi_device *ubi)
{
	int err;
	struct ubi_work *wrk;

	spin_lock(&ubi->wl_lock);

	if (list_empty(&ubi->works)) {
		spin_unlock(&ubi->wl_lock);
		return 0;
	}

	wrk = list_entry(ubi->works.next, struct ubi_work, list);
	list_del(&wrk->list);
	spin_unlock(&ubi->wl_lock);

	/*
	 * Call the worker function. Do not touch the work structure
	 * after this call as it will have been freed or reused by that
	 * time by the worker function.
	 */
	err = wrk->func(ubi, wrk, 0);
	if (err)
		ubi_err("work failed with error code %d", err);

	spin_lock(&ubi->wl_lock);
	ubi->works_count -= 1;
	ubi_assert(ubi->works_count >= 0);
	spin_unlock(&ubi->wl_lock);
	return err;
}

/**
 * produce_free_peb - produce a free physical eraseblock.
 * @ubi: UBI device description object
 *
 * This function tries to make a free PEB by means of synchronous execution of
 * pending works. This may be needed if, for example, the background thread is
 * disabled. Returns zero in case of success and a negative error code in case
 * of failure.
 */
static int produce_free_peb(struct ubi_device *ubi)
{
	int err;

	spin_lock(&ubi->wl_lock);
	while (tree_empty(&ubi->free)) {
		spin_unlock(&ubi->wl_lock);

		dbg_wl("do one work synchronously");
		err = do_work(ubi);
		if (err)
			return err;

		spin_lock(&ubi->wl_lock);
	}
	spin_unlock(&ubi->wl_lock);

	return 0;
}

/**
 * in_wl_tree - check if wear-leveling entry is present in a WL RB-tree.
 * @e: the wear-leveling entry to check
 * @root: the root of the tree
 *
 * This function returns non-zero if @e is in the @root RB-tree and zero if it
 * is not.
 */
static int in_wl_tree(struct ubi_wl_entry *e, struct rb_root *root)
{
	struct rb_node *p;

	p = root->rb_node;
	while (p) {
		struct ubi_wl_entry *e1;

		e1 = rb_entry(p, struct ubi_wl_entry, rb);

		if (e->pnum == e1->pnum) {
			ubi_assert(e == e1);
			return 1;
		}

		if (e->ec < e1->ec)
			p = p->rb_left;
		else if (e->ec > e1->ec)
			p = p->rb_right;
		else {
			ubi_assert(e->pnum != e1->pnum);
			if (e->pnum < e1->pnum)
				p = p->rb_left;
			else
				p = p->rb_right;
		}
	}

	return 0;
}

/**
 * prot_tree_add - add physical eraseblock to protection trees.
 * @ubi: UBI device description object
 * @e: the physical eraseblock to add
 * @pe: protection entry object to use
 * @abs_ec: absolute erase counter value when this physical eraseblock has
 * to be removed from the protection trees.
 *
 * @wl->lock has to be locked.
 */
static void prot_tree_add(struct ubi_device *ubi, struct ubi_wl_entry *e,
			  struct ubi_wl_prot_entry *pe, int abs_ec)
{
	struct rb_node **p, *parent = NULL;
	struct ubi_wl_prot_entry *pe1;

	pe->e = e;
	pe->abs_ec = ubi->abs_ec + abs_ec;

	p = &ubi->prot.pnum.rb_node;
	while (*p) {
		parent = *p;
		pe1 = rb_entry(parent, struct ubi_wl_prot_entry, rb_pnum);

		if (e->pnum < pe1->e->pnum)
			p = &(*p)->rb_left;
		else
			p = &(*p)->rb_right;
	}
	rb_link_node(&pe->rb_pnum, parent, p);
	rb_insert_color(&pe->rb_pnum, &ubi->prot.pnum);

	p = &ubi->prot.aec.rb_node;
	parent = NULL;
	while (*p) {
		parent = *p;
		pe1 = rb_entry(parent, struct ubi_wl_prot_entry, rb_aec);

		if (pe->abs_ec < pe1->abs_ec)
			p = &(*p)->rb_left;
		else
			p = &(*p)->rb_right;
	}
	rb_link_node(&pe->rb_aec, parent, p);
	rb_insert_color(&pe->rb_aec, &ubi->prot.aec);
}

/**
 * find_wl_entry - find wear-leveling entry closest to certain erase counter.
 * @root: the RB-tree to search
 * @max: highest possible erase counter
 *
 * This function looks for a wear-leveling entry with erase counter closest to
 * @max and less than @max.
 */
static struct ubi_wl_entry *find_wl_entry(struct rb_root *root, int max)
{
	struct rb_node *p;
	struct ubi_wl_entry *e;

	e = rb_entry(rb_first(root), struct ubi_wl_entry, rb);
	max += e->ec;

	p = root->rb_node;
	while (p) {
		struct ubi_wl_entry *e1;

		e1 = rb_entry(p, struct ubi_wl_entry, rb);
		if (e1->ec >= max)
			p = p->rb_left;
		else {
			p = p->rb_right;
			e = e1;
		}
	}

	return e;
}
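
Reader's note, not part of the patch: as written, find_wl_entry() adds the lowest erase counter in the tree to @max before searching, so the effective bound is "lowest EC plus @max". The same search can be sketched over a sorted array, where a binary search plays the role of the tree descent. The function name demo_find_entry() and the array representation are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Given erase counters sorted in ascending order, return the index of the
 * largest one that is still below ecs[0] + max_diff. Falls back to index 0
 * when nothing qualifies, and returns -1 for an empty array.
 */
static int demo_find_entry(const int *ecs, size_t n, int max_diff)
{
	size_t lo = 0, hi = n;
	int limit, found = 0;

	if (n == 0)
		return -1;

	limit = ecs[0] + max_diff;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (ecs[mid] >= limit) {
			hi = mid;		/* too worn: look left */
		} else {
			found = (int)mid;	/* candidate: look right */
			lo = mid + 1;
		}
	}
	return found;
}
```

Unlike this sketch, the kernel function does not check for an empty tree; its callers are expected to have verified that the tree is non-empty first.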

/**
 * ubi_wl_get_peb - get a physical eraseblock.
 * @ubi: UBI device description object
 * @dtype: type of data which will be stored in this physical eraseblock
 *
 * This function returns a physical eraseblock in case of success and a
 * negative error code in case of failure. Might sleep.
 */
int ubi_wl_get_peb(struct ubi_device *ubi, int dtype)
{
	int err, protect, medium_ec;
	struct ubi_wl_entry *e, *first, *last;
	struct ubi_wl_prot_entry *pe;

	ubi_assert(dtype == UBI_LONGTERM || dtype == UBI_SHORTTERM ||
		   dtype == UBI_UNKNOWN);

	pe = kmalloc(sizeof(struct ubi_wl_prot_entry), GFP_KERNEL);
	if (!pe)
		return -ENOMEM;

retry:
	spin_lock(&ubi->wl_lock);
	if (tree_empty(&ubi->free)) {
		if (ubi->works_count == 0) {
			ubi_assert(list_empty(&ubi->works));
			ubi_err("no free eraseblocks");
			spin_unlock(&ubi->wl_lock);
			kfree(pe);
			return -ENOSPC;
		}
		spin_unlock(&ubi->wl_lock);

		err = produce_free_peb(ubi);
		if (err < 0) {
			kfree(pe);
			return err;
		}
		goto retry;
	}

	switch (dtype) {
	case UBI_LONGTERM:
		/*
		 * For long term data we pick a physical eraseblock
		 * with high erase counter. But the highest erase
		 * counter we can pick is bounded by the lowest
		 * erase counter plus %WL_FREE_MAX_DIFF.
		 */
		e = find_wl_entry(&ubi->free, WL_FREE_MAX_DIFF);
		protect = LT_PROTECTION;
		break;
	case UBI_UNKNOWN:
		/*
		 * For unknown data we pick a physical eraseblock with
		 * medium erase counter. But we by no means can pick a
		 * physical eraseblock with erase counter greater than
		 * or equal to the lowest erase counter plus
		 * %WL_FREE_MAX_DIFF.
		 */
		first = rb_entry(rb_first(&ubi->free),
				 struct ubi_wl_entry, rb);
		last = rb_entry(rb_last(&ubi->free),
				struct ubi_wl_entry, rb);

		if (last->ec - first->ec < WL_FREE_MAX_DIFF)
			e = rb_entry(ubi->free.rb_node,
				     struct ubi_wl_entry, rb);
		else {
			medium_ec = (first->ec + WL_FREE_MAX_DIFF)/2;
			e = find_wl_entry(&ubi->free, medium_ec);
		}
		protect = U_PROTECTION;
		break;
	case UBI_SHORTTERM:
		/*
		 * For short term data we pick a physical eraseblock
		 * with the lowest erase counter as we expect it will
		 * be erased soon.
		 */
		e = rb_entry(rb_first(&ubi->free),
			     struct ubi_wl_entry, rb);
		protect = ST_PROTECTION;
		break;
	default:
		protect = 0;
		e = NULL;
		BUG();
	}

	/*
	 * Move the physical eraseblock to the protection trees where it will
	 * be protected from being moved for some time.
	 */
	free_tree_del(ubi, e);
	prot_tree_add(ubi, e, pe, protect);

	dbg_wl("PEB %d EC %d, protection %d", e->pnum, e->ec, protect);
	spin_unlock(&ubi->wl_lock);

	return e->pnum;
}

/**
 * prot_tree_del - remove a physical eraseblock from the protection trees
 * @ubi: UBI device description object
 * @pnum: the physical eraseblock to remove
 */
static void prot_tree_del(struct ubi_device *ubi, int pnum)
{
	struct rb_node *p;
	struct ubi_wl_prot_entry *pe = NULL;

	p = ubi->prot.pnum.rb_node;
	while (p) {

		pe = rb_entry(p, struct ubi_wl_prot_entry, rb_pnum);

		if (pnum == pe->e->pnum)
			break;

		if (pnum < pe->e->pnum)
			p = p->rb_left;
		else
			p = p->rb_right;
	}

	ubi_assert(pe->e->pnum == pnum);
	rb_erase(&pe->rb_aec, &ubi->prot.aec);
	rb_erase(&pe->rb_pnum, &ubi->prot.pnum);
	kfree(pe);
}

/**
 * sync_erase - synchronously erase a physical eraseblock.
 * @ubi: UBI device description object
 * @e: the physical eraseblock to erase
 * @torture: if the physical eraseblock has to be tortured
 *
 * This function returns zero in case of success and a negative error code in
 * case of failure.
 */
static int sync_erase(struct ubi_device *ubi, struct ubi_wl_entry *e, int torture)
{
	int err;
	struct ubi_ec_hdr *ec_hdr;
	unsigned long long ec = e->ec;

	dbg_wl("erase PEB %d, old EC %llu", e->pnum, ec);

	err = paranoid_check_ec(ubi, e->pnum, e->ec);
	if (err > 0)
		return -EINVAL;

	ec_hdr = kzalloc(ubi->ec_hdr_alsize, GFP_KERNEL);
	if (!ec_hdr)
		return -ENOMEM;

	err = ubi_io_sync_erase(ubi, e->pnum, torture);
	if (err < 0)
		goto out_free;

	ec += err;
	if (ec > UBI_MAX_ERASECOUNTER) {
		/*
		 * Erase counter overflow. Upgrade UBI and use 64-bit
		 * erase counters internally.
		 */
		ubi_err("erase counter overflow at PEB %d, EC %llu",
			e->pnum, ec);
		err = -EINVAL;
		goto out_free;
	}

	dbg_wl("erased PEB %d, new EC %llu", e->pnum, ec);

	ec_hdr->ec = cpu_to_ubi64(ec);

	err = ubi_io_write_ec_hdr(ubi, e->pnum, ec_hdr);
	if (err)
		goto out_free;

	e->ec = ec;
	spin_lock(&ubi->wl_lock);
	if (e->ec > ubi->max_ec)
		ubi->max_ec = e->ec;
	spin_unlock(&ubi->wl_lock);

out_free:
	kfree(ec_hdr);
	return err;
}

/**
 * check_protection_over - check if it is time to stop protecting some
 * physical eraseblocks.
 * @ubi: UBI device description object
 *
 * This function is called after each erase operation, when the absolute erase
 * counter is incremented, to check if some physical eraseblocks no longer
 * have to be protected. These physical eraseblocks are moved from the
 * protection trees to the used tree.
 */
static void check_protection_over(struct ubi_device *ubi)
{
	struct ubi_wl_prot_entry *pe;

	/*
	 * There may be several protected physical eraseblocks to remove,
	 * process them all.
	 */
	while (1) {
		spin_lock(&ubi->wl_lock);
		if (tree_empty(&ubi->prot.aec)) {
			spin_unlock(&ubi->wl_lock);
			break;
		}

		pe = rb_entry(rb_first(&ubi->prot.aec),
			      struct ubi_wl_prot_entry, rb_aec);

		if (pe->abs_ec > ubi->abs_ec) {
			spin_unlock(&ubi->wl_lock);
			break;
		}

		dbg_wl("PEB %d protection over, abs_ec %llu, PEB abs_ec %llu",
		       pe->e->pnum, ubi->abs_ec, pe->abs_ec);
		rb_erase(&pe->rb_aec, &ubi->prot.aec);
		rb_erase(&pe->rb_pnum, &ubi->prot.pnum);
		used_tree_add(ubi, pe->e);
		spin_unlock(&ubi->wl_lock);

		kfree(pe);
		cond_resched();
	}
}
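
Reader's note, not part of the patch: protection "time" is measured in erase cycles. Each protected block carries a deadline, namely the device-wide absolute erase counter at allocation plus a protection value such as ST_PROTECTION, and blocks are released once the counter reaches that deadline. The user-space sketch below mimics the loop above with an array sorted by deadline in place of the @prot.aec tree. The names demo_prot and demo_release_expired() are hypothetical, and the caller is assumed to provide a released[] buffer with room for *n entries.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative protection record; not the kernel's ubi_wl_prot_entry */
struct demo_prot {
	int pnum;			/* protected physical eraseblock */
	unsigned long long abs_ec;	/* erase count at which protection ends */
};

/*
 * prot[] is sorted by ascending abs_ec. Move every entry whose deadline
 * has been reached (abs_ec <= now) into released[], compact the rest to
 * the front of prot[], and return how many entries were released.
 */
static size_t demo_release_expired(struct demo_prot *prot, size_t *n,
				   unsigned long long now, int *released)
{
	size_t done = 0;

	while (done < *n && prot[done].abs_ec <= now) {
		released[done] = prot[done].pnum;
		done++;
	}
	for (size_t i = done; i < *n; i++)
		prot[i - done] = prot[i];
	*n -= done;
	return done;
}
```

Because the entries are ordered by deadline, the scan can stop at the first entry that is still protected, which is the same early exit the kernel loop takes when pe->abs_ec > ubi->abs_ec.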

/**
 * schedule_ubi_work - schedule a work.
 * @ubi: UBI device description object
 * @wrk: the work to schedule
 *
 * This function enqueues a work defined by @wrk to the tail of the pending
 * works list.
 */
static void schedule_ubi_work(struct ubi_device *ubi, struct ubi_work *wrk)
{
	spin_lock(&ubi->wl_lock);
	list_add_tail(&wrk->list, &ubi->works);
	ubi_assert(ubi->works_count >= 0);
	ubi->works_count += 1;
	if (ubi->thread_enabled)
		wake_up_process(ubi->bgt_thread);
	spin_unlock(&ubi->wl_lock);
}

static int erase_worker(struct ubi_device *ubi, struct ubi_work *wl_wrk,
			int cancel);

/**
 * schedule_erase - schedule an erase work.
 * @ubi: UBI device description object
 * @e: the WL entry of the physical eraseblock to erase
 * @torture: if the physical eraseblock has to be tortured
 *
 * This function returns zero in case of success and a %-ENOMEM in case of
 * failure.
 */
static int schedule_erase(struct ubi_device *ubi, struct ubi_wl_entry *e,
			  int torture)
{
	struct ubi_work *wl_wrk;

	dbg_wl("schedule erasure of PEB %d, EC %d, torture %d",
	       e->pnum, e->ec, torture);

	wl_wrk = kmalloc(sizeof(struct ubi_work), GFP_KERNEL);
	if (!wl_wrk)
		return -ENOMEM;

	wl_wrk->func = &erase_worker;
	wl_wrk->e = e;
	wl_wrk->torture = torture;

	schedule_ubi_work(ubi, wl_wrk);
	return 0;
}

783/**
784 * wear_leveling_worker - wear-leveling worker function.
785 * @ubi: UBI device description object
786 * @wrk: the work object
787 * @cancel: non-zero if the worker has to free memory and exit
788 *
789 * This function copies a more worn out physical eraseblock to a less worn out
790 * one. Returns zero in case of success and a negative error code in case of
791 * failure.
792 */
793static int wear_leveling_worker(struct ubi_device *ubi, struct ubi_work *wrk,
794 int cancel)
795{
796 int err, put = 0;
797 struct ubi_wl_entry *e1, *e2;
798 struct ubi_vid_hdr *vid_hdr;
799
800 kfree(wrk);
801
802 if (cancel)
803 return 0;
804
805 vid_hdr = ubi_zalloc_vid_hdr(ubi);
806 if (!vid_hdr)
807 return -ENOMEM;
808
809 spin_lock(&ubi->wl_lock);
810
811 /*
812 * Only one WL worker at a time is supported at this implementation, so
813 * make sure a PEB is not being moved already.
814 */
815 if (ubi->move_to || tree_empty(&ubi->free) ||
816 (tree_empty(&ubi->used) && tree_empty(&ubi->scrub))) {
817 /*
818 * Only one WL worker at a time is supported at this
819 * implementation, so if a LEB is already being moved, cancel.
820 *
821 * No free physical eraseblocks? Well, we cancel wear-leveling
822 * then. It will be triggered again when a free physical
823 * eraseblock appears.
824 *
825 * No used physical eraseblocks? They must be temporarily
826 * protected from being moved. They will be moved to the
827 * @ubi->used tree later and the wear-leveling will be
828 * triggered again.
829 */
830 dbg_wl("cancel WL, a list is empty: free %d, used %d",
831 tree_empty(&ubi->free), tree_empty(&ubi->used));
832 ubi->wl_scheduled = 0;
833 spin_unlock(&ubi->wl_lock);
834 ubi_free_vid_hdr(ubi, vid_hdr);
835 return 0;
836 }
837
838 if (tree_empty(&ubi->scrub)) {
839 /*
840 * Now pick the least worn-out used physical eraseblock and a
841 * highly worn-out free physical eraseblock. If the erase
842 * counters differ much enough, start wear-leveling.
843 */
844 e1 = rb_entry(rb_first(&ubi->used), struct ubi_wl_entry, rb);
845 e2 = find_wl_entry(&ubi->free, WL_FREE_MAX_DIFF);
846
847 if (!(e2->ec - e1->ec >= UBI_WL_THRESHOLD)) {
848 dbg_wl("no WL needed: min used EC %d, max free EC %d",
849 e1->ec, e2->ec);
850 ubi->wl_scheduled = 0;
851 spin_unlock(&ubi->wl_lock);
852 ubi_free_vid_hdr(ubi, vid_hdr);
853 return 0;
854 }
855 used_tree_del(ubi, e1);
856 dbg_wl("move PEB %d EC %d to PEB %d EC %d",
857 e1->pnum, e1->ec, e2->pnum, e2->ec);
858 } else {
859 e1 = rb_entry(rb_first(&ubi->scrub), struct ubi_wl_entry, rb);
860 e2 = find_wl_entry(&ubi->free, WL_FREE_MAX_DIFF);
861 scrub_tree_del(ubi, e1);
862 dbg_wl("scrub PEB %d to PEB %d", e1->pnum, e2->pnum);
863 }
864
865 free_tree_del(ubi, e2);
866 ubi_assert(!ubi->move_from && !ubi->move_to);
867 ubi_assert(!ubi->move_to_put && !ubi->move_from_put);
868 ubi->move_from = e1;
869 ubi->move_to = e2;
870 spin_unlock(&ubi->wl_lock);
871
872 /*
873 * Now we are going to copy physical eraseblock @e1->pnum to @e2->pnum.
874 * We so far do not know which logical eraseblock our physical
875 * eraseblock (@e1) belongs to. We have to read the volume identifier
876 * header first.
877 */
878
879 err = ubi_io_read_vid_hdr(ubi, e1->pnum, vid_hdr, 0);
880 if (err && err != UBI_IO_BITFLIPS) {
881 if (err == UBI_IO_PEB_FREE) {
882 /*
883 * We are trying to move a PEB without a VID header. UBI
884 * always writes the VID header shortly after the PEB is
885 * handed out, so the writer was probably preempted before
886 * it had a chance to write the header.
887 * Just re-schedule the work, so that next time it will
888 * likely have the VID header in place.
889 */
890 dbg_wl("PEB %d has no VID header", e1->pnum);
891 err = 0;
892 } else {
893 ubi_err("error %d while reading VID header from PEB %d",
894 err, e1->pnum);
895 if (err > 0)
896 err = -EIO;
897 }
898 goto error;
899 }
900
901 err = ubi_eba_copy_leb(ubi, e1->pnum, e2->pnum, vid_hdr);
902 if (err) {
903 if (err == UBI_IO_BITFLIPS)
904 err = 0;
905 goto error;
906 }
907
908 ubi_free_vid_hdr(ubi, vid_hdr);
909 spin_lock(&ubi->wl_lock);
910 if (!ubi->move_to_put)
911 used_tree_add(ubi, e2);
912 else
913 put = 1;
914 ubi->move_from = ubi->move_to = NULL;
915 ubi->move_from_put = ubi->move_to_put = 0;
916 ubi->wl_scheduled = 0;
917 spin_unlock(&ubi->wl_lock);
918
919 if (put) {
920 /*
921 * Well, the target PEB was put meanwhile, schedule it for
922 * erasure.
923 */
924 dbg_wl("PEB %d was put meanwhile, erase", e2->pnum);
925 err = schedule_erase(ubi, e2, 0);
926 if (err) {
927 kmem_cache_free(wl_entries_slab, e2);
928 ubi_ro_mode(ubi);
929 }
930 }
931
932 err = schedule_erase(ubi, e1, 0);
933 if (err) {
934 kmem_cache_free(wl_entries_slab, e1);
935 ubi_ro_mode(ubi);
936 }
937
938 dbg_wl("done");
939 return err;
940
941 /*
942 * Some error occurred. @e1 was not changed, so put it back to the used
943 * tree. @e2 may have been modified, so schedule it for erasure.
944 */
945error:
946 if (err)
947 dbg_wl("error %d occurred, cancel operation", err);
948 ubi_assert(err <= 0);
949
950 ubi_free_vid_hdr(ubi, vid_hdr);
951 spin_lock(&ubi->wl_lock);
952 ubi->wl_scheduled = 0;
953 if (ubi->move_from_put)
954 put = 1;
955 else
956 used_tree_add(ubi, e1);
957 ubi->move_from = ubi->move_to = NULL;
958 ubi->move_from_put = ubi->move_to_put = 0;
959 spin_unlock(&ubi->wl_lock);
960
961 if (put) {
962 /*
963 * Well, the source PEB was put meanwhile, schedule it for
964 * erasure.
965 */
966 dbg_wl("PEB %d was put meanwhile, erase", e1->pnum);
967 err = schedule_erase(ubi, e1, 0);
968 if (err) {
969 kmem_cache_free(wl_entries_slab, e1);
970 ubi_ro_mode(ubi);
971 }
972 }
973
974 err = schedule_erase(ubi, e2, 0);
975 if (err) {
976 kmem_cache_free(wl_entries_slab, e2);
977 ubi_ro_mode(ubi);
978 }
979
980 yield();
981 return err;
982}
983
984/**
985 * ensure_wear_leveling - schedule wear-leveling if it is needed.
986 * @ubi: UBI device description object
987 *
988 * This function checks if it is time to start wear-leveling and schedules it
989 * if yes. This function returns zero in case of success and a negative error
990 * code in case of failure.
991 */
992static int ensure_wear_leveling(struct ubi_device *ubi)
993{
994 int err = 0;
995 struct ubi_wl_entry *e1;
996 struct ubi_wl_entry *e2;
997 struct ubi_work *wrk;
998
999 spin_lock(&ubi->wl_lock);
1000 if (ubi->wl_scheduled)
1001 /* Wear-leveling is already in the work queue */
1002 goto out_unlock;
1003
1004 /*
1005 * If the ubi->scrub tree is not empty, scrubbing is needed, and the
1006 * WL worker has to be scheduled anyway.
1007 */
1008 if (tree_empty(&ubi->scrub)) {
1009 if (tree_empty(&ubi->used) || tree_empty(&ubi->free))
1010 /* No physical eraseblocks - no deal */
1011 goto out_unlock;
1012
1013 /*
1014 * We schedule wear-leveling only if the difference between the
1015 * lowest erase counter of used physical eraseblocks and a high
1016 * erase counter of free physical eraseblocks is at least
1017 * %UBI_WL_THRESHOLD.
1018 */
1019 e1 = rb_entry(rb_first(&ubi->used), struct ubi_wl_entry, rb);
1020 e2 = find_wl_entry(&ubi->free, WL_FREE_MAX_DIFF);
1021
1022 if (!(e2->ec - e1->ec >= UBI_WL_THRESHOLD))
1023 goto out_unlock;
1024 dbg_wl("schedule wear-leveling");
1025 } else
1026 dbg_wl("schedule scrubbing");
1027
1028 ubi->wl_scheduled = 1;
1029 spin_unlock(&ubi->wl_lock);
1030
1031 wrk = kmalloc(sizeof(struct ubi_work), GFP_KERNEL);
1032 if (!wrk) {
1033 err = -ENOMEM;
1034 goto out_cancel;
1035 }
1036
1037 wrk->func = &wear_leveling_worker;
1038 schedule_ubi_work(ubi, wrk);
1039 return err;
1040
1041out_cancel:
1042 spin_lock(&ubi->wl_lock);
1043 ubi->wl_scheduled = 0;
1044out_unlock:
1045 spin_unlock(&ubi->wl_lock);
1046 return err;
1047}
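
The threshold test shared by ensure_wear_leveling() and wear_leveling_worker() can be illustrated in isolation. The following is a minimal sketch, not the kernel code: it uses plain arrays instead of RB-trees, it takes the maximum free erase counter where the real code asks find_wl_entry() for a "high" one bounded by WL_FREE_MAX_DIFF, and SKETCH_WL_THRESHOLD, min_ec(), max_ec() and wl_needed() are hypothetical names with an assumed threshold value.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value for illustration only; the real UBI_WL_THRESHOLD is defined elsewhere. */
#define SKETCH_WL_THRESHOLD 4096

/* Smallest erase counter in a non-empty array. */
static int min_ec(const int *ec, size_t n)
{
	size_t i;
	int m;

	assert(n > 0);
	m = ec[0];
	for (i = 1; i < n; i++)
		if (ec[i] < m)
			m = ec[i];
	return m;
}

/* Largest erase counter in a non-empty array. */
static int max_ec(const int *ec, size_t n)
{
	size_t i;
	int m;

	assert(n > 0);
	m = ec[0];
	for (i = 1; i < n; i++)
		if (ec[i] > m)
			m = ec[i];
	return m;
}

/*
 * Returns 1 if wear-leveling should be started, 0 otherwise. With no used
 * or no free eraseblocks there is nothing to move, so the answer is 0.
 */
static int wl_needed(const int *used_ec, size_t n_used,
		     const int *free_ec, size_t n_free)
{
	if (n_used == 0 || n_free == 0)
		return 0;
	return max_ec(free_ec, n_free) - min_ec(used_ec, n_used) >=
	       SKETCH_WL_THRESHOLD;
}
```

With used counters {10, 5000, 200} and an assumed threshold of 4096, a free block at 4105 does not trigger a move, while one at 4106 does, because the comparison is inclusive.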
1048
1049/**
1050 * erase_worker - physical eraseblock erase worker function.
1051 * @ubi: UBI device description object
1052 * @wl_wrk: the work object
1053 * @cancel: non-zero if the worker has to free memory and exit
1054 *
1055 * This function erases a physical eraseblock and performs torture testing if
1056 * needed. It also takes care of marking the physical eraseblock bad if
1057 * needed. Returns zero in case of success and a negative error code in case of
1058 * failure.
1059 */
1060static int erase_worker(struct ubi_device *ubi, struct ubi_work *wl_wrk,
1061 int cancel)
1062{
1063 int err;
1064 struct ubi_wl_entry *e = wl_wrk->e;
1065 int pnum = e->pnum;
1066
1067 if (cancel) {
1068 dbg_wl("cancel erasure of PEB %d EC %d", pnum, e->ec);
1069 kfree(wl_wrk);
1070 kmem_cache_free(wl_entries_slab, e);
1071 return 0;
1072 }
1073
1074 dbg_wl("erase PEB %d EC %d", pnum, e->ec);
1075
1076 err = sync_erase(ubi, e, wl_wrk->torture);
1077 if (!err) {
1078 /* Fine, we've erased it successfully */
1079 kfree(wl_wrk);
1080
1081 spin_lock(&ubi->wl_lock);
1082 ubi->abs_ec += 1;
1083 free_tree_add(ubi, e);
1084 spin_unlock(&ubi->wl_lock);
1085
1086 /*
1087 * One more erase operation has happened, take care of protected
1088 * physical eraseblocks.
1089 */
1090 check_protection_over(ubi);
1091
1092 /* And take care of wear-leveling */
1093 err = ensure_wear_leveling(ubi);
1094 return err;
1095 }
1096
1097 kfree(wl_wrk);
1098 kmem_cache_free(wl_entries_slab, e);
1099
1100 if (err != -EIO) {
1101 /*
1102 * If this is not %-EIO, we have no idea what to do. Scheduling
1103 * this physical eraseblock for erasure again would cause
1104 * errors again and again. Well, let's switch to RO mode.
1105 */
1106 ubi_ro_mode(ubi);
1107 return err;
1108 }
1109
1110 /* It is %-EIO, the PEB went bad */
1111
1112 if (!ubi->bad_allowed) {
1113 ubi_err("bad physical eraseblock %d detected", pnum);
1114 ubi_ro_mode(ubi);
1115 err = -EIO;
1116 } else {
1117 int need;
1118
1119 spin_lock(&ubi->volumes_lock);
1120 need = ubi->beb_rsvd_level - ubi->beb_rsvd_pebs + 1;
1121 if (need > 0) {
1122 need = ubi->avail_pebs >= need ? need : ubi->avail_pebs;
1123 ubi->avail_pebs -= need;
1124 ubi->rsvd_pebs += need;
1125 ubi->beb_rsvd_pebs += need;
1126 if (need > 0)
1127 ubi_msg("reserve more %d PEBs", need);
1128 }
1129
1130 if (ubi->beb_rsvd_pebs == 0) {
1131 spin_unlock(&ubi->volumes_lock);
1132 ubi_err("no reserved physical eraseblocks");
1133 ubi_ro_mode(ubi);
1134 return -EIO;
1135 }
1136
1137 spin_unlock(&ubi->volumes_lock);
1138 ubi_msg("mark PEB %d as bad", pnum);
1139
1140 err = ubi_io_mark_bad(ubi, pnum);
1141 if (err) {
1142 ubi_ro_mode(ubi);
1143 return err;
1144 }
1145
1146 spin_lock(&ubi->volumes_lock);
1147 ubi->beb_rsvd_pebs -= 1;
1148 ubi->bad_peb_count += 1;
1149 ubi->good_peb_count -= 1;
1150 ubi_calculate_reserved(ubi);
1151 if (ubi->beb_rsvd_pebs == 0)
1152 ubi_warn("last PEB from the reserved pool was used");
1153 spin_unlock(&ubi->volumes_lock);
1154 }
1155
1156 return err;
1157}
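
The reserve-pool bookkeeping that erase_worker() performs when a PEB goes bad can be followed with a small stand-alone model. This is a sketch under assumptions: struct pool, top_up_reserve() and consume_for_bad_peb() are hypothetical names, locking is left out, and the bad/good PEB counters and ubi_calculate_reserved() are omitted.

```c
#include <assert.h>

/* Hypothetical subset of the counters kept in struct ubi_device. */
struct pool {
	int avail_pebs;		/* PEBs not reserved for anything */
	int rsvd_pebs;		/* all reserved PEBs */
	int beb_rsvd_pebs;	/* PEBs reserved for bad-block handling */
	int beb_rsvd_level;	/* desired size of the bad-block reserve */
};

/*
 * Move PEBs from the available pool to the bad-block reserve so that one
 * can be consumed without dropping below the desired level. The amount is
 * clamped to what is available. Returns the number of PEBs moved.
 */
static int top_up_reserve(struct pool *p)
{
	int need = p->beb_rsvd_level - p->beb_rsvd_pebs + 1;

	if (need <= 0)
		return 0;
	if (need > p->avail_pebs)
		need = p->avail_pebs;
	p->avail_pebs -= need;
	p->rsvd_pebs += need;
	p->beb_rsvd_pebs += need;
	return need;
}

/*
 * Account for one PEB that went bad. Returns 0 if a reserved PEB was
 * consumed and -1 if the reserve is empty, which is the case where the
 * real code switches to read-only mode.
 */
static int consume_for_bad_peb(struct pool *p)
{
	top_up_reserve(p);
	if (p->beb_rsvd_pebs == 0)
		return -1;
	p->beb_rsvd_pebs -= 1;
	return 0;
}
```

Starting from 5 available PEBs, 2 reserved for bad blocks and a level of 4, the top-up moves 3 PEBs, and consuming one leaves the reserve exactly at the level.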
1158
1159/**
1160 * ubi_wl_put_peb - return a physical eraseblock to the wear-leveling
1161 * unit.
1162 * @ubi: UBI device description object
1163 * @pnum: physical eraseblock to return
1164 * @torture: if this physical eraseblock has to be tortured
1165 *
1166 * This function is called to return physical eraseblock @pnum to the pool of
1167 * free physical eraseblocks. The @torture flag has to be set if an I/O error
1168 * occurred on @pnum and the eraseblock has to be torture-tested. This function
1169 * returns zero in case of success and a negative error code in case of failure.
1170 */
1171int ubi_wl_put_peb(struct ubi_device *ubi, int pnum, int torture)
1172{
1173 int err;
1174 struct ubi_wl_entry *e;
1175
1176 dbg_wl("PEB %d", pnum);
1177 ubi_assert(pnum >= 0);
1178 ubi_assert(pnum < ubi->peb_count);
1179
1180 spin_lock(&ubi->wl_lock);
1181
1182 e = ubi->lookuptbl[pnum];
1183 if (e == ubi->move_from) {
1184 /*
1185 * User is putting the physical eraseblock which was selected to
1186 * be moved. It will be scheduled for erasure in the
1187 * wear-leveling worker.
1188 */
1189 dbg_wl("PEB %d is being moved", pnum);
1190 ubi_assert(!ubi->move_from_put);
1191 ubi->move_from_put = 1;
1192 spin_unlock(&ubi->wl_lock);
1193 return 0;
1194 } else if (e == ubi->move_to) {
1195 /*
1196 * User is putting the physical eraseblock which was selected
1197 * as the target the data is moved to. It may happen if the EBA
1198 * unit already re-mapped the LEB but the WL unit has not yet
1199 * put the PEB to the "used" tree.
1200 */
1201 dbg_wl("PEB %d is the target of data moving", pnum);
1202 ubi_assert(!ubi->move_to_put);
1203 ubi->move_to_put = 1;
1204 spin_unlock(&ubi->wl_lock);
1205 return 0;
1206 } else {
1207 if (in_wl_tree(e, &ubi->used))
1208 used_tree_del(ubi, e);
1209 else if (in_wl_tree(e, &ubi->scrub))
1210 scrub_tree_del(ubi, e);
1211 else
1212 prot_tree_del(ubi, e->pnum);
1213 }
1214 spin_unlock(&ubi->wl_lock);
1215
1216 err = schedule_erase(ubi, e, torture);
1217 if (err) {
1218 spin_lock(&ubi->wl_lock);
1219 used_tree_add(ubi, e);
1220 spin_unlock(&ubi->wl_lock);
1221 }
1222
1223 return err;
1224}
1225
1226/**
1227 * ubi_wl_scrub_peb - schedule a physical eraseblock for scrubbing.
1228 * @ubi: UBI device description object
1229 * @pnum: the physical eraseblock to schedule
1230 *
1231 * If a bit-flip in a physical eraseblock is detected, this physical eraseblock
1232 * needs scrubbing. This function schedules a physical eraseblock for
1233 * scrubbing which is done in background. This function returns zero in case of
1234 * success and a negative error code in case of failure.
1235 */
1236int ubi_wl_scrub_peb(struct ubi_device *ubi, int pnum)
1237{
1238 struct ubi_wl_entry *e;
1239
1240 ubi_msg("schedule PEB %d for scrubbing", pnum);
1241
1242retry:
1243 spin_lock(&ubi->wl_lock);
1244 e = ubi->lookuptbl[pnum];
1245 if (e == ubi->move_from || in_wl_tree(e, &ubi->scrub)) {
1246 spin_unlock(&ubi->wl_lock);
1247 return 0;
1248 }
1249
1250 if (e == ubi->move_to) {
1251 /*
1252 * This physical eraseblock was used to move data to. The data
1253 * was moved but the PEB was not yet inserted to the proper
1254 * tree. We should just wait a little and let the WL worker
1255 * proceed.
1256 */
1257 spin_unlock(&ubi->wl_lock);
1258 dbg_wl("the PEB %d is not in proper tree, retry", pnum);
1259 yield();
1260 goto retry;
1261 }
1262
1263 if (in_wl_tree(e, &ubi->used))
1264 used_tree_del(ubi, e);
1265 else
1266 prot_tree_del(ubi, pnum);
1267
1268 scrub_tree_add(ubi, e);
1269 spin_unlock(&ubi->wl_lock);
1270
1271 /*
1272 * Technically scrubbing is the same as wear-leveling, so it is done
1273 * by the WL worker.
1274 */
1275 return ensure_wear_leveling(ubi);
1276}
1277
1278/**
1279 * ubi_wl_flush - flush all pending works.
1280 * @ubi: UBI device description object
1281 *
1282 * This function returns zero in case of success and a negative error code in
1283 * case of failure.
1284 */
1285int ubi_wl_flush(struct ubi_device *ubi)
1286{
1287 int err, pending_count;
1288
1289 pending_count = ubi->works_count;
1290
1291 dbg_wl("flush (%d pending works)", pending_count);
1292
1293 /*
1294 * Erase while the pending works queue is not empty, but not more than
1295 * the number of currently pending works.
1296 */
1297 while (pending_count-- > 0) {
1298 err = do_work(ubi);
1299 if (err)
1300 return err;
1301 }
1302
1303 return 0;
1304}
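
The reason ubi_wl_flush() snapshots the pending count instead of looping until the queue is empty is that a work may queue follow-up work (an erase can schedule wear-leveling), so an "until empty" loop might never finish. A minimal model of that design choice, assuming a hypothetical counter-only queue (struct work_queue, do_one_work() and flush_snapshot() are illustrative names, not UBI APIs):

```c
#include <assert.h>

/* Hypothetical stand-in for the UBI work list: counters only, no real works. */
struct work_queue {
	int pending;	/* works currently queued */
	int executed;	/* works run so far */
	int respawn;	/* non-zero: every executed work queues a follow-up */
};

/* Run one queued work; optionally queue a follow-up, as an erase might. */
static int do_one_work(struct work_queue *q)
{
	assert(q->pending > 0);
	q->pending--;
	q->executed++;
	if (q->respawn)
		q->pending++;
	return 0;
}

/*
 * Run only the works that were pending when the flush started. Works queued
 * during the flush are left for later, so the loop always terminates.
 */
static int flush_snapshot(struct work_queue *q)
{
	int err, count = q->pending;

	while (count-- > 0) {
		err = do_one_work(q);
		if (err)
			return err;
	}
	return 0;
}
```

With three pending works that each queue a follow-up, the flush runs exactly three works and returns with three still queued.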
1305
1306/**
1307 * tree_destroy - destroy an RB-tree.
1308 * @root: the root of the tree to destroy
1309 */
1310static void tree_destroy(struct rb_root *root)
1311{
1312 struct rb_node *rb;
1313 struct ubi_wl_entry *e;
1314
1315 rb = root->rb_node;
1316 while (rb) {
1317 if (rb->rb_left)
1318 rb = rb->rb_left;
1319 else if (rb->rb_right)
1320 rb = rb->rb_right;
1321 else {
1322 e = rb_entry(rb, struct ubi_wl_entry, rb);
1323
1324 rb = rb_parent(rb);
1325 if (rb) {
1326 if (rb->rb_left == &e->rb)
1327 rb->rb_left = NULL;
1328 else
1329 rb->rb_right = NULL;
1330 }
1331
1332 kmem_cache_free(wl_entries_slab, e);
1333 }
1334 }
1335}
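
tree_destroy() frees the tree leaf-first without recursion: it descends to any leaf, steps back to the parent, unlinks the leaf and frees it. The same traversal can be tried on an ordinary binary tree with parent pointers. This is a user-space sketch under assumptions: struct node, node_attach() and tree_destroy_iter() are hypothetical, and malloc/free stand in for the slab cache.

```c
#include <assert.h>
#include <stdlib.h>

/* Plain binary tree node with a parent pointer (stand-in for struct rb_node). */
struct node {
	struct node *parent, *left, *right;
};

/* Allocate a node and link it under @parent as the left or right child. */
static struct node *node_attach(struct node *parent, int as_left)
{
	struct node *n = calloc(1, sizeof(*n));

	if (!n)
		return NULL;
	n->parent = parent;
	if (parent) {
		if (as_left)
			parent->left = n;
		else
			parent->right = n;
	}
	return n;
}

/* Free every node, leaves first, without recursion. Returns the number freed. */
static int tree_destroy_iter(struct node *root)
{
	struct node *n = root;
	int freed = 0;

	while (n) {
		if (n->left)
			n = n->left;
		else if (n->right)
			n = n->right;
		else {
			struct node *leaf = n;

			/* Step up first, then cut the link to the leaf. */
			n = leaf->parent;
			if (n) {
				if (n->left == leaf)
					n->left = NULL;
				else
					n->right = NULL;
			}
			free(leaf);
			freed++;
		}
	}
	return freed;
}
```

Unlinking the leaf from its parent before freeing it is what keeps the loop from descending into freed memory on the next iteration.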
1336
1337/**
1338 * ubi_thread - UBI background thread.
1339 * @u: the UBI device description object pointer
1340 */
1341static int ubi_thread(void *u)
1342{
1343 int failures = 0;
1344 struct ubi_device *ubi = u;
1345
1346 ubi_msg("background thread \"%s\" started, PID %d",
1347 ubi->bgt_name, current->pid);
1348
1349 for (;;) {
1350 int err;
1351
1352 if (kthread_should_stop())
1353 goto out;
1354
1355 if (try_to_freeze())
1356 continue;
1357
1358 spin_lock(&ubi->wl_lock);
1359 if (list_empty(&ubi->works) || ubi->ro_mode ||
1360 !ubi->thread_enabled) {
1361 set_current_state(TASK_INTERRUPTIBLE);
1362 spin_unlock(&ubi->wl_lock);
1363 schedule();
1364 continue;
1365 }
1366 spin_unlock(&ubi->wl_lock);
1367
1368 err = do_work(ubi);
1369 if (err) {
1370 ubi_err("%s: work failed with error code %d",
1371 ubi->bgt_name, err);
1372 if (failures++ > WL_MAX_FAILURES) {
1373 /*
1374 * Too many failures, disable the thread and
1375 * switch to read-only mode.
1376 */
1377 ubi_msg("%s: %d consecutive failures",
1378 ubi->bgt_name, WL_MAX_FAILURES);
1379 ubi_ro_mode(ubi);
1380 break;
1381 }
1382 } else
1383 failures = 0;
1384
1385 cond_resched();
1386 }
1387
1388out:
1389 dbg_wl("background thread \"%s\" is killed", ubi->bgt_name);
1390 return 0;
1391}
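
The failure check in ubi_thread() uses a post-increment, `failures++ > WL_MAX_FAILURES`, so the thread gives up later than the logged number suggests. The sketch below replays only that counter logic; failures_until_stop() is a hypothetical helper and the limit is passed as a parameter because the value of WL_MAX_FAILURES is not shown here.

```c
#include <assert.h>

/*
 * Count how many consecutive failed works are needed before the check
 * "failures++ > max_failures" becomes true.
 */
static int failures_until_stop(int max_failures)
{
	int failures = 0;
	int seen = 0;

	for (;;) {
		seen++;	/* one more work has failed */
		if (failures++ > max_failures)
			return seen;
	}
}
```

For a limit of 3 the check first succeeds on the fifth consecutive failure, that is, at limit + 2, while the log message reports the limit itself.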
1392
1393/**
1394 * cancel_pending - cancel all pending works.
1395 * @ubi: UBI device description object
1396 */
1397static void cancel_pending(struct ubi_device *ubi)
1398{
1399 while (!list_empty(&ubi->works)) {
1400 struct ubi_work *wrk;
1401
1402 wrk = list_entry(ubi->works.next, struct ubi_work, list);
1403 list_del(&wrk->list);
1404 wrk->func(ubi, wrk, 1);
1405 ubi->works_count -= 1;
1406 ubi_assert(ubi->works_count >= 0);
1407 }
1408}
1409
1410/**
1411 * ubi_wl_init_scan - initialize the wear-leveling unit using scanning
1412 * information.
1413 * @ubi: UBI device description object
1414 * @si: scanning information
1415 *
1416 * This function returns zero in case of success, and a negative error code in
1417 * case of failure.
1418 */
1419int ubi_wl_init_scan(struct ubi_device *ubi, struct ubi_scan_info *si)
1420{
1421 int err;
1422 struct rb_node *rb1, *rb2;
1423 struct ubi_scan_volume *sv;
1424 struct ubi_scan_leb *seb, *tmp;
1425 struct ubi_wl_entry *e;
1426
1427
1428 ubi->used = ubi->free = ubi->scrub = RB_ROOT;
1429 ubi->prot.pnum = ubi->prot.aec = RB_ROOT;
1430 spin_lock_init(&ubi->wl_lock);
1431 ubi->max_ec = si->max_ec;
1432 INIT_LIST_HEAD(&ubi->works);
1433
1434 sprintf(ubi->bgt_name, UBI_BGT_NAME_PATTERN, ubi->ubi_num);
1435
1436 ubi->bgt_thread = kthread_create(ubi_thread, ubi, ubi->bgt_name);
1437 if (IS_ERR(ubi->bgt_thread)) {
1438 err = PTR_ERR(ubi->bgt_thread);
1439 ubi_err("cannot spawn \"%s\", error %d", ubi->bgt_name,
1440 err);
1441 return err;
1442 }
1443
1444 if (ubi_devices_cnt == 0) {
1445 wl_entries_slab = kmem_cache_create("ubi_wl_entry_slab",
1446 sizeof(struct ubi_wl_entry),
1447 0, 0, NULL, NULL);
1448 if (!wl_entries_slab)
1449 return -ENOMEM;
1450 }
1451
1452 err = -ENOMEM;
1453 ubi->lookuptbl = kzalloc(ubi->peb_count * sizeof(void *), GFP_KERNEL);
1454 if (!ubi->lookuptbl)
1455 goto out_free;
1456
1457 list_for_each_entry_safe(seb, tmp, &si->erase, u.list) {
1458 cond_resched();
1459
1460 e = kmem_cache_alloc(wl_entries_slab, GFP_KERNEL);
1461 if (!e)
1462 goto out_free;
1463
1464 e->pnum = seb->pnum;
1465 e->ec = seb->ec;
1466 ubi->lookuptbl[e->pnum] = e;
1467 if (schedule_erase(ubi, e, 0)) {
1468 kmem_cache_free(wl_entries_slab, e);
1469 goto out_free;
1470 }
1471 }
1472
1473 list_for_each_entry(seb, &si->free, u.list) {
1474 cond_resched();
1475
1476 e = kmem_cache_alloc(wl_entries_slab, GFP_KERNEL);
1477 if (!e)
1478 goto out_free;
1479
1480 e->pnum = seb->pnum;
1481 e->ec = seb->ec;
1482 ubi_assert(e->ec >= 0);
1483 free_tree_add(ubi, e);
1484 ubi->lookuptbl[e->pnum] = e;
1485 }
1486
1487 list_for_each_entry(seb, &si->corr, u.list) {
1488 cond_resched();
1489
1490 e = kmem_cache_alloc(wl_entries_slab, GFP_KERNEL);
1491 if (!e)
1492 goto out_free;
1493
1494 e->pnum = seb->pnum;
1495 e->ec = seb->ec;
1496 ubi->lookuptbl[e->pnum] = e;
1497 if (schedule_erase(ubi, e, 0)) {
1498 kmem_cache_free(wl_entries_slab, e);
1499 goto out_free;
1500 }
1501 }
1502
1503 ubi_rb_for_each_entry(rb1, sv, &si->volumes, rb) {
1504 ubi_rb_for_each_entry(rb2, seb, &sv->root, u.rb) {
1505 cond_resched();
1506
1507 e = kmem_cache_alloc(wl_entries_slab, GFP_KERNEL);
1508 if (!e)
1509 goto out_free;
1510
1511 e->pnum = seb->pnum;
1512 e->ec = seb->ec;
1513 ubi->lookuptbl[e->pnum] = e;
1514 if (!seb->scrub) {
1515 dbg_wl("add PEB %d EC %d to the used tree",
1516 e->pnum, e->ec);
1517 used_tree_add(ubi, e);
1518 } else {
1519 dbg_wl("add PEB %d EC %d to the scrub tree",
1520 e->pnum, e->ec);
1521 scrub_tree_add(ubi, e);
1522 }
1523 }
1524 }
1525
1526 if (WL_RESERVED_PEBS > ubi->avail_pebs) {
1527 ubi_err("not enough physical eraseblocks (%d, need %d)",
1528 ubi->avail_pebs, WL_RESERVED_PEBS);
1529 goto out_free;
1530 }
1531 ubi->avail_pebs -= WL_RESERVED_PEBS;
1532 ubi->rsvd_pebs += WL_RESERVED_PEBS;
1533
1534 /* Schedule wear-leveling if needed */
1535 err = ensure_wear_leveling(ubi);
1536 if (err)
1537 goto out_free;
1538
1539 return 0;
1540
1541out_free:
1542 cancel_pending(ubi);
1543 tree_destroy(&ubi->used);
1544 tree_destroy(&ubi->free);
1545 tree_destroy(&ubi->scrub);
1546 kfree(ubi->lookuptbl);
1547 if (ubi_devices_cnt == 0)
1548 kmem_cache_destroy(wl_entries_slab);
1549 return err;
1550}
1551
1552/**
1553 * protection_trees_destroy - destroy the protection RB-trees.
1554 * @ubi: UBI device description object
1555 */
1556static void protection_trees_destroy(struct ubi_device *ubi)
1557{
1558 struct rb_node *rb;
1559 struct ubi_wl_prot_entry *pe;
1560
1561 rb = ubi->prot.aec.rb_node;
1562 while (rb) {
1563 if (rb->rb_left)
1564 rb = rb->rb_left;
1565 else if (rb->rb_right)
1566 rb = rb->rb_right;
1567 else {
1568 pe = rb_entry(rb, struct ubi_wl_prot_entry, rb_aec);
1569
1570 rb = rb_parent(rb);
1571 if (rb) {
1572 if (rb->rb_left == &pe->rb_aec)
1573 rb->rb_left = NULL;
1574 else
1575 rb->rb_right = NULL;
1576 }
1577
1578 kmem_cache_free(wl_entries_slab, pe->e);
1579 kfree(pe);
1580 }
1581 }
1582}
1583
1584/**
1585 * ubi_wl_close - close the wear-leveling unit.
1586 * @ubi: UBI device description object
1587 */
1588void ubi_wl_close(struct ubi_device *ubi)
1589{
1590 dbg_wl("disable \"%s\"", ubi->bgt_name);
1591 if (ubi->bgt_thread)
1592 kthread_stop(ubi->bgt_thread);
1593
1594 dbg_wl("close the UBI wear-leveling unit");
1595
1596 cancel_pending(ubi);
1597 protection_trees_destroy(ubi);
1598 tree_destroy(&ubi->used);
1599 tree_destroy(&ubi->free);
1600 tree_destroy(&ubi->scrub);
1601 kfree(ubi->lookuptbl);
1602 if (ubi_devices_cnt == 1)
1603 kmem_cache_destroy(wl_entries_slab);
1604}
1605
1606#ifdef CONFIG_MTD_UBI_DEBUG_PARANOID
1607
1608/**
1609 * paranoid_check_ec - make sure that the erase counter of a physical eraseblock
1610 * is correct.
1611 * @ubi: UBI device description object
1612 * @pnum: the physical eraseblock number to check
1613 * @ec: the erase counter to check
1614 *
1615 * This function returns zero if the erase counter of physical eraseblock @pnum
1616 * is equal to @ec, %1 if not, and a negative error code if an error
1617 * occurred.
1618 */
1619static int paranoid_check_ec(const struct ubi_device *ubi, int pnum, int ec)
1620{
1621 int err;
1622 long long read_ec;
1623 struct ubi_ec_hdr *ec_hdr;
1624
1625 ec_hdr = kzalloc(ubi->ec_hdr_alsize, GFP_KERNEL);
1626 if (!ec_hdr)
1627 return -ENOMEM;
1628
1629 err = ubi_io_read_ec_hdr(ubi, pnum, ec_hdr, 0);
1630 if (err && err != UBI_IO_BITFLIPS) {
1631 /* The header does not have to exist */
1632 err = 0;
1633 goto out_free;
1634 }
1635
1636 read_ec = ubi64_to_cpu(ec_hdr->ec);
1637 if (ec != read_ec) {
1638 ubi_err("paranoid check failed for PEB %d", pnum);
1639 ubi_err("read EC is %lld, should be %d", read_ec, ec);
1640 ubi_dbg_dump_stack();
1641 err = 1;
1642 } else
1643 err = 0;
1644
1645out_free:
1646 kfree(ec_hdr);
1647 return err;
1648}
1649
1650/**
1651 * paranoid_check_in_wl_tree - make sure that a wear-leveling entry is present
1652 * in a WL RB-tree.
1653 * @e: the wear-leveling entry to check
1654 * @root: the root of the tree
1655 *
1656 * This function returns zero if @e is in the @root RB-tree and %1 if it
1657 * is not.
1658 */
1659static int paranoid_check_in_wl_tree(struct ubi_wl_entry *e,
1660 struct rb_root *root)
1661{
1662 if (in_wl_tree(e, root))
1663 return 0;
1664
1665 ubi_err("paranoid check failed for PEB %d, EC %d, RB-tree %p ",
1666 e->pnum, e->ec, root);
1667 ubi_dbg_dump_stack();
1668 return 1;
1669}
1670
1671#endif /* CONFIG_MTD_UBI_DEBUG_PARANOID */