author     Artur Paszkiewicz <artur.paszkiewicz@intel.com>    2017-03-09 03:59:59 -0500
committer  Shaohua Li <shli@fb.com>                           2017-03-16 19:55:54 -0400
commit     3418d036c81dcb604b7c7c71b209d5890a8418aa (patch)
tree       d02a31103e09f82858bf149ebcb511e12ed6065a /Documentation/md/raid5-ppl.txt
parent     ff875738edd44e3bc892d378deacc50bccc9d70c (diff)
raid5-ppl: Partial Parity Log write logging implementation
Implement the calculation of partial parity for a stripe and PPL write
logging functionality. The description of PPL is added to the
documentation. More details can be found in the comments in raid5-ppl.c.
Attach a page for holding the partial parity data to stripe_head.
Allocate it only if mddev has the MD_HAS_PPL flag set.
Partial parity is the XOR of the data chunks of a stripe that are not modified
by the write and is calculated as follows (see the sketch after this list):
- reconstruct-write case:
  XOR the data from all disks in the stripe that are not being updated
- read-modify-write case:
  XOR the old data and the old parity from all disks that are being updated
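As a stand-alone illustration (user-space C, not the kernel implementation; the
NDATA and CHUNK sizes and the updated[] mask are made up for the example), the
following sketch shows that both calculation paths yield the same partial parity:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NDATA  4	/* data disks in the stripe (parity disk not counted) */
#define CHUNK 16	/* bytes per chunk, tiny for the example */

static void xor_into(uint8_t *dst, const uint8_t *src)
{
	for (int i = 0; i < CHUNK; i++)
		dst[i] ^= src[i];
}

int main(void)
{
	uint8_t data[NDATA][CHUNK], parity[CHUNK], pp_rcw[CHUNK], pp_rmw[CHUNK];
	int updated[NDATA] = { 1, 0, 1, 0 };	/* chunks touched by the write */

	srand(0);
	memset(parity, 0, CHUNK);
	for (int d = 0; d < NDATA; d++) {
		for (int i = 0; i < CHUNK; i++)
			data[d][i] = rand() & 0xff;
		xor_into(parity, data[d]);	/* old stripe parity */
	}

	/* reconstruct-write: XOR the data of all chunks NOT being updated */
	memset(pp_rcw, 0, CHUNK);
	for (int d = 0; d < NDATA; d++)
		if (!updated[d])
			xor_into(pp_rcw, data[d]);

	/* read-modify-write: XOR old parity with the old data of updated chunks */
	memcpy(pp_rmw, parity, CHUNK);
	for (int d = 0; d < NDATA; d++)
		if (updated[d])
			xor_into(pp_rmw, data[d]);

	/* both paths yield the same partial parity */
	assert(memcmp(pp_rcw, pp_rmw, CHUNK) == 0);
	puts("partial parity identical for rcw and rmw paths");
	return 0;
}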
Implement it using the async_tx API and integrate into raid_run_ops().
It must be called when we still have access to old data, so do it when
STRIPE_OP_BIODRAIN is set, but before ops_run_prexor5(). The result is
stored into sh->ppl_page.
Partial parity is not meaningful for a full stripe write and is not stored in
the log or used for recovery, so don't attempt to calculate it when the stripe
has STRIPE_FULL_WRITE set.
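A rough, kernel-style sketch of the intended ordering follows. It is
illustrative only and not the code from this patch: the function signatures are
simplified and ops_run_partial_parity() is an assumed helper name based on the
description above.

/*
 * Illustrative sketch: partial parity must be computed while the old data is
 * still in the stripe cache, i.e. together with STRIPE_OP_BIODRAIN but before
 * the rmw prexor step, and it is skipped for full stripe writes.
 */
static void raid_run_ops_sketch(struct stripe_head *sh, unsigned long ops_request)
{
	struct dma_async_tx_descriptor *tx = NULL;

	if (test_bit(STRIPE_OP_BIODRAIN, &ops_request) &&
	    sh->ppl_page &&				/* PPL enabled for this array */
	    !test_bit(STRIPE_FULL_WRITE, &sh->state))	/* meaningless for full writes */
		tx = ops_run_partial_parity(sh, tx);	/* result lands in sh->ppl_page */

	if (test_bit(STRIPE_OP_PREXOR, &ops_request))
		tx = ops_run_prexor5(sh, tx);		/* rmw: old data xor-ed out only after PP */

	/* biodrain, parity reconstruction, etc. follow as before */
}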
Put the PPL metadata structures in md_p.h because userspace tools (mdadm) will
also need to read and write the PPL.
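For orientation, a simplified, hypothetical idea of what such on-disk metadata
can look like is sketched below. The field names and layout are assumptions
made for this illustration; the authoritative definitions are the ones this
patch adds to md_p.h, which use little-endian on-disk types.

#include <stdint.h>

/* Hypothetical, simplified PPL metadata layout for illustration only. */
struct example_ppl_entry {
	uint64_t data_sector;	/* array sector the logged write targets */
	uint32_t pp_size;	/* bytes of partial parity that follow */
	uint32_t data_size;	/* bytes of data covered by this entry */
	uint32_t parity_disk;	/* member disk holding the stripe's parity */
	uint32_t checksum;	/* checksum of the partial parity data */
};

struct example_ppl_header {
	uint32_t signature;	/* identifies the owning array */
	uint64_t generation;	/* incremented on every PPL write */
	uint32_t entries_count;	/* valid entries in the array below */
	uint32_t checksum;	/* checksum of this header */
	struct example_ppl_entry entries[];	/* followed by partial parity data */
};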
For now, warn when PPL is used while the member disks' volatile write-back
cache is enabled. The warning can be removed once flushing the disk cache
before writing the PPL is implemented.
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Diffstat (limited to 'Documentation/md/raid5-ppl.txt')
-rw-r--r--    Documentation/md/raid5-ppl.txt    44
1 file changed, 44 insertions, 0 deletions
diff --git a/Documentation/md/raid5-ppl.txt b/Documentation/md/raid5-ppl.txt
new file mode 100644
index 000000000000..127072b09363
--- /dev/null
+++ b/Documentation/md/raid5-ppl.txt
@@ -0,0 +1,44 @@
+Partial Parity Log
+
+Partial Parity Log (PPL) is a feature available for RAID5 arrays. The issue
+addressed by PPL is that after a dirty shutdown, parity of a particular stripe
+may become inconsistent with data on other member disks. If the array is also
+in a degraded state, there is no way to recalculate parity, because one of the
+disks is missing. This can lead to silent data corruption when rebuilding the
+array or using it as degraded - data calculated from parity for array blocks
+that have not been touched by a write request during the unclean shutdown can
+be incorrect. Such a condition is known as the RAID5 Write Hole. Because of
+this, md by default does not allow starting a dirty degraded array.
+
+Partial parity for a write operation is the XOR of stripe data chunks not
+modified by this write. It is just enough data needed for recovering from the
+write hole. XORing partial parity with the modified chunks produces parity for
+the stripe, consistent with its state before the write operation, regardless of
+which chunk writes have completed. If one of the not modified data disks of
+this stripe is missing, this updated parity can be used to recover its
+contents. PPL recovery is also performed when starting an array after an
+unclean shutdown and all disks are available, eliminating the need to resync
+the array. Because of this, using a write-intent bitmap and PPL together is not
+supported.
+
+When handling a write request PPL writes partial parity before new data and
+parity are dispatched to disks. PPL is a distributed log - it is stored on
+array member drives in the metadata area, on the parity drive of a particular
+stripe. It does not require a dedicated journaling drive. Write performance is
+reduced by up to 30%-40% but it scales with the number of drives in the array
+and the journaling drive does not become a bottleneck or a single point of
+failure.
+
+Unlike raid5-cache, the other solution in md for closing the write hole, PPL is
+not a true journal. It does not protect from losing in-flight data, only from
+silent data corruption. If a dirty disk of a stripe is lost, no PPL recovery is
+performed for this stripe (parity is not updated). So it is possible to have
+arbitrary data in the written part of a stripe if that disk is lost. In such a
+case the behavior is the same as in plain raid5.
+
+PPL is available for md version-1 metadata and external (specifically IMSM)
+metadata arrays. It can be enabled using the mdadm option --consistency-policy=ppl.
+
+Currently, volatile write-back cache should be disabled on all member drives
+when using PPL. Otherwise it cannot guarantee consistency in case of power
+failure.
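To make the recovery argument in the documentation above concrete, here is a
small stand-alone user-space C demonstration (not kernel code; the chunk size,
disk count and torn-write scenario are made up for the example). A write to
chunks 0 and 2 is interrupted after only chunk 0 reaches the disk, and the
unmodified chunk 1 is lost; its old contents are still reconstructed from the
logged partial parity:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NDATA  4	/* data disks in the stripe */
#define CHUNK 16	/* bytes per chunk, tiny for the example */

static void xor_into(uint8_t *dst, const uint8_t *src)
{
	for (int i = 0; i < CHUNK; i++)
		dst[i] ^= src[i];
}

static void fill_random(uint8_t *buf)
{
	for (int i = 0; i < CHUNK; i++)
		buf[i] = rand() & 0xff;
}

int main(void)
{
	uint8_t old[NDATA][CHUNK], disk[NDATA][CHUNK];
	uint8_t pp[CHUNK], parity[CHUNK], recovered[CHUNK];

	srand(1);
	for (int d = 0; d < NDATA; d++) {
		fill_random(old[d]);
		memcpy(disk[d], old[d], CHUNK);
	}

	/* a write targets chunks 0 and 2: log their partial parity first */
	memset(pp, 0, CHUNK);
	xor_into(pp, old[1]);
	xor_into(pp, old[3]);	/* pp = XOR of the chunks NOT being modified */

	/* dirty shutdown: chunk 0 was rewritten, chunk 2 and parity were not */
	fill_random(disk[0]);

	/* disk 1 (not modified by the write) is also lost: degraded array */

	/* PPL recovery: rebuild parity from pp and the modified chunks as found on disk */
	memcpy(parity, pp, CHUNK);
	xor_into(parity, disk[0]);
	xor_into(parity, disk[2]);

	/* normal RAID5 reconstruction of the missing chunk now yields its old contents */
	memcpy(recovered, parity, CHUNK);
	xor_into(recovered, disk[0]);
	xor_into(recovered, disk[2]);
	xor_into(recovered, disk[3]);

	assert(memcmp(recovered, old[1], CHUNK) == 0);
	puts("missing unmodified chunk recovered despite the torn stripe write");
	return 0;
}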