author     Artur Paszkiewicz <artur.paszkiewicz@intel.com>    2017-03-09 03:59:59 -0500
committer  Shaohua Li <shli@fb.com>                           2017-03-16 19:55:54 -0400
commit     3418d036c81dcb604b7c7c71b209d5890a8418aa (patch)
tree       d02a31103e09f82858bf149ebcb511e12ed6065a /Documentation/md/raid5-ppl.txt
parent     ff875738edd44e3bc892d378deacc50bccc9d70c (diff)
raid5-ppl: Partial Parity Log write logging implementation
Implement the calculation of partial parity for a stripe and PPL write
logging functionality. The description of PPL is added to the
documentation. More details can be found in the comments in raid5-ppl.c.
Attach a page for holding the partial parity data to stripe_head.
Allocate it only if mddev has the MD_HAS_PPL flag set.
Partial parity is the XOR of the data chunks of a stripe that are not modified
by the write and is calculated as follows (see the sketch after this list):
- reconstruct-write case:
  XOR the data from all disks in the stripe that are not being updated
- read-modify-write case:
  XOR the old data and the old parity from all disks that are being updated
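As a stand-alone illustration (user-space C, not the kernel implementation; the
NDATA and CHUNK sizes and the updated[] mask are made up for the example), the
following sketch shows that both calculation paths yield the same partial parity:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NDATA  4	/* data disks in the stripe (parity disk not counted) */
#define CHUNK 16	/* bytes per chunk, tiny for the example */

static void xor_into(uint8_t *dst, const uint8_t *src)
{
	for (int i = 0; i < CHUNK; i++)
		dst[i] ^= src[i];
}

int main(void)
{
	uint8_t data[NDATA][CHUNK], parity[CHUNK], pp_rcw[CHUNK], pp_rmw[CHUNK];
	int updated[NDATA] = { 1, 0, 1, 0 };	/* chunks touched by the write */

	srand(0);
	memset(parity, 0, CHUNK);
	for (int d = 0; d < NDATA; d++) {
		for (int i = 0; i < CHUNK; i++)
			data[d][i] = rand() & 0xff;
		xor_into(parity, data[d]);	/* old stripe parity */
	}

	/* reconstruct-write: XOR the data of all chunks NOT being updated */
	memset(pp_rcw, 0, CHUNK);
	for (int d = 0; d < NDATA; d++)
		if (!updated[d])
			xor_into(pp_rcw, data[d]);

	/* read-modify-write: XOR old parity with the old data of updated chunks */
	memcpy(pp_rmw, parity, CHUNK);
	for (int d = 0; d < NDATA; d++)
		if (updated[d])
			xor_into(pp_rmw, data[d]);

	/* both paths yield the same partial parity */
	assert(memcmp(pp_rcw, pp_rmw, CHUNK) == 0);
	puts("partial parity identical for rcw and rmw paths");
	return 0;
}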
Implement it using the async_tx API and integrate into raid_run_ops().
It must be called when we still have access to old data, so do it when
STRIPE_OP_BIODRAIN is set, but before ops_run_prexor5(). The result is
stored into sh->ppl_page.
Partial parity is not meaningful for a full stripe write and is not stored in
the log or used for recovery, so don't attempt to calculate it when the stripe
has STRIPE_FULL_WRITE set.
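A rough, kernel-style sketch of the intended ordering follows. It is
illustrative only and not the code from this patch: the function signatures are
simplified and ops_run_partial_parity() is an assumed helper name based on the
description above.

/*
 * Illustrative sketch: partial parity must be computed while the old data is
 * still in the stripe cache, i.e. together with STRIPE_OP_BIODRAIN but before
 * the rmw prexor step, and it is skipped for full stripe writes.
 */
static void raid_run_ops_sketch(struct stripe_head *sh, unsigned long ops_request)
{
	struct dma_async_tx_descriptor *tx = NULL;

	if (test_bit(STRIPE_OP_BIODRAIN, &ops_request) &&
	    sh->ppl_page &&				/* PPL enabled for this array */
	    !test_bit(STRIPE_FULL_WRITE, &sh->state))	/* meaningless for full writes */
		tx = ops_run_partial_parity(sh, tx);	/* result lands in sh->ppl_page */

	if (test_bit(STRIPE_OP_PREXOR, &ops_request))
		tx = ops_run_prexor5(sh, tx);		/* rmw: old data xor-ed out only after PP */

	/* biodrain, parity reconstruction, etc. follow as before */
}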
Put the PPL metadata structures in md_p.h because userspace tools (mdadm) will
also need to read and write the PPL.
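For orientation, a simplified, hypothetical idea of what such on-disk metadata
can look like is sketched below. The field names and layout are assumptions
made for this illustration; the authoritative definitions are the ones this
patch adds to md_p.h, which use little-endian on-disk types.

#include <stdint.h>

/* Hypothetical, simplified PPL metadata layout for illustration only. */
struct example_ppl_entry {
	uint64_t data_sector;	/* array sector the logged write targets */
	uint32_t pp_size;	/* bytes of partial parity that follow */
	uint32_t data_size;	/* bytes of data covered by this entry */
	uint32_t parity_disk;	/* member disk holding the stripe's parity */
	uint32_t checksum;	/* checksum of the partial parity data */
};

struct example_ppl_header {
	uint32_t signature;	/* identifies the owning array */
	uint64_t generation;	/* incremented on every PPL write */
	uint32_t entries_count;	/* valid entries in the array below */
	uint32_t checksum;	/* checksum of this header */
	struct example_ppl_entry entries[];	/* followed by partial parity data */
};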
For now, warn when PPL is used while the member disks' volatile write-back
cache is enabled. The warning can be removed once flushing the disk cache
before writing the PPL is implemented.
Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Diffstat (limited to 'Documentation/md/raid5-ppl.txt')
-rw-r--r--    Documentation/md/raid5-ppl.txt    44
1 file changed, 44 insertions, 0 deletions
diff --git a/Documentation/md/raid5-ppl.txt b/Documentation/md/raid5-ppl.txt
new file mode 100644
index 000000000000..127072b09363
--- /dev/null
+++ b/Documentation/md/raid5-ppl.txt
@@ -0,0 +1,44 @@
+Partial Parity Log
+
+Partial Parity Log (PPL) is a feature available for RAID5 arrays. The issue
+addressed by PPL is that after a dirty shutdown, parity of a particular stripe
+may become inconsistent with data on other member disks. If the array is also
+in a degraded state, there is no way to recalculate parity, because one of the
+disks is missing. This can lead to silent data corruption when rebuilding the
+array or using it as degraded - data calculated from parity for array blocks
+that have not been touched by a write request during the unclean shutdown can
+be incorrect. Such a condition is known as the RAID5 Write Hole. Because of
+this, md by default does not allow starting a dirty degraded array.
+
+Partial parity for a write operation is the XOR of stripe data chunks not
+modified by this write. It is just enough data needed for recovering from the
+write hole. XORing partial parity with the modified chunks produces parity for
+the stripe, consistent with its state before the write operation, regardless of
+which chunk writes have completed. If one of the not modified data disks of
+this stripe is missing, this updated parity can be used to recover its
+contents. PPL recovery is also performed when starting an array after an
+unclean shutdown and all disks are available, eliminating the need to resync
+the array. Because of this, using a write-intent bitmap and PPL together is not
+supported.
+
+When handling a write request PPL writes partial parity before new data and
+parity are dispatched to disks. PPL is a distributed log - it is stored on
+array member drives in the metadata area, on the parity drive of a particular
+stripe. It does not require a dedicated journaling drive. Write performance is
+reduced by up to 30%-40% but it scales with the number of drives in the array
+and the journaling drive does not become a bottleneck or a single point of
+failure.
+
+Unlike raid5-cache, the other solution in md for closing the write hole, PPL is
+not a true journal. It does not protect from losing in-flight data, only from
+silent data corruption. If a dirty disk of a stripe is lost, no PPL recovery is
+performed for this stripe (parity is not updated). So it is possible to have
+arbitrary data in the written part of a stripe if that disk is lost. In such a
+case the behavior is the same as in plain raid5.
+
+PPL is available for md version-1 metadata and external (specifically IMSM)
+metadata arrays. It can be enabled using the mdadm option --consistency-policy=ppl.
+
+Currently, volatile write-back cache should be disabled on all member drives
+when using PPL. Otherwise it cannot guarantee consistency in case of power
+failure.
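To make the recovery argument in the documentation above concrete, here is a
small stand-alone user-space C demonstration (not kernel code; the chunk size,
disk count and torn-write scenario are made up for the example). A write to
chunks 0 and 2 is interrupted after only chunk 0 reaches the disk, and the
unmodified chunk 1 is lost; its old contents are still reconstructed from the
logged partial parity:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NDATA  4	/* data disks in the stripe */
#define CHUNK 16	/* bytes per chunk, tiny for the example */

static void xor_into(uint8_t *dst, const uint8_t *src)
{
	for (int i = 0; i < CHUNK; i++)
		dst[i] ^= src[i];
}

static void fill_random(uint8_t *buf)
{
	for (int i = 0; i < CHUNK; i++)
		buf[i] = rand() & 0xff;
}

int main(void)
{
	uint8_t old[NDATA][CHUNK], disk[NDATA][CHUNK];
	uint8_t pp[CHUNK], parity[CHUNK], recovered[CHUNK];

	srand(1);
	for (int d = 0; d < NDATA; d++) {
		fill_random(old[d]);
		memcpy(disk[d], old[d], CHUNK);
	}

	/* a write targets chunks 0 and 2: log their partial parity first */
	memset(pp, 0, CHUNK);
	xor_into(pp, old[1]);
	xor_into(pp, old[3]);	/* pp = XOR of the chunks NOT being modified */

	/* dirty shutdown: chunk 0 was rewritten, chunk 2 and parity were not */
	fill_random(disk[0]);

	/* disk 1 (not modified by the write) is also lost: degraded array */

	/* PPL recovery: rebuild parity from pp and the modified chunks as found on disk */
	memcpy(parity, pp, CHUNK);
	xor_into(parity, disk[0]);
	xor_into(parity, disk[2]);

	/* normal RAID5 reconstruction of the missing chunk now yields its old contents */
	memcpy(recovered, parity, CHUNK);
	xor_into(recovered, disk[0]);
	xor_into(recovered, disk[2]);
	xor_into(recovered, disk[3]);

	assert(memcmp(recovered, old[1], CHUNK) == 0);
	puts("missing unmodified chunk recovered despite the torn stripe write");
	return 0;
}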