=======
dm-raid
=======

The device-mapper RAID (dm-raid) target provides a bridge from DM to MD.
It allows the MD RAID drivers to be accessed using a device-mapper
interface.


Mapping Table Interface
-----------------------
The target is named "raid" and it accepts the following parameters::

  <raid_type> <#raid_params> <raid_params> \
    <#raid_devs> <metadata_dev0> <dev0> [.. <metadata_devN> <devN>]

<raid_type>:

  ============= ===============================================================
  raid0         RAID0 striping (no resilience)
  raid1         RAID1 mirroring
  raid4         RAID4 with dedicated last parity disk
  raid5_n       RAID5 with dedicated last parity disk supporting takeover
                Same as raid4

                - Transitory layout
  raid5_la      RAID5 left asymmetric

                - rotating parity 0 with data continuation
  raid5_ra      RAID5 right asymmetric

                - rotating parity N with data continuation
  raid5_ls      RAID5 left symmetric

                - rotating parity 0 with data restart
  raid5_rs      RAID5 right symmetric

                - rotating parity N with data restart
  raid6_zr      RAID6 zero restart

                - rotating parity zero (left-to-right) with data restart
  raid6_nr      RAID6 N restart

                - rotating parity N (right-to-left) with data restart
  raid6_nc      RAID6 N continue

                - rotating parity N (right-to-left) with data continuation
  raid6_n_6     RAID6 with dedicated parity disks

                - parity and Q-syndrome on the last 2 disks;
                  layout for takeover from/to raid4/raid5_n
  raid6_la_6    Same as "raid5_la" plus dedicated last Q-syndrome disk

                - layout for takeover from raid5_la from/to raid6
  raid6_ra_6    Same as "raid5_ra" plus dedicated last Q-syndrome disk

                - layout for takeover from raid5_ra from/to raid6
  raid6_ls_6    Same as "raid5_ls" plus dedicated last Q-syndrome disk

                - layout for takeover from raid5_ls from/to raid6
  raid6_rs_6    Same as "raid5_rs" plus dedicated last Q-syndrome disk

                - layout for takeover from raid5_rs from/to raid6
  raid10        Various RAID10-inspired algorithms chosen by additional
                params (see raid10_format and raid10_copies below)

                - RAID10: Striped Mirrors (aka 'Striping on top of mirrors')
                - RAID1E: Integrated Adjacent Stripe Mirroring
                - RAID1E: Integrated Offset Stripe Mirroring
                - and other similar RAID10 variants
  ============= ===============================================================

  Reference: Chapter 4 of
  http://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf

<#raid_params>: The number of parameters that follow.

<raid_params> consists of

  Mandatory parameters:
    <chunk_size>:
            Chunk size in sectors.  This parameter is often known as
            "stripe size".  It is the only mandatory parameter and
            is placed first.

  followed by optional parameters (in any order):
    [sync|nosync]
            Force or prevent RAID initialization.

    [rebuild <idx>]
            Rebuild drive number 'idx' (first drive is 0).
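
            E.g., as a sketch reusing the device numbers from the
            example tables below, a table line forcing reconstruction
            of the third leg (idx 2) of a raid4 set::

                0 1960893648 raid \
                        raid4 3 2048 rebuild 2 \
                        5 8:17 8:18 8:33 8:34 8:49 8:50 8:65 8:66 8:81 8:82
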
    [daemon_sleep <ms>]
            Interval between runs of the bitmap daemon that clears
            bits.  A longer interval means less bitmap I/O but
            resyncing after a failure is likely to take longer.

    [min_recovery_rate <kB/sec/disk>]
            Throttle RAID initialization
    [max_recovery_rate <kB/sec/disk>]
            Throttle RAID initialization
    [write_mostly <idx>]
            Mark drive index 'idx' write-mostly.
    [max_write_behind <sectors>]
            See '--write-behind=' (man mdadm)
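
            As a sketch, a two-leg raid1 set whose second leg is a
            slow (e.g. remote) device; the device numbers are
            illustrative, and the chunk size is given as 0 because
            raid1 does not stripe::

                0 209715200 raid \
                        raid1 5 0 write_mostly 1 max_write_behind 256 \
                        2 254:10 254:11 254:12 254:13
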
    [stripe_cache <sectors>]
            Stripe cache size (RAID 4/5/6 only)
    [region_size <sectors>]
            The region_size multiplied by the number of regions is the
            logical size of the array.  The bitmap records the device
            synchronisation state for each region.

    [raid10_copies <# copies>], [raid10_format <near|far|offset>]
            These two options are used to alter the default layout of
            a RAID10 configuration.  The number of copies can be
            specified, but the default is 2.  There are also three
            variations to how the copies are laid down - the default
            is "near".  Near copies are what most people think of with
            respect to mirroring.  If these options are left
            unspecified, or 'raid10_copies 2' and/or 'raid10_format
            near' are given, then the layouts for 2, 3 and 4 devices
            are:

            ======== ========== ==============
            2 drives 3 drives   4 drives
            ======== ========== ==============
            A1  A1   A1  A1  A2 A1  A1  A2  A2
            A2  A2   A2  A3  A3 A3  A3  A4  A4
            A3  A3   A4  A4  A5 A5  A5  A6  A6
            A4  A4   A5  A6  A6 A7  A7  A8  A8
            ..  ..   ..  ..  .. ..  ..  ..  ..
            ======== ========== ==============

            The 2-device layout is equivalent to 2-way RAID1.  The
            4-device layout is what a traditional RAID10 would look
            like.  The 3-device layout is what might be called a
            'RAID1E - Integrated Adjacent Stripe Mirroring'.

            If 'raid10_copies 2' and 'raid10_format far', then the
            layouts for 2, 3 and 4 devices are:

            ======== ============ ===================
            2 drives 3 drives     4 drives
            ======== ============ ===================
            A1  A2   A1   A2   A3 A1   A2   A3   A4
            A3  A4   A4   A5   A6 A5   A6   A7   A8
            A5  A6   A7   A8   A9 A9   A10  A11  A12
            ..  ..   ..   ..   .. ..   ..   ..   ..
            A2  A1   A3   A1   A2 A2   A1   A4   A3
            A4  A3   A6   A4   A5 A6   A5   A8   A7
            A6  A5   A9   A7   A8 A10  A9   A12  A11
            ..  ..   ..   ..   .. ..   ..   ..   ..
            ======== ============ ===================

            If 'raid10_copies 2' and 'raid10_format offset', then the
            layouts for 2, 3 and 4 devices are:

            ======== ========== ================
            2 drives 3 drives   4 drives
            ======== ========== ================
            A1  A2   A1  A2  A3 A1  A2  A3  A4
            A2  A1   A3  A1  A2 A2  A1  A4  A3
            A3  A4   A4  A5  A6 A5  A6  A7  A8
            A4  A3   A6  A4  A5 A6  A5  A8  A7
            A5  A6   A7  A8  A9 A9  A10 A11 A12
            A6  A5   A9  A7  A8 A10 A9  A12 A11
            ..  ..   ..  ..  .. ..  ..  ..  ..
            ======== ========== ================

            Here we see layouts closely akin to 'RAID1E - Integrated
            Offset Stripe Mirroring'.
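
            As a sketch, a 4-device raid10 table using two copies in
            the "offset" layout (size and device numbers are made
            up)::

                0 3906994176 raid \
                        raid10 5 2048 raid10_copies 2 raid10_format offset \
                        4 - 8:16 - 8:32 - 8:48 - 8:64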

    [delta_disks <N>]
            The delta_disks option value (-251 < N < +251) triggers
            device removal (negative value) or device addition
            (positive value) on any reshape-supporting raid level,
            i.e. 4/5/6 and 10.  RAID levels 4/5/6 allow for addition
            of devices (metadata and data device tuples); raid10_near
            and raid10_offset only allow for device addition.
            raid10_far does not support any reshaping at all.
            A minimum number of devices has to be kept to maintain
            resilience: 3 devices for raid4/5 and 4 devices for raid6.
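
            As a sketch, growing the raid4 set from the example tables
            below by one metadata/data device pair (the new pair is
            appended; a real reshape sequence also needs out-of-place
            space, see data_offset below)::

                0 1960893648 raid \
                        raid4 3 2048 delta_disks 1 \
                        6 8:17 8:18 8:33 8:34 8:49 8:50 \
                          8:65 8:66 8:81 8:82 8:97 8:98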

    [data_offset <sectors>]
            This option value defines the offset into each data device
            where the data starts.  This is used to provide
            out-of-place reshaping space to avoid writing over data
            while changing the layout of stripes, so that an
            interruption/crash can happen at any time without the risk
            of losing data.
            E.g. when adding devices to an existing raid set during
            forward reshaping, the out-of-place space will be
            allocated at the beginning of each raid device.  The
            kernel raid4/5/6/10 MD personalities supporting such
            device addition will read the data from the existing first
            stripes (those spanning the smaller number of devices)
            starting at data_offset, fill up a new stripe spanning the
            larger number of devices, calculate the redundancy blocks
            (parity/Q-syndrome) and write that new stripe to offset 0.
            The same is applied to all N-1 other new stripes.  This
            out-of-place scheme is used to change the RAID type (i.e.
            the allocation algorithm) as well, e.g. changing from
            raid5_ls to raid5_n.
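
            E.g., a sketch of a takeover of the raid4 set from the
            example tables below to raid5_n (an identical layout),
            loaded as a new table while the device is suspended::

                0 1960893648 raid \
                        raid5_n 1 2048 \
                        5 8:17 8:18 8:33 8:34 8:49 8:50 8:65 8:66 8:81 8:82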

    [journal_dev <dev>]
            This option adds a journal device to raid4/5/6 raid sets
            and uses it to close the 'write hole' caused by the
            non-atomic updates to the component devices, which can
            cause data loss during recovery.  The journal device is
            used in writethrough mode, thus causing writes to be
            throttled versus non-journaled raid4/5/6 sets.
            Takeover/reshape is not possible with a raid4/5/6 journal
            device; it has to be deconfigured before requesting these.

    [journal_mode <mode>]
            This option sets the caching mode on journaled raid4/5/6
            raid sets (see 'journal_dev <dev>' above) to 'writethrough'
            or 'writeback'.  If 'writeback' is selected, the journal
            device has to be resilient and must not suffer from the
            'write hole' problem itself (e.g. use raid1 or raid10) to
            avoid a single point of failure.
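
            As a sketch, a raid5_ls variant of the example-table set
            with a write-back journal on an illustrative device 254:8
            (which should itself be resilient)::

                0 1960893648 raid \
                        raid5_ls 5 2048 journal_dev 254:8 journal_mode writeback \
                        5 8:17 8:18 8:33 8:34 8:49 8:50 8:65 8:66 8:81 8:82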

<#raid_devs>: The number of devices composing the array.
    Each device consists of two entries.  The first is the device
    containing the metadata (if any); the second is the one containing
    the data.  A maximum of 64 metadata/data device entries is
    supported up to target version 1.8.0; version 1.9.0 supports up to
    253, a limit enforced by the MD kernel runtime.

    If a drive has failed or is missing at creation time, a '-' can be
    given for both the metadata and data drives for a given position.


Example Tables
--------------

::

  # RAID4 - 4 data drives, 1 parity (no metadata devices)
  # No metadata devices specified to hold superblock/bitmap info
  # Chunk size of 1MiB
  # (Lines separated for easy reading)

  0 1960893648 raid \
          raid4 1 2048 \
          5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81

  # RAID4 - 4 data drives, 1 parity (with metadata devices)
  # Chunk size of 1MiB, force RAID initialization,
  # min recovery rate at 20 kiB/sec/disk

  0 1960893648 raid \
          raid4 4 2048 sync min_recovery_rate 20 \
          5 8:17 8:18 8:33 8:34 8:49 8:50 8:65 8:66 8:81 8:82
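
Such a table can be activated and inspected with dmsetup; as a sketch
(the mapped device name "my_raid4" is illustrative)::

  dmsetup create my_raid4 --table \
          "0 1960893648 raid raid4 1 2048 \
           5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81"
  dmsetup status my_raid4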


Status Output
-------------
'dmsetup table' displays the table used to construct the mapping.
The optional parameters are always printed in the order listed
above, with "sync" or "nosync" always output ahead of the other
arguments, regardless of the order used when originally loading the
table.  Arguments that can be repeated are ordered by value.

'dmsetup status' yields information on the state and health of the
array.  The output is as follows (normally a single line, but expanded
here for clarity)::

  1: <s> <l> raid \
  2:      <raid_type> <#devices> <health_chars> \
  3:      <sync_ratio> <sync_action> <mismatch_cnt> <data_offset> <journal_char>

Line 1 is the standard output produced by device-mapper.

Lines 2 & 3 are produced by the raid target and are best explained by
example::

        0 1960893648 raid raid4 5 AAAAA 2/490221568 init 0

Here we can see the RAID type is raid4, there are 5 devices - all of
which are 'A'live, and the array is 2/490221568 of the way through its
initial recovery.  Here is a fuller description of the individual
fields:

  =============== =========================================================
  <raid_type>     Same as the <raid_type> used to create the array.
  <health_chars>  One char for each device, indicating:

                  - 'A' = alive and in-sync
                  - 'a' = alive but not in-sync
                  - 'D' = dead/failed.
  <sync_ratio>    The ratio indicating how much of the array has
                  undergone the process described by 'sync_action'.
                  If the 'sync_action' is "check" or "repair", then the
                  process of "resync" or "recover" can be considered
                  complete.
  <sync_action>   One of the following possible states:

                  idle
                          - No synchronization action is being
                            performed.
                  frozen
                          - The current action has been halted.
                  resync
                          - Array is undergoing its initial
                            synchronization or is resynchronizing after
                            an unclean shutdown (possibly aided by a
                            bitmap).
                  recover
                          - A device in the array is being rebuilt or
                            replaced.
                  check
                          - A user-initiated full check of the array is
                            being performed.  All blocks are read and
                            checked for consistency.  The number of
                            discrepancies found is recorded in
                            <mismatch_cnt>.  No changes are made to the
                            array by this action.
                  repair
                          - The same as "check", but discrepancies are
                            corrected.
                  reshape
                          - The array is undergoing a reshape.
  <mismatch_cnt>  The number of discrepancies found between mirror
                  copies in RAID1/10 or wrong parity values found in
                  RAID4/5/6.  This value is valid only after a "check"
                  of the array is performed.  A healthy array has a
                  'mismatch_cnt' of 0.
  <data_offset>   The current data offset to the start of the user data
                  on each component device of a raid set (see the
                  respective raid parameter to support out-of-place
                  reshaping).
  <journal_char>  - 'A' - active write-through journal device.
                  - 'a' - active write-back journal device.
                  - 'D' - dead journal device.
                  - '-' - no journal device.
  =============== =========================================================


Message Interface
-----------------
The dm-raid target will accept certain actions through the 'message'
interface.  ('man dmsetup' for more information on the message
interface.)  These actions include:

  ========= ================================================
  "idle"    Halt the current sync action.
  "frozen"  Freeze the current sync action.
  "resync"  Initiate/continue a resync.
  "recover" Initiate/continue a recovery process.
  "check"   Initiate a check (i.e. a "scrub") of the array.
  "repair"  Initiate a repair of the array.
  ========= ================================================
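
E.g., to initiate a scrub of a mapped raid device named 'my_raid' (the
name is illustrative); the progress and resulting <mismatch_cnt> can
then be read back via 'dmsetup status'::

  dmsetup message my_raid 0 check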


Discard Support
---------------
The implementation of discard support among hardware vendors varies.
When a block is discarded, some storage devices will return zeroes when
the block is read.  These devices set the 'discard_zeroes_data'
attribute.  Other devices will return random data.  Confusingly, some
devices that advertise 'discard_zeroes_data' will not reliably return
zeroes when discarded blocks are read!  Since RAID 4/5/6 uses blocks
from a number of devices to calculate parity blocks and (for
performance reasons) relies on 'discard_zeroes_data' being reliable, it
is important that the devices be consistent.  Blocks may be discarded
in the middle of a RAID 4/5/6 stripe and, if subsequent read results
are not consistent, the parity blocks may be calculated differently at
any time, making the parity blocks useless for redundancy.  It is
important to understand how your hardware behaves with discards if you
are going to enable discards with RAID 4/5/6.

Since the behavior of storage devices is unreliable in this respect,
even when reporting 'discard_zeroes_data', RAID 4/5/6 discard support
is disabled by default -- this ensures data integrity at the expense of
losing some performance.

Storage devices that properly support 'discard_zeroes_data' are
increasingly whitelisted in the kernel and can thus be trusted.

For trusted devices, the following dm-raid module parameter can be set
to safely enable discard support for RAID 4/5/6:

    'devices_handle_discard_safely'
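
For example, assuming dm-raid is built as a module, the parameter can
be set at load time or toggled at runtime through sysfs::

  modprobe dm-raid devices_handle_discard_safely=Y
  echo Y > /sys/module/dm_raid/parameters/devices_handle_discard_safely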


Version History
---------------

::

 1.0.0   Initial version.  Support for RAID 4/5/6
 1.1.0   Added support for RAID 1
 1.2.0   Handle creation of arrays that contain failed devices.
 1.3.0   Added support for RAID 10
 1.3.1   Allow device replacement/rebuild for RAID 10
 1.3.2   Fix/improve redundancy checking for RAID10
 1.4.0   Non-functional change.  Removes arg from mapping function.
 1.4.1   RAID10 fix redundancy validation checks (commit 55ebbb5).
 1.4.2   Add RAID10 "far" and "offset" algorithm support.
 1.5.0   Add message interface to allow manipulation of the sync_action.
         New status (STATUSTYPE_INFO) fields: sync_action and mismatch_cnt.
 1.5.1   Add ability to restore transiently failed devices on resume.
 1.5.2   'mismatch_cnt' is zero unless [last_]sync_action is "check".
 1.6.0   Add discard support (and devices_handle_discard_safely module param).
 1.7.0   Add support for MD RAID0 mappings.
 1.8.0   Explicitly check for compatible flags in the superblock metadata
         and reject to start the raid set if any are set by a newer
         target version, thus avoiding data corruption on a raid set
         with a reshape in progress.
 1.9.0   Add support for RAID level takeover/reshape/region size
         and set size reduction.
 1.9.1   Fix activation of existing RAID 4/10 mapped devices.
 1.9.2   Don't emit '- -' on the status table line in case the constructor
         fails reading a superblock.  Correctly emit 'maj:min1 maj:min2' and
         'D' on the status line.  If '- -' is passed into the constructor,
         emit '- -' on the table line and '-' as the status line health
         character.
 1.10.0  Add support for raid4/5/6 journal device.
 1.10.1  Fix data corruption on reshape request.
 1.11.0  Fix table line argument order
         (wrong raid10_copies/raid10_format sequence).
 1.11.1  Add raid4/5/6 journal write-back support via journal_mode option.
 1.12.1  Fix for MD deadlock between mddev_suspend() and md_write_start().
 1.13.0  Fix dev_health status at end of "recover" (was 'a', now 'A').
 1.13.1  Fix deadlock caused by early md_stop_writes().  Also fix size and
         state races.
 1.13.2  Fix raid redundancy validation and avoid keeping raid set frozen.
 1.14.0  Fix reshape race on small devices.  Fix stripe adding reshape
         deadlock/potential data corruption.  Update superblock when
         specific devices are requested via rebuild.  Fix RAID leg
         rebuild errors.