aboutsummaryrefslogtreecommitdiffstats
path: root/Documentation
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2009-06-24 13:26:54 -0400
committerLinus Torvalds <torvalds@linux-foundation.org>2009-06-24 13:26:54 -0400
commitc3cb5e193937c7aa50c323e7933507020bd26340 (patch)
treeea36213ccd29dc4caf2f729fd51b2d489b591a31 /Documentation
parentea94b5034bbebc964115f119d6cd330757fce7f9 (diff)
parentf40c67f0f7e2767f80f7cbcbc1ab86c4113c202e (diff)
Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm
* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm: (48 commits) dm mpath: change to be request based dm: disable interrupt when taking map_lock dm: do not set QUEUE_ORDERED_DRAIN if request based dm: enable request based option dm: prepare for request based option dm raid1: add userspace log dm: calculate queue limits during resume not load dm log: fix create_log_context to use logical_block_size of log device dm target:s introduce iterate devices fn dm table: establish queue limits by copying table limits dm table: replace struct io_restrictions with struct queue_limits dm table: validate device logical_block_size dm table: ensure targets are aligned to logical_block_size dm ioctl: support cookies for udev dm: sysfs add suspended attribute dm table: improve warning message when devices not freed before destruction dm mpath: add service time load balancer dm mpath: add queue length load balancer dm mpath: add start_io and nr_bytes to path selectors dm snapshot: use barrier when writing exception store ...
Diffstat (limited to 'Documentation')
-rw-r--r--Documentation/device-mapper/dm-log.txt54
-rw-r--r--Documentation/device-mapper/dm-queue-length.txt39
-rw-r--r--Documentation/device-mapper/dm-service-time.txt91
3 files changed, 184 insertions, 0 deletions
diff --git a/Documentation/device-mapper/dm-log.txt b/Documentation/device-mapper/dm-log.txt
new file mode 100644
index 00000000000..994dd75475a
--- /dev/null
+++ b/Documentation/device-mapper/dm-log.txt
@@ -0,0 +1,54 @@
1Device-Mapper Logging
2=====================
3The device-mapper logging code is used by some of the device-mapper
4RAID targets to track regions of the disk that are not consistent.
5A region (or portion of the address space) of the disk may be
6inconsistent because a RAID stripe is currently being operated on or
7a machine died while the region was being altered. In the case of
8mirrors, a region would be considered dirty/inconsistent while you
9are writing to it because the writes need to be replicated for all
10the legs of the mirror and may not reach the legs at the same time.
11Once all writes are complete, the region is considered clean again.
12
13There is a generic logging interface that the device-mapper RAID
14implementations use to perform logging operations (see
15dm_dirty_log_type in include/linux/dm-dirty-log.h). Various different
16logging implementations are available and provide different
17capabilities. The list includes:
18
19Type Files
20==== =====
21disk drivers/md/dm-log.c
22core drivers/md/dm-log.c
23userspace drivers/md/dm-log-userspace* include/linux/dm-log-userspace.h
24
25The "disk" log type
26-------------------
27This log implementation commits the log state to disk. This way, the
28logging state survives reboots/crashes.
29
30The "core" log type
31-------------------
32This log implementation keeps the log state in memory. The log state
33will not survive a reboot or crash, but there may be a small boost in
34performance. This method can also be used if no storage device is
35available for storing log state.
36
37The "userspace" log type
38------------------------
39This log type simply provides a way to export the log API to userspace,
40so log implementations can be done there. This is done by forwarding most
41logging requests to userspace, where a daemon receives and processes the
42request.
43
44The structure used for communication between kernel and userspace are
45located in include/linux/dm-log-userspace.h. Due to the frequency,
46diversity, and 2-way communication nature of the exchanges between
47kernel and userspace, 'connector' is used as the interface for
48communication.
49
50There are currently two userspace log implementations that leverage this
51framework - "clustered_disk" and "clustered_core". These implementations
52provide a cluster-coherent log for shared-storage. Device-mapper mirroring
53can be used in a shared-storage environment when the cluster log implementations
54are employed.
diff --git a/Documentation/device-mapper/dm-queue-length.txt b/Documentation/device-mapper/dm-queue-length.txt
new file mode 100644
index 00000000000..f4db2562175
--- /dev/null
+++ b/Documentation/device-mapper/dm-queue-length.txt
@@ -0,0 +1,39 @@
1dm-queue-length
2===============
3
4dm-queue-length is a path selector module for device-mapper targets,
5which selects a path with the least number of in-flight I/Os.
6The path selector name is 'queue-length'.
7
8Table parameters for each path: [<repeat_count>]
9 <repeat_count>: The number of I/Os to dispatch using the selected
10 path before switching to the next path.
11 If not given, internal default is used. To check
12 the default value, see the activated table.
13
14Status for each path: <status> <fail-count> <in-flight>
15 <status>: 'A' if the path is active, 'F' if the path is failed.
16 <fail-count>: The number of path failures.
17 <in-flight>: The number of in-flight I/Os on the path.
18
19
20Algorithm
21=========
22
23dm-queue-length increments/decrements 'in-flight' when an I/O is
24dispatched/completed respectively.
25dm-queue-length selects a path with the minimum 'in-flight'.
26
27
28Examples
29========
30In case that 2 paths (sda and sdb) are used with repeat_count == 128.
31
32# echo "0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128" \
33 dmsetup create test
34#
35# dmsetup table
36test: 0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128
37#
38# dmsetup status
39test: 0 10 multipath 2 0 0 0 1 1 E 0 2 1 8:0 A 0 0 8:16 A 0 0
diff --git a/Documentation/device-mapper/dm-service-time.txt b/Documentation/device-mapper/dm-service-time.txt
new file mode 100644
index 00000000000..7d00668e97b
--- /dev/null
+++ b/Documentation/device-mapper/dm-service-time.txt
@@ -0,0 +1,91 @@
1dm-service-time
2===============
3
4dm-service-time is a path selector module for device-mapper targets,
5which selects a path with the shortest estimated service time for
6the incoming I/O.
7
8The service time for each path is estimated by dividing the total size
9of in-flight I/Os on a path with the performance value of the path.
10The performance value is a relative throughput value among all paths
11in a path-group, and it can be specified as a table argument.
12
13The path selector name is 'service-time'.
14
15Table parameters for each path: [<repeat_count> [<relative_throughput>]]
16 <repeat_count>: The number of I/Os to dispatch using the selected
17 path before switching to the next path.
18 If not given, internal default is used. To check
19 the default value, see the activated table.
20 <relative_throughput>: The relative throughput value of the path
21 among all paths in the path-group.
22 The valid range is 0-100.
23 If not given, minimum value '1' is used.
24 If '0' is given, the path isn't selected while
25 other paths having a positive value are available.
26
27Status for each path: <status> <fail-count> <in-flight-size> \
28 <relative_throughput>
29 <status>: 'A' if the path is active, 'F' if the path is failed.
30 <fail-count>: The number of path failures.
31 <in-flight-size>: The size of in-flight I/Os on the path.
32 <relative_throughput>: The relative throughput value of the path
33 among all paths in the path-group.
34
35
36Algorithm
37=========
38
39dm-service-time adds the I/O size to 'in-flight-size' when the I/O is
40dispatched and substracts when completed.
41Basically, dm-service-time selects a path having minimum service time
42which is calculated by:
43
44 ('in-flight-size' + 'size-of-incoming-io') / 'relative_throughput'
45
46However, some optimizations below are used to reduce the calculation
47as much as possible.
48
49 1. If the paths have the same 'relative_throughput', skip
50 the division and just compare the 'in-flight-size'.
51
52 2. If the paths have the same 'in-flight-size', skip the division
53 and just compare the 'relative_throughput'.
54
55 3. If some paths have non-zero 'relative_throughput' and others
56 have zero 'relative_throughput', ignore those paths with zero
57 'relative_throughput'.
58
59If such optimizations can't be applied, calculate service time, and
60compare service time.
61If calculated service time is equal, the path having maximum
62'relative_throughput' may be better. So compare 'relative_throughput'
63then.
64
65
66Examples
67========
68In case that 2 paths (sda and sdb) are used with repeat_count == 128
69and sda has an average throughput 1GB/s and sdb has 4GB/s,
70'relative_throughput' value may be '1' for sda and '4' for sdb.
71
72# echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4" \
73 dmsetup create test
74#
75# dmsetup table
76test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4
77#
78# dmsetup status
79test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 1 8:16 A 0 0 4
80
81
82Or '2' for sda and '8' for sdb would be also true.
83
84# echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8" \
85 dmsetup create test
86#
87# dmsetup table
88test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8
89#
90# dmsetup status
91test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 2 8:16 A 0 0 8