author | Linus Torvalds <torvalds@linux-foundation.org> | 2009-06-24 13:26:54 -0400 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2009-06-24 13:26:54 -0400 |
commit | c3cb5e193937c7aa50c323e7933507020bd26340 (patch) | |
tree | ea36213ccd29dc4caf2f729fd51b2d489b591a31 /Documentation | |
parent | ea94b5034bbebc964115f119d6cd330757fce7f9 (diff) | |
parent | f40c67f0f7e2767f80f7cbcbc1ab86c4113c202e (diff) |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm
* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm: (48 commits)
dm mpath: change to be request based
dm: disable interrupt when taking map_lock
dm: do not set QUEUE_ORDERED_DRAIN if request based
dm: enable request based option
dm: prepare for request based option
dm raid1: add userspace log
dm: calculate queue limits during resume not load
dm log: fix create_log_context to use logical_block_size of log device
dm target:s introduce iterate devices fn
dm table: establish queue limits by copying table limits
dm table: replace struct io_restrictions with struct queue_limits
dm table: validate device logical_block_size
dm table: ensure targets are aligned to logical_block_size
dm ioctl: support cookies for udev
dm: sysfs add suspended attribute
dm table: improve warning message when devices not freed before destruction
dm mpath: add service time load balancer
dm mpath: add queue length load balancer
dm mpath: add start_io and nr_bytes to path selectors
dm snapshot: use barrier when writing exception store
...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/device-mapper/dm-log.txt | 54 | ||||
-rw-r--r-- | Documentation/device-mapper/dm-queue-length.txt | 39 | ||||
-rw-r--r-- | Documentation/device-mapper/dm-service-time.txt | 91 |
3 files changed, 184 insertions, 0 deletions
diff --git a/Documentation/device-mapper/dm-log.txt b/Documentation/device-mapper/dm-log.txt
new file mode 100644
index 00000000000..994dd75475a
--- /dev/null
+++ b/Documentation/device-mapper/dm-log.txt
@@ -0,0 +1,54 @@
1 | Device-Mapper Logging | ||
2 | ===================== | ||
3 | The device-mapper logging code is used by some of the device-mapper | ||
4 | RAID targets to track regions of the disk that are not consistent. | ||
5 | A region (or portion of the address space) of the disk may be | ||
6 | inconsistent because a RAID stripe is currently being operated on or | ||
7 | a machine died while the region was being altered. In the case of | ||
8 | mirrors, a region would be considered dirty/inconsistent while you | ||
9 | are writing to it because the writes need to be replicated for all | ||
10 | the legs of the mirror and may not reach the legs at the same time. | ||
11 | Once all writes are complete, the region is considered clean again. | ||
12 | |||
13 | There is a generic logging interface that the device-mapper RAID | ||
14 | implementations use to perform logging operations (see | ||
15 | dm_dirty_log_type in include/linux/dm-dirty-log.h). Several | ||
16 | logging implementations are available and provide different | ||
17 | capabilities. The list includes: | ||
18 | |||
19 | Type       Files | ||
20 | ====       ===== | ||
21 | disk       drivers/md/dm-log.c | ||
22 | core       drivers/md/dm-log.c | ||
23 | userspace  drivers/md/dm-log-userspace* include/linux/dm-log-userspace.h | ||
24 | |||
25 | The "disk" log type | ||
26 | ------------------- | ||
27 | This log implementation commits the log state to disk. This way, the | ||
28 | logging state survives reboots/crashes. | ||
29 | |||
30 | The "core" log type | ||
31 | ------------------- | ||
32 | This log implementation keeps the log state in memory. The log state | ||
33 | will not survive a reboot or crash, but there may be a small boost in | ||
34 | performance. This method can also be used if no storage device is | ||
35 | available for storing log state. | ||
36 | |||
37 | The "userspace" log type | ||
38 | ------------------------ | ||
39 | This log type simply provides a way to export the log API to userspace, | ||
40 | so log implementations can be done there. This is done by forwarding most | ||
41 | logging requests to userspace, where a daemon receives and processes the | ||
42 | request. | ||
43 | |||
44 | The structures used for communication between kernel and userspace are | ||
45 | located in include/linux/dm-log-userspace.h. Due to the frequency, | ||
46 | diversity, and 2-way communication nature of the exchanges between | ||
47 | kernel and userspace, 'connector' is used as the interface for | ||
48 | communication. | ||
49 | |||
50 | There are currently two userspace log implementations that leverage this | ||
51 | framework - "clustered_disk" and "clustered_core". These implementations | ||
52 | provide a cluster-coherent log for shared-storage. Device-mapper mirroring | ||
53 | can be used in a shared-storage environment when the cluster log implementations | ||
54 | are employed. | ||
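The dirty/clean region tracking described in dm-log.txt above can be sketched in a few lines. This is only an illustration of the idea behind an in-memory ("core"-style) log, not the kernel implementation: the class name InMemoryRegionLog, its methods, and the region_size parameter are all hypothetical and do not correspond to the dm_dirty_log_type interface.

```python
from collections import Counter


class InMemoryRegionLog:
    """Hypothetical in-memory dirty-region tracker (illustration only)."""

    def __init__(self, region_size):
        # region_size: number of sectors per region (assumed unit)
        self.region_size = region_size
        # Number of writes currently in progress, keyed by region index
        self.pending = Counter()

    def region_of(self, sector):
        return sector // self.region_size

    def write_started(self, sector):
        # A region is dirty while at least one write to it is in progress
        self.pending[self.region_of(sector)] += 1

    def write_completed(self, sector):
        region = self.region_of(sector)
        self.pending[region] -= 1
        if self.pending[region] == 0:
            # All writes are complete, so the region is clean again
            del self.pending[region]

    def is_clean(self, sector):
        return self.region_of(sector) not in self.pending


log = InMemoryRegionLog(region_size=1024)
log.write_started(10)
log.write_started(20)      # second write to the same region
log.write_completed(10)
still_dirty = not log.is_clean(0)   # one write is still outstanding
log.write_completed(20)
clean_again = log.is_clean(0)
```

Because the state lives only in memory, it is lost on a crash, which matches the trade-off the "core" log type section describes.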
diff --git a/Documentation/device-mapper/dm-queue-length.txt b/Documentation/device-mapper/dm-queue-length.txt
new file mode 100644
index 00000000000..f4db2562175
--- /dev/null
+++ b/Documentation/device-mapper/dm-queue-length.txt
@@ -0,0 +1,39 @@
1 | dm-queue-length | ||
2 | =============== | ||
3 | |||
4 | dm-queue-length is a path selector module for device-mapper targets | ||
5 | that selects the path with the fewest in-flight I/Os. | ||
6 | The path selector name is 'queue-length'. | ||
7 | |||
8 | Table parameters for each path: [<repeat_count>] | ||
9 |     <repeat_count>: The number of I/Os to dispatch using the selected | ||
10 |         path before switching to the next path. | ||
11 |         If not given, an internal default is used. To check | ||
12 |         the default value, see the activated table. | ||
13 | |||
14 | Status for each path: <status> <fail-count> <in-flight> | ||
15 |     <status>: 'A' if the path is active, 'F' if the path is failed. | ||
16 |     <fail-count>: The number of path failures. | ||
17 |     <in-flight>: The number of in-flight I/Os on the path. | ||
18 | |||
19 | |||
20 | Algorithm | ||
21 | ========= | ||
22 | |||
23 | dm-queue-length increments/decrements 'in-flight' when an I/O is | ||
24 | dispatched/completed respectively. | ||
25 | dm-queue-length selects a path with the minimum 'in-flight'. | ||
26 | |||
27 | |||
28 | Examples | ||
29 | ======== | ||
30 | In this example, 2 paths (sda and sdb) are used with repeat_count == 128. | ||
31 | |||
32 | # echo "0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128" | \ | ||
33 | dmsetup create test | ||
34 | # | ||
35 | # dmsetup table | ||
36 | test: 0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128 | ||
37 | # | ||
38 | # dmsetup status | ||
39 | test: 0 10 multipath 2 0 0 0 1 1 E 0 2 1 8:0 A 0 0 8:16 A 0 0 | ||
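The queue-length algorithm described in dm-queue-length.txt above can be sketched as follows. This is a minimal illustration rather than the kernel code: the Path class and the select_path, start_io and end_io helpers are hypothetical names, the 'failed' flag is a simplification of path failure handling, and repeat_count is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str              # e.g. the "major:minor" string from the table
    in_flight: int = 0     # number of I/Os dispatched but not yet completed
    failed: bool = False   # simplified stand-in for the 'F' status


def select_path(paths):
    """Return the usable path with the fewest in-flight I/Os, or None."""
    usable = [p for p in paths if not p.failed]
    if not usable:
        return None
    # min() returns the first path among those tied for the minimum
    return min(usable, key=lambda p: p.in_flight)


def start_io(path):
    path.in_flight += 1    # incremented when an I/O is dispatched


def end_io(path):
    path.in_flight -= 1    # decremented when an I/O is completed


paths = [Path("8:0"), Path("8:16")]
first = select_path(paths)     # both idle, so the first path is chosen
start_io(first)
second = select_path(paths)    # 8:0 now has one I/O in flight
```

In this sketch a new selection is made for every I/O; a selector that honours repeat_count would keep using the chosen path for that many I/Os before selecting again.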
diff --git a/Documentation/device-mapper/dm-service-time.txt b/Documentation/device-mapper/dm-service-time.txt
new file mode 100644
index 00000000000..7d00668e97b
--- /dev/null
+++ b/Documentation/device-mapper/dm-service-time.txt
@@ -0,0 +1,91 @@
1 | dm-service-time | ||
2 | =============== | ||
3 | |||
4 | dm-service-time is a path selector module for device-mapper targets, | ||
5 | which selects a path with the shortest estimated service time for | ||
6 | the incoming I/O. | ||
7 | |||
8 | The service time for each path is estimated by dividing the total size | ||
9 | of in-flight I/Os on a path by the performance value of the path. | ||
10 | The performance value is a relative throughput value among all paths | ||
11 | in a path-group, and it can be specified as a table argument. | ||
12 | |||
13 | The path selector name is 'service-time'. | ||
14 | |||
15 | Table parameters for each path: [<repeat_count> [<relative_throughput>]] | ||
16 |     <repeat_count>: The number of I/Os to dispatch using the selected | ||
17 |         path before switching to the next path. | ||
18 |         If not given, an internal default is used. To check | ||
19 |         the default value, see the activated table. | ||
20 |     <relative_throughput>: The relative throughput value of the path | ||
21 |         among all paths in the path-group. | ||
22 |         The valid range is 0-100. | ||
23 |         If not given, the minimum non-zero value '1' is used. | ||
24 |         If '0' is given, the path isn't selected while | ||
25 |         other paths having a positive value are available. | ||
26 | |||
27 | Status for each path: <status> <fail-count> <in-flight-size> \ | ||
28 |     <relative_throughput> | ||
29 |     <status>: 'A' if the path is active, 'F' if the path is failed. | ||
30 |     <fail-count>: The number of path failures. | ||
31 |     <in-flight-size>: The size of in-flight I/Os on the path. | ||
32 |     <relative_throughput>: The relative throughput value of the path | ||
33 |         among all paths in the path-group. | ||
34 | |||
35 | |||
36 | Algorithm | ||
37 | ========= | ||
38 | |||
39 | dm-service-time adds the I/O size to 'in-flight-size' when the I/O is | ||
40 | dispatched and subtracts it when the I/O is completed. | ||
41 | Basically, dm-service-time selects the path with the minimum service time, | ||
42 | which is calculated by: | ||
43 | |||
44 | ('in-flight-size' + 'size-of-incoming-io') / 'relative_throughput' | ||
45 | |||
46 | However, the optimizations below are used to avoid the division | ||
47 | where possible. | ||
48 | |||
49 | 1. If the paths have the same 'relative_throughput', skip | ||
50 | the division and just compare the 'in-flight-size'. | ||
51 | |||
52 | 2. If the paths have the same 'in-flight-size', skip the division | ||
53 | and just compare the 'relative_throughput'. | ||
54 | |||
55 | 3. If some paths have non-zero 'relative_throughput' and others | ||
56 | have zero 'relative_throughput', ignore those paths with zero | ||
57 | 'relative_throughput'. | ||
58 | |||
59 | If none of these optimizations applies, calculate the service time of | ||
60 | each path and compare the results. | ||
61 | If the calculated service times are equal, the path with the larger | ||
62 | 'relative_throughput' may be better, so compare 'relative_throughput' | ||
63 | in that case. | ||
64 | |||
65 | |||
66 | Examples | ||
67 | ======== | ||
68 | In this example, 2 paths (sda and sdb) are used with repeat_count == 128. | ||
69 | If sda has an average throughput of 1GB/s and sdb has 4GB/s, the | ||
70 | 'relative_throughput' value may be '1' for sda and '4' for sdb. | ||
71 | |||
72 | # echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4" | \ | ||
73 | dmsetup create test | ||
74 | # | ||
75 | # dmsetup table | ||
76 | test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4 | ||
77 | # | ||
78 | # dmsetup status | ||
79 | test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 1 8:16 A 0 0 4 | ||
80 | |||
81 | |||
82 | Alternatively, '2' for sda and '8' for sdb would be equally valid. | ||
83 | |||
84 | # echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8" | \ | ||
85 | dmsetup create test | ||
86 | # | ||
87 | # dmsetup table | ||
88 | test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8 | ||
89 | # | ||
90 | # dmsetup status | ||
91 | test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 2 8:16 A 0 0 8 | ||
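The comparison rules listed under "Algorithm" in dm-service-time.txt can be sketched as a pairwise comparison. This is an illustration under stated assumptions, not the kernel code: the Path class and the better and select_path functions are hypothetical names, and the general case compares the two service times by cross-multiplication (a substitution made here to stay in integer arithmetic) instead of performing the division the text describes.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    relative_throughput: int = 1   # valid range 0-100 per the text
    in_flight_size: int = 0        # total size of in-flight I/Os


def better(a, b, incoming):
    """Return the preferred of two paths for an I/O of size `incoming`."""
    # 1. Same 'relative_throughput': just compare 'in-flight-size'.
    if a.relative_throughput == b.relative_throughput:
        return a if a.in_flight_size <= b.in_flight_size else b
    # 3. A path with zero throughput loses to one with non-zero throughput.
    if a.relative_throughput == 0:
        return b
    if b.relative_throughput == 0:
        return a
    # 2. Same 'in-flight-size': just compare 'relative_throughput'.
    if a.in_flight_size == b.in_flight_size:
        return a if a.relative_throughput > b.relative_throughput else b
    # General case: (size_a + in) / tp_a  versus  (size_b + in) / tp_b,
    # compared by cross-multiplication (both throughputs are positive here).
    lhs = (a.in_flight_size + incoming) * b.relative_throughput
    rhs = (b.in_flight_size + incoming) * a.relative_throughput
    if lhs != rhs:
        return a if lhs < rhs else b
    # Equal service time: prefer the larger 'relative_throughput'.
    return a if a.relative_throughput > b.relative_throughput else b


def select_path(paths, incoming):
    best = paths[0]
    for p in paths[1:]:
        best = better(best, p, incoming)
    return best


sda = Path("8:0", relative_throughput=1)
sdb = Path("8:16", relative_throughput=4)
idle_choice = select_path([sda, sdb], incoming=4096)   # both idle
sdb.in_flight_size = 16384
busy_choice = select_path([sda, sdb], incoming=4096)   # sdb heavily loaded
```

Scaling every 'relative_throughput' in a path-group by the same positive factor leaves each comparison above unchanged, which is why the 1/4 and 2/8 tables in the examples behave the same way in this sketch.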