Diffstat (limited to 'Documentation/device-mapper/dm-integrity.rst')
-rw-r--r-- | Documentation/device-mapper/dm-integrity.rst | 259 |
1 files changed, 259 insertions, 0 deletions
diff --git a/Documentation/device-mapper/dm-integrity.rst b/Documentation/device-mapper/dm-integrity.rst new file mode 100644 index 000000000000..a30aa91b5fbe --- /dev/null +++ b/Documentation/device-mapper/dm-integrity.rst | |||
@@ -0,0 +1,259 @@ | |||
1 | ============ | ||
2 | dm-integrity | ||
3 | ============ | ||
4 | |||
5 | The dm-integrity target emulates a block device that has additional | ||
6 | per-sector tags that can be used for storing integrity information. | ||
7 | |||
8 | A general problem with storing integrity tags with every sector is that | ||
9 | writing the sector and the integrity tag must be atomic - i.e. in case of a | |
10 | crash, either both the sector and the integrity tag are written or neither is. | |
11 | |||
12 | To guarantee write atomicity, the dm-integrity target uses a journal: it | |
13 | writes sector data and integrity tags into the journal, commits the journal | |
14 | and then copies the data and integrity tags to their respective locations. | |
15 | |||
16 | The dm-integrity target can be used with the dm-crypt target - in this | ||
17 | situation the dm-crypt target creates the integrity data and passes them | ||
18 | to the dm-integrity target via bio_integrity_payload attached to the bio. | ||
19 | In this mode, the dm-crypt and dm-integrity targets provide authenticated | ||
20 | disk encryption - if the attacker modifies the encrypted device, an I/O | ||
21 | error is returned instead of random data. | ||
22 | |||
23 | The dm-integrity target can also be used as a standalone target. In this | |
24 | mode it calculates and verifies the integrity tags internally, so it can | |
25 | be used to detect silent data corruption on the disk or in the I/O | |
26 | path. | |
27 | |||
28 | There's an alternate mode of operation where dm-integrity uses a bitmap | |
29 | instead of a journal. If a bit in the bitmap is 1, the corresponding | ||
30 | region's data and integrity tags are not synchronized - if the machine | ||
31 | crashes, the unsynchronized regions will be recalculated. The bitmap mode | ||
32 | is faster than the journal mode, because we don't have to write the data | ||
33 | twice, but it is also less reliable, because if data corruption happens | ||
34 | when the machine crashes, it may not be detected. | ||
35 | |||
36 | When loading the target for the first time, the kernel driver will format | ||
37 | the device. But it will only format the device if the superblock contains | ||
38 | zeroes. If the superblock is neither valid nor zeroed, the dm-integrity | ||
39 | target can't be loaded. | ||
40 | |||
41 | To use the target for the first time: | ||
42 | |||
43 | 1. overwrite the superblock with zeroes | ||
44 | 2. load the dm-integrity target with a size of one sector; the kernel | |
45 | driver will format the device | |
46 | 3. unload the dm-integrity target | ||
47 | 4. read the "provided_data_sectors" value from the superblock | ||
48 | 5. load the dm-integrity target with the target size | |
49 | "provided_data_sectors" | ||
50 | 6. if you want to use dm-integrity with dm-crypt, load the dm-crypt target | ||
51 | with the size "provided_data_sectors" | ||
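The two table loads in steps 2 and 5 can be sketched as follows. This is a minimal illustration that only assembles table lines in the usual device-mapper form (`<start> <length> <target> <arguments>`), using the argument order described in this document. The helper name `integrity_table`, the device path `/dev/sdX` and the `provided_data_sectors` value are assumptions made for the example; the real value has to be read from the superblock in step 4.

```python
def integrity_table(length_sectors, device, reserved_sectors=0,
                    tag_size="-", mode="J", options=()):
    """Build one device-mapper table line for the "integrity" target.

    Hypothetical helper: it only formats a string and does not talk
    to the kernel.
    """
    fields = [0, length_sectors, "integrity", device, reserved_sectors,
              tag_size, mode, len(options), *options]
    return " ".join(str(field) for field in fields)


# Step 2: a one-sector table, so that the kernel driver formats the device.
format_table = integrity_table(1, "/dev/sdX",
                               options=["internal_hash:crc32"])

# Step 5: the real table, sized from the superblock.
# The number below is made up for the example.
provided_data_sectors = 2048000
final_table = integrity_table(provided_data_sectors, "/dev/sdX",
                              options=["internal_hash:crc32"])

print(format_table)
print(final_table)
```

The resulting strings are what would be passed as the table when creating or reloading the mapped device.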
52 | |||
53 | |||
54 | Target arguments: | ||
55 | |||
56 | 1. the underlying block device | ||
57 | |||
58 | 2. the number of reserved sectors at the beginning of the device - the | |
59 | dm-integrity target won't read or write these sectors | |
60 | |||
61 | 3. the size of the integrity tag (if "-" is used, the size is taken from | ||
62 | the internal-hash algorithm) | ||
63 | |||
64 | 4. mode: | ||
65 | |||
66 | D - direct writes (without journal) | ||
67 | in this mode, journaling is | ||
68 | not used and data sectors and integrity tags are written | ||
69 | separately. In case of a crash, it is possible that the data | |
70 | and integrity tag don't match. | |
71 | J - journaled writes | ||
72 | data and integrity tags are written to the | ||
73 | journal and atomicity is guaranteed. In case of a crash, | |
74 | either both data and tag are written or neither is. The | |
75 | journaled mode roughly halves write throughput because the | |
76 | data has to be written twice. | |
77 | B - bitmap mode - data and metadata are written without any | |
78 | synchronization; the driver maintains a bitmap of dirty | |
79 | regions where data and metadata don't match. This mode can | ||
80 | only be used with internal hash. | ||
81 | R - recovery mode - in this mode, the journal is not replayed, | |
82 | checksums are not checked and writes to the device are not | ||
83 | allowed. This mode is useful for data recovery if the | ||
84 | device cannot be activated in any of the other standard | ||
85 | modes. | ||
86 | |||
87 | 5. the number of additional arguments | ||
88 | |||
89 | Additional arguments: | ||
90 | |||
91 | journal_sectors:number | ||
92 | The size of the journal. This argument is used only when formatting the | |
93 | device. If the device is already formatted, the value from the | ||
94 | superblock is used. | ||
95 | |||
96 | interleave_sectors:number | ||
97 | The number of interleaved sectors. This value is rounded down to | |
98 | a power of two. If the device is already formatted, the value from | ||
99 | the superblock is used. | ||
100 | |||
101 | meta_device:device | ||
102 | Don't interleave the data and metadata on one device. Use a | |
103 | separate device for metadata. | ||
104 | |||
105 | buffer_sectors:number | ||
106 | The number of sectors in one buffer. The value is rounded down to | ||
107 | a power of two. | ||
108 | |||
109 | The tag area is accessed using buffers; the buffer size is | |
110 | configurable. A larger buffer size means that each I/O will | |
111 | be larger, but fewer I/Os may be issued. | |
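Both interleave_sectors and buffer_sectors are described as being rounded down to a power of two. A minimal sketch of that rounding is shown below; the function name `round_down_pow2` is made up for the example and this is not the driver's code.

```python
def round_down_pow2(value):
    """Return the largest power of two that is <= value."""
    if value < 1:
        raise ValueError("value must be at least 1")
    return 1 << (value.bit_length() - 1)


print(round_down_pow2(100))   # 64
print(round_down_pow2(128))   # 128
```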
112 | |||
113 | journal_watermark:number | ||
114 | The journal watermark as a percentage. When the size of the journal | |
115 | exceeds this watermark, the thread that flushes the journal will | ||
116 | be started. | ||
117 | |||
118 | commit_time:number | ||
119 | Commit time in milliseconds. When this time passes, the journal is | ||
120 | written. The journal is also written immediately if a FLUSH | |
121 | request is received. | ||
122 | |||
123 | internal_hash:algorithm(:key) (the key is optional) | ||
124 | Use internal hash or crc. | ||
125 | When this argument is used, the dm-integrity target won't accept | ||
126 | integrity tags from the upper target, but it will automatically | ||
127 | generate and verify the integrity tags. | ||
128 | |||
129 | You can use a crc algorithm (such as crc32); the integrity target | |
130 | will then protect the data against accidental corruption. | |
131 | You can also use a hmac algorithm (for example | |
132 | "hmac(sha256):0123456789abcdef"); in this mode it will provide | |
133 | cryptographic authentication of the data without encryption. | |
134 | |||
135 | When this argument is not used, the integrity tags are accepted | ||
136 | from an upper layer target, such as dm-crypt. The upper layer | ||
137 | target should check the validity of the integrity tags. | ||
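The difference between a crc tag and a keyed hmac tag can be illustrated in user space. The sketch below is only an analogy using the Python standard library: it hashes the sector contents alone, which is an assumption, since the exact input the driver feeds to the hash is not described here. The key is the example key from the text above.

```python
import hashlib
import hmac
import zlib

SECTOR_SIZE = 512


def crc32_tag(sector):
    """4-byte tag that detects accidental corruption."""
    return zlib.crc32(sector).to_bytes(4, "little")


def hmac_tag(sector, key):
    """32-byte keyed tag; it cannot be forged without the key."""
    return hmac.new(key, sector, hashlib.sha256).digest()


key = bytes.fromhex("0123456789abcdef")
sector = bytes(SECTOR_SIZE)
corrupted = b"\x01" + sector[1:]          # flip one bit in the first byte

# A stored tag no longer matches once the data has changed.
crc_ok = crc32_tag(corrupted) == crc32_tag(sector)
hmac_ok = hmac.compare_digest(hmac_tag(corrupted, key), hmac_tag(sector, key))
print(crc_ok, hmac_ok)
```

Anyone who can write to the disk can recompute a crc, so only the keyed variant protects against deliberate modification.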
138 | |||
139 | recalculate | ||
140 | Recalculate the integrity tags automatically. It is only valid | ||
141 | when using internal hash. | ||
142 | |||
143 | journal_crypt:algorithm(:key) (the key is optional) | ||
144 | Encrypt the journal using the given algorithm to make sure that the | |
145 | attacker can't read the journal. You can use a block cipher here | ||
146 | (such as "cbc(aes)") or a stream cipher (for example "chacha20", | ||
147 | "salsa20", "ctr(aes)" or "ecb(arc4)"). | ||
148 | |||
149 | The journal contains a history of the last writes to the block device; | |
150 | an attacker reading the journal could see the last sector numbers | |
151 | that were written. From the sector numbers, the attacker can infer | ||
152 | the size of files that were written. To protect against this | ||
153 | situation, you can encrypt the journal. | ||
154 | |||
155 | journal_mac:algorithm(:key) (the key is optional) | ||
156 | Protect sector numbers in the journal from accidental or malicious | ||
157 | modification. To protect against accidental modification, use a | |
158 | crc algorithm; to protect against malicious modification, use a | |
159 | hmac algorithm with a key. | |
160 | |||
161 | This option is not needed when using internal-hash because in this | ||
162 | mode, the integrity of journal entries is checked when replaying | ||
163 | the journal. Thus, a modified sector number would be detected at | |
164 | this stage. | ||
165 | |||
166 | block_size:number | ||
167 | The size of a data block in bytes. The larger the block size the | ||
168 | less overhead there is for per-block integrity metadata. | ||
169 | Supported values are 512, 1024, 2048 and 4096 bytes. If not | ||
170 | specified the default block size is 512 bytes. | ||
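The effect of block_size on metadata overhead can be worked out directly: one tag is stored per block, so the overhead is the tag size divided by the block size. The sketch below assumes a 4-byte tag (as a crc32 would produce) and ignores journal and padding space; the function name is made up for the example.

```python
SUPPORTED_BLOCK_SIZES = (512, 1024, 2048, 4096)


def tag_overhead(tag_size, block_size):
    """Fraction of extra space used by integrity tags alone."""
    if block_size not in SUPPORTED_BLOCK_SIZES:
        raise ValueError("unsupported block size")
    return tag_size / block_size


for block_size in SUPPORTED_BLOCK_SIZES:
    print(block_size, f"{tag_overhead(4, block_size):.4%}")
```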
171 | |||
172 | sectors_per_bit:number | ||
173 | In the bitmap mode, this parameter specifies the number of | ||
174 | 512-byte sectors that correspond to one bitmap bit. | |
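The mapping from a sector to its bitmap bit follows from the definition of sectors_per_bit. A minimal sketch, with hypothetical function names and made-up example values:

```python
def bitmap_bit(sector, sectors_per_bit):
    """Index of the bitmap bit that covers the given 512-byte sector."""
    return sector // sectors_per_bit


def bitmap_bits_needed(device_sectors, sectors_per_bit):
    """Number of bits needed to cover the whole device (rounded up)."""
    return -(-device_sectors // sectors_per_bit)


print(bitmap_bit(100000, 32768))             # 3
print(bitmap_bits_needed(2048000, 32768))    # 63
```

A larger sectors_per_bit gives a smaller bitmap, but each dirty bit covers a larger region that has to be recalculated after a crash.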
175 | |||
176 | bitmap_flush_interval:number | ||
177 | The bitmap flush interval in milliseconds. The metadata buffers | ||
178 | are synchronized when this interval expires. | ||
179 | |||
180 | |||
181 | The journal mode (D/J), buffer_sectors, journal_watermark and commit_time can | |
182 | be changed when reloading the target (load an inactive table and swap the | |
183 | tables with suspend and resume). The other arguments should not be changed | |
184 | when reloading the target because the layout of disk data depends on them | |
185 | and the reloaded target would be non-functional. | |
186 | |||
187 | |||
188 | The layout of the formatted block device: | ||
189 | |||
190 | * reserved sectors | ||
191 | (they are not used by this target, they can be used for | ||
192 | storing LUKS metadata or for other purposes), the size of the reserved | |
193 | area is specified in the target arguments | ||
194 | |||
195 | * superblock (4KiB) | |
196 | * magic string - identifies that the device was formatted | ||
197 | * version | ||
198 | * log2(interleave sectors) | ||
199 | * integrity tag size | ||
200 | * the number of journal sections | ||
201 | * provided data sectors - the number of sectors that this target | ||
202 | provides (i.e. the size of the device minus the size of all | ||
203 | metadata and padding). The user of this target should not send | ||
204 | bios that access data beyond the "provided data sectors" limit. | ||
205 | * flags | ||
206 | SB_FLAG_HAVE_JOURNAL_MAC | ||
207 | - a flag is set if journal_mac is used | ||
208 | SB_FLAG_RECALCULATING | ||
209 | - recalculating is in progress | ||
210 | SB_FLAG_DIRTY_BITMAP | ||
211 | - journal area contains the bitmap of dirty | ||
212 | blocks | ||
213 | * log2(sectors per block) | ||
214 | * a position where recalculating finished | ||
215 | * journal | ||
216 | The journal is divided into sections, each section contains: | ||
217 | |||
218 | * metadata area (4KiB), it contains journal entries | |
219 | |||
220 | - every journal entry contains: | ||
221 | |||
222 | * logical sector (specifies where the data and tag should | ||
223 | be written) | ||
224 | * last 8 bytes of data | ||
225 | * integrity tag (the size is specified in the superblock) | ||
226 | |||
227 | - every metadata sector ends with | ||
228 | |||
229 | * mac (8 bytes). The macs in the 8 metadata sectors together | |
230 | form a 64-byte value, which stores the hmac of the sector | |
231 | numbers in the journal section. This protects against | |
232 | the possibility that an attacker tampers with sector | |
233 | numbers in the journal. | |
234 | * commit id | ||
235 | |||
236 | * data area (the size is variable; it depends on how many journal | ||
237 | entries fit into the metadata area) | ||
238 | |||
239 | - every sector in the data area contains: | ||
240 | |||
241 | * data (504 bytes of data, the last 8 bytes are stored in | ||
242 | the journal entry) | ||
243 | * commit id | ||
244 | |||
245 | To test if the whole journal section was written correctly, every | ||
246 | 512-byte sector of the journal ends with an 8-byte commit id. If the | |
247 | commit id matches on all sectors in a journal section, then it is | ||
248 | assumed that the section was written correctly. If the commit id | ||
249 | doesn't match, the section was written partially and it should not | ||
250 | be replayed. | ||
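The commit id check described above can be sketched as follows. This is a simplified model that follows the wording of this document: every 512-byte sector of a section carries the same 8-byte commit id in its last 8 bytes. That is an assumption; the driver may derive the expected id per sector. The function name and the example id are made up.

```python
SECTOR_SIZE = 512
COMMIT_ID_SIZE = 8


def section_written_completely(section, expected_id):
    """True if every sector of the section ends with the expected id."""
    if len(section) == 0 or len(section) % SECTOR_SIZE:
        raise ValueError("section must be a whole number of sectors")
    return all(
        section[end - COMMIT_ID_SIZE:end] == expected_id
        for end in range(SECTOR_SIZE, len(section) + 1, SECTOR_SIZE)
    )


commit_id = (0x1122334455667788).to_bytes(8, "little")   # made-up id
payload = bytes(SECTOR_SIZE - COMMIT_ID_SIZE)             # 504 bytes of data
good_section = (payload + commit_id) * 4
torn_section = good_section[:-COMMIT_ID_SIZE] + bytes(COMMIT_ID_SIZE)

print(section_written_completely(good_section, commit_id))   # True
print(section_written_completely(torn_section, commit_id))   # False
```

A section that fails this check is treated as partially written and is skipped during replay.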
251 | |||
252 | * one or more runs of interleaved tags and data. | ||
253 | Each run contains: | ||
254 | |||
255 | * tag area - it contains integrity tags. There is one tag for each | ||
256 | sector in the data area | ||
257 | * data area - it contains data sectors. The number of data sectors | ||
258 | in one run must be a power of two. log2 of this value is stored | ||
259 | in the superblock. | ||
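Because the number of data sectors per run is a power of two and its log2 is stored in the superblock, finding the run that holds a logical sector needs only a shift and a mask. The sketch below computes the run index and the offset inside that run's data area. It deliberately does not compute physical disk offsets, since those also depend on the tag area size and padding, which are not specified here. The function name is hypothetical.

```python
def locate_in_runs(logical_sector, log2_interleave_sectors):
    """Return (run index, sector offset inside that run's data area)."""
    run = logical_sector >> log2_interleave_sectors
    offset = logical_sector & ((1 << log2_interleave_sectors) - 1)
    return run, offset


# With 32768 data sectors per run (log2 = 15):
print(locate_in_runs(100000, 15))   # (3, 1696)
```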