diff options
author | Sascha Hauer <s.hauer@pengutronix.de> | 2018-09-07 08:36:46 -0400 |
---|---|---|
committer | Richard Weinberger <richard@nod.at> | 2018-10-23 07:49:01 -0400 |
commit | e453fa60e086786fe89ba15ee8fef80bc2e6ecc3 (patch) | |
tree | eb99c1f706f6b387c734acfe6025380339114523 | |
parent | d8a22773a12c6d78ee758c9e530f3a488bb7cb29 (diff) |
Documentation: ubifs: Add authentication whitepaper
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
-rw-r--r-- | Documentation/filesystems/ubifs-authentication.md | 426 |
1 files changed, 426 insertions, 0 deletions
diff --git a/Documentation/filesystems/ubifs-authentication.md b/Documentation/filesystems/ubifs-authentication.md new file mode 100644 index 000000000000..028b3e2e25f9 --- /dev/null +++ b/Documentation/filesystems/ubifs-authentication.md | |||
@@ -0,0 +1,426 @@ | |||
1 | % UBIFS Authentication | ||
2 | % sigma star gmbh | ||
3 | % 2018 | ||
4 | |||
5 | # Introduction | ||
6 | |||
7 | UBIFS utilizes the fscrypt framework to provide confidentiality for file | ||
8 | contents and file names. This prevents attacks where an attacker is able to | ||
9 | read contents of the filesystem on a single point in time. A classic example | ||
10 | is a lost smartphone where the attacker is unable to read personal data stored | ||
11 | on the device without the filesystem decryption key. | ||
12 | |||
13 | At the current state, UBIFS encryption however does not prevent attacks where | ||
14 | the attacker is able to modify the filesystem contents and the user uses the | ||
15 | device afterwards. In such a scenario an attacker can modify filesystem | ||
16 | contents arbitrarily without the user noticing. One example is to modify a | ||
17 | binary to perform a malicious action when executed [DMC-CBC-ATTACK]. Since | ||
18 | most of the filesystem metadata of UBIFS is stored in plain, this makes it | ||
19 | fairly easy to swap files and replace their contents. | ||
20 | |||
21 | Other full disk encryption systems like dm-crypt cover all filesystem metadata, | ||
22 | which makes such kinds of attacks more complicated, but not impossible. | ||
23 | Especially, if the attacker is given access to the device multiple points in | ||
24 | time. For dm-crypt and other filesystems that build upon the Linux block IO | ||
25 | layer, the dm-integrity or dm-verity subsystems [DM-INTEGRITY, DM-VERITY] | ||
26 | can be used to get full data authentication at the block layer. | ||
27 | These can also be combined with dm-crypt [CRYPTSETUP2]. | ||
28 | |||
29 | This document describes an approach to get file contents _and_ full metadata | ||
30 | authentication for UBIFS. Since UBIFS uses fscrypt for file contents and file | ||
31 | name encryption, the authentication system could be tied into fscrypt such that | ||
32 | existing features like key derivation can be utilized. It should however also | ||
33 | be possible to use UBIFS authentication without using encryption. | ||
34 | |||
35 | |||
36 | ## MTD, UBI & UBIFS | ||
37 | |||
38 | On Linux, the MTD (Memory Technology Devices) subsystem provides a uniform | ||
39 | interface to access raw flash devices. One of the more prominent subsystems that | ||
40 | work on top of MTD is UBI (Unsorted Block Images). It provides volume management | ||
41 | for flash devices and is thus somewhat similar to LVM for block devices. In | ||
42 | addition, it deals with flash-specific wear-leveling and transparent I/O error | ||
43 | handling. UBI offers logical erase blocks (LEBs) to the layers on top of it | ||
44 | and maps them transparently to physical erase blocks (PEBs) on the flash. | ||
45 | |||
46 | UBIFS is a filesystem for raw flash which operates on top of UBI. Thus, wear | ||
47 | leveling and some flash specifics are left to UBI, while UBIFS focuses on | ||
48 | scalability, performance and recoverability. | ||
49 | |||
50 | |||
51 | |||
52 | +------------+ +*******+ +-----------+ +-----+ | ||
53 | | | * UBIFS * | UBI-BLOCK | | ... | | ||
54 | | JFFS/JFFS2 | +*******+ +-----------+ +-----+ | ||
55 | | | +-----------------------------+ +-----------+ +-----+ | ||
56 | | | | UBI | | MTD-BLOCK | | ... | | ||
57 | +------------+ +-----------------------------+ +-----------+ +-----+ | ||
58 | +------------------------------------------------------------------+ | ||
59 | | MEMORY TECHNOLOGY DEVICES (MTD) | | ||
60 | +------------------------------------------------------------------+ | ||
61 | +-----------------------------+ +--------------------------+ +-----+ | ||
62 | | NAND DRIVERS | | NOR DRIVERS | | ... | | ||
63 | +-----------------------------+ +--------------------------+ +-----+ | ||
64 | |||
65 | Figure 1: Linux kernel subsystems for dealing with raw flash | ||
66 | |||
67 | |||
68 | |||
69 | Internally, UBIFS maintains multiple data structures which are persisted on | ||
70 | the flash: | ||
71 | |||
72 | - *Index*: an on-flash B+ tree where the leaf nodes contain filesystem data | ||
73 | - *Journal*: an additional data structure to collect FS changes before updating | ||
74 | the on-flash index and reduce flash wear. | ||
75 | - *Tree Node Cache (TNC)*: an in-memory B+ tree that reflects the current FS | ||
76 | state to avoid frequent flash reads. It is basically the in-memory | ||
77 | representation of the index, but contains additional attributes. | ||
78 | - *LEB property tree (LPT)*: an on-flash B+ tree for free space accounting per | ||
79 | UBI LEB. | ||
80 | |||
81 | In the remainder of this section we will cover the on-flash UBIFS data | ||
82 | structures in more detail. The TNC is of less importance here since it is never | ||
83 | persisted onto the flash directly. More details on UBIFS can also be found in | ||
84 | [UBIFS-WP]. | ||
85 | |||
86 | |||
87 | ### UBIFS Index & Tree Node Cache | ||
88 | |||
89 | Basic on-flash UBIFS entities are called *nodes*. UBIFS knows different types | ||
90 | of nodes. Eg. data nodes (`struct ubifs_data_node`) which store chunks of file | ||
91 | contents or inode nodes (`struct ubifs_ino_node`) which represent VFS inodes. | ||
92 | Almost all types of nodes share a common header (`ubifs_ch`) containing basic | ||
93 | information like node type, node length, a sequence number, etc. (see | ||
94 | `fs/ubifs/ubifs-media.h`in kernel source). Exceptions are entries of the LPT | ||
95 | and some less important node types like padding nodes which are used to pad | ||
96 | unusable content at the end of LEBs. | ||
97 | |||
98 | To avoid re-writing the whole B+ tree on every single change, it is implemented | ||
99 | as *wandering tree*, where only the changed nodes are re-written and previous | ||
100 | versions of them are obsoleted without erasing them right away. As a result, | ||
101 | the index is not stored in a single place on the flash, but *wanders* around | ||
102 | and there are obsolete parts on the flash as long as the LEB containing them is | ||
103 | not reused by UBIFS. To find the most recent version of the index, UBIFS stores | ||
104 | a special node called *master node* into UBI LEB 1 which always points to the | ||
105 | most recent root node of the UBIFS index. For recoverability, the master node | ||
106 | is additionally duplicated to LEB 2. Mounting UBIFS is thus a simple read of | ||
107 | LEB 1 and 2 to get the current master node and from there get the location of | ||
108 | the most recent on-flash index. | ||
109 | |||
110 | The TNC is the in-memory representation of the on-flash index. It contains some | ||
111 | additional runtime attributes per node which are not persisted. One of these is | ||
112 | a dirty-flag which marks nodes that have to be persisted the next time the | ||
113 | index is written onto the flash. The TNC acts as a write-back cache and all | ||
114 | modifications of the on-flash index are done through the TNC. Like other caches, | ||
115 | the TNC does not have to mirror the full index into memory, but reads parts of | ||
116 | it from flash whenever needed. A *commit* is the UBIFS operation of updating the | ||
117 | on-flash filesystem structures like the index. On every commit, the TNC nodes | ||
118 | marked as dirty are written to the flash to update the persisted index. | ||
119 | |||
120 | |||
121 | ### Journal | ||
122 | |||
123 | To avoid wearing out the flash, the index is only persisted (*commited*) when | ||
124 | certain conditions are met (eg. `fsync(2)`). The journal is used to record | ||
125 | any changes (in form of inode nodes, data nodes etc.) between commits | ||
126 | of the index. During mount, the journal is read from the flash and replayed | ||
127 | onto the TNC (which will be created on-demand from the on-flash index). | ||
128 | |||
129 | UBIFS reserves a bunch of LEBs just for the journal called *log area*. The | ||
130 | amount of log area LEBs is configured on filesystem creation (using | ||
131 | `mkfs.ubifs`) and stored in the superblock node. The log area contains only | ||
132 | two types of nodes: *reference nodes* and *commit start nodes*. A commit start | ||
133 | node is written whenever an index commit is performed. Reference nodes are | ||
134 | written on every journal update. Each reference node points to the position of | ||
135 | other nodes (inode nodes, data nodes etc.) on the flash that are part of this | ||
136 | journal entry. These nodes are called *buds* and describe the actual filesystem | ||
137 | changes including their data. | ||
138 | |||
139 | The log area is maintained as a ring. Whenever the journal is almost full, | ||
140 | a commit is initiated. This also writes a commit start node so that during | ||
141 | mount, UBIFS will seek for the most recent commit start node and just replay | ||
142 | every reference node after that. Every reference node before the commit start | ||
143 | node will be ignored as they are already part of the on-flash index. | ||
144 | |||
145 | When writing a journal entry, UBIFS first ensures that enough space is | ||
146 | available to write the reference node and buds part of this entry. Then, the | ||
147 | reference node is written and afterwards the buds describing the file changes. | ||
148 | On replay, UBIFS will record every reference node and inspect the location of | ||
149 | the referenced LEBs to discover the buds. If these are corrupt or missing, | ||
150 | UBIFS will attempt to recover them by re-reading the LEB. This is however only | ||
151 | done for the last referenced LEB of the journal. Only this can become corrupt | ||
152 | because of a power cut. If the recovery fails, UBIFS will not mount. An error | ||
153 | for every other LEB will directly cause UBIFS to fail the mount operation. | ||
154 | |||
155 | |||
156 | | ---- LOG AREA ---- | ---------- MAIN AREA ------------ | | ||
157 | |||
158 | -----+------+-----+--------+---- ------+-----+-----+--------------- | ||
159 | \ | | | | / / | | | \ | ||
160 | / CS | REF | REF | | \ \ DENT | INO | INO | / | ||
161 | \ | | | | / / | | | \ | ||
162 | ----+------+-----+--------+--- -------+-----+-----+---------------- | ||
163 | | | ^ ^ | ||
164 | | | | | | ||
165 | +------------------------+ | | ||
166 | | | | ||
167 | +-------------------------------+ | ||
168 | |||
169 | |||
170 | Figure 2: UBIFS flash layout of log area with commit start nodes | ||
171 | (CS) and reference nodes (REF) pointing to main area | ||
172 | containing their buds | ||
173 | |||
174 | |||
175 | ### LEB Property Tree/Table | ||
176 | |||
177 | The LEB property tree is used to store per-LEB information. This includes the | ||
178 | LEB type and amount of free and *dirty* (old, obsolete content) space [1] on | ||
179 | the LEB. The type is important, because UBIFS never mixes index nodes with data | ||
180 | nodes on a single LEB and thus each LEB has a specific purpose. This again is | ||
181 | useful for free space calculations. See [UBIFS-WP] for more details. | ||
182 | |||
183 | The LEB property tree again is a B+ tree, but it is much smaller than the | ||
184 | index. Due to its smaller size it is always written as one chunk on every | ||
185 | commit. Thus, saving the LPT is an atomic operation. | ||
186 | |||
187 | |||
188 | [1] Since LEBs can only be appended and never overwritten, there is a | ||
189 | difference between free space ie. the remaining space left on the LEB to be | ||
190 | written to without erasing it and previously written content that is obsolete | ||
191 | but can't be overwritten without erasing the full LEB. | ||
192 | |||
193 | |||
194 | # UBIFS Authentication | ||
195 | |||
196 | This chapter introduces UBIFS authentication which enables UBIFS to verify | ||
197 | the authenticity and integrity of metadata and file contents stored on flash. | ||
198 | |||
199 | |||
200 | ## Threat Model | ||
201 | |||
202 | UBIFS authentication enables detection of offline data modification. While it | ||
203 | does not prevent it, it enables (trusted) code to check the integrity and | ||
204 | authenticity of on-flash file contents and filesystem metadata. This covers | ||
205 | attacks where file contents are swapped. | ||
206 | |||
207 | UBIFS authentication will not protect against rollback of full flash contents. | ||
208 | Ie. an attacker can still dump the flash and restore it at a later time without | ||
209 | detection. It will also not protect against partial rollback of individual | ||
210 | index commits. That means that an attacker is able to partially undo changes. | ||
211 | This is possible because UBIFS does not immediately overwrites obsolete | ||
212 | versions of the index tree or the journal, but instead marks them as obsolete | ||
213 | and garbage collection erases them at a later time. An attacker can use this by | ||
214 | erasing parts of the current tree and restoring old versions that are still on | ||
215 | the flash and have not yet been erased. This is possible, because every commit | ||
216 | will always write a new version of the index root node and the master node | ||
217 | without overwriting the previous version. This is further helped by the | ||
218 | wear-leveling operations of UBI which copies contents from one physical | ||
219 | eraseblock to another and does not atomically erase the first eraseblock. | ||
220 | |||
221 | UBIFS authentication does not cover attacks where an attacker is able to | ||
222 | execute code on the device after the authentication key was provided. | ||
223 | Additional measures like secure boot and trusted boot have to be taken to | ||
224 | ensure that only trusted code is executed on a device. | ||
225 | |||
226 | |||
227 | ## Authentication | ||
228 | |||
229 | To be able to fully trust data read from flash, all UBIFS data structures | ||
230 | stored on flash are authenticated. That is: | ||
231 | |||
232 | - The index which includes file contents, file metadata like extended | ||
233 | attributes, file length etc. | ||
234 | - The journal which also contains file contents and metadata by recording changes | ||
235 | to the filesystem | ||
236 | - The LPT which stores UBI LEB metadata which UBIFS uses for free space accounting | ||
237 | |||
238 | |||
239 | ### Index Authentication | ||
240 | |||
241 | Through UBIFS' concept of a wandering tree, it already takes care of only | ||
242 | updating and persisting changed parts from leaf node up to the root node | ||
243 | of the full B+ tree. This enables us to augment the index nodes of the tree | ||
244 | with a hash over each node's child nodes. As a result, the index basically also | ||
245 | a Merkle tree. Since the leaf nodes of the index contain the actual filesystem | ||
246 | data, the hashes of their parent index nodes thus cover all the file contents | ||
247 | and file metadata. When a file changes, the UBIFS index is updated accordingly | ||
248 | from the leaf nodes up to the root node including the master node. This process | ||
249 | can be hooked to recompute the hash only for each changed node at the same time. | ||
250 | Whenever a file is read, UBIFS can verify the hashes from each leaf node up to | ||
251 | the root node to ensure the node's integrity. | ||
252 | |||
253 | To ensure the authenticity of the whole index, the UBIFS master node stores a | ||
254 | keyed hash (HMAC) over its own contents and a hash of the root node of the index | ||
255 | tree. As mentioned above, the master node is always written to the flash whenever | ||
256 | the index is persisted (ie. on index commit). | ||
257 | |||
258 | Using this approach only UBIFS index nodes and the master node are changed to | ||
259 | include a hash. All other types of nodes will remain unchanged. This reduces | ||
260 | the storage overhead which is precious for users of UBIFS (ie. embedded | ||
261 | devices). | ||
262 | |||
263 | |||
264 | +---------------+ | ||
265 | | Master Node | | ||
266 | | (hash) | | ||
267 | +---------------+ | ||
268 | | | ||
269 | v | ||
270 | +-------------------+ | ||
271 | | Index Node #1 | | ||
272 | | | | ||
273 | | branch0 branchn | | ||
274 | | (hash) (hash) | | ||
275 | +-------------------+ | ||
276 | | ... | (fanout: 8) | ||
277 | | | | ||
278 | +-------+ +------+ | ||
279 | | | | ||
280 | v v | ||
281 | +-------------------+ +-------------------+ | ||
282 | | Index Node #2 | | Index Node #3 | | ||
283 | | | | | | ||
284 | | branch0 branchn | | branch0 branchn | | ||
285 | | (hash) (hash) | | (hash) (hash) | | ||
286 | +-------------------+ +-------------------+ | ||
287 | | ... | ... | | ||
288 | v v v | ||
289 | +-----------+ +----------+ +-----------+ | ||
290 | | Data Node | | INO Node | | DENT Node | | ||
291 | +-----------+ +----------+ +-----------+ | ||
292 | |||
293 | |||
294 | Figure 3: Coverage areas of index node hash and master node HMAC | ||
295 | |||
296 | |||
297 | |||
298 | The most important part for robustness and power-cut safety is to atomically | ||
299 | persist the hash and file contents. Here the existing UBIFS logic for how | ||
300 | changed nodes are persisted is already designed for this purpose such that | ||
301 | UBIFS can safely recover if a power-cut occurs while persisting. Adding | ||
302 | hashes to index nodes does not change this since each hash will be persisted | ||
303 | atomically together with its respective node. | ||
304 | |||
305 | |||
306 | ### Journal Authentication | ||
307 | |||
308 | The journal is authenticated too. Since the journal is continuously written | ||
309 | it is necessary to also add authentication information frequently to the | ||
310 | journal so that in case of a powercut not too much data can't be authenticated. | ||
311 | This is done by creating a continuous hash beginning from the commit start node | ||
312 | over the previous reference nodes, the current reference node, and the bud | ||
313 | nodes. From time to time whenever it is suitable authentication nodes are added | ||
314 | between the bud nodes. This new node type contains a HMAC over the current state | ||
315 | of the hash chain. That way a journal can be authenticated up to the last | ||
316 | authentication node. The tail of the journal which may not have a authentication | ||
317 | node cannot be authenticated and is skipped during journal replay. | ||
318 | |||
319 | We get this picture for journal authentication: | ||
320 | |||
321 | ,,,,,,,, | ||
322 | ,......,........................................... | ||
323 | ,. CS , hash1.----. hash2.----. | ||
324 | ,. | , . |hmac . |hmac | ||
325 | ,. v , . v . v | ||
326 | ,.REF#0,-> bud -> bud -> bud.-> auth -> bud -> bud.-> auth ... | ||
327 | ,..|...,........................................... | ||
328 | , | , | ||
329 | , | ,,,,,,,,,,,,,,, | ||
330 | . | hash3,----. | ||
331 | , | , |hmac | ||
332 | , v , v | ||
333 | , REF#1 -> bud -> bud,-> auth ... | ||
334 | ,,,|,,,,,,,,,,,,,,,,,, | ||
335 | v | ||
336 | REF#2 -> ... | ||
337 | | | ||
338 | V | ||
339 | ... | ||
340 | |||
341 | Since the hash also includes the reference nodes an attacker cannot reorder or | ||
342 | skip any journal heads for replay. An attacker can only remove bud nodes or | ||
343 | reference nodes from the end of the journal, effectively rewinding the | ||
344 | filesystem at maximum back to the last commit. | ||
345 | |||
346 | The location of the log area is stored in the master node. Since the master | ||
347 | node is authenticated with a HMAC as described above, it is not possible to | ||
348 | tamper with that without detection. The size of the log area is specified when | ||
349 | the filesystem is created using `mkfs.ubifs` and stored in the superblock node. | ||
350 | To avoid tampering with this and other values stored there, a HMAC is added to | ||
351 | the superblock struct. The superblock node is stored in LEB 0 and is only | ||
352 | modified on feature flag or similar changes, but never on file changes. | ||
353 | |||
354 | |||
355 | ### LPT Authentication | ||
356 | |||
357 | The location of the LPT root node on the flash is stored in the UBIFS master | ||
358 | node. Since the LPT is written and read atomically on every commit, there is | ||
359 | no need to authenticate individual nodes of the tree. It suffices to | ||
360 | protect the integrity of the full LPT by a simple hash stored in the master | ||
361 | node. Since the master node itself is authenticated, the LPTs authenticity can | ||
362 | be verified by verifying the authenticity of the master node and comparing the | ||
363 | LTP hash stored there with the hash computed from the read on-flash LPT. | ||
364 | |||
365 | |||
366 | ## Key Management | ||
367 | |||
368 | For simplicity, UBIFS authentication uses a single key to compute the HMACs | ||
369 | of superblock, master, commit start and reference nodes. This key has to be | ||
370 | available on creation of the filesystem (`mkfs.ubifs`) to authenticate the | ||
371 | superblock node. Further, it has to be available on mount of the filesystem | ||
372 | to verify authenticated nodes and generate new HMACs for changes. | ||
373 | |||
374 | UBIFS authentication is intended to operate side-by-side with UBIFS encryption | ||
375 | (fscrypt) to provide confidentiality and authenticity. Since UBIFS encryption | ||
376 | has a different approach of encryption policies per directory, there can be | ||
377 | multiple fscrypt master keys and there might be folders without encryption. | ||
378 | UBIFS authentication on the other hand has an all-or-nothing approach in the | ||
379 | sense that it either authenticates everything of the filesystem or nothing. | ||
380 | Because of this and because UBIFS authentication should also be usable without | ||
381 | encryption, it does not share the same master key with fscrypt, but manages | ||
382 | a dedicated authentication key. | ||
383 | |||
384 | The API for providing the authentication key has yet to be defined, but the | ||
385 | key can eg. be provided by userspace through a keyring similar to the way it | ||
386 | is currently done in fscrypt. It should however be noted that the current | ||
387 | fscrypt approach has shown its flaws and the userspace API will eventually | ||
388 | change [FSCRYPT-POLICY2]. | ||
389 | |||
390 | Nevertheless, it will be possible for a user to provide a single passphrase | ||
391 | or key in userspace that covers UBIFS authentication and encryption. This can | ||
392 | be solved by the corresponding userspace tools which derive a second key for | ||
393 | authentication in addition to the derived fscrypt master key used for | ||
394 | encryption. | ||
395 | |||
396 | To be able to check if the proper key is available on mount, the UBIFS | ||
397 | superblock node will additionally store a hash of the authentication key. This | ||
398 | approach is similar to the approach proposed for fscrypt encryption policy v2 | ||
399 | [FSCRYPT-POLICY2]. | ||
400 | |||
401 | |||
402 | # Future Extensions | ||
403 | |||
404 | In certain cases where a vendor wants to provide an authenticated filesystem | ||
405 | image to customers, it should be possible to do so without sharing the secret | ||
406 | UBIFS authentication key. Instead, in addition the each HMAC a digital | ||
407 | signature could be stored where the vendor shares the public key alongside the | ||
408 | filesystem image. In case this filesystem has to be modified afterwards, | ||
409 | UBIFS can exchange all digital signatures with HMACs on first mount similar | ||
410 | to the way the IMA/EVM subsystem deals with such situations. The HMAC key | ||
411 | will then have to be provided beforehand in the normal way. | ||
412 | |||
413 | |||
414 | # References | ||
415 | |||
416 | [CRYPTSETUP2] http://www.saout.de/pipermail/dm-crypt/2017-November/005745.html | ||
417 | |||
418 | [DMC-CBC-ATTACK] http://www.jakoblell.com/blog/2013/12/22/practical-malleability-attack-against-cbc-encrypted-luks-partitions/ | ||
419 | |||
420 | [DM-INTEGRITY] https://www.kernel.org/doc/Documentation/device-mapper/dm-integrity.txt | ||
421 | |||
422 | [DM-VERITY] https://www.kernel.org/doc/Documentation/device-mapper/verity.txt | ||
423 | |||
424 | [FSCRYPT-POLICY2] https://www.spinics.net/lists/linux-ext4/msg58710.html | ||
425 | |||
426 | [UBIFS-WP] http://www.linux-mtd.infradead.org/doc/ubifs_whitepaper.pdf | ||