author | Linus Torvalds <torvalds@linux-foundation.org> | 2018-08-21 21:19:09 -0400 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2018-08-21 21:19:09 -0400 |
commit | d9a185f8b49678775ef56ecbdbc7b76970302897 (patch) | |
tree | 7ace1b26133e5d796af09e5d71d6531bcb69865c /Documentation/filesystems | |
parent | c22fc16d172fba4d19ffd8f2aa8fe67edba63895 (diff) | |
parent | 989974c804574d250ac92d44e220081959ac8ac1 (diff) |
Merge tag 'ovl-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs updates from Miklos Szeredi:
 "This contains two new features:

   - Stack file operations: this allows removal of several hacks from
     the VFS, proper interaction of read-only open files with copy-up,
     possibility to implement fs modifying ioctls properly, and others.

   - Metadata only copy-up: when file is on lower layer and only
     metadata is modified (except size) then only copy up the metadata
     and continue to use the data from the lower file"

* tag 'ovl-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (66 commits)
ovl: Enable metadata only feature
ovl: Do not do metacopy only for ioctl modifying file attr
ovl: Do not do metadata only copy-up for truncate operation
ovl: add helper to force data copy-up
ovl: Check redirect on index as well
ovl: Set redirect on upper inode when it is linked
ovl: Set redirect on metacopy files upon rename
ovl: Do not set dentry type ORIGIN for broken hardlinks
ovl: Add an inode flag OVL_CONST_INO
ovl: Treat metacopy dentries as type OVL_PATH_MERGE
ovl: Check redirects for metacopy files
ovl: Move some dir related ovl_lookup_single() code in else block
ovl: Do not expose metacopy only dentry from d_real()
ovl: Open file with data except for the case of fsync
ovl: Add helper ovl_inode_realdata()
ovl: Store lower data inode in ovl_inode
ovl: Fix ovl_getattr() to get number of blocks from lower
ovl: Add helper ovl_dentry_lowerdata() to get lower data dentry
ovl: Copy up meta inode data from lowest data inode
ovl: Modify ovl_lookup() and friends to lookup metacopy dentry
...
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- | Documentation/filesystems/Locking | 3 |
-rw-r--r-- | Documentation/filesystems/overlayfs.txt | 81 |
-rw-r--r-- | Documentation/filesystems/vfs.txt | 16 |
3 files changed, 64 insertions, 36 deletions
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 9e6f19eaef89..efea228ccd8a 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -21,8 +21,7 @@ prototypes: | |||
21 | char *(*d_dname)((struct dentry *dentry, char *buffer, int buflen); | 21 | char *(*d_dname)((struct dentry *dentry, char *buffer, int buflen); |
22 | struct vfsmount *(*d_automount)(struct path *path); | 22 | struct vfsmount *(*d_automount)(struct path *path); |
23 | int (*d_manage)(const struct path *, bool); | 23 | int (*d_manage)(const struct path *, bool); |
24 | struct dentry *(*d_real)(struct dentry *, const struct inode *, | 24 | struct dentry *(*d_real)(struct dentry *, const struct inode *); |
25 | unsigned int, unsigned int); | ||
26 | 25 | ||
27 | locking rules: | 26 | locking rules: |
28 | rename_lock ->d_lock may block rcu-walk | 27 | rename_lock ->d_lock may block rcu-walk |
diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.txt
index 72615a2c0752..51c136c821bf 100644
--- a/Documentation/filesystems/overlayfs.txt
+++ b/Documentation/filesystems/overlayfs.txt
@@ -10,10 +10,6 @@ union-filesystems). An overlay-filesystem tries to present a | |||
10 | filesystem which is the result over overlaying one filesystem on top | 10 | filesystem which is the result over overlaying one filesystem on top |
11 | of the other. | 11 | of the other. |
12 | 12 | ||
13 | The result will inevitably fail to look exactly like a normal | ||
14 | filesystem for various technical reasons. The expectation is that | ||
15 | many use cases will be able to ignore these differences. | ||
16 | |||
17 | 13 | ||
18 | Overlay objects | 14 | Overlay objects |
19 | --------------- | 15 | --------------- |
@@ -266,6 +262,30 @@ rightmost one and going left. In the above example lower1 will be the | |||
266 | top, lower2 the middle and lower3 the bottom layer. | 262 | top, lower2 the middle and lower3 the bottom layer. |
267 | 263 | ||
268 | 264 | ||
265 | Metadata only copy up | ||
266 | -------------------- | ||
267 | |||
268 | When metadata only copy up feature is enabled, overlayfs will only copy | ||
269 | up metadata (as opposed to whole file), when a metadata specific operation | ||
270 | like chown/chmod is performed. Full file will be copied up later when | ||
271 | file is opened for WRITE operation. | ||
272 | |||
273 | In other words, this is delayed data copy up operation and data is copied | ||
274 | up when there is a need to actually modify data. | ||
275 | |||
276 | There are multiple ways to enable/disable this feature. A config option | ||
277 | CONFIG_OVERLAY_FS_METACOPY can be set/unset to enable/disable this feature | ||
278 | by default. Or one can enable/disable it at module load time with module | ||
279 | parameter metacopy=on/off. Lastly, there is also a per mount option | ||
280 | metacopy=on/off to enable/disable this feature per mount. | ||
281 | |||
282 | Do not use metacopy=on with untrusted upper/lower directories. Otherwise | ||
283 | it is possible that an attacker can create a handcrafted file with | ||
284 | appropriate REDIRECT and METACOPY xattrs, and gain access to file on lower | ||
285 | pointed by REDIRECT. This should not be possible on local system as setting | ||
286 | "trusted." xattrs will require CAP_SYS_ADMIN. But it should be possible | ||
287 | for untrusted layers like from a pen drive. | ||
288 | |||
269 | Sharing and copying layers | 289 | Sharing and copying layers |
270 | -------------------------- | 290 | -------------------------- |
271 | 291 | ||
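Note on the hunk above: the new "Metadata only copy up" section names a per-mount option, metacopy=on/off. A minimal user-space sketch of how such an option string might be assembled is given here. The helper name ovl_build_opts and all paths are hypothetical and are not part of the commit; lowerdir, upperdir and workdir are the usual overlayfs mount options, and the sketch does not escape commas in paths.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the commit): build an overlayfs option
 * string of the form
 *   lowerdir=<lower>,upperdir=<upper>,workdir=<work>,metacopy=on|off
 * Returns 0 on success, -1 if the buffer is too small or formatting fails.
 * Paths containing commas are not escaped in this sketch.
 */
static int ovl_build_opts(char *buf, size_t len, const char *lower,
                          const char *upper, const char *work, int metacopy)
{
    int n;

    assert(buf && lower && upper && work);
    n = snprintf(buf, len, "lowerdir=%s,upperdir=%s,workdir=%s,metacopy=%s",
                 lower, upper, work, metacopy ? "on" : "off");
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

The resulting string would typically be passed as the data argument of mount(2), for example mount("overlay", "/merged", "overlay", 0, opts), which requires CAP_SYS_ADMIN. Per the warning in the hunk, metacopy=on should only be requested when the upper and lower directories are trusted.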
@@ -284,7 +304,7 @@ though it will not result in a crash or deadlock. | |||
284 | Mounting an overlay using an upper layer path, where the upper layer path | 304 | Mounting an overlay using an upper layer path, where the upper layer path |
285 | was previously used by another mounted overlay in combination with a | 305 | was previously used by another mounted overlay in combination with a |
286 | different lower layer path, is allowed, unless the "inodes index" feature | 306 | different lower layer path, is allowed, unless the "inodes index" feature |
287 | is enabled. | 307 | or "metadata only copy up" feature is enabled. |
288 | 308 | ||
289 | With the "inodes index" feature, on the first time mount, an NFS file | 309 | With the "inodes index" feature, on the first time mount, an NFS file |
290 | handle of the lower layer root directory, along with the UUID of the lower | 310 | handle of the lower layer root directory, along with the UUID of the lower |
@@ -297,6 +317,10 @@ lower root origin, mount will fail with ESTALE. An overlayfs mount with | |||
297 | does not support NFS export, lower filesystem does not have a valid UUID or | 317 | does not support NFS export, lower filesystem does not have a valid UUID or |
298 | if the upper filesystem does not support extended attributes. | 318 | if the upper filesystem does not support extended attributes. |
299 | 319 | ||
320 | For "metadata only copy up" feature there is no verification mechanism at | ||
321 | mount time. So if same upper is mounted with different set of lower, mount | ||
322 | probably will succeed but expect the unexpected later on. So don't do it. | ||
323 | |||
300 | It is quite a common practice to copy overlay layers to a different | 324 | It is quite a common practice to copy overlay layers to a different |
301 | directory tree on the same or different underlying filesystem, and even | 325 | directory tree on the same or different underlying filesystem, and even |
302 | to a different machine. With the "inodes index" feature, trying to mount | 326 | to a different machine. With the "inodes index" feature, trying to mount |
@@ -306,27 +330,40 @@ the copied layers will fail the verification of the lower root file handle. | |||
306 | Non-standard behavior | 330 | Non-standard behavior |
307 | --------------------- | 331 | --------------------- |
308 | 332 | ||
309 | The copy_up operation essentially creates a new, identical file and | 333 | Overlayfs can now act as a POSIX compliant filesystem with the following |
310 | moves it over to the old name. Any open files referring to this inode | 334 | features turned on: |
311 | will access the old data. | 335 | |
336 | 1) "redirect_dir" | ||
337 | |||
338 | Enabled with the mount option or module option: "redirect_dir=on" or with | ||
339 | the kernel config option CONFIG_OVERLAY_FS_REDIRECT_DIR=y. | ||
340 | |||
341 | If this feature is disabled, then rename(2) on a lower or merged directory | ||
342 | will fail with EXDEV ("Invalid cross-device link"). | ||
343 | |||
344 | 2) "inode index" | ||
345 | |||
346 | Enabled with the mount option or module option "index=on" or with the | ||
347 | kernel config option CONFIG_OVERLAY_FS_INDEX=y. | ||
312 | 348 | ||
313 | The new file may be on a different filesystem, so both st_dev and st_ino | 349 | If this feature is disabled and a file with multiple hard links is copied |
314 | of the real file may change. The values of st_dev and st_ino returned by | 350 | up, then this will "break" the link. Changes will not be propagated to |
315 | stat(2) on an overlay object are often not the same as the real file | 351 | other names referring to the same inode. |
316 | stat(2) values to prevent the values from changing on copy_up. | ||
317 | 352 | ||
318 | Unless "xino" feature is enabled, when overlay layers are not all on the | 353 | 3) "xino" |
319 | same underlying filesystem, the value of st_dev may be different for two | ||
320 | non-directory objects in the same overlay filesystem and the value of | ||
321 | st_ino for directory objects may be non persistent and could change even | ||
322 | while the overlay filesystem is still mounted. | ||
323 | 354 | ||
324 | Unless "inode index" feature is enabled, if a file with multiple hard | 355 | Enabled with the mount option "xino=auto" or "xino=on", with the module |
325 | links is copied up, then this will "break" the link. Changes will not be | 356 | option "xino_auto=on" or with the kernel config option |
326 | propagated to other names referring to the same inode. | 357 | CONFIG_OVERLAY_FS_XINO_AUTO=y. Also implicitly enabled by using the same |
358 | underlying filesystem for all layers making up the overlay. | ||
327 | 359 | ||
328 | Unless "redirect_dir" feature is enabled, rename(2) on a lower or merged | 360 | If this feature is disabled or the underlying filesystem doesn't have |
329 | directory will fail with EXDEV. | 361 | enough free bits in the inode number, then overlayfs will not be able to |
362 | guarantee that the values of st_ino and st_dev returned by stat(2) and the | ||
363 | value of d_ino returned by readdir(3) will act like on a normal filesystem. | ||
364 | E.g. the value of st_dev may be different for two objects in the same | ||
365 | overlay filesystem and the value of st_ino for directory objects may not be | ||
366 | persistent and could change even while the overlay filesystem is mounted. | ||
330 | 367 | ||
331 | 368 | ||
332 | Changes to underlying filesystems | 369 | Changes to underlying filesystems |
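Note on the "Non-standard behavior" hunk above: the rewritten text ties hard-link preservation to "inode index" and stable st_dev/st_ino values to "xino". One way to observe either property from user space is to compare the identity that stat(2) reports for two names. The sketch below assumes nothing beyond POSIX stat(2); the helper name same_object is hypothetical and not part of the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not from the commit): return true if both paths
 * report the same st_dev and st_ino via stat(2), i.e. they look like the
 * same object to user space. Returns false if either stat() call fails.
 */
static bool same_object(const char *a, const char *b)
{
    struct stat sa, sb;

    assert(a && b);
    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return false;
    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

On an overlay mounted without "inode index", calling this on two hard-linked names before and after one of them is copied up would be expected to show the link being broken, as the hunk describes. This is an illustration of the documented behavior, not a test shipped with the commit.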
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 85907d5b9c2c..4b2084d0f1fb 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -989,8 +989,7 @@ struct dentry_operations { | |||
989 | char *(*d_dname)(struct dentry *, char *, int); | 989 | char *(*d_dname)(struct dentry *, char *, int); |
990 | struct vfsmount *(*d_automount)(struct path *); | 990 | struct vfsmount *(*d_automount)(struct path *); |
991 | int (*d_manage)(const struct path *, bool); | 991 | int (*d_manage)(const struct path *, bool); |
992 | struct dentry *(*d_real)(struct dentry *, const struct inode *, | 992 | struct dentry *(*d_real)(struct dentry *, const struct inode *); |
993 | unsigned int, unsigned int); | ||
994 | }; | 993 | }; |
995 | 994 | ||
996 | d_revalidate: called when the VFS needs to revalidate a dentry. This | 995 | d_revalidate: called when the VFS needs to revalidate a dentry. This |
@@ -1124,22 +1123,15 @@ struct dentry_operations { | |||
1124 | dentry being transited from. | 1123 | dentry being transited from. |
1125 | 1124 | ||
1126 | d_real: overlay/union type filesystems implement this method to return one of | 1125 | d_real: overlay/union type filesystems implement this method to return one of |
1127 | the underlying dentries hidden by the overlay. It is used in three | 1126 | the underlying dentries hidden by the overlay. It is used in two |
1128 | different modes: | 1127 | different modes: |
1129 | 1128 | ||
1130 | Called from open it may need to copy-up the file depending on the | ||
1131 | supplied open flags. This mode is selected with a non-zero flags | ||
1132 | argument. In this mode the d_real method can return an error. | ||
1133 | |||
1134 | Called from file_dentry() it returns the real dentry matching the inode | 1129 | Called from file_dentry() it returns the real dentry matching the inode |
1135 | argument. The real dentry may be from a lower layer already copied up, | 1130 | argument. The real dentry may be from a lower layer already copied up, |
1136 | but still referenced from the file. This mode is selected with a | 1131 | but still referenced from the file. This mode is selected with a |
1137 | non-NULL inode argument. This will always succeed. | 1132 | non-NULL inode argument. |
1138 | |||
1139 | With NULL inode and zero flags the topmost real underlying dentry is | ||
1140 | returned. This will always succeed. | ||
1141 | 1133 | ||
1142 | This method is never called with both non-NULL inode and non-zero flags. | 1134 | With NULL inode the topmost real underlying dentry is returned. |
1143 | 1135 | ||
1144 | Each dentry has a pointer to its parent dentry, as well as a hash list | 1136 | Each dentry has a pointer to its parent dentry, as well as a hash list |
1145 | of child dentries. Child dentries are basically like files in a | 1137 | of child dentries. Child dentries are basically like files in a |
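Note on the d_real hunks above: after this merge the method takes only a dentry and an inode, and vfs.txt describes two modes selected by whether the inode argument is NULL. The following is a user-space toy model of that contract, not kernel code. The toy_inode and toy_dentry types and the toy_d_real function are hypothetical stand-ins, and returning the overlay dentry itself when nothing matches is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel structures. */
struct toy_inode {
    int id;
};

struct toy_dentry {
    struct toy_inode *inode;
    struct toy_dentry *upper;   /* copied-up dentry, or NULL */
    struct toy_dentry *lower;   /* lower-layer dentry, or NULL */
};

/*
 * Toy model of the two-argument d_real() contract described in vfs.txt:
 *  - non-NULL inode: return the real dentry whose inode matches
 *  - NULL inode:     return the topmost real underlying dentry
 * Falling back to the overlay dentry itself is an assumption of this sketch.
 */
static struct toy_dentry *toy_d_real(struct toy_dentry *d,
                                     const struct toy_inode *inode)
{
    assert(d);
    if (!inode) {
        if (d->upper)
            return d->upper;
        if (d->lower)
            return d->lower;
        return d;
    }
    if (d->upper && d->upper->inode == inode)
        return d->upper;
    if (d->lower && d->lower->inode == inode)
        return d->lower;
    return d;
}
```

The model shows why a file opened before copy-up can still resolve to the lower dentry: the caller supplies the inode it already holds, and the lookup matches on that inode rather than on the topmost layer.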