author Linus Torvalds <torvalds@linux-foundation.org> 2018-08-21 21:19:09 -0400
committer Linus Torvalds <torvalds@linux-foundation.org> 2018-08-21 21:19:09 -0400
commit d9a185f8b49678775ef56ecbdbc7b76970302897
tree 7ace1b26133e5d796af09e5d71d6531bcb69865c /Documentation/filesystems
parent c22fc16d172fba4d19ffd8f2aa8fe67edba63895
parent 989974c804574d250ac92d44e220081959ac8ac1
Merge tag 'ovl-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs updates from Miklos Szeredi:
 "This contains two new features:

   - Stack file operations: this allows removal of several hacks from
     the VFS, proper interaction of read-only open files with copy-up,
     possibility to implement fs modifying ioctls properly, and others.

   - Metadata only copy-up: when file is on lower layer and only
     metadata is modified (except size) then only copy up the metadata
     and continue to use the data from the lower file"

* tag 'ovl-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (66 commits)
  ovl: Enable metadata only feature
  ovl: Do not do metacopy only for ioctl modifying file attr
  ovl: Do not do metadata only copy-up for truncate operation
  ovl: add helper to force data copy-up
  ovl: Check redirect on index as well
  ovl: Set redirect on upper inode when it is linked
  ovl: Set redirect on metacopy files upon rename
  ovl: Do not set dentry type ORIGIN for broken hardlinks
  ovl: Add an inode flag OVL_CONST_INO
  ovl: Treat metacopy dentries as type OVL_PATH_MERGE
  ovl: Check redirects for metacopy files
  ovl: Move some dir related ovl_lookup_single() code in else block
  ovl: Do not expose metacopy only dentry from d_real()
  ovl: Open file with data except for the case of fsync
  ovl: Add helper ovl_inode_realdata()
  ovl: Store lower data inode in ovl_inode
  ovl: Fix ovl_getattr() to get number of blocks from lower
  ovl: Add helper ovl_dentry_lowerdata() to get lower data dentry
  ovl: Copy up meta inode data from lowest data inode
  ovl: Modify ovl_lookup() and friends to lookup metacopy dentry
  ...
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- Documentation/filesystems/Locking 3
-rw-r--r-- Documentation/filesystems/overlayfs.txt 81
-rw-r--r-- Documentation/filesystems/vfs.txt 16
3 files changed, 64 insertions, 36 deletions
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 9e6f19eaef89..efea228ccd8a 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -21,8 +21,7 @@ prototypes:
 	char *(*d_dname)((struct dentry *dentry, char *buffer, int buflen);
 	struct vfsmount *(*d_automount)(struct path *path);
 	int (*d_manage)(const struct path *, bool);
-	struct dentry *(*d_real)(struct dentry *, const struct inode *,
-				 unsigned int, unsigned int);
+	struct dentry *(*d_real)(struct dentry *, const struct inode *);
 
 locking rules:
 	rename_lock ->d_lock may block rcu-walk
diff --git a/Documentation/filesystems/overlayfs.txt b/Documentation/filesystems/overlayfs.txt
index 72615a2c0752..51c136c821bf 100644
--- a/Documentation/filesystems/overlayfs.txt
+++ b/Documentation/filesystems/overlayfs.txt
@@ -10,10 +10,6 @@ union-filesystems). An overlay-filesystem tries to present a
 filesystem which is the result over overlaying one filesystem on top
 of the other.
 
-The result will inevitably fail to look exactly like a normal
-filesystem for various technical reasons. The expectation is that
-many use cases will be able to ignore these differences.
-
 
 Overlay objects
 ---------------
@@ -266,6 +262,30 @@ rightmost one and going left. In the above example lower1 will be the
 top, lower2 the middle and lower3 the bottom layer.
 
 
+Metadata only copy up
+--------------------
+
+When metadata only copy up feature is enabled, overlayfs will only copy
+up metadata (as opposed to whole file), when a metadata specific operation
+like chown/chmod is performed. Full file will be copied up later when
+file is opened for WRITE operation.
+
+In other words, this is delayed data copy up operation and data is copied
+up when there is a need to actually modify data.
+
+There are multiple ways to enable/disable this feature. A config option
+CONFIG_OVERLAY_FS_METACOPY can be set/unset to enable/disable this feature
+by default. Or one can enable/disable it at module load time with module
+parameter metacopy=on/off. Lastly, there is also a per mount option
+metacopy=on/off to enable/disable this feature per mount.
+
+Do not use metacopy=on with untrusted upper/lower directories. Otherwise
+it is possible that an attacker can create a handcrafted file with
+appropriate REDIRECT and METACOPY xattrs, and gain access to file on lower
+pointed by REDIRECT. This should not be possible on local system as setting
+"trusted." xattrs will require CAP_SYS_ADMIN. But it should be possible
+for untrusted layers like from a pen drive.
+
 Sharing and copying layers
 --------------------------
 
@@ -284,7 +304,7 @@ though it will not result in a crash or deadlock.
 Mounting an overlay using an upper layer path, where the upper layer path
 was previously used by another mounted overlay in combination with a
 different lower layer path, is allowed, unless the "inodes index" feature
-is enabled.
+or "metadata only copy up" feature is enabled.
 
 With the "inodes index" feature, on the first time mount, an NFS file
 handle of the lower layer root directory, along with the UUID of the lower
@@ -297,6 +317,10 @@ lower root origin, mount will fail with ESTALE. An overlayfs mount with
 does not support NFS export, lower filesystem does not have a valid UUID or
 if the upper filesystem does not support extended attributes.
 
+For "metadata only copy up" feature there is no verification mechanism at
+mount time. So if same upper is mounted with different set of lower, mount
+probably will succeed but expect the unexpected later on. So don't do it.
+
 It is quite a common practice to copy overlay layers to a different
 directory tree on the same or different underlying filesystem, and even
 to a different machine. With the "inodes index" feature, trying to mount
@@ -306,27 +330,40 @@ the copied layers will fail the verification of the lower root file handle.
 Non-standard behavior
 ---------------------
 
-The copy_up operation essentially creates a new, identical file and
-moves it over to the old name. Any open files referring to this inode
-will access the old data.
+Overlayfs can now act as a POSIX compliant filesystem with the following
+features turned on:
+
+1) "redirect_dir"
+
+Enabled with the mount option or module option: "redirect_dir=on" or with
+the kernel config option CONFIG_OVERLAY_FS_REDIRECT_DIR=y.
+
+If this feature is disabled, then rename(2) on a lower or merged directory
+will fail with EXDEV ("Invalid cross-device link").
+
+2) "inode index"
+
+Enabled with the mount option or module option "index=on" or with the
+kernel config option CONFIG_OVERLAY_FS_INDEX=y.
 
-The new file may be on a different filesystem, so both st_dev and st_ino
-of the real file may change. The values of st_dev and st_ino returned by
-stat(2) on an overlay object are often not the same as the real file
-stat(2) values to prevent the values from changing on copy_up.
+If this feature is disabled and a file with multiple hard links is copied
+up, then this will "break" the link. Changes will not be propagated to
+other names referring to the same inode.
 
-Unless "xino" feature is enabled, when overlay layers are not all on the
-same underlying filesystem, the value of st_dev may be different for two
-non-directory objects in the same overlay filesystem and the value of
-st_ino for directory objects may be non persistent and could change even
-while the overlay filesystem is still mounted.
+3) "xino"
 
-Unless "inode index" feature is enabled, if a file with multiple hard
-links is copied up, then this will "break" the link. Changes will not be
-propagated to other names referring to the same inode.
+Enabled with the mount option "xino=auto" or "xino=on", with the module
+option "xino_auto=on" or with the kernel config option
+CONFIG_OVERLAY_FS_XINO_AUTO=y. Also implicitly enabled by using the same
+underlying filesystem for all layers making up the overlay.
 
-Unless "redirect_dir" feature is enabled, rename(2) on a lower or merged
-directory will fail with EXDEV.
+If this feature is disabled or the underlying filesystem doesn't have
+enough free bits in the inode number, then overlayfs will not be able to
+guarantee that the values of st_ino and st_dev returned by stat(2) and the
+value of d_ino returned by readdir(3) will act like on a normal filesystem.
+E.g. the value of st_dev may be different for two objects in the same
+overlay filesystem and the value of st_ino for directory objects may not be
+persistent and could change even while the overlay filesystem is mounted.
 
 
 Changes to underlying filesystems
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 85907d5b9c2c..4b2084d0f1fb 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -989,8 +989,7 @@ struct dentry_operations {
 	char *(*d_dname)(struct dentry *, char *, int);
 	struct vfsmount *(*d_automount)(struct path *);
 	int (*d_manage)(const struct path *, bool);
-	struct dentry *(*d_real)(struct dentry *, const struct inode *,
-				 unsigned int, unsigned int);
+	struct dentry *(*d_real)(struct dentry *, const struct inode *);
 };
 
   d_revalidate: called when the VFS needs to revalidate a dentry. This
@@ -1124,22 +1123,15 @@ struct dentry_operations {
 	dentry being transited from.
 
   d_real: overlay/union type filesystems implement this method to return one of
-	the underlying dentries hidden by the overlay. It is used in three
+	the underlying dentries hidden by the overlay. It is used in two
 	different modes:
 
-	Called from open it may need to copy-up the file depending on the
-	supplied open flags. This mode is selected with a non-zero flags
-	argument. In this mode the d_real method can return an error.
-
 	Called from file_dentry() it returns the real dentry matching the inode
 	argument. The real dentry may be from a lower layer already copied up,
 	but still referenced from the file. This mode is selected with a
-	non-NULL inode argument. This will always succeed.
-
-	With NULL inode and zero flags the topmost real underlying dentry is
-	returned. This will always succeed.
+	non-NULL inode argument.
 
-	This method is never called with both non-NULL inode and non-zero flags.
+	With NULL inode the topmost real underlying dentry is returned.
 
 Each dentry has a pointer to its parent dentry, as well as a hash list
 of child dentries. Child dentries are basically like files in a