From 114a0a08d46cfb0eefb1a882f7866b7f57bfc5ba Mon Sep 17 00:00:00 2001 From: Bryan Schumaker Date: Tue, 1 Nov 2011 13:35:23 -0400 Subject: NFSD: Added fault injection documentation Signed-off-by: Bryan Schumaker Signed-off-by: J. Bruce Fields --- Documentation/filesystems/nfs/00-INDEX | 2 + Documentation/filesystems/nfs/fault_injection.txt | 69 +++++++++++++++++++++++ 2 files changed, 71 insertions(+) create mode 100644 Documentation/filesystems/nfs/fault_injection.txt (limited to 'Documentation/filesystems') diff --git a/Documentation/filesystems/nfs/00-INDEX b/Documentation/filesystems/nfs/00-INDEX index a57e12411d2a..1716874a651e 100644 --- a/Documentation/filesystems/nfs/00-INDEX +++ b/Documentation/filesystems/nfs/00-INDEX @@ -2,6 +2,8 @@ - this file (nfs-related documentation). Exporting - explanation of how to make filesystems exportable. +fault_injection.txt + - information for using fault injection on the server knfsd-stats.txt - statistics which the NFS server makes available to user space. nfs.txt diff --git a/Documentation/filesystems/nfs/fault_injection.txt b/Documentation/filesystems/nfs/fault_injection.txt new file mode 100644 index 000000000000..426d166089a3 --- /dev/null +++ b/Documentation/filesystems/nfs/fault_injection.txt @@ -0,0 +1,69 @@ + +Fault Injection +=============== +Fault injection is a method for forcing errors that may not normally occur, or +may be difficult to reproduce. Forcing these errors in a controlled environment +can help the developer find and fix bugs before their code is shipped in a +production system. Injecting an error on the Linux NFS server will allow us to +observe how the client reacts and if it manages to recover its state correctly. + +NFSD_FAULT_INJECTION must be selected when configuring the kernel to use this +feature. + + +Using Fault Injection +===================== +On the client, mount the fault injection server through NFS v4.0+ and do some +work over NFS (open files, take locks, ...). + +On the server, mount the debugfs filesystem to and ls +/nfsd. This will show a list of files that will be used for +injecting faults on the NFS server. As root, write a number n to the file +corresponding to the action you want the server to take. The server will then +process the first n items it finds. So if you want to forget 5 locks, echo '5' +to /nfsd/forget_locks. A value of 0 will tell the server to forget +all corresponding items. A log message will be created containing the number +of items forgotten (check dmesg). + +Go back to work on the client and check if the client recovered from the error +correctly. + + +Available Faults +================ +forget_clients: + The NFS server keeps a list of clients that have placed a mount call. If + this list is cleared, the server will have no knowledge of who the client + is, forcing the client to reauthenticate with the server. + +forget_openowners: + The NFS server keeps a list of what files are currently opened and who + they were opened by. Clearing this list will force the client to reopen + its files. + +forget_locks: + The NFS server keeps a list of what files are currently locked in the VFS. + Clearing this list will force the client to reclaim its locks (files are + unlocked through the VFS as they are cleared from this list). + +forget_delegations: + A delegation is used to assure the client that a file, or part of a file, + has not changed since the delegation was awarded. Clearing this list will + force the client to reaquire its delegation before accessing the file + again. + +recall_delegations: + Delegations can be recalled by the server when another client attempts to + access a file. This test will notify the client that its delegation has + been revoked, forcing the client to reaquire the delegation before using + the file again. + + +tools/nfs/inject_faults.sh script +================================= +This script has been created to ease the fault injection process. This script +will detect the mounted debugfs directory and write to the files located there +based on the arguments passed by the user. For example, running +`inject_faults.sh forget_locks 1` as root will instruct the server to forget +one lock. Running `inject_faults forget_locks` will instruct the server to +forgetall locks. -- cgit v1.2.2 From 1a087c6ad975bcc193b4bab2e9d61f9c6c547138 Mon Sep 17 00:00:00 2001 From: Alessandro Rubini Date: Fri, 18 Nov 2011 14:50:21 +0100 Subject: debugfs: add tools to printk 32-bit registers Some debugfs file I deal with are mostly blocks of registers, i.e. lines of the form " = 0x". Some files are only registers, some include registers blocks among other material. This patch introduces data structures and functions to deal with both cases. I expect more users of this over time. Signed-off-by: Alessandro Rubini Acked-by: Giancarlo Asnaghi Cc: Felipe Balbi Signed-off-by: Greg Kroah-Hartman --- Documentation/filesystems/debugfs.txt | 32 +++++++++++++++++++++++++++++++- 1 file changed, 31 insertions(+), 1 deletion(-) (limited to 'Documentation/filesystems') diff --git a/Documentation/filesystems/debugfs.txt b/Documentation/filesystems/debugfs.txt index 742cc06e138f..f04066a37f4c 100644 --- a/Documentation/filesystems/debugfs.txt +++ b/Documentation/filesystems/debugfs.txt @@ -97,7 +97,8 @@ A read on the resulting file will yield either Y (for non-zero values) or N, followed by a newline. If written to, it will accept either upper- or lower-case values, or 1 or 0. Any other input will be silently ignored. -Finally, a block of arbitrary binary data can be exported with: +Another option is exporting a block of arbitrary binary data, with +this structure and function: struct debugfs_blob_wrapper { void *data; @@ -115,6 +116,35 @@ can be used to export binary information, but there does not appear to be any code which does so in the mainline. Note that all files created with debugfs_create_blob() are read-only. +If you want to dump a block of registers (something that happens quite +often during development, even if little such code reaches mainline. +Debugfs offers two functions: one to make a registers-only file, and +another to insert a register block in the middle of another sequential +file. + + struct debugfs_reg32 { + char *name; + unsigned long offset; + }; + + struct debugfs_regset32 { + struct debugfs_reg32 *regs; + int nregs; + void __iomem *base; + }; + + struct dentry *debugfs_create_regset32(const char *name, mode_t mode, + struct dentry *parent, + struct debugfs_regset32 *regset); + + int debugfs_print_regs32(struct seq_file *s, struct debugfs_reg32 *regs, + int nregs, void __iomem *base, char *prefix); + +The "base" argument may be 0, but you may want to build the reg32 array +using __stringify, and a number of register names (macros) are actually +byte offsets over a base for the register block. + + There are a couple of other directory-oriented helper functions: struct dentry *debugfs_rename(struct dentry *old_dir, -- cgit v1.2.2 From b52f75a595e8a70ee453bd6fb8023ee294f7a729 Mon Sep 17 00:00:00 2001 From: Arnd Hannemann Date: Wed, 16 Nov 2011 17:35:37 +0100 Subject: Fix URL of btrfs-progs git repository in docs The location of the btrfs-progs repository has been changed. This patch updates the documentation accordingly. Signed-off-by: Arnd Hannemann --- Documentation/filesystems/btrfs.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation/filesystems') diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.txt index 64087c34327f..7671352216f1 100644 --- a/Documentation/filesystems/btrfs.txt +++ b/Documentation/filesystems/btrfs.txt @@ -63,8 +63,8 @@ IRC network. Userspace tools for creating and manipulating Btrfs file systems are available from the git repository at the following location: - http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs-unstable.git - git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git + http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs.git + git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git These include the following tools: -- cgit v1.2.2 From 89cab5b5727d3139adc247e3a3d4ee5b10e3eda5 Mon Sep 17 00:00:00 2001 From: Phillip Lougher Date: Thu, 29 Dec 2011 13:54:17 +0000 Subject: Squashfs: Update documentation to include xattrs One paragraph was not updated when xattrs were added to Squashfs. Signed-off-by: Phillip Lougher --- Documentation/filesystems/squashfs.txt | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) (limited to 'Documentation/filesystems') diff --git a/Documentation/filesystems/squashfs.txt b/Documentation/filesystems/squashfs.txt index 7db3ebda5a4c..403c090aca39 100644 --- a/Documentation/filesystems/squashfs.txt +++ b/Documentation/filesystems/squashfs.txt @@ -93,8 +93,8 @@ byte alignment: Compressed data blocks are written to the filesystem as files are read from the source directory, and checked for duplicates. Once all file data has been -written the completed inode, directory, fragment, export and uid/gid lookup -tables are written. +written the completed inode, directory, fragment, export, uid/gid lookup and +xattr tables are written. 3.1 Compression options ----------------------- @@ -151,7 +151,7 @@ in each metadata block. Directories are sorted in alphabetical order, and at lookup the index is scanned linearly looking for the first filename alphabetically larger than the filename being looked up. At this point the location of the metadata block the filename is in has been found. -The general idea of the index is ensure only one metadata block needs to be +The general idea of the index is to ensure only one metadata block needs to be decompressed to do a lookup irrespective of the length of the directory. This scheme has the advantage that it doesn't require extra memory overhead and doesn't require much extra storage on disk. -- cgit v1.2.2 From 18bb1db3e7607e4a997d50991a6f9fa5b0f8722c Mon Sep 17 00:00:00 2001 From: Al Viro Date: Tue, 26 Jul 2011 01:41:39 -0400 Subject: switch vfs_mkdir() and ->mkdir() to umode_t vfs_mkdir() gets int, but immediately drops everything that might not fit into umode_t and that's the only caller of ->mkdir()... Signed-off-by: Al Viro --- Documentation/filesystems/Locking | 2 +- Documentation/filesystems/vfs.txt | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation/filesystems') diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index d819ba16a0c7..6c7676d9c0ea 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -43,7 +43,7 @@ ata *); int (*link) (struct dentry *,struct inode *,struct dentry *); int (*unlink) (struct inode *,struct dentry *); int (*symlink) (struct inode *,struct dentry *,const char *); - int (*mkdir) (struct inode *,struct dentry *,int); + int (*mkdir) (struct inode *,struct dentry *,umode_t); int (*rmdir) (struct inode *,struct dentry *); int (*mknod) (struct inode *,struct dentry *,int,dev_t); int (*rename) (struct inode *, struct dentry *, diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 43cbd0821721..0c147c79cdd8 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -346,7 +346,7 @@ struct inode_operations { int (*link) (struct dentry *,struct inode *,struct dentry *); int (*unlink) (struct inode *,struct dentry *); int (*symlink) (struct inode *,struct dentry *,const char *); - int (*mkdir) (struct inode *,struct dentry *,int); + int (*mkdir) (struct inode *,struct dentry *,umode_t); int (*rmdir) (struct inode *,struct dentry *); int (*mknod) (struct inode *,struct dentry *,int,dev_t); int (*rename) (struct inode *, struct dentry *, -- cgit v1.2.2 From 4acdaf27ebe2034c342f3be57ef49aed1ad885ef Mon Sep 17 00:00:00 2001 From: Al Viro Date: Tue, 26 Jul 2011 01:42:34 -0400 Subject: switch ->create() to umode_t vfs_create() ignores everything outside of 16bit subset of its mode argument; switching it to umode_t is obviously equivalent and it's the only caller of the method Signed-off-by: Al Viro --- Documentation/filesystems/Locking | 2 +- Documentation/filesystems/vfs.txt | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation/filesystems') diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index 6c7676d9c0ea..38d00c8898b9 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -37,7 +37,7 @@ d_manage: no no yes (ref-walk) maybe --------------------------- inode_operations --------------------------- prototypes: - int (*create) (struct inode *,struct dentry *,int, struct nameidata *); + int (*create) (struct inode *,struct dentry *,umode_t, struct nameidata *); struct dentry * (*lookup) (struct inode *,struct dentry *, struct nameid ata *); int (*link) (struct dentry *,struct inode *,struct dentry *); diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 0c147c79cdd8..e7b900bc6285 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -341,7 +341,7 @@ This describes how the VFS can manipulate an inode in your filesystem. As of kernel 2.6.22, the following members are defined: struct inode_operations { - int (*create) (struct inode *,struct dentry *,int, struct nameidata *); + int (*create) (struct inode *,struct dentry *, umode_t, struct nameidata *); struct dentry * (*lookup) (struct inode *,struct dentry *, struct nameidata *); int (*link) (struct dentry *,struct inode *,struct dentry *); int (*unlink) (struct inode *,struct dentry *); -- cgit v1.2.2 From 1a67aafb5f72a436ca044293309fa7e6351d6a35 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Tue, 26 Jul 2011 01:52:52 -0400 Subject: switch ->mknod() to umode_t Signed-off-by: Al Viro --- Documentation/filesystems/Locking | 2 +- Documentation/filesystems/vfs.txt | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation/filesystems') diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index 38d00c8898b9..9e9f30b9f46b 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -45,7 +45,7 @@ ata *); int (*symlink) (struct inode *,struct dentry *,const char *); int (*mkdir) (struct inode *,struct dentry *,umode_t); int (*rmdir) (struct inode *,struct dentry *); - int (*mknod) (struct inode *,struct dentry *,int,dev_t); + int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t); int (*rename) (struct inode *, struct dentry *, struct inode *, struct dentry *); int (*readlink) (struct dentry *, char __user *,int); diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index e7b900bc6285..4b9f0d092a79 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -348,7 +348,7 @@ struct inode_operations { int (*symlink) (struct inode *,struct dentry *,const char *); int (*mkdir) (struct inode *,struct dentry *,umode_t); int (*rmdir) (struct inode *,struct dentry *); - int (*mknod) (struct inode *,struct dentry *,int,dev_t); + int (*mknod) (struct inode *,struct dentry *,umode_t,dev_t); int (*rename) (struct inode *, struct dentry *, struct inode *, struct dentry *); int (*readlink) (struct dentry *, char __user *,int); -- cgit v1.2.2 From f4ae40a6a50a98ac23d4b285f739455e926a473e Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sun, 24 Jul 2011 04:33:43 -0400 Subject: switch debugfs to umode_t Signed-off-by: Al Viro --- Documentation/filesystems/debugfs.txt | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) (limited to 'Documentation/filesystems') diff --git a/Documentation/filesystems/debugfs.txt b/Documentation/filesystems/debugfs.txt index 742cc06e138f..9281a95d689f 100644 --- a/Documentation/filesystems/debugfs.txt +++ b/Documentation/filesystems/debugfs.txt @@ -35,7 +35,7 @@ described below will work. The most general way to create a file within a debugfs directory is with: - struct dentry *debugfs_create_file(const char *name, mode_t mode, + struct dentry *debugfs_create_file(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops); @@ -53,13 +53,13 @@ actually necessary; the debugfs code provides a number of helper functions for simple situations. Files containing a single integer value can be created with any of: - struct dentry *debugfs_create_u8(const char *name, mode_t mode, + struct dentry *debugfs_create_u8(const char *name, umode_t mode, struct dentry *parent, u8 *value); - struct dentry *debugfs_create_u16(const char *name, mode_t mode, + struct dentry *debugfs_create_u16(const char *name, umode_t mode, struct dentry *parent, u16 *value); - struct dentry *debugfs_create_u32(const char *name, mode_t mode, + struct dentry *debugfs_create_u32(const char *name, umode_t mode, struct dentry *parent, u32 *value); - struct dentry *debugfs_create_u64(const char *name, mode_t mode, + struct dentry *debugfs_create_u64(const char *name, umode_t mode, struct dentry *parent, u64 *value); These files support both reading and writing the given value; if a specific @@ -67,13 +67,13 @@ file should not be written to, simply set the mode bits accordingly. The values in these files are in decimal; if hexadecimal is more appropriate, the following functions can be used instead: - struct dentry *debugfs_create_x8(const char *name, mode_t mode, + struct dentry *debugfs_create_x8(const char *name, umode_t mode, struct dentry *parent, u8 *value); - struct dentry *debugfs_create_x16(const char *name, mode_t mode, + struct dentry *debugfs_create_x16(const char *name, umode_t mode, struct dentry *parent, u16 *value); - struct dentry *debugfs_create_x32(const char *name, mode_t mode, + struct dentry *debugfs_create_x32(const char *name, umode_t mode, struct dentry *parent, u32 *value); - struct dentry *debugfs_create_x64(const char *name, mode_t mode, + struct dentry *debugfs_create_x64(const char *name, umode_t mode, struct dentry *parent, u64 *value); These functions are useful as long as the developer knows the size of the @@ -81,7 +81,7 @@ value to be exported. Some types can have different widths on different architectures, though, complicating the situation somewhat. There is a function meant to help out in one special case: - struct dentry *debugfs_create_size_t(const char *name, mode_t mode, + struct dentry *debugfs_create_size_t(const char *name, umode_t mode, struct dentry *parent, size_t *value); @@ -90,7 +90,7 @@ a variable of type size_t. Boolean values can be placed in debugfs with: - struct dentry *debugfs_create_bool(const char *name, mode_t mode, + struct dentry *debugfs_create_bool(const char *name, umode_t mode, struct dentry *parent, u32 *value); A read on the resulting file will yield either Y (for non-zero values) or @@ -104,7 +104,7 @@ Finally, a block of arbitrary binary data can be exported with: unsigned long size; }; - struct dentry *debugfs_create_blob(const char *name, mode_t mode, + struct dentry *debugfs_create_blob(const char *name, umode_t mode, struct dentry *parent, struct debugfs_blob_wrapper *blob); -- cgit v1.2.2 From 439475140bed762c04567c325d48409862341ae4 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Mon, 25 Jul 2011 00:05:26 -0400 Subject: configfs: convert to umode_t Signed-off-by: Al Viro --- Documentation/filesystems/configfs/configfs.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation/filesystems') diff --git a/Documentation/filesystems/configfs/configfs.txt b/Documentation/filesystems/configfs/configfs.txt index dd57bb6bb390..b40fec9d3f53 100644 --- a/Documentation/filesystems/configfs/configfs.txt +++ b/Documentation/filesystems/configfs/configfs.txt @@ -192,7 +192,7 @@ attribute value uses the store_attribute() method. struct configfs_attribute { char *ca_name; struct module *ca_owner; - mode_t ca_mode; + umode_t ca_mode; }; When a config_item wants an attribute to appear as a file in the item's -- cgit v1.2.2 From faef2b6c9960b5ae288899f461a2218ec6bb7928 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sun, 24 Jul 2011 23:44:53 -0400 Subject: sysfs: propagate umode_t Signed-off-by: Al Viro --- Documentation/filesystems/sysfs.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation/filesystems') diff --git a/Documentation/filesystems/sysfs.txt b/Documentation/filesystems/sysfs.txt index 07235caec22c..a6619b7064b9 100644 --- a/Documentation/filesystems/sysfs.txt +++ b/Documentation/filesystems/sysfs.txt @@ -70,7 +70,7 @@ An attribute definition is simply: struct attribute { char * name; struct module *owner; - mode_t mode; + umode_t mode; }; -- cgit v1.2.2 From 19c5246d251640ac76daa4d34165af78c64b1454 Mon Sep 17 00:00:00 2001 From: Yongqiang Yang Date: Wed, 4 Jan 2012 17:09:44 -0500 Subject: ext4: add new online resize interface This patch adds new online resize interface, whose input argument is a 64-bit integer indicating how many blocks there are in the resized fs. In new resize impelmentation, all work like allocating group tables are done by kernel side, so the new resize interface can support flex_bg feature and prepares ground for suppoting resize with features like bigalloc and exclude bitmap. Besides these, user-space tools just passes in the new number of blocks. We delay initializing the bitmaps and inode tables of added groups if possible and add multi groups (a flex groups) each time, so new resize is very fast like mkfs. Signed-off-by: Yongqiang Yang Signed-off-by: "Theodore Ts'o" --- Documentation/filesystems/ext4.txt | 7 +++++++ 1 file changed, 7 insertions(+) (limited to 'Documentation/filesystems') diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt index 4917cf24a5e0..10ec4639f152 100644 --- a/Documentation/filesystems/ext4.txt +++ b/Documentation/filesystems/ext4.txt @@ -581,6 +581,13 @@ Table of Ext4 specific ioctls behaviour may change in the future as it is not necessary and has been done this way only for sake of simplicity. + + EXT4_IOC_RESIZE_FS Resize the filesystem to a new size. The number + of blocks of resized filesystem is passed in via + 64 bit integer argument. The kernel allocates + bitmaps and inode table, the userspace tool thus + just passes the new number of blocks. + .............................................................................. References -- cgit v1.2.2 From 34c80b1d93e6e20ca9dea0baf583a5b5510d92d4 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Thu, 8 Dec 2011 21:32:45 -0500 Subject: vfs: switch ->show_options() to struct dentry * Signed-off-by: Al Viro --- Documentation/filesystems/Locking | 2 +- Documentation/filesystems/vfs.txt | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation/filesystems') diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index 9e9f30b9f46b..4fca82e5276e 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -117,7 +117,7 @@ prototypes: int (*statfs) (struct dentry *, struct kstatfs *); int (*remount_fs) (struct super_block *, int *, char *); void (*umount_begin) (struct super_block *); - int (*show_options)(struct seq_file *, struct vfsmount *); + int (*show_options)(struct seq_file *, struct dentry *); ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t); ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t); int (*bdev_try_to_free_page)(struct super_block*, struct page*, gfp_t); diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 4b9f0d092a79..3d9393b845b8 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt @@ -225,7 +225,7 @@ struct super_operations { void (*clear_inode) (struct inode *); void (*umount_begin) (struct super_block *); - int (*show_options)(struct seq_file *, struct vfsmount *); + int (*show_options)(struct seq_file *, struct dentry *); ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t); ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t); -- cgit v1.2.2 From 0499680a42141d86417a8fbaa8c8db806bea1201 Mon Sep 17 00:00:00 2001 From: Vasiliy Kulikov Date: Tue, 10 Jan 2012 15:11:31 -0800 Subject: procfs: add hidepid= and gid= mount options Add support for mount options to restrict access to /proc/PID/ directories. The default backward-compatible "relaxed" behaviour is left untouched. The first mount option is called "hidepid" and its value defines how much info about processes we want to be available for non-owners: hidepid=0 (default) means the old behavior - anybody may read all world-readable /proc/PID/* files. hidepid=1 means users may not access any /proc// directories, but their own. Sensitive files like cmdline, sched*, status are now protected against other users. As permission checking done in proc_pid_permission() and files' permissions are left untouched, programs expecting specific files' modes are not confused. hidepid=2 means hidepid=1 plus all /proc/PID/ will be invisible to other users. It doesn't mean that it hides whether a process exists (it can be learned by other means, e.g. by kill -0 $PID), but it hides process' euid and egid. It compicates intruder's task of gathering info about running processes, whether some daemon runs with elevated privileges, whether another user runs some sensitive program, whether other users run any program at all, etc. gid=XXX defines a group that will be able to gather all processes' info (as in hidepid=0 mode). This group should be used instead of putting nonroot user in sudoers file or something. However, untrusted users (like daemons, etc.) which are not supposed to monitor the tasks in the whole system should not be added to the group. hidepid=1 or higher is designed to restrict access to procfs files, which might reveal some sensitive private information like precise keystrokes timings: http://www.openwall.com/lists/oss-security/2011/11/05/3 hidepid=1/2 doesn't break monitoring userspace tools. ps, top, pgrep, and conky gracefully handle EPERM/ENOENT and behave as if the current user is the only user running processes. pstree shows the process subtree which contains "pstree" process. Note: the patch doesn't deal with setuid/setgid issues of keeping preopened descriptors of procfs files (like https://lkml.org/lkml/2011/2/7/368). We rely on that the leaked information like the scheduling counters of setuid apps doesn't threaten anybody's privacy - only the user started the setuid program may read the counters. Signed-off-by: Vasiliy Kulikov Cc: Alexey Dobriyan Cc: Al Viro Cc: Randy Dunlap Cc: "H. Peter Anvin" Cc: Greg KH Cc: Theodore Tso Cc: Alan Cox Cc: James Morris Cc: Oleg Nesterov Cc: Hugh Dickins Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/filesystems/proc.txt | 39 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 39 insertions(+) (limited to 'Documentation/filesystems') diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 0ec91f03422e..12fee132fbe2 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -41,6 +41,8 @@ Table of Contents 3.5 /proc//mountinfo - Information about mounts 3.6 /proc//comm & /proc//task//comm + 4 Configuring procfs + 4.1 Mount options ------------------------------------------------------------------------------ Preface @@ -1542,3 +1544,40 @@ a task to set its own or one of its thread siblings comm value. The comm value is limited in size compared to the cmdline value, so writing anything longer then the kernel's TASK_COMM_LEN (currently 16 chars) will result in a truncated comm value. + + +------------------------------------------------------------------------------ +Configuring procfs +------------------------------------------------------------------------------ + +4.1 Mount options +--------------------- + +The following mount options are supported: + + hidepid= Set /proc// access mode. + gid= Set the group authorized to learn processes information. + +hidepid=0 means classic mode - everybody may access all /proc// directories +(default). + +hidepid=1 means users may not access any /proc// directories but their +own. Sensitive files like cmdline, sched*, status are now protected against +other users. This makes it impossible to learn whether any user runs +specific program (given the program doesn't reveal itself by its behaviour). +As an additional bonus, as /proc//cmdline is unaccessible for other users, +poorly written programs passing sensitive information via program arguments are +now protected against local eavesdroppers. + +hidepid=2 means hidepid=1 plus all /proc// will be fully invisible to other +users. It doesn't mean that it hides a fact whether a process with a specific +pid value exists (it can be learned by other means, e.g. by "kill -0 $PID"), +but it hides process' uid and gid, which may be learned by stat()'ing +/proc// otherwise. It greatly complicates an intruder's task of gathering +information about running processes, whether some daemon runs with elevated +privileges, whether other user runs some sensitive program, whether other users +run any program at all, etc. + +gid= defines a group authorized to learn processes information otherwise +prohibited by hidepid=. If you use some daemon like identd which needs to learn +information about processes information, just add identd to this group. -- cgit v1.2.2 From a40dc6cc2e121abcbd1b22583ef5447763df510c Mon Sep 17 00:00:00 2001 From: Sage Weil Date: Tue, 10 Jan 2012 09:12:55 -0800 Subject: ceph: enable/disable dentry complete flags via mount option Enable/disable use of the dentry dir 'complete' flag via a mount option. This lets the admin control whether ceph uses the dcache to satisfy negative lookups or readdir when it has the entire directory contents in its cache. This is purely a performance optimization; correctness is guaranteed whether it is enabled or not. Reviewed-by: Christoph Hellwig Signed-off-by: Sage Weil --- Documentation/filesystems/ceph.txt | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) (limited to 'Documentation/filesystems') diff --git a/Documentation/filesystems/ceph.txt b/Documentation/filesystems/ceph.txt index 763d8ebbbebd..d6030aa33376 100644 --- a/Documentation/filesystems/ceph.txt +++ b/Documentation/filesystems/ceph.txt @@ -119,12 +119,20 @@ Mount Options must rely on TCP's error correction to detect data corruption in the data payload. - noasyncreaddir - Disable client's use its local cache to satisfy readdir - requests. (This does not change correctness; the client uses - cached metadata only when a lease or capability ensures it is - valid.) + dcache + Use the dcache contents to perform negative lookups and + readdir when the client has the entire directory contents in + its cache. (This does not change correctness; the client uses + cached metadata only when a lease or capability ensures it is + valid.) + + nodcache + Do not use the dcache as above. This avoids a significant amount of + complex code, sacrificing performance without affecting correctness, + and is useful for tracking down bugs. + noasyncreaddir + Do not use the dcache as above for readdir. More Information ================ -- cgit v1.2.2 From b3f7f573a20081910e34e99cbc91831f4f02f1ff Mon Sep 17 00:00:00 2001 From: Cyrill Gorcunov Date: Thu, 12 Jan 2012 17:20:53 -0800 Subject: c/r: procfs: add start_data, end_data, start_brk members to /proc/$pid/stat v4 The mm->start_code/end_code, mm->start_data/end_data, mm->start_brk are involved into calculation of program text/data segment sizes (which might be seen in /proc//statm) and into brk() call final address. For restore we need to know all these values. While mm->start_code/end_code already present in /proc/$pid/stat, the rest members are not, so this patch brings them in. The restore procedure of these members is addressed in another patch using prctl(). Signed-off-by: Cyrill Gorcunov Acked-by: Serge Hallyn Reviewed-by: Kees Cook Reviewed-by: KAMEZAWA Hiroyuki Cc: Alexey Dobriyan Cc: Tejun Heo Cc: Andrew Vagin Cc: Vasiliy Kulikov Cc: Alexey Dobriyan Cc: "Eric W. Biederman" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/filesystems/proc.txt | 3 +++ 1 file changed, 3 insertions(+) (limited to 'Documentation/filesystems') diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 12fee132fbe2..a76a26a1db8a 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -307,6 +307,9 @@ Table 1-4: Contents of the stat files (as of 2.6.30-rc7) blkio_ticks time spent waiting for block IO gtime guest time of the task in jiffies cgtime guest time of the task children in jiffies + start_data address above which program data+bss is placed + end_data address below which program data+bss is placed + start_brk address above which program heap can be expanded with brk() .............................................................................. The /proc/PID/maps file containing the currently mapped memory regions and -- cgit v1.2.2