Diffstat (limited to 'Documentation/filesystems')
-rw-r--r--  Documentation/filesystems/00-INDEX              |   4
-rw-r--r--  Documentation/filesystems/efivarfs.txt          |  16
-rw-r--r--  Documentation/filesystems/ext4.txt              |   9
-rw-r--r--  Documentation/filesystems/f2fs.txt              | 421
-rw-r--r--  Documentation/filesystems/nfs/nfs41-server.txt  |  20
-rw-r--r--  Documentation/filesystems/proc.txt              | 146
-rw-r--r--  Documentation/filesystems/vfat.txt              |   9
-rw-r--r--  Documentation/filesystems/xfs.txt               |  13
8 files changed, 607 insertions, 31 deletions
diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
index 8c624a18f67d..8042050eb265 100644
--- a/Documentation/filesystems/00-INDEX
+++ b/Documentation/filesystems/00-INDEX
@@ -38,6 +38,8 @@ dnotify_test.c
 	- example program for dnotify
 ecryptfs.txt
 	- docs on eCryptfs: stacked cryptographic filesystem for Linux.
+efivarfs.txt
+	- info for the efivarfs filesystem.
 exofs.txt
 	- info, usage, mount options, design about EXOFS.
 ext2.txt
@@ -48,6 +50,8 @@ ext4.txt
 	- info, mount options and specifications for the Ext4 filesystem.
 files.txt
 	- info on file management in the Linux kernel.
+f2fs.txt
+	- info and mount options for the F2FS filesystem.
 fuse.txt
 	- info on the Filesystem in User SpacE including mount options.
 gfs2.txt
diff --git a/Documentation/filesystems/efivarfs.txt b/Documentation/filesystems/efivarfs.txt
new file mode 100644
index 000000000000..c477af086e65
--- /dev/null
+++ b/Documentation/filesystems/efivarfs.txt
@@ -0,0 +1,16 @@
1
2efivarfs - a (U)EFI variable filesystem
3
4The efivarfs filesystem was created to address the shortcomings of
5using entries in sysfs to maintain EFI variables. The old sysfs EFI
6variables code only supported variables of up to 1024 bytes. This
7limitation existed in version 0.99 of the EFI specification, but was
8removed before any full releases. Since variables can now be larger
9than a single page, sysfs isn't the best interface for this.
10
11Variables can be created, deleted and modified with the efivarfs
12filesystem.
13
14efivarfs is typically mounted like this,
15
16 mount -t efivarfs none /sys/firmware/efi/efivars
diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt
index 104322bf378c..34ea4f1fa6ea 100644
--- a/Documentation/filesystems/ext4.txt
+++ b/Documentation/filesystems/ext4.txt
@@ -200,12 +200,9 @@ inode_readahead_blks=n This tuning parameter controls the maximum
                         table readahead algorithm will pre-read into
                         the buffer cache. The default value is 32 blocks.
 
-nouser_xattr            Disables Extended User Attributes. If you have extended
-                        attribute support enabled in the kernel configuration
-                        (CONFIG_EXT4_FS_XATTR), extended attribute support
-                        is enabled by default on mount. See the attr(5) manual
-                        page and http://acl.bestbits.at/ for more information
-                        about extended attributes.
+nouser_xattr            Disables Extended User Attributes. See the
+                        attr(5) manual page and http://acl.bestbits.at/
+                        for more information about extended attributes.
 
 noacl                   This option disables POSIX Access Control List
                         support. If ACL support is enabled in the kernel
diff --git a/Documentation/filesystems/f2fs.txt b/Documentation/filesystems/f2fs.txt
new file mode 100644
index 000000000000..8fbd8b46ee34
--- /dev/null
+++ b/Documentation/filesystems/f2fs.txt
@@ -0,0 +1,421 @@
1================================================================================
2WHAT IS Flash-Friendly File System (F2FS)?
3================================================================================
4
5NAND flash memory-based storage devices, such as SSDs, eMMC, and SD cards, are
6used in a variety of systems ranging from mobile devices to servers. Since they
7are known to have different characteristics from conventional rotating disks,
8a file system, as the layer above the storage device, should be designed from
9scratch to adapt to these differences.
10
11F2FS is a file system designed for NAND flash memory-based storage devices and
12is based on the Log-structured File System (LFS). The design focuses on
13addressing the fundamental issues in LFS, namely the snowball effect of the
14wandering tree and high cleaning overhead.
15
16Since a NAND flash memory-based storage device shows different characteristics
17depending on its internal geometry or flash memory management scheme (the FTL),
18F2FS and its tools support various parameters not only for configuring on-disk
19layout, but also for selecting allocation and cleaning algorithms.
20
21The file system formatting tool, "mkfs.f2fs", is available from the following
22git tree:
23>> git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs-tools.git
24
25For reporting bugs and sending patches, please use the following mailing list:
26>> linux-f2fs-devel@lists.sourceforge.net
27
28================================================================================
29BACKGROUND AND DESIGN ISSUES
30================================================================================
31
32Log-structured File System (LFS)
33--------------------------------
34"A log-structured file system writes all modifications to disk sequentially in
35a log-like structure, thereby speeding up both file writing and crash recovery.
36The log is the only structure on disk; it contains indexing information so that
37files can be read back from the log efficiently. In order to maintain large free
38areas on disk for fast writing, we divide the log into segments and use a
39segment cleaner to compress the live information from heavily fragmented
40segments." from Rosenblum, M. and Ousterhout, J. K., 1992, "The design and
41implementation of a log-structured file system", ACM Trans. Computer Systems
4210, 1, 26–52.
43
44Wandering Tree Problem
45----------------------
46In LFS, when file data is updated and written to the end of the log, its direct
47pointer block is updated due to the changed location. Then the indirect pointer
48block is also updated due to the direct pointer block update. In this manner,
49the upper index structures such as inode, inode map, and checkpoint block are
50also updated recursively. This is called the wandering tree problem [1], and
51to improve performance, the file system should eliminate or relax the update
52propagation as much as possible.
53
54[1] Bityutskiy, A. 2005. JFFS3 design issues. http://www.linux-mtd.infradead.org/
55
56Cleaning Overhead
57-----------------
58Since LFS is based on out-of-place writes, it produces many obsolete blocks
59scattered across the whole storage. In order to provide new empty log space, it
60needs to reclaim these obsolete blocks transparently to users. This job is
61called the cleaning process.
62
63The process consists of four operations as follows.
641. A victim segment is selected through referencing segment usage table.
652. It loads parent index structures of all the data in the victim identified by
66 segment summary blocks.
673. It checks the cross-reference between the data and its parent index structure.
684. It moves valid data selectively.
69
70This cleaning job may cause unexpectedly long delays, so the most important goal
71is to hide the latencies from users. It should also reduce the amount of valid
72data to be moved, and move that data quickly.
73
74================================================================================
75KEY FEATURES
76================================================================================
77
78Flash Awareness
79---------------
80- Enlarge the random write area for better performance, but provide high
81 spatial locality
82- Align FS data structures to the operational units in FTL on a best-effort basis
83
84Wandering Tree Problem
85----------------------
86- Use a term, “node”, that represents inodes as well as various pointer blocks
87- Introduce Node Address Table (NAT) containing the locations of all the “node”
88 blocks; this will cut off the update propagation.
89
90Cleaning Overhead
91-----------------
92- Support a background cleaning process
93- Support greedy and cost-benefit algorithms for victim selection policies
94- Support multi-head logs for static/dynamic hot and cold data separation
95- Introduce adaptive logging for efficient block allocation
96
97================================================================================
98MOUNT OPTIONS
99================================================================================
100
101background_gc_off Turn off cleaning operations, namely garbage collection,
102 triggered in background when I/O subsystem is idle.
103disable_roll_forward Disable the roll-forward recovery routine
104discard Issue discard/TRIM commands when a segment is cleaned.
105no_heap Disable heap-style segment allocation which finds free
106 segments for data from the beginning of main area, while
107 for node from the end of main area.
108nouser_xattr Disable Extended User Attributes. Note: xattr is enabled
109 by default if CONFIG_F2FS_FS_XATTR is selected.
110noacl Disable POSIX Access Control List. Note: acl is enabled
111 by default if CONFIG_F2FS_FS_POSIX_ACL is selected.
112active_logs=%u Support configuring the number of active logs. In the
113 current design, f2fs supports only 2, 4, and 6 logs.
114 Default number is 6.
115disable_ext_identify Disable the extension list configured by mkfs, so f2fs
116 is not aware of cold files such as media files.
117
118================================================================================
119DEBUGFS ENTRIES
120================================================================================
121
122/sys/kernel/debug/f2fs/ contains information about all the partitions mounted as
123f2fs. Each file shows the whole f2fs information.
124
125/sys/kernel/debug/f2fs/status includes:
126 - major file system information managed by f2fs currently
127 - average SIT information about whole segments
128 - current memory footprint consumed by f2fs.
129
130================================================================================
131USAGE
132================================================================================
133
1341. Download userland tools and compile them.
135
1362. Skip, if f2fs was compiled statically inside kernel.
137 Otherwise, insert the f2fs.ko module.
138 # insmod f2fs.ko
139
1403. Create a directory to use as the mount point
141 # mkdir /mnt/f2fs
142
1434. Format the block device, and then mount as f2fs
144 # mkfs.f2fs -l label /dev/block_device
145 # mount -t f2fs /dev/block_device /mnt/f2fs
146
147Format options
148--------------
149-l [label] : Give a volume label, up to 256 unicode name.
150-a [0 or 1] : Split start location of each area for heap-based allocation.
151 1 is set by default, which performs this.
152-o [int] : Set overprovision ratio in percent over volume size.
153 5 is set by default.
154-s [int] : Set the number of segments per section.
155 1 is set by default.
156-z [int] : Set the number of sections per zone.
157 1 is set by default.
158-e [str] : Set basic extension list. e.g. "mp3,gif,mov"
159
160================================================================================
161DESIGN
162================================================================================
163
164On-disk Layout
165--------------
166
167F2FS divides the whole volume into a number of segments, each of which is fixed
168to 2MB in size. A section is composed of consecutive segments, and a zone
169consists of a set of sections. By default, section and zone sizes are set to one
170segment size identically, but users can easily modify the sizes by mkfs.
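
As a rough illustration of the size relationships above, the following sketch
derives section and zone sizes from the fixed 2MB segment size. The function
names are hypothetical; the two parameters correspond to the -s and -z format
options listed under "Format options".

```python
# Sketch only: size arithmetic for segments, sections, and zones.
SEGMENT_BYTES = 2 * 1024 * 1024  # each segment is fixed to 2MB


def section_size_bytes(segs_per_section: int = 1) -> int:
    """A section is a run of consecutive segments (mkfs.f2fs -s)."""
    return segs_per_section * SEGMENT_BYTES


def zone_size_bytes(segs_per_section: int = 1, secs_per_zone: int = 1) -> int:
    """A zone is a set of sections (mkfs.f2fs -z)."""
    return secs_per_zone * section_size_bytes(segs_per_section)


if __name__ == "__main__":
    # With the defaults, section and zone are both one segment in size.
    print(section_size_bytes(), zone_size_bytes())
    # Hypothetical non-default geometry: 4 segments/section, 2 sections/zone.
    print(zone_size_bytes(segs_per_section=4, secs_per_zone=2))
```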
171
172F2FS splits the entire volume into six areas, and all areas except the superblock
173consist of multiple segments as described below.
174
175 align with the zone size <-|
176 |-> align with the segment size
177 _________________________________________________________________________
178 | | | Node | Segment | Segment | |
179 | Superblock | Checkpoint | Address | Info. | Summary | Main |
180 | (SB) | (CP) | Table (NAT) | Table (SIT) | Area (SSA) | |
181 |____________|_____2______|______N______|______N______|______N_____|__N___|
182 . .
183 . .
184 . .
185 ._________________________________________.
186 |_Segment_|_..._|_Segment_|_..._|_Segment_|
187 . .
188 ._________._________
189 |_section_|__...__|_
190 . .
191 .________.
192 |__zone__|
193
194- Superblock (SB)
195 : It is located at the beginning of the partition, and there exist two copies
196 to avoid file system crash. It contains basic partition information and some
197 default parameters of f2fs.
198
199- Checkpoint (CP)
200 : It contains file system information, bitmaps for valid NAT/SIT sets, orphan
201 inode lists, and summary entries of current active segments.
202
203- Node Address Table (NAT)
204 : It is composed of a block address table for all the node blocks stored in
205 Main area.
206
207- Segment Information Table (SIT)
208 : It contains segment information such as valid block count and bitmap for the
209 validity of all the blocks.
210
211- Segment Summary Area (SSA)
212 : It contains summary entries which contain the owner information of all the
213 data and node blocks stored in Main area.
214
215- Main Area
216 : It contains file and directory data including their indices.
217
218In order to avoid misalignment between file system and flash-based storage, F2FS
219aligns the start block address of CP with the segment size. Also, it aligns the
220start block address of Main area with the zone size by reserving some segments
221in SSA area.
222
223Refer to the following survey for additional technical details.
224https://wiki.linaro.org/WorkingGroups/Kernel/Projects/FlashCardSurvey
225
226File System Metadata Structure
227------------------------------
228
229F2FS adopts the checkpointing scheme to maintain file system consistency. At
230mount time, F2FS first tries to find the last valid checkpoint data by scanning
231CP area. In order to reduce the scanning time, F2FS uses only two copies of CP.
232One of them always indicates the last valid data; this is called the shadow copy
233mechanism. In addition to CP, NAT and SIT also adopt the shadow copy mechanism.
234
235For file system consistency, each CP indicates which NAT and SIT copies are
236valid, as shown below.
237
238 +--------+----------+---------+
239 | CP | NAT | SIT |
240 +--------+----------+---------+
241 . . . .
242 . . . .
243 . . . .
244 +-------+-------+--------+--------+--------+--------+
245 | CP #0 | CP #1 | NAT #0 | NAT #1 | SIT #0 | SIT #1 |
246 +-------+-------+--------+--------+--------+--------+
247 | ^ ^
248 | | |
249 `----------------------------------------'
250
251Index Structure
252---------------
253
254The key data structure to manage the data locations is a "node". Similar to
255traditional file structures, F2FS has three types of node: inode, direct node,
256indirect node. F2FS assigns 4KB to an inode block which contains 923 data block
257indices, two direct node pointers, two indirect node pointers, and one double
258indirect node pointer as described below. One direct node block contains 1018
259data blocks, and one indirect node block also contains 1018 node blocks. Thus,
260one inode block (i.e., a file) covers:
261
262 4KB * (923 + 2 * 1018 + 2 * 1018 * 1018 + 1018 * 1018 * 1018) := 3.94TB.
263
264 Inode block (4KB)
265 |- data (923)
266 |- direct node (2)
267 | `- data (1018)
268 |- indirect node (2)
269 | `- direct node (1018)
270 | `- data (1018)
271 `- double indirect node (1)
272 `- indirect node (1018)
273 `- direct node (1018)
274 `- data (1018)
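
The coverage figure above can be rechecked with a few lines of arithmetic. This
is only a sketch built from the numbers quoted in this section; the constant
and function names are made up for illustration and are not taken from the
f2fs sources.

```python
# Sketch only: recompute how many blocks one inode can address.
BLOCK_SIZE = 4 * 1024    # 4KB blocks, as stated in the text
ADDRS_PER_INODE = 923    # data block indices stored directly in the inode
ADDRS_PER_NODE = 1018    # entries in one direct or indirect node block


def max_file_blocks() -> int:
    direct = 2 * ADDRS_PER_NODE            # two direct node pointers
    indirect = 2 * ADDRS_PER_NODE ** 2     # two indirect node pointers
    double_indirect = ADDRS_PER_NODE ** 3  # one double indirect node pointer
    return ADDRS_PER_INODE + direct + indirect + double_indirect


def max_file_bytes() -> int:
    return max_file_blocks() * BLOCK_SIZE


if __name__ == "__main__":
    # Dividing by 2**40 gives the 3.94 figure (binary terabytes).
    print(max_file_blocks(), round(max_file_bytes() / 2**40, 2))
```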
275
276Note that all the node blocks are mapped by NAT, which means the location of
277each node is translated by the NAT. With respect to the wandering tree
278problem, F2FS is thus able to cut off the propagation of node updates caused by
279leaf data writes.
280
281Directory Structure
282-------------------
283
284A directory entry occupies 11 bytes, which consists of the following attributes.
285
286- hash hash value of the file name
287- ino inode number
288- len the length of file name
289- type file type such as directory, symlink, etc
290
291A dentry block consists of 214 dentry slots and file names. Therein a bitmap is
292used to represent whether each dentry is valid or not. A dentry block occupies
2934KB with the following composition.
294
295 Dentry Block(4 K) = bitmap (27 bytes) + reserved (3 bytes) +
296 dentries(11 * 214 bytes) + file name (8 * 214 bytes)
297
298 [Bucket]
299 +--------------------------------+
300 |dentry block 1 | dentry block 2 |
301 +--------------------------------+
302 . .
303 . .
304 . [Dentry Block Structure: 4KB] .
305 +--------+----------+----------+------------+
306 | bitmap | reserved | dentries | file names |
307 +--------+----------+----------+------------+
308 [Dentry Block: 4KB] . .
309 . .
310 . .
311 +------+------+-----+------+
312 | hash | ino | len | type |
313 +------+------+-----+------+
314 [Dentry Structure: 11 bytes]
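
The byte counts in the dentry block composition can be checked as follows. This
is a sketch using only the numbers given above; the names are hypothetical, and
the slots_for_name() helper assumes that a name simply takes as many 8-byte
slots as are needed to hold it.

```python
# Sketch only: verify that the dentry block layout adds up to 4KB.
BLOCK_SIZE = 4096
DENTRY_SLOTS = 214    # dentry slots per block
DENTRY_BYTES = 11     # hash + ino + len + type
NAME_SLOT_BYTES = 8   # file name bytes per slot
RESERVED_BYTES = 3


def bitmap_bytes(slots: int = DENTRY_SLOTS) -> int:
    """One validity bit per slot, rounded up to whole bytes."""
    return (slots + 7) // 8


def dentry_block_bytes() -> int:
    return (bitmap_bytes() + RESERVED_BYTES
            + DENTRY_BYTES * DENTRY_SLOTS
            + NAME_SLOT_BYTES * DENTRY_SLOTS)


def slots_for_name(name_len: int) -> int:
    """Assumed rule: consecutive 8-byte slots needed to cover a name."""
    return (name_len + NAME_SLOT_BYTES - 1) // NAME_SLOT_BYTES


if __name__ == "__main__":
    print(bitmap_bytes(), dentry_block_bytes(), slots_for_name(20))
```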
315
316F2FS implements multi-level hash tables for directory structure. Each level has
317a hash table with dedicated number of hash buckets as shown below. Note that
318"A(2B)" means a bucket includes 2 data blocks.
319
320----------------------
321A : bucket
322B : block
323N : MAX_DIR_HASH_DEPTH
324----------------------
325
326level #0 | A(2B)
327 |
328level #1 | A(2B) - A(2B)
329 |
330level #2 | A(2B) - A(2B) - A(2B) - A(2B)
331 . | . . . .
332level #N/2 | A(2B) - A(2B) - A(2B) - A(2B) - A(2B) - ... - A(2B)
333 . | . . . .
334level #N | A(4B) - A(4B) - A(4B) - A(4B) - A(4B) - ... - A(4B)
335
336The numbers of blocks and buckets are determined by,
337
338 ,- 2, if n < MAX_DIR_HASH_DEPTH / 2,
339 # of blocks in level #n = |
340 `- 4, Otherwise
341
342 ,- 2^n, if n < MAX_DIR_HASH_DEPTH / 2,
343 # of buckets in level #n = |
344 `- 2^((MAX_DIR_HASH_DEPTH / 2) - 1), Otherwise
345
346When F2FS looks up a file name in a directory, it first calculates a hash value
347of the file name. Then, F2FS scans the hash table in level #0 to find the
348dentry consisting of the file name and its inode number. If not found, F2FS
349scans the next hash table in level #1. In this way, F2FS scans the hash tables
350in each level incrementally from 1 to N. In each level, F2FS needs to scan only
351one bucket determined by the following equation, which shows O(log(# of files))
352complexity.
353
354 bucket number to scan in level #n = (hash value) % (# of buckets in level #n)
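
The per-level formulas can be written out as a small sketch. The value of
MAX_DIR_HASH_DEPTH is not given in this text, so it is passed in as a
parameter, and the value 8 used in the example is purely hypothetical. Integer
division for "MAX_DIR_HASH_DEPTH / 2" is also an assumption.

```python
# Sketch only: blocks per bucket, buckets per level, and bucket selection.
def buckets_in_level(n: int, max_depth: int) -> int:
    half = max_depth // 2
    # 2^n buckets in the lower levels, a fixed count from the middle level on.
    return 2 ** n if n < half else 2 ** (half - 1)


def blocks_per_bucket(n: int, max_depth: int) -> int:
    return 2 if n < max_depth // 2 else 4


def bucket_to_scan(hash_value: int, n: int, max_depth: int) -> int:
    """Only this one bucket has to be scanned in level #n."""
    return hash_value % buckets_in_level(n, max_depth)


if __name__ == "__main__":
    depth = 8  # hypothetical MAX_DIR_HASH_DEPTH, for illustration only
    for level in range(depth + 1):
        print(level, buckets_in_level(level, depth),
              blocks_per_bucket(level, depth))
```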
355
356In the case of file creation, F2FS finds empty consecutive slots that cover the
357file name. F2FS searches for the empty slots in the hash tables of all levels
358from 1 to N in the same way as the lookup operation.
359
360The following figure shows an example of two cases holding children.
361 --------------> Dir <--------------
362 | |
363 child child
364
365 child - child [hole] - child
366
367 child - child - child [hole] - [hole] - child
368
369 Case 1: Case 2:
370 Number of children = 6, Number of children = 3,
371 File size = 7 File size = 7
372
373Default Block Allocation
374------------------------
375
376At runtime, F2FS manages six active logs inside "Main" area: Hot/Warm/Cold node
377and Hot/Warm/Cold data.
378
379- Hot node contains direct node blocks of directories.
380- Warm node contains direct node blocks except hot node blocks.
381- Cold node contains indirect node blocks
382- Hot data contains dentry blocks
383- Warm data contains data blocks except hot and cold data blocks
384- Cold data contains multimedia data or migrated data blocks
385
386LFS has two schemes for free space management: threaded log and
387copy-and-compaction. The copy-and-compaction scheme, which is known as cleaning,
388is well-suited for devices showing very good sequential write performance, since
389free segments are served all the time for writing new data. However, it suffers
390from cleaning overhead under high utilization. Conversely, the threaded log
391scheme suffers from random writes, but no cleaning process is needed. F2FS
392adopts a hybrid scheme where the copy-and-compaction scheme is adopted by
393default, but the policy is dynamically changed to the threaded log scheme
394according to the file system status.
395
396In order to align F2FS with underlying flash-based storage, F2FS allocates a
397segment in a unit of section. F2FS expects that the section size would be the
398same as the unit size of garbage collection in FTL. Furthermore, with respect
399to the mapping granularity in FTL, F2FS allocates each section of the active
400logs from different zones as much as possible, since FTL can write the data in
401the active logs into one allocation unit according to its mapping granularity.
402
403Cleaning process
404----------------
405
406F2FS does cleaning both on demand and in the background. On-demand cleaning is
407triggered when there are not enough free segments to serve VFS calls. Background
408cleaner is operated by a kernel thread, and triggers the cleaning job when the
409system is idle.
410
411F2FS supports two victim selection policies: greedy and cost-benefit algorithms.
412In the greedy algorithm, F2FS selects the victim segment having the smallest number
413of valid blocks. In the cost-benefit algorithm, F2FS selects a victim segment
414according to the segment age and the number of valid blocks in order to address
415the log block thrashing problem of the greedy algorithm. F2FS adopts the greedy
416algorithm for the on-demand cleaner, while the background cleaner adopts the
417cost-benefit algorithm.
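
The difference between the two policies can be illustrated with a short sketch.
The greedy rule follows the description above directly. The cost-benefit score
uses the classic formula from the LFS paper cited earlier, (1 - u) * age /
(1 + u), as a stand-in; the exact computation f2fs performs is not described in
this text and may differ. The Segment type and all names are hypothetical.

```python
# Sketch only: greedy vs. cost-benefit victim selection.
from dataclasses import dataclass
from typing import List

BLOCKS_PER_SEGMENT = 512  # 2MB segment / 4KB block


@dataclass
class Segment:
    segno: int
    valid_blocks: int
    age: float  # time since last modification, arbitrary units


def pick_greedy(segments: List[Segment]) -> Segment:
    """Victim is the segment with the fewest valid blocks."""
    return min(segments, key=lambda s: s.valid_blocks)


def cost_benefit_score(seg: Segment) -> float:
    """Classic LFS benefit/cost ratio; higher is a better victim."""
    u = seg.valid_blocks / BLOCKS_PER_SEGMENT  # segment utilization
    return (1.0 - u) * seg.age / (1.0 + u)


def pick_cost_benefit(segments: List[Segment]) -> Segment:
    return max(segments, key=cost_benefit_score)


if __name__ == "__main__":
    segs = [Segment(0, 100, 1.0), Segment(1, 150, 50.0), Segment(2, 500, 100.0)]
    # Greedy prefers the emptiest segment; cost-benefit also weighs age.
    print(pick_greedy(segs).segno, pick_cost_benefit(segs).segno)
```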
418
419In order to identify whether the data in the victim segment are valid or not,
420F2FS manages a bitmap. Each bit represents the validity of a block, and the
421bitmap is composed of a bit stream covering all blocks in the main area.
diff --git a/Documentation/filesystems/nfs/nfs41-server.txt b/Documentation/filesystems/nfs/nfs41-server.txt
index 092fad92a3f0..01c2db769791 100644
--- a/Documentation/filesystems/nfs/nfs41-server.txt
+++ b/Documentation/filesystems/nfs/nfs41-server.txt
@@ -39,21 +39,10 @@ interoperability problems with future clients. Known issues:
           from a linux client are possible, but we aren't really
           conformant with the spec (for example, we don't use kerberos
           on the backchannel correctly).
-        - Incomplete backchannel support: incomplete backchannel gss
-          support and no support for BACKCHANNEL_CTL mean that
-          callbacks (hence delegations and layouts) may not be
-          available and clients confused by the incomplete
-          implementation may fail.
         - We do not support SSV, which provides security for shared
           client-server state (thus preventing unauthorized tampering
           with locks and opens, for example). It is mandatory for
           servers to support this, though no clients use it yet.
-        - Mandatory operations which we do not support, such as
-          DESTROY_CLIENTID, are not currently used by clients, but will be
-          (and the spec recommends their uses in common cases), and
-          clients should not be expected to know how to recover from the
-          case where they are not supported. This will eventually cause
-          interoperability failures.
 
 In addition, some limitations are inherited from the current NFSv4
 implementation:
@@ -89,7 +78,7 @@ Operations
    |                      | MNI        | or OPT)      |                |
    +----------------------+------------+--------------+----------------+
    | ACCESS               | REQ        |              | Section 18.1   |
-NS | BACKCHANNEL_CTL      | REQ        |              | Section 18.33  |
+I  | BACKCHANNEL_CTL      | REQ        |              | Section 18.33  |
 I  | BIND_CONN_TO_SESSION | REQ        |              | Section 18.34  |
    | CLOSE                | REQ        |              | Section 18.2   |
    | COMMIT               | REQ        |              | Section 18.3   |
@@ -99,7 +88,7 @@ NS*| DELEGPURGE | OPT | FDELG (REQ) | Section 18.5 |
    | DELEGRETURN          | OPT        | FDELG,       | Section 18.6   |
    |                      |            | DDELG, pNFS  |                |
    |                      |            | (REQ)        |                |
-NS | DESTROY_CLIENTID     | REQ        |              | Section 18.50  |
+I  | DESTROY_CLIENTID     | REQ        |              | Section 18.50  |
 I  | DESTROY_SESSION      | REQ        |              | Section 18.37  |
 I  | EXCHANGE_ID          | REQ        |              | Section 18.35  |
 I  | FREE_STATEID         | REQ        |              | Section 18.38  |
@@ -192,7 +181,6 @@ EXCHANGE_ID:
 
 CREATE_SESSION:
 * backchannel attributes are ignored
-* backchannel security parameters are ignored
 
 SEQUENCE:
 * no support for dynamic slot table renegotiation (optional)
@@ -202,7 +190,7 @@ Nonstandard compound limitations:
   ca_maxrequestsize request and a ca_maxresponsesize reply, so we may
   fail to live up to the promise we made in CREATE_SESSION fore channel
   negotiation.
-* No more than one IO operation (read, write, readdir) allowed per
-  compound.
+* No more than one read-like operation allowed per compound; encoding
+  replies that cross page boundaries (except for read data) not handled.
 
 See also http://wiki.linux-nfs.org/wiki/index.php/Server_4.0_and_4.1_issues.
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index a1793d670cd0..fd8d0d594fc7 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -33,7 +33,7 @@ Table of Contents
   2     Modifying System Parameters
 
   3     Per-Process Parameters
-  3.1   /proc/<pid>/oom_score_adj - Adjust the oom-killer
+  3.1   /proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj - Adjust the oom-killer
                                                                 score
   3.2   /proc/<pid>/oom_score - Display current oom-killer score
   3.3   /proc/<pid>/io - Display the IO accounting fields
@@ -41,6 +41,7 @@ Table of Contents
   3.5   /proc/<pid>/mountinfo - Information about mounts
   3.6   /proc/<pid>/comm & /proc/<pid>/task/<tid>/comm
   3.7   /proc/<pid>/task/<tid>/children - Information about task children
+  3.8   /proc/<pid>/fdinfo/<fd> - Information about opened file
 
   4     Configuring procfs
   4.1   Mount options
@@ -142,7 +143,7 @@ Table 1-1: Process specific entries in /proc
  pagemap        Page table
  stack          Report full stack trace, enable via CONFIG_STACKTRACE
  smaps          a extension based on maps, showing the memory consumption of
-                each mapping
+                each mapping and flags associated with it
 ..............................................................................
 
 For example, to get the status information of a process, all you have to do is
@@ -181,6 +182,7 @@ read the file /proc/PID/status:
   CapPrm: 0000000000000000
   CapEff: 0000000000000000
   CapBnd: ffffffffffffffff
+  Seccomp: 0
   voluntary_ctxt_switches: 0
   nonvoluntary_ctxt_switches: 1
 
@@ -237,6 +239,7 @@ Table 1-2: Contents of the status files (as of 2.6.30-rc7)
  CapPrm                      bitmap of permitted capabilities
  CapEff                      bitmap of effective capabilities
  CapBnd                      bitmap of capabilities bounding set
+ Seccomp                     seccomp mode, like prctl(PR_GET_SECCOMP, ...)
  Cpus_allowed                mask of CPUs on which this process may run
  Cpus_allowed_list           Same as previous, but in "list format"
  Mems_allowed                mask of memory nodes allowed to this process
@@ -415,8 +418,9 @@ Swap: 0 kB
 KernelPageSize:        4 kB
 MMUPageSize:           4 kB
 Locked:              374 kB
+VmFlags: rd ex mr mw me de
 
-The first of these lines shows the same information as is displayed for the
+the first of these lines shows the same information as is displayed for the
 mapping in /proc/PID/maps. The remaining lines show the size of the mapping
 (size), the amount of the mapping that is currently resident in RAM (RSS), the
 process' proportional share of this mapping (PSS), the number of clean and
@@ -430,6 +434,41 @@ and a page is modified, the file page is replaced by a private anonymous copy.
430"Swap" shows how much would-be-anonymous memory is also used, but out on 434"Swap" shows how much would-be-anonymous memory is also used, but out on
431swap. 435swap.
432 436
437"VmFlags" field deserves a separate description. This member represents the kernel
438flags associated with the particular virtual memory area in two letter encoded
439manner. The codes are the following:
440 rd - readable
441 wr - writeable
442 ex - executable
443 sh - shared
444 mr - may read
445 mw - may write
446 me - may execute
447 ms - may share
448 gd - stack segment growns down
449 pf - pure PFN range
450 dw - disabled write to the mapped file
451 lo - pages are locked in memory
452 io - memory mapped I/O area
453 sr - sequential read advise provided
454 rr - random read advise provided
455 dc - do not copy area on fork
456 de - do not expand area on remapping
457 ac - area is accountable
458 nr - swap space is not reserved for the area
459 ht - area uses huge tlb pages
460 nl - non-linear mapping
461 ar - architecture specific flag
462 dd - do not include area into core dump
463 mm - mixed map area
464 hg - huge page advice flag
465 nh - no-huge page advice flag
466 mg - mergeable advice flag
467
468Note that there is no guarantee that every flag and associated mnemonic will
469be present in all future kernel releases. Things change: existing flags may
470vanish and new ones may be added.
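
As an illustration of the two-letter encoding described above, the following
is a minimal Python sketch that decodes a "VmFlags" line. The table mirrors the
mnemonic list in this section; the function name decode_vmflags is hypothetical,
and unknown codes are reported as such because the set of flags may change
between kernel releases.

```python
# Mnemonics taken from the list in this section; later kernels may add or
# drop entries, so unknown codes are tolerated rather than rejected.
VMFLAG_NAMES = {
    "rd": "readable", "wr": "writeable", "ex": "executable", "sh": "shared",
    "mr": "may read", "mw": "may write", "me": "may execute", "ms": "may share",
    "gd": "stack segment grows down", "pf": "pure PFN range",
    "dw": "disabled write to the mapped file",
    "lo": "pages are locked in memory", "io": "memory mapped I/O area",
    "sr": "sequential read advice provided",
    "rr": "random read advice provided",
    "dc": "do not copy area on fork",
    "de": "do not expand area on remapping",
    "ac": "area is accountable",
    "nr": "swap space is not reserved for the area",
    "ht": "area uses huge tlb pages", "nl": "non-linear mapping",
    "ar": "architecture specific flag",
    "dd": "do not include area into core dump", "mm": "mixed map area",
    "hg": "huge page advice flag", "nh": "no-huge page advice flag",
    "mg": "mergeable advice flag",
}


def decode_vmflags(line):
    """Return (code, meaning) pairs for one 'VmFlags:' line of smaps."""
    key, _, rest = line.partition(":")
    if key.strip() != "VmFlags":
        raise ValueError("not a VmFlags line: %r" % line)
    return [(code, VMFLAG_NAMES.get(code, "unknown")) for code in rest.split()]


# Sample line copied from the smaps example earlier in this section.
decoded = decode_vmflags("VmFlags: rd ex mr mw me de")
for code, meaning in decoded:
    print(code, "-", meaning)
```

The same function can be applied to each "VmFlags:" line read from
/proc/PID/smaps on a kernel that provides the field.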
471
433This file is only present if the CONFIG_MMU kernel configuration option is 472This file is only present if the CONFIG_MMU kernel configuration option is
434enabled. 473enabled.
435 474
@@ -1320,10 +1359,10 @@ of the kernel.
1320CHAPTER 3: PER-PROCESS PARAMETERS 1359CHAPTER 3: PER-PROCESS PARAMETERS
1321------------------------------------------------------------------------------ 1360------------------------------------------------------------------------------
1322 1361
13233.1 /proc/<pid>/oom_score_adj- Adjust the oom-killer score 13623.1 /proc/<pid>/oom_adj & /proc/<pid>/oom_score_adj- Adjust the oom-killer score
1324-------------------------------------------------------------------------------- 1363--------------------------------------------------------------------------------
1325 1364
1326This file can be used to adjust the badness heuristic used to select which 1365These files can be used to adjust the badness heuristic used to select which
1327process gets killed in out of memory conditions. 1366process gets killed in out of memory conditions.
1328 1367
1329The badness heuristic assigns a value to each candidate task ranging from 0 1368The badness heuristic assigns a value to each candidate task ranging from 0
@@ -1361,6 +1400,12 @@ same system, cpuset, mempolicy, or memory controller resources to use at least
1361equivalent to discounting 50% of the task's allowed memory from being considered 1400equivalent to discounting 50% of the task's allowed memory from being considered
1362as scoring against the task. 1401as scoring against the task.
1363 1402
1403For backwards compatibility with previous kernels, /proc/<pid>/oom_adj may also
1404be used to tune the badness score. Its acceptable values range from -16
1405(OOM_ADJUST_MIN) to +15 (OOM_ADJUST_MAX) and a special value of -17
1406(OOM_DISABLE) to disable oom killing entirely for that task. Its value is
1407scaled linearly with /proc/<pid>/oom_score_adj.
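
The text above does not spell out the scale factor. The Python sketch below
shows one plausible reading, under two assumptions that are not stated in this
section: that oom_score_adj ranges from -1000 to +1000, and that the kernel
maps -17 to -1000 (a factor of 1000/17) while pinning +15 to +1000. The
function name oom_adj_to_score_adj is hypothetical.

```python
OOM_DISABLE = -17
OOM_ADJUST_MIN = -16
OOM_ADJUST_MAX = 15
OOM_SCORE_ADJ_MAX = 1000  # assumption: upper bound of oom_score_adj


def oom_adj_to_score_adj(oom_adj):
    """Scale a legacy oom_adj value to the oom_score_adj range (sketch)."""
    in_range = OOM_ADJUST_MIN <= oom_adj <= OOM_ADJUST_MAX
    if oom_adj != OOM_DISABLE and not in_range:
        raise ValueError("oom_adj out of range: %d" % oom_adj)
    # Assumption: the maximum is pinned so that +15 reaches +1000 exactly.
    if oom_adj == OOM_ADJUST_MAX:
        return OOM_SCORE_ADJ_MAX
    # Integer scaling that truncates toward zero, as C division would.
    scaled = abs(oom_adj) * OOM_SCORE_ADJ_MAX // -OOM_DISABLE
    return scaled if oom_adj >= 0 else -scaled


for value in (OOM_DISABLE, OOM_ADJUST_MIN, 0, 8, OOM_ADJUST_MAX):
    print(value, "->", oom_adj_to_score_adj(value))
```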
1408
1364The value of /proc/<pid>/oom_score_adj may be reduced no lower than the last 1409The value of /proc/<pid>/oom_score_adj may be reduced no lower than the last
1365value set by a CAP_SYS_RESOURCE process. To reduce the value any lower 1410value set by a CAP_SYS_RESOURCE process. To reduce the value any lower
1366requires CAP_SYS_RESOURCE. 1411requires CAP_SYS_RESOURCE.
@@ -1375,7 +1420,9 @@ minimal amount of work.
1375------------------------------------------------------------- 1420-------------------------------------------------------------
1376 1421
1377This file can be used to check the current score used by the oom-killer for 1422This file can be used to check the current score used by the oom-killer for
1378any given <pid>. 1423any given <pid>. Use it together with /proc/<pid>/oom_score_adj to tune which
1424process should be killed in an out-of-memory situation.
1425
1379 1426
13803.3 /proc/<pid>/io - Display the IO accounting fields 14273.3 /proc/<pid>/io - Display the IO accounting fields
1381------------------------------------------------------- 1428-------------------------------------------------------
@@ -1587,6 +1634,93 @@ pids, so one need to either stop or freeze processes being inspected
1587if precise results are needed. 1634if precise results are needed.
1588 1635
1589 1636
16373.7 /proc/<pid>/fdinfo/<fd> - Information about opened file
1638---------------------------------------------------------------
1639This file provides information associated with an opened file. The regular
1640files have at least two fields -- 'pos' and 'flags'. The 'pos' represents
1641the current offset of the opened file in decimal form [see lseek(2) for
1642details] and 'flags' denotes the octal O_xxx mask the file has been
1643created with [see open(2) for details].
1644
1645A typical output is
1646
1647 pos: 0
1648 flags: 0100002
1649
1650Files such as eventfd, fsnotify, signalfd and epoll provide, in addition to the
1651regular pos/flags pair, information particular to the objects they represent.
1652
1653 Eventfd files
1654 ~~~~~~~~~~~~~
1655 pos: 0
1656 flags: 04002
1657 eventfd-count: 5a
1658
1659 where 'eventfd-count' is the hex value of the counter.
1660
1661 Signalfd files
1662 ~~~~~~~~~~~~~~
1663 pos: 0
1664 flags: 04002
1665 sigmask: 0000000000000200
1666
1667 where 'sigmask' is the hex value of the signal mask associated
1668 with the file.
1669
1670 Epoll files
1671 ~~~~~~~~~~~
1672 pos: 0
1673 flags: 02
1674 tfd: 5 events: 1d data: ffffffffffffffff
1675
1676 where 'tfd' is the target file descriptor number in decimal form,
1677 'events' is the mask of events being watched and 'data' is the data
1678 associated with the target [see epoll(7) for more details].
1679
1680 Fsnotify files
1681 ~~~~~~~~~~~~~~
1682 For inotify files the format is the following
1683
1684 pos: 0
1685 flags: 02000000
1686 inotify wd:3 ino:9e7e sdev:800013 mask:800afce ignored_mask:0 fhandle-bytes:8 fhandle-type:1 f_handle:7e9e0000640d1b6d
1687
1688 where 'wd' is a watch descriptor in decimal form, i.e. a target file
1689 descriptor number, 'ino' and 'sdev' are the inode and device where the
1690 target file resides, and 'mask' is the mask of events, all in hex
1691 form [see inotify(7) for more details].
1692
1693 If the kernel was built with exportfs support, the path to the target
1694 file is encoded as a file handle. The file handle is provided by three
1695 fields 'fhandle-bytes', 'fhandle-type' and 'f_handle', all in hex
1696 format.
1697
1698 If the kernel is built without exportfs support the file handle won't be
1699 printed out.
1700
1701 If there is no inotify mark attached yet the 'inotify' line will be omitted.
1702
1703 For fanotify files the format is
1704
1705 pos: 0
1706 flags: 02
1707 fanotify flags:10 event-flags:0
1708 fanotify mnt_id:12 mflags:40 mask:38 ignored_mask:40000003
1709 fanotify ino:4f969 sdev:800013 mflags:0 mask:3b ignored_mask:40000000 fhandle-bytes:8 fhandle-type:1 f_handle:69f90400c275b5b4
1710
1711 where fanotify 'flags' and 'event-flags' are the values used in the fanotify_init
1712 call, 'mnt_id' is the mount point identifier and 'mflags' is the value of the
1713 flags associated with the mark, which are tracked separately from the events
1714 mask. 'ino' and 'sdev' are the target inode and device, 'mask' is the events
1715 mask and 'ignored_mask' is the mask of events which are to be ignored.
1716 All are in hex format. Together, 'mflags', 'mask' and 'ignored_mask'
1717 provide information about the flags and mask used in the fanotify_mark
1718 call [see fsnotify manpage for details].
1719
1720 While the first three lines are mandatory and always printed, the rest is
1721 optional and may be omitted if no marks have been created yet.
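
To make the field formats above concrete, here is a minimal Python sketch that
splits fdinfo text into the common pos/flags pair and any object-specific
lines. The function name parse_fdinfo and the returned dictionary layout are
hypothetical; the sample text follows the eventfd example in this section.

```python
def parse_fdinfo(text):
    """Split fdinfo text into pos, flags and any object-specific lines."""
    info = {"extra": []}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(":")
        if key == "pos":
            info["pos"] = int(value)        # offset is printed in decimal
        elif key == "flags":
            info["flags"] = int(value, 8)   # O_xxx mask is printed in octal
        else:
            # eventfd-count, sigmask, tfd, inotify, fanotify and so on are
            # kept verbatim because their layout depends on the file type.
            info["extra"].append(line)
    return info


sample = "pos: 0\nflags: 04002\neventfd-count: 5a\n"
info = parse_fdinfo(sample)
print(info["pos"], oct(info["flags"]), info["extra"])

# The eventfd counter is documented above as a hex value.
count = int(info["extra"][0].partition(":")[2], 16)
print(count)
```

On a running system the same function could be fed the contents of a file
such as /proc/self/fdinfo/0, provided the kernel exposes fdinfo.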
1722
1723
1590------------------------------------------------------------------------------ 1724------------------------------------------------------------------------------
1591Configuring procfs 1725Configuring procfs
1592------------------------------------------------------------------------------ 1726------------------------------------------------------------------------------
diff --git a/Documentation/filesystems/vfat.txt b/Documentation/filesystems/vfat.txt
index de1e6c4dccff..d230dd9c99b0 100644
--- a/Documentation/filesystems/vfat.txt
+++ b/Documentation/filesystems/vfat.txt
@@ -111,6 +111,15 @@ tz=UTC -- Interpret timestamps as UTC rather than local time.
111 useful when mounting devices (like digital cameras) 111 useful when mounting devices (like digital cameras)
112 that are set to UTC in order to avoid the pitfalls of 112 that are set to UTC in order to avoid the pitfalls of
113 local time. 113 local time.
114time_offset=minutes
115 -- Set offset for conversion of timestamps from local time
116 used by FAT to UTC. I.e. <minutes> minutes will be subtracted
117 from each timestamp to convert it to UTC used internally by
118 Linux. This is useful when the time zone set in sys_tz is
119 not the time zone used by the filesystem. Note that this
120 option still does not provide correct time stamps in all
121 cases in the presence of DST - time stamps in a different DST
122 setting will be off by one hour.
114 123
115showexec -- If set, the execute permission bits of the file will be 124showexec -- If set, the execute permission bits of the file will be
116 allowed only if the extension part of the name is .EXE, 125 allowed only if the extension part of the name is .EXE,
diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
index 3fc0c31a6f5d..3e4b3dd1e046 100644
--- a/Documentation/filesystems/xfs.txt
+++ b/Documentation/filesystems/xfs.txt
@@ -43,7 +43,7 @@ When mounting an XFS filesystem, the following options are accepted.
43 Issue command to let the block device reclaim space freed by the 43 Issue command to let the block device reclaim space freed by the
44 filesystem. This is useful for SSD devices, thinly provisioned 44 filesystem. This is useful for SSD devices, thinly provisioned
45 LUNs and virtual machine images, but may have a performance 45 LUNs and virtual machine images, but may have a performance
46 impact. This option is incompatible with the nodelaylog option. 46 impact.
47 47
48 dmapi 48 dmapi
49 Enable the DMAPI (Data Management API) event callouts. 49 Enable the DMAPI (Data Management API) event callouts.
@@ -72,8 +72,15 @@ When mounting an XFS filesystem, the following options are accepted.
72 Indicates that XFS is allowed to create inodes at any location 72 Indicates that XFS is allowed to create inodes at any location
73 in the filesystem, including those which will result in inode 73 in the filesystem, including those which will result in inode
74 numbers occupying more than 32 bits of significance. This is 74 numbers occupying more than 32 bits of significance. This is
75 provided for backwards compatibility, but causes problems for 75 the default allocation option. Applications which do not handle
76 backup applications that cannot handle large inode numbers. 76 inode numbers bigger than 32 bits should use the inode32 option.
77
78 inode32
79 Indicates that XFS is limited to creating inodes at locations which
80 will not result in inode numbers with more than 32 bits of
81 significance. This is provided for backwards compatibility, since
82 64-bit inode numbers might cause problems for some applications
83 that cannot handle large inode numbers.
77 84
78 largeio/nolargeio 85 largeio/nolargeio
79 If "nolargeio" is specified, the optimal I/O reported in 86 If "nolargeio" is specified, the optimal I/O reported in