Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- Documentation/filesystems/Locking 2
-rw-r--r-- Documentation/filesystems/btrfs.txt 91
-rw-r--r-- Documentation/filesystems/ext4.txt 85
-rw-r--r-- Documentation/filesystems/ocfs2.txt 3
-rw-r--r-- Documentation/filesystems/proc.txt 27
5 files changed, 187 insertions, 21 deletions
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index ccec55394380..cfbfa15a46ba 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -397,7 +397,7 @@ prototypes:
397}; 397};
398 398
399locking rules: 399locking rules:
400 All except ->poll() may block. 400 All may block.
401 BKL 401 BKL
402llseek: no (see below) 402llseek: no (see below)
403read: no 403read: no
diff --git a/Documentation/filesystems/btrfs.txt b/Documentation/filesystems/btrfs.txt
new file mode 100644
index 000000000000..64087c34327f
--- /dev/null
+++ b/Documentation/filesystems/btrfs.txt
@@ -0,0 +1,91 @@
1
2 BTRFS
3 =====
4
5Btrfs is a new copy on write filesystem for Linux aimed at
6implementing advanced features while focusing on fault tolerance,
7repair and easy administration. Initially developed by Oracle, Btrfs
8is licensed under the GPL and open for contribution from anyone.
9
10Linux has a wealth of filesystems to choose from, but we are facing a
11number of challenges with scaling to the large storage subsystems that
12are becoming common in today's data centers. Filesystems need to scale
13in their ability to address and manage large storage, and also in
14their ability to detect, repair and tolerate errors in the data stored
15on disk. Btrfs is under heavy development, and is not suitable for
16any uses other than benchmarking and review. The Btrfs disk format is
17not yet finalized.
18
19The main Btrfs features include:
20
21 * Extent based file storage (2^64 max file size)
22 * Space efficient packing of small files
23 * Space efficient indexed directories
24 * Dynamic inode allocation
25 * Writable snapshots
26 * Subvolumes (separate internal filesystem roots)
27 * Object level mirroring and striping
28 * Checksums on data and metadata (multiple algorithms available)
29 * Compression
30 * Integrated multiple device support, with several raid algorithms
31 * Online filesystem check (not yet implemented)
32 * Very fast offline filesystem check
33 * Efficient incremental backup and FS mirroring (not yet implemented)
34 * Online filesystem defragmentation
35
36
37
38 MAILING LIST
39 ============
40
41There is a Btrfs mailing list hosted on vger.kernel.org. You can
42find details on how to subscribe here:
43
44http://vger.kernel.org/vger-lists.html#linux-btrfs
45
46Mailing list archives are available from gmane:
47
48http://dir.gmane.org/gmane.comp.file-systems.btrfs
49
50
51
52 IRC
53 ===
54
55Discussion of Btrfs also occurs on the #btrfs channel of the Freenode
56IRC network.
57
58
59
60 UTILITIES
61 =========
62
63Userspace tools for creating and manipulating Btrfs file systems are
64available from the git repository at the following location:
65
66 http://git.kernel.org/?p=linux/kernel/git/mason/btrfs-progs-unstable.git
67 git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs-unstable.git
68
69These include the following tools:
70
71mkfs.btrfs: create a filesystem
72
73btrfsctl: control program to create snapshots and subvolumes:
74
75 mount /dev/sda2 /mnt
76 btrfsctl -s new_subvol_name /mnt
77 btrfsctl -s snapshot_of_default /mnt/default
78 btrfsctl -s snapshot_of_new_subvol /mnt/new_subvol_name
79 btrfsctl -s snapshot_of_a_snapshot /mnt/snapshot_of_new_subvol
80 ls /mnt
81 default snapshot_of_a_snapshot snapshot_of_new_subvol
82 new_subvol_name snapshot_of_default
83
84 Snapshots and subvolumes cannot be deleted right now, but you can
85 rm -rf all the files and directories inside them.
86
87btrfsck: do a limited check of the FS extent trees.
88
89btrfs-debug-tree: print all of the FS metadata in text form. Example:
90
91 btrfs-debug-tree /dev/sda2 >& big_output_file
diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt
index 174eaff7ded9..cec829bc7291 100644
--- a/Documentation/filesystems/ext4.txt
+++ b/Documentation/filesystems/ext4.txt
@@ -58,13 +58,22 @@ Note: More extensive information for getting started with ext4 can be
58 58
59 # mount -t ext4 /dev/hda1 /wherever 59 # mount -t ext4 /dev/hda1 /wherever
60 60
61 - When comparing performance with other filesystems, remember that 61 - When comparing performance with other filesystems, it's always
62 ext3/4 by default offers higher data integrity guarantees than most. 62 important to try multiple workloads; very often a subtle change in a
63 So when comparing with a metadata-only journalling filesystem, such 63 workload parameter can completely change the ranking of which
64 as ext3, use `mount -o data=writeback'. And you might as well use 64 filesystems do well compared to others. When comparing versus ext3,
65 `mount -o nobh' too along with it. Making the journal larger than 65 note that ext4 enables write barriers by default, while ext3 does
66 the mke2fs default often helps performance with metadata-intensive 66 not enable write barriers by default. So it is useful to
67 workloads. 67 explicitly specify whether barriers are enabled or not via the
68 '-o barrier=[0|1]' mount option for both ext3 and ext4 filesystems
69 for a fair comparison. When tuning ext3 for best benchmark numbers,
70 it is often worthwhile to try changing the data journaling mode; '-o
71 data=writeback,nobh' can be faster for some workloads. (Note
72 however that running mounted with data=writeback can potentially
73 leave stale data exposed in recently written files in case of an
74 unclean shutdown, which could be a security exposure in some
75 situations.) Configuring the filesystem with a large journal can
76 also be helpful for metadata-intensive workloads.
68 77
692. Features 782. Features
70=========== 79===========
@@ -74,7 +83,7 @@ Note: More extensive information for getting started with ext4 can be
74* ability to use filesystems > 16TB (e2fsprogs support not available yet) 83* ability to use filesystems > 16TB (e2fsprogs support not available yet)
75* extent format reduces metadata overhead (RAM, IO for access, transactions) 84* extent format reduces metadata overhead (RAM, IO for access, transactions)
76* extent format more robust in face of on-disk corruption due to magics, 85* extent format more robust in face of on-disk corruption due to magics,
77* internal redunancy in tree 86* internal redundancy in tree
78* improved file allocation (multi-block alloc) 87* improved file allocation (multi-block alloc)
79* fix 32000 subdirectory limit 88* fix 32000 subdirectory limit
80* nsec timestamps for mtime, atime, ctime, create time 89* nsec timestamps for mtime, atime, ctime, create time
@@ -116,10 +125,11 @@ grouping of bitmaps and inode tables. Some test results available here:
116When mounting an ext4 filesystem, the following option are accepted: 125When mounting an ext4 filesystem, the following option are accepted:
117(*) == default 126(*) == default
118 127
119extents (*) ext4 will use extents to address file data. The 128ro Mount filesystem read only. Note that ext4 will
120 file system will no longer be mountable by ext3. 129 replay the journal (and thus write to the
121 130 partition) even when mounted "read only". The
122noextents ext4 will not use extents for newly created files 131 mount options "ro,noload" can be used to prevent
132 writes to the filesystem.
123 133
124journal_checksum Enable checksumming of the journal transactions. 134journal_checksum Enable checksumming of the journal transactions.
125 This will allow the recovery code in e2fsck and the 135 This will allow the recovery code in e2fsck and the
@@ -134,17 +144,17 @@ journal_async_commit Commit block can be written to disk without waiting
134journal=update Update the ext4 file system's journal to the current 144journal=update Update the ext4 file system's journal to the current
135 format. 145 format.
136 146
137journal=inum When a journal already exists, this option is ignored.
138 Otherwise, it specifies the number of the inode which
139 will represent the ext4 file system's journal file.
140
141journal_dev=devnum When the external journal device's major/minor numbers 147journal_dev=devnum When the external journal device's major/minor numbers
142 have changed, this option allows the user to specify 148 have changed, this option allows the user to specify
143 the new journal location. The journal device is 149 the new journal location. The journal device is
144 identified through its new major/minor numbers encoded 150 identified through its new major/minor numbers encoded
145 in devnum. 151 in devnum.
146 152
147noload Don't load the journal on mounting. 153noload Don't load the journal on mounting. Note that
154 if the filesystem was not unmounted cleanly,
155 skipping the journal replay will lead to the
156 filesystem containing inconsistencies that can
157 lead to any number of problems.
148 158
149data=journal All data are committed into the journal prior to being 159data=journal All data are committed into the journal prior to being
150 written into the main file system. 160 written into the main file system.
@@ -219,9 +229,12 @@ minixdf Make 'df' act like Minix.
219 229
220debug Extra debugging information is sent to syslog. 230debug Extra debugging information is sent to syslog.
221 231
222errors=remount-ro(*) Remount the filesystem read-only on an error. 232errors=remount-ro Remount the filesystem read-only on an error.
223errors=continue Keep going on a filesystem error. 233errors=continue Keep going on a filesystem error.
224errors=panic Panic and halt the machine if an error occurs. 234errors=panic Panic and halt the machine if an error occurs.
235 (These mount options override the errors behavior
236 specified in the superblock, which can be configured
237 using tune2fs)
225 238
226data_err=ignore(*) Just print an error message if an error occurs 239data_err=ignore(*) Just print an error message if an error occurs
227 in a file data buffer in ordered mode. 240 in a file data buffer in ordered mode.
@@ -261,6 +274,42 @@ delalloc (*) Deferring block allocation until write-out time.
261nodelalloc Disable delayed allocation. Blocks are allocation 274nodelalloc Disable delayed allocation. Blocks are allocation
262 when data is copied from user to page cache. 275 when data is copied from user to page cache.
263 276
277max_batch_time=usec Maximum amount of time ext4 should wait for
278 additional filesystem operations to be batched
279 together with a synchronous write operation.
280 Since a synchronous write operation is going to
281 force a commit and then wait for the I/O to
282 complete, it doesn't cost much, and can be a
283 huge throughput win, we wait for a small amount
284 of time to see if any other transactions can
285 piggyback on the synchronous write. The
286 algorithm used is designed to automatically tune
287 for the speed of the disk, by measuring the
288 amount of time (on average) that it takes to
289 finish committing a transaction. Call this time
290 the "commit time". If the time that the
291 transaction has been running is less than the
292 commit time, ext4 will try sleeping for the
293 commit time to see if other operations will join
294 the transaction. The commit time is capped by
295 the max_batch_time, which defaults to 15000us
296 (15ms). This optimization can be turned off
297 entirely by setting max_batch_time to 0.
298
299min_batch_time=usec This parameter sets the commit time (as
300 described above) to be at least min_batch_time.
301 It defaults to zero microseconds. Increasing
302 this parameter may improve the throughput of
303 multi-threaded, synchronous workloads on very
304 fast disks, at the cost of increasing latency.
305
306journal_ioprio=prio The I/O priority (from 0 to 7, where 0 is the
307 highest priority) which should be used for I/O
308 operations submitted by kjournald2 during a
309 commit operation. This defaults to 3, which is
310 a slightly higher priority than the default I/O
311 priority.
312
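The batching rule described for max_batch_time and min_batch_time above can be sketched as a small function. This is only an illustration of the documented behaviour, not the kernel code: the function name, its parameters, and the order in which the two limits are applied (minimum first, then the cap) are assumptions made for the example.

```python
def batch_sleep_usec(avg_commit_usec, txn_age_usec,
                     min_batch_time=0, max_batch_time=15000):
    """Return how long (in microseconds) a synchronous writer would wait
    for other operations to join the running transaction.

    avg_commit_usec: measured average time to commit a transaction.
    txn_age_usec:    how long the current transaction has been running.
    All names here are hypothetical; defaults follow the text above.
    """
    # Raise the measured commit time to at least min_batch_time ...
    commit_time = max(avg_commit_usec, min_batch_time)
    # ... then cap it by max_batch_time (assumed order of the two limits).
    commit_time = min(commit_time, max_batch_time)
    # Wait only if the transaction is younger than the commit time.
    # With max_batch_time=0 the commit time is 0, so this never waits.
    if txn_age_usec < commit_time:
        return commit_time
    return 0


if __name__ == "__main__":
    print(batch_sleep_usec(5000, 1000))                    # young transaction
    print(batch_sleep_usec(40000, 1000))                   # slow disk, capped
    print(batch_sleep_usec(5000, 1000, max_batch_time=0))  # batching disabled
```

On a fast disk the measured commit time is small, so the wait is short unless min_batch_time raises it; on a slow disk the cap keeps the added latency bounded.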
264Data Mode 313Data Mode
265========= 314=========
266There are 3 different data modes: 315There are 3 different data modes:
diff --git a/Documentation/filesystems/ocfs2.txt b/Documentation/filesystems/ocfs2.txt
index 67310fbbb7df..c2a0871280a0 100644
--- a/Documentation/filesystems/ocfs2.txt
+++ b/Documentation/filesystems/ocfs2.txt
@@ -31,7 +31,6 @@ Features which OCFS2 does not support yet:
31 - quotas 31 - quotas
32 - Directory change notification (F_NOTIFY) 32 - Directory change notification (F_NOTIFY)
33 - Distributed Caching (F_SETLEASE/F_GETLEASE/break_lease) 33 - Distributed Caching (F_SETLEASE/F_GETLEASE/break_lease)
34 - POSIX ACLs
35 34
36Mount options 35Mount options
37============= 36=============
@@ -79,3 +78,5 @@ inode64 Indicates that Ocfs2 is allowed to create inodes at
79 bits of significance. 78 bits of significance.
80user_xattr (*) Enables Extended User Attributes. 79user_xattr (*) Enables Extended User Attributes.
81nouser_xattr Disables Extended User Attributes. 80nouser_xattr Disables Extended User Attributes.
81acl Enables POSIX Access Control Lists support.
82noacl (*) Disables POSIX Access Control Lists support.
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 71df353e367c..d105eb45282a 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -140,6 +140,7 @@ Table 1-1: Process specific entries in /proc
140 statm Process memory status information 140 statm Process memory status information
141 status Process status in human readable form 141 status Process status in human readable form
142 wchan If CONFIG_KALLSYMS is set, a pre-decoded wchan 142 wchan If CONFIG_KALLSYMS is set, a pre-decoded wchan
143 stack Report full stack trace, enable via CONFIG_STACKTRACE
143 smaps Extension based on maps, the rss size for each mapped file 144 smaps Extension based on maps, the rss size for each mapped file
144.............................................................................. 145..............................................................................
145 146
@@ -1385,6 +1386,15 @@ swapcache reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer
1385to retain dentry and inode caches. Increasing vfs_cache_pressure beyond 100 1386to retain dentry and inode caches. Increasing vfs_cache_pressure beyond 100
1386causes the kernel to prefer to reclaim dentries and inodes. 1387causes the kernel to prefer to reclaim dentries and inodes.
1387 1388
1389dirty_background_bytes
1390----------------------
1391
1392Contains the amount of dirty memory at which the pdflush background writeback
1393daemon will start writeback.
1394
1395If dirty_background_bytes is written, dirty_background_ratio becomes a function
1396of its value (dirty_background_bytes / the amount of dirtyable system memory).
1397
1388dirty_background_ratio 1398dirty_background_ratio
1389---------------------- 1399----------------------
1390 1400
@@ -1393,14 +1403,29 @@ pages + file cache, not including locked pages and HugePages), the number of
1393pages at which the pdflush background writeback daemon will start writing out 1403pages at which the pdflush background writeback daemon will start writing out
1394dirty data. 1404dirty data.
1395 1405
1406If dirty_background_ratio is written, dirty_background_bytes becomes a function
1407of its value (dirty_background_ratio * the amount of dirtyable system memory).
1408
1409dirty_bytes
1410-----------
1411
1412Contains the amount of dirty memory at which a process generating disk writes
1413will itself start writeback.
1414
1415If dirty_bytes is written, dirty_ratio becomes a function of its value
1416(dirty_bytes / the amount of dirtyable system memory).
1417
1396dirty_ratio 1418dirty_ratio
1397----------------- 1419-----------
1398 1420
1399Contains, as a percentage of the dirtyable system memory (free pages + mapped 1421Contains, as a percentage of the dirtyable system memory (free pages + mapped
1400pages + file cache, not including locked pages and HugePages), the number of 1422pages + file cache, not including locked pages and HugePages), the number of
1401pages at which a process which is generating disk writes will itself start 1423pages at which a process which is generating disk writes will itself start
1402writing out dirty data. 1424writing out dirty data.
1403 1425
1426If dirty_ratio is written, dirty_bytes becomes a function of its value
1427(dirty_ratio * the amount of dirtyable system memory).
1428
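The relationship between the *_bytes and *_ratio settings described above is simple arithmetic, sketched below. The amount of dirtyable system memory is computed inside the kernel; here it is just an assumed input, and the function names and the integer rounding are illustrative choices, not the kernel's implementation.

```python
def ratio_from_bytes(dirty_bytes, dirtyable_bytes):
    """Percentage of dirtyable memory that a byte threshold represents
    (dirty_bytes / the amount of dirtyable system memory)."""
    return 100.0 * dirty_bytes / dirtyable_bytes


def bytes_from_ratio(dirty_ratio_percent, dirtyable_bytes):
    """Byte threshold implied by a percentage setting
    (dirty_ratio * the amount of dirtyable system memory)."""
    return dirty_ratio_percent * dirtyable_bytes // 100


if __name__ == "__main__":
    dirtyable = 4 * 2**30  # assume 4 GiB of dirtyable memory
    print(bytes_from_ratio(10, dirtyable))     # threshold for a 10% ratio
    print(ratio_from_bytes(2**30, dirtyable))  # 1 GiB as a percentage
```

The same two formulas apply to the dirty_background_bytes / dirty_background_ratio pair.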
1404dirty_writeback_centisecs 1429dirty_writeback_centisecs
1405------------------------- 1430-------------------------
1406 1431