author | Linus Torvalds <torvalds@linux-foundation.org> | 2009-01-08 20:14:59 -0500 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2009-01-08 20:14:59 -0500 |
commit | 2150edc6c5cf00f7adb54538b9ea2a3e9cedca3f (patch) | |
tree | f72a0d85e66f500b4cead348a231e3d3b9f357bc /Documentation/filesystems | |
parent | cd764695b67386a81964f68e9c66efd9f13f4d29 (diff) | |
parent | 4b905671d2ea09fd48fed72c581df17e40823f39 (diff) |
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (57 commits)
jbd2: Fix oops in jbd2_journal_init_inode() on corrupted fs
ext4: Remove "extents" mount option
block: Add Kconfig help which notes that ext4 needs CONFIG_LBD
ext4: Make printk's consistently prefixed with "EXT4-fs: "
ext4: Add sanity checks for the superblock before mounting the filesystem
ext4: Add mount option to set kjournald's I/O priority
jbd2: Submit writes to the journal using WRITE_SYNC
jbd2: Add pid and journal device name to the "kjournald2 starting" message
ext4: Add markers for better debuggability
ext4: Remove code to create the journal inode
ext4: provide function to release metadata pages under memory pressure
ext3: provide function to release metadata pages under memory pressure
add releasepage hooks to block devices which can be used by file systems
ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc
ext4: Init the complete page while building buddy cache
ext4: Don't allow new groups to be added during block allocation
ext4: mark the blocks/inode bitmap beyond end of group as used
ext4: Use new buffer_head flag to check uninit group bitmaps initialization
ext4: Fix the race between read_inode_bitmap() and ext4_new_inode()
ext4: code cleanup
...
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- | Documentation/filesystems/ext4.txt | 85 |
1 files changed, 67 insertions, 18 deletions
diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt index 174eaff7ded9..cec829bc7291 100644 --- a/Documentation/filesystems/ext4.txt +++ b/Documentation/filesystems/ext4.txt | |||
@@ -58,13 +58,22 @@ Note: More extensive information for getting started with ext4 can be | |||
58 | 58 | ||
59 | # mount -t ext4 /dev/hda1 /wherever | 59 | # mount -t ext4 /dev/hda1 /wherever |
60 | 60 | ||
61 | - When comparing performance with other filesystems, remember that | 61 | - When comparing performance with other filesystems, it's always |
62 | ext3/4 by default offers higher data integrity guarantees than most. | 62 | important to try multiple workloads; very often a subtle change in a |
63 | So when comparing with a metadata-only journalling filesystem, such | 63 | workload parameter can completely change the ranking of which |
64 | as ext3, use `mount -o data=writeback'. And you might as well use | 64 | filesystems do well compared to others. When comparing versus ext3, |
65 | `mount -o nobh' too along with it. Making the journal larger than | 65 | note that ext4 enables write barriers by default, while ext3 does |
66 | the mke2fs default often helps performance with metadata-intensive | 66 | not enable write barriers by default. So it is useful to |
67 | workloads. | 67 | explicitly specify whether barriers are enabled or not via the |
68 | '-o barrier=[0|1]' mount option for both ext3 and ext4 filesystems | |
69 | for a fair comparison. When tuning ext3 for best benchmark numbers, | ||
70 | it is often worthwhile to try changing the data journaling mode; '-o | ||
71 | data=writeback,nobh' can be faster for some workloads. (Note | ||
72 | however that running mounted with data=writeback can potentially | ||
73 | leave stale data exposed in recently written files in case of an | ||
74 | unclean shutdown, which could be a security exposure in some | ||
75 | situations.) Configuring the filesystem with a large journal can | ||
76 | also be helpful for metadata-intensive workloads. | ||
68 | 77 | ||
69 | 2. Features | 78 | 2. Features |
70 | =========== | 79 | =========== |
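For the benchmarking advice in the hunk above, the mount invocations for a like-for-like ext3/ext4 run could be generated along the lines of this minimal sketch. It assumes the barrier option is spelled `barrier=` (as it appears in the ext3 and ext4 mount-option tables); the function name `mount_command` is hypothetical, and the device and mount point are simply the ones used in the documentation's own example.

```python
def mount_command(fstype, device, mountpoint, barrier, data_mode=None):
    """Build a mount command line with the barrier setting made explicit.

    Hypothetical helper for illustration; it only assembles the argument
    list and does not run mount(8).
    """
    if fstype not in ("ext3", "ext4"):
        raise ValueError("expected ext3 or ext4")
    # State the barrier setting explicitly so both filesystems are
    # compared under the same conditions, whatever their defaults are.
    opts = ["barrier=%d" % (1 if barrier else 0)]
    if data_mode is not None:
        # e.g. "writeback", "ordered" or "journal"
        opts.append("data=%s" % data_mode)
    return ["mount", "-t", fstype, "-o", ",".join(opts), device, mountpoint]


if __name__ == "__main__":
    for fs in ("ext3", "ext4"):
        print(" ".join(mount_command(fs, "/dev/hda1", "/wherever", barrier=True)))
```

Passing the same `barrier` value for both filesystems is the point of the exercise: the defaults differ, so leaving the option out would benchmark two different durability settings.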
@@ -74,7 +83,7 @@ Note: More extensive information for getting started with ext4 can be | |||
74 | * ability to use filesystems > 16TB (e2fsprogs support not available yet) | 83 | * ability to use filesystems > 16TB (e2fsprogs support not available yet) |
75 | * extent format reduces metadata overhead (RAM, IO for access, transactions) | 84 | * extent format reduces metadata overhead (RAM, IO for access, transactions) |
76 | * extent format more robust in face of on-disk corruption due to magics, | 85 | * extent format more robust in face of on-disk corruption due to magics, |
77 | * internal redunancy in tree | 86 | * internal redundancy in tree |
78 | * improved file allocation (multi-block alloc) | 87 | * improved file allocation (multi-block alloc) |
79 | * fix 32000 subdirectory limit | 88 | * fix 32000 subdirectory limit |
80 | * nsec timestamps for mtime, atime, ctime, create time | 89 | * nsec timestamps for mtime, atime, ctime, create time |
@@ -116,10 +125,11 @@ grouping of bitmaps and inode tables. Some test results available here: | |||
116 | When mounting an ext4 filesystem, the following options are accepted: | 125 | When mounting an ext4 filesystem, the following options are accepted: |
117 | (*) == default | 126 | (*) == default |
118 | 127 | ||
119 | extents (*) ext4 will use extents to address file data. The | 128 | ro Mount filesystem read only. Note that ext4 will |
120 | file system will no longer be mountable by ext3. | 129 | replay the journal (and thus write to the |
121 | 130 | partition) even when mounted "read only". The | |
122 | noextents ext4 will not use extents for newly created files | 131 | mount options "ro,noload" can be used to prevent |
132 | writes to the filesystem. | ||
123 | 133 | ||
124 | journal_checksum Enable checksumming of the journal transactions. | 134 | journal_checksum Enable checksumming of the journal transactions. |
125 | This will allow the recovery code in e2fsck and the | 135 | This will allow the recovery code in e2fsck and the |
@@ -134,17 +144,17 @@ journal_async_commit Commit block can be written to disk without waiting | |||
134 | journal=update Update the ext4 file system's journal to the current | 144 | journal=update Update the ext4 file system's journal to the current |
135 | format. | 145 | format. |
136 | 146 | ||
137 | journal=inum When a journal already exists, this option is ignored. | ||
138 | Otherwise, it specifies the number of the inode which | ||
139 | will represent the ext4 file system's journal file. | ||
140 | |||
141 | journal_dev=devnum When the external journal device's major/minor numbers | 147 | journal_dev=devnum When the external journal device's major/minor numbers |
142 | have changed, this option allows the user to specify | 148 | have changed, this option allows the user to specify |
143 | the new journal location. The journal device is | 149 | the new journal location. The journal device is |
144 | identified through its new major/minor numbers encoded | 150 | identified through its new major/minor numbers encoded |
145 | in devnum. | 151 | in devnum. |
146 | 152 | ||
147 | noload Don't load the journal on mounting. | 153 | noload Don't load the journal on mounting. Note that |
154 | if the filesystem was not unmounted cleanly, | ||
155 | skipping the journal replay will lead to the | ||
156 | filesystem containing inconsistencies that can | ||
157 | lead to any number of problems. | ||
148 | 158 | ||
149 | data=journal All data are committed into the journal prior to being | 159 | data=journal All data are committed into the journal prior to being |
150 | written into the main file system. | 160 | written into the main file system. |
@@ -219,9 +229,12 @@ minixdf Make 'df' act like Minix. | |||
219 | 229 | ||
220 | debug Extra debugging information is sent to syslog. | 230 | debug Extra debugging information is sent to syslog. |
221 | 231 | ||
222 | errors=remount-ro(*) Remount the filesystem read-only on an error. | 232 | errors=remount-ro Remount the filesystem read-only on an error. |
223 | errors=continue Keep going on a filesystem error. | 233 | errors=continue Keep going on a filesystem error. |
224 | errors=panic Panic and halt the machine if an error occurs. | 234 | errors=panic Panic and halt the machine if an error occurs. |
235 | (These mount options override the errors behavior | ||
236 | specified in the superblock, which can be configured | ||
237 | using tune2fs) | ||
225 | 238 | ||
226 | data_err=ignore(*) Just print an error message if an error occurs | 239 | data_err=ignore(*) Just print an error message if an error occurs |
227 | in a file data buffer in ordered mode. | 240 | in a file data buffer in ordered mode. |
@@ -261,6 +274,42 @@ delalloc (*) Deferring block allocation until write-out time. | |||
261 | nodelalloc Disable delayed allocation. Blocks are allocated | 274 | nodelalloc Disable delayed allocation. Blocks are allocated |
262 | when data is copied from user to page cache. | 275 | when data is copied from user to page cache. |
263 | 276 | ||
277 | max_batch_time=usec Maximum amount of time ext4 should wait for | ||
278 | additional filesystem operations to be batched | |
279 | together with a synchronous write operation. | ||
280 | Since a synchronous write operation is going to | ||
281 | force a commit and then a wait for the I/O to | |
282 | complete, it doesn't cost much, and can be a | |
283 | huge throughput win, we wait for a small amount | ||
284 | of time to see if any other transactions can | ||
285 | piggyback on the synchronous write. The | ||
286 | algorithm used is designed to automatically tune | ||
287 | for the speed of the disk, by measuring the | ||
288 | amount of time (on average) that it takes to | ||
289 | finish committing a transaction. Call this time | ||
290 | the "commit time". If the time that the | ||
291 | transaction has been running is less than the | |
292 | commit time, ext4 will try sleeping for the | ||
293 | commit time to see if other operations will join | ||
294 | the transaction. The commit time is capped by | ||
295 | the max_batch_time, which defaults to 15000us | ||
296 | (15ms). This optimization can be turned off | ||
297 | entirely by setting max_batch_time to 0. | ||
298 | |||
299 | min_batch_time=usec This parameter sets the commit time (as | ||
300 | described above) to be at least min_batch_time. | ||
301 | It defaults to zero microseconds. Increasing | ||
302 | this parameter may improve the throughput of | ||
303 | multi-threaded, synchronous workloads on very | ||
304 | fast disks, at the cost of increasing latency. | ||
305 | |||
306 | journal_ioprio=prio The I/O priority (from 0 to 7, where 0 is the | ||
307 | highest priority) which should be used for I/O | |
308 | operations submitted by kjournald2 during a | ||
309 | commit operation. This defaults to 3, which is | ||
310 | a slightly higher priority than the default I/O | ||
311 | priority. | ||
312 | |||
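The batching rule described for max_batch_time and min_batch_time can be illustrated with a small model. This is a sketch of the behaviour as the text describes it, not the kernel's code: the function name `batch_sleep_us` and its parameters are hypothetical, and applying the minimum before the cap (so that max_batch_time wins if the two conflict) is an assumption.

```python
DEFAULT_MIN_BATCH_US = 0      # min_batch_time default from the text
DEFAULT_MAX_BATCH_US = 15000  # max_batch_time default from the text (15ms)


def batch_sleep_us(avg_commit_us, transaction_age_us,
                   min_batch_us=DEFAULT_MIN_BATCH_US,
                   max_batch_us=DEFAULT_MAX_BATCH_US):
    """Return how long (in microseconds) to wait for other operations
    to join the running transaction, or 0 for no wait.

    avg_commit_us      -- measured average time to commit a transaction
    transaction_age_us -- how long the current transaction has been running
    """
    # The "commit time" is at least min_batch_time ...
    commit_time = max(avg_commit_us, min_batch_us)
    # ... and is capped by max_batch_time (assumed to be applied last).
    commit_time = min(commit_time, max_batch_us)
    # Only sleep if the transaction is younger than the commit time.
    # With max_batch_us == 0 the commit time is 0, so this never sleeps.
    if transaction_age_us < commit_time:
        return commit_time
    return 0


if __name__ == "__main__":
    print(batch_sleep_us(4000, 1000))                  # young transaction
    print(batch_sleep_us(40000, 1000))                 # slow disk, capped
    print(batch_sleep_us(4000, 1000, max_batch_us=0))  # batching disabled
```

On a fast disk the measured commit time is small, so the wait is short or skipped; raising min_batch_time forces a longer wait there, which is the throughput-for-latency trade the text mentions.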
264 | Data Mode | 313 | Data Mode |
265 | ========= | 314 | ========= |
266 | There are 3 different data modes: | 315 | There are 3 different data modes: |