author: Darrick J. Wong <darrick.wong@oracle.com>, 2018-10-02 22:45:25 -0400
committer: Theodore Ts'o <tytso@mit.edu>, 2018-10-02 22:45:25 -0400
commit: c0e3e0406a0c39044c7dc25f3386694542d50fcc
tree: 8e463117a71adc0685c44eb6f06c701a09089149
parent: de7abd7bbb73d67f90c6fb48d4b2debe54f6f46e
docs: make ext4 readme tables readable

The tables in the ext4 readme are not particularly space efficient in the
text or html outputs, and they're totally broken in the pdf output. Convert
them into titled paragraphs so that they render more nicely.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-rw-r--r-- Documentation/filesystems/ext4/ext4.rst 821
1 files changed, 391 insertions, 430 deletions
diff --git a/Documentation/filesystems/ext4/ext4.rst b/Documentation/filesystems/ext4/ext4.rst
index 9d4368d591fa..e2b6bb7c2730 100644
--- a/Documentation/filesystems/ext4/ext4.rst
+++ b/Documentation/filesystems/ext4/ext4.rst
@@ -101,269 +101,256 @@ Options
101When mounting an ext4 filesystem, the following option are accepted: 101When mounting an ext4 filesystem, the following option are accepted:
102(*) == default 102(*) == default
103 103
104======================= ======================================================= 104 ro
105Mount Option Description 105 Mount filesystem read only. Note that ext4 will replay the journal (and
106======================= ======================================================= 106 thus write to the partition) even when mounted "read only". The mount
107ro Mount filesystem read only. Note that ext4 will 107 options "ro,noload" can be used to prevent writes to the filesystem.
108 replay the journal (and thus write to the 108
109 partition) even when mounted "read only". The 109 journal_checksum
110 mount options "ro,noload" can be used to prevent 110 Enable checksumming of the journal transactions. This will allow the
111 writes to the filesystem. 111 recovery code in e2fsck and the kernel to detect corruption in the
112 112 kernel. It is a compatible change and will be ignored by older
113journal_checksum Enable checksumming of the journal transactions. 113 kernels.
114 This will allow the recovery code in e2fsck and the 114
115 kernel to detect corruption in the kernel. It is a 115 journal_async_commit
116 compatible change and will be ignored by older kernels. 116 Commit block can be written to disk without waiting for descriptor
117 117 blocks. If enabled older kernels cannot mount the device. This will
118journal_async_commit Commit block can be written to disk without waiting 118 enable 'journal_checksum' internally.
119 for descriptor blocks. If enabled older kernels cannot 119
120 mount the device. This will enable 'journal_checksum' 120 journal_path=path, journal_dev=devnum
121 internally. 121 When the external journal device's major/minor numbers have changed,
122 122 these options allow the user to specify the new journal location. The
123journal_path=path 123 journal device is identified through either its new major/minor numbers
124journal_dev=devnum When the external journal device's major/minor numbers 124 encoded in devnum, or via a path to the device.
125 have changed, these options allow the user to specify 125
126 the new journal location. The journal device is 126 norecovery, noload
127 identified through either its new major/minor numbers 127 Don't load the journal on mounting. Note that if the filesystem was
128 encoded in devnum, or via a path to the device. 128 not unmounted cleanly, skipping the journal replay will lead to the
129 129 filesystem containing inconsistencies that can lead to any number of
130norecovery Don't load the journal on mounting. Note that 130 problems.
131noload if the filesystem was not unmounted cleanly, 131
132 skipping the journal replay will lead to the 132 data=journal
133 filesystem containing inconsistencies that can 133 All data are committed into the journal prior to being written into the
134 lead to any number of problems. 134 main file system. Enabling this mode will disable delayed allocation
135 135 and O_DIRECT support.
136data=journal All data are committed into the journal prior to being 136
137 written into the main file system. Enabling 137 data=ordered (*)
138 this mode will disable delayed allocation and 138 All data are forced directly out to the main file system prior to its
139 O_DIRECT support. 139 metadata being committed to the journal.
140 140
141data=ordered (*) All data are forced directly out to the main file 141 data=writeback
142 system prior to its metadata being committed to the 142 Data ordering is not preserved, data may be written into the main file
143 journal. 143 system after its metadata has been committed to the journal.
144 144
145data=writeback Data ordering is not preserved, data may be written 145 commit=nrsec (*)
146 into the main file system after its metadata has been 146 Ext4 can be told to sync all its data and metadata every 'nrsec'
147 committed to the journal. 147 seconds. The default value is 5 seconds. This means that if you lose
148 148 your power, you will lose as much as the latest 5 seconds of work (your
149commit=nrsec (*) Ext4 can be told to sync all its data and metadata 149 filesystem will not be damaged though, thanks to the journaling). This
150 every 'nrsec' seconds. The default value is 5 seconds. 150 default value (or any low value) will hurt performance, but it's good
151 This means that if you lose your power, you will lose 151 for data-safety. Setting it to 0 will have the same effect as leaving
152 as much as the latest 5 seconds of work (your 152 it at the default (5 seconds). Setting it to very large values will
153 filesystem will not be damaged though, thanks to the 153 improve performance.
154 journaling). This default value (or any low value) 154
155 will hurt performance, but it's good for data-safety. 155 barrier=<0|1(*)>, barrier(*), nobarrier
156 Setting it to 0 will have the same effect as leaving 156 This enables/disables the use of write barriers in the jbd code.
157 it at the default (5 seconds). 157 barrier=0 disables, barrier=1 enables. This also requires an IO stack
158 Setting it to very large values will improve 158 which can support barriers, and if jbd gets an error on a barrier
159 performance. 159 write, it will disable again with a warning. Write barriers enforce
160 160 proper on-disk ordering of journal commits, making volatile disk write
161barrier=<0|1(*)> This enables/disables the use of write barriers in 161 caches safe to use, at some performance penalty. If your disks are
162barrier(*) the jbd code. barrier=0 disables, barrier=1 enables. 162 battery-backed in one way or another, disabling barriers may safely
163nobarrier This also requires an IO stack which can support 163 improve performance. The mount options "barrier" and "nobarrier" can
164 barriers, and if jbd gets an error on a barrier 164 also be used to enable or disable barriers, for consistency with other
165 write, it will disable again with a warning. 165 ext4 mount options.
166 Write barriers enforce proper on-disk ordering 166
167 of journal commits, making volatile disk write caches 167 inode_readahead_blks=n
168 safe to use, at some performance penalty. If 168 This tuning parameter controls the maximum number of inode table blocks
169 your disks are battery-backed in one way or another, 169 that ext4's inode table readahead algorithm will pre-read into the
170 disabling barriers may safely improve performance. 170 buffer cache. The default value is 32 blocks.
171 The mount options "barrier" and "nobarrier" can 171
172 also be used to enable or disable barriers, for 172 nouser_xattr
173 consistency with other ext4 mount options. 173 Disables Extended User Attributes. See the attr(5) manual page for
174 174 more information about extended attributes.
175inode_readahead_blks=n This tuning parameter controls the maximum 175
176 number of inode table blocks that ext4's inode 176 noacl
177 table readahead algorithm will pre-read into 177 This option disables POSIX Access Control List support. If ACL support
178 the buffer cache. The default value is 32 blocks. 178 is enabled in the kernel configuration (CONFIG_EXT4_FS_POSIX_ACL), ACL
179 179 is enabled by default on mount. See the acl(5) manual page for more
180nouser_xattr Disables Extended User Attributes. See the 180 information about acl.
181 attr(5) manual page for more information about 181
182 extended attributes. 182 bsddf (*)
183 183 Make 'df' act like BSD.
184noacl This option disables POSIX Access Control List 184
185 support. If ACL support is enabled in the kernel 185 minixdf
186 configuration (CONFIG_EXT4_FS_POSIX_ACL), ACL is 186 Make 'df' act like Minix.
187 enabled by default on mount. See the acl(5) manual 187
188 page for more information about acl. 188 debug
189 189 Extra debugging information is sent to syslog.
190bsddf (*) Make 'df' act like BSD. 190
191minixdf Make 'df' act like Minix. 191 abort
192 192 Simulate the effects of calling ext4_abort() for debugging purposes.
193debug Extra debugging information is sent to syslog. 193 This is normally used while remounting a filesystem which is already
194 194 mounted.
195abort Simulate the effects of calling ext4_abort() for 195
196 debugging purposes. This is normally used while 196 errors=remount-ro
197 remounting a filesystem which is already mounted. 197 Remount the filesystem read-only on an error.
198 198
199errors=remount-ro Remount the filesystem read-only on an error. 199 errors=continue
200errors=continue Keep going on a filesystem error. 200 Keep going on a filesystem error.
201errors=panic Panic and halt the machine if an error occurs. 201
202 (These mount options override the errors behavior 202 errors=panic
203 specified in the superblock, which can be configured 203 Panic and halt the machine if an error occurs. (These mount options
204 using tune2fs) 204 override the errors behavior specified in the superblock, which can be
205 205 configured using tune2fs)
206data_err=ignore(*) Just print an error message if an error occurs 206
207 in a file data buffer in ordered mode. 207 data_err=ignore(*)
208data_err=abort Abort the journal if an error occurs in a file 208 Just print an error message if an error occurs in a file data buffer in
209 data buffer in ordered mode. 209 ordered mode.
210 210 data_err=abort
211grpid New objects have the group ID of their parent. 211 Abort the journal if an error occurs in a file data buffer in ordered
212bsdgroups 212 mode.
213 213
214nogrpid (*) New objects have the group ID of their creator. 214 grpid | bsdgroups
215sysvgroups 215 New objects have the group ID of their parent.
216 216
217resgid=n The group ID which may use the reserved blocks. 217 nogrpid (*) | sysvgroups
218 218 New objects have the group ID of their creator.
219resuid=n The user ID which may use the reserved blocks. 219
220 220 resgid=n
221sb=n Use alternate superblock at this location. 221 The group ID which may use the reserved blocks.
222 222
223quota These options are ignored by the filesystem. They 223 resuid=n
224noquota are used only by quota tools to recognize volumes 224 The user ID which may use the reserved blocks.
225grpquota where quota should be turned on. See documentation 225
226usrquota in the quota-tools package for more details 226 sb=
227 (http://sourceforge.net/projects/linuxquota). 227 Use alternate superblock at this location.
228 228
229jqfmt=<quota type> These options tell filesystem details about quota 229 quota, noquota, grpquota, usrquota
230usrjquota=<file> so that quota information can be properly updated 230 These options are ignored by the filesystem. They are used only by
231grpjquota=<file> during journal replay. They replace the above 231 quota tools to recognize volumes where quota should be turned on. See
232 quota options. See documentation in the quota-tools 232 documentation in the quota-tools package for more details
233 package for more details 233 (http://sourceforge.net/projects/linuxquota).
234 (http://sourceforge.net/projects/linuxquota). 234
235 235 jqfmt=<quota type>, usrjquota=<file>, grpjquota=<file>
236stripe=n Number of filesystem blocks that mballoc will try 236 These options tell filesystem details about quota so that quota
237 to use for allocation size and alignment. For RAID5/6 237 information can be properly updated during journal replay. They replace
238 systems this should be the number of data 238 the above quota options. See documentation in the quota-tools package
239 disks * RAID chunk size in file system blocks. 239 for more details (http://sourceforge.net/projects/linuxquota).
240 240
241delalloc (*) Defer block allocation until just before ext4 241 stripe=n
242 writes out the block(s) in question. This 242 Number of filesystem blocks that mballoc will try to use for allocation
243 allows ext4 to better allocation decisions 243 size and alignment. For RAID5/6 systems this should be the number of
244 more efficiently. 244 data disks * RAID chunk size in file system blocks.
245nodelalloc Disable delayed allocation. Blocks are allocated 245
246 when the data is copied from userspace to the 246 delalloc (*)
247 page cache, either via the write(2) system call 247 Defer block allocation until just before ext4 writes out the block(s)
248 or when an mmap'ed page which was previously 248 in question. This allows ext4 to better allocation decisions more
249 unallocated is written for the first time. 249 efficiently.
250 250
251max_batch_time=usec Maximum amount of time ext4 should wait for 251 nodelalloc
252 additional filesystem operations to be batch 252 Disable delayed allocation. Blocks are allocated when the data is
253 together with a synchronous write operation. 253 copied from userspace to the page cache, either via the write(2) system
254 Since a synchronous write operation is going to 254 call or when an mmap'ed page which was previously unallocated is
255 force a commit and then a wait for the I/O 255 written for the first time.
256 complete, it doesn't cost much, and can be a 256
257 huge throughput win, we wait for a small amount 257 max_batch_time=usec
258 of time to see if any other transactions can 258 Maximum amount of time ext4 should wait for additional filesystem
259 piggyback on the synchronous write. The 259 operations to be batch together with a synchronous write operation.
260 algorithm used is designed to automatically tune 260 Since a synchronous write operation is going to force a commit and then
261 for the speed of the disk, by measuring the 261 a wait for the I/O complete, it doesn't cost much, and can be a huge
262 amount of time (on average) that it takes to 262 throughput win, we wait for a small amount of time to see if any other
263 finish committing a transaction. Call this time 263 transactions can piggyback on the synchronous write. The algorithm
264 the "commit time". If the time that the 264 used is designed to automatically tune for the speed of the disk, by
265 transaction has been running is less than the 265 measuring the amount of time (on average) that it takes to finish
266 commit time, ext4 will try sleeping for the 266 committing a transaction. Call this time the "commit time". If the
267 commit time to see if other operations will join 267 time that the transaction has been running is less than the commit
268 the transaction. The commit time is capped by 268 time, ext4 will try sleeping for the commit time to see if other
269 the max_batch_time, which defaults to 15000us 269 operations will join the transaction. The commit time is capped by
270 (15ms). This optimization can be turned off 270 the max_batch_time, which defaults to 15000us (15ms). This
271 entirely by setting max_batch_time to 0. 271 optimization can be turned off entirely by setting max_batch_time to 0.
272 272
273min_batch_time=usec This parameter sets the commit time (as 273 min_batch_time=usec
274 described above) to be at least min_batch_time. 274 This parameter sets the commit time (as described above) to be at least
275 It defaults to zero microseconds. Increasing 275 min_batch_time. It defaults to zero microseconds. Increasing this
276 this parameter may improve the throughput of 276 parameter may improve the throughput of multi-threaded, synchronous
277 multi-threaded, synchronous workloads on very 277 workloads on very fast disks, at the cost of increasing latency.
278 fast disks, at the cost of increasing latency. 278
279 279 journal_ioprio=prio
280journal_ioprio=prio The I/O priority (from 0 to 7, where 0 is the 280 The I/O priority (from 0 to 7, where 0 is the highest priority) which
281 highest priority) which should be used for I/O 281 should be used for I/O operations submitted by kjournald2 during a
282 operations submitted by kjournald2 during a 282 commit operation. This defaults to 3, which is a slightly higher
283 commit operation. This defaults to 3, which is 283 priority than the default I/O priority.
284 a slightly higher priority than the default I/O 284
285 priority. 285 auto_da_alloc(*), noauto_da_alloc
286 286 Many broken applications don't use fsync() when replacing existing
287auto_da_alloc(*) Many broken applications don't use fsync() when 287 files via patterns such as fd = open("foo.new")/write(fd,..)/close(fd)/
288noauto_da_alloc replacing existing files via patterns such as 288 rename("foo.new", "foo"), or worse yet, fd = open("foo",
289 fd = open("foo.new")/write(fd,..)/close(fd)/ 289 O_TRUNC)/write(fd,..)/close(fd). If auto_da_alloc is enabled, ext4
290 rename("foo.new", "foo"), or worse yet, 290 will detect the replace-via-rename and replace-via-truncate patterns
291 fd = open("foo", O_TRUNC)/write(fd,..)/close(fd). 291 and force that any delayed allocation blocks are allocated such that at
292 If auto_da_alloc is enabled, ext4 will detect 292 the next journal commit, in the default data=ordered mode, the data
293 the replace-via-rename and replace-via-truncate 293 blocks of the new file are forced to disk before the rename() operation
294 patterns and force that any delayed allocation 294 is committed. This provides roughly the same level of guarantees as
295 blocks are allocated such that at the next 295 ext3, and avoids the "zero-length" problem that can happen when a
296 journal commit, in the default data=ordered 296 system crashes before the delayed allocation blocks are forced to disk.
297 mode, the data blocks of the new file are forced 297
298 to disk before the rename() operation is 298 noinit_itable
299 committed. This provides roughly the same level 299 Do not initialize any uninitialized inode table blocks in the
300 of guarantees as ext3, and avoids the 300 background. This feature may be used by installation CD's so that the
301 "zero-length" problem that can happen when a 301 install process can complete as quickly as possible; the inode table
302 system crashes before the delayed allocation 302 initialization process would then be deferred until the next time the
303 blocks are forced to disk. 303 file system is unmounted.
304 304
305noinit_itable Do not initialize any uninitialized inode table 305 init_itable=n
306 blocks in the background. This feature may be 306 The lazy itable init code will wait n times the number of milliseconds
307 used by installation CD's so that the install 307 it took to zero out the previous block group's inode table. This
308 process can complete as quickly as possible; the 308 minimizes the impact on the system performance while file system's
309 inode table initialization process would then be 309 inode table is being initialized.
310 deferred until the next time the file system 310
311 is unmounted. 311 discard, nodiscard(*)
312 312 Controls whether ext4 should issue discard/TRIM commands to the
313init_itable=n The lazy itable init code will wait n times the 313 underlying block device when blocks are freed. This is useful for SSD
314 number of milliseconds it took to zero out the 314 devices and sparse/thinly-provisioned LUNs, but it is off by default
315 previous block group's inode table. This 315 until sufficient testing has been done.
316 minimizes the impact on the system performance 316
317 while file system's inode table is being initialized. 317 nouid32
318 318 Disables 32-bit UIDs and GIDs. This is for interoperability with
319discard Controls whether ext4 should issue discard/TRIM 319 older kernels which only store and expect 16-bit values.
320nodiscard(*) commands to the underlying block device when 320
321 blocks are freed. This is useful for SSD devices 321 block_validity(*), noblock_validity
322 and sparse/thinly-provisioned LUNs, but it is off 322 These options enable or disable the in-kernel facility for tracking
323 by default until sufficient testing has been done. 323 filesystem metadata blocks within internal data structures. This
324 324 allows multi- block allocator and other routines to notice bugs or
325nouid32 Disables 32-bit UIDs and GIDs. This is for 325 corrupted allocation bitmaps which cause blocks to be allocated which
326 interoperability with older kernels which only 326 overlap with filesystem metadata blocks.
327 store and expect 16-bit values. 327
328 328 dioread_lock, dioread_nolock
329block_validity(*) These options enable or disable the in-kernel 329 Controls whether or not ext4 should use the DIO read locking. If the
330noblock_validity facility for tracking filesystem metadata blocks 330 dioread_nolock option is specified ext4 will allocate uninitialized
331 within internal data structures. This allows multi- 331 extent before buffer write and convert the extent to initialized after
332 block allocator and other routines to notice 332 IO completes. This approach allows ext4 code to avoid using inode
333 bugs or corrupted allocation bitmaps which cause 333 mutex, which improves scalability on high speed storages. However this
334 blocks to be allocated which overlap with 334 does not work with data journaling and dioread_nolock option will be
335 filesystem metadata blocks. 335 ignored with kernel warning. Note that dioread_nolock code path is only
336 336 used for extent-based files. Because of the restrictions this options
337dioread_lock Controls whether or not ext4 should use the DIO read 337 comprises it is off by default (e.g. dioread_lock).
338dioread_nolock locking. If the dioread_nolock option is specified 338
339 ext4 will allocate uninitialized extent before buffer 339 max_dir_size_kb=n
340 write and convert the extent to initialized after IO 340 This limits the size of directories so that any attempt to expand them
341 completes. This approach allows ext4 code to avoid 341 beyond the specified limit in kilobytes will cause an ENOSPC error.
342 using inode mutex, which improves scalability on high 342 This is useful in memory constrained environments, where a very large
343 speed storages. However this does not work with 343 directory can cause severe performance problems or even provoke the Out
344 data journaling and dioread_nolock option will be 344 Of Memory killer. (For example, if there is only 512mb memory
345 ignored with kernel warning. Note that dioread_nolock 345 available, a 176mb directory may seriously cramp the system's style.)
346 code path is only used for extent-based files. 346
347 Because of the restrictions this options comprises 347 i_version
348 it is off by default (e.g. dioread_lock). 348 Enable 64-bit inode version support. This option is off by default.
349 349
350max_dir_size_kb=n This limits the size of directories so that any 350 dax
351 attempt to expand them beyond the specified 351 Use direct access (no page cache). See
352 limit in kilobytes will cause an ENOSPC error. 352 Documentation/filesystems/dax.txt. Note that this option is
353 This is useful in memory constrained 353 incompatible with data=journal.
354 environments, where a very large directory can
355 cause severe performance problems or even
356 provoke the Out Of Memory killer. (For example,
357 if there is only 512mb memory available, a 176mb
358 directory may seriously cramp the system's style.)
359
360i_version Enable 64-bit inode version support. This option is
361 off by default.
362
363dax Use direct access (no page cache). See
364 Documentation/filesystems/dax.txt. Note that
365 this option is incompatible with data=journal.
366======================= =======================================================
367 354
368Data Mode 355Data Mode
369========= 356=========
@@ -407,11 +394,8 @@ in table below.
407 394
408Files in /proc/fs/ext4/<devname> 395Files in /proc/fs/ext4/<devname>
409 396
410================ ======= 397 mb_groups
411 File Content 398 details of multiblock allocator buddy cache of free blocks
412================ =======
413 mb_groups details of multiblock allocator buddy cache of free blocks
414================ =======
415 399
416/sys entries 400/sys entries
417============ 401============
@@ -426,74 +410,71 @@ Files in /sys/fs/ext4/<devname>:
426 410
427(see also Documentation/ABI/testing/sysfs-fs-ext4) 411(see also Documentation/ABI/testing/sysfs-fs-ext4)
428 412
429============================= ================================================= 413 delayed_allocation_blocks
430File Content 414 This file is read-only and shows the number of blocks that are dirty in
431============================= ================================================= 415 the page cache, but which do not have their location in the filesystem
432 delayed_allocation_blocks This file is read-only and shows the number of 416 allocated yet.
433 blocks that are dirty in the page cache, but 417
434 which do not have their location in the 418 inode_goal
435 filesystem allocated yet. 419 Tuning parameter which (if non-zero) controls the goal inode used by
436 420 the inode allocator in preference to all other allocation heuristics.
437inode_goal Tuning parameter which (if non-zero) controls 421 This is intended for debugging use only, and should be 0 on production
438 the goal inode used by the inode allocator in 422 systems.
439 preference to all other allocation heuristics. 423
440 This is intended for debugging use only, and 424 inode_readahead_blks
441 should be 0 on production systems. 425 Tuning parameter which controls the maximum number of inode table
442 426 blocks that ext4's inode table readahead algorithm will pre-read into
443inode_readahead_blks Tuning parameter which controls the maximum 427 the buffer cache.
444 number of inode table blocks that ext4's inode 428
445 table readahead algorithm will pre-read into 429 lifetime_write_kbytes
446 the buffer cache 430 This file is read-only and shows the number of kilobytes of data that
447 431 have been written to this filesystem since it was created.
448lifetime_write_kbytes This file is read-only and shows the number of 432
449 kilobytes of data that have been written to this 433 max_writeback_mb_bump
450 filesystem since it was created. 434 The maximum number of megabytes the writeback code will try to write
451 435 out before move on to another inode.
452 max_writeback_mb_bump The maximum number of megabytes the writeback 436
453 code will try to write out before move on to 437 mb_group_prealloc
454 another inode. 438 The multiblock allocator will round up allocation requests to a
455 439 multiple of this tuning parameter if the stripe size is not set in the
456 mb_group_prealloc The multiblock allocator will round up allocation 440 ext4 superblock
457 requests to a multiple of this tuning parameter if 441
458 the stripe size is not set in the ext4 superblock 442 mb_max_to_scan
459 443 The maximum number of extents the multiblock allocator will search to
460 mb_max_to_scan The maximum number of extents the multiblock 444 find the best extent.
461 allocator will search to find the best extent 445
462 446 mb_min_to_scan
463 mb_min_to_scan The minimum number of extents the multiblock 447 The minimum number of extents the multiblock allocator will search to
464 allocator will search to find the best extent 448 find the best extent.
465 449
466 mb_order2_req Tuning parameter which controls the minimum size 450 mb_order2_req
467 for requests (as a power of 2) where the buddy 451 Tuning parameter which controls the minimum size for requests (as a
468 cache is used 452 power of 2) where the buddy cache is used.
469 453
470 mb_stats Controls whether the multiblock allocator should 454 mb_stats
471 collect statistics, which are shown during the 455 Controls whether the multiblock allocator should collect statistics,
472 unmount. 1 means to collect statistics, 0 means 456 which are shown during the unmount. 1 means to collect statistics, 0
473 not to collect statistics 457 means not to collect statistics.
474 458
475 mb_stream_req Files which have fewer blocks than this tunable 459 mb_stream_req
476 parameter will have their blocks allocated out 460 Files which have fewer blocks than this tunable parameter will have
477 of a block group specific preallocation pool, so 461 their blocks allocated out of a block group specific preallocation
478 that small files are packed closely together. 462 pool, so that small files are packed closely together. Each large file
479 Each large file will have its blocks allocated 463 will have its blocks allocated out of its own unique preallocation
480 out of its own unique preallocation pool. 464 pool.
481 465
482 session_write_kbytes This file is read-only and shows the number of 466 session_write_kbytes
483 kilobytes of data that have been written to this 467 This file is read-only and shows the number of kilobytes of data that
484 filesystem since it was mounted. 468 have been written to this filesystem since it was mounted.
485 469
486 reserved_clusters This is RW file and contains number of reserved 470 reserved_clusters
487 clusters in the file system which will be used 471 This is RW file and contains number of reserved clusters in the file
488 in the specific situations to avoid costly 472 system which will be used in the specific situations to avoid costly
489 zeroout, unexpected ENOSPC, or possible data 473 zeroout, unexpected ENOSPC, or possible data loss. The default is 2% or
490 loss. The default is 2% or 4096 clusters, 474 4096 clusters, whichever is smaller and this can be changed however it
491 whichever is smaller and this can be changed 475 can never exceed number of clusters in the file system. If there is not
492 however it can never exceed number of clusters 476 enough space for the reserved space when mounting the file mount will
493 in the file system. If there is not enough space 477 _not_ fail.
494 for the reserved space when mounting the file
495 mount will _not_ fail.
496============================= =================================================
497 478
498Ioctls 479Ioctls
499====== 480======
@@ -504,100 +485,80 @@ shown in the table below.
504 485
505Table of Ext4 specific ioctls 486Table of Ext4 specific ioctls
506 487
507============================= ================================================= 488 EXT4_IOC_GETFLAGS
508Ioctl Description 489 Get additional attributes associated with inode. The ioctl argument is
509============================= ================================================= 490 an integer bitfield, with bit values described in ext4.h. This ioctl is
510 EXT4_IOC_GETFLAGS Get additional attributes associated with inode. 491 an alias for FS_IOC_GETFLAGS.
511 The ioctl argument is an integer bitfield, with 492
512 bit values described in ext4.h. This ioctl is an 493 EXT4_IOC_SETFLAGS
513 alias for FS_IOC_GETFLAGS. 494 Set additional attributes associated with inode. The ioctl argument is
514 495 an integer bitfield, with bit values described in ext4.h. This ioctl is
515 EXT4_IOC_SETFLAGS Set additional attributes associated with inode. 496 an alias for FS_IOC_SETFLAGS.
516 The ioctl argument is an integer bitfield, with 497
517 bit values described in ext4.h. This ioctl is an 498 EXT4_IOC_GETVERSION, EXT4_IOC_GETVERSION_OLD
518 alias for FS_IOC_SETFLAGS. 499 Get the inode i_generation number stored for each inode. The
519 500 i_generation number is normally changed only when new inode is created
520 EXT4_IOC_GETVERSION 501 and it is particularly useful for network filesystems. The '_OLD'
521 EXT4_IOC_GETVERSION_OLD 502 version of this ioctl is an alias for FS_IOC_GETVERSION.
522 Get the inode i_generation number stored for 503
523 each inode. The i_generation number is normally 504 EXT4_IOC_SETVERSION, EXT4_IOC_SETVERSION_OLD
524 changed only when new inode is created and it is 505 Set the inode i_generation number stored for each inode. The '_OLD'
525 particularly useful for network filesystems. The 506 version of this ioctl is an alias for FS_IOC_SETVERSION.
526 '_OLD' version of this ioctl is an alias for 507
527 FS_IOC_GETVERSION. 508 EXT4_IOC_GROUP_EXTEND
528 509 This ioctl has the same purpose as the resize mount option. It allows
529 EXT4_IOC_SETVERSION 510 to resize filesystem to the end of the last existing block group,
530 EXT4_IOC_SETVERSION_OLD 511 further resize has to be done with resize2fs, either online, or
531 Set the inode i_generation number stored for 512 offline. The argument points to the unsigned long number representing
532 each inode. The '_OLD' version of this ioctl 513 the filesystem new block count.
533 is an alias for FS_IOC_SETVERSION. 514
534 515 EXT4_IOC_MOVE_EXT
535 EXT4_IOC_GROUP_EXTEND This ioctl has the same purpose as the resize 516 Move the block extents from orig_fd (the one this ioctl is pointing to)
536 mount option. It allows to resize filesystem 517 to the donor_fd (the one specified in move_extent structure passed as
537 to the end of the last existing block group, 518 an argument to this ioctl). Then, exchange inode metadata between
538 further resize has to be done with resize2fs, 519 orig_fd and donor_fd. This is especially useful for online
539 either online, or offline. The argument points 520 defragmentation, because the allocator has the opportunity to allocate
540 to the unsigned long number representing the 521 moved blocks better, ideally into one contiguous extent.
541 filesystem new block count. 522
542 523 EXT4_IOC_GROUP_ADD
543 EXT4_IOC_MOVE_EXT Move the block extents from orig_fd (the one 524 Add a new group descriptor to an existing or new group descriptor
544 this ioctl is pointing to) to the donor_fd (the 525 block. The new group descriptor is described by ext4_new_group_input
545 one specified in move_extent structure passed 526 structure, which is passed as an argument to this ioctl. This is
546 as an argument to this ioctl). Then, exchange 527 especially useful in conjunction with EXT4_IOC_GROUP_EXTEND, which
547 inode metadata between orig_fd and donor_fd. 528 allows online resize of the filesystem to the end of the last existing
548 This is especially useful for online 529 block group. Those two ioctls combined are used in userspace online
549 defragmentation, because the allocator has the 530 resize tool (e.g. resize2fs).
550 opportunity to allocate moved blocks better, 531
551 ideally into one contiguous extent. 532 EXT4_IOC_MIGRATE
552 533 This ioctl operates on the filesystem itself. It converts (migrates)
553 EXT4_IOC_GROUP_ADD Add a new group descriptor to an existing or 534 ext3 indirect block mapped inode to ext4 extent mapped inode by walking
554 new group descriptor block. The new group 535 through indirect block mapping of the original inode and converting
555 descriptor is described by ext4_new_group_input 536 contiguous block ranges into ext4 extents of the temporary inode. Then,
556 structure, which is passed as an argument to 537 inodes are swapped. This ioctl might help, when migrating from ext3 to
557 this ioctl. This is especially useful in 538 ext4 filesystem, however suggestion is to create fresh ext4 filesystem
558 conjunction with EXT4_IOC_GROUP_EXTEND, 539 and copy data from the backup. Note, that filesystem has to support
559 which allows online resize of the filesystem 540 extents for this ioctl to work.
560 to the end of the last existing block group. 541
561 Those two ioctls combined are used in userspace 542 EXT4_IOC_ALLOC_DA_BLKS
562 online resize tool (e.g. resize2fs). 543 Force all of the delay allocated blocks to be allocated to preserve
563 544 application-expected ext3 behaviour. Note that this will also start
564 EXT4_IOC_MIGRATE This ioctl operates on the filesystem itself. 545 triggering a write of the data blocks, but this behaviour may change in
565 It converts (migrates) ext3 indirect block mapped 546 the future as it is not necessary and has been done this way only for
566 inode to ext4 extent mapped inode by walking 547 sake of simplicity.
567 through indirect block mapping of the original 548
568 inode and converting contiguous block ranges 549 EXT4_IOC_RESIZE_FS
569 into ext4 extents of the temporary inode. Then, 550 Resize the filesystem to a new size. The number of blocks of resized
570 inodes are swapped. This ioctl might help, when 551 filesystem is passed in via 64 bit integer argument. The kernel
571 migrating from ext3 to ext4 filesystem, however 552 allocates bitmaps and inode table, the userspace tool thus just passes
572 suggestion is to create fresh ext4 filesystem 553 the new number of blocks.
573 and copy data from the backup. Note, that 554
574 filesystem has to support extents for this ioctl 555 EXT4_IOC_SWAP_BOOT
575 to work. 556 Swap i_blocks and associated attributes (like i_blocks, i_size,
576 557 i_flags, ...) from the specified inode with inode EXT4_BOOT_LOADER_INO
577 EXT4_IOC_ALLOC_DA_BLKS Force all of the delay allocated blocks to be 558 (#5). This is typically used to store a boot loader in a secure part of
578 allocated to preserve application-expected ext3 559 the filesystem, where it can't be changed by a normal user by accident.
579 behaviour. Note that this will also start 560 The data blocks of the previous boot loader will be associated with the
580 triggering a write of the data blocks, but this 561 given inode.
581 behaviour may change in the future as it is
582 not necessary and has been done this way only
583 for sake of simplicity.
584
585 EXT4_IOC_RESIZE_FS Resize the filesystem to a new size. The number
586 of blocks of resized filesystem is passed in via
587 64 bit integer argument. The kernel allocates
588 bitmaps and inode table, the userspace tool thus
589 just passes the new number of blocks.
590
591 EXT4_IOC_SWAP_BOOT Swap i_blocks and associated attributes
592 (like i_blocks, i_size, i_flags, ...) from
593 the specified inode with inode
594 EXT4_BOOT_LOADER_INO (#5). This is typically
595 used to store a boot loader in a secure part of
596 the filesystem, where it can't be changed by a
597 normal user by accident.
598 The data blocks of the previous boot loader
599 will be associated with the given inode.
600============================= =================================================
601 562
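Since the list above states that EXT4_IOC_GETFLAGS is an alias for FS_IOC_GETFLAGS, a userspace program can query the inode attribute bitfield through the generic definition in `<linux/fs.h>`. The sketch below is illustrative and rests on assumptions: the helper names `read_inode_flags` and `flags_use_extents` are hypothetical, the argument is passed as an `int` (the width commonly used in practice), and the ioctl may fail on filesystems that do not implement it.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>   /* FS_IOC_GETFLAGS, FS_EXTENT_FL */
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: fetch the inode flag bitfield for `path`.
 * Returns 0 on success, -1 on failure with errno preserved. */
int read_inode_flags(const char *path, int *flags)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int ret = ioctl(fd, FS_IOC_GETFLAGS, flags);
    int saved_errno = errno;    /* close() may overwrite errno */

    close(fd);
    errno = saved_errno;
    return ret < 0 ? -1 : 0;
}

/* Hypothetical helper: report whether the extent-mapping bit is set. */
int flags_use_extents(int flags)
{
    return (flags & FS_EXTENT_FL) != 0;
}
```

A caller would typically check the return value first and only then inspect individual bits, for example with `flags_use_extents(flags)` to see whether the inode is extent mapped.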
602References 563References
603========== 564==========