Diffstat (limited to 'Documentation/filesystems/ext4.txt')
-rw-r--r-- | Documentation/filesystems/ext4.txt | 229 |
1 files changed, 222 insertions, 7 deletions
diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt
index e1def1786e50..3ae9bc94352a 100644
--- a/Documentation/filesystems/ext4.txt
+++ b/Documentation/filesystems/ext4.txt
@@ -97,7 +97,7 @@ Note: More extensive information for getting started with ext4 can be | |||
97 | * Inode allocation using large virtual block groups via flex_bg | 97 | * Inode allocation using large virtual block groups via flex_bg |
98 | * delayed allocation | 98 | * delayed allocation |
99 | * large block (up to pagesize) support | 99 | * large block (up to pagesize) support |
100 | * efficent new ordered mode in JBD2 and ext4(avoid using buffer head to force | 100 | * efficient new ordered mode in JBD2 and ext4(avoid using buffer head to force |
101 | the ordering) | 101 | the ordering) |
102 | 102 | ||
103 | [1] Filesystems with a block size of 1k may see a limit imposed by the | 103 | [1] Filesystems with a block size of 1k may see a limit imposed by the |
@@ -106,7 +106,7 @@ directory hash tree having a maximum depth of two. | |||
106 | 2.2 Candidate features for future inclusion | 106 | 2.2 Candidate features for future inclusion |
107 | 107 | ||
108 | * Online defrag (patches available but not well tested) | 108 | * Online defrag (patches available but not well tested) |
109 | * reduced mke2fs time via lazy itable initialization in conjuction with | 109 | * reduced mke2fs time via lazy itable initialization in conjunction with |
110 | the uninit_bg feature (capability to do this is available in e2fsprogs | 110 | the uninit_bg feature (capability to do this is available in e2fsprogs |
111 | but a kernel thread to do lazy zeroing of unused inode table blocks | 111 | but a kernel thread to do lazy zeroing of unused inode table blocks |
112 | after filesystem is first mounted is required for safety) | 112 | after filesystem is first mounted is required for safety) |
@@ -226,10 +226,6 @@ acl Enables POSIX Access Control Lists support. | |||
226 | noacl This option disables POSIX Access Control List | 226 | noacl This option disables POSIX Access Control List |
227 | support. | 227 | support. |
228 | 228 | ||
229 | reservation | ||
230 | |||
231 | noreservation | ||
232 | |||
233 | bsddf (*) Make 'df' act like BSD. | 229 | bsddf (*) Make 'df' act like BSD. |
234 | minixdf Make 'df' act like Minix. | 230 | minixdf Make 'df' act like Minix. |
235 | 231 | ||
@@ -353,12 +349,61 @@ noauto_da_alloc replacing existing files via patterns such as | |||
353 | system crashes before the delayed allocation | 349 | system crashes before the delayed allocation |
354 | blocks are forced to disk. | 350 | blocks are forced to disk. |
355 | 351 | ||
356 | discard Controls whether ext4 should issue discard/TRIM | 352 | noinit_itable Do not initialize any uninitialized inode table |
353 | blocks in the background. This feature may be | ||
354 | used by installation CD's so that the install | ||
355 | process can complete as quickly as possible; the | ||
356 | inode table initialization process would then be | ||
357 | deferred until the next time the file system | ||
358 | is unmounted. | ||
359 | |||
360 | init_itable=n The lazy itable init code will wait n times the | ||
361 | number of milliseconds it took to zero out the | ||
362 | previous block group's inode table. This | ||
363 | minimizes the impact on system performance | ||
364 | while the file system's inode table is being initialized. | ||
365 | |||
366 | discard Controls whether ext4 should issue discard/TRIM | ||
357 | nodiscard(*) commands to the underlying block device when | 367 | nodiscard(*) commands to the underlying block device when |
358 | blocks are freed. This is useful for SSD devices | 368 | blocks are freed. This is useful for SSD devices |
359 | and sparse/thinly-provisioned LUNs, but it is off | 369 | and sparse/thinly-provisioned LUNs, but it is off |
360 | by default until sufficient testing has been done. | 370 | by default until sufficient testing has been done. |
361 | 371 | ||
372 | nouid32 Disables 32-bit UIDs and GIDs. This is for | ||
373 | interoperability with older kernels which only | ||
374 | store and expect 16-bit values. | ||
375 | |||
376 | resize Allows resizing the filesystem to the end of the last | ||
377 | existing block group; further resizing has to be done | ||
378 | with resize2fs, either online or offline. It can be | ||
379 | used only in conjunction with remount. | ||
380 | |||
381 | block_validity This option enables/disables the in-kernel | ||
382 | noblock_validity facility for tracking filesystem metadata blocks | ||
383 | within internal data structures. This allows multi- | ||
384 | block allocator and other routines to quickly locate | ||
385 | extents which might overlap with filesystem metadata | ||
386 | blocks. This option is intended for debugging | ||
387 | purposes and since it negatively affects the | ||
388 | performance, it is off by default. | ||
389 | |||
390 | dioread_lock Controls whether or not ext4 should use the DIO read | ||
391 | dioread_nolock locking. If the dioread_nolock option is specified | ||
392 | ext4 will allocate uninitialized extent before buffer | ||
393 | write and convert the extent to initialized after IO | ||
394 | completes. This approach allows ext4 code to avoid | ||
395 | using inode mutex, which improves scalability on high | ||
396 | speed storages. However this does not work with nobh | ||
397 | option and the mount will fail. Nor does it work with | ||
398 | data journaling and dioread_nolock option will be | ||
399 | ignored with kernel warning. Note that dioread_nolock | ||
400 | code path is only used for extent-based files. | ||
401 | Because of the restrictions this option imposes, | ||
402 | it is off by default (i.e. dioread_lock). | ||
403 | |||
404 | i_version Enable 64-bit inode version support. This option is | ||
405 | off by default. | ||
406 | |||
362 | Data Mode | 407 | Data Mode |
363 | ========= | 408 | ========= |
364 | There are 3 different data modes: | 409 | There are 3 different data modes: |
@@ -386,6 +431,176 @@ needs to be read from and written to disk at the same time where it | |||
386 | outperforms all others modes. Currently ext4 does not have delayed | 431 | outperforms all others modes. Currently ext4 does not have delayed |
387 | allocation support if this data journalling mode is selected. | 432 | allocation support if this data journalling mode is selected. |
388 | 433 | ||
434 | /proc entries | ||
435 | ============= | ||
436 | |||
437 | Information about mounted ext4 file systems can be found in | ||
438 | /proc/fs/ext4. Each mounted filesystem will have a directory in | ||
439 | /proc/fs/ext4 based on its device name (e.g., /proc/fs/ext4/hdc or | ||
440 | /proc/fs/ext4/dm-0). The files in each per-device directory are shown | ||
441 | in the table below. | ||
442 | |||
443 | Files in /proc/fs/ext4/<devname> | ||
444 | .............................................................................. | ||
445 | File Content | ||
446 | mb_groups details of multiblock allocator buddy cache of free blocks | ||
447 | .............................................................................. | ||
448 | |||
449 | /sys entries | ||
450 | ============ | ||
451 | |||
452 | Information about mounted ext4 file systems can be found in | ||
453 | /sys/fs/ext4. Each mounted filesystem will have a directory in | ||
454 | /sys/fs/ext4 based on its device name (e.g., /sys/fs/ext4/hdc or | ||
455 | /sys/fs/ext4/dm-0). The files in each per-device directory are shown | ||
456 | in the table below. | ||
457 | |||
458 | Files in /sys/fs/ext4/<devname> | ||
459 | (see also Documentation/ABI/testing/sysfs-fs-ext4) | ||
460 | .............................................................................. | ||
461 | File Content | ||
462 | |||
463 | delayed_allocation_blocks This file is read-only and shows the number of | ||
464 | blocks that are dirty in the page cache, but | ||
465 | which do not have their location in the | ||
466 | filesystem allocated yet. | ||
467 | |||
468 | inode_goal Tuning parameter which (if non-zero) controls | ||
469 | the goal inode used by the inode allocator in | ||
470 | preference to all other allocation heuristics. | ||
471 | This is intended for debugging use only, and | ||
472 | should be 0 on production systems. | ||
473 | |||
474 | inode_readahead_blks Tuning parameter which controls the maximum | ||
475 | number of inode table blocks that ext4's inode | ||
476 | table readahead algorithm will pre-read into | ||
477 | the buffer cache | ||
478 | |||
479 | lifetime_write_kbytes This file is read-only and shows the number of | ||
480 | kilobytes of data that have been written to this | ||
481 | filesystem since it was created. | ||
482 | |||
483 | max_writeback_mb_bump The maximum number of megabytes the writeback | ||
484 | code will try to write out before moving on to | ||
485 | another inode. | ||
486 | |||
487 | mb_group_prealloc The multiblock allocator will round up allocation | ||
488 | requests to a multiple of this tuning parameter if | ||
489 | the stripe size is not set in the ext4 superblock | ||
490 | |||
491 | mb_max_to_scan The maximum number of extents the multiblock | ||
492 | allocator will search to find the best extent | ||
493 | |||
494 | mb_min_to_scan The minimum number of extents the multiblock | ||
495 | allocator will search to find the best extent | ||
496 | |||
497 | mb_order2_req Tuning parameter which controls the minimum size | ||
498 | for requests (as a power of 2) where the buddy | ||
499 | cache is used | ||
500 | |||
501 | mb_stats Controls whether the multiblock allocator should | ||
502 | collect statistics, which are shown during the | ||
503 | unmount. 1 means to collect statistics, 0 means | ||
504 | not to collect statistics | ||
505 | |||
506 | mb_stream_req Files which have fewer blocks than this tunable | ||
507 | parameter will have their blocks allocated out | ||
508 | of a block group specific preallocation pool, so | ||
509 | that small files are packed closely together. | ||
510 | Each large file will have its blocks allocated | ||
511 | out of its own unique preallocation pool. | ||
512 | |||
513 | session_write_kbytes This file is read-only and shows the number of | ||
514 | kilobytes of data that have been written to this | ||
515 | filesystem since it was mounted. | ||
516 | .............................................................................. | ||
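As a rough illustration of how the per-device files in the table above might be inspected, the sketch below reads every regular file under /sys/fs/ext4/<devname> into a dictionary. The function name read_ext4_sysfs and its base parameter are assumptions made for this example; which files exist depends on the running kernel.

```python
import os


def read_ext4_sysfs(devname, base="/sys/fs/ext4"):
    """Return {file name: stripped contents} for one per-device directory."""
    directory = os.path.join(base, devname)
    values = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        try:
            with open(path) as f:
                values[name] = f.read().strip()
        except OSError:
            # Some attributes may not be readable; record them as None.
            values[name] = None
    return values
```

For example, read_ext4_sysfs("dm-0").get("session_write_kbytes") would return the kilobytes written since mount as a string, provided that device directory exists on the system.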
517 | |||
518 | Ioctls | ||
519 | ====== | ||
520 | |||
521 | There is some Ext4 specific functionality which can be accessed by applications | ||
522 | through the system call interfaces. The list of all Ext4 specific ioctls is | ||
523 | shown in the table below. | ||
524 | |||
525 | Table of Ext4 specific ioctls | ||
526 | .............................................................................. | ||
527 | Ioctl Description | ||
528 | EXT4_IOC_GETFLAGS Get additional attributes associated with inode. | ||
529 | The ioctl argument is an integer bitfield, with | ||
530 | bit values described in ext4.h. This ioctl is an | ||
531 | alias for FS_IOC_GETFLAGS. | ||
532 | |||
533 | EXT4_IOC_SETFLAGS Set additional attributes associated with inode. | ||
534 | The ioctl argument is an integer bitfield, with | ||
535 | bit values described in ext4.h. This ioctl is an | ||
536 | alias for FS_IOC_SETFLAGS. | ||
537 | |||
538 | EXT4_IOC_GETVERSION | ||
539 | EXT4_IOC_GETVERSION_OLD | ||
540 | Get the inode i_generation number stored for | ||
541 | each inode. The i_generation number is normally | ||
542 | changed only when new inode is created and it is | ||
543 | particularly useful for network filesystems. The | ||
544 | '_OLD' version of this ioctl is an alias for | ||
545 | FS_IOC_GETVERSION. | ||
546 | |||
547 | EXT4_IOC_SETVERSION | ||
548 | EXT4_IOC_SETVERSION_OLD | ||
549 | Set the inode i_generation number stored for | ||
550 | each inode. The '_OLD' version of this ioctl | ||
551 | is an alias for FS_IOC_SETVERSION. | ||
552 | |||
553 | EXT4_IOC_GROUP_EXTEND This ioctl has the same purpose as the resize | ||
554 | mount option. It allows resizing the filesystem | ||
555 | to the end of the last existing block group; | ||
556 | further resizing has to be done with resize2fs, | ||
557 | either online or offline. The argument points | ||
558 | to the unsigned long number representing the | ||
559 | filesystem's new block count. | ||
560 | |||
561 | EXT4_IOC_MOVE_EXT Move the block extents from orig_fd (the one | ||
562 | this ioctl is pointing to) to the donor_fd (the | ||
563 | one specified in move_extent structure passed | ||
564 | as an argument to this ioctl). Then, exchange | ||
565 | inode metadata between orig_fd and donor_fd. | ||
566 | This is especially useful for online | ||
567 | defragmentation, because the allocator has the | ||
568 | opportunity to allocate moved blocks better, | ||
569 | ideally into one contiguous extent. | ||
570 | |||
571 | EXT4_IOC_GROUP_ADD Add a new group descriptor to an existing or | ||
572 | new group descriptor block. The new group | ||
573 | descriptor is described by ext4_new_group_input | ||
574 | structure, which is passed as an argument to | ||
575 | this ioctl. This is especially useful in | ||
576 | conjunction with EXT4_IOC_GROUP_EXTEND, | ||
577 | which allows online resize of the filesystem | ||
578 | to the end of the last existing block group. | ||
579 | Those two ioctls combined are used in userspace | ||
580 | online resize tools (e.g. resize2fs). | ||
581 | |||
582 | EXT4_IOC_MIGRATE This ioctl operates on the filesystem itself. | ||
583 | It converts (migrates) ext3 indirect block mapped | ||
584 | inode to ext4 extent mapped inode by walking | ||
585 | through indirect block mapping of the original | ||
586 | inode and converting contiguous block ranges | ||
587 | into ext4 extents of the temporary inode. Then, | ||
588 | inodes are swapped. This ioctl might help when | ||
589 | migrating from ext3 to ext4; however, the | ||
590 | suggestion is to create a fresh ext4 filesystem | ||
591 | and copy data from the backup. Note that the | ||
592 | filesystem has to support extents for this ioctl | ||
593 | to work. | ||
594 | |||
595 | EXT4_IOC_ALLOC_DA_BLKS Force all of the delay allocated blocks to be | ||
596 | allocated to preserve application-expected ext3 | ||
597 | behaviour. Note that this will also start | ||
598 | triggering a write of the data blocks, but this | ||
599 | behaviour may change in the future as it is | ||
600 | not necessary and has been done this way only | ||
601 | for the sake of simplicity. | ||
602 | .............................................................................. | ||
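Since EXT4_IOC_GETFLAGS is described above as an alias for FS_IOC_GETFLAGS, a user-space caller might query and decode the flag bitfield roughly as follows. This is a sketch under stated assumptions: the request number is built with the asm-generic _IOR() bit layout (some architectures use a different layout), only a subset of the ext4.h flag bits is listed, and ioc_read, decode_flags and get_flags are names invented for this example.

```python
import fcntl
import struct

# Subset of the inode flag bits defined in ext4.h.
EXT4_FLAGS = {
    0x00000008: "SYNC",
    0x00000010: "IMMUTABLE",
    0x00000020: "APPEND",
    0x00000040: "NODUMP",
    0x00000080: "NOATIME",
    0x00001000: "INDEX",
    0x00080000: "EXTENTS",
}


def ioc_read(type_char, nr, size):
    """Build an _IOR() request number (asm-generic layout assumed)."""
    ioc_read_dir = 2
    return (ioc_read_dir << 30) | (size << 16) | (ord(type_char) << 8) | nr


# FS_IOC_GETFLAGS is declared as _IOR('f', 1, long) in linux/fs.h.
FS_IOC_GETFLAGS = ioc_read("f", 1, struct.calcsize("l"))


def decode_flags(bits):
    """Return the names of the known flag bits set in an integer bitfield."""
    return [name for bit, name in sorted(EXT4_FLAGS.items()) if bits & bit]


def get_flags(path):
    """Query the inode flags of path; needs a filesystem that supports the ioctl."""
    buf = bytearray(struct.calcsize("l"))
    with open(path, "rb") as f:
        fcntl.ioctl(f.fileno(), FS_IOC_GETFLAGS, buf)
    # The kernel fills in an integer-sized value at the start of the buffer.
    return struct.unpack_from("I", buf)[0]
```

A typical use would be decode_flags(get_flags(path)) on a file that lives on an ext4 mount; on filesystems without support for this ioctl the call raises OSError.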
603 | |||
389 | References | 604 | References |
390 | ========== | 605 | ========== |
391 | 606 | ||