Diffstat (limited to 'Documentation/filesystems/ext4.txt')
-rw-r--r-- | Documentation/filesystems/ext4.txt | 211 |
1 files changed, 208 insertions, 3 deletions
diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt index 6ab9442d7eeb..c79ec58fd7f6 100644 --- a/Documentation/filesystems/ext4.txt +++ b/Documentation/filesystems/ext4.txt | |||
@@ -97,7 +97,7 @@ Note: More extensive information for getting started with ext4 can be | |||
97 | * Inode allocation using large virtual block groups via flex_bg | 97 | * Inode allocation using large virtual block groups via flex_bg |
98 | * delayed allocation | 98 | * delayed allocation |
99 | * large block (up to pagesize) support | 99 | * large block (up to pagesize) support |
100 | * efficent new ordered mode in JBD2 and ext4(avoid using buffer head to force | 100 | * efficient new ordered mode in JBD2 and ext4(avoid using buffer head to force |
101 | the ordering) | 101 | the ordering) |
102 | 102 | ||
103 | [1] Filesystems with a block size of 1k may see a limit imposed by the | 103 | [1] Filesystems with a block size of 1k may see a limit imposed by the |
@@ -106,7 +106,7 @@ directory hash tree having a maximum depth of two. | |||
106 | 2.2 Candidate features for future inclusion | 106 | 2.2 Candidate features for future inclusion |
107 | 107 | ||
108 | * Online defrag (patches available but not well tested) | 108 | * Online defrag (patches available but not well tested) |
109 | * reduced mke2fs time via lazy itable initialization in conjuction with | 109 | * reduced mke2fs time via lazy itable initialization in conjunction with |
110 | the uninit_bg feature (capability to do this is available in e2fsprogs | 110 | the uninit_bg feature (capability to do this is available in e2fsprogs |
111 | but a kernel thread to do lazy zeroing of unused inode table blocks | 111 | but a kernel thread to do lazy zeroing of unused inode table blocks |
112 | after filesystem is first mounted is required for safety) | 112 | after filesystem is first mounted is required for safety) |
@@ -367,12 +367,47 @@ init_itable=n The lazy itable init code will wait n times the | |||
367 | minimizes the impact on the systme performance | 367 | minimizes the impact on the systme performance |
368 | while file system's inode table is being initialized. | 368 | while file system's inode table is being initialized. |
369 | 369 | ||
370 | discard Controls whether ext4 should issue discard/TRIM | 370 | discard Controls whether ext4 should issue discard/TRIM |
371 | nodiscard(*) commands to the underlying block device when | 371 | nodiscard(*) commands to the underlying block device when |
372 | blocks are freed. This is useful for SSD devices | 372 | blocks are freed. This is useful for SSD devices |
373 | and sparse/thinly-provisioned LUNs, but it is off | 373 | and sparse/thinly-provisioned LUNs, but it is off |
374 | by default until sufficient testing has been done. | 374 | by default until sufficient testing has been done. |
375 | 375 | ||
376 | nouid32 Disables 32-bit UIDs and GIDs. This is for | ||
377 | interoperability with older kernels which only | ||
378 | store and expect 16-bit values. | ||
379 | |||
380 | resize Allows resizing the filesystem to the end of the last | ||
381 | existing block group; further resizing has to be done | ||
382 | with resize2fs, either online or offline. It can be | ||
383 | used only in conjunction with remount. | ||
384 | |||
385 | block_validity This option enables/disables the in-kernel | ||
386 | noblock_validity facility for tracking filesystem metadata blocks | ||
387 | within internal data structures. This allows multi- | ||
388 | block allocator and other routines to quickly locate | ||
389 | extents which might overlap with filesystem metadata | ||
390 | blocks. This option is intended for debugging | ||
391 | purposes and since it negatively affects the | ||
392 | performance, it is off by default. | ||
393 | |||
394 | dioread_lock Controls whether or not ext4 should use the DIO read | ||
395 | dioread_nolock locking. If the dioread_nolock option is specified | ||
396 | ext4 will allocate uninitialized extent before buffer | ||
397 | write and convert the extent to initialized after IO | ||
398 | completes. This approach allows ext4 code to avoid | ||
399 | using inode mutex, which improves scalability on high | ||
400 | speed storages. However this does not work with nobh | ||
401 | option and the mount will fail. Nor does it work with | ||
402 | data journaling and dioread_nolock option will be | ||
403 | ignored with kernel warning. Note that dioread_nolock | ||
404 | code path is only used for extent-based files. | ||
405 | Because of the restrictions this option imposes, | ||
406 | it is off by default (i.e. dioread_lock). | ||
407 | |||
408 | i_version Enable 64-bit inode version support. This option is | ||
409 | off by default. | ||
410 | |||
376 | Data Mode | 411 | Data Mode |
377 | ========= | 412 | ========= |
378 | There are 3 different data modes: | 413 | There are 3 different data modes: |
@@ -400,6 +435,176 @@ needs to be read from and written to disk at the same time where it | |||
400 | outperforms all others modes. Currently ext4 does not have delayed | 435 | outperforms all others modes. Currently ext4 does not have delayed |
401 | allocation support if this data journalling mode is selected. | 436 | allocation support if this data journalling mode is selected. |
402 | 437 | ||
438 | /proc entries | ||
439 | ============= | ||
440 | |||
441 | Information about mounted ext4 file systems can be found in | ||
442 | /proc/fs/ext4. Each mounted filesystem will have a directory in | ||
443 | /proc/fs/ext4 based on its device name (e.g., /proc/fs/ext4/hdc or | ||
444 | /proc/fs/ext4/dm-0). The files in each per-device directory are shown | ||
445 | in the table below. | ||
446 | |||
447 | Files in /proc/fs/ext4/<devname> | ||
448 | .............................................................................. | ||
449 | File Content | ||
450 | mb_groups details of multiblock allocator buddy cache of free blocks | ||
451 | .............................................................................. | ||
452 | |||
453 | /sys entries | ||
454 | ============ | ||
455 | |||
456 | Information about mounted ext4 file systems can be found in | ||
457 | /sys/fs/ext4. Each mounted filesystem will have a directory in | ||
458 | /sys/fs/ext4 based on its device name (e.g., /sys/fs/ext4/hdc or | ||
459 | /sys/fs/ext4/dm-0). The files in each per-device directory are shown | ||
460 | in the table below. | ||
461 | |||
462 | Files in /sys/fs/ext4/<devname> | ||
463 | (see also Documentation/ABI/testing/sysfs-fs-ext4) | ||
464 | .............................................................................. | ||
465 | File Content | ||
466 | |||
467 | delayed_allocation_blocks This file is read-only and shows the number of | ||
468 | blocks that are dirty in the page cache, but | ||
469 | which do not have their location in the | ||
470 | filesystem allocated yet. | ||
471 | |||
472 | inode_goal Tuning parameter which (if non-zero) controls | ||
473 | the goal inode used by the inode allocator in | ||
474 | preference to all other allocation heuristics. | ||
475 | This is intended for debugging use only, and | ||
476 | should be 0 on production systems. | ||
477 | |||
478 | inode_readahead_blks Tuning parameter which controls the maximum | ||
479 | number of inode table blocks that ext4's inode | ||
480 | table readahead algorithm will pre-read into | ||
481 | the buffer cache | ||
482 | |||
483 | lifetime_write_kbytes This file is read-only and shows the number of | ||
484 | kilobytes of data that have been written to this | ||
485 | filesystem since it was created. | ||
486 | |||
487 | max_writeback_mb_bump The maximum number of megabytes the writeback | ||
488 | code will try to write out before moving on to | ||
489 | another inode. | ||
490 | |||
491 | mb_group_prealloc The multiblock allocator will round up allocation | ||
492 | requests to a multiple of this tuning parameter if | ||
493 | the stripe size is not set in the ext4 superblock | ||
494 | |||
495 | mb_max_to_scan The maximum number of extents the multiblock | ||
496 | allocator will search to find the best extent | ||
497 | |||
498 | mb_min_to_scan The minimum number of extents the multiblock | ||
499 | allocator will search to find the best extent | ||
500 | |||
501 | mb_order2_req Tuning parameter which controls the minimum size | ||
502 | for requests (as a power of 2) where the buddy | ||
503 | cache is used | ||
504 | |||
505 | mb_stats Controls whether the multiblock allocator should | ||
506 | collect statistics, which are shown during the | ||
507 | unmount. 1 means to collect statistics, 0 means | ||
508 | not to collect statistics | ||
509 | |||
510 | mb_stream_req Files which have fewer blocks than this tunable | ||
511 | parameter will have their blocks allocated out | ||
512 | of a block group specific preallocation pool, so | ||
513 | that small files are packed closely together. | ||
514 | Each large file will have its blocks allocated | ||
515 | out of its own unique preallocation pool. | ||
516 | |||
517 | session_write_kbytes This file is read-only and shows the number of | ||
518 | kilobytes of data that have been written to this | ||
519 | filesystem since it was mounted. | ||
520 | .............................................................................. | ||
521 | |||
522 | Ioctls | ||
523 | ====== | ||
524 | |||
525 | There is some Ext4 specific functionality which can be accessed by applications | ||
526 | through the system call interfaces. The list of all Ext4 specific ioctls is | ||
527 | shown in the table below. | ||
528 | |||
529 | Table of Ext4 specific ioctls | ||
530 | .............................................................................. | ||
531 | Ioctl Description | ||
532 | EXT4_IOC_GETFLAGS Get additional attributes associated with inode. | ||
533 | The ioctl argument is an integer bitfield, with | ||
534 | bit values described in ext4.h. This ioctl is an | ||
535 | alias for FS_IOC_GETFLAGS. | ||
536 | |||
537 | EXT4_IOC_SETFLAGS Set additional attributes associated with inode. | ||
538 | The ioctl argument is an integer bitfield, with | ||
539 | bit values described in ext4.h. This ioctl is an | ||
540 | alias for FS_IOC_SETFLAGS. | ||
541 | |||
542 | EXT4_IOC_GETVERSION | ||
543 | EXT4_IOC_GETVERSION_OLD | ||
544 | Get the inode i_generation number stored for | ||
545 | each inode. The i_generation number is normally | ||
546 | changed only when a new inode is created and it is | ||
547 | particularly useful for network filesystems. The | ||
548 | '_OLD' version of this ioctl is an alias for | ||
549 | FS_IOC_GETVERSION. | ||
550 | |||
551 | EXT4_IOC_SETVERSION | ||
552 | EXT4_IOC_SETVERSION_OLD | ||
553 | Set the inode i_generation number stored for | ||
554 | each inode. The '_OLD' version of this ioctl | ||
555 | is an alias for FS_IOC_SETVERSION. | ||
556 | |||
557 | EXT4_IOC_GROUP_EXTEND This ioctl has the same purpose as the resize | ||
558 | mount option. It allows resizing the filesystem | ||
559 | to the end of the last existing block group; | ||
560 | further resizing has to be done with resize2fs, | ||
561 | either online or offline. The argument points | ||
562 | to the unsigned long number representing the | ||
563 | new block count of the filesystem. | ||
564 | |||
565 | EXT4_IOC_MOVE_EXT Move the block extents from orig_fd (the one | ||
566 | this ioctl is pointing to) to the donor_fd (the | ||
567 | one specified in move_extent structure passed | ||
568 | as an argument to this ioctl). Then, exchange | ||
569 | inode metadata between orig_fd and donor_fd. | ||
570 | This is especially useful for online | ||
571 | defragmentation, because the allocator has the | ||
572 | opportunity to allocate moved blocks better, | ||
573 | ideally into one contiguous extent. | ||
574 | |||
575 | EXT4_IOC_GROUP_ADD Add a new group descriptor to an existing or | ||
576 | new group descriptor block. The new group | ||
577 | descriptor is described by ext4_new_group_input | ||
578 | structure, which is passed as an argument to | ||
579 | this ioctl. This is especially useful in | ||
580 | conjunction with EXT4_IOC_GROUP_EXTEND, | ||
581 | which allows online resize of the filesystem | ||
582 | to the end of the last existing block group. | ||
583 | Those two ioctls combined are used by the userspace | ||
584 | online resize tool (e.g. resize2fs). | ||
585 | |||
586 | EXT4_IOC_MIGRATE This ioctl operates on the filesystem itself. | ||
587 | It converts (migrates) ext3 indirect block mapped | ||
588 | inode to ext4 extent mapped inode by walking | ||
589 | through indirect block mapping of the original | ||
590 | inode and converting contiguous block ranges | ||
591 | into ext4 extents of the temporary inode. Then, | ||
592 | inodes are swapped. This ioctl might help when | ||
593 | migrating from ext3 to ext4; however, the | ||
594 | suggestion is to create a fresh ext4 filesystem | ||
595 | and copy data from the backup. Note that the | ||
596 | filesystem has to support extents for this ioctl | ||
597 | to work. | ||
598 | |||
599 | EXT4_IOC_ALLOC_DA_BLKS Force all of the delay allocated blocks to be | ||
600 | allocated to preserve application-expected ext3 | ||
601 | behaviour. Note that this will also start | ||
602 | triggering a write of the data blocks, but this | ||
603 | behaviour may change in the future as it is | ||
604 | not necessary and has been done this way only | ||
605 | for sake of simplicity. | ||
606 | .............................................................................. | ||
607 | |||
403 | References | 608 | References |
404 | ========== | 609 | ========== |
405 | 610 | ||
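
The per-device directories under /sys/fs/ext4 described in this patch can be read like any other directory of small text files. The following is a minimal sketch, not part of the patch: the helper name read_ext4_tunables and its sysfs_root parameter are hypothetical, introduced only for illustration, and the sketch assumes each attribute holds a short text value.

```python
import os


def read_ext4_tunables(devname, sysfs_root="/sys/fs/ext4"):
    """Return {attribute name: value} for one mounted ext4 filesystem.

    devname is the per-device directory name (for example "dm-0").
    sysfs_root is a hypothetical parameter that lets the function be
    pointed at a different directory, e.g. for testing.
    """
    dev_dir = os.path.join(sysfs_root, devname)
    values = {}
    for name in sorted(os.listdir(dev_dir)):
        path = os.path.join(dev_dir, name)
        # Skip subdirectories; only plain attribute files are of interest.
        if not os.path.isfile(path):
            continue
        try:
            with open(path) as f:
                values[name] = f.read().strip()
        except OSError:
            # An attribute may not be readable by the caller; skip it
            # rather than abort the whole listing.
            continue
    return values
```

Calling read_ext4_tunables("dm-0") on a system with such a filesystem mounted would return a dictionary keyed by the file names from the table (mb_stats, inode_readahead_blks, and so on). Values are returned as strings, since the sketch makes no assumption about each attribute's format.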