| -rw-r--r-- | Documentation/admin-guide/ext4.rst | 38 |
1 files changed, 38 insertions, 0 deletions
diff --git a/Documentation/admin-guide/ext4.rst b/Documentation/admin-guide/ext4.rst
index e506d3dae510..059ddcbe769d 100644
--- a/Documentation/admin-guide/ext4.rst
+++ b/Documentation/admin-guide/ext4.rst
| @@ -91,10 +91,48 @@ Currently Available | |||
| 91 | * large block (up to pagesize) support | 91 | * large block (up to pagesize) support |
| 92 | * efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force | 92 | * efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force |
| 93 | the ordering) | 93 | the ordering) |
| 94 | * Case-insensitive file name lookups | ||
| 94 | 95 | ||
| 95 | [1] Filesystems with a block size of 1k may see a limit imposed by the | 96 | [1] Filesystems with a block size of 1k may see a limit imposed by the |
| 96 | directory hash tree having a maximum depth of two. | 97 | directory hash tree having a maximum depth of two. |
| 97 | 98 | ||
| 99 | Case-insensitive file name lookups | ||
| 100 | ====================================================== | ||
| 101 | |||
| 102 | The case-insensitive file name lookup feature is supported on a | ||
| 103 | per-directory basis, allowing the user to mix case-insensitive and | ||
| 104 | case-sensitive directories in the same filesystem. It is enabled by | ||
| 105 | flipping the +F inode attribute of an empty directory. The | ||
| 106 | case-insensitive string match operation is only defined when we know how | ||
| 107 | text is encoded in a byte sequence. For that reason, in order to enable | ||
| 108 | case-insensitive directories, the filesystem must have the | ||
| 109 | casefold feature, which stores the filesystem-wide encoding | ||
| 110 | model used. By default, the charset adopted is the latest version of | ||
| 111 | Unicode (12.1.0, by the time of this writing), encoded in the UTF-8 | ||
| 112 | form. The comparison algorithm is implemented by normalizing the | ||
| 113 | strings to the Canonical decomposition form, as defined by Unicode, | ||
| 114 | followed by a byte per byte comparison. | ||
| 115 | |||
| 116 | The case-awareness is name-preserving on the disk, meaning that the file | ||
| 117 | name provided by userspace is a byte-per-byte match to what is actually | ||
| 118 | written on the disk. The Unicode normalization format used by the | ||
| 119 | kernel is thus an internal representation, exposed neither to | ||
| 120 | userspace nor to the disk, with the important exception of disk hashes, | ||
| 121 | used on large case-insensitive directories with the DX feature. On DX | ||
| 122 | directories, the hash must be calculated using the casefolded version of | ||
| 123 | the filename, meaning that the normalization format used actually has an | ||
| 124 | impact on where the directory entry is stored. | ||
| 125 | |||
| 126 | When we change from viewing filenames as opaque byte sequences to seeing | ||
| 127 | them as encoded strings we need to address what happens when a program | ||
| 128 | tries to create a file with an invalid name. The Unicode subsystem | ||
| 129 | within the kernel leaves the decision of what to do in this case to the | ||
| 130 | filesystem, which selects its preferred behavior by enabling/disabling | ||
| 131 | the strict mode. When Ext4 encounters one of those strings and the | ||
| 132 | filesystem did not require strict mode, it falls back to considering the | ||
| 133 | entire string as an opaque byte sequence, which still allows the user to | ||
| 134 | operate on that file, but the case-insensitive lookups won't work. | ||
| 135 | |||
| 98 | Options | 136 | Options |
| 99 | ======= | 137 | ======= |
| 100 | 138 | ||
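
The comparison and the non-strict fallback described in the added text can be sketched in userspace roughly as follows. This is an illustrative approximation only, not the kernel's implementation: the helper names `fold_key` and `names_match` are hypothetical, Python's `unicodedata` tables may correspond to a different Unicode version than the 12.1.0 mentioned in the patch, and the kernel's exact folding and normalization rules may differ in detail.

```python
import unicodedata


def fold_key(name: bytes) -> bytes:
    """Build a comparison key for a file name (hypothetical helper).

    Valid UTF-8 names are canonically decomposed (NFD), case-folded, and
    decomposed again, then re-encoded. Names that are not valid UTF-8 are
    returned unchanged, mimicking the non-strict fallback in which the
    name is treated as an opaque byte sequence.
    """
    try:
        text = name.decode("utf-8")
    except UnicodeDecodeError:
        return name  # opaque bytes: only an exact byte match can succeed
    decomposed = unicodedata.normalize("NFD", text)
    folded = unicodedata.normalize("NFD", decomposed.casefold())
    return folded.encode("utf-8")


def names_match(a: bytes, b: bytes) -> bool:
    """Byte-per-byte comparison of the folded, normalized keys."""
    return fold_key(a) == fold_key(b)


# Precomposed "é" (U+00E9) versus "E" followed by a combining acute accent.
print(names_match("Caf\u00e9".encode(), "CAFE\u0301".encode()))
# Invalid UTF-8: no case folding is applied, so these differ.
print(names_match(b"\xffA", b"\xffa"))
```

Note that the key is used only for matching. As the patch text states, the name stored on disk stays exactly as userspace supplied it.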
