-rw-r--r-- Documentation/admin-guide/ext4.rst | 38
1 file changed, 38 insertions, 0 deletions
diff --git a/Documentation/admin-guide/ext4.rst b/Documentation/admin-guide/ext4.rst
index e506d3dae510..059ddcbe769d 100644
--- a/Documentation/admin-guide/ext4.rst
+++ b/Documentation/admin-guide/ext4.rst
@@ -91,10 +91,48 @@ Currently Available
 * large block (up to pagesize) support
 * efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force
   the ordering)
+* Case-insensitive file name lookups
 
 [1] Filesystems with a block size of 1k may see a limit imposed by the
 directory hash tree having a maximum depth of two.
 
+Case-insensitive file name lookups
+======================================================
+
+The case-insensitive file name lookup feature is supported on a
+per-directory basis, allowing the user to mix case-insensitive and
+case-sensitive directories in the same filesystem. It is enabled by
+flipping the +F inode attribute of an empty directory. The
+case-insensitive string match operation is only defined when we know how
+text is encoded in a byte sequence. For that reason, in order to enable
+case-insensitive directories, the filesystem must have the
+casefold feature, which stores the filesystem-wide encoding
+model used. By default, the charset adopted is the latest version of
+Unicode (12.1.0, at the time of this writing), encoded in the UTF-8
+form. The comparison algorithm is implemented by normalizing the
+strings to the Canonical decomposition form, as defined by Unicode, and
+case-folding them, followed by a byte-by-byte comparison.
+
+The case-awareness is name-preserving on the disk, meaning that the file
+name provided by userspace is a byte-for-byte match to what is actually
+written to the disk. The Unicode normalization format used by the
+kernel is thus an internal representation, and not exposed to
+userspace or to the disk, with the important exception of disk hashes,
+used on large case-insensitive directories with the DX feature. On DX
+directories, the hash must be calculated using the casefolded version of
+the filename, meaning that the normalization format used actually has an
+impact on where the directory entry is stored.
+
+When we change from viewing filenames as opaque byte sequences to seeing
+them as encoded strings, we need to address what happens when a program
+tries to create a file with an invalid name. The Unicode subsystem
+within the kernel leaves the decision of what to do in this case to the
+filesystem, which selects its preferred behavior by enabling or disabling
+strict mode. When ext4 encounters one of those strings and the
+filesystem did not require strict mode, it falls back to considering the
+entire string as an opaque byte sequence, which still allows the user to
+operate on that file, but case-insensitive lookups won't work.
+
 Options
 =======
 
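As a rough illustration of the rules documented in this patch (canonical normalization plus case folding, a byte-by-byte comparison, hashing of the casefolded name on DX directories, and the strict/non-strict fallback for invalid names), here is a small userspace Python sketch. It is not the kernel's implementation. It assumes Python's `unicodedata` tables stand in for the kernel's own Unicode tables (their Unicode version may differ from 12.1.0). It also assumes that "invalid name" simply means "not valid UTF-8", and it uses CRC32 as a hypothetical stand-in for ext4's real directory hash functions. The names `casefold_key`, `names_match` and `dx_hash` are hypothetical.

```python
import unicodedata
import zlib


def casefold_key(name: bytes, strict: bool = False) -> bytes:
    """Return the normalized key used to compare a file name (sketch).

    Assumption: Python's Unicode tables approximate the kernel's; the
    kernel ships its own tables and Unicode version.
    """
    try:
        text = name.decode("utf-8")
    except UnicodeDecodeError:
        if strict:
            # Strict mode: names that are not valid UTF-8 are rejected.
            raise ValueError(f"invalid UTF-8 file name: {name!r}")
        # Non-strict mode: fall back to the raw bytes, so the name only
        # matches itself exactly and case-insensitive lookup is lost.
        return name
    # Canonical caseless form: NFD, case fold, then NFD again.
    folded = unicodedata.normalize(
        "NFD", unicodedata.normalize("NFD", text).casefold()
    )
    return folded.encode("utf-8")


def names_match(a: bytes, b: bytes, strict: bool = False) -> bool:
    # Byte-by-byte comparison of the normalized forms.
    return casefold_key(a, strict) == casefold_key(b, strict)


def dx_hash(name: bytes) -> int:
    # Hypothetical stand-in for ext4's directory hash: what matters here
    # is that the input is the casefolded name, not the on-disk bytes.
    return zlib.crc32(casefold_key(name))


if __name__ == "__main__":
    # The on-disk name is kept exactly as given (name-preserving),
    # but lookups using other spellings still match it.
    stored = "Straße.txt".encode("utf-8")
    print(names_match(stored, b"STRASSE.TXT"))  # True
    print(names_match("\u00c9t\u00e9".encode(), "e\u0301te\u0301".encode()))  # True
    print(dx_hash(b"README") == dx_hash(b"readme"))  # True
    print(names_match(b"\xffA", b"\xffa"))  # False: opaque byte fallback
```

One design point worth noting: the raw-bytes fallback can never collide with a normalized key, because it is only taken for byte strings that are not valid UTF-8, while every normalized key is valid UTF-8.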