1 files changed, 38 insertions, 0 deletions
diff --git a/Documentation/admin-guide/ext4.rst b/Documentation/admin-guide/ext4.rst
index e506d3dae510..059ddcbe769d 100644
--- a/Documentation/admin-guide/ext4.rst
+++ b/Documentation/admin-guide/ext4.rst
@@ -91,10 +91,48 @@ Currently Available
 * large block (up to pagesize) support
 * efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force
  the ordering)
+* Case-insensitive file name lookups
 [1] Filesystems with a block size of 1k may see a limit imposed by the
 directory hash tree having a maximum depth of two.
+Case-insensitive file name lookups
+======================================================
+
+The case-insensitive file name lookup feature is supported on a
+per-directory basis, allowing the user to mix case-insensitive and
+case-sensitive directories in the same filesystem.  It is enabled by
+setting the +F inode attribute on an empty directory.  The
+case-insensitive string match operation is only defined when we know how
+text is encoded in a byte sequence.  For that reason, in order to enable
+case-insensitive directories, the filesystem must have the
+casefold feature, which stores the filesystem-wide encoding
+model used.  By default, the charset adopted is the latest version of
+Unicode (12.1.0, at the time of this writing), encoded in the UTF-8
+form.  The comparison algorithm is implemented by normalizing the
+strings to the Canonical decomposition form, as defined by Unicode,
+followed by a byte-per-byte comparison.
+
+The case-awareness is name-preserving on the disk, meaning that the file
+name provided by userspace is a byte-per-byte match to what is actually
+written on the disk.  The Unicode normalization format used by the
+kernel is thus an internal representation, and is not exposed to
+userspace or to the disk, with the important exception of disk hashes,
+used on large case-insensitive directories with the DX feature.  On DX
+directories, the hash must be calculated using the casefolded version of
+the filename, meaning that the normalization format used actually has an
+impact on where the directory entry is stored.
+
+When we change from viewing filenames as opaque byte sequences to seeing
+them as encoded strings, we need to address what happens when a program
+tries to create a file with an invalid name.  The Unicode subsystem
+within the kernel leaves the decision of what to do in this case to the
+filesystem, which selects its preferred behavior by enabling or disabling
+strict mode.  When Ext4 encounters one of those strings and the
+filesystem does not require strict mode, it falls back to considering the
+entire string as an opaque byte sequence, which still allows the user to
+operate on that file, but the case-insensitive lookups won't work.
+
 Options
 =======
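
The matching rules described in the new section can be illustrated with a small userspace sketch. This is not the kernel implementation: the helper names `fold_name` and `names_match` are hypothetical, Python's `unicodedata` tables follow whatever Unicode version the interpreter ships (not necessarily 12.1.0), and Python's `str.casefold()` is used here as a stand-in for the kernel's casefolding step. It shows two things from the text: names are compared after canonical decomposition and casefolding, and a name that is not valid UTF-8 is either rejected (strict mode) or treated as an opaque byte sequence that only matches itself.

```python
import unicodedata


def fold_name(name: bytes, strict: bool = False) -> bytes:
    """Return a comparison key for a file name given as raw bytes.

    Hypothetical helper: approximates "normalize to canonical
    decomposition, casefold, then compare byte per byte".
    """
    try:
        text = name.decode("utf-8")
    except UnicodeDecodeError:
        if strict:
            # Strict mode: invalid names are refused outright.
            raise ValueError("file name is not valid UTF-8")
        # Non-strict mode: fall back to the opaque byte sequence,
        # so only an exact byte match can succeed.
        return name
    # Decompose, casefold, then decompose again because casefolding
    # can produce characters that are themselves decomposable.
    decomposed = unicodedata.normalize("NFD", text)
    folded = unicodedata.normalize("NFD", decomposed.casefold())
    return folded.encode("utf-8")


def names_match(a: bytes, b: bytes, strict: bool = False) -> bool:
    """Compare two names the way a case-insensitive directory might."""
    return fold_name(a, strict) == fold_name(b, strict)


if __name__ == "__main__":
    # Differ only in case.
    print(names_match(b"README.txt", b"readme.TXT"))
    # Precomposed vs. decomposed accented letters, different case.
    print(names_match(b"\xc3\x89t\xc3\xa9", b"e\xcc\x81te\xcc\x81"))
    # Invalid UTF-8: case-insensitive matching does not apply.
    print(names_match(b"\xffA", b"\xffa"))
```

Because the key returned for a valid name is always valid UTF-8, it can never collide with the raw bytes of an invalid name, which mirrors the statement that such files remain usable but only through an exact match.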
