1 files changed, 75 insertions, 50 deletions
diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt
index 0c5086db8352..80e193d82e2e 100644
--- a/Documentation/filesystems/ext4.txt
+++ b/Documentation/filesystems/ext4.txt
@@ -13,72 +13,93 @@ Mailing list: linux-ext4@vger.kernel.org
 1. Quick usage instructions:
 ===========================
-  - Grab updated e2fsprogs from
+  - Compile and install the latest version of e2fsprogs (as of this
-    ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs-interim/
+    writing version 1.41) from:
-    This is a patchset on top of e2fsprogs-1.39, which can be found at
+    http://sourceforge.net/project/showfiles.php?group_id=2406
+        
+        or
    ftp://ftp.kernel.org/pub/linux/kernel/people/tytso/e2fsprogs/
-  - It's still mke2fs -j /dev/hda1
+        or grab the latest git repository from:
+    git://git.kernel.org/pub/scm/fs/ext2/e2fsprogs.git
+  - Create a new filesystem using the ext4dev filesystem type:
+        # mke2fs -t ext4dev /dev/hda1
+    Or configure an existing ext3 filesystem to support extents and set
+    the test_fs flag to indicate that it's ok for an in-development
+    filesystem to touch this filesystem:
-  - mount /dev/hda1 /wherever -t ext4dev
+        # tune2fs -O extents -E test_fs /dev/hda1
-  - To enable extents,
+    If the filesystem was created with 128 byte inodes, it can be
+    converted to use 256 byte for greater efficiency via:
-        mount /dev/hda1 /wherever -t ext4dev -o extents
+        # tune2fs -I 256 /dev/hda1
-  - The filesystem is compatible with the ext3 driver until you add a file
+    (Note: we currently do not have tools to convert an ext4dev
-    which has extents (ie: `mount -o extents', then create a file).
+    filesystem back to ext3; so please do not try this on production
+    filesystems.)
-    NOTE: The "extents" mount flag is temporary.  It will soon go away and
+  - Mounting:
-    extents will be enabled by the "-o extents" flag to mke2fs or tune2fs
+        # mount -t ext4dev /dev/hda1 /wherever
  - When comparing performance with other filesystems, remember that
-    ext3/4 by default offers higher data integrity guarantees than most.  So
+    ext3/4 by default offers higher data integrity guarantees than most.
-    when comparing with a metadata-only journalling filesystem, use `mount -o
+    So when comparing with a metadata-only journalling filesystem, such
-    data=writeback'.  And you might as well use `mount -o nobh' too along
+    as ext3, use `mount -o data=writeback'.  And you might as well use
-    with it.  Making the journal larger than the mke2fs default often helps
+    `mount -o nobh' too along with it.  Making the journal larger than
-    performance with metadata-intensive workloads.
+    the mke2fs default often helps performance with metadata-intensive
+    workloads.
 2. Features
 ===========
 2.1 Currently available
-* ability to use filesystems > 16TB
+* ability to use filesystems > 16TB (e2fsprogs support not available yet)
 * extent format reduces metadata overhead (RAM, IO for access, transactions)
 * extent format more robust in face of on-disk corruption due to magics,
 * internal redunancy in tree
+* improved file allocation (multi-block alloc)
-2.1 Previously available, soon to be enabled by default by "mkefs.ext4":
+* fix 32000 subdirectory limit
+* nsec timestamps for mtime, atime, ctime, create time
-* dir_index and resize inode will be on by default
+* inode version field on disk (NFSv4, Lustre)
-* large inodes will be used by default for fast EAs, nsec timestamps, etc
+* reduced e2fsck time via uninit_bg feature
+* journal checksumming for robustness, performance
+* persistent file preallocation (e.g. for streaming media, databases)
+* ability to pack bitmaps and inode tables into larger virtual groups via the
+  flex_bg feature
+* large file support
+* Inode allocation using large virtual block groups via flex_bg
+* delayed allocation
+* large block (up to pagesize) support
+* efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force
+  the ordering)
 2.2 Candidate features for future inclusion
-There are several under discussion, whether they all make it in is
+* Online defrag (patches available but not well tested)
-partly a function of how much time everyone has to work on them:
+* reduced mke2fs time via lazy itable initialization in conjunction with
+  the uninit_bg feature (capability to do this is available in e2fsprogs
+  but a kernel thread to do lazy zeroing of unused inode table blocks
+  after filesystem is first mounted is required for safety)
-* improved file allocation (multi-block alloc, delayed alloc; basically done)
+There are several others under discussion; whether they all make it in is
-* fix 32000 subdirectory limit (patch exists, needs some e2fsck work)
+partly a function of how much time everyone has to work on them. Features like
-* nsec timestamps for mtime, atime, ctime, create time (patch exists,
+metadata checksumming have been discussed and planned for a bit but no patches
-  needs some e2fsck work)
+exist yet so I'm not sure they're in the near-term roadmap.
-* inode version field on disk (NFSv4, Lustre; prototype exists)
-* reduced mke2fs/e2fsck time via uninitialized groups (prototype exists)
-* journal checksumming for robustness, performance (prototype exists)
-* persistent file preallocation (e.g for streaming media, databases)
-Features like metadata checksumming have been discussed and planned for
+The big performance win will come with mballoc, delalloc and flex_bg
-a bit but no patches exist yet so I'm not sure they're in the near-term
+grouping of bitmaps and inode tables.  Some test results available here:
-roadmap.
-The big performance win will come with mballoc and delalloc.  CFS has
+ - http://www.bullopensource.org/ext4/20080530/ffsb-write-2.6.26-rc2.html
-been using mballoc for a few years already with Lustre, and IBM + Bull
+ - http://www.bullopensource.org/ext4/20080530/ffsb-readwrite-2.6.26-rc2.html
-did a lot of benchmarking on it.  The reason it isn't in the first set of
-patches is partly a manageability issue, and partly because it doesn't
-directly affect the on-disk format (outside of much better allocation)
-so it isn't critical to get into the first round of changes.  I believe
-Alex is working on a new set of patches right now.
 3. Options
 ==========
@@ -222,9 +243,11 @@ stripe=n		Number of filesystem blocks that mballoc will try
                        to use for allocation size and alignment. For RAID5/6
                        systems this should be the number of data
                        disks *  RAID chunk size in file system blocks.
+delalloc        (*)     Defer block allocation until write-out time.
+nodelalloc              Disable delayed allocation. Blocks are allocated
+                        when data is copied from user to page cache.
 Data Mode
---------
+=========
 There are 3 different data modes:
 * writeback mode
@@ -236,10 +259,10 @@ typically provide the best ext4 performance.
 * ordered mode
 In data=ordered mode, ext4 only officially journals metadata, but it logically
-groups metadata and data blocks into a single unit called a transaction.  When
+groups metadata information related to data changes with the data blocks into a
-it's time to write the new metadata out to disk, the associated data blocks
+single unit called a transaction.  When it's time to write the new metadata
-are written first.  In general, this mode performs slightly slower than
+out to disk, the associated data blocks are written first.  In general,
-writeback but significantly faster than journal mode.
+this mode performs slightly slower than writeback but significantly faster than journal mode.
 * journal mode
 data=journal mode provides full data and metadata journaling.  All new data is
@@ -247,7 +270,8 @@ written to the journal first, and then to its final location.
 In the event of a crash, the journal can be replayed, bringing both data and
 metadata into a consistent state.  This mode is the slowest except when data
 needs to be read from and written to disk at the same time where it
-outperforms all others modes.
+outperforms all other modes.  Currently ext4 does not have delayed
+allocation support if this data journalling mode is selected.
 References
 ==========
@@ -256,7 +280,8 @@ kernel source:	<file:fs/ext4/>
                <file:fs/jbd2/>
 programs:       http://e2fsprogs.sourceforge.net/
-                http://ext2resize.sourceforge.net
 useful links:   http://fedoraproject.org/wiki/ext3-devel
                http://www.bullopensource.org/ext4/
+                http://ext4.wiki.kernel.org/index.php/Main_Page
+                http://fedoraproject.org/wiki/Features/Ext4
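
Note on the stripe=n context in the second hunk: the text gives the value as
"number of data disks * RAID chunk size in file system blocks". The sketch
below works that arithmetic through. It is only an illustration: the function
name, its parameters, and the 4-disk RAID5 example are assumptions made here,
not part of the patch or of any ext4 tool.

```python
def stripe_blocks(total_disks, parity_disks, chunk_kib, block_kib=4):
    """Return a candidate stripe= value in filesystem blocks.

    Hypothetical helper: all parameter names are illustrative.
    total_disks  -- number of disks in the array
    parity_disks -- disks' worth of parity (1 for RAID5, 2 for RAID6)
    chunk_kib    -- RAID chunk size in KiB
    block_kib    -- filesystem block size in KiB (4 assumed by default)
    """
    data_disks = total_disks - parity_disks
    if data_disks <= 0:
        raise ValueError("array must have at least one data disk")
    if chunk_kib % block_kib != 0:
        raise ValueError("chunk size must be a multiple of the block size")
    # Chunk size expressed in filesystem blocks, times the data disks.
    return data_disks * (chunk_kib // block_kib)


if __name__ == "__main__":
    # Assumed example: 4-disk RAID5, 64 KiB chunks, 4 KiB blocks.
    # 3 data disks * 16 blocks per chunk = 48
    print(stripe_blocks(4, 1, 64))
```

With those assumed numbers the result would be passed as `-o stripe=48` at
mount time; a real array should be checked against its actual chunk and block
sizes.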
