author     Linus Torvalds <torvalds@linux-foundation.org>  2008-10-11 16:23:48 -0400
committer  Linus Torvalds <torvalds@linux-foundation.org>  2008-10-11 16:23:48 -0400
commit     fd048088306656824958e7783ffcee27e241b361
tree       be11bebe3bbd2cac88ff27bd3c7450339d21bdc7 /Documentation
parent     5c3c4d9b5810c9aabd8c05219c62ca088aa83eb0
parent     03010a3350301baac2154fa66de925ae2981b7e3
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (43 commits)
ext4: Rename ext4dev to ext4
ext4: Avoid double dirtying of super block in ext4_put_super()
Update ext4 MAINTAINERS file
Hook ext4 to the vfs fiemap interface.
generic block based fiemap implementation
ocfs2: fiemap support
vfs: vfs-level fiemap interface
ext4: fix xattr deadlock
jbd2: Fix buffer head leak when writing the commit block
ext4: Add debugging markers that can be used by systemtap
jbd2: abort instead of waiting for nonexistent transaction
ext4: fix initialization of UNINIT bitmap blocks
ext4: Remove old legacy block allocator
ext4: Use readahead when reading an inode from the inode table
ext4: Improve the documentation for ext4's /proc tunables
ext4: Combine proc file handling into a single set of functions
ext4: move /proc setup and teardown out of mballoc.c
ext4: Don't use 'struct dentry' for internal lookups
ext4/jbd2: Avoid WARN() messages when failing to write to the superblock
ext4: use percpu data structures for lg_prealloc_list
...
Diffstat (limited to 'Documentation')
-rw-r--r--  Documentation/filesystems/ext4.txt      14
-rw-r--r--  Documentation/filesystems/fiemap.txt   228
-rw-r--r--  Documentation/filesystems/proc.txt      73
3 files changed, 274 insertions, 41 deletions
diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt
index 0d5394920a31..74484e696405 100644
--- a/Documentation/filesystems/ext4.txt
+++ b/Documentation/filesystems/ext4.txt
@@ -32,9 +32,9 @@ Mailing list: linux-ext4@vger.kernel.org
     you will need to merge your changes with the version from e2fsprogs
     1.41.x.
 
-  - Create a new filesystem using the ext4dev filesystem type:
+  - Create a new filesystem using the ext4 filesystem type:
 
-        # mke2fs -t ext4dev /dev/hda1
+        # mke2fs -t ext4 /dev/hda1
 
     Or configure an existing ext3 filesystem to support extents and set
     the test_fs flag to indicate that it's ok for an in-development
@@ -47,13 +47,13 @@ Mailing list: linux-ext4@vger.kernel.org
 
         # tune2fs -I 256 /dev/hda1
 
-    (Note: we currently do not have tools to convert an ext4dev
+    (Note: we currently do not have tools to convert an ext4
     filesystem back to ext3; so please do not try this on production
     filesystems.)
 
   - Mounting:
 
-        # mount -t ext4dev /dev/hda1 /wherever
+        # mount -t ext4 /dev/hda1 /wherever
 
   - When comparing performance with other filesystems, remember that
     ext3/4 by default offers higher data integrity guarantees than most.
@@ -177,6 +177,11 @@ barrier=<0|1(*)> This enables/disables the use of write barriers in
                         your disks are battery-backed in one way or another,
                         disabling barriers may safely improve performance.
 
+inode_readahead=n       This tuning parameter controls the maximum
+                        number of inode table blocks that ext4's inode
+                        table readahead algorithm will pre-read into
+                        the buffer cache. The default value is 32 blocks.
+
 orlov (*)               This enables the new Orlov block allocator. It is
                         enabled by default.
 
@@ -252,6 +257,7 @@ stripe=n Number of filesystem blocks that mballoc will try
 delalloc (*)            Deferring block allocation until write-out time.
 nodelalloc              Disable delayed allocation. Blocks are allocated
                         when data is copied from user to page cache.
+
 Data Mode
 =========
 There are 3 different data modes:
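The two user-visible changes to ext4.txt above can also be exercised through mount(2): the filesystem type is now spelled `ext4` instead of `ext4dev`, and there is a new `inode_readahead=n` mount option. The following minimal C sketch is not part of this commit. `ext4_readahead_opt()` and `mount_ext4()` are hypothetical helpers. The option spelling is the one documented in this commit, and later kernels may name it differently. An actual mount needs CAP_SYS_ADMIN and a real ext4 device.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/mount.h>

/* Hypothetical helper: write "inode_readahead=<blocks>" into buf.
 * Returns the string length, or -1 (errno = EINVAL) if buf is too small. */
int ext4_readahead_opt(char *buf, size_t len, unsigned int blocks)
{
    int n = snprintf(buf, len, "inode_readahead=%u", blocks);

    if (n < 0 || (size_t)n >= len) {
        errno = EINVAL;
        return -1;
    }
    return n;
}

/* Hypothetical helper: the mount(2) equivalent of
 *     mount -t ext4 -o inode_readahead=<blocks> <dev> <dir>
 * Returns 0 on success, or -1 with errno set. */
int mount_ext4(const char *dev, const char *dir, unsigned int blocks)
{
    char opts[64];

    if (ext4_readahead_opt(opts, sizeof(opts), blocks) < 0)
        return -1;
    /* The type string documented above is now "ext4" (was "ext4dev"). */
    return mount(dev, dir, "ext4", 0, opts);
}
```

For example, `mount_ext4("/dev/hda1", "/wherever", 64)` mirrors the documented shell command and doubles the default readahead of 32 blocks. The option is passed as a string because mount(2) hands its `data` argument to the filesystem's own option parser unchanged.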
diff --git a/Documentation/filesystems/fiemap.txt b/Documentation/filesystems/fiemap.txt
new file mode 100644
index 000000000000..1e3defcfe50b
--- /dev/null
+++ b/Documentation/filesystems/fiemap.txt
@@ -0,0 +1,228 @@
+============
+Fiemap Ioctl
+============
+
+The fiemap ioctl is an efficient method for userspace to get file
+extent mappings. Instead of block-by-block mapping (such as bmap), fiemap
+returns a list of extents.
+
+
+Request Basics
+--------------
+
+A fiemap request is encoded within struct fiemap:
+
+struct fiemap {
+        __u64   fm_start;          /* logical offset (inclusive) at
+                                    * which to start mapping (in) */
+        __u64   fm_length;         /* logical length of mapping which
+                                    * userspace cares about (in) */
+        __u32   fm_flags;          /* FIEMAP_FLAG_* flags for request (in/out) */
+        __u32   fm_mapped_extents; /* number of extents that were
+                                    * mapped (out) */
+        __u32   fm_extent_count;   /* size of fm_extents array (in) */
+        __u32   fm_reserved;
+        struct fiemap_extent fm_extents[0]; /* array of mapped extents (out) */
+};
+
+
+fm_start and fm_length specify the logical range within the file
+which the process would like mappings for. Extents returned mirror
+those on disk - that is, the logical offset of the 1st returned extent
+may start before fm_start, and the range covered by the last returned
+extent may end after fm_start + fm_length. All offsets and lengths are in bytes.
+
+Certain flags to modify the way in which mappings are looked up can be
+set in fm_flags. If the kernel doesn't understand some particular
+flags, it will return EBADR and the contents of fm_flags will contain
+the set of flags which caused the error. If the kernel is compatible
+with all flags passed, the contents of fm_flags will be unmodified.
+It is up to userspace to determine whether rejection of a particular
+flag is fatal to its operation. This scheme is intended to allow the
+fiemap interface to grow in the future but without losing
+compatibility with old software.
+
+fm_extent_count specifies the number of elements in the fm_extents[] array
+that can be used to return extents. If fm_extent_count is zero, then the
+fm_extents[] array is ignored (no extents will be returned), and the
+fm_mapped_extents count will hold the number of extents needed in
+fm_extents[] to hold the file's current mapping. Note that there is
+nothing to prevent the file from changing between calls to FIEMAP.
+
+The following flags can be set in fm_flags:
+
+* FIEMAP_FLAG_SYNC
+If this flag is set, the kernel will sync the file before mapping extents.
+
+* FIEMAP_FLAG_XATTR
+If this flag is set, the extents returned will describe the inode's
+extended attribute lookup tree, instead of its data tree.
+
+
+Extent Mapping
+--------------
+
+Extent information is returned within the embedded fm_extents array
+which userspace must allocate along with the fiemap structure. The
+number of elements in the fm_extents[] array should be passed via
+fm_extent_count. The number of extents mapped by kernel will be
+returned via fm_mapped_extents. If the number of fiemap_extents
+allocated is less than would be required to map the requested range,
+the maximum number of extents that can be mapped in the fm_extents[]
+array will be returned and fm_mapped_extents will be equal to
+fm_extent_count. In that case, the last extent in the array will not
+complete the requested range and will not have the FIEMAP_EXTENT_LAST
+flag set (see the next section on extent flags).
+
+Each extent is described by a single fiemap_extent structure as
+returned in fm_extents.
+
+struct fiemap_extent {
+        __u64   fe_logical;  /* logical offset in bytes for the start of
+                              * the extent */
+        __u64   fe_physical; /* physical offset in bytes for the start
+                              * of the extent */
+        __u64   fe_length;   /* length in bytes for the extent */
+        __u64   fe_reserved64[2];
+        __u32   fe_flags;    /* FIEMAP_EXTENT_* flags for this extent */
+        __u32   fe_reserved[3];
+};
+
+All offsets and lengths are in bytes and mirror those on disk. It is valid
+for an extent's logical offset to start before the request or its logical
+length to extend past the request. Unless FIEMAP_EXTENT_NOT_ALIGNED is
+returned, fe_logical, fe_physical, and fe_length will be aligned to the
+block size of the file system. With the exception of extents flagged as
+FIEMAP_EXTENT_MERGED, adjacent extents will not be merged.
+
+The fe_flags field contains flags which describe the extent returned.
+A special flag, FIEMAP_EXTENT_LAST, is always set on the last extent in
+the file so that the process making fiemap calls can determine when no
+more extents are available, without having to call the ioctl again.
+
+Some flags are intentionally vague and will always be set in the
+presence of other more specific flags. This way a program looking for
+a general property does not have to know all existing and future flags
+which imply that property.
+
+For example, if FIEMAP_EXTENT_DATA_INLINE or FIEMAP_EXTENT_DATA_TAIL
+are set, FIEMAP_EXTENT_NOT_ALIGNED will also be set. A program looking
+for inline or tail-packed data can key on the specific flag. Software
+which simply cares not to try operating on non-aligned extents
+however, can just key on FIEMAP_EXTENT_NOT_ALIGNED, and not have to
+worry about all present and future flags which might imply unaligned
+data. Note that the opposite is not true - it would be valid for
+FIEMAP_EXTENT_NOT_ALIGNED to appear alone.
+
+* FIEMAP_EXTENT_LAST
+This is the last extent in the file. A mapping attempt past this
+extent will return nothing.
+
+* FIEMAP_EXTENT_UNKNOWN
+The location of this extent is currently unknown. This may indicate
+the data is stored on an inaccessible volume or that no storage has
+been allocated for the file yet.
+
+* FIEMAP_EXTENT_DELALLOC
+  - This will also set FIEMAP_EXTENT_UNKNOWN.
+Delayed allocation - while there is data for this extent, its
+physical location has not been allocated yet.
+
+* FIEMAP_EXTENT_ENCODED
+This extent does not consist of plain filesystem blocks but is
+encoded (e.g. encrypted or compressed). Reading the data in this
+extent via I/O to the block device will have undefined results.
+
+Note that it is *always* undefined to try to update the data
+in-place by writing to the indicated location without the
+assistance of the filesystem, or to access the data using the
+information returned by the FIEMAP interface while the filesystem
+is mounted. In other words, user applications may only read the
+extent data via I/O to the block device while the filesystem is
+unmounted, and then only if the FIEMAP_EXTENT_ENCODED flag is
+clear; user applications must not try reading or writing to the
+filesystem via the block device under any other circumstances.
+
+* FIEMAP_EXTENT_DATA_ENCRYPTED
+  - This will also set FIEMAP_EXTENT_ENCODED
+The data in this extent has been encrypted by the file system.
+
+* FIEMAP_EXTENT_NOT_ALIGNED
+Extent offsets and lengths are not guaranteed to be block aligned.
+
+* FIEMAP_EXTENT_DATA_INLINE
+  - This will also set FIEMAP_EXTENT_NOT_ALIGNED
+Data is located within a meta data block.
+
+* FIEMAP_EXTENT_DATA_TAIL
+  - This will also set FIEMAP_EXTENT_NOT_ALIGNED
+Data is packed into a block with data from other files.
+
+* FIEMAP_EXTENT_UNWRITTEN
+Unwritten extent - the extent is allocated but its data has not been
+initialized. This indicates the extent's data will be all zero if read
+through the filesystem but the contents are undefined if read directly from
+the device.
+
+* FIEMAP_EXTENT_MERGED
+This will be set when a file does not support extents, i.e., it uses a block
+based addressing scheme. Since returning an extent for each block back to
+userspace would be highly inefficient, the kernel will try to merge most
+adjacent blocks into 'extents'.
+
+
+VFS -> File System Implementation
+---------------------------------
+
+File systems wishing to support fiemap must implement a ->fiemap callback on
+their inode_operations structure. The fs ->fiemap call is responsible for
+defining its set of supported fiemap flags, and calling a helper function on
+each discovered extent:
+
+struct inode_operations {
+       ...
+
+       int (*fiemap)(struct inode *, struct fiemap_extent_info *, u64 start,
+                     u64 len);
+
+->fiemap is passed struct fiemap_extent_info which describes the
+fiemap request:
+
+struct fiemap_extent_info {
+        unsigned int fi_flags;          /* Flags as passed from user */
+        unsigned int fi_extents_mapped; /* Number of mapped extents */
+        unsigned int fi_extents_max;    /* Size of fiemap_extent array */
+        struct fiemap_extent *fi_extents_start; /* Start of fiemap_extent array */
+};
+
+It is intended that the file system should not need to access any of this
+structure directly.
+
+
+Flag checking should be done at the beginning of the ->fiemap callback via the
+fiemap_check_flags() helper:
+
+int fiemap_check_flags(struct fiemap_extent_info *fieinfo, u32 fs_flags);
+
+The fieinfo structure should be passed in as received from ioctl_fiemap(). The
+set of fiemap flags which the fs understands should be passed via fs_flags. If
+fiemap_check_flags finds invalid user flags, it will place the bad values in
+fieinfo->fi_flags and return -EBADR. If the file system gets -EBADR from
+fiemap_check_flags(), it should immediately exit, returning that error back to
+ioctl_fiemap().
+
+
+For each extent in the request range, the file system should call
+the helper function, fiemap_fill_next_extent():
+
+int fiemap_fill_next_extent(struct fiemap_extent_info *info, u64 logical,
+                            u64 phys, u64 len, u32 flags, u32 dev);
+
+fiemap_fill_next_extent() will use the passed values to populate the
+next free extent in the fm_extents array. 'General' extent flags will
+automatically be set from specific flags on behalf of the calling file
+system so that the userspace API is not broken.
+
+fiemap_fill_next_extent() returns 0 on success, and 1 when the
+user-supplied fm_extents array is full. If an error is encountered
+while copying the extent to user memory, -EFAULT will be returned.
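The userspace side of the interface described in fiemap.txt can be illustrated with a minimal C sketch. It assumes the `<linux/fs.h>` and `<linux/fiemap.h>` UAPI headers, which expose FS_IOC_FIEMAP, FIEMAP_MAX_OFFSET, and the structures above.

The sketch works in two steps:
- It uses the documented `fm_extent_count == 0` query to learn how many extents are needed.
- It allocates `fm_extents[]` together with struct fiemap and repeats the request.

The helper names are hypothetical. Nothing stops the file from changing between the two calls, so a robust caller would check FIEMAP_EXTENT_LAST on the final extent and continue from its end if it is missing. This sketch does not do that.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>  /* struct fiemap, FIEMAP_* constants */

/* Bytes needed for struct fiemap followed by n fm_extents[] slots. */
size_t fiemap_alloc_size(unsigned int n)
{
    return sizeof(struct fiemap) + (size_t)n * sizeof(struct fiemap_extent);
}

/* Key only on the general flags, as fiemap.txt recommends: DELALLOC implies
 * UNKNOWN, DATA_ENCRYPTED implies ENCODED, and DATA_INLINE/DATA_TAIL imply
 * NOT_ALIGNED, so more specific flags are covered without listing them. */
int extent_is_plain_aligned(unsigned int fe_flags)
{
    return !(fe_flags & (FIEMAP_EXTENT_UNKNOWN | FIEMAP_EXTENT_ENCODED |
                         FIEMAP_EXTENT_NOT_ALIGNED));
}

/* Hypothetical helper: map the whole file behind fd with two FIEMAP calls.
 * Returns a malloc'd struct fiemap (caller frees) or NULL with errno set,
 * e.g. EBADR if a flag was rejected, EOPNOTSUPP if the fs has no ->fiemap. */
struct fiemap *fiemap_whole_file(int fd, unsigned int flags)
{
    struct fiemap probe;
    struct fiemap *fm;
    unsigned int n;

    /* Call 1: with fm_extent_count == 0 the kernel only counts extents. */
    memset(&probe, 0, sizeof(probe));
    probe.fm_start = 0;
    probe.fm_length = FIEMAP_MAX_OFFSET;
    probe.fm_flags = flags;
    if (ioctl(fd, FS_IOC_FIEMAP, &probe) < 0)
        return NULL;

    /* Call 2: allocate that many slots; the file may have changed since. */
    n = probe.fm_mapped_extents;
    fm = calloc(1, fiemap_alloc_size(n));
    if (fm == NULL)
        return NULL;
    fm->fm_start = 0;
    fm->fm_length = FIEMAP_MAX_OFFSET;
    fm->fm_flags = flags;
    fm->fm_extent_count = n;
    if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
        int saved = errno;

        free(fm);
        errno = saved;
        return NULL;
    }
    return fm;
}

/* Print the returned extents; FIEMAP_EXTENT_LAST marks the final one. */
void fiemap_print(const struct fiemap *fm)
{
    for (unsigned int i = 0; i < fm->fm_mapped_extents; i++) {
        const struct fiemap_extent *fe = &fm->fm_extents[i];

        printf("logical %llu physical %llu length %llu flags 0x%x%s%s\n",
               (unsigned long long)fe->fe_logical,
               (unsigned long long)fe->fe_physical,
               (unsigned long long)fe->fe_length,
               (unsigned int)fe->fe_flags,
               extent_is_plain_aligned(fe->fe_flags) ? "" : " (not plain)",
               (fe->fe_flags & FIEMAP_EXTENT_LAST) ? " (last)" : "");
    }
}
```

On a filesystem without a ->fiemap callback the ioctl fails, and `fiemap_whole_file()` returns NULL. Passing FIEMAP_FLAG_SYNC asks the kernel to sync the file first, which makes it more likely that delayed-allocation data has a physical location by the time it is mapped.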
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index f566ad9bcb7b..d831d24d2a6c 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -923,45 +923,44 @@ CPUs.
 The "procs_blocked" line gives the number of processes currently blocked,
 waiting for I/O to complete.
 
+
 1.9 Ext4 file system parameters
 ------------------------------
-Ext4 file system have one directory per partition under /proc/fs/ext4/
-# ls /proc/fs/ext4/hdc/
-group_prealloc max_to_scan mb_groups mb_history min_to_scan order2_req
-stats stream_req
-
-mb_groups:
-This file gives the details of multiblock allocator buddy cache of free blocks
-
-mb_history:
-Multiblock allocation history.
-
-stats:
-This file indicate whether the multiblock allocator should start collecting
-statistics. The statistics are shown during unmount
-
-group_prealloc:
-The multiblock allocator normalize the block allocation request to
-group_prealloc filesystem blocks if we don't have strip value set.
-The stripe value can be specified at mount time or during mke2fs.
-
-max_to_scan:
-How long multiblock allocator can look for a best extent (in found extents)
-
-min_to_scan:
-How long multiblock allocator must look for a best extent
-
-order2_req:
-Multiblock allocator use 2^N search using buddies only for requests greater
-than or equal to order2_req. The request size is specfied in file system
-blocks. A value of 2 indicate only if the requests are greater than or equal
-to 4 blocks.
-
-stream_req:
-Files smaller than stream_req are served by the stream allocator, whose
-purpose is to pack requests as close each to other as possible to
-produce smooth I/O traffic. Avalue of 16 indicate that file smaller than 16
-filesystem block size will use group based preallocation.
+
+Information about mounted ext4 file systems can be found in
+/proc/fs/ext4. Each mounted filesystem will have a directory in
+/proc/fs/ext4 based on its device name (e.g., /proc/fs/ext4/hdc or
+/proc/fs/ext4/dm-0). The files in each per-device directory are shown
+in Table 1-10, below.
+
+Table 1-10: Files in /proc/fs/ext4/<devname>
+..............................................................................
+ File            Content
+ mb_groups       details of multiblock allocator buddy cache of free blocks
+ mb_history      multiblock allocation history
+ stats           controls whether the multiblock allocator should start
+                 collecting statistics, which are shown during unmount
+ group_prealloc  the multiblock allocator will round up allocation
+                 requests to a multiple of this tuning parameter if the
+                 stripe size is not set in the ext4 superblock
+ max_to_scan     The maximum number of extents the multiblock allocator
+                 will search to find the best extent
+ min_to_scan     The minimum number of extents the multiblock allocator
+                 will search to find the best extent
+ order2_req      Tuning parameter which controls the minimum size for
+                 requests (as a power of 2) where the buddy cache is
+                 used
+ stream_req      Files which have fewer blocks than this tunable
+                 parameter will have their blocks allocated out of a
+                 block group specific preallocation pool, so that small
+                 files are packed closely together. Each large file
+                 will have its blocks allocated out of its own unique
+                 preallocation pool.
+ inode_readahead Tuning parameter which controls the maximum number of
+                 inode table blocks that ext4's inode table readahead
+                 algorithm will pre-read into the buffer cache
+..............................................................................
+
 
 ------------------------------------------------------------------------------
 Summary
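Each entry in Table 1-10 is an ordinary file, so the numeric tunables can be read with plain stdio. The following minimal C sketch assumes two things:
- the /proc/fs/ext4/<devname>/<file> layout described in section 1.9;
- each numeric tunable holds a single decimal value.

The multi-line reports mb_groups and mb_history are not handled. The helper names are hypothetical. Later kernels expose several of these tunables under /sys/fs/ext4 instead, so the path may not exist on a given system.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build "/proc/fs/ext4/<devname>/<name>" into buf.
 * Rejects names containing '/' so the path stays in the device directory.
 * Returns 0 on success or -1 with errno = EINVAL. */
int ext4_proc_path(char *buf, size_t len, const char *devname, const char *name)
{
    int n;

    if (strchr(devname, '/') != NULL || strchr(name, '/') != NULL) {
        errno = EINVAL;
        return -1;
    }
    n = snprintf(buf, len, "/proc/fs/ext4/%s/%s", devname, name);
    if (n < 0 || (size_t)n >= len) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}

/* Hypothetical helper: read one unsigned decimal value (e.g. the contents
 * of stream_req or inode_readahead) from path.
 * Returns 0 on success or -1 with errno set. */
int read_tunable(const char *path, unsigned long *out)
{
    char line[64];
    char *end;
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(line, sizeof(line), f) == NULL) {
        fclose(f);
        errno = EIO;
        return -1;
    }
    fclose(f);

    errno = 0;
    *out = strtoul(line, &end, 10);
    if (end == line) {
        errno = EINVAL;              /* no digits at all */
        return -1;
    }
    return errno == 0 ? 0 : -1;      /* ERANGE on overflow */
}
```

For example, building the path for device `dm-0` and file `inode_readahead` and then calling `read_tunable()` on it returns the current readahead window in blocks, if the file exists on the running kernel.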