diff options
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- | Documentation/filesystems/00-INDEX | 54 | ||||
-rw-r--r-- | Documentation/filesystems/9p.txt (renamed from Documentation/filesystems/v9fs.txt) | 21 | ||||
-rw-r--r-- | Documentation/filesystems/isofs.txt | 4 | ||||
-rw-r--r-- | Documentation/filesystems/jfs.txt | 2 | ||||
-rw-r--r-- | Documentation/filesystems/ntfs.txt | 5 | ||||
-rw-r--r-- | Documentation/filesystems/proc.txt | 6 | ||||
-rw-r--r-- | Documentation/filesystems/udf.txt | 14 | ||||
-rw-r--r-- | Documentation/filesystems/vfat.txt | 6 | ||||
-rw-r--r-- | Documentation/filesystems/vfs.txt | 217 |
9 files changed, 281 insertions, 48 deletions
diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX index 74052d22d868..66fdc0744fe0 100644 --- a/Documentation/filesystems/00-INDEX +++ b/Documentation/filesystems/00-INDEX | |||
@@ -1,27 +1,47 @@ | |||
1 | 00-INDEX | 1 | 00-INDEX |
2 | - this file (info on some of the filesystems supported by linux). | 2 | - this file (info on some of the filesystems supported by linux). |
3 | Exporting | ||
4 | - explanation of how to make filesystems exportable. | ||
3 | Locking | 5 | Locking |
4 | - info on locking rules as they pertain to Linux VFS. | 6 | - info on locking rules as they pertain to Linux VFS. |
5 | adfs.txt | 7 | adfs.txt |
6 | - info and mount options for the Acorn Advanced Disc Filing System. | 8 | - info and mount options for the Acorn Advanced Disc Filing System. |
9 | afs.txt | ||
10 | - info and examples for the distributed AFS (Andrew File System) fs. | ||
7 | affs.txt | 11 | affs.txt |
8 | - info and mount options for the Amiga Fast File System. | 12 | - info and mount options for the Amiga Fast File System. |
13 | automount-support.txt | ||
14 | - information about filesystem automount support. | ||
15 | befs.txt | ||
16 | - information about the BeOS filesystem for Linux. | ||
9 | bfs.txt | 17 | bfs.txt |
10 | - info for the SCO UnixWare Boot Filesystem (BFS). | 18 | - info for the SCO UnixWare Boot Filesystem (BFS). |
11 | cifs.txt | 19 | cifs.txt |
12 | - description of the CIFS filesystem | 20 | - description of the CIFS filesystem. |
13 | coda.txt | 21 | coda.txt |
14 | - description of the CODA filesystem. | 22 | - description of the CODA filesystem. |
15 | configfs/ | 23 | configfs/ |
16 | - directory containing configfs documentation and example code. | 24 | - directory containing configfs documentation and example code. |
17 | cramfs.txt | 25 | cramfs.txt |
18 | - info on the cram filesystem for small storage (ROMs etc) | 26 | - info on the cram filesystem for small storage (ROMs etc). |
27 | dentry-locking.txt | ||
28 | - info on the RCU-based dcache locking model. | ||
19 | devfs/ | 29 | devfs/ |
20 | - directory containing devfs documentation. | 30 | - directory containing devfs documentation. |
31 | directory-locking | ||
32 | - info about the locking scheme used for directory operations. | ||
21 | dlmfs.txt | 33 | dlmfs.txt |
22 | - info on the userspace interface to the OCFS2 DLM. | 34 | - info on the userspace interface to the OCFS2 DLM. |
23 | ext2.txt | 35 | ext2.txt |
24 | - info, mount options and specifications for the Ext2 filesystem. | 36 | - info, mount options and specifications for the Ext2 filesystem. |
37 | ext3.txt | ||
38 | - info, mount options and specifications for the Ext3 filesystem. | ||
39 | files.txt | ||
40 | - info on file management in the Linux kernel. | ||
41 | fuse.txt | ||
42 | - info on the Filesystem in User SpacE including mount options. | ||
43 | hfs.txt | ||
44 | - info on the Macintosh HFS Filesystem for Linux. | ||
25 | hpfs.txt | 45 | hpfs.txt |
26 | - info and mount options for the OS/2 HPFS. | 46 | - info and mount options for the OS/2 HPFS. |
27 | isofs.txt | 47 | isofs.txt |
@@ -32,23 +52,43 @@ ncpfs.txt | |||
32 | - info on Novell Netware(tm) filesystem using NCP protocol. | 52 | - info on Novell Netware(tm) filesystem using NCP protocol. |
33 | ntfs.txt | 53 | ntfs.txt |
34 | - info and mount options for the NTFS filesystem (Windows NT). | 54 | - info and mount options for the NTFS filesystem (Windows NT). |
35 | proc.txt | ||
36 | - info on Linux's /proc filesystem. | ||
37 | ocfs2.txt | 55 | ocfs2.txt |
38 | - info and mount options for the OCFS2 clustered filesystem. | 56 | - info and mount options for the OCFS2 clustered filesystem. |
57 | porting | ||
58 | - various information on filesystem porting. | ||
59 | proc.txt | ||
60 | - info on Linux's /proc filesystem. | ||
61 | ramfs-rootfs-initramfs.txt | ||
62 | - info on the 'in memory' filesystems ramfs, rootfs and initramfs. | ||
63 | reiser4.txt | ||
64 | - info on the Reiser4 filesystem based on dancing tree algorithms. | ||
65 | relayfs.txt | ||
66 | - info on relayfs, for efficient streaming from kernel to user space. | ||
39 | romfs.txt | 67 | romfs.txt |
40 | - Description of the ROMFS filesystem. | 68 | - description of the ROMFS filesystem. |
41 | smbfs.txt | 69 | smbfs.txt |
42 | - info on using filesystems with the SMB protocol (Windows 3.11 and NT) | 70 | - info on using filesystems with the SMB protocol (Win 3.11 and NT). |
71 | spufs.txt | ||
72 | - info and mount options for the SPU filesystem used on Cell. | ||
73 | sysfs-pci.txt | ||
74 | - info on accessing PCI device resources through sysfs. | ||
75 | sysfs.txt | ||
76 | - info on sysfs, a ram-based filesystem for exporting kernel objects. | ||
43 | sysv-fs.txt | 77 | sysv-fs.txt |
44 | - info on the SystemV/V7/Xenix/Coherent filesystem. | 78 | - info on the SystemV/V7/Xenix/Coherent filesystem. |
79 | tmpfs.txt | ||
80 | - info on tmpfs, a filesystem that holds all files in virtual memory. | ||
45 | udf.txt | 81 | udf.txt |
46 | - info and mount options for the UDF filesystem. | 82 | - info and mount options for the UDF filesystem. |
47 | ufs.txt | 83 | ufs.txt |
48 | - info on the ufs filesystem. | 84 | - info on the ufs filesystem. |
85 | v9fs.txt | ||
86 | - v9fs is a Unix implementation of the Plan 9 9p remote fs protocol. | ||
49 | vfat.txt | 87 | vfat.txt |
50 | - info on using the VFAT filesystem used in Windows NT and Windows 95 | 88 | - info on using the VFAT filesystem used in Windows NT and Windows 95 |
51 | vfs.txt | 89 | vfs.txt |
52 | - Overview of the Virtual File System | 90 | - overview of the Virtual File System |
53 | xfs.txt | 91 | xfs.txt |
54 | - info and mount options for the XFS filesystem. | 92 | - info and mount options for the XFS filesystem. |
93 | xip.txt | ||
94 | - info on execute-in-place for file mappings. | ||
diff --git a/Documentation/filesystems/v9fs.txt b/Documentation/filesystems/9p.txt index 24c7a9c41f0d..43b89c214d20 100644 --- a/Documentation/filesystems/v9fs.txt +++ b/Documentation/filesystems/9p.txt | |||
@@ -1,5 +1,5 @@ | |||
1 | V9FS: 9P2000 for Linux | 1 | v9fs: Plan 9 Resource Sharing for Linux |
2 | ====================== | 2 | ======================================= |
3 | 3 | ||
4 | ABOUT | 4 | ABOUT |
5 | ===== | 5 | ===== |
@@ -9,18 +9,19 @@ v9fs is a Unix implementation of the Plan 9 9p remote filesystem protocol. | |||
9 | This software was originally developed by Ron Minnich <rminnich@lanl.gov> | 9 | This software was originally developed by Ron Minnich <rminnich@lanl.gov> |
10 | and Maya Gokhale <maya@lanl.gov>. Additional development by Greg Watson | 10 | and Maya Gokhale <maya@lanl.gov>. Additional development by Greg Watson |
11 | <gwatson@lanl.gov> and most recently Eric Van Hensbergen | 11 | <gwatson@lanl.gov> and most recently Eric Van Hensbergen |
12 | <ericvh@gmail.com> and Latchesar Ionkov <lucho@ionkov.net>. | 12 | <ericvh@gmail.com>, Latchesar Ionkov <lucho@ionkov.net> and Russ Cox |
13 | <rsc@swtch.com>. | ||
13 | 14 | ||
14 | USAGE | 15 | USAGE |
15 | ===== | 16 | ===== |
16 | 17 | ||
17 | For remote file server: | 18 | For remote file server: |
18 | 19 | ||
19 | mount -t 9P 10.10.1.2 /mnt/9 | 20 | mount -t 9p 10.10.1.2 /mnt/9 |
20 | 21 | ||
21 | For Plan 9 From User Space applications (http://swtch.com/plan9) | 22 | For Plan 9 From User Space applications (http://swtch.com/plan9) |
22 | 23 | ||
23 | mount -t 9P `namespace`/acme /mnt/9 -o proto=unix,name=$USER | 24 | mount -t 9p `namespace`/acme /mnt/9 -o proto=unix,uname=$USER |
24 | 25 | ||
25 | OPTIONS | 26 | OPTIONS |
26 | ======= | 27 | ======= |
@@ -32,7 +33,7 @@ OPTIONS | |||
32 | fd - used passed file descriptors for connection | 33 | fd - used passed file descriptors for connection |
33 | (see rfdno and wfdno) | 34 | (see rfdno and wfdno) |
34 | 35 | ||
35 | name=name user name to attempt mount as on the remote server. The | 36 | uname=name user name to attempt mount as on the remote server. The |
36 | server may override or ignore this value. Certain user | 37 | server may override or ignore this value. Certain user |
37 | names may require authentication. | 38 | names may require authentication. |
38 | 39 | ||
@@ -42,7 +43,7 @@ OPTIONS | |||
42 | debug=n specifies debug level. The debug level is a bitmask. | 43 | debug=n specifies debug level. The debug level is a bitmask. |
43 | 0x01 = display verbose error messages | 44 | 0x01 = display verbose error messages |
44 | 0x02 = developer debug (DEBUG_CURRENT) | 45 | 0x02 = developer debug (DEBUG_CURRENT) |
45 | 0x04 = display 9P trace | 46 | 0x04 = display 9p trace |
46 | 0x08 = display VFS trace | 47 | 0x08 = display VFS trace |
47 | 0x10 = display Marshalling debug | 48 | 0x10 = display Marshalling debug |
48 | 0x20 = display RPC debug | 49 | 0x20 = display RPC debug |
@@ -53,11 +54,11 @@ OPTIONS | |||
53 | 54 | ||
54 | wfdno=n the file descriptor for writing with proto=fd | 55 | wfdno=n the file descriptor for writing with proto=fd |
55 | 56 | ||
56 | maxdata=n the number of bytes to use for 9P packet payload (msize) | 57 | maxdata=n the number of bytes to use for 9p packet payload (msize) |
57 | 58 | ||
58 | port=n port to connect to on the remote server | 59 | port=n port to connect to on the remote server |
59 | 60 | ||
60 | noextend force legacy mode (no 9P2000.u semantics) | 61 | noextend force legacy mode (no 9p2000.u semantics) |
61 | 62 | ||
62 | uid attempt to mount as a particular uid | 63 | uid attempt to mount as a particular uid |
63 | 64 | ||
@@ -72,7 +73,7 @@ OPTIONS | |||
72 | RESOURCES | 73 | RESOURCES |
73 | ========= | 74 | ========= |
74 | 75 | ||
75 | The Linux version of the 9P server is now maintained under the npfs project | 76 | The Linux version of the 9p server is now maintained under the npfs project |
76 | on sourceforge (http://sourceforge.net/projects/npfs). | 77 | on sourceforge (http://sourceforge.net/projects/npfs). |
77 | 78 | ||
78 | There are user and developer mailing lists available through the v9fs project | 79 | There are user and developer mailing lists available through the v9fs project |
diff --git a/Documentation/filesystems/isofs.txt b/Documentation/filesystems/isofs.txt index 424585ff6ea1..758e50401c16 100644 --- a/Documentation/filesystems/isofs.txt +++ b/Documentation/filesystems/isofs.txt | |||
@@ -9,9 +9,9 @@ when using discs encoded using Microsoft's Joliet extensions. | |||
9 | iocharset=name Character set to use for converting from Unicode to | 9 | iocharset=name Character set to use for converting from Unicode to |
10 | ASCII. Joliet filenames are stored in Unicode format, but | 10 | ASCII. Joliet filenames are stored in Unicode format, but |
11 | Unix for the most part doesn't know how to deal with Unicode. | 11 | Unix for the most part doesn't know how to deal with Unicode. |
12 | There is also an option of doing UTF8 translations with the | 12 | There is also an option of doing UTF-8 translations with the |
13 | utf8 option. | 13 | utf8 option. |
14 | utf8 Encode Unicode names in UTF8 format. Default is no. | 14 | utf8 Encode Unicode names in UTF-8 format. Default is no. |
15 | 15 | ||
16 | Mount options unique to the isofs filesystem. | 16 | Mount options unique to the isofs filesystem. |
17 | block=512 Set the block size for the disk to 512 bytes | 17 | block=512 Set the block size for the disk to 512 bytes |
diff --git a/Documentation/filesystems/jfs.txt b/Documentation/filesystems/jfs.txt index 3e992daf99ad..bae128663748 100644 --- a/Documentation/filesystems/jfs.txt +++ b/Documentation/filesystems/jfs.txt | |||
@@ -6,7 +6,7 @@ The following mount options are supported: | |||
6 | 6 | ||
7 | iocharset=name Character set to use for converting from Unicode to | 7 | iocharset=name Character set to use for converting from Unicode to |
8 | ASCII. The default is to do no conversion. Use | 8 | ASCII. The default is to do no conversion. Use |
9 | iocharset=utf8 for UTF8 translations. This requires | 9 | iocharset=utf8 for UTF-8 translations. This requires |
10 | CONFIG_NLS_UTF8 to be set in the kernel .config file. | 10 | CONFIG_NLS_UTF8 to be set in the kernel .config file. |
11 | iocharset=none specifies the default behavior explicitly. | 11 | iocharset=none specifies the default behavior explicitly. |
12 | 12 | ||
diff --git a/Documentation/filesystems/ntfs.txt b/Documentation/filesystems/ntfs.txt index 251168587899..638cbd3d2b00 100644 --- a/Documentation/filesystems/ntfs.txt +++ b/Documentation/filesystems/ntfs.txt | |||
@@ -457,6 +457,11 @@ ChangeLog | |||
457 | 457 | ||
458 | Note, a technical ChangeLog aimed at kernel hackers is in fs/ntfs/ChangeLog. | 458 | Note, a technical ChangeLog aimed at kernel hackers is in fs/ntfs/ChangeLog. |
459 | 459 | ||
460 | 2.1.27: | ||
461 | - Implement page migration support so the kernel can move memory used | ||
462 | by NTFS files and directories around for management purposes. | ||
463 | - Add support for writing to sparse files created with Windows XP SP2. | ||
464 | - Many minor improvements and bug fixes. | ||
460 | 2.1.26: | 465 | 2.1.26: |
461 | - Implement support for sector sizes above 512 bytes (up to the maximum | 466 | - Implement support for sector sizes above 512 bytes (up to the maximum |
462 | supported by NTFS which is 4096 bytes). | 467 | supported by NTFS which is 4096 bytes). |
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index 944cf109a6f5..99902ae6804e 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt | |||
@@ -121,7 +121,7 @@ Table 1-1: Process specific entries in /proc | |||
121 | .............................................................................. | 121 | .............................................................................. |
122 | File Content | 122 | File Content |
123 | cmdline Command line arguments | 123 | cmdline Command line arguments |
124 | cpu Current and last cpu in wich it was executed (2.4)(smp) | 124 | cpu Current and last cpu in which it was executed (2.4)(smp) |
125 | cwd Link to the current working directory | 125 | cwd Link to the current working directory |
126 | environ Values of environment variables | 126 | environ Values of environment variables |
127 | exe Link to the executable of this process | 127 | exe Link to the executable of this process |
@@ -309,13 +309,13 @@ is the same by default: | |||
309 | > cat /proc/irq/0/smp_affinity | 309 | > cat /proc/irq/0/smp_affinity |
310 | ffffffff | 310 | ffffffff |
311 | 311 | ||
312 | It's a bitmask, in wich you can specify wich CPUs can handle the IRQ, you can | 312 | It's a bitmask, in which you can specify which CPUs can handle the IRQ, you can |
313 | set it by doing: | 313 | set it by doing: |
314 | 314 | ||
315 | > echo 1 > /proc/irq/prof_cpu_mask | 315 | > echo 1 > /proc/irq/prof_cpu_mask |
316 | 316 | ||
317 | This means that only the first CPU will handle the IRQ, but you can also echo 5 | 317 | This means that only the first CPU will handle the IRQ, but you can also echo 5 |
318 | wich means that only the first and fourth CPU can handle the IRQ. | 318 | which means that only the first and fourth CPU can handle the IRQ. |
319 | 319 | ||
320 | The way IRQs are routed is handled by the IO-APIC, and it's Round Robin | 320 | The way IRQs are routed is handled by the IO-APIC, and it's Round Robin |
321 | between all the CPUs which are allowed to handle it. As usual the kernel has | 321 | between all the CPUs which are allowed to handle it. As usual the kernel has |
diff --git a/Documentation/filesystems/udf.txt b/Documentation/filesystems/udf.txt index e5213bc301f7..511b4230c053 100644 --- a/Documentation/filesystems/udf.txt +++ b/Documentation/filesystems/udf.txt | |||
@@ -26,6 +26,20 @@ The following mount options are supported: | |||
26 | nostrict Unset strict conformance | 26 | nostrict Unset strict conformance |
27 | iocharset= Set the NLS character set | 27 | iocharset= Set the NLS character set |
28 | 28 | ||
29 | The uid= and gid= options need a bit more explaining. They will accept a | ||
30 | decimal numeric value which will be used as the default ID for that mount. | ||
31 | They will also accept the string "ignore" and "forget". For files on the disk | ||
32 | that are owned by nobody ( -1 ), they will instead look as if they are owned | ||
33 | by the default ID. The ignore option causes the default ID to override all | ||
34 | IDs on the disk, not just -1. The forget option causes all IDs to be written | ||
35 | to disk as -1, so when the media is later remounted, they will appear to be | ||
36 | owned by whatever default ID it is mounted with at that time. | ||
37 | |||
38 | For typical desktop use of removable media, you should set the ID to that | ||
39 | of the interactively logged on user, and also specify both the forget and | ||
40 | ignore options. This way the interactive user will always see the files | ||
41 | on the disk as belonging to him. | ||
42 | |||
29 | The remaining are for debugging and disaster recovery: | 43 | The remaining are for debugging and disaster recovery: |
30 | 44 | ||
31 | novrs Skip volume sequence recognition | 45 | novrs Skip volume sequence recognition |
diff --git a/Documentation/filesystems/vfat.txt b/Documentation/filesystems/vfat.txt index 5ead20c6c744..2001abbc60e6 100644 --- a/Documentation/filesystems/vfat.txt +++ b/Documentation/filesystems/vfat.txt | |||
@@ -28,16 +28,16 @@ iocharset=name -- Character set to use for converting between the | |||
28 | know how to deal with Unicode. | 28 | know how to deal with Unicode. |
29 | By default, FAT_DEFAULT_IOCHARSET setting is used. | 29 | By default, FAT_DEFAULT_IOCHARSET setting is used. |
30 | 30 | ||
31 | There is also an option of doing UTF8 translations | 31 | There is also an option of doing UTF-8 translations |
32 | with the utf8 option. | 32 | with the utf8 option. |
33 | 33 | ||
34 | NOTE: "iocharset=utf8" is not recommended. If unsure, | 34 | NOTE: "iocharset=utf8" is not recommended. If unsure, |
35 | you should consider the following option instead. | 35 | you should consider the following option instead. |
36 | 36 | ||
37 | utf8=<bool> -- UTF8 is the filesystem safe version of Unicode that | 37 | utf8=<bool> -- UTF-8 is the filesystem safe version of Unicode that |
38 | is used by the console. It can be be enabled for the | 38 | is used by the console. It can be be enabled for the |
39 | filesystem with this option. If 'uni_xlate' gets set, | 39 | filesystem with this option. If 'uni_xlate' gets set, |
40 | UTF8 gets disabled. | 40 | UTF-8 gets disabled. |
41 | 41 | ||
42 | uni_xlate=<bool> -- Translate unhandled Unicode characters to special | 42 | uni_xlate=<bool> -- Translate unhandled Unicode characters to special |
43 | escaped sequences. This would let you backup and | 43 | escaped sequences. This would let you backup and |
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index e56e842847d3..adaa899e5c90 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt | |||
@@ -230,10 +230,15 @@ only called from a process context (i.e. not from an interrupt handler | |||
230 | or bottom half). | 230 | or bottom half). |
231 | 231 | ||
232 | alloc_inode: this method is called by inode_alloc() to allocate memory | 232 | alloc_inode: this method is called by inode_alloc() to allocate memory |
233 | for struct inode and initialize it. | 233 | for struct inode and initialize it. If this function is not |
234 | defined, a simple 'struct inode' is allocated. Normally | ||
235 | alloc_inode will be used to allocate a larger structure which | ||
236 | contains a 'struct inode' embedded within it. | ||
234 | 237 | ||
235 | destroy_inode: this method is called by destroy_inode() to release | 238 | destroy_inode: this method is called by destroy_inode() to release |
236 | resources allocated for struct inode. | 239 | resources allocated for struct inode. It is only required if |
240 | ->alloc_inode was defined and simply undoes anything done by | ||
241 | ->alloc_inode. | ||
237 | 242 | ||
238 | read_inode: this method is called to read a specific inode from the | 243 | read_inode: this method is called to read a specific inode from the |
239 | mounted filesystem. The i_ino member in the struct inode is | 244 | mounted filesystem. The i_ino member in the struct inode is |
@@ -443,14 +448,81 @@ otherwise noted. | |||
443 | The Address Space Object | 448 | The Address Space Object |
444 | ======================== | 449 | ======================== |
445 | 450 | ||
446 | The address space object is used to identify pages in the page cache. | 451 | The address space object is used to group and manage pages in the page |
447 | 452 | cache. It can be used to keep track of the pages in a file (or | |
453 | anything else) and also track the mapping of sections of the file into | ||
454 | process address spaces. | ||
455 | |||
456 | There are a number of distinct yet related services that an | ||
457 | address-space can provide. These include communicating memory | ||
458 | pressure, page lookup by address, and keeping track of pages tagged as | ||
459 | Dirty or Writeback. | ||
460 | |||
461 | The first can be used independently to the others. The VM can try to | ||
462 | either write dirty pages in order to clean them, or release clean | ||
463 | pages in order to reuse them. To do this it can call the ->writepage | ||
464 | method on dirty pages, and ->releasepage on clean pages with | ||
465 | PagePrivate set. Clean pages without PagePrivate and with no external | ||
466 | references will be released without notice being given to the | ||
467 | address_space. | ||
468 | |||
469 | To achieve this functionality, pages need to be placed on an LRU with | ||
470 | lru_cache_add and mark_page_active needs to be called whenever the | ||
471 | page is used. | ||
472 | |||
473 | Pages are normally kept in a radix tree index by ->index. This tree | ||
474 | maintains information about the PG_Dirty and PG_Writeback status of | ||
475 | each page, so that pages with either of these flags can be found | ||
476 | quickly. | ||
477 | |||
478 | The Dirty tag is primarily used by mpage_writepages - the default | ||
479 | ->writepages method. It uses the tag to find dirty pages to call | ||
480 | ->writepage on. If mpage_writepages is not used (i.e. the address | ||
481 | provides its own ->writepages) , the PAGECACHE_TAG_DIRTY tag is | ||
482 | almost unused. write_inode_now and sync_inode do use it (through | ||
483 | __sync_single_inode) to check if ->writepages has been successful in | ||
484 | writing out the whole address_space. | ||
485 | |||
486 | The Writeback tag is used by filemap*wait* and sync_page* functions, | ||
487 | via wait_on_page_writeback_range, to wait for all writeback to | ||
488 | complete. While waiting ->sync_page (if defined) will be called on | ||
489 | each page that is found to require writeback. | ||
490 | |||
491 | An address_space handler may attach extra information to a page, | ||
492 | typically using the 'private' field in the 'struct page'. If such | ||
493 | information is attached, the PG_Private flag should be set. This will | ||
494 | cause various VM routines to make extra calls into the address_space | ||
495 | handler to deal with that data. | ||
496 | |||
497 | An address space acts as an intermediate between storage and | ||
498 | application. Data is read into the address space a whole page at a | ||
499 | time, and provided to the application either by copying of the page, | ||
500 | or by memory-mapping the page. | ||
501 | Data is written into the address space by the application, and then | ||
502 | written-back to storage typically in whole pages, however the | ||
503 | address_space has finer control of write sizes. | ||
504 | |||
505 | The read process essentially only requires 'readpage'. The write | ||
506 | process is more complicated and uses prepare_write/commit_write or | ||
507 | set_page_dirty to write data into the address_space, and writepage, | ||
508 | sync_page, and writepages to writeback data to storage. | ||
509 | |||
510 | Adding and removing pages to/from an address_space is protected by the | ||
511 | inode's i_mutex. | ||
512 | |||
513 | When data is written to a page, the PG_Dirty flag should be set. It | ||
514 | typically remains set until writepage asks for it to be written. This | ||
515 | should clear PG_Dirty and set PG_Writeback. It can be actually | ||
516 | written at any point after PG_Dirty is clear. Once it is known to be | ||
517 | safe, PG_Writeback is cleared. | ||
518 | |||
519 | Writeback makes use of a writeback_control structure... | ||
448 | 520 | ||
449 | struct address_space_operations | 521 | struct address_space_operations |
450 | ------------------------------- | 522 | ------------------------------- |
451 | 523 | ||
452 | This describes how the VFS can manipulate mapping of a file to page cache in | 524 | This describes how the VFS can manipulate mapping of a file to page cache in |
453 | your filesystem. As of kernel 2.6.13, the following members are defined: | 525 | your filesystem. As of kernel 2.6.16, the following members are defined: |
454 | 526 | ||
455 | struct address_space_operations { | 527 | struct address_space_operations { |
456 | int (*writepage)(struct page *page, struct writeback_control *wbc); | 528 | int (*writepage)(struct page *page, struct writeback_control *wbc); |
@@ -469,47 +541,148 @@ struct address_space_operations { | |||
469 | loff_t offset, unsigned long nr_segs); | 541 | loff_t offset, unsigned long nr_segs); |
470 | struct page* (*get_xip_page)(struct address_space *, sector_t, | 542 | struct page* (*get_xip_page)(struct address_space *, sector_t, |
471 | int); | 543 | int); |
544 | /* migrate the contents of a page to the specified target */ | ||
545 | int (*migratepage) (struct page *, struct page *); | ||
472 | }; | 546 | }; |
473 | 547 | ||
474 | writepage: called by the VM write a dirty page to backing store. | 548 | writepage: called by the VM to write a dirty page to backing store. |
549 | This may happen for data integrity reasons (i.e. 'sync'), or | ||
550 | to free up memory (flush). The difference can be seen in | ||
551 | wbc->sync_mode. | ||
552 | The PG_Dirty flag has been cleared and PageLocked is true. | ||
553 | writepage should start writeout, should set PG_Writeback, | ||
554 | and should make sure the page is unlocked, either synchronously | ||
555 | or asynchronously when the write operation completes. | ||
556 | |||
557 | If wbc->sync_mode is WB_SYNC_NONE, ->writepage doesn't have to | ||
558 | try too hard if there are problems, and may choose to write out | ||
559 | other pages from the mapping if that is easier (e.g. due to | ||
560 | internal dependencies). If it chooses not to start writeout, it | ||
561 | should return AOP_WRITEPAGE_ACTIVATE so that the VM will not keep | ||
562 | calling ->writepage on that page. | ||
563 | |||
564 | See the file "Locking" for more details. | ||
475 | 565 | ||
476 | readpage: called by the VM to read a page from backing store. | 566 | readpage: called by the VM to read a page from backing store. |
567 | The page will be Locked when readpage is called, and should be | ||
568 | unlocked and marked uptodate once the read completes. | ||
569 | If ->readpage discovers that it needs to unlock the page for | ||
570 | some reason, it can do so, and then return AOP_TRUNCATED_PAGE. | ||
571 | In this case, the page will be relocated, relocked and if | ||
572 | that all succeeds, ->readpage will be called again. | ||
477 | 573 | ||
478 | sync_page: called by the VM to notify the backing store to perform all | 574 | sync_page: called by the VM to notify the backing store to perform all |
479 | queued I/O operations for a page. I/O operations for other pages | 575 | queued I/O operations for a page. I/O operations for other pages |
480 | associated with this address_space object may also be performed. | 576 | associated with this address_space object may also be performed. |
481 | 577 | ||
578 | This function is optional and is called only for pages with | ||
579 | PG_Writeback set while waiting for the writeback to complete. | ||
580 | |||
482 | writepages: called by the VM to write out pages associated with the | 581 | writepages: called by the VM to write out pages associated with the |
483 | address_space object. | 582 | address_space object. If wbc->sync_mode is WBC_SYNC_ALL, then |
583 | the writeback_control will specify a range of pages that must be | ||
584 | written out. If it is WBC_SYNC_NONE, then a nr_to_write is given | ||
585 | and that many pages should be written if possible. | ||
586 | If no ->writepages is given, then mpage_writepages is used | ||
587 | instead. This will choose pages from the address space that are | ||
588 | tagged as DIRTY and will pass them to ->writepage. | ||
484 | 589 | ||
485 | set_page_dirty: called by the VM to set a page dirty. | 590 | set_page_dirty: called by the VM to set a page dirty. |
591 | This is particularly needed if an address space attaches | ||
592 | private data to a page, and that data needs to be updated when | ||
593 | a page is dirtied. This is called, for example, when a memory | ||
594 | mapped page gets modified. | ||
595 | If defined, it should set the PageDirty flag, and the | ||
596 | PAGECACHE_TAG_DIRTY tag in the radix tree. | ||
486 | 597 | ||
487 | readpages: called by the VM to read pages associated with the address_space | 598 | readpages: called by the VM to read pages associated with the address_space |
488 | object. | 599 | object. This is essentially just a vector version of |
600 | readpage. Instead of just one page, several pages are | ||
601 | requested. | ||
602 | readpages is only used for read-ahead, so read errors are | ||
603 | ignored. If anything goes wrong, feel free to give up. | ||
489 | 604 | ||
490 | prepare_write: called by the generic write path in VM to set up a write | 605 | prepare_write: called by the generic write path in VM to set up a write |
491 | request for a page. | 606 | request for a page. This indicates to the address space that |
492 | 607 | the given range of bytes is about to be written. The | |
493 | commit_write: called by the generic write path in VM to write page to | 608 | address_space should check that the write will be able to |
494 | its backing store. | 609 | complete, by allocating space if necessary and doing any other |
610 | internal housekeeping. If the write will update parts of | ||
611 | any basic-blocks on storage, then those blocks should be | ||
612 | pre-read (if they haven't been read already) so that the | ||
613 | updated blocks can be written out properly. | ||
614 | The page will be locked. If prepare_write wants to unlock the | ||
615 | page it, like readpage, may do so and return | ||
616 | AOP_TRUNCATED_PAGE. | ||
617 | In this case the prepare_write will be retried one the lock is | ||
618 | regained. | ||
619 | |||
620 | commit_write: If prepare_write succeeds, new data will be copied | ||
621 | into the page and then commit_write will be called. It will | ||
622 | typically update the size of the file (if appropriate) and | ||
623 | mark the inode as dirty, and do any other related housekeeping | ||
624 | operations. It should avoid returning an error if possible - | ||
625 | errors should have been handled by prepare_write. | ||
495 | 626 | ||
496 | bmap: called by the VFS to map a logical block offset within object to | 627 | bmap: called by the VFS to map a logical block offset within object to |
497 | physical block number. This method is use by for the legacy FIBMAP | 628 | physical block number. This method is used by the FIBMAP |
498 | ioctl. Other uses are discouraged. | 629 | ioctl and for working with swap-files. To be able to swap to |
499 | 630 | a file, the file must have a stable mapping to a block | |
500 | invalidatepage: called by the VM on truncate to disassociate a page from its | 631 | device. The swap system does not go through the filesystem |
501 | address_space mapping. | 632 | but instead uses bmap to find out where the blocks in the file |
502 | 633 | are and uses those addresses directly. | |
503 | releasepage: called by the VFS to release filesystem specific metadata from | 634 | |
504 | a page. | 635 | |
505 | 636 | invalidatepage: If a page has PagePrivate set, then invalidatepage | |
506 | direct_IO: called by the VM for direct I/O writes and reads. | 637 | will be called when part or all of the page is to be removed |
638 | from the address space. This generally corresponds to either a | ||
639 | truncation or a complete invalidation of the address space | ||
640 | (in the latter case 'offset' will always be 0). | ||
641 | Any private data associated with the page should be updated | ||
642 | to reflect this truncation. If offset is 0, then | ||
643 | the private data should be released, because the page | ||
644 | must be able to be completely discarded. This may be done by | ||
645 | calling the ->releasepage function, but in this case the | ||
646 | release MUST succeed. | ||
647 | |||
648 | releasepage: releasepage is called on PagePrivate pages to indicate | ||
649 | that the page should be freed if possible. ->releasepage | ||
650 | should remove any private data from the page and clear the | ||
651 | PagePrivate flag. It may also remove the page from the | ||
652 | address_space. If this fails for some reason, it may indicate | ||
653 | failure with a 0 return value. | ||
654 | This is used in two distinct though related cases. The first | ||
655 | is when the VM finds a clean page with no active users and | ||
656 | wants to make it a free page. If ->releasepage succeeds, the | ||
657 | page will be removed from the address_space and become free. | ||
658 | |||
659 | The second case if when a request has been made to invalidate | ||
660 | some or all pages in an address_space. This can happen | ||
661 | through the fadvice(POSIX_FADV_DONTNEED) system call or by the | ||
662 | filesystem explicitly requesting it as nfs and 9fs do (when | ||
663 | they believe the cache may be out of date with storage) by | ||
664 | calling invalidate_inode_pages2(). | ||
665 | If the filesystem makes such a call, and needs to be certain | ||
666 | that all pages are invalidated, then its releasepage will | ||
667 | need to ensure this. Possibly it can clear the PageUptodate | ||
668 | bit if it cannot free private data yet. | ||
669 | |||
670 | direct_IO: called by the generic read/write routines to perform | ||
671 | direct_IO - that is IO requests which bypass the page cache | ||
672 | and transfer data directly between the storage and the | ||
673 | application's address space. | ||
507 | 674 | ||
508 | get_xip_page: called by the VM to translate a block number to a page. | 675 | get_xip_page: called by the VM to translate a block number to a page. |
509 | The page is valid until the corresponding filesystem is unmounted. | 676 | The page is valid until the corresponding filesystem is unmounted. |
510 | Filesystems that want to use execute-in-place (XIP) need to implement | 677 | Filesystems that want to use execute-in-place (XIP) need to implement |
511 | it. An example implementation can be found in fs/ext2/xip.c. | 678 | it. An example implementation can be found in fs/ext2/xip.c. |
512 | 679 | ||
680 | migrate_page: This is used to compact the physical memory usage. | ||
681 | If the VM wants to relocate a page (maybe off a memory card | ||
682 | that is signalling imminent failure) it will pass a new page | ||
683 | and an old page to this function. migrate_page should | ||
684 | transfer any private data across and update any references | ||
685 | that it has to the page. | ||
513 | 686 | ||
514 | The File Object | 687 | The File Object |
515 | =============== | 688 | =============== |