diff options
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- | Documentation/filesystems/9p.txt (renamed from Documentation/filesystems/v9fs.txt) | 21 | ||||
-rw-r--r-- | Documentation/filesystems/udf.txt | 14 | ||||
-rw-r--r-- | Documentation/filesystems/vfs.txt | 217 |
3 files changed, 220 insertions, 32 deletions
diff --git a/Documentation/filesystems/v9fs.txt b/Documentation/filesystems/9p.txt index 24c7a9c41f0d..43b89c214d20 100644 --- a/Documentation/filesystems/v9fs.txt +++ b/Documentation/filesystems/9p.txt | |||
@@ -1,5 +1,5 @@ | |||
1 | V9FS: 9P2000 for Linux | 1 | v9fs: Plan 9 Resource Sharing for Linux |
2 | ====================== | 2 | ======================================= |
3 | 3 | ||
4 | ABOUT | 4 | ABOUT |
5 | ===== | 5 | ===== |
@@ -9,18 +9,19 @@ v9fs is a Unix implementation of the Plan 9 9p remote filesystem protocol. | |||
9 | This software was originally developed by Ron Minnich <rminnich@lanl.gov> | 9 | This software was originally developed by Ron Minnich <rminnich@lanl.gov> |
10 | and Maya Gokhale <maya@lanl.gov>. Additional development by Greg Watson | 10 | and Maya Gokhale <maya@lanl.gov>. Additional development by Greg Watson |
11 | <gwatson@lanl.gov> and most recently Eric Van Hensbergen | 11 | <gwatson@lanl.gov> and most recently Eric Van Hensbergen |
12 | <ericvh@gmail.com> and Latchesar Ionkov <lucho@ionkov.net>. | 12 | <ericvh@gmail.com>, Latchesar Ionkov <lucho@ionkov.net> and Russ Cox |
13 | <rsc@swtch.com>. | ||
13 | 14 | ||
14 | USAGE | 15 | USAGE |
15 | ===== | 16 | ===== |
16 | 17 | ||
17 | For remote file server: | 18 | For remote file server: |
18 | 19 | ||
19 | mount -t 9P 10.10.1.2 /mnt/9 | 20 | mount -t 9p 10.10.1.2 /mnt/9 |
20 | 21 | ||
21 | For Plan 9 From User Space applications (http://swtch.com/plan9) | 22 | For Plan 9 From User Space applications (http://swtch.com/plan9) |
22 | 23 | ||
23 | mount -t 9P `namespace`/acme /mnt/9 -o proto=unix,name=$USER | 24 | mount -t 9p `namespace`/acme /mnt/9 -o proto=unix,uname=$USER |
24 | 25 | ||
25 | OPTIONS | 26 | OPTIONS |
26 | ======= | 27 | ======= |
@@ -32,7 +33,7 @@ OPTIONS | |||
32 | fd - used passed file descriptors for connection | 33 | fd - used passed file descriptors for connection |
33 | (see rfdno and wfdno) | 34 | (see rfdno and wfdno) |
34 | 35 | ||
35 | name=name user name to attempt mount as on the remote server. The | 36 | uname=name user name to attempt mount as on the remote server. The |
36 | server may override or ignore this value. Certain user | 37 | server may override or ignore this value. Certain user |
37 | names may require authentication. | 38 | names may require authentication. |
38 | 39 | ||
@@ -42,7 +43,7 @@ OPTIONS | |||
42 | debug=n specifies debug level. The debug level is a bitmask. | 43 | debug=n specifies debug level. The debug level is a bitmask. |
43 | 0x01 = display verbose error messages | 44 | 0x01 = display verbose error messages |
44 | 0x02 = developer debug (DEBUG_CURRENT) | 45 | 0x02 = developer debug (DEBUG_CURRENT) |
45 | 0x04 = display 9P trace | 46 | 0x04 = display 9p trace |
46 | 0x08 = display VFS trace | 47 | 0x08 = display VFS trace |
47 | 0x10 = display Marshalling debug | 48 | 0x10 = display Marshalling debug |
48 | 0x20 = display RPC debug | 49 | 0x20 = display RPC debug |
@@ -53,11 +54,11 @@ OPTIONS | |||
53 | 54 | ||
54 | wfdno=n the file descriptor for writing with proto=fd | 55 | wfdno=n the file descriptor for writing with proto=fd |
55 | 56 | ||
56 | maxdata=n the number of bytes to use for 9P packet payload (msize) | 57 | maxdata=n the number of bytes to use for 9p packet payload (msize) |
57 | 58 | ||
58 | port=n port to connect to on the remote server | 59 | port=n port to connect to on the remote server |
59 | 60 | ||
60 | noextend force legacy mode (no 9P2000.u semantics) | 61 | noextend force legacy mode (no 9p2000.u semantics) |
61 | 62 | ||
62 | uid attempt to mount as a particular uid | 63 | uid attempt to mount as a particular uid |
63 | 64 | ||
@@ -72,7 +73,7 @@ OPTIONS | |||
72 | RESOURCES | 73 | RESOURCES |
73 | ========= | 74 | ========= |
74 | 75 | ||
75 | The Linux version of the 9P server is now maintained under the npfs project | 76 | The Linux version of the 9p server is now maintained under the npfs project |
76 | on sourceforge (http://sourceforge.net/projects/npfs). | 77 | on sourceforge (http://sourceforge.net/projects/npfs). |
77 | 78 | ||
78 | There are user and developer mailing lists available through the v9fs project | 79 | There are user and developer mailing lists available through the v9fs project |
diff --git a/Documentation/filesystems/udf.txt b/Documentation/filesystems/udf.txt index e5213bc301f7..511b4230c053 100644 --- a/Documentation/filesystems/udf.txt +++ b/Documentation/filesystems/udf.txt | |||
@@ -26,6 +26,20 @@ The following mount options are supported: | |||
26 | nostrict Unset strict conformance | 26 | nostrict Unset strict conformance |
27 | iocharset= Set the NLS character set | 27 | iocharset= Set the NLS character set |
28 | 28 | ||
29 | The uid= and gid= options need a bit more explaining. They will accept a | ||
30 | decimal numeric value which will be used as the default ID for that mount. | ||
31 | They will also accept the string "ignore" and "forget". For files on the disk | ||
32 | that are owned by nobody ( -1 ), they will instead look as if they are owned | ||
33 | by the default ID. The ignore option causes the default ID to override all | ||
34 | IDs on the disk, not just -1. The forget option causes all IDs to be written | ||
35 | to disk as -1, so when the media is later remounted, they will appear to be | ||
36 | owned by whatever default ID it is mounted with at that time. | ||
37 | |||
38 | For typical desktop use of removable media, you should set the ID to that | ||
39 | of the interactively logged on user, and also specify both the forget and | ||
40 | ignore options. This way the interactive user will always see the files | ||
41 | on the disk as belonging to him. | ||
42 | |||
29 | The remaining are for debugging and disaster recovery: | 43 | The remaining are for debugging and disaster recovery: |
30 | 44 | ||
31 | novrs Skip volume sequence recognition | 45 | novrs Skip volume sequence recognition |
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index e56e842847d3..adaa899e5c90 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt | |||
@@ -230,10 +230,15 @@ only called from a process context (i.e. not from an interrupt handler | |||
230 | or bottom half). | 230 | or bottom half). |
231 | 231 | ||
232 | alloc_inode: this method is called by inode_alloc() to allocate memory | 232 | alloc_inode: this method is called by inode_alloc() to allocate memory |
233 | for struct inode and initialize it. | 233 | for struct inode and initialize it. If this function is not |
234 | defined, a simple 'struct inode' is allocated. Normally | ||
235 | alloc_inode will be used to allocate a larger structure which | ||
236 | contains a 'struct inode' embedded within it. | ||
234 | 237 | ||
235 | destroy_inode: this method is called by destroy_inode() to release | 238 | destroy_inode: this method is called by destroy_inode() to release |
236 | resources allocated for struct inode. | 239 | resources allocated for struct inode. It is only required if |
240 | ->alloc_inode was defined and simply undoes anything done by | ||
241 | ->alloc_inode. | ||
237 | 242 | ||
238 | read_inode: this method is called to read a specific inode from the | 243 | read_inode: this method is called to read a specific inode from the |
239 | mounted filesystem. The i_ino member in the struct inode is | 244 | mounted filesystem. The i_ino member in the struct inode is |
@@ -443,14 +448,81 @@ otherwise noted. | |||
443 | The Address Space Object | 448 | The Address Space Object |
444 | ======================== | 449 | ======================== |
445 | 450 | ||
446 | The address space object is used to identify pages in the page cache. | 451 | The address space object is used to group and manage pages in the page |
447 | 452 | cache. It can be used to keep track of the pages in a file (or | |
453 | anything else) and also track the mapping of sections of the file into | ||
454 | process address spaces. | ||
455 | |||
456 | There are a number of distinct yet related services that an | ||
457 | address-space can provide. These include communicating memory | ||
458 | pressure, page lookup by address, and keeping track of pages tagged as | ||
459 | Dirty or Writeback. | ||
460 | |||
461 | The first can be used independently to the others. The VM can try to | ||
462 | either write dirty pages in order to clean them, or release clean | ||
463 | pages in order to reuse them. To do this it can call the ->writepage | ||
464 | method on dirty pages, and ->releasepage on clean pages with | ||
465 | PagePrivate set. Clean pages without PagePrivate and with no external | ||
466 | references will be released without notice being given to the | ||
467 | address_space. | ||
468 | |||
469 | To achieve this functionality, pages need to be placed on an LRU with | ||
470 | lru_cache_add and mark_page_active needs to be called whenever the | ||
471 | page is used. | ||
472 | |||
473 | Pages are normally kept in a radix tree index by ->index. This tree | ||
474 | maintains information about the PG_Dirty and PG_Writeback status of | ||
475 | each page, so that pages with either of these flags can be found | ||
476 | quickly. | ||
477 | |||
478 | The Dirty tag is primarily used by mpage_writepages - the default | ||
479 | ->writepages method. It uses the tag to find dirty pages to call | ||
480 | ->writepage on. If mpage_writepages is not used (i.e. the address | ||
481 | provides its own ->writepages) , the PAGECACHE_TAG_DIRTY tag is | ||
482 | almost unused. write_inode_now and sync_inode do use it (through | ||
483 | __sync_single_inode) to check if ->writepages has been successful in | ||
484 | writing out the whole address_space. | ||
485 | |||
486 | The Writeback tag is used by filemap*wait* and sync_page* functions, | ||
487 | via wait_on_page_writeback_range, to wait for all writeback to | ||
488 | complete. While waiting ->sync_page (if defined) will be called on | ||
489 | each page that is found to require writeback. | ||
490 | |||
491 | An address_space handler may attach extra information to a page, | ||
492 | typically using the 'private' field in the 'struct page'. If such | ||
493 | information is attached, the PG_Private flag should be set. This will | ||
494 | cause various VM routines to make extra calls into the address_space | ||
495 | handler to deal with that data. | ||
496 | |||
497 | An address space acts as an intermediate between storage and | ||
498 | application. Data is read into the address space a whole page at a | ||
499 | time, and provided to the application either by copying of the page, | ||
500 | or by memory-mapping the page. | ||
501 | Data is written into the address space by the application, and then | ||
502 | written-back to storage typically in whole pages, however the | ||
503 | address_space has finer control of write sizes. | ||
504 | |||
505 | The read process essentially only requires 'readpage'. The write | ||
506 | process is more complicated and uses prepare_write/commit_write or | ||
507 | set_page_dirty to write data into the address_space, and writepage, | ||
508 | sync_page, and writepages to writeback data to storage. | ||
509 | |||
510 | Adding and removing pages to/from an address_space is protected by the | ||
511 | inode's i_mutex. | ||
512 | |||
513 | When data is written to a page, the PG_Dirty flag should be set. It | ||
514 | typically remains set until writepage asks for it to be written. This | ||
515 | should clear PG_Dirty and set PG_Writeback. It can be actually | ||
516 | written at any point after PG_Dirty is clear. Once it is known to be | ||
517 | safe, PG_Writeback is cleared. | ||
518 | |||
519 | Writeback makes use of a writeback_control structure... | ||
448 | 520 | ||
449 | struct address_space_operations | 521 | struct address_space_operations |
450 | ------------------------------- | 522 | ------------------------------- |
451 | 523 | ||
452 | This describes how the VFS can manipulate mapping of a file to page cache in | 524 | This describes how the VFS can manipulate mapping of a file to page cache in |
453 | your filesystem. As of kernel 2.6.13, the following members are defined: | 525 | your filesystem. As of kernel 2.6.16, the following members are defined: |
454 | 526 | ||
455 | struct address_space_operations { | 527 | struct address_space_operations { |
456 | int (*writepage)(struct page *page, struct writeback_control *wbc); | 528 | int (*writepage)(struct page *page, struct writeback_control *wbc); |
@@ -469,47 +541,148 @@ struct address_space_operations { | |||
469 | loff_t offset, unsigned long nr_segs); | 541 | loff_t offset, unsigned long nr_segs); |
470 | struct page* (*get_xip_page)(struct address_space *, sector_t, | 542 | struct page* (*get_xip_page)(struct address_space *, sector_t, |
471 | int); | 543 | int); |
544 | /* migrate the contents of a page to the specified target */ | ||
545 | int (*migratepage) (struct page *, struct page *); | ||
472 | }; | 546 | }; |
473 | 547 | ||
474 | writepage: called by the VM write a dirty page to backing store. | 548 | writepage: called by the VM to write a dirty page to backing store. |
549 | This may happen for data integrity reasons (i.e. 'sync'), or | ||
550 | to free up memory (flush). The difference can be seen in | ||
551 | wbc->sync_mode. | ||
552 | The PG_Dirty flag has been cleared and PageLocked is true. | ||
553 | writepage should start writeout, should set PG_Writeback, | ||
554 | and should make sure the page is unlocked, either synchronously | ||
555 | or asynchronously when the write operation completes. | ||
556 | |||
557 | If wbc->sync_mode is WB_SYNC_NONE, ->writepage doesn't have to | ||
558 | try too hard if there are problems, and may choose to write out | ||
559 | other pages from the mapping if that is easier (e.g. due to | ||
560 | internal dependencies). If it chooses not to start writeout, it | ||
561 | should return AOP_WRITEPAGE_ACTIVATE so that the VM will not keep | ||
562 | calling ->writepage on that page. | ||
563 | |||
564 | See the file "Locking" for more details. | ||
475 | 565 | ||
476 | readpage: called by the VM to read a page from backing store. | 566 | readpage: called by the VM to read a page from backing store. |
567 | The page will be Locked when readpage is called, and should be | ||
568 | unlocked and marked uptodate once the read completes. | ||
569 | If ->readpage discovers that it needs to unlock the page for | ||
570 | some reason, it can do so, and then return AOP_TRUNCATED_PAGE. | ||
571 | In this case, the page will be relocated, relocked and if | ||
572 | that all succeeds, ->readpage will be called again. | ||
477 | 573 | ||
478 | sync_page: called by the VM to notify the backing store to perform all | 574 | sync_page: called by the VM to notify the backing store to perform all |
479 | queued I/O operations for a page. I/O operations for other pages | 575 | queued I/O operations for a page. I/O operations for other pages |
480 | associated with this address_space object may also be performed. | 576 | associated with this address_space object may also be performed. |
481 | 577 | ||
578 | This function is optional and is called only for pages with | ||
579 | PG_Writeback set while waiting for the writeback to complete. | ||
580 | |||
482 | writepages: called by the VM to write out pages associated with the | 581 | writepages: called by the VM to write out pages associated with the |
483 | address_space object. | 582 | address_space object. If wbc->sync_mode is WBC_SYNC_ALL, then |
583 | the writeback_control will specify a range of pages that must be | ||
584 | written out. If it is WBC_SYNC_NONE, then a nr_to_write is given | ||
585 | and that many pages should be written if possible. | ||
586 | If no ->writepages is given, then mpage_writepages is used | ||
587 | instead. This will choose pages from the address space that are | ||
588 | tagged as DIRTY and will pass them to ->writepage. | ||
484 | 589 | ||
485 | set_page_dirty: called by the VM to set a page dirty. | 590 | set_page_dirty: called by the VM to set a page dirty. |
591 | This is particularly needed if an address space attaches | ||
592 | private data to a page, and that data needs to be updated when | ||
593 | a page is dirtied. This is called, for example, when a memory | ||
594 | mapped page gets modified. | ||
595 | If defined, it should set the PageDirty flag, and the | ||
596 | PAGECACHE_TAG_DIRTY tag in the radix tree. | ||
486 | 597 | ||
487 | readpages: called by the VM to read pages associated with the address_space | 598 | readpages: called by the VM to read pages associated with the address_space |
488 | object. | 599 | object. This is essentially just a vector version of |
600 | readpage. Instead of just one page, several pages are | ||
601 | requested. | ||
602 | readpages is only used for read-ahead, so read errors are | ||
603 | ignored. If anything goes wrong, feel free to give up. | ||
489 | 604 | ||
490 | prepare_write: called by the generic write path in VM to set up a write | 605 | prepare_write: called by the generic write path in VM to set up a write |
491 | request for a page. | 606 | request for a page. This indicates to the address space that |
492 | 607 | the given range of bytes is about to be written. The | |
493 | commit_write: called by the generic write path in VM to write page to | 608 | address_space should check that the write will be able to |
494 | its backing store. | 609 | complete, by allocating space if necessary and doing any other |
610 | internal housekeeping. If the write will update parts of | ||
611 | any basic-blocks on storage, then those blocks should be | ||
612 | pre-read (if they haven't been read already) so that the | ||
613 | updated blocks can be written out properly. | ||
614 | The page will be locked. If prepare_write wants to unlock the | ||
615 | page it, like readpage, may do so and return | ||
616 | AOP_TRUNCATED_PAGE. | ||
617 | In this case the prepare_write will be retried one the lock is | ||
618 | regained. | ||
619 | |||
620 | commit_write: If prepare_write succeeds, new data will be copied | ||
621 | into the page and then commit_write will be called. It will | ||
622 | typically update the size of the file (if appropriate) and | ||
623 | mark the inode as dirty, and do any other related housekeeping | ||
624 | operations. It should avoid returning an error if possible - | ||
625 | errors should have been handled by prepare_write. | ||
495 | 626 | ||
496 | bmap: called by the VFS to map a logical block offset within object to | 627 | bmap: called by the VFS to map a logical block offset within object to |
497 | physical block number. This method is use by for the legacy FIBMAP | 628 | physical block number. This method is used by the FIBMAP |
498 | ioctl. Other uses are discouraged. | 629 | ioctl and for working with swap-files. To be able to swap to |
499 | 630 | a file, the file must have a stable mapping to a block | |
500 | invalidatepage: called by the VM on truncate to disassociate a page from its | 631 | device. The swap system does not go through the filesystem |
501 | address_space mapping. | 632 | but instead uses bmap to find out where the blocks in the file |
502 | 633 | are and uses those addresses directly. | |
503 | releasepage: called by the VFS to release filesystem specific metadata from | 634 | |
504 | a page. | 635 | |
505 | 636 | invalidatepage: If a page has PagePrivate set, then invalidatepage | |
506 | direct_IO: called by the VM for direct I/O writes and reads. | 637 | will be called when part or all of the page is to be removed |
638 | from the address space. This generally corresponds to either a | ||
639 | truncation or a complete invalidation of the address space | ||
640 | (in the latter case 'offset' will always be 0). | ||
641 | Any private data associated with the page should be updated | ||
642 | to reflect this truncation. If offset is 0, then | ||
643 | the private data should be released, because the page | ||
644 | must be able to be completely discarded. This may be done by | ||
645 | calling the ->releasepage function, but in this case the | ||
646 | release MUST succeed. | ||
647 | |||
648 | releasepage: releasepage is called on PagePrivate pages to indicate | ||
649 | that the page should be freed if possible. ->releasepage | ||
650 | should remove any private data from the page and clear the | ||
651 | PagePrivate flag. It may also remove the page from the | ||
652 | address_space. If this fails for some reason, it may indicate | ||
653 | failure with a 0 return value. | ||
654 | This is used in two distinct though related cases. The first | ||
655 | is when the VM finds a clean page with no active users and | ||
656 | wants to make it a free page. If ->releasepage succeeds, the | ||
657 | page will be removed from the address_space and become free. | ||
658 | |||
659 | The second case if when a request has been made to invalidate | ||
660 | some or all pages in an address_space. This can happen | ||
661 | through the fadvice(POSIX_FADV_DONTNEED) system call or by the | ||
662 | filesystem explicitly requesting it as nfs and 9fs do (when | ||
663 | they believe the cache may be out of date with storage) by | ||
664 | calling invalidate_inode_pages2(). | ||
665 | If the filesystem makes such a call, and needs to be certain | ||
666 | that all pages are invalidated, then its releasepage will | ||
667 | need to ensure this. Possibly it can clear the PageUptodate | ||
668 | bit if it cannot free private data yet. | ||
669 | |||
670 | direct_IO: called by the generic read/write routines to perform | ||
671 | direct_IO - that is IO requests which bypass the page cache | ||
672 | and transfer data directly between the storage and the | ||
673 | application's address space. | ||
507 | 674 | ||
508 | get_xip_page: called by the VM to translate a block number to a page. | 675 | get_xip_page: called by the VM to translate a block number to a page. |
509 | The page is valid until the corresponding filesystem is unmounted. | 676 | The page is valid until the corresponding filesystem is unmounted. |
510 | Filesystems that want to use execute-in-place (XIP) need to implement | 677 | Filesystems that want to use execute-in-place (XIP) need to implement |
511 | it. An example implementation can be found in fs/ext2/xip.c. | 678 | it. An example implementation can be found in fs/ext2/xip.c. |
512 | 679 | ||
680 | migrate_page: This is used to compact the physical memory usage. | ||
681 | If the VM wants to relocate a page (maybe off a memory card | ||
682 | that is signalling imminent failure) it will pass a new page | ||
683 | and an old page to this function. migrate_page should | ||
684 | transfer any private data across and update any references | ||
685 | that it has to the page. | ||
513 | 686 | ||
514 | The File Object | 687 | The File Object |
515 | =============== | 688 | =============== |