38 files changed, 13259 insertions, 0 deletions
diff --git a/Documentation/filesystems/00-INDEX b/Documentation/filesystems/00-INDEX
new file mode 100644
index 000000000000..bcfbab899b37
--- /dev/null
+++ b/Documentation/filesystems/00-INDEX
@@ -0,0 +1,50 @@
+00-INDEX
+        - this file (info on some of the filesystems supported by linux).
+Locking
+        - info on locking rules as they pertain to Linux VFS.
+adfs.txt
+        - info and mount options for the Acorn Advanced Disc Filing System.
+affs.txt
+        - info and mount options for the Amiga Fast File System.
+bfs.txt
+        - info for the SCO UnixWare Boot Filesystem (BFS).
+cifs.txt
+        - description of the CIFS filesystem
+coda.txt
+        - description of the CODA filesystem.
+cramfs.txt
+        - info on the cram filesystem for small storage (ROMs etc)
+devfs/
+        - directory containing devfs documentation.
+ext2.txt
+        - info, mount options and specifications for the Ext2 filesystem.
+fat_cvf.txt
+        - info on the Compressed Volume Files extension to the FAT filesystem
+hpfs.txt
+        - info and mount options for the OS/2 HPFS.
+isofs.txt
+        - info and mount options for the ISO 9660 (CDROM) filesystem.
+jfs.txt
+        - info and mount options for the JFS filesystem.
+ncpfs.txt
+        - info on Novell Netware(tm) filesystem using NCP protocol.
+ntfs.txt
+        - info and mount options for the NTFS filesystem (Windows NT).
+proc.txt
+        - info on Linux's /proc filesystem.
+romfs.txt
+        - Description of the ROMFS filesystem.
+smbfs.txt
+        - info on using filesystems with the SMB protocol (Windows 3.11 and NT)
+sysv-fs.txt
+        - info on the SystemV/V7/Xenix/Coherent filesystem.
+udf.txt
+        - info and mount options for the UDF filesystem.
+ufs.txt
+        - info on the ufs filesystem.
+vfat.txt
+        - info on using the VFAT filesystem used in Windows NT and Windows 95
+vfs.txt
+        - Overview of the Virtual File System
+xfs.txt
+        - info and mount options for the XFS filesystem.
diff --git a/Documentation/filesystems/Exporting b/Documentation/filesystems/Exporting
new file mode 100644
index 000000000000..31047e0fe14b
--- /dev/null
+++ b/Documentation/filesystems/Exporting
@@ -0,0 +1,176 @@
+Making Filesystems Exportable
+=============================
+Most filesystem operations require a dentry (or two) as a starting
+point.  Local applications have a reference-counted hold on suitable
+dentrys via open file descriptors or cwd/root.  However remote
+applications that access a filesystem via a remote filesystem protocol
+such as NFS may not be able to hold such a reference, and so need a
+different way to refer to a particular dentry.  As the alternative
+form of reference needs to be stable across renames, truncates, and
+server-reboot (among other things, though these tend to be the most
+problematic), there is no simple answer like 'filename'.
+The mechanism discussed here allows each filesystem implementation to
+specify how to generate an opaque (outside of the filesystem) byte
+string for any dentry, and how to find an appropriate dentry for any
+given opaque byte string.
+This byte string will be called a "filehandle fragment" as it
+corresponds to part of an NFS filehandle.
+A filesystem which supports the mapping between filehandle fragments
+and dentrys will be termed "exportable".
+Dcache Issues
+-------------
+The dcache normally contains a proper prefix of any given filesystem
+tree.  This means that if any filesystem object is in the dcache, then
+all of the ancestors of that filesystem object are also in the dcache.
+As normal access is by filename this prefix is created naturally and
+maintained easily (by each object maintaining a reference count on
+its parent).
+However when objects are included into the dcache by interpreting a
+filehandle fragment, there is no automatic creation of a path prefix
+for the object.  This leads to two related but distinct features of
+the dcache that are not needed for normal filesystem access.
+1/ The dcache must sometimes contain objects that are not part of the
+   proper prefix, i.e. that are not connected to the root.
+2/ The dcache must be prepared for a newly found (via ->lookup) directory
+   to already have a (non-connected) dentry, and must be able to move
+   that dentry into place (based on the parent and name in the
+   ->lookup).   This is particularly needed for directories as
+   it is a dcache invariant that directories only have one dentry.
+To implement these features, the dcache has:
+a/ A dentry flag DCACHE_DISCONNECTED which is set on
+   any dentry that might not be part of the proper prefix.
+   This is set when anonymous dentries are created, and cleared when a
+   dentry is noticed to be a child of a dentry which is in the proper
+   prefix. 
+b/ A per-superblock list "s_anon" of dentries which are the roots of
+   subtrees that are not in the proper prefix.  These dentries, as
+   well as the proper prefix, need to be released at unmount time.  As
+   these dentries will not be hashed, they are linked together on the
+   d_hash list_head.
+c/ Helper routines to allocate anonymous dentries, and to help attach
+   loose directory dentries at lookup time. They are:
+    d_alloc_anon(inode) will return a dentry for the given inode.
+      If the inode already has a dentry, one of those is returned.
+      If it doesn't, a new anonymous (IS_ROOT and
+        DCACHE_DISCONNECTED) dentry is allocated and attached.
+      In the case of a directory, care is taken that only one dentry
+      can ever be attached.
+    d_splice_alias(inode, dentry) will make sure that there is a
+      dentry with the same name and parent as the given dentry, and
+      which refers to the given inode.
+      If the inode is a directory and already has a dentry, then that
+      dentry is d_moved over the given dentry.
+      If the passed dentry gets attached, care is taken that this is
+      mutually exclusive to a d_alloc_anon operation.
+      If the passed dentry is used, NULL is returned, else the used
+      dentry is returned.  This corresponds to the calling pattern of
+      ->lookup.
+  
+ 
+Filesystem Issues
+-----------------
+For a filesystem to be exportable it must:
+ 
+   1/ provide the filehandle fragment routines described below.
+   2/ make sure that d_splice_alias is used rather than d_add
+      when ->lookup finds an inode for a given parent and name.
+      Typically the ->lookup routine will end:
+                if (inode)
+                        return d_splice_alias(inode, dentry);
+                d_add(dentry, inode);
+                return NULL;
+        }
+  A file system implementation declares that instances of the filesystem
+are exportable by setting the s_export_op field in the struct
+super_block.  This field must point to a "struct export_operations"
+struct which could potentially be full of NULLs, though normally at
+least get_parent will be set.
+ The primary operations are decode_fh and encode_fh.  
+decode_fh takes a filehandle fragment and tries to find or create a
+dentry for the object referred to by the filehandle.
+encode_fh takes a dentry and creates a filehandle fragment which can
+later be used to find/create a dentry for the same object.
+decode_fh will probably make use of "find_exported_dentry".
+This function lives in the "exportfs" module which a filesystem does
+not need unless it is being exported.  So rather than calling
+find_exported_dentry directly, each filesystem should call it through
+the find_exported_dentry pointer in its export_operations table.
+This field is set correctly by the exporting agent (e.g. nfsd) when a
+filesystem is exported, and before any export operations are called.
+find_exported_dentry needs three support functions from the
+filesystem:
+  get_name.  When given a parent dentry and a child dentry, this
+    should find a name in the directory identified by the parent
+    dentry, which leads to the object identified by the child dentry.
+    If no get_name function is supplied, a default implementation is
+    provided which uses vfs_readdir to find potential names, and
+    matches inode numbers to find the correct match.
+  get_parent.  When given a dentry for a directory, this should return 
+    a dentry for the parent.  Quite possibly the parent dentry will
+    have been allocated by d_alloc_anon.  
+    The default get_parent function just returns an error so any
+    filehandle lookup that requires finding a parent will fail.
+    ->lookup("..") is *not* used as a default as it can leave ".."
+    entries in the dcache which are too messy to work with.
+  get_dentry.  When given an opaque datum, this should find the
+    implied object and create a dentry for it (possibly with
+    d_alloc_anon). 
+    The opaque datum is whatever is passed down by the decode_fh
+    function, and is often simply a fragment of the filehandle
+    fragment.
+    decode_fh passes two datums through find_exported_dentry.  One that 
+    should be used to identify the target object, and one that can be
+    used to identify the object's parent, should that be necessary.
+    The default get_dentry function assumes that the datum contains an
+    inode number and a generation number, and it attempts to get the
+    inode using "iget" and check its validity by matching the
+    generation number.  A filesystem should only depend on the default
+    if iget can safely be used this way.
+If decode_fh and/or encode_fh are left as NULL, then default
+implementations are used.  These defaults are suitable for ext2 and 
+extremely similar filesystems (like ext3).
+The default encode_fh creates a filehandle fragment from the inode
+number and generation number of the target together with the inode
+number and generation number of the parent (if the parent is
+required).
+The default decode_fh extracts the target and parent datums from the
+filehandle assuming the format used by the default encode_fh and
+passes them to find_exported_dentry.
+A filehandle fragment consists of an array of 1 or more 4-byte words,
+together with a one byte "type".
+The decode_fh routine should not depend on the stated size that is
+passed to it.  This size may be larger than the original filehandle
+generated by encode_fh, in which case it will have been padded with
+nuls.  Rather, the encode_fh routine should choose a "type" which
+indicates to decode_fh how much of the filehandle is valid, and how
+it should be interpreted.
+ 
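The filehandle-fragment rules above (an array of 4-byte words plus a
one-byte type, with decode_fh trusting the type rather than the stated
size) can be sketched in userspace C.  This is a minimal illustration, not
the kernel's default encode_fh/decode_fh: the names sketch_encode_fh,
sketch_decode_fh and struct obj_id, the type values FH_SELF and
FH_SELF_AND_PARENT, and the 255 failure value are all assumptions made for
the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type values, chosen for this sketch only. */
enum { FH_SELF = 1, FH_SELF_AND_PARENT = 2 };

/* Stand-in for the (inode number, generation number) pair. */
struct obj_id {
	uint32_t ino;
	uint32_t gen;
};

/*
 * Build a fragment from the target and, if needed, its parent.
 * *len is in 4-byte words: capacity on entry, words used on exit.
 * Returns the type, or 255 (assumed failure value) if fh is too small.
 */
static int sketch_encode_fh(const struct obj_id *self,
			    const struct obj_id *parent, /* NULL if not needed */
			    uint32_t *fh, int *len)
{
	int need = parent ? 4 : 2;

	if (*len < need)
		return 255;
	fh[0] = self->ino;
	fh[1] = self->gen;
	if (parent) {
		fh[2] = parent->ino;
		fh[3] = parent->gen;
	}
	*len = need;
	return parent ? FH_SELF_AND_PARENT : FH_SELF;
}

/*
 * Decode a fragment.  The stated len may exceed what encode produced
 * (NUL padding), so the type alone decides how many words are valid.
 * Returns 0 on success, -1 on an unknown type or a too-short fragment.
 */
static int sketch_decode_fh(const uint32_t *fh, int len, int type,
			    struct obj_id *self, struct obj_id *parent,
			    int *has_parent)
{
	int need;

	switch (type) {
	case FH_SELF:
		need = 2;
		break;
	case FH_SELF_AND_PARENT:
		need = 4;
		break;
	default:
		return -1;
	}
	if (len < need)
		return -1;

	self->ino = fh[0];
	self->gen = fh[1];
	*has_parent = (type == FH_SELF_AND_PARENT);
	if (*has_parent) {
		parent->ino = fh[2];
		parent->gen = fh[3];
	}
	return 0;
}
```

A caller would encode into a zero-filled buffer, keep the returned type
alongside the words, and later hand both to the decoder.  Passing a stated
length larger than the number of words actually written is harmless here,
because only the type determines how much of the buffer is read.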
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
new file mode 100644
index 000000000000..a934baeeb33a
--- /dev/null
+++ b/Documentation/filesystems/Locking
@@ -0,0 +1,515 @@
+        The text below describes the locking rules for VFS-related methods.
+It is (believed to be) up-to-date. *Please*, if you change anything in
+prototypes or locking protocols - update this file. And update the relevant
+instances in the tree, don't leave that to maintainers of filesystems/devices/
+etc. At the very least, put the list of dubious cases at the end of this file.
+Don't turn it into a log - maintainers of out-of-the-tree code are supposed to
+be able to use diff(1).
+        Things currently missing here: socket operations. Alexey?
+--------------------------- dentry_operations --------------------------
+prototypes:
+        int (*d_revalidate)(struct dentry *, int);
+        int (*d_hash) (struct dentry *, struct qstr *);
+        int (*d_compare) (struct dentry *, struct qstr *, struct qstr *);
+        int (*d_delete)(struct dentry *);
+        void (*d_release)(struct dentry *);
+        void (*d_iput)(struct dentry *, struct inode *);
+locking rules:
+        none have BKL
+                dcache_lock     rename_lock     ->d_lock        may block
+d_revalidate:   no              no              no              yes
+d_hash          no              no              no              yes
+d_compare:      no              yes             no              no 
+d_delete:       yes             no              yes             no
+d_release:      no              no              no              yes
+d_iput:         no              no              no              yes
+--------------------------- inode_operations --------------------------- 
+prototypes:
+        int (*create) (struct inode *,struct dentry *,int, struct nameidata *);
+        struct dentry * (*lookup) (struct inode *,struct dentry *, struct nameidata *);
+        int (*link) (struct dentry *,struct inode *,struct dentry *);
+        int (*unlink) (struct inode *,struct dentry *);
+        int (*symlink) (struct inode *,struct dentry *,const char *);
+        int (*mkdir) (struct inode *,struct dentry *,int);
+        int (*rmdir) (struct inode *,struct dentry *);
+        int (*mknod) (struct inode *,struct dentry *,int,dev_t);
+        int (*rename) (struct inode *, struct dentry *,
+                        struct inode *, struct dentry *);
+        int (*readlink) (struct dentry *, char __user *,int);
+        int (*follow_link) (struct dentry *, struct nameidata *);
+        void (*truncate) (struct inode *);
+        int (*permission) (struct inode *, int, struct nameidata *);
+        int (*setattr) (struct dentry *, struct iattr *);
+        int (*getattr) (struct vfsmount *, struct dentry *, struct kstat *);
+        int (*setxattr) (struct dentry *, const char *,const void *,size_t,int);
+        ssize_t (*getxattr) (struct dentry *, const char *, void *, size_t);
+        ssize_t (*listxattr) (struct dentry *, char *, size_t);
+        int (*removexattr) (struct dentry *, const char *);
+locking rules:
+        all may block, none have BKL
+                i_sem(inode)
+lookup:         yes
+create:         yes
+link:           yes (both)
+mknod:          yes
+symlink:        yes
+mkdir:          yes
+unlink:         yes (both)
+rmdir:          yes (both)      (see below)
+rename:         yes (all)       (see below)
+readlink:       no
+follow_link:    no
+truncate:       yes             (see below)
+setattr:        yes
+permission:     no
+getattr:        no
+setxattr:       yes
+getxattr:       no
+listxattr:      no
+removexattr:    yes
+        Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_sem on
+victim.
+        cross-directory ->rename() has (per-superblock) ->s_vfs_rename_sem.
+        ->truncate() is never called directly - it's a callback, not a
+method. It's called by vmtruncate() - library function normally used by
+->setattr(). Locking information above applies to that call (i.e. is
+inherited from ->setattr() - vmtruncate() is used when ATTR_SIZE had been
+passed).
+See Documentation/filesystems/directory-locking for more detailed discussion
+of the locking scheme for directory operations.
+--------------------------- super_operations ---------------------------
+prototypes:
+        struct inode *(*alloc_inode)(struct super_block *sb);
+        void (*destroy_inode)(struct inode *);
+        void (*read_inode) (struct inode *);
+        void (*dirty_inode) (struct inode *);
+        int (*write_inode) (struct inode *, int);
+        void (*put_inode) (struct inode *);
+        void (*drop_inode) (struct inode *);
+        void (*delete_inode) (struct inode *);
+        void (*put_super) (struct super_block *);
+        void (*write_super) (struct super_block *);
+        int (*sync_fs)(struct super_block *sb, int wait);
+        void (*write_super_lockfs) (struct super_block *);
+        void (*unlockfs) (struct super_block *);
+        int (*statfs) (struct super_block *, struct kstatfs *);
+        int (*remount_fs) (struct super_block *, int *, char *);
+        void (*clear_inode) (struct inode *);
+        void (*umount_begin) (struct super_block *);
+        int (*show_options)(struct seq_file *, struct vfsmount *);
+        ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
+        ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
+locking rules:
+        All may block.
+                        BKL     s_lock  s_umount
+alloc_inode:            no      no      no
+destroy_inode:          no
+read_inode:             no                              (see below)
+dirty_inode:            no                              (must not sleep)
+write_inode:            no
+put_inode:              no
+drop_inode:             no                              !!!inode_lock!!!
+delete_inode:           no
+put_super:              yes     yes     no
+write_super:            no      yes     read
+sync_fs:                no      no      read
+write_super_lockfs:     ?
+unlockfs:               ?
+statfs:                 no      no      no
+remount_fs:             no      yes     maybe           (see below)
+clear_inode:            no
+umount_begin:           yes     no      no
+show_options:           no                              (vfsmount->sem)
+quota_read:             no      no      no              (see below)
+quota_write:            no      no      no              (see below)
+->read_inode() is not a method - it's a callback used in iget().
+->remount_fs() will have the s_umount lock if it's already mounted.
+When called from get_sb_single, it does NOT have the s_umount lock.
+->quota_read() and ->quota_write() functions are both guaranteed to
+be the only ones operating on the quota file by the quota code (via
+dqio_sem) (unless an admin really wants to screw up something and
+writes to quota files with quotas on). For other details about locking
+see also dquot_operations section.
+--------------------------- file_system_type ---------------------------
+prototypes:
+        struct super_block *(*get_sb) (struct file_system_type *, int,
+                        const char *, void *);
+        void (*kill_sb) (struct super_block *);
+locking rules:
+                may block       BKL
+get_sb          yes             yes
+kill_sb         yes             yes
+->get_sb() returns error or a locked superblock (exclusive on ->s_umount).
+->kill_sb() takes a write-locked superblock, does all shutdown work on it,
+unlocks and drops the reference.
+--------------------------- address_space_operations --------------------------
+prototypes:
+        int (*writepage)(struct page *page, struct writeback_control *wbc);
+        int (*readpage)(struct file *, struct page *);
+        int (*sync_page)(struct page *);
+        int (*writepages)(struct address_space *, struct writeback_control *);
+        int (*set_page_dirty)(struct page *page);
+        int (*readpages)(struct file *filp, struct address_space *mapping,
+                        struct list_head *pages, unsigned nr_pages);
+        int (*prepare_write)(struct file *, struct page *, unsigned, unsigned);
+        int (*commit_write)(struct file *, struct page *, unsigned, unsigned);
+        sector_t (*bmap)(struct address_space *, sector_t);
+        int (*invalidatepage) (struct page *, unsigned long);
+        int (*releasepage) (struct page *, int);
+        int (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
+                        loff_t offset, unsigned long nr_segs);
+locking rules:
+        All except set_page_dirty may block
+                        BKL     PageLocked(page)
+writepage:              no      yes, unlocks (see below)
+readpage:               no      yes, unlocks
+sync_page:              no      maybe
+writepages:             no
+set_page_dirty          no      no
+readpages:              no
+prepare_write:          no      yes
+commit_write:           no      yes
+bmap:                   yes
+invalidatepage:         no      yes
+releasepage:            no      yes
+direct_IO:              no
+        ->prepare_write(), ->commit_write(), ->sync_page() and ->readpage()
+may be called from the request handler (/dev/loop).
+        ->readpage() unlocks the page, either synchronously or via I/O
+completion.
+        ->readpages() populates the pagecache with the passed pages and starts
+I/O against them.  They come unlocked upon I/O completion.
+        ->writepage() is used for two purposes: for "memory cleansing" and for
+"sync".  These are quite different operations and the behaviour may differ
+depending upon the mode.
+If writepage is called for sync (wbc->sync_mode != WBC_SYNC_NONE) then
+it *must* start I/O against the page, even if that would involve
+blocking on in-progress I/O.
+If writepage is called for memory cleansing (sync_mode ==
+WBC_SYNC_NONE) then its role is to get as much writeout underway as
+possible.  So writepage should try to avoid blocking against
+currently-in-progress I/O.
+If the filesystem is not called for "sync" and it determines that it
+would need to block against in-progress I/O to be able to start new I/O
+against the page the filesystem should redirty the page with
+redirty_page_for_writepage(), then unlock the page and return zero.
+This may also be done to avoid internal deadlocks, but rarely.
+If the filesystem is called for sync then it must wait on any
+in-progress I/O and then start new I/O.
+The filesystem should unlock the page synchronously, before returning
+to the caller.
+Unless the filesystem is going to redirty_page_for_writepage(), unlock the page
+and return zero, writepage *must* run set_page_writeback() against the page,
+followed by unlocking it.  Once set_page_writeback() has been run against the
+page, write I/O can be submitted and the write I/O completion handler must run
+end_page_writeback() once the I/O is complete.  If no I/O is submitted, the
+filesystem must run end_page_writeback() against the page before returning from
+writepage.
+That is: after 2.5.12, pages which are under writeout are *not* locked.  Note
+that if the filesystem needs the page to be locked during writeout, that is ok
+too: the page is allowed to be unlocked at any point in time between the calls
+to set_page_writeback() and end_page_writeback().
+Note, failure to run either redirty_page_for_writepage() or the combination of
+set_page_writeback()/end_page_writeback() on a page submitted to writepage
+will leave the page itself marked clean but it will be tagged as dirty in the
+radix tree.  This incoherency can lead to all sorts of hard-to-debug problems
+in the filesystem like having dirty inodes at umount and losing written data.
+        ->sync_page() locking rules are not well-defined - usually it is called
+with lock on page, but that is not guaranteed. Considering the currently
+existing instances of this method ->sync_page() itself doesn't look
+well-defined...
+        ->writepages() is used for periodic writeback and for syscall-initiated
+sync operations.  The address_space should start I/O against at least
+*nr_to_write pages.  *nr_to_write must be decremented for each page which is
+written.  The address_space implementation may write more (or fewer) pages
+than *nr_to_write asks for, but it should try to be reasonably close.  If
+nr_to_write is NULL, all dirty pages must be written.
+writepages should _only_ write pages which are present on
+mapping->io_pages.
+        ->set_page_dirty() is called from various places in the kernel
+when the target page is marked as needing writeback.  It may be called
+under spinlock (it cannot block) and is sometimes called with the page
+not locked.
+        ->bmap() is currently used by legacy ioctl() (FIBMAP) provided by some
+filesystems and by the swapper. The latter will eventually go away. All
+instances do not actually need the BKL. Please, keep it that way and don't
+breed new callers.
+        ->invalidatepage() is called when the filesystem must attempt to drop
+some or all of the buffers from the page when it is being truncated.  It
+returns zero on success.  If ->invalidatepage is zero, the kernel uses
+block_invalidatepage() instead.
+        ->releasepage() is called when the kernel is about to try to drop the
+buffers from the page in preparation for freeing it.  It returns zero to
+indicate that the buffers are (or may be) freeable.  If ->releasepage is zero,
+the kernel assumes that the fs has no private interest in the buffers.
+        Note: currently almost all instances of address_space methods are
+using BKL for internal serialization and that's one of the worst sources
+of contention. Normally they are calling library functions (in fs/buffer.c)
+and pass foo_get_block() as a callback (on local block-based filesystems,
+indeed). BKL is not needed for library stuff and is usually taken by
+foo_get_block(). It's overkill, since block bitmaps can be protected by
+internal fs locking and real critical areas are much smaller than the areas
+filesystems protect now.
+----------------------- file_lock_operations ------------------------------
+prototypes:
+        void (*fl_insert)(struct file_lock *);  /* lock insertion callback */
+        void (*fl_remove)(struct file_lock *);  /* lock removal callback */
+        void (*fl_copy_lock)(struct file_lock *, struct file_lock *);
+        void (*fl_release_private)(struct file_lock *);
+locking rules:
+                        BKL     may block
+fl_insert:              yes     no
+fl_remove:              yes     no
+fl_copy_lock:           yes     no
+fl_release_private:     yes     yes
+----------------------- lock_manager_operations ---------------------------
+prototypes:
+        int (*fl_compare_owner)(struct file_lock *, struct file_lock *);
+        void (*fl_notify)(struct file_lock *);  /* unblock callback */
+        void (*fl_copy_lock)(struct file_lock *, struct file_lock *);
+        void (*fl_release_private)(struct file_lock *);
+        void (*fl_break)(struct file_lock *); /* break_lease callback */
+locking rules:
+                        BKL     may block
+fl_compare_owner:       yes     no
+fl_notify:              yes     no
+fl_copy_lock:           yes     no
+fl_release_private:     yes     yes
+fl_break:               yes     no
+        Currently only NFSD and NLM provide instances of this class. None of
+them block. If you have out-of-tree instances - please, show up. Locking
+in that area will change.
+--------------------------- buffer_head -----------------------------------
+prototypes:
+        void (*b_end_io)(struct buffer_head *bh, int uptodate);
+locking rules:
+        called from interrupts. In other words, extreme care is needed here.
+bh is locked, but that is the only guarantee we have here. Currently only RAID1,
+highmem, fs/buffer.c, and fs/ntfs/aops.c are providing these. Block devices
+call this method upon the IO completion.
+--------------------------- block_device_operations -----------------------
+prototypes:
+        int (*open) (struct inode *, struct file *);
+        int (*release) (struct inode *, struct file *);
+        int (*ioctl) (struct inode *, struct file *, unsigned, unsigned long);
+        int (*media_changed) (struct gendisk *);
+        int (*revalidate_disk) (struct gendisk *);
+locking rules:
+                        BKL     bd_sem
+open:                   yes     yes
+release:                yes     yes
+ioctl:                  yes     no
+media_changed:          no      no
+revalidate_disk:        no      no
+The last two are called only from check_disk_change().
+--------------------------- file_operations -------------------------------
+prototypes:
+        loff_t (*llseek) (struct file *, loff_t, int);
+        ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
+        ssize_t (*aio_read) (struct kiocb *, char __user *, size_t, loff_t);
+        ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
+        ssize_t (*aio_write) (struct kiocb *, const char __user *, size_t,
+                        loff_t);
+        int (*readdir) (struct file *, void *, filldir_t);
+        unsigned int (*poll) (struct file *, struct poll_table_struct *);
+        int (*ioctl) (struct inode *, struct file *, unsigned int,
+                        unsigned long);
+        long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
+        long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
+        int (*mmap) (struct file *, struct vm_area_struct *);
+        int (*open) (struct inode *, struct file *);
+        int (*flush) (struct file *);
+        int (*release) (struct inode *, struct file *);
+        int (*fsync) (struct file *, struct dentry *, int datasync);
+        int (*aio_fsync) (struct kiocb *, int datasync);
+        int (*fasync) (int, struct file *, int);
+        int (*lock) (struct file *, int, struct file_lock *);
+        ssize_t (*readv) (struct file *, const struct iovec *, unsigned long,
+                        loff_t *);
+        ssize_t (*writev) (struct file *, const struct iovec *, unsigned long,
+                        loff_t *);
+        ssize_t (*sendfile) (struct file *, loff_t *, size_t, read_actor_t,
+                        void __user *);
+        ssize_t (*sendpage) (struct file *, struct page *, int, size_t,
+                        loff_t *, int);
+        unsigned long (*get_unmapped_area)(struct file *, unsigned long,
+                        unsigned long, unsigned long, unsigned long);
+        int (*check_flags)(int);
+        int (*dir_notify)(struct file *, unsigned long);
+};
+locking rules:
+        All except ->poll() may block.
+                        BKL
+llseek:                 no      (see below)
+read:                   no
+aio_read:               no
+write:                  no
+aio_write:              no
+readdir:                no
+poll:                   no
+ioctl:                  yes     (see below)
+unlocked_ioctl:         no      (see below)
+compat_ioctl:           no
+mmap:                   no
+open:                   maybe   (see below)
+flush:                  no
+release:                no
+fsync:                  no      (see below)
+aio_fsync:              no
+fasync:                 yes     (see below)
+lock:                   yes
+readv:                  no
+writev:                 no
+sendfile:               no
+sendpage:               no
+get_unmapped_area:      no
+check_flags:            no
+dir_notify:             no
+->llseek() locking has moved from llseek to the individual llseek
+implementations.  If your fs is not using generic_file_llseek, you
+need to acquire and release the appropriate locks in your ->llseek().
+For many filesystems, it is probably safe to acquire the inode
+semaphore.  Note some filesystems (i.e. remote ones) provide no
+protection for i_size so you will need to use the BKL.
+->open() locking is in-transit: big lock partially moved into the methods.
+The only exception is ->open() in the instances of file_operations that never
+end up in ->i_fop/->proc_fops, i.e. ones that belong to character devices
+(chrdev_open() takes lock before replacing ->f_op and calling the secondary
+method). As soon as we fix the handling of module reference counters all
+instances of ->open() will be called without the BKL.
+Note: ext2_release() was *the* source of contention on fs-intensive
+loads and dropping BKL on ->release() helps to get rid of that (we still
+grab BKL for cases when we close a file that had been opened r/w, but that
+can and should be done using the internal locking with smaller critical areas).
+Current worst offender is ext2_get_block()...
+->fasync() is a mess. This area needs a big cleanup and that will probably
+affect locking.
+->readdir() and ->ioctl() on directories must be changed. Ideally we would
+move ->readdir() to inode_operations and use a separate method for directory
+->ioctl() or kill the latter completely. One of the problems is that for
+anything that resembles union-mount we won't have a struct file for all
+components. And there are other reasons why the current interface is a mess...
+->ioctl() on regular files is superseded by the ->unlocked_ioctl() that
+doesn't take the BKL.
+->read on directories probably must go away - we should just enforce -EISDIR
+in sys_read() and friends.
+->fsync() has i_sem on inode.
+--------------------------- dquot_operations -------------------------------
+prototypes:
+        int (*initialize) (struct inode *, int);
+        int (*drop) (struct inode *);
+        int (*alloc_space) (struct inode *, qsize_t, int);
+        int (*alloc_inode) (const struct inode *, unsigned long);
+        int (*free_space) (struct inode *, qsize_t);
+        int (*free_inode) (const struct inode *, unsigned long);
+        int (*transfer) (struct inode *, struct iattr *);
+        int (*write_dquot) (struct dquot *);
+        int (*acquire_dquot) (struct dquot *);
+        int (*release_dquot) (struct dquot *);
+        int (*mark_dirty) (struct dquot *);
+        int (*write_info) (struct super_block *, int);
+These operations are intended to be more or less wrapping functions that ensure
+proper locking wrt the filesystem and call the generic quota operations.
+What a filesystem should expect from the generic quota functions:
+                FS recursion    Held locks when called
+initialize:     yes             maybe dqonoff_sem
+drop:           yes             -
+alloc_space:    ->mark_dirty()  -
+alloc_inode:    ->mark_dirty()  -
+free_space:     ->mark_dirty()  -
+free_inode:     ->mark_dirty()  -
+transfer:       yes             -
+write_dquot:    yes             dqonoff_sem or dqptr_sem
+acquire_dquot:  yes             dqonoff_sem or dqptr_sem
+release_dquot:  yes             dqonoff_sem or dqptr_sem
+mark_dirty:     no              -
+write_info:     yes             dqonoff_sem
+FS recursion means calling ->quota_read() and ->quota_write() from superblock
+operations.
+->alloc_space(), ->alloc_inode(), ->free_space(), ->free_inode() are called
+only directly by the filesystem and do not call any fs functions other than
+the ->mark_dirty() operation.
+More details about quota locking can be found in fs/dquot.c.
+--------------------------- vm_operations_struct -----------------------------
+prototypes:
+        void (*open)(struct vm_area_struct*);
+        void (*close)(struct vm_area_struct*);
+        struct page *(*nopage)(struct vm_area_struct*, unsigned long, int *);
+locking rules:
+                BKL     mmap_sem
+open:           no      yes
+close:          no      yes
+nopage:         no      yes
+================================================================================
+                        Dubious stuff
+(if you break something or notice that it is broken and do not fix it yourself
+- at least put it here)
+ipc/shm.c::shm_delete() - may need BKL.
+->read() and ->write() in many drivers are (probably) missing BKL.
+drivers/sgi/char/graphics.c::sgi_graphics_nopage() - may need BKL.
diff --git a/Documentation/filesystems/adfs.txt b/Documentation/filesystems/adfs.txt
new file mode 100644
index 000000000000..060abb0c7004
--- /dev/null
+++ b/Documentation/filesystems/adfs.txt
@@ -0,0 +1,57 @@
+Mount options for ADFS
+----------------------
+  uid=nnn       All files in the partition will be owned by
+                user id nnn.  Default 0 (root).
+  gid=nnn       All files in the partition will be in group
+                nnn.  Default 0 (root).
+  ownmask=nnn   The permission mask for ADFS 'owner' permissions
+                will be nnn.  Default 0700.
+  othmask=nnn   The permission mask for ADFS 'other' permissions
+                will be nnn.  Default 0077.
+Mapping of ADFS permissions to Linux permissions
+------------------------------------------------
+  ADFS permissions consist of the following:
+        Owner read
+        Owner write
+        Other read
+        Other write
+  (In older versions, an 'execute' permission did exist, but this
+   does not hold the same meaning as the Linux 'execute' permission
+   and is now obsolete).
+  The mapping is performed as follows:
+        Owner read                              -> -r--r--r--
+        Owner write                             -> --w--w--w-
+        Owner read and filetype UnixExec        -> ---x--x--x
+    These are then masked by ownmask, eg 700    -> -rwx------
+        Possible owner mode permissions         -> -rwx------
+        Other read                              -> -r--r--r--
+        Other write                             -> --w--w--w-
+        Other read and filetype UnixExec        -> ---x--x--x
+    These are then masked by othmask, eg 077    -> ----rwxrwx
+        Possible other mode permissions         -> ----rwxrwx
+  Hence, with the default masks, if a file is owner read/write, and
+  not a UnixExec filetype, then the permissions will be:
+                        -rw-------
+  However, if the masks were ownmask=0770,othmask=0007, then this would
+  be modified to:
+                        -rw-rw----
+  There is no restriction on what you can do with these masks.  You may
+  wish that either read bits give read access to the file for all, but
+  keep the default write protection (ownmask=0755,othmask=0577):
+                        -rw-r--r--
+  You can therefore tailor the permission translation to whatever you
+  desire the permissions should be under Linux.
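+The mapping described above can be sketched in a few lines of Python. This is
+only an illustration of the documented rules, not the driver code: the
+function name adfs_mode(), its keyword arguments and the show() helper are
+made up for this example.

```python
import stat


def adfs_mode(owner_read=False, owner_write=False,
              other_read=False, other_write=False,
              unix_exec=False, ownmask=0o700, othmask=0o077):
    """Translate ADFS attribute flags into Linux permission bits.

    Hypothetical helper that follows the table in this document.
    """
    def expand(read, write):
        bits = 0
        if read:
            bits |= 0o444              # -r--r--r--
            if unix_exec:
                bits |= 0o111          # ---x--x--x (read + UnixExec filetype)
        if write:
            bits |= 0o222              # --w--w--w-
        return bits

    owner_bits = expand(owner_read, owner_write) & ownmask
    other_bits = expand(other_read, other_write) & othmask
    return owner_bits | other_bits


def show(mode):
    """Render permission bits the way ls -l does for a regular file."""
    return stat.filemode(stat.S_IFREG | mode)


# The three cases discussed in the text (file is owner read/write only).
print(show(adfs_mode(owner_read=True, owner_write=True)))
print(show(adfs_mode(owner_read=True, owner_write=True,
                     ownmask=0o770, othmask=0o007)))
print(show(adfs_mode(owner_read=True, owner_write=True,
                     ownmask=0o755, othmask=0o577)))
```

+Keeping the owner and other halves separate until the final OR mirrors the
+two-step table: each set of ADFS bits is spread over all three Linux classes
+first and only then cut down by its own mask.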
diff --git a/Documentation/filesystems/affs.txt b/Documentation/filesystems/affs.txt
new file mode 100644
index 000000000000..30c9738590f4
--- /dev/null
+++ b/Documentation/filesystems/affs.txt
@@ -0,0 +1,219 @@
+Overview of Amiga Filesystems
+=============================
+Not all varieties of the Amiga filesystems are supported for reading and
+writing. The Amiga currently knows six different filesystems:
+DOS\0           The old or original filesystem, not really suited for
+                hard disks and normally not used on them, either.
+                Supported read/write.
+DOS\1           The original Fast File System. Supported read/write.
+DOS\2           The old "international" filesystem. International means that
+                a bug has been fixed so that accented ("international") letters
+                in file names are case-insensitive, as they ought to be.
+                Supported read/write.
+DOS\3           The "international" Fast File System.  Supported read/write.
+DOS\4           The original filesystem with directory cache. The directory
+                cache speeds up directory accesses on floppies considerably,
+                but slows down file creation/deletion. Doesn't make much
+                sense on hard disks. Supported read only.
+DOS\5           The Fast File System with directory cache. Supported read only.
+All of the above filesystems allow block sizes from 512 to 32K bytes.
+Supported block sizes are: 512, 1024, 2048 and 4096 bytes. Larger blocks
+speed up almost everything at the expense of wasted disk space. The speed
+gain above 4K seems not really worth the price, so you don't lose too
+much here, either.
+The muFS (multi user File System) equivalents of the above file systems
+are supported, too.
+Mount options for the AFFS
+==========================
+protect         If this option is set, the protection bits cannot be altered.
+setuid[=uid]    This sets the owner of all files and directories in the file
+                system to uid or the uid of the current user, respectively.
+setgid[=gid]    Same as above, but for gid.
+mode=mode       Sets the mode flags to the given (octal) value, regardless
+                of the original permissions. Directories will get an x
+                permission if the corresponding r bit is set.
+                This is useful since most of the plain AmigaOS files
+                will map to 600.
+reserved=num    Sets the number of reserved blocks at the start of the
+                partition to num. You should never need this option.
+                Default is 2.
+root=block      Sets the block number of the root block. This should never
+                be necessary.
+bs=blksize      Sets the blocksize to blksize. Valid block sizes are 512,
+                1024, 2048 and 4096. Like the root option, this should
+                never be necessary, as the affs can figure it out itself.
+quiet           The file system will not return an error for disallowed
+                mode changes.
+verbose         The volume name, file system type and block size will
+                be written to the syslog when the filesystem is mounted.
+mufs            The filesystem is really a muFS, although it doesn't
+                identify itself as one. This option is necessary if
+                the filesystem wasn't formatted as muFS, but is used
+                as one.
+prefix=path     Path will be prefixed to every absolute path name of
+                symbolic links on an AFFS partition. Default = "/".
+                (See below.)
+volume=name     When symbolic links with an absolute path are created
+                on an AFFS partition, name will be prepended as the
+                volume name. Default = "" (empty string).
+                (See below.)
+Handling of the Users/Groups and protection flags
+=================================================
+Amiga -> Linux:
+The Amiga protection flags RWEDRWEDHSPARWED are handled as follows:
+  - R maps to r for user, group and others. On directories, R implies x.
+  - If both W and D are allowed, w will be set.
+  - E maps to x.
+  - H and P are always retained and ignored under Linux.
+  - A is always reset when a file is written to.
+User id and group id will be used unless set[gu]id are given as mount
+options. Since most of the Amiga file systems are single user systems
+they will be owned by root. The root directory (the mount point) of the
+Amiga filesystem will be owned by the user who actually mounts the
+filesystem (the root directory doesn't have uid/gid fields).
+Linux -> Amiga:
+The Linux rwxrwxrwx file mode is handled as follows:
+  - r permission will set R for user, group and others.
+  - w permission will set W and D for user, group and others.
+  - x permission of the user will set E for plain files.
+  - All other flags (suid, sgid, ...) are ignored and will
+    not be retained.
+    
+Newly created files and directories will get the user and group ID
+of the current user and a mode according to the umask.
+Symbolic links
+==============
+Although the Amiga and Linux file systems resemble each other, there
+are some, not always subtle, differences. One of them becomes apparent
+with symbolic links. While Linux has a file system with exactly one
+root directory, the Amiga has a separate root directory for each
+file system (for example, partition, floppy disk, ...). With the Amiga,
+these entities are called "volumes". They have symbolic names which
+can be used to access them. Thus, symbolic links can point to a
+different volume. AFFS turns the volume name into a directory name
+and prepends the prefix path (see prefix option) to it.
+Example:
+You mount all your Amiga partitions under /amiga/<volume> (where
+<volume> is the name of the volume), and you give the option
+"prefix=/amiga/" when mounting all your AFFS partitions. (They
+might be "User", "WB" and "Graphics", the mount points /amiga/User,
+/amiga/WB and /amiga/Graphics). A symbolic link referring to
+"User:sc/include/dos/dos.h" will be followed to
+"/amiga/User/sc/include/dos/dos.h".
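+As a rough illustration of the translation in the example above, the
+following Python sketch rewrites an Amiga-style "Volume:path" link target
+using a prefix. The function name affs_link_target() is hypothetical, and the
+sketch ignores any other Amiga path conventions the real driver may handle.

```python
def affs_link_target(amiga_path, prefix="/"):
    """Turn 'Volume:rest/of/path' into '<prefix><Volume>/rest/of/path'.

    Hypothetical helper; only models the volume-name rule from the text.
    """
    volume, sep, rest = amiga_path.partition(":")
    if not sep:
        # No volume name: treat as a relative link and leave it alone
        # (an assumption made for this sketch).
        return amiga_path
    # Normalise the prefix so that "/amiga" and "/amiga/" behave the same.
    return prefix.rstrip("/") + "/" + volume + "/" + rest


print(affs_link_target("User:sc/include/dos/dos.h", prefix="/amiga/"))
```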
+Examples
+========
+Command line:
+    mount  Archive/Amiga/Workbench3.1.adf /mnt -t affs -o loop,verbose
+    mount  /dev/sda3 /Amiga -t affs
+/etc/fstab entry:
+    /dev/sdb5   /amiga/Workbench    affs    noauto,user,exec,verbose 0 0
+IMPORTANT NOTE
+==============
+If you boot Windows 95 (don't know about 3.x, 98 and NT) while you
+have an Amiga harddisk connected to your PC, it will overwrite
+the bytes 0x00dc..0x00df of block 0 with garbage, thus invalidating
+the Rigid Disk Block. Sheer luck has it that this is an unused
+area of the RDB, so only the checksum doesn't match anymore.
+Linux will ignore this garbage and recognize the RDB anyway, but
+before you connect that drive to your Amiga again, you must
+restore or repair your RDB. So please do make a backup copy of it
+before booting Windows!
+If the damage is already done, the following should fix the RDB
+(where <disk> is the device name).
+DO AT YOUR OWN RISK:
+  dd if=/dev/<disk> of=rdb.tmp count=1
+  cp rdb.tmp rdb.fixed
+  dd if=/dev/zero of=rdb.fixed bs=1 seek=220 count=4
+  dd if=rdb.fixed of=/dev/<disk>
+Bugs, Restrictions, Caveats
+===========================
+Quite a few things may not work as advertised. Not everything is
+tested, though several hundred MB have been read and written using
+this fs. For a most up-to-date list of bugs please consult
+fs/affs/Changes.
+Filenames are truncated to 30 characters without warning (this
+can be changed by setting the compile-time option AFFS_NO_TRUNCATE
+in include/linux/amigaffs.h).
+Case is ignored by the affs in filename matching, but Linux shells
+do care about the case. Example (with /wb being an affs mounted fs):
+    rm /wb/WRONGCASE
+will remove /wb/wrongcase, but
+    rm /wb/WR*
+will not since the names are matched by the shell.
+The block allocation is designed for hard disk partitions. If more
+than 1 process writes to a (small) diskette, the blocks are allocated
+in an ugly way (but the real AFFS doesn't do much better). This
+is also true when space gets tight.
+You cannot execute programs on an OFS (Old File System), since the
+program files cannot be memory mapped due to the 488 byte blocks.
+For the same reason you cannot mount an image on such a filesystem
+via the loopback device.
+The bitmap valid flag in the root block may not be accurate when the
+system crashes while an affs partition is mounted. There's currently
+no way to fix a garbled filesystem without an Amiga (disk validator)
+or manually (who would do this?). Maybe later.
+If you mount affs partitions on system startup, you may want to tell
+fsck that the fs should not be checked (place a '0' in the sixth field
+of /etc/fstab).
+It's not possible to read floppy disks with a normal PC or workstation
+due to an incompatibility with the Amiga floppy controller.
+If you are interested in an Amiga Emulator for Linux, look at
+http://www-users.informatik.rwth-aachen.de/~crux/uae.html
diff --git a/Documentation/filesystems/afs.txt b/Documentation/filesystems/afs.txt
new file mode 100644
index 000000000000..2f4237dfb8c7
--- /dev/null
+++ b/Documentation/filesystems/afs.txt
@@ -0,0 +1,155 @@
+                             kAFS: AFS FILESYSTEM
+                             ====================
+ABOUT
+=====
+This filesystem provides a fairly simple AFS filesystem driver. It is under
+development and only provides very basic facilities. It does not yet support
+the following AFS features:
+        (*) Write support.
+        (*) Communications security.
+        (*) Local caching.
+        (*) pioctl() system call.
+        (*) Automatic mounting of embedded mountpoints.
+USAGE
+=====
+When inserting the driver modules the root cell must be specified along with a
+list of volume location server IP addresses:
+        insmod rxrpc.o
+        insmod kafs.o rootcell=cambridge.redhat.com:172.16.18.73:172.16.18.91
+The first module is a driver for the RxRPC remote operation protocol, and the
+second is the actual filesystem driver for the AFS filesystem.
+Once the module has been loaded, more cells can be added by the following
+procedure:
+        echo add grand.central.org 18.7.14.88:128.2.191.224 >/proc/fs/afs/cells
+Where the parameters to the "add" command are the name of a cell and a list of
+volume location servers within that cell.
+Filesystems can be mounted anywhere by commands similar to the following:
+        mount -t afs "%cambridge.redhat.com:root.afs." /afs
+        mount -t afs "#cambridge.redhat.com:root.cell." /afs/cambridge
+        mount -t afs "#root.afs." /afs
+        mount -t afs "#root.cell." /afs/cambridge
+  NB: When using this on Linux 2.4, the mount command has to be different,
+      since the filesystem doesn't have access to the device name argument:
+        mount -t afs none /afs -ovol="#root.afs."
+Where the initial character is either a hash or a percent symbol depending on
+whether you definitely want a R/W volume (hash) or whether you'd prefer a R/O
+volume, but are willing to use a R/W volume instead (percent).
+The name of the volume can be suffixed with ".backup" or ".readonly" to
+specify connection to only volumes of those types.
+The name of the cell is optional, and if not given during a mount, then the
+named volume will be looked up in the cell specified during insmod.
+Additional cells can be added through /proc (see later section).
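+The structure of the "device name" described above can be made concrete with
+a small parser. This is a sketch based only on the description in this
+section: the names AfsVolumeSpec and parse_afs_device() are invented here, and
+the trailing dot seen in the examples is simply kept as part of the volume
+name, which is an assumption.

```python
from typing import NamedTuple, Optional


class AfsVolumeSpec(NamedTuple):
    rw_required: bool            # True for '#', False for '%' (R/O preferred)
    cell: Optional[str]          # None means "use the cell given at insmod"
    volume: str
    vol_type: Optional[str]      # "backup", "readonly" or None


def parse_afs_device(name):
    """Split '[#%][cell:]volume[.backup|.readonly]' into its parts."""
    if not name or name[0] not in "#%":
        raise ValueError("device name must start with '#' or '%'")
    rw_required = name[0] == "#"
    cell, sep, volume = name[1:].rpartition(":")
    if not sep:
        cell = None              # no "cell:" part was given
    vol_type = None
    for suffix in ("backup", "readonly"):
        if volume.endswith("." + suffix):
            volume = volume[: -(len(suffix) + 1)]
            vol_type = suffix
            break
    return AfsVolumeSpec(rw_required, cell, volume, vol_type)


print(parse_afs_device("%cambridge.redhat.com:root.afs."))
print(parse_afs_device("#root.cell."))
```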
+MOUNTPOINTS
+===========
+AFS has a concept of mountpoints. These are specially formatted symbolic links
+(of the same form as the "device name" passed to mount). kAFS presents these
+to the user as directories that have special properties:
+  (*) They cannot be listed. Running a program like "ls" on them will incur an
+      EREMOTE error (Object is remote).
+  (*) Other objects can't be looked up inside of them. This also incurs an
+      EREMOTE error.
+  (*) They can be queried with the readlink() system call, which will return
+      the name of the mountpoint to which they point. The "readlink" program
+      will also work.
+  (*) They can be mounted on (which symbolic links can't).
+PROC FILESYSTEM
+===============
+The rxrpc module creates a number of files in various places in the /proc
+filesystem:
+  (*) Firstly, some information files are made available in a directory called
+      "/proc/net/rxrpc/". These list the extant transport endpoint, peer,
+      connection and call records.
+  (*) Secondly, some control files are made available in a directory called
+      "/proc/sys/rxrpc/". Currently, all these files can be used for is to
+      turn on various levels of tracing.
+The AFS module creates a "/proc/fs/afs/" directory and populates it:
+  (*) A "cells" file that lists cells currently known to the afs module.
+  (*) A directory per cell that contains files that list volume location
+      servers, volumes, and active servers known within that cell.
+THE CELL DATABASE
+=================
+The filesystem maintains an internal database of all the cells it knows and
+the IP addresses of the volume location servers for those cells. The cell to
+which the computer belongs is added to the database when insmod is performed
+by the "rootcell=" argument.
+Further cells can be added by commands similar to the following:
+        echo add CELLNAME VLADDR[:VLADDR][:VLADDR]... >/proc/fs/afs/cells
+        echo add grand.central.org 18.7.14.88:128.2.191.224 >/proc/fs/afs/cells
+No other cell database operations are available at this time.
+EXAMPLES
+========
+Here's what I use to test this. Some of the names and IP addresses are local
+to my internal DNS. My "root.afs" partition has a mount point within it for
+some public volumes.
+insmod -S /tmp/rxrpc.o 
+insmod -S /tmp/kafs.o rootcell=cambridge.redhat.com:172.16.18.73:172.16.18.91
+mount -t afs \%root.afs. /afs
+mount -t afs \%cambridge.redhat.com:root.cell. /afs/cambridge.redhat.com/
+echo add grand.central.org 18.7.14.88:128.2.191.224 > /proc/fs/afs/cells 
+mount -t afs "#grand.central.org:root.cell." /afs/grand.central.org/
+mount -t afs "#grand.central.org:root.archive." /afs/grand.central.org/archive
+mount -t afs "#grand.central.org:root.contrib." /afs/grand.central.org/contrib
+mount -t afs "#grand.central.org:root.doc." /afs/grand.central.org/doc
+mount -t afs "#grand.central.org:root.project." /afs/grand.central.org/project
+mount -t afs "#grand.central.org:root.service." /afs/grand.central.org/service
+mount -t afs "#grand.central.org:root.software." /afs/grand.central.org/software
+mount -t afs "#grand.central.org:root.user." /afs/grand.central.org/user
+umount /afs/grand.central.org/user
+umount /afs/grand.central.org/software
+umount /afs/grand.central.org/service
+umount /afs/grand.central.org/project
+umount /afs/grand.central.org/doc
+umount /afs/grand.central.org/contrib
+umount /afs/grand.central.org/archive
+umount /afs/grand.central.org
+umount /afs/cambridge.redhat.com
+umount /afs
+rmmod kafs
+rmmod rxrpc
diff --git a/Documentation/filesystems/automount-support.txt b/Documentation/filesystems/automount-support.txt
new file mode 100644
index 000000000000..58c65a1713e5
--- /dev/null
+++ b/Documentation/filesystems/automount-support.txt
@@ -0,0 +1,118 @@
+Support is available for filesystems that wish to do automounting (such
+as kAFS, which can be found in fs/afs/). This facility includes allowing
+in-kernel mounts to be performed and mountpoint degradation to be
+requested. The latter can also be requested by userspace.
+======================
+IN-KERNEL AUTOMOUNTING
+======================
+A filesystem can now mount another filesystem on one of its directories by the
+following procedure:
+ (1) Give the directory a follow_link() operation.
+     When the directory is accessed, the follow_link op will be called, and
+     it will be provided with the location of the mountpoint in the nameidata
+     structure (vfsmount and dentry).
+ (2) Have the follow_link() op do the following steps:
+     (a) Call do_kern_mount() to call the appropriate filesystem to set up a
+         superblock and gain a vfsmount structure representing it.
+     (b) Copy the nameidata provided as an argument and substitute the dentry
+         argument into the copy.
+     (c) Call do_add_mount() to install the new vfsmount into the namespace's
+         mountpoint tree, thus making it accessible to userspace. Use the
+         nameidata set up in (b) as the destination.
+         If the mountpoint will be automatically expired, then do_add_mount()
+         should also be given the location of an expiration list (see further
+         down).
+     (d) Release the path in the nameidata argument and substitute in the new
+         vfsmount and its root dentry. The ref counts on these will need
+         incrementing.
+Then from userspace, you can just do something like:
+        [root@andromeda root]# mount -t afs \#root.afs. /afs
+        [root@andromeda root]# ls /afs
+        asd  cambridge  cambridge.redhat.com  grand.central.org
+        [root@andromeda root]# ls /afs/cambridge
+        afsdoc
+        [root@andromeda root]# ls /afs/cambridge/afsdoc/
+        ChangeLog  html  LICENSE  pdf  RELNOTES-1.2.2
+And then if you look in the mountpoint catalogue, you'll see something like:
+        [root@andromeda root]# cat /proc/mounts
+        ...
+        #root.afs. /afs afs rw 0 0
+        #root.cell. /afs/cambridge.redhat.com afs rw 0 0
+        #afsdoc. /afs/cambridge.redhat.com/afsdoc afs rw 0 0
+===========================
+AUTOMATIC MOUNTPOINT EXPIRY
+===========================
+Automatic expiration of mountpoints is easy, provided you've mounted the
+mountpoint to be expired in the automounting procedure outlined above.
+To do expiration, you need to follow these steps:
+ (3) Create at least one list off which the vfsmounts to be expired can be
+     hung. Access to this list will be governed by the vfsmount_lock.
+ (4) In step (2c) above, the call to do_add_mount() should be provided with a
+     pointer to this list. It will hang the vfsmount off of it if it succeeds.
+ (5) When you want mountpoints to be expired, call mark_mounts_for_expiry()
+     with a pointer to this list. This will process the list, marking every
+     vfsmount thereon for potential expiry on the next call.
+     If a vfsmount was already flagged for expiry, and if its usage count is 1
+     (it's only referenced by its parent vfsmount), then it will be deleted
+     from the namespace and thrown away (effectively unmounted).
+     It may prove simplest to call this at regular intervals, using
+     some sort of timed event to drive it.
+The expiration flag is cleared by calls to mntput. This means that expiration
+will only happen on the second expiration request after the last time the
+mountpoint was accessed.
+If a mountpoint is moved, it gets removed from the expiration list. If a bind
+mount is made on an expirable mount, the new vfsmount will not be on the
+expiration list and will not expire.
+If a namespace is copied, all mountpoints contained therein will be copied,
+and the copies of those that are on an expiration list will be added to the
+same expiration list.
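+The two-pass behaviour described in this section can be modelled with a toy
+simulation. The Python below is not kernel code: the Mount class and the
+touch() helper are invented for illustration, and mark_mounts_for_expiry()
+only mimics the documented effect of the kernel function of the same name.

```python
class Mount:
    """Toy stand-in for a vfsmount on an expiration list."""

    def __init__(self, name):
        self.name = name
        self.usage = 1            # referenced only by its parent vfsmount
        self.expiry_mark = False


def touch(mount):
    """Model an access: the final mntput clears the expiration flag."""
    mount.expiry_mark = False


def mark_mounts_for_expiry(expiry_list):
    """Expire mounts that were already marked and are unused; mark the rest."""
    expired = []
    for mount in list(expiry_list):
        if mount.expiry_mark and mount.usage == 1:
            expiry_list.remove(mount)     # effectively unmounted
            expired.append(mount.name)
        else:
            mount.expiry_mark = True      # candidate for the next call
    return expired


mounts = [Mount("/afs/cambridge")]
print(mark_mounts_for_expiry(mounts))     # first pass only marks
touch(mounts[0])                          # an access clears the mark
print(mark_mounts_for_expiry(mounts))     # so this pass marks again
print(mark_mounts_for_expiry(mounts))     # untouched since: now it expires
```

+Driving mark_mounts_for_expiry() from a periodic timer, as suggested above,
+means a mountpoint survives for between one and two timer periods after its
+last use.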
+=======================
+USERSPACE DRIVEN EXPIRY
+=======================
+As an alternative, it is possible for userspace to request expiry of any
+mountpoint (though some will be rejected - the current process's idea of the
+rootfs for example). It does this by passing the MNT_EXPIRE flag to
+umount(). This flag is considered incompatible with MNT_FORCE and MNT_DETACH.
+If the mountpoint in question is referenced by something other than
+umount() or its parent mountpoint, an EBUSY error will be returned and the
+mountpoint will not be marked for expiration or unmounted.
+If the mountpoint was not already marked for expiry at that time, an EAGAIN
+error will be given and it won't be unmounted.
+Otherwise if it was already marked and it wasn't referenced, unmounting will
+take place as usual.
+Again, the expiration flag is cleared every time anything other than umount()
+looks at a mountpoint.
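+The three outcomes of an MNT_EXPIRE request can be summarised as a small
+decision function. This is a model of the rules listed above, written in
+Python for brevity; MountPoint and umount_expire() are hypothetical names and
+the return values merely imitate the kernel's negative errno convention.

```python
import errno


class MountPoint:
    """Toy mountpoint state: whether it is marked and still mounted."""

    def __init__(self):
        self.marked = False
        self.mounted = True


def umount_expire(mp, other_refs):
    """Model umount() with MNT_EXPIRE on a mountpoint.

    other_refs counts references other than umount() itself and the
    parent mountpoint.
    """
    if other_refs > 0:
        return -errno.EBUSY       # neither marked nor unmounted
    if not mp.marked:
        mp.marked = True          # first request only marks it
        return -errno.EAGAIN
    mp.mounted = False            # second unreferenced request unmounts
    return 0


mp = MountPoint()
print(umount_expire(mp, other_refs=1))    # busy
print(umount_expire(mp, other_refs=0))    # marked, try again
print(umount_expire(mp, other_refs=0))    # unmounted
```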
diff --git a/Documentation/filesystems/befs.txt b/Documentation/filesystems/befs.txt
new file mode 100644
index 000000000000..877a7b1d46ec
--- /dev/null
+++ b/Documentation/filesystems/befs.txt
@@ -0,0 +1,117 @@
+BeOS filesystem for Linux
+Document last updated: Dec 6, 2001
+WARNING
+=======
+Make sure you understand that this is alpha software.  This means that the
+implementation is neither complete nor well-tested. 
+I DISCLAIM ALL RESPONSIBILITY FOR ANY POSSIBLE BAD EFFECTS OF THIS CODE!
+LICENSE
+=======
+This software is covered by the GNU General Public License. 
+See the file COPYING for the complete text of the license.
+Or the GNU website: <http://www.gnu.org/licenses/licenses.html>
+AUTHOR
+======
+The largest part of the code was written by Will Dyson <will_dyson@pobox.com>.
+He has been working on the code since Aug 13, 2001. See the changelog for
+details.
+Original Author: Makoto Kato <m_kato@ga2.so-net.ne.jp>
+His original code can still be found at:
+<http://hp.vector.co.jp/authors/VA008030/bfs/>
+Does anyone know of a more current email address for Makoto? He doesn't
+respond to the address given above...
+Current maintainer: Sergey S. Kostyliov <rathamahata@php4.ru>
+WHAT IS THIS DRIVER?
+====================
+This module implements the native filesystem of BeOS <http://www.be.com/>
+for the linux 2.4.1 and later kernels. Currently it is a read-only
+implementation.
+Which is it, BFS or BEFS?
+=========================
+Be, Inc said, "BeOS Filesystem is officially called BFS, not BeFS". 
+But Unixware Boot Filesystem is called bfs, too. And it is already in
+the kernel. Because of this naming conflict, on Linux the BeOS
+filesystem is called befs.
+HOW TO INSTALL
+==============
+step 1.  Install the BeFS  patch into the source code tree of linux.
+Apply the patchfile to your kernel source tree.
+Assuming that your kernel source is in /foo/bar/linux and the patchfile
+is called patch-befs-xxx, you would do the following:
+        cd /foo/bar/linux
+        patch -p1 < /path/to/patch-befs-xxx
+If the patching step fails (i.e. there are rejected hunks), you can try to
+figure it out yourself (it shouldn't be hard), or mail the maintainer 
+(Will Dyson <will_dyson@pobox.com>) for help.
+step 2.  Configuration & make kernel
+The linux kernel has many compile-time options. Most of them are beyond the
+scope of this document. I suggest the Kernel-HOWTO document as a good general
+reference on this topic. <http://www.linux.com/howto/Kernel-HOWTO.html>
+However, to use the BeFS module, you must enable it at configure time.
+        cd /foo/bar/linux
+        make menuconfig (or xconfig)
+The BeFS module is not a standard part of the linux kernel, so you must first
+enable support for experimental code under the "Code maturity level" menu.
+Then, under the "Filesystems" menu will be an option called "BeFS
+filesystem (experimental)", or something like that. Enable that option
+(it is fine to make it a module).
+Save your kernel configuration and then build your kernel.
+step 3.  Install
+See the kernel howto <http://www.linux.com/howto/Kernel-HOWTO.html> for
+instructions on this critical step.
+USING BFS
+=========
+To use the BeOS filesystem, use filesystem type 'befs'.
+ex)
+    mount -t befs /dev/fd0 /beos
+MOUNT OPTIONS
+=============
+uid=nnn        All files in the partition will be owned by user id nnn.
+gid=nnn        All files in the partition will be in group nnn.
+iocharset=xxx  Use xxx as the name of the NLS translation table.
+debug          The driver will output debugging information to the syslog.
+HOW TO GET THE LATEST VERSION
+=============================
+The latest version is currently available at:
+<http://befs-driver.sourceforge.net/>
+ANY KNOWN BUGS?
+===============
+As of Jan 20, 2002:
+        
+        None
+SPECIAL THANKS
+==============
+Dominic Giampaolo ... Writing "Practical file system design with Be filesystem"
+Hiroyuki Yamada  ... Testing LinuxPPC.
diff --git a/Documentation/filesystems/bfs.txt b/Documentation/filesystems/bfs.txt
new file mode 100644
index 000000000000..d2841e0bcf02
--- /dev/null
+++ b/Documentation/filesystems/bfs.txt
@@ -0,0 +1,57 @@
+BFS FILESYSTEM FOR LINUX
+========================
+The BFS filesystem is used by SCO UnixWare OS for the /stand slice, which
+usually contains the kernel image and a few other files required for the
+boot process.
+In order to access the /stand partition under Linux you obviously need to
+know the partition number and the kernel must support UnixWare disk slices
+(CONFIG_UNIXWARE_DISKLABEL config option). However BFS support does not
+depend on having UnixWare disklabel support because one can also mount
+BFS filesystem via loopback:
+# losetup /dev/loop0 stand.img
+# mount -t bfs /dev/loop0 /mnt/stand
+where stand.img is a file containing the image of BFS filesystem. 
+When you have finished using it and umounted you need to also deallocate
+/dev/loop0 device by:
+# losetup -d /dev/loop0
+You can simplify mounting by just typing:
+# mount -t bfs -o loop stand.img /mnt/stand
+this will allocate the first available loopback device (and load loop.o 
+kernel module if necessary) automatically. If the loopback driver is not
+loaded automatically, make sure that your kernel is compiled with kmod 
+support (CONFIG_KMOD) enabled. Beware that umount will not
+deallocate /dev/loopN device if /etc/mtab file on your system is a
+symbolic link to /proc/mounts. You will need to do it manually using
+"-d" switch of losetup(8). Read losetup(8) manpage for more info.
+To create the BFS image under UnixWare you need to find out first which
+slice contains it. The command prtvtoc(1M) is your friend:
+# prtvtoc /dev/rdsk/c0b0t0d0s0
+(assuming your root disk is on target=0, lun=0, bus=0, controller=0). Then you
+look for the slice with tag "STAND", which is usually slice 10. With this
+information you can use dd(1) to create the BFS image:
+# umount /stand
+# dd if=/dev/rdsk/c0b0t0d0sa of=stand.img bs=512
+Just in case, you can verify that you have done the right thing by checking
+the magic number:
+# od -Ad -tx4 stand.img | more
+The first 4 bytes should be 0x1badface.
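+The same check can be scripted. The sketch below reads the first four bytes
+of an image and compares them with the magic number quoted above. It assumes
+the value is stored little-endian (as on the x86 machines UnixWare runs on);
+the function names looks_like_bfs() and image_is_bfs() are made up for this
+example.

```python
import struct

BFS_MAGIC = 0x1BADFACE


def looks_like_bfs(header, byteorder="<"):
    """Return True if the first 4 bytes of header hold the BFS magic.

    byteorder is a struct prefix; "<" (little-endian) is an assumption.
    """
    if len(header) < 4:
        return False
    (magic,) = struct.unpack(byteorder + "I", header[:4])
    return magic == BFS_MAGIC


def image_is_bfs(path):
    """Check an image file such as stand.img."""
    with open(path, "rb") as image:
        return looks_like_bfs(image.read(4))


# Self-check with an in-memory header instead of a real image file.
print(looks_like_bfs(struct.pack("<I", BFS_MAGIC) + b"\0" * 508))
```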
+If you have any patches, questions or suggestions regarding this BFS
+implementation please contact the author:
+Tigran A. Aivazian <tigran@veritas.com>
diff --git a/Documentation/filesystems/cifs.txt b/Documentation/filesystems/cifs.txt
new file mode 100644
index 000000000000..49cc923a93e3
--- /dev/null
+++ b/Documentation/filesystems/cifs.txt
@@ -0,0 +1,51 @@
+  This is the client VFS module for the Common Internet File System
+  (CIFS) protocol which is the successor to the Server Message Block 
+  (SMB) protocol, the native file sharing mechanism for most early
+  PC operating systems.  CIFS is fully supported by current network
+  file servers such as Windows 2000, Windows 2003 and Windows XP, as
+  well as by Samba (which provides excellent CIFS
+  server support for Linux and many other operating systems), so
+  this network filesystem client can mount to a wide variety of
+  servers.  The smbfs module should be used instead of this cifs module
+  for mounting to older SMB servers such as OS/2.  The smbfs and cifs
+  modules can coexist and do not conflict.  The CIFS VFS filesystem
+  module is designed to work well with servers that implement the
+  newer versions (dialects) of the SMB/CIFS protocol such as Samba, 
+  the program written by Andrew Tridgell that turns any Unix host
+  into an SMB/CIFS file server.
+  The intent of this module is to provide the most advanced network
+  file system function for CIFS compliant servers, including better
+  POSIX compliance, secure per-user session establishment, high
+  performance safe distributed caching (oplock), optional packet
+  signing, large files, Unicode support and other internationalization
+  improvements. Since both Samba server and this filesystem client support
+  the CIFS Unix extensions, the combination can provide a reasonable 
+  alternative to NFSv4 for fileserving in some Linux to Linux environments,
+  not just in Linux to Windows environments.
+  This filesystem has an optional mount utility (mount.cifs) that can
+  be obtained from the project page and installed in the path in the same
+  directory with the other mount helpers (such as mount.smbfs). 
+  Mounting using the cifs filesystem without installing the mount helper
+  requires specifying the server's IP address.
+  For Linux 2.4:
+    mount //anything/here /mnt_target -o
+            user=username,pass=password,unc=//ip_address_of_server/sharename
+  For Linux 2.5:
+    mount //ip_address_of_server/sharename /mnt_target -o user=username,pass=password
+  For more information on the module see the project page at
+      http://us1.samba.org/samba/Linux_CIFS_client.html 
+  For more information on CIFS see:
+      http://www.snia.org/tech_activities/CIFS
+  or the Samba site:
+     
+      http://www.samba.org
diff --git a/Documentation/filesystems/coda.txt b/Documentation/filesystems/coda.txt
new file mode 100644
index 000000000000..61311356025d
--- /dev/null
+++ b/Documentation/filesystems/coda.txt
@@ -0,0 +1,1673 @@
+NOTE: 
+This is one of the technical documents describing a component of
+Coda -- this document describes the client kernel-Venus interface.
+For more information:
+  http://www.coda.cs.cmu.edu
+For user level software needed to run Coda:
+  ftp://ftp.coda.cs.cmu.edu
+To run Coda you need to get a user level cache manager for the client,
+named Venus, as well as tools to manipulate ACLs, to log in, etc.  The
+client needs to have the Coda filesystem selected in the kernel
+configuration.
+The server needs a user level server and at present does not depend on
+kernel support.
+  The Venus kernel interface
+  Peter J. Braam
+  v1.0, Nov 9, 1997
+  This document describes the communication between Venus and kernel
+  level filesystem code needed for the operation of the Coda file
+  system.  This document version is meant to describe the current interface
+  (version 1.0) as well as improvements we envisage.
+  ______________________________________________________________________
+  Table of Contents
+  1. Introduction
+  2. Servicing Coda filesystem calls
+  3. The message layer
+     3.1 Implementation details
+  4. The interface at the call level
+     4.1 Data structures shared by the kernel and Venus
+     4.2 The pioctl interface
+     4.3 root
+     4.4 lookup
+     4.5 getattr
+     4.6 setattr
+     4.7 access
+     4.8 create
+     4.9 mkdir
+     4.10 link
+     4.11 symlink
+     4.12 remove
+     4.13 rmdir
+     4.14 readlink
+     4.15 open
+     4.16 close
+     4.17 ioctl
+     4.18 rename
+     4.19 readdir
+     4.20 vget
+     4.21 fsync
+     4.22 inactive
+     4.23 rdwr
+     4.24 odymount
+     4.25 ody_lookup
+     4.26 ody_expand
+     4.27 prefetch
+     4.28 signal
+  5. The minicache and downcalls
+     5.1 INVALIDATE
+     5.2 FLUSH
+     5.3 PURGEUSER
+     5.4 ZAPFILE
+     5.5 ZAPDIR
+     5.6 ZAPVNODE
+     5.7 PURGEFID
+     5.8 REPLACE
+  6. Initialization and cleanup
+     6.1 Requirements
+  ______________________________________________________________________
+  1.  Introduction
+  A key component in the Coda Distributed File System is the cache
+  manager, Venus.
+  When processes on a Coda enabled system access files in the Coda
+  filesystem, requests are directed at the filesystem layer in the
+  operating system. The operating system will communicate with Venus to
+  service the request for the process.  Venus manages a persistent
+  client cache and makes remote procedure calls to Coda file servers and
+  related servers (such as authentication servers) to service these
+  requests it receives from the operating system.  When Venus has
+  serviced a request it replies to the operating system with appropriate
+  return codes, and other data related to the request.  Optionally the
+  kernel support for Coda may maintain a minicache of recently processed
+  requests to limit the number of interactions with Venus.  Venus
+  possesses the facility to inform the kernel when elements from its
+  minicache are no longer valid.
+  This document describes precisely this communication between the
+  kernel and Venus.  The definitions of so called upcalls and downcalls
+  will be given with the format of the data they handle. We shall also
+  describe the semantic invariants resulting from the calls.
+  Historically Coda was implemented in a BSD file system in Mach 2.6.
+  The interface between the kernel and Venus is very similar to the BSD
+  VFS interface.  Similar functionality is provided, and the format of
+  the parameters and returned data is very similar to the BSD VFS.  This
+  leads to an almost natural environment for implementing a kernel-level
+  filesystem driver for Coda in a BSD system.  However, other operating
+  systems such as Linux and Windows 95 and NT have virtual filesystems
+  with different interfaces.
+  To implement Coda on these systems some reverse engineering of the
+  Venus/Kernel protocol is necessary.  Also it came to light that other
+  systems could profit significantly from certain small optimizations
+  and modifications to the protocol. To facilitate this work as well as
+  to make future ports easier, communication between Venus and the
+  kernel should be documented in great detail.  This is the aim of this
+  document.
+  2.  Servicing Coda filesystem calls
+  The service of a request for a Coda file system service originates in
+  a process P which accesses a Coda file. It makes a system call which
+  traps to the OS kernel. Examples of such calls trapping to the kernel
+  are read, write, open, close, create, mkdir, rmdir, chmod in a Unix
+  context.  Similar calls exist in the Win32 environment, such as
+  CreateFile.
+  Generally the operating system handles the request in a virtual
+  filesystem (VFS) layer, which is named I/O Manager in NT and IFS
+  manager in Windows 95.  The VFS is responsible for partial processing
+  of the request and for locating the specific filesystem(s) which will
+  service parts of the request.  Usually the information in the path
+  assists in locating the correct FS drivers.  Sometimes after extensive
+  pre-processing, the VFS starts invoking exported routines in the FS
+  driver.  This is the point where the FS specific processing of the
+  request starts, and here the Coda specific kernel code comes into
+  play.
+  The FS layer for Coda must expose and implement several interfaces.
+  First and foremost the VFS must be able to make all necessary calls to
+  the Coda FS layer, so the Coda FS driver must expose the VFS interface
+  as applicable in the operating system. These differ very significantly
+  among operating systems, but share features such as facilities to
+  read/write and create and remove objects.  The Coda FS layer services
+  such VFS requests by invoking one or more well defined services
+  offered by the cache manager Venus.  When the replies from Venus have
+  come back to the FS driver, servicing of the VFS call continues and
+  finishes with a reply to the kernel's VFS. Finally the VFS layer
+  returns to the process.
+  As a result of this design a basic interface exposed by the FS driver
+  must allow Venus to manage message traffic.  In particular Venus must
+  be able to retrieve and place messages and to be notified of the
+  arrival of a new message. The notification must be through a mechanism
+  which does not block Venus since Venus must attend to other tasks even
+  when no messages are waiting or being processed.
+                     Interfaces of the Coda FS Driver
+  Furthermore the FS layer provides for a special path of communication
+  between a user process and Venus, called the pioctl interface. The
+  pioctl interface is used for Coda specific services, such as
+  requesting detailed information about the persistent cache managed by
+  Venus. Here the involvement of the kernel is minimal.  It identifies
+  the calling process and passes the information on to Venus.  When
+  Venus replies the response is passed back to the caller in unmodified
+  form.
+  Finally Venus allows the kernel FS driver to cache the results from
+  certain services.  This is done to avoid excessive context switches
+  and results in an efficient system.  However, Venus may acquire
+  information, for example from the network, which implies that cached
+  information must be flushed or replaced. Venus then makes a downcall
+  to the Coda FS layer to request flushes or updates in the cache.  The
+  kernel FS driver handles such requests synchronously.
+  Among these interfaces the VFS interface and the facility to place,
+  receive and be notified of messages are platform specific.  We will
+  not go into the calls exported to the VFS layer but we will state the
+  requirements of the message exchange mechanism.
+  3.  The message layer
+  At the lowest level the communication between Venus and the FS driver
+  proceeds through messages.  The synchronization between processes
+  requesting Coda file service and Venus relies on blocking and waking
+  up processes.  The Coda FS driver processes VFS- and pioctl-requests
+  on behalf of a process P, creates messages for Venus, awaits replies
+  and finally returns to the caller.  The implementation of the exchange
+  of messages is platform specific, but the semantics have (so far)
+  appeared to be generally applicable.  Data buffers are created by the
+  FS Driver in kernel memory on behalf of P and copied to user memory in
+  Venus.
+  The FS Driver while servicing P makes upcalls to Venus.  Such an
+  upcall is dispatched to Venus by creating a message structure.  The
+  structure contains the identification of P, the message sequence
+  number, the size of the request and a pointer to the data in kernel
+  memory for the request.  Since the data buffer is re-used to hold the
+  reply from Venus, there is a field for the size of the reply.  A flags
+  field is used in the message to precisely record the status of the
+  message.  Additional platform dependent structures involve pointers to
+  determine the position of the message on queues and pointers to
+  synchronization objects.  In the upcall routine the message structure
+  is filled in, flags are set to 0, and it is placed on the pending
+  queue.  The routine calling upcall is responsible for allocating the
+  data buffer; its structure will be described in the next section.
+  A facility must exist to notify Venus that the message has been
+  created; it is implemented using available synchronization objects in
+  the OS. This notification is done in the upcall context of the process
+  P. When the message is on the pending queue, process P cannot proceed
+  in upcall.  The (kernel mode) processing of P in the filesystem
+  request routine must be suspended until Venus has replied.  Therefore
+  the calling thread in P is blocked in upcall.  A pointer in the
+  message structure will locate the synchronization object on which P is
+  sleeping.
+  Venus detects the notification that a message has arrived, and the FS
+  driver allows Venus to retrieve the message with a getmsg_from_kernel
+  call. This action finishes in the kernel by putting the message on the
+  queue of processing messages and setting flags to READ.  Venus is
+  passed the contents of the data buffer. The getmsg_from_kernel call
+  now returns and Venus processes the request.
+  At some later point the FS driver receives a message from Venus,
+  namely when Venus calls sendmsg_to_kernel.  At this moment the Coda FS
+  driver looks at the contents of the message and decides if:
+  o  The message is a reply for a suspended thread P.  If so it removes
+     the message from the processing queue and marks the message as
+     WRITTEN.  Finally, the FS driver unblocks P (still in the kernel
+     mode context of Venus) and the sendmsg_to_kernel call returns to
+     Venus.  The process P will be scheduled at some point and continues
+     processing its upcall with the data buffer replaced with the reply
+     from Venus.
+  o  The message is a downcall.  A downcall is a request from Venus to
+     the FS Driver. The FS driver processes the request immediately
+     (usually a cache eviction or replacement) and when it finishes
+     sendmsg_to_kernel returns.
+  Now P awakes and continues processing upcall.  There are some
+  subtleties to take account of. First P will determine if it was woken
+  up in upcall by a signal from some other source (for example an
+  attempt to terminate P) or as is normally the case by Venus in its
+  sendmsg_to_kernel call.  In the normal case, the upcall routine will
+  deallocate the message structure and return.  The FS routine can proceed
+  with its processing.
+                      Sleeping and IPC arrangements
+  In case P is woken up by a signal and not by Venus, it will first look
+  at the flags field.  If the message is not yet READ, the process P can
+  handle its signal without notifying Venus.  If Venus has READ, and
+  the request should not be processed, P can send Venus a signal message
+  to indicate that it should disregard the previous message.  Such
+  signals are put in the queue at the head, and read first by Venus.  If
+  the message is already marked as WRITTEN it is too late to stop the
+  processing.  The VFS routine will now continue.  (-- If a VFS request
+  involves more than one upcall, this can lead to complicated state, an
+  extra field "handle_signals" could be added in the message structure
+  to indicate points of no return have been passed.--)
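+  The three cases above can be summarised as a small decision function.
+  This is only an illustrative sketch: the text names the READ and
+  WRITTEN states, but the enum names, their numeric values and the
+  function on_signal are assumptions made for this example and are not
+  taken from the Coda sources.

```c
/* Hypothetical message states.  The text names READ and WRITTEN; the
 * identifiers and numeric values here are invented for the sketch. */
enum msg_flags { MSG_PENDING = 0, MSG_READ = 1, MSG_WRITTEN = 2 };

/* What process P can do when it is woken by a signal instead of Venus. */
enum signal_action {
    ACT_DEQUEUE,      /* Venus has not read the request: handle the signal,
                         no need to notify Venus                          */
    ACT_SEND_SIGNAL,  /* Venus has read it: queue a signal message at the
                         head of the queue so Venus disregards the request */
    ACT_TOO_LATE      /* the reply is already written: continue the call  */
};

enum signal_action on_signal(enum msg_flags flags)
{
    switch (flags) {
    case MSG_PENDING:
        return ACT_DEQUEUE;
    case MSG_READ:
        return ACT_SEND_SIGNAL;
    default:
        return ACT_TOO_LATE;
    }
}
```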
+  3.1.  Implementation details
+  The Unix implementation of this mechanism has been through the
+  implementation of a character device associated with Coda.  Venus
+  retrieves messages by doing a read on the device, replies are sent
+  with a write and notification is through the select system call on the
+  file descriptor for the device.  The process P is kept waiting on an
+  interruptible wait queue object.
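+  As a rough sketch of the Venus side of this arrangement, the function
+  below waits for a message with select and then fetches it with read.
+  The name venus_get_msg and the timeout parameter are hypothetical; the
+  descriptor may be any readable file descriptor, standing in here for
+  the Coda character device.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait up to 'timeout_ms' milliseconds for data on 'fd', then read it
 * into 'buf'.  Returns the number of bytes read, 0 if the wait timed out
 * (or the descriptor reached end of file), and -1 on error. */
ssize_t venus_get_msg(int fd, void *buf, size_t len, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int ready;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* select provides the notification without blocking indefinitely. */
    ready = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return 0;   /* nothing pending; the caller can do other work */

    return read(fd, buf, len);
}
```

+  A zero return means that no message arrived within the timeout, so the
+  caller is free to attend to other tasks before polling again.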
+  In Windows NT and the DPMI Windows 95 implementation a DeviceIoControl
+  call is used.  The DeviceIoControl call is designed to copy buffers
+  from user memory to kernel memory with OPCODES. The sendmsg_to_kernel
+  is issued as a synchronous call, while the getmsg_from_kernel call is
+  asynchronous.  Windows EventObjects are used for notification of
+  message arrival.  The process P is kept waiting on a KernelEvent
+  object in NT and a semaphore in Windows 95.
+  4.  The interface at the call level
+  This section describes the upcalls a Coda FS driver can make to Venus.
+  Each of these upcalls makes use of two structures: inputArgs and
+  outputArgs.   In pseudo BNF form the structures take the following
+  form:
+  struct inputArgs {
+      u_long opcode;
+      u_long unique;     /* Keep multiple outstanding msgs distinct */
+      u_short pid;                 /* Common to all */
+      u_short pgid;                /* Common to all */
+      struct CodaCred cred;        /* Common to all */
+      <union "in" of call dependent parts of inputArgs>
+  };
+  struct outputArgs {
+      u_long opcode;
+      u_long unique;       /* Keep multiple outstanding msgs distinct */
+      u_long result;
+      <union "out" of call dependent parts of outputArgs>
+  };
+  Before going on let us elucidate the role of the various fields. The
+  inputArgs start with the opcode which defines the type of service
+  requested from Venus. There are approximately 30 upcalls at present
+  which we will discuss.   The unique field labels the inputArgs with a
+  number which identifies the message uniquely.  A process and
+  process group id are passed.  Finally the credentials of the caller
+  are included.
+  Before delving into the specific calls we need to discuss a variety of
+  data structures shared by the kernel and Venus.
+  4.1.  Data structures shared by the kernel and Venus
+  The CodaCred structure defines a variety of user and group ids as
+  they are set for the calling process. The vuid_t and vgid_t are 32 bit
+  unsigned integers.  It also defines group membership in an array.  On
+  Unix the CodaCred has proven sufficient to implement good security
+  semantics for Coda but the structure may have to undergo modification
+  for the Windows environments when these mature.
+  struct CodaCred {
+      vuid_t cr_uid, cr_euid, cr_suid, cr_fsuid; /* Real, effective, set, fs uid*/
+      vgid_t cr_gid, cr_egid, cr_sgid, cr_fsgid; /* same for groups */
+      vgid_t cr_groups[NGROUPS];        /* Group membership for caller */
+  };
+  NOTE It is questionable if we need CodaCreds in Venus. Finally Venus
+  doesn't know about groups, although it does create files with the
+  default uid/gid.  Perhaps the list of group membership is superfluous.
+  The next item is the fundamental identifier used to identify Coda
+  files, the ViceFid.  A fid of a file uniquely defines a file or
+  directory in the Coda filesystem within a cell.   (-- A cell is a
+  group of Coda servers acting under the aegis of a single system
+  control machine or SCM. See the Coda Administration manual for a
+  detailed description of the role of the SCM.--)
+  typedef struct ViceFid {
+      VolumeId Volume;
+      VnodeId Vnode;
+      Unique_t Unique;
+  } ViceFid;
+  Each of the constituent fields (VolumeId, VnodeId and Unique_t) is an
+  unsigned 32 bit integer.  We envisage that a further field will need
+  to be prefixed to identify the Coda cell; this will probably take the
+  form of an IPv6-sized IP address naming the Coda cell through DNS.
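+  Comparing two fids follows directly from this layout.  The sketch
+  below assumes the three field types are plain 32 bit unsigned
+  integers, as stated above; the helper vicefid_equal is hypothetical
+  and is not part of the documented interface.

```c
#include <stdint.h>

/* Field types as described in the text: unsigned 32 bit integers. */
typedef uint32_t VolumeId;
typedef uint32_t VnodeId;
typedef uint32_t Unique_t;

typedef struct ViceFid {
    VolumeId Volume;
    VnodeId Vnode;
    Unique_t Unique;
} ViceFid;

/* Hypothetical helper: two fids name the same object only if all three
 * fields match.  Returns 1 when they are equal and 0 otherwise. */
int vicefid_equal(const ViceFid *a, const ViceFid *b)
{
    return a->Volume == b->Volume &&
           a->Vnode == b->Vnode &&
           a->Unique == b->Unique;
}
```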
+  The next important structure shared between Venus and the kernel is
+  the attributes of the file.  The following structure is used to
+  exchange information.  It has room for future extensions such as
+  support for device files (currently not present in Coda).
+  struct coda_vattr {
+          enum coda_vtype va_type;        /* vnode type (for create) */
+          u_short         va_mode;        /* files access mode and type */
+          short           va_nlink;       /* number of references to file */
+          vuid_t          va_uid;         /* owner user id */
+          vgid_t          va_gid;         /* owner group id */
+          long            va_fsid;        /* file system id (dev for now) */
+          long            va_fileid;      /* file id */
+          u_quad_t        va_size;        /* file size in bytes */
+          long            va_blocksize;   /* blocksize preferred for i/o */
+          struct timespec va_atime;       /* time of last access */
+          struct timespec va_mtime;       /* time of last modification */
+          struct timespec va_ctime;       /* time file changed */
+          u_long          va_gen;         /* generation number of file */
+          u_long          va_flags;       /* flags defined for file */
+          dev_t           va_rdev;        /* device special file represents */
+          u_quad_t        va_bytes;       /* bytes of disk space held by file */
+          u_quad_t        va_filerev;     /* file modification number */
+          u_int           va_vaflags;     /* operations flags, see below */
+          long            va_spare;       /* remain quad aligned */
+  };
+  4.2.  The pioctl interface
+  Coda specific requests can be made by applications through the pioctl
+  interface. The pioctl is implemented as an ordinary ioctl on a
+  fictitious file /coda/.CONTROL.  The pioctl call opens this file, gets
+  a file handle and makes the ioctl call. Finally it closes the file.
+  The kernel involvement in this is limited to providing the facility to
+  open and close and pass the ioctl message and to verify that a path in
+  the pioctl data buffers is a file in a Coda filesystem.
+  The kernel is handed a data packet of the form:
+      struct {
+          const char *path;
+          struct ViceIoctl vidata;
+          int follow;
+      } data;
+  where
+  struct ViceIoctl {
+          caddr_t in, out;        /* Data to be transferred in, or out */
+          short in_size;          /* Size of input buffer <= 2K */
+          short out_size;         /* Maximum size of output buffer, <= 2K */
+  };
+  The path must be a Coda file, otherwise the ioctl upcall will not be
+  made.
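+  The size limits noted in the ViceIoctl comments can be checked before
+  a request is passed on.  The following is a sketch only: the constant
+  VICE_IOCTL_MAX and the function vice_ioctl_sizes_ok are invented for
+  this example, and char * is used in place of caddr_t to keep the
+  fragment portable.

```c
/* The "<= 2K" limit from the structure comments; the name is made up. */
#define VICE_IOCTL_MAX 2048

struct ViceIoctl {
    char *in, *out;     /* caddr_t in the original structure */
    short in_size;      /* Size of input buffer <= 2K */
    short out_size;     /* Maximum size of output buffer, <= 2K */
};

/* Hypothetical check: return 1 if both sizes lie in [0, 2K], else 0. */
int vice_ioctl_sizes_ok(const struct ViceIoctl *v)
{
    return v->in_size >= 0 && v->in_size <= VICE_IOCTL_MAX &&
           v->out_size >= 0 && v->out_size <= VICE_IOCTL_MAX;
}
```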
+  NOTE  The data structures and code are a mess.  We need to clean this
+  up.
+  We now proceed to document the individual calls:
+  4.3.  root
+  Arguments
+     in empty
+     out
+                struct cfs_root_out {
+                    ViceFid VFid;
+                } cfs_root;
+  Description This call is made to Venus during the initialization of
+  the Coda filesystem. If the result is zero, the cfs_root structure
+  contains the ViceFid of the root of the Coda filesystem. If a non-zero
+  result is generated, its value is a platform dependent error code
+  indicating the difficulty Venus encountered in locating the root of
+  the Coda filesystem.
+  4.4.  lookup
+  Summary Find the ViceFid and type of an object in a directory if it
+  exists.
+  Arguments
+     in
+                struct  cfs_lookup_in {
+                    ViceFid     VFid;
+                    char        *name;          /* Place holder for data. */
+                } cfs_lookup;
+     out
+                struct cfs_lookup_out {
+                    ViceFid VFid;
+                    int vtype;
+                } cfs_lookup;
+  Description This call is made to determine the ViceFid and filetype of
+  a directory entry.  The directory entry requested carries the name name
+  and Venus will search the directory identified by cfs_lookup_in.VFid.
+  The result may indicate that the name does not exist, or that
+  difficulty was encountered in finding it (e.g. due to disconnection).
+  If the result is zero, the field cfs_lookup_out.VFid contains the
+  target's ViceFid and cfs_lookup_out.vtype the coda_vtype giving the
+  type of object the name designates.
+  The name of the object is an 8 bit character string of maximum length
+  CFS_MAXNAMLEN, currently set to 256 (including a 0 terminator.)
+  It is extremely important to realize that Venus bitwise ors the field
+  cfs_lookup.vtype with CFS_NOCACHE to indicate that the object should
+  not be put in the kernel name cache.
+  NOTE The type of the vtype is currently wrong.  It should be
+  coda_vtype. Linux does not take note of CFS_NOCACHE.  It should.
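+  A kernel driver that honours this flag has to separate it from the
+  type before using either.  The sketch below shows one way to do so.
+  The numeric value given to CFS_NOCACHE is an assumption made for the
+  example (consult the Coda headers for the real definition), and the
+  two helper functions are hypothetical.

```c
#include <stdint.h>

/* Assumed value, for illustration only. */
#define CFS_NOCACHE 0x80000000u

/* Return the object type with the CFS_NOCACHE bit masked off. */
uint32_t cfs_vtype_type(uint32_t vtype)
{
    return vtype & ~CFS_NOCACHE;
}

/* Return 1 if Venus asked that the name not be cached, 0 otherwise. */
int cfs_vtype_nocache(uint32_t vtype)
{
    return (vtype & CFS_NOCACHE) != 0;
}
```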
+  4.5.  getattr
+  Summary Get the attributes of a file.
+  Arguments
+     in
+                struct cfs_getattr_in {
+                    ViceFid VFid;
+                    struct coda_vattr attr; /* XXXXX */
+                } cfs_getattr;
+     out
+                struct cfs_getattr_out {
+                    struct coda_vattr attr;
+                } cfs_getattr;
+  Description This call returns the attributes of the file identified by
+  fid.
+  Errors Errors can occur if the object with fid does not exist, is
+  inaccessible or if the caller does not have permission to fetch
+  attributes.
+  Note Many kernel FS drivers (Linux, NT and Windows 95) need to acquire
+  the attributes as well as the Fid for the instantiation of an internal
+  "inode" or "FileHandle".  A significant improvement in performance on
+  such systems could be made by combining the lookup and getattr calls
+  both at the Venus/kernel interaction level and at the RPC level.
+  The vattr structure included in the input arguments is superfluous and
+  should be removed.
+  4.6.  setattr
+  Summary Set the attributes of a file.
+  Arguments
+     in
+                struct cfs_setattr_in {
+                    ViceFid VFid;
+                    struct coda_vattr attr;
+                } cfs_setattr;
+     out
+        empty
+  Description The structure attr is filled with attributes to be changed
+  in BSD style.  Attributes not to be changed are set to -1, apart from
+  vtype which is set to VNON. Others are set to the value to be assigned.
+  The only attributes which the FS driver may request to change are the
+  mode, owner, groupid, atime, mtime and ctime.  The return value
+  indicates success or failure.
+  Errors A variety of errors can occur.  The object may not exist, may
+  be inaccessible, or permission may not be granted by Venus.
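+  The "-1 means unchanged" convention can be illustrated with a reduced
+  structure.  This is a sketch under stated assumptions: struct
+  sketch_vattr is a hypothetical stand-in holding only a few of the
+  coda_vattr fields with simplified types, the enum values are invented,
+  and both helper functions are made up for the example.

```c
/* Invented values; only the name VNON comes from the text. */
enum sketch_vtype { VNON = 0, VREG = 1, VDIR = 2 };

/* Reduced, hypothetical stand-in for struct coda_vattr. */
struct sketch_vattr {
    enum sketch_vtype va_type;
    unsigned short va_mode;
    long va_uid;
    long va_gid;
};

/* Mark every attribute as "do not change": -1 everywhere, VNON for type. */
void sketch_vattr_null(struct sketch_vattr *va)
{
    va->va_type = VNON;
    va->va_mode = (unsigned short)-1;
    va->va_uid = -1;
    va->va_gid = -1;
}

/* Build a setattr argument that changes only the mode. */
void sketch_vattr_chmod(struct sketch_vattr *va, unsigned short mode)
{
    sketch_vattr_null(va);
    va->va_mode = mode;
}
```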
+  4.7.  access
+  Summary
+  Arguments
+     in
+                struct cfs_access_in {
+                    ViceFid     VFid;
+                    int flags;
+                } cfs_access;
+     out
+        empty
+  Description Verify if access to the object identified by VFid for
+  operations described by flags is permitted.  The result indicates if
+  access will be granted.  It is important to remember that Coda uses
+  ACLs to enforce protection and that ultimately the servers, not the
+  clients enforce the security of the system.  The result of this call
+  will depend on whether a token is held by the user.
+  Errors The object may not exist, or the ACL describing the protection
+  may not be accessible.
+  4.8.  create
+  Summary Invoked to create a file
+  Arguments
+     in
+                struct cfs_create_in {
+                    ViceFid VFid;
+                    struct coda_vattr attr;
+                    int excl;
+                    int mode;
+                    char        *name;          /* Place holder for data. */
+                } cfs_create;
+     out
+                struct cfs_create_out {
+                    ViceFid VFid;
+                    struct coda_vattr attr;
+                } cfs_create;
+  Description  This upcall is invoked to request creation of a file.
+  The file will be created in the directory identified by VFid, its name
+  will be name, and the mode will be mode.  If excl is set an error will
+  be returned if the file already exists.  If the size field in attr is
+  set to zero the file will be truncated.  The uid and gid of the file
+  are set by converting the CodaCred to a uid using a macro CRTOUID
+  (this macro is platform dependent).  Upon success the VFid and
+  attributes of the file are returned.  The Coda FS Driver will normally
+  instantiate a vnode, inode or file handle at kernel level for the new
+  object.
+  Errors A variety of errors can occur. Permissions may be insufficient.
+  If the object exists and is not a file the error EISDIR is returned
+  under Unix.
+  NOTE The packing of parameters is very inefficient and appears to
+  indicate confusion between the system call creat and the VFS operation
+  create. The VFS operation create is only called to create new objects.
+  This create call differs from the Unix one in that it is not invoked
+  to return a file descriptor. The truncate and exclusive options,
+  together with the mode, could simply be part of the mode as it is
+  under Unix.  There should be no flags argument; this is used in open
+  (2) to return a file descriptor for READ or WRITE mode.
+  The attributes of the directory should be returned too, since the size
+  and mtime changed.
+  4.9.  mkdir
+  Summary Create a new directory.
+  Arguments
+     in
+                struct cfs_mkdir_in {
+                    ViceFid     VFid;
+                    struct coda_vattr attr;
+                    char        *name;          /* Place holder for data. */
+                } cfs_mkdir;
+     out
+                struct cfs_mkdir_out {
+                    ViceFid VFid;
+                    struct coda_vattr attr;
+                } cfs_mkdir;
+  Description This call is similar to create but creates a directory.
+  Only the mode field in the input parameters is used for creation.
+  Upon successful creation, the attr returned contains the attributes of
+  the new directory.
+  Errors As for create.
+  NOTE The input parameter should be changed to mode instead of
+  attributes.
+  The attributes of the parent should be returned since the size and
+  mtime changes.
+  4.10.  link
+  Summary Create a link to an existing file.
+  Arguments
+     in
+                struct cfs_link_in {
+                    ViceFid sourceFid;          /* cnode to link *to* */
+                    ViceFid destFid;            /* Directory in which to place link */
+                    char        *tname;         /* Place holder for data. */
+                } cfs_link;
+     out
+        empty
+  Description This call creates a link to the sourceFid in the directory
+  identified by destFid with name tname.  The source must reside in the
+  target's parent, i.e. the source must have parent destFid; Coda
+  does not support cross directory hard links.  Only the return value is
+  relevant.  It indicates success or the type of failure.
+  Errors The usual errors can occur.
+  4.11.  symlink
+  Summary Create a symbolic link.
+  Arguments
+     in
+                struct cfs_symlink_in {
+                    ViceFid     VFid;          /* Directory to put symlink in */
+                    char        *srcname;
+                    struct coda_vattr attr;
+                    char        *tname;
+                } cfs_symlink;
+     out
+        none
+  Description Create a symbolic link. The link is to be placed in the
+  directory identified by VFid and named tname.  It should point to the
+  pathname srcname.  The attributes of the newly created object are to
+  be set to attr.
+  Errors
+  NOTE The attributes of the target directory should be returned since
+  its size changed.
+  4.12.  remove
+  Summary Remove a file
+  Arguments
+     in
+                struct cfs_remove_in {
+                    ViceFid     VFid;
+                    char        *name;          /* Place holder for data. */
+                } cfs_remove;
+     out
+        none
+  Description  Remove file named cfs_remove_in.name in directory
+  identified by VFid.
+  Errors
+  NOTE The attributes of the directory should be returned since its
+  mtime and size may change.
+  4.13.  rmdir
+  Summary Remove a directory
+  Arguments
+     in
+                struct cfs_rmdir_in {
+                    ViceFid     VFid;
+                    char        *name;          /* Place holder for data. */
+                } cfs_rmdir;
+     out
+        none
+  Description Remove the directory with name name from the directory
+  identified by VFid.
+  Errors
+  NOTE The attributes of the parent directory should be returned since
+  its mtime and size may change.
+  4.14.  readlink
+  Summary Read the value of a symbolic link.
+  Arguments
+     in
+                struct cfs_readlink_in {
+                    ViceFid VFid;
+                } cfs_readlink;
+     out
+                struct cfs_readlink_out {
+                    int count;
+                    caddr_t     data;           /* Place holder for data. */
+                } cfs_readlink;
+  Description This routine reads the contents of the symbolic link
+  identified by VFid into the buffer data.  The buffer data must be able
+  to hold any name up to CFS_MAXNAMLEN (PATH or NAM??).
+  Errors No unusual errors.
+  4.15.  open
+  Summary Open a file.
+  Arguments
+     in
+                struct cfs_open_in {
+                    ViceFid     VFid;
+                    int flags;
+                } cfs_open;
+     out
+                struct cfs_open_out {
+                    dev_t       dev;
+                    ino_t       inode;
+                } cfs_open;
+  Description This request asks Venus to place the file identified by
+  VFid in its cache and to note that the calling process wishes to open
+  it with flags as in open(2).  The return value to the kernel differs
+  for Unix and Windows systems.  For Unix systems the Coda FS Driver is
+  informed of the device and inode number of the container file in the
+  fields dev and inode.  For Windows the path of the container file is
+  returned to the kernel.
+  Errors
+  NOTE Currently the cfs_open_out structure is not properly adapted to
+  deal with the Windows case.  It might be best to implement two
+  upcalls, one to open aiming at a container file name, the other at a
+  container file inode.
+  4.16.  close
+  Summary Close a file, update it on the servers.
+  Arguments
+     in
+                struct cfs_close_in {
+                    ViceFid     VFid;
+                    int flags;
+                } cfs_close;
+     out
+        none
+  Description Close the file identified by VFid.
+  Errors
+  NOTE The flags argument is bogus and not used.  However, Venus' code
+  has room to deal with an execp input field; probably this field should
+  be used to inform Venus that the file was closed but is still memory
+  mapped for execution.  There are comments about fetching versus not
+  fetching the data in Venus vproc_vfscalls.  This seems silly.  If a
+  file is being closed, the data in the container file is to be the new
+  data.  Here again the execp flag might be in play to create confusion:
+  currently Venus might think a file can be flushed from the cache when
+  it is still memory mapped.  This needs to be understood.
+  4.17.  ioctl
+  Summary Do an ioctl on a file. This includes the pioctl interface.
+  Arguments
+     in
+                struct cfs_ioctl_in {
+                    ViceFid VFid;
+                    int cmd;
+                    int len;
+                    int rwflag;
+                    char *data;                 /* Place holder for data. */
+                } cfs_ioctl;
+     out
+                struct cfs_ioctl_out {
+                    int len;
+                    caddr_t     data;           /* Place holder for data. */
+                } cfs_ioctl;
+  Description Do an ioctl operation on a file.  The command, len and
+  data arguments are filled as usual.  flags is not used by Venus.
+  Errors
+  NOTE Another bogus parameter.  flags is not used.  What is the
+  business about PREFETCHING in the Venus code?
+  4.18.  rename
+  Summary Rename a fid.
+  Arguments
+     in
+                struct cfs_rename_in {
+                    ViceFid     sourceFid;
+                    char        *srcname;
+                    ViceFid destFid;
+                    char        *destname;
+                } cfs_rename;
+     out
+        none
+  Description Rename the object with name srcname in directory
+  sourceFid to destname in destFid.  It is important that the names
+  srcname and destname are 0 terminated strings.  Strings in Unix
+  kernels are not always null terminated.
+  Errors
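+  The 0 termination requirement for rename can be met by copying each
+  counted kernel name into a bounded buffer and terminating it
+  explicitly before the upcall is built.  What follows is only a
+  sketch: copy_name() and NAME_MAX_LEN are hypothetical names chosen
+  for illustration and do not appear in the Coda sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limit for illustration; the real limit is set by Coda. */
#define NAME_MAX_LEN 255

/*
 * Copy a counted name (len bytes, not necessarily 0 terminated) into
 * dst and terminate it.  dst must hold at least NAME_MAX_LEN + 1 bytes.
 * Returns 0 on success, -1 if the name is too long or contains a 0 byte.
 */
static int copy_name(char *dst, const char *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    if (len > NAME_MAX_LEN)
        return -1;
    if (memchr(src, '\0', len) != NULL)
        return -1;              /* an embedded 0 would truncate the name */
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}
```

+  A driver following this approach would call copy_name() once for
+  srcname and once for destname, and fail the rename locally if either
+  call returns -1.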
+  4.19.  readdir
+  Summary Read directory entries.
+  Arguments
+     in
+                struct cfs_readdir_in {
+                    ViceFid     VFid;
+                    int count;
+                    int offset;
+                } cfs_readdir;
+     out
+                struct cfs_readdir_out {
+                    int size;
+                    caddr_t     data;           /* Place holder for data. */
+                } cfs_readdir;
+  Description Read directory entries from VFid starting at offset and
+  read at most count bytes.  Returns the data in data and returns
+  the size in size.
+  Errors
+  NOTE This call is not used.  Readdir operations exploit container
+  files.  We will re-evaluate this during the directory revamp which is
+  about to take place.
+  4.20.  vget
+  Summary Instructs Venus to do an FSDB->Get.
+  Arguments
+     in
+                struct cfs_vget_in {
+                    ViceFid VFid;
+                } cfs_vget;
+     out
+                struct cfs_vget_out {
+                    ViceFid VFid;
+                    int vtype;
+                } cfs_vget;
+  Description This upcall asks Venus to do a get operation on an fsobj
+  labelled by VFid.
+  Errors
+  NOTE This operation is not used.  However, it is extremely useful
+  since it can be used to deal with read/write memory mapped files.
+  These can be "pinned" in the Venus cache using vget and released with
+  inactive.
+  4.21.  fsync
+  Summary Tell Venus to update the RVM attributes of a file.
+  Arguments
+     in
+                struct cfs_fsync_in {
+                    ViceFid VFid;
+                } cfs_fsync;
+     out
+        none
+  Description Ask Venus to update RVM attributes of object VFid. This
+  should be called as part of kernel level fsync type calls.  The
+  result indicates if the syncing was successful.
+  Errors
+  NOTE Linux does not implement this call. It should.
+  4.22.  inactive
+  Summary Tell Venus a vnode is no longer in use.
+  Arguments
+     in
+                struct cfs_inactive_in {
+                    ViceFid VFid;
+                } cfs_inactive;
+     out
+        none
+  Description This operation returns EOPNOTSUPP.
+  Errors
+  NOTE This should perhaps be removed.
+  4.23.  rdwr
+  Summary Read or write from a file
+  Arguments
+     in
+                struct cfs_rdwr_in {
+                    ViceFid     VFid;
+                    int rwflag;
+                    int count;
+                    int offset;
+                    int ioflag;
+                    caddr_t     data;           /* Place holder for data. */
+                } cfs_rdwr;
+     out
+                struct cfs_rdwr_out {
+                    int rwflag;
+                    int count;
+                    caddr_t     data;   /* Place holder for data. */
+                } cfs_rdwr;
+  Description This upcall asks Venus to read or write from a file.
+  Errors
+  NOTE It should be removed since it goes against the Coda philosophy
+  that read/write operations never reach Venus.  I have been told the
+  operation does not work.  It is not currently used.
+  4.24.  odymount
+  Summary Allows mounting multiple Coda "filesystems" on one Unix mount
+  point.
+  Arguments
+     in
+                struct ody_mount_in {
+                    char        *name;          /* Place holder for data. */
+                } ody_mount;
+     out
+                struct ody_mount_out {
+                    ViceFid VFid;
+                } ody_mount;
+  Description Asks Venus to return the rootfid of a Coda system named
+  name.  The fid is returned in VFid.
+  Errors
+  NOTE This call was used by David for dynamic sets.  It should be
+  removed since it causes a jungle of pointers in the VFS mounting area.
+  It is not used by Coda proper.  Call is not implemented by Venus.
+  4.25.  ody_lookup
+  Summary Looks up something.
+  Arguments
+     in irrelevant
+     out
+        irrelevant
+  Description
+  Errors
+  NOTE Gut it. Call is not implemented by Venus.
+  4.26.  ody_expand
+  Summary Expands something in a dynamic set.
+  Arguments
+     in irrelevant
+     out
+        irrelevant
+  Description
+  Errors
+  NOTE Gut it.  Call is not implemented by Venus.
+  4.27.  prefetch
+  Summary Prefetch a dynamic set.
+  Arguments
+     in Not documented.
+     out
+        Not documented.
+  Description Venus worker.cc has support for this call, although it is
+  noted that it doesn't work.  Not surprising, since the kernel does not
+  have support for it. (ODY_PREFETCH is not a defined operation).
+  Errors
+  NOTE Gut it. It isn't working and isn't used by Coda.
+  4.28.  signal
+  Summary Send Venus a signal about an upcall.
+  Arguments
+     in none
+     out
+        not applicable.
+  Description This is an out-of-band upcall to Venus to inform Venus
+  that the calling process received a signal after Venus read the
+  message from the input queue.  Venus is supposed to clean up the
+  operation.
+  Errors No reply is given.
+  NOTE We need to better understand what Venus needs to clean up and if
+  it is doing this correctly.  Also we need to correctly handle
+  situations with multiple upcalls per system call.  It would be
+  important to know what state changes in Venus take place after an
+  upcall for which the kernel is responsible for notifying Venus to
+  clean up (e.g. open definitely is such a state change, but many
+  others are maybe not).
+  5.  The minicache and downcalls
+  The Coda FS Driver can cache results of lookup and access upcalls, to
+  limit the frequency of upcalls.  Upcalls carry a price since a process
+  context switch needs to take place.  The counterpart of caching the
+  information is that Venus will notify the FS Driver that cached
+  entries must be flushed or renamed.
+  The kernel code generally has to maintain a structure which links the
+  internal file handles (called vnodes in BSD, inodes in Linux and
+  FileHandles in Windows) with the ViceFid's which Venus maintains.  The
+  reason is that frequent translations back and forth are needed in
+  order to make upcalls and use the results of upcalls.  Such linking
+  objects are called cnodes.
+  The current minicache implementations have cache entries which record
+  the following:
+  1. the name of the file
+  2. the cnode of the directory containing the object
+  3. a list of CodaCred's for which the lookup is permitted.
+  4. the cnode of the object
+  The lookup call in the Coda FS Driver may request the cnode of the
+  desired object from the cache, by passing its name, directory and the
+  CodaCred's of the caller.  The cache will return the cnode or indicate
+  that it cannot be found.  The Coda FS Driver must be careful to
+  invalidate cache entries when it modifies or removes objects.
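+  The four recorded fields and the lookup by (directory, name, cred)
+  can be sketched as follows.  This is an illustration under stated
+  assumptions, not the driver's implementation: the mc_* names, the
+  fixed table sizes, the stand-in struct cnode and the use of a plain
+  unsigned value in place of a CodaCred are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes, chosen only to keep the sketch small. */
#define MC_NAMELEN  64
#define MC_MAXCREDS 4
#define MC_SLOTS    16

struct cnode { int id; };            /* stand-in for the real cnode */

struct mc_entry {
    int in_use;
    char name[MC_NAMELEN];           /* 1. the name of the file */
    struct cnode *dir;               /* 2. cnode of the containing directory */
    unsigned creds[MC_MAXCREDS];     /* 3. creds for which lookup is permitted */
    int ncreds;
    struct cnode *obj;               /* 4. the cnode of the object */
};

static struct mc_entry mc_table[MC_SLOTS];

/* Find the entry for (dir, name), or NULL if there is none. */
static struct mc_entry *mc_find(struct cnode *dir, const char *name)
{
    for (int i = 0; i < MC_SLOTS; i++) {
        struct mc_entry *e = &mc_table[i];
        if (e->in_use && e->dir == dir && strcmp(e->name, name) == 0)
            return e;
    }
    return NULL;
}

/* Return the cached cnode if cred may look up (dir, name), else NULL. */
static struct cnode *mc_lookup(struct cnode *dir, const char *name,
                               unsigned cred)
{
    struct mc_entry *e = mc_find(dir, name);

    if (e == NULL)
        return NULL;
    for (int j = 0; j < e->ncreds; j++)
        if (e->creds[j] == cred)
            return e->obj;
    return NULL;                     /* entry exists, cred not on its list */
}

/* Record that cred may look up (dir, name) -> obj.  Returns 0 or -1. */
static int mc_enter(struct cnode *dir, const char *name, unsigned cred,
                    struct cnode *obj)
{
    struct mc_entry *e;

    assert(name != NULL);
    if (strlen(name) >= MC_NAMELEN)
        return -1;
    e = mc_find(dir, name);
    if (e == NULL) {
        for (int i = 0; i < MC_SLOTS && e == NULL; i++)
            if (!mc_table[i].in_use)
                e = &mc_table[i];
        if (e == NULL)
            return -1;               /* cache full */
        memset(e, 0, sizeof(*e));
        e->in_use = 1;
        strcpy(e->name, name);
        e->dir = dir;
    }
    if (e->ncreds == MC_MAXCREDS)
        return -1;                   /* cred list full */
    e->obj = obj;
    e->creds[e->ncreds++] = cred;
    return 0;
}

/* Drop the entry for (dir, name); the driver would call this whenever
 * it modifies or removes the object. */
static void mc_invalidate(struct cnode *dir, const char *name)
{
    struct mc_entry *e = mc_find(dir, name);

    if (e != NULL)
        e->in_use = 0;
}
```

+  A miss in mc_lookup() is the point at which the driver would fall
+  back to a lookup upcall and then record the result with mc_enter().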
+  When Venus obtains information that indicates that cache entries are
+  no longer valid, it will make a downcall to the kernel.  Downcalls are
+  intercepted by the Coda FS Driver and lead to cache invalidations of
+  the kind described below.  The Coda FS Driver does not return an error
+  unless the downcall data could not be read into kernel memory.
+  5.1.  INVALIDATE
+  No information is available on this call.
+  5.2.  FLUSH
+  Arguments None
+  Summary Flush the name cache entirely.
+  Description Venus issues this call upon startup and when it dies. This
+  is to prevent stale cache information being held.  Some operating
+  systems allow the kernel name cache to be switched off dynamically.
+  When this is done, this downcall is made.
+  5.3.  PURGEUSER
+  Arguments
+          struct cfs_purgeuser_out {/* CFS_PURGEUSER is a venus->kernel call */
+              struct CodaCred cred;
+          } cfs_purgeuser;
+  Description Remove all entries in the cache carrying the Cred.  This
+  call is issued when tokens for a user expire or are flushed.
+  5.4.  ZAPFILE
+  Arguments
+          struct cfs_zapfile_out {  /* CFS_ZAPFILE is a venus->kernel call */
+              ViceFid CodaFid;
+          } cfs_zapfile;
+  Description Remove all entries which have the (dir vnode, name) pair.
+  This is issued as a result of an invalidation of cached attributes of
+  a vnode.
+  NOTE Call is not named correctly in NetBSD and Mach.  The minicache
+  zapfile routine takes different arguments. Linux does not implement
+  the invalidation of attributes correctly.
+  5.5.  ZAPDIR
+  Arguments
+          struct cfs_zapdir_out {   /* CFS_ZAPDIR is a venus->kernel call */
+              ViceFid CodaFid;
+          } cfs_zapdir;
+  Description Remove all entries in the cache lying in a directory
+  CodaFid, and all children of this directory. This call is issued when
+  Venus receives a callback on the directory.
+  5.6.  ZAPVNODE
+  Arguments
+          struct cfs_zapvnode_out { /* CFS_ZAPVNODE is a venus->kernel call */
+              struct CodaCred cred;
+              ViceFid VFid;
+          } cfs_zapvnode;
+  Description Remove all entries in the cache carrying the cred and VFid
+  as in the arguments. This downcall is probably never issued.
+  5.7.  PURGEFID
+  Summary
+  Arguments
+          struct cfs_purgefid_out { /* CFS_PURGEFID is a venus->kernel call */
+              ViceFid CodaFid;
+          } cfs_purgefid;
+  Description Flush the attribute for the file. If it is a dir (odd
+  vnode), purge its children from the namecache and remove the file from the
+  namecache.
+  5.8.  REPLACE
+  Summary Replace the Fid's for a collection of names.
+  Arguments
+          struct cfs_replace_out { /* cfs_replace is a venus->kernel call */
+              ViceFid NewFid;
+              ViceFid OldFid;
+          } cfs_replace;
+  Description This routine replaces a ViceFid in the name cache with
+  another.  It is added to allow Venus during reintegration to replace
+  locally allocated temp fids while disconnected with global fids even
+  when the reference counts on those fids are not zero.
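+  Two of the downcalls, ZAPDIR and REPLACE, can be illustrated on a toy
+  name cache.  The sketch below rests on assumptions: the three-field
+  fid layout, the nc_* names and the fixed-size table are hypothetical,
+  and nc_zapdir() drops only the entries directly inside the directory
+  rather than walking deeper levels.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed fid layout; the real ViceFid is defined by Coda. */
struct fid { unsigned long volume, vnode, unique; };

struct nc_entry {
    int in_use;
    struct fid dir;                  /* directory holding the name */
    struct fid obj;                  /* object the name refers to */
};

#define NC_SLOTS 8                   /* hypothetical cache size */
static struct nc_entry nc[NC_SLOTS];

static int fid_eq(const struct fid *a, const struct fid *b)
{
    return a->volume == b->volume && a->vnode == b->vnode &&
           a->unique == b->unique;
}

/* ZAPDIR: drop every entry lying in directory dir.  Returns the count. */
static int nc_zapdir(const struct fid *dir)
{
    int n = 0;

    assert(dir != NULL);
    for (int i = 0; i < NC_SLOTS; i++)
        if (nc[i].in_use && fid_eq(&nc[i].dir, dir)) {
            nc[i].in_use = 0;
            n++;
        }
    return n;
}

/* REPLACE: substitute newfid for oldfid in place, keeping the entries.
 * Returns the number of fids rewritten. */
static int nc_replace(const struct fid *oldfid, const struct fid *newfid)
{
    int n = 0;

    assert(oldfid != NULL && newfid != NULL);
    for (int i = 0; i < NC_SLOTS; i++) {
        if (!nc[i].in_use)
            continue;
        if (fid_eq(&nc[i].dir, oldfid)) { nc[i].dir = *newfid; n++; }
        if (fid_eq(&nc[i].obj, oldfid)) { nc[i].obj = *newfid; n++; }
    }
    return n;
}
```

+  Rewriting the fid in place, instead of dropping and re-entering the
+  entry, is what lets the replacement proceed while references to the
+  old fid are still held.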
+  6.  Initialization and cleanup
+  This section gives brief hints as to desirable features for the Coda
+  FS Driver at startup and upon shutdown or Venus failures.  Before
+  entering the discussion it is useful to repeat that the Coda FS Driver
+  maintains the following data:
+  1. message queues
+  2. cnodes
+  3. name cache entries
+     The name cache entries are entirely private to the driver, so they
+     can easily be manipulated.   The message queues will generally have
+     clear points of initialization and destruction.  The cnodes are
+     much more delicate.  User processes hold reference counts in Coda
+     filesystems and it can be difficult to clean up the cnodes.
+  It can expect requests through:
+  1. the message subsystem
+  2. the VFS layer
+  3. pioctl interface
+     Currently the pioctl passes through the VFS for Coda so we can
+     treat these similarly.
+  6.1.  Requirements
+  The following requirements should be accommodated:
+  1. The message queues should have open and close routines.  On Unix
+     the opening of the character devices is such a routine.
+  o  Before opening, no messages can be placed.
+  o  Opening will remove any old messages still pending.
+  o  Close will notify any sleeping processes that their upcall cannot
+     be completed.
+  o  Close will free all memory allocated by the message queues.
+  2. At open the namecache shall be initialized to empty state.
+  3. Before the message queues are open, all VFS operations will fail.
+     Fortunately this can be achieved by making sure that mounting the
+     Coda filesystem cannot succeed before opening.
+  4. After closing of the queues, no VFS operations can succeed.  Here
+     one needs to be careful, since a few operations (lookup,
+     read/write, readdir) can proceed without upcalls.  These must be
+     explicitly blocked.
+  5. Upon closing the namecache shall be flushed and disabled.
+  6. All memory held by cnodes can be freed without relying on upcalls.
+  7. Unmounting the file system can be done without relying on upcalls.
+  8. Mounting the Coda filesystem should fail gracefully if Venus cannot
+     get the rootfid or the attributes of the rootfid.  The latter is
+     best implemented by Venus fetching these objects before attempting
+     to mount.
+  NOTE NetBSD in particular but also Linux have not implemented the
+  above requirements fully.  For smooth operation this needs to be
+  corrected.
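+  The open and close rules for the message queues in requirement 1 can
+  be sketched as a small state machine.  This is a minimal illustration
+  under assumptions: the mq_* names and both structures are
+  hypothetical, messages are owned by the waiting callers so the queue
+  itself allocates nothing, and "notifying a sleeper" is reduced to
+  setting a done flag.

```c
#include <assert.h>
#include <stddef.h>

struct msg {
    struct msg *next;
    int done;                        /* set when the sleeper may wake up */
    int result;                      /* 0 on success, -1 on failure */
};

struct msgq {
    int is_open;
    struct msg *head;
};

/* Fail every pending message and empty the queue. */
static void mq_drain(struct msgq *q)
{
    while (q->head != NULL) {
        struct msg *m = q->head;

        q->head = m->next;
        m->next = NULL;
        m->result = -1;
        m->done = 1;                 /* the sleeping process sees the failure */
    }
}

/* Open: remove old pending messages, then start accepting new ones. */
static void mq_open(struct msgq *q)
{
    assert(q != NULL);
    mq_drain(q);
    q->is_open = 1;
}

/* Place a message; refused (-1) unless the queue is open. */
static int mq_put(struct msgq *q, struct msg *m)
{
    assert(q != NULL && m != NULL);
    if (!q->is_open)
        return -1;
    m->done = 0;
    m->next = q->head;
    q->head = m;
    return 0;
}

/* Close: stop accepting messages and fail whatever is still pending. */
static void mq_close(struct msgq *q)
{
    assert(q != NULL);
    q->is_open = 0;
    mq_drain(q);
}
```

+  Because mq_put() refuses messages while the queue is closed, a VFS
+  operation that needs an upcall fails cleanly both before open and
+  after close; operations that proceed without upcalls still need the
+  explicit blocking described in requirement 4.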
diff --git a/Documentation/filesystems/cramfs.txt b/Documentation/filesystems/cramfs.txt
new file mode 100644
index 000000000000..31f53f0ab957
--- /dev/null
+++ b/Documentation/filesystems/cramfs.txt
@@ -0,0 +1,76 @@
+        Cramfs - cram a filesystem onto a small ROM
+cramfs is designed to be simple and small, and to compress things well. 
+It uses the zlib routines to compress a file one page at a time, and
+allows random page access.  The meta-data is not compressed, but is
+expressed in a very terse representation to make it use much less
+disk space than traditional filesystems.
+You can't write to a cramfs filesystem (making it compressible and
+compact also makes it _very_ hard to update on-the-fly), so you have to
+create the disk image with the "mkcramfs" utility.
+Usage Notes
+-----------
+File sizes are limited to less than 16MB.
+Maximum filesystem size is a little over 256MB.  (The last file on the
+filesystem is allowed to extend past 256MB.)
+Only the low 8 bits of gid are stored.  The current version of
+mkcramfs simply truncates to 8 bits, which is a potential security
+issue.
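+The size and gid limits above can be checked before an image is built.
+The helpers below are a sketch only; their names are hypothetical and
+they are not taken from mkcramfs.

```c
#include <assert.h>
#include <stdint.h>

/* File sizes must be less than 16MB. */
#define CRAMFS_MAX_FILE_SIZE (16u * 1024u * 1024u)

/* Nonzero if a file of this size can be stored. */
static int size_fits(uint32_t size)
{
    return size < CRAMFS_MAX_FILE_SIZE;
}

/* The gid as it would be stored: only the low 8 bits survive. */
static uint8_t stored_gid(uint32_t gid)
{
    return (uint8_t)(gid & 0xffu);
}

/* Nonzero if the gid is unchanged by truncation to 8 bits. */
static int gid_fits(uint32_t gid)
{
    return stored_gid(gid) == gid;
}
```

+For example, gid 256 is stored as 0 and gid 1000 as 232, which is why
+silent truncation is a potential security issue: files can end up
+owned by an unrelated, possibly privileged, group.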
+Hard links are supported, but hard linked files
+will still have a link count of 1 in the cramfs image.
+Cramfs directories have no `.' or `..' entries.  Directories (like
+every other file on cramfs) always have a link count of 1.  (There's
+no need to use -noleaf in `find', btw.)
+No timestamps are stored in a cramfs, so these default to the epoch
+(1970 GMT).  Recently-accessed files may have updated timestamps, but
+the update lasts only as long as the inode is cached in memory, after
+which the timestamp reverts to 1970, i.e. moves backwards in time.
+Currently, cramfs must be written and read with architectures of the
+same endianness, and can be read only by kernels with PAGE_CACHE_SIZE
+== 4096.  At least the latter of these is a bug, but it hasn't been
+decided what the best fix is.  For the moment if you have larger pages
+you can just change the #define in mkcramfs.c, so long as you don't
+mind the filesystem becoming unreadable to future kernels.
+For /usr/share/magic
+--------------------
+0       ulelong 0x28cd3d45      Linux cramfs offset 0
+>4      ulelong x               size %d
+>8      ulelong x               flags 0x%x
+>12     ulelong x               future 0x%x
+>16     string  >\0             signature "%.16s"
+>32     ulelong x               fsid.crc 0x%x
+>36     ulelong x               fsid.edition %d
+>40     ulelong x               fsid.blocks %d
+>44     ulelong x               fsid.files %d
+>48     string  >\0             name "%.16s"
+512     ulelong 0x28cd3d45      Linux cramfs offset 512
+>516    ulelong x               size %d
+>520    ulelong x               flags 0x%x
+>524    ulelong x               future 0x%x
+>528    string  >\0             signature "%.16s"
+>544    ulelong x               fsid.crc 0x%x
+>548    ulelong x               fsid.edition %d
+>552    ulelong x               fsid.blocks %d
+>556    ulelong x               fsid.files %d
+>560    string  >\0             name "%.16s"
+Hacker Notes
+------------
+See fs/cramfs/README for filesystem layout and implementation notes.
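+The /usr/share/magic entries above test for the value 0x28cd3d45,
+read as a little-endian 32-bit number, at offset 0 or at offset 512.
+The same test can be sketched in C; cramfs_find() and le32() are
+hypothetical helper names, and an image written on a big-endian
+machine would not match this check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRAMFS_MAGIC 0x28cd3d45u

/* Decode a 32-bit little-endian value ("ulelong" in the magic entries). */
static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Return the offset (0 or 512) at which the cramfs magic number is
 * found in buf, or -1 if it is at neither position.
 */
static long cramfs_find(const unsigned char *buf, size_t len)
{
    static const size_t offsets[] = { 0, 512 };

    assert(buf != NULL);
    for (size_t i = 0; i < sizeof(offsets) / sizeof(offsets[0]); i++)
        if (len >= offsets[i] + 4 &&
            le32(buf + offsets[i]) == CRAMFS_MAGIC)
            return (long)offsets[i];
    return -1;
}
```

+Offset 0 is tried first, so an image carrying the magic number at
+both positions is reported at offset 0.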
diff --git a/Documentation/filesystems/devfs/ChangeLog b/Documentation/filesystems/devfs/ChangeLog
new file mode 100644
index 000000000000..e5aba5246d7c
--- /dev/null
+++ b/Documentation/filesystems/devfs/ChangeLog
@@ -0,0 +1,1977 @@
+/* -*- auto-fill -*-                                                         */
+===============================================================================
+Changes for patch v1
+- creation of devfs
+- modified miscellaneous character devices to support devfs
+===============================================================================
+Changes for patch v2
+- bug fix with manual inode creation
+===============================================================================
+Changes for patch v3
+- bugfixes
+- documentation improvements
+- created a couple of scripts (one to save&restore a devfs and the
+  other to set up compatibility symlinks)
+- devfs support for SCSI discs. New name format is: sd_hHcCiIlL
+===============================================================================
+Changes for patch v4
+- bugfix for the directory reading code
+- bugfix for compilation with kerneld
+- devfs support for generic hard discs
+- rationalisation of the various watchdog drivers
+===============================================================================
+Changes for patch v5
+- support for mounting directly from entries in the devfs (it doesn't
+  need to be mounted to do this), including the root filesystem.
+  Mounting of swap partitions also works. Hence, now if you set
+  CONFIG_DEVFS_ONLY to 'Y' then you won't be able to access your discs
+  via ordinary device nodes. Naturally, the default is 'N' so that you
+  can still use your old device nodes.  If you want to mount from devfs
+  entries, make sure you use: append = "root=/dev/sd_..." in your
+  lilo.conf. It seems LILO looks for the device number (major&minor)
+  and writes that into the kernel image :-( 
+- support for character memory devices (/dev/null, /dev/zero, /dev/full
+  and so on). Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
+===============================================================================
+Changes for patch v6
+- support for subdirectories
+- support for symbolic links (created by devfs_mk_symlink(), no
+  support yet for creation via symlink(2))
+- SCSI disc naming now cast in stone, with the format:
+  /dev/sd/c0b1t2u3      controller=0, bus=1, ID=2, LUN=3, whole disc
+  /dev/sd/c0b1t2u3p4    controller=0, bus=1, ID=2, LUN=3, 4th partition
+- loop devices now appear in devfs
+- tty devices, console, serial ports, etc. now appear in devfs
+  Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
+- bugs with mounting devfs-only devices now fixed
+===============================================================================
+Changes for patch v7
+- SCSI CD-ROMS, tapes and generic devices now appear in devfs
+===============================================================================
+Changes for patch v8
+- bugfix with no-rewind SCSI tapes
+- RAMDISCs now appear in devfs
+- better cleaning up of devfs entries created by various modules
+- interface change to <devfs_register>
+===============================================================================
+Changes for patch v9
+- the v8 patch was corrupted somehow, which would affect the patch for
+  linux/fs/filesystems.c
+  I've also fixed the v8 patch file on the WWW
+- MetaDevices (/dev/md*) should now appear in devfs
+===============================================================================
+Changes for patch v10
+- bugfix in meta device support for devfs
+- created this ChangeLog file
+- added devfs support to the floppy driver
+- added support for creating sockets in a devfs
+===============================================================================
+Changes for patch v11
+- added DEVFS_FL_HIDE_UNREG flag
+- incorporated better patch for ttyname() in libc 5.4.43 from H.J. Lu.
+- interface change to <devfs_mk_symlink>
+- support for creating symlinks with symlink(2)
+- parallel port printer (/dev/lp*) now appears in devfs
+===============================================================================
+Changes for patch v12
+- added inode check to <devfs_fill_file> function
+- improved devfs support when mounting from devfs
+- added call to <<release>> operation when removing swap areas on
+  devfs devices
+- increased NR_SUPER to 128 to support large numbers of devfs mounts
+  (for chroot(2) gaols)
+- fixed bug in SCSI disc support: was generating incorrect minors if
+  SCSI ID's did not start at 0 and increase by 1
+- support symlink traversal when mounting root
+===============================================================================
+Changes for patch v13
+- added devfs support to soundcard driver
+  Thanks to Eric Dumas <dumas@linux.eu.org> and
+  C. Scott Ananian <cananian@alumni.princeton.edu>
+- added devfs support to the joystick driver
+- loop driver now has its own subdirectory "/dev/loop/"
+- created <devfs_get_flags> and <devfs_set_flags> functions
+- fix problem with SCSI disc compatibility names (sd{a,b,c,d,e,f})
+  which assumes ID's start at 0 and increase by 1. Also only create
+  devfs entries for SCSI disc partitions which actually exist
+  Show new names in partition check
+  Thanks to Jakub Jelinek <jj@sunsite.ms.mff.cuni.cz>
+===============================================================================
+Changes for patch v14
+- bug fix in floppy driver: would not compile without
+  CONFIG_DEVFS_FS='Y'
+  Thanks to Jurgen Botz <jbotz@nova.botz.org>
+- bug fix in loop driver
+  Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
+- do not create devfs entries for printers not configured
+  Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
+- do not create devfs entries for serial ports not present
+  Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
+- ensure <tty_register_devfs> is exported from tty_io.c
+  Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
+- allow unregistering of devfs symlink entries
+- fixed bug in SCSI disc naming introduced in last patch version
+===============================================================================
+Changes for patch v15
+- ported to kernel 2.1.81
+===============================================================================
+Changes for patch v16
+- created <devfs_set_symlink_destination> function
+- moved DEVFS_SUPER_MAGIC into header file
+- added DEVFS_FL_HIDE flag
+- created <devfs_get_maj_min>
+- created <devfs_get_handle_from_inode>
+- fixed bugs in searching by major&minor
+- changed interface to <devfs_unregister>, <devfs_fill_file> and
+  <devfs_find_handle>
+- fixed inode times when symlink created with symlink(2)
+- change tty driver to do auto-creation of devfs entries
+  Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
+- fixed bug in genhd.c: whole disc (non-SCSI) was not registered to
+  devfs
+- updated libc 5.4.43 patch for ttyname()
+===============================================================================
+Changes for patch v17
+- added CONFIG_DEVFS_TTY_COMPAT
+  Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
+- bugfix in devfs support for drivers/char/lp.c
+  Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
+- clean up serial driver so that PCMCIA devices unregister correctly
+  Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
+- fixed bug in genhd.c: whole disc (non-SCSI) was not registered to
+  devfs [was missing in patch v16]
+- updated libc 5.4.43 patch for ttyname() [was missing in patch v16]
+- all SCSI devices now registered in /dev/sg
+- support removal of devfs entries via unlink(2)
+===============================================================================
+Changes for patch v18
+- added floppy/?u720 floppy entry
+- fixed kerneld support for entries in devfs subdirectories
+- incorporated latest patch for ttyname() in libc 5.4.43 from H.J. Lu.
+===============================================================================
+Changes for patch v19
+- bug fix when looking up unregistered entries: kerneld was not called
+- fixes for kernel 2.1.86 (now requires 2.1.86)
+===============================================================================
+Changes for patch v20
+- only create available floppy entries
+  Thanks to Andrzej Krzysztofowicz <ankry@green.mif.pg.gda.pl>
+- new IDE naming scheme following SCSI format (i.e. /dev/id/c0b0t0u0p1
+  instead of /dev/hda1)
+  Thanks to Andrzej Krzysztofowicz <ankry@green.mif.pg.gda.pl>
+- new XT disc naming scheme following SCSI format (i.e. /dev/xd/c0t0p1
+  instead of /dev/xda1)
+  Thanks to Andrzej Krzysztofowicz <ankry@green.mif.pg.gda.pl>
+- new non-standard CD-ROM names (i.e. /dev/sbp/c#t#)
+  Thanks to Andrzej Krzysztofowicz <ankry@green.mif.pg.gda.pl>
+- allow symlink traversal when mounting the root filesystem
+- Create entries for MD devices at MD init
+  Thanks to Christophe Leroy <christophe.leroy5@capway.com>
+===============================================================================
+Changes for patch v21
+- ported to kernel 2.1.91
+===============================================================================
+Changes for patch v22
+- SCSI host number patch ("scsihosts=" kernel option)
+  Thanks to Andrzej Krzysztofowicz <ankry@green.mif.pg.gda.pl>
+===============================================================================
+Changes for patch v23
+- Fixed persistence bug with device numbers for manually created
+  device files
+- Fixed problem with recreating symlinks with different content
+- Added CONFIG_DEVFS_MOUNT (mount devfs on /dev at boot time)
+===============================================================================
+Changes for patch v24
+- Switched from CONFIG_KERNELD to CONFIG_KMOD: module autoloading
+  should now work again
+- Hide entries which are manually unlinked
+- Always invalidate devfs dentry cache when registering entries
+- Support removal of devfs directories via rmdir(2)
+- Ensure directories created by <devfs_mk_dir> are visible
+- Default no access for "other" for floppy device
+===============================================================================
+Changes for patch v25
+- Updates to CREDITS file and minor IDE numbering change
+  Thanks to Andrzej Krzysztofowicz <ankry@green.mif.pg.gda.pl>
+- Invalidate devfs dentry cache when making directories
+- Invalidate devfs dentry cache when removing entries
+- More informative message if root FS mount fails when devfs
+  configured
+- Fixed persistence bug with fifos
+===============================================================================
+Changes for patch v26
+- ported to kernel 2.1.97
+- Changed serial directory from "/dev/serial" to "/dev/tts" and
+  "/dev/consoles" to "/dev/vc" to be more friendly to new procps
+===============================================================================
+Changes for patch v27
+- Added support for IDE4 and IDE5
+  Thanks to Andrzej Krzysztofowicz <ankry@green.mif.pg.gda.pl>
+- Documented "scsihosts=" boot parameter
+- Print process command when debugging kerneld/kmod
+- Added debugging for register/unregister/change operations
+- Added "devfs=" boot options
+- Hide unregistered entries by default
+===============================================================================
+Changes for patch v28
+- No longer lock/unlock superblock in <devfs_put_super> (cope with
+  recent VFS interface change)
+- Do not automatically change ownership/protection of /dev/tty
+- Drop negative dentries when they are released
+- Manage dcache more efficiently
+===============================================================================
+Changes for patch v29
+- Added DEVFS_FL_AUTO_DEVNUM flag
+===============================================================================
+Changes for patch v30
+- No longer set unnecessary methods
+- Ported to kernel 2.1.99-pre3
+===============================================================================
+Changes for patch v31
+- Added PID display to <call_kerneld> debugging message
+- Added "diread" and "diwrite" options
+- Ported to kernel 2.1.102
+- Fixed persistence problem with permissions
+===============================================================================
+Changes for patch v32
+- Fixed devfs support in drivers/block/md.c
+===============================================================================
+Changes for patch v33
+- Support legacy device nodes
+- Fixed bug where recreated inodes were hidden
+- New IDE naming scheme: everything is under /dev/ide
+===============================================================================
+Changes for patch v34
+- Improved debugging in <get_vfs_inode>
+- Prevent duplicate calls to <devfs_mk_dir> in SCSI layer
+- No longer free old dentries in <devfs_mk_dir>
+- Free all dentries for a given entry when deleting inodes
+===============================================================================
+Changes for patch v35
+- Ported to kernel 2.1.105 (sound driver changes)
+===============================================================================
+Changes for patch v36
+- Fixed sound driver port
+===============================================================================
+Changes for patch v37
+- Minor documentation tweaks
+===============================================================================
+Changes for patch v38
+- More documentation tweaks
+- Fix for sound driver port
+- Removed ttyname-patch (grab libc 5.4.44 instead)
+- Ported to kernel 2.1.107-pre2 (loop driver fix)
+===============================================================================
+Changes for patch v39
+- Ported to kernel 2.1.107 (hd.c hunk broke due to spelling "fixes"). Sigh
+- Removed many #ifdef's, replaced with trickery in include/devfs_fs.h
+===============================================================================
+Changes for patch v40
+- Fix for sound driver port
+- Limit auto-device numbering to majors 128 to 239
+===============================================================================
+Changes for patch v41
+- Fixed inode times persistence problem
+===============================================================================
+Changes for patch v42
+- Ported to kernel 2.1.108 (drivers/scsi/hosts.c hunk broke)
+===============================================================================
+Changes for patch v43
+- Fixed spelling in <devfs_readlink> debug
+- Fixed bug in <devfs_setup> parsing "dilookup"
+- More #ifdef's removed
+- Supported Sparc keyboard (/dev/kbd)
+- Supported DSP56001 digital signal processor (/dev/dsp56k)
+- Supported Apple Desktop Bus (/dev/adb)
+- Supported Coda network file system (/dev/cfs*)
+===============================================================================
+Changes for patch v44
+- Fixed devfs inode leak when manually recreating inodes
+- Fixed permission persistence problem when recreating inodes
+===============================================================================
+Changes for patch v45
+- Ported to kernel 2.1.110
+===============================================================================
+Changes for patch v46
+- Ported to kernel 2.1.112-pre1
+- Removed harmless "unused variable" compiler warning
+- Fixed modes for manually recreated device nodes
+===============================================================================
+Changes for patch v47
+- Added NULL devfs inode warning in <devfs_read_inode>
+- Force all inode nlink values to 1
+===============================================================================
+Changes for patch v48
+- Added "dimknod" option
+- Set inode nlink to 0 when freeing dentries
+- Added support for virtual console capture devices (/dev/vcs*)
+  Thanks to Dennis Hou <smilax@mindmeld.yi.org>
+- Fixed modes for manually recreated symlinks
+===============================================================================
+Changes for patch v49
+- Ported to kernel 2.1.113
+===============================================================================
+Changes for patch v50
+- Fixed bugs in recreated directories and symlinks
+===============================================================================
+Changes for patch v51
+- Improved robustness of rc.devfs script
+  Thanks to Roderich Schupp <rsch@experteam.de>
+- Fixed bugs in recreated device nodes
+- Fixed bug in currently unused <devfs_get_handle_from_inode>
+- Defined new <devfs_handle_t> type
+- Improved debugging when getting entries
+- Fixed bug where directories could be emptied
+- Ported to kernel 2.1.115
+===============================================================================
+Changes for patch v52
+- Replaced dummy .epoch inode with .devfsd character device
+- Modified rc.devfs to take account of above change
+- Removed spurious driver warning messages when CONFIG_DEVFS_FS=n
+- Implemented devfsd protocol revision 0
+===============================================================================
+Changes for patch v53
+- Ported to kernel 2.1.116 (kmod change broke hunk)
+- Updated Documentation/Configure.help
+- Test and tty pattern patch for rc.devfs script
+  Thanks to Roderich Schupp <rsch@experteam.de>
+- Added soothing message to warning in <devfs_d_iput>
+===============================================================================
+Changes for patch v54
+- Ported to kernel 2.1.117
+- Fixed default permissions in sound driver
+- Added support for frame buffer devices (/dev/fb*)
+===============================================================================
+Changes for patch v55
+- Ported to kernel 2.1.119
+- Use GCC extensions for structure initialisations
+- Implemented async open notification
+- Incremented devfsd protocol revision to 1
+===============================================================================
+Changes for patch v56
+- Ported to kernel 2.1.120-pre3
+- Moved async open notification to end of <devfs_open>
+===============================================================================
+Changes for patch v57
+- Ported to kernel 2.1.121
+- Prepended "/dev/" to module load request
+- Renamed <call_kerneld> to <call_kmod>
+- Created sample modules.conf file
+===============================================================================
+Changes for patch v58
+- Fixed typo "AYSNC" -> "ASYNC"
+===============================================================================
+Changes for patch v59
+- Added open flag for files
+===============================================================================
+Changes for patch v60
+- Ported to kernel 2.1.123-pre2
+===============================================================================
+Changes for patch v61
+- Set i_blocks=0 and i_blksize=1024 in <devfs_read_inode>
+===============================================================================
+Changes for patch v62
+- Ported to kernel 2.1.123
+===============================================================================
+Changes for patch v63
+- Ported to kernel 2.1.124-pre2
+===============================================================================
+Changes for patch v64
+- Fixed Unix98 pty support
+- Increased buffer size in <get_partition_list> to avoid crash and
+  burn
+===============================================================================
+Changes for patch v65
+- More Unix98 pty support fixes
+- Added test for empty <<name>> in <devfs_find_handle>
+- Renamed <generate_path> to <devfs_generate_path> and published
+- Created /dev/root symlink
+  Thanks to Roderich Schupp <rsch@ExperTeam.de>
+  with further modifications by me
+===============================================================================
+Changes for patch v66
+- Yet more Unix98 pty support fixes (now tested)
+- Created <devfs_get_fops>
+- Support media change checks when CONFIG_DEVFS_ONLY=y
+- Abolished Unix98-style PTY names for old PTY devices
+===============================================================================
+Changes for patch v67
+- Added inline declaration for dummy <devfs_generate_path>
+- Removed spurious "unable to register... in devfs" messages when
+  CONFIG_DEVFS_FS=n
+- Fixed misc. devices when CONFIG_DEVFS_FS=n
+- Limit auto-device numbering to majors 144 to 239
+===============================================================================
+Changes for patch v68
+- Hide unopened virtual consoles from directory listings
+- Added support for video capture devices
+- Ported to kernel 2.1.125
+===============================================================================
+Changes for patch v69
+- Fix for CONFIG_VT=n
+===============================================================================
+Changes for patch v70
+- Added support for non-OSS/Free sound cards
+===============================================================================
+Changes for patch v71
+- Ported to kernel 2.1.126-pre2
+===============================================================================
+Changes for patch v72
+- #ifdef's for CONFIG_DEVFS_DISABLE_OLD_NAMES removed
+===============================================================================
+Changes for patch v73
+- CONFIG_DEVFS_DISABLE_OLD_NAMES replaced with "nocompat" boot option
+- CONFIG_DEVFS_BOOT_OPTIONS removed: boot options always available
+===============================================================================
+Changes for patch v74
+- Removed CONFIG_DEVFS_MOUNT and "mount" boot option and replaced with
+  "nomount" boot option
+- Documentation updates
+- Updated sample modules.conf
+===============================================================================
+Changes for patch v75
+- Updated sample modules.conf
+- Remount devfs after initrd finishes
+- Ported to kernel 2.1.127
+- Added support for ISDN
+  Thanks to Christophe Leroy <christophe.leroy5@capway.com>
+===============================================================================
+Changes for patch v76
+- Updated an email address in ChangeLog
+- CONFIG_DEVFS_ONLY replaced with "only" boot option
+===============================================================================
+Changes for patch v77
+- Added DEVFS_FL_REMOVABLE flag
+- Check for disc change when listing directories with removable media
+  devices
+- Use DEVFS_FL_REMOVABLE in sd.c
+- Ported to kernel 2.1.128
+===============================================================================
+Changes for patch v78
+- Only call <scan_dir_for_removable> on first call to <devfs_readdir>
+- Ported to kernel 2.1.129-pre5
+- ISDN support improvements
+  Thanks to Christophe Leroy <christophe.leroy5@capway.com>
+===============================================================================
+Changes for patch v79
+- Ported to kernel 2.1.130
+- Renamed miscdevice "apm" to "apm_bios" to be consistent with
+  devices.txt
+===============================================================================
+Changes for patch v80
+- Ported to kernel 2.1.131
+- Updated <devfs_rmdir> for VFS change in 2.1.131
+===============================================================================
+Changes for patch v81
+- Fixed permissions on /dev/ptmx
+===============================================================================
+Changes for patch v82
+- Ported to kernel 2.1.132-pre4
+- Changed initial permissions on /dev/pts/*
+- Created <devfs_mk_compat>
+- Added "symlinks" boot option
+- Changed devfs_register_blkdev() back to register_blkdev() for IDE
+- Check for partitions on removable media in <devfs_lookup>
+===============================================================================
+Changes for patch v83
+- Fixed support for ramdisc when using string-based root FS name
+- Ported to kernel 2.2.0-pre1
+===============================================================================
+Changes for patch v84
+- Ported to kernel 2.2.0-pre7
+===============================================================================
+Changes for patch v85
+- Compile fixes for drivers/sound/sound_common.c (non-module) and
+  drivers/isdn/isdn_common.c
+  Thanks to Christophe Leroy <christophe.leroy5@capway.com>
+- Added support for registering regular files
+- Created <devfs_set_file_size>
+- Added /dev/cpu/mtrr as an alternative interface to /proc/mtrr
+- Update devfs inodes from entries if not changed through FS
+===============================================================================
+Changes for patch v86
+- Ported to kernel 2.2.0-pre9
+===============================================================================
+Changes for patch v87
+- Fixed bug when mounting non-devfs devices in a devfs
+===============================================================================
+Changes for patch v88
+- Fixed <devfs_fill_file> to only initialise temporary inodes
+- Trap for NULL fops in <devfs_register>
+- Return -ENODEV in <devfs_fill_file> for non-driver inodes
+- Fixed bug when unswapping non-devfs devices in a devfs
+===============================================================================
+Changes for patch v89
+- Switched to C data types in include/linux/devfs_fs.h
+- Switched from PATH_MAX to DEVFS_PATHLEN
+- Updated Documentation/filesystems/devfs/modules.conf to take account
+  of reverse scanning (!) by modprobe
+- Ported to kernel 2.2.0
+===============================================================================
+Changes for patch v90
+- CONFIG_DEVFS_DISABLE_OLD_TTY_NAMES replaced with "nottycompat" boot
+  option
+- CONFIG_DEVFS_TTY_COMPAT removed: existing "symlinks" boot option now
+  controls this. This means you must have libc 5.4.44 or later, or a
+  recent version of libc 6 if you use the "symlinks" option
+===============================================================================
+Changes for patch v91
+- Switch from <devfs_mk_symlink> to <devfs_mk_compat> in
+  drivers/char/vc_screen.c to fix problems with Midnight Commander
+===============================================================================
+Changes for patch v92
+- Ported to kernel 2.2.2-pre5
+===============================================================================
+Changes for patch v93
+- Modified <sd_name> in drivers/scsi/sd.c to cope with devices that
+  don't exist (which happens with new RAID autostart code printk()s)
+===============================================================================
+Changes for patch v94
+- Fixed bug in joystick driver: only first joystick was registered
+===============================================================================
+Changes for patch v95
+- Fixed another bug in joystick driver
+- Fixed <devfsd_read> to not overrun event buffer
+===============================================================================
+Changes for patch v96
+- Ported to kernel 2.2.5-2
+- Created <devfs_auto_unregister>
+- Fixed bugs: compatibility entries were not unregistered for:
+    loop driver
+    floppy driver
+    RAMDISC driver
+    IDE tape driver
+    SCSI CD-ROM driver
+    SCSI HDD driver
+===============================================================================
+Changes for patch v97
+- Fixed bugs: compatibility entries were not unregistered for:
+    ALSA sound driver
+    partitions in generic disc driver
+- Don't return unregistered entries in <devfs_find_handle>
+- Panic in <devfs_unregister> if entry unregistered
+- Don't panic in <devfs_auto_unregister> for duplicates
+===============================================================================
+Changes for patch v98
+- Don't unregister already unregistered entries in <unregister>
+- Register entry in <sd_detect>
+- Unregister entry in <sd_detach>
+- Changed to <devfs_*register_chrdev> in drivers/char/tty_io.c
+- Ported to kernel 2.2.7
+===============================================================================
+Changes for patch v99
+- Ported to kernel 2.2.8
+- Fixed bug in drivers/scsi/sd.c when >16 SCSI discs
+- Disable warning messages when unable to read partition table for
+  removable media
+===============================================================================
+Changes for patch v100
+- Ported to kernel 2.3.1-pre5
+- Added "oops-on-panic" boot option
+- Improved debugging in <devfs_register> and <devfs_unregister>
+- Register entry in <sr_detect>
+- Unregister entry in <sr_detach>
+- Register entry in <sg_detect>
+- Unregister entry in <sg_detach>
+- Added support for ALSA drivers
+===============================================================================
+Changes for patch v101
+- Ported to kernel 2.3.2
+===============================================================================
+Changes for patch v102
+- Update serial driver to register PCMCIA entries
+  Thanks to Roch-Alexandre Nomine-Beguin <roch@samarkand.infini.fr>
+- Updated an email address in ChangeLog
+- Hide virtual console capture entries from directory listings when
+  corresponding console device is not open
+===============================================================================
+Changes for patch v103
+- Ported to kernel 2.3.3
+===============================================================================
+Changes for patch v104
+- Added documentation for some functions
+- Added "doc" target to fs/devfs/Makefile
+- Added "v4l" directory for video4linux devices
+- Replaced call to <devfs_unregister> in <sd_detach> with call to
+  <devfs_register_partitions>
+- Moved registration for sr and sg drivers from detect() to attach()
+  methods
+- Register entries in <st_attach> and unregister in <st_detach>
+- Work around IDE driver treating CD-ROM as gendisk
+- Use <sed> instead of <tr> in rc.devfs
+- Updated ToDo list
+- Removed "oops-on-panic" boot option: now always Oops
+===============================================================================
+Changes for patch v105
+- Unregister SCSI host from <scsi_host_no_list> in <scsi_unregister>
+  Thanks to Zoltán Böszörményi <zboszor@mail.externet.hu>
+- Don't save /dev/log in rc.devfs
+- Ported to kernel 2.3.4-pre1
+===============================================================================
+Changes for patch v106
+- Fixed silly typo in drivers/scsi/st.c
+- Improved debugging in <devfs_register>
+===============================================================================
+Changes for patch v107
+- Added "diunlink" and "nokmod" boot options
+- Removed superfluous warning message in <devfs_d_iput>
+===============================================================================
+Changes for patch v108
+- Remove entries when unloading sound module
+===============================================================================
+Changes for patch v109
+- Ported to kernel 2.3.6-pre2
+===============================================================================
+Changes for patch v110
+- Took account of change to <d_alloc_root>
+===============================================================================
+Changes for patch v111
+- Created separate event queue for each mounted devfs
+- Removed <devfs_invalidate_dcache>
+- Created new ioctl()s for devfsd
+- Incremented devfsd protocol revision to 3
+- Fixed bug when re-creating directories: contents were lost
+- Block access to inodes until devfsd updates permissions
+===============================================================================
+Changes for patch v112
+- Modified patch so it applies against 2.3.5 and 2.3.6
+- Updated an email address in ChangeLog
+- Do not automatically change ownership/protection of /dev/tty<n>
+- Updated sample modules.conf
+- Switched to sending process uid/gid to devfsd
+- Renamed <call_kmod> to <try_modload>
+- Added DEVFSD_NOTIFY_LOOKUP event
+- Added DEVFSD_NOTIFY_CHANGE event
+- Added DEVFSD_NOTIFY_CREATE event
+- Incremented devfsd protocol revision to 4
+- Moved kernel-specific stuff to include/linux/devfs_fs_kernel.h
+===============================================================================
+Changes for patch v113
+- Ported to kernel 2.3.9
+- Restricted permissions on some block devices
+===============================================================================
+Changes for patch v114
+- Added support for /dev/netlink
+  Thanks to Dennis Hou <smilax@mindmeld.yi.org>
+- Return EISDIR rather than EINVAL for read(2) on directories
+- Ported to kernel 2.3.10
+===============================================================================
+Changes for patch v115
+- Added support for all remaining character devices
+  Thanks to Dennis Hou <smilax@mindmeld.yi.org>
+- Cleaned up netlink support
+===============================================================================
+Changes for patch v116
+- Added support for /dev/parport%d
+  Thanks to Tim Waugh <tim@cyberelk.demon.co.uk>
+- Fixed parallel port ATAPI tape driver
+- Fixed Atari SLM laser printer driver
+===============================================================================
+Changes for patch v117
+- Added support for COSA card
+  Thanks to Dennis Hou <smilax@mindmeld.yi.org>
+- Fixed drivers/char/ppdev.c: missing #include <linux/init.h>
+- Fixed drivers/char/ftape/zftape/zftape-init.c
+  Thanks to Vladimir Popov <mashgrad@usa.net>
+===============================================================================
+Changes for patch v118
+- Ported to kernel 2.3.15-pre3
+- Fixed bug in loop driver
+- Unregister /dev/lp%d entries in drivers/char/lp.c
+  Thanks to Maciej W. Rozycki <macro@ds2.pg.gda.pl>
+===============================================================================
+Changes for patch v119
+- Ported to kernel 2.3.16
+===============================================================================
+Changes for patch v120
+- Fixed bug in drivers/scsi/scsi.c
+- Added /dev/ppp
+  Thanks to Dennis Hou <smilax@mindmeld.yi.org>
+- Ported to kernel 2.3.17
+===============================================================================
+Changes for patch v121
+- Fixed bug in drivers/block/loop.c
+- Ported to kernel 2.3.18
+===============================================================================
+Changes for patch v122
+- Ported to kernel 2.3.19
+===============================================================================
+Changes for patch v123
+- Ported to kernel 2.3.20
+===============================================================================
+Changes for patch v124
+- Ported to kernel 2.3.21
+===============================================================================
+Changes for patch v125
+- Created <devfs_get_info>, <devfs_set_info>,
+  <devfs_get_first_child> and <devfs_get_next_sibling>
+  Added <<dir>> parameter to <devfs_register>, <devfs_mk_compat>,
+  <devfs_mk_dir> and <devfs_find_handle>
+  Work sponsored by SGI
+- Fixed apparent bug in COSA driver
+- Re-instated "scsihosts=" boot option
+===============================================================================
+Changes for patch v126
+- Always create /dev/pts if CONFIG_UNIX98_PTYS=y
+- Fixed call to <devfs_mk_dir> in drivers/block/ide-disk.c
+  Thanks to Dennis Hou <smilax@mindmeld.yi.org>
+- Allow multiple unregistrations
+- Created /dev/scsi hierarchy
+  Work sponsored by SGI
+===============================================================================
+Changes for patch v127
+Work sponsored by SGI
+- No longer disable devpts if devfs enabled (caveat emptor)
+- Added flags array to struct gendisk and removed code from
+  drivers/scsi/sd.c
+- Created /dev/discs hierarchy
+===============================================================================
+Changes for patch v128
+Work sponsored by SGI
+- Created /dev/cdroms hierarchy
+===============================================================================
+Changes for patch v129
+Work sponsored by SGI
+- Removed compatibility entries for sound devices
+- Removed compatibility entries for printer devices
+- Removed compatibility entries for video4linux devices
+- Removed compatibility entries for parallel port devices
+- Removed compatibility entries for frame buffer devices
+===============================================================================
+Changes for patch v130
+Work sponsored by SGI
+- Added major and minor number to devfsd protocol
+- Incremented devfsd protocol revision to 5
+- Removed compatibility entries for SoundBlaster CD-ROMs
+- Removed compatibility entries for netlink devices
+- Removed compatibility entries for SCSI generic devices
+- Removed compatibility entries for SCSI tape devices
+===============================================================================
+Changes for patch v131
+Work sponsored by SGI
+- Support info pointer for all devfs entry types
+- Added <<info>> parameter to <devfs_mk_dir> and <devfs_mk_symlink>
+- Removed /dev/st hierarchy
+- Removed /dev/sg hierarchy
+- Removed compatibility entries for loop devices
+- Removed compatibility entries for IDE tape devices
+- Removed compatibility entries for SCSI CD-ROMs
+- Removed /dev/sr hierarchy
+===============================================================================
+Changes for patch v132
+Work sponsored by SGI
+- Removed compatibility entries for floppy devices
+- Removed compatibility entries for RAMDISCs
+- Removed compatibility entries for meta-devices
+- Removed compatibility entries for SCSI discs
+- Created <devfs_make_root>
+- Removed /dev/sd hierarchy
+- Support "../" when searching devfs namespace
+- Created /dev/ide/host* hierarchy
+- Supported IDE hard discs in /dev/ide/host* hierarchy
+- Removed compatibility entries for IDE discs
+- Removed /dev/ide/hd hierarchy
+- Supported IDE CD-ROMs in /dev/ide/host* hierarchy
+- Removed compatibility entries for IDE CD-ROMs
+- Removed /dev/ide/cd hierarchy
+===============================================================================
+Changes for patch v133
+Work sponsored by SGI
+- Created <devfs_get_unregister_slave>
+- Fixed bug in fs/partitions/check.c when rescanning
+===============================================================================
+Changes for patch v134
+Work sponsored by SGI
+- Removed /dev/sd, /dev/sr, /dev/st and /dev/sg directories
+- Removed /dev/ide/hd directory
+- Exported <devfs_get_parent>
+- Created <devfs_register_tape> and /dev/tapes hierarchy
+- Removed /dev/ide/mt hierarchy
+- Removed /dev/ide/fd hierarchy
+- Ported to kernel 2.3.25
+===============================================================================
+Changes for patch v135
+Work sponsored by SGI
+- Removed compatibility entries for virtual console capture devices
+- Removed unused <devfs_set_symlink_destination>
+- Removed compatibility entries for serial devices
+- Removed compatibility entries for console devices
+- Do not hide entries from devfsd or children
+- Removed DEVFS_FL_TTY_COMPAT flag
+- Removed "nottycompat" boot option
+- Removed <devfs_mk_compat>
+===============================================================================
+Changes for patch v136
+Work sponsored by SGI
+- Moved BSD pty devices to /dev/pty
+- Added DEVFS_FL_WAIT flag
+===============================================================================
+Changes for patch v137
+Work sponsored by SGI
+- Really fixed bug in fs/partitions/check.c when rescanning
+- Support new "disc" naming scheme in <get_removable_partition>
+- Allow NULL fops in <devfs_register>
+- Removed redundant name functions in SCSI disc and IDE drivers
+===============================================================================
+Changes for patch v138
+Work sponsored by SGI
+- Fixed old bugs in drivers/block/paride/pt.c, drivers/char/tpqic02.c,
+  drivers/net/wan/cosa.c and drivers/scsi/scsi.c
+  Thanks to Sergey Kubushin <ksi@ksi-linux.com>
+- Fall back to major table if NULL fops given to <devfs_register>
+===============================================================================
+Changes for patch v139
+Work sponsored by SGI
+- Corrected and moved <get_blkfops> and <get_chrfops> declarations
+  from arch/alpha/kernel/osf_sys.c to include/linux/fs.h
+- Removed name function from struct gendisk
+- Updated devfs FAQ
+===============================================================================
+Changes for patch v140
+Work sponsored by SGI
+- Ported to kernel 2.3.27
+===============================================================================
+Changes for patch v141
+Work sponsored by SGI
+- Bug fix in arch/m68k/atari/joystick.c
+- Moved ISDN and capi devices to /dev/isdn
+===============================================================================
+Changes for patch v142
+Work sponsored by SGI
+- Bug fix in drivers/block/ide-probe.c (patch confusion)
+===============================================================================
+Changes for patch v143
+Work sponsored by SGI
+- Bug fix in drivers/block/blkpg.c:partition_name()
+===============================================================================
+Changes for patch v144
+Work sponsored by SGI
+- Ported to kernel 2.3.29
+- Removed calls to <devfs_register> from cdu31a, cm206, mcd and mcdx
+  CD-ROM drivers: generic driver handles this now
+- Moved joystick devices to /dev/joysticks
+===============================================================================
+Changes for patch v145
+Work sponsored by SGI
+- Ported to kernel 2.3.30-pre3
+- Register whole-disc entry even for invalid partition tables
+- Fixed bug in mounting root FS when initrd enabled
+- Fixed device entry leak with IDE CD-ROMs
+- Fixed compile problem with drivers/isdn/isdn_common.c
+- Moved COSA devices to /dev/cosa
+- Support fifos when unregistering
+- Created <devfs_register_series> and used in many drivers
+- Moved Coda devices to /dev/coda
+- Moved parallel port IDE tapes to /dev/pt
+- Moved parallel port IDE generic devices to /dev/pg
+===============================================================================
+Changes for patch v146
+Work sponsored by SGI
+- Removed obsolete DEVFS_FL_COMPAT and DEVFS_FL_TOLERANT flags
+- Fixed compile problem with fs/coda/psdev.c
+- Reinstate change to <devfs_register_blkdev> in
+  drivers/block/ide-probe.c now that fs/isofs/inode.c is fixed
+- Switched to <devfs_register_blkdev> in drivers/block/floppy.c,
+  drivers/scsi/sr.c and drivers/block/md.c
+- Moved DAC960 devices to /dev/dac960
+===============================================================================
+Changes for patch v147
+Work sponsored by SGI
+- Ported to kernel 2.3.32-pre4
+===============================================================================
+Changes for patch v148
+Work sponsored by SGI
+- Removed kmod support: use devfsd instead
+- Moved miscellaneous character devices to /dev/misc
+===============================================================================
+Changes for patch v149
+Work sponsored by SGI
+- Ensure include/linux/joystick.h is OK for user-space
+- Improved debugging in <get_vfs_inode>
+- Ensure dentries created by devfsd will be cleaned up
+===============================================================================
+Changes for patch v150
+Work sponsored by SGI
+- Ported to kernel 2.3.34
+===============================================================================
+Changes for patch v151
+Work sponsored by SGI
+- Ported to kernel 2.3.35-pre1
+- Created <devfs_get_name>
+===============================================================================
+Changes for patch v152
+Work sponsored by SGI
+- Updated sample modules.conf
+- Ported to kernel 2.3.36-pre1
+===============================================================================
+Changes for patch v153
+Work sponsored by SGI
+- Ported to kernel 2.3.42
+- Removed <devfs_fill_file>
+===============================================================================
+Changes for patch v154
+Work sponsored by SGI
+- Took account of device number changes for /dev/fb*
+===============================================================================
+Changes for patch v155
+Work sponsored by SGI
+- Ported to kernel 2.3.43-pre8
+- Moved /dev/tty0 to /dev/vc/0
+- Moved sequence number formatting from <_tty_make_name> to drivers
+===============================================================================
+Changes for patch v156
+Work sponsored by SGI
+- Fixed breakage in drivers/scsi/sd.c due to recent SCSI changes
+===============================================================================
+Changes for patch v157
+Work sponsored by SGI
+- Ported to kernel 2.3.45
+===============================================================================
+Changes for patch v158
+Work sponsored by SGI
+- Ported to kernel 2.3.46-pre2
+===============================================================================
+Changes for patch v159
+Work sponsored by SGI
+- Fixed drivers/block/md.c
+  Thanks to Mike Galbraith <mikeg@weiden.de>
+- Documentation fixes
+- Moved device registration from <lp_init> to <lp_register>
+  Thanks to Tim Waugh <twaugh@redhat.com>
+===============================================================================
+Changes for patch v160
+Work sponsored by SGI
+- Fixed drivers/char/joystick/joystick.c
+  Thanks to Vojtech Pavlik <vojtech@suse.cz>
+- Documentation updates
+- Fixed arch/i386/kernel/mtrr.c if procfs and devfs not enabled
+- Fixed drivers/char/stallion.c
+===============================================================================
+Changes for patch v161
+Work sponsored by SGI
+- Remove /dev/ide when ide-mod is unloaded
+- Fixed bug in drivers/block/ide-probe.c when secondary but no primary
+- Added DEVFS_FL_NO_PERSISTENCE flag
+- Used new DEVFS_FL_NO_PERSISTENCE flag for Unix98 pty slaves
+- Removed unnecessary call to <update_devfs_inode_from_entry> in
+  <devfs_readdir>
+- Only set auto-ownership for /dev/pty/s*
+===============================================================================
+Changes for patch v162
+Work sponsored by SGI
+- Set inode->i_size to correct size for symlinks
+  Thanks to Jeremy Fitzhardinge <jeremy@goop.org>
+- Only give lookup() method to directories to comply with new VFS
+  assumptions
+- Remove unnecessary tests in symlink methods
+- Don't kill existing block ops in <devfs_read_inode>
+- Restore auto-ownership for /dev/pty/m*
+===============================================================================
+Changes for patch v163
+Work sponsored by SGI
+- Don't create missing directories in <devfs_find_handle>
+- Removed Documentation/filesystems/devfs/mk-devlinks
+- Updated Documentation/filesystems/devfs/README
+===============================================================================
+Changes for patch v164
+Work sponsored by SGI
+- Fixed CONFIG_DEVFS breakage in drivers/char/serial.c introduced in
+  linux-2.3.99-pre6-7
+===============================================================================
+Changes for patch v165
+Work sponsored by SGI
+- Ported to kernel 2.3.99-pre6
+===============================================================================
+Changes for patch v166
+Work sponsored by SGI
+- Added CONFIG_DEVFS_MOUNT
+===============================================================================
+Changes for patch v167
+Work sponsored by SGI
+- Updated Documentation/filesystems/devfs/README
+- Updated sample modules.conf
+===============================================================================
+Changes for patch v168
+Work sponsored by SGI
+- Disabled multi-mount capability (use VFS bindings instead)
+- Updated README from master HTML file
+===============================================================================
+Changes for patch v169
+Work sponsored by SGI
+- Removed multi-mount code
+- Removed compatibility macros: VFS has changed too much
+===============================================================================
+Changes for patch v170
+Work sponsored by SGI
+- Updated README from master HTML file
+- Merged devfs inode into devfs entry
+===============================================================================
+Changes for patch v171
+Work sponsored by SGI
+- Updated sample modules.conf
+- Removed dead code in <devfs_register> which used to call
+  <free_dentries>
+- Ported to kernel 2.4.0-test2-pre3
+===============================================================================
+Changes for patch v172
+Work sponsored by SGI
+- Changed interface to <devfs_register>
+- Changed interface to <devfs_register_series>
+===============================================================================
+Changes for patch v173
+Work sponsored by SGI
+- Simplified interface to <devfs_mk_symlink>
+- Simplified interface to <devfs_mk_dir>
+- Simplified interface to <devfs_find_handle>
+===============================================================================
+Changes for patch v174
+Work sponsored by SGI
+- Updated README from master HTML file
+===============================================================================
+Changes for patch v175
+Work sponsored by SGI
+- DocBook update for fs/devfs/base.c
+  Thanks to Tim Waugh <twaugh@redhat.com>
+- Removed stale fs/tunnel.c (was never used or completed)
+===============================================================================
+Changes for patch v176
+Work sponsored by SGI
+- Updated ToDo list
+- Removed sample modules.conf: now distributed with devfsd
+- Updated README from master HTML file
+- Ported to kernel 2.4.0-test3-pre4 (which had devfs-patch-v174)
+===============================================================================
+Changes for patch v177
+- Updated README from master HTML file
+- Documentation cleanups
+- Ensure <devfs_generate_path> terminates string for root entry
+  Thanks to Tim Jansen <tim@tjansen.de>
+- Exported <devfs_get_name> to modules
+- Make <devfs_mk_symlink> send events to devfsd
+- Cleaned up option processing in <devfs_setup>
+- Fixed bugs in handling symlinks: could leak or cause Oops
+- Cleaned up directory handling by separating fops
+  Thanks to Alexander Viro <viro@parcelfarce.linux.theplanet.co.uk>
+===============================================================================
+Changes for patch v178
+- Fixed handling of inverted options in <devfs_setup>
+===============================================================================
+Changes for patch v179
+- Adjusted <try_modload> to account for <devfs_generate_path> fix
+===============================================================================
+Changes for patch v180
+- Fixed !CONFIG_DEVFS_FS stub declaration of <devfs_get_info>
+===============================================================================
+Changes for patch v181
+- Answered question posed by Al Viro and removed his comments from <devfs_open>
+- Moved setting of registered flag after other fields are changed
+- Fixed race between <devfsd_close> and <devfsd_notify_one>
+- Global VFS changes added bogus BKL to devfsd_close(): removed
+- Widened locking in <devfs_readlink> and <devfs_follow_link>
+- Replaced <devfsd_read> stack usage with <devfsd_ioctl> kmalloc
+- Simplified locking in <devfsd_ioctl> and fixed memory leak
+===============================================================================
+Changes for patch v182
+- Created <devfs_*alloc_major> and <devfs_*alloc_devnum>
+- Removed broken devnum allocation and use <devfs_alloc_devnum>
+- Fixed old devnum leak by calling new <devfs_dealloc_devnum>
+- Created <devfs_*alloc_unique_number>
+- Fixed number leak for /dev/cdroms/cdrom%d
+- Fixed number leak for /dev/discs/disc%d
+===============================================================================
+Changes for patch v183
+- Fixed bug in <devfs_setup> which could hang boot process
+===============================================================================
+Changes for patch v184
+- Documentation typo fix for fs/devfs/util.c
+- Fixed drivers/char/stallion.c for devfs
+- Added DEVFSD_NOTIFY_DELETE event
+- Updated README from master HTML file
+- Removed #include <asm/segment.h> from fs/devfs/base.c
+===============================================================================
+Changes for patch v185
+- Made <block_semaphore> and <char_semaphore> in fs/devfs/util.c
+  private
+- Fixed inode table races by removing it and using inode->u.generic_ip
+  instead
+- Moved <devfs_read_inode> into <get_vfs_inode>
+- Moved <devfs_write_inode> into <devfs_notify_change>
+===============================================================================
+Changes for patch v186
+- Fixed race in <devfs_do_symlink> for uni-processor
+- Updated README from master HTML file
+===============================================================================
+Changes for patch v187
+- Fixed drivers/char/stallion.c for devfs
+- Fixed drivers/char/rocket.c for devfs
+- Fixed bug in <devfs_alloc_unique_number>: limited to 128 numbers
+===============================================================================
+Changes for patch v188
+- Updated major masks in fs/devfs/util.c up to Linus' "no new majors"
+  proclamation. Block: were 126 now 122 free, char: were 26 now 19 free
+- Updated README from master HTML file
+- Removed remnant of multi-mount support in <devfs_mknod>
+- Removed unused DEVFS_FL_SHOW_UNREG flag
+===============================================================================
+Changes for patch v189
+- Removed nlink field from struct devfs_inode
+- Removed auto-ownership for /dev/pty/* (BSD ptys) and used
+  DEVFS_FL_CURRENT_OWNER|DEVFS_FL_NO_PERSISTENCE for /dev/pty/s* (just
+  like Unix98 pty slaves) and made /dev/pty/m* rw-rw-rw- access
+===============================================================================
+Changes for patch v190
+- Updated README from master HTML file
+- Replaced BKL with global rwsem to protect symlink data (quick and
+  dirty hack)
+===============================================================================
+Changes for patch v191
+- Replaced global rwsem for symlink with per-link refcount
+===============================================================================
+Changes for patch v192
+- Removed unnecessary #ifdef CONFIG_DEVFS_FS from arch/i386/kernel/mtrr.c
+- Ported to kernel 2.4.10-pre11
+- Set inode->i_mapping->a_ops for block nodes in <get_vfs_inode>
+===============================================================================
+Changes for patch v193
+- Went back to global rwsem for symlinks (refcount scheme no good)
+===============================================================================
+Changes for patch v194
+- Fixed overrun in <devfs_link> by removing function (not needed)
+- Updated README from master HTML file
+===============================================================================
+Changes for patch v195
+- Fixed buffer underrun in <try_modload>
+- Moved down_read() from <search_for_entry_in_dir> to <find_entry>
+===============================================================================
+Changes for patch v196
+- Fixed race in <devfsd_ioctl> when setting event mask
+  Thanks to Kari Hurtta <hurtta@leija.mh.fmi.fi>
+- Avoid deadlock in <devfs_follow_link> by using temporary buffer
+===============================================================================
+Changes for patch v197
+- First release of new locking code for devfs core (v1.0)
+- Fixed bug in drivers/cdrom/cdrom.c
+===============================================================================
+Changes for patch v198
+- Discard temporary buffer, now use "%s" for dentry names
+- Don't generate path in <try_modload>: use fake entry instead
+- Use "existing" directory in <_devfs_make_parent_for_leaf>
+- Use slab cache rather than fixed buffer for devfsd events
+===============================================================================
+Changes for patch v199
+- Removed obsolete usage of DEVFS_FL_NO_PERSISTENCE
+- Send DEVFSD_NOTIFY_REGISTERED events in <devfs_mk_dir>
+- Fixed locking bug in <devfs_d_revalidate_wait> due to typo
+- Do not send CREATE, CHANGE, ASYNC_OPEN or DELETE events from devfsd
+  or children
+===============================================================================
+Changes for patch v200
+- Ported to kernel 2.5.1-pre2
+===============================================================================
+Changes for patch v201
+- Fixed bug in <devfsd_read>: was dereferencing freed pointer
+===============================================================================
+Changes for patch v202
+- Fixed bug in <devfsd_close>: was dereferencing freed pointer
+- Added process group check for devfsd privileges
+===============================================================================
+Changes for patch v203
+- Use SLAB_ATOMIC in <devfsd_notify_de> from <devfs_d_delete>
+===============================================================================
+Changes for patch v204
+- Removed long obsolete rc.devfs
+- Return old entry in <devfs_mk_dir> for 2.4.x kernels
+- Updated README from master HTML file
+- Increment refcount on module in <check_disc_changed>
+- Created <devfs_get_handle> and exported <devfs_put>
+- Increment refcount on module in <devfs_get_ops>
+- Created <devfs_put_ops> and used where needed to fix races
+- Added clarifying comments in response to preliminary EMC code review
+- Added poisoning to <devfs_put>
+- Improved debugging messages
+- Fixed unregister bugs in drivers/md/lvm-fs.c
+===============================================================================
+Changes for patch v205
+- Corrected (made useful) debugging message in <unregister>
+- Moved <kmem_cache_create> in <mount_devfs_fs> to <init_devfs_fs>
+- Fixed drivers/md/lvm-fs.c to create "lvm" entry
+- Added magic number to guard against scribbling drivers
+- Only return old entry in <devfs_mk_dir> if a directory
+- Defined macros for error and debug messages
+- Updated README from master HTML file
+===============================================================================
+Changes for patch v206
+- Added support for multiple Compaq cpqarray controllers
+- Fixed (rare, old) race in <devfs_lookup>
+===============================================================================
+Changes for patch v207
+- Fixed deadlock bug in <devfs_d_revalidate_wait>
+- Tag VFS deletable in <devfs_mk_symlink> if handle ignored
+- Updated README from master HTML file
+===============================================================================
+Changes for patch v208
+- Added KERN_* to remaining messages
+- Cleaned up declaration of <stat_read>
+- Updated README from master HTML file
+===============================================================================
+Changes for patch v209
+- Updated README from master HTML file
+- Removed silently introduced calls to lock_kernel() and
+  unlock_kernel() due to recent VFS locking changes. BKL isn't
+  required in devfs
+- Changed <devfs_rmdir> to allow later additions if not yet empty
+- Added calls to <devfs_register_partitions> in drivers/block/blkpg.c
+  <add_partition> and <del_partition>
+- Fixed bug in <devfs_alloc_unique_number>: was clearing beyond
+  bitfield
+- Fixed bitfield data type for <devfs_*alloc_devnum>
+- Made major bitfield type and initialiser 64 bit safe
+===============================================================================
+Changes for patch v210
+- Updated fs/devfs/util.c to fix shift warning on 64 bit machines
+  Thanks to Anton Blanchard <anton@samba.org>
+- Updated README from master HTML file
+===============================================================================
+Changes for patch v211
+- Do not put miscellaneous character devices in /dev/misc if they
+  specify their own directory (i.e. contain a '/' character)
+- Copied macro for error messages from fs/devfs/base.c to
+  fs/devfs/util.c and made use of this macro
+- Removed 2.4.x compatibility code from fs/devfs/base.c
+===============================================================================
+Changes for patch v212
+- Added BKL to <devfs_open> because drivers still need it
+===============================================================================
+Changes for patch v213
+- Protected <scan_dir_for_removable> and <get_removable_partition>
+  from changing directory contents
+===============================================================================
+Changes for patch v214
+- Switched to ISO C structure field initialisers
+- Switch to set_current_state() and move before add_wait_queue()
+- Updated README from master HTML file
+- Fixed devfs entry leak in <devfs_readdir> when *readdir fails
+===============================================================================
+Changes for patch v215
+- Created <devfs_find_and_unregister>
+- Switched many functions from <devfs_find_handle> to
+  <devfs_find_and_unregister>
+- Switched many functions from <devfs_find_handle> to <devfs_get_handle>
+===============================================================================
+Changes for patch v216
+- Switched arch/ia64/sn/io/hcl.c from <devfs_find_handle> to
+  <devfs_get_handle>
+- Removed deprecated <devfs_find_handle>
+===============================================================================
+Changes for patch v217
+- Exported <devfs_find_and_unregister> and <devfs_only> to modules
+- Updated README from master HTML file
+- Fixed module unload race in <devfs_open>
+===============================================================================
+Changes for patch v218
+- Removed DEVFS_FL_AUTO_OWNER flag
+- Switched lingering structure field initialiser to ISO C
+- Added locking when setting/clearing flags
+- Documentation fix in fs/devfs/util.c
diff --git a/Documentation/filesystems/devfs/README b/Documentation/filesystems/devfs/README
new file mode 100644
index 000000000000..54366ecc241f
--- /dev/null
+++ b/Documentation/filesystems/devfs/README
@@ -0,0 +1,1964 @@
+Linux Devfs (Device File System) FAQ
+Richard Gooch
+20-AUG-2002
+-----------------------------------------------------------------------------
+NOTE: the master copy of this document is available online at:
+http://www.atnf.csiro.au/~rgooch/linux/docs/devfs.html
+and looks much better than the text version distributed with the
+kernel sources. A mirror site is available at:
+http://www.ras.ucalgary.ca/~rgooch/linux/docs/devfs.html
+There is also an optional daemon that may be used with devfs. You can
+find out more about it at:
+http://www.atnf.csiro.au/~rgooch/linux/
+A mailing list is available which you may subscribe to. Send email to
+majordomo@oss.sgi.com with the following line in the body of the
+message:
+subscribe devfs
+To unsubscribe, send the message body:
+unsubscribe devfs
+instead. The list is archived at
+http://oss.sgi.com/projects/devfs/archive/.
+-----------------------------------------------------------------------------
+Contents
+What is it?
+Why do it?
+Who else does it?
+How it works
+Operational issues (essential reading)
+Instructions for the impatient
+Permissions persistence across reboots
+Dealing with drivers without devfs support
+All the way with Devfs
+Other Issues
+Kernel Naming Scheme
+Devfsd Naming Scheme
+Old Compatibility Names
+SCSI Host Probing Issues
+Device drivers currently ported
+Allocation of Device Numbers
+Questions and Answers
+Making things work
+Alternatives to devfs
+What I don't like about devfs
+How to report bugs
+Strange kernel messages
+Compilation problems with devfsd
+Other resources
+Translations of this document
+-----------------------------------------------------------------------------
+What is it?
+Devfs is an alternative to "real" character and block special devices
+on your root filesystem. Kernel device drivers can register devices by
+name rather than major and minor numbers. These devices will appear in
+devfs automatically, with whatever default ownership and
+protection the driver specified. A daemon (devfsd) can be used to
+override these defaults. Devfs has been in the kernel since 2.3.46.
+NOTE that devfs is entirely optional. If you prefer the old
+disc-based device nodes, then simply leave CONFIG_DEVFS_FS=n (the
+default). In this case, nothing will change.  ALSO NOTE that if you do
+enable devfs, the defaults are such that full compatibility is
+maintained with the old device names.
+There are two aspects to devfs: one is the underlying device
+namespace, which is a namespace just like any mounted filesystem. The
+other aspect is the filesystem code which provides a view of the
+device namespace. The reason I make a distinction is because devfs
+can be mounted many times, with each mount showing the same device
+namespace. Changes made are global to all mounted devfs filesystems.
+Also, because the devfs namespace exists without any devfs mounts, you
+can easily mount the root filesystem by referring to an entry in the
+devfs namespace.
+The cost of devfs is a small increase in kernel code size and memory
+usage: about 7 pages of code (some of that in __init sections) and 72
+bytes for each entry in the namespace. A modest system has only a
+couple of hundred device entries, so this costs a few more
+pages. Compare this with the suggestion to put /dev on a ramdisc.
+On a typical machine, the cost is under 0.2 percent. On a modest
+system with 64 MBytes of RAM, the cost is under 0.1 percent.  The
+accusations of "bloatware" levelled at devfs are not justified.
+-----------------------------------------------------------------------------
+Why do it?
+There are several problems that devfs addresses. Some of these
+problems are more serious than others (depending on your point of
+view), and some can be solved without devfs. However, the totality of
+these problems really calls out for devfs.
+The choice is between a patchwork of inefficient user-space solutions,
+which are complex and likely to be fragile, and a simple and efficient
+devfs which is robust.
+There have been many counter-proposals to devfs, all seeking to
+provide some of the benefits without actually implementing devfs. So
+far there has been an absence of code and no proposed alternative has
+been able to provide all the features that devfs does. Further,
+alternative proposals require far more complexity in user-space (and
+still deliver less functionality than devfs). Some people have the
+mantra of reducing "kernel bloat", but don't consider the effects on
+user-space.
+A good solution limits the total complexity of kernel-space and
+user-space.
+Major&minor allocation
+The existing scheme requires the allocation of major and minor device
+numbers for each and every device. This means that a central
+co-ordinating authority is required to issue these device numbers
+(unless you're developing a "private" device driver), in order to
+preserve uniqueness. Devfs shifts the burden to a namespace. This may
+not seem like a huge benefit, but actually it is. Since driver authors
+will naturally choose a device name which reflects the functionality
+of the device, there is far less potential for namespace conflict.
+Solving this requires a kernel change.
+/dev management
+Because you currently access devices through device nodes, these must
+be created by the system administrator. For standard devices you can
+usually find a MAKEDEV programme which creates all these nodes
+(hundreds of them!). This means that changes in the kernel must be
+reflected by changes in the MAKEDEV programme, or else the system
+administrator creates device nodes by hand.
+The basic problem is that there are two separate databases of
+major and minor numbers. One is in the kernel and one is in /dev (or
+in a MAKEDEV programme, if you want to look at it that way). This is
+duplication of information, which is not good practice.
+Solving this requires a kernel change.
+/dev growth
+A typical /dev has over 1200 nodes! Most of these devices simply don't
+exist because the hardware is not available. A huge /dev increases the
+time to access devices (I'm just referring to the dentry lookup times
+and the time taken to read inodes off disc: the next subsection shows
+some more horrors).
+An example of how big /dev can grow is if we consider SCSI devices:
+host           6  bits  (say up to 64 hosts on a really big machine)
+channel        4  bits  (say up to 16 SCSI buses per host)
+id             4  bits
+lun            3  bits
+partition      6  bits
+TOTAL          23 bits
+This requires 8 Mega (8 * 1024 * 1024) inodes if we want to store all
+possible device nodes. Even if we scrap everything but id and partition,
+and assume a single host adapter with a single SCSI bus and only one
+logical unit per SCSI target (id), that's still 10 bits or 1024
+inodes. Each VFS inode takes around 256 bytes (kernel 2.1.78), so
+that's 256 kBytes of inode storage on disc (assuming real inodes take
+a similar amount of space as VFS inodes). This is actually not so bad,
+because disc is cheap these days. Embedded systems would care about
+256 kBytes of /dev inodes, but you could argue that embedded systems
+would have hand-tuned /dev directories. I've had to do just that on my
+embedded systems, but I would rather just leave it to devfs.
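The arithmetic in the /dev growth discussion above can be checked with a short Python sketch. The field widths and the 256-byte inode size are taken from the text; the names `SCSI_FIELD_BITS`, `node_count` and `inode_storage_kbytes` are invented for this illustration and are not part of devfs.

```python
# Field widths (in bits) copied from the SCSI table above.
SCSI_FIELD_BITS = {"host": 6, "channel": 4, "id": 4, "lun": 3, "partition": 6}

# Approximate inode size quoted in the text for kernel 2.1.78.
INODE_BYTES = 256


def node_count(fields):
    """Number of distinct device nodes addressable by the given fields."""
    return 2 ** sum(SCSI_FIELD_BITS[name] for name in fields)


def inode_storage_kbytes(fields):
    """Storage needed for one inode per possible node, in kBytes."""
    return node_count(fields) * INODE_BYTES // 1024


all_fields = list(SCSI_FIELD_BITS)
print(sum(SCSI_FIELD_BITS.values()), "bits ->", node_count(all_fields), "nodes")
print("id+partition ->", node_count(["id", "partition"]), "nodes,",
      inode_storage_kbytes(["id", "partition"]), "kBytes of inodes")
```

Changing the entries of `SCSI_FIELD_BITS` shows how quickly a static /dev would grow if any of the fields were widened.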
+Another issue is the time taken to look up an inode when first
+referenced. This costs not only the time to scan through a list in
+memory, but also the seek times needed to read the inodes off disc.
+This could be solved in user-space using a clever programme which
+scanned the kernel logs, deleting /dev entries for devices which are
+not available and creating them when the devices become available.
+This programme would need to be run every time a new module was
+loaded, which would slow things down a lot.
+There is an existing programme called scsidev which will automatically
+create device nodes for SCSI devices. It can do this by scanning files
+in /proc/scsi. Unfortunately, to extend this idea to other device
+nodes would require significant modifications to existing drivers (so
+they too would provide information in /proc). This is a non-trivial
+change (I should know: devfs has had to do something similar). Once
+you go to this much effort, you may as well use devfs itself (which
+also provides this information).  Furthermore, such a system would
+likely be implemented in an ad-hoc fashion, as different drivers will
+provide their information in different ways.
+Devfs is much cleaner, because it (naturally) has a uniform mechanism
+to provide this information: the device nodes themselves!
+Node to driver file_operations translation
+There is an important difference between the way disc-based character
+and block nodes and devfs entries make the connection between an entry
+in /dev and the actual device driver.
+With the current 8 bit major and minor numbers the connection between
+disc-based c&b nodes and per-major drivers is done through a
+fixed-length table of 128 entries. The various filesystem types set
+the inode operations for c&b nodes to {chr,blk}dev_inode_operations,
+so when a device is opened a few quick levels of indirection bring us
+to the driver file_operations.
+For miscellaneous character devices a second step is required: there
+is a scan for the driver entry with the same minor number as the file
+that was opened, and the appropriate minor open method is called. This
+scanning is done *every time* you open a device node. Potentially, you
+may be searching through dozens of misc. entries before you find your
+open method. While not an enormous performance overhead, this does
+seem pointless.
+Linux *must* move beyond the 8 bit major and minor barrier,
+somehow. If we simply increase each to 16 bits, then the indexing
+scheme used for major driver lookup becomes untenable, because the
+major tables (one each for character and block devices) would need to
+be 64 k entries long (512 kBytes on x86, 1 MByte for 64 bit
+systems). So we would have to use a scheme like that used for
+miscellaneous character devices, which means the search time goes up
+linearly with the average number of major device drivers on your
+system. Not all "devices" are hardware: some are higher-level drivers
+like KGI, so you can get more "devices" without adding hardware.
+You can improve this by creating an ordered (balanced:-)
+binary tree, in which case your search time becomes proportional to
+log(N). Alternatively, you can use hashing to speed up the search.
+But why do that search at all if you don't have to? Once again, it
+seems pointless.
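The table sizes and search costs discussed above can be reproduced with a small Python sketch. It assumes that each major-table entry holds two pointers; that layout is an assumption chosen because it matches the quoted 512 kByte and 1 MByte figures, not something stated in the text. The function names are hypothetical.

```python
# Assumption for illustration: each major-table entry holds two pointers
# (for example a name and a file_operations pointer). This reproduces the
# sizes quoted in the text but is not taken from the kernel sources.
POINTERS_PER_ENTRY = 2


def major_table_kbytes(major_bits, pointer_bytes):
    """Size in kBytes of a table indexed directly by major number."""
    entries = 2 ** major_bits
    return entries * POINTERS_PER_ENTRY * pointer_bytes // 1024


def worst_case_steps(drivers):
    """Worst-case comparisons for a linear scan and a balanced binary tree."""
    linear = drivers
    balanced_tree = drivers.bit_length()  # floor(log2(drivers)) + 1 for drivers >= 1
    return linear, balanced_tree


print(major_table_kbytes(16, 4), "kBytes with 32 bit pointers")
print(major_table_kbytes(16, 8), "kBytes with 64 bit pointers")
print(worst_case_steps(64))
```

The second function only restates the trade-off named in the text: a linear scan grows with the number of drivers, while a balanced tree grows with its logarithm.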
+Note that devfs doesn't use the major&minor system. For devfs
+entries, the connection is done when you look up the /dev entry. When
+devfs_register() is called, an entry holding the name and the
+file_operations is appended to an internal table. If the dentry cache
+doesn't have the /dev entry already, this internal table is scanned to
+get the file_operations, and an inode is created. If the dentry cache already
+has the entry, there is *no lookup time* (other than the dentry scan
+itself, but we can't avoid that anyway, and besides Linux dentries
+cream other OS's which don't have them:-). Furthermore, the number of
+node entries in a devfs is only the number of available device
+entries, not the number of *conceivable* entries. Even if you remove
+unnecessary entries in a disc-based /dev, the number of conceivable
+entries remains the same: you just limit yourself in order to save
+space.
+Devfs provides a fast connection between a VFS node and the device
+driver, in a scalable way.
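The lookup path described above can be modelled in user space with a toy Python class. This is only a sketch of the idea (register by name, scan the table once, then serve later lookups from a cache); the class `ToyDevfs`, its methods, and the entry name `misc/psaux` are illustrative assumptions and do not reflect the kernel's data structures or the real devfs_register() interface.

```python
class ToyDevfs:
    """Toy user-space model of name-based registration; not kernel code."""

    def __init__(self):
        self.registered = []    # (name, ops) pairs, in registration order
        self.dentry_cache = {}  # name -> ops, filled on first lookup
        self.table_scans = 0    # how many times the table had to be scanned

    def register(self, name, ops):
        """Append an entry holding the name and its operations."""
        self.registered.append((name, ops))

    def lookup(self, name):
        """Return the operations for a name, scanning only on a cache miss."""
        if name in self.dentry_cache:
            return self.dentry_cache[name]
        self.table_scans += 1
        for entry_name, ops in self.registered:
            if entry_name == name:
                self.dentry_cache[name] = ops  # stands in for creating the inode
                return ops
        return None  # nothing registered under this name


devfs = ToyDevfs()
devfs.register("misc/psaux", {"open": lambda: "psaux opened"})
print(devfs.lookup("misc/psaux")["open"]())
devfs.lookup("misc/psaux")  # second lookup is served from the cache
print("table scans:", devfs.table_scans)
```

Only names that were actually registered appear in the model, which mirrors the point that devfs holds available entries rather than every conceivable one.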
+/dev as a system administration tool
+Right now /dev contains a list of conceivable devices, most of which I
+don't have. Devfs only shows those devices available on my
+system. This means that listing /dev is a handy way of checking what
+devices are available.
+Major&minor size
+Existing major and minor numbers are limited to 8 bits each. This is
+now a limiting factor for some drivers, particularly the SCSI disc
+driver, which consumes a single major number. Only 16 discs are
+supported, and each disc may have only 15 partitions. Maybe this isn't
+a problem for you, but some of us are building huge Linux systems with
+disc arrays. With devfs an arbitrary pointer can be associated with
+each device entry, which can be used to give an effective 32 bit
+device identifier (i.e. that's like having a 32 bit minor
+number). Since this is private to the kernel, there are no C library
+compatibility issues which you would have with increasing major and
+minor number sizes. See the section on "Allocation of Device Numbers"
+for details on maintaining compatibility with userspace.
+Solving this requires a kernel change.
+Since writing this, the kernel has been modified so that the SCSI disc
+driver has more major numbers allocated to it and now supports up to
+128 discs. Since these major numbers are non-contiguous (a result of
+unplanned expansion), the implementation is a little more cumbersome
+than originally.
+Just like the changes to IPv4 to fix impending limitations in the
+address space, people find ways around the limitations. In the long
+run, however, solutions like IPv6 or devfs can't be put off forever.
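+The limits quoted above follow from simple arithmetic on the 8 bit
+minor number. The sketch below reproduces them; the figure of 8 major
+numbers for 128 discs is derived here from the numbers in the text and
+is not stated by it.

```sh
#!/bin/sh
# 8-bit minor numbers give 2^8 = 256 minors per major number.
minors_per_major=$((1 << 8))
# Each disc uses one minor for the whole disc plus 15 for its partitions.
minors_per_disc=$((1 + 15))
discs_per_major=$((minors_per_major / minors_per_disc))
# Majors needed to reach the 128 discs mentioned above (derived, see lead-in).
majors_for_128=$((128 / discs_per_major))
echo "discs per major:      $discs_per_major"
echo "majors for 128 discs: $majors_for_128"
```

+This prints 16 discs per major and 8 majors for 128 discs.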
+Read-only root filesystem
+Having your device nodes on the root filesystem means that you can't
+operate properly with a read-only root filesystem. This is because you
+want to change ownerships and protections of tty devices. Existing
+practice prevents you using a CD-ROM as your root filesystem for a
+*real* system. Sure, you can boot off a CD-ROM, but you can't change
+tty ownerships, so it's only good for installing.
+Also, you can't use a shared NFS root filesystem for a cluster of
+discless Linux machines (having tty ownerships changed on a common
+/dev is not good). Nor can you embed your root filesystem in a
+ROM-FS.
+You can get around this by creating a RAMDISC at boot time, making
+an ext2 filesystem in it, mounting it somewhere and copying the
+contents of /dev into it, then unmounting it and mounting it over
+/dev.
+A devfs is a cleaner way of solving this.
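+For comparison, the RAMDISC workaround described above might look
+roughly like the following sketch. The names /dev/ram0 and /mnt are
+assumptions chosen for illustration, as is the DRY_RUN variable, which
+makes the function print its commands instead of running them.

```sh
#!/bin/sh
# Sketch only: /dev/ram0 and /mnt are assumed names, not taken from the text.
# With DRY_RUN set, commands are printed instead of executed.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

ramdisc_dev() {
    ramdev=${1:-/dev/ram0}   # assumed RAMDISC device
    tmpmnt=${2:-/mnt}        # assumed temporary mount point
    run mke2fs -q "$ramdev" &&                # make an ext2 filesystem in it
    run mount -t ext2 "$ramdev" "$tmpmnt" &&  # mount it somewhere
    run cp -a /dev/. "$tmpmnt"/ &&            # copy the contents of /dev
    run umount "$tmpmnt" &&                   # unmount it
    run mount -t ext2 "$ramdev" /dev          # mount it over /dev
}
```

+Running "DRY_RUN=1 ramdisc_dev" shows the five commands without
+touching the system. Run for real, it needs root and a free RAMDISC.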
+Non-Unix root filesystem
+Non-Unix filesystems (such as NTFS) can't be used for a root
+filesystem because they variously don't support character and block
+special files or symbolic links. You can't have a separate disc-based
+or RAMDISC-based filesystem mounted on /dev because you need device
+nodes before you can mount these. Devfs can be mounted without any
+device nodes. Devlinks won't work because symlinks aren't supported.
+An alternative solution is to use initrd to mount a RAMDISC initial
+root filesystem (which is populated with a minimal set of device
+nodes), and then construct a new /dev in another RAMDISC, and finally
+switch to your non-Unix root filesystem. This requires clever boot
+scripts and a fragile and conceptually complex boot procedure.
+Devfs solves this in a robust and conceptually simple way.
+PTY security
+Current pseudo-tty (pty) devices are owned by root and read-writable
+by everyone. The user of a pty-pair cannot change
+ownership/protections without being suid-root.
+This could be solved with a secure user-space daemon which runs as
+root and does the actual creation of pty-pairs. Such a daemon would
+require modification to *every* programme that wants to use this new
+mechanism. It also slows down creation of pty-pairs.
+An alternative is to create a new open_pty() syscall which does much
+the same thing as the user-space daemon. Once again, this requires
+modifications to pty-handling programmes.
+The devfs solution allows a device driver to "tag" certain device
+files so that when an unopened device is opened, the ownerships are
+changed to the current euid and egid of the opening process, and the
+protections are changed to the default registered by the driver. When
+the device is closed ownership is set back to root and protections are
+set back to read-write for everybody. No programme need be changed.
+The devpts filesystem provides this auto-ownership feature for Unix98
+ptys. It doesn't support old-style pty devices, nor does it have all
+the other features of devfs.
+Intelligent device management
+Devfs implements a simple yet powerful protocol for communication with
+a device management daemon (devfsd) which runs in user space. It is
+possible to send a message (either synchronously or asynchronously) to
+devfsd on any event, such as registration/unregistration of device
+entries, opening and closing devices, looking up inodes, scanning
+directories and more. This has many possibilities. Some of these are
+already implemented. See:
+http://www.atnf.csiro.au/~rgooch/linux/
+Device entry registration events can be used by devfsd to change
+permissions of newly-created device nodes. This is one mechanism to
+control device permissions.
+Device entry registration/unregistration events can be used to run
+programmes or scripts. This can be used to provide automatic mounting
+of filesystems when a new block device media is inserted into the
+drive.
+Asynchronous device open and close events can be used to implement
+clever permissions management. For example, the default permissions on
+/dev/dsp do not allow everybody to read from the device. This is
+sensible, as you don't want some remote user recording what you say at
+your console. However, the console user is also prevented from
+recording. This behaviour is not desirable. With asynchronous device
+open and close events, you can have devfsd run a programme or script
+when console devices are opened to change the ownerships for *other*
+device nodes (such as /dev/dsp). On closure, you can run a different
+script to restore permissions. An advantage of this scheme over
+modifying the C library tty handling is that this works even if your
+programme crashes (how many times have you seen the utmp database with
+lingering entries for non-existent logins?).
+Synchronous device open events can be used to perform intelligent
+device access protections. Before the device driver open() method is
+called, the daemon must first validate the open attempt, by running an
+external programme or script. This is far more flexible than access
+control lists, as access can be determined on the basis of other
+system conditions instead of just the UID and GID.
+Inode lookup events can be used to authenticate module autoload
+requests. Instead of using kmod directly, the event is sent to
+devfsd which can implement an arbitrary authentication before loading
+the module itself.
+Inode lookup events can also be used to construct arbitrary
+namespaces, without having to resort to populating devfs with symlinks
+to devices that don't exist.
+Speculative Device Scanning
+Consider an application (like cdparanoia) that wants to find all
+CD-ROM devices on the system (SCSI, IDE and other types), whether or
+not their respective modules are loaded. The application must
+speculatively open certain device nodes (such as /dev/sr0 for the SCSI
+CD-ROMs) in order to make sure the module is loaded. This requires
+that all Linux distributions follow the standard device naming scheme
+(last time I looked RedHat did things differently). Devfs solves the
+naming problem.
+The same application also wants to see which devices are actually
+available on the system. With the existing system it needs to read the
+/dev directory and speculatively open each /dev/sr* device to
+determine if the device exists or not. With a large /dev this is an
+inefficient operation, especially if there are many /dev/sr* nodes. A
+solution like scsidev could reduce the number of /dev/sr* entries (but
+of course that also requires all that inefficient directory scanning).
+With devfs, the application can open the /dev/sr directory
+(which triggers the module autoloading if required), and proceed to
+read /dev/sr. Since only the available devices will have
+entries, there are no inefficiencies in directory scanning or device
+openings.
+-----------------------------------------------------------------------------
+Who else does it?
+FreeBSD has a devfs implementation. Solaris and AIX each have a
+pseudo-devfs (something akin to scsidev but for all devices, with some
+unspecified kernel support). BeOS, Plan9 and QNX also have it. SGI's
+IRIX 6.4 and above also have a device filesystem.
+While we shouldn't just automatically do something because others do
+it, we should not ignore the work of others either. FreeBSD has a lot
+of competent people working on it, so their opinion should not be
+blithely ignored.
+-----------------------------------------------------------------------------
+How it works
+Registering device entries
+For every entry (device node) in a devfs-based /dev a driver must call
+devfs_register(). This adds the name of the device entry, the
+file_operations structure pointer and a few other things to an
+internal table. Device entries may be added and removed at any
+time. When a device entry is registered, it automagically appears in
+any mounted devfs'.
+Inode lookup
+When a lookup operation on an entry is performed and if there is no
+driver information for that entry devfs will attempt to call
+devfsd. If still no driver information can be found then a negative
+dentry is yielded and the next stage operation will be called by the
+VFS (such as create() or mknod() inode methods). If driver information
+can be found, an inode is created (if one does not exist already) and
+all is well.
+Manually creating device nodes
+The mknod() method allows you to create an ordinary named pipe in the
+devfs, or you can create a character or block special inode if one
+does not already exist. You may wish to create a character or block
+special inode so that you can set permissions and ownership. Later, if
+a device driver registers an entry with the same name, the
+permissions, ownership and times are retained. This is how you can set
+the protections on a device even before the driver is loaded. Once you
+create an inode it appears in the directory listing.
+Unregistering device entries
+A device driver calls devfs_unregister() to unregister an entry.
+Chroot() gaols
+2.2.x kernels
+The semantics of inode creation are different when devfs is mounted
+with the "explicit" option. Now, when a device entry is registered, it
+will not appear until you use mknod() to create the device. It doesn't
+matter if you mknod() before or after the device is registered with
+devfs_register(). The purpose of this behaviour is to support
+chroot(2) gaols, where you want to mount a minimal devfs inside the
+gaol. Only the devices you specifically want to be available (through
+your mknod() setup) will be accessible.
+2.4.x kernels
+As of kernel 2.3.99, the VFS has had the ability to rebind parts of
+the global filesystem namespace into another part of the namespace.
+This now works even at the leaf-node level, which means that
+individual files and device nodes may be bound into other parts of the
+namespace. This is like making links, but better, because it works
+across filesystems (unlike hard links) and works through chroot()
+gaols (unlike symbolic links).
+Because of these improvements to the VFS, the multi-mount capability
+in devfs is no longer needed. The administrator may create a minimal
+device tree inside a chroot(2) gaol by using VFS bindings. As this
+provides most of the features of the devfs multi-mount capability, I
+removed the multi-mount support code (after issuing an RFC). This
+yielded code size reductions and simplifications.
+If you want to construct a minimal chroot() gaol, the following
+command should suffice:
+mount --bind /dev/null /gaol/dev/null
+Repeat for other device nodes you want to expose. Simple!
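+Repeating the bind for several nodes can be wrapped in a small helper.
+This is a sketch under assumptions: the function name bind_nodes and
+the DRY_RUN variable are hypothetical, and /dev/zero is only an
+example of a second node. A bind mount needs an existing target, so
+the helper creates an empty file to mount over.

```sh
#!/bin/sh
# With DRY_RUN set, commands are printed instead of executed.
run() {
    if [ -n "$DRY_RUN" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# bind_nodes GAOL NODE... : bind each NODE to the same path inside GAOL.
bind_nodes() {
    gaol=$1
    shift
    for node in "$@"; do
        # The bind target must exist before it can be mounted over.
        run mkdir -p "$gaol$(dirname "$node")" &&
        run touch "$gaol$node" &&
        run mount --bind "$node" "$gaol$node" || return 1
    done
}
```

+For example: bind_nodes /gaol /dev/null /dev/zero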
+-----------------------------------------------------------------------------
+Operational issues
+Instructions for the impatient
+Nobody likes reading documentation. People just want to get in there
+and play. So this section tells you quickly the steps you need to take
+to run with devfs mounted over /dev. Skip these steps and you will end
+up with a nearly unbootable system. Subsequent sections describe the
+issues in more detail, and discuss non-essential configuration
+options.
+Devfsd
+OK, if you're reading this, I assume you want to play with
+devfs. First you should ensure that /usr/src/linux contains a
+recent kernel source tree. Then you need to compile devfsd, the device
+management daemon, available at
+http://www.atnf.csiro.au/~rgooch/linux/.
+Because the kernel has a naming scheme
+which is quite different from the old naming scheme, you need to
+install devfsd so that software and configuration files that use the
+old naming scheme will not break.
+Compile and install devfsd. You will be provided with a default
+configuration file /etc/devfsd.conf which will provide
+compatibility symlinks for the old naming scheme. Don't change this
+config file unless you know what you're doing. Even if you think you
+do know what you're doing, don't change it until you've followed all
+the steps below and booted a devfs-enabled system and verified that it
+works.
+Now edit your main system boot script so that devfsd is started at the
+very beginning (before any filesystem
+checks). /etc/rc.d/rc.sysinit is often the main boot script
+on systems with SysV-style boot scripts. On systems with BSD-style
+boot scripts it is often /etc/rc. Also check
+/sbin/rc.
+NOTE that the line you put into the boot
+script should be exactly:
+/sbin/devfsd /dev
+DO NOT use some special daemon-launching
+programme, otherwise the boot script may not wait for devfsd to finish
+initialising.
+System Libraries
+There may still be some problems because of broken software making
+assumptions about device names. In particular, some software does not
+handle devices which are symbolic links. If you are running a libc 5
+based system, install libc 5.4.44 (if you have libc 5.4.46, go back to
+libc 5.4.44, which is actually correct). If you are running a glibc
+based system, make sure you have glibc 2.1.3 or later.
+/etc/securetty
+PAM (Pluggable Authentication Modules) is supposed to be a flexible
+mechanism for providing better user authentication and access to
+services. Unfortunately, it's also fragile, complex and undocumented
+(check out RedHat 6.1, and probably other distributions as well). PAM
+has problems with symbolic links. Append the following lines to your
+/etc/securetty file:
+vc/1
+vc/2
+vc/3
+vc/4
+vc/5
+vc/6
+vc/7
+vc/8
+This will not weaken security. If you have a version of util-linux
+earlier than 2.10.h, please upgrade to 2.10.h or later. If you
+absolutely cannot upgrade, then also append the following lines to
+your /etc/securetty file:
+1
+2
+3
+4
+5
+6
+7
+8
+This may potentially weaken security by allowing root logins over the
+network (a password is still required, though). However, since there
+are problems with dealing with symlinks, I'm suspicious of the level
+of security offered in any case.
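+The vc/1 to vc/8 entries listed above can also be generated rather
+than typed. A minimal sketch follows; the function name
+securetty_vc_lines is hypothetical.

```sh
#!/bin/sh
# Print vc/1 .. vc/N, one per line (N defaults to 8, as in the list above).
securetty_vc_lines() {
    n=1
    while [ "$n" -le "${1:-8}" ]; do
        echo "vc/$n"
        n=$((n + 1))
    done
}
```

+To append them: securetty_vc_lines >> /etc/securetty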
+XFree86
+While not essential, it's probably a good idea to upgrade to XFree86
+4.0, as patches went in to make it more devfs-friendly. If you don't,
+you'll probably need to apply the following patch to
+/etc/security/console.perms so that ordinary users can run
+startx. Note that not all distributions have this file (e.g. Debian),
+so if it's not present, don't worry about it.
+--- /etc/security/console.perms.orig    Sat Apr 17 16:26:47 1999
++++ /etc/security/console.perms Fri Feb 25 23:53:55 2000
+@@ -14,7 +14,7 @@
+ # man 5 console.perms
+ # file classes -- these are regular expressions
+-<console>=tty[0-9][0-9]* :[0-9]\.[0-9] :[0-9]
++<console>=tty[0-9][0-9]* vc/[0-9][0-9]* :[0-9]\.[0-9] :[0-9]
+ # device classes -- these are shell-style globs 
+ <floppy>=/dev/fd[0-1]* 
+If the patch does not apply, then change the line:
+<console>=tty[0-9][0-9]* :[0-9]\.[0-9] :[0-9]
+with:
+<console>=tty[0-9][0-9]* vc/[0-9][0-9]* :[0-9]\.[0-9] :[0-9]
+Disable devpts
+I've had a report of devpts mounted on /dev/pts not working
+correctly. Since devfs will also manage /dev/pts, there is no
+need to mount devpts as well. You should either edit your
+/etc/fstab so devpts is not mounted, or disable devpts from
+your kernel configuration.
+Unsupported drivers
+Not all drivers have devfs support. If you depend on one of these
+drivers, you will need to create a script or tarfile that you can use
+at boot time to create device nodes as appropriate. The section
+"Dealing with drivers without devfs support" describes this. Another
+section lists the drivers which have devfs support.
+/dev/mouse
+Many distributions configure /dev/mouse to be the mouse device
+for XFree86 and GPM. I actually think this is a bad idea, because it
+adds another level of indirection. When looking at a config file, if
+you see /dev/mouse you're left wondering which mouse
+is being referred to. Hence I recommend putting the actual mouse
+device (for example /dev/psaux) into your
+/etc/X11/XF86Config file (and similarly for the GPM
+configuration file).
+Alternatively, use the same technique used for unsupported drivers
+described above.
+The Kernel
+Finally, you need to make sure devfs is compiled into your kernel. Set
+CONFIG_EXPERIMENTAL=y, CONFIG_DEVFS_FS=y and CONFIG_DEVFS_MOUNT=y by
+using your favourite configuration tool (e.g. make config or
+make xconfig) and then make clean and then recompile your kernel and 
+modules. At boot, devfs will be mounted onto /dev.
+If you encounter problems booting (for example if you forgot a
+configuration step), you can pass devfs=nomount at the kernel
+boot command line. This will prevent the kernel from mounting devfs at
+boot time onto /dev.
+In general, a kernel built with CONFIG_DEVFS_FS=y but without mounting
+devfs onto /dev is completely safe, and requires no
+configuration changes. One exception to take note of is when
+LABEL= directives are used in /etc/fstab. In this
+case you will be unable to boot properly. This is because the
+mount(8) programme uses /proc/partitions as part of
+the volume label search process, and the device names it finds are not
+available, because setting CONFIG_DEVFS_FS=y changes the names in
+/proc/partitions, irrespective of whether devfs is mounted.
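+Before booting such a kernel it may be worth checking whether your
+/etc/fstab uses LABEL= directives at all. A minimal sketch, assuming
+the usual fstab layout where the first field names the device; the
+function name fstab_uses_labels is hypothetical.

```sh
#!/bin/sh
# Succeed (exit status 0) if any entry's first field starts with LABEL=.
# Comment lines start with '#', so they never match.
fstab_uses_labels() {
    awk '$1 ~ /^LABEL=/ { found = 1 } END { exit !found }' "${1:-/etc/fstab}"
}
```

+For example: fstab_uses_labels && echo "fstab uses LABEL= entries"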
+Now you've finished all the steps required. You're now ready to boot
+your shiny new kernel. Enjoy.
+Changing the configuration
+OK, you've now booted a devfs-enabled system, and everything works.
+Now you may feel like changing the configuration (common targets are
+/etc/fstab and /etc/devfsd.conf). Since you have a
+system that works, if you make any changes and it doesn't work, you
+now know that you only have to restore your configuration files to the
+default and it will work again.
+Permissions persistence across reboots
+If you don't use mknod(2) to create a device file, nor use chmod(2) or
+chown(2) to change the ownerships/permissions, the inode ctime will
+remain at 0 (the epoch, 12 am, 1-JAN-1970, GMT). Anything with a ctime
+later than this has had its ownership/permissions changed. Hence, a
+simple script or programme may be used to tar up all changed inodes,
+prior to shutdown. Although effective, many consider this approach a
+kludge.
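+The ctime test described above could be sketched as follows. This
+assumes GNU find and GNU stat (for -mindepth and stat -c), and the
+function name changed_nodes is hypothetical. Names containing newlines
+are not handled.

```sh
#!/bin/sh
# List every entry below DIR whose inode ctime is later than the epoch,
# i.e. whose ownership or permissions have been changed.
changed_nodes() {
    find "$1" -mindepth 1 -exec stat -c '%Z %n' {} + |
        awk '$1 > 0 { sub(/^[0-9]+ /, ""); print }'
}
```

+With GNU tar the list could then be archived prior to shutdown, for
+example with: changed_nodes /dev | tar --no-recursion -cf ARCHIVE -T -
+where ARCHIVE is a file name of your choosing.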
+A much better approach is to use devfsd to save and restore
+permissions. It may be configured to record changes in permissions and
+will save them in a database (in fact a directory tree), and restore
+these upon boot. This is an efficient method and results in immediate
+saving of current permissions (unlike the tar approach, which saves
+permissions at some unspecified future time).
+The default configuration file supplied with devfsd has config entries
+which you may uncomment to enable persistence management.
+If you decide to use the tar approach anyway, be aware that tar will
+first unlink(2) an inode before creating a new device node. The
+unlink(2) has the effect of breaking the connection between a devfs
+entry and the device driver. If you use the "devfs=only" boot option,
+you lose access to the device driver, requiring you to reload the
+module. I consider this a bug in tar (there is no real need to
+unlink(2) the inode first).
+Alternatively, you can use devfsd to provide more sophisticated
+management of device permissions. You can use devfsd to store
+permissions for whole groups of devices with a single configuration
+entry, rather than the conventional single entry per device entry.
+Permissions database stored in mounted-over /dev
+If you wish to save and restore your device permissions into the
+disc-based /dev while still mounting devfs onto /dev
+you may do so. This requires a 2.4.x kernel (in fact, 2.3.99 or
+later), which has the VFS binding facility. You need to do the
+following to set this up:
+make sure the kernel does not mount devfs at boot time
+make sure you have a correct /dev/console entry in your
+root file-system (where your disc-based /dev lives)
+create the /dev-state directory
+add the following lines near the very beginning of your boot
+scripts:
+mount --bind /dev /dev-state
+mount -t devfs none /dev
+devfsd /dev
+add the following lines to your /etc/devfsd.conf file:
+REGISTER        ^pt[sy]         IGNORE
+CREATE          ^pt[sy]         IGNORE
+CHANGE          ^pt[sy]         IGNORE
+DELETE          ^pt[sy]         IGNORE
+REGISTER        .*              COPY    /dev-state/$devname $devpath
+CREATE          .*              COPY    $devpath /dev-state/$devname
+CHANGE          .*              COPY    $devpath /dev-state/$devname
+DELETE          .*              CFUNCTION GLOBAL unlink /dev-state/$devname
+RESTORE         /dev-state
+Note that the sample devfsd.conf file contains these lines,
+as well as other sample configurations you may find useful. See the
+devfsd distribution.
+reboot.
+Permissions database stored in normal directory
+If you are using an older kernel which doesn't support VFS binding,
+then you won't be able to have the permissions database in a
+mounted-over /dev. However, you can still use a regular
+directory to store the database. The sample /etc/devfsd.conf
+file above may still be used. You will need to create the
+/dev-state directory prior to installing devfsd. If you have
+old permissions in /dev, then just copy (or move) the device
+nodes over to the new directory.
+Which method is better?
+The best method is to have the permissions database stored in the
+mounted-over /dev. This is because you will not need to copy
+device nodes over to /dev-state, and because it allows you to
+switch between devfs and non-devfs kernels, without requiring you to
+copy permissions between /dev-state (for devfs) and
+/dev (for non-devfs).
+Dealing with drivers without devfs support
+Currently, not all device drivers in the kernel have been modified to
+use devfs. Device drivers which do not yet have devfs support will not
+automagically appear in devfs. The simplest way to create device nodes
+for these drivers is to unpack a tarfile containing the required
+device nodes. You can do this in your boot scripts. All your drivers
+will now work as before.
+Hopefully for most people devfs will have enough support so that they
+can mount devfs directly over /dev without losing most functionality
+(i.e. losing access to various devices). As of 22-JAN-1998 (devfs
+patch version 10) I am now running this way. All the devices I have
+are available in devfs, so I don't lose anything.
+WARNING: if your configuration requires the old-style device names
+(e.g. /dev/hda1 or /dev/sda1), you must install devfsd and configure
+it to maintain compatibility entries. It is almost certain that you
+will require this. Note that the kernel creates a compatibility entry
+for the root device, so you don't need initrd.
+Note that you no longer need to mount devpts if you use Unix98 PTYs,
+as devfs can manage /dev/pts itself. This saves you some RAM, as you
+don't need to compile and install devpts. Note that some versions of
+glibc have a bug with Unix98 pty handling on devfs systems. Contact
+the glibc maintainers for a fix. Glibc 2.1.3 has the fix.
+Note also that apart from editing /etc/fstab, other things will need
+to be changed if you *don't* install devfsd. Some software (like the X
+server) hard-wires device names in its source. It really is much
+easier to install devfsd so that compatibility entries are created.
+You can then slowly migrate your system to using the new device names
+(for example, by starting with /etc/fstab), and then limiting the
+compatibility entries that devfsd creates.
+IF YOU CONFIGURE TO MOUNT DEVFS AT BOOT, MAKE SURE YOU INSTALL DEVFSD
+BEFORE YOU BOOT A DEVFS-ENABLED KERNEL!
+Now that devfs has gone into the 2.3.46 kernel, I'm getting a lot of
+reports back. Many of these are because people are trying to run
+without devfsd, and hence some things break. Please just run devfsd if
+things break. I want to concentrate on real bugs rather than
+misconfiguration problems at the moment. If people are willing to fix
+bugs/false assumptions in other code (e.g. glibc, X server) and submit
+that to the respective maintainers, that would be great.
+All the way with Devfs
+The devfs kernel patch creates a rationalised device tree. As stated
+above, if you want to keep using the old /dev naming scheme,
+you just need to configure devfsd appropriately (see the man
+page). People who prefer the old names can ignore this section. For
+those of us who like the rationalised names and an uncluttered
+/dev, read on.
+If you don't run devfsd, or don't enable compatibility entry
+management, then you will have to configure your system to use the new
+names. For example, you will then need to edit your
+/etc/fstab to use the new disc naming scheme. If you want to
+be able to boot non-devfs kernels, you will need compatibility
+symlinks in the underlying disc-based /dev pointing back to
+the old-style names for when you boot a kernel without devfs.
+You can selectively decide which devices you want compatibility
+entries for. For example, you may only want compatibility entries for
+BSD pseudo-terminal devices (otherwise you'll have to patch your C
+library or use Unix98 ptys instead). It's just a matter of putting
+the correct regular expression into /etc/devfsd.conf.
+There are other choices of naming schemes that you may prefer. For
+example, I don't use the kernel-supplied
+names, because they are too verbose. A common misconception is
+that the kernel-supplied names are meant to be used directly in
+configuration files. This is not the case. They are designed to
+reflect the layout of the devices attached and to provide easy
+classification.
+If you like the kernel-supplied names, that's fine. If you don't then
+you should be using devfsd to construct a namespace more to your
+liking. Devfsd has built-in code to construct a
+namespace that is both logical and easy to
+manage. In essence, it creates a convenient abbreviation of the
+kernel-supplied namespace.
+You are of course free to build your own namespace. Devfsd has all the
+infrastructure required to make this easy for you. All you need do is
+write a script. You can even write some C code and devfsd can load the
+shared object as a callable extension.
+Other Issues
+The init programme
+Another thing to take note of is whether your init programme
+creates a Unix socket /dev/telinit. Some versions of init
+create /dev/telinit so that the telinit programme can
+communicate with the init process. If you have such a system you need
+to make sure that devfs is mounted over /dev *before* init
+starts. In other words, you can't leave the mounting of devfs to
+/etc/rc, since this is executed after init. Other
+versions of init require a named pipe /dev/initctl
+which must exist *before* init starts. Once again, you need to
+mount devfs and then create the named pipe *before* init
+starts.
+The default behaviour now is not to mount devfs onto /dev at
+boot time for 2.3.x and later kernels. You can correct this with the
+"devfs=mount" boot option. This solves any problems with init,
+and also prevents the dreaded:
+Cannot open initial console
+message. For 2.2.x kernels where you need to apply the devfs patch,
+the default is to mount.
+If you have automatic mounting of devfs onto /dev then you
+may need to create /dev/initctl in your boot scripts. The
+following lines should suffice:
+mknod /dev/initctl p
+kill -SIGUSR1 1       # tell init that /dev/initctl now exists
+Alternatively, if you don't want the kernel to mount devfs onto
+/dev then the following procedure is a guideline for how to get
+around /dev/initctl problems:
+# cd /sbin
+# mv init init.real
+# cat > init
+#! /bin/sh
+mount -n -t devfs none /dev
+mknod /dev/initctl p
+exec /sbin/init.real "$@"
+[control-D]
+# chmod a+x init
+Note that newer versions of init create /dev/initctl
+automatically, so you don't have to worry about this.
+Module autoloading
+You will need to configure devfsd to enable module
+autoloading. The following lines should be placed in your
+/etc/devfsd.conf file:
+LOOKUP  .*              MODLOAD
+As of devfsd-v1.3.10, a generic /etc/modules.devfs
+configuration file is installed, which is used by the MODLOAD
+action. This should be sufficient for most configurations. If you
+require further configuration, edit your /etc/modules.conf
+file. The way module autoloading works with devfs is:
+a process attempts to lookup a device node (e.g. /dev/fred)
+if that device node does not exist, the full pathname is passed to
+devfsd as a string
+devfsd will pass the string to the modprobe programme (provided the
+configuration line shown above is present), and specifies that
+/etc/modules.devfs is the configuration file
+/etc/modules.devfs includes /etc/modules.conf to
+access local configurations
+modprobe will search its configuration files, looking for an alias
+that translates the pathname into a module name
+the translated pathname is then used to load the module.
+If you wanted a lookup of /dev/fred to load the
+mymod module, you would require the following configuration
+line in /etc/modules.conf:
+alias    /dev/fred    mymod
+The /etc/modules.devfs configuration file provides many such
+aliases for standard device names. If you look closely at this file,
+you will note that some modules require multiple alias configuration
+lines. This is required to support module autoloading for old and new
+device names.
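+The alias translation step can be illustrated with a small lookup.
+This is only a sketch of the idea: it does an exact match on the
+pathname, whereas modprobe's real matching rules are more involved,
+and the function name lookup_alias is hypothetical.

```sh
#!/bin/sh
# lookup_alias CONFIG PATHNAME
# Print the module name aliased to PATHNAME in CONFIG (exact match only).
# Exit status is non-zero if no alias line matches.
lookup_alias() {
    awk -v path="$2" '
        $1 == "alias" && $2 == path { print $3; found = 1; exit }
        END { exit !found }
    ' "$1"
}
```

+For example, with the configuration line shown above,
+lookup_alias /etc/modules.conf /dev/fred would print mymod.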
+Mounting root off a devfs device
+If you wish to mount root off a devfs device when you pass the
+"devfs=only" boot option, then you need to pass in the
+"root=<device>" option to the kernel when booting. If you use
+LILO, then you must have this in lilo.conf:
+append = "root=<device>"
+Surprised? Yep, so was I. It turns out if you have (as most people
+do):
+root = <device>
+then LILO will determine the device number of <device> and will
+write that device number into a special place in the kernel image
+before starting the kernel, and the kernel will use that device number
+to mount the root filesystem. So, using the "append" variety ensures
+that LILO passes the root filesystem device as a string, which devfs
+can then use.
+Note that this isn't an issue if you don't pass "devfs=only".
+TTY issues
+The ttyname(3) function in some versions of the C library makes
+false assumptions about device entries which are symbolic links.  The
+tty(1) programme is one that depends on this function.  I've
+written a patch to libc 5.4.43 which fixes this. This has been
+included in libc 5.4.44 and a similar fix is in glibc 2.1.3.
+Kernel Naming Scheme
+The kernel provides a default naming scheme. This scheme is designed
+to make it easy to search for specific devices or device types, and to
+view the available devices. Some device types (such as hard discs),
+have a directory of entries, making it easy to see what devices of
+that class are available. Often, the entries are symbolic links into a
+directory tree that reflects the topology of available devices. The
+topological tree is useful for finding how your devices are arranged.
+Below is a list of the naming schemes for the most common drivers. A
+list of reserved device names is
+available for reference. Please send email to
+rgooch@atnf.csiro.au to obtain an allocation. Please be
+patient (the maintainer is busy). An alternative name may be allocated
+instead of the requested name, at the discretion of the maintainer.
+Disc Devices
+All discs, whether SCSI, IDE or whatever, are placed under the
+/dev/discs hierarchy:
+        /dev/discs/disc0        first disc
+        /dev/discs/disc1        second disc
+Each of these entries is a symbolic link to the directory for that
+device. The device directory contains:
+        disc    for the whole disc
+        part*   for individual partitions
+CD-ROM Devices
+All CD-ROMs, whether SCSI, IDE or whatever, are placed under the
+/dev/cdroms hierarchy:
+        /dev/cdroms/cdrom0      first CD-ROM
+        /dev/cdroms/cdrom1      second CD-ROM
+Each of these entries is a symbolic link to the real device entry for
+that device.
+Tape Devices
+All tapes, whether SCSI, IDE or whatever, are placed under the
+/dev/tapes hierarchy:
+        /dev/tapes/tape0        first tape
+        /dev/tapes/tape1        second tape
+Each of these entries is a symbolic link to the directory for that
+device. The device directory contains:
+        mt                      for mode 0
+        mtl                     for mode 1
+        mtm                     for mode 2
+        mta                     for mode 3
+        mtn                     for mode 0, no rewind
+        mtln                    for mode 1, no rewind
+        mtmn                    for mode 2, no rewind
+        mtan                    for mode 3, no rewind
+SCSI Devices
+To uniquely identify any SCSI device requires the following
+information:
+  controller    (host adapter)
+  bus           (SCSI channel)
+  target        (SCSI ID)
+  unit          (Logical Unit Number)
+All SCSI devices are placed under /dev/scsi (assuming devfs
+is mounted on /dev). Hence, a SCSI device with the following
+parameters: c=1,b=2,t=3,u=4 would appear as:
+        /dev/scsi/host1/bus2/target3/lun4       device directory
+Inside this directory, a number of device entries may be created,
+depending on which SCSI device-type drivers were installed.
+See the section on the disc naming scheme to see what entries the SCSI
+disc driver creates.
+See the section on the tape naming scheme to see what entries the SCSI
+tape driver creates.
+The SCSI CD-ROM driver creates:
+        cd
+The SCSI generic driver creates:
+        generic
+IDE Devices
+To uniquely identify any IDE device requires the following
+information:
+  controller
+  bus           (aka. primary/secondary)
+  target        (aka. master/slave)
+  unit
+All IDE devices are placed under /dev/ide, and use a similar
+naming scheme to the SCSI subsystem.
+XT Hard Discs
+All XT discs are placed under /dev/xd. The first XT disc has
+the directory /dev/xd/disc0.
+TTY devices
+The tty devices now appear as:
+  New name                   Old-name                   Device Type
+  --------                   --------                   -----------
+  /dev/tts/{0,1,...}         /dev/ttyS{0,1,...}         Serial ports
+  /dev/cua/{0,1,...}         /dev/cua{0,1,...}          Call out devices
+  /dev/vc/0                  /dev/tty                   Current virtual console
+  /dev/vc/{1,2,...}          /dev/tty{1...63}           Virtual consoles
+  /dev/vcc/{0,1,...}         /dev/vcs{1...63}           Virtual console capture devices
+  /dev/pty/m{0,1,...}        /dev/ptyp??                PTY masters
+  /dev/pty/s{0,1,...}        /dev/ttyp??                PTY slaves
+RAMDISCS
+The RAMDISCS are placed in their own directory, and are named thus:
+  /dev/rd/{0,1,2,...}
+Meta Devices
+The meta devices are placed in their own directory, and are named
+thus:
+  /dev/md/{0,1,2,...}
+Floppy discs
+Floppy discs are placed in the /dev/floppy directory.
+Loop devices
+Loop devices are placed in the /dev/loop directory.
+Sound devices
+Sound devices are placed in the /dev/sound directory
+(audio, sequencer, ...).
+Devfsd Naming Scheme
+Devfsd provides a naming scheme which is a convenient abbreviation of
+the kernel-supplied namespace. In some
+cases, the kernel-supplied naming scheme is quite convenient, so
+devfsd does not provide another naming scheme. The convenience names
+that devfsd creates are in fact the same names as the original devfs
+kernel patch created (before Linus mandated the Big Name
+Change). These are referred to as "new compatibility entries".
+In order to configure devfsd to create these convenience names, the
+following lines should be placed in your /etc/devfsd.conf:
+REGISTER        .*              MKNEWCOMPAT
+UNREGISTER      .*              RMNEWCOMPAT
+This will cause devfsd to create (and destroy) symbolic links which
+point to the kernel-supplied names.
+SCSI Hard Discs
+All SCSI discs are placed under /dev/sd (assuming devfs is
+mounted on /dev). Hence, a SCSI disc with the following
+parameters: c=1,b=2,t=3,u=4 would appear as:
+        /dev/sd/c1b2t3u4        for the whole disc
+        /dev/sd/c1b2t3u4p5      for the 5th partition
+        /dev/sd/c1b2t3u4p5s6    for the 6th slice in the 5th partition
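+The relationship between the kernel-supplied SCSI directory and the
+devfsd convenience names for the same c/b/t/u parameters can be sketched
+as follows. This is an illustrative Python sketch only: the function
+names are hypothetical and are not part of devfs or devfsd.
```python
def kernel_scsi_dir(c, b, t, u):
    """Kernel-supplied directory for controller c, bus b, target t, unit u."""
    return "/dev/scsi/host%d/bus%d/target%d/lun%d" % (c, b, t, u)


def devfsd_sd_name(c, b, t, u, partition=None, slice_=None):
    """devfsd convenience name for a SCSI disc, partition or slice."""
    name = "/dev/sd/c%db%dt%du%d" % (c, b, t, u)
    if partition is not None:
        name += "p%d" % partition
        # A slice only makes sense inside a partition.
        if slice_ is not None:
            name += "s%d" % slice_
    return name


print(kernel_scsi_dir(1, 2, 3, 4))
print(devfsd_sd_name(1, 2, 3, 4, partition=5, slice_=6))
```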
+SCSI Tapes
+All SCSI tapes are placed under /dev/st. A similar naming
+scheme is used as for SCSI discs. A SCSI tape with the
+parameters: c=1,b=2,t=3,u=4 would appear as:
+        /dev/st/c1b2t3u4m0      for mode 0
+        /dev/st/c1b2t3u4m1      for mode 1
+        /dev/st/c1b2t3u4m2      for mode 2
+        /dev/st/c1b2t3u4m3      for mode 3
+        /dev/st/c1b2t3u4m0n     for mode 0, no rewind
+        /dev/st/c1b2t3u4m1n     for mode 1, no rewind
+        /dev/st/c1b2t3u4m2n     for mode 2, no rewind
+        /dev/st/c1b2t3u4m3n     for mode 3, no rewind
+SCSI CD-ROMs
+All SCSI CD-ROMs are placed under /dev/sr. A similar naming
+scheme is used as for SCSI discs. A SCSI CD-ROM with the
+parameters: c=1,b=2,t=3,u=4 would appear as:
+        /dev/sr/c1b2t3u4
+SCSI Generic Devices
+The generic (aka. raw) interfaces for all SCSI devices are placed under
+/dev/sg. A similar naming scheme is used as for SCSI discs. A
+SCSI generic device with the parameters: c=1,b=2,t=3,u=4 would appear
+as:
+        /dev/sg/c1b2t3u4
+IDE Hard Discs
+All IDE discs are placed under /dev/ide/hd, using a similar
+convention to SCSI discs. The following mappings exist between the new
+and the old names:
+        /dev/hda        /dev/ide/hd/c0b0t0u0
+        /dev/hdb        /dev/ide/hd/c0b0t1u0
+        /dev/hdc        /dev/ide/hd/c0b1t0u0
+        /dev/hdd        /dev/ide/hd/c0b1t1u0
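+The four mappings above follow a simple pattern: the drive letter
+selects the bus (primary/secondary) and the target (master/slave) on
+controller 0. The Python sketch below reproduces only the four rows of
+the table; the function name is hypothetical, and extending the pattern
+beyond /dev/hdd would be an assumption that the table does not confirm.
```python
def ide_new_name(old_name):
    """Map /dev/hda .. /dev/hdd to the devfsd name listed in the table."""
    prefix = "/dev/hd"
    if not old_name.startswith(prefix) or len(old_name) != len(prefix) + 1:
        raise ValueError("not a whole-disc IDE name: %r" % old_name)
    index = ord(old_name[-1]) - ord("a")
    if not 0 <= index <= 3:
        raise ValueError("only hda-hdd are covered by the table")
    # bus 0/1 is primary/secondary, target 0/1 is master/slave
    bus, target = divmod(index, 2)
    return "/dev/ide/hd/c0b%dt%du0" % (bus, target)


for old in ("/dev/hda", "/dev/hdb", "/dev/hdc", "/dev/hdd"):
    print(old, ide_new_name(old))
```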
+IDE Tapes
+A similar naming scheme is used as for IDE discs. The entries will
+appear in the /dev/ide/mt directory.
+IDE CD-ROM
+A similar naming scheme is used as for IDE discs. The entries will
+appear in the /dev/ide/cd directory.
+IDE Floppies
+A similar naming scheme is used as for IDE discs. The entries will
+appear in the /dev/ide/fd directory.
+XT Hard Discs
+All XT discs are placed under /dev/xd. The first XT disc
+would appear as /dev/xd/c0t0.
+Old Compatibility Names
+The old compatibility names are the legacy device names, such as
+/dev/hda, /dev/sda, /dev/rtc and so on.
+Devfsd can be configured to create compatibility symlinks so that you
+may continue to use the old names in your configuration files and so
+that old applications will continue to function correctly.
+In order to configure devfsd to create these legacy names, the
+following lines should be placed in your /etc/devfsd.conf:
+REGISTER        .*              MKOLDCOMPAT
+UNREGISTER      .*              RMOLDCOMPAT
+This will cause devfsd to create (and destroy) symbolic links which
+point to the kernel-supplied names.
+-----------------------------------------------------------------------------
+Device drivers currently ported
+- All miscellaneous character devices support devfs (this is done
+  transparently through misc_register())
+- SCSI discs and generic hard discs
+- Character memory devices (null, zero, full and so on)
+  Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
+- Loop devices (/dev/loop?)
+- TTY devices (console, serial ports, terminals and pseudo-terminals)
+  Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
+- SCSI tapes (/dev/scsi and /dev/tapes)
+- SCSI CD-ROMs (/dev/scsi and /dev/cdroms)
+- SCSI generic devices (/dev/scsi)
+- RAMDISCS (/dev/ram?)
+- Meta Devices (/dev/md*)
+- Floppy discs (/dev/floppy)
+- Parallel port printers (/dev/printers)
+- Sound devices (/dev/sound)
+  Thanks to Eric Dumas <dumas@linux.eu.org> and
+  C. Scott Ananian <cananian@alumni.princeton.edu>
+- Joysticks (/dev/joysticks)
+- Sparc keyboard (/dev/kbd)
+- DSP56001 digital signal processor (/dev/dsp56k)
+- Apple Desktop Bus (/dev/adb)
+- Coda network file system (/dev/cfs*)
+- Virtual console capture devices (/dev/vcc)
+  Thanks to Dennis Hou <smilax@mindmeld.yi.org>
+- Frame buffer devices (/dev/fb)
+- Video capture devices (/dev/v4l)
+-----------------------------------------------------------------------------
+Allocation of Device Numbers
+Devfs allows you to write a driver which doesn't need to allocate a
+device number (major&minor numbers) for the internal operation of the
+kernel. However, there are a number of userspace programmes that use
+the device number as a unique handle for a device. An example is the
+find programme, which uses device numbers to determine whether
+an inode is on a different filesystem than another inode. The device
+number used is the one for the block device which a filesystem is
+using. To preserve compatibility with userspace programmes, block
+devices using devfs need to have unique device numbers allocated to
+them. Furthermore, POSIX specifies device numbers, so some kind of
+device number needs to be presented to userspace.
+The simplest option (especially when porting drivers to devfs) is to
+keep using the old major and minor numbers. Devfs will take whatever
+values are given for major&minor and pass them on to userspace.
+This device number is a 16 bit number, so this leaves plenty of space
+for large numbers of discs and partitions. This scheme can also be
+used for character devices, in particular the tty devices, which are
+currently limited to 256 pseudo-ttys (this limits the total number of
+simultaneous xterms and remote logins).  Note that the device number
+is limited to the range 36864-61439 (majors 144-239), in order to
+avoid any possible conflicts with existing official allocations.
+Please note that using dynamically allocated block device numbers may
+break the NFS daemons (both user and kernel mode), which expect dev_t
+for a given device to be constant over the lifetime of remote mounts.
+A final note on this scheme: since it doesn't increase the size of
+device numbers, there are no compatibility issues with userspace.
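+The arithmetic behind the quoted range can be checked with a short
+Python sketch. It assumes the 16 bit layout with 8 bits each for major
+and minor (major in the high byte); the helper names are hypothetical.
```python
def make_dev(major, minor):
    """Pack 8-bit major and minor numbers into a 16 bit device number."""
    if not (0 <= major <= 255 and 0 <= minor <= 255):
        raise ValueError("major and minor must each fit in 8 bits")
    return (major << 8) | minor


def split_dev(dev):
    """Inverse of make_dev(): return (major, minor)."""
    return dev >> 8, dev & 0xFF


LOW = make_dev(144, 0)      # first number of major 144
HIGH = make_dev(239, 255)   # last number of major 239
print(LOW, HIGH)            # prints 36864 61439
```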
+-----------------------------------------------------------------------------
+Questions and Answers
+Making things work
+Alternatives to devfs
+What I don't like about devfs
+How to report bugs
+Strange kernel messages
+Compilation problems with devfsd
+Making things work
+Here are some common questions and answers.
+Devfsd doesn't start
+Make sure you have compiled and installed devfsd
+Make sure devfsd is being started from your boot
+scripts
+Make sure you have configured your kernel to enable devfs (see
+below)
+Make sure devfs is mounted (see below)
+Devfsd is not managing all my permissions
+Make sure you are capturing the appropriate events. For example,
+device entries created by the kernel generate REGISTER events,
+but those created by devfsd generate CREATE events.
+Devfsd is not capturing all REGISTER events
+See the previous entry: you may need to capture CREATE events.
+X will not start
+Make sure you followed the steps outlined above.
+Why don't my network devices appear in devfs?
+This is not a bug. Network devices have their own, completely separate
+namespace. They are accessed via socket(2) and
+setsockopt(2) calls, and thus require no device nodes. I have
+raised the possibility of moving network devices into the device
+namespace, but have had no response.
+How can I test if I have devfs compiled into my kernel?
+All filesystems built-in or currently loaded are listed in
+/proc/filesystems. If you see a devfs entry, then
+you know that devfs was compiled into your kernel. If you have
+correctly configured and rebuilt your kernel, then devfs will be
+built-in. If you think you've configured it in, but
+/proc/filesystems doesn't show it, you've made a mistake.
+Common mistakes include:
+Using a 2.2.x kernel without applying the devfs patch (if you
+don't know how to patch your kernel, use 2.4.x instead, don't bother
+asking me how to patch)
+Forgetting to set CONFIG_EXPERIMENTAL=y
+Forgetting to set CONFIG_DEVFS_FS=y
+Forgetting to set CONFIG_DEVFS_MOUNT=y (if you want devfs
+to be automatically mounted at boot)
+Editing your .config manually, instead of using make
+config or make xconfig
+Forgetting to run make dep; make clean after changing the
+configuration and before compiling
+Forgetting to compile your kernel and modules
+Forgetting to install your kernel
+Forgetting to install your modules
+Please check twice that you've done all these steps before sending in
+a bug report.
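+The /proc/filesystems check described above can be automated. The
+Python sketch below assumes the usual format of that file, one
+filesystem per line with an optional leading "nodev" field; the function
+names are hypothetical.
```python
def has_filesystem(name, listing):
    """Return True if `name` is listed in text shaped like /proc/filesystems."""
    for line in listing.splitlines():
        fields = line.split()
        # Each line is "[nodev] <fsname>"; the name is always the last field.
        if fields and fields[-1] == name:
            return True
    return False


def kernel_has_devfs(path="/proc/filesystems"):
    """Check the running kernel; only meaningful on a system with /proc."""
    with open(path) as f:
        return has_filesystem("devfs", f.read())
```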
+How can I test if devfs is mounted on /dev?
+The device filesystem will always create an entry called
+".devfsd", which is used to communicate with the daemon. Even
+if the daemon is not running, this entry will exist. Testing for the
+existence of this entry is the approved method of determining if devfs
+is mounted or not. Note that the type of entry (i.e. regular file,
+character device, named pipe, etc.) may change without notice. Only
+the existence of the entry should be relied upon.
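+A minimal Python sketch of this test follows. It checks only for the
+existence of the entry and deliberately ignores its type; the function
+name and its dev_dir parameter are hypothetical.
```python
import os


def devfs_mounted(dev_dir="/dev"):
    """True if a '.devfsd' entry exists in dev_dir, whatever its type."""
    # lexists() reports existence without following a symbolic link,
    # so the result does not depend on what kind of entry it is.
    return os.path.lexists(os.path.join(dev_dir, ".devfsd"))
```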
+When I start devfsd, I see the error:
+Error opening file: ".devfsd"   No such file or directory?
+This means that devfs is not mounted. Make sure you have devfs mounted.
+How do I mount devfs?
+First make sure you have devfs compiled into your kernel (see
+above). Then you will either need to:
+set CONFIG_DEVFS_MOUNT=y in your kernel config
+pass devfs=mount to your boot loader
+mount devfs manually in your boot scripts with:
+mount -t devfs none /dev
+Mount by volume LABEL=<label> doesn't work with
+devfs
+Most probably you are not mounting devfs onto /dev. What
+happens is that if your kernel config has CONFIG_DEVFS_FS=y
+then the contents of /proc/partitions will have the devfs
+names (such as scsi/host0/bus0/target0/lun0/part1). The
+contents of /proc/partitions are used by mount(8) when
+mounting by volume label. If devfs is not mounted on /dev,
+then mount(8) will fail to find devices. The solution is to
+make sure that devfs is mounted on /dev. See above for how to
+do that.
+I have extra or incorrect entries in /dev
+You may have stale entries in your dev-state area. Check for a
+RESTORE configuration line in your devfsd configuration
+(typically /etc/devfsd.conf). If you have this line, check
+the contents of the specified directory for stale entries. Remove
+any entries which are incorrect, then reboot.
+I get "Unable to open initial console" messages at boot
+This usually happens when you don't have devfs automounted onto
+/dev at boot time, and there is no valid
+/dev/console entry on your root file-system. Create a valid
+/dev/console device node.
+Alternatives to devfs
+I've attempted to collate all the anti-devfs proposals and explain
+their limitations. Under construction.
+Why not just pass device create/remove events to a daemon?
+Here the suggestion is to develop an API in the kernel so that devices
+can register create and remove events, and a daemon listens for those
+events. The daemon would then populate/depopulate /dev (which
+resides on disc).
+This has several limitations:
+it only works for modules loaded and unloaded (or devices inserted
+and removed) after the kernel has finished booting. Without a database
+of events, there is no way the daemon could fully populate
+/dev
+if you add a database to this scheme, the question is then how to
+present that database to user-space. If you make it a list of strings
+with embedded event codes which are passed through a pipe to the
+daemon, then this is only of use to the daemon. I would argue that the
+natural way to present this data is via a filesystem (since many of
+the events will be of a hierarchical nature), such as devfs.
+Presenting the data as a filesystem makes it easy for the user to see
+what is available and also makes it easy to write scripts to scan the
+"database"
+the tight binding between device nodes and drivers is no longer
+possible (requiring the otherwise perfectly avoidable
+table lookups)
+you cannot catch inode lookup events on /dev which means
+that module autoloading requires device nodes to be created. This is a
+problem, particularly for drivers where only a few inodes are created
+from a potentially large set
+this technique can't be used when the root FS is mounted
+read-only
+Just implement a better scsidev
+This suggestion involves taking the scsidev programme and
+extending it to scan for all devices, not just SCSI devices. The
+scsidev programme works by scanning /proc/scsi
+Problems:
+the kernel does not currently provide a list of all devices
+available. Not all drivers register entries in /proc or
+generate kernel messages
+there is no uniform mechanism to register devices other than the
+devfs API
+implementing such an API is then the same as the
+proposal above
+Put /dev on a ramdisc
+This suggestion involves creating a ramdisc and populating it with
+device nodes and then mounting it over /dev.
+Problems:
+this doesn't help when mounting the root filesystem, since you
+still need a device node to do that
+if you want to use this technique for the root device node as
+well, you need to use initrd. This complicates the booting sequence
+and makes it significantly harder to administer and configure. The
+initrd is essentially opaque, robbing the system administrator of easy
+configuration
+insufficient information is available to correctly populate the
+ramdisc. So we come back to the
+proposal above to "solve" this
+a ramdisc-based solution would take more kernel memory, since the
+backing store would be (at best) normal VFS inodes and dentries, which
+take 284 bytes and 112 bytes, respectively, for each entry. Compare
+that to 72 bytes for devfs
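+To put the per-entry figures quoted above in perspective, the Python
+sketch below multiplies them out. The entry count of 1000 is a
+hypothetical example, and the calculation uses only the three sizes
+quoted in the text, nothing measured.
```python
VFS_INODE_BYTES = 284    # per entry, as quoted above
VFS_DENTRY_BYTES = 112   # per entry, as quoted above
DEVFS_ENTRY_BYTES = 72   # per entry, as quoted above


def ramdisc_bytes(entries):
    """Best-case memory for a ramdisc /dev: one inode plus one dentry each."""
    return entries * (VFS_INODE_BYTES + VFS_DENTRY_BYTES)


def devfs_bytes(entries):
    """Memory for the same number of devfs entries."""
    return entries * DEVFS_ENTRY_BYTES


n = 1000  # hypothetical number of device entries
print(ramdisc_bytes(n), devfs_bytes(n))  # prints 396000 72000
```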
+Do nothing: there's no problem
+Sometimes people can be heard to claim that the existing scheme is
+fine. This is what they're ignoring:
+device number size (8 bits each for major and minor) is a real
+limitation, and must be fixed somehow. Systems with large numbers of
+SCSI devices, for example, will continue to consume the remaining
+unallocated major numbers. USB will also need to push beyond the 8 bit
+minor limitation
+simply increasing the device number size is insufficient. Apart
+from causing a lot of pain, it doesn't solve the management issues
+of a /dev with thousands or more device nodes
+ignoring the problem of a huge /dev will not make it go
+away, and dismisses the legitimacy of a large number of people who
+want a dynamic /dev
+the standard response then becomes: "write a device management
+daemon", which brings us back to the
+proposal above
+What I don't like about devfs
+Here are some common complaints about devfs, and some suggestions and
+solutions that may make it more palatable for you. I can't please
+everybody, but I do try :-)
+I hate the naming scheme
+First, remember that no naming scheme will please everybody. You hate
+the scheme, others love it. Who's to say who's right and who's wrong?
+Ultimately, the person who writes the code gets to choose, and what
+exists now is a combination of the choices made by the
+devfs author and the
+kernel maintainer (Linus).
+However, not all is lost. If you want to create your own naming
+scheme, it is a simple matter to write a standalone script, hack
+devfsd, or write a script called by devfsd. You can create whatever
+naming scheme you like.
+Further, if you want to remove all traces of the devfs naming scheme
+from /dev, you can mount devfs elsewhere (say
+/devfs) and populate /dev with links into
+/devfs. This population can be automated using devfsd if you
+wish.
+You can even use the VFS binding facility to make the links, rather
+than using symbolic links. This way, you don't even have to see the
+"destination" of these symbolic links.
+Devfs puts policy into the kernel
+There's already policy in the kernel. Device numbers are in fact
+policy (why should the kernel dictate what device numbers I use?).
+Face it, some policy has to be in the kernel. The real difference
+between device names as policy and device numbers as policy is that
+no one will use device numbers directly, because device
+numbers are devoid of meaning to humans and are ugly. At least with
+the devfs device names, (even though you can add your own naming
+scheme) some people will use the devfs-supplied names directly. This
+offends some people :-)
+Devfs is bloatware
+This is not even remotely true. As shown above,
+both code and data size are quite modest.
+How to report bugs
+If you have (or think you have) a bug with devfs, please follow the
+steps below:
+make sure you have enabled debugging output when configuring your
+kernel. You will need to set (at least) the following config options:
+CONFIG_DEVFS_DEBUG=y
+CONFIG_DEBUG_KERNEL=y
+CONFIG_DEBUG_SLAB=y
+please make sure you have the latest devfs patches applied. The
+latest kernel version might not have the latest devfs patches applied
+yet (Linus is very busy)
+save a copy of your complete kernel logs (preferably by
+using the dmesg programme) for later inclusion in your bug
+report. You may need to use the -s switch to increase the
+internal buffer size so you can capture all the boot messages.
+Don't edit or trim the dmesg output
+try booting with devfs=dall passed to the kernel boot
+command line (read the documentation on your bootloader on how to do
+this), and save the result to a file. This may be quite verbose, and
+it may overflow the messages buffer, but try to get as much of it as
+you can
+if you get an Oops, run ksymoops to decode it so that the
+names of the offending functions are provided. A non-decoded Oops is
+pretty useless
+send a copy of your devfsd configuration file(s)
+send the bug report to me first.
+Don't expect that I will see it if you post it to the linux-kernel
+mailing list. Include all the information listed above, plus
+anything else that you think might be relevant. Put the string
+devfs somewhere in the subject line, so my mail filters mark
+it as urgent
+Here is a general guide on how to ask questions in a way that greatly
+improves your chances of getting a reply:
+http://www.tuxedo.org/~esr/faqs/smart-questions.html. If you have
+a bug to report, you should also read
+http://www.chiark.greenend.org.uk/~sgtatham/bugs.html.
+Strange kernel messages
+You may see devfs-related messages in your kernel logs. Below are some
+messages and what they mean (and what you should do about them, if
+anything).
+devfs_register(fred): could not append to parent, err: -17
+You need to check what the error code means, but usually 17 means
+EEXIST. This means that a driver attempted to create an entry
+fred in a directory, but there already was an entry with that
+name. This is often caused by flawed boot scripts which untar a bunch
+of inodes into /dev, as a way to restore permissions. This
+message is harmless, as the device nodes will still
+provide access to the driver (unless you use the devfs=only
+boot option, which is only for dedicated souls:-). If you want to get
+rid of these annoying messages, upgrade to devfsd-v1.3.20 and use the
+recommended RESTORE directive to restore permissions.
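+If you are unsure what an error code in such a message means, it can be
+looked up. The Python sketch below uses the standard errno module; the
+function name is hypothetical, and errno values are platform-specific,
+so run it on the same kind of system that produced the message.
```python
import errno
import os


def describe_kernel_error(err):
    """Translate a negative kernel-style error code into (name, message)."""
    code = abs(err)
    return errno.errorcode.get(code, "unknown"), os.strerror(code)


name, message = describe_kernel_error(-17)
print(name, "-", message)
```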
+devfs_mk_dir(bill): using old entry in dir: c1808724 ""
+This is similar to the message above, except that a driver attempted
+to create a directory named bill, and the parent directory
+has an entry with the same name. In this case, to ensure that drivers
+continue to work properly, the old entry is re-used and given to the
+driver. In 2.5 kernels, the driver is given a NULL entry, and thus,
+under rare circumstances, may not create the required device nodes.
+The solution is the same as above.
+Compilation problems with devfsd
+Usually, you can compile devfsd just by typing in
+make in the source directory, followed by a make
+install (as root). Sometimes, you may have problems, particularly
+on broken configurations.
+error messages relating to DEVFSD_NOTIFY_DELETE
+This happened because you have an ancient set of kernel headers
+installed in /usr/include/linux or /usr/src/linux.
+Install kernel 2.4.10 or later. You may need to pass the
+KERNEL_DIR variable to make (if you did not install
+the new kernel sources as /usr/src/linux), or you may copy
+the devfs_fs.h file in the kernel source tree into
+/usr/include/linux.
+-----------------------------------------------------------------------------
+Other resources
+Douglas Gilbert has written a useful document at
+http://www.torque.net/sg/devfs_scsi.html which
+explores the SCSI subsystem and how it interacts with devfs
+Douglas Gilbert has written another useful document at
+http://www.torque.net/scsi/SCSI-2.4-HOWTO/ which
+discusses the Linux SCSI subsystem in 2.4.
+Johannes Erdfelt has started a discussion paper on Linux and
+hot-swap devices, describing what the requirements are for a scalable
+solution and how and why he's used devfs+devfsd. Note that this is an
+early draft only, available in plain text form at:
+http://johannes.erdfelt.com/hotswap.txt.
+Johannes has promised an HTML version will follow.
+I presented an invited paper at the 2nd Annual Storage Management
+Workshop held in Miami, Florida, U.S.A. in October 2000.
+-----------------------------------------------------------------------------
+Translations of this document
+This document has been translated into other languages.
+The document master (in English) by rgooch@atnf.csiro.au is
+available at
+http://www.atnf.csiro.au/~rgooch/linux/docs/devfs.html
+A Korean translation by viatoris@nownuri.net is available at
+http://your.destiny.pe.kr/devfs/devfs.html
+-----------------------------------------------------------------------------
+Most flags courtesy of ITA's 
+Flags of All Countries
+used with permission. 
diff --git a/Documentation/filesystems/devfs/ToDo b/Documentation/filesystems/devfs/ToDo
new file mode 100644
index 000000000000..afd5a8f2c19b
--- /dev/null
+++ b/Documentation/filesystems/devfs/ToDo
@@ -0,0 +1,40 @@
+                Device File System (devfs) ToDo List
+                Richard Gooch <rgooch@atnf.csiro.au>
+                              3-JUL-2000
+This is a list of things to be done for better devfs support in the
+Linux kernel. If you'd like to contribute to devfs, please have a
+look at this list for anything that is unallocated. Also, if there are
+items missing (surely), please contact me so I can add them to the
+list (preferably with your name attached to them:-).
+- >256 ptys
+  Thanks to C. Scott Ananian <cananian@alumni.princeton.edu>
+- Amiga floppy driver (drivers/block/amiflop.c)
+- Atari floppy driver (drivers/block/ataflop.c)
+- SWIM3 (Super Woz Integrated Machine 3) floppy driver (drivers/block/swim3.c)
+- Amiga ZorroII ramdisc driver (drivers/block/z2ram.c)
+- Parallel port ATAPI CD-ROM (drivers/block/paride/pcd.c)
+- Parallel port ATAPI floppy (drivers/block/paride/pf.c)
+- AP1000 block driver (drivers/ap1000/ap.c, drivers/ap1000/ddv.c)
+- Archimedes floppy (drivers/acorn/block/fd1772.c)
+- MFM hard drive (drivers/acorn/block/mfmhd.c)
+- I2O block device (drivers/message/i2o/i2o_block.c)
+- ST-RAM device (arch/m68k/atari/stram.c)
+- Raw devices
diff --git a/Documentation/filesystems/devfs/boot-options b/Documentation/filesystems/devfs/boot-options
new file mode 100644
index 000000000000..df3d33b03e0a
--- /dev/null
+++ b/Documentation/filesystems/devfs/boot-options
@@ -0,0 +1,65 @@
+/* -*- auto-fill -*-                                                         */
+                Device File System (devfs) Boot Options
+                Richard Gooch <rgooch@atnf.csiro.au>
+                              18-AUG-2001
+When CONFIG_DEVFS_DEBUG is enabled, you can pass several boot options
+to the kernel to debug devfs. The boot options are prefixed by
+"devfs=", and are separated by commas. Spaces are not allowed. The
+syntax looks like this:
+devfs=<option1>,<option2>,<option3>
+and so on. For example, if you wanted to turn on debugging for module
+load requests and device registration, you would do:
+devfs=dmod,dreg
+You may prefix "no" to any option. This will invert the option.
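+The syntax rules above (comma-separated options, no spaces, optional
+"no" prefix) can be illustrated with a small parser. This Python sketch
+only models the rules as stated in this file and is not the kernel's own
+parsing code; the function name is hypothetical.
```python
def parse_devfs_options(arg):
    """Parse 'devfs=<opt1>,<opt2>,...' into a dict of {option: enabled}."""
    prefix = "devfs="
    if not arg.startswith(prefix):
        raise ValueError("expected an argument starting with 'devfs='")
    body = arg[len(prefix):]
    if " " in body:
        raise ValueError("spaces are not allowed")
    result = {}
    for opt in body.split(","):
        if not opt:
            continue
        # A leading "no" inverts the option that follows it.
        if opt.startswith("no") and len(opt) > 2:
            result[opt[2:]] = False
        else:
            result[opt] = True
    return result


print(parse_devfs_options("devfs=dmod,dreg"))
```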
+Debugging Options
+=================
+These require CONFIG_DEVFS_DEBUG to be enabled.
+Note that all debugging options have 'd' as the first character. By
+default all options are off. All debugging output is sent to the
+kernel logs. The debugging options do not take effect until the devfs
+version message appears (just prior to the root filesystem being
+mounted).
+These are the options:
+dmod            print module load requests to <request_module>
+dreg            print device register requests to <devfs_register>
+dunreg          print device unregister requests to <devfs_unregister>
+dchange         print device change requests to <devfs_set_flags>
+dilookup        print inode lookup requests
+diget           print VFS inode allocations
+diunlink        print inode unlinks
+dichange        print inode changes
+dimknod         print calls to mknod(2)
+dall            some debugging turned on
+Other Options
+=============
+These control the default behaviour of devfs. The options are:
+mount           mount devfs onto /dev at boot time
+only            disable non-devfs device nodes for devfs-capable drivers
diff --git a/Documentation/filesystems/directory-locking b/Documentation/filesystems/directory-locking
new file mode 100644
index 000000000000..34380d4fbce3
--- /dev/null
+++ b/Documentation/filesystems/directory-locking
@@ -0,0 +1,113 @@
+        Locking scheme used for directory operations is based on two
+kinds of locks - per-inode (->i_sem) and per-filesystem (->s_vfs_rename_sem).
+        For our purposes all operations fall in 6 classes:
+1) read access.  Locking rules: caller locks directory we are accessing.
+2) object creation.  Locking rules: same as above.
+3) object removal.  Locking rules: caller locks parent, finds victim,
+locks victim and calls the method.
+4) rename() that is _not_ cross-directory.  Locking rules: caller locks
+the parent, finds source and target, if target already exists - locks it
+and then calls the method.
+5) link creation.  Locking rules:
+        * lock parent
+        * check that source is not a directory
+        * lock source
+        * call the method.
+6) cross-directory rename.  The trickiest in the whole bunch.  Locking
+rules:
+        * lock the filesystem
+        * lock parents in "ancestors first" order.
+        * find source and target.
+        * if old parent is equal to or is a descendent of target
+                fail with -ENOTEMPTY
+        * if new parent is equal to or is a descendent of source
+                fail with -ELOOP
+        * if target exists - lock it.
+        * call the method.
+The rules above obviously guarantee that all directories that are going to be
+read, modified or removed by method will be locked by caller.
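+The two ancestry checks in the cross-directory rename rules can be
+modelled on a toy tree. This Python sketch covers only those two checks,
+not the locking itself; the parent map, the path names and the function
+names are hypothetical illustrations, not kernel code.
```python
import errno


def is_self_or_descendant(node, ancestor, parent):
    """True if `node` is `ancestor` or lies below it in the `parent` map."""
    while node is not None:
        if node == ancestor:
            return True
        node = parent.get(node)
    return False


def check_cross_dir_rename(source, old_parent, new_parent, target, parent):
    """Return 0 if the rename may proceed, else a negative errno."""
    # Old parent equal to or below the target: target cannot be empty.
    if target is not None and is_self_or_descendant(old_parent, target, parent):
        return -errno.ENOTEMPTY
    # New parent equal to or below the source: rename would create a loop.
    if is_self_or_descendant(new_parent, source, parent):
        return -errno.ELOOP
    return 0


# Toy tree: child -> parent (the root has no parent).
tree = {"/": None, "/a": "/", "/a/b": "/a", "/a/b/x": "/a/b", "/c": "/"}
print(check_cross_dir_rename("/a", "/", "/a/b", None, tree) == -errno.ELOOP)
```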
+If no directory is its own ancestor, the scheme above is deadlock-free.
+Proof:
+        First of all, at any moment we have a partial ordering of the
+objects - A < B iff A is an ancestor of B.
+        That ordering can change.  However, the following is true:
+(1) if object removal or non-cross-directory rename holds lock on A and
+    attempts to acquire lock on B, A will remain the parent of B until we
+    acquire the lock on B.  (Proof: only cross-directory rename can change
+    the parent of object and it would have to lock the parent).
+(2) if cross-directory rename holds the lock on filesystem, order will not
+    change until rename acquires all locks.  (Proof: other cross-directory
+    renames will be blocked on filesystem lock and we don't start changing
+    the order until we had acquired all locks).
+(3) any operation holds at most one lock on non-directory object and
+    that lock is acquired after all other locks.  (Proof: see descriptions
+    of operations).
+        Now consider the minimal deadlock.  Each process is blocked on
+attempt to acquire some lock and already holds at least one lock.  Let's
+consider the set of contended locks.  First of all, filesystem lock is
+not contended, since any process blocked on it is not holding any locks.
+Thus all processes are blocked on ->i_sem.
+        Non-directory objects are not contended due to (3).  Thus link
+creation can't be a part of deadlock - it can't be blocked on source
+and it means that it doesn't hold any locks.
+        Any contended object is either held by cross-directory rename or
+has a child that is also contended.  Indeed, suppose that it is held by
+operation other than cross-directory rename.  Then the lock this operation
+is blocked on belongs to child of that object due to (1).
+        It means that one of the operations is cross-directory rename.
+Otherwise the set of contended objects would be infinite - each of them
+would have a contended child and we had assumed that no object is its
+own descendent.  Moreover, there is exactly one cross-directory rename
+(see above).
+        Consider the object blocking the cross-directory rename.  One
+of its descendents is locked by cross-directory rename (otherwise we
+would again have an infinite set of contended objects).  But that
+means that cross-directory rename is taking locks out of order.  Due
+to (2) the order hadn't changed since we had acquired filesystem lock.
+But locking rules for cross-directory rename guarantee that we do not
+try to acquire lock on descendent before the lock on ancestor.
+Contradiction.  I.e.  deadlock is impossible.  Q.E.D.
+        These operations are guaranteed to avoid loop creation.  Indeed,
+the only operation that could introduce loops is cross-directory rename.
+Since the only new (parent, child) pair added by rename() is (new parent,
+source), such loop would have to contain these objects and the rest of it
+would have to exist before rename().  I.e. at the moment of loop creation
+rename() responsible for that would be holding filesystem lock and new parent
+would have to be equal to or a descendent of source.  But that means that
+new parent had been equal to or a descendent of source since the moment when
+we had acquired filesystem lock and rename() would fail with -ELOOP in that
+case.
+        While this locking scheme works for arbitrary DAGs, it relies on
+ability to check that directory is a descendent of another object.  Current
+implementation assumes that directory graph is a tree.  This assumption is
+also preserved by all operations (cross-directory rename on a tree that would
+not introduce a cycle will leave it a tree and link() fails for directories).
+        Notice that "directory" in the above == "anything that might have
+children", so if we are going to introduce hybrid objects we will need
+either to make sure that link(2) doesn't work for them or to make changes
+in is_subdir() that would make it work even in presence of such beasts.
diff --git a/Documentation/filesystems/ext2.txt b/Documentation/filesystems/ext2.txt
new file mode 100644
index 000000000000..b5cb9110cc6b
--- /dev/null
+++ b/Documentation/filesystems/ext2.txt
@@ -0,0 +1,383 @@
+The Second Extended Filesystem
+==============================
+ext2 was originally released in January 1993.  Written by Rémy Card,
+Theodore Ts'o and Stephen Tweedie, it was a major rewrite of the
+Extended Filesystem.  It is currently still (April 2001) the predominant
+filesystem in use by Linux.  There are also implementations available
+for NetBSD, FreeBSD, the GNU HURD, Windows 95/98/NT, OS/2 and RISC OS.
+Options
+=======
+Most defaults are determined by the filesystem superblock, and can be
+set using tune2fs(8). Kernel-determined defaults are indicated by (*).
+bsddf                   (*)     Makes `df' act like BSD.
+minixdf                         Makes `df' act like Minix.
+check                           Check block and inode bitmaps at mount time
+                                (requires CONFIG_EXT2_CHECK).
+check=none, nocheck     (*)     Don't do extra checking of bitmaps on mount
+                                (check=normal and check=strict options removed)
+debug                           Extra debugging information is sent to the
+                                kernel syslog.  Useful for developers.
+errors=continue                 Keep going on a filesystem error.
+errors=remount-ro               Remount the filesystem read-only on an error.
+errors=panic                    Panic and halt the machine if an error occurs.
+grpid, bsdgroups                Give objects the same group ID as their parent.
+nogrpid, sysvgroups             New objects have the group ID of their creator.
+nouid32                         Use 16-bit UIDs and GIDs.
+oldalloc                        Enable the old block allocator.  Orlov should
+                                have better performance; we'd like to get some
+                                feedback if it's the contrary for you.
+orlov                   (*)     Use the Orlov block allocator.
+                                (See http://lwn.net/Articles/14633/ and
+                                http://lwn.net/Articles/14446/.)
+resuid=n                        The user ID which may use the reserved blocks.
+resgid=n                        The group ID which may use the reserved blocks.
+sb=n                            Use alternate superblock at this location.
+user_xattr                      Enable "user." POSIX Extended Attributes
+                                (requires CONFIG_EXT2_FS_XATTR).
+                                See also http://acl.bestbits.at
+nouser_xattr                    Don't support "user." extended attributes.
+acl                             Enable POSIX Access Control Lists support
+                                (requires CONFIG_EXT2_FS_POSIX_ACL).
+                                See also http://acl.bestbits.at
+noacl                           Don't support POSIX ACLs.
+nobh                            Do not attach buffer_heads to file pagecache.
+grpquota,noquota,quota,usrquota Quota options are silently ignored by ext2.
+Specification
+=============
+ext2 shares many properties with traditional Unix filesystems.  It has
+the concepts of blocks, inodes and directories.  It has space in the
+specification for Access Control Lists (ACLs), fragments, undeletion and
+compression though these are not yet implemented (some are available as
+separate patches).  There is also a versioning mechanism to allow new
+features (such as journalling) to be added in a maximally compatible
+manner.
+Blocks
+------
+The space in the device or file is split up into blocks.  These are
+a fixed size, of 1024, 2048 or 4096 bytes (8192 bytes on Alpha systems),
+which is decided when the filesystem is created.  Smaller blocks mean
+less wasted space per file, but require slightly more accounting overhead,
+and also impose other limits on the size of files and the filesystem.
+Block Groups
+------------
+Blocks are clustered into block groups in order to reduce fragmentation
+and minimise the amount of head seeking when reading a large amount
+of consecutive data.  Information about each block group is kept in a
+descriptor table stored in the block(s) immediately after the superblock.
+Two blocks near the start of each group are reserved for the block usage
+bitmap and the inode usage bitmap which show which blocks and inodes
+are in use.  Since each bitmap is limited to a single block and each bit
+describes one block, a block group can contain at most 8 times the block
+size (in bytes) blocks, e.g. 8192 blocks with 1kB blocks.
+The block(s) following the bitmaps in each block group are designated
+as the inode table for that block group and the remainder are the data
+blocks.  The block allocation algorithm attempts to allocate data blocks
+in the same block group as the inode which contains them.
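+The bitmap limit on block group size can be worked through as a short
+calculation.  This is a sketch under the assumption that one bitmap bit
+tracks exactly one block; the function names are made up for the example.

```python
def max_blocks_per_group(block_size):
    """One bitmap block holds block_size bytes, i.e. 8 * block_size bits."""
    return 8 * block_size

def max_group_bytes(block_size):
    """Largest amount of space a single block group can cover, in bytes."""
    return max_blocks_per_group(block_size) * block_size
```

+With 1kB blocks this gives 8192 blocks (8MB) per group, and with 4kB
+blocks 32768 blocks (128MB) per group.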
+The Superblock
+--------------
+The superblock contains all the information about the configuration of
+the filing system.  The primary copy of the superblock is stored at an
+offset of 1024 bytes from the start of the device, and it is essential
+to mounting the filesystem.  Since it is so important, backup copies of
+the superblock are stored in block groups throughout the filesystem.
+The first version of ext2 (revision 0) stores a copy at the start of
+every block group, along with backups of the group descriptor block(s).
+Because this can consume a considerable amount of space for large
+filesystems, later revisions can optionally reduce the number of backup
+copies by only putting backups in specific groups (this is the sparse
+superblock feature).  The groups chosen are 0, 1 and powers of 3, 5 and 7.
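+The group selection rule for the sparse superblock feature can be
+sketched as follows.  The function name is hypothetical and the code only
+restates the rule given above (groups 0, 1 and powers of 3, 5 and 7); it
+is not taken from the kernel or e2fsprogs sources.

```python
def has_backup_superblock(group):
    """True if this block group holds a superblock copy under the sparse rule."""
    if group <= 1:
        return True
    for base in (3, 5, 7):
        power = base
        while power < group:
            power *= base
        if power == group:
            return True
    return False

# Groups below 30 that carry a superblock copy.
backup_groups = [g for g in range(30) if has_backup_superblock(g)]
```

+For the first 30 groups this selects 0, 1, 3, 5, 7, 9, 25 and 27.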
+The information in the superblock contains fields such as the total
+number of inodes and blocks in the filesystem and how many are free,
+how many inodes and blocks are in each block group, when the filesystem
+was mounted (and if it was cleanly unmounted), when it was modified,
+what version of the filesystem it is (see the Revisions section below)
+and which OS created it.
+If the filesystem is revision 1 or higher, then there are extra fields,
+such as a volume name, a unique identification number, the inode size,
+and space for optional filesystem features to store configuration info.
+All fields in the superblock (as in all other ext2 structures) are stored
+on the disc in little endian format, so a filesystem is portable between
+machines without having to know what machine it was created on.
+Inodes
+------
+The inode (index node) is a fundamental concept in the ext2 filesystem.
+Each object in the filesystem is represented by an inode.  The inode
+structure contains pointers to the filesystem blocks which contain the
+data held in the object and all of the metadata about an object except
+its name.  The metadata about an object includes the permissions, owner,
+group, flags, size, number of blocks used, access time, change time,
+modification time, deletion time, number of links, fragments, version
+(for NFS) and extended attributes (EAs) and/or Access Control Lists (ACLs).
+There are some reserved fields which are currently unused in the inode
+structure and several which are overloaded.  One field is reserved for the
+directory ACL if the inode is a directory and alternately for the top 32
+bits of the file size if the inode is a regular file (allowing file sizes
+larger than 2GB).  The translator field is unused under Linux, but is used
+by the HURD to reference the inode of a program which will be used to
+interpret this object.  Most of the remaining reserved fields have been
+used up for both Linux and the HURD for larger owner and group fields.
+The HURD also has a larger mode field so it uses another of the remaining
+fields to store the extra mode bits.
+There are pointers to the first 12 blocks which contain the file's data
+in the inode.  There is a pointer to an indirect block (which contains
+pointers to the next set of blocks), a pointer to a doubly-indirect
+block (which contains pointers to indirect blocks) and a pointer to a
+trebly-indirect block (which contains pointers to doubly-indirect blocks).
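+The pointer scheme above bounds how many blocks a single file can
+address.  The sketch below assumes that each block pointer is 4 bytes
+wide, so that a block of pointers holds block_size / 4 entries; the
+function names are made up for the example.

```python
POINTER_BYTES = 4  # assumption: block numbers are stored as 32-bit values

def addressable_blocks(block_size):
    """Blocks reachable through direct, indirect, doubly- and trebly-indirect pointers."""
    per_block = block_size // POINTER_BYTES
    return 12 + per_block + per_block ** 2 + per_block ** 3

def pointer_limit_bytes(block_size):
    """Upper bound on file size implied by the pointer scheme alone."""
    return addressable_blocks(block_size) * block_size
```

+For 1kB and 2kB blocks this gives roughly 16GB and 256GB.  For larger
+block sizes the pointer scheme alone would allow more than the 2048GB
+listed under Limitations, so a different limit applies there.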
+The flags field contains some ext2-specific flags which aren't catered
+for by the standard chmod flags.  These flags can be listed with lsattr
+and changed with the chattr command, and allow specific filesystem
+behaviour on a per-file basis.  There are flags for secure deletion,
+undeletable, compression, synchronous updates, immutability, append-only,
+dumpable, no-atime, indexed directories, and data-journaling.  Not all
+of these are supported yet.
+Directories
+-----------
+A directory is a filesystem object and has an inode just like a file.
+It is a specially formatted file containing records which associate
+each name with an inode number.  Later revisions of the filesystem also
+encode the type of the object (file, directory, symlink, device, fifo,
+socket) to avoid the need to check the inode itself for this information
+(support for taking advantage of this feature does not yet exist in
+Glibc 2.2).
+The inode allocation code tries to assign inodes which are in the same
+block group as the directory in which they are first created.
+The current implementation of ext2 uses a singly-linked list to store
+the filenames in the directory; a pending enhancement uses hashing of the
+filenames to allow lookup without the need to scan the entire directory.
+The current implementation never removes empty directory blocks once they
+have been allocated to hold more files.
+Special files
+-------------
+Symbolic links are also filesystem objects with inodes.  They deserve
+special mention because the data for them is stored within the inode
+itself if the symlink is less than 60 bytes long.  It uses the fields
+which would normally be used to store the pointers to data blocks.
+This is a worthwhile optimisation as it avoids allocating a full
+block for the symlink, and most symlinks are less than 60 characters long.
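+The 60-byte figure matches the space taken by the block pointers in the
+inode.  The sketch below assumes 12 direct pointers plus 3 indirect ones,
+each 4 bytes wide; the names are illustrative only.

```python
POINTER_COUNT = 12 + 3   # direct + indirect, doubly-indirect, trebly-indirect
POINTER_BYTES = 4        # assumption: 32-bit block numbers
INLINE_BYTES = POINTER_COUNT * POINTER_BYTES

def symlink_stored_in_inode(target):
    """True if the link target is short enough to live in the pointer area."""
    return len(target.encode()) < INLINE_BYTES
```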
+Character and block special devices never have data blocks assigned to
+them.  Instead, their device number is stored in the inode, again reusing
+the fields which would be used to point to the data blocks.
+Reserved Space
+--------------
+In ext2, there is a mechanism for reserving a certain number of blocks
+for a particular user (normally the super-user).  This is intended to
+allow for the system to continue functioning even if non-privileged users
+fill up all the space available to them (this is independent of filesystem
+quotas).  It also keeps the filesystem from filling up entirely which
+helps combat fragmentation.
+Filesystem check
+----------------
+At boot time, most systems run a consistency check (e2fsck) on their
+filesystems.  The superblock of the ext2 filesystem contains several
+fields which indicate whether fsck should actually run (since checking
+the filesystem at boot can take a long time if it is large).  fsck will
+run if the filesystem was not cleanly unmounted, if the maximum mount
+count has been exceeded or if the maximum time between checks has been
+exceeded.
+Feature Compatibility
+---------------------
+The compatibility feature mechanism used in ext2 is sophisticated.
+It safely allows features to be added to the filesystem, without
+unnecessarily sacrificing compatibility with older versions of the
+filesystem code.  The feature compatibility mechanism is not supported by
+the original revision 0 (EXT2_GOOD_OLD_REV) of ext2, but was introduced in
+revision 1.  There are three 32-bit fields, one for compatible features
+(COMPAT), one for read-only compatible (RO_COMPAT) features and one for
+incompatible (INCOMPAT) features.
+These feature flags have specific meanings for the kernel as follows:
+A COMPAT flag indicates that a feature is present in the filesystem,
+but the on-disk format is 100% compatible with older on-disk formats, so
+a kernel which didn't know anything about this feature could read/write
+the filesystem without any chance of corrupting the filesystem (or even
+making it inconsistent).  This is essentially just a flag which says
+"this filesystem has a (hidden) feature" that the kernel or e2fsck may
+want to be aware of (more on e2fsck and feature flags later).  The ext3
+HAS_JOURNAL feature is a COMPAT flag because the ext3 journal is simply
+a regular file with data blocks in it so the kernel does not need to
+take any special notice of it if it doesn't understand ext3 journaling.
+An RO_COMPAT flag indicates that the on-disk format is 100% compatible
+with older on-disk formats for reading (i.e. the feature does not change
+the visible on-disk format).  However, an old kernel writing to such a
+filesystem would/could corrupt the filesystem, so this is prevented. The
+most common such feature, SPARSE_SUPER, is an RO_COMPAT feature because
+sparse groups allow file data blocks where superblock/group descriptor
+backups used to live, and ext2_free_blocks() refuses to free these blocks,
+which would lead to inconsistent bitmaps.  An old kernel would also
+get an error if it tried to free a series of blocks which crossed a group
+boundary, but this is a legitimate layout in a SPARSE_SUPER filesystem.
+An INCOMPAT flag indicates the on-disk format has changed in some
+way that makes it unreadable by older kernels, or would otherwise
+cause a problem if an old kernel tried to mount it.  FILETYPE is an
+INCOMPAT flag because older kernels would think a filename was longer
+than 256 characters, which would lead to corrupt directory listings.
+The COMPRESSION flag is an obvious INCOMPAT flag - if the kernel
+doesn't understand compression, you would just get garbage back from
+read() instead of it automatically decompressing your data.  The ext3
+RECOVER flag is needed to prevent a kernel which does not understand the
+ext3 journal from mounting the filesystem without replaying the journal.
+e2fsck needs to be stricter than the kernel in its handling of these
+flags.  If it doesn't understand ANY of the COMPAT,
+RO_COMPAT, or INCOMPAT flags it will refuse to check the filesystem,
+because it has no way of verifying whether a given feature is valid
+or not.  Allowing e2fsck to succeed on a filesystem with an unknown
+feature would give the user a false sense of security.  Refusing to check
+a filesystem with unknown features is a good incentive for the user to
+update to the latest e2fsck.  This also means that anyone adding feature
+flags to ext2 also needs to update e2fsck to verify these features.
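+The difference between the kernel's and e2fsck's handling of unknown
+feature bits can be sketched as below.  The function names, the returned
+strings and the bit values used with them are assumptions made for the
+illustration; only the decision rules come from the text above.

```python
def kernel_mount_mode(ro_compat, incompat, known_ro_compat, known_incompat):
    """Decide how a kernel may mount, given on-disk and understood feature masks.

    Unknown COMPAT bits never matter to the kernel, so they are not passed in.
    """
    if incompat & ~known_incompat:
        return "refuse"        # format may be unreadable
    if ro_compat & ~known_ro_compat:
        return "read-only"     # reading is safe, writing could corrupt
    return "read-write"

def e2fsck_will_check(compat, ro_compat, incompat,
                      known_compat, known_ro_compat, known_incompat):
    """e2fsck refuses to run if ANY feature bit is unknown to it."""
    unknown = ((compat & ~known_compat)
               | (ro_compat & ~known_ro_compat)
               | (incompat & ~known_incompat))
    return unknown == 0
```

+Note that a single unknown COMPAT bit leaves the kernel free to mount
+read-write, while the same bit makes e2fsck refuse to check.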
+Metadata
+--------
+It is frequently claimed that the ext2 implementation of writing
+asynchronous metadata is faster than the ffs synchronous metadata
+scheme but less reliable.  Both methods are equally resolvable by their
+respective fsck programs.
+If you're exceptionally paranoid, there are 3 ways of making metadata
+writes synchronous on ext2:
+per-file if you have the program source: use the O_SYNC flag to open()
+per-file if you don't have the source: use "chattr +S" on the file
+per-filesystem: add the "sync" option to mount (or in /etc/fstab)
+The first and last are not ext2 specific but do force the metadata to
+be written synchronously.  See also Journaling below.
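+The first of these methods, as seen from a program, might look like the
+sketch below.  The helper name is hypothetical; os.O_SYNC is the standard
+flag exposed on Unix-like systems, and the example makes no claim about
+how any particular filesystem honours it.

```python
import os

def write_synchronously(path, data):
    """Write data to path through a descriptor opened with O_SYNC."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC
    fd = os.open(path, flags, 0o644)
    try:
        # Each write returns only after the data has been handed to stable storage.
        return os.write(fd, data)
    finally:
        os.close(fd)
```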
+Limitations
+-----------
+There are various limits imposed by the on-disk layout of ext2.  Other
+limits are imposed by the current implementation of the kernel code.
+Many of the limits are determined at the time the filesystem is first
+created, and depend upon the block size chosen.  The ratio of inodes to
+data blocks is fixed at filesystem creation time, so the only way to
+increase the number of inodes is to increase the size of the filesystem.
+No tools currently exist which can change the ratio of inodes to blocks.
+Most of these limits could be overcome with slight changes in the on-disk
+format and using a compatibility flag to signal the format change (at
+the expense of some compatibility).
+Filesystem block size:     1kB        2kB        4kB        8kB
+File size limit:          16GB      256GB     2048GB     2048GB
+Filesystem size limit:  2047GB     8192GB    16384GB    32768GB
+There is a 2.4 kernel limit of 2048GB for a single block device, so no
+filesystem larger than that can be created at this time.  There is also
+an upper limit on the block size imposed by the page size of the kernel,
+so 8kB blocks are only allowed on Alpha systems (and other architectures
+which support larger pages).
+There is an upper limit of 32768 subdirectories in a single directory.
+There is a "soft" upper limit of about 10-15k files in a single directory
+with the current linear linked-list directory implementation.  This limit
+stems from performance problems when creating and deleting (and also
+finding) files in such large directories.  Using a hashed directory index
+(under development) allows 100k-1M+ files in a single directory without
+performance problems (although RAM size becomes an issue at this point).
+The (meaningless) absolute upper limit of files in a single directory
+(imposed by the file size, the realistic limit is obviously much less)
+is over 130 trillion files.  It would be higher except there are not
+enough 4-character names to make up unique directory entries, so they
+have to be 8 character filenames; even then we are fairly close to
+running out of unique filenames.
+Journaling
+----------
+A journaling extension to the ext2 code has been developed by Stephen
+Tweedie.  It avoids the risks of metadata corruption and the need to
+wait for e2fsck to complete after a crash, without requiring a change
+to the on-disk ext2 layout.  In a nutshell, the journal is a regular
+file which stores whole metadata (and optionally data) blocks that have
+been modified, prior to writing them into the filesystem.  This means
+it is possible to add a journal to an existing ext2 filesystem without
+the need for data conversion.
+When changes are made to the filesystem (e.g. a file is renamed), they are stored in
+a transaction in the journal and can either be complete or incomplete at
+the time of a crash.  If a transaction is complete at the time of a crash
+(or in the normal case where the system does not crash), then any blocks
+in that transaction are guaranteed to represent a valid filesystem state,
+and are copied into the filesystem.  If a transaction is incomplete at
+the time of the crash, then there is no guarantee of consistency for
+the blocks in that transaction so they are discarded (which means any
+filesystem changes they represent are also lost).
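+The recovery rule can be modelled in a few lines.  This is a toy model,
+not the ext3 journal format: the filesystem is a dict of block number to
+contents, each transaction is a dict with hypothetical "complete" and
+"blocks" keys, and only the last transaction is expected to be incomplete.

```python
def recover(filesystem, journal):
    """Copy blocks of complete transactions into the filesystem, drop the rest."""
    result = dict(filesystem)
    for txn in journal:
        if txn["complete"]:
            result.update(txn["blocks"])   # guaranteed to be a valid state
        # incomplete transactions are discarded, so their changes are lost
    return result
```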
+Check Documentation/filesystems/ext3.txt if you want to read more about
+ext3 and journaling.
+References
+==========
+The kernel source       file:/usr/src/linux/fs/ext2/
+e2fsprogs (e2fsck)      http://e2fsprogs.sourceforge.net/
+Design & Implementation http://e2fsprogs.sourceforge.net/ext2intro.html
+Journaling (ext3)       ftp://ftp.uk.linux.org/pub/linux/sct/fs/jfs/
+Hashed Directories      http://kernelnewbies.org/~phillips/htree/
+Filesystem Resizing     http://ext2resize.sourceforge.net/
+Compression (*)         http://www.netspace.net.au/~reiter/e2compr/
+Implementations for:
+Windows 95/98/NT/2000   http://uranus.it.swin.edu.au/~jn/linux/Explore2fs.htm
+Windows 95 (*)          http://www.yipton.demon.co.uk/content.html#FSDEXT2
+DOS client (*)          ftp://metalab.unc.edu/pub/Linux/system/filesystems/ext2/
+OS/2                    http://perso.wanadoo.fr/matthieu.willm/ext2-os2/
+RISC OS client          ftp://ftp.barnet.ac.uk/pub/acorn/armlinux/iscafs/
+(*) no longer actively developed/supported (as of Apr 2001)
diff --git a/Documentation/filesystems/ext3.txt b/Documentation/filesystems/ext3.txt
new file mode 100644
index 000000000000..9ab7f446f7ad
--- /dev/null
+++ b/Documentation/filesystems/ext3.txt
@@ -0,0 +1,183 @@
+Ext3 Filesystem
+===============
+ext3 was originally released in September 1999. Written by Stephen Tweedie
+for the 2.2 branch, and ported to 2.4 kernels by Peter Braam, Andreas Dilger,
+Andrew Morton, Alexander Viro, Ted Ts'o and Stephen Tweedie.
+ext3 is the ext2 filesystem enhanced with journalling capabilities.
+Options
+=======
+When mounting an ext3 filesystem, the following options are accepted:
+(*) == default
+journal=update          Update the ext3 file system's journal to the
+                        current format.
+journal=inum            When a journal already exists, this option is 
+                        ignored. Otherwise, it specifies the number of
+                        the inode which will represent the ext3 file
+                        system's journal file.
+noload                  Don't load the journal on mounting.
+data=journal            All data are committed into the journal prior
+                        to being written into the main file system.
+data=ordered    (*)     All data are forced directly out to the main file
+                        system prior to its metadata being committed to
+                        the journal.
+data=writeback          Data ordering is not preserved, data may be
+                        written into the main file system after its
+                        metadata has been committed to the journal.
+commit=nrsec    (*)     Ext3 can be told to sync all its data and metadata
+                        every 'nrsec' seconds. The default value is 5 seconds.
+                        This means that if you lose your power, you will lose
+                        at most the latest 5 seconds of work (your filesystem
+                        will not be damaged though, thanks to journaling). This
+                        default value (or any low value) will hurt performance,
+                        but it's good for data-safety. Setting it to 0 will
+                        have the same effect as leaving the default 5 sec.
+                        Setting it to very large values will improve
+                        performance.
+barrier=1               This enables/disables barriers. barrier=0 disables it,
+                        barrier=1 enables it.
+orlov           (*)     This enables the new Orlov block allocator. It's enabled
+                        by default.
+oldalloc                This disables the Orlov block allocator and enables the
+                        old block allocator. Orlov should have better performance;
+                        we'd like to get some feedback if it's the contrary for
+                        you.
+user_xattr      (*)     Enables POSIX Extended Attributes. It's enabled by
+                        default, however you need to configure its support
+                        (CONFIG_EXT3_FS_XATTR). This is necessary if you want
+                        to use POSIX Access Control Lists support. You can visit
+                        http://acl.bestbits.at to know more about POSIX Extended
+                        attributes.
+nouser_xattr            Disables POSIX Extended Attributes.
+acl             (*)     Enables POSIX Access Control Lists support. This is
+                        enabled by default, however you need to configure
+                        its support (CONFIG_EXT3_FS_POSIX_ACL). If you want
+                        to know more about ACLs visit http://acl.bestbits.at
+noacl                   This option disables POSIX Access Control List support.
+reservation
+noreservation
+resize=
+bsddf           (*)     Make 'df' act like BSD.
+minixdf                 Make 'df' act like Minix.
+check=none              Don't do extra checking of bitmaps on mount.
+nocheck         
+debug                   Extra debugging information is sent to syslog.
+errors=remount-ro(*)    Remount the filesystem read-only on an error.
+errors=continue         Keep going on a filesystem error.
+errors=panic            Panic and halt the machine if an error occurs.
+grpid                   Give objects the same group ID as their parent.
+bsdgroups               
+nogrpid         (*)     New objects have the group ID of their creator.
+sysvgroups
+resgid=n                The group ID which may use the reserved blocks.
+resuid=n                The user ID which may use the reserved blocks.
+sb=n                    Use alternate superblock at this location.
+quota                   Quota options are currently silently ignored.
+noquota                 (see fs/ext3/super.c, line 594)
+grpquota
+usrquota
+Specification
+=============
+ext3 shares its entire on-disk implementation with the ext2 filesystem, and
+adds transaction capabilities to ext2.  Journaling is done by the
+Journaling Block Device layer.
+Journaling Block Device layer
+-----------------------------
+The Journaling Block Device layer (JBD) isn't ext3 specific.  It was
+designed to add journaling capabilities to a block device.  The ext3
+filesystem code informs the JBD of the modifications it is performing
+(called a transaction).  The journal supports transaction start and
+stop, and in case of a crash, the journal can replay the transactions
+to quickly put the partition back into a consistent state.
+Handles represent a single atomic update to a filesystem.  JBD can
+handle an external journal on a block device.
+Data Mode
+---------
+There are 3 different data modes:
+* writeback mode
+In data=writeback mode, ext3 does not journal data at all.  This mode
+provides a similar level of journaling as XFS, JFS, and ReiserFS in its
+default mode - metadata journaling.  A crash+recovery can cause
+incorrect data to appear in files which were written shortly before the
+crash.  This mode will typically provide the best ext3 performance.
+* ordered mode
+In data=ordered mode, ext3 only officially journals metadata, but it
+logically groups metadata and data blocks into a single unit called a
+transaction.  When it's time to write the new metadata out to disk, the
+associated data blocks are written first.  In general, this mode
+performs slightly slower than writeback but significantly faster than
+journal mode.
+* journal mode
+data=journal mode provides full data and metadata journaling.  All new
+data is written to the journal first, and then to its final location. 
+In the event of a crash, the journal can be replayed, bringing both
+data and metadata into a consistent state.  This mode is the slowest
+except when data needs to be read from and written to disk at the same
+time, where it outperforms all other modes.
+Compatibility
+-------------
+Ext2 partitions can easily be converted to ext3, with `tune2fs -j <dev>`.
+Ext3 is fully compatible with Ext2.  Ext3 partitions can easily be
+mounted as Ext2.
+External Tools
+==============
+See the manual pages for more information.
+tune2fs:        create an ext3 journal on an ext2 partition with the -j flag
+mke2fs:         create an ext3 partition with the -j flag
+debugfs:        ext2 and ext3 file system debugger
+References
+==========
+kernel source:  file:/usr/src/linux/fs/ext3
+                file:/usr/src/linux/fs/jbd
+programs:       http://e2fsprogs.sourceforge.net
+useful links:
+                http://www.zip.com.au/~akpm/linux/ext3/ext3-usage.html
+                http://www-106.ibm.com/developerworks/linux/library/l-fs7/
+                http://www-106.ibm.com/developerworks/linux/library/l-fs8/
diff --git a/Documentation/filesystems/hfs.txt b/Documentation/filesystems/hfs.txt
new file mode 100644
index 000000000000..bd0fa7704035
--- /dev/null
+++ b/Documentation/filesystems/hfs.txt
@@ -0,0 +1,83 @@
+Macintosh HFS Filesystem for Linux
+==================================
+HFS stands for ``Hierarchical File System'' and is the filesystem used
+by the Mac Plus and all later Macintosh models.  Earlier Macintosh
+models used MFS (``Macintosh File System''), which is not supported.
+MacOS 8.1 and newer support a filesystem called HFS+ that's similar to
+HFS but is extended in various areas.  Use the hfsplus filesystem driver
+to access such filesystems from Linux.
+Mount options
+=============
+When mounting an HFS filesystem, the following options are accepted:
+  creator=cccc, type=cccc
+        Specifies the creator/type values as shown by the MacOS finder
+        used for creating new files.  Default values: '????'.
+  uid=n, gid=n
+        Specifies the user/group that owns all files on the filesystem.
+        Default:  user/group id of the mounting process.
+  dir_umask=n, file_umask=n, umask=n
+        Specifies the umask used for all directories, all files or all
+        files and directories.  Defaults to the umask of the mounting process.
+  session=n
+        Select the CDROM session to mount as HFS filesystem.  Defaults to
+        leaving that decision to the CDROM driver.  This option will fail
+        with anything but a CDROM as the underlying device.
+  part=n
+        Select partition number n from the device.  This only makes
+        sense for CDROMs because they can't be partitioned under Linux.
+        For disk devices the generic partition parsing code does this
+        for us.  Defaults to not parsing the partition table at all.
+  quiet
+        Ignore invalid mount options instead of complaining.
+Writing to HFS Filesystems
+==========================
+HFS is not a UNIX filesystem, thus it does not have the usual features you'd
+expect:
+ o You can't modify the set-uid, set-gid, sticky or executable bits or the uid
+   and gid of files.
+ o You can't create hard- or symlinks, device files, sockets or FIFOs.
+HFS does, on the other hand, have the concept of multiple forks per file.  These
+non-standard forks are represented as hidden additional files in the normal
+filesystem namespace, which is something of a kludge and makes their semantics
+a little strange:
+ o You can't create, delete or rename resource forks of files or the
+   Finder's metadata.
+ o They are however created (with default values), deleted and renamed
+   along with the corresponding data fork or directory.
+ o Copying files to a different filesystem will lose those attributes
+   that are essential for MacOS to work.
+Creating HFS filesystems
+========================
+The hfsutils package from Robert Leslie contains a program called
+hformat that can be used to create an HFS filesystem. See
+<http://www.mars.org/home/rob/proj/hfs/> for details.
+Credits
+=======
+The HFS driver was written by Paul H. Hargrove (hargrove@sccm.Stanford.EDU)
+and is now maintained by Roman Zippel (roman@ardistech.com) at Ardis
+Technologies.
+Roman rewrote large parts of the code and brought in btree routines derived
+from Brad Boyer's hfsplus driver (also maintained by Roman now).
diff --git a/Documentation/filesystems/hpfs.txt b/Documentation/filesystems/hpfs.txt
new file mode 100644
index 000000000000..33dc360c8e89
--- /dev/null
+++ b/Documentation/filesystems/hpfs.txt
@@ -0,0 +1,296 @@
+Read/Write HPFS 2.09
+1998-2004, Mikulas Patocka
+email: mikulas@artax.karlin.mff.cuni.cz
+homepage: http://artax.karlin.mff.cuni.cz/~mikulas/vyplody/hpfs/index-e.cgi
+CREDITS:
+Chris Smith, 1993, original read-only HPFS, some code and hpfs structures file
+        is taken from it
+Jacques Gelinas, MSDos mmap, Inspired by fs/nfs/mmap.c (Jon Tombs 15 Aug 1993)
+Werner Almesberger, 1992, 1993, MSDos option parser & CR/LF conversion
+Mount options
+uid=xxx,gid=xxx,umask=xxx (default uid=gid=0 umask=default_system_umask)
+        Set owner/group/mode for files that do not have it specified in extended
+        attributes. Mode is inverted umask - for example umask 027 gives owner
+        all permission, group read permission and anybody else no access. Note
+        that for files mode is anded with 0666. If you want files to have 'x'
+        rights, you must use extended attributes.
+case=lower,asis (default asis)
+        File name lowercasing in readdir.
+conv=binary,text,auto (default binary)
+        CR/LF -> LF conversion, if auto, decision is made according to extension
+        - there is a list of text extensions (I think it's better to not convert
+        text file than to damage binary file). If you want to change that list,
+        change it in the source. Original readonly HPFS contained some strange
+        heuristic algorithm that I removed. I think it's dangerous to let the
+        computer decide whether file is text or binary. For example, DJGPP
+        binaries contain small text message at the beginning and they could be
+        misidentified and damaged under some circumstances.
+check=none,normal,strict (default normal)
+        Check level. Selecting none will cause only little speedup and big
+        danger. I tried to write it so that it won't crash if check=normal on
+        corrupted filesystems. check=strict means many superfluous checks -
+        used for debugging (for example it checks if file is allocated in
+        bitmaps when accessing it).
+errors=continue,remount-ro,panic (default remount-ro)
+        Behaviour when filesystem errors are found.
+chkdsk=no,errors,always (default errors)
+        When to mark filesystem dirty so that OS/2 checks it.
+eas=no,ro,rw (default rw)
+        What to do with extended attributes. 'no' - ignore them and always use
+        values specified in uid/gid/umask options. 'ro' - read extended
+        attributes but do not create them. 'rw' - create extended attributes
+        when you use chmod/chown/chgrp/mknod/ln -s on the filesystem.
+timeshift=(-)nnn (default 0)
+        Shifts the time by nnn seconds. For example, if times under Linux
+        appear one hour ahead of those under OS/2, use timeshift=-3600.
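+The uid/gid/umask entry above says that the mode is the inverted umask and
+that, for files, the mode is ANDed with 0666.  A minimal Python sketch of that
+arithmetic follows.  It is an illustration only, not the driver's code, and the
+function name hpfs_default_mode is made up for this example.
```python
def hpfs_default_mode(umask: int, is_dir: bool) -> int:
    """Permission bits for an entry that has no "MODE" extended attribute.

    Assumption: this follows the wording of the mount option text only
    (mode is the inverted umask; files are also ANDed with 0666).
    """
    mode = 0o777 & ~umask      # "mode is inverted umask"
    if not is_dir:
        mode &= 0o666          # plain files never get 'x' bits this way
    return mode


if __name__ == "__main__":
    for umask in (0o027, 0o022):
        print("umask=%03o  directories=%03o  files=%03o" % (
            umask,
            hpfs_default_mode(umask, is_dir=True),
            hpfs_default_mode(umask, is_dir=False)))
```
+With umask=027 the sketch yields 750 for directories and 640 for files, which
+is why 'x' rights on files have to come from extended attributes.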
+File names
+As in OS/2, filenames are case insensitive. However, shell thinks that names
+are case sensitive, so for example when you create a file FOO, you can use
+'cat FOO', 'cat Foo', 'cat foo' or 'cat F*' but not 'cat f*'. Note, that you
+also won't be able to compile linux kernel (and maybe other things) on HPFS
+because kernel creates different files with names like bootsect.S and
+bootsect.s. When searching for a file whose name has characters >= 128,
+codepages are used - see below.
+OS/2 ignores dots and spaces at the end of file name, so this driver does as
+well. If you create 'a. ...', the file 'a' will be created, but you can still
+access it under names 'a.', 'a..', 'a .  . . ' etc.
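+The trailing dot and space rule can be sketched in Python as below.  This only
+models the rule as stated in this section; the helper names are hypothetical,
+and case folding and codepages are deliberately left out.
```python
def hpfs_effective_name(name: str) -> str:
    """Drop the trailing dots and spaces that OS/2 ignores in a file name.

    Assumption: the special entries "." and ".." are handled elsewhere and
    are never passed to this helper.
    """
    return name.rstrip(". ")


def same_hpfs_file(a: str, b: str) -> bool:
    """True if two spellings differ only in trailing dots and spaces."""
    return hpfs_effective_name(a) == hpfs_effective_name(b)


if __name__ == "__main__":
    for spelling in ("a. ...", "a.", "a..", "a .  . . "):
        print(repr(spelling), "->", repr(hpfs_effective_name(spelling)))
```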
+Extended attributes
+On HPFS partitions, OS/2 can associate with each file special information called
+extended attributes. Extended attributes are pairs of (key,value) where key is
+an ascii string identifying that attribute and value is any string of bytes of
+variable length. OS/2 stores window and icon positions and file types there. So
+why not use it for unix-specific info like file owner or access rights? This
+driver can do it. If you chown/chgrp/chmod on a hpfs partition, extended
+attributes with keys "UID", "GID" or "MODE" and 2-byte values are created. Only
+those extended attributes whose values differ from the defaults specified in
+mount options are created. Once created, the extended attributes are never
+deleted,
+they're just changed. It means that when your default uid=0 and you type
+something like 'chown luser file; chown root file' the file will contain
+extended attribute UID=0. And when you umount the fs and mount it again with
+uid=luser_uid, the file will be still owned by root! If you chmod file to 444,
+extended attribute "MODE" will not be set, this special case is done by setting
+read-only flag. When you mknod a block or char device, besides "MODE", the
+special 4-byte extended attribute "DEV" will be created containing the device
+number. Currently this driver cannot resize extended attributes - it means
+that if somebody (I don't know who?) has set "UID", "GID", "MODE" or "DEV"
+attributes with different sizes, they won't be rewritten and changing these
+values doesn't work.
+Symlinks
+You can do symlinks on HPFS partition, symlinks are achieved by setting extended
+attribute named "SYMLINK" with symlink value. Like on ext2, you can chown and
+chgrp symlinks but I don't know what it is good for. Chmoding a symlink results
+in chmoding the file it points to. These symlinks are just for Linux use and
+incompatible with OS/2. OS/2 PmShell symlinks are not supported because they are
+stored in very crazy way. They tried to do it so that link changes when file is
+moved ... sometimes it works. But the link is partly stored in directory
+extended attributes and partly in OS2SYS.INI. I don't want (and don't know how)
+to analyze or change OS2SYS.INI.
+Codepages
+HPFS can contain several uppercasing tables for several codepages and each
+file has a pointer to codepage its name is in. However OS/2 was created in
+America where people don't care much about codepages and so multiple codepages
+support is quite buggy. I have Czech OS/2 working in codepage 852 on my disk.
+Once I booted English OS/2 working in cp 850 and I created a file on my 852
+partition. It marked file name codepage as 850 - good. But when I again booted
+Czech OS/2, the file was completely inaccessible under any name. It seems that
+OS/2 uppercases the search pattern with its system code page (852) and file
+name it's comparing to with its code page (850). These could never match. Is it
+really what IBM developers wanted? But problems continued. When I created in
+Czech OS/2 another file in that directory, that file was inaccessible too. OS/2
+probably uses different uppercasing method when searching where to place a file
+(note, that files in HPFS directory must be sorted) and when searching for
+a file. Finally when I opened this directory in PmShell, PmShell crashed (the
+funny thing was that, when rebooted, PmShell tried to reopen this directory
+again :-). chkdsk happily ignores these errors and only low-level disk
+modification saved me.  Never mix different language versions of OS/2 on one
+system although HPFS was designed to allow that.
+OK, I could implement complex codepage support to this driver but I think it
+would cause more problems than benefit with such buggy implementation in OS/2.
+So this driver simply uses first codepage it finds for uppercasing and
+lowercasing no matter what's file codepage index. Usually all file names are in
+this codepage - if you don't try to do what I described above :-)
+Known bugs
+HPFS386 on OS/2 server is not supported. HPFS386 installed on normal OS/2 client
+should work. If you have OS/2 server, use only read-only mode. I don't know how
+to handle some HPFS386 structures like access control list or extended perm
+list, I don't know how to delete them when file is deleted and how to not
+overwrite them with extended attributes. Send me some info on these structures
+and I'll make it. However, this driver should detect presence of HPFS386
+structures, remount read-only and not destroy them (I hope).
+When there's not enough space for extended attributes, they will be truncated
+and no error is returned.
+OS/2 can't access files if the path is longer than about 256 chars but this
+driver allows you to do it. chkdsk ignores such errors.
+Sometimes you won't be able to delete some files on a very full filesystem
+(returning error ENOSPC). That's because file in non-leaf node in directory tree
+(one directory, if it's large, has dirents in tree on HPFS) must be replaced
+with another node when deleted. And that new file might have larger name than
+the old one so the new name doesn't fit in directory node (dnode). And that
+would result in directory tree splitting, that takes disk space. Workaround is
+to delete other files that are leaf (probability that the file is non-leaf is
+about 1/50) or to truncate file first to make some space.
+You encounter this problem only if you have many directories so that
+preallocated directory band is full i.e.
+        number_of_directories / size_of_filesystem_in_mb > 4.
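+The rule of thumb above can be checked with a few lines of Python.  This is a
+sketch of the stated inequality only; the function name is made up, and the
+threshold of 4 directories per megabyte is taken directly from the text.
```python
def directory_band_likely_full(number_of_directories: int,
                               size_of_filesystem_in_mb: float) -> bool:
    """Apply the documented rule: directories per MB of filesystem > 4."""
    if size_of_filesystem_in_mb <= 0:
        raise ValueError("filesystem size must be positive")
    return number_of_directories / size_of_filesystem_in_mb > 4


if __name__ == "__main__":
    # Hypothetical figures: a 1000 MB filesystem with 5000 directories.
    print(directory_band_likely_full(5000, 1000))
```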
+You can't delete open directories.
+You can't rename over directories (what is it good for?).
+Renaming files so that only case changes doesn't work. This driver supports it
+but vfs doesn't. Something like 'mv file FILE' won't work.
+All atimes and directory mtimes are not updated. That's because of performance
+reasons. If you extremely wish to update them, let me know, I'll write it (but
+it will be slow).
+When the system is out of memory and swap, it may slightly corrupt filesystem
+(lost files, unbalanced directories). (I guess all filesystems may do it).
+When compiled, you get warning: function declaration isn't a prototype. Does
+anybody know what it means?
+What does "unbalanced tree" message mean?
+Old versions of this driver created sometimes unbalanced dnode trees. OS/2
+chkdsk doesn't scream if the tree is unbalanced (and sometimes creates
+unbalanced trees too :-) but both HPFS and HPFS386 contain bug that it rarely
+crashes when the tree is not balanced. This driver handles unbalanced trees
+correctly and writes warning if it finds them. If you see this message, this is
+probably because of directories created with old version of this driver.
+Workaround is to move all files from that directory to another and then back
+again. Do it in Linux, not OS/2! If you see this message in directory that is
+wholly created by this driver, it is a BUG - let me know about it.
+Bugs in OS/2
+When you have two (or more) lost directories pointing to each other, chkdsk
+locks up when repairing filesystem.
+Sometimes (I think it's random) when you create a file with one-char name under
+OS/2, OS/2 marks it as 'long'. chkdsk then removes this flag saying "Minor fs
+error corrected".
+File names like "a .b" are marked as 'long' by OS/2 but chkdsk "corrects" it and
+marks them as short (and writes "minor fs error corrected"). This bug is not in
+HPFS386.
+Codepage bugs described above.
+If you don't install fixpacks, there are many, many more...
+History
+0.90 First public release
+0.91 Fixed bug that caused shooting to memory when write_inode was called on
+        open inode (rarely happened)
+0.92 Fixed a little memory leak in freeing directory inodes
+0.93 Fixed bug that locked up the machine when there were too many filenames
+        with first 15 characters same
+     Fixed write_file to zero file when writing behind file end
+0.94 Fixed a little memory leak when trying to delete busy file or directory
+0.95 Fixed a bug that i_hpfs_parent_dir was not updated when moving files
+1.90 First version for 2.1.1xx kernels
+1.91 Fixed a bug that chk_sectors failed when sectors were at the end of disk
+     Fixed a race-condition when write_inode is called while deleting file
+     Fixed a bug that could possibly happen (with very low probability) when
+        using 0xff in filenames
+     Rewritten locking to avoid race-conditions
+     Mount option 'eas' now works
+     Fsync no longer returns error
+     Files beginning with '.' are marked hidden
+     Remount support added
+     Alloc is not so slow when filesystem becomes full
+     Atimes are no more updated because it slows down operation
+     Code cleanup (removed all commented debug prints)
+1.92 Corrected a bug when sync was called just before closing file
+1.93 Modified, so that it works with kernels >= 2.1.131, I don't know if it
+        works with previous versions
+     Fixed a possible problem with disks > 64G (but I don't have one, so I can't
+        test it)
+     Fixed a file overflow at 2G
+     Added new option 'timeshift'
+     Changed behaviour on HPFS386: It is now possible to operate on HPFS386 in
+        read-only mode
+     Fixed a bug that slowed down alloc and prevented allocating 100% space
+        (this bug was not destructive)
+1.94 Added workaround for one bug in Linux
+     Fixed one buffer leak
+     Fixed some incompatibilities with large extended attributes (but it's still
+        not 100% ok, I have no info on it and OS/2 doesn't want to create them)
+     Rewritten allocation
+     Fixed a bug with i_blocks (du sometimes didn't display correct values)
+     Directories have no longer archive attribute set (some programs don't like
+        it)
+     Fixed a bug that it set badly one flag in large anode tree (it was not
+        destructive)
+1.95 Fixed one buffer leak, that could happen on corrupted filesystem
+     Fixed one bug in allocation in 1.94
+1.96 Added workaround for one bug in OS/2 (HPFS locked up, HPFS386 reported
+        error sometimes when opening directories in PMSHELL)
+     Fixed a possible bitmap race
+     Fixed possible problem on large disks
+     You can now delete open files
+     Fixed a nondestructive race in rename
+1.97 Support for HPFS v3 (on large partitions)
+     Fixed a bug that it didn't allow creation of files > 128M (it should be 2G)
+1.97.1 Changed names of global symbols
+       Fixed a bug when chmoding or chowning root directory
+1.98 Fixed a deadlock when using old_readdir
+     Better directory handling; workaround for "unbalanced tree" bug in OS/2
+1.99 Corrected a possible problem when there's not enough space while deleting
+        file
+     Now it tries to truncate the file if there's not enough space when deleting
+     Removed a lot of redundant code
+2.00 Fixed a bug in rename (it was there since 1.96)
+     Better anti-fragmentation strategy
+2.01 Fixed problem with directory listing over NFS
+     Directory lseek now checks for proper parameters
+     Fixed race-condition in buffer code - it is in all filesystems in Linux;
+        when reading device (cat /dev/hda) while creating files on it, files
+        could be damaged
+2.02 Workaround for bug in breada in Linux. breada could cause accesses beyond
+        end of partition
+2.03 Char, block devices and pipes are correctly created
+     Fixed non-crashing race in unlink (Alexander Viro)
+     Now it works with Japanese version of OS/2
+2.04 Fixed error when ftruncate used to extend file
+2.05 Fixed crash when got mount parameters without =
+     Fixed crash when allocation of anode failed due to full disk
+     Fixed some crashes when block io or inode allocation failed
+2.06 Fixed some crash on corrupted disk structures
+     Better allocation strategy
+     Reschedule points added so that it doesn't lock CPU long time
+     It should work in read-only mode on Warp Server
+2.07 More fixes for Warp Server. Now it really works
+2.08 Creating new files is not so slow on large disks
+     An attempt to sync deleted file does not generate filesystem error
+2.09 Fixed error on extremely fragmented files
+ vim: set textwidth=80:
diff --git a/Documentation/filesystems/isofs.txt b/Documentation/filesystems/isofs.txt
new file mode 100644
index 000000000000..f64a10506689
--- /dev/null
+++ b/Documentation/filesystems/isofs.txt
@@ -0,0 +1,38 @@
+Mount options that are the same as for msdos and vfat partitions.
+  gid=nnn       All files in the partition will be in group nnn.
+  uid=nnn       All files in the partition will be owned by user id nnn.
+  umask=nnn     The permission mask (see umask(1)) for the partition.
+Mount options that are the same as vfat partitions. These are only useful
+when using discs encoded using Microsoft's Joliet extensions.
+  iocharset=name Character set to use for converting from Unicode to
+                ASCII.  Joliet filenames are stored in Unicode format, but
+                Unix for the most part doesn't know how to deal with Unicode.
+                There is also an option of doing UTF8 translations with the
+                utf8 option.
+  utf8          Encode Unicode names in UTF8 format. Default is no.
+Mount options unique to the isofs filesystem.
+  block=512     Set the block size for the disk to 512 bytes
+  block=1024    Set the block size for the disk to 1024 bytes
+  block=2048    Set the block size for the disk to 2048 bytes
+  check=relaxed Matches filenames with different cases
+  check=strict  Matches only filenames with the exact same case
+  cruft         Try to handle badly formatted CDs.
+  map=off       Do not map non-Rock Ridge filenames to lower case
+  map=normal    Map non-Rock Ridge filenames to lower case
+  map=acorn     As map=normal but also apply Acorn extensions if present
+  mode=xxx      Sets the permissions on files to xxx
+  nojoliet      Ignore Joliet extensions if they are present.
+  norock        Ignore Rock Ridge extensions if they are present.
+  unhide        Show hidden files.
+  session=x     Select number of session on multisession CD
+  sbsector=xxx  Session begins from sector xxx
+Recommended documents about ISO 9660 standard are located at:
+http://www.y-adagio.com/public/standards/iso_cdromr/tocont.htm
+ftp://ftp.ecma.ch/ecma-st/Ecma-119.pdf
+Quoting from the PDF: "This 2nd Edition of Standard ECMA-119 is technically
+identical with ISO 9660.", so it is a valid and gratis substitute of the
+official ISO specification.
diff --git a/Documentation/filesystems/jfs.txt b/Documentation/filesystems/jfs.txt
new file mode 100644
index 000000000000..3e992daf99ad
--- /dev/null
+++ b/Documentation/filesystems/jfs.txt
@@ -0,0 +1,35 @@
+IBM's Journaled File System (JFS) for Linux
+JFS Homepage:  http://jfs.sourceforge.net/
+The following mount options are supported:
+iocharset=name  Character set to use for converting from Unicode to
+                ASCII.  The default is to do no conversion.  Use
+                iocharset=utf8 for UTF8 translations.  This requires
+                CONFIG_NLS_UTF8 to be set in the kernel .config file.
+                iocharset=none specifies the default behavior explicitly.
+resize=value    Resize the volume to <value> blocks.  JFS only supports
+                growing a volume, not shrinking it.  This option is only
+                valid during a remount, when the volume is mounted
+                read-write.  The resize keyword with no value will grow
+                the volume to the full size of the partition.
+nointegrity     Do not write to the journal.  The primary use of this option
+                is to allow for higher performance when restoring a volume
+                from backup media.  The integrity of the volume is not
+                guaranteed if the system abends.
+integrity       Default.  Commit metadata changes to the journal.  Use this
+                option to remount a volume where the nointegrity option was
+                previously specified in order to restore normal behavior.
+errors=continue         Keep going on a filesystem error.
+errors=remount-ro       Default. Remount the filesystem read-only on an error.
+errors=panic            Panic and halt the machine if an error occurs.
+Please send bugs, comments, cards and letters to shaggy@austin.ibm.com.
+The JFS mailing list can be subscribed to by using the link labeled
+"Mail list Subscribe" at our web page http://jfs.sourceforge.net/
diff --git a/Documentation/filesystems/ncpfs.txt b/Documentation/filesystems/ncpfs.txt
new file mode 100644
index 000000000000..f12c30c93f2f
--- /dev/null
+++ b/Documentation/filesystems/ncpfs.txt
@@ -0,0 +1,12 @@
+The ncpfs filesystem understands the NCP protocol, designed by the
+Novell Corporation for their NetWare(tm) product.  NCP is functionally
+similar to the NFS used in the TCP/IP community.
+To mount a NetWare filesystem, you need a special mount program, which
+can be found in the ncpfs package.  The home site for ncpfs is
+ftp.gwdg.de/pub/linux/misc/ncpfs, but sunsite and its many mirrors
+will have it as well.
+Related products are linware and mars_nwe, which will give Linux partial
+NetWare server functionality.  Linware's home site is
+klokan.sh.cvut.cz/pub/linux/linware; mars_nwe can be found on
+ftp.gwdg.de/pub/linux/misc/ncpfs.
diff --git a/Documentation/filesystems/ntfs.txt b/Documentation/filesystems/ntfs.txt
new file mode 100644
index 000000000000..f89b440fad1d
--- /dev/null
+++ b/Documentation/filesystems/ntfs.txt
@@ -0,0 +1,630 @@
+The Linux NTFS filesystem driver
+================================
+Table of contents
+=================
+- Overview
+- Web site
+- Features
+- Supported mount options
+- Known bugs and (mis-)features
+- Using NTFS volume and stripe sets
+  - The Device-Mapper driver
+  - The Software RAID / MD driver
+  - Limitations when using the MD driver
+- ChangeLog
+Overview
+========
+Linux-NTFS comes with a number of user-space programs known as ntfsprogs.
+These include mkntfs, a full-featured ntfs file system format utility,
+ntfsundelete used for recovering files that were unintentionally deleted
+from an NTFS volume and ntfsresize which is used to resize an NTFS partition.
+See the web site for more information.
+To mount an NTFS 1.2/3.x (Windows NT4/2000/XP/2003) volume, use the file
+system type 'ntfs'.  The driver currently supports read-only mode (with no
+fault-tolerance, encryption or journalling) and very limited, but safe, write
+support.
+For fault tolerance and raid support (i.e. volume and stripe sets), you can
+use the kernel's Software RAID / MD driver.  See section "Using NTFS volume
+and stripe sets" for details.
+Web site
+========
+There is plenty of additional information on the linux-ntfs web site
+at http://linux-ntfs.sourceforge.net/
+The web site has a lot of additional information, such as a comprehensive
+FAQ, documentation on the NTFS on-disk format, information on the Linux-NTFS
+userspace utilities, etc.
+Features
+========
+- This is a complete rewrite of the NTFS driver that used to be in the kernel.
+  This new driver implements NTFS read support and is functionally equivalent
+  to the old ntfs driver.
+- The new driver has full support for sparse files on NTFS 3.x volumes which
+  the old driver isn't happy with.
+- The new driver supports execution of binaries due to mmap() now being
+  supported.
+- The new driver supports loopback mounting of files on NTFS which is used by
+  some Linux distributions to enable the user to run Linux from an NTFS
+  partition by creating a large file while in Windows and then loopback
+  mounting the file while in Linux and creating a Linux filesystem on it that
+  is used to install Linux on it.
+- A comparison of the two drivers using:
+        time find . -type f -exec md5sum "{}" \;
+  run three times in sequence with each driver (after a reboot) on a 1.4GiB
+  NTFS partition, showed the new driver to be 20% faster in total time elapsed
+  (from 9:43 minutes on average down to 7:53).  The time spent in user space
+  was unchanged but the time spent in the kernel was decreased by a factor of
+  2.5 (from 85 CPU seconds down to 33).
+- The driver does not support short file names in general.  For backwards
+  compatibility, we implement access to files using their short file names if
+  they exist.  The driver will not create short file names however, and a
+  rename will discard any existing short file name.
+- The new driver supports exporting of mounted NTFS volumes via NFS.
+- The new driver supports async io (aio).
+- The new driver supports fsync(2), fdatasync(2), and msync(2).
+- The new driver supports readv(2) and writev(2).
+- The new driver supports access time updates (including mtime and ctime).
+Supported mount options
+=======================
+In addition to the generic mount options described by the manual page for the
+mount command (man 8 mount, also see man 5 fstab), the NTFS driver supports the
+following mount options:
+iocharset=name          Deprecated option.  Still supported but please use
+                        nls=name in the future.  See description for nls=name.
+nls=name                Character set to use when returning file names.
+                        Unlike VFAT, NTFS suppresses names that contain
+                        unconvertible characters.  Note that most character
+                        sets contain insufficient characters to represent all
+                        possible Unicode characters that can exist on NTFS.
+                        To be sure you are not missing any files, you are
+                        advised to use nls=utf8 which is capable of
+                        representing all Unicode characters.
+utf8=<bool>             Option no longer supported.  Currently mapped to
+                        nls=utf8 but please use nls=utf8 in the future and
+                        make sure utf8 is compiled either as module or into
+                        the kernel.  See description for nls=name.
+uid=
+gid=
+umask=                  Provide default owner, group, and access mode mask.
+                        These options work as documented in mount(8).  By
+                        default, the files/directories are owned by root and
+                        he/she has read and write permissions, as well as
+                        browse permission for directories.  No one else has any
+                        access permissions.  I.e. the mode on all files is by
+                        default rw------- and for directories rwx------, a
+                        consequence of the default fmask=0177 and dmask=0077.
+                        Using a umask of zero will grant all permissions to
+                        everyone, i.e. all files and directories will have mode
+                        rwxrwxrwx.
+fmask=
+dmask=                  Instead of specifying umask which applies both to
+                        files and directories, fmask applies only to files and
+                        dmask only to directories.
+sloppy=<BOOL>           If sloppy is specified, ignore unknown mount options.
+                        Otherwise the default behaviour is to abort mount if
+                        any unknown options are found.
+show_sys_files=<BOOL>   If show_sys_files is specified, show the system files
+                        in directory listings.  Otherwise the default behaviour
+                        is to hide the system files.
+                        Note that even when show_sys_files is specified, "$MFT"
+                        will not be visible due to bugs/mis-features in glibc.
+                        Further, note that irrespective of show_sys_files, all
+                        files are accessible by name, i.e. you can always do
+                        "ls -l \$UpCase" for example to specifically show the
+                        system file containing the Unicode upcase table.
+case_sensitive=<BOOL>   If case_sensitive is specified, treat all file names as
+                        case sensitive and create file names in the POSIX
+                        namespace.  Otherwise the default behaviour is to treat
+                        file names as case insensitive and to create file names
+                        in the WIN32/LONG name space.  Note, the Linux NTFS
+                        driver will never create short file names and will
+                        remove them on rename/delete of the corresponding long
+                        file name.
+                        Note that files remain accessible via their short file
+                        name, if it exists.  If case_sensitive, you will need
+                        to provide the correct case of the short file name.
+errors=opt              What to do when critical file system errors are found.
+                        Following values can be used for "opt":
+                          continue: DEFAULT, try to clean-up as much as
+                                    possible, e.g. marking a corrupt inode as
+                                    bad so it is no longer accessed, and then
+                                    continue.
+                          recover:  At present only supported is recovery of
+                                    the boot sector from the backup copy.
+                                    If read-only mount, the recovery is done
+                                    in memory only and not written to disk.
+                        Note that the options are additive, i.e. specifying:
+                           errors=continue,errors=recover
+                        means the driver will attempt to recover and if that
+                        fails it will clean-up as much as possible and
+                        continue.
+mft_zone_multiplier=    Set the MFT zone multiplier for the volume (this
+                        setting is not persistent across mounts and can be
+                        changed from mount to mount but cannot be changed on
+                        remount).  Values of 1 to 4 are allowed, 1 being the
+                        default.  The MFT zone multiplier determines how much
+                        space is reserved for the MFT on the volume.  If all
+                        other space is used up, then the MFT zone will be
+                        shrunk dynamically, so this has no impact on the
+                        amount of free space.  However, it can have an impact
+                        on performance by affecting fragmentation of the MFT.
+                        In general use the default.  If you have a lot of small
+                        files then use a higher value.  The values have the
+                        following meaning:
+                              Value          MFT zone size (% of volume size)
+                                1               12.5%
+                                2               25%
+                                3               37.5%
+                                4               50%
+                        Note this option is irrelevant for read-only mounts.
+Known bugs and (mis-)features
+=============================
+- The link count on each directory inode entry is set to 1, due to Linux not
+  supporting directory hard links.  This may well confuse some user space
+  applications, since the directory names will have the same inode numbers.
+  This also speeds up ntfs_read_inode() immensely.  And we haven't found any
+  problems with this approach so far.  If you find a problem with this, please
+  let us know.
+Please send bug reports/comments/feedback/abuse to the Linux-NTFS development
+list at sourceforge: linux-ntfs-dev@lists.sourceforge.net
+Using NTFS volume and stripe sets
+=================================
+For support of volume and stripe sets, you can either use the kernel's
+Device-Mapper driver or the kernel's Software RAID / MD driver.  The former is
+the recommended one to use for linear raid.  But the latter is required for
+raid level 5.  For striping and mirroring, either driver should work fine.
+The Device-Mapper driver
+------------------------
+You will need to create a table of the components of the volume/stripe set and
+how they fit together and load this into the kernel using the dmsetup utility
+(see man 8 dmsetup).
+Linear volume sets, i.e. linear raid, have been tested and work fine.  Even
+though untested, there is no reason why stripe sets, i.e. raid level 0, and
+mirrors, i.e. raid level 1 should not work, too.  Stripes with parity, i.e.
+raid level 5, unfortunately cannot work yet because the current version of the
+Device-Mapper driver does not support raid level 5.  You may be able to use the
+Software RAID / MD driver for raid level 5, see the next section for details.
+To create the table describing your volume you will need to know each of its
+components and their sizes in sectors, i.e. multiples of 512-byte blocks.
+For NT4 fault tolerant volumes you can obtain the sizes using fdisk.  So for
+example if one of your partitions is /dev/hda2 you would do:
+$ fdisk -ul /dev/hda
+Disk /dev/hda: 81.9 GB, 81964302336 bytes
+255 heads, 63 sectors/track, 9964 cylinders, total 160086528 sectors
+Units = sectors of 1 * 512 = 512 bytes
+   Device Boot      Start         End      Blocks   Id  System
+   /dev/hda1   *          63     4209029     2104483+  83  Linux
+   /dev/hda2         4209030    37768814    16779892+  86  NTFS
+   /dev/hda3        37768815    46170809     4200997+  83  Linux
+And you would know that /dev/hda2 has a size of 37768814 - 4209030 + 1 =
+33559785 sectors.
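+The size calculation above can be sketched in a few lines.  This is only an
+illustration: the helper name partition_sectors is hypothetical and not part
+of any tool mentioned here.  It assumes the Start and End columns of
+"fdisk -ul" are inclusive sector numbers, as in the example output.

```python
def partition_sectors(start: int, end: int) -> int:
    """Return the size in 512-byte sectors of a partition whose first and
    last sectors (both inclusive) are `start` and `end`."""
    if end < start:
        raise ValueError("end sector must not be smaller than start sector")
    # Both boundaries are inclusive, hence the "+ 1".
    return end - start + 1


# Start/End values copied from the /dev/hda2 line of the fdisk output.
print(partition_sectors(4209030, 37768814))
```

+The result is the value to use as the size of this component in the
+Device-Mapper table.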
+For Win2k and later dynamic disks, you can for example use the ldminfo utility
+which is part of the Linux LDM tools (the latest version at the time of
+writing is linux-ldm-0.0.8.tar.bz2).  You can download it from:
+        http://linux-ntfs.sourceforge.net/downloads.html
+Simply extract the downloaded archive (tar xvjf linux-ldm-0.0.8.tar.bz2), go
+into it (cd linux-ldm-0.0.8) and change to the test directory (cd test).  You
+will find the precompiled (i386) ldminfo utility there.  NOTE: You will not be
+able to compile this yourself easily so use the binary version!
+Then you would use ldminfo in dump mode to obtain the necessary information:
+$ ./ldminfo --dump /dev/hda
+This would dump the LDM database found on /dev/hda which describes all of your
+dynamic disks and all the volumes on them.  At the bottom you will see the
+VOLUME DEFINITIONS section which is all you really need.  You may need to look
+further above to determine which of the disks in the volume definitions is
+which device in Linux.  Hint: Run ldminfo on each of your dynamic disks and
+look at the Disk Id close to the top of the output for each (the PRIVATE HEADER
+section).  You can then find these Disk Ids in the VBLK DATABASE section in the
+<Disk> components where you will get the LDM Name for the disk that is found in
+the VOLUME DEFINITIONS section.
+Note you will also need to enable the LDM driver in the Linux kernel.  If your
+distribution did not enable it, you will need to recompile the kernel with it
+enabled.  This will create the LDM partitions on each device at boot time.  You
+would then use those devices (for /dev/hda they would be /dev/hda1, 2, 3, etc)
+in the Device-Mapper table.
+You can also bypass using the LDM driver by using the main device (e.g.
+/dev/hda) and then using the offsets of the LDM partitions into this device as
+the "Start sector of device" when creating the table.  Once again ldminfo would
+give you the correct information to do this.
+Assuming you know all your devices and their sizes things are easy.
+For a linear raid the table would look like this (note all values are in
+512-byte sectors):
+--- cut here ---
+# Offset into   Size of this    Raid type       Device          Start sector
+# volume        device                                          of device
+0               1028161         linear          /dev/hda1       0
+1028161         3903762         linear          /dev/hdb2       0
+4931923         2103211         linear          /dev/hdc1       0
+--- cut here ---
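+In the linear table each component starts where the previous one ends, so the
+"Offset into volume" column is simply a running sum of the sizes.  A minimal
+sketch of generating such a table follows; the function name linear_table is
+hypothetical, and every component is assumed to start at sector 0 of its
+device, as in the example above.

```python
from typing import List, Tuple


def linear_table(components: List[Tuple[str, int]]) -> str:
    """Build Device-Mapper "linear" table lines from (device, size) pairs.

    Sizes are in 512-byte sectors; each component is assumed to begin at
    sector 0 of its device.
    """
    lines = []
    offset = 0  # running offset into the assembled volume
    for device, size in components:
        lines.append(f"{offset} {size} linear {device} 0")
        offset += size  # the next component starts where this one ends
    return "\n".join(lines)


# Devices and sizes taken from the example table above.
table = linear_table([
    ("/dev/hda1", 1028161),
    ("/dev/hdb2", 3903762),
    ("/dev/hdc1", 2103211),
])
print(table)
```

+The printed lines carry the same values as the example table, without the
+comment header and column alignment.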
+For a striped volume, i.e. raid level 0, you will need to know the chunk size
+you used when creating the volume.  Windows uses 64kiB as the default, so it
+will probably be this unless you changed the defaults when creating the array.
+For a raid level 0 the table would look like this (note all values are in
+512-byte sectors):
+--- cut here ---
+# Offset   Size     Raid     Number   Chunk  1st        Start   2nd       Start
+# into     of the   type     of       size   Device     in      Device    in
+# volume   volume            stripes                    device            device
+0          2056320  striped  2        128    /dev/hda1  0       /dev/hdb1 0
+--- cut here ---
+If there are more than two devices, just add each of them to the end of the
+line.
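+The chunk size in the striped table is given in 512-byte sectors, so the
+Windows default of 64kiB becomes 128.  The following sketch shows the
+conversion and how further devices are appended to the line; the names
+chunk_sectors and striped_table are hypothetical, and every device is assumed
+to start at sector 0.

```python
from typing import List

SECTOR_SIZE = 512  # bytes per sector, as used throughout this section


def chunk_sectors(chunk_kib: int) -> int:
    """Convert a stripe chunk size in kiB to 512-byte sectors."""
    return chunk_kib * 1024 // SECTOR_SIZE


def striped_table(volume_sectors: int, chunk_kib: int,
                  devices: List[str]) -> str:
    """Build one Device-Mapper "striped" table line.

    Each device is assumed to start at sector 0.
    """
    fields = [0, volume_sectors, "striped", len(devices),
              chunk_sectors(chunk_kib)]
    for device in devices:
        fields += [device, 0]  # device followed by its start sector
    return " ".join(str(field) for field in fields)


# Values taken from the raid level 0 example above.
line = striped_table(2056320, 64, ["/dev/hda1", "/dev/hdb1"])
print(line)
```

+With three or more devices the list passed in just grows, and the number of
+stripes follows from its length.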
+Finally, for a mirrored volume, i.e. raid level 1, the table would look like
+this (note all values are in 512-byte sectors):
+--- cut here ---
+# Ofs Size   Raid   Log  Number Region Should Number Source  Start Target Start
+# in  of the type   type of log size   sync?  of     Device  in    Device in
+# vol volume             params              mirrors         Device       Device
+0    2056320 mirror core 2      16     nosync 2    /dev/hda1 0   /dev/hdb1 0
+--- cut here ---
+If you are mirroring to multiple devices you can specify further targets at the
+end of the line.
+Note the "Should sync?" parameter "nosync" means that the two mirrors are
+already in sync which will be the case on a clean shutdown of Windows.  If the
+mirrors are not clean, you can specify the "sync" option instead of "nosync"
+and the Device-Mapper driver will then copy the entirety of the "Source Device"
+to the "Target Device" or if you specified multiple target devices to all of
+them.
+Once you have your table, save it in a file somewhere (e.g. /etc/ntfsvolume1),
+and hand it over to dmsetup to work with, like so:
+$ dmsetup create myvolume1 /etc/ntfsvolume1
+You can obviously replace "myvolume1" with whatever name you like.
+If it all worked, you will now have the device /dev/device-mapper/myvolume1
+which you can then just use as an argument to the mount command as usual to
+mount the ntfs volume.  For example:
+$ mount -t ntfs -o ro /dev/device-mapper/myvolume1 /mnt/myvol1
+(You need to create the directory /mnt/myvol1 first and of course you can use
+anything you like instead of /mnt/myvol1 as long as it is an existing
+directory.)
+It is advisable to do the mount read-only to see if the volume has been setup
+correctly to avoid the possibility of causing damage to the data on the ntfs
+volume.
+The Software RAID / MD driver
+-----------------------------
+An alternative to using the Device-Mapper driver is to use the kernel's
+Software RAID / MD driver.  For this you need to set up your /etc/raidtab
+appropriately (see man 5 raidtab).
+Linear volume sets, i.e. linear raid, as well as stripe sets, i.e. raid level
+0, have been tested and work fine (though see section "Limitations when using
+the Software RAID / MD driver" especially if you want to use linear raid).
+Even though untested, there is no reason why mirrors, i.e. raid level 1, and
+stripes with parity, i.e. raid level 5, should not work, too.
+You have to use the "persistent-superblock 0" option for each raid-disk in the
+NTFS volume/stripe you are configuring in /etc/raidtab as the persistent
+superblock used by the MD driver would damage the NTFS volume.
+Windows by default uses a stripe chunk size of 64k, so you probably want the
+"chunk-size 64k" option for each raid-disk, too.
+For example, if you have a stripe set consisting of two partitions /dev/hda5
+and /dev/hdb1 your /etc/raidtab would look like this:
+raiddev /dev/md0
+        raid-level      0
+        nr-raid-disks   2
+        nr-spare-disks  0
+        persistent-superblock   0
+        chunk-size      64k
+        device          /dev/hda5
+        raid-disk       0
+        device          /dev/hdb1
+        raid-disk       1
+For linear raid, just change the raid-level above to "raid-level linear", for
+mirrors, change it to "raid-level 1", and for stripe sets with parity, change
+it to "raid-level 5".
+Note for stripe sets with parity you will also need to tell the MD driver
+which parity algorithm to use by specifying the option "parity-algorithm
+which", where you need to replace "which" with the name of the algorithm to
+use (see man 5 raidtab for available algorithms) and you will have to try the
+different available algorithms until you find one that works.  Make sure you
+are working read-only when playing with this as you may damage your data
+otherwise.  If you find which algorithm works please let us know (email the
+linux-ntfs developers list linux-ntfs-dev@lists.sourceforge.net or drop in on
+IRC in channel #ntfs on the irc.freenode.net network) so we can update this
+documentation.
+Once the raidtab is setup, run for example raid0run -a to start all devices or
+raid0run /dev/md0 to start a particular md device, in this case /dev/md0.
+Then just use the mount command as usual to mount the ntfs volume using for
+example:        mount -t ntfs -o ro /dev/md0 /mnt/myntfsvolume
+It is advisable to do the mount read-only to see if the md volume has been
+setup correctly to avoid the possibility of causing damage to the data on the
+ntfs volume.
+Limitations when using the Software RAID / MD driver
+----------------------------------------------------
+Using the md driver will not work properly if any of your NTFS partitions have
+an odd number of sectors.  This is especially important for linear raid as all
+data after the first partition with an odd number of sectors will be offset by
+one or more sectors so if you mount such a partition with write support you
+will cause massive damage to the data on the volume which will only become
+apparent when you try to use the volume again under Windows.
+So when using linear raid, make sure that all your partitions have an even
+number of sectors BEFORE attempting to use it.  You have been warned!
+Even better is to simply use the Device-Mapper for linear raid and then you do
+not have this problem with odd numbers of sectors.
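+Before setting up linear raid with the MD driver, it may help to check the
+component sizes for odd sector counts.  The sketch below is illustrative
+only: the function name odd_sized and the sample sizes are hypothetical, and
+the real sizes would come from fdisk or ldminfo.

```python
from typing import Dict, List


def odd_sized(partitions: Dict[str, int]) -> List[str]:
    """Return the devices whose size in 512-byte sectors is odd."""
    return [device for device, sectors in partitions.items()
            if sectors % 2 != 0]


# Hypothetical component sizes, in sectors.
sizes = {"/dev/hda5": 1028161, "/dev/hdb1": 3903762}
problems = odd_sized(sizes)
if problems:
    print("odd sector count, do not use MD linear raid:", ", ".join(problems))
else:
    print("all components have an even number of sectors")
```

+Any device reported here is a reason to use the Device-Mapper driver instead.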
+ChangeLog
+=========
+Note, a technical ChangeLog aimed at kernel hackers is in fs/ntfs/ChangeLog.
+2.1.22:
+        - Improve handling of ntfs volumes with errors.
+        - Fix various bugs and race conditions.
+2.1.21:
+        - Fix several race conditions and various other bugs.
+        - Many internal cleanups, code reorganization, optimizations, and mft
+          and index record writing code rewritten to fit in with the changes.
+        - Update Documentation/filesystems/ntfs.txt with instructions on how to
+          use the Device-Mapper driver with NTFS ftdisk/LDM raid.
+2.1.20:
+        - Fix two stupid bugs introduced in 2.1.18 release.
+2.1.19:
+        - Minor bugfix in handling of the default upcase table.
+        - Many internal cleanups and improvements.  Many thanks to Linus
+          Torvalds and Al Viro for the help and advice with the sparse
+          annotations and cleanups.
+2.1.18:
+        - Fix scheduling latencies at mount time.  (Ingo Molnar)
+        - Fix endianness bug in a little traversed portion of the attribute
+          lookup code.
+2.1.17:
+        - Fix bugs in mount time error code paths.
+2.1.16:
+        - Implement access time updates (including mtime and ctime).
+        - Implement fsync(2), fdatasync(2), and msync(2) system calls.
+        - Enable the readv(2) and writev(2) system calls.
+        - Enable access via the asynchronous io (aio) API by adding support for
+          the aio_read(3) and aio_write(3) functions.
+2.1.15:
+        - Invalidate quotas when (re)mounting read-write.
+          NOTE:  This now only leaves user space journalling on the side.  (See
+          note for version 2.1.13, below.)
+2.1.14:
+        - Fix an NFSd caused deadlock reported by several users.
+2.1.13:
+        - Implement writing of inodes (access time updates are not implemented
+          yet so mounting with -o noatime,nodiratime is enforced).
+        - Enable writing out of resident files so you can now overwrite any
+          uncompressed, unencrypted, nonsparse file as long as you do not
+          change the file size.
+        - Add housekeeping of ntfs system files so that ntfsfix no longer needs
+          to be run after writing to an NTFS volume.
+          NOTE:  This still leaves quota tracking and user space journalling on
+          the side but they should not cause data corruption.  In the worst
+          case the charged quotas will be out of date ($Quota) and some
+          userspace applications might get confused due to the out of date
+          userspace journal ($UsnJrnl).
+2.1.12:
+        - Fix the second fix to the decompression engine from the 2.1.9 release
+          and some further internals cleanups.
+2.1.11:
+        - Driver internal cleanups.
+2.1.10:
+        - Force read-only (re)mounting of volumes with unsupported volume
+          flags and various cleanups.
+2.1.9:
+        - Fix two bugs in handling of corner cases in the decompression engine.
+2.1.8:
+        - Read the $MFT mirror and compare it to the $MFT and if the two do not
+          match, force a read-only mount and do not allow read-write remounts.
+        - Read and parse the $LogFile journal and if it indicates that the
+          volume was not shutdown cleanly, force a read-only mount and do not
+          allow read-write remounts.  If the $LogFile indicates a clean
+          shutdown and a read-write (re)mount is requested, empty $LogFile to
+          ensure that Windows cannot cause data corruption by replaying a stale
+          journal after Linux has written to the volume.
+        - Improve time handling so that the NTFS time is fully preserved when
+          converted to kernel time and only up to 99 nano-seconds are lost when
+          kernel time is converted to NTFS time.
+2.1.7:
+        - Enable NFS exporting of mounted NTFS volumes.
+2.1.6:
+        - Fix minor bug in handling of compressed directories that fixes the
+          erroneous "du" and "stat" output people reported.
+2.1.5:
+        - Minor bug fix in attribute list attribute handling that fixes the
+          I/O errors on "ls" of certain fragmented files found by at least two
+          people running Windows XP.
+2.1.4:
+        - Minor update allowing compilation with all gcc versions (well, the
+          ones the kernel can be compiled with anyway).
+2.1.3:
+        - Major bug fixes for reading files and volumes in corner cases which
+          were being hit by Windows 2k/XP users.
+2.1.2:
+        - Major bug fixes alleviating the hangs in statfs experienced by some
+          users.
+2.1.1:
+        - Update handling of compressed files so people no longer get the
+          frequently reported warning messages about initialized_size !=
+          data_size.
+2.1.0:
+        - Add configuration option for developmental write support.
+        - Initial implementation of file overwriting. (Writes to resident files
+          are not written out to disk yet, so avoid writing to files smaller
+          than about 1kiB.)
+        - Intercept/abort changes in file size as they are not implemented yet.
+2.0.25:
+        - Minor bugfixes in error code paths and small cleanups.
+2.0.24:
+        - Small internal cleanups.
+        - Support for sendfile system call. (Christoph Hellwig)
+2.0.23:
+        - Massive internal locking changes to mft record locking. Fixes
+          various race conditions and deadlocks.
+        - Fix ntfs over loopback for compressed files by adding an
+          optimization barrier. (gcc was screwing up otherwise ?)
+        Thanks go to Christoph Hellwig for pointing these two out:
+        - Remove now unused function fs/ntfs/malloc.h::vmalloc_nofs().
+        - Fix ntfs_free() for ia64 and parisc.
+2.0.22:
+        - Small internal cleanups.
+2.0.21:
+        These only affect 32-bit architectures:
+        - Check for, and refuse to mount too large volumes (maximum is 2TiB).
+        - Check for, and refuse to open too large files and directories
+          (maximum is 16TiB).
+2.0.20:
+        - Support non-resident directory index bitmaps. This means we now cope
+          with huge directories without problems.
+        - Fix a page leak that manifested itself in some cases when reading
+          directory contents.
+        - Internal cleanups.
+2.0.19:
+        - Fix race condition and improvements in block i/o interface.
+        - Optimization when reading compressed files.
+2.0.18:
+        - Fix race condition in reading of compressed files.
+2.0.17:
+        - Cleanups and optimizations.
+2.0.16:
+        - Fix stupid bug introduced in 2.0.15 in new attribute inode API.
+        - Big internal cleanup replacing the mftbmp access hacks by using the
+          new attribute inode API instead.
+2.0.15:
+        - Bug fix in parsing of remount options.
+        - Internal changes implementing attribute (fake) inodes allowing all
+          attribute i/o to go via the page cache and to use all the normal
+          vfs/mm functionality.
+2.0.14:
+        - Internal changes improving run list merging code and minor locking
+          change to not rely on BKL in ntfs_statfs().
+2.0.13:
+        - Internal changes towards using iget5_locked() in preparation for
+          fake inodes and small cleanups to ntfs_volume structure.
+2.0.12:
+        - Internal cleanups in address space operations made possible by the
+          changes introduced in the previous release.
+2.0.11:
+        - Internal updates and cleanups introducing the first step towards
+          fake inode based attribute i/o.
+2.0.10:
+        - Microsoft says that the maximum number of inodes is 2^32 - 1. Update
+          the driver accordingly to only use 32-bits to store inode numbers on
+          32-bit architectures. This improves the speed of the driver a little.
+2.0.9:
+        - Change decompression engine to use a single buffer. This should not
+          affect performance except perhaps on the most heavy i/o on SMP
+          systems when accessing multiple compressed files from multiple
+          devices simultaneously.
+        - Minor updates and cleanups.
+2.0.8:
+        - Remove now obsolete show_inodes and posix mount option(s).
+        - Restore show_sys_files mount option.
+        - Add new mount option case_sensitive, to determine if the driver
+          treats file names as case sensitive or not.
+        - Mostly drop support for short file names (for backwards compatibility
+          we only support accessing files via their short file name if one
+          exists).
+        - Fix dcache aliasing issues wrt short/long file names.
+        - Cleanups and minor fixes.
+2.0.7:
+        - Just cleanups.
+2.0.6:
+        - Major bugfix to make compatible with other kernel changes. This fixes
+          the hangs/oopses on umount.
+        - Locking cleanup in directory operations (remove BKL usage).
+2.0.5:
+        - Major buffer overflow bug fix.
+        - Minor cleanups and updates for kernel 2.5.12.
+2.0.4:
+        - Cleanups and updates for kernel 2.5.11.
+2.0.3:
+        - Small bug fixes, cleanups, and performance improvements.
+2.0.2:
+        - Use default fmask of 0177 so that files are not executable by default.
+          If you want owner executable files, just use fmask=0077.
+        - Update for kernel 2.5.9 but preserve backwards compatibility with
+          kernel 2.5.7.
+        - Minor bug fixes, cleanups, and updates.
+2.0.1:
+        - Minor updates, primarily set the executable bit by default on files
+          so they can be executed.
+2.0.0:
+        - Started ChangeLog.
diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
new file mode 100644
index 000000000000..2f388460cbe7
--- /dev/null
+++ b/Documentation/filesystems/porting
@@ -0,0 +1,266 @@
+Changes since 2.5.0:
+--- 
+[recommended]
+New helpers: sb_bread(), sb_getblk(), sb_find_get_block(), set_bh(),
+        sb_set_blocksize() and sb_min_blocksize().
+Use them.
+(sb_find_get_block() replaces 2.4's get_hash_table())
+--- 
+[recommended]
+New methods: ->alloc_inode() and ->destroy_inode().
+Remove inode->u.foo_inode_i
+Declare
+        struct foo_inode_info {
+                /* fs-private stuff */
+                struct inode vfs_inode;
+        };
+        static inline struct foo_inode_info *FOO_I(struct inode *inode)
+        {
+                return list_entry(inode, struct foo_inode_info, vfs_inode);
+        }
+Use FOO_I(inode) instead of &inode->u.foo_inode_i;
+Add foo_alloc_inode() and foo_destroy_inode() - the former should allocate
+foo_inode_info and return the address of ->vfs_inode, the latter should free
+FOO_I(inode) (see in-tree filesystems for examples).
+Make them ->alloc_inode and ->destroy_inode in your super_operations.
+Keep in mind that now you need explicit initialization of private data -
+typically in ->read_inode() and after getting an inode from new_inode().
+At some point that will become mandatory.
+---
+[mandatory]
+Change of file_system_type method (->read_super to ->get_sb)
+->read_super() is no more.  Ditto for DECLARE_FSTYPE and DECLARE_FSTYPE_DEV.
+Turn your foo_read_super() into a function that would return 0 in case of
+success and negative number in case of error (-EINVAL unless you have more
+informative error value to report).  Call it foo_fill_super().  Now declare
+struct super_block *foo_get_sb(struct file_system_type *fs_type,
+        int flags, const char *dev_name, void *data)
+{
+        return get_sb_bdev(fs_type, flags, dev_name, data, foo_fill_super);
+}
+(or similar with s/bdev/nodev/ or s/bdev/single/, depending on the kind of
+filesystem).
+Replace DECLARE_FSTYPE... with explicit initializer and have ->get_sb set as
+foo_get_sb.
+---
+[mandatory]
+Locking change: ->s_vfs_rename_sem is taken only by cross-directory renames.
+Most likely there is no need to change anything, but if you relied on
+global exclusion between renames for some internal purpose - you need to
+change your internal locking.  Otherwise exclusion warranties remain the
+same (i.e. parents and victim are locked, etc.).
+---
+[informational]
+Now we have the exclusion between ->lookup() and directory removal (by
+->rmdir() and ->rename()).  If you used to need that exclusion and got
+it by internal locking (most filesystems couldn't care less) - you
+can relax your locking.
+---
+[mandatory]
+->lookup(), ->truncate(), ->create(), ->unlink(), ->mknod(), ->mkdir(),
+->rmdir(), ->link(), ->lseek(), ->symlink(), ->rename()
+and ->readdir() are called without BKL now.  Grab it on entry, drop upon return
+- that will guarantee the same locking you used to have.  If your method or its
+parts do not need BKL - better yet, now you can shift lock_kernel() and
+unlock_kernel() so that they would protect exactly what needs to be
+protected.
+---
+[mandatory]
+BKL is also moved from around sb operations.  ->write_super() is now called
+without BKL held.  BKL should have been shifted into individual fs sb_op
+functions.  If you don't need it, remove it.
+---
+[informational]
+check for ->link() target not being a directory is done by callers.  Feel
+free to drop it...
+---
+[informational]
+->link() callers hold ->i_sem on the object we are linking to.  Some of your
+problems might be over...
+---
+[mandatory]
+new file_system_type method - kill_sb(superblock).  If you are converting
+an existing filesystem, set it according to ->fs_flags:
+        FS_REQUIRES_DEV         -       kill_block_super
+        FS_LITTER               -       kill_litter_super
+        neither                 -       kill_anon_super
+FS_LITTER is gone - just remove it from fs_flags.
+---
+[mandatory]
+        FS_SINGLE is gone (actually, that had happened back when ->get_sb()
+went in - and hadn't been documented ;-/).  Just remove it from fs_flags
+(and see ->get_sb() entry for other actions).
+---
+[mandatory]
+->setattr() is called without BKL now.  Caller _always_ holds ->i_sem, so
+watch for ->i_sem-grabbing code that might be used by your ->setattr().
+Callers of notify_change() need ->i_sem now.
+---
+[recommended]
+New super_block field "struct export_operations *s_export_op" for
+explicit support for exporting, e.g. via NFS.  The structure is fully
+documented at its declaration in include/linux/fs.h, and in
+Documentation/filesystems/Exporting.
+Briefly it allows for the definition of decode_fh and encode_fh operations
+to encode and decode filehandles, and allows the filesystem to use
+a standard helper function for decode_fh, and provide file-system specific
+support for this helper, particularly get_parent.
+It is planned that this will be required for exporting once the code
+settles down a bit.
+[mandatory]
+s_export_op is now required for exporting a filesystem.
+isofs, ext2, ext3, reiserfs, fat
+can be used as examples of very different filesystems.
+---
+[mandatory]
+iget4() and the read_inode2 callback have been superseded by iget5_locked()
+which has the following prototype,
+    struct inode *iget5_locked(struct super_block *sb, unsigned long ino,
+                                int (*test)(struct inode *, void *),
+                                int (*set)(struct inode *, void *),
+                                void *data);
+'test' is an additional function that can be used when the inode
+number is not sufficient to identify the actual file object. 'set'
+should be a non-blocking function that initializes those parts of a
+newly created inode to allow the test function to succeed. 'data' is
+passed as an opaque value to both test and set functions.
+When the inode has been created by iget5_locked(), it will be returned with
+the I_NEW flag set and will still be locked. read_inode has not been
+called so the file system still has to finalize the initialization. Once
+the inode is initialized it must be unlocked by calling unlock_new_inode().
+The filesystem is responsible for setting (and possibly testing) i_ino
+when appropriate. There is also a simpler iget_locked function that
+just takes the superblock and inode number as arguments and does the
+test and set for you.
+e.g.
+       inode = iget_locked(sb, ino);
+       if (inode->i_state & I_NEW) {
+               read_inode_from_disk(inode);
+               unlock_new_inode(inode);
+       }
+---
+[recommended]
+->getattr() finally getting used.  See instances in nfs, minix, etc.
+---
+[mandatory]
+->revalidate() is gone.  If your filesystem had it - provide ->getattr()
+and let it call whatever you had as ->revalidate() + (for symlinks that
+had ->revalidate()) add calls in ->follow_link()/->readlink().
+---
+[mandatory]
+->d_parent changes are not protected by BKL anymore.  Read access is safe
+if at least one of the following is true:
+        * filesystem has no cross-directory rename()
+        * dcache_lock is held
+        * we know that parent had been locked (e.g. we are looking at
+->d_parent of ->lookup() argument).
+        * we are called from ->rename().
+        * the child's ->d_lock is held
+Audit your code and add locking if needed.  Notice that any place that is
+not protected by the conditions above is risky even in the old tree - you
+had been relying on BKL and that's prone to screwups.  Old tree had quite
+a few holes of that kind - unprotected access to ->d_parent leading to
+anything from oops to silent memory corruption.
+---
+[mandatory]
+        FS_NOMOUNT is gone.  If you use it - just set MS_NOUSER in flags
+(see rootfs for one kind of solution and bdev/socket/pipe for another).
+---
+[recommended]
+        Use bdev_read_only(bdev) instead of is_read_only(kdev).  The latter
+is still alive, but only because of the mess in drivers/s390/block/dasd.c.
+As soon as it gets fixed is_read_only() will die.
+---
+[mandatory]
+->permission() is called without BKL now. Grab it on entry, drop upon
+return - that will guarantee the same locking you used to have.  If
+your method or its parts do not need BKL - better yet, now you can
+shift lock_kernel() and unlock_kernel() so that they would protect
+exactly what needs to be protected.
+---
+[mandatory]
+->statfs() is now called without BKL held.  BKL should have been
+shifted into individual fs sb_op functions where it's not clear that
+it's safe to remove it.  If you don't need it, remove it.
+---
+[mandatory]
+        is_read_only() is gone; use bdev_read_only() instead.
+---
+[mandatory]
+        destroy_buffers() is gone; use invalidate_bdev().
+---
+[mandatory]
+        fsync_dev() is gone; use fsync_bdev().  NOTE: lvm breakage is
+deliberate; as soon as struct block_device * is propagated in a reasonable
+way by that code fixing will become trivial; until then nothing can be
+done.
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
new file mode 100644
index 000000000000..cbe85c17176b
--- /dev/null
+++ b/Documentation/filesystems/proc.txt
@@ -0,0 +1,1940 @@
+------------------------------------------------------------------------------
+                       T H E  /proc   F I L E S Y S T E M
+------------------------------------------------------------------------------
+/proc/sys         Terrehon Bowden <terrehon@pacbell.net>        October 7 1999
+                  Bodo Bauer <bb@ricochet.net>
+2.4.x update      Jorge Nerin <comandante@zaralinux.com>      November 14 2000
+------------------------------------------------------------------------------
+Version 1.3                                              Kernel version 2.2.12
+                                              Kernel version 2.4.0-test11-pre4
+------------------------------------------------------------------------------
+Table of Contents
+-----------------
+  0     Preface
+  0.1   Introduction/Credits
+  0.2   Legal Stuff
+  1     Collecting System Information
+  1.1   Process-Specific Subdirectories
+  1.2   Kernel data
+  1.3   IDE devices in /proc/ide
+  1.4   Networking info in /proc/net
+  1.5   SCSI info
+  1.6   Parallel port info in /proc/parport
+  1.7   TTY info in /proc/tty
+  1.8   Miscellaneous kernel statistics in /proc/stat
+  2     Modifying System Parameters
+  2.1   /proc/sys/fs - File system data
+  2.2   /proc/sys/fs/binfmt_misc - Miscellaneous binary formats
+  2.3   /proc/sys/kernel - general kernel parameters
+  2.4   /proc/sys/vm - The virtual memory subsystem
+  2.5   /proc/sys/dev - Device specific parameters
+  2.6   /proc/sys/sunrpc - Remote procedure calls
+  2.7   /proc/sys/net - Networking stuff
+  2.8   /proc/sys/net/ipv4 - IPV4 settings
+  2.9   Appletalk
+  2.10  IPX
+  2.11  /proc/sys/fs/mqueue - POSIX message queues filesystem
+------------------------------------------------------------------------------
+Preface
+------------------------------------------------------------------------------
+0.1 Introduction/Credits
+------------------------
+This documentation is  part of a soon (or  so we hope) to be  released book on
+the SuSE  Linux distribution. As  there is  no complete documentation  for the
+/proc file system and we've used  many freely available sources to write these
+chapters, it  seems only fair  to give the work  back to the  Linux community.
+This work is  based on the 2.2.*  kernel version and the  upcoming 2.4.*. I'm
+afraid it's still far from complete, but we  hope it will be useful. As far as
+we know, it is the first 'all-in-one' document about the /proc file system. It
+is focused  on the Intel  x86 hardware,  so if you  are looking for  PPC, ARM,
+SPARC, AXP, etc., features, you probably  won't find what you are looking for.
+It also only covers IPv4 networking, not IPv6 nor other protocols - sorry. But
+additions and patches  are welcome and will  be added to this  document if you
+mail them to Bodo.
+We'd like  to  thank Alan Cox, Rik van Riel, and Alexey Kuznetsov and a lot of
+other people for help compiling this documentation. We'd also like to extend a
+special thank  you to Andi Kleen for documentation, which we relied on heavily
+to create  this  document,  as well as the additional information he provided.
+Thanks to  everybody  else  who contributed source or docs to the Linux kernel
+and helped create a great piece of software... :)
+If you  have  any comments, corrections or additions, please don't hesitate to
+contact Bodo  Bauer  at  bb@ricochet.net.  We'll  be happy to add them to this
+document.
+The   latest   version    of   this   document   is    available   online   at
+http://skaro.nightcrawler.com/~bb/Docs/Proc as HTML version.
+If the above address does not work for you, you could try the kernel mailing
+list at linux-kernel@vger.kernel.org and/or try to reach me at
+comandante@zaralinux.com.
+0.2 Legal Stuff
+---------------
+We don't  guarantee  the  correctness  of this document, and if you come to us
+complaining about  how  you  screwed  up  your  system  because  of  incorrect
+documentation, we won't feel responsible...
+------------------------------------------------------------------------------
+CHAPTER 1: COLLECTING SYSTEM INFORMATION
+------------------------------------------------------------------------------
+------------------------------------------------------------------------------
+In This Chapter
+------------------------------------------------------------------------------
+* Investigating  the  properties  of  the  pseudo  file  system  /proc and its
+  ability to provide information on the running Linux system
+* Examining /proc's structure
+* Uncovering  various  information  about the kernel and the processes running
+  on the system
+------------------------------------------------------------------------------
+The proc  file  system acts as an interface to internal data structures in the
+kernel. It  can  be  used to obtain information about the system and to change
+certain kernel parameters at runtime (sysctl).
+First, we'll  take  a  look  at the read-only parts of /proc. In Chapter 2, we
+show you how you can use /proc/sys to change settings.
+1.1 Process-Specific Subdirectories
+-----------------------------------
+The directory  /proc  contains  (among other things) one subdirectory for each
+process running on the system, which is named after the process ID (PID).
+The link  self  points  to  the  process reading the file system. Each process
+subdirectory has the entries listed in Table 1-1.
+Table 1-1: Process specific entries in /proc 
+..............................................................................
+ File    Content                                        
+ cmdline Command line arguments                         
+ cpu     Current and last cpu in which it was executed          (2.4)(smp)
+ cwd     Link to the current working directory
+ environ Values of environment variables      
+ exe     Link to the executable of this process
+ fd      Directory, which contains all file descriptors 
+ maps    Memory maps to executables and library files           (2.4)
+ mem     Memory held by this process                    
+ root    Link to the root directory of this process
+ stat    Process status                                 
+ statm   Process memory status information              
+ status  Process status in human readable form          
+ wchan   If CONFIG_KALLSYMS is set, a pre-decoded wchan
+..............................................................................
+For example, to get the status information of a process, all you have to do is
+read the file /proc/PID/status:
+  >cat /proc/self/status 
+  Name:   cat 
+  State:  R (running) 
+  Pid:    5452 
+  PPid:   743 
+  TracerPid:      0                                             (2.4)
+  Uid:    501     501     501     501 
+  Gid:    100     100     100     100 
+  Groups: 100 14 16 
+  VmSize:     1112 kB 
+  VmLck:         0 kB 
+  VmRSS:       348 kB 
+  VmData:       24 kB 
+  VmStk:        12 kB 
+  VmExe:         8 kB 
+  VmLib:      1044 kB 
+  SigPnd: 0000000000000000 
+  SigBlk: 0000000000000000 
+  SigIgn: 0000000000000000 
+  SigCgt: 0000000000000000 
+  CapInh: 00000000fffffeff 
+  CapPrm: 0000000000000000 
+  CapEff: 0000000000000000 
+This shows you nearly the same information you would get if you viewed it with
+the ps  command.  In  fact,  ps  uses  the  proc  file  system  to  obtain its
+information. The  statm  file  contains  more  detailed  information about the
+process memory usage. Its seven fields are explained in Table 1-2.
+Table 1-2: Contents of the statm files (as of 2.6.8-rc3)
+..............................................................................
+ Field    Content
+ size     total program size (pages)            (same as VmSize in status)
+ resident size of memory portions (pages)       (same as VmRSS in status)
+ shared   number of pages that are shared       (i.e. backed by a file)
+ trs      number of pages that are 'code'       (not including libs; broken,
+                                                        includes data segment)
+ lrs      number of pages of library            (always 0 on 2.6)
+ drs      number of pages of data/stack         (including libs; broken,
+                                                        includes library text)
+ dt       number of dirty pages                 (always 0 on 2.6)
+..............................................................................
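+As a rough illustration of Table 1-2, the Python sketch below splits one statm
+line into its seven fields and converts the page counts to kB. The function
+name parse_statm and the 4096-byte default page size are assumptions made for
+this example. The sample line is made up, with size and resident chosen to
+agree with VmSize and VmRSS in the status example above.
```python
# Field names follow Table 1-2; every statm value is a count of pages.
STATM_FIELDS = ("size", "resident", "shared", "trs", "lrs", "drs", "dt")


def parse_statm(text, page_size=4096):
    """Return {field: size in kB} for one statm line.

    page_size is an assumption (4096 bytes, common on x86); on a live
    system os.sysconf("SC_PAGE_SIZE") reports the real value.
    """
    values = [int(v) for v in text.split()]
    if len(values) != len(STATM_FIELDS):
        raise ValueError("expected %d fields, got %d"
                         % (len(STATM_FIELDS), len(values)))
    return {name: pages * page_size // 1024
            for name, pages in zip(STATM_FIELDS, values)}


if __name__ == "__main__":
    # Hypothetical line; a real one comes from open("/proc/self/statm").read().
    print(parse_statm("278 87 70 2 0 26 0"))
```
+On a running system the same function can be fed the contents of
+/proc/PID/statm for any process you are allowed to read.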
+1.2 Kernel data
+---------------
+Similar to  the  process entries, the kernel data files give information about
+the running kernel. The files used to obtain this information are contained in
+/proc and  are  listed  in Table 1-3. Not all of these will be present in your
+system. It  depends  on the kernel configuration and the loaded modules, which
+files are there, and which are missing.
+Table 1-3: Kernel info in /proc 
+..............................................................................
+ File        Content                                           
+ apm         Advanced power management info                    
+ buddyinfo   Kernel memory allocator information (see text)     (2.5)
+ bus         Directory containing bus specific information     
+ cmdline     Kernel command line                               
+ cpuinfo     Info about the CPU                                
+ devices     Available devices (block and character)           
+ dma         Used DMA channels
+ filesystems Supported filesystems                             
+ driver      Various drivers grouped here, currently rtc        (2.4)
+ execdomains Execdomains, related to security                   (2.4)
+ fb          Frame Buffer devices                               (2.4)
+ fs          File system parameters, currently nfs/exports      (2.4)
+ ide         Directory containing info about the IDE subsystem 
+ interrupts  Interrupt usage                                   
+ iomem       Memory map                                         (2.4)
+ ioports     I/O port usage                                    
+ irq         Masks for irq to cpu affinity                      (2.4)(smp?)
+ isapnp      ISA PnP (Plug&Play) Info                           (2.4)
+ kcore       Kernel core image (can be ELF or A.OUT(deprecated in 2.4))   
+ kmsg        Kernel messages                                   
+ ksyms       Kernel symbol table                               
+ loadavg     Load average of last 1, 5 & 15 minutes                
+ locks       Kernel locks                                      
+ meminfo     Memory info                                       
+ misc        Miscellaneous                                     
+ modules     List of loaded modules                            
+ mounts      Mounted filesystems                               
+ net         Networking info (see text)                        
+ partitions  Table of partitions known to the system           
+ pci         Deprecated info of PCI bus (new way -> /proc/bus/pci/,
+             decoupled by lspci)                                (2.4)
+ rtc         Real time clock                                   
+ scsi        SCSI info (see text)                              
+ slabinfo    Slab pool info                                    
+ stat        Overall statistics                                
+ swaps       Swap space utilization                            
+ sys         See chapter 2                                     
+ sysvipc     Info of SysVIPC Resources (msg, sem, shm)          (2.4)
+ tty         Info of tty drivers
+ uptime      System uptime                                     
+ version     Kernel version                                    
+ video       bttv info of video resources                       (2.4)
+..............................................................................
+You can,  for  example,  check  which interrupts are currently in use and what
+they are used for by looking in the file /proc/interrupts:
+  > cat /proc/interrupts 
+             CPU0        
+    0:    8728810          XT-PIC  timer 
+    1:        895          XT-PIC  keyboard 
+    2:          0          XT-PIC  cascade 
+    3:     531695          XT-PIC  aha152x 
+    4:    2014133          XT-PIC  serial 
+    5:      44401          XT-PIC  pcnet_cs 
+    8:          2          XT-PIC  rtc 
+   11:          8          XT-PIC  i82365 
+   12:     182918          XT-PIC  PS/2 Mouse 
+   13:          1          XT-PIC  fpu 
+   14:    1232265          XT-PIC  ide0 
+   15:          7          XT-PIC  ide1 
+  NMI:          0 
+In 2.4.* a couple of lines were added to this file, LOC and ERR (this time the
+output is from an SMP machine):
+  > cat /proc/interrupts 
+             CPU0       CPU1       
+    0:    1243498    1214548    IO-APIC-edge  timer
+    1:       8949       8958    IO-APIC-edge  keyboard
+    2:          0          0          XT-PIC  cascade
+    5:      11286      10161    IO-APIC-edge  soundblaster
+    8:          1          0    IO-APIC-edge  rtc
+    9:      27422      27407    IO-APIC-edge  3c503
+   12:     113645     113873    IO-APIC-edge  PS/2 Mouse
+   13:          0          0          XT-PIC  fpu
+   14:      22491      24012    IO-APIC-edge  ide0
+   15:       2183       2415    IO-APIC-edge  ide1
+   17:      30564      30414   IO-APIC-level  eth0
+   18:        177        164   IO-APIC-level  bttv
+  NMI:    2457961    2457959 
+  LOC:    2457882    2457881 
+  ERR:       2155
+NMI is incremented in this case because every timer interrupt generates an NMI
+(Non Maskable Interrupt), which is used by the NMI Watchdog to detect lockups.
+LOC is the local interrupt counter of the internal APIC of every CPU.
+ERR is incremented in the case of errors in the IO-APIC bus (the bus that
+connects the CPUs in an SMP system). This means that an error has been
+detected; the IO-APIC automatically retries the transmission, so it should not
+be a big problem, but you should read the SMP-FAQ.
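+The per-CPU layout of /proc/interrupts can be read mechanically. The sketch
+below is one possible approach, not a definitive parser: parse_interrupts is a
+hypothetical helper that takes the number of CPUs from the header row and then
+collects the leading numeric columns of every following line. The sample text
+reuses a few rows of the SMP listing above.
```python
def parse_interrupts(text):
    """Return {label: [count_cpu0, count_cpu1, ...]} from /proc/interrupts text."""
    lines = [line for line in text.splitlines() if line.strip()]
    ncpus = len(lines[0].split())           # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        per_cpu = []
        # At most one counter per CPU; rows such as ERR carry only one.
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break
            per_cpu.append(int(field))
        counts[label.strip()] = per_cpu
    return counts


SAMPLE = """\
           CPU0       CPU1
  0:    1243498    1214548    IO-APIC-edge  timer
 17:      30564      30414   IO-APIC-level  eth0
NMI:    2457961    2457959
ERR:       2155
"""

if __name__ == "__main__":
    for label, per_cpu in parse_interrupts(SAMPLE).items():
        print(label, per_cpu, "total", sum(per_cpu))
```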
+In this context it could be interesting to note the new irq directory in 2.4.
+It can be used to set IRQ to CPU affinity. This means that you can "hook" an
+IRQ to only one CPU, or exclude a CPU from handling IRQs. The irq subdir
+contains one subdir for each IRQ, and one file, prof_cpu_mask.
+For example:
+  > ls /proc/irq/
+  0  10  12  14  16  18  2  4  6  8  prof_cpu_mask
+  1  11  13  15  17  19  3  5  7  9
+  > ls /proc/irq/0/
+  smp_affinity
+The contents of the prof_cpu_mask file and of each IRQ's smp_affinity file are
+the same by default:
+  > cat /proc/irq/0/smp_affinity 
+  ffffffff
+It's a bitmask in which you can specify which CPUs can handle the IRQ. You can
+set it by doing:
+  > echo 1 > /proc/irq/10/smp_affinity
+This means that only the first CPU will handle the IRQ, but you can also echo
+5, which means that only the first and third CPU can handle the IRQ.
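+The affinity value is a hexadecimal bitmask in which bit n stands for CPU n,
+counting from zero, so 5 (binary 101) selects CPU 0 and CPU 2. The helpers
+below are a small sketch of that conversion; mask_to_cpus and cpus_to_mask are
+names invented for this example and only manipulate strings, they do not touch
+/proc.
```python
def mask_to_cpus(mask_hex):
    """List the 0-based CPU numbers whose bits are set in a hex mask."""
    mask = int(mask_hex.strip(), 16)
    cpus = []
    bit = 0
    while mask:
        if mask & 1:
            cpus.append(bit)
        mask >>= 1
        bit += 1
    return cpus


def cpus_to_mask(cpus):
    """Build the hex string to echo for a set of 0-based CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return "%x" % mask


if __name__ == "__main__":
    print(mask_to_cpus("1"))          # -> [0]
    print(mask_to_cpus("5"))          # -> [0, 2]
    print(cpus_to_mask([0, 2]))       # -> 5
```
+The default value ffffffff therefore decodes to CPUs 0 through 31, that is,
+every CPU is allowed to handle the IRQ.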
+The way IRQs are routed is handled by the IO-APIC, and it's Round Robin
+between all the CPUs which are allowed to handle it. As usual the kernel has
+more info than you and does a better job than you, so the defaults are the
+best choice for almost everyone.
+There are  three  more  important subdirectories in /proc: net, scsi, and sys.
+The general  rule  is  that  the  contents,  or  even  the  existence of these
+directories, depend  on your kernel configuration. If SCSI is not enabled, the
+directory scsi may not exist. The same is true of net, which is there
+only when networking support is present in the running kernel.
+The slabinfo  file  gives  information  about  memory usage at the slab level.
+Linux uses  slab  pools for memory management above page level in version 2.2.
+Commonly used  objects  have  their  own  slab  pool (such as network buffers,
+directory cache, and so on).
+..............................................................................
+> cat /proc/buddyinfo
+Node 0, zone      DMA      0      4      5      4      4      3 ...
+Node 0, zone   Normal      1      0      0      1    101      8 ...
+Node 0, zone  HighMem      2      0      0      1      1      0 ...
+Memory fragmentation is a problem under some workloads, and buddyinfo is a 
+useful tool for helping diagnose these problems.  Buddyinfo will give you a 
+clue as to how big an area you can safely allocate, or why a previous
+allocation failed.
+Each column represents the number of pages of a certain order which are 
+available.  In this case, there are 0 chunks of 2^0*PAGE_SIZE available in 
+ZONE_DMA, 4 chunks of 2^1*PAGE_SIZE in ZONE_DMA, 101 chunks of 2^4*PAGE_SIZE 
+available in ZONE_NORMAL, etc... 
+..............................................................................
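+To make the column arithmetic concrete, here is a hedged sketch that adds up
+the free memory described by each buddyinfo line: column k counts free chunks
+of 2^k pages. parse_buddyinfo is a hypothetical helper, the 4096-byte page size
+is an assumption, and the sample keeps only the six columns visible in the
+listing above (the real file has more).
```python
def parse_buddyinfo(text, page_size=4096):
    """Return {(node, zone): (counts, free_bytes)} from /proc/buddyinfo text."""
    zones = {}
    for line in text.splitlines():
        fields = line.replace(",", " ").split()
        if len(fields) < 5 or fields[0] != "Node" or fields[2] != "zone":
            continue
        node, zone = int(fields[1]), fields[3]
        counts = [int(f) for f in fields[4:]]
        # A chunk of order k spans 2**k contiguous pages.
        free_bytes = sum(n * (2 ** order) * page_size
                         for order, n in enumerate(counts))
        zones[(node, zone)] = (counts, free_bytes)
    return zones


SAMPLE = """\
Node 0, zone      DMA      0      4      5      4      4      3
Node 0, zone   Normal      1      0      0      1    101      8
Node 0, zone  HighMem      2      0      0      1      1      0
"""

if __name__ == "__main__":
    for (node, zone), (counts, free) in parse_buddyinfo(SAMPLE).items():
        print("node %d %-8s %8d kB free" % (node, zone, free // 1024))
```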
+meminfo:
+Provides information about distribution and utilization of memory.  This
+varies by architecture and compile options.  The following is from a
+16GB PIII, which has highmem enabled.  You may not have all of these fields.
+> cat /proc/meminfo
+MemTotal:     16344972 kB
+MemFree:      13634064 kB
+Buffers:          3656 kB
+Cached:        1195708 kB
+SwapCached:          0 kB
+Active:         891636 kB
+Inactive:      1077224 kB
+HighTotal:    15597528 kB
+HighFree:     13629632 kB
+LowTotal:       747444 kB
+LowFree:          4432 kB
+SwapTotal:           0 kB
+SwapFree:            0 kB
+Dirty:             968 kB
+Writeback:           0 kB
+Mapped:         280372 kB
+Slab:           684068 kB
+CommitLimit:   7669796 kB
+Committed_AS:   100056 kB
+PageTables:      24448 kB
+VmallocTotal:   112216 kB
+VmallocUsed:       428 kB
+VmallocChunk:   111088 kB
+    MemTotal: Total usable ram (i.e. physical ram minus a few reserved
+              bits and the kernel binary code)
+     MemFree: The sum of LowFree+HighFree
+     Buffers: Relatively temporary storage for raw disk blocks;
+              shouldn't get tremendously large (20MB or so)
+      Cached: in-memory cache for files read from the disk (the
+              pagecache).  Doesn't include SwapCached
+  SwapCached: Memory that once was swapped out, is swapped back in but
+              still also is in the swapfile (if memory is needed it
+              doesn't need to be swapped out AGAIN because it is already
+              in the swapfile. This saves I/O)
+      Active: Memory that has been used more recently and usually not
+              reclaimed unless absolutely necessary.
+    Inactive: Memory which has been less recently used.  It is more
+              eligible to be reclaimed for other purposes
+   HighTotal:
+    HighFree: Highmem is all memory above ~860MB of physical memory
+              Highmem areas are for use by userspace programs, or
+              for the pagecache.  The kernel must use tricks to access
+              this memory, making it slower to access than lowmem.
+    LowTotal:
+     LowFree: Lowmem is memory which can be used for everything that
+              highmem can be used for, but it is also available for the
+              kernel's use for its own data structures.  Among many
+              other things, it is where everything from the Slab is
+              allocated.  Bad things happen when you're out of lowmem.
+   SwapTotal: total amount of swap space available
+    SwapFree: amount of swap space that is currently unused (swap in use
+              holds memory which has been evicted from RAM and is
+              temporarily on the disk)
+       Dirty: Memory which is waiting to get written back to the disk
+   Writeback: Memory which is actively being written back to the disk
+      Mapped: files which have been mmaped, such as libraries
+        Slab: in-kernel data structures cache
+ CommitLimit: Based on the overcommit ratio ('vm.overcommit_ratio'),
+              this is the total amount of  memory currently available to
+              be allocated on the system. This limit is only adhered to
+              if strict overcommit accounting is enabled (mode 2 in
+              'vm.overcommit_memory').
+              The CommitLimit is calculated with the following formula:
+              CommitLimit = ('vm.overcommit_ratio' * Physical RAM) + Swap
+              where 'vm.overcommit_ratio' is taken as a percentage.
+              For example, on a system with 1G of physical RAM and 7G
+              of swap with a 'vm.overcommit_ratio' of 30 it would
+              yield a CommitLimit of 7.3G.
+              For more details, see the memory overcommit documentation
+              in vm/overcommit-accounting.
+Committed_AS: The amount of memory presently allocated on the system.
+              The committed memory is a sum of all of the memory which
+              has been allocated by processes, even if it has not been
+              "used" by them as of yet. A process which malloc()'s 1G
+              of memory, but only touches 300M of it will only show up
+              as using 300M of memory even if it has the address space
+              allocated for the entire 1G. This 1G is memory which has
+              been "committed" to by the VM and can be used at any time
+              by the allocating application. With strict overcommit
+              enabled on the system (mode 2 in 'vm.overcommit_memory'),
+              allocations which would exceed the CommitLimit (detailed
+              above) will not be permitted. This is useful if one needs
+              to guarantee that processes will not fail due to lack of
+              memory once that memory has been successfully allocated.
+  PageTables: amount of memory dedicated to the lowest level of page
+              tables.
+VmallocTotal: total size of vmalloc memory area
+ VmallocUsed: amount of vmalloc area which is used
+VmallocChunk: largest contiguous block of vmalloc area which is free
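+The field list above lends itself to a small worked example. The sketch below
+parses meminfo text into a dictionary of kB values and reproduces the
+CommitLimit arithmetic, treating vm.overcommit_ratio as a percentage (which is
+what the 1G/7G/30 example implies). Both function names are assumptions made
+for this illustration, and the sample reuses four lines of the listing above.
```python
def parse_meminfo(text):
    """Return {field: value in kB} from /proc/meminfo text."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name:   12345 kB" style lines.
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def commit_limit_kb(ram_kb, swap_kb, overcommit_ratio):
    """CommitLimit = (overcommit_ratio percent of physical RAM) + swap."""
    return ram_kb * overcommit_ratio // 100 + swap_kb


SAMPLE = """\
MemTotal:     16344972 kB
MemFree:      13634064 kB
HighFree:     13629632 kB
LowFree:          4432 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    # MemFree is the sum of LowFree and HighFree.
    print(info["MemFree"] == info["LowFree"] + info["HighFree"])
    # 1G of RAM, 7G of swap, ratio 30: about 7.3G.
    limit = commit_limit_kb(1024 * 1024, 7 * 1024 * 1024, 30)
    print("%.1fG" % (limit / (1024.0 * 1024.0)))
```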
+1.3 IDE devices in /proc/ide
+----------------------------
+The subdirectory /proc/ide contains information about all IDE devices of which
+the kernel  is  aware.  There is one subdirectory for each IDE controller, the
+file drivers  and a link for each IDE device, pointing to the device directory
+in the controller specific subtree.
+The file  drivers  contains general information about the drivers used for the
+IDE devices:
+  > cat /proc/ide/drivers
+  ide-cdrom version 4.53
+  ide-disk version 1.08
+More detailed  information  can  be  found  in  the  controller  specific
+subdirectories. These  are  named  ide0,  ide1  and  so  on.  Each  of  these
+directories contains the files shown in table 1-4.
+Table 1-4: IDE controller info in  /proc/ide/ide? 
+..............................................................................
+ File    Content                                 
+ channel IDE channel (0 or 1)                    
+ config  Configuration (only for PCI/IDE bridge) 
+ mate    Mate name                               
+ model   Type/Chipset of IDE controller          
+..............................................................................
+Each device  connected  to  a  controller  has  a separate subdirectory in the
+controller's directory. The files listed in table 1-5 are contained in these
+directories.
+Table 1-5: IDE device information 
+..............................................................................
+ File             Content                                    
+ cache            The cache                                  
+ capacity         Capacity of the medium (in 512Byte blocks) 
+ driver           driver and version                         
+ geometry         physical and logical geometry              
+ identify         device identify block                      
+ media            media type                                 
+ model            device identifier                          
+ settings         device setup                               
+ smart_thresholds IDE disk management thresholds             
+ smart_values     IDE disk management values                 
+..............................................................................
+The most  interesting  file is settings. This file contains a nice overview of
+the drive parameters:
+  # cat /proc/ide/ide0/hda/settings 
+  name                    value           min             max             mode 
+  ----                    -----           ---             ---             ---- 
+  bios_cyl                526             0               65535           rw 
+  bios_head               255             0               255             rw 
+  bios_sect               63              0               63              rw 
+  breada_readahead        4               0               127             rw 
+  bswap                   0               0               1               r 
+  file_readahead          72              0               2097151         rw 
+  io_32bit                0               0               3               rw 
+  keepsettings            0               0               1               rw 
+  max_kb_per_request      122             1               127             rw 
+  multcount               0               0               8               rw 
+  nice1                   1               0               1               rw 
+  nowerr                  0               0               1               rw 
+  pio_mode                write-only      0               255             w 
+  slow                    0               0               1               rw 
+  unmaskirq               0               0               1               rw 
+  using_dma               0               0               1               rw 
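+Because the settings file is a plain five-column table, it is easy to turn into
+a lookup structure. The following is only a sketch: parse_ide_settings is a
+hypothetical name, and the value column is kept as a string because some
+entries (pio_mode above) do not report a number.
```python
def parse_ide_settings(text):
    """Return {name: {"value", "min", "max", "writable"}} from settings text."""
    settings = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header and the dashed separator (their min column is
        # not a number) as well as anything that is not five columns wide.
        if len(fields) != 5 or not fields[2].isdigit():
            continue
        name, value, low, high, mode = fields
        settings[name] = {
            "value": value,
            "min": int(low),
            "max": int(high),
            "writable": "w" in mode,
        }
    return settings


SAMPLE = """\
name                    value           min             max             mode
----                    -----           ---             ---             ----
bios_cyl                526             0               65535           rw
bswap                   0               0               1               r
pio_mode                write-only      0               255             w
"""

if __name__ == "__main__":
    for name, entry in sorted(parse_ide_settings(SAMPLE).items()):
        print(name, entry)
```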
+1.4 Networking info in /proc/net
+--------------------------------
+The subdirectory  /proc/net  follows  the  usual  pattern. Table 1-6 shows the
+additional values  you  get  for  IP  version 6 if you configure the kernel to
+support this. Table 1-7 lists the files and their meaning.
+Table 1-6: IPv6 info in /proc/net 
+..............................................................................
+ File       Content                                               
+ udp6       UDP sockets (IPv6)                                    
+ tcp6       TCP sockets (IPv6)                                    
+ raw6       Raw device statistics (IPv6)                          
+ igmp6      IP multicast addresses, which this host joined (IPv6) 
+ if_inet6   List of IPv6 interface addresses                      
+ ipv6_route Kernel routing table for IPv6                         
+ rt6_stats  Global IPv6 routing tables statistics                 
+ sockstat6  Socket statistics (IPv6)                              
+ snmp6      Snmp data (IPv6)                                      
+..............................................................................
+Table 1-7: Network info in /proc/net 
+..............................................................................
+ File          Content                                                         
+ arp           Kernel  ARP table                                               
+ dev           network devices with statistics                                 
+ dev_mcast     the Layer2 multicast groups a device is listening to
+               (interface index, label, number of references, number of bound
+               addresses). 
+ dev_stat      network device status                                           
+ ip_fwchains   Firewall chain linkage                                          
+ ip_fwnames    Firewall chain names                                            
+ ip_masq       Directory containing the masquerading tables                    
+ ip_masquerade Major masquerading table                                        
+ netstat       Network statistics                                              
+ raw           raw device statistics                                           
+ route         Kernel routing table                                            
+ rpc           Directory containing rpc info                                   
+ rt_cache      Routing cache                                                   
+ snmp          SNMP data                                                       
+ sockstat      Socket statistics                                               
+ tcp           TCP  sockets                                                    
+ tr_rif        Token ring RIF routing table                                    
+ udp           UDP sockets                                                     
+ unix          UNIX domain sockets                                             
+ wireless      Wireless interface data (Wavelan etc)                           
+ igmp          IP multicast addresses, which this host joined                  
+ psched        Global packet scheduler parameters.                             
+ netlink       List of PF_NETLINK sockets                                      
+ ip_mr_vifs    List of multicast virtual interfaces                            
+ ip_mr_cache   List of multicast routing cache                                 
+..............................................................................
+You can  use  this  information  to see which network devices are available in
+your system and how much traffic was routed over those devices:
+  > cat /proc/net/dev 
+  Inter-|Receive                                                   |[... 
+   face |bytes    packets errs drop fifo frame compressed multicast|[... 
+      lo:  908188   5596     0    0    0     0          0         0 [...         
+    ppp0:15475140  20721   410    0    0   410          0         0 [...  
+    eth0:  614530   7085     0    0    0     0          0         1 [... 
+   
+  ...] Transmit 
+  ...] bytes    packets errs drop fifo colls carrier compressed 
+  ...]  908188     5596    0    0    0     0       0          0 
+  ...] 1375103    17405    0    0    0     0       0          0 
+  ...] 1703981     5535    0    0    0     3       0          0 
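+The listing above is wrapped to fit the page. The sketch below assumes that
+the file itself keeps each interface on a single line, with the eight receive
+counters followed by the eight transmit counters in the order of the column
+headings shown. parse_net_dev is a hypothetical helper and the sample rows are
+rebuilt from the lo and ppp0 figures above.
```python
def parse_net_dev(text):
    """Return {iface: byte and packet counters} from /proc/net/dev text."""
    stats = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        # Header lines have no colon; data lines carry 16 numeric columns.
        if not sep or len(fields) < 16:
            continue
        if not all(f.isdigit() for f in fields[:16]):
            continue
        numbers = [int(f) for f in fields[:16]]
        stats[name.strip()] = {
            "rx_bytes": numbers[0], "rx_packets": numbers[1],
            "tx_bytes": numbers[8], "tx_packets": numbers[9],
        }
    return stats


SAMPLE = """\
Inter-|   Receive                                  |  Transmit
 face |bytes packets errs drop fifo frame compressed multicast|bytes packets errs drop fifo colls carrier compressed
    lo:  908188   5596   0 0 0   0 0 0  908188  5596 0 0 0 0 0 0
  ppp0:15475140  20721 410 0 0 410 0 0 1375103 17405 0 0 0 0 0 0
"""

if __name__ == "__main__":
    for iface, counters in parse_net_dev(SAMPLE).items():
        print(iface, counters)
```
+Note that a long byte count can run straight into the interface name, as with
+ppp0 here, which is why the line is split at the colon and not on whitespace.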
+In addition, each Channel Bond interface has its own directory.  For
+example, the bond0 device will have a directory called /proc/net/bond0/.
+It will contain information that is specific to that bond, such as the
+current slaves of the bond, the link status of the slaves, and how
+many times each slave's link has failed.
+1.5 SCSI info
+-------------
+If you  have  a  SCSI  host adapter in your system, you'll find a subdirectory
+named after  the driver for this adapter in /proc/scsi. You'll also see a list
+of all recognized SCSI devices in /proc/scsi/scsi:
+  >cat /proc/scsi/scsi 
+  Attached devices: 
+  Host: scsi0 Channel: 00 Id: 00 Lun: 00 
+    Vendor: IBM      Model: DGHS09U          Rev: 03E0 
+    Type:   Direct-Access                    ANSI SCSI revision: 03 
+  Host: scsi0 Channel: 00 Id: 06 Lun: 00 
+    Vendor: PIONEER  Model: CD-ROM DR-U06S   Rev: 1.04 
+    Type:   CD-ROM                           ANSI SCSI revision: 02 
+The directory  named  after  the driver has one file for each adapter found in
+the system.  These  files  contain information about the controller, including
+the IRQ used and the IO address range. The amount of information shown is
+dependent on  the adapter you use. The example shows the output for an Adaptec
+AHA-2940 SCSI adapter:
+  > cat /proc/scsi/aic7xxx/0 
+   
+  Adaptec AIC7xxx driver version: 5.1.19/3.2.4 
+  Compile Options: 
+    TCQ Enabled By Default : Disabled 
+    AIC7XXX_PROC_STATS     : Disabled 
+    AIC7XXX_RESET_DELAY    : 5 
+  Adapter Configuration: 
+             SCSI Adapter: Adaptec AHA-294X Ultra SCSI host adapter 
+                             Ultra Wide Controller 
+      PCI MMAPed I/O Base: 0xeb001000 
+   Adapter SEEPROM Config: SEEPROM found and used. 
+        Adaptec SCSI BIOS: Enabled 
+                      IRQ: 10 
+                     SCBs: Active 0, Max Active 2, 
+                           Allocated 15, HW 16, Page 255 
+               Interrupts: 160328 
+        BIOS Control Word: 0x18b6 
+     Adapter Control Word: 0x005b 
+     Extended Translation: Enabled 
+  Disconnect Enable Flags: 0xffff 
+       Ultra Enable Flags: 0x0001 
+   Tag Queue Enable Flags: 0x0000 
+  Ordered Queue Tag Flags: 0x0000 
+  Default Tag Queue Depth: 8 
+      Tagged Queue By Device array for aic7xxx host instance 0: 
+        {255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255} 
+      Actual queue depth per device for aic7xxx host instance 0: 
+        {1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1} 
+  Statistics: 
+  (scsi0:0:0:0) 
+    Device using Wide/Sync transfers at 40.0 MByte/sec, offset 8 
+    Transinfo settings: current(12/8/1/0), goal(12/8/1/0), user(12/15/1/0) 
+    Total transfers 160151 (74577 reads and 85574 writes) 
+  (scsi0:0:6:0) 
+    Device using Narrow/Sync transfers at 5.0 MByte/sec, offset 15 
+    Transinfo settings: current(50/15/0/0), goal(50/15/0/0), user(50/15/0/0) 
+    Total transfers 0 (0 reads and 0 writes) 
+1.6 Parallel port info in /proc/parport
+---------------------------------------
+The directory  /proc/parport  contains information about the parallel ports of
+your system.  It  has  one  subdirectory  for  each port, named after the port
+number (0,1,2,...).
+These directories contain the four files shown in Table 1-8.
+Table 1-8: Files in /proc/parport 
+..............................................................................
+ File      Content                                                             
+ autoprobe Any IEEE-1284 device ID information that has been acquired.         
+ devices   list of the device drivers using that port. A + will appear by the
+           name of the device currently using the port (it might not appear
+           against any). 
+ hardware  Parallel port's base address, IRQ line and DMA channel.             
+ irq       IRQ that parport is using for that port. This is in a separate
+           file to allow you to alter it by writing a new value in (IRQ
+           number or none). 
+..............................................................................
+1.7 TTY info in /proc/tty
+-------------------------
+Information about the available and actually used tty's can be found in the
+directory /proc/tty. You'll find entries for drivers and line disciplines in
+this directory, as shown in Table 1-9.
+Table 1-9: Files in /proc/tty 
+..............................................................................
+ File          Content                                        
+ drivers       list of drivers and their usage                
+ ldiscs        registered line disciplines                    
+ driver/serial usage statistic and status of single tty lines 
+..............................................................................
+To see  which  tty's  are  currently in use, you can simply look into the file
+/proc/tty/drivers:
+  > cat /proc/tty/drivers 
+  pty_slave            /dev/pts      136   0-255 pty:slave 
+  pty_master           /dev/ptm      128   0-255 pty:master 
+  pty_slave            /dev/ttyp       3   0-255 pty:slave 
+  pty_master           /dev/pty        2   0-255 pty:master 
+  serial               /dev/cua        5   64-67 serial:callout 
+  serial               /dev/ttyS       4   64-67 serial 
+  /dev/tty0            /dev/tty0       4       0 system:vtmaster 
+  /dev/ptmx            /dev/ptmx       5       2 system 
+  /dev/console         /dev/console    5       1 system:console 
+  /dev/tty             /dev/tty        5       0 system:/dev/tty 
+  unknown              /dev/tty        4    1-63 console 
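+The listing can be filtered with standard tools. The sketch below assumes the
+five-column layout seen in the sample (name, device prefix, major number,
+minor range, type); the helper name tty_drivers_of_type is made up for this
+example and is not part of any tool.
```shell
# List device prefix, major number and minor range for drivers whose type
# (assumed to be the fifth column) starts with the given word.
tty_drivers_of_type() {
    awk -v type="$1" 'index($5, type) == 1 { print $2, $3, $4 }'
}

# A few lines copied from the sample listing.
sample_drivers='serial               /dev/cua        5   64-67 serial:callout
serial               /dev/ttyS       4   64-67 serial
/dev/console         /dev/console    5       1 system:console'

printf '%s\n' "$sample_drivers" | tty_drivers_of_type serial
```
+On a live system, feed it the real file instead of the sample:
+tty_drivers_of_type serial < /proc/tty/drivers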
+1.8 Miscellaneous kernel statistics in /proc/stat
+-------------------------------------------------
+Various pieces   of  information about  kernel activity  are  available in the
+/proc/stat file.  All  of  the numbers reported  in  this file are  aggregates
+since the system first booted.  For a quick look, simply cat the file:
+  > cat /proc/stat
+  cpu  2255 34 2290 22625563 6290 127 456
+  cpu0 1132 34 1441 11311718 3675 127 438
+  cpu1 1123 0 849 11313845 2614 0 18
+  intr 114930548 113199788 3 0 5 263 0 4 [... lots more numbers ...]
+  ctxt 1990473
+  btime 1062191376
+  processes 2915
+  procs_running 1
+  procs_blocked 0
+The very first  "cpu" line aggregates the  numbers in all  of the other "cpuN"
+lines.  These numbers identify the amount of time the CPU has spent performing
+different kinds of work.  Time units are in USER_HZ (typically hundredths of a
+second).  The meanings of the columns are as follows, from left to right:
+- user: normal processes executing in user mode
+- nice: niced processes executing in user mode
+- system: processes executing in kernel mode
+- idle: twiddling thumbs
+- iowait: waiting for I/O to complete
+- irq: servicing interrupts
+- softirq: servicing softirqs
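+To compare the columns it can help to express them as shares of the total.
+The following sketch assumes the seven-column layout listed above and reads
+the text from standard input; cpu_percentages is a hypothetical helper name.
+Because the counters accumulate since boot, the result is an average over the
+whole uptime, not the current load.
```shell
# Print each CPU time column of the aggregate "cpu" line as a percentage
# of the sum of all seven columns.
cpu_percentages() {
    awk '$1 == "cpu" {
        n = split("user nice system idle iowait irq softirq", name, " ")
        total = 0
        for (i = 1; i <= n; i++) total += $(i + 1)
        for (i = 1; i <= n; i++)
            printf "%s %.2f\n", name[i], 100 * $(i + 1) / total
    }'
}

# Sample input: the aggregate "cpu" line from the listing above.
sample_stat='cpu  2255 34 2290 22625563 6290 127 456'
printf '%s\n' "$sample_stat" | cpu_percentages
```
+On a live system, run: cpu_percentages < /proc/stat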
+The "intr" line gives counts of interrupts  serviced since boot time, for each
+of the  possible system interrupts.   The first  column  is the  total of  all
+interrupts serviced; each  subsequent column is the  total for that particular
+interrupt.
+The "ctxt" line gives the total number of context switches across all CPUs.
+The "btime" line gives  the time at which the  system booted, in seconds since
+the Unix epoch.
+The "processes" line gives the number  of processes and threads created, which
+includes (but  is not limited  to) those  created by  calls to the  fork() and
+clone() system calls.
+The  "procs_running" line gives the  number of processes  currently running on
+CPUs.
+The   "procs_blocked" line gives  the  number of  processes currently blocked,
+waiting for I/O to complete.
+------------------------------------------------------------------------------
+Summary
+------------------------------------------------------------------------------
+The /proc file system serves information about the running system. It not only
+allows access to process data but also allows you to request the kernel status
+by reading files in the hierarchy.
+The directory structure of /proc reflects the types of information and makes
+it easy, if not obvious, to see where to look for specific data.
+------------------------------------------------------------------------------
+------------------------------------------------------------------------------
+CHAPTER 2: MODIFYING SYSTEM PARAMETERS
+------------------------------------------------------------------------------
+------------------------------------------------------------------------------
+In This Chapter
+------------------------------------------------------------------------------
+* Modifying kernel parameters by writing into files found in /proc/sys
+* Exploring the files which modify certain parameters
+* Review of the /proc/sys file tree
+------------------------------------------------------------------------------
+A very  interesting part of /proc is the directory /proc/sys. This is not only
+a source  of  information,  it also allows you to change parameters within the
+kernel. Be  very  careful  when attempting this. You can optimize your system,
+but you  can  also  cause  it  to  crash.  Never  alter kernel parameters on a
+production system.  Set  up  a  development machine and test to make sure that
+everything works  the  way  you want it to. You may have no alternative but to
+reboot the machine once an error has been made.
+To change  a  value,  simply  echo  the new value into the file. An example is
+given below  in the section on the file system data. You need to be root to do
+this. You  can  create  your  own  boot script to perform this every time your
+system boots.
+The files  in /proc/sys can be used to fine tune and monitor miscellaneous and
+general things  in  the operation of the Linux kernel. Since some of the files
+can inadvertently  disrupt  your  system,  it  is  advisable  to  read  both
+documentation and  source  before actually making adjustments. In any case, be
+very careful  when  writing  to  any  of these files. The entries in /proc may
+change slightly between the 2.1.* and the 2.2 kernel, so if there is any doubt
+review the kernel documentation in the directory /usr/src/linux/Documentation.
+This chapter  is  heavily  based  on the documentation included in the pre 2.2
+kernels, and became part of it in version 2.2.1 of the Linux kernel.
+2.1 /proc/sys/fs - File system data
+-----------------------------------
+This subdirectory  contains  specific  file system, file handle, inode, dentry
+and quota information.
+Currently, these files are in /proc/sys/fs:
+dentry-state
+------------
+Status of the directory cache. Since directory entries are dynamically
+allocated and deallocated, this file indicates the current status. It holds
+six values, of which the last two are not used and are always zero. The others
+are listed in Table 2-1.
+Table 2-1: Status files of the directory cache 
+..............................................................................
+ File       Content                                                            
+ nr_dentry  Almost always zero                                                 
+ nr_unused  Number of unused cache entries                                     
+ age_limit  Age in seconds after which the entry may be reclaimed, when memory
+            is short                                                          
+ want_pages Used internally                                                    
+..............................................................................
+dquot-nr and dquot-max
+----------------------
+The file dquot-max shows the maximum number of cached disk quota entries.
+The file  dquot-nr  shows  the  number of allocated disk quota entries and the
+number of free disk quota entries.
+If the number of available cached disk quotas is very low and you have a large
+number of simultaneous system users, you might want to raise the limit.
+file-nr and file-max
+--------------------
+The kernel  allocates file handles dynamically, but doesn't free them again at
+this time.
+The value  in  file-max  denotes  the  maximum number of file handles that the
+Linux kernel will allocate. When you get a lot of error messages about running
+out of  file handles, you might want to raise this limit. The default value is
+10% of  RAM in kilobytes.  To  change it, just  write the new number  into the
+file:
+  # cat /proc/sys/fs/file-max 
+  4096 
+  # echo 8192 > /proc/sys/fs/file-max 
+  # cat /proc/sys/fs/file-max 
+  8192 
+This method  of  revision  is  useful  for  all customizable parameters of the
+kernel - simply echo the new value to the corresponding file.
+Historically, the three values in file-nr denoted the number of allocated file
+handles,  the number of  allocated but  unused file  handles, and  the maximum
+number of file handles. Linux 2.6 always  reports 0 as the number of free file
+handles -- this  is not an error,  it just means that the  number of allocated
+file handles exactly matches the number of used file handles.
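+The three file-nr values can be combined into a quick usage figure. This is a
+minimal sketch, assuming the three-field layout described above; the helper
+name file_nr_summary and the sample numbers are made up for illustration.
```shell
# Summarize a file-nr line of the form "<allocated> <free> <max>".
file_nr_summary() {
    awk '{
        used = $1 - $2
        printf "allocated=%d free=%d max=%d used=%d (%.1f%% of max)\n",
               $1, $2, $3, used, 100 * used / $3
    }'
}

# Hypothetical sample values, not taken from a real system.
echo '1024 0 8192' | file_nr_summary
```
+On a live system, run: file_nr_summary < /proc/sys/fs/file-nr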
+Attempts to allocate more file descriptors than file-max are reported with
+printk; look for "VFS: file-max limit <number> reached".
+inode-state and inode-nr
+------------------------
+The file inode-nr contains the first two items from inode-state, so we'll skip
+to that file...
+inode-state contains  two  actual numbers and five dummy values. The numbers
+are nr_inodes and nr_free_inodes (in order of appearance).
+nr_inodes
+~~~~~~~~~
+Denotes the  number  of  inodes the system has allocated. This number will
+grow and shrink dynamically.
+nr_free_inodes
+~~~~~~~~~~~~~~
+Represents the number of free inodes, i.e. the number of in-use inodes is
+(nr_inodes - nr_free_inodes).
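+The subtraction can be done directly on the file contents. A small sketch,
+assuming the first two fields are nr_inodes and nr_free_inodes as described
+above; inodes_in_use is a hypothetical helper and the sample numbers are
+invented.
```shell
# Print the number of in-use inodes from an inode-state line.
# Only the first two fields (nr_inodes, nr_free_inodes) are used.
inodes_in_use() {
    awk '{ print $1 - $2 }'
}

# Hypothetical sample: two counters followed by five dummy values.
echo '5000 1200 0 0 0 0 0' | inodes_in_use
```
+On a live system, run: inodes_in_use < /proc/sys/fs/inode-state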
+super-nr and super-max
+----------------------
+Again, super  block structures are allocated by the kernel, but not freed. The
+file super-max  contains  the  maximum  number  of super block handlers, where
+super-nr shows the number of currently allocated ones.
+Every mounted file system needs a super block, so if you plan to mount lots of
+file systems, you may want to increase these numbers.
+aio-nr and aio-max-nr
+---------------------
+aio-nr is the running total of the number of events specified on the
+io_setup system call for all currently active aio contexts.  If aio-nr
+reaches aio-max-nr then io_setup will fail with EAGAIN.  Note that
+raising aio-max-nr does not result in the pre-allocation or re-sizing
+of any kernel data structures.
+2.2 /proc/sys/fs/binfmt_misc - Miscellaneous binary formats
+-----------------------------------------------------------
+Besides these  files, there is the subdirectory /proc/sys/fs/binfmt_misc. This
+handles the kernel support for miscellaneous binary formats.
+Binfmt_misc provides the ability to register additional binary formats to the
+kernel without compiling an additional module/kernel. To do so, binfmt_misc
+needs to know the magic numbers at the beginning of the binary or its filename
+extension.
+It works by maintaining a linked list of structs that contain a description of
+a binary  format,  including  a  magic  with size (or the filename extension),
+offset and  mask,  and  the  interpreter name. On request it invokes the given
+interpreter with  the  original  program  as  argument,  as  binfmt_java  and
+binfmt_em86 and  binfmt_mz  do.  Since binfmt_misc does not define any default
+binary-formats, you have to register an additional binary-format.
+There are two general files in binfmt_misc and one file per registered format.
+The two general files are register and status.
+Registering a new binary format
+-------------------------------
+To register a new binary format you have to issue the command
+  echo :name:type:offset:magic:mask:interpreter: > /proc/sys/fs/binfmt_misc/register 
+with appropriate name (the name for the /proc-dir entry), offset (defaults to
+0, if omitted), magic, mask (which can be omitted, defaults to all 0xff) and
+last but not least, the interpreter that is to be invoked (for example,
+/bin/echo for testing). Type can be M for usual magic matching or E for
+filename extension matching (give extension in place of magic).
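+Since omitted fields still need their colons, it is easy to miscount them. The
+sketch below only assembles the registration string and prints it; it does not
+write to /proc. The helper name binfmt_entry, the entry names and the magic
+value are all made up for illustration.
```shell
# Assemble a binfmt_misc registration string from its parts.
# Arguments: name type offset magic mask interpreter
# Pass an empty string for offset or mask to use the defaults.
binfmt_entry() {
    printf ':%s:%s:%s:%s:%s:%s:\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Hypothetical extension rule using /bin/echo as a test interpreter.
binfmt_entry Demo E '' demo '' /bin/echo

# Hypothetical magic-number rule; printf %s keeps the backslashes literal.
binfmt_entry DemoMagic M '' '\x7fDEMO' '' /bin/echo
```
+As root, the printed string could then be redirected into
+/proc/sys/fs/binfmt_misc/register.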
+Check or reset the status of the binary format handler
+------------------------------------------------------
+If you  do a cat on the file /proc/sys/fs/binfmt_misc/status, you will get the
+current status (enabled/disabled) of binfmt_misc. Change the status by echoing
+0 (disables)  or  1  (enables)  or  -1  (caution:  this  clears all previously
+registered binary  formats)  to status. For example echo 0 > status to disable
+binfmt_misc (temporarily).
+Status of a single handler
+--------------------------
+Each registered handler has an entry in /proc/sys/fs/binfmt_misc. These files
+perform the same function as status, but their scope is limited to the actual
+binary format. By running cat on such a file, you also receive all related
+information about the interpreter/magic of the binfmt.
+Example usage of binfmt_misc (emulate binfmt_java)
+--------------------------------------------------
+  cd /proc/sys/fs/binfmt_misc  
+  echo ':Java:M::\xca\xfe\xba\xbe::/usr/local/java/bin/javawrapper:' > register  
+  echo ':HTML:E::html::/usr/local/java/bin/appletviewer:' > register  
+  echo ':Applet:M::<!--applet::/usr/local/java/bin/appletviewer:' > register 
+  echo ':DEXE:M::\x0eDEX::/usr/bin/dosexec:' > register 
+The first three lines add support for Java executables and Java applets (like
+binfmt_java, additionally recognizing the .html extension with no need to put
+<!--applet> to every applet file). The fourth line registers DEXE binaries to
+be run by /usr/bin/dosexec. For Java, you have to install the JDK and the
+shell-script /usr/local/java/bin/javawrapper too. It works around the
+brokenness of the Java filename handling. To add a Java binary, just create a
+link to the class-file somewhere in the path.
+2.3 /proc/sys/kernel - general kernel parameters
+------------------------------------------------
+This directory  reflects  general  kernel  behaviors. As I've said before, the
+contents depend  on  your  configuration.  Here you'll find the most important
+files, along with descriptions of what they mean and how to use them.
+acct
+----
+The file contains three values: highwater, lowwater, and frequency.
+It exists only when BSD-style process accounting is enabled. These values
+control its behavior. If the free space on the file system where the log lives
+goes below lowwater percentage, accounting suspends. If it goes above
+highwater percentage, accounting resumes. Frequency determines how often the
+kernel checks the amount of free space (value is in seconds). Default settings
+are: 4, 2, and 30. That is, suspend accounting if there is less than 2 percent
+free; resume it if there is 4 or more percent free; consider information about
+the amount of free space valid for 30 seconds.
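+The two thresholds form a hysteresis: between lowwater and highwater the
+current state is kept. The sketch below models only that decision, not the
+kernel code. The helper name acct_next_state is made up, and the exact
+boundary handling (suspend below lowwater, resume at highwater or more) is an
+assumption based on the wording above.
```shell
# Decide the next accounting state from the current state and free space.
# Arguments: state (active|suspended), free percent, highwater, lowwater
acct_next_state() {
    cur=$1 pct=$2 high=$3 low=$4
    if [ "$cur" = active ] && [ "$pct" -lt "$low" ]; then
        echo suspended
    elif [ "$cur" = suspended ] && [ "$pct" -ge "$high" ]; then
        echo active
    else
        echo "$cur"
    fi
}

# Walk through falling and then rising free space with the default
# thresholds (highwater 4, lowwater 2).
state=active
for free in 5 3 1 3 4; do
    state=$(acct_next_state "$state" "$free" 4 2)
    echo "free=${free}% -> $state"
done
```
+Note that 3 percent free leaves accounting active on the way down but still
+suspended on the way up.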
+ctrl-alt-del
+------------
+When the value in this file is 0, ctrl-alt-del is trapped and sent to the init
+program to handle a graceful restart. However, when the value is greater than
+zero, Linux's reaction to this key combination will be an immediate reboot,
+without syncing its dirty buffers.
+[NOTE]
+    When a  program  (like  dosemu)  has  the  keyboard  in  raw  mode,  the
+    ctrl-alt-del is  intercepted  by  the  program  before it ever reaches the
+    kernel tty  layer,  and  it is up to the program to decide what to do with
+    it.
+domainname and hostname
+-----------------------
+These files can be used to set the NIS domainname and hostname of your
+box. For the classic darkstar.frop.org a simple:
+  # echo "darkstar" > /proc/sys/kernel/hostname 
+  # echo "frop.org" > /proc/sys/kernel/domainname 
+would suffice to set your hostname and NIS domainname.
+osrelease, ostype and version
+-----------------------------
+The names make it pretty obvious what these fields contain:
+  > cat /proc/sys/kernel/osrelease 
+  2.2.12 
+   
+  > cat /proc/sys/kernel/ostype 
+  Linux 
+   
+  > cat /proc/sys/kernel/version 
+  #4 Fri Oct 1 12:41:14 PDT 1999 
+The files  osrelease and ostype should be clear enough. Version needs a little
+more clarification.  The  #4 means that this is the 4th kernel built from this
+source base and the date after it indicates the time the kernel was built. The
+only way to tune these values is to rebuild the kernel.
+panic
+-----
+The value  in  this  file  represents  the  number of seconds the kernel waits
+before rebooting  on  a  panic.  When  you  use  the  software  watchdog,  the
+recommended setting  is  60. If set to 0, the auto reboot after a kernel panic
+is disabled, which is the default setting.
+printk
+------
+The four values in printk denote
+* console_loglevel,
+* default_message_loglevel,
+* minimum_console_loglevel and
+* default_console_loglevel
+respectively.
+These values  influence  printk()  behavior  when  printing  or  logging error
+messages, which  come  from  inside  the  kernel.  See  syslog(2)  for  more
+information on the different log levels.
+console_loglevel
+----------------
+Messages with a higher priority than this will be printed to the console.
+default_message_loglevel
+------------------------
+Messages without an explicit priority will be printed with this priority.
+minimum_console_loglevel
+------------------------
+Minimum (highest) value to which the console_loglevel can be set.
+default_console_loglevel
+------------------------
+Default value for console_loglevel.
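+The four numbers are easier to read with labels attached. The sketch below
+assumes the order given above and that lower numbers mean higher priority, so
+a message reaches the console when its level is numerically below
+console_loglevel. The helper name printk_report and the sample values are
+hypothetical.
```shell
# Label the four printk values and report whether a message at the given
# level would be printed to the console.
# Argument: message level; the printk line is read from standard input.
printk_report() {
    awk -v level="$1" '{
        printf "console_loglevel=%d\n", $1
        printf "default_message_loglevel=%d\n", $2
        printf "minimum_console_loglevel=%d\n", $3
        printf "default_console_loglevel=%d\n", $4
        verdict = (level < $1) ? "printed to console" : "not printed to console"
        printf "level %d: %s\n", level, verdict
    }'
}

# Hypothetical sample values.
echo '6 4 1 7' | printk_report 4
```
+On a live system, run: printk_report 4 < /proc/sys/kernel/printk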
+sg-big-buff
+-----------
+This file  shows  the size of the generic SCSI (sg) buffer. At this point, you
+can't tune  it  yet,  but  you  can  change  it  at  compile  time  by editing
+include/scsi/sg.h and changing the value of SG_BIG_BUFF.
+If you use a scanner with SANE (Scanner Access Now Easy) you might want to set
+this to a higher value. Refer to the SANE documentation on this issue.
+modprobe
+--------
+The path of the modprobe binary. The kernel uses this program to load modules
+on demand.
+unknown_nmi_panic
+-----------------
+The value in this file affects the handling of NMIs. When the value is
+non-zero, an unknown NMI is trapped and a panic occurs. At that time, kernel
+debugging information is displayed on the console.
+For example, the NMI switch that most IA32 servers have raises an unknown NMI.
+If a system hangs, try pressing the NMI switch.
+[NOTE]
+   This function and oprofile share an NMI callback. Therefore this function
+   cannot be enabled when oprofile is activated.
+   The NMI watchdog will also be disabled when the value in this file is set
+   to non-zero.
+2.4 /proc/sys/vm - The virtual memory subsystem
+-----------------------------------------------
+The files  in  this directory can be used to tune the operation of the virtual
+memory (VM)  subsystem  of  the  Linux  kernel.
+vfs_cache_pressure
+------------------
+Controls the tendency of the kernel to reclaim the memory which is used for
+caching of directory and inode objects.
+At the default value of vfs_cache_pressure=100 the kernel will attempt to
+reclaim dentries and inodes at a "fair" rate with respect to pagecache and
+swapcache reclaim.  Decreasing vfs_cache_pressure causes the kernel to prefer
+to retain dentry and inode caches.  Increasing vfs_cache_pressure beyond 100
+causes the kernel to prefer to reclaim dentries and inodes.
+dirty_background_ratio
+----------------------
+Contains, as a percentage of total system memory, the number of pages at which
+the pdflush background writeback daemon will start writing out dirty data.
+dirty_ratio
+-----------
+Contains, as a percentage of total system memory, the number of pages at which
+a process which is generating disk writes will itself start writing out dirty
+data.
+dirty_writeback_centisecs
+-------------------------
+The pdflush writeback daemons will periodically wake up and write "old" data
+out to disk. This tunable expresses the interval between those wakeups, in
+hundredths of a second.
+Setting this to zero disables periodic writeback altogether.
+dirty_expire_centisecs
+----------------------
+This tunable is used to define when dirty data is old enough to be eligible
+for writeout by the pdflush daemons. It is expressed in hundredths of a
+second. Data which has been dirty in-memory for longer than this interval will
+be written out next time a pdflush daemon wakes up.
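+Taken together, the two centisecond tunables suggest a rough upper bound on
+how long data stays dirty before a periodic wakeup picks it up: the expiry age
+plus one wakeup interval. This is only a back-of-the-envelope sketch under
+that assumption; the helper name max_dirty_age_seconds and the sample settings
+are made up.
```shell
# Rough upper bound, in seconds, on the age of dirty data when a periodic
# wakeup selects it for writeout.
# Arguments: dirty_expire_centisecs dirty_writeback_centisecs
max_dirty_age_seconds() {
    awk -v expire="$1" -v interval="$2" \
        'BEGIN { printf "%.2f\n", (expire + interval) / 100 }'
}

# Hypothetical settings: expire after 3000 centisecs, wake every 500.
max_dirty_age_seconds 3000 500
```
+The estimate ignores how long the writeout itself takes.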
+legacy_va_layout
+----------------
+If non-zero, this sysctl disables the new 32-bit mmap layout - the kernel
+will use the legacy (2.4) layout for all processes.
+lower_zone_protection
+---------------------
+For some specialised workloads on highmem machines it is dangerous for
+the kernel to allow process memory to be allocated from the "lowmem"
+zone.  This is because that memory could then be pinned via the mlock()
+system call, or by unavailability of swapspace.
+And on large highmem machines this lack of reclaimable lowmem memory
+can be fatal.
+So the Linux page allocator has a mechanism which prevents allocations
+which _could_ use highmem from using too much lowmem.  This means that
+a certain amount of lowmem is defended from the possibility of being
+captured into pinned user memory.
+(The same argument applies to the old 16 megabyte ISA DMA region.  This
+mechanism will also defend that region from allocations which could use
+highmem or lowmem).
+The `lower_zone_protection' tunable determines how aggressive the kernel is
+in defending these lower zones.  The default value is zero - no
+protection at all.
+If you have a machine which uses highmem or ISA DMA and your
+applications are using mlock(), or if you are running with no swap then
+you probably should increase the lower_zone_protection setting.
+The units of this tunable are fairly vague.  It is approximately equal
+to "megabytes".  So setting lower_zone_protection=100 will protect around 100
+megabytes of the lowmem zone from user allocations.  It will also make
+those 100 megabytes unavailable for use by applications and by
+pagecache, so there is a cost.
+The effects of this tunable may be observed by monitoring
+/proc/meminfo:LowFree.  Write a single huge file and observe the point
+at which LowFree ceases to fall.
+A reasonable value for lower_zone_protection is 100.
+page-cluster
+------------
+page-cluster controls the number of pages which are written to swap in
+a single attempt, that is, the swap I/O size.
+It is a logarithmic value - setting it to zero means "1 page", setting
+it to 1 means "2 pages", setting it to 2 means "4 pages", etc.
+The default value is three (eight pages at a time).  There may be some
+small benefits in tuning this to a different value if your workload is
+swap-intensive.
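+The logarithmic scale is simply a power of two, which a shell shift expresses
+directly. A minimal sketch; page_cluster_pages is a hypothetical helper name.
```shell
# Number of pages written to swap per attempt for a page-cluster value:
# 2 raised to the power of the setting.
page_cluster_pages() {
    echo $((1 << $1))
}

for value in 0 1 2 3; do
    echo "page-cluster=$value -> $(page_cluster_pages "$value") page(s)"
done
```
+On a live system, run: page_cluster_pages "$(cat /proc/sys/vm/page-cluster)"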
+overcommit_memory
+-----------------
+This file  contains  one  value.  The following algorithm is used to decide if
+there's enough  memory:  if  the  value of overcommit_memory is positive, then
+there's always  enough  memory. This is a useful feature, since programs often
+malloc() huge  amounts  of  memory 'just in case', while they only use a small
+part of  it.  Leaving  this value at 0 will lead to the failure of such a huge
+malloc(), when in fact the system has enough memory for the program to run.
+On the  other  hand,  enabling this feature can cause you to run out of memory
+and thrash the system to death, so large and/or important servers will want to
+set this value to 0.
+nr_hugepages and hugetlb_shm_group
+----------------------------------
+nr_hugepages configures the number of hugetlb pages reserved for the system.
+hugetlb_shm_group contains the group id that is allowed to create SysV shared
+memory segments using hugetlb pages.
+laptop_mode
+-----------
+laptop_mode is a knob that controls "laptop mode". All the things that are
+controlled by this knob are discussed in Documentation/laptop-mode.txt.
+block_dump
+----------
+block_dump enables block I/O debugging when set to a nonzero value. More
+information on block I/O debugging is in Documentation/laptop-mode.txt.
+swap_token_timeout
+------------------
+This file contains the valid hold time of the swap-out protection token. The
+Linux VM has a token-based thrashing control mechanism and uses the token to
+prevent unnecessary page faults in thrashing situations. The value is in
+seconds. It can be useful for tuning thrashing behavior.
+2.5 /proc/sys/dev - Device specific parameters
+----------------------------------------------
+Currently there is only support for CDROM drives, and for those, there is only
+one read-only  file containing information about the CD-ROM drives attached to
+the system:
+  > cat /proc/sys/dev/cdrom/info 
+  CD-ROM information, Id: cdrom.c 2.55 1999/04/25 
+   
+  drive name:             sr0     hdb 
+  drive speed:            32      40 
+  drive # of slots:       1       0 
+  Can close tray:         1       1 
+  Can open tray:          1       1 
+  Can lock tray:          1       1 
+  Can change speed:       1       1 
+  Can select disk:        0       1 
+  Can read multisession:  1       1 
+  Can read MCN:           1       1 
+  Reports media changed:  1       1 
+  Can play audio:         1       1 
+You see two drives, sr0 and hdb, along with a list of their features.
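+Because each drive is a column, picking out the drives with a given feature
+takes a little parsing. The sketch below assumes the "label: value value"
+layout of the sample, without the leading indentation used in this document;
+the helper name drives_with is made up for this example.
```shell
# Print the names of drives whose flag for the given feature label is 1.
# Argument: feature label exactly as it appears before the colon.
drives_with() {
    awk -F: -v feature="$1" '
        $1 == "drive name" { n = split($2, drive, " ") }
        $1 == feature {
            split($2, flag, " ")
            for (i = 1; i <= n; i++) if (flag[i] == 1) print drive[i]
        }'
}

# A few lines copied from the sample output.
sample_info='drive name:             sr0     hdb
drive speed:            32      40
Can open tray:          1       1
Can select disk:        0       1'

printf '%s\n' "$sample_info" | drives_with 'Can select disk'
```
+On a live system, run: drives_with 'Can select disk' < /proc/sys/dev/cdrom/info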
+2.6 /proc/sys/sunrpc - Remote procedure calls
+---------------------------------------------
+This directory contains four files, which enable or disable debugging for the
+RPC functions NFS, NFS-daemon, RPC and NLM. The default value is 0 for each.
+They can be set to one to turn debugging on.
+2.7 /proc/sys/net - Networking stuff
+------------------------------------
+The interface  to  the  networking  parts  of  the  kernel  is  located  in
+/proc/sys/net. Table  2-3  shows all possible subdirectories. You may see only
+some of them, depending on your kernel's configuration.
+Table 2-3: Subdirectories in /proc/sys/net 
+..............................................................................
+ Directory Content             Directory  Content            
+ core      General parameter   appletalk  Appletalk protocol 
+ unix      Unix domain sockets netrom     NET/ROM            
+ 802       E802 protocol       ax25       AX25               
+ ethernet  Ethernet protocol   rose       X.25 PLP layer     
+ ipv4      IP version 4        x25        X.25 protocol      
+ ipx       IPX                 token-ring IBM token ring     
+ bridge    Bridging            decnet     DEC net            
+ ipv6      IP version 6                   
+..............................................................................
+We will  concentrate  on IP networking here. Since AX25, X.25, and DEC Net are
+only minor players in the Linux world, we'll skip them in this chapter. You'll
+find some  short  info on Appletalk and IPX further on in this chapter. Review
+the online  documentation  and the kernel source to get a detailed view of the
+parameters for  those  protocols.  In  this  section  we'll  discuss  the
+subdirectories printed  in  bold letters in the table above. As default values
+are suitable for most needs, there is no need to change these values.
+/proc/sys/net/core - Network core options
+-----------------------------------------
+rmem_default
+------------
+The default setting of the socket receive buffer in bytes.
+rmem_max
+--------
+The maximum receive socket buffer size in bytes.
+wmem_default
+------------
+The default setting (in bytes) of the socket send buffer.
+wmem_max
+--------
+The maximum send socket buffer size in bytes.
+message_burst and message_cost
+------------------------------
+These parameters are used to limit the warning messages written to the kernel
+log from the networking code. They enforce a rate limit to make a
+denial-of-service attack impossible. A higher message_cost factor results in
+fewer messages being written. Message_burst controls when messages will
+be dropped. The default settings limit warning messages to one every five
+seconds.
+netdev_max_backlog
+------------------
+Maximum number of packets queued on the INPUT side when the interface
+receives packets faster than the kernel can process them.
+optmem_max
+----------
+Maximum ancillary buffer size allowed per socket. Ancillary data is a sequence
+of struct cmsghdr structures with appended data.
+/proc/sys/net/unix - Parameters for Unix domain sockets
+-------------------------------------------------------
+There are  only  two  files  in this subdirectory. They control the delays for
+deleting and destroying socket descriptors.
+2.8 /proc/sys/net/ipv4 - IPV4 settings
+--------------------------------------
+IP version  4  is  still the most used protocol in Unix networking. It will be
+replaced by  IP version 6 in the next couple of years, but for the moment it's
+the de  facto  standard  for  the  internet  and  is  used  in most networking
+environments around  the  world.  Because  of the importance of this protocol,
+we'll have a deeper look into the subtree controlling the behavior of the IPv4
+subsystem of the Linux kernel.
+Let's start with the entries in /proc/sys/net/ipv4.
+ICMP settings
+-------------
+icmp_echo_ignore_all and icmp_echo_ignore_broadcasts
+----------------------------------------------------
+Set to 1 to make the kernel ignore all ICMP ECHO requests, or just those to
+broadcast and multicast addresses, respectively. Set to 0 to answer them.
+Please note that if you accept ICMP echo requests with a broadcast/multicast
+destination address your network may be used as an exploder for denial of
+service packet flooding attacks to other hosts.
+icmp_destunreach_rate, icmp_echoreply_rate, icmp_paramprob_rate and icmp_timeexceed_rate
+----------------------------------------------------------------------------------------
+Sets limits for sending ICMP packets to specific targets. A value of zero
+disables all limiting. Any positive value sets the maximum packet rate in
+hundredths of a second (on Intel systems).
+IP settings
+-----------
+ip_autoconfig
+-------------
+This file contains the number one if the host received its IP configuration by
+RARP, BOOTP, DHCP or a similar mechanism. Otherwise it is zero.
+ip_default_ttl
+--------------
+TTL (Time  To  Live) for IPv4 interfaces. This is simply the maximum number of
+hops a packet may travel.
+ip_dynaddr
+----------
+Enable dynamic socket address rewriting on interface address change. This is
+useful for dialup interfaces with changing IP addresses.
+ip_forward
+----------
+Enable or disable forwarding of IP packets between interfaces. Changing this
+value resets all other parameters to their default values. They differ if the
+kernel is configured as host or router.
+ip_local_port_range
+-------------------
+Range of  ports  used  by  TCP  and UDP to choose the local port. Contains two
+numbers, the  first  number  is the lowest port, the second number the highest
+local port.  Default  is  1024-4999.  Should  be  changed  to  32768-61000 for
+high-usage systems.
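+To see what the suggested change buys, count the ports in each range. A small
+sketch, assuming the file holds the lowest and highest port as two
+whitespace-separated numbers; port_range_size is a hypothetical helper name.
```shell
# Count the local ports in a range given as "<lowest> <highest>",
# including both end points.
port_range_size() {
    awk '{ print $2 - $1 + 1 }'
}

echo '1024 4999' | port_range_size     # the default range
echo '32768 61000' | port_range_size   # the range suggested for busy systems
```
+On a live system, run: port_range_size < /proc/sys/net/ipv4/ip_local_port_range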
+ip_no_pmtu_disc
+---------------
+Global switch  to  turn  path  MTU  discovery off. It can also be set on a per
+socket basis by the applications or on a per route basis.
+ip_masq_debug
+-------------
+Enable/disable debugging of IP masquerading.
+IP fragmentation settings
+-------------------------
+ipfrag_high_thresh and ipfrag_low_thresh
+----------------------------------------
+Maximum memory  used to reassemble IP fragments. When ipfrag_high_thresh bytes
+of memory  is  allocated  for  this  purpose,  the  fragment handler will toss
+packets until ipfrag_low_thresh is reached.
+ipfrag_time
+-----------
+Time in seconds to keep an IP fragment in memory.
+TCP settings
+------------
+tcp_ecn
+-------
+This file controls the use of Explicit Congestion Notification (ECN), a newer
+feature that sets an ECN bit in the IPv4 headers. Some routers and firewalls
+block traffic that has this bit set, so it may be necessary to echo 0 to
+/proc/sys/net/ipv4/tcp_ecn if you want to talk to such sites. For more
+information, see RFC2481.
+tcp_retrans_collapse
+--------------------
+Bug-to-bug compatibility with some broken printers. On retransmit, try to send
+larger packets to work around bugs in certain TCP stacks. Can be turned off by
+setting it to zero.
+tcp_keepalive_probes
+--------------------
+Number of  keep  alive  probes  TCP  sends  out,  until  it  decides  that the
+connection is broken.
+tcp_keepalive_time
+------------------
+How often  TCP  sends out keep alive messages, when keep alive is enabled. The
+default is 2 hours.
+tcp_syn_retries
+---------------
+Number of  times  initial  SYNs  for  a  TCP  connection  attempt  will  be
+retransmitted. Should  not  be  higher  than 255. This setting applies only to
+outgoing connections;  for  incoming  connections the number of retransmits is
+defined by tcp_retries1.
+tcp_sack
+--------
+Enable selective acknowledgments as defined in RFC2018.
+tcp_timestamps
+--------------
+Enable timestamps as defined in RFC1323.
+tcp_stdurg
+----------
+Enable the  strict  RFC793 interpretation of the TCP urgent pointer field. The
+default is  to  use  the  BSD  compatible interpretation of the urgent pointer
+pointing to the first byte after the urgent data. The RFC793 interpretation is
+to have  it  point  to  the last byte of urgent data. Enabling this option may
+lead to interoperability problems. Disabled by default.
+tcp_syncookies
+--------------
+Only valid  when  the  kernel  was  compiled  with CONFIG_SYNCOOKIES. Send out
+syncookies when  the  syn backlog queue of a socket overflows. This is to ward
+off the common 'syn flood attack'. Disabled by default.
+Note that  the  concept  of a socket backlog is abandoned. This means the peer
+may not  receive  reliable  error  messages  from  an  overloaded  server with
+syncookies enabled.
+tcp_window_scaling
+------------------
+Enable window scaling as defined in RFC1323.
+tcp_fin_timeout
+---------------
+The length  of  time  in  seconds  it  takes to receive a final FIN before the
+socket is  always  closed.  This  is  strictly  a  violation  of  the  TCP
+specification, but required to prevent denial-of-service attacks.
+tcp_max_ka_probes
+-----------------
+Indicates how  many  keep alive probes are sent per slow timer run. Should not
+be set too high to prevent bursts.
+tcp_max_syn_backlog
+-------------------
+Length of  the per socket backlog queue. Since Linux 2.2 the backlog specified
+in listen(2)  only  specifies  the  length  of  the  backlog  queue of already
+established sockets. When more connection requests arrive Linux starts to drop
+packets. When  syncookies  are  enabled the packets are still answered and the
+maximum queue is effectively ignored.
+tcp_retries1
+------------
+Defines how  often  an  answer  to  a  TCP connection request is retransmitted
+before giving up.
+tcp_retries2
+------------
+Defines how often a TCP packet is retransmitted before giving up.
+Interface specific settings
+---------------------------
+In the directory /proc/sys/net/ipv4/conf you'll find one subdirectory for each
+interface the  system  knows about and one directory called all. Changes in the
+all subdirectory  affect  all  interfaces,  whereas  changes  in  the  other
+subdirectories affect  only  one  interface.  All  directories  have  the same
+entries:
+accept_redirects
+----------------
+This switch  decides  if the kernel accepts ICMP redirect messages or not. The
+default is 'yes' if the kernel is configured for a regular host and 'no' for a
+router configuration.
+accept_source_route
+-------------------
+Determines whether source routed packets are accepted or declined. The default
+is dependent on  the  kernel  configuration.  It's 'yes' for routers and 'no'
+for hosts.
+bootp_relay
+-----------
+Accept packets  with source address 0.b.c.d with destinations not to this host
+as local ones. It is supposed that a BOOTP relay daemon will catch and forward
+such packets.
+The default  is  0,  since this feature is not implemented yet (kernel version
+2.2.12).
+forwarding
+----------
+Enable or disable IP forwarding on this interface.
+log_martians
+------------
+Log packets with source addresses with no known route to the kernel log.
+mc_forwarding
+-------------
+Do multicast routing. The kernel needs to be compiled with CONFIG_MROUTE and a
+multicast routing daemon is required.
+proxy_arp
+---------
+Does (1) or does not (0) perform proxy ARP.
+rp_filter
+---------
+Integer value determines if a source validation should be made. 1 means yes, 0
+means no.  Disabled by default, but local/broadcast address spoofing is always
+on.
+If you  set this to 1 on a router that is the only connection for a network to
+the net,  it  will  prevent  spoofing  attacks  against your internal networks
+(external addresses  can  still  be  spoofed), without the need for additional
+firewall rules.
+secure_redirects
+----------------
+Accept ICMP  redirect  messages  only  for gateways listed in the default
+gateway list. Enabled by default.
+shared_media
+------------
+If it  is  not  set  the kernel does not assume that different subnets on this
+device can communicate directly. Default setting is 'yes'.
+send_redirects
+--------------
+Determines whether to send ICMP redirects to other hosts.
+Routing settings
+----------------
+The directory  /proc/sys/net/ipv4/route  contains  several  files  to  control
+routing issues.
+error_burst and error_cost
+--------------------------
+These parameters are used to limit how many ICMP destination unreachable
+messages are sent from the host in question. ICMP destination unreachable
+messages are sent when the next hop cannot be reached while trying to
+transmit a packet. The kernel will also print some error messages to the
+kernel logs if someone is ignoring our ICMP redirects. The higher the
+error_cost factor is, the fewer destination unreachable and error messages
+will be let through. error_burst controls when destination unreachable
+messages and error messages will be dropped. The default settings limit
+warning messages to five every second.
+flush
+-----
+Writing to this file results in a flush of the routing cache.
+gc_elasticity, gc_interval, gc_min_interval_ms, gc_timeout, gc_thresh
+---------------------------------------------------------------------
+Values to  control  the  frequency  and  behavior  of  the  garbage collection
+algorithm for the routing cache. gc_min_interval is deprecated and replaced
+by gc_min_interval_ms.
+max_size
+--------
+Maximum size  of  the routing cache. Old entries will be purged once the cache
+has reached this size.
+max_delay, min_delay
+--------------------
+Delays for flushing the routing cache.
+redirect_load, redirect_number
+------------------------------
+Factors which  determine  if  more ICMP redirects should be sent to a specific
+host. No  redirects  will be sent once the load limit or the maximum number of
+redirects has been reached.
+redirect_silence
+----------------
+Timeout for redirects. After this period redirects will be sent again, even if
+this has been stopped, because the load or number limit has been reached.
+Network Neighbor handling
+-------------------------
+Settings about how to handle connections with direct neighbors (nodes attached
+to the same link) can be found in the directory /proc/sys/net/ipv4/neigh.
+As in  the  conf directory,  there  is  a  default  subdirectory  which
+holds the  default  values, and one directory for each interface. The contents
+of the  directories  are identical, with the single exception that the default
+settings contain additional options to set garbage collection parameters.
+In the interface directories you'll find the following entries:
+base_reachable_time, base_reachable_time_ms
+-------------------------------------------
+A base  value  used for computing the random reachable time value as specified
+in RFC2461.
+Expression of base_reachable_time, which is deprecated, is in seconds.
+Expression of base_reachable_time_ms is in milliseconds.
+retrans_time, retrans_time_ms
+-----------------------------
+The time between retransmitted Neighbor Solicitation messages.
+Used for address resolution and to determine if a neighbor is
+unreachable.
+Expression of retrans_time, which is deprecated, is in 1/100 seconds (for
+IPv4) or in jiffies (for IPv6).
+Expression of retrans_time_ms is in milliseconds.
+unres_qlen
+----------
+Maximum queue  length  for a pending ARP request - the number of packets which
+are accepted from other layers while the ARP address is still being resolved.
+anycast_delay
+-------------
+Maximum for  random  delay  of  answers  to  neighbor solicitation messages in
+jiffies (1/100  sec). Not yet implemented (Linux does not have anycast support
+yet).
+ucast_solicit
+-------------
+Maximum number of retries for unicast solicitation.
+mcast_solicit
+-------------
+Maximum number of retries for multicast solicitation.
+delay_first_probe_time
+----------------------
+Delay for  the  first  time  probe  if  the  neighbor  is  reachable.  (see
+gc_stale_time)
+locktime
+--------
+An ARP/neighbor  entry  is only replaced with a new one if the old is at least
+locktime old. This prevents ARP cache thrashing.
+proxy_delay
+-----------
+Maximum time  (real  time is random [0..proxytime]) before answering to an ARP
+request for  which  we have a  proxy ARP entry. In some cases, this is used to
+prevent network flooding.
+proxy_qlen
+----------
+Maximum queue length of the delayed proxy arp timer. (see proxy_delay).
+app_solicit
+-----------
+Determines the  number of requests to send to the user level ARP daemon. Use 0
+to turn off.
+gc_stale_time
+-------------
+Determines how  often  to  check  for stale ARP entries. After an ARP entry is
+stale it  will  be resolved again (which is useful when an IP address migrates
+to another  machine).  When  ucast_solicit is greater than 0 it first tries to
+send an  ARP  packet  directly  to  the  known  host.  When  that  fails  and
+mcast_solicit is greater than 0, an ARP request is broadcast.
+2.9 Appletalk
+-------------
+The /proc/sys/net/appletalk  directory  holds the Appletalk configuration data
+when Appletalk is loaded. The configurable parameters are:
+aarp-expiry-time
+----------------
+The amount  of  time  we keep an ARP entry before expiring it. Used to age out
+old hosts.
+aarp-resolve-time
+-----------------
+The amount of time we will spend trying to resolve an Appletalk address.
+aarp-retransmit-limit
+---------------------
+The number of times we will retransmit a query before giving up.
+aarp-tick-time
+--------------
+Controls the rate at which expires are checked.
+The directory  /proc/net/appletalk  holds the list of active Appletalk sockets
+on a machine.
+The fields  indicate  the DDP type, the local address (in network:node format)
+the remote  address,  the  size of the transmit pending queue, the size of the
+received queue  (bytes waiting for applications to read) the state and the uid
+owning the socket.
+/proc/net/atalk_iface lists  all  the  interfaces configured for appletalk. It
+shows the  name  of the interface, its Appletalk address, the network range on
+that address  (or  network number for phase 1 networks), and the status of the
+interface.
+/proc/net/atalk_route lists  each  known  network  route.  It lists the target
+(network) that the route leads to, the router (may be directly connected), the
+route flags, and the device the route is using.
+2.10 IPX
+--------
+The IPX protocol has no tunable values in /proc/sys/net.
+The IPX  protocol  does,  however,  provide /proc/net/ipx. This lists each IPX
+socket giving  the  local  and  remote  addresses  in  Novell  format (that is
+network:node:port). In  accordance  with  the  strange  Novell  tradition,
+everything but the port is in hex. Not_Connected is displayed for sockets that
+are not  tied to a specific remote address. The Tx and Rx queue sizes indicate
+the number  of  bytes  pending  for  transmission  and  reception.  The  state
+indicates the  state  the  socket  is  in and the uid is the owning uid of the
+socket.
+The /proc/net/ipx_interface  file lists all IPX interfaces. For each interface
+it gives  the network number, the node number, and indicates if the network is
+the primary  network.  It  also  indicates  which  device  it  is bound to (or
+Internal for  internal  networks)  and  the  Frame  Type if appropriate. Linux
+supports 802.3,  802.2,  802.2  SNAP  and DIX (Blue Book) ethernet framing for
+IPX.
+The /proc/net/ipx_route  table  holds  a list of IPX routes. For each route it
+gives the  destination  network, the router node (or Directly) and the network
+address of the router (or Connected) for internal networks.
+2.11 /proc/sys/fs/mqueue - POSIX message queues filesystem
+----------------------------------------------------------
+The "mqueue"  filesystem provides  the necessary kernel features to enable the
+creation of a  user space  library that  implements  the  POSIX message queues
+API (as noted by the  MSG tag in the  POSIX 1003.1-2001 version  of the System
+Interfaces specification.)
+The "mqueue" filesystem contains values for determining/setting  the amount of
+resources used by the file system.
+/proc/sys/fs/mqueue/queues_max is a read/write  file for  setting/getting  the
+maximum number of message queues allowed on the system.
+/proc/sys/fs/mqueue/msg_max  is  a  read/write file  for  setting/getting  the
+maximum number of messages in a queue.  It is in fact the upper bound for
+another (user) limit which is set in the mq_open invocation. That attribute of
+a queue must be less than or equal to msg_max.
+/proc/sys/fs/mqueue/msgsize_max is  a read/write  file for setting/getting the
+maximum  message size value (it is every  message queue's attribute set during
+its creation).
+------------------------------------------------------------------------------
+Summary
+------------------------------------------------------------------------------
+Certain aspects  of  kernel  behavior  can be modified at runtime, without the
+need to  recompile  the kernel, or even to reboot the system. The files in the
+/proc/sys tree  can  not only be read, but also modified. You can use the echo
+command to write value into these files, thereby changing the default settings
+of the kernel.
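+As a rough illustration of the summary above, the same read and write access
+that echo and cat provide can be sketched in C. The helper names sysctl_read
+and sysctl_write are hypothetical, not an existing API; writing to a real
+/proc/sys file normally requires root.
```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read the first line of a /proc/sys file into buf,
 * with the trailing newline stripped. Returns 0 on success, -1 on failure. */
int sysctl_read(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, (int)len, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Hypothetical helper: write value plus a newline, much like
 * "echo value > path". Returns 0 on success, -1 on failure. */
int sysctl_write(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "%s\n", value) > 0;
    /* A rejected value may only surface when the buffer is flushed. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```
+For instance, sysctl_write("/proc/sys/net/ipv4/ip_forward", "1") would
+correspond to echoing 1 into that file.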
+------------------------------------------------------------------------------
diff --git a/Documentation/filesystems/romfs.txt b/Documentation/filesystems/romfs.txt
new file mode 100644
index 000000000000..2d2a7b2a16b9
--- /dev/null
+++ b/Documentation/filesystems/romfs.txt
@@ -0,0 +1,187 @@
+ROMFS - ROM FILE SYSTEM
+This is a quite dumb, read only filesystem, mainly for initial RAM
+disks of installation disks.  It has grown up by the need of having
+modules linked at boot time.  Using this filesystem, you get a very
+similar feature, and even the possibility of a small kernel, with a
+file system which doesn't take up useful memory from the router
+functions in the basement of your office.
+For comparison, both the older minix and xiafs (the latter is now
+defunct) filesystems, compiled as module need more than 20000 bytes,
+while romfs is less than a page, about 4000 bytes (assuming i586
+code).  Under the same conditions, the msdos filesystem would need
+about 30K (and does not support device nodes or symlinks), while the
+nfs module with nfsroot is about 57K.  Furthermore, as a bit unfair
+comparison, an actual rescue disk used up 3202 blocks with ext2, while
+with romfs, it needed 3079 blocks.
+To create such a file system, you'll need a user program named
+genromfs.  It is available via anonymous ftp on sunsite.unc.edu and
+its mirrors, in the /pub/Linux/system/recovery/ directory.
+As the name suggests, romfs could be also used (space-efficiently) on
+various read-only media, like (E)EPROM disks if someone will have the
+motivation.. :)
+However, the main purpose of romfs is to have a very small kernel,
+which has only this filesystem linked in, and then can load any module
+later, with the current module utilities.  It can also be used to run
+some program to decide if you need SCSI devices, and even IDE or
+floppy drives can be loaded later if you use the "initrd"--initial
+RAM disk--feature of the kernel.  This would not be really news
+flash, but with romfs, you can even spare off your ext2 or minix or
+maybe even affs filesystem until you really know that you need it.
+For example, a distribution boot disk can contain only the cd disk
+drivers (and possibly the SCSI drivers), and the ISO 9660 filesystem
+module.  The kernel can be small enough, since it doesn't have other
+filesystems, like the quite large ext2fs module, which can then be
+loaded off the CD at a later stage of the installation.  Another use
+would be for a recovery disk, when you are reinstalling a workstation
+from the network, and you will have all the tools/modules available
+from a nearby server, so you don't want to carry two disks for this
+purpose, just because it won't fit into ext2.
+romfs operates on block devices as you can expect, and the underlying
+structure is very simple.  Every accessible structure begins on 16
+byte boundaries for fast access.  The minimum space a file will take
+is 32 bytes (this is an empty file, with a less than 16 character
+name).  The maximum overhead for any non-empty file is the header, and
+the 16 byte padding for the name and the contents, i.e. 16+14+15 = 45
+bytes.  This is quite rare however, since most file names are longer
+than 3 bytes, and shorter than 15 bytes.
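+The size arithmetic above can be checked with a small sketch.  It assumes
+the layout described in this file (a 16 byte header, then the zero
+terminated name and the contents, each padded to 16 bytes); the function
+names are made up for illustration.
```c
#include <stddef.h>

/* Round n up to the next multiple of 16. */
static size_t pad16(size_t n)
{
    return (n + 15) & ~(size_t)15;
}

/* Bytes one file occupies: 16-byte header, name plus its terminating
 * zero padded to 16 bytes, and file data padded to 16 bytes. */
size_t romfs_file_footprint(size_t name_len, size_t data_len)
{
    return 16 + pad16(name_len + 1) + pad16(data_len);
}
```
+An empty file with a short name takes 32 bytes.  A one byte file with a
+one character name takes 48 bytes, of which 48 - 1 - 2 = 45 are overhead.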
+The layout of the filesystem is the following:
+offset      content
+        +---+---+---+---+
+  0     | - | r | o | m |  \
+        +---+---+---+---+       The ASCII representation of those bytes
+  4     | 1 | f | s | - |  /    (i.e. "-rom1fs-")
+        +---+---+---+---+
+  8     |   full size   |       The number of accessible bytes in this fs.
+        +---+---+---+---+
+ 12     |    checksum   |       The checksum of the FIRST 512 BYTES.
+        +---+---+---+---+
+ 16     | volume name   |       The zero terminated name of the volume,
+        :               :       padded to 16 byte boundary.
+        +---+---+---+---+
+ xx     |     file      |
+        :    headers    :
+Every multi byte value (32 bit words, I'll use the longwords term from
+now on) must be in big endian order.
+The first eight bytes identify the filesystem, even for the casual
+inspector.  After that, in the 3rd longword, it contains the number of
+bytes accessible from the start of this filesystem.  The 4th longword
+is the checksum of the first 512 bytes (or the number of bytes
+accessible, whichever is smaller).  The applied algorithm is the same
+as in the AFFS filesystem, namely a simple sum of the longwords
+(assuming bigendian quantities again).  For details, please consult
+the source.  This algorithm was chosen because although it's not quite
+reliable, it does not require any tables, and it is very simple.
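+A minimal sketch of such a checksum, for illustration only.  It sums big
+endian longwords over the first 512 bytes (or fewer).  The convention that
+the stored checksum makes the covered longwords sum to zero is an
+assumption here, so check it against the source; the function names are
+hypothetical.
```c
#include <stddef.h>
#include <stdint.h>

/* Sum the leading min(size, 512) bytes as big-endian 32-bit words,
 * modulo 2^32. A trailing partial word is ignored. */
uint32_t romfs_sum(const unsigned char *img, size_t size)
{
    size_t n = size < 512 ? size : 512;
    uint32_t sum = 0;
    for (size_t i = 0; i + 4 <= n; i += 4)
        sum += ((uint32_t)img[i] << 24) | ((uint32_t)img[i + 1] << 16) |
               ((uint32_t)img[i + 2] << 8) | (uint32_t)img[i + 3];
    return sum;
}

/* Assumed convention: choose the 4th longword (offset 12) so that the
 * covered words sum to zero. Requires size >= 16. */
void romfs_set_checksum(unsigned char *img, size_t size)
{
    img[12] = img[13] = img[14] = img[15] = 0;
    uint32_t c = 0u - romfs_sum(img, size);
    img[12] = (unsigned char)(c >> 24);
    img[13] = (unsigned char)(c >> 16);
    img[14] = (unsigned char)(c >> 8);
    img[15] = (unsigned char)c;
}
```
+Under that assumption, an image is accepted when romfs_sum() over its
+first 512 bytes returns zero.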
+The following bytes are now part of the file system; each file header
+must begin on a 16 byte boundary.
+offset      content
+        +---+---+---+---+
+  0     | next filehdr|X|       The offset of the next file header
+        +---+---+---+---+         (zero if no more files)
+  4     |   spec.info   |       Info for directories/hard links/devices
+        +---+---+---+---+
+  8     |     size      |       The size of this file in bytes
+        +---+---+---+---+
+ 12     |   checksum    |       Covering the meta data, including the file
+        +---+---+---+---+         name, and padding
+ 16     | file name     |       The zero terminated name of the file,
+        :               :       padded to 16 byte boundary
+        +---+---+---+---+
+ xx     | file data     |
+        :               :
+Since the file headers begin always at a 16 byte boundary, the lowest
+4 bits would be always zero in the next filehdr pointer.  These four
+bits are used for the mode information.  Bits 0..2 specify the type of
+the file; while bit 3 shows if the file is executable or not.  The
+permissions are assumed to be world readable, if this bit is not set,
+and world executable if it is; except the character and block devices,
+they are never accessible for other than owner.  The owner of every
+file is user and group 0, this should never be a problem for the
+intended use.  The mapping of the 8 possible values to file types is
+the following:
+          mapping               spec.info means
+ 0      hard link       link destination [file header]
+ 1      directory       first file's header
+ 2      regular file    unused, must be zero [MBZ]
+ 3      symbolic link   unused, MBZ (file data is the link content)
+ 4      block device    16/16 bits major/minor number
+ 5      char device                 - " -
+ 6      socket          unused, MBZ
+ 7      fifo            unused, MBZ
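+To illustrate the encoding, the sketch below splits the first longword of
+a file header into the next header offset and the mode bits.  Only the low
+four bits are free, three of which hold the type, so the executable flag
+is taken to be the remaining one (value 8).  The struct and function names
+are hypothetical.
```c
#include <stdint.h>

/* Decoded form of the "next filehdr" longword (names are illustrative). */
struct romfs_next {
    uint32_t next_offset;  /* offset of the next header, 0 if none */
    unsigned type;         /* 0..7, see the mapping table */
    int executable;        /* nonzero if the executable flag is set */
};

/* Decode a big-endian longword as stored at offset 0 of a file header. */
struct romfs_next romfs_decode_next(const unsigned char word[4])
{
    uint32_t v = ((uint32_t)word[0] << 24) | ((uint32_t)word[1] << 16) |
                 ((uint32_t)word[2] << 8) | (uint32_t)word[3];
    struct romfs_next r;
    r.next_offset = v & ~(uint32_t)0xF;  /* headers are 16-byte aligned */
    r.type = v & 7;                      /* bits 0..2 */
    r.executable = (v >> 3) & 1;         /* remaining low bit */
    return r;
}
```
+For example, the bytes 00 00 01 2A decode to a next header at offset
+0x120, type 2 (regular file), with the executable flag set.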
+Note that hard links are specifically marked in this filesystem, but
+they will behave as you can expect (i.e. share the inode number).
+Note also that it is your responsibility to not create hard link
+loops, and creating all the . and .. links for directories.  This is
+normally done correctly by the genromfs program.  Please refrain from
+using the executable bits for special purposes on the socket and fifo
+special files, they may have other uses in the future.  Additionally,
+please remember that only regular files, and symlinks are supposed to
+have a nonzero size field; they contain the number of bytes available
+directly after the (padded) file name.
+Another thing to note is that romfs works on file headers and data
+aligned to 16 byte boundaries, but most hardware devices and the block
+device drivers are unable to cope with smaller than block-sized data.
+To overcome this limitation, the whole size of the file system must be
+padded to a 1024 byte boundary.
+If you have any problems or suggestions concerning this file system,
+please contact me.  However, think twice before wanting me to add
+features and code, because the primary and most important advantage of
+this file system is the small code.  On the other hand, don't be
+alarmed, I'm not getting that much romfs related mail.  Now I can
+understand why Avery wrote poems in the ARCnet docs to get some more
+feedback. :)
+romfs has also a mailing list, and to date, it hasn't received any
+traffic, so you are welcome to join it to discuss your ideas. :)
+It's run by ezmlm, so you can subscribe to it by sending a message
+to romfs-subscribe@shadow.banki.hu, the content is irrelevant.
+Pending issues:
+- Permissions and owner information are pretty essential features of a
+Un*x like system, but romfs does not provide the full possibilities.
+I have never found this limiting, but others might.
+- The file system is read only, so it can be very small, but in case
+one would want to write _anything_ to a file system, he still needs
+a writable file system, thus negating the size advantages.  Possible
+solutions: implement write access as a compile-time option, or a new,
+similarly small writable filesystem for RAM disks.
+- Since the files are only required to have alignment on a 16 byte
+boundary, it is currently possibly suboptimal to read or execute files
+from the filesystem.  It might be resolved by reordering file data to
+have most of it (i.e. except the start and the end) laying at "natural"
+boundaries, thus it would be possible to directly map a big portion of
+the file contents to the mm subsystem.
+- Compression might be a useful feature, but memory is quite a
+limiting factor in my eyes.
+- Where is it used?
+- Does it work on other architectures than intel and motorola?
+Have fun,
+Janos Farkas <chexum@shadow.banki.hu>
diff --git a/Documentation/filesystems/smbfs.txt b/Documentation/filesystems/smbfs.txt
new file mode 100644
index 000000000000..f673ef0de0f7
--- /dev/null
+++ b/Documentation/filesystems/smbfs.txt
@@ -0,0 +1,8 @@
+Smbfs is a filesystem that implements the SMB protocol, which is the
+protocol used by Windows for Workgroups, Windows 95 and Windows NT.
+Smbfs was inspired by Samba, the program written by Andrew Tridgell
+that turns any Unix host into a file server for DOS or Windows clients.
+Smbfs is an SMB client, but uses parts of samba for its operation. For
+more info on samba, including documentation, please go to
+http://www.samba.org/ and then on to your nearest mirror.
diff --git a/Documentation/filesystems/sysfs-pci.txt b/Documentation/filesystems/sysfs-pci.txt
new file mode 100644
index 000000000000..e97d024eae77
--- /dev/null
+++ b/Documentation/filesystems/sysfs-pci.txt
@@ -0,0 +1,88 @@
+Accessing PCI device resources through sysfs
+sysfs, usually mounted at /sys, provides access to PCI resources on platforms
+that support it.  For example, a given bus might look like this:
+     /sys/devices/pci0000:17
+     |-- 0000:17:00.0
+     |   |-- class
+     |   |-- config
+     |   |-- detach_state
+     |   |-- device
+     |   |-- irq
+     |   |-- local_cpus
+     |   |-- resource
+     |   |-- resource0
+     |   |-- resource1
+     |   |-- resource2
+     |   |-- rom
+     |   |-- subsystem_device
+     |   |-- subsystem_vendor
+     |   `-- vendor
+     `-- detach_state
+The topmost element describes the PCI domain and bus number.  In this case,
+the domain number is 0000 and the bus number is 17 (both values are in hex).
+This bus contains a single function device in slot 0.  The domain and bus
+numbers are reproduced for convenience.  Under the device directory are several
+files, each with their own function.
+       file                function
+       ----                --------
+       class               PCI class (ascii, ro)
+       config              PCI config space (binary, rw)
+       detach_state        connection status (bool, rw)
+       device              PCI device (ascii, ro)
+       irq                 IRQ number (ascii, ro)
+       local_cpus          nearby CPU mask (cpumask, ro)
+       resource            PCI resource host addresses (ascii, ro)
+       resource0..N        PCI resource N, if present (binary, mmap)
+       rom                 PCI ROM resource, if present (binary, ro)
+       subsystem_device    PCI subsystem device (ascii, ro)
+       subsystem_vendor    PCI subsystem vendor (ascii, ro)
+       vendor              PCI vendor (ascii, ro)
+  ro - read only file
+  rw - file is readable and writable
+  mmap - file is mmapable
+  ascii - file contains ascii text
+  binary - file contains binary data
+  cpumask - file contains a cpumask type
+The read only files are informational; writes to them will be ignored.
+Writable files can be used to perform actions on the device (e.g. changing
+config space, detaching a device).  mmapable files are available via an
+mmap of the file at offset 0 and can be used to do actual device programming
+from userspace.  Note that some platforms don't support mmapping of certain
+resources, so be sure to check the return value from any attempted mmap.
+Accessing legacy resources through sysfs
+Legacy I/O port and ISA memory resources are also provided in sysfs if the
+underlying platform supports them.  They're located in the PCI class hierarchy,
+e.g.
+        /sys/class/pci_bus/0000:17/
+        |-- bridge -> ../../../devices/pci0000:17
+        |-- cpuaffinity
+        |-- legacy_io
+        `-- legacy_mem
+The legacy_io file is a read/write file that can be used by applications to
+do legacy port I/O.  The application should open the file, seek to the desired
+port (e.g. 0x3e8) and do a read or a write of 1, 2 or 4 bytes.  The legacy_mem
+file should be mmapped with an offset corresponding to the memory offset
+desired, e.g. 0xa0000 for the VGA frame buffer.  The application can then
+simply dereference the returned pointer (after checking for errors of course)
+to access legacy memory space.
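+As a hedged sketch of the legacy_io access pattern just described (open,
+seek to the port, read), the helper below reads a single byte.  The
+function name is hypothetical, and which ports are readable depends on the
+platform and on permissions.
```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a legacy_io file, seek to the given port
 * and read one byte. Returns 0 on success, -1 on any failure. */
int legacy_io_read8(const char *path, off_t port, uint8_t *value)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    int rc = -1;
    if (lseek(fd, port, SEEK_SET) == port && read(fd, value, 1) == 1)
        rc = 0;
    close(fd);
    return rc;
}
```
+A call such as legacy_io_read8("/sys/class/pci_bus/0000:17/legacy_io",
+0x3e8, &v) would follow the example port used above.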
+Supporting PCI access on new platforms
+In order to support PCI resource mapping as described above, Linux platform
+code must define HAVE_PCI_MMAP and provide a pci_mmap_page_range function.
+Platforms are free to only support subsets of the mmap functionality, but
+useful return codes should be provided.
+Legacy resources are protected by the HAVE_PCI_LEGACY define.  Platforms
+wishing to support legacy functionality should define it and provide
+pci_legacy_read, pci_legacy_write and pci_mmap_legacy_page_range functions.
+\ No newline at end of file
diff --git a/Documentation/filesystems/sysfs.txt b/Documentation/filesystems/sysfs.txt
new file mode 100644
index 000000000000..60f6c2c4d477
--- /dev/null
+++ b/Documentation/filesystems/sysfs.txt
@@ -0,0 +1,341 @@
+sysfs - _The_ filesystem for exporting kernel objects. 
+Patrick Mochel  <mochel@osdl.org>
+10 January 2003
+What it is:
+~~~~~~~~~~~
+sysfs is a ram-based filesystem initially based on ramfs. It provides
+a means to export kernel data structures, their attributes, and the 
+linkages between them to userspace. 
+sysfs is tied inherently to the kobject infrastructure. Please read
+Documentation/kobject.txt for more information concerning the kobject
+interface. 
+Using sysfs
+~~~~~~~~~~~
+sysfs is always compiled in. You can access it by doing:
+    mount -t sysfs sysfs /sys 
+Directory Creation
+~~~~~~~~~~~~~~~~~~
+For every kobject that is registered with the system, a directory is
+created for it in sysfs. That directory is created as a subdirectory
+of the kobject's parent, expressing internal object hierarchies to
+userspace. Top-level directories in sysfs represent the common
+ancestors of object hierarchies; i.e. the subsystems the objects
+belong to. 
+Sysfs internally stores the kobject that owns the directory in the
+->d_fsdata pointer of the directory's dentry. This allows sysfs to do
+reference counting directly on the kobject when the file is opened and
+closed. 
+Attributes
+~~~~~~~~~~
+Attributes can be exported for kobjects in the form of regular files in
+the filesystem. Sysfs forwards file I/O operations to methods defined
+for the attributes, providing a means to read and write kernel
+attributes.
+Attributes should be ASCII text files, preferably with only one value
+per file. It is noted that it may not be efficient to contain only one
+value per file, so it is socially acceptable to express an array of
+values of the same type.
+Mixing types, expressing multiple lines of data, and doing fancy
+formatting of data is heavily frowned upon. Doing these things may get
+you publicly humiliated and your code rewritten without notice.
+An attribute definition is simply:
+struct attribute {
+        char                    * name;
+        mode_t                  mode;
+};
+int sysfs_create_file(struct kobject * kobj, struct attribute * attr);
+void sysfs_remove_file(struct kobject * kobj, struct attribute * attr);
+A bare attribute contains no means to read or write the value of the
+attribute. Subsystems are encouraged to define their own attribute
+structure and wrapper functions for adding and removing attributes for
+a specific object type. 
+For example, the driver model defines struct device_attribute like:
+struct device_attribute {
+        struct attribute        attr;
+        ssize_t (*show)(struct device * dev, char * buf);
+        ssize_t (*store)(struct device * dev, const char * buf);
+};
+int device_create_file(struct device *, struct device_attribute *);
+void device_remove_file(struct device *, struct device_attribute *);
+It also defines this helper for defining device attributes: 
+#define DEVICE_ATTR(_name,_mode,_show,_store)      \
+struct device_attribute dev_attr_##_name = {            \
+        .attr = {.name  = __stringify(_name) , .mode   = _mode },      \
+        .show   = _show,                                \
+        .store  = _store,                               \
+};
+For example, declaring
+static DEVICE_ATTR(foo,0644,show_foo,store_foo);
+is equivalent to doing:
+static struct device_attribute dev_attr_foo = {
+       .attr    = {
+                .name = "foo",
+                .mode = 0644,
+        },
+        .show = show_foo,
+        .store = store_foo,
+};
+Subsystem-Specific Callbacks
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+When a subsystem defines a new attribute type, it must implement a
+set of sysfs operations for forwarding read and write calls to the
+show and store methods of the attribute owners. 
+struct sysfs_ops {
+        ssize_t (*show)(struct kobject *, struct attribute *,char *);
+        ssize_t (*store)(struct kobject *,struct attribute *,const char *);
+};
+[ Subsystems should have already defined a struct kobj_type as a
+descriptor for this type, which is where the sysfs_ops pointer is
+stored. See the kobject documentation for more information. ]
+When a file is read or written, sysfs calls the appropriate method
+for the type. The method then translates the generic struct kobject
+and struct attribute pointers to the appropriate pointer types, and
+calls the associated methods. 
+To illustrate:
+#define to_dev_attr(_attr) container_of(_attr,struct device_attribute,attr)
+#define to_dev(d) container_of(d, struct device, kobj)
+static ssize_t
+dev_attr_show(struct kobject * kobj, struct attribute * attr, char * buf)
+{
+        struct device_attribute * dev_attr = to_dev_attr(attr);
+        struct device * dev = to_dev(kobj);
+        ssize_t ret = 0;
+        if (dev_attr->show)
+                ret = dev_attr->show(dev,buf);
+        return ret;
+}
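+The pointer translation above relies on container_of(). The following
+is a userspace sketch of the same idea, not kernel code: the demo_*
+names and the simplified macro are assumptions made for illustration.
```c
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of(). */
#define demo_container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical generic attribute, mirroring struct attribute. */
struct demo_attribute {
        const char *name;
        unsigned int mode;
};

/* Hypothetical typed attribute that embeds the generic one. */
struct demo_device_attribute {
        int id;
        struct demo_attribute attr;
};

/* Recover the typed attribute from a pointer to its embedded member. */
struct demo_device_attribute *demo_to_dev_attr(struct demo_attribute *attr)
{
        return demo_container_of(attr, struct demo_device_attribute, attr);
}
```
+Because the generic structure is embedded rather than pointed to, the
+translation is plain pointer arithmetic and needs no lookup table.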
+Reading/Writing Attribute Data
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+To read or write attributes, show() or store() methods must be
+specified when declaring the attribute. The method types should be as
+simple as those defined for device attributes:
+        ssize_t (*show)(struct device * dev, char * buf);
+        ssize_t (*store)(struct device * dev, const char * buf);
+IOW, they should take only an object and a buffer as parameters. 
+sysfs allocates a buffer of size (PAGE_SIZE) and passes it to the
+method. Sysfs will call the method exactly once for each read or
+write. This forces the following behavior on the method
+implementations: 
+- On read(2), the show() method should fill the entire buffer. 
+  Recall that an attribute should only be exporting one value, or an
+  array of similar values, so this shouldn't be that expensive. 
+  This allows userspace to do partial reads and seeks arbitrarily over
+  the entire file at will. 
+- On write(2), sysfs expects the entire buffer to be passed during the
+  first write. Sysfs then passes the entire buffer to the store()
+  method. 
+  
+  When writing sysfs files, userspace processes should first read the
+  entire file, modify the values it wishes to change, then write the
+  entire buffer back. 
+  Attribute method implementations should operate on an identical
+  buffer when reading and writing values. 
+Other notes:
+- The buffer will always be PAGE_SIZE bytes in length. On i386, this
+  is 4096. 
+- show() methods should return the number of bytes printed into the
+  buffer. This is the return value of snprintf().
+- show() should always use snprintf(). 
+- store() should return the number of bytes used from the buffer. This
+  can be done using strlen().
+- show() or store() can always return errors. If a bad value comes
+  through, be sure to return an error.
+- The object passed to the methods will be pinned in memory via sysfs
+  reference counting its embedded object. However, the physical
+  entity (e.g. device) the object represents may not be present. Be 
+  sure to have a way to check this, if necessary. 
+A very simple (and naive) implementation of a device attribute is:
+static ssize_t show_name(struct device * dev, char * buf)
+{
+        return snprintf(buf,PAGE_SIZE,"%s\n",dev->name);
+}
+static ssize_t store_name(struct device * dev, const char * buf)
+{
+        sscanf(buf,"%20s",dev->name);
+        return strlen(buf);
+}
+static DEVICE_ATTR(name,S_IRUGO,show_name,store_name);
+(Note that the real implementation doesn't allow userspace to set the 
+name for a device.)
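+The notes above can be sketched in userspace as follows. This is a
+mock-up, not kernel code: struct demo_device, DEMO_PAGE_SIZE, the
+function names and the accepted range 0..100 are all assumptions. It
+shows a show() that bounds its output with snprintf() and a store()
+that returns an error for a bad value.
```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096     /* stands in for PAGE_SIZE on i386 */

/* Hypothetical object with a single integer attribute. */
struct demo_device {
        int level;
};

/* show(): print one value into the buffer, return the byte count. */
long demo_show_level(struct demo_device *dev, char *buf)
{
        return snprintf(buf, DEMO_PAGE_SIZE, "%d\n", dev->level);
}

/* store(): parse one value, reject bad input, return bytes consumed. */
long demo_store_level(struct demo_device *dev, const char *buf)
{
        char *end;
        long val;

        errno = 0;
        val = strtol(buf, &end, 10);
        /* The 0..100 range is an arbitrary limit chosen for this sketch. */
        if (end == buf || errno != 0 || val < 0 || val > 100)
                return -EINVAL;
        if (*end == '\n')
                end++;
        if (*end != '\0')
                return -EINVAL;
        dev->level = (int)val;
        return (long)strlen(buf);
}
```
+Unlike the naive example, store() here leaves the object untouched
+when the input does not parse.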
+Top Level Directory Layout
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+The sysfs directory arrangement exposes the relationship of kernel
+data structures. 
+The top level sysfs directory looks like:
+block/
+bus/
+class/
+devices/
+firmware/
+net/
+devices/ contains a filesystem representation of the device tree. It maps
+directly to the internal kernel device tree, which is a hierarchy of
+struct device. 
+bus/ contains a flat directory layout of the various bus types in the
+kernel. Each bus's directory contains two subdirectories:
+        devices/
+        drivers/
+devices/ contains symlinks for each device discovered in the system
+that point to the device's directory under root/.
+drivers/ contains a directory for each device driver that is loaded
+for devices on that particular bus (this assumes that drivers do not
+span multiple bus types).
+More information on driver-model specific features can be found in
+Documentation/driver-model/. 
+TODO: Finish this section.
+Current Interfaces
+~~~~~~~~~~~~~~~~~~
+The following interface layers currently exist in sysfs:
+- devices (include/linux/device.h)
+----------------------------------
+Structure:
+struct device_attribute {
+        struct attribute        attr;
+        ssize_t (*show)(struct device * dev, char * buf);
+        ssize_t (*store)(struct device * dev, const char * buf);
+};
+Declaring:
+DEVICE_ATTR(_name,_mode,_show,_store);
+Creation/Removal:
+int device_create_file(struct device *device, struct device_attribute * attr);
+void device_remove_file(struct device * dev, struct device_attribute * attr);
+- bus drivers (include/linux/device.h)
+--------------------------------------
+Structure:
+struct bus_attribute {
+        struct attribute        attr;
+        ssize_t (*show)(struct bus_type *, char * buf);
+        ssize_t (*store)(struct bus_type *, const char * buf);
+};
+Declaring:
+BUS_ATTR(_name,_mode,_show,_store)
+Creation/Removal:
+int bus_create_file(struct bus_type *, struct bus_attribute *);
+void bus_remove_file(struct bus_type *, struct bus_attribute *);
+- device drivers (include/linux/device.h)
+-----------------------------------------
+Structure:
+struct driver_attribute {
+        struct attribute        attr;
+        ssize_t (*show)(struct device_driver *, char * buf);
+        ssize_t (*store)(struct device_driver *, const char * buf);
+};
+Declaring:
+DRIVER_ATTR(_name,_mode,_show,_store)
+Creation/Removal:
+int driver_create_file(struct device_driver *, struct driver_attribute *);
+void driver_remove_file(struct device_driver *, struct driver_attribute *);
diff --git a/Documentation/filesystems/sysv-fs.txt b/Documentation/filesystems/sysv-fs.txt
new file mode 100644
index 000000000000..d81722418010
--- /dev/null
+++ b/Documentation/filesystems/sysv-fs.txt
@@ -0,0 +1,38 @@
+This is the implementation of the SystemV/Coherent filesystem for Linux.
+It implements all of
+  - Xenix FS,
+  - SystemV/386 FS,
+  - Coherent FS.
+This is version beta 4.
+To install:
+* Answer the 'System V and Coherent filesystem support' question with 'y'
+  when configuring the kernel.
+* To mount a disk or a partition, use
+    mount [-r] -t sysv device mountpoint
+  The file system type names
+               -t sysv
+               -t xenix
+               -t coherent
+  may be used interchangeably, but the last two will eventually disappear.
+Bugs in the present implementation:
+- Coherent FS:
+  - The "free list interleave" n:m is currently ignored.
+  - Only file systems with no filesystem name and no pack name are recognized.
+  (See Coherent "man mkfs" for a description of these features.)
+- SystemV Release 2 FS:
+  The superblock is only searched for in blocks 9, 15 and 18, which
+  correspond to the beginning of track 1 on floppy disks. No support
+  for this FS on hard disk yet.
+Please report any bugs and suggestions to
+  Bruno Haible <haible@ma2s2.mathematik.uni-karlsruhe.de>
+  Pascal Haible <haible@izfm.uni-stuttgart.de>
+  Krzysztof G. Baranowski <kgb@manjak.knm.org.pl>
+Bruno Haible
+<haible@ma2s2.mathematik.uni-karlsruhe.de>
diff --git a/Documentation/filesystems/tmpfs.txt b/Documentation/filesystems/tmpfs.txt
new file mode 100644
index 000000000000..417e3095fe39
--- /dev/null
+++ b/Documentation/filesystems/tmpfs.txt
@@ -0,0 +1,100 @@
+Tmpfs is a file system which keeps all files in virtual memory.
+Everything in tmpfs is temporary in the sense that no files will be
+created on your hard drive. If you unmount a tmpfs instance,
+everything stored therein is lost.
+tmpfs puts everything into the kernel internal caches and grows and
+shrinks to accommodate the files it contains and is able to swap
+unneeded pages out to swap space. It has maximum size limits which can
+be adjusted on the fly via 'mount -o remount ...'
+If you compare it to ramfs (which was the template to create tmpfs)
+you gain swapping and limit checking. Another similar thing is the RAM
+disk (/dev/ram*), which simulates a fixed size hard disk in physical
+RAM, where you have to create an ordinary filesystem on top. Ramdisks
+cannot swap and you do not have the possibility to resize them. 
+Since tmpfs lives completely in the page cache and on swap, all tmpfs
+pages currently in memory will show up as cached. It will not show up
+as shared or something like that. Further on you can check the actual
+RAM+swap use of a tmpfs instance with df(1) and du(1).
+tmpfs has the following uses:
+1) There is always a kernel internal mount which you will not see at
+   all. This is used for shared anonymous mappings and SYSV shared
+   memory. 
+   This mount does not depend on CONFIG_TMPFS. If CONFIG_TMPFS is not
+   set, the user visible part of tmpfs is not built. But the internal
+   mechanisms are always present.
+2) glibc 2.2 and above expects tmpfs to be mounted at /dev/shm for
+   POSIX shared memory (shm_open, shm_unlink). Adding the following
+   line to /etc/fstab should take care of this:
+        tmpfs   /dev/shm        tmpfs   defaults        0 0
+   Remember to create the directory that you intend to mount tmpfs on
+   if necessary (/dev/shm is automagically created if you use devfs).
+   This mount is _not_ needed for SYSV shared memory. The internal
+   mount is used for that. (In the 2.3 kernel versions it was
+   necessary to mount the predecessor of tmpfs (shm fs) to use SYSV
+   shared memory)
+3) Some people (including me) find it very convenient to mount it
+   e.g. on /tmp and /var/tmp and have a big swap partition. And now
+   loop mounts of tmpfs files do work, so mkinitrd shipped by most
+   distributions should succeed with a tmpfs /tmp.
+4) And probably a lot more I do not know about :-)
+tmpfs has three mount options for sizing:
+size:      The limit of allocated bytes for this tmpfs instance. The 
+           default is half of your physical RAM without swap. If you
+           oversize your tmpfs instances the machine will deadlock
+           since the OOM handler will not be able to free that memory.
+nr_blocks: The same as size, but in blocks of PAGE_CACHE_SIZE.
+nr_inodes: The maximum number of inodes for this instance. The default
+           is half of the number of your physical RAM pages, or (on a
+           machine with highmem) the number of lowmem RAM pages,
+           whichever is the lower.
+These parameters accept a suffix k, m or g for kilo, mega and giga and
+can be changed on remount.  The size parameter also accepts a suffix %
+to limit this tmpfs instance to that percentage of your physical RAM:
+the default, when neither size nor nr_blocks is specified, is size=50%.
+If both nr_blocks (or size) and nr_inodes are set to 0, neither blocks
+nor inodes will be limited in that instance.  It is generally unwise to
+mount with such options, since it allows any user with write access to
+use up all the memory on the machine; but enhances the scalability of
+that instance in a system with many cpus making intensive use of it.
+To specify the initial root directory you can use the following mount
+options:
+mode:   The permissions as an octal number
+uid:    The user id 
+gid:    The group id
+These options do not have any effect on remount. You can change these
+parameters with chmod(1), chown(1) and chgrp(1) on a mounted filesystem.
+So 'mount -t tmpfs -o size=10G,nr_inodes=10k,mode=700 tmpfs /mytmpfs'
+will give you a tmpfs instance on /mytmpfs which can allocate 10GB
+RAM/SWAP in 10240 inodes and it is only accessible by root.
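+To make the suffix arithmetic concrete, here is a userspace sketch of
+how a value such as 10k or 10G expands. demo_parse_size() is a
+hypothetical helper, not the kernel's parser; it ignores overflow and
+does not handle the % form accepted by the size option.
```c
#include <stdlib.h>

/*
 * Hypothetical helper: parse a number with an optional k, m or g
 * suffix (powers of 1024). Returns 0 on success, -1 on malformed input.
 */
int demo_parse_size(const char *s, unsigned long long *out)
{
        char *end;
        unsigned long long val = strtoull(s, &end, 10);

        if (end == s)
                return -1;
        switch (*end) {
        case 'g': case 'G':
                val <<= 10;
                /* fall through */
        case 'm': case 'M':
                val <<= 10;
                /* fall through */
        case 'k': case 'K':
                val <<= 10;
                end++;
                break;
        default:
                break;
        }
        if (*end != '\0')
                return -1;
        *out = val;
        return 0;
}
```
+With this helper, nr_inodes=10k expands to 10240, matching the example
+above.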
+Author:
+   Christoph Rohland <cr@sap.com>, 1.12.01
+Updated:
+   Hugh Dickins <hugh@veritas.com>, 01 September 2004
diff --git a/Documentation/filesystems/udf.txt b/Documentation/filesystems/udf.txt
new file mode 100644
index 000000000000..e5213bc301f7
--- /dev/null
+++ b/Documentation/filesystems/udf.txt
@@ -0,0 +1,57 @@
+*
+* Documentation/filesystems/udf.txt
+*
+UDF Filesystem version 0.9.8.1
+If you encounter problems with reading UDF discs using this driver,
+please report them to linux_udf@hpesjro.fc.hp.com, which is the
+developer's list.
+Write support requires a block driver which supports writing. The current
+scsi and ide cdrom drivers do not support writing.
+-------------------------------------------------------------------------------
+The following mount options are supported:
+        gid=            Set the default group.
+        umask=          Set the default umask.
+        uid=            Set the default user.
+        bs=             Set the block size.
+        unhide          Show otherwise hidden files.
+        undelete        Show deleted files in lists.
+        adinicb         Embed data in the inode (default)
+        noadinicb       Don't embed data in the inode
+        shortad         Use short ad's
+        longad          Use long ad's (default)
+        nostrict        Unset strict conformance
+        iocharset=      Set the NLS character set
+The remaining are for debugging and disaster recovery:
+        novrs           Skip volume sequence recognition 
+The following expect an offset from 0.
+        session=        Set the CDROM session (default= last session)
+        anchor=         Override standard anchor location. (default= 256)
+        volume=         Override the VolumeDesc location. (unused)
+        partition=      Override the PartitionDesc location. (unused)
+        lastblock=      Set the last block of the filesystem.
+The following expect an offset from the partition root.
+        fileset=        Override the fileset block location. (unused)
+        rootdir=        Override the root directory location. (unused)
+                        WARNING: overriding the rootdir to a non-directory may
+                                yield highly unpredictable results.
+-------------------------------------------------------------------------------
+For the latest version and toolset see:
+        http://linux-udf.sourceforge.net/
+Documentation on UDF and ECMA 167 is available FREE from:
+        http://www.osta.org/
+        http://www.ecma-international.org/
+Ben Fennema <bfennema@falcon.csc.calpoly.edu>
diff --git a/Documentation/filesystems/ufs.txt b/Documentation/filesystems/ufs.txt
new file mode 100644
index 000000000000..2b5a56a6a558
--- /dev/null
+++ b/Documentation/filesystems/ufs.txt
@@ -0,0 +1,61 @@
+USING UFS
+=========
+mount -t ufs -o ufstype=type_of_ufs device dir
+UFS OPTIONS
+===========
+ufstype=type_of_ufs
+        UFS is a file system widely used in different operating systems.
+        The problem is the differences among implementations. Features
+        of some implementations are undocumented, so it is hard to
+        recognize the type of ufs automatically. That is why the user
+        must specify the type of ufs manually with the mount option
+        ufstype. Possible values are:
+        old     old format of ufs
+                default value, supported as read-only
+        44bsd   used in FreeBSD, NetBSD, OpenBSD
+                supported as read-write
+        ufs2    used in FreeBSD 5.x
+                supported as read-only
+        5xbsd   synonym for ufs2
+        sun     used in SunOS (Solaris)
+                supported as read-write
+        sunx86  used in SunOS for Intel (Solarisx86)
+                supported as read-write
+        hp      used in HP-UX
+                supported as read-only
+        nextstep
+                used in NextStep
+                supported as read-only
+        nextstep-cd
+                used for NextStep CDROMs (block_size == 2048)
+                supported as read-only
+        openstep
+                used in OpenStep
+                supported as read-only
+POSSIBLE PROBLEMS
+=================
+There is still a bug in the reallocation of fragments, in file
+fs/ufs/balloc.c, line 364. But it seems to work with the current buffer
+cache configuration.
+BUG REPORTS
+===========
+Send any ufs bug reports to daniel.pirkl@email.cz (do not send
+partition table bug reports).
diff --git a/Documentation/filesystems/vfat.txt b/Documentation/filesystems/vfat.txt
new file mode 100644
index 000000000000..5ead20c6c744
--- /dev/null
+++ b/Documentation/filesystems/vfat.txt
@@ -0,0 +1,231 @@
+USING VFAT
+----------------------------------------------------------------------
+To use the vfat filesystem, use the filesystem type 'vfat'.  i.e.
+  mount -t vfat /dev/fd0 /mnt
+No special partition formatter is required.  mkdosfs will work fine
+if you want to format from within Linux.
+VFAT MOUNT OPTIONS
+----------------------------------------------------------------------
+umask=###     -- The permission mask (for files and directories, see umask(1)).
+                 The default is the umask of current process.
+dmask=###     -- The permission mask for the directory.
+                 The default is the umask of current process.
+fmask=###     -- The permission mask for files.
+                 The default is the umask of current process.
+codepage=###  -- Sets the codepage number for converting to shortname
+                 characters on FAT filesystem.
+                 By default, FAT_DEFAULT_CODEPAGE setting is used.
+iocharset=name -- Character set to use for converting between the
+                 encoding used for user visible filenames and 16 bit
+                 Unicode characters. Long filenames are stored on disk
+                 in Unicode format, but Unix for the most part doesn't
+                 know how to deal with Unicode.
+                 By default, FAT_DEFAULT_IOCHARSET setting is used.
+                 There is also an option of doing UTF8 translations
+                 with the utf8 option.
+                 NOTE: "iocharset=utf8" is not recommended. If unsure,
+                 you should consider the following option instead.
+utf8=<bool>   -- UTF8 is the filesystem safe version of Unicode that
+                 is used by the console.  It can be enabled for the
+                 filesystem with this option. If 'uni_xlate' gets set,
+                 UTF8 gets disabled.
+uni_xlate=<bool> -- Translate unhandled Unicode characters to special
+                 escaped sequences.  This would let you backup and
+                 restore filenames that are created with any Unicode
+                 characters.  Until Linux supports Unicode for real,
+                 this gives you an alternative.  Without this option,
+                 a '?' is used when no translation is possible.  The
+                 escape character is ':' because it is otherwise
+                 illegal on the vfat filesystem.  The escape sequence
+                 that gets used is ':' and the four digits of hexadecimal
+                 unicode.
+nonumtail=<bool> -- When creating 8.3 aliases, normally the alias will
+                 end in '~1' or tilde followed by some number.  If this
+                 option is set, then if the filename is 
+                 "longfilename.txt" and "longfile.txt" does not
+                 currently exist in the directory, 'longfile.txt' will
+                 be the short alias instead of 'longfi~1.txt'. 
+                  
+quiet         -- Stops printing certain warning messages.
+check=s|r|n   -- Case sensitivity checking setting.
+                 s: strict, case sensitive
+                 r: relaxed, case insensitive
+                 n: normal, default setting, currently case insensitive
+shortname=lower|win95|winnt|mixed
+              -- Shortname display/create setting.
+                 lower: convert to lowercase for display,
+                        emulate the Windows 95 rule for create.
+                 win95: emulate the Windows 95 rule for display/create.
+                 winnt: emulate the Windows NT rule for display/create.
+                 mixed: emulate the Windows NT rule for display,
+                        emulate the Windows 95 rule for create.
+                 Default setting is `lower'.
+<bool>: 0,1,yes,no,true,false
+TODO
+----------------------------------------------------------------------
+* Need to get rid of the raw scanning stuff.  Instead, always use
+  a get next directory entry approach.  The only thing left that uses
+  raw scanning is the directory renaming code.
+POSSIBLE PROBLEMS
+----------------------------------------------------------------------
+* vfat_valid_longname does not properly check reserved names.
+* When a volume name is the same as a directory name in the root
+  directory of the filesystem, the directory name sometimes shows
+  up as an empty file.
+* autoconv option does not work correctly.
+BUG REPORTS
+----------------------------------------------------------------------
+If you have trouble with the VFAT filesystem, mail bug reports to
+chaffee@bmrc.cs.berkeley.edu.  Please specify the filename
+and the operation that gave you trouble.
+TEST SUITE
+----------------------------------------------------------------------
+If you plan to make any modifications to the vfat filesystem, please
+get the test suite that comes with the vfat distribution at
+  http://bmrc.berkeley.edu/people/chaffee/vfat.html
+This tests quite a few parts of the vfat filesystem and additional
+tests for new features or untested features would be appreciated.
+NOTES ON THE STRUCTURE OF THE VFAT FILESYSTEM
+----------------------------------------------------------------------
+(This documentation was provided by Galen C. Hunt <gchunt@cs.rochester.edu>
+ and lightly annotated by Gordon Chaffee).
+This document presents a very rough, technical overview of my
+knowledge of the extended FAT file system used in Windows NT 3.5 and
+Windows 95.  I don't guarantee that any of the following is correct,
+but it appears to be so.
+The extended FAT file system is almost identical to the FAT
+file system used in DOS versions up to and including 6.223410239847
+:-).  The significant change has been the addition of long file names.
+These names support up to 255 characters including spaces and lower
+case characters as opposed to the traditional 8.3 short names.
+Here is the description of the traditional FAT entry in the current
+Windows 95 filesystem:
+        struct directory { // Short 8.3 names 
+                unsigned char name[8];          // file name 
+                unsigned char ext[3];           // file extension 
+                unsigned char attr;             // attribute byte 
+                unsigned char lcase;            // Case for base and extension
+                unsigned char ctime_ms;         // Creation time, milliseconds
+                unsigned char ctime[2];         // Creation time
+                unsigned char cdate[2];         // Creation date
+                unsigned char adate[2];         // Last access date
+                unsigned char reserved[2];      // reserved values (ignored) 
+                unsigned char time[2];          // time stamp 
+                unsigned char date[2];          // date stamp 
+                unsigned char start[2];         // starting cluster number 
+                unsigned char size[4];          // size of the file 
+        };
+The lcase field specifies if the base and/or the extension of an 8.3
+name should be capitalized.  This field does not seem to be used by
+Windows 95 but it is used by Windows NT.  The case of filenames is not
+completely compatible from Windows NT to Windows 95.  It is completely
+compatible in the reverse direction, however.  Filenames that fit in
+the 8.3 namespace and are written on Windows NT to be lowercase will
+show up as uppercase on Windows 95.
+Note that the "start" and "size" values are actually little
+endian integer values.  The descriptions of the fields in this
+structure are public knowledge and can be found elsewhere.
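+Since the fields are declared as byte arrays, they have to be decoded
+explicitly. A minimal sketch of reading the little endian "start" and
+"size" fields follows; the demo_le16() and demo_le32() helper names
+are hypothetical.
```c
/* Decode a 2-byte little-endian field such as "start". */
unsigned int demo_le16(const unsigned char b[2])
{
        return (unsigned int)b[0] | ((unsigned int)b[1] << 8);
}

/* Decode a 4-byte little-endian field such as "size". */
unsigned long demo_le32(const unsigned char b[4])
{
        return (unsigned long)b[0] |
               ((unsigned long)b[1] << 8) |
               ((unsigned long)b[2] << 16) |
               ((unsigned long)b[3] << 24);
}
```
+Decoding byte by byte gives the same result on big and little endian
+hosts, which a plain cast of the array would not.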
+With the extended FAT system, Microsoft has inserted extra
+directory entries for any files with extended names.  (Any name which
+legally fits within the old 8.3 encoding scheme does not have extra
+entries.)  I call these extra entries slots.  Basically, a slot is a
+specially formatted directory entry which holds up to 13 characters of
+a file's extended name.  Think of slots as additional labeling for the
+directory entry of the file to which they correspond.  Microsoft
+prefers to refer to the 8.3 entry for a file as its alias and the
+extended slot directory entries as the file name. 
+The C structure for a slot directory entry follows:
+        struct slot { // Up to 13 characters of a long name 
+                unsigned char id;               // sequence number for slot 
+                unsigned char name0_4[10];      // first 5 characters in name 
+                unsigned char attr;             // attribute byte
+                unsigned char reserved;         // always 0 
+                unsigned char alias_checksum;   // checksum for 8.3 alias 
+                unsigned char name5_10[12];     // 6 more characters in name
+                unsigned char start[2];         // starting cluster number
+                unsigned char name11_12[4];     // last 2 characters in name
+        };
+If the layout of the slots looks a little odd, it's only
+because of Microsoft's efforts to maintain compatibility with old
+software.  The slots must be disguised to prevent old software from
+panicking.  To this end, a number of measures are taken:
+        1) The attribute byte for a slot directory entry is always set
+           to 0x0f.  This corresponds to an old directory entry with
+           attributes of "hidden", "system", "read-only", and "volume
+           label".  Most old software will ignore any directory
+           entries with the "volume label" bit set.  Real volume label
+           entries don't have the other three bits set.
+        2) The starting cluster is always set to 0, an impossible
+           value for a DOS file.
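+The two measures above also give a simple test for telling a slot from
+an ordinary entry. The sketch below works on a raw 32-byte directory
+entry; the DEMO_* names and demo_is_slot() are hypothetical, and the
+byte offsets are derived from the two structures shown earlier.
```c
#define DEMO_ATTR_RO      0x01  /* read-only */
#define DEMO_ATTR_HIDDEN  0x02  /* hidden */
#define DEMO_ATTR_SYS     0x04  /* system */
#define DEMO_ATTR_VOLUME  0x08  /* volume label */
#define DEMO_ATTR_SLOT    (DEMO_ATTR_RO | DEMO_ATTR_HIDDEN | \
                           DEMO_ATTR_SYS | DEMO_ATTR_VOLUME)   /* 0x0f */

/* Return 1 if a raw 32-byte directory entry looks like a slot. */
int demo_is_slot(const unsigned char entry[32])
{
        /* attr is at byte offset 11; start occupies bytes 26 and 27. */
        return entry[11] == DEMO_ATTR_SLOT &&
               entry[26] == 0 && entry[27] == 0;
}
```
+A real volume label has only the volume label bit set, so it fails the
+attribute comparison and is not mistaken for a slot.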
+Because the extended FAT system is backward compatible, it is
+possible for old software to modify directory entries.  Measures must
+be taken to ensure the validity of slots.  An extended FAT system can
+verify that a slot does in fact belong to an 8.3 directory entry by
+the following:
+        1) Positioning.  Slots for a file always immediately precede
+           their corresponding 8.3 directory entry.  In addition, each
+           slot has an id which marks its order in the extended file
+           name.  Here is a very abbreviated view of an 8.3 directory
+           entry and its corresponding long name slots for the file
+           "My Big File.Extension which is long":
+                <preceding files...>
+                <slot #3, id = 0x43, characters = "h is long">
+                <slot #2, id = 0x02, characters = "xtension whic">
+                <slot #1, id = 0x01, characters = "My Big File.E">
+                <directory entry, name = "MYBIGFIL.EXT">
+           Note that the slots are stored from last to first.  Slots
+           are numbered from 1 to N.  The Nth slot is or'ed with 0x40
+           to mark it as the last one.
+        2) Checksum.  Each slot has an "alias_checksum" value.  The
+           checksum is calculated from the 8.3 name using the
+           following algorithm:
+                for (sum = i = 0; i < 11; i++) {
+                        sum = (((sum&1)<<7)|((sum&0xfe)>>1)) + name[i];
+                }
+        3) If there is free space in the final slot, a Unicode NULL (0x0000) 
+           is stored after the final character.  After that, all unused 
+           characters in the final slot are set to Unicode 0xFFFF.
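+The checksum loop in 2) can be written as a complete function. This is
+a sketch that assumes name[] covers the 8 name bytes followed by the 3
+extension bytes of the directory entry, and that the sum is kept in 8
+bits; demo_alias_checksum() is a hypothetical name.
```c
/*
 * Checksum of the 11-byte 8.3 name: rotate the running sum right by
 * one bit, then add the next byte, keeping only the low 8 bits.
 */
unsigned char demo_alias_checksum(const unsigned char name[11])
{
        unsigned char sum = 0;
        int i;

        for (i = 0; i < 11; i++)
                sum = (unsigned char)((((sum & 1) << 7) |
                                       ((sum & 0xfe) >> 1)) + name[i]);
        return sum;
}
```
+Every slot of a long name carries the same value, so a slot whose
+alias_checksum does not match the 8.3 entry that follows it can be
+treated as stale.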
+Finally, note that the extended name is stored in Unicode.  Each Unicode
+character takes two bytes.
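+With 13 characters per slot (10 + 12 + 4 bytes at two bytes each), the
+number of slots and their id bytes follow directly from the length of
+the long name. A small sketch, using hypothetical demo_* names:
```c
#define DEMO_CHARS_PER_SLOT 13
#define DEMO_LAST_SLOT      0x40

/* Number of slots needed for a long name of len characters. */
int demo_slot_count(int len)
{
        return (len + DEMO_CHARS_PER_SLOT - 1) / DEMO_CHARS_PER_SLOT;
}

/* id byte of slot number n (1-based) out of total slots. */
unsigned char demo_slot_id(int n, int total)
{
        unsigned char id = (unsigned char)n;

        if (n == total)
                id |= DEMO_LAST_SLOT;   /* mark the last slot */
        return id;
}
```
+For the 35 character name "My Big File.Extension which is long" this
+gives three slots with ids 0x01, 0x02 and 0x43, as in the listing
+above.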
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
new file mode 100644
index 000000000000..3f318dd44c77
--- /dev/null
+++ b/Documentation/filesystems/vfs.txt
@@ -0,0 +1,671 @@
+/* -*- auto-fill -*-                                                         */
+                Overview of the Virtual File System
+                Richard Gooch <rgooch@atnf.csiro.au>
+                              5-JUL-1999
+Conventions used in this document                                     <section>
+=================================
+Each section in this document will have the string "<section>" at the
+right-hand side of the section title. Each subsection will have
+"<subsection>" at the right-hand side. These strings are meant to make
+it easier to search through the document.
+NOTE that the master copy of this document is available online at:
+http://www.atnf.csiro.au/~rgooch/linux/docs/vfs.txt
+What is it?                                                           <section>
+===========
+The Virtual File System (otherwise known as the Virtual Filesystem
+Switch) is the software layer in the kernel that provides the
+filesystem interface to userspace programs. It also provides an
+abstraction within the kernel which allows different filesystem
+implementations to co-exist.
+A Quick Look At How It Works                                          <section>
+============================
+In this section I'll briefly describe how things work, before
+launching into the details. I'll start with describing what happens
+when user programs open and manipulate files, and then look from the
+other view which is how a filesystem is supported and subsequently
+mounted.
+Opening a File                                                     <subsection>
+--------------
+The VFS implements the open(2), stat(2), chmod(2) and similar system
+calls. The pathname argument is used by the VFS to search through the
+directory entry cache (dentry cache or "dcache"). This provides a very
+fast look-up mechanism to translate a pathname (filename) into a
+specific dentry.
+An individual dentry usually has a pointer to an inode. Inodes are the
+things that live on disc drives, and can be regular files (you know:
+those things that you write data into), directories, FIFOs and other
+beasts. Dentries live in RAM and are never saved to disc: they exist
+only for performance. Inodes live on disc and are copied into memory
+when required. Later any changes are written back to disc. The inode
+that lives in RAM is a VFS inode, and it is this which the dentry
+points to. A single inode can be pointed to by multiple dentries
+(think about hardlinks).
+The dcache is meant to be a view into your entire filespace. Unlike
+Linus, most of us losers can't fit enough dentries into RAM to cover
+all of our filespace, so the dcache has bits missing. In order to
+resolve your pathname into a dentry, the VFS may have to resort to
+creating dentries along the way, and then loading the inode. This is
+done by looking up the inode.
+To look up an inode (usually read from disc) requires that the VFS
+calls the lookup() method of the parent directory inode. This method
+is installed by the specific filesystem implementation that the inode
+lives in. There will be more on this later.
+Once the VFS has the required dentry (and hence the inode), we can do
+all those boring things like open(2) the file, or stat(2) it to peek
+at the inode data. The stat(2) operation is fairly simple: once the
+VFS has the dentry, it peeks at the inode data and passes some of it
+back to userspace.
+Opening a file requires another operation: allocation of a file
+structure (this is the kernel-side implementation of file
+descriptors). The freshly allocated file structure is initialised with
+a pointer to the dentry and a set of file operation member functions.
+These are taken from the inode data. The open() file method is then
+called so the specific filesystem implementation can do its work. You
+can see that this is another switch performed by the VFS.
+The file structure is placed into the file descriptor table for the
+process.
+Reading, writing and closing files (and other assorted VFS operations)
+is done by using the userspace file descriptor to grab the appropriate
+file structure, and then calling the required file structure method
+function to do whatever is required.
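+
+The descriptor-to-method dispatch described above can be sketched in
+userspace C. This is only a toy model: the toy_* names, the fixed-size
+descriptor table and the in-memory "file" are assumptions made for
+illustration and are not kernel interfaces.
+
+```c
+#include <stddef.h>
+#include <string.h>
+
+/* Toy userspace model; none of these names are kernel APIs. */
+struct toy_file;
+
+struct toy_file_operations {
+        long (*read)(struct toy_file *f, char *buf, unsigned long len);
+};
+
+struct toy_file {
+        const struct toy_file_operations *f_op; /* installed at open time */
+        const char *data;                       /* stands in for file data */
+        unsigned long pos;                      /* file position index */
+};
+
+#define TOY_MAX_FDS 4
+static struct toy_file *toy_fd_table[TOY_MAX_FDS];
+
+/* One filesystem's read method: copy from the in-memory data. */
+long toy_mem_read(struct toy_file *f, char *buf, unsigned long len)
+{
+        unsigned long left = strlen(f->data) - f->pos;
+
+        if (len > left)
+                len = left;
+        memcpy(buf, f->data + f->pos, len);
+        f->pos += len;
+        return (long)len;
+}
+
+const struct toy_file_operations toy_mem_fops = { .read = toy_mem_read };
+
+/* "open": put the file structure in the first free slot, return the fd. */
+int toy_install_fd(struct toy_file *f)
+{
+        int fd;
+
+        for (fd = 0; fd < TOY_MAX_FDS; fd++) {
+                if (toy_fd_table[fd] == NULL) {
+                        toy_fd_table[fd] = f;
+                        return fd;
+                }
+        }
+        return -1;
+}
+
+/* "read": descriptor -> file structure -> method (the switch). */
+long toy_read(int fd, char *buf, unsigned long len)
+{
+        struct toy_file *f;
+
+        if (fd < 0 || fd >= TOY_MAX_FDS || toy_fd_table[fd] == NULL)
+                return -1;
+        f = toy_fd_table[fd];
+        if (f->f_op == NULL || f->f_op->read == NULL)
+                return -1;
+        return f->f_op->read(f, buf, len);
+}
+```
+
+The caller of toy_read() never names toy_mem_read() directly; which
+method runs depends only on the operations table stored in the file
+structure when it was set up.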
+For as long as the file is open, it keeps the dentry "open" (in use),
+which in turn means that the VFS inode is still in use.
+All VFS system calls (e.g. open(2), stat(2), read(2), write(2),
+chmod(2) and so on) are called from a process context. You should
+assume that these calls are made without any kernel locks being
+held. This means that several processes may be executing the same
+piece of filesystem or driver code at the same time, on different
+processors. You should ensure that access to shared resources is
+protected by appropriate locks.
+Registering and Mounting a Filesystem                              <subsection>
+-------------------------------------
+If you want to support a new kind of filesystem in the kernel, all you
+need to do is call register_filesystem(). You pass a structure
+describing the filesystem implementation (struct file_system_type)
+which is then added to an internal table of supported filesystems. You
+can do:
+% cat /proc/filesystems
+to see what filesystems are currently available on your system.
+When a request is made to mount a block device onto a directory in
+your filespace the VFS will call the appropriate method for the
+specific filesystem. The dentry for the mount point will then be
+updated to point to the root inode for the new filesystem.
+It's now time to look at things in more detail.
+struct file_system_type                                               <section>
+=======================
+This describes the filesystem. As of kernel 2.1.99, the following
+members are defined:
+struct file_system_type {
+        const char *name;
+        int fs_flags;
+        struct super_block *(*read_super) (struct super_block *, void *, int);
+        struct file_system_type * next;
+};
+  name: the name of the filesystem type, such as "ext2", "iso9660",
+        "msdos" and so on
+  fs_flags: various flags (e.g. FS_REQUIRES_DEV, FS_NO_DCACHE)
+  read_super: the method to call when a new instance of this
+        filesystem should be mounted
+  next: for internal VFS use: you should initialise this to NULL
+The read_super() method has the following arguments:
+  struct super_block *sb: the superblock structure. This is partially
+        initialised by the VFS and the rest must be initialised by the
+        read_super() method
+  void *data: arbitrary mount options, usually comes as an ASCII
+        string
+  int silent: whether or not to be silent on error
+The read_super() method must determine if the block device specified
+in the superblock contains a filesystem of the type the method
+supports. On success the method returns the superblock pointer, on
+failure it returns NULL.
+The most interesting member of the superblock structure that the
+read_super() method fills in is the "s_op" field. This is a pointer to
+a "struct super_operations" which describes the next level of the
+filesystem implementation.
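+
+The registration table and the read_super() contract described above
+can be modelled in userspace C. This is a sketch under stated
+assumptions: the toy_* names, the "magic string" probe and the error
+values are invented for illustration and do not reproduce the kernel's
+register_filesystem() or mount code.
+
+```c
+#include <stddef.h>
+#include <string.h>
+
+/* Toy userspace model; the toy_* names are not kernel APIs. */
+struct toy_super_block {
+        const char *dev_magic;  /* what the "device" contains */
+        const void *s_op;       /* filled in by read_super() on success */
+};
+
+struct toy_fs_type {
+        const char *name;
+        struct toy_super_block *(*read_super)(struct toy_super_block *,
+                                              void *, int);
+        struct toy_fs_type *next;   /* initialise to NULL */
+};
+
+static struct toy_fs_type *toy_fs_list;
+
+/* Append to the table; refuse duplicates and already-linked entries. */
+int toy_register_filesystem(struct toy_fs_type *fs)
+{
+        struct toy_fs_type **p;
+
+        if (fs == NULL || fs->next != NULL)
+                return -1;
+        for (p = &toy_fs_list; *p != NULL; p = &(*p)->next)
+                if (strcmp((*p)->name, fs->name) == 0)
+                        return -1;
+        *p = fs;
+        return 0;
+}
+
+/* Find the type by name and let it probe the device; NULL on failure. */
+struct toy_super_block *toy_mount(const char *type,
+                                  struct toy_super_block *sb, void *data)
+{
+        struct toy_fs_type *fs;
+
+        for (fs = toy_fs_list; fs != NULL; fs = fs->next)
+                if (strcmp(fs->name, type) == 0)
+                        return fs->read_super(sb, data, 0);
+        return NULL;
+}
+
+static const int toyfs_sops;    /* stand-in for a super_operations table */
+
+/* Example read_super(): accept the device only if it carries our magic. */
+struct toy_super_block *toyfs_read_super(struct toy_super_block *sb,
+                                         void *data, int silent)
+{
+        (void)data;
+        (void)silent;
+        if (strcmp(sb->dev_magic, "TOYFS") != 0)
+                return NULL;
+        sb->s_op = &toyfs_sops;
+        return sb;
+}
+```
+
+On success the same superblock pointer comes back with "s_op" set; a
+device that does not hold the expected filesystem yields NULL.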
+struct super_operations                                               <section>
+=======================
+This describes how the VFS can manipulate the superblock of your
+filesystem. As of kernel 2.1.99, the following members are defined:
+struct super_operations {
+        void (*read_inode) (struct inode *);
+        int (*write_inode) (struct inode *, int);
+        void (*put_inode) (struct inode *);
+        void (*drop_inode) (struct inode *);
+        void (*delete_inode) (struct inode *);
+        int (*notify_change) (struct dentry *, struct iattr *);
+        void (*put_super) (struct super_block *);
+        void (*write_super) (struct super_block *);
+        int (*statfs) (struct super_block *, struct statfs *, int);
+        int (*remount_fs) (struct super_block *, int *, char *);
+        void (*clear_inode) (struct inode *);
+};
+All methods are called without any locks being held, unless otherwise
+noted. This means that most methods can block safely. All methods are
+only called from a process context (i.e. not from an interrupt handler
+or bottom half).
+  read_inode: this method is called to read a specific inode from the
+        mounted filesystem. The "i_ino" member in the "struct inode"
+        will be initialised by the VFS to indicate which inode to
+        read. Other members are filled in by this method
+  write_inode: this method is called when the VFS needs to write an
+        inode to disc. The second parameter indicates whether the write
+        should be synchronous or not; not all filesystems check this flag.
+  put_inode: called when the VFS inode is removed from the inode
+        cache. This method is optional
+  drop_inode: called when the last access to the inode is dropped,
+        with the inode_lock spinlock held.
+        This method should be either NULL (normal unix filesystem
+        semantics) or "generic_delete_inode" (for filesystems that do not
+        want to cache inodes - causing "delete_inode" to always be
+        called regardless of the value of i_nlink)
+        The "generic_delete_inode()" behaviour is equivalent to the
+        old practice of using "force_delete" in the put_inode() case,
+        but does not have the races that the "force_delete()" approach
+        had.
+  delete_inode: called when the VFS wants to delete an inode
+  notify_change: called when VFS inode attributes are changed. If this
+        is NULL the VFS falls back to the write_inode() method. This
+        is called with the kernel lock held
+  put_super: called when the VFS wishes to free the superblock
+        (i.e. unmount). This is called with the superblock lock held
+  write_super: called when the VFS superblock needs to be written to
+        disc. This method is optional
+  statfs: called when the VFS needs to get filesystem statistics. This
+        is called with the kernel lock held
+  remount_fs: called when the filesystem is remounted. This is called
+        with the kernel lock held
+  clear_inode: called when the VFS clears the inode. Optional
+The read_inode() method is responsible for filling in the "i_op"
+field. This is a pointer to a "struct inode_operations" which
+describes the methods that can be performed on individual inodes.
+struct inode_operations                                               <section>
+=======================
+This describes how the VFS can manipulate an inode in your
+filesystem. As of kernel 2.1.99, the following members are defined:
+struct inode_operations {
+        struct file_operations * default_file_ops;
+        int (*create) (struct inode *,struct dentry *,int);
+        int (*lookup) (struct inode *,struct dentry *);
+        int (*link) (struct dentry *,struct inode *,struct dentry *);
+        int (*unlink) (struct inode *,struct dentry *);
+        int (*symlink) (struct inode *,struct dentry *,const char *);
+        int (*mkdir) (struct inode *,struct dentry *,int);
+        int (*rmdir) (struct inode *,struct dentry *);
+        int (*mknod) (struct inode *,struct dentry *,int,dev_t);
+        int (*rename) (struct inode *, struct dentry *,
+                        struct inode *, struct dentry *);
+        int (*readlink) (struct dentry *, char *,int);
+        struct dentry * (*follow_link) (struct dentry *, struct dentry *);
+        int (*readpage) (struct file *, struct page *);
+        int (*writepage) (struct page *page, struct writeback_control *wbc);
+        int (*bmap) (struct inode *,int);
+        void (*truncate) (struct inode *);
+        int (*permission) (struct inode *, int);
+        int (*smap) (struct inode *,int);
+        int (*updatepage) (struct file *, struct page *, const char *,
+                                unsigned long, unsigned int, int);
+        int (*revalidate) (struct dentry *);
+};
+Again, all methods are called without any locks being held, unless
+otherwise noted.
+  default_file_ops: this is a pointer to a "struct file_operations"
+        which describes how to open and then manipulate open files
+  create: called by the open(2) and creat(2) system calls. Only
+        required if you want to support regular files. The dentry you
+        get should not have an inode (i.e. it should be a negative
+        dentry). Here you will probably call d_instantiate() with the
+        dentry and the newly created inode
+  lookup: called when the VFS needs to look up an inode in a parent
+        directory. The name to look for is found in the dentry. This
+        method must call d_add() to insert the found inode into the
+        dentry. The "i_count" field in the inode structure should be
+        incremented. If the named inode does not exist a NULL inode
+        should be inserted into the dentry (this is called a negative
+        dentry). Returning an error code from this routine must only
+        be done on a real error, otherwise creating inodes with system
+        calls like creat(2), mknod(2), mkdir(2) and so on will fail.
+        If you wish to overload the dentry methods then you should
+        initialise the "d_op" field in the dentry; this is a pointer
+        to a "struct dentry_operations".
+        This method is called with the directory inode semaphore held
+  link: called by the link(2) system call. Only required if you want
+        to support hard links. You will probably need to call
+        d_instantiate() just as you would in the create() method
+  unlink: called by the unlink(2) system call. Only required if you
+        want to support deleting inodes
+  symlink: called by the symlink(2) system call. Only required if you
+        want to support symlinks. You will probably need to call
+        d_instantiate() just as you would in the create() method
+  mkdir: called by the mkdir(2) system call. Only required if you want
+        to support creating subdirectories. You will probably need to
+        call d_instantiate() just as you would in the create() method
+  rmdir: called by the rmdir(2) system call. Only required if you want
+        to support deleting subdirectories
+  mknod: called by the mknod(2) system call to create a device (char,
+        block) inode or a named pipe (FIFO) or socket. Only required
+        if you want to support creating these types of inodes. You
+        will probably need to call d_instantiate() just as you would
+        in the create() method
+  readlink: called by the readlink(2) system call. Only required if
+        you want to support reading symbolic links
+  follow_link: called by the VFS to follow a symbolic link to the
+        inode it points to. Only required if you want to support
+        symbolic links
+struct file_operations                                                <section>
+======================
+This describes how the VFS can manipulate an open file. As of kernel
+2.1.99, the following members are defined:
+struct file_operations {
+        loff_t (*llseek) (struct file *, loff_t, int);
+        ssize_t (*read) (struct file *, char *, size_t, loff_t *);
+        ssize_t (*write) (struct file *, const char *, size_t, loff_t *);
+        int (*readdir) (struct file *, void *, filldir_t);
+        unsigned int (*poll) (struct file *, struct poll_table_struct *);
+        int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
+        int (*mmap) (struct file *, struct vm_area_struct *);
+        int (*open) (struct inode *, struct file *);
+        int (*release) (struct inode *, struct file *);
+        int (*fsync) (struct file *, struct dentry *);
+        int (*fasync) (struct file *, int);
+        int (*check_media_change) (kdev_t dev);
+        int (*revalidate) (kdev_t dev);
+        int (*lock) (struct file *, int, struct file_lock *);
+};
+Again, all methods are called without any locks being held, unless
+otherwise noted.
+  llseek: called when the VFS needs to move the file position index
+  read: called by read(2) and related system calls
+  write: called by write(2) and related system calls
+  readdir: called when the VFS needs to read the directory contents
+  poll: called by the VFS when a process wants to check if there is
+        activity on this file and (optionally) go to sleep until there
+        is activity. Called by the select(2) and poll(2) system calls
+  ioctl: called by the ioctl(2) system call
+  mmap: called by the mmap(2) system call
+  open: called by the VFS when an inode should be opened. When the VFS
+        opens a file, it creates a new "struct file" and initialises
+        the "f_op" file operations member with the "default_file_ops"
+        field in the inode structure. It then calls the open method
+        for the newly allocated file structure. You might think that
+        the open method really belongs in "struct inode_operations",
+        and you may be right. I think it's done the way it is because
+        it makes filesystems simpler to implement. The open() method
+        is a good place to initialise the "private_data" member in the
+        file structure if you want to point to a device structure
+  release: called when the last reference to an open file is closed
+  fsync: called by the fsync(2) system call
+  fasync: called by the fcntl(2) system call when asynchronous
+        (non-blocking) mode is enabled for a file
+Note that the file operations are implemented by the specific
+filesystem in which the inode resides. When opening a device node
+(character or block special) most filesystems will call special
+support routines in the VFS which will locate the required device
+driver information. These support routines replace the filesystem file
+operations with those for the device driver, and then proceed to call
+the new open() method for the file. This is how opening a device file
+in the filesystem eventually ends up calling the device driver open()
+method. Note that devfs (the Device FileSystem) has a more direct path
+from device node to device driver (this is an unofficial kernel
+patch).
+Directory Entry Cache (dcache)                                        <section>
+------------------------------
+struct dentry_operations
+========================
+This describes how a filesystem can overload the standard dentry
+operations. Dentries and the dcache are the domain of the VFS and the
+individual filesystem implementations. Device drivers have no business
+here. These methods may be set to NULL, as they are either optional or
+the VFS uses a default. As of kernel 2.1.99, the following members are
+defined:
+struct dentry_operations {
+        int (*d_revalidate)(struct dentry *);
+        int (*d_hash) (struct dentry *, struct qstr *);
+        int (*d_compare) (struct dentry *, struct qstr *, struct qstr *);
+        void (*d_delete)(struct dentry *);
+        void (*d_release)(struct dentry *);
+        void (*d_iput)(struct dentry *, struct inode *);
+};
+  d_revalidate: called when the VFS needs to revalidate a dentry. This
+        is called whenever a name look-up finds a dentry in the
+        dcache. Most filesystems leave this as NULL, because all their
+        dentries in the dcache are valid
+  d_hash: called when the VFS adds a dentry to the hash table
+  d_compare: called when a dentry should be compared with another
+  d_delete: called when the last reference to a dentry is
+        deleted. This means no-one is using the dentry, however it is
+        still valid and in the dcache
+  d_release: called when a dentry is really deallocated
+  d_iput: called when a dentry loses its inode (just prior to its
+        being deallocated). The default when this is NULL is that the
+        VFS calls iput(). If you define this method, you must call
+        iput() yourself
+Each dentry has a pointer to its parent dentry, as well as a hash list
+of child dentries. Child dentries are basically like files in a
+directory.
+Directory Entry Cache APIs
+--------------------------
+There are a number of functions defined which permit a filesystem to
+manipulate dentries:
+  dget: open a new handle for an existing dentry (this just increments
+        the usage count)
+  dput: close a handle for a dentry (decrements the usage count). If
+        the usage count drops to 0, the "d_delete" method is called
+        and the dentry is placed on the unused list if the dentry is
+        still in its parent's hash list. Putting the dentry on the
+        unused list just means that if the system needs some RAM, it
+        goes through the unused list of dentries and deallocates them.
+        If the dentry has already been unhashed when the usage count
+        drops to 0, the dentry is deallocated after the "d_delete"
+        method is called
+  d_drop: this unhashes a dentry from its parent's hash list. A
+        subsequent call to dput() will deallocate the dentry if its
+        usage count drops to 0
+  d_delete: delete a dentry. If there are no other open references to
+        the dentry then the dentry is turned into a negative dentry
+        (the d_iput() method is called). If there are other
+        references, then d_drop() is called instead
+  d_add: adds a dentry to its parent's hash list and then calls
+        d_instantiate()
+  d_instantiate: adds a dentry to the alias hash list for the inode and
+        updates the "d_inode" member. The "i_count" member in the
+        inode structure should be set/incremented. If the inode
+        pointer is NULL, the dentry is called a "negative
+        dentry". This function is commonly called when an inode is
+        created for an existing negative dentry
+  d_lookup: look up a dentry given its parent and path name component.
+        It looks up the child of that given name from the dcache
+        hash table. If it is found, the reference count is incremented
+        and the dentry is returned. The caller must use dput()
+        to free the dentry when it finishes using it.
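+
+The usage-count rules for dget(), dput() and d_drop() listed above can
+be sketched as a small userspace model. The toy_dentry structure and
+its flag fields are assumptions made for illustration; the kernel's
+struct dentry and its list handling are considerably more involved.
+
+```c
+#include <stddef.h>
+
+/* Toy userspace model of the usage count; not the kernel's dentry. */
+struct toy_dentry {
+        int d_count;    /* usage count */
+        int hashed;     /* still in the parent's hash list? */
+        int on_unused;  /* parked on the unused list */
+        int freed;      /* stands in for actual deallocation */
+        void (*d_delete)(struct toy_dentry *);
+};
+
+/* Open a new handle: just bump the usage count. */
+struct toy_dentry *toy_dget(struct toy_dentry *d)
+{
+        d->d_count++;
+        return d;
+}
+
+/* Unhash the dentry from its parent's hash list. */
+void toy_d_drop(struct toy_dentry *d)
+{
+        d->hashed = 0;
+}
+
+/* Close a handle; act only when the last reference goes away. */
+void toy_dput(struct toy_dentry *d)
+{
+        if (--d->d_count > 0)
+                return;
+        if (d->d_delete != NULL)
+                d->d_delete(d);
+        if (d->hashed)
+                d->on_unused = 1;   /* reclaimable when RAM is needed */
+        else
+                d->freed = 1;       /* already unhashed: deallocate */
+}
+```
+
+A hashed dentry whose count drops to 0 is only parked on the unused
+list, while one that was dropped first is deallocated by the final
+toy_dput().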
+RCU-based dcache locking model
+------------------------------
+On many workloads, the most common operation on dcache is
+to look up a dentry, given a parent dentry and the name
+of the child. Typically, for every open(), stat() etc.,
+the dentry corresponding to the pathname will be looked
+up by walking the tree starting with the first component
+of the pathname and using that dentry along with the next
+component to look up the next level and so on. Since it
+is a frequent operation for workloads like multiuser
+environments and webservers, it is important to optimize
+this path.
+Prior to 2.5.10, dcache_lock was acquired in d_lookup and thus
+for every component during path look-up. From 2.5.10 onwards, the
+fastwalk algorithm changed this by holding the dcache_lock
+at the beginning and walking as many cached path component
+dentries as possible. This significantly decreases the number
+of acquisitions of dcache_lock. However, it also increases the
+lock hold time significantly and affects performance on large
+SMP machines. Since the 2.5.62 kernel, dcache has been using
+a new locking model that uses RCU to make dcache look-up
+lock-free.
+The current dcache locking model is not very different from the previous
+dcache locking model. Prior to the 2.5.62 kernel, dcache_lock
+protected the hash chain, d_child, d_alias, d_lru lists as well
+as d_inode and several other things like mount look-up. RCU-based
+changes affect only the way the hash chain is protected. For everything
+else the dcache_lock must be taken for both traversing as well as
+updating. The hash chain updates too take the dcache_lock.
+The significant change is the way d_lookup traverses the hash chain:
+it doesn't acquire the dcache_lock for this and relies on RCU to
+ensure that the dentry has not been *freed*.
+Dcache locking details
+----------------------
+For many multi-user workloads, open() and stat() on files are
+very frequently occurring operations. Both involve walking
+of path names to find the dentry corresponding to the
+concerned file. In the 2.4 kernel, dcache_lock was held
+during look-up of each path component. Contention and
+cacheline bouncing of this global lock caused significant
+scalability problems. With the introduction of RCU
+in the Linux kernel, this was worked around by making
+the look-up of path components during path walking lock-free.
+Safe lock-free look-up of dcache hash table
+===========================================
+Dcache is a complex data structure with the hash table entries
+also linked together in other lists. In the 2.4 kernel, dcache_lock
+protected all the lists. We applied RCU only on hash chain
+walking. The rest of the lists are still protected by dcache_lock.
+Some of the important changes are:
+1. The deletion from hash chain is done using hlist_del_rcu() macro which
+   leaves the next pointer of the deleted dentry intact, and this
+   allows us to walk safely lock-free while a deletion is happening.
+2. Insertion of a dentry into the hash table is done using
+   hlist_add_head_rcu() which takes care of ordering the writes -
+   the writes to the dentry must be visible before the dentry
+   is inserted. This works in conjunction with hlist_for_each_rcu()
+   while walking the hash chain. The only requirement is that
+   all initialization to the dentry must be done before hlist_add_head_rcu()
+   since we don't have dcache_lock protection while traversing
+   the hash chain. This isn't different from the existing code.
+3. A dentry looked up without holding dcache_lock cannot be
+   returned for walking if it is unhashed. It then may have a NULL
+   d_inode or other bogosity since RCU doesn't protect the other
+   fields in the dentry. We therefore use a flag DCACHE_UNHASHED to
+   indicate unhashed dentries and use this in conjunction with a
+   per-dentry lock (d_lock). Once looked up without the dcache_lock,
+   we acquire the per-dentry lock (d_lock) and check if the
+   dentry is unhashed. If so, the look-up is failed. If not, the
+   reference count of the dentry is increased and the dentry is returned.
+4. Once a dentry is looked up, it must be ensured during the path
+   walk for that component it doesn't go away. In pre-2.5.10 code,
+   this was done holding a reference to the dentry. dcache_rcu does
+   the same.  In some sense, dcache_rcu path walking looks like
+   the pre-2.5.10 version.
+5. All dentry hash chain updates must take the dcache_lock as well as
+   the per-dentry lock in that order. dput() does this to ensure
+   that a dentry that has just been looked up on another CPU
+   doesn't get deleted before dget() can be done on it.
+6. There are several ways to do reference counting of RCU protected
+   objects. One such example is in ipv4 route cache where
+   deferred freeing (using call_rcu()) is done as soon as
+   the reference count goes to zero. This cannot be done in
+   the case of dentries because tearing down of dentries
+   requires blocking (dentry_iput()) which isn't supported from
+   RCU callbacks. Instead, tearing down of dentries happens
+   synchronously in dput(), but actual freeing happens later
+   when the RCU grace period is over. This allows safe lock-free
+   walking of the hash chains, but a matched dentry may have
+   been partially torn down. The checking of DCACHE_UNHASHED
+   flag with d_lock held detects such dentries and prevents
+   them from being returned from look-up.
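+
+The validation step from points 3 and 6 above (find a candidate
+without the global lock, then check it under its own lock) can be
+sketched in userspace C. This is a single-threaded toy: the toy_*
+names, the atomic_flag spinlock and the plain singly linked chain are
+assumptions for illustration, and nothing here provides the protection
+that RCU gives the real hash chain walk.
+
+```c
+#include <stdatomic.h>
+#include <stddef.h>
+#include <string.h>
+
+#define TOY_UNHASHED 0x1    /* plays the role of DCACHE_UNHASHED */
+
+struct toy_dentry {
+        struct toy_dentry *next;    /* hash chain link */
+        const char *name;
+        unsigned int d_flags;
+        int d_count;
+        atomic_flag d_lock;         /* per-dentry lock (tiny spinlock) */
+};
+
+static void toy_lock(atomic_flag *l)
+{
+        while (atomic_flag_test_and_set(l))
+                ;                   /* spin until the lock is free */
+}
+
+static void toy_unlock(atomic_flag *l)
+{
+        atomic_flag_clear(l);
+}
+
+/*
+ * Walk the chain without a global lock. Each candidate is compared and
+ * validated under its own lock; an unhashed match fails the look-up.
+ */
+struct toy_dentry *toy_lookup(struct toy_dentry *chain, const char *name)
+{
+        struct toy_dentry *d, *found = NULL;
+
+        for (d = chain; d != NULL; d = d->next) {
+                int match;
+
+                toy_lock(&d->d_lock);
+                match = strcmp(d->name, name) == 0;
+                if (match && !(d->d_flags & TOY_UNHASHED)) {
+                        d->d_count++;   /* reference taken under d_lock */
+                        found = d;
+                }
+                toy_unlock(&d->d_lock);
+                if (match)
+                        break;          /* hashed or not, stop here */
+        }
+        return found;
+}
+```
+
+Taking the reference while still holding the per-dentry lock is what
+keeps a dentry that is being torn down from being handed back to the
+caller.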
+Maintaining POSIX rename semantics
+==================================
+Since look-up of dentries is lock-free, it can race against
+a concurrent rename operation. For example, during rename
+of file A to B, look-up of either A or B must succeed.
+So, if look-up of B happens after A has been removed from the
+hash chain but not added to the new hash chain, it may fail.
+Also, a comparison while the name is being written concurrently
+by a rename may result in false positive matches violating
+rename semantics. Issues related to races with rename are
+handled as described below:
+1. Look-up can be done in two ways - d_lookup() which is safe
+   from simultaneous renames and __d_lookup() which is not.
+   If __d_lookup() fails, it must be followed up by a d_lookup()
+   to correctly determine whether a dentry is in the hash table
+   or not. d_lookup() protects look-ups using a sequence
+   lock (rename_lock).
+2. The name associated with a dentry (d_name) may be changed if
+   a rename is allowed to happen simultaneously. To prevent memcmp()
+   in __d_lookup() from going out of bounds due to a rename, and to
+   avoid false positive comparisons, the name comparison is done while
+   holding the per-dentry lock. This prevents concurrent renames during
+   this operation.
+3. Hash table walking during look-up may move to a different bucket as
+   the current dentry is moved to a different bucket due to rename.
+   But we use hlists in dcache hash table and they are null-terminated.
+   So, even if a dentry moves to a different bucket, hash chain
+   walk will terminate. [with a list_head list, it may not since
+   termination is when the list_head in the original bucket is reached].
+   Since we redo the d_parent check and compare name while holding
+   d_lock, lock-free look-up will not race against d_move().
+4. There can be a theoretical race when a dentry keeps coming back
+   to the original bucket due to double moves. Due to this, look-up may
+   consider that it has never moved and can end up in an infinite loop.
+   But this is not any worse than the theoretical livelocks we already
+   have in the kernel.
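+
+The retry pattern from point 1 above (trust a hit from the fast
+look-up, but only believe a miss if no rename ran in the meantime) can
+be sketched with a toy sequence counter. Everything named toy_* is an
+assumption for illustration, including the demo look-up that simulates
+a rename completing during its first call; this is not the kernel's
+seqlock or d_lookup() code.
+
+```c
+#include <stdatomic.h>
+#include <stddef.h>
+
+/* Toy sequence counter: odd while a writer (rename) is in progress. */
+static atomic_uint toy_rename_seq;
+
+void toy_rename_begin(void) { atomic_fetch_add(&toy_rename_seq, 1); }
+void toy_rename_end(void)   { atomic_fetch_add(&toy_rename_seq, 1); }
+
+/* Reader side: wait for an even value and remember it. */
+unsigned int toy_read_begin(void)
+{
+        unsigned int seq;
+
+        while ((seq = atomic_load(&toy_rename_seq)) & 1)
+                ;                   /* a rename is in flight */
+        return seq;
+}
+
+/* Non-zero if a rename started or finished since toy_read_begin(). */
+int toy_read_retry(unsigned int seq)
+{
+        return atomic_load(&toy_rename_seq) != seq;
+}
+
+/* Rename-safe wrapper around a fast look-up that is not rename-safe. */
+void *toy_d_lookup(void *(*fast_lookup)(void *), void *arg)
+{
+        void *d;
+        unsigned int seq;
+
+        do {
+                seq = toy_read_begin();
+                d = fast_lookup(arg);
+                if (d != NULL)
+                        break;      /* a hit needs no retry */
+        } while (toy_read_retry(seq));
+        return d;
+}
+
+/* Demo look-up: misses once while a rename completes, then hits. */
+struct toy_probe {
+        int calls;
+        int target;
+};
+
+void *toy_racy_lookup(void *arg)
+{
+        struct toy_probe *p = arg;
+
+        if (p->calls++ == 0) {
+                toy_rename_begin();
+                toy_rename_end();
+                return NULL;
+        }
+        return &p->target;
+}
+```
+
+With toy_racy_lookup() the first miss is discarded because the counter
+moved, and the second pass returns the entry.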
+Important guidelines for filesystem developers related to dcache_rcu
+====================================================================
+1. Existing dcache interfaces (pre-2.5.62) exported to filesystems
+   don't change. Only the dcache internal implementation changes. However,
+   filesystems *must not* delete from the dentry hash chains directly
+   using the list macros as was allowed earlier. They must use dcache
+   APIs like d_drop() or __d_drop() depending on the situation.
+2. d_flags is now protected by a per-dentry lock (d_lock). All
+   access to d_flags must be protected by it.
+3. For a hashed dentry, checking of d_count needs to be protected
+   by d_lock.
+Papers and other documentation on dcache locking
+================================================
+1. Scaling dcache with RCU (http://linuxjournal.com/article.php?sid=7124).
+2. http://lse.sourceforge.net/locking/dcache/dcache.html
diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt
new file mode 100644
index 000000000000..c7d5d0c7067d
--- /dev/null
+++ b/Documentation/filesystems/xfs.txt
@@ -0,0 +1,188 @@
+The SGI XFS Filesystem
+======================
+XFS is a high performance journaling filesystem which originated
+on the SGI IRIX platform.  It is completely multi-threaded, can
+support large files and large filesystems, extended attributes,
+variable block sizes, is extent based, and makes extensive use of
+Btrees (directories, extents, free space) to aid both performance
+and scalability.
+Refer to the documentation at http://oss.sgi.com/projects/xfs/
+for further details.  This implementation is on-disk compatible
+with the IRIX version of XFS.
+Mount Options
+=============
+When mounting an XFS filesystem, the following options are accepted.
+  biosize=size
+        Sets the preferred buffered I/O size (default size is 64K).
+        "size" must be expressed as the logarithm (base2) of the
+        desired I/O size.
+        Valid values for this option are 14 through 16, inclusive
+        (i.e. 16K, 32K, and 64K bytes).  On machines with a 4K
+        pagesize, 13 (8K bytes) is also a valid size.
+        The preferred buffered I/O size can also be altered on an
+        individual file basis using the ioctl(2) system call.
+  ikeep/noikeep
+        When inode clusters are emptied of inodes, keep them around
+        on the disk (ikeep) - this is the traditional XFS behaviour
+        and is still the default for now.  Using the noikeep option,
+        inode clusters are returned to the free space pool.
+  logbufs=value
+        Set the number of in-memory log buffers.  Valid numbers range
+        from 2-8 inclusive.
+        The default value is 8 buffers for filesystems with a
+        blocksize of 64K, 4 buffers for filesystems with a blocksize
+        of 32K, 3 buffers for filesystems with a blocksize of 16K
+        and 2 buffers for all other configurations.  Increasing the
+        number of buffers may increase performance on some workloads
+        at the cost of the memory used for the additional log buffers
+        and their associated control structures.
+  logbsize=value
+        Set the size of each in-memory log buffer.
+        Size may be specified in bytes, or in kilobytes with a "k" suffix.
+        Valid sizes for version 1 and version 2 logs are 16384 (16k) and
+        32768 (32k).  Valid sizes for version 2 logs also include
+        65536 (64k), 131072 (128k) and 262144 (256k).
+        The default value for machines with more than 32MB of memory
+        is 32768; machines with less memory use 16384 by default.
+  logdev=device and rtdev=device
+        Use an external log (metadata journal) and/or real-time device.
+        An XFS filesystem has up to three parts: a data section, a log
+        section, and a real-time section.  The real-time section is
+        optional, and the log section can be separate from the data
+        section or contained within it.
+  noalign
+        Data allocations will not be aligned at stripe unit boundaries.
+  noatime
+        Access timestamps are not updated when a file is read.
+  norecovery
+        The filesystem will be mounted without running log recovery.
+        If the filesystem was not cleanly unmounted, it is likely to
+        be inconsistent when mounted in "norecovery" mode.
+        Some files or directories may not be accessible because of this.
+        Filesystems mounted "norecovery" must be mounted read-only or
+        the mount will fail.
+  nouuid
+        Don't check for double mounted file systems using the file system uuid.
+        This is useful to mount LVM snapshot volumes.
+  osyncisosync
+        Make O_SYNC writes implement true O_SYNC.  WITHOUT this option,
+        Linux XFS behaves as if an "osyncisdsync" option is used,
+        which will make writes to files opened with the O_SYNC flag set
+        behave as if the O_DSYNC flag had been used instead.
+        This can result in better performance without compromising
+        data safety.
+        However if this option is not in effect, timestamp updates from
+        O_SYNC writes can be lost if the system crashes.
+        If timestamp updates are critical, use the osyncisosync option.
+  quota/usrquota/uqnoenforce
+        User disk quota accounting enabled, and limits (optionally)
+        enforced.
+  grpquota/gqnoenforce
+        Group disk quota accounting enabled, and limits (optionally)
+        enforced.
+  sunit=value and swidth=value
+        Used to specify the stripe unit and width for a RAID device or
+        a stripe volume.  "value" must be specified in 512-byte block
+        units.
+        If this option is not specified and the filesystem was made on
+        a stripe volume or the stripe width or unit were specified for
+        the RAID device at mkfs time, then the mount system call will
+        restore the value from the superblock.  For filesystems that
+        are made directly on RAID devices, these options can be used
+        to override the information in the superblock if the underlying
+        disk layout changes after the filesystem has been created.
+        The "swidth" option is required if the "sunit" option has been
+        specified, and must be a multiple of the "sunit" value.
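+
+        As an illustration of the units involved (the device name, mount
+        point and RAID geometry here are hypothetical): a stripe unit of
+        64 KiB is 65536 / 512 = 128 blocks, and with four data disks the
+        stripe width is 4 * 128 = 512 blocks, which is a multiple of the
+        stripe unit as required.  An /etc/fstab entry overriding the
+        superblock values might then look like:
+
+                /dev/md0  /data  xfs  sunit=128,swidth=512  0 0
+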
+sysctls
+=======
+The following sysctls are available for the XFS filesystem:
+  fs.xfs.stats_clear            (Min: 0  Default: 0  Max: 1)
+        Setting this to "1" clears accumulated XFS statistics 
+        in /proc/fs/xfs/stat.  It then immediately resets to "0".
+  
+  fs.xfs.xfssyncd_centisecs     (Min: 100  Default: 3000  Max: 720000)
+        The interval at which the xfssyncd thread flushes metadata
+        out to disk.  This thread will flush log activity out, and
+        do some processing on unlinked inodes.
+  fs.xfs.xfsbufd_centisecs      (Min: 50  Default: 100  Max: 3000)
+        The interval at which xfsbufd scans the dirty metadata buffers list.
+  fs.xfs.age_buffer_centisecs   (Min: 100  Default: 1500  Max: 720000)
+        The age at which xfsbufd flushes dirty metadata buffers to disk.
+  fs.xfs.error_level            (Min: 0  Default: 3  Max: 11)
+        A volume knob for error reporting when internal errors occur.
+        This will generate detailed messages & backtraces for filesystem
+        shutdowns, for example.  Current threshold values are:
+                XFS_ERRLEVEL_OFF:       0
+                XFS_ERRLEVEL_LOW:       1
+                XFS_ERRLEVEL_HIGH:      5
+  fs.xfs.panic_mask             (Min: 0  Default: 0  Max: 127)
+        Causes certain error conditions to call BUG(). Value is a bitmask;
+        OR together the tags which represent errors which should cause panics:
+        
+                XFS_NO_PTAG                     0
+                XFS_PTAG_IFLUSH                 0x00000001
+                XFS_PTAG_LOGRES                 0x00000002
+                XFS_PTAG_AILDELETE              0x00000004
+                XFS_PTAG_ERROR_REPORT           0x00000008
+                XFS_PTAG_SHUTDOWN_CORRUPT       0x00000010
+                XFS_PTAG_SHUTDOWN_IOERROR       0x00000020
+                XFS_PTAG_SHUTDOWN_LOGERROR      0x00000040
+        This option is intended for debugging only.             
+  fs.xfs.irix_symlink_mode      (Min: 0  Default: 0  Max: 1)
+        Controls whether symlinks are created with mode 0777 (default)
+        or whether their mode is affected by the umask (IRIX mode).
+  fs.xfs.irix_sgid_inherit      (Min: 0  Default: 0  Max: 1)
+        Controls files created in SGID directories.
+        If the irix_sgid_inherit compatibility sysctl is set, the ISGID
+        bit is cleared when the group ID of the new file does not match
+        the effective group ID or one of the supplementary group IDs of
+        the parent directory.
+  fs.xfs.restrict_chown         (Min: 0  Default: 1  Max: 1)
+        Controls whether unprivileged users can use chown to "give away"
+        a file to another user.
+  fs.xfs.inherit_sync           (Min: 0  Default: 1  Max: 1)
+        Setting this to "1" will cause the "sync" flag set 
+        by the chattr(1) command on a directory to be
+        inherited by files in that directory.
+  fs.xfs.inherit_nodump         (Min: 0  Default: 1  Max: 1)
+        Setting this to "1" will cause the "nodump" flag set 
+        by the chattr(1) command on a directory to be
+        inherited by files in that directory.
+  fs.xfs.inherit_noatime        (Min: 0  Default: 1  Max: 1)
+        Setting this to "1" will cause the "noatime" flag set 
+        by the chattr(1) command on a directory to be
+        inherited by files in that directory.
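+
+As a worked example for the fs.xfs.panic_mask bitmask (a sketch only, not
+a recommended setting): to call BUG() on both corruption and I/O error
+shutdowns, combine XFS_PTAG_SHUTDOWN_CORRUPT and XFS_PTAG_SHUTDOWN_IOERROR
+with a bitwise OR: 0x00000010 | 0x00000020 = 0x00000030, which is 48 in
+decimal and within the documented maximum of 127.  Assuming the setting
+is applied through /etc/sysctl.conf, the line would be:
+
+        fs.xfs.panic_mask = 48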