Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache

* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache: (41 commits)
  NFS: Add mount options to enable local caching on NFS
  NFS: Display local caching state
  NFS: Store pages from an NFS inode into a local cache
  NFS: Read pages from FS-Cache into an NFS inode
  NFS: nfs_readpage_async() needs to be accessible as a fallback for local caching
  NFS: Add read context retention for FS-Cache to call back with
  NFS: FS-Cache page management
  NFS: Add some new I/O counters for FS-Cache doing things for NFS
  NFS: Invalidate FsCache page flags when cache removed
  NFS: Use local disk inode cache
  NFS: Define and create inode-level cache objects
  NFS: Define and create superblock-level objects
  NFS: Define and create server-level objects
  NFS: Register NFS for caching and retrieve the top-level index
  NFS: Permit local filesystem caching to be enabled for NFS
  NFS: Add FS-Cache option bit and debug bit
  NFS: Add comment banners to some NFS functions
  FS-Cache: Make kAFS use FS-Cache
  CacheFiles: A cache that backs onto a mounted filesystem
  CacheFiles: Export things for CacheFiles
  ...
author: Linus Torvalds <torvalds@linux-foundation.org> 2009-04-03 13:07:43 -0400
committer: Linus Torvalds <torvalds@linux-foundation.org> 2009-04-03 13:07:43 -0400
commit: 3cc50ac0dbda5100684e570247782330155d35e0
tree: f4b8f22d1725ebe65d2fe658d292dabacd7ed564 /Documentation
parent: d9b9be024a6628a01d8730d1fd0b5f25658a2794
parent: b797cac7487dee6bfddeb161631c1bbc54fa3cdb
7 files changed, 2970 insertions, 0 deletions
diff --git a/Documentation/filesystems/caching/backend-api.txt b/Documentation/filesystems/caching/backend-api.txt
new file mode 100644
index 000000000000..382d52cdaf2d
--- /dev/null
+++ b/Documentation/filesystems/caching/backend-api.txt
@@ -0,0 +1,658 @@
+                          ==========================
+                          FS-CACHE CACHE BACKEND API
+                          ==========================
+The FS-Cache system provides an API by which actual caches can be supplied to
+FS-Cache for it to then serve out to network filesystems and other interested
+parties.
+This API is declared in <linux/fscache-cache.h>.
+====================================
+INITIALISING AND REGISTERING A CACHE
+====================================
+To start off, a cache definition must be initialised and registered for each
+cache the backend wants to make available.  For instance, CacheFS does this in
+the fill_super() operation on mounting.
+The cache definition (struct fscache_cache) should be initialised by calling:
+        void fscache_init_cache(struct fscache_cache *cache,
+                                struct fscache_cache_ops *ops,
+                                const char *idfmt,
+                                ...);
+Where:
+ (*) "cache" is a pointer to the cache definition;
+ (*) "ops" is a pointer to the table of operations that the backend supports on
+     this cache; and
+ (*) "idfmt" is a format and printf-style arguments for constructing a label
+     for the cache.
+The cache should then be registered with FS-Cache by passing a pointer to the
+previously initialised cache definition to:
+        int fscache_add_cache(struct fscache_cache *cache,
+                              struct fscache_object *fsdef,
+                              const char *tagname);
+Two extra arguments should also be supplied:
+ (*) "fsdef" which should point to the object representation for the FS-Cache
+     master index in this cache.  Netfs primary index entries will be created
+     here.  FS-Cache keeps the caller's reference to the index object if
+     successful and will release it upon withdrawal of the cache.
+ (*) "tagname" which, if given, should be a text string naming this cache.  If
+     this is NULL, the identifier will be used instead.  For CacheFS, the
+     identifier is set to name the underlying block device and the tag can be
+     supplied by mount.
+This function may return -ENOMEM if it ran out of memory or -EEXIST if the tag
+is already in use.  0 will be returned on success.
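The initialise-then-register sequence described above can be sketched as follows. This is a minimal userspace illustration, not kernel code: the struct layouts and the bodies of fscache_init_cache() and fscache_add_cache() are invented stand-ins so the example compiles on its own, and example_bring_up() is a hypothetical helper. Only the two prototypes and the documented return codes come from the text.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Userspace stand-ins so the sketch compiles outside the kernel.  The real
 * definitions live in <linux/fscache-cache.h>; these fields are invented. */
struct fscache_cache_ops { const char *name; };
struct fscache_object { int debug_id; };
struct fscache_cache {
        struct fscache_cache_ops *ops;
        char identifier[32];
        const char *tag;
};

static const char *tag_in_use;  /* stub state: the one tag already registered */

/* Stub: builds the label from the printf-style format, as the text says. */
static void fscache_init_cache(struct fscache_cache *cache,
                               struct fscache_cache_ops *ops,
                               const char *idfmt, ...)
{
        va_list va;

        memset(cache, 0, sizeof(*cache));
        cache->ops = ops;
        va_start(va, idfmt);
        vsnprintf(cache->identifier, sizeof(cache->identifier), idfmt, va);
        va_end(va);
}

/* Stub: only models the documented -EEXIST case and the NULL-tag fallback. */
static int fscache_add_cache(struct fscache_cache *cache,
                             struct fscache_object *fsdef,
                             const char *tagname)
{
        (void)fsdef;
        cache->tag = tagname ? tagname : cache->identifier;
        if (tag_in_use && strcmp(tag_in_use, cache->tag) == 0)
                return -EEXIST;
        tag_in_use = cache->tag;
        return 0;
}

/* Hypothetical backend helper: initialise the definition, then register it. */
static int example_bring_up(struct fscache_cache *cache,
                            struct fscache_cache_ops *ops,
                            struct fscache_object *fsdef,
                            const char *devname, const char *tag)
{
        int ret;

        fscache_init_cache(cache, ops, "%s", devname);
        ret = fscache_add_cache(cache, fsdef, tag);
        if (ret < 0)
                return ret;     /* -ENOMEM or -EEXIST */
        return 0;               /* FS-Cache now keeps the fsdef reference */
}
```

A backend would typically call something like example_bring_up() from its mount path and propagate a negative return to the caller.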
+=====================
+UNREGISTERING A CACHE
+=====================
+A cache can be withdrawn from the system by calling this function with a
+pointer to the cache definition:
+        void fscache_withdraw_cache(struct fscache_cache *cache);
+In CacheFS's case, this is called by put_super().
+========
+SECURITY
+========
+The cache methods are executed in one of two contexts:
+ (1) that of the userspace process that issued the netfs operation that caused
+     the cache method to be invoked, or
+ (2) that of one of the processes in the FS-Cache thread pool.
+In either case, this may not be an appropriate context in which to access the
+cache.
+The calling process's fsuid, fsgid and SELinux security identities may need to
+be masqueraded for the duration of the cache driver's access to the cache.
+This is left to the cache to handle; FS-Cache makes no effort in this regard.
+===================================
+CONTROL AND STATISTICS PRESENTATION
+===================================
+The cache may present data to the outside world through FS-Cache's interfaces
+in sysfs and procfs - the former for control and the latter for statistics.
+A sysfs directory called /sys/fs/fscache/<cachetag>/ is created if CONFIG_SYSFS
+is enabled.  This is accessible through the kobject struct fscache_cache::kobj
+and is for use by the cache as it sees fit.
+========================
+RELEVANT DATA STRUCTURES
+========================
+ (*) Index/Data file FS-Cache representation cookie:
+        struct fscache_cookie {
+                struct fscache_object_def       *def;
+                struct fscache_netfs            *netfs;
+                void                            *netfs_data;
+                ...
+        };
+     The fields that might be of use to the backend describe the object
+     definition, the netfs definition and the netfs's data for this cookie.
+     The object definition contains functions supplied by the netfs for loading
+     and matching index entries; these are required to provide some of the
+     cache operations.
+ (*) In-cache object representation:
+        struct fscache_object {
+                int                             debug_id;
+                enum {
+                        FSCACHE_OBJECT_RECYCLING,
+                        ...
+                }                               state;
+                spinlock_t                      lock;
+                struct fscache_cache            *cache;
+                struct fscache_cookie           *cookie;
+                ...
+        };
+     Structures of this type should be allocated by the cache backend and
+     passed to FS-Cache when requested by the appropriate cache operation.  In
+     the case of CacheFS, they're embedded in CacheFS's internal object
+     structures.
+     The debug_id is a simple integer that can be used in debugging messages
+     that refer to a particular object.  In such a case it should be printed
+     using "OBJ%x" to be consistent with FS-Cache.
+     Each object contains a pointer to the cookie that represents the object it
+     is backing.  An object should be retired when put_object() is called if it
+     is
+     in state FSCACHE_OBJECT_RECYCLING.  The fscache_object struct should be
+     initialised by calling fscache_object_init(object).
+ (*) FS-Cache operation record:
+        struct fscache_operation {
+                atomic_t                usage;
+                struct fscache_object   *object;
+                unsigned long           flags;
+        #define FSCACHE_OP_EXCLUSIVE
+                void (*processor)(struct fscache_operation *op);
+                void (*release)(struct fscache_operation *op);
+                ...
+        };
+     FS-Cache has a pool of threads that it uses to give CPU time to the
+     various asynchronous operations that need to be done as part of driving
+     the cache.  These are represented by the above structure.  The processor
+     method is called to give the op CPU time, and the release method to get
+     rid of it when its usage count reaches 0.
+     An operation can be made exclusive upon an object by setting the
+     appropriate flag before enqueuing it with fscache_enqueue_operation().  If
+     an operation needs more processing time, it should be enqueued again.
+ (*) FS-Cache retrieval operation record:
+        struct fscache_retrieval {
+                struct fscache_operation op;
+                struct address_space    *mapping;
+                struct list_head        *to_do;
+                ...
+        };
+     A structure of this type is allocated by FS-Cache to record retrieval and
+     allocation requests made by the netfs.  This struct is then passed to the
+     backend to do the operation.  The backend may get extra refs to it by
+     calling fscache_get_retrieval() and refs may be discarded by calling
+     fscache_put_retrieval().
+     A retrieval operation can be used by the backend to do retrieval work.  To
+     do this, the retrieval->op.processor method pointer should be set
+     appropriately by the backend and fscache_enqueue_retrieval() called to
+     submit it to the thread pool.  CacheFiles, for example, uses this to queue
+     page examination when it detects PG_lock being cleared.
+     The to_do field is an empty list available for the cache backend to use as
+     it sees fit.
+ (*) FS-Cache storage operation record:
+        struct fscache_storage {
+                struct fscache_operation op;
+                pgoff_t                 store_limit;
+                ...
+        };
+     A structure of this type is allocated by FS-Cache to record outstanding
+     writes to be made.  FS-Cache itself enqueues this operation and invokes
+     the write_page() method on the object at appropriate times to effect
+     storage.
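The note above that CacheFS embeds struct fscache_object in its own object structure, and the "OBJ%x" debugging convention, can be sketched in userspace C. Everything except the debug_id field name and the "OBJ%x" format is an assumption made for illustration: struct example_object, its backing_fd member and both helper functions are hypothetical, and the one-field struct fscache_object is a stand-in for the real definition.

```c
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in; the real struct fscache_object has more fields. */
struct fscache_object { int debug_id; };

/* Hypothetical backend object with the FS-Cache object embedded in it. */
struct example_object {
        int backing_fd;                 /* invented backend-private state */
        struct fscache_object fscache;  /* the part handed to FS-Cache */
};

/* Recover the enclosing backend object from the pointer FS-Cache passes to
 * a cache operation (the same idea as the kernel's container_of()). */
static struct example_object *to_example_object(struct fscache_object *object)
{
        return (struct example_object *)
                ((char *)object - offsetof(struct example_object, fscache));
}

/* Format the object label the way the text recommends for debug output. */
static int example_object_label(const struct fscache_object *object,
                                char *buf, size_t len)
{
        return snprintf(buf, len, "OBJ%x", (unsigned int)object->debug_id);
}
```

Embedding avoids a second allocation per object and lets every cache operation get from the FS-Cache pointer back to backend state with pointer arithmetic alone.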
+================
+CACHE OPERATIONS
+================
+The cache backend provides FS-Cache with a table of operations that can be
+performed on the denizens of the cache.  These are held in a structure of type:
+        struct fscache_cache_ops
+ (*) Name of cache provider [mandatory]:
+        const char *name
+     This isn't strictly an operation, but should be pointed at a string naming
+     the backend.
+ (*) Allocate a new object [mandatory]:
+        struct fscache_object *(*alloc_object)(struct fscache_cache *cache,
+                                               struct fscache_cookie *cookie)
+     This method is used to allocate a cache object representation to back a
+     cookie in a particular cache.  fscache_object_init() should be called on
+     the object to initialise it prior to returning.
+     This function may also be used to parse the index key to be used for
+     multiple lookup calls to turn it into a more convenient form.  FS-Cache
+     will call the lookup_complete() method to allow the cache to release the
+     form once lookup is complete or aborted.
+ (*) Look up and create object [mandatory]:
+        void (*lookup_object)(struct fscache_object *object)
+     This method is used to look up an object, given that the object is already
+     allocated and attached to the cookie.  This should instantiate that object
+     in the cache if it can.
+     The method should call fscache_object_lookup_negative() as soon as
+     possible if it determines the object doesn't exist in the cache.  If the
+     object is found to exist and the netfs indicates that it is valid then
+     fscache_obtained_object() should be called once the object is in a
+     position to have data stored in it.  Similarly, fscache_obtained_object()
+     should also be called once a non-present object has been created.
+     If a lookup error occurs, fscache_object_lookup_error() should be called
+     to abort the lookup of that object.
+ (*) Release lookup data [mandatory]:
+        void (*lookup_complete)(struct fscache_object *object)
+     This method is called to ask the cache to release any resources it was
+     using to perform a lookup.
+ (*) Increment object refcount [mandatory]:
+        struct fscache_object *(*grab_object)(struct fscache_object *object)
+     This method is called to increment the reference count on an object.  It
+     may fail (for instance if the cache is being withdrawn) by returning NULL.
+     It should return the object pointer if successful.
+ (*) Lock/Unlock object [mandatory]:
+        void (*lock_object)(struct fscache_object *object)
+        void (*unlock_object)(struct fscache_object *object)
+     These methods are used to exclusively lock an object.  It must be possible
+     to schedule with the lock held, so a spinlock isn't sufficient.
+ (*) Pin/Unpin object [optional]:
+        int (*pin_object)(struct fscache_object *object)
+        void (*unpin_object)(struct fscache_object *object)
+     These methods are used to pin an object into the cache.  Once pinned an
+     object cannot be reclaimed to make space.  Return -ENOSPC if there's not
+     enough space in the cache to permit this.
+ (*) Update object [mandatory]:
+        int (*update_object)(struct fscache_object *object)
+     This is called to update the index entry for the specified object.  The
+     new information should be in object->cookie->netfs_data.  This can be
+     obtained by calling object->cookie->def->get_aux()/get_attr().
+ (*) Discard object [mandatory]:
+        void (*drop_object)(struct fscache_object *object)
+     This method is called to indicate that an object has been unbound from its
+     cookie, and that the cache should release the object's resources and
+     retire it if it's in state FSCACHE_OBJECT_RECYCLING.
+     This method should not attempt to release any references held by the
+     caller.  The caller will invoke the put_object() method as appropriate.
+ (*) Release object reference [mandatory]:
+        void (*put_object)(struct fscache_object *object)
+     This method is used to discard a reference to an object.  The object may
+     be freed when all the references to it are released.
+ (*) Synchronise a cache [mandatory]:
+        void (*sync)(struct fscache_cache *cache)
+     This is called to ask the backend to synchronise a cache with its backing
+     device.
+ (*) Dissociate a cache [mandatory]:
+        void (*dissociate_pages)(struct fscache_cache *cache)
+     This is called to ask a cache to perform any page dissociations as part of
+     cache withdrawal.
+ (*) Notification that the attributes on a netfs file changed [mandatory]:
+        int (*attr_changed)(struct fscache_object *object);
+     This is called to indicate to the cache that certain attributes on a netfs
+     file have changed (for example the maximum size a file may reach).  The
+     cache can read these from the netfs by calling the cookie's get_attr()
+     method.
+     The cache may use the file size information to reserve space on the cache.
+     It should also call fscache_set_store_limit() to indicate to FS-Cache the
+     highest byte it's willing to store for an object.
+     This method may return a negative error code if an error occurred or the
+     cache object cannot be expanded.  In such a case, the object will be
+     withdrawn from service.
+     This operation is run asynchronously from FS-Cache's thread pool, and
+     storage and retrieval operations from the netfs are excluded during the
+     execution of this operation.
+ (*) Reserve cache space for an object's data [optional]:
+        int (*reserve_space)(struct fscache_object *object, loff_t size);
+     This is called to request that cache space be reserved to hold the data
+     for an object and the metadata used to track it.  Zero size should be
+     taken as a request to cancel a reservation.
+     This should return 0 if successful, -ENOSPC if there isn't enough space
+     available, or -ENOMEM or -EIO on other errors.
+     The reservation may exceed the current size of the object, thus permitting
+     future expansion.  If the amount of space consumed by an object would
+     exceed the reservation, it's permitted to refuse requests to allocate
+     pages, but not required.  An object may be pruned down to its reservation
+     size if larger than that already.
+ (*) Request page be read from cache [mandatory]:
+        int (*read_or_alloc_page)(struct fscache_retrieval *op,
+                                  struct page *page,
+                                  gfp_t gfp)
+     This is called to attempt to read a netfs page from the cache, or to
+     reserve a backing block if not.  FS-Cache will have done as much checking
+     as it can before calling, but most of the work belongs to the backend.
+     If there's no page in the cache, then -ENODATA should be returned if the
+     backend managed to reserve a backing block; -ENOBUFS or -ENOMEM if it
+     didn't.
+     If there is suitable data in the cache, then a read operation should be
+     queued and 0 returned.  When the read finishes, fscache_end_io() should be
+     called.
+     fscache_mark_pages_cached() should be called for the page if any cache
+     metadata is retained.  This will indicate to the netfs that the page needs
+     explicit uncaching.  This operation takes a pagevec, thus allowing several
+     pages to be marked at once.
+     The retrieval record pointed to by op should be retained for each page
+     queued and released when I/O on the page has been formally ended.
+     fscache_get/put_retrieval() are available for this purpose.
+     The retrieval record may be used to get CPU time via the FS-Cache thread
+     pool.  If this is desired, the op->op.processor should be set to point to
+     the appropriate processing routine, and fscache_enqueue_retrieval() should
+     be called at an appropriate point to request CPU time.  For instance, the
+     retrieval routine could be enqueued upon the completion of a disk read.
+     The to_do field in the retrieval record is provided to aid in this.
+     If an I/O error occurs, fscache_io_error() should be called and -ENOBUFS
+     returned if possible or fscache_end_io() called with a suitable error
+     code.
+ (*) Request pages be read from cache [mandatory]:
+        int (*read_or_alloc_pages)(struct fscache_retrieval *op,
+                                   struct list_head *pages,
+                                   unsigned *nr_pages,
+                                   gfp_t gfp)
+     This is like the read_or_alloc_page() method, except it is handed a list
+     of pages instead of one page.  Any pages on which a read operation is
+     started must be added to the page cache for the specified mapping and also
+     to the LRU.  Such pages must also be removed from the pages list and
+     *nr_pages decremented per page.
+     If there was an error such as -ENOMEM, then that should be returned; else
+     if one or more pages couldn't be read or allocated, then -ENOBUFS should
+     be returned; else if one or more pages couldn't be read, then -ENODATA
+     should be returned.  If all the pages are dispatched then 0 should be
+     returned.
+ (*) Request page be allocated in the cache [mandatory]:
+        int (*allocate_page)(struct fscache_retrieval *op,
+                             struct page *page,
+                             gfp_t gfp)
+     This is like the read_or_alloc_page() method, except that it shouldn't
+     read from the cache, even if there's data there that could be retrieved.
+     It should, however, set up any internal metadata required such that
+     the write_page() method can write to the cache.
+     If there's no backing block available, then -ENOBUFS should be returned
+     (or -ENOMEM if there were other problems).  If a block is successfully
+     allocated, then the netfs page should be marked and 0 returned.
+ (*) Request pages be allocated in the cache [mandatory]:
+        int (*allocate_pages)(struct fscache_retrieval *op,
+                              struct list_head *pages,
+                              unsigned *nr_pages,
+                              gfp_t gfp)
+     This is a multiple-page version of the allocate_page() method.  pages and
+     nr_pages should be treated as for the read_or_alloc_pages() method.
+ (*) Request page be written to cache [mandatory]:
+        int (*write_page)(struct fscache_storage *op,
+                          struct page *page);
+     This is called to write from a page on which there was a previously
+     successful read_or_alloc_page() call or similar.  FS-Cache filters out
+     pages that don't have mappings.
+     This method is called asynchronously from the FS-Cache thread pool.  It is
+     not required to actually store anything, provided -ENODATA is then
+     returned to the next read of this page.
+     If an error occurred, then a negative error code should be returned,
+     otherwise zero should be returned.  FS-Cache will take appropriate action
+     in response to an error, such as withdrawing this object.
+     If this method returns success then FS-Cache will inform the netfs
+     appropriately.
+ (*) Discard retained per-page metadata [mandatory]:
+        void (*uncache_page)(struct fscache_object *object, struct page *page)
+     This is called when a netfs page is being evicted from the pagecache.  The
+     cache backend should tear down any internal representation or tracking it
+     maintains for this page.
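The return-code precedence given for read_or_alloc_pages() above (a hard error first, then -ENOBUFS, then -ENODATA, then 0) can be written as a small pure function. This is one reading of the text, not backend code: example_read_pages_result() and its parameters are hypothetical, and it assumes a page that "couldn't be read" but did get a backing block counts towards -ENODATA, by analogy with the single-page method.

```c
#include <errno.h>

/*
 * Hypothetical helper: fold the per-page outcomes of one
 * read_or_alloc_pages() call into the method's return value.
 *
 * hard_error   - 0, or a negative errno such as -ENOMEM
 * nr_refused   - pages that could be neither read nor allocated
 * nr_allocated - pages with no cached data that got a backing block instead
 */
static int example_read_pages_result(int hard_error,
                                     unsigned int nr_refused,
                                     unsigned int nr_allocated)
{
        if (hard_error < 0)
                return hard_error;      /* e.g. -ENOMEM wins over all else */
        if (nr_refused > 0)
                return -ENOBUFS;        /* some page has no cache backing */
        if (nr_allocated > 0)
                return -ENODATA;        /* netfs must fetch, may store later */
        return 0;                       /* every page had a read dispatched */
}
```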
+==================
+FS-CACHE UTILITIES
+==================
+FS-Cache provides some utilities that a cache backend may make use of:
+ (*) Note occurrence of an I/O error in a cache:
+        void fscache_io_error(struct fscache_cache *cache)
+     This tells FS-Cache that an I/O error occurred in the cache.  After this
+     has been called, only resource dissociation operations (object and page
+     release) will be passed from the netfs to the cache backend for the
+     specified cache.
+     This does not actually withdraw the cache.  That must be done separately.
+ (*) Invoke the retrieval I/O completion function:
+        void fscache_end_io(struct fscache_retrieval *op, struct page *page,
+                            int error);
+     This is called to note the end of an attempt to retrieve a page.  The
+     error value should be 0 if successful and an error otherwise.
+ (*) Set highest store limit:
+        void fscache_set_store_limit(struct fscache_object *object,
+                                     loff_t i_size);
+     This sets the limit FS-Cache imposes on the highest byte it's willing to
+     try and store for a netfs.  Any page over this limit is automatically
+     rejected by fscache_read_or_alloc_page() and co with -ENOBUFS.
+ (*) Mark pages as being cached:
+        void fscache_mark_pages_cached(struct fscache_retrieval *op,
+                                       struct pagevec *pagevec);
+     This marks a set of pages as being cached.  After this has been called,
+     the netfs must call fscache_uncache_page() to unmark the pages.
+ (*) Perform coherency check on an object:
+        enum fscache_checkaux fscache_check_aux(struct fscache_object *object,
+                                                const void *data,
+                                                uint16_t datalen);
+     This asks the netfs to perform a coherency check on an object that has
+     just been looked up.  The cookie attached to the object will determine the
+     netfs to use.  data and datalen should specify where the auxiliary data
+     retrieved from the cache can be found.
+     One of three values will be returned:
+        (*) FSCACHE_CHECKAUX_OKAY
+            The coherency data indicates the object is valid as is.
+        (*) FSCACHE_CHECKAUX_NEEDS_UPDATE
+            The coherency data needs updating, but otherwise the object is
+            valid.
+        (*) FSCACHE_CHECKAUX_OBSOLETE
+            The coherency data indicates that the object is obsolete and should
+            be discarded.
+ (*) Initialise a freshly allocated object:
+        void fscache_object_init(struct fscache_object *object);
+     This initialises all the fields in an object representation.
+ (*) Indicate the destruction of an object:
+        void fscache_object_destroyed(struct fscache_cache *cache);
+     This must be called to inform FS-Cache that an object that belonged to a
+     cache has been destroyed and deallocated.  This will allow continuation
+     of the cache withdrawal process when it is stopped pending destruction of
+     all the objects.
+ (*) Indicate negative lookup on an object:
+        void fscache_object_lookup_negative(struct fscache_object *object);
+     This is called to indicate to FS-Cache that a lookup process for an object
+     found a negative result.
+     This changes the state of an object to permit reads pending on lookup
+     completion to go off and start fetching data from the netfs server as it's
+     known at this point that there can't be any data in the cache.
+     This may be called multiple times on an object.  Only the first call is
+     significant - all subsequent calls are ignored.
+ (*) Indicate an object has been obtained:
+        void fscache_obtained_object(struct fscache_object *object);
+     This is called to indicate to FS-Cache that a lookup process for an object
+     produced a positive result, or that an object was created.  This should
+     only be called once for any particular object.
+     This changes the state of an object to indicate:
+        (1) if no call to fscache_object_lookup_negative() has been made on
+            this object, that there may be data available, and that reads can
+            now go and look for it; and
+        (2) that writes may now proceed against this object.
+ (*) Indicate that object lookup failed:
+        void fscache_object_lookup_error(struct fscache_object *object);
+     This marks an object as having encountered a fatal error (usually EIO)
+     and causes it to move into a state whereby it will be withdrawn as soon
+     as possible.
+ (*) Get and release references on a retrieval record:
+        void fscache_get_retrieval(struct fscache_retrieval *op);
+        void fscache_put_retrieval(struct fscache_retrieval *op);
+     These two functions are used to retain a retrieval record whilst doing
+     asynchronous data retrieval and block allocation.
+ (*) Enqueue a retrieval record for processing:
+        void fscache_enqueue_retrieval(struct fscache_retrieval *op);
+     This enqueues a retrieval record for processing by the FS-Cache thread
+     pool.  One of the threads in the pool will invoke the retrieval record's
+     op->op.processor callback function.  This function may be called from
+     within the callback function.
+ (*) List of object state names:
+        const char *fscache_object_states[];
+     For debugging purposes, this may be used to turn the state that an object
+     is in into a text string for display purposes.
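The notification order a lookup_object() implementation is expected to follow can be sketched with counting stubs in place of the three FS-Cache calls. The struct layouts, struct example_probe and example_lookup_object() are hypothetical. The sketch also makes two assumptions the text does not state outright: that an object reported as FSCACHE_CHECKAUX_OBSOLETE is discarded and then handled like a missing one, and that failing to create an object counts as a lookup error.

```c
#include <stdbool.h>

/* Stand-in object that only counts which notifications were sent. */
struct fscache_object { int negative_calls, obtained_calls, error_calls; };

enum fscache_checkaux {
        FSCACHE_CHECKAUX_OKAY,
        FSCACHE_CHECKAUX_NEEDS_UPDATE,
        FSCACHE_CHECKAUX_OBSOLETE,
};

/* Stubs for the FS-Cache utilities named in the text. */
static void fscache_object_lookup_negative(struct fscache_object *o)
{
        o->negative_calls++;
}

static void fscache_obtained_object(struct fscache_object *o)
{
        o->obtained_calls++;
}

static void fscache_object_lookup_error(struct fscache_object *o)
{
        o->error_calls++;
}

/* Hypothetical summary of what probing the backing store found. */
struct example_probe {
        bool io_error;                  /* the probe itself failed */
        bool present;                   /* an on-disk object exists */
        enum fscache_checkaux aux;      /* fscache_check_aux() verdict */
        bool create_ok;                 /* creating a new object would work */
};

static void example_lookup_object(struct fscache_object *object,
                                  const struct example_probe *probe)
{
        bool usable = probe->present &&
                      probe->aux != FSCACHE_CHECKAUX_OBSOLETE;

        if (probe->io_error) {
                fscache_object_lookup_error(object);
                return;
        }
        if (!usable) {
                /* No cached data: pending reads may go to the server now. */
                fscache_object_lookup_negative(object);
                if (!probe->create_ok) {
                        fscache_object_lookup_error(object);
                        return;
                }
        }
        /* Found and valid, or freshly created: signalled exactly once. */
        fscache_obtained_object(object);
}
```

Reporting the negative result before attempting creation matters because it releases waiting readers early, while writers stay blocked until fscache_obtained_object() is called.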
diff --git a/Documentation/filesystems/caching/cachefiles.txt b/Documentation/filesystems/caching/cachefiles.txt
new file mode 100644
index 000000000000..c78a49b7bba6
--- /dev/null
+++ b/Documentation/filesystems/caching/cachefiles.txt
@@ -0,0 +1,501 @@
+               ===============================================
+               CacheFiles: CACHE ON ALREADY MOUNTED FILESYSTEM
+               ===============================================
+Contents:
+ (*) Overview.
+ (*) Requirements.
+ (*) Configuration.
+ (*) Starting the cache.
+ (*) Things to avoid.
+ (*) Cache culling.
+ (*) Cache structure.
+ (*) Security model and SELinux.
+ (*) A note on security.
+ (*) Statistical information.
+ (*) Debugging.
+========
+OVERVIEW
+========
+CacheFiles is a caching backend that uses a directory on an already mounted
+filesystem of a local type (such as Ext3) as its cache.
+CacheFiles uses a userspace daemon to do some of the cache management - such as
+reaping stale nodes and culling.  This is called cachefilesd and lives in
+/sbin.
+The filesystem and data integrity of the cache are only as good as those of the
+filesystem providing the backing services.  Note that CacheFiles does not
+attempt to journal anything since the journalling interfaces of the various
+filesystems are very specific in nature.
+CacheFiles creates a misc character device - "/dev/cachefiles" - that is used
+to communicate with the daemon.  Only one thing may have this open at once,
+and whilst it is open, a cache is at least partially in existence.  The daemon
+opens this and sends commands down it to control the cache.
+CacheFiles is currently limited to a single cache.
+CacheFiles attempts to maintain at least a certain percentage of free space on
+the filesystem, shrinking the cache by culling the objects it contains to make
+space if necessary - see the "Cache Culling" section.  This means it can be
+placed on the same medium as a live set of data, and will expand to make use of
+spare space and automatically contract when the set of data requires more
+space.
+============
+REQUIREMENTS
+============
+The use of CacheFiles and its daemon requires the following features to be
+available in the system and in the cache filesystem:
+        - dnotify.
+        - extended attributes (xattrs).
+        - openat() and friends.
+        - bmap() support on files in the filesystem (FIBMAP ioctl).
+        - The use of bmap() to detect a partial page at the end of the file.
+It is strongly recommended that the "dir_index" option is enabled on Ext3
+filesystems being used as a cache.
+=============
+CONFIGURATION
+=============
+The cache is configured by a script in /etc/cachefilesd.conf.  These commands
+set up the cache ready for use.  The following script commands are available:
+ (*) brun <N>%
+ (*) bcull <N>%
+ (*) bstop <N>%
+ (*) frun <N>%
+ (*) fcull <N>%
+ (*) fstop <N>%
+        Configure the culling limits.  Optional.  See the section on culling.
+        The defaults are 7% (run), 5% (cull) and 1% (stop) respectively.
+        The commands beginning with a 'b' are file space (block) limits, those
+        beginning with an 'f' are file count limits.
+ (*) dir <path>
+        Specify the directory containing the root of the cache.  Mandatory.
+ (*) tag <name>
+        Specify a tag to FS-Cache to use in distinguishing multiple caches.
+        Optional.  The default is "CacheFiles".
+ (*) debug <mask>
+        Specify a numeric bitmask to control debugging in the kernel module.
+        Optional.  The default is zero (all off).  The following values can be
+        OR'd into the mask to collect various information:
+                1       Turn on trace of function entry (_enter() macros)
+                2       Turn on trace of function exit (_leave() macros)
+                4       Turn on trace of internal debug points (_debug())
+        This mask can also be set through sysfs, eg:
+                echo 5 >/sys/module/cachefiles/parameters/debug
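The debug mask is a plain bitwise OR of the three documented values, which is why the value 5 in the example above enables entry tracing plus internal debug points. A small sketch, where the macro names and the helper are hypothetical and only the numeric values come from the text:

```c
/* Hypothetical names for the three documented mask bits. */
#define EXAMPLE_DEBUG_ENTER 1u  /* trace function entry, _enter() macros */
#define EXAMPLE_DEBUG_LEAVE 2u  /* trace function exit, _leave() macros */
#define EXAMPLE_DEBUG_POINT 4u  /* trace internal debug points, _debug() */

/* Build the numeric mask to put after "debug" in the configuration. */
static unsigned int example_debug_mask(int enter, int leave, int point)
{
        unsigned int mask = 0;

        if (enter)
                mask |= EXAMPLE_DEBUG_ENTER;
        if (leave)
                mask |= EXAMPLE_DEBUG_LEAVE;
        if (point)
                mask |= EXAMPLE_DEBUG_POINT;
        return mask;
}
```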
+==================
+STARTING THE CACHE
+==================
+The cache is started by running the daemon.  The daemon opens the cache device,
+configures the cache and tells it to begin caching.  At that point the cache
+binds to fscache and the cache becomes live.
+The daemon is run as follows:
+        /sbin/cachefilesd [-d]* [-s] [-n] [-f <configfile>]
+The flags are:
+ (*) -d
+        Increase the debugging level.  This can be specified multiple times and
+        is cumulative with itself.
+ (*) -s
+        Send messages to stderr instead of syslog.
+ (*) -n
+        Don't daemonise and go into background.
+ (*) -f <configfile>
+        Use an alternative configuration file rather than the default one.
+===============
+THINGS TO AVOID
+===============
+Do not mount other things within the cache as this will cause problems.  The
+kernel module contains its own very cut-down path walking facility that ignores
+mountpoints, but the daemon can't avoid them.
+Do not create, rename or unlink files and directories in the cache whilst the
+cache is active, as this may cause the state to become uncertain.
+Renaming files in the cache might make objects appear to be other objects (the
+filename is part of the lookup key).
+Do not change or remove the extended attributes attached to cache files by the
+cache as this will cause the cache state management to get confused.
+Do not create files or directories in the cache, lest the cache get confused or
+serve incorrect data.
+Do not chmod files in the cache.  The module creates things with minimal
+permissions to prevent random users being able to access them directly.
+=============
+CACHE CULLING
+=============
+The cache may need culling occasionally to make space.  This involves
+discarding objects from the cache that have been used less recently than
+anything else.  Culling is based on the access time of data objects.  Empty
+directories are culled if not in use.
+Cache culling is done on the basis of the percentage of blocks and the
+percentage of files available in the underlying filesystem.  There are six
+"limits":
+ (*) brun
+ (*) frun
+     If the amount of free space and the number of available files in the cache
+     rises above both these limits, then culling is turned off.
+ (*) bcull
+ (*) fcull
+     If the amount of available space or the number of available files in the
+     cache falls below either of these limits, then culling is started.
+ (*) bstop
+ (*) fstop
+     If the amount of available space or the number of available files in the
+     cache falls below either of these limits, then no further allocation of
+     disk space or files is permitted until culling has raised things above
+     these limits again.
+These must be configured thusly:
+        0 <= bstop < bcull < brun < 100
+        0 <= fstop < fcull < frun < 100
+Note that these are percentages of available space and available files, and do
+_not_ appear as 100 minus the percentage displayed by the "df" program.
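+The relationship between the six limits can be sketched in userspace C as
+follows.  This is only an illustration of the rules stated above, not the
+kernel module's code: the structure and function names are hypothetical, and
+the behaviour when a value is exactly equal to a limit is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

struct cull_limits {
	int brun, bcull, bstop;	/* block (space) limits, in percent */
	int frun, fcull, fstop;	/* file count limits, in percent */
};

struct cull_state {
	bool culling;		/* objects should be discarded */
	bool alloc_blocked;	/* no further space or files may be allocated */
};

/* Check 0 <= stop < cull < run < 100 for both sets of limits. */
bool cull_limits_valid(const struct cull_limits *l)
{
	return 0 <= l->bstop && l->bstop < l->bcull &&
	       l->bcull < l->brun && l->brun < 100 &&
	       0 <= l->fstop && l->fstop < l->fcull &&
	       l->fcull < l->frun && l->frun < 100;
}

/*
 * Update the state from the percentage of available blocks (bavail) and
 * available files (favail) in the underlying filesystem.
 */
void cull_update(const struct cull_limits *l, int bavail, int favail,
		 struct cull_state *s)
{
	assert(cull_limits_valid(l));

	if (bavail < l->bcull || favail < l->fcull)
		s->culling = true;	/* either one fell below its cull limit */
	else if (bavail > l->brun && favail > l->frun)
		s->culling = false;	/* both rose above their run limits */
	/* between cull and run the previous culling state is retained */

	/* assumption: a value exactly at the stop limit is not blocked */
	s->alloc_blocked = bavail < l->bstop || favail < l->fstop;
}
```

+With the default limits of 7% (run), 5% (cull) and 1% (stop), culling that
+started at 4% available space would, in this sketch, continue at 6% and only
+stop once both values exceed 7%.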
+The userspace daemon scans the cache to build up a table of cullable objects.
+These are then culled in least recently used order.  A new scan of the cache is
+started as soon as space is made in the table.  Objects will be skipped if
+their atimes have changed or if the kernel module says it is still using them.
+===============
+CACHE STRUCTURE
+===============
+The CacheFiles module will create two directories in the directory it was
+given:
+ (*) cache/
+ (*) graveyard/
+The active cache objects all reside in the first directory.  The CacheFiles
+kernel module moves any retired or culled objects that it can't simply unlink
+to the graveyard from which the daemon will actually delete them.
+The daemon uses dnotify to monitor the graveyard directory, and will delete
+anything that appears therein.
+The module represents index objects as directories with the filename "I..." or
+"J...".  Note that the "cache/" directory is itself a special index.
+Data objects are represented as files if they have no children, or directories
+if they do.  Their filenames all begin "D..." or "E...".  If represented as a
+directory, data objects will have a file in the directory called "data" that
+actually holds the data.
+Special objects are similar to data objects, except their filenames begin
+"S..." or "T...".
+If an object has children, then it will be represented as a directory.
+Immediately in the representative directory is a collection of directories
+named for hash values of the child object keys with an '@' prepended.  Into
+these directories, if possible, will be placed the representations of the child
+objects:
+        INDEX     INDEX      INDEX                             DATA FILES
+        ========= ========== ================================= ================
+        cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400
+        cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...DB1ry
+        cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...N22ry
+        cache/@4a/I03nfs/@30/Ji000000000000000--fHg8hi8400/@75/Es0g000w...FP1ry
+If the key is so long that it exceeds NAME_MAX with the decorations added on to
+it, then it will be cut into pieces, the first few of which will be used to
+make a nest of directories, and the last one of which will be the object
+inside the last directory.  The names of the intermediate directories will have
+'+' prepended:
+        J1223/@23/+xy...z/+kl...m/Epqr
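+The nesting shown above can be sketched as follows.  This is a simplified
+illustration rather than the algorithm CacheFiles actually uses: the function
+name is hypothetical, the key is assumed to be filename-safe already, and the
+piece length is passed in by the caller instead of being derived from NAME_MAX.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Cut key into pieces of at most piece_len characters and write them to out
 * as "+piece/+piece/<prefix>lastpiece".  Intermediate pieces get a '+'
 * prepended; the final piece gets the object type prefix (e.g. 'E').
 * Returns the length of the resulting string, or -1 if out is too small.
 */
int nest_long_key(const char *key, size_t piece_len, char prefix,
		  char *out, size_t out_size)
{
	size_t len = strlen(key), pos = 0, used = 0;

	assert(piece_len > 0 && len > 0);

	while (pos < len) {
		size_t n = len - pos < piece_len ? len - pos : piece_len;
		int last = (pos + n == len);

		/* decoration + piece + separator (or terminating NUL) */
		if (used + 1 + n + 1 > out_size)
			return -1;

		out[used++] = last ? prefix : '+';
		memcpy(out + used, key + pos, n);
		used += n;
		out[used++] = last ? '\0' : '/';
		pos += n;
	}
	return (int)(used - 1);
}
```

+For instance, with a piece length of 3 the key "xyzklmpqr" and prefix 'E' give
+"+xyz/+klm/Epqr" in this sketch.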
+Note that keys are raw data, and not only may they exceed NAME_MAX in size,
+they may also contain things like '/' and NUL characters, and so they may not
+be suitable for turning directly into a filename.
+To handle this, CacheFiles will use a suitably printable filename directly and
+"base-64" encode ones that aren't directly suitable.  The two versions of
+object filenames indicate the encoding:
+        OBJECT TYPE     PRINTABLE       ENCODED
+        =============== =============== ===============
+        Index           "I..."          "J..."
+        Data            "D..."          "E..."
+        Special         "S..."          "T..."
+Intermediate directories are always "@" or "+" as appropriate.
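+The choice of filename prefix in the table above can be sketched as below.  The
+names are hypothetical, and the test for a "suitably printable" key (printable
+ASCII with no space and no '/') is an assumption made for illustration; the
+criteria CacheFiles itself applies may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum cache_obj_type { OBJ_INDEX, OBJ_DATA, OBJ_SPECIAL };

/*
 * Assumed suitability test: every byte is printable, non-space ASCII and
 * is not '/'.  Keys are raw data, so the length is passed explicitly.
 */
bool key_is_printable(const unsigned char *key, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++)
		if (key[i] < 0x21 || key[i] > 0x7e || key[i] == '/')
			return false;
	return len > 0;
}

/* First character of the object's filename, as in the table above. */
char object_prefix(enum cache_obj_type type, bool encoded)
{
	static const char prefix[3][2] = {
		[OBJ_INDEX]   = { 'I', 'J' },
		[OBJ_DATA]    = { 'D', 'E' },
		[OBJ_SPECIAL] = { 'S', 'T' },
	};

	assert((int)type >= 0 && (int)type <= OBJ_SPECIAL);
	return prefix[type][encoded ? 1 : 0];
}
```

+A caller would pass encoded = !key_is_printable(key, len), so a data object
+whose key contains a NUL or '/' ends up with an "E..." name.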
+Each object in the cache has an extended attribute label that holds the object
+type ID (required to distinguish special objects) and the auxiliary data from
+the netfs.  The latter is used to detect stale objects in the cache and update
+or retire them.
+Note that CacheFiles will erase from the cache any file it doesn't recognise or
+any file of an incorrect type (such as a FIFO file or a device file).
+==========================
+SECURITY MODEL AND SELINUX
+==========================
+CacheFiles is implemented to deal properly with the LSM security features of
+the Linux kernel and the SELinux facility.
+One of the problems that CacheFiles faces is that it is generally acting on
+behalf of a process, and running in that process's context, and that includes a
+security context that is not appropriate for accessing the cache - either
+because the files in the cache are inaccessible to that process, or because if
+the process creates a file in the cache, that file may be inaccessible to other
+processes.
+The way CacheFiles works is to temporarily change the security context (fsuid,
+fsgid and actor security label) that the process acts as - without changing the
+security context of the process when it is the target of an operation performed
+by some other process (so signalling and suchlike still work correctly).
+When the CacheFiles module is asked to bind to its cache, it:
+ (1) Finds the security label attached to the root cache directory and uses
+     that as the security label with which it will create files.  By default,
+     this is:
+        cachefiles_var_t
+ (2) Finds the security label of the process which issued the bind request
+     (presumed to be the cachefilesd daemon), which by default will be:
+        cachefilesd_t
+     and asks LSM to supply a security ID as which it should act given the
+     daemon's label.  By default, this will be:
+        cachefiles_kernel_t
+     SELinux transitions the daemon's security ID to the module's security ID
+     based on a rule of this form in the policy:
+        type_transition <daemon's-ID> kernel_t : process <module's-ID>;
+     For instance:
+        type_transition cachefilesd_t kernel_t : process cachefiles_kernel_t;
+The module's security ID gives it permission to create, move and remove files
+and directories in the cache, to find and access directories and files in the
+cache, to set and access extended attributes on cache objects, and to read and
+write files in the cache.
+The daemon's security ID gives it only a very restricted set of permissions: it
+may scan directories, stat files and erase files and directories.  It may
+not read or write files in the cache, and so it is precluded from accessing the
+data cached therein; nor is it permitted to create new files in the cache.
+There are policy source files available in:
+        http://people.redhat.com/~dhowells/fscache/cachefilesd-0.8.tar.bz2
+and later versions.  In that tarball, see the files:
+        cachefilesd.te
+        cachefilesd.fc
+        cachefilesd.if
+They are built and installed directly by the RPM.
+If a non-RPM based system is being used, then copy the above files to their own
+directory and run:
+        make -f /usr/share/selinux/devel/Makefile
+        semodule -i cachefilesd.pp
+You will need checkpolicy and selinux-policy-devel installed prior to the
+build.
+By default, the cache is located in /var/fscache, but if it is desirable that
+it should be elsewhere, then either the above policy files must be altered, or
+an auxiliary policy must be installed to label the alternate location of the
+cache.
+For instructions on how to add an auxiliary policy to enable the cache to be
+located elsewhere when SELinux is in enforcing mode, please see:
+        /usr/share/doc/cachefilesd-*/move-cache.txt
+This is available when the cachefilesd rpm is installed; alternatively, the
+document can be found in the sources.
+==================
+A NOTE ON SECURITY
+==================
+CacheFiles makes use of the split security in the task_struct.  It allocates
+its own task_security structure, and redirects current->act_as to point to it
+when it acts on behalf of another process, in that process's context.
+The reason it does this is that it calls vfs_mkdir() and suchlike rather than
+bypassing security and calling inode ops directly.  Therefore the VFS and LSM
+may deny the CacheFiles access to the cache data because under some
+circumstances the caching code is running in the security context of whatever
+process issued the original syscall on the netfs.
+Furthermore, should CacheFiles create a file or directory, the security
+parameters with which that object is created (UID, GID, security label) would
+be derived from the process that issued the system call, thus potentially
+preventing other processes from accessing the cache - including CacheFiles's
+cache management daemon (cachefilesd).
+What is required is to temporarily override the security of the process that
+issued the system call.  We can't, however, just do an in-place change of the
+security data as that affects the process as an object, not just as a subject.
+This means it may lose signals or ptrace events for example, and affects what
+the process looks like in /proc.
+So CacheFiles makes use of a logical split in the security between the
+objective security (task->sec) and the subjective security (task->act_as).  The
+objective security holds the intrinsic security properties of a process and is
+never overridden.  This is what appears in /proc, and is what is used when a
+process is the target of an operation by some other process (SIGKILL for
+example).
+The subjective security holds the active security properties of a process, and
+may be overridden.  This is not seen externally, and is used when a process
+acts upon another object, for example SIGKILLing another process or opening a
+file.
+LSM hooks exist that allow SELinux (or Smack or whatever) to reject a request
+for CacheFiles to run in a context of a specific security label, or to create
+files and directories with another security label.
+=======================
+STATISTICAL INFORMATION
+=======================
+If CacheFiles is compiled with the following option enabled:
+        CONFIG_CACHEFILES_HISTOGRAM=y
+then it will gather certain statistics and display them through a proc file.
+ (*) /proc/fs/cachefiles/histogram
+        cat /proc/fs/cachefiles/histogram
+        JIFS  SECS  LOOKUPS   MKDIRS    CREATES
+        ===== ===== ========= ========= =========
+     This shows, for a variety of tasks, how many runs took each amount of time
+     between 0 jiffies and HZ-1 jiffies.  The columns are as follows:
+        COLUMN          TIME MEASUREMENT
+        =======         =======================================================
+        LOOKUPS         Length of time to perform a lookup on the backing fs
+        MKDIRS          Length of time to perform a mkdir on the backing fs
+        CREATES         Length of time to perform a create on the backing fs
+     Each row shows the number of events that took a particular range of times.
+     Each step is 1 jiffy in size.  The JIFS column indicates the particular
+     jiffy range covered, and the SECS field the equivalent number of seconds.
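+     The relationship between an elapsed time, its row in the histogram and the
+     SECS column can be sketched as below.  The function names are
+     hypothetical, HZ is passed in as a parameter, and counting anything that
+     took HZ jiffies or longer in the last row is an assumption of this sketch.

```c
#include <assert.h>

/* Row (JIFS value) in which an event of the given duration is counted. */
unsigned long hist_row(unsigned long elapsed_jiffies, unsigned long hz)
{
	assert(hz > 0);
	/* rows run from 0 to HZ-1; longer events are clamped (assumption) */
	return elapsed_jiffies < hz ? elapsed_jiffies : hz - 1;
}

/* Equivalent number of seconds for a row, as in the SECS column. */
double hist_row_secs(unsigned long row, unsigned long hz)
{
	assert(hz > 0);
	return (double)row / (double)hz;
}
```

+     With HZ at 250, for example, row 125 corresponds to 0.5 seconds.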
+=========
+DEBUGGING
+=========
+If CONFIG_CACHEFILES_DEBUG is enabled, the CacheFiles facility can have runtime
+debugging enabled by adjusting the value in:
+        /sys/module/cachefiles/parameters/debug
+This is a bitmask of debugging streams to enable:
+        BIT     VALUE   STREAM                          POINT
+        ======= ======= =============================== =======================
+        0       1       General                         Function entry trace
+        1       2                                       Function exit trace
+        2       4                                       General
+The appropriate set of values should be OR'd together and the result written to
+the control file.  For example:
+        echo $((1|4)) >/sys/module/cachefiles/parameters/debug
+will turn on function entry tracing and the general debug points.
diff --git a/Documentation/filesystems/caching/fscache.txt b/Documentation/filesystems/caching/fscache.txt
new file mode 100644
index 000000000000..9e94b9491d89
--- /dev/null
+++ b/Documentation/filesystems/caching/fscache.txt
@@ -0,0 +1,333 @@
+                          ==========================
+                          General Filesystem Caching
+                          ==========================
+========
+OVERVIEW
+========
+This facility is a general purpose cache for network filesystems, though it
+could be used for caching other things such as ISO9660 filesystems too.
+FS-Cache mediates between cache backends (such as CacheFS) and network
+filesystems:
+        +---------+
+        |         |                        +--------------+
+        |   NFS   |--+                     |              |
+        |         |  |                 +-->|   CacheFS    |
+        +---------+  |   +----------+  |   |  /dev/hda5   |
+                     |   |          |  |   +--------------+
+        +---------+  +-->|          |  |
+        |         |      |          |--+
+        |   AFS   |----->| FS-Cache |
+        |         |      |          |--+
+        +---------+  +-->|          |  |
+                     |   |          |  |   +--------------+
+        +---------+  |   +----------+  |   |              |
+        |         |  |                 +-->|  CacheFiles  |
+        |  ISOFS  |--+                     |  /var/cache  |
+        |         |                        +--------------+
+        +---------+
+Or to look at it another way, FS-Cache is a module that provides a caching
+facility to a network filesystem such that the cache is transparent to the
+user:
+        +---------+
+        |         |
+        | Server  |
+        |         |
+        +---------+
+             |                  NETWORK
+        ~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+             |
+             |           +----------+
+             V           |          |
+        +---------+      |          |
+        |         |      |          |
+        |   NFS   |----->| FS-Cache |
+        |         |      |          |--+
+        +---------+      |          |  |   +--------------+   +--------------+
+             |           |          |  |   |              |   |              |
+             V           +----------+  +-->|  CacheFiles  |-->|  Ext3        |
+        +---------+                        |  /var/cache  |   |  /dev/sda6   |
+        |         |                        +--------------+   +--------------+
+        |   VFS   |                                ^                     ^
+        |         |                                |                     |
+        +---------+                                +--------------+      |
+             |                  KERNEL SPACE                      |      |
+        ~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|~~~~~~|~~~~
+             |                  USER SPACE                        |      |
+             V                                                    |      |
+        +---------+                                           +--------------+
+        |         |                                           |              |
+        | Process |                                           | cachefilesd  |
+        |         |                                           |              |
+        +---------+                                           +--------------+
+FS-Cache does not follow the idea of completely loading every netfs file
+opened in its entirety into a cache before permitting it to be accessed and
+then serving the pages out of that cache rather than the netfs inode because:
+ (1) It must be practical to operate without a cache.
+ (2) The size of any accessible file must not be limited to the size of the
+     cache.
+ (3) The combined size of all opened files (this includes mapped libraries)
+     must not be limited to the size of the cache.
+ (4) The user should not be forced to download an entire file just to do a
+     one-off access of a small portion of it (such as might be done with the
+     "file" program).
+It instead serves the cache out in PAGE_SIZE chunks as and when requested by
+the netfs('s) using it.
+FS-Cache provides the following facilities:
+ (1) More than one cache can be used at once.  Caches can be selected
+     explicitly by use of tags.
+ (2) Caches can be added / removed at any time.
+ (3) The netfs is provided with an interface that allows either party to
+     withdraw caching facilities from a file (required for (2)).
+ (4) The interface to the netfs returns as few errors as possible, preferring
+     rather to let the netfs remain oblivious.
+ (5) Cookies are used to represent indices, files and other objects to the
+     netfs.  The simplest cookie is just a NULL pointer - indicating nothing
+     cached there.
+ (6) The netfs is allowed to propose - dynamically - any index hierarchy it
+     desires, though it must be aware that the index search function is
+     recursive, stack space is limited, and indices can only be children of
+     indices.
+ (7) Data I/O is done direct to and from the netfs's pages.  The netfs
+     indicates that page A is at index B of the data-file represented by cookie
+     C, and that it should be read or written.  The cache backend may or may
+     not start I/O on that page, but if it does, a netfs callback will be
+     invoked to indicate completion.  The I/O may be either synchronous or
+     asynchronous.
+ (8) Cookies can be "retired" upon release.  At this point FS-Cache will mark
+     them as obsolete and the index hierarchy rooted at that point will get
+     recycled.
+ (9) The netfs provides a "match" function for index searches.  In addition to
+     saying whether a match was made or not, this can also specify that an
+     entry should be updated or deleted.
+(10) As much as possible is done asynchronously.
+FS-Cache maintains a virtual indexing tree in which all indices, files, objects
+and pages are kept.  Bits of this tree may actually reside in one or more
+caches.
+                                           FSDEF
+                                             |
+                        +------------------------------------+
+                        |                                    |
+                       NFS                                  AFS
+                        |                                    |
+           +--------------------------+                +-----------+
+           |                          |                |           |
+        homedir                     mirror          afs.org   redhat.com
+           |                          |                            |
+     +------------+           +---------------+              +----------+
+     |            |           |               |              |          |
+   00001        00002       00007           00125        vol00001   vol00002
+     |            |           |               |                         |
+ +---+---+     +-----+      +---+      +------+------+            +-----+----+
+ |   |   |     |     |      |   |      |      |      |            |     |    |
+PG0 PG1 PG2   PG0  XATTR   PG0 PG1   DIRENT DIRENT DIRENT        R/W   R/O  Bak
+                     |                                            |
+                    PG0                                       +-------+
+                                                              |       |
+                                                            00001   00003
+                                                              |
+                                                          +---+---+
+                                                          |   |   |
+                                                         PG0 PG1 PG2
+In the example above, you can see two netfs's being backed: NFS and AFS.  These
+have different index hierarchies:
+ (*) The NFS primary index contains per-server indices.  Each server index is
+     indexed by NFS file handles to get data file objects.  Each data file
+     object can have an array of pages, but may also have further child
+     objects, such as extended attributes and directory entries.  Extended
+     attribute objects themselves have page-array contents.
+ (*) The AFS primary index contains per-cell indices.  Each cell index contains
+     per-logical-volume indices.  Each volume index contains up to three
+     indices for the read-write, read-only and backup mirrors of that volume.
+     Each of these contains vnode data file objects, each of which contains an
+     array of pages.
+The very top index is the FS-Cache master index in which individual netfs's
+have entries.
+Any index object may reside in more than one cache, provided it only has index
+children.  Any index with non-index object children will be assumed to only
+reside in one cache.
+The netfs API to FS-Cache can be found in:
+        Documentation/filesystems/caching/netfs-api.txt
+The cache backend API to FS-Cache can be found in:
+        Documentation/filesystems/caching/backend-api.txt
+A description of the internal representations and object state machine can be
+found in:
+        Documentation/filesystems/caching/object.txt
+=======================
+STATISTICAL INFORMATION
+=======================
+If FS-Cache is compiled with the following options enabled:
+        CONFIG_FSCACHE_STATS=y
+        CONFIG_FSCACHE_HISTOGRAM=y
+then it will gather certain statistics and display them through a number of
+proc files.
+ (*) /proc/fs/fscache/stats
+     This shows counts of a number of events that can happen in FS-Cache:
+        CLASS   EVENT   MEANING
+        ======= ======= =======================================================
+        Cookies idx=N   Number of index cookies allocated
+                dat=N   Number of data storage cookies allocated
+                spc=N   Number of special cookies allocated
+        Objects alc=N   Number of objects allocated
+                nal=N   Number of object allocation failures
+                avl=N   Number of objects that reached the available state
+                ded=N   Number of objects that reached the dead state
+        ChkAux  non=N   Number of objects that didn't have a coherency check
+                ok=N    Number of objects that passed a coherency check
+                upd=N   Number of objects that needed a coherency data update
+                obs=N   Number of objects that were declared obsolete
+        Pages   mrk=N   Number of pages marked as being cached
+                unc=N   Number of uncache page requests seen
+        Acquire n=N     Number of acquire cookie requests seen
+                nul=N   Number of acq reqs given a NULL parent
+                noc=N   Number of acq reqs rejected due to no cache available
+                ok=N    Number of acq reqs succeeded
+                nbf=N   Number of acq reqs rejected due to error
+                oom=N   Number of acq reqs failed on ENOMEM
+        Lookups n=N     Number of lookup calls made on cache backends
+                neg=N   Number of negative lookups made
+                pos=N   Number of positive lookups made
+                crt=N   Number of objects created by lookup
+        Updates n=N     Number of update cookie requests seen
+                nul=N   Number of upd reqs given a NULL parent
+                run=N   Number of upd reqs granted CPU time
+        Relinqs n=N     Number of relinquish cookie requests seen
+                nul=N   Number of rlq reqs given a NULL parent
+                wcr=N   Number of rlq reqs waited on completion of creation
+        AttrChg n=N     Number of attribute changed requests seen
+                ok=N    Number of attr changed requests queued
+                nbf=N   Number of attr changed rejected -ENOBUFS
+                oom=N   Number of attr changed failed -ENOMEM
+                run=N   Number of attr changed ops given CPU time
+        Allocs  n=N     Number of allocation requests seen
+                ok=N    Number of successful alloc reqs
+                wt=N    Number of alloc reqs that waited on lookup completion
+                nbf=N   Number of alloc reqs rejected -ENOBUFS
+                ops=N   Number of alloc reqs submitted
+                owt=N   Number of alloc reqs waited for CPU time
+        Retrvls n=N     Number of retrieval (read) requests seen
+                ok=N    Number of successful retr reqs
+                wt=N    Number of retr reqs that waited on lookup completion
+                nod=N   Number of retr reqs returned -ENODATA
+                nbf=N   Number of retr reqs rejected -ENOBUFS
+                int=N   Number of retr reqs aborted -ERESTARTSYS
+                oom=N   Number of retr reqs failed -ENOMEM
+                ops=N   Number of retr reqs submitted
+                owt=N   Number of retr reqs waited for CPU time
+        Stores  n=N     Number of storage (write) requests seen
+                ok=N    Number of successful store reqs
+                agn=N   Number of store reqs on a page already pending storage
+                nbf=N   Number of store reqs rejected -ENOBUFS
+                oom=N   Number of store reqs failed -ENOMEM
+                ops=N   Number of store reqs submitted
+                run=N   Number of store reqs granted CPU time
+        Ops     pend=N  Number of times async ops added to pending queues
+                run=N   Number of times async ops given CPU time
+                enq=N   Number of times async ops queued for processing
+                dfr=N   Number of async ops queued for deferred release
+                rel=N   Number of async ops released
+                gc=N    Number of deferred-release async ops garbage collected
+ (*) /proc/fs/fscache/histogram
+        cat /proc/fs/fscache/histogram
+        JIFS  SECS  OBJ INST  OP RUNS   OBJ RUNS  RETRV DLY RETRIEVLS
+        ===== ===== ========= ========= ========= ========= =========
+     This shows, for a variety of tasks, how many runs took each amount of time
+     between 0 jiffies and HZ-1 jiffies.  The columns are as follows:
+        COLUMN          TIME MEASUREMENT
+        =======         =======================================================
+        OBJ INST        Length of time to instantiate an object
+        OP RUNS         Length of time a call to process an operation took
+        OBJ RUNS        Length of time a call to process an object event took
+        RETRV DLY       Time between requesting a read and lookup completing
+        RETRIEVLS       Time between beginning and end of a retrieval
+     Each row shows the number of events that took a particular range of times.
+     Each step is 1 jiffy in size.  The JIFS column indicates the particular
+     jiffy range covered, and the SECS field the equivalent number of seconds.
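+The counters listed for /proc/fs/fscache/stats are given as key=N pairs.  A
+monitoring tool could pick a single counter out of one line of that file
+roughly as sketched below.  The function name is hypothetical, and the exact
+line layout (a class label followed by space-separated key=N tokens) is
+assumed from the table above rather than specified by it.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Find "key=N" in one line of stats text and store N in *value.
 * The key must start a token so that "n" does not match inside "oom".
 * Returns true if the counter was found.
 */
bool stat_lookup(const char *line, const char *key, unsigned long *value)
{
	size_t klen = strlen(key);
	const char *p = line;

	assert(klen > 0);

	for (; (p = strstr(p, key)) != NULL; p++) {
		bool token_start = p == line || p[-1] == ' ' || p[-1] == '\t';

		if (token_start && p[klen] == '=' &&
		    isdigit((unsigned char)p[klen + 1])) {
			*value = strtoul(p + klen + 1, NULL, 10);
			return true;
		}
	}
	return false;
}
```

+Given a line such as "Acquire: n=5 nul=0 noc=1 ok=4 nbf=0 oom=2" (an invented
+sample), looking up "ok" yields 4 and looking up "n" yields 5.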
+=========
+DEBUGGING
+=========
+If CONFIG_FSCACHE_DEBUG is enabled, the FS-Cache facility can have runtime
+debugging enabled by adjusting the value in:
+        /sys/module/fscache/parameters/debug
+This is a bitmask of debugging streams to enable:
+        BIT     VALUE   STREAM                          POINT
+        ======= ======= =============================== =======================
+        0       1       Cache management                Function entry trace
+        1       2                                       Function exit trace
+        2       4                                       General
+        3       8       Cookie management               Function entry trace
+        4       16                                      Function exit trace
+        5       32                                      General
+        6       64      Page handling                   Function entry trace
+        7       128                                     Function exit trace
+        8       256                                     General
+        9       512     Operation management            Function entry trace
+        10      1024                                    Function exit trace
+        11      2048                                    General
+The appropriate set of values should be OR'd together and the result written to
+the control file.  For example:
+        echo $((1|8|64|512)) >/sys/module/fscache/parameters/debug
+will turn on all function entry debugging.
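+The values in the table follow a regular pattern: each stream owns three
+consecutive bits (entry, exit, general).  The sketch below computes them; the
+enum and function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Streams and debug points in the order they appear in the table above. */
enum dbg_stream { DBG_CACHE, DBG_COOKIE, DBG_PAGE, DBG_OPERATION, DBG_NR_STREAMS };
enum dbg_point  { DBG_ENTER, DBG_LEAVE, DBG_GENERAL, DBG_NR_POINTS };

/* Value to OR into the mask for one stream and one debug point. */
unsigned int debug_bit(enum dbg_stream stream, enum dbg_point point)
{
	assert((int)stream >= 0 && stream < DBG_NR_STREAMS);
	assert((int)point >= 0 && point < DBG_NR_POINTS);
	return 1u << ((unsigned int)stream * DBG_NR_POINTS + (unsigned int)point);
}

/* Mask enabling the same debug point in every stream. */
unsigned int debug_mask_all(enum dbg_point point)
{
	unsigned int mask = 0;
	int s;

	for (s = 0; s < DBG_NR_STREAMS; s++)
		mask |= debug_bit((enum dbg_stream)s, point);
	return mask;
}
```

+For the four streams listed, enabling function entry tracing everywhere gives
+1|8|64|512, i.e. 585.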
diff --git a/Documentation/filesystems/caching/netfs-api.txt b/Documentation/filesystems/caching/netfs-api.txt
new file mode 100644
index 000000000000..4db125b3a5c6
--- /dev/null
+++ b/Documentation/filesystems/caching/netfs-api.txt
@@ -0,0 +1,778 @@
+                        ===============================
+                        FS-CACHE NETWORK FILESYSTEM API
+                        ===============================
+There's an API by which a network filesystem can make use of the FS-Cache
+facilities.  This is based around a number of principles:
+ (1) Caches can store a number of different object types.  There are two main
+     object types: indices and files.  The first is a special type used by
+     FS-Cache to make finding objects faster and to make retiring of groups of
+     objects easier.
+ (2) Every index, file or other object is represented by a cookie.  This cookie
+     may or may not have anything associated with it, but the netfs doesn't
+     need to care.
+ (3) Barring the top-level index (one entry per cached netfs), the index
+     hierarchy for each netfs is structured according to the whim of the netfs.
+This API is declared in <linux/fscache.h>.
+This document contains the following sections:
+         (1) Network filesystem definition
+         (2) Index definition
+         (3) Object definition
+         (4) Network filesystem (un)registration
+         (5) Cache tag lookup
+         (6) Index registration
+         (7) Data file registration
+         (8) Miscellaneous object registration
+         (9) Setting the data file size
+        (10) Page alloc/read/write
+        (11) Page uncaching
+        (12) Index and data file update
+        (13) Miscellaneous cookie operations
+        (14) Cookie unregistration
+        (15) Index and data file invalidation
+        (16) FS-Cache specific page flags
+=============================
+NETWORK FILESYSTEM DEFINITION
+=============================
+FS-Cache needs a description of the network filesystem.  This is specified
+using a record of the following structure:
+        struct fscache_netfs {
+                uint32_t                        version;
+                const char                      *name;
+                struct fscache_cookie           *primary_index;
+                ...
+        };
+The first two fields should be filled in before registration, and the third
+will be filled in by the registration function; any other fields should just be
+ignored and are for internal use only.
+The fields are:
+ (1) The name of the netfs (used as the key in the toplevel index).
+ (2) The version of the netfs (if the name matches but the version doesn't, the
+     entire in-cache hierarchy for this netfs will be scrapped and begun
+     afresh).
+ (3) The cookie representing the primary index will be allocated according to
+     another parameter passed into the registration function.
+For example, kAFS (linux/fs/afs/) uses the following definitions to describe
+itself:
+        struct fscache_netfs afs_cache_netfs = {
+                .version        = 0,
+                .name           = "afs",
+        };
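+The effect of the name and version fields can be sketched in user space as
+follows.  This is only an illustration of the rule stated above, not FS-Cache
+code: struct demo_netfs, enum demo_netfs_match and demo_match_netfs() are
+hypothetical names invented for the sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical outcome of comparing a netfs against a toplevel index entry. */
enum demo_netfs_match {
	DEMO_NETFS_NO_ENTRY,	/* name differs: not this netfs's entry */
	DEMO_NETFS_REUSE,	/* name and version match: keep the hierarchy */
	DEMO_NETFS_SCRAP,	/* name matches, version differs: begin afresh */
};

/* Stand-in for the two fields the netfs fills in before registration. */
struct demo_netfs {
	uint32_t version;
	const char *name;
};

/* Apply the rule: the name is the key, the version decides reuse or scrap. */
enum demo_netfs_match demo_match_netfs(const struct demo_netfs *netfs,
				       const char *cached_name,
				       uint32_t cached_version)
{
	assert(netfs && netfs->name && cached_name);

	if (strcmp(netfs->name, cached_name) != 0)
		return DEMO_NETFS_NO_ENTRY;
	if (netfs->version != cached_version)
		return DEMO_NETFS_SCRAP;
	return DEMO_NETFS_REUSE;
}
```

+A netfs that changes its on-disk index layout would therefore bump its version
+number so that stale hierarchies are discarded rather than misread.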
+================
+INDEX DEFINITION
+================
+Indices are used for two purposes:
+ (1) To aid the finding of a file based on a series of keys (such as AFS's
+     "cell", "volume ID", "vnode ID").
+ (2) To make it easier to discard a subset of all the files cached based around
+     a particular key - for instance to mirror the removal of an AFS volume.
+However, since it's unlikely that any two netfs's are going to want to define
+their index hierarchies in quite the same way, FS-Cache tries to impose as few
+restraints as possible on how an index is structured and where it is placed in
+the tree.  The netfs can even mix indices and data files at the same level, but
+it's not recommended.
+Each index entry consists of a key of indeterminate length plus some auxiliary
+data, also of indeterminate length.
+There are some limits on indices:
+ (1) Any index containing non-index objects should be restricted to a single
+     cache.  Any such objects created within an index will be created in the
+     first cache only.  The cache in which an index is created can be
+     controlled by cache tags (see below).
+ (2) The entry data must be atomically journallable, so it is limited to about
+     400 bytes at present.  At least 400 bytes will be available.
+ (3) The depth of the index tree should be judged with care as the search
+     function is recursive.  Too many layers will run the kernel out of stack.
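+A netfs may wish to check at build or run time that the key and auxiliary data
+it intends to supply fit within the entry size limit.  The sketch below assumes
+a budget of 400 bytes taken from the figure quoted above; DEMO_ENTRY_BUDGET and
+demo_entry_fits() are hypothetical names and are not provided by FS-Cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed budget, from the "about 400 bytes" figure; not an FS-Cache symbol. */
#define DEMO_ENTRY_BUDGET 400

/* Return true if a key and its auxiliary data fit in one index entry. */
bool demo_entry_fits(size_t key_len, size_t aux_len)
{
	/* Phrased as a subtraction so that key_len + aux_len cannot overflow. */
	return key_len <= DEMO_ENTRY_BUDGET &&
	       aux_len <= DEMO_ENTRY_BUDGET - key_len;
}
```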
+=================
+OBJECT DEFINITION
+=================
+To define an object, a structure of the following type should be filled out:
+        struct fscache_cookie_def
+        {
+                uint8_t name[16];
+                uint8_t type;
+                struct fscache_cache_tag *(*select_cache)(
+                        const void *parent_netfs_data,
+                        const void *cookie_netfs_data);
+                uint16_t (*get_key)(const void *cookie_netfs_data,
+                                    void *buffer,
+                                    uint16_t bufmax);
+                void (*get_attr)(const void *cookie_netfs_data,
+                                 uint64_t *size);
+                uint16_t (*get_aux)(const void *cookie_netfs_data,
+                                    void *buffer,
+                                    uint16_t bufmax);
+                enum fscache_checkaux (*check_aux)(void *cookie_netfs_data,
+                                                   const void *data,
+                                                   uint16_t datalen);
+                void (*get_context)(void *cookie_netfs_data, void *context);
+                void (*put_context)(void *cookie_netfs_data, void *context);
+                void (*mark_pages_cached)(void *cookie_netfs_data,
+                                          struct address_space *mapping,
+                                          struct pagevec *cached_pvec);
+                void (*now_uncached)(void *cookie_netfs_data);
+        };
+This has the following fields:
+ (1) The type of the object [mandatory].
+     This is one of the following values:
+        (*) FSCACHE_COOKIE_TYPE_INDEX
+            This defines an index, which is a special FS-Cache type.
+        (*) FSCACHE_COOKIE_TYPE_DATAFILE
+            This defines an ordinary data file.
+        (*) Any other value between 2 and 255
+            This defines an extraordinary object such as an XATTR.
+ (2) The name of the object type (NUL terminated unless all 16 chars are used)
+     [optional].
+ (3) A function to select the cache in which to store an index [optional].
+     This function is invoked when an index needs to be instantiated in a cache
+     during the instantiation of a non-index object.  Only the immediate index
+     parent for the non-index object will be queried.  Any indices above that
+     in the hierarchy may be stored in multiple caches.  This function does not
+     need to be supplied for any non-index object or any index that will only
+     have index children.
+     If this function is not supplied or if it returns NULL then the first
+     cache in the parent's list will be chosen, or failing that, the first
+     cache in the master list.
+ (4) A function to retrieve an object's key from the netfs [mandatory].
+     This function will be called with the netfs data that was passed to the
+     cookie acquisition function and the maximum length of key data that it may
+     provide.  It should write the required key data into the given buffer and
+     return the quantity it wrote.
+ (5) A function to retrieve attribute data from the netfs [optional].
+     This function will be called with the netfs data that was passed to the
+     cookie acquisition function.  It should return the size of the file if
+     this is a data file.  The size may be used to govern how much cache must
+     be reserved for this file in the cache.
+     If the function is absent, a file size of 0 is assumed.
+ (6) A function to retrieve auxiliary data from the netfs [optional].
+     This function will be called with the netfs data that was passed to the
+     cookie acquisition function and the maximum length of auxiliary data that
+     it may provide.  It should write the auxiliary data into the given buffer
+     and return the quantity it wrote.
+     If this function is absent, the auxiliary data length will be set to 0.
+     The length of the auxiliary data buffer may be dependent on the key
+     length.  A netfs mustn't rely on being able to provide more than 400 bytes
+     for both.
+ (7) A function to check the auxiliary data [optional].
+     This function will be called to check that a match found in the cache for
+     this object is valid.  For instance with AFS it could check the auxiliary
+     data against the data version number returned by the server to determine
+     whether the index entry in a cache is still valid.
+     If this function is absent, it will be assumed that matching objects in a
+     cache are always valid.
+     If present, the function should return one of the following values:
+        (*) FSCACHE_CHECKAUX_OKAY               - the entry is okay as is
+        (*) FSCACHE_CHECKAUX_NEEDS_UPDATE       - the entry requires update
+        (*) FSCACHE_CHECKAUX_OBSOLETE           - the entry should be deleted
+     This function can also be used to extract data from the auxiliary data in
+     the cache and copy it into the netfs's structures.
+ (8) A pair of functions to manage contexts for the completion callback
+     [optional].
+     The cache read/write functions are passed a context which is then passed
+     to the I/O completion callback function.  To ensure this context remains
+     valid until after the I/O completion is called, two functions may be
+     provided: one to get an extra reference on the context, and one to drop a
+     reference to it.
+     If the context is not used or is a type of object that won't go out of
+     scope, then these functions are not required.  These functions are not
+     required for indices as indices may not contain data.  These functions may
+     be called in interrupt context and so may not sleep.
+ (9) A function to mark a page as retaining cache metadata [optional].
+     This is called by the cache to indicate that it is retaining in-memory
+     information for this page and that the netfs should uncache the page when
+     it has finished.  This does not indicate whether there's data on the disk
+     or not.  Note that several pages at once may be presented for marking.
+     The PG_fscache bit is set on the pages before this function would be
+     called, so the function need not be provided if this is sufficient.
+     This function is not required for indices as they're not permitted data.
+(10) A function to unmark all the pages retaining cache metadata [mandatory].
+     This is called by FS-Cache to indicate that a backing store is being
+     unbound from a cookie and that all the marks on the pages should be
+     cleared to prevent confusion.  Note that the cache will have torn down all
+     its tracking information so that the pages don't need to be explicitly
+     uncached.
+     This function is not required for indices as they're not permitted data.
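+The shape of the get_key(), get_aux() and check_aux() operations can be
+sketched in user space as below.  This is a hedged illustration only: struct
+demo_vnode and enum demo_checkaux are stand-ins invented for the sketch (the
+latter mirrors enum fscache_checkaux), and returning 0 when the buffer is too
+small is a choice made here, not something the API mandates.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for enum fscache_checkaux; the values are local to this sketch. */
enum demo_checkaux {
	DEMO_CHECKAUX_OKAY,
	DEMO_CHECKAUX_NEEDS_UPDATE,
	DEMO_CHECKAUX_OBSOLETE,
};

/* Hypothetical per-file netfs record passed as cookie_netfs_data. */
struct demo_vnode {
	uint32_t vnode_id;	/* used as the cache key */
	uint64_t data_version;	/* used as the auxiliary data */
};

/* get_key(): copy the key into the buffer and return the length written. */
uint16_t demo_get_key(const void *cookie_netfs_data,
		      void *buffer, uint16_t bufmax)
{
	const struct demo_vnode *vnode = cookie_netfs_data;

	assert(vnode && buffer);
	if (bufmax < sizeof(vnode->vnode_id))
		return 0;	/* sketch's choice: no room, no key */
	memcpy(buffer, &vnode->vnode_id, sizeof(vnode->vnode_id));
	return sizeof(vnode->vnode_id);
}

/* get_aux(): copy the auxiliary data and return the length written. */
uint16_t demo_get_aux(const void *cookie_netfs_data,
		      void *buffer, uint16_t bufmax)
{
	const struct demo_vnode *vnode = cookie_netfs_data;

	assert(vnode && buffer);
	if (bufmax < sizeof(vnode->data_version))
		return 0;
	memcpy(buffer, &vnode->data_version, sizeof(vnode->data_version));
	return sizeof(vnode->data_version);
}

/* check_aux(): compare the cached version against the current one. */
enum demo_checkaux demo_check_aux(void *cookie_netfs_data,
				  const void *data, uint16_t datalen)
{
	const struct demo_vnode *vnode = cookie_netfs_data;
	uint64_t cached;

	assert(vnode && data);
	if (datalen != sizeof(cached))
		return DEMO_CHECKAUX_OBSOLETE;
	memcpy(&cached, data, sizeof(cached));
	return cached == vnode->data_version ?
		DEMO_CHECKAUX_OKAY : DEMO_CHECKAUX_OBSOLETE;
}
```

+Here a version mismatch is treated as grounds for deletion; a netfs that can
+revalidate cheaply might return the "needs update" value instead.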
+===================================
+NETWORK FILESYSTEM (UN)REGISTRATION
+===================================
+The first step is to declare the network filesystem to the cache.  This also
+involves specifying the layout of the primary index (for AFS, this would be the
+"cell" level).
+The registration function is:
+        int fscache_register_netfs(struct fscache_netfs *netfs);
+It just takes a pointer to the netfs definition.  It returns 0 or an error as
+appropriate.
+For kAFS, registration is done as follows:
+        ret = fscache_register_netfs(&afs_cache_netfs);
+The last step is, of course, unregistration:
+        void fscache_unregister_netfs(struct fscache_netfs *netfs);
+================
+CACHE TAG LOOKUP
+================
+FS-Cache permits the use of more than one cache.  To permit particular index
+subtrees to be bound to particular caches, the second step is to look up cache
+representation tags.  This step is optional; it can be left entirely up to
+FS-Cache as to which cache should be used.  The problem with doing that is that
+FS-Cache will always pick the first cache that was registered.
+To get the representation for a named tag:
+        struct fscache_cache_tag *fscache_lookup_cache_tag(const char *name);
+This takes a text string as the name and returns a representation of a tag.  It
+will never return an error.  It may return a dummy tag, however, if it runs out
+of memory; this will inhibit caching with this tag.
+Any representation so obtained must be released by passing it to this function:
+        void fscache_release_cache_tag(struct fscache_cache_tag *tag);
+The tag will be retrieved by FS-Cache when it calls the object definition
+operation select_cache().
+==================
+INDEX REGISTRATION
+==================
+The third step is to inform FS-Cache about part of an index hierarchy that can
+be used to locate files.  This is done by requesting a cookie for each index in
+the path to the file:
+        struct fscache_cookie *
+        fscache_acquire_cookie(struct fscache_cookie *parent,
+                               const struct fscache_cookie_def *def,
+                               void *netfs_data);
+This function creates an index entry in the index represented by parent,
+filling in the index entry by calling the operations pointed to by def.
+Note that this function never returns an error - all errors are handled
+internally.  It may, however, return NULL to indicate no cookie.  It is quite
+acceptable to pass this token back to this function as the parent to another
+acquisition (or even to the relinquish cookie, read page and write page
+functions - see below).
+Note also that no indices are actually created in a cache until a non-index
+object needs to be created somewhere down the hierarchy.  Furthermore, an index
+may be created in several different caches independently at different times.
+This is all handled transparently, and the netfs doesn't see any of it.
+For example, with AFS, a cell would be added to the primary index.  This index
+entry would have a dependent inode containing a volume location index for the
+volume mappings within this cell:
+        cell->cache =
+                fscache_acquire_cookie(afs_cache_netfs.primary_index,
+                                       &afs_cell_cache_index_def,
+                                       cell);
+Then when a volume location was accessed, it would be entered into the cell's
+index and an inode would be allocated that acts as a volume type and hash chain
+combination:
+        vlocation->cache =
+                fscache_acquire_cookie(cell->cache,
+                                       &afs_vlocation_cache_index_def,
+                                       vlocation);
+And then a particular flavour of volume (R/O for example) could be added to
+that index, creating another index for vnodes (AFS inode equivalents):
+        volume->cache =
+                fscache_acquire_cookie(vlocation->cache,
+                                       &afs_volume_cache_index_def,
+                                       volume);
+======================
+DATA FILE REGISTRATION
+======================
+The fourth step is to request a data file be created in the cache.  This is
+identical to index cookie acquisition.  The only difference is that the type in
+the object definition should be something other than index type.
+        vnode->cache =
+                fscache_acquire_cookie(volume->cache,
+                                       &afs_vnode_cache_object_def,
+                                       vnode);
+=================================
+MISCELLANEOUS OBJECT REGISTRATION
+=================================
+An optional step is to request an object of miscellaneous type be created in
+the cache.  This is almost identical to index cookie acquisition.  The only
+difference is that the type in the object definition should be something other
+than index type.  Whilst the parent object could be an index, it's more likely
+it would be some other type of object such as a data file.
+        xattr->cache =
+                fscache_acquire_cookie(vnode->cache,
+                                       &afs_xattr_cache_object_def,
+                                       xattr);
+Miscellaneous objects might be used to store extended attributes or directory
+entries for example.
+==========================
+SETTING THE DATA FILE SIZE
+==========================
+The fifth step is to set the physical attributes of the file, such as its size.
+This doesn't automatically reserve any space in the cache, but permits the
+cache to adjust its metadata for data tracking appropriately:
+        int fscache_attr_changed(struct fscache_cookie *cookie);
+The cache will return -ENOBUFS if there is no backing cache or if there is no
+space to allocate any extra metadata required in the cache.  The attributes
+will be accessed with the get_attr() cookie definition operation.
+Note that attempts to read or write data pages in the cache over this size may
+be rebuffed with -ENOBUFS.
+This operation schedules an attribute adjustment to happen asynchronously at
+some point in the future, and as such, it may happen after the function returns
+to the caller.  The attribute adjustment excludes read and write operations.
+=====================
+PAGE READ/ALLOC/WRITE
+=====================
+And the sixth step is to store and retrieve pages in the cache.  There are
+three functions that are used to do this.
+Note:
+ (1) A page should not be re-read or re-allocated without uncaching it first.
+ (2) A read or allocated page must be uncached when the netfs page is released
+     from the pagecache.
+ (3) A page should only be written to the cache if previously read or allocated.
+This permits the cache to maintain its page tracking in proper order.
+PAGE READ
+---------
+Firstly, the netfs should ask FS-Cache to examine the caches and read the
+contents cached for a particular page of a particular file if present, or else
+allocate space to store the contents if not:
+        typedef
+        void (*fscache_rw_complete_t)(struct page *page,
+                                      void *context,
+                                      int error);
+        int fscache_read_or_alloc_page(struct fscache_cookie *cookie,
+                                       struct page *page,
+                                       fscache_rw_complete_t end_io_func,
+                                       void *context,
+                                       gfp_t gfp);
+The cookie argument must specify a cookie for an object that isn't an index,
+the page specified will have the data loaded into it (and is also used to
+specify the page number), and the gfp argument is used to control how any
+memory allocations made are satisfied.
+If the cookie indicates the inode is not cached:
+ (1) The function will return -ENOBUFS.
+Else if there's a copy of the page resident in the cache:
+ (1) The mark_pages_cached() cookie operation will be called on that page.
+ (2) The function will submit a request to read the data from the cache's
+     backing device directly into the page specified.
+ (3) The function will return 0.
+ (4) When the read is complete, end_io_func() will be invoked with:
+     (*) The netfs data supplied when the cookie was created.
+     (*) The page descriptor.
+     (*) The context argument passed to the above function.  This will be
+         maintained with the get_context/put_context functions mentioned above.
+     (*) An argument that's 0 on success or negative for an error code.
+     If an error occurs, it should be assumed that the page contains no usable
+     data.
+     end_io_func() will be called in process context if the read results in
+     an error, but it might be called in interrupt context if the read is
+     successful.
+Otherwise, if there's not a copy available in cache, but the cache may be able
+to store the page:
+ (1) The mark_pages_cached() cookie operation will be called on that page.
+ (2) A block may be reserved in the cache and attached to the object at the
+     appropriate place.
+ (3) The function will return -ENODATA.
+This function may also return -ENOMEM or -EINTR, in which case it won't have
+read any data from the cache.
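+The return values above map onto three courses of action for the netfs.  The
+following user-space sketch shows one way of classifying them; the enum and
+demo_classify_read() are hypothetical, and treating -ENOMEM, -EINTR and any
+other error as "fetch from the server without caching" is an assumption of the
+sketch rather than a rule of the API.

```c
#include <assert.h>
#include <errno.h>

/* What the netfs does next after asking the cache for a page. */
enum demo_read_action {
	DEMO_WAIT_FOR_CACHE,	/* read dispatched; end_io_func() will be called */
	DEMO_FETCH_THEN_STORE,	/* block allocated; fetch, then write to the cache */
	DEMO_FETCH_UNCACHED,	/* no caching for this page; fetch from server only */
};

/* Classify a return value of the kind described for the page read call. */
enum demo_read_action demo_classify_read(int ret)
{
	switch (ret) {
	case 0:
		return DEMO_WAIT_FOR_CACHE;
	case -ENODATA:
		return DEMO_FETCH_THEN_STORE;
	case -ENOBUFS:
	default:
		/* Assumption: -ENOMEM, -EINTR and others fall back to the server. */
		return DEMO_FETCH_UNCACHED;
	}
}
```

+In the first two cases the page has been marked and so must be uncached when
+the netfs releases it.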
+PAGE ALLOCATE
+-------------
+Alternatively, if there's not expected to be any data in the cache for a page
+because the file has been extended, a block can simply be allocated instead:
+        int fscache_alloc_page(struct fscache_cookie *cookie,
+                               struct page *page,
+                               gfp_t gfp);
+This is similar to the fscache_read_or_alloc_page() function, except that it
+never reads from the cache.  It will return 0 if a block has been allocated,
+rather than -ENODATA as the other would.  One or the other must be performed
+before writing to the cache.
+The mark_pages_cached() cookie operation will be called on the page if
+successful.
+PAGE WRITE
+----------
+Secondly, if the netfs changes the contents of the page (either due to an
+initial download or if a user performs a write), then the page should be
+written back to the cache:
+        int fscache_write_page(struct fscache_cookie *cookie,
+                               struct page *page,
+                               gfp_t gfp);
+The cookie argument must specify a data file cookie, the page specified should
+contain the data to be written (and is also used to specify the page number),
+and the gfp argument is used to control how any memory allocations made are
+satisfied.
+The page must have first been read or allocated successfully and must not have
+been uncached before writing is performed.
+If the cookie indicates the inode is not cached then:
+ (1) The function will return -ENOBUFS.
+Else if space can be allocated in the cache to hold this page:
+ (1) PG_fscache_write will be set on the page.
+ (2) The function will submit a request to write the data to cache's backing
+     device directly from the page specified.
+ (3) The function will return 0.
+ (4) When the write is complete PG_fscache_write is cleared on the page and
+     anyone waiting for that bit will be woken up.
+Else if there's no space available in the cache, -ENOBUFS will be returned.  It
+is also possible for the PG_fscache_write bit to be cleared when no write took
+place if unforeseen circumstances arose (such as a disk error).
+Writing takes place asynchronously.
+MULTIPLE PAGE READ
+------------------
+A facility is provided to read several pages at once, as requested by the
+readpages() address space operation:
+        int fscache_read_or_alloc_pages(struct fscache_cookie *cookie,
+                                        struct address_space *mapping,
+                                        struct list_head *pages,
+                                        int *nr_pages,
+                                        fscache_rw_complete_t end_io_func,
+                                        void *context,
+                                        gfp_t gfp);
+This works in a similar way to fscache_read_or_alloc_page(), except:
+ (1) Any page it can retrieve data for is removed from pages and nr_pages and
+     dispatched for reading to the disk.  Reads of adjacent pages on disk may
+     be merged for greater efficiency.
+ (2) The mark_pages_cached() cookie operation will be called on several pages
+     at once if they're being read or allocated.
+ (3) If there was a general error, then that error will be returned.
+     Else if some pages couldn't be allocated or read, then -ENOBUFS will be
+     returned.
+     Else if some pages couldn't be read but were allocated, then -ENODATA will
+     be returned.
+     Otherwise, if all pages had reads dispatched, then 0 will be returned, the
+     list will be empty and *nr_pages will be 0.
+ (4) end_io_func will be called once for each page being read as the reads
+     complete.  It will be called in process context if error != 0, but it may
+     be called in interrupt context if there is no error.
+Note that a return of -ENODATA, -ENOBUFS or any other error does not preclude
+some of the pages being read and some being allocated.  Those pages will have
+been marked appropriately and will need uncaching.
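+The precedence of return values in point (3) can be written out as a small
+function.  This is a user-space sketch of the ordering only; the function name
+and its three parameters are hypothetical and do not correspond to anything in
+the FS-Cache interface.

```c
#include <assert.h>
#include <errno.h>

/*
 * general_error:     0, or a negative error that affected the whole request
 * nr_refused:        pages that could be neither read nor allocated
 * nr_allocated_only: pages that were allocated but had no data to read
 */
int demo_multi_read_result(int general_error,
			   unsigned int nr_refused,
			   unsigned int nr_allocated_only)
{
	assert(general_error <= 0);

	if (general_error)
		return general_error;
	if (nr_refused)
		return -ENOBUFS;
	if (nr_allocated_only)
		return -ENODATA;
	return 0;	/* every page had a read dispatched */
}
```

+Note that -ENOBUFS wins over -ENODATA, so a caller seeing -ENOBUFS cannot
+assume that no pages were allocated.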
+==============
+PAGE UNCACHING
+==============
+To uncache a page, this function should be called:
+        void fscache_uncache_page(struct fscache_cookie *cookie,
+                                  struct page *page);
+This function permits the cache to release any in-memory representation it
+might be holding for this netfs page.  This function must be called once for
+each page on which the read or write page functions above have been called to
+make sure the cache's in-memory tracking information gets torn down.
+Note that pages can't be explicitly deleted from a data file.  The whole
+data file must be retired (see the relinquish cookie function below).
+Furthermore, note that this does not cancel the asynchronous read or write
+operation started by the read/alloc and write functions, so the page
+invalidation and release functions must use:
+        bool fscache_check_page_write(struct fscache_cookie *cookie,
+                                      struct page *page);
+to see if a page is being written to the cache, and:
+        void fscache_wait_on_page_write(struct fscache_cookie *cookie,
+                                        struct page *page);
+to wait for it to finish if it is.
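+The order of calls in a page release path might look like the sketch below.
+It is a user-space model, not kernel code: struct demo_page and the demo_*()
+functions are mocks standing in for the page and for the three FS-Cache calls
+named above, and the mock wait simply completes the write at once.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the cache's view of one netfs page. */
struct demo_page {
	bool known_to_cache;	/* models PG_fscache */
	bool write_in_progress;	/* models an outstanding write to the cache */
};

/* Mock of fscache_check_page_write(): is a cache write still pending? */
bool demo_check_page_write(const struct demo_page *page)
{
	return page->write_in_progress;
}

/* Mock of fscache_wait_on_page_write(): here the write completes at once. */
void demo_wait_on_page_write(struct demo_page *page)
{
	page->write_in_progress = false;
}

/* Mock of fscache_uncache_page(): drop the cache's tracking of the page. */
void demo_uncache_page(struct demo_page *page)
{
	page->known_to_cache = false;
}

/*
 * Release-path ordering.  Returns true if the page may be freed.  When
 * may_wait is false and a write is pending, the page is left alone.
 */
bool demo_release_page(struct demo_page *page, bool may_wait)
{
	assert(page);

	if (!page->known_to_cache)
		return true;
	if (demo_check_page_write(page)) {
		if (!may_wait)
			return false;
		demo_wait_on_page_write(page);
	}
	demo_uncache_page(page);
	return true;
}
```

+Refusing to release when waiting is not allowed is one possible policy; it is
+an assumption of this sketch rather than a requirement stated above.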
+==========================
+INDEX AND DATA FILE UPDATE
+==========================
+To request an update of the index data for an index or other object, the
+following function should be called:
+        void fscache_update_cookie(struct fscache_cookie *cookie);
+This function will refer back to the netfs_data pointer stored in the cookie by
+the acquisition function to obtain the data to write into each revised index
+entry.  The update method in the parent index definition will be called to
+transfer the data.
+Note that partial updates may happen automatically at other times, such as when
+data blocks are added to a data file object.
+===============================
+MISCELLANEOUS COOKIE OPERATIONS
+===============================
+There are a number of operations that can be used to control cookies:
+ (*) Cookie pinning:
+        int fscache_pin_cookie(struct fscache_cookie *cookie);
+        void fscache_unpin_cookie(struct fscache_cookie *cookie);
+     These operations permit data cookies to be pinned into the cache and to
+     have the pinning removed.  They are not permitted on index cookies.
+     The pinning function will return 0 if successful, -ENOBUFS if the cookie
+     isn't backed by a cache, -EOPNOTSUPP if the cache doesn't support pinning,
+     -ENOSPC if there isn't enough space to honour the operation, -ENOMEM or
+     -EIO if there's any other problem.
+ (*) Data space reservation:
+        int fscache_reserve_space(struct fscache_cookie *cookie, loff_t size);
+     This permits a netfs to request cache space be reserved to store up to the
+     given amount of a file.  It is permitted to ask for more than the current
+     size of the file to allow for future file expansion.
+     If size is given as zero then the reservation will be cancelled.
+     The function will return 0 if successful, -ENOBUFS if the cookie isn't
+     backed by a cache, -EOPNOTSUPP if the cache doesn't support reservations,
+     -ENOSPC if there isn't enough space to honour the operation, -ENOMEM or
+     -EIO if there's any other problem.
+     Note that this doesn't pin an object in a cache; it can still be culled to
+     make space if it's not in use.
+=====================
+COOKIE UNREGISTRATION
+=====================
+To get rid of a cookie, this function should be called:
+        void fscache_relinquish_cookie(struct fscache_cookie *cookie,
+                                       int retire);
+If retire is non-zero, then the object will be marked for recycling, and all
+copies of it will be removed from all active caches in which it is present.
+Not only that but all child objects will also be retired.
+If retire is zero, then the object may be available again when next the
+acquisition function is called.  Retirement here will overrule the pinning on a
+cookie.
+One very important note - relinquish must NOT be called for a cookie unless all
+the cookies for "child" indices, objects and pages have been relinquished
+first.
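+The children-first rule can be modelled in user space as below.  The real
+relinquish function returns void and is not described as checking for this;
+the mock here returns an error purely to make the rule visible.  struct
+demo_cookie, demo_acquire() and demo_relinquish() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a cookie: just enough state to track outstanding children. */
struct demo_cookie {
	struct demo_cookie *parent;
	unsigned int n_children;	/* children not yet relinquished */
	int relinquished;
};

/* Mock acquisition: attach the cookie beneath its parent (NULL for the top). */
void demo_acquire(struct demo_cookie *cookie, struct demo_cookie *parent)
{
	assert(cookie);

	cookie->parent = parent;
	cookie->n_children = 0;
	cookie->relinquished = 0;
	if (parent)
		parent->n_children++;
}

/* Mock relinquish: 0 on success, -1 if children are still outstanding. */
int demo_relinquish(struct demo_cookie *cookie)
{
	assert(cookie);

	if (cookie->n_children > 0)
		return -1;
	if (cookie->parent)
		cookie->parent->n_children--;
	cookie->relinquished = 1;
	return 0;
}
```

+A netfs would therefore tear down its hierarchy from the leaves upwards, for
+example vnodes before volumes before cells in the AFS case.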
+================================
+INDEX AND DATA FILE INVALIDATION
+================================
+There is no direct way to invalidate an index subtree or a data file.  To do
+this, the caller should relinquish and retire the cookie they have, and then
+acquire a new one.
+===========================
+FS-CACHE SPECIFIC PAGE FLAG
+===========================
+FS-Cache makes use of a page flag, PG_private_2, for its own purpose.  This is
+given the alternative name PG_fscache.
+PG_fscache is used to indicate that the page is known by the cache, and that
+the cache must be informed if the page is going to go away.  It's an indication
+to the netfs that the cache has an interest in this page, where an interest may
+be a pointer to it, resources allocated or reserved for it, or I/O in progress
+upon it.
+The netfs can use this information in methods such as releasepage() to
+determine whether it needs to uncache a page or update it.
+Furthermore, if this bit is set, releasepage() and invalidatepage() operations
+will be called on a page to get rid of it, even if PG_private is not set.  This
+allows caching to be attempted on a page, and read_cache_pages() to be called
+after fscache_read_or_alloc_pages(), as the former will try to release pages it
+was given under certain circumstances.
+This bit does not overlap with other flags such as PG_private.  This means that FS-Cache
+can be used with a filesystem that uses the block buffering code.
+There are a number of operations defined on this flag:
+        int PageFsCache(struct page *page);
+        void SetPageFsCache(struct page *page);
+        void ClearPageFsCache(struct page *page);
+        int TestSetPageFsCache(struct page *page);
+        int TestClearPageFsCache(struct page *page);
+These functions are bit test, bit set, bit clear, bit test and set and bit
+test and clear operations on PG_fscache.
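+The semantics of the five operations can be modelled on a plain flags word as
+follows.  This is a user-space sketch only: the demo_ prefix, struct
+demo_flagged_page and the bit position are all invented here, and unlike the
+kernel's page flag operations these versions are not atomic.

```c
#include <assert.h>

/* Bit position chosen for this sketch only; it is not the kernel's value. */
#define DEMO_PG_FSCACHE	(1UL << 0)

/* Stand-in for struct page: just a flags word. */
struct demo_flagged_page {
	unsigned long flags;
};

/* Bit test. */
int demo_PageFsCache(const struct demo_flagged_page *page)
{
	return (page->flags & DEMO_PG_FSCACHE) != 0;
}

/* Bit set. */
void demo_SetPageFsCache(struct demo_flagged_page *page)
{
	page->flags |= DEMO_PG_FSCACHE;
}

/* Bit clear. */
void demo_ClearPageFsCache(struct demo_flagged_page *page)
{
	page->flags &= ~DEMO_PG_FSCACHE;
}

/* Bit test and set: returns the value the bit had beforehand. */
int demo_TestSetPageFsCache(struct demo_flagged_page *page)
{
	int old = demo_PageFsCache(page);

	demo_SetPageFsCache(page);
	return old;
}

/* Bit test and clear: returns the value the bit had beforehand. */
int demo_TestClearPageFsCache(struct demo_flagged_page *page)
{
	int old = demo_PageFsCache(page);

	demo_ClearPageFsCache(page);
	return old;
}
```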
diff --git a/Documentation/filesystems/caching/object.txt b/Documentation/filesystems/caching/object.txt
new file mode 100644
index 000000000000..e8b0a35d8fe5
--- /dev/null
+++ b/Documentation/filesystems/caching/object.txt
@@ -0,0 +1,313 @@
+             ====================================================
+             IN-KERNEL CACHE OBJECT REPRESENTATION AND MANAGEMENT
+             ====================================================
+By: David Howells <dhowells@redhat.com>
+Contents:
+ (*) Representation
+ (*) Object management state machine.
+     - Provision of cpu time.
+     - Locking simplification.
+ (*) The set of states.
+ (*) The set of events.
+==============
+REPRESENTATION
+==============
+FS-Cache maintains an in-kernel representation of each object that a netfs is
+currently interested in.  Such objects are represented by the fscache_cookie
+struct and are referred to as cookies.
+FS-Cache also maintains a separate in-kernel representation of the objects that
+a cache backend is currently actively caching.  Such objects are represented by
+the fscache_object struct.  The cache backends allocate these upon request, and
+are expected to embed them in their own representations.  These are referred to
+as objects.
+There is a 1:N relationship between cookies and objects.  A cookie may be
+represented by multiple objects - an index may exist in more than one cache -
+or even by no objects (it may not be cached).
+Furthermore, both cookies and objects are hierarchical.  The two hierarchies
+correspond, but the cookies tree is a superset of the union of the object trees
+of multiple caches:
+            NETFS INDEX TREE               :      CACHE 1     :      CACHE 2
+                                           :                  :
+                                           :   +-----------+  :
+                                  +----------->|  IObject  |  :
+              +-----------+       |        :   +-----------+  :
+              |  ICookie  |-------+        :         |        :
+              +-----------+       |        :         |        :   +-----------+
+                    |             +------------------------------>|  IObject  |
+                    |                      :         |        :   +-----------+
+                    |                      :         V        :         |
+                    |                      :   +-----------+  :         |
+                    V             +----------->|  IObject  |  :         |
+              +-----------+       |        :   +-----------+  :         |
+              |  ICookie  |-------+        :         |        :         V
+              +-----------+       |        :         |        :   +-----------+
+                    |             +------------------------------>|  IObject  |
+              +-----+-----+                :         |        :   +-----------+
+              |           |                :         |        :         |
+              V           |                :         V        :         |
+        +-----------+     |                :   +-----------+  :         |
+        |  ICookie  |------------------------->|  IObject  |  :         |
+        +-----------+     |                :   +-----------+  :         |
+              |           V                :         |        :         V
+              |     +-----------+          :         |        :   +-----------+
+              |     |  ICookie  |-------------------------------->|  IObject  |
+              |     +-----------+          :         |        :   +-----------+
+              V           |                :         V        :         |
+        +-----------+     |                :   +-----------+  :         |
+        |  DCookie  |------------------------->|  DObject  |  :         |
+        +-----------+     |                :   +-----------+  :         |
+                          |                :                  :         |
+                  +-------+-------+        :                  :         |
+                  |               |        :                  :         |
+                  V               V        :                  :         V
+            +-----------+   +-----------+  :                  :   +-----------+
+            |  DCookie  |   |  DCookie  |------------------------>|  DObject  |
+            +-----------+   +-----------+  :                  :   +-----------+
+                                           :                  :
+In the above illustration, ICookie and IObject represent indices and DCookie
+and DObject represent data storage objects.  Indices may have representation in
+multiple caches, but currently, non-index objects may not.  Objects of any type
+may also be entirely unrepresented.
+As far as the netfs API goes, the netfs is only actually permitted to see
+pointers to the cookies.  The cookies themselves and any objects attached to
+those cookies are hidden from it.
+===============================
+OBJECT MANAGEMENT STATE MACHINE
+===============================
+Within FS-Cache, each active object is managed by its own individual state
+machine.  The state for an object is kept in the fscache_object struct, in
+object->state.  A cookie may point to a set of objects that are in different
+states.
+Each state has an action associated with it that is invoked when the machine
+wakes up in that state.  There are four logical sets of states:
+ (1) Preparation: states that wait for the parent objects to become ready.  The
+     representations are hierarchical, and it is expected that an object must
+     be created or accessed with respect to its parent object.
+ (2) Initialisation: states that perform lookups in the cache and validate
+     what's found and that create on disk any missing metadata.
+ (3) Normal running: states that allow netfs operations on objects to proceed
+     and that update the state of objects.
+ (4) Termination: states that detach objects from their netfs cookies, that
+     delete objects from disk, that handle disk and system errors and that free
+     up in-memory resources.
+In most cases, transitioning between states is in response to signalled events.
+When a state has finished processing, it will usually set the mask of events in
+which it is interested (object->event_mask) and relinquish the worker thread.
+Then when an event is raised (by calling fscache_raise_event()), if the event
+is not masked, the object will be queued for processing (by calling
+fscache_enqueue_object()).
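The event-mask behaviour described above can be sketched as a small userspace model. This is only an illustration under stated assumptions: struct model_object, model_raise_event() and the MODEL_EV_* names are hypothetical stand-ins, not the kernel's fscache_object or fscache_raise_event(), and the "queued" flag merely stands in for a call to fscache_enqueue_object().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event numbers for the model; not the kernel's values. */
enum model_event {
	MODEL_EV_UPDATE,
	MODEL_EV_CLEARED,
	MODEL_EV_ERROR,
};

/* Hypothetical userspace stand-in for the relevant object fields. */
struct model_object {
	unsigned long events;      /* events raised and not yet handled */
	unsigned long event_mask;  /* events the current state is interested in */
	bool queued;               /* stands in for queuing the object for work */
};

/* Record the event, and queue the object only if the event is of interest. */
static void model_raise_event(struct model_object *obj, enum model_event ev)
{
	unsigned long bit = 1UL << ev;

	assert(obj != NULL);
	obj->events |= bit;
	if (obj->event_mask & bit)
		obj->queued = true;
}
```

In this model an event that the current state has not asked for is still remembered in obj->events, but it does not wake the state machine.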
+PROVISION OF CPU TIME
+---------------------
+The work to be done by the various states is given CPU time by the threads of
+the slow work facility (see Documentation/slow-work.txt).  This is used in
+preference to the workqueue facility because:
+ (1) Threads may be completely occupied for very long periods of time by a
+     particular work item.  These state actions may be doing sequences of
+     synchronous, journalled disk accesses (lookup, mkdir, create, setxattr,
+     getxattr, truncate, unlink, rmdir, rename).
+ (2) Threads may do little actual work, but may rather spend a lot of time
+     sleeping on I/O.  This means that single-threaded and 1-per-CPU-threaded
+     workqueues don't necessarily have the right numbers of threads.
+LOCKING SIMPLIFICATION
+----------------------
+Because only one worker thread may be operating on any particular object's
+state machine at once, this simplifies the locking, particularly with respect
+to disconnecting the netfs's representation of a cache object (fscache_cookie)
+from the cache backend's representation (fscache_object) - which may be
+requested from either end.
+=================
+THE SET OF STATES
+=================
+The object state machine has a set of states that it can be in.  There are
+preparation states in which the object sets itself up and waits for its parent
+object to transit to a state that allows access to its children:
+ (1) State FSCACHE_OBJECT_INIT.
+     Initialise the object and wait for the parent object to become active.  In
+     the cache, it is expected that it will not be possible to look an object
+     up from the parent object, until that parent object itself has been looked
+     up.
+There are initialisation states in which the object sets itself up and accesses
+disk for the object metadata:
+ (2) State FSCACHE_OBJECT_LOOKING_UP.
+     Look up the object on disk, using the parent as a starting point.
+     FS-Cache expects the cache backend to probe the cache to see whether this
+     object is represented there, and if it is, to see if it's valid (coherency
+     management).
+     The cache should call fscache_object_lookup_negative() to indicate lookup
+     failure for whatever reason, and should call fscache_obtained_object() to
+     indicate success.
+     At the completion of lookup, FS-Cache will let the netfs go ahead with
+     read operations, no matter whether the file is yet cached.  If not yet
+     cached, read operations will be immediately rejected with ENODATA until
+     the first known page is uncached - as up to that point there can be no
+     data to be read out of the cache for that file that isn't currently also
+     held in the pagecache.
+ (3) State FSCACHE_OBJECT_CREATING.
+     Create an object on disk, using the parent as a starting point.  This
+     happens if the lookup failed to find the object, or if the object's
+     coherency data indicated what's on disk is out of date.  In this state,
+     FS-Cache expects the cache to create the object.
+     The cache should call fscache_obtained_object() if creation completes
+     successfully, fscache_object_lookup_negative() otherwise.
+     At the completion of creation, FS-Cache will start processing write
+     operations the netfs has queued for an object.  If creation failed, the
+     write ops will be transparently discarded, and nothing recorded in the
+     cache.
+There are some normal running states in which the object spends its time
+servicing netfs requests:
+ (4) State FSCACHE_OBJECT_AVAILABLE.
+     A transient state in which pending operations are started, child objects
+     are permitted to advance from FSCACHE_OBJECT_INIT state, and temporary
+     lookup data is freed.
+ (5) State FSCACHE_OBJECT_ACTIVE.
+     The normal running state.  In this state, requests the netfs makes will be
+     passed on to the cache.
+ (6) State FSCACHE_OBJECT_UPDATING.
+     The state machine comes here to update the object in the cache from the
+     netfs's records.  This involves updating the auxiliary data that is used
+     to maintain coherency.
+And there are terminal states in which an object cleans itself up, deallocates
+memory and potentially deletes stuff from disk:
+ (7) State FSCACHE_OBJECT_LC_DYING.
+     The object comes here if it is dying because of a lookup or creation
+     error.  This would be due to a disk error or system error of some sort.
+     Temporary data is cleaned up, and the parent is released.
+ (8) State FSCACHE_OBJECT_DYING.
+     The object comes here if it is dying due to an error, because its parent
+     cookie has been relinquished by the netfs or because the cache is being
+     withdrawn.
+     Any child objects waiting on this one are given CPU time so that they too
+     can destroy themselves.  This object waits for all its children to go away
+     before advancing to the next state.
+ (9) State FSCACHE_OBJECT_ABORT_INIT.
+     The object comes to this state if it was waiting on its parent in
+     FSCACHE_OBJECT_INIT, but its parent died.  The object will destroy itself
+     so that the parent may proceed from the FSCACHE_OBJECT_DYING state.
+(10) State FSCACHE_OBJECT_RELEASING.
+(11) State FSCACHE_OBJECT_RECYCLING.
+     The object comes to one of these two states when dying once it is rid of
+     all its children, if it is dying because the netfs relinquished its
+     cookie.  In the first state, the cached data is expected to persist, and
+     in the second it will be deleted.
+(12) State FSCACHE_OBJECT_WITHDRAWING.
+     The object transits to this state if the cache decides it wants to
+     withdraw the object from service, perhaps to make space, but also due to
+     error or just because the whole cache is being withdrawn.
+(13) State FSCACHE_OBJECT_DEAD.
+     The object transits to this state when the in-memory object record is
+     ready to be deleted.  The object processor shouldn't ever see an object in
+     this state.
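As a reading aid, the grouping of the thirteen states into the four logical sets can be written down as a lookup function. This is a sketch, not kernel code: the state names are those listed above, but the enum values, enum model_state_group and model_state_group_of() are assumptions made for the illustration.

```c
#include <assert.h>

/* State names as listed above; the numeric values are illustrative only. */
enum model_object_state {
	FSCACHE_OBJECT_INIT,
	FSCACHE_OBJECT_LOOKING_UP,
	FSCACHE_OBJECT_CREATING,
	FSCACHE_OBJECT_AVAILABLE,
	FSCACHE_OBJECT_ACTIVE,
	FSCACHE_OBJECT_UPDATING,
	FSCACHE_OBJECT_LC_DYING,
	FSCACHE_OBJECT_DYING,
	FSCACHE_OBJECT_ABORT_INIT,
	FSCACHE_OBJECT_RELEASING,
	FSCACHE_OBJECT_RECYCLING,
	FSCACHE_OBJECT_WITHDRAWING,
	FSCACHE_OBJECT_DEAD,
};

/* The four logical sets of states (hypothetical names). */
enum model_state_group {
	MODEL_GROUP_PREPARATION,
	MODEL_GROUP_INITIALISATION,
	MODEL_GROUP_RUNNING,
	MODEL_GROUP_TERMINATION,
};

/* Map a state onto the logical set it belongs to. */
static enum model_state_group
model_state_group_of(enum model_object_state state)
{
	switch (state) {
	case FSCACHE_OBJECT_INIT:
		return MODEL_GROUP_PREPARATION;
	case FSCACHE_OBJECT_LOOKING_UP:
	case FSCACHE_OBJECT_CREATING:
		return MODEL_GROUP_INITIALISATION;
	case FSCACHE_OBJECT_AVAILABLE:
	case FSCACHE_OBJECT_ACTIVE:
	case FSCACHE_OBJECT_UPDATING:
		return MODEL_GROUP_RUNNING;
	case FSCACHE_OBJECT_LC_DYING:
	case FSCACHE_OBJECT_DYING:
	case FSCACHE_OBJECT_ABORT_INIT:
	case FSCACHE_OBJECT_RELEASING:
	case FSCACHE_OBJECT_RECYCLING:
	case FSCACHE_OBJECT_WITHDRAWING:
	case FSCACHE_OBJECT_DEAD:
		return MODEL_GROUP_TERMINATION;
	}
	assert(!"unknown object state");
	return MODEL_GROUP_TERMINATION;
}
```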
+THE SET OF EVENTS
+-----------------
+There are a number of events that can be raised to an object state machine:
+ (*) FSCACHE_OBJECT_EV_UPDATE
+     The netfs requested that an object be updated.  The state machine will ask
+     the cache backend to update the object, and the cache backend will ask the
+     netfs for details of the change through its cookie definition ops.
+ (*) FSCACHE_OBJECT_EV_CLEARED
+     This is signalled in two circumstances:
+     (a) when an object's last child object is dropped and
+     (b) when the last operation outstanding on an object is completed.
+     This is used to proceed from the dying state.
+ (*) FSCACHE_OBJECT_EV_ERROR
+     This is signalled when an I/O error occurs during the processing of some
+     object.
+ (*) FSCACHE_OBJECT_EV_RELEASE
+ (*) FSCACHE_OBJECT_EV_RETIRE
+     These are signalled when the netfs relinquishes a cookie it was using.
+     The event selected depends on whether the netfs asks for the backing
+     object to be retired (deleted) or retained.
+ (*) FSCACHE_OBJECT_EV_WITHDRAW
+     This is signalled when the cache backend wants to withdraw an object.
+     This means that the object will have to be detached from the netfs's
+     cookie.
+Because the withdrawing and releasing/retiring events are all handled by the
+object state machine, it doesn't matter if there's a collision with both ends
+trying to sever the connection at the same time.  The state machine can just
+pick which one it wants to honour, and that effects the other.
diff --git a/Documentation/filesystems/caching/operations.txt b/Documentation/filesystems/caching/operations.txt
new file mode 100644
index 000000000000..b6b070c57cbf
--- /dev/null
+++ b/Documentation/filesystems/caching/operations.txt
@@ -0,0 +1,213 @@
+                       ================================
+                       ASYNCHRONOUS OPERATIONS HANDLING
+                       ================================
+By: David Howells <dhowells@redhat.com>
+Contents:
+ (*) Overview.
+ (*) Operation record initialisation.
+ (*) Parameters.
+ (*) Procedure.
+ (*) Asynchronous callback.
+========
+OVERVIEW
+========
+FS-Cache has an asynchronous operations handling facility that it uses for its
+data storage and retrieval routines.  Its operations are represented by
+fscache_operation structs, though these are usually embedded into some other
+structure.
+This facility is available to and expected to be used by the cache backends,
+and FS-Cache will create operations and pass them off to the appropriate cache
+backend for completion.
+To make use of this facility, <linux/fscache-cache.h> should be #included.
+===============================
+OPERATION RECORD INITIALISATION
+===============================
+An operation is recorded in an fscache_operation struct:
+        struct fscache_operation {
+                union {
+                        struct work_struct fast_work;
+                        struct slow_work slow_work;
+                };
+                unsigned long           flags;
+                fscache_operation_processor_t processor;
+                ...
+        };
+Someone wanting to issue an operation should allocate something with this
+struct embedded in it.  They should initialise it by calling:
+        void fscache_operation_init(struct fscache_operation *op,
+                                    fscache_operation_release_t release);
+with the operation to be initialised and the release function to use.
+The op->flags parameter should be set to indicate the CPU time provision and
+the exclusivity (see the Parameters section).
+The op->fast_work, op->slow_work and op->processor members should be set as
+appropriate for the CPU time provision (see the Parameters section).
+FSCACHE_OP_WAITING may be set in op->flags prior to each submission of the
+operation and waited for afterwards.
+==========
+PARAMETERS
+==========
+There are a number of parameters that can be set in the operation record's flag
+parameter.  There are three options for the provision of CPU time in these
+operations:
+ (1) The operation may be done synchronously (FSCACHE_OP_MYTHREAD).  A thread
+     may decide it wants to handle an operation itself without deferring it to
+     another thread.
+     This is, for example, used in read operations for calling readpages() on
+     the backing filesystem in CacheFiles.  Although readpages() does an
+     asynchronous data fetch, the determination of whether pages exist is done
+     synchronously - and the netfs does not proceed until this has been
+     determined.
+     If this option is to be used, FSCACHE_OP_WAITING must be set in op->flags
+     before submitting the operation, and the operating thread must wait for it
+     to be cleared before proceeding:
+                wait_on_bit(&op->flags, FSCACHE_OP_WAITING,
+                            fscache_wait_bit, TASK_UNINTERRUPTIBLE);
+ (2) The operation may be fast asynchronous (FSCACHE_OP_FAST), in which case it
+     will be given to keventd to process.  Such an operation is not permitted
+     to sleep on I/O.
+     This is, for example, used by CacheFiles to copy data from a backing fs
+     page to a netfs page after the backing fs has read the page in.
+     If this option is used, op->fast_work and op->processor must be
+     initialised before submitting the operation:
+                INIT_WORK(&op->fast_work, do_some_work);
+ (3) The operation may be slow asynchronous (FSCACHE_OP_SLOW), in which case it
+     will be given to the slow work facility to process.  Such an operation is
+     permitted to sleep on I/O.
+     This is, for example, used by FS-Cache to handle background writes of
+     pages that have just been fetched from a remote server.
+     If this option is used, op->slow_work and op->processor must be
+     initialised before submitting the operation:
+                fscache_operation_init_slow(op, processor);
+Furthermore, operations may be one of two types:
+ (1) Exclusive (FSCACHE_OP_EXCLUSIVE).  Operations of this type may not run in
+     conjunction with any other operation on the object being operated upon.
+     An example of this is the attribute change operation, in which the file
+     being written to may need truncation.
+ (2) Shareable.  Operations of this type may be running simultaneously.  It's
+     up to the operation implementation to prevent interference between other
+     operations running at the same time.
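The exclusivity rule above can be modelled in a few lines of userspace C. This is a sketch under assumptions: struct model_op_state and model_may_start() are hypothetical, and the real operation manager defers a conflicting operation rather than simply refusing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what is currently running on one object. */
struct model_op_state {
	unsigned int n_running;   /* operations currently in progress */
	bool exclusive_running;   /* true if the running operation is exclusive */
};

/*
 * Decide whether a new operation could start now.  An exclusive operation
 * needs the object to itself; a shareable one only needs there to be no
 * exclusive operation in progress.
 */
static bool model_may_start(const struct model_op_state *st, bool exclusive)
{
	assert(st != NULL);
	if (st->n_running == 0)
		return true;
	if (st->exclusive_running)
		return false;
	return !exclusive;
}
```

With one shareable operation in progress, a second shareable operation may start in this model, whereas an exclusive one has to wait.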
+=========
+PROCEDURE
+=========
+Operations are used through the following procedure:
+ (1) The submitting thread must allocate the operation and initialise it
+     itself.  Normally this would be part of a more specific structure with the
+     generic op embedded within.
+ (2) The submitting thread must then submit the operation for processing using
+     one of the following two functions:
+        int fscache_submit_op(struct fscache_object *object,
+                              struct fscache_operation *op);
+        int fscache_submit_exclusive_op(struct fscache_object *object,
+                                        struct fscache_operation *op);
+     The first function should be used to submit non-exclusive ops and the
+     second to submit exclusive ones.  The caller must still set the
+     FSCACHE_OP_EXCLUSIVE flag.
+     If successful, both functions will assign the operation to the specified
+     object and return 0.  -ENOBUFS will be returned if the object specified is
+     permanently unavailable.
+     The operation manager will defer operations on an object that is still
+     undergoing lookup or creation.  The operation will also be deferred if an
+     operation of conflicting exclusivity is in progress on the object.
+     If the operation is asynchronous, the manager will retain a reference to
+     it, so the caller should put their reference to it by passing it to:
+        void fscache_put_operation(struct fscache_operation *op);
+ (3) If the submitting thread wants to do the work itself, and has marked the
+     operation with FSCACHE_OP_MYTHREAD, then it should monitor
+     FSCACHE_OP_WAITING as described above and check the state of the object if
+     necessary (the object might have died whilst the thread was waiting).
+     When it has finished doing its processing, it should call
+     fscache_put_operation() on it.
+ (4) The operation holds an effective lock upon the object, preventing other
+     exclusive ops conflicting until it is released.  The operation can be
+     enqueued for further immediate asynchronous processing by adjusting the
+     CPU time provisioning option if necessary, eg:
+        op->flags &= ~FSCACHE_OP_TYPE;
+        op->flags |= FSCACHE_OP_FAST;
+     and calling:
+        void fscache_enqueue_operation(struct fscache_operation *op);
+     This can be used to allow other things to have use of the worker thread
+     pools.
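Changing the CPU time provisioning option is a mask-then-set operation on the flags word: the type bits are cleared and the new type is ORed in uncomplemented, leaving the other flags alone. A minimal userspace sketch follows; the MODEL_OP_* constants and their values are assumptions chosen for the illustration and are not taken from <linux/fscache-cache.h>.

```c
#include <assert.h>

/* Hypothetical flag layout: low nibble selects the CPU time provision. */
#define MODEL_OP_TYPE       0x000fUL  /* mask covering the type bits */
#define MODEL_OP_FAST       0x0001UL
#define MODEL_OP_SLOW       0x0002UL
#define MODEL_OP_EXCLUSIVE  0x0020UL  /* unrelated flag outside the mask */

/* Replace the type bits of flags with type, preserving all other bits. */
static unsigned long model_set_op_type(unsigned long flags, unsigned long type)
{
	assert((type & ~MODEL_OP_TYPE) == 0);
	flags &= ~MODEL_OP_TYPE;   /* clear the old type */
	flags |= type;             /* set the new type */
	return flags;
}
```

ORing in the complement of the type value instead would set nearly every bit in the word, including flags outside the type mask.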
+=====================
+ASYNCHRONOUS CALLBACK
+=====================
+When used in asynchronous mode, the worker thread pool will invoke the
+processor method with a pointer to the operation.  This should then get at the
+container struct by using container_of():
+        static void fscache_write_op(struct fscache_operation *_op)
+        {
+                struct fscache_storage *op =
+                        container_of(_op, struct fscache_storage, op);
+        ...
+        }
+The caller holds a reference on the operation, and will invoke
+fscache_put_operation() when the processor function returns.  The processor
+function is at liberty to call fscache_enqueue_operation() or to take extra
+references.
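The container_of() step can be tried out in userspace. The sketch below is not the kernel macro (which adds type checking) and the model_* structures are hypothetical; it only shows how a processor function recovers the enclosing structure from a pointer to the embedded generic operation.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace container_of(); the kernel version is stricter. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for the generic operation record. */
struct model_operation {
	unsigned long flags;
};

/* Hypothetical specific operation with the generic op embedded in it. */
struct model_storage {
	int pages_stored;
	struct model_operation op;   /* deliberately not the first member */
};

/* Processor function: receives the generic op, works on the container. */
static void model_write_op(struct model_operation *_op)
{
	struct model_storage *op;

	assert(_op != NULL);
	op = container_of(_op, struct model_storage, op);
	op->pages_stored++;
}
```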
diff --git a/Documentation/slow-work.txt b/Documentation/slow-work.txt
new file mode 100644
index 000000000000..ebc50f808ea4
--- /dev/null
+++ b/Documentation/slow-work.txt
@@ -0,0 +1,174 @@
+                     ====================================
+                     SLOW WORK ITEM EXECUTION THREAD POOL
+                     ====================================
+By: David Howells <dhowells@redhat.com>
+The slow work item execution thread pool is a pool of threads for performing
+things that take a relatively long time, such as making mkdir calls.
+Typically, when processing something, these items will spend a lot of time
+blocking a thread on I/O, thus making that thread unavailable for doing other
+work.
+The standard workqueue model is unsuitable for this class of work item as that
+limits the owner to a single thread or a single thread per CPU.  For some
+tasks, however, more threads - or fewer - are required.
+There is just one pool per system.  It contains no threads unless something
+wants to use it - and that something must register its interest first.  When
+the pool is active, the number of threads it contains is dynamic, varying
+between a maximum and minimum setting, depending on the load.
+====================
+CLASSES OF WORK ITEM
+====================
+This pool supports two classes of work items:
+ (*) Slow work items.
+ (*) Very slow work items.
+The former are expected to finish much quicker than the latter.
+An operation of the very slow class may do a batch combination of several
+lookups, mkdirs, and a create for instance.
+An operation of the ordinarily slow class may, for example, write stuff or
+expand files, provided the time taken to do so isn't too long.
+Operations of both types may sleep during execution, thus tying up the thread
+loaned to them.
+THREAD-TO-CLASS ALLOCATION
+--------------------------
+Not all the threads in the pool are available to work on very slow work items.
+The number will be between one and one fewer than the number of active threads.
+This is configurable (see the "Pool Configuration" section).
+All the threads are available to work on ordinarily slow work items, but a
+percentage of the threads will prefer to work on very slow work items.
+The configuration ensures that at least one thread will be available to work on
+very slow work items, and at least one thread will be available that won't work
+on very slow work items at all.
+=====================
+USING SLOW WORK ITEMS
+=====================
+Firstly, a module or subsystem wanting to make use of slow work items must
+register its interest:
+         int ret = slow_work_register_user();
+This will return 0 if successful, or a -ve error upon failure.
+Slow work items may then be set up by:
+ (1) Declaring a slow_work struct type variable:
+        #include <linux/slow-work.h>
+        struct slow_work myitem;
+ (2) Declaring the operations to be used for this item:
+        struct slow_work_ops myitem_ops = {
+                .get_ref = myitem_get_ref,
+                .put_ref = myitem_put_ref,
+                .execute = myitem_execute,
+        };
+     [*] For a description of the ops, see section "Item Operations".
+ (3) Initialising the item:
+        slow_work_init(&myitem, &myitem_ops);
+     or:
+        vslow_work_init(&myitem, &myitem_ops);
+     depending on its class.
+A suitably set up work item can then be enqueued for processing:
+        int ret = slow_work_enqueue(&myitem);
+This will return a -ve error if the thread pool is unable to gain a reference
+on the item, 0 otherwise.
+The items are reference counted, so there ought to be no need for a flush
+operation.  When all a module's slow work items have been processed, and the
+module has no further interest in the facility, it should unregister its
+interest:
+        slow_work_unregister_user();
+===============
+ITEM OPERATIONS
+===============
+Each work item requires a table of operations of type struct slow_work_ops.
+All members are required:
+ (*) Get a reference on an item:
+        int (*get_ref)(struct slow_work *work);
+     This allows the thread pool to attempt to pin an item by getting a
+     reference on it.  This function should return 0 if the reference was
+     granted, or a -ve error otherwise.  If an error is returned,
+     slow_work_enqueue() will fail.
+     The reference is held whilst the item is queued and whilst it is being
+     executed.  The item may then be requeued with the same reference held, or
+     the reference will be released.
+ (*) Release a reference on an item:
+        void (*put_ref)(struct slow_work *work);
+     This allows the thread pool to unpin an item by releasing the reference on
+     it.  The thread pool will not touch the item again once this has been
+     called.
+ (*) Execute an item:
+        void (*execute)(struct slow_work *work);
+     This should perform the work required of the item.  It may sleep, it may
+     perform disk I/O and it may wait for locks.
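The reference-counting contract between an item and the pool can be illustrated with a userspace model. Everything here is an assumption made for the sketch: struct model_work, struct model_work_ops and model_enqueue() only mirror the shape of the ops table described above, the "pool" runs the item synchronously, and the -ENOENT error code is an arbitrary choice.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_work;

/* Hypothetical ops table with the same three members as described above. */
struct model_work_ops {
	int  (*get_ref)(struct model_work *work);
	void (*put_ref)(struct model_work *work);
	void (*execute)(struct model_work *work);
};

struct model_work {
	const struct model_work_ops *ops;
	int refs;    /* references currently held on the item */
	int dying;   /* non-zero once the owner is tearing the item down */
	int runs;    /* number of times execute() has been called */
};

/* Refuse to pin an item that is being torn down. */
static int model_get_ref(struct model_work *work)
{
	if (work->dying)
		return -ENOENT;
	work->refs++;
	return 0;
}

static void model_put_ref(struct model_work *work)
{
	assert(work->refs > 0);
	work->refs--;
}

static void model_execute(struct model_work *work)
{
	work->runs++;
}

static const struct model_work_ops model_ops = {
	.get_ref = model_get_ref,
	.put_ref = model_put_ref,
	.execute = model_execute,
};

/* Stand-in for the pool: pin the item, execute it, then unpin it. */
static int model_enqueue(struct model_work *work)
{
	int ret;

	assert(work != NULL && work->ops != NULL);
	ret = work->ops->get_ref(work);
	if (ret < 0)
		return ret;   /* enqueue fails if no reference was granted */
	work->ops->execute(work);
	work->ops->put_ref(work);
	return 0;
}
```

In this model the pool's reference is taken before execution and dropped afterwards, so the owner's own reference count is unchanged once the item has run.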
+==================
+POOL CONFIGURATION
+==================
+The slow-work thread pool has a number of configurables:
+ (*) /proc/sys/kernel/slow-work/min-threads
+     The minimum number of threads that should be in the pool whilst it is in
+     use.  This may be anywhere between 2 and max-threads.
+ (*) /proc/sys/kernel/slow-work/max-threads
+     The maximum number of threads that should be in the pool.  This may be
+     anywhere between min-threads and 255 or NR_CPUS * 2, whichever is greater.
+ (*) /proc/sys/kernel/slow-work/vslow-percentage
+     The percentage of active threads in the pool that may be used to execute
+     very slow work items.  This may be between 1 and 99.  The resultant number
+     is bounded to between 1 and one fewer than the number of active threads.
+     This ensures there is always at least one thread that can process very
+     slow work items, and always at least one thread that won't.
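The bounding of vslow-percentage can be worked through with a small function. This is a sketch of the arithmetic as described, not the kernel's implementation: model_vslow_limit() is a hypothetical name, and rounding the percentage down before clamping is an assumption.

```c
#include <assert.h>

/*
 * Number of threads that may execute very slow work items, given the
 * number of active threads and the configured percentage.  The result is
 * clamped to the range 1 .. active - 1.
 */
static unsigned int model_vslow_limit(unsigned int active,
				      unsigned int percentage)
{
	unsigned int limit;

	assert(active >= 2);                         /* min-threads is at least 2 */
	assert(percentage >= 1 && percentage <= 99);

	limit = active * percentage / 100;           /* rounds down (assumed) */
	if (limit < 1)
		limit = 1;                           /* always one vslow-capable thread */
	if (limit > active - 1)
		limit = active - 1;                  /* always one thread that is not */
	return limit;
}
```

For example, with 10 active threads a setting of 1 still yields one thread for very slow items, and a setting of 99 yields nine, leaving one thread that never takes them.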