nfs: move more to Documentation/filesystems/nfs

Oops: I missed two files in the first commit that created this directory.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
author: J. Bruce Fields <bfields@citi.umich.edu> 2009-11-06 13:59:43 -0500
committer: J. Bruce Fields <bfields@citi.umich.edu> 2009-11-06 14:01:02 -0500
commit: ea4878a24d7e6a467d369b962bab95bd6a12cbe0 (patch)
tree: 4f937b8dfa658b16779ae2267d450b53fb035fe7 /Documentation/filesystems/nfs
parent: 8c10cbdb4af642d9a2efb45ea89251aaab905360 (diff)
3 files changed, 365 insertions, 0 deletions
diff --git a/Documentation/filesystems/nfs/00-INDEX b/Documentation/filesystems/nfs/00-INDEX
index 6ff3d212027b..2f68cd688769 100644
--- a/Documentation/filesystems/nfs/00-INDEX
+++ b/Documentation/filesystems/nfs/00-INDEX
@@ -2,6 +2,8 @@
        - this file (nfs-related documentation).
 Exporting
        - explanation of how to make filesystems exportable.
+knfsd-stats.txt
+        - statistics which the NFS server makes available to user space.
 nfs.txt
        - nfs client, and DNS resolution for fs_locations.
 nfs41-server.txt
@@ -10,3 +12,5 @@ nfs-rdma.txt
        - how to install and setup the Linux NFS/RDMA client and server software
 nfsroot.txt
        - short guide on setting up a diskless box with NFS root filesystem.
+rpc-cache.txt
+        - introduction to the caching mechanisms in the sunrpc layer.
diff --git a/Documentation/filesystems/nfs/knfsd-stats.txt b/Documentation/filesystems/nfs/knfsd-stats.txt
new file mode 100644
index 000000000000..64ced5149d37
--- /dev/null
+++ b/Documentation/filesystems/nfs/knfsd-stats.txt
@@ -0,0 +1,159 @@
+Kernel NFS Server Statistics
+============================
+This document describes the format and semantics of the statistics
+which the kernel NFS server makes available to userspace.  These
+statistics are available in several text form pseudo files, each of
+which is described separately below.
+In most cases you don't need to know these formats, as the nfsstat(8)
+program from the nfs-utils distribution provides a helpful command-line
+interface for extracting and printing them.
+All the files described here are formatted as a sequence of text lines,
+separated by newline '\n' characters.  Lines beginning with a hash
+'#' character are comments intended for humans and should be ignored
+by parsing routines.  All other lines contain a sequence of fields
+separated by whitespace.
+/proc/fs/nfsd/pool_stats
+------------------------
+This file is available in kernels from 2.6.30 onwards, if the
+/proc/fs/nfsd filesystem is mounted (it almost always should be).
+The first line is a comment which describes the fields present in
+all the other lines.  The other lines present the following data as
+a sequence of unsigned decimal numeric fields.  One line is shown
+for each NFS thread pool.
+All counters are 64 bits wide and wrap naturally.  There is no way
+to zero these counters; instead, applications should do their own
+rate conversion.
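The rate conversion mentioned above can be sketched in user space as follows. This is a minimal illustration, not part of nfs-utils: the function names are hypothetical, and it assumes the counter wraps at most once between two samples.

```python
COUNTER_MODULUS = 1 << 64  # the counters are documented as 64 bits wide


def counter_delta(previous, current):
    """Return how far a counter advanced between two samples.

    Modular subtraction gives the right answer even if the counter
    wrapped past 2**64 once between the samples.
    """
    return (current - previous) % COUNTER_MODULUS


def counter_rate(previous, current, interval_seconds):
    """Convert two samples taken interval_seconds apart into events/second."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return counter_delta(previous, current) / interval_seconds
```

Sampling each counter at a fixed interval and feeding consecutive samples to counter_rate() yields the "rate of change" that the field descriptions below refer to.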
+pool
+        The id number of the NFS thread pool to which this line applies.
+        This number does not change.
+        Thread pool ids are a contiguous set of small integers starting
+        at zero.  The maximum value depends on the thread pool mode, but
+        currently cannot be larger than the number of CPUs in the system.
+        Note that in the default case there will be a single thread pool
+        which contains all the nfsd threads and all the CPUs in the system,
+        and thus this file will have a single line with a pool id of "0".
+packets-arrived
+        Counts how many NFS packets have arrived.  More precisely, this
+        is the number of times that the network stack has notified the
+        sunrpc server layer that new data may be available on a transport
+        (e.g. an NFS or UDP socket or an NFS/RDMA endpoint).
+        Depending on the NFS workload patterns and various network stack
+        effects (such as Large Receive Offload) which can combine packets
+        on the wire, this may be either more or less than the number
+        of NFS calls received (which statistic is available elsewhere).
+        However this is a more accurate and less workload-dependent measure
+        of how much CPU load is being placed on the sunrpc server layer
+        due to NFS network traffic.
+sockets-enqueued
+        Counts how many times an NFS transport is enqueued to wait for
+        an nfsd thread to service it, i.e. no nfsd thread was considered
+        available.
+        The circumstance this statistic tracks indicates that there was NFS
+        network-facing work to be done but it couldn't be done immediately,
+        thus introducing a small delay in servicing NFS calls.  The ideal
+        rate of change for this counter is zero; significantly non-zero
+        values may indicate a performance limitation.
+        This can happen either because there are too few nfsd threads in the
+        thread pool for the NFS workload (the workload is thread-limited),
+        or because the NFS workload needs more CPU time than is available in
+        the thread pool (the workload is CPU-limited).  In the former case,
+        configuring more nfsd threads will probably improve the performance
+        of the NFS workload.  In the latter case, the sunrpc server layer is
+        already choosing not to wake idle nfsd threads because there are too
+        many nfsd threads which want to run but cannot, so configuring more
+        nfsd threads will make no difference whatsoever.  The overloads-avoided
+        statistic (see below) can be used to distinguish these cases.
+threads-woken
+        Counts how many times an idle nfsd thread is woken to try to
+        receive some data from an NFS transport.
+        This statistic tracks the circumstance where incoming
+        network-facing NFS work is being handled quickly, which is a good
+        thing.  The ideal rate of change for this counter will be close
+        to but less than the rate of change of the packets-arrived counter.
+overloads-avoided
+        Counts how many times the sunrpc server layer chose not to wake an
+        nfsd thread, despite the presence of idle nfsd threads, because
+        too many nfsd threads had been recently woken but could not get
+        enough CPU time to actually run.
+        This statistic counts a circumstance where the sunrpc layer
+        heuristically avoids overloading the CPU scheduler with too many
+        runnable nfsd threads.  The ideal rate of change for this counter
+        is zero.  Significant non-zero values indicate that the workload
+        is CPU limited.  Usually this is associated with heavy CPU usage
+        on all the CPUs in the nfsd thread pool.
+        If a sustained large overloads-avoided rate is detected on a pool,
+        the top(1) utility should be used to check for the following
+        pattern of CPU usage on all the CPUs associated with the given
+        nfsd thread pool.
+         - %us ~= 0 (as you're *NOT* running applications on your NFS server)
+         - %wa ~= 0
+         - %id ~= 0
+         - %sy + %hi + %si ~= 100
+        If this pattern is seen, configuring more nfsd threads will *not*
+        improve the performance of the workload.  If this pattern is not
+        seen, then something more subtle is wrong.
+threads-timedout
+        Counts how many times an nfsd thread triggered an idle timeout,
+        i.e. was not woken to handle any incoming network packets for
+        some time.
+        This statistic counts a circumstance where there are more nfsd
+        threads configured than can be used by the NFS workload.  This is
+        a clue that the number of nfsd threads can be reduced without
+        affecting performance.  Unfortunately, it's only a clue and not
+        a strong indication, for a couple of reasons:
+         - Currently the rate at which the counter is incremented is quite
+           slow; the idle timeout is 60 minutes.  Unless the NFS workload
+           remains constant for hours at a time, this counter is unlikely
+           to be providing information that is still useful.
+         - It is usually a wise policy to provide some slack,
+           i.e. configure a few more nfsds than are currently needed,
+           to allow for future spikes in load.
+Note that incoming packets on NFS transports will be dealt with in
+one of three ways.  An nfsd thread can be woken (threads-woken counts
+this case), or the transport can be enqueued for later attention
+(sockets-enqueued counts this case), or the packet can be temporarily
+deferred because the transport is currently being used by an nfsd
+thread.  This last case is not very interesting and is not explicitly
+counted, but can be inferred from the other counters thus:
+packets-deferred = packets-arrived - ( sockets-enqueued + threads-woken )
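As an illustration of the file format and the formula above, the following sketch parses pool_stats text and derives packets-deferred for each pool. The field order is an assumption taken from the order in which the fields are described in this document, the sample numbers are invented, and the function names are hypothetical.

```python
# Assumed field order: the order in which this document describes the fields.
FIELDS = ("pool", "packets-arrived", "sockets-enqueued",
          "threads-woken", "overloads-avoided", "threads-timedout")


def parse_pool_stats(text):
    """Parse pool_stats text into one dict per thread pool."""
    pools = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # comment lines are for humans only
        values = [int(field) for field in line.split()]
        pools.append(dict(zip(FIELDS, values)))
    return pools


def packets_deferred(pool):
    """Infer the uncounted 'deferred' case from the other counters."""
    return pool["packets-arrived"] - (
        pool["sockets-enqueued"] + pool["threads-woken"])


# Invented sample data, for illustration only.
SAMPLE = """\
# pool packets-arrived sockets-enqueued threads-woken overloads-avoided threads-timedout
0 1000 40 900 3 0
"""
```

On a live system the text would come from reading /proc/fs/nfsd/pool_stats, and the subtraction is most meaningful when applied to deltas between two samples rather than to raw counter values.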
+More
+----
+Descriptions of the other statistics files should go here.
+Greg Banks <gnb@sgi.com>
+26 Mar 2009
diff --git a/Documentation/filesystems/nfs/rpc-cache.txt b/Documentation/filesystems/nfs/rpc-cache.txt
new file mode 100644
index 000000000000..8a382bea6808
--- /dev/null
+++ b/Documentation/filesystems/nfs/rpc-cache.txt
@@ -0,0 +1,202 @@
+        This document gives a brief introduction to the caching
+mechanisms in the sunrpc layer that are used, in particular,
+for NFS authentication.
+CACHES
+======
+The caching replaces the old exports table and allows for
+a wide variety of values to be cached.
+There are a number of caches that are similar in structure though
+quite possibly very different in content and use.  There is a corpus
+of common code for managing these caches.
+Examples of caches that are likely to be needed are:
+  - mapping from IP address to client name
+  - mapping from client name and filesystem to export options
+  - mapping from UID to list of GIDs, to work around NFS's limitation
+    of 16 gids.
+  - mappings between local UID/GID and remote UID/GID for sites that
+    do not have uniform uid assignment
+  - mapping from network identity to public key for crypto authentication.
+The common code handles such things as:
+   - general cache lookup with correct locking
+   - supporting 'NEGATIVE' as well as positive entries
+   - allowing an EXPIRED time on cache items, and removing
+     items after they expire and are no longer in use.
+   - making requests to user-space to fill in cache entries
+   - allowing user-space to directly set entries in the cache
+   - delaying RPC requests that depend on as-yet incomplete
+     cache entries, and replaying those requests when the cache entry
+     is complete.
+   - cleaning out old entries as they expire.
+Creating a Cache
+----------------
+1/ A cache needs a datum to store.  This is in the form of a
+   structure definition that must contain a
+     struct cache_head
+   as an element, usually the first.
+   It will also contain a key and some content.
+   Each cache element is reference counted and contains
+   expiry and update times for use in cache management.
+2/ A cache needs a "cache_detail" structure that
+   describes the cache.  This stores the hash table, some
+   parameters for cache management, and some operations detailing how
+   to work with particular cache items.
+   The operations required are:
+        struct cache_head *alloc(void)
+                This simply allocates appropriate memory and returns
+                a pointer to the cache_head embedded within the
+                structure.
+        void cache_put(struct kref *)
+                This is called when the last reference to an item is
+                dropped.  The pointer passed is to the 'ref' field
+                in the cache_head.  cache_put should release any
+                references created by 'cache_init' and, if CACHE_VALID
+                is set, any references created by cache_update.
+                It should then release the memory allocated by
+                'alloc'.
+        int match(struct cache_head *orig, struct cache_head *new)
+                test if the keys in the two structures match.  Return
+                1 if they do, 0 if they don't.
+        void init(struct cache_head *orig, struct cache_head *new)
+                Set the 'key' fields in 'new' from 'orig'.  This may
+                include taking references to shared objects.
+        void update(struct cache_head *orig, struct cache_head *new)
+                Set the 'content' fields in 'new' from 'orig'.
+        int cache_show(struct seq_file *m, struct cache_detail *cd,
+                        struct cache_head *h)
+                Optional.  Used to provide a /proc file that lists the
+                contents of a cache.  This should show one item,
+                usually on just one line.
+        int cache_request(struct cache_detail *cd, struct cache_head *h,
+                char **bpp, int *blen)
+                Format a request to be sent to user-space for an item
+                to be instantiated.  *bpp is a buffer of size *blen.
+                *bpp should be moved forward over the encoded message,
+                and  *blen should be reduced to show how much free
+                space remains.  Return 0 on success or <0 if not
+                enough room or other problem.
+        int cache_parse(struct cache_detail *cd, char *buf, int len)
+                A message from user space has arrived to fill out a
+                cache entry.  It is in 'buf' of length 'len'.
+                cache_parse should parse this, find the item in the
+                cache with sunrpc_cache_lookup, and update the item
+                with sunrpc_cache_update.
+3/ A cache needs to be registered using cache_register().  This
+   includes it on a list of caches that will be regularly
+   cleaned to discard old data.
+Using a cache
+-------------
+To find a value in a cache, call sunrpc_cache_lookup passing a pointer
+to the cache_head in a sample item with the 'key' fields filled in.
+This will be passed to ->match to identify the target entry.  If no
+entry is found, a new entry will be created, added to the cache, and
+marked as not containing valid data.
+The item returned is typically passed to cache_check which will check
+if the data is valid, and may initiate an up-call to get fresh data.
+cache_check will return -ENOENT if the entry is negative or if an
+upcall is needed but not possible, -EAGAIN if an upcall is pending,
+or 0 if the data is valid.
+cache_check can be passed a "struct cache_req *".  This structure is
+typically embedded in the actual request and can be used to create a
+deferred copy of the request (struct cache_deferred_req).  This is
+done when the found cache item is not uptodate, but there is reason to
+believe that userspace might provide information soon.  When the cache
+item does become valid, the deferred copy of the request will be
+revisited (->revisit).  It is expected that this method will
+reschedule the request for processing.
+The value returned by sunrpc_cache_lookup can also be passed to
+sunrpc_cache_update to set the content for the item.  A second item is
+passed which should hold the content.  If the item found by _lookup
+has valid data, then it is discarded and a new item is created.  This
+saves any user of an item from worrying about content changing while
+it is being inspected.  If the item found by _lookup does not contain
+valid data, then the content is copied across and CACHE_VALID is set.
+Populating a cache
+------------------
+Each cache has a name, and when the cache is registered, a directory
+with that name is created in /proc/net/rpc.
+This directory contains a file called 'channel' which is a channel
+for communicating between kernel and user for populating the cache.
+This directory may later contain other files for interacting
+with the cache.
+The 'channel' works a bit like a datagram socket. Each 'write' is
+passed as a whole to the cache for parsing and interpretation.
+Each cache can treat the write requests differently, but it is
+expected that a message written will contain:
+  - a key
+  - an expiry time
+  - a content.
+with the intention that an item in the cache with the given key
+should be created or updated to have the given content, and the
+expiry time should be set on that item.
+Reading from a channel is a bit more interesting.  When a cache
+lookup fails, or when it succeeds but finds an entry that may soon
+expire, a request is lodged for that cache item to be updated by
+user-space.  These requests appear in the channel file.
+Successive reads will return successive requests.
+If there are no more requests to return, read will return EOF, but a
+select or poll for read will block waiting for another request to be
+added.
+Thus a user-space helper is likely to:
+  open the channel.
+    select for readable
+    read a request
+    write a response
+  loop.
+If it dies and needs to be restarted, any requests that have not been
+answered will still appear in the file and will be read by the new
+instance of the helper.
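The helper loop outlined above might look roughly like the following sketch. The function names, the 8192-byte read size, and the answer callback are assumptions made for illustration; a real helper would open the 'channel' file of one specific cache and build replies in whatever format that cache expects.

```python
import os
import select


def serve_one_request(fd, answer, timeout=None):
    """Wait for one request on fd, reply to it, and report whether one was handled."""
    readable, _, _ = select.select([fd], [], [], timeout)
    if not readable:
        return False              # timed out: nothing to do
    request = os.read(fd, 8192)   # assumed to be large enough for one request
    if not request:
        return False              # EOF: no request is queued right now
    os.write(fd, answer(request))  # each write is parsed as a whole message
    return True


def run_helper(path, answer):
    """Open the channel at path and answer requests until interrupted."""
    fd = os.open(path, os.O_RDWR)
    try:
        while True:
            serve_one_request(fd, answer)
    finally:
        os.close(fd)
```

Here answer is a caller-supplied function that takes the raw request bytes and returns the raw response bytes; how it computes the response is entirely cache-specific.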
+Each cache should define a "cache_parse" method which takes a message
+written from user-space and processes it.  It should return an error
+(which propagates back to the write syscall) or 0.
+Each cache should also define a "cache_request" method which
+takes a cache item and encodes a request into the buffer
+provided.
+Note: If a cache has no active readers on the channel, and has had no
+active readers for more than 60 seconds, further requests will not be
+added to the channel but instead all lookups that do not find a valid
+entry will fail.  This is partly for backward compatibility: The
+previous nfs exports table was deemed to be authoritative and a
+failed lookup meant a definite 'no'.
+request/response format
+-----------------------
+While each cache is free to use its own format for requests
+and responses over the channel, the following is recommended as
+appropriate and support routines are available to help:
+Each request or response record should be printable ASCII
+with precisely one newline character which should be at the end.
+Fields within the record should be separated by spaces, normally one.
+If spaces, newlines, or nul characters are needed in a field they
+must be quoted.  Two mechanisms are available:
+1/ If a field begins '\x' then it must contain an even number of
+   hex digits, and pairs of these digits provide the bytes in the
+   field.
+2/ otherwise a \ in the field must be followed by 3 octal digits
+   which give the code for a byte.  Other characters are treated
+   as themselves.  At the very least, space, newline, nul, and
+   '\' must be quoted in this way.
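The two quoting mechanisms described above can be sketched from the user-space side as follows. This is an illustration only, not the kernel's support routines: the function names are hypothetical, and the encoder simply chooses the octal form for every byte that is not a printable, non-space ASCII character.

```python
def decode_field(field):
    """Decode one field of a request/response record into raw bytes."""
    if field.startswith("\\x"):
        hex_digits = field[2:]
        if len(hex_digits) % 2:
            raise ValueError("hex field needs an even number of digits")
        return bytes.fromhex(hex_digits)
    out = bytearray()
    i = 0
    while i < len(field):
        if field[i] == "\\":
            digits = field[i + 1:i + 4]
            if len(digits) != 3 or any(d not in "01234567" for d in digits):
                raise ValueError("'\\' must be followed by 3 octal digits")
            value = int(digits, 8)
            if value > 0xFF:
                raise ValueError("octal escape does not fit in one byte")
            out.append(value)
            i += 4
        else:
            out.extend(field[i].encode("ascii"))  # other characters stand for themselves
            i += 1
    return bytes(out)


def encode_field(data):
    """Encode raw bytes as a field, using octal escapes where needed."""
    parts = []
    for byte in data:
        # Quote backslash, space, and anything outside printable ASCII
        # (which covers newline and nul).
        if byte == 0x5C or not 0x20 < byte < 0x7F:
            parts.append("\\%03o" % byte)
        else:
            parts.append(chr(byte))
    return "".join(parts)
```

Because encode_field() always escapes a backslash, its output never begins with a literal '\x', so decode_field() cannot mistake it for the hex form.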
author	J. Bruce Fields <bfields@citi.umich.edu>	2009-11-06 13:59:43 -0500
committer	J. Bruce Fields <bfields@citi.umich.edu>	2009-11-06 14:01:02 -0500
commit	ea4878a24d7e6a467d369b962bab95bd6a12cbe0 (patch)
tree	4f937b8dfa658b16779ae2267d450b53fb035fe7 /Documentation/filesystems/nfs
parent	8c10cbdb4af642d9a2efb45ea89251aaab905360 (diff)

diff --git a/Documentation/filesystems/nfs/00-INDEX b/Documentation/filesystems/nfs/00-INDEX index 6ff3d212027b..2f68cd688769 100644 --- a/Documentation/filesystems/nfs/00-INDEX +++ b/Documentation/filesystems/nfs/00-INDEX
@@ -2,6 +2,8 @@
2	- this file (nfs-related documentation).	2	- this file (nfs-related documentation).
3	Exporting	3	Exporting
4	- explanation of how to make filesystems exportable.	4	- explanation of how to make filesystems exportable.
		5	knfsd-stats.txt
		6	- statistics which the NFS server makes available to user space.
5	nfs.txt	7	nfs.txt
6	- nfs client, and DNS resolution for fs_locations.	8	- nfs client, and DNS resolution for fs_locations.
7	nfs41-server.txt	9	nfs41-server.txt
@@ -10,3 +12,5 @@ nfs-rdma.txt
10	- how to install and setup the Linux NFS/RDMA client and server software	12	- how to install and setup the Linux NFS/RDMA client and server software
11	nfsroot.txt	13	nfsroot.txt
12	- short guide on setting up a diskless box with NFS root filesystem.	14	- short guide on setting up a diskless box with NFS root filesystem.
		15	rpc-cache.txt
		16	- introduction to the caching mechanisms in the sunrpc layer.


diff --git a/Documentation/filesystems/nfs/knfsd-stats.txt b/Documentation/filesystems/nfs/knfsd-stats.txt new file mode 100644 index 000000000000..64ced5149d37 --- /dev/null +++ b/Documentation/filesystems/nfs/knfsd-stats.txt
@@ -0,0 +1,159 @@
		1
		2	Kernel NFS Server Statistics
		3	============================
		4
		5	This document describes the format and semantics of the statistics
		6	which the kernel NFS server makes available to userspace. These
		7	statistics are available in several text form pseudo files, each of
		8	which is described separately below.
		9
		10	In most cases you don't need to know these formats, as the nfsstat(8)
		11	program from the nfs-utils distribution provides a helpful command-line
		12	interface for extracting and printing them.
		13
		14	All the files described here are formatted as a sequence of text lines,
		15	separated by newline '\n' characters. Lines beginning with a hash
		16	'#' character are comments intended for humans and should be ignored
		17	by parsing routines. All other lines contain a sequence of fields
		18	separated by whitespace.
		19
		20	/proc/fs/nfsd/pool_stats
		21	------------------------
		22
		23	This file is available in kernels from 2.6.30 onwards, if the
		24	/proc/fs/nfsd filesystem is mounted (it almost always should be).
		25
		26	The first line is a comment which describes the fields present in
		27	all the other lines. The other lines present the following data as
		28	a sequence of unsigned decimal numeric fields. One line is shown
		29	for each NFS thread pool.
		30
		31	All counters are 64 bits wide and wrap naturally. There is no way
		32	to zero these counters, instead applications should do their own
		33	rate conversion.
		34
		35	pool
		36	The id number of the NFS thread pool to which this line applies.
		37	This number does not change.
		38
		39	Thread pool ids are a contiguous set of small integers starting
		40	at zero. The maximum value depends on the thread pool mode, but
		41	currently cannot be larger than the number of CPUs in the system.
		42	Note that in the default case there will be a single thread pool
		43	which contains all the nfsd threads and all the CPUs in the system,
		44	and thus this file will have a single line with a pool id of "0".
		45
		46	packets-arrived
		47	Counts how many NFS packets have arrived. More precisely, this
		48	is the number of times that the network stack has notified the
		49	sunrpc server layer that new data may be available on a transport
		50	(e.g. an NFS or UDP socket or an NFS/RDMA endpoint).
		51
		52	Depending on the NFS workload patterns and various network stack
		53	effects (such as Large Receive Offload) which can combine packets
		54	on the wire, this may be either more or less than the number
		55	of NFS calls received (which statistic is available elsewhere).
		56	However this is a more accurate and less workload-dependent measure
		57	of how much CPU load is being placed on the sunrpc server layer
		58	due to NFS network traffic.
		59
		60	sockets-enqueued
		61	Counts how many times an NFS transport is enqueued to wait for
		62	an nfsd thread to service it, i.e. no nfsd thread was considered
		63	available.
		64
		65	The circumstance this statistic tracks indicates that there was NFS
		66	network-facing work to be done but it couldn't be done immediately,
		67	thus introducing a small delay in servicing NFS calls. The ideal
		68	rate of change for this counter is zero; significantly non-zero
		69	values may indicate a performance limitation.
		70
		71	This can happen either because there are too few nfsd threads in the
		72	thread pool for the NFS workload (the workload is thread-limited),
		73	or because the NFS workload needs more CPU time than is available in
		74	the thread pool (the workload is CPU-limited). In the former case,
		75	configuring more nfsd threads will probably improve the performance
		76	of the NFS workload. In the latter case, the sunrpc server layer is
		77	already choosing not to wake idle nfsd threads because there are too
		78	many nfsd threads which want to run but cannot, so configuring more
		79	nfsd threads will make no difference whatsoever. The overloads-avoided
		80	statistic (see below) can be used to distinguish these cases.
		81
		82	threads-woken
		83	Counts how many times an idle nfsd thread is woken to try to
		84	receive some data from an NFS transport.
		85
		86	This statistic tracks the circumstance where incoming
		87	network-facing NFS work is being handled quickly, which is a good
		88	thing. The ideal rate of change for this counter will be close
		89	to but less than the rate of change of the packets-arrived counter.
		90
		91	overloads-avoided
		92	Counts how many times the sunrpc server layer chose not to wake an
		93	nfsd thread, despite the presence of idle nfsd threads, because
		94	too many nfsd threads had been recently woken but could not get
		95	enough CPU time to actually run.
		96
		97	This statistic counts a circumstance where the sunrpc layer
		98	heuristically avoids overloading the CPU scheduler with too many
		99	runnable nfsd threads. The ideal rate of change for this counter
		100	is zero. Significant non-zero values indicate that the workload
		101	is CPU limited. Usually this is associated with heavy CPU usage
		102	on all the CPUs in the nfsd thread pool.
		103
		104	If a sustained large overloads-avoided rate is detected on a pool,
		105	the top(1) utility should be used to check for the following
		106	pattern of CPU usage on all the CPUs associated with the given
		107	nfsd thread pool.
		108
		109	- %us ~= 0 (as you're NOT running applications on your NFS server)
		110
		111	- %wa ~= 0
		112
		113	- %id ~= 0
		114
		115	- %sy + %hi + %si ~= 100
		116
		117	If this pattern is seen, configuring more nfsd threads will not
		118	improve the performance of the workload. If this patten is not
		119	seen, then something more subtle is wrong.
		120
		121	threads-timedout
		122	Counts how many times an nfsd thread triggered an idle timeout,
		123	i.e. was not woken to handle any incoming network packets for
		124	some time.
		125
		126	This statistic counts a circumstance where there are more nfsd
		127	threads configured than can be used by the NFS workload. This is
		128	a clue that the number of nfsd threads can be reduced without
		129	affecting performance. Unfortunately, it's only a clue and not
		130	a strong indication, for a couple of reasons:
		131
		132	- Currently the rate at which the counter is incremented is quite
		133	slow; the idle timeout is 60 minutes. Unless the NFS workload
		134	remains constant for hours at a time, this counter is unlikely
		135	to be providing information that is still useful.
		136
		137	- It is usually a wise policy to provide some slack,
		138	i.e. configure a few more nfsds than are currently needed,
		139	to allow for future spikes in load.
		140
		141
		142	Note that incoming packets on NFS transports will be dealt with in
		143	one of three ways. An nfsd thread can be woken (threads-woken counts
		144	this case), or the transport can be enqueued for later attention
		145	(sockets-enqueued counts this case), or the packet can be temporarily
		146	deferred because the transport is currently being used by an nfsd
		147	thread. This last case is not very interesting and is not explicitly
		148	counted, but can be inferred from the other counters thus:
		149
		150	packets-deferred = packets-arrived - ( sockets-enqueued + threads-woken )
		151
		152
		153	More
		154	----
		155	Descriptions of the other statistics file should go here.
		156
		157
		158	Greg Banks <gnb@sgi.com>
		159	26 Mar 2009


diff --git a/Documentation/filesystems/nfs/rpc-cache.txt b/Documentation/filesystems/nfs/rpc-cache.txt new file mode 100644 index 000000000000..8a382bea6808 --- /dev/null +++ b/Documentation/filesystems/nfs/rpc-cache.txt
@@ -0,0 +1,202 @@
		1	This document gives a brief introduction to the caching
		2	mechanisms in the sunrpc layer that is used, in particular,
		3	for NFS authentication.
		4
		5	CACHES
		6	======
		7	The caching replaces the old exports table and allows for
		8	a wide variety of values to be caches.
		9
		10	There are a number of caches that are similar in structure though
		11	quite possibly very different in content and use. There is a corpus
		12	of common code for managing these caches.
		13
		14	Examples of caches that are likely to be needed are:
		15	- mapping from IP address to client name
		16	- mapping from client name and filesystem to export options
		17	- mapping from UID to list of GIDs, to work around NFS's limitation
		18	of 16 gids.
		19	- mappings between local UID/GID and remote UID/GID for sites that
		20	do not have uniform uid assignment
		21	- mapping from network identify to public key for crypto authentication.
		22
		23	The common code handles such things as:
		24	- general cache lookup with correct locking
		25	- supporting 'NEGATIVE' as well as positive entries
		26	- allowing an EXPIRED time on cache items, and removing
		27	items after they expire, and are no longer in-use.
		28	- making requests to user-space to fill in cache entries
		29	- allowing user-space to directly set entries in the cache
		30	- delaying RPC requests that depend on as-yet incomplete
		31	cache entries, and replaying those requests when the cache entry
		32	is complete.
		33	- clean out old entries as they expire.
		34
		35	Creating a Cache
		36	----------------
		37
		38	1/ A cache needs a datum to store. This is in the form of a
		39	structure definition that must contain a
		40	struct cache_head
		41	as an element, usually the first.
		42	It will also contain a key and some content.
		43	Each cache element is reference counted and contains
		44	expiry and update times for use in cache management.
		45	2/ A cache needs a "cache_detail" structure that
		46	describes the cache. This stores the hash table, some
		47	parameters for cache management, and some operations detailing how
		48	to work with particular cache items.
		49	The operations requires are:
		50	struct cache_head *alloc(void)
		51	This simply allocates appropriate memory and returns
		52	a pointer to the cache_detail embedded within the
		53	structure
		54	void cache_put(struct kref *)
		55	This is called when the last reference to an item is
		56	dropped. The pointer passed is to the 'ref' field
		57	in the cache_head. cache_put should release any
		58	references create by 'cache_init' and, if CACHE_VALID
		59	is set, any references created by cache_update.
		60	It should then release the memory allocated by
		61	'alloc'.
		62	int match(struct cache_head orig, struct cache_head new)
		63	test if the keys in the two structures match. Return
		64	1 if they do, 0 if they don't.
		65	void init(struct cache_head orig, struct cache_head new)
		66	Set the 'key' fields in 'new' from 'orig'. This may
		67	include taking references to shared objects.
		68	void update(struct cache_head orig, struct cache_head new)
		69	Set the 'content' fileds in 'new' from 'orig'.
		70	int cache_show(struct seq_file m, struct cache_detail cd,
		71	struct cache_head *h)
		72	Optional. Used to provide a /proc file that lists the
		73	contents of a cache. This should show one item,
		74	usually on just one line.
		75	int cache_request(struct cache_detail cd, struct cache_head h,
		76	char *bpp, int blen)
		77	Format a request to be send to user-space for an item
		78	to be instantiated. bpp is a buffer of size blen.
		79	bpp should be moved forward over the encoded message,
		80	and *blen should be reduced to show how much free
		81	space remains. Return 0 on success or <0 if not
		82	enough room or other problem.
		83	int cache_parse(struct cache_detail cd, char buf, int len)
		84	A message from user space has arrived to fill out a
		85	cache entry. It is in 'buf' of length 'len'.
		86	cache_parse should parse this, find the item in the
		87	cache with sunrpc_cache_lookup, and update the item
		88	with sunrpc_cache_update.
		89
		90
		91	3/ A cache needs to be registered using cache_register(). This
		92	includes it on a list of caches that will be regularly
		93	cleaned to discard old data.
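
The item layout and the alloc, match, init and update operations
described above can be sketched in userspace as follows. This is only
an illustration under stated assumptions: 'struct cache_head' here is a
simplified stand-in rather than the kernel definition, and
'struct name_item', its fields and the name_* functions are
hypothetical. The parameters follow the description in this document
(copy from 'orig' into the new item).

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the kernel's struct cache_head (assumption). */
struct cache_head {
	long expiry_time;
	int valid;		/* stands in for the CACHE_VALID flag */
};

/* Hypothetical cache item: maps a client name to a numeric id. */
struct name_item {
	struct cache_head h;	/* embedded header */
	char name[32];		/* 'key' field */
	int id;			/* 'content' field */
};

/* Recover the enclosing item from a pointer to its embedded header. */
struct name_item *to_item(struct cache_head *h)
{
	return (struct name_item *)((char *)h - offsetof(struct name_item, h));
}

/* alloc: allocate an item and return a pointer to its embedded header. */
struct cache_head *name_alloc(void)
{
	struct name_item *item = calloc(1, sizeof(*item));

	return item ? &item->h : NULL;
}

/* match: return 1 if the keys of the two items are equal, 0 otherwise. */
int name_match(struct cache_head *orig, struct cache_head *newh)
{
	return strcmp(to_item(orig)->name, to_item(newh)->name) == 0;
}

/* init: set the 'key' fields in the new item from 'orig'. */
void name_init(struct cache_head *orig, struct cache_head *newh)
{
	struct name_item *n = to_item(newh);

	strncpy(n->name, to_item(orig)->name, sizeof(n->name) - 1);
	n->name[sizeof(n->name) - 1] = '\0';
}

/* update: set the 'content' fields in the new item from 'orig'. */
void name_update(struct cache_head *orig, struct cache_head *newh)
{
	to_item(newh)->id = to_item(orig)->id;
}
```

Embedding the header as a member is what lets generic cache code work
on 'struct cache_head' pointers while each cache recovers its own item
type from them.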
		94
		95	Using a cache
		96	-------------
		97
		98	To find a value in a cache, call sunrpc_cache_lookup passing a pointer
		99	to the cache_head in a sample item with the 'key' fields filled in.
		100	This will be passed to ->match to identify the target entry. If no
		101	entry is found, a new entry will be created, added to the cache, and
		102	marked as not containing valid data.
		103
		104	The item returned is typically passed to cache_check which will check
		105	if the data is valid, and may initiate an upcall to get fresh data.
		106	cache_check will return -ENOENT if the entry is negative or if an
		107	upcall is needed but not possible, -EAGAIN if an upcall is pending,
		108	or 0 if the data is valid.
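
A caller typically branches on these three results. The following is a
minimal sketch of that dispatch; the enum and the function name are
hypothetical, and treating any other value like -ENOENT is an
assumption made for this example, not something cache_check promises.

```c
#include <errno.h>

/* Hypothetical actions a caller might take after cache_check. */
enum lookup_action {
	USE_ENTRY,	/* 0: the data is valid */
	DEFER_REQUEST,	/* -EAGAIN: an upcall is pending, retry later */
	REJECT_REQUEST	/* -ENOENT: negative entry or no upcall possible */
};

/* Map a cache_check style return value to an action. */
enum lookup_action action_for(int rv)
{
	switch (rv) {
	case 0:
		return USE_ENTRY;
	case -EAGAIN:
		return DEFER_REQUEST;
	case -ENOENT:
	default:		/* assumption: unknown values are rejected */
		return REJECT_REQUEST;
	}
}
```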
		109
		110	cache_check can be passed a "struct cache_req *". This structure is
		111	typically embedded in the actual request and can be used to create a
		112	deferred copy of the request (struct cache_deferred_req). This is
		113	done when the found cache item is not up to date, but there is reason to
		114	believe that userspace might provide information soon. When the cache
		115	item does become valid, the deferred copy of the request will be
		116	revisited (->revisit). It is expected that this method will
		117	reschedule the request for processing.
		118
		119	The value returned by sunrpc_cache_lookup can also be passed to
		120	sunrpc_cache_update to set the content for the item. A second item is
		121	passed which should hold the content. If the item found by _lookup
		122	has valid data, then it is discarded and a new item is created. This
		123	saves any user of an item from worrying about content changing while
		124	it is being inspected. If the item found by _lookup does not contain
		125	valid data, then the content is copied across and CACHE_VALID is set.
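
The rule in the last paragraph (fill in an item that has no valid data,
but replace an item that already has some) can be modelled in userspace
as below. This is a simplified model and not the kernel
implementation: 'struct item' and 'model_update' are hypothetical, and
hashing, locking and reference counting are left out.

```c
#include <stdlib.h>

/* Simplified model of a cache item (not the kernel structures). */
struct item {
	int valid;	/* stands in for CACHE_VALID */
	int key;
	int content;
};

/*
 * If 'old' has no valid data, copy the content into it and mark it
 * valid. If 'old' already has valid data, leave it untouched and
 * return a newly allocated replacement with the same key.
 * Returns NULL if allocation fails.
 */
struct item *model_update(struct item *old, const struct item *src)
{
	struct item *target = old;

	if (old->valid) {
		target = malloc(sizeof(*target));
		if (!target)
			return NULL;
		target->key = old->key;
	}
	target->content = src->content;
	target->valid = 1;
	return target;
}
```

Because an item that already holds valid data is never modified, code
that is still inspecting it sees stable content.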
		126
		127	Populating a cache
		128	------------------
		129
		130	Each cache has a name, and when the cache is registered, a directory
		131	with that name is created in /proc/net/rpc.
		132
		133	This directory contains a file called 'channel' which is a channel
		134	for communicating between kernel and user for populating the cache.
		135	This directory may later contain other files for interacting
		136	with the cache.
		137
		138	The 'channel' works a bit like a datagram socket. Each 'write' is
		139	passed as a whole to the cache for parsing and interpretation.
		140	Each cache can treat the write requests differently, but it is
		141	expected that a message written will contain:
		142	- a key
		143	- an expiry time
		144	- a content.
		145	with the intention that an item in the cache with the given key
		146	should be created or updated to have the given content, and the
		147	expiry time should be set on that item.
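
As an illustration, a helper might build such a message as one record
before passing it to a single write. The sketch below assumes, for the
example only, that the expiry time is written as decimal seconds and
that neither the key nor the content contains characters that need
quoting; each cache defines its own actual format, and 'format_update'
is a hypothetical name.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build "key expiry content\n" in buf.
 * Returns the record length, or -1 if it does not fit.
 */
int format_update(char *buf, size_t len, const char *key,
		  long expiry, const char *content)
{
	int n = snprintf(buf, len, "%s %ld %s\n", key, expiry, content);

	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}
```

The whole record should then go to the channel in one write call, since
each write is parsed as a complete message.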
		148
		149	Reading from a channel is a bit more interesting. When a cache
		150	lookup fails, or when it succeeds but finds an entry that may soon
		151	expire, a request is lodged for that cache item to be updated by
		152	user-space. These requests appear in the channel file.
		153
		154	Successive reads will return successive requests.
		155	If there are no more requests to return, read will return EOF, but a
		156	select or poll for read will block waiting for another request to be
		157	added.
		158
		159	Thus a user-space helper is likely to:
		160	open the channel.
		161	select for readable
		162	read a request
		163	write a response
		164	loop.
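
The steps above can be sketched in C roughly as follows. The function
names, the callback type and the 4096-byte buffers are assumptions made
for this example; 'echo_answer' is only a placeholder for the code that
would look up the key and encode a real response.

```c
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical callback type: build a response record for a request.
 * Returns the response length, or <0 to give up on this request. */
typedef ssize_t (*answer_fn)(const char *req, size_t req_len,
			     char *resp, size_t resp_size);

/* Placeholder answer that simply echoes the request back. */
ssize_t echo_answer(const char *req, size_t req_len,
		    char *resp, size_t resp_size)
{
	if (req_len > resp_size)
		return -1;
	memcpy(resp, req, req_len);
	return (ssize_t)req_len;
}

/* Open /proc/net/rpc/<cache_name>/channel for reading and writing. */
int open_channel(const char *cache_name)
{
	char path[256];
	int n = snprintf(path, sizeof(path), "/proc/net/rpc/%s/channel",
			 cache_name);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	return open(path, O_RDWR);
}

/*
 * Wait for one request on fd and answer it.
 * Returns 1 if a request was answered, 0 if the wait timed out or
 * read returned EOF, and -1 on error.
 */
int serve_one(int fd, int timeout_ms, answer_fn answer)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	char req[4096], resp[4096];
	ssize_t n, m;
	int ready;

	ready = poll(&pfd, 1, timeout_ms);	/* select for readable */
	if (ready <= 0)
		return ready;
	n = read(fd, req, sizeof(req));		/* read a request */
	if (n <= 0)
		return (int)n;
	m = answer(req, (size_t)n, resp, sizeof(resp));
	if (m < 0)
		return -1;
	/* one write per response: each write is parsed as a whole */
	if (write(fd, resp, (size_t)m) != m)
		return -1;
	return 1;
}
```

A helper would open the channel once and then call serve_one in a loop
with a timeout of -1, so that poll blocks until the next request is
added.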
		165
		166	If it dies and needs to be restarted, any requests that have not been
		167	answered will still appear in the file and will be read by the new
		168	instance of the helper.
		169
		170	Each cache should define a "cache_parse" method which takes a message
		171	written from user-space and processes it. It should return an error
		172	(which propagates back to the write syscall) or 0.
		173
		174	Each cache should also define a "cache_request" method which
		175	takes a cache item and encodes a request into the buffer
		176	provided.
		177
		178	Note: If a cache has no active readers on the channel, and has had no
		179	active readers for more than 60 seconds, further requests will not be
		180	added to the channel but instead all lookups that do not find a valid
		181	entry will fail. This is partly for backward compatibility: The
		182	previous nfs exports table was deemed to be authoritative and a
		183	failed lookup meant a definite 'no'.
		184
		185	request/response format
		186	-----------------------
		187
		188	While each cache is free to use its own format for requests
		189	and responses over the channel, the following is recommended as
		190	appropriate and support routines are available to help:
		191	Each request or response record should be printable ASCII
		192	with precisely one newline character which should be at the end.
		193	Fields within the record should be separated by spaces, normally one.
		194	If spaces, newlines, or nul characters are needed in a field they
		195	must be quoted. Two mechanisms are available:
		196	1/ If a field begins with '\x' then it must contain an even number of
		197	hex digits, and pairs of these digits provide the bytes in the
		198	field.
		199	2/ Otherwise, a '\' in the field must be followed by 3 octal digits
		200	which give the code for a byte. Other characters are treated
		201	as themselves. At the very least, space, newline, nul, and
		202	'\' must be quoted in this way.
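
The two quoting mechanisms can be illustrated with a small userspace
encoder and decoder. This is a sketch written for this document, not
the kernel's support routines: 'quote_field' and 'unquote_field' are
hypothetical names, and escaping every byte outside printable ASCII (in
addition to the four that must be quoted) is a choice made here to keep
records printable.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Quote 'len' bytes using mechanism 2 (\ooo octal escapes).
 * Returns the quoted length, or -1 if 'out' is too small. */
int quote_field(const unsigned char *in, size_t len,
		char *out, size_t out_size)
{
	size_t o = 0;

	for (size_t i = 0; i < len; i++) {
		unsigned char c = in[i];

		if (c <= ' ' || c >= 127 || c == '\\') {
			if (o + 4 >= out_size)
				return -1;
			snprintf(out + o, 5, "\\%03o", (unsigned)c);
			o += 4;
		} else {
			if (o + 1 >= out_size)
				return -1;
			out[o++] = (char)c;
		}
	}
	if (o >= out_size)
		return -1;
	out[o] = '\0';
	return (int)o;
}

/* Value of one hex digit, or -1 if 'c' is not a hex digit. */
static int hexval(int c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	c = tolower(c);
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	return -1;
}

/* Decode a NUL-terminated field quoted with either mechanism.
 * Returns the number of decoded bytes, or -1 on malformed input. */
int unquote_field(const char *in, unsigned char *out, size_t out_size)
{
	size_t o = 0;

	if (in[0] == '\\' && in[1] == 'x') {
		/* mechanism 1: the rest must be pairs of hex digits */
		for (in += 2; *in; in += 2) {
			int hi = hexval((unsigned char)in[0]);
			int lo = hi < 0 ? -1 : hexval((unsigned char)in[1]);

			if (lo < 0 || o >= out_size)
				return -1;
			out[o++] = (unsigned char)(hi * 16 + lo);
		}
		return (int)o;
	}
	while (*in) {
		unsigned char c = (unsigned char)*in++;

		if (c == '\\') {
			/* mechanism 2: exactly three octal digits follow */
			unsigned v = 0;

			for (int k = 0; k < 3; k++, in++) {
				if (*in < '0' || *in > '7')
					return -1;
				v = v * 8 + (unsigned)(*in - '0');
			}
			if (v > 255)
				return -1;
			c = (unsigned char)v;
		}
		if (o >= out_size)
			return -1;
		out[o++] = c;
	}
	return (int)o;
}
```

For example, the four bytes 'a', space, 'b', '\' are quoted as
a\040b\134, and the field \x6869 decodes to the two bytes "hi".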