Document /proc/fs/nfsd/pool_stats

Document the format and semantics of the /proc/fs/nfsd/pool_stats file.

Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
author: Greg Banks <gnb@sgi.com> 2009-03-26 02:45:27 -0400
committer: J. Bruce Fields <bfields@citi.umich.edu> 2009-03-27 19:24:27 -0400
commit: b5cbc369db39d9080f4932db8607aea1e1654d4d (patch)
tree: 439327c3fa2fc0ee6e4171e0025f7378f51fd289 /Documentation/filesystems
parent: abd91ee979f785b7377216532620d98ab4e3e5af (diff)
1 files changed, 159 insertions, 0 deletions
diff --git a/Documentation/filesystems/knfsd-stats.txt b/Documentation/filesystems/knfsd-stats.txt
new file mode 100644
index 000000000000..64ced5149d37
--- /dev/null
+++ b/Documentation/filesystems/knfsd-stats.txt
@@ -0,0 +1,159 @@
+
+Kernel NFS Server Statistics
+============================
+
+This document describes the format and semantics of the statistics
+which the kernel NFS server makes available to userspace.  These
+statistics are available in several text form pseudo files, each of
+which is described separately below.
+
+In most cases you don't need to know these formats, as the nfsstat(8)
+program from the nfs-utils distribution provides a helpful command-line
+interface for extracting and printing them.
+
+All the files described here are formatted as a sequence of text lines,
+separated by newline '\n' characters.  Lines beginning with a hash
+'#' character are comments intended for humans and should be ignored
+by parsing routines.  All other lines contain a sequence of fields
+separated by whitespace.
+
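As an illustration of the line format described above, a minimal Python sketch of a parser might look like the following. It is not part of the patch. The function names are hypothetical, and the field order is an assumption taken from the order in which this document describes the pool_stats fields.

```python
# Field names in the order this document describes them (an assumption,
# not something read from the kernel at run time).
POOL_STATS_FIELDS = (
    "pool",
    "packets-arrived",
    "sockets-enqueued",
    "threads-woken",
    "overloads-avoided",
    "threads-timedout",
)


def parse_pool_stats(text):
    """Return one dict per thread pool, mapping field name to int."""
    pools = []
    for line in text.split("\n"):
        # Skip empty lines and comment lines intended for humans.
        if not line.strip() or line.startswith("#"):
            continue
        values = [int(field) for field in line.split()]
        pools.append(dict(zip(POOL_STATS_FIELDS, values)))
    return pools


def read_pool_stats(path="/proc/fs/nfsd/pool_stats"):
    """Read and parse the pseudo file; requires /proc/fs/nfsd to be mounted."""
    with open(path) as f:
        return parse_pool_stats(f.read())
```

Comment lines are skipped rather than interpreted, following the advice that they are meant for humans.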
+/proc/fs/nfsd/pool_stats
+------------------------
+
+This file is available in kernels from 2.6.30 onwards, if the
+/proc/fs/nfsd filesystem is mounted (it almost always should be).
+
+The first line is a comment which describes the fields present in
+all the other lines.  The other lines present the following data as
+a sequence of unsigned decimal numeric fields.  One line is shown
+for each NFS thread pool.
+
+All counters are 64 bits wide and wrap naturally.  There is no way
+to zero these counters; instead, applications should do their own
+rate conversion.
+
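One way an application might do its own rate conversion is to sample a counter twice and take the difference modulo 2^64, which handles a single wrap between samples. This is only a sketch; the function names are hypothetical and not part of the patch.

```python
COUNTER_MODULUS = 1 << 64  # counters are 64 bits wide and wrap naturally


def counter_delta(previous, current):
    """Increase between two samples, allowing for one wrap in between."""
    return (current - previous) % COUNTER_MODULUS


def counter_rate(previous, current, interval_seconds):
    """Average events per second between two samples of one counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return counter_delta(previous, current) / interval_seconds
```

For example, `counter_rate(100, 160, 30.0)` gives 2.0 events per second.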
+pool
+        The id number of the NFS thread pool to which this line applies.
+        This number does not change.
+
+        Thread pool ids are a contiguous set of small integers starting
+        at zero.  The maximum value depends on the thread pool mode, but
+        currently cannot be larger than the number of CPUs in the system.
+        Note that in the default case there will be a single thread pool
+        which contains all the nfsd threads and all the CPUs in the system,
+        and thus this file will have a single line with a pool id of "0".
+
+packets-arrived
+        Counts how many NFS packets have arrived.  More precisely, this
+        is the number of times that the network stack has notified the
+        sunrpc server layer that new data may be available on a transport
+        (e.g. an NFS TCP or UDP socket or an NFS/RDMA endpoint).
+
+        Depending on the NFS workload patterns and various network stack
+        effects (such as Large Receive Offload) which can combine packets
+        on the wire, this may be either more or less than the number
+        of NFS calls received (a statistic which is available elsewhere).
+        However, this is a more accurate and less workload-dependent measure
+        of how much CPU load is being placed on the sunrpc server layer
+        due to NFS network traffic.
+
+sockets-enqueued
+        Counts how many times an NFS transport is enqueued to wait for
+        an nfsd thread to service it, i.e. no nfsd thread was considered
+        available.
+
+        The circumstance this statistic tracks indicates that there was NFS
+        network-facing work to be done but it couldn't be done immediately,
+        thus introducing a small delay in servicing NFS calls.  The ideal
+        rate of change for this counter is zero; significantly non-zero
+        values may indicate a performance limitation.
+
+        This can happen either because there are too few nfsd threads in the
+        thread pool for the NFS workload (the workload is thread-limited),
+        or because the NFS workload needs more CPU time than is available in
+        the thread pool (the workload is CPU-limited).  In the former case,
+        configuring more nfsd threads will probably improve the performance
+        of the NFS workload.  In the latter case, the sunrpc server layer is
+        already choosing not to wake idle nfsd threads because there are too
+        many nfsd threads which want to run but cannot, so configuring more
+        nfsd threads will make no difference whatsoever.  The overloads-avoided
+        statistic (see below) can be used to distinguish these cases.
+
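The distinction between a thread-limited and a CPU-limited workload could be sketched as a check on the rates of change of the two counters. This is a rough illustration, not part of the patch: the function name, the returned labels, and especially the threshold are assumptions, since the text only speaks of "significantly non-zero" rates.

```python
def classify_pool(enqueued_rate, overloads_rate, threshold=1.0):
    """Guess why transports are being enqueued on one thread pool.

    enqueued_rate and overloads_rate are the rates of change (events per
    second) of sockets-enqueued and overloads-avoided.  threshold is an
    arbitrary illustrative cutoff for "significantly non-zero".
    """
    if enqueued_rate <= threshold:
        return "ok"  # work is mostly being handled immediately
    if overloads_rate > threshold:
        return "cpu-limited"  # more nfsd threads would not help
    return "thread-limited"  # more nfsd threads will probably help
```

A suitable threshold depends on the workload and would need to be tuned for a real server.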
+threads-woken
+        Counts how many times an idle nfsd thread is woken to try to
+        receive some data from an NFS transport.
+
+        This statistic tracks the circumstance where incoming
+        network-facing NFS work is being handled quickly, which is a good
+        thing.  The ideal rate of change for this counter will be close
+        to but less than the rate of change of the packets-arrived counter.
+
+overloads-avoided
+        Counts how many times the sunrpc server layer chose not to wake an
+        nfsd thread, despite the presence of idle nfsd threads, because
+        too many nfsd threads had been recently woken but could not get
+        enough CPU time to actually run.
+
+        This statistic counts a circumstance where the sunrpc layer
+        heuristically avoids overloading the CPU scheduler with too many
+        runnable nfsd threads.  The ideal rate of change for this counter
+        is zero.  Significant non-zero values indicate that the workload
+        is CPU-limited.  Usually this is associated with heavy CPU usage
+        on all the CPUs in the nfsd thread pool.
+
+        If a sustained large overloads-avoided rate is detected on a pool,
+        the top(1) utility should be used to check for the following
+        pattern of CPU usage on all the CPUs associated with the given
+        nfsd thread pool:
+
+         - %us ~= 0 (as you're *NOT* running applications on your NFS server)
+
+         - %wa ~= 0
+
+         - %id ~= 0
+
+         - %sy + %hi + %si ~= 100
+
+        If this pattern is seen, configuring more nfsd threads will *not*
+        improve the performance of the workload.  If this pattern is not
+        seen, then something more subtle is wrong.
+
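The CPU usage pattern above could be checked mechanically along the following lines. This is a sketch, not part of the patch: the function name is hypothetical, the percentages are assumed to have been obtained separately (for example, read off top(1) by hand), and the tolerance used to interpret "~=" is an arbitrary assumption.

```python
def matches_cpu_limited_pattern(us, sy, hi, si, wa, idle, tolerance=5.0):
    """Check one CPU's usage percentages against the pattern above.

    tolerance (in percentage points) is an illustrative reading of "~=".
    """
    return (
        us <= tolerance                          # %us ~= 0
        and wa <= tolerance                      # %wa ~= 0
        and idle <= tolerance                    # %id ~= 0
        and sy + hi + si >= 100.0 - tolerance    # %sy + %hi + %si ~= 100
    )
```

The check would have to hold for every CPU associated with the thread pool, not just one.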
+threads-timedout
+        Counts how many times an nfsd thread triggered an idle timeout,
+        i.e. was not woken to handle any incoming network packets for
+        some time.
+
+        This statistic counts a circumstance where there are more nfsd
+        threads configured than can be used by the NFS workload.  This is
+        a clue that the number of nfsd threads can be reduced without
+        affecting performance.  Unfortunately, it's only a clue and not
+        a strong indication, for a couple of reasons:
+
+         - Currently the rate at which the counter is incremented is quite
+           slow; the idle timeout is 60 minutes.  Unless the NFS workload
+           remains constant for hours at a time, this counter is unlikely
+           to be providing information that is still useful.
+
+         - It is usually a wise policy to provide some slack,
+           i.e. configure a few more nfsds than are currently needed,
+           to allow for future spikes in load.
+
+
+Note that incoming packets on NFS transports will be dealt with in
+one of three ways.  An nfsd thread can be woken (threads-woken counts
+this case), or the transport can be enqueued for later attention
+(sockets-enqueued counts this case), or the packet can be temporarily
+deferred because the transport is currently being used by an nfsd
+thread.  This last case is not very interesting and is not explicitly
+counted, but can be inferred from the other counters thus:
+
+packets-deferred = packets-arrived - ( sockets-enqueued + threads-woken )
+
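Applied to two samples of one pool's counters, the formula above might be computed as in this sketch, which is not part of the patch. The function name and the dictionary layout (field name mapped to integer value) are assumptions; differences are taken modulo 2^64 because the counters wrap.

```python
COUNTER_MODULUS = 1 << 64  # counters are 64 bits wide and wrap naturally


def packets_deferred(previous, current):
    """Packets deferred between two samples of one pool's counters.

    previous and current map field names to counter values.
    """
    def delta(name):
        return (current[name] - previous[name]) % COUNTER_MODULUS

    return delta("packets-arrived") - (
        delta("sockets-enqueued") + delta("threads-woken")
    )
```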
+
+More
+----
+Descriptions of the other statistics files should go here.
+
+
+Greg Banks <gnb@sgi.com>
+26 Mar 2009