Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- Documentation/filesystems/ext3.txt          |   6
-rw-r--r-- Documentation/filesystems/gfs2-glocks.txt   | 119
-rw-r--r-- Documentation/filesystems/gfs2.txt          |   9
-rw-r--r-- Documentation/filesystems/nfs/pnfs.txt      |   2
-rw-r--r-- Documentation/filesystems/proc.txt          |   1
-rw-r--r-- Documentation/filesystems/qnx6.txt          |  28
6 files changed, 140 insertions, 25 deletions
diff --git a/Documentation/filesystems/ext3.txt b/Documentation/filesystems/ext3.txt
index b100adc38ad..293855e9500 100644
--- a/Documentation/filesystems/ext3.txt
+++ b/Documentation/filesystems/ext3.txt
@@ -59,9 +59,9 @@ commit=nrsec (*) Ext3 can be told to sync all its data and metadata
                         Setting it to very large values will improve
                         performance.
 
-barrier=<0(*)|1>        This enables/disables the use of write barriers in
-barrier                 the jbd code. barrier=0 disables, barrier=1 enables.
-nobarrier (*)           This also requires an IO stack which can support
+barrier=<0|1(*)>        This enables/disables the use of write barriers in
+barrier (*)             the jbd code. barrier=0 disables, barrier=1 enables.
+nobarrier               This also requires an IO stack which can support
                         barriers, and if jbd gets an error on a barrier
                         write, it will disable again with a warning.
                         Write barriers enforce proper on-disk ordering
diff --git a/Documentation/filesystems/gfs2-glocks.txt b/Documentation/filesystems/gfs2-glocks.txt
index 0494f78d87e..fcc79957be6 100644
--- a/Documentation/filesystems/gfs2-glocks.txt
+++ b/Documentation/filesystems/gfs2-glocks.txt
@@ -61,7 +61,9 @@ go_unlock | Called on the final local unlock of a lock
 go_dump          | Called to print content of object for debugfs file, or on
                  | error to dump glock to the log.
 go_type          | The type of the glock, LM_TYPE_.....
-go_min_hold_time | The minimum hold time
+go_callback      | Called if the DLM sends a callback to drop this lock
+go_flags         | GLOF_ASPACE is set, if the glock has an address space
+                 | associated with it
 
 The minimum hold time for each lock is the time after a remote lock
 grant for which we ignore remote demote requests. This is in order to
@@ -89,6 +91,7 @@ go_demote_ok | Sometimes | Yes
 go_lock        | Yes             | No
 go_unlock      | Yes             | No
 go_dump        | Sometimes       | Yes
+go_callback    | Sometimes (N/A) | Yes
 
 N.B. Operations must not drop either the bit lock or the spinlock
 if its held on entry. go_dump and do_demote_ok must never block.
@@ -111,4 +114,118 @@ itself (locking order as above), and the other, known as the iopen
 glock is used in conjunction with the i_nlink field in the inode to
 determine the lifetime of the inode in question. Locking of inodes
 is on a per-inode basis. Locking of rgrps is on a per rgrp basis.
+In general we prefer to lock local locks prior to cluster locks.
+
+ Glock Statistics
+ ------------------
+
+The stats are divided into two sets: those relating to the
+super block and those relating to an individual glock. The
+super block stats are done on a per cpu basis in order to
+try and reduce the overhead of gathering them. They are also
+further divided by glock type. All timings are in nanoseconds.
+
+In the case of both the super block and glock statistics,
+the same information is gathered in each case. The super
+block timing statistics are used to provide default values for
+the glock timing statistics, so that newly created glocks
+should have, as far as possible, a sensible starting point.
+The per-glock counters are initialised to zero when the
+glock is created. The per-glock statistics are lost when
+the glock is ejected from memory.
+
+The statistics are divided into three pairs of mean and
+variance, plus two counters. The mean/variance pairs are
+smoothed exponential estimates and the algorithm used is
+one which will be very familiar to those used to calculation
+of round trip times in network code. See "TCP/IP Illustrated,
+Volume 1", W. Richard Stevens, sect 21.3, "Round-Trip Time Measurement",
+p. 299 and onwards. Also, Volume 2, Sect. 25.10, p. 838 and onwards.
+Unlike the TCP/IP Illustrated case, the mean and variance are
+not scaled, but are in units of integer nanoseconds.
+
+The three pairs of mean/variance measure the following
+things:
+
+ 1. DLM lock time (non-blocking requests)
+ 2. DLM lock time (blocking requests)
+ 3. Inter-request time (again to the DLM)
+
+A non-blocking request is one which will complete right
+away, whatever the state of the DLM lock in question. That
+currently means any requests when (a) the current state of
+the lock is exclusive, i.e. a lock demotion (b) the requested
+state is either null or unlocked (again, a demotion) or (c) the
+"try lock" flag is set. A blocking request covers all the other
+lock requests.
+
+There are two counters. The first is there primarily to show
+how many lock requests have been made, and thus how much data
+has gone into the mean/variance calculations. The other counter
+is counting queuing of holders at the top layer of the glock
+code. Hopefully that number will be a lot larger than the number
+of dlm lock requests issued.
+
+So why gather these statistics? There are several reasons
+we'd like to get a better idea of these timings:
+
+1. To be able to better set the glock "min hold time"
+2. To spot performance issues more easily
+3. To improve the algorithm for selecting resource groups for
+allocation (to base it on lock wait time, rather than blindly
+using a "try lock")
+
+Due to the smoothing action of the updates, a step change in
+some input quantity being sampled will only fully be taken
+into account after 8 samples (or 4 for the variance) and this
+needs to be carefully considered when interpreting the
+results.
+
+Knowing both the time it takes a lock request to complete and
+the average time between lock requests for a glock means we
+can compute the total percentage of the time for which the
+node is able to use a glock vs. time that the rest of the
+cluster has its share. That will be very useful when setting
+the lock min hold time.
+
+Great care has been taken to ensure that we
+measure exactly the quantities that we want, as accurately
+as possible. There are always inaccuracies in any
+measuring system, but I hope this is as accurate as we
+can reasonably make it.
+
+Per sb stats can be found here:
+/sys/kernel/debug/gfs2/<fsname>/sbstats
+Per glock stats can be found here:
+/sys/kernel/debug/gfs2/<fsname>/glstats
+
+Assuming that debugfs is mounted on /sys/kernel/debug and also
+that <fsname> is replaced with the name of the gfs2 filesystem
+in question.
+
+The abbreviations used in the output as are follows:
+
+srtt     - Smoothed round trip time for non-blocking dlm requests
+srttvar  - Variance estimate for srtt
+srttb    - Smoothed round trip time for (potentially) blocking dlm requests
+srttvarb - Variance estimate for srttb
+sirt     - Smoothed inter-request time (for dlm requests)
+sirtvar  - Variance estimate for sirt
+dlm      - Number of dlm requests made (dcnt in glstats file)
+queue    - Number of glock requests queued (qcnt in glstats file)
+
+The sbstats file contains a set of these stats for each glock type (so 8 lines
+for each type) and for each cpu (one column per cpu). The glstats file contains
+a set of these stats for each glock in a similar format to the glocks file, but
+using the format mean/variance for each of the timing stats.
+
+The gfs2_glock_lock_time tracepoint prints out the current values of the stats
+for the glock in question, along with some addition information on each dlm
+reply that is received:
+
+status - The status of the dlm request
+flags  - The dlm request flags
+tdiff  - The time taken by this specific request
+(remaining fields as per above list)
+
 
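The smoothed mean/variance updates and the non-blocking rules described in the gfs2-glocks.txt text added above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the gfs2 implementation: the gains of 1/8 for the mean and 1/4 for the variance are assumed from the "8 samples (or 4 for the variance)" remark and the cited TCP round-trip-time estimator, and every name here (struct smoothed_pair, smoothed_update, enum sketch_lock_state, request_is_nonblocking, record_dlm_reply) is hypothetical. The classifier encodes rules (a) to (c) to decide whether a DLM reply time feeds srtt/srttvar or srttb/srttvarb.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mean/variance pair in integer nanoseconds (not scaled,
 * unlike the TCP/IP Illustrated estimator). */
struct smoothed_pair {
	int64_t mean;
	int64_t var;
};

/* Assumed TCP-style update: the mean moves 1/8 of the way towards each
 * new sample and the variance 1/4 of the way towards the absolute error.
 * Plain division is used, so the results truncate towards zero. */
void smoothed_update(struct smoothed_pair *p, int64_t sample)
{
	int64_t delta = sample - p->mean;
	int64_t abs_delta = delta < 0 ? -delta : delta;

	p->mean += delta / 8;
	p->var += (abs_delta - p->var) / 4;
}

/* Hypothetical lock states, for this sketch only. */
enum sketch_lock_state {
	SKETCH_ST_UNLOCKED,
	SKETCH_ST_NULL,
	SKETCH_ST_SHARED,
	SKETCH_ST_EXCLUSIVE,
};

/* Non-blocking if (a) the lock is currently exclusive (a demotion),
 * (b) the requested state is null or unlocked, or (c) the try-lock
 * flag is set. Every other request counts as blocking. */
bool request_is_nonblocking(enum sketch_lock_state cur,
			    enum sketch_lock_state req, bool try_lock)
{
	return cur == SKETCH_ST_EXCLUSIVE ||
	       req == SKETCH_ST_NULL || req == SKETCH_ST_UNLOCKED ||
	       try_lock;
}

/* Route one DLM reply time (tdiff) into the srtt pair or the srttb pair. */
void record_dlm_reply(struct smoothed_pair *srtt, struct smoothed_pair *srttb,
		      bool nonblocking, int64_t tdiff)
{
	smoothed_update(nonblocking ? srtt : srttb, tdiff);
}
```

Starting from a zero pair, one 800 ns sample moves the mean to 100 ns and the variance to 200 ns; a second 800 ns sample only brings the mean to 187 ns, which shows why a step change takes several samples to be reflected.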
diff --git a/Documentation/filesystems/gfs2.txt b/Documentation/filesystems/gfs2.txt
index 4cda926628a..cc4f2306609 100644
--- a/Documentation/filesystems/gfs2.txt
+++ b/Documentation/filesystems/gfs2.txt
@@ -1,7 +1,7 @@
 Global File System
 ------------------
 
-http://sources.redhat.com/cluster/wiki/
+https://fedorahosted.org/cluster/wiki/HomePage
 
 GFS is a cluster file system. It allows a cluster of computers to
 simultaneously use a block device that is shared between them (with FC,
@@ -30,7 +30,8 @@ needed, simply:
 
 If you are using Fedora, you need to install the gfs2-utils package
 and, for lock_dlm, you will also need to install the cman package
-and write a cluster.conf as per the documentation.
+and write a cluster.conf as per the documentation. For F17 and above
+cman has been replaced by the dlm package.
 
 GFS2 is not on-disk compatible with previous versions of GFS, but it
 is pretty close.
@@ -39,8 +40,6 @@ The following man pages can be found at the URL above:
   fsck.gfs2       to repair a filesystem
   gfs2_grow       to expand a filesystem online
   gfs2_jadd       to add journals to a filesystem online
-  gfs2_tool       to manipulate, examine and tune a filesystem
-  gfs2_quota      to examine and change quota values in a filesystem
+  tunegfs2        to manipulate, examine and tune a filesystem
   gfs2_convert    to convert a gfs filesystem to gfs2 in-place
-  mount.gfs2      to help mount(8) mount a filesystem
   mkfs.gfs2       to make a filesystem
diff --git a/Documentation/filesystems/nfs/pnfs.txt b/Documentation/filesystems/nfs/pnfs.txt
index c7919c6e3be..52ae07f5f57 100644
--- a/Documentation/filesystems/nfs/pnfs.txt
+++ b/Documentation/filesystems/nfs/pnfs.txt
@@ -93,7 +93,7 @@ The API to the login script is as follows:
                 (allways exists)
                 (More protocols can be defined in the future.
                  The client does not interpret this string it is
-                 passed unchanged as recieved from the Server)
+                 passed unchanged as received from the Server)
         -o      osdname of the requested target OSD
                 (Might be empty)
                 (A string which denotes the OSD name, there is a
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index b7413cb46dc..ef088e55ab2 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -996,7 +996,6 @@ Table 1-9: Network info in /proc/net
  snmp          SNMP data
  sockstat      Socket statistics
  tcp           TCP sockets
- tr_rif        Token ring RIF routing table
  udp           UDP sockets
  unix          UNIX domain sockets
  wireless      Wireless interface data (Wavelan etc)
diff --git a/Documentation/filesystems/qnx6.txt b/Documentation/filesystems/qnx6.txt
index 050223ea03c..e59f2f09f56 100644
--- a/Documentation/filesystems/qnx6.txt
+++ b/Documentation/filesystems/qnx6.txt
@@ -17,7 +17,7 @@ concepts of blocks, inodes and directories.
 On QNX it is possible to create little endian and big endian qnx6 filesystems.
 This feature makes it possible to create and use a different endianness fs
 for the target (QNX is used on quite a range of embedded systems) plattform
-running on a different endianess.
+running on a different endianness.
 The Linux driver handles endianness transparently. (LE and BE)
 
 Blocks
@@ -26,7 +26,7 @@ Blocks
 The space in the device or file is split up into blocks. These are a fixed
 size of 512, 1024, 2048 or 4096, which is decided when the filesystem is
 created.
-Blockpointers are 32bit, so the maximum space that can be adressed is
+Blockpointers are 32bit, so the maximum space that can be addressed is
 2^32 * 4096 bytes or 16TB
 
 The superblocks
@@ -47,16 +47,16 @@ inactive superblock.
 Each superblock holds a set of root inodes for the different filesystem
 parts. (Inode, Bitmap and Longfilenames)
 Each of these root nodes holds information like total size of the stored
-data and the adressing levels in that specific tree.
-If the level value is 0, up to 16 direct blocks can be adressed by each
+data and the addressing levels in that specific tree.
+If the level value is 0, up to 16 direct blocks can be addressed by each
 node.
-Level 1 adds an additional indirect adressing level where each indirect
-adressing block holds up to blocksize / 4 bytes pointers to data blocks.
-Level 2 adds an additional indirect adressig block level (so, already up
-to 16 * 256 * 256 = 1048576 blocks that can be adressed by such a tree)a
+Level 1 adds an additional indirect addressing level where each indirect
+addressing block holds up to blocksize / 4 bytes pointers to data blocks.
+Level 2 adds an additional indirect addressing block level (so, already up
+to 16 * 256 * 256 = 1048576 blocks that can be addressed by such a tree).
 
 Unused block pointers are always set to ~0 - regardless of root node,
-indirect adressing blocks or inodes.
+indirect addressing blocks or inodes.
 Data leaves are always on the lowest level. So no data is stored on upper
 tree levels.
 
@@ -64,7 +64,7 @@ The first Superblock is located at 0x2000. (0x2000 is the bootblock size)
 The Audi MMI 3G first superblock directly starts at byte 0.
 Second superblock position can either be calculated from the superblock
 information (total number of filesystem blocks) or by taking the highest
-device address, zeroing the last 3 bytes and then substracting 0x1000 from
+device address, zeroing the last 3 bytes and then subtracting 0x1000 from
 that address.
 
 0x1000 is the size reserved for each superblock - regardless of the
@@ -83,8 +83,8 @@ size, number of blocks used, access time, change time and modification time.
 Object mode field is POSIX format. (which makes things easier)
 
 There are also pointers to the first 16 blocks, if the object data can be
-adressed with 16 direct blocks.
-For more than 16 blocks an indirect adressing in form of another tree is
+addressed with 16 direct blocks.
+For more than 16 blocks an indirect addressing in form of another tree is
 used. (scheme is the same as the one used for the superblock root nodes)
 
 The filesize is stored 64bit. Inode counting starts with 1. (whilst long
@@ -118,13 +118,13 @@ no block pointers and the directory file record pointing to the target file
 inode.
 
 Character and block special devices do not exist in QNX as those files
-are handled by the QNX kernel/drivers and created in /dev independant of the
+are handled by the QNX kernel/drivers and created in /dev independent of the
 underlaying filesystem.
 
 Long filenames
 --------------
 
-Long filenames are stored in a seperate adressing tree. The staring point
+Long filenames are stored in a separate addressing tree. The staring point
 is the longfilename root node in the active superblock.
 Each data block (tree leaves) holds one long filename. That filename is
 limited to 510 bytes. The first two starting bytes are used as length field
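The addressing rules in the qnx6.txt context above (16 direct pointers per root node or inode, and blocksize / 4 pointers per indirect block because block pointers are 32 bits) can be checked with a small C sketch. The helper names are hypothetical; the functions only reproduce the arithmetic stated in the text and do not read any on-disk qnx6 structure.

```c
#include <stdint.h>

/* Hypothetical helper: number of data blocks addressable from one root
 * node or inode whose tree has `level` indirect levels. Level 0 uses the
 * 16 direct pointers; each further level multiplies the fan-out by the
 * number of 32-bit (4-byte) pointers that fit in one block. */
uint64_t tree_capacity_blocks(uint32_t blocksize, unsigned int level)
{
	uint64_t ptrs_per_block = blocksize / 4;
	uint64_t blocks = 16;

	for (unsigned int i = 0; i < level; i++)
		blocks *= ptrs_per_block;
	return blocks;
}

/* Hypothetical helper: 2^32 block numbers times the largest block size
 * (4096 bytes), i.e. the 16TB limit quoted in the text (2^44 bytes). */
uint64_t max_addressable_bytes(void)
{
	return (UINT64_C(1) << 32) * 4096;
}

/* Hypothetical helper: unused block pointers are always set to ~0. */
int block_ptr_is_unused(uint32_t ptr)
{
	return ptr == UINT32_MAX;
}
```

With 1024-byte blocks, tree_capacity_blocks(1024, 2) gives the 1048576 blocks (16 * 256 * 256) from the text; with 4096-byte blocks the same level reaches 16 * 1024 * 1024 = 16777216 blocks.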