author     Linus Torvalds <torvalds@ppc970.osdl.org>  2005-04-16 18:20:36 -0400
committer  Linus Torvalds <torvalds@ppc970.osdl.org>  2005-04-16 18:20:36 -0400
commit     1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 (patch)
tree       0bba044c4ce775e45a88a51686b5d9f90697ea9d /Documentation/iostats.txt

Linux-2.6.12-rc2 (tag v2.6.12-rc2)

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!
Diffstat (limited to 'Documentation/iostats.txt')
-rw-r--r--  Documentation/iostats.txt  150
1 files changed, 150 insertions, 0 deletions

diff --git a/Documentation/iostats.txt b/Documentation/iostats.txt
new file mode 100644
index 000000000000..09a1bafe2528
--- /dev/null
+++ b/Documentation/iostats.txt
@@ -0,0 +1,150 @@

I/O statistics fields
---------------------

Last modified Sep 30, 2003

Since 2.4.20 (and some versions before, with patches), and 2.5.45,
more extensive disk statistics have been introduced to help measure disk
activity. Tools such as sar and iostat typically interpret these and do
the work for you, but in case you are interested in creating your own
tools, the fields are explained here.

In 2.4, the information is found as additional fields in
/proc/partitions. In 2.6, the same information is found in two
places: one is in the file /proc/diskstats, and the other is within
the sysfs file system, which must be mounted in order to obtain
the information. Throughout this document we'll assume that sysfs
is mounted on /sys, although of course it may be mounted anywhere.
Both /proc/diskstats and sysfs use the same source for the information
and so should not differ.

Here are examples of these different formats:

2.4:
    3 0 39082680 hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160
    3 1 9221278 hda1 35486 0 35496 38030 0 0 0 0 0 38030 38030

2.6 sysfs:
    446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160
    35486 38030 38030 38030

2.6 diskstats:
    3 0 hda 446216 784926 9550688 4382310 424847 312726 5922052 19310380 0 3376340 23705160
    3 1 hda1 35486 38030 38030 38030

On 2.4 you might execute "grep 'hda ' /proc/partitions". On 2.6, you have
a choice of "cat /sys/block/hda/stat" or "grep 'hda ' /proc/diskstats".
The sysfs choice works well if you are watching a known, small set of
disks. /proc/diskstats may be a better choice if you are watching a large
number of disks, because you'll avoid the overhead of 50, 100, or 500 or
more opens/closes with each snapshot of your disk statistics.
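
The three formats above differ only in what precedes the statistics, so they are easy to split apart. What follows is a minimal Python sketch, not part of any existing tool. The function names are hypothetical. It assumes whitespace-separated columns exactly as in the examples, and it does not enforce a field count, because disks and 2.6 partitions report different numbers of fields.

```python
def parse_diskstats_line(line):
    """Split one 2.6 /proc/diskstats line into (major, minor, name, stats)."""
    parts = line.split()
    major, minor, name = int(parts[0]), int(parts[1]), parts[2]
    # The remaining columns are the statistics: 11 for disks, 4 for partitions.
    return major, minor, name, [int(x) for x in parts[3:]]


def parse_partitions_line(line):
    """Split one 2.4 /proc/partitions line; the #blocks column is skipped."""
    parts = line.split()
    major, minor, name = int(parts[0]), int(parts[1]), parts[3]
    return major, minor, name, [int(x) for x in parts[4:]]


def parse_sysfs_stat(text):
    """Parse the contents of a sysfs stat file such as /sys/block/hda/stat."""
    return [int(x) for x in text.split()]
```

For example, reading /proc/diskstats in a single open and calling parse_diskstats_line on each non-empty line gives one tuple per device. That is the "one file instead of many" advantage described above.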

In 2.4, the statistics fields are those after the device name. In
the above example, the first field of statistics would be 446216.
By contrast, in 2.6 if you look at /sys/block/hda/stat, you'll
find just the eleven fields, beginning with 446216. If you look at
/proc/diskstats, the eleven fields will be preceded by the major and
minor device numbers, and device name. Each of these formats provides
eleven fields of statistics, each meaning exactly the same thing.
All fields except field 9 are cumulative since boot. Field 9 should
go to zero as I/Os complete; all others only increase. Yes, these are
32-bit unsigned numbers, and on a very busy or long-lived system they
may wrap. Applications should be prepared to deal with that; unless
your observations are measured in large numbers of minutes or hours,
they should not wrap twice before you notice them.
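
The counters are 32-bit unsigned values that may wrap. A tool that samples them twice can therefore take differences modulo 2^32. This is a minimal Python sketch that, like the text, assumes a counter wraps at most once between samples. The helper names are hypothetical.

```python
COUNTER_MODULUS = 2 ** 32  # the counters are 32-bit unsigned, per the text
IN_FLIGHT_INDEX = 8        # field 9 (0-based index 8) is not cumulative


def counter_delta(old, new, modulus=COUNTER_MODULUS):
    """How far a cumulative counter advanced between two samples."""
    # Modular subtraction is correct whether or not the counter wrapped
    # once between the samples; a second wrap cannot be detected.
    return (new - old) % modulus


def stats_delta(old_stats, new_stats):
    """Per-field deltas for two samples of the same device."""
    deltas = []
    for i, (old, new) in enumerate(zip(old_stats, new_stats)):
        if i == IN_FLIGHT_INDEX:
            deltas.append(new)  # report the current in-flight count as-is
        else:
            deltas.append(counter_delta(old, new))
    return deltas
```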

Each set of stats only applies to the indicated device; if you want
system-wide stats you'll have to find all the devices and sum them all up.
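
Summing can be done in a single pass over /proc/diskstats-style text. In the Python sketch below, only whole-disk lines (11 statistics) are added up. That choice is our own assumption, not a rule stated here. On 2.6, partition lines use a different 4-field layout, and their I/O is also counted against the parent disk, so including them would double-count. The function name is hypothetical.

```python
DISK_FIELDS = 11  # whole-disk lines carry 11 statistics in this document


def system_totals(diskstats_text):
    """Sum the 11 statistics over every whole-disk line of the input."""
    totals = [0] * DISK_FIELDS
    for line in diskstats_text.splitlines():
        parts = line.split()
        # major, minor, name, then the statistics; skip anything that is
        # not a whole-disk line (blank lines, 4-field partition lines).
        if len(parts) != 3 + DISK_FIELDS:
            continue
        for i, value in enumerate(parts[3:]):
            totals[i] += int(value)
    return totals
```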

Field 1 -- # of reads completed
    This is the total number of reads completed successfully.
Field 2 -- # of reads merged, field 6 -- # of writes merged
    Reads and writes which are adjacent to each other may be merged for
    efficiency. Thus two 4K reads may become one 8K read before it is
    ultimately handed to the disk, and so it will be counted (and queued)
    as only one I/O. This field lets you know how often this was done.
Field 3 -- # of sectors read
    This is the total number of sectors read successfully.
Field 4 -- # of milliseconds spent reading
    This is the total number of milliseconds spent by all reads (as
    measured from __make_request() to end_that_request_last()).
Field 5 -- # of writes completed
    This is the total number of writes completed successfully.
Field 7 -- # of sectors written
    This is the total number of sectors written successfully.
Field 8 -- # of milliseconds spent writing
    This is the total number of milliseconds spent by all writes (as
    measured from __make_request() to end_that_request_last()).
Field 9 -- # of I/Os currently in progress
    The only field that should go to zero. Incremented as requests are
    given to the appropriate request_queue_t and decremented as they finish.
Field 10 -- # of milliseconds spent doing I/Os
    This field increases so long as field 9 is nonzero.
Field 11 -- weighted # of milliseconds spent doing I/Os
    This field is incremented at each I/O start, I/O completion, I/O
    merge, or read of these stats by the number of I/Os in progress
    (field 9) times the number of milliseconds spent doing I/O since the
    last update of this field. This can provide an easy measure of both
    I/O completion time and the backlog that may be accumulating.
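
These field definitions lead to a few derived figures. Take two samples of a whole disk interval_ms milliseconds apart, and compute per-field deltas with the cumulative fields differenced. The Python sketch below turns those deltas into averages. The formulas follow directly from the definitions above. They resemble what tools such as iostat report, but they are not taken from those tools' source. The function and key names are hypothetical.

```python
def derived_metrics(delta, interval_ms):
    """delta[0]..delta[10] hold the changes in fields 1..11 of a whole disk."""
    reads, writes = delta[0], delta[4]
    return {
        # Average time per completed read / write in ms (field 4 / field 1,
        # field 8 / field 5); zero when nothing completed.
        "avg_read_ms": delta[3] / reads if reads else 0.0,
        "avg_write_ms": delta[7] / writes if writes else 0.0,
        # Fraction of the interval with at least one I/O in progress (field 10).
        "utilization": delta[9] / interval_ms,
        # Field 11 accumulates (I/Os in progress) x (ms elapsed), so dividing
        # by the interval gives the average number of I/Os in progress.
        "avg_queue_len": delta[10] / interval_ms,
    }
```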

To avoid introducing performance bottlenecks, no locks are held while
modifying these counters. This implies that minor inaccuracies may be
introduced when changes collide, so (for instance) adding up all the
read I/Os issued per partition should equal those made to the disks ...
but due to the lack of locking it may only be very close.

In 2.6, there are counters for each CPU, which makes the lack of locking
almost a non-issue. When the statistics are read, the per-CPU counters
are summed (possibly overflowing the unsigned 32-bit variable they are
summed to) and the result given to the user. There is no convenient
user interface for accessing the per-CPU counters themselves.
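
Here is why a reader can see a wrapped total even when each per-CPU count is modest. The following is a user-space model in Python. It is illustrative only and is not kernel code, and the per-CPU values in its tests are invented. It adds per-CPU counts into an unsigned 32-bit accumulator. The function name is hypothetical.

```python
MASK32 = 0xFFFFFFFF  # keep only the low 32 bits, like a u32 variable


def sum_percpu_u32(per_cpu_values):
    """Sum per-CPU counts the way a 32-bit unsigned accumulator would."""
    total = 0
    for value in per_cpu_values:
        total = (total + value) & MASK32  # overflow wraps around to zero
    return total
```

The wrapped result is still consistent modulo 2^32. Modular differencing between two reads therefore still recovers the true increase, provided the total has wrapped at most once in between.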

Disks vs Partitions
-------------------

There were significant changes between 2.4 and 2.6 in the I/O subsystem.
As a result, some statistics disappeared. The translation from
a disk address relative to a partition to the disk address relative to
the host disk happens much earlier. All merges and timings now happen
at the disk level rather than at both the disk and partition level as
in 2.4. Consequently, you'll see different statistics output on 2.6 for
partitions than for disks. There are only *four* fields available
for partitions on 2.6 machines. This is reflected in the examples above.

Field 1 -- # of reads issued
    This is the total number of reads issued to this partition.
Field 2 -- # of sectors read
    This is the total number of sectors requested to be read from this
    partition.
Field 3 -- # of writes issued
    This is the total number of writes issued to this partition.
Field 4 -- # of sectors written
    This is the total number of sectors requested to be written to
    this partition.
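
The four partition fields are easy to confuse with fields 1 to 4 of a whole disk, which mean different things. Giving them names avoids that mix-up. This is a minimal Python sketch. The record and field names are hypothetical, and the order follows the list above.

```python
from collections import namedtuple

# Hypothetical names for the four 2.6 partition fields, in the order listed.
PartitionStats = namedtuple(
    "PartitionStats",
    ["reads_issued", "sectors_read", "writes_issued", "sectors_written"],
)


def parse_partition_stats(fields):
    """Turn the four 2.6 partition fields (strings or ints) into a record."""
    if len(fields) != 4:
        raise ValueError("expected 4 partition fields, got %d" % len(fields))
    return PartitionStats(*(int(f) for f in fields))
```

Applied to the hda1 line from the examples, this gives reads_issued=35486 and 38030 for each of the other three fields.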

Note that since the address is translated to a disk-relative one, and no
record of the partition-relative address is kept, the subsequent success
or failure of the read cannot be attributed to the partition. In other
words, the number of reads for partitions is counted slightly before the
time of queuing for partitions, and at completion for whole disks. This is
a subtle distinction that is probably uninteresting for most cases.

Additional notes
----------------

In 2.6, sysfs is not mounted by default. If your distribution of
Linux hasn't added it already, here's the line you'll want to add to
your /etc/fstab:

    none /sys sysfs defaults 0 0
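
sysfs may be mounted somewhere other than /sys, so a tool may first need to find where it is. One way, assumed here and not stated in this document, is to scan the mount table in /proc/mounts format: device, mount point, file system type, options, and so on. This is a minimal Python sketch with a hypothetical function name.

```python
def find_sysfs_mount(mounts_text):
    """Return the first sysfs mount point in /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        parts = line.split()
        # Columns: device, mount point, file system type, options, ...
        if len(parts) >= 3 and parts[2] == "sysfs":
            return parts[1]
    return None  # sysfs does not appear to be mounted
```

On a running system you might pass it the contents of /proc/mounts. A result of None suggests that the fstab line above is still needed.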

In 2.6, all disk statistics were removed from /proc/stat. In 2.4, they
appear in both /proc/partitions and /proc/stat, although the ones in
/proc/stat take a very different format from those in /proc/partitions
(see proc(5), if your system has it.)

-- ricklind@us.ibm.com