diff options
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/block/00-INDEX | 2 | ||||
-rw-r--r-- | Documentation/block/as-iosched.txt | 172 | ||||
-rw-r--r-- | Documentation/filesystems/ext4.txt | 2 | ||||
-rw-r--r-- | Documentation/kvm/api.txt | 10 | ||||
-rw-r--r-- | Documentation/sound/alsa/Procfile.txt | 2 | ||||
-rw-r--r-- | Documentation/vgaarbiter.txt | 2 |
6 files changed, 12 insertions, 178 deletions
diff --git a/Documentation/block/00-INDEX b/Documentation/block/00-INDEX index 961a0513f8c3..a406286f6f3e 100644 --- a/Documentation/block/00-INDEX +++ b/Documentation/block/00-INDEX | |||
@@ -1,7 +1,5 @@ | |||
1 | 00-INDEX | 1 | 00-INDEX |
2 | - This file | 2 | - This file |
3 | as-iosched.txt | ||
4 | - Anticipatory IO scheduler | ||
5 | barrier.txt | 3 | barrier.txt |
6 | - I/O Barriers | 4 | - I/O Barriers |
7 | biodoc.txt | 5 | biodoc.txt |
diff --git a/Documentation/block/as-iosched.txt b/Documentation/block/as-iosched.txt deleted file mode 100644 index 738b72be128e..000000000000 --- a/Documentation/block/as-iosched.txt +++ /dev/null | |||
@@ -1,172 +0,0 @@ | |||
1 | Anticipatory IO scheduler | ||
2 | ------------------------- | ||
3 | Nick Piggin <piggin@cyberone.com.au> 13 Sep 2003 | ||
4 | |||
5 | Attention! Database servers, especially those using "TCQ" disks should | ||
6 | investigate performance with the 'deadline' IO scheduler. Any system with high | ||
7 | disk performance requirements should do so, in fact. | ||
8 | |||
9 | If you see unusual performance characteristics of your disk systems, or you | ||
10 | see big performance regressions versus the deadline scheduler, please email | ||
11 | me. Database users don't bother unless you're willing to test a lot of patches | ||
12 | from me ;) its a known issue. | ||
13 | |||
14 | Also, users with hardware RAID controllers, doing striping, may find | ||
15 | highly variable performance results with using the as-iosched. The | ||
16 | as-iosched anticipatory implementation is based on the notion that a disk | ||
17 | device has only one physical seeking head. A striped RAID controller | ||
18 | actually has a head for each physical device in the logical RAID device. | ||
19 | |||
20 | However, setting the antic_expire (see tunable parameters below) produces | ||
21 | very similar behavior to the deadline IO scheduler. | ||
22 | |||
23 | Selecting IO schedulers | ||
24 | ----------------------- | ||
25 | Refer to Documentation/block/switching-sched.txt for information on | ||
26 | selecting an io scheduler on a per-device basis. | ||
27 | |||
28 | Anticipatory IO scheduler Policies | ||
29 | ---------------------------------- | ||
30 | The as-iosched implementation implements several layers of policies | ||
31 | to determine when an IO request is dispatched to the disk controller. | ||
32 | Here are the policies outlined, in order of application. | ||
33 | |||
34 | 1. one-way Elevator algorithm. | ||
35 | |||
36 | The elevator algorithm is similar to that used in deadline scheduler, with | ||
37 | the addition that it allows limited backward movement of the elevator | ||
38 | (i.e. seeks backwards). A seek backwards can occur when choosing between | ||
39 | two IO requests where one is behind the elevator's current position, and | ||
40 | the other is in front of the elevator's position. If the seek distance to | ||
41 | the request in back of the elevator is less than half the seek distance to | ||
42 | the request in front of the elevator, then the request in back can be chosen. | ||
43 | Backward seeks are also limited to a maximum of MAXBACK (1024*1024) sectors. | ||
44 | This favors forward movement of the elevator, while allowing opportunistic | ||
45 | "short" backward seeks. | ||
46 | |||
47 | 2. FIFO expiration times for reads and for writes. | ||
48 | |||
49 | This is again very similar to the deadline IO scheduler. The expiration | ||
50 | times for requests on these lists is tunable using the parameters read_expire | ||
51 | and write_expire discussed below. When a read or a write expires in this way, | ||
52 | the IO scheduler will interrupt its current elevator sweep or read anticipation | ||
53 | to service the expired request. | ||
54 | |||
55 | 3. Read and write request batching | ||
56 | |||
57 | A batch is a collection of read requests or a collection of write | ||
58 | requests. The as scheduler alternates dispatching read and write batches | ||
59 | to the driver. In the case a read batch, the scheduler submits read | ||
60 | requests to the driver as long as there are read requests to submit, and | ||
61 | the read batch time limit has not been exceeded (read_batch_expire). | ||
62 | The read batch time limit begins counting down only when there are | ||
63 | competing write requests pending. | ||
64 | |||
65 | In the case of a write batch, the scheduler submits write requests to | ||
66 | the driver as long as there are write requests available, and the | ||
67 | write batch time limit has not been exceeded (write_batch_expire). | ||
68 | However, the length of write batches will be gradually shortened | ||
69 | when read batches frequently exceed their time limit. | ||
70 | |||
71 | When changing between batch types, the scheduler waits for all requests | ||
72 | from the previous batch to complete before scheduling requests for the | ||
73 | next batch. | ||
74 | |||
75 | The read and write fifo expiration times described in policy 2 above | ||
76 | are checked only when in scheduling IO of a batch for the corresponding | ||
77 | (read/write) type. So for example, the read FIFO timeout values are | ||
78 | tested only during read batches. Likewise, the write FIFO timeout | ||
79 | values are tested only during write batches. For this reason, | ||
80 | it is generally not recommended for the read batch time | ||
81 | to be longer than the write expiration time, nor for the write batch | ||
82 | time to exceed the read expiration time (see tunable parameters below). | ||
83 | |||
84 | When the IO scheduler changes from a read to a write batch, | ||
85 | it begins the elevator from the request that is on the head of the | ||
86 | write expiration FIFO. Likewise, when changing from a write batch to | ||
87 | a read batch, scheduler begins the elevator from the first entry | ||
88 | on the read expiration FIFO. | ||
89 | |||
90 | 4. Read anticipation. | ||
91 | |||
92 | Read anticipation occurs only when scheduling a read batch. | ||
93 | This implementation of read anticipation allows only one read request | ||
94 | to be dispatched to the disk controller at a time. In | ||
95 | contrast, many write requests may be dispatched to the disk controller | ||
96 | at a time during a write batch. It is this characteristic that can make | ||
97 | the anticipatory scheduler perform anomalously with controllers supporting | ||
98 | TCQ, or with hardware striped RAID devices. Setting the antic_expire | ||
99 | queue parameter (see below) to zero disables this behavior, and the | ||
100 | anticipatory scheduler behaves essentially like the deadline scheduler. | ||
101 | |||
102 | When read anticipation is enabled (antic_expire is not zero), reads | ||
103 | are dispatched to the disk controller one at a time. | ||
104 | At the end of each read request, the IO scheduler examines its next | ||
105 | candidate read request from its sorted read list. If that next request | ||
106 | is from the same process as the request that just completed, | ||
107 | or if the next request in the queue is "very close" to the | ||
108 | just completed request, it is dispatched immediately. Otherwise, | ||
109 | statistics (average think time, average seek distance) on the process | ||
110 | that submitted the just completed request are examined. If it seems | ||
111 | likely that that process will submit another request soon, and that | ||
112 | request is likely to be near the just completed request, then the IO | ||
113 | scheduler will stop dispatching more read requests for up to (antic_expire) | ||
114 | milliseconds, hoping that process will submit a new request near the one | ||
115 | that just completed. If such a request is made, then it is dispatched | ||
116 | immediately. If the antic_expire wait time expires, then the IO scheduler | ||
117 | will dispatch the next read request from the sorted read queue. | ||
118 | |||
119 | To decide whether an anticipatory wait is worthwhile, the scheduler | ||
120 | maintains statistics for each process that can be used to compute | ||
121 | mean "think time" (the time between read requests), and mean seek | ||
122 | distance for that process. One observation is that these statistics | ||
123 | are associated with each process, but those statistics are not associated | ||
124 | with a specific IO device. So for example, if a process is doing IO | ||
125 | on several file systems on separate devices, the statistics will be | ||
126 | a combination of IO behavior from all those devices. | ||
127 | |||
128 | |||
129 | Tuning the anticipatory IO scheduler | ||
130 | ------------------------------------ | ||
131 | When using 'as', the anticipatory IO scheduler there are 5 parameters under | ||
132 | /sys/block/*/queue/iosched/. All are units of milliseconds. | ||
133 | |||
134 | The parameters are: | ||
135 | * read_expire | ||
136 | Controls how long until a read request becomes "expired". It also controls the | ||
137 | interval between which expired requests are served, so set to 50, a request | ||
138 | might take anywhere < 100ms to be serviced _if_ it is the next on the | ||
139 | expired list. Obviously request expiration strategies won't make the disk | ||
140 | go faster. The result basically equates to the timeslice a single reader | ||
141 | gets in the presence of other IO. 100*((seek time / read_expire) + 1) is | ||
142 | very roughly the % streaming read efficiency your disk should get with | ||
143 | multiple readers. | ||
144 | |||
145 | * read_batch_expire | ||
146 | Controls how much time a batch of reads is given before pending writes are | ||
147 | served. A higher value is more efficient. This might be set below read_expire | ||
148 | if writes are to be given higher priority than reads, but reads are to be | ||
149 | as efficient as possible when there are no writes. Generally though, it | ||
150 | should be some multiple of read_expire. | ||
151 | |||
152 | * write_expire, and | ||
153 | * write_batch_expire are equivalent to the above, for writes. | ||
154 | |||
155 | * antic_expire | ||
156 | Controls the maximum amount of time we can anticipate a good read (one | ||
157 | with a short seek distance from the most recently completed request) before | ||
158 | giving up. Many other factors may cause anticipation to be stopped early, | ||
159 | or some processes will not be "anticipated" at all. Should be a bit higher | ||
160 | for big seek time devices though not a linear correspondence - most | ||
161 | processes have only a few ms thinktime. | ||
162 | |||
163 | In addition to the tunables above there is a read-only file named est_time | ||
164 | which, when read, will show: | ||
165 | |||
166 | - The probability of a task exiting without a cooperating task | ||
167 | submitting an anticipated IO. | ||
168 | |||
169 | - The current mean think time. | ||
170 | |||
171 | - The seek distance used to determine if an incoming IO is better. | ||
172 | |||
diff --git a/Documentation/filesystems/ext4.txt b/Documentation/filesystems/ext4.txt index af6885c3c821..e1def1786e50 100644 --- a/Documentation/filesystems/ext4.txt +++ b/Documentation/filesystems/ext4.txt | |||
@@ -196,7 +196,7 @@ nobarrier This also requires an IO stack which can support | |||
196 | also be used to enable or disable barriers, for | 196 | also be used to enable or disable barriers, for |
197 | consistency with other ext4 mount options. | 197 | consistency with other ext4 mount options. |
198 | 198 | ||
199 | inode_readahead=n This tuning parameter controls the maximum | 199 | inode_readahead_blks=n This tuning parameter controls the maximum |
200 | number of inode table blocks that ext4's inode | 200 | number of inode table blocks that ext4's inode |
201 | table readahead algorithm will pre-read into | 201 | table readahead algorithm will pre-read into |
202 | the buffer cache. The default value is 32 blocks. | 202 | the buffer cache. The default value is 32 blocks. |
diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt index e1a114161027..2811e452f756 100644 --- a/Documentation/kvm/api.txt +++ b/Documentation/kvm/api.txt | |||
@@ -685,7 +685,7 @@ struct kvm_vcpu_events { | |||
685 | __u8 pad; | 685 | __u8 pad; |
686 | } nmi; | 686 | } nmi; |
687 | __u32 sipi_vector; | 687 | __u32 sipi_vector; |
688 | __u32 flags; /* must be zero */ | 688 | __u32 flags; |
689 | }; | 689 | }; |
690 | 690 | ||
691 | 4.30 KVM_SET_VCPU_EVENTS | 691 | 4.30 KVM_SET_VCPU_EVENTS |
@@ -701,6 +701,14 @@ vcpu. | |||
701 | 701 | ||
702 | See KVM_GET_VCPU_EVENTS for the data structure. | 702 | See KVM_GET_VCPU_EVENTS for the data structure. |
703 | 703 | ||
704 | Fields that may be modified asynchronously by running VCPUs can be excluded | ||
705 | from the update. These fields are nmi.pending and sipi_vector. Keep the | ||
706 | corresponding bits in the flags field cleared to suppress overwriting the | ||
707 | current in-kernel state. The bits are: | ||
708 | |||
709 | KVM_VCPUEVENT_VALID_NMI_PENDING - transfer nmi.pending to the kernel | ||
710 | KVM_VCPUEVENT_VALID_SIPI_VECTOR - transfer sipi_vector | ||
711 | |||
704 | 712 | ||
705 | 5. The kvm_run structure | 713 | 5. The kvm_run structure |
706 | 714 | ||
diff --git a/Documentation/sound/alsa/Procfile.txt b/Documentation/sound/alsa/Procfile.txt index 719a819f8cc2..07301de12cc4 100644 --- a/Documentation/sound/alsa/Procfile.txt +++ b/Documentation/sound/alsa/Procfile.txt | |||
@@ -95,7 +95,7 @@ card*/pcm*/xrun_debug | |||
95 | It takes an integer value, can be changed by writing to this | 95 | It takes an integer value, can be changed by writing to this |
96 | file, such as | 96 | file, such as |
97 | 97 | ||
98 | # cat 5 > /proc/asound/card0/pcm0p/xrun_debug | 98 | # echo 5 > /proc/asound/card0/pcm0p/xrun_debug |
99 | 99 | ||
100 | The value consists of the following bit flags: | 100 | The value consists of the following bit flags: |
101 | bit 0 = Enable XRUN/jiffies debug messages | 101 | bit 0 = Enable XRUN/jiffies debug messages |
diff --git a/Documentation/vgaarbiter.txt b/Documentation/vgaarbiter.txt index 987f9b0a5ece..43a9b0694fdd 100644 --- a/Documentation/vgaarbiter.txt +++ b/Documentation/vgaarbiter.txt | |||
@@ -103,7 +103,7 @@ I.2 libpciaccess | |||
103 | ---------------- | 103 | ---------------- |
104 | 104 | ||
105 | To use the vga arbiter char device it was implemented an API inside the | 105 | To use the vga arbiter char device it was implemented an API inside the |
106 | libpciaccess library. One fieldd was added to struct pci_device (each device | 106 | libpciaccess library. One field was added to struct pci_device (each device |
107 | on the system): | 107 | on the system): |
108 | 108 | ||
109 | /* the type of resource decoded by the device */ | 109 | /* the type of resource decoded by the device */ |