Diffstat (limited to 'Documentation')
27 files changed, 2248 insertions, 195 deletions
diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile index 5a2882d275ba..66e1cf733571 100644 --- a/Documentation/DocBook/Makefile +++ b/Documentation/DocBook/Makefile | |||
@@ -10,7 +10,8 @@ DOCBOOKS := wanbook.xml z8530book.xml mcabook.xml videobook.xml \ | |||
10 | kernel-hacking.xml kernel-locking.xml deviceiobook.xml \ | 10 | kernel-hacking.xml kernel-locking.xml deviceiobook.xml \ |
11 | procfs-guide.xml writing_usb_driver.xml \ | 11 | procfs-guide.xml writing_usb_driver.xml \ |
12 | kernel-api.xml journal-api.xml lsm.xml usb.xml \ | 12 | kernel-api.xml journal-api.xml lsm.xml usb.xml \ |
13 | gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml | 13 | gadget.xml libata.xml mtdnand.xml librs.xml rapidio.xml \ |
14 | genericirq.xml | ||
14 | 15 | ||
15 | ### | 16 | ### |
16 | # The build process is as follows (targets): | 17 | # The build process is as follows (targets): |
diff --git a/Documentation/DocBook/genericirq.tmpl b/Documentation/DocBook/genericirq.tmpl new file mode 100644 index 000000000000..0f4a4b6321e4 --- /dev/null +++ b/Documentation/DocBook/genericirq.tmpl | |||
@@ -0,0 +1,474 @@ | |||
1 | <?xml version="1.0" encoding="UTF-8"?> | ||
2 | <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.1.2//EN" | ||
3 | "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd" []> | ||
4 | |||
5 | <book id="Generic-IRQ-Guide"> | ||
6 | <bookinfo> | ||
7 | <title>Linux generic IRQ handling</title> | ||
8 | |||
9 | <authorgroup> | ||
10 | <author> | ||
11 | <firstname>Thomas</firstname> | ||
12 | <surname>Gleixner</surname> | ||
13 | <affiliation> | ||
14 | <address> | ||
15 | <email>tglx@linutronix.de</email> | ||
16 | </address> | ||
17 | </affiliation> | ||
18 | </author> | ||
19 | <author> | ||
20 | <firstname>Ingo</firstname> | ||
21 | <surname>Molnar</surname> | ||
22 | <affiliation> | ||
23 | <address> | ||
24 | <email>mingo@elte.hu</email> | ||
25 | </address> | ||
26 | </affiliation> | ||
27 | </author> | ||
28 | </authorgroup> | ||
29 | |||
30 | <copyright> | ||
31 | <year>2005-2006</year> | ||
32 | <holder>Thomas Gleixner</holder> | ||
33 | </copyright> | ||
34 | <copyright> | ||
35 | <year>2005-2006</year> | ||
36 | <holder>Ingo Molnar</holder> | ||
37 | </copyright> | ||
38 | |||
39 | <legalnotice> | ||
40 | <para> | ||
41 | This documentation is free software; you can redistribute | ||
42 | it and/or modify it under the terms of the GNU General Public | ||
43 | License version 2 as published by the Free Software Foundation. | ||
44 | </para> | ||
45 | |||
46 | <para> | ||
47 | This program is distributed in the hope that it will be | ||
48 | useful, but WITHOUT ANY WARRANTY; without even the implied | ||
49 | warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. | ||
50 | See the GNU General Public License for more details. | ||
51 | </para> | ||
52 | |||
53 | <para> | ||
54 | You should have received a copy of the GNU General Public | ||
55 | License along with this program; if not, write to the Free | ||
56 | Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, | ||
57 | MA 02111-1307 USA | ||
58 | </para> | ||
59 | |||
60 | <para> | ||
61 | For more details see the file COPYING in the source | ||
62 | distribution of Linux. | ||
63 | </para> | ||
64 | </legalnotice> | ||
65 | </bookinfo> | ||
66 | |||
67 | <toc></toc> | ||
68 | |||
69 | <chapter id="intro"> | ||
70 | <title>Introduction</title> | ||
71 | <para> | ||
72 | The generic interrupt handling layer is designed to provide a | ||
73 | complete abstraction of interrupt handling for device drivers. | ||
74 | It is able to handle all the different types of interrupt controller | ||
75 | hardware. Device drivers use generic API functions to request, enable, | ||
76 | disable and free interrupts. The drivers do not have to know anything | ||
77 | about interrupt hardware details, so they can be used on different | ||
78 | platforms without code changes. | ||
79 | </para> | ||
80 | <para> | ||
81 | This documentation is provided to developers who want to implement | ||
82 | an interrupt subsystem for their architecture, with the help | ||
83 | of the generic IRQ handling layer. | ||
84 | </para> | ||
85 | </chapter> | ||
86 | |||
87 | <chapter id="rationale"> | ||
88 | <title>Rationale</title> | ||
89 | <para> | ||
90 | The original implementation of interrupt handling in Linux uses | ||
91 | the __do_IRQ() super-handler, which is able to deal with every | ||
92 | type of interrupt logic. | ||
93 | </para> | ||
94 | <para> | ||
95 | Originally, Russell King identified different types of handlers to | ||
96 | build a quite universal set for the ARM interrupt handler | ||
97 | implementation in Linux 2.5/2.6. He distinguished between: | ||
98 | <itemizedlist> | ||
99 | <listitem><para>Level type</para></listitem> | ||
100 | <listitem><para>Edge type</para></listitem> | ||
101 | <listitem><para>Simple type</para></listitem> | ||
102 | </itemizedlist> | ||
103 | In the SMP world of the __do_IRQ() super-handler another type | ||
104 | was identified: | ||
105 | <itemizedlist> | ||
106 | <listitem><para>Per CPU type</para></listitem> | ||
107 | </itemizedlist> | ||
108 | </para> | ||
109 | <para> | ||
110 | This split implementation of highlevel IRQ handlers allows us to | ||
111 | optimize the flow of the interrupt handling for each specific | ||
112 | interrupt type. This reduces complexity in that particular codepath | ||
113 | and allows the optimized handling of a given type. | ||
114 | </para> | ||
115 | <para> | ||
116 | The original general IRQ implementation used hw_interrupt_type | ||
117 | structures and their ->ack(), ->end() [etc.] callbacks to | ||
118 | differentiate the flow control in the super-handler. This leads to | ||
119 | a mix of flow logic and lowlevel hardware logic, and it also leads | ||
120 | to unnecessary code duplication: for example in i386, there is an | ||
121 | ioapic_level_irq and an ioapic_edge_irq irq-type which share many | ||
122 | of the lowlevel details but have different flow handling. | ||
123 | </para> | ||
124 | <para> | ||
125 | A more natural abstraction is the clean separation of the | ||
126 | 'irq flow' and the 'chip details'. | ||
127 | </para> | ||
128 | <para> | ||
129 | Analysing a couple of architectures' IRQ subsystem implementations | ||
130 | reveals that most of them can use a generic set of 'irq flow' | ||
131 | methods and only need to add the chip level specific code. | ||
132 | The separation is also valuable for (sub)architectures | ||
133 | which need specific quirks in the irq flow itself but not in the | ||
134 | chip-details - and thus provides a more transparent IRQ subsystem | ||
135 | design. | ||
136 | </para> | ||
137 | <para> | ||
138 | Each interrupt descriptor is assigned its own highlevel flow | ||
139 | handler, which is normally one of the generic | ||
140 | implementations. (This highlevel flow handler implementation also | ||
141 | makes it simple to provide demultiplexing handlers which can be | ||
142 | found in embedded platforms on various architectures.) | ||
143 | </para> | ||
144 | <para> | ||
145 | The separation makes the generic interrupt handling layer more | ||
146 | flexible and extensible. For example, a (sub)architecture can | ||
147 | use a generic irq-flow implementation for 'level type' interrupts | ||
148 | and add a (sub)architecture specific 'edge type' implementation. | ||
149 | </para> | ||
150 | <para> | ||
151 | To make the transition to the new model easier and prevent the | ||
152 | breakage of existing implementations, the __do_IRQ() super-handler | ||
153 | is still available. This leads to a kind of duality for the time | ||
154 | being. Over time the new model should be used in more and more | ||
155 | architectures, as it enables smaller and cleaner IRQ subsystems. | ||
156 | </para> | ||
157 | </chapter> | ||
158 | <chapter id="bugs"> | ||
159 | <title>Known Bugs And Assumptions</title> | ||
160 | <para> | ||
161 | None (knock on wood). | ||
162 | </para> | ||
163 | </chapter> | ||
164 | |||
165 | <chapter id="Abstraction"> | ||
166 | <title>Abstraction layers</title> | ||
167 | <para> | ||
168 | There are three main levels of abstraction in the interrupt code: | ||
169 | <orderedlist> | ||
170 | <listitem><para>Highlevel driver API</para></listitem> | ||
171 | <listitem><para>Highlevel IRQ flow handlers</para></listitem> | ||
172 | <listitem><para>Chiplevel hardware encapsulation</para></listitem> | ||
173 | </orderedlist> | ||
174 | </para> | ||
175 | <sect1> | ||
176 | <title>Interrupt control flow</title> | ||
177 | <para> | ||
178 | Each interrupt is described by an interrupt descriptor structure | ||
179 | irq_desc. The interrupt is referenced by an 'unsigned int' numeric | ||
180 | value which selects the corresponding interrupt description structure | ||
181 | in the descriptor structures array. | ||
182 | The descriptor structure contains status information and pointers | ||
183 | to the interrupt flow method and the interrupt chip structure | ||
184 | which are assigned to this interrupt. | ||
185 | </para> | ||
186 | <para> | ||
187 | Whenever an interrupt triggers, the lowlevel arch code calls into | ||
188 | the generic interrupt code by calling desc->handle_irq(). | ||
189 | This highlevel IRQ handling function only uses desc->chip primitives | ||
190 | referenced by the assigned chip descriptor structure. | ||
191 | </para> | ||
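The control flow described above can be modelled in user space. The following is a minimal sketch in C, not kernel code: the array size, the demo_* names, the count field and the choice of mask_ack()/unmask() as the two chip primitives are assumptions made for illustration. It shows the IRQ number selecting a descriptor, the lowlevel entry code calling desc->handle_irq(), and the flow handler reaching the hardware only through desc->chip.

```c
#include <assert.h>
#include <stddef.h>

#define NR_IRQS 16	/* illustrative table size, not a kernel value */

struct irq_desc;
typedef void (*flow_handler_t)(unsigned int irq, struct irq_desc *desc);

/* Chip-level primitives; only the two used below are modelled. */
struct irq_chip {
	void (*mask_ack)(unsigned int irq);
	void (*unmask)(unsigned int irq);
};

struct irq_desc {
	flow_handler_t handle_irq;	/* highlevel flow method */
	struct irq_chip *chip;		/* lowlevel hardware access */
	unsigned int count;		/* events handled, for illustration only */
};

static struct irq_desc irq_desc[NR_IRQS];
static unsigned int chip_calls;	/* how often a chip primitive was called */

static void demo_mask_ack(unsigned int irq) { (void)irq; chip_calls++; }
static void demo_unmask(unsigned int irq) { (void)irq; chip_calls++; }

static struct irq_chip demo_chip = {
	.mask_ack = demo_mask_ack,
	.unmask = demo_unmask,
};

/* A flow handler touches the hardware only through desc->chip. */
static void demo_level_flow(unsigned int irq, struct irq_desc *desc)
{
	desc->chip->mask_ack(irq);
	desc->count++;		/* stands in for handle_IRQ_event() */
	desc->chip->unmask(irq);
}

/* Hypothetical setup helper: bind a chip and a flow handler to one IRQ. */
static void demo_setup(unsigned int irq)
{
	assert(irq < NR_IRQS);
	irq_desc[irq].chip = &demo_chip;
	irq_desc[irq].handle_irq = demo_level_flow;
}

/* What the lowlevel arch entry code does: index the array, call the flow method. */
static int arch_dispatch(unsigned int irq)
{
	struct irq_desc *desc;

	if (irq >= NR_IRQS)
		return -1;
	desc = &irq_desc[irq];
	if (desc->handle_irq == NULL)
		return -1;
	desc->handle_irq(irq, desc);
	return 0;
}
```

After demo_setup(3), a call to arch_dispatch(3) runs the flow handler once and calls two chip primitives. An IRQ number without an assigned flow handler is rejected by this sketch.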
192 | </sect1> | ||
193 | <sect1> | ||
194 | <title>Highlevel Driver API</title> | ||
195 | <para> | ||
196 | The highlevel Driver API consists of the following functions: | ||
197 | <itemizedlist> | ||
198 | <listitem><para>request_irq()</para></listitem> | ||
199 | <listitem><para>free_irq()</para></listitem> | ||
200 | <listitem><para>disable_irq()</para></listitem> | ||
201 | <listitem><para>enable_irq()</para></listitem> | ||
202 | <listitem><para>disable_irq_nosync() (SMP only)</para></listitem> | ||
203 | <listitem><para>synchronize_irq() (SMP only)</para></listitem> | ||
204 | <listitem><para>set_irq_type()</para></listitem> | ||
205 | <listitem><para>set_irq_wake()</para></listitem> | ||
206 | <listitem><para>set_irq_data()</para></listitem> | ||
207 | <listitem><para>set_irq_chip()</para></listitem> | ||
208 | <listitem><para>set_irq_chip_data()</para></listitem> | ||
209 | </itemizedlist> | ||
210 | See the autogenerated function documentation for details. | ||
211 | </para> | ||
212 | </sect1> | ||
213 | <sect1> | ||
214 | <title>Highlevel IRQ flow handlers</title> | ||
215 | <para> | ||
216 | The generic layer provides a set of pre-defined irq-flow methods: | ||
217 | <itemizedlist> | ||
218 | <listitem><para>handle_level_irq</para></listitem> | ||
219 | <listitem><para>handle_edge_irq</para></listitem> | ||
220 | <listitem><para>handle_simple_irq</para></listitem> | ||
221 | <listitem><para>handle_percpu_irq</para></listitem> | ||
222 | </itemizedlist> | ||
223 | The interrupt flow handlers (either predefined or architecture | ||
224 | specific) are assigned to specific interrupts by the architecture | ||
225 | either during bootup or during device initialization. | ||
226 | </para> | ||
227 | <sect2> | ||
228 | <title>Default flow implementations</title> | ||
229 | <sect3> | ||
230 | <title>Helper functions</title> | ||
231 | <para> | ||
232 | The helper functions call the chip primitives and | ||
233 | are used by the default flow implementations. | ||
234 | The following helper functions are implemented (simplified excerpt): | ||
235 | <programlisting> | ||
236 | default_enable(irq) | ||
237 | { | ||
238 | desc->chip->unmask(irq); | ||
239 | } | ||
240 | |||
241 | default_disable(irq) | ||
242 | { | ||
243 | if (!delay_disable(irq)) | ||
244 | desc->chip->mask(irq); | ||
245 | } | ||
246 | |||
247 | default_ack(irq) | ||
248 | { | ||
249 | chip->ack(irq); | ||
250 | } | ||
251 | |||
252 | default_mask_ack(irq) | ||
253 | { | ||
254 | if (chip->mask_ack) { | ||
255 | chip->mask_ack(irq); | ||
256 | } else { | ||
257 | chip->mask(irq); | ||
258 | chip->ack(irq); | ||
259 | } | ||
260 | } | ||
261 | |||
262 | noop(irq) | ||
263 | { | ||
264 | } | ||
265 | |||
266 | </programlisting> | ||
267 | </para> | ||
268 | </sect3> | ||
269 | </sect2> | ||
270 | <sect2> | ||
271 | <title>Default flow handler implementations</title> | ||
272 | <sect3> | ||
273 | <title>Default Level IRQ flow handler</title> | ||
274 | <para> | ||
275 | handle_level_irq provides a generic implementation | ||
276 | for level-triggered interrupts. | ||
277 | </para> | ||
278 | <para> | ||
279 | The following control flow is implemented (simplified excerpt): | ||
280 | <programlisting> | ||
281 | desc->chip->start(); | ||
282 | handle_IRQ_event(desc->action); | ||
283 | desc->chip->end(); | ||
284 | </programlisting> | ||
285 | </para> | ||
286 | </sect3> | ||
287 | <sect3> | ||
288 | <title>Default Edge IRQ flow handler</title> | ||
289 | <para> | ||
290 | handle_edge_irq provides a generic implementation | ||
291 | for edge-triggered interrupts. | ||
292 | </para> | ||
293 | <para> | ||
294 | The following control flow is implemented (simplified excerpt): | ||
295 | <programlisting> | ||
296 | if (desc->status & running) { | ||
297 | desc->chip->hold(); | ||
298 | desc->status |= pending | masked; | ||
299 | return; | ||
300 | } | ||
301 | desc->chip->start(); | ||
302 | desc->status |= running; | ||
303 | do { | ||
304 | if (desc->status & masked) | ||
305 | desc->chip->enable(); | ||
306 | desc->status &= ~pending; | ||
307 | handle_IRQ_event(desc->action); | ||
308 | } while (desc->status & pending); | ||
309 | desc->status &= ~running; | ||
310 | desc->chip->end(); | ||
311 | </programlisting> | ||
312 | </para> | ||
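The edge flow above is easier to follow when it can be stepped through. The following user-space sketch in C is a model under stated assumptions, not the kernel implementation: the ST_* bit values, struct edge_desc, the inject test hook and the recursive call that simulates a second edge are all hypothetical. In a real system the second edge would typically arrive on another CPU. The sketch also clears the masked bit after re-enabling the line, which the simplified excerpt does not show.

```c
#include <assert.h>
#include <stddef.h>

#define ST_RUNNING 0x1u
#define ST_PENDING 0x2u
#define ST_MASKED  0x4u

struct edge_desc {
	unsigned int status;
	unsigned int handled;	/* how many times the action ran */
	unsigned int hw_masked;	/* 1 while the line is masked at the chip */
	unsigned int inject;	/* test hook: edges arriving while the action runs */
};

static void edge_flow(struct edge_desc *desc);

/* Stand-in for handle_IRQ_event(); optionally lets another edge arrive. */
static void run_action(struct edge_desc *desc)
{
	desc->handled++;
	if (desc->inject) {
		desc->inject--;
		edge_flow(desc);	/* simulated edge while the handler is running */
	}
}

static void edge_flow(struct edge_desc *desc)
{
	assert(desc != NULL);

	if (desc->status & ST_RUNNING) {
		/* Handler already active: mask the line and remember the edge. */
		desc->hw_masked = 1;			/* models chip->hold() */
		desc->status |= ST_PENDING | ST_MASKED;
		return;
	}

	desc->status |= ST_RUNNING;
	do {
		if (desc->status & ST_MASKED) {
			desc->hw_masked = 0;		/* models chip->enable() */
			desc->status &= ~ST_MASKED;
		}
		desc->status &= ~ST_PENDING;
		run_action(desc);
	} while (desc->status & ST_PENDING);	/* replay edges seen meanwhile */
	desc->status &= ~ST_RUNNING;
}
```

With inject set to 1, a single call to edge_flow() runs the action twice: once for the original edge and once for the edge recorded as pending while the handler was running.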
313 | </sect3> | ||
314 | <sect3> | ||
315 | <title>Default simple IRQ flow handler</title> | ||
316 | <para> | ||
317 | handle_simple_irq provides a generic implementation | ||
318 | for simple interrupts. | ||
319 | </para> | ||
320 | <para> | ||
321 | Note: The simple flow handler does not call any | ||
322 | handler/chip primitives. | ||
323 | </para> | ||
324 | <para> | ||
325 | The following control flow is implemented (simplified excerpt): | ||
326 | <programlisting> | ||
327 | handle_IRQ_event(desc->action); | ||
328 | </programlisting> | ||
329 | </para> | ||
330 | </sect3> | ||
331 | <sect3> | ||
332 | <title>Default per CPU flow handler</title> | ||
333 | <para> | ||
334 | handle_percpu_irq provides a generic implementation | ||
335 | for per CPU interrupts. | ||
336 | </para> | ||
337 | <para> | ||
338 | Per CPU interrupts are only available on SMP and | ||
339 | the handler provides a simplified version without | ||
340 | locking. | ||
341 | </para> | ||
342 | <para> | ||
343 | The following control flow is implemented (simplified excerpt): | ||
344 | <programlisting> | ||
345 | desc->chip->start(); | ||
346 | handle_IRQ_event(desc->action); | ||
347 | desc->chip->end(); | ||
348 | </programlisting> | ||
349 | </para> | ||
350 | </sect3> | ||
351 | </sect2> | ||
352 | <sect2> | ||
353 | <title>Quirks and optimizations</title> | ||
354 | <para> | ||
355 | The generic functions are intended for 'clean' architectures and chips, | ||
356 | which have no platform-specific IRQ handling quirks. If an architecture | ||
357 | needs to implement quirks on the 'flow' level then it can do so by | ||
358 | overriding the highlevel irq-flow handler. | ||
359 | </para> | ||
360 | </sect2> | ||
361 | <sect2> | ||
362 | <title>Delayed interrupt disable</title> | ||
363 | <para> | ||
364 | This per interrupt selectable feature, which was introduced by Russell | ||
365 | King in the ARM interrupt implementation, does not mask an interrupt | ||
366 | at the hardware level when disable_irq() is called. The interrupt is | ||
367 | kept enabled and is masked in the flow handler when an interrupt event | ||
368 | happens. This prevents losing edge interrupts on hardware which does | ||
369 | not store an edge interrupt event while the interrupt is disabled at | ||
370 | the hardware level. When an interrupt arrives while the IRQ_DISABLED | ||
371 | flag is set, then the interrupt is masked at the hardware level and | ||
372 | the IRQ_PENDING bit is set. When the interrupt is re-enabled by | ||
373 | enable_irq() the pending bit is checked and if it is set, the | ||
374 | interrupt is resent either via hardware or by a software resend | ||
375 | mechanism. (It's necessary to enable CONFIG_HARDIRQS_SW_RESEND when | ||
376 | you want to use the delayed interrupt disable feature and your | ||
377 | hardware is not capable of retriggering an interrupt.) | ||
378 | The delayed interrupt disable can be runtime enabled, per interrupt, | ||
379 | by setting the IRQ_DELAYED_DISABLE flag in the irq_desc status field. | ||
380 | </para> | ||
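The delayed disable sequence can be sketched as a small state machine. This is a user-space illustration in C under stated assumptions: IRQ_DISABLED and IRQ_PENDING are the flag names used above, but their bit values, the IRQ_MASKED bit that models the hardware mask, struct dd_desc and the dd_* functions are hypothetical. The resend is modelled as a software resend.

```c
#include <assert.h>
#include <stddef.h>

#define IRQ_DISABLED 0x1u	/* set by disable_irq(), software state only */
#define IRQ_PENDING  0x2u	/* an event arrived while disabled */
#define IRQ_MASKED   0x4u	/* hypothetical: models the hardware mask bit */

struct dd_desc {
	unsigned int status;
	unsigned int delivered;	/* events that reached the handler */
};

/* disable_irq() with delayed disable: the hardware is left unmasked. */
static void dd_disable(struct dd_desc *d)
{
	assert(d != NULL);
	d->status |= IRQ_DISABLED;
}

/* An interrupt event reaching the flow handler. */
static void dd_event(struct dd_desc *d)
{
	assert(d != NULL);
	if (d->status & IRQ_MASKED)
		return;		/* masked at the hardware level, nothing arrives */
	if (d->status & IRQ_DISABLED) {
		/* First event after disable: mask now and remember it. */
		d->status |= IRQ_MASKED | IRQ_PENDING;
		return;
	}
	d->delivered++;
}

/* enable_irq(): unmask and resend a remembered event. */
static void dd_enable(struct dd_desc *d)
{
	assert(d != NULL);
	d->status &= ~(IRQ_DISABLED | IRQ_MASKED);
	if (d->status & IRQ_PENDING) {
		d->status &= ~IRQ_PENDING;
		dd_event(d);	/* software resend */
	}
}
```

Two events between dd_disable() and dd_enable() result in exactly one delivery after the enable: the first event is remembered through IRQ_PENDING, and the second is blocked by the mask that the first one caused.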
381 | </sect2> | ||
382 | </sect1> | ||
383 | <sect1> | ||
384 | <title>Chiplevel hardware encapsulation</title> | ||
385 | <para> | ||
386 | The chip level hardware descriptor structure irq_chip | ||
387 | contains all the direct chip relevant functions, which | ||
388 | can be utilized by the irq flow implementations. | ||
389 | <itemizedlist> | ||
390 | <listitem><para>ack()</para></listitem> | ||
391 | <listitem><para>mask_ack() - Optional, recommended for performance</para></listitem> | ||
392 | <listitem><para>mask()</para></listitem> | ||
393 | <listitem><para>unmask()</para></listitem> | ||
394 | <listitem><para>retrigger() - Optional</para></listitem> | ||
395 | <listitem><para>set_type() - Optional</para></listitem> | ||
396 | <listitem><para>set_wake() - Optional</para></listitem> | ||
397 | </itemizedlist> | ||
398 | These primitives are strictly intended to mean what they say: ack means | ||
399 | ACK, masking means masking of an IRQ line, etc. It is up to the flow | ||
400 | handler(s) to use these basic units of lowlevel functionality. | ||
401 | </para> | ||
402 | </sect1> | ||
403 | </chapter> | ||
404 | |||
405 | <chapter id="doirq"> | ||
406 | <title>__do_IRQ entry point</title> | ||
407 | <para> | ||
408 | The original implementation __do_IRQ() is an alternative entry | ||
409 | point for all types of interrupts. | ||
410 | </para> | ||
411 | <para> | ||
412 | This handler turned out not to be suitable for all | ||
413 | interrupt hardware and was therefore reimplemented with split | ||
414 | functionality for edge/level/simple/percpu interrupts. This is not | ||
415 | only a functional optimization. It also shortens code paths for | ||
416 | interrupts. | ||
417 | </para> | ||
418 | <para> | ||
419 | To make use of the split implementation, replace the call to | ||
420 | __do_IRQ by a call to desc->handle_irq() and associate | ||
421 | the appropriate handler function to desc->handle_irq(). | ||
422 | In most cases the generic handler implementations should | ||
423 | be sufficient. | ||
424 | </para> | ||
425 | </chapter> | ||
426 | |||
427 | <chapter id="locking"> | ||
428 | <title>Locking on SMP</title> | ||
429 | <para> | ||
430 | The locking of chip registers is up to the architecture that | ||
431 | defines the chip primitives. There is a chip->lock field that can be used | ||
432 | for serialization, but the generic layer does not touch it. The per-irq | ||
433 | structure is protected via desc->lock, by the generic layer. | ||
434 | </para> | ||
435 | </chapter> | ||
436 | <chapter id="structs"> | ||
437 | <title>Structures</title> | ||
438 | <para> | ||
439 | This chapter contains the autogenerated documentation of the structures which are | ||
440 | used in the generic IRQ layer. | ||
441 | </para> | ||
442 | !Iinclude/linux/irq.h | ||
443 | </chapter> | ||
444 | |||
445 | <chapter id="pubfunctions"> | ||
446 | <title>Public Functions Provided</title> | ||
447 | <para> | ||
448 | This chapter contains the autogenerated documentation of the kernel API functions | ||
449 | which are exported. | ||
450 | </para> | ||
451 | !Ekernel/irq/manage.c | ||
452 | !Ekernel/irq/chip.c | ||
453 | </chapter> | ||
454 | |||
455 | <chapter id="intfunctions"> | ||
456 | <title>Internal Functions Provided</title> | ||
457 | <para> | ||
458 | This chapter contains the autogenerated documentation of the internal functions. | ||
459 | </para> | ||
460 | !Ikernel/irq/handle.c | ||
461 | !Ikernel/irq/chip.c | ||
462 | </chapter> | ||
463 | |||
464 | <chapter id="credits"> | ||
465 | <title>Credits</title> | ||
466 | <para> | ||
467 | The following people have contributed to this document: | ||
468 | <orderedlist> | ||
469 | <listitem><para>Thomas Gleixner<email>tglx@linutronix.de</email></para></listitem> | ||
470 | <listitem><para>Ingo Molnar<email>mingo@elte.hu</email></para></listitem> | ||
471 | </orderedlist> | ||
472 | </para> | ||
473 | </chapter> | ||
474 | </book> | ||
diff --git a/Documentation/DocBook/kernel-locking.tmpl b/Documentation/DocBook/kernel-locking.tmpl index 158ffe9bfade..644c3884fab9 100644 --- a/Documentation/DocBook/kernel-locking.tmpl +++ b/Documentation/DocBook/kernel-locking.tmpl | |||
@@ -1590,7 +1590,7 @@ the amount of locking which needs to be done. | |||
1590 | <para> | 1590 | <para> |
1591 | Our final dilemma is this: when can we actually destroy the | 1591 | Our final dilemma is this: when can we actually destroy the |
1592 | removed element? Remember, a reader might be stepping through | 1592 | removed element? Remember, a reader might be stepping through |
1593 | this element in the list right now: it we free this element and | 1593 | this element in the list right now: if we free this element and |
1594 | the <symbol>next</symbol> pointer changes, the reader will jump | 1594 | the <symbol>next</symbol> pointer changes, the reader will jump |
1595 | off into garbage and crash. We need to wait until we know that | 1595 | off into garbage and crash. We need to wait until we know that |
1596 | all the readers who were traversing the list when we deleted the | 1596 | all the readers who were traversing the list when we deleted the |
diff --git a/Documentation/IRQ.txt b/Documentation/IRQ.txt new file mode 100644 index 000000000000..1011e7175021 --- /dev/null +++ b/Documentation/IRQ.txt | |||
@@ -0,0 +1,22 @@ | |||
1 | What is an IRQ? | ||
2 | |||
3 | An IRQ is an interrupt request from a device. | ||
4 | Currently they can come in over a pin, or over a packet. | ||
5 | Several devices may be connected to the same pin thus | ||
6 | sharing an IRQ. | ||
7 | |||
8 | An IRQ number is a kernel identifier used to talk about a hardware | ||
9 | interrupt source. Typically this is an index into the global irq_desc | ||
10 | array, but except for what linux/interrupt.h implements, the details | ||
11 | are architecture specific. | ||
12 | |||
13 | An IRQ number is an enumeration of the possible interrupt sources on a | ||
14 | machine. Typically what is enumerated is the number of input pins on | ||
15 | all of the interrupt controllers in the system. In the case of ISA | ||
16 | what is enumerated are the 16 input pins on the two i8259 interrupt | ||
17 | controllers. | ||
18 | |||
19 | Architectures can assign additional meaning to the IRQ numbers, and | ||
20 | are encouraged to in the case where there is any manual configuration | ||
21 | of the hardware involved. The ISA IRQs are a classic example of | ||
22 | assigning this kind of additional meaning. | ||
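The ISA enumeration described above can be made concrete. The following is a minimal sketch in C, assuming the conventional layout in which IRQs 0-7 sit on the first i8259 and IRQs 8-15 on the second. The struct isa_pin type, the function name and the 0/1 controller numbering are hypothetical, and the cascade wiring between the two controllers is ignored.

```c
#include <assert.h>
#include <stddef.h>

#define I8259_PINS	8	/* input pins per i8259 */
#define ISA_CONTROLLERS	2
#define ISA_NR_IRQS	(I8259_PINS * ISA_CONTROLLERS)	/* 16 */

struct isa_pin {
	unsigned int controller;	/* 0 = first i8259, 1 = second */
	unsigned int pin;		/* input pin on that controller, 0..7 */
};

/* Map an ISA IRQ number to the controller and input pin it enumerates. */
static int isa_irq_to_pin(unsigned int irq, struct isa_pin *out)
{
	assert(out != NULL);
	if (irq >= ISA_NR_IRQS)
		return -1;	/* not an ISA IRQ number */
	out->controller = irq / I8259_PINS;
	out->pin = irq % I8259_PINS;
	return 0;
}
```

Under these assumptions, IRQ 14 maps to pin 6 of the second controller, and any number from 16 upwards is rejected.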
diff --git a/Documentation/RCU/torture.txt b/Documentation/RCU/torture.txt index e4c38152f7f7..a4948591607d 100644 --- a/Documentation/RCU/torture.txt +++ b/Documentation/RCU/torture.txt | |||
@@ -7,7 +7,7 @@ The CONFIG_RCU_TORTURE_TEST config option is available for all RCU | |||
7 | implementations. It creates an rcutorture kernel module that can | 7 | implementations. It creates an rcutorture kernel module that can |
8 | be loaded to run a torture test. The test periodically outputs | 8 | be loaded to run a torture test. The test periodically outputs |
9 | status messages via printk(), which can be examined via the dmesg | 9 | status messages via printk(), which can be examined via the dmesg |
10 | command (perhaps grepping for "rcutorture"). The test is started | 10 | command (perhaps grepping for "torture"). The test is started |
11 | when the module is loaded, and stops when the module is unloaded. | 11 | when the module is loaded, and stops when the module is unloaded. |
12 | 12 | ||
13 | However, actually setting this config option to "y" results in the system | 13 | However, actually setting this config option to "y" results in the system |
@@ -35,6 +35,19 @@ stat_interval The number of seconds between output of torture | |||
35 | be printed -only- when the module is unloaded, and this | 35 | be printed -only- when the module is unloaded, and this |
36 | is the default. | 36 | is the default. |
37 | 37 | ||
38 | shuffle_interval | ||
39 | The number of seconds to keep the test threads affinitied | ||
40 | to a particular subset of the CPUs. Used in conjunction | ||
41 | with test_no_idle_hz. | ||
42 | |||
43 | test_no_idle_hz Whether or not to test the ability of RCU to operate in | ||
44 | a kernel that disables the scheduling-clock interrupt to | ||
45 | idle CPUs. Boolean parameter, "1" to test, "0" otherwise. | ||
46 | |||
47 | torture_type The type of RCU to test: "rcu" for the rcu_read_lock() | ||
48 | API, "rcu_bh" for the rcu_read_lock_bh() API, and "srcu" | ||
49 | for the srcu_read_lock() API. | ||
50 | |||
38 | verbose Enable debug printk()s. Default is disabled. | 51 | verbose Enable debug printk()s. Default is disabled. |
39 | 52 | ||
40 | 53 | ||
@@ -42,14 +55,14 @@ OUTPUT | |||
42 | 55 | ||
43 | The statistics output is as follows: | 56 | The statistics output is as follows: |
44 | 57 | ||
45 | rcutorture: --- Start of test: nreaders=16 stat_interval=0 verbose=0 | 58 | rcu-torture: --- Start of test: nreaders=16 stat_interval=0 verbose=0 |
46 | rcutorture: rtc: 0000000000000000 ver: 1916 tfle: 0 rta: 1916 rtaf: 0 rtf: 1915 | 59 | rcu-torture: rtc: 0000000000000000 ver: 1916 tfle: 0 rta: 1916 rtaf: 0 rtf: 1915 |
47 | rcutorture: Reader Pipe: 1466408 9747 0 0 0 0 0 0 0 0 0 | 60 | rcu-torture: Reader Pipe: 1466408 9747 0 0 0 0 0 0 0 0 0 |
48 | rcutorture: Reader Batch: 1464477 11678 0 0 0 0 0 0 0 0 | 61 | rcu-torture: Reader Batch: 1464477 11678 0 0 0 0 0 0 0 0 |
49 | rcutorture: Free-Block Circulation: 1915 1915 1915 1915 1915 1915 1915 1915 1915 1915 0 | 62 | rcu-torture: Free-Block Circulation: 1915 1915 1915 1915 1915 1915 1915 1915 1915 1915 0 |
50 | rcutorture: --- End of test | 63 | rcu-torture: --- End of test |
51 | 64 | ||
52 | The command "dmesg | grep rcutorture:" will extract this information on | 65 | The command "dmesg | grep torture:" will extract this information on |
53 | most systems. On more esoteric configurations, it may be necessary to | 66 | most systems. On more esoteric configurations, it may be necessary to |
54 | use other commands to access the output of the printk()s used by | 67 | use other commands to access the output of the printk()s used by |
55 | the RCU torture test. The printk()s use KERN_ALERT, so they should | 68 | the RCU torture test. The printk()s use KERN_ALERT, so they should |
@@ -115,8 +128,9 @@ The following script may be used to torture RCU: | |||
115 | modprobe rcutorture | 128 | modprobe rcutorture |
116 | sleep 100 | 129 | sleep 100 |
117 | rmmod rcutorture | 130 | rmmod rcutorture |
118 | dmesg | grep rcutorture: | 131 | dmesg | grep torture: |
119 | 132 | ||
120 | The output can be manually inspected for the error flag of "!!!". | 133 | The output can be manually inspected for the error flag of "!!!". |
121 | One could of course create a more elaborate script that automatically | 134 | One could of course create a more elaborate script that automatically |
122 | checked for such errors. | 135 | checked for such errors. The "rmmod" command forces a "SUCCESS" or |
136 | "FAILURE" indication to be printk()ed. | ||
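The automatic check mentioned above could take many forms. As one possible sketch in C, the helper below counts captured output lines that contain both the "torture:" tag and the "!!!" error flag. The function names and the idea of passing the lines as a NULL-terminated array are assumptions made for illustration; how the dmesg output is captured is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the line is torture-test output carrying the "!!!" error flag. */
static int torture_line_failed(const char *line)
{
	assert(line != NULL);
	return strstr(line, "torture:") != NULL &&
	       strstr(line, "!!!") != NULL;
}

/* Counts flagged lines in a NULL-terminated array of output lines. */
static unsigned int torture_count_failures(const char *const *lines)
{
	unsigned int n = 0;

	assert(lines != NULL);
	for (; *lines != NULL; lines++)
		n += (unsigned int)torture_line_failed(*lines);
	return n;
}
```

A result of zero means no line carried the error flag. Lines from other kernel subsystems that happen to contain "!!!" are ignored because they lack the "torture:" tag.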
diff --git a/Documentation/arm/Samsung-S3C24XX/Overview.txt b/Documentation/arm/Samsung-S3C24XX/Overview.txt index 8c6ee684174c..3e46d2a31158 100644 --- a/Documentation/arm/Samsung-S3C24XX/Overview.txt +++ b/Documentation/arm/Samsung-S3C24XX/Overview.txt | |||
@@ -7,11 +7,13 @@ Introduction | |||
7 | ------------ | 7 | ------------ |
8 | 8 | ||
9 | The Samsung S3C24XX range of ARM9 System-on-Chip CPUs are supported | 9 | The Samsung S3C24XX range of ARM9 System-on-Chip CPUs are supported |
10 | by the 's3c2410' architecture of ARM Linux. Currently the S3C2410 and | 10 | by the 's3c2410' architecture of ARM Linux. Currently the S3C2410, |
11 | the S3C2440 are supported CPUs. | 11 | S3C2440 and S3C2442 devices are supported. |
12 | 12 | ||
13 | Support for the S3C2400 series is in progress. | 13 | Support for the S3C2400 series is in progress. |
14 | 14 | ||
15 | Support for the S3C2412 and S3C2413 CPUs is being merged. | ||
16 | |||
15 | 17 | ||
16 | Configuration | 18 | Configuration |
17 | ------------- | 19 | ------------- |
@@ -43,9 +45,18 @@ Machines | |||
43 | 45 | ||
44 | Samsung's own development board, geared for PDA work. | 46 | Samsung's own development board, geared for PDA work. |
45 | 47 | ||
48 | Samsung/Aiji SMDK2412 | ||
49 | |||
50 | The S3C2412 version of the SMDK2440. | ||
51 | |||
52 | Samsung/Aiji SMDK2413 | ||
53 | |||
54 | The S3C2412 version of the SMDK2440. | ||
55 | |||
46 | Samsung/Meritech SMDK2440 | 56 | Samsung/Meritech SMDK2440 |
47 | 57 | ||
48 | The S3C2440 compatible version of the SMDK2440 | 58 | The S3C2440 compatible version of the SMDK2440, which has the |
59 | option of an S3C2440 or S3C2442 CPU module. | ||
49 | 60 | ||
50 | Thorcom VR1000 | 61 | Thorcom VR1000 |
51 | 62 | ||
@@ -211,24 +222,6 @@ Port Contributors | |||
211 | Lucas Correia Villa Real (S3C2400 port) | 222 | Lucas Correia Villa Real (S3C2400 port) |
212 | 223 | ||
213 | 224 | ||
214 | Document Changes | ||
215 | ---------------- | ||
216 | |||
217 | 05 Sep 2004 - BJD - Added Document Changes section | ||
218 | 05 Sep 2004 - BJD - Added Klaus Fetscher to list of contributors | ||
219 | 25 Oct 2004 - BJD - Added Dimitry Andric to list of contributors | ||
220 | 25 Oct 2004 - BJD - Updated the MTD from the 2.6.9 merge | ||
221 | 21 Jan 2005 - BJD - Added rx3715, added Shannon to contributors | ||
222 | 10 Feb 2005 - BJD - Added Guillaume Gourat to contributors | ||
223 | 02 Mar 2005 - BJD - Added SMDK2440 to list of machines | ||
224 | 06 Mar 2005 - BJD - Added Christer Weinigel | ||
225 | 08 Mar 2005 - BJD - Added LCVR to list of people, updated introduction | ||
226 | 08 Mar 2005 - BJD - Added section on adding machines | ||
227 | 09 Sep 2005 - BJD - Added section on platform data | ||
228 | 11 Feb 2006 - BJD - Added I2C, RTC and Watchdog sections | ||
229 | 11 Feb 2006 - BJD - Added Osiris machine, and S3C2400 information | ||
230 | |||
231 | |||
232 | Document Author | 225 | Document Author |
233 | --------------- | 226 | --------------- |
234 | 227 | ||
diff --git a/Documentation/arm/Samsung-S3C24XX/S3C2412.txt b/Documentation/arm/Samsung-S3C24XX/S3C2412.txt new file mode 100644 index 000000000000..cb82a7fc7901 --- /dev/null +++ b/Documentation/arm/Samsung-S3C24XX/S3C2412.txt | |||
@@ -0,0 +1,120 @@ | |||
1 | S3C2412 ARM Linux Overview | ||
2 | ========================== | ||
3 | |||
4 | Introduction | ||
5 | ------------ | ||
6 | |||
7 | The S3C2412 is part of the S3C24XX range of ARM9 System-on-Chip CPUs | ||
8 | from Samsung. This part has an ARM926EJ-S core, capable of running up | ||
9 | to 266MHz (see data-sheet for more information). | ||
10 | |||
11 | |||
12 | Clock | ||
13 | ----- | ||
14 | |||
15 | The core clock code provides a set of clocks to the drivers, and allows | ||
16 | for source selection and a number of other features. | ||
17 | |||
18 | |||
19 | Power | ||
20 | ----- | ||
21 | |||
22 | No support for suspend/resume to RAM in the current system. | ||
23 | |||
24 | |||
25 | DMA | ||
26 | --- | ||
27 | |||
28 | No current support for DMA. | ||
29 | |||
30 | |||
31 | GPIO | ||
32 | ---- | ||
33 | |||
34 | There is support for setting the GPIO to input/output/special function | ||
35 | and reading or writing to them. | ||
36 | |||
37 | |||
38 | UART | ||
39 | ---- | ||
40 | |||
41 | The UART hardware is similar to the S3C2440, and is supported by the | ||
42 | s3c2410 driver in the drivers/serial directory. | ||
43 | |||
44 | |||
45 | NAND | ||
46 | ---- | ||
47 | |||
48 | The NAND hardware is similar to the S3C2440, and is supported by the | ||
49 | s3c2410 driver in the drivers/mtd/nand directory. | ||
50 | |||
51 | |||
52 | USB Host | ||
53 | -------- | ||
54 | |||
55 | The USB hardware is similar to the S3C2410, with extended clock source | ||
56 | control. The OHCI portion is supported by the ohci-s3c2410 driver, and | ||
57 | the clock control selection is supported by the core clock code. | ||
58 | |||
59 | |||
60 | USB Device | ||
61 | ---------- | ||
62 | |||
63 | No current support in the kernel. | ||
64 | |||
65 | |||
66 | IRQs | ||
67 | ---- | ||
68 | |||
69 | All the standard and external interrupt sources are supported. The | ||
70 | extra sub-sources are not yet supported. | ||
71 | |||
72 | |||
73 | RTC | ||
74 | --- | ||
75 | |||
76 | The RTC hardware is similar to the S3C2410, and is supported by the | ||
77 | s3c2410-rtc driver. | ||
78 | |||
79 | |||
80 | Watchdog | ||
81 | -------- | ||
82 | |||
83 | The watchdog hardware is the same as the S3C2410, and is supported by | ||
84 | the s3c2410_wdt driver. | ||
85 | |||
86 | |||
87 | MMC/SD/SDIO | ||
88 | ----------- | ||
89 | |||
90 | No current support for the MMC/SD/SDIO block. | ||
91 | |||
92 | IIC | ||
93 | --- | ||
94 | |||
95 | The IIC hardware is the same as the S3C2410, and is supported by the | ||
96 | i2c-s3c24xx driver. | ||
97 | |||
98 | |||
99 | IIS | ||
100 | --- | ||
101 | |||
102 | No current support for the IIS interface. | ||
103 | |||
104 | |||
105 | SPI | ||
106 | --- | ||
107 | |||
108 | No current support for the SPI interfaces. | ||
109 | |||
110 | |||
111 | ATA | ||
112 | --- | ||
113 | |||
114 | No current support for the on-board ATA block. | ||
115 | |||
116 | |||
117 | Document Author | ||
118 | --------------- | ||
119 | |||
120 | Ben Dooks, (c) 2006 Simtec Electronics | ||
diff --git a/Documentation/arm/Samsung-S3C24XX/S3C2413.txt b/Documentation/arm/Samsung-S3C24XX/S3C2413.txt new file mode 100644 index 000000000000..ab2a88858f12 --- /dev/null +++ b/Documentation/arm/Samsung-S3C24XX/S3C2413.txt | |||
@@ -0,0 +1,21 @@ | |||
1 | S3C2413 ARM Linux Overview | ||
2 | ========================== | ||
3 | |||
4 | Introduction | ||
5 | ------------ | ||
6 | |||
7 | The S3C2413 is an extended version of the S3C2412, with a camera | ||
8 | interface and mobile DDR memory support. See the S3C2412 support | ||
9 | documentation for more information. | ||
10 | |||
11 | |||
12 | Camera Interface | ||
13 | ---------------- | ||
14 | |||
15 | This block is currently not supported. | ||
16 | |||
17 | |||
18 | Document Author | ||
19 | --------------- | ||
20 | |||
21 | Ben Dooks, (c) 2006 Simtec Electronics | ||
diff --git a/Documentation/atomic_ops.txt b/Documentation/atomic_ops.txt index 23a1c2402bcc..2a63d5662a93 100644 --- a/Documentation/atomic_ops.txt +++ b/Documentation/atomic_ops.txt | |||
@@ -157,13 +157,13 @@ For example, smp_mb__before_atomic_dec() can be used like so: | |||
157 | smp_mb__before_atomic_dec(); | 157 | smp_mb__before_atomic_dec(); |
158 | atomic_dec(&obj->ref_count); | 158 | atomic_dec(&obj->ref_count); |
159 | 159 | ||
160 | It makes sure that all memory operations preceeding the atomic_dec() | 160 | It makes sure that all memory operations preceding the atomic_dec() |
161 | call are strongly ordered with respect to the atomic counter | 161 | call are strongly ordered with respect to the atomic counter |
162 | operation. In the above example, it guarentees that the assignment of | 162 | operation. In the above example, it guarantees that the assignment of |
163 | "1" to obj->dead will be globally visible to other cpus before the | 163 | "1" to obj->dead will be globally visible to other cpus before the |
164 | atomic counter decrement. | 164 | atomic counter decrement. |
165 | 165 | ||
166 | Without the explicitl smp_mb__before_atomic_dec() call, the | 166 | Without the explicit smp_mb__before_atomic_dec() call, the |
167 | implementation could legally allow the atomic counter update visible | 167 | implementation could legally allow the atomic counter update visible |
168 | to other cpus before the "obj->dead = 1;" assignment. | 168 | to other cpus before the "obj->dead = 1;" assignment. |
169 | 169 | ||
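For readers who want to experiment outside the kernel, the ordering described in this hunk can be approximated with C11 atomics. This is only a rough userspace analogue, not the kernel implementation: the struct layout and the helper name obj_kill() are assumptions made for illustration, and atomic_thread_fence() stands in for smp_mb__before_atomic_dec().

```c
#include <stdatomic.h>

/* Hypothetical object, modelled on the "obj" used in the text. */
struct obj {
	int dead;             /* plain field, written before the decrement */
	atomic_int ref_count; /* reference counter */
};

/*
 * Userspace analogue of:
 *	obj->dead = 1;
 *	smp_mb__before_atomic_dec();
 *	atomic_dec(&obj->ref_count);
 *
 * Returns the counter value after the decrement.
 */
static int obj_kill(struct obj *obj)
{
	obj->dead = 1;

	/* Full fence: orders the store above before the relaxed decrement. */
	atomic_thread_fence(memory_order_seq_cst);

	return atomic_fetch_sub_explicit(&obj->ref_count, 1,
					 memory_order_relaxed) - 1;
}
```

Without the fence, a relaxed decrement carries no ordering of its own, which corresponds to the case the text warns about: the counter update could become visible before the "dead" store.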
@@ -173,11 +173,11 @@ ordering with respect to memory operations after an atomic_dec() call | |||
173 | (smp_mb__{before,after}_atomic_inc()). | 173 | (smp_mb__{before,after}_atomic_inc()). |
174 | 174 | ||
175 | A missing memory barrier in the cases where they are required by the | 175 | A missing memory barrier in the cases where they are required by the |
176 | atomic_t implementation above can have disasterous results. Here is | 176 | atomic_t implementation above can have disastrous results. Here is |
177 | an example, which follows a pattern occuring frequently in the Linux | 177 | an example, which follows a pattern occurring frequently in the Linux |
178 | kernel. It is the use of atomic counters to implement reference | 178 | kernel. It is the use of atomic counters to implement reference |
179 | counting, and it works such that once the counter falls to zero it can | 179 | counting, and it works such that once the counter falls to zero it can |
180 | be guarenteed that no other entity can be accessing the object: | 180 | be guaranteed that no other entity can be accessing the object: |
181 | 181 | ||
182 | static void obj_list_add(struct obj *obj) | 182 | static void obj_list_add(struct obj *obj) |
183 | { | 183 | { |
@@ -291,9 +291,9 @@ to the size of an "unsigned long" C data type, and are least of that | |||
291 | size. The endianness of the bits within each "unsigned long" are the | 291 | size. The endianness of the bits within each "unsigned long" are the |
292 | native endianness of the cpu. | 292 | native endianness of the cpu. |
293 | 293 | ||
294 | void set_bit(unsigned long nr, volatils unsigned long *addr); | 294 | void set_bit(unsigned long nr, volatile unsigned long *addr); |
295 | void clear_bit(unsigned long nr, volatils unsigned long *addr); | 295 | void clear_bit(unsigned long nr, volatile unsigned long *addr); |
296 | void change_bit(unsigned long nr, volatils unsigned long *addr); | 296 | void change_bit(unsigned long nr, volatile unsigned long *addr); |
297 | 297 | ||
298 | These routines set, clear, and change, respectively, the bit number | 298 | These routines set, clear, and change, respectively, the bit number |
299 | indicated by "nr" on the bit mask pointed to by "ADDR". | 299 | indicated by "nr" on the bit mask pointed to by "ADDR". |
@@ -301,9 +301,9 @@ indicated by "nr" on the bit mask pointed to by "ADDR". | |||
301 | They must execute atomically, yet there are no implicit memory barrier | 301 | They must execute atomically, yet there are no implicit memory barrier |
302 | semantics required of these interfaces. | 302 | semantics required of these interfaces. |
303 | 303 | ||
304 | int test_and_set_bit(unsigned long nr, volatils unsigned long *addr); | 304 | int test_and_set_bit(unsigned long nr, volatile unsigned long *addr); |
305 | int test_and_clear_bit(unsigned long nr, volatils unsigned long *addr); | 305 | int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr); |
306 | int test_and_change_bit(unsigned long nr, volatils unsigned long *addr); | 306 | int test_and_change_bit(unsigned long nr, volatile unsigned long *addr); |
307 | 307 | ||
308 | Like the above, except that these routines return a boolean which | 308 | Like the above, except that these routines return a boolean which |
309 | indicates whether the changed bit was set _BEFORE_ the atomic bit | 309 | indicates whether the changed bit was set _BEFORE_ the atomic bit |
@@ -335,7 +335,7 @@ subsequent memory operation is made visible. For example: | |||
335 | /* ... */; | 335 | /* ... */; |
336 | obj->killed = 1; | 336 | obj->killed = 1; |
337 | 337 | ||
338 | The implementation of test_and_set_bit() must guarentee that | 338 | The implementation of test_and_set_bit() must guarantee that |
339 | "obj->dead = 1;" is visible to cpus before the atomic memory operation | 339 | "obj->dead = 1;" is visible to cpus before the atomic memory operation |
340 | done by test_and_set_bit() becomes visible. Likewise, the atomic | 340 | done by test_and_set_bit() becomes visible. Likewise, the atomic |
341 | memory operation done by test_and_set_bit() must become visible before | 341 | memory operation done by test_and_set_bit() must become visible before |
@@ -474,7 +474,7 @@ Now, as far as memory barriers go, as long as spin_lock() | |||
474 | strictly orders all subsequent memory operations (including | 474 | strictly orders all subsequent memory operations (including |
475 | the cas()) with respect to itself, things will be fine. | 475 | the cas()) with respect to itself, things will be fine. |
476 | 476 | ||
477 | Said another way, _atomic_dec_and_lock() must guarentee that | 477 | Said another way, _atomic_dec_and_lock() must guarantee that |
478 | a counter dropping to zero is never made visible before the | 478 | a counter dropping to zero is never made visible before the |
479 | spinlock being acquired. | 479 | spinlock being acquired. |
480 | 480 | ||
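The guarantee stated in the last hunk (a counter dropping to zero is never visible before the lock is held) can be sketched in userspace with C11 atomics. This is an illustrative sketch under stated assumptions, not the kernel's code: struct spinlock, spin_lock(), spin_unlock() and dec_and_lock() are hypothetical stand-ins written here, and the compare-exchange loop plays the role of cas().

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal spinlock built on a C11 atomic_flag. */
struct spinlock {
	atomic_flag locked;
};

static void spin_lock(struct spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->locked,
						 memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static void spin_unlock(struct spinlock *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/*
 * Decrement *cnt. Returns true with the lock HELD if the counter
 * reached zero, false (lock not held) otherwise.
 */
static bool dec_and_lock(atomic_int *cnt, struct spinlock *l)
{
	int old = atomic_load(cnt);

	/* Fast path: decrement locklessly while the result stays above 0. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(cnt, &old, old - 1))
			return false;
		/* On failure, "old" now holds the current value; retry. */
	}

	/* Slow path: take the lock first, only then drop the counter. */
	spin_lock(l);
	if (atomic_fetch_sub(cnt, 1) == 1)
		return true;
	spin_unlock(l);
	return false;
}
```

Because the final decrement only happens after spin_lock() returns, no other thread can observe a zero counter while the lock is still free.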
diff --git a/Documentation/driver-model/overview.txt b/Documentation/driver-model/overview.txt index ac4a7a737e43..2050c9ffc629 100644 --- a/Documentation/driver-model/overview.txt +++ b/Documentation/driver-model/overview.txt | |||
@@ -18,7 +18,7 @@ Traditional driver models implemented some sort of tree-like structure | |||
18 | (sometimes just a list) for the devices they control. There wasn't any | 18 | (sometimes just a list) for the devices they control. There wasn't any |
19 | uniformity across the different bus types. | 19 | uniformity across the different bus types. |
20 | 20 | ||
21 | The current driver model provides a comon, uniform data model for describing | 21 | The current driver model provides a common, uniform data model for describing |
22 | a bus and the devices that can appear under the bus. The unified bus | 22 | a bus and the devices that can appear under the bus. The unified bus |
23 | model includes a set of common attributes which all busses carry, and a set | 23 | model includes a set of common attributes which all busses carry, and a set |
24 | of common callbacks, such as device discovery during bus probing, bus | 24 | of common callbacks, such as device discovery during bus probing, bus |
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index ddfd77029597..1cbbb8e28999 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt | |||
@@ -121,16 +121,6 @@ Who: NeilBrown <neilb@suse.de> | |||
121 | 121 | ||
122 | --------------------------- | 122 | --------------------------- |
123 | 123 | ||
124 | What: au1x00_uart driver | ||
125 | When: January 2006 | ||
126 | Why: The 8250 serial driver now has the ability to deal with the differences | ||
127 | between the standard 8250 family of UARTs and their slightly strange | ||
128 | brother on Alchemy SOCs. The loss of features is not considered an | ||
129 | issue. | ||
130 | Who: Ralf Baechle <ralf@linux-mips.org> | ||
131 | |||
132 | --------------------------- | ||
133 | |||
134 | What: eepro100 network driver | 124 | What: eepro100 network driver |
135 | When: January 2007 | 125 | When: January 2007 |
136 | Why: replaced by the e100 driver | 126 | Why: replaced by the e100 driver |
@@ -166,6 +156,16 @@ Who: Jean Delvare <khali@linux-fr.org> | |||
166 | 156 | ||
167 | --------------------------- | 157 | --------------------------- |
168 | 158 | ||
159 | What: Unused EXPORT_SYMBOL/EXPORT_SYMBOL_GPL exports | ||
160 | (temporary transition config option provided until then) | ||
161 | The transition config option will also be removed at the same time. | ||
162 | When: before 2.6.19 | ||
163 | Why: Unused symbols are both increasing the size of the kernel binary | ||
164 | and are often a sign of "wrong API" | ||
165 | Who: Arjan van de Ven <arjan@linux.intel.com> | ||
166 | |||
167 | --------------------------- | ||
168 | |||
169 | What: remove EXPORT_SYMBOL(tasklist_lock) | 169 | What: remove EXPORT_SYMBOL(tasklist_lock) |
170 | When: August 2006 | 170 | When: August 2006 |
171 | Files: kernel/fork.c | 171 | Files: kernel/fork.c |
@@ -213,3 +213,47 @@ Why: The interface no longer has any callers left in the kernel. It | |||
213 | Who: Nick Piggin <npiggin@suse.de> | 213 | Who: Nick Piggin <npiggin@suse.de> |
214 | 214 | ||
215 | --------------------------- | 215 | --------------------------- |
216 | |||
217 | What: Support for the MIPS EV96100 evaluation board | ||
218 | When: September 2006 | ||
219 | Why: Has not built since at least November 15, 2003; apparently | ||
220 | no user base left. | ||
221 | Who: Ralf Baechle <ralf@linux-mips.org> | ||
222 | |||
223 | --------------------------- | ||
224 | |||
225 | What: Support for the Momentum / PMC-Sierra Jaguar ATX evaluation board | ||
226 | When: September 2006 | ||
227 | Why: Has not built for quite some time, and was never popular, | ||
228 | due to the platform being replaced by successor models. Apparently | ||
229 | no user base left. It also is one of the last users of | ||
230 | WANT_PAGE_VIRTUAL. | ||
231 | Who: Ralf Baechle <ralf@linux-mips.org> | ||
232 | |||
233 | --------------------------- | ||
234 | |||
235 | What: Support for the Momentum Ocelot, Ocelot 3, Ocelot C and Ocelot G | ||
236 | When: September 2006 | ||
237 | Why: Some no longer build and apparently there is no user base left | ||
238 | for these platforms. | ||
239 | Who: Ralf Baechle <ralf@linux-mips.org> | ||
240 | |||
241 | --------------------------- | ||
242 | |||
243 | What: Support for MIPS Technologies' Atlas and SEAD evaluation boards | ||
244 | When: September 2006 | ||
245 | Why: Some no longer build and apparently there is no user base left | ||
246 | for these platforms. Hardware out of production for several years. | ||
247 | Who: Ralf Baechle <ralf@linux-mips.org> | ||
248 | |||
249 | --------------------------- | ||
250 | |||
251 | What: Support for the IT8172-based platforms, ITE 8172G and Globespan IVR | ||
252 | When: September 2006 | ||
253 | Why: Code has not built since at least 2.6.0; apparently there is | ||
254 | no user base left for these platforms. Hardware out of production | ||
255 | for several years and hardly a trace of the manufacturer left on | ||
256 | the net. | ||
257 | Who: Ralf Baechle <ralf@linux-mips.org> | ||
258 | |||
259 | --------------------------- | ||
diff --git a/Documentation/kdump/gdbmacros.txt b/Documentation/kdump/gdbmacros.txt index dcf5580380ab..9b9b454b048a 100644 --- a/Documentation/kdump/gdbmacros.txt +++ b/Documentation/kdump/gdbmacros.txt | |||
@@ -175,7 +175,7 @@ end | |||
175 | document trapinfo | 175 | document trapinfo |
176 | Run info threads and lookup pid of thread #1 | 176 | Run info threads and lookup pid of thread #1 |
177 | 'trapinfo <pid>' will tell you by which trap & possibly | 177 | 'trapinfo <pid>' will tell you by which trap & possibly |
178 | addresthe kernel paniced. | 178 | address the kernel panicked. |
179 | end | 179 | end |
180 | 180 | ||
181 | 181 | ||
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index bf5d2cd6a56e..86e9282d1c20 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt | |||
@@ -1665,6 +1665,10 @@ running once the system is up. | |||
1665 | usbhid.mousepoll= | 1665 | usbhid.mousepoll= |
1666 | [USBHID] The interval which mice are to be polled at. | 1666 | [USBHID] The interval which mice are to be polled at. |
1667 | 1667 | ||
1668 | vdso= [IA-32] | ||
1669 | vdso=1: enable VDSO (default) | ||
1670 | vdso=0: disable VDSO mapping | ||
1671 | |||
1668 | video= [FB] Frame buffer configuration | 1672 | video= [FB] Frame buffer configuration |
1669 | See Documentation/fb/modedb.txt. | 1673 | See Documentation/fb/modedb.txt. |
1670 | 1674 | ||
@@ -1681,9 +1685,14 @@ running once the system is up. | |||
1681 | decrease the size and leave more room for directly | 1685 | decrease the size and leave more room for directly |
1682 | mapped kernel RAM. | 1686 | mapped kernel RAM. |
1683 | 1687 | ||
1684 | vmhalt= [KNL,S390] | 1688 | vmhalt= [KNL,S390] Perform z/VM CP command after system halt. |
1689 | Format: <command> | ||
1690 | |||
1691 | vmpanic= [KNL,S390] Perform z/VM CP command after kernel panic. | ||
1692 | Format: <command> | ||
1685 | 1693 | ||
1686 | vmpoff= [KNL,S390] | 1694 | vmpoff= [KNL,S390] Perform z/VM CP command after power off. |
1695 | Format: <command> | ||
1687 | 1696 | ||
1688 | waveartist= [HW,OSS] | 1697 | waveartist= [HW,OSS] |
1689 | Format: <io>,<irq>,<dma>,<dma2> | 1698 | Format: <io>,<irq>,<dma>,<dma2> |
diff --git a/Documentation/keys-request-key.txt b/Documentation/keys-request-key.txt index 22488d791168..c1f64fdf84cb 100644 --- a/Documentation/keys-request-key.txt +++ b/Documentation/keys-request-key.txt | |||
@@ -3,16 +3,23 @@ | |||
3 | =================== | 3 | =================== |
4 | 4 | ||
5 | The key request service is part of the key retention service (refer to | 5 | The key request service is part of the key retention service (refer to |
6 | Documentation/keys.txt). This document explains more fully how that the | 6 | Documentation/keys.txt). This document explains more fully how the requesting |
7 | requesting algorithm works. | 7 | algorithm works. |
8 | 8 | ||
9 | The process starts by either the kernel requesting a service by calling | 9 | The process starts by either the kernel requesting a service by calling |
10 | request_key(): | 10 | request_key*(): |
11 | 11 | ||
12 | struct key *request_key(const struct key_type *type, | 12 | struct key *request_key(const struct key_type *type, |
13 | const char *description, | 13 | const char *description, |
14 | const char *callout_string); | 14 | const char *callout_string); |
15 | 15 | ||
16 | or: | ||
17 | |||
18 | struct key *request_key_with_auxdata(const struct key_type *type, | ||
19 | const char *description, | ||
20 | const char *callout_string, | ||
21 | void *aux); | ||
22 | |||
16 | Or by userspace invoking the request_key system call: | 23 | Or by userspace invoking the request_key system call: |
17 | 24 | ||
18 | key_serial_t request_key(const char *type, | 25 | key_serial_t request_key(const char *type, |
@@ -20,16 +27,26 @@ Or by userspace invoking the request_key system call: | |||
20 | const char *callout_info, | 27 | const char *callout_info, |
21 | key_serial_t dest_keyring); | 28 | key_serial_t dest_keyring); |
22 | 29 | ||
23 | The main difference between the two access points is that the in-kernel | 30 | The main difference between the access points is that the in-kernel interface |
24 | interface does not need to link the key to a keyring to prevent it from being | 31 | does not need to link the key to a keyring to prevent it from being immediately |
25 | immediately destroyed. The kernel interface returns a pointer directly to the | 32 | destroyed. The kernel interface returns a pointer directly to the key, and |
26 | key, and it's up to the caller to destroy the key. | 33 | it's up to the caller to destroy the key. |
34 | |||
35 | The request_key_with_auxdata() call is like the in-kernel request_key() call, | ||
36 | except that it permits auxiliary data to be passed to the upcaller (the default | ||
37 | is NULL). This is only useful for those key types that define their own upcall | ||
38 | mechanism rather than using /sbin/request-key. | ||
27 | 39 | ||
28 | The userspace interface links the key to a keyring associated with the process | 40 | The userspace interface links the key to a keyring associated with the process |
29 | to prevent the key from going away, and returns the serial number of the key to | 41 | to prevent the key from going away, and returns the serial number of the key to |
30 | the caller. | 42 | the caller. |
31 | 43 | ||
32 | 44 | ||
45 | The following example assumes that the key types involved don't define their | ||
46 | own upcall mechanisms. If they do, then those should be substituted for the | ||
47 | forking and execution of /sbin/request-key. | ||
48 | |||
49 | |||
33 | =========== | 50 | =========== |
34 | THE PROCESS | 51 | THE PROCESS |
35 | =========== | 52 | =========== |
@@ -40,8 +57,8 @@ A request proceeds in the following manner: | |||
40 | interface]. | 57 | interface]. |
41 | 58 | ||
42 | (2) request_key() searches the process's subscribed keyrings to see if there's | 59 | (2) request_key() searches the process's subscribed keyrings to see if there's |
43 | a suitable key there. If there is, it returns the key. If there isn't, and | 60 | a suitable key there. If there is, it returns the key. If there isn't, |
44 | callout_info is not set, an error is returned. Otherwise the process | 61 | and callout_info is not set, an error is returned. Otherwise the process |
45 | proceeds to the next step. | 62 | proceeds to the next step. |
46 | 63 | ||
47 | (3) request_key() sees that A doesn't have the desired key yet, so it creates | 64 | (3) request_key() sees that A doesn't have the desired key yet, so it creates |
@@ -62,7 +79,7 @@ A request proceeds in the following manner: | |||
62 | instantiation. | 79 | instantiation. |
63 | 80 | ||
64 | (7) The program may want to access another key from A's context (say a | 81 | (7) The program may want to access another key from A's context (say a |
65 | Kerberos TGT key). It just requests the appropriate key, and the keyring | 82 | Kerberos TGT key). It just requests the appropriate key, and the keyring |
66 | search notes that the session keyring has auth key V in its bottom level. | 83 | search notes that the session keyring has auth key V in its bottom level. |
67 | 84 | ||
68 | This will permit it to then search the keyrings of process A with the | 85 | This will permit it to then search the keyrings of process A with the |
@@ -79,10 +96,11 @@ A request proceeds in the following manner: | |||
79 | (10) The program then exits 0 and request_key() deletes key V and returns key | 96 | (10) The program then exits 0 and request_key() deletes key V and returns key |
80 | U to the caller. | 97 | U to the caller. |
81 | 98 | ||
82 | This also extends further. If key W (step 7 above) didn't exist, key W would be | 99 | This also extends further. If key W (step 7 above) didn't exist, key W would |
83 | created uninstantiated, another auth key (X) would be created (as per step 3) | 100 | be created uninstantiated, another auth key (X) would be created (as per step |
84 | and another copy of /sbin/request-key spawned (as per step 4); but the context | 101 | 3) and another copy of /sbin/request-key spawned (as per step 4); but the |
85 | specified by auth key X will still be process A, as it was in auth key V. | 102 | context specified by auth key X will still be process A, as it was in auth key |
103 | V. | ||
86 | 104 | ||
87 | This is because process A's keyrings can't simply be attached to | 105 | This is because process A's keyrings can't simply be attached to |
88 | /sbin/request-key at the appropriate places because (a) execve will discard two | 106 | /sbin/request-key at the appropriate places because (a) execve will discard two |
@@ -118,17 +136,17 @@ A search of any particular keyring proceeds in the following fashion: | |||
118 | 136 | ||
119 | (2) It considers all the non-keyring keys within that keyring and, if any key | 137 | (2) It considers all the non-keyring keys within that keyring and, if any key |
120 | matches the criteria specified, calls key_permission(SEARCH) on it to see | 138 | matches the criteria specified, calls key_permission(SEARCH) on it to see |
121 | if the key is allowed to be found. If it is, that key is returned; if | 139 | if the key is allowed to be found. If it is, that key is returned; if |
122 | not, the search continues, and the error code is retained if of higher | 140 | not, the search continues, and the error code is retained if of higher |
123 | priority than the one currently set. | 141 | priority than the one currently set. |
124 | 142 | ||
125 | (3) It then considers all the keyring-type keys in the keyring it's currently | 143 | (3) It then considers all the keyring-type keys in the keyring it's currently |
126 | searching. It calls key_permission(SEARCH) on each keyring, and if this | 144 | searching. It calls key_permission(SEARCH) on each keyring, and if this |
127 | grants permission, it recurses, executing steps (2) and (3) on that | 145 | grants permission, it recurses, executing steps (2) and (3) on that |
128 | keyring. | 146 | keyring. |
129 | 147 | ||
130 | The process stops immediately a valid key is found with permission granted to | 148 | The process stops immediately a valid key is found with permission granted to |
131 | use it. Any error from a previous match attempt is discarded and the key is | 149 | use it. Any error from a previous match attempt is discarded and the key is |
132 | returned. | 150 | returned. |
133 | 151 | ||
134 | When search_process_keyrings() is invoked, it performs the following searches | 152 | When search_process_keyrings() is invoked, it performs the following searches |
@@ -153,7 +171,7 @@ The moment one succeeds, all pending errors are discarded and the found key is | |||
153 | returned. | 171 | returned. |
154 | 172 | ||
155 | Only if all these fail does the whole thing fail with the highest priority | 173 | Only if all these fail does the whole thing fail with the highest priority |
156 | error. Note that several errors may have come from LSM. | 174 | error. Note that several errors may have come from LSM. |
157 | 175 | ||
158 | The error priority is: | 176 | The error priority is: |
159 | 177 | ||
diff --git a/Documentation/keys.txt b/Documentation/keys.txt index 61c0fad2fe2f..e373f0212843 100644 --- a/Documentation/keys.txt +++ b/Documentation/keys.txt | |||
@@ -780,6 +780,17 @@ payload contents" for more information. | |||
780 | See also Documentation/keys-request-key.txt. | 780 | See also Documentation/keys-request-key.txt. |
781 | 781 | ||
782 | 782 | ||
783 | (*) To search for a key, passing auxiliary data to the upcaller, call: | ||
784 | |||
785 | struct key *request_key_with_auxdata(const struct key_type *type, | ||
786 | const char *description, | ||
787 | const char *callout_string, | ||
788 | void *aux); | ||
789 | |||
790 | This is identical to request_key(), except that the auxiliary data is | ||
791 | passed to the key_type->request_key() op if it exists. | ||
792 | |||
793 | |||
783 | (*) When it is no longer required, the key should be released using: | 794 | (*) When it is no longer required, the key should be released using: |
784 | 795 | ||
785 | void key_put(struct key *key); | 796 | void key_put(struct key *key); |
@@ -1031,6 +1042,24 @@ The structure has a number of fields, some of which are mandatory: | |||
1031 | as might happen when the userspace buffer is accessed. | 1042 | as might happen when the userspace buffer is accessed. |
1032 | 1043 | ||
1033 | 1044 | ||
1045 | (*) int (*request_key)(struct key *key, struct key *authkey, const char *op, | ||
1046 | void *aux); | ||
1047 | |||
1048 | This method is optional. If provided, request_key() and | ||
1049 | request_key_with_auxdata() will invoke this function rather than | ||
1050 | upcalling to /sbin/request-key to operate upon a key of this type. | ||
1051 | |||
1052 | The aux parameter is as passed to request_key_with_auxdata() or is NULL | ||
1053 | otherwise. Also passed are the key to be operated upon, the | ||
1054 | authorisation key for this operation and the operation type (currently | ||
1055 | only "create"). | ||
1056 | |||
1057 | This function should return only when the upcall is complete. Upon return | ||
1058 | the authorisation key will be revoked, and the target key will be | ||
1059 | negatively instantiated if it is still uninstantiated. The error will be | ||
1060 | returned to the caller of request_key*(). | ||
1061 | |||
1062 | |||
1034 | ============================ | 1063 | ============================ |
1035 | REQUEST-KEY CALLBACK SERVICE | 1064 | REQUEST-KEY CALLBACK SERVICE |
1036 | ============================ | 1065 | ============================ |
diff --git a/Documentation/pi-futex.txt b/Documentation/pi-futex.txt new file mode 100644 index 000000000000..5d61dacd21f6 --- /dev/null +++ b/Documentation/pi-futex.txt | |||
@@ -0,0 +1,121 @@ | |||
1 | Lightweight PI-futexes | ||
2 | ---------------------- | ||
3 | |||
4 | We are calling them lightweight for 3 reasons: | ||
5 | |||
6 | - in the user-space fastpath a PI-enabled futex involves no kernel work | ||
7 | (or any other PI complexity) at all. No registration, no extra kernel | ||
8 | calls - just pure fast atomic ops in userspace. | ||
9 | |||
10 | - even in the slowpath, the system call and scheduling pattern is very | ||
11 | similar to normal futexes. | ||
12 | |||
13 | - the in-kernel PI implementation is streamlined around the mutex | ||
14 | abstraction, with strict rules that keep the implementation | ||
15 | relatively simple: only a single owner may own a lock (i.e. no | ||
16 | read-write lock support), only the owner may unlock a lock, no | ||
17 | recursive locking, etc. | ||
18 | |||
19 | Priority Inheritance - why? | ||
20 | --------------------------- | ||
21 | |||
22 | The short reply: user-space PI helps achieve/improve determinism for | ||
23 | user-space applications. In the best case, it can help achieve | ||
24 | determinism and well-bounded latencies. Even in the worst case, PI will | ||
25 | improve the statistical distribution of locking related application | ||
26 | delays. | ||
27 | |||
28 | The longer reply: | ||
29 | ----------------- | ||
30 | |||
31 | Firstly, sharing locks between multiple tasks is a common programming | ||
32 | technique that often cannot be replaced with lockless algorithms. As we | ||
33 | can see it in the kernel [which is a quite complex program in itself], | ||
34 | lockless structures are rather the exception than the norm - the current | ||
35 | ratio of lockless vs. locky code for shared data structures is somewhere | ||
36 | between 1:10 and 1:100. Lockless is hard, and the complexity of lockless | ||
37 | algorithms often endangers the ability to do robust reviews of said code. | ||
38 | I.e. critical RT apps often choose lock structures to protect critical | ||
39 | data structures, instead of lockless algorithms. Furthermore, there are | ||
40 | cases (like shared hardware, or other resource limits) where lockless | ||
41 | access is mathematically impossible. | ||
42 | |||
43 | Media players (such as Jack) are an example of reasonable application | ||
44 | design with multiple tasks (with multiple priority levels) sharing | ||
45 | short-held locks: for example, a high-prio audio playback thread is | ||
46 | combined with medium-prio construct-audio-data threads and low-prio | ||
47 | display-colory-stuff threads. Add video and decoding to the mix and | ||
48 | we've got even more priority levels. | ||
49 | |||
50 | So once we accept that synchronization objects (locks) are an | ||
51 | unavoidable fact of life, and once we accept that multi-task userspace | ||
52 | apps have a very fair expectation of being able to use locks, we've got | ||
53 | to think about how to offer the option of a deterministic locking | ||
54 | implementation to user-space. | ||
55 | |||
56 | Most of the technical counter-arguments against doing priority | ||
57 | inheritance only apply to kernel-space locks. But user-space locks are | ||
58 | different: there we cannot disable interrupts or make the task | ||
59 | non-preemptible in a critical section, so the 'use spinlocks' argument | ||
60 | does not apply (user-space spinlocks have the same priority inversion | ||
61 | problems as other user-space locking constructs). Fact is, pretty much | ||
62 | the only technique that currently enables good determinism for userspace | ||
63 | locks (such as futex-based pthread mutexes) is priority inheritance: | ||
64 | |||
65 | Currently (without PI), if a high-prio and a low-prio task share a lock | ||
66 | [this is a quite common scenario for most non-trivial RT applications], | ||
67 | even if all critical sections are coded carefully to be deterministic | ||
68 | (i.e. all critical sections are short in duration and only execute a | ||
69 | limited number of instructions), the kernel cannot guarantee any | ||
70 | deterministic execution of the high-prio task: any medium-priority task | ||
71 | could preempt the low-prio task while it holds the shared lock and | ||
72 | executes the critical section, and could delay it indefinitely. | ||
73 | |||
74 | Implementation: | ||
75 | --------------- | ||
76 | |||
77 | As mentioned before, the userspace fastpath of PI-enabled pthread | ||
78 | mutexes involves no kernel work at all - they behave quite similarly to | ||
79 | normal futex-based locks: a 0 value means unlocked, and a value==TID | ||
80 | means locked. (This is the same method as used by list-based robust | ||
81 | futexes.) Userspace uses atomic ops to lock/unlock these mutexes without | ||
82 | entering the kernel. | ||
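
The fastpath described above can be sketched with C11 atomics. This is only
an illustration, not the glibc or kernel code: the helper names pi_fast_lock
and pi_fast_unlock are hypothetical, and the caller is assumed to pass its
own TID.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: attempt the 0 -> TID transition entirely in
 * userspace. Returns true if the lock was acquired without kernel help.
 */
static bool pi_fast_lock(_Atomic uint32_t *futex_word, uint32_t tid)
{
	uint32_t expected = 0;	/* 0 means unlocked */

	return atomic_compare_exchange_strong(futex_word, &expected, tid);
}

/*
 * Hypothetical helper: attempt the TID -> 0 transition. This fails if
 * the word is anything other than the bare TID, for example when an
 * extra bit has been set in it.
 */
static bool pi_fast_unlock(_Atomic uint32_t *futex_word, uint32_t tid)
{
	uint32_t expected = tid;

	return atomic_compare_exchange_strong(futex_word, &expected, 0);
}
```

When either helper returns false, the caller would fall back to the slowpath.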
83 | |||
84 | To handle the slowpath, we have added two new futex ops: | ||
85 | |||
86 | FUTEX_LOCK_PI | ||
87 | FUTEX_UNLOCK_PI | ||
88 | |||
89 | If the lock-acquire fastpath fails, [i.e. an atomic transition from 0 to | ||
90 | TID fails], then FUTEX_LOCK_PI is called. The kernel does all the | ||
91 | remaining work: if there is no futex-queue attached to the futex address | ||
92 | yet then the code looks up the task that owns the futex [it has put its | ||
93 | own TID into the futex value], and attaches a 'PI state' structure to | ||
94 | the futex-queue. The pi_state includes an rt-mutex, which is a PI-aware, | ||
95 | kernel-based synchronization object. The 'other' task is made the owner | ||
96 | of the rt-mutex, and the FUTEX_WAITERS bit is atomically set in the | ||
97 | futex value. Then this task tries to lock the rt-mutex, on which it | ||
98 | blocks. Once it returns, it has the mutex acquired, and it sets the | ||
99 | futex value to its own TID and returns. Userspace has no other work to | ||
100 | perform - it now owns the lock, and futex value contains | ||
101 | FUTEX_WAITERS|TID. | ||
102 | |||
103 | If the unlock side fastpath succeeds, [i.e. userspace manages to do a | ||
104 | TID -> 0 atomic transition of the futex value], then no kernel work is | ||
105 | triggered. | ||
106 | |||
107 | If the unlock fastpath fails (because the FUTEX_WAITERS bit is set), | ||
108 | then FUTEX_UNLOCK_PI is called, and the kernel unlocks the futex on the | ||
109 | behalf of userspace - and it also unlocks the attached | ||
110 | pi_state->rt_mutex and thus wakes up any potential waiters. | ||
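
As a rough sketch of how userspace might combine the fastpath with the two
new futex ops, assuming a Linux system whose headers define FUTEX_LOCK_PI
and FUTEX_UNLOCK_PI: the wrapper names pi_futex, pi_lock and pi_unlock are
hypothetical, and error handling and retries are omitted for brevity.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper around the raw futex system call. */
static long pi_futex(_Atomic uint32_t *uaddr, int op)
{
	return syscall(SYS_futex, uaddr, op, 0, NULL, NULL, 0);
}

/* Lock: try 0 -> TID in userspace, otherwise let the kernel do the work. */
static int pi_lock(_Atomic uint32_t *word)
{
	uint32_t tid = (uint32_t)syscall(SYS_gettid);
	uint32_t expected = 0;

	if (atomic_compare_exchange_strong(word, &expected, tid))
		return 0;			/* fastpath, no kernel entry */
	return (int)pi_futex(word, FUTEX_LOCK_PI);
}

/* Unlock: try TID -> 0 in userspace, otherwise ask the kernel to unlock. */
static int pi_unlock(_Atomic uint32_t *word)
{
	uint32_t expected = (uint32_t)syscall(SYS_gettid);

	if (atomic_compare_exchange_strong(word, &expected, 0))
		return 0;			/* fastpath, no waiters */
	return (int)pi_futex(word, FUTEX_UNLOCK_PI);
}
```

In the uncontended case neither function issues the futex call.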
111 | |||
112 | Note that under this approach, contrary to previous PI-futex approaches, | ||
113 | there is no prior 'registration' of a PI-futex. [which is not quite | ||
114 | possible anyway, due to existing ABI properties of pthread mutexes.] | ||
115 | |||
116 | Also, under this scheme, 'robustness' and 'PI' are two orthogonal | ||
117 | properties of futexes, and all four combinations are possible: futex, | ||
118 | robust-futex, PI-futex, robust+PI-futex. | ||
119 | |||
120 | More details about priority inheritance can be found in | ||
121 | Documentation/rt-mutex.txt. | ||
diff --git a/Documentation/robust-futexes.txt b/Documentation/robust-futexes.txt index df82d75245a0..76e8064b8c3a 100644 --- a/Documentation/robust-futexes.txt +++ b/Documentation/robust-futexes.txt | |||
@@ -95,7 +95,7 @@ comparison. If the thread has registered a list, then normally the list | |||
95 | is empty. If the thread/process crashed or terminated in some incorrect | 95 | is empty. If the thread/process crashed or terminated in some incorrect |
96 | way then the list might be non-empty: in this case the kernel carefully | 96 | way then the list might be non-empty: in this case the kernel carefully |
97 | walks the list [not trusting it], and marks all locks that are owned by | 97 | walks the list [not trusting it], and marks all locks that are owned by |
98 | this thread with the FUTEX_OWNER_DEAD bit, and wakes up one waiter (if | 98 | this thread with the FUTEX_OWNER_DIED bit, and wakes up one waiter (if |
99 | any). | 99 | any). |
100 | 100 | ||
101 | The list is guaranteed to be private and per-thread at do_exit() time, | 101 | The list is guaranteed to be private and per-thread at do_exit() time, |
diff --git a/Documentation/rt-mutex-design.txt b/Documentation/rt-mutex-design.txt new file mode 100644 index 000000000000..c472ffacc2f6 --- /dev/null +++ b/Documentation/rt-mutex-design.txt | |||
@@ -0,0 +1,781 @@ | |||
1 | # | ||
2 | # Copyright (c) 2006 Steven Rostedt | ||
3 | # Licensed under the GNU Free Documentation License, Version 1.2 | ||
4 | # | ||
5 | |||
6 | RT-mutex implementation design | ||
7 | ------------------------------ | ||
8 | |||
9 | This document tries to describe the design of the rtmutex.c implementation. | ||
10 | It doesn't describe the reasons why rtmutex.c exists. For that please see | ||
11 | Documentation/rt-mutex.txt. This document does explain the problems | ||
12 | that happen without this code, but only as background for understanding | ||
13 | what the code actually is doing. | ||
14 | |||
15 | The goal of this document is to help others understand the priority | ||
16 | inheritance (PI) algorithm that is used, as well as reasons for the | ||
17 | decisions that were made to implement PI in the manner that was done. | ||
18 | |||
19 | |||
20 | Unbounded Priority Inversion | ||
21 | ---------------------------- | ||
22 | |||
23 | Priority inversion is when a lower priority process executes while a higher | ||
24 | priority process wants to run. This happens for several reasons, and | ||
25 | most of the time it can't be helped. Anytime a high priority process wants | ||
26 | to use a resource that a lower priority process has (a mutex for example), | ||
27 | the high priority process must wait until the lower priority process is done | ||
28 | with the resource. This is a priority inversion. What we want to prevent | ||
29 | is something called unbounded priority inversion. That is when the high | ||
30 | priority process is prevented from running by a lower priority process for | ||
31 | an undetermined amount of time. | ||
32 | |||
33 | The classic example of unbounded priority inversion is where you have three | ||
34 | processes, let's call them processes A, B, and C, where A is the highest | ||
35 | priority process, C is the lowest, and B is in between. A tries to grab a lock | ||
36 | that C owns and must wait and lets C run to release the lock. But in the | ||
37 | meantime, B executes, and since B is of a higher priority than C, it preempts C, | ||
38 | but by doing so, it is in fact preempting A which is a higher priority process. | ||
39 | Now there's no way of knowing how long A will be sleeping waiting for C | ||
40 | to release the lock, because for all we know, B is a CPU hog and will | ||
41 | never give C a chance to release the lock. This is called unbounded priority | ||
42 | inversion. | ||
43 | |||
44 | Here's a little ASCII art to show the problem. | ||
45 | |||
46 | grab lock L1 (owned by C) | ||
47 | | | ||
48 | A ---+ | ||
49 | C preempted by B | ||
50 | | | ||
51 | C +----+ | ||
52 | |||
53 | B +--------> | ||
54 | B now keeps A from running. | ||
55 | |||
56 | |||
57 | Priority Inheritance (PI) | ||
58 | ------------------------- | ||
59 | |||
60 | There are several ways to solve this issue, but other ways are out of scope | ||
61 | for this document. Here we only discuss PI. | ||
62 | |||
63 | PI is where a process inherits the priority of another process if the other | ||
64 | process blocks on a lock owned by the current process. To make this easier | ||
65 | to understand, let's use the previous example, with processes A, B, and C again. | ||
66 | |||
67 | This time, when A blocks on the lock owned by C, C would inherit the priority | ||
68 | of A. So now if B becomes runnable, it would not preempt C, since C now has | ||
69 | the high priority of A. As soon as C releases the lock, it loses its | ||
70 | inherited priority, and A then can continue with the resource that C had. | ||
71 | |||
72 | Terminology | ||
73 | ----------- | ||
74 | |||
75 | Here I explain some terminology that is used in this document to help describe | ||
76 | the design that is used to implement PI. | ||
77 | |||
78 | PI chain - The PI chain is an ordered series of locks and processes that cause | ||
79 | processes to inherit priorities from a previous process that is | ||
80 | blocked on one of its locks. This is described in more detail | ||
81 | later in this document. | ||
82 | |||
83 | mutex - In this document, to differentiate between locks that implement | ||
84 | PI and spin locks that are used in the PI code, from now on | ||
85 | the PI locks will be called mutexes. | ||
86 | |||
87 | lock - In this document from now on, I will use the term lock when | ||
88 | referring to spin locks that are used to protect parts of the PI | ||
89 | algorithm. These locks disable preemption for UP (when | ||
90 | CONFIG_PREEMPT is enabled) and on SMP prevent multiple CPUs from | ||
91 | entering critical sections simultaneously. | ||
92 | |||
93 | spin lock - Same as lock above. | ||
94 | |||
95 | waiter - A waiter is a struct that is stored on the stack of a blocked | ||
96 | process. Since the scope of the waiter is within the code for | ||
97 | a process being blocked on the mutex, it is fine to allocate | ||
98 | the waiter on the process's stack (local variable). This | ||
99 | structure holds a pointer to the task, as well as the mutex that | ||
100 | the task is blocked on. It also has the plist node structures to | ||
101 | place the task in the wait_list of a mutex as well as the | ||
102 | pi_list of a mutex owner task (described below). | ||
103 | |||
104 | waiter is sometimes used in reference to the task that is waiting | ||
105 | on a mutex. This is the same as waiter->task. | ||
106 | |||
107 | waiters - A list of processes that are blocked on a mutex. | ||
108 | |||
109 | top waiter - The highest priority process waiting on a specific mutex. | ||
110 | |||
111 | top pi waiter - The highest priority process waiting on one of the mutexes | ||
112 | that a specific process owns. | ||
113 | |||
114 | Note: task and process are used interchangeably in this document, mostly to | ||
115 | differentiate between two processes that are being described together. | ||
116 | |||
117 | |||
118 | PI chain | ||
119 | -------- | ||
120 | |||
121 | The PI chain is a list of processes and mutexes that may cause priority | ||
122 | inheritance to take place. Multiple chains may converge, but a chain | ||
123 | would never diverge, since a process can't be blocked on more than one | ||
124 | mutex at a time. | ||
125 | |||
126 | Example: | ||
127 | |||
128 | Process: A, B, C, D, E | ||
129 | Mutexes: L1, L2, L3, L4 | ||
130 | |||
131 | A owns: L1 | ||
132 | B blocked on L1 | ||
133 | B owns L2 | ||
134 | C blocked on L2 | ||
135 | C owns L3 | ||
136 | D blocked on L3 | ||
137 | D owns L4 | ||
138 | E blocked on L4 | ||
139 | |||
140 | The chain would be: | ||
141 | |||
142 | E->L4->D->L3->C->L2->B->L1->A | ||
143 | |||
144 | To show where two chains merge, we could add another process F and | ||
145 | another mutex L5 where B owns L5 and F is blocked on mutex L5. | ||
146 | |||
147 | The chain for F would be: | ||
148 | |||
149 | F->L5->B->L1->A | ||
150 | |||
151 | Since a process may own more than one mutex, but never be blocked on more than | ||
152 | one, the chains merge. | ||
153 | |||
154 | Here we show both chains: | ||
155 | |||
156 | E->L4->D->L3->C->L2-+ | ||
157 | | | ||
158 | +->B->L1->A | ||
159 | | | ||
160 | F->L5-+ | ||
161 | |||
162 | For PI to work, the processes at the right end of these chains (or we may | ||
163 | also call it the Top of the chain) must be equal to or higher in priority | ||
164 | than the processes to the left or below in the chain. | ||
165 | |||
166 | Also since a mutex may have more than one process blocked on it, we can | ||
167 | have multiple chains merge at mutexes. If we add another process G that is | ||
168 | blocked on mutex L2: | ||
169 | |||
170 | G->L2->B->L1->A | ||
171 | |||
172 | And once again, to show how this can grow I will show the merging chains | ||
173 | again. | ||
174 | |||
175 | E->L4->D->L3->C-+ | ||
176 | +->L2-+ | ||
177 | | | | ||
178 | G-+ +->B->L1->A | ||
179 | | | ||
180 | F->L5-+ | ||
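
The chains above can be modelled with two small structures. This is a
simplified sketch for illustration only: the names pi_task, pi_mutex and
chain_top are hypothetical and do not match the real rtmutex.c data
structures, and the walk assumes there is no cycle (no deadlock).

```c
#include <stddef.h>

struct pi_mutex;

struct pi_task {
	const char *name;
	struct pi_mutex *blocked_on;	/* at most one, so chains never diverge */
};

struct pi_mutex {
	const char *name;
	struct pi_task *owner;		/* NULL when the mutex is not owned */
};

/*
 * Follow task -> blocked_on -> owner until we reach a task that is not
 * blocked. That task is the top of the chain. If depth is not NULL, the
 * number of tasks visited is stored there.
 */
static struct pi_task *chain_top(struct pi_task *task, int *depth)
{
	int n = 1;

	while (task->blocked_on && task->blocked_on->owner) {
		task = task->blocked_on->owner;
		n++;
	}
	if (depth)
		*depth = n;
	return task;
}
```

Starting from either E or F in the example, such a walk ends at A, which is
why the two chains are said to merge.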
181 | |||
182 | |||
183 | Plist | ||
184 | ----- | ||
185 | |||
186 | Before I go further and talk about how the PI chain is stored through lists | ||
187 | on both mutexes and processes, I'll explain the plist. This is similar to | ||
188 | the struct list_head functionality that is already in the kernel. | ||
189 | The implementation of plist is out of scope for this document, but it is | ||
190 | very important to understand what it does. | ||
191 | |||
192 | There are a few differences between plist and list, the most important one | ||
193 | being that plist is a priority sorted linked list. This means that the | ||
194 | priorities of the plist are sorted, such that it takes O(1) to retrieve the | ||
195 | highest priority item in the list. Obviously this is useful to store processes | ||
196 | based on their priorities. | ||
197 | |||
198 | Another difference, which is important for implementation, is that, unlike | ||
199 | list, the head of the list is a different element than the nodes of a list. | ||
200 | So the head of the list is declared as struct plist_head and nodes that will | ||
201 | be added to the list are declared as struct plist_node. | ||
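
To make the two properties concrete (priority ordering and a head type
distinct from the node type), here is a deliberately simple sketch. It is
not the kernel plist: the names prio_head, prio_node, prio_add and
prio_first are hypothetical, the list is singly linked, and no attempt is
made to match the real plist's insertion cost.

```c
#include <stddef.h>

struct prio_node {
	int prio;			/* lower value means higher priority */
	struct prio_node *next;
};

struct prio_head {
	struct prio_node *first;	/* always the highest priority node */
};

/* Insert in sorted position; nodes of equal priority stay in FIFO order. */
static void prio_add(struct prio_head *head, struct prio_node *node)
{
	struct prio_node **link = &head->first;

	while (*link && (*link)->prio <= node->prio)
		link = &(*link)->next;
	node->next = *link;
	*link = node;
}

/* O(1): the highest priority node is always at the front. */
static struct prio_node *prio_first(const struct prio_head *head)
{
	return head->first;
}
```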
202 | |||
203 | |||
204 | Mutex Waiter List | ||
205 | ----------------- | ||
206 | |||
207 | Every mutex keeps track of all the waiters that are blocked on itself. The mutex | ||
208 | has a plist to store these waiters by priority. This list is protected by | ||
209 | a spin lock that is located in the struct of the mutex. This lock is called | ||
210 | wait_lock. Since the modification of the waiter list is never done in | ||
211 | interrupt context, the wait_lock can be taken without disabling interrupts. | ||
212 | |||
213 | |||
214 | Task PI List | ||
215 | ------------ | ||
216 | |||
217 | To keep track of the PI chains, each process has its own PI list. This is | ||
218 | a list of all top waiters of the mutexes that are owned by the process. | ||
219 | Note that this list only holds the top waiters and not all waiters that are | ||
220 | blocked on mutexes owned by the process. | ||
221 | |||
222 | The top of the task's PI list is always the highest priority task that | ||
223 | is waiting on a mutex that is owned by the task. So if the task has | ||
224 | inherited a priority, it will always be the priority of the task that is | ||
225 | at the top of this list. | ||
226 | |||
227 | This list is stored in the task structure of a process as a plist called | ||
228 | pi_list. This list is protected by a spin lock also in the task structure, | ||
229 | called pi_lock. This lock may also be taken in interrupt context, so when | ||
230 | locking the pi_lock, interrupts must be disabled. | ||
231 | |||
232 | |||
233 | Depth of the PI Chain | ||
234 | --------------------- | ||
235 | |||
236 | The maximum depth of the PI chain is not dynamic, and could actually be | ||
237 | defined. But it is very complex to figure out, since it depends on all | ||
238 | the nesting of mutexes. Let's look at the example where we have 3 mutexes, | ||
239 | L1, L2, and L3, and four separate functions func1, func2, func3 and func4. | ||
240 | The following shows a locking order of L1->L2->L3, but may not actually | ||
241 | be directly nested that way. | ||
242 | |||
243 | void func1(void) | ||
244 | { | ||
245 | mutex_lock(L1); | ||
246 | |||
247 | /* do anything */ | ||
248 | |||
249 | mutex_unlock(L1); | ||
250 | } | ||
251 | |||
252 | void func2(void) | ||
253 | { | ||
254 | mutex_lock(L1); | ||
255 | mutex_lock(L2); | ||
256 | |||
257 | /* do something */ | ||
258 | |||
259 | mutex_unlock(L2); | ||
260 | mutex_unlock(L1); | ||
261 | } | ||
262 | |||
263 | void func3(void) | ||
264 | { | ||
265 | mutex_lock(L2); | ||
266 | mutex_lock(L3); | ||
267 | |||
268 | /* do something else */ | ||
269 | |||
270 | mutex_unlock(L3); | ||
271 | mutex_unlock(L2); | ||
272 | } | ||
273 | |||
274 | void func4(void) | ||
275 | { | ||
276 | mutex_lock(L3); | ||
277 | |||
278 | /* do something again */ | ||
279 | |||
280 | mutex_unlock(L3); | ||
281 | } | ||
282 | |||
283 | Now we add 4 processes that run each of these functions separately. | ||
284 | Processes A, B, C, and D which run functions func1, func2, func3 and func4 | ||
285 | respectively, and such that D runs first and A last. With D being preempted | ||
286 | in func4 in the "do something again" area, we have a locking that follows: | ||
287 | |||
288 | D owns L3 | ||
289 | C blocked on L3 | ||
290 | C owns L2 | ||
291 | B blocked on L2 | ||
292 | B owns L1 | ||
293 | A blocked on L1 | ||
294 | |||
295 | And thus we have the chain A->L1->B->L2->C->L3->D. | ||
296 | |||
297 | This gives us a PI depth of 4 (four processes), but looking at any of the | ||
298 | functions individually, it seems as though they only have at most a locking | ||
299 | depth of two. So, although the locking depth is defined at compile time, | ||
300 | it still is very difficult to find the possibilities of that depth. | ||
301 | |||
302 | Now since mutexes can be defined by user-land applications, we don't want a DoS | ||
303 | type of application that nests large amounts of mutexes to create a large | ||
304 | PI chain, and have the code holding spin locks while looking at a large | ||
305 | amount of data. So to prevent this, the implementation not only implements | ||
306 | a maximum lock depth, but also only holds at most two different locks at a | ||
307 | time, as it walks the PI chain. More about this below. | ||
308 | |||
309 | |||
310 | Mutex owner and flags | ||
311 | --------------------- | ||
312 | |||
313 | The mutex structure contains a pointer to the owner of the mutex. If the | ||
314 | mutex is not owned, this owner is set to NULL. Since all architectures | ||
315 | have the task structure on at least a four byte alignment (and if this is | ||
316 | not true, the rtmutex.c code will be broken!), this allows for the two | ||
317 | least significant bits to be used as flags. This part is also described | ||
318 | in Documentation/rt-mutex.txt, but will also be briefly described here. | ||
319 | |||
320 | Bit 0 is used as the "Pending Owner" flag. This is described later. | ||
321 | Bit 1 is used as the "Has Waiters" flag. This is also described later | ||
322 | in more detail, but is set whenever there are waiters on a mutex. | ||
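
The flag packing can be sketched as follows. The macro and function names
here are hypothetical stand-ins, not the ones used in rtmutex.c, and the
sketch relies on the same assumption as the text: the task structure is at
least four-byte aligned.

```c
#include <stdint.h>

struct owner_task;			/* opaque stand-in for the task structure */

#define OWNER_PENDING		1UL	/* bit 0: "Pending Owner" */
#define OWNER_HAS_WAITERS	2UL	/* bit 1: "Has Waiters" */
#define OWNER_FLAG_MASK		3UL

/* Combine a task pointer and flags into one word. */
static uintptr_t owner_pack(struct owner_task *task, unsigned long flags)
{
	return (uintptr_t)task | (flags & OWNER_FLAG_MASK);
}

/* Recover the task pointer by masking off the two flag bits. */
static struct owner_task *owner_to_task(uintptr_t owner)
{
	return (struct owner_task *)(owner & ~(uintptr_t)OWNER_FLAG_MASK);
}

/* Recover only the flag bits. */
static unsigned long owner_to_flags(uintptr_t owner)
{
	return (unsigned long)(owner & OWNER_FLAG_MASK);
}
```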
323 | |||
324 | |||
325 | cmpxchg Tricks | ||
326 | -------------- | ||
327 | |||
328 | Some architectures implement an atomic cmpxchg (Compare and Exchange). This | ||
329 | is used (when applicable) to keep the fast path of grabbing and releasing | ||
330 | mutexes short. | ||
331 | |||
332 | cmpxchg is basically the following function performed atomically: | ||
333 | |||
334 | unsigned long _cmpxchg(unsigned long *A, unsigned long *B, unsigned long *C) | ||
335 | { | ||
336 | unsigned long T = *A; | ||
337 | if (*A == *B) { | ||
338 | *A = *C; | ||
339 | } | ||
340 | return T; | ||
341 | } | ||
342 | #define cmpxchg(a,b,c) _cmpxchg(&a,&b,&c) | ||
343 | |||
344 | This is really nice to have, since it allows you to only update a variable | ||
345 | if the variable is what you expect it to be. You know if it succeeded if | ||
346 | the return value (the old value of A) is equal to B. | ||
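
For readers who want to experiment in userspace, the same semantics can be
expressed with C11 atomics. This is a sketch under that assumption and is
not how the kernel implements cmpxchg; the name cmpxchg_sketch is
hypothetical.

```c
#include <stdatomic.h>

/*
 * Atomically: if *ptr == old, store new_val. In both cases return the
 * value *ptr held before the operation, so the caller can compare the
 * result against old to learn whether the update happened.
 */
static unsigned long cmpxchg_sketch(_Atomic unsigned long *ptr,
				    unsigned long old, unsigned long new_val)
{
	unsigned long expected = old;

	/* On failure, C11 writes the current value of *ptr into expected. */
	atomic_compare_exchange_strong(ptr, &expected, new_val);
	return expected;
}
```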
347 | |||
348 | The macro rt_mutex_cmpxchg is used to try to lock and unlock mutexes. If | ||
349 | the architecture does not support CMPXCHG, then this macro is simply set | ||
350 | to fail every time. But if CMPXCHG is supported, then this will | ||
351 | help out extremely to keep the fast path short. | ||
352 | |||
353 | The use of rt_mutex_cmpxchg with the flags in the owner field helps optimize | ||
354 | the system for architectures that support it. This will also be explained | ||
355 | later in this document. | ||
356 | |||
357 | |||
358 | Priority adjustments | ||
359 | -------------------- | ||
360 | |||
361 | The implementation of the PI code in rtmutex.c has several places where a | ||
362 | process must adjust its priority. With the help of the pi_list of a | ||
363 | process it is rather easy to know what needs to be adjusted. | ||
364 | |||
365 | The functions implementing the task adjustments are rt_mutex_adjust_prio, | ||
366 | __rt_mutex_adjust_prio (same as the former, but expects the task pi_lock | ||
367 | to already be taken), rt_mutex_getprio, and rt_mutex_setprio. | ||
368 | |||
369 | rt_mutex_getprio and rt_mutex_setprio are only used in __rt_mutex_adjust_prio. | ||
370 | |||
371 | rt_mutex_getprio returns the priority that the task should have. Either the | ||
372 | task's own normal priority, or if a process of a higher priority is waiting on | ||
373 | a mutex owned by the task, then that higher priority should be returned. | ||
374 | Since the pi_list of a task holds a priority-ordered list of all the top | ||
375 | waiters of all the mutexes that the task owns, rt_mutex_getprio simply needs | ||
376 | to compare the top pi waiter to its own normal priority, and return the higher | ||
377 | priority back. | ||
378 | |||
379 | (Note: if looking at the code, you will notice that the lower number of | ||
380 | prio is returned. This is because the prio field in the task structure | ||
381 | is an inverse order of the actual priority. So a "prio" of 5 is | ||
382 | of higher priority than a "prio" of 10.) | ||
383 | |||
384 | __rt_mutex_adjust_prio examines the result of rt_mutex_getprio, and if the | ||
385 | result does not equal the task's current priority, then rt_mutex_setprio | ||
386 | is called to adjust the priority of the task to the new priority. | ||
387 | Note that rt_mutex_setprio is defined in kernel/sched.c to implement the | ||
388 | actual change in priority. | ||
389 | |||
390 | It is interesting to note that __rt_mutex_adjust_prio can either increase | ||
391 | or decrease the priority of the task. In the case that a higher priority | ||
392 | process has just blocked on a mutex owned by the task, __rt_mutex_adjust_prio | ||
393 | would increase/boost the task's priority. But if a higher priority task | ||
394 | were for some reason to leave the mutex (timeout or signal), this same function | ||
395 | would decrease/unboost the priority of the task. That is because the pi_list | ||
396 | always contains the highest priority task that is waiting on a mutex owned | ||
397 | by the task, so we only need to compare the priority of that top pi waiter | ||
398 | to the normal priority of the given task. | ||
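
The logic of the last few paragraphs can be condensed into a small model.
This is a sketch only: the structure and function names are hypothetical,
the top pi waiter is reduced to a single integer, and the plain assignment
stands in for the real rt_mutex_setprio call. As in the kernel, a lower
number means a higher priority.

```c
struct prio_task {
	int normal_prio;		/* the task's own priority */
	int prio;			/* the effective (possibly boosted) priority */
	int has_pi_waiters;		/* non-zero if the pi_list is not empty */
	int top_pi_waiter_prio;		/* valid only when has_pi_waiters is set */
};

/* Return the priority the task should have: the lower number wins. */
static int getprio_sketch(const struct prio_task *t)
{
	if (!t->has_pi_waiters)
		return t->normal_prio;
	return t->top_pi_waiter_prio < t->normal_prio ?
		t->top_pi_waiter_prio : t->normal_prio;
}

/* Boost or unboost: the same comparison covers both directions. */
static void adjust_prio_sketch(struct prio_task *t)
{
	int prio = getprio_sketch(t);

	if (t->prio != prio)
		t->prio = prio;		/* stand-in for rt_mutex_setprio */
}
```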
399 | |||
400 | |||
401 | High level overview of the PI chain walk | ||
402 | ---------------------------------------- | ||
403 | |||
404 | The PI chain walk is implemented by the function rt_mutex_adjust_prio_chain. | ||
405 | |||
406 | The implementation has gone through several iterations, and has ended up | ||
407 | with what we believe is the best. It walks the PI chain by only grabbing | ||
408 | at most two locks at a time, and is very efficient. | ||
409 | |||
410 | The rt_mutex_adjust_prio_chain can be used either to boost or lower process | ||
411 | priorities. | ||
412 | |||
413 | rt_mutex_adjust_prio_chain is called with a task to be checked for PI | ||
414 | (de)boosting (the owner of a mutex that a process is blocking on), a flag to | ||
415 | check for deadlocking, the mutex that the task owns, and a pointer to a waiter | ||
416 | that is the process's waiter struct that is blocked on the mutex (although this | ||
417 | parameter may be NULL for deboosting). | ||
418 | |||
419 | For this explanation, I will not mention deadlock detection. This explanation | ||
420 | will try to stay at a high level. | ||
421 | |||
422 | When this function is called, there are no locks held. That also means | ||
423 | that the state of the owner and lock can change once this function is entered. | ||
424 | |||
425 | Before this function is called, the task has already had rt_mutex_adjust_prio | ||
426 | performed on it. This means that the task is set to the priority that it | ||
427 | should be at, but the plist nodes of the task's waiter have not been updated | ||
428 | with the new priorities, and that this task may not be in the proper locations | ||
429 | in the pi_lists and wait_lists that the task is blocked on. This function | ||
430 | solves all that. | ||
431 | |||
432 | A loop is entered, where task is the owner to be checked for PI changes that | ||
433 | was passed by parameter (for the first iteration). The pi_lock of this task is | ||
434 | taken to prevent any more changes to the pi_list of the task. This also | ||
435 | prevents new tasks from completing the blocking on a mutex that is owned by this | ||
436 | task. | ||
437 | |||
438 | If the task is not blocked on a mutex then the loop is exited. We are at | ||
439 | the top of the PI chain. | ||
440 | |||
441 | A check is now done to see if the original waiter (the process that is blocked | ||
442 | on the current mutex) is the top pi waiter of the task. That is, is this | ||
443 | waiter on the top of the task's pi_list. If it is not, it either means that | ||
444 | there is another process higher in priority that is blocked on one of the | ||
445 | mutexes that the task owns, or that the waiter has just woken up via a signal | ||
446 | or timeout and has left the PI chain. In either case, the loop is exited, since | ||
447 | we don't need to do any more changes to the priority of the current task, or any | ||
448 | task that owns a mutex that this current task is waiting on. A priority chain | ||
449 | walk is only needed when a new top pi waiter is made to a task. | ||
450 | |||
451 | The next check sees if the task's waiter plist node has the priority equal to | ||
452 | the priority the task is set at. If they are equal, then we are done with | ||
453 | the loop. Remember that the function started with the priority of the | ||
454 | task adjusted, but the plist nodes that hold the task in other processes' | ||
455 | pi_lists have not been adjusted. | ||
456 | |||
457 | Next, we look at the mutex that the task is blocked on. The mutex's wait_lock | ||
458 | is taken. This is done by a spin_trylock, because the locking order of the | ||
459 | pi_lock and wait_lock goes in the opposite direction. If we fail to grab the | ||
460 | lock, the pi_lock is released, and we restart the loop. | ||
461 | |||
462 | Now that we have both the pi_lock of the task as well as the wait_lock of | ||
463 | the mutex the task is blocked on, we update the task's waiter's plist node | ||
464 | that is located on the mutex's wait_list. | ||
465 | |||
466 | Now we release the pi_lock of the task. | ||
467 | |||
468 | Next the owner of the mutex has its pi_lock taken, so we can update the | ||
469 | task's entry in the owner's pi_list. If the task is the highest priority | ||
470 | process on the mutex's wait_list, then we remove the previous top waiter | ||
471 | from the owner's pi_list, and replace it with the task. | ||
472 | |||
473 | Note: It is possible that the task was the current top waiter on the mutex, | ||
474 | in which case the task is not yet on the pi_list of the waiter. This | ||
475 | is OK, since plist_del does nothing if the plist node is not on any | ||
476 | list. | ||
477 | |||
478 | If the task was not the top waiter of the mutex, but it was before we | ||
479 | did the priority updates, that means we are deboosting/lowering the | ||
480 | task. In this case, the task is removed from the pi_list of the owner, | ||
481 | and the new top waiter is added. | ||
482 | |||
483 | Lastly, we unlock both the pi_lock of the task, as well as the mutex's | ||
484 | wait_lock, and continue the loop again. On the next iteration of the | ||
485 | loop, the previous owner of the mutex will be the task that will be | ||
486 | processed. | ||
487 | |||
488 | Note: One might think that the owner of this mutex might have changed | ||
489 | since we just grab the mutex's wait_lock. And one could be right. | ||
490 | The important thing to remember is that the owner could not have | ||
491 | become the task that is being processed in the PI chain, since | ||
492 | we have taken that task's pi_lock at the beginning of the loop. | ||
493 | So as long as there is an owner of this mutex that is not the same | ||
494 | process as the task being worked on, we are OK. | ||
495 | |||
496 | Looking closely at the code, one might be confused. The check for the | ||
497 | end of the PI chain is when the task isn't blocked on anything or the | ||
498 | task's waiter structure "task" element is NULL. This check is | ||
499 | protected only by the task's pi_lock. But the code to unlock the mutex | ||
500 | sets the task's waiter structure "task" element to NULL with only | ||
501 | the protection of the mutex's wait_lock, which was not taken yet. | ||
502 | Isn't this a race condition if the task becomes the new owner? | ||
503 | |||
504 | The answer is No! The trick is the spin_trylock of the mutex's | ||
505 | wait_lock. If we fail that lock, we release the pi_lock of the | ||
506 | task and continue the loop, doing the end of PI chain check again. | ||
507 | |||
508 | In the code to release the lock, the wait_lock of the mutex is held | ||
509 | the entire time, and it is not let go when we grab the pi_lock of the | ||
510 | new owner of the mutex. So if the switch of a new owner were to happen | ||
511 | after the check for end of the PI chain and the grabbing of the | ||
512 | wait_lock, the unlocking code would spin on the new owner's pi_lock | ||
513 | but never give up the wait_lock. So the PI chain loop is guaranteed to | ||
514 | fail the spin_trylock on the wait_lock, release the pi_lock, and | ||
515 | try again. | ||
516 | |||
517 | If you don't quite understand the above, that's OK. You don't have to, | ||
518 | unless you really want to make a proof out of it ;) | ||
519 | |||
520 | |||
521 | Pending Owners and Lock stealing | ||
522 | -------------------------------- | ||
523 | |||
524 | One of the flags in the owner field of the mutex structure is "Pending Owner". | ||
525 | What this means is that an owner was chosen by the process releasing the | ||
526 | mutex, but that owner has yet to wake up and actually take the mutex. | ||
527 | |||
528 | Why is this important? Why can't we just give the mutex to another process | ||
529 | and be done with it? | ||
530 | |||
531 | The PI code is to help with real-time processes, and to let the highest | ||
532 | priority process run as long as possible with minimal latency and delay. | ||
533 | If a high priority process owns a mutex that a lower priority process is | ||
534 | blocked on, when the mutex is released it would be given to the lower priority | ||
535 | process. What if the higher priority process wants to take that mutex again? | ||
536 | The high priority process would fail to take that mutex that it just gave up | ||
537 | and it would need to boost the lower priority process to run with full | ||
538 | latency of that critical section (since the low priority process just entered | ||
539 | it). | ||
540 | |||
541 | There's no reason a high priority process that gives up a mutex should be | ||
542 | penalized if it tries to take that mutex again. If the new owner of the | ||
543 | mutex has not woken up yet, there's no reason that the higher priority process | ||
544 | could not take that mutex away. | ||
545 | |||
546 | To solve this, we introduced Pending Ownership and Lock Stealing. When a | ||
547 | new process is given a mutex that it was blocked on, it is only given | ||
548 | pending ownership. This means that it's the new owner, unless a higher | ||
549 | priority process comes in and tries to grab that mutex. If a higher priority | ||
550 | process does come along and wants that mutex, we let the higher priority | ||
551 | process "steal" the mutex from the pending owner (only if it is still pending) | ||
552 | and continue with the mutex. | ||
553 | |||
554 | |||
555 | Taking of a mutex (The walk through) | ||
556 | ------------------------------------ | ||
557 | |||
558 | OK, now let's take a look at the detailed walk through of what happens when | ||
559 | taking a mutex. | ||
560 | |||
561 | The first thing that is tried is the fast taking of the mutex. This is | ||
562 | done when we have CMPXCHG enabled (otherwise the fast taking automatically | ||
563 | fails). Only when the owner field of the mutex is NULL can the lock be | ||
564 | taken with the CMPXCHG and nothing else needs to be done. | ||
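As a rough illustration of the fast path described above, here is a user-space sketch that uses C11 atomics in place of the kernel's CMPXCHG wrapper. The names sk_task, sk_mutex and sk_fast_lock are hypothetical and are not the kernel's rt_mutex API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the kernel's task_struct or rt_mutex. */
struct sk_task { int prio; };
struct sk_mutex { _Atomic uintptr_t owner; };	/* task pointer plus flag bits */

/*
 * Fast path: succeeds only when the owner field is exactly NULL (0),
 * that is, no owner and no flag bits set. Any other value means the
 * caller has to fall back to the slow path.
 */
static bool sk_fast_lock(struct sk_mutex *m, struct sk_task *self)
{
	uintptr_t expected = 0;

	return atomic_compare_exchange_strong(&m->owner, &expected,
					      (uintptr_t)self);
}
```

A failed sk_fast_lock() leaves the owner field untouched, which is what lets the slow path inspect it afterwards.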
565 | |||
566 | If there is contention on the lock, whether it is owned or has a pending owner, | ||
567 | we take the slow path (rt_mutex_slowlock). | ||
568 | |||
569 | The slow path function is where the task's waiter structure is created on | ||
570 | the stack. This is because the waiter structure is only needed for the | ||
571 | scope of this function. The waiter structure holds the nodes to store | ||
572 | the task on the wait_list of the mutex, and if need be, the pi_list of | ||
573 | the owner. | ||
574 | |||
575 | The wait_lock of the mutex is taken since the slow path of unlocking the | ||
576 | mutex also takes this lock. | ||
577 | |||
578 | We then call try_to_take_rt_mutex. This is where the architecture that | ||
579 | does not implement CMPXCHG would always grab the lock (if there's no | ||
580 | contention). | ||
581 | |||
582 | try_to_take_rt_mutex is used every time the task tries to grab a mutex in the | ||
583 | slow path. The first thing that is done here is an atomic setting of | ||
584 | the "Has Waiters" flag of the mutex's owner field. Yes, this could really | ||
585 | be false, because if the mutex has no owner, there are no waiters and | ||
586 | the current task also won't have any waiters. But we don't have the lock | ||
587 | yet, so we assume we are going to be a waiter. The reason for this is to | ||
588 | play nice for those architectures that do have CMPXCHG. By setting this flag | ||
589 | now, the owner of the mutex can't release the mutex without going into the | ||
590 | slow unlock path, and it would then need to grab the wait_lock, which this | ||
591 | code currently holds. So setting the "Has Waiters" flag forces the owner | ||
592 | to synchronize with this code. | ||
593 | |||
594 | Now that we know that we can't have any races with the owner releasing the | ||
595 | mutex, we check to see if we can take the ownership. This is done if the | ||
596 | mutex doesn't have an owner, or if we can steal the mutex from a pending | ||
597 | owner. Let's look at the situations we have here. | ||
598 | |||
599 | 1) Has owner that is pending | ||
600 | ---------------------------- | ||
601 | |||
602 | The mutex has an owner, but it hasn't woken up and the mutex flag | ||
603 | "Pending Owner" is set. The first check is to see if the owner isn't the | ||
604 | current task. This is because this function is also used for the pending | ||
605 | owner to grab the mutex. When a pending owner wakes up, it checks to see | ||
606 | if it can take the mutex, and this is done if the owner is already set to | ||
607 | itself. If so, we succeed and leave the function, clearing the "Pending | ||
608 | Owner" bit. | ||
609 | |||
610 | If the pending owner is not current, we check to see if the current priority is | ||
611 | higher than the pending owner. If not, we fail the function and return. | ||
612 | |||
613 | There's also something special about a pending owner. That is, a pending owner | ||
614 | is never blocked on a mutex. So there is no PI chain to worry about. It also | ||
615 | means that if the mutex doesn't have any waiters, there's no accounting needed | ||
616 | to update the pending owner's pi_list, since we only worry about processes | ||
617 | blocked on the current mutex. | ||
618 | |||
619 | If there are waiters on this mutex, and we just stole the ownership, we need | ||
620 | to take the top waiter, remove it from the pi_list of the pending owner, and | ||
621 | add it to the current pi_list. Note that at this moment, the pending owner | ||
622 | is no longer on the list of waiters. This is fine, since the pending owner | ||
623 | would add itself back when it realizes that it had the ownership stolen | ||
624 | from itself. When the pending owner tries to grab the mutex, it will fail | ||
625 | in try_to_take_rt_mutex if the owner field points to another process. | ||
626 | |||
627 | 2) No owner | ||
628 | ----------- | ||
629 | |||
630 | If there is no owner (or we successfully stole the lock), we set the owner | ||
631 | of the mutex to current, and set the flag of "Has Waiters" if the current | ||
632 | mutex actually has waiters, or we clear the flag if it doesn't. See, it was | ||
633 | OK that we set that flag early, since now it is cleared. | ||
634 | |||
635 | 3) Failed to grab ownership | ||
636 | --------------------------- | ||
637 | |||
638 | The most interesting case is when we fail to take ownership. This means that | ||
639 | there exists an owner, or there's a pending owner with equal or higher | ||
640 | priority than the current task. | ||
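The three situations above can be condensed into one decision function. This is only a sketch: sk_task, sk_mutex and sk_try_to_take are hypothetical names, a lower prio value is assumed to mean a higher priority, and the wait_lock, the "Has Waiters" flag and all pi_list accounting are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types; lower prio value = higher priority (assumption). */
struct sk_task { int prio; };
struct sk_mutex {
	struct sk_task *owner;	/* NULL when the mutex has no owner */
	bool pending;		/* models the "Pending Owner" flag */
};

static bool sk_try_to_take(struct sk_mutex *m, struct sk_task *cur)
{
	if (m->owner != NULL) {
		/* Case 3: a real (non-pending) owner holds the mutex. */
		if (!m->pending)
			return false;
		/* Case 3: pending owner has equal or higher priority. */
		if (m->owner != cur && cur->prio >= m->owner->prio)
			return false;
		/* Case 1: we are the pending owner, or we steal the mutex. */
	}
	/* Case 2, or a successful case 1: become the real owner. */
	m->owner = cur;
	m->pending = false;
	return true;
}
```

Note that a pending owner calling this function on its own mutex takes the same exit as a successful steal: the pending flag is cleared and the caller becomes the owner.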
641 | |||
642 | We'll continue on the failed case. | ||
643 | |||
644 | If the mutex has a timeout, we set up a timer to go off to break us out | ||
645 | of this mutex if we failed to get it after a specified amount of time. | ||
646 | |||
647 | Now we enter a loop that will continue to try to take ownership of the mutex, or | ||
648 | fail from a timeout or signal. | ||
649 | |||
650 | Once again we try to take the mutex. This will usually fail the first time | ||
651 | in the loop, since it had just failed to get the mutex. But the second time | ||
652 | in the loop, this would likely succeed, since the task would likely be | ||
653 | the pending owner. | ||
654 | |||
655 | If the mutex is TASK_INTERRUPTIBLE a check for signals and timeout is done | ||
656 | here. | ||
657 | |||
658 | The waiter structure has a "task" field that points to the task that is blocked | ||
659 | on the mutex. This field can be NULL the first time it goes through the loop | ||
660 | or if the task is a pending owner and had its mutex stolen. If the "task" | ||
661 | field is NULL then we need to set up the accounting for it. | ||
662 | |||
663 | Task blocks on mutex | ||
664 | -------------------- | ||
665 | |||
666 | The accounting of a mutex and process is done with the waiter structure of | ||
667 | the process. The "task" field is set to the process, and the "lock" field | ||
668 | to the mutex. The plist nodes are initialized to the process's current | ||
669 | priority. | ||
670 | |||
671 | Since the wait_lock was taken at the entry of the slow lock, we can safely | ||
672 | add the waiter to the wait_list. If the current process is the highest | ||
673 | priority process currently waiting on this mutex, then we remove the | ||
674 | previous top waiter process (if it exists) from the pi_list of the owner, | ||
675 | and add the current process to that list. Since the pi_list of the owner | ||
676 | has changed, we call rt_mutex_adjust_prio on the owner to see if the owner | ||
677 | should adjust its priority accordingly. | ||
678 | |||
679 | If the owner is also blocked on a lock, and had its pi_list changed | ||
680 | (or deadlock checking is on), we unlock the wait_lock of the mutex and go ahead | ||
681 | and run rt_mutex_adjust_prio_chain on the owner, as described earlier. | ||
682 | |||
683 | Now all locks are released, and if the current process is still blocked on a | ||
684 | mutex (waiter "task" field is not NULL), then we go to sleep (call schedule). | ||
685 | |||
686 | Waking up in the loop | ||
687 | --------------------- | ||
688 | |||
689 | The task can then wake up from schedule for a few reasons. | ||
690 | 1) we were given pending ownership of the mutex. | ||
691 | 2) we received a signal and were TASK_INTERRUPTIBLE | ||
692 | 3) we had a timeout and were TASK_INTERRUPTIBLE | ||
693 | |||
694 | In any of these cases, we continue the loop and once again try to grab the | ||
695 | ownership of the mutex. If we succeed, we exit the loop. Otherwise, on a signal | ||
696 | or timeout we exit the loop, and if we had the mutex stolen we simply add | ||
697 | ourselves back on the lists and go back to sleep. | ||
698 | |||
699 | Note: For various reasons, because of timeout and signals, the steal mutex | ||
700 | algorithm needs to be careful. This is because the current process is | ||
701 | still on the wait_list. And because of dynamic changing of priorities, | ||
702 | especially on SCHED_OTHER tasks, the current process can be the | ||
703 | highest priority task on the wait_list. | ||
704 | |||
705 | Failed to get mutex on Timeout or Signal | ||
706 | ---------------------------------------- | ||
707 | |||
708 | If a timeout or signal occurred, the waiter's "task" field would not be | ||
709 | NULL and the task needs to be taken off the wait_list of the mutex and perhaps | ||
710 | pi_list of the owner. If this process was a high priority process, then | ||
711 | the rt_mutex_adjust_prio_chain needs to be executed again on the owner, | ||
712 | but this time it will be lowering the priorities. | ||
713 | |||
714 | |||
715 | Unlocking the Mutex | ||
716 | ------------------- | ||
717 | |||
718 | The unlocking of a mutex also has a fast path for those architectures with | ||
719 | CMPXCHG. Since the taking of a mutex on contention always sets the | ||
720 | "Has Waiters" flag of the mutex's owner, we use this to know if we need to | ||
721 | take the slow path when unlocking the mutex. If the mutex doesn't have any | ||
722 | waiters, the owner field of the mutex would equal the current process and | ||
723 | the mutex can be unlocked by just replacing the owner field with NULL. | ||
724 | |||
725 | If the owner field has the "Has Waiters" bit set (or CMPXCHG is not available), | ||
726 | the slow unlock path is taken. | ||
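The effect of the "Has Waiters" bit on the unlock fast path can be sketched in user space with C11 atomics. The names SK_HAS_WAITERS and sk_fast_unlock are hypothetical, and the bit position (bit 1) is an assumption made for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_HAS_WAITERS	((uintptr_t)2)	/* assumed flag position: bit 1 */

/*
 * Fast unlock: replace the owner field with NULL (0), but only if it
 * holds exactly our own pointer. If a contending task has set the
 * "Has Waiters" bit, the comparison fails and the caller must take
 * the slow unlock path instead.
 */
static bool sk_fast_unlock(_Atomic uintptr_t *owner, void *self)
{
	uintptr_t expected = (uintptr_t)self;

	return atomic_compare_exchange_strong(owner, &expected, (uintptr_t)0);
}
```

Because the flag lives in the same word as the owner pointer, a single compare-and-exchange checks "I am the owner" and "nobody is waiting" at once.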
727 | |||
728 | The first thing done in the slow unlock path is to take the wait_lock of the | ||
729 | mutex. This synchronizes the locking and unlocking of the mutex. | ||
730 | |||
731 | A check is made to see if the mutex has waiters or not. On architectures that | ||
732 | do not have CMPXCHG, this is the location that the owner of the mutex will | ||
733 | determine if a waiter needs to be awoken or not. On architectures that | ||
734 | do have CMPXCHG, that check is done in the fast path, but it is still needed | ||
735 | in the slow path too. If a waiter of a mutex woke up because of a signal | ||
736 | or timeout between the time the owner failed the fast path CMPXCHG check and | ||
737 | the grabbing of the wait_lock, the mutex may not have any waiters, thus the | ||
738 | owner still needs to make this check. If there are no waiters then the mutex | ||
739 | owner field is set to NULL, the wait_lock is released and nothing more is | ||
740 | needed. | ||
741 | |||
742 | If there are waiters, then we need to wake one up and give that waiter | ||
743 | pending ownership. | ||
744 | |||
745 | In the wake up code, the pi_lock of the current owner is taken. The top | ||
746 | waiter of the lock is found and removed from the wait_list of the mutex | ||
747 | as well as the pi_list of the current owner. The task field of the new | ||
748 | pending owner's waiter structure is set to NULL, and the owner field of the | ||
749 | mutex is set to the new owner with the "Pending Owner" bit set, as well | ||
750 | as the "Has Waiters" bit if there still are other processes blocked on the | ||
751 | mutex. | ||
752 | |||
753 | The pi_lock of the previous owner is released, and the new pending owner's | ||
754 | pi_lock is taken. Remember that this is the trick to prevent the race | ||
755 | condition in rt_mutex_adjust_prio_chain from adding itself as a waiter | ||
756 | on the mutex. | ||
757 | |||
758 | We now clear the "pi_blocked_on" field of the new pending owner, and if | ||
759 | the mutex still has waiters pending, we add the new top waiter to the pi_list | ||
760 | of the pending owner. | ||
761 | |||
762 | Finally we unlock the pi_lock of the pending owner and wake it up. | ||
763 | |||
764 | |||
765 | Contact | ||
766 | ------- | ||
767 | |||
768 | For updates on this document, please email Steven Rostedt <rostedt@goodmis.org> | ||
769 | |||
770 | |||
771 | Credits | ||
772 | ------- | ||
773 | |||
774 | Author: Steven Rostedt <rostedt@goodmis.org> | ||
775 | |||
776 | Reviewers: Ingo Molnar, Thomas Gleixner, Thomas Duetsch, and Randy Dunlap | ||
777 | |||
778 | Updates | ||
779 | ------- | ||
780 | |||
781 | This document was originally written for 2.6.17-rc3-mm1 | ||
diff --git a/Documentation/rt-mutex.txt b/Documentation/rt-mutex.txt new file mode 100644 index 000000000000..243393d882ee --- /dev/null +++ b/Documentation/rt-mutex.txt | |||
@@ -0,0 +1,79 @@ | |||
1 | RT-mutex subsystem with PI support | ||
2 | ---------------------------------- | ||
3 | |||
4 | RT-mutexes with priority inheritance are used to support PI-futexes, | ||
5 | which enable pthread_mutex_t priority inheritance attributes | ||
6 | (PTHREAD_PRIO_INHERIT). [See Documentation/pi-futex.txt for more details | ||
7 | about PI-futexes.] | ||
8 | |||
9 | This technology was developed in the -rt tree and streamlined for | ||
10 | pthread_mutex support. | ||
11 | |||
12 | Basic principles: | ||
13 | ----------------- | ||
14 | |||
15 | RT-mutexes extend the semantics of simple mutexes by the priority | ||
16 | inheritance protocol. | ||
17 | |||
18 | A low priority owner of a rt-mutex inherits the priority of a higher | ||
19 | priority waiter until the rt-mutex is released. If the temporarily | ||
20 | boosted owner blocks on a rt-mutex itself it propagates the priority | ||
21 | boosting to the owner of the other rt_mutex it gets blocked on. The | ||
22 | priority boosting is immediately removed once the rt_mutex has been | ||
23 | unlocked. | ||
24 | |||
25 | This approach allows us to shorten the blocking of high-prio tasks on | ||
26 | mutexes which protect shared resources. Priority inheritance is not a | ||
27 | magic bullet for poorly designed applications, but it allows | ||
28 | well-designed applications to use userspace locks in critical parts of | ||
29 | a high priority thread, without losing determinism. | ||
30 | |||
31 | The enqueueing of the waiters into the rtmutex waiter list is done in | ||
32 | priority order. For same priorities FIFO order is chosen. For each | ||
33 | rtmutex, only the top priority waiter is enqueued into the owner's | ||
34 | priority waiters list. This list too queues in priority order. Whenever | ||
35 | the top priority waiter of a task changes (for example it timed out or | ||
36 | got a signal), the priority of the owner task is readjusted. [The | ||
37 | priority enqueueing is handled by "plists", see include/linux/plist.h | ||
38 | for more details.] | ||
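The queueing rule above (priority order, FIFO among equal priorities) can be shown with a plain singly linked list. This is not the plist API from include/linux/plist.h; sk_waiter and sk_enqueue are hypothetical names, and a lower prio value is assumed to mean a higher priority.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical waiter node; lower prio value = higher priority (assumption). */
struct sk_waiter {
	int prio;
	struct sk_waiter *next;
};

/*
 * Insert w so that the list stays sorted by priority. Walking past
 * every entry of equal priority places w behind earlier arrivals,
 * which gives FIFO order among waiters of the same priority.
 */
static void sk_enqueue(struct sk_waiter **head, struct sk_waiter *w)
{
	struct sk_waiter **pp = head;

	while (*pp != NULL && (*pp)->prio <= w->prio)
		pp = &(*pp)->next;
	w->next = *pp;
	*pp = w;
}
```

With this ordering the top priority waiter is always the list head, which is the only entry that would need to be mirrored into the owner's own list.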
39 | |||
40 | RT-mutexes are optimized for fastpath operations and have no internal | ||
41 | locking overhead when locking an uncontended mutex or unlocking a mutex | ||
42 | without waiters. The optimized fastpath operations require cmpxchg | ||
43 | support. [If that is not available then the rt-mutex internal spinlock | ||
44 | is used] | ||
45 | |||
46 | The state of the rt-mutex is tracked via the owner field of the rt-mutex | ||
47 | structure: | ||
48 | |||
49 | rt_mutex->owner holds the task_struct pointer of the owner. Bit 0 and 1 | ||
50 | are used to keep track of the "owner is pending" and "rtmutex has | ||
51 | waiters" state. | ||
52 | |||
53 | owner bit1 bit0 | ||
54 | NULL 0 0 mutex is free (fast acquire possible) | ||
55 | NULL 0 1 invalid state | ||
56 | NULL 1 0 Transitional state* | ||
57 | NULL 1 1 invalid state | ||
58 | taskpointer 0 0 mutex is held (fast release possible) | ||
59 | taskpointer 0 1 task is pending owner | ||
60 | taskpointer 1 0 mutex is held and has waiters | ||
61 | taskpointer 1 1 task is pending owner and mutex has waiters | ||
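The table above can be read as a decoder for the owner word. The sketch below assumes bit 0 is the "owner is pending" bit and bit 1 the "has waiters" bit, as the table columns suggest; SK_PENDING, SK_WAITERS and sk_state are hypothetical names, not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SK_PENDING	((uintptr_t)1)	/* bit 0: owner is pending */
#define SK_WAITERS	((uintptr_t)2)	/* bit 1: mutex has waiters */
#define SK_FLAGS	(SK_PENDING | SK_WAITERS)

/* Map an owner word to the state named in the table. */
static const char *sk_state(uintptr_t owner)
{
	int has_task = (owner & ~SK_FLAGS) != 0;
	int pending = (owner & SK_PENDING) != 0;
	int waiters = (owner & SK_WAITERS) != 0;

	if (!has_task) {
		if (pending)
			return "invalid state";
		return waiters ? "transitional state" : "mutex is free";
	}
	if (pending)
		return waiters ? "task is pending owner and mutex has waiters"
			       : "task is pending owner";
	return waiters ? "mutex is held and has waiters" : "mutex is held";
}
```

Storing the flags in the low bits works because task structures are aligned, so the two lowest bits of a valid task pointer are always zero.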
62 | |||
63 | Pending-ownership handling is a performance optimization: | ||
64 | pending-ownership is assigned to the first (highest priority) waiter of | ||
65 | the mutex, when the mutex is released. The thread is woken up and once | ||
66 | it starts executing it can acquire the mutex. Until the mutex is taken | ||
67 | by it (bit 0 is cleared) a competing higher priority thread can "steal" | ||
68 | the mutex which puts the woken up thread back on the waiters list. | ||
69 | |||
70 | The pending-ownership optimization is especially important for the | ||
71 | uninterrupted workflow of high-prio tasks which repeatedly | ||
72 | takes/releases locks that have lower-prio waiters. Without this | ||
73 | optimization the higher-prio thread would ping-pong to the lower-prio | ||
74 | task [because at unlock time we always assign a new owner]. | ||
75 | |||
76 | (*) The "mutex has waiters" bit gets set to take the lock. If the lock | ||
77 | doesn't already have an owner, this bit is quickly cleared if there are | ||
78 | no waiters. So this is a transitional state to synchronize with looking | ||
79 | at the owner field of the mutex and the mutex owner releasing the lock. | ||
diff --git a/Documentation/scsi/ppa.txt b/Documentation/scsi/ppa.txt index 0dac88d86d87..5d9223bc1bd5 100644 --- a/Documentation/scsi/ppa.txt +++ b/Documentation/scsi/ppa.txt | |||
@@ -12,5 +12,3 @@ http://www.torque.net/parport/ | |||
12 | Email list for Linux Parport | 12 | Email list for Linux Parport |
13 | linux-parport@torque.net | 13 | linux-parport@torque.net |
14 | 14 | ||
15 | Email for problems with ZIP or ZIP Plus drivers | ||
16 | campbell@torque.net | ||
diff --git a/Documentation/sound/alsa/ALSA-Configuration.txt b/Documentation/sound/alsa/ALSA-Configuration.txt index 87d76a5c73d0..f61af23dd85d 100644 --- a/Documentation/sound/alsa/ALSA-Configuration.txt +++ b/Documentation/sound/alsa/ALSA-Configuration.txt | |||
@@ -472,6 +472,22 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
472 | 472 | ||
473 | The power-management is supported. | 473 | The power-management is supported. |
474 | 474 | ||
475 | Module snd-darla20 | ||
476 | ------------------ | ||
477 | |||
478 | Module for Echoaudio Darla20 | ||
479 | |||
480 | This module supports multiple cards. | ||
481 | The driver requires the firmware loader support on kernel. | ||
482 | |||
483 | Module snd-darla24 | ||
484 | ------------------ | ||
485 | |||
486 | Module for Echoaudio Darla24 | ||
487 | |||
488 | This module supports multiple cards. | ||
489 | The driver requires the firmware loader support on kernel. | ||
490 | |||
475 | Module snd-dt019x | 491 | Module snd-dt019x |
476 | ----------------- | 492 | ----------------- |
477 | 493 | ||
@@ -499,6 +515,14 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
499 | 515 | ||
500 | The power-management is supported. | 516 | The power-management is supported. |
501 | 517 | ||
518 | Module snd-echo3g | ||
519 | ----------------- | ||
520 | |||
521 | Module for Echoaudio 3G cards (Gina3G/Layla3G) | ||
522 | |||
523 | This module supports multiple cards. | ||
524 | The driver requires the firmware loader support on kernel. | ||
525 | |||
502 | Module snd-emu10k1 | 526 | Module snd-emu10k1 |
503 | ------------------ | 527 | ------------------ |
504 | 528 | ||
@@ -657,6 +681,22 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
657 | 681 | ||
658 | The power-management is supported. | 682 | The power-management is supported. |
659 | 683 | ||
684 | Module snd-gina20 | ||
685 | ----------------- | ||
686 | |||
687 | Module for Echoaudio Gina20 | ||
688 | |||
689 | This module supports multiple cards. | ||
690 | The driver requires the firmware loader support on kernel. | ||
691 | |||
692 | Module snd-gina24 | ||
693 | ----------------- | ||
694 | |||
695 | Module for Echoaudio Gina24 | ||
696 | |||
697 | This module supports multiple cards. | ||
698 | The driver requires the firmware loader support on kernel. | ||
699 | |||
660 | Module snd-gusclassic | 700 | Module snd-gusclassic |
661 | --------------------- | 701 | --------------------- |
662 | 702 | ||
@@ -760,12 +800,18 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
760 | basic fixed pin assignment w/o SPDIF | 800 | basic fixed pin assignment w/o SPDIF |
761 | auto auto-config reading BIOS (default) | 801 | auto auto-config reading BIOS (default) |
762 | 802 | ||
763 | ALC882/883/885 | 803 | ALC882/885 |
764 | 3stack-dig 3-jack with SPDIF I/O | 804 | 3stack-dig 3-jack with SPDIF I/O |
765 | 6stck-dig 6-jack digital with SPDIF I/O | 805 | 6stck-dig 6-jack digital with SPDIF I/O |
766 | auto auto-config reading BIOS (default) | 806 | auto auto-config reading BIOS (default) |
767 | 807 | ||
768 | ALC861 | 808 | ALC883/888 |
809 | 3stack-dig 3-jack with SPDIF I/O | ||
810 | 6stack-dig 6-jack digital with SPDIF I/O | ||
811 | 6stack-dig-demo 6-stack digital for Intel demo board | ||
812 | auto auto-config reading BIOS (default) | ||
813 | |||
814 | ALC861/660 | ||
769 | 3stack 3-jack | 815 | 3stack 3-jack |
770 | 3stack-dig 3-jack with SPDIF I/O | 816 | 3stack-dig 3-jack with SPDIF I/O |
771 | 6stack-dig 6-jack with SPDIF I/O | 817 | 6stack-dig 6-jack with SPDIF I/O |
@@ -937,6 +983,30 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
937 | driver isn't configured properly or you want to try another | 983 | driver isn't configured properly or you want to try another |
938 | type for testing. | 984 | type for testing. |
939 | 985 | ||
986 | Module snd-indigo | ||
987 | ----------------- | ||
988 | |||
989 | Module for Echoaudio Indigo | ||
990 | |||
991 | This module supports multiple cards. | ||
992 | The driver requires the firmware loader support on kernel. | ||
993 | |||
994 | Module snd-indigodj | ||
995 | ------------------- | ||
996 | |||
997 | Module for Echoaudio Indigo DJ | ||
998 | |||
999 | This module supports multiple cards. | ||
1000 | The driver requires the firmware loader support on kernel. | ||
1001 | |||
1002 | Module snd-indigoio | ||
1003 | ------------------- | ||
1004 | |||
1005 | Module for Echoaudio Indigo IO | ||
1006 | |||
1007 | This module supports multiple cards. | ||
1008 | The driver requires the firmware loader support on kernel. | ||
1009 | |||
940 | Module snd-intel8x0 | 1010 | Module snd-intel8x0 |
941 | ------------------- | 1011 | ------------------- |
942 | 1012 | ||
@@ -1036,6 +1106,22 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
1036 | 1106 | ||
1037 | This module supports multiple cards. | 1107 | This module supports multiple cards. |
1038 | 1108 | ||
1109 | Module snd-layla20 | ||
1110 | ------------------ | ||
1111 | |||
1112 | Module for Echoaudio Layla20 | ||
1113 | |||
1114 | This module supports multiple cards. | ||
1115 | The driver requires the firmware loader support on kernel. | ||
1116 | |||
1117 | Module snd-layla24 | ||
1118 | ------------------ | ||
1119 | |||
1120 | Module for Echoaudio Layla24 | ||
1121 | |||
1122 | This module supports multiple cards. | ||
1123 | The driver requires the firmware loader support on kernel. | ||
1124 | |||
1039 | Module snd-maestro3 | 1125 | Module snd-maestro3 |
1040 | ------------------- | 1126 | ------------------- |
1041 | 1127 | ||
@@ -1056,6 +1142,14 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
1056 | 1142 | ||
1057 | The power-management is supported. | 1143 | The power-management is supported. |
1058 | 1144 | ||
1145 | Module snd-mia | ||
1146 | --------------- | ||
1147 | |||
1148 | Module for Echoaudio Mia | ||
1149 | |||
1150 | This module supports multiple cards. | ||
1151 | The driver requires the firmware loader support on kernel. | ||
1152 | |||
1059 | Module snd-miro | 1153 | Module snd-miro |
1060 | --------------- | 1154 | --------------- |
1061 | 1155 | ||
@@ -1088,6 +1182,14 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed. | |||
1088 | When no hotplug fw loader is available, you need to load the | 1182 | When no hotplug fw loader is available, you need to load the |
1089 | firmware via mixartloader utility in alsa-tools package. | 1183 | firmware via mixartloader utility in alsa-tools package. |
1090 | 1184 | ||
1185 | Module snd-mona | ||
1186 | --------------- | ||
1187 | |||
1188 | Module for Echoaudio Mona | ||
1189 | |||
1190 | This module supports multiple cards. | ||
1191 | The driver requires the firmware loader support on kernel. | ||
1192 | |||
1091 | Module snd-mpu401 | 1193 | Module snd-mpu401 |
1092 | ----------------- | 1194 | ----------------- |
1093 | 1195 | ||
diff --git a/Documentation/video4linux/README.pvrusb2 b/Documentation/video4linux/README.pvrusb2 new file mode 100644 index 000000000000..c73a32c34528 --- /dev/null +++ b/Documentation/video4linux/README.pvrusb2 | |||
@@ -0,0 +1,212 @@ | |||
1 | |||
2 | $Id$ | ||
3 | Mike Isely <isely@pobox.com> | ||
4 | |||
5 | pvrusb2 driver | ||
6 | |||
7 | Background: | ||
8 | |||
9 | This driver is intended for the "Hauppauge WinTV PVR USB 2.0", which | ||
10 | is a USB 2.0 hosted TV Tuner. This driver is a work in progress. | ||
11 | Its history started with the reverse-engineering effort by Björn | ||
12 | Danielsson <pvrusb2@dax.nu> whose web page can be found here: | ||
13 | |||
14 | http://pvrusb2.dax.nu/ | ||
15 | |||
16 | From there Aurelien Alleaume <slts@free.fr> began an effort to | ||
17 | create a video4linux compatible driver. I began with Aurelien's | ||
18 | last known snapshot and evolved the driver to the state it is in | ||
19 | here. | ||
20 | |||
21 | More information on this driver can be found at: | ||
22 | |||
23 | http://www.isely.net/pvrusb2.html | ||
24 | |||
25 | |||
26 | This driver has a strong separation of layers. They are very | ||
27 | roughly: | ||
28 | |||
29 | 1a. Low level wire-protocol implementation with the device. | ||
30 | |||
31 | 1b. I2C adaptor implementation and corresponding I2C client drivers | ||
32 | implemented elsewhere in V4L. | ||
33 | |||
34 | 1c. High level hardware driver implementation which coordinates all | ||
35 | activities that ensure correct operation of the device. | ||
36 | |||
37 | 2. A "context" layer which manages instancing of driver, setup, | ||
38 | tear-down, arbitration, and interaction with high level | ||
39 | interfaces appropriately as devices are hotplugged in the | ||
40 | system. | ||
41 | |||
42 | 3. High level interfaces which glue the driver to various published | ||
43 | Linux APIs (V4L, sysfs, maybe DVB in the future). | ||
44 | |||
45 | The most important shearing layer is between the top 2 layers. A | ||
46 | lot of work went into the driver to ensure that any kind of | ||
47 | conceivable API can be laid on top of the core driver. (Yes, the | ||
48 | driver internally leverages V4L to do its work but that really has | ||
49 | nothing to do with the API published by the driver to the outside | ||
50 | world.) The architecture allows for different APIs to | ||
51 | simultaneously access the driver. I have a strong sense of fairness | ||
52 | about APIs and also feel that it is a good design principle to keep | ||
53 | implementation and interface isolated from each other. Thus while | ||
54 | right now the V4L high level interface is the most complete, the | ||
55 | sysfs high level interface will work equally well for similar | ||
56 | functions, and there's no reason I see right now why it shouldn't be | ||
57 | possible to produce a DVB high level interface that can sit right | ||
58 | alongside V4L. | ||
59 | |||
60 | NOTE: Complete documentation on the pvrusb2 driver is contained in | ||
61 | the html files within the doc directory; these are exactly the same | ||
62 | as what is on the web site at the time. Browse those files | ||
63 | (especially the FAQ) before asking questions. | ||
64 | |||
65 | |||
66 | Building | ||
67 | |||
68 | Building these modules essentially amounts to just running "make", | ||
69 | but you need the kernel source tree nearby and you will likely also | ||
70 | want to set a few controlling environment variables first in order | ||
71 | to link things up with that source tree. Please see the Makefile | ||
72 | here for comments that explain how to do that. | ||
73 | |||
74 | |||
75 | Source file list / functional overview: | ||
76 | |||
77 | (Note: The term "module" used below generally refers to loosely | ||
78 | defined functional units within the pvrusb2 driver and bears no | ||
79 | relation to the Linux kernel's concept of a loadable module.) | ||
80 | |||
81 | pvrusb2-audio.[ch] - This is glue logic that resides between this | ||
82 | driver and the msp3400.ko I2C client driver (which is found | ||
83 | elsewhere in V4L). | ||
84 | |||
85 | pvrusb2-context.[ch] - This module implements the context for an | ||
86 | instance of the driver. Everything else eventually ties back to | ||
87 | or is otherwise instanced within the data structures implemented | ||
88 | here. Hotplugging is ultimately coordinated here. All high level | ||
89 | interfaces tie into the driver through this module. This module | ||
90 | helps arbitrate each interface's access to the actual driver core, | ||
91 | and is designed to allow concurrent access through multiple | ||
92 | instances of multiple interfaces (thus you can for example change | ||
93 | the tuner's frequency through sysfs while simultaneously streaming | ||
94 | video through V4L out to an instance of mplayer). | ||
95 | |||
96 | pvrusb2-debug.h - This header defines a printk() wrapper and a mask | ||
97 | of debugging bit definitions for the various kinds of debug | ||
98 | messages that can be enabled within the driver. | ||
99 | |||
100 | pvrusb2-debugifc.[ch] - This module implements a crude command line | ||
101 | oriented debug interface into the driver. Aside from being part | ||
102 | of the process for implementing manual firmware extraction (see | ||
103 | the pvrusb2 web site mentioned earlier), probably I'm the only one | ||
104 | who has ever used this. It is mainly a debugging aid. | ||
105 | |||
106 | pvrusb2-eeprom.[ch] - This is glue logic that resides between this | ||
107 | driver and the tveeprom.ko module, which is itself implemented | ||
108 | elsewhere in V4L. | ||
109 | |||
110 | pvrusb2-encoder.[ch] - This module implements all protocol needed to | ||
111 | interact with the Conexant mpeg2 encoder chip within the pvrusb2 | ||
112 | device. It is a crude echo of corresponding logic in ivtv, | ||
113 | however the design goals (strict isolation) and physical layer | ||
114 | (proxy through USB instead of PCI) are different enough that this | ||
115 | implementation had to be completely different. | ||
116 | |||
117 | pvrusb2-hdw-internal.h - This header defines the core data structure | ||
118 | in the driver used to track ALL internal state related to control | ||
119 | of the hardware. Nobody outside of the core hardware-handling | ||
120 | modules should have any business using this header. All external | ||
121 | access to the driver should be through one of the high level | ||
122 | interfaces (e.g. V4L, sysfs, etc), and in fact even those high | ||
123 | level interfaces are restricted to the API defined in | ||
124 | pvrusb2-hdw.h and NOT this header. | ||
125 | |||
126 | pvrusb2-hdw.h - This header defines the full internal API for | ||
127 | controlling the hardware. High level interfaces (e.g. V4L, sysfs) | ||
128 | will work through here. | ||
129 | |||
130 | pvrusb2-hdw.c - This module implements all the various bits of logic | ||
131 | that handle overall control of a specific pvrusb2 device. | ||
132 | (Policy, instantiation, and arbitration of pvrusb2 devices fall | ||
133 | within the jurisdiction of pvrusb2-context, not here). | ||
134 | |||
135 | pvrusb2-i2c-chips-*.c - These modules implement the glue logic to | ||
136 | tie together and configure various I2C modules as they attach to | ||
137 | the I2C bus. There are two versions of this file. The "v4l2" | ||
138 | version is intended to be used in-tree alongside V4L, where we | ||
139 | implement just the logic that makes sense for a pure V4L | ||
140 | environment. The "all" version is intended for use outside of | ||
141 | V4L, where we might encounter other possibly "challenging" modules | ||
142 | from ivtv or older kernel snapshots (or even the support modules | ||
143 | in the standalone snapshot). | ||
144 | |||
145 | pvrusb2-i2c-cmd-v4l1.[ch] - This module implements generic V4L1 | ||
146 | compatible commands to the I2C modules. It is here where state | ||
147 | changes inside the pvrusb2 driver are translated into V4L1 | ||
148 | commands that are in turn sent to the various I2C modules. | ||
149 | |||
150 | pvrusb2-i2c-cmd-v4l2.[ch] - This module implements generic V4L2 | ||
151 | compatible commands to the I2C modules. It is here where state | ||
152 | changes inside the pvrusb2 driver are translated into V4L2 | ||
153 | commands that are in turn sent to the various I2C modules. | ||
154 | |||
155 | pvrusb2-i2c-core.[ch] - This module provides an implementation of a | ||
156 | kernel-friendly I2C adaptor driver, through which other external | ||
157 | I2C client drivers (e.g. msp3400, tuner, lirc) may connect and | ||
158 | operate corresponding chips within the pvrusb2 device. It is | ||
159 | through here that other V4L modules can reach into this driver to | ||
160 | operate specific pieces (and those modules are in turn driven by | ||
161 | glue logic which is coordinated by pvrusb2-hdw, doled out by | ||
162 | pvrusb2-context, and then ultimately made available to users | ||
163 | through one of the high level interfaces). | ||
164 | |||
165 | pvrusb2-io.[ch] - This module implements a very low level ring of | ||
166 | transfer buffers, required in order to stream data from the | ||
167 | device. This module is *very* low level. It only operates the | ||
168 | buffers and makes no attempt to define any policy or mechanism for | ||
169 | how such buffers might be used. | ||
170 | |||
171 | pvrusb2-ioread.[ch] - This module layers on top of pvrusb2-io.[ch] | ||
172 | to provide a streaming API usable by a read() system call style of | ||
173 | I/O. Right now this is the only layer on top of pvrusb2-io.[ch], | ||
174 | however the underlying architecture here was intended to allow for | ||
175 | other styles of I/O to be implemented with additional modules, like | ||
176 | mmap()'ed buffers or something even more exotic. | ||
177 | |||
178 | pvrusb2-main.c - This is the top level of the driver. Module level | ||
179 | and USB core entry points are here. This is our "main". | ||
180 | |||
181 | pvrusb2-sysfs.[ch] - This is the high level interface which ties the | ||
182 | pvrusb2 driver into sysfs. Through this interface you can do | ||
183 | everything with the driver except actually stream data. | ||
184 | |||
185 | pvrusb2-tuner.[ch] - This is glue logic that resides between this | ||
186 | driver and the tuner.ko I2C client driver (which is found | ||
187 | elsewhere in V4L). | ||
188 | |||
189 | pvrusb2-util.h - This header defines some common macros used | ||
190 | throughout the driver. These macros are not really specific to | ||
191 | the driver, but they had to go somewhere. | ||
192 | |||
193 | pvrusb2-v4l2.[ch] - This is the high level interface which ties the | ||
194 | pvrusb2 driver into video4linux. It is through here that V4L | ||
195 | applications can open and operate the driver in the usual V4L | ||
196 | ways. Note that **ALL** V4L functionality is published only | ||
197 | through here and nowhere else. | ||
198 | |||
199 | pvrusb2-video-*.[ch] - This is glue logic that resides between this | ||
200 | driver and the saa711x.ko I2C client driver (which is found | ||
201 | elsewhere in V4L). Note that saa711x.ko used to be known as | ||
202 | saa7115.ko in ivtv. There are two versions of this; one is | ||
203 | selected depending on the particular saa711[5x].ko that is found. | ||
204 | |||
205 | pvrusb2.h - This header contains compile time tunable parameters | ||
206 | (and at the moment the driver has very little that needs to be | ||
207 | tuned). | ||
208 | |||
209 | |||
210 | -Mike Isely | ||
211 | isely@pobox.com | ||
212 | |||
diff --git a/Documentation/watchdog/pcwd-watchdog.txt b/Documentation/watchdog/pcwd-watchdog.txt index 12187a33e310..d9ee6336c1d4 100644 --- a/Documentation/watchdog/pcwd-watchdog.txt +++ b/Documentation/watchdog/pcwd-watchdog.txt | |||
@@ -22,78 +22,9 @@ | |||
22 | to run the program with an "&" to run it in the background!) | 22 | to run the program with an "&" to run it in the background!) |
23 | 23 | ||
24 | If you want to write a program to be compatible with the PC Watchdog | 24 | If you want to write a program to be compatible with the PC Watchdog |
25 | driver, simply do the following: | 25 | driver, simply use or modify the watchdog test program: |
26 | 26 | Documentation/watchdog/src/watchdog-test.c | |
27 | -- Snippet of code -- | 27 | |
28 | /* | ||
29 | * Watchdog Driver Test Program | ||
30 | */ | ||
31 | |||
32 | #include <stdio.h> | ||
33 | #include <stdlib.h> | ||
34 | #include <string.h> | ||
35 | #include <unistd.h> | ||
36 | #include <fcntl.h> | ||
37 | #include <sys/ioctl.h> | ||
38 | #include <linux/types.h> | ||
39 | #include <linux/watchdog.h> | ||
40 | |||
41 | int fd; | ||
42 | |||
43 | /* | ||
44 | * This function simply sends an IOCTL to the driver, which in turn ticks | ||
45 | * the PC Watchdog card to reset its internal timer so it doesn't trigger | ||
46 | * a computer reset. | ||
47 | */ | ||
48 | void keep_alive(void) | ||
49 | { | ||
50 | int dummy; | ||
51 | |||
52 | ioctl(fd, WDIOC_KEEPALIVE, &dummy); | ||
53 | } | ||
54 | |||
55 | /* | ||
56 | * The main program. Run the program with "-d" to disable the card, | ||
57 | * or "-e" to enable the card. | ||
58 | */ | ||
59 | int main(int argc, char *argv[]) | ||
60 | { | ||
61 | fd = open("/dev/watchdog", O_WRONLY); | ||
62 | |||
63 | if (fd == -1) { | ||
64 | fprintf(stderr, "Watchdog device not enabled.\n"); | ||
65 | fflush(stderr); | ||
66 | exit(-1); | ||
67 | } | ||
68 | |||
69 | if (argc > 1) { | ||
70 | if (!strncasecmp(argv[1], "-d", 2)) { | ||
71 | ioctl(fd, WDIOC_SETOPTIONS, WDIOS_DISABLECARD); | ||
72 | fprintf(stderr, "Watchdog card disabled.\n"); | ||
73 | fflush(stderr); | ||
74 | exit(0); | ||
75 | } else if (!strncasecmp(argv[1], "-e", 2)) { | ||
76 | ioctl(fd, WDIOC_SETOPTIONS, WDIOS_ENABLECARD); | ||
77 | fprintf(stderr, "Watchdog card enabled.\n"); | ||
78 | fflush(stderr); | ||
79 | exit(0); | ||
80 | } else { | ||
81 | fprintf(stderr, "-d to disable, -e to enable.\n"); | ||
82 | fprintf(stderr, "run by itself to tick the card.\n"); | ||
83 | fflush(stderr); | ||
84 | exit(0); | ||
85 | } | ||
86 | } else { | ||
87 | fprintf(stderr, "Watchdog Ticking Away!\n"); | ||
88 | fflush(stderr); | ||
89 | } | ||
90 | |||
91 | while(1) { | ||
92 | keep_alive(); | ||
93 | sleep(1); | ||
94 | } | ||
95 | } | ||
96 | -- End snippet -- | ||
97 | 28 | ||
98 | Other IOCTL functions include: | 29 | Other IOCTL functions include: |
99 | 30 | ||
diff --git a/Documentation/watchdog/src/watchdog-simple.c b/Documentation/watchdog/src/watchdog-simple.c new file mode 100644 index 000000000000..85cf17c48669 --- /dev/null +++ b/Documentation/watchdog/src/watchdog-simple.c | |||
@@ -0,0 +1,15 @@ | |||
1 | #include <stdlib.h> | ||
2 | #include <fcntl.h> | ||
3 | |||
4 | int main(int argc, const char *argv[]) { | ||
5 | int fd = open("/dev/watchdog", O_WRONLY); | ||
6 | if (fd == -1) { | ||
7 | perror("watchdog"); | ||
8 | exit(1); | ||
9 | } | ||
10 | while (1) { | ||
11 | write(fd, "\0", 1); | ||
12 | fsync(fd); | ||
13 | sleep(10); | ||
14 | } | ||
15 | } | ||
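Review note: the write-based ping in watchdog-simple.c can be sketched as a small helper. This is only an illustration, not part of the patch: the name ping_watchdog is hypothetical, and the sketch pulls in <stdio.h> and <unistd.h>, which declare perror() and write(), and checks the return value of write().

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not in the patch): ping the watchdog by writing
 * one byte to an already-open descriptor, as watchdog-simple.c does.
 * Returns 0 on success, -1 if the write failed.
 */
int ping_watchdog(int fd)
{
	if (write(fd, "\0", 1) != 1) {
		perror("watchdog ping");
		return -1;
	}
	return 0;
}
```

A daemon would call this in its loop with the descriptor returned by open("/dev/watchdog", O_WRONLY) and decide for itself how to react to a failed ping.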
diff --git a/Documentation/watchdog/src/watchdog-test.c b/Documentation/watchdog/src/watchdog-test.c new file mode 100644 index 000000000000..65f6c19cb865 --- /dev/null +++ b/Documentation/watchdog/src/watchdog-test.c | |||
@@ -0,0 +1,68 @@ | |||
1 | /* | ||
2 | * Watchdog Driver Test Program | ||
3 | */ | ||
4 | |||
5 | #include <stdio.h> | ||
6 | #include <stdlib.h> | ||
7 | #include <string.h> | ||
8 | #include <unistd.h> | ||
9 | #include <fcntl.h> | ||
10 | #include <sys/ioctl.h> | ||
11 | #include <linux/types.h> | ||
12 | #include <linux/watchdog.h> | ||
13 | |||
14 | int fd; | ||
15 | |||
16 | /* | ||
17 | * This function simply sends an IOCTL to the driver, which in turn ticks | ||
18 | * the PC Watchdog card to reset its internal timer so it doesn't trigger | ||
19 | * a computer reset. | ||
20 | */ | ||
21 | void keep_alive(void) | ||
22 | { | ||
23 | int dummy; | ||
24 | |||
25 | ioctl(fd, WDIOC_KEEPALIVE, &dummy); | ||
26 | } | ||
27 | |||
28 | /* | ||
29 | * The main program. Run the program with "-d" to disable the card, | ||
30 | * or "-e" to enable the card. | ||
31 | */ | ||
32 | int main(int argc, char *argv[]) | ||
33 | { | ||
34 | fd = open("/dev/watchdog", O_WRONLY); | ||
35 | |||
36 | if (fd == -1) { | ||
37 | fprintf(stderr, "Watchdog device not enabled.\n"); | ||
38 | fflush(stderr); | ||
39 | exit(-1); | ||
40 | } | ||
41 | |||
42 | if (argc > 1) { | ||
43 | if (!strncasecmp(argv[1], "-d", 2)) { | ||
44 | ioctl(fd, WDIOC_SETOPTIONS, WDIOS_DISABLECARD); | ||
45 | fprintf(stderr, "Watchdog card disabled.\n"); | ||
46 | fflush(stderr); | ||
47 | exit(0); | ||
48 | } else if (!strncasecmp(argv[1], "-e", 2)) { | ||
49 | ioctl(fd, WDIOC_SETOPTIONS, WDIOS_ENABLECARD); | ||
50 | fprintf(stderr, "Watchdog card enabled.\n"); | ||
51 | fflush(stderr); | ||
52 | exit(0); | ||
53 | } else { | ||
54 | fprintf(stderr, "-d to disable, -e to enable.\n"); | ||
55 | fprintf(stderr, "run by itself to tick the card.\n"); | ||
56 | fflush(stderr); | ||
57 | exit(0); | ||
58 | } | ||
59 | } else { | ||
60 | fprintf(stderr, "Watchdog Ticking Away!\n"); | ||
61 | fflush(stderr); | ||
62 | } | ||
63 | |||
64 | while(1) { | ||
65 | keep_alive(); | ||
66 | sleep(1); | ||
67 | } | ||
68 | } | ||
diff --git a/Documentation/watchdog/watchdog-api.txt b/Documentation/watchdog/watchdog-api.txt index 21ed51173662..958ff3d48be3 100644 --- a/Documentation/watchdog/watchdog-api.txt +++ b/Documentation/watchdog/watchdog-api.txt | |||
@@ -34,22 +34,7 @@ activates as soon as /dev/watchdog is opened and will reboot unless | |||
34 | the watchdog is pinged within a certain time, this time is called the | 34 | the watchdog is pinged within a certain time, this time is called the |
35 | timeout or margin. The simplest way to ping the watchdog is to write | 35 | timeout or margin. The simplest way to ping the watchdog is to write |
36 | some data to the device. So a very simple watchdog daemon would look | 36 | some data to the device. So a very simple watchdog daemon would look |
37 | like this: | 37 | like this source file: see Documentation/watchdog/src/watchdog-simple.c |
38 | |||
39 | #include <stdlib.h> | ||
40 | #include <fcntl.h> | ||
41 | |||
42 | int main(int argc, const char *argv[]) { | ||
43 | int fd=open("/dev/watchdog",O_WRONLY); | ||
44 | if (fd==-1) { | ||
45 | perror("watchdog"); | ||
46 | exit(1); | ||
47 | } | ||
48 | while(1) { | ||
49 | write(fd, "\0", 1); | ||
50 | sleep(10); | ||
51 | } | ||
52 | } | ||
53 | 38 | ||
54 | A more advanced driver could for example check that a HTTP server is | 39 | A more advanced driver could for example check that a HTTP server is |
55 | still responding before doing the write call to ping the watchdog. | 40 | still responding before doing the write call to ping the watchdog. |
@@ -110,7 +95,40 @@ current timeout using the GETTIMEOUT ioctl. | |||
110 | ioctl(fd, WDIOC_GETTIMEOUT, &timeout); | 95 | ioctl(fd, WDIOC_GETTIMEOUT, &timeout); |
111 | printf("The timeout was is %d seconds\n", timeout); | 96 | printf("The timeout was is %d seconds\n", timeout); |
112 | 97 | ||
113 | Envinronmental monitoring: | 98 | Pretimeouts: |
99 | |||
100 | Some watchdog timers can be set to have a trigger go off before the | ||
101 | actual time they will reset the system. This can be done with an NMI, | ||
102 | interrupt, or other mechanism. This allows Linux to record useful | ||
103 | information (like panic information and kernel coredumps) before it | ||
104 | resets. | ||
105 | |||
106 | pretimeout = 10; | ||
107 | ioctl(fd, WDIOC_SETPRETIMEOUT, &pretimeout); | ||
108 | |||
109 | Note that the pretimeout is the number of seconds before the time | ||
110 | when the timeout will go off. It is not the number of seconds until | ||
111 | the pretimeout. So, for instance, if you set the timeout to 60 seconds | ||
112 | and the pretimeout to 10 seconds, the pretimeout will go off in 50 | ||
113 | seconds. Setting a pretimeout to zero disables it. | ||
114 | |||
115 | There is also a get function for getting the pretimeout: | ||
116 | |||
117 | ioctl(fd, WDIOC_GETPRETIMEOUT, &timeout); | ||
118 | printf("The pretimeout is %d seconds\n", timeout); | ||
119 | |||
120 | Not all watchdog drivers will support a pretimeout. | ||
121 | |||
122 | Get the number of seconds before reboot: | ||
123 | |||
124 | Some watchdog drivers have the ability to report the remaining time | ||
125 | before the system will reboot. The WDIOC_GETTIMELEFT ioctl | ||
126 | returns the number of seconds before reboot. | ||
127 | |||
128 | ioctl(fd, WDIOC_GETTIMELEFT, &timeleft); | ||
129 | printf("The time left is %d seconds\n", timeleft); | ||
130 | |||
131 | Environmental monitoring: | ||
114 | 132 | ||
115 | All watchdog drivers are required return more information about the system, | 133 | All watchdog drivers are required return more information about the system, |
116 | some do temperature, fan and power level monitoring, some can tell you | 134 | some do temperature, fan and power level monitoring, some can tell you |
@@ -169,6 +187,10 @@ The watchdog saw a keepalive ping since it was last queried. | |||
169 | 187 | ||
170 | WDIOF_SETTIMEOUT Can set/get the timeout | 188 | WDIOF_SETTIMEOUT Can set/get the timeout |
171 | 189 | ||
190 | The watchdog can do pretimeouts. | ||
191 | |||
192 | WDIOF_PRETIMEOUT Pretimeout (in seconds), get/set | ||
193 | |||
172 | 194 | ||
173 | For those drivers that return any bits set in the option field, the | 195 | For those drivers that return any bits set in the option field, the |
174 | GETSTATUS and GETBOOTSTATUS ioctls can be used to ask for the current | 196 | GETSTATUS and GETBOOTSTATUS ioctls can be used to ask for the current |
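Review note: the pretimeout arithmetic and the new ioctls described in the watchdog-api.txt hunks above can be sketched as follows. The helper names are hypothetical and not part of the patch. Treating a pretimeout that is greater than or equal to the timeout as invalid is an assumption of this sketch; the text only states that zero disables the pretimeout.

```c
#include <sys/ioctl.h>
#include <linux/watchdog.h>

/*
 * Hypothetical helper: seconds after the last ping at which the
 * pretimeout fires. Returns -1 when the pretimeout is disabled (zero)
 * or, by assumption in this sketch, not smaller than the timeout.
 */
int pretimeout_fires_after(int timeout, int pretimeout)
{
	if (pretimeout <= 0 || pretimeout >= timeout)
		return -1;
	return timeout - pretimeout;
}

/*
 * Hypothetical helper: set the timeout and the pretimeout on an open
 * watchdog descriptor. Returns 0 on success, -1 if either ioctl fails
 * (for example because the driver does not support pretimeouts).
 */
int set_watchdog_times(int fd, int timeout, int pretimeout)
{
	if (ioctl(fd, WDIOC_SETTIMEOUT, &timeout) == -1)
		return -1;
	if (ioctl(fd, WDIOC_SETPRETIMEOUT, &pretimeout) == -1)
		return -1;
	return 0;
}

/*
 * Hypothetical helper: seconds left before reboot, or -1 if the driver
 * cannot report it.
 */
int get_time_left(int fd)
{
	int timeleft;

	if (ioctl(fd, WDIOC_GETTIMELEFT, &timeleft) == -1)
		return -1;
	return timeleft;
}
```

With a timeout of 60 seconds and a pretimeout of 10 seconds, pretimeout_fires_after(60, 10) gives 50, matching the example in the text.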
diff --git a/Documentation/watchdog/watchdog.txt b/Documentation/watchdog/watchdog.txt index dffda29c8799..4b1ff69cc19a 100644 --- a/Documentation/watchdog/watchdog.txt +++ b/Documentation/watchdog/watchdog.txt | |||
@@ -65,28 +65,7 @@ The external event interfaces on the WDT boards are not currently supported. | |||
65 | Minor numbers are however allocated for it. | 65 | Minor numbers are however allocated for it. |
66 | 66 | ||
67 | 67 | ||
68 | Example Watchdog Driver | 68 | Example Watchdog Driver: see Documentation/watchdog/src/watchdog-simple.c |
69 | ----------------------- | ||
70 | |||
71 | #include <stdio.h> | ||
72 | #include <unistd.h> | ||
73 | #include <fcntl.h> | ||
74 | |||
75 | int main(int argc, const char *argv[]) | ||
76 | { | ||
77 | int fd=open("/dev/watchdog",O_WRONLY); | ||
78 | if(fd==-1) | ||
79 | { | ||
80 | perror("watchdog"); | ||
81 | exit(1); | ||
82 | } | ||
83 | while(1) | ||
84 | { | ||
85 | write(fd,"\0",1); | ||
86 | fsync(fd); | ||
87 | sleep(10); | ||
88 | } | ||
89 | } | ||
90 | 69 | ||
91 | 70 | ||
92 | Contact Information | 71 | Contact Information |