Diffstat (limited to 'Documentation')
58 files changed, 2721 insertions, 584 deletions
diff --git a/Documentation/Changes b/Documentation/Changes index 86b86399d61d..fe5ae0f55020 100644 --- a/Documentation/Changes +++ b/Documentation/Changes | |||
@@ -31,8 +31,6 @@ al español de este documento en varios formatos. | |||
31 | Eine deutsche Version dieser Datei finden Sie unter | 31 | Eine deutsche Version dieser Datei finden Sie unter |
32 | <http://www.stefan-winter.de/Changes-2.4.0.txt>. | 32 | <http://www.stefan-winter.de/Changes-2.4.0.txt>. |
33 | 33 | ||
34 | Last updated: October 29th, 2002 | ||
35 | |||
36 | Chris Ricker (kaboom@gatech.edu or chris.ricker@genetics.utah.edu). | 34 | Chris Ricker (kaboom@gatech.edu or chris.ricker@genetics.utah.edu). |
37 | 35 | ||
38 | Current Minimal Requirements | 36 | Current Minimal Requirements |
@@ -48,7 +46,7 @@ necessary on all systems; obviously, if you don't have any ISDN | |||
48 | hardware, for example, you probably needn't concern yourself with | 46 | hardware, for example, you probably needn't concern yourself with |
49 | isdn4k-utils. | 47 | isdn4k-utils. |
50 | 48 | ||
51 | o Gnu C 2.95.3 # gcc --version | 49 | o Gnu C 3.2 # gcc --version |
52 | o Gnu make 3.79.1 # make --version | 50 | o Gnu make 3.79.1 # make --version |
53 | o binutils 2.12 # ld -v | 51 | o binutils 2.12 # ld -v |
54 | o util-linux 2.10o # fdformat --version | 52 | o util-linux 2.10o # fdformat --version |
@@ -74,26 +72,7 @@ GCC | |||
74 | --- | 72 | --- |
75 | 73 | ||
76 | The gcc version requirements may vary depending on the type of CPU in your | 74 | The gcc version requirements may vary depending on the type of CPU in your |
77 | computer. The next paragraph applies to users of x86 CPUs, but not | 75 | computer. |
78 | necessarily to users of other CPUs. Users of other CPUs should obtain | ||
79 | information about their gcc version requirements from another source. | ||
80 | |||
81 | The recommended compiler for the kernel is gcc 2.95.x (x >= 3), and it | ||
82 | should be used when you need absolute stability. You may use gcc 3.0.x | ||
83 | instead if you wish, although it may cause problems. Later versions of gcc | ||
84 | have not received much testing for Linux kernel compilation, and there are | ||
85 | almost certainly bugs (mainly, but not exclusively, in the kernel) that | ||
86 | will need to be fixed in order to use these compilers. In any case, using | ||
87 | pgcc instead of plain gcc is just asking for trouble. | ||
88 | |||
89 | The Red Hat gcc 2.96 compiler subtree can also be used to build this tree. | ||
90 | You should ensure you use gcc-2.96-74 or later. gcc-2.96-54 will not build | ||
91 | the kernel correctly. | ||
92 | |||
93 | In addition, please pay attention to compiler optimization. Anything | ||
94 | greater than -O2 may not be wise. Similarly, if you choose to use gcc-2.95.x | ||
95 | or derivatives, be sure not to use -fstrict-aliasing (which, depending on | ||
96 | your version of gcc 2.95.x, may necessitate using -fno-strict-aliasing). | ||
97 | 76 | ||
98 | Make | 77 | Make |
99 | ---- | 78 | ---- |
@@ -322,9 +301,9 @@ Getting updated software | |||
322 | Kernel compilation | 301 | Kernel compilation |
323 | ****************** | 302 | ****************** |
324 | 303 | ||
325 | gcc 2.95.3 | 304 | gcc |
326 | ---------- | 305 | --- |
327 | o <ftp://ftp.gnu.org/gnu/gcc/gcc-2.95.3.tar.gz> | 306 | o <ftp://ftp.gnu.org/gnu/gcc/> |
328 | 307 | ||
329 | Make | 308 | Make |
330 | ---- | 309 | ---- |
diff --git a/Documentation/CodingStyle b/Documentation/CodingStyle index eb7db3c19227..ce5d2c038cf5 100644 --- a/Documentation/CodingStyle +++ b/Documentation/CodingStyle | |||
@@ -199,7 +199,7 @@ The rationale is: | |||
199 | modifications are prevented | 199 | modifications are prevented |
200 | - saves the compiler work to optimize redundant code away ;) | 200 | - saves the compiler work to optimize redundant code away ;) |
201 | 201 | ||
202 | int fun(int ) | 202 | int fun(int a) |
203 | { | 203 | { |
204 | int result = 0; | 204 | int result = 0; |
205 | char *buffer = kmalloc(SIZE); | 205 | char *buffer = kmalloc(SIZE); |
@@ -344,7 +344,7 @@ Remember: if another thread can find your data structure, and you don't | |||
344 | have a reference count on it, you almost certainly have a bug. | 344 | have a reference count on it, you almost certainly have a bug. |
345 | 345 | ||
346 | 346 | ||
347 | Chapter 11: Macros, Enums, Inline functions and RTL | 347 | Chapter 11: Macros, Enums and RTL |
348 | 348 | ||
349 | Names of macros defining constants and labels in enums are capitalized. | 349 | Names of macros defining constants and labels in enums are capitalized. |
350 | 350 | ||
@@ -429,7 +429,35 @@ from void pointer to any other pointer type is guaranteed by the C programming | |||
429 | language. | 429 | language. |
430 | 430 | ||
431 | 431 | ||
432 | Chapter 14: References | 432 | Chapter 14: The inline disease |
433 | |||
434 | There appears to be a common misperception that gcc has a magic "make me | ||
435 | faster" speedup option called "inline". While the use of inlines can be | ||
436 | appropriate (for example as a means of replacing macros, see Chapter 11), it | ||
437 | very often is not. Abundant use of the inline keyword leads to a much bigger | ||
438 | kernel, which in turn slows the system as a whole down, due to a bigger | ||
439 | icache footprint for the CPU and simply because there is less memory | ||
440 | available for the pagecache. Just think about it; a pagecache miss causes a | ||
441 | disk seek, which easily takes 5 milliseconds. There are a LOT of cpu cycles | ||
442 | that can go into these 5 milliseconds. | ||
443 | |||
444 | A reasonable rule of thumb is to not put inline at functions that have more | ||
445 | than 3 lines of code in them. An exception to this rule are the cases where | ||
446 | a parameter is known to be a compiletime constant, and as a result of this | ||
447 | constantness you *know* the compiler will be able to optimize most of your | ||
448 | function away at compile time. For a good example of this latter case, see | ||
449 | the kmalloc() inline function. | ||
450 | |||
451 | Often people argue that adding inline to functions that are static and used | ||
452 | only once is always a win since there is no space tradeoff. While this is | ||
453 | technically correct, gcc is capable of inlining these automatically without | ||
454 | help, and the maintenance issue of removing the inline when a second user | ||
455 | appears outweighs the potential value of the hint that tells gcc to do | ||
456 | something it would have done anyway. | ||
457 | |||
458 | |||
459 | |||
460 | Chapter 15: References | ||
433 | 461 | ||
434 | The C Programming Language, Second Edition | 462 | The C Programming Language, Second Edition |
435 | by Brian W. Kernighan and Dennis M. Ritchie. | 463 | by Brian W. Kernighan and Dennis M. Ritchie. |
@@ -444,10 +472,13 @@ ISBN 0-201-61586-X. | |||
444 | URL: http://cm.bell-labs.com/cm/cs/tpop/ | 472 | URL: http://cm.bell-labs.com/cm/cs/tpop/ |
445 | 473 | ||
446 | GNU manuals - where in compliance with K&R and this text - for cpp, gcc, | 474 | GNU manuals - where in compliance with K&R and this text - for cpp, gcc, |
447 | gcc internals and indent, all available from http://www.gnu.org | 475 | gcc internals and indent, all available from http://www.gnu.org/manual/ |
448 | 476 | ||
449 | WG14 is the international standardization working group for the programming | 477 | WG14 is the international standardization working group for the programming |
450 | language C, URL: http://std.dkuug.dk/JTC1/SC22/WG14/ | 478 | language C, URL: http://www.open-std.org/JTC1/SC22/WG14/ |
479 | |||
480 | Kernel CodingStyle, by greg@kroah.com at OLS 2002: | ||
481 | http://www.kroah.com/linux/talks/ols_2002_kernel_codingstyle_talk/html/ | ||
451 | 482 | ||
452 | -- | 483 | -- |
453 | Last updated on 16 February 2004 by a community effort on LKML. | 484 | Last updated on 30 December 2005 by a community effort on LKML. |
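The compile-time-constant exception described in the new Chapter 14 text can be illustrated with a minimal userspace sketch. Everything here is an assumption made for illustration: size_class(), its thresholds, and the two callers are hypothetical and are not the kernel's kmalloc() code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical size-class lookup, loosely modelled on the idea behind
 * the kmalloc() inline; the thresholds are invented for illustration.
 * When 'size' is a compile-time constant, an optimizing compiler can
 * usually fold the whole chain of comparisons into a single constant.
 */
static inline int size_class(size_t size)
{
	if (size <= 32)
		return 0;
	if (size <= 64)
		return 1;
	if (size <= 128)
		return 2;
	return 3;
}

/* Constant argument: the inlined body can typically reduce to "return 1". */
int class_for_fixed_buffer(void)
{
	return size_class(48);
}

/* Runtime argument: the comparisons are copied into every caller. */
int class_for_request(size_t n)
{
	return size_class(n);
}

/* Small self-check: both paths must agree for the same size. */
void size_class_selfcheck(void)
{
	assert(class_for_fixed_buffer() == class_for_request(48));
}
```

The first caller is the case where inline tends to pay off; the second is the case the rule of thumb warns about, since a longer body would be duplicated at each call site with nothing folded away.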
diff --git a/Documentation/DocBook/.gitignore b/Documentation/DocBook/.gitignore new file mode 100644 index 000000000000..c102c02ecf89 --- /dev/null +++ b/Documentation/DocBook/.gitignore | |||
@@ -0,0 +1,6 @@ | |||
1 | *.xml | ||
2 | *.ps | ||
3 | |||
4 | *.html | ||
5 | *.9.gz | ||
6 | *.9 | ||
diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl index 767433bdbc40..8c9c6704e85b 100644 --- a/Documentation/DocBook/kernel-api.tmpl +++ b/Documentation/DocBook/kernel-api.tmpl | |||
@@ -54,6 +54,11 @@ | |||
54 | !Ekernel/sched.c | 54 | !Ekernel/sched.c |
55 | !Ekernel/timer.c | 55 | !Ekernel/timer.c |
56 | </sect1> | 56 | </sect1> |
57 | <sect1><title>High-resolution timers</title> | ||
58 | !Iinclude/linux/ktime.h | ||
59 | !Iinclude/linux/hrtimer.h | ||
60 | !Ekernel/hrtimer.c | ||
61 | </sect1> | ||
57 | <sect1><title>Internal Functions</title> | 62 | <sect1><title>Internal Functions</title> |
58 | !Ikernel/exit.c | 63 | !Ikernel/exit.c |
59 | !Ikernel/signal.c | 64 | !Ikernel/signal.c |
@@ -369,6 +374,7 @@ X!Edrivers/acpi/motherboard.c | |||
369 | X!Edrivers/acpi/bus.c | 374 | X!Edrivers/acpi/bus.c |
370 | --> | 375 | --> |
371 | !Edrivers/acpi/scan.c | 376 | !Edrivers/acpi/scan.c |
377 | !Idrivers/acpi/scan.c | ||
372 | <!-- No correct structured comments | 378 | <!-- No correct structured comments |
373 | X!Edrivers/acpi/pci_bind.c | 379 | X!Edrivers/acpi/pci_bind.c |
374 | --> | 380 | --> |
diff --git a/Documentation/DocBook/kernel-locking.tmpl b/Documentation/DocBook/kernel-locking.tmpl index 90dc2de8e0af..158ffe9bfade 100644 --- a/Documentation/DocBook/kernel-locking.tmpl +++ b/Documentation/DocBook/kernel-locking.tmpl | |||
@@ -222,7 +222,7 @@ | |||
222 | <title>Two Main Types of Kernel Locks: Spinlocks and Semaphores</title> | 222 | <title>Two Main Types of Kernel Locks: Spinlocks and Semaphores</title> |
223 | 223 | ||
224 | <para> | 224 | <para> |
225 | There are two main types of kernel locks. The fundamental type | 225 | There are three main types of kernel locks. The fundamental type |
226 | is the spinlock | 226 | is the spinlock |
227 | (<filename class="headerfile">include/asm/spinlock.h</filename>), | 227 | (<filename class="headerfile">include/asm/spinlock.h</filename>), |
228 | which is a very simple single-holder lock: if you can't get the | 228 | which is a very simple single-holder lock: if you can't get the |
@@ -230,16 +230,22 @@ | |||
230 | very small and fast, and can be used anywhere. | 230 | very small and fast, and can be used anywhere. |
231 | </para> | 231 | </para> |
232 | <para> | 232 | <para> |
233 | The second type is a semaphore | 233 | The second type is a mutex |
234 | (<filename class="headerfile">include/linux/mutex.h</filename>): it | ||
235 | is like a spinlock, but you may block holding a mutex. | ||
236 | If you can't lock a mutex, your task will suspend itself, and be woken | ||
237 | up when the mutex is released. This means the CPU can do something | ||
238 | else while you are waiting. There are many cases when you simply | ||
239 | can't sleep (see <xref linkend="sleeping-things"/>), and so have to | ||
240 | use a spinlock instead. | ||
241 | </para> | ||
242 | <para> | ||
243 | The third type is a semaphore | ||
234 | (<filename class="headerfile">include/asm/semaphore.h</filename>): it | 244 | (<filename class="headerfile">include/asm/semaphore.h</filename>): it |
235 | can have more than one holder at any time (the number decided at | 245 | can have more than one holder at any time (the number decided at |
236 | initialization time), although it is most commonly used as a | 246 | initialization time), although it is most commonly used as a |
237 | single-holder lock (a mutex). If you can't get a semaphore, | 247 | single-holder lock (a mutex). If you can't get a semaphore, your |
238 | your task will put itself on the queue, and be woken up when the | 248 | task will be suspended and later on woken up - just like for mutexes. |
239 | semaphore is released. This means the CPU will do something | ||
240 | else while you are waiting, but there are many cases when you | ||
241 | simply can't sleep (see <xref linkend="sleeping-things"/>), and so | ||
242 | have to use a spinlock instead. | ||
243 | </para> | 249 | </para> |
244 | <para> | 250 | <para> |
245 | Neither type of lock is recursive: see | 251 | Neither type of lock is recursive: see |
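The difference between a single-holder lock and a semaphore with several holders can be sketched in userspace C. This is only a toy model under stated assumptions: struct toy_sem and the toy_sem_* helpers are hypothetical names, the code uses C11 atomics rather than any kernel API, and it only implements a non-sleeping "try" operation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy counting semaphore: 'count' is the number of free holder slots. */
struct toy_sem {
	atomic_int count;
};

/* The number of simultaneous holders is decided at initialization time. */
void toy_sem_init(struct toy_sem *sem, int holders)
{
	assert(holders > 0);
	atomic_init(&sem->count, holders);
}

/* Try to become a holder without sleeping; returns true on success. */
bool toy_sem_trydown(struct toy_sem *sem)
{
	int old = atomic_load(&sem->count);

	while (old > 0) {
		/* On failure 'old' is refreshed with the current count. */
		if (atomic_compare_exchange_weak(&sem->count, &old, old - 1))
			return true;
	}
	return false;	/* a real semaphore would suspend the task here */
}

/* Release one holder slot. */
void toy_sem_up(struct toy_sem *sem)
{
	atomic_fetch_add(&sem->count, 1);
}
```

Initialized with a count of 1, the toy behaves like a single-holder lock: a second toy_sem_trydown() fails until toy_sem_up() is called. Initialized with 2, two holders can be inside at once.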
diff --git a/Documentation/RCU/rcuref.txt b/Documentation/RCU/rcuref.txt index a23fee66064d..3f60db41b2f0 100644 --- a/Documentation/RCU/rcuref.txt +++ b/Documentation/RCU/rcuref.txt | |||
@@ -1,74 +1,67 @@ | |||
1 | Refcounter framework for elements of lists/arrays protected by | 1 | Refcounter design for elements of lists/arrays protected by RCU. |
2 | RCU. | ||
3 | 2 | ||
4 | Refcounting on elements of lists which are protected by traditional | 3 | Refcounting on elements of lists which are protected by traditional |
5 | reader/writer spinlocks or semaphores are straight forward as in: | 4 | reader/writer spinlocks or semaphores are straight forward as in: |
6 | 5 | ||
7 | 1. 2. | 6 | 1. 2. |
8 | add() search_and_reference() | 7 | add() search_and_reference() |
9 | { { | 8 | { { |
10 | alloc_object read_lock(&list_lock); | 9 | alloc_object read_lock(&list_lock); |
11 | ... search_for_element | 10 | ... search_for_element |
12 | atomic_set(&el->rc, 1); atomic_inc(&el->rc); | 11 | atomic_set(&el->rc, 1); atomic_inc(&el->rc); |
13 | write_lock(&list_lock); ... | 12 | write_lock(&list_lock); ... |
14 | add_element read_unlock(&list_lock); | 13 | add_element read_unlock(&list_lock); |
15 | ... ... | 14 | ... ... |
16 | write_unlock(&list_lock); } | 15 | write_unlock(&list_lock); } |
17 | } | 16 | } |
18 | 17 | ||
19 | 3. 4. | 18 | 3. 4. |
20 | release_referenced() delete() | 19 | release_referenced() delete() |
21 | { { | 20 | { { |
22 | ... write_lock(&list_lock); | 21 | ... write_lock(&list_lock); |
23 | atomic_dec(&el->rc, relfunc) ... | 22 | atomic_dec(&el->rc, relfunc) ... |
24 | ... delete_element | 23 | ... delete_element |
25 | } write_unlock(&list_lock); | 24 | } write_unlock(&list_lock); |
26 | ... | 25 | ... |
27 | if (atomic_dec_and_test(&el->rc)) | 26 | if (atomic_dec_and_test(&el->rc)) |
28 | kfree(el); | 27 | kfree(el); |
29 | ... | 28 | ... |
30 | } | 29 | } |
31 | 30 | ||
32 | If this list/array is made lock free using rcu as in changing the | 31 | If this list/array is made lock free using rcu as in changing the |
33 | write_lock in add() and delete() to spin_lock and changing read_lock | 32 | write_lock in add() and delete() to spin_lock and changing read_lock |
34 | in search_and_reference to rcu_read_lock(), the rcuref_get in | 33 | in search_and_reference to rcu_read_lock(), the atomic_get in |
35 | search_and_reference could potentially hold reference to an element which | 34 | search_and_reference could potentially hold reference to an element which |
36 | has already been deleted from the list/array. rcuref_lf_get_rcu takes | 35 | has already been deleted from the list/array. atomic_inc_not_zero takes |
37 | care of this scenario. search_and_reference should look as; | 36 | care of this scenario. search_and_reference should look as; |
38 | 37 | ||
39 | 1. 2. | 38 | 1. 2. |
40 | add() search_and_reference() | 39 | add() search_and_reference() |
41 | { { | 40 | { { |
42 | alloc_object rcu_read_lock(); | 41 | alloc_object rcu_read_lock(); |
43 | ... search_for_element | 42 | ... search_for_element |
44 | atomic_set(&el->rc, 1); if (rcuref_inc_lf(&el->rc)) { | 43 | atomic_set(&el->rc, 1); if (!atomic_inc_not_zero(&el->rc)) { |
45 | write_lock(&list_lock); rcu_read_unlock(); | 44 | write_lock(&list_lock); rcu_read_unlock(); |
46 | return FAIL; | 45 | return FAIL; |
47 | add_element } | 46 | add_element } |
48 | ... ... | 47 | ... ... |
49 | write_unlock(&list_lock); rcu_read_unlock(); | 48 | write_unlock(&list_lock); rcu_read_unlock(); |
50 | } } | 49 | } } |
51 | 3. 4. | 50 | 3. 4. |
52 | release_referenced() delete() | 51 | release_referenced() delete() |
53 | { { | 52 | { { |
54 | ... write_lock(&list_lock); | 53 | ... write_lock(&list_lock); |
55 | rcuref_dec(&el->rc, relfunc) ... | 54 | atomic_dec(&el->rc, relfunc) ... |
56 | ... delete_element | 55 | ... delete_element |
57 | } write_unlock(&list_lock); | 56 | } write_unlock(&list_lock); |
58 | ... | 57 | ... |
59 | if (rcuref_dec_and_test(&el->rc)) | 58 | if (atomic_dec_and_test(&el->rc)) |
60 | call_rcu(&el->head, el_free); | 59 | call_rcu(&el->head, el_free); |
61 | ... | 60 | ... |
62 | } | 61 | } |
63 | 62 | ||
64 | Sometimes, reference to the element need to be obtained in the | 63 | Sometimes, reference to the element need to be obtained in the |
65 | update (write) stream. In such cases, rcuref_inc_lf might be an overkill | 64 | update (write) stream. In such cases, atomic_inc_not_zero might be an |
66 | since the spinlock serialising list updates are held. rcuref_inc | 65 | overkill since the spinlock serialising list updates are held. atomic_inc |
67 | is to be used in such cases. | 66 | is to be used in such cases. |
68 | For arches which do not have cmpxchg rcuref_inc_lf | 67 | |
69 | api uses a hashed spinlock implementation and the same hashed spinlock | ||
70 | is acquired in all rcuref_xxx primitives to preserve atomicity. | ||
71 | Note: Use rcuref_inc api only if you need to use rcuref_inc_lf on the | ||
72 | refcounter atleast at one place. Mixing rcuref_inc and atomic_xxx api | ||
73 | might lead to races. rcuref_inc_lf() must be used in lockfree | ||
74 | RCU critical sections only. | ||
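The "increment only if not already zero" behaviour that the rcuref.txt text relies on can be modelled in userspace with C11 atomics. This is a sketch under assumptions, not the kernel implementation: struct element, ref_get_not_zero() and ref_put() are hypothetical names, and the model follows the kernel convention that atomic_inc_not_zero() returns non-zero when the reference was taken.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical element type; 'rc' plays the role of el->rc above. */
struct element {
	atomic_int rc;
};

/*
 * Userspace model of atomic_inc_not_zero(): take a reference only if
 * the count has not already dropped to zero. Returns true on success.
 */
bool ref_get_not_zero(struct element *el)
{
	int old = atomic_load(&el->rc);

	while (old != 0) {
		/* On failure 'old' is refreshed with the current count. */
		if (atomic_compare_exchange_weak(&el->rc, &old, old + 1))
			return true;
	}
	return false;	/* last reference already dropped; do not use el */
}

/* Drop a reference; returns true when the caller released the last one. */
bool ref_put(struct element *el)
{
	int old = atomic_fetch_sub(&el->rc, 1);

	assert(old > 0);	/* a put without a matching get is a bug */
	return old == 1;
}
```

A lock-free reader would call ref_get_not_zero() inside its read-side critical section and treat a false return as "element is being deleted", which is the case a plain increment cannot detect.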
diff --git a/Documentation/SubmittingDrivers b/Documentation/SubmittingDrivers index c3cca924e94b..dd311cff1cc3 100644 --- a/Documentation/SubmittingDrivers +++ b/Documentation/SubmittingDrivers | |||
@@ -27,18 +27,17 @@ Who To Submit Drivers To | |||
27 | ------------------------ | 27 | ------------------------ |
28 | 28 | ||
29 | Linux 2.0: | 29 | Linux 2.0: |
30 | No new drivers are accepted for this kernel tree | 30 | No new drivers are accepted for this kernel tree. |
31 | 31 | ||
32 | Linux 2.2: | 32 | Linux 2.2: |
33 | No new drivers are accepted for this kernel tree. | ||
34 | |||
35 | Linux 2.4: | ||
33 | If the code area has a general maintainer then please submit it to | 36 | If the code area has a general maintainer then please submit it to |
34 | the maintainer listed in MAINTAINERS in the kernel file. If the | 37 | the maintainer listed in MAINTAINERS in the kernel file. If the |
35 | maintainer does not respond or you cannot find the appropriate | 38 | maintainer does not respond or you cannot find the appropriate |
36 | maintainer then please contact the 2.2 kernel maintainer: | 39 | maintainer then please contact Marcelo Tosatti |
37 | Marc-Christian Petersen <m.c.p@wolk-project.de>. | 40 | <marcelo.tosatti@cyclades.com>. |
38 | |||
39 | Linux 2.4: | ||
40 | The same rules apply as 2.2. The final contact point for Linux 2.4 | ||
41 | submissions is Marcelo Tosatti <marcelo.tosatti@cyclades.com>. | ||
42 | 41 | ||
43 | Linux 2.6: | 42 | Linux 2.6: |
44 | The same rules apply as 2.4 except that you should follow linux-kernel | 43 | The same rules apply as 2.4 except that you should follow linux-kernel |
@@ -53,6 +52,7 @@ Licensing: The code must be released to us under the | |||
53 | of exclusive GPL licensing, and if you wish the driver | 52 | of exclusive GPL licensing, and if you wish the driver |
54 | to be useful to other communities such as BSD you may well | 53 | to be useful to other communities such as BSD you may well |
55 | wish to release under multiple licenses. | 54 | wish to release under multiple licenses. |
55 | See accepted licenses at include/linux/module.h | ||
56 | 56 | ||
57 | Copyright: The copyright owner must agree to use of GPL. | 57 | Copyright: The copyright owner must agree to use of GPL. |
58 | It's best if the submitter and copyright owner | 58 | It's best if the submitter and copyright owner |
@@ -143,5 +143,13 @@ KernelNewbies: | |||
143 | http://kernelnewbies.org/ | 143 | http://kernelnewbies.org/ |
144 | 144 | ||
145 | Linux USB project: | 145 | Linux USB project: |
146 | http://sourceforge.net/projects/linux-usb/ | 146 | http://linux-usb.sourceforge.net/ |
147 | |||
148 | How to NOT write kernel driver by arjanv@redhat.com | ||
149 | http://people.redhat.com/arjanv/olspaper.pdf | ||
150 | |||
151 | Kernel Janitor: | ||
152 | http://janitor.kernelnewbies.org/ | ||
147 | 153 | ||
154 | -- | ||
155 | Last updated on 17 Nov 2005. | ||
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches index 1d47e6c09dc6..6198e5ebcf65 100644 --- a/Documentation/SubmittingPatches +++ b/Documentation/SubmittingPatches | |||
@@ -78,7 +78,9 @@ Randy Dunlap's patch scripts: | |||
78 | http://www.xenotime.net/linux/scripts/patching-scripts-002.tar.gz | 78 | http://www.xenotime.net/linux/scripts/patching-scripts-002.tar.gz |
79 | 79 | ||
80 | Andrew Morton's patch scripts: | 80 | Andrew Morton's patch scripts: |
81 | http://www.zip.com.au/~akpm/linux/patches/patch-scripts-0.20 | 81 | http://www.zip.com.au/~akpm/linux/patches/ |
82 | Instead of these scripts, quilt is the recommended patch management | ||
83 | tool (see above). | ||
82 | 84 | ||
83 | 85 | ||
84 | 86 | ||
@@ -97,7 +99,7 @@ need to split up your patch. See #3, next. | |||
97 | 99 | ||
98 | 3) Separate your changes. | 100 | 3) Separate your changes. |
99 | 101 | ||
100 | Separate each logical change into its own patch. | 102 | Separate _logical changes_ into a single patch file. |
101 | 103 | ||
102 | For example, if your changes include both bug fixes and performance | 104 | For example, if your changes include both bug fixes and performance |
103 | enhancements for a single driver, separate those changes into two | 105 | enhancements for a single driver, separate those changes into two |
@@ -112,6 +114,10 @@ If one patch depends on another patch in order for a change to be | |||
112 | complete, that is OK. Simply note "this patch depends on patch X" | 114 | complete, that is OK. Simply note "this patch depends on patch X" |
113 | in your patch description. | 115 | in your patch description. |
114 | 116 | ||
117 | If you cannot condense your patch set into a smaller set of patches, | ||
118 | then only post say 15 or so at a time and wait for review and integration. | ||
119 | |||
120 | |||
115 | 121 | ||
116 | 4) Select e-mail destination. | 122 | 4) Select e-mail destination. |
117 | 123 | ||
@@ -124,6 +130,10 @@ your patch to the primary Linux kernel developer's mailing list, | |||
124 | linux-kernel@vger.kernel.org. Most kernel developers monitor this | 130 | linux-kernel@vger.kernel.org. Most kernel developers monitor this |
125 | e-mail list, and can comment on your changes. | 131 | e-mail list, and can comment on your changes. |
126 | 132 | ||
133 | |||
134 | Do not send more than 15 patches at once to the vger mailing lists!!! | ||
135 | |||
136 | |||
127 | Linus Torvalds is the final arbiter of all changes accepted into the | 137 | Linus Torvalds is the final arbiter of all changes accepted into the |
128 | Linux kernel. His e-mail address is <torvalds@osdl.org>. He gets | 138 | Linux kernel. His e-mail address is <torvalds@osdl.org>. He gets |
129 | a lot of e-mail, so typically you should do your best to -avoid- sending | 139 | a lot of e-mail, so typically you should do your best to -avoid- sending |
@@ -149,6 +159,9 @@ USB, framebuffer devices, the VFS, the SCSI subsystem, etc. See the | |||
149 | MAINTAINERS file for a mailing list that relates specifically to | 159 | MAINTAINERS file for a mailing list that relates specifically to |
150 | your change. | 160 | your change. |
151 | 161 | ||
162 | Majordomo lists of VGER.KERNEL.ORG at: | ||
163 | <http://vger.kernel.org/vger-lists.html> | ||
164 | |||
152 | If changes affect userland-kernel interfaces, please send | 165 | If changes affect userland-kernel interfaces, please send |
153 | the MAN-PAGES maintainer (as listed in the MAINTAINERS file) | 166 | the MAN-PAGES maintainer (as listed in the MAINTAINERS file) |
154 | a man-pages patch, or at least a notification of the change, | 167 | a man-pages patch, or at least a notification of the change, |
@@ -373,27 +386,14 @@ a diffstat, to show what files have changed, and the number of inserted | |||
373 | and deleted lines per file. A diffstat is especially useful on bigger | 386 | and deleted lines per file. A diffstat is especially useful on bigger |
374 | patches. Other comments relevant only to the moment or the maintainer, | 387 | patches. Other comments relevant only to the moment or the maintainer, |
375 | not suitable for the permanent changelog, should also go here. | 388 | not suitable for the permanent changelog, should also go here. |
389 | Use diffstat options "-p 1 -w 70" so that filenames are listed from the | ||
390 | top of the kernel source tree and don't use too much horizontal space | ||
391 | (easily fit in 80 columns, maybe with some indentation). | ||
376 | 392 | ||
377 | See more details on the proper patch format in the following | 393 | See more details on the proper patch format in the following |
378 | references. | 394 | references. |
379 | 395 | ||
380 | 396 | ||
381 | 13) More references for submitting patches | ||
382 | |||
383 | Andrew Morton, "The perfect patch" (tpp). | ||
384 | <http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt> | ||
385 | |||
386 | Jeff Garzik, "Linux kernel patch submission format." | ||
387 | <http://linux.yyz.us/patch-format.html> | ||
388 | |||
389 | Greg KH, "How to piss off a kernel subsystem maintainer" | ||
390 | <http://www.kroah.com/log/2005/03/31/> | ||
391 | |||
392 | Kernel Documentation/CodingStyle | ||
393 | <http://sosdg.org/~coywolf/lxr/source/Documentation/CodingStyle> | ||
394 | |||
395 | Linus Torvald's mail on the canonical patch format: | ||
396 | <http://lkml.org/lkml/2005/4/7/183> | ||
397 | 397 | ||
398 | 398 | ||
399 | ----------------------------------- | 399 | ----------------------------------- |
@@ -466,3 +466,30 @@ and 'extern __inline__'. | |||
466 | Don't try to anticipate nebulous future cases which may or may not | 466 | Don't try to anticipate nebulous future cases which may or may not |
467 | be useful: "Make it as simple as you can, and no simpler." | 467 | be useful: "Make it as simple as you can, and no simpler." |
468 | 468 | ||
469 | |||
470 | |||
471 | ---------------------- | ||
472 | SECTION 3 - REFERENCES | ||
473 | ---------------------- | ||
474 | |||
475 | Andrew Morton, "The perfect patch" (tpp). | ||
476 | <http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt> | ||
477 | |||
478 | Jeff Garzik, "Linux kernel patch submission format." | ||
479 | <http://linux.yyz.us/patch-format.html> | ||
480 | |||
481 | Greg Kroah, "How to piss off a kernel subsystem maintainer". | ||
482 | <http://www.kroah.com/log/2005/03/31/> | ||
483 | <http://www.kroah.com/log/2005/07/08/> | ||
484 | <http://www.kroah.com/log/2005/10/19/> | ||
485 | |||
486 | NO!!!! No more huge patch bombs to linux-kernel@vger.kernel.org people!. | ||
487 | <http://marc.theaimsgroup.com/?l=linux-kernel&m=112112749912944&w=2> | ||
488 | |||
489 | Kernel Documentation/CodingStyle | ||
490 | <http://sosdg.org/~coywolf/lxr/source/Documentation/CodingStyle> | ||
491 | |||
492 | Linus Torvalds's mail on the canonical patch format: | ||
493 | <http://lkml.org/lkml/2005/4/7/183> | ||
494 | -- | ||
495 | Last updated on 17 Nov 2005. | ||
diff --git a/Documentation/applying-patches.txt b/Documentation/applying-patches.txt index 681e426e2482..a083ba35d1ad 100644 --- a/Documentation/applying-patches.txt +++ b/Documentation/applying-patches.txt | |||
@@ -2,8 +2,8 @@ | |||
2 | Applying Patches To The Linux Kernel | 2 | Applying Patches To The Linux Kernel |
3 | ------------------------------------ | 3 | ------------------------------------ |
4 | 4 | ||
5 | (Written by Jesper Juhl, August 2005) | 5 | Original by: Jesper Juhl, August 2005 |
6 | 6 | Last update: 2006-01-05 | |
7 | 7 | ||
8 | 8 | ||
9 | A frequently asked question on the Linux Kernel Mailing List is how to apply | 9 | A frequently asked question on the Linux Kernel Mailing List is how to apply |
@@ -76,7 +76,7 @@ instead: | |||
76 | 76 | ||
77 | If you wish to uncompress the patch file by hand first before applying it | 77 | If you wish to uncompress the patch file by hand first before applying it |
78 | (what I assume you've done in the examples below), then you simply run | 78 | (what I assume you've done in the examples below), then you simply run |
79 | gunzip or bunzip2 on the file - like this: | 79 | gunzip or bunzip2 on the file -- like this: |
80 | gunzip patch-x.y.z.gz | 80 | gunzip patch-x.y.z.gz |
81 | bunzip2 patch-x.y.z.bz2 | 81 | bunzip2 patch-x.y.z.bz2 |
82 | 82 | ||
@@ -94,7 +94,7 @@ Common errors when patching | |||
94 | --- | 94 | --- |
95 | When patch applies a patch file it attempts to verify the sanity of the | 95 | When patch applies a patch file it attempts to verify the sanity of the |
96 | file in different ways. | 96 | file in different ways. |
97 | Checking that the file looks like a valid patch file, checking the code | 97 | Checking that the file looks like a valid patch file & checking the code |
98 | around the bits being modified matches the context provided in the patch are | 98 | around the bits being modified matches the context provided in the patch are |
99 | just two of the basic sanity checks patch does. | 99 | just two of the basic sanity checks patch does. |
100 | 100 | ||
@@ -118,16 +118,16 @@ wrong. | |||
118 | 118 | ||
119 | When patch encounters a change that it can't fix up with fuzz it rejects it | 119 | When patch encounters a change that it can't fix up with fuzz it rejects it |
120 | outright and leaves a file with a .rej extension (a reject file). You can | 120 | outright and leaves a file with a .rej extension (a reject file). You can |
121 | read this file to see exactely what change couldn't be applied, so you can | 121 | read this file to see exactly what change couldn't be applied, so you can |
122 | go fix it up by hand if you wish. | 122 | go fix it up by hand if you wish. |
123 | 123 | ||
124 | If you don't have any third party patches applied to your kernel source, but | 124 | If you don't have any third-party patches applied to your kernel source, but |
125 | only patches from kernel.org and you apply the patches in the correct order, | 125 | only patches from kernel.org and you apply the patches in the correct order, |
126 | and have made no modifications yourself to the source files, then you should | 126 | and have made no modifications yourself to the source files, then you should |
127 | never see a fuzz or reject message from patch. If you do see such messages | 127 | never see a fuzz or reject message from patch. If you do see such messages |
128 | anyway, then there's a high risk that either your local source tree or the | 128 | anyway, then there's a high risk that either your local source tree or the |
129 | patch file is corrupted in some way. In that case you should probably try | 129 | patch file is corrupted in some way. In that case you should probably try |
130 | redownloading the patch and if things are still not OK then you'd be advised | 130 | re-downloading the patch and if things are still not OK then you'd be advised |
131 | to start with a fresh tree downloaded in full from kernel.org. | 131 | to start with a fresh tree downloaded in full from kernel.org. |
132 | 132 | ||
133 | Let's look a bit more at some of the messages patch can produce. | 133 | Let's look a bit more at some of the messages patch can produce. |
@@ -136,7 +136,7 @@ If patch stops and presents a "File to patch:" prompt, then patch could not | |||
136 | find a file to be patched. Most likely you forgot to specify -p1 or you are | 136 | find a file to be patched. Most likely you forgot to specify -p1 or you are |
137 | in the wrong directory. Less often, you'll find patches that need to be | 137 | in the wrong directory. Less often, you'll find patches that need to be |
138 | applied with -p0 instead of -p1 (reading the patch file should reveal if | 138 | applied with -p0 instead of -p1 (reading the patch file should reveal if |
139 | this is the case - if so, then this is an error by the person who created | 139 | this is the case -- if so, then this is an error by the person who created |
140 | the patch but is not fatal). | 140 | the patch but is not fatal). |
141 | 141 | ||
142 | If you get "Hunk #2 succeeded at 1887 with fuzz 2 (offset 7 lines)." or a | 142 | If you get "Hunk #2 succeeded at 1887 with fuzz 2 (offset 7 lines)." or a |
@@ -167,22 +167,28 @@ the patch will in fact apply it. | |||
167 | 167 | ||
168 | A message similar to "patch: **** unexpected end of file in patch" or "patch | 168 | A message similar to "patch: **** unexpected end of file in patch" or "patch |
169 | unexpectedly ends in middle of line" means that patch could make no sense of | 169 | unexpectedly ends in middle of line" means that patch could make no sense of |
170 | the file you fed to it. Either your download is broken or you tried to feed | 170 | the file you fed to it. Either your download is broken, you tried to feed |
171 | patch a compressed patch file without uncompressing it first. | 171 | patch a compressed patch file without uncompressing it first, or the patch |
172 | file that you are using has been mangled by a mail client or mail transfer | ||
173 | agent along the way somewhere, e.g., by splitting a long line into two lines. | ||
174 | Often these warnings can easily be fixed by joining (concatenating) the | ||
175 | two lines that had been split. | ||
172 | 176 | ||
173 | As I already mentioned above, these errors should never happen if you apply | 177 | As I already mentioned above, these errors should never happen if you apply |
174 | a patch from kernel.org to the correct version of an unmodified source tree. | 178 | a patch from kernel.org to the correct version of an unmodified source tree. |
175 | So if you get these errors with kernel.org patches then you should probably | 179 | So if you get these errors with kernel.org patches then you should probably |
176 | assume that either your patch file or your tree is broken and I'd advice you | 180 | assume that either your patch file or your tree is broken and I'd advise you |
177 | to start over with a fresh download of a full kernel tree and the patch you | 181 | to start over with a fresh download of a full kernel tree and the patch you |
178 | wish to apply. | 182 | wish to apply. |
179 | 183 | ||
180 | 184 | ||
181 | Are there any alternatives to `patch'? | 185 | Are there any alternatives to `patch'? |
182 | --- | 186 | --- |
183 | Yes there are alternatives. You can use the `interdiff' program | 187 | Yes there are alternatives. |
184 | (http://cyberelk.net/tim/patchutils/) to generate a patch representing the | 188 | |
185 | differences between two patches and then apply the result. | 189 | You can use the `interdiff' program (http://cyberelk.net/tim/patchutils/) to |
190 | generate a patch representing the differences between two patches and then | ||
191 | apply the result. | ||
186 | This will let you move from something like 2.6.12.2 to 2.6.12.3 in a single | 192 | This will let you move from something like 2.6.12.2 to 2.6.12.3 in a single |
187 | step. The -z flag to interdiff will even let you feed it patches in gzip or | 193 | step. The -z flag to interdiff will even let you feed it patches in gzip or |
188 | bzip2 compressed form directly without the use of zcat or bzcat or manual | 194 | bzip2 compressed form directly without the use of zcat or bzcat or manual |
@@ -197,10 +203,10 @@ do the additional steps since interdiff can get things wrong in some cases. | |||
197 | Another alternative is `ketchup', which is a python script for automatic | 203 | Another alternative is `ketchup', which is a python script for automatic |
198 | downloading and applying of patches (http://www.selenic.com/ketchup/). | 204 | downloading and applying of patches (http://www.selenic.com/ketchup/). |
199 | 205 | ||
200 | Other nice tools are diffstat which shows a summary of changes made by a | 206 | Other nice tools are diffstat, which shows a summary of changes made by a |
201 | patch, lsdiff which displays a short listing of affected files in a patch | 207 | patch; lsdiff, which displays a short listing of affected files in a patch |
202 | file, along with (optionally) the line numbers of the start of each patch | 208 | file, along with (optionally) the line numbers of the start of each patch; |
203 | and grepdiff which displays a list of the files modified by a patch where | 209 | and grepdiff, which displays a list of the files modified by a patch where |
204 | the patch contains a given regular expression. | 210 | the patch contains a given regular expression. |
205 | 211 | ||
206 | 212 | ||
@@ -225,8 +231,8 @@ The -mm kernels live at | |||
225 | In place of ftp.kernel.org you can use ftp.cc.kernel.org, where cc is a | 231 | In place of ftp.kernel.org you can use ftp.cc.kernel.org, where cc is a |
226 | country code. This way you'll be downloading from a mirror site that's most | 232 | country code. This way you'll be downloading from a mirror site that's most |
227 | likely geographically closer to you, resulting in faster downloads for you, | 233 | likely geographically closer to you, resulting in faster downloads for you, |
228 | less bandwidth used globally and less load on the main kernel.org servers - | 234 | less bandwidth used globally and less load on the main kernel.org servers -- |
229 | these are good things, do use mirrors when possible. | 235 | these are good things, so do use mirrors when possible. |
230 | 236 | ||
231 | 237 | ||
232 | The 2.6.x kernels | 238 | The 2.6.x kernels |
@@ -234,14 +240,14 @@ The 2.6.x kernels | |||
234 | These are the base stable releases released by Linus. The highest numbered | 240 | These are the base stable releases released by Linus. The highest numbered |
235 | release is the most recent. | 241 | release is the most recent. |
236 | 242 | ||
237 | If regressions or other serious flaws are found then a -stable fix patch | 243 | If regressions or other serious flaws are found, then a -stable fix patch |
238 | will be released (see below) on top of this base. Once a new 2.6.x base | 244 | will be released (see below) on top of this base. Once a new 2.6.x base |
239 | kernel is released, a patch is made available that is a delta between the | 245 | kernel is released, a patch is made available that is a delta between the |
240 | previous 2.6.x kernel and the new one. | 246 | previous 2.6.x kernel and the new one. |
241 | 247 | ||
242 | To apply a patch moving from 2.6.11 to 2.6.12 you'd do the following (note | 248 | To apply a patch moving from 2.6.11 to 2.6.12, you'd do the following (note |
243 | that such patches do *NOT* apply on top of 2.6.x.y kernels but on top of the | 249 | that such patches do *NOT* apply on top of 2.6.x.y kernels but on top of the |
244 | base 2.6.x kernel - if you need to move from 2.6.x.y to 2.6.x+1 you need to | 250 | base 2.6.x kernel -- if you need to move from 2.6.x.y to 2.6.x+1 you need to |
245 | first revert the 2.6.x.y patch). | 251 | first revert the 2.6.x.y patch). |
246 | 252 | ||
247 | Here are some examples: | 253 | Here are some examples: |
@@ -258,12 +264,12 @@ $ patch -p1 -R < ../patch-2.6.11.1 # revert the 2.6.11.1 patch | |||
258 | # source dir is now 2.6.11 | 264 | # source dir is now 2.6.11 |
259 | $ patch -p1 < ../patch-2.6.12 # apply new 2.6.12 patch | 265 | $ patch -p1 < ../patch-2.6.12 # apply new 2.6.12 patch |
260 | $ cd .. | 266 | $ cd .. |
261 | $ mv linux-2.6.11.1 inux-2.6.12 # rename source dir | 267 | $ mv linux-2.6.11.1 linux-2.6.12 # rename source dir |
262 | 268 | ||
263 | 269 | ||
264 | The 2.6.x.y kernels | 270 | The 2.6.x.y kernels |
265 | --- | 271 | --- |
266 | Kernels with 4 digit versions are -stable kernels. They contain small(ish) | 272 | Kernels with 4-digit versions are -stable kernels. They contain small(ish) |
267 | critical fixes for security problems or significant regressions discovered | 273 | critical fixes for security problems or significant regressions discovered |
268 | in a given 2.6.x kernel. | 274 | in a given 2.6.x kernel. |
269 | 275 | ||
@@ -274,9 +280,14 @@ versions. | |||
274 | If no 2.6.x.y kernel is available, then the highest numbered 2.6.x kernel is | 280 | If no 2.6.x.y kernel is available, then the highest numbered 2.6.x kernel is |
275 | the current stable kernel. | 281 | the current stable kernel. |
276 | 282 | ||
283 | note: the -stable team usually do make incremental patches available as well | ||
284 | as patches against the latest mainline release, but I only cover the | ||
285 | non-incremental ones below. The incremental ones can be found at | ||
286 | ftp://ftp.kernel.org/pub/linux/kernel/v2.6/incr/ | ||
287 | |||
277 | These patches are not incremental, meaning that for example the 2.6.12.3 | 288 | These patches are not incremental, meaning that for example the 2.6.12.3 |
278 | patch does not apply on top of the 2.6.12.2 kernel source, but rather on top | 289 | patch does not apply on top of the 2.6.12.2 kernel source, but rather on top |
279 | of the base 2.6.12 kernel source. | 290 | of the base 2.6.12 kernel source . |
280 | So, in order to apply the 2.6.12.3 patch to your existing 2.6.12.2 kernel | 291 | So, in order to apply the 2.6.12.3 patch to your existing 2.6.12.2 kernel |
281 | source you have to first back out the 2.6.12.2 patch (so you are left with a | 292 | source you have to first back out the 2.6.12.2 patch (so you are left with a |
282 | base 2.6.12 kernel source) and then apply the new 2.6.12.3 patch. | 293 | base 2.6.12 kernel source) and then apply the new 2.6.12.3 patch. |
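The back-out-then-apply sequence described above can be sketched as a small helper. This is only an illustrative sketch in Python: the function name stable_patch_steps is hypothetical, and it assumes that patch files are named patch-<version> and sit one directory above the source tree, as in the examples in this document.

```python
def stable_patch_steps(current, target):
    """Return the patch commands needed to reach a 2.6.x.y -stable version.

    Hypothetical helper: assumes patch files named patch-<version> live in
    the parent directory of the source tree.
    """
    cur = current.split(".")
    tgt = target.split(".")
    # -stable patches apply to the base 2.6.x tree, so both versions must
    # share the same three-part base (e.g. 2.6.12).
    if cur[:3] != tgt[:3]:
        raise ValueError("versions do not share the same 2.6.x base")
    if len(tgt) != 4:
        raise ValueError("target must be a 2.6.x.y version")
    steps = []
    if len(cur) == 4:
        # Back out the currently applied 2.6.x.y patch first.
        steps.append("patch -p1 -R < ../patch-" + current)
    # Then apply the new 2.6.x.y patch on top of the base tree.
    steps.append("patch -p1 < ../patch-" + target)
    return steps


for command in stable_patch_steps("2.6.12.2", "2.6.12.3"):
    print(command)
```

The helper only builds the command strings; running them (and renaming the source directory afterwards) is left to the reader.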
@@ -342,12 +353,12 @@ The -git kernels | |||
342 | repository, hence the name). | 353 | repository, hence the name). |
343 | 354 | ||
344 | These patches are usually released daily and represent the current state of | 355 | These patches are usually released daily and represent the current state of |
345 | Linus' tree. They are more experimental than -rc kernels since they are | 356 | Linus's tree. They are more experimental than -rc kernels since they are |
346 | generated automatically without even a cursory glance to see if they are | 357 | generated automatically without even a cursory glance to see if they are |
347 | sane. | 358 | sane. |
348 | 359 | ||
349 | -git patches are not incremental and apply either to a base 2.6.x kernel or | 360 | -git patches are not incremental and apply either to a base 2.6.x kernel or |
350 | a base 2.6.x-rc kernel - you can see which from their name. | 361 | a base 2.6.x-rc kernel -- you can see which from their name. |
351 | A patch named 2.6.12-git1 applies to the 2.6.12 kernel source and a patch | 362 | A patch named 2.6.12-git1 applies to the 2.6.12 kernel source and a patch |
352 | named 2.6.13-rc3-git2 applies to the source of the 2.6.13-rc3 kernel. | 363 | named 2.6.13-rc3-git2 applies to the source of the 2.6.13-rc3 kernel. |
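The naming rule above can be expressed as a short sketch. The function name git_patch_base is hypothetical, and the regular expression assumes names of exactly the two shapes shown here (2.6.x-gitN and 2.6.x-rcM-gitN).

```python
import re


def git_patch_base(patch_name):
    """Return the kernel version that a -git patch applies to.

    Hypothetical helper; only understands 2.6.x-gitN and 2.6.x-rcM-gitN.
    """
    match = re.fullmatch(r"(2\.6\.\d+(?:-rc\d+)?)-git\d+", patch_name)
    if match is None:
        raise ValueError("not a -git patch name: " + patch_name)
    # Everything before the trailing "-gitN" is the base kernel version.
    return match.group(1)


print(git_patch_base("2.6.12-git1"))
print(git_patch_base("2.6.13-rc3-git2"))
```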
353 | 364 | ||
@@ -390,12 +401,12 @@ You should generally strive to get your patches into mainline via -mm to | |||
390 | ensure maximum testing. | 401 | ensure maximum testing. |
391 | 402 | ||
392 | This branch is in constant flux and contains many experimental features, a | 403 | This branch is in constant flux and contains many experimental features, a |
393 | lot of debugging patches not appropriate for mainline etc and is the most | 404 | lot of debugging patches not appropriate for mainline etc., and is the most |
394 | experimental of the branches described in this document. | 405 | experimental of the branches described in this document. |
395 | 406 | ||
396 | These kernels are not appropriate for use on systems that are supposed to be | 407 | These kernels are not appropriate for use on systems that are supposed to be |
397 | stable and they are more risky to run than any of the other branches (make | 408 | stable and they are more risky to run than any of the other branches (make |
398 | sure you have up-to-date backups - that goes for any experimental kernel but | 409 | sure you have up-to-date backups -- that goes for any experimental kernel but |
399 | even more so for -mm kernels). | 410 | even more so for -mm kernels). |
400 | 411 | ||
401 | These kernels in addition to all the other experimental patches they contain | 412 | These kernels in addition to all the other experimental patches they contain |
@@ -433,7 +444,11 @@ $ cd .. | |||
433 | $ mv linux-2.6.12-mm1 linux-2.6.13-rc3-mm3 # rename the source dir | 444 | $ mv linux-2.6.12-mm1 linux-2.6.13-rc3-mm3 # rename the source dir |
434 | 445 | ||
435 | 446 | ||
436 | This concludes this list of explanations of the various kernel trees and I | 447 | This concludes this list of explanations of the various kernel trees. |
437 | hope you are now crystal clear on how to apply the various patches and help | 448 | I hope you are now clear on how to apply the various patches and help testing |
438 | testing the kernel. | 449 | the kernel. |
450 | |||
451 | Thank you's to Randy Dunlap, Rolf Eike Beer, Linus Torvalds, Bodo Eggert, | ||
452 | Johannes Stezenbach, Grant Coady, Pavel Machek and others that I may have | ||
453 | forgotten for their reviews and contributions to this document. | ||
439 | 454 | ||
diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt index 303c57a7fad9..8e63831971d5 100644 --- a/Documentation/block/biodoc.txt +++ b/Documentation/block/biodoc.txt | |||
@@ -263,14 +263,8 @@ A flag in the bio structure, BIO_BARRIER is used to identify a barrier i/o. | |||
263 | The generic i/o scheduler would make sure that it places the barrier request and | 263 | The generic i/o scheduler would make sure that it places the barrier request and |
264 | all other requests coming after it after all the previous requests in the | 264 | all other requests coming after it after all the previous requests in the |
265 | queue. Barriers may be implemented in different ways depending on the | 265 | queue. Barriers may be implemented in different ways depending on the |
266 | driver. A SCSI driver for example could make use of ordered tags to | 266 | driver. For more details regarding I/O barriers, please read barrier.txt |
267 | preserve the necessary ordering with a lower impact on throughput. For IDE | 267 | in this directory. |
268 | this might be two sync cache flush: a pre and post flush when encountering | ||
269 | a barrier write. | ||
270 | |||
271 | There is a provision for queues to indicate what kind of barriers they | ||
272 | can provide. This is as of yet unmerged, details will be added here once it | ||
273 | is in the kernel. | ||
274 | 268 | ||
275 | 1.2.2 Request Priority/Latency | 269 | 1.2.2 Request Priority/Latency |
276 | 270 | ||
diff --git a/Documentation/block/stat.txt b/Documentation/block/stat.txt new file mode 100644 index 000000000000..0dbc946de2ea --- /dev/null +++ b/Documentation/block/stat.txt | |||
@@ -0,0 +1,82 @@ | |||
1 | Block layer statistics in /sys/block/<dev>/stat | ||
2 | =============================================== | ||
3 | |||
4 | This file documents the contents of the /sys/block/<dev>/stat file. | ||
5 | |||
6 | The stat file provides several statistics about the state of block | ||
7 | device <dev>. | ||
8 | |||
9 | Q. Why are there multiple statistics in a single file? Doesn't sysfs | ||
10 | normally contain a single value per file? | ||
11 | A. By having a single file, the kernel can guarantee that the statistics | ||
12 | represent a consistent snapshot of the state of the device. If the | ||
13 | statistics were exported as multiple files containing one statistic | ||
14 | each, it would be impossible to guarantee that a set of readings | ||
15 | represent a single point in time. | ||
16 | |||
17 | The stat file consists of a single line of text containing 11 decimal | ||
18 | values separated by whitespace. The fields are summarized in the | ||
19 | following table, and described in more detail below. | ||
20 | |||
21 | Name            units         description | |
22 | ----            -----         ----------- | |
23 | read I/Os       requests      number of read I/Os processed | |
24 | read merges     requests      number of read I/Os merged with in-queue I/O | |
25 | read sectors    sectors       number of sectors read | |
26 | read ticks      milliseconds  total wait time for read requests | |
27 | write I/Os      requests      number of write I/Os processed | |
28 | write merges    requests      number of write I/Os merged with in-queue I/O | |
29 | write sectors   sectors       number of sectors written | |
30 | write ticks     milliseconds  total wait time for write requests | |
31 | in_flight       requests      number of I/Os currently in flight | |
32 | io_ticks        milliseconds  total time this block device has been active | |
33 | time_in_queue   milliseconds  total wait time for all requests | |
34 | |||
35 | read I/Os, write I/Os | ||
36 | ===================== | ||
37 | |||
38 | These values increment when an I/O request completes. | ||
39 | |||
40 | read merges, write merges | ||
41 | ========================= | ||
42 | |||
43 | These values increment when an I/O request is merged with an | ||
44 | already-queued I/O request. | ||
45 | |||
46 | read sectors, write sectors | ||
47 | =========================== | ||
48 | |||
49 | These values count the number of sectors read from or written to this | ||
50 | block device. The "sectors" in question are the standard UNIX 512-byte | ||
51 | sectors, not any device- or filesystem-specific block size. The | ||
52 | counters are incremented when the I/O completes. | ||
53 | |||
54 | read ticks, write ticks | ||
55 | ======================= | ||
56 | |||
57 | These values count the number of milliseconds that I/O requests have | ||
58 | waited on this block device. If there are multiple I/O requests waiting, | ||
59 | these values will increase at a rate greater than 1000/second; for | ||
60 | example, if 60 read requests wait for an average of 30 ms, the read ticks | |
61 | field will increase by 60*30 = 1800. | ||
62 | |||
63 | in_flight | ||
64 | ========= | ||
65 | |||
66 | This value counts the number of I/O requests that have been issued to | ||
67 | the device driver but have not yet completed. It does not include I/O | ||
68 | requests that are in the queue but not yet issued to the device driver. | ||
69 | |||
70 | io_ticks | ||
71 | ======== | ||
72 | |||
73 | This value counts the number of milliseconds during which the device has | ||
74 | had I/O requests queued. | ||
75 | |||
76 | time_in_queue | ||
77 | ============= | ||
78 | |||
79 | This value counts the number of milliseconds that I/O requests have waited | ||
80 | on this block device. If there are multiple I/O requests waiting, this | ||
81 | value will increase as the product of the number of milliseconds times the | ||
82 | number of requests waiting (see "read ticks" above for an example). | ||
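As a rough illustration of the layout described in this file, the sketch below splits a stat line into the 11 documented fields. The dictionary keys and the function names are made up for this example, and ignoring any extra trailing values is an assumption that goes beyond what this file documents.

```python
# Keys are illustrative names for the 11 documented fields, in file order.
FIELD_NAMES = [
    "read_ios", "read_merges", "read_sectors", "read_ticks",
    "write_ios", "write_merges", "write_sectors", "write_ticks",
    "in_flight", "io_ticks", "time_in_queue",
]


def parse_block_stat(line):
    """Split one stat line into a dict of integer fields."""
    values = line.split()
    if len(values) < len(FIELD_NAMES):
        raise ValueError("expected at least 11 values, got %d" % len(values))
    # Assumption: any values past the 11th are ignored.
    return dict(zip(FIELD_NAMES, (int(v) for v in values)))


def read_block_stat(dev):
    """Read and parse /sys/block/<dev>/stat for the given device name."""
    with open("/sys/block/%s/stat" % dev) as stat_file:
        return parse_block_stat(stat_file.read())


# The "read ticks" example: 60 reads that waited 30 ms each give 1800 ticks.
sample = parse_block_stat("60 0 480 1800 0 0 0 0 0 1800 1800")
average_read_wait_ms = sample["read_ticks"] / sample["read_ios"]
```

Dividing read ticks by read I/Os, as in the last line, recovers the average wait per read request from the "read ticks" example.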
diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt new file mode 100644 index 000000000000..08c5d04f3086 --- /dev/null +++ b/Documentation/cpu-hotplug.txt | |||
@@ -0,0 +1,357 @@ | |||
1 | CPU hotplug Support in Linux(tm) Kernel | ||
2 | |||
3 | Maintainers: | ||
4 | CPU Hotplug Core: | ||
5 | Rusty Russell <rusty@rustycorp.com.au> | ||
6 | Srivatsa Vaddagiri <vatsa@in.ibm.com> | ||
7 | i386: | ||
8 | Zwane Mwaikambo <zwane@arm.linux.org.uk> | ||
9 | ppc64: | ||
10 | Nathan Lynch <nathanl@austin.ibm.com> | ||
11 | Joel Schopp <jschopp@austin.ibm.com> | ||
12 | ia64/x86_64: | ||
13 | Ashok Raj <ashok.raj@intel.com> | ||
14 | |||
15 | Authors: Ashok Raj <ashok.raj@intel.com> | ||
16 | Lots of feedback: Nathan Lynch <nathanl@austin.ibm.com>, | ||
17 | Joel Schopp <jschopp@austin.ibm.com> | ||
18 | |||
19 | Introduction | ||
20 | |||
21 | Modern advances in system architectures have introduced advanced error | ||
22 | reporting and correction capabilities in processors. CPU architectures permit | ||
23 | partitioning support, where compute resources of a single CPU could be made | ||
24 | available to virtual machine environments. There are a couple of OEMs that | |
25 | support NUMA hardware which is hot-pluggable as well, where physical | |
26 | node insertion and removal require support for CPU hotplug. | |
27 | |||
28 | Such advances require CPUs available to a kernel to be removed either for | ||
29 | provisioning reasons, or for RAS purposes to keep an offending CPU off | ||
30 | the system execution path. Hence the need for CPU hotplug support in the | |
31 | Linux kernel. | ||
32 | |||
33 | A more novel use of CPU hotplug support is its use today in suspend/resume | |
34 | support for SMP. Dual-core and HT support makes even | |
35 | a laptop run SMP kernels which didn't support these methods. SMP support | ||
36 | for suspend/resume is a work in progress. | ||
37 | |||
38 | General Stuff about CPU Hotplug | ||
39 | -------------------------------- | ||
40 | |||
41 | Command Line Switches | ||
42 | --------------------- | ||
43 | maxcpus=n Restrict boot time cpus to n. For example, if you have 4 cpus, | |
44 | using maxcpus=2 will only boot 2. You can choose to bring the | |
45 | other cpus online later; read the FAQ below for more info. | |
46 | |||
47 | additional_cpus=n [x86_64 only] use this to limit hotpluggable cpus. | ||
48 | This option sets | ||
49 | cpu_possible_map = cpu_present_map + additional_cpus | ||
50 | |||
51 | CPU maps and such | ||
52 | ----------------- | ||
53 | [For more on cpumaps and the primitives to manipulate them, please check | |
54 | include/linux/cpumask.h, which has more descriptive text.] | |
55 | |||
56 | cpu_possible_map: Bitmap of possible CPUs that can ever be available in the | ||
57 | system. This is used to allocate some boot time memory for per_cpu variables | ||
58 | that aren't designed to grow/shrink as CPUs are made available or removed. | ||
59 | Once set during boot time discovery phase, the map is static, i.e. no bits | |
60 | are added or removed at any time. Trimming it accurately for your system needs | |
61 | upfront can save some boot time memory. See below for how we use heuristics | |
62 | in the x86_64 case to keep this in check. | |
63 | |||
64 | cpu_online_map: Bitmap of all CPUs currently online. It is set in __cpu_up() | |
65 | after a cpu is available for kernel scheduling and ready to receive | |
66 | interrupts from devices. It is cleared when a cpu is brought down using | |
67 | __cpu_disable(), before which all OS services including interrupts are | ||
68 | migrated to another target CPU. | ||
69 | |||
70 | cpu_present_map: Bitmap of CPUs currently present in the system. Not all | ||
71 | of them may be online. When physical hotplug is processed by the relevant | |
72 | subsystem (e.g. ACPI), the map can change: a bit is either added to or removed | |
73 | from the map depending on whether the event is hot-add or hot-remove. There are | |
74 | currently no locking rules. Typical usage is to init topology during boot, | |
75 | at which time hotplug is disabled. | ||
76 | |||
77 | You really don't need to manipulate any of the system cpu maps. They should | |
78 | be read-only for most uses. When setting up per-cpu resources, almost always use | |
79 | cpu_possible_map/for_each_cpu() to iterate. | ||
80 | |||
81 | Never use anything other than cpumask_t to represent a bitmap of CPUs. | |
82 | |||
83 | #include <linux/cpumask.h> | ||
84 | |||
85 | for_each_cpu - Iterate over cpu_possible_map | ||
86 | for_each_online_cpu - Iterate over cpu_online_map | ||
87 | for_each_present_cpu - Iterate over cpu_present_map | ||
88 | for_each_cpu_mask(x,mask) - Iterate over an arbitrary cpu mask. | |
89 | |||
90 | #include <linux/cpu.h> | ||
91 | lock_cpu_hotplug() and unlock_cpu_hotplug(): | ||
92 | |||
93 | The above calls are used to inhibit cpu hotplug operations. While holding the | ||
94 | cpucontrol mutex, cpu_online_map will not change. If you merely need to avoid | ||
95 | cpus going away, you could also use preempt_disable() and preempt_enable() | ||
96 | for those sections. Just remember the critical section cannot call any | ||
97 | function that can sleep or schedule this process away. The preempt_disable() | ||
98 | will work as long as stop_machine_run() is used to take a cpu down. | ||
99 | |||
100 | CPU Hotplug - Frequently Asked Questions. | ||
101 | |||
102 | Q: How do I enable my kernel to support CPU hotplug? | |
103 | A: When doing make defconfig, enable CPU hotplug support | |
104 | |||
105 | "Processor type and Features" -> Support for Hotpluggable CPUs | ||
106 | |||
107 | Make sure that you have CONFIG_HOTPLUG and CONFIG_SMP turned on as well. | |
108 | |||
109 | You would need to enable CONFIG_HOTPLUG_CPU for SMP suspend/resume support | ||
110 | as well. | ||
111 | |||
112 | Q: What architectures support CPU hotplug? | ||
113 | A: As of 2.6.14, the following architectures support CPU hotplug. | ||
114 | |||
115 | i386 (Intel), ppc, ppc64, parisc, s390, ia64 and x86_64 | ||
116 | |||
117 | Q: How to test if hotplug is supported on the newly built kernel? | ||
118 | A: You should now notice an entry in sysfs. | ||
119 | |||
120 | Check if sysfs is mounted, using the "mount" command. You should notice | ||
121 | an entry as shown below in the output. | ||
122 | |||
123 | .... | ||
124 | none on /sys type sysfs (rw) | ||
125 | .... | ||
126 | |||
127 | If this is not mounted, do the following. | |
128 | |||
129 | #mkdir /sys | |
130 | #mount -t sysfs sys /sys | ||
131 | |||
132 | Now you should see entries for all present cpus; the following is an example | |
133 | on an 8-way system. | |
134 | |||
135 | #pwd | ||
136 | /sys/devices/system/cpu | |
137 | #ls -l | ||
138 | total 0 | ||
139 | drwxr-xr-x 10 root root 0 Sep 19 07:44 . | ||
140 | drwxr-xr-x 13 root root 0 Sep 19 07:45 .. | ||
141 | drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu0 | ||
142 | drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu1 | ||
143 | drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu2 | ||
144 | drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu3 | ||
145 | drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu4 | ||
146 | drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu5 | ||
147 | drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu6 | ||
148 | drwxr-xr-x 3 root root 0 Sep 19 07:48 cpu7 | ||
149 | |||
150 | Under each directory you would find an "online" file which is the control | ||
151 | file to logically online/offline a processor. | ||
152 | |||
153 | Q: Does hot-add/hot-remove refer to physical add/remove of cpus? | ||
154 | A: The terms hot-add/remove may not be used very consistently in the code. | |
155 | CONFIG_HOTPLUG_CPU enables logical online/offline capability in the kernel. | |
156 | To support physical addition/removal, one would need some BIOS hooks and | ||
157 | the platform should have something like an attention button in PCI hotplug. | ||
158 | CONFIG_ACPI_HOTPLUG_CPU enables ACPI support for physical add/remove of CPUs. | ||
159 | |||
160 | Q: How do I logically offline a CPU? | |
161 | A: Do the following. | ||
162 | |||
163 | #echo 0 > /sys/devices/system/cpu/cpuX/online | ||
164 | |||
165 | Once the logical offline is successful, check | |
166 | |||
167 | #cat /proc/interrupts | ||
168 | |||
169 | You should now not see the CPU that you removed. Also, the online file will report | |
170 | the state as 0 when a cpu is offline and 1 when it is online. | |
171 | |||
172 | #To display the current cpu state. | ||
173 | #cat /sys/devices/system/cpu/cpuX/online | ||
174 | |||
175 | Q: Why can't I remove CPU0 on some systems? | |
176 | A: Some architectures may have some special dependency on a certain CPU. | ||
177 | |||
178 | For example, on IA64 platforms we have the ability to send platform interrupts to | |
179 | the OS, a.k.a. Corrected Platform Error Interrupts (CPEI). In current ACPI | |
180 | specifications, we didn't have a way to change the target CPU. Hence if the | ||
181 | current ACPI version doesn't support such re-direction, we disable that CPU | ||
182 | by making it not-removable. | ||
183 | |||
184 | In such cases you will also notice that the online file is missing under cpu0. | ||
185 | |||
186 | Q: How do I find out if a particular CPU is not removable? | |
187 | A: Depending on the implementation, some architectures may show this by the | ||
188 | absence of the "online" file. This is done if it can be determined ahead of | ||
189 | time that this CPU cannot be removed. | ||
190 | |||
191 | In some situations, this can be a run time check, i.e. if you try to remove the | |
192 | last CPU, this will not be permitted. You can find such failures by | ||
193 | investigating the return value of the "echo" command. | ||
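The checks described above can also be done from a script. The following is only a sketch: cpu_online_state is a made-up helper name, and treating a missing "online" file as "cannot be taken offline" follows the description in this document rather than any guaranteed interface.

```python
import os


def cpu_online_state(cpu, sysfs_cpu_dir="/sys/devices/system/cpu"):
    """Return True if the CPU is online, False if it is offline, and None
    if it has no "online" control file at all."""
    path = os.path.join(sysfs_cpu_dir, "cpu%d" % cpu, "online")
    try:
        with open(path) as online_file:
            # The file holds "1" for an online cpu and "0" for an offline one.
            return online_file.read().strip() == "1"
    except FileNotFoundError:
        # No control file: per the text above, this CPU is not removable.
        return None
```

The sysfs_cpu_dir parameter exists only so the helper can be pointed at a test directory; on a real system the default path is the one used throughout this FAQ.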
194 | |||
195 | Q: What happens when a CPU is being logically offlined? | ||
196 | A: The following happen, listed in no particular order :-) | ||
197 | |||
198 | - A notification is sent to in-kernel registered modules by sending an event | ||
199 | CPU_DOWN_PREPARE | ||
200 | - All processes are migrated away from this outgoing CPU to a new CPU | |
201 | - All interrupts targeted to this CPU are migrated to a new CPU | |
202 | - timers/bottom halves/tasklets are also migrated to a new CPU | |
203 | - Once all services are migrated, kernel calls an arch specific routine | |
204 | __cpu_disable() to perform arch specific cleanup. | |
205 | - Once this is successful, a CPU_DEAD event is sent to indicate successful | |
206 | cleanup. | |
207 | |||
208 | "It is expected that each service cleans up when the CPU_DOWN_PREPARE | ||
209 | notifier is called; when CPU_DEAD is called, it is expected that there is nothing | |
210 | running on behalf of this CPU that was offlined" | ||
211 | |||
212 | Q: If I have some kernel code that needs to be aware of CPU arrival and | |
213 | departure, how do I arrange for proper notification? | |
214 | A: This is what you would need in your kernel code to receive notifications. | ||
215 | |||
216 | #include <linux/cpu.h> | ||
217 | static int __cpuinit foobar_cpu_callback(struct notifier_block *nfb, | |
218 | 					    unsigned long action, void *hcpu) | |
219 | { | |
220 | 	unsigned int cpu = (unsigned long)hcpu; | |
221 | |||
222 | 	switch (action) { | |
223 | 	case CPU_ONLINE: | |
224 | 		foobar_online_action(cpu); | |
225 | 		break; | |
226 | 	case CPU_DEAD: | |
227 | 		foobar_dead_action(cpu); | |
228 | 		break; | |
229 | 	} | |
230 | 	return NOTIFY_OK; | |
231 | } | |
232 | |||
233 | static struct notifier_block foobar_cpu_notifier = | |
234 | { | |
235 | 	.notifier_call = foobar_cpu_callback, | |
236 | }; | |
237 | |||
238 | |||
239 | In your init function, | ||
240 | |||
241 | register_cpu_notifier(&foobar_cpu_notifier); | ||
242 | |||
243 | You can fail PREPARE notifiers if something doesn't work to prepare resources. | ||
244 | This will stop the activity and send a following CANCELED event back. | ||
245 | |||
246 | CPU_DEAD should not be failed; it's just a goodness indication, but bad | |
247 | things will happen if a notifier in the path sends a BAD notify code. | |
248 | |||
249 | Q: I don't see my action being called for all CPUs already up and running? | ||
250 | A: Yes, CPU notifiers are called only when new CPUs are onlined or offlined. | |
251 | If you need to perform some action for each cpu already in the system, then | ||
252 | |||
253 | for_each_online_cpu(i) { | |
254 | 	foobar_cpu_callback(&foobar_cpu_notifier, CPU_UP_PREPARE, (void *)(long)i); | |
255 | 	foobar_cpu_callback(&foobar_cpu_notifier, CPU_ONLINE, (void *)(long)i); | |
256 | } | |
257 | |||
258 | Q: If I would like to develop cpu hotplug support for a new architecture, | |
259 | what do I need at a minimum? | |
260 | A: The following are required for the CPU hotplug infrastructure to work | |
261 | correctly. | |
262 | |||
263 | - Make sure you have an entry in Kconfig to enable CONFIG_HOTPLUG_CPU | ||
264 | - __cpu_up() - Arch interface to bring up a CPU | ||
265 | - __cpu_disable() - Arch interface to shut down a CPU; no more interrupts | ||
266 | can be handled by the kernel after the routine | ||
267 | returns. This includes shutting down local APIC | ||
268 | timers etc. | ||
269 | - __cpu_die() - This is supposed to ensure the death of the CPU. | ||
270 | Look at example code in other architectures | ||
271 | that implement CPU hotplug. The processor is taken | ||
272 | down from the idle() loop for that specific | ||
273 | architecture. __cpu_die() typically waits for some | ||
274 | per_cpu state to be set, to be certain that the | ||
275 | processor dead routine has been called. | ||
276 | |||
277 | Q: I need to ensure that a particular cpu is not removed while some | ||
278 | work specific to this cpu is in progress. | ||
279 | A: First switch the current thread context to the preferred cpu: | ||
280 | |||
281 | int my_func_on_cpu(int cpu) | ||
282 | { | ||
283 | cpumask_t saved_mask, new_mask = CPU_MASK_NONE; | ||
284 | int curr_cpu, err = 0; | ||
285 | |||
286 | saved_mask = current->cpus_allowed; | ||
287 | cpu_set(cpu, new_mask); | ||
288 | err = set_cpus_allowed(current, new_mask); | ||
289 | |||
290 | if (err) | ||
291 | return err; | ||
292 | |||
293 | /* | ||
294 | * If we got scheduled out just after the return from | ||
295 | * set_cpus_allowed() before running the work, this ensures | ||
296 | * we stay locked. | ||
297 | */ | ||
298 | curr_cpu = get_cpu(); | ||
299 | |||
300 | if (curr_cpu != cpu) { | ||
301 | err = -EAGAIN; | ||
302 | goto ret; | ||
303 | } else { | ||
304 | /* | ||
305 | * Do work : But can't sleep, since get_cpu() disables preempt | ||
306 | */ | ||
307 | } | ||
308 | ret: | ||
309 | put_cpu(); | ||
310 | set_cpus_allowed(current, saved_mask); | ||
311 | return err; | ||
312 | } | ||
313 | |||
314 | |||
315 | Q: How do we determine how many CPUs are available for hotplug? | ||
316 | A: There is no clear spec-defined way from ACPI that can give us that | ||
317 | information today. Based on some input from Natalie of Unisys, | ||
318 | the ACPI MADT (Multiple APIC Description Tables) marks those possible | ||
319 | CPUs in a system with disabled status. | ||
320 | |||
321 | Andi implemented some simple heuristics that count the number of disabled | ||
322 | CPUs in MADT as hotpluggable CPUs. In the case there are no disabled CPUs | ||
323 | we assume 1/2 the number of CPUs currently present can be hotplugged. | ||
324 | |||
325 | Caveat: Today's ACPI MADT can only provide 256 entries since the apicid field | ||
326 | in MADT is only 8 bits. | ||
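
The counting heuristic described above can be sketched in plain C. This is only an illustration of the rule as stated in the text, not the kernel's code: the struct madt_cpu_entry type and the function name are hypothetical, and "present" is assumed here to mean the entries marked enabled.

```c
#include <stddef.h>

/* Hypothetical, simplified view of one MADT processor entry. */
struct madt_cpu_entry {
    unsigned char apic_id; /* 8-bit APIC id, hence at most 256 entries */
    int enabled;           /* nonzero if the entry is marked enabled */
};

/*
 * Count disabled entries as hotpluggable CPUs. If there are none,
 * fall back to half the number of CPUs currently present
 * (assumed here to be the enabled entries).
 */
unsigned int estimate_hotplug_cpus(const struct madt_cpu_entry *entries,
                                   size_t n)
{
    unsigned int present = 0, disabled = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (entries[i].enabled)
            present++;
        else
            disabled++;
    }
    return disabled ? disabled : present / 2;
}
```

With two enabled entries and one disabled entry the estimate is the disabled count; with four enabled entries and none disabled, it is half of the present CPUs.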
327 | |||
328 | User Space Notification | ||
329 | |||
330 | Hotplug support for devices is common in Linux today. It's being used today to | ||
331 | support automatic configuration of network, usb and pci devices. A hotplug | ||
332 | event can be used to invoke an agent script to perform the configuration task. | ||
333 | |||
334 | You can add /etc/hotplug/cpu.agent as a user space script to handle hotplug | ||
335 | notifications. | ||
336 | |||
337 | #!/bin/bash | ||
338 | # $Id: cpu.agent | ||
339 | # Kernel hotplug params include: | ||
340 | #ACTION=%s [online or offline] | ||
341 | #DEVPATH=%s | ||
342 | # | ||
343 | cd /etc/hotplug | ||
344 | . ./hotplug.functions | ||
345 | |||
346 | case $ACTION in | ||
347 | online) | ||
348 | echo `date` ":cpu.agent" add cpu >> /tmp/hotplug.txt | ||
349 | ;; | ||
350 | offline) | ||
351 | echo `date` ":cpu.agent" remove cpu >>/tmp/hotplug.txt | ||
352 | ;; | ||
353 | *) | ||
354 | debug_mesg CPU $ACTION event not supported | ||
355 | exit 1 | ||
356 | ;; | ||
357 | esac | ||
diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt index a09a8eb80665..9e49b1c35729 100644 --- a/Documentation/cpusets.txt +++ b/Documentation/cpusets.txt | |||
@@ -14,7 +14,10 @@ CONTENTS: | |||
14 | 1.1 What are cpusets ? | 14 | 1.1 What are cpusets ? |
15 | 1.2 Why are cpusets needed ? | 15 | 1.2 Why are cpusets needed ? |
16 | 1.3 How are cpusets implemented ? | 16 | 1.3 How are cpusets implemented ? |
17 | 1.4 How do I use cpusets ? | 17 | 1.4 What are exclusive cpusets ? |
18 | 1.5 What does notify_on_release do ? | ||
19 | 1.6 What is memory_pressure ? | ||
20 | 1.7 How do I use cpusets ? | ||
18 | 2. Usage Examples and Syntax | 21 | 2. Usage Examples and Syntax |
19 | 2.1 Basic Usage | 22 | 2.1 Basic Usage |
20 | 2.2 Adding/removing cpus | 23 | 2.2 Adding/removing cpus |
@@ -49,29 +52,6 @@ its cpus_allowed vector, and the kernel page allocator will not | |||
49 | allocate a page on a node that is not allowed in the requesting tasks | 52 | allocate a page on a node that is not allowed in the requesting tasks |
50 | mems_allowed vector. | 53 | mems_allowed vector. |
51 | 54 | ||
52 | If a cpuset is cpu or mem exclusive, no other cpuset, other than a direct | ||
53 | ancestor or descendent, may share any of the same CPUs or Memory Nodes. | ||
54 | A cpuset that is cpu exclusive has a sched domain associated with it. | ||
55 | The sched domain consists of all cpus in the current cpuset that are not | ||
56 | part of any exclusive child cpusets. | ||
57 | This ensures that the scheduler load balacing code only balances | ||
58 | against the cpus that are in the sched domain as defined above and not | ||
59 | all of the cpus in the system. This removes any overhead due to | ||
60 | load balancing code trying to pull tasks outside of the cpu exclusive | ||
61 | cpuset only to be prevented by the tasks' cpus_allowed mask. | ||
62 | |||
63 | A cpuset that is mem_exclusive restricts kernel allocations for | ||
64 | page, buffer and other data commonly shared by the kernel across | ||
65 | multiple users. All cpusets, whether mem_exclusive or not, restrict | ||
66 | allocations of memory for user space. This enables configuring a | ||
67 | system so that several independent jobs can share common kernel | ||
68 | data, such as file system pages, while isolating each jobs user | ||
69 | allocation in its own cpuset. To do this, construct a large | ||
70 | mem_exclusive cpuset to hold all the jobs, and construct child, | ||
71 | non-mem_exclusive cpusets for each individual job. Only a small | ||
72 | amount of typical kernel memory, such as requests from interrupt | ||
73 | handlers, is allowed to be taken outside even a mem_exclusive cpuset. | ||
74 | |||
75 | User level code may create and destroy cpusets by name in the cpuset | 55 | User level code may create and destroy cpusets by name in the cpuset |
76 | virtual file system, manage the attributes and permissions of these | 56 | virtual file system, manage the attributes and permissions of these |
77 | cpusets and which CPUs and Memory Nodes are assigned to each cpuset, | 57 | cpusets and which CPUs and Memory Nodes are assigned to each cpuset, |
@@ -192,9 +172,15 @@ containing the following files describing that cpuset: | |||
192 | 172 | ||
193 | - cpus: list of CPUs in that cpuset | 173 | - cpus: list of CPUs in that cpuset |
194 | - mems: list of Memory Nodes in that cpuset | 174 | - mems: list of Memory Nodes in that cpuset |
175 | - memory_migrate flag: if set, move pages to cpuset's nodes | ||
195 | - cpu_exclusive flag: is cpu placement exclusive? | 176 | - cpu_exclusive flag: is cpu placement exclusive? |
196 | - mem_exclusive flag: is memory placement exclusive? | 177 | - mem_exclusive flag: is memory placement exclusive? |
197 | - tasks: list of tasks (by pid) attached to that cpuset | 178 | - tasks: list of tasks (by pid) attached to that cpuset |
179 | - notify_on_release flag: run /sbin/cpuset_release_agent on exit? | ||
180 | - memory_pressure: measure of how much paging pressure in cpuset | ||
181 | |||
182 | In addition, the root cpuset only has the following file: | ||
183 | - memory_pressure_enabled flag: compute memory_pressure? | ||
198 | 184 | ||
199 | New cpusets are created using the mkdir system call or shell | 185 | New cpusets are created using the mkdir system call or shell |
200 | command. The properties of a cpuset, such as its flags, allowed | 186 | command. The properties of a cpuset, such as its flags, allowed |
@@ -228,7 +214,108 @@ exclusive cpuset. Also, the use of a Linux virtual file system (vfs) | |||
228 | to represent the cpuset hierarchy provides for a familiar permission | 214 | to represent the cpuset hierarchy provides for a familiar permission |
229 | and name space for cpusets, with a minimum of additional kernel code. | 215 | and name space for cpusets, with a minimum of additional kernel code. |
230 | 216 | ||
231 | 1.4 How do I use cpusets ? | 217 | |
218 | 1.4 What are exclusive cpusets ? | ||
219 | -------------------------------- | ||
220 | |||
221 | If a cpuset is cpu or mem exclusive, no other cpuset, other than | ||
222 | a direct ancestor or descendent, may share any of the same CPUs or | ||
223 | Memory Nodes. | ||
224 | |||
225 | A cpuset that is cpu_exclusive has a scheduler (sched) domain | ||
226 | associated with it. The sched domain consists of all CPUs in the | ||
227 | current cpuset that are not part of any exclusive child cpusets. | ||
228 | This ensures that the scheduler load balancing code only balances | ||
229 | against the CPUs that are in the sched domain as defined above and | ||
230 | not all of the CPUs in the system. This removes any overhead due to | ||
231 | load balancing code trying to pull tasks outside of the cpu_exclusive | ||
232 | cpuset only to be prevented by the tasks' cpus_allowed mask. | ||
233 | |||
234 | A cpuset that is mem_exclusive restricts kernel allocations for | ||
235 | page, buffer and other data commonly shared by the kernel across | ||
236 | multiple users. All cpusets, whether mem_exclusive or not, restrict | ||
237 | allocations of memory for user space. This enables configuring a | ||
238 | system so that several independent jobs can share common kernel data, | ||
239 | such as file system pages, while isolating each job's user allocation in | ||
240 | its own cpuset. To do this, construct a large mem_exclusive cpuset to | ||
241 | hold all the jobs, and construct child, non-mem_exclusive cpusets for | ||
242 | each individual job. Only a small amount of typical kernel memory, | ||
243 | such as requests from interrupt handlers, is allowed to be taken | ||
244 | outside even a mem_exclusive cpuset. | ||
245 | |||
246 | |||
247 | 1.5 What does notify_on_release do ? | ||
248 | ------------------------------------ | ||
249 | |||
250 | If the notify_on_release flag is enabled (1) in a cpuset, then whenever | ||
251 | the last task in the cpuset leaves (exits or attaches to some other | ||
252 | cpuset) and the last child cpuset of that cpuset is removed, then | ||
253 | the kernel runs the command /sbin/cpuset_release_agent, supplying the | ||
254 | pathname (relative to the mount point of the cpuset file system) of the | ||
255 | abandoned cpuset. This enables automatic removal of abandoned cpusets. | ||
256 | The default value of notify_on_release in the root cpuset at system | ||
257 | boot is disabled (0). The default value of other cpusets at creation | ||
258 | is the current value of their parent's notify_on_release setting. | ||
259 | |||
260 | |||
261 | 1.6 What is memory_pressure ? | ||
262 | ----------------------------- | ||
263 | The memory_pressure of a cpuset provides a simple per-cpuset metric | ||
264 | of the rate at which the tasks in a cpuset are attempting to free up | ||
265 | in-use memory on the nodes of the cpuset to satisfy additional memory | ||
266 | requests. | ||
267 | |||
268 | This enables batch managers monitoring jobs running in dedicated | ||
269 | cpusets to efficiently detect what level of memory pressure that job | ||
270 | is causing. | ||
271 | |||
272 | This is useful both on tightly managed systems running a wide mix of | ||
273 | submitted jobs, which may choose to terminate or re-prioritize jobs that | ||
274 | are trying to use more memory than allowed on the nodes assigned them, | ||
275 | and with tightly coupled, long running, massively parallel scientific | ||
276 | computing jobs that will dramatically fail to meet required performance | ||
277 | goals if they start to use more memory than allowed to them. | ||
278 | |||
279 | This mechanism provides a very economical way for the batch manager | ||
280 | to monitor a cpuset for signs of memory pressure. It's up to the | ||
281 | batch manager or other user code to decide what to do about it and | ||
282 | take action. | ||
283 | |||
284 | ==> Unless this feature is enabled by writing "1" to the special file | ||
285 | /dev/cpuset/memory_pressure_enabled, the hook in the rebalance | ||
286 | code of __alloc_pages() for this metric reduces to simply noticing | ||
287 | that the cpuset_memory_pressure_enabled flag is zero. So only | ||
288 | systems that enable this feature will compute the metric. | ||
289 | |||
290 | Why a per-cpuset, running average: | ||
291 | |||
292 | Because this meter is per-cpuset, rather than per-task or mm, | ||
293 | the system load imposed by a batch scheduler monitoring this | ||
294 | metric is sharply reduced on large systems, because a scan of | ||
295 | the tasklist can be avoided on each set of queries. | ||
296 | |||
297 | Because this meter is a running average, instead of an accumulating | ||
298 | counter, a batch scheduler can detect memory pressure with a | ||
299 | single read, instead of having to read and accumulate results | ||
300 | for a period of time. | ||
301 | |||
302 | Because this meter is per-cpuset rather than per-task or mm, | ||
303 | the batch scheduler can obtain the key information, memory | ||
304 | pressure in a cpuset, with a single read, rather than having to | ||
305 | query and accumulate results over all the (dynamically changing) | ||
306 | set of tasks in the cpuset. | ||
307 | |||
308 | A per-cpuset simple digital filter (requires a spinlock and 3 words | ||
309 | of data per-cpuset) is kept, and updated by any task attached to that | ||
310 | cpuset, if it enters the synchronous (direct) page reclaim code. | ||
311 | |||
312 | A per-cpuset file provides an integer number representing the recent | ||
313 | (half-life of 10 seconds) rate of direct page reclaims caused by | ||
314 | the tasks in the cpuset, in units of reclaims attempted per second, | ||
315 | times 1000. | ||
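
The running-average meter described above can be sketched as a small decaying filter. This is a user space illustration under stated assumptions, not the kernel's implementation: the names are hypothetical, the clock is passed in as whole seconds, the spinlock is omitted, and the per-second decay coefficient 933/1000 is chosen only because 0.933^10 is roughly 0.5, which gives the 10 second half-life mentioned in the text.

```c
/* Three words of state per meter, as in the description above. */
struct pressure_meter {
    long cnt;  /* scaled events not yet folded into val */
    long val;  /* decayed running average: events per second * 1000 */
    long time; /* second at which val was last updated */
};

#define PM_SCALE    1000 /* fixed point scale */
#define PM_COEF     933  /* per-second decay; 0.933^10 is about 0.5 */
#define PM_MAXTICKS 99   /* further decay steps change almost nothing */

/* Decay val once per elapsed second, then fold in pending events. */
void pm_update(struct pressure_meter *m, long now)
{
    long ticks = now - m->time;

    if (ticks <= 0)
        return;
    if (ticks > PM_MAXTICKS)
        ticks = PM_MAXTICKS;
    while (ticks-- > 0)
        m->val = (PM_COEF * m->val) / PM_SCALE;
    m->time = now;

    m->val += ((PM_SCALE - PM_COEF) * m->cnt) / PM_SCALE;
    m->cnt = 0;
}

/* Record one direct reclaim event at time now. */
void pm_mark_event(struct pressure_meter *m, long now)
{
    pm_update(m, now);
    m->cnt += PM_SCALE;
}

/* A single read yields the current rate, with no accumulation needed. */
long pm_read(struct pressure_meter *m, long now)
{
    pm_update(m, now);
    return m->val;
}
```

Because the value decays on its own, a monitor only needs one read per cpuset: a steady rate of one event per second settles near 1000, and the reading drops to about half its value after ten idle seconds.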
316 | |||
317 | |||
318 | 1.7 How do I use cpusets ? | ||
232 | -------------------------- | 319 | -------------------------- |
233 | 320 | ||
234 | In order to minimize the impact of cpusets on critical kernel | 321 | In order to minimize the impact of cpusets on critical kernel |
@@ -277,6 +364,30 @@ rewritten to the 'tasks' file of its cpuset. This is done to avoid | |||
277 | impacting the scheduler code in the kernel with a check for changes | 364 | impacting the scheduler code in the kernel with a check for changes |
278 | in a tasks processor placement. | 365 | in a tasks processor placement. |
279 | 366 | ||
367 | Normally, once a page is allocated (given a physical page | ||
368 | of main memory) then that page stays on whatever node it | ||
369 | was allocated, so long as it remains allocated, even if the | ||
370 | cpuset's memory placement policy 'mems' subsequently changes. | ||
371 | If the cpuset flag file 'memory_migrate' is set true, then when | ||
372 | tasks are attached to that cpuset, any pages that task had | ||
373 | allocated to it on nodes in its previous cpuset are migrated | ||
374 | to the task's new cpuset. Depending on the implementation, | ||
375 | this migration may either be done by swapping the page out, | ||
376 | so that the next time the page is referenced, it will be paged | ||
377 | into the task's new cpuset, usually on the node where it was | ||
378 | referenced, or this migration may be done by directly copying | ||
379 | the pages from the task's previous cpuset to the new cpuset, | ||
380 | where possible to the same node, relative to the new cpuset, | ||
381 | as the node that held the page, relative to the old cpuset. | ||
382 | Also if 'memory_migrate' is set true, then if that cpuset's | ||
383 | 'mems' file is modified, pages allocated to tasks in that | ||
384 | cpuset, that were on nodes in the previous setting of 'mems', | ||
385 | will be moved to nodes in the new setting of 'mems'. Again, | ||
386 | depending on the implementation, this might be done by swapping, | ||
387 | or by direct copying. In either case, pages that were not in | ||
388 | the task's prior cpuset, or in the cpuset's prior 'mems' setting, | ||
389 | will not be moved. | ||
390 | |||
280 | There is an exception to the above. If hotplug functionality is used | 391 | There is an exception to the above. If hotplug functionality is used |
281 | to remove all the CPUs that are currently assigned to a cpuset, | 392 | to remove all the CPUs that are currently assigned to a cpuset, |
282 | then the kernel will automatically update the cpus_allowed of all | 393 | then the kernel will automatically update the cpus_allowed of all |
diff --git a/Documentation/dvb/avermedia.txt b/Documentation/dvb/avermedia.txt index 2dc260b2b0a4..068070ff13cd 100644 --- a/Documentation/dvb/avermedia.txt +++ b/Documentation/dvb/avermedia.txt | |||
@@ -150,7 +150,8 @@ Getting the card going | |||
150 | 150 | ||
151 | The frontend module sp887x.o, requires an external firmware. | 151 | The frontend module sp887x.o, requires an external firmware. |
152 | Please use the command "get_dvb_firmware sp887x" to download | 152 | Please use the command "get_dvb_firmware sp887x" to download |
153 | it. Then copy it to /usr/lib/hotplug/firmware. | 153 | it. Then copy it to /usr/lib/hotplug/firmware or /lib/firmware/ |
154 | (depending on configuration of firmware hotplug). | ||
154 | 155 | ||
155 | Receiving DVB-T in Australia | 156 | Receiving DVB-T in Australia |
156 | 157 | ||
diff --git a/Documentation/dvb/get_dvb_firmware b/Documentation/dvb/get_dvb_firmware index be6eb4c75991..75c28a174092 100644 --- a/Documentation/dvb/get_dvb_firmware +++ b/Documentation/dvb/get_dvb_firmware | |||
@@ -23,7 +23,7 @@ use IO::Handle; | |||
23 | 23 | ||
24 | @components = ( "sp8870", "sp887x", "tda10045", "tda10046", "av7110", "dec2000t", | 24 | @components = ( "sp8870", "sp887x", "tda10045", "tda10046", "av7110", "dec2000t", |
25 | "dec2540t", "dec3000s", "vp7041", "dibusb", "nxt2002", "nxt2004", | 25 | "dec2540t", "dec3000s", "vp7041", "dibusb", "nxt2002", "nxt2004", |
26 | "or51211", "or51132_qam", "or51132_vsb"); | 26 | "or51211", "or51132_qam", "or51132_vsb", "bluebird"); |
27 | 27 | ||
28 | # Check args | 28 | # Check args |
29 | syntax() if (scalar(@ARGV) != 1); | 29 | syntax() if (scalar(@ARGV) != 1); |
@@ -34,7 +34,11 @@ for ($i=0; $i < scalar(@components); $i++) { | |||
34 | if ($cid eq $components[$i]) { | 34 | if ($cid eq $components[$i]) { |
35 | $outfile = eval($cid); | 35 | $outfile = eval($cid); |
36 | die $@ if $@; | 36 | die $@ if $@; |
37 | print STDERR "Firmware $outfile extracted successfully. Now copy it to either /lib/firmware or /usr/lib/hotplug/firmware/ (depending on your hotplug version).\n"; | 37 | print STDERR <<EOF; |
38 | Firmware $outfile extracted successfully. | ||
39 | Now copy it to either /usr/lib/hotplug/firmware or /lib/firmware | ||
40 | (depending on configuration of firmware hotplug). | ||
41 | EOF | ||
38 | exit(0); | 42 | exit(0); |
39 | } | 43 | } |
40 | } | 44 | } |
@@ -243,7 +247,7 @@ sub nxt2002 { | |||
243 | my $tmpdir = tempdir(DIR => "/tmp", CLEANUP => 1); | 247 | my $tmpdir = tempdir(DIR => "/tmp", CLEANUP => 1); |
244 | 248 | ||
245 | checkstandard(); | 249 | checkstandard(); |
246 | 250 | ||
247 | wgetfile($sourcefile, $url); | 251 | wgetfile($sourcefile, $url); |
248 | unzip($sourcefile, $tmpdir); | 252 | unzip($sourcefile, $tmpdir); |
249 | verify("$tmpdir/SkyNETU.sys", $hash); | 253 | verify("$tmpdir/SkyNETU.sys", $hash); |
@@ -308,6 +312,19 @@ sub or51132_vsb { | |||
308 | $fwfile; | 312 | $fwfile; |
309 | } | 313 | } |
310 | 314 | ||
315 | sub bluebird { | ||
316 | my $url = "http://www.linuxtv.org/download/dvb/firmware/dvb-usb-bluebird-01.fw"; | ||
317 | my $outfile = "dvb-usb-bluebird-01.fw"; | ||
318 | my $hash = "658397cb9eba9101af9031302671f49d"; | ||
319 | |||
320 | checkstandard(); | ||
321 | |||
322 | wgetfile($outfile, $url); | ||
323 | verify($outfile,$hash); | ||
324 | |||
325 | $outfile; | ||
326 | } | ||
327 | |||
311 | # --------------------------------------------------------------- | 328 | # --------------------------------------------------------------- |
312 | # Utilities | 329 | # Utilities |
313 | 330 | ||
diff --git a/Documentation/dvb/ttusb-dec.txt b/Documentation/dvb/ttusb-dec.txt index 5c1e984c26a7..b2f271cd784b 100644 --- a/Documentation/dvb/ttusb-dec.txt +++ b/Documentation/dvb/ttusb-dec.txt | |||
@@ -41,4 +41,5 @@ Hotplug Firmware Loading for 2.6 kernels | |||
41 | For 2.6 kernels the firmware is loaded at the point that the driver module is | 41 | For 2.6 kernels the firmware is loaded at the point that the driver module is |
42 | loaded. See linux/Documentation/dvb/firmware.txt for more information. | 42 | loaded. See linux/Documentation/dvb/firmware.txt for more information. |
43 | 43 | ||
44 | Copy the three files downloaded above into the /usr/lib/hotplug/firmware directory. | 44 | Copy the three files downloaded above into the /usr/lib/hotplug/firmware or |
45 | /lib/firmware directory (depending on configuration of firmware hotplug). | ||
diff --git a/Documentation/fb/cyblafb/bugs b/Documentation/fb/cyblafb/bugs index f90cc66ea919..9443a6d72cdd 100644 --- a/Documentation/fb/cyblafb/bugs +++ b/Documentation/fb/cyblafb/bugs | |||
@@ -11,4 +11,3 @@ Untested features | |||
11 | 11 | ||
12 | All LCD stuff is untested. If it worked in tridentfb, it should work in | 12 | All LCD stuff is untested. If it worked in tridentfb, it should work in |
13 | cyblafb. Please test and report the results to Knut_Petersen@t-online.de. | 13 | cyblafb. Please test and report the results to Knut_Petersen@t-online.de. |
14 | |||
diff --git a/Documentation/fb/cyblafb/fb.modes b/Documentation/fb/cyblafb/fb.modes index cf4351fc32ff..fe0e5223ba86 100644 --- a/Documentation/fb/cyblafb/fb.modes +++ b/Documentation/fb/cyblafb/fb.modes | |||
@@ -14,142 +14,141 @@ | |||
14 | # | 14 | # |
15 | 15 | ||
16 | mode "640x480-50" | 16 | mode "640x480-50" |
17 | geometry 640 480 640 3756 8 | 17 | geometry 640 480 2048 4096 8 |
18 | timings 47619 4294967256 24 17 0 216 3 | 18 | timings 47619 4294967256 24 17 0 216 3 |
19 | endmode | 19 | endmode |
20 | 20 | ||
21 | mode "640x480-60" | 21 | mode "640x480-60" |
22 | geometry 640 480 640 3756 8 | 22 | geometry 640 480 2048 4096 8 |
23 | timings 39682 4294967256 24 17 0 216 3 | 23 | timings 39682 4294967256 24 17 0 216 3 |
24 | endmode | 24 | endmode |
25 | 25 | ||
26 | mode "640x480-70" | 26 | mode "640x480-70" |
27 | geometry 640 480 640 3756 8 | 27 | geometry 640 480 2048 4096 8 |
28 | timings 34013 4294967256 24 17 0 216 3 | 28 | timings 34013 4294967256 24 17 0 216 3 |
29 | endmode | 29 | endmode |
30 | 30 | ||
31 | mode "640x480-72" | 31 | mode "640x480-72" |
32 | geometry 640 480 640 3756 8 | 32 | geometry 640 480 2048 4096 8 |
33 | timings 33068 4294967256 24 17 0 216 3 | 33 | timings 33068 4294967256 24 17 0 216 3 |
34 | endmode | 34 | endmode |
35 | 35 | ||
36 | mode "640x480-75" | 36 | mode "640x480-75" |
37 | geometry 640 480 640 3756 8 | 37 | geometry 640 480 2048 4096 8 |
38 | timings 31746 4294967256 24 17 0 216 3 | 38 | timings 31746 4294967256 24 17 0 216 3 |
39 | endmode | 39 | endmode |
40 | 40 | ||
41 | mode "640x480-80" | 41 | mode "640x480-80" |
42 | geometry 640 480 640 3756 8 | 42 | geometry 640 480 2048 4096 8 |
43 | timings 29761 4294967256 24 17 0 216 3 | 43 | timings 29761 4294967256 24 17 0 216 3 |
44 | endmode | 44 | endmode |
45 | 45 | ||
46 | mode "640x480-85" | 46 | mode "640x480-85" |
47 | geometry 640 480 640 3756 8 | 47 | geometry 640 480 2048 4096 8 |
48 | timings 28011 4294967256 24 17 0 216 3 | 48 | timings 28011 4294967256 24 17 0 216 3 |
49 | endmode | 49 | endmode |
50 | 50 | ||
51 | mode "800x600-50" | 51 | mode "800x600-50" |
52 | geometry 800 600 800 3221 8 | 52 | geometry 800 600 2048 4096 8 |
53 | timings 30303 96 24 14 0 136 11 | 53 | timings 30303 96 24 14 0 136 11 |
54 | endmode | 54 | endmode |
55 | 55 | ||
56 | mode "800x600-60" | 56 | mode "800x600-60" |
57 | geometry 800 600 800 3221 8 | 57 | geometry 800 600 2048 4096 8 |
58 | timings 25252 96 24 14 0 136 11 | 58 | timings 25252 96 24 14 0 136 11 |
59 | endmode | 59 | endmode |
60 | 60 | ||
61 | mode "800x600-70" | 61 | mode "800x600-70" |
62 | geometry 800 600 800 3221 8 | 62 | geometry 800 600 2048 4096 8 |
63 | timings 21645 96 24 14 0 136 11 | 63 | timings 21645 96 24 14 0 136 11 |
64 | endmode | 64 | endmode |
65 | 65 | ||
66 | mode "800x600-72" | 66 | mode "800x600-72" |
67 | geometry 800 600 800 3221 8 | 67 | geometry 800 600 2048 4096 8 |
68 | timings 21043 96 24 14 0 136 11 | 68 | timings 21043 96 24 14 0 136 11 |
69 | endmode | 69 | endmode |
70 | 70 | ||
71 | mode "800x600-75" | 71 | mode "800x600-75" |
72 | geometry 800 600 800 3221 8 | 72 | geometry 800 600 2048 4096 8 |
73 | timings 20202 96 24 14 0 136 11 | 73 | timings 20202 96 24 14 0 136 11 |
74 | endmode | 74 | endmode |
75 | 75 | ||
76 | mode "800x600-80" | 76 | mode "800x600-80" |
77 | geometry 800 600 800 3221 8 | 77 | geometry 800 600 2048 4096 8 |
78 | timings 18939 96 24 14 0 136 11 | 78 | timings 18939 96 24 14 0 136 11 |
79 | endmode | 79 | endmode |
80 | 80 | ||
81 | mode "800x600-85" | 81 | mode "800x600-85" |
82 | geometry 800 600 800 3221 8 | 82 | geometry 800 600 2048 4096 8 |
83 | timings 17825 96 24 14 0 136 11 | 83 | timings 17825 96 24 14 0 136 11 |
84 | endmode | 84 | endmode |
85 | 85 | ||
86 | mode "1024x768-50" | 86 | mode "1024x768-50" |
87 | geometry 1024 768 1024 2815 8 | 87 | geometry 1024 768 2048 4096 8 |
88 | timings 19054 144 24 29 0 120 3 | 88 | timings 19054 144 24 29 0 120 3 |
89 | endmode | 89 | endmode |
90 | 90 | ||
91 | mode "1024x768-60" | 91 | mode "1024x768-60" |
92 | geometry 1024 768 1024 2815 8 | 92 | geometry 1024 768 2048 4096 8 |
93 | timings 15880 144 24 29 0 120 3 | 93 | timings 15880 144 24 29 0 120 3 |
94 | endmode | 94 | endmode |
95 | 95 | ||
96 | mode "1024x768-70" | 96 | mode "1024x768-70" |
97 | geometry 1024 768 1024 2815 8 | 97 | geometry 1024 768 2048 4096 8 |
98 | timings 13610 144 24 29 0 120 3 | 98 | timings 13610 144 24 29 0 120 3 |
99 | endmode | 99 | endmode |
100 | 100 | ||
101 | mode "1024x768-72" | 101 | mode "1024x768-72" |
102 | geometry 1024 768 1024 2815 8 | 102 | geometry 1024 768 2048 4096 8 |
103 | timings 13232 144 24 29 0 120 3 | 103 | timings 13232 144 24 29 0 120 3 |
104 | endmode | 104 | endmode |
105 | 105 | ||
106 | mode "1024x768-75" | 106 | mode "1024x768-75" |
107 | geometry 1024 768 1024 2815 8 | 107 | geometry 1024 768 2048 4096 8 |
108 | timings 12703 144 24 29 0 120 3 | 108 | timings 12703 144 24 29 0 120 3 |
109 | endmode | 109 | endmode |
110 | 110 | ||
111 | mode "1024x768-80" | 111 | mode "1024x768-80" |
112 | geometry 1024 768 1024 2815 8 | 112 | geometry 1024 768 2048 4096 8 |
113 | timings 11910 144 24 29 0 120 3 | 113 | timings 11910 144 24 29 0 120 3 |
114 | endmode | 114 | endmode |
115 | 115 | ||
116 | mode "1024x768-85" | 116 | mode "1024x768-85" |
117 | geometry 1024 768 1024 2815 8 | 117 | geometry 1024 768 2048 4096 8 |
118 | timings 11209 144 24 29 0 120 3 | 118 | timings 11209 144 24 29 0 120 3 |
119 | endmode | 119 | endmode |
120 | 120 | ||
121 | mode "1280x1024-50" | 121 | mode "1280x1024-50" |
122 | geometry 1280 1024 1280 2662 8 | 122 | geometry 1280 1024 2048 4096 8 |
123 | timings 11114 232 16 39 0 160 3 | 123 | timings 11114 232 16 39 0 160 3 |
124 | endmode | 124 | endmode |
125 | 125 | ||
126 | mode "1280x1024-60" | 126 | mode "1280x1024-60" |
127 | geometry 1280 1024 1280 2662 8 | 127 | geometry 1280 1024 2048 4096 8 |
128 | timings 9262 232 16 39 0 160 3 | 128 | timings 9262 232 16 39 0 160 3 |
129 | endmode | 129 | endmode |
130 | 130 | ||
131 | mode "1280x1024-70" | 131 | mode "1280x1024-70" |
132 | geometry 1280 1024 1280 2662 8 | 132 | geometry 1280 1024 2048 4096 8 |
133 | timings 7939 232 16 39 0 160 3 | 133 | timings 7939 232 16 39 0 160 3 |
134 | endmode | 134 | endmode |
135 | 135 | ||
136 | mode "1280x1024-72" | 136 | mode "1280x1024-72" |
137 | geometry 1280 1024 1280 2662 8 | 137 | geometry 1280 1024 2048 4096 8 |
138 | timings 7719 232 16 39 0 160 3 | 138 | timings 7719 232 16 39 0 160 3 |
139 | endmode | 139 | endmode |
140 | 140 | ||
141 | mode "1280x1024-75" | 141 | mode "1280x1024-75" |
142 | geometry 1280 1024 1280 2662 8 | 142 | geometry 1280 1024 2048 4096 8 |
143 | timings 7410 232 16 39 0 160 3 | 143 | timings 7410 232 16 39 0 160 3 |
144 | endmode | 144 | endmode |
145 | 145 | ||
146 | mode "1280x1024-80" | 146 | mode "1280x1024-80" |
147 | geometry 1280 1024 1280 2662 8 | 147 | geometry 1280 1024 2048 4096 8 |
148 | timings 6946 232 16 39 0 160 3 | 148 | timings 6946 232 16 39 0 160 3 |
149 | endmode | 149 | endmode |
150 | 150 | ||
151 | mode "1280x1024-85" | 151 | mode "1280x1024-85" |
152 | geometry 1280 1024 1280 2662 8 | 152 | geometry 1280 1024 2048 4096 8 |
153 | timings 6538 232 16 39 0 160 3 | 153 | timings 6538 232 16 39 0 160 3 |
154 | endmode | 154 | endmode |
155 | |||
diff --git a/Documentation/fb/cyblafb/performance b/Documentation/fb/cyblafb/performance index eb4e47a9cea6..8d15d5dfc6b3 100644 --- a/Documentation/fb/cyblafb/performance +++ b/Documentation/fb/cyblafb/performance | |||
@@ -77,4 +77,3 @@ patch that speeds up kernel bitblitting a lot ( > 20%). | |||
77 | | | | | | | 77 | | | | | | |
78 | | | | | | | 78 | | | | | | |
79 | +-----------+-----------------+-----------------+-----------------+ | 79 | +-----------+-----------------+-----------------+-----------------+ |
80 | |||
diff --git a/Documentation/fb/cyblafb/todo b/Documentation/fb/cyblafb/todo index 80fb2f89b6c1..c5f6d0eae545 100644 --- a/Documentation/fb/cyblafb/todo +++ b/Documentation/fb/cyblafb/todo | |||
@@ -22,11 +22,10 @@ accelerated color blitting Who needs it? The console driver does use color | |||
22 | everything else is done using color expanding | 22 | everything else is done using color expanding |
23 | blitting of 1bpp character bitmaps. | 23 | blitting of 1bpp character bitmaps. |
24 | 24 | ||
25 | xpanning Who needs it? | ||
26 | |||
27 | ioctls Who needs it? | 25 | ioctls Who needs it? |
28 | 26 | ||
29 | TV-out Will be done later | 27 | TV-out Will be done later. Use "vga= " at boot time |
28 | to set a suitable video mode. | ||
30 | 29 | ||
31 | ??? Feel free to contact me if you have any | 30 | ??? Feel free to contact me if you have any |
32 | feature requests | 31 | feature requests |
diff --git a/Documentation/fb/cyblafb/usage b/Documentation/fb/cyblafb/usage index e627c8f54211..a39bb3d402a2 100644 --- a/Documentation/fb/cyblafb/usage +++ b/Documentation/fb/cyblafb/usage | |||
@@ -40,6 +40,16 @@ Selecting Modes | |||
40 | None of the modes possible to select as startup modes are affected by | 40 | None of the modes possible to select as startup modes are affected by |
41 | the problems described at the end of the next subsection. | 41 | the problems described at the end of the next subsection. |
42 | 42 | ||
43 | For all startup modes cyblafb chooses a virtual x resolution of 2048; | ||
44 | the only exception is mode 1280x1024 in combination with 32 bpp. This | ||
45 | allows ywrap scrolling for all those modes if rotation is 0 or 2, and | ||
46 | also fast scrolling if rotation is 1 or 3. The default virtual y reso- | ||
47 | lution is 4096 for bpp == 8, 2048 for bpp==16 and 1024 for bpp == 32, | ||
48 | again with the only exception of 1280x1024 at 32 bpp. | ||
49 | |||
50 | Please do set your video memory size to 8 Mb in the Bios setup. Other | ||
51 | values will work, but performance is decreased for a lot of modes. | ||
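
The default virtual resolutions listed above can be written down as a small lookup, which also shows why 8 Mb of video memory is a natural fit. This is a sketch of the rule as the text states it, not the driver's code: both function names are hypothetical, "8 Mb" is assumed to mean 8 * 1024 * 1024 bytes, and the 1280x1024 at 32 bpp exception is reported as unspecified because the text gives no values for it.

```c
/*
 * Fill in the default virtual resolution for a startup mode.
 * Returns 0 on success, -1 for cases the text does not specify
 * (1280x1024 at 32 bpp, or a depth other than 8, 16 or 32).
 */
int sketch_default_virtual(unsigned int xres, unsigned int yres,
                           unsigned int bpp,
                           unsigned int *vxres, unsigned int *vyres)
{
    if (xres == 1280 && yres == 1024 && bpp == 32)
        return -1;

    switch (bpp) {
    case 8:
        *vyres = 4096;
        break;
    case 16:
        *vyres = 2048;
        break;
    case 32:
        *vyres = 1024;
        break;
    default:
        return -1;
    }
    *vxres = 2048;
    return 0;
}

/* Bytes of video memory needed for a given virtual size. */
unsigned long sketch_fb_bytes(unsigned int vxres, unsigned int vyres,
                              unsigned int bpp)
{
    return (unsigned long)vxres * vyres * (bpp / 8);
}
```

For each of the three depths the default virtual size works out to exactly 8 * 1024 * 1024 bytes (2048 x 4096 x 1, 2048 x 2048 x 2, 2048 x 1024 x 4).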
52 | |||
43 | Mode changes using fbset | 53 | Mode changes using fbset |
44 | ======================== | 54 | ======================== |
45 | 55 | ||
@@ -54,20 +64,26 @@ Selecting Modes | |||
54 | - if a flat panel is found, cyblafb does not allow you | 64 | - if a flat panel is found, cyblafb does not allow you |
55 | to program a resolution higher than the physical | 65 | to program a resolution higher than the physical |
56 | resolution of the flat panel monitor | 66 | resolution of the flat panel monitor |
57 | - cyblafb does not allow xres to differ from xres_virtual | ||
58 | - cyblafb does not allow vclk to exceed 230 MHz. As 32 bpp | 67 | - cyblafb does not allow vclk to exceed 230 MHz. As 32 bpp |
59 | and (currently) 24 bit modes use a doubled vclk internally, | 68 | and (currently) 24 bit modes use a doubled vclk internally, |
60 | the dotclock limit as seen by fbset is 115 MHz for those | 69 | the dotclock limit as seen by fbset is 115 MHz for those |
61 | modes and 230 MHz for 8 and 16 bpp modes. | 70 | modes and 230 MHz for 8 and 16 bpp modes. |
71 | - cyblafb will allow you to select very high resolutions as | ||
72 | long as the hardware can be programmed to these modes. The | ||
73 | documented limit 1600x1200 is not enforced, but don't expect | ||
74 | perfect signal quality. | ||
62 | 75 | ||
63 | Any request that violates the rules given above will be ignored and | 76 | Any request that violates the rules given above will be either changed |
64 | fbset will return an error. | 77 | to something the hardware supports or an error value will be returned. |
65 | 78 | ||
66 | If you program a virtual y resolution higher than the hardware limit, | 79 | If you program a virtual y resolution higher than the hardware limit, |
67 | cyblafb will silently decrease that value to the highest possible | 80 | cyblafb will silently decrease that value to the highest possible |
68 | value. | 81 | value. The same is true for a virtual x resolution that is not |
82 | supported by the hardware. Cyblafb tries to adapt vyres first because | ||
83 | vxres decides if ywrap scrolling is possible or not. | ||
69 | 84 | ||
70 | Attempts to disable acceleration are ignored. | 85 | Attempts to disable acceleration are ignored, I believe that this is |
86 | safe. | ||
71 | 87 | ||
72 | Some video modes that should work do not work as expected. If you use | 88 | Some video modes that should work do not work as expected. If you use |
73 | the standard fb.modes, fbset 640x480-60 will program that mode, but | 89 | the standard fb.modes, fbset 640x480-60 will program that mode, but |
@@ -129,10 +145,6 @@ mode 640x480 or 800x600 or 1024x768 or 1280x1024 | |||
129 | verbosity 0 is the default, increase to at least 2 for every | 145 | verbosity 0 is the default, increase to at least 2 for every |
130 | bug report! | 146 | bug report! |
131 | 147 | ||
132 | vesafb allows cyblafb to be loaded after vesafb has been | ||
133 | loaded. See sections "Module unloading ...". | ||
134 | |||
135 | |||
136 | Development hints | 148 | Development hints |
137 | ================= | 149 | ================= |
138 | 150 | ||
@@ -195,7 +207,7 @@ a graphics mode. | |||
195 | After booting, load cyblafb without any mode and bpp parameter and assign | 207 | After booting, load cyblafb without any mode and bpp parameter and assign |
196 | cyblafb to individual ttys using con2fb, e.g.: | 208 | cyblafb to individual ttys using con2fb, e.g.: |
197 | 209 | ||
198 | modprobe cyblafb vesafb=1 | 210 | modprobe cyblafb |
199 | con2fb /dev/fb1 /dev/tty1 | 211 | con2fb /dev/fb1 /dev/tty1 |
200 | 212 | ||
201 | Unloading cyblafb works without problems after you assign vesafb to all | 213 | Unloading cyblafb works without problems after you assign vesafb to all |
@@ -203,4 +215,3 @@ ttys again, e.g.: | |||
203 | 215 | ||
204 | con2fb /dev/fb0 /dev/tty1 | 216 | con2fb /dev/fb0 /dev/tty1 |
205 | rmmod cyblafb | 217 | rmmod cyblafb |
206 | |||
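The fbset limits and startup defaults described in the cyblafb usage text above can be summarized in a small sketch. This is not driver code: the names `dotclock_limit_mhz`, `default_vyres` and `default_virtual_bytes` are hypothetical, and the numbers come only from the documentation (vclk may not exceed 230 MHz, 24 and 32 bpp modes use a doubled vclk, and the default virtual resolution is 2048 wide with 4096, 2048 or 1024 lines for 8, 16 and 32 bpp). The 1280x1024 at 32 bpp exception is left out because the text does not give its values.

```python
VCLK_LIMIT_MHZ = 230   # vclk ceiling stated in the usage text
DEFAULT_VXRES = 2048   # default virtual x resolution for startup modes

# Default virtual y resolution per colour depth (1280x1024 at 32 bpp excluded).
DEFAULT_VYRES = {8: 4096, 16: 2048, 32: 1024}


def dotclock_limit_mhz(bpp):
    """Return the dotclock limit as seen by fbset for a colour depth."""
    if bpp in (24, 32):
        # These depths use a doubled vclk internally, halving the visible limit.
        return VCLK_LIMIT_MHZ // 2
    if bpp in (8, 16):
        return VCLK_LIMIT_MHZ
    raise ValueError("unsupported colour depth: %r" % (bpp,))


def default_vyres(bpp):
    """Return the documented default virtual y resolution."""
    return DEFAULT_VYRES[bpp]


def default_virtual_bytes(bpp):
    """Memory covered by the default virtual screen, in bytes."""
    return DEFAULT_VXRES * default_vyres(bpp) * (bpp // 8)
```

Under these assumptions each default virtual screen covers exactly 8 MiB, which appears to be why the text recommends setting the video memory size to 8 Mb in the BIOS setup.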
diff --git a/Documentation/fb/cyblafb/whatsnew b/Documentation/fb/cyblafb/whatsnew new file mode 100644 index 000000000000..76c07a26e044 --- /dev/null +++ b/Documentation/fb/cyblafb/whatsnew | |||
@@ -0,0 +1,29 @@ | |||
1 | 0.62 | ||
2 | ==== | ||
3 | |||
4 | - the vesafb parameter has been removed as I decided to allow the | ||
5 | feature without any special parameter. | ||
6 | |||
7 | - Cyblafb does not use the vga style of panning any longer, now the | ||
8 | "right view" register in the graphics engine IO space is used. Without | ||
9 | that change it was impossible to use all available memory, and without | ||
10 | access to all available memory it is impossible to ywrap. | ||
11 | |||
12 | - The imageblit function now uses hardware acceleration for all font | ||
13 | widths. Hardware blitting across pixel column 2048 is broken in the | ||
14 | cyberblade/i1 graphics core, but we work around that hardware bug. | ||
15 | |||
16 | - modes with vxres != xres are supported now. | ||
17 | |||
18 | - ywrap scrolling is supported now and the default. This is a big | ||
19 | performance gain. | ||
20 | |||
21 | - default video modes use vyres > yres and vxres > xres to allow | ||
22 | almost optimal scrolling speed for normal and rotated screens | ||
23 | |||
24 | - some features mainly useful for debugging the upper layers of the | ||
25 | framebuffer system have been added, have a look at the code | ||
26 | |||
27 | - fixed: Oops after unloading cyblafb when reading /proc/io* | ||
28 | |||
29 | - we work around some bugs of the higher framebuffer layers. | ||
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 8ae8dad8e150..9474501dd6cc 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt | |||
@@ -71,15 +71,6 @@ Who: Mauro Carvalho Chehab <mchehab@brturbo.com.br> | |||
71 | 71 | ||
72 | --------------------------- | 72 | --------------------------- |
73 | 73 | ||
74 | What: i2c sysfs name change: in1_ref, vid deprecated in favour of cpu0_vid | ||
75 | When: November 2005 | ||
76 | Files: drivers/i2c/chips/adm1025.c, drivers/i2c/chips/adm1026.c | ||
77 | Why: Match the other drivers' name for the same function, duplicate names | ||
78 | will be available until removal of old names. | ||
79 | Who: Grant Coady <gcoady@gmail.com> | ||
80 | |||
81 | --------------------------- | ||
82 | |||
83 | What: remove EXPORT_SYMBOL(panic_timeout) | 74 | What: remove EXPORT_SYMBOL(panic_timeout) |
84 | When: April 2006 | 75 | When: April 2006 |
85 | Files: kernel/panic.c | 76 | Files: kernel/panic.c |
diff --git a/Documentation/filesystems/ext3.txt b/Documentation/filesystems/ext3.txt index 9840d5b8d5b9..f4d0de6bac63 100644 --- a/Documentation/filesystems/ext3.txt +++ b/Documentation/filesystems/ext3.txt | |||
@@ -2,11 +2,11 @@ | |||
2 | Ext3 Filesystem | 2 | Ext3 Filesystem |
3 | =============== | 3 | =============== |
4 | 4 | ||
5 | ext3 was originally released in September 1999. Written by Stephen Tweedie | 5 | Ext3 was originally released in September 1999. Written by Stephen Tweedie |
6 | for 2.2 branch, and ported to 2.4 kernels by Peter Braam, Andreas Dilger, | 6 | for the 2.2 branch, and ported to 2.4 kernels by Peter Braam, Andreas Dilger, |
7 | Andrew Morton, Alexander Viro, Ted Ts'o and Stephen Tweedie. | 7 | Andrew Morton, Alexander Viro, Ted Ts'o and Stephen Tweedie. |
8 | 8 | ||
9 | ext3 is ext2 filesystem enhanced with journalling capabilities. | 9 | Ext3 is the ext2 filesystem enhanced with journalling capabilities. |
10 | 10 | ||
11 | Options | 11 | Options |
12 | ======= | 12 | ======= |
@@ -14,64 +14,71 @@ Options | |||
14 | When mounting an ext3 filesystem, the following option are accepted: | 14 | When mounting an ext3 filesystem, the following option are accepted: |
15 | (*) == default | 15 | (*) == default |
16 | 16 | ||
17 | jounal=update Update the ext3 file system's journal to the | 17 | journal=update Update the ext3 file system's journal to the current |
18 | current format. | 18 | format. |
19 | 19 | ||
20 | journal=inum When a journal already exists, this option is | 20 | journal=inum When a journal already exists, this option is ignored. |
21 | ignored. Otherwise, it specifies the number of | 21 | Otherwise, it specifies the number of the inode which |
22 | the inode which will represent the ext3 file | 22 | will represent the ext3 file system's journal file. |
23 | system's journal file. | 23 | |
24 | journal_dev=devnum When the external journal device's major/minor numbers | ||
25 | have changed, this option allows the user to specify | ||
26 | the new journal location. The journal device is | ||
27 | identified through its new major/minor numbers encoded | ||
28 | in devnum. | ||
24 | 29 | ||
25 | noload Don't load the journal on mounting. | 30 | noload Don't load the journal on mounting. |
26 | 31 | ||
27 | data=journal All data are committed into the journal prior | 32 | data=journal All data are committed into the journal prior to being |
28 | to being written into the main file system. | 33 | written into the main file system. |
29 | 34 | ||
30 | data=ordered (*) All data are forced directly out to the main file | 35 | data=ordered (*) All data are forced directly out to the main file |
31 | system prior to its metadata being committed to | 36 | system prior to its metadata being committed to the |
32 | the journal. | 37 | journal. |
33 | 38 | ||
34 | data=writeback Data ordering is not preserved, data may be | 39 | data=writeback Data ordering is not preserved, data may be written |
35 | written into the main file system after its | 40 | into the main file system after its metadata has been |
36 | metadata has been committed to the journal. | 41 | committed to the journal. |
37 | 42 | ||
38 | commit=nrsec (*) Ext3 can be told to sync all its data and metadata | 43 | commit=nrsec (*) Ext3 can be told to sync all its data and metadata |
39 | every 'nrsec' seconds. The default value is 5 seconds. | 44 | every 'nrsec' seconds. The default value is 5 seconds. |
40 | This means that if you lose your power, you will lose, | 45 | This means that if you lose your power, you will lose |
41 | as much, the latest 5 seconds of work (your filesystem | 46 | as much as the latest 5 seconds of work (your |
42 | will not be damaged though, thanks to journaling). This | 47 | filesystem will not be damaged though, thanks to the |
43 | default value (or any low value) will hurt performance, | 48 | journaling). This default value (or any low value) |
44 | but it's good for data-safety. Setting it to 0 will | 49 | will hurt performance, but it's good for data-safety. |
45 | have the same effect than leaving the default 5 sec. | 50 | Setting it to 0 will have the same effect as leaving |
51 | it at the default (5 seconds). | ||
46 | Setting it to very large values will improve | 52 | Setting it to very large values will improve |
47 | performance. | 53 | performance. |
48 | 54 | ||
49 | barrier=1 This enables/disables barriers. barrier=0 disables it, | 55 | barrier=1 This enables/disables barriers. barrier=0 disables |
50 | barrier=1 enables it. | 56 | it, barrier=1 enables it. |
51 | 57 | ||
52 | orlov (*) This enables the new Orlov block allocator. It's enabled | 58 | orlov (*) This enables the new Orlov block allocator. It is |
53 | by default. | 59 | enabled by default. |
54 | 60 | ||
55 | oldalloc This disables the Orlov block allocator and enables the | 61 | oldalloc This disables the Orlov block allocator and enables |
56 | old block allocator. Orlov should have better performance, | 62 | the old block allocator. Orlov should have better |
57 | we'd like to get some feedback if it's the contrary for | 63 | performance - we'd like to get some feedback if it's |
58 | you. | 64 | the contrary for you. |
59 | 65 | ||
60 | user_xattr Enables Extended User Attributes. Additionally, you need | 66 | user_xattr Enables Extended User Attributes. Additionally, you |
61 | to have extended attribute support enabled in the kernel | 67 | need to have extended attribute support enabled in the |
62 | configuration (CONFIG_EXT3_FS_XATTR). See the attr(5) | 68 | kernel configuration (CONFIG_EXT3_FS_XATTR). See the |
63 | manual page and http://acl.bestbits.at to learn more | 69 | attr(5) manual page and http://acl.bestbits.at/ to |
64 | about extended attributes. | 70 | learn more about extended attributes. |
65 | 71 | ||
66 | nouser_xattr Disables Extended User Attributes. | 72 | nouser_xattr Disables Extended User Attributes. |
67 | 73 | ||
68 | acl Enables POSIX Access Control Lists support. Additionally, | 74 | acl Enables POSIX Access Control Lists support. |
69 | you need to have ACL support enabled in the kernel | 75 | Additionally, you need to have ACL support enabled in |
70 | configuration (CONFIG_EXT3_FS_POSIX_ACL). See the acl(5) | 76 | the kernel configuration (CONFIG_EXT3_FS_POSIX_ACL). |
71 | manual page and http://acl.bestbits.at for more | 77 | See the acl(5) manual page and http://acl.bestbits.at/ |
72 | information. | 78 | for more information. |
73 | 79 | ||
74 | noacl This option disables POSIX Access Control List support. | 80 | noacl This option disables POSIX Access Control List |
81 | support. | ||
75 | 82 | ||
76 | reservation | 83 | reservation |
77 | 84 | ||
@@ -83,7 +90,7 @@ bsddf (*) Make 'df' act like BSD. | |||
83 | minixdf Make 'df' act like Minix. | 90 | minixdf Make 'df' act like Minix. |
84 | 91 | ||
85 | check=none Don't do extra checking of bitmaps on mount. | 92 | check=none Don't do extra checking of bitmaps on mount. |
86 | nocheck | 93 | nocheck |
87 | 94 | ||
88 | debug Extra debugging information is sent to syslog. | 95 | debug Extra debugging information is sent to syslog. |
89 | 96 | ||
@@ -92,7 +99,7 @@ errors=continue Keep going on a filesystem error. | |||
92 | errors=panic Panic and halt the machine if an error occurs. | 99 | errors=panic Panic and halt the machine if an error occurs. |
93 | 100 | ||
94 | grpid Give objects the same group ID as their creator. | 101 | grpid Give objects the same group ID as their creator. |
95 | bsdgroups | 102 | bsdgroups |
96 | 103 | ||
97 | nogrpid (*) New objects have the group ID of their creator. | 104 | nogrpid (*) New objects have the group ID of their creator. |
98 | sysvgroups | 105 | sysvgroups |
@@ -103,81 +110,81 @@ resuid=n The user ID which may use the reserved blocks. | |||
103 | 110 | ||
104 | sb=n Use alternate superblock at this location. | 111 | sb=n Use alternate superblock at this location. |
105 | 112 | ||
106 | quota Quota options are currently silently ignored. | 113 | quota |
107 | noquota (see fs/ext3/super.c, line 594) | 114 | noquota |
108 | grpquota | 115 | grpquota |
109 | usrquota | 116 | usrquota |
110 | 117 | ||
111 | 118 | ||
112 | Specification | 119 | Specification |
113 | ============= | 120 | ============= |
114 | ext3 shares all disk implementation with ext2 filesystem, and add | 121 | Ext3 shares all disk implementation with the ext2 filesystem, and adds |
115 | transactions capabilities to ext2. Journaling is done by the | 122 | transactions capabilities to ext2. Journaling is done by the Journaling Block |
116 | Journaling block device layer. | 123 | Device layer. |
117 | 124 | ||
118 | Journaling Block Device layer | 125 | Journaling Block Device layer |
119 | ----------------------------- | 126 | ----------------------------- |
120 | The Journaling Block Device layer (JBD) isn't ext3 specific. It was | 127 | The Journaling Block Device layer (JBD) isn't ext3 specific. It was designed to |
121 | design to add journaling capabilities on a block device. The ext3 | 128 | add journaling capabilities on a block device. The ext3 filesystem code will |
122 | filesystem code will inform the JBD of modifications it is performing | 129 | inform the JBD of modifications it is performing (called a transaction). The |
123 | (Call a transaction). the journal support the transactions start and | 130 | journal supports the transactions start and stop, and in case of crash, the |
124 | stop, and in case of crash, the journal can replayed the transactions | 131 | journal can replay the transactions to put the partition back in a |
125 | to put the partition on a consistent state fastly. | 132 | consistent state fast. |
126 | 133 | ||
127 | handles represent a single atomic update to a filesystem. JBD can | 134 | Handles represent a single atomic update to a filesystem. JBD can handle an |
128 | handle external journal on a block device. | 135 | external journal on a block device. |
129 | 136 | ||
130 | Data Mode | 137 | Data Mode |
131 | --------- | 138 | --------- |
132 | There's 3 different data modes: | 139 | There are 3 different data modes: |
133 | 140 | ||
134 | * writeback mode | 141 | * writeback mode |
135 | In data=writeback mode, ext3 does not journal data at all. This mode | 142 | In data=writeback mode, ext3 does not journal data at all. This mode provides |
136 | provides a similar level of journaling as XFS, JFS, and ReiserFS in its | 143 | a similar level of journaling as that of XFS, JFS, and ReiserFS in its default |
137 | default mode - metadata journaling. A crash+recovery can cause | 144 | mode - metadata journaling. A crash+recovery can cause incorrect data to |
138 | incorrect data to appear in files which were written shortly before the | 145 | appear in files which were written shortly before the crash. This mode will |
139 | crash. This mode will typically provide the best ext3 performance. | 146 | typically provide the best ext3 performance. |
140 | 147 | ||
141 | * ordered mode | 148 | * ordered mode |
142 | In data=ordered mode, ext3 only officially journals metadata, but it | 149 | In data=ordered mode, ext3 only officially journals metadata, but it logically |
143 | logically groups metadata and data blocks into a single unit called a | 150 | groups metadata and data blocks into a single unit called a transaction. When |
144 | transaction. When it's time to write the new metadata out to disk, the | 151 | it's time to write the new metadata out to disk, the associated data blocks |
145 | associated data blocks are written first. In general, this mode | 152 | are written first. In general, this mode performs slightly slower than |
146 | perform slightly slower than writeback but significantly faster than | 153 | writeback but significantly faster than journal mode. |
147 | journal mode. | ||
148 | 154 | ||
149 | * journal mode | 155 | * journal mode |
150 | data=journal mode provides full data and metadata journaling. All new | 156 | data=journal mode provides full data and metadata journaling. All new data is |
151 | data is written to the journal first, and then to its final location. | 157 | written to the journal first, and then to its final location. |
152 | In the event of a crash, the journal can be replayed, bringing both | 158 | In the event of a crash, the journal can be replayed, bringing both data and |
153 | data and metadata into a consistent state. This mode is the slowest | 159 | metadata into a consistent state. This mode is the slowest except when data |
154 | except when data needs to be read from and written to disk at the same | 160 | needs to be read from and written to disk at the same time where it |
155 | time where it outperform all others mode. | 161 | outperforms all other modes. |
156 | 162 | ||
157 | Compatibility | 163 | Compatibility |
158 | ------------- | 164 | ------------- |
159 | 165 | ||
160 | Ext2 partitions can be easily convert to ext3, with `tune2fs -j <dev>`. | 166 | Ext2 partitions can be easily convert to ext3, with `tune2fs -j <dev>`. |
161 | Ext3 is fully compatible with Ext2. Ext3 partitions can easily be | 167 | Ext3 is fully compatible with Ext2. Ext3 partitions can easily be mounted as |
162 | mounted as Ext2. | 168 | Ext2. |
169 | |||
163 | 170 | ||
164 | External Tools | 171 | External Tools |
165 | ============== | 172 | ============== |
166 | see manual pages to know more. | 173 | See manual pages to learn more. |
174 | |||
175 | tune2fs: create a ext3 journal on a ext2 partition with the -j flag. | ||
176 | mke2fs: create a ext3 partition with the -j flag. | ||
177 | debugfs: ext2 and ext3 file system debugger. | ||
167 | 178 | ||
168 | tune2fs: create a ext3 journal on a ext2 partition with the -j flags | ||
169 | mke2fs: create a ext3 partition with the -j flags | ||
170 | debugfs: ext2 and ext3 file system debugger | ||
171 | 179 | ||
172 | References | 180 | References |
173 | ========== | 181 | ========== |
174 | 182 | ||
175 | kernel source: file:/usr/src/linux/fs/ext3 | 183 | kernel source: <file:fs/ext3/> |
176 | file:/usr/src/linux/fs/jbd | 184 | <file:fs/jbd/> |
177 | 185 | ||
178 | programs: http://e2fsprogs.sourceforge.net | 186 | programs: http://e2fsprogs.sourceforge.net/ |
179 | 187 | ||
180 | useful link: | 188 | useful links: http://www.zip.com.au/~akpm/linux/ext3/ext3-usage.html |
181 | http://www.zip.com.au/~akpm/linux/ext3/ext3-usage.html | ||
182 | http://www-106.ibm.com/developerworks/linux/library/l-fs7/ | 189 | http://www-106.ibm.com/developerworks/linux/library/l-fs7/ |
183 | http://www-106.ibm.com/developerworks/linux/library/l-fs8/ | 190 | http://www-106.ibm.com/developerworks/linux/library/l-fs8/ |
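As a rough illustration of the ext3 mount options documented above, the sketch below assembles an option string from a data mode and a commit interval. The helper names `effective_commit` and `ext3_options` are hypothetical and not part of any tool. The only behaviour taken from the text is the set of three data modes, the 5 second default, and the statement that commit=0 behaves like the default.

```python
DATA_MODES = ("journal", "ordered", "writeback")  # the three documented data modes
DEFAULT_COMMIT_SECONDS = 5                        # documented default for commit=nrsec


def effective_commit(nrsec):
    """Return the commit interval actually in effect for a requested value."""
    if nrsec < 0:
        raise ValueError("commit interval must not be negative")
    # Per the text, commit=0 has the same effect as the default.
    return DEFAULT_COMMIT_SECONDS if nrsec == 0 else nrsec


def ext3_options(data="ordered", commit=0, acl=False, user_xattr=False):
    """Build a comma-separated ext3 option string."""
    if data not in DATA_MODES:
        raise ValueError("unknown data mode: %r" % (data,))
    opts = ["data=" + data, "commit=%d" % effective_commit(commit)]
    if acl:
        opts.append("acl")          # also needs CONFIG_EXT3_FS_POSIX_ACL
    if user_xattr:
        opts.append("user_xattr")   # also needs CONFIG_EXT3_FS_XATTR
    return ",".join(opts)
```

The resulting string could be passed to `mount -t ext3 -o <options>`. A larger commit value trades data safety for performance, as the option description notes.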
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index d4773565ea2f..944cf109a6f5 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt | |||
@@ -418,7 +418,7 @@ VmallocChunk: 111088 kB | |||
418 | Dirty: Memory which is waiting to get written back to the disk | 418 | Dirty: Memory which is waiting to get written back to the disk |
419 | Writeback: Memory which is actively being written back to the disk | 419 | Writeback: Memory which is actively being written back to the disk |
420 | Mapped: files which have been mmaped, such as libraries | 420 | Mapped: files which have been mmaped, such as libraries |
421 | Slab: in-kernel data structures cache | 421 | Slab: in-kernel data structures cache |
422 | CommitLimit: Based on the overcommit ratio ('vm.overcommit_ratio'), | 422 | CommitLimit: Based on the overcommit ratio ('vm.overcommit_ratio'), |
423 | this is the total amount of memory currently available to | 423 | this is the total amount of memory currently available to |
424 | be allocated on the system. This limit is only adhered to | 424 | be allocated on the system. This limit is only adhered to |
@@ -1302,6 +1302,23 @@ VM has token based thrashing control mechanism and uses the token to prevent | |||
1302 | unnecessary page faults in thrashing situation. The unit of the value is | 1302 | unnecessary page faults in thrashing situation. The unit of the value is |
1303 | second. The value would be useful to tune thrashing behavior. | 1303 | second. The value would be useful to tune thrashing behavior. |
1304 | 1304 | ||
1305 | drop_caches | ||
1306 | ----------- | ||
1307 | |||
1308 | Writing to this will cause the kernel to drop clean caches, dentries and | ||
1309 | inodes from memory, causing that memory to become free. | ||
1310 | |||
1311 | To free pagecache: | ||
1312 | echo 1 > /proc/sys/vm/drop_caches | ||
1313 | To free dentries and inodes: | ||
1314 | echo 2 > /proc/sys/vm/drop_caches | ||
1315 | To free pagecache, dentries and inodes: | ||
1316 | echo 3 > /proc/sys/vm/drop_caches | ||
1317 | |||
1318 | As this is a non-destructive operation and dirty objects are not freeable, the | ||
1319 | user should run `sync' first. | ||
1320 | |||
1321 | |||
1305 | 2.5 /proc/sys/dev - Device specific parameters | 1322 | 2.5 /proc/sys/dev - Device specific parameters |
1306 | ---------------------------------------------- | 1323 | ---------------------------------------------- |
1307 | 1324 | ||
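The drop_caches procedure added above (sync first, then write 1, 2 or 3) can be wrapped in a short helper. This is a minimal sketch: the function name `drop_caches` and its `path` parameter are hypothetical, and reading the documented values as a bit combination (1 for pagecache, 2 for dentries and inodes, 3 for both) is an assumption based on the three examples in the text. Writing to the real file requires root.

```python
import os

DROP_PAGECACHE = 1         # "echo 1": free pagecache
DROP_DENTRIES_INODES = 2   # "echo 2": free dentries and inodes


def drop_caches(pagecache=True, dentries_inodes=True,
                path="/proc/sys/vm/drop_caches"):
    """Sync dirty data, then ask the kernel to drop clean caches."""
    value = 0
    if pagecache:
        value |= DROP_PAGECACHE
    if dentries_inodes:
        value |= DROP_DENTRIES_INODES
    if value == 0:
        raise ValueError("nothing selected to drop")
    # Dirty objects are not freeable, so flush them to disk first.
    os.sync()
    with open(path, "w") as f:
        f.write("%d\n" % value)
    return value
```

The `path` argument exists only so the helper can be exercised against an ordinary file without touching the running system.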
diff --git a/Documentation/filesystems/ramfs-rootfs-initramfs.txt b/Documentation/filesystems/ramfs-rootfs-initramfs.txt index b3404a032596..60ab61e54e8a 100644 --- a/Documentation/filesystems/ramfs-rootfs-initramfs.txt +++ b/Documentation/filesystems/ramfs-rootfs-initramfs.txt | |||
@@ -143,12 +143,26 @@ as the following example: | |||
143 | dir /mnt 755 0 0 | 143 | dir /mnt 755 0 0 |
144 | file /init initramfs/init.sh 755 0 0 | 144 | file /init initramfs/init.sh 755 0 0 |
145 | 145 | ||
146 | Run "usr/gen_init_cpio" (after the kernel build) to get a usage message | ||
147 | documenting the above file format. | ||
148 | |||
146 | One advantage of the text file is that root access is not required to | 149 | One advantage of the text file is that root access is not required to |
147 | set permissions or create device nodes in the new archive. (Note that those | 150 | set permissions or create device nodes in the new archive. (Note that those |
148 | two example "file" entries expect to find files named "init.sh" and "busybox" in | 151 | two example "file" entries expect to find files named "init.sh" and "busybox" in |
149 | a directory called "initramfs", under the linux-2.6.* directory. See | 152 | a directory called "initramfs", under the linux-2.6.* directory. See |
150 | Documentation/early-userspace/README for more details.) | 153 | Documentation/early-userspace/README for more details.) |
151 | 154 | ||
155 | The kernel does not depend on external cpio tools, gen_init_cpio is created | ||
156 | from usr/gen_init_cpio.c which is entirely self-contained, and the kernel's | ||
157 | boot-time extractor is also (obviously) self-contained. However, if you _do_ | ||
158 | happen to have cpio installed, the following command line can extract the | ||
159 | generated cpio image back into its component files: | ||
160 | |||
161 | cpio -i -d -H newc -F initramfs_data.cpio --no-absolute-filenames | ||
162 | |||
163 | Contents of initramfs: | ||
164 | ---------------------- | ||
165 | |||
152 | If you don't already understand what shared libraries, devices, and paths | 166 | If you don't already understand what shared libraries, devices, and paths |
153 | you need to get a minimal root filesystem up and running, here are some | 167 | you need to get a minimal root filesystem up and running, here are some |
154 | references: | 168 | references: |
@@ -161,13 +175,69 @@ designed to be a tiny C library to statically link early userspace | |||
161 | code against, along with some related utilities. It is BSD licensed. | 175 | code against, along with some related utilities. It is BSD licensed. |
162 | 176 | ||
163 | I use uClibc (http://www.uclibc.org) and busybox (http://www.busybox.net) | 177 | I use uClibc (http://www.uclibc.org) and busybox (http://www.busybox.net) |
164 | myself. These are LGPL and GPL, respectively. | 178 | myself. These are LGPL and GPL, respectively. (A self-contained initramfs |
179 | package is planned for the busybox 1.2 release.) | ||
165 | 180 | ||
166 | In theory you could use glibc, but that's not well suited for small embedded | 181 | In theory you could use glibc, but that's not well suited for small embedded |
167 | uses like this. (A "hello world" program statically linked against glibc is | 182 | uses like this. (A "hello world" program statically linked against glibc is |
168 | over 400k. With uClibc it's 7k. Also note that glibc dlopens libnss to do | 183 | over 400k. With uClibc it's 7k. Also note that glibc dlopens libnss to do |
169 | name lookups, even when otherwise statically linked.) | 184 | name lookups, even when otherwise statically linked.) |
170 | 185 | ||
186 | Why cpio rather than tar? | ||
187 | ------------------------- | ||
188 | |||
189 | This decision was made back in December, 2001. The discussion started here: | ||
190 | |||
191 | http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1538.html | ||
192 | |||
193 | And spawned a second thread (specifically on tar vs cpio), starting here: | ||
194 | |||
195 | http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1587.html | ||
196 | |||
197 | The quick and dirty summary version (which is no substitute for reading | ||
198 | the above threads) is: | ||
199 | |||
200 | 1) cpio is a standard. It's decades old (from the AT&T days), and already | ||
201 | widely used on Linux (inside RPM, Red Hat's device driver disks). Here's | ||
202 | a Linux Journal article about it from 1996: | ||
203 | |||
204 | http://www.linuxjournal.com/article/1213 | ||
205 | |||
206 | It's not as popular as tar because the traditional cpio command line tools | ||
207 | require _truly_hideous_ command line arguments. But that says nothing | ||
208 | either way about the archive format, and there are alternative tools, | ||
209 | such as: | ||
210 | |||
211 | http://freshmeat.net/projects/afio/ | ||
212 | |||
213 | 2) The cpio archive format chosen by the kernel is simpler and cleaner (and | ||
214 | thus easier to create and parse) than any of the (literally dozens of) | ||
215 | various tar archive formats. The complete initramfs archive format is | ||
216 | explained in buffer-format.txt, created in usr/gen_init_cpio.c, and | ||
217 | extracted in init/initramfs.c. All three together come to less than 26k | ||
218 | total of human-readable text. | ||
219 | |||
220 | 3) The GNU project standardizing on tar is approximately as relevant as | ||
221 | Windows standardizing on zip. Linux is not part of either, and is free | ||
222 | to make its own technical decisions. | ||
223 | |||
224 | 4) Since this is a kernel internal format, it could easily have been | ||
225 | something brand new. The kernel provides its own tools to create and | ||
226 | extract this format anyway. Using an existing standard was preferable, | ||
227 | but not essential. | ||
228 | |||
229 | 5) Al Viro made the decision (quote: "tar is ugly as hell and not going to be | ||
230 | supported on the kernel side"): | ||
231 | |||
232 | http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1540.html | ||
233 | |||
234 | explained his reasoning: | ||
235 | |||
236 | http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1550.html | ||
237 | http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1638.html | ||
238 | |||
239 | and, most importantly, designed and implemented the initramfs code. | ||
240 | |||
171 | Future directions: | 241 | Future directions: |
172 | ------------------ | 242 | ------------------ |
173 | 243 | ||
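To make the text file format shown in the example above more concrete, here is a small sketch that parses the two entry types visible in this hunk, "dir" and "file". It is not the gen_init_cpio parser: the function name `parse_spec_line` is hypothetical, other entry types are not handled, and treating the mode field as octal is an assumption based on the `755` values in the example. The usage message printed by `usr/gen_init_cpio` is the authoritative description.

```python
def parse_spec_line(line):
    """Parse one 'dir' or 'file' line of an initramfs description file.

    Returns a dict describing the entry, or None for a blank line.
    """
    fields = line.split()
    if not fields:
        return None
    kind = fields[0]
    if kind == "dir" and len(fields) == 5:
        # dir <name> <mode> <uid> <gid>
        name, mode, uid, gid = fields[1:]
        return {"type": "dir", "name": name,
                "mode": int(mode, 8), "uid": int(uid), "gid": int(gid)}
    if kind == "file" and len(fields) == 6:
        # file <name> <location> <mode> <uid> <gid>
        name, location, mode, uid, gid = fields[1:]
        return {"type": "file", "name": name, "location": location,
                "mode": int(mode, 8), "uid": int(uid), "gid": int(gid)}
    raise ValueError("unsupported or malformed entry: %r" % (line,))
```

Because ownership and permissions are carried as plain numbers in the text file, nothing here needs root access, which is the advantage the surrounding text points out.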
diff --git a/Documentation/filesystems/relayfs.txt b/Documentation/filesystems/relayfs.txt index d803abed29f0..5832377b7340 100644 --- a/Documentation/filesystems/relayfs.txt +++ b/Documentation/filesystems/relayfs.txt | |||
@@ -44,30 +44,41 @@ relayfs can operate in a mode where it will overwrite data not yet | |||
44 | collected by userspace, and not wait for it to consume it. | 44 | collected by userspace, and not wait for it to consume it. |
45 | 45 | ||
46 | relayfs itself does not provide for communication of such data between | 46 | relayfs itself does not provide for communication of such data between |
47 | userspace and kernel, allowing the kernel side to remain simple and not | 47 | userspace and kernel, allowing the kernel side to remain simple and |
48 | impose a single interface on userspace. It does provide a separate | 48 | not impose a single interface on userspace. It does provide a set of |
49 | helper though, described below. | 49 | examples and a separate helper though, described below. |
50 | |||
51 | klog and relay-apps example code | ||
52 | ================================ | ||
53 | |||
54 | relayfs itself is ready to use, but to make things easier, a couple | ||
55 | simple utility functions and a set of examples are provided. | ||
56 | |||
57 | The relay-apps example tarball, available on the relayfs sourceforge | ||
58 | site, contains a set of self-contained examples, each consisting of a | ||
59 | pair of .c files containing boilerplate code for each of the user and | ||
60 | kernel sides of a relayfs application; combined these two sets of | ||
61 | boilerplate code provide glue to easily stream data to disk, without | ||
62 | having to bother with mundane housekeeping chores. | ||
63 | |||
64 | The 'klog debugging functions' patch (klog.patch in the relay-apps | ||
65 | tarball) provides a couple of high-level logging functions to the | ||
66 | kernel which allow writing formatted text or raw data to a channel, | ||
67 | regardless of whether a channel to write into exists or not, or | ||
68 | whether relayfs is compiled into the kernel or is configured as a | ||
69 | module. These functions allow you to put unconditional 'trace' | ||
70 | statements anywhere in the kernel or kernel modules; only when there | ||
71 | is a 'klog handler' registered will data actually be logged (see the | ||
72 | klog and kleak examples for details). | ||
73 | |||
74 | It is of course possible to use relayfs from scratch, i.e. without | ||
75 | using any of the relay-apps example code or klog, but you'll have to | ||
76 | implement communication between userspace and kernel, allowing both to | ||
77 | convey the state of buffers (full, empty, amount of padding). | ||
78 | |||
79 | klog and the relay-apps examples can be found in the relay-apps | ||
80 | tarball on http://relayfs.sourceforge.net | ||
50 | 81 | ||
51 | klog, relay-app & librelay | ||
52 | ========================== | ||
53 | |||
54 | relayfs itself is ready to use, but to make things easier, two | ||
55 | additional systems are provided. klog is a simple wrapper to make | ||
56 | writing formatted text or raw data to a channel simpler, regardless of | ||
57 | whether a channel to write into exists or not, or whether relayfs is | ||
58 | compiled into the kernel or is configured as a module. relay-app is | ||
59 | the kernel counterpart of userspace librelay.c, combined these two | ||
60 | files provide glue to easily stream data to disk, without having to | ||
61 | bother with housekeeping. klog and relay-app can be used together, | ||
62 | with klog providing high-level logging functions to the kernel and | ||
63 | relay-app taking care of kernel-user control and disk-logging chores. | ||
64 | |||
65 | It is possible to use relayfs without relay-app & librelay, but you'll | ||
66 | have to implement communication between userspace and kernel, allowing | ||
67 | both to convey the state of buffers (full, empty, amount of padding). | ||
68 | |||
69 | klog, relay-app and librelay can be found in the relay-apps tarball on | ||
70 | http://relayfs.sourceforge.net | ||
71 | 82 | ||
72 | The relayfs user space API | 83 | The relayfs user space API |
73 | ========================== | 84 | ========================== |
@@ -125,6 +136,8 @@ Here's a summary of the API relayfs provides to in-kernel clients: | |||
125 | relay_reset(chan) | 136 | relay_reset(chan) |
126 | relayfs_create_dir(name, parent) | 137 | relayfs_create_dir(name, parent) |
127 | relayfs_remove_dir(dentry) | 138 | relayfs_remove_dir(dentry) |
139 | relayfs_create_file(name, parent, mode, fops, data) | ||
140 | relayfs_remove_file(dentry) | ||
128 | 141 | ||
129 | channel management typically called on instigation of userspace: | 142 | channel management typically called on instigation of userspace: |
130 | 143 | ||
@@ -141,6 +154,8 @@ Here's a summary of the API relayfs provides to in-kernel clients: | |||
141 | subbuf_start(buf, subbuf, prev_subbuf, prev_padding) | 154 | subbuf_start(buf, subbuf, prev_subbuf, prev_padding) |
142 | buf_mapped(buf, filp) | 155 | buf_mapped(buf, filp) |
143 | buf_unmapped(buf, filp) | 156 | buf_unmapped(buf, filp) |
157 | create_buf_file(filename, parent, mode, buf, is_global) | ||
158 | remove_buf_file(dentry) | ||
144 | 159 | ||
145 | helper functions: | 160 | helper functions: |
146 | 161 | ||
@@ -320,6 +335,71 @@ forces a sub-buffer switch on all the channel buffers, and can be used | |||
320 | to finalize and process the last sub-buffers before the channel is | 335 | to finalize and process the last sub-buffers before the channel is |
321 | closed. | 336 | closed. |
322 | 337 | ||
338 | Creating non-relay files | ||
339 | ------------------------ | ||
340 | |||
341 | relay_open() automatically creates files in the relayfs filesystem to | ||
342 | represent the per-cpu kernel buffers; it's often useful for | ||
343 | applications to be able to create their own files alongside the relay | ||
344 | files in the relayfs filesystem as well e.g. 'control' files much like | ||
345 | those created in /proc or debugfs for similar purposes, used to | ||
346 | communicate control information between the kernel and user sides of a | ||
347 | relayfs application. For this purpose the relayfs_create_file() and | ||
348 | relayfs_remove_file() API functions exist. For relayfs_create_file(), | ||
349 | the caller passes in a set of user-defined file operations to be used | ||
350 | for the file and an optional void * to a user-specified data item, | ||
351 | which will be accessible via inode->u.generic_ip (see the relay-apps | ||
352 | tarball for examples). The file_operations are a required parameter | ||
353 | to relayfs_create_file() and thus the semantics of these files are | ||
354 | completely defined by the caller. | ||
355 | |||
356 | See the relay-apps tarball at http://relayfs.sourceforge.net for | ||
357 | examples of how these non-relay files are meant to be used. | ||
358 | |||
359 | Creating relay files in other filesystems | ||
360 | ----------------------------------------- | ||
361 | |||
362 | By default of course, relay_open() creates relay files in the relayfs | ||
363 | filesystem. Because relay_file_operations is exported, however, it's | ||
364 | also possible to create and use relay files in other pseudo-filesystems | ||
365 | such as debugfs. | ||
366 | |||
367 | For this purpose, two callback functions are provided, | ||
368 | create_buf_file() and remove_buf_file(). create_buf_file() is called | ||
369 | once for each per-cpu buffer from relay_open() to allow the client to | ||
370 | create a file to be used to represent the corresponding buffer; if | ||
371 | this callback is not defined, the default implementation will create | ||
372 | and return a file in the relayfs filesystem to represent the buffer. | ||
373 | The callback should return the dentry of the file created to represent | ||
374 | the relay buffer. Note that the parent directory passed to | ||
375 | relay_open() (and passed along to the callback), if specified, must | ||
376 | exist in the same filesystem the new relay file is created in. If | ||
377 | create_buf_file() is defined, remove_buf_file() must also be defined; | ||
378 | it's responsible for deleting the file(s) created in create_buf_file() | ||
379 | and is called during relay_close(). | ||
380 | |||
381 | The create_buf_file() implementation can also be defined in such a way | ||
382 | as to allow the creation of a single 'global' buffer instead of the | ||
383 | default per-cpu set. This can be useful for applications interested | ||
384 | mainly in seeing the relative ordering of system-wide events without | ||
385 | the need to bother with saving explicit timestamps for the purpose of | ||
386 | merging/sorting per-cpu files in a postprocessing step. | ||
387 | |||
388 | To have relay_open() create a global buffer, the create_buf_file() | ||
389 | implementation should set the value of the is_global outparam to a | ||
390 | non-zero value in addition to creating the file that will be used to | ||
391 | represent the single buffer. In the case of a global buffer, | ||
392 | create_buf_file() and remove_buf_file() will be called only once. The | ||
393 | normal channel-writing functions e.g. relay_write() can still be used | ||
394 | - writes from any cpu will transparently end up in the global buffer - | ||
395 | but since it is a global buffer, callers should make sure they use the | ||
396 | proper locking for such a buffer, either by wrapping writes in a | ||
397 | spinlock, or by copying a write function from relayfs_fs.h and | ||
398 | creating a local version that internally does the proper locking. | ||
399 | |||
400 | See the 'exported-relayfile' examples in the relay-apps tarball for | ||
401 | examples of creating and using relay files in debugfs. | ||
402 | |||
323 | Misc | 403 | Misc |
324 | ---- | 404 | ---- |
325 | 405 | ||
diff --git a/Documentation/filesystems/spufs.txt b/Documentation/filesystems/spufs.txt new file mode 100644 index 000000000000..8edc3952eff4 --- /dev/null +++ b/Documentation/filesystems/spufs.txt | |||
@@ -0,0 +1,521 @@ | |||
1 | SPUFS(2) Linux Programmer's Manual SPUFS(2) | ||
2 | |||
3 | |||
4 | |||
5 | NAME | ||
6 | spufs - the SPU file system | ||
7 | |||
8 | |||
9 | DESCRIPTION | ||
10 | The SPU file system is used on PowerPC machines that implement the Cell | ||
11 | Broadband Engine Architecture in order to access Synergistic Processor | ||
12 | Units (SPUs). | ||
13 | |||
14 | The file system provides a name space similar to posix shared memory or | ||
15 | message queues. Users that have write permissions on the file system | ||
16 | can use spu_create(2) to establish SPU contexts in the spufs root. | ||
17 | |||
18 | Every SPU context is represented by a directory containing a predefined | ||
19 | set of files. These files can be used for manipulating the state of the | ||
20 | logical SPU. Users can change permissions on those files, but not actu- | ||
21 | ally add or remove files. | ||
22 | |||
23 | |||
24 | MOUNT OPTIONS | ||
25 | uid=<uid> | ||
26 | set the user owning the mount point, the default is 0 (root). | ||
27 | |||
28 | gid=<gid> | ||
29 | set the group owning the mount point, the default is 0 (root). | ||
30 | |||
31 | |||
32 | FILES | ||
33 | The files in spufs mostly follow the standard behavior for regular sys- | ||
34 | tem calls like read(2) or write(2), but often support only a subset of | ||
35 | the operations supported on regular file systems. This list details the | ||
36 | supported operations and the deviations from the behaviour in the | ||
37 | respective man pages. | ||
38 | |||
39 | All files that support the read(2) operation also support readv(2) and | ||
40 | all files that support the write(2) operation also support writev(2). | ||
41 | All files support the access(2) and stat(2) family of operations, but | ||
42 | only the st_mode, st_nlink, st_uid and st_gid fields of struct stat | ||
43 | contain reliable information. | ||
44 | |||
45 | All files support the chmod(2)/fchmod(2) and chown(2)/fchown(2) opera- | ||
46 | tions, but will not be able to grant permissions that contradict the | ||
47 | possible operations, e.g. read access on the wbox file. | ||
48 | |||
49 | The current set of files is: | ||
50 | |||
51 | |||
52 | /mem | ||
53 | the contents of the local storage memory of the SPU. This can be | ||
54 | accessed like a regular shared memory file and contains both code and | ||
55 | data in the address space of the SPU. The possible operations on an | ||
56 | open mem file are: | ||
57 | |||
58 | read(2), pread(2), write(2), pwrite(2), lseek(2) | ||
59 | These operate as documented, with the exception that lseek(2), | ||
60 | write(2) and pwrite(2) are not supported beyond the end of the | ||
61 | file. The file size is the size of the local storage of the SPU, | ||
62 | which normally is 256 kilobytes. | ||
63 | |||
64 | mmap(2) | ||
65 | Mapping mem into the process address space gives access to the | ||
66 | SPU local storage within the process address space. Only | ||
67 | MAP_SHARED mappings are allowed. | ||
68 | |||
69 | |||
70 | /mbox | ||
71 | The first SPU to CPU communication mailbox. This file is read-only and | ||
72 | can be read in units of 32 bits. The file can only be used in non- | ||
73 | blocking mode and even poll() will not block on it. The possible | ||
74 | operations on an open mbox file are: | ||
75 | |||
76 | read(2) | ||
77 | If a count smaller than four is requested, read returns -1 and | ||
78 | sets errno to EINVAL. If there is no data available in the mail | ||
79 | box, the return value is set to -1 and errno becomes EAGAIN. | ||
80 | When data has been read successfully, four bytes are placed in | ||
81 | the data buffer and the value four is returned. | ||
82 | |||
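The mbox read semantics above can be illustrated with a minimal C sketch. It is not taken from the spufs sources: the helper name mbox_read_word() is hypothetical, and the caller is assumed to have already opened the mbox file of an SPU context. The sketch only maps the documented outcomes (four bytes read, EAGAIN, anything else) onto three return values.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to read one 32-bit word from an open mbox fd.
 * Returns 1 if a word was read, 0 if the mailbox is currently empty
 * (read failed with EAGAIN), and -1 on any other error or short read.
 */
int mbox_read_word(int fd, uint32_t *word)
{
	assert(word != NULL);

	/* A count smaller than four would fail with EINVAL, so ask for four. */
	ssize_t n = read(fd, word, sizeof(*word));

	if (n == (ssize_t)sizeof(*word))
		return 1;
	if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return 0;
	return -1;
}
```

A caller that needs to wait for data would probably use the ibox file instead, since mbox never blocks.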
83 | |||
84 | /ibox | ||
85 | The second SPU to CPU communication mailbox. This file is similar to | ||
86 | the first mailbox file, but can be read in blocking I/O mode, and the | ||
87 | poll family of system calls can be used to wait for it. The possible | ||
88 | operations on an open ibox file are: | ||
89 | |||
90 | read(2) | ||
91 | If a count smaller than four is requested, read returns -1 and | ||
92 | sets errno to EINVAL. If there is no data available in the mail | ||
93 | box and the file descriptor has been opened with O_NONBLOCK, the | ||
94 | return value is set to -1 and errno becomes EAGAIN. | ||
95 | |||
96 | If there is no data available in the mail box and the file | ||
97 | descriptor has been opened without O_NONBLOCK, the call will | ||
98 | block until the SPU writes to its interrupt mailbox channel. | ||
99 | When data has been read successfully, four bytes are placed in | ||
100 | the data buffer and the value four is returned. | ||
101 | |||
102 | poll(2) | ||
103 | Poll on the ibox file returns (POLLIN | POLLRDNORM) whenever | ||
104 | data is available for reading. | ||
105 | |||
106 | |||
107 | /wbox | ||
108 | The CPU to SPU communication mailbox. It is write-only and can be written | ||
109 | in units of 32 bits. If the mailbox is full, write() will block and | ||
110 | poll can be used to wait for it becoming empty again. The possible | ||
111 | operations on an open wbox file are: | ||
112 | |||
113 | write(2) | ||
114 | If a count smaller than four is requested, write returns -1 and | ||
115 | sets errno to EINVAL. If there is no space available in the mail | ||
116 | box and the file descriptor has been opened with O_NONBLOCK, the | ||
117 | return value is set to -1 and errno becomes EAGAIN. | ||
118 | |||
119 | If there is no space available in the mail box and the file | ||
120 | descriptor has been opened without O_NONBLOCK, the call will | ||
121 | block until the SPU reads from its PPE mailbox channel. When | ||
122 | data has been written successfully, the value four is returned. | ||
123 | |||
124 | poll(2) | ||
125 | Poll on the wbox file returns (POLLOUT | POLLWRNORM) whenever | ||
126 | space is available for writing. | ||
127 | |||
128 | /mbox_stat | ||
129 | /ibox_stat | ||
130 | /wbox_stat | ||
131 | Read-only files that contain the length of the current queue, i.e. how | ||
132 | many words can be read from mbox or ibox or how many words can be | ||
133 | written to wbox without blocking. The files can be read only in 4-byte | ||
134 | units and return a big-endian binary integer number. The possible | ||
135 | operations on an open *box_stat file are: | ||
136 | |||
137 | read(2) | ||
138 | If a count smaller than four is requested, read returns -1 and | ||
139 | sets errno to EINVAL. Otherwise, a four byte value is placed in | ||
140 | the data buffer, containing the number of elements that can be | ||
141 | read from (for mbox_stat and ibox_stat) or written to (for | ||
142 | wbox_stat) the respective mail box without blocking or resulting | ||
143 | in EAGAIN. | ||
144 | |||
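Because the *box_stat files return a big-endian binary integer, a reader that wants to be independent of host byte order has to decode the four bytes explicitly. The following C sketch shows one way to do that; be32_decode() and box_stat_read() are hypothetical names chosen for illustration, and the file descriptor is assumed to refer to an already opened mbox_stat, ibox_stat or wbox_stat file.

```c
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/* Decode four bytes stored most-significant byte first. */
uint32_t be32_decode(const unsigned char bytes[4])
{
	assert(bytes != NULL);
	return ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) |
	       ((uint32_t)bytes[2] << 8)  |  (uint32_t)bytes[3];
}

/*
 * Hypothetical helper: read the queue length from an open *box_stat fd.
 * Returns 0 and stores the count on success, -1 on error or short read.
 */
int box_stat_read(int fd, uint32_t *count)
{
	unsigned char raw[4];

	assert(count != NULL);
	/* Reads of fewer than four bytes are rejected with EINVAL. */
	if (read(fd, raw, sizeof(raw)) != (ssize_t)sizeof(raw))
		return -1;
	*count = be32_decode(raw);
	return 0;
}
```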
145 | |||
146 | /npc | ||
147 | /decr | ||
148 | /decr_status | ||
149 | /spu_tag_mask | ||
150 | /event_mask | ||
151 | /srr0 | ||
152 | Internal registers of the SPU. The representation is an ASCII string | ||
153 | with the numeric value of the next instruction to be executed. These | ||
154 | can be used in read/write mode for debugging, but normal operation of | ||
155 | programs should not rely on them because access to any of them except | ||
156 | npc requires an SPU context save and is therefore very inefficient. | ||
157 | |||
158 | The contents of these files are: | ||
159 | |||
160 | npc Next Program Counter | ||
161 | |||
162 | decr SPU Decrementer | ||
163 | |||
164 | decr_status Decrementer Status | ||
165 | |||
166 | spu_tag_mask MFC tag mask for SPU DMA | ||
167 | |||
168 | event_mask Event mask for SPU interrupts | ||
169 | |||
170 | srr0 Interrupt Return address register | ||
171 | |||
172 | |||
173 | The possible operations on an open npc, decr, decr_status, | ||
174 | spu_tag_mask, event_mask or srr0 file are: | ||
175 | |||
176 | read(2) | ||
177 | When the count supplied to the read call is shorter than the | ||
178 | required length for the pointer value plus a newline character, | ||
179 | subsequent reads from the same file descriptor will result in | ||
180 | completing the string, regardless of changes to the register by | ||
181 | a running SPU task. When a complete string has been read, all | ||
182 | subsequent read operations will return zero bytes and a new file | ||
183 | descriptor needs to be opened to read the value again. | ||
184 | |||
185 | write(2) | ||
186 | A write operation on the file results in setting the register to | ||
187 | the value given in the string. The string is parsed from the | ||
188 | beginning to the first non-numeric character or the end of the | ||
189 | buffer. Subsequent writes to the same file descriptor overwrite | ||
190 | the previous setting. | ||
191 | |||
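The read behaviour described above (short reads complete the string, then reads return zero bytes) suggests reading until end of file before parsing. The sketch below does that in C. The name spu_reg_read() is hypothetical, and the use of strtoul() with base 0 is an assumption, since the text does not say whether the value is printed in decimal or hexadecimal; base 0 accepts both a plain decimal number and a 0x-prefixed one.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: read an ASCII register file (npc, decr, ...) from
 * an open fd and parse it. Returns 0 on success, -1 on failure.
 */
int spu_reg_read(int fd, unsigned long *value)
{
	char buf[64];
	size_t used = 0;

	assert(value != NULL);

	/* Keep reading until the file reports end of string (zero bytes). */
	while (used < sizeof(buf) - 1) {
		ssize_t n = read(fd, buf + used, sizeof(buf) - 1 - used);
		if (n < 0)
			return -1;
		if (n == 0)
			break;
		used += (size_t)n;
	}
	buf[used] = '\0';

	/* Assumption: base 0, so "496" and "0x1f0" both parse. */
	char *end;
	errno = 0;
	unsigned long v = strtoul(buf, &end, 0);
	if (end == buf || errno != 0)
		return -1;

	*value = v;
	return 0;
}
```

To read the register a second time, a new file descriptor has to be opened, as noted above.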
192 | |||
193 | /fpcr | ||
194 | This file gives access to the Floating Point Status and Control Regis- | ||
195 | ter as a four byte long file. The operations on the fpcr file are: | ||
196 | |||
197 | read(2) | ||
198 | If a count smaller than four is requested, read returns -1 and | ||
199 | sets errno to EINVAL. Otherwise, a four byte value is placed in | ||
200 | the data buffer, containing the current value of the fpcr regis- | ||
201 | ter. | ||
202 | |||
203 | write(2) | ||
204 | If a count smaller than four is requested, write returns -1 and | ||
205 | sets errno to EINVAL. Otherwise, a four byte value is copied | ||
206 | from the data buffer, updating the value of the fpcr register. | ||
207 | |||
208 | |||
209 | /signal1 | ||
210 | /signal2 | ||
211 | The two signal notification channels of an SPU. These are read-write | ||
212 | files that operate on a 32 bit word. Writing to one of these files | ||
213 | triggers an interrupt on the SPU. The value written to the signal | ||
214 | files can be read from the SPU through a channel read or from host user | ||
215 | space through the file. After the value has been read by the SPU, it | ||
216 | is reset to zero. The possible operations on an open signal1 or sig- | ||
217 | nal2 file are: | ||
218 | |||
219 | read(2) | ||
220 | If a count smaller than four is requested, read returns -1 and | ||
221 | sets errno to EINVAL. Otherwise, a four byte value is placed in | ||
222 | the data buffer, containing the current value of the specified | ||
223 | signal notification register. | ||
224 | |||
225 | write(2) | ||
226 | If a count smaller than four is requested, write returns -1 and | ||
227 | sets errno to EINVAL. Otherwise, a four byte value is copied | ||
228 | from the data buffer, updating the value of the specified signal | ||
229 | notification register. The signal notification register will | ||
230 | either be replaced with the input data or will be updated to the | ||
231 | bitwise OR of the old value and the input data, depending on the | ||
232 | contents of the signal1_type, or signal2_type respectively, | ||
233 | file. | ||
234 | |||
235 | |||
236 | /signal1_type | ||
237 | /signal2_type | ||
238 | These two files change the behavior of the signal1 and signal2 notifi- | ||
239 | cation files. They contain a numerical ASCII string which is read as | ||
240 | either "1" or "0". In mode 0 (overwrite), the hardware replaces the | ||
241 | contents of the signal channel with the data that is written to it. In | ||
242 | mode 1 (logical OR), the hardware accumulates the bits that are subse- | ||
243 | quently written to it. The possible operations on an open signal1_type | ||
244 | or signal2_type file are: | ||
245 | |||
246 | read(2) | ||
247 | When the count supplied to the read call is shorter than the | ||
248 | required length for the digit plus a newline character, subse- | ||
249 | quent reads from the same file descriptor will result in com- | ||
250 | pleting the string. When a complete string has been read, all | ||
251 | subsequent read operations will return zero bytes and a new file | ||
252 | descriptor needs to be opened to read the value again. | ||
253 | |||
254 | write(2) | ||
255 | A write operation on the file results in setting the register to | ||
256 | the value given in the string. The string is parsed from the | ||
257 | beginning to the first non-numeric character or the end of the | ||
258 | buffer. Subsequent writes to the same file descriptor overwrite | ||
259 | the previous setting. | ||
260 | |||
261 | |||
262 | EXAMPLES | ||
263 | /etc/fstab entry | ||
264 | none /spu spufs gid=spu 0 0 | ||
265 | |||
266 | |||
267 | AUTHORS | ||
268 | Arnd Bergmann <arndb@de.ibm.com>, Mark Nutter <mnutter@us.ibm.com>, | ||
269 | Ulrich Weigand <Ulrich.Weigand@de.ibm.com> | ||
270 | |||
271 | SEE ALSO | ||
272 | capabilities(7), close(2), spu_create(2), spu_run(2), spufs(7) | ||
273 | |||
274 | |||
275 | |||
276 | Linux 2005-09-28 SPUFS(2) | ||
277 | |||
278 | ------------------------------------------------------------------------------ | ||
279 | |||
280 | SPU_RUN(2) Linux Programmer's Manual SPU_RUN(2) | ||
281 | |||
282 | |||
283 | |||
284 | NAME | ||
285 | spu_run - execute an spu context | ||
286 | |||
287 | |||
288 | SYNOPSIS | ||
289 | #include <sys/spu.h> | ||
290 | |||
291 | int spu_run(int fd, unsigned int *npc, unsigned int *event); | ||
292 | |||
293 | DESCRIPTION | ||
294 | The spu_run system call is used on PowerPC machines that implement the | ||
295 | Cell Broadband Engine Architecture in order to access Synergistic Pro- | ||
296 | cessor Units (SPUs). It uses the fd that was returned from spu_cre- | ||
297 | ate(2) to address a specific SPU context. When the context gets sched- | ||
298 | uled to a physical SPU, it starts execution at the instruction pointer | ||
299 | passed in npc. | ||
300 | |||
301 | Execution of SPU code happens synchronously, meaning that spu_run does | ||
302 | not return while the SPU is still running. If there is a need to exe- | ||
303 | cute SPU code in parallel with other code on either the main CPU or | ||
304 | other SPUs, you need to create a new thread of execution first, e.g. | ||
305 | using the pthread_create(3) call. | ||
306 | |||
307 | When spu_run returns, the current value of the SPU instruction pointer | ||
308 | is written back to npc, so you can call spu_run again without updating | ||
309 | the pointers. | ||
310 | |||
311 | event can be a NULL pointer or point to an extended status code that | ||
312 | gets filled when spu_run returns. It can be one of the following con- | ||
313 | stants: | ||
314 | |||
315 | SPE_EVENT_DMA_ALIGNMENT | ||
316 | A DMA alignment error | ||
317 | |||
318 | SPE_EVENT_SPE_DATA_SEGMENT | ||
319 | A DMA segmentation error | ||
320 | |||
321 | SPE_EVENT_SPE_DATA_STORAGE | ||
322 | A DMA storage error | ||
323 | |||
324 | If NULL is passed as the event argument, these errors will result in a | ||
325 | signal delivered to the calling process. | ||
326 | |||
327 | RETURN VALUE | ||
328 | spu_run returns the value of the spu_status register or -1 to indicate | ||
329 | an error and set errno to one of the error codes listed below. The | ||
330 | spu_status register value contains a bit mask of status codes and | ||
331 | optionally a 14 bit code returned from the stop-and-signal instruction | ||
332 | on the SPU. The bit masks for the status codes are: | ||
333 | |||
334 | 0x02 SPU was stopped by stop-and-signal. | ||
335 | |||
336 | 0x04 SPU was stopped by halt. | ||
337 | |||
338 | 0x08 SPU is waiting for a channel. | ||
339 | |||
340 | 0x10 SPU is in single-step mode. | ||
341 | |||
342 | 0x20 SPU has tried to execute an invalid instruction. | ||
343 | |||
344 | 0x40 SPU has tried to access an invalid channel. | ||
345 | |||
346 | 0x3fff0000 | ||
347 | The bits masked with this value contain the code returned from | ||
348 | stop-and-signal. | ||
349 | |||
350 | There are always one or more of the lower eight bits set or an error | ||
351 | code is returned from spu_run. | ||
352 | |||
353 | ERRORS | ||
354 | EAGAIN or EWOULDBLOCK | ||
355 | fd is in non-blocking mode and spu_run would block. | ||
356 | |||
357 | EBADF fd is not a valid file descriptor. | ||
358 | |||
359 | EFAULT npc is not a valid pointer or event is neither NULL nor a valid | ||
360 | pointer. | ||
361 | |||
362 | EINTR A signal occurred while spu_run was in progress. The npc value | ||
363 | has been updated to the new program counter value if necessary. | ||
364 | |||
365 | EINVAL fd is not a file descriptor returned from spu_create(2). | ||
366 | |||
367 | ENOMEM Insufficient memory was available to handle a page fault result- | ||
368 | ing from an MFC direct memory access. | ||
369 | |||
370 | ENOSYS The functionality is not provided by the current system, because | ||
371 | either the hardware does not provide SPUs or the spufs module is | ||
372 | not loaded. | ||
373 | |||
374 | |||
375 | NOTES | ||
376 | spu_run is meant to be used from libraries that implement a more | ||
377 | abstract interface to SPUs, not to be used from regular applications. | ||
378 | See http://www.bsc.es/projects/deepcomputing/linuxoncell/ for the rec- | ||
379 | ommended libraries. | ||
380 | |||
381 | |||
382 | CONFORMING TO | ||
383 | This call is Linux specific and only implemented by the ppc64 architec- | ||
384 | ture. Programs using this system call are not portable. | ||
385 | |||
386 | |||
387 | BUGS | ||
388 | The code does not yet fully implement all features outlined here. | ||
389 | |||
390 | |||
391 | AUTHOR | ||
392 | Arnd Bergmann <arndb@de.ibm.com> | ||
393 | |||
394 | SEE ALSO | ||
395 | capabilities(7), close(2), spu_create(2), spufs(7) | ||
396 | |||
397 | |||
398 | |||
399 | Linux 2005-09-28 SPU_RUN(2) | ||
400 | |||
401 | ------------------------------------------------------------------------------ | ||
402 | |||
403 | SPU_CREATE(2) Linux Programmer's Manual SPU_CREATE(2) | ||
404 | |||
405 | |||
406 | |||
407 | NAME | ||
408 | spu_create - create a new spu context | ||
409 | |||
410 | |||
411 | SYNOPSIS | ||
412 | #include <sys/types.h> | ||
413 | #include <sys/spu.h> | ||
414 | |||
415 | int spu_create(const char *pathname, int flags, mode_t mode); | ||
416 | |||
417 | DESCRIPTION | ||
418 | The spu_create system call is used on PowerPC machines that implement | ||
419 | the Cell Broadband Engine Architecture in order to access Synergistic | ||
420 | Processor Units (SPUs). It creates a new logical context for an SPU in | ||
421 | pathname and returns a handle associated with it. pathname must | ||
422 | point to a non-existing directory in the mount point of the SPU file | ||
423 | system (spufs). When spu_create is successful, a directory gets cre- | ||
424 | ated on pathname and it is populated with files. | ||
425 | |||
426 | The returned file handle can only be passed to spu_run(2) or closed; | ||
427 | other operations are not defined on it. When it is closed, all associ- | ||
428 | ated directory entries in spufs are removed. When the last file handle | ||
429 | pointing either inside of the context directory or to this file | ||
430 | descriptor is closed, the logical SPU context is destroyed. | ||
431 | |||
432 | The parameter flags can be zero or any bitwise or'd combination of the | ||
433 | following constants: | ||
434 | |||
435 | SPU_RAWIO | ||
436 | Allow mapping of some of the hardware registers of the SPU into | ||
437 | user space. This flag requires the CAP_SYS_RAWIO capability, see | ||
438 | capabilities(7). | ||
439 | |||
440 | The mode parameter specifies the permissions used for creating the new | ||
441 | directory in spufs. mode is modified with the user's umask(2) value | ||
442 | and then used for both the directory and the files contained in it. The | ||
443 | file permissions mask out some more bits of mode because they typically | ||
444 | support only read or write access. See stat(2) for a full list of the | ||
445 | possible mode values. | ||
446 | |||
447 | |||
448 | RETURN VALUE | ||
449 | spu_create returns a new file descriptor. It may return -1 to indicate | ||
450 | an error condition and set errno to one of the error codes listed | ||
451 | below. | ||
452 | |||
453 | |||
454 | ERRORS | ||
455 | EACCES | ||
456 | The current user does not have write access on the spufs mount | ||
457 | point. | ||
458 | |||
459 | EEXIST An SPU context already exists at the given path name. | ||
460 | |||
461 | EFAULT pathname is not a valid string pointer in the current address | ||
462 | space. | ||
463 | |||
464 | EINVAL pathname is not a directory in the spufs mount point. | ||
465 | |||
466 | ELOOP Too many symlinks were found while resolving pathname. | ||
467 | |||
468 | EMFILE The process has reached its maximum open file limit. | ||
469 | |||
470 | ENAMETOOLONG | ||
471 | pathname was too long. | ||
472 | |||
473 | ENFILE The system has reached the global open file limit. | ||
474 | |||
475 | ENOENT Part of pathname could not be resolved. | ||
476 | |||
477 | ENOMEM The kernel could not allocate all resources required. | ||
478 | |||
479 | ENOSPC There are not enough SPU resources available to create a new | ||
480 | context or the user specific limit for the number of SPU con- | ||
481 | texts has been reached. | ||
482 | |||
483 | ENOSYS The functionality is not provided by the current system, because | ||
484 | either the hardware does not provide SPUs or the spufs module is | ||
485 | not loaded. | ||
486 | |||
487 | ENOTDIR | ||
488 | A part of pathname is not a directory. | ||
489 | |||
490 | |||
491 | |||
492 | NOTES | ||
493 | spu_create is meant to be used from libraries that implement a more | ||
494 | abstract interface to SPUs, not to be used from regular applications. | ||
495 | See http://www.bsc.es/projects/deepcomputing/linuxoncell/ for the rec- | ||
496 | ommended libraries. | ||
497 | |||
498 | |||
499 | FILES | ||
500 | pathname must point to a location beneath the mount point of spufs. By | ||
501 | convention, it gets mounted in /spu. | ||
502 | |||
503 | |||
504 | CONFORMING TO | ||
505 | This call is Linux specific and only implemented by the ppc64 architec- | ||
506 | ture. Programs using this system call are not portable. | ||
507 | |||
508 | |||
509 | BUGS | ||
510 | The code does not yet fully implement all features outlined here. | ||
511 | |||
512 | |||
513 | AUTHOR | ||
514 | Arnd Bergmann <arndb@de.ibm.com> | ||
515 | |||
516 | SEE ALSO | ||
517 | capabilities(7), close(2), spu_run(2), spufs(7) | ||
518 | |||
519 | |||
520 | |||
521 | Linux 2005-09-28 SPU_CREATE(2) | ||
diff --git a/Documentation/filesystems/sysfs-pci.txt b/Documentation/filesystems/sysfs-pci.txt index 988a62fae11f..7ba2baa165ff 100644 --- a/Documentation/filesystems/sysfs-pci.txt +++ b/Documentation/filesystems/sysfs-pci.txt | |||
@@ -1,4 +1,5 @@ | |||
1 | Accessing PCI device resources through sysfs | 1 | Accessing PCI device resources through sysfs |
2 | -------------------------------------------- | ||
2 | 3 | ||
3 | sysfs, usually mounted at /sys, provides access to PCI resources on platforms | 4 | sysfs, usually mounted at /sys, provides access to PCI resources on platforms |
4 | that support it. For example, a given bus might look like this: | 5 | that support it. For example, a given bus might look like this: |
@@ -47,14 +48,21 @@ files, each with their own function. | |||
47 | binary - file contains binary data | 48 | binary - file contains binary data |
48 | cpumask - file contains a cpumask type | 49 | cpumask - file contains a cpumask type |
49 | 50 | ||
50 | The read only files are informational, writes to them will be ignored. | 51 | The read only files are informational, writes to them will be ignored, with |
51 | Writable files can be used to perform actions on the device (e.g. changing | 52 | the exception of the 'rom' file. Writable files can be used to perform |
52 | config space, detaching a device). mmapable files are available via an | 53 | actions on the device (e.g. changing config space, detaching a device). |
53 | mmap of the file at offset 0 and can be used to do actual device programming | 54 | mmapable files are available via an mmap of the file at offset 0 and can be |
54 | from userspace. Note that some platforms don't support mmapping of certain | 55 | used to do actual device programming from userspace. Note that some platforms |
55 | resources, so be sure to check the return value from any attempted mmap. | 56 | don't support mmapping of certain resources, so be sure to check the return |
57 | value from any attempted mmap. | ||
58 | |||
59 | The 'rom' file is special in that it provides read-only access to the device's | ||
60 | ROM file, if available. It's disabled by default, however, so applications | ||
61 | should write the string "1" to the file to enable it before attempting a read | ||
62 | call, and disable it following the access by writing "0" to the file. | ||
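
The enable/read/disable sequence described for the 'rom' file could be sketched in userspace C roughly as follows. This is only an illustration, not code from the patch: the function name sketch_read_rom is hypothetical, and the caller has to supply the path of an actual 'rom' file under the sysfs mount point.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to len bytes from a sysfs 'rom' file (hypothetical helper).
 * rom_path is supplied by the caller and must name the 'rom' file of
 * the device of interest. Returns the number of bytes read, or -1.
 */
static ssize_t sketch_read_rom(const char *rom_path, void *buf, size_t len)
{
	ssize_t got = -1;
	int fd;

	assert(rom_path != NULL && buf != NULL);

	fd = open(rom_path, O_RDWR);
	if (fd < 0)
		return -1;

	/* Enable ROM access by writing "1", then read from offset 0. */
	if (write(fd, "1", 1) == 1 && lseek(fd, 0, SEEK_SET) == 0)
		got = read(fd, buf, len);

	/* Disable access again by writing "0"; treat a failure as an error. */
	if (lseek(fd, 0, SEEK_SET) != 0 || write(fd, "0", 1) != 1)
		got = -1;

	close(fd);
	return got;
}
```

The sketch always attempts the "0" write, even when the read failed, so that the ROM is not left enabled by accident.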
56 | 63 | ||
57 | Accessing legacy resources through sysfs | 64 | Accessing legacy resources through sysfs |
65 | ---------------------------------------- | ||
58 | 66 | ||
59 | Legacy I/O port and ISA memory resources are also provided in sysfs if the | 67 | Legacy I/O port and ISA memory resources are also provided in sysfs if the |
60 | underlying platform supports them. They're located in the PCI class hierarchy, | 68 | underlying platform supports them. They're located in the PCI class hierarchy, |
@@ -75,6 +83,7 @@ simply dereference the returned pointer (after checking for errors of course) | |||
75 | to access legacy memory space. | 83 | to access legacy memory space. |
76 | 84 | ||
77 | Supporting PCI access on new platforms | 85 | Supporting PCI access on new platforms |
86 | -------------------------------------- | ||
78 | 87 | ||
79 | In order to support PCI resource mapping as described above, Linux platform | 88 | In order to support PCI resource mapping as described above, Linux platform |
80 | code must define HAVE_PCI_MMAP and provide a pci_mmap_page_range function. | 89 | code must define HAVE_PCI_MMAP and provide a pci_mmap_page_range function. |
diff --git a/Documentation/hrtimers.txt b/Documentation/hrtimers.txt new file mode 100644 index 000000000000..7620ff735faf --- /dev/null +++ b/Documentation/hrtimers.txt | |||
@@ -0,0 +1,178 @@ | |||
1 | |||
2 | hrtimers - subsystem for high-resolution kernel timers | ||
3 | ---------------------------------------------------- | ||
4 | |||
5 | This patch introduces a new subsystem for high-resolution kernel timers. | ||
6 | |||
7 | One might ask the question: we already have a timer subsystem | ||
8 | (kernel/timers.c), why do we need two timer subsystems? After a lot of | ||
9 | back and forth trying to integrate high-resolution and high-precision | ||
10 | features into the existing timer framework, and after testing various | ||
11 | such high-resolution timer implementations in practice, we came to the | ||
12 | conclusion that the timer wheel code is fundamentally not suitable for | ||
13 | such an approach. We initially didn't believe this ('there must be a way | ||
14 | to solve this'), and spent a considerable effort trying to integrate | ||
15 | things into the timer wheel, but we failed. In hindsight, there are | ||
16 | several reasons why such integration is hard/impossible: | ||
17 | |||
18 | - the forced handling of low-resolution and high-resolution timers in | ||
19 | the same way leads to a lot of compromises, macro magic and #ifdef | ||
20 | mess. The timers.c code is very "tightly coded" around jiffies and | ||
21 | 32-bitness assumptions, and has been honed and micro-optimized for a | ||
22 | relatively narrow use case (jiffies in a relatively narrow HZ range) | ||
23 | for many years - and thus even small extensions to it easily break | ||
24 | the wheel concept, leading to even worse compromises. The timer wheel | ||
25 | code is very good and tight code, there's zero problems with it in its | ||
26 | current usage - but it is simply not suitable to be extended for | ||
27 | high-res timers. | ||
28 | |||
29 | - the unpredictable [O(N)] overhead of cascading leads to delays which | ||
30 | necessitate a more complex handling of high resolution timers, which | ||
31 | in turn decreases robustness. Such a design still led to rather large | ||
32 | timing inaccuracies. Cascading is a fundamental property of the timer | ||
33 | wheel concept, it cannot be 'designed out' without inevitably | ||
34 | degrading other portions of the timers.c code in an unacceptable way. | ||
35 | |||
36 | - the implementation of the current posix-timer subsystem on top of | ||
37 | the timer wheel has already introduced a quite complex handling of | ||
38 | the required readjusting of absolute CLOCK_REALTIME timers at | ||
39 | settimeofday or NTP time - further underlining our experience by | ||
40 | example: that the timer wheel data structure is too rigid for high-res | ||
41 | timers. | ||
42 | |||
43 | - the timer wheel code is most optimal for use cases which can be | ||
44 | identified as "timeouts". Such timeouts are usually set up to cover | ||
45 | error conditions in various I/O paths, such as networking and block | ||
46 | I/O. The vast majority of those timers never expire and are rarely | ||
47 | recascaded because the expected correct event arrives in time so they | ||
48 | can be removed from the timer wheel before any further processing of | ||
49 | them becomes necessary. Thus the users of these timeouts can accept | ||
50 | the granularity and precision tradeoffs of the timer wheel, and | ||
51 | largely expect the timer subsystem to have near-zero overhead. | ||
52 | Accurate timing for them is not a core purpose - in fact most of the | ||
53 | timeout values used are ad-hoc. For them it is at most a necessary | ||
54 | evil to guarantee the processing of actual timeout completions | ||
55 | (because most of the timeouts are deleted before completion), which | ||
56 | should thus be as cheap and unintrusive as possible. | ||
57 | |||
58 | The primary users of precision timers are user-space applications that | ||
59 | utilize nanosleep, posix-timers and itimer interfaces. Also, in-kernel | ||
60 | users like drivers and subsystems which require precise timed events | ||
61 | (e.g. multimedia) can benefit from the availability of a separate | ||
62 | high-resolution timer subsystem as well. | ||
63 | |||
64 | While this subsystem does not offer high-resolution clock sources just | ||
65 | yet, the hrtimer subsystem can be easily extended with high-resolution | ||
66 | clock capabilities, and patches for that exist and are maturing quickly. | ||
67 | The increasing demand for realtime and multimedia applications along | ||
68 | with other potential users for precise timers gives another reason to | ||
69 | separate the "timeout" and "precise timer" subsystems. | ||
70 | |||
71 | Another potential benefit is that such a separation allows even more | ||
72 | special-purpose optimization of the existing timer wheel for the low | ||
73 | resolution and low precision use cases - once the precision-sensitive | ||
74 | APIs are separated from the timer wheel and are migrated over to | ||
75 | hrtimers. E.g. we could decrease the frequency of the timeout subsystem | ||
76 | from 250 Hz to 100 Hz (or even lower). | ||
77 | |||
78 | hrtimer subsystem implementation details | ||
79 | ---------------------------------------- | ||
80 | |||
81 | the basic design considerations were: | ||
82 | |||
83 | - simplicity | ||
84 | |||
85 | - data structure not bound to jiffies or any other granularity. All the | ||
86 | kernel logic works at 64-bit nanoseconds resolution - no compromises. | ||
87 | |||
88 | - simplification of existing, timing related kernel code | ||
89 | |||
90 | another basic requirement was the immediate enqueueing and ordering of | ||
91 | timers at activation time. After looking at several possible solutions | ||
92 | such as radix trees and hashes, we chose the red black tree as the basic | ||
93 | data structure. Rbtrees are available as a library in the kernel and are | ||
94 | used in various performance-critical areas of e.g. memory management and | ||
95 | file systems. The rbtree is solely used for time sorted ordering, while | ||
96 | a separate list is used to give the expiry code fast access to the | ||
97 | queued timers, without having to walk the rbtree. | ||
98 | |||
99 | (This separate list is also useful for later when we'll introduce | ||
100 | high-resolution clocks, where we need separate pending and expired | ||
101 | queues while keeping the time-order intact.) | ||
102 | |||
103 | Time-ordered enqueueing is not purely for the purposes of | ||
104 | high-resolution clocks though, it also simplifies the handling of | ||
105 | absolute timers based on a low-resolution CLOCK_REALTIME. The existing | ||
106 | implementation needed to keep an extra list of all armed absolute | ||
107 | CLOCK_REALTIME timers along with complex locking. In case of | ||
108 | settimeofday and NTP, all the timers (!) had to be dequeued, the | ||
109 | time-changing code had to fix them up one by one, and all of them had to | ||
110 | be enqueued again. The time-ordered enqueueing and the storage of the | ||
111 | expiry time in absolute time units removes all this complex and poorly | ||
112 | scaling code from the posix-timer implementation - the clock can simply | ||
113 | be set without having to touch the rbtree. This also makes the handling | ||
114 | of posix-timers simpler in general. | ||
115 | |||
116 | The locking and per-CPU behavior of hrtimers was mostly taken from the | ||
117 | existing timer wheel code, as it is mature and well suited. Sharing code | ||
118 | was not really a win, due to the different data structures. Also, the | ||
119 | hrtimer functions now have clearer behavior and clearer names - such as | ||
120 | hrtimer_try_to_cancel() and hrtimer_cancel() [which are roughly | ||
121 | equivalent to del_timer() and del_timer_sync()] - so there's no direct | ||
122 | 1:1 mapping between them on the algorithmical level, and thus no real | ||
123 | potential for code sharing either. | ||
124 | |||
125 | Basic data types: every time value, absolute or relative, is in a | ||
126 | special nanosecond-resolution type: ktime_t. The kernel-internal | ||
127 | representation of ktime_t values and operations is implemented via | ||
128 | macros and inline functions, and can be switched between a "hybrid | ||
129 | union" type and a plain "scalar" 64bit nanoseconds representation (at | ||
130 | compile time). The hybrid union type optimizes time conversions on 32bit | ||
131 | CPUs. This build-time-selectable ktime_t storage format was implemented | ||
132 | to avoid the performance impact of 64-bit multiplications and divisions | ||
133 | on 32bit CPUs. Such operations are frequently necessary to convert | ||
134 | between the storage formats provided by kernel and userspace interfaces | ||
135 | and the internal time format. (See include/linux/ktime.h for further | ||
136 | details.) | ||
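
To make the difference between the two storage formats more concrete, here is a simplified sketch. It is not the definition from include/linux/ktime.h: all sketch_* names are hypothetical, and the "hybrid" form is shown as a plain struct of seconds and nanoseconds rather than the real union. The point is only that the scalar form needs a 64-bit multiplication to build a value from seconds and nanoseconds, while the split form gets by with 32-bit stores and additions.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000L

/* "Scalar" form: a single 64-bit nanosecond count. */
typedef int64_t sketch_ktime_scalar;

/* Simplified "hybrid" form: seconds and nanoseconds kept apart. */
typedef struct {
	int32_t sec;
	int32_t nsec;	/* kept in [0, SKETCH_NSEC_PER_SEC) */
} sketch_ktime_hybrid;

/* Building a scalar value costs a 64-bit multiplication. */
static sketch_ktime_scalar sketch_scalar_set(int32_t sec, int32_t nsec)
{
	return (int64_t)sec * SKETCH_NSEC_PER_SEC + nsec;
}

/* Building a hybrid value is just two 32-bit stores. */
static sketch_ktime_hybrid sketch_hybrid_set(int32_t sec, int32_t nsec)
{
	sketch_ktime_hybrid kt = { sec, nsec };

	assert(nsec >= 0 && nsec < SKETCH_NSEC_PER_SEC);
	return kt;
}

/* Adding hybrid values needs a carry from nsec into sec. */
static sketch_ktime_hybrid sketch_hybrid_add(sketch_ktime_hybrid a,
					     sketch_ktime_hybrid b)
{
	sketch_ktime_hybrid r = { a.sec + b.sec, a.nsec + b.nsec };

	if (r.nsec >= SKETCH_NSEC_PER_SEC) {
		r.nsec -= SKETCH_NSEC_PER_SEC;
		r.sec++;
	}
	return r;
}
```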
137 | |||
138 | hrtimers - rounding of timer values | ||
139 | ----------------------------------- | ||
140 | |||
141 | the hrtimer code will round timer events to lower-resolution clocks | ||
142 | because it has to. Otherwise it will do no artificial rounding at all. | ||
143 | |||
144 | one question is, what resolution value should be returned to the user by | ||
145 | the clock_getres() interface. This will return whatever real resolution | ||
146 | a given clock has - be it low-res, high-res, or artificially-low-res. | ||
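
From userspace, the resolution a clock reports can be queried with the POSIX clock_getres() call. The helper below is a minimal sketch (the name sketch_clock_resolution is hypothetical). It shows only how to query the value and makes no claim about which value a particular kernel returns.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <assert.h>
#include <time.h>

/*
 * Store the resolution reported for clock 'id' in *res.
 * Returns 0 on success, -1 if clock_getres() fails.
 */
static int sketch_clock_resolution(clockid_t id, struct timespec *res)
{
	assert(res != NULL);

	if (clock_getres(id, res) != 0)
		return -1;
	return 0;
}
```

Calling it once with CLOCK_REALTIME and once with CLOCK_MONOTONIC is an easy way to see whether a given system reports a jiffy-sized or a finer resolution.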
147 | |||
148 | hrtimers - testing and verification | ||
149 | ---------------------------------- | ||
150 | |||
151 | We used the high-resolution clock subsystem on top of hrtimers to verify | ||
152 | the hrtimer implementation details in practice, and we also ran the posix | ||
153 | timer tests in order to ensure specification compliance. We also ran | ||
154 | tests on low-resolution clocks. | ||
155 | |||
156 | The hrtimer patch converts the following kernel functionality to use | ||
157 | hrtimers: | ||
158 | |||
159 | - nanosleep | ||
160 | - itimers | ||
161 | - posix-timers | ||
162 | |||
163 | The conversion of nanosleep and posix-timers enabled the unification of | ||
164 | nanosleep and clock_nanosleep. | ||
165 | |||
166 | The code was successfully compiled for the following platforms: | ||
167 | |||
168 | i386, x86_64, ARM, PPC, PPC64, IA64 | ||
169 | |||
170 | The code was run-tested on the following platforms: | ||
171 | |||
172 | i386(UP/SMP), x86_64(UP/SMP), ARM, PPC | ||
173 | |||
174 | hrtimers were also integrated into the -rt tree, along with a | ||
175 | hrtimers-based high-resolution clock implementation, so the hrtimers | ||
176 | code got a healthy amount of testing and use in practice. | ||
177 | |||
178 | Thomas Gleixner, Ingo Molnar | ||
diff --git a/Documentation/hwmon/w83627hf b/Documentation/hwmon/w83627hf index 78f37c2d602e..5d23776e9907 100644 --- a/Documentation/hwmon/w83627hf +++ b/Documentation/hwmon/w83627hf | |||
@@ -54,13 +54,16 @@ If you really want i2c accesses for these Super I/O chips, | |||
54 | use the w83781d driver. However this is not the preferred method | 54 | use the w83781d driver. However this is not the preferred method |
55 | now that this ISA driver has been developed. | 55 | now that this ISA driver has been developed. |
56 | 56 | ||
57 | Technically, the w83627thf does not support a VID reading. However, it's | 57 | The w83627_HF_ uses pins 110-106 as VID0-VID4. The w83627_THF_ uses the |
58 | possible or even likely that your mainboard maker has routed these signals | 58 | same pins as GPIO[0:4]. Technically, the w83627_THF_ does not support a |
59 | to a specific set of general purpose IO pins (the Asus P4C800-E is one such | 59 | VID reading. However the two chips have the identical 128 pin package. So, |
60 | board). The w83627thf driver now interprets these as VID. If the VID on | 60 | it is possible or even likely for a w83627thf to have the VID signals routed |
61 | your board doesn't work, first see doc/vid in the lm_sensors package. If | 61 | to these pins despite their not being labeled for that purpose. Therefore, |
62 | that still doesn't help, email us at lm-sensors@lm-sensors.org. | 62 | the w83627thf driver interprets these as VID. If the VID on your board |
63 | doesn't work, first see doc/vid in the lm_sensors package[1]. If that still | ||
64 | doesn't help, you may just ignore the bogus VID reading with no harm done. | ||
63 | 65 | ||
64 | For further information on this driver see the w83781d driver | 66 | For further information on this driver see the w83781d driver documentation. |
65 | documentation. | 67 | |
68 | [1] http://www2.lm-sensors.nu/~lm78/cvs/browse.cgi/lm_sensors2/doc/vid | ||
66 | 69 | ||
diff --git a/Documentation/i2c/busses/i2c-nforce2 b/Documentation/i2c/busses/i2c-nforce2 index e379e182e64f..d751282d9b2a 100644 --- a/Documentation/i2c/busses/i2c-nforce2 +++ b/Documentation/i2c/busses/i2c-nforce2 | |||
@@ -5,7 +5,8 @@ Supported adapters: | |||
5 | * nForce2 Ultra 400 MCP 10de:0084 | 5 | * nForce2 Ultra 400 MCP 10de:0084 |
6 | * nForce3 Pro150 MCP 10de:00D4 | 6 | * nForce3 Pro150 MCP 10de:00D4 |
7 | * nForce3 250Gb MCP 10de:00E4 | 7 | * nForce3 250Gb MCP 10de:00E4 |
8 | * nForce4 MCP 10de:0052 | 8 | * nForce4 MCP 10de:0052 |
9 | * nForce4 MCP-04 10de:0034 | ||
9 | 10 | ||
10 | Datasheet: not publically available, but seems to be similar to the | 11 | Datasheet: not publically available, but seems to be similar to the |
11 | AMD-8111 SMBus 2.0 adapter. | 12 | AMD-8111 SMBus 2.0 adapter. |
diff --git a/Documentation/i2c/busses/i2c-parport b/Documentation/i2c/busses/i2c-parport index 9f1d0082da18..d9f23c0763f1 100644 --- a/Documentation/i2c/busses/i2c-parport +++ b/Documentation/i2c/busses/i2c-parport | |||
@@ -17,6 +17,7 @@ It currently supports the following devices: | |||
17 | * Velleman K8000 adapter | 17 | * Velleman K8000 adapter |
18 | * ELV adapter | 18 | * ELV adapter |
19 | * Analog Devices evaluation boards (ADM1025, ADM1030, ADM1031, ADM1032) | 19 | * Analog Devices evaluation boards (ADM1025, ADM1030, ADM1031, ADM1032) |
20 | * Barco LPT->DVI (K5800236) adapter | ||
20 | 21 | ||
21 | These devices use different pinout configurations, so you have to tell | 22 | These devices use different pinout configurations, so you have to tell |
22 | the driver what you have, using the type module parameter. There is no | 23 | the driver what you have, using the type module parameter. There is no |
diff --git a/Documentation/i2c/porting-clients b/Documentation/i2c/porting-clients index 184fac2377aa..f03c2a02f806 100644 --- a/Documentation/i2c/porting-clients +++ b/Documentation/i2c/porting-clients | |||
@@ -1,10 +1,13 @@ | |||
1 | Revision 5, 2005-07-29 | 1 | Revision 6, 2005-11-20 |
2 | Jean Delvare <khali@linux-fr.org> | 2 | Jean Delvare <khali@linux-fr.org> |
3 | Greg KH <greg@kroah.com> | 3 | Greg KH <greg@kroah.com> |
4 | 4 | ||
5 | This is a guide on how to convert I2C chip drivers from Linux 2.4 to | 5 | This is a guide on how to convert I2C chip drivers from Linux 2.4 to |
6 | Linux 2.6. I have been using existing drivers (lm75, lm78) as examples. | 6 | Linux 2.6. I have been using existing drivers (lm75, lm78) as examples. |
7 | Then I converted a driver myself (lm83) and updated this document. | 7 | Then I converted a driver myself (lm83) and updated this document. |
8 | Note that this guide is strongly oriented towards hardware monitoring | ||
9 | drivers. Many points are still valid for other types of drivers, but | ||
10 | others may be irrelevant. | ||
8 | 11 | ||
9 | There are two sets of points below. The first set concerns technical | 12 | There are two sets of points below. The first set concerns technical |
10 | changes. The second set concerns coding policy. Both are mandatory. | 13 | changes. The second set concerns coding policy. Both are mandatory. |
@@ -22,16 +25,20 @@ Technical changes: | |||
22 | #include <linux/module.h> | 25 | #include <linux/module.h> |
23 | #include <linux/init.h> | 26 | #include <linux/init.h> |
24 | #include <linux/slab.h> | 27 | #include <linux/slab.h> |
28 | #include <linux/jiffies.h> | ||
25 | #include <linux/i2c.h> | 29 | #include <linux/i2c.h> |
30 | #include <linux/i2c-isa.h> /* for ISA drivers */ | ||
26 | #include <linux/hwmon.h> /* for hardware monitoring drivers */ | 31 | #include <linux/hwmon.h> /* for hardware monitoring drivers */ |
27 | #include <linux/hwmon-sysfs.h> | 32 | #include <linux/hwmon-sysfs.h> |
28 | #include <linux/hwmon-vid.h> /* if you need VRM support */ | 33 | #include <linux/hwmon-vid.h> /* if you need VRM support */ |
34 | #include <linux/err.h> /* for class registration */ | ||
29 | #include <asm/io.h> /* if you have I/O operations */ | 35 | #include <asm/io.h> /* if you have I/O operations */ |
30 | Please respect this inclusion order. Some extra headers may be | 36 | Please respect this inclusion order. Some extra headers may be |
31 | required for a given driver (e.g. "lm75.h"). | 37 | required for a given driver (e.g. "lm75.h"). |
32 | 38 | ||
33 | * [Addresses] SENSORS_I2C_END becomes I2C_CLIENT_END, ISA addresses | 39 | * [Addresses] SENSORS_I2C_END becomes I2C_CLIENT_END, ISA addresses |
34 | are no more handled by the i2c core. | 40 | are no more handled by the i2c core. Address ranges are no more |
41 | supported either, define each individual address separately. | ||
35 | SENSORS_INSMOD_<n> becomes I2C_CLIENT_INSMOD_<n>. | 42 | SENSORS_INSMOD_<n> becomes I2C_CLIENT_INSMOD_<n>. |
36 | 43 | ||
37 | * [Client data] Get rid of sysctl_id. Try using standard names for | 44 | * [Client data] Get rid of sysctl_id. Try using standard names for |
@@ -48,23 +55,23 @@ Technical changes: | |||
48 | int kind); | 55 | int kind); |
49 | static void lm75_init_client(struct i2c_client *client); | 56 | static void lm75_init_client(struct i2c_client *client); |
50 | static int lm75_detach_client(struct i2c_client *client); | 57 | static int lm75_detach_client(struct i2c_client *client); |
51 | static void lm75_update_client(struct i2c_client *client); | 58 | static struct lm75_data *lm75_update_device(struct device *dev); |
52 | 59 | ||
53 | * [Sysctl] All sysctl stuff is of course gone (defines, ctl_table | 60 | * [Sysctl] All sysctl stuff is of course gone (defines, ctl_table |
54 | and functions). Instead, you have to define show and set functions for | 61 | and functions). Instead, you have to define show and set functions for |
55 | each sysfs file. Only define set for writable values. Take a look at an | 62 | each sysfs file. Only define set for writable values. Take a look at an |
56 | existing 2.6 driver for details (lm78 for example). Don't forget | 63 | existing 2.6 driver for details (it87 for example). Don't forget |
57 | to define the attributes for each file (this is that step that | 64 | to define the attributes for each file (this is that step that |
58 | links callback functions). Use the file names specified in | 65 | links callback functions). Use the file names specified in |
59 | Documentation/i2c/sysfs-interface for the individual files. Also | 66 | Documentation/hwmon/sysfs-interface for the individual files. Also |
60 | convert the units these files read and write to the specified ones. | 67 | convert the units these files read and write to the specified ones. |
61 | If you need to add a new type of file, please discuss it on the | 68 | If you need to add a new type of file, please discuss it on the |
62 | sensors mailing list <lm-sensors@lm-sensors.org> by providing a | 69 | sensors mailing list <lm-sensors@lm-sensors.org> by providing a |
63 | patch to the Documentation/i2c/sysfs-interface file. | 70 | patch to the Documentation/hwmon/sysfs-interface file. |
64 | 71 | ||
65 | * [Attach] For I2C drivers, the attach function should make sure | 72 | * [Attach] For I2C drivers, the attach function should make sure |
66 | that the adapter's class has I2C_CLASS_HWMON, using the | 73 | that the adapter's class has I2C_CLASS_HWMON (or whatever class is |
67 | following construct: | 74 | suitable for your driver), using the following construct: |
68 | if (!(adapter->class & I2C_CLASS_HWMON)) | 75 | if (!(adapter->class & I2C_CLASS_HWMON)) |
69 | return 0; | 76 | return 0; |
70 | ISA-only drivers of course don't need this. | 77 | ISA-only drivers of course don't need this. |
@@ -72,63 +79,72 @@ Technical changes: | |||
72 | 79 | ||
73 | * [Detect] As mentioned earlier, the flags parameter is gone. | 80 | * [Detect] As mentioned earlier, the flags parameter is gone. |
74 | The type_name and client_name strings are replaced by a single | 81 | The type_name and client_name strings are replaced by a single |
75 | name string, which will be filled with a lowercase, short string | 82 | name string, which will be filled with a lowercase, short string. |
76 | (typically the driver name, e.g. "lm75"). | ||
77 | In i2c-only drivers, drop the i2c_is_isa_adapter check, it's | 83 | In i2c-only drivers, drop the i2c_is_isa_adapter check, it's |
78 | useless. Same for isa-only drivers, as the test would always be | 84 | useless. Same for isa-only drivers, as the test would always be |
79 | true. Only hybrid drivers (which are quite rare) still need it. | 85 | true. Only hybrid drivers (which are quite rare) still need it. |
80 | The errorN labels are reduced to the number needed. If that number | 86 | The labels used for error paths are reduced to the number needed. |
81 | is 2 (i2c-only drivers), it is advised that the labels are named | 87 | It is advised that the labels are given descriptive names such as |
82 | exit and exit_free. For i2c+isa drivers, labels should be named | 88 | exit and exit_free. Don't forget to properly set err before |
83 | ERROR0, ERROR1 and ERROR2. Don't forget to properly set err before | ||
84 | jumping to error labels. By the way, labels should be left-aligned. | 89 | jumping to error labels. By the way, labels should be left-aligned. |
85 | Use kzalloc instead of kmalloc. | 90 | Use kzalloc instead of kmalloc. |
86 | Use i2c_set_clientdata to set the client data (as opposed to | 91 | Use i2c_set_clientdata to set the client data (as opposed to |
87 | a direct access to client->data). | 92 | a direct access to client->data). |
88 | Use strlcpy instead of strcpy to copy the client name. | 93 | Use strlcpy instead of strcpy or snprintf to copy the client name. |
89 | Replace the sysctl directory registration by calls to | 94 | Replace the sysctl directory registration by calls to |
90 | device_create_file. Move the driver initialization before any | 95 | device_create_file. Move the driver initialization before any |
91 | sysfs file creation. | 96 | sysfs file creation. |
97 | Register the client with the hwmon class (using hwmon_device_register) | ||
98 | if applicable. | ||
92 | Drop client->id. | 99 | Drop client->id. |
93 | Drop any 24RF08 corruption prevention you find, as this is now done | 100 | Drop any 24RF08 corruption prevention you find, as this is now done |
94 | at the i2c-core level, and doing it twice voids it. | 101 | at the i2c-core level, and doing it twice voids it. |
102 | Don't add I2C_CLIENT_ALLOW_USE to client->flags, it's the default now. | ||
95 | 103 | ||
96 | * [Init] Limits must not be set by the driver (can be done later in | 104 | * [Init] Limits must not be set by the driver (can be done later in |
97 | user-space). Chip should not be reset by default (although a module | 105 | user-space). Chip should not be reset by default (although a module |
98 | parameter may be used to force is), and initialization should be | 106 | parameter may be used to force it), and initialization should be |
99 | limited to the strictly necessary steps. | 107 | limited to the strictly necessary steps. |
100 | 108 | ||
101 | * [Detach] Get rid of data, remove the call to | 109 | * [Detach] Remove the call to i2c_deregister_entry. Do not log an |
102 | i2c_deregister_entry. Do not log an error message if | 110 | error message if i2c_detach_client fails, as i2c-core will now do |
103 | i2c_detach_client fails, as i2c-core will now do it for you. | 111 | it for you. |
104 | 112 | Unregister from the hwmon class if applicable. | |
105 | * [Update] Don't access client->data directly, use | 113 | |
106 | i2c_get_clientdata(client) instead. | 114 | * [Update] The function prototype changed, it is now |
107 | 115 | passed a device structure, which you have to convert to a client | |
108 | * [Interface] Init function should not print anything. Make sure | 116 | using to_i2c_client(dev). The update function should return a |
109 | there is a MODULE_LICENSE() line, at the bottom of the file | 117 | pointer to the client data. |
110 | (after MODULE_AUTHOR() and MODULE_DESCRIPTION(), in this order). | 118 | Don't access client->data directly, use i2c_get_clientdata(client) |
119 | instead. | ||
120 | Use time_after() instead of direct jiffies comparison. | ||
121 | |||
122 | * [Interface] Make sure there is a MODULE_LICENSE() line, at the bottom | ||
123 | of the file (after MODULE_AUTHOR() and MODULE_DESCRIPTION(), in this | ||
124 | order). | ||
125 | |||
126 | * [Driver] The flags field of the i2c_driver structure is gone. | ||
127 | I2C_DF_NOTIFY is now the default behavior. | ||
128 | The i2c_driver structure has a driver member, which is itself a | ||
129 | structure, whose name member should be initialized to a driver name | ||
130 | string. i2c_driver itself has no name member anymore. | ||
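
The [Update] item above recommends time_after() over a direct jiffies comparison. The reason is counter wraparound, which the following userspace sketch tries to illustrate. sketch_time_after is a hypothetical stand-in that mirrors the idea of the kernel macro using a 32-bit counter. It is not the kernel's definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Wraparound-safe "a is after b" for a free-running 32-bit tick
 * counter: the unsigned difference is reinterpreted as signed.
 */
static bool sketch_time_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* Self-check for the case where the counter wraps past zero. */
static void sketch_self_check(void)
{
	uint32_t deadline = 0xfffffff0u;
	uint32_t now = deadline + 0x20u;	/* wraps around to 0x10 */

	assert(sketch_time_after(now, deadline));	/* correct answer */
	assert(!(now > deadline));			/* naive test is wrong */
}
```

A driver that caches register values would typically compare the current jiffies against "last update plus interval" in this way, so that the cache is still refreshed correctly after the counter has wrapped.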
111 | 131 | ||
112 | Coding policy: | 132 | Coding policy: |
113 | 133 | ||
114 | * [Copyright] Use (C), not (c), for copyright. | 134 | * [Copyright] Use (C), not (c), for copyright. |
115 | 135 | ||
116 | * [Debug/log] Get rid of #ifdef DEBUG/#endif constructs whenever you | 136 | * [Debug/log] Get rid of #ifdef DEBUG/#endif constructs whenever you |
117 | can. Calls to printk/pr_debug for debugging purposes are replaced | 137 | can. Calls to printk for debugging purposes are replaced by calls to |
118 | by calls to dev_dbg. Here is an example on how to call it (taken | 138 | dev_dbg where possible, else to pr_debug. Here is an example of how |
119 | from lm75_detect): | 139 | to call it (taken from lm75_detect): |
120 | dev_dbg(&client->dev, "Starting lm75 update\n"); | 140 | dev_dbg(&client->dev, "Starting lm75 update\n"); |
121 | Replace other printk calls with the dev_info, dev_err or dev_warn | 141 | Replace other printk calls with the dev_info, dev_err or dev_warn |
122 | function, as appropriate. | 142 | function, as appropriate. |
123 | 143 | ||
124 | * [Constants] Constants defines (registers, conversions, initial | 144 | * [Constants] Constants defines (registers, conversions) should be |
125 | values) should be aligned. This greatly improves readability. | 145 | aligned. This greatly improves readability. |
126 | Same goes for variables declarations. Alignments are achieved by the | 146 | Alignments are achieved by the means of tabs, not spaces. Remember |
127 | means of tabs, not spaces. Remember that tabs are set to 8 in the | 147 | that tabs are set to 8 in the Linux kernel code. |
128 | Linux kernel code. | ||
129 | |||
130 | * [Structure definition] The name field should be standardized. All | ||
131 | lowercase and as simple as the driver name itself (e.g. "lm75"). | ||
132 | 148 | ||
133 | * [Layout] Avoid extra empty lines between comments and what they | 149 | * [Layout] Avoid extra empty lines between comments and what they |
134 | comment. Respect the coding style (see Documentation/CodingStyle), | 150 | comment. Respect the coding style (see Documentation/CodingStyle), |
diff --git a/Documentation/i2c/writing-clients b/Documentation/i2c/writing-clients index d19993cc0604..3a057c8e5507 100644 --- a/Documentation/i2c/writing-clients +++ b/Documentation/i2c/writing-clients | |||
@@ -25,9 +25,9 @@ routines, a client structure specific information like the actual I2C | |||
25 | address. | 25 | address. |
26 | 26 | ||
27 | static struct i2c_driver foo_driver = { | 27 | static struct i2c_driver foo_driver = { |
28 | .owner = THIS_MODULE, | 28 | .driver = { |
29 | .name = "Foo version 2.3 driver", | 29 | .name = "foo", |
30 | .flags = I2C_DF_NOTIFY, | 30 | }, |
31 | .attach_adapter = &foo_attach_adapter, | 31 | .attach_adapter = &foo_attach_adapter, |
32 | .detach_client = &foo_detach_client, | 32 | .detach_client = &foo_detach_client, |
33 | .command = &foo_command /* may be NULL */ | 33 | .command = &foo_command /* may be NULL */ |
@@ -36,10 +36,6 @@ static struct i2c_driver foo_driver = { | |||
36 | The name field must match the driver name, including the case. It must not | 36 | The name field must match the driver name, including the case. It must not |
37 | contain spaces, and may be up to 31 characters long. | 37 | contain spaces, and may be up to 31 characters long. |
38 | 38 | ||
39 | Don't worry about the flags field; just put I2C_DF_NOTIFY into it. This | ||
40 | means that your driver will be notified when new adapters are found. | ||
41 | This is almost always what you want. | ||
42 | |||
43 | All other fields are for call-back functions which will be explained | 39 | All other fields are for call-back functions which will be explained |
44 | below. | 40 | below. |
45 | 41 | ||
@@ -496,17 +492,13 @@ Note that some functions are marked by `__init', and some data structures | |||
496 | by `__init_data'. Those functions and structures can be removed after | 492 | by `__init_data'. Those functions and structures can be removed after |
497 | kernel booting (or module loading) is completed. | 493 | kernel booting (or module loading) is completed. |
498 | 494 | ||
495 | |||
499 | Command function | 496 | Command function |
500 | ================ | 497 | ================ |
501 | 498 | ||
502 | A generic ioctl-like function call back is supported. You will seldom | 499 | A generic ioctl-like function call back is supported. You will seldom |
503 | need this. You may even set it to NULL. | 500 | need this, and its use is deprecated anyway, so newer designs should not |
504 | 501 | use it. Set it to NULL. | |
505 | /* No commands defined */ | ||
506 | int foo_command(struct i2c_client *client, unsigned int cmd, void *arg) | ||
507 | { | ||
508 | return 0; | ||
509 | } | ||
510 | 502 | ||
511 | 503 | ||
512 | Sending and receiving | 504 | Sending and receiving |
diff --git a/Documentation/i2o/ioctl b/Documentation/i2o/ioctl index 3e174978997d..1e77fac4e120 100644 --- a/Documentation/i2o/ioctl +++ b/Documentation/i2o/ioctl | |||
@@ -185,7 +185,7 @@ VII. Getting Parameters | |||
185 | ENOMEM Kernel memory allocation error | 185 | ENOMEM Kernel memory allocation error |
186 | 186 | ||
187 | A return value of 0 does not mean that the value was actually | 187 | A return value of 0 does not mean that the value was actually |
188 | properly retreived. The user should check the result list | 188 | properly retrieved. The user should check the result list |
189 | to determine the specific status of the transaction. | 189 | to determine the specific status of the transaction. |
190 | 190 | ||
191 | VIII. Downloading Software | 191 | VIII. Downloading Software |
diff --git a/Documentation/input/appletouch.txt b/Documentation/input/appletouch.txt index b48d11d0326d..4f7c633a76d2 100644 --- a/Documentation/input/appletouch.txt +++ b/Documentation/input/appletouch.txt | |||
@@ -3,7 +3,7 @@ Apple Touchpad Driver (appletouch) | |||
3 | Copyright (C) 2005 Stelian Pop <stelian@popies.net> | 3 | Copyright (C) 2005 Stelian Pop <stelian@popies.net> |
4 | 4 | ||
5 | appletouch is a Linux kernel driver for the USB touchpad found on post | 5 | appletouch is a Linux kernel driver for the USB touchpad found on post |
6 | February 2005 Apple Alu Powerbooks. | 6 | February 2005 and October 2005 Apple Aluminium Powerbooks. |
7 | 7 | ||
8 | This driver is derived from Johannes Berg's appletrackpad driver[1], but it has | 8 | This driver is derived from Johannes Berg's appletrackpad driver[1], but it has |
9 | been improved in some areas: | 9 | been improved in some areas: |
@@ -13,7 +13,8 @@ been improved in some areas: | |||
13 | 13 | ||
14 | Credits go to Johannes Berg for reverse-engineering the touchpad protocol, | 14 | Credits go to Johannes Berg for reverse-engineering the touchpad protocol, |
15 | Frank Arnold for further improvements, and Alex Harper for some additional | 15 | Frank Arnold for further improvements, and Alex Harper for some additional |
16 | information about the inner workings of the touchpad sensors. | 16 | information about the inner workings of the touchpad sensors. Michael |
17 | Hanselmann added support for the October 2005 models. | ||
17 | 18 | ||
18 | Usage: | 19 | Usage: |
19 | ------ | 20 | ------ |
diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt index 5f08f9ce6046..212cf3c21abf 100644 --- a/Documentation/kdump/kdump.txt +++ b/Documentation/kdump/kdump.txt | |||
@@ -4,10 +4,10 @@ Documentation for kdump - the kexec-based crash dumping solution | |||
4 | DESIGN | 4 | DESIGN |
5 | ====== | 5 | ====== |
6 | 6 | ||
7 | Kdump uses kexec to reboot to a second kernel whenever a dump needs to be taken. | 7 | Kdump uses kexec to reboot to a second kernel whenever a dump needs to be |
8 | This second kernel is booted with very little memory. The first kernel reserves | 8 | taken. This second kernel is booted with very little memory. The first kernel |
9 | the section of memory that the second kernel uses. This ensures that on-going | 9 | reserves the section of memory that the second kernel uses. This ensures that |
10 | DMA from the first kernel does not corrupt the second kernel. | 10 | on-going DMA from the first kernel does not corrupt the second kernel. |
11 | 11 | ||
12 | All the necessary information about Core image is encoded in ELF format and | 12 | All the necessary information about Core image is encoded in ELF format and |
13 | stored in reserved area of memory before crash. Physical address of start of | 13 | stored in reserved area of memory before crash. Physical address of start of |
@@ -35,77 +35,82 @@ In the second kernel, "old memory" can be accessed in two ways. | |||
35 | SETUP | 35 | SETUP |
36 | ===== | 36 | ===== |
37 | 37 | ||
38 | 1) Download http://www.xmission.com/~ebiederm/files/kexec/kexec-tools-1.101.tar.gz | 38 | 1) Download the upstream kexec-tools userspace package from |
39 | and apply http://lse.sourceforge.net/kdump/patches/kexec-tools-1.101-kdump.patch | 39 | http://www.xmission.com/~ebiederm/files/kexec/kexec-tools-1.101.tar.gz. |
40 | and after that build the source. | ||
41 | 40 | ||
42 | 2) Download and build the appropriate (2.6.13-rc1 onwards) vanilla kernel. | 41 | Apply the latest consolidated kdump patch on top of kexec-tools-1.101 |
42 | from http://lse.sourceforge.net/kdump/. This arrangement has been made | ||
43 | till all the userspace patches supporting kdump are integrated with | ||
44 | upstream kexec-tools userspace. | ||
43 | 45 | ||
46 | 2) Download and build the appropriate (2.6.13-rc1 onwards) vanilla kernels. | ||
44 | Two kernels need to be built in order to get this feature working. | 47 | Two kernels need to be built in order to get this feature working. |
48 | Following are the steps to properly configure the two kernels specific | ||
49 | to kexec and kdump features: | ||
45 | 50 | ||
46 | A) First kernel: | 51 | A) First kernel or regular kernel: |
52 | ---------------------------------- | ||
47 | a) Enable "kexec system call" feature (in Processor type and features). | 53 | a) Enable "kexec system call" feature (in Processor type and features). |
48 | CONFIG_KEXEC=y | 54 | CONFIG_KEXEC=y |
49 | b) This kernel's physical load address should be the default value of | 55 | b) Enable "sysfs file system support" (in Pseudo filesystems). |
50 | 0x100000 (0x100000, 1 MB) (in Processor type and features). | 56 | CONFIG_SYSFS=y |
51 | CONFIG_PHYSICAL_START=0x100000 | 57 | c) make |
52 | c) Enable "sysfs file system support" (in Pseudo filesystems). | ||
53 | CONFIG_SYSFS=y | ||
54 | d) Boot into first kernel with the command line parameter "crashkernel=Y@X". | 58 | d) Boot into first kernel with the command line parameter "crashkernel=Y@X". |
55 | Use appropriate values for X and Y. Y denotes how much memory to reserve | 59 | Use appropriate values for X and Y. Y denotes how much memory to reserve |
56 | for the second kernel, and X denotes at what physical address the reserved | 60 | for the second kernel, and X denotes at what physical address the |
57 | memory section starts. For example: "crashkernel=64M@16M". | 61 | reserved memory section starts. For example: "crashkernel=64M@16M". |
58 | 62 | ||
59 | B) Second kernel: | 63 | |
60 | a) Enable "kernel crash dumps" feature (in Processor type and features). | 64 | B) Second kernel or dump capture kernel: |
61 | CONFIG_CRASH_DUMP=y | 65 | --------------------------------------- |
62 | b) Specify a suitable value for "Physical address where the kernel is | 66 | a) For i386 architecture enable Highmem support |
63 | loaded" (in Processor type and features). Typically this value | 67 | CONFIG_HIGHMEM=y |
64 | should be same as X (See option d) above, e.g., 16 MB or 0x1000000. | 68 | b) Enable "kernel crash dumps" feature (under "Processor type and features") |
65 | CONFIG_PHYSICAL_START=0x1000000 | 69 | CONFIG_CRASH_DUMP=y |
66 | c) Enable "/proc/vmcore support" (Optional, in Pseudo filesystems). | 70 | c) Specify a suitable value for "Physical address where the kernel is |
67 | CONFIG_PROC_VMCORE=y | 71 | loaded" (under "Processor type and features"). By default this value |
68 | d) Disable SMP support and build a UP kernel (Until it is fixed). | 72 | is 0x1000000 (16MB) and it should be same as X (See option d above), |
69 | CONFIG_SMP=n | 73 | e.g., 16 MB or 0x1000000. |
70 | e) Enable "Local APIC support on uniprocessors". | 74 | CONFIG_PHYSICAL_START=0x1000000 |
71 | CONFIG_X86_UP_APIC=y | 75 | d) Enable "/proc/vmcore support" (Optional, under "Pseudo filesystems"). |
72 | f) Enable "IO-APIC support on uniprocessors" | 76 | CONFIG_PROC_VMCORE=y |
73 | CONFIG_X86_UP_IOAPIC=y | 77 | |
74 | 78 | 3) After booting to regular kernel or first kernel, load the second kernel | |
75 | Note: i) Options a) and b) depend upon "Configure standard kernel features | 79 | using the following command: |
76 | (for small systems)" (under General setup). | ||
77 | ii) Option a) also depends on CONFIG_HIGHMEM (under Processor | ||
78 | type and features). | ||
79 | iii) Both option a) and b) are under "Processor type and features". | ||
80 | |||
81 | 3) Boot into the first kernel. You are now ready to try out kexec-based crash | ||
82 | dumps. | ||
83 | |||
84 | 4) Load the second kernel to be booted using: | ||
85 | 80 | ||
86 | kexec -p <second-kernel> --args-linux --elf32-core-headers | 81 | kexec -p <second-kernel> --args-linux --elf32-core-headers |
87 | --append="root=<root-dev> init 1 irqpoll" | 82 | --append="root=<root-dev> init 1 irqpoll maxcpus=1" |
88 | 83 | ||
89 | Note: i) <second-kernel> has to be a vmlinux image. bzImage will not work, | 84 | Notes: |
90 | as of now. | 85 | ====== |
91 | ii) By default ELF headers are stored in ELF64 format. Option | 86 | i) <second-kernel> has to be a vmlinux image, i.e. an uncompressed ELF image. |
92 | --elf32-core-headers forces generation of ELF32 headers. gdb can | 87 | bzImage will not work, as of now. |
93 | not open ELF64 headers on 32 bit systems. So creating ELF32 | 88 | ii) --args-linux has to be specified because, when kexec is loading an ELF image, |
94 | headers can come handy for users who have got non-PAE systems and | 89 | it needs to know that the arguments supplied are of linux type. |
95 | hence have memory less than 4GB. | 90 | iii) By default ELF headers are stored in ELF64 format to support systems |
96 | iii) Specify "irqpoll" as command line parameter. This reduces driver | 91 | with more than 4GB memory. Option --elf32-core-headers forces generation |
97 | initialization failures in second kernel due to shared interrupts. | 92 | of ELF32 headers. This option exists because, as of now, gdb can |
98 | iv) <root-dev> needs to be specified in a format corresponding to | 93 | not open a vmcore file with ELF64 headers on 32 bit systems. So ELF32 |
99 | the root device name in the output of mount command. | 94 | headers can be used if one has non-PAE systems and hence memory less |
100 | v) If you have built the drivers required to mount root file | 95 | than 4GB. |
101 | system as modules in <second-kernel>, then, specify | 96 | iv) Specify "irqpoll" as command line parameter. This reduces driver |
102 | --initrd=<initrd-for-second-kernel>. | 97 | initialization failures in second kernel due to shared interrupts. |
103 | 98 | v) <root-dev> needs to be specified in a format corresponding to the root | |
104 | 5) System reboots into the second kernel when a panic occurs. A module can be | 99 | device name in the output of mount command. |
105 | written to force the panic or "ALT-SysRq-c" can be used initiate a crash | 100 | vi) If you have built the drivers required to mount root file system as |
106 | dump for testing purposes. | 101 | modules in <second-kernel>, then, specify |
107 | 102 | --initrd=<initrd-for-second-kernel>. | |
108 | 6) Write out the dump file using | 103 | vii) Specify maxcpus=1 because, if a panic happens on a non-boot cpu during |
104 | the first kernel run, the second kernel doesn't seem to boot up all the cpus. | ||
105 | The other option is to always build the second kernel without SMP | ||
106 | support, i.e. CONFIG_SMP=n | ||
107 | |||
108 | 4) After successfully loading the second kernel as above, if a panic occurs | ||
109 | system reboots into the second kernel. A module can be written to force | ||
110 | the panic or "ALT-SysRq-c" can be used to initiate a crash dump for testing | ||
111 | purposes. | ||
112 | |||
113 | 5) Once the second kernel has booted, write out the dump file using | ||
109 | 114 | ||
110 | cp /proc/vmcore <dump-file> | 115 | cp /proc/vmcore <dump-file> |
111 | 116 | ||
@@ -119,9 +124,9 @@ SETUP | |||
119 | 124 | ||
120 | Entire memory: dd if=/dev/oldmem of=oldmem.001 | 125 | Entire memory: dd if=/dev/oldmem of=oldmem.001 |
121 | 126 | ||
127 | |||
122 | ANALYSIS | 128 | ANALYSIS |
123 | ======== | 129 | ======== |
124 | |||
125 | Limited analysis can be done using gdb on the dump file copied out of | 130 | Limited analysis can be done using gdb on the dump file copied out of |
126 | /proc/vmcore. Use vmlinux built with -g and run | 131 | /proc/vmcore. Use vmlinux built with -g and run |
127 | 132 | ||
@@ -132,15 +137,19 @@ work fine. | |||
132 | 137 | ||
133 | Note: gdb cannot analyse core files generated in ELF64 format for i386. | 138 | Note: gdb cannot analyse core files generated in ELF64 format for i386. |
134 | 139 | ||
140 | Latest "crash" (crash-4.0-2.18) as available on Dave Anderson's site | ||
141 | http://people.redhat.com/~anderson/ works well with kdump format. | ||
142 | |||
143 | |||
135 | TODO | 144 | TODO |
136 | ==== | 145 | ==== |
137 | |||
138 | 1) Provide a kernel pages filtering mechanism so that core file size is not | 146 | 1) Provide a kernel pages filtering mechanism so that core file size is not |
139 | insane on systems having huge memory banks. | 147 | insane on systems having huge memory banks. |
140 | 2) Modify "crash" tool to make it recognize this dump. | 148 | 2) Relocatable kernel can help in maintaining multiple kernels for crashdump |
149 | and same kernel as the first kernel can be used to capture the dump. | ||
150 | |||
141 | 151 | ||
142 | CONTACT | 152 | CONTACT |
143 | ======= | 153 | ======= |
144 | |||
145 | Vivek Goyal (vgoyal@in.ibm.com) | 154 | Vivek Goyal (vgoyal@in.ibm.com) |
146 | Maneesh Soni (maneesh@in.ibm.com) | 155 | Maneesh Soni (maneesh@in.ibm.com) |
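
The "crashkernel=Y@X" parameter described in the kdump setup steps above can be illustrated with a small userspace sketch. This is not the kernel's own parser; the function names parse_size() and parse_crashkernel() are hypothetical and exist only to show how Y (the amount of memory to reserve) and X (the physical start address) relate to the CONFIG_PHYSICAL_START value of the second kernel.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: parse a number such as "64M" or "0x1000000".
 * The K, M and G suffixes scale by powers of 1024. *end is advanced
 * past everything that was consumed.
 */
unsigned long long parse_size(const char *s, char **end)
{
	unsigned long long v = strtoull(s, end, 0);

	switch (**end) {
	case 'G': case 'g':
		v <<= 10;
		/* fall through */
	case 'M': case 'm':
		v <<= 10;
		/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*end)++;
		break;
	default:
		break;
	}
	return v;
}

/*
 * Hypothetical helper: split "Y@X" into size Y and base address X.
 * Returns 0 on success and -1 if the string is not of that form.
 */
int parse_crashkernel(const char *arg, unsigned long long *size,
		      unsigned long long *base)
{
	char *p;
	const char *q;

	*size = parse_size(arg, &p);
	if (p == arg || *p != '@')
		return -1;

	q = p + 1;
	*base = parse_size(q, &p);
	if (p == q || *p != '\0')
		return -1;

	return 0;
}
```

With the example value from the text, "64M@16M", the base works out to 0x1000000, which is the same number the second kernel uses for CONFIG_PHYSICAL_START.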
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index 61a56b100c62..dd0bfc291a68 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt | |||
@@ -475,10 +475,11 @@ running once the system is up. | |||
475 | See Documentation/block/as-iosched.txt and | 475 | See Documentation/block/as-iosched.txt and |
476 | Documentation/block/deadline-iosched.txt for details. | 476 | Documentation/block/deadline-iosched.txt for details. |
477 | 477 | ||
478 | elfcorehdr= [IA-32] | 478 | elfcorehdr= [IA-32, X86_64] |
479 | Specifies physical address of start of kernel core | 479 | Specifies physical address of start of kernel core |
480 | image elf header. | 480 | image elf header. Generally kexec loader will |
481 | See Documentation/kdump.txt for details. | 481 | pass this option to capture kernel. |
482 | See Documentation/kdump/kdump.txt for details. | ||
482 | 483 | ||
483 | enforcing [SELINUX] Set initial enforcing status. | 484 | enforcing [SELINUX] Set initial enforcing status. |
484 | Format: {"0" | "1"} | 485 | Format: {"0" | "1"} |
@@ -832,7 +833,7 @@ running once the system is up. | |||
832 | mem=nopentium [BUGS=IA-32] Disable usage of 4MB pages for kernel | 833 | mem=nopentium [BUGS=IA-32] Disable usage of 4MB pages for kernel |
833 | memory. | 834 | memory. |
834 | 835 | ||
835 | memmap=exactmap [KNL,IA-32] Enable setting of an exact | 836 | memmap=exactmap [KNL,IA-32,X86_64] Enable setting of an exact |
836 | E820 memory map, as specified by the user. | 837 | E820 memory map, as specified by the user. |
837 | Such memmap=exactmap lines can be constructed based on | 838 | Such memmap=exactmap lines can be constructed based on |
838 | BIOS output or other requirements. See the memmap=nn@ss | 839 | BIOS output or other requirements. See the memmap=nn@ss |
@@ -910,6 +911,14 @@ running once the system is up. | |||
910 | nfsroot= [NFS] nfs root filesystem for disk-less boxes. | 911 | nfsroot= [NFS] nfs root filesystem for disk-less boxes. |
911 | See Documentation/nfsroot.txt. | 912 | See Documentation/nfsroot.txt. |
912 | 913 | ||
914 | nfs.callback_tcpport= | ||
915 | [NFS] set the TCP port on which the NFSv4 callback | ||
916 | channel should listen. | ||
917 | |||
918 | nfs.idmap_cache_timeout= | ||
919 | [NFS] set the maximum lifetime for idmapper cache | ||
920 | entries. | ||
921 | |||
913 | nmi_watchdog= [KNL,BUGS=IA-32] Debugging features for SMP kernels | 922 | nmi_watchdog= [KNL,BUGS=IA-32] Debugging features for SMP kernels |
914 | 923 | ||
915 | no387 [BUGS=IA-32] Tells the kernel to use the 387 maths | 924 | no387 [BUGS=IA-32] Tells the kernel to use the 387 maths |
@@ -990,6 +999,8 @@ running once the system is up. | |||
990 | 999 | ||
991 | nowb [ARM] | 1000 | nowb [ARM] |
992 | 1001 | ||
1002 | nr_uarts= [SERIAL] maximum number of UARTs to be registered. | ||
1003 | |||
993 | opl3= [HW,OSS] | 1004 | opl3= [HW,OSS] |
994 | Format: <io> | 1005 | Format: <io> |
995 | 1006 | ||
@@ -1168,6 +1179,10 @@ running once the system is up. | |||
1168 | Limit processor to maximum C-state | 1179 | Limit processor to maximum C-state |
1169 | max_cstate=9 overrides any DMI blacklist limit. | 1180 | max_cstate=9 overrides any DMI blacklist limit. |
1170 | 1181 | ||
1182 | processor.nocst [HW,ACPI] | ||
1183 | Ignore the _CST method to determine C-states, | ||
1184 | instead using the legacy FADT method | ||
1185 | |||
1171 | prompt_ramdisk= [RAM] List of RAM disks to prompt for floppy disk | 1186 | prompt_ramdisk= [RAM] List of RAM disks to prompt for floppy disk |
1172 | before loading. | 1187 | before loading. |
1173 | See Documentation/ramdisk.txt. | 1188 | See Documentation/ramdisk.txt. |
diff --git a/Documentation/keys-request-key.txt b/Documentation/keys-request-key.txt index 5f2b9c5edbb5..22488d791168 100644 --- a/Documentation/keys-request-key.txt +++ b/Documentation/keys-request-key.txt | |||
@@ -56,10 +56,12 @@ A request proceeds in the following manner: | |||
56 | (4) request_key() then forks and executes /sbin/request-key with a new session | 56 | (4) request_key() then forks and executes /sbin/request-key with a new session |
57 | keyring that contains a link to auth key V. | 57 | keyring that contains a link to auth key V. |
58 | 58 | ||
59 | (5) /sbin/request-key execs an appropriate program to perform the actual | 59 | (5) /sbin/request-key assumes the authority associated with key U. |
60 | |||
61 | (6) /sbin/request-key execs an appropriate program to perform the actual | ||
60 | instantiation. | 62 | instantiation. |
61 | 63 | ||
62 | (6) The program may want to access another key from A's context (say a | 64 | (7) The program may want to access another key from A's context (say a |
63 | Kerberos TGT key). It just requests the appropriate key, and the keyring | 65 | Kerberos TGT key). It just requests the appropriate key, and the keyring |
64 | search notes that the session keyring has auth key V in its bottom level. | 66 | search notes that the session keyring has auth key V in its bottom level. |
65 | 67 | ||
@@ -67,19 +69,19 @@ A request proceeds in the following manner: | |||
67 | UID, GID, groups and security info of process A as if it was process A, | 69 | UID, GID, groups and security info of process A as if it was process A, |
68 | and come up with key W. | 70 | and come up with key W. |
69 | 71 | ||
70 | (7) The program then does what it must to get the data with which to | 72 | (8) The program then does what it must to get the data with which to |
71 | instantiate key U, using key W as a reference (perhaps it contacts a | 73 | instantiate key U, using key W as a reference (perhaps it contacts a |
72 | Kerberos server using the TGT) and then instantiates key U. | 74 | Kerberos server using the TGT) and then instantiates key U. |
73 | 75 | ||
74 | (8) Upon instantiating key U, auth key V is automatically revoked so that it | 76 | (9) Upon instantiating key U, auth key V is automatically revoked so that it |
75 | may not be used again. | 77 | may not be used again. |
76 | 78 | ||
77 | (9) The program then exits 0 and request_key() deletes key V and returns key | 79 | (10) The program then exits 0 and request_key() deletes key V and returns key |
78 | U to the caller. | 80 | U to the caller. |
79 | 81 | ||
80 | This also extends further. If key W (step 5 above) didn't exist, key W would be | 82 | This also extends further. If key W (step 7 above) didn't exist, key W would be |
81 | created uninstantiated, another auth key (X) would be created [as per step 3] | 83 | created uninstantiated, another auth key (X) would be created (as per step 3) |
82 | and another copy of /sbin/request-key spawned [as per step 4]; but the context | 84 | and another copy of /sbin/request-key spawned (as per step 4); but the context |
83 | specified by auth key X will still be process A, as it was in auth key V. | 85 | specified by auth key X will still be process A, as it was in auth key V. |
84 | 86 | ||
85 | This is because process A's keyrings can't simply be attached to | 87 | This is because process A's keyrings can't simply be attached to |
@@ -138,8 +140,8 @@ until one succeeds: | |||
138 | 140 | ||
139 | (3) The process's session keyring is searched. | 141 | (3) The process's session keyring is searched. |
140 | 142 | ||
141 | (4) If the process has a request_key() authorisation key in its session | 143 | (4) If the process has assumed the authority associated with a request_key() |
142 | keyring then: | 144 | authorisation key then: |
143 | 145 | ||
144 | (a) If extant, the calling process's thread keyring is searched. | 146 | (a) If extant, the calling process's thread keyring is searched. |
145 | 147 | ||
diff --git a/Documentation/keys.txt b/Documentation/keys.txt index 31154882000a..aaa01b0e3ee9 100644 --- a/Documentation/keys.txt +++ b/Documentation/keys.txt | |||
@@ -308,6 +308,8 @@ process making the call: | |||
308 | KEY_SPEC_USER_KEYRING -4 UID-specific keyring | 308 | KEY_SPEC_USER_KEYRING -4 UID-specific keyring |
309 | KEY_SPEC_USER_SESSION_KEYRING -5 UID-session keyring | 309 | KEY_SPEC_USER_SESSION_KEYRING -5 UID-session keyring |
310 | KEY_SPEC_GROUP_KEYRING -6 GID-specific keyring | 310 | KEY_SPEC_GROUP_KEYRING -6 GID-specific keyring |
311 | KEY_SPEC_REQKEY_AUTH_KEY -7 assumed request_key() | ||
312 | authorisation key | ||
311 | 313 | ||
312 | 314 | ||
313 | The main syscalls are: | 315 | The main syscalls are: |
@@ -498,7 +500,11 @@ The keyctl syscall functions are: | |||
498 | keyring is full, error ENFILE will result. | 500 | keyring is full, error ENFILE will result. |
499 | 501 | ||
500 | The link procedure checks the nesting of the keyrings, returning ELOOP if | 502 | The link procedure checks the nesting of the keyrings, returning ELOOP if |
501 | it appears to deep or EDEADLK if the link would introduce a cycle. | 503 | it appears too deep or EDEADLK if the link would introduce a cycle. |
504 | |||
505 | Any links within the keyring to keys that match the new key in terms of | ||
506 | type and description will be discarded from the keyring as the new one is | ||
507 | added. | ||
502 | 508 | ||
503 | 509 | ||
504 | (*) Unlink a key or keyring from another keyring: | 510 | (*) Unlink a key or keyring from another keyring: |
@@ -628,6 +634,41 @@ The keyctl syscall functions are: | |||
628 | there is one, otherwise the user default session keyring. | 634 | there is one, otherwise the user default session keyring. |
629 | 635 | ||
630 | 636 | ||
637 | (*) Set the timeout on a key. | ||
638 | |||
639 | long keyctl(KEYCTL_SET_TIMEOUT, key_serial_t key, unsigned timeout); | ||
640 | |||
641 | This sets or clears the timeout on a key. The timeout can be 0 to clear | ||
642 | the timeout or a number of seconds to set the expiry time that far into | ||
643 | the future. | ||
644 | |||
645 | The process must have attribute modification access on a key to set its | ||
646 | timeout. Timeouts may not be set with this function on negative, revoked | ||
647 | or expired keys. | ||
648 | |||
649 | |||
650 | (*) Assume the authority granted to instantiate a key | ||
651 | |||
652 | long keyctl(KEYCTL_ASSUME_AUTHORITY, key_serial_t key); | ||
653 | |||
654 | This assumes or divests the authority required to instantiate the | ||
655 | specified key. Authority can only be assumed if the thread has the | ||
656 | authorisation key associated with the specified key in its keyrings | ||
657 | somewhere. | ||
658 | |||
659 | Once authority is assumed, searches for keys will also search the | ||
660 | requester's keyrings using the requester's security label, UID, GID and | ||
661 | groups. | ||
662 | |||
663 | If the requested authority is unavailable, error EPERM will be returned, | ||
664 | likewise if the authority has been revoked because the target key is | ||
665 | already instantiated. | ||
666 | |||
667 | If the specified key is 0, then any assumed authority will be divested. | ||
668 | |||
669 | The assumed authoritative key is inherited across fork and exec. | ||
670 | |||
671 | |||
631 | =============== | 672 | =============== |
632 | KERNEL SERVICES | 673 | KERNEL SERVICES |
633 | =============== | 674 | =============== |
@@ -860,24 +901,6 @@ The structure has a number of fields, some of which are mandatory: | |||
860 | It is safe to sleep in this method. | 901 | It is safe to sleep in this method. |
861 | 902 | ||
862 | 903 | ||
863 | (*) int (*duplicate)(struct key *key, const struct key *source); | ||
864 | |||
865 | If this type of key can be duplicated, then this method should be | ||
866 | provided. It is called to copy the payload attached to the source into the | ||
867 | new key. The data length on the new key will have been updated and the | ||
868 | quota adjusted already. | ||
869 | |||
870 | This method will be called with the source key's semaphore read-locked to | ||
871 | prevent its payload from being changed, thus RCU constraints need not be | ||
872 | applied to the source key. | ||
873 | |||
874 | This method does not have to lock the destination key in order to attach a | ||
875 | payload. The fact that KEY_FLAG_INSTANTIATED is not set in key->flags | ||
876 | prevents anything else from gaining access to the key. | ||
877 | |||
878 | It is safe to sleep in this method. | ||
879 | |||
880 | |||
881 | (*) int (*update)(struct key *key, const void *data, size_t datalen); | 904 | (*) int (*update)(struct key *key, const void *data, size_t datalen); |
882 | 905 | ||
883 | If this type of key can be updated, then this method should be provided. | 906 | If this type of key can be updated, then this method should be provided. |
diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt index 0541fe1de704..0ea5a0c6e827 100644 --- a/Documentation/kprobes.txt +++ b/Documentation/kprobes.txt | |||
@@ -411,7 +411,8 @@ int init_module(void) | |||
411 | printk("Couldn't find %s to plant kprobe\n", "do_fork"); | 411 | printk("Couldn't find %s to plant kprobe\n", "do_fork"); |
412 | return -1; | 412 | return -1; |
413 | } | 413 | } |
414 | if ((ret = register_kprobe(&kp) < 0)) { | 414 | ret = register_kprobe(&kp); |
415 | if (ret < 0) { | ||
415 | printk("register_kprobe failed, returned %d\n", ret); | 416 | printk("register_kprobe failed, returned %d\n", ret); |
416 | return -1; | 417 | return -1; |
417 | } | 418 | } |
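
The kprobes hunk above fixes an operator precedence problem: in "ret = register_kprobe(&kp) < 0" the comparison is evaluated first, so ret ends up holding 0 or 1 instead of the error code that the following printk reports. A minimal userspace sketch of the difference follows; fake_register() is a hypothetical stand-in for register_kprobe() that always fails with -22, and the two check functions are likewise invented for illustration.

```c
/* Hypothetical stand-in for register_kprobe(): always fails with -22. */
int fake_register(void)
{
	return -22;
}

/*
 * Old form: '<' binds tighter than '=', so ret receives the result of
 * the comparison (1 on failure), not the error code itself.
 */
int check_buggy(void)
{
	int ret;

	if ((ret = fake_register() < 0))
		return ret;
	return 0;
}

/* New form: store the return value first, then test it. */
int check_fixed(void)
{
	int ret;

	ret = fake_register();
	if (ret < 0)
		return ret;
	return 0;
}
```

Both forms detect the failure, but only the second one preserves the real return value for the error message.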
diff --git a/Documentation/locks.txt b/Documentation/locks.txt index ce1be79edfb8..e3b402ef33bd 100644 --- a/Documentation/locks.txt +++ b/Documentation/locks.txt | |||
@@ -65,20 +65,3 @@ The default is to disallow mandatory locking. The intention is that | |||
65 | mandatory locking only be enabled on a local filesystem as the specific need | 65 | mandatory locking only be enabled on a local filesystem as the specific need |
66 | arises. | 66 | arises. |
67 | 67 | ||
68 | Until an updated version of mount(8) becomes available you may have to apply | ||
69 | this patch to the mount sources (based on the version distributed with Rick | ||
70 | Faith's util-linux-2.5 package): | ||
71 | |||
72 | *** mount.c.orig Sat Jun 8 09:14:31 1996 | ||
73 | --- mount.c Sat Jun 8 09:13:02 1996 | ||
74 | *************** | ||
75 | *** 100,105 **** | ||
76 | --- 100,107 ---- | ||
77 | { "noauto", 0, MS_NOAUTO }, /* Can only be mounted explicitly */ | ||
78 | { "user", 0, MS_USER }, /* Allow ordinary user to mount */ | ||
79 | { "nouser", 1, MS_USER }, /* Forbid ordinary user to mount */ | ||
80 | + { "mand", 0, MS_MANDLOCK }, /* Allow mandatory locks on this FS */ | ||
81 | + { "nomand", 1, MS_MANDLOCK }, /* Forbid mandatory locks on this FS */ | ||
82 | /* add new options here */ | ||
83 | #ifdef MS_NOSUB | ||
84 | { "sub", 1, MS_NOSUB }, /* allow submounts */ | ||
diff --git a/Documentation/md.txt b/Documentation/md.txt index 23e6cce40f9c..03a13c462cf2 100644 --- a/Documentation/md.txt +++ b/Documentation/md.txt | |||
@@ -51,6 +51,30 @@ superblock can be autodetected and run at boot time. | |||
51 | The kernel parameter "raid=partitionable" (or "raid=part") means | 51 | The kernel parameter "raid=partitionable" (or "raid=part") means |
52 | that all auto-detected arrays are assembled as partitionable. | 52 | that all auto-detected arrays are assembled as partitionable. |
53 | 53 | ||
54 | Boot time assembly of degraded/dirty arrays | ||
55 | ------------------------------------------- | ||
56 | |||
57 | If a raid5 or raid6 array is both dirty and degraded, it could have | ||
58 | undetectable data corruption. This is because the fact that it is | ||
59 | 'dirty' means that the parity cannot be trusted, and the fact that it | ||
60 | is degraded means that some datablocks are missing and cannot reliably | ||
61 | be reconstructed (due to no parity). | ||
62 | |||
63 | For this reason, md will normally refuse to start such an array. This | ||
64 | requires the sysadmin to take action to explicitly start the array | ||
65 | despite possible corruption. This is normally done with | ||
66 | mdadm --assemble --force .... | ||
67 | |||
68 | This option is not really available if the array has the root | ||
69 | filesystem on it. In order to support booting from such an | ||
70 | array, md supports a module parameter "start_dirty_degraded" which, | ||
71 | when set to 1, bypasses the checks and allows dirty degraded | ||
72 | arrays to be started. | ||
73 | |||
74 | So, to boot with a root filesystem of a dirty degraded raid[56], use | ||
75 | |||
76 | md-mod.start_dirty_degraded=1 | ||
77 | |||
54 | 78 | ||
55 | Superblock formats | 79 | Superblock formats |
56 | ------------------ | 80 | ------------------ |
@@ -141,6 +165,70 @@ All md devices contain: | |||
141 | in a fully functional array. If this is not yet known, the file | 165 | in a fully functional array. If this is not yet known, the file |
142 | will be empty. If an array is being resized (not currently | 166 | will be empty. If an array is being resized (not currently |
143 | possible) this will contain the larger of the old and new sizes. | 167 | possible) this will contain the larger of the old and new sizes. |
168 | Some raid levels (RAID1) allow this value to be set while the | ||
169 | array is active. This will reconfigure the array. Otherwise | ||
170 | it can only be set while assembling an array. | ||
171 | |||
172 | chunk_size | ||
173 | This is the size in bytes for 'chunks' and is only relevant to | ||
174 | raid levels that involve striping (0,4,5,6,10). The address space | ||
175 | of the array is conceptually divided into chunks and consecutive | ||
176 | chunks are striped onto neighbouring devices. | ||
177 | The size should be at least PAGE_SIZE (4k) and should be a power | ||
178 | of 2. This can only be set while assembling an array. | ||
179 | |||
180 | component_size | ||
181 | For arrays with data redundancy (i.e. not raid0, linear, faulty, | ||
182 | multipath), all components must be the same size - or at least | ||
183 | there must be a size that they all provide space for. This is a key | ||
184 | part of the geometry of the array. It is measured in sectors | ||
185 | and can be read from here. Writing to this value may resize | ||
186 | the array if the personality supports it (raid1, raid5, raid6), | ||
187 | and if the component drives are large enough. | ||
188 | |||
189 | metadata_version | ||
190 | This indicates the format that is being used to record metadata | ||
191 | about the array. It can be 0.90 (traditional format), 1.0, 1.1, | ||
192 | 1.2 (newer format in varying locations) or "none" indicating that | ||
193 | the kernel isn't managing metadata at all. | ||
194 | |||
195 | level | ||
196 | The raid 'level' for this array. The name will often (but not | ||
197 | always) be the same as the name of the module that implements the | ||
198 | level. To be auto-loaded the module must have an alias | ||
199 | md-$LEVEL e.g. md-raid5 | ||
200 | This can be written only while the array is being assembled, not | ||
201 | after it is started. | ||
202 | |||
203 | new_dev | ||
204 | This file can be written but not read. The value written should | ||
205 | be a block device number as major:minor. e.g. 8:0 | ||
206 | This will cause that device to be attached to the array, if it is | ||
207 | available. It will then appear at md/dev-XXX (depending on the | ||
208 | name of the device) and further configuration is then possible. | ||
209 | |||
210 | sync_speed_min | ||
211 | sync_speed_max | ||
212 | These are similar to /proc/sys/dev/raid/speed_limit_{min,max} | ||
213 | however they only apply to the particular array. | ||
214 | If no value has been written to these, or if the word 'system' | ||
215 | is written, then the system-wide value is used. If a value, | ||
216 | in kibibytes-per-second is written, then it is used. | ||
217 | When the files are read, they show the currently active value | ||
218 | followed by "(local)" or "(system)" depending on whether it is | ||
219 | a locally set or system-wide value. | ||
220 | |||
221 | sync_completed | ||
222 | This shows the number of sectors that have been completed of | ||
223 | whatever the current sync_action is, followed by the number of | ||
224 | sectors in total that could need to be processed. The two | ||
225 | numbers are separated by a '/' thus effectively showing one | ||
226 | value, a fraction of the process that is complete. | ||
227 | |||
228 | sync_speed | ||
229 | This shows the current actual speed, in K/sec, of the current | ||
230 | sync_action. It is averaged over the last 30 seconds. | ||
231 | |||
144 | 232 | ||
145 | As component devices are added to an md array, they appear in the 'md' | 233 | As component devices are added to an md array, they appear in the 'md' |
146 | directory as new directories named | 234 | directory as new directories named |
@@ -167,6 +255,38 @@ Each directory contains: | |||
167 | of being recovered to | 255 | of being recovered to |
168 | This list may grow in future. | 256 | This list may grow in future. |
169 | 257 | ||
258 | errors | ||
259 | An approximate count of read errors that have been detected on | ||
260 | this device but have not caused the device to be evicted from | ||
261 | the array (either because they were corrected or because they | ||
262 | happened while the array was read-only). When using version-1 | ||
263 | metadata, this value persists across restarts of the array. | ||
264 | |||
265 | This value can be written while assembling an array thus | ||
266 | providing an ongoing count for arrays with metadata managed by | ||
267 | userspace. | ||
268 | |||
269 | slot | ||
270 | This gives the role that the device has in the array. It will | ||
271 | either be 'none' if the device is not active in the array | ||
272 | (i.e. is a spare or has failed) or an integer less than the | ||
273 | 'raid_disks' number for the array indicating which position | ||
274 | it currently fills. This can only be set while assembling an | ||
275 | array. A device for which this is set is assumed to be working. | ||
276 | |||
277 | offset | ||
278 | This gives the location in the device (in sectors from the | ||
279 | start) where data from the array will be stored. Any part of | ||
280 | the device before this offset is not touched, unless it is | ||
281 | used for storing metadata (Formats 1.1 and 1.2). | ||
282 | |||
283 | size | ||
284 | The amount of the device, after the offset, that can be used | ||
285 | for storage of data. This will normally be the same as the | ||
286 | component_size. This can be written while assembling an | ||
287 | array. If a value less than the current component_size is | ||
288 | written, component_size will be reduced to this value. | ||
289 | |||
170 | 290 | ||
171 | An active md device will also contain an entry for each active device | 291 | An active md device will also contain an entry for each active device |
172 | in the array. These are named | 292 | in the array. These are named |
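
Editor's sketch (not part of the patch): the sync_completed description
above says the file holds two sector counts separated by a '/'. A minimal
userspace C helper for turning such a string into a fraction might look
like the following. The function names are hypothetical, and the exact
whitespace around the '/' is an assumption, so the parser tolerates any
amount of it. Anything that does not parse as two numbers is rejected.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse a "done / total" string as described for md/sync_completed.
 * Returns 0 on success and -1 if the text is not two numbers
 * separated by a '/'. (Hypothetical helper, not a kernel API.)
 */
int parse_sync_completed(const char *text,
                         unsigned long long *done,
                         unsigned long long *total)
{
    assert(text && done && total);
    /* Spaces in the format match zero or more whitespace characters. */
    if (sscanf(text, "%llu / %llu", done, total) != 2)
        return -1;
    return 0;
}

/* Fraction of the current sync_action that is complete, in [0, 1]. */
double sync_fraction(unsigned long long done, unsigned long long total)
{
    return total ? (double)done / (double)total : 0.0;
}
```

A caller would read the file contents into a buffer first and then pass
the buffer to parse_sync_completed(); reading the sysfs file itself is
left out to keep the sketch short.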
diff --git a/Documentation/mutex-design.txt b/Documentation/mutex-design.txt new file mode 100644 index 000000000000..cbf79881a41c --- /dev/null +++ b/Documentation/mutex-design.txt | |||
@@ -0,0 +1,135 @@ | |||
1 | Generic Mutex Subsystem | ||
2 | |||
3 | started by Ingo Molnar <mingo@redhat.com> | ||
4 | |||
5 | "Why on earth do we need a new mutex subsystem, and what's wrong | ||
6 | with semaphores?" | ||
7 | |||
8 | firstly, there's nothing wrong with semaphores. But if the simpler | ||
9 | mutex semantics are sufficient for your code, then there are a couple | ||
10 | of advantages of mutexes: | ||
11 | |||
12 | - 'struct mutex' is smaller on most architectures: e.g. on x86, | ||
13 | 'struct semaphore' is 20 bytes, 'struct mutex' is 16 bytes. | ||
14 | A smaller structure size means less RAM footprint, and better | ||
15 | CPU-cache utilization. | ||
16 | |||
17 | - tighter code. On x86 i get the following .text sizes when | ||
18 | switching all mutex-alike semaphores in the kernel to the mutex | ||
19 | subsystem: | ||
20 | |||
21 | text data bss dec hex filename | ||
22 | 3280380 868188 396860 4545428 455b94 vmlinux-semaphore | ||
23 | 3255329 865296 396732 4517357 44eded vmlinux-mutex | ||
24 | |||
25 | that's 25051 bytes of code saved, or a 0.76% win - off the hottest | ||
26 | codepaths of the kernel. (The .data savings are 2892 bytes, or 0.33%) | ||
27 | Smaller code means better icache footprint, which is one of the | ||
28 | major optimization goals in the Linux kernel currently. | ||
29 | |||
30 | - the mutex subsystem is slightly faster and has better scalability for | ||
31 | contended workloads. On an 8-way x86 system, running a mutex-based | ||
32 | kernel and testing creat+unlink+close (of separate, per-task files) | ||
33 | in /tmp with 16 parallel tasks, the average number of ops/sec is: | ||
34 | |||
35 | Semaphores: Mutexes: | ||
36 | |||
37 | $ ./test-mutex V 16 10 $ ./test-mutex V 16 10 | ||
38 | 8 CPUs, running 16 tasks. 8 CPUs, running 16 tasks. | ||
39 | checking VFS performance. checking VFS performance. | ||
40 | avg loops/sec: 34713 avg loops/sec: 84153 | ||
41 | CPU utilization: 63% CPU utilization: 22% | ||
42 | |||
43 | i.e. in this workload, the mutex based kernel was 2.4 times faster | ||
44 | than the semaphore based kernel, _and_ it also had 2.8 times less CPU | ||
45 | utilization. (In terms of 'ops per CPU cycle', the semaphore kernel | ||
46 | performed 551 ops/sec per 1% of CPU time used, while the mutex kernel | ||
47 | performed 3825 ops/sec per 1% of CPU time used - it was 6.9 times | ||
48 | more efficient.) | ||
49 | |||
50 | the scalability difference is visible even on a 2-way P4 HT box: | ||
51 | |||
52 | Semaphores: Mutexes: | ||
53 | |||
54 | $ ./test-mutex V 16 10 $ ./test-mutex V 16 10 | ||
55 | 4 CPUs, running 16 tasks. 8 CPUs, running 16 tasks. | ||
56 | checking VFS performance. checking VFS performance. | ||
57 | avg loops/sec: 127659 avg loops/sec: 181082 | ||
58 | CPU utilization: 100% CPU utilization: 34% | ||
59 | |||
60 | (the straight performance advantage of mutexes is 41%, the per-cycle | ||
61 | efficiency of mutexes is 4.1 times better.) | ||
62 | |||
63 | - there are no fastpath tradeoffs, the mutex fastpath is just as tight | ||
64 | as the semaphore fastpath. On x86, the locking fastpath is 2 | ||
65 | instructions: | ||
66 | |||
67 | c0377ccb <mutex_lock>: | ||
68 | c0377ccb: f0 ff 08 lock decl (%eax) | ||
69 | c0377cce: 78 0e js c0377cde <.text.lock.mutex> | ||
70 | c0377cd0: c3 ret | ||
71 | |||
72 | the unlocking fastpath is equally tight: | ||
73 | |||
74 | c0377cd1 <mutex_unlock>: | ||
75 | c0377cd1: f0 ff 00 lock incl (%eax) | ||
76 | c0377cd4: 7e 0f jle c0377ce5 <.text.lock.mutex+0x7> | ||
77 | c0377cd6: c3 ret | ||
78 | |||
79 | - 'struct mutex' semantics are well-defined and are enforced if | ||
80 | CONFIG_DEBUG_MUTEXES is turned on. Semaphores on the other hand have | ||
81 | virtually no debugging code or instrumentation. The mutex subsystem | ||
82 | checks and enforces the following rules: | ||
83 | |||
84 | * - only one task can hold the mutex at a time | ||
85 | * - only the owner can unlock the mutex | ||
86 | * - multiple unlocks are not permitted | ||
87 | * - recursive locking is not permitted | ||
88 | * - a mutex object must be initialized via the API | ||
89 | * - a mutex object must not be initialized via memset or copying | ||
90 | * - task may not exit with mutex held | ||
91 | * - memory areas where held locks reside must not be freed | ||
92 | * - held mutexes must not be reinitialized | ||
93 | * - mutexes may not be used in irq contexts | ||
94 | |||
95 | furthermore, there are also convenience features in the debugging | ||
96 | code: | ||
97 | |||
98 | * - uses symbolic names of mutexes, whenever they are printed in debug output | ||
99 | * - point-of-acquire tracking, symbolic lookup of function names | ||
100 | * - list of all locks held in the system, printout of them | ||
101 | * - owner tracking | ||
102 | * - detects self-recursing locks and prints out all relevant info | ||
103 | * - detects multi-task circular deadlocks and prints out all affected | ||
104 | * locks and tasks (and only those tasks) | ||
105 | |||
106 | Disadvantages | ||
107 | ------------- | ||
108 | |||
109 | The stricter mutex API means you cannot use mutexes the same way you | ||
110 | can use semaphores: e.g. they cannot be used from an interrupt context, | ||
111 | nor can they be unlocked from a different context than that which acquired | ||
112 | them. [ I'm not aware of any other (e.g. performance) disadvantages from | ||
113 | using mutexes at the moment, please let me know if you find any. ] | ||
114 | |||
115 | Implementation of mutexes | ||
116 | ------------------------- | ||
117 | |||
118 | 'struct mutex' is the new mutex type, defined in include/linux/mutex.h | ||
119 | and implemented in kernel/mutex.c. It is a counter-based mutex with a | ||
120 | spinlock and a wait-list. The counter has 3 states: 1 for "unlocked", | ||
121 | 0 for "locked" and negative numbers (usually -1) for "locked, potential | ||
122 | waiters queued". | ||
123 | |||
124 | the APIs of 'struct mutex' have been streamlined: | ||
125 | |||
126 | DEFINE_MUTEX(name); | ||
127 | |||
128 | mutex_init(mutex); | ||
129 | |||
130 | void mutex_lock(struct mutex *lock); | ||
131 | int mutex_lock_interruptible(struct mutex *lock); | ||
132 | int mutex_trylock(struct mutex *lock); | ||
133 | void mutex_unlock(struct mutex *lock); | ||
134 | int mutex_is_locked(struct mutex *lock); | ||
135 | |||
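
Editor's sketch (not part of the patch): the three counter states
described above (1 for "unlocked", 0 for "locked", negative for "locked,
potential waiters queued") can be modelled in userspace with C11 atomics.
This is only a toy model of the fastpath arithmetic under that
description. The names toy_mutex, toy_lock_fastpath and so on are
hypothetical, and the slowpath (spinlock, wait-list, sleeping) is not
modelled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model of the counter only; no wait-list, no spinlock. */
struct toy_mutex {
    atomic_int count;   /* 1: unlocked, 0: locked, <0: locked + waiters */
};

void toy_mutex_init(struct toy_mutex *m)
{
    assert(m);
    atomic_init(&m->count, 1);
}

/*
 * Mirrors "lock decl; js slowpath": decrement, and if the new value is
 * negative the lock was already held and the slowpath would be needed.
 * Returns true when the fastpath alone acquired the lock.
 */
bool toy_lock_fastpath(struct toy_mutex *m)
{
    assert(m);
    return atomic_fetch_sub(&m->count, 1) - 1 >= 0;
}

/*
 * Mirrors "lock incl; jle slowpath": increment, and if the new value is
 * still <= 0 there may be waiters to wake in the slowpath.
 * Returns true when the fastpath alone completed the unlock.
 */
bool toy_unlock_fastpath(struct toy_mutex *m)
{
    assert(m);
    return atomic_fetch_add(&m->count, 1) + 1 > 0;
}

bool toy_is_locked(struct toy_mutex *m)
{
    assert(m);
    return atomic_load(&m->count) != 1;
}
```

In the uncontended case the counter simply moves 1 -> 0 -> 1 and both
fastpath helpers return true, which corresponds to the two-instruction
sequences shown in the disassembly above.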
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index b0fe41da007b..8d8b4e5ea184 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt | |||
@@ -945,7 +945,6 @@ bond0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4 | |||
945 | collisions:0 txqueuelen:0 | 945 | collisions:0 txqueuelen:0 |
946 | 946 | ||
947 | eth0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4 | 947 | eth0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4 |
948 | inet addr:XXX.XXX.XXX.YYY Bcast:XXX.XXX.XXX.255 Mask:255.255.252.0 | ||
949 | UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 | 948 | UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 |
950 | RX packets:3573025 errors:0 dropped:0 overruns:0 frame:0 | 949 | RX packets:3573025 errors:0 dropped:0 overruns:0 frame:0 |
951 | TX packets:1643167 errors:1 dropped:0 overruns:1 carrier:0 | 950 | TX packets:1643167 errors:1 dropped:0 overruns:1 carrier:0 |
@@ -953,7 +952,6 @@ eth0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4 | |||
953 | Interrupt:10 Base address:0x1080 | 952 | Interrupt:10 Base address:0x1080 |
954 | 953 | ||
955 | eth1 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4 | 954 | eth1 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4 |
956 | inet addr:XXX.XXX.XXX.YYY Bcast:XXX.XXX.XXX.255 Mask:255.255.252.0 | ||
957 | UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 | 955 | UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 |
958 | RX packets:3651769 errors:0 dropped:0 overruns:0 frame:0 | 956 | RX packets:3651769 errors:0 dropped:0 overruns:0 frame:0 |
959 | TX packets:1643480 errors:0 dropped:0 overruns:0 carrier:0 | 957 | TX packets:1643480 errors:0 dropped:0 overruns:0 carrier:0 |
diff --git a/Documentation/networking/sk98lin.txt b/Documentation/networking/sk98lin.txt index 851fc97bb22f..f9d979ee9526 100644 --- a/Documentation/networking/sk98lin.txt +++ b/Documentation/networking/sk98lin.txt | |||
@@ -245,7 +245,7 @@ Default: Both | |||
245 | This parameter is only relevant if auto-negotiation for this port is | 245 | This parameter is only relevant if auto-negotiation for this port is |
246 | not set to "Sense". If auto-negotiation is set to "On", all three values | 246 | not set to "Sense". If auto-negotiation is set to "On", all three values |
247 | are possible. If it is set to "Off", only "Full" and "Half" are allowed. | 247 | are possible. If it is set to "Off", only "Full" and "Half" are allowed. |
248 | This parameter is usefull if your link partner does not support all | 248 | This parameter is useful if your link partner does not support all |
249 | possible combinations. | 249 | possible combinations. |
250 | 250 | ||
251 | Flow Control | 251 | Flow Control |
diff --git a/Documentation/pci-error-recovery.txt b/Documentation/pci-error-recovery.txt new file mode 100644 index 000000000000..d089967e4948 --- /dev/null +++ b/Documentation/pci-error-recovery.txt | |||
@@ -0,0 +1,246 @@ | |||
1 | |||
2 | PCI Error Recovery | ||
3 | ------------------ | ||
4 | May 31, 2005 | ||
5 | |||
6 | Current document maintainer: | ||
7 | Linas Vepstas <linas@austin.ibm.com> | ||
8 | |||
9 | |||
10 | Some PCI bus controllers are able to detect certain "hard" PCI errors | ||
11 | on the bus, such as parity errors on the data and address busses, as | ||
12 | well as SERR and PERR errors. These chipsets are then able to disable | ||
13 | I/O to/from the affected device, so that, for example, a bad DMA | ||
14 | address doesn't end up corrupting system memory. These same chipsets | ||
15 | are also able to reset the affected PCI device, and return it to | ||
16 | working condition. This document describes a generic API for | ||
17 | performing error recovery. | ||
18 | |||
19 | The core idea is that after a PCI error has been detected, there must | ||
20 | be a way for the kernel to coordinate with all affected device drivers | ||
21 | so that the pci card can be made operational again, possibly after | ||
22 | performing a full electrical #RST of the PCI card. The API below | ||
23 | provides a generic API for device drivers to be notified of PCI | ||
24 | errors, and to be notified of, and respond to, a reset sequence. | ||
25 | |||
26 | Preliminary sketch of API, cut-n-pasted-n-modified email from | ||
27 | Ben Herrenschmidt, circa 5 april 2005 | ||
28 | |||
29 | The error recovery API support is exposed to the driver in the form of | ||
30 | a structure of function pointers pointed to by a new field in struct | ||
31 | pci_driver. The absence of this pointer in pci_driver denotes a | ||
32 | "non-aware" driver; behaviour on these is platform dependent. | ||
33 | Platforms like ppc64 can try to simulate pci hotplug remove/add. | ||
34 | |||
35 | The definition of "pci_error_token" is not covered here. It is based on | ||
36 | Seto's work on the synchronous error detection. We still need to define | ||
37 | functions for extracting information out of an opaque error token. This is | ||
38 | separate from this API. | ||
39 | |||
40 | This structure has the form: | ||
41 | |||
42 | struct pci_error_handlers | ||
43 | { | ||
44 | int (*error_detected)(struct pci_dev *dev, pci_error_token error); | ||
45 | int (*mmio_enabled)(struct pci_dev *dev); | ||
46 | int (*resume)(struct pci_dev *dev); | ||
47 | int (*link_reset)(struct pci_dev *dev); | ||
48 | int (*slot_reset)(struct pci_dev *dev); | ||
49 | }; | ||
50 | |||
51 | A driver doesn't have to implement all of these callbacks. The | ||
52 | only mandatory one is error_detected(). If a callback is not | ||
53 | implemented, the corresponding feature is considered unsupported. | ||
54 | For example, if mmio_enabled() and resume() aren't there, then the | ||
55 | driver is assumed as not doing any direct recovery and requires | ||
56 | a reset. If link_reset() is not implemented, the card is assumed as | ||
57 | not caring about link resets, in which case, if recover is supported, | ||
58 | the core can try recover (but not slot_reset() unless it really did | ||
59 | reset the slot). If slot_reset() is not supported, link_reset() can | ||
60 | be called instead on a slot reset. | ||
61 | |||
62 | The first call will always be: | ||
63 | |||
64 | 1) error_detected() | ||
65 | |||
66 | Error detected. This is sent once after an error has been detected. At | ||
67 | this point, the device might not be accessible anymore depending on the | ||
68 | platform (the slot will be isolated on ppc64). The driver may already | ||
69 | have "noticed" the error because of a failing IO, but this is the proper | ||
70 | "synchronisation point", that is, it gives a chance to the driver to | ||
71 | cleanup, waiting for pending stuff (timers, whatever, etc...) to | ||
72 | complete; it can take semaphores, schedule, etc... everything but touch | ||
73 | the device. Within this function and after it returns, the driver | ||
74 | shouldn't do any new IOs. Called in task context. This is sort of a | ||
75 | "quiesce" point. See note about interrupts at the end of this doc. | ||
76 | |||
77 | Result codes: | ||
78 | - PCIERR_RESULT_CAN_RECOVER: | ||
79 | Driver returns this if it thinks it might be able to recover | ||
80 | the HW by just banging IOs or if it wants to be given | ||
81 | a chance to extract some diagnostic information (see | ||
82 | below). | ||
83 | - PCIERR_RESULT_NEED_RESET: | ||
84 | Driver returns this if it thinks it can't recover unless the | ||
85 | slot is reset. | ||
86 | - PCIERR_RESULT_DISCONNECT: | ||
87 | Return this if driver thinks it won't recover at all, | ||
88 | (this will detach the driver ? or just leave it | ||
89 | dangling ? to be decided) | ||
90 | |||
91 | So at this point, we have called error_detected() for all drivers | ||
92 | on the segment that had the error. On ppc64, the slot is isolated. What | ||
93 | happens now typically depends on the result from the drivers. If all | ||
94 | drivers on the segment/slot return PCIERR_RESULT_CAN_RECOVER, we would | ||
95 | re-enable IOs on the slot (or do nothing special if the platform doesn't | ||
96 | isolate slots) and call 2). If not and we can reset slots, we go to 4), | ||
97 | if neither, we have a dead slot. If it's a hotplug slot, we might | ||
98 | "simulate" reset by triggering HW unplug/replug though. | ||
99 | |||
100 | >>> Current ppc64 implementation assumes that a device driver will | ||
101 | >>> *not* schedule or semaphore in this routine; the current ppc64 | ||
102 | >>> implementation uses one kernel thread to notify all devices; | ||
103 | >>> thus, if one device sleeps/schedules, all devices are affected. | ||
104 | >>> Doing better requires complex multi-threaded logic in the error | ||
105 | >>> recovery implementation (e.g. waiting for all notification threads | ||
106 | >>> to "join" before proceeding with recovery.) This seems excessively | ||
107 | >>> complex and not worth implementing. | ||
108 | |||
109 | >>> The current ppc64 implementation doesn't much care if the device | ||
110 | >>> attempts i/o at this point, or not. I/O's will fail, returning | ||
111 | >>> a value of 0xff on read, and writes will be dropped. If the device | ||
112 | >>> driver attempts more than 10K I/O's to a frozen adapter, it will | ||
113 | >>> assume that the device driver has gone into an infinite loop, and | ||
114 | >>> it will panic the kernel. | ||
115 | |||
116 | 2) mmio_enabled() | ||
117 | |||
118 | This is the "early recovery" call. IOs are allowed again, but DMA is | ||
119 | not (hrm... to be discussed, I prefer not), with some restrictions. This | ||
120 | is NOT a callback for the driver to start operations again, only to | ||
121 | peek/poke at the device, extract diagnostic information, if any, and | ||
122 | eventually do things like trigger a device local reset or some such, | ||
123 | but not restart operations. This is sent if all drivers on a segment | ||
124 | agree that they can try to recover and no automatic link reset was | ||
125 | performed by the HW. If the platform can't just re-enable IOs without | ||
126 | a slot reset or a link reset, it doesn't call this callback and goes | ||
127 | directly to 3) or 4). All IOs should be done _synchronously_ from | ||
128 | within this callback, errors triggered by them will be returned via | ||
129 | the normal pci_check_whatever() api, no new error_detected() callback | ||
130 | will be issued due to an error happening here. However, such an error | ||
131 | might cause IOs to be re-blocked for the whole segment, and thus | ||
132 | invalidate the recovery that other devices on the same segment might | ||
133 | have done, forcing the whole segment into one of the next states, | ||
134 | that is link reset or slot reset. | ||
135 | |||
136 | Result codes: | ||
137 | - PCIERR_RESULT_RECOVERED | ||
138 | Driver returns this if it thinks the device is fully | ||
139 | functional and thinks it is ready to start | ||
140 | normal driver operations again. There is no | ||
141 | guarantee that the driver will actually be | ||
142 | allowed to proceed, as another driver on the | ||
143 | same segment might have failed and thus triggered a | ||
144 | slot reset on platforms that support it. | ||
145 | |||
146 | - PCIERR_RESULT_NEED_RESET | ||
147 | Driver returns this if it thinks the device is not | ||
148 | recoverable in its current state and it needs a slot | ||
149 | reset to proceed. | ||
150 | |||
151 | - PCIERR_RESULT_DISCONNECT | ||
152 | Same as above. Total failure, no recovery even after | ||
153 | reset; driver dead. (To be defined more precisely) | ||
154 | |||
155 | >>> The current ppc64 implementation does not implement this callback. | ||
156 | |||
157 | 3) link_reset() | ||
158 | |||
159 | This is called after the link has been reset. This is typically | ||
160 | a PCI Express specific state at this point and is done whenever a | ||
161 | non-fatal error has been detected that can be "solved" by resetting | ||
162 | the link. This call informs the driver of the reset and the driver | ||
163 | should check if the device appears to be in working condition. | ||
164 | This function acts a bit like 2) mmio_enabled(), in that the driver | ||
165 | is not supposed to restart normal driver I/O operations right away. | ||
166 | Instead, it should just "probe" the device to check its recoverability | ||
167 | status. If all is right, then the core will call resume() once all | ||
168 | drivers have ack'd link_reset(). | ||
169 | |||
170 | Result codes: | ||
171 | (identical to mmio_enabled) | ||
172 | |||
173 | >>> The current ppc64 implementation does not implement this callback. | ||
174 | |||
175 | 4) slot_reset() | ||
176 | |||
177 | This is called after the slot has been soft or hard reset by the | ||
178 | platform. A soft reset consists of asserting the adapter #RST line | ||
179 | and then restoring the PCI BARs and PCI configuration header. If the | ||
180 | platform supports PCI hotplug, then it might instead perform a hard | ||
181 | reset by toggling power on the slot off/on. This call gives drivers | ||
182 | the chance to re-initialize the hardware (re-download firmware, etc.), | ||
183 | but drivers shouldn't restart normal I/O processing operations at | ||
184 | this point. (See note about interrupts; interrupts aren't guaranteed | ||
185 | to be delivered until the resume() callback has been called). If all | ||
186 | device drivers report success on this callback, the platform will call | ||
187 | resume() to complete the error handling and let the driver restart | ||
188 | normal I/O processing. | ||
189 | |||
190 | A driver can still return a critical failure for this function if | ||
191 | it can't get the device operational after reset. If the platform | ||
192 | previously tried a soft reset, it might now try a hard reset (power | ||
193 | cycle) and then call slot_reset() again. If the device still can't | ||
194 | be recovered, there is nothing more that can be done; the platform | ||
195 | will typically report a "permanent failure" in such a case. The | ||
196 | device will be considered "dead" in this case. | ||
197 | |||
198 | Result codes: | ||
199 | - PCIERR_RESULT_DISCONNECT | ||
200 | Same as above. | ||
201 | |||
202 | >>> The current ppc64 implementation does not try a power-cycle reset | ||
203 | >>> if the driver returned PCIERR_RESULT_DISCONNECT. However, it should. | ||
204 | |||
205 | 5) resume() | ||
206 | |||
207 | This is called if all drivers on the segment have returned | ||
208 | PCIERR_RESULT_RECOVERED from one of the 3 previous callbacks. | ||
209 | That basically tells the driver to restart activity, that everything | ||
210 | is back and running. No result code is taken into account here. If | ||
211 | a new error happens, it will restart a new error handling process. | ||
212 | |||
213 | That's it. I think this covers all the possibilities. The way those | ||
214 | callbacks are called is platform policy. A platform with no slot reset | ||
215 | capability for example may want to just "ignore" drivers that can't | ||
216 | recover (disconnect them) and try to let other cards on the same segment | ||
217 | recover. Keep in mind that in most real life cases, though, there will | ||
218 | be only one driver per segment. | ||
219 | |||
220 | Now, there is a note about interrupts. If you get an interrupt and your | ||
221 | device is dead or has been isolated, there is a problem :) | ||
222 | |||
223 | After much thinking, I decided to leave that to the platform. That is, | ||
224 | the recovery API only specifies that: | ||
225 | |||
226 | - There is no guarantee that interrupt delivery can proceed from any | ||
227 | device on the segment starting from the error detection and until the | ||
228 | resume() callback is sent, at which point interrupts are expected to be | ||
229 | fully operational. | ||
230 | |||
231 | - There is no guarantee that interrupt delivery is stopped, that is, a | ||
232 | driver that gets an interrupt after detecting an error, or that detects | ||
233 | an error within the interrupt handler such that it prevents proper | ||
234 | ack'ing of the interrupt (and thus removal of the source) should just | ||
235 | return IRQ_NOTHANDLED. It's up to the platform to deal with that | ||
236 | condition, typically by masking the irq source during the duration of | ||
237 | the error handling. It is expected that the platform "knows" which | ||
238 | interrupts are routed to error-management capable slots and can deal | ||
239 | with temporarily disabling that irq number during error processing (this | ||
240 | isn't terribly complex). That means some IRQ latency for other devices | ||
241 | sharing the interrupt, but there is simply no other way. High end | ||
242 | platforms aren't supposed to share interrupts between many devices | ||
243 | anyway :) | ||
244 | |||
245 | |||
246 | Revised: 31 May 2005 Linas Vepstas <linas@austin.ibm.com> | ||
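
Editor's sketch (not part of the patch): the text after error_detected()
describes how the platform picks the next step from the drivers' result
codes: if every driver on the segment returns PCIERR_RESULT_CAN_RECOVER
the platform goes to 2) mmio_enabled(), otherwise it goes to
4) slot_reset() if it can reset slots, and otherwise the slot is dead.
That rule can be written as a small plain-C function. The enum values,
the next_step type and the function name are assumptions made for
illustration; only the PCIERR_RESULT_* names come from the document. The
case where the platform cannot re-enable IOs without a reset, and the
hotplug "simulated reset" fallback, are deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Result code names from the document; numeric values are assumed. */
enum pcierr_result {
    PCIERR_RESULT_CAN_RECOVER,
    PCIERR_RESULT_NEED_RESET,
    PCIERR_RESULT_DISCONNECT,
    PCIERR_RESULT_RECOVERED
};

/* Hypothetical labels for what the platform does next. */
enum next_step {
    STEP_MMIO_ENABLED,  /* step 2) in the document */
    STEP_SLOT_RESET,    /* step 4) in the document */
    STEP_DEAD_SLOT
};

/*
 * Combine the error_detected() results of all drivers on a segment.
 * An empty segment is treated as "all can recover" in this sketch.
 */
enum next_step after_error_detected(const enum pcierr_result *results,
                                    size_t n, bool can_reset_slot)
{
    bool all_can_recover = true;

    assert(results != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (results[i] != PCIERR_RESULT_CAN_RECOVER)
            all_can_recover = false;
    }

    if (all_can_recover)
        return STEP_MMIO_ENABLED;
    return can_reset_slot ? STEP_SLOT_RESET : STEP_DEAD_SLOT;
}
```

Keeping the decision in one pure function makes the "platform policy"
mentioned above easy to swap: a platform with different rules would
replace only this function.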
diff --git a/Documentation/pm.txt b/Documentation/pm.txt index 2ea1149bf6b0..79c0f32a760e 100644 --- a/Documentation/pm.txt +++ b/Documentation/pm.txt | |||
@@ -218,7 +218,7 @@ proceed in the opposite direction. | |||
218 | Q: Who do I contact for additional information about | 218 | Q: Who do I contact for additional information about |
219 | enabling power management for my specific driver/device? | 219 | enabling power management for my specific driver/device? |
220 | 220 | ||
221 | ACPI Development mailing list: acpi-devel@lists.sourceforge.net | 221 | ACPI Development mailing list: linux-acpi@vger.kernel.org |
222 | 222 | ||
223 | System Interface -- OBSOLETE, DO NOT USE! | 223 | System Interface -- OBSOLETE, DO NOT USE! |
224 | ----------------************************* | 224 | ----------------************************* |
diff --git a/Documentation/power/interface.txt b/Documentation/power/interface.txt index f5ebda5f4276..bd4ffb5bd49a 100644 --- a/Documentation/power/interface.txt +++ b/Documentation/power/interface.txt | |||
@@ -41,3 +41,14 @@ to. Writing to this file will accept one of | |||
41 | It will only change to 'firmware' or 'platform' if the system supports | 41 | It will only change to 'firmware' or 'platform' if the system supports |
42 | it. | 42 | it. |
43 | 43 | ||
44 | /sys/power/image_size controls the size of the image created by | ||
45 | the suspend-to-disk mechanism. It can be written a string | ||
46 | representing a non-negative integer that will be used as an upper | ||
47 | limit of the image size, in megabytes. The suspend-to-disk mechanism will | ||
48 | do its best to ensure the image size will not exceed that number. However, | ||
49 | if this turns out to be impossible, it will try to suspend anyway using the | ||
50 | smallest image possible. In particular, if "0" is written to this file, the | ||
51 | suspend image will be as small as possible. | ||
52 | |||
53 | Reading from this file will display the current image size limit, which | ||
54 | is set to 500 MB by default. | ||
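
Editor's sketch (not part of the patch): /sys/power/image_size is
described above as accepting a string that represents a non-negative
integer number of megabytes. A minimal C helper that writes such a value
could look like the following. The function name is hypothetical, and
the path is passed in as a parameter so the helper can be tried against
an ordinary file before pointing it at the real sysfs node.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write an image size limit (in megabytes, per the text above) to the
 * given file. Returns 0 on success, -1 on any error.
 * (Hypothetical helper; writing the real sysfs file needs root.)
 */
int write_image_size(const char *path, unsigned long megabytes)
{
    FILE *f;
    int ok;

    assert(path);
    f = fopen(path, "w");
    if (!f)
        return -1;

    ok = fprintf(f, "%lu\n", megabytes) > 0;
    /* The write may only be reported as failed when the file is closed. */
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

For example, write_image_size("/sys/power/image_size", 0) would request
the smallest possible image, matching the "0" case described above.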
diff --git a/Documentation/power/swsusp.txt b/Documentation/power/swsusp.txt index b0d50840788e..08c79d4dc540 100644 --- a/Documentation/power/swsusp.txt +++ b/Documentation/power/swsusp.txt | |||
@@ -27,6 +27,11 @@ echo shutdown > /sys/power/disk; echo disk > /sys/power/state | |||
27 | 27 | ||
28 | echo platform > /sys/power/disk; echo disk > /sys/power/state | 28 | echo platform > /sys/power/disk; echo disk > /sys/power/state |
29 | 29 | ||
30 | If you want to limit the suspend image size to N megabytes, do | ||
31 | |||
32 | echo N > /sys/power/image_size | ||
33 | |||
34 | before suspend (it is limited to 500 MB by default). | ||
30 | 35 | ||
31 | Encrypted suspend image: | 36 | Encrypted suspend image: |
32 | ------------------------ | 37 | ------------------------ |
@@ -207,7 +212,7 @@ A: Try running | |||
207 | 212 | ||
208 | cat `cat /proc/[0-9]*/maps | grep / | sed 's:.* /:/:' | sort -u` > /dev/null | 213 | cat `cat /proc/[0-9]*/maps | grep / | sed 's:.* /:/:' | sort -u` > /dev/null |
209 | 214 | ||
210 | after resume. swapoff -a; swapon -a may also be usefull. | 215 | after resume. swapoff -a; swapon -a may also be useful. |
211 | 216 | ||
212 | Q: What happens to devices during swsusp? They seem to be resumed | 217 | Q: What happens to devices during swsusp? They seem to be resumed |
213 | during system suspend? | 218 | during system suspend? |
@@ -318,7 +323,7 @@ to be useless to try to suspend to disk while that app is running? | |||
318 | A: No, it should work okay, as long as your app does not mlock() | 323 | A: No, it should work okay, as long as your app does not mlock() |
319 | it. Just prepare big enough swap partition. | 324 | it. Just prepare big enough swap partition. |
320 | 325 | ||
321 | Q: What information is usefull for debugging suspend-to-disk problems? | 326 | Q: What information is useful for debugging suspend-to-disk problems? |
322 | 327 | ||
323 | A: Well, last messages on the screen are always useful. If something | 328 | A: Well, last messages on the screen are always useful. If something |
324 | is broken, it is usually some kernel driver, therefore trying with as | 329 | is broken, it is usually some kernel driver, therefore trying with as |
diff --git a/Documentation/powerpc/00-INDEX b/Documentation/powerpc/00-INDEX index e7bea0a407b4..d6d65b9bcfe3 100644 --- a/Documentation/powerpc/00-INDEX +++ b/Documentation/powerpc/00-INDEX | |||
@@ -8,12 +8,18 @@ please mail me. | |||
8 | cpu_features.txt | 8 | cpu_features.txt |
9 | - info on how we support a variety of CPUs with minimal compile-time | 9 | - info on how we support a variety of CPUs with minimal compile-time |
10 | options. | 10 | options. |
11 | eeh-pci-error-recovery.txt | ||
12 | - info on PCI Bus EEH Error Recovery | ||
13 | hvcs.txt | ||
14 | - IBM "Hypervisor Virtual Console Server" Installation Guide | ||
15 | mpc52xx.txt | ||
16 | - Linux 2.6.x on MPC52xx family | ||
11 | ppc_htab.txt | 17 | ppc_htab.txt |
12 | - info about the Linux/PPC /proc/ppc_htab entry | 18 | - info about the Linux/PPC /proc/ppc_htab entry |
13 | smp.txt | ||
14 | - use and state info about Linux/PPC on MP machines | ||
15 | SBC8260_memory_mapping.txt | 19 | SBC8260_memory_mapping.txt |
16 | - EST SBC8260 board info | 20 | - EST SBC8260 board info |
21 | smp.txt | ||
22 | - use and state info about Linux/PPC on MP machines | ||
17 | sound.txt | 23 | sound.txt |
18 | - info on sound support under Linux/PPC | 24 | - info on sound support under Linux/PPC |
19 | zImage_layout.txt | 25 | zImage_layout.txt |
diff --git a/Documentation/stable_kernel_rules.txt b/Documentation/stable_kernel_rules.txt index 2c81305090df..e409e5d07486 100644 --- a/Documentation/stable_kernel_rules.txt +++ b/Documentation/stable_kernel_rules.txt | |||
@@ -1,58 +1,56 @@ | |||
1 | Everything you ever wanted to know about Linux 2.6 -stable releases. | 1 | Everything you ever wanted to know about Linux 2.6 -stable releases. |
2 | 2 | ||
3 | Rules on what kind of patches are accepted, and what ones are not, into | 3 | Rules on what kind of patches are accepted, and which ones are not, into the |
4 | the "-stable" tree: | 4 | "-stable" tree: |
5 | 5 | ||
6 | - It must be obviously correct and tested. | 6 | - It must be obviously correct and tested. |
7 | - It can not bigger than 100 lines, with context. | 7 | - It can not be bigger than 100 lines, with context. |
8 | - It must fix only one thing. | 8 | - It must fix only one thing. |
9 | - It must fix a real bug that bothers people (not a, "This could be a | 9 | - It must fix a real bug that bothers people (not a, "This could be a |
10 | problem..." type thing.) | 10 | problem..." type thing). |
11 | - It must fix a problem that causes a build error (but not for things | 11 | - It must fix a problem that causes a build error (but not for things |
12 | marked CONFIG_BROKEN), an oops, a hang, data corruption, a real | 12 | marked CONFIG_BROKEN), an oops, a hang, data corruption, a real |
13 | security issue, or some "oh, that's not good" issue. In short, | 13 | security issue, or some "oh, that's not good" issue. In short, something |
14 | something critical. | 14 | critical. |
15 | - No "theoretical race condition" issues, unless an explanation of how | 15 | - No "theoretical race condition" issues, unless an explanation of how the |
16 | the race can be exploited. | 16 | race can be exploited is also provided. |
17 | - It can not contain any "trivial" fixes in it (spelling changes, | 17 | - It can not contain any "trivial" fixes in it (spelling changes, |
18 | whitespace cleanups, etc.) | 18 | whitespace cleanups, etc). |
19 | - It must be accepted by the relevant subsystem maintainer. | 19 | - It must be accepted by the relevant subsystem maintainer. |
20 | - It must follow Documentation/SubmittingPatches rules. | 20 | - It must follow the Documentation/SubmittingPatches rules. |
21 | 21 | ||
22 | 22 | ||
23 | Procedure for submitting patches to the -stable tree: | 23 | Procedure for submitting patches to the -stable tree: |
24 | 24 | ||
25 | - Send the patch, after verifying that it follows the above rules, to | 25 | - Send the patch, after verifying that it follows the above rules, to |
26 | stable@kernel.org. | 26 | stable@kernel.org. |
27 | - The sender will receive an ack when the patch has been accepted into | 27 | - The sender will receive an ACK when the patch has been accepted into the |
28 | the queue, or a nak if the patch is rejected. This response might | 28 | queue, or a NAK if the patch is rejected. This response might take a few |
29 | take a few days, according to the developer's schedules. | 29 | days, according to the developer's schedules. |
30 | - If accepted, the patch will be added to the -stable queue, for review | 30 | - If accepted, the patch will be added to the -stable queue, for review by |
31 | by other developers. | 31 | other developers. |
32 | - Security patches should not be sent to this alias, but instead to the | 32 | - Security patches should not be sent to this alias, but instead to the |
33 | documented security@kernel.org. | 33 | documented security@kernel.org address. |
34 | 34 | ||
35 | 35 | ||
36 | Review cycle: | 36 | Review cycle: |
37 | 37 | ||
38 | - When the -stable maintainers decide for a review cycle, the patches | 38 | - When the -stable maintainers decide for a review cycle, the patches will be |
39 | will be sent to the review committee, and the maintainer of the | 39 | sent to the review committee, and the maintainer of the affected area of |
40 | affected area of the patch (unless the submitter is the maintainer of | 40 | the patch (unless the submitter is the maintainer of the area) and CC: to |
41 | the area) and CC: to the linux-kernel mailing list. | 41 | the linux-kernel mailing list. |
42 | - The review committee has 48 hours in which to ack or nak the patch. | 42 | - The review committee has 48 hours in which to ACK or NAK the patch. |
43 | - If the patch is rejected by a member of the committee, or linux-kernel | 43 | - If the patch is rejected by a member of the committee, or linux-kernel |
44 | members object to the patch, bringing up issues that the maintainers | 44 | members object to the patch, bringing up issues that the maintainers and |
45 | and members did not realize, the patch will be dropped from the | 45 | members did not realize, the patch will be dropped from the queue. |
46 | queue. | 46 | - At the end of the review cycle, the ACKed patches will be added to the |
47 | - At the end of the review cycle, the acked patches will be added to | 47 | latest -stable release, and a new -stable release will happen. |
48 | the latest -stable release, and a new -stable release will happen. | 48 | - Security patches will be accepted into the -stable tree directly from the |
49 | - Security patches will be accepted into the -stable tree directly from | 49 | security kernel team, and not go through the normal review cycle. |
50 | the security kernel team, and not go through the normal review cycle. | ||
51 | Contact the kernel security team for more details on this procedure. | 50 | Contact the kernel security team for more details on this procedure. |
52 | 51 | ||
53 | 52 | ||
54 | Review committe: | 53 | Review committe: |
55 | 54 | ||
56 | - This will be made up of a number of kernel developers who have | 55 | - This is made up of a number of kernel developers who have volunteered for |
57 | volunteered for this task, and a few that haven't. | 56 | this task, and a few that haven't. |
58 | |||
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt index 2f1aae32a5d9..6910c0136f8d 100644 --- a/Documentation/sysctl/vm.txt +++ b/Documentation/sysctl/vm.txt | |||
@@ -26,12 +26,13 @@ Currently, these files are in /proc/sys/vm: | |||
26 | - min_free_kbytes | 26 | - min_free_kbytes |
27 | - laptop_mode | 27 | - laptop_mode |
28 | - block_dump | 28 | - block_dump |
29 | - drop-caches | ||
29 | 30 | ||
30 | ============================================================== | 31 | ============================================================== |
31 | 32 | ||
32 | dirty_ratio, dirty_background_ratio, dirty_expire_centisecs, | 33 | dirty_ratio, dirty_background_ratio, dirty_expire_centisecs, |
33 | dirty_writeback_centisecs, vfs_cache_pressure, laptop_mode, | 34 | dirty_writeback_centisecs, vfs_cache_pressure, laptop_mode, |
34 | block_dump, swap_token_timeout: | 35 | block_dump, swap_token_timeout, drop-caches: |
35 | 36 | ||
36 | See Documentation/filesystems/proc.txt | 37 | See Documentation/filesystems/proc.txt |
37 | 38 | ||
@@ -102,3 +103,20 @@ This is used to force the Linux VM to keep a minimum number | |||
102 | of kilobytes free. The VM uses this number to compute a pages_min | 103 | of kilobytes free. The VM uses this number to compute a pages_min |
103 | value for each lowmem zone in the system. Each lowmem zone gets | 104 | value for each lowmem zone in the system. Each lowmem zone gets |
104 | a number of reserved free pages based proportionally on its size. | 105 | a number of reserved free pages based proportionally on its size. |
106 | |||
107 | ============================================================== | ||
108 | |||
109 | percpu_pagelist_fraction | ||
110 | |||
111 | This is the fraction of pages at most (high mark pcp->high) in each zone that | ||
112 | are allocated for each per cpu page list. The min value for this is 8. It | ||
113 | means that we don't allow more than 1/8th of pages in each zone to be | ||
114 | allocated in any single per_cpu_pagelist. This entry only changes the value | ||
115 | of hot per cpu pagelists. User can specify a number like 100 to allocate | ||
116 | 1/100th of each zone to each per cpu page list. | ||
117 | |||
118 | The batch value of each per cpu pagelist is also updated as a result. It is | ||
119 | set to pcp->high/4. The upper limit of batch is (PAGE_SHIFT * 8) | ||
120 | |||
121 | The initial value is zero. Kernel does not use this value at boot time to set | ||
122 | the high water marks for each per cpu page list. | ||
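The arithmetic in the percpu_pagelist_fraction text can be sketched as follows. This follows the wording of the hunk above only and is not kernel code: the function name `pcp_limits` is hypothetical, and PAGE_SHIFT=12 (4 KiB pages) is an assumption that varies by architecture.

```shell
#!/bin/sh
# Sketch of the arithmetic described in the text, not kernel code.
# PAGE_SHIFT=12 (4 KiB pages) is an assumption.
PAGE_SHIFT=${PAGE_SHIFT:-12}

# pcp_limits ZONE_PAGES FRACTION
# Prints "high batch" for one per cpu page list (hypothetical helper).
pcp_limits() {
    zone_pages=$1
    fraction=$2
    # The text gives 8 as the minimum value of the fraction.
    if [ "$fraction" -lt 8 ]; then
        echo "fraction must be at least 8" >&2
        return 1
    fi
    high=$((zone_pages / fraction))
    batch=$((high / 4))
    # The text gives PAGE_SHIFT * 8 as the upper limit of batch.
    max_batch=$((PAGE_SHIFT * 8))
    if [ "$batch" -gt "$max_batch" ]; then
        batch=$max_batch
    fi
    echo "$high $batch"
}
```

For a zone of 262144 pages (1 GiB of 4 KiB pages) and a fraction of 100, `pcp_limits 262144 100` prints `2621 96`: the high mark is 1/100th of the zone, and the batch is capped at PAGE_SHIFT * 8.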
diff --git a/Documentation/video4linux/CARDLIST.bttv b/Documentation/video4linux/CARDLIST.bttv index 330246ac80f8..74fb085e178b 100644 --- a/Documentation/video4linux/CARDLIST.bttv +++ b/Documentation/video4linux/CARDLIST.bttv | |||
@@ -141,3 +141,4 @@ | |||
141 | 140 -> Osprey 440 [0070:ff07] | 141 | 140 -> Osprey 440 [0070:ff07] |
142 | 141 -> Asound Skyeye PCTV | 142 | 141 -> Asound Skyeye PCTV |
143 | 142 -> Sabrent TV-FM (bttv version) | 143 | 142 -> Sabrent TV-FM (bttv version) |
144 | 143 -> Hauppauge ImpactVCB (bt878) [0070:13eb] | ||
diff --git a/Documentation/video4linux/CARDLIST.cx88 b/Documentation/video4linux/CARDLIST.cx88 index a1017d1a85d4..34b6e59f2968 100644 --- a/Documentation/video4linux/CARDLIST.cx88 +++ b/Documentation/video4linux/CARDLIST.cx88 | |||
@@ -16,7 +16,7 @@ | |||
16 | 15 -> DViCO FusionHDTV DVB-T1 [18ac:db00] | 16 | 15 -> DViCO FusionHDTV DVB-T1 [18ac:db00] |
17 | 16 -> KWorld LTV883RF | 17 | 16 -> KWorld LTV883RF |
18 | 17 -> DViCO FusionHDTV 3 Gold-Q [18ac:d810] | 18 | 17 -> DViCO FusionHDTV 3 Gold-Q [18ac:d810] |
19 | 18 -> Hauppauge Nova-T DVB-T [0070:9002] | 19 | 18 -> Hauppauge Nova-T DVB-T [0070:9002,0070:9001] |
20 | 19 -> Conexant DVB-T reference design [14f1:0187] | 20 | 19 -> Conexant DVB-T reference design [14f1:0187] |
21 | 20 -> Provideo PV259 [1540:2580] | 21 | 20 -> Provideo PV259 [1540:2580] |
22 | 21 -> DViCO FusionHDTV DVB-T Plus [18ac:db10] | 22 | 21 -> DViCO FusionHDTV DVB-T Plus [18ac:db10] |
@@ -35,3 +35,11 @@ | |||
35 | 34 -> ATI HDTV Wonder [1002:a101] | 35 | 34 -> ATI HDTV Wonder [1002:a101] |
36 | 35 -> WinFast DTV1000-T [107d:665f] | 36 | 35 -> WinFast DTV1000-T [107d:665f] |
37 | 36 -> AVerTV 303 (M126) [1461:000a] | 37 | 36 -> AVerTV 303 (M126) [1461:000a] |
38 | 37 -> Hauppauge Nova-S-Plus DVB-S [0070:9201,0070:9202] | ||
39 | 38 -> Hauppauge Nova-SE2 DVB-S [0070:9200] | ||
40 | 39 -> KWorld DVB-S 100 [17de:08b2] | ||
41 | 40 -> Hauppauge WinTV-HVR1100 DVB-T/Hybrid [0070:9400,0070:9402] | ||
42 | 41 -> Hauppauge WinTV-HVR1100 DVB-T/Hybrid (Low Profile) [0070:9800,0070:9802] | ||
43 | 42 -> digitalnow DNTV Live! DVB-T Pro [1822:0025] | ||
44 | 43 -> KWorld/VStream XPert DVB-T with cx22702 [17de:08a1] | ||
45 | 44 -> DViCO FusionHDTV DVB-T Dual Digital [18ac:db50] | ||
diff --git a/Documentation/video4linux/CARDLIST.saa7134 b/Documentation/video4linux/CARDLIST.saa7134 index efb708ec116a..cb3a59bbeb17 100644 --- a/Documentation/video4linux/CARDLIST.saa7134 +++ b/Documentation/video4linux/CARDLIST.saa7134 | |||
@@ -56,7 +56,7 @@ | |||
56 | 55 -> LifeView FlyDVB-T DUO [5168:0502,5168:0306] | 56 | 55 -> LifeView FlyDVB-T DUO [5168:0502,5168:0306] |
57 | 56 -> Avermedia AVerTV 307 [1461:a70a] | 57 | 56 -> Avermedia AVerTV 307 [1461:a70a] |
58 | 57 -> Avermedia AVerTV GO 007 FM [1461:f31f] | 58 | 57 -> Avermedia AVerTV GO 007 FM [1461:f31f] |
59 | 58 -> ADS Tech Instant TV (saa7135) [1421:0350,1421:0370,1421:1370] | 59 | 58 -> ADS Tech Instant TV (saa7135) [1421:0350,1421:0351,1421:0370,1421:1370] |
60 | 59 -> Kworld/Tevion V-Stream Xpert TV PVR7134 | 60 | 59 -> Kworld/Tevion V-Stream Xpert TV PVR7134 |
61 | 60 -> Typhoon DVB-T Duo Digital/Analog Cardbus [4e42:0502] | 61 | 60 -> Typhoon DVB-T Duo Digital/Analog Cardbus [4e42:0502] |
62 | 61 -> Philips TOUGH DVB-T reference design [1131:2004] | 62 | 61 -> Philips TOUGH DVB-T reference design [1131:2004] |
@@ -81,4 +81,5 @@ | |||
81 | 80 -> ASUS Digimatrix TV [1043:0210] | 81 | 80 -> ASUS Digimatrix TV [1043:0210] |
82 | 81 -> Philips Tiger reference design [1131:2018] | 82 | 81 -> Philips Tiger reference design [1131:2018] |
83 | 82 -> MSI TV@Anywhere plus [1462:6231] | 83 | 82 -> MSI TV@Anywhere plus [1462:6231] |
84 | 84 | 83 -> Terratec Cinergy 250 PCI TV [153b:1160] | |
85 | 84 -> LifeView FlyDVB Trio [5168:0319] | ||
diff --git a/Documentation/video4linux/CARDLIST.tuner b/Documentation/video4linux/CARDLIST.tuner index 9d6544ea9f41..0bf3d5bf9ef8 100644 --- a/Documentation/video4linux/CARDLIST.tuner +++ b/Documentation/video4linux/CARDLIST.tuner | |||
@@ -40,7 +40,7 @@ tuner=38 - Philips PAL/SECAM multi (FM1216ME MK3) | |||
40 | tuner=39 - LG NTSC (newer TAPC series) | 40 | tuner=39 - LG NTSC (newer TAPC series) |
41 | tuner=40 - HITACHI V7-J180AT | 41 | tuner=40 - HITACHI V7-J180AT |
42 | tuner=41 - Philips PAL_MK (FI1216 MK) | 42 | tuner=41 - Philips PAL_MK (FI1216 MK) |
43 | tuner=42 - Philips 1236D ATSC/NTSC daul in | 43 | tuner=42 - Philips 1236D ATSC/NTSC dual in |
44 | tuner=43 - Philips NTSC MK3 (FM1236MK3 or FM1236/F) | 44 | tuner=43 - Philips NTSC MK3 (FM1236MK3 or FM1236/F) |
45 | tuner=44 - Philips 4 in 1 (ATI TV Wonder Pro/Conexant) | 45 | tuner=44 - Philips 4 in 1 (ATI TV Wonder Pro/Conexant) |
46 | tuner=45 - Microtune 4049 FM5 | 46 | tuner=45 - Microtune 4049 FM5 |
@@ -50,7 +50,7 @@ tuner=48 - Tenna TNF 8831 BGFF) | |||
50 | tuner=49 - Microtune 4042 FI5 ATSC/NTSC dual in | 50 | tuner=49 - Microtune 4042 FI5 ATSC/NTSC dual in |
51 | tuner=50 - TCL 2002N | 51 | tuner=50 - TCL 2002N |
52 | tuner=51 - Philips PAL/SECAM_D (FM 1256 I-H3) | 52 | tuner=51 - Philips PAL/SECAM_D (FM 1256 I-H3) |
53 | tuner=52 - Thomson DDT 7610 (ATSC/NTSC) | 53 | tuner=52 - Thomson DTT 7610 (ATSC/NTSC) |
54 | tuner=53 - Philips FQ1286 | 54 | tuner=53 - Philips FQ1286 |
55 | tuner=54 - tda8290+75 | 55 | tuner=54 - tda8290+75 |
56 | tuner=55 - TCL 2002MB | 56 | tuner=55 - TCL 2002MB |
@@ -58,7 +58,7 @@ tuner=56 - Philips PAL/SECAM multi (FQ1216AME MK4) | |||
58 | tuner=57 - Philips FQ1236A MK4 | 58 | tuner=57 - Philips FQ1236A MK4 |
59 | tuner=58 - Ymec TVision TVF-8531MF/8831MF/8731MF | 59 | tuner=58 - Ymec TVision TVF-8531MF/8831MF/8731MF |
60 | tuner=59 - Ymec TVision TVF-5533MF | 60 | tuner=59 - Ymec TVision TVF-5533MF |
61 | tuner=60 - Thomson DDT 7611 (ATSC/NTSC) | 61 | tuner=60 - Thomson DTT 761X (ATSC/NTSC) |
62 | tuner=61 - Tena TNF9533-D/IF/TNF9533-B/DF | 62 | tuner=61 - Tena TNF9533-D/IF/TNF9533-B/DF |
63 | tuner=62 - Philips TEA5767HN FM Radio | 63 | tuner=62 - Philips TEA5767HN FM Radio |
64 | tuner=63 - Philips FMD1216ME MK3 Hybrid Tuner | 64 | tuner=63 - Philips FMD1216ME MK3 Hybrid Tuner |