Diffstat (limited to 'Documentation')
69 files changed, 4299 insertions, 518 deletions
diff --git a/Documentation/Changes b/Documentation/Changes index 86b86399d61d..fe5ae0f55020 100644 --- a/Documentation/Changes +++ b/Documentation/Changes | |||
@@ -31,8 +31,6 @@ al español de este documento en varios formatos. | |||
31 | Eine deutsche Version dieser Datei finden Sie unter | 31 | Eine deutsche Version dieser Datei finden Sie unter |
32 | <http://www.stefan-winter.de/Changes-2.4.0.txt>. | 32 | <http://www.stefan-winter.de/Changes-2.4.0.txt>. |
33 | 33 | ||
34 | Last updated: October 29th, 2002 | ||
35 | |||
36 | Chris Ricker (kaboom@gatech.edu or chris.ricker@genetics.utah.edu). | 34 | Chris Ricker (kaboom@gatech.edu or chris.ricker@genetics.utah.edu). |
37 | 35 | ||
38 | Current Minimal Requirements | 36 | Current Minimal Requirements |
@@ -48,7 +46,7 @@ necessary on all systems; obviously, if you don't have any ISDN | |||
48 | hardware, for example, you probably needn't concern yourself with | 46 | hardware, for example, you probably needn't concern yourself with |
49 | isdn4k-utils. | 47 | isdn4k-utils. |
50 | 48 | ||
51 | o Gnu C 2.95.3 # gcc --version | 49 | o Gnu C 3.2 # gcc --version |
52 | o Gnu make 3.79.1 # make --version | 50 | o Gnu make 3.79.1 # make --version |
53 | o binutils 2.12 # ld -v | 51 | o binutils 2.12 # ld -v |
54 | o util-linux 2.10o # fdformat --version | 52 | o util-linux 2.10o # fdformat --version |
@@ -74,26 +72,7 @@ GCC | |||
74 | --- | 72 | --- |
75 | 73 | ||
76 | The gcc version requirements may vary depending on the type of CPU in your | 74 | The gcc version requirements may vary depending on the type of CPU in your |
77 | computer. The next paragraph applies to users of x86 CPUs, but not | 75 | computer. |
78 | necessarily to users of other CPUs. Users of other CPUs should obtain | ||
79 | information about their gcc version requirements from another source. | ||
80 | |||
81 | The recommended compiler for the kernel is gcc 2.95.x (x >= 3), and it | ||
82 | should be used when you need absolute stability. You may use gcc 3.0.x | ||
83 | instead if you wish, although it may cause problems. Later versions of gcc | ||
84 | have not received much testing for Linux kernel compilation, and there are | ||
85 | almost certainly bugs (mainly, but not exclusively, in the kernel) that | ||
86 | will need to be fixed in order to use these compilers. In any case, using | ||
87 | pgcc instead of plain gcc is just asking for trouble. | ||
88 | |||
89 | The Red Hat gcc 2.96 compiler subtree can also be used to build this tree. | ||
90 | You should ensure you use gcc-2.96-74 or later. gcc-2.96-54 will not build | ||
91 | the kernel correctly. | ||
92 | |||
93 | In addition, please pay attention to compiler optimization. Anything | ||
94 | greater than -O2 may not be wise. Similarly, if you choose to use gcc-2.95.x | ||
95 | or derivatives, be sure not to use -fstrict-aliasing (which, depending on | ||
96 | your version of gcc 2.95.x, may necessitate using -fno-strict-aliasing). | ||
97 | 76 | ||
98 | Make | 77 | Make |
99 | ---- | 78 | ---- |
@@ -322,9 +301,9 @@ Getting updated software | |||
322 | Kernel compilation | 301 | Kernel compilation |
323 | ****************** | 302 | ****************** |
324 | 303 | ||
325 | gcc 2.95.3 | 304 | gcc |
326 | ---------- | 305 | --- |
327 | o <ftp://ftp.gnu.org/gnu/gcc/gcc-2.95.3.tar.gz> | 306 | o <ftp://ftp.gnu.org/gnu/gcc/> |
328 | 307 | ||
329 | Make | 308 | Make |
330 | ---- | 309 | ---- |
diff --git a/Documentation/CodingStyle b/Documentation/CodingStyle index eb7db3c19227..ce5d2c038cf5 100644 --- a/Documentation/CodingStyle +++ b/Documentation/CodingStyle | |||
@@ -199,7 +199,7 @@ The rationale is: | |||
199 | modifications are prevented | 199 | modifications are prevented |
200 | - saves the compiler work to optimize redundant code away ;) | 200 | - saves the compiler work to optimize redundant code away ;) |
201 | 201 | ||
202 | int fun(int ) | 202 | int fun(int a) |
203 | { | 203 | { |
204 | int result = 0; | 204 | int result = 0; |
205 | char *buffer = kmalloc(SIZE); | 205 | char *buffer = kmalloc(SIZE); |
@@ -344,7 +344,7 @@ Remember: if another thread can find your data structure, and you don't | |||
344 | have a reference count on it, you almost certainly have a bug. | 344 | have a reference count on it, you almost certainly have a bug. |
345 | 345 | ||
346 | 346 | ||
347 | Chapter 11: Macros, Enums, Inline functions and RTL | 347 | Chapter 11: Macros, Enums and RTL |
348 | 348 | ||
349 | Names of macros defining constants and labels in enums are capitalized. | 349 | Names of macros defining constants and labels in enums are capitalized. |
350 | 350 | ||
@@ -429,7 +429,35 @@ from void pointer to any other pointer type is guaranteed by the C programming | |||
429 | language. | 429 | language. |
430 | 430 | ||
431 | 431 | ||
432 | Chapter 14: References | 432 | Chapter 14: The inline disease |
433 | |||
434 | There appears to be a common misperception that gcc has a magic "make me | ||
435 | faster" speedup option called "inline". While the use of inlines can be | ||
436 | appropriate (for example as a means of replacing macros, see Chapter 11), it | ||
437 | very often is not. Abundant use of the inline keyword leads to a much bigger | ||
438 | kernel, which in turn slows the system as a whole down, due to a bigger | ||
439 | icache footprint for the CPU and simply because there is less memory | ||
440 | available for the pagecache. Just think about it; a pagecache miss causes a | ||
441 | disk seek, which easily takes 5 milliseconds. There are a LOT of cpu cycles | ||
442 | that can go into these 5 milliseconds. | ||
443 | |||
444 | A reasonable rule of thumb is to not put inline at functions that have more | ||
445 | than 3 lines of code in them. An exception to this rule are the cases where | ||
446 | a parameter is known to be a compiletime constant, and as a result of this | ||
447 | constantness you *know* the compiler will be able to optimize most of your | ||
448 | function away at compile time. For a good example of this latter case, see | ||
449 | the kmalloc() inline function. | ||
450 | |||
451 | Often people argue that adding inline to functions that are static and used | ||
452 | only once is always a win since there is no space tradeoff. While this is | ||
453 | technically correct, gcc is capable of inlining these automatically without | ||
454 | help, and the maintenance issue of removing the inline when a second user | ||
455 | appears outweighs the potential value of the hint that tells gcc to do | ||
456 | something it would have done anyway. | ||
457 | |||
458 | |||
459 | |||
460 | Chapter 15: References | ||
433 | 461 | ||
434 | The C Programming Language, Second Edition | 462 | The C Programming Language, Second Edition |
435 | by Brian W. Kernighan and Dennis M. Ritchie. | 463 | by Brian W. Kernighan and Dennis M. Ritchie. |
@@ -444,10 +472,13 @@ ISBN 0-201-61586-X. | |||
444 | URL: http://cm.bell-labs.com/cm/cs/tpop/ | 472 | URL: http://cm.bell-labs.com/cm/cs/tpop/ |
445 | 473 | ||
446 | GNU manuals - where in compliance with K&R and this text - for cpp, gcc, | 474 | GNU manuals - where in compliance with K&R and this text - for cpp, gcc, |
447 | gcc internals and indent, all available from http://www.gnu.org | 475 | gcc internals and indent, all available from http://www.gnu.org/manual/ |
448 | 476 | ||
449 | WG14 is the international standardization working group for the programming | 477 | WG14 is the international standardization working group for the programming |
450 | language C, URL: http://std.dkuug.dk/JTC1/SC22/WG14/ | 478 | language C, URL: http://www.open-std.org/JTC1/SC22/WG14/ |
479 | |||
480 | Kernel CodingStyle, by greg@kroah.com at OLS 2002: | ||
481 | http://www.kroah.com/linux/talks/ols_2002_kernel_codingstyle_talk/html/ | ||
451 | 482 | ||
452 | -- | 483 | -- |
453 | Last updated on 16 February 2004 by a community effort on LKML. | 484 | Last updated on 30 December 2005 by a community effort on LKML. |
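The exception described in the new "inline disease" chapter above (a parameter known to be a compile-time constant) can be sketched in plain userspace C. This is an illustration only: size_to_class() and its thresholds are hypothetical and are not the kernel's kmalloc() implementation. The function is longer than three lines, so by the rule of thumb it would normally not be inline; it is kept inline here only because callers are expected to pass constants.

```c
#include <stddef.h>

/*
 * Hypothetical size classes for illustration only; these are not the
 * kernel's real kmalloc() caches.
 */
static inline int size_to_class(size_t size)
{
	if (size <= 32)
		return 0;
	if (size <= 128)
		return 1;
	if (size <= 1024)
		return 2;
	return -1;	/* no class for this size in the sketch */
}

/*
 * The argument is a compile-time constant, so an optimizing compiler
 * can typically fold the comparison chain down to a single constant
 * once size_to_class() is inlined.
 */
int class_for_small_header(void)
{
	return size_to_class(64);
}

/*
 * The size is only known at run time, so every inlined copy carries
 * the full comparison chain. This is the case where inline mostly
 * costs code size.
 */
int class_for_request(size_t size)
{
	return size_to_class(size);
}
```

Whether the constant case is actually folded depends on the compiler and optimization level, so it is worth checking the generated code before relying on it.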
diff --git a/Documentation/DocBook/.gitignore b/Documentation/DocBook/.gitignore new file mode 100644 index 000000000000..c102c02ecf89 --- /dev/null +++ b/Documentation/DocBook/.gitignore | |||
@@ -0,0 +1,6 @@ | |||
1 | *.xml | ||
2 | *.ps | ||
3 | |||
4 | *.html | ||
5 | *.9.gz | ||
6 | *.9 | ||
diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl index 767433bdbc40..8c9c6704e85b 100644 --- a/Documentation/DocBook/kernel-api.tmpl +++ b/Documentation/DocBook/kernel-api.tmpl | |||
@@ -54,6 +54,11 @@ | |||
54 | !Ekernel/sched.c | 54 | !Ekernel/sched.c |
55 | !Ekernel/timer.c | 55 | !Ekernel/timer.c |
56 | </sect1> | 56 | </sect1> |
57 | <sect1><title>High-resolution timers</title> | ||
58 | !Iinclude/linux/ktime.h | ||
59 | !Iinclude/linux/hrtimer.h | ||
60 | !Ekernel/hrtimer.c | ||
61 | </sect1> | ||
57 | <sect1><title>Internal Functions</title> | 62 | <sect1><title>Internal Functions</title> |
58 | !Ikernel/exit.c | 63 | !Ikernel/exit.c |
59 | !Ikernel/signal.c | 64 | !Ikernel/signal.c |
@@ -369,6 +374,7 @@ X!Edrivers/acpi/motherboard.c | |||
369 | X!Edrivers/acpi/bus.c | 374 | X!Edrivers/acpi/bus.c |
370 | --> | 375 | --> |
371 | !Edrivers/acpi/scan.c | 376 | !Edrivers/acpi/scan.c |
377 | !Idrivers/acpi/scan.c | ||
372 | <!-- No correct structured comments | 378 | <!-- No correct structured comments |
373 | X!Edrivers/acpi/pci_bind.c | 379 | X!Edrivers/acpi/pci_bind.c |
374 | --> | 380 | --> |
diff --git a/Documentation/DocBook/kernel-locking.tmpl b/Documentation/DocBook/kernel-locking.tmpl index 90dc2de8e0af..158ffe9bfade 100644 --- a/Documentation/DocBook/kernel-locking.tmpl +++ b/Documentation/DocBook/kernel-locking.tmpl | |||
@@ -222,7 +222,7 @@ | |||
222 | <title>Two Main Types of Kernel Locks: Spinlocks and Semaphores</title> | 222 | <title>Two Main Types of Kernel Locks: Spinlocks and Semaphores</title> |
223 | 223 | ||
224 | <para> | 224 | <para> |
225 | There are two main types of kernel locks. The fundamental type | 225 | There are three main types of kernel locks. The fundamental type |
226 | is the spinlock | 226 | is the spinlock |
227 | (<filename class="headerfile">include/asm/spinlock.h</filename>), | 227 | (<filename class="headerfile">include/asm/spinlock.h</filename>), |
228 | which is a very simple single-holder lock: if you can't get the | 228 | which is a very simple single-holder lock: if you can't get the |
@@ -230,16 +230,22 @@ | |||
230 | very small and fast, and can be used anywhere. | 230 | very small and fast, and can be used anywhere. |
231 | </para> | 231 | </para> |
232 | <para> | 232 | <para> |
233 | The second type is a semaphore | 233 | The second type is a mutex |
234 | (<filename class="headerfile">include/linux/mutex.h</filename>): it | ||
235 | is like a spinlock, but you may block holding a mutex. | ||
236 | If you can't lock a mutex, your task will suspend itself, and be woken | ||
237 | up when the mutex is released. This means the CPU can do something | ||
238 | else while you are waiting. There are many cases when you simply | ||
239 | can't sleep (see <xref linkend="sleeping-things"/>), and so have to | ||
240 | use a spinlock instead. | ||
241 | </para> | ||
242 | <para> | ||
243 | The third type is a semaphore | ||
234 | (<filename class="headerfile">include/asm/semaphore.h</filename>): it | 244 | (<filename class="headerfile">include/asm/semaphore.h</filename>): it |
235 | can have more than one holder at any time (the number decided at | 245 | can have more than one holder at any time (the number decided at |
236 | initialization time), although it is most commonly used as a | 246 | initialization time), although it is most commonly used as a |
237 | single-holder lock (a mutex). If you can't get a semaphore, | 247 | single-holder lock (a mutex). If you can't get a semaphore, your |
238 | your task will put itself on the queue, and be woken up when the | 248 | task will be suspended and later on woken up - just like for mutexes. |
239 | semaphore is released. This means the CPU will do something | ||
240 | else while you are waiting, but there are many cases when you | ||
241 | simply can't sleep (see <xref linkend="sleeping-things"/>), and so | ||
242 | have to use a spinlock instead. | ||
243 | </para> | 249 | </para> |
244 | <para> | 250 | <para> |
245 | Neither type of lock is recursive: see | 251 | Neither type of lock is recursive: see |
diff --git a/Documentation/DocBook/videobook.tmpl b/Documentation/DocBook/videobook.tmpl index 3ec6c875588a..fdff984a5161 100644 --- a/Documentation/DocBook/videobook.tmpl +++ b/Documentation/DocBook/videobook.tmpl | |||
@@ -229,7 +229,7 @@ int __init myradio_init(struct video_init *v) | |||
229 | 229 | ||
230 | static int users = 0; | 230 | static int users = 0; |
231 | 231 | ||
232 | static int radio_open(stuct video_device *dev, int flags) | 232 | static int radio_open(struct video_device *dev, int flags) |
233 | { | 233 | { |
234 | if(users) | 234 | if(users) |
235 | return -EBUSY; | 235 | return -EBUSY; |
@@ -949,7 +949,7 @@ int __init mycamera_init(struct video_init *v) | |||
949 | 949 | ||
950 | static int users = 0; | 950 | static int users = 0; |
951 | 951 | ||
952 | static int camera_open(stuct video_device *dev, int flags) | 952 | static int camera_open(struct video_device *dev, int flags) |
953 | { | 953 | { |
954 | if(users) | 954 | if(users) |
955 | return -EBUSY; | 955 | return -EBUSY; |
diff --git a/Documentation/RCU/rcuref.txt b/Documentation/RCU/rcuref.txt index a23fee66064d..3f60db41b2f0 100644 --- a/Documentation/RCU/rcuref.txt +++ b/Documentation/RCU/rcuref.txt | |||
@@ -1,74 +1,67 @@ | |||
1 | Refcounter framework for elements of lists/arrays protected by | 1 | Refcounter design for elements of lists/arrays protected by RCU. |
2 | RCU. | ||
3 | 2 | ||
4 | Refcounting on elements of lists which are protected by traditional | 3 | Refcounting on elements of lists which are protected by traditional |
5 | reader/writer spinlocks or semaphores are straight forward as in: | 4 | reader/writer spinlocks or semaphores are straight forward as in: |
6 | 5 | ||
7 | 1. 2. | 6 | 1. 2. |
8 | add() search_and_reference() | 7 | add() search_and_reference() |
9 | { { | 8 | { { |
10 | alloc_object read_lock(&list_lock); | 9 | alloc_object read_lock(&list_lock); |
11 | ... search_for_element | 10 | ... search_for_element |
12 | atomic_set(&el->rc, 1); atomic_inc(&el->rc); | 11 | atomic_set(&el->rc, 1); atomic_inc(&el->rc); |
13 | write_lock(&list_lock); ... | 12 | write_lock(&list_lock); ... |
14 | add_element read_unlock(&list_lock); | 13 | add_element read_unlock(&list_lock); |
15 | ... ... | 14 | ... ... |
16 | write_unlock(&list_lock); } | 15 | write_unlock(&list_lock); } |
17 | } | 16 | } |
18 | 17 | ||
19 | 3. 4. | 18 | 3. 4. |
20 | release_referenced() delete() | 19 | release_referenced() delete() |
21 | { { | 20 | { { |
22 | ... write_lock(&list_lock); | 21 | ... write_lock(&list_lock); |
23 | atomic_dec(&el->rc, relfunc) ... | 22 | atomic_dec(&el->rc, relfunc) ... |
24 | ... delete_element | 23 | ... delete_element |
25 | } write_unlock(&list_lock); | 24 | } write_unlock(&list_lock); |
26 | ... | 25 | ... |
27 | if (atomic_dec_and_test(&el->rc)) | 26 | if (atomic_dec_and_test(&el->rc)) |
28 | kfree(el); | 27 | kfree(el); |
29 | ... | 28 | ... |
30 | } | 29 | } |
31 | 30 | ||
32 | If this list/array is made lock free using rcu as in changing the | 31 | If this list/array is made lock free using rcu as in changing the |
33 | write_lock in add() and delete() to spin_lock and changing read_lock | 32 | write_lock in add() and delete() to spin_lock and changing read_lock |
34 | in search_and_reference to rcu_read_lock(), the rcuref_get in | 33 | in search_and_reference to rcu_read_lock(), the atomic_get in |
35 | search_and_reference could potentially hold reference to an element which | 34 | search_and_reference could potentially hold reference to an element which |
36 | has already been deleted from the list/array. rcuref_lf_get_rcu takes | 35 | has already been deleted from the list/array. atomic_inc_not_zero takes |
37 | care of this scenario. search_and_reference should look as; | 36 | care of this scenario. search_and_reference should look as; |
38 | 37 | ||
39 | 1. 2. | 38 | 1. 2. |
40 | add() search_and_reference() | 39 | add() search_and_reference() |
41 | { { | 40 | { { |
42 | alloc_object rcu_read_lock(); | 41 | alloc_object rcu_read_lock(); |
43 | ... search_for_element | 42 | ... search_for_element |
44 | atomic_set(&el->rc, 1); if (rcuref_inc_lf(&el->rc)) { | 43 | atomic_set(&el->rc, 1); if (atomic_inc_not_zero(&el->rc)) { |
45 | write_lock(&list_lock); rcu_read_unlock(); | 44 | write_lock(&list_lock); rcu_read_unlock(); |
46 | return FAIL; | 45 | return FAIL; |
47 | add_element } | 46 | add_element } |
48 | ... ... | 47 | ... ... |
49 | write_unlock(&list_lock); rcu_read_unlock(); | 48 | write_unlock(&list_lock); rcu_read_unlock(); |
50 | } } | 49 | } } |
51 | 3. 4. | 50 | 3. 4. |
52 | release_referenced() delete() | 51 | release_referenced() delete() |
53 | { { | 52 | { { |
54 | ... write_lock(&list_lock); | 53 | ... write_lock(&list_lock); |
55 | rcuref_dec(&el->rc, relfunc) ... | 54 | atomic_dec(&el->rc, relfunc) ... |
56 | ... delete_element | 55 | ... delete_element |
57 | } write_unlock(&list_lock); | 56 | } write_unlock(&list_lock); |
58 | ... | 57 | ... |
59 | if (rcuref_dec_and_test(&el->rc)) | 58 | if (atomic_dec_and_test(&el->rc)) |
60 | call_rcu(&el->head, el_free); | 59 | call_rcu(&el->head, el_free); |
61 | ... | 60 | ... |
62 | } | 61 | } |
63 | 62 | ||
64 | Sometimes, reference to the element need to be obtained in the | 63 | Sometimes, reference to the element need to be obtained in the |
65 | update (write) stream. In such cases, rcuref_inc_lf might be an overkill | 64 | update (write) stream. In such cases, atomic_inc_not_zero might be an |
66 | since the spinlock serialising list updates are held. rcuref_inc | 65 | overkill since the spinlock serialising list updates are held. atomic_inc |
67 | is to be used in such cases. | 66 | is to be used in such cases. |
68 | For arches which do not have cmpxchg rcuref_inc_lf | 67 | |
69 | api uses a hashed spinlock implementation and the same hashed spinlock | ||
70 | is acquired in all rcuref_xxx primitives to preserve atomicity. | ||
71 | Note: Use rcuref_inc api only if you need to use rcuref_inc_lf on the | ||
72 | refcounter atleast at one place. Mixing rcuref_inc and atomic_xxx api | ||
73 | might lead to races. rcuref_inc_lf() must be used in lockfree | ||
74 | RCU critical sections only. | ||
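The "increment unless already zero" operation that the rcuref.txt hunk above relies on can be sketched in userspace with C11 atomics. This is an assumption-laden illustration, not the kernel's atomic_inc_not_zero(): the sketch_* names are hypothetical, and the functions return a C bool where true means the reference was taken (or, for the decrement, that the last reference was dropped).

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Take a reference only if the object is still live (count != 0).
 * Returns true if the count was incremented, false if it was zero.
 */
bool sketch_inc_not_zero(atomic_int *count)
{
	int old = atomic_load(count);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(count, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Drop one reference. Returns true when the caller dropped the last
 * one and is therefore responsible for freeing the object.
 */
bool sketch_dec_and_test(atomic_int *count)
{
	return atomic_fetch_sub(count, 1) == 1;
}
```

In a lock-free lookup, a reader that finds an element must treat a false return from sketch_inc_not_zero() as "element already being deleted" and fail the lookup, because the count reaching zero means the free has already been scheduled.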
diff --git a/Documentation/SubmittingDrivers b/Documentation/SubmittingDrivers index c3cca924e94b..6bd30fdd0786 100644 --- a/Documentation/SubmittingDrivers +++ b/Documentation/SubmittingDrivers | |||
@@ -27,18 +27,17 @@ Who To Submit Drivers To | |||
27 | ------------------------ | 27 | ------------------------ |
28 | 28 | ||
29 | Linux 2.0: | 29 | Linux 2.0: |
30 | No new drivers are accepted for this kernel tree | 30 | No new drivers are accepted for this kernel tree. |
31 | 31 | ||
32 | Linux 2.2: | 32 | Linux 2.2: |
33 | No new drivers are accepted for this kernel tree. | ||
34 | |||
35 | Linux 2.4: | ||
33 | If the code area has a general maintainer then please submit it to | 36 | If the code area has a general maintainer then please submit it to |
34 | the maintainer listed in MAINTAINERS in the kernel file. If the | 37 | the maintainer listed in MAINTAINERS in the kernel file. If the |
35 | maintainer does not respond or you cannot find the appropriate | 38 | maintainer does not respond or you cannot find the appropriate |
36 | maintainer then please contact the 2.2 kernel maintainer: | 39 | maintainer then please contact Marcelo Tosatti |
37 | Marc-Christian Petersen <m.c.p@wolk-project.de>. | 40 | <marcelo.tosatti@cyclades.com>. |
38 | |||
39 | Linux 2.4: | ||
40 | The same rules apply as 2.2. The final contact point for Linux 2.4 | ||
41 | submissions is Marcelo Tosatti <marcelo.tosatti@cyclades.com>. | ||
42 | 41 | ||
43 | Linux 2.6: | 42 | Linux 2.6: |
44 | The same rules apply as 2.4 except that you should follow linux-kernel | 43 | The same rules apply as 2.4 except that you should follow linux-kernel |
@@ -53,6 +52,7 @@ Licensing: The code must be released to us under the | |||
53 | of exclusive GPL licensing, and if you wish the driver | 52 | of exclusive GPL licensing, and if you wish the driver |
54 | to be useful to other communities such as BSD you may well | 53 | to be useful to other communities such as BSD you may well |
55 | wish to release under multiple licenses. | 54 | wish to release under multiple licenses. |
55 | See accepted licenses at include/linux/module.h | ||
56 | 56 | ||
57 | Copyright: The copyright owner must agree to use of GPL. | 57 | Copyright: The copyright owner must agree to use of GPL. |
58 | It's best if the submitter and copyright owner | 58 | It's best if the submitter and copyright owner |
@@ -143,5 +143,13 @@ KernelNewbies: | |||
143 | http://kernelnewbies.org/ | 143 | http://kernelnewbies.org/ |
144 | 144 | ||
145 | Linux USB project: | 145 | Linux USB project: |
146 | http://sourceforge.net/projects/linux-usb/ | 146 | http://www.linux-usb.org/ |
147 | |||
148 | How to NOT write kernel driver by arjanv@redhat.com | ||
149 | http://people.redhat.com/arjanv/olspaper.pdf | ||
150 | |||
151 | Kernel Janitor: | ||
152 | http://janitor.kernelnewbies.org/ | ||
147 | 153 | ||
154 | -- | ||
155 | Last updated on 17 Nov 2005. | ||
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches index 1d47e6c09dc6..c2c85bcb3d43 100644 --- a/Documentation/SubmittingPatches +++ b/Documentation/SubmittingPatches | |||
@@ -78,7 +78,9 @@ Randy Dunlap's patch scripts: | |||
78 | http://www.xenotime.net/linux/scripts/patching-scripts-002.tar.gz | 78 | http://www.xenotime.net/linux/scripts/patching-scripts-002.tar.gz |
79 | 79 | ||
80 | Andrew Morton's patch scripts: | 80 | Andrew Morton's patch scripts: |
81 | http://www.zip.com.au/~akpm/linux/patches/patch-scripts-0.20 | 81 | http://www.zip.com.au/~akpm/linux/patches/ |
82 | Instead of these scripts, quilt is the recommended patch management | ||
83 | tool (see above). | ||
82 | 84 | ||
83 | 85 | ||
84 | 86 | ||
@@ -97,7 +99,7 @@ need to split up your patch. See #3, next. | |||
97 | 99 | ||
98 | 3) Separate your changes. | 100 | 3) Separate your changes. |
99 | 101 | ||
100 | Separate each logical change into its own patch. | 102 | Separate _logical changes_ into a single patch file. |
101 | 103 | ||
102 | For example, if your changes include both bug fixes and performance | 104 | For example, if your changes include both bug fixes and performance |
103 | enhancements for a single driver, separate those changes into two | 105 | enhancements for a single driver, separate those changes into two |
@@ -112,6 +114,10 @@ If one patch depends on another patch in order for a change to be | |||
112 | complete, that is OK. Simply note "this patch depends on patch X" | 114 | complete, that is OK. Simply note "this patch depends on patch X" |
113 | in your patch description. | 115 | in your patch description. |
114 | 116 | ||
117 | If you cannot condense your patch set into a smaller set of patches, | ||
118 | then only post say 15 or so at a time and wait for review and integration. | ||
119 | |||
120 | |||
115 | 121 | ||
116 | 4) Select e-mail destination. | 122 | 4) Select e-mail destination. |
117 | 123 | ||
@@ -124,6 +130,10 @@ your patch to the primary Linux kernel developer's mailing list, | |||
124 | linux-kernel@vger.kernel.org. Most kernel developers monitor this | 130 | linux-kernel@vger.kernel.org. Most kernel developers monitor this |
125 | e-mail list, and can comment on your changes. | 131 | e-mail list, and can comment on your changes. |
126 | 132 | ||
133 | |||
134 | Do not send more than 15 patches at once to the vger mailing lists!!! | ||
135 | |||
136 | |||
127 | Linus Torvalds is the final arbiter of all changes accepted into the | 137 | Linus Torvalds is the final arbiter of all changes accepted into the |
128 | Linux kernel. His e-mail address is <torvalds@osdl.org>. He gets | 138 | Linux kernel. His e-mail address is <torvalds@osdl.org>. He gets |
129 | a lot of e-mail, so typically you should do your best to -avoid- sending | 139 | a lot of e-mail, so typically you should do your best to -avoid- sending |
@@ -149,6 +159,9 @@ USB, framebuffer devices, the VFS, the SCSI subsystem, etc. See the | |||
149 | MAINTAINERS file for a mailing list that relates specifically to | 159 | MAINTAINERS file for a mailing list that relates specifically to |
150 | your change. | 160 | your change. |
151 | 161 | ||
162 | Majordomo lists of VGER.KERNEL.ORG at: | ||
163 | <http://vger.kernel.org/vger-lists.html> | ||
164 | |||
152 | If changes affect userland-kernel interfaces, please send | 165 | If changes affect userland-kernel interfaces, please send |
153 | the MAN-PAGES maintainer (as listed in the MAINTAINERS file) | 166 | the MAN-PAGES maintainer (as listed in the MAINTAINERS file) |
154 | a man-pages patch, or at least a notification of the change, | 167 | a man-pages patch, or at least a notification of the change, |
@@ -373,27 +386,14 @@ a diffstat, to show what files have changed, and the number of inserted | |||
373 | and deleted lines per file. A diffstat is especially useful on bigger | 386 | and deleted lines per file. A diffstat is especially useful on bigger |
374 | patches. Other comments relevant only to the moment or the maintainer, | 387 | patches. Other comments relevant only to the moment or the maintainer, |
375 | not suitable for the permanent changelog, should also go here. | 388 | not suitable for the permanent changelog, should also go here. |
389 | Use diffstat options "-p 1 -w 70" so that filenames are listed from the | ||
390 | top of the kernel source tree and don't use too much horizontal space | ||
391 | (easily fit in 80 columns, maybe with some indentation). | ||
376 | 392 | ||
377 | See more details on the proper patch format in the following | 393 | See more details on the proper patch format in the following |
378 | references. | 394 | references. |
379 | 395 | ||
380 | 396 | ||
381 | 13) More references for submitting patches | ||
382 | |||
383 | Andrew Morton, "The perfect patch" (tpp). | ||
384 | <http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt> | ||
385 | |||
386 | Jeff Garzik, "Linux kernel patch submission format." | ||
387 | <http://linux.yyz.us/patch-format.html> | ||
388 | |||
389 | Greg KH, "How to piss off a kernel subsystem maintainer" | ||
390 | <http://www.kroah.com/log/2005/03/31/> | ||
391 | |||
392 | Kernel Documentation/CodingStyle | ||
393 | <http://sosdg.org/~coywolf/lxr/source/Documentation/CodingStyle> | ||
394 | |||
395 | Linus Torvald's mail on the canonical patch format: | ||
396 | <http://lkml.org/lkml/2005/4/7/183> | ||
397 | 397 | ||
398 | 398 | ||
399 | ----------------------------------- | 399 | ----------------------------------- |
@@ -466,3 +466,31 @@ and 'extern __inline__'. | |||
466 | Don't try to anticipate nebulous future cases which may or may not | 466 | Don't try to anticipate nebulous future cases which may or may not |
467 | be useful: "Make it as simple as you can, and no simpler." | 467 | be useful: "Make it as simple as you can, and no simpler." |
468 | 468 | ||
469 | |||
470 | |||
471 | ---------------------- | ||
472 | SECTION 3 - REFERENCES | ||
473 | ---------------------- | ||
474 | |||
475 | Andrew Morton, "The perfect patch" (tpp). | ||
476 | <http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt> | ||
477 | |||
478 | Jeff Garzik, "Linux kernel patch submission format." | ||
479 | <http://linux.yyz.us/patch-format.html> | ||
480 | |||
481 | Greg Kroah-Hartman "How to piss off a kernel subsystem maintainer". | ||
482 | <http://www.kroah.com/log/2005/03/31/> | ||
483 | <http://www.kroah.com/log/2005/07/08/> | ||
484 | <http://www.kroah.com/log/2005/10/19/> | ||
485 | <http://www.kroah.com/log/2006/01/11/> | ||
486 | |||
487 | NO!!!! No more huge patch bombs to linux-kernel@vger.kernel.org people!. | ||
488 | <http://marc.theaimsgroup.com/?l=linux-kernel&m=112112749912944&w=2> | ||
489 | |||
490 | Kernel Documentation/CodingStyle | ||
491 | <http://sosdg.org/~coywolf/lxr/source/Documentation/CodingStyle> | ||
492 | |||
493 | Linus Torvald's mail on the canonical patch format: | ||
494 | <http://lkml.org/lkml/2005/4/7/183> | ||
495 | -- | ||
496 | Last updated on 17 Nov 2005. | ||
diff --git a/Documentation/applying-patches.txt b/Documentation/applying-patches.txt index 681e426e2482..a083ba35d1ad 100644 --- a/Documentation/applying-patches.txt +++ b/Documentation/applying-patches.txt | |||
@@ -2,8 +2,8 @@ | |||
2 | Applying Patches To The Linux Kernel | 2 | Applying Patches To The Linux Kernel |
3 | ------------------------------------ | 3 | ------------------------------------ |
4 | 4 | ||
5 | (Written by Jesper Juhl, August 2005) | 5 | Original by: Jesper Juhl, August 2005 |
6 | 6 | Last update: 2006-01-05 | |
7 | 7 | ||
8 | 8 | ||
9 | A frequently asked question on the Linux Kernel Mailing List is how to apply | 9 | A frequently asked question on the Linux Kernel Mailing List is how to apply |
@@ -76,7 +76,7 @@ instead: | |||
76 | 76 | ||
77 | If you wish to uncompress the patch file by hand first before applying it | 77 | If you wish to uncompress the patch file by hand first before applying it |
78 | (what I assume you've done in the examples below), then you simply run | 78 | (what I assume you've done in the examples below), then you simply run |
79 | gunzip or bunzip2 on the file - like this: | 79 | gunzip or bunzip2 on the file -- like this: |
80 | gunzip patch-x.y.z.gz | 80 | gunzip patch-x.y.z.gz |
81 | bunzip2 patch-x.y.z.bz2 | 81 | bunzip2 patch-x.y.z.bz2 |
82 | 82 | ||
@@ -94,7 +94,7 @@ Common errors when patching | |||
94 | --- | 94 | --- |
95 | When patch applies a patch file it attempts to verify the sanity of the | 95 | When patch applies a patch file it attempts to verify the sanity of the |
96 | file in different ways. | 96 | file in different ways. |
97 | Checking that the file looks like a valid patch file, checking the code | 97 | Checking that the file looks like a valid patch file & checking the code |
98 | around the bits being modified matches the context provided in the patch are | 98 | around the bits being modified matches the context provided in the patch are |
99 | just two of the basic sanity checks patch does. | 99 | just two of the basic sanity checks patch does. |
100 | 100 | ||
@@ -118,16 +118,16 @@ wrong. | |||
118 | 118 | ||
119 | When patch encounters a change that it can't fix up with fuzz it rejects it | 119 | When patch encounters a change that it can't fix up with fuzz it rejects it |
120 | outright and leaves a file with a .rej extension (a reject file). You can | 120 | outright and leaves a file with a .rej extension (a reject file). You can |
121 | read this file to see exactely what change couldn't be applied, so you can | 121 | read this file to see exactly what change couldn't be applied, so you can |
122 | go fix it up by hand if you wish. | 122 | go fix it up by hand if you wish. |
123 | 123 | ||
124 | If you don't have any third party patches applied to your kernel source, but | 124 | If you don't have any third-party patches applied to your kernel source, but |
125 | only patches from kernel.org and you apply the patches in the correct order, | 125 | only patches from kernel.org and you apply the patches in the correct order, |
126 | and have made no modifications yourself to the source files, then you should | 126 | and have made no modifications yourself to the source files, then you should |
127 | never see a fuzz or reject message from patch. If you do see such messages | 127 | never see a fuzz or reject message from patch. If you do see such messages |
128 | anyway, then there's a high risk that either your local source tree or the | 128 | anyway, then there's a high risk that either your local source tree or the |
129 | patch file is corrupted in some way. In that case you should probably try | 129 | patch file is corrupted in some way. In that case you should probably try |
130 | redownloading the patch and if things are still not OK then you'd be advised | 130 | re-downloading the patch and if things are still not OK then you'd be advised |
131 | to start with a fresh tree downloaded in full from kernel.org. | 131 | to start with a fresh tree downloaded in full from kernel.org. |
132 | 132 | ||
133 | Let's look a bit more at some of the messages patch can produce. | 133 | Let's look a bit more at some of the messages patch can produce. |
@@ -136,7 +136,7 @@ If patch stops and presents a "File to patch:" prompt, then patch could not | |||
136 | find a file to be patched. Most likely you forgot to specify -p1 or you are | 136 | find a file to be patched. Most likely you forgot to specify -p1 or you are |
137 | in the wrong directory. Less often, you'll find patches that need to be | 137 | in the wrong directory. Less often, you'll find patches that need to be |
138 | applied with -p0 instead of -p1 (reading the patch file should reveal if | 138 | applied with -p0 instead of -p1 (reading the patch file should reveal if |
139 | this is the case - if so, then this is an error by the person who created | 139 | this is the case -- if so, then this is an error by the person who created |
140 | the patch but is not fatal). | 140 | the patch but is not fatal). |
141 | 141 | ||
142 | If you get "Hunk #2 succeeded at 1887 with fuzz 2 (offset 7 lines)." or a | 142 | If you get "Hunk #2 succeeded at 1887 with fuzz 2 (offset 7 lines)." or a |
@@ -167,22 +167,28 @@ the patch will in fact apply it. | |||
167 | 167 | ||
168 | A message similar to "patch: **** unexpected end of file in patch" or "patch | 168 | A message similar to "patch: **** unexpected end of file in patch" or "patch |
169 | unexpectedly ends in middle of line" means that patch could make no sense of | 169 | unexpectedly ends in middle of line" means that patch could make no sense of |
170 | the file you fed to it. Either your download is broken or you tried to feed | 170 | the file you fed to it. Either your download is broken, you tried to feed |
171 | patch a compressed patch file without uncompressing it first. | 171 | patch a compressed patch file without uncompressing it first, or the patch |
172 | file that you are using has been mangled by a mail client or mail transfer | ||
173 | agent along the way somewhere, e.g., by splitting a long line into two lines. | ||
174 | Often these warnings can easily be fixed by joining (concatenating) the | ||
175 | two lines that had been split. | ||
172 | 176 | ||
173 | As I already mentioned above, these errors should never happen if you apply | 177 | As I already mentioned above, these errors should never happen if you apply |
174 | a patch from kernel.org to the correct version of an unmodified source tree. | 178 | a patch from kernel.org to the correct version of an unmodified source tree. |
175 | So if you get these errors with kernel.org patches then you should probably | 179 | So if you get these errors with kernel.org patches then you should probably |
176 | assume that either your patch file or your tree is broken and I'd advice you | 180 | assume that either your patch file or your tree is broken and I'd advise you |
177 | to start over with a fresh download of a full kernel tree and the patch you | 181 | to start over with a fresh download of a full kernel tree and the patch you |
178 | wish to apply. | 182 | wish to apply. |
179 | 183 | ||
180 | 184 | ||
181 | Are there any alternatives to `patch'? | 185 | Are there any alternatives to `patch'? |
182 | --- | 186 | --- |
183 | Yes there are alternatives. You can use the `interdiff' program | 187 | Yes there are alternatives. |
184 | (http://cyberelk.net/tim/patchutils/) to generate a patch representing the | 188 | |
185 | differences between two patches and then apply the result. | 189 | You can use the `interdiff' program (http://cyberelk.net/tim/patchutils/) to |
190 | generate a patch representing the differences between two patches and then | ||
191 | apply the result. | ||
186 | This will let you move from something like 2.6.12.2 to 2.6.12.3 in a single | 192 | This will let you move from something like 2.6.12.2 to 2.6.12.3 in a single |
187 | step. The -z flag to interdiff will even let you feed it patches in gzip or | 193 | step. The -z flag to interdiff will even let you feed it patches in gzip or |
188 | bzip2 compressed form directly without the use of zcat or bzcat or manual | 194 | bzip2 compressed form directly without the use of zcat or bzcat or manual |
@@ -197,10 +203,10 @@ do the additional steps since interdiff can get things wrong in some cases. | |||
197 | Another alternative is `ketchup', which is a python script for automatic | 203 | Another alternative is `ketchup', which is a python script for automatic |
198 | downloading and applying of patches (http://www.selenic.com/ketchup/). | 204 | downloading and applying of patches (http://www.selenic.com/ketchup/). |
199 | 205 | ||
200 | Other nice tools are diffstat which shows a summary of changes made by a | 206 | Other nice tools are diffstat, which shows a summary of changes made by a |
201 | patch, lsdiff which displays a short listing of affected files in a patch | 207 | patch; lsdiff, which displays a short listing of affected files in a patch |
202 | file, along with (optionally) the line numbers of the start of each patch | 208 | file, along with (optionally) the line numbers of the start of each patch; |
203 | and grepdiff which displays a list of the files modified by a patch where | 209 | and grepdiff, which displays a list of the files modified by a patch where |
204 | the patch contains a given regular expression. | 210 | the patch contains a given regular expression. |
205 | 211 | ||
206 | 212 | ||
@@ -225,8 +231,8 @@ The -mm kernels live at | |||
225 | In place of ftp.kernel.org you can use ftp.cc.kernel.org, where cc is a | 231 | In place of ftp.kernel.org you can use ftp.cc.kernel.org, where cc is a |
226 | country code. This way you'll be downloading from a mirror site that's most | 232 | country code. This way you'll be downloading from a mirror site that's most |
227 | likely geographically closer to you, resulting in faster downloads for you, | 233 | likely geographically closer to you, resulting in faster downloads for you, |
228 | less bandwidth used globally and less load on the main kernel.org servers - | 234 | less bandwidth used globally and less load on the main kernel.org servers -- |
229 | these are good things, do use mirrors when possible. | 235 | these are good things, so do use mirrors when possible. |
230 | 236 | ||
231 | 237 | ||
232 | The 2.6.x kernels | 238 | The 2.6.x kernels |
@@ -234,14 +240,14 @@ The 2.6.x kernels | |||
234 | These are the base stable releases released by Linus. The highest numbered | 240 | These are the base stable releases released by Linus. The highest numbered |
235 | release is the most recent. | 241 | release is the most recent. |
236 | 242 | ||
237 | If regressions or other serious flaws are found then a -stable fix patch | 243 | If regressions or other serious flaws are found, then a -stable fix patch |
238 | will be released (see below) on top of this base. Once a new 2.6.x base | 244 | will be released (see below) on top of this base. Once a new 2.6.x base |
239 | kernel is released, a patch is made available that is a delta between the | 245 | kernel is released, a patch is made available that is a delta between the |
240 | previous 2.6.x kernel and the new one. | 246 | previous 2.6.x kernel and the new one. |
241 | 247 | ||
242 | To apply a patch moving from 2.6.11 to 2.6.12 you'd do the following (note | 248 | To apply a patch moving from 2.6.11 to 2.6.12, you'd do the following (note |
243 | that such patches do *NOT* apply on top of 2.6.x.y kernels but on top of the | 249 | that such patches do *NOT* apply on top of 2.6.x.y kernels but on top of the |
244 | base 2.6.x kernel - if you need to move from 2.6.x.y to 2.6.x+1 you need to | 250 | base 2.6.x kernel -- if you need to move from 2.6.x.y to 2.6.x+1 you need to |
245 | first revert the 2.6.x.y patch). | 251 | first revert the 2.6.x.y patch). |
246 | 252 | ||
247 | Here are some examples: | 253 | Here are some examples: |
@@ -258,12 +264,12 @@ $ patch -p1 -R < ../patch-2.6.11.1 # revert the 2.6.11.1 patch | |||
258 | # source dir is now 2.6.11 | 264 | # source dir is now 2.6.11 |
259 | $ patch -p1 < ../patch-2.6.12 # apply new 2.6.12 patch | 265 | $ patch -p1 < ../patch-2.6.12 # apply new 2.6.12 patch |
260 | $ cd .. | 266 | $ cd .. |
261 | $ mv linux-2.6.11.1 inux-2.6.12 # rename source dir | 267 | $ mv linux-2.6.11.1 linux-2.6.12 # rename source dir |
262 | 268 | ||
263 | 269 | ||
264 | The 2.6.x.y kernels | 270 | The 2.6.x.y kernels |
265 | --- | 271 | --- |
266 | Kernels with 4 digit versions are -stable kernels. They contain small(ish) | 272 | Kernels with 4-digit versions are -stable kernels. They contain small(ish) |
267 | critical fixes for security problems or significant regressions discovered | 273 | critical fixes for security problems or significant regressions discovered |
268 | in a given 2.6.x kernel. | 274 | in a given 2.6.x kernel. |
269 | 275 | ||
@@ -274,9 +280,14 @@ versions. | |||
274 | If no 2.6.x.y kernel is available, then the highest numbered 2.6.x kernel is | 280 | If no 2.6.x.y kernel is available, then the highest numbered 2.6.x kernel is |
275 | the current stable kernel. | 281 | the current stable kernel. |
276 | 282 | ||
283 | note: the -stable team usually do make incremental patches available as well | ||
284 | as patches against the latest mainline release, but I only cover the | ||
285 | non-incremental ones below. The incremental ones can be found at | ||
286 | ftp://ftp.kernel.org/pub/linux/kernel/v2.6/incr/ | ||
287 | |||
277 | These patches are not incremental, meaning that for example the 2.6.12.3 | 288 | These patches are not incremental, meaning that for example the 2.6.12.3 |
278 | patch does not apply on top of the 2.6.12.2 kernel source, but rather on top | 289 | patch does not apply on top of the 2.6.12.2 kernel source, but rather on top |
279 | of the base 2.6.12 kernel source. | 290 | of the base 2.6.12 kernel source . |
280 | So, in order to apply the 2.6.12.3 patch to your existing 2.6.12.2 kernel | 291 | So, in order to apply the 2.6.12.3 patch to your existing 2.6.12.2 kernel |
281 | source you have to first back out the 2.6.12.2 patch (so you are left with a | 292 | source you have to first back out the 2.6.12.2 patch (so you are left with a |
282 | base 2.6.12 kernel source) and then apply the new 2.6.12.3 patch. | 293 | base 2.6.12 kernel source) and then apply the new 2.6.12.3 patch. |
@@ -342,12 +353,12 @@ The -git kernels | |||
342 | repository, hence the name). | 353 | repository, hence the name). |
343 | 354 | ||
344 | These patches are usually released daily and represent the current state of | 355 | These patches are usually released daily and represent the current state of |
345 | Linus' tree. They are more experimental than -rc kernels since they are | 356 | Linus's tree. They are more experimental than -rc kernels since they are |
346 | generated automatically without even a cursory glance to see if they are | 357 | generated automatically without even a cursory glance to see if they are |
347 | sane. | 358 | sane. |
348 | 359 | ||
349 | -git patches are not incremental and apply either to a base 2.6.x kernel or | 360 | -git patches are not incremental and apply either to a base 2.6.x kernel or |
350 | a base 2.6.x-rc kernel - you can see which from their name. | 361 | a base 2.6.x-rc kernel -- you can see which from their name. |
351 | A patch named 2.6.12-git1 applies to the 2.6.12 kernel source and a patch | 362 | A patch named 2.6.12-git1 applies to the 2.6.12 kernel source and a patch |
352 | named 2.6.13-rc3-git2 applies to the source of the 2.6.13-rc3 kernel. | 363 | named 2.6.13-rc3-git2 applies to the source of the 2.6.13-rc3 kernel. |
353 | 364 | ||
@@ -390,12 +401,12 @@ You should generally strive to get your patches into mainline via -mm to | |||
390 | ensure maximum testing. | 401 | ensure maximum testing. |
391 | 402 | ||
392 | This branch is in constant flux and contains many experimental features, a | 403 | This branch is in constant flux and contains many experimental features, a |
393 | lot of debugging patches not appropriate for mainline etc and is the most | 404 | lot of debugging patches not appropriate for mainline etc., and is the most |
394 | experimental of the branches described in this document. | 405 | experimental of the branches described in this document. |
395 | 406 | ||
396 | These kernels are not appropriate for use on systems that are supposed to be | 407 | These kernels are not appropriate for use on systems that are supposed to be |
397 | stable and they are more risky to run than any of the other branches (make | 408 | stable and they are more risky to run than any of the other branches (make |
398 | sure you have up-to-date backups - that goes for any experimental kernel but | 409 | sure you have up-to-date backups -- that goes for any experimental kernel but |
399 | even more so for -mm kernels). | 410 | even more so for -mm kernels). |
400 | 411 | ||
401 | These kernels in addition to all the other experimental patches they contain | 412 | These kernels in addition to all the other experimental patches they contain |
@@ -433,7 +444,11 @@ $ cd .. | |||
433 | $ mv linux-2.6.12-mm1 linux-2.6.13-rc3-mm3 # rename the source dir | 444 | $ mv linux-2.6.12-mm1 linux-2.6.13-rc3-mm3 # rename the source dir |
434 | 445 | ||
435 | 446 | ||
436 | This concludes this list of explanations of the various kernel trees and I | 447 | This concludes this list of explanations of the various kernel trees. |
437 | hope you are now crystal clear on how to apply the various patches and help | 448 | I hope you are now clear on how to apply the various patches and help testing |
438 | testing the kernel. | 449 | the kernel. |
450 | |||
451 | Thank you's to Randy Dunlap, Rolf Eike Beer, Linus Torvalds, Bodo Eggert, | ||
452 | Johannes Stezenbach, Grant Coady, Pavel Machek and others that I may have | ||
453 | forgotten for their reviews and contributions to this document. | ||
439 | 454 | ||
diff --git a/Documentation/block/barrier.txt b/Documentation/block/barrier.txt new file mode 100644 index 000000000000..03971518b222 --- /dev/null +++ b/Documentation/block/barrier.txt | |||
@@ -0,0 +1,271 @@ | |||
1 | I/O Barriers | ||
2 | ============ | ||
3 | Tejun Heo <htejun@gmail.com>, July 22 2005 | ||
4 | |||
5 | I/O barrier requests are used to guarantee ordering around the barrier | ||
6 | requests. Unless you're crazy enough to use disk drives for | ||
7 | implementing synchronization constructs (wow, sounds interesting...), | ||
8 | the ordering is meaningful only for write requests for things like | ||
9 | journal checkpoints. All requests queued before a barrier request | ||
10 | must be finished (made it to the physical medium) before the barrier | ||
11 | request is started, and all requests queued after the barrier request | ||
12 | must be started only after the barrier request is finished (again, | ||
13 | made it to the physical medium). | ||
14 | |||
15 | In other words, I/O barrier requests have the following two properties. | ||
16 | |||
17 | 1. Request ordering | ||
18 | |||
19 | Requests cannot pass the barrier request. Preceding requests are | ||
20 | processed before the barrier and following requests after. | ||
21 | |||
22 | Depending on what features a drive supports, this can be done in one | ||
23 | of the following three ways. | ||
24 | |||
25 | i. For devices which have queue depth greater than 1 (TCQ devices) and | ||
26 | support ordered tags, block layer can just issue the barrier as an | ||
27 | ordered request and the lower level driver, controller and drive | ||
28 | itself are responsible for making sure that the ordering constraint is | |
29 | met. Most modern SCSI controllers/drives should support this. | ||
30 | |||
31 | NOTE: SCSI ordered tag isn't currently used due to a limitation in the | |
32 | SCSI midlayer, see the following random notes section. | ||
33 | |||
34 | ii. For devices which have queue depth greater than 1 but don't | ||
35 | support ordered tags, block layer ensures that the requests preceding | ||
36 | a barrier request finish before issuing the barrier request. Also, | |
37 | it defers requests following the barrier until the barrier request is | ||
38 | finished. Older SCSI controllers/drives and SATA drives fall in this | ||
39 | category. | ||
40 | |||
41 | iii. Devices which have queue depth of 1. This is a degenerate case | ||
42 | of ii. Just keeping issue order suffices. Ancient SCSI | ||
43 | controllers/drives and IDE drives are in this category. | ||
44 | |||
45 | 2. Forced flushing to physical medium | |
46 | |||
47 | Again, if you're not gonna do synchronization with disk drives (dang, | ||
48 | it sounds even more appealing now!), the reason you use I/O barriers | ||
49 | is mainly to protect filesystem integrity when power failure or some | ||
50 | other events abruptly stop the drive from operating and possibly make | ||
51 | the drive lose data in its cache. So, I/O barriers need to guarantee | ||
52 | that requests actually get written to non-volatile medium in order. | ||
53 | |||
54 | There are four cases: | |
55 | |||
56 | i. No write-back cache. Keeping requests ordered is enough. | ||
57 | |||
58 | ii. Write-back cache but no flush operation. There's no way to | ||
59 | guarantee physical-medium commit order. This kind of device can't do | |
60 | I/O barriers. | ||
61 | |||
62 | iii. Write-back cache and flush operation but no FUA (forced unit | ||
63 | access). We need two cache flushes - before and after the barrier | ||
64 | request. | ||
65 | |||
66 | iv. Write-back cache, flush operation and FUA. We still need one | ||
67 | flush to make sure requests preceding a barrier are written to medium, | ||
68 | but post-barrier flush can be avoided by using FUA write on the | ||
69 | barrier itself. | ||
70 | |||
71 | |||
72 | How to support barrier requests in drivers | ||
73 | ------------------------------------------ | ||
74 | |||
75 | All barrier handling is done inside block layer proper. All low level | ||
76 | drivers have to do is implement their prepare_flush_fn and use one of | |
77 | the following two functions to indicate what barrier type they support | |
78 | and how to prepare flush requests. Note that the term 'ordered' is | ||
79 | used to indicate the whole sequence of performing barrier requests | ||
80 | including draining and flushing. | ||
81 | |||
82 | typedef void (prepare_flush_fn)(request_queue_t *q, struct request *rq); | ||
83 | |||
84 | int blk_queue_ordered(request_queue_t *q, unsigned ordered, | ||
85 |                       prepare_flush_fn *prepare_flush_fn, | |
86 |                       unsigned gfp_mask); | |
87 | |||
88 | int blk_queue_ordered_locked(request_queue_t *q, unsigned ordered, | ||
89 |                              prepare_flush_fn *prepare_flush_fn, | |
90 |                              unsigned gfp_mask); | |
91 | |||
92 | The only difference between the two functions is whether or not the | ||
93 | caller is holding q->queue_lock on entry. The latter expects the | ||
94 | caller to be holding the lock. | |
95 | |||
96 | @q : the queue in question | ||
97 | @ordered : the ordered mode the driver/device supports | ||
98 | @prepare_flush_fn : this function should prepare @rq such that it | ||
99 | flushes cache to physical medium when executed | ||
100 | @gfp_mask : gfp_mask used when allocating data structures | ||
101 | for ordered processing | ||
102 | |||
103 | For example, SCSI disk driver's prepare_flush_fn looks like the | ||
104 | following. | ||
105 | |||
106 | static void sd_prepare_flush(request_queue_t *q, struct request *rq) | ||
107 | { | ||
108 |         memset(rq->cmd, 0, sizeof(rq->cmd)); | |
109 |         rq->flags |= REQ_BLOCK_PC; | |
110 |         rq->timeout = SD_TIMEOUT; | |
111 |         rq->cmd[0] = SYNCHRONIZE_CACHE; | |
112 | } | ||
113 | |||
114 | The following seven ordered modes are supported. The following table | ||
115 | shows which mode should be used depending on what features a | ||
116 | device/driver supports. In the leftmost column of the table, the | |
117 | QUEUE_ORDERED_ prefix is omitted from the mode names to save space. | |
118 | |||
119 | The table is followed by a description of each mode. Note that in the | |
120 | descriptions of QUEUE_ORDERED_DRAIN*, '=>' is used whereas '->' is | ||
121 | used for QUEUE_ORDERED_TAG* descriptions. '=>' indicates that the | ||
122 | preceding step must be complete before proceeding to the next step. | ||
123 | '->' indicates that the next step can start as soon as the previous | ||
124 | step is issued. | ||
125 | |||
126 |                 write-back cache    ordered tag    flush    FUA | |
127 | ----------------------------------------------------------------------- | |
128 | NONE            yes/no              N/A            no       N/A | |
129 | DRAIN           no                  no             N/A      N/A | |
130 | DRAIN_FLUSH     yes                 no             yes      no | |
131 | DRAIN_FUA       yes                 no             yes      yes | |
132 | TAG             no                  yes            N/A      N/A | |
133 | TAG_FLUSH       yes                 yes            yes      no | |
134 | TAG_FUA         yes                 yes            yes      yes | |
135 | |||
136 | |||
137 | QUEUE_ORDERED_NONE | ||
138 | I/O barriers are not needed and/or supported. | ||
139 | |||
140 | Sequence: N/A | ||
141 | |||
142 | QUEUE_ORDERED_DRAIN | ||
143 | Requests are ordered by draining the request queue and cache | ||
144 | flushing isn't needed. | ||
145 | |||
146 | Sequence: drain => barrier | ||
147 | |||
148 | QUEUE_ORDERED_DRAIN_FLUSH | ||
149 | Requests are ordered by draining the request queue and both | ||
150 | pre-barrier and post-barrier cache flushings are needed. | ||
151 | |||
152 | Sequence: drain => preflush => barrier => postflush | ||
153 | |||
154 | QUEUE_ORDERED_DRAIN_FUA | ||
155 | Requests are ordered by draining the request queue and | ||
156 | pre-barrier cache flushing is needed. By using FUA on barrier | ||
157 | request, post-barrier flushing can be skipped. | ||
158 | |||
159 | Sequence: drain => preflush => barrier | ||
160 | |||
161 | QUEUE_ORDERED_TAG | ||
162 | Requests are ordered by ordered tag and cache flushing isn't | ||
163 | needed. | ||
164 | |||
165 | Sequence: barrier | ||
166 | |||
167 | QUEUE_ORDERED_TAG_FLUSH | ||
168 | Requests are ordered by ordered tag and both pre-barrier and | ||
169 | post-barrier cache flushings are needed. | ||
170 | |||
171 | Sequence: preflush -> barrier -> postflush | ||
172 | |||
173 | QUEUE_ORDERED_TAG_FUA | ||
174 | Requests are ordered by ordered tag and pre-barrier cache | ||
175 | flushing is needed. By using FUA on barrier request, | ||
176 | post-barrier flushing can be skipped. | ||
177 | |||
178 | Sequence: preflush -> barrier | ||
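The mode table above can be read as a small decision procedure. The
following is only a sketch of one way to read it: the enum and struct
names are made up for illustration and are not the kernel's
QUEUE_ORDERED_* definitions, and it assumes that a device without a
write-back cache never needs flushing (the N/A entries in the table).

```c
#include <stdbool.h>

/* Illustrative stand-ins for the QUEUE_ORDERED_* modes in the table;
 * these are NOT the kernel's own definitions. */
enum ordered_mode {
	ORDERED_NONE,
	ORDERED_DRAIN,
	ORDERED_DRAIN_FLUSH,
	ORDERED_DRAIN_FUA,
	ORDERED_TAG,
	ORDERED_TAG_FLUSH,
	ORDERED_TAG_FUA,
};

/* One flag per table column (hypothetical struct). */
struct dev_features {
	bool write_back_cache;
	bool ordered_tag;
	bool flush;
	bool fua;
};

/* Pick an ordered mode by walking the table rows. */
enum ordered_mode pick_ordered_mode(const struct dev_features *f)
{
	/* No write-back cache: keeping requests ordered is enough. */
	if (!f->write_back_cache)
		return f->ordered_tag ? ORDERED_TAG : ORDERED_DRAIN;

	/* Write-back cache but no flush: barriers can't be supported. */
	if (!f->flush)
		return ORDERED_NONE;

	/* Write-back cache with flush: FUA saves the post-barrier flush. */
	if (f->ordered_tag)
		return f->fua ? ORDERED_TAG_FUA : ORDERED_TAG_FLUSH;
	return f->fua ? ORDERED_DRAIN_FUA : ORDERED_DRAIN_FLUSH;
}
```

A driver would pass the resulting mode as the @ordered argument described
earlier. Note that, per the notes below, the SCSI layer cannot currently
use the TAG modes even when the hardware supports ordered tags.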
179 | |||
180 | |||
181 | Random notes/caveats | ||
182 | -------------------- | ||
183 | |||
184 | * SCSI layer currently can't use TAG ordering even if the drive, | ||
185 | controller and driver support it. The problem is that SCSI midlayer | ||
186 | request dispatch function is not atomic. It releases the queue lock and | |
187 | switches to the SCSI host lock during issue, so it's possible, and in | |
188 | time likely, that requests change their relative positions. Once | |
189 | this problem is solved, TAG ordering can be enabled. | ||
190 | |||
191 | * Currently, no matter which ordered mode is used, there can be only | ||
192 | one barrier request in progress. All I/O barriers are held off by | ||
193 | block layer until the previous I/O barrier is complete. This doesn't | ||
194 | make any difference for DRAIN ordered devices, but, for TAG ordered | ||
195 | devices with very high command latency, passing multiple I/O barriers | ||
196 | to low level *might* be helpful if they are very frequent. Well, this | ||
197 | certainly is a non-issue. I'm writing this just to make clear that no | ||
198 | two I/O barriers are ever passed to the low-level driver at once. | |
199 | |||
200 | * Completion order. Requests in ordered sequence are issued in order | ||
201 | but not required to finish in order. Barrier implementation can | ||
202 | handle out-of-order completion of ordered sequence. IOW, the requests | ||
203 | MUST be processed in order but the hardware/software completion paths | ||
204 | are allowed to reorder completion notifications - e.g., the current SCSI | |
205 | midlayer doesn't preserve completion order during error handling. | ||
206 | |||
207 | * Requeueing order. Low-level drivers are free to requeue any request | ||
208 | after they removed it from the request queue with | ||
209 | blkdev_dequeue_request(). As barrier sequence should be kept in order | ||
210 | when requeued, generic elevator code takes care of putting requests in | ||
211 | order around barrier. See blk_ordered_req_seq() and | ||
212 | ELEVATOR_INSERT_REQUEUE handling in __elv_add_request() for details. | ||
213 | |||
214 | Note that block drivers must not requeue preceding requests while | ||
215 | completing latter requests in an ordered sequence. Currently, no | ||
216 | error checking is done against this. | ||
217 | |||
218 | * Error handling. Currently, block layer will report error to upper | ||
219 | layer if any of requests in an ordered sequence fails. Unfortunately, | ||
220 | this doesn't seem to be enough. Look at the following request flow. | ||
221 | QUEUE_ORDERED_TAG_FLUSH is in use. | ||
222 | |||
223 | [0] [1] [2] [3] [pre] [barrier] [post] < [4] [5] [6] ... > | ||
224 | still in elevator | ||
225 | |||
226 | Let's say request [2], [3] are write requests to update file system | ||
227 | metadata (journal or whatever) and [barrier] is used to mark that | ||
228 | those updates are valid. Consider the following sequence. | ||
229 | |||
230 | i. Requests [0] ~ [post] leave the request queue and enter the | |
231 | low-level driver. | ||
232 | ii. After a while, unfortunately, something goes wrong and the | ||
233 | drive fails [2]. Note that any of [0], [1] and [3] could have | ||
234 | completed by this time, but [pre] couldn't have been finished | ||
235 | as the drive must process it in order and it failed before | ||
236 | processing that command. | ||
237 | iii. Error handling kicks in and determines that the error is | ||
238 | unrecoverable and fails [2], and resumes operation. | ||
239 | iv. [pre] [barrier] [post] get processed. | |
240 | v. *BOOM* power fails | ||
241 | |||
242 | The problem here is that the barrier request is *supposed* to indicate | ||
243 | that filesystem update requests [2] and [3] made it safely to the | ||
244 | physical medium and, if the machine crashes after the barrier is | ||
245 | written, filesystem recovery code can depend on that. Sadly, that | ||
246 | isn't true in this case anymore. IOW, the success of an I/O barrier | |
247 | should also be dependent on success of some of the preceding requests, | ||
248 | where only upper layer (filesystem) knows what 'some' is. | ||
249 | |||
250 | This can be solved by implementing a way to tell the block layer which | ||
251 | requests affect the success of the following barrier request and | ||
252 | making lower level drivers resume operation on error only after | |
253 | block layer tells them to do so. | |
254 | |||
255 | As the probability of this happening is very low and the drive should | ||
256 | be faulty, implementing the fix is probably overkill. But, still, | |
257 | it's there. | ||
258 | |||
259 | * In previous drafts of barrier implementation, there was a fallback | |
260 | mechanism such that, if FUA or ordered TAG fails, a less fancy ordered | |
261 | mode can be selected and the failed barrier request is retried | ||
262 | automatically. The rationale for this feature was that as FUA is | ||
263 | pretty new in ATA world and ordered tag was never used widely, there | ||
264 | could be devices which claim to support those features but choke when | |
265 | actually given such requests. | ||
266 | |||
267 | This was removed for two reasons: 1. it's overkill 2. it's | |
268 | impossible to implement properly when TAG ordering is used as low | |
269 | level drivers resume after an error automatically. If it's ever | |
270 | needed, adding it back and modifying low level drivers accordingly | |
271 | shouldn't be difficult. | ||
diff --git a/Documentation/block/stat.txt b/Documentation/block/stat.txt new file mode 100644 index 000000000000..0dbc946de2ea --- /dev/null +++ b/Documentation/block/stat.txt | |||
@@ -0,0 +1,82 @@ | |||
1 | Block layer statistics in /sys/block/<dev>/stat | ||
2 | =============================================== | ||
3 | |||
4 | This file documents the contents of the /sys/block/<dev>/stat file. | ||
5 | |||
6 | The stat file provides several statistics about the state of block | ||
7 | device <dev>. | ||
8 | |||
9 | Q. Why are there multiple statistics in a single file? Doesn't sysfs | ||
10 | normally contain a single value per file? | ||
11 | A. By having a single file, the kernel can guarantee that the statistics | ||
12 | represent a consistent snapshot of the state of the device. If the | ||
13 | statistics were exported as multiple files containing one statistic | ||
14 | each, it would be impossible to guarantee that a set of readings | ||
15 | represent a single point in time. | ||
16 | |||
17 | The stat file consists of a single line of text containing 11 decimal | ||
18 | values separated by whitespace. The fields are summarized in the | ||
19 | following table, and described in more detail below. | ||
20 | |||
21 | Name            units         description | |
22 | ----            -----         ----------- | |
23 | read I/Os       requests      number of read I/Os processed | |
24 | read merges     requests      number of read I/Os merged with in-queue I/O | |
25 | read sectors    sectors       number of sectors read | |
26 | read ticks      milliseconds  total wait time for read requests | |
27 | write I/Os      requests      number of write I/Os processed | |
28 | write merges    requests      number of write I/Os merged with in-queue I/O | |
29 | write sectors   sectors       number of sectors written | |
30 | write ticks     milliseconds  total wait time for write requests | |
31 | in_flight       requests      number of I/Os currently in flight | |
32 | io_ticks        milliseconds  total time this block device has been active | |
33 | time_in_queue   milliseconds  total wait time for all requests | |
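
Since the file is a single line of 11 whitespace-separated decimal values,
a userspace reader can pull all of them out with one sscanf() call. The
sketch below is illustrative only: the struct and function names are
invented here and mirror the table order, they are not kernel definitions.

```c
#include <stdio.h>

/* The 11 fields of the stat file, in file order.
 * Struct and member names are illustrative only. */
struct blk_stat {
	unsigned long long read_ios, read_merges, read_sectors, read_ticks;
	unsigned long long write_ios, write_merges, write_sectors, write_ticks;
	unsigned long long in_flight, io_ticks, time_in_queue;
};

/* Parse one stat line. Returns 0 on success, -1 if the line does not
 * contain all 11 values. */
int parse_blk_stat(const char *line, struct blk_stat *s)
{
	int n = sscanf(line,
		       "%llu %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu",
		       &s->read_ios, &s->read_merges,
		       &s->read_sectors, &s->read_ticks,
		       &s->write_ios, &s->write_merges,
		       &s->write_sectors, &s->write_ticks,
		       &s->in_flight, &s->io_ticks, &s->time_in_queue);

	return n == 11 ? 0 : -1;
}
```

A caller would typically read the line with fgets() from the stat file of
the device of interest and hand it to parse_blk_stat(). Reading the whole
line in one go is what preserves the consistent-snapshot property
described above.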
34 | |||
35 | read I/Os, write I/Os | ||
36 | ===================== | ||
37 | |||
38 | These values increment when an I/O request completes. | ||
39 | |||
40 | read merges, write merges | ||
41 | ========================= | ||
42 | |||
43 | These values increment when an I/O request is merged with an | ||
44 | already-queued I/O request. | ||
45 | |||
46 | read sectors, write sectors | ||
47 | =========================== | ||
48 | |||
49 | These values count the number of sectors read from or written to this | ||
50 | block device. The "sectors" in question are the standard UNIX 512-byte | ||
51 | sectors, not any device- or filesystem-specific block size. The | ||
52 | counters are incremented when the I/O completes. | ||
53 | |||
54 | read ticks, write ticks | ||
55 | ======================= | ||
56 | |||
57 | These values count the number of milliseconds that I/O requests have | ||
58 | waited on this block device. If there are multiple I/O requests waiting, | ||
59 | these values will increase at a rate greater than 1000/second; for | ||
60 | example, if 60 read requests wait for an average of 30 ms, the read ticks | |
61 | field will increase by 60*30 = 1800. | ||
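
Turning the example around, an average wait per request can be estimated
from two samples of the file. This is a rough sketch under stated
assumptions: the function name is made up, both counters are taken from
the same stat line at each sample, and counter wrap-around is ignored.

```c
/* Estimate the average wait per completed request, in milliseconds,
 * from two samples of a ticks counter and the matching I/O counter.
 * Returns 0.0 if no request completed between the samples. */
double avg_wait_ms(unsigned long long ticks_before,
		   unsigned long long ticks_after,
		   unsigned long long ios_before,
		   unsigned long long ios_after)
{
	unsigned long long d_ios = ios_after - ios_before;

	if (d_ios == 0)
		return 0.0;
	return (double)(ticks_after - ticks_before) / (double)d_ios;
}
```

With the numbers from the example, 60 more read I/Os and 1800 more read
ticks between two samples work out to an average of 30 ms per request.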
62 | |||
63 | in_flight | ||
64 | ========= | ||
65 | |||
66 | This value counts the number of I/O requests that have been issued to | ||
67 | the device driver but have not yet completed. It does not include I/O | ||
68 | requests that are in the queue but not yet issued to the device driver. | ||
69 | |||
70 | io_ticks | ||
71 | ======== | ||
72 | |||
73 | This value counts the number of milliseconds during which the device has | ||
74 | had I/O requests queued. | ||
75 | |||
76 | time_in_queue | ||
77 | ============= | ||
78 | |||
79 | This value counts the number of milliseconds that I/O requests have waited | ||
80 | on this block device. If there are multiple I/O requests waiting, this | ||
81 | value will increase as the product of the number of milliseconds times the | ||
82 | number of requests waiting (see "read ticks" above for an example). | ||
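
The field list above can be read back programmatically. The following is a minimal userspace sketch, assuming the stat file holds the eleven counters on one whitespace-separated line in the order of the table; the struct and function names are hypothetical and exist only for this illustration.

```c
#include <stdio.h>

/* Hypothetical container; members follow the order of the table above. */
struct blk_stat {
	unsigned long long read_ios, read_merges, read_sectors, read_ticks;
	unsigned long long write_ios, write_merges, write_sectors, write_ticks;
	unsigned long long in_flight, io_ticks, time_in_queue;
};

/* Parse one stat line. Returns 0 if all eleven fields were read, -1 otherwise. */
static int parse_blk_stat(const char *line, struct blk_stat *s)
{
	int n = sscanf(line,
		       "%llu %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu",
		       &s->read_ios, &s->read_merges, &s->read_sectors,
		       &s->read_ticks, &s->write_ios, &s->write_merges,
		       &s->write_sectors, &s->write_ticks, &s->in_flight,
		       &s->io_ticks, &s->time_in_queue);

	return n == 11 ? 0 : -1;
}

/* Sectors are always 512 bytes here, whatever the device block size is. */
static unsigned long long sectors_to_bytes(unsigned long long sectors)
{
	return sectors * 512ULL;
}

/* Average wait per completed read in ms; 0 if no read has completed yet. */
static unsigned long long avg_read_wait_ms(const struct blk_stat *s)
{
	return s->read_ios ? s->read_ticks / s->read_ios : 0;
}
```

With the "read ticks" example above (60 completed reads, read ticks of 1800), avg_read_wait_ms() gives back the 30 ms average. Since the counters only ever accumulate, a monitoring tool would normally sample twice and work on the differences.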
diff --git a/Documentation/cachetlb.txt b/Documentation/cachetlb.txt index 7eb715e07eda..4ae418889b88 100644 --- a/Documentation/cachetlb.txt +++ b/Documentation/cachetlb.txt | |||
@@ -136,7 +136,7 @@ changes occur: | |||
136 | 8) void lazy_mmu_prot_update(pte_t pte) | 136 | 8) void lazy_mmu_prot_update(pte_t pte) |
137 | This interface is called whenever the protection on | 137 | This interface is called whenever the protection on |
138 | any user PTEs change. This interface provides a notification | 138 | any user PTEs change. This interface provides a notification |
139 | to architecture specific code to take appropiate action. | 139 | to architecture specific code to take appropriate action. |
140 | 140 | ||
141 | 141 | ||
142 | Next, we have the cache flushing interfaces. In general, when Linux | 142 | Next, we have the cache flushing interfaces. In general, when Linux |
diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt new file mode 100644 index 000000000000..08c5d04f3086 --- /dev/null +++ b/Documentation/cpu-hotplug.txt | |||
@@ -0,0 +1,357 @@ | |||
1 | CPU hotplug Support in Linux(tm) Kernel | ||
2 | |||
3 | Maintainers: | ||
4 | CPU Hotplug Core: | ||
5 | Rusty Russell <rusty@rustycorp.com.au> | ||
6 | Srivatsa Vaddagiri <vatsa@in.ibm.com> | ||
7 | i386: | ||
8 | Zwane Mwaikambo <zwane@arm.linux.org.uk> | ||
9 | ppc64: | ||
10 | Nathan Lynch <nathanl@austin.ibm.com> | ||
11 | Joel Schopp <jschopp@austin.ibm.com> | ||
12 | ia64/x86_64: | ||
13 | Ashok Raj <ashok.raj@intel.com> | ||
14 | |||
15 | Authors: Ashok Raj <ashok.raj@intel.com> | ||
16 | Lots of feedback: Nathan Lynch <nathanl@austin.ibm.com>, | ||
17 | Joel Schopp <jschopp@austin.ibm.com> | ||
18 | |||
19 | Introduction | ||
20 | |||
21 | Modern advances in system architectures have introduced advanced error | ||
22 | reporting and correction capabilities in processors. CPU architectures permit | ||
23 | partitioning support, where compute resources of a single CPU could be made | ||
24 | available to virtual machine environments. There are a couple of OEMs that | ||
25 | support NUMA hardware which is hot-pluggable as well, where physical | ||
26 | node insertion and removal require support for CPU hotplug. | ||
27 | |||
28 | Such advances require CPUs available to a kernel to be removed either for | ||
29 | provisioning reasons, or for RAS purposes to keep an offending CPU off the | ||
30 | system execution path. Hence the need for CPU hotplug support in the | ||
31 | Linux kernel. | ||
32 | |||
33 | A more novel use of CPU hotplug support is its use today in suspend/resume | ||
34 | support for SMP. Dual-core and HT support makes even | ||
35 | a laptop run SMP kernels which didn't support these methods. SMP support | ||
36 | for suspend/resume is a work in progress. | ||
37 | |||
38 | General Stuff about CPU Hotplug | ||
39 | -------------------------------- | ||
40 | |||
41 | Command Line Switches | ||
42 | --------------------- | ||
43 | maxcpus=n Restrict boot time cpus to n. For example, if you have 4 cpus, using | ||
44 | maxcpus=2 will only boot 2. You can choose to bring the | ||
45 | other cpus online later; read the FAQ for more info. | ||
46 | |||
47 | additional_cpus=n [x86_64 only] use this to limit hotpluggable cpus. | ||
48 | This option sets | ||
49 | cpu_possible_map = cpu_present_map + additional_cpus | ||
50 | |||
51 | CPU maps and such | ||
52 | ----------------- | ||
53 | [For more on cpumaps and the primitives to manipulate them, please check | ||
54 | include/linux/cpumask.h, which has more descriptive text.] | ||
55 | |||
56 | cpu_possible_map: Bitmap of possible CPUs that can ever be available in the | ||
57 | system. This is used to allocate some boot time memory for per_cpu variables | ||
58 | that aren't designed to grow/shrink as CPUs are made available or removed. | ||
59 | Once set during boot time discovery phase, the map is static, i.e. no bits | ||
60 | are added or removed at any time. Trimming it accurately for your system needs | ||
61 | upfront can save some boot time memory. See below for how we use heuristics | ||
62 | in x86_64 case to keep this under check. | ||
63 | |||
64 | cpu_online_map: Bitmap of all CPUs currently online. It is set in __cpu_up() | ||
65 | after a cpu is available for kernel scheduling and ready to receive | ||
66 | interrupts from devices. It is cleared when a cpu is brought down using | ||
67 | __cpu_disable(), before which all OS services including interrupts are | ||
68 | migrated to another target CPU. | ||
69 | |||
70 | cpu_present_map: Bitmap of CPUs currently present in the system. Not all | ||
71 | of them may be online. When physical hotplug is processed by the relevant | ||
72 | subsystem (e.g. ACPI), the map can change: a bit is either added to or removed | ||
73 | from the map, depending on whether the event is hot-add or hot-remove. There are | ||
74 | currently no locking rules. Typical usage is to init topology during boot, | ||
75 | at which time hotplug is disabled. | ||
76 | |||
77 | You really don't need to manipulate any of the system cpu maps. They should | ||
78 | be read-only for most uses. When setting up per-cpu resources, almost always use | ||
79 | cpu_possible_map/for_each_cpu() to iterate. | ||
80 | |||
81 | Never use anything other than cpumask_t to represent a bitmap of CPUs. | ||
82 | |||
83 | #include <linux/cpumask.h> | ||
84 | |||
85 | for_each_cpu - Iterate over cpu_possible_map | ||
86 | for_each_online_cpu - Iterate over cpu_online_map | ||
87 | for_each_present_cpu - Iterate over cpu_present_map | ||
88 | for_each_cpu_mask(x,mask) - Iterate over the cpus set in an arbitrary cpu mask. | ||
89 | |||
90 | #include <linux/cpu.h> | ||
91 | lock_cpu_hotplug() and unlock_cpu_hotplug(): | ||
92 | |||
93 | The above calls are used to inhibit cpu hotplug operations. While holding the | ||
94 | cpucontrol mutex, cpu_online_map will not change. If you merely need to avoid | ||
95 | cpus going away, you could also use preempt_disable() and preempt_enable() | ||
96 | for those sections. Just remember the critical section cannot call any | ||
97 | function that can sleep or schedule this process away. The preempt_disable() | ||
98 | will work as long as stop_machine_run() is used to take a cpu down. | ||
99 | |||
100 | CPU Hotplug - Frequently Asked Questions. | ||
101 | |||
102 | Q: How do I enable my kernel to support CPU hotplug? | ||
103 | A: When doing make defconfig, enable CPU hotplug support | ||
104 | |||
105 | "Processor type and Features" -> Support for Hotpluggable CPUs | ||
106 | |||
107 | Make sure that you have CONFIG_HOTPLUG, and CONFIG_SMP turned on as well. | ||
108 | |||
109 | You would need to enable CONFIG_HOTPLUG_CPU for SMP suspend/resume support | ||
110 | as well. | ||
111 | |||
112 | Q: What architectures support CPU hotplug? | ||
113 | A: As of 2.6.14, the following architectures support CPU hotplug. | ||
114 | |||
115 | i386 (Intel), ppc, ppc64, parisc, s390, ia64 and x86_64 | ||
116 | |||
117 | Q: How to test if hotplug is supported on the newly built kernel? | ||
118 | A: You should now notice an entry in sysfs. | ||
119 | |||
120 | Check if sysfs is mounted, using the "mount" command. You should notice | ||
121 | an entry as shown below in the output. | ||
122 | |||
123 | .... | ||
124 | none on /sys type sysfs (rw) | ||
125 | .... | ||
126 | |||
127 | If this is not mounted, do the following. | ||
128 | |||
129 | #mkdir /sys | ||
130 | #mount -t sysfs sys /sys | ||
131 | |||
132 | Now you should see entries for all present cpus; the following is an example | ||
133 | from an 8-way system. | ||
134 | |||
135 | #pwd | ||
136 | #/sys/devices/system/cpu | ||
137 | #ls -l | ||
138 | total 0 | ||
139 | drwxr-xr-x 10 root root 0 Sep 19 07:44 . | ||
140 | drwxr-xr-x 13 root root 0 Sep 19 07:45 .. | ||
141 | drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu0 | ||
142 | drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu1 | ||
143 | drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu2 | ||
144 | drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu3 | ||
145 | drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu4 | ||
146 | drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu5 | ||
147 | drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu6 | ||
148 | drwxr-xr-x 3 root root 0 Sep 19 07:48 cpu7 | ||
149 | |||
150 | Under each directory you would find an "online" file which is the control | ||
151 | file to logically online/offline a processor. | ||
152 | |||
153 | Q: Does hot-add/hot-remove refer to physical add/remove of cpus? | ||
154 | A: The terms hot-add/remove are not used very consistently in the code. | ||
155 | CONFIG_HOTPLUG_CPU enables logical online/offline capability in the kernel. | ||
156 | To support physical addition/removal, one would need some BIOS hooks and | ||
157 | the platform should have something like an attention button in PCI hotplug. | ||
158 | CONFIG_ACPI_HOTPLUG_CPU enables ACPI support for physical add/remove of CPUs. | ||
159 | |||
160 | Q: How do I logically offline a CPU? | ||
161 | A: Do the following. | ||
162 | |||
163 | #echo 0 > /sys/devices/system/cpu/cpuX/online | ||
164 | |||
165 | once the logical offline is successful, check | ||
166 | |||
167 | #cat /proc/interrupts | ||
168 | |||
169 | You should now not see the CPU that you removed. Also, the online file will report | ||
170 | the state as 0 when a cpu is offline and 1 when it is online. | ||
171 | |||
172 | #To display the current cpu state. | ||
173 | #cat /sys/devices/system/cpu/cpuX/online | ||
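
The same control file can be driven from a program instead of the shell. The following is a minimal userspace sketch, assuming the sysfs layout shown above; the helper names are hypothetical, and writing the file needs the same privileges as the "echo" command.

```c
#include <stdio.h>
#include <string.h>

/* Build "/sys/devices/system/cpu/cpuN/online" in buf. Returns 0 if it fit. */
static int cpu_online_path(char *buf, size_t len, unsigned int cpu)
{
	int n = snprintf(buf, len, "/sys/devices/system/cpu/cpu%u/online", cpu);

	return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Interpret what was read from the file: 1 online, 0 offline, -1 unknown. */
static int parse_online_state(const char *text)
{
	if (strncmp(text, "1", 1) == 0)
		return 1;
	if (strncmp(text, "0", 1) == 0)
		return 0;
	return -1;
}

/*
 * Write "1" or "0" to the control file at path. Returns 0 on success and
 * -1 on failure. stdio buffers the write, so a refusal by the kernel may
 * only surface at fclose(); both results are therefore checked.
 */
static int set_cpu_online(const char *path, int online)
{
	FILE *f = fopen(path, "w");
	int rc;

	if (!f)
		return -1;
	rc = (fputs(online ? "1" : "0", f) == EOF) ? -1 : 0;
	if (fclose(f) == EOF)
		rc = -1;
	return rc;
}
```

A caller would build the path with cpu_online_path(), call set_cpu_online(path, 0), and treat a -1 return like a failing "echo": the cpu may not be removable, or the file may be absent.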
174 | |||
175 | Q: Why can't I remove CPU0 on some systems? | ||
176 | A: Some architectures may have some special dependency on a certain CPU. | ||
177 | |||
178 | For example, on IA64 platforms we have the ability to send platform interrupts to the | ||
179 | OS, a.k.a. Corrected Platform Error Interrupts (CPEI). In current ACPI | ||
180 | specifications, we didn't have a way to change the target CPU. Hence if the | ||
181 | current ACPI version doesn't support such re-direction, we disable that CPU | ||
182 | by making it not-removable. | ||
183 | |||
184 | In such cases you will also notice that the online file is missing under cpu0. | ||
185 | |||
186 | Q: How do I find out if a particular CPU is not removable? | ||
187 | A: Depending on the implementation, some architectures may show this by the | ||
188 | absence of the "online" file. This is done if it can be determined ahead of | ||
189 | time that this CPU cannot be removed. | ||
190 | |||
191 | In some situations, this can be a run time check, i.e. if you try to remove the | ||
192 | last CPU, this will not be permitted. You can find such failures by | ||
193 | investigating the return value of the "echo" command. | ||
194 | |||
195 | Q: What happens when a CPU is being logically offlined? | ||
196 | A: The following happen, listed in no particular order :-) | ||
197 | |||
198 | - A notification is sent to in-kernel registered modules by sending an event | ||
199 | CPU_DOWN_PREPARE | ||
200 | - All processes are migrated away from this outgoing CPU to a new CPU | ||
201 | - All interrupts targeted to this CPU are migrated to a new CPU | ||
202 | - timers/bottom halves/tasklets are also migrated to a new CPU | ||
203 | - Once all services are migrated, kernel calls an arch specific routine | ||
204 | __cpu_disable() to perform arch specific cleanup. | ||
205 | - Once this is successful, a CPU_DEAD event is sent to signal successful | ||
206 | cleanup. | ||
207 | |||
208 | "It is expected that each service cleans up when the CPU_DOWN_PREPARE | ||
209 | notifier is called; when CPU_DEAD is called it is expected there is nothing | ||
210 | running on behalf of this CPU that was offlined" | ||
211 | |||
212 | Q: If I have some kernel code that needs to be aware of CPU arrival and | ||
213 | departure, how do I arrange for proper notification? | ||
214 | A: This is what you would need in your kernel code to receive notifications. | ||
215 | |||
216 | #include <linux/cpu.h> | ||
217 | static int __cpuinit foobar_cpu_callback(struct notifier_block *nfb, | ||
218 | unsigned long action, void *hcpu) | ||
219 | { | ||
220 | unsigned int cpu = (unsigned long)hcpu; | ||
221 | |||
222 | switch (action) { | ||
223 | case CPU_ONLINE: | ||
224 | foobar_online_action(cpu); | ||
225 | break; | ||
226 | case CPU_DEAD: | ||
227 | foobar_dead_action(cpu); | ||
228 | break; | ||
229 | } | ||
230 | return NOTIFY_OK; | ||
231 | } | ||
232 | |||
233 | static struct notifier_block foobar_cpu_notifier = | ||
234 | { | ||
235 | .notifier_call = foobar_cpu_callback, | ||
236 | }; | ||
237 | |||
238 | |||
239 | In your init function, | ||
240 | |||
241 | register_cpu_notifier(&foobar_cpu_notifier); | ||
242 | |||
243 | You can fail PREPARE notifiers if something doesn't work to prepare resources. | ||
244 | This will stop the activity and send a following CANCELED event back. | ||
245 | |||
246 | CPU_DEAD should not be failed; it's just a goodness indication, but bad | ||
247 | things will happen if a notifier in path sent a BAD notify code. | ||
248 | |||
249 | Q: I don't see my action being called for all CPUs already up and running? | ||
250 | A: Yes, CPU notifiers are called only when new CPUs are on-lined or offlined. | ||
251 | If you need to perform some action for each cpu already in the system, then | ||
252 | |||
253 | for_each_online_cpu(i) { | ||
254 | foobar_cpu_callback(&foobar_cpu_notifier, CPU_UP_PREPARE, (void *)(long)i); | ||
255 | foobar_cpu_callback(&foobar_cpu_notifier, CPU_ONLINE, (void *)(long)i); | ||
256 | } | ||
257 | |||
258 | Q: If I would like to develop cpu hotplug support for a new architecture, | ||
259 | what do I need at a minimum? | ||
260 | A: The following are required for CPU hotplug infrastructure to work | ||
261 | correctly. | ||
262 | |||
263 | - Make sure you have an entry in Kconfig to enable CONFIG_HOTPLUG_CPU | ||
264 | - __cpu_up() - Arch interface to bring up a CPU | ||
265 | - __cpu_disable() - Arch interface to shutdown a CPU, no more interrupts | ||
266 | can be handled by the kernel after the routine | ||
267 | returns. This includes shutting down local APIC | ||
268 | timers etc. | ||
269 | - __cpu_die() - This is supposed to ensure death of the CPU. | ||
270 | Look at some example code in other archs | ||
271 | that implement CPU hotplug. The processor is taken | ||
272 | down from the idle() loop for that specific | ||
273 | architecture. __cpu_die() typically waits for some | ||
274 | per_cpu state to be set, to positively ensure that | ||
275 | the processor dead routine has been called. | ||
276 | |||
277 | Q: I need to ensure that a particular cpu is not removed while some | ||
278 | work specific to this cpu is in progress. | ||
279 | A: First switch the current thread context to the preferred cpu | ||
280 | |||
281 | int my_func_on_cpu(int cpu) | ||
282 | { | ||
283 | cpumask_t saved_mask, new_mask = CPU_MASK_NONE; | ||
284 | int curr_cpu, err = 0; | ||
285 | |||
286 | saved_mask = current->cpus_allowed; | ||
287 | cpu_set(cpu, new_mask); | ||
288 | err = set_cpus_allowed(current, new_mask); | ||
289 | |||
290 | if (err) | ||
291 | return err; | ||
292 | |||
293 | /* | ||
294 | * If we got scheduled out just after the return from | ||
295 | * set_cpus_allowed() before running the work, this ensures | ||
296 | * we stay locked. | ||
297 | */ | ||
298 | curr_cpu = get_cpu(); | ||
299 | |||
300 | if (curr_cpu != cpu) { | ||
301 | err = -EAGAIN; | ||
302 | goto ret; | ||
303 | } else { | ||
304 | /* | ||
305 | * Do work : But can't sleep, since get_cpu() disables preempt | ||
306 | */ | ||
307 | } | ||
308 | ret: | ||
309 | put_cpu(); | ||
310 | set_cpus_allowed(current, saved_mask); | ||
311 | return err; | ||
312 | } | ||
313 | |||
314 | |||
315 | Q: How do we determine how many CPUs are available for hotplug? | ||
316 | A: There is no clear spec-defined way from ACPI that can give us that | ||
317 | information today. Based on some input from Natalie of Unisys, | ||
318 | the ACPI MADT (Multiple APIC Description Table) marks those possible | ||
319 | CPUs in a system with disabled status. | ||
320 | |||
321 | Andi implemented some simple heuristics that count the number of disabled | ||
322 | CPUs in MADT as hotpluggable CPUs. In the case there are no disabled CPUs | ||
323 | we assume 1/2 the number of CPUs currently present can be hotplugged. | ||
324 | |||
325 | Caveat: Today's ACPI MADT can only provide 256 entries since the apicid field | ||
326 | in MADT is only 8 bits. | ||
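
The heuristic described above, combined with the cpu_possible_map formula from the additional_cpus option, can be sketched as plain arithmetic. This is an illustration only, not the kernel's actual code: the function names are hypothetical, and rounding "1/2" down with integer division is an assumption.

```c
/*
 * Number of cpus assumed to be hot-pluggable: the count of cpus marked
 * disabled in the MADT if there are any, otherwise half of the cpus
 * currently present (rounded down; the rounding is an assumption).
 */
static unsigned int hotpluggable_cpus(unsigned int present,
				      unsigned int disabled)
{
	if (disabled > 0)
		return disabled;
	return present / 2;
}

/* Size of the possible map: present cpus plus the hot-pluggable estimate. */
static unsigned int possible_cpus(unsigned int present, unsigned int disabled)
{
	return present + hotpluggable_cpus(present, disabled);
}
```

For an 8-way system with no disabled MADT entries this gives 12 possible cpus. Passing additional_cpus=n on the command line replaces the estimate with n.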
327 | |||
328 | User Space Notification | ||
329 | |||
330 | Hotplug support for devices is common in Linux today. It is being used today to | ||
331 | support automatic configuration of network, usb and pci devices. A hotplug | ||
332 | event can be used to invoke an agent script to perform the configuration task. | ||
333 | |||
334 | You can add /etc/hotplug/cpu.agent to handle hotplug notifications in user | ||
335 | space. | ||
336 | |||
337 | #!/bin/bash | ||
338 | # $Id: cpu.agent | ||
339 | # Kernel hotplug params include: | ||
340 | #ACTION=%s [online or offline] | ||
341 | #DEVPATH=%s | ||
342 | # | ||
343 | cd /etc/hotplug | ||
344 | . ./hotplug.functions | ||
345 | |||
346 | case $ACTION in | ||
347 | online) | ||
348 | echo `date` ":cpu.agent" add cpu >> /tmp/hotplug.txt | ||
349 | ;; | ||
350 | offline) | ||
351 | echo `date` ":cpu.agent" remove cpu >>/tmp/hotplug.txt | ||
352 | ;; | ||
353 | *) | ||
354 | debug_mesg CPU $ACTION event not supported | ||
355 | exit 1 | ||
356 | ;; | ||
357 | esac | ||
diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt index a09a8eb80665..990998ee10b6 100644 --- a/Documentation/cpusets.txt +++ b/Documentation/cpusets.txt | |||
@@ -14,7 +14,10 @@ CONTENTS: | |||
14 | 1.1 What are cpusets ? | 14 | 1.1 What are cpusets ? |
15 | 1.2 Why are cpusets needed ? | 15 | 1.2 Why are cpusets needed ? |
16 | 1.3 How are cpusets implemented ? | 16 | 1.3 How are cpusets implemented ? |
17 | 1.4 How do I use cpusets ? | 17 | 1.4 What are exclusive cpusets ? |
18 | 1.5 What does notify_on_release do ? | ||
19 | 1.6 What is memory_pressure ? | ||
20 | 1.7 How do I use cpusets ? | ||
18 | 2. Usage Examples and Syntax | 21 | 2. Usage Examples and Syntax |
19 | 2.1 Basic Usage | 22 | 2.1 Basic Usage |
20 | 2.2 Adding/removing cpus | 23 | 2.2 Adding/removing cpus |
@@ -49,29 +52,6 @@ its cpus_allowed vector, and the kernel page allocator will not | |||
49 | allocate a page on a node that is not allowed in the requesting tasks | 52 | allocate a page on a node that is not allowed in the requesting tasks |
50 | mems_allowed vector. | 53 | mems_allowed vector. |
51 | 54 | ||
52 | If a cpuset is cpu or mem exclusive, no other cpuset, other than a direct | ||
53 | ancestor or descendent, may share any of the same CPUs or Memory Nodes. | ||
54 | A cpuset that is cpu exclusive has a sched domain associated with it. | ||
55 | The sched domain consists of all cpus in the current cpuset that are not | ||
56 | part of any exclusive child cpusets. | ||
57 | This ensures that the scheduler load balacing code only balances | ||
58 | against the cpus that are in the sched domain as defined above and not | ||
59 | all of the cpus in the system. This removes any overhead due to | ||
60 | load balancing code trying to pull tasks outside of the cpu exclusive | ||
61 | cpuset only to be prevented by the tasks' cpus_allowed mask. | ||
62 | |||
63 | A cpuset that is mem_exclusive restricts kernel allocations for | ||
64 | page, buffer and other data commonly shared by the kernel across | ||
65 | multiple users. All cpusets, whether mem_exclusive or not, restrict | ||
66 | allocations of memory for user space. This enables configuring a | ||
67 | system so that several independent jobs can share common kernel | ||
68 | data, such as file system pages, while isolating each jobs user | ||
69 | allocation in its own cpuset. To do this, construct a large | ||
70 | mem_exclusive cpuset to hold all the jobs, and construct child, | ||
71 | non-mem_exclusive cpusets for each individual job. Only a small | ||
72 | amount of typical kernel memory, such as requests from interrupt | ||
73 | handlers, is allowed to be taken outside even a mem_exclusive cpuset. | ||
74 | |||
75 | User level code may create and destroy cpusets by name in the cpuset | 55 | User level code may create and destroy cpusets by name in the cpuset |
76 | virtual file system, manage the attributes and permissions of these | 56 | virtual file system, manage the attributes and permissions of these |
77 | cpusets and which CPUs and Memory Nodes are assigned to each cpuset, | 57 | cpusets and which CPUs and Memory Nodes are assigned to each cpuset, |
@@ -155,7 +135,7 @@ Cpusets extends these two mechanisms as follows: | |||
155 | The implementation of cpusets requires a few, simple hooks | 135 | The implementation of cpusets requires a few, simple hooks |
156 | into the rest of the kernel, none in performance critical paths: | 136 | into the rest of the kernel, none in performance critical paths: |
157 | 137 | ||
158 | - in main/init.c, to initialize the root cpuset at system boot. | 138 | - in init/main.c, to initialize the root cpuset at system boot. |
159 | - in fork and exit, to attach and detach a task from its cpuset. | 139 | - in fork and exit, to attach and detach a task from its cpuset. |
160 | - in sched_setaffinity, to mask the requested CPUs by what's | 140 | - in sched_setaffinity, to mask the requested CPUs by what's |
161 | allowed in that tasks cpuset. | 141 | allowed in that tasks cpuset. |
@@ -166,7 +146,7 @@ into the rest of the kernel, none in performance critical paths: | |||
166 | and related changes in both sched.c and arch/ia64/kernel/domain.c | 146 | and related changes in both sched.c and arch/ia64/kernel/domain.c |
167 | - in the mbind and set_mempolicy system calls, to mask the requested | 147 | - in the mbind and set_mempolicy system calls, to mask the requested |
168 | Memory Nodes by what's allowed in that tasks cpuset. | 148 | Memory Nodes by what's allowed in that tasks cpuset. |
169 | - in page_alloc, to restrict memory to allowed nodes. | 149 | - in page_alloc.c, to restrict memory to allowed nodes. |
170 | - in vmscan.c, to restrict page recovery to the current cpuset. | 150 | - in vmscan.c, to restrict page recovery to the current cpuset. |
171 | 151 | ||
172 | In addition a new file system, of type "cpuset" may be mounted, | 152 | In addition a new file system, of type "cpuset" may be mounted, |
@@ -192,9 +172,15 @@ containing the following files describing that cpuset: | |||
192 | 172 | ||
193 | - cpus: list of CPUs in that cpuset | 173 | - cpus: list of CPUs in that cpuset |
194 | - mems: list of Memory Nodes in that cpuset | 174 | - mems: list of Memory Nodes in that cpuset |
175 | - memory_migrate flag: if set, move pages to cpusets nodes | ||
195 | - cpu_exclusive flag: is cpu placement exclusive? | 176 | - cpu_exclusive flag: is cpu placement exclusive? |
196 | - mem_exclusive flag: is memory placement exclusive? | 177 | - mem_exclusive flag: is memory placement exclusive? |
197 | - tasks: list of tasks (by pid) attached to that cpuset | 178 | - tasks: list of tasks (by pid) attached to that cpuset |
179 | - notify_on_release flag: run /sbin/cpuset_release_agent on exit? | ||
180 | - memory_pressure: measure of how much paging pressure in cpuset | ||
181 | |||
182 | In addition, the root cpuset only has the following file: | ||
183 | - memory_pressure_enabled flag: compute memory_pressure? | ||
198 | 184 | ||
199 | New cpusets are created using the mkdir system call or shell | 185 | New cpusets are created using the mkdir system call or shell |
200 | command. The properties of a cpuset, such as its flags, allowed | 186 | command. The properties of a cpuset, such as its flags, allowed |
@@ -228,7 +214,108 @@ exclusive cpuset. Also, the use of a Linux virtual file system (vfs) | |||
228 | to represent the cpuset hierarchy provides for a familiar permission | 214 | to represent the cpuset hierarchy provides for a familiar permission |
229 | and name space for cpusets, with a minimum of additional kernel code. | 215 | and name space for cpusets, with a minimum of additional kernel code. |
230 | 216 | ||
231 | 1.4 How do I use cpusets ? | 217 | |
218 | 1.4 What are exclusive cpusets ? | ||
219 | -------------------------------- | ||
220 | |||
221 | If a cpuset is cpu or mem exclusive, no other cpuset, other than | ||
222 | a direct ancestor or descendent, may share any of the same CPUs or | ||
223 | Memory Nodes. | ||
224 | |||
225 | A cpuset that is cpu_exclusive has a scheduler (sched) domain | ||
226 | associated with it. The sched domain consists of all CPUs in the | ||
227 | current cpuset that are not part of any exclusive child cpusets. | ||
228 | This ensures that the scheduler load balancing code only balances | ||
229 | against the CPUs that are in the sched domain as defined above and | ||
230 | not all of the CPUs in the system. This removes any overhead due to | ||
231 | load balancing code trying to pull tasks outside of the cpu_exclusive | ||
232 | cpuset only to be prevented by the tasks' cpus_allowed mask. | ||
233 | |||
234 | A cpuset that is mem_exclusive restricts kernel allocations for | ||
235 | page, buffer and other data commonly shared by the kernel across | ||
236 | multiple users. All cpusets, whether mem_exclusive or not, restrict | ||
237 | allocations of memory for user space. This enables configuring a | ||
238 | system so that several independent jobs can share common kernel data, | ||
239 | such as file system pages, while isolating each jobs user allocation in | ||
240 | its own cpuset. To do this, construct a large mem_exclusive cpuset to | ||
241 | hold all the jobs, and construct child, non-mem_exclusive cpusets for | ||
242 | each individual job. Only a small amount of typical kernel memory, | ||
243 | such as requests from interrupt handlers, is allowed to be taken | ||
244 | outside even a mem_exclusive cpuset. | ||
245 | |||
246 | |||
247 | 1.5 What does notify_on_release do ? | ||
248 | ------------------------------------ | ||
249 | |||
250 | If the notify_on_release flag is enabled (1) in a cpuset, then whenever | ||
251 | the last task in the cpuset leaves (exits or attaches to some other | ||
252 | cpuset) and the last child cpuset of that cpuset is removed, then | ||
253 | the kernel runs the command /sbin/cpuset_release_agent, supplying the | ||
254 | pathname (relative to the mount point of the cpuset file system) of the | ||
255 | abandoned cpuset. This enables automatic removal of abandoned cpusets. | ||
256 | The default value of notify_on_release in the root cpuset at system | ||
257 | boot is disabled (0). The default value of other cpusets at creation | ||
258 | is the current value of their parents notify_on_release setting. | ||
259 | |||
260 | |||
261 | 1.6 What is memory_pressure ? | ||
262 | ----------------------------- | ||
263 | The memory_pressure of a cpuset provides a simple per-cpuset metric | ||
264 | of the rate that the tasks in a cpuset are attempting to free up in | ||
265 | use memory on the nodes of the cpuset to satisfy additional memory | ||
266 | requests. | ||
267 | |||
268 | This enables batch managers monitoring jobs running in dedicated | ||
269 | cpusets to efficiently detect what level of memory pressure that job | ||
270 | is causing. | ||
271 | |||
272 | This is useful both on tightly managed systems running a wide mix of | ||
273 | submitted jobs, which may choose to terminate or re-prioritize jobs that | ||
274 | are trying to use more memory than allowed on the nodes assigned them, | ||
275 | and with tightly coupled, long running, massively parallel scientific | ||
276 | computing jobs that will dramatically fail to meet required performance | ||
277 | goals if they start to use more memory than allowed to them. | ||
278 | |||
279 | This mechanism provides a very economical way for the batch manager | ||
280 | to monitor a cpuset for signs of memory pressure. It's up to the | ||
281 | batch manager or other user code to decide what to do about it and | ||
282 | take action. | ||
283 | |||
284 | ==> Unless this feature is enabled by writing "1" to the special file | ||
285 | /dev/cpuset/memory_pressure_enabled, the hook in the rebalance | ||
286 | code of __alloc_pages() for this metric reduces to simply noticing | ||
287 | that the cpuset_memory_pressure_enabled flag is zero. So only | ||
288 | systems that enable this feature will compute the metric. | ||
289 | |||
290 | Why a per-cpuset, running average: | ||
291 | |||
292 | Because this meter is per-cpuset, rather than per-task or mm, | ||
293 | the system load imposed by a batch scheduler monitoring this | ||
294 | metric is sharply reduced on large systems, because a scan of | ||
295 | the tasklist can be avoided on each set of queries. | ||
296 | |||
297 | Because this meter is a running average, instead of an accumulating | ||
298 | counter, a batch scheduler can detect memory pressure with a | ||
299 | single read, instead of having to read and accumulate results | ||
300 | for a period of time. | ||
301 | |||
302 | Because this meter is per-cpuset rather than per-task or mm, | ||
303 | the batch scheduler can obtain the key information, memory | ||
304 | pressure in a cpuset, with a single read, rather than having to | ||
305 | query and accumulate results over all the (dynamically changing) | ||
306 | set of tasks in the cpuset. | ||
307 | |||
308 | A per-cpuset simple digital filter (requires a spinlock and 3 words | ||
309 | of data per-cpuset) is kept, and updated by any task attached to that | ||
310 | cpuset, if it enters the synchronous (direct) page reclaim code. | ||
311 | |||
312 | A per-cpuset file provides an integer number representing the recent | ||
313 | (half-life of 10 seconds) rate of direct page reclaims caused by | ||
314 | the tasks in the cpuset, in units of reclaims attempted per second, | ||
315 | times 1000. | ||
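
To make the "half-life of 10 seconds" and "times 1000" wording concrete, the following is a minimal sketch of one possible decaying meter. It is not taken from the kernel source: the names and the per-second coefficient of 933/1000 are assumptions, chosen because ten such steps roughly halve a value (0.933^10 is about 0.5).

```c
/* Illustrative constants, assumed for this sketch only. */
#define DECAY_COEF	933	/* per-second decay factor, scaled by METER_SCALE */
#define METER_SCALE	1000

struct pressure_meter {
	long count;	/* reclaim attempts seen since the last update */
	long value;	/* decayed rate: attempts per second times METER_SCALE */
};

/* Record one direct page reclaim attempt. */
static void meter_mark(struct pressure_meter *m)
{
	m->count++;
}

/* Age the meter by a number of whole seconds, then fold in pending events. */
static void meter_advance(struct pressure_meter *m, unsigned int seconds)
{
	while (seconds-- > 0)
		m->value = m->value * DECAY_COEF / METER_SCALE;
	m->value += m->count * (METER_SCALE - DECAY_COEF);
	m->count = 0;
}
```

With a steady two reclaim attempts per second and one update per second, the value settles just under 2000, i.e. the rate times 1000. With no new events, a reading of 1000 decays to roughly half after ten seconds. A single read of such a value is enough for a batch manager, which is the point made above about a running average.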
316 | |||
317 | |||
318 | 1.7 How do I use cpusets ? | ||
232 | -------------------------- | 319 | -------------------------- |
233 | 320 | ||
234 | In order to minimize the impact of cpusets on critical kernel | 321 | In order to minimize the impact of cpusets on critical kernel |
@@ -277,6 +364,30 @@ rewritten to the 'tasks' file of its cpuset. This is done to avoid | |||
277 | impacting the scheduler code in the kernel with a check for changes | 364 | impacting the scheduler code in the kernel with a check for changes |
278 | in a tasks processor placement. | 365 | in a tasks processor placement. |
279 | 366 | ||
367 | Normally, once a page is allocated (given a physical page | ||
368 | of main memory), that page stays on whatever node it | ||
369 | was allocated on, so long as it remains allocated, even if the | ||
370 | cpuset's memory placement policy 'mems' subsequently changes. | ||
371 | If the cpuset flag file 'memory_migrate' is set true, then when | ||
372 | a task is attached to that cpuset, any pages that task had | ||
373 | allocated to it on nodes in its previous cpuset are migrated | ||
374 | to the task's new cpuset. Depending on the implementation, | ||
375 | this migration may either be done by swapping the page out, | ||
376 | so that the next time the page is referenced, it will be paged | ||
377 | into the task's new cpuset, usually on the node where it was | ||
378 | referenced, or this migration may be done by directly copying | ||
379 | the pages from the task's previous cpuset to the new cpuset, | ||
380 | where possible to the same node, relative to the new cpuset, | ||
381 | as the node that held the page, relative to the old cpuset. | ||
382 | Also if 'memory_migrate' is set true, then if that cpuset's | ||
383 | 'mems' file is modified, pages allocated to tasks in that | ||
384 | cpuset, that were on nodes in the previous setting of 'mems', | ||
385 | will be moved to nodes in the new setting of 'mems'. Again, | ||
386 | depending on the implementation, this might be done by swapping, | ||
387 | or by direct copying. In either case, pages that were not in | ||
388 | the task's prior cpuset, or in the cpuset's prior 'mems' setting, | ||
389 | will not be moved. | ||
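
The "same node, relative to the new cpuset" rule for direct copying can be sketched as follows. This is only an illustration of the idea of preserving a node's relative position; the function name is hypothetical, and the wrap-around used when the new set is smaller than the old one is an assumption of this sketch, not something the text specifies.

```python
def remap_node(old_node, old_mems, new_mems):
    """Map a node to the same relative position in the new 'mems' setting.

    Hypothetical helper. Returns None when the page was not on a node of
    the old setting, meaning the page is not moved.
    """
    old_sorted = sorted(old_mems)
    new_sorted = sorted(new_mems)
    if old_node not in old_sorted or not new_sorted:
        return None
    position = old_sorted.index(old_node)
    # Assumption of this sketch: wrap around if the new set has fewer
    # nodes than the old one.
    return new_sorted[position % len(new_sorted)]
```

For example, a page on the second node of an old 'mems' of 0-1 would land on the second node of a new 'mems' of 4-5, that is, node 5.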
390 | |||
280 | There is an exception to the above. If hotplug functionality is used | 391 | There is an exception to the above. If hotplug functionality is used |
281 | to remove all the CPUs that are currently assigned to a cpuset, | 392 | to remove all the CPUs that are currently assigned to a cpuset, |
282 | then the kernel will automatically update the cpus_allowed of all | 393 | then the kernel will automatically update the cpus_allowed of all |
diff --git a/Documentation/drivers/edac/edac.txt b/Documentation/drivers/edac/edac.txt new file mode 100644 index 000000000000..d37191fe5681 --- /dev/null +++ b/Documentation/drivers/edac/edac.txt | |||
@@ -0,0 +1,673 @@ | |||
1 | |||
2 | |||
3 | EDAC - Error Detection And Correction | ||
4 | |||
5 | Written by Doug Thompson <norsk5@xmission.com> | ||
6 | 7 Dec 2005 | ||
7 | |||
8 | |||
9 | EDAC was written by: | ||
10 | Thayne Harbaugh, | ||
11 | modified by Dave Peterson, Doug Thompson, et al, | ||
12 | from the bluesmoke.sourceforge.net project. | ||
13 | |||
14 | |||
15 | ============================================================================ | ||
16 | EDAC PURPOSE | ||
17 | |||
18 | The goal of the 'edac' kernel module is to detect and report errors that occur | ||
19 | within the computer system. In the initial release, memory Correctable Errors | ||
20 | (CE) and Uncorrectable Errors (UE) are the primary errors being harvested. | ||
21 | |||
22 | Detecting CE events, then harvesting those events and reporting them, | ||
23 | CAN be a predictor of future UE events. With CE events, the system can | ||
24 | continue to operate, but with less safety. Preventive maintenance and | ||
25 | proactive part replacement of memory DIMMs exhibiting CEs can reduce | ||
26 | the likelihood of the dreaded UE events and system 'panics'. | ||
27 | |||
28 | |||
29 | In addition, PCI Bus Parity and SERR Errors are scanned for on PCI devices | ||
30 | in order to determine if errors are occurring on data transfers. | ||
31 | The presence of PCI Parity errors must be examined with a grain of salt. | ||
32 | There are several add-in adapters that do NOT follow the PCI specification | ||
33 | with regard to Parity generation and reporting. The specification says | ||
34 | the vendor should tie the parity status bits to 0 if they do not intend | ||
35 | to generate parity. Some vendors do not do this, and thus the parity bit | ||
36 | can "float" giving false positives. | ||
37 | |||
38 | The PCI Parity EDAC device has the ability to "skip" known flaky | ||
39 | cards during the parity scan. These are set by the parity "blacklist" | ||
40 | interface in the sysfs for PCI Parity. (See the PCI section in the sysfs | ||
41 | section below.) There is also a parity "whitelist" which is used as | ||
42 | an explicit list of devices to scan, while the blacklist is a list | ||
43 | of devices to skip. | ||
44 | |||
45 | Future error detectors will be added to or integrated into EDAC | ||
46 | from the following list: | ||
47 | |||
48 | MCE Machine Check Exception | ||
49 | MCA Machine Check Architecture | ||
50 | NMI NMI notification of ECC errors | ||
51 | MSRs Machine Specific Register error cases | ||
52 | and other mechanisms. | ||
53 | |||
54 | These errors are usually bus errors, ECC errors, thermal throttling | ||
55 | and the like. | ||
56 | |||
57 | |||
58 | ============================================================================ | ||
59 | EDAC VERSIONING | ||
60 | |||
61 | EDAC is composed of a "core" module (edac_mc.ko) and several Memory | ||
62 | Controller (MC) driver modules. On a given system, the CORE | ||
63 | is loaded and one MC driver will be loaded. Both the CORE and | ||
64 | the MC driver have individual versions that reflect current release | ||
65 | level of their respective modules. Thus, to "report" on what version | ||
66 | a system is running, one must report both the CORE's and the | ||
67 | MC driver's versions. | ||
68 | |||
69 | |||
70 | LOADING | ||
71 | |||
72 | If 'edac' was statically linked with the kernel then no loading is | ||
73 | necessary. If 'edac' was built as modules then simply modprobe the | ||
74 | 'edac' pieces that you need. You should be able to modprobe | ||
75 | hardware-specific modules and have the dependencies load the necessary core | ||
76 | modules. | ||
77 | |||
78 | Example: | ||
79 | |||
80 | $> modprobe amd76x_edac | ||
81 | |||
82 | loads both the amd76x_edac.ko memory controller module and the edac_mc.ko | ||
83 | core module. | ||
84 | |||
85 | |||
86 | ============================================================================ | ||
87 | EDAC sysfs INTERFACE | ||
88 | |||
89 | EDAC presents a 'sysfs' interface for control, reporting and attribute | ||
90 | reporting purposes. | ||
91 | |||
92 | EDAC lives in the /sys/devices/system/edac directory. Within this directory | ||
93 | there currently reside 2 'edac' components: | ||
94 | |||
95 | mc memory controller(s) system | ||
96 | pci PCI status system | ||
97 | |||
98 | |||
99 | ============================================================================ | ||
100 | Memory Controller (mc) Model | ||
101 | |||
102 | First a background on the memory controller's model abstracted in EDAC. | ||
103 | Each mc device controls a set of DIMM memory modules. These modules are | ||
104 | laid out in a Chip-Select Row (csrowX) and Channel table (chX). There can | ||
105 | be multiple csrows and two channels. | ||
106 | |||
107 | Memory controllers allow for several csrows, with 8 csrows being a typical value. | ||
108 | Yet, the actual number of csrows depends on the electrical "loading" | ||
109 | of a given motherboard, memory controller and DIMM characteristics. | ||
110 | |||
111 | Dual channels allow for 128 bit data transfers to the CPU from memory. | ||
112 | |||
113 | |||
114 | Channel 0 Channel 1 | ||
115 | =================================== | ||
116 | csrow0 | DIMM_A0 | DIMM_B0 | | ||
117 | csrow1 | DIMM_A0 | DIMM_B0 | | ||
118 | =================================== | ||
119 | |||
120 | =================================== | ||
121 | csrow2 | DIMM_A1 | DIMM_B1 | | ||
122 | csrow3 | DIMM_A1 | DIMM_B1 | | ||
123 | =================================== | ||
124 | |||
125 | In the above example table there are 4 physical slots on the motherboard | ||
126 | for memory DIMMs: | ||
127 | |||
128 | DIMM_A0 | ||
129 | DIMM_B0 | ||
130 | DIMM_A1 | ||
131 | DIMM_B1 | ||
132 | |||
133 | Labels for these slots are usually silk screened on the motherboard. Slots | ||
134 | labeled 'A' are channel 0 in this example. Slots labeled 'B' | ||
135 | are channel 1. Notice that there are two csrows possible on a | ||
136 | physical DIMM. These csrows are allocated their csrow assignment | ||
137 | based on the slot into which the memory DIMM is placed. Thus, when 1 DIMM | ||
138 | is placed in each Channel, the csrows cross both DIMMs. | ||
139 | |||
140 | Memory DIMMs come single or dual "ranked". A rank is a populated csrow. | ||
141 | Thus, 2 single ranked DIMMs, placed in slots DIMM_A0 and DIMM_B0 above | ||
142 | will have 1 csrow, csrow0. csrow1 will be empty. On the other hand, | ||
143 | when 2 dual ranked DIMMs are similarly placed, then both csrow0 and | ||
144 | csrow1 will be populated. The pattern repeats itself for csrow2 and | ||
145 | csrow3. | ||
146 | |||
147 | The representation of the above is reflected in the directory tree | ||
148 | in EDAC's sysfs interface. Starting in directory | ||
149 | /sys/devices/system/edac/mc each memory controller will be represented | ||
150 | by its own 'mcX' directory, where 'X' is the index of the MC. | ||
151 | |||
152 | |||
153 | ..../edac/mc/ | ||
154 | | | ||
155 | |->mc0 | ||
156 | |->mc1 | ||
157 | |->mc2 | ||
158 | .... | ||
159 | |||
160 | Under each 'mcX' directory, each csrow is again represented by a | ||
161 | 'csrowX' directory, where 'X' is the csrow index: | ||
162 | |||
163 | |||
164 | .../mc/mc0/ | ||
165 | | | ||
166 | |->csrow0 | ||
167 | |->csrow2 | ||
168 | |->csrow3 | ||
169 | .... | ||
170 | |||
171 | Notice that there is no csrow1, which indicates that csrow0 is | ||
172 | composed of single ranked DIMMs. This should also apply in both | ||
173 | Channels, in order to have dual-channel mode be operational. Since | ||
174 | both csrow2 and csrow3 are populated, this indicates a dual ranked | ||
175 | set of DIMMs for channels 0 and 1. | ||
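
The directory layout above lends itself to a simple scan. The following is a minimal sketch, assuming only the 'mcX' and 'csrowX' naming described in this section; the function name is hypothetical, and the root path is passed as a parameter so that the code can also be run against a copy of the tree.

```python
import os
import re

EDAC_MC_ROOT = "/sys/devices/system/edac/mc"  # path given in the text


def list_csrows(mc_root=EDAC_MC_ROOT):
    """Return {mc_index: [csrow_index, ...]} for the csrows that exist."""
    layout = {}
    if not os.path.isdir(mc_root):
        return layout
    for mc_name in os.listdir(mc_root):
        mc_match = re.fullmatch(r"mc(\d+)", mc_name)
        mc_path = os.path.join(mc_root, mc_name)
        if not mc_match or not os.path.isdir(mc_path):
            continue  # skip control files such as 'panic_on_ue'
        rows = []
        for entry in os.listdir(mc_path):
            row_match = re.fullmatch(r"csrow(\d+)", entry)
            if row_match and os.path.isdir(os.path.join(mc_path, entry)):
                rows.append(int(row_match.group(1)))
        layout[int(mc_match.group(1))] = sorted(rows)
    return layout
```

A gap in the returned csrow indexes (such as a missing csrow1) corresponds to an unpopulated rank, as discussed above.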
176 | |||
177 | |||
178 | Within each of the 'mc','mcX' and 'csrowX' directories are several | ||
179 | EDAC control and attribute files. | ||
180 | |||
181 | |||
182 | ============================================================================ | ||
183 | DIRECTORY 'mc' | ||
184 | |||
185 | In directory 'mc' are EDAC system overall control and attribute files: | ||
186 | |||
187 | |||
188 | Panic on UE control file: | ||
189 | |||
190 | 'panic_on_ue' | ||
191 | |||
192 | An uncorrectable error will cause a machine panic. This is usually | ||
193 | desirable. It is a bad idea to continue when an uncorrectable error | ||
194 | occurs - it is indeterminate what was uncorrected and the operating | ||
195 | system context might be so mangled that continuing will lead to further | ||
196 | corruption. If the kernel has MCE configured, then EDAC will never | ||
197 | notice the UE. | ||
198 | |||
199 | LOAD TIME: module/kernel parameter: panic_on_ue=[0|1] | ||
200 | |||
201 | RUN TIME: echo "1" >/sys/devices/system/edac/mc/panic_on_ue | ||
202 | |||
203 | |||
204 | Log UE control file: | ||
205 | |||
206 | 'log_ue' | ||
207 | |||
208 | Generate kernel messages describing uncorrectable errors. These errors | ||
209 | are reported through the system message log system. UE statistics | ||
210 | will be accumulated even when UE logging is disabled. | ||
211 | |||
212 | LOAD TIME: module/kernel parameter: log_ue=[0|1] | ||
213 | |||
214 | RUN TIME: echo "1" >/sys/devices/system/edac/mc/log_ue | ||
215 | |||
216 | |||
217 | Log CE control file: | ||
218 | |||
219 | 'log_ce' | ||
220 | |||
221 | Generate kernel messages describing correctable errors. These | ||
222 | errors are reported through the system message log system. | ||
223 | CE statistics will be accumulated even when CE logging is disabled. | ||
224 | |||
225 | LOAD TIME: module/kernel parameter: log_ce=[0|1] | ||
226 | |||
227 | RUN TIME: echo "1" >/sys/devices/system/edac/mc/log_ce | ||
228 | |||
229 | |||
230 | Polling period control file: | ||
231 | |||
232 | 'poll_msec' | ||
233 | |||
234 | The time period, in milliseconds, for polling for error information. | ||
235 | Too small a value wastes resources. Too large a value might delay | ||
236 | necessary handling of errors and might lose valuable information for | ||
237 | locating the error. 1000 milliseconds (once each second) is about | ||
238 | right for most uses. | ||
239 | |||
240 | LOAD TIME: module/kernel parameter: poll_msec=<milliseconds> | ||
241 | |||
242 | RUN TIME: echo "1000" >/sys/devices/system/edac/mc/poll_msec | ||
243 | |||
244 | |||
245 | Module Version read-only attribute file: | ||
246 | |||
247 | 'mc_version' | ||
248 | |||
249 | The EDAC CORE module's version and compile date are shown here to | ||
250 | indicate what EDAC is running. | ||
251 | |||
252 | |||
253 | |||
254 | ============================================================================ | ||
255 | 'mcX' DIRECTORIES | ||
256 | |||
257 | |||
258 | In 'mcX' directories are EDAC control and attribute files for | ||
259 | this 'X' instance of the memory controllers: | ||
260 | |||
261 | |||
262 | Counter reset control file: | ||
263 | |||
264 | 'reset_counters' | ||
265 | |||
266 | This write-only control file will zero all the statistical counters | ||
267 | for UE and CE errors. Zeroing the counters will also reset the timer | ||
268 | indicating how long since the last counter zero. This is useful | ||
269 | for computing errors/time. Since the counters are always reset at | ||
270 | driver initialization time, no module/kernel parameter is available. | ||
271 | |||
272 | RUN TIME: echo "anything" >/sys/devices/system/edac/mc/mc0/reset_counters | ||
273 | |||
274 | This resets the counters on memory controller 0 | ||
275 | |||
276 | |||
277 | Seconds since last counter reset control file: | ||
278 | |||
279 | 'seconds_since_reset' | ||
280 | |||
281 | This attribute file displays how many seconds have elapsed since the | ||
282 | last counter reset. This can be used with the error counters to | ||
283 | measure error rates. | ||
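
As a hedged illustration of measuring an error rate, the counter value can be divided by the elapsed seconds. The helper names below are hypothetical; the only assumption about the sysfs files is that each holds a single integer.

```python
def read_int(path):
    """Read one integer from an attribute file."""
    with open(path) as handle:
        return int(handle.read().strip())


def errors_per_hour(error_count, seconds_since_reset):
    """Convert a counter and its elapsed time into errors per hour."""
    if seconds_since_reset <= 0:
        return 0.0  # counters were just reset; no rate can be computed yet
    return error_count * 3600.0 / seconds_since_reset
```

A monitoring script might call read_int() on the 'ce_count' and 'seconds_since_reset' files of one 'mcX' directory and pass both values to errors_per_hour().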
284 | |||
285 | |||
286 | |||
287 | DIMM capability attribute file: | ||
288 | |||
289 | 'edac_capability' | ||
290 | |||
291 | The EDAC (Error Detection and Correction) capabilities/modes of | ||
292 | the memory controller hardware. | ||
293 | |||
294 | |||
295 | DIMM Current Capability attribute file: | ||
296 | |||
297 | 'edac_current_capability' | ||
298 | |||
299 | The EDAC capabilities available with the hardware | ||
300 | configuration. This may not be the same as "EDAC capability" | ||
301 | if the correct memory is not used. If a memory controller is | ||
302 | capable of EDAC, but DIMMs without check bits are in use, then | ||
303 | Parity, SECDED, S4ECD4ED capabilities will not be available | ||
304 | even though the memory controller might be capable of those | ||
305 | modes with the proper memory loaded. | ||
306 | |||
307 | |||
308 | Memory Type supported on this controller attribute file: | ||
309 | |||
310 | 'supported_mem_type' | ||
311 | |||
312 | This attribute file displays the memory type, usually | ||
313 | buffered and unbuffered DIMMs. | ||
314 | |||
315 | |||
316 | Memory Controller name attribute file: | ||
317 | |||
318 | 'mc_name' | ||
319 | |||
320 | This attribute file displays the type of memory controller | ||
321 | that is being utilized. | ||
322 | |||
323 | |||
324 | Memory Controller Module name attribute file: | ||
325 | |||
326 | 'module_name' | ||
327 | |||
328 | This attribute file displays the memory controller module name, | ||
329 | version and date built. The name of the memory controller | ||
330 | hardware - some drivers work with multiple controllers and | ||
331 | this field shows which hardware is present. | ||
332 | |||
333 | |||
334 | Total memory managed by this memory controller attribute file: | ||
335 | |||
336 | 'size_mb' | ||
337 | |||
338 | This attribute file displays the amount of memory, in megabytes, | ||
339 | that this instance of memory controller manages. | ||
340 | |||
341 | |||
342 | Total Uncorrectable Errors count attribute file: | ||
343 | |||
344 | 'ue_count' | ||
345 | |||
346 | This attribute file displays the total count of uncorrectable | ||
347 | errors that have occurred on this memory controller. If panic_on_ue | ||
348 | is set this counter will not have a chance to increment, | ||
349 | since EDAC will panic the system. | ||
350 | |||
351 | |||
352 | Total UE count that had no information attribute file: | ||
353 | |||
354 | 'ue_noinfo_count' | ||
355 | |||
356 | This attribute file displays the number of UEs that | ||
357 | have occurred with no information as to which DIMM | ||
358 | slot is having errors. | ||
359 | |||
360 | |||
361 | Total Correctable Errors count attribute file: | ||
362 | |||
363 | 'ce_count' | ||
364 | |||
365 | This attribute file displays the total count of correctable | ||
366 | errors that have occurred on this memory controller. This | ||
367 | count is very important to examine. CEs provide early | ||
368 | indications that a DIMM is beginning to fail. This count | ||
369 | field should be monitored for non-zero values, and any such | ||
370 | values reported to the system administrator. | ||
371 | |||
372 | |||
373 | Total CE count that had no information attribute file: | ||
374 | |||
375 | 'ce_noinfo_count' | ||
376 | |||
377 | This attribute file displays the number of CEs that | ||
378 | have occurred with no information as to which DIMM slot | ||
379 | is having errors. Memory is handicapped, but operational, | ||
380 | yet no information is available to indicate which slot | ||
381 | the failing memory is in. This count field should also | ||
382 | be monitored for non-zero values. | ||
383 | |||
384 | Device Symlink: | ||
385 | |||
386 | 'device' | ||
387 | |||
388 | Symlink to the memory controller device | ||
389 | |||
390 | |||
391 | |||
392 | ============================================================================ | ||
393 | 'csrowX' DIRECTORIES | ||
394 | |||
395 | In the 'csrowX' directories are EDAC control and attribute files for | ||
396 | this 'X' instance of csrow: | ||
397 | |||
398 | |||
399 | Total Uncorrectable Errors count attribute file: | ||
400 | |||
401 | 'ue_count' | ||
402 | |||
403 | This attribute file displays the total count of uncorrectable | ||
404 | errors that have occurred on this csrow. If panic_on_ue is set | ||
405 | this counter will not have a chance to increment, since EDAC | ||
406 | will panic the system. | ||
407 | |||
408 | |||
409 | Total Correctable Errors count attribute file: | ||
410 | |||
411 | 'ce_count' | ||
412 | |||
413 | This attribute file displays the total count of correctable | ||
414 | errors that have occurred on this csrow. This | ||
415 | count is very important to examine. CEs provide early | ||
416 | indications that a DIMM is beginning to fail. This count | ||
417 | field should be monitored for non-zero values, and any such | ||
418 | values reported to the system administrator. | ||
419 | |||
420 | |||
421 | Total memory managed by this csrow attribute file: | ||
422 | |||
423 | 'size_mb' | ||
424 | |||
425 | This attribute file displays the amount of memory, in megabytes, | ||
426 | that this csrow contains. | ||
427 | |||
428 | |||
429 | Memory Type attribute file: | ||
430 | |||
431 | 'mem_type' | ||
432 | |||
433 | This attribute file will display what type of memory is currently | ||
434 | on this csrow. Normally, either buffered or unbuffered memory. | ||
435 | |||
436 | |||
437 | EDAC Mode of operation attribute file: | ||
438 | |||
439 | 'edac_mode' | ||
440 | |||
441 | This attribute file will display what type of Error detection | ||
442 | and correction is being utilized. | ||
443 | |||
444 | |||
445 | Device type attribute file: | ||
446 | |||
447 | 'dev_type' | ||
448 | |||
449 | This attribute file will display what type of DIMM device is | ||
450 | being utilized. Example: x4 | ||
451 | |||
452 | |||
453 | Channel 0 CE Count attribute file: | ||
454 | |||
455 | 'ch0_ce_count' | ||
456 | |||
457 | This attribute file will display the count of CEs on this | ||
458 | DIMM located in channel 0. | ||
459 | |||
460 | |||
461 | Channel 0 UE Count attribute file: | ||
462 | |||
463 | 'ch0_ue_count' | ||
464 | |||
465 | This attribute file will display the count of UEs on this | ||
466 | DIMM located in channel 0. | ||
467 | |||
468 | |||
469 | Channel 0 DIMM Label control file: | ||
470 | |||
471 | 'ch0_dimm_label' | ||
472 | |||
473 | This control file allows this DIMM to have a label assigned | ||
474 | to it. With this label in the module, when errors occur | ||
475 | the output can provide the DIMM label in the system log. | ||
476 | This becomes vital for panic events to isolate the | ||
477 | cause of the UE event. | ||
478 | |||
479 | DIMM Labels must be assigned after booting, with information | ||
480 | that correctly identifies the physical slot with its | ||
481 | silk screen label. This information is currently very | ||
482 | motherboard specific and determination of this information | ||
483 | must occur in userland at this time. | ||
484 | |||
485 | |||
486 | Channel 1 CE Count attribute file: | ||
487 | |||
488 | 'ch1_ce_count' | ||
489 | |||
490 | This attribute file will display the count of CEs on this | ||
491 | DIMM located in channel 1. | ||
492 | |||
493 | |||
494 | Channel 1 UE Count attribute file: | ||
495 | |||
496 | 'ch1_ue_count' | ||
497 | |||
498 | This attribute file will display the count of UEs on this | ||
499 | DIMM located in channel 1. | ||
500 | |||
501 | |||
502 | Channel 1 DIMM Label control file: | ||
503 | |||
504 | 'ch1_dimm_label' | ||
505 | |||
506 | This control file allows this DIMM to have a label assigned | ||
507 | to it. With this label in the module, when errors occur | ||
508 | the output can provide the DIMM label in the system log. | ||
509 | This becomes vital for panic events to isolate the | ||
510 | cause of the UE event. | ||
511 | |||
512 | DIMM Labels must be assigned after booting, with information | ||
513 | that correctly identifies the physical slot with its | ||
514 | silk screen label. This information is currently very | ||
515 | motherboard specific and determination of this information | ||
516 | must occur in userland at this time. | ||
517 | |||
518 | |||
519 | ============================================================================ | ||
520 | SYSTEM LOGGING | ||
521 | |||
522 | If logging for UEs and CEs is enabled then system logs will have | ||
523 | error notices indicating errors that have been detected: | ||
524 | |||
525 | MC0: CE page 0x283, offset 0xce0, grain 8, syndrome 0x6ec3, row 0, | ||
526 | channel 1 "DIMM_B1": amd76x_edac | ||
527 | |||
528 | MC0: CE page 0x1e5, offset 0xfb0, grain 8, syndrome 0xb741, row 0, | ||
529 | channel 1 "DIMM_B1": amd76x_edac | ||
530 | |||
531 | |||
532 | The structure of the message is: | ||
533 | the memory controller (MC0) | ||
534 | Error type (CE) | ||
535 | memory page (0x283) | ||
536 | offset in the page (0xce0) | ||
537 | the byte granularity (grain 8) | ||
538 | or resolution of the error | ||
539 | the error syndrome (0x6ec3) | ||
540 | memory row (row 0) | ||
541 | memory channel (channel 1) | ||
542 | DIMM label, if set prior ("DIMM_B1") | ||
543 | and then an optional, driver-specific message that may | ||
544 | have additional information. | ||
545 | |||
546 | Both UEs and CEs with no info will lack all but memory controller, | ||
547 | error type, a notice of "no info" and then an optional, | ||
548 | driver-specific error message. | ||
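
The message structure listed above can be picked apart with a regular expression. This sketch is modeled only on the two example CE lines shown in this section; the pattern, the function name and the field names are illustrative, and messages with any other layout (including the "no info" form) simply return None.

```python
import re

# Pattern modeled on the example CE messages above only.
MESSAGE_RE = re.compile(
    r'MC(?P<mc>\d+): (?P<type>CE|UE) '
    r'page (?P<page>0x[0-9a-fA-F]+), '
    r'offset (?P<offset>0x[0-9a-fA-F]+), '
    r'grain (?P<grain>\d+), '
    r'syndrome (?P<syndrome>0x[0-9a-fA-F]+), '
    r'row (?P<row>\d+), '
    r'channel (?P<channel>\d+) '
    r'"(?P<label>[^"]*)": (?P<driver>.*)'
)


def parse_edac_message(text):
    """Return a dict of fields, or None if the text does not match."""
    # Collapse line wrapping and repeated spaces before matching.
    normalized = " ".join(text.split())
    match = MESSAGE_RE.search(normalized)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("mc", "grain", "row", "channel"):
        fields[key] = int(fields[key])
    for key in ("page", "offset", "syndrome"):
        fields[key] = int(fields[key], 16)
    return fields
```

Keeping the DIMM label as its own field makes it easy to count errors per physical slot once labels have been assigned.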
549 | |||
550 | |||
551 | |||
552 | ============================================================================ | ||
553 | PCI Bus Parity Detection | ||
554 | |||
555 | |||
556 | On Header Type 00 devices the primary status is looked at | ||
557 | for any parity error regardless of whether Parity is enabled on the | ||
558 | device. (The spec indicates parity is generated in some cases). | ||
559 | On Header Type 01 bridges, the secondary status register is also | ||
560 | looked at to see if a parity error occurred on the bus on the other side of | ||
561 | the bridge. | ||
562 | |||
563 | |||
564 | SYSFS CONFIGURATION | ||
565 | |||
566 | Under /sys/devices/system/edac/pci are control and attribute files as follows: | ||
567 | |||
568 | |||
569 | Enable/Disable PCI Parity checking control file: | ||
570 | |||
571 | 'check_pci_parity' | ||
572 | |||
573 | |||
574 | This control file enables or disables the PCI Bus Parity scanning | ||
575 | operation. Writing a 1 to this file enables the scanning. Writing | ||
576 | a 0 to this file disables the scanning. | ||
577 | |||
578 | Enable: | ||
579 | echo "1" >/sys/devices/system/edac/pci/check_pci_parity | ||
580 | |||
581 | Disable: | ||
582 | echo "0" >/sys/devices/system/edac/pci/check_pci_parity | ||
583 | |||
584 | |||
585 | |||
586 | Panic on PCI PARITY Error: | ||
587 | |||
588 | 'panic_on_pci_parity' | ||
589 | |||
590 | |||
591 | This control file enables or disables panic'ing when a parity | ||
592 | error has been detected. | ||
593 | |||
594 | |||
595 | module/kernel parameter: panic_on_pci_parity=[0|1] | ||
596 | |||
597 | Enable: | ||
598 | echo "1" >/sys/devices/system/edac/pci/panic_on_pci_parity | ||
599 | |||
600 | Disable: | ||
601 | echo "0" >/sys/devices/system/edac/pci/panic_on_pci_parity | ||
602 | |||
603 | |||
604 | Parity Count: | ||
605 | |||
606 | 'pci_parity_count' | ||
607 | |||
608 | This attribute file will display the number of parity errors that | ||
609 | have been detected. | ||
610 | |||
611 | |||
612 | |||
613 | PCI Device Whitelist: | ||
614 | |||
615 | 'pci_parity_whitelist' | ||
616 | |||
617 | This control file allows for an explicit list of PCI devices to be | ||
618 | scanned for parity errors. Only devices found on this list will | ||
619 | be examined. The list is a line of hexadecimal VENDOR and DEVICE | ||
620 | ID tuples: | ||
621 | |||
622 | 1022:7450,1434:16a6 | ||
623 | |||
624 | One or more can be inserted, separated by a comma. | ||
625 | |||
626 | To write the above list, enter the following as one command line: | ||
627 | |||
628 | echo "1022:7450,1434:16a6" | ||
629 | > /sys/devices/system/edac/pci/pci_parity_whitelist | ||
630 | |||
631 | |||
632 | |||
633 | To display what the whitelist is, simply 'cat' the same file. | ||
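
A small sketch of handling the VENDOR:DEVICE list format follows. It assumes only the comma-separated hexadecimal tuple syntax shown above; the function names are hypothetical.

```python
def parse_id_list(text):
    """Parse '1022:7450,1434:16a6' into a list of (vendor, device) ints."""
    pairs = []
    for item in text.strip().split(","):
        if not item:
            continue  # an empty list parses to no entries
        vendor, device = item.split(":")
        pairs.append((int(vendor, 16), int(device, 16)))
    return pairs


def format_id_list(pairs):
    """Format (vendor, device) ints back into the comma-separated form."""
    return ",".join("%04x:%04x" % (vendor, device) for vendor, device in pairs)
```

The string produced by format_id_list() is the text that would be written to the whitelist or blacklist control file.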
634 | |||
635 | |||
636 | PCI Device Blacklist: | ||
637 | |||
638 | 'pci_parity_blacklist' | ||
639 | |||
640 | This control file allows for a list of PCI devices to be | ||
641 | skipped for scanning. | ||
642 | The list is a line of hexadecimal VENDOR and DEVICE ID tuples: | ||
643 | |||
644 | 1022:7450,1434:16a6 | ||
645 | |||
646 | One or more can be inserted, separated by a comma. | ||
647 | |||
648 | To write the above list, enter the following as one command line: | ||
649 | |||
650 | echo "1022:7450,1434:16a6" | ||
651 | > /sys/devices/system/edac/pci/pci_parity_blacklist | ||
652 | |||
653 | |||
654 | To display what the blacklist currently contains, | ||
655 | simply 'cat' the same file. | ||
656 | |||
657 | ======================================================================= | ||
658 | |||
659 | PCI Vendor and Devices IDs can be obtained with the lspci command. Using | ||
660 | the -n option lspci will display the vendor and device IDs. The system | ||
661 | administrator will have to determine which devices should be scanned or | ||
662 | skipped. | ||
663 | |||
664 | |||
665 | |||
666 | The two lists (white and black) are prioritized. The blacklist has the lower | ||
667 | priority and will NOT be utilized when a whitelist has been set. | ||
668 | Turn OFF a whitelist by an empty echo command: | ||
669 | |||
670 | echo > /sys/devices/system/edac/pci/pci_parity_whitelist | ||
671 | |||
672 | and any previous blacklist will be utilized. | ||
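
The priority between the two lists can be summarized in a few lines. This is a sketch of the rule as described in the text, not of the driver's code; the function name and the representation of a device as a (vendor, device) tuple are assumptions.

```python
def should_scan(device_id, whitelist, blacklist):
    """Decide whether a PCI device is scanned for parity errors.

    device_id is a (vendor, device) tuple; both lists hold such tuples.
    """
    if whitelist:
        # A non-empty whitelist takes priority: the blacklist is ignored.
        return device_id in whitelist
    # No whitelist set: scan everything except blacklisted devices.
    return device_id not in blacklist
```

Clearing the whitelist (an empty list here) makes the function fall back to the blacklist, which mirrors the empty echo command shown above.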
673 | |||
diff --git a/Documentation/dvb/avermedia.txt b/Documentation/dvb/avermedia.txt index 2dc260b2b0a4..068070ff13cd 100644 --- a/Documentation/dvb/avermedia.txt +++ b/Documentation/dvb/avermedia.txt | |||
@@ -150,7 +150,8 @@ Getting the card going | |||
150 | 150 | ||
151 | The frontend module sp887x.o, requires an external firmware. | 151 | The frontend module sp887x.o, requires an external firmware. |
152 | Please use the command "get_dvb_firmware sp887x" to download | 152 | Please use the command "get_dvb_firmware sp887x" to download |
153 | it. Then copy it to /usr/lib/hotplug/firmware. | 153 | it. Then copy it to /usr/lib/hotplug/firmware or /lib/firmware/ |
154 | (depending on configuration of firmware hotplug). | ||
154 | 155 | ||
155 | Receiving DVB-T in Australia | 156 | Receiving DVB-T in Australia |
156 | 157 | ||
diff --git a/Documentation/dvb/get_dvb_firmware b/Documentation/dvb/get_dvb_firmware index be6eb4c75991..75c28a174092 100644 --- a/Documentation/dvb/get_dvb_firmware +++ b/Documentation/dvb/get_dvb_firmware | |||
@@ -23,7 +23,7 @@ use IO::Handle; | |||
23 | 23 | ||
24 | @components = ( "sp8870", "sp887x", "tda10045", "tda10046", "av7110", "dec2000t", | 24 | @components = ( "sp8870", "sp887x", "tda10045", "tda10046", "av7110", "dec2000t", |
25 | "dec2540t", "dec3000s", "vp7041", "dibusb", "nxt2002", "nxt2004", | 25 | "dec2540t", "dec3000s", "vp7041", "dibusb", "nxt2002", "nxt2004", |
26 | "or51211", "or51132_qam", "or51132_vsb"); | 26 | "or51211", "or51132_qam", "or51132_vsb", "bluebird"); |
27 | 27 | ||
28 | # Check args | 28 | # Check args |
29 | syntax() if (scalar(@ARGV) != 1); | 29 | syntax() if (scalar(@ARGV) != 1); |
@@ -34,7 +34,11 @@ for ($i=0; $i < scalar(@components); $i++) { | |||
34 | if ($cid eq $components[$i]) { | 34 | if ($cid eq $components[$i]) { |
35 | $outfile = eval($cid); | 35 | $outfile = eval($cid); |
36 | die $@ if $@; | 36 | die $@ if $@; |
37 | print STDERR "Firmware $outfile extracted successfully. Now copy it to either /lib/firmware or /usr/lib/hotplug/firmware/ (depending on your hotplug version).\n"; | 37 | print STDERR <<EOF; |
38 | Firmware $outfile extracted successfully. | ||
39 | Now copy it to either /usr/lib/hotplug/firmware or /lib/firmware | ||
40 | (depending on configuration of firmware hotplug). | ||
41 | EOF | ||
38 | exit(0); | 42 | exit(0); |
39 | } | 43 | } |
40 | } | 44 | } |
@@ -243,7 +247,7 @@ sub nxt2002 { | |||
243 | my $tmpdir = tempdir(DIR => "/tmp", CLEANUP => 1); | 247 | my $tmpdir = tempdir(DIR => "/tmp", CLEANUP => 1); |
244 | 248 | ||
245 | checkstandard(); | 249 | checkstandard(); |
246 | 250 | ||
247 | wgetfile($sourcefile, $url); | 251 | wgetfile($sourcefile, $url); |
248 | unzip($sourcefile, $tmpdir); | 252 | unzip($sourcefile, $tmpdir); |
249 | verify("$tmpdir/SkyNETU.sys", $hash); | 253 | verify("$tmpdir/SkyNETU.sys", $hash); |
@@ -308,6 +312,19 @@ sub or51132_vsb { | |||
308 | $fwfile; | 312 | $fwfile; |
309 | } | 313 | } |
310 | 314 | ||
315 | sub bluebird { | ||
316 | my $url = "http://www.linuxtv.org/download/dvb/firmware/dvb-usb-bluebird-01.fw"; | ||
317 | my $outfile = "dvb-usb-bluebird-01.fw"; | ||
318 | my $hash = "658397cb9eba9101af9031302671f49d"; | ||
319 | |||
320 | checkstandard(); | ||
321 | |||
322 | wgetfile($outfile, $url); | ||
323 | verify($outfile,$hash); | ||
324 | |||
325 | $outfile; | ||
326 | } | ||
327 | |||
311 | # --------------------------------------------------------------- | 328 | # --------------------------------------------------------------- |
312 | # Utilities | 329 | # Utilities |
313 | 330 | ||
diff --git a/Documentation/dvb/ttusb-dec.txt b/Documentation/dvb/ttusb-dec.txt index 5c1e984c26a7..b2f271cd784b 100644 --- a/Documentation/dvb/ttusb-dec.txt +++ b/Documentation/dvb/ttusb-dec.txt | |||
@@ -41,4 +41,5 @@ Hotplug Firmware Loading for 2.6 kernels | |||
41 | For 2.6 kernels the firmware is loaded at the point that the driver module is | 41 | For 2.6 kernels the firmware is loaded at the point that the driver module is |
42 | loaded. See linux/Documentation/dvb/firmware.txt for more information. | 42 | loaded. See linux/Documentation/dvb/firmware.txt for more information. |
43 | 43 | ||
44 | Copy the three files downloaded above into the /usr/lib/hotplug/firmware directory. | 44 | Copy the three files downloaded above into the /usr/lib/hotplug/firmware or |
45 | /lib/firmware directory (depending on configuration of firmware hotplug). | ||
diff --git a/Documentation/fb/cyblafb/bugs b/Documentation/fb/cyblafb/bugs index f90cc66ea919..9443a6d72cdd 100644 --- a/Documentation/fb/cyblafb/bugs +++ b/Documentation/fb/cyblafb/bugs | |||
@@ -11,4 +11,3 @@ Untested features | |||
11 | 11 | ||
12 | All LCD stuff is untested. If it worked in tridentfb, it should work in | 12 | All LCD stuff is untested. If it worked in tridentfb, it should work in |
13 | cyblafb. Please test and report the results to Knut_Petersen@t-online.de. | 13 | cyblafb. Please test and report the results to Knut_Petersen@t-online.de. |
14 | |||
diff --git a/Documentation/fb/cyblafb/fb.modes b/Documentation/fb/cyblafb/fb.modes index cf4351fc32ff..fe0e5223ba86 100644 --- a/Documentation/fb/cyblafb/fb.modes +++ b/Documentation/fb/cyblafb/fb.modes | |||
@@ -14,142 +14,141 @@ | |||
14 | # | 14 | # |
15 | 15 | ||
16 | mode "640x480-50" | 16 | mode "640x480-50" |
17 | geometry 640 480 640 3756 8 | 17 | geometry 640 480 2048 4096 8 |
18 | timings 47619 4294967256 24 17 0 216 3 | 18 | timings 47619 4294967256 24 17 0 216 3 |
19 | endmode | 19 | endmode |
20 | 20 | ||
21 | mode "640x480-60" | 21 | mode "640x480-60" |
22 | geometry 640 480 640 3756 8 | 22 | geometry 640 480 2048 4096 8 |
23 | timings 39682 4294967256 24 17 0 216 3 | 23 | timings 39682 4294967256 24 17 0 216 3 |
24 | endmode | 24 | endmode |
25 | 25 | ||
26 | mode "640x480-70" | 26 | mode "640x480-70" |
27 | geometry 640 480 640 3756 8 | 27 | geometry 640 480 2048 4096 8 |
28 | timings 34013 4294967256 24 17 0 216 3 | 28 | timings 34013 4294967256 24 17 0 216 3 |
29 | endmode | 29 | endmode |
30 | 30 | ||
31 | mode "640x480-72" | 31 | mode "640x480-72" |
32 | geometry 640 480 640 3756 8 | 32 | geometry 640 480 2048 4096 8 |
33 | timings 33068 4294967256 24 17 0 216 3 | 33 | timings 33068 4294967256 24 17 0 216 3 |
34 | endmode | 34 | endmode |
35 | 35 | ||
36 | mode "640x480-75" | 36 | mode "640x480-75" |
37 | geometry 640 480 640 3756 8 | 37 | geometry 640 480 2048 4096 8 |
38 | timings 31746 4294967256 24 17 0 216 3 | 38 | timings 31746 4294967256 24 17 0 216 3 |
39 | endmode | 39 | endmode |
40 | 40 | ||
41 | mode "640x480-80" | 41 | mode "640x480-80" |
42 | geometry 640 480 640 3756 8 | 42 | geometry 640 480 2048 4096 8 |
43 | timings 29761 4294967256 24 17 0 216 3 | 43 | timings 29761 4294967256 24 17 0 216 3 |
44 | endmode | 44 | endmode |
45 | 45 | ||
46 | mode "640x480-85" | 46 | mode "640x480-85" |
47 | geometry 640 480 640 3756 8 | 47 | geometry 640 480 2048 4096 8 |
48 | timings 28011 4294967256 24 17 0 216 3 | 48 | timings 28011 4294967256 24 17 0 216 3 |
49 | endmode | 49 | endmode |
50 | 50 | ||
51 | mode "800x600-50" | 51 | mode "800x600-50" |
52 | geometry 800 600 800 3221 8 | 52 | geometry 800 600 2048 4096 8 |
53 | timings 30303 96 24 14 0 136 11 | 53 | timings 30303 96 24 14 0 136 11 |
54 | endmode | 54 | endmode |
55 | 55 | ||
56 | mode "800x600-60" | 56 | mode "800x600-60" |
57 | geometry 800 600 800 3221 8 | 57 | geometry 800 600 2048 4096 8 |
58 | timings 25252 96 24 14 0 136 11 | 58 | timings 25252 96 24 14 0 136 11 |
59 | endmode | 59 | endmode |
60 | 60 | ||
61 | mode "800x600-70" | 61 | mode "800x600-70" |
62 | geometry 800 600 800 3221 8 | 62 | geometry 800 600 2048 4096 8 |
63 | timings 21645 96 24 14 0 136 11 | 63 | timings 21645 96 24 14 0 136 11 |
64 | endmode | 64 | endmode |
65 | 65 | ||
66 | mode "800x600-72" | 66 | mode "800x600-72" |
67 | geometry 800 600 800 3221 8 | 67 | geometry 800 600 2048 4096 8 |
68 | timings 21043 96 24 14 0 136 11 | 68 | timings 21043 96 24 14 0 136 11 |
69 | endmode | 69 | endmode |
70 | 70 | ||
71 | mode "800x600-75" | 71 | mode "800x600-75" |
72 | geometry 800 600 800 3221 8 | 72 | geometry 800 600 2048 4096 8 |
73 | timings 20202 96 24 14 0 136 11 | 73 | timings 20202 96 24 14 0 136 11 |
74 | endmode | 74 | endmode |
75 | 75 | ||
76 | mode "800x600-80" | 76 | mode "800x600-80" |
77 | geometry 800 600 800 3221 8 | 77 | geometry 800 600 2048 4096 8 |
78 | timings 18939 96 24 14 0 136 11 | 78 | timings 18939 96 24 14 0 136 11 |
79 | endmode | 79 | endmode |
80 | 80 | ||
81 | mode "800x600-85" | 81 | mode "800x600-85" |
82 | geometry 800 600 800 3221 8 | 82 | geometry 800 600 2048 4096 8 |
83 | timings 17825 96 24 14 0 136 11 | 83 | timings 17825 96 24 14 0 136 11 |
84 | endmode | 84 | endmode |
85 | 85 | ||
86 | mode "1024x768-50" | 86 | mode "1024x768-50" |
87 | geometry 1024 768 1024 2815 8 | 87 | geometry 1024 768 2048 4096 8 |
88 | timings 19054 144 24 29 0 120 3 | 88 | timings 19054 144 24 29 0 120 3 |
89 | endmode | 89 | endmode |
90 | 90 | ||
91 | mode "1024x768-60" | 91 | mode "1024x768-60" |
92 | geometry 1024 768 1024 2815 8 | 92 | geometry 1024 768 2048 4096 8 |
93 | timings 15880 144 24 29 0 120 3 | 93 | timings 15880 144 24 29 0 120 3 |
94 | endmode | 94 | endmode |
95 | 95 | ||
96 | mode "1024x768-70" | 96 | mode "1024x768-70" |
97 | geometry 1024 768 1024 2815 8 | 97 | geometry 1024 768 2048 4096 8 |
98 | timings 13610 144 24 29 0 120 3 | 98 | timings 13610 144 24 29 0 120 3 |
99 | endmode | 99 | endmode |
100 | 100 | ||
101 | mode "1024x768-72" | 101 | mode "1024x768-72" |
102 | geometry 1024 768 1024 2815 8 | 102 | geometry 1024 768 2048 4096 8 |
103 | timings 13232 144 24 29 0 120 3 | 103 | timings 13232 144 24 29 0 120 3 |
104 | endmode | 104 | endmode |
105 | 105 | ||
106 | mode "1024x768-75" | 106 | mode "1024x768-75" |
107 | geometry 1024 768 1024 2815 8 | 107 | geometry 1024 768 2048 4096 8 |
108 | timings 12703 144 24 29 0 120 3 | 108 | timings 12703 144 24 29 0 120 3 |
109 | endmode | 109 | endmode |
110 | 110 | ||
111 | mode "1024x768-80" | 111 | mode "1024x768-80" |
112 | geometry 1024 768 1024 2815 8 | 112 | geometry 1024 768 2048 4096 8 |
113 | timings 11910 144 24 29 0 120 3 | 113 | timings 11910 144 24 29 0 120 3 |
114 | endmode | 114 | endmode |
115 | 115 | ||
116 | mode "1024x768-85" | 116 | mode "1024x768-85" |
117 | geometry 1024 768 1024 2815 8 | 117 | geometry 1024 768 2048 4096 8 |
118 | timings 11209 144 24 29 0 120 3 | 118 | timings 11209 144 24 29 0 120 3 |
119 | endmode | 119 | endmode |
120 | 120 | ||
121 | mode "1280x1024-50" | 121 | mode "1280x1024-50" |
122 | geometry 1280 1024 1280 2662 8 | 122 | geometry 1280 1024 2048 4096 8 |
123 | timings 11114 232 16 39 0 160 3 | 123 | timings 11114 232 16 39 0 160 3 |
124 | endmode | 124 | endmode |
125 | 125 | ||
126 | mode "1280x1024-60" | 126 | mode "1280x1024-60" |
127 | geometry 1280 1024 1280 2662 8 | 127 | geometry 1280 1024 2048 4096 8 |
128 | timings 9262 232 16 39 0 160 3 | 128 | timings 9262 232 16 39 0 160 3 |
129 | endmode | 129 | endmode |
130 | 130 | ||
131 | mode "1280x1024-70" | 131 | mode "1280x1024-70" |
132 | geometry 1280 1024 1280 2662 8 | 132 | geometry 1280 1024 2048 4096 8 |
133 | timings 7939 232 16 39 0 160 3 | 133 | timings 7939 232 16 39 0 160 3 |
134 | endmode | 134 | endmode |
135 | 135 | ||
136 | mode "1280x1024-72" | 136 | mode "1280x1024-72" |
137 | geometry 1280 1024 1280 2662 8 | 137 | geometry 1280 1024 2048 4096 8 |
138 | timings 7719 232 16 39 0 160 3 | 138 | timings 7719 232 16 39 0 160 3 |
139 | endmode | 139 | endmode |
140 | 140 | ||
141 | mode "1280x1024-75" | 141 | mode "1280x1024-75" |
142 | geometry 1280 1024 1280 2662 8 | 142 | geometry 1280 1024 2048 4096 8 |
143 | timings 7410 232 16 39 0 160 3 | 143 | timings 7410 232 16 39 0 160 3 |
144 | endmode | 144 | endmode |
145 | 145 | ||
146 | mode "1280x1024-80" | 146 | mode "1280x1024-80" |
147 | geometry 1280 1024 1280 2662 8 | 147 | geometry 1280 1024 2048 4096 8 |
148 | timings 6946 232 16 39 0 160 3 | 148 | timings 6946 232 16 39 0 160 3 |
149 | endmode | 149 | endmode |
150 | 150 | ||
151 | mode "1280x1024-85" | 151 | mode "1280x1024-85" |
152 | geometry 1280 1024 1280 2662 8 | 152 | geometry 1280 1024 2048 4096 8 |
153 | timings 6538 232 16 39 0 160 3 | 153 | timings 6538 232 16 39 0 160 3 |
154 | endmode | 154 | endmode |
155 | |||
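Every mode in the updated fb.modes uses the same virtual geometry, 2048 x 4096 at 8 bpp. The arithmetic behind that choice can be sketched as follows. The function name is illustrative only, and the 16 and 32 bpp figures are the default virtual y resolutions quoted in the cyblafb usage notes, not values taken from fb.modes.

```python
def virtual_framebuffer_bytes(vxres, vyres, bpp):
    """Bytes needed for a virtual screen of vxres x vyres pixels at bpp bits per pixel."""
    return vxres * vyres * bpp // 8


MIB = 1024 * 1024

# Geometry used by every mode in the updated fb.modes (8 bpp).
print(virtual_framebuffer_bytes(2048, 4096, 8) // MIB)   # 8
# Default virtual y resolutions for the deeper colour depths.
print(virtual_framebuffer_bytes(2048, 2048, 16) // MIB)  # 8
print(virtual_framebuffer_bytes(2048, 1024, 32) // MIB)  # 8
```

Each combination comes to 8 MiB, which is consistent with the advice in the usage notes to configure 8 MB of video memory in the BIOS setup.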
diff --git a/Documentation/fb/cyblafb/performance b/Documentation/fb/cyblafb/performance index eb4e47a9cea6..8d15d5dfc6b3 100644 --- a/Documentation/fb/cyblafb/performance +++ b/Documentation/fb/cyblafb/performance | |||
@@ -77,4 +77,3 @@ patch that speeds up kernel bitblitting a lot ( > 20%). | |||
77 | | | | | | | 77 | | | | | | |
78 | | | | | | | 78 | | | | | | |
79 | +-----------+-----------------+-----------------+-----------------+ | 79 | +-----------+-----------------+-----------------+-----------------+ |
80 | |||
diff --git a/Documentation/fb/cyblafb/todo b/Documentation/fb/cyblafb/todo index 80fb2f89b6c1..c5f6d0eae545 100644 --- a/Documentation/fb/cyblafb/todo +++ b/Documentation/fb/cyblafb/todo | |||
@@ -22,11 +22,10 @@ accelerated color blitting Who needs it? The console driver does use color | |||
22 | everything else is done using color expanding | 22 | everything else is done using color expanding |
23 | blitting of 1bpp character bitmaps. | 23 | blitting of 1bpp character bitmaps. |
24 | 24 | ||
25 | xpanning Who needs it? | ||
26 | |||
27 | ioctls Who needs it? | 25 | ioctls Who needs it? |
28 | 26 | ||
29 | TV-out Will be done later | 27 | TV-out Will be done later. Use "vga= " at boot time |
28 | to set a suitable video mode. | ||
30 | 29 | ||
31 | ??? Feel free to contact me if you have any | 30 | ??? Feel free to contact me if you have any |
32 | feature requests | 31 | feature requests |
diff --git a/Documentation/fb/cyblafb/usage b/Documentation/fb/cyblafb/usage index e627c8f54211..a39bb3d402a2 100644 --- a/Documentation/fb/cyblafb/usage +++ b/Documentation/fb/cyblafb/usage | |||
@@ -40,6 +40,16 @@ Selecting Modes | |||
40 | None of the modes possible to select as startup modes are affected by | 40 | None of the modes possible to select as startup modes are affected by |
41 | the problems described at the end of the next subsection. | 41 | the problems described at the end of the next subsection. |
42 | 42 | ||
43 | For all startup modes cyblafb chooses a virtual x resolution of 2048, | ||
44 | the only exception is mode 1280x1024 in combination with 32 bpp. This | ||
45 | allows ywrap scrolling for all those modes if rotation is 0 or 2, and | ||
46 | also fast scrolling if rotation is 1 or 3. The default virtual y reso- | ||
47 | lution is 4096 for bpp == 8, 2048 for bpp==16 and 1024 for bpp == 32, | ||
48 | again with the only exception of 1280x1024 at 32 bpp. | ||
49 | |||
50 | Please do set your video memory size to 8 Mb in the Bios setup. Other | ||
51 | values will work, but performance is decreased for a lot of modes. | ||
52 | |||
43 | Mode changes using fbset | 53 | Mode changes using fbset |
44 | ======================== | 54 | ======================== |
45 | 55 | ||
@@ -54,20 +64,26 @@ Selecting Modes | |||
54 | - if a flat panel is found, cyblafb does not allow you | 64 | - if a flat panel is found, cyblafb does not allow you |
55 | to program a resolution higher than the physical | 65 | to program a resolution higher than the physical |
56 | resolution of the flat panel monitor | 66 | resolution of the flat panel monitor |
57 | - cyblafb does not allow xres to differ from xres_virtual | ||
58 | - cyblafb does not allow vclk to exceed 230 MHz. As 32 bpp | 67 | - cyblafb does not allow vclk to exceed 230 MHz. As 32 bpp |
59 | and (currently) 24 bit modes use a doubled vclk internally, | 68 | and (currently) 24 bit modes use a doubled vclk internally, |
60 | the dotclock limit as seen by fbset is 115 MHz for those | 69 | the dotclock limit as seen by fbset is 115 MHz for those |
61 | modes and 230 MHz for 8 and 16 bpp modes. | 70 | modes and 230 MHz for 8 and 16 bpp modes. |
71 | - cyblafb will allow you to select very high resolutions as | ||
72 | long as the hardware can be programmed to these modes. The | ||
73 | documented limit 1600x1200 is not enforced, but don't expect | ||
74 | perfect signal quality. | ||
62 | 75 | ||
63 | Any request that violates the rules given above will be ignored and | 76 | Any request that violates the rules given above will be either changed |
64 | fbset will return an error. | 77 | to something the hardware supports or an error value will be returned. |
65 | 78 | ||
66 | If you program a virtual y resolution higher than the hardware limit, | 79 | If you program a virtual y resolution higher than the hardware limit, |
67 | cyblafb will silently decrease that value to the highest possible | 80 | cyblafb will silently decrease that value to the highest possible |
68 | value. | 81 | value. The same is true for a virtual x resolution that is not |
82 | supported by the hardware. Cyblafb tries to adapt vyres first because | ||
83 | vxres decides if ywrap scrolling is possible or not. | ||
69 | 84 | ||
70 | Attempts to disable acceleration are ignored. | 85 | Attempts to disable acceleration are ignored, I believe that this is |
86 | safe. | ||
71 | 87 | ||
72 | Some video modes that should work do not work as expected. If you use | 88 | Some video modes that should work do not work as expected. If you use |
73 | the standard fb.modes, fbset 640x480-60 will program that mode, but | 89 | the standard fb.modes, fbset 640x480-60 will program that mode, but |
@@ -129,10 +145,6 @@ mode 640x480 or 800x600 or 1024x768 or 1280x1024 | |||
129 | verbosity 0 is the default, increase to at least 2 for every | 145 | verbosity 0 is the default, increase to at least 2 for every |
130 | bug report! | 146 | bug report! |
131 | 147 | ||
132 | vesafb allows cyblafb to be loaded after vesafb has been | ||
133 | loaded. See sections "Module unloading ...". | ||
134 | |||
135 | |||
136 | Development hints | 148 | Development hints |
137 | ================= | 149 | ================= |
138 | 150 | ||
@@ -195,7 +207,7 @@ a graphics mode. | |||
195 | After booting, load cyblafb without any mode and bpp parameter and assign | 207 | After booting, load cyblafb without any mode and bpp parameter and assign |
196 | cyblafb to individual ttys using con2fb, e.g.: | 208 | cyblafb to individual ttys using con2fb, e.g.: |
197 | 209 | ||
198 | modprobe cyblafb vesafb=1 | 210 | modprobe cyblafb |
199 | con2fb /dev/fb1 /dev/tty1 | 211 | con2fb /dev/fb1 /dev/tty1 |
200 | 212 | ||
201 | Unloading cyblafb works without problems after you assign vesafb to all | 213 | Unloading cyblafb works without problems after you assign vesafb to all |
@@ -203,4 +215,3 @@ ttys again, e.g.: | |||
203 | 215 | ||
204 | con2fb /dev/fb0 /dev/tty1 | 216 | con2fb /dev/fb0 /dev/tty1 |
205 | rmmod cyblafb | 217 | rmmod cyblafb |
206 | |||
diff --git a/Documentation/fb/cyblafb/whatsnew b/Documentation/fb/cyblafb/whatsnew new file mode 100644 index 000000000000..76c07a26e044 --- /dev/null +++ b/Documentation/fb/cyblafb/whatsnew | |||
@@ -0,0 +1,29 @@ | |||
1 | 0.62 | ||
2 | ==== | ||
3 | |||
4 | - the vesafb parameter has been removed as I decided to allow the | ||
5 | feature without any special parameter. | ||
6 | |||
7 | - Cyblafb does not use the vga style of panning any longer, now the | ||
8 | "right view" register in the graphics engine IO space is used. Without | ||
9 | that change it was impossible to use all available memory, and without | ||
10 | access to all available memory it is impossible to ywrap. | ||
11 | |||
12 | - The imageblit function now uses hardware acceleration for all font | ||
13 | widths. Hardware blitting across pixel column 2048 is broken in the | ||
14 | cyberblade/i1 graphics core, but we work around that hardware bug. | ||
15 | |||
16 | - modes with vxres != xres are supported now. | ||
17 | |||
18 | - ywrap scrolling is supported now and the default. This is a big | ||
19 | performance gain. | ||
20 | |||
21 | - default video modes use vyres > yres and vxres > xres to allow | ||
22 | almost optimal scrolling speed for normal and rotated screens | ||
23 | |||
24 | - some features mainly useful for debugging the upper layers of the | ||
25 | framebuffer system have been added, have a look at the code | ||
26 | |||
27 | - fixed: Oops after unloading cyblafb when reading /proc/io* | ||
28 | |||
29 | - we work around some bugs of the higher framebuffer layers. | ||
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 9474501dd6cc..b4a1ea762698 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt | |||
@@ -123,6 +123,15 @@ Who: Christoph Hellwig <hch@lst.de> | |||
123 | 123 | ||
124 | --------------------------- | 124 | --------------------------- |
125 | 125 | ||
126 | What: CONFIG_FORCED_INLINING | ||
127 | When: June 2006 | ||
128 | Why: Config option is there to see if gcc is good enough. (in january | ||
129 | 2006). If it is, the behavior should just be the default. If it's not, | ||
130 | the option should just go away entirely. | ||
131 | Who: Arjan van de Ven | ||
132 | |||
133 | --------------------------- | ||
134 | |||
126 | What: START_ARRAY ioctl for md | 135 | What: START_ARRAY ioctl for md |
127 | When: July 2006 | 136 | When: July 2006 |
128 | Files: drivers/md/md.c | 137 | Files: drivers/md/md.c |
diff --git a/Documentation/filesystems/ext3.txt b/Documentation/filesystems/ext3.txt index 9840d5b8d5b9..afb1335c05d6 100644 --- a/Documentation/filesystems/ext3.txt +++ b/Documentation/filesystems/ext3.txt | |||
@@ -2,11 +2,11 @@ | |||
2 | Ext3 Filesystem | 2 | Ext3 Filesystem |
3 | =============== | 3 | =============== |
4 | 4 | ||
5 | ext3 was originally released in September 1999. Written by Stephen Tweedie | 5 | Ext3 was originally released in September 1999. Written by Stephen Tweedie |
6 | for 2.2 branch, and ported to 2.4 kernels by Peter Braam, Andreas Dilger, | 6 | for the 2.2 branch, and ported to 2.4 kernels by Peter Braam, Andreas Dilger, |
7 | Andrew Morton, Alexander Viro, Ted Ts'o and Stephen Tweedie. | 7 | Andrew Morton, Alexander Viro, Ted Ts'o and Stephen Tweedie. |
8 | 8 | ||
9 | ext3 is ext2 filesystem enhanced with journalling capabilities. | 9 | Ext3 is the ext2 filesystem enhanced with journalling capabilities. |
10 | 10 | ||
11 | Options | 11 | Options |
12 | ======= | 12 | ======= |
@@ -14,76 +14,81 @@ Options | |||
14 | When mounting an ext3 filesystem, the following option are accepted: | 14 | When mounting an ext3 filesystem, the following option are accepted: |
15 | (*) == default | 15 | (*) == default |
16 | 16 | ||
17 | jounal=update Update the ext3 file system's journal to the | 17 | journal=update Update the ext3 file system's journal to the current |
18 | current format. | 18 | format. |
19 | 19 | ||
20 | journal=inum When a journal already exists, this option is | 20 | journal=inum When a journal already exists, this option is ignored. |
21 | ignored. Otherwise, it specifies the number of | 21 | Otherwise, it specifies the number of the inode which |
22 | the inode which will represent the ext3 file | 22 | will represent the ext3 file system's journal file. |
23 | system's journal file. | 23 | |
24 | journal_dev=devnum When the external journal device's major/minor numbers | ||
25 | have changed, this option allows the user to specify | ||
26 | the new journal location. The journal device is | ||
27 | identified through its new major/minor numbers encoded | ||
28 | in devnum. | ||
24 | 29 | ||
25 | noload Don't load the journal on mounting. | 30 | noload Don't load the journal on mounting. |
26 | 31 | ||
27 | data=journal All data are committed into the journal prior | 32 | data=journal All data are committed into the journal prior to being |
28 | to being written into the main file system. | 33 | written into the main file system. |
29 | 34 | ||
30 | data=ordered (*) All data are forced directly out to the main file | 35 | data=ordered (*) All data are forced directly out to the main file |
31 | system prior to its metadata being committed to | 36 | system prior to its metadata being committed to the |
32 | the journal. | 37 | journal. |
33 | 38 | ||
34 | data=writeback Data ordering is not preserved, data may be | 39 | data=writeback Data ordering is not preserved, data may be written |
35 | written into the main file system after its | 40 | into the main file system after its metadata has been |
36 | metadata has been committed to the journal. | 41 | committed to the journal. |
37 | 42 | ||
38 | commit=nrsec (*) Ext3 can be told to sync all its data and metadata | 43 | commit=nrsec (*) Ext3 can be told to sync all its data and metadata |
39 | every 'nrsec' seconds. The default value is 5 seconds. | 44 | every 'nrsec' seconds. The default value is 5 seconds. |
40 | This means that if you lose your power, you will lose, | 45 | This means that if you lose your power, you will lose |
41 | as much, the latest 5 seconds of work (your filesystem | 46 | as much as the latest 5 seconds of work (your |
42 | will not be damaged though, thanks to journaling). This | 47 | filesystem will not be damaged though, thanks to the |
43 | default value (or any low value) will hurt performance, | 48 | journaling). This default value (or any low value) |
44 | but it's good for data-safety. Setting it to 0 will | 49 | will hurt performance, but it's good for data-safety. |
45 | have the same effect than leaving the default 5 sec. | 50 | Setting it to 0 will have the same effect as leaving |
51 | it at the default (5 seconds). | ||
46 | Setting it to very large values will improve | 52 | Setting it to very large values will improve |
47 | performance. | 53 | performance. |
48 | 54 | ||
49 | barrier=1 This enables/disables barriers. barrier=0 disables it, | 55 | barrier=1 This enables/disables barriers. barrier=0 disables |
50 | barrier=1 enables it. | 56 | it, barrier=1 enables it. |
51 | 57 | ||
52 | orlov (*) This enables the new Orlov block allocator. It's enabled | 58 | orlov (*) This enables the new Orlov block allocator. It is |
53 | by default. | 59 | enabled by default. |
54 | 60 | ||
55 | oldalloc This disables the Orlov block allocator and enables the | 61 | oldalloc This disables the Orlov block allocator and enables |
56 | old block allocator. Orlov should have better performance, | 62 | the old block allocator. Orlov should have better |
57 | we'd like to get some feedback if it's the contrary for | 63 | performance - we'd like to get some feedback if it's |
58 | you. | 64 | the contrary for you. |
59 | 65 | ||
60 | user_xattr Enables Extended User Attributes. Additionally, you need | 66 | user_xattr Enables Extended User Attributes. Additionally, you |
61 | to have extended attribute support enabled in the kernel | 67 | need to have extended attribute support enabled in the |
62 | configuration (CONFIG_EXT3_FS_XATTR). See the attr(5) | 68 | kernel configuration (CONFIG_EXT3_FS_XATTR). See the |
63 | manual page and http://acl.bestbits.at to learn more | 69 | attr(5) manual page and http://acl.bestbits.at/ to |
64 | about extended attributes. | 70 | learn more about extended attributes. |
65 | 71 | ||
66 | nouser_xattr Disables Extended User Attributes. | 72 | nouser_xattr Disables Extended User Attributes. |
67 | 73 | ||
68 | acl Enables POSIX Access Control Lists support. Additionally, | 74 | acl Enables POSIX Access Control Lists support. |
69 | you need to have ACL support enabled in the kernel | 75 | Additionally, you need to have ACL support enabled in |
70 | configuration (CONFIG_EXT3_FS_POSIX_ACL). See the acl(5) | 76 | the kernel configuration (CONFIG_EXT3_FS_POSIX_ACL). |
71 | manual page and http://acl.bestbits.at for more | 77 | See the acl(5) manual page and http://acl.bestbits.at/ |
72 | information. | 78 | for more information. |
73 | 79 | ||
74 | noacl This option disables POSIX Access Control List support. | 80 | noacl This option disables POSIX Access Control List |
81 | support. | ||
75 | 82 | ||
76 | reservation | 83 | reservation |
77 | 84 | ||
78 | noreservation | 85 | noreservation |
79 | 86 | ||
80 | resize= | ||
81 | |||
82 | bsddf (*) Make 'df' act like BSD. | 87 | bsddf (*) Make 'df' act like BSD. |
83 | minixdf Make 'df' act like Minix. | 88 | minixdf Make 'df' act like Minix. |
84 | 89 | ||
85 | check=none Don't do extra checking of bitmaps on mount. | 90 | check=none Don't do extra checking of bitmaps on mount. |
86 | nocheck | 91 | nocheck |
87 | 92 | ||
88 | debug Extra debugging information is sent to syslog. | 93 | debug Extra debugging information is sent to syslog. |
89 | 94 | ||
@@ -92,7 +97,7 @@ errors=continue Keep going on a filesystem error. | |||
92 | errors=panic Panic and halt the machine if an error occurs. | 97 | errors=panic Panic and halt the machine if an error occurs. |
93 | 98 | ||
94 | grpid Give objects the same group ID as their creator. | 99 | grpid Give objects the same group ID as their creator. |
95 | bsdgroups | 100 | bsdgroups |
96 | 101 | ||
97 | nogrpid (*) New objects have the group ID of their creator. | 102 | nogrpid (*) New objects have the group ID of their creator. |
98 | sysvgroups | 103 | sysvgroups |
@@ -103,81 +108,83 @@ resuid=n The user ID which may use the reserved blocks. | |||
103 | 108 | ||
104 | sb=n Use alternate superblock at this location. | 109 | sb=n Use alternate superblock at this location. |
105 | 110 | ||
106 | quota Quota options are currently silently ignored. | 111 | quota |
107 | noquota (see fs/ext3/super.c, line 594) | 112 | noquota |
108 | grpquota | 113 | grpquota |
109 | usrquota | 114 | usrquota |
110 | 115 | ||
111 | 116 | ||
112 | Specification | 117 | Specification |
113 | ============= | 118 | ============= |
114 | ext3 shares all disk implementation with ext2 filesystem, and add | 119 | Ext3 shares all disk implementation with the ext2 filesystem, and adds |
115 | transactions capabilities to ext2. Journaling is done by the | 120 | transactions capabilities to ext2. Journaling is done by the Journaling Block |
116 | Journaling block device layer. | 121 | Device layer. |
117 | 122 | ||
118 | Journaling Block Device layer | 123 | Journaling Block Device layer |
119 | ----------------------------- | 124 | ----------------------------- |
120 | The Journaling Block Device layer (JBD) isn't ext3 specific. It was | 125 | The Journaling Block Device layer (JBD) isn't ext3 specific. It was designed to |
121 | design to add journaling capabilities on a block device. The ext3 | 126 | add journaling capabilities on a block device. The ext3 filesystem code will |
122 | filesystem code will inform the JBD of modifications it is performing | 127 | inform the JBD of modifications it is performing (called a transaction). The |
123 | (Call a transaction). the journal support the transactions start and | 128 | journal supports the transactions start and stop, and in case of crash, the |
124 | stop, and in case of crash, the journal can replayed the transactions | 129 | journal can replay the transactions to put the partition back in a |
125 | to put the partition on a consistent state fastly. | 130 | consistent state fast. |
126 | 131 | ||
127 | handles represent a single atomic update to a filesystem. JBD can | 132 | Handles represent a single atomic update to a filesystem. JBD can handle an |
128 | handle external journal on a block device. | 133 | external journal on a block device. |
129 | 134 | ||
130 | Data Mode | 135 | Data Mode |
131 | --------- | 136 | --------- |
132 | There's 3 different data modes: | 137 | There are 3 different data modes: |
133 | 138 | ||
134 | * writeback mode | 139 | * writeback mode |
135 | In data=writeback mode, ext3 does not journal data at all. This mode | 140 | In data=writeback mode, ext3 does not journal data at all. This mode provides |
136 | provides a similar level of journaling as XFS, JFS, and ReiserFS in its | 141 | a similar level of journaling as that of XFS, JFS, and ReiserFS in its default |
137 | default mode - metadata journaling. A crash+recovery can cause | 142 | mode - metadata journaling. A crash+recovery can cause incorrect data to |
138 | incorrect data to appear in files which were written shortly before the | 143 | appear in files which were written shortly before the crash. This mode will |
139 | crash. This mode will typically provide the best ext3 performance. | 144 | typically provide the best ext3 performance. |
140 | 145 | ||
141 | * ordered mode | 146 | * ordered mode |
142 | In data=ordered mode, ext3 only officially journals metadata, but it | 147 | In data=ordered mode, ext3 only officially journals metadata, but it logically |
143 | logically groups metadata and data blocks into a single unit called a | 148 | groups metadata and data blocks into a single unit called a transaction. When |
144 | transaction. When it's time to write the new metadata out to disk, the | 149 | it's time to write the new metadata out to disk, the associated data blocks |
145 | associated data blocks are written first. In general, this mode | 150 | are written first. In general, this mode performs slightly slower than |
146 | perform slightly slower than writeback but significantly faster than | 151 | writeback but significantly faster than journal mode. |
147 | journal mode. | ||
148 | 152 | ||
149 | * journal mode | 153 | * journal mode |
150 | data=journal mode provides full data and metadata journaling. All new | 154 | data=journal mode provides full data and metadata journaling. All new data is |
151 | data is written to the journal first, and then to its final location. | 155 | written to the journal first, and then to its final location. |
152 | In the event of a crash, the journal can be replayed, bringing both | 156 | In the event of a crash, the journal can be replayed, bringing both data and |
153 | data and metadata into a consistent state. This mode is the slowest | 157 | metadata into a consistent state. This mode is the slowest except when data |
154 | except when data needs to be read from and written to disk at the same | 158 | needs to be read from and written to disk at the same time where it |
155 | time where it outperform all others mode. | 159 | outperforms all other modes. |
156 | 160 | ||
157 | Compatibility | 161 | Compatibility |
158 | ------------- | 162 | ------------- |
159 | 163 | ||
160 | Ext2 partitions can be easily convert to ext3, with `tune2fs -j <dev>`. | 164 | Ext2 partitions can be easily convert to ext3, with `tune2fs -j <dev>`. |
161 | Ext3 is fully compatible with Ext2. Ext3 partitions can easily be | 165 | Ext3 is fully compatible with Ext2. Ext3 partitions can easily be mounted as |
162 | mounted as Ext2. | 166 | Ext2. |
167 | |||
163 | 168 | ||
164 | External Tools | 169 | External Tools |
165 | ============== | 170 | ============== |
166 | see manual pages to know more. | 171 | See manual pages to learn more. |
172 | |||
173 | tune2fs: create a ext3 journal on a ext2 partition with the -j flag. | ||
174 | mke2fs: create a ext3 partition with the -j flag. | ||
175 | debugfs: ext2 and ext3 file system debugger. | ||
176 | ext2online: online (mounted) ext2 and ext3 filesystem resizer | ||
167 | 177 | ||
168 | tune2fs: create a ext3 journal on a ext2 partition with the -j flags | ||
169 | mke2fs: create a ext3 partition with the -j flags | ||
170 | debugfs: ext2 and ext3 file system debugger | ||
171 | 178 | ||
172 | References | 179 | References |
173 | ========== | 180 | ========== |
174 | 181 | ||
175 | kernel source: file:/usr/src/linux/fs/ext3 | 182 | kernel source: <file:fs/ext3/> |
176 | file:/usr/src/linux/fs/jbd | 183 | <file:fs/jbd/> |
177 | 184 | ||
178 | programs: http://e2fsprogs.sourceforge.net | 185 | programs: http://e2fsprogs.sourceforge.net/ |
186 | http://ext2resize.sourceforge.net | ||
179 | 187 | ||
180 | useful link: | 188 | useful links: http://www.zip.com.au/~akpm/linux/ext3/ext3-usage.html |
181 | http://www.zip.com.au/~akpm/linux/ext3/ext3-usage.html | ||
182 | http://www-106.ibm.com/developerworks/linux/library/l-fs7/ | 189 | http://www-106.ibm.com/developerworks/linux/library/l-fs7/ |
183 | http://www-106.ibm.com/developerworks/linux/library/l-fs8/ | 190 | http://www-106.ibm.com/developerworks/linux/library/l-fs8/ |
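The ext3 option list above states two defaults that are easy to misread: data=ordered applies when no data mode is given, and commit=0 behaves like the default of 5 seconds. A minimal Python sketch of those two rules follows. It models only the documentation text, not the kernel's option parser, and the function and constant names are hypothetical.

```python
DEFAULT_COMMIT_SECONDS = 5      # documented default for commit=nrsec
DEFAULT_DATA_MODE = "ordered"   # marked (*) as the default data mode
DATA_MODES = ("journal", "ordered", "writeback")


def effective_ext3_settings(option_string):
    """Parse a comma-separated option string and return the data mode and
    commit interval that the documentation says would apply."""
    data_mode = DEFAULT_DATA_MODE
    commit = DEFAULT_COMMIT_SECONDS
    for option in filter(None, option_string.split(",")):
        key, _, value = option.partition("=")
        if key == "data":
            if value not in DATA_MODES:
                raise ValueError("unknown data mode: " + value)
            data_mode = value
        elif key == "commit":
            # commit=0 has the same effect as leaving the default in place.
            commit = int(value) or DEFAULT_COMMIT_SECONDS
        # All other options are ignored by this sketch.
    return {"data": data_mode, "commit": commit}
```

For example, effective_ext3_settings("data=writeback,commit=0") yields writeback mode with a 5 second commit interval.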
diff --git a/Documentation/filesystems/fuse.txt b/Documentation/filesystems/fuse.txt index 6b5741e651a2..33f74310d161 100644 --- a/Documentation/filesystems/fuse.txt +++ b/Documentation/filesystems/fuse.txt | |||
@@ -86,6 +86,62 @@ Mount options | |||
86 | The default is infinite. Note that the size of read requests is | 86 | The default is infinite. Note that the size of read requests is |
87 | limited anyway to 32 pages (which is 128kbyte on i386). | 87 | limited anyway to 32 pages (which is 128kbyte on i386). |
88 | 88 | ||
89 | Sysfs | ||
90 | ~~~~~ | ||
91 | |||
92 | FUSE sets up the following hierarchy in sysfs: | ||
93 | |||
94 | /sys/fs/fuse/connections/N/ | ||
95 | |||
96 | where N is an increasing number allocated to each new connection. | ||
97 | |||
98 | For each connection the following attributes are defined: | ||
99 | |||
100 | 'waiting' | ||
101 | |||
102 | The number of requests which are waiting to be transferred to | |
103 | userspace or being processed by the filesystem daemon. If there is | ||
104 | no filesystem activity and 'waiting' is non-zero, then the | ||
105 | filesystem is hung or deadlocked. | ||
106 | |||
107 | 'abort' | ||
108 | |||
109 | Writing anything into this file will abort the filesystem | ||
110 | connection. This means that all waiting requests will be aborted and an | |
111 | error returned for all aborted and new requests. | |
112 | |||
113 | Only a privileged user may read or write these attributes. | ||
114 | |||
115 | Aborting a filesystem connection | ||
116 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
117 | |||
118 | It is possible to get into certain situations where the filesystem is | ||
119 | not responding. Reasons for this may be: | ||
120 | |||
121 | a) Broken userspace filesystem implementation | ||
122 | |||
123 | b) Network connection down | ||
124 | |||
125 | c) Accidental deadlock | ||
126 | |||
127 | d) Malicious deadlock | ||
128 | |||
129 | (For more on c) and d) see later sections) | ||
130 | |||
131 | In any of these cases it may be useful to abort the connection to | |
132 | the filesystem. There are several ways to do this: | ||
133 | |||
134 | - Kill the filesystem daemon. Works in case of a) and b) | ||
135 | |||
136 | - Kill the filesystem daemon and all users of the filesystem. Works | ||
137 | in all cases except some malicious deadlocks | ||
138 | |||
139 | - Use forced umount (umount -f). Works in all cases but only if | ||
140 | filesystem is still attached (it hasn't been lazy unmounted) | ||
141 | |||
142 | - Abort filesystem through the sysfs interface. Most powerful | ||
143 | method, always works. | ||
144 | |||
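As a rough illustration of the sysfs interface described above, the following Python sketch reads 'waiting' and writes to 'abort' for one connection directory. The helper names and the conn_dir parameter are hypothetical; only the attribute names and the /sys/fs/fuse/connections/N/ layout come from the text. It also assumes that 'waiting' reads back as a decimal number.

```python
import os


def read_waiting(conn_dir):
    """Return the 'waiting' request count of one FUSE connection directory.

    Assumption: the attribute reads back as a decimal number.
    """
    with open(os.path.join(conn_dir, "waiting")) as f:
        return int(f.read().strip())


def abort_connection(conn_dir):
    """Abort the connection. Per the text, writing anything is enough."""
    with open(os.path.join(conn_dir, "abort"), "w") as f:
        f.write("1\n")
```

A privileged user would call these with a path such as /sys/fs/fuse/connections/N, substituting the real connection number for N, and would typically abort only when read_waiting() stays non-zero while the filesystem shows no activity.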
89 | How do non-privileged mounts work? | 145 | How do non-privileged mounts work? |
90 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 146 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
91 | 147 | ||
@@ -313,3 +369,10 @@ faulted with get_user_pages(). The 'req->locked' flag indicates | |||
313 | when the copy is taking place, and interruption is delayed until | 369 | when the copy is taking place, and interruption is delayed until |
314 | this flag is unset. | 370 | this flag is unset. |
315 | 371 | ||
372 | Scenario 3 - Tricky deadlock with asynchronous read | ||
373 | --------------------------------------------------- | ||
374 | |||
375 | The same situation as above, except thread-1 will wait on page lock | ||
376 | and hence it will be uninterruptible as well. The solution is to | ||
377 | abort the connection with forced umount (if mount is attached) or | ||
378 | through the abort attribute in sysfs. | ||
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index d4773565ea2f..944cf109a6f5 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt | |||
@@ -418,7 +418,7 @@ VmallocChunk: 111088 kB | |||
418 | Dirty: Memory which is waiting to get written back to the disk | 418 | Dirty: Memory which is waiting to get written back to the disk |
419 | Writeback: Memory which is actively being written back to the disk | 419 | Writeback: Memory which is actively being written back to the disk |
420 | Mapped: files which have been mmaped, such as libraries | 420 | Mapped: files which have been mmaped, such as libraries |
421 | Slab: in-kernel data structures cache | 421 | Slab: in-kernel data structures cache |
422 | CommitLimit: Based on the overcommit ratio ('vm.overcommit_ratio'), | 422 | CommitLimit: Based on the overcommit ratio ('vm.overcommit_ratio'), |
423 | this is the total amount of memory currently available to | 423 | this is the total amount of memory currently available to |
424 | be allocated on the system. This limit is only adhered to | 424 | be allocated on the system. This limit is only adhered to |
@@ -1302,6 +1302,23 @@ VM has token based thrashing control mechanism and uses the token to prevent | |||
1302 | unnecessary page faults in thrashing situation. The unit of the value is | 1302 | unnecessary page faults in thrashing situation. The unit of the value is |
1303 | second. The value would be useful to tune thrashing behavior. | 1303 | second. The value would be useful to tune thrashing behavior. |
1304 | 1304 | ||
1305 | drop_caches | ||
1306 | ----------- | ||
1307 | |||
1308 | Writing to this will cause the kernel to drop clean caches, dentries and | ||
1309 | inodes from memory, causing that memory to become free. | ||
1310 | |||
1311 | To free pagecache: | ||
1312 | echo 1 > /proc/sys/vm/drop_caches | ||
1313 | To free dentries and inodes: | ||
1314 | echo 2 > /proc/sys/vm/drop_caches | ||
1315 | To free pagecache, dentries and inodes: | ||
1316 | echo 3 > /proc/sys/vm/drop_caches | ||
1317 | |||
1318 | This is a non-destructive operation, but dirty objects are not freeable, so the | |
1319 | user should run `sync' first. | |
1320 | |||
1321 | |||
1305 | 2.5 /proc/sys/dev - Device specific parameters | 1322 | 2.5 /proc/sys/dev - Device specific parameters |
1306 | ---------------------------------------------- | 1323 | ---------------------------------------------- |
1307 | 1324 | ||
diff --git a/Documentation/filesystems/ramfs-rootfs-initramfs.txt b/Documentation/filesystems/ramfs-rootfs-initramfs.txt index b3404a032596..60ab61e54e8a 100644 --- a/Documentation/filesystems/ramfs-rootfs-initramfs.txt +++ b/Documentation/filesystems/ramfs-rootfs-initramfs.txt | |||
@@ -143,12 +143,26 @@ as the following example: | |||
143 | dir /mnt 755 0 0 | 143 | dir /mnt 755 0 0 |
144 | file /init initramfs/init.sh 755 0 0 | 144 | file /init initramfs/init.sh 755 0 0 |
145 | 145 | ||
146 | Run "usr/gen_init_cpio" (after the kernel build) to get a usage message | ||
147 | documenting the above file format. | ||
148 | |||
146 | One advantage of the text file is that root access is not required to | 149 | One advantage of the text file is that root access is not required to |
147 | set permissions or create device nodes in the new archive. (Note that those | 150 | set permissions or create device nodes in the new archive. (Note that those |
148 | two example "file" entries expect to find files named "init.sh" and "busybox" in | 151 | two example "file" entries expect to find files named "init.sh" and "busybox" in |
149 | a directory called "initramfs", under the linux-2.6.* directory. See | 152 | a directory called "initramfs", under the linux-2.6.* directory. See |
150 | Documentation/early-userspace/README for more details.) | 153 | Documentation/early-userspace/README for more details.) |
151 | 154 | ||
155 | The kernel does not depend on external cpio tools; gen_init_cpio is created | |
156 | from usr/gen_init_cpio.c which is entirely self-contained, and the kernel's | ||
157 | boot-time extractor is also (obviously) self-contained. However, if you _do_ | ||
158 | happen to have cpio installed, the following command line can extract the | ||
159 | generated cpio image back into its component files: | ||
160 | |||
161 | cpio -i -d -H newc -F initramfs_data.cpio --no-absolute-filenames | ||
162 | |||
163 | Contents of initramfs: | ||
164 | ---------------------- | ||
165 | |||
152 | If you don't already understand what shared libraries, devices, and paths | 166 | If you don't already understand what shared libraries, devices, and paths |
153 | you need to get a minimal root filesystem up and running, here are some | 167 | you need to get a minimal root filesystem up and running, here are some |
154 | references: | 168 | references: |
@@ -161,13 +175,69 @@ designed to be a tiny C library to statically link early userspace | |||
161 | code against, along with some related utilities. It is BSD licensed. | 175 | code against, along with some related utilities. It is BSD licensed. |
162 | 176 | ||
163 | I use uClibc (http://www.uclibc.org) and busybox (http://www.busybox.net) | 177 | I use uClibc (http://www.uclibc.org) and busybox (http://www.busybox.net) |
164 | myself. These are LGPL and GPL, respectively. | 178 | myself. These are LGPL and GPL, respectively. (A self-contained initramfs |
179 | package is planned for the busybox 1.2 release.) | ||
165 | 180 | ||
166 | In theory you could use glibc, but that's not well suited for small embedded | 181 | In theory you could use glibc, but that's not well suited for small embedded |
167 | uses like this. (A "hello world" program statically linked against glibc is | 182 | uses like this. (A "hello world" program statically linked against glibc is |
168 | over 400k. With uClibc it's 7k. Also note that glibc dlopens libnss to do | 183 | over 400k. With uClibc it's 7k. Also note that glibc dlopens libnss to do |
169 | name lookups, even when otherwise statically linked.) | 184 | name lookups, even when otherwise statically linked.) |
170 | 185 | ||
186 | Why cpio rather than tar? | ||
187 | ------------------------- | ||
188 | |||
189 | This decision was made back in December, 2001. The discussion started here: | ||
190 | |||
191 | http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1538.html | ||
192 | |||
193 | And spawned a second thread (specifically on tar vs cpio), starting here: | ||
194 | |||
195 | http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1587.html | ||
196 | |||
197 | The quick and dirty summary version (which is no substitute for reading | ||
198 | the above threads) is: | ||
199 | |||
200 | 1) cpio is a standard. It's decades old (from the AT&T days), and already | ||
201 | widely used on Linux (inside RPM, Red Hat's device driver disks). Here's | ||
202 | a Linux Journal article about it from 1996: | ||
203 | |||
204 | http://www.linuxjournal.com/article/1213 | ||
205 | |||
206 | It's not as popular as tar because the traditional cpio command line tools | ||
207 | require _truly_hideous_ command line arguments. But that says nothing | ||
208 | either way about the archive format, and there are alternative tools, | ||
209 | such as: | ||
210 | |||
211 | http://freshmeat.net/projects/afio/ | ||
212 | |||
213 | 2) The cpio archive format chosen by the kernel is simpler and cleaner (and | ||
214 | thus easier to create and parse) than any of the (literally dozens of) | ||
215 | various tar archive formats. The complete initramfs archive format is | ||
216 | explained in buffer-format.txt, created in usr/gen_init_cpio.c, and | ||
217 | extracted in init/initramfs.c. All three together come to less than 26k | ||
218 | total of human-readable text. | ||
219 | |||
220 | 3) The GNU project standardizing on tar is approximately as relevant as | ||
221 | Windows standardizing on zip. Linux is not part of either, and is free | ||
222 | to make its own technical decisions. | ||
223 | |||
224 | 4) Since this is a kernel internal format, it could easily have been | ||
225 | something brand new. The kernel provides its own tools to create and | ||
226 | extract this format anyway. Using an existing standard was preferable, | ||
227 | but not essential. | ||
228 | |||
229 | 5) Al Viro made the decision (quote: "tar is ugly as hell and not going to be | ||
230 | supported on the kernel side"): | ||
231 | |||
232 | http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1540.html | ||
233 | |||
234 | explained his reasoning: | ||
235 | |||
236 | http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1550.html | ||
237 | http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1638.html | ||
238 | |||
239 | and, most importantly, designed and implemented the initramfs code. | ||
240 | |||
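To give a feel for how little parsing the chosen cpio variant needs (point 2 above), here is a small Python sketch that lists the entries of a "newc" archive held in memory. It is only an illustration, not the kernel's extractor. The layout it assumes (a 6-byte magic "070701", thirteen 8-digit hex fields, names and data padded to 4 bytes, and a final "TRAILER!!!" entry) is my reading of the format, and buffer-format.txt remains the authoritative description. The function names are hypothetical.

```python
NEWC_MAGIC = b"070701"
HEADER_LEN = 110  # 6-byte magic + 13 fields of 8 hex digits each


def _pad4(offset):
    """Round an offset up to the next multiple of four."""
    return (offset + 3) & ~3


def list_newc(data):
    """Return [(name, mode, size)] for every entry before the trailer."""
    entries = []
    pos = 0
    while pos + HEADER_LEN <= len(data):
        header = data[pos:pos + HEADER_LEN]
        if header[:6] != NEWC_MAGIC:
            raise ValueError("bad magic at offset %d" % pos)
        # Fields in order: ino, mode, uid, gid, nlink, mtime, filesize,
        # devmajor, devminor, rdevmajor, rdevminor, namesize, check.
        fields = [int(header[6 + 8 * i:14 + 8 * i], 16) for i in range(13)]
        mode, filesize, namesize = fields[1], fields[6], fields[11]
        name_start = pos + HEADER_LEN
        # namesize counts the terminating NUL, which is dropped here.
        name = data[name_start:name_start + namesize - 1].decode()
        if name == "TRAILER!!!":
            break
        data_start = _pad4(name_start + namesize)
        entries.append((name, mode, filesize))
        pos = _pad4(data_start + filesize)
    return entries
```

Feeding it the bytes of an uncompressed initramfs_data.cpio should yield one tuple per file, directory or device node in the image.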
171 | Future directions: | 241 | Future directions: |
172 | ------------------ | 242 | ------------------ |
173 | 243 | ||
diff --git a/Documentation/filesystems/relayfs.txt b/Documentation/filesystems/relayfs.txt index d803abed29f0..5832377b7340 100644 --- a/Documentation/filesystems/relayfs.txt +++ b/Documentation/filesystems/relayfs.txt | |||
@@ -44,30 +44,41 @@ relayfs can operate in a mode where it will overwrite data not yet | |||
44 | collected by userspace, and not wait for it to consume it. | 44 | collected by userspace, and not wait for it to consume it. |
45 | 45 | ||
46 | relayfs itself does not provide for communication of such data between | 46 | relayfs itself does not provide for communication of such data between |
47 | userspace and kernel, allowing the kernel side to remain simple and not | 47 | userspace and kernel, allowing the kernel side to remain simple and |
48 | impose a single interface on userspace. It does provide a separate | 48 | not impose a single interface on userspace. It does provide a set of |
49 | helper though, described below. | 49 | examples and a separate helper though, described below. |
50 | |||
51 | klog and relay-apps example code | ||
52 | ================================ | ||
53 | |||
54 | relayfs itself is ready to use, but to make things easier, a couple of | |
55 | simple utility functions and a set of examples are provided. | ||
56 | |||
57 | The relay-apps example tarball, available on the relayfs sourceforge | ||
58 | site, contains a set of self-contained examples, each consisting of a | ||
59 | pair of .c files containing boilerplate code for each of the user and | ||
60 | kernel sides of a relayfs application; combined, these two sets of | |
61 | boilerplate code provide glue to easily stream data to disk, without | ||
62 | having to bother with mundane housekeeping chores. | ||
63 | |||
64 | The 'klog debugging functions' patch (klog.patch in the relay-apps | ||
65 | tarball) provides a couple of high-level logging functions to the | ||
66 | kernel which allow writing formatted text or raw data to a channel, | ||
67 | regardless of whether a channel to write into exists or not, or | ||
68 | whether relayfs is compiled into the kernel or is configured as a | ||
69 | module. These functions allow you to put unconditional 'trace' | ||
70 | statements anywhere in the kernel or kernel modules; only when there | ||
71 | is a 'klog handler' registered will data actually be logged (see the | ||
72 | klog and kleak examples for details). | ||
73 | |||
74 | It is of course possible to use relayfs from scratch i.e. without | ||
75 | using any of the relay-apps example code or klog, but you'll have to | ||
76 | implement communication between userspace and kernel, allowing both to | ||
77 | convey the state of buffers (full, empty, amount of padding). | ||
78 | |||
79 | klog and the relay-apps examples can be found in the relay-apps | ||
80 | tarball on http://relayfs.sourceforge.net | ||
50 | 81 | ||
51 | klog, relay-app & librelay | ||
52 | ========================== | ||
53 | |||
54 | relayfs itself is ready to use, but to make things easier, two | ||
55 | additional systems are provided. klog is a simple wrapper to make | ||
56 | writing formatted text or raw data to a channel simpler, regardless of | ||
57 | whether a channel to write into exists or not, or whether relayfs is | ||
58 | compiled into the kernel or is configured as a module. relay-app is | ||
59 | the kernel counterpart of userspace librelay.c, combined these two | ||
60 | files provide glue to easily stream data to disk, without having to | ||
61 | bother with housekeeping. klog and relay-app can be used together, | ||
62 | with klog providing high-level logging functions to the kernel and | ||
63 | relay-app taking care of kernel-user control and disk-logging chores. | ||
64 | |||
65 | It is possible to use relayfs without relay-app & librelay, but you'll | ||
66 | have to implement communication between userspace and kernel, allowing | ||
67 | both to convey the state of buffers (full, empty, amount of padding). | ||
68 | |||
69 | klog, relay-app and librelay can be found in the relay-apps tarball on | ||
70 | http://relayfs.sourceforge.net | ||
71 | 82 | ||
72 | The relayfs user space API | 83 | The relayfs user space API |
73 | ========================== | 84 | ========================== |
@@ -125,6 +136,8 @@ Here's a summary of the API relayfs provides to in-kernel clients: | |||
125 | relay_reset(chan) | 136 | relay_reset(chan) |
126 | relayfs_create_dir(name, parent) | 137 | relayfs_create_dir(name, parent) |
127 | relayfs_remove_dir(dentry) | 138 | relayfs_remove_dir(dentry) |
139 | relayfs_create_file(name, parent, mode, fops, data) | ||
140 | relayfs_remove_file(dentry) | ||
128 | 141 | ||
129 | channel management typically called on instigation of userspace: | 142 | channel management typically called on instigation of userspace: |
130 | 143 | ||
@@ -141,6 +154,8 @@ Here's a summary of the API relayfs provides to in-kernel clients: | |||
141 | subbuf_start(buf, subbuf, prev_subbuf, prev_padding) | 154 | subbuf_start(buf, subbuf, prev_subbuf, prev_padding) |
142 | buf_mapped(buf, filp) | 155 | buf_mapped(buf, filp) |
143 | buf_unmapped(buf, filp) | 156 | buf_unmapped(buf, filp) |
157 | create_buf_file(filename, parent, mode, buf, is_global) | ||
158 | remove_buf_file(dentry) | ||
144 | 159 | ||
145 | helper functions: | 160 | helper functions: |
146 | 161 | ||
@@ -320,6 +335,71 @@ forces a sub-buffer switch on all the channel buffers, and can be used | |||
320 | to finalize and process the last sub-buffers before the channel is | 335 | to finalize and process the last sub-buffers before the channel is |
321 | closed. | 336 | closed. |
322 | 337 | ||
338 | Creating non-relay files | ||
339 | ------------------------ | ||
340 | |||
341 | relay_open() automatically creates files in the relayfs filesystem to | ||
342 | represent the per-cpu kernel buffers; it's often useful for | ||
343 | applications to be able to create their own files alongside the relay | ||
344 | files in the relayfs filesystem as well e.g. 'control' files much like | ||
345 | those created in /proc or debugfs for similar purposes, used to | ||
346 | communicate control information between the kernel and user sides of a | ||
347 | relayfs application. For this purpose the relayfs_create_file() and | ||
348 | relayfs_remove_file() API functions exist. For relayfs_create_file(), | ||
349 | the caller passes in a set of user-defined file operations to be used | ||
350 | for the file and an optional void * to a user-specified data item, | ||
351 | which will be accessible via inode->u.generic_ip (see the relay-apps | ||
352 | tarball for examples). The file_operations are a required parameter | ||
353 | to relayfs_create_file() and thus the semantics of these files are | ||
354 | completely defined by the caller. | ||
355 | |||
356 | See the relay-apps tarball at http://relayfs.sourceforge.net for | ||
357 | examples of how these non-relay files are meant to be used. | ||
358 | |||
359 | Creating relay files in other filesystems | ||
360 | ----------------------------------------- | ||
361 | |||
362 | By default of course, relay_open() creates relay files in the relayfs | ||
363 | filesystem. Because relay_file_operations is exported, however, it's | ||
364 | also possible to create and use relay files in other pseudo-filesystems | |
365 | such as debugfs. | ||
366 | |||
367 | For this purpose, two callback functions are provided, | ||
368 | create_buf_file() and remove_buf_file(). create_buf_file() is called | ||
369 | once for each per-cpu buffer from relay_open() to allow the client to | ||
370 | create a file to be used to represent the corresponding buffer; if | ||
371 | this callback is not defined, the default implementation will create | ||
372 | and return a file in the relayfs filesystem to represent the buffer. | ||
373 | The callback should return the dentry of the file created to represent | ||
374 | the relay buffer. Note that the parent directory passed to | ||
375 | relay_open() (and passed along to the callback), if specified, must | ||
376 | exist in the same filesystem the new relay file is created in. If | ||
377 | create_buf_file() is defined, remove_buf_file() must also be defined; | ||
378 | it's responsible for deleting the file(s) created in create_buf_file() | ||
379 | and is called during relay_close(). | ||
380 | |||
381 | The create_buf_file() implementation can also be defined in such a way | ||
382 | as to allow the creation of a single 'global' buffer instead of the | ||
383 | default per-cpu set. This can be useful for applications interested | ||
384 | mainly in seeing the relative ordering of system-wide events without | ||
385 | the need to bother with saving explicit timestamps for the purpose of | ||
386 | merging/sorting per-cpu files in a postprocessing step. | ||
387 | |||
388 | To have relay_open() create a global buffer, the create_buf_file() | ||
389 | implementation should set the value of the is_global outparam to a | ||
390 | non-zero value in addition to creating the file that will be used to | ||
391 | represent the single buffer. In the case of a global buffer, | ||
392 | create_buf_file() and remove_buf_file() will be called only once. The | ||
393 | normal channel-writing functions e.g. relay_write() can still be used | ||
394 | - writes from any cpu will transparently end up in the global buffer - | ||
395 | but since it is a global buffer, callers should make sure they use the | ||
396 | proper locking for such a buffer, either by wrapping writes in a | ||
397 | spinlock, or by copying a write function from relayfs_fs.h and | ||
398 | creating a local version that internally does the proper locking. | ||
399 | |||
400 | See the 'exported-relayfile' examples in the relay-apps tarball for | ||
401 | examples of creating and using relay files in debugfs. | ||
402 | |||
323 | Misc | 403 | Misc |
324 | ---- | 404 | ---- |
325 | 405 | ||
diff --git a/Documentation/filesystems/spufs.txt b/Documentation/filesystems/spufs.txt new file mode 100644 index 000000000000..8edc3952eff4 --- /dev/null +++ b/Documentation/filesystems/spufs.txt | |||
@@ -0,0 +1,521 @@ | |||
1 | SPUFS(2) Linux Programmer's Manual SPUFS(2) | ||
2 | |||
3 | |||
4 | |||
5 | NAME | ||
6 | spufs - the SPU file system | ||
7 | |||
8 | |||
9 | DESCRIPTION | ||
10 | The SPU file system is used on PowerPC machines that implement the Cell | ||
11 | Broadband Engine Architecture in order to access Synergistic Processor | ||
12 | Units (SPUs). | ||
13 | |||
14 | The file system provides a name space similar to POSIX shared memory or | |
15 | message queues. Users that have write permissions on the file system | ||
16 | can use spu_create(2) to establish SPU contexts in the spufs root. | ||
17 | |||
18 | Every SPU context is represented by a directory containing a predefined | ||
19 | set of files. These files can be used for manipulating the state of the | ||
20 | logical SPU. Users can change permissions on those files, but not actu- | ||
21 | ally add or remove files. | ||
22 | |||
23 | |||
24 | MOUNT OPTIONS | ||
25 | uid=<uid> | ||
26 | set the user owning the mount point, the default is 0 (root). | ||
27 | |||
28 | gid=<gid> | ||
29 | set the group owning the mount point, the default is 0 (root). | ||
30 | |||
31 | |||
32 | FILES | ||
33 | The files in spufs mostly follow the standard behavior for regular sys- | ||
34 | tem calls like read(2) or write(2), but often support only a subset of | ||
35 | the operations supported on regular file systems. This list details the | ||
36 | supported operations and the deviations from the behaviour in the | ||
37 | respective man pages. | ||
38 | |||
39 | All files that support the read(2) operation also support readv(2) and | ||
40 | all files that support the write(2) operation also support writev(2). | ||
41 | All files support the access(2) and stat(2) family of operations, but | ||
42 | only the st_mode, st_nlink, st_uid and st_gid fields of struct stat | ||
43 | contain reliable information. | ||
44 | |||
45 | All files support the chmod(2)/fchmod(2) and chown(2)/fchown(2) opera- | ||
46 | tions, but will not be able to grant permissions that contradict the | ||
47 | possible operations, e.g. read access on the wbox file. | ||
48 | |||
49 | The current set of files is: | ||
50 | |||
51 | |||
52 | /mem | ||
53 | the contents of the local storage memory of the SPU. This can be | ||
54 | accessed like a regular shared memory file and contains both code and | ||
55 | data in the address space of the SPU. The possible operations on an | ||
56 | open mem file are: | ||
57 | |||
58 | read(2), pread(2), write(2), pwrite(2), lseek(2) | ||
59 | These operate as documented, with the exception that lseek(2), | |
60 | write(2) and pwrite(2) are not supported beyond the end of the | ||
61 | file. The file size is the size of the local storage of the SPU, | ||
62 | which normally is 256 kilobytes. | ||
63 | |||
64 | mmap(2) | ||
65 | Mapping mem into the process address space gives access to the | ||
66 | SPU local storage within the process address space. Only | ||
67 | MAP_SHARED mappings are allowed. | ||
68 | |||
69 | |||
70 | /mbox | ||
71 | The first SPU to CPU communication mailbox. This file is read-only and | ||
72 | can be read in units of 32 bits. The file can only be used in | |
73 | non-blocking mode, and even poll() will not block on it. The possible | |
74 | operations on an open mbox file are: | ||
75 | |||
76 | read(2) | ||
77 | If a count smaller than four is requested, read returns -1 and | ||
78 | sets errno to EINVAL. If there is no data available in the mail | ||
79 | box, the return value is set to -1 and errno becomes EAGAIN. | ||
80 | When data has been read successfully, four bytes are placed in | ||
81 | the data buffer and the value four is returned. | ||
82 | |||
83 | |||
84 | /ibox | ||
85 | The second SPU to CPU communication mailbox. This file is similar to | ||
86 | the first mailbox file, but can be read in blocking I/O mode, and the | ||
87 | poll family of system calls can be used to wait for it. The possible | |
88 | operations on an open ibox file are: | ||
89 | |||
90 | read(2) | ||
91 | If a count smaller than four is requested, read returns -1 and | ||
92 | sets errno to EINVAL. If there is no data available in the mail | ||
93 | box and the file descriptor has been opened with O_NONBLOCK, the | ||
94 | return value is set to -1 and errno becomes EAGAIN. | ||
95 | |||
96 | If there is no data available in the mail box and the file | ||
97 | descriptor has been opened without O_NONBLOCK, the call will | ||
98 | block until the SPU writes to its interrupt mailbox channel. | ||
99 | When data has been read successfully, four bytes are placed in | ||
100 | the data buffer and the value four is returned. | ||
101 | |||
102 | poll(2) | ||
103 | Poll on the ibox file returns (POLLIN | POLLRDNORM) whenever | ||
104 | data is available for reading. | ||
105 | |||
106 | |||
107 | /wbox | ||
108 | The CPU to SPU communication mailbox. It is write-only and can be written | |
109 | in units of 32 bits. If the mailbox is full, write() will block and | |
110 | poll can be used to wait for it becoming empty again. The possible | |
111 | operations on an open wbox file are: | |
112 | |||
113 | write(2) | |
114 | If a count smaller than four is requested, write returns -1 and | |
115 | sets errno to EINVAL. If there is no space available in the mail | |
116 | box and the file descriptor has been opened with O_NONBLOCK, the | |
117 | return value is set to -1 and errno becomes EAGAIN. If the file | |
118 | descriptor has been opened without O_NONBLOCK, the call will | |
119 | block until the SPU reads from its PPE mailbox channel. When | |
120 | data has been written successfully, four bytes are taken from | |
121 | the data buffer and the value four is returned. | |
122 | |||
123 | poll(2) | |
124 | Poll on the wbox file returns (POLLOUT | POLLWRNORM) whenever | |
125 | space is available for writing. | |
126 | |||
127 | |||
128 | /mbox_stat | ||
129 | /ibox_stat | ||
130 | /wbox_stat | ||
131 | Read-only files that contain the length of the current queue, i.e. how | ||
132 | many words can be read from mbox or ibox or how many words can be | ||
133 | written to wbox without blocking. The files can be read only in 4-byte | ||
134 | units and return a big-endian binary integer number. The possible | ||
135 | operations on an open *box_stat file are: | ||
136 | |||
137 | read(2) | ||
138 | If a count smaller than four is requested, read returns -1 and | ||
139 | sets errno to EINVAL. Otherwise, a four byte value is placed in | ||
140 | the data buffer, containing the number of elements that can be | ||
141 | read from (for mbox_stat and ibox_stat) or written to (for | ||
142 | wbox_stat) the respective mail box without blocking or resulting | ||
143 | in EAGAIN. | ||
144 | |||
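Since the *box_stat files return a big-endian binary integer rather than text, reading one takes a single four-byte read plus an unpack. A minimal Python sketch follows; the function name is hypothetical.

```python
import os
import struct


def read_box_stat(path):
    """Return the queue length reported by an mbox_stat/ibox_stat/wbox_stat file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        raw = os.read(fd, 4)  # a count below four would fail with EINVAL
    finally:
        os.close(fd)
    if len(raw) != 4:
        raise IOError("short read from %s" % path)
    return struct.unpack(">I", raw)[0]  # big-endian, as stated in the text
```

For wbox_stat the result is the number of words that can be written without blocking. For mbox_stat and ibox_stat it is the number of words that can be read.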
145 | |||
146 | /npc | ||
147 | /decr | ||
148 | /decr_status | ||
149 | /spu_tag_mask | ||
150 | /event_mask | ||
151 | /srr0 | ||
152 | Internal registers of the SPU. The representation is an ASCII string | ||
153 | with the numeric value of the respective register. These | |
154 | can be used in read/write mode for debugging, but normal operation of | ||
155 | programs should not rely on them because access to any of them except | ||
156 | npc requires an SPU context save and is therefore very inefficient. | ||
157 | |||
158 | The contents of these files are: | ||
159 | |||
160 | npc Next Program Counter | ||
161 | |||
162 | decr SPU Decrementer | ||
163 | |||
164 | decr_status Decrementer Status | ||
165 | |||
166 | spu_tag_mask MFC tag mask for SPU DMA | ||
167 | |||
168 | event_mask Event mask for SPU interrupts | ||
169 | |||
170 | srr0 Interrupt Return address register | ||
171 | |||
172 | |||
173 | The possible operations on an open npc, decr, decr_status, | ||
174 | spu_tag_mask, event_mask or srr0 file are: | ||
175 | |||
176 | read(2) | ||
177 | When the count supplied to the read call is shorter than the | ||
178 | required length for the pointer value plus a newline character, | ||
179 | subsequent reads from the same file descriptor will result in | ||
180 | completing the string, regardless of changes to the register by | ||
181 | a running SPU task. When a complete string has been read, all | ||
182 | subsequent read operations will return zero bytes and a new file | ||
183 | descriptor needs to be opened to read the value again. | ||
184 | |||
185 | write(2) | ||
186 | A write operation on the file results in setting the register to | ||
187 | the value given in the string. The string is parsed from the | ||
188 | beginning to the first non-numeric character or the end of the | ||
189 | buffer. Subsequent writes to the same file descriptor overwrite | ||
190 | the previous setting. | ||
191 | |||
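The string-based register files can be handled as in the following Python sketch. The helper names are hypothetical. The text does not say which number base the ASCII string uses, so the reader lets Python auto-detect it (plain decimal, or hex with a 0x prefix), and the writer assumes that a decimal string is accepted.

```python
import os


def read_register(path):
    """Read a register file to end-of-file and parse the ASCII value.

    A fresh descriptor is opened on every call because, once the whole
    string has been read, further reads on the same descriptor return
    zero bytes.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        chunks = []
        while True:
            chunk = os.read(fd, 32)
            if not chunk:  # zero bytes: the complete string has been read
                break
            chunks.append(chunk)
    finally:
        os.close(fd)
    # Assumption: base 0 auto-detects decimal or 0x-prefixed hex.
    return int(b"".join(chunks).decode("ascii").strip(), 0)


def write_register(path, value):
    """Set a register from an integer (assumption: decimal is accepted)."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.write(fd, b"%d\n" % value)
    finally:
        os.close(fd)
```

As the text notes, every file here except npc forces an SPU context save, so helpers like these belong in debugging tools rather than in the normal execution path of a program.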
192 | |||
193 | /fpcr | ||
194 | This file gives access to the Floating Point Status and Control Regis- | ||
195 | ter as a four byte long file. The operations on the fpcr file are: | ||
196 | |||
197 | read(2) | ||
198 | If a count smaller than four is requested, read returns -1 and | ||
199 | sets errno to EINVAL. Otherwise, a four byte value is placed in | ||
200 | the data buffer, containing the current value of the fpcr regis- | ||
201 | ter. | ||
202 | |||
203 | write(2) | ||
204 | If a count smaller than four is requested, write returns -1 and | ||
205 | sets errno to EINVAL. Otherwise, a four byte value is copied | ||
206 | from the data buffer, updating the value of the fpcr register. | ||
207 | |||
208 | |||
209 | /signal1 | ||
210 | /signal2 | ||
211 | The two signal notification channels of an SPU. These are read-write | ||
212 | files that operate on a 32 bit word. Writing to one of these files | ||
213 | triggers an interrupt on the SPU. The value written to the signal | |
214 | files can be read from the SPU through a channel read or from host user | ||
215 | space through the file. After the value has been read by the SPU, it | ||
216 | is reset to zero. The possible operations on an open signal1 or sig- | ||
217 | nal2 file are: | ||
218 | |||
219 | read(2) | ||
220 | If a count smaller than four is requested, read returns -1 and | ||
221 | sets errno to EINVAL. Otherwise, a four byte value is placed in | ||
222 | the data buffer, containing the current value of the specified | ||
223 | signal notification register. | ||
224 | |||
225 | write(2) | ||
226 | If a count smaller than four is requested, write returns -1 and | ||
227 | sets errno to EINVAL. Otherwise, a four byte value is copied | ||
228 | from the data buffer, updating the value of the specified signal | ||
229 | notification register. The signal notification register will | ||
230 | either be replaced with the input data or will be updated to the | ||
231 | bitwise OR of the old value and the input data, depending on the | ||
232 | contents of the signal1_type, or signal2_type respectively, | ||
233 | file. | ||
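The two update modes can be modelled in a few lines of C. This is a host-side sketch of the register update only, not kernel or SPU code; signal_apply and its mode argument (0 for overwrite, 1 for logical OR, following the values stored in the signal1_type and signal2_type files) are hypothetical names introduced for illustration.

```c
#include <stdint.h>

/*
 * Model of how a write to signal1/signal2 changes the 32 bit signal
 * notification register. Hypothetical helper, for illustration only.
 *   mode 0: overwrite, the register is replaced by the written data
 *   mode 1: logical OR, the written bits accumulate in the register
 */
uint32_t signal_apply(uint32_t old_value, uint32_t data, int mode)
{
	if (mode == 1)
		return old_value | data;	/* accumulate bits */
	return data;				/* replace contents */
}
```

In OR mode several senders can each raise their own bit without losing bits set earlier, as long as the SPU has not yet read (and thereby cleared) the register.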
234 | |||
235 | |||
236 | /signal1_type | ||
237 | /signal2_type | ||
238 | These two files change the behavior of the signal1 and signal2 notifi- | ||
239 | cation files. They contain a numerical ASCII string which is read as | ||
240 | either "1" or "0". In mode 0 (overwrite), the hardware replaces the | ||
241 | contents of the signal channel with the data that is written to it. In | ||
242 | mode 1 (logical OR), the hardware accumulates the bits that are subse- | ||
243 | quently written to it. The possible operations on an open signal1_type | ||
244 | or signal2_type file are: | ||
245 | |||
246 | read(2) | ||
247 | When the count supplied to the read call is shorter than the | ||
248 | required length for the digit plus a newline character, subse- | ||
249 | quent reads from the same file descriptor will result in com- | ||
250 | pleting the string. When a complete string has been read, all | ||
251 | subsequent read operations will return zero bytes and a new file | ||
252 | descriptor needs to be opened to read the value again. | ||
253 | |||
254 | write(2) | ||
255 | A write operation on the file results in setting the register to | ||
256 | the value given in the string. The string is parsed from the | ||
257 | beginning to the first non-numeric character or the end of the | ||
258 | buffer. Subsequent writes to the same file descriptor overwrite | ||
259 | the previous setting. | ||
260 | |||
261 | |||
262 | EXAMPLES | ||
263 | /etc/fstab entry | ||
264 | none /spu spufs gid=spu 0 0 | ||
265 | |||
266 | |||
267 | AUTHORS | ||
268 | Arnd Bergmann <arndb@de.ibm.com>, Mark Nutter <mnutter@us.ibm.com>, | ||
269 | Ulrich Weigand <Ulrich.Weigand@de.ibm.com> | ||
270 | |||
271 | SEE ALSO | ||
272 | capabilities(7), close(2), spu_create(2), spu_run(2), spufs(7) | ||
273 | |||
274 | |||
275 | |||
276 | Linux 2005-09-28 SPUFS(2) | ||
277 | |||
278 | ------------------------------------------------------------------------------ | ||
279 | |||
280 | SPU_RUN(2) Linux Programmer's Manual SPU_RUN(2) | ||
281 | |||
282 | |||
283 | |||
284 | NAME | ||
285 | spu_run - execute an spu context | ||
286 | |||
287 | |||
288 | SYNOPSIS | ||
289 | #include <sys/spu.h> | ||
290 | |||
291 | int spu_run(int fd, unsigned int *npc, unsigned int *event); | ||
292 | |||
293 | DESCRIPTION | ||
294 | The spu_run system call is used on PowerPC machines that implement the | ||
295 | Cell Broadband Engine Architecture in order to access Synergistic Pro- | ||
296 | cessor Units (SPUs). It uses the fd that was returned from spu_cre- | ||
297 | ate(2) to address a specific SPU context. When the context gets sched- | ||
298 | uled to a physical SPU, it starts execution at the instruction pointer | ||
299 | passed in npc. | ||
300 | |||
301 | Execution of SPU code happens synchronously, meaning that spu_run does | ||
302 | not return while the SPU is still running. If there is a need to exe- | ||
303 | cute SPU code in parallel with other code on either the main CPU or | ||
304 | other SPUs, you need to create a new thread of execution first, e.g. | ||
305 | using the pthread_create(3) call. | ||
306 | |||
307 | When spu_run returns, the current value of the SPU instruction pointer | ||
308 | is written back to npc, so you can call spu_run again without updating | ||
309 | the pointers. | ||
310 | |||
311 | event can be a NULL pointer or point to an extended status code that | ||
312 | gets filled when spu_run returns. It can be one of the following con- | ||
313 | stants: | ||
314 | |||
315 | SPE_EVENT_DMA_ALIGNMENT | ||
316 | A DMA alignment error | ||
317 | |||
318 | SPE_EVENT_SPE_DATA_SEGMENT | ||
319 | A DMA segmentation error | ||
320 | |||
321 | SPE_EVENT_SPE_DATA_STORAGE | ||
322 | A DMA storage error | ||
323 | |||
324 | If NULL is passed as the event argument, these errors will result in a | ||
325 | signal delivered to the calling process. | ||
326 | |||
327 | RETURN VALUE | ||
328 | spu_run returns the value of the spu_status register or -1 to indicate | ||
329 | an error and set errno to one of the error codes listed below. The | ||
330 | spu_status register value contains a bit mask of status codes and | ||
331 | optionally a 14 bit code returned from the stop-and-signal instruction | ||
332 | on the SPU. The bit masks for the status codes are: | ||
333 | |||
334 | 0x02 SPU was stopped by stop-and-signal. | ||
335 | |||
336 | 0x04 SPU was stopped by halt. | ||
337 | |||
338 | 0x08 SPU is waiting for a channel. | ||
339 | |||
340 | 0x10 SPU is in single-step mode. | ||
341 | |||
342 | 0x20 SPU has tried to execute an invalid instruction. | ||
343 | |||
344 | 0x40 SPU has tried to access an invalid channel. | ||
345 | |||
346 | 0x3fff0000 | ||
347 | The bits masked with this value contain the code returned from | ||
348 | stop-and-signal. | ||
349 | |||
350 | There are always one or more of the lower eight bits set or an error | ||
351 | code is returned from spu_run. | ||
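As a rough sketch, a caller might decode the value returned by spu_run along these lines. The numeric masks are the ones listed above; the macro names and the spu_stop_code helper are made up for this example and are not taken from any header.

```c
#include <stdint.h>

/* Masks from the RETURN VALUE list; the names are illustrative only. */
#define EX_STOPPED_BY_STOP	0x02u		/* stop-and-signal */
#define EX_STOPPED_BY_HALT	0x04u		/* halt */
#define EX_WAITING_FOR_CHANNEL	0x08u
#define EX_SINGLE_STEP		0x10u
#define EX_INVALID_INSTR	0x20u
#define EX_INVALID_CHANNEL	0x40u
#define EX_STOP_CODE_MASK	0x3fff0000u

/*
 * Return the 14 bit stop-and-signal code carried in a status value,
 * or -1 if the SPU was not stopped by stop-and-signal.
 */
int spu_stop_code(uint32_t status)
{
	if (!(status & EX_STOPPED_BY_STOP))
		return -1;
	return (int)((status & EX_STOP_CODE_MASK) >> 16);
}
```

The -1 error return of spu_run itself has to be checked before a value is passed to a helper like this one.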
352 | |||
353 | ERRORS | ||
354 | EAGAIN or EWOULDBLOCK | ||
355 | fd is in non-blocking mode and spu_run would block. | ||
356 | |||
357 | EBADF fd is not a valid file descriptor. | ||
358 | |||
359 | EFAULT npc is not a valid pointer or event is neither NULL nor a valid | ||
360 | pointer. | ||
361 | |||
362 | EINTR A signal occurred while spu_run was in progress. The npc value | ||
363 | has been updated to the new program counter value if necessary. | ||
364 | |||
365 | EINVAL fd is not a file descriptor returned from spu_create(2). | ||
366 | |||
367 | ENOMEM Insufficient memory was available to handle a page fault result- | ||
368 | ing from an MFC direct memory access. | ||
369 | |||
370 | ENOSYS The functionality is not provided by the current system, because | ||
371 | either the hardware does not provide SPUs or the spufs module is | ||
372 | not loaded. | ||
373 | |||
374 | |||
375 | NOTES | ||
376 | spu_run is meant to be used from libraries that implement a more | ||
377 | abstract interface to SPUs, not to be used from regular applications. | ||
378 | See http://www.bsc.es/projects/deepcomputing/linuxoncell/ for the rec- | ||
379 | ommended libraries. | ||
380 | |||
381 | |||
382 | CONFORMING TO | ||
383 | This call is Linux specific and only implemented by the ppc64 architec- | ||
384 | ture. Programs using this system call are not portable. | ||
385 | |||
386 | |||
387 | BUGS | ||
388 | The code does not yet fully implement all features outlined here. | ||
389 | |||
390 | |||
391 | AUTHOR | ||
392 | Arnd Bergmann <arndb@de.ibm.com> | ||
393 | |||
394 | SEE ALSO | ||
395 | capabilities(7), close(2), spu_create(2), spufs(7) | ||
396 | |||
397 | |||
398 | |||
399 | Linux 2005-09-28 SPU_RUN(2) | ||
400 | |||
401 | ------------------------------------------------------------------------------ | ||
402 | |||
403 | SPU_CREATE(2) Linux Programmer's Manual SPU_CREATE(2) | ||
404 | |||
405 | |||
406 | |||
407 | NAME | ||
408 | spu_create - create a new spu context | ||
409 | |||
410 | |||
411 | SYNOPSIS | ||
412 | #include <sys/types.h> | ||
413 | #include <sys/spu.h> | ||
414 | |||
415 | int spu_create(const char *pathname, int flags, mode_t mode); | ||
416 | |||
417 | DESCRIPTION | ||
418 | The spu_create system call is used on PowerPC machines that implement | ||
419 | the Cell Broadband Engine Architecture in order to access Synergistic | ||
420 | Processor Units (SPUs). It creates a new logical context for an SPU in | ||
421 | pathname and returns a handle associated with it. pathname must | ||
422 | point to a non-existing directory in the mount point of the SPU file | ||
423 | system (spufs). When spu_create is successful, a directory gets cre- | ||
424 | ated on pathname and it is populated with files. | ||
425 | |||
426 | The returned file handle can only be passed to spu_run(2) or closed; | ||
427 | other operations are not defined on it. When it is closed, all associ- | ||
428 | ated directory entries in spufs are removed. When the last file handle | ||
429 | pointing either inside of the context directory or to this file | ||
430 | descriptor is closed, the logical SPU context is destroyed. | ||
431 | |||
432 | The parameter flags can be zero or any bitwise or'd combination of the | ||
433 | following constants: | ||
434 | |||
435 | SPU_RAWIO | ||
436 | Allow mapping of some of the hardware registers of the SPU into | ||
437 | user space. This flag requires the CAP_SYS_RAWIO capability, see | ||
438 | capabilities(7). | ||
439 | |||
440 | The mode parameter specifies the permissions used for creating the new | ||
441 | directory in spufs. mode is modified with the user's umask(2) value | ||
442 | and then used for both the directory and the files contained in it. The | ||
443 | file permissions mask out some more bits of mode because they typically | ||
444 | support only read or write access. See stat(2) for a full list of the | ||
445 | possible mode values. | ||
446 | |||
447 | |||
448 | RETURN VALUE | ||
449 | spu_create returns a new file descriptor. It may return -1 to indicate | ||
450 | an error condition and set errno to one of the error codes listed | ||
451 | below. | ||
452 | |||
453 | |||
454 | ERRORS | ||
455 | EACCES | ||
456 | The current user does not have write access on the spufs mount | ||
457 | point. | ||
458 | |||
459 | EEXIST An SPU context already exists at the given path name. | ||
460 | |||
461 | EFAULT pathname is not a valid string pointer in the current address | ||
462 | space. | ||
463 | |||
464 | EINVAL pathname is not a directory in the spufs mount point. | ||
465 | |||
466 | ELOOP Too many symlinks were found while resolving pathname. | ||
467 | |||
468 | EMFILE The process has reached its maximum open file limit. | ||
469 | |||
470 | ENAMETOOLONG | ||
471 | pathname was too long. | ||
472 | |||
473 | ENFILE The system has reached the global open file limit. | ||
474 | |||
475 | ENOENT Part of pathname could not be resolved. | ||
476 | |||
477 | ENOMEM The kernel could not allocate all resources required. | ||
478 | |||
479 | ENOSPC There are not enough SPU resources available to create a new | ||
480 | context or the user specific limit for the number of SPU con- | ||
481 | texts has been reached. | ||
482 | |||
483 | ENOSYS The functionality is not provided by the current system, because | ||
484 | either the hardware does not provide SPUs or the spufs module is | ||
485 | not loaded. | ||
486 | |||
487 | ENOTDIR | ||
488 | A part of pathname is not a directory. | ||
489 | |||
490 | |||
491 | |||
492 | NOTES | ||
493 | spu_create is meant to be used from libraries that implement a more | ||
494 | abstract interface to SPUs, not to be used from regular applications. | ||
495 | See http://www.bsc.es/projects/deepcomputing/linuxoncell/ for the rec- | ||
496 | ommended libraries. | ||
497 | |||
498 | |||
499 | FILES | ||
500 | pathname must point to a location beneath the mount point of spufs. By | ||
501 | convention, it gets mounted in /spu. | ||
502 | |||
503 | |||
504 | CONFORMING TO | ||
505 | This call is Linux specific and only implemented by the ppc64 architec- | ||
506 | ture. Programs using this system call are not portable. | ||
507 | |||
508 | |||
509 | BUGS | ||
510 | The code does not yet fully implement all features outlined here. | ||
511 | |||
512 | |||
513 | AUTHOR | ||
514 | Arnd Bergmann <arndb@de.ibm.com> | ||
515 | |||
516 | SEE ALSO | ||
517 | capabilities(7), close(2), spu_run(2), spufs(7) | ||
518 | |||
519 | |||
520 | |||
521 | Linux 2005-09-28 SPU_CREATE(2) | ||
diff --git a/Documentation/filesystems/sysfs-pci.txt b/Documentation/filesystems/sysfs-pci.txt index 988a62fae11f..7ba2baa165ff 100644 --- a/Documentation/filesystems/sysfs-pci.txt +++ b/Documentation/filesystems/sysfs-pci.txt | |||
@@ -1,4 +1,5 @@ | |||
1 | Accessing PCI device resources through sysfs | 1 | Accessing PCI device resources through sysfs |
2 | -------------------------------------------- | ||
2 | 3 | ||
3 | sysfs, usually mounted at /sys, provides access to PCI resources on platforms | 4 | sysfs, usually mounted at /sys, provides access to PCI resources on platforms |
4 | that support it. For example, a given bus might look like this: | 5 | that support it. For example, a given bus might look like this: |
@@ -47,14 +48,21 @@ files, each with their own function. | |||
47 | binary - file contains binary data | 48 | binary - file contains binary data |
48 | cpumask - file contains a cpumask type | 49 | cpumask - file contains a cpumask type |
49 | 50 | ||
50 | The read only files are informational, writes to them will be ignored. | 51 | The read only files are informational, writes to them will be ignored, with |
51 | Writable files can be used to perform actions on the device (e.g. changing | 52 | the exception of the 'rom' file. Writable files can be used to perform |
52 | config space, detaching a device). mmapable files are available via an | 53 | actions on the device (e.g. changing config space, detaching a device). |
53 | mmap of the file at offset 0 and can be used to do actual device programming | 54 | mmapable files are available via an mmap of the file at offset 0 and can be |
54 | from userspace. Note that some platforms don't support mmapping of certain | 55 | used to do actual device programming from userspace. Note that some platforms |
55 | resources, so be sure to check the return value from any attempted mmap. | 56 | don't support mmapping of certain resources, so be sure to check the return |
57 | value from any attempted mmap. | ||
58 | |||
59 | The 'rom' file is special in that it provides read-only access to the device's | ||
60 | ROM file, if available. It's disabled by default, however, so applications | ||
61 | should write the string "1" to the file to enable it before attempting a read | ||
62 | call, and disable it following the access by writing "0" to the file. | ||
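The enable, read, disable sequence for the 'rom' file could look roughly like the following sketch. read_pci_rom is a hypothetical helper; the path to the device's rom file must be supplied by the caller, and error handling is kept to a minimum.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to len bytes of a device ROM through its sysfs 'rom' file.
 * Returns the number of bytes read, or -1 on error.
 * Hypothetical helper, for illustration only.
 */
ssize_t read_pci_rom(const char *rom_path, void *buf, size_t len)
{
	ssize_t n = -1;
	int fd = open(rom_path, O_RDWR);

	if (fd < 0)
		return -1;

	/* Writing "1" enables the ROM; only then attempt the read. */
	if (pwrite(fd, "1", 1, 0) == 1)
		n = pread(fd, buf, len, 0);

	/* Writing "0" disables it again after the access. */
	if (pwrite(fd, "0", 1, 0) != 1) {
		/* Disabling failed; data already read is still returned. */
	}

	close(fd);
	return n;
}
```

pread and pwrite are used with an explicit offset of 0 so that the one-byte control writes do not move the position used for reading the ROM contents.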
56 | 63 | ||
57 | Accessing legacy resources through sysfs | 64 | Accessing legacy resources through sysfs |
65 | ---------------------------------------- | ||
58 | 66 | ||
59 | Legacy I/O port and ISA memory resources are also provided in sysfs if the | 67 | Legacy I/O port and ISA memory resources are also provided in sysfs if the |
60 | underlying platform supports them. They're located in the PCI class hierarchy, | 68 | underlying platform supports them. They're located in the PCI class hierarchy, |
@@ -75,6 +83,7 @@ simply dereference the returned pointer (after checking for errors of course) | |||
75 | to access legacy memory space. | 83 | to access legacy memory space. |
76 | 84 | ||
77 | Supporting PCI access on new platforms | 85 | Supporting PCI access on new platforms |
86 | -------------------------------------- | ||
78 | 87 | ||
79 | In order to support PCI resource mapping as described above, Linux platform | 88 | In order to support PCI resource mapping as described above, Linux platform |
80 | code must define HAVE_PCI_MMAP and provide a pci_mmap_page_range function. | 89 | code must define HAVE_PCI_MMAP and provide a pci_mmap_page_range function. |
diff --git a/Documentation/filesystems/tmpfs.txt b/Documentation/filesystems/tmpfs.txt index 0d783c504ead..dbe4d87d2615 100644 --- a/Documentation/filesystems/tmpfs.txt +++ b/Documentation/filesystems/tmpfs.txt | |||
@@ -78,6 +78,18 @@ use up all the memory on the machine; but enhances the scalability of | |||
78 | that instance in a system with many cpus making intensive use of it. | 78 | that instance in a system with many cpus making intensive use of it. |
79 | 79 | ||
80 | 80 | ||
81 | tmpfs has a mount option to set the NUMA memory allocation policy for | ||
82 | all files in that instance: | ||
83 | mpol=interleave prefers to allocate memory from each node in turn | ||
84 | mpol=default prefers to allocate memory from the local node | ||
85 | mpol=bind prefers to allocate from mpol_nodelist | ||
86 | mpol=preferred prefers to allocate from first node in mpol_nodelist | ||
87 | |||
88 | The following mount option is used in conjunction with mpol=interleave, | ||
89 | mpol=bind or mpol=preferred: | ||
90 | mpol_nodelist: nodelist suitable for parsing with nodelist_parse. | ||
91 | |||
92 | |||
81 | To specify the initial root directory you can use the following mount | 93 | To specify the initial root directory you can use the following mount |
82 | options: | 94 | options: |
83 | 95 | ||
diff --git a/Documentation/hpet.txt b/Documentation/hpet.txt index e52457581f47..b7a3dc38dd52 100644 --- a/Documentation/hpet.txt +++ b/Documentation/hpet.txt | |||
@@ -2,7 +2,7 @@ | |||
2 | 2 | ||
3 | The High Precision Event Timer (HPET) hardware is the future replacement | 3 | The High Precision Event Timer (HPET) hardware is the future replacement |
4 | for the 8254 and Real Time Clock (RTC) periodic timer functionality. | 4 | for the 8254 and Real Time Clock (RTC) periodic timer functionality. |
5 | Each HPET can have up two 32 timers. It is possible to configure the | 5 | Each HPET can have up to 32 timers. It is possible to configure the |
6 | first two timers as legacy replacements for 8254 and RTC periodic timers. | 6 | first two timers as legacy replacements for 8254 and RTC periodic timers. |
7 | A specification done by Intel and Microsoft can be found at | 7 | A specification done by Intel and Microsoft can be found at |
8 | <http://www.intel.com/hardwaredesign/hpetspec.htm>. | 8 | <http://www.intel.com/hardwaredesign/hpetspec.htm>. |
diff --git a/Documentation/hrtimers.txt b/Documentation/hrtimers.txt new file mode 100644 index 000000000000..7620ff735faf --- /dev/null +++ b/Documentation/hrtimers.txt | |||
@@ -0,0 +1,178 @@ | |||
1 | |||
2 | hrtimers - subsystem for high-resolution kernel timers | ||
3 | ---------------------------------------------------- | ||
4 | |||
5 | This patch introduces a new subsystem for high-resolution kernel timers. | ||
6 | |||
7 | One might ask the question: we already have a timer subsystem | ||
8 | (kernel/timers.c), why do we need two timer subsystems? After a lot of | ||
9 | back and forth trying to integrate high-resolution and high-precision | ||
10 | features into the existing timer framework, and after testing various | ||
11 | such high-resolution timer implementations in practice, we came to the | ||
12 | conclusion that the timer wheel code is fundamentally not suitable for | ||
13 | such an approach. We initially didn't believe this ('there must be a way | ||
14 | to solve this'), and spent a considerable effort trying to integrate | ||
15 | things into the timer wheel, but we failed. In hindsight, there are | ||
16 | several reasons why such integration is hard/impossible: | ||
17 | |||
18 | - the forced handling of low-resolution and high-resolution timers in | ||
19 | the same way leads to a lot of compromises, macro magic and #ifdef | ||
20 | mess. The timers.c code is very "tightly coded" around jiffies and | ||
21 | 32-bitness assumptions, and has been honed and micro-optimized for a | ||
22 | relatively narrow use case (jiffies in a relatively narrow HZ range) | ||
23 | for many years - and thus even small extensions to it easily break | ||
24 | the wheel concept, leading to even worse compromises. The timer wheel | ||
25 | code is very good and tight code, there are zero problems with it in its | ||
26 | current usage - but it is simply not suitable to be extended for | ||
27 | high-res timers. | ||
28 | |||
29 | - the unpredictable [O(N)] overhead of cascading leads to delays which | ||
30 | necessitate a more complex handling of high resolution timers, which | ||
31 | in turn decreases robustness. Such a design still led to rather large | ||
32 | timing inaccuracies. Cascading is a fundamental property of the timer | ||
33 | wheel concept, it cannot be 'designed out' without inevitably | ||
34 | degrading other portions of the timers.c code in an unacceptable way. | ||
35 | |||
36 | - the implementation of the current posix-timer subsystem on top of | ||
37 | the timer wheel has already introduced a quite complex handling of | ||
38 | the required readjusting of absolute CLOCK_REALTIME timers at | ||
39 | settimeofday or NTP time - further underlining our experience by | ||
40 | example: that the timer wheel data structure is too rigid for high-res | ||
41 | timers. | ||
42 | |||
43 | - the timer wheel code is most optimal for use cases which can be | ||
44 | identified as "timeouts". Such timeouts are usually set up to cover | ||
45 | error conditions in various I/O paths, such as networking and block | ||
46 | I/O. The vast majority of those timers never expire and are rarely | ||
47 | recascaded because the expected correct event arrives in time so they | ||
48 | can be removed from the timer wheel before any further processing of | ||
49 | them becomes necessary. Thus the users of these timeouts can accept | ||
50 | the granularity and precision tradeoffs of the timer wheel, and | ||
51 | largely expect the timer subsystem to have near-zero overhead. | ||
52 | Accurate timing for them is not a core purpose - in fact most of the | ||
53 | timeout values used are ad-hoc. For them it is at most a necessary | ||
54 | evil to guarantee the processing of actual timeout completions | ||
55 | (because most of the timeouts are deleted before completion), which | ||
56 | should thus be as cheap and unintrusive as possible. | ||
57 | |||
58 | The primary users of precision timers are user-space applications that | ||
59 | utilize nanosleep, posix-timers and itimer interfaces. Also, in-kernel | ||
60 | users like drivers and subsystems which require precise timed events | ||
61 | (e.g. multimedia) can benefit from the availability of a separate | ||
62 | high-resolution timer subsystem as well. | ||
63 | |||
64 | While this subsystem does not offer high-resolution clock sources just | ||
65 | yet, the hrtimer subsystem can be easily extended with high-resolution | ||
66 | clock capabilities, and patches for that exist and are maturing quickly. | ||
67 | The increasing demand for realtime and multimedia applications along | ||
68 | with other potential users for precise timers gives another reason to | ||
69 | separate the "timeout" and "precise timer" subsystems. | ||
70 | |||
71 | Another potential benefit is that such a separation allows even more | ||
72 | special-purpose optimization of the existing timer wheel for the low | ||
73 | resolution and low precision use cases - once the precision-sensitive | ||
74 | APIs are separated from the timer wheel and are migrated over to | ||
75 | hrtimers. E.g. we could decrease the frequency of the timeout subsystem | ||
76 | from 250 Hz to 100 Hz (or even lower). | ||
77 | |||
78 | hrtimer subsystem implementation details | ||
79 | ---------------------------------------- | ||
80 | |||
81 | The basic design considerations were: | ||
82 | |||
83 | - simplicity | ||
84 | |||
85 | - data structure not bound to jiffies or any other granularity. All the | ||
86 | kernel logic works at 64-bit nanoseconds resolution - no compromises. | ||
87 | |||
88 | - simplification of existing, timing related kernel code | ||
89 | |||
90 | Another basic requirement was the immediate enqueueing and ordering of | ||
91 | timers at activation time. After looking at several possible solutions | ||
92 | such as radix trees and hashes, we chose the red black tree as the basic | ||
93 | data structure. Rbtrees are available as a library in the kernel and are | ||
94 | used in various performance-critical areas of e.g. memory management and | ||
95 | file systems. The rbtree is solely used for time sorted ordering, while | ||
96 | a separate list is used to give the expiry code fast access to the | ||
97 | queued timers, without having to walk the rbtree. | ||
98 | |||
99 | (This separate list is also useful for later when we'll introduce | ||
100 | high-resolution clocks, where we need separate pending and expired | ||
101 | queues while keeping the time-order intact.) | ||
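The idea of ordering timers at activation time can be illustrated with a much simpler structure than the one hrtimers uses. The sketch below keeps timers in a sorted singly linked list, so insertion is O(N) rather than the O(log N) of a red black tree; it is a stand-in to show the ordering property only, and all demo_ names are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_timer {
	int64_t expires_ns;		/* absolute expiry time, nanoseconds */
	struct demo_timer *next;
};

/*
 * Insert t so that the list stays sorted by expiry time. Timers with
 * equal expiry keep their insertion order. The earliest timer is then
 * always at the head, so expiry code never has to search for it.
 */
void demo_enqueue(struct demo_timer **head, struct demo_timer *t)
{
	struct demo_timer **link = head;

	while (*link && (*link)->expires_ns <= t->expires_ns)
		link = &(*link)->next;

	t->next = *link;
	*link = t;
}
```

Because expiry times are stored as absolute values and the order is established on insertion, nothing in the queue depends on a tick-based bucket layout of the kind the timer wheel relies on.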
102 | |||
103 | Time-ordered enqueueing is not purely for the purposes of | ||
104 | high-resolution clocks though, it also simplifies the handling of | ||
105 | absolute timers based on a low-resolution CLOCK_REALTIME. The existing | ||
106 | implementation needed to keep an extra list of all armed absolute | ||
107 | CLOCK_REALTIME timers along with complex locking. In case of | ||
108 | settimeofday and NTP, all the timers (!) had to be dequeued, the | ||
109 | time-changing code had to fix them up one by one, and all of them had to | ||
110 | be enqueued again. The time-ordered enqueueing and the storage of the | ||
111 | expiry time in absolute time units removes all this complex and poorly | ||
112 | scaling code from the posix-timer implementation - the clock can simply | ||
113 | be set without having to touch the rbtree. This also makes the handling | ||
114 | of posix-timers simpler in general. | ||
115 | |||
116 | The locking and per-CPU behavior of hrtimers was mostly taken from the | ||
117 | existing timer wheel code, as it is mature and well suited. Sharing code | ||
118 | was not really a win, due to the different data structures. Also, the | ||
119 | hrtimer functions now have clearer behavior and clearer names - such as | ||
120 | hrtimer_try_to_cancel() and hrtimer_cancel() [which are roughly | ||
121 | equivalent to del_timer() and del_timer_sync()] - so there's no direct | ||
122 | 1:1 mapping between them on the algorithmic level, and thus no real | ||
123 | potential for code sharing either. | ||
124 | |||
125 | Basic data types: every time value, absolute or relative, is in a | ||
126 | special nanosecond-resolution type: ktime_t. The kernel-internal | ||
127 | representation of ktime_t values and operations is implemented via | ||
128 | macros and inline functions, and can be switched between a "hybrid | ||
129 | union" type and a plain "scalar" 64bit nanoseconds representation (at | ||
130 | compile time). The hybrid union type optimizes time conversions on 32bit | ||
131 | CPUs. This build-time-selectable ktime_t storage format was implemented | ||
132 | to avoid the performance impact of 64-bit multiplications and divisions | ||
133 | on 32bit CPUs. Such operations are frequently necessary to convert | ||
134 | between the storage formats provided by kernel and userspace interfaces | ||
135 | and the internal time format. (See include/linux/ktime.h for further | ||
136 | details.) | ||
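A minimal sketch of the plain "scalar" representation, assuming a signed 64-bit nanosecond count, is shown below. The demo_ names are invented for this example and are not the kernel's ktime API; include/linux/ktime.h remains the reference for the real interface.

```c
#include <stdint.h>

#define DEMO_NSEC_PER_SEC 1000000000LL

/* Scalar form: one signed 64-bit count of nanoseconds. */
typedef int64_t demo_ktime_t;

/* Build a time value from seconds and nanoseconds (64-bit multiply). */
demo_ktime_t demo_ktime_set(int64_t secs, long nsecs)
{
	return secs * DEMO_NSEC_PER_SEC + nsecs;
}

/* Adding two scalar values is a single 64-bit addition. */
demo_ktime_t demo_ktime_add(demo_ktime_t a, demo_ktime_t b)
{
	return a + b;
}

/*
 * Split a non-negative value back into seconds and nanoseconds. This
 * needs a 64-bit division, the kind of operation that is expensive on
 * 32bit CPUs.
 */
void demo_ktime_split(demo_ktime_t kt, int64_t *secs, long *nsecs)
{
	*secs = kt / DEMO_NSEC_PER_SEC;
	*nsecs = (long)(kt % DEMO_NSEC_PER_SEC);
}
```

The multiply in demo_ktime_set and the divide in demo_ktime_split are the conversion costs that the build-time-selectable hybrid format is meant to avoid on 32bit CPUs.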
137 | |||
138 | hrtimers - rounding of timer values | ||
139 | ----------------------------------- | ||
140 | |||
141 | The hrtimer code will round timer events to lower-resolution clocks | ||
142 | because it has to. Otherwise it will do no artificial rounding at all. | ||
143 | |||
144 | One question is, what resolution value should be returned to the user by | ||
145 | the clock_getres() interface. This will return whatever real resolution | ||
146 | a given clock has - be it low-res, high-res, or artificially-low-res. | ||
147 | |||
148 | hrtimers - testing and verification | ||
149 | ----------------------------------- | ||
150 | |||
151 | We used the high-resolution clock subsystem on top of hrtimers to verify | ||
152 | the hrtimer implementation details in practice, and we also ran the posix | ||
153 | timer tests in order to ensure specification compliance. We also ran | ||
154 | tests on low-resolution clocks. | ||
155 | |||
156 | The hrtimer patch converts the following kernel functionality to use | ||
157 | hrtimers: | ||
158 | |||
159 | - nanosleep | ||
160 | - itimers | ||
161 | - posix-timers | ||
162 | |||
163 | The conversion of nanosleep and posix-timers enabled the unification of | ||
164 | nanosleep and clock_nanosleep. | ||
165 | |||
166 | The code was successfully compiled for the following platforms: | ||
167 | |||
168 | i386, x86_64, ARM, PPC, PPC64, IA64 | ||
169 | |||
170 | The code was run-tested on the following platforms: | ||
171 | |||
172 | i386(UP/SMP), x86_64(UP/SMP), ARM, PPC | ||
173 | |||
174 | hrtimers were also integrated into the -rt tree, along with a | ||
175 | hrtimers-based high-resolution clock implementation, so the hrtimers | ||
176 | code got a healthy amount of testing and use in practice. | ||
177 | |||
178 | Thomas Gleixner, Ingo Molnar | ||
diff --git a/Documentation/i2o/ioctl b/Documentation/i2o/ioctl index 3e174978997d..1e77fac4e120 100644 --- a/Documentation/i2o/ioctl +++ b/Documentation/i2o/ioctl | |||
@@ -185,7 +185,7 @@ VII. Getting Parameters | |||
185 | ENOMEM Kernel memory allocation error | 185 | ENOMEM Kernel memory allocation error |
186 | 186 | ||
187 | A return value of 0 does not mean that the value was actually | 187 | A return value of 0 does not mean that the value was actually |
188 | properly retreived. The user should check the result list | 188 | properly retrieved. The user should check the result list |
189 | to determine the specific status of the transaction. | 189 | to determine the specific status of the transaction. |
190 | 190 | ||
191 | VIII. Downloading Software | 191 | VIII. Downloading Software |
diff --git a/Documentation/input/appletouch.txt b/Documentation/input/appletouch.txt index b48d11d0326d..4f7c633a76d2 100644 --- a/Documentation/input/appletouch.txt +++ b/Documentation/input/appletouch.txt | |||
@@ -3,7 +3,7 @@ Apple Touchpad Driver (appletouch) | |||
3 | Copyright (C) 2005 Stelian Pop <stelian@popies.net> | 3 | Copyright (C) 2005 Stelian Pop <stelian@popies.net> |
4 | 4 | ||
5 | appletouch is a Linux kernel driver for the USB touchpad found on post | 5 | appletouch is a Linux kernel driver for the USB touchpad found on post |
6 | February 2005 Apple Alu Powerbooks. | 6 | February 2005 and October 2005 Apple Aluminium Powerbooks. |
7 | 7 | ||
8 | This driver is derived from Johannes Berg's appletrackpad driver[1], but it has | 8 | This driver is derived from Johannes Berg's appletrackpad driver[1], but it has |
9 | been improved in some areas: | 9 | been improved in some areas: |
@@ -13,7 +13,8 @@ been improved in some areas: | |||
13 | 13 | ||
14 | Credits go to Johannes Berg for reverse-engineering the touchpad protocol, | 14 | Credits go to Johannes Berg for reverse-engineering the touchpad protocol, |
15 | Frank Arnold for further improvements, and Alex Harper for some additional | 15 | Frank Arnold for further improvements, and Alex Harper for some additional |
16 | information about the inner workings of the touchpad sensors. | 16 | information about the inner workings of the touchpad sensors. Michael |
17 | Hanselmann added support for the October 2005 models. | ||
17 | 18 | ||
18 | Usage: | 19 | Usage: |
19 | ------ | 20 | ------ |
diff --git a/Documentation/input/ff.txt b/Documentation/input/ff.txt index efa7dd6751f3..c7e10eaff203 100644 --- a/Documentation/input/ff.txt +++ b/Documentation/input/ff.txt | |||
@@ -120,7 +120,7 @@ to the unique id assigned by the driver. This data is required for performing | |||
120 | some operations (removing an effect, controlling the playback). | 120 | some operations (removing an effect, controlling the playback). |
121 | This if field must be set to -1 by the user in order to tell the driver to | 121 | This if field must be set to -1 by the user in order to tell the driver to |
122 | allocate a new effect. | 122 | allocate a new effect. |
123 | See <linux/input.h> for a description of the ff_effect stuct. You should also | 123 | See <linux/input.h> for a description of the ff_effect struct. You should also |
124 | find help in a few sketches, contained in files shape.fig and interactive.fig. | 124 | find help in a few sketches, contained in files shape.fig and interactive.fig. |
125 | You need xfig to visualize these files. | 125 | You need xfig to visualize these files. |
126 | 126 | ||
diff --git a/Documentation/ioctl/hdio.txt b/Documentation/ioctl/hdio.txt index 9a7aea0636a5..11c9be49f37c 100644 --- a/Documentation/ioctl/hdio.txt +++ b/Documentation/ioctl/hdio.txt | |||
@@ -946,7 +946,7 @@ HDIO_SCAN_HWIF register and (re)scan interface | |||
946 | 946 | ||
947 | This ioctl initializes the addresses and irq for a disk | 947 | This ioctl initializes the addresses and irq for a disk |
948 | controller, probes for drives, and creates /proc/ide | 948 | controller, probes for drives, and creates /proc/ide |
949 | interfaces as appropiate. | 949 | interfaces as appropriate. |
950 | 950 | ||
951 | 951 | ||
952 | 952 | ||
diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt index d802ce88bedc..443230b43e09 100644 --- a/Documentation/kbuild/makefiles.txt +++ b/Documentation/kbuild/makefiles.txt | |||
@@ -1033,9 +1033,9 @@ When kbuild executes the following steps are followed (roughly): | |||
1033 | 1033 | ||
1034 | Example: | 1034 | Example: |
1035 | #arch/i386/Makefile | 1035 | #arch/i386/Makefile |
1036 | GCC_VERSION := $(call cc-version) | ||
1037 | cflags-y += $(shell \ | 1036 | cflags-y += $(shell \ |
1038 | if [ $(GCC_VERSION) -ge 0300 ] ; then echo "-mregparm=3"; fi ;) | 1037 | if [ $(call cc-version) -ge 0300 ] ; then \ |
1038 | echo "-mregparm=3"; fi ;) | ||
1039 | 1039 | ||
1040 | In the above example -mregparm=3 is only used for gcc version greater | 1040 | In the above example -mregparm=3 is only used for gcc version greater |
1041 | than or equal to gcc 3.0. | 1041 | than or equal to gcc 3.0. |
diff --git a/Documentation/kdump/gdbmacros.txt b/Documentation/kdump/gdbmacros.txt index bc1b9eb92ae1..dcf5580380ab 100644 --- a/Documentation/kdump/gdbmacros.txt +++ b/Documentation/kdump/gdbmacros.txt | |||
@@ -177,3 +177,25 @@ document trapinfo | |||
177 | 'trapinfo <pid>' will tell you by which trap & possibly | 177 | 'trapinfo <pid>' will tell you by which trap & possibly |
178 | address the kernel panicked. | 178 | address the kernel panicked. |
179 | end | 179 | end |
180 | |||
181 | |||
182 | define dmesg | ||
183 | set $i = 0 | ||
184 | set $end_idx = (log_end - 1) & (log_buf_len - 1) | ||
185 | |||
186 | while ($i < logged_chars) | ||
187 | set $idx = (log_end - 1 - logged_chars + $i) & (log_buf_len - 1) | ||
188 | |||
189 | if ($idx + 100 <= $end_idx) || \ | ||
190 | ($end_idx <= $idx && $idx + 100 < log_buf_len) | ||
191 | printf "%.100s", &log_buf[$idx] | ||
192 | set $i = $i + 100 | ||
193 | else | ||
194 | printf "%c", log_buf[$idx] | ||
195 | set $i = $i + 1 | ||
196 | end | ||
197 | end | ||
198 | end | ||
199 | document dmesg | ||
200 | print the kernel ring buffer | ||
201 | end | ||
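The dmesg macro added above walks a power-of-two ring buffer by masking each index with (log_buf_len - 1). The following is a minimal userspace sketch of that indexing idea, not a transcription of the macro: the function name and parameters are hypothetical, it copies one character at a time rather than in 100-byte chunks, and it assumes the oldest character sits at end - logged.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Copy the most recent `logged` characters out of a power-of-two ring
 * buffer into `out`, oldest first, and NUL-terminate the result.
 * `end` is the running count of characters ever written, i.e. one past
 * the newest character (comparable in spirit to the kernel's log_end).
 * All names are illustrative. Returns the number of characters copied.
 */
size_t ring_read(const char *buf, size_t buf_len, size_t end,
                 size_t logged, char *out, size_t out_len)
{
    size_t i;

    /* Masking with (buf_len - 1) only works for power-of-two sizes. */
    assert(buf_len != 0 && (buf_len & (buf_len - 1)) == 0);

    /* The buffer can never hold more than buf_len characters. */
    if (logged > buf_len)
        logged = buf_len;

    /* Leave room for the terminating NUL in the output buffer. */
    for (i = 0; i < logged && i + 1 < out_len; i++)
        out[i] = buf[(end - logged + i) & (buf_len - 1)];

    if (out_len != 0)
        out[i] = '\0';
    return i;
}
```

For instance, after ten characters "abcdefghij" have been written into an 8-byte buffer, the buffer holds "ijcdefgh" and end is 10; reading back 8 characters starts at index 2 and wraps around to pick up the two newest ones.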
diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt index 5f08f9ce6046..212cf3c21abf 100644 --- a/Documentation/kdump/kdump.txt +++ b/Documentation/kdump/kdump.txt | |||
@@ -4,10 +4,10 @@ Documentation for kdump - the kexec-based crash dumping solution | |||
4 | DESIGN | 4 | DESIGN |
5 | ====== | 5 | ====== |
6 | 6 | ||
7 | Kdump uses kexec to reboot to a second kernel whenever a dump needs to be taken. | 7 | Kdump uses kexec to reboot to a second kernel whenever a dump needs to be |
8 | This second kernel is booted with very little memory. The first kernel reserves | 8 | taken. This second kernel is booted with very little memory. The first kernel |
9 | the section of memory that the second kernel uses. This ensures that on-going | 9 | reserves the section of memory that the second kernel uses. This ensures that |
10 | DMA from the first kernel does not corrupt the second kernel. | 10 | on-going DMA from the first kernel does not corrupt the second kernel. |
11 | 11 | ||
12 | All the necessary information about Core image is encoded in ELF format and | 12 | All the necessary information about Core image is encoded in ELF format and |
13 | stored in reserved area of memory before crash. Physical address of start of | 13 | stored in reserved area of memory before crash. Physical address of start of |
@@ -35,77 +35,82 @@ In the second kernel, "old memory" can be accessed in two ways. | |||
35 | SETUP | 35 | SETUP |
36 | ===== | 36 | ===== |
37 | 37 | ||
38 | 1) Download http://www.xmission.com/~ebiederm/files/kexec/kexec-tools-1.101.tar.gz | 38 | 1) Download the upstream kexec-tools userspace package from |
39 | and apply http://lse.sourceforge.net/kdump/patches/kexec-tools-1.101-kdump.patch | 39 | http://www.xmission.com/~ebiederm/files/kexec/kexec-tools-1.101.tar.gz. |
40 | and after that build the source. | ||
41 | 40 | ||
42 | 2) Download and build the appropriate (2.6.13-rc1 onwards) vanilla kernel. | 41 | Apply the latest consolidated kdump patch on top of kexec-tools-1.101 |
42 | from http://lse.sourceforge.net/kdump/. This arrangement has been made | ||
43 | till all the userspace patches supporting kdump are integrated with | ||
44 | upstream kexec-tools userspace. | ||
43 | 45 | ||
46 | 2) Download and build the appropriate (2.6.13-rc1 onwards) vanilla kernels. | ||
44 | Two kernels need to be built in order to get this feature working. | 47 | Two kernels need to be built in order to get this feature working. |
48 | Following are the steps to properly configure the two kernels specific | ||
49 | to kexec and kdump features: | ||
45 | 50 | ||
46 | A) First kernel: | 51 | A) First kernel or regular kernel: |
52 | ---------------------------------- | ||
47 | a) Enable "kexec system call" feature (in Processor type and features). | 53 | a) Enable "kexec system call" feature (in Processor type and features). |
48 | CONFIG_KEXEC=y | 54 | CONFIG_KEXEC=y |
49 | b) This kernel's physical load address should be the default value of | 55 | b) Enable "sysfs file system support" (in Pseudo filesystems). |
50 | 0x100000 (0x100000, 1 MB) (in Processor type and features). | 56 | CONFIG_SYSFS=y |
51 | CONFIG_PHYSICAL_START=0x100000 | 57 | c) make |
52 | c) Enable "sysfs file system support" (in Pseudo filesystems). | ||
53 | CONFIG_SYSFS=y | ||
54 | d) Boot into first kernel with the command line parameter "crashkernel=Y@X". | 58 | d) Boot into first kernel with the command line parameter "crashkernel=Y@X". |
55 | Use appropriate values for X and Y. Y denotes how much memory to reserve | 59 | Use appropriate values for X and Y. Y denotes how much memory to reserve |
56 | for the second kernel, and X denotes at what physical address the reserved | 60 | for the second kernel, and X denotes at what physical address the |
57 | memory section starts. For example: "crashkernel=64M@16M". | 61 | reserved memory section starts. For example: "crashkernel=64M@16M". |
58 | 62 | ||
59 | B) Second kernel: | 63 | |
60 | a) Enable "kernel crash dumps" feature (in Processor type and features). | 64 | B) Second kernel or dump capture kernel: |
61 | CONFIG_CRASH_DUMP=y | 65 | --------------------------------------- |
62 | b) Specify a suitable value for "Physical address where the kernel is | 66 | a) For i386 architecture enable Highmem support |
63 | loaded" (in Processor type and features). Typically this value | 67 | CONFIG_HIGHMEM=y |
64 | should be same as X (See option d) above, e.g., 16 MB or 0x1000000. | 68 | b) Enable "kernel crash dumps" feature (under "Processor type and features") |
65 | CONFIG_PHYSICAL_START=0x1000000 | 69 | CONFIG_CRASH_DUMP=y |
66 | c) Enable "/proc/vmcore support" (Optional, in Pseudo filesystems). | 70 | c) Make sure a suitable value is set for "Physical address where the kernel is |
67 | CONFIG_PROC_VMCORE=y | 71 | loaded" (under "Processor type and features"). By default this value |
68 | d) Disable SMP support and build a UP kernel (Until it is fixed). | 72 | is 0x1000000 (16MB) and it should be same as X (See option d above), |
69 | CONFIG_SMP=n | 73 | e.g., 16 MB or 0x1000000. |
70 | e) Enable "Local APIC support on uniprocessors". | 74 | CONFIG_PHYSICAL_START=0x1000000 |
71 | CONFIG_X86_UP_APIC=y | 75 | d) Enable "/proc/vmcore support" (Optional, under "Pseudo filesystems"). |
72 | f) Enable "IO-APIC support on uniprocessors" | 76 | CONFIG_PROC_VMCORE=y |
73 | CONFIG_X86_UP_IOAPIC=y | 77 | |
74 | 78 | 3) After booting to regular kernel or first kernel, load the second kernel | |
75 | Note: i) Options a) and b) depend upon "Configure standard kernel features | 79 | using the following command: |
76 | (for small systems)" (under General setup). | ||
77 | ii) Option a) also depends on CONFIG_HIGHMEM (under Processor | ||
78 | type and features). | ||
79 | iii) Both option a) and b) are under "Processor type and features". | ||
80 | |||
81 | 3) Boot into the first kernel. You are now ready to try out kexec-based crash | ||
82 | dumps. | ||
83 | |||
84 | 4) Load the second kernel to be booted using: | ||
85 | 80 | ||
86 | kexec -p <second-kernel> --args-linux --elf32-core-headers | 81 | kexec -p <second-kernel> --args-linux --elf32-core-headers |
87 | --append="root=<root-dev> init 1 irqpoll" | 82 | --append="root=<root-dev> init 1 irqpoll maxcpus=1" |
88 | 83 | ||
89 | Note: i) <second-kernel> has to be a vmlinux image. bzImage will not work, | 84 | Notes: |
90 | as of now. | 85 | ====== |
91 | ii) By default ELF headers are stored in ELF64 format. Option | 86 | i) <second-kernel> has to be a vmlinux image ie uncompressed elf image. |
92 | --elf32-core-headers forces generation of ELF32 headers. gdb can | 87 | bzImage will not work, as of now. |
93 | not open ELF64 headers on 32 bit systems. So creating ELF32 | 88 | ii) --args-linux has to be speicfied as if kexec it loading an elf image, |
94 | headers can come handy for users who have got non-PAE systems and | 89 | it needs to know that the arguments supplied are of linux type. |
95 | hence have memory less than 4GB. | 90 | iii) By default ELF headers are stored in ELF64 format to support systems |
96 | iii) Specify "irqpoll" as command line parameter. This reduces driver | 91 | with more than 4GB memory. Option --elf32-core-headers forces generation |
97 | initialization failures in second kernel due to shared interrupts. | 92 | of ELF32 headers. This option exists because, as of now, gdb can |
98 | iv) <root-dev> needs to be specified in a format corresponding to | 93 | not open a vmcore file with ELF64 headers on 32 bit systems. So ELF32 |
99 | the root device name in the output of mount command. | 94 | headers can be used if one has non-PAE systems and hence memory less |
100 | v) If you have built the drivers required to mount root file | 95 | than 4GB. |
101 | system as modules in <second-kernel>, then, specify | 96 | iv) Specify "irqpoll" as command line parameter. This reduces driver |
102 | --initrd=<initrd-for-second-kernel>. | 97 | initialization failures in second kernel due to shared interrupts. |
103 | 98 | v) <root-dev> needs to be specified in a format corresponding to the root | |
104 | 5) System reboots into the second kernel when a panic occurs. A module can be | 99 | device name in the output of mount command. |
105 | written to force the panic or "ALT-SysRq-c" can be used initiate a crash | 100 | vi) If you have built the drivers required to mount root file system as |
106 | dump for testing purposes. | 101 | modules in <second-kernel>, then, specify |
107 | 102 | --initrd=<initrd-for-second-kernel>. | |
108 | 6) Write out the dump file using | 103 | vii) Specify maxcpus=1 because, if a panic happens on a non-boot cpu during |
104 | the first kernel run, the second kernel doesn't seem to boot up all the cpus. | ||
105 | The other option is to always build the second kernel without SMP | ||
106 | support, i.e. CONFIG_SMP=n | ||
107 | |||
108 | 4) After successfully loading the second kernel as above, if a panic occurs | ||
109 | system reboots into the second kernel. A module can be written to force | ||
110 | the panic or "ALT-SysRq-c" can be used to initiate a crash dump for testing | ||
111 | purposes. | ||
112 | |||
113 | 5) Once the second kernel has booted, write out the dump file using | ||
109 | 114 | ||
110 | cp /proc/vmcore <dump-file> | 115 | cp /proc/vmcore <dump-file> |
111 | 116 | ||
@@ -119,9 +124,9 @@ SETUP | |||
119 | 124 | ||
120 | Entire memory: dd if=/dev/oldmem of=oldmem.001 | 125 | Entire memory: dd if=/dev/oldmem of=oldmem.001 |
121 | 126 | ||
127 | |||
122 | ANALYSIS | 128 | ANALYSIS |
123 | ======== | 129 | ======== |
124 | |||
125 | Limited analysis can be done using gdb on the dump file copied out of | 130 | Limited analysis can be done using gdb on the dump file copied out of |
126 | /proc/vmcore. Use vmlinux built with -g and run | 131 | /proc/vmcore. Use vmlinux built with -g and run |
127 | 132 | ||
@@ -132,15 +137,19 @@ work fine. | |||
132 | 137 | ||
133 | Note: gdb cannot analyse core files generated in ELF64 format for i386. | 138 | Note: gdb cannot analyse core files generated in ELF64 format for i386. |
134 | 139 | ||
140 | Latest "crash" (crash-4.0-2.18) as available on Dave Anderson's site | ||
141 | http://people.redhat.com/~anderson/ works well with kdump format. | ||
142 | |||
143 | |||
135 | TODO | 144 | TODO |
136 | ==== | 145 | ==== |
137 | |||
138 | 1) Provide a kernel pages filtering mechanism so that core file size is not | 146 | 1) Provide a kernel pages filtering mechanism so that core file size is not |
139 | insane on systems having huge memory banks. | 147 | insane on systems having huge memory banks. |
140 | 2) Modify "crash" tool to make it recognize this dump. | 148 | 2) Relocatable kernel can help in maintaining multiple kernels for crashdump |
149 | and same kernel as the first kernel can be used to capture the dump. | ||
150 | |||
141 | 151 | ||
142 | CONTACT | 152 | CONTACT |
143 | ======= | 153 | ======= |
144 | |||
145 | Vivek Goyal (vgoyal@in.ibm.com) | 154 | Vivek Goyal (vgoyal@in.ibm.com) |
146 | Maneesh Soni (maneesh@in.ibm.com) | 155 | Maneesh Soni (maneesh@in.ibm.com) |
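The kdump.txt hunks above describe the "crashkernel=Y@X" parameter, where Y is the amount of memory to reserve and X is the physical start address of the reservation. As a rough illustration of how such a value splits apart, here is a userspace sketch loosely modelled on the idea behind the kernel's memparse() helper. The function names are hypothetical and this is not the kernel's own parsing code.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Parse a number with an optional K/M/G suffix and advance *end past it.
 * Illustrative helper only; the suffix handling is an assumption about
 * the accepted syntax.
 */
static unsigned long long parse_size(const char *s, char **end)
{
    unsigned long long v = strtoull(s, end, 0);

    switch (**end) {
    case 'G': case 'g':
        v <<= 10;
        /* fall through */
    case 'M': case 'm':
        v <<= 10;
        /* fall through */
    case 'K': case 'k':
        v <<= 10;
        (*end)++;
        break;
    default:
        break;
    }
    return v;
}

/*
 * Split "Y@X" (for example "64M@16M") into the size Y and the start
 * address X. Returns 0 on success and -1 on malformed input.
 */
int parse_crashkernel(const char *arg, unsigned long long *size,
                      unsigned long long *base)
{
    char *p;

    assert(arg != NULL && size != NULL && base != NULL);

    *size = parse_size(arg, &p);
    if (p == arg || *p != '@')
        return -1;

    arg = p + 1;
    *base = parse_size(arg, &p);
    if (p == arg || *p != '\0')
        return -1;

    return 0;
}
```

With the document's own example, "64M@16M" yields a 64 MB reservation starting at 0x1000000, which is the same value the text gives for CONFIG_PHYSICAL_START in the second kernel.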
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index a482fde09bbb..84370363da80 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt | |||
@@ -452,6 +452,11 @@ running once the system is up. | |||
452 | 452 | ||
453 | eata= [HW,SCSI] | 453 | eata= [HW,SCSI] |
454 | 454 | ||
455 | ec_intr= [HW,ACPI] ACPI Embedded Controller interrupt mode | ||
456 | Format: <int> | ||
457 | 0: polling mode | ||
458 | non-0: interrupt mode (default) | ||
459 | |||
455 | eda= [HW,PS2] | 460 | eda= [HW,PS2] |
456 | 461 | ||
457 | edb= [HW,PS2] | 462 | edb= [HW,PS2] |
@@ -471,14 +476,15 @@ running once the system is up. | |||
471 | arch/i386/kernel/cpu/cpufreq/elanfreq.c. | 476 | arch/i386/kernel/cpu/cpufreq/elanfreq.c. |
472 | 477 | ||
473 | elevator= [IOSCHED] | 478 | elevator= [IOSCHED] |
474 | Format: {"as" | "cfq" | "deadline" | "noop"} | 479 | Format: {"anticipatory" | "cfq" | "deadline" | "noop"} |
475 | See Documentation/block/as-iosched.txt and | 480 | See Documentation/block/as-iosched.txt and |
476 | Documentation/block/deadline-iosched.txt for details. | 481 | Documentation/block/deadline-iosched.txt for details. |
477 | 482 | ||
478 | elfcorehdr= [IA-32] | 483 | elfcorehdr= [IA-32, X86_64] |
479 | Specifies physical address of start of kernel core | 484 | Specifies physical address of start of kernel core |
480 | image elf header. | 485 | image elf header. Generally kexec loader will |
481 | See Documentation/kdump.txt for details. | 486 | pass this option to capture kernel. |
487 | See Documentation/kdump/kdump.txt for details. | ||
482 | 488 | ||
483 | enforcing [SELINUX] Set initial enforcing status. | 489 | enforcing [SELINUX] Set initial enforcing status. |
484 | Format: {"0" | "1"} | 490 | Format: {"0" | "1"} |
@@ -711,9 +717,17 @@ running once the system is up. | |||
711 | load_ramdisk= [RAM] List of ramdisks to load from floppy | 717 | load_ramdisk= [RAM] List of ramdisks to load from floppy |
712 | See Documentation/ramdisk.txt. | 718 | See Documentation/ramdisk.txt. |
713 | 719 | ||
714 | lockd.udpport= [NFS] | 720 | lockd.nlm_grace_period=P [NFS] Assign grace period. |
721 | Format: <integer> | ||
722 | |||
723 | lockd.nlm_tcpport=N [NFS] Assign TCP port. | ||
724 | Format: <integer> | ||
715 | 725 | ||
716 | lockd.tcpport= [NFS] | 726 | lockd.nlm_timeout=T [NFS] Assign timeout value. |
727 | Format: <integer> | ||
728 | |||
729 | lockd.nlm_udpport=M [NFS] Assign UDP port. | ||
730 | Format: <integer> | ||
717 | 731 | ||
718 | logibm.irq= [HW,MOUSE] Logitech Bus Mouse Driver | 732 | logibm.irq= [HW,MOUSE] Logitech Bus Mouse Driver |
719 | Format: <irq> | 733 | Format: <irq> |
@@ -832,7 +846,7 @@ running once the system is up. | |||
832 | mem=nopentium [BUGS=IA-32] Disable usage of 4MB pages for kernel | 846 | mem=nopentium [BUGS=IA-32] Disable usage of 4MB pages for kernel |
833 | memory. | 847 | memory. |
834 | 848 | ||
835 | memmap=exactmap [KNL,IA-32] Enable setting of an exact | 849 | memmap=exactmap [KNL,IA-32,X86_64] Enable setting of an exact |
836 | E820 memory map, as specified by the user. | 850 | E820 memory map, as specified by the user. |
837 | Such memmap=exactmap lines can be constructed based on | 851 | Such memmap=exactmap lines can be constructed based on |
838 | BIOS output or other requirements. See the memmap=nn@ss | 852 | BIOS output or other requirements. See the memmap=nn@ss |
@@ -855,6 +869,49 @@ running once the system is up. | |||
855 | 869 | ||
856 | mga= [HW,DRM] | 870 | mga= [HW,DRM] |
857 | 871 | ||
872 | migration_cost= | ||
873 | [KNL,SMP] debug: override scheduler migration costs | ||
874 | Format: <level-1-usecs>,<level-2-usecs>,... | ||
875 | This debugging option can be used to override the | ||
876 | default scheduler migration cost matrix. The numbers | ||
877 | are indexed by 'CPU domain distance'. | ||
878 | E.g. migration_cost=1000,2000,3000 on an SMT NUMA | ||
879 | box will set up an intra-core migration cost of | ||
880 | 1 msec, an inter-core migration cost of 2 msecs, | ||
881 | and an inter-node migration cost of 3 msecs. | ||
882 | |||
883 | WARNING: using the wrong values here can break | ||
884 | scheduler performance, so it's only for scheduler | ||
885 | development purposes, not production environments. | ||
886 | |||
887 | migration_debug= | ||
888 | [KNL,SMP] migration cost auto-detect verbosity | ||
889 | Format=<0|1|2> | ||
890 | If a system's migration matrix reported at bootup | ||
891 | seems erroneous then this option can be used to | ||
892 | increase verbosity of the detection process. | ||
893 | We default to 0 (no extra messages), 1 will print | ||
894 | some more information, and 2 will be really | ||
895 | verbose (probably only useful if you also have a | ||
896 | serial console attached to the system). | ||
897 | |||
898 | migration_factor= | ||
899 | [KNL,SMP] multiply/divide migration costs by a factor | ||
900 | Format=<percent> | ||
901 | This debug option can be used to proportionally | ||
902 | increase or decrease the auto-detected migration | ||
903 | costs for all entries of the migration matrix. | ||
904 | E.g. migration_factor=150 will increase migration | ||
905 | costs by 50%. (and thus the scheduler will be less | ||
906 | eager migrating cache-hot tasks) | ||
907 | migration_factor=80 will decrease migration costs | ||
908 | by 20%. (thus the scheduler will be more eager to | ||
909 | migrate tasks) | ||
910 | |||
911 | WARNING: using the wrong values here can break | ||
912 | scheduler performance, so it's only for scheduler | ||
913 | development purposes, not production environments. | ||
914 | |||
858 | mousedev.tap_time= | 915 | mousedev.tap_time= |
859 | [MOUSE] Maximum time between finger touching and | 916 | [MOUSE] Maximum time between finger touching and |
860 | leaving touchpad surface for touch to be considered | 917 | leaving touchpad surface for touch to be considered |
@@ -998,6 +1055,8 @@ running once the system is up. | |||
998 | 1055 | ||
999 | nowb [ARM] | 1056 | nowb [ARM] |
1000 | 1057 | ||
1058 | nr_uarts= [SERIAL] maximum number of UARTs to be registered. | ||
1059 | |||
1001 | opl3= [HW,OSS] | 1060 | opl3= [HW,OSS] |
1002 | Format: <io> | 1061 | Format: <io> |
1003 | 1062 | ||
@@ -1176,6 +1235,10 @@ running once the system is up. | |||
1176 | Limit processor to maximum C-state | 1235 | Limit processor to maximum C-state |
1177 | max_cstate=9 overrides any DMI blacklist limit. | 1236 | max_cstate=9 overrides any DMI blacklist limit. |
1178 | 1237 | ||
1238 | processor.nocst [HW,ACPI] | ||
1239 | Ignore the _CST method to determine C-states, | ||
1240 | instead using the legacy FADT method | ||
1241 | |||
1179 | prompt_ramdisk= [RAM] List of RAM disks to prompt for floppy disk | 1242 | prompt_ramdisk= [RAM] List of RAM disks to prompt for floppy disk |
1180 | before loading. | 1243 | before loading. |
1181 | See Documentation/ramdisk.txt. | 1244 | See Documentation/ramdisk.txt. |
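The migration_factor= entry added in the kernel-parameters.txt hunks above describes a percentage applied to every entry of the migration cost matrix. The sketch below shows only that arithmetic; the function name is hypothetical, and the diff does not show how the scheduler itself applies the factor, so integer division is an assumption here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Scale each migration cost (in usecs) by factor_percent / 100.
 * A factor of 100 leaves the costs unchanged, 150 raises them by 50%,
 * and 80 lowers them by 20%. Illustrative only.
 */
void scale_migration_costs(unsigned long long *costs, size_t n,
                           unsigned int factor_percent)
{
    size_t i;

    assert(costs != NULL || n == 0);

    for (i = 0; i < n; i++)
        costs[i] = costs[i] * factor_percent / 100;
}
```

Taking the migration_cost=1000,2000,3000 example from the same hunk, a factor of 150 would give 1500, 3000 and 4500 usecs.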
diff --git a/Documentation/keys-request-key.txt b/Documentation/keys-request-key.txt index 5f2b9c5edbb5..22488d791168 100644 --- a/Documentation/keys-request-key.txt +++ b/Documentation/keys-request-key.txt | |||
@@ -56,10 +56,12 @@ A request proceeds in the following manner: | |||
56 | (4) request_key() then forks and executes /sbin/request-key with a new session | 56 | (4) request_key() then forks and executes /sbin/request-key with a new session |
57 | keyring that contains a link to auth key V. | 57 | keyring that contains a link to auth key V. |
58 | 58 | ||
59 | (5) /sbin/request-key execs an appropriate program to perform the actual | 59 | (5) /sbin/request-key assumes the authority associated with key U. |
60 | |||
61 | (6) /sbin/request-key execs an appropriate program to perform the actual | ||
60 | instantiation. | 62 | instantiation. |
61 | 63 | ||
62 | (6) The program may want to access another key from A's context (say a | 64 | (7) The program may want to access another key from A's context (say a |
63 | Kerberos TGT key). It just requests the appropriate key, and the keyring | 65 | Kerberos TGT key). It just requests the appropriate key, and the keyring |
64 | search notes that the session keyring has auth key V in its bottom level. | 66 | search notes that the session keyring has auth key V in its bottom level. |
65 | 67 | ||
@@ -67,19 +69,19 @@ A request proceeds in the following manner: | |||
67 | UID, GID, groups and security info of process A as if it was process A, | 69 | UID, GID, groups and security info of process A as if it was process A, |
68 | and come up with key W. | 70 | and come up with key W. |
69 | 71 | ||
70 | (7) The program then does what it must to get the data with which to | 72 | (8) The program then does what it must to get the data with which to |
71 | instantiate key U, using key W as a reference (perhaps it contacts a | 73 | instantiate key U, using key W as a reference (perhaps it contacts a |
72 | Kerberos server using the TGT) and then instantiates key U. | 74 | Kerberos server using the TGT) and then instantiates key U. |
73 | 75 | ||
74 | (8) Upon instantiating key U, auth key V is automatically revoked so that it | 76 | (9) Upon instantiating key U, auth key V is automatically revoked so that it |
75 | may not be used again. | 77 | may not be used again. |
76 | 78 | ||
77 | (9) The program then exits 0 and request_key() deletes key V and returns key | 79 | (10) The program then exits 0 and request_key() deletes key V and returns key |
78 | U to the caller. | 80 | U to the caller. |
79 | 81 | ||
80 | This also extends further. If key W (step 5 above) didn't exist, key W would be | 82 | This also extends further. If key W (step 7 above) didn't exist, key W would be |
81 | created uninstantiated, another auth key (X) would be created [as per step 3] | 83 | created uninstantiated, another auth key (X) would be created (as per step 3) |
82 | and another copy of /sbin/request-key spawned [as per step 4]; but the context | 84 | and another copy of /sbin/request-key spawned (as per step 4); but the context |
83 | specified by auth key X will still be process A, as it was in auth key V. | 85 | specified by auth key X will still be process A, as it was in auth key V. |
84 | 86 | ||
85 | This is because process A's keyrings can't simply be attached to | 87 | This is because process A's keyrings can't simply be attached to |
@@ -138,8 +140,8 @@ until one succeeds: | |||
138 | 140 | ||
139 | (3) The process's session keyring is searched. | 141 | (3) The process's session keyring is searched. |
140 | 142 | ||
141 | (4) If the process has a request_key() authorisation key in its session | 143 | (4) If the process has assumed the authority associated with a request_key() |
142 | keyring then: | 144 | authorisation key then: |
143 | 145 | ||
144 | (a) If extant, the calling process's thread keyring is searched. | 146 | (a) If extant, the calling process's thread keyring is searched. |
145 | 147 | ||
diff --git a/Documentation/keys.txt b/Documentation/keys.txt index 6304db59bfe4..aaa01b0e3ee9 100644 --- a/Documentation/keys.txt +++ b/Documentation/keys.txt | |||
@@ -308,6 +308,8 @@ process making the call: | |||
308 | KEY_SPEC_USER_KEYRING -4 UID-specific keyring | 308 | KEY_SPEC_USER_KEYRING -4 UID-specific keyring |
309 | KEY_SPEC_USER_SESSION_KEYRING -5 UID-session keyring | 309 | KEY_SPEC_USER_SESSION_KEYRING -5 UID-session keyring |
310 | KEY_SPEC_GROUP_KEYRING -6 GID-specific keyring | 310 | KEY_SPEC_GROUP_KEYRING -6 GID-specific keyring |
311 | KEY_SPEC_REQKEY_AUTH_KEY -7 assumed request_key() | ||
312 | authorisation key | ||
311 | 313 | ||
312 | 314 | ||
313 | The main syscalls are: | 315 | The main syscalls are: |
@@ -498,7 +500,11 @@ The keyctl syscall functions are: | |||
498 | keyring is full, error ENFILE will result. | 500 | keyring is full, error ENFILE will result. |
499 | 501 | ||
500 | The link procedure checks the nesting of the keyrings, returning ELOOP if | 502 | The link procedure checks the nesting of the keyrings, returning ELOOP if |
501 | it appears to deep or EDEADLK if the link would introduce a cycle. | 503 | it appears too deep or EDEADLK if the link would introduce a cycle. |
504 | |||
505 | Any links within the keyring to keys that match the new key in terms of | ||
506 | type and description will be discarded from the keyring as the new one is | ||
507 | added. | ||
502 | 508 | ||
503 | 509 | ||
504 | (*) Unlink a key or keyring from another keyring: | 510 | (*) Unlink a key or keyring from another keyring: |
@@ -628,6 +634,41 @@ The keyctl syscall functions are: | |||
628 | there is one, otherwise the user default session keyring. | 634 | there is one, otherwise the user default session keyring. |
629 | 635 | ||
630 | 636 | ||
637 | (*) Set the timeout on a key. | ||
638 | |||
639 | long keyctl(KEYCTL_SET_TIMEOUT, key_serial_t key, unsigned timeout); | ||
640 | |||
641 | This sets or clears the timeout on a key. The timeout can be 0 to clear | ||
642 | the timeout or a number of seconds to set the expiry time that far into | ||
643 | the future. | ||
644 | |||
645 | The process must have attribute modification access on a key to set its | ||
646 | timeout. Timeouts may not be set with this function on negative, revoked | ||
647 | or expired keys. | ||
648 | |||
649 | |||
650 | (*) Assume the authority granted to instantiate a key | ||
651 | |||
652 | long keyctl(KEYCTL_ASSUME_AUTHORITY, key_serial_t key); | ||
653 | |||
654 | This assumes or divests the authority required to instantiate the | ||
655 | specified key. Authority can only be assumed if the thread has the | ||
656 | authorisation key associated with the specified key in its keyrings | ||
657 | somewhere. | ||
658 | |||
659 | Once authority is assumed, searches for keys will also search the | ||
660 | requester's keyrings using the requester's security label, UID, GID and | ||
661 | groups. | ||
662 | |||
663 | If the requested authority is unavailable, error EPERM will be returned, | ||
664 | likewise if the authority has been revoked because the target key is | ||
665 | already instantiated. | ||
666 | |||
667 | If the specified key is 0, then any assumed authority will be divested. | ||
668 | |||
669 | The assumed authoritative key is inherited across fork and exec. | ||
670 | |||
671 | |||
631 | =============== | 672 | =============== |
632 | KERNEL SERVICES | 673 | KERNEL SERVICES |
633 | =============== | 674 | =============== |
diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt index 0541fe1de704..0ea5a0c6e827 100644 --- a/Documentation/kprobes.txt +++ b/Documentation/kprobes.txt | |||
@@ -411,7 +411,8 @@ int init_module(void) | |||
411 | printk("Couldn't find %s to plant kprobe\n", "do_fork"); | 411 | printk("Couldn't find %s to plant kprobe\n", "do_fork"); |
412 | return -1; | 412 | return -1; |
413 | } | 413 | } |
414 | if ((ret = register_kprobe(&kp) < 0)) { | 414 | ret = register_kprobe(&kp); |
415 | if (ret < 0) { | ||
415 | printk("register_kprobe failed, returned %d\n", ret); | 416 | printk("register_kprobe failed, returned %d\n", ret); |
416 | return -1; | 417 | return -1; |
417 | } | 418 | } |
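The kprobes.txt change above splits the assignment out of the condition. The likely reason is operator precedence: in "ret = register_kprobe(&kp) < 0" the comparison is evaluated first, so ret receives 0 or 1 instead of the error code that the following printk reports. The sketch below reproduces the effect in userspace with a hypothetical stand-in function; it does not call the real kprobes API.

```c
/*
 * Stand-in for a call that fails with a negative error value.
 * Hypothetical; -22 is an arbitrary choice for illustration.
 */
static int fake_register(void)
{
    return -22;
}

/*
 * Old form: `<` binds tighter than `=`, so ret is assigned the result
 * of the comparison rather than the return value of the call.
 */
int ret_old_form(void)
{
    int ret;

    if ((ret = fake_register() < 0))
        return ret;     /* the comparison result, not the error code */
    return 0;
}

/*
 * New form: assign first, then compare, so ret keeps the error code.
 */
int ret_new_form(void)
{
    int ret;

    ret = fake_register();
    if (ret < 0)
        return ret;     /* the value returned by the call */
    return 0;
}
```

Both forms detect the failure, but only the second one preserves the value that the error message prints.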
diff --git a/Documentation/laptop-mode.txt b/Documentation/laptop-mode.txt index dc4e810afdcd..b18e21675906 100644 --- a/Documentation/laptop-mode.txt +++ b/Documentation/laptop-mode.txt | |||
@@ -3,7 +3,7 @@ How to conserve battery power using laptop-mode | |||
3 | 3 | ||
4 | Document Author: Bart Samwel (bart@samwel.tk) | 4 | Document Author: Bart Samwel (bart@samwel.tk) |
5 | Date created: January 2, 2004 | 5 | Date created: January 2, 2004 |
6 | Last modified: July 10, 2004 | 6 | Last modified: December 06, 2004 |
7 | 7 | ||
8 | Introduction | 8 | Introduction |
9 | ------------ | 9 | ------------ |
@@ -33,7 +33,7 @@ or anything. Simply install all the files included in this document, and | |||
33 | laptop mode will automatically be started when you're on battery. For | 33 | laptop mode will automatically be started when you're on battery. For |
34 | your convenience, a tarball containing an installer can be downloaded at: | 34 | your convenience, a tarball containing an installer can be downloaded at: |
35 | 35 | ||
36 | http://www.xs4all.nl/~bsamwel/laptop_mode/tools | 36 | http://www.xs4all.nl/~bsamwel/laptop_mode/tools/ |
37 | 37 | ||
38 | To configure laptop mode, you need to edit the configuration file, which is | 38 | To configure laptop mode, you need to edit the configuration file, which is |
39 | located in /etc/default/laptop-mode on Debian-based systems, or in | 39 | located in /etc/default/laptop-mode on Debian-based systems, or in |
@@ -357,7 +357,7 @@ MAX_AGE=${MAX_AGE:-'600'} | |||
357 | # Read-ahead, in kilobytes | 357 | # Read-ahead, in kilobytes |
358 | READAHEAD=${READAHEAD:-'4096'} | 358 | READAHEAD=${READAHEAD:-'4096'} |
359 | 359 | ||
360 | # Shall we remount journaled fs. with appropiate commit interval? (1=yes) | 360 | # Shall we remount journaled fs. with appropriate commit interval? (1=yes) |
361 | DO_REMOUNTS=${DO_REMOUNTS:-'1'} | 361 | DO_REMOUNTS=${DO_REMOUNTS:-'1'} |
362 | 362 | ||
363 | # And shall we add the "noatime" option to that as well? (1=yes) | 363 | # And shall we add the "noatime" option to that as well? (1=yes) |
@@ -912,7 +912,7 @@ void usage() | |||
912 | exit(0); | 912 | exit(0); |
913 | } | 913 | } |
914 | 914 | ||
915 | int main(int ac, char **av) | 915 | int main(int argc, char **argv) |
916 | { | 916 | { |
917 | int fd; | 917 | int fd; |
918 | char *disk = 0; | 918 | char *disk = 0; |
diff --git a/Documentation/locks.txt b/Documentation/locks.txt index ce1be79edfb8..e3b402ef33bd 100644 --- a/Documentation/locks.txt +++ b/Documentation/locks.txt | |||
@@ -65,20 +65,3 @@ The default is to disallow mandatory locking. The intention is that | |||
65 | mandatory locking only be enabled on a local filesystem as the specific need | 65 | mandatory locking only be enabled on a local filesystem as the specific need |
66 | arises. | 66 | arises. |
67 | 67 | ||
68 | Until an updated version of mount(8) becomes available you may have to apply | ||
69 | this patch to the mount sources (based on the version distributed with Rick | ||
70 | Faith's util-linux-2.5 package): | ||
71 | |||
72 | *** mount.c.orig Sat Jun 8 09:14:31 1996 | ||
73 | --- mount.c Sat Jun 8 09:13:02 1996 | ||
74 | *************** | ||
75 | *** 100,105 **** | ||
76 | --- 100,107 ---- | ||
77 | { "noauto", 0, MS_NOAUTO }, /* Can only be mounted explicitly */ | ||
78 | { "user", 0, MS_USER }, /* Allow ordinary user to mount */ | ||
79 | { "nouser", 1, MS_USER }, /* Forbid ordinary user to mount */ | ||
80 | + { "mand", 0, MS_MANDLOCK }, /* Allow mandatory locks on this FS */ | ||
81 | + { "nomand", 1, MS_MANDLOCK }, /* Forbid mandatory locks on this FS */ | ||
82 | /* add new options here */ | ||
83 | #ifdef MS_NOSUB | ||
84 | { "sub", 1, MS_NOSUB }, /* allow submounts */ | ||
diff --git a/Documentation/mutex-design.txt b/Documentation/mutex-design.txt new file mode 100644 index 000000000000..cbf79881a41c --- /dev/null +++ b/Documentation/mutex-design.txt | |||
@@ -0,0 +1,135 @@ | |||
1 | Generic Mutex Subsystem | ||
2 | |||
3 | started by Ingo Molnar <mingo@redhat.com> | ||
4 | |||
5 | "Why on earth do we need a new mutex subsystem, and what's wrong | ||
6 | with semaphores?" | ||
7 | |||
8 | firstly, there's nothing wrong with semaphores. But if the simpler | ||
9 | mutex semantics are sufficient for your code, then there are a couple | ||
10 | of advantages of mutexes: | ||
11 | |||
12 | - 'struct mutex' is smaller on most architectures: e.g. on x86, | |
13 | 'struct semaphore' is 20 bytes, 'struct mutex' is 16 bytes. | ||
14 | A smaller structure size means less RAM footprint, and better | ||
15 | CPU-cache utilization. | ||
16 | |||
17 | - tighter code. On x86 i get the following .text sizes when | ||
18 | switching all mutex-alike semaphores in the kernel to the mutex | ||
19 | subsystem: | ||
20 | |||
21 | text data bss dec hex filename | ||
22 | 3280380 868188 396860 4545428 455b94 vmlinux-semaphore | ||
23 | 3255329 865296 396732 4517357 44eded vmlinux-mutex | ||
24 | |||
25 | that's 25051 bytes of code saved, or a 0.76% win - off the hottest | ||
26 | codepaths of the kernel. (The .data savings are 2892 bytes, or 0.33%) | ||
27 | Smaller code means better icache footprint, which is one of the | ||
28 | major optimization goals in the Linux kernel currently. | ||
29 | |||
30 | - the mutex subsystem is slightly faster and has better scalability for | ||
31 | contended workloads. On an 8-way x86 system, running a mutex-based | ||
32 | kernel and testing creat+unlink+close (of separate, per-task files) | ||
33 | in /tmp with 16 parallel tasks, the average number of ops/sec is: | ||
34 | |||
35 | Semaphores: Mutexes: | ||
36 | |||
37 | $ ./test-mutex V 16 10 $ ./test-mutex V 16 10 | ||
38 | 8 CPUs, running 16 tasks. 8 CPUs, running 16 tasks. | ||
39 | checking VFS performance. checking VFS performance. | ||
40 | avg loops/sec: 34713 avg loops/sec: 84153 | ||
41 | CPU utilization: 63% CPU utilization: 22% | ||
42 | |||
43 | i.e. in this workload, the mutex based kernel was 2.4 times faster | ||
44 | than the semaphore based kernel, _and_ it also had 2.8 times less CPU | ||
45 | utilization. (In terms of 'ops per CPU cycle', the semaphore kernel | ||
46 | performed 551 ops/sec per 1% of CPU time used, while the mutex kernel | ||
47 | performed 3825 ops/sec per 1% of CPU time used - it was 6.9 times | ||
48 | more efficient.) | ||
49 | |||
50 | the scalability difference is visible even on a 2-way P4 HT box: | ||
51 | |||
52 | Semaphores: Mutexes: | ||
53 | |||
54 | $ ./test-mutex V 16 10 $ ./test-mutex V 16 10 | ||
55 | 4 CPUs, running 16 tasks. 4 CPUs, running 16 tasks. | |
56 | checking VFS performance. checking VFS performance. | ||
57 | avg loops/sec: 127659 avg loops/sec: 181082 | ||
58 | CPU utilization: 100% CPU utilization: 34% | ||
59 | |||
60 | (the straight performance advantage of mutexes is 41%, the per-cycle | ||
61 | efficiency of mutexes is 4.1 times better.) | ||
62 | |||
63 | - there are no fastpath tradeoffs, the mutex fastpath is just as tight | ||
64 | as the semaphore fastpath. On x86, the locking fastpath is 2 | ||
65 | instructions: | ||
66 | |||
67 | c0377ccb <mutex_lock>: | ||
68 | c0377ccb: f0 ff 08 lock decl (%eax) | ||
69 | c0377cce: 78 0e js c0377cde <.text.lock.mutex> | ||
70 | c0377cd0: c3 ret | ||
71 | |||
72 | the unlocking fastpath is equally tight: | ||
73 | |||
74 | c0377cd1 <mutex_unlock>: | ||
75 | c0377cd1: f0 ff 00 lock incl (%eax) | ||
76 | c0377cd4: 7e 0f jle c0377ce5 <.text.lock.mutex+0x7> | ||
77 | c0377cd6: c3 ret | ||
78 | |||
79 | - 'struct mutex' semantics are well-defined and are enforced if | ||
80 | CONFIG_DEBUG_MUTEXES is turned on. Semaphores on the other hand have | ||
81 | virtually no debugging code or instrumentation. The mutex subsystem | ||
82 | checks and enforces the following rules: | ||
83 | |||
84 | * - only one task can hold the mutex at a time | ||
85 | * - only the owner can unlock the mutex | ||
86 | * - multiple unlocks are not permitted | ||
87 | * - recursive locking is not permitted | ||
88 | * - a mutex object must be initialized via the API | ||
89 | * - a mutex object must not be initialized via memset or copying | ||
90 | * - task may not exit with mutex held | ||
91 | * - memory areas where held locks reside must not be freed | ||
92 | * - held mutexes must not be reinitialized | ||
93 | * - mutexes may not be used in irq contexts | ||
94 | |||
95 | furthermore, there are also convenience features in the debugging | ||
96 | code: | ||
97 | |||
98 | * - uses symbolic names of mutexes, whenever they are printed in debug output | ||
99 | * - point-of-acquire tracking, symbolic lookup of function names | ||
100 | * - list of all locks held in the system, printout of them | ||
101 | * - owner tracking | ||
102 | * - detects self-recursing locks and prints out all relevant info | ||
103 | * - detects multi-task circular deadlocks and prints out all affected | ||
104 | * locks and tasks (and only those tasks) | ||
105 | |||
106 | Disadvantages | ||
107 | ------------- | ||
108 | |||
109 | The stricter mutex API means you cannot use mutexes the same way you | ||
110 | can use semaphores: e.g. they cannot be used from an interrupt context, | ||
111 | nor can they be unlocked from a different context than that which acquired | |
112 | them. [ I'm not aware of any other (e.g. performance) disadvantages from | |
113 | using mutexes at the moment, please let me know if you find any. ] | ||
114 | |||
115 | Implementation of mutexes | ||
116 | ------------------------- | ||
117 | |||
118 | 'struct mutex' is the new mutex type, defined in include/linux/mutex.h | ||
119 | and implemented in kernel/mutex.c. It is a counter-based mutex with a | ||
120 | spinlock and a wait-list. The counter has 3 states: 1 for "unlocked", | ||
121 | 0 for "locked" and negative numbers (usually -1) for "locked, potential | ||
122 | waiters queued". | ||
123 | |||
124 | the APIs of 'struct mutex' have been streamlined: | ||
125 | |||
126 | DEFINE_MUTEX(name); | ||
127 | |||
128 | mutex_init(mutex); | ||
129 | |||
130 | void mutex_lock(struct mutex *lock); | ||
131 | int mutex_lock_interruptible(struct mutex *lock); | ||
132 | int mutex_trylock(struct mutex *lock); | ||
133 | void mutex_unlock(struct mutex *lock); | ||
134 | int mutex_is_locked(struct mutex *lock); | ||
135 | |||
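The three counter states described above (1 for "unlocked", 0 for "locked", negative for "locked, potential waiters queued") can be sketched in userspace C. This is only a rough model under stated assumptions: the toy_mutex names are hypothetical, C11 atomics stand in for the x86 "lock decl" / "lock incl" instructions, and the spinlock and wait-list are left out, so the slowpath is reduced to a placeholder. It is not the kernel's implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of the counter only; not the kernel's struct mutex. */
struct toy_mutex {
	atomic_int count;	/* 1: unlocked, 0: locked, <0: locked, waiters possible */
};

static void toy_mutex_init(struct toy_mutex *m)
{
	atomic_init(&m->count, 1);
}

/*
 * Lock fastpath, modelled on "lock decl; js slowpath": decrement the
 * counter and succeed only if it went from 1 to 0. A false return means
 * the caller would have to enter the slowpath and queue on the wait-list.
 */
static bool toy_mutex_lock_fastpath(struct toy_mutex *m)
{
	return atomic_fetch_sub(&m->count, 1) == 1;
}

/* Try to go from 1 to 0 without disturbing the counter on failure. */
static bool toy_mutex_trylock(struct toy_mutex *m)
{
	int expected = 1;

	return atomic_compare_exchange_strong(&m->count, &expected, 0);
}

/*
 * Unlock, modelled on "lock incl; jle slowpath": increment the counter.
 * Returns true if the fastpath sufficed (0 -> 1). Otherwise the counter
 * was negative; a real slowpath would wake a waiter under the spinlock,
 * which this sketch replaces by simply resetting the counter to 1.
 */
static bool toy_mutex_unlock(struct toy_mutex *m)
{
	if (atomic_fetch_add(&m->count, 1) == 0)
		return true;
	atomic_store(&m->count, 1);
	return false;
}

static bool toy_mutex_is_locked(struct toy_mutex *m)
{
	return atomic_load(&m->count) != 1;
}
```

A second lock attempt on a held toy_mutex drives the counter to -1, which is why the following unlock cannot take the fastpath: the negative value is the signal that somebody may be waiting.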
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt index b0fe41da007b..8d8b4e5ea184 100644 --- a/Documentation/networking/bonding.txt +++ b/Documentation/networking/bonding.txt | |||
@@ -945,7 +945,6 @@ bond0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4 | |||
945 | collisions:0 txqueuelen:0 | 945 | collisions:0 txqueuelen:0 |
946 | 946 | ||
947 | eth0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4 | 947 | eth0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4 |
948 | inet addr:XXX.XXX.XXX.YYY Bcast:XXX.XXX.XXX.255 Mask:255.255.252.0 | ||
949 | UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 | 948 | UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 |
950 | RX packets:3573025 errors:0 dropped:0 overruns:0 frame:0 | 949 | RX packets:3573025 errors:0 dropped:0 overruns:0 frame:0 |
951 | TX packets:1643167 errors:1 dropped:0 overruns:1 carrier:0 | 950 | TX packets:1643167 errors:1 dropped:0 overruns:1 carrier:0 |
@@ -953,7 +952,6 @@ eth0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4 | |||
953 | Interrupt:10 Base address:0x1080 | 952 | Interrupt:10 Base address:0x1080 |
954 | 953 | ||
955 | eth1 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4 | 954 | eth1 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4 |
956 | inet addr:XXX.XXX.XXX.YYY Bcast:XXX.XXX.XXX.255 Mask:255.255.252.0 | ||
957 | UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 | 955 | UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 |
958 | RX packets:3651769 errors:0 dropped:0 overruns:0 frame:0 | 956 | RX packets:3651769 errors:0 dropped:0 overruns:0 frame:0 |
959 | TX packets:1643480 errors:0 dropped:0 overruns:0 carrier:0 | 957 | TX packets:1643480 errors:0 dropped:0 overruns:0 carrier:0 |
diff --git a/Documentation/networking/sk98lin.txt b/Documentation/networking/sk98lin.txt index 851fc97bb22f..7837c53fd5fe 100644 --- a/Documentation/networking/sk98lin.txt +++ b/Documentation/networking/sk98lin.txt | |||
@@ -91,7 +91,7 @@ To use the driver as a module, proceed as follows: | |||
91 | with (M) | 91 | with (M) |
92 | 5. Execute the command "make modules". | 92 | 5. Execute the command "make modules". |
93 | 6. Execute the command "make modules_install". | 93 | 6. Execute the command "make modules_install". |
94 | The appropiate modules will be installed. | 94 | The appropriate modules will be installed. |
95 | 7. Reboot your system. | 95 | 7. Reboot your system. |
96 | 96 | ||
97 | 97 | ||
@@ -245,7 +245,7 @@ Default: Both | |||
245 | This parameters is only relevant if auto-negotiation for this port is | 245 | This parameters is only relevant if auto-negotiation for this port is |
246 | not set to "Sense". If auto-negotiation is set to "On", all three values | 246 | not set to "Sense". If auto-negotiation is set to "On", all three values |
247 | are possible. If it is set to "Off", only "Full" and "Half" are allowed. | 247 | are possible. If it is set to "Off", only "Full" and "Half" are allowed. |
248 | This parameter is usefull if your link partner does not support all | 248 | This parameter is useful if your link partner does not support all |
249 | possible combinations. | 249 | possible combinations. |
250 | 250 | ||
251 | Flow Control | 251 | Flow Control |
diff --git a/Documentation/oops-tracing.txt b/Documentation/oops-tracing.txt index 05960f8a748e..2503404ae5c2 100644 --- a/Documentation/oops-tracing.txt +++ b/Documentation/oops-tracing.txt | |||
@@ -41,11 +41,9 @@ the disk is not available then you have three options :- | |||
41 | run a null modem to a second machine and capture the output there | 41 | run a null modem to a second machine and capture the output there |
42 | using your favourite communication program. Minicom works well. | 42 | using your favourite communication program. Minicom works well. |
43 | 43 | ||
44 | (3) Patch the kernel with one of the crash dump patches. These save | 44 | (3) Use Kdump (see Documentation/kdump/kdump.txt), |
45 | data to a floppy disk or video rom or a swap partition. None of | 45 | extract the kernel ring buffer from old memory with using dmesg |
46 | these are standard kernel patches so you have to find and apply | 46 | gdbmacro in Documentation/kdump/gdbmacros.txt. |
47 | them yourself. Search kernel archives for kmsgdump, lkcd and | ||
48 | oops+smram. | ||
49 | 47 | ||
50 | 48 | ||
51 | Full Information | 49 | Full Information |
diff --git a/Documentation/pci-error-recovery.txt b/Documentation/pci-error-recovery.txt new file mode 100644 index 000000000000..d089967e4948 --- /dev/null +++ b/Documentation/pci-error-recovery.txt | |||
@@ -0,0 +1,246 @@ | |||
1 | |||
2 | PCI Error Recovery | ||
3 | ------------------ | ||
4 | May 31, 2005 | ||
5 | |||
6 | Current document maintainer: | ||
7 | Linas Vepstas <linas@austin.ibm.com> | ||
8 | |||
9 | |||
10 | Some PCI bus controllers are able to detect certain "hard" PCI errors | ||
11 | on the bus, such as parity errors on the data and address busses, as | ||
12 | well as SERR and PERR errors. These chipsets are then able to disable | ||
13 | I/O to/from the affected device, so that, for example, a bad DMA | ||
14 | address doesn't end up corrupting system memory. These same chipsets | ||
15 | are also able to reset the affected PCI device, and return it to | ||
16 | working condition. This document describes a generic API for | |
17 | performing error recovery. | ||
18 | |||
19 | The core idea is that after a PCI error has been detected, there must | ||
20 | be a way for the kernel to coordinate with all affected device drivers | ||
21 | so that the pci card can be made operational again, possibly after | ||
22 | performing a full electrical #RST of the PCI card. The API below | ||
23 | provides a generic API for device drivers to be notified of PCI | ||
24 | errors, and to be notified of, and respond to, a reset sequence. | ||
25 | |||
26 | Preliminary sketch of API, cut-n-pasted-n-modified email from | ||
27 | Ben Herrenschmidt, circa 5 april 2005 | ||
28 | |||
29 | The error recovery API support is exposed to the driver in the form of | ||
30 | a structure of function pointers pointed to by a new field in struct | ||
31 | pci_driver. The absence of this pointer in pci_driver denotes a | |
32 | "non-aware" driver; behaviour on these is platform dependent. | |
33 | Platforms like ppc64 can try to simulate pci hotplug remove/add. | ||
34 | |||
35 | The definition of "pci_error_token" is not covered here. It is based on | ||
36 | Seto's work on the synchronous error detection. We still need to define | ||
37 | functions for extracting information out of an opaque error token. This is | |
38 | separate from this API. | ||
39 | |||
40 | This structure has the form: | ||
41 | |||
42 | struct pci_error_handlers | ||
43 | { | ||
44 | int (*error_detected)(struct pci_dev *dev, pci_error_token error); | ||
45 | int (*mmio_enabled)(struct pci_dev *dev); | ||
46 | int (*resume)(struct pci_dev *dev); | ||
47 | int (*link_reset)(struct pci_dev *dev); | ||
48 | int (*slot_reset)(struct pci_dev *dev); | ||
49 | }; | ||
50 | |||
51 | A driver doesn't have to implement all of these callbacks. The | ||
52 | only mandatory one is error_detected(). If a callback is not | ||
53 | implemented, the corresponding feature is considered unsupported. | ||
54 | For example, if mmio_enabled() and resume() aren't there, then the | ||
55 | driver is assumed as not doing any direct recovery and requires | ||
56 | a reset. If link_reset() is not implemented, the card is assumed as | ||
57 | not caring about link resets, in which case, if recovery is supported, | |
58 | the core can try recovery (but not slot_reset() unless it really did | |
59 | reset the slot). If slot_reset() is not supported, link_reset() can | ||
60 | be called instead on a slot reset. | ||
61 | |||
62 | At first, the call will always be : | ||
63 | |||
64 | 1) error_detected() | ||
65 | |||
66 | Error detected. This is sent once after an error has been detected. At | ||
67 | this point, the device might not be accessible anymore depending on the | ||
68 | platform (the slot will be isolated on ppc64). The driver may already | ||
69 | have "noticed" the error because of a failing IO, but this is the proper | ||
70 | "synchronisation point", that is, it gives a chance to the driver to | ||
71 | cleanup, waiting for pending stuff (timers, whatever, etc...) to | ||
72 | complete; it can take semaphores, schedule, etc... everything but touch | ||
73 | the device. Within this function and after it returns, the driver | ||
74 | shouldn't do any new IOs. Called in task context. This is sort of a | ||
75 | "quiesce" point. See note about interrupts at the end of this doc. | ||
76 | |||
77 | Result codes: | ||
78 | - PCIERR_RESULT_CAN_RECOVER: | ||
79 | Driver returns this if it thinks it might be able to recover | |
80 | the HW by just banging IOs or if it wants to be given | |
81 | a chance to extract some diagnostic information (see | |
82 | below). | ||
83 | - PCIERR_RESULT_NEED_RESET: | ||
84 | Driver returns this if it thinks it can't recover unless the | ||
85 | slot is reset. | ||
86 | - PCIERR_RESULT_DISCONNECT: | ||
87 | Return this if driver thinks it won't recover at all, | ||
88 | (this will detach the driver ? or just leave it | ||
89 | dangling ? to be decided) | ||
90 | |||
91 | So at this point, we have called error_detected() for all drivers | ||
92 | on the segment that had the error. On ppc64, the slot is isolated. What | ||
93 | happens now typically depends on the result from the drivers. If all | ||
94 | drivers on the segment/slot return PCIERR_RESULT_CAN_RECOVER, we would | ||
95 | re-enable IOs on the slot (or do nothing special if the platform doesn't | ||
96 | isolate slots) and call 2). If not and we can reset slots, we go to 4), | ||
97 | if neither, we have a dead slot. If it's a hotplug slot, we might | |
98 | "simulate" reset by triggering HW unplug/replug though. | ||
99 | |||
100 | >>> Current ppc64 implementation assumes that a device driver will | ||
101 | >>> *not* schedule or semaphore in this routine; the current ppc64 | ||
102 | >>> implementation uses one kernel thread to notify all devices; | ||
103 | >>> thus, if one device sleeps/schedules, all devices are affected. | |
104 | >>> Doing better requires complex multi-threaded logic in the error | ||
105 | >>> recovery implementation (e.g. waiting for all notification threads | ||
106 | >>> to "join" before proceeding with recovery.) This seems excessively | ||
107 | >>> complex and not worth implementing. | ||
108 | |||
109 | >>> The current ppc64 implementation doesn't much care if the device | ||
110 | >>> attempts i/o at this point, or not. I/O's will fail, returning | ||
111 | >>> a value of 0xff on read, and writes will be dropped. If the device | ||
112 | >>> driver attempts more than 10K I/O's to a frozen adapter, it will | ||
113 | >>> assume that the device driver has gone into an infinite loop, and | ||
114 | >>> it will panic the kernel. | |
115 | |||
116 | 2) mmio_enabled() | ||
117 | |||
118 | This is the "early recovery" call. IOs are allowed again, but DMA is | ||
119 | not (hrm... to be discussed, I prefer not), with some restrictions. This | ||
120 | is NOT a callback for the driver to start operations again, only to | ||
121 | peek/poke at the device, extract diagnostic information, if any, and | ||
122 | eventually do things like trigger a device local reset or some such, | ||
123 | but not restart operations. This is sent if all drivers on a segment | ||
124 | agree that they can try to recover and no automatic link reset was | ||
125 | performed by the HW. If the platform can't just re-enable IOs without | ||
126 | a slot reset or a link reset, it doesn't call this callback and goes | ||
127 | directly to 3) or 4). All IOs should be done _synchronously_ from | ||
128 | within this callback, errors triggered by them will be returned via | ||
129 | the normal pci_check_whatever() api, no new error_detected() callback | ||
130 | will be issued due to an error happening here. However, such an error | ||
131 | might cause IOs to be re-blocked for the whole segment, and thus | ||
132 | invalidate the recovery that other devices on the same segment might | ||
133 | have done, forcing the whole segment into one of the next states, | ||
134 | that is link reset or slot reset. | ||
135 | |||
136 | Result codes: | ||
137 | - PCIERR_RESULT_RECOVERED | ||
138 | Driver returns this if it thinks the device is fully | ||
139 | functional and thinks it is ready to start | |
140 | normal driver operations again. There is no | ||
141 | guarantee that the driver will actually be | ||
142 | allowed to proceed, as another driver on the | ||
143 | same segment might have failed and thus triggered a | ||
144 | slot reset on platforms that support it. | ||
145 | |||
146 | - PCIERR_RESULT_NEED_RESET | ||
147 | Driver returns this if it thinks the device is not | ||
148 | recoverable in its current state and it needs a slot | |
149 | reset to proceed. | ||
150 | |||
151 | - PCIERR_RESULT_DISCONNECT | ||
152 | Same as above. Total failure, no recovery even after | ||
153 | reset, driver dead. (To be defined more precisely) | |
154 | |||
155 | >>> The current ppc64 implementation does not implement this callback. | ||
156 | |||
157 | 3) link_reset() | ||
158 | |||
159 | This is called after the link has been reset. This is typically | ||
160 | a PCI Express specific state at this point and is done whenever a | ||
161 | non-fatal error has been detected that can be "solved" by resetting | ||
162 | the link. This call informs the driver of the reset and the driver | ||
163 | should check if the device appears to be in working condition. | ||
164 | This function acts a bit like 2) mmio_enabled(), in that the driver | ||
165 | is not supposed to restart normal driver I/O operations right away. | ||
166 | Instead, it should just "probe" the device to check its recoverability | |
167 | status. If all is right, then the core will call resume() once all | ||
168 | drivers have ack'd link_reset(). | ||
169 | |||
170 | Result codes: | ||
171 | (identical to mmio_enabled) | ||
172 | |||
173 | >>> The current ppc64 implementation does not implement this callback. | ||
174 | |||
175 | 4) slot_reset() | ||
176 | |||
177 | This is called after the slot has been soft or hard reset by the | ||
178 | platform. A soft reset consists of asserting the adapter #RST line | ||
179 | and then restoring the PCI BARs and PCI configuration header. If the | ||
180 | platform supports PCI hotplug, then it might instead perform a hard | ||
181 | reset by toggling power on the slot off/on. This call gives drivers | ||
182 | the chance to re-initialize the hardware (re-download firmware, etc.), | ||
183 | but drivers shouldn't restart normal I/O processing operations at | ||
184 | this point. (See note about interrupts; interrupts aren't guaranteed | ||
185 | to be delivered until the resume() callback has been called). If all | ||
186 | device drivers report success on this callback, the platform will call | |
187 | resume() to complete the error handling and let the driver restart | ||
188 | normal I/O processing. | ||
189 | |||
190 | A driver can still return a critical failure for this function if | ||
191 | it can't get the device operational after reset. If the platform | ||
192 | previously tried a soft reset, it might now try a hard reset (power | |
193 | cycle) and then call slot_reset() again. If the device still can't | |
194 | be recovered, there is nothing more that can be done; the platform | ||
195 | will typically report a "permanent failure" in such a case. The | ||
196 | device will be considered "dead" in this case. | ||
197 | |||
198 | Result codes: | ||
199 | - PCIERR_RESULT_DISCONNECT | ||
200 | Same as above. | ||
201 | |||
202 | >>> The current ppc64 implementation does not try a power-cycle reset | ||
203 | >>> if the driver returned PCIERR_RESULT_DISCONNECT. However, it should. | ||
204 | |||
205 | 5) resume() | ||
206 | |||
207 | This is called if all drivers on the segment have returned | ||
208 | PCIERR_RESULT_RECOVERED from one of the 3 previous callbacks. | |
209 | That basically tells the driver to restart activity, that everything | |
210 | is back and running. No result code is taken into account here. If | ||
211 | a new error happens, it will restart a new error handling process. | ||
212 | |||
213 | That's it. I think this covers all the possibilities. The way those | ||
214 | callbacks are called is platform policy. A platform with no slot reset | ||
215 | capability for example may want to just "ignore" drivers that can't | ||
216 | recover (disconnect them) and try to let other cards on the same segment | ||
217 | recover. Keep in mind that in most real life cases, though, there will | ||
218 | be only one driver per segment. | ||
219 | |||
220 | Now, there is a note about interrupts. If you get an interrupt and your | ||
221 | device is dead or has been isolated, there is a problem :) | ||
222 | |||
223 | After much thinking, I decided to leave that to the platform. That is, | ||
224 | the recovery API only specifies that: | |
225 | |||
226 | - There is no guarantee that interrupt delivery can proceed from any | ||
227 | device on the segment starting from the error detection and until the | ||
228 | restart callback is sent, at which point interrupts are expected to be | ||
229 | fully operational. | ||
230 | |||
231 | - There is no guarantee that interrupt delivery is stopped, that is, a | |
232 | driver that gets an interrupt after detecting an error, or that detects | |
233 | an error within the interrupt handler such that it prevents proper | |
234 | ack'ing of the interrupt (and thus removal of the source) should just | |
235 | return IRQ_NOTHANDLED. It's up to the platform to deal with that | |
236 | condition, typically by masking the irq source during the duration of | ||
237 | the error handling. It is expected that the platform "knows" which | ||
238 | interrupts are routed to error-management capable slots and can deal | ||
239 | with temporarily disabling that irq number during error processing (this | ||
240 | isn't terribly complex). That means some IRQ latency for other devices | ||
241 | sharing the interrupt, but there is simply no other way. High end | ||
242 | platforms aren't supposed to share interrupts between many devices | ||
243 | anyway :) | ||
244 | |||
245 | |||
246 | Revised: 31 May 2005 Linas Vepstas <linas@austin.ibm.com> | ||
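The callback sequence in pci-error-recovery.txt above (error_detected(), then mmio_enabled() or slot_reset(), then resume()) can be sketched as a small userspace C state machine. This is a simplified model under loud assumptions: it handles a single driver on the segment, skips link_reset(), drops the pci_error_token argument, makes resume() return void because the text says its result is ignored, and invents the numeric values of the PCIERR_RESULT_* codes. The toy_ and demo_ names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Result codes named in the document; the numeric values are assumptions. */
enum pcierr_result {
	PCIERR_RESULT_CAN_RECOVER,
	PCIERR_RESULT_NEED_RESET,
	PCIERR_RESULT_DISCONNECT,
	PCIERR_RESULT_RECOVERED,
};

/* Hypothetical stand-in for struct pci_dev. */
struct toy_dev {
	int resumed;
};

/* Simplified variant of struct pci_error_handlers (no link_reset, no token). */
struct toy_error_handlers {
	enum pcierr_result (*error_detected)(struct toy_dev *dev);
	enum pcierr_result (*mmio_enabled)(struct toy_dev *dev);
	enum pcierr_result (*slot_reset)(struct toy_dev *dev);
	void (*resume)(struct toy_dev *dev);
};

/* Returns 1 if the device was recovered and resumed, 0 if it is left dead. */
static int toy_recover(struct toy_dev *dev,
		       const struct toy_error_handlers *h, int can_reset_slot)
{
	enum pcierr_result r;

	/* A "non-aware" driver is platform dependent; treated as dead here. */
	if (!h || !h->error_detected)
		return 0;

	r = h->error_detected(dev);			/* step 1 */

	if (r == PCIERR_RESULT_CAN_RECOVER) {
		if (h->mmio_enabled)
			r = h->mmio_enabled(dev);	/* step 2 */
		else
			r = PCIERR_RESULT_NEED_RESET;	/* no direct recovery */
	}

	if (r == PCIERR_RESULT_NEED_RESET) {
		if (!can_reset_slot || !h->slot_reset)
			return 0;			/* dead slot */
		r = h->slot_reset(dev);			/* step 4 */
	}

	if (r != PCIERR_RESULT_RECOVERED)
		return 0;

	if (h->resume)
		h->resume(dev);				/* step 5 */
	return 1;
}

/* Example driver: cannot recover directly, but comes back after a slot reset. */
static enum pcierr_result demo_detected(struct toy_dev *dev)
{
	(void)dev;
	return PCIERR_RESULT_NEED_RESET;
}

static enum pcierr_result demo_slot_reset(struct toy_dev *dev)
{
	(void)dev;
	return PCIERR_RESULT_RECOVERED;
}

static void demo_resume(struct toy_dev *dev)
{
	dev->resumed = 1;
}

static const struct toy_error_handlers demo_handlers = {
	.error_detected	= demo_detected,
	.slot_reset	= demo_slot_reset,
	.resume		= demo_resume,
};
```

With demo_handlers, toy_recover() succeeds only when the platform can reset the slot; on a platform without slot reset the same driver ends up with a dead device, which matches the "if neither, we have a dead slot" case in the text.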
diff --git a/Documentation/pm.txt b/Documentation/pm.txt index 2ea1149bf6b0..79c0f32a760e 100644 --- a/Documentation/pm.txt +++ b/Documentation/pm.txt | |||
@@ -218,7 +218,7 @@ proceed in the opposite direction. | |||
218 | Q: Who do I contact for additional information about | 218 | Q: Who do I contact for additional information about |
219 | enabling power management for my specific driver/device? | 219 | enabling power management for my specific driver/device? |
220 | 220 | ||
221 | ACPI Development mailing list: acpi-devel@lists.sourceforge.net | 221 | ACPI Development mailing list: linux-acpi@vger.kernel.org |
222 | 222 | ||
223 | System Interface -- OBSOLETE, DO NOT USE! | 223 | System Interface -- OBSOLETE, DO NOT USE! |
224 | ----------------************************* | 224 | ----------------************************* |
diff --git a/Documentation/power/swsusp.txt b/Documentation/power/swsusp.txt index cd0fcd89a6f0..08c79d4dc540 100644 --- a/Documentation/power/swsusp.txt +++ b/Documentation/power/swsusp.txt | |||
@@ -212,7 +212,7 @@ A: Try running | |||
212 | 212 | ||
213 | cat `cat /proc/[0-9]*/maps | grep / | sed 's:.* /:/:' | sort -u` > /dev/null | 213 | cat `cat /proc/[0-9]*/maps | grep / | sed 's:.* /:/:' | sort -u` > /dev/null |
214 | 214 | ||
215 | after resume. swapoff -a; swapon -a may also be usefull. | 215 | after resume. swapoff -a; swapon -a may also be useful. |
216 | 216 | ||
217 | Q: What happens to devices during swsusp? They seem to be resumed | 217 | Q: What happens to devices during swsusp? They seem to be resumed |
218 | during system suspend? | 218 | during system suspend? |
@@ -323,7 +323,7 @@ to be useless to try to suspend to disk while that app is running? | |||
323 | A: No, it should work okay, as long as your app does not mlock() | 323 | A: No, it should work okay, as long as your app does not mlock() |
324 | it. Just prepare big enough swap partition. | 324 | it. Just prepare big enough swap partition. |
325 | 325 | ||
326 | Q: What information is usefull for debugging suspend-to-disk problems? | 326 | Q: What information is useful for debugging suspend-to-disk problems? |
327 | 327 | ||
328 | A: Well, last messages on the screen are always useful. If something | 328 | A: Well, last messages on the screen are always useful. If something |
329 | is broken, it is usually some kernel driver, therefore trying with as | 329 | is broken, it is usually some kernel driver, therefore trying with as |
diff --git a/Documentation/powerpc/00-INDEX b/Documentation/powerpc/00-INDEX index e7bea0a407b4..d6d65b9bcfe3 100644 --- a/Documentation/powerpc/00-INDEX +++ b/Documentation/powerpc/00-INDEX | |||
@@ -8,12 +8,18 @@ please mail me. | |||
8 | cpu_features.txt | 8 | cpu_features.txt |
9 | - info on how we support a variety of CPUs with minimal compile-time | 9 | - info on how we support a variety of CPUs with minimal compile-time |
10 | options. | 10 | options. |
11 | eeh-pci-error-recovery.txt | ||
12 | - info on PCI Bus EEH Error Recovery | ||
13 | hvcs.txt | ||
14 | - IBM "Hypervisor Virtual Console Server" Installation Guide | ||
15 | mpc52xx.txt | ||
16 | - Linux 2.6.x on MPC52xx family | ||
11 | ppc_htab.txt | 17 | ppc_htab.txt |
12 | - info about the Linux/PPC /proc/ppc_htab entry | 18 | - info about the Linux/PPC /proc/ppc_htab entry |
13 | smp.txt | ||
14 | - use and state info about Linux/PPC on MP machines | ||
15 | SBC8260_memory_mapping.txt | 19 | SBC8260_memory_mapping.txt |
16 | - EST SBC8260 board info | 20 | - EST SBC8260 board info |
21 | smp.txt | ||
22 | - use and state info about Linux/PPC on MP machines | ||
17 | sound.txt | 23 | sound.txt |
18 | - info on sound support under Linux/PPC | 24 | - info on sound support under Linux/PPC |
19 | zImage_layout.txt | 25 | zImage_layout.txt |
diff --git a/Documentation/scsi/aacraid.txt b/Documentation/scsi/aacraid.txt new file mode 100644 index 000000000000..820fd0793502 --- /dev/null +++ b/Documentation/scsi/aacraid.txt | |||
@@ -0,0 +1,108 @@ | |||
1 | AACRAID Driver for Linux (take two) | ||
2 | |||
3 | Introduction | ||
4 | ------------------------- | ||
5 | The aacraid driver adds support for Adaptec (http://www.adaptec.com) | ||
6 | RAID controllers. This is a major rewrite from the original | ||
7 | Adaptec supplied driver. It has significantly cleaned up both the code | |
8 | and the running binary size (the module is less than half the size of | ||
9 | the original). | ||
10 | |||
11 | Supported Cards/Chipsets | ||
12 | ------------------------- | ||
13 | PCI ID (pci.ids) OEM Product | ||
14 | 9005:0285:9005:028a Adaptec 2020ZCR (Skyhawk) | ||
15 | 9005:0285:9005:028e Adaptec 2020SA (Skyhawk) | ||
16 | 9005:0285:9005:028b Adaptec 2025ZCR (Terminator) | ||
17 | 9005:0285:9005:028f Adaptec 2025SA (Terminator) | ||
18 | 9005:0285:9005:0286 Adaptec 2120S (Crusader) | ||
19 | 9005:0286:9005:028d Adaptec 2130S (Lancer) | ||
20 | 9005:0285:9005:0285 Adaptec 2200S (Vulcan) | ||
21 | 9005:0285:9005:0287 Adaptec 2200S (Vulcan-2m) | ||
22 | 9005:0286:9005:028c Adaptec 2230S (Lancer) | ||
23 | 9005:0286:9005:028c Adaptec 2230SLP (Lancer) | ||
24 | 9005:0285:9005:0296 Adaptec 2240S (SabreExpress) | ||
25 | 9005:0285:9005:0290 Adaptec 2410SA (Jaguar) | ||
26 | 9005:0285:9005:0293 Adaptec 21610SA (Corsair-16) | ||
27 | 9005:0285:103c:3227 Adaptec 2610SA (Bearcat) | ||
28 | 9005:0285:9005:0292 Adaptec 2810SA (Corsair-8) | ||
29 | 9005:0285:9005:0294 Adaptec Prowler | ||
30 | 9005:0286:9005:029d Adaptec 2420SA (Intruder) | ||
31 | 9005:0286:9005:029c Adaptec 2620SA (Intruder) | ||
32 | 9005:0286:9005:029b Adaptec 2820SA (Intruder) | ||
33 | 9005:0286:9005:02a7 Adaptec 2830SA (Skyray) | ||
34 | 9005:0286:9005:02a8 Adaptec 2430SA (Skyray) | ||
35 | 9005:0285:9005:0288 Adaptec 3230S (Harrier) | ||
36 | 9005:0285:9005:0289 Adaptec 3240S (Tornado) | ||
37 | 9005:0285:9005:0298 Adaptec 4000SAS (BlackBird) | ||
38 | 9005:0285:9005:0297 Adaptec 4005SAS (AvonPark) | ||
39 | 9005:0285:9005:0299 Adaptec 4800SAS (Marauder-X) | ||
40 | 9005:0285:9005:029a Adaptec 4805SAS (Marauder-E) | ||
41 | 9005:0286:9005:02a2 Adaptec 4810SAS (Hurricane) | ||
42 | 1011:0046:9005:0364 Adaptec 5400S (Mustang) | ||
43 | 1011:0046:9005:0365 Adaptec 5400S (Mustang) | ||
44 | 9005:0283:9005:0283 Adaptec Catapult (3210S with arc firmware) | ||
45 | 9005:0284:9005:0284 Adaptec Tomcat (3410S with arc firmware) | ||
46 | 9005:0287:9005:0800 Adaptec Themisto (Jupiter) | ||
47 | 9005:0200:9005:0200 Adaptec Themisto (Jupiter) | ||
48 | 9005:0286:9005:0800 Adaptec Callisto (Jupiter) | ||
49 | 1011:0046:9005:1364 Dell PERC 2/QC (Quad Channel, Mustang) | ||
50 | 1028:0001:1028:0001 Dell PERC 2/Si (Iguana) | ||
51 | 1028:0003:1028:0003 Dell PERC 3/Si (SlimFast) | ||
52 | 1028:0002:1028:0002 Dell PERC 3/Di (Opal) | ||
53 | 1028:0004:1028:0004 Dell PERC 3/DiF (Iguana) | ||
54 | 1028:0002:1028:00d1 Dell PERC 3/DiV (Viper) | ||
55 | 1028:0002:1028:00d9 Dell PERC 3/DiL (Lexus) | ||
56 | 1028:000a:1028:0106 Dell PERC 3/DiJ (Jaguar) | ||
57 | 1028:000a:1028:011b Dell PERC 3/DiD (Dagger) | ||
58 | 1028:000a:1028:0121 Dell PERC 3/DiB (Boxster) | ||
59 | 9005:0285:1028:0287 Dell PERC 320/DC (Vulcan) | ||
60 | 9005:0285:1028:0291 Dell CERC 2 (DellCorsair) | ||
61 | 1011:0046:103c:10c2 HP NetRAID-4M (Mustang) | ||
62 | 9005:0285:17aa:0286 Legend S220 (Crusader) | ||
63 | 9005:0285:17aa:0287 Legend S230 (Vulcan) | ||
64 | 9005:0285:9005:0290 IBM ServeRAID 7t (Jaguar) | ||
65 | 9005:0285:1014:02F2 IBM ServeRAID 8i (AvonPark) | ||
66 | 9005:0285:1014:0312 IBM ServeRAID 8i (AvonParkLite) | ||
67 | 9005:0286:1014:9580 IBM ServeRAID 8k/8k-l8 (Aurora) | ||
68 | 9005:0286:1014:9540 IBM ServeRAID 8k/8k-l4 (AuroraLite) | ||
69 | 9005:0286:9005:029f ICP ICP9014R0 (Lancer) | ||
70 | 9005:0286:9005:029e ICP ICP9024R0 (Lancer) | ||
71 | 9005:0286:9005:02a0 ICP ICP9047MA (Lancer) | ||
72 | 9005:0286:9005:02a1 ICP ICP9087MA (Lancer) | ||
73 | 9005:0286:9005:02a4 ICP ICP9085LI (Marauder-X) | ||
74 | 9005:0286:9005:02a5 ICP ICP5085BR (Marauder-E) | ||
75 | 9005:0286:9005:02a3 ICP ICP5085AU (Hurricane) | ||
76 | 9005:0286:9005:02a6 ICP ICP9067MA (Intruder-6) | ||
77 | 9005:0286:9005:02a9 ICP ICP5087AU (Skyray) | ||
78 | 9005:0286:9005:02aa ICP ICP5047AU (Skyray) | ||
79 | |||
80 | People | ||
81 | ------------------------- | ||
82 | Alan Cox <alan@redhat.com> | ||
83 | Christoph Hellwig <hch@infradead.org> (updates for new-style PCI probing and SCSI host registration, | ||
84 | small cleanups/fixes) | ||
85 | Matt Domsch <matt_domsch@dell.com> (revision ioctl, adapter messages) | ||
86 | Deanna Bonds (non-DASD support, PAE fibs and 64 bit, added new adaptec controllers | ||
87 | added new ioctls, changed scsi interface to use new error handler, | ||
88 | increased the number of fibs and outstanding commands to a container) | ||
89 | |||
90 | (fixed 64bit and 64G memory model, changed confusing naming convention | ||
91 | where fibs that go to the hardware are consistently called hw_fibs and | ||
92 | not just fibs like the name of the driver tracking structure) | ||
93 | Mark Salyzyn <Mark_Salyzyn@adaptec.com> Fixed panic issues and added some new product ids for upcoming hbas. Performance tuning, card failover and bug mitigations. | ||
94 | |||
95 | Original Driver | ||
96 | ------------------------- | ||
97 | Adaptec Unix OEM Product Group | ||
98 | |||
99 | Mailing List | ||
100 | ------------------------- | ||
101 | linux-scsi@vger.kernel.org (Interested parties troll here) | ||
102 | Also note this is very different to Brian's original driver | ||
103 | so don't expect him to support it. | ||
104 | Adaptec does support this driver. Contact Adaptec tech support or | ||
105 | aacraid@adaptec.com | ||
106 | |||
107 | Original by Brian Boerner February 2001 | ||
108 | Rewritten by Alan Cox, November 2001 | ||
diff --git a/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl b/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl index 4963d83d1511..e651ed8d1e6f 100644 --- a/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl +++ b/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl | |||
@@ -5577,7 +5577,7 @@ struct _snd_pcm_runtime { | |||
5577 | <informalexample> | 5577 | <informalexample> |
5578 | <programlisting> | 5578 | <programlisting> |
5579 | <![CDATA[ | 5579 | <![CDATA[ |
5580 | static int mychip_suspend(strut pci_dev *pci, pm_message_t state) | 5580 | static int mychip_suspend(struct pci_dev *pci, pm_message_t state) |
5581 | { | 5581 | { |
5582 | /* (1) */ | 5582 | /* (1) */ |
5583 | struct snd_card *card = pci_get_drvdata(pci); | 5583 | struct snd_card *card = pci_get_drvdata(pci); |
diff --git a/Documentation/spi/butterfly b/Documentation/spi/butterfly new file mode 100644 index 000000000000..a2e8c8d90e35 --- /dev/null +++ b/Documentation/spi/butterfly | |||
@@ -0,0 +1,57 @@ | |||
1 | spi_butterfly - parport-to-butterfly adapter driver | ||
2 | =================================================== | ||
3 | |||
4 | This is a hardware and software project that includes building and using | ||
5 | a parallel port adapter cable, together with an "AVR Butterfly" to run | ||
6 | firmware for user interfacing and/or sensors. A Butterfly is a $US20 | ||
7 | battery powered card with an AVR microcontroller and lots of goodies: | ||
8 | sensors, LCD, flash, toggle stick, and more. You can use AVR-GCC to | ||
9 | develop firmware for this, and flash it using this adapter cable. | ||
10 | |||
11 | You can make this adapter from an old printer cable and solder things | ||
12 | directly to the Butterfly. Or (if you have the parts and skills) you | ||
13 | can come up with something fancier, providing circuit protection to the | ||
14 | Butterfly and the printer port, or with a better power supply than two | ||
15 | signal pins from the printer port. | ||
16 | |||
17 | |||
18 | The first cable connections will hook Linux up to one SPI bus, with the | ||
19 | AVR and a DataFlash chip; and to the AVR reset line. This is all you | ||
20 | need to reflash the firmware, and the pins are the standard Atmel "ISP" | ||
21 | connector pins (used also on non-Butterfly AVR boards). | ||
22 | |||
23 | Signal Butterfly Parport (DB-25) | ||
24 | ------ --------- --------------- | ||
25 | SCK = J403.PB1/SCK = pin 2/D0 | ||
26 | RESET = J403.nRST = pin 3/D1 | ||
27 | VCC = J403.VCC_EXT = pin 8/D6 | ||
28 | MOSI = J403.PB2/MOSI = pin 9/D7 | ||
29 | MISO = J403.PB3/MISO = pin 11/S7,nBUSY | ||
30 | GND = J403.GND = pin 23/GND | ||
31 | |||
32 | Then to let Linux master that bus to talk to the DataFlash chip, you must | ||
33 | (a) flash new firmware that disables SPI (set PRR.2, and disable pullups | ||
34 | by clearing PORTB.[0-3]); (b) configure the mtd_dataflash driver; and | ||
35 | (c) cable in the chipselect. | ||
36 | |||
37 | Signal Butterfly Parport (DB-25) | ||
38 | ------ --------- --------------- | ||
39 | VCC = J400.VCC_EXT = pin 7/D5 | ||
40 | SELECT = J400.PB0/nSS = pin 17/C3,nSELECT | ||
41 | GND = J400.GND = pin 24/GND | ||
42 | |||
43 | The "USI" controller, using J405, can be used for a second SPI bus. That | ||
44 | would let you talk to the AVR over SPI, running firmware that makes it act | ||
45 | as an SPI slave, while letting either Linux or the AVR use the DataFlash. | ||
46 | There are plenty of spare parport pins to wire this one up, such as: | ||
47 | |||
48 | Signal Butterfly Parport (DB-25) | ||
49 | ------ --------- --------------- | ||
50 | SCK = J403.PE4/USCK = pin 5/D3 | ||
51 | MOSI = J403.PE5/DI = pin 6/D4 | ||
52 | MISO = J403.PE6/DO = pin 12/S5,nPAPEROUT | ||
53 | GND = J403.GND = pin 22/GND | ||
54 | |||
55 | IRQ = J402.PF4 = pin 10/S6,ACK | ||
56 | GND = J402.GND(P2) = pin 25/GND | ||
57 | |||
diff --git a/Documentation/spi/spi-summary b/Documentation/spi/spi-summary new file mode 100644 index 000000000000..a5ffba33a351 --- /dev/null +++ b/Documentation/spi/spi-summary | |||
@@ -0,0 +1,457 @@ | |||
1 | Overview of Linux kernel SPI support | ||
2 | ==================================== | ||
3 | |||
4 | 02-Dec-2005 | ||
5 | |||
6 | What is SPI? | ||
7 | ------------ | ||
8 | The "Serial Peripheral Interface" (SPI) is a synchronous four wire serial | ||
9 | link used to connect microcontrollers to sensors, memory, and peripherals. | ||
10 | |||
11 | The three signal wires hold a clock (SCLK, often on the order of 10 MHz), | ||
12 | and parallel data lines with "Master Out, Slave In" (MOSI) or "Master In, | ||
13 | Slave Out" (MISO) signals. (Other names are also used.) There are four | ||
14 | clocking modes through which data is exchanged; mode-0 and mode-3 are most | ||
15 | commonly used. Each clock cycle shifts data out and data in; the clock | ||
16 | doesn't cycle except when there is data to shift. | ||
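The full-duplex exchange described above can be pictured as two shift
registers joined in a ring. What follows is a minimal userspace sketch, not
kernel code: the "sketch_" names are hypothetical, and it assumes 8-bit words
sent most significant bit first. The two mode bits follow the usual CPOL/CPHA
convention, where mode-0 has both bits clear and mode-3 has both set.

```c
#include <stdint.h>

/* Assumed convention: CPHA picks the sampling edge, CPOL the idle clock level. */
#define SKETCH_CPHA 0x01
#define SKETCH_CPOL 0x02

/* Hypothetical model of the two shift registers on one SPI link. */
struct sketch_shift_pair {
	uint8_t master;
	uint8_t slave;
};

/* Mode number (0..3) built from the two clocking bits. */
static unsigned sketch_mode(int cpol, int cpha)
{
	return (cpol ? SKETCH_CPOL : 0) | (cpha ? SKETCH_CPHA : 0);
}

/* One clock cycle: each side drives its top bit and shifts in the other's. */
static void sketch_clock_one_bit(struct sketch_shift_pair *p)
{
	uint8_t mosi = (p->master >> 7) & 1;	/* master out, slave in */
	uint8_t miso = (p->slave >> 7) & 1;	/* master in, slave out */

	p->master = (uint8_t)((p->master << 1) | miso);
	p->slave = (uint8_t)((p->slave << 1) | mosi);
}

/* Eight clock cycles exchange one byte in each direction. */
static void sketch_exchange_byte(struct sketch_shift_pair *p)
{
	int i;

	for (i = 0; i < 8; i++)
		sketch_clock_one_bit(p);
}
```

After eight cycles the two registers have swapped contents, which is why
an SPI "read" always clocks something out and a "write" always clocks
something in, even when one side ignores what it receives.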
17 | |||
18 | SPI masters may use a "chip select" line to activate a given SPI slave | ||
19 | device, so those three signal wires may be connected to several chips | ||
20 | in parallel. All SPI slaves support chipselects. Some devices have | ||
21 | other signals, often including an interrupt to the master. | ||
22 | |||
23 | Unlike serial busses like USB or SMBUS, even low level protocols for | ||
24 | SPI slave functions are usually not interoperable between vendors | ||
25 | (except for cases like SPI memory chips). | ||
26 | |||
27 | - SPI may be used for request/response style device protocols, as with | ||
28 | touchscreen sensors and memory chips. | ||
29 | |||
30 | - It may also be used to stream data in either direction (half duplex), | ||
31 | or both of them at the same time (full duplex). | ||
32 | |||
33 | - Some devices may use eight bit words. Others may use different word | ||
34 | lengths, such as streams of 12-bit or 20-bit digital samples. | ||
35 | |||
36 | In the same way, SPI slaves will only rarely support any kind of automatic | ||
37 | discovery/enumeration protocol. The tree of slave devices accessible from | ||
38 | a given SPI master will normally be set up manually, with configuration | ||
39 | tables. | ||
40 | |||
41 | SPI is only one of the names used by such four-wire protocols, and | ||
42 | most controllers have no problem handling "MicroWire" (think of it as | ||
43 | half-duplex SPI, for request/response protocols), SSP ("Synchronous | ||
44 | Serial Protocol"), PSP ("Programmable Serial Protocol"), and other | ||
45 | related protocols. | ||
46 | |||
47 | Microcontrollers often support both master and slave sides of the SPI | ||
48 | protocol. This document (and Linux) currently only supports the master | ||
49 | side of SPI interactions. | ||
50 | |||
51 | |||
52 | Who uses it? On what kinds of systems? | ||
53 | --------------------------------------- | ||
54 | Linux developers using SPI are probably writing device drivers for embedded | ||
55 | systems boards. SPI is used to control external chips, and it is also a | ||
56 | protocol supported by every MMC or SD memory card. (The older "DataFlash" | ||
57 | cards, predating MMC cards but using the same connectors and card shape, | ||
58 | support only SPI.) Some PC hardware uses SPI flash for BIOS code. | ||
59 | |||
60 | SPI slave chips range from digital/analog converters used for analog | ||
61 | sensors and codecs, to memory, to peripherals like USB controllers | ||
62 | or Ethernet adapters; and more. | ||
63 | |||
64 | Most systems using SPI will integrate a few devices on a mainboard. | ||
65 | Some provide SPI links on expansion connectors; in cases where no | ||
66 | dedicated SPI controller exists, GPIO pins can be used to create a | ||
67 | low speed "bitbanging" adapter. Very few systems will "hotplug" an SPI | ||
68 | controller; the reasons to use SPI focus on low cost and simple operation, | ||
69 | and if dynamic reconfiguration is important, USB will often be a more | ||
70 | appropriate low-pincount peripheral bus. | ||
71 | |||
72 | Many microcontrollers that can run Linux integrate one or more I/O | ||
73 | interfaces with SPI modes. Given SPI support, they could use MMC or SD | ||
74 | cards without needing a special purpose MMC/SD/SDIO controller. | ||
75 | |||
76 | |||
77 | How do these driver programming interfaces work? | ||
78 | ------------------------------------------------ | ||
79 | The <linux/spi/spi.h> header file includes kerneldoc, as does the | ||
80 | main source code, and you should certainly read that. This is just | ||
81 | an overview, so you get the big picture before the details. | ||
82 | |||
83 | SPI requests always go into I/O queues. Requests for a given SPI device | ||
84 | are always executed in FIFO order, and complete asynchronously through | ||
85 | completion callbacks. There are also some simple synchronous wrappers | ||
86 | for those calls, including ones for common transaction types like writing | ||
87 | a command and then reading its response. | ||
88 | |||
89 | There are two types of SPI driver, here called: | ||
90 | |||
91 | Controller drivers ... these are often built in to System-On-Chip | ||
92 | processors, and often support both Master and Slave roles. | ||
93 | These drivers touch hardware registers and may use DMA. | ||
94 | Or they can be PIO bitbangers, needing just GPIO pins. | ||
95 | |||
96 | Protocol drivers ... these pass messages through the controller | ||
97 | driver to communicate with a Slave or Master device on the | ||
98 | other side of an SPI link. | ||
99 | |||
100 | So for example one protocol driver might talk to the MTD layer to export | ||
101 | data to filesystems stored on SPI flash like DataFlash; and others might | ||
102 | control audio interfaces, present touchscreen sensors as input interfaces, | ||
103 | or monitor temperature and voltage levels during industrial processing. | ||
104 | And those might all be sharing the same controller driver. | ||
105 | |||
106 | A "struct spi_device" encapsulates the master-side interface between | ||
107 | those two types of driver. At this writing, Linux has no slave side | ||
108 | programming interface. | ||
109 | |||
110 | There is a minimal core of SPI programming interfaces, focussing on | ||
111 | using driver model to connect controller and protocol drivers using | ||
112 | device tables provided by board specific initialization code. SPI | ||
113 | shows up in sysfs in several locations: | ||
114 | |||
115 | /sys/devices/.../CTLR/spiB.C ... spi_device on bus "B", | ||
116 | chipselect C, accessed through CTLR. | ||
117 | |||
118 | /sys/devices/.../CTLR/spiB.C/modalias ... identifies the driver | ||
119 | that should be used with this device (for hotplug/coldplug) | ||
120 | |||
121 | /sys/bus/spi/devices/spiB.C ... symlink to the physical | ||
122 | spiB.C device | ||
123 | |||
124 | /sys/bus/spi/drivers/D ... driver for one or more spi*.* devices | ||
125 | |||
126 | /sys/class/spi_master/spiB ... class device for the controller | ||
127 | managing bus "B". All the spiB.* devices share the same | ||
128 | physical SPI bus segment, with SCLK, MOSI, and MISO. | ||
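As a rough illustration of the "spiB.C" naming used in the paths above, here
is a userspace sketch that builds such a name from a bus number and a
chipselect. The helper name is hypothetical and this is not how the SPI core
itself is written; it only shows the naming pattern.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format the "spiB.C" name for bus B, chipselect C.
 * Returns the number of characters written, as snprintf() does.
 */
static int sketch_spi_name(char *buf, size_t len,
			   unsigned bus_num, unsigned chip_select)
{
	return snprintf(buf, len, "spi%u.%u", bus_num, chip_select);
}
```

So a device on bus 1 using chipselect 0 would be named "spi1.0", and
would appear under /sys/bus/spi/devices/ with that name.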
129 | |||
130 | |||
131 | How does board-specific init code declare SPI devices? | ||
132 | ------------------------------------------------------ | ||
133 | Linux needs several kinds of information to properly configure SPI devices. | ||
134 | That information is normally provided by board-specific code, even for | ||
135 | chips that do support some form of automated discovery/enumeration. | ||
136 | |||
137 | DECLARE CONTROLLERS | ||
138 | |||
139 | The first kind of information is a list of what SPI controllers exist. | ||
140 | For System-on-Chip (SOC) based boards, these will usually be platform | ||
141 | devices, and the controller may need some platform_data in order to | ||
142 | operate properly. The "struct platform_device" will include resources | ||
143 | like the physical address of the controller's first register and its IRQ. | ||
144 | |||
145 | Platforms will often abstract the "register SPI controller" operation, | ||
146 | maybe coupling it with code to initialize pin configurations, so that | ||
147 | the arch/.../mach-*/board-*.c files for several boards can all share the | ||
148 | same basic controller setup code. This is because most SOCs have several | ||
149 | SPI-capable controllers, and only the ones actually usable on a given | ||
150 | board should normally be set up and registered. | ||
151 | |||
152 | So for example arch/.../mach-*/board-*.c files might have code like: | ||
153 | |||
154 | #include <asm/arch/spi.h> /* for mysoc_spi_data */ | ||
155 | |||
156 | /* if your mach-* infrastructure doesn't support kernels that can | ||
157 | * run on multiple boards, pdata wouldn't benefit from "__initdata". | ||
158 | */ | ||
159 | static struct mysoc_spi_data pdata __initdata = { ... }; | ||
160 | |||
161 | static void __init board_init(void) | ||
162 | { | ||
163 | ... | ||
164 | /* this board only uses SPI controller #2 */ | ||
165 | mysoc_register_spi(2, &pdata); | ||
166 | ... | ||
167 | } | ||
168 | |||
169 | And SOC-specific utility code might look something like: | ||
170 | |||
171 | #include <asm/arch/spi.h> | ||
172 | |||
173 | static struct platform_device spi2 = { ... }; | ||
174 | |||
175 | void mysoc_register_spi(unsigned n, struct mysoc_spi_data *pdata) | ||
176 | { | ||
177 | struct mysoc_spi_data *pdata2; | ||
178 | |||
179 | pdata2 = kmalloc(sizeof *pdata2, GFP_KERNEL); | ||
180 | *pdata2 = *pdata; | ||
181 | ... | ||
182 | if (n == 2) { | ||
183 | spi2.dev.platform_data = pdata2; | ||
184 | platform_device_register(&spi2); | ||
185 | |||
186 | /* also: set up pin modes so the spi2 signals are | ||
187 | * visible on the relevant pins ... bootloaders on | ||
188 | * production boards may already have done this, but | ||
189 | * developer boards will often need Linux to do it. | ||
190 | */ | ||
191 | } | ||
192 | ... | ||
193 | } | ||
194 | |||
195 | Notice how the platform_data for boards may be different, even if the | ||
196 | same SOC controller is used. For example, on one board SPI might use | ||
197 | an external clock, where another derives the SPI clock from current | ||
198 | settings of some master clock. | ||
199 | |||
200 | |||
201 | DECLARE SLAVE DEVICES | ||
202 | |||
203 | The second kind of information is a list of what SPI slave devices exist | ||
204 | on the target board, often with some board-specific data needed for the | ||
205 | driver to work correctly. | ||
206 | |||
207 | Normally your arch/.../mach-*/board-*.c files would provide a small table | ||
208 | listing the SPI devices on each board. (This would typically be only a | ||
209 | small handful.) That might look like: | ||
210 | |||
211 | static struct ads7846_platform_data ads_info = { | ||
212 | .vref_delay_usecs = 100, | ||
213 | .x_plate_ohms = 580, | ||
214 | .y_plate_ohms = 410, | ||
215 | }; | ||
216 | |||
217 | static struct spi_board_info spi_board_info[] __initdata = { | ||
218 | { | ||
219 | .modalias = "ads7846", | ||
220 | .platform_data = &ads_info, | ||
221 | .mode = SPI_MODE_0, | ||
222 | .irq = GPIO_IRQ(31), | ||
223 | .max_speed_hz = 120000 /* max sample rate at 3V */ * 16, | ||
224 | .bus_num = 1, | ||
225 | .chip_select = 0, | ||
226 | }, | ||
227 | }; | ||
228 | |||
229 | Again, notice how board-specific information is provided; each chip may need | ||
230 | several types. This example shows generic constraints like the fastest SPI | ||
231 | clock to allow (a function of board voltage in this case) or how an IRQ pin | ||
232 | is wired, plus chip-specific constraints like an important delay that's | ||
233 | changed by the capacitance at one pin. | ||
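The max_speed_hz expression in the table above is plain arithmetic. The
sketch below spells it out, on the assumption (not stated in the example)
that the factor of 16 is the number of clock cycles needed per sample. The
function name is hypothetical.

```c
/*
 * Hypothetical helper: fastest SPI clock needed to sustain a given
 * sample rate, assuming a fixed number of clock cycles per sample.
 */
static unsigned long sketch_max_speed_hz(unsigned long samples_per_sec,
					 unsigned clocks_per_sample)
{
	return samples_per_sec * clocks_per_sample;
}
```

With the numbers from the table, 120000 samples per second times 16
gives a clock limit of 1920000 Hz, or 1.92 MHz.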
234 | |||
235 | (There's also "controller_data", information that may be useful to the | ||
236 | controller driver. An example would be peripheral-specific DMA tuning | ||
237 | data or chipselect callbacks. This is stored in spi_device later.) | ||
238 | |||
239 | The board_info should provide enough information to let the system work | ||
240 | without the chip's driver being loaded. The most troublesome aspect of | ||
241 | that is likely the SPI_CS_HIGH bit in the spi_device.mode field, since | ||
242 | sharing a bus with a device that interprets chipselect "backwards" is | ||
243 | not possible. | ||
244 | |||
245 | Then your board initialization code would register that table with the SPI | ||
246 | infrastructure, so that it's available later when the SPI master controller | ||
247 | driver is registered: | ||
248 | |||
249 | spi_register_board_info(spi_board_info, ARRAY_SIZE(spi_board_info)); | ||
250 | |||
251 | Like with other static board-specific setup, you won't unregister those. | ||
252 | |||
253 | The widely used "card" style computers bundle memory, cpu, and little else | ||
254 | onto a card that's maybe just thirty square centimeters. On such systems, | ||
255 | your arch/.../mach-.../board-*.c file would primarily provide information | ||
256 | about the devices on the mainboard into which such a card is plugged. That | ||
257 | certainly includes SPI devices hooked up through the card connectors! | ||
258 | |||
259 | |||
260 | NON-STATIC CONFIGURATIONS | ||
261 | |||
262 | Developer boards often play by different rules than product boards, and one | ||
263 | example is the potential need to hotplug SPI devices and/or controllers. | ||
264 | |||
265 | For those cases you might need to use spi_busnum_to_master() to look | ||
266 | up the spi bus master, and will likely need spi_new_device() to provide the | ||
267 | board info based on the board that was hotplugged. Of course, you'd later | ||
268 | call at least spi_unregister_device() when that board is removed. | ||
269 | |||
270 | When Linux includes support for MMC/SD/SDIO/DataFlash cards through SPI, those | ||
271 | configurations will also be dynamic. Fortunately, those devices all support | ||
272 | basic device identification probes, so that support should hotplug normally. | ||
273 | |||
274 | |||
275 | How do I write an "SPI Protocol Driver"? | ||
276 | ---------------------------------------- | ||
277 | All SPI drivers are currently kernel drivers. A userspace driver API | ||
278 | would just be another kernel driver, probably offering some lowlevel | ||
279 | access through aio_read(), aio_write(), and ioctl() calls and using the | ||
280 | standard userspace sysfs mechanisms to bind to a given SPI device. | ||
281 | |||
282 | SPI protocol drivers somewhat resemble platform device drivers: | ||
283 | |||
284 | static struct spi_driver CHIP_driver = { | ||
285 | .driver = { | ||
286 | .name = "CHIP", | ||
287 | .bus = &spi_bus_type, | ||
288 | .owner = THIS_MODULE, | ||
289 | }, | ||
290 | |||
291 | .probe = CHIP_probe, | ||
292 | .remove = __devexit_p(CHIP_remove), | ||
293 | .suspend = CHIP_suspend, | ||
294 | .resume = CHIP_resume, | ||
295 | }; | ||
296 | |||
297 | The driver core will automatically attempt to bind this driver to any SPI | ||
298 | device whose board_info gave a modalias of "CHIP". Your probe() code | ||
299 | might look like this unless you're creating a class_device: | ||
300 | |||
301 | static int __devinit CHIP_probe(struct spi_device *spi) | ||
302 | { | ||
303 | struct CHIP *chip; | ||
304 | struct CHIP_platform_data *pdata; | ||
305 | |||
306 | /* assuming the driver requires board-specific data: */ | ||
307 | pdata = spi->dev.platform_data; | ||
308 | if (!pdata) | ||
309 | return -ENODEV; | ||
310 | |||
311 | /* get memory for driver's per-chip state */ | ||
312 | chip = kzalloc(sizeof *chip, GFP_KERNEL); | ||
313 | if (!chip) | ||
314 | return -ENOMEM; | ||
315 | dev_set_drvdata(&spi->dev, chip); | ||
316 | |||
317 | ... etc | ||
318 | return 0; | ||
319 | } | ||
320 | |||
321 | As soon as it enters probe(), the driver may issue I/O requests to | ||
322 | the SPI device using "struct spi_message". When remove() returns, | ||
323 | the driver guarantees that it won't submit any more such messages. | ||
324 | |||
325 | - An spi_message is a sequence of protocol operations, executed | ||
326 | as one atomic sequence. SPI driver controls include: | ||
327 | |||
328 | + when bidirectional reads and writes start ... by how its | ||
329 | sequence of spi_transfer requests is arranged; | ||
330 | |||
331 | + optionally defining short delays after transfers ... using | ||
332 | the spi_transfer.delay_usecs setting; | ||
333 | |||
334 | + whether the chipselect becomes inactive after a transfer and | ||
335 | any delay ... by using the spi_transfer.cs_change flag; | ||
336 | |||
337 | + hinting whether the next message is likely to go to this same | ||
338 | device ... using the spi_transfer.cs_change flag on the last | ||
339 | transfer in that atomic group, and potentially saving costs | ||
340 | for chip deselect and select operations. | ||
341 | |||
342 | - Follow standard kernel rules, and provide DMA-safe buffers in | ||
343 | your messages. That way controller drivers using DMA aren't forced | ||
344 | to make extra copies unless the hardware requires it (e.g. working | ||
345 | around hardware errata that force the use of bounce buffering). | ||
346 | |||
347 | If standard dma_map_single() handling of these buffers is inappropriate, | ||
348 | you can use spi_message.is_dma_mapped to tell the controller driver | ||
349 | that you've already provided the relevant DMA addresses. | ||
350 | |||
351 | - The basic I/O primitive is spi_async(). Async requests may be | ||
352 | issued in any context (irq handler, task, etc) and completion | ||
353 | is reported using a callback provided with the message. | ||
354 | After any detected error, the chip is deselected and processing | ||
355 | of that spi_message is aborted. | ||
356 | |||
357 | - There are also synchronous wrappers like spi_sync(), and wrappers | ||
358 | like spi_read(), spi_write(), and spi_write_then_read(). These | ||
359 | may be issued only in contexts that may sleep, and they're all | ||
360 | clean (and small, and "optional") layers over spi_async(). | ||
361 | |||
362 | - The spi_write_then_read() call, and convenience wrappers around | ||
363 | it, should only be used with small amounts of data where the | ||
364 | cost of an extra copy may be ignored. It's designed to support | ||
365 | common RPC-style requests, such as writing an eight bit command | ||
366 | and reading a sixteen bit response -- spi_w8r16() being one of its | ||
367 | wrappers, doing exactly that. | ||
368 | |||
369 | Some drivers may need to modify spi_device characteristics like the | ||
370 | transfer mode, wordsize, or clock rate. This is done with spi_setup(), | ||
371 | which would normally be called from probe() before the first I/O is | ||
372 | done to the device. | ||
373 | |||
374 | While "spi_device" would be the bottom boundary of the driver, the | ||
375 | upper boundaries might include sysfs (especially for sensor readings), | ||
376 | the input layer, ALSA, networking, MTD, the character device framework, | ||
377 | or other Linux subsystems. | ||
378 | |||
379 | Note that there are two types of memory your driver must manage as part | ||
380 | of interacting with SPI devices. | ||
381 | |||
382 | - I/O buffers use the usual Linux rules, and must be DMA-safe. | ||
383 | You'd normally allocate them from the heap or free page pool. | ||
384 | Don't use the stack, or anything that's declared "static". | ||
385 | |||
386 | - The spi_message and spi_transfer metadata used to glue those | ||
387 | I/O buffers into a group of protocol transactions. These can | ||
388 | be allocated anywhere it's convenient, including as part of | ||
389 | other allocate-once driver data structures. Zero-init these. | ||
390 | |||
391 | If you like, spi_message_alloc() and spi_message_free() convenience | ||
392 | routines are available to allocate and zero-initialize an spi_message | ||
393 | with several transfers. | ||
394 | |||
395 | |||
396 | How do I write an "SPI Master Controller Driver"? | ||
397 | ------------------------------------------------- | ||
398 | An SPI controller will probably be registered on the platform_bus; write | ||
399 | a driver to bind to the device, whichever bus is involved. | ||
400 | |||
401 | The main task of this type of driver is to provide an "spi_master". | ||
402 | Use spi_alloc_master() to allocate the master, and class_get_devdata() | ||
403 | to get the driver-private data allocated for that device. | ||
404 | |||
405 | struct spi_master *master; | ||
406 | struct CONTROLLER *c; | ||
407 | |||
408 | master = spi_alloc_master(dev, sizeof *c); | ||
409 | if (!master) | ||
410 | return -ENODEV; | ||
411 | |||
412 | c = class_get_devdata(&master->cdev); | ||
413 | |||
414 | The driver will initialize the fields of that spi_master, including the | ||
415 | bus number (maybe the same as the platform device ID) and three methods | ||
416 | used to interact with the SPI core and SPI protocol drivers. It will | ||
417 | also initialize its own internal state. | ||
418 | |||
419 | master->setup(struct spi_device *spi) | ||
420 | This sets up the device clock rate, SPI mode, and word sizes. | ||
421 | Drivers may change the defaults provided by board_info, and then | ||
422 | call spi_setup(spi) to invoke this routine. It may sleep. | ||
423 | |||
424 | master->transfer(struct spi_device *spi, struct spi_message *message) | ||
425 | This must not sleep. Its responsibility is to arrange that the | ||
426 | transfer happens and its complete() callback is issued; the two | ||
427 | will normally happen later, after other transfers complete. | ||
428 | |||
429 | master->cleanup(struct spi_device *spi) | ||
430 | Your controller driver may use spi_device.controller_state to hold | ||
431 | state it dynamically associates with that device. If you do that, | ||
432 | be sure to provide the cleanup() method to free that state. | ||
433 | |||
434 | The bulk of the driver will be managing the I/O queue fed by transfer(). | ||
435 | |||
436 | That queue could be purely conceptual. For example, a driver used only | ||
437 | for low-frequency sensor access might be fine using synchronous PIO. | ||
438 | |||
439 | But the queue will probably be very real, using message->queue, PIO, | ||
440 | often DMA (especially if the root filesystem is in SPI flash), and | ||
441 | execution contexts like IRQ handlers, tasklets, or workqueues (such | ||
442 | as keventd). Your driver can be as fancy, or as simple, as you need. | ||
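The queueing behavior described above can be sketched in userspace as a
simple linked FIFO. This is only an illustration under stated assumptions:
every "sketch_" name is hypothetical, there is no locking, and the real
spi_message and spi_master structures look different. It shows the split
between a transfer() step that only enqueues, and so never sleeps, and a
later step that does the I/O and issues each completion callback in FIFO
order.

```c
#include <stddef.h>

/* Hypothetical stand-in for a queued message with a completion callback. */
struct sketch_message {
	void (*complete)(void *context);
	void *context;
	int status;			/* -1 while pending, 0 when done */
	struct sketch_message *next;
};

struct sketch_queue {
	struct sketch_message *head;
	struct sketch_message *tail;
};

/* transfer() analogue: link the message onto the tail and return at once. */
static int sketch_transfer(struct sketch_queue *q, struct sketch_message *m)
{
	m->next = NULL;
	m->status = -1;
	if (q->tail)
		q->tail->next = m;
	else
		q->head = m;
	q->tail = m;
	return 0;
}

/* Runs later (IRQ handler, tasklet, or workqueue in a real driver). */
static void sketch_pump(struct sketch_queue *q)
{
	while (q->head) {
		struct sketch_message *m = q->head;

		q->head = m->next;
		if (!q->head)
			q->tail = NULL;
		/* ... the actual PIO or DMA would happen here ... */
		m->status = 0;
		if (m->complete)
			m->complete(m->context);
	}
}

/* Example callback: records the order in which messages completed. */
static int sketch_seq;

static void sketch_done(void *context)
{
	*(int *)context = ++sketch_seq;
}
```

A caller would queue several messages with sketch_transfer() and see
none of the callbacks run until sketch_pump() drains the queue, at which
point they fire in the order the messages were submitted.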
443 | |||
444 | |||
445 | THANKS TO | ||
446 | --------- | ||
447 | Contributors to Linux-SPI discussions include (in alphabetical order, | ||
448 | by last name): | ||
449 | |||
450 | David Brownell | ||
451 | Russell King | ||
452 | Dmitry Pervushin | ||
453 | Stephen Street | ||
454 | Mark Underwood | ||
455 | Andrew Victor | ||
456 | Vitaly Wool | ||
457 | |||
diff --git a/Documentation/stable_kernel_rules.txt b/Documentation/stable_kernel_rules.txt index 2c81305090df..e409e5d07486 100644 --- a/Documentation/stable_kernel_rules.txt +++ b/Documentation/stable_kernel_rules.txt | |||
@@ -1,58 +1,56 @@ | |||
1 | Everything you ever wanted to know about Linux 2.6 -stable releases. | 1 | Everything you ever wanted to know about Linux 2.6 -stable releases. |
2 | 2 | ||
3 | Rules on what kind of patches are accepted, and what ones are not, into | 3 | Rules on what kind of patches are accepted, and which ones are not, into the |
4 | the "-stable" tree: | 4 | "-stable" tree: |
5 | 5 | ||
6 | - It must be obviously correct and tested. | 6 | - It must be obviously correct and tested. |
7 | - It can not bigger than 100 lines, with context. | 7 | - It can not be bigger than 100 lines, with context. |
8 | - It must fix only one thing. | 8 | - It must fix only one thing. |
9 | - It must fix a real bug that bothers people (not a, "This could be a | 9 | - It must fix a real bug that bothers people (not a, "This could be a |
10 | problem..." type thing.) | 10 | problem..." type thing). |
11 | - It must fix a problem that causes a build error (but not for things | 11 | - It must fix a problem that causes a build error (but not for things |
12 | marked CONFIG_BROKEN), an oops, a hang, data corruption, a real | 12 | marked CONFIG_BROKEN), an oops, a hang, data corruption, a real |
13 | security issue, or some "oh, that's not good" issue. In short, | 13 | security issue, or some "oh, that's not good" issue. In short, something |
14 | something critical. | 14 | critical. |
15 | - No "theoretical race condition" issues, unless an explanation of how | 15 | - No "theoretical race condition" issues, unless an explanation of how the |
16 | the race can be exploited. | 16 | race can be exploited is also provided. |
17 | - It can not contain any "trivial" fixes in it (spelling changes, | 17 | - It can not contain any "trivial" fixes in it (spelling changes, |
18 | whitespace cleanups, etc.) | 18 | whitespace cleanups, etc). |
19 | - It must be accepted by the relevant subsystem maintainer. | 19 | - It must be accepted by the relevant subsystem maintainer. |
20 | - It must follow Documentation/SubmittingPatches rules. | 20 | - It must follow the Documentation/SubmittingPatches rules. |
21 | 21 | ||
22 | 22 | ||
23 | Procedure for submitting patches to the -stable tree: | 23 | Procedure for submitting patches to the -stable tree: |
24 | 24 | ||
25 | - Send the patch, after verifying that it follows the above rules, to | 25 | - Send the patch, after verifying that it follows the above rules, to |
26 | stable@kernel.org. | 26 | stable@kernel.org. |
27 | - The sender will receive an ack when the patch has been accepted into | 27 | - The sender will receive an ACK when the patch has been accepted into the |
28 | the queue, or a nak if the patch is rejected. This response might | 28 | queue, or a NAK if the patch is rejected. This response might take a few |
29 | take a few days, according to the developers' schedules. | 29 | days, according to the developers' schedules. |
30 | - If accepted, the patch will be added to the -stable queue, for review | 30 | - If accepted, the patch will be added to the -stable queue, for review by |
31 | by other developers. | 31 | other developers. |
32 | - Security patches should not be sent to this alias, but instead to the | 32 | - Security patches should not be sent to this alias, but instead to the |
33 | documented security@kernel.org. | 33 | documented security@kernel.org address. |
34 | 34 | ||
35 | 35 | ||
36 | Review cycle: | 36 | Review cycle: |
37 | 37 | ||
38 | - When the -stable maintainers decide for a review cycle, the patches | 38 | - When the -stable maintainers decide for a review cycle, the patches will be |
39 | will be sent to the review committee, and the maintainer of the | 39 | sent to the review committee, and the maintainer of the affected area of |
40 | affected area of the patch (unless the submitter is the maintainer of | 40 | the patch (unless the submitter is the maintainer of the area) and CC: to |
41 | the area) and CC: to the linux-kernel mailing list. | 41 | the linux-kernel mailing list. |
42 | - The review committee has 48 hours in which to ack or nak the patch. | 42 | - The review committee has 48 hours in which to ACK or NAK the patch. |
43 | - If the patch is rejected by a member of the committee, or linux-kernel | 43 | - If the patch is rejected by a member of the committee, or linux-kernel |
44 | members object to the patch, bringing up issues that the maintainers | 44 | members object to the patch, bringing up issues that the maintainers and |
45 | and members did not realize, the patch will be dropped from the | 45 | members did not realize, the patch will be dropped from the queue. |
46 | queue. | 46 | - At the end of the review cycle, the ACKed patches will be added to the |
47 | - At the end of the review cycle, the acked patches will be added to | 47 | latest -stable release, and a new -stable release will happen. |
48 | the latest -stable release, and a new -stable release will happen. | 48 | - Security patches will be accepted into the -stable tree directly from the |
49 | - Security patches will be accepted into the -stable tree directly from | 49 | security kernel team, and not go through the normal review cycle. |
50 | the security kernel team, and not go through the normal review cycle. | ||
51 | Contact the kernel security team for more details on this procedure. | 50 | Contact the kernel security team for more details on this procedure. |
52 | 51 | ||
53 | 52 | ||
54 | Review committee: | 53 | Review committee: |
55 | 54 | ||
56 | - This will be made up of a number of kernel developers who have | 55 | - This is made up of a number of kernel developers who have volunteered for |
57 | volunteered for this task, and a few that haven't. | 56 | this task, and a few that haven't. |
58 | |||
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt index 2f1aae32a5d9..391dd64363e7 100644 --- a/Documentation/sysctl/vm.txt +++ b/Documentation/sysctl/vm.txt | |||
@@ -26,12 +26,14 @@ Currently, these files are in /proc/sys/vm: | |||
26 | - min_free_kbytes | 26 | - min_free_kbytes |
27 | - laptop_mode | 27 | - laptop_mode |
28 | - block_dump | 28 | - block_dump |
29 | - drop_caches | ||
30 | - zone_reclaim_mode | ||
29 | 31 | ||
30 | ============================================================== | 32 | ============================================================== |
31 | 33 | ||
32 | dirty_ratio, dirty_background_ratio, dirty_expire_centisecs, | 34 | dirty_ratio, dirty_background_ratio, dirty_expire_centisecs, |
33 | dirty_writeback_centisecs, vfs_cache_pressure, laptop_mode, | 35 | dirty_writeback_centisecs, vfs_cache_pressure, laptop_mode, |
34 | block_dump, swap_token_timeout: | 36 | block_dump, swap_token_timeout, drop_caches: |
35 | 37 | ||
36 | See Documentation/filesystems/proc.txt | 38 | See Documentation/filesystems/proc.txt |
37 | 39 | ||
@@ -102,3 +104,37 @@ This is used to force the Linux VM to keep a minimum number | |||
102 | of kilobytes free. The VM uses this number to compute a pages_min | 104 | of kilobytes free. The VM uses this number to compute a pages_min |
103 | value for each lowmem zone in the system. Each lowmem zone gets | 105 | value for each lowmem zone in the system. Each lowmem zone gets |
104 | a number of reserved free pages based proportionally on its size. | 106 | a number of reserved free pages based proportionally on its size. |
107 | |||
108 | ============================================================== | ||
109 | |||
110 | percpu_pagelist_fraction | ||
111 | |||
112 | This is the fraction of pages at most (high mark pcp->high) in each zone that | ||
113 | are allocated for each per cpu page list. The min value for this is 8. It | ||
114 | means that we don't allow more than 1/8th of pages in each zone to be | ||
115 | allocated in any single per_cpu_pagelist. This entry only changes the value | ||
116 | of hot per cpu pagelists. The user can specify a number like 100 to allocate | ||
117 | 1/100th of each zone to each per cpu page list. | ||
118 | |||
119 | The batch value of each per cpu pagelist is also updated as a result. It is | ||
120 | set to pcp->high/4. The upper limit of batch is (PAGE_SHIFT * 8). | ||
121 | |||
122 | The initial value is zero. The kernel does not use this value at boot time to set | ||
123 | the high water marks for each per cpu page list. | ||
124 | |||
125 | =============================================================== | ||
126 | |||
127 | zone_reclaim_mode: | ||
128 | |||
129 | This is set during bootup to 1 if it is determined that pages from | ||
130 | remote zones will cause a significant performance reduction. The | ||
131 | page allocator will then reclaim easily reusable pages (those page | ||
132 | cache pages that are currently not used) before going off node. | ||
133 | |||
134 | The user can override this setting. It may be beneficial to switch | ||
135 | off zone reclaim if the system is used for a file server and all | ||
136 | of memory should be used for caching files from disk. | ||
137 | |||
138 | It may be beneficial to switch this on if one wants to do zone | ||
139 | reclaim regardless of the numa distances in the system. | ||
140 | |||
diff --git a/Documentation/video4linux/CARDLIST.bttv b/Documentation/video4linux/CARDLIST.bttv index 330246ac80f8..b72706c58a44 100644 --- a/Documentation/video4linux/CARDLIST.bttv +++ b/Documentation/video4linux/CARDLIST.bttv | |||
@@ -141,3 +141,5 @@ | |||
141 | 140 -> Osprey 440 [0070:ff07] | 141 | 140 -> Osprey 440 [0070:ff07] |
142 | 141 -> Asound Skyeye PCTV | 142 | 141 -> Asound Skyeye PCTV |
143 | 142 -> Sabrent TV-FM (bttv version) | 143 | 142 -> Sabrent TV-FM (bttv version) |
144 | 143 -> Hauppauge ImpactVCB (bt878) [0070:13eb] | ||
145 | 144 -> MagicTV | ||
diff --git a/Documentation/video4linux/CARDLIST.cx88 b/Documentation/video4linux/CARDLIST.cx88 index a1017d1a85d4..56e194f1a0b0 100644 --- a/Documentation/video4linux/CARDLIST.cx88 +++ b/Documentation/video4linux/CARDLIST.cx88 | |||
@@ -16,10 +16,10 @@ | |||
16 | 15 -> DViCO FusionHDTV DVB-T1 [18ac:db00] | 16 | 15 -> DViCO FusionHDTV DVB-T1 [18ac:db00] |
17 | 16 -> KWorld LTV883RF | 17 | 16 -> KWorld LTV883RF |
18 | 17 -> DViCO FusionHDTV 3 Gold-Q [18ac:d810] | 18 | 17 -> DViCO FusionHDTV 3 Gold-Q [18ac:d810] |
19 | 18 -> Hauppauge Nova-T DVB-T [0070:9002] | 19 | 18 -> Hauppauge Nova-T DVB-T [0070:9002,0070:9001] |
20 | 19 -> Conexant DVB-T reference design [14f1:0187] | 20 | 19 -> Conexant DVB-T reference design [14f1:0187] |
21 | 20 -> Provideo PV259 [1540:2580] | 21 | 20 -> Provideo PV259 [1540:2580] |
22 | 21 -> DViCO FusionHDTV DVB-T Plus [18ac:db10] | 22 | 21 -> DViCO FusionHDTV DVB-T Plus [18ac:db10,18ac:db11] |
23 | 22 -> pcHDTV HD3000 HDTV [7063:3000] | 23 | 22 -> pcHDTV HD3000 HDTV [7063:3000] |
24 | 23 -> digitalnow DNTV Live! DVB-T [17de:a8a6] | 24 | 23 -> digitalnow DNTV Live! DVB-T [17de:a8a6] |
25 | 24 -> Hauppauge WinTV 28xxx (Roslyn) models [0070:2801] | 25 | 24 -> Hauppauge WinTV 28xxx (Roslyn) models [0070:2801] |
@@ -35,3 +35,11 @@ | |||
35 | 34 -> ATI HDTV Wonder [1002:a101] | 35 | 34 -> ATI HDTV Wonder [1002:a101] |
36 | 35 -> WinFast DTV1000-T [107d:665f] | 36 | 35 -> WinFast DTV1000-T [107d:665f] |
37 | 36 -> AVerTV 303 (M126) [1461:000a] | 37 | 36 -> AVerTV 303 (M126) [1461:000a] |
38 | 37 -> Hauppauge Nova-S-Plus DVB-S [0070:9201,0070:9202] | ||
39 | 38 -> Hauppauge Nova-SE2 DVB-S [0070:9200] | ||
40 | 39 -> KWorld DVB-S 100 [17de:08b2] | ||
41 | 40 -> Hauppauge WinTV-HVR1100 DVB-T/Hybrid [0070:9400,0070:9402] | ||
42 | 41 -> Hauppauge WinTV-HVR1100 DVB-T/Hybrid (Low Profile) [0070:9800,0070:9802] | ||
43 | 42 -> digitalnow DNTV Live! DVB-T Pro [1822:0025] | ||
44 | 43 -> KWorld/VStream XPert DVB-T with cx22702 [17de:08a1] | ||
45 | 44 -> DViCO FusionHDTV DVB-T Dual Digital [18ac:db50] | ||
diff --git a/Documentation/video4linux/CARDLIST.saa7134 b/Documentation/video4linux/CARDLIST.saa7134 index efb708ec116a..cb3a59bbeb17 100644 --- a/Documentation/video4linux/CARDLIST.saa7134 +++ b/Documentation/video4linux/CARDLIST.saa7134 | |||
@@ -56,7 +56,7 @@ | |||
56 | 55 -> LifeView FlyDVB-T DUO [5168:0502,5168:0306] | 56 | 55 -> LifeView FlyDVB-T DUO [5168:0502,5168:0306] |
57 | 56 -> Avermedia AVerTV 307 [1461:a70a] | 57 | 56 -> Avermedia AVerTV 307 [1461:a70a] |
58 | 57 -> Avermedia AVerTV GO 007 FM [1461:f31f] | 58 | 57 -> Avermedia AVerTV GO 007 FM [1461:f31f] |
59 | 58 -> ADS Tech Instant TV (saa7135) [1421:0350,1421:0370,1421:1370] | 59 | 58 -> ADS Tech Instant TV (saa7135) [1421:0350,1421:0351,1421:0370,1421:1370] |
60 | 59 -> Kworld/Tevion V-Stream Xpert TV PVR7134 | 60 | 59 -> Kworld/Tevion V-Stream Xpert TV PVR7134 |
61 | 60 -> Typhoon DVB-T Duo Digital/Analog Cardbus [4e42:0502] | 61 | 60 -> Typhoon DVB-T Duo Digital/Analog Cardbus [4e42:0502] |
62 | 61 -> Philips TOUGH DVB-T reference design [1131:2004] | 62 | 61 -> Philips TOUGH DVB-T reference design [1131:2004] |
@@ -81,4 +81,5 @@ | |||
81 | 80 -> ASUS Digimatrix TV [1043:0210] | 81 | 80 -> ASUS Digimatrix TV [1043:0210] |
82 | 81 -> Philips Tiger reference design [1131:2018] | 82 | 81 -> Philips Tiger reference design [1131:2018] |
83 | 82 -> MSI TV@Anywhere plus [1462:6231] | 83 | 82 -> MSI TV@Anywhere plus [1462:6231] |
84 | 84 | 83 -> Terratec Cinergy 250 PCI TV [153b:1160] | |
85 | 84 -> LifeView FlyDVB Trio [5168:0319] | ||
diff --git a/Documentation/video4linux/CARDLIST.tuner b/Documentation/video4linux/CARDLIST.tuner index 9d6544ea9f41..f6d0cf7b7922 100644 --- a/Documentation/video4linux/CARDLIST.tuner +++ b/Documentation/video4linux/CARDLIST.tuner | |||
@@ -40,7 +40,7 @@ tuner=38 - Philips PAL/SECAM multi (FM1216ME MK3) | |||
40 | tuner=39 - LG NTSC (newer TAPC series) | 40 | tuner=39 - LG NTSC (newer TAPC series) |
41 | tuner=40 - HITACHI V7-J180AT | 41 | tuner=40 - HITACHI V7-J180AT |
42 | tuner=41 - Philips PAL_MK (FI1216 MK) | 42 | tuner=41 - Philips PAL_MK (FI1216 MK) |
43 | tuner=42 - Philips 1236D ATSC/NTSC daul in | 43 | tuner=42 - Philips 1236D ATSC/NTSC dual in |
44 | tuner=43 - Philips NTSC MK3 (FM1236MK3 or FM1236/F) | 44 | tuner=43 - Philips NTSC MK3 (FM1236MK3 or FM1236/F) |
45 | tuner=44 - Philips 4 in 1 (ATI TV Wonder Pro/Conexant) | 45 | tuner=44 - Philips 4 in 1 (ATI TV Wonder Pro/Conexant) |
46 | tuner=45 - Microtune 4049 FM5 | 46 | tuner=45 - Microtune 4049 FM5 |
@@ -50,7 +50,7 @@ tuner=48 - Tenna TNF 8831 BGFF) | |||
50 | tuner=49 - Microtune 4042 FI5 ATSC/NTSC dual in | 50 | tuner=49 - Microtune 4042 FI5 ATSC/NTSC dual in |
51 | tuner=50 - TCL 2002N | 51 | tuner=50 - TCL 2002N |
52 | tuner=51 - Philips PAL/SECAM_D (FM 1256 I-H3) | 52 | tuner=51 - Philips PAL/SECAM_D (FM 1256 I-H3) |
53 | tuner=52 - Thomson DDT 7610 (ATSC/NTSC) | 53 | tuner=52 - Thomson DTT 7610 (ATSC/NTSC) |
54 | tuner=53 - Philips FQ1286 | 54 | tuner=53 - Philips FQ1286 |
55 | tuner=54 - tda8290+75 | 55 | tuner=54 - tda8290+75 |
56 | tuner=55 - TCL 2002MB | 56 | tuner=55 - TCL 2002MB |
@@ -58,7 +58,7 @@ tuner=56 - Philips PAL/SECAM multi (FQ1216AME MK4) | |||
58 | tuner=57 - Philips FQ1236A MK4 | 58 | tuner=57 - Philips FQ1236A MK4 |
59 | tuner=58 - Ymec TVision TVF-8531MF/8831MF/8731MF | 59 | tuner=58 - Ymec TVision TVF-8531MF/8831MF/8731MF |
60 | tuner=59 - Ymec TVision TVF-5533MF | 60 | tuner=59 - Ymec TVision TVF-5533MF |
61 | tuner=60 - Thomson DDT 7611 (ATSC/NTSC) | 61 | tuner=60 - Thomson DTT 761X (ATSC/NTSC) |
62 | tuner=61 - Tena TNF9533-D/IF/TNF9533-B/DF | 62 | tuner=61 - Tena TNF9533-D/IF/TNF9533-B/DF |
63 | tuner=62 - Philips TEA5767HN FM Radio | 63 | tuner=62 - Philips TEA5767HN FM Radio |
64 | tuner=63 - Philips FMD1216ME MK3 Hybrid Tuner | 64 | tuner=63 - Philips FMD1216ME MK3 Hybrid Tuner |
@@ -68,3 +68,4 @@ tuner=66 - LG NTSC (TALN mini series) | |||
68 | tuner=67 - Philips TD1316 Hybrid Tuner | 68 | tuner=67 - Philips TD1316 Hybrid Tuner |
69 | tuner=68 - Philips TUV1236D ATSC/NTSC dual in | 69 | tuner=68 - Philips TUV1236D ATSC/NTSC dual in |
70 | tuner=69 - Tena TNF 5335 MF | 70 | tuner=69 - Tena TNF 5335 MF |
71 | tuner=70 - Samsung TCPN 2121P30A | ||
diff --git a/Documentation/x86_64/boot-options.txt b/Documentation/x86_64/boot-options.txt index e566affeed7f..9c5fc15d03d1 100644 --- a/Documentation/x86_64/boot-options.txt +++ b/Documentation/x86_64/boot-options.txt | |||
@@ -125,7 +125,7 @@ SMP | |||
125 | cpumask=MASK only use cpus with bits set in mask | 125 | cpumask=MASK only use cpus with bits set in mask |
126 | 126 | ||
127 | additional_cpus=NUM Allow NUM more CPUs for hotplug | 127 | additional_cpus=NUM Allow NUM more CPUs for hotplug |
128 | (defaults are specified by the BIOS or half the available CPUs) | 128 | (defaults are specified by the BIOS, see Documentation/x86_64/cpu-hotplug-spec) |
129 | 129 | ||
130 | NUMA | 130 | NUMA |
131 | 131 | ||
@@ -198,6 +198,6 @@ Debugging | |||
198 | 198 | ||
199 | Misc | 199 | Misc |
200 | 200 | ||
201 | noreplacement Don't replace instructions with more appropiate ones | 201 | noreplacement Don't replace instructions with more appropriate ones |
202 | for the CPU. This may be useful on asymmetric MP systems | 202 | for the CPU. This may be useful on asymmetric MP systems |
203 | where some CPUs have fewer capabilities than the others. | 203 | where some CPUs have fewer capabilities than the others. |
diff --git a/Documentation/x86_64/cpu-hotplug-spec b/Documentation/x86_64/cpu-hotplug-spec new file mode 100644 index 000000000000..5c0fa345e556 --- /dev/null +++ b/Documentation/x86_64/cpu-hotplug-spec | |||
@@ -0,0 +1,21 @@ | |||
1 | Firmware support for CPU hotplug under Linux/x86-64 | ||
2 | --------------------------------------------------- | ||
3 | |||
4 | Linux/x86-64 supports CPU hotplug now. For various reasons Linux wants to | ||
5 | know in advance, at boot time, the maximum number of CPUs that could be plugged | ||
6 | into the system. ACPI 3.0 currently has no official way to supply | ||
7 | this information from the firmware to the operating system. | ||
8 | |||
9 | In ACPI each CPU needs an LAPIC object in the MADT table (5.2.11.5 in the | ||
10 | ACPI 3.0 specification). ACPI already has the concept of disabled LAPIC | ||
11 | objects by setting the Enabled bit in the LAPIC object to zero. | ||
12 | |||
13 | For CPU hotplug Linux/x86-64 expects now that any possible future hotpluggable | ||
14 | CPU is already available in the MADT. If the CPU is not available yet | ||
15 | it should have its LAPIC Enabled bit set to 0. Linux will use the number | ||
16 | of disabled LAPICs to compute the maximum number of future CPUs. | ||
17 | |||
18 | In the worst case the user can override this choice using a command line | ||
19 | option (additional_cpus=...), but it is recommended to supply the correct | ||
20 | number (or a reasonable approximation of it, erring towards more rather than less) | ||
21 | in the MADT to avoid manual configuration. | ||