Diffstat (limited to 'Documentation')
-rw-r--r--  Documentation/Changes                                   31
-rw-r--r--  Documentation/CodingStyle                               41
-rw-r--r--  Documentation/RCU/rcuref.txt                            87
-rw-r--r--  Documentation/SubmittingDrivers                         24
-rw-r--r--  Documentation/SubmittingPatches                         63
-rw-r--r--  Documentation/applying-patches.txt                      29
-rw-r--r--  Documentation/block/stat.txt                            82
-rw-r--r--  Documentation/cpu-hotplug.txt                          357
-rw-r--r--  Documentation/cpusets.txt                              161
-rw-r--r--  Documentation/fb/cyblafb/bugs                            1
-rw-r--r--  Documentation/fb/cyblafb/fb.modes                       57
-rw-r--r--  Documentation/fb/cyblafb/performance                     1
-rw-r--r--  Documentation/fb/cyblafb/todo                            5
-rw-r--r--  Documentation/fb/cyblafb/usage                          33
-rw-r--r--  Documentation/fb/cyblafb/whatsnew                       29
-rw-r--r--  Documentation/filesystems/ext3.txt                       5
-rw-r--r--  Documentation/filesystems/proc.txt                      17
-rw-r--r--  Documentation/filesystems/ramfs-rootfs-initramfs.txt    72
-rw-r--r--  Documentation/filesystems/relayfs.txt                  126
-rw-r--r--  Documentation/keys-request-key.txt                      22
-rw-r--r--  Documentation/keys.txt                                  43
-rw-r--r--  Documentation/networking/bonding.txt                     2
-rw-r--r--  Documentation/sysctl/vm.txt                             20
23 files changed, 1085 insertions, 223 deletions
diff --git a/Documentation/Changes b/Documentation/Changes
index 86b86399d61d..fe5ae0f55020 100644
--- a/Documentation/Changes
+++ b/Documentation/Changes
@@ -31,8 +31,6 @@ al español de este documento en varios formatos.
31Eine deutsche Version dieser Datei finden Sie unter 31Eine deutsche Version dieser Datei finden Sie unter
32<http://www.stefan-winter.de/Changes-2.4.0.txt>. 32<http://www.stefan-winter.de/Changes-2.4.0.txt>.
33 33
34Last updated: October 29th, 2002
35
36Chris Ricker (kaboom@gatech.edu or chris.ricker@genetics.utah.edu). 34Chris Ricker (kaboom@gatech.edu or chris.ricker@genetics.utah.edu).
37 35
38Current Minimal Requirements 36Current Minimal Requirements
@@ -48,7 +46,7 @@ necessary on all systems; obviously, if you don't have any ISDN
48hardware, for example, you probably needn't concern yourself with 46hardware, for example, you probably needn't concern yourself with
49isdn4k-utils. 47isdn4k-utils.
50 48
51o Gnu C 2.95.3 # gcc --version 49o Gnu C 3.2 # gcc --version
52o Gnu make 3.79.1 # make --version 50o Gnu make 3.79.1 # make --version
53o binutils 2.12 # ld -v 51o binutils 2.12 # ld -v
54o util-linux 2.10o # fdformat --version 52o util-linux 2.10o # fdformat --version
@@ -74,26 +72,7 @@ GCC
74--- 72---
75 73
76The gcc version requirements may vary depending on the type of CPU in your 74The gcc version requirements may vary depending on the type of CPU in your
77computer. The next paragraph applies to users of x86 CPUs, but not 75computer.
78necessarily to users of other CPUs. Users of other CPUs should obtain
79information about their gcc version requirements from another source.
80
81The recommended compiler for the kernel is gcc 2.95.x (x >= 3), and it
82should be used when you need absolute stability. You may use gcc 3.0.x
83instead if you wish, although it may cause problems. Later versions of gcc
84have not received much testing for Linux kernel compilation, and there are
85almost certainly bugs (mainly, but not exclusively, in the kernel) that
86will need to be fixed in order to use these compilers. In any case, using
87pgcc instead of plain gcc is just asking for trouble.
88
89The Red Hat gcc 2.96 compiler subtree can also be used to build this tree.
90You should ensure you use gcc-2.96-74 or later. gcc-2.96-54 will not build
91the kernel correctly.
92
93In addition, please pay attention to compiler optimization. Anything
94greater than -O2 may not be wise. Similarly, if you choose to use gcc-2.95.x
95or derivatives, be sure not to use -fstrict-aliasing (which, depending on
96your version of gcc 2.95.x, may necessitate using -fno-strict-aliasing).
97 76
98Make 77Make
99---- 78----
@@ -322,9 +301,9 @@ Getting updated software
322Kernel compilation 301Kernel compilation
323****************** 302******************
324 303
325gcc 2.95.3 304gcc
326---------- 305---
327o <ftp://ftp.gnu.org/gnu/gcc/gcc-2.95.3.tar.gz> 306o <ftp://ftp.gnu.org/gnu/gcc/>
328 307
329Make 308Make
330---- 309----
diff --git a/Documentation/CodingStyle b/Documentation/CodingStyle
index eb7db3c19227..ce780ef648f1 100644
--- a/Documentation/CodingStyle
+++ b/Documentation/CodingStyle
@@ -344,7 +344,7 @@ Remember: if another thread can find your data structure, and you don't
344have a reference count on it, you almost certainly have a bug. 344have a reference count on it, you almost certainly have a bug.
345 345
346 346
347 Chapter 11: Macros, Enums, Inline functions and RTL 347 Chapter 11: Macros, Enums and RTL
348 348
349Names of macros defining constants and labels in enums are capitalized. 349Names of macros defining constants and labels in enums are capitalized.
350 350
@@ -429,7 +429,35 @@ from void pointer to any other pointer type is guaranteed by the C programming
429language. 429language.
430 430
431 431
432 Chapter 14: References 432 Chapter 14: The inline disease
433
434There appears to be a common misperception that gcc has a magic "make me
435faster" speedup option called "inline". While the use of inlines can be
436appropriate (for example as a means of replacing macros, see Chapter 11), it
437very often is not. Abundant use of the inline keyword leads to a much bigger
438kernel, which in turn slows the system as a whole down, due to a bigger
439icache footprint for the CPU and simply because there is less memory
440available for the pagecache. Just think about it; a pagecache miss causes a
441disk seek, which easily takes 5 milliseconds. There are a LOT of cpu cycles
442that can go into these 5 milliseconds.
443
444A reasonable rule of thumb is to not put inline at functions that have more
445than 3 lines of code in them. An exception to this rule are the cases where
446a parameter is known to be a compiletime constant, and as a result of this
447constantness you *know* the compiler will be able to optimize most of your
448function away at compile time. For a good example of this latter case, see
449the kmalloc() inline function.
450
451Often people argue that adding inline to functions that are static and used
452only once is always a win since there is no space tradeoff. While this is
453technically correct, gcc is capable of inlining these automatically without
454help, and the maintenance issue of removing the inline when a second user
455appears outweighs the potential value of the hint that tells gcc to do
456something it would have done anyway.
457
458
459
460 Chapter 15: References
433 461
434The C Programming Language, Second Edition 462The C Programming Language, Second Edition
435by Brian W. Kernighan and Dennis M. Ritchie. 463by Brian W. Kernighan and Dennis M. Ritchie.
@@ -444,10 +472,13 @@ ISBN 0-201-61586-X.
444URL: http://cm.bell-labs.com/cm/cs/tpop/ 472URL: http://cm.bell-labs.com/cm/cs/tpop/
445 473
446GNU manuals - where in compliance with K&R and this text - for cpp, gcc, 474GNU manuals - where in compliance with K&R and this text - for cpp, gcc,
447gcc internals and indent, all available from http://www.gnu.org 475gcc internals and indent, all available from http://www.gnu.org/manual/
448 476
449WG14 is the international standardization working group for the programming 477WG14 is the international standardization working group for the programming
450language C, URL: http://std.dkuug.dk/JTC1/SC22/WG14/ 478language C, URL: http://www.open-std.org/JTC1/SC22/WG14/
479
480Kernel CodingStyle, by greg@kroah.com at OLS 2002:
481http://www.kroah.com/linux/talks/ols_2002_kernel_codingstyle_talk/html/
451 482
452-- 483--
453Last updated on 16 February 2004 by a community effort on LKML. 484Last updated on 30 December 2005 by a community effort on LKML.
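
The constant-parameter exception described in the new "inline disease" chapter
of CodingStyle can be illustrated with a small userspace sketch. The size
classes and the function names (size_class, class_for_64, class_for) are
hypothetical and chosen only for illustration; this is not how kmalloc() is
implemented. When the argument is a compile-time constant, an optimizing
compiler is able to reduce the whole comparison chain to a single constant;
when the argument is only known at run time, the full chain remains.

```c
#include <stddef.h>

/*
 * Hypothetical size-class lookup, for illustration only.
 * Returns the index of the smallest class that fits, or -1 if none does.
 */
static inline int size_class(size_t size)
{
	if (size <= 32)
		return 0;
	if (size <= 64)
		return 1;
	if (size <= 128)
		return 2;
	return -1;	/* no matching class */
}

/*
 * Constant argument: with optimization enabled, the compiler can
 * evaluate the comparisons at compile time and return the result directly.
 */
int class_for_64(void)
{
	return size_class(64);
}

/*
 * Run-time argument: nothing can be folded away, so inlining
 * copies the whole comparison chain into every caller.
 */
int class_for(size_t size)
{
	return size_class(size);
}
```

Calling class_for_64() yields the class index for a 64-byte request, while
class_for(n) performs the same lookup for a value of n that is only known at
run time. Whether the constant case really collapses depends on the compiler
and the optimization level, so it is worth checking the generated code before
relying on it.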
diff --git a/Documentation/RCU/rcuref.txt b/Documentation/RCU/rcuref.txt
index a23fee66064d..3f60db41b2f0 100644
--- a/Documentation/RCU/rcuref.txt
+++ b/Documentation/RCU/rcuref.txt
@@ -1,74 +1,67 @@
1Refcounter framework for elements of lists/arrays protected by 1Refcounter design for elements of lists/arrays protected by RCU.
2RCU.
3 2
4Refcounting on elements of lists which are protected by traditional 3Refcounting on elements of lists which are protected by traditional
5reader/writer spinlocks or semaphores are straight forward as in: 4reader/writer spinlocks or semaphores are straight forward as in:
6 5
71. 2. 61. 2.
8add() search_and_reference() 7add() search_and_reference()
9{ { 8{ {
10 alloc_object read_lock(&list_lock); 9 alloc_object read_lock(&list_lock);
11 ... search_for_element 10 ... search_for_element
12 atomic_set(&el->rc, 1); atomic_inc(&el->rc); 11 atomic_set(&el->rc, 1); atomic_inc(&el->rc);
13 write_lock(&list_lock); ... 12 write_lock(&list_lock); ...
14 add_element read_unlock(&list_lock); 13 add_element read_unlock(&list_lock);
15 ... ... 14 ... ...
16 write_unlock(&list_lock); } 15 write_unlock(&list_lock); }
17} 16}
18 17
193. 4. 183. 4.
20release_referenced() delete() 19release_referenced() delete()
21{ { 20{ {
22 ... write_lock(&list_lock); 21 ... write_lock(&list_lock);
23 atomic_dec(&el->rc, relfunc) ... 22 atomic_dec(&el->rc, relfunc) ...
24 ... delete_element 23 ... delete_element
25} write_unlock(&list_lock); 24} write_unlock(&list_lock);
26 ... 25 ...
27 if (atomic_dec_and_test(&el->rc)) 26 if (atomic_dec_and_test(&el->rc))
28 kfree(el); 27 kfree(el);
29 ... 28 ...
30 } 29 }
31 30
32If this list/array is made lock free using rcu as in changing the 31If this list/array is made lock free using rcu as in changing the
33write_lock in add() and delete() to spin_lock and changing read_lock 32write_lock in add() and delete() to spin_lock and changing read_lock
34in search_and_reference to rcu_read_lock(), the rcuref_get in 33in search_and_reference to rcu_read_lock(), the atomic_get in
35search_and_reference could potentially hold reference to an element which 34search_and_reference could potentially hold reference to an element which
36has already been deleted from the list/array. rcuref_lf_get_rcu takes 35has already been deleted from the list/array. atomic_inc_not_zero takes
37care of this scenario. search_and_reference should look as; 36care of this scenario. search_and_reference should look as;
38 37
391. 2. 381. 2.
40add() search_and_reference() 39add() search_and_reference()
41{ { 40{ {
42 alloc_object rcu_read_lock(); 41 alloc_object rcu_read_lock();
43 ... search_for_element 42 ... search_for_element
44 atomic_set(&el->rc, 1); if (rcuref_inc_lf(&el->rc)) { 43 atomic_set(&el->rc, 1); if (atomic_inc_not_zero(&el->rc)) {
45 write_lock(&list_lock); rcu_read_unlock(); 44 write_lock(&list_lock); rcu_read_unlock();
46 return FAIL; 45 return FAIL;
47 add_element } 46 add_element }
48 ... ... 47 ... ...
49 write_unlock(&list_lock); rcu_read_unlock(); 48 write_unlock(&list_lock); rcu_read_unlock();
50} } 49} }
513. 4. 503. 4.
52release_referenced() delete() 51release_referenced() delete()
53{ { 52{ {
54 ... write_lock(&list_lock); 53 ... write_lock(&list_lock);
55 rcuref_dec(&el->rc, relfunc) ... 54 atomic_dec(&el->rc, relfunc) ...
56 ... delete_element 55 ... delete_element
57} write_unlock(&list_lock); 56} write_unlock(&list_lock);
58 ... 57 ...
59 if (rcuref_dec_and_test(&el->rc)) 58 if (atomic_dec_and_test(&el->rc))
60 call_rcu(&el->head, el_free); 59 call_rcu(&el->head, el_free);
61 ... 60 ...
62 } 61 }
63 62
64Sometimes, reference to the element need to be obtained in the 63Sometimes, reference to the element need to be obtained in the
65update (write) stream. In such cases, rcuref_inc_lf might be an overkill 64update (write) stream. In such cases, atomic_inc_not_zero might be an
66since the spinlock serialising list updates are held. rcuref_inc 65overkill since the spinlock serialising list updates are held. atomic_inc
67is to be used in such cases. 66is to be used in such cases.
68For arches which do not have cmpxchg rcuref_inc_lf 67
69api uses a hashed spinlock implementation and the same hashed spinlock
70is acquired in all rcuref_xxx primitives to preserve atomicity.
71Note: Use rcuref_inc api only if you need to use rcuref_inc_lf on the
72refcounter atleast at one place. Mixing rcuref_inc and atomic_xxx api
73might lead to races. rcuref_inc_lf() must be used in lockfree
74RCU critical sections only.
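
The increment-unless-zero idea behind atomic_inc_not_zero in rcuref.txt can be
sketched in userspace with C11 atomics. This is a minimal sketch under stated
assumptions, not the kernel implementation: the names refcount_inc_not_zero
and refcount_dec_and_test are hypothetical stand-ins, and plain C11 atomic_int
replaces the kernel's atomic_t. The sketch returns true when a reference was
taken, so a lock-free reader would give up on the element when it returns
false.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace stand-in for atomic_inc_not_zero().
 * Takes a reference only if the count has not already dropped to zero.
 * Returns true if the reference was taken, false otherwise.
 */
bool refcount_inc_not_zero(atomic_int *rc)
{
	int old = atomic_load(rc);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(rc, &old, old + 1))
			return true;
	}
	return false;	/* element is already on its way to being freed */
}

/*
 * Hypothetical stand-in for atomic_dec_and_test().
 * Returns true if the caller dropped the last reference.
 */
bool refcount_dec_and_test(atomic_int *rc)
{
	return atomic_fetch_sub(rc, 1) == 1;
}
```

A reader that finds an element would call refcount_inc_not_zero() on its
counter and use the element only on success. The updater that deletes the
element calls refcount_dec_and_test() and, in the kernel setting described
in rcuref.txt, defers the actual free with call_rcu().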
diff --git a/Documentation/SubmittingDrivers b/Documentation/SubmittingDrivers
index c3cca924e94b..dd311cff1cc3 100644
--- a/Documentation/SubmittingDrivers
+++ b/Documentation/SubmittingDrivers
@@ -27,18 +27,17 @@ Who To Submit Drivers To
27------------------------ 27------------------------
28 28
29Linux 2.0: 29Linux 2.0:
30 No new drivers are accepted for this kernel tree 30 No new drivers are accepted for this kernel tree.
31 31
32Linux 2.2: 32Linux 2.2:
33 No new drivers are accepted for this kernel tree.
34
35Linux 2.4:
33 If the code area has a general maintainer then please submit it to 36 If the code area has a general maintainer then please submit it to
34 the maintainer listed in MAINTAINERS in the kernel file. If the 37 the maintainer listed in MAINTAINERS in the kernel file. If the
35 maintainer does not respond or you cannot find the appropriate 38 maintainer does not respond or you cannot find the appropriate
36 maintainer then please contact the 2.2 kernel maintainer: 39 maintainer then please contact Marcelo Tosatti
37 Marc-Christian Petersen <m.c.p@wolk-project.de>. 40 <marcelo.tosatti@cyclades.com>.
38
39Linux 2.4:
40 The same rules apply as 2.2. The final contact point for Linux 2.4
41 submissions is Marcelo Tosatti <marcelo.tosatti@cyclades.com>.
42 41
43Linux 2.6: 42Linux 2.6:
44 The same rules apply as 2.4 except that you should follow linux-kernel 43 The same rules apply as 2.4 except that you should follow linux-kernel
@@ -53,6 +52,7 @@ Licensing: The code must be released to us under the
53 of exclusive GPL licensing, and if you wish the driver 52 of exclusive GPL licensing, and if you wish the driver
54 to be useful to other communities such as BSD you may well 53 to be useful to other communities such as BSD you may well
55 wish to release under multiple licenses. 54 wish to release under multiple licenses.
55 See accepted licenses at include/linux/module.h
56 56
57Copyright: The copyright owner must agree to use of GPL. 57Copyright: The copyright owner must agree to use of GPL.
58 It's best if the submitter and copyright owner 58 It's best if the submitter and copyright owner
@@ -143,5 +143,13 @@ KernelNewbies:
143 http://kernelnewbies.org/ 143 http://kernelnewbies.org/
144 144
145Linux USB project: 145Linux USB project:
146 http://sourceforge.net/projects/linux-usb/ 146 http://linux-usb.sourceforge.net/
147
148How to NOT write kernel driver by arjanv@redhat.com
149 http://people.redhat.com/arjanv/olspaper.pdf
150
151Kernel Janitor:
152 http://janitor.kernelnewbies.org/
147 153
154--
155Last updated on 17 Nov 2005.
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 1d47e6c09dc6..6198e5ebcf65 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -78,7 +78,9 @@ Randy Dunlap's patch scripts:
78http://www.xenotime.net/linux/scripts/patching-scripts-002.tar.gz 78http://www.xenotime.net/linux/scripts/patching-scripts-002.tar.gz
79 79
80Andrew Morton's patch scripts: 80Andrew Morton's patch scripts:
81http://www.zip.com.au/~akpm/linux/patches/patch-scripts-0.20 81http://www.zip.com.au/~akpm/linux/patches/
82Instead of these scripts, quilt is the recommended patch management
83tool (see above).
82 84
83 85
84 86
@@ -97,7 +99,7 @@ need to split up your patch. See #3, next.
97 99
983) Separate your changes. 1003) Separate your changes.
99 101
100Separate each logical change into its own patch. 102Separate _logical changes_ into a single patch file.
101 103
102For example, if your changes include both bug fixes and performance 104For example, if your changes include both bug fixes and performance
103enhancements for a single driver, separate those changes into two 105enhancements for a single driver, separate those changes into two
@@ -112,6 +114,10 @@ If one patch depends on another patch in order for a change to be
112complete, that is OK. Simply note "this patch depends on patch X" 114complete, that is OK. Simply note "this patch depends on patch X"
113in your patch description. 115in your patch description.
114 116
117If you cannot condense your patch set into a smaller set of patches,
118then only post say 15 or so at a time and wait for review and integration.
119
120
115 121
1164) Select e-mail destination. 1224) Select e-mail destination.
117 123
@@ -124,6 +130,10 @@ your patch to the primary Linux kernel developer's mailing list,
124linux-kernel@vger.kernel.org. Most kernel developers monitor this 130linux-kernel@vger.kernel.org. Most kernel developers monitor this
125e-mail list, and can comment on your changes. 131e-mail list, and can comment on your changes.
126 132
133
134Do not send more than 15 patches at once to the vger mailing lists!!!
135
136
127Linus Torvalds is the final arbiter of all changes accepted into the 137Linus Torvalds is the final arbiter of all changes accepted into the
128Linux kernel. His e-mail address is <torvalds@osdl.org>. He gets 138Linux kernel. His e-mail address is <torvalds@osdl.org>. He gets
129a lot of e-mail, so typically you should do your best to -avoid- sending 139a lot of e-mail, so typically you should do your best to -avoid- sending
@@ -149,6 +159,9 @@ USB, framebuffer devices, the VFS, the SCSI subsystem, etc. See the
149MAINTAINERS file for a mailing list that relates specifically to 159MAINTAINERS file for a mailing list that relates specifically to
150your change. 160your change.
151 161
162Majordomo lists of VGER.KERNEL.ORG at:
163 <http://vger.kernel.org/vger-lists.html>
164
152If changes affect userland-kernel interfaces, please send 165If changes affect userland-kernel interfaces, please send
153the MAN-PAGES maintainer (as listed in the MAINTAINERS file) 166the MAN-PAGES maintainer (as listed in the MAINTAINERS file)
154a man-pages patch, or at least a notification of the change, 167a man-pages patch, or at least a notification of the change,
@@ -373,27 +386,14 @@ a diffstat, to show what files have changed, and the number of inserted
373and deleted lines per file. A diffstat is especially useful on bigger 386and deleted lines per file. A diffstat is especially useful on bigger
374patches. Other comments relevant only to the moment or the maintainer, 387patches. Other comments relevant only to the moment or the maintainer,
375not suitable for the permanent changelog, should also go here. 388not suitable for the permanent changelog, should also go here.
389Use diffstat options "-p 1 -w 70" so that filenames are listed from the
390top of the kernel source tree and don't use too much horizontal space
391(easily fit in 80 columns, maybe with some indentation).
376 392
377See more details on the proper patch format in the following 393See more details on the proper patch format in the following
378references. 394references.
379 395
380 396
38113) More references for submitting patches
382
383Andrew Morton, "The perfect patch" (tpp).
384 <http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt>
385
386Jeff Garzik, "Linux kernel patch submission format."
387 <http://linux.yyz.us/patch-format.html>
388
389Greg KH, "How to piss off a kernel subsystem maintainer"
390 <http://www.kroah.com/log/2005/03/31/>
391
392Kernel Documentation/CodingStyle
393 <http://sosdg.org/~coywolf/lxr/source/Documentation/CodingStyle>
394
395Linus Torvald's mail on the canonical patch format:
396 <http://lkml.org/lkml/2005/4/7/183>
397 397
398 398
399----------------------------------- 399-----------------------------------
@@ -466,3 +466,30 @@ and 'extern __inline__'.
466Don't try to anticipate nebulous future cases which may or may not 466Don't try to anticipate nebulous future cases which may or may not
467be useful: "Make it as simple as you can, and no simpler." 467be useful: "Make it as simple as you can, and no simpler."
468 468
469
470
471----------------------
472SECTION 3 - REFERENCES
473----------------------
474
475Andrew Morton, "The perfect patch" (tpp).
476 <http://www.zip.com.au/~akpm/linux/patches/stuff/tpp.txt>
477
478Jeff Garzik, "Linux kernel patch submission format."
479 <http://linux.yyz.us/patch-format.html>
480
481Greg Kroah, "How to piss off a kernel subsystem maintainer".
482 <http://www.kroah.com/log/2005/03/31/>
483 <http://www.kroah.com/log/2005/07/08/>
484 <http://www.kroah.com/log/2005/10/19/>
485
486NO!!!! No more huge patch bombs to linux-kernel@vger.kernel.org people!.
487 <http://marc.theaimsgroup.com/?l=linux-kernel&m=112112749912944&w=2>
488
489Kernel Documentation/CodingStyle
490 <http://sosdg.org/~coywolf/lxr/source/Documentation/CodingStyle>
491
492Linus Torvald's mail on the canonical patch format:
493 <http://lkml.org/lkml/2005/4/7/183>
494--
495Last updated on 17 Nov 2005.
diff --git a/Documentation/applying-patches.txt b/Documentation/applying-patches.txt
index 681e426e2482..05a08c2c1889 100644
--- a/Documentation/applying-patches.txt
+++ b/Documentation/applying-patches.txt
@@ -2,7 +2,8 @@
2 Applying Patches To The Linux Kernel 2 Applying Patches To The Linux Kernel
3 ------------------------------------ 3 ------------------------------------
4 4
5 (Written by Jesper Juhl, August 2005) 5 Original by: Jesper Juhl, August 2005
6 Last update: 2005-12-02
6 7
7 8
8 9
@@ -118,7 +119,7 @@ wrong.
118 119
119When patch encounters a change that it can't fix up with fuzz it rejects it 120When patch encounters a change that it can't fix up with fuzz it rejects it
120outright and leaves a file with a .rej extension (a reject file). You can 121outright and leaves a file with a .rej extension (a reject file). You can
121read this file to see exactely what change couldn't be applied, so you can 122read this file to see exactly what change couldn't be applied, so you can
122go fix it up by hand if you wish. 123go fix it up by hand if you wish.
123 124
124If you don't have any third party patches applied to your kernel source, but 125If you don't have any third party patches applied to your kernel source, but
@@ -127,7 +128,7 @@ and have made no modifications yourself to the source files, then you should
127never see a fuzz or reject message from patch. If you do see such messages 128never see a fuzz or reject message from patch. If you do see such messages
128anyway, then there's a high risk that either your local source tree or the 129anyway, then there's a high risk that either your local source tree or the
129patch file is corrupted in some way. In that case you should probably try 130patch file is corrupted in some way. In that case you should probably try
130redownloading the patch and if things are still not OK then you'd be advised 131re-downloading the patch and if things are still not OK then you'd be advised
131to start with a fresh tree downloaded in full from kernel.org. 132to start with a fresh tree downloaded in full from kernel.org.
132 133
133Let's look a bit more at some of the messages patch can produce. 134Let's look a bit more at some of the messages patch can produce.
@@ -180,9 +181,11 @@ wish to apply.
180 181
181Are there any alternatives to `patch'? 182Are there any alternatives to `patch'?
182--- 183---
183 Yes there are alternatives. You can use the `interdiff' program 184 Yes there are alternatives.
184(http://cyberelk.net/tim/patchutils/) to generate a patch representing the 185
185differences between two patches and then apply the result. 186 You can use the `interdiff' program (http://cyberelk.net/tim/patchutils/) to
187generate a patch representing the differences between two patches and then
188apply the result.
186This will let you move from something like 2.6.12.2 to 2.6.12.3 in a single 189This will let you move from something like 2.6.12.2 to 2.6.12.3 in a single
187step. The -z flag to interdiff will even let you feed it patches in gzip or 190step. The -z flag to interdiff will even let you feed it patches in gzip or
188bzip2 compressed form directly without the use of zcat or bzcat or manual 191bzip2 compressed form directly without the use of zcat or bzcat or manual
@@ -197,7 +200,7 @@ do the additional steps since interdiff can get things wrong in some cases.
197 Another alternative is `ketchup', which is a python script for automatic 200 Another alternative is `ketchup', which is a python script for automatic
198downloading and applying of patches (http://www.selenic.com/ketchup/). 201downloading and applying of patches (http://www.selenic.com/ketchup/).
199 202
200Other nice tools are diffstat which shows a summary of changes made by a 203 Other nice tools are diffstat which shows a summary of changes made by a
201patch, lsdiff which displays a short listing of affected files in a patch 204patch, lsdiff which displays a short listing of affected files in a patch
202file, along with (optionally) the line numbers of the start of each patch 205file, along with (optionally) the line numbers of the start of each patch
203and grepdiff which displays a list of the files modified by a patch where 206and grepdiff which displays a list of the files modified by a patch where
@@ -258,7 +261,7 @@ $ patch -p1 -R < ../patch-2.6.11.1 # revert the 2.6.11.1 patch
258 # source dir is now 2.6.11 261 # source dir is now 2.6.11
259$ patch -p1 < ../patch-2.6.12 # apply new 2.6.12 patch 262$ patch -p1 < ../patch-2.6.12 # apply new 2.6.12 patch
260$ cd .. 263$ cd ..
261$ mv linux-2.6.11.1 inux-2.6.12 # rename source dir 264$ mv linux-2.6.11.1 linux-2.6.12 # rename source dir
262 265
263 266
264The 2.6.x.y kernels 267The 2.6.x.y kernels
@@ -433,7 +436,11 @@ $ cd ..
433$ mv linux-2.6.12-mm1 linux-2.6.13-rc3-mm3 # rename the source dir 436$ mv linux-2.6.12-mm1 linux-2.6.13-rc3-mm3 # rename the source dir
434 437
435 438
436This concludes this list of explanations of the various kernel trees and I 439This concludes this list of explanations of the various kernel trees.
437hope you are now crystal clear on how to apply the various patches and help 440I hope you are now clear on how to apply the various patches and help testing
438testing the kernel. 441the kernel.
442
443Thank you's to Randy Dunlap, Rolf Eike Beer, Linus Torvalds, Bodo Eggert,
444Johannes Stezenbach, Grant Coady, Pavel Machek and others that I may have
445forgotten for their reviews and contributions to this document.
439 446
diff --git a/Documentation/block/stat.txt b/Documentation/block/stat.txt
new file mode 100644
index 000000000000..0dbc946de2ea
--- /dev/null
+++ b/Documentation/block/stat.txt
@@ -0,0 +1,82 @@
1Block layer statistics in /sys/block/<dev>/stat
2===============================================
3
4This file documents the contents of the /sys/block/<dev>/stat file.
5
6The stat file provides several statistics about the state of block
7device <dev>.
8
9Q. Why are there multiple statistics in a single file? Doesn't sysfs
10 normally contain a single value per file?
11A. By having a single file, the kernel can guarantee that the statistics
12 represent a consistent snapshot of the state of the device. If the
13 statistics were exported as multiple files containing one statistic
14 each, it would be impossible to guarantee that a set of readings
15 represent a single point in time.
16
17The stat file consists of a single line of text containing 11 decimal
18values separated by whitespace. The fields are summarized in the
19following table, and described in more detail below.
20
21Name            units         description
22----            -----         -----------
23read I/Os       requests      number of read I/Os processed
24read merges     requests      number of read I/Os merged with in-queue I/O
25read sectors    sectors       number of sectors read
26read ticks      milliseconds  total wait time for read requests
27write I/Os      requests      number of write I/Os processed
28write merges    requests      number of write I/Os merged with in-queue I/O
29write sectors   sectors       number of sectors written
30write ticks     milliseconds  total wait time for write requests
31in_flight       requests      number of I/Os currently in flight
32io_ticks        milliseconds  total time this block device has been active
33time_in_queue   milliseconds  total wait time for all requests
34
35read I/Os, write I/Os
36=====================
37
38These values increment when an I/O request completes.
39
40read merges, write merges
41=========================
42
43These values increment when an I/O request is merged with an
44already-queued I/O request.
45
46read sectors, write sectors
47===========================
48
49These values count the number of sectors read from or written to this
50block device. The "sectors" in question are the standard UNIX 512-byte
51sectors, not any device- or filesystem-specific block size. The
52counters are incremented when the I/O completes.
53
54read ticks, write ticks
55=======================
56
57These values count the number of milliseconds that I/O requests have
58waited on this block device. If there are multiple I/O requests waiting,
59these values will increase at a rate greater than 1000/second; for
60example, if 60 read requests wait for an average of 30 ms, the read_ticks
61field will increase by 60*30 = 1800.
62
63in_flight
64=========
65
66This value counts the number of I/O requests that have been issued to
67the device driver but have not yet completed. It does not include I/O
68requests that are in the queue but not yet issued to the device driver.
69
70io_ticks
71========
72
73This value counts the number of milliseconds during which the device has
74had I/O requests queued.
75
76time_in_queue
77=============
78
79This value counts the number of milliseconds that I/O requests have waited
80on this block device. If there are multiple I/O requests waiting, this
81value will increase as the product of the number of milliseconds times the
82number of requests waiting (see "read ticks" above for an example).
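
The single-line, 11-field layout described in block/stat.txt can be read with
one sscanf() call. The following is a minimal userspace sketch, not kernel
code: the struct blk_stat type and the helpers parse_blk_stat() and
blk_stat_bytes_read() are hypothetical names introduced for illustration, and
the field order is the one given in the table in stat.txt.

```c
#include <stdio.h>

/* Hypothetical container for the 11 values, in the documented order. */
struct blk_stat {
	unsigned long long read_ios, read_merges, read_sectors, read_ticks;
	unsigned long long write_ios, write_merges, write_sectors, write_ticks;
	unsigned long long in_flight, io_ticks, time_in_queue;
};

/*
 * Parse one line as read from /sys/block/<dev>/stat.
 * Returns 0 on success, -1 if the line holds fewer than 11 values.
 */
int parse_blk_stat(const char *line, struct blk_stat *st)
{
	int n = sscanf(line,
		       "%llu %llu %llu %llu %llu %llu %llu %llu %llu %llu %llu",
		       &st->read_ios, &st->read_merges,
		       &st->read_sectors, &st->read_ticks,
		       &st->write_ios, &st->write_merges,
		       &st->write_sectors, &st->write_ticks,
		       &st->in_flight, &st->io_ticks, &st->time_in_queue);

	return n == 11 ? 0 : -1;
}

/* Sectors are the standard 512-byte units, whatever the device block size. */
unsigned long long blk_stat_bytes_read(const struct blk_stat *st)
{
	return st->read_sectors * 512ULL;
}
```

Because all values arrive in one read, a caller can take two snapshots and
subtract them field by field to get rates over an interval. In the example
from the "read ticks" section, 60 reads waiting 30 ms each would show up as
a read ticks difference of 1800 between the two snapshots.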
diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt
new file mode 100644
index 000000000000..08c5d04f3086
--- /dev/null
+++ b/Documentation/cpu-hotplug.txt
@@ -0,0 +1,357 @@
1 CPU hotplug Support in Linux(tm) Kernel
2
3 Maintainers:
4 CPU Hotplug Core:
5 Rusty Russell <rusty@rustycorp.com.au>
6 Srivatsa Vaddagiri <vatsa@in.ibm.com>
7 i386:
8 Zwane Mwaikambo <zwane@arm.linux.org.uk>
9 ppc64:
10 Nathan Lynch <nathanl@austin.ibm.com>
11 Joel Schopp <jschopp@austin.ibm.com>
12 ia64/x86_64:
13 Ashok Raj <ashok.raj@intel.com>
14
15Authors: Ashok Raj <ashok.raj@intel.com>
16Lots of feedback: Nathan Lynch <nathanl@austin.ibm.com>,
17 Joel Schopp <jschopp@austin.ibm.com>
18
19Introduction
20
21Modern advances in system architectures have introduced advanced error
22reporting and correction capabilities in processors. CPU architectures permit
23partitioning support, where compute resources of a single CPU could be made
24available to virtual machine environments. There are a couple of OEMs that
25support NUMA hardware which is hot pluggable as well, where physical
26node insertion and removal require support for CPU hotplug.
27
28Such advances require CPUs available to a kernel to be removed either for
29provisioning reasons, or for RAS purposes to keep an offending CPU off
30system execution path. Hence the need for CPU hotplug support in the
31Linux kernel.
32
33A more novel use of CPU-hotplug support is its use today in suspend
34resume support for SMP. Dual-core and HT support makes even
35a laptop run SMP kernels which didn't support these methods. SMP support
36for suspend/resume is a work in progress.
37
38General Stuff about CPU Hotplug
39--------------------------------
40
41Command Line Switches
42---------------------
43maxcpus=n Restrict boot time cpus to n. Say if you have 4 cpus, using
44 maxcpus=2 will only boot 2. You can choose to bring the
45 other cpus online later; read the FAQs for more info.
46
47additional_cpus=n [x86_64 only] use this to limit hotpluggable cpus.
48 This option sets
49 cpu_possible_map = cpu_present_map + additional_cpus
50
51CPU maps and such
52-----------------
53[For more on cpumaps and the primitives to manipulate them, see
54include/linux/cpumask.h, which has more descriptive text.]
55
56cpu_possible_map: Bitmap of possible CPUs that can ever be available in the
57system. This is used to allocate some boot time memory for per_cpu variables
58that aren't designed to grow/shrink as CPUs are made available or removed.
59Once set during the boot time discovery phase, the map is static, i.e. no bits
60are added or removed at any time. Trimming it accurately for your system needs
61upfront can save some boot time memory. See below for how we use heuristics
62in the x86_64 case to keep this in check.
63
64cpu_online_map: Bitmap of all CPUs currently online. It is set in __cpu_up()
65after a cpu is available for kernel scheduling and ready to receive
66interrupts from devices. It is cleared when a cpu is brought down using
67__cpu_disable(), before which all OS services including interrupts are
68migrated to another target CPU.
69
70cpu_present_map: Bitmap of CPUs currently present in the system. Not all
71of them may be online. When physical hotplug is processed by the relevant
72subsystem (e.g. ACPI), the map can change: a bit is either added to or removed
73from the map depending on whether the event is hot-add or hot-remove. There are
74currently no locking rules. Typical usage is to init topology during boot,
75at which time hotplug is disabled.
76
77You really don't need to manipulate any of the system cpu maps. They should
78be read-only for most uses. When setting up per-cpu resources, almost always use
79cpu_possible_map/for_each_cpu() to iterate.
80
81Never use anything other than cpumask_t to represent a bitmap of CPUs.
82
83#include <linux/cpumask.h>
84
85for_each_cpu - Iterate over cpu_possible_map
86for_each_online_cpu - Iterate over cpu_online_map
87for_each_present_cpu - Iterate over cpu_present_map
88for_each_cpu_mask(x,mask) - Iterate over an arbitrary cpu mask.
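
The relationship between the three maps can be illustrated with a small
user-space model. This is only a sketch: toy_cpumask and the toy_* helpers
are hypothetical names invented for this example and are not the kernel's
cpumask_t API. It assumes at most 64 CPUs and models each map as one bit
per CPU.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy stand-in for cpumask_t: one bit per CPU, at most 64 CPUs (assumption). */
typedef uint64_t toy_cpumask;

#define TOY_NR_CPUS 64

/* Returns 1 if 'cpu' is set in 'mask', 0 otherwise. */
int toy_cpu_isset(int cpu, toy_cpumask mask)
{
	return (int)((mask >> cpu) & 1u);
}

/* Count the CPUs in a mask by testing every bit position. */
int toy_count_cpus(toy_cpumask mask)
{
	int cpu, n = 0;

	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (toy_cpu_isset(cpu, mask))
			n++;
	return n;
}

/* Returns 1 if every CPU in 'inner' is also in 'outer'. */
int toy_subset(toy_cpumask inner, toy_cpumask outer)
{
	return (inner & ~outer) == 0;
}

/* Print the CPU numbers contained in a mask, lowest first. */
void toy_print_mask(const char *name, toy_cpumask mask)
{
	int cpu;

	printf("%s:", name);
	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		if (toy_cpu_isset(cpu, mask))
			printf(" %d", cpu);
	printf("\n");
}
```

In this model, an 8-way capable system with 4 CPUs plugged in and CPU 2
offlined would have possible = 0xFF, present = 0x0F and online = 0x0B;
the online map is a subset of the present map, which in turn is a subset
of the possible map.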
89
90#include <linux/cpu.h>
91lock_cpu_hotplug() and unlock_cpu_hotplug():
92
93The above calls are used to inhibit cpu hotplug operations. While holding the
94cpucontrol mutex, cpu_online_map will not change. If you merely need to avoid
95cpus going away, you could also use preempt_disable() and preempt_enable()
96for those sections. Just remember the critical section cannot call any
97function that can sleep or schedule this process away. The preempt_disable()
98will work as long as stop_machine_run() is used to take a cpu down.
99
100CPU Hotplug - Frequently Asked Questions.
101
102Q: How do I enable my kernel to support CPU hotplug?
103A: When doing make defconfig, enable CPU hotplug support
104
105 "Processor type and Features" -> Support for Hotpluggable CPUs
106
107Make sure that you have CONFIG_HOTPLUG, and CONFIG_SMP turned on as well.
108
109You would need to enable CONFIG_HOTPLUG_CPU for SMP suspend/resume support
110as well.
111
112Q: What architectures support CPU hotplug?
113A: As of 2.6.14, the following architectures support CPU hotplug.
114
115i386 (Intel), ppc, ppc64, parisc, s390, ia64 and x86_64
116
117Q: How to test if hotplug is supported on the newly built kernel?
118A: You should now notice an entry in sysfs.
119
120Check if sysfs is mounted, using the "mount" command. You should notice
121an entry as shown below in the output.
122
123....
124none on /sys type sysfs (rw)
125....
126
127If this is not mounted, do the following.
128
129#mkdir /sys
130#mount -t sysfs sys /sys
131
132Now you should see entries for all present cpus. The following is an example
133from an 8-way system.
134
135#pwd
136#/sys/devices/system/cpu
137#ls -l
138total 0
139drwxr-xr-x 10 root root 0 Sep 19 07:44 .
140drwxr-xr-x 13 root root 0 Sep 19 07:45 ..
141drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu0
142drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu1
143drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu2
144drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu3
145drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu4
146drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu5
147drwxr-xr-x 3 root root 0 Sep 19 07:44 cpu6
148drwxr-xr-x 3 root root 0 Sep 19 07:48 cpu7
149
150Under each directory you would find an "online" file which is the control
151file to logically online/offline a processor.
152
153Q: Does hot-add/hot-remove refer to physical add/remove of cpus?
154A: The terms hot-add/remove are not used very consistently in the code.
155CONFIG_HOTPLUG_CPU enables logical online/offline capability in the kernel.
156To support physical addition/removal, one would need some BIOS hooks and
157the platform should have something like an attention button in PCI hotplug.
158CONFIG_ACPI_HOTPLUG_CPU enables ACPI support for physical add/remove of CPUs.
159
160Q: How do I logically offline a CPU?
161A: Do the following.
162
163#echo 0 > /sys/devices/system/cpu/cpuX/online
164
165Once the logical offline is successful, check
166
167#cat /proc/interrupts
168
169You should no longer see the CPU that you removed. Also, the online file will
170report the state as 0 when a cpu is offline and 1 when it is online.
171
172#To display the current cpu state.
173#cat /sys/devices/system/cpu/cpuX/online
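
A program can read the same control file instead of using cat. The
following is a minimal user-space sketch; read_cpu_online() is a
hypothetical helper written for this example, and the caller is assumed
to pass the full path of an "online" file such as the one shown above.

```c
#include <stdio.h>

/*
 * Read a CPU "online" control file.
 * Returns 1 if the file reports online, 0 if it reports offline, and
 * -1 if the file is missing, unreadable or holds an unexpected value.
 * A missing file may simply mean the CPU is not removable.
 */
int read_cpu_online(const char *path)
{
	FILE *f = fopen(path, "r");
	int state;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &state) != 1)
		state = -1;
	fclose(f);

	if (state != 0 && state != 1)
		return -1;
	return state;
}
```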
174
175Q: Why can't I remove CPU0 on some systems?
176A: Some architectures may have some special dependency on a certain CPU.
177
178For example, IA64 platforms have the ability to send platform interrupts to
179the OS, a.k.a. Corrected Platform Error Interrupts (CPEI). Current ACPI
180specifications do not provide a way to change the target CPU. Hence if the
181current ACPI version doesn't support such redirection, we make that CPU
182non-removable.
183
184In such cases you will also notice that the online file is missing under cpu0.
185
186Q: How do I find out if a particular CPU is not removable?
187A: Depending on the implementation, some architectures may show this by the
188absence of the "online" file. This is done if it can be determined ahead of
189time that this CPU cannot be removed.
190
191In some situations, this can be a run time check, i.e. if you try to remove the
192last CPU, this will not be permitted. You can find such failures by
193investigating the return value of the "echo" command.
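
The same check can be made from a program by looking at the error
returned when writing the control file. This is a sketch only:
write_cpu_online() is a hypothetical helper, and which error code the
kernel reports for a refused offline request is not specified here.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Write 0 (offline) or 1 (online) to a CPU "online" control file.
 * Returns 0 on success, or a negative errno value on failure.
 */
int write_cpu_online(const char *path, int online)
{
	FILE *f = fopen(path, "w");
	int err = 0;

	if (!f)
		return -errno;

	if (fprintf(f, "%d\n", online ? 1 : 0) < 0)
		err = -EIO;

	/*
	 * stdio buffers the data, so a write rejected by the kernel
	 * typically shows up when the stream is flushed by fclose().
	 */
	if (fclose(f) != 0 && err == 0)
		err = -errno;

	return err;
}
```

A caller would treat any non-zero return as "the request was refused",
which is the programmatic equivalent of a failing echo.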
194
195Q: What happens when a CPU is being logically offlined?
196A: The following happen, listed in no particular order :-)
197
198- A notification is sent to in-kernel registered modules by sending an event
199 CPU_DOWN_PREPARE
200- All processes are migrated away from this outgoing CPU to a new CPU
201- All interrupts targeted to this CPU are migrated to a new CPU
202- Timers, bottom halves and tasklets are also migrated to a new CPU
203- Once all services are migrated, the kernel calls an arch specific routine
204 __cpu_disable() to perform arch specific cleanup.
205- Once this is successful, a CPU_DEAD event is sent to signal successful
206 cleanup.
207
208 "It is expected that each service cleans up when the CPU_DOWN_PREPARE
209 notifier is called; when CPU_DEAD is called, it is expected that there is
210 nothing running on behalf of the CPU that was offlined."
211
212Q: If I have some kernel code that needs to be aware of CPU arrival and
213 departure, how do I arrange for proper notification?
214A: This is what you would need in your kernel code to receive notifications.
215
216 #include <linux/cpu.h>
217 static int __cpuinit foobar_cpu_callback(struct notifier_block *nfb,
218 unsigned long action, void *hcpu)
219 {
220 unsigned int cpu = (unsigned long)hcpu;
221
222 switch (action) {
223 case CPU_ONLINE:
224 foobar_online_action(cpu);
225 break;
226 case CPU_DEAD:
227 foobar_dead_action(cpu);
228 break;
229 }
230 return NOTIFY_OK;
231 }
232
233 static struct notifier_block foobar_cpu_notifier =
234 {
235 .notifier_call = foobar_cpu_callback,
236 };
237
238
239In your init function,
240
241 register_cpu_notifier(&foobar_cpu_notifier);
242
243You can fail PREPARE notifiers if something goes wrong while preparing
244resources. This stops the operation, and a CANCELED event is then sent back.
245
246CPU_DEAD should not be failed; it is only an informational event, and bad
247things will happen if a notifier in the path returns a BAD notify code.
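
The PREPARE/CANCELED behaviour can be modelled in user space as follows.
This is a toy model under stated assumptions, not the kernel's notifier
implementation: all toy_* names and event codes are hypothetical, and the
model assumes that the CANCELED event is delivered to every registered
callback.

```c
#include <stddef.h>

/* Hypothetical event and return codes; the kernel's real values differ. */
enum toy_event { TOY_PREPARE, TOY_CANCELED, TOY_DONE };
enum toy_ret { TOY_NOTIFY_OK, TOY_NOTIFY_BAD };

typedef enum toy_ret (*toy_notifier_fn)(enum toy_event event, unsigned int cpu);

/*
 * Run a hotplug-style operation against a chain of callbacks.
 * Callbacks see TOY_PREPARE in order. If one of them fails, the
 * operation stops there and every callback is told TOY_CANCELED so
 * it can release whatever it set up. Otherwise every callback is
 * told TOY_DONE, whose return value is ignored.
 * Returns 0 on success, -1 if the operation was canceled.
 */
int toy_run_chain(toy_notifier_fn *chain, size_t n, unsigned int cpu)
{
	size_t i;
	int failed = 0;

	for (i = 0; i < n; i++) {
		if (chain[i](TOY_PREPARE, cpu) == TOY_NOTIFY_BAD) {
			failed = 1;
			break;
		}
	}

	for (i = 0; i < n; i++)
		chain[i](failed ? TOY_CANCELED : TOY_DONE, cpu);

	return failed ? -1 : 0;
}

/* Example callbacks: count the events seen, indexed by enum toy_event. */
int toy_seen[3];

enum toy_ret toy_good_cb(enum toy_event event, unsigned int cpu)
{
	(void)cpu;
	toy_seen[event]++;
	return TOY_NOTIFY_OK;
}

/* Refuses the PREPARE step, as if a resource allocation had failed. */
enum toy_ret toy_failing_cb(enum toy_event event, unsigned int cpu)
{
	(void)cpu;
	toy_seen[event]++;
	return event == TOY_PREPARE ? TOY_NOTIFY_BAD : TOY_NOTIFY_OK;
}
```

Note that in this model a callback may receive CANCELED without having
seen PREPARE, so its cleanup path has to tolerate that.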
248
249Q: I don't see my action being called for all CPUs already up and running?
250A: Yes, CPU notifiers are called only when new CPUs are onlined or offlined.
251 If you need to perform some action for each cpu already in the system, do this:
252
253 for_each_online_cpu(i) {
254 foobar_cpu_callback(&foobar_cpu_notifier, CPU_UP_PREPARE, (void *)(long)i);
255 foobar_cpu_callback(&foobar_cpu_notifier, CPU_ONLINE, (void *)(long)i);
256 }
257
258Q: If i would like to develop cpu hotplug support for a new architecture,
259 what do i need at a minimum?
260A: The following are required for the CPU hotplug infrastructure to work
261 correctly.
262
263 - Make sure you have an entry in Kconfig to enable CONFIG_HOTPLUG_CPU
264 - __cpu_up() - Arch interface to bring up a CPU
265 - __cpu_disable() - Arch interface to shut down a CPU; no more interrupts
266 can be handled by the kernel after the routine
267 returns. Local APIC timers etc. are shut down
268 as well.
269 - __cpu_die() - This is supposed to ensure the death of the CPU.
270 Look at the example code in other architectures
271 that implement CPU hotplug. The processor is taken
272 down from the idle() loop for that specific
273 architecture. __cpu_die() typically waits for some
274 per_cpu state to be set, to be positively sure that
275 the processor dead routine has been called.
276
277Q: I need to ensure that a particular cpu is not removed while some
278 work specific to this cpu is in progress.
279A: First switch the current thread context to the preferred cpu:
280
281 int my_func_on_cpu(int cpu)
282 {
283 cpumask_t saved_mask, new_mask = CPU_MASK_NONE;
284 int curr_cpu, err = 0;
285
286 saved_mask = current->cpus_allowed;
287 cpu_set(cpu, new_mask);
288 err = set_cpus_allowed(current, new_mask);
289
290 if (err)
291 return err;
292
293 /*
294 * If we got scheduled out just after the return from
295 * set_cpus_allowed() before running the work, this ensures
296 * we stay locked.
297 */
298 curr_cpu = get_cpu();
299
300 if (curr_cpu != cpu) {
301 err = -EAGAIN;
302 goto ret;
303 } else {
304 /*
305 * Do work: but can't sleep, since get_cpu() disables preempt
306 */
307 }
308 ret:
309 put_cpu();
310 set_cpus_allowed(current, saved_mask);
311 return err;
312 }
313
314
315Q: How do we determine how many CPUs are available for hotplug?
316A: There is no clear spec-defined way in ACPI that can give us that
317 information today. Based on some input from Natalie of Unisys,
318 the ACPI MADT (Multiple APIC Description Tables) marks those possible
319 CPUs in a system with disabled status.
320
321 Andi implemented some simple heuristics that count the number of disabled
322 CPUs in MADT as hotpluggable CPUs. If there are no disabled CPUs,
323 we assume 1/2 the number of CPUs currently present can be hotplugged.
324
325 Caveat: Today's ACPI MADT can only provide 256 entries since the apicid field
326 in MADT is only 8 bits.
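
The heuristic above, combined with the additional_cpus= rule from the
command line section, can be sketched as plain arithmetic. The function
names are hypothetical and this is not the kernel's code; rounding down
when halving an odd CPU count is an assumption of this sketch.

```c
/*
 * Estimate how many CPUs could be hot-added: use the number of
 * disabled MADT entries if there are any, otherwise assume half of
 * the CPUs currently present.
 */
unsigned int toy_hotpluggable_cpus(unsigned int present, unsigned int disabled)
{
	if (disabled > 0)
		return disabled;
	return present / 2;	/* integer division: an assumption */
}

/* Size of the possible map: present CPUs plus the additional ones. */
unsigned int toy_possible_cpus(unsigned int present, unsigned int additional)
{
	return present + additional;
}
```

For example, a system with 4 present CPUs and no disabled MADT entries
would be sized for 4 + 2 = 6 possible CPUs under this sketch.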
327
328User Space Notification
329
330Hotplug support for devices is common in Linux today. It is used today to
331support automatic configuration of network, usb and pci devices. A hotplug
332event can be used to invoke an agent script to perform the configuration task.
333
334You can add /etc/hotplug/cpu.agent to handle hotplug notifications in user
335space.
336
337 #!/bin/bash
338 # $Id: cpu.agent
339 # Kernel hotplug params include:
340 #ACTION=%s [online or offline]
341 #DEVPATH=%s
342 #
343 cd /etc/hotplug
344 . ./hotplug.functions
345
346 case $ACTION in
347 online)
348 echo `date` ":cpu.agent" add cpu >> /tmp/hotplug.txt
349 ;;
350 offline)
351 echo `date` ":cpu.agent" remove cpu >>/tmp/hotplug.txt
352 ;;
353 *)
354 debug_mesg CPU $ACTION event not supported
355 exit 1
356 ;;
357 esac
diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt
index a09a8eb80665..9e49b1c35729 100644
--- a/Documentation/cpusets.txt
+++ b/Documentation/cpusets.txt
@@ -14,7 +14,10 @@ CONTENTS:
14 1.1 What are cpusets ? 14 1.1 What are cpusets ?
15 1.2 Why are cpusets needed ? 15 1.2 Why are cpusets needed ?
16 1.3 How are cpusets implemented ? 16 1.3 How are cpusets implemented ?
17 1.4 How do I use cpusets ? 17 1.4 What are exclusive cpusets ?
18 1.5 What does notify_on_release do ?
19 1.6 What is memory_pressure ?
20 1.7 How do I use cpusets ?
182. Usage Examples and Syntax 212. Usage Examples and Syntax
19 2.1 Basic Usage 22 2.1 Basic Usage
20 2.2 Adding/removing cpus 23 2.2 Adding/removing cpus
@@ -49,29 +52,6 @@ its cpus_allowed vector, and the kernel page allocator will not
49allocate a page on a node that is not allowed in the requesting tasks 52allocate a page on a node that is not allowed in the requesting tasks
50mems_allowed vector. 53mems_allowed vector.
51 54
52If a cpuset is cpu or mem exclusive, no other cpuset, other than a direct
53ancestor or descendent, may share any of the same CPUs or Memory Nodes.
54A cpuset that is cpu exclusive has a sched domain associated with it.
55The sched domain consists of all cpus in the current cpuset that are not
56part of any exclusive child cpusets.
57This ensures that the scheduler load balacing code only balances
58against the cpus that are in the sched domain as defined above and not
59all of the cpus in the system. This removes any overhead due to
60load balancing code trying to pull tasks outside of the cpu exclusive
61cpuset only to be prevented by the tasks' cpus_allowed mask.
62
63A cpuset that is mem_exclusive restricts kernel allocations for
64page, buffer and other data commonly shared by the kernel across
65multiple users. All cpusets, whether mem_exclusive or not, restrict
66allocations of memory for user space. This enables configuring a
67system so that several independent jobs can share common kernel
68data, such as file system pages, while isolating each jobs user
69allocation in its own cpuset. To do this, construct a large
70mem_exclusive cpuset to hold all the jobs, and construct child,
71non-mem_exclusive cpusets for each individual job. Only a small
72amount of typical kernel memory, such as requests from interrupt
73handlers, is allowed to be taken outside even a mem_exclusive cpuset.
74
75User level code may create and destroy cpusets by name in the cpuset 55User level code may create and destroy cpusets by name in the cpuset
76virtual file system, manage the attributes and permissions of these 56virtual file system, manage the attributes and permissions of these
77cpusets and which CPUs and Memory Nodes are assigned to each cpuset, 57cpusets and which CPUs and Memory Nodes are assigned to each cpuset,
@@ -192,9 +172,15 @@ containing the following files describing that cpuset:
192 172
193 - cpus: list of CPUs in that cpuset 173 - cpus: list of CPUs in that cpuset
194 - mems: list of Memory Nodes in that cpuset 174 - mems: list of Memory Nodes in that cpuset
175 - memory_migrate flag: if set, move pages to cpusets nodes
195 - cpu_exclusive flag: is cpu placement exclusive? 176 - cpu_exclusive flag: is cpu placement exclusive?
196 - mem_exclusive flag: is memory placement exclusive? 177 - mem_exclusive flag: is memory placement exclusive?
197 - tasks: list of tasks (by pid) attached to that cpuset 178 - tasks: list of tasks (by pid) attached to that cpuset
179 - notify_on_release flag: run /sbin/cpuset_release_agent on exit?
180 - memory_pressure: measure of how much paging pressure in cpuset
181
182In addition, the root cpuset only has the following file:
183 - memory_pressure_enabled flag: compute memory_pressure?
198 184
199New cpusets are created using the mkdir system call or shell 185New cpusets are created using the mkdir system call or shell
200command. The properties of a cpuset, such as its flags, allowed 186command. The properties of a cpuset, such as its flags, allowed
@@ -228,7 +214,108 @@ exclusive cpuset. Also, the use of a Linux virtual file system (vfs)
228to represent the cpuset hierarchy provides for a familiar permission 214to represent the cpuset hierarchy provides for a familiar permission
229and name space for cpusets, with a minimum of additional kernel code. 215and name space for cpusets, with a minimum of additional kernel code.
230 216
2311.4 How do I use cpusets ? 217
2181.4 What are exclusive cpusets ?
219--------------------------------
220
221If a cpuset is cpu or mem exclusive, no other cpuset, other than
222a direct ancestor or descendent, may share any of the same CPUs or
223Memory Nodes.
224
225A cpuset that is cpu_exclusive has a scheduler (sched) domain
226associated with it. The sched domain consists of all CPUs in the
227current cpuset that are not part of any exclusive child cpusets.
228This ensures that the scheduler load balancing code only balances
229against the CPUs that are in the sched domain as defined above and
230not all of the CPUs in the system. This removes any overhead due to
231load balancing code trying to pull tasks outside of the cpu_exclusive
232cpuset only to be prevented by the tasks' cpus_allowed mask.
233
234A cpuset that is mem_exclusive restricts kernel allocations for
235page, buffer and other data commonly shared by the kernel across
236multiple users. All cpusets, whether mem_exclusive or not, restrict
237allocations of memory for user space. This enables configuring a
238system so that several independent jobs can share common kernel data,
239such as file system pages, while isolating each jobs user allocation in
240its own cpuset. To do this, construct a large mem_exclusive cpuset to
241hold all the jobs, and construct child, non-mem_exclusive cpusets for
242each individual job. Only a small amount of typical kernel memory,
243such as requests from interrupt handlers, is allowed to be taken
244outside even a mem_exclusive cpuset.
245
246
2471.5 What does notify_on_release do ?
248------------------------------------
249
250If the notify_on_release flag is enabled (1) in a cpuset, then whenever
251the last task in the cpuset leaves (exits or attaches to some other
252cpuset) and the last child cpuset of that cpuset is removed, then
253the kernel runs the command /sbin/cpuset_release_agent, supplying the
254pathname (relative to the mount point of the cpuset file system) of the
255abandoned cpuset. This enables automatic removal of abandoned cpusets.
256The default value of notify_on_release in the root cpuset at system
257boot is disabled (0). The default value of other cpusets at creation
258is the current value of their parents notify_on_release setting.
259
260
2611.6 What is memory_pressure ?
262-----------------------------
263The memory_pressure of a cpuset provides a simple per-cpuset metric
264of the rate that the tasks in a cpuset are attempting to free up in
265use memory on the nodes of the cpuset to satisfy additional memory
266requests.
267
268This enables batch managers monitoring jobs running in dedicated
269cpusets to efficiently detect what level of memory pressure that job
270is causing.
271
272This is useful both on tightly managed systems running a wide mix of
273submitted jobs, which may choose to terminate or re-prioritize jobs that
274are trying to use more memory than allowed on the nodes assigned them,
275and with tightly coupled, long running, massively parallel scientific
276computing jobs that will dramatically fail to meet required performance
277goals if they start to use more memory than allowed to them.
278
279This mechanism provides a very economical way for the batch manager
280to monitor a cpuset for signs of memory pressure. It's up to the
281batch manager or other user code to decide what to do about it and
282take action.
283
284==> Unless this feature is enabled by writing "1" to the special file
285 /dev/cpuset/memory_pressure_enabled, the hook in the rebalance
286 code of __alloc_pages() for this metric reduces to simply noticing
287 that the cpuset_memory_pressure_enabled flag is zero. So only
288 systems that enable this feature will compute the metric.
289
290Why a per-cpuset, running average:
291
292 Because this meter is per-cpuset, rather than per-task or mm,
293 the system load imposed by a batch scheduler monitoring this
294 metric is sharply reduced on large systems, because a scan of
295 the tasklist can be avoided on each set of queries.
296
297 Because this meter is a running average, instead of an accumulating
298 counter, a batch scheduler can detect memory pressure with a
299 single read, instead of having to read and accumulate results
300 for a period of time.
301
302 Because this meter is per-cpuset rather than per-task or mm,
303 the batch scheduler can obtain the key information, memory
304 pressure in a cpuset, with a single read, rather than having to
305 query and accumulate results over all the (dynamically changing)
306 set of tasks in the cpuset.
307
308A per-cpuset simple digital filter (requires a spinlock and 3 words
309of data per-cpuset) is kept, and updated by any task attached to that
310cpuset, if it enters the synchronous (direct) page reclaim code.
311
312A per-cpuset file provides an integer number representing the recent
313(half-life of 10 seconds) rate of direct page reclaims caused by
314the tasks in the cpuset, in units of reclaims attempted per second,
315times 1000.
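
Since the file reports the rate multiplied by 1000, a monitoring tool has
to scale the value back down. A minimal user-space sketch follows;
read_memory_pressure_rate() is a hypothetical helper, and the caller is
assumed to pass the path of a cpuset's memory_pressure file.

```c
#include <stdio.h>

/*
 * Read a cpuset memory_pressure file and convert it to direct page
 * reclaims attempted per second. The file holds that rate times 1000.
 * Returns a negative value if the file cannot be read or parsed.
 */
double read_memory_pressure_rate(const char *path)
{
	FILE *f = fopen(path, "r");
	long raw;
	int ok;

	if (!f)
		return -1.0;
	ok = (fscanf(f, "%ld", &raw) == 1 && raw >= 0);
	fclose(f);

	if (!ok)
		return -1.0;
	return (double)raw / 1000.0;
}
```

A raw value of 2500 therefore corresponds to 2.5 reclaim attempts per
second.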
316
317
3181.7 How do I use cpusets ?
232-------------------------- 319--------------------------
233 320
234In order to minimize the impact of cpusets on critical kernel 321In order to minimize the impact of cpusets on critical kernel
@@ -277,6 +364,30 @@ rewritten to the 'tasks' file of its cpuset. This is done to avoid
277impacting the scheduler code in the kernel with a check for changes 364impacting the scheduler code in the kernel with a check for changes
278in a tasks processor placement. 365in a tasks processor placement.
279 366
367Normally, once a page is allocated (given a physical page
368of main memory) then that page stays on whatever node it
369was allocated, so long as it remains allocated, even if the
370cpusets memory placement policy 'mems' subsequently changes.
371If the cpuset flag file 'memory_migrate' is set true, then when
372tasks are attached to that cpuset, any pages that task had
373allocated to it on nodes in its previous cpuset are migrated
374to the tasks new cpuset. Depending on the implementation,
375this migration may either be done by swapping the page out,
376so that the next time the page is referenced, it will be paged
377into the tasks new cpuset, usually on the node where it was
378referenced, or this migration may be done by directly copying
379the pages from the tasks previous cpuset to the new cpuset,
380where possible to the same node, relative to the new cpuset,
381as the node that held the page, relative to the old cpuset.
382Also if 'memory_migrate' is set true, then if that cpusets
383'mems' file is modified, pages allocated to tasks in that
384cpuset, that were on nodes in the previous setting of 'mems',
385will be moved to nodes in the new setting of 'mems.' Again,
386depending on the implementation, this might be done by swapping,
387or by direct copying. In either case, pages that were not in
388the tasks prior cpuset, or in the cpusets prior 'mems' setting,
389will not be moved.
390
280There is an exception to the above. If hotplug functionality is used 391There is an exception to the above. If hotplug functionality is used
281to remove all the CPUs that are currently assigned to a cpuset, 392to remove all the CPUs that are currently assigned to a cpuset,
282then the kernel will automatically update the cpus_allowed of all 393then the kernel will automatically update the cpus_allowed of all
diff --git a/Documentation/fb/cyblafb/bugs b/Documentation/fb/cyblafb/bugs
index f90cc66ea919..9443a6d72cdd 100644
--- a/Documentation/fb/cyblafb/bugs
+++ b/Documentation/fb/cyblafb/bugs
@@ -11,4 +11,3 @@ Untested features
11 11
12All LCD stuff is untested. If it worked in tridentfb, it should work in 12All LCD stuff is untested. If it worked in tridentfb, it should work in
13cyblafb. Please test and report the results to Knut_Petersen@t-online.de. 13cyblafb. Please test and report the results to Knut_Petersen@t-online.de.
14
diff --git a/Documentation/fb/cyblafb/fb.modes b/Documentation/fb/cyblafb/fb.modes
index cf4351fc32ff..fe0e5223ba86 100644
--- a/Documentation/fb/cyblafb/fb.modes
+++ b/Documentation/fb/cyblafb/fb.modes
@@ -14,142 +14,141 @@
14# 14#
15 15
16mode "640x480-50" 16mode "640x480-50"
17 geometry 640 480 640 3756 8 17 geometry 640 480 2048 4096 8
18 timings 47619 4294967256 24 17 0 216 3 18 timings 47619 4294967256 24 17 0 216 3
19endmode 19endmode
20 20
21mode "640x480-60" 21mode "640x480-60"
22 geometry 640 480 640 3756 8 22 geometry 640 480 2048 4096 8
23 timings 39682 4294967256 24 17 0 216 3 23 timings 39682 4294967256 24 17 0 216 3
24endmode 24endmode
25 25
26mode "640x480-70" 26mode "640x480-70"
27 geometry 640 480 640 3756 8 27 geometry 640 480 2048 4096 8
28 timings 34013 4294967256 24 17 0 216 3 28 timings 34013 4294967256 24 17 0 216 3
29endmode 29endmode
30 30
31mode "640x480-72" 31mode "640x480-72"
32 geometry 640 480 640 3756 8 32 geometry 640 480 2048 4096 8
33 timings 33068 4294967256 24 17 0 216 3 33 timings 33068 4294967256 24 17 0 216 3
34endmode 34endmode
35 35
36mode "640x480-75" 36mode "640x480-75"
37 geometry 640 480 640 3756 8 37 geometry 640 480 2048 4096 8
38 timings 31746 4294967256 24 17 0 216 3 38 timings 31746 4294967256 24 17 0 216 3
39endmode 39endmode
40 40
41mode "640x480-80" 41mode "640x480-80"
42 geometry 640 480 640 3756 8 42 geometry 640 480 2048 4096 8
43 timings 29761 4294967256 24 17 0 216 3 43 timings 29761 4294967256 24 17 0 216 3
44endmode 44endmode
45 45
46mode "640x480-85" 46mode "640x480-85"
47 geometry 640 480 640 3756 8 47 geometry 640 480 2048 4096 8
48 timings 28011 4294967256 24 17 0 216 3 48 timings 28011 4294967256 24 17 0 216 3
49endmode 49endmode
50 50
51mode "800x600-50" 51mode "800x600-50"
52 geometry 800 600 800 3221 8 52 geometry 800 600 2048 4096 8
53 timings 30303 96 24 14 0 136 11 53 timings 30303 96 24 14 0 136 11
54endmode 54endmode
55 55
56mode "800x600-60" 56mode "800x600-60"
57 geometry 800 600 800 3221 8 57 geometry 800 600 2048 4096 8
58 timings 25252 96 24 14 0 136 11 58 timings 25252 96 24 14 0 136 11
59endmode 59endmode
60 60
61mode "800x600-70" 61mode "800x600-70"
62 geometry 800 600 800 3221 8 62 geometry 800 600 2048 4096 8
63 timings 21645 96 24 14 0 136 11 63 timings 21645 96 24 14 0 136 11
64endmode 64endmode
65 65
66mode "800x600-72" 66mode "800x600-72"
67 geometry 800 600 800 3221 8 67 geometry 800 600 2048 4096 8
68 timings 21043 96 24 14 0 136 11 68 timings 21043 96 24 14 0 136 11
69endmode 69endmode
70 70
71mode "800x600-75" 71mode "800x600-75"
72 geometry 800 600 800 3221 8 72 geometry 800 600 2048 4096 8
73 timings 20202 96 24 14 0 136 11 73 timings 20202 96 24 14 0 136 11
74endmode 74endmode
75 75
76mode "800x600-80" 76mode "800x600-80"
77 geometry 800 600 800 3221 8 77 geometry 800 600 2048 4096 8
78 timings 18939 96 24 14 0 136 11 78 timings 18939 96 24 14 0 136 11
79endmode 79endmode
80 80
81mode "800x600-85" 81mode "800x600-85"
82 geometry 800 600 800 3221 8 82 geometry 800 600 2048 4096 8
83 timings 17825 96 24 14 0 136 11 83 timings 17825 96 24 14 0 136 11
84endmode 84endmode
85 85
86mode "1024x768-50" 86mode "1024x768-50"
87 geometry 1024 768 1024 2815 8 87 geometry 1024 768 2048 4096 8
88 timings 19054 144 24 29 0 120 3 88 timings 19054 144 24 29 0 120 3
89endmode 89endmode
90 90
91mode "1024x768-60" 91mode "1024x768-60"
92 geometry 1024 768 1024 2815 8 92 geometry 1024 768 2048 4096 8
93 timings 15880 144 24 29 0 120 3 93 timings 15880 144 24 29 0 120 3
94endmode 94endmode
95 95
96mode "1024x768-70" 96mode "1024x768-70"
97 geometry 1024 768 1024 2815 8 97 geometry 1024 768 2048 4096 8
98 timings 13610 144 24 29 0 120 3 98 timings 13610 144 24 29 0 120 3
99endmode 99endmode
100 100
101mode "1024x768-72" 101mode "1024x768-72"
102 geometry 1024 768 1024 2815 8 102 geometry 1024 768 2048 4096 8
103 timings 13232 144 24 29 0 120 3 103 timings 13232 144 24 29 0 120 3
104endmode 104endmode
105 105
106mode "1024x768-75" 106mode "1024x768-75"
107 geometry 1024 768 1024 2815 8 107 geometry 1024 768 2048 4096 8
108 timings 12703 144 24 29 0 120 3 108 timings 12703 144 24 29 0 120 3
109endmode 109endmode
110 110
111mode "1024x768-80" 111mode "1024x768-80"
112 geometry 1024 768 1024 2815 8 112 geometry 1024 768 2048 4096 8
113 timings 11910 144 24 29 0 120 3 113 timings 11910 144 24 29 0 120 3
114endmode 114endmode
115 115
116mode "1024x768-85" 116mode "1024x768-85"
117 geometry 1024 768 1024 2815 8 117 geometry 1024 768 2048 4096 8
118 timings 11209 144 24 29 0 120 3 118 timings 11209 144 24 29 0 120 3
119endmode 119endmode
120 120
121mode "1280x1024-50" 121mode "1280x1024-50"
122 geometry 1280 1024 1280 2662 8 122 geometry 1280 1024 2048 4096 8
123 timings 11114 232 16 39 0 160 3 123 timings 11114 232 16 39 0 160 3
124endmode 124endmode
125 125
126mode "1280x1024-60" 126mode "1280x1024-60"
127 geometry 1280 1024 1280 2662 8 127 geometry 1280 1024 2048 4096 8
128 timings 9262 232 16 39 0 160 3 128 timings 9262 232 16 39 0 160 3
129endmode 129endmode
130 130
131mode "1280x1024-70" 131mode "1280x1024-70"
132 geometry 1280 1024 1280 2662 8 132 geometry 1280 1024 2048 4096 8
133 timings 7939 232 16 39 0 160 3 133 timings 7939 232 16 39 0 160 3
134endmode 134endmode
135 135
136mode "1280x1024-72" 136mode "1280x1024-72"
137 geometry 1280 1024 1280 2662 8 137 geometry 1280 1024 2048 4096 8
138 timings 7719 232 16 39 0 160 3 138 timings 7719 232 16 39 0 160 3
139endmode 139endmode
140 140
141mode "1280x1024-75" 141mode "1280x1024-75"
142 geometry 1280 1024 1280 2662 8 142 geometry 1280 1024 2048 4096 8
143 timings 7410 232 16 39 0 160 3 143 timings 7410 232 16 39 0 160 3
144endmode 144endmode
145 145
146mode "1280x1024-80" 146mode "1280x1024-80"
147 geometry 1280 1024 1280 2662 8 147 geometry 1280 1024 2048 4096 8
148 timings 6946 232 16 39 0 160 3 148 timings 6946 232 16 39 0 160 3
149endmode 149endmode
150 150
151mode "1280x1024-85" 151mode "1280x1024-85"
152 geometry 1280 1024 1280 2662 8 152 geometry 1280 1024 2048 4096 8
153 timings 6538 232 16 39 0 160 3 153 timings 6538 232 16 39 0 160 3
154endmode 154endmode
155
diff --git a/Documentation/fb/cyblafb/performance b/Documentation/fb/cyblafb/performance
index eb4e47a9cea6..8d15d5dfc6b3 100644
--- a/Documentation/fb/cyblafb/performance
+++ b/Documentation/fb/cyblafb/performance
@@ -77,4 +77,3 @@ patch that speeds up kernel bitblitting a lot ( > 20%).
77| | | | | 77| | | | |
78| | | | | 78| | | | |
79+-----------+-----------------+-----------------+-----------------+ 79+-----------+-----------------+-----------------+-----------------+
80
diff --git a/Documentation/fb/cyblafb/todo b/Documentation/fb/cyblafb/todo
index 80fb2f89b6c1..c5f6d0eae545 100644
--- a/Documentation/fb/cyblafb/todo
+++ b/Documentation/fb/cyblafb/todo
@@ -22,11 +22,10 @@ accelerated color blitting Who needs it? The console driver does use color
22 everything else is done using color expanding 22 everything else is done using color expanding
23 blitting of 1bpp character bitmaps. 23 blitting of 1bpp character bitmaps.
24 24
25xpanning Who needs it?
26
27ioctls Who needs it? 25ioctls Who needs it?
28 26
29TV-out Will be done later 27TV-out Will be done later. Use "vga= " at boot time
28 to set a suitable video mode.
30 29
31??? Feel free to contact me if you have any 30??? Feel free to contact me if you have any
32 feature requests 31 feature requests
diff --git a/Documentation/fb/cyblafb/usage b/Documentation/fb/cyblafb/usage
index e627c8f54211..a39bb3d402a2 100644
--- a/Documentation/fb/cyblafb/usage
+++ b/Documentation/fb/cyblafb/usage
@@ -40,6 +40,16 @@ Selecting Modes
40 None of the modes possible to select as startup modes are affected by 40 None of the modes possible to select as startup modes are affected by
41 the problems described at the end of the next subsection. 41 the problems described at the end of the next subsection.
42 42
43 For all startup modes cyblafb chooses a virtual x resolution of 2048;
44 the only exception is mode 1280x1024 in combination with 32 bpp. This
45 allows ywrap scrolling for all those modes if rotation is 0 or 2, and
46 also fast scrolling if rotation is 1 or 3. The default virtual y
47 resolution is 4096 for bpp == 8, 2048 for bpp == 16 and 1024 for
48 bpp == 32, again with the only exception of 1280x1024 at 32 bpp.
49
50 Please do set your video memory size to 8 MB in the BIOS setup. Other
51 values will work, but performance is decreased for a lot of modes.
52
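A small arithmetic sketch of why the default virtual resolutions and the
8 MB recommendation fit together: with a virtual x resolution of 2048, each
of the default virtual y resolutions quoted above fills exactly 8 MB. The
names below are illustrative only and are not taken from the driver.

```python
# Illustrative check, not driver code: the default virtual screen sizes
# quoted in the text each occupy exactly 8 MB of video memory.
VIDEO_MEMORY_BYTES = 8 * 1024 * 1024
DEFAULT_VXRES = 2048
DEFAULT_VYRES = {8: 4096, 16: 2048, 32: 1024}  # bpp -> virtual y resolution


def virtual_screen_bytes(vxres, vyres, bpp):
    """Return the memory needed for a vxres x vyres virtual screen."""
    return vxres * vyres * (bpp // 8)


if __name__ == "__main__":
    for bpp, vyres in sorted(DEFAULT_VYRES.items()):
        size = virtual_screen_bytes(DEFAULT_VXRES, vyres, bpp)
        print(bpp, vyres, size == VIDEO_MEMORY_BYTES)
```

With less video memory the same vxres leaves room for fewer virtual lines,
which may be one reason other memory sizes perform worse.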
43 Mode changes using fbset 53 Mode changes using fbset
44 ======================== 54 ========================
45 55
@@ -54,20 +64,26 @@ Selecting Modes
54 - if a flat panel is found, cyblafb does not allow you 64 - if a flat panel is found, cyblafb does not allow you
55 to program a resolution higher than the physical 65 to program a resolution higher than the physical
56 resolution of the flat panel monitor 66 resolution of the flat panel monitor
57 - cyblafb does not allow xres to differ from xres_virtual
58 - cyblafb does not allow vclk to exceed 230 MHz. As 32 bpp 67 - cyblafb does not allow vclk to exceed 230 MHz. As 32 bpp
59 and (currently) 24 bit modes use a doubled vclk internally, 68 and (currently) 24 bit modes use a doubled vclk internally,
60 the dotclock limit as seen by fbset is 115 MHz for those 69 the dotclock limit as seen by fbset is 115 MHz for those
61 modes and 230 MHz for 8 and 16 bpp modes. 70 modes and 230 MHz for 8 and 16 bpp modes.
71 - cyblafb will allow you to select very high resolutions as
72 long as the hardware can be programmed to these modes. The
73 documented limit 1600x1200 is not enforced, but don't expect
74 perfect signal quality.
62 75
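The vclk rule above can be sketched as a small check. This assumes that the
first number of a "timings" line in fb.modes is the pixel clock period in
picoseconds, as fbset uses it; the function names are made up for this
illustration and do not exist in cyblafb.

```python
# Sketch of the dotclock rule described above; names are hypothetical.
VCLK_LIMIT_MHZ = 230.0


def dotclock_mhz(pixclock_ps):
    """Convert a pixel clock period in picoseconds to a frequency in MHz."""
    return 1e6 / pixclock_ps


def dotclock_limit_mhz(bpp):
    """24 and 32 bpp modes double vclk internally, halving the usable limit."""
    return VCLK_LIMIT_MHZ / 2 if bpp in (24, 32) else VCLK_LIMIT_MHZ


def mode_allowed(pixclock_ps, bpp):
    """Return True if the requested dotclock stays within the vclk limit."""
    return dotclock_mhz(pixclock_ps) <= dotclock_limit_mhz(bpp)


if __name__ == "__main__":
    # 6946 ps is one of the pixclock values in the fb.modes hunk.
    for bpp in (8, 16, 32):
        print(bpp, round(dotclock_mhz(6946), 2), mode_allowed(6946, bpp))
```

A pixclock of 6946 ps is roughly 144 MHz, so under this rule it is
acceptable at 8 or 16 bpp but exceeds the 115 MHz limit at 32 bpp.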
63 Any request that violates the rules given above will be ignored and 76 Any request that violates the rules given above will be either changed
64 fbset will return an error. 77 to something the hardware supports or an error value will be returned.
65 78
66 If you program a virtual y resolution higher than the hardware limit, 79 If you program a virtual y resolution higher than the hardware limit,
67 cyblafb will silently decrease that value to the highest possible 80 cyblafb will silently decrease that value to the highest possible
68 value. 81 value. The same is true for a virtual x resolution that is not
82 supported by the hardware. Cyblafb tries to adapt vyres first because
83 vxres decides if ywrap scrolling is possible or not.
69 84
70 Attempts to disable acceleration are ignored. 85 Attempts to disable acceleration are ignored, I believe that this is
86 safe.
71 87
72 Some video modes that should work do not work as expected. If you use 88 Some video modes that should work do not work as expected. If you use
73 the standard fb.modes, fbset 640x480-60 will program that mode, but 89 the standard fb.modes, fbset 640x480-60 will program that mode, but
@@ -129,10 +145,6 @@ mode 640x480 or 800x600 or 1024x768 or 1280x1024
129verbosity 0 is the default, increase to at least 2 for every 145verbosity 0 is the default, increase to at least 2 for every
130 bug report! 146 bug report!
131 147
132vesafb allows cyblafb to be loaded after vesafb has been
133 loaded. See sections "Module unloading ...".
134
135
136Development hints 148Development hints
137================= 149=================
138 150
@@ -195,7 +207,7 @@ a graphics mode.
195After booting, load cyblafb without any mode and bpp parameter and assign 207After booting, load cyblafb without any mode and bpp parameter and assign
196cyblafb to individual ttys using con2fb, e.g.: 208cyblafb to individual ttys using con2fb, e.g.:
197 209
198 modprobe cyblafb vesafb=1 210 modprobe cyblafb
199 con2fb /dev/fb1 /dev/tty1 211 con2fb /dev/fb1 /dev/tty1
200 212
201Unloading cyblafb works without problems after you assign vesafb to all 213Unloading cyblafb works without problems after you assign vesafb to all
@@ -203,4 +215,3 @@ ttys again, e.g.:
203 215
204 con2fb /dev/fb0 /dev/tty1 216 con2fb /dev/fb0 /dev/tty1
205 rmmod cyblafb 217 rmmod cyblafb
206
diff --git a/Documentation/fb/cyblafb/whatsnew b/Documentation/fb/cyblafb/whatsnew
new file mode 100644
index 000000000000..76c07a26e044
--- /dev/null
+++ b/Documentation/fb/cyblafb/whatsnew
@@ -0,0 +1,29 @@
10.62
2====
3
4 - the vesafb parameter has been removed as I decided to allow the
5 feature without any special parameter.
6
7 - Cyblafb does not use the vga style of panning any longer, now the
8 "right view" register in the graphics engine IO space is used. Without
9 that change it was impossible to use all available memory, and without
10 access to all available memory it is impossible to ywrap.
11
12 - The imageblit function now uses hardware acceleration for all font
13 widths. Hardware blitting across pixel column 2048 is broken in the
14 cyberblade/i1 graphics core, but we work around that hardware bug.
15
16 - modes with vxres != xres are supported now.
17
18 - ywrap scrolling is supported now and the default. This is a big
19 performance gain.
20
21 - default video modes use vyres > yres and vxres > xres to allow
22 almost optimal scrolling speed for normal and rotated screens
23
24 - some features mainly useful for debugging the upper layers of the
25 framebuffer system have been added; have a look at the code
26
27 - fixed: Oops after unloading cyblafb when reading /proc/io*
28
29 - we work around some bugs of the higher framebuffer layers.
diff --git a/Documentation/filesystems/ext3.txt b/Documentation/filesystems/ext3.txt
index 9840d5b8d5b9..22e4040564d5 100644
--- a/Documentation/filesystems/ext3.txt
+++ b/Documentation/filesystems/ext3.txt
@@ -22,6 +22,11 @@ journal=inum When a journal already exists, this option is
22 the inode which will represent the ext3 file 22 the inode which will represent the ext3 file
23 system's journal file. 23 system's journal file.
24 24
25journal_dev=devnum When the external journal device's major/minor numbers
26 have changed, this option allows the user to specify the new
27 journal location. The journal device is identified
28 through its new major/minor numbers encoded in devnum.
29
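The text does not spell out how major/minor numbers are encoded in devnum.
The sketch below assumes the layout of the kernel's new_encode_dev() helper
(low 8 bits of the minor, then 12 bits of major, then the remaining minor
bits); treat that as an assumption to verify against the ext3 source rather
than as a statement of what the option requires.

```python
# Assumed encoding (mirrors new_encode_dev()); verify before relying on it.
def encode_devnum(major, minor):
    """Pack major/minor device numbers into a single devnum value."""
    return (minor & 0xff) | (major << 8) | ((minor & ~0xff) << 12)


def decode_devnum(devnum):
    """Unpack a devnum value into a (major, minor) pair."""
    major = (devnum & 0xfff00) >> 8
    minor = (devnum & 0xff) | ((devnum >> 12) & 0xfff00)
    return major, minor


if __name__ == "__main__":
    devnum = encode_devnum(8, 17)  # e.g. a device with major 8, minor 17
    print("journal_dev=0x%x" % devnum, decode_devnum(devnum))
```

Under this assumption a device with major 8 and minor 17 would be passed
as journal_dev=0x811.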
25noload Don't load the journal on mounting. 30noload Don't load the journal on mounting.
26 31
27data=journal All data are committed into the journal prior 32data=journal All data are committed into the journal prior
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index d4773565ea2f..a4dcf42c2fd9 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -1302,6 +1302,23 @@ VM has token based thrashing control mechanism and uses the token to prevent
1302unnecessary page faults in thrashing situation. The unit of the value is 1302unnecessary page faults in thrashing situation. The unit of the value is
1303second. The value would be useful to tune thrashing behavior. 1303second. The value would be useful to tune thrashing behavior.
1304 1304
1305drop_caches
1306-----------
1307
1308Writing to this will cause the kernel to drop clean caches, dentries and
1309inodes from memory, causing that memory to become free.
1310
1311To free pagecache:
1312 echo 1 > /proc/sys/vm/drop_caches
1313To free dentries and inodes:
1314 echo 2 > /proc/sys/vm/drop_caches
1315To free pagecache, dentries and inodes:
1316 echo 3 > /proc/sys/vm/drop_caches
1317
1318As this is a non-destructive operation and dirty objects are not freeable, the
1319user should run `sync' first.
1320
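The three values above can be read as two flags that combine (1 for
pagecache, 2 for dentries and inodes). A minimal sketch of a helper that
syncs first and then writes the value follows; the helper names and the
"path" parameter are inventions of this example, and writing to the real
file needs root.

```python
import os

DROP_CACHES_PATH = "/proc/sys/vm/drop_caches"


def drop_caches_value(pagecache=False, dentries_and_inodes=False):
    """Map the two kinds of cache onto the value documented above."""
    value = (1 if pagecache else 0) | (2 if dentries_and_inodes else 0)
    if value == 0:
        raise ValueError("nothing selected to drop")
    return value


def drop_caches(pagecache=True, dentries_and_inodes=True,
                path=DROP_CACHES_PATH):
    """Run sync, then write the selected value; returns the value written."""
    value = drop_caches_value(pagecache, dentries_and_inodes)
    os.sync()  # dirty objects are not freeable, so flush them first
    with open(path, "w") as f:
        f.write("%d\n" % value)
    return value
```

Calling drop_caches() with the defaults is equivalent to running sync and
then "echo 3 > /proc/sys/vm/drop_caches".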
1321
13052.5 /proc/sys/dev - Device specific parameters 13222.5 /proc/sys/dev - Device specific parameters
1306---------------------------------------------- 1323----------------------------------------------
1307 1324
diff --git a/Documentation/filesystems/ramfs-rootfs-initramfs.txt b/Documentation/filesystems/ramfs-rootfs-initramfs.txt
index b3404a032596..60ab61e54e8a 100644
--- a/Documentation/filesystems/ramfs-rootfs-initramfs.txt
+++ b/Documentation/filesystems/ramfs-rootfs-initramfs.txt
@@ -143,12 +143,26 @@ as the following example:
143 dir /mnt 755 0 0 143 dir /mnt 755 0 0
144 file /init initramfs/init.sh 755 0 0 144 file /init initramfs/init.sh 755 0 0
145 145
146Run "usr/gen_init_cpio" (after the kernel build) to get a usage message
147documenting the above file format.
148
146One advantage of the text file is that root access is not required to 149One advantage of the text file is that root access is not required to
147set permissions or create device nodes in the new archive. (Note that those 150set permissions or create device nodes in the new archive. (Note that those
148two example "file" entries expect to find files named "init.sh" and "busybox" in 151two example "file" entries expect to find files named "init.sh" and "busybox" in
149a directory called "initramfs", under the linux-2.6.* directory. See 152a directory called "initramfs", under the linux-2.6.* directory. See
150Documentation/early-userspace/README for more details.) 153Documentation/early-userspace/README for more details.)
151 154
155The kernel does not depend on external cpio tools, gen_init_cpio is created
156from usr/gen_init_cpio.c which is entirely self-contained, and the kernel's
157boot-time extractor is also (obviously) self-contained. However, if you _do_
158happen to have cpio installed, the following command line can extract the
159generated cpio image back into its component files:
160
161 cpio -i -d -H newc -F initramfs_data.cpio --no-absolute-filenames
162
163Contents of initramfs:
164----------------------
165
152If you don't already understand what shared libraries, devices, and paths 166If you don't already understand what shared libraries, devices, and paths
153you need to get a minimal root filesystem up and running, here are some 167you need to get a minimal root filesystem up and running, here are some
154references: 168references:
@@ -161,13 +175,69 @@ designed to be a tiny C library to statically link early userspace
161code against, along with some related utilities. It is BSD licensed. 175code against, along with some related utilities. It is BSD licensed.
162 176
163I use uClibc (http://www.uclibc.org) and busybox (http://www.busybox.net) 177I use uClibc (http://www.uclibc.org) and busybox (http://www.busybox.net)
164myself. These are LGPL and GPL, respectively. 178myself. These are LGPL and GPL, respectively. (A self-contained initramfs
179package is planned for the busybox 1.2 release.)
165 180
166In theory you could use glibc, but that's not well suited for small embedded 181In theory you could use glibc, but that's not well suited for small embedded
167uses like this. (A "hello world" program statically linked against glibc is 182uses like this. (A "hello world" program statically linked against glibc is
168over 400k. With uClibc it's 7k. Also note that glibc dlopens libnss to do 183over 400k. With uClibc it's 7k. Also note that glibc dlopens libnss to do
169name lookups, even when otherwise statically linked.) 184name lookups, even when otherwise statically linked.)
170 185
186Why cpio rather than tar?
187-------------------------
188
189This decision was made back in December, 2001. The discussion started here:
190
191 http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1538.html
192
193And spawned a second thread (specifically on tar vs cpio), starting here:
194
195 http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1587.html
196
197The quick and dirty summary version (which is no substitute for reading
198the above threads) is:
199
2001) cpio is a standard. It's decades old (from the AT&T days), and already
201 widely used on Linux (inside RPM, Red Hat's device driver disks). Here's
202 a Linux Journal article about it from 1996:
203
204 http://www.linuxjournal.com/article/1213
205
206 It's not as popular as tar because the traditional cpio command line tools
207 require _truly_hideous_ command line arguments. But that says nothing
208 either way about the archive format, and there are alternative tools,
209 such as:
210
211 http://freshmeat.net/projects/afio/
212
2132) The cpio archive format chosen by the kernel is simpler and cleaner (and
214 thus easier to create and parse) than any of the (literally dozens of)
215 various tar archive formats. The complete initramfs archive format is
216 explained in buffer-format.txt, created in usr/gen_init_cpio.c, and
217 extracted in init/initramfs.c. All three together come to less than 26k
218 total of human-readable text.
219
2203) The GNU project standardizing on tar is approximately as relevant as
221 Windows standardizing on zip. Linux is not part of either, and is free
222 to make its own technical decisions.
223
2244) Since this is a kernel internal format, it could easily have been
225 something brand new. The kernel provides its own tools to create and
226 extract this format anyway. Using an existing standard was preferable,
227 but not essential.
228
2295) Al Viro made the decision (quote: "tar is ugly as hell and not going to be
230 supported on the kernel side"):
231
232 http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1540.html
233
234 explained his reasoning:
235
236 http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1550.html
237 http://www.uwsg.iu.edu/hypermail/linux/kernel/0112.2/1638.html
238
239 and, most importantly, designed and implemented the initramfs code.
240
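To illustrate point 2, here is a rough sketch of how little code it takes to
write and read the "newc" cpio layout (the format selected by -H newc in the
command above): a 6-byte magic, thirteen 8-digit hex fields, a NUL-terminated
name and the file data, each padded to a multiple of 4 bytes. This is an
illustration only, not the kernel's extractor or gen_init_cpio, and it
ignores details such as hard links and checksums.

```python
HEADER_LEN = 110  # 6-byte magic plus 13 fields of 8 hex digits each
FIELDS = ("ino", "mode", "uid", "gid", "nlink", "mtime", "filesize",
          "devmajor", "devminor", "rdevmajor", "rdevminor", "namesize",
          "check")


def _pad4(n):
    """Round n up to the next multiple of 4."""
    return (n + 3) & ~3


def make_entry(name, data=b"", mode=0o100644):
    """Build one entry: header, NUL-terminated name, data, with padding."""
    raw_name = name.encode() + b"\0"
    values = dict.fromkeys(FIELDS, 0)
    values.update(mode=mode, nlink=1, filesize=len(data),
                  namesize=len(raw_name))
    head = b"070701" + b"".join(b"%08X" % values[f] for f in FIELDS)
    head += raw_name
    head += b"\0" * (_pad4(len(head)) - len(head))
    return head + data + b"\0" * (_pad4(len(data)) - len(data))


def parse_archive(blob):
    """Yield (name, mode, data) for each entry before the TRAILER!!! marker."""
    pos = 0
    while pos + HEADER_LEN <= len(blob):
        header = blob[pos:pos + HEADER_LEN]
        if header[:6] != b"070701":
            raise ValueError("not a newc header at offset %d" % pos)
        hdr = {f: int(header[6 + 8 * i:14 + 8 * i], 16)
               for i, f in enumerate(FIELDS)}
        name_start = pos + HEADER_LEN
        name = blob[name_start:name_start + hdr["namesize"] - 1].decode()
        data_start = _pad4(name_start + hdr["namesize"])
        data = blob[data_start:data_start + hdr["filesize"]]
        if name == "TRAILER!!!":
            return
        yield name, hdr["mode"], data
        pos = _pad4(data_start + hdr["filesize"])


if __name__ == "__main__":
    blob = make_entry("init", b"#!/bin/sh\n", 0o100755)
    blob += make_entry("TRAILER!!!", mode=0)
    print(list(parse_archive(blob)))
```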
171Future directions: 241Future directions:
172------------------ 242------------------
173 243
diff --git a/Documentation/filesystems/relayfs.txt b/Documentation/filesystems/relayfs.txt
index d803abed29f0..5832377b7340 100644
--- a/Documentation/filesystems/relayfs.txt
+++ b/Documentation/filesystems/relayfs.txt
@@ -44,30 +44,41 @@ relayfs can operate in a mode where it will overwrite data not yet
44collected by userspace, and not wait for it to consume it. 44collected by userspace, and not wait for it to consume it.
45 45
46relayfs itself does not provide for communication of such data between 46relayfs itself does not provide for communication of such data between
47userspace and kernel, allowing the kernel side to remain simple and not 47userspace and kernel, allowing the kernel side to remain simple and
48impose a single interface on userspace. It does provide a separate 48not impose a single interface on userspace. It does provide a set of
49helper though, described below. 49examples and a separate helper though, described below.
50
51klog and relay-apps example code
52================================
53
54relayfs itself is ready to use, but to make things easier, a couple
55simple utility functions and a set of examples are provided.
56
57The relay-apps example tarball, available on the relayfs sourceforge
58site, contains a set of self-contained examples, each consisting of a
59pair of .c files containing boilerplate code for each of the user and
60kernel sides of a relayfs application; combined these two sets of
61boilerplate code provide glue to easily stream data to disk, without
62having to bother with mundane housekeeping chores.
63
64The 'klog debugging functions' patch (klog.patch in the relay-apps
65tarball) provides a couple of high-level logging functions to the
66kernel which allow writing formatted text or raw data to a channel,
67regardless of whether a channel to write into exists or not, or
68whether relayfs is compiled into the kernel or is configured as a
69module. These functions allow you to put unconditional 'trace'
70statements anywhere in the kernel or kernel modules; only when there
71is a 'klog handler' registered will data actually be logged (see the
72klog and kleak examples for details).
73
74It is of course possible to use relayfs from scratch i.e. without
75using any of the relay-apps example code or klog, but you'll have to
76implement communication between userspace and kernel, allowing both to
77convey the state of buffers (full, empty, amount of padding).
78
79klog and the relay-apps examples can be found in the relay-apps
80tarball on http://relayfs.sourceforge.net
50 81
51klog, relay-app & librelay
52==========================
53
54relayfs itself is ready to use, but to make things easier, two
55additional systems are provided. klog is a simple wrapper to make
56writing formatted text or raw data to a channel simpler, regardless of
57whether a channel to write into exists or not, or whether relayfs is
58compiled into the kernel or is configured as a module. relay-app is
59the kernel counterpart of userspace librelay.c, combined these two
60files provide glue to easily stream data to disk, without having to
61bother with housekeeping. klog and relay-app can be used together,
62with klog providing high-level logging functions to the kernel and
63relay-app taking care of kernel-user control and disk-logging chores.
64
65It is possible to use relayfs without relay-app & librelay, but you'll
66have to implement communication between userspace and kernel, allowing
67both to convey the state of buffers (full, empty, amount of padding).
68
69klog, relay-app and librelay can be found in the relay-apps tarball on
70http://relayfs.sourceforge.net
71 82
72The relayfs user space API 83The relayfs user space API
73========================== 84==========================
@@ -125,6 +136,8 @@ Here's a summary of the API relayfs provides to in-kernel clients:
125 relay_reset(chan) 136 relay_reset(chan)
126 relayfs_create_dir(name, parent) 137 relayfs_create_dir(name, parent)
127 relayfs_remove_dir(dentry) 138 relayfs_remove_dir(dentry)
139 relayfs_create_file(name, parent, mode, fops, data)
140 relayfs_remove_file(dentry)
128 141
129 channel management typically called on instigation of userspace: 142 channel management typically called on instigation of userspace:
130 143
@@ -141,6 +154,8 @@ Here's a summary of the API relayfs provides to in-kernel clients:
141 subbuf_start(buf, subbuf, prev_subbuf, prev_padding) 154 subbuf_start(buf, subbuf, prev_subbuf, prev_padding)
142 buf_mapped(buf, filp) 155 buf_mapped(buf, filp)
143 buf_unmapped(buf, filp) 156 buf_unmapped(buf, filp)
157 create_buf_file(filename, parent, mode, buf, is_global)
158 remove_buf_file(dentry)
144 159
145 helper functions: 160 helper functions:
146 161
@@ -320,6 +335,71 @@ forces a sub-buffer switch on all the channel buffers, and can be used
320to finalize and process the last sub-buffers before the channel is 335to finalize and process the last sub-buffers before the channel is
321closed. 336closed.
322 337
338Creating non-relay files
339------------------------
340
341relay_open() automatically creates files in the relayfs filesystem to
342represent the per-cpu kernel buffers; it's often useful for
343applications to be able to create their own files alongside the relay
344files in the relayfs filesystem as well e.g. 'control' files much like
345those created in /proc or debugfs for similar purposes, used to
346communicate control information between the kernel and user sides of a
347relayfs application. For this purpose the relayfs_create_file() and
348relayfs_remove_file() API functions exist. For relayfs_create_file(),
349the caller passes in a set of user-defined file operations to be used
350for the file and an optional void * to a user-specified data item,
351which will be accessible via inode->u.generic_ip (see the relay-apps
352tarball for examples). The file_operations are a required parameter
353to relayfs_create_file() and thus the semantics of these files are
354completely defined by the caller.
355
356See the relay-apps tarball at http://relayfs.sourceforge.net for
357examples of how these non-relay files are meant to be used.
358
359Creating relay files in other filesystems
360-----------------------------------------
361
362By default of course, relay_open() creates relay files in the relayfs
363filesystem. Because relay_file_operations is exported, however, it's
364also possible to create and use relay files in other pseudo-filesystems
365such as debugfs.
366
367For this purpose, two callback functions are provided,
368create_buf_file() and remove_buf_file(). create_buf_file() is called
369once for each per-cpu buffer from relay_open() to allow the client to
370create a file to be used to represent the corresponding buffer; if
371this callback is not defined, the default implementation will create
372and return a file in the relayfs filesystem to represent the buffer.
373The callback should return the dentry of the file created to represent
374the relay buffer. Note that the parent directory passed to
375relay_open() (and passed along to the callback), if specified, must
376exist in the same filesystem the new relay file is created in. If
377create_buf_file() is defined, remove_buf_file() must also be defined;
378it's responsible for deleting the file(s) created in create_buf_file()
379and is called during relay_close().
380
381The create_buf_file() implementation can also be defined in such a way
382as to allow the creation of a single 'global' buffer instead of the
383default per-cpu set. This can be useful for applications interested
384mainly in seeing the relative ordering of system-wide events without
385the need to bother with saving explicit timestamps for the purpose of
386merging/sorting per-cpu files in a postprocessing step.
387
388To have relay_open() create a global buffer, the create_buf_file()
389implementation should set the value of the is_global outparam to a
390non-zero value in addition to creating the file that will be used to
391represent the single buffer. In the case of a global buffer,
392create_buf_file() and remove_buf_file() will be called only once. The
393normal channel-writing functions e.g. relay_write() can still be used
394- writes from any cpu will transparently end up in the global buffer -
395but since it is a global buffer, callers should make sure they use the
396proper locking for such a buffer, either by wrapping writes in a
397spinlock, or by copying a write function from relayfs_fs.h and
398creating a local version that internally does the proper locking.
399
400See the 'exported-relayfile' examples in the relay-apps tarball for
401examples of creating and using relay files in debugfs.
402
323Misc 403Misc
324---- 404----
325 405
diff --git a/Documentation/keys-request-key.txt b/Documentation/keys-request-key.txt
index 5f2b9c5edbb5..22488d791168 100644
--- a/Documentation/keys-request-key.txt
+++ b/Documentation/keys-request-key.txt
@@ -56,10 +56,12 @@ A request proceeds in the following manner:
56 (4) request_key() then forks and executes /sbin/request-key with a new session 56 (4) request_key() then forks and executes /sbin/request-key with a new session
57 keyring that contains a link to auth key V. 57 keyring that contains a link to auth key V.
58 58
59 (5) /sbin/request-key execs an appropriate program to perform the actual 59 (5) /sbin/request-key assumes the authority associated with key U.
60
61 (6) /sbin/request-key execs an appropriate program to perform the actual
60 instantiation. 62 instantiation.
61 63
62 (6) The program may want to access another key from A's context (say a 64 (7) The program may want to access another key from A's context (say a
63 Kerberos TGT key). It just requests the appropriate key, and the keyring 65 Kerberos TGT key). It just requests the appropriate key, and the keyring
64 search notes that the session keyring has auth key V in its bottom level. 66 search notes that the session keyring has auth key V in its bottom level.
65 67
@@ -67,19 +69,19 @@ A request proceeds in the following manner:
67 UID, GID, groups and security info of process A as if it was process A, 69 UID, GID, groups and security info of process A as if it was process A,
68 and come up with key W. 70 and come up with key W.
69 71
70 (7) The program then does what it must to get the data with which to 72 (8) The program then does what it must to get the data with which to
71 instantiate key U, using key W as a reference (perhaps it contacts a 73 instantiate key U, using key W as a reference (perhaps it contacts a
72 Kerberos server using the TGT) and then instantiates key U. 74 Kerberos server using the TGT) and then instantiates key U.
73 75
74 (8) Upon instantiating key U, auth key V is automatically revoked so that it 76 (9) Upon instantiating key U, auth key V is automatically revoked so that it
75 may not be used again. 77 may not be used again.
76 78
77 (9) The program then exits 0 and request_key() deletes key V and returns key 79(10) The program then exits 0 and request_key() deletes key V and returns key
78 U to the caller. 80 U to the caller.
79 81
80This also extends further. If key W (step 5 above) didn't exist, key W would be 82This also extends further. If key W (step 7 above) didn't exist, key W would be
81created uninstantiated, another auth key (X) would be created [as per step 3] 83created uninstantiated, another auth key (X) would be created (as per step 3)
82and another copy of /sbin/request-key spawned [as per step 4]; but the context 84and another copy of /sbin/request-key spawned (as per step 4); but the context
83specified by auth key X will still be process A, as it was in auth key V. 85specified by auth key X will still be process A, as it was in auth key V.
84 86
85This is because process A's keyrings can't simply be attached to 87This is because process A's keyrings can't simply be attached to
@@ -138,8 +140,8 @@ until one succeeds:
138 140
139 (3) The process's session keyring is searched. 141 (3) The process's session keyring is searched.
140 142
141 (4) If the process has a request_key() authorisation key in its session 143 (4) If the process has assumed the authority associated with a request_key()
142 keyring then: 144 authorisation key then:
143 145
144 (a) If extant, the calling process's thread keyring is searched. 146 (a) If extant, the calling process's thread keyring is searched.
145 147
diff --git a/Documentation/keys.txt b/Documentation/keys.txt
index 6304db59bfe4..aaa01b0e3ee9 100644
--- a/Documentation/keys.txt
+++ b/Documentation/keys.txt
@@ -308,6 +308,8 @@ process making the call:
308 KEY_SPEC_USER_KEYRING -4 UID-specific keyring 308 KEY_SPEC_USER_KEYRING -4 UID-specific keyring
309 KEY_SPEC_USER_SESSION_KEYRING -5 UID-session keyring 309 KEY_SPEC_USER_SESSION_KEYRING -5 UID-session keyring
310 KEY_SPEC_GROUP_KEYRING -6 GID-specific keyring 310 KEY_SPEC_GROUP_KEYRING -6 GID-specific keyring
311 KEY_SPEC_REQKEY_AUTH_KEY -7 assumed request_key()
312 authorisation key
311 313
312 314
313The main syscalls are: 315The main syscalls are:
@@ -498,7 +500,11 @@ The keyctl syscall functions are:
498 keyring is full, error ENFILE will result. 500 keyring is full, error ENFILE will result.
499 501
500 The link procedure checks the nesting of the keyrings, returning ELOOP if 502 The link procedure checks the nesting of the keyrings, returning ELOOP if
501 it appears to deep or EDEADLK if the link would introduce a cycle. 503 it appears too deep or EDEADLK if the link would introduce a cycle.
504
505 Any links within the keyring to keys that match the new key in terms of
506 type and description will be discarded from the keyring as the new one is
507 added.
502 508
503 509
504 (*) Unlink a key or keyring from another keyring: 510 (*) Unlink a key or keyring from another keyring:
@@ -628,6 +634,41 @@ The keyctl syscall functions are:
628 there is one, otherwise the user default session keyring. 634 there is one, otherwise the user default session keyring.
629 635
630 636
637 (*) Set the timeout on a key.
638
639 long keyctl(KEYCTL_SET_TIMEOUT, key_serial_t key, unsigned timeout);
640
641 This sets or clears the timeout on a key. The timeout can be 0 to clear
642 the timeout or a number of seconds to set the expiry time that far into
643 the future.
644
645 The process must have attribute modification access on a key to set its
646 timeout. Timeouts may not be set with this function on negative, revoked
647 or expired keys.
648
649
650 (*) Assume the authority granted to instantiate a key
651
652 long keyctl(KEYCTL_ASSUME_AUTHORITY, key_serial_t key);
653
654 This assumes or divests the authority required to instantiate the
655 specified key. Authority can only be assumed if the thread has the
656 authorisation key associated with the specified key in its keyrings
657 somewhere.
658
659 Once authority is assumed, searches for keys will also search the
660 requester's keyrings using the requester's security label, UID, GID and
661 groups.
662
663 If the requested authority is unavailable, error EPERM will be returned,
664 likewise if the authority has been revoked because the target key is
665 already instantiated.
666
667 If the specified key is 0, then any assumed authority will be divested.
668
669 The assumed authoritative key is inherited across fork and exec.
670
671
631=============== 672===============
632KERNEL SERVICES 673KERNEL SERVICES
633=============== 674===============
diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index b0fe41da007b..8d8b4e5ea184 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -945,7 +945,6 @@ bond0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4
945 collisions:0 txqueuelen:0 945 collisions:0 txqueuelen:0
946 946
947eth0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4 947eth0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4
948 inet addr:XXX.XXX.XXX.YYY Bcast:XXX.XXX.XXX.255 Mask:255.255.252.0
949 UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 948 UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
950 RX packets:3573025 errors:0 dropped:0 overruns:0 frame:0 949 RX packets:3573025 errors:0 dropped:0 overruns:0 frame:0
951 TX packets:1643167 errors:1 dropped:0 overruns:1 carrier:0 950 TX packets:1643167 errors:1 dropped:0 overruns:1 carrier:0
@@ -953,7 +952,6 @@ eth0 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4
953 Interrupt:10 Base address:0x1080 952 Interrupt:10 Base address:0x1080
954 953
955eth1 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4 954eth1 Link encap:Ethernet HWaddr 00:C0:F0:1F:37:B4
956 inet addr:XXX.XXX.XXX.YYY Bcast:XXX.XXX.XXX.255 Mask:255.255.252.0
957 UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1 955 UP BROADCAST RUNNING SLAVE MULTICAST MTU:1500 Metric:1
958 RX packets:3651769 errors:0 dropped:0 overruns:0 frame:0 956 RX packets:3651769 errors:0 dropped:0 overruns:0 frame:0
959 TX packets:1643480 errors:0 dropped:0 overruns:0 carrier:0 957 TX packets:1643480 errors:0 dropped:0 overruns:0 carrier:0
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 2f1aae32a5d9..6910c0136f8d 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -26,12 +26,13 @@ Currently, these files are in /proc/sys/vm:
26- min_free_kbytes 26- min_free_kbytes
27- laptop_mode 27- laptop_mode
28- block_dump 28- block_dump
29- drop_caches
29 30
30============================================================== 31==============================================================
31 32
32dirty_ratio, dirty_background_ratio, dirty_expire_centisecs, 33dirty_ratio, dirty_background_ratio, dirty_expire_centisecs,
33dirty_writeback_centisecs, vfs_cache_pressure, laptop_mode, 34dirty_writeback_centisecs, vfs_cache_pressure, laptop_mode,
34block_dump, swap_token_timeout: 35block_dump, swap_token_timeout, drop_caches:
35 36
36See Documentation/filesystems/proc.txt 37See Documentation/filesystems/proc.txt
37 38
@@ -102,3 +103,20 @@ This is used to force the Linux VM to keep a minimum number
102of kilobytes free. The VM uses this number to compute a pages_min 103of kilobytes free. The VM uses this number to compute a pages_min
103value for each lowmem zone in the system. Each lowmem zone gets 104value for each lowmem zone in the system. Each lowmem zone gets
104a number of reserved free pages based proportionally on its size. 105a number of reserved free pages based proportionally on its size.
106
107==============================================================
108
109percpu_pagelist_fraction
110
111This is the maximum fraction of the pages in each zone (the high mark,
112pcp->high) that may be allocated to each per cpu page list. The minimum value
113is 8, which means that no more than 1/8th of the pages in each zone can be
114allocated in any single per_cpu_pagelist. This entry only changes the value
115of hot per cpu pagelists. A user can specify a number like 100 to allocate
1161/100th of each zone to each per cpu page list.
117
118The batch value of each per cpu pagelist is also updated as a result. It is
119set to pcp->high/4. The upper limit of batch is (PAGE_SHIFT * 8).
120
121The initial value is zero. The kernel does not use this value at boot time to set
122the high water marks for each per cpu page list.
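
The arithmetic described above can be sketched as follows. This assumes that
the high mark is simply the number of pages in the zone divided by the
fraction, as the text implies, and that the batch is never allowed to drop
below 1 (an assumption not stated in the text); the function name and the
default PAGE_SHIFT of 12 (4 KB pages) are choices made for this example.

```python
# Sketch of the high/batch arithmetic; not the kernel implementation.
def pcp_marks(zone_pages, fraction, page_shift=12):
    """Return (high, batch) for one per cpu page list."""
    if fraction < 8:
        raise ValueError("the minimum value for the fraction is 8")
    high = zone_pages // fraction
    batch = max(1, high // 4)            # lower bound of 1 is an assumption
    batch = min(batch, page_shift * 8)   # documented upper limit
    return high, batch


if __name__ == "__main__":
    # A zone of 262144 pages (1 GB with 4 KB pages) and a fraction of 100.
    print(pcp_marks(262144, 100))
```

For that zone a fraction of 100 gives a high mark of 2621 pages, and the
batch is capped at PAGE_SHIFT * 8 = 96.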