Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/CodingStyle 100
-rw-r--r-- Documentation/DocBook/kernel-api.tmpl 13
-rw-r--r-- Documentation/RCU/whatisRCU.txt 1
-rw-r--r-- Documentation/SubmitChecklist 57
-rw-r--r-- Documentation/devices.txt 135
-rw-r--r-- Documentation/feature-removal-schedule.txt 15
-rw-r--r-- Documentation/filesystems/Locking 9
-rw-r--r-- Documentation/filesystems/porting 7
-rw-r--r-- Documentation/filesystems/vfs.txt 6
-rw-r--r-- Documentation/ia64/aliasing.txt 208
-rw-r--r-- Documentation/ioctl-number.txt 2
-rw-r--r-- Documentation/kernel-parameters.txt 3
-rw-r--r-- Documentation/networking/tuntap.txt 11
-rw-r--r-- Documentation/power/swsusp.txt 45
-rw-r--r-- Documentation/power/video.txt 4
-rw-r--r-- Documentation/sparse.txt 36
-rw-r--r-- Documentation/sysctl/vm.txt 13
-rw-r--r-- Documentation/vm/page_migration 114
18 files changed, 624 insertions, 155 deletions
diff --git a/Documentation/CodingStyle b/Documentation/CodingStyle
index ce5d2c038cf5..6d2412ec91ed 100644
--- a/Documentation/CodingStyle
+++ b/Documentation/CodingStyle
@@ -155,7 +155,83 @@ problem, which is called the function-growth-hormone-imbalance syndrome.
155See next chapter. 155See next chapter.
156 156
157 157
158 Chapter 5: Functions 158 Chapter 5: Typedefs
159
160Please don't use things like "vps_t".
161
162It's a _mistake_ to use typedef for structures and pointers. When you see a
163
164 vps_t a;
165
166in the source, what does it mean?
167
168In contrast, if it says
169
170 struct virtual_container *a;
171
172you can actually tell what "a" is.
173
174Lots of people think that typedefs "help readability". Not so. They are
175useful only for:
176
177 (a) totally opaque objects (where the typedef is actively used to _hide_
178 what the object is).
179
180 Example: "pte_t" etc. opaque objects that you can only access using
181 the proper accessor functions.
182
183 NOTE! Opaqueness and "accessor functions" are not good in themselves.
184 The reason we have them for things like pte_t etc. is that there
185 really is absolutely _zero_ portably accessible information there.
186
187 (b) Clear integer types, where the abstraction _helps_ avoid confusion
188 whether it is "int" or "long".
189
190 u8/u16/u32 are perfectly fine typedefs, although they fit into
191 category (d) better than here.
192
193 NOTE! Again - there needs to be a _reason_ for this. If something is
194 "unsigned long", then there's no reason to do
195
196 typedef unsigned long myflags_t;
197
198 but if there is a clear reason for why it under certain circumstances
199 might be an "unsigned int" and under other configurations might be
200 "unsigned long", then by all means go ahead and use a typedef.
201
202 (c) when you use sparse to literally create a _new_ type for
203 type-checking.
204
205 (d) New types which are identical to standard C99 types, in certain
206 exceptional circumstances.
207
208 Although it would only take a short amount of time for the eyes and
209 brain to become accustomed to the standard types like 'uint32_t',
210 some people object to their use anyway.
211
212 Therefore, the Linux-specific 'u8/u16/u32/u64' types and their
213 signed equivalents which are identical to standard types are
214 permitted -- although they are not mandatory in new code of your
215 own.
216
217 When editing existing code which already uses one or the other set
218 of types, you should conform to the existing choices in that code.
219
220 (e) Types safe for use in userspace.
221
222 In certain structures which are visible to userspace, we cannot
223 require C99 types and cannot use the 'u32' form above. Thus, we
224 use __u32 and similar types in all structures which are shared
225 with userspace.
226
227Maybe there are other cases too, but the rule should basically be to NEVER
228EVER use a typedef unless you can clearly match one of those rules.
229
230In general, a pointer, or a struct that has elements that can reasonably
231be directly accessed should _never_ be a typedef.
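
As an illustration of the typedef rules above, here is a minimal sketch in
plain userspace C. Only "struct virtual_container" comes from the text; its
members and every demo_* name are hypothetical and exist only for this
example, and <stdint.h> stands in for the kernel's own integer types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Preferred: a plain struct, so a declaration tells the reader what "a" is.
 * The members are made up for this sketch. */
struct virtual_container {
	int id;
	int nr_tasks;
};

/* Category (a): an opaque object that is only touched through accessors.
 * demo_pte_t, demo_mk_pte() and demo_pte_val() are hypothetical names. */
typedef struct { unsigned long pte; } demo_pte_t;

static inline demo_pte_t demo_mk_pte(unsigned long val)
{
	demo_pte_t p = { val };
	return p;
}

static inline unsigned long demo_pte_val(demo_pte_t p)
{
	return p.pte;
}

/* Category (d): a new name that is identical to a standard C99 type. */
typedef uint32_t demo_u32;

/* The parameter type spells out "pointer to struct virtual_container". */
static inline int container_is_empty(const struct virtual_container *a)
{
	assert(a != NULL);
	return a->nr_tasks == 0;
}
```

Note that the struct and the pointer to it are never hidden behind a typedef,
while the opaque demo_pte_t is, because callers are not supposed to look
inside it.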
232
233
234 Chapter 6: Functions
159 235
160Functions should be short and sweet, and do just one thing. They should 236Functions should be short and sweet, and do just one thing. They should
161fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24, 237fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24,
@@ -183,7 +259,7 @@ and it gets confused. You know you're brilliant, but maybe you'd like
183to understand what you did 2 weeks from now. 259to understand what you did 2 weeks from now.
184 260
185 261
186 Chapter 6: Centralized exiting of functions 262 Chapter 7: Centralized exiting of functions
187 263
188Albeit deprecated by some people, the equivalent of the goto statement is 264Albeit deprecated by some people, the equivalent of the goto statement is
189used frequently by compilers in form of the unconditional jump instruction. 265used frequently by compilers in form of the unconditional jump instruction.
@@ -220,7 +296,7 @@ out:
220 return result; 296 return result;
221} 297}
222 298
223 Chapter 7: Commenting 299 Chapter 8: Commenting
224 300
225Comments are good, but there is also a danger of over-commenting. NEVER 301Comments are good, but there is also a danger of over-commenting. NEVER
226try to explain HOW your code works in a comment: it's much better to 302try to explain HOW your code works in a comment: it's much better to
@@ -240,7 +316,7 @@ When commenting the kernel API functions, please use the kerneldoc format.
240See the files Documentation/kernel-doc-nano-HOWTO.txt and scripts/kernel-doc 316See the files Documentation/kernel-doc-nano-HOWTO.txt and scripts/kernel-doc
241for details. 317for details.
242 318
243 Chapter 8: You've made a mess of it 319 Chapter 9: You've made a mess of it
244 320
245That's OK, we all do. You've probably been told by your long-time Unix 321That's OK, we all do. You've probably been told by your long-time Unix
246user helper that "GNU emacs" automatically formats the C sources for 322user helper that "GNU emacs" automatically formats the C sources for
@@ -288,7 +364,7 @@ re-formatting you may want to take a look at the man page. But
288remember: "indent" is not a fix for bad programming. 364remember: "indent" is not a fix for bad programming.
289 365
290 366
291 Chapter 9: Configuration-files 367 Chapter 10: Configuration-files
292 368
293For configuration options (arch/xxx/Kconfig, and all the Kconfig files), 369For configuration options (arch/xxx/Kconfig, and all the Kconfig files),
294somewhat different indentation is used. 370somewhat different indentation is used.
@@ -313,7 +389,7 @@ support for file-systems, for instance) should be denoted (DANGEROUS), other
313experimental options should be denoted (EXPERIMENTAL). 389experimental options should be denoted (EXPERIMENTAL).
314 390
315 391
316 Chapter 10: Data structures 392 Chapter 11: Data structures
317 393
318Data structures that have visibility outside the single-threaded 394Data structures that have visibility outside the single-threaded
319environment they are created and destroyed in should always have 395environment they are created and destroyed in should always have
@@ -344,7 +420,7 @@ Remember: if another thread can find your data structure, and you don't
344have a reference count on it, you almost certainly have a bug. 420have a reference count on it, you almost certainly have a bug.
345 421
346 422
347 Chapter 11: Macros, Enums and RTL 423 Chapter 12: Macros, Enums and RTL
348 424
349Names of macros defining constants and labels in enums are capitalized. 425Names of macros defining constants and labels in enums are capitalized.
350 426
@@ -399,7 +475,7 @@ The cpp manual deals with macros exhaustively. The gcc internals manual also
399covers RTL which is used frequently with assembly language in the kernel. 475covers RTL which is used frequently with assembly language in the kernel.
400 476
401 477
402 Chapter 12: Printing kernel messages 478 Chapter 13: Printing kernel messages
403 479
404Kernel developers like to be seen as literate. Do mind the spelling 480Kernel developers like to be seen as literate. Do mind the spelling
405of kernel messages to make a good impression. Do not use crippled 481of kernel messages to make a good impression. Do not use crippled
@@ -410,7 +486,7 @@ Kernel messages do not have to be terminated with a period.
410Printing numbers in parentheses (%d) adds no value and should be avoided. 486Printing numbers in parentheses (%d) adds no value and should be avoided.
411 487
412 488
413 Chapter 13: Allocating memory 489 Chapter 14: Allocating memory
414 490
415The kernel provides the following general purpose memory allocators: 491The kernel provides the following general purpose memory allocators:
416kmalloc(), kzalloc(), kcalloc(), and vmalloc(). Please refer to the API 492kmalloc(), kzalloc(), kcalloc(), and vmalloc(). Please refer to the API
@@ -429,7 +505,7 @@ from void pointer to any other pointer type is guaranteed by the C programming
429language. 505language.
430 506
431 507
432 Chapter 14: The inline disease 508 Chapter 15: The inline disease
433 509
434There appears to be a common misperception that gcc has a magic "make me 510There appears to be a common misperception that gcc has a magic "make me
435faster" speedup option called "inline". While the use of inlines can be 511faster" speedup option called "inline". While the use of inlines can be
@@ -457,7 +533,7 @@ something it would have done anyway.
457 533
458 534
459 535
460 Chapter 15: References 536 Appendix I: References
461 537
462The C Programming Language, Second Edition 538The C Programming Language, Second Edition
463by Brian W. Kernighan and Dennis M. Ritchie. 539by Brian W. Kernighan and Dennis M. Ritchie.
@@ -481,4 +557,4 @@ Kernel CodingStyle, by greg@kroah.com at OLS 2002:
481http://www.kroah.com/linux/talks/ols_2002_kernel_codingstyle_talk/html/ 557http://www.kroah.com/linux/talks/ols_2002_kernel_codingstyle_talk/html/
482 558
483-- 559--
484Last updated on 30 December 2005 by a community effort on LKML. 560Last updated on 30 April 2006.
diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl
index ca02e04a906c..31b727ceb127 100644
--- a/Documentation/DocBook/kernel-api.tmpl
+++ b/Documentation/DocBook/kernel-api.tmpl
@@ -117,6 +117,7 @@ X!Ilib/string.c
117 <chapter id="mm"> 117 <chapter id="mm">
118 <title>Memory Management in Linux</title> 118 <title>Memory Management in Linux</title>
119 <sect1><title>The Slab Cache</title> 119 <sect1><title>The Slab Cache</title>
120!Iinclude/linux/slab.h
120!Emm/slab.c 121!Emm/slab.c
121 </sect1> 122 </sect1>
122 <sect1><title>User Space Memory Access</title> 123 <sect1><title>User Space Memory Access</title>
@@ -331,6 +332,18 @@ X!Earch/i386/kernel/mca.c
331!Esecurity/security.c 332!Esecurity/security.c
332 </chapter> 333 </chapter>
333 334
335 <chapter id="audit">
336 <title>Audit Interfaces</title>
337!Ekernel/audit.c
338!Ikernel/auditsc.c
339!Ikernel/auditfilter.c
340 </chapter>
341
342 <chapter id="accounting">
343 <title>Accounting Framework</title>
344!Ikernel/acct.c
345 </chapter>
346
334 <chapter id="pmfuncs"> 347 <chapter id="pmfuncs">
335 <title>Power Management</title> 348 <title>Power Management</title>
336!Ekernel/power/pm.c 349!Ekernel/power/pm.c
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index 07cb93b82ba9..6e459420ee9f 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -790,7 +790,6 @@ RCU pointer update:
790 790
791RCU grace period: 791RCU grace period:
792 792
793 synchronize_kernel (deprecated)
794 synchronize_net 793 synchronize_net
795 synchronize_sched 794 synchronize_sched
796 synchronize_rcu 795 synchronize_rcu
diff --git a/Documentation/SubmitChecklist b/Documentation/SubmitChecklist
new file mode 100644
index 000000000000..8230098da529
--- /dev/null
+++ b/Documentation/SubmitChecklist
@@ -0,0 +1,57 @@
1Linux Kernel patch submittal checklist
2~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
3
4Here are some basic things that developers should do if they
5want to see their kernel patch submittals accepted quicker.
6
7These are all above and beyond the documentation that is provided
8in Documentation/SubmittingPatches and elsewhere about submitting
9Linux kernel patches.
10
11
12
13- Builds cleanly with applicable or modified CONFIG options =y, =m, and =n.
14 No gcc warnings/errors, no linker warnings/errors.
15
16- Passes allnoconfig, allmodconfig
17
18- Builds on multiple CPU arch-es by using local cross-compile tools
19 or something like PLM at OSDL.
20
21- ppc64 is a good architecture for cross-compilation checking because it
22 tends to use `unsigned long' for 64-bit quantities.
23
24- Matches kernel coding style(!)
25
26- Any new or modified CONFIG options don't muck up the config menu.
27
28- All new Kconfig options have help text.
29
30- Has been carefully reviewed with respect to relevant Kconfig
31 combinations. This is very hard to get right with testing --
32 brainpower pays off here.
33
34- Check cleanly with sparse.
35
36- Use 'make checkstack' and 'make namespacecheck' and fix any
37 problems that they find. Note: checkstack does not point out
38 problems explicitly, but any one function that uses more than
39 512 bytes on the stack is a candidate for change.
40
41- Include kernel-doc to document global kernel APIs. (Not required
42 for static functions, but OK there also.) Use 'make htmldocs'
43 or 'make mandocs' to check the kernel-doc and fix any issues.
44
45- Has been tested with CONFIG_PREEMPT, CONFIG_DEBUG_PREEMPT,
46 CONFIG_DEBUG_SLAB, CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_MUTEXES,
47 CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_SPINLOCK_SLEEP all simultaneously
48 enabled.
49
50- Has been build- and runtime tested with and without CONFIG_SMP and
51 CONFIG_PREEMPT.
52
53- If the patch affects IO/Disk, etc: has been tested with and without
54 CONFIG_LBD.
55
56
572006-APR-27
diff --git a/Documentation/devices.txt b/Documentation/devices.txt
index b369a8c46a73..b2f593fc76ca 100644
--- a/Documentation/devices.txt
+++ b/Documentation/devices.txt
@@ -3,7 +3,7 @@
3 3
4 Maintained by Torben Mathiasen <device@lanana.org> 4 Maintained by Torben Mathiasen <device@lanana.org>
5 5
6 Last revised: 25 January 2005 6 Last revised: 01 March 2006
7 7
8This list is the Linux Device List, the official registry of allocated 8This list is the Linux Device List, the official registry of allocated
9device numbers and /dev directory nodes for the Linux operating 9device numbers and /dev directory nodes for the Linux operating
@@ -94,7 +94,6 @@ Your cooperation is appreciated.
94 9 = /dev/urandom Faster, less secure random number gen. 94 9 = /dev/urandom Faster, less secure random number gen.
95 10 = /dev/aio Asynchronous I/O notification interface 95 10 = /dev/aio Asynchronous I/O notification interface
96 11 = /dev/kmsg Writes to this come out as printk's 96 11 = /dev/kmsg Writes to this come out as printk's
97 12 = /dev/oldmem Access to crash dump from kexec kernel
98 1 block RAM disk 97 1 block RAM disk
99 0 = /dev/ram0 First RAM disk 98 0 = /dev/ram0 First RAM disk
100 1 = /dev/ram1 Second RAM disk 99 1 = /dev/ram1 Second RAM disk
@@ -262,13 +261,13 @@ Your cooperation is appreciated.
262 NOTE: These devices permit both read and write access. 261 NOTE: These devices permit both read and write access.
263 262
264 7 block Loopback devices 263 7 block Loopback devices
265 0 = /dev/loop0 First loopback device 264 0 = /dev/loop0 First loop device
266 1 = /dev/loop1 Second loopback device 265 1 = /dev/loop1 Second loop device
267 ... 266 ...
268 267
269 The loopback devices are used to mount filesystems not 268 The loop devices are used to mount filesystems not
270 associated with block devices. The binding to the 269 associated with block devices. The binding to the
271 loopback devices is handled by mount(8) or losetup(8). 270 loop devices is handled by mount(8) or losetup(8).
272 271
273 8 block SCSI disk devices (0-15) 272 8 block SCSI disk devices (0-15)
274 0 = /dev/sda First SCSI disk whole disk 273 0 = /dev/sda First SCSI disk whole disk
@@ -943,7 +942,7 @@ Your cooperation is appreciated.
943 240 = /dev/ftlp FTL on 16th Memory Technology Device 942 240 = /dev/ftlp FTL on 16th Memory Technology Device
944 943
945 Partitions are handled in the same way as for IDE 944 Partitions are handled in the same way as for IDE
946 disks (see major number 3) expect that the partition 945 disks (see major number 3) except that the partition
947 limit is 15 rather than 63 per disk (same as SCSI.) 946 limit is 15 rather than 63 per disk (same as SCSI.)
948 947
949 45 char isdn4linux ISDN BRI driver 948 45 char isdn4linux ISDN BRI driver
@@ -1168,7 +1167,7 @@ Your cooperation is appreciated.
1168 The filename of the encrypted container and the passwords 1167 The filename of the encrypted container and the passwords
1169 are sent via ioctls (using the sdmount tool) to the master 1168 are sent via ioctls (using the sdmount tool) to the master
1170 node which then activates them via one of the 1169 node which then activates them via one of the
1171 /dev/scramdisk/x nodes for loopback mounting (all handled 1170 /dev/scramdisk/x nodes for loop mounting (all handled
1172 through the sdmount tool). 1171 through the sdmount tool).
1173 1172
1174 Requested by: andy@scramdisklinux.org 1173 Requested by: andy@scramdisklinux.org
@@ -2538,18 +2537,32 @@ Your cooperation is appreciated.
2538 0 = /dev/usb/lp0 First USB printer 2537 0 = /dev/usb/lp0 First USB printer
2539 ... 2538 ...
2540 15 = /dev/usb/lp15 16th USB printer 2539 15 = /dev/usb/lp15 16th USB printer
2541 16 = /dev/usb/mouse0 First USB mouse
2542 ...
2543 31 = /dev/usb/mouse15 16th USB mouse
2544 32 = /dev/usb/ez0 First USB firmware loader
2545 ...
2546 47 = /dev/usb/ez15 16th USB firmware loader
2547 48 = /dev/usb/scanner0 First USB scanner 2540 48 = /dev/usb/scanner0 First USB scanner
2548 ... 2541 ...
2549 63 = /dev/usb/scanner15 16th USB scanner 2542 63 = /dev/usb/scanner15 16th USB scanner
2550 64 = /dev/usb/rio500 Diamond Rio 500 2543 64 = /dev/usb/rio500 Diamond Rio 500
2551 65 = /dev/usb/usblcd USBLCD Interface (info@usblcd.de) 2544 65 = /dev/usb/usblcd USBLCD Interface (info@usblcd.de)
2552 66 = /dev/usb/cpad0 Synaptics cPad (mouse/LCD) 2545 66 = /dev/usb/cpad0 Synaptics cPad (mouse/LCD)
2546 96 = /dev/usb/hiddev0 1st USB HID device
2547 ...
2548 111 = /dev/usb/hiddev15 16th USB HID device
2549 112 = /dev/usb/auer0 1st auerswald ISDN device
2550 ...
2551 127 = /dev/usb/auer15 16th auerswald ISDN device
2552 128 = /dev/usb/brlvgr0 First Braille Voyager device
2553 ...
2554 131 = /dev/usb/brlvgr3 Fourth Braille Voyager device
2555 132 = /dev/usb/idmouse ID Mouse (fingerprint scanner) device
2556 133 = /dev/usb/sisusbvga1 First SiSUSB VGA device
2557 ...
2558 140 = /dev/usb/sisusbvga8 Eighth SiSUSB VGA device
2559 144 = /dev/usb/lcd USB LCD device
2560 160 = /dev/usb/legousbtower0 1st USB Legotower device
2561 ...
2562 175 = /dev/usb/legousbtower15 16th USB Legotower device
2563 240 = /dev/usb/dabusb0 First dabusb device
2564 ...
2565 243 = /dev/usb/dabusb3 Fourth dabusb device
2553 2566
2554180 block USB block devices 2567180 block USB block devices
2555 0 = /dev/uba First USB block device 2568 0 = /dev/uba First USB block device
@@ -2710,6 +2723,17 @@ Your cooperation is appreciated.
2710 1 = /dev/cpu/1/msr MSRs on CPU 1 2723 1 = /dev/cpu/1/msr MSRs on CPU 1
2711 ... 2724 ...
2712 2725
2726202 block Xen Virtual Block Device
2727 0 = /dev/xvda First Xen VBD whole disk
2728 16 = /dev/xvdb Second Xen VBD whole disk
2729 32 = /dev/xvdc Third Xen VBD whole disk
2730 ...
2731 240 = /dev/xvdp Sixteenth Xen VBD whole disk
2732
2733 Partitions are handled in the same way as for IDE
2734 disks (see major number 3) except that the limit on
2735 partitions is 15.
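
The minor layout listed for major 202 can be sketched as simple arithmetic:
each disk owns a block of 16 minors, the first for the whole disk and the
rest for up to 15 partitions. The helper names and macros below are
hypothetical and only mirror the table; this is not code from any driver.

```c
#include <assert.h>

#define XVD_MAJOR		202	/* major number from the list above */
#define XVD_MINORS_PER_DISK	16	/* whole disk plus 15 partitions */
#define XVD_MAX_DISKS		16	/* xvda through xvdp */

/* Minor for a disk index (0 = xvda) and partition (0 = whole disk). */
static inline unsigned int xvd_minor(unsigned int disk, unsigned int part)
{
	assert(disk < XVD_MAX_DISKS);
	assert(part < XVD_MINORS_PER_DISK);
	return disk * XVD_MINORS_PER_DISK + part;
}

/* Drive letter for a minor: 0..15 -> 'a', 16..31 -> 'b', and so on. */
static inline char xvd_letter(unsigned int minor)
{
	return (char)('a' + minor / XVD_MINORS_PER_DISK);
}
```

With these helpers, xvd_minor(1, 0) gives 16 for /dev/xvdb and
xvd_minor(15, 0) gives 240 for /dev/xvdp, matching the entries above.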
2736
2713203 char CPU CPUID information 2737203 char CPU CPUID information
2714 0 = /dev/cpu/0/cpuid CPUID on CPU 0 2738 0 = /dev/cpu/0/cpuid CPUID on CPU 0
2715 1 = /dev/cpu/1/cpuid CPUID on CPU 1 2739 1 = /dev/cpu/1/cpuid CPUID on CPU 1
@@ -2747,11 +2771,26 @@ Your cooperation is appreciated.
2747 46 = /dev/ttyCPM0 PPC CPM (SCC or SMC) - port 0 2771 46 = /dev/ttyCPM0 PPC CPM (SCC or SMC) - port 0
2748 ... 2772 ...
2749 47 = /dev/ttyCPM5 PPC CPM (SCC or SMC) - port 5 2773 47 = /dev/ttyCPM5 PPC CPM (SCC or SMC) - port 5
2750 50 = /dev/ttyIOC40 Altix serial card 2774 50 = /dev/ttyIOC0 Altix serial card
2775 ...
2776 81 = /dev/ttyIOC31 Altix serial card
2777 82 = /dev/ttyVR0 NEC VR4100 series SIU
2778 83 = /dev/ttyVR1 NEC VR4100 series DSIU
2779 84 = /dev/ttyIOC84 Altix ioc4 serial card
2780 ...
2781 115 = /dev/ttyIOC115 Altix ioc4 serial card
2782 116 = /dev/ttySIOC0 Altix ioc3 serial card
2783 ...
2784 147 = /dev/ttySIOC31 Altix ioc3 serial card
2785 148 = /dev/ttyPSC0 PPC PSC - port 0
2786 ...
2787 153 = /dev/ttyPSC5 PPC PSC - port 5
2788 154 = /dev/ttyAT0 ATMEL serial port 0
2751 ... 2789 ...
2752 81 = /dev/ttyIOC431 Altix serial card 2790 169 = /dev/ttyAT15 ATMEL serial port 15
2753 82 = /dev/ttyVR0 NEC VR4100 series SIU 2791 170 = /dev/ttyNX0 Hilscher netX serial port 0
2754 83 = /dev/ttyVR1 NEC VR4100 series DSIU 2792 ...
2793 185 = /dev/ttyNX15 Hilscher netX serial port 15
2755 2794
2756205 char Low-density serial ports (alternate device) 2795205 char Low-density serial ports (alternate device)
2757 0 = /dev/culu0 Callout device for ttyLU0 2796 0 = /dev/culu0 Callout device for ttyLU0
@@ -2786,8 +2825,8 @@ Your cooperation is appreciated.
2786 50 = /dev/cuioc40 Callout device for ttyIOC40 2825 50 = /dev/cuioc40 Callout device for ttyIOC40
2787 ... 2826 ...
2788 81 = /dev/cuioc431 Callout device for ttyIOC431 2827 81 = /dev/cuioc431 Callout device for ttyIOC431
2789 82 = /dev/cuvr0 Callout device for ttyVR0 2828 82 = /dev/cuvr0 Callout device for ttyVR0
2790 83 = /dev/cuvr1 Callout device for ttyVR1 2829 83 = /dev/cuvr1 Callout device for ttyVR1
2791 2830
2792 2831
2793206 char OnStream SC-x0 tape devices 2832206 char OnStream SC-x0 tape devices
@@ -2897,7 +2936,6 @@ Your cooperation is appreciated.
2897 ... 2936 ...
2898 196 = /dev/dvb/adapter3/video0 first video decoder of fourth card 2937 196 = /dev/dvb/adapter3/video0 first video decoder of fourth card
2899 2938
2900
2901216 char Bluetooth RFCOMM TTY devices 2939216 char Bluetooth RFCOMM TTY devices
2902 0 = /dev/rfcomm0 First Bluetooth RFCOMM TTY device 2940 0 = /dev/rfcomm0 First Bluetooth RFCOMM TTY device
2903 1 = /dev/rfcomm1 Second Bluetooth RFCOMM TTY device 2941 1 = /dev/rfcomm1 Second Bluetooth RFCOMM TTY device
@@ -3002,12 +3040,43 @@ Your cooperation is appreciated.
3002 ioctl()'s can be used to rewind the tape regardless of 3040 ioctl()'s can be used to rewind the tape regardless of
3003 the device used to access it. 3041 the device used to access it.
3004 3042
3005231 char InfiniBand MAD 3043231 char InfiniBand
3006 0 = /dev/infiniband/umad0 3044 0 = /dev/infiniband/umad0
3007 1 = /dev/infiniband/umad1 3045 1 = /dev/infiniband/umad1
3008 ... 3046 ...
3047 63 = /dev/infiniband/umad63 64th InfiniBand MAD device
3048 64 = /dev/infiniband/issm0 First InfiniBand IsSM device
3049 65 = /dev/infiniband/issm1 Second InfiniBand IsSM device
3050 ...
3051 127 = /dev/infiniband/issm63 64th InfiniBand IsSM device
3052 128 = /dev/infiniband/uverbs0 First InfiniBand verbs device
3053 129 = /dev/infiniband/uverbs1 Second InfiniBand verbs device
3054 ...
3055 159 = /dev/infiniband/uverbs31 32nd InfiniBand verbs device
3056
3057232 char Biometric Devices
3058 0 = /dev/biometric/sensor0/fingerprint first fingerprint sensor on first device
3059 1 = /dev/biometric/sensor0/iris first iris sensor on first device
3060 2 = /dev/biometric/sensor0/retina first retina sensor on first device
3061 3 = /dev/biometric/sensor0/voiceprint first voiceprint sensor on first device
3062 4 = /dev/biometric/sensor0/facial first facial sensor on first device
3063 5 = /dev/biometric/sensor0/hand first hand sensor on first device
3064 ...
3065 10 = /dev/biometric/sensor1/fingerprint first fingerprint sensor on second device
3066 ...
3067 20 = /dev/biometric/sensor2/fingerprint first fingerprint sensor on third device
3068 ...
3009 3069
3010232-239 UNASSIGNED 3070233 char PathScale InfiniPath interconnect
3071 0 = /dev/ipath Primary device for programs (any unit)
3072 1 = /dev/ipath0 Access specifically to unit 0
3073 2 = /dev/ipath1 Access specifically to unit 1
3074 ...
3075 4 = /dev/ipath3 Access specifically to unit 3
3076 129 = /dev/ipath_sma Device used by Subnet Management Agent
3077 130 = /dev/ipath_diag Device used by diagnostics programs
3078
3079234-239 UNASSIGNED
3011 3080
3012240-254 char LOCAL/EXPERIMENTAL USE 3081240-254 char LOCAL/EXPERIMENTAL USE
3013240-254 block LOCAL/EXPERIMENTAL USE 3082240-254 block LOCAL/EXPERIMENTAL USE
@@ -3021,6 +3090,24 @@ Your cooperation is appreciated.
3021 This major is reserved to assist the expansion to a 3090 This major is reserved to assist the expansion to a
3022 larger number space. No device nodes with this major 3091 larger number space. No device nodes with this major
3023 should ever be created on the filesystem. 3092 should ever be created on the filesystem.
3093 (This is probably not true anymore, but I'll leave it
3094 for now /Torben)
3095
3096---LARGE MAJORS!!!!!---
3097
3098256 char Equinox SST multi-port serial boards
3099 0 = /dev/ttyEQ0 First serial port on first Equinox SST board
3100 127 = /dev/ttyEQ127 Last serial port on first Equinox SST board
3101 128 = /dev/ttyEQ128 First serial port on second Equinox SST board
3102 ...
3103 1027 = /dev/ttyEQ1027 Last serial port on eighth Equinox SST board
3104
3105256 block Resident Flash Disk Flash Translation Layer
3106 0 = /dev/rfda First RFD FTL layer
3107 16 = /dev/rfdb Second RFD FTL layer
3108 ...
3109 240 = /dev/rfdp 16th RFD FTL layer
3110
3024 3111
3025 **** ADDITIONAL /dev DIRECTORY ENTRIES 3112 **** ADDITIONAL /dev DIRECTORY ENTRIES
3026 3113
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index f7293297f326..027285d0c26c 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -33,21 +33,6 @@ Who: Adrian Bunk <bunk@stusta.de>
33 33
34--------------------------- 34---------------------------
35 35
36What: RCU API moves to EXPORT_SYMBOL_GPL
37When: April 2006
38Files: include/linux/rcupdate.h, kernel/rcupdate.c
39Why: Outside of Linux, the only implementations of anything even
40 vaguely resembling RCU that I am aware of are in DYNIX/ptx,
41 VM/XA, Tornado, and K42. I do not expect anyone to port binary
42 drivers or kernel modules from any of these, since the first two
43 are owned by IBM and the last two are open-source research OSes.
44 So these will move to GPL after a grace period to allow
45 people, who might be using implementations that I am not aware
46 of, to adjust to this upcoming change.
47Who: Paul E. McKenney <paulmck@us.ibm.com>
48
49---------------------------
50
51What: raw1394: requests of type RAW1394_REQ_ISO_SEND, RAW1394_REQ_ISO_LISTEN 36What: raw1394: requests of type RAW1394_REQ_ISO_SEND, RAW1394_REQ_ISO_LISTEN
52When: November 2006 37When: November 2006
53Why: Deprecated in favour of the new ioctl-based rawiso interface, which is 38Why: Deprecated in favour of the new ioctl-based rawiso interface, which is
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking
index 1045da582b9b..d31efbbdfe50 100644
--- a/Documentation/filesystems/Locking
+++ b/Documentation/filesystems/Locking
@@ -99,7 +99,7 @@ prototypes:
99 int (*sync_fs)(struct super_block *sb, int wait); 99 int (*sync_fs)(struct super_block *sb, int wait);
100 void (*write_super_lockfs) (struct super_block *); 100 void (*write_super_lockfs) (struct super_block *);
101 void (*unlockfs) (struct super_block *); 101 void (*unlockfs) (struct super_block *);
102 int (*statfs) (struct super_block *, struct kstatfs *); 102 int (*statfs) (struct dentry *, struct kstatfs *);
103 int (*remount_fs) (struct super_block *, int *, char *); 103 int (*remount_fs) (struct super_block *, int *, char *);
104 void (*clear_inode) (struct inode *); 104 void (*clear_inode) (struct inode *);
105 void (*umount_begin) (struct super_block *); 105 void (*umount_begin) (struct super_block *);
@@ -142,15 +142,16 @@ see also dquot_operations section.
142 142
143--------------------------- file_system_type --------------------------- 143--------------------------- file_system_type ---------------------------
144prototypes: 144prototypes:
145 struct super_block *(*get_sb) (struct file_system_type *, int, 145 int (*get_sb) (struct file_system_type *, int,
146 const char *, void *); 146 const char *, void *, struct vfsmount *);
147 void (*kill_sb) (struct super_block *); 147 void (*kill_sb) (struct super_block *);
148locking rules: 148locking rules:
149 may block BKL 149 may block BKL
150get_sb yes yes 150get_sb yes yes
151kill_sb yes yes 151kill_sb yes yes
152 152
153->get_sb() returns error or a locked superblock (exclusive on ->s_umount). 153->get_sb() returns error or 0 with locked superblock attached to the vfsmount
154(exclusive on ->s_umount).
154->kill_sb() takes a write-locked superblock, does all shutdown work on it, 155->kill_sb() takes a write-locked superblock, does all shutdown work on it,
155unlocks and drops the reference. 156unlocks and drops the reference.
156 157
diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting
index 2f388460cbe7..5531694059ab 100644
--- a/Documentation/filesystems/porting
+++ b/Documentation/filesystems/porting
@@ -50,10 +50,11 @@ Turn your foo_read_super() into a function that would return 0 in case of
50success and negative number in case of error (-EINVAL unless you have more 50success and negative number in case of error (-EINVAL unless you have more
51informative error value to report). Call it foo_fill_super(). Now declare 51informative error value to report). Call it foo_fill_super(). Now declare
52 52
53struct super_block foo_get_sb(struct file_system_type *fs_type, 53int foo_get_sb(struct file_system_type *fs_type,
54 int flags, const char *dev_name, void *data) 54 int flags, const char *dev_name, void *data, struct vfsmount *mnt)
55{ 55{
56 return get_sb_bdev(fs_type, flags, dev_name, data, ext2_fill_super); 56 return get_sb_bdev(fs_type, flags, dev_name, data, foo_fill_super,
57 mnt);
57} 58}
58 59
59(or similar with s/bdev/nodev/ or s/bdev/single/, depending on the kind of 60(or similar with s/bdev/nodev/ or s/bdev/single/, depending on the kind of
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 3a2e5520c1e3..9d3aed628bc1 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -113,8 +113,8 @@ members are defined:
113struct file_system_type { 113struct file_system_type {
114 const char *name; 114 const char *name;
115 int fs_flags; 115 int fs_flags;
116 struct super_block *(*get_sb) (struct file_system_type *, int, 116 int (*get_sb) (struct file_system_type *, int,
117 const char *, void *); 117 const char *, void *, struct vfsmount *);
118 void (*kill_sb) (struct super_block *); 118 void (*kill_sb) (struct super_block *);
119 struct module *owner; 119 struct module *owner;
120 struct file_system_type * next; 120 struct file_system_type * next;
@@ -211,7 +211,7 @@ struct super_operations {
211 int (*sync_fs)(struct super_block *sb, int wait); 211 int (*sync_fs)(struct super_block *sb, int wait);
212 void (*write_super_lockfs) (struct super_block *); 212 void (*write_super_lockfs) (struct super_block *);
213 void (*unlockfs) (struct super_block *); 213 void (*unlockfs) (struct super_block *);
214 int (*statfs) (struct super_block *, struct kstatfs *); 214 int (*statfs) (struct dentry *, struct kstatfs *);
215 int (*remount_fs) (struct super_block *, int *, char *); 215 int (*remount_fs) (struct super_block *, int *, char *);
216 void (*clear_inode) (struct inode *); 216 void (*clear_inode) (struct inode *);
217 void (*umount_begin) (struct super_block *); 217 void (*umount_begin) (struct super_block *);
diff --git a/Documentation/ia64/aliasing.txt b/Documentation/ia64/aliasing.txt
new file mode 100644
index 000000000000..38f9a52d1820
--- /dev/null
+++ b/Documentation/ia64/aliasing.txt
@@ -0,0 +1,208 @@
1 MEMORY ATTRIBUTE ALIASING ON IA-64
2
3 Bjorn Helgaas
4 <bjorn.helgaas@hp.com>
5 May 4, 2006
6
7
8MEMORY ATTRIBUTES
9
10 Itanium supports several attributes for virtual memory references.
11 The attribute is part of the virtual translation, i.e., it is
12 contained in the TLB entry. The ones of most interest to the Linux
13 kernel are:
14
15 WB Write-back (cacheable)
16 UC Uncacheable
17 WC Write-coalescing
18
19 System memory typically uses the WB attribute. The UC attribute is
20 used for memory-mapped I/O devices. The WC attribute is uncacheable
21 like UC is, but writes may be delayed and combined to increase
22 performance for things like frame buffers.
23
24 The Itanium architecture requires that we avoid accessing the same
25 page with both a cacheable mapping and an uncacheable mapping[1].
26
27 The design of the chipset determines which attributes are supported
28 on which regions of the address space. For example, some chipsets
29 support either WB or UC access to main memory, while others support
30 only WB access.
31
32MEMORY MAP
33
34 Platform firmware describes the physical memory map and the
35 supported attributes for each region. At boot-time, the kernel uses
36 the EFI GetMemoryMap() interface. ACPI can also describe memory
37 devices and the attributes they support, but Linux/ia64 currently
38 doesn't use this information.
39
40 The kernel uses the efi_memmap table returned from GetMemoryMap() to
41 learn the attributes supported by each region of physical address
42 space. Unfortunately, this table does not completely describe the
43 address space because some machines omit some or all of the MMIO
44 regions from the map.
45
46 The kernel maintains another table, kern_memmap, which describes the
47 memory Linux is actually using and the attribute for each region.
48 This contains only system memory; it does not contain MMIO space.
49
50 The kern_memmap table typically contains only a subset of the system
51 memory described by the efi_memmap. Linux/ia64 can't use all memory
52 in the system because of constraints imposed by the identity mapping
53 scheme.
54
55 The efi_memmap table is preserved unmodified because the original
56 boot-time information is required for kexec.
57
58KERNEL IDENTITY MAPPINGS
59
60 Linux/ia64 identity mappings are done with large pages, currently
61 either 16MB or 64MB, referred to as "granules." Cacheable mappings
62 are speculative[2], so the processor can read any location in the
63 page at any time, independent of the programmer's intentions. This
64 means that to avoid attribute aliasing, Linux can create a cacheable
65 identity mapping only when the entire granule supports cacheable
66 access.
67
68 Therefore, kern_memmap contains only full granule-sized regions that
69 can be referenced safely by an identity mapping.
70
71 Uncacheable mappings are not speculative, so the processor will
72 generate UC accesses only to locations explicitly referenced by
73 software. This allows UC identity mappings to cover granules that
74 are only partially populated, or populated with a combination of UC
75 and WB regions.
76
77USER MAPPINGS
78
79 User mappings are typically done with 16K or 64K pages. The smaller
80 page size allows more flexibility because only 16K or 64K has to be
81 homogeneous with respect to memory attributes.
82
83POTENTIAL ATTRIBUTE ALIASING CASES
84
85 There are several ways the kernel creates new mappings:
86
87 mmap of /dev/mem
88
89 This uses remap_pfn_range(), which creates user mappings. These
90 mappings may be either WB or UC. If the region being mapped
91 happens to be in kern_memmap, meaning that it may also be mapped
92 by a kernel identity mapping, the user mapping must use the same
93 attribute as the kernel mapping.
94
95 If the region is not in kern_memmap, the user mapping should use
96 an attribute reported as being supported in the EFI memory map.
97
98 Since the EFI memory map does not describe MMIO on some
99 machines, this should use an uncacheable mapping as a fallback.
100
101 mmap of /sys/class/pci_bus/.../legacy_mem
102
103 This is very similar to mmap of /dev/mem, except that legacy_mem
104 only allows mmap of the one megabyte "legacy MMIO" area for a
105 specific PCI bus. Typically this is the first megabyte of
106 physical address space, but it may be different on machines with
107 several VGA devices.
108
109 "X" uses this to access VGA frame buffers. Using legacy_mem
110 rather than /dev/mem allows multiple instances of X to talk to
111 different VGA cards.
112
113 The /dev/mem mmap constraints apply.
114
115 However, since this is for mapping legacy MMIO space, WB access
116 does not make sense. This matters on machines without legacy
117 VGA support: these machines may have WB memory for the entire
118 first megabyte (or even the entire first granule).
119
120 On these machines, we could mmap legacy_mem as WB, which would
121 be safe in terms of attribute aliasing, but X has no way of
122 knowing that it is accessing regular memory, not a frame buffer,
123 so the kernel should fail the mmap rather than doing it with WB.
124
125 read/write of /dev/mem
126
127 This uses copy_from_user(), which implicitly uses a kernel
128 identity mapping. This is obviously safe for things in
129 kern_memmap.
130
131 There may be corner cases of things that are not in kern_memmap,
132 but could be accessed this way. For example, registers in MMIO
133 space are not in kern_memmap, but could be accessed with a UC
134 mapping. This would not cause attribute aliasing. But
135 registers typically can be accessed only with four-byte or
136 eight-byte accesses, and the copy_from_user() path doesn't allow
137 any control over the access size, so this would be dangerous.
138
139 ioremap()
140
141 This returns a kernel identity mapping for use inside the
142 kernel.
143
144 If the region is in kern_memmap, we should use the attribute
145 specified there. Otherwise, if the EFI memory map reports that
146 the entire granule supports WB, we should use that (granules
147 that are partially reserved or occupied by firmware do not appear
148 in kern_memmap). Otherwise, we should use a UC mapping.
149
150PAST PROBLEM CASES
151
152 mmap of various MMIO regions from /dev/mem by "X" on Intel platforms
153
154 The EFI memory map may not report these MMIO regions.
155
156 These must be allowed so that X will work. This means that
157 when the EFI memory map is incomplete, every /dev/mem mmap must
158 succeed. It may create either WB or UC user mappings, depending
159 on whether the region is in kern_memmap or the EFI memory map.
160
161 mmap of 0x0-0xA0000 /dev/mem by "hwinfo" on HP sx1000 with VGA enabled
162
163 See https://bugzilla.novell.com/show_bug.cgi?id=140858.
164
165 The EFI memory map reports the following attributes:
166 0x00000-0x9FFFF WB only
167 0xA0000-0xBFFFF UC only (VGA frame buffer)
168 0xC0000-0xFFFFF WB only
169
170 This mmap is done with user pages, not kernel identity mappings,
171 so it is safe to use WB mappings.
172
173 The kernel VGA driver may ioremap the VGA frame buffer at 0xA0000,
174 which will use a granule-sized UC mapping covering 0-0xFFFFF. This
175 granule covers some WB-only memory, but since UC is non-speculative,
176 the processor will never generate an uncacheable reference to the
177 WB-only areas unless the driver explicitly touches them.
178
179 mmap of 0x0-0xFFFFF legacy_mem by "X"
180
181 If the EFI memory map reports this entire range as WB, there
182 is no VGA MMIO hole, and the mmap should fail or be done with
183 a WB mapping.
184
185 There's no easy way for X to determine whether the 0xA0000-0xBFFFF
186 region is a frame buffer or just memory, so I think it's best to
187 just fail this mmap request rather than using a WB mapping. As
188 far as I know, there's no need to map legacy_mem with WB
189 mappings.
190
191 Otherwise, a UC mapping of the entire region is probably safe.
192 The VGA hole means the region will not be in kern_memmap. The
193 HP sx1000 chipset doesn't support UC access to the memory surrounding
194 the VGA hole, but X doesn't need that area anyway and should not
195 reference it.
196
197 mmap of 0xA0000-0xBFFFF legacy_mem by "X" on HP sx1000 with VGA disabled
198
199 The EFI memory map reports the following attributes:
200 0x00000-0xFFFFF WB only (no VGA MMIO hole)
201
202 This is a special case of the previous case, and the mmap should
203 fail for the same reason as above.
204
205NOTES
206
207 [1] SDM rev 2.2, vol 2, sec 4.4.1.
208 [2] SDM rev 2.2, vol 2, sec 4.4.6.
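The ioremap() rule described in aliasing.txt above (use the kern_memmap attribute if the region is listed there, else WB if the EFI memory map says the whole granule supports WB, else UC) can be sketched as a small userspace model. This is only an illustration under stated assumptions: the struct region layout, the ATTR_* values, the 16MB EX_GRANULE_SIZE and all function names are hypothetical and are not the kernel's data structures or API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute bits; not the kernel's EFI_MEMORY_* values. */
enum mem_attr { ATTR_WB = 1, ATTR_UC = 2 };

/* One entry of a (hypothetical) sorted, non-overlapping memory map. */
struct region {
	uint64_t start;   /* inclusive */
	uint64_t end;     /* exclusive */
	unsigned attrs;   /* mask of enum mem_attr bits */
};

#define EX_GRANULE_SIZE (16ULL << 20)   /* assume 16MB granules */

/* Return the map entry containing addr, or NULL if there is none. */
static const struct region *find_region(const struct region *map, size_t n,
					uint64_t addr)
{
	for (size_t i = 0; i < n; i++)
		if (addr >= map[i].start && addr < map[i].end)
			return &map[i];
	return NULL;
}

/* True only if every byte of [start, end) is covered by entries
 * that support attr; a hole in the map counts as "not supported". */
static bool range_supports(const struct region *map, size_t n,
			   uint64_t start, uint64_t end, unsigned attr)
{
	uint64_t pos = start;

	for (size_t i = 0; i < n && pos < end; i++) {
		if (map[i].end <= pos)
			continue;            /* entry lies below the range */
		if (map[i].start > pos)
			return false;        /* hole in the map */
		if (!(map[i].attrs & attr))
			return false;        /* attribute not supported here */
		pos = map[i].end;
	}
	return pos >= end;
}

/* Pick the attribute for a kernel identity mapping of phys.
 * kern entries are assumed to carry exactly one attribute bit. */
static enum mem_attr ioremap_attr(const struct region *kern, size_t nkern,
				  const struct region *efi, size_t nefi,
				  uint64_t phys)
{
	const struct region *r = find_region(kern, nkern, phys);
	uint64_t granule = phys & ~(EX_GRANULE_SIZE - 1);

	if (r)
		return (enum mem_attr)r->attrs;
	if (range_supports(efi, nefi, granule, granule + EX_GRANULE_SIZE,
			   ATTR_WB))
		return ATTR_WB;
	return ATTR_UC;          /* safe fallback: UC is not speculative */
}
```

With an EFI map shaped like the sx1000 case above (a UC-only VGA hole at 0xA0000-0xBFFFF inside the first granule), a lookup of 0xA0000 falls through to UC because the granule is not uniformly WB.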
diff --git a/Documentation/ioctl-number.txt b/Documentation/ioctl-number.txt
index 171a44ebd939..1543802ef53e 100644
--- a/Documentation/ioctl-number.txt
+++ b/Documentation/ioctl-number.txt
@@ -85,7 +85,9 @@ Code Seq# Include File Comments
85 <mailto:maassen@uni-freiburg.de> 85 <mailto:maassen@uni-freiburg.de>
86'C' all linux/soundcard.h 86'C' all linux/soundcard.h
87'D' all asm-s390/dasd.h 87'D' all asm-s390/dasd.h
88'E' all linux/input.h
88'F' all linux/fb.h 89'F' all linux/fb.h
90'H' all linux/hiddev.h
89'I' all linux/isdn.h 91'I' all linux/isdn.h
90'J' 00-1F drivers/scsi/gdth_ioctl.h 92'J' 00-1F drivers/scsi/gdth_ioctl.h
91'K' all linux/kd.h 93'K' all linux/kd.h
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index a9d3a1794b23..bca6f389da66 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -147,6 +147,9 @@ running once the system is up.
147 acpi_irq_isa= [HW,ACPI] If irq_balance, mark listed IRQs used by ISA 147 acpi_irq_isa= [HW,ACPI] If irq_balance, mark listed IRQs used by ISA
148 Format: <irq>,<irq>... 148 Format: <irq>,<irq>...
149 149
150 acpi_os_name= [HW,ACPI] Tell ACPI BIOS the name of the OS
151 Format: To spoof as Windows 98: ="Microsoft Windows"
152
150 acpi_osi= [HW,ACPI] empty param disables _OSI 153 acpi_osi= [HW,ACPI] empty param disables _OSI
151 154
152 acpi_serialize [HW,ACPI] force serialization of AML methods 155 acpi_serialize [HW,ACPI] force serialization of AML methods
diff --git a/Documentation/networking/tuntap.txt b/Documentation/networking/tuntap.txt
index 76750fb9151a..839cbb71388b 100644
--- a/Documentation/networking/tuntap.txt
+++ b/Documentation/networking/tuntap.txt
@@ -39,10 +39,13 @@ Copyright (C) 1999-2000 Maxim Krasnyansky <max_mk@yahoo.com>
39 mknod /dev/net/tun c 10 200 39 mknod /dev/net/tun c 10 200
40 40
41 Set permissions: 41 Set permissions:
42 e.g. chmod 0700 /dev/net/tun 42 e.g. chmod 0666 /dev/net/tun
43 if you want the device only accessible by root. Giving regular users the 43 There's no harm in allowing the device to be accessible by non-root users,
44 right to assign network devices is NOT a good idea. Users could assign 44 since CAP_NET_ADMIN is required for creating network devices or for
45 bogus network interfaces to trick firewalls or administrators. 45 connecting to network devices which aren't owned by the user in question.
46 If you want to create persistent devices and give ownership of them to
47 unprivileged users, then you need the /dev/net/tun device to be usable by
48 those users.
46 49
47 Driver module autoloading 50 Driver module autoloading
48 51
diff --git a/Documentation/power/swsusp.txt b/Documentation/power/swsusp.txt
index 516c5019013b..823b2cf6e3dc 100644
--- a/Documentation/power/swsusp.txt
+++ b/Documentation/power/swsusp.txt
@@ -350,9 +350,34 @@ Q: How do I make suspend more verbose?
350 350
351A: If you want to see any non-error kernel messages on the virtual 351A: If you want to see any non-error kernel messages on the virtual
352terminal the kernel switches to during suspend, you have to set the 352terminal the kernel switches to during suspend, you have to set the
353kernel console loglevel to at least 5, for example by doing 353kernel console loglevel to at least 4 (KERN_WARNING), for example by
354 354doing
355 echo 5 > /proc/sys/kernel/printk 355
356 # save the old loglevel
357 read LOGLEVEL DUMMY < /proc/sys/kernel/printk
358 # set the loglevel so we see the progress bar.
359 # if the level is higher than needed, we leave it alone.
360 if [ $LOGLEVEL -lt 5 ]; then
361 echo 5 > /proc/sys/kernel/printk
362 fi
363
364 IMG_SZ=0
365 read IMG_SZ < /sys/power/image_size
366 echo -n disk > /sys/power/state
367 RET=$?
368 #
369 # the logic here is:
370 # if image_size > 0 (without kernel support, IMG_SZ will be zero),
371 # then try again with image_size set to zero.
372 if [ $RET -ne 0 -a $IMG_SZ -ne 0 ]; then # try again with minimal image size
373 echo 0 > /sys/power/image_size
374 echo -n disk > /sys/power/state
375 RET=$?
376 fi
377
378 # restore previous loglevel
379 echo $LOGLEVEL > /proc/sys/kernel/printk
380 exit $RET
356 381
357Q: Is this true that if I have a mounted filesystem on a USB device and 382Q: Is this true that if I have a mounted filesystem on a USB device and
358I suspend to disk, I can lose data unless the filesystem has been mounted 383I suspend to disk, I can lose data unless the filesystem has been mounted
@@ -380,3 +405,17 @@ safest thing is to unmount all filesystems on removable media (such USB,
380Firewire, CompactFlash, MMC, external SATA, or even IDE hotplug bays) 405Firewire, CompactFlash, MMC, external SATA, or even IDE hotplug bays)
381before suspending; then remount them after resuming. 406before suspending; then remount them after resuming.
382 407
408Q: I upgraded the kernel from 2.6.15 to 2.6.16. Both kernels were
409compiled with similar configuration files. However, I found that
410suspend to disk (and resume) is much slower on 2.6.16 compared to
4112.6.15. Any idea why that might happen or how I can speed it up?
412
413A: This is because the size of the suspend image is now greater than
414for 2.6.15 (by saving more data we can get a more responsive system
415after resume).
416
417There's the /sys/power/image_size knob that controls the size of the
418image. If you set it to 0 (eg. by echo 0 > /sys/power/image_size as
419root), the 2.6.15 behavior should be restored. If it is still too
420slow, take a look at suspend.sf.net -- userland suspend is faster and
421supports LZF compression to speed it up further.
diff --git a/Documentation/power/video.txt b/Documentation/power/video.txt
index 43a889f8f08d..d859faa3a463 100644
--- a/Documentation/power/video.txt
+++ b/Documentation/power/video.txt
@@ -90,6 +90,7 @@ Table of known working notebooks:
90Model hack (or "how to do it") 90Model hack (or "how to do it")
91------------------------------------------------------------------------------ 91------------------------------------------------------------------------------
92Acer Aspire 1406LC ole's late BIOS init (7), turn off DRI 92Acer Aspire 1406LC ole's late BIOS init (7), turn off DRI
93Acer TM 230 s3_bios (2)
93Acer TM 242FX vbetool (6) 94Acer TM 242FX vbetool (6)
94Acer TM C110 video_post (8) 95Acer TM C110 video_post (8)
95Acer TM C300 vga=normal (only suspend on console, not in X), vbetool (6) or video_post (8) 96Acer TM C300 vga=normal (only suspend on console, not in X), vbetool (6) or video_post (8)
@@ -115,6 +116,7 @@ Dell D610 vga=normal and X (possibly vbestate (6) too, but not tested)
115Dell Inspiron 4000 ??? (*) 116Dell Inspiron 4000 ??? (*)
116Dell Inspiron 500m ??? (*) 117Dell Inspiron 500m ??? (*)
117Dell Inspiron 510m ??? 118Dell Inspiron 510m ???
119Dell Inspiron 5150 vbetool needed (6)
118Dell Inspiron 600m ??? (*) 120Dell Inspiron 600m ??? (*)
119Dell Inspiron 8200 ??? (*) 121Dell Inspiron 8200 ??? (*)
120Dell Inspiron 8500 ??? (*) 122Dell Inspiron 8500 ??? (*)
@@ -125,6 +127,7 @@ HP NX7000 ??? (*)
125HP Pavilion ZD7000 vbetool post needed, need open-source nv driver for X 127HP Pavilion ZD7000 vbetool post needed, need open-source nv driver for X
126HP Omnibook XE3 athlon version none (1) 128HP Omnibook XE3 athlon version none (1)
127HP Omnibook XE3GC none (1), video is S3 Savage/IX-MV 129HP Omnibook XE3GC none (1), video is S3 Savage/IX-MV
130HP Omnibook XE3L-GF vbetool (6)
128HP Omnibook 5150 none (1), (S1 also works OK) 131HP Omnibook 5150 none (1), (S1 also works OK)
129IBM TP T20, model 2647-44G none (1), video is S3 Inc. 86C270-294 Savage/IX-MV, vesafb gets "interesting" but X work. 132IBM TP T20, model 2647-44G none (1), video is S3 Inc. 86C270-294 Savage/IX-MV, vesafb gets "interesting" but X work.
130IBM TP A31 / Type 2652-M5G s3_mode (3) [works ok with BIOS 1.04 2002-08-23, but not at all with BIOS 1.11 2004-11-05 :-(] 133IBM TP A31 / Type 2652-M5G s3_mode (3) [works ok with BIOS 1.04 2002-08-23, but not at all with BIOS 1.11 2004-11-05 :-(]
@@ -157,6 +160,7 @@ Sony Vaio vgn-s260 X or boot-radeon can init it (5)
157Sony Vaio vgn-S580BH vga=normal, but suspend from X. Console will be blank unless you return to X. 160Sony Vaio vgn-S580BH vga=normal, but suspend from X. Console will be blank unless you return to X.
158Sony Vaio vgn-FS115B s3_bios (2),s3_mode (4) 161Sony Vaio vgn-FS115B s3_bios (2),s3_mode (4)
159Toshiba Libretto L5 none (1) 162Toshiba Libretto L5 none (1)
163Toshiba Libretto 100CT/110CT vbetool (6)
160Toshiba Portege 3020CT s3_mode (3) 164Toshiba Portege 3020CT s3_mode (3)
161Toshiba Satellite 4030CDT s3_mode (3) (S1 also works OK) 165Toshiba Satellite 4030CDT s3_mode (3) (S1 also works OK)
162Toshiba Satellite 4080XCDT s3_mode (3) (S1 also works OK) 166Toshiba Satellite 4080XCDT s3_mode (3) (S1 also works OK)
diff --git a/Documentation/sparse.txt b/Documentation/sparse.txt
index 3f1c5464b1c9..5a311c38dd1a 100644
--- a/Documentation/sparse.txt
+++ b/Documentation/sparse.txt
@@ -1,5 +1,6 @@
1Copyright 2004 Linus Torvalds 1Copyright 2004 Linus Torvalds
2Copyright 2004 Pavel Machek <pavel@suse.cz> 2Copyright 2004 Pavel Machek <pavel@suse.cz>
3Copyright 2006 Bob Copeland <me@bobcopeland.com>
3 4
4Using sparse for typechecking 5Using sparse for typechecking
5~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 6~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -41,15 +42,8 @@ sure that bitwise types don't get mixed up (little-endian vs big-endian
41vs cpu-endian vs whatever), and there the constant "0" really _is_ 42vs cpu-endian vs whatever), and there the constant "0" really _is_
42special. 43special.
43 44
44Use 45Getting sparse
45 46~~~~~~~~~~~~~~
46 make C=[12] CF=-Wbitwise
47
48or you don't get any checking at all.
49
50
51Where to get sparse
52~~~~~~~~~~~~~~~~~~~
53 47
54With git, you can just get it from 48With git, you can just get it from
55 49
@@ -57,7 +51,7 @@ With git, you can just get it from
57 51
58and DaveJ has tar-balls at 52and DaveJ has tar-balls at
59 53
60 http://www.codemonkey.org.uk/projects/git-snapshots/sparse/ 54 http://www.codemonkey.org.uk/projects/git-snapshots/sparse/
61 55
62 56
63Once you have it, just do 57Once you have it, just do
@@ -65,8 +59,20 @@ Once you have it, just do
65 make 59 make
66 make install 60 make install
67 61
68as your regular user, and it will install sparse in your ~/bin directory. 62as a regular user, and it will install sparse in your ~/bin directory.
69After that, doing a kernel make with "make C=1" will run sparse on all the 63
70C files that get recompiled, or with "make C=2" will run sparse on the 64Using sparse
71files whether they need to be recompiled or not (ie the latter is fast way 65~~~~~~~~~~~~
72to check the whole tree if you have already built it). 66
67Do a kernel make with "make C=1" to run sparse on all the C files that get
68recompiled, or use "make C=2" to run sparse on the files whether they need to
69be recompiled or not. The latter is a fast way to check the whole tree if you
70have already built it.
71
72The optional make variable CF can be used to pass arguments to sparse. The
73build system passes -Wbitwise to sparse automatically. To perform endianness
74checks, you may define __CHECK_ENDIAN__:
75
76 make C=2 CF="-D__CHECK_ENDIAN__"
77
78These checks are disabled by default as they generate a host of warnings.
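To illustrate what the endianness checks in sparse.txt above look for, here is a minimal sketch of a bitwise-annotated type. The attribute spellings are the ones sparse understands when __CHECKER__ is defined; the EX_BITWISE and EX_FORCE macro names, the ex_le32 type and both helper functions are hypothetical names chosen for this example, not kernel identifiers. With a normal compiler the annotations expand to nothing, so the code builds and behaves like plain integer code.

```c
#include <stdint.h>

/* sparse defines __CHECKER__; a normal compiler sees empty macros. */
#ifdef __CHECKER__
#define EX_BITWISE __attribute__((bitwise))
#define EX_FORCE   __attribute__((force))
#else
#define EX_BITWISE
#define EX_FORCE
#endif

/* A 32-bit value stored in little-endian byte order. Under sparse,
 * mixing it with plain integers without a forced cast is flagged. */
typedef uint32_t EX_BITWISE ex_le32;

/* Convert a CPU-order value to little-endian storage order. */
static inline ex_le32 ex_cpu_to_le32(uint32_t v)
{
	union { uint8_t b[4]; uint32_t w; } u;

	u.b[0] = (uint8_t)(v & 0xff);
	u.b[1] = (uint8_t)((v >> 8) & 0xff);
	u.b[2] = (uint8_t)((v >> 16) & 0xff);
	u.b[3] = (uint8_t)((v >> 24) & 0xff);
	return (EX_FORCE ex_le32)u.w;   /* the only sanctioned crossing */
}

/* Convert little-endian storage order back to CPU order. */
static inline uint32_t ex_le32_to_cpu(ex_le32 v)
{
	union { uint8_t b[4]; uint32_t w; } u;

	u.w = (EX_FORCE uint32_t)v;
	return (uint32_t)u.b[0] |
	       ((uint32_t)u.b[1] << 8) |
	       ((uint32_t)u.b[2] << 16) |
	       ((uint32_t)u.b[3] << 24);
}
```

Assigning a plain uint32_t to an ex_le32 variable without going through ex_cpu_to_le32() is the kind of mix-up such a check is meant to catch.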
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index a46c10fcddfc..2dc246af4885 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -29,6 +29,7 @@ Currently, these files are in /proc/sys/vm:
29- drop-caches 29- drop-caches
30- zone_reclaim_mode 30- zone_reclaim_mode
31- zone_reclaim_interval 31- zone_reclaim_interval
32- panic_on_oom
32 33
33============================================================== 34==============================================================
34 35
@@ -178,3 +179,15 @@ Time is set in seconds and set by default to 30 seconds.
178Reduce the interval if undesired off node allocations occur. However, too 179Reduce the interval if undesired off node allocations occur. However, too
179frequent scans will have a negative impact on off node allocation performance. 180frequent scans will have a negative impact on off node allocation performance.
180 181
182=============================================================
183
184panic_on_oom
185
186This enables or disables the panic-on-out-of-memory feature. If this is set to 1,
187the kernel panics when an out-of-memory condition occurs. If this is set to 0,
188the kernel invokes the oom_killer, which kills some rogue process. Usually,
189oom_killer can kill rogue processes and the system will survive. If you want to
190panic the system rather than kill rogue processes, set this to 1.
191
192The default value is 0.
193
diff --git a/Documentation/vm/page_migration b/Documentation/vm/page_migration
index 0dd4ef30c361..99f89aa10169 100644
--- a/Documentation/vm/page_migration
+++ b/Documentation/vm/page_migration
@@ -26,8 +26,13 @@ a process are located. See also the numa_maps manpage in the numactl package.
26Manual migration is useful if for example the scheduler has relocated 26Manual migration is useful if for example the scheduler has relocated
27a process to a processor on a distant node. A batch scheduler or an 27a process to a processor on a distant node. A batch scheduler or an
28administrator may detect the situation and move the pages of the process 28administrator may detect the situation and move the pages of the process
29nearer to the new processor. At some point in the future we may have 29nearer to the new processor. The kernel itself provides only
30some mechanism in the scheduler that will automatically move the pages. 30manual page migration support. Automatic page migration may be implemented
31through user space processes that move pages. A special function call
32"move_pages" allows the moving of individual pages within a process.
33A NUMA profiler may, for example, obtain a log showing frequent off node
34accesses and may use the result to move pages to more advantageous
35locations.
31 36
32Larger installations usually partition the system using cpusets into 37Larger installations usually partition the system using cpusets into
33sections of nodes. Paul Jackson has equipped cpusets with the ability to 38sections of nodes. Paul Jackson has equipped cpusets with the ability to
@@ -62,22 +67,14 @@ A. In kernel use of migrate_pages()
62 It also prevents the swapper or other scans to encounter 67 It also prevents the swapper or other scans to encounter
63 the page. 68 the page.
64 69
652. Generate a list of newly allocates page. These pages will contain the 702. We need to have a function of type new_page_t that can be
66 contents of the pages from the first list after page migration is 71 passed to migrate_pages(). This function should figure out
67 complete. 72 how to allocate the correct new page given the old page.
68 73
693. The migrate_pages() function is called which attempts 743. The migrate_pages() function is called which attempts
70 to do the migration. It returns the moved pages in the 75 to do the migration. It will call the function to allocate
71 list specified as the third parameter and the failed 76 the new page for each page that is considered for
72 migrations in the fourth parameter. The first parameter 77 moving.
73 will contain the pages that could still be retried.
74
754. The leftover pages of various types are returned
76 to the LRU using putback_to_lru_pages() or otherwise
77 disposed of. The pages will still have the refcount as
78 increased by isolate_lru_pages() if putback_to_lru_pages() is not
79 used! The kernel may want to handle the various cases of failures in
80 different ways.
81 78
82B. How migrate_pages() works 79B. How migrate_pages() works
83---------------------------- 80----------------------------
@@ -93,83 +90,58 @@ Steps:
93 90
942. Ensure that writeback is complete. 912. Ensure that writeback is complete.
95 92
963. Make sure that the page has assigned swap cache entry if 933. Prep the new page that we want to move to. It is locked
97 it is an anonyous page. The swap cache reference is necessary
98 to preserve the information contain in the page table maps while
99 page migration occurs.
100
1014. Prep the new page that we want to move to. It is locked
102 and set to not being uptodate so that all accesses to the new 94 and set to not being uptodate so that all accesses to the new
103 page immediately lock while the move is in progress. 95 page immediately lock while the move is in progress.
104 96
1055. All the page table references to the page are either dropped (file 974. The new page is prepped with some settings from the old page so that
106 backed pages) or converted to swap references (anonymous pages). 98 accesses to the new page will discover a page with the correct settings.
107 This should decrease the reference count. 99
1005. All the page table references to the page are converted
101 to migration entries or dropped (nonlinear vmas).
102 This decreases the mapcount of a page. If the resulting
103 mapcount is not zero then we do not migrate the page.
104 All user space processes that attempt to access the page
105 will now wait on the page lock.
108 106
1096. The radix tree lock is taken. This will cause all processes trying 1076. The radix tree lock is taken. This will cause all processes trying
110 to reestablish a pte to block on the radix tree spinlock. 108 to access the page via the mapping to block on the radix tree spinlock.
111 109
1127. The refcount of the page is examined and we back out if references remain 1107. The refcount of the page is examined and we back out if references remain
113 otherwise we know that we are the only one referencing this page. 111 otherwise we know that we are the only one referencing this page.
114 112
1158. The radix tree is checked and if it does not contain the pointer to this 1138. The radix tree is checked and if it does not contain the pointer to this
116 page then we back out because someone else modified the mapping first. 114 page then we back out because someone else modified the radix tree.
117
1189. The mapping is checked. If the mapping is gone then a truncate action may
119 be in progress and we back out.
120
12110. The new page is prepped with some settings from the old page so that
122 accesses to the new page will be discovered to have the correct settings.
123 115
12411. The radix tree is changed to point to the new page. 1169. The radix tree is changed to point to the new page.
125 117
12612. The reference count of the old page is dropped because the radix tree 11810. The reference count of the old page is dropped because the radix tree
127 reference is gone. 119 reference is gone. A reference to the new page is established because
120 the new page is referenced by the radix tree.
128 121
12913. The radix tree lock is dropped. With that lookups become possible again 12211. The radix tree lock is dropped. With that lookups in the mapping
130 and other processes will move from spinning on the tree lock to sleeping on 123 become possible again. Processes will move from spinning on the tree_lock
131 the locked new page. 124 to sleeping on the locked new page.
132 125
13314. The page contents are copied to the new page. 12612. The page contents are copied to the new page.
134 127
13515. The remaining page flags are copied to the new page. 12813. The remaining page flags are copied to the new page.
136 129
13716. The old page flags are cleared to indicate that the page does 13014. The old page flags are cleared to indicate that the page does
138 not use any information anymore. 131 not provide any information anymore.
139 132
14017. Queued up writeback on the new page is triggered. 13315. Queued up writeback on the new page is triggered.
141 134
14218. If swap pte's were generated for the page then replace them with real 13516. If migration entries were inserted into the page table then replace them with real ptes. Doing
143 ptes. This will reenable access for processes not blocked by the page lock. 136 so will enable access for user space processes not already waiting for
137 the page lock.
144 138
14519. The page locks are dropped from the old and new page. 13917. The page locks are dropped from the old and new page.
146 Processes waiting on the page lock can continue. 140 Processes waiting on the page lock will redo their page faults
141 and will reach the new page.
147 142
14820. The new page is moved to the LRU and can be scanned by the swapper 14318. The new page is moved to the LRU and can be scanned by the swapper
149 etc again. 144 etc again.
150 145
151TODO list 146Christoph Lameter, May 8, 2006.
152---------
153
154- Page migration requires the use of swap handles to preserve the
155 information of the anonymous page table entries. This means that swap
156 space is reserved but never used. The maximum number of swap handles used
157 is determined by CHUNK_SIZE (see mm/mempolicy.c) per ongoing migration.
158 Reservation of pages could be avoided by having a special type of swap
159 handle that does not require swap space and that would only track the page
160 references. Something like that was proposed by Marcelo Tosatti in the
161 past (search for migration cache on lkml or linux-mm@kvack.org).
162
163- Page migration unmaps ptes for file backed pages and requires page
164 faults to reestablish these ptes. This could be optimized by somehow
165 recording the references before migration and then reestablish them later.
166 However, there are several locking challenges that have to be overcome
167 before this is possible.
168
169- Page migration generates read ptes for anonymous pages. Dirty page
170 faults are required to make the pages writable again. It may be possible
171 to generate a pte marked dirty if it is known that the page is dirty and
172 that this process has the only reference to that page.
173
174Christoph Lameter, March 8, 2006.
175 147
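The callback-based allocation described in section A of page_migration above (a new_page_t style function is handed to migrate_pages(), which calls it for each page considered for moving) can be sketched as a toy userspace model. Everything here is hypothetical: struct toy_page, toy_new_page_t, toy_migrate_pages() and the static pool are illustration only and do not reproduce the kernel's types, signatures or locking. The model keeps just two ideas from the text: the allocator callback decides where the new page comes from, and a page with remaining references is not migrated.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a page; not the kernel's struct page. */
struct toy_page {
	int node;          /* NUMA node the page lives on */
	int extra_refs;    /* references we could not get rid of */
	char data[8];      /* page contents */
};

/* Callback that picks a new page for a given old page. */
typedef struct toy_page *toy_new_page_t(struct toy_page *old,
					unsigned long private_data);

/* Try to migrate every page in pages[]; return how many failed. */
static int toy_migrate_pages(struct toy_page **pages, size_t n,
			     toy_new_page_t *get_new_page,
			     unsigned long private_data)
{
	int failed = 0;

	for (size_t i = 0; i < n; i++) {
		struct toy_page *old = pages[i];
		struct toy_page *newp;

		if (old->extra_refs > 0) {   /* back out: still referenced */
			failed++;
			continue;
		}
		newp = get_new_page(old, private_data);
		if (!newp) {                 /* allocation failed */
			failed++;
			continue;
		}
		memcpy(newp->data, old->data, sizeof(newp->data));
		pages[i] = newp;             /* lookups now reach the new page */
	}
	return failed;
}

/* Example allocator: hand out pages from a small static pool and
 * place them on the node passed in private_data. */
static struct toy_page toy_pool[4];
static size_t toy_pool_used;

static struct toy_page *toy_alloc_on_node(struct toy_page *old,
					  unsigned long private_data)
{
	struct toy_page *p;

	(void)old;
	if (toy_pool_used >= sizeof(toy_pool) / sizeof(toy_pool[0]))
		return NULL;
	p = &toy_pool[toy_pool_used++];
	p->node = (int)private_data;
	p->extra_refs = 0;
	return p;
}
```

A caller would pass toy_alloc_on_node together with the target node number, then inspect the return value to decide how to handle the pages that stayed where they were.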