Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/CodingStyle | 100
-rw-r--r-- | Documentation/DocBook/kernel-api.tmpl | 13
-rw-r--r-- | Documentation/RCU/whatisRCU.txt | 1
-rw-r--r-- | Documentation/SubmitChecklist | 57
-rw-r--r-- | Documentation/devices.txt | 135
-rw-r--r-- | Documentation/feature-removal-schedule.txt | 15
-rw-r--r-- | Documentation/filesystems/Locking | 9
-rw-r--r-- | Documentation/filesystems/porting | 7
-rw-r--r-- | Documentation/filesystems/vfs.txt | 6
-rw-r--r-- | Documentation/ia64/aliasing.txt | 208
-rw-r--r-- | Documentation/ioctl-number.txt | 2
-rw-r--r-- | Documentation/kernel-parameters.txt | 3
-rw-r--r-- | Documentation/networking/tuntap.txt | 11
-rw-r--r-- | Documentation/power/swsusp.txt | 45
-rw-r--r-- | Documentation/power/video.txt | 4
-rw-r--r-- | Documentation/sparse.txt | 36
-rw-r--r-- | Documentation/sysctl/vm.txt | 13
-rw-r--r-- | Documentation/vm/page_migration | 114
18 files changed, 624 insertions, 155 deletions
diff --git a/Documentation/CodingStyle b/Documentation/CodingStyle index ce5d2c038cf5..6d2412ec91ed 100644 --- a/Documentation/CodingStyle +++ b/Documentation/CodingStyle | |||
@@ -155,7 +155,83 @@ problem, which is called the function-growth-hormone-imbalance syndrome. | |||
155 | See next chapter. | 155 | See next chapter. |
156 | 156 | ||
157 | 157 | ||
158 | Chapter 5: Functions | 158 | Chapter 5: Typedefs |
159 | |||
160 | Please don't use things like "vps_t". | ||
161 | |||
162 | It's a _mistake_ to use typedef for structures and pointers. When you see a | ||
163 | |||
164 | vps_t a; | ||
165 | |||
166 | in the source, what does it mean? | ||
167 | |||
168 | In contrast, if it says | ||
169 | |||
170 | struct virtual_container *a; | ||
171 | |||
172 | you can actually tell what "a" is. | ||
173 | |||
174 | Lots of people think that typedefs "help readability". Not so. They are | ||
175 | useful only for: | ||
176 | |||
177 | (a) totally opaque objects (where the typedef is actively used to _hide_ | ||
178 | what the object is). | ||
179 | |||
180 | Example: "pte_t" etc. opaque objects that you can only access using | ||
181 | the proper accessor functions. | ||
182 | |||
183 | NOTE! Opaqueness and "accessor functions" are not good in themselves. | ||
184 | The reason we have them for things like pte_t etc. is that there | ||
185 | really is absolutely _zero_ portably accessible information there. | ||
186 | |||
187 | (b) Clear integer types, where the abstraction _helps_ avoid confusion | ||
188 | whether it is "int" or "long". | ||
189 | |||
190 | u8/u16/u32 are perfectly fine typedefs, although they fit into | ||
191 | category (d) better than here. | ||
192 | |||
193 | NOTE! Again - there needs to be a _reason_ for this. If something is | ||
194 | "unsigned long", then there's no reason to do | ||
195 | |||
196 | typedef unsigned long myflags_t; | ||
197 | |||
198 | but if there is a clear reason for why it under certain circumstances | ||
199 | might be an "unsigned int" and under other configurations might be | ||
200 | "unsigned long", then by all means go ahead and use a typedef. | ||
201 | |||
202 | (c) when you use sparse to literally create a _new_ type for | ||
203 | type-checking. | ||
204 | |||
205 | (d) New types which are identical to standard C99 types, in certain | ||
206 | exceptional circumstances. | ||
207 | |||
208 | Although it would only take a short amount of time for the eyes and | ||
209 | brain to become accustomed to the standard types like 'uint32_t', | ||
210 | some people object to their use anyway. | ||
211 | |||
212 | Therefore, the Linux-specific 'u8/u16/u32/u64' types and their | ||
213 | signed equivalents which are identical to standard types are | ||
214 | permitted -- although they are not mandatory in new code of your | ||
215 | own. | ||
216 | |||
217 | When editing existing code which already uses one or the other set | ||
218 | of types, you should conform to the existing choices in that code. | ||
219 | |||
220 | (e) Types safe for use in userspace. | ||
221 | |||
222 | In certain structures which are visible to userspace, we cannot | ||
223 | require C99 types and cannot use the 'u32' form above. Thus, we | ||
224 | use __u32 and similar types in all structures which are shared | ||
225 | with userspace. | ||
226 | |||
227 | Maybe there are other cases too, but the rule should basically be to NEVER | ||
228 | EVER use a typedef unless you can clearly match one of those rules. | ||
229 | |||
230 | In general, a pointer, or a struct that has elements that can reasonably | ||
231 | be directly accessed should _never_ be a typedef. | ||
232 | |||
233 | |||
234 | Chapter 6: Functions | ||
159 | 235 | ||
160 | Functions should be short and sweet, and do just one thing. They should | 236 | Functions should be short and sweet, and do just one thing. They should |
161 | fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24, | 237 | fit on one or two screenfuls of text (the ISO/ANSI screen size is 80x24, |
@@ -183,7 +259,7 @@ and it gets confused. You know you're brilliant, but maybe you'd like | |||
183 | to understand what you did 2 weeks from now. | 259 | to understand what you did 2 weeks from now. |
184 | 260 | ||
185 | 261 | ||
186 | Chapter 6: Centralized exiting of functions | 262 | Chapter 7: Centralized exiting of functions |
187 | 263 | ||
188 | Albeit deprecated by some people, the equivalent of the goto statement is | 264 | Albeit deprecated by some people, the equivalent of the goto statement is |
189 | used frequently by compilers in form of the unconditional jump instruction. | 265 | used frequently by compilers in form of the unconditional jump instruction. |
@@ -220,7 +296,7 @@ out: | |||
220 | return result; | 296 | return result; |
221 | } | 297 | } |
222 | 298 | ||
223 | Chapter 7: Commenting | 299 | Chapter 8: Commenting |
224 | 300 | ||
225 | Comments are good, but there is also a danger of over-commenting. NEVER | 301 | Comments are good, but there is also a danger of over-commenting. NEVER |
226 | try to explain HOW your code works in a comment: it's much better to | 302 | try to explain HOW your code works in a comment: it's much better to |
@@ -240,7 +316,7 @@ When commenting the kernel API functions, please use the kerneldoc format. | |||
240 | See the files Documentation/kernel-doc-nano-HOWTO.txt and scripts/kernel-doc | 316 | See the files Documentation/kernel-doc-nano-HOWTO.txt and scripts/kernel-doc |
241 | for details. | 317 | for details. |
242 | 318 | ||
243 | Chapter 8: You've made a mess of it | 319 | Chapter 9: You've made a mess of it |
244 | 320 | ||
245 | That's OK, we all do. You've probably been told by your long-time Unix | 321 | That's OK, we all do. You've probably been told by your long-time Unix |
246 | user helper that "GNU emacs" automatically formats the C sources for | 322 | user helper that "GNU emacs" automatically formats the C sources for |
@@ -288,7 +364,7 @@ re-formatting you may want to take a look at the man page. But | |||
288 | remember: "indent" is not a fix for bad programming. | 364 | remember: "indent" is not a fix for bad programming. |
289 | 365 | ||
290 | 366 | ||
291 | Chapter 9: Configuration-files | 367 | Chapter 10: Configuration-files |
292 | 368 | ||
293 | For configuration options (arch/xxx/Kconfig, and all the Kconfig files), | 369 | For configuration options (arch/xxx/Kconfig, and all the Kconfig files), |
294 | somewhat different indentation is used. | 370 | somewhat different indentation is used. |
@@ -313,7 +389,7 @@ support for file-systems, for instance) should be denoted (DANGEROUS), other | |||
313 | experimental options should be denoted (EXPERIMENTAL). | 389 | experimental options should be denoted (EXPERIMENTAL). |
314 | 390 | ||
315 | 391 | ||
316 | Chapter 10: Data structures | 392 | Chapter 11: Data structures |
317 | 393 | ||
318 | Data structures that have visibility outside the single-threaded | 394 | Data structures that have visibility outside the single-threaded |
319 | environment they are created and destroyed in should always have | 395 | environment they are created and destroyed in should always have |
@@ -344,7 +420,7 @@ Remember: if another thread can find your data structure, and you don't | |||
344 | have a reference count on it, you almost certainly have a bug. | 420 | have a reference count on it, you almost certainly have a bug. |
345 | 421 | ||
346 | 422 | ||
347 | Chapter 11: Macros, Enums and RTL | 423 | Chapter 12: Macros, Enums and RTL |
348 | 424 | ||
349 | Names of macros defining constants and labels in enums are capitalized. | 425 | Names of macros defining constants and labels in enums are capitalized. |
350 | 426 | ||
@@ -399,7 +475,7 @@ The cpp manual deals with macros exhaustively. The gcc internals manual also | |||
399 | covers RTL which is used frequently with assembly language in the kernel. | 475 | covers RTL which is used frequently with assembly language in the kernel. |
400 | 476 | ||
401 | 477 | ||
402 | Chapter 12: Printing kernel messages | 478 | Chapter 13: Printing kernel messages |
403 | 479 | ||
404 | Kernel developers like to be seen as literate. Do mind the spelling | 480 | Kernel developers like to be seen as literate. Do mind the spelling |
405 | of kernel messages to make a good impression. Do not use crippled | 481 | of kernel messages to make a good impression. Do not use crippled |
@@ -410,7 +486,7 @@ Kernel messages do not have to be terminated with a period. | |||
410 | Printing numbers in parentheses (%d) adds no value and should be avoided. | 486 | Printing numbers in parentheses (%d) adds no value and should be avoided. |
411 | 487 | ||
412 | 488 | ||
413 | Chapter 13: Allocating memory | 489 | Chapter 14: Allocating memory |
414 | 490 | ||
415 | The kernel provides the following general purpose memory allocators: | 491 | The kernel provides the following general purpose memory allocators: |
416 | kmalloc(), kzalloc(), kcalloc(), and vmalloc(). Please refer to the API | 492 | kmalloc(), kzalloc(), kcalloc(), and vmalloc(). Please refer to the API |
@@ -429,7 +505,7 @@ from void pointer to any other pointer type is guaranteed by the C programming | |||
429 | language. | 505 | language. |
430 | 506 | ||
431 | 507 | ||
432 | Chapter 14: The inline disease | 508 | Chapter 15: The inline disease |
433 | 509 | ||
434 | There appears to be a common misperception that gcc has a magic "make me | 510 | There appears to be a common misperception that gcc has a magic "make me |
435 | faster" speedup option called "inline". While the use of inlines can be | 511 | faster" speedup option called "inline". While the use of inlines can be |
@@ -457,7 +533,7 @@ something it would have done anyway. | |||
457 | 533 | ||
458 | 534 | ||
459 | 535 | ||
460 | Chapter 15: References | 536 | Appendix I: References |
461 | 537 | ||
462 | The C Programming Language, Second Edition | 538 | The C Programming Language, Second Edition |
463 | by Brian W. Kernighan and Dennis M. Ritchie. | 539 | by Brian W. Kernighan and Dennis M. Ritchie. |
@@ -481,4 +557,4 @@ Kernel CodingStyle, by greg@kroah.com at OLS 2002: | |||
481 | http://www.kroah.com/linux/talks/ols_2002_kernel_codingstyle_talk/html/ | 557 | http://www.kroah.com/linux/talks/ols_2002_kernel_codingstyle_talk/html/ |
482 | 558 | ||
483 | -- | 559 | -- |
484 | Last updated on 30 December 2005 by a community effort on LKML. | 560 | Last updated on 30 April 2006. |
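The typedef guidance added to CodingStyle in this diff can be sketched in a few lines of C. This is only an illustration under stated assumptions: `struct virtual_container` borrows its name from the text, but its fields are made up, and `demo_pte_t` with its `demo_mk_pte()`, `demo_pte_pfn()` and `demo_pte_prot()` accessors is a hypothetical stand-in for an opaque type such as `pte_t`, not a kernel API.

```c
/* Hypothetical structure; only the name comes from the CodingStyle text. */
struct virtual_container {
	int id;
	unsigned long flags;
};

/* A plain struct pointer: the declaration tells the reader what "vc" is. */
int container_id(const struct virtual_container *vc)
{
	return vc->id;
}

/*
 * Category (a): an opaque object. demo_pte_t is a made-up stand-in for
 * pte_t; callers are expected to use the accessors below and never touch
 * the "raw" field directly.
 */
typedef struct {
	unsigned long raw;
} demo_pte_t;

#define DEMO_PROT_BITS 12UL
#define DEMO_PROT_MASK ((1UL << DEMO_PROT_BITS) - 1)

/* Pack a page frame number and protection bits into the opaque value. */
demo_pte_t demo_mk_pte(unsigned long pfn, unsigned long prot)
{
	demo_pte_t pte = { (pfn << DEMO_PROT_BITS) | (prot & DEMO_PROT_MASK) };
	return pte;
}

/* Accessor: recover the page frame number. */
unsigned long demo_pte_pfn(demo_pte_t pte)
{
	return pte.raw >> DEMO_PROT_BITS;
}

/* Accessor: recover the protection bits. */
unsigned long demo_pte_prot(demo_pte_t pte)
{
	return pte.raw & DEMO_PROT_MASK;
}
```

Wrapping the value in a one-member struct means the compiler rejects accidental arithmetic on a `demo_pte_t`, which is the kind of hiding that category (a) describes. The container, by contrast, has fields that can reasonably be accessed directly, so it stays a plain `struct`.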
diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl index ca02e04a906c..31b727ceb127 100644 --- a/Documentation/DocBook/kernel-api.tmpl +++ b/Documentation/DocBook/kernel-api.tmpl | |||
@@ -117,6 +117,7 @@ X!Ilib/string.c | |||
117 | <chapter id="mm"> | 117 | <chapter id="mm"> |
118 | <title>Memory Management in Linux</title> | 118 | <title>Memory Management in Linux</title> |
119 | <sect1><title>The Slab Cache</title> | 119 | <sect1><title>The Slab Cache</title> |
120 | !Iinclude/linux/slab.h | ||
120 | !Emm/slab.c | 121 | !Emm/slab.c |
121 | </sect1> | 122 | </sect1> |
122 | <sect1><title>User Space Memory Access</title> | 123 | <sect1><title>User Space Memory Access</title> |
@@ -331,6 +332,18 @@ X!Earch/i386/kernel/mca.c | |||
331 | !Esecurity/security.c | 332 | !Esecurity/security.c |
332 | </chapter> | 333 | </chapter> |
333 | 334 | ||
335 | <chapter id="audit"> | ||
336 | <title>Audit Interfaces</title> | ||
337 | !Ekernel/audit.c | ||
338 | !Ikernel/auditsc.c | ||
339 | !Ikernel/auditfilter.c | ||
340 | </chapter> | ||
341 | |||
342 | <chapter id="accounting"> | ||
343 | <title>Accounting Framework</title> | ||
344 | !Ikernel/acct.c | ||
345 | </chapter> | ||
346 | |||
334 | <chapter id="pmfuncs"> | 347 | <chapter id="pmfuncs"> |
335 | <title>Power Management</title> | 348 | <title>Power Management</title> |
336 | !Ekernel/power/pm.c | 349 | !Ekernel/power/pm.c |
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt index 07cb93b82ba9..6e459420ee9f 100644 --- a/Documentation/RCU/whatisRCU.txt +++ b/Documentation/RCU/whatisRCU.txt | |||
@@ -790,7 +790,6 @@ RCU pointer update: | |||
790 | 790 | ||
791 | RCU grace period: | 791 | RCU grace period: |
792 | 792 | ||
793 | synchronize_kernel (deprecated) | ||
794 | synchronize_net | 793 | synchronize_net |
795 | synchronize_sched | 794 | synchronize_sched |
796 | synchronize_rcu | 795 | synchronize_rcu |
diff --git a/Documentation/SubmitChecklist b/Documentation/SubmitChecklist new file mode 100644 index 000000000000..8230098da529 --- /dev/null +++ b/Documentation/SubmitChecklist | |||
@@ -0,0 +1,57 @@ | |||
1 | Linux Kernel patch submittal checklist | ||
2 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
3 | |||
4 | Here are some basic things that developers should do if they | ||
5 | want to see their kernel patch submittals accepted quicker. | ||
6 | |||
7 | These are all above and beyond the documentation that is provided | ||
8 | in Documentation/SubmittingPatches and elsewhere about submitting | ||
9 | Linux kernel patches. | ||
10 | |||
11 | |||
12 | |||
13 | - Builds cleanly with applicable or modified CONFIG options =y, =m, and =n. | ||
14 | No gcc warnings/errors, no linker warnings/errors. | ||
15 | |||
16 | - Passes allnoconfig, allmodconfig | ||
17 | |||
18 | - Builds on multiple CPU arch-es by using local cross-compile tools | ||
19 | or something like PLM at OSDL. | ||
20 | |||
21 | - ppc64 is a good architecture for cross-compilation checking because it | ||
22 | tends to use `unsigned long' for 64-bit quantities. | ||
23 | |||
24 | - Matches kernel coding style(!) | ||
25 | |||
26 | - Any new or modified CONFIG options don't muck up the config menu. | ||
27 | |||
28 | - All new Kconfig options have help text. | ||
29 | |||
30 | - Has been carefully reviewed with respect to relevant Kconfig | ||
31 | combinations. This is very hard to get right with testing -- | ||
32 | brainpower pays off here. | ||
33 | |||
34 | - Check cleanly with sparse. | ||
35 | |||
36 | - Use 'make checkstack' and 'make namespacecheck' and fix any | ||
37 | problems that they find. Note: checkstack does not point out | ||
38 | problems explicitly, but any one function that uses more than | ||
39 | 512 bytes on the stack is a candidate for change. | ||
40 | |||
41 | - Include kernel-doc to document global kernel APIs. (Not required | ||
42 | for static functions, but OK there also.) Use 'make htmldocs' | ||
43 | or 'make mandocs' to check the kernel-doc and fix any issues. | ||
44 | |||
45 | - Has been tested with CONFIG_PREEMPT, CONFIG_DEBUG_PREEMPT, | ||
46 | CONFIG_DEBUG_SLAB, CONFIG_DEBUG_PAGEALLOC, CONFIG_DEBUG_MUTEXES, | ||
47 | CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_SPINLOCK_SLEEP all simultaneously | ||
48 | enabled. | ||
49 | |||
50 | - Has been build- and runtime tested with and without CONFIG_SMP and | ||
51 | CONFIG_PREEMPT. | ||
52 | |||
53 | - If the patch affects IO/Disk, etc: has been tested with and without | ||
54 | CONFIG_LBD. | ||
55 | |||
56 | |||
57 | 2006-APR-27 | ||
diff --git a/Documentation/devices.txt b/Documentation/devices.txt index b369a8c46a73..b2f593fc76ca 100644 --- a/Documentation/devices.txt +++ b/Documentation/devices.txt | |||
@@ -3,7 +3,7 @@ | |||
3 | 3 | ||
4 | Maintained by Torben Mathiasen <device@lanana.org> | 4 | Maintained by Torben Mathiasen <device@lanana.org> |
5 | 5 | ||
6 | Last revised: 25 January 2005 | 6 | Last revised: 01 March 2006 |
7 | 7 | ||
8 | This list is the Linux Device List, the official registry of allocated | 8 | This list is the Linux Device List, the official registry of allocated |
9 | device numbers and /dev directory nodes for the Linux operating | 9 | device numbers and /dev directory nodes for the Linux operating |
@@ -94,7 +94,6 @@ Your cooperation is appreciated. | |||
94 | 9 = /dev/urandom Faster, less secure random number gen. | 94 | 9 = /dev/urandom Faster, less secure random number gen. |
95 | 10 = /dev/aio Asyncronous I/O notification interface | 95 | 10 = /dev/aio Asyncronous I/O notification interface |
96 | 11 = /dev/kmsg Writes to this come out as printk's | 96 | 11 = /dev/kmsg Writes to this come out as printk's |
97 | 12 = /dev/oldmem Access to crash dump from kexec kernel | ||
98 | 1 block RAM disk | 97 | 1 block RAM disk |
99 | 0 = /dev/ram0 First RAM disk | 98 | 0 = /dev/ram0 First RAM disk |
100 | 1 = /dev/ram1 Second RAM disk | 99 | 1 = /dev/ram1 Second RAM disk |
@@ -262,13 +261,13 @@ Your cooperation is appreciated. | |||
262 | NOTE: These devices permit both read and write access. | 261 | NOTE: These devices permit both read and write access. |
263 | 262 | ||
264 | 7 block Loopback devices | 263 | 7 block Loopback devices |
265 | 0 = /dev/loop0 First loopback device | 264 | 0 = /dev/loop0 First loop device |
266 | 1 = /dev/loop1 Second loopback device | 265 | 1 = /dev/loop1 Second loop device |
267 | ... | 266 | ... |
268 | 267 | ||
269 | The loopback devices are used to mount filesystems not | 268 | The loop devices are used to mount filesystems not |
270 | associated with block devices. The binding to the | 269 | associated with block devices. The binding to the |
271 | loopback devices is handled by mount(8) or losetup(8). | 270 | loop devices is handled by mount(8) or losetup(8). |
272 | 271 | ||
273 | 8 block SCSI disk devices (0-15) | 272 | 8 block SCSI disk devices (0-15) |
274 | 0 = /dev/sda First SCSI disk whole disk | 273 | 0 = /dev/sda First SCSI disk whole disk |
@@ -943,7 +942,7 @@ Your cooperation is appreciated. | |||
943 | 240 = /dev/ftlp FTL on 16th Memory Technology Device | 942 | 240 = /dev/ftlp FTL on 16th Memory Technology Device |
944 | 943 | ||
945 | Partitions are handled in the same way as for IDE | 944 | Partitions are handled in the same way as for IDE |
946 | disks (see major number 3) expect that the partition | 945 | disks (see major number 3) except that the partition |
947 | limit is 15 rather than 63 per disk (same as SCSI.) | 946 | limit is 15 rather than 63 per disk (same as SCSI.) |
948 | 947 | ||
949 | 45 char isdn4linux ISDN BRI driver | 948 | 45 char isdn4linux ISDN BRI driver |
@@ -1168,7 +1167,7 @@ Your cooperation is appreciated. | |||
1168 | The filename of the encrypted container and the passwords | 1167 | The filename of the encrypted container and the passwords |
1169 | are sent via ioctls (using the sdmount tool) to the master | 1168 | are sent via ioctls (using the sdmount tool) to the master |
1170 | node which then activates them via one of the | 1169 | node which then activates them via one of the |
1171 | /dev/scramdisk/x nodes for loopback mounting (all handled | 1170 | /dev/scramdisk/x nodes for loop mounting (all handled |
1172 | through the sdmount tool). | 1171 | through the sdmount tool). |
1173 | 1172 | ||
1174 | Requested by: andy@scramdisklinux.org | 1173 | Requested by: andy@scramdisklinux.org |
@@ -2538,18 +2537,32 @@ Your cooperation is appreciated. | |||
2538 | 0 = /dev/usb/lp0 First USB printer | 2537 | 0 = /dev/usb/lp0 First USB printer |
2539 | ... | 2538 | ... |
2540 | 15 = /dev/usb/lp15 16th USB printer | 2539 | 15 = /dev/usb/lp15 16th USB printer |
2541 | 16 = /dev/usb/mouse0 First USB mouse | ||
2542 | ... | ||
2543 | 31 = /dev/usb/mouse15 16th USB mouse | ||
2544 | 32 = /dev/usb/ez0 First USB firmware loader | ||
2545 | ... | ||
2546 | 47 = /dev/usb/ez15 16th USB firmware loader | ||
2547 | 48 = /dev/usb/scanner0 First USB scanner | 2540 | 48 = /dev/usb/scanner0 First USB scanner |
2548 | ... | 2541 | ... |
2549 | 63 = /dev/usb/scanner15 16th USB scanner | 2542 | 63 = /dev/usb/scanner15 16th USB scanner |
2550 | 64 = /dev/usb/rio500 Diamond Rio 500 | 2543 | 64 = /dev/usb/rio500 Diamond Rio 500 |
2551 | 65 = /dev/usb/usblcd USBLCD Interface (info@usblcd.de) | 2544 | 65 = /dev/usb/usblcd USBLCD Interface (info@usblcd.de) |
2552 | 66 = /dev/usb/cpad0 Synaptics cPad (mouse/LCD) | 2545 | 66 = /dev/usb/cpad0 Synaptics cPad (mouse/LCD) |
2546 | 96 = /dev/usb/hiddev0 1st USB HID device | ||
2547 | ... | ||
2548 | 111 = /dev/usb/hiddev15 16th USB HID device | ||
2549 | 112 = /dev/usb/auer0 1st auerswald ISDN device | ||
2550 | ... | ||
2551 | 127 = /dev/usb/auer15 16th auerswald ISDN device | ||
2552 | 128 = /dev/usb/brlvgr0 First Braille Voyager device | ||
2553 | ... | ||
2554 | 131 = /dev/usb/brlvgr3 Fourth Braille Voyager device | ||
2555 | 132 = /dev/usb/idmouse ID Mouse (fingerprint scanner) device | ||
2556 | 133 = /dev/usb/sisusbvga1 First SiSUSB VGA device | ||
2557 | ... | ||
2558 | 140 = /dev/usb/sisusbvga8 Eighth SiSUSB VGA device | ||
2559 | 144 = /dev/usb/lcd USB LCD device | ||
2560 | 160 = /dev/usb/legousbtower0 1st USB Legotower device | ||
2561 | ... | ||
2562 | 175 = /dev/usb/legousbtower15 16th USB Legotower device | ||
2563 | 240 = /dev/usb/dabusb0 First dabusb device | ||
2564 | ... | ||
2565 | 243 = /dev/usb/dabusb3 Fourth dabusb device | ||
2553 | 2566 | ||
2554 | 180 block USB block devices | 2567 | 180 block USB block devices |
2555 | 0 = /dev/uba First USB block device | 2568 | 0 = /dev/uba First USB block device |
@@ -2710,6 +2723,17 @@ Your cooperation is appreciated. | |||
2710 | 1 = /dev/cpu/1/msr MSRs on CPU 1 | 2723 | 1 = /dev/cpu/1/msr MSRs on CPU 1 |
2711 | ... | 2724 | ... |
2712 | 2725 | ||
2726 | 202 block Xen Virtual Block Device | ||
2727 | 0 = /dev/xvda First Xen VBD whole disk | ||
2728 | 16 = /dev/xvdb Second Xen VBD whole disk | ||
2729 | 32 = /dev/xvdc Third Xen VBD whole disk | ||
2730 | ... | ||
2731 | 240 = /dev/xvdp Sixteenth Xen VBD whole disk | ||
2732 | |||
2733 | Partitions are handled in the same way as for IDE | ||
2734 | disks (see major number 3) except that the limit on | ||
2735 | partitions is 15. | ||
2736 | |||
2713 | 203 char CPU CPUID information | 2737 | 203 char CPU CPUID information |
2714 | 0 = /dev/cpu/0/cpuid CPUID on CPU 0 | 2738 | 0 = /dev/cpu/0/cpuid CPUID on CPU 0 |
2715 | 1 = /dev/cpu/1/cpuid CPUID on CPU 1 | 2739 | 1 = /dev/cpu/1/cpuid CPUID on CPU 1 |
@@ -2747,11 +2771,26 @@ Your cooperation is appreciated. | |||
2747 | 46 = /dev/ttyCPM0 PPC CPM (SCC or SMC) - port 0 | 2771 | 46 = /dev/ttyCPM0 PPC CPM (SCC or SMC) - port 0 |
2748 | ... | 2772 | ... |
2749 | 47 = /dev/ttyCPM5 PPC CPM (SCC or SMC) - port 5 | 2773 | 47 = /dev/ttyCPM5 PPC CPM (SCC or SMC) - port 5 |
2750 | 50 = /dev/ttyIOC40 Altix serial card | 2774 | 50 = /dev/ttyIOC0 Altix serial card |
2775 | ... | ||
2776 | 81 = /dev/ttyIOC31 Altix serial card | ||
2777 | 82 = /dev/ttyVR0 NEC VR4100 series SIU | ||
2778 | 83 = /dev/ttyVR1 NEC VR4100 series DSIU | ||
2779 | 84 = /dev/ttyIOC84 Altix ioc4 serial card | ||
2780 | ... | ||
2781 | 115 = /dev/ttyIOC115 Altix ioc4 serial card | ||
2782 | 116 = /dev/ttySIOC0 Altix ioc3 serial card | ||
2783 | ... | ||
2784 | 147 = /dev/ttySIOC31 Altix ioc3 serial card | ||
2785 | 148 = /dev/ttyPSC0 PPC PSC - port 0 | ||
2786 | ... | ||
2787 | 153 = /dev/ttyPSC5 PPC PSC - port 5 | ||
2788 | 154 = /dev/ttyAT0 ATMEL serial port 0 | ||
2751 | ... | 2789 | ... |
2752 | 81 = /dev/ttyIOC431 Altix serial card | 2790 | 169 = /dev/ttyAT15 ATMEL serial port 15 |
2753 | 82 = /dev/ttyVR0 NEC VR4100 series SIU | 2791 | 170 = /dev/ttyNX0 Hilscher netX serial port 0 |
2754 | 83 = /dev/ttyVR1 NEC VR4100 series DSIU | 2792 | ... |
2793 | 185 = /dev/ttyNX15 Hilscher netX serial port 15 | ||
2755 | 2794 | ||
2756 | 205 char Low-density serial ports (alternate device) | 2795 | 205 char Low-density serial ports (alternate device) |
2757 | 0 = /dev/culu0 Callout device for ttyLU0 | 2796 | 0 = /dev/culu0 Callout device for ttyLU0 |
@@ -2786,8 +2825,8 @@ Your cooperation is appreciated. | |||
2786 | 50 = /dev/cuioc40 Callout device for ttyIOC40 | 2825 | 50 = /dev/cuioc40 Callout device for ttyIOC40 |
2787 | ... | 2826 | ... |
2788 | 81 = /dev/cuioc431 Callout device for ttyIOC431 | 2827 | 81 = /dev/cuioc431 Callout device for ttyIOC431 |
2789 | 82 = /dev/cuvr0 Callout device for ttyVR0 | 2828 | 82 = /dev/cuvr0 Callout device for ttyVR0 |
2790 | 83 = /dev/cuvr1 Callout device for ttyVR1 | 2829 | 83 = /dev/cuvr1 Callout device for ttyVR1 |
2791 | 2830 | ||
2792 | 2831 | ||
2793 | 206 char OnStream SC-x0 tape devices | 2832 | 206 char OnStream SC-x0 tape devices |
@@ -2897,7 +2936,6 @@ Your cooperation is appreciated. | |||
2897 | ... | 2936 | ... |
2898 | 196 = /dev/dvb/adapter3/video0 first video decoder of fourth card | 2937 | 196 = /dev/dvb/adapter3/video0 first video decoder of fourth card |
2899 | 2938 | ||
2900 | |||
2901 | 216 char Bluetooth RFCOMM TTY devices | 2939 | 216 char Bluetooth RFCOMM TTY devices |
2902 | 0 = /dev/rfcomm0 First Bluetooth RFCOMM TTY device | 2940 | 0 = /dev/rfcomm0 First Bluetooth RFCOMM TTY device |
2903 | 1 = /dev/rfcomm1 Second Bluetooth RFCOMM TTY device | 2941 | 1 = /dev/rfcomm1 Second Bluetooth RFCOMM TTY device |
@@ -3002,12 +3040,43 @@ Your cooperation is appreciated. | |||
3002 | ioctl()'s can be used to rewind the tape regardless of | 3040 | ioctl()'s can be used to rewind the tape regardless of |
3003 | the device used to access it. | 3041 | the device used to access it. |
3004 | 3042 | ||
3005 | 231 char InfiniBand MAD | 3043 | 231 char InfiniBand |
3006 | 0 = /dev/infiniband/umad0 | 3044 | 0 = /dev/infiniband/umad0 |
3007 | 1 = /dev/infiniband/umad1 | 3045 | 1 = /dev/infiniband/umad1 |
3008 | ... | 3046 | ... |
3047 | 63 = /dev/infiniband/umad63 64th InfiniBand MAD device | ||
3048 | 64 = /dev/infiniband/issm0 First InfiniBand IsSM device | ||
3049 | 65 = /dev/infiniband/issm1 Second InfiniBand IsSM device | ||
3050 | ... | ||
3051 | 127 = /dev/infiniband/issm63 64th InfiniBand IsSM device | ||
3052 | 128 = /dev/infiniband/uverbs0 First InfiniBand verbs device | ||
3053 | 129 = /dev/infiniband/uverbs1 Second InfiniBand verbs device | ||
3054 | ... | ||
3055 | 159 = /dev/infiniband/uverbs31 32nd InfiniBand verbs device | ||
3056 | |||
3057 | 232 char Biometric Devices | ||
3058 | 0 = /dev/biometric/sensor0/fingerprint first fingerprint sensor on first device | ||
3059 | 1 = /dev/biometric/sensor0/iris first iris sensor on first device | ||
3060 | 2 = /dev/biometric/sensor0/retina first retina sensor on first device | ||
3061 | 3 = /dev/biometric/sensor0/voiceprint first voiceprint sensor on first device | ||
3062 | 4 = /dev/biometric/sensor0/facial first facial sensor on first device | ||
3063 | 5 = /dev/biometric/sensor0/hand first hand sensor on first device | ||
3064 | ... | ||
3065 | 10 = /dev/biometric/sensor1/fingerprint first fingerprint sensor on second device | ||
3066 | ... | ||
3067 | 20 = /dev/biometric/sensor2/fingerprint first fingerprint sensor on third device | ||
3068 | ... | ||
3009 | 3069 | ||
3010 | 232-239 UNASSIGNED | 3070 | 233 char PathScale InfiniPath interconnect |
3071 | 0 = /dev/ipath Primary device for programs (any unit) | ||
3072 | 1 = /dev/ipath0 Access specifically to unit 0 | ||
3073 | 2 = /dev/ipath1 Access specifically to unit 1 | ||
3074 | ... | ||
3075 | 4 = /dev/ipath3 Access specifically to unit 3 | ||
3076 | 129 = /dev/ipath_sma Device used by Subnet Management Agent | ||
3077 | 130 = /dev/ipath_diag Device used by diagnostics programs | ||
3078 | |||
3079 | 234-239 UNASSIGNED | ||
3011 | 3080 | ||
3012 | 240-254 char LOCAL/EXPERIMENTAL USE | 3081 | 240-254 char LOCAL/EXPERIMENTAL USE |
3013 | 240-254 block LOCAL/EXPERIMENTAL USE | 3082 | 240-254 block LOCAL/EXPERIMENTAL USE |
@@ -3021,6 +3090,24 @@ Your cooperation is appreciated. | |||
3021 | This major is reserved to assist the expansion to a | 3090 | This major is reserved to assist the expansion to a |
3022 | larger number space. No device nodes with this major | 3091 | larger number space. No device nodes with this major |
3023 | should ever be created on the filesystem. | 3092 | should ever be created on the filesystem. |
3093 | (This is probably not true anymore, but I'll leave it | ||
3094 | for now /Torben) | ||
3095 | |||
3096 | ---LARGE MAJORS!!!!!--- | ||
3097 | |||
3098 | 256 char Equinox SST multi-port serial boards | ||
3099 | 0 = /dev/ttyEQ0 First serial port on first Equinox SST board | ||
3100 | 127 = /dev/ttyEQ127 Last serial port on first Equinox SST board | ||
3101 | 128 = /dev/ttyEQ128 First serial port on second Equinox SST board | ||
3102 | ... | ||
3103 | 1027 = /dev/ttyEQ1027 Last serial port on eighth Equinox SST board | ||
3104 | |||
3105 | 256 block Resident Flash Disk Flash Translation Layer | ||
3106 | 0 = /dev/rfda First RFD FTL layer | ||
3107 | 16 = /dev/rfdb Second RFD FTL layer | ||
3108 | ... | ||
3109 | 240 = /dev/rfdp 16th RFD FTL layer | ||
3110 | |||
3024 | 3111 | ||
3025 | **** ADDITIONAL /dev DIRECTORY ENTRIES | 3112 | **** ADDITIONAL /dev DIRECTORY ENTRIES |
3026 | 3113 | ||
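The Xen Virtual Block Device entry added in this diff (block major 202) lists whole disks at minors 0, 16, 32, ..., 240 and limits each disk to 15 partitions. A minimal sketch of that numbering follows, assuming minor 0 of each group of 16 is the whole disk and minors 1 to 15 are its partitions, as with the SCSI-style layout the entry describes. The helper name `xvd_minor()` and the two constants are hypothetical and not taken from any driver.

```c
/* Assumed layout: 16 minors per disk, minor 0 of each group = whole disk. */
#define XVD_MINORS_PER_DISK 16
/* /dev/xvda through /dev/xvdp, as listed in the devices.txt entry. */
#define XVD_MAX_DISKS 16

/*
 * Return the minor number for a given disk index (0 = xvda) and partition
 * (0 = whole disk, 1..15 = partitions), or -1 if either is out of range.
 */
int xvd_minor(int disk, int partition)
{
	if (disk < 0 || disk >= XVD_MAX_DISKS)
		return -1;
	if (partition < 0 || partition >= XVD_MINORS_PER_DISK)
		return -1;
	return disk * XVD_MINORS_PER_DISK + partition;
}
```

Under these assumptions, `xvd_minor(1, 0)` gives 16 (`/dev/xvdb`) and `xvd_minor(15, 0)` gives 240 (`/dev/xvdp`), matching the table.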
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index f7293297f326..027285d0c26c 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt | |||
@@ -33,21 +33,6 @@ Who: Adrian Bunk <bunk@stusta.de> | |||
33 | 33 | ||
34 | --------------------------- | 34 | --------------------------- |
35 | 35 | ||
36 | What: RCU API moves to EXPORT_SYMBOL_GPL | ||
37 | When: April 2006 | ||
38 | Files: include/linux/rcupdate.h, kernel/rcupdate.c | ||
39 | Why: Outside of Linux, the only implementations of anything even | ||
40 | vaguely resembling RCU that I am aware of are in DYNIX/ptx, | ||
41 | VM/XA, Tornado, and K42. I do not expect anyone to port binary | ||
42 | drivers or kernel modules from any of these, since the first two | ||
43 | are owned by IBM and the last two are open-source research OSes. | ||
44 | So these will move to GPL after a grace period to allow | ||
45 | people, who might be using implementations that I am not aware | ||
46 | of, to adjust to this upcoming change. | ||
47 | Who: Paul E. McKenney <paulmck@us.ibm.com> | ||
48 | |||
49 | --------------------------- | ||
50 | |||
51 | What: raw1394: requests of type RAW1394_REQ_ISO_SEND, RAW1394_REQ_ISO_LISTEN | 36 | What: raw1394: requests of type RAW1394_REQ_ISO_SEND, RAW1394_REQ_ISO_LISTEN |
52 | When: November 2006 | 37 | When: November 2006 |
53 | Why: Deprecated in favour of the new ioctl-based rawiso interface, which is | 38 | Why: Deprecated in favour of the new ioctl-based rawiso interface, which is |
diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index 1045da582b9b..d31efbbdfe50 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking | |||
@@ -99,7 +99,7 @@ prototypes: | |||
99 | int (*sync_fs)(struct super_block *sb, int wait); | 99 | int (*sync_fs)(struct super_block *sb, int wait); |
100 | void (*write_super_lockfs) (struct super_block *); | 100 | void (*write_super_lockfs) (struct super_block *); |
101 | void (*unlockfs) (struct super_block *); | 101 | void (*unlockfs) (struct super_block *); |
102 | int (*statfs) (struct super_block *, struct kstatfs *); | 102 | int (*statfs) (struct dentry *, struct kstatfs *); |
103 | int (*remount_fs) (struct super_block *, int *, char *); | 103 | int (*remount_fs) (struct super_block *, int *, char *); |
104 | void (*clear_inode) (struct inode *); | 104 | void (*clear_inode) (struct inode *); |
105 | void (*umount_begin) (struct super_block *); | 105 | void (*umount_begin) (struct super_block *); |
@@ -142,15 +142,16 @@ see also dquot_operations section. | |||
142 | 142 | ||
143 | --------------------------- file_system_type --------------------------- | 143 | --------------------------- file_system_type --------------------------- |
144 | prototypes: | 144 | prototypes: |
145 | struct super_block *(*get_sb) (struct file_system_type *, int, | 145 | int (*get_sb) (struct file_system_type *, int, |
146 | const char *, void *); | 146 | const char *, void *, struct vfsmount *); |
147 | void (*kill_sb) (struct super_block *); | 147 | void (*kill_sb) (struct super_block *); |
148 | locking rules: | 148 | locking rules: |
149 | may block BKL | 149 | may block BKL |
150 | get_sb yes yes | 150 | get_sb yes yes |
151 | kill_sb yes yes | 151 | kill_sb yes yes |
152 | 152 | ||
153 | ->get_sb() returns error or a locked superblock (exclusive on ->s_umount). | 153 | ->get_sb() returns error or 0 with locked superblock attached to the vfsmount |
154 | (exclusive on ->s_umount). | ||
154 | ->kill_sb() takes a write-locked superblock, does all shutdown work on it, | 155 | ->kill_sb() takes a write-locked superblock, does all shutdown work on it, |
155 | unlocks and drops the reference. | 156 | unlocks and drops the reference. |
156 | 157 | ||
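The changed ->get_sb() rule above (return an error, or 0 with the locked superblock attached to the vfsmount) can be illustrated with a small user-space model. This is only a sketch: the mock_ structures and mock_get_sb() are hypothetical stand-ins written for this illustration, not the kernel definitions, and the "lock" is just a flag.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumption, not real). */
struct mock_super_block {
	int s_umount_held;	/* models ->s_umount being held exclusively */
};

struct mock_vfsmount {
	struct mock_super_block *mnt_sb;
};

/*
 * Models the new convention: on success return 0 and attach the locked
 * superblock to the vfsmount; on failure return a negative errno and
 * leave the vfsmount untouched.
 */
int mock_get_sb(struct mock_super_block *sb, int fail,
		struct mock_vfsmount *mnt)
{
	if (fail)
		return -EINVAL;

	sb->s_umount_held = 1;
	mnt->mnt_sb = sb;
	return 0;
}
```

The caller no longer receives the superblock as a return value; it inspects the integer result and, on success, finds the superblock through the vfsmount.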
diff --git a/Documentation/filesystems/porting b/Documentation/filesystems/porting index 2f388460cbe7..5531694059ab 100644 --- a/Documentation/filesystems/porting +++ b/Documentation/filesystems/porting | |||
@@ -50,10 +50,11 @@ Turn your foo_read_super() into a function that would return 0 in case of | |||
50 | success and negative number in case of error (-EINVAL unless you have more | 50 | success and negative number in case of error (-EINVAL unless you have more |
51 | informative error value to report). Call it foo_fill_super(). Now declare | 51 | informative error value to report). Call it foo_fill_super(). Now declare |
52 | 52 | ||
53 | struct super_block foo_get_sb(struct file_system_type *fs_type, | 53 | int foo_get_sb(struct file_system_type *fs_type, |
54 | int flags, const char *dev_name, void *data) | 54 | int flags, const char *dev_name, void *data, struct vfsmount *mnt) |
55 | { | 55 | { |
56 | return get_sb_bdev(fs_type, flags, dev_name, data, ext2_fill_super); | 56 | return get_sb_bdev(fs_type, flags, dev_name, data, foo_fill_super, |
57 | mnt); | ||
57 | } | 58 | } |
58 | 59 | ||
59 | (or similar with s/bdev/nodev/ or s/bdev/single/, depending on the kind of | 60 | (or similar with s/bdev/nodev/ or s/bdev/single/, depending on the kind of |
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt index 3a2e5520c1e3..9d3aed628bc1 100644 --- a/Documentation/filesystems/vfs.txt +++ b/Documentation/filesystems/vfs.txt | |||
@@ -113,8 +113,8 @@ members are defined: | |||
113 | struct file_system_type { | 113 | struct file_system_type { |
114 | const char *name; | 114 | const char *name; |
115 | int fs_flags; | 115 | int fs_flags; |
116 | struct super_block *(*get_sb) (struct file_system_type *, int, | 116 | int (*get_sb) (struct file_system_type *, int, |
117 | const char *, void *); | 117 | const char *, void *, struct vfsmount *); |
118 | void (*kill_sb) (struct super_block *); | 118 | void (*kill_sb) (struct super_block *); |
119 | struct module *owner; | 119 | struct module *owner; |
120 | struct file_system_type * next; | 120 | struct file_system_type * next; |
@@ -211,7 +211,7 @@ struct super_operations { | |||
211 | int (*sync_fs)(struct super_block *sb, int wait); | 211 | int (*sync_fs)(struct super_block *sb, int wait); |
212 | void (*write_super_lockfs) (struct super_block *); | 212 | void (*write_super_lockfs) (struct super_block *); |
213 | void (*unlockfs) (struct super_block *); | 213 | void (*unlockfs) (struct super_block *); |
214 | int (*statfs) (struct super_block *, struct kstatfs *); | 214 | int (*statfs) (struct dentry *, struct kstatfs *); |
215 | int (*remount_fs) (struct super_block *, int *, char *); | 215 | int (*remount_fs) (struct super_block *, int *, char *); |
216 | void (*clear_inode) (struct inode *); | 216 | void (*clear_inode) (struct inode *); |
217 | void (*umount_begin) (struct super_block *); | 217 | void (*umount_begin) (struct super_block *); |
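With ->statfs() now taking a dentry instead of a superblock, an implementation that still needs the superblock would presumably reach it through the dentry. The following user-space sketch shows that shape under stated assumptions: the demo_ structures are hypothetical simplifications, and only the d_sb back-pointer is meant to mirror the real struct dentry field.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel types. */
struct demo_super_block {
	unsigned long s_blocksize;
};

struct demo_dentry {
	struct demo_super_block *d_sb;	/* superblock this dentry belongs to */
};

struct demo_kstatfs {
	long f_bsize;
};

/* New-style statfs: start from the dentry, then follow d_sb. */
int demo_statfs(struct demo_dentry *dentry, struct demo_kstatfs *buf)
{
	struct demo_super_block *sb = dentry->d_sb;

	buf->f_bsize = (long)sb->s_blocksize;
	return 0;
}
```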
diff --git a/Documentation/ia64/aliasing.txt b/Documentation/ia64/aliasing.txt new file mode 100644 index 000000000000..38f9a52d1820 --- /dev/null +++ b/Documentation/ia64/aliasing.txt | |||
@@ -0,0 +1,208 @@ | |||
1 | MEMORY ATTRIBUTE ALIASING ON IA-64 | ||
2 | |||
3 | Bjorn Helgaas | ||
4 | <bjorn.helgaas@hp.com> | ||
5 | May 4, 2006 | ||
6 | |||
7 | |||
8 | MEMORY ATTRIBUTES | ||
9 | |||
10 | Itanium supports several attributes for virtual memory references. | ||
11 | The attribute is part of the virtual translation, i.e., it is | ||
12 | contained in the TLB entry. The ones of most interest to the Linux | ||
13 | kernel are: | ||
14 | |||
15 | WB Write-back (cacheable) | ||
16 | UC Uncacheable | ||
17 | WC Write-coalescing | ||
18 | |||
19 | System memory typically uses the WB attribute. The UC attribute is | ||
20 | used for memory-mapped I/O devices. The WC attribute is uncacheable | ||
21 | like UC is, but writes may be delayed and combined to increase | ||
22 | performance for things like frame buffers. | ||
23 | |||
24 | The Itanium architecture requires that we avoid accessing the same | ||
25 | page with both a cacheable mapping and an uncacheable mapping[1]. | ||
26 | |||
27 | The design of the chipset determines which attributes are supported | ||
28 | on which regions of the address space. For example, some chipsets | ||
29 | support either WB or UC access to main memory, while others support | ||
30 | only WB access. | ||
31 | |||
32 | MEMORY MAP | ||
33 | |||
34 | Platform firmware describes the physical memory map and the | ||
35 | supported attributes for each region. At boot-time, the kernel uses | ||
36 | the EFI GetMemoryMap() interface. ACPI can also describe memory | ||
37 | devices and the attributes they support, but Linux/ia64 currently | ||
38 | doesn't use this information. | ||
39 | |||
40 | The kernel uses the efi_memmap table returned from GetMemoryMap() to | ||
41 | learn the attributes supported by each region of physical address | ||
42 | space. Unfortunately, this table does not completely describe the | ||
43 | address space because some machines omit some or all of the MMIO | ||
44 | regions from the map. | ||
45 | |||
46 | The kernel maintains another table, kern_memmap, which describes the | ||
47 | memory Linux is actually using and the attribute for each region. | ||
48 | This contains only system memory; it does not contain MMIO space. | ||
49 | |||
50 | The kern_memmap table typically contains only a subset of the system | ||
51 | memory described by the efi_memmap. Linux/ia64 can't use all memory | ||
52 | in the system because of constraints imposed by the identity mapping | ||
53 | scheme. | ||
54 | |||
55 | The efi_memmap table is preserved unmodified because the original | ||
56 | boot-time information is required for kexec. | ||
57 | |||
58 | KERNEL IDENTITY MAPPINGS | ||
59 | |||
60 | Linux/ia64 identity mappings are done with large pages, currently | ||
61 | either 16MB or 64MB, referred to as "granules." Cacheable mappings | ||
62 | are speculative[2], so the processor can read any location in the | ||
63 | page at any time, independent of the programmer's intentions. This | ||
64 | means that to avoid attribute aliasing, Linux can create a cacheable | ||
65 | identity mapping only when the entire granule supports cacheable | ||
66 | access. | ||
67 | |||
68 | Therefore, kern_memmap contains only full granule-sized regions that | ||
69 | can be referenced safely by an identity mapping. | ||
70 | |||
71 | Uncacheable mappings are not speculative, so the processor will | ||
72 | generate UC accesses only to locations explicitly referenced by | ||
73 | software. This allows UC identity mappings to cover granules that | ||
74 | are only partially populated, or populated with a combination of UC | ||
75 | and WB regions. | ||
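The rule that kern_memmap holds only full granule-sized regions amounts to trimming each cacheable range inward to granule boundaries. A minimal sketch of that arithmetic follows, assuming a 16MB granule; demo_trim_to_granules() and DEMO_GRANULE_SIZE are names invented for this illustration and are not taken from the ia64 code.

```c
#include <stdint.h>

/* Assumed granule size: 16MB (the text also mentions 64MB). */
#define DEMO_GRANULE_SIZE	(16ULL << 20)

/*
 * Shrink [start, end) to the whole granules it contains.
 * Returns 1 and fills *gstart/*gend if at least one full granule fits,
 * 0 otherwise. Overflow near the top of the address space is ignored.
 */
int demo_trim_to_granules(uint64_t start, uint64_t end,
			  uint64_t *gstart, uint64_t *gend)
{
	uint64_t mask = ~(DEMO_GRANULE_SIZE - 1);
	uint64_t s = (start + DEMO_GRANULE_SIZE - 1) & mask;	/* round up */
	uint64_t e = end & mask;				/* round down */

	if (s >= e)
		return 0;

	*gstart = s;
	*gend = e;
	return 1;
}
```

A WB range of 1MB-64MB would yield the granules from 16MB to 64MB, while a range such as 0-0xA0000 contains no full granule and would get no cacheable identity mapping.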
76 | |||
77 | USER MAPPINGS | ||
78 | |||
79 | User mappings are typically done with 16K or 64K pages. The smaller | ||
80 | page size allows more flexibility because only 16K or 64K has to be | ||
81 | homogeneous with respect to memory attributes. | ||
82 | |||
83 | POTENTIAL ATTRIBUTE ALIASING CASES | ||
84 | |||
85 | There are several ways the kernel creates new mappings: | ||
86 | |||
87 | mmap of /dev/mem | ||
88 | |||
89 | This uses remap_pfn_range(), which creates user mappings. These | ||
90 | mappings may be either WB or UC. If the region being mapped | ||
91 | happens to be in kern_memmap, meaning that it may also be mapped | ||
92 | by a kernel identity mapping, the user mapping must use the same | ||
93 | attribute as the kernel mapping. | ||
94 | |||
95 | If the region is not in kern_memmap, the user mapping should use | ||
96 | an attribute reported as being supported in the EFI memory map. | ||
97 | |||
98 | Since the EFI memory map does not describe MMIO on some | ||
99 | machines, this should use an uncacheable mapping as a fallback. | ||
100 | |||
101 | mmap of /sys/class/pci_bus/.../legacy_mem | ||
102 | |||
103 | This is very similar to mmap of /dev/mem, except that legacy_mem | ||
104 | only allows mmap of the one megabyte "legacy MMIO" area for a | ||
105 | specific PCI bus. Typically this is the first megabyte of | ||
106 | physical address space, but it may be different on machines with | ||
107 | several VGA devices. | ||
108 | |||
109 | "X" uses this to access VGA frame buffers. Using legacy_mem | ||
110 | rather than /dev/mem allows multiple instances of X to talk to | ||
111 | different VGA cards. | ||
112 | |||
113 | The /dev/mem mmap constraints apply. | ||
114 | |||
115 | However, since this is for mapping legacy MMIO space, WB access | ||
116 | does not make sense. This matters on machines without legacy | ||
117 | VGA support: these machines may have WB memory for the entire | ||
118 | first megabyte (or even the entire first granule). | ||
119 | |||
120 | On these machines, we could mmap legacy_mem as WB, which would | ||
121 | be safe in terms of attribute aliasing, but X has no way of | ||
122 | knowing that it is accessing regular memory, not a frame buffer, | ||
123 | so the kernel should fail the mmap rather than doing it with WB. | ||
124 | |||
125 | read/write of /dev/mem | ||
126 | |||
127 | This uses copy_from_user(), which implicitly uses a kernel | ||
128 | identity mapping. This is obviously safe for things in | ||
129 | kern_memmap. | ||
130 | |||
131 | There may be corner cases of things that are not in kern_memmap, | ||
132 | but could be accessed this way. For example, registers in MMIO | ||
133 | space are not in kern_memmap, but could be accessed with a UC | ||
134 | mapping. This would not cause attribute aliasing. But | ||
135 | registers typically can be accessed only with four-byte or | ||
136 | eight-byte accesses, and the copy_from_user() path doesn't allow | ||
137 | any control over the access size, so this would be dangerous. | ||
138 | |||
139 | ioremap() | ||
140 | |||
141 | This returns a kernel identity mapping for use inside the | ||
142 | kernel. | ||
143 | |||
144 | If the region is in kern_memmap, we should use the attribute | ||
145 | specified there. Otherwise, if the EFI memory map reports that | ||
146 | the entire granule supports WB, we should use that (granules | ||
147 | that are partially reserved or occupied by firmware do not appear | ||
148 | in kern_memmap). Otherwise, we should use a UC mapping. | ||
149 | |||
150 | PAST PROBLEM CASES | ||
151 | |||
152 | mmap of various MMIO regions from /dev/mem by "X" on Intel platforms | ||
153 | |||
154 | The EFI memory map may not report these MMIO regions. | ||
155 | |||
156 | These must be allowed so that X will work. This means that | ||
157 | when the EFI memory map is incomplete, every /dev/mem mmap must | ||
158 | succeed. It may create either WB or UC user mappings, depending | ||
159 | on whether the region is in kern_memmap or the EFI memory map. | ||
160 | |||
161 | mmap of 0x0-0xA0000 /dev/mem by "hwinfo" on HP sx1000 with VGA enabled | ||
162 | |||
163 | See https://bugzilla.novell.com/show_bug.cgi?id=140858. | ||
164 | |||
165 | The EFI memory map reports the following attributes: | ||
166 | 0x00000-0x9FFFF WB only | ||
167 | 0xA0000-0xBFFFF UC only (VGA frame buffer) | ||
168 | 0xC0000-0xFFFFF WB only | ||
169 | |||
170 | This mmap is done with user pages, not kernel identity mappings, | ||
171 | so it is safe to use WB mappings. | ||
172 | |||
173 | The kernel VGA driver may ioremap the VGA frame buffer at 0xA0000, | ||
174 | which will use a granule-sized UC mapping covering 0-0xFFFFF. This | ||
175 | granule covers some WB-only memory, but since UC is non-speculative, | ||
176 | the processor will never generate an uncacheable reference to the | ||
177 | WB-only areas unless the driver explicitly touches them. | ||
178 | |||
179 | mmap of 0x0-0xFFFFF legacy_mem by "X" | ||
180 | |||
181 | If the EFI memory map reports this entire range as WB, there | ||
182 | is no VGA MMIO hole, and the mmap should fail or be done with | ||
183 | a WB mapping. | ||
184 | |||
185 | There's no easy way for X to determine whether the 0xA0000-0xBFFFF | ||
186 | region is a frame buffer or just memory, so I think it's best to | ||
187 | just fail this mmap request rather than using a WB mapping. As | ||
188 | far as I know, there's no need to map legacy_mem with WB | ||
189 | mappings. | ||
190 | |||
191 | Otherwise, a UC mapping of the entire region is probably safe. | ||
192 | The VGA hole means the region will not be in kern_memmap. The | ||
193 | HP sx1000 chipset doesn't support UC access to the memory surrounding | ||
194 | the VGA hole, but X doesn't need that area anyway and should not | ||
195 | reference it. | ||
196 | |||
197 | mmap of 0xA0000-0xBFFFF legacy_mem by "X" on HP sx1000 with VGA disabled | ||
198 | |||
199 | The EFI memory map reports the following attributes: | ||
200 | 0x00000-0xFFFFF WB only (no VGA MMIO hole) | ||
201 | |||
202 | This is a special case of the previous case, and the mmap should | ||
203 | fail for the same reason as above. | ||
204 | |||
205 | NOTES | ||
206 | |||
207 | [1] SDM rev 2.2, vol 2, sec 4.4.1. | ||
208 | [2] SDM rev 2.2, vol 2, sec 4.4.6. | ||
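The ioremap() attribute choice described above (kern_memmap first, then a whole-granule WB check against the EFI memory map, then UC as the fallback) can be sketched in user space as follows. All names here (demo_region, demo_lookup, demo_ioremap_attr) are hypothetical, the granule is assumed to be 16MB, and the sketch makes the simplifying assumption that a granule counts as WB-capable only if a single table entry covers it entirely.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_GRANULE	(16ULL << 20)	/* assumed 16MB granule */

enum demo_attr { DEMO_WB = 1, DEMO_UC = 2 };

/* One table entry: physical range [start, end) and supported attributes. */
struct demo_region {
	uint64_t start, end;
	unsigned attrs;		/* bitmask of enum demo_attr values */
};

/* Attributes of the entry fully containing [start, end), or 0 if none. */
unsigned demo_lookup(const struct demo_region *map, size_t n,
		     uint64_t start, uint64_t end)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (map[i].start <= start && end <= map[i].end)
			return map[i].attrs;
	return 0;
}

/* Pick the attribute for an ioremap() of the given physical address. */
enum demo_attr demo_ioremap_attr(const struct demo_region *kern, size_t nk,
				 const struct demo_region *efi, size_t ne,
				 uint64_t phys)
{
	uint64_t granule = phys & ~(DEMO_GRANULE - 1);
	unsigned a = demo_lookup(kern, nk, phys, phys + 1);

	if (a)		/* in kern_memmap: reuse the kernel's attribute */
		return (a & DEMO_WB) ? DEMO_WB : DEMO_UC;

	/* not in kern_memmap: WB only if the entire granule supports it */
	if (demo_lookup(efi, ne, granule, granule + DEMO_GRANULE) & DEMO_WB)
		return DEMO_WB;

	return DEMO_UC;	/* fallback, e.g. MMIO missing from the EFI map */
}
```

Fed the HP sx1000 table from the text (WB, UC and WB pieces inside the first megabyte), an ioremap of 0xA0000 falls through to UC because no WB entry covers the whole first granule.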
diff --git a/Documentation/ioctl-number.txt b/Documentation/ioctl-number.txt index 171a44ebd939..1543802ef53e 100644 --- a/Documentation/ioctl-number.txt +++ b/Documentation/ioctl-number.txt | |||
@@ -85,7 +85,9 @@ Code Seq# Include File Comments | |||
85 | <mailto:maassen@uni-freiburg.de> | 85 | <mailto:maassen@uni-freiburg.de> |
86 | 'C' all linux/soundcard.h | 86 | 'C' all linux/soundcard.h |
87 | 'D' all asm-s390/dasd.h | 87 | 'D' all asm-s390/dasd.h |
88 | 'E' all linux/input.h | ||
88 | 'F' all linux/fb.h | 89 | 'F' all linux/fb.h |
90 | 'H' all linux/hiddev.h | ||
89 | 'I' all linux/isdn.h | 91 | 'I' all linux/isdn.h |
90 | 'J' 00-1F drivers/scsi/gdth_ioctl.h | 92 | 'J' 00-1F drivers/scsi/gdth_ioctl.h |
91 | 'K' all linux/kd.h | 93 | 'K' all linux/kd.h |
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt index a9d3a1794b23..bca6f389da66 100644 --- a/Documentation/kernel-parameters.txt +++ b/Documentation/kernel-parameters.txt | |||
@@ -147,6 +147,9 @@ running once the system is up. | |||
147 | acpi_irq_isa= [HW,ACPI] If irq_balance, mark listed IRQs used by ISA | 147 | acpi_irq_isa= [HW,ACPI] If irq_balance, mark listed IRQs used by ISA |
148 | Format: <irq>,<irq>... | 148 | Format: <irq>,<irq>... |
149 | 149 | ||
150 | acpi_os_name= [HW,ACPI] Tell ACPI BIOS the name of the OS | ||
151 | Format: To spoof as Windows 98: ="Microsoft Windows" | ||
152 | |||
150 | acpi_osi= [HW,ACPI] empty param disables _OSI | 153 | acpi_osi= [HW,ACPI] empty param disables _OSI |
151 | 154 | ||
152 | acpi_serialize [HW,ACPI] force serialization of AML methods | 155 | acpi_serialize [HW,ACPI] force serialization of AML methods |
diff --git a/Documentation/networking/tuntap.txt b/Documentation/networking/tuntap.txt index 76750fb9151a..839cbb71388b 100644 --- a/Documentation/networking/tuntap.txt +++ b/Documentation/networking/tuntap.txt | |||
@@ -39,10 +39,13 @@ Copyright (C) 1999-2000 Maxim Krasnyansky <max_mk@yahoo.com> | |||
39 | mknod /dev/net/tun c 10 200 | 39 | mknod /dev/net/tun c 10 200 |
40 | 40 | ||
41 | Set permissions: | 41 | Set permissions: |
42 | e.g. chmod 0700 /dev/net/tun | 42 | e.g. chmod 0666 /dev/net/tun |
43 | if you want the device only accessible by root. Giving regular users the | 43 | There's no harm in allowing the device to be accessible by non-root users, |
44 | right to assign network devices is NOT a good idea. Users could assign | 44 | since CAP_NET_ADMIN is required for creating network devices or for |
45 | bogus network interfaces to trick firewalls or administrators. | 45 | connecting to network devices which aren't owned by the user in question. |
46 | If you want to create persistent devices and give ownership of them to | ||
47 | unprivileged users, then you need the /dev/net/tun device to be usable by | ||
48 | those users. | ||
46 | 49 | ||
47 | Driver module autoloading | 50 | Driver module autoloading |
48 | 51 | ||
diff --git a/Documentation/power/swsusp.txt b/Documentation/power/swsusp.txt index 516c5019013b..823b2cf6e3dc 100644 --- a/Documentation/power/swsusp.txt +++ b/Documentation/power/swsusp.txt | |||
@@ -350,9 +350,34 @@ Q: How do I make suspend more verbose? | |||
350 | 350 | ||
351 | A: If you want to see any non-error kernel messages on the virtual | 351 | A: If you want to see any non-error kernel messages on the virtual |
352 | terminal the kernel switches to during suspend, you have to set the | 352 | terminal the kernel switches to during suspend, you have to set the |
353 | kernel console loglevel to at least 5, for example by doing | 353 | kernel console loglevel to at least 4 (KERN_WARNING), for example by |
354 | 354 | doing | |
355 | echo 5 > /proc/sys/kernel/printk | 355 | |
356 | # save the old loglevel | ||
357 | read LOGLEVEL DUMMY < /proc/sys/kernel/printk | ||
358 | # set the loglevel so we see the progress bar. | ||
359 | # if the level is higher than needed, we leave it alone. | ||
360 | if [ $LOGLEVEL -lt 5 ]; then | ||
361 | echo 5 > /proc/sys/kernel/printk | ||
362 | fi | ||
363 | |||
364 | IMG_SZ=0 | ||
365 | read IMG_SZ < /sys/power/image_size | ||
366 | echo -n disk > /sys/power/state | ||
367 | RET=$? | ||
368 | # | ||
369 | # the logic here is: | ||
370 | # if image_size > 0 (without kernel support, IMG_SZ will be zero), | ||
371 | # then try again with image_size set to zero. | ||
372 | if [ $RET -ne 0 -a $IMG_SZ -ne 0 ]; then # try again with minimal image size | ||
373 | echo 0 > /sys/power/image_size | ||
374 | echo -n disk > /sys/power/state | ||
375 | RET=$? | ||
376 | fi | ||
377 | |||
378 | # restore previous loglevel | ||
379 | echo $LOGLEVEL > /proc/sys/kernel/printk | ||
380 | exit $RET | ||
356 | 381 | ||
357 | Q: Is this true that if I have a mounted filesystem on a USB device and | 382 | Q: Is this true that if I have a mounted filesystem on a USB device and |
358 | I suspend to disk, I can lose data unless the filesystem has been mounted | 383 | I suspend to disk, I can lose data unless the filesystem has been mounted |
@@ -380,3 +405,17 @@ safest thing is to unmount all filesystems on removable media (such USB, | |||
380 | Firewire, CompactFlash, MMC, external SATA, or even IDE hotplug bays) | 405 | Firewire, CompactFlash, MMC, external SATA, or even IDE hotplug bays) |
381 | before suspending; then remount them after resuming. | 406 | before suspending; then remount them after resuming. |
382 | 407 | ||
408 | Q: I upgraded the kernel from 2.6.15 to 2.6.16. Both kernels were | ||
409 | compiled with similar configuration files. However, I found that | ||
410 | suspend to disk (and resume) is much slower on 2.6.16 compared to | ||
411 | 2.6.15. Any idea why that might happen or how I can speed it up? | ||
412 | |||
413 | A: This is because the size of the suspend image is now greater than | ||
414 | for 2.6.15 (by saving more data we can get a more responsive system | ||
415 | after resume). | ||
416 | |||
417 | There's the /sys/power/image_size knob that controls the size of the | ||
418 | image. If you set it to 0 (eg. by echo 0 > /sys/power/image_size as | ||
419 | root), the 2.6.15 behavior should be restored. If it is still too | ||
420 | slow, take a look at suspend.sf.net -- userland suspend is faster and | ||
421 | supports LZF compression to speed it up further. | ||
diff --git a/Documentation/power/video.txt b/Documentation/power/video.txt index 43a889f8f08d..d859faa3a463 100644 --- a/Documentation/power/video.txt +++ b/Documentation/power/video.txt | |||
@@ -90,6 +90,7 @@ Table of known working notebooks: | |||
90 | Model hack (or "how to do it") | 90 | Model hack (or "how to do it") |
91 | ------------------------------------------------------------------------------ | 91 | ------------------------------------------------------------------------------ |
92 | Acer Aspire 1406LC ole's late BIOS init (7), turn off DRI | 92 | Acer Aspire 1406LC ole's late BIOS init (7), turn off DRI |
93 | Acer TM 230 s3_bios (2) | ||
93 | Acer TM 242FX vbetool (6) | 94 | Acer TM 242FX vbetool (6) |
94 | Acer TM C110 video_post (8) | 95 | Acer TM C110 video_post (8) |
95 | Acer TM C300 vga=normal (only suspend on console, not in X), vbetool (6) or video_post (8) | 96 | Acer TM C300 vga=normal (only suspend on console, not in X), vbetool (6) or video_post (8) |
@@ -115,6 +116,7 @@ Dell D610 vga=normal and X (possibly vbestate (6) too, but not tested) | |||
115 | Dell Inspiron 4000 ??? (*) | 116 | Dell Inspiron 4000 ??? (*) |
116 | Dell Inspiron 500m ??? (*) | 117 | Dell Inspiron 500m ??? (*) |
117 | Dell Inspiron 510m ??? | 118 | Dell Inspiron 510m ??? |
119 | Dell Inspiron 5150 vbetool needed (6) | ||
118 | Dell Inspiron 600m ??? (*) | 120 | Dell Inspiron 600m ??? (*) |
119 | Dell Inspiron 8200 ??? (*) | 121 | Dell Inspiron 8200 ??? (*) |
120 | Dell Inspiron 8500 ??? (*) | 122 | Dell Inspiron 8500 ??? (*) |
@@ -125,6 +127,7 @@ HP NX7000 ??? (*) | |||
125 | HP Pavilion ZD7000 vbetool post needed, need open-source nv driver for X | 127 | HP Pavilion ZD7000 vbetool post needed, need open-source nv driver for X |
126 | HP Omnibook XE3 athlon version none (1) | 128 | HP Omnibook XE3 athlon version none (1) |
127 | HP Omnibook XE3GC none (1), video is S3 Savage/IX-MV | 129 | HP Omnibook XE3GC none (1), video is S3 Savage/IX-MV |
130 | HP Omnibook XE3L-GF vbetool (6) | ||
128 | HP Omnibook 5150 none (1), (S1 also works OK) | 131 | HP Omnibook 5150 none (1), (S1 also works OK) |
129 | IBM TP T20, model 2647-44G none (1), video is S3 Inc. 86C270-294 Savage/IX-MV, vesafb gets "interesting" but X work. | 132 | IBM TP T20, model 2647-44G none (1), video is S3 Inc. 86C270-294 Savage/IX-MV, vesafb gets "interesting" but X work. |
130 | IBM TP A31 / Type 2652-M5G s3_mode (3) [works ok with BIOS 1.04 2002-08-23, but not at all with BIOS 1.11 2004-11-05 :-(] | 133 | IBM TP A31 / Type 2652-M5G s3_mode (3) [works ok with BIOS 1.04 2002-08-23, but not at all with BIOS 1.11 2004-11-05 :-(] |
@@ -157,6 +160,7 @@ Sony Vaio vgn-s260 X or boot-radeon can init it (5) | |||
157 | Sony Vaio vgn-S580BH vga=normal, but suspend from X. Console will be blank unless you return to X. | 160 | Sony Vaio vgn-S580BH vga=normal, but suspend from X. Console will be blank unless you return to X. |
158 | Sony Vaio vgn-FS115B s3_bios (2),s3_mode (4) | 161 | Sony Vaio vgn-FS115B s3_bios (2),s3_mode (4) |
159 | Toshiba Libretto L5 none (1) | 162 | Toshiba Libretto L5 none (1) |
163 | Toshiba Libretto 100CT/110CT vbetool (6) | ||
160 | Toshiba Portege 3020CT s3_mode (3) | 164 | Toshiba Portege 3020CT s3_mode (3) |
161 | Toshiba Satellite 4030CDT s3_mode (3) (S1 also works OK) | 165 | Toshiba Satellite 4030CDT s3_mode (3) (S1 also works OK) |
162 | Toshiba Satellite 4080XCDT s3_mode (3) (S1 also works OK) | 166 | Toshiba Satellite 4080XCDT s3_mode (3) (S1 also works OK) |
diff --git a/Documentation/sparse.txt b/Documentation/sparse.txt index 3f1c5464b1c9..5a311c38dd1a 100644 --- a/Documentation/sparse.txt +++ b/Documentation/sparse.txt | |||
@@ -1,5 +1,6 @@ | |||
1 | Copyright 2004 Linus Torvalds | 1 | Copyright 2004 Linus Torvalds |
2 | Copyright 2004 Pavel Machek <pavel@suse.cz> | 2 | Copyright 2004 Pavel Machek <pavel@suse.cz> |
3 | Copyright 2006 Bob Copeland <me@bobcopeland.com> | ||
3 | 4 | ||
4 | Using sparse for typechecking | 5 | Using sparse for typechecking |
5 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 6 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
@@ -41,15 +42,8 @@ sure that bitwise types don't get mixed up (little-endian vs big-endian | |||
41 | vs cpu-endian vs whatever), and there the constant "0" really _is_ | 42 | vs cpu-endian vs whatever), and there the constant "0" really _is_ |
42 | special. | 43 | special. |
43 | 44 | ||
44 | Use | 45 | Getting sparse |
45 | 46 | ~~~~~~~~~~~~~~ | |
46 | make C=[12] CF=-Wbitwise | ||
47 | |||
48 | or you don't get any checking at all. | ||
49 | |||
50 | |||
51 | Where to get sparse | ||
52 | ~~~~~~~~~~~~~~~~~~~ | ||
53 | 47 | ||
54 | With git, you can just get it from | 48 | With git, you can just get it from |
55 | 49 | ||
@@ -57,7 +51,7 @@ With git, you can just get it from | |||
57 | 51 | ||
58 | and DaveJ has tar-balls at | 52 | and DaveJ has tar-balls at |
59 | 53 | ||
60 | http://www.codemonkey.org.uk/projects/git-snapshots/sparse/ | 54 | http://www.codemonkey.org.uk/projects/git-snapshots/sparse/ |
61 | 55 | ||
62 | 56 | ||
63 | Once you have it, just do | 57 | Once you have it, just do |
@@ -65,8 +59,20 @@ Once you have it, just do | |||
65 | make | 59 | make |
66 | make install | 60 | make install |
67 | 61 | ||
68 | as your regular user, and it will install sparse in your ~/bin directory. | 62 | as a regular user, and it will install sparse in your ~/bin directory. |
69 | After that, doing a kernel make with "make C=1" will run sparse on all the | 63 | |
70 | C files that get recompiled, or with "make C=2" will run sparse on the | 64 | Using sparse |
71 | files whether they need to be recompiled or not (ie the latter is fast way | 65 | ~~~~~~~~~~~~ |
72 | to check the whole tree if you have already built it). | 66 | |
67 | Do a kernel make with "make C=1" to run sparse on all the C files that get | ||
68 | recompiled, or use "make C=2" to run sparse on the files whether they need to | ||
69 | be recompiled or not. The latter is a fast way to check the whole tree if you | ||
70 | have already built it. | ||
71 | |||
72 | The optional make variable CF can be used to pass arguments to sparse. The | ||
73 | build system passes -Wbitwise to sparse automatically. To perform endianness | ||
74 | checks, you may define __CHECK_ENDIAN__: | ||
75 | |||
76 | make C=2 CF="-D__CHECK_ENDIAN__" | ||
77 | |||
78 | These checks are disabled by default as they generate a host of warnings. | ||
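To give a feel for what the endianness checks look at, here is a small user-space sketch of the bitwise-type idiom. It is an illustration under assumptions: demo_bitwise, demo_force, demo_le32 and the two conversion helpers are names made up for this example (the kernel has its own equivalents), and the attributes only take effect when the file is run through sparse, which defines __CHECKER__. With a plain compiler the annotations vanish and the code builds as ordinary C.

```c
#include <stdint.h>
#include <string.h>

#ifdef __CHECKER__
/* Under sparse: make the type distinct and allow explicit casts only. */
#define demo_bitwise	__attribute__((bitwise))
#define demo_force	__attribute__((force))
#else
#define demo_bitwise
#define demo_force
#endif

/* A 32-bit value whose in-memory byte order is little-endian. */
typedef uint32_t demo_bitwise demo_le32;

/* Convert a CPU-order value to little-endian storage order. */
demo_le32 demo_cpu_to_le32(uint32_t v)
{
	unsigned char b[4] = {
		(unsigned char)(v & 0xff),
		(unsigned char)((v >> 8) & 0xff),
		(unsigned char)((v >> 16) & 0xff),
		(unsigned char)((v >> 24) & 0xff),
	};
	uint32_t raw;

	memcpy(&raw, b, sizeof(raw));
	return (demo_force demo_le32)raw;
}

/* Convert little-endian storage order back to a CPU-order value. */
uint32_t demo_le32_to_cpu(demo_le32 v)
{
	uint32_t raw = (demo_force uint32_t)v;
	unsigned char b[4];

	memcpy(b, &raw, sizeof(b));
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

Assigning a demo_le32 to a plain uint32_t without going through demo_le32_to_cpu() is the kind of mix-up sparse is meant to flag, while the explicit demo_force casts inside the helpers mark the intended conversions.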
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt index a46c10fcddfc..2dc246af4885 100644 --- a/Documentation/sysctl/vm.txt +++ b/Documentation/sysctl/vm.txt | |||
@@ -29,6 +29,7 @@ Currently, these files are in /proc/sys/vm: | |||
29 | - drop-caches | 29 | - drop-caches |
30 | - zone_reclaim_mode | 30 | - zone_reclaim_mode |
31 | - zone_reclaim_interval | 31 | - zone_reclaim_interval |
32 | - panic_on_oom | ||
32 | 33 | ||
33 | ============================================================== | 34 | ============================================================== |
34 | 35 | ||
@@ -178,3 +179,15 @@ Time is set in seconds and set by default to 30 seconds. | |||
178 | Reduce the interval if undesired off node allocations occur. However, too | 179 | Reduce the interval if undesired off node allocations occur. However, too |
179 | frequent scans will have a negative impact on off node allocation performance. | 180 | frequent scans will have a negative impact on off node allocation performance. |
180 | 181 | ||
182 | ============================================================= | ||
183 | |||
184 | panic_on_oom | ||
185 | |||
186 | This enables or disables the panic-on-out-of-memory feature. If this is set | ||
187 | to 1, the kernel panics when an out-of-memory condition occurs. If this is set | ||
188 | to 0, the kernel invokes the oom_killer, which kills some rogue process. | ||
189 | Usually, oom_killer can kill rogue processes and the system will survive. If | ||
190 | you want to panic the system rather than kill rogue processes, set this to 1. | ||
191 | |||
192 | The default value is 0. | ||
193 | |||
diff --git a/Documentation/vm/page_migration b/Documentation/vm/page_migration index 0dd4ef30c361..99f89aa10169 100644 --- a/Documentation/vm/page_migration +++ b/Documentation/vm/page_migration | |||
@@ -26,8 +26,13 @@ a process are located. See also the numa_maps manpage in the numactl package. | |||
26 | Manual migration is useful if for example the scheduler has relocated | 26 | Manual migration is useful if for example the scheduler has relocated |
27 | a process to a processor on a distant node. A batch scheduler or an | 27 | a process to a processor on a distant node. A batch scheduler or an |
28 | administrator may detect the situation and move the pages of the process | 28 | administrator may detect the situation and move the pages of the process |
29 | nearer to the new processor. At some point in the future we may have | 29 | nearer to the new processor. The kernel itself provides only |
30 | some mechanism in the scheduler that will automatically move the pages. | 30 | manual page migration support. Automatic page migration may be implemented |
31 | through user space processes that move pages. A special function call | ||
32 | "move_pages" allows the moving of individual pages within a process. | ||
33 | A NUMA profiler may, for example, obtain a log showing frequent off node | ||
34 | accesses and may use the result to move pages to more advantageous | ||
35 | locations. | ||
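As a rough illustration of the user-space side, the sketch below asks the kernel which node currently holds a single page. It relies on the move_pages system call as it is known from its manual page (passing a NULL node array only queries page status); the text above does not spell out the interface, so treat the argument layout as an assumption. demo_query_page_node() is a hypothetical helper name, and the raw syscall() form is used so that no NUMA library is needed.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Query the node of the page containing addr in the calling process.
 * Returns 0 and stores the node (or a negative errno for that page) in
 * *status_out; returns -1 with errno set if the call itself fails, for
 * example on kernels built without NUMA migration support.
 */
long demo_query_page_node(const void *addr, int *status_out)
{
#ifdef SYS_move_pages
	uintptr_t mask = ~((uintptr_t)sysconf(_SC_PAGESIZE) - 1);
	void *pages[1] = { (void *)((uintptr_t)addr & mask) };
	int status[1] = { -1 };
	/* pid 0 = current process; NULL nodes = report status only */
	long rc = syscall(SYS_move_pages, 0, 1UL, pages, (void *)0, status, 0);

	if (rc == 0)
		*status_out = status[0];
	return rc;
#else
	(void)addr;
	(void)status_out;
	errno = ENOSYS;
	return -1;
#endif
}
```

A profiler-style tool could call this for hot addresses and, where the reported node is remote, issue a second call with a target node array to request the actual move.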
31 | 36 | ||
32 | Larger installations usually partition the system using cpusets into | 37 | Larger installations usually partition the system using cpusets into |
33 | sections of nodes. Paul Jackson has equipped cpusets with the ability to | 38 | sections of nodes. Paul Jackson has equipped cpusets with the ability to |
@@ -62,22 +67,14 @@ A. In kernel use of migrate_pages() | |||
62 | It also prevents the swapper or other scans from encountering | 67 | It also prevents the swapper or other scans from encountering |
63 | the page. | 68 | the page. |
64 | 69 | ||
65 | 2. Generate a list of newly allocates page. These pages will contain the | 70 | 2. We need to have a function of type new_page_t that can be |
66 | contents of the pages from the first list after page migration is | 71 | passed to migrate_pages(). This function should figure out |
67 | complete. | 72 | how to allocate the correct new page given the old page. |
68 | 73 | ||
69 | 3. The migrate_pages() function is called which attempts | 74 | 3. The migrate_pages() function is called which attempts |
70 | to do the migration. It returns the moved pages in the | 75 | to do the migration. It will call the function to allocate |
71 | list specified as the third parameter and the failed | 76 | the new page for each page that is considered for |
72 | migrations in the fourth parameter. The first parameter | 77 | moving. |
73 | will contain the pages that could still be retried. | ||
74 | |||
75 | 4. The leftover pages of various types are returned | ||
76 | to the LRU using putback_to_lru_pages() or otherwise | ||
77 | disposed of. The pages will still have the refcount as | ||
78 | increased by isolate_lru_pages() if putback_to_lru_pages() is not | ||
79 | used! The kernel may want to handle the various cases of failures in | ||
80 | different ways. | ||
81 | 78 | ||
82 | B. How migrate_pages() works | 79 | B. How migrate_pages() works |
83 | ---------------------------- | 80 | ---------------------------- |
@@ -93,83 +90,58 @@ Steps: | |||
93 | 90 | ||
94 | 2. Ensure that writeback is complete. | 91 | 2. Ensure that writeback is complete. |
95 | 92 | ||
96 | 3. Make sure that the page has assigned swap cache entry if | 93 | 3. Prep the new page that we want to move to. It is locked |
97 | it is an anonyous page. The swap cache reference is necessary | ||
98 | to preserve the information contain in the page table maps while | ||
99 | page migration occurs. | ||
100 | |||
101 | 4. Prep the new page that we want to move to. It is locked | ||
102 | and set to not being uptodate so that all accesses to the new | 94 | and set to not being uptodate so that all accesses to the new |
103 | page immediately lock while the move is in progress. | 95 | page immediately lock while the move is in progress. |
104 | 96 | ||
105 | 5. All the page table references to the page are either dropped (file | 97 | 4. The new page is prepped with some settings from the old page so that |
106 | backed pages) or converted to swap references (anonymous pages). | 98 | accesses to the new page will discover a page with the correct settings. |
107 | This should decrease the reference count. | 99 | |
100 | 5. All the page table references to the page are converted | ||
101 | to migration entries or dropped (nonlinear vmas). | ||
102 | This decreases the mapcount of the page. If the resulting | ||
103 | mapcount is not zero then we do not migrate the page. | ||
104 | All user space processes that attempt to access the page | ||
105 | will now wait on the page lock. | ||
108 | 106 | ||
109 | 6. The radix tree lock is taken. This will cause all processes trying | 107 | 6. The radix tree lock is taken. This will cause all processes trying |
110 | to reestablish a pte to block on the radix tree spinlock. | 108 | to access the page via the mapping to block on the radix tree spinlock. |
111 | 109 | ||
112 | 7. The refcount of the page is examined and we back out if references remain | 110 | 7. The refcount of the page is examined and we back out if references remain |
113 | otherwise we know that we are the only one referencing this page. | 111 | otherwise we know that we are the only one referencing this page. |
114 | 112 | ||
115 | 8. The radix tree is checked and if it does not contain the pointer to this | 113 | 8. The radix tree is checked and if it does not contain the pointer to this |
116 | page then we back out because someone else modified the mapping first. | 114 | page then we back out because someone else modified the radix tree. |
117 | |||
118 | 9. The mapping is checked. If the mapping is gone then a truncate action may | ||
119 | be in progress and we back out. | ||
120 | |||
121 | 10. The new page is prepped with some settings from the old page so that | ||
122 | accesses to the new page will be discovered to have the correct settings. | ||
123 | 115 | ||
124 | 11. The radix tree is changed to point to the new page. | 116 | 9. The radix tree is changed to point to the new page. |
125 | 117 | ||
126 | 12. The reference count of the old page is dropped because the radix tree | 118 | 10. The reference count of the old page is dropped because the radix tree |
127 | reference is gone. | 119 | reference is gone. A reference to the new page is established because |
120 | the new page is referenced by the radix tree. | ||
128 | 121 | ||
129 | 13. The radix tree lock is dropped. With that lookups become possible again | 122 | 11. The radix tree lock is dropped. With that lookups in the mapping |
130 | and other processes will move from spinning on the tree lock to sleeping on | 123 | become possible again. Processes will move from spinning on the tree_lock |
131 | the locked new page. | 124 | to sleeping on the locked new page. |
132 | 125 | ||
133 | 14. The page contents are copied to the new page. | 126 | 12. The page contents are copied to the new page. |
134 | 127 | ||
135 | 15. The remaining page flags are copied to the new page. | 128 | 13. The remaining page flags are copied to the new page. |
136 | 129 | ||
137 | 16. The old page flags are cleared to indicate that the page does | 130 | 14. The old page flags are cleared to indicate that the page does |
138 | not use any information anymore. | 131 | not provide any information anymore. |
139 | 132 | ||
140 | 17. Queued up writeback on the new page is triggered. | 133 | 15. Queued up writeback on the new page is triggered. |
141 | 134 | ||
142 | 18. If swap pte's were generated for the page then replace them with real | 135 | 16. If migration entries were generated for the page then replace them with real ptes. Doing |
143 | ptes. This will reenable access for processes not blocked by the page lock. | 136 | so will enable access for user space processes not already waiting for |
137 | the page lock. | ||
144 | 138 | ||
145 | 19. The page locks are dropped from the old and new page. | 139 | 19. The page locks are dropped from the old and new page. |
146 | Processes waiting on the page lock can continue. | 140 | Processes waiting on the page lock will redo their page faults |
141 | and will reach the new page. | ||
147 | 142 | ||
148 | 20. The new page is moved to the LRU and can be scanned by the swapper | 143 | 20. The new page is moved to the LRU and can be scanned by the swapper |
149 | etc again. | 144 | etc again. |
150 | 145 | ||
151 | TODO list | 146 | Christoph Lameter, May 8, 2006. |
152 | --------- | ||
153 | |||
154 | - Page migration requires the use of swap handles to preserve the | ||
155 | information of the anonymous page table entries. This means that swap | ||
156 | space is reserved but never used. The maximum number of swap handles used | ||
157 | is determined by CHUNK_SIZE (see mm/mempolicy.c) per ongoing migration. | ||
158 | Reservation of pages could be avoided by having a special type of swap | ||
159 | handle that does not require swap space and that would only track the page | ||
160 | references. Something like that was proposed by Marcelo Tosatti in the | ||
161 | past (search for migration cache on lkml or linux-mm@kvack.org). | ||
162 | |||
163 | - Page migration unmaps ptes for file backed pages and requires page | ||
164 | faults to reestablish these ptes. This could be optimized by somehow | ||
165 | recording the references before migration and then reestablish them later. | ||
166 | However, there are several locking challenges that have to be overcome | ||
167 | before this is possible. | ||
168 | |||
169 | - Page migration generates read ptes for anonymous pages. Dirty page | ||
170 | faults are required to make the pages writable again. It may be possible | ||
171 | to generate a pte marked dirty if it is known that the page is dirty and | ||
172 | that this process has the only reference to that page. | ||
173 | |||
174 | Christoph Lameter, March 8, 2006. | ||
175 | 147 | ||
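
The new steps 2 and 3 of section A describe a callback-driven design: the caller supplies an allocation function, and migrate_pages() calls it for each page it considers moving. The user-space sketch below is a toy model of that pattern only, not kernel code. Every name in it (toy_page, toy_new_page_t, alloc_on_node, toy_migrate_pages) is hypothetical, the callback signature is an assumption rather than the real new_page_t prototype, and locking, the radix tree and migration entries are all left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 64

/* Toy stand-in for struct page; NOT the kernel structure. */
struct toy_page {
	int node;       /* NUMA node the page lives on */
	int mapcount;   /* page table references still present */
	unsigned char data[TOY_PAGE_SIZE];
};

/*
 * Hypothetical callback type modelled on the new_page_t idea:
 * given the old page, decide how to allocate the new one.
 */
typedef struct toy_page *toy_new_page_t(const struct toy_page *old,
					unsigned long private_data);

/* Example allocator: place the new page on the node in private_data. */
static struct toy_page *alloc_on_node(const struct toy_page *old,
				      unsigned long private_data)
{
	struct toy_page *p = calloc(1, sizeof(*p));

	(void)old;	/* a real allocator could inspect the old page */
	if (p)
		p->node = (int)private_data;
	return p;
}

/*
 * Try to move every page in pages[]. A page that is still mapped
 * (mapcount != 0) or whose allocation fails is left in place.
 * Returns the number of pages that were not migrated.
 */
static int toy_migrate_pages(struct toy_page **pages, size_t n,
			     toy_new_page_t *get_new_page,
			     unsigned long private_data)
{
	int failed = 0;

	assert(get_new_page != NULL);
	for (size_t i = 0; i < n; i++) {
		struct toy_page *old = pages[i];
		struct toy_page *newp;

		if (old->mapcount != 0) {	/* references remain: back out */
			failed++;
			continue;
		}
		newp = get_new_page(old, private_data);
		if (!newp) {
			failed++;
			continue;
		}
		memcpy(newp->data, old->data, TOY_PAGE_SIZE); /* copy contents */
		free(old);
		pages[i] = newp;	/* the slot now points at the new page */
	}
	return failed;
}
```

In this model the caller no longer builds a list of preallocated target pages; it only passes a policy (here, "allocate on node N") and checks the return value to see how many pages stayed where they were.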