Diffstat (limited to 'Documentation')
-rw-r--r--  Documentation/cpu-hotplug.txt                 27
-rw-r--r--  Documentation/cpusets.txt                     41
-rw-r--r--  Documentation/dvb/bt8xx.txt                    6
-rw-r--r--  Documentation/feature-removal-schedule.txt    18
-rw-r--r--  Documentation/filesystems/ntfs.txt             6
-rw-r--r--  Documentation/filesystems/tmpfs.txt           30
-rw-r--r--  Documentation/filesystems/v9fs.txt            16
-rw-r--r--  Documentation/fujitsu/frv/kernel-ABI.txt     234
-rw-r--r--  Documentation/hwmon/w83627hf                   4
-rw-r--r--  Documentation/kernel-parameters.txt           26
-rw-r--r--  Documentation/kprobes.txt                     81
-rw-r--r--  Documentation/mips/AU1xxx_IDE.README           6
-rw-r--r--  Documentation/scsi/ChangeLog.megaraid_sas     23
-rw-r--r--  Documentation/sysctl/kernel.txt               10
-rw-r--r--  Documentation/video4linux/CARDLIST.saa7134     4
-rw-r--r--  Documentation/vm/page_migration              118
-rw-r--r--  Documentation/x86_64/boot-options.txt          4
17 files changed, 530 insertions, 124 deletions
diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt
index 08c5d04f3086..57a09f99ecb0 100644
--- a/Documentation/cpu-hotplug.txt
+++ b/Documentation/cpu-hotplug.txt
@@ -11,6 +11,8 @@
11 Joel Schopp <jschopp@austin.ibm.com> 11 Joel Schopp <jschopp@austin.ibm.com>
12 ia64/x86_64: 12 ia64/x86_64:
13 Ashok Raj <ashok.raj@intel.com> 13 Ashok Raj <ashok.raj@intel.com>
14 s390:
15 Heiko Carstens <heiko.carstens@de.ibm.com>
14 16
15Authors: Ashok Raj <ashok.raj@intel.com> 17Authors: Ashok Raj <ashok.raj@intel.com>
16Lots of feedback: Nathan Lynch <nathanl@austin.ibm.com>, 18Lots of feedback: Nathan Lynch <nathanl@austin.ibm.com>,
@@ -44,9 +46,28 @@ maxcpus=n Restrict boot time cpus to n. Say if you have 4 cpus, using
44 maxcpus=2 will only boot 2. You can choose to bring the 46 maxcpus=2 will only boot 2. You can choose to bring the
45 other cpus later online, read FAQ's for more info. 47 other cpus later online, read FAQ's for more info.
46 48
47additional_cpus=n [x86_64 only] use this to limit hotpluggable cpus. 49additional_cpus*=n Use this to limit hotpluggable cpus. This option sets
48 This option sets 50 cpu_possible_map = cpu_present_map + additional_cpus
49 cpu_possible_map = cpu_present_map + additional_cpus 51
52(*) Option valid only for following architectures
53- x86_64, ia64, s390
54
55ia64 and x86_64 use the number of disabled local apics in the ACPI MADT table
56to determine the number of potentially hot-pluggable cpus. The implementation
57should only rely on this to count the number of cpus, but *MUST NOT* rely on the
58apicid values in those tables for disabled apics. In the event the BIOS doesn't
59mark such hot-pluggable cpus as disabled entries, one could use the
60parameter "additional_cpus=x" to represent those cpus in the cpu_possible_map.
61
62s390 also uses the number of cpus it detects at IPL time to set the number of bits
63in cpu_possible_map. If it is desired to add additional cpus at a later time,
64the number should be specified using this option or the possible_cpus option.
65
66possible_cpus=n [s390 only] Use this to set hotpluggable cpus.
67 This option sets possible_cpus bits in
68 cpu_possible_map, thus keeping the number of bits set
69 constant even if the machine gets rebooted.
70 This option overrides additional_cpus.
50 71
51CPU maps and such 72CPU maps and such
52----------------- 73-----------------
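
The relationship between the boot options in the cpu-hotplug.txt hunk above can be sketched as plain arithmetic. This is only an illustration of the documented rule (cpu_possible_map = cpu_present_map + additional_cpus, with possible_cpus overriding additional_cpus on s390); the function name and its parameters are hypothetical and do not correspond to any kernel symbol.

```python
def possible_cpu_count(present, additional_cpus=0, possible_cpus=None):
    """Return how many bits would be set in cpu_possible_map.

    Hypothetical helper: 'present' is the number of cpus found at boot,
    the other two arguments mirror the boot options of the same name.
    """
    if possible_cpus is not None:
        # Per the s390 text, possible_cpus overrides additional_cpus.
        return possible_cpus
    # Otherwise: cpu_possible_map = cpu_present_map + additional_cpus
    return present + additional_cpus
```

For example, a machine that boots with 4 present cpus and additional_cpus=2 would, under this reading, end up with 6 possible cpus.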
diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt
index 990998ee10b6..30c41459953c 100644
--- a/Documentation/cpusets.txt
+++ b/Documentation/cpusets.txt
@@ -4,8 +4,9 @@
4Copyright (C) 2004 BULL SA. 4Copyright (C) 2004 BULL SA.
5Written by Simon.Derr@bull.net 5Written by Simon.Derr@bull.net
6 6
7Portions Copyright (c) 2004 Silicon Graphics, Inc. 7Portions Copyright (c) 2004-2006 Silicon Graphics, Inc.
8Modified by Paul Jackson <pj@sgi.com> 8Modified by Paul Jackson <pj@sgi.com>
9Modified by Christoph Lameter <clameter@sgi.com>
9 10
10CONTENTS: 11CONTENTS:
11========= 12=========
@@ -90,7 +91,8 @@ This can be especially valuable on:
90 91
91These subsets, or "soft partitions" must be able to be dynamically 92These subsets, or "soft partitions" must be able to be dynamically
92adjusted, as the job mix changes, without impacting other concurrently 93adjusted, as the job mix changes, without impacting other concurrently
93executing jobs. 94executing jobs. The location of the running jobs' pages may also be moved
95when the memory locations are changed.
94 96
95The kernel cpuset patch provides the minimum essential kernel 97The kernel cpuset patch provides the minimum essential kernel
96mechanisms required to efficiently implement such subsets. It 98mechanisms required to efficiently implement such subsets. It
@@ -102,8 +104,8 @@ memory allocator code.
1021.3 How are cpusets implemented ? 1041.3 How are cpusets implemented ?
103--------------------------------- 105---------------------------------
104 106
105Cpusets provide a Linux kernel (2.6.7 and above) mechanism to constrain 107Cpusets provide a Linux kernel mechanism to constrain which CPUs and
106which CPUs and Memory Nodes are used by a process or set of processes. 108Memory Nodes are used by a process or set of processes.
107 109
108The Linux kernel already has a pair of mechanisms to specify on which 110The Linux kernel already has a pair of mechanisms to specify on which
109CPUs a task may be scheduled (sched_setaffinity) and on which Memory 111CPUs a task may be scheduled (sched_setaffinity) and on which Memory
@@ -371,22 +373,17 @@ cpusets memory placement policy 'mems' subsequently changes.
371If the cpuset flag file 'memory_migrate' is set true, then when 373If the cpuset flag file 'memory_migrate' is set true, then when
372tasks are attached to that cpuset, any pages that task had 374tasks are attached to that cpuset, any pages that task had
373allocated to it on nodes in its previous cpuset are migrated 375allocated to it on nodes in its previous cpuset are migrated
374to the tasks new cpuset. Depending on the implementation, 376to the tasks new cpuset. The relative placement of the page within
375this migration may either be done by swapping the page out, 377the cpuset is preserved during these migration operations if possible.
376so that the next time the page is referenced, it will be paged 378For example if the page was on the second valid node of the prior cpuset
377into the tasks new cpuset, usually on the node where it was 379then the page will be placed on the second valid node of the new cpuset.
378referenced, or this migration may be done by directly copying 380
379the pages from the tasks previous cpuset to the new cpuset,
380where possible to the same node, relative to the new cpuset,
381as the node that held the page, relative to the old cpuset.
382Also if 'memory_migrate' is set true, then if that cpusets 381Also if 'memory_migrate' is set true, then if that cpusets
383'mems' file is modified, pages allocated to tasks in that 382'mems' file is modified, pages allocated to tasks in that
384cpuset, that were on nodes in the previous setting of 'mems', 383cpuset, that were on nodes in the previous setting of 'mems',
385will be moved to nodes in the new setting of 'mems.' Again, 384will be moved to nodes in the new setting of 'mems.'
386depending on the implementation, this might be done by swapping, 385Pages that were not in the tasks prior cpuset, or in the cpusets
387or by direct copying. In either case, pages that were not in 386prior 'mems' setting, will not be moved.
388the tasks prior cpuset, or in the cpusets prior 'mems' setting,
389will not be moved.
390 387
391There is an exception to the above. If hotplug functionality is used 388There is an exception to the above. If hotplug functionality is used
392to remove all the CPUs that are currently assigned to a cpuset, 389to remove all the CPUs that are currently assigned to a cpuset,
@@ -434,16 +431,6 @@ and then start a subshell 'sh' in that cpuset:
434 # The next line should display '/Charlie' 431 # The next line should display '/Charlie'
435 cat /proc/self/cpuset 432 cat /proc/self/cpuset
436 433
437In the case that a change of cpuset includes wanting to move already
438allocated memory pages, consider further the work of IWAMOTO
439Toshihiro <iwamoto@valinux.co.jp> for page remapping and memory
440hotremoval, which can be found at:
441
442 http://people.valinux.co.jp/~iwamoto/mh.html
443
444The integration of cpusets with such memory migration is not yet
445available.
446
447In the future, a C library interface to cpusets will likely be 434In the future, a C library interface to cpusets will likely be
448available. For now, the only way to query or modify cpusets is 435available. For now, the only way to query or modify cpusets is
449via the cpuset file system, using the various cd, mkdir, echo, cat, 436via the cpuset file system, using the various cd, mkdir, echo, cat,
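
The relative-placement rule in the 'memory_migrate' hunk above (a page on the second valid node of the prior cpuset goes to the second valid node of the new cpuset) can be sketched as follows. This is a minimal model of the documented behaviour, not the kernel's implementation; the function name is hypothetical, and returning None when the new cpuset has no node at the same position is an assumption, since the text only says placement is preserved "if possible".

```python
def remap_node(page_node, old_mems, new_mems):
    """Map a page's node from the old 'mems' setting to the new one.

    Hypothetical helper: old_mems and new_mems are iterables of node numbers.
    """
    old_sorted = sorted(old_mems)
    new_sorted = sorted(new_mems)
    if page_node not in old_sorted:
        # Pages that were not in the prior 'mems' setting are not moved.
        return page_node
    position = old_sorted.index(page_node)
    if position >= len(new_sorted):
        # Assumption: no node at the same relative position; the text
        # does not say what happens here, so the sketch reports no match.
        return None
    return new_sorted[position]
```

With old mems 4-5 and new mems 8-9, a page on node 5 (the second valid node) maps to node 9.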
diff --git a/Documentation/dvb/bt8xx.txt b/Documentation/dvb/bt8xx.txt
index df6c05453cb5..52ed462061df 100644
--- a/Documentation/dvb/bt8xx.txt
+++ b/Documentation/dvb/bt8xx.txt
@@ -111,4 +111,8 @@ source: linux/Documentation/video4linux/CARDLIST.bttv
111If you have problems with this please do ask on the mailing list. 111If you have problems with this please do ask on the mailing list.
112 112
113-- 113--
114Authors: Richard Walker, Jamie Honan, Michael Hunold, Manu Abraham 114Authors: Richard Walker,
115 Jamie Honan,
116 Michael Hunold,
117 Manu Abraham,
118 Michael Krufky
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index b730d765b525..81bc51369f59 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -171,3 +171,21 @@ Why: The ISA interface is faster and should be always available. The I2C
171 probing is also known to cause trouble in at least one case (see 171 probing is also known to cause trouble in at least one case (see
172 bug #5889.) 172 bug #5889.)
173Who: Jean Delvare <khali@linux-fr.org> 173Who: Jean Delvare <khali@linux-fr.org>
174
175---------------------------
176
177What: mount/umount uevents
178When: February 2007
179Why: These events are not correct, and do not properly let userspace know
180 when a file system has been mounted or unmounted. Userspace should
181 poll the /proc/mounts file instead to detect this properly.
182Who: Greg Kroah-Hartman <gregkh@suse.de>
183
184---------------------------
185
186What: Support for NEC DDB5074 and DDB5476 evaluation boards.
187When: June 2006
188Why: Board specific code doesn't build anymore since ~2.6.0 and no
189 users have complained indicating there is no more need for these
190 boards. This should really be considered a last call.
191Who: Ralf Baechle <ralf@linux-mips.org>
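
The mount/umount entry above tells userspace to watch /proc/mounts instead of relying on uevents. One way to do that is to compare successive snapshots of the file, sketched below. The function names are hypothetical, and comparing snapshots is an assumption about how a tool might use the file, not something the schedule entry prescribes.

```python
def parse_mounts(text):
    """Parse /proc/mounts-style text into a set of (device, mount_point, fstype)."""
    entries = set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.add((fields[0], fields[1], fields[2]))
    return entries


def diff_mounts(before, after):
    """Return (mounted, unmounted) entries between two snapshots."""
    old, new = parse_mounts(before), parse_mounts(after)
    return sorted(new - old), sorted(old - new)
```

A caller would read /proc/mounts periodically, keep the previous contents, and pass both strings to diff_mounts().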
diff --git a/Documentation/filesystems/ntfs.txt b/Documentation/filesystems/ntfs.txt
index 614de3124901..251168587899 100644
--- a/Documentation/filesystems/ntfs.txt
+++ b/Documentation/filesystems/ntfs.txt
@@ -457,6 +457,12 @@ ChangeLog
457 457
458Note, a technical ChangeLog aimed at kernel hackers is in fs/ntfs/ChangeLog. 458Note, a technical ChangeLog aimed at kernel hackers is in fs/ntfs/ChangeLog.
459 459
4602.1.26:
461 - Implement support for sector sizes above 512 bytes (up to the maximum
462 supported by NTFS which is 4096 bytes).
463 - Enhance support for NTFS volumes which were supported by Windows but
464 not by Linux due to invalid attribute list attribute flags.
465 - A few minor updates and bug fixes.
4602.1.25: 4662.1.25:
461 - Write support is now extended with write(2) being able to both 467 - Write support is now extended with write(2) being able to both
462 overwrite existing file data and to extend files. Also, if a write 468 overwrite existing file data and to extend files. Also, if a write
diff --git a/Documentation/filesystems/tmpfs.txt b/Documentation/filesystems/tmpfs.txt
index dbe4d87d2615..1773106976a2 100644
--- a/Documentation/filesystems/tmpfs.txt
+++ b/Documentation/filesystems/tmpfs.txt
@@ -79,15 +79,27 @@ that instance in a system with many cpus making intensive use of it.
79 79
80 80
81tmpfs has a mount option to set the NUMA memory allocation policy for 81tmpfs has a mount option to set the NUMA memory allocation policy for
82all files in that instance: 82all files in that instance (if CONFIG_NUMA is enabled) - which can be
83mpol=interleave prefers to allocate memory from each node in turn 83adjusted on the fly via 'mount -o remount ...'
84mpol=default prefers to allocate memory from the local node
85mpol=bind prefers to allocate from mpol_nodelist
86mpol=preferred prefers to allocate from first node in mpol_nodelist
87 84
88The following mount option is used in conjunction with mpol=interleave, 85mpol=default prefers to allocate memory from the local node
89mpol=bind or mpol=preferred: 86mpol=prefer:Node prefers to allocate memory from the given Node
90mpol_nodelist: nodelist suitable for parsing with nodelist_parse. 87mpol=bind:NodeList allocates memory only from nodes in NodeList
88mpol=interleave prefers to allocate from each node in turn
89mpol=interleave:NodeList allocates from each node of NodeList in turn
90
91NodeList format is a comma-separated list of decimal numbers and ranges,
92a range being two hyphen-separated decimal numbers, the smallest and
93largest node numbers in the range. For example, mpol=bind:0-3,5,7,9-15
94
95Note that trying to mount a tmpfs with an mpol option will fail if the
96running kernel does not support NUMA; and will fail if its nodelist
97specifies a node >= MAX_NUMNODES. If your system relies on that tmpfs
98being mounted, but from time to time runs a kernel built without NUMA
99capability (perhaps a safe recovery kernel), or configured to support
100fewer nodes, then it is advisable to omit the mpol option from automatic
101mount options. It can be added later, when the tmpfs is already mounted
102on MountPoint, by 'mount -o remount,mpol=Policy:NodeList MountPoint'.
91 103
92 104
93To specify the initial root directory you can use the following mount 105To specify the initial root directory you can use the following mount
@@ -109,4 +121,4 @@ RAM/SWAP in 10240 inodes and it is only accessible by root.
109Author: 121Author:
110 Christoph Rohland <cr@sap.com>, 1.12.01 122 Christoph Rohland <cr@sap.com>, 1.12.01
111Updated: 123Updated:
112 Hugh Dickins <hugh@veritas.com>, 13 March 2005 124 Hugh Dickins <hugh@veritas.com>, 19 February 2006
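
The NodeList format described in the tmpfs.txt hunk above (comma-separated decimal numbers and hyphen-separated ranges) can be illustrated with a small parser. This is a sketch of the documented syntax only; the function names are hypothetical, and it does not check node numbers against MAX_NUMNODES as the kernel would.

```python
def parse_nodelist(nodelist):
    """Expand a NodeList such as '0-3,5,7,9-15' into a sorted list of nodes."""
    nodes = set()
    for part in nodelist.split(","):
        if "-" in part:
            # A range is the smallest and largest node number, hyphen-separated.
            low, high = part.split("-")
            low, high = int(low), int(high)
            if low > high:
                raise ValueError("range must be smallest-largest: " + part)
            nodes.update(range(low, high + 1))
        else:
            nodes.add(int(part))
    return sorted(nodes)


def parse_mpol(option):
    """Split 'mpol=Policy[:NodeList]' into (policy, list of nodes)."""
    key, _, value = option.partition("=")
    if key != "mpol":
        raise ValueError("not an mpol option: " + option)
    policy, _, nodelist = value.partition(":")
    return policy, parse_nodelist(nodelist) if nodelist else []
```

Applied to the example from the text, parse_mpol("mpol=bind:0-3,5,7,9-15") yields the policy "bind" and nodes 0 through 3, 5, 7 and 9 through 15.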
diff --git a/Documentation/filesystems/v9fs.txt b/Documentation/filesystems/v9fs.txt
index 4e92feb6b507..24c7a9c41f0d 100644
--- a/Documentation/filesystems/v9fs.txt
+++ b/Documentation/filesystems/v9fs.txt
@@ -57,8 +57,6 @@ OPTIONS
57 57
58 port=n port to connect to on the remote server 58 port=n port to connect to on the remote server
59 59
60 timeout=n request timeouts (in ms) (default 60000ms)
61
62 noextend force legacy mode (no 9P2000.u semantics) 60 noextend force legacy mode (no 9P2000.u semantics)
63 61
64 uid attempt to mount as a particular uid 62 uid attempt to mount as a particular uid
@@ -74,10 +72,16 @@ OPTIONS
74RESOURCES 72RESOURCES
75========= 73=========
76 74
77The Linux version of the 9P server, along with some client-side utilities 75The Linux version of the 9P server is now maintained under the npfs project
78can be found at http://v9fs.sf.net (along with a CVS repository of the 76on sourceforge (http://sourceforge.net/projects/npfs).
79development branch of this module). There are user and developer mailing 77
80lists here, as well as a bug-tracker. 78There are user and developer mailing lists available through the v9fs project
79on sourceforge (http://sourceforge.net/projects/v9fs).
80
81News and other information is maintained on SWiK (http://swik.net/v9fs).
82
83Bug reports may be issued through the kernel.org bugzilla
84(http://bugzilla.kernel.org)
81 85
82For more information on the Plan 9 Operating System check out 86For more information on the Plan 9 Operating System check out
83http://plan9.bell-labs.com/plan9 87http://plan9.bell-labs.com/plan9
diff --git a/Documentation/fujitsu/frv/kernel-ABI.txt b/Documentation/fujitsu/frv/kernel-ABI.txt
new file mode 100644
index 000000000000..0ed9b0a779bc
--- /dev/null
+++ b/Documentation/fujitsu/frv/kernel-ABI.txt
@@ -0,0 +1,234 @@
1 =================================
2 INTERNAL KERNEL ABI FOR FR-V ARCH
3 =================================
4
5The internal FRV kernel ABI is not quite the same as the userspace ABI. A number of the registers
6are used for special purposes, and the ABI is not consistent between modules vs core, and MMU vs
7no-MMU.
8
9This partly stems from the fact that FRV CPUs do not have a separate supervisor stack pointer, and
10most of them do not have any scratch registers, thus requiring at least one general purpose
11register to be clobbered in such an event. Also, within the kernel core, it is possible to simply
12jump or call directly between functions using a relative offset. This cannot be extended to modules
13for the displacement is likely to be too far. Thus in modules the address of a function to call
14must be calculated in a register and then used, requiring two extra instructions.
15
16This document has the following sections:
17
18 (*) System call register ABI
19 (*) CPU operating modes
20 (*) Internal kernel-mode register ABI
21 (*) Internal debug-mode register ABI
22 (*) Virtual interrupt handling
23
24
25========================
26SYSTEM CALL REGISTER ABI
27========================
28
29When a system call is made, the following registers are effective:
30
31  REGISTERS       CALL                    RETURN
32  =============== ======================= =======================
33  GR7             System call number      Preserved
34  GR8             Syscall arg #1          Return value
35  GR9-GR13        Syscall arg #2-6        Preserved
36
37
38===================
39CPU OPERATING MODES
40===================
41
42The FR-V CPU has three basic operating modes. In order of increasing capability:
43
44 (1) User mode.
45
46 Basic userspace running mode.
47
48 (2) Kernel mode.
49
50 Normal kernel mode. There are many additional control registers available that may be
51 accessed in this mode, in addition to all the stuff available to user mode. This has two
52 submodes:
53
54 (a) Exceptions enabled (PSR.T == 1).
55
56 Exceptions will invoke the appropriate normal kernel mode handler. On entry to the
57 handler, the PSR.T bit will be cleared.
58
59 (b) Exceptions disabled (PSR.T == 0).
60
61 No exceptions or interrupts may happen. Any mandatory exceptions will cause the CPU to
62 halt unless the CPU is told to jump into debug mode instead.
63
64 (3) Debug mode.
65
66 No exceptions may happen in this mode. Memory protection and management exceptions will be
67 flagged for later consideration, but the exception handler won't be invoked. Debugging traps
68 such as hardware breakpoints and watchpoints will be ignored. This mode is entered only by
69 debugging events obtained from the other two modes.
70
71 All kernel mode registers may be accessed, plus a few extra debugging specific registers.
72
73
74=================================
75INTERNAL KERNEL-MODE REGISTER ABI
76=================================
77
78There are a number of permanent register assignments that are set up by entry.S in the exception
79prologue. Note that there is a complete set of exception prologues for each of user->kernel
80transition and kernel->kernel transition. There are also user->debug and kernel->debug mode
81transition prologues.
82
83
84  REGISTER        FLAVOUR USE
85  =============== ======= ====================================================
86  GR1                     Supervisor stack pointer
87  GR15                    Current thread info pointer
88  GR16                    GP-Rel base register for small data
89  GR28                    Current exception frame pointer (__frame)
90  GR29                    Current task pointer (current)
91  GR30                    Destroyed by kernel mode entry
92  GR31            NOMMU   Destroyed by debug mode entry
93  GR31            MMU     Destroyed by TLB miss kernel mode entry
94  CCR.ICC2                Virtual interrupt disablement tracking
95  CCCR.CC3                Cleared by exception prologue (atomic op emulation)
96  SCR0            MMU     See mmu-layout.txt.
97  SCR1            MMU     See mmu-layout.txt.
98  SCR2            MMU     Save for EAR0 (destroyed by icache insns in debug mode)
99  SCR3            MMU     Save for GR31 during debug exceptions
100 DAMR/IAMR       NOMMU   Fixed memory protection layout.
101 DAMR/IAMR       MMU     See mmu-layout.txt.
102
103
104Certain registers are also used or modified across function calls:
105
106 REGISTER        CALL                            RETURN
107 =============== =============================== ===============================
108 GR0             Fixed Zero                      -
109 GR2             Function call frame pointer
110 GR3             Special                         Preserved
111 GR3-GR7         -                               Clobbered
112 GR8             Function call arg #1            Return value (or clobbered)
113 GR9             Function call arg #2            Return value MSW (or clobbered)
114 GR10-GR13       Function call arg #3-#6         Clobbered
115 GR14            -                               Clobbered
116 GR15-GR16       Special                         Preserved
117 GR17-GR27       -                               Preserved
118 GR28-GR31       Special                         Only accessed explicitly
119 LR              Return address after CALL       Clobbered
120 CCR/CCCR        -                               Mostly Clobbered
121
122
123================================
124INTERNAL DEBUG-MODE REGISTER ABI
125================================
126
127This is the same as the kernel-mode register ABI for function calls. The difference is that in
128debug-mode there's a different stack and a different exception frame. Almost all the global
129registers from kernel-mode (including the stack pointer) may be changed.
130
131 REGISTER        FLAVOUR USE
132 =============== ======= ====================================================
133 GR1                     Debug stack pointer
134 GR16                    GP-Rel base register for small data
135 GR31                    Current debug exception frame pointer (__debug_frame)
136 SCR3            MMU     Saved value of GR31
137
138
139Note that debug mode is able to interfere with the kernel's emulated atomic ops, so it must be
140exceedingly careful not to do any that would interact with the main kernel in this regard. Hence
141the debug mode code (gdbstub) is almost completely self-contained. The only external code used is
142the sprintf family of functions.
143
144Furthermore, break.S is so complicated because single-step mode does not switch off on entry to an
145exception. That means unless manually disabled, single-stepping will blithely go on stepping into
146things like interrupts. See gdbstub.txt for more information.
147
148
149==========================
150VIRTUAL INTERRUPT HANDLING
151==========================
152
153Because accesses to the PSR are so slow, and to disable interrupts we have to access it twice (once
154to read and once to write), we don't actually disable interrupts at all if we don't have to. What
155we do instead is use the ICC2 condition code flags to note virtual disablement, such that if we
156then do take an interrupt, we note the flag, really disable interrupts, set another flag and resume
157execution at the point the interrupt happened. Setting condition flags as a side effect of an
158arithmetic or logical instruction is really fast. This use of the ICC2 only occurs within the
159kernel - it does not affect userspace.
160
161The flags we use are:
162
163 (*) CCR.ICC2.Z [Zero flag]
164
165 Set to virtually disable interrupts, clear when interrupts are virtually enabled. Can be
166 modified by logical instructions without affecting the Carry flag.
167
168 (*) CCR.ICC2.C [Carry flag]
169
170 Clear to indicate hardware interrupts are really disabled, set otherwise.
171
172
173What happens is this:
174
175 (1) Normal kernel-mode operation.
176
177 ICC2.Z is 0, ICC2.C is 1.
178
179 (2) An interrupt occurs. The exception prologue examines ICC2.Z and determines that nothing needs
180 doing. This is done simply with an unlikely BEQ instruction.
181
182 (3) The interrupts are disabled (local_irq_disable)
183
184 ICC2.Z is set to 1.
185
186 (4) If interrupts were then re-enabled (local_irq_enable):
187
188 ICC2.Z would be set to 0.
189
190 A TIHI #2 instruction (trap #2 if condition HI - Z==0 && C==0) would be used to trap if
191 interrupts were now virtually enabled, but physically disabled - which they're not, so the
192 trap isn't taken. The kernel would then be back to state (1).
193
194 (5) An interrupt occurs. The exception prologue examines ICC2.Z and determines that the interrupt
195 shouldn't actually have happened. It jumps aside, and there disables interrupts by setting
196 PSR.PIL to 14 and then it clears ICC2.C.
197
198 (6) If interrupts were then saved and disabled again (local_irq_save):
199
200 ICC2.Z would be shifted into the save variable and masked off (giving a 1).
201
202 ICC2.Z would then be set to 1 (thus unchanged), and ICC2.C would be unaffected (ie: 0).
203
204 (7) If interrupts were then restored from state (6) (local_irq_restore):
205
206 ICC2.Z would be set to indicate the result of XOR'ing the saved value (ie: 1) with 1, which
207 gives a result of 0 - thus leaving ICC2.Z set.
208
209 ICC2.C would remain unaffected (ie: 0).
210
211 A TIHI #2 instruction would be used to again assay the current state, but this would do
212 nothing as Z==1.
213
214 (8) If interrupts were then enabled (local_irq_enable):
215
216 ICC2.Z would be cleared. ICC2.C would be left unaffected. Both flags would now be 0.
217
218 A TIHI #2 instruction again issued to assay the current state would then trap as both Z==0
219 [interrupts virtually enabled] and C==0 [interrupts really disabled] would then be true.
220
221 (9) The trap #2 handler would simply enable hardware interrupts (set PSR.PIL to 0), set ICC2.C to
222 1 and return.
223
224(10) Immediately upon returning, the pending interrupt would be taken.
225
226(11) The interrupt handler would take the path of actually processing the interrupt (ICC2.Z is
227 clear, BEQ fails as per step (2)).
228
229(12) The interrupt handler would then set ICC2.C to 1 since hardware interrupts are definitely
230 enabled - or else the kernel wouldn't be here.
231
232(13) On return from the interrupt handler, things would be back to state (1).
233
234This trap (#2) is only available in kernel mode. In user mode it will result in SIGILL.
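
The virtual interrupt sequence in steps (1) to (13) above can be followed with a small state model. This is only a simulation of the two flags as the text describes them (ICC2.Z set means virtually disabled, ICC2.C clear means hardware interrupts are really disabled); the class, its method bodies and the 'pending' and 'handled' bookkeeping are hypothetical and stand in for the real assembly in entry.S.

```python
class VirtualIrqState:
    """Toy model of the ICC2.Z / ICC2.C virtual interrupt flags."""

    def __init__(self):
        self.z = 0            # ICC2.Z: 1 = interrupts virtually disabled
        self.c = 1            # ICC2.C: 0 = hardware interrupts really disabled
        self.pending = False  # an interrupt is waiting to be taken
        self.handled = 0      # number of interrupts actually processed

    def _tihi(self):
        # TIHI #2 traps only if Z == 0 and C == 0 (virtually enabled,
        # really disabled). The trap handler re-enables hardware
        # interrupts and sets C; a pending interrupt is then taken.
        if self.z == 0 and self.c == 0:
            self.c = 1
            if self.pending:
                self.pending = False
                self.handled += 1

    def local_irq_disable(self):
        self.z = 1

    def local_irq_enable(self):
        self.z = 0
        self._tihi()

    def local_irq_save(self):
        saved = self.z
        self.z = 1
        return saved

    def local_irq_restore(self, saved):
        # Z reflects whether (saved XOR 1) is zero, as in step (7).
        self.z = 1 if (saved ^ 1) == 0 else 0
        self._tihi()

    def interrupt(self):
        if self.c == 0:
            # Hardware interrupts are really disabled; it stays pending.
            self.pending = True
        elif self.z == 1:
            # Step (5): should not have happened; really disable, clear C.
            self.c = 0
            self.pending = True
        else:
            # Steps (2) and (11)-(12): process it and make sure C is set.
            self.handled += 1
            self.c = 1
```

Calling local_irq_disable(), interrupt(), local_irq_save(), local_irq_restore() and local_irq_enable() in that order walks the model through states (3) to (13), ending back at Z == 0 and C == 1 with one interrupt handled.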
diff --git a/Documentation/hwmon/w83627hf b/Documentation/hwmon/w83627hf
index 5d23776e9907..bbeaba680443 100644
--- a/Documentation/hwmon/w83627hf
+++ b/Documentation/hwmon/w83627hf
@@ -36,6 +36,10 @@ Module Parameters
36 (default is 1) 36 (default is 1)
37 Use 'init=0' to bypass initializing the chip. 37 Use 'init=0' to bypass initializing the chip.
38 Try this if your computer crashes when you load the module. 38 Try this if your computer crashes when you load the module.
39* reset: int
40 (default is 0)
41 The driver used to reset the chip on load, but no longer does so. Use
42 'reset=1' to restore the old behavior. Please report it if you need to do this.
39 43
40Description 44Description
41----------- 45-----------
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 84370363da80..fc99075e0af4 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -335,6 +335,12 @@ running once the system is up.
335 timesource is not available, it defaults to PIT. 335 timesource is not available, it defaults to PIT.
336 Format: { pit | tsc | cyclone | pmtmr } 336 Format: { pit | tsc | cyclone | pmtmr }
337 337
338 disable_8254_timer
339 enable_8254_timer
340 [IA32/X86_64] Disable/Enable interrupt 0 timer routing
341 over the 8254 in addition to over the IO-APIC. The
342 kernel tries to set a sensible default.
343
338 hpet= [IA-32,HPET] option to disable HPET and use PIT. 344 hpet= [IA-32,HPET] option to disable HPET and use PIT.
339 Format: disable 345 Format: disable
340 346
@@ -1034,6 +1040,8 @@ running once the system is up.
1034 1040
1035 nomce [IA-32] Machine Check Exception 1041 nomce [IA-32] Machine Check Exception
1036 1042
1043 nomca [IA-64] Disable machine check abort handling
1044
1037 noresidual [PPC] Don't use residual data on PReP machines. 1045 noresidual [PPC] Don't use residual data on PReP machines.
1038 1046
1039 noresume [SWSUSP] Disables resume and restores original swap 1047 noresume [SWSUSP] Disables resume and restores original swap
@@ -1133,6 +1141,8 @@ running once the system is up.
1133 Mechanism 1. 1141 Mechanism 1.
1134 conf2 [IA-32] Force use of PCI Configuration 1142 conf2 [IA-32] Force use of PCI Configuration
1135 Mechanism 2. 1143 Mechanism 2.
1144 nommconf [IA-32,X86_64] Disable use of MMCONFIG for PCI
1145 Configuration
1136 nosort [IA-32] Don't sort PCI devices according to 1146 nosort [IA-32] Don't sort PCI devices according to
1137 order given by the PCI BIOS. This sorting is 1147 order given by the PCI BIOS. This sorting is
1138 done to get a device order compatible with 1148 done to get a device order compatible with
@@ -1280,6 +1290,19 @@ running once the system is up.
1280 New name for the ramdisk parameter. 1290 New name for the ramdisk parameter.
1281 See Documentation/ramdisk.txt. 1291 See Documentation/ramdisk.txt.
1282 1292
1293 rcu.blimit= [KNL,BOOT] Set maximum number of finished
1294 RCU callbacks to process in one batch.
1295
1296 rcu.qhimark= [KNL,BOOT] Set threshold of queued
1297 RCU callbacks over which batch limiting is disabled.
1298
1299 rcu.qlowmark= [KNL,BOOT] Set threshold of queued
1300 RCU callbacks below which batch limiting is re-enabled.
1301
1302 rcu.rsinterval= [KNL,BOOT,SMP] Set the number of additional
1303 RCU callbacks to queue before forcing reschedule
1304 on all cpus.
1305
1283 rdinit= [KNL] 1306 rdinit= [KNL]
1284 Format: <full_path> 1307 Format: <full_path>
1285 Run specified binary instead of /init from the ramdisk, 1308 Run specified binary instead of /init from the ramdisk,
@@ -1636,6 +1659,9 @@ running once the system is up.
1636 Format: 1659 Format:
1637 <irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]] 1660 <irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]]
1638 1661
1662 norandmaps Don't use address space randomization
1663 Equivalent to echo 0 > /proc/sys/kernel/randomize_va_space
1664
1639 1665
1640______________________________________________________________________ 1666______________________________________________________________________
1641Changelog: 1667Changelog:
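
The interaction of the rcu.blimit, rcu.qhimark and rcu.qlowmark parameters added above can be read as a hysteresis on the batch size. The model below is a sketch of that reading only: the class and method names are hypothetical, the numeric values used are illustrative, and treating a queue length exactly equal to qlowmark as "below" is an assumption.

```python
import math


class RcuBatchLimiter:
    """Toy model of batch limiting with high and low watermarks."""

    def __init__(self, blimit, qhimark, qlowmark):
        self.blimit = blimit      # max finished callbacks per batch
        self.qhimark = qhimark    # above this, batch limiting is disabled
        self.qlowmark = qlowmark  # at or below this, limiting is re-enabled
        self.limit = blimit       # current effective batch limit
        self.qlen = 0             # number of queued callbacks

    def enqueue(self, count=1):
        self.qlen += count
        if self.qlen > self.qhimark:
            # Too many callbacks queued: stop limiting the batch size.
            self.limit = math.inf

    def process_batch(self):
        done = min(self.qlen, self.limit)
        self.qlen -= done
        if self.limit == math.inf and self.qlen <= self.qlowmark:
            # Queue has drained far enough: limit batches again.
            self.limit = self.blimit
        return done
```

With blimit=2, qhimark=5 and qlowmark=1, a queue of 4 callbacks is drained 2 at a time, while a queue that grows to 6 is drained in a single batch before the limit of 2 applies again.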
diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt
index 0ea5a0c6e827..2c3b1eae4280 100644
--- a/Documentation/kprobes.txt
+++ b/Documentation/kprobes.txt
@@ -136,17 +136,20 @@ Kprobes, jprobes, and return probes are implemented on the following
136architectures: 136architectures:
137 137
138- i386 138- i386
139- x86_64 (AMD-64, E64MT) 139- x86_64 (AMD-64, EM64T)
140- ppc64 140- ppc64
141- ia64 (Support for probes on certain instruction types is still in progress.) 141- ia64 (Does not support probes on instruction slot1.)
142- sparc64 (Return probes not yet implemented.) 142- sparc64 (Return probes not yet implemented.)
143 143
1443. Configuring Kprobes 1443. Configuring Kprobes
145 145
146When configuring the kernel using make menuconfig/xconfig/oldconfig, 146When configuring the kernel using make menuconfig/xconfig/oldconfig,
147ensure that CONFIG_KPROBES is set to "y". Under "Kernel hacking", 147ensure that CONFIG_KPROBES is set to "y". Under "Instrumentation
148look for "Kprobes". You may have to enable "Kernel debugging" 148Support", look for "Kprobes".
149(CONFIG_DEBUG_KERNEL) before you can enable Kprobes. 149
150So that you can load and unload Kprobes-based instrumentation modules,
151make sure "Loadable module support" (CONFIG_MODULES) and "Module
152unloading" (CONFIG_MODULE_UNLOAD) are set to "y".
150 153
151You may also want to ensure that CONFIG_KALLSYMS and perhaps even 154You may also want to ensure that CONFIG_KALLSYMS and perhaps even
152CONFIG_KALLSYMS_ALL are set to "y", since kallsyms_lookup_name() 155CONFIG_KALLSYMS_ALL are set to "y", since kallsyms_lookup_name()
@@ -262,18 +265,18 @@ at any time after the probe has been registered.
262 265
2635. Kprobes Features and Limitations 2665. Kprobes Features and Limitations
264 267
265As of Linux v2.6.12, Kprobes allows multiple probes at the same 268Kprobes allows multiple probes at the same address. Currently,
266address. Currently, however, there cannot be multiple jprobes on 269however, there cannot be multiple jprobes on the same function at
267the same function at the same time. 270the same time.
268 271
269In general, you can install a probe anywhere in the kernel. 272In general, you can install a probe anywhere in the kernel.
270In particular, you can probe interrupt handlers. Known exceptions 273In particular, you can probe interrupt handlers. Known exceptions
271are discussed in this section. 274are discussed in this section.
272 275
273For obvious reasons, it's a bad idea to install a probe in 276The register_*probe functions will return -EINVAL if you attempt
274the code that implements Kprobes (mostly kernel/kprobes.c and 277to install a probe in the code that implements Kprobes (mostly
275arch/*/kernel/kprobes.c). A patch in the v2.6.13 timeframe instructs 278kernel/kprobes.c and arch/*/kernel/kprobes.c, but also functions such
276Kprobes to reject such requests. 279as do_page_fault and notifier_call_chain).
277 280
278If you install a probe in an inline-able function, Kprobes makes 281If you install a probe in an inline-able function, Kprobes makes
279no attempt to chase down all inline instances of the function and 282no attempt to chase down all inline instances of the function and
@@ -290,18 +293,14 @@ from the accidental ones. Don't drink and probe.
290 293
291Kprobes makes no attempt to prevent probe handlers from stepping on 294Kprobes makes no attempt to prevent probe handlers from stepping on
292each other -- e.g., probing printk() and then calling printk() from a 295each other -- e.g., probing printk() and then calling printk() from a
293probe handler. As of Linux v2.6.12, if a probe handler hits a probe, 296probe handler. If a probe handler hits a probe, that second probe's
294that second probe's handlers won't be run in that instance. 297handlers won't be run in that instance, and the kprobe.nmissed member
295 298of the second probe will be incremented.
296In Linux v2.6.12 and previous versions, Kprobes' data structures are 299
297protected by a single lock that is held during probe registration and 300As of Linux v2.6.15-rc1, multiple handlers (or multiple instances of
298unregistration and while handlers are run. Thus, no two handlers 301the same handler) may run concurrently on different CPUs.
299can run simultaneously. To improve scalability on SMP systems, 302
300this restriction will probably be removed soon, in which case 303Kprobes does not use mutexes or allocate memory except during
301multiple handlers (or multiple instances of the same handler) may
302run concurrently on different CPUs. Code your handlers accordingly.
303
304Kprobes does not use semaphores or allocate memory except during
305registration and unregistration. 304registration and unregistration.
306 305
307Probe handlers are run with preemption disabled. Depending on the 306Probe handlers are run with preemption disabled. Depending on the
@@ -316,11 +315,18 @@ address instead of the real return address for kretprobed functions.
316(As far as we can tell, __builtin_return_address() is used only 315(As far as we can tell, __builtin_return_address() is used only
317for instrumentation and error reporting.) 316for instrumentation and error reporting.)
318 317
319If the number of times a function is called does not match the 318If the number of times a function is called does not match the number
320number of times it returns, registering a return probe on that 319of times it returns, registering a return probe on that function may
321function may produce undesirable results. We have the do_exit() 320produce undesirable results. We have the do_exit() case covered.
322and do_execve() cases covered. do_fork() is not an issue. We're 321do_execve() and do_fork() are not an issue. We're unaware of other
323unaware of other specific cases where this could be a problem. 322specific cases where this could be a problem.
323
324If, upon entry to or exit from a function, the CPU is running on
325a stack other than that of the current task, registering a return
326probe on that function may produce undesirable results. For this
327reason, Kprobes doesn't support return probes (or kprobes or jprobes)
328on the x86_64 version of __switch_to(); the registration functions
329return -EINVAL.
324 330
3256. Probe Overhead 3316. Probe Overhead
326 332
@@ -347,14 +353,12 @@ k = 0.77 usec; j = 1.31; r = 1.26; kr = 1.45; jr = 1.99
347 353
3487. TODO 3547. TODO
349 355
350a. SystemTap (http://sourceware.org/systemtap): Work in progress 356a. SystemTap (http://sourceware.org/systemtap): Provides a simplified
351to provide a simplified programming interface for probe-based 357programming interface for probe-based instrumentation. Try it out.
352instrumentation. 358b. Kernel return probes for sparc64.
353b. Improved SMP scalability: Currently, work is in progress to handle 359c. Support for other architectures.
354multiple kprobes in parallel. 360d. User-space probes.
355c. Kernel return probes for sparc64. 361e. Watchpoint probes (which fire on data references).
356d. Support for other architectures.
357e. User-space probes.
358 362
3598. Kprobes Example 3638. Kprobes Example
360 364
@@ -411,8 +415,7 @@ int init_module(void)
411 printk("Couldn't find %s to plant kprobe\n", "do_fork"); 415 printk("Couldn't find %s to plant kprobe\n", "do_fork");
412 return -1; 416 return -1;
413 } 417 }
414 ret = register_kprobe(&kp); 418 if ((ret = register_kprobe(&kp)) < 0) {
415 if (ret < 0) {
416 printk("register_kprobe failed, returned %d\n", ret); 419 printk("register_kprobe failed, returned %d\n", ret);
417 return -1; 420 return -1;
418 } 421 }
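
A note on the error check in the init_module() hunk above: in C, `<` binds tighter than `=`, so the placement of the parentheses decides whether `ret` holds the error code or only the result of the comparison. The following is a minimal userspace sketch, not kernel code; `fake_register()`, `check_misplaced()` and `check_correct()` are hypothetical names invented for illustration, with `fake_register()` standing in for a registration call that fails with -EINVAL.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a failing registration call (assumption:
 * it always fails with -EINVAL, purely for illustration). */
static int fake_register(void)
{
	return -EINVAL;
}

/* Comparison inside the assignment: '<' is evaluated first, so ret
 * receives the comparison result (0 or 1), not the error code. */
int check_misplaced(void)
{
	int ret;

	if ((ret = fake_register() < 0))
		return ret;	/* the comparison result */
	return 0;
}

/* Assignment parenthesized on its own: ret receives the error code,
 * which is then compared against zero. */
int check_correct(void)
{
	int ret;

	if ((ret = fake_register()) < 0)
		return ret;	/* the error code from fake_register() */
	return 0;
}
```

With the first form, a "failed, returned %d" message would report 1 whatever the real error was; the second form reports the actual negative errno value.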
diff --git a/Documentation/mips/AU1xxx_IDE.README b/Documentation/mips/AU1xxx_IDE.README
index a7e4c4ea3560..afb31c141d9d 100644
--- a/Documentation/mips/AU1xxx_IDE.README
+++ b/Documentation/mips/AU1xxx_IDE.README
@@ -95,11 +95,13 @@ CONFIG_BLK_DEV_IDEDMA_PCI=y
95CONFIG_IDEDMA_PCI_AUTO=y 95CONFIG_IDEDMA_PCI_AUTO=y
96CONFIG_BLK_DEV_IDE_AU1XXX=y 96CONFIG_BLK_DEV_IDE_AU1XXX=y
97CONFIG_BLK_DEV_IDE_AU1XXX_MDMA2_DBDMA=y 97CONFIG_BLK_DEV_IDE_AU1XXX_MDMA2_DBDMA=y
98CONFIG_BLK_DEV_IDE_AU1XXX_BURSTABLE_ON=y
99CONFIG_BLK_DEV_IDE_AU1XXX_SEQTS_PER_RQ=128 98CONFIG_BLK_DEV_IDE_AU1XXX_SEQTS_PER_RQ=128
100CONFIG_BLK_DEV_IDEDMA=y 99CONFIG_BLK_DEV_IDEDMA=y
101CONFIG_IDEDMA_AUTO=y 100CONFIG_IDEDMA_AUTO=y
102 101
102Also define 'IDE_AU1XXX_BURSTMODE' in 'drivers/ide/mips/au1xxx-ide.c' to enable
103the burst support on DBDMA controller.
104
103If the system needs USB support, enable the following kernel configs for 105If the system needs USB support, enable the following kernel configs for
104high IDE to USB throughput. 106high IDE to USB throughput.
105 107
@@ -115,6 +117,8 @@ CONFIG_BLK_DEV_IDE_AU1XXX_SEQTS_PER_RQ=128
115CONFIG_BLK_DEV_IDEDMA=y 117CONFIG_BLK_DEV_IDEDMA=y
116CONFIG_IDEDMA_AUTO=y 118CONFIG_IDEDMA_AUTO=y
117 119
120Also undefine 'IDE_AU1XXX_BURSTMODE' in 'drivers/ide/mips/au1xxx-ide.c' to
121disable the burst support on DBDMA controller.
118 122
119ADD NEW HARD DISC TO WHITE OR BLACK LIST 123ADD NEW HARD DISC TO WHITE OR BLACK LIST
120---------------------------------------- 124----------------------------------------
diff --git a/Documentation/scsi/ChangeLog.megaraid_sas b/Documentation/scsi/ChangeLog.megaraid_sas
index f8c16cbf56ba..2dafa63bd370 100644
--- a/Documentation/scsi/ChangeLog.megaraid_sas
+++ b/Documentation/scsi/ChangeLog.megaraid_sas
@@ -1,3 +1,26 @@
11 Release Date : Wed Feb 03 14:31:44 PST 2006 - Sumant Patro <Sumant.Patro@lsil.com>
22 Current Version : 00.00.02.04
33 Older Version : 00.00.02.04
4
5i. Support for 1078 type (ppc IOP) controller, device id : 0x60 added.
6 During initialization, depending on the device id, the template members
7 are initialized with function pointers specific to the ppc or
8 xscale controllers.
9
10 -Sumant Patro <Sumant.Patro@lsil.com>
11
121 Release Date : Fri Feb 03 14:16:25 PST 2006 - Sumant Patro
13 <Sumant.Patro@lsil.com>
142 Current Version : 00.00.02.04
153 Older Version : 00.00.02.02
16i. Register 16 byte CDB capability with scsi midlayer
17
18 "This patch properly registers the 16 byte command length capability of the
19 megaraid_sas controlled hardware with the scsi midlayer. All megaraid_sas
20 hardware supports 16 byte CDB's."
21
22 -Joshua Giles <joshua_giles@dell.com>
23
11 Release Date : Mon Jan 23 14:09:01 PST 2006 - Sumant Patro <Sumant.Patro@lsil.com> 241 Release Date : Mon Jan 23 14:09:01 PST 2006 - Sumant Patro <Sumant.Patro@lsil.com>
22 Current Version : 00.00.02.02 252 Current Version : 00.00.02.02
33 Older Version : 00.00.02.01 263 Older Version : 00.00.02.01
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 9f11d36a8c10..b0c7ab93dcb9 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -16,6 +16,7 @@ before actually making adjustments.
16 16
17Currently, these files might (depending on your configuration) 17Currently, these files might (depending on your configuration)
18show up in /proc/sys/kernel: 18show up in /proc/sys/kernel:
19- acpi_video_flags
19- acct 20- acct
20- core_pattern 21- core_pattern
21- core_uses_pid 22- core_uses_pid
@@ -57,6 +58,15 @@ show up in /proc/sys/kernel:
57 58
58============================================================== 59==============================================================
59 60
61acpi_video_flags:
62
63flags
64
65See Doc*/kernel/power/video.txt. It allows the video boot mode to be
66set at run time.
67
68==============================================================
69
60acct: 70acct:
61 71
62highwater lowwater frequency 72highwater lowwater frequency
diff --git a/Documentation/video4linux/CARDLIST.saa7134 b/Documentation/video4linux/CARDLIST.saa7134
index 8a352597830f..da4fb890165f 100644
--- a/Documentation/video4linux/CARDLIST.saa7134
+++ b/Documentation/video4linux/CARDLIST.saa7134
@@ -13,7 +13,7 @@
13 12 -> Medion 7134 [16be:0003] 13 12 -> Medion 7134 [16be:0003]
14 13 -> Typhoon TV+Radio 90031 14 13 -> Typhoon TV+Radio 90031
15 14 -> ELSA EX-VISION 300TV [1048:226b] 15 14 -> ELSA EX-VISION 300TV [1048:226b]
16 15 -> ELSA EX-VISION 500TV [1048:226b] 16 15 -> ELSA EX-VISION 500TV [1048:226a]
17 16 -> ASUS TV-FM 7134 [1043:4842,1043:4830,1043:4840] 17 16 -> ASUS TV-FM 7134 [1043:4842,1043:4830,1043:4840]
18 17 -> AOPEN VA1000 POWER [1131:7133] 18 17 -> AOPEN VA1000 POWER [1131:7133]
19 18 -> BMK MPEX No Tuner 19 18 -> BMK MPEX No Tuner
@@ -75,7 +75,7 @@
75 74 -> LifeView FlyTV Platinum Mini2 [14c0:1212] 75 74 -> LifeView FlyTV Platinum Mini2 [14c0:1212]
76 75 -> AVerMedia AVerTVHD MCE A180 [1461:1044] 76 75 -> AVerMedia AVerTVHD MCE A180 [1461:1044]
77 76 -> SKNet MonsterTV Mobile [1131:4ee9] 77 76 -> SKNet MonsterTV Mobile [1131:4ee9]
78 77 -> Pinnacle PCTV 110i (saa7133) [11bd:002e] 78 77 -> Pinnacle PCTV 40i/50i/110i (saa7133) [11bd:002e]
79 78 -> ASUSTeK P7131 Dual [1043:4862] 79 78 -> ASUSTeK P7131 Dual [1043:4862]
80 79 -> Sedna/MuchTV PC TV Cardbus TV/Radio (ITO25 Rev:2B) 80 79 -> Sedna/MuchTV PC TV Cardbus TV/Radio (ITO25 Rev:2B)
81 80 -> ASUS Digimatrix TV [1043:0210] 81 80 -> ASUS Digimatrix TV [1043:0210]
diff --git a/Documentation/vm/page_migration b/Documentation/vm/page_migration
index c52820fcf500..0dd4ef30c361 100644
--- a/Documentation/vm/page_migration
+++ b/Documentation/vm/page_migration
@@ -12,12 +12,18 @@ is running.
12 12
13Page migration allows a process to manually relocate the node on which its 13Page migration allows a process to manually relocate the node on which its
14pages are located through the MF_MOVE and MF_MOVE_ALL options while setting 14pages are located through the MF_MOVE and MF_MOVE_ALL options while setting
15a new memory policy. The pages of a process can also be relocated 15a new memory policy via mbind(). The pages of a process can also be relocated
16by another process using the sys_migrate_pages() function call. The 16by another process using the sys_migrate_pages() function call. The
17migrate_pages function call takes two sets of nodes and moves pages of a 17migrate_pages function call takes two sets of nodes and moves pages of a
18process that are located on the from nodes to the destination nodes. 18process that are located on the from nodes to the destination nodes.
19 19Page migration functions are provided by the numactl package by Andi Kleen
20Manual migration is very useful if for example the scheduler has relocated 20(a version later than 0.9.3 is required. Get it from
21ftp://ftp.suse.com/pub/people/ak). numactl provides libnuma, which
22offers a page migration interface similar to its other NUMA functionality.
23cat /proc/<pid>/numa_maps allows an easy review of where the pages of
24a process are located. See also the numa_maps manpage in the numactl package.
25
26Manual migration is useful if for example the scheduler has relocated
21a process to a processor on a distant node. A batch scheduler or an 27a process to a processor on a distant node. A batch scheduler or an
22administrator may detect the situation and move the pages of the process 28administrator may detect the situation and move the pages of the process
23nearer to the new processor. At some point in the future we may have 29nearer to the new processor. At some point in the future we may have
@@ -25,10 +31,12 @@ some mechanism in the scheduler that will automatically move the pages.
25 31
26Larger installations usually partition the system using cpusets into 32Larger installations usually partition the system using cpusets into
27sections of nodes. Paul Jackson has equipped cpusets with the ability to 33sections of nodes. Paul Jackson has equipped cpusets with the ability to
28move pages when a task is moved to another cpuset. This allows automatic 34move pages when a task is moved to another cpuset (See ../cpusets.txt).
29control over locality of a process. If a task is moved to a new cpuset 35Cpusets allows the automation of process locality. If a task is moved to
30then also all its pages are moved with it so that the performance of the 36a new cpuset then also all its pages are moved with it so that the
31process does not sink dramatically (as is the case today). 37performance of the process does not sink dramatically. Also the pages
38of processes in a cpuset are moved if the allowed memory nodes of a
39cpuset are changed.
32 40
33Page migration allows the preservation of the relative location of pages 41Page migration allows the preservation of the relative location of pages
34within a group of nodes for all migration techniques which will preserve a 42within a group of nodes for all migration techniques which will preserve a
@@ -37,22 +45,26 @@ process. This is necessary in order to preserve the memory latencies.
37Processes will run with similar performance after migration. 45Processes will run with similar performance after migration.
38 46
39Page migration occurs in several steps. First a high level 47Page migration occurs in several steps. First a high level
40description for those trying to use migrate_pages() and then 48description for those trying to use migrate_pages() from the kernel
41a low level description of how the low level details work. 49(for userspace usage see Andi Kleen's numactl package mentioned above)
50and then a low level description of how the low level details work.
42 51
43A. Use of migrate_pages() 52A. In kernel use of migrate_pages()
44------------------------- 53-----------------------------------
45 54
461. Remove pages from the LRU. 551. Remove pages from the LRU.
47 56
48 Lists of pages to be migrated are generated by scanning over 57 Lists of pages to be migrated are generated by scanning over
49 pages and moving them into lists. This is done by 58 pages and moving them into lists. This is done by
50 calling isolate_lru_page() or __isolate_lru_page(). 59 calling isolate_lru_page().
51 Calling isolate_lru_page increases the references to the page 60 Calling isolate_lru_page increases the references to the page
52 so that it cannot vanish under us. 61 so that it cannot vanish while the page migration occurs.
62 It also prevents the swapper or other scans from encountering
63 the page.
53 64
542. Generate a list of newly allocated pages to move the contents 652. Generate a list of newly allocated pages. These pages will contain the
55 of the first list to. 66 contents of the pages from the first list after page migration is
67 complete.
56 68
573. The migrate_pages() function is called which attempts 693. The migrate_pages() function is called which attempts
58 to do the migration. It returns the moved pages in the 70 to do the migration. It returns the moved pages in the
@@ -63,13 +75,17 @@ A. Use of migrate_pages()
634. The leftover pages of various types are returned 754. The leftover pages of various types are returned
64 to the LRU using putback_to_lru_pages() or otherwise 76 to the LRU using putback_to_lru_pages() or otherwise
65 disposed of. The pages will still have the refcount as 77 disposed of. The pages will still have the refcount as
66 increased by isolate_lru_page()! 78 increased by isolate_lru_page() if putback_to_lru_pages() is not
79 used! The kernel may want to handle the various cases of failures in
80 different ways.
67 81
68B. Operation of migrate_pages() 82B. How migrate_pages() works
69-------------------------------- 83----------------------------
70 84
71migrate_pages does several passes over its list of pages. A page is moved 85migrate_pages() does several passes over its list of pages. A page is moved
72if all references to a page are removable at the time. 86if all references to a page are removable at the time. The page has
87already been removed from the LRU via isolate_lru_page() and the refcount
88is increased so that the page cannot be freed while page migration occurs.
73 89
74Steps: 90Steps:
75 91
@@ -79,36 +95,40 @@ Steps:
79 95
803. Make sure that the page has an assigned swap cache entry if 963. Make sure that the page has an assigned swap cache entry if
81 it is an anonymous page. The swap cache reference is necessary 97 it is an anonymous page. The swap cache reference is necessary
82 to preserve the information contained in the page table maps. 98 to preserve the information contained in the page table maps while
99 page migration occurs.
83 100
844. Prep the new page that we want to move to. It is locked 1014. Prep the new page that we want to move to. It is locked
85 and set to not being uptodate so that all accesses to the new 102 and set to not being uptodate so that all accesses to the new
86 page immediately lock while we are moving references. 103 page immediately lock while the move is in progress.
87 104
885. All the page table references to the page are either dropped (file backed) 1055. All the page table references to the page are either dropped (file
89 or converted to swap references (anonymous pages). This should decrease the 106 backed pages) or converted to swap references (anonymous pages).
90 reference count. 107 This should decrease the reference count.
91 108
926. The radix tree lock is taken 1096. The radix tree lock is taken. This will cause all processes trying
110 to reestablish a pte to block on the radix tree spinlock.
93 111
947. The refcount of the page is examined and we back out if references remain; 1127. The refcount of the page is examined and we back out if references remain;
95 otherwise we know that we are the only one referencing this page. 113 otherwise we know that we are the only one referencing this page.
96 114
978. The radix tree is checked and if it does not contain the pointer to this 1158. The radix tree is checked and if it does not contain the pointer to this
98 page then we back out. 116 page then we back out because someone else modified the mapping first.
99 117
1009. The mapping is checked. If the mapping is gone then a truncate action may 1189. The mapping is checked. If the mapping is gone then a truncate action may
101 be in progress and we back out. 119 be in progress and we back out.
102 120
10310. The new page is prepped with some settings from the old page so that accesses 12110. The new page is prepped with some settings from the old page so that
104 to the new page will be discovered to have the correct settings. 122 accesses to the new page will be discovered to have the correct settings.
105 123
10611. The radix tree is changed to point to the new page. 12411. The radix tree is changed to point to the new page.
107 125
10812. The reference count of the old page is dropped because the reference has now 12612. The reference count of the old page is dropped because the radix tree
109 been removed. 127 reference is gone.
110 128
11113. The radix tree lock is dropped. 12913. The radix tree lock is dropped. With that lookups become possible again
130 and other processes will move from spinning on the tree lock to sleeping on
131 the locked new page.
112 132
11314. The page contents are copied to the new page. 13314. The page contents are copied to the new page.
114 134
@@ -119,11 +139,37 @@ Steps:
119 139
12017. Queued up writeback on the new page is triggered. 14017. Queued up writeback on the new page is triggered.
121 141
12218. If swap pte's were generated for the page then remove them again. 14218. If swap pte's were generated for the page then replace them with real
143 ptes. This will reenable access for processes not blocked by the page lock.
144
14519. The page locks are dropped from the old and new page.
146 Processes waiting on the page lock can continue.
147
14820. The new page is moved to the LRU and can be scanned by the swapper
149 etc again.
150
151TODO list
152---------
153
154- Page migration requires the use of swap handles to preserve the
155 information of the anonymous page table entries. This means that swap
156 space is reserved but never used. The maximum number of swap handles used
157 is determined by CHUNK_SIZE (see mm/mempolicy.c) per ongoing migration.
158 Reservation of pages could be avoided by having a special type of swap
159 handle that does not require swap space and that would only track the page
160 references. Something like that was proposed by Marcelo Tosatti in the
161 past (search for migration cache on lkml or linux-mm@kvack.org).
123 162
12419. The locks are dropped from the old and new page. 163- Page migration unmaps ptes for file backed pages and requires page
164 faults to reestablish these ptes. This could be optimized by somehow
165 recording the references before migration and then reestablish them later.
166 However, there are several locking challenges that have to be overcome
167 before this is possible.
125 168
12620. The new page is moved to the LRU. 169- Page migration generates read ptes for anonymous pages. Dirty page
170 faults are required to make the pages writable again. It may be possible
171 to generate a pte marked dirty if it is known that the page is dirty and
172 that this process has the only reference to that page.
127 173
128Christoph Lameter, December 19, 2005. 174Christoph Lameter, March 8, 2006.
129 175
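
The back-out checks and the pointer switch in steps 7 to 12 and 14 of "How migrate_pages() works" can be modelled in a few lines. This is a toy userspace sketch under loud assumptions: `struct toy_page`, `struct toy_mapping` and `toy_migrate()` are hypothetical names, a single pointer stands in for the radix tree, and no locking is shown. It is not the kernel's implementation, only an illustration of when the move is allowed to proceed.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT the kernel's struct page or radix tree. */
struct toy_page {
	int refcount;	/* references currently held on the page */
	int data;	/* stands in for the page contents */
};

struct toy_mapping {
	struct toy_page *slot;	/* a single "radix tree" slot */
};

/*
 * Try to move old_page to new_page. expected_refs is the number of
 * references the migration path itself accounts for (an assumption of
 * this model). Returns 0 on success, -1 if the caller must back out.
 * The mapping check comes first here only because the model has to
 * dereference the mapping for the slot check.
 */
int toy_migrate(struct toy_mapping *m, struct toy_page *old_page,
		struct toy_page *new_page, int expected_refs)
{
	if (m == NULL)				/* step 9: mapping is gone */
		return -1;
	if (old_page->refcount != expected_refs)	/* step 7: references remain */
		return -1;
	if (m->slot != old_page)		/* step 8: someone changed the slot */
		return -1;

	new_page->refcount++;			/* the slot will reference new_page */
	m->slot = new_page;			/* step 11: switch the pointer */
	old_page->refcount--;			/* step 12: slot reference is gone */
	new_page->data = old_page->data;	/* step 14: copy the contents */
	return 0;
}
```

In this model a page with one unexpected extra reference is left untouched and the slot keeps pointing at the old page, which mirrors the "back out" cases in the step list.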
diff --git a/Documentation/x86_64/boot-options.txt b/Documentation/x86_64/boot-options.txt
index 153740f460a6..1921353259ae 100644
--- a/Documentation/x86_64/boot-options.txt
+++ b/Documentation/x86_64/boot-options.txt
@@ -52,6 +52,10 @@ APICs
52 apicmaintimer. Useful when your PIT timer is totally 52 apicmaintimer. Useful when your PIT timer is totally
53 broken. 53 broken.
54 54
55 disable_8254_timer / enable_8254_timer
56 Disable or enable interrupt 0 timer routing over the 8254 in addition to
57 routing over the IO-APIC. The kernel tries to set a sensible default.
58
55Early Console 59Early Console
56 60
57 syntax: earlyprintk=vga 61 syntax: earlyprintk=vga