author: Linus Torvalds <torvalds@g5.osdl.org> 2006-03-21 11:52:18 -0500
committer: Linus Torvalds <torvalds@g5.osdl.org> 2006-03-21 11:52:18 -0500
commit: b05005772f34497eb2b7415a651fe785cbe70e16
tree: b176aeb7fa9baf69e77ddd83e844727490bfcf28 /Documentation
parent: 044f324f6ea5d55391db62fca6a295b2651cb946
parent: 7705a8792b0fc82fd7d4dd923724606bbfd9fb20
Merge branch 'origin'

Conflicts:
	Documentation/video4linux/CARDLIST.cx88
	drivers/media/video/cx88/Kconfig
	drivers/media/video/em28xx/em28xx-video.c
	drivers/media/video/saa7134/saa7134-dvb.c

Resolved as in the original merge by Mauro Carvalho Chehab
Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/RCU/RTFP.txt | 25
-rw-r--r-- Documentation/RCU/checklist.txt | 6
-rw-r--r-- Documentation/RCU/listRCU.txt | 21
-rw-r--r-- Documentation/RCU/rcu.txt | 5
-rw-r--r-- Documentation/RCU/rcuref.txt | 31
-rw-r--r-- Documentation/RCU/whatisRCU.txt | 29
-rw-r--r-- Documentation/cpu-hotplug.txt | 27
-rw-r--r-- Documentation/cpusets.txt | 41
-rw-r--r-- Documentation/cputopology.txt | 41
-rw-r--r-- Documentation/driver-model/overview.txt | 57
-rw-r--r-- Documentation/feature-removal-schedule.txt | 41
-rw-r--r-- Documentation/filesystems/configfs/configfs_example.c | 2
-rw-r--r-- Documentation/filesystems/ntfs.txt | 6
-rw-r--r-- Documentation/filesystems/ocfs2.txt | 1
-rw-r--r-- Documentation/filesystems/tmpfs.txt | 30
-rw-r--r-- Documentation/filesystems/v9fs.txt | 16
-rw-r--r-- Documentation/fujitsu/frv/kernel-ABI.txt | 234
-rw-r--r-- Documentation/hwmon/f71805f | 105
-rw-r--r-- Documentation/hwmon/it87 | 2
-rw-r--r-- Documentation/hwmon/sysfs-interface | 18
-rw-r--r-- Documentation/hwmon/w83627hf | 4
-rw-r--r-- Documentation/i2c/busses/i2c-sis96x (renamed from Documentation/i2c/busses/i2c-sis69x) | 4
-rw-r--r-- Documentation/kernel-doc-nano-HOWTO.txt | 39
-rw-r--r-- Documentation/kernel-parameters.txt | 31
-rw-r--r-- Documentation/kprobes.txt | 81
-rw-r--r-- Documentation/mips/AU1xxx_IDE.README | 6
-rw-r--r-- Documentation/networking/ip-sysctl.txt | 17
-rw-r--r-- Documentation/parport-lowlevel.txt | 8
-rw-r--r-- Documentation/pci-error-recovery.txt | 472
-rw-r--r-- Documentation/power/interface.txt | 2
-rw-r--r-- Documentation/power/swsusp.txt | 2
-rw-r--r-- Documentation/powerpc/booting-without-of.txt | 1486
-rw-r--r-- Documentation/scsi/ChangeLog.megaraid_sas | 47
-rw-r--r-- Documentation/scsi/aic79xx.txt | 93
-rw-r--r-- Documentation/scsi/aic7xxx.txt | 86
-rw-r--r-- Documentation/sound/alsa/ALSA-Configuration.txt | 10
-rw-r--r-- Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl | 4
-rw-r--r-- Documentation/spi/butterfly | 23
-rw-r--r-- Documentation/sysctl/kernel.txt | 10
-rw-r--r-- Documentation/sysctl/vm.txt | 56
-rw-r--r-- Documentation/unshare.txt | 295
-rw-r--r-- Documentation/usb/et61x251.txt | 306
-rw-r--r-- Documentation/usb/sn9c102.txt | 95
-rw-r--r-- Documentation/usb/w9968cf.txt | 30
-rw-r--r-- Documentation/vm/page_migration | 175
-rw-r--r-- Documentation/x86_64/boot-options.txt | 16
46 files changed, 3606 insertions, 530 deletions
diff --git a/Documentation/RCU/RTFP.txt b/Documentation/RCU/RTFP.txt
index fcbcbc35b122..6221464d1a7e 100644
--- a/Documentation/RCU/RTFP.txt
+++ b/Documentation/RCU/RTFP.txt
@@ -90,16 +90,20 @@ at OLS. The resulting abundance of RCU patches was presented the
90following year [McKenney02a], and use of RCU in dcache was first 90following year [McKenney02a], and use of RCU in dcache was first
91described that same year [Linder02a]. 91described that same year [Linder02a].
92 92
93Also in 2002, Michael [Michael02b,Michael02a] presented techniques 93Also in 2002, Michael [Michael02b,Michael02a] presented "hazard-pointer"
94that defer the destruction of data structures to simplify non-blocking 94techniques that defer the destruction of data structures to simplify
95synchronization (wait-free synchronization, lock-free synchronization, 95non-blocking synchronization (wait-free synchronization, lock-free
96and obstruction-free synchronization are all examples of non-blocking 96synchronization, and obstruction-free synchronization are all examples of
97synchronization). In particular, this technique eliminates locking, 97non-blocking synchronization). In particular, this technique eliminates
98reduces contention, reduces memory latency for readers, and parallelizes 98locking, reduces contention, reduces memory latency for readers, and
99pipeline stalls and memory latency for writers. However, these 99parallelizes pipeline stalls and memory latency for writers. However,
100techniques still impose significant read-side overhead in the form of 100these techniques still impose significant read-side overhead in the
101memory barriers. Researchers at Sun worked along similar lines in the 101form of memory barriers. Researchers at Sun worked along similar lines
102same timeframe [HerlihyLM02,HerlihyLMS03]. 102in the same timeframe [HerlihyLM02,HerlihyLMS03]. These techniques
103can be thought of as inside-out reference counts, where the count is
104represented by the number of hazard pointers referencing a given data
105structure (rather than the more conventional counter field within the
106data structure itself).
103 107
104In 2003, the K42 group described how RCU could be used to create 108In 2003, the K42 group described how RCU could be used to create
105hot-pluggable implementations of operating-system functions. Later that 109hot-pluggable implementations of operating-system functions. Later that
@@ -113,7 +117,6 @@ number of operating-system kernels [PaulEdwardMcKenneyPhD], a paper
113describing how to make RCU safe for soft-realtime applications [Sarma04c], 117describing how to make RCU safe for soft-realtime applications [Sarma04c],
114and a paper describing SELinux performance with RCU [JamesMorris04b]. 118and a paper describing SELinux performance with RCU [JamesMorris04b].
115 119
116
1172005 has seen further adaptation of RCU to realtime use, permitting 1202005 has seen further adaptation of RCU to realtime use, permitting
118preemption of RCU realtime critical sections [PaulMcKenney05a, 121preemption of RCU realtime critical sections [PaulMcKenney05a,
119PaulMcKenney05b]. 122PaulMcKenney05b].
diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
index e118a7c1a092..49e27cc19385 100644
--- a/Documentation/RCU/checklist.txt
+++ b/Documentation/RCU/checklist.txt
@@ -177,3 +177,9 @@ over a rather long period of time, but improvements are always welcome!
177 177
178 If you want to wait for some of these other things, you might 178 If you want to wait for some of these other things, you might
179 instead need to use synchronize_irq() or synchronize_sched(). 179 instead need to use synchronize_irq() or synchronize_sched().
180
18112. Any lock acquired by an RCU callback must be acquired elsewhere
182 with irq disabled, e.g., via spin_lock_irqsave(). Failing to
183 disable irq on a given acquisition of that lock will result in
184 deadlock as soon as the RCU callback happens to interrupt that
185 acquisition's critical section.
diff --git a/Documentation/RCU/listRCU.txt b/Documentation/RCU/listRCU.txt
index f8a54fa0d8ab..1fd175368a87 100644
--- a/Documentation/RCU/listRCU.txt
+++ b/Documentation/RCU/listRCU.txt
@@ -232,7 +232,7 @@ entry does not exist. For this to be helpful, the search function must
232return holding the per-entry spinlock, as ipc_lock() does in fact do. 232return holding the per-entry spinlock, as ipc_lock() does in fact do.
233 233
234Quick Quiz: Why does the search function need to return holding the 234Quick Quiz: Why does the search function need to return holding the
235per-entry lock for this deleted-flag technique to be helpful? 235 per-entry lock for this deleted-flag technique to be helpful?
236 236
237If the system-call audit module were to ever need to reject stale data, 237If the system-call audit module were to ever need to reject stale data,
238one way to accomplish this would be to add a "deleted" flag and a "lock" 238one way to accomplish this would be to add a "deleted" flag and a "lock"
@@ -275,8 +275,8 @@ flag under the spinlock as follows:
275 { 275 {
276 struct audit_entry *e; 276 struct audit_entry *e;
277 277
278 /* Do not use the _rcu iterator here, since this is the only 278 /* Do not need to use the _rcu iterator here, since this
279 * deletion routine. */ 279 * is the only deletion routine. */
280 list_for_each_entry(e, list, list) { 280 list_for_each_entry(e, list, list) {
281 if (!audit_compare_rule(rule, &e->rule)) { 281 if (!audit_compare_rule(rule, &e->rule)) {
282 spin_lock(&e->lock); 282 spin_lock(&e->lock);
@@ -304,9 +304,12 @@ function to reject newly deleted data.
304 304
305 305
306Answer to Quick Quiz 306Answer to Quick Quiz
307 307 Why does the search function need to return holding the per-entry
308If the search function drops the per-entry lock before returning, then 308 lock for this deleted-flag technique to be helpful?
309the caller will be processing stale data in any case. If it is really 309
310OK to be processing stale data, then you don't need a "deleted" flag. 310 If the search function drops the per-entry lock before returning,
311If processing stale data really is a problem, then you need to hold the 311 then the caller will be processing stale data in any case. If it
312per-entry lock across all of the code that uses the value looked up. 312 is really OK to be processing stale data, then you don't need a
313 "deleted" flag. If processing stale data really is a problem,
314 then you need to hold the per-entry lock across all of the code
315 that uses the value that was returned.
diff --git a/Documentation/RCU/rcu.txt b/Documentation/RCU/rcu.txt
index 6fa092251586..02e27bf1d365 100644
--- a/Documentation/RCU/rcu.txt
+++ b/Documentation/RCU/rcu.txt
@@ -111,6 +111,11 @@ o What are all these files in this directory?
111 111
112 You are reading it! 112 You are reading it!
113 113
114 rcuref.txt
115
116 Describes how to combine use of reference counts
117 with RCU.
118
114 whatisRCU.txt 119 whatisRCU.txt
115 120
116 Overview of how the RCU implementation works. Along 121 Overview of how the RCU implementation works. Along
diff --git a/Documentation/RCU/rcuref.txt b/Documentation/RCU/rcuref.txt
index 3f60db41b2f0..451de2ad8329 100644
--- a/Documentation/RCU/rcuref.txt
+++ b/Documentation/RCU/rcuref.txt
@@ -1,7 +1,7 @@
1Refcounter design for elements of lists/arrays protected by RCU. 1Reference-count design for elements of lists/arrays protected by RCU.
2 2
3Refcounting on elements of lists which are protected by traditional 3Reference counting on elements of lists which are protected by traditional
4reader/writer spinlocks or semaphores are straight forward as in: 4reader/writer spinlocks or semaphores are straightforward:
5 5
61. 2. 61. 2.
7add() search_and_reference() 7add() search_and_reference()
@@ -28,12 +28,12 @@ release_referenced() delete()
28 ... 28 ...
29 } 29 }
30 30
31If this list/array is made lock free using rcu as in changing the 31If this list/array is made lock free using RCU as in changing the
32write_lock in add() and delete() to spin_lock and changing read_lock 32write_lock() in add() and delete() to spin_lock and changing read_lock
33in search_and_reference to rcu_read_lock(), the atomic_get in 33in search_and_reference to rcu_read_lock(), the atomic_get in
34search_and_reference could potentially hold reference to an element which 34search_and_reference could potentially hold reference to an element which
35has already been deleted from the list/array. atomic_inc_not_zero takes 35has already been deleted from the list/array. Use atomic_inc_not_zero()
36care of this scenario. search_and_reference should look as; 36in this scenario as follows:
37 37
381. 2. 381. 2.
39add() search_and_reference() 39add() search_and_reference()
@@ -51,17 +51,16 @@ add() search_and_reference()
51release_referenced() delete() 51release_referenced() delete()
52{ { 52{ {
53 ... write_lock(&list_lock); 53 ... write_lock(&list_lock);
54 atomic_dec(&el->rc, relfunc) ... 54 if (atomic_dec_and_test(&el->rc)) ...
55 ... delete_element 55 call_rcu(&el->head, el_free); delete_element
56} write_unlock(&list_lock); 56 ... write_unlock(&list_lock);
57 ... 57} ...
58 if (atomic_dec_and_test(&el->rc)) 58 if (atomic_dec_and_test(&el->rc))
59 call_rcu(&el->head, el_free); 59 call_rcu(&el->head, el_free);
60 ... 60 ...
61 } 61 }
62 62
63Sometimes, reference to the element need to be obtained in the 63Sometimes, a reference to the element needs to be obtained in the
64update (write) stream. In such cases, atomic_inc_not_zero might be an 64update (write) stream. In such cases, atomic_inc_not_zero() might be
65overkill since the spinlock serialising list updates are held. atomic_inc 65overkill, since we hold the update-side spinlock. One might instead
66is to be used in such cases. 66use atomic_inc() in such cases.
67
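Editor's note on the rcuref.txt change above: the "increment only if not already zero" rule can be sketched in user space with C11 atomics. This is an illustrative stand-in, not the kernel's atomic_inc_not_zero() implementation; struct el and the helper names el_get_unless_zero() and el_put() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical element type; "rc" mirrors the el->rc field in the text. */
struct el {
	atomic_int rc;
};

/*
 * User-space stand-in for atomic_inc_not_zero(): take a reference
 * only if the count has not already dropped to zero.
 */
static inline bool el_get_unless_zero(struct el *e)
{
	int old = atomic_load(&e->rc);

	while (old != 0) {
		/* On failure, "old" is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&e->rc, &old, old + 1))
			return true;
	}
	return false;	/* the last reference is already gone */
}

/* Drop a reference; returns true if the caller dropped the last one. */
static inline bool el_put(struct el *e)
{
	int old = atomic_fetch_sub(&e->rc, 1);

	assert(old > 0);	/* dropping below zero would be a bug */
	return old == 1;
}
```

A reader that finds an element under rcu_read_lock() would call the get helper and skip the element when it returns false. In the kernel version, the caller that sees the put helper return true would be the one to invoke call_rcu().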
diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index 15da16861fa3..5ed85af88789 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -200,10 +200,11 @@ rcu_assign_pointer()
200 the new value, and also executes any memory-barrier instructions 200 the new value, and also executes any memory-barrier instructions
201 required for a given CPU architecture. 201 required for a given CPU architecture.
202 202
203 Perhaps more important, it serves to document which pointers 203 Perhaps just as important, it serves to document (1) which
204 are protected by RCU. That said, rcu_assign_pointer() is most 204 pointers are protected by RCU and (2) the point at which a
205 frequently used indirectly, via the _rcu list-manipulation 205 given structure becomes accessible to other CPUs. That said,
206 primitives such as list_add_rcu(). 206 rcu_assign_pointer() is most frequently used indirectly, via
207 the _rcu list-manipulation primitives such as list_add_rcu().
207 208
208rcu_dereference() 209rcu_dereference()
209 210
@@ -258,9 +259,11 @@ rcu_dereference()
258 locking. 259 locking.
259 260
260 As with rcu_assign_pointer(), an important function of 261 As with rcu_assign_pointer(), an important function of
261 rcu_dereference() is to document which pointers are protected 262 rcu_dereference() is to document which pointers are protected by
262 by RCU. And, again like rcu_assign_pointer(), rcu_dereference() 263 RCU, in particular, flagging a pointer that is subject to changing
263 is typically used indirectly, via the _rcu list-manipulation 264 at any time, including immediately after the rcu_dereference().
265 And, again like rcu_assign_pointer(), rcu_dereference() is
266 typically used indirectly, via the _rcu list-manipulation
264 primitives, such as list_for_each_entry_rcu(). 267 primitives, such as list_for_each_entry_rcu().
265 268
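Editor's note on the two primitives described above: the publish/subscribe pairing can be approximated in user space with C11 release and acquire operations. This is a rough analogue only. It omits grace periods and reclamation entirely, and the names gp, publish_foo() and read_foo_a() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical payload structure. */
struct foo {
	int a;
};

/* Shared pointer; NULL until something has been published. */
static _Atomic(struct foo *) gp;

/*
 * Rough analogue of rcu_assign_pointer(): finish initializing the
 * structure, then publish it with release ordering so that readers
 * who see the pointer also see the initialized fields.
 */
static inline void publish_foo(struct foo *p, int a)
{
	p->a = a;
	atomic_store_explicit(&gp, p, memory_order_release);
}

/*
 * Rough analogue of rcu_dereference(): fetch the pointer exactly once
 * and work only with the local copy, since the shared pointer may
 * change immediately after the fetch.
 */
static inline int read_foo_a(int fallback)
{
	struct foo *p = atomic_load_explicit(&gp, memory_order_acquire);

	return p ? p->a : fallback;
}
```

Fetching into a local variable is the point of the sketch: re-reading the shared pointer a second time could return a different structure.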
266The following diagram shows how each API communicates among the 269The following diagram shows how each API communicates among the
@@ -327,7 +330,7 @@ for specialized uses, but are relatively uncommon.
3273. WHAT ARE SOME EXAMPLE USES OF CORE RCU API? 3303. WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
328 331
329This section shows a simple use of the core RCU API to protect a 332This section shows a simple use of the core RCU API to protect a
330global pointer to a dynamically allocated structure. More typical 333global pointer to a dynamically allocated structure. More-typical
331uses of RCU may be found in listRCU.txt, arrayRCU.txt, and NMI-RCU.txt. 334uses of RCU may be found in listRCU.txt, arrayRCU.txt, and NMI-RCU.txt.
332 335
333 struct foo { 336 struct foo {
@@ -410,6 +413,8 @@ o Use synchronize_rcu() -after- removing a data element from an
410 data item. 413 data item.
411 414
412See checklist.txt for additional rules to follow when using RCU. 415See checklist.txt for additional rules to follow when using RCU.
416And again, more-typical uses of RCU may be found in listRCU.txt,
417arrayRCU.txt, and NMI-RCU.txt.
413 418
414 419
4154. WHAT IF MY UPDATING THREAD CANNOT BLOCK? 4204. WHAT IF MY UPDATING THREAD CANNOT BLOCK?
@@ -513,7 +518,7 @@ production-quality implementation, and see:
513 518
514for papers describing the Linux kernel RCU implementation. The OLS'01 519for papers describing the Linux kernel RCU implementation. The OLS'01
515and OLS'02 papers are a good introduction, and the dissertation provides 520and OLS'02 papers are a good introduction, and the dissertation provides
516more details on the current implementation. 521more details on the current implementation as of early 2004.
517 522
518 523
5195A. "TOY" IMPLEMENTATION #1: LOCKING 5245A. "TOY" IMPLEMENTATION #1: LOCKING
@@ -768,7 +773,6 @@ RCU pointer/list traversal:
768 rcu_dereference 773 rcu_dereference
769 list_for_each_rcu (to be deprecated in favor of 774 list_for_each_rcu (to be deprecated in favor of
770 list_for_each_entry_rcu) 775 list_for_each_entry_rcu)
771 list_for_each_safe_rcu (deprecated, not used)
772 list_for_each_entry_rcu 776 list_for_each_entry_rcu
773 list_for_each_continue_rcu (to be deprecated in favor of new 777 list_for_each_continue_rcu (to be deprecated in favor of new
774 list_for_each_entry_continue_rcu) 778 list_for_each_entry_continue_rcu)
@@ -807,7 +811,8 @@ Quick Quiz #1: Why is this argument naive? How could a deadlock
807Answer: Consider the following sequence of events: 811Answer: Consider the following sequence of events:
808 812
809 1. CPU 0 acquires some unrelated lock, call it 813 1. CPU 0 acquires some unrelated lock, call it
810 "problematic_lock". 814 "problematic_lock", disabling irq via
815 spin_lock_irqsave().
811 816
812 2. CPU 1 enters synchronize_rcu(), write-acquiring 817 2. CPU 1 enters synchronize_rcu(), write-acquiring
813 rcu_gp_mutex. 818 rcu_gp_mutex.
@@ -894,7 +899,7 @@ Answer: Just as PREEMPT_RT permits preemption of spinlock
894ACKNOWLEDGEMENTS 899ACKNOWLEDGEMENTS
895 900
896My thanks to the people who helped make this human-readable, including 901My thanks to the people who helped make this human-readable, including
897Jon Walpole, Josh Triplett, Serge Hallyn, and Suzanne Wood. 902Jon Walpole, Josh Triplett, Serge Hallyn, Suzanne Wood, and Alan Stern.
898 903
899 904
900For more information, see http://www.rdrop.com/users/paulmck/RCU. 905For more information, see http://www.rdrop.com/users/paulmck/RCU.
diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt
index 08c5d04f3086..57a09f99ecb0 100644
--- a/Documentation/cpu-hotplug.txt
+++ b/Documentation/cpu-hotplug.txt
@@ -11,6 +11,8 @@
11 Joel Schopp <jschopp@austin.ibm.com> 11 Joel Schopp <jschopp@austin.ibm.com>
12 ia64/x86_64: 12 ia64/x86_64:
13 Ashok Raj <ashok.raj@intel.com> 13 Ashok Raj <ashok.raj@intel.com>
14 s390:
15 Heiko Carstens <heiko.carstens@de.ibm.com>
14 16
15Authors: Ashok Raj <ashok.raj@intel.com> 17Authors: Ashok Raj <ashok.raj@intel.com>
16Lots of feedback: Nathan Lynch <nathanl@austin.ibm.com>, 18Lots of feedback: Nathan Lynch <nathanl@austin.ibm.com>,
@@ -44,9 +46,28 @@ maxcpus=n Restrict boot time cpus to n. Say if you have 4 cpus, using
44 maxcpus=2 will only boot 2. You can choose to bring the 46 maxcpus=2 will only boot 2. You can choose to bring the
45 other cpus later online, read FAQ's for more info. 47 other cpus later online, read FAQ's for more info.
46 48
47additional_cpus=n [x86_64 only] use this to limit hotpluggable cpus. 49additional_cpus*=n Use this to limit hotpluggable cpus. This option sets
48 This option sets 50 cpu_possible_map = cpu_present_map + additional_cpus
49 cpu_possible_map = cpu_present_map + additional_cpus 51
52(*) Option valid only for following architectures
53- x86_64, ia64, s390
54
55ia64 and x86_64 use the number of disabled local apics in ACPI tables MADT
56to determine the number of potentially hot-pluggable cpus. The implementation
57should only rely on this to count the #of cpus, but *MUST* not rely on the
58apicid values in those tables for disabled apics. In the event BIOS doesnt
59mark such hot-pluggable cpus as disabled entries, one could use this
60parameter "additional_cpus=x" to represent those cpus in the cpu_possible_map.
61
62s390 uses the number of cpus it detects at IPL time to also the number of bits
63in cpu_possible_map. If it is desired to add additional cpus at a later time
64the number should be specified using this option or the possible_cpus option.
65
66possible_cpus=n [s390 only] use this to set hotpluggable cpus.
67 This option sets possible_cpus bits in
68 cpu_possible_map. Thus keeping the numbers of bits set
69 constant even if the machine gets rebooted.
70 This option overrides additional_cpus.
50 71
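Editor's note on the boot options above: the way additional_cpus and possible_cpus combine can be written out as simple arithmetic. This is a sketch under stated assumptions, not kernel code. The function name possible_cpu_count(), the use of 0 to mean "option not given", and the clamp to a compile-time maximum are all hypothetical.

```c
#include <assert.h>

/*
 * possible = present + additional_cpus, unless possible_cpus is given,
 * in which case possible_cpus overrides additional_cpus.
 * Assumptions: 0 means "option not specified", and the result is
 * clamped to nr_cpus_max.
 */
static inline unsigned int possible_cpu_count(unsigned int present,
					      unsigned int additional_cpus,
					      unsigned int possible_cpus,
					      unsigned int nr_cpus_max)
{
	unsigned int n = possible_cpus ? possible_cpus
				       : present + additional_cpus;

	return n > nr_cpus_max ? nr_cpus_max : n;
}
```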
51CPU maps and such 72CPU maps and such
52----------------- 73-----------------
diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt
index 990998ee10b6..30c41459953c 100644
--- a/Documentation/cpusets.txt
+++ b/Documentation/cpusets.txt
@@ -4,8 +4,9 @@
4Copyright (C) 2004 BULL SA. 4Copyright (C) 2004 BULL SA.
5Written by Simon.Derr@bull.net 5Written by Simon.Derr@bull.net
6 6
7Portions Copyright (c) 2004 Silicon Graphics, Inc. 7Portions Copyright (c) 2004-2006 Silicon Graphics, Inc.
8Modified by Paul Jackson <pj@sgi.com> 8Modified by Paul Jackson <pj@sgi.com>
9Modified by Christoph Lameter <clameter@sgi.com>
9 10
10CONTENTS: 11CONTENTS:
11========= 12=========
@@ -90,7 +91,8 @@ This can be especially valuable on:
90 91
91These subsets, or "soft partitions" must be able to be dynamically 92These subsets, or "soft partitions" must be able to be dynamically
92adjusted, as the job mix changes, without impacting other concurrently 93adjusted, as the job mix changes, without impacting other concurrently
93executing jobs. 94executing jobs. The location of the running jobs pages may also be moved
95when the memory locations are changed.
94 96
95The kernel cpuset patch provides the minimum essential kernel 97The kernel cpuset patch provides the minimum essential kernel
96mechanisms required to efficiently implement such subsets. It 98mechanisms required to efficiently implement such subsets. It
@@ -102,8 +104,8 @@ memory allocator code.
1021.3 How are cpusets implemented ? 1041.3 How are cpusets implemented ?
103--------------------------------- 105---------------------------------
104 106
105Cpusets provide a Linux kernel (2.6.7 and above) mechanism to constrain 107Cpusets provide a Linux kernel mechanism to constrain which CPUs and
106which CPUs and Memory Nodes are used by a process or set of processes. 108Memory Nodes are used by a process or set of processes.
107 109
108The Linux kernel already has a pair of mechanisms to specify on which 110The Linux kernel already has a pair of mechanisms to specify on which
109CPUs a task may be scheduled (sched_setaffinity) and on which Memory 111CPUs a task may be scheduled (sched_setaffinity) and on which Memory
@@ -371,22 +373,17 @@ cpusets memory placement policy 'mems' subsequently changes.
371If the cpuset flag file 'memory_migrate' is set true, then when 373If the cpuset flag file 'memory_migrate' is set true, then when
372tasks are attached to that cpuset, any pages that task had 374tasks are attached to that cpuset, any pages that task had
373allocated to it on nodes in its previous cpuset are migrated 375allocated to it on nodes in its previous cpuset are migrated
374to the tasks new cpuset. Depending on the implementation, 376to the tasks new cpuset. The relative placement of the page within
375this migration may either be done by swapping the page out, 377the cpuset is preserved during these migration operations if possible.
376so that the next time the page is referenced, it will be paged 378For example if the page was on the second valid node of the prior cpuset
377into the tasks new cpuset, usually on the node where it was 379then the page will be placed on the second valid node of the new cpuset.
378referenced, or this migration may be done by directly copying 380
379the pages from the tasks previous cpuset to the new cpuset,
380where possible to the same node, relative to the new cpuset,
381as the node that held the page, relative to the old cpuset.
382Also if 'memory_migrate' is set true, then if that cpusets 381Also if 'memory_migrate' is set true, then if that cpusets
383'mems' file is modified, pages allocated to tasks in that 382'mems' file is modified, pages allocated to tasks in that
384cpuset, that were on nodes in the previous setting of 'mems', 383cpuset, that were on nodes in the previous setting of 'mems',
385will be moved to nodes in the new setting of 'mems.' Again, 384will be moved to nodes in the new setting of 'mems.'
386depending on the implementation, this might be done by swapping, 385Pages that were not in the tasks prior cpuset, or in the cpusets
387or by direct copying. In either case, pages that were not in 386prior 'mems' setting, will not be moved.
388the tasks prior cpuset, or in the cpusets prior 'mems' setting,
389will not be moved.
390 387
391There is an exception to the above. If hotplug functionality is used 388There is an exception to the above. If hotplug functionality is used
392to remove all the CPUs that are currently assigned to a cpuset, 389to remove all the CPUs that are currently assigned to a cpuset,
@@ -434,16 +431,6 @@ and then start a subshell 'sh' in that cpuset:
434 # The next line should display '/Charlie' 431 # The next line should display '/Charlie'
435 cat /proc/self/cpuset 432 cat /proc/self/cpuset
436 433
437In the case that a change of cpuset includes wanting to move already
438allocated memory pages, consider further the work of IWAMOTO
439Toshihiro <iwamoto@valinux.co.jp> for page remapping and memory
440hotremoval, which can be found at:
441
442 http://people.valinux.co.jp/~iwamoto/mh.html
443
444The integration of cpusets with such memory migration is not yet
445available.
446
447In the future, a C library interface to cpusets will likely be 434In the future, a C library interface to cpusets will likely be
448available. For now, the only way to query or modify cpusets is 435available. For now, the only way to query or modify cpusets is
449via the cpuset file system, using the various cd, mkdir, echo, cat, 436via the cpuset file system, using the various cd, mkdir, echo, cat,
diff --git a/Documentation/cputopology.txt b/Documentation/cputopology.txt
new file mode 100644
index 000000000000..ff280e2e1613
--- /dev/null
+++ b/Documentation/cputopology.txt
@@ -0,0 +1,41 @@
1
2Export cpu topology info by sysfs. Items (attributes) are similar
3to /proc/cpuinfo.
4
51) /sys/devices/system/cpu/cpuX/topology/physical_package_id:
6represent the physical package id of cpu X;
72) /sys/devices/system/cpu/cpuX/topology/core_id:
8represent the cpu core id to cpu X;
93) /sys/devices/system/cpu/cpuX/topology/thread_siblings:
10represent the thread siblings to cpu X in the same core;
114) /sys/devices/system/cpu/cpuX/topology/core_siblings:
12represent the thread siblings to cpu X in the same physical package;
13
14To implement it in an architecture-neutral way, a new source file,
15driver/base/topology.c, is to export the 5 attributes.
16
17If one architecture wants to support this feature, it just needs to
18implement 4 defines, typically in file include/asm-XXX/topology.h.
19The 4 defines are:
20#define topology_physical_package_id(cpu)
21#define topology_core_id(cpu)
22#define topology_thread_siblings(cpu)
23#define topology_core_siblings(cpu)
24
25The type of **_id is int.
26The type of siblings is cpumask_t.
27
28To be consistent on all architectures, the 4 attributes should have
29deafult values if their values are unavailable. Below is the rule.
301) physical_package_id: If cpu has no physical package id, -1 is the
31default value.
322) core_id: If cpu doesn't support multi-core, its core id is 0.
333) thread_siblings: Just include itself, if the cpu doesn't support
34HT/multi-thread.
354) core_siblings: Just include itself, if the cpu doesn't support
36multi-core and HT/Multi-thread.
37
38So be careful when declaring the 4 defines in include/asm-XXX/topology.h.
39
40If an attribute isn't defined on an architecture, it won't be exported.
41
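Editor's note on the new cputopology.txt above: the default values it lists could be expressed as the four defines for an architecture with no package id, no multi-core and no multi-threading. This is a user-space sketch; cpumask_sketch_t and cpumask_of_sketch() are hypothetical stand-ins for the kernel's cpumask_t handling.

```c
#include <assert.h>

/* Hypothetical stand-in for cpumask_t: one bit per CPU. */
typedef unsigned long cpumask_sketch_t;

#define cpumask_of_sketch(cpu)	((cpumask_sketch_t)1UL << (cpu))

/* Rule 1: no physical package id, so the default is -1. */
#define topology_physical_package_id(cpu)	((void)(cpu), -1)
/* Rule 2: no multi-core support, so the core id is 0. */
#define topology_core_id(cpu)			((void)(cpu), 0)
/* Rules 3 and 4: each sibling mask contains only the CPU itself. */
#define topology_thread_siblings(cpu)		(cpumask_of_sketch(cpu))
#define topology_core_siblings(cpu)		(cpumask_of_sketch(cpu))
```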
diff --git a/Documentation/driver-model/overview.txt b/Documentation/driver-model/overview.txt
index 44662735cf81..ac4a7a737e43 100644
--- a/Documentation/driver-model/overview.txt
+++ b/Documentation/driver-model/overview.txt
@@ -1,50 +1,43 @@
1The Linux Kernel Device Model 1The Linux Kernel Device Model
2 2
3Patrick Mochel <mochel@osdl.org> 3Patrick Mochel <mochel@digitalimplant.org>
4 4
526 August 2002 5Drafted 26 August 2002
6Updated 31 January 2006
6 7
7 8
8Overview 9Overview
9~~~~~~~~ 10~~~~~~~~
10 11
11This driver model is a unification of all the current, disparate driver models 12The Linux Kernel Driver Model is a unification of all the disparate driver
12that are currently in the kernel. It is intended to augment the 13models that were previously used in the kernel. It is intended to augment the
13bus-specific drivers for bridges and devices by consolidating a set of data 14bus-specific drivers for bridges and devices by consolidating a set of data
14and operations into globally accessible data structures. 15and operations into globally accessible data structures.
15 16
16Current driver models implement some sort of tree-like structure (sometimes 17Traditional driver models implemented some sort of tree-like structure
17just a list) for the devices they control. But, there is no linkage between 18(sometimes just a list) for the devices they control. There wasn't any
18the different bus types. 19uniformity across the different bus types.
19 20
20A common data structure can provide this linkage with little overhead: when a 21The current driver model provides a comon, uniform data model for describing
21bus driver discovers a particular device, it can insert it into the global 22a bus and the devices that can appear under the bus. The unified bus
22tree as well as its local tree. In fact, the local tree becomes just a subset 23model includes a set of common attributes which all busses carry, and a set
23of the global tree. 24of common callbacks, such as device discovery during bus probing, bus
24 25shutdown, bus power management, etc.
25Common data fields can also be moved out of the local bus models into the
26global model. Some of the manipulations of these fields can also be
27consolidated. Most likely, manipulation functions will become a set
28of helper functions, which the bus drivers wrap around to include any
29bus-specific items.
30
31The common device and bridge interface currently reflects the goals of the
32modern PC: namely the ability to do seamless Plug and Play, power management,
33and hot plug. (The model dictated by Intel and Microsoft (read: ACPI) ensures
34us that any device in the system may fit any of these criteria.)
35
36In reality, not every bus will be able to support such operations. But, most
37buses will support a majority of those operations, and all future buses will.
38In other words, a bus that doesn't support an operation is the exception,
39instead of the other way around.
40 26
27The common device and bridge interface reflects the goals of the modern
28computer: namely the ability to do seamless device "plug and play", power
29management, and hot plug. In particular, the model dictated by Intel and
30Microsoft (namely ACPI) ensures that almost every device on almost any bus
31on an x86-compatible system can work within this paradigm. Of course,
32not every bus is able to support all such operations, although most
33buses support a most of those operations.
41 34
42 35
43Downstream Access 36Downstream Access
44~~~~~~~~~~~~~~~~~ 37~~~~~~~~~~~~~~~~~
45 38
46Common data fields have been moved out of individual bus layers into a common 39Common data fields have been moved out of individual bus layers into a common
47data structure. But, these fields must still be accessed by the bus layers, 40data structure. These fields must still be accessed by the bus layers,
48and sometimes by the device-specific drivers. 41and sometimes by the device-specific drivers.
49 42
50Other bus layers are encouraged to do what has been done for the PCI layer. 43Other bus layers are encouraged to do what has been done for the PCI layer.
@@ -53,7 +46,7 @@ struct pci_dev now looks like this:
53struct pci_dev { 46struct pci_dev {
54 ... 47 ...
55 48
56 struct device device; 49 struct device dev;
57}; 50};
58 51
59Note first that it is statically allocated. This means only one allocation on 52Note first that it is statically allocated. This means only one allocation on
@@ -64,9 +57,9 @@ the two.
64 57
65The PCI bus layer freely accesses the fields of struct device. It knows about 58The PCI bus layer freely accesses the fields of struct device. It knows about
66the structure of struct pci_dev, and it should know the structure of struct 59the structure of struct pci_dev, and it should know the structure of struct
67device. PCI devices that have been converted generally do not touch the fields 60device. Individual PCI device drivers that have been converted to the current
68of struct device. More precisely, device-specific drivers should not touch 61driver model generally do not and should not touch the fields of struct device,
69fields of struct device unless there is a strong compelling reason to do so. 62unless there is a strong compelling reason to do so.
70 63
71This abstraction is prevention of unnecessary pain during transitional phases. 64This abstraction is prevention of unnecessary pain during transitional phases.
72If the name of the field changes or is removed, then every downstream driver 65If the name of the field changes or is removed, then every downstream driver
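
The embedding of struct device inside struct pci_dev described in the overview.txt hunk above can be illustrated with a small userspace sketch. The two structures here are simplified stand-ins (the bus_id and devfn fields are chosen only for illustration), and to_pci_dev() is written in the style of the kernel's container_of() idiom rather than copied from the PCI layer:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the generic device structure (assumption). */
struct device {
	const char *bus_id;	/* illustrative field only */
};

/* Simplified stand-in for the bus-specific structure (assumption). */
struct pci_dev {
	unsigned int devfn;
	struct device dev;	/* embedded, not a pointer: one allocation covers both */
};

/* Userspace equivalent of the container_of() idiom. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the enclosing pci_dev from a pointer to its embedded struct device. */
static inline struct pci_dev *to_pci_dev(struct device *d)
{
	assert(d != NULL);
	return container_of(d, struct pci_dev, dev);
}
```

Because the generic structure is embedded, the bus layer can convert between the two views with pointer arithmetic alone, while device-specific drivers only need the struct pci_dev view.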
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index b4a1ea762698..81bc51369f59 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -148,3 +148,44 @@ Why: The 8250 serial driver now has the ability to deal with the differences
148 brother on Alchemy SOCs. The loss of features is not considered an 148 brother on Alchemy SOCs. The loss of features is not considered an
149 issue. 149 issue.
150Who: Ralf Baechle <ralf@linux-mips.org> 150Who: Ralf Baechle <ralf@linux-mips.org>
151
152---------------------------
153
154What: Legacy /proc/pci interface (PCI_LEGACY_PROC)
155When: March 2006
156Why: deprecated since 2.5.53 in favor of lspci(8)
157Who: Adrian Bunk <bunk@stusta.de>
158
159---------------------------
160
161What: pci_module_init(driver)
162When: January 2007
163Why: Is replaced by pci_register_driver(pci_driver).
164Who: Richard Knutsson <ricknu-0@student.ltu.se> and Greg Kroah-Hartman <gregkh@suse.de>
165
166---------------------------
167
168What: I2C interface of the it87 driver
169When: January 2007
170Why: The ISA interface is faster and should always be available. The I2C
171 probing is also known to cause trouble in at least one case (see
172 bug #5889.)
173Who: Jean Delvare <khali@linux-fr.org>
174
175---------------------------
176
177What: mount/umount uevents
178When: February 2007
179Why: These events are not correct, and do not properly let userspace know
180 when a file system has been mounted or unmounted. Userspace should
181 poll the /proc/mounts file instead to detect this properly.
182Who: Greg Kroah-Hartman <gregkh@suse.de>
183
184---------------------------
185
186What: Support for NEC DDB5074 and DDB5476 evaluation boards.
187When: June 2006
188Why: Board specific code doesn't build anymore since ~2.6.0 and no
189 users have complained indicating there is no more need for these
190 boards. This should really be considered a last call.
191Who: Ralf Baechle <ralf@linux-mips.org>
diff --git a/Documentation/filesystems/configfs/configfs_example.c b/Documentation/filesystems/configfs/configfs_example.c
index f3c6e4946f98..3d4713a6c207 100644
--- a/Documentation/filesystems/configfs/configfs_example.c
+++ b/Documentation/filesystems/configfs/configfs_example.c
@@ -320,6 +320,7 @@ static struct config_item_type simple_children_type = {
320 .ct_item_ops = &simple_children_item_ops, 320 .ct_item_ops = &simple_children_item_ops,
321 .ct_group_ops = &simple_children_group_ops, 321 .ct_group_ops = &simple_children_group_ops,
322 .ct_attrs = simple_children_attrs, 322 .ct_attrs = simple_children_attrs,
323 .ct_owner = THIS_MODULE,
323}; 324};
324 325
325static struct configfs_subsystem simple_children_subsys = { 326static struct configfs_subsystem simple_children_subsys = {
@@ -403,6 +404,7 @@ static struct config_item_type group_children_type = {
403 .ct_item_ops = &group_children_item_ops, 404 .ct_item_ops = &group_children_item_ops,
404 .ct_group_ops = &group_children_group_ops, 405 .ct_group_ops = &group_children_group_ops,
405 .ct_attrs = group_children_attrs, 406 .ct_attrs = group_children_attrs,
407 .ct_owner = THIS_MODULE,
406}; 408};
407 409
408static struct configfs_subsystem group_children_subsys = { 410static struct configfs_subsystem group_children_subsys = {
diff --git a/Documentation/filesystems/ntfs.txt b/Documentation/filesystems/ntfs.txt
index 614de3124901..251168587899 100644
--- a/Documentation/filesystems/ntfs.txt
+++ b/Documentation/filesystems/ntfs.txt
@@ -457,6 +457,12 @@ ChangeLog
457 457
458Note, a technical ChangeLog aimed at kernel hackers is in fs/ntfs/ChangeLog. 458Note, a technical ChangeLog aimed at kernel hackers is in fs/ntfs/ChangeLog.
459 459
4602.1.26:
461 - Implement support for sector sizes above 512 bytes (up to the maximum
462 supported by NTFS which is 4096 bytes).
463 - Enhance support for NTFS volumes which were supported by Windows but
464 not by Linux due to invalid attribute list attribute flags.
465 - A few minor updates and bug fixes.
4602.1.25: 4662.1.25:
461 - Write support is now extended with write(2) being able to both 467 - Write support is now extended with write(2) being able to both
462 overwrite existing file data and to extend files. Also, if a write 468 overwrite existing file data and to extend files. Also, if a write
diff --git a/Documentation/filesystems/ocfs2.txt b/Documentation/filesystems/ocfs2.txt
index f2595caf052e..4389c684a80a 100644
--- a/Documentation/filesystems/ocfs2.txt
+++ b/Documentation/filesystems/ocfs2.txt
@@ -35,6 +35,7 @@ Features which OCFS2 does not support yet:
35 be cluster coherent. 35 be cluster coherent.
36 - quotas 36 - quotas
37 - cluster aware flock 37 - cluster aware flock
38 - cluster aware lockf
38 - Directory change notification (F_NOTIFY) 39 - Directory change notification (F_NOTIFY)
39 - Distributed Caching (F_SETLEASE/F_GETLEASE/break_lease) 40 - Distributed Caching (F_SETLEASE/F_GETLEASE/break_lease)
40 - POSIX ACLs 41 - POSIX ACLs
diff --git a/Documentation/filesystems/tmpfs.txt b/Documentation/filesystems/tmpfs.txt
index dbe4d87d2615..1773106976a2 100644
--- a/Documentation/filesystems/tmpfs.txt
+++ b/Documentation/filesystems/tmpfs.txt
@@ -79,15 +79,27 @@ that instance in a system with many cpus making intensive use of it.
79 79
80 80
81tmpfs has a mount option to set the NUMA memory allocation policy for 81tmpfs has a mount option to set the NUMA memory allocation policy for
82all files in that instance: 82all files in that instance (if CONFIG_NUMA is enabled) - which can be
83mpol=interleave prefers to allocate memory from each node in turn 83adjusted on the fly via 'mount -o remount ...'
84mpol=default prefers to allocate memory from the local node
85mpol=bind prefers to allocate from mpol_nodelist
86mpol=preferred prefers to allocate from first node in mpol_nodelist
87 84
88The following mount option is used in conjunction with mpol=interleave, 85mpol=default prefers to allocate memory from the local node
89mpol=bind or mpol=preferred: 86mpol=prefer:Node prefers to allocate memory from the given Node
90mpol_nodelist: nodelist suitable for parsing with nodelist_parse. 87mpol=bind:NodeList allocates memory only from nodes in NodeList
88mpol=interleave prefers to allocate from each node in turn
89mpol=interleave:NodeList allocates from each node of NodeList in turn
90
91NodeList format is a comma-separated list of decimal numbers and ranges,
92a range being two hyphen-separated decimal numbers, the smallest and
93largest node numbers in the range. For example, mpol=bind:0-3,5,7,9-15
94
95Note that trying to mount a tmpfs with an mpol option will fail if the
96running kernel does not support NUMA; and will fail if its nodelist
97specifies a node >= MAX_NUMNODES. If your system relies on that tmpfs
98being mounted, but from time to time runs a kernel built without NUMA
99capability (perhaps a safe recovery kernel), or configured to support
100fewer nodes, then it is advisable to omit the mpol option from automatic
101mount options. It can be added later, when the tmpfs is already mounted
102on MountPoint, by 'mount -o remount,mpol=Policy:NodeList MountPoint'.
91 103
92 104
93To specify the initial root directory you can use the following mount 105To specify the initial root directory you can use the following mount
@@ -109,4 +121,4 @@ RAM/SWAP in 10240 inodes and it is only accessible by root.
109Author: 121Author:
110 Christoph Rohland <cr@sap.com>, 1.12.01 122 Christoph Rohland <cr@sap.com>, 1.12.01
111Updated: 123Updated:
112 Hugh Dickins <hugh@veritas.com>, 13 March 2005 124 Hugh Dickins <hugh@veritas.com>, 19 February 2006
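
The NodeList format described in the tmpfs.txt hunk above (comma-separated decimal numbers and hyphenated ranges) can be parsed along these lines. This is a userspace sketch, not the kernel's parser: parse_nodelist() and MAX_NODES are hypothetical names, and MAX_NODES merely stands in for MAX_NUMNODES.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define MAX_NODES 64	/* stand-in for MAX_NUMNODES (assumption) */

/*
 * Parse a list such as "0-3,5,7,9-15" into a bitmask with one bit per node.
 * Returns 0 on success, -1 on malformed input or a node >= MAX_NODES.
 */
static inline int parse_nodelist(const char *s, unsigned long long *mask)
{
	assert(s != NULL && mask != NULL);
	*mask = 0;
	if (*s == '\0')
		return -1;		/* an empty list is rejected */
	while (*s) {
		char *end;
		unsigned long lo, hi;

		if (!isdigit((unsigned char)*s))
			return -1;
		lo = strtoul(s, &end, 10);
		hi = lo;		/* a single number is a one-node range */
		s = end;
		if (*s == '-') {
			s++;
			if (!isdigit((unsigned char)*s))
				return -1;
			hi = strtoul(s, &end, 10);
			s = end;
		}
		if (lo > hi || hi >= MAX_NODES)
			return -1;
		for (unsigned long n = lo; n <= hi; n++)
			*mask |= 1ULL << n;
		if (*s == ',') {
			s++;
			if (*s == '\0')
				return -1;	/* trailing comma */
		} else if (*s != '\0') {
			return -1;	/* unexpected character */
		}
	}
	return 0;
}
```

With the example from the text, "0-3,5,7,9-15" sets bits 0 to 3, 5, 7 and 9 to 15. Rejecting nodes at or above the limit mirrors the documented mount failure for a node >= MAX_NUMNODES.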
diff --git a/Documentation/filesystems/v9fs.txt b/Documentation/filesystems/v9fs.txt
index 4e92feb6b507..24c7a9c41f0d 100644
--- a/Documentation/filesystems/v9fs.txt
+++ b/Documentation/filesystems/v9fs.txt
@@ -57,8 +57,6 @@ OPTIONS
57 57
58 port=n port to connect to on the remote server 58 port=n port to connect to on the remote server
59 59
60 timeout=n request timeouts (in ms) (default 60000ms)
61
62 noextend force legacy mode (no 9P2000.u semantics) 60 noextend force legacy mode (no 9P2000.u semantics)
63 61
64 uid attempt to mount as a particular uid 62 uid attempt to mount as a particular uid
@@ -74,10 +72,16 @@ OPTIONS
74RESOURCES 72RESOURCES
75========= 73=========
76 74
77The Linux version of the 9P server, along with some client-side utilities 75The Linux version of the 9P server is now maintained under the npfs project
78can be found at http://v9fs.sf.net (along with a CVS repository of the 76on sourceforge (http://sourceforge.net/projects/npfs).
79development branch of this module). There are user and developer mailing 77
80lists here, as well as a bug-tracker. 78There are user and developer mailing lists available through the v9fs project
79on sourceforge (http://sourceforge.net/projects/v9fs).
80
81News and other information is maintained on SWiK (http://swik.net/v9fs).
82
83Bug reports may be issued through the kernel.org bugzilla
84(http://bugzilla.kernel.org)
81 85
82For more information on the Plan 9 Operating System check out 86For more information on the Plan 9 Operating System check out
83http://plan9.bell-labs.com/plan9 87http://plan9.bell-labs.com/plan9
diff --git a/Documentation/fujitsu/frv/kernel-ABI.txt b/Documentation/fujitsu/frv/kernel-ABI.txt
new file mode 100644
index 000000000000..0ed9b0a779bc
--- /dev/null
+++ b/Documentation/fujitsu/frv/kernel-ABI.txt
@@ -0,0 +1,234 @@
1 =================================
2 INTERNAL KERNEL ABI FOR FR-V ARCH
3 =================================
4
5The internal FRV kernel ABI is not quite the same as the userspace ABI. A number of the registers
6are used for special purposes, and the ABI is not consistent between modules vs core, and MMU vs
7no-MMU.
8
9This partly stems from the fact that FRV CPUs do not have a separate supervisor stack pointer, and
10most of them do not have any scratch registers, thus requiring at least one general purpose
11register to be clobbered in such an event. Also, within the kernel core, it is possible to simply
12jump or call directly between functions using a relative offset. This cannot be extended to modules
13for the displacement is likely to be too far. Thus in modules the address of a function to call
14must be calculated in a register and then used, requiring two extra instructions.
15
16This document has the following sections:
17
18 (*) System call register ABI
19 (*) CPU operating modes
20 (*) Internal kernel-mode register ABI
21 (*) Internal debug-mode register ABI
22 (*) Virtual interrupt handling
23
24
25========================
26SYSTEM CALL REGISTER ABI
27========================
28
29When a system call is made, the following registers are effective:
30
31      REGISTERS       CALL                    RETURN
32      =============== ======================= =======================
33      GR7             System call number      Preserved
34      GR8             Syscall arg #1          Return value
35      GR9-GR13        Syscall arg #2-6        Preserved
36
37
38===================
39CPU OPERATING MODES
40===================
41
42The FR-V CPU has three basic operating modes. In order of increasing capability:
43
44 (1) User mode.
45
46 Basic userspace running mode.
47
48 (2) Kernel mode.
49
50 Normal kernel mode. There are many additional control registers available that may be
51 accessed in this mode, in addition to all the stuff available to user mode. This has two
52 submodes:
53
54 (a) Exceptions enabled (PSR.T == 1).
55
56 Exceptions will invoke the appropriate normal kernel mode handler. On entry to the
57 handler, the PSR.T bit will be cleared.
58
59 (b) Exceptions disabled (PSR.T == 0).
60
61 No exceptions or interrupts may happen. Any mandatory exceptions will cause the CPU to
62 halt unless the CPU is told to jump into debug mode instead.
63
64 (3) Debug mode.
65
66 No exceptions may happen in this mode. Memory protection and management exceptions will be
67 flagged for later consideration, but the exception handler won't be invoked. Debugging traps
68 such as hardware breakpoints and watchpoints will be ignored. This mode is entered only by
69 debugging events obtained from the other two modes.
70
71 All kernel mode registers may be accessed, plus a few extra debugging specific registers.
72
73
74=================================
75INTERNAL KERNEL-MODE REGISTER ABI
76=================================
77
78There are a number of permanent register assignments that are set up by entry.S in the exception
79prologue. Note that there is a complete set of exception prologues for each of user->kernel
80transition and kernel->kernel transition. There are also user->debug and kernel->debug mode
81transition prologues.
82
83
84      REGISTER        FLAVOUR USE
85      =============== ======= ====================================================
86      GR1                     Supervisor stack pointer
87      GR15                    Current thread info pointer
88      GR16                    GP-Rel base register for small data
89      GR28                    Current exception frame pointer (__frame)
90      GR29                    Current task pointer (current)
91      GR30                    Destroyed by kernel mode entry
92      GR31            NOMMU   Destroyed by debug mode entry
93      GR31            MMU     Destroyed by TLB miss kernel mode entry
94      CCR.ICC2                Virtual interrupt disablement tracking
95      CCCR.CC3                Cleared by exception prologue (atomic op emulation)
96      SCR0            MMU     See mmu-layout.txt.
97      SCR1            MMU     See mmu-layout.txt.
98      SCR2            MMU     Save for EAR0 (destroyed by icache insns in debug mode)
99      SCR3            MMU     Save for GR31 during debug exceptions
100     DAMR/IAMR       NOMMU   Fixed memory protection layout.
101     DAMR/IAMR       MMU     See mmu-layout.txt.
102
103
104Certain registers are also used or modified across function calls:
105
106     REGISTER        CALL                            RETURN
107     =============== =============================== ===============================
108     GR0             Fixed Zero                      -
109     GR2             Function call frame pointer
110     GR3             Special                         Preserved
111     GR3-GR7         -                               Clobbered
112     GR8             Function call arg #1            Return value (or clobbered)
113     GR9             Function call arg #2            Return value MSW (or clobbered)
114     GR10-GR13       Function call arg #3-#6         Clobbered
115     GR14            -                               Clobbered
116     GR15-GR16       Special                         Preserved
117     GR17-GR27       -                               Preserved
118     GR28-GR31       Special                         Only accessed explicitly
119     LR              Return address after CALL       Clobbered
120     CCR/CCCR        -                               Mostly Clobbered
121
122
123================================
124INTERNAL DEBUG-MODE REGISTER ABI
125================================
126
127This is the same as the kernel-mode register ABI for function calls. The difference is that in
128debug-mode there's a different stack and a different exception frame. Almost all the global
129registers from kernel-mode (including the stack pointer) may be changed.
130
131     REGISTER        FLAVOUR USE
132     =============== ======= ====================================================
133     GR1                     Debug stack pointer
134     GR16                    GP-Rel base register for small data
135     GR31                    Current debug exception frame pointer (__debug_frame)
136     SCR3            MMU     Saved value of GR31
137
138
139Note that debug mode is able to interfere with the kernel's emulated atomic ops, so it must be
140exceedingly careful not to do any that would interact with the main kernel in this regard. Hence
141the debug mode code (gdbstub) is almost completely self-contained. The only external code used is
142the sprintf family of functions.
143
144Furthermore, break.S is so complicated because single-step mode does not switch off on entry to an
145exception. That means unless manually disabled, single-stepping will blithely go on stepping into
146things like interrupts. See gdbstub.txt for more information.
147
148
149==========================
150VIRTUAL INTERRUPT HANDLING
151==========================
152
153Because accessing the PSR is so slow, and to disable interrupts we have to access it twice (once
154to read and once to write), we don't actually disable interrupts at all if we don't have to. What
155we do instead is use the ICC2 condition code flags to note virtual disablement, such that if we
156then do take an interrupt, we note the flag, really disable interrupts, set another flag and resume
157execution at the point the interrupt happened. Setting condition flags as a side effect of an
158arithmetic or logical instruction is really fast. This use of the ICC2 only occurs within the
159kernel - it does not affect userspace.
160
161The flags we use are:
162
163 (*) CCR.ICC2.Z [Zero flag]
164
165 Set to virtually disable interrupts, clear when interrupts are virtually enabled. Can be
166 modified by logical instructions without affecting the Carry flag.
167
168 (*) CCR.ICC2.C [Carry flag]
169
170 Clear to indicate hardware interrupts are really disabled, set otherwise.
171
172
173What happens is this:
174
175 (1) Normal kernel-mode operation.
176
177 ICC2.Z is 0, ICC2.C is 1.
178
179 (2) An interrupt occurs. The exception prologue examines ICC2.Z and determines that nothing needs
180 doing. This is done simply with an unlikely BEQ instruction.
181
182 (3) The interrupts are disabled (local_irq_disable)
183
184 ICC2.Z is set to 1.
185
186 (4) If interrupts were then re-enabled (local_irq_enable):
187
188 ICC2.Z would be set to 0.
189
190 A TIHI #2 instruction (trap #2 if condition HI - Z==0 && C==0) would be used to trap if
191 interrupts were now virtually enabled, but physically disabled - which they're not, so the
192 trap isn't taken. The kernel would then be back to state (1).
193
194 (5) An interrupt occurs. The exception prologue examines ICC2.Z and determines that the interrupt
195 shouldn't actually have happened. It jumps aside, and there disables interrupts by setting
196 PSR.PIL to 14 and then it clears ICC2.C.
197
198 (6) If interrupts were then saved and disabled again (local_irq_save):
199
200 ICC2.Z would be shifted into the save variable and masked off (giving a 1).
201
202 ICC2.Z would then be set to 1 (thus unchanged), and ICC2.C would be unaffected (ie: 0).
203
204 (7) If interrupts were then restored from state (6) (local_irq_restore):
205
206 ICC2.Z would be set to indicate the result of XOR'ing the saved value (ie: 1) with 1, which
207 gives a result of 0 - thus leaving ICC2.Z set.
208
209 ICC2.C would remain unaffected (ie: 0).
210
211 A TIHI #2 instruction would be used to again assay the current state, but this would do
212 nothing as Z==1.
213
214 (8) If interrupts were then enabled (local_irq_enable):
215
216 ICC2.Z would be cleared. ICC2.C would be left unaffected. Both flags would now be 0.
217
218 A TIHI #2 instruction again issued to assay the current state would then trap as both Z==0
219 [interrupts virtually enabled] and C==0 [interrupts really disabled] would then be true.
220
221 (9) The trap #2 handler would simply enable hardware interrupts (set PSR.PIL to 0), set ICC2.C to
222 1 and return.
223
224(10) Immediately upon returning, the pending interrupt would be taken.
225
226(11) The interrupt handler would take the path of actually processing the interrupt (ICC2.Z is
227 clear, BEQ fails as per step (2)).
228
229(12) The interrupt handler would then set ICC2.C to 1 since hardware interrupts are definitely
230 enabled - or else the kernel wouldn't be here.
231
232(13) On return from the interrupt handler, things would be back to state (1).
233
234This trap (#2) is only available in kernel mode. In user mode it will result in SIGILL.
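
The numbered walkthrough above can be modelled as a small state machine. The following is a userspace sketch only, not code from entry.S: the virq_* names and struct virq_state are hypothetical, the two booleans stand in for ICC2.Z and ICC2.C, and the "pending" flag models the hardware holding an interrupt while PSR.PIL is raised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct virq_state {
	bool z;		/* models ICC2.Z: set = interrupts virtually disabled */
	bool c;		/* models ICC2.C: clear = hardware interrupts really disabled */
	bool pending;	/* an interrupt is being held off */
	int handled;	/* number of interrupts actually processed */
};

static inline void virq_init(struct virq_state *s)
{
	assert(s != NULL);
	s->z = false;	/* state (1): Z is 0, C is 1 */
	s->c = true;
	s->pending = false;
	s->handled = 0;
}

/* A device raises an interrupt. */
static inline void virq_raise(struct virq_state *s)
{
	if (!s->c) {
		s->pending = true;	/* really disabled: nothing is delivered */
	} else if (s->z) {
		s->c = false;		/* step (5): really disable, note it in C */
		s->pending = true;
	} else {
		s->handled++;		/* steps (2), (11): process it */
		s->c = true;		/* step (12) */
	}
}

/* Models the TIHI #2 test (Z==0 && C==0) and the trap #2 handler. */
static inline void virq_check_trap(struct virq_state *s)
{
	if (!s->z && !s->c) {
		s->c = true;		/* step (9): really re-enable */
		if (s->pending) {	/* step (10): pending interrupt is taken */
			s->pending = false;
			s->handled++;
		}
	}
}

static inline void virq_disable(struct virq_state *s)
{
	s->z = true;			/* step (3) */
}

static inline void virq_enable(struct virq_state *s)
{
	s->z = false;			/* steps (4), (8) */
	virq_check_trap(s);
}

static inline bool virq_save(struct virq_state *s)
{
	bool old = s->z;		/* step (6): remember Z, then set it */

	s->z = true;
	return old;
}

static inline void virq_restore(struct virq_state *s, bool saved)
{
	s->z = saved;			/* step (7): Z goes back to the saved value */
	virq_check_trap(s);
}
```

The point of the design is visible in the sketch: virq_disable() and virq_enable() touch only the cheap flag in the common case, and the expensive hardware state is changed only when an interrupt actually arrives while virtually disabled.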
diff --git a/Documentation/hwmon/f71805f b/Documentation/hwmon/f71805f
new file mode 100644
index 000000000000..28c5b7d1eb90
--- /dev/null
+++ b/Documentation/hwmon/f71805f
@@ -0,0 +1,105 @@
1Kernel driver f71805f
2=====================
3
4Supported chips:
5 * Fintek F71805F/FG
6 Prefix: 'f71805f'
7 Addresses scanned: none, address read from Super I/O config space
8 Datasheet: Provided by Fintek on request
9
10Author: Jean Delvare <khali@linux-fr.org>
11
12Thanks to Denis Kieft from Barracuda Networks for the donation of a
13test system (custom Jetway K8M8MS motherboard, with CPU and RAM) and
14for providing initial documentation.
15
16Thanks to Kris Chen from Fintek for answering technical questions and
17providing additional documentation.
18
19Thanks to Chris Lin from Jetway for providing wiring schematics and
20answering technical questions.
21
22
23Description
24-----------
25
26The Fintek F71805F/FG Super I/O chip includes complete hardware monitoring
27capabilities. It can monitor up to 9 voltages (counting its own power
28source), 3 fans and 3 temperature sensors.
29
30This chip also has fan controlling features, using either DC or PWM, in
31three different modes (one manual, two automatic). The driver doesn't
32support these features yet.
33
34The driver assumes that no more than one chip is present, which seems
35reasonable.
36
37
38Voltage Monitoring
39------------------
40
41Voltages are sampled by an 8-bit ADC with a LSB of 8 mV. The supported
42range is thus from 0 to 2.040 V. Voltage values outside of this range
43need external resistors. An exception is in0, which is used to monitor
44the chip's own power source (+3.3V), and is divided internally by a
45factor 2.
46
47The two LSB of the voltage limit registers are not used (always 0), so
48you can only set the limits in steps of 32 mV (before scaling).
49
50The wirings and resistor values suggested by Fintek are as follows:
51
52        pin                                         expected
53        name    use         R1      R2      divider raw val.
54
55in0     VCC     VCC3.3V     int.    int.     2.00    1.65 V
56in1     VIN1    VTT1.2V     10K     -        1.00    1.20 V
57in2     VIN2    VRAM        100K    100K     2.00   ~1.25 V (1)
58in3     VIN3    VCHIPSET    47K     100K     1.47    2.24 V (2)
59in4     VIN4    VCC5V       200K    47K      5.25    0.95 V
60in5     VIN5    +12V        200K    20K     11.00    1.05 V
61in6     VIN6    VCC1.5V     10K     -        1.00    1.50 V
62in7     VIN7    VCORE       10K     -        1.00   ~1.40 V (1)
63in8     VIN8    VSB5V       200K    47K      1.00    0.95 V
64
65(1) Depends on your hardware setup.
66(2) Obviously not correct, swapping R1 and R2 would make more sense.
67
68These values can be used as hints at best, as motherboard manufacturers
69are free to use a completely different setup. As a matter of fact, the
70Jetway K8M8MS uses a significantly different setup. You will have to
71find documentation about your own motherboard, and edit sensors.conf
72accordingly.
73
74Each voltage measured has associated low and high limits, each of which
75triggers an alarm when crossed.
76
77
78Fan Monitoring
79--------------
80
81Fan rotation speeds are reported as 12-bit values from a gated clock
82signal. Speeds down to 366 RPM can be measured. There is no theoretical
83high limit, but values over 6000 RPM seem to cause problems. The effective
84resolution is much lower than you would expect, the step between different
85register values being 10 rather than 1.
86
87The chip assumes 2 pulse-per-revolution fans.
88
89An alarm is triggered if the rotation speed drops below a programmable
90limit or is too low to be measured.
91
92
93Temperature Monitoring
94----------------------
95
96Temperatures are reported in degrees Celsius. Each temperature measured
97has a high limit, whose crossing triggers an alarm. There is an associated
98hysteresis value, below which the temperature has to drop before the
99alarm is cleared.
100
101All temperature channels are external, there is no embedded temperature
102sensor. Each channel can be used for connecting either a thermal diode
103or a thermistor. The driver reports the currently selected mode, but
104doesn't allow changing it. In theory, the BIOS should have configured
105everything properly.
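
The arithmetic in the f71805f Voltage Monitoring and Fan Monitoring sections above can be sketched as follows. The function names are hypothetical and this is not the driver's code. The divider formula assumes R1 is the series resistor and R2 the resistor to ground, which matches the divider column for the +12V and VRAM rows. The 1.5 MHz fan clock is an assumption, chosen because it is consistent with the stated 366 RPM minimum for a 12-bit count.

```c
#include <assert.h>
#include <stdint.h>

/* 8-bit ADC with an 8 mV LSB: 0..255 maps to 0..2040 mV at the pin. */
static inline unsigned int adc_reg_to_mv(uint8_t reg)
{
	return reg * 8u;
}

/* Undo an external resistor divider: Vin = Vpin * (R1 + R2) / R2. */
static inline unsigned int undo_divider_mv(unsigned int pin_mv,
					   unsigned int r1_kohm,
					   unsigned int r2_kohm)
{
	assert(r2_kohm > 0);
	return pin_mv * (r1_kohm + r2_kohm) / r2_kohm;
}

/*
 * Limit registers ignore their two LSBs, so limits move in 32 mV steps.
 * Round the requested pin voltage down to the nearest usable value.
 */
static inline uint8_t limit_mv_to_reg(unsigned int pin_mv)
{
	unsigned int reg = pin_mv / 8;

	if (reg > 255)
		reg = 255;
	return (uint8_t)(reg & ~3u);
}

/* 12-bit fan count to RPM, assuming a 1.5 MHz gated clock (assumption). */
static inline unsigned int fan_count_to_rpm(unsigned int count)
{
	assert(count > 0 && count <= 4095);
	return 1500000u / count;
}
```

For instance, a 1000 mV reading on a pin wired with R1 = 200K and R2 = 20K corresponds to 11000 mV at the source, and a requested limit of 1000 mV is stored as register value 124, that is 992 mV.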
diff --git a/Documentation/hwmon/it87 b/Documentation/hwmon/it87
index 7f42e441c645..9555be1ed999 100644
--- a/Documentation/hwmon/it87
+++ b/Documentation/hwmon/it87
@@ -9,7 +9,7 @@ Supported chips:
9 http://www.ite.com.tw/ 9 http://www.ite.com.tw/
10 * IT8712F 10 * IT8712F
11 Prefix: 'it8712' 11 Prefix: 'it8712'
12 Addresses scanned: I2C 0x28 - 0x2f 12 Addresses scanned: I2C 0x2d
13 from Super I/O config space (8 I/O ports) 13 from Super I/O config space (8 I/O ports)
14 Datasheet: Publicly available at the ITE website 14 Datasheet: Publicly available at the ITE website
15 http://www.ite.com.tw/ 15 http://www.ite.com.tw/
diff --git a/Documentation/hwmon/sysfs-interface b/Documentation/hwmon/sysfs-interface
index 764cdc5480e7..a0d0ab24288e 100644
--- a/Documentation/hwmon/sysfs-interface
+++ b/Documentation/hwmon/sysfs-interface
@@ -179,11 +179,12 @@ temp[1-*]_auto_point[1-*]_temp_hyst
179**************** 179****************
180 180
181temp[1-3]_type Sensor type selection. 181temp[1-3]_type Sensor type selection.
182 Integers 1, 2, 3 or thermistor Beta value (3435) 182 Integers 1 to 4 or thermistor Beta value (typically 3435)
183 Read/Write. 183 Read/Write.
184 1: PII/Celeron Diode 184 1: PII/Celeron Diode
185 2: 3904 transistor 185 2: 3904 transistor
186 3: thermal diode 186 3: thermal diode
187 4: thermistor (default/unknown Beta)
187 Not all types are supported by all chips 188 Not all types are supported by all chips
188 189
189temp[1-4]_max Temperature max value. 190temp[1-4]_max Temperature max value.
@@ -261,6 +262,21 @@ alarms Alarm bitmask.
261 of individual bits. 262 of individual bits.
262 Bits are defined in kernel/include/sensors.h. 263 Bits are defined in kernel/include/sensors.h.
263 264
265alarms_in Alarm bitmask relative to in (voltage) channels
266 Read only
267 A '1' bit means an alarm, LSB corresponds to in0 and so on
268 Preferred to 'alarms' for newer chips
269
270alarms_fan Alarm bitmask relative to fan channels
271 Read only
272 A '1' bit means an alarm, LSB corresponds to fan1 and so on
273 Preferred to 'alarms' for newer chips
274
275alarms_temp Alarm bitmask relative to temp (temperature) channels
276 Read only
277 A '1' bit means an alarm, LSB corresponds to temp1 and so on
278 Preferred to 'alarms' for newer chips
279
264beep_enable Beep/interrupt enable 280beep_enable Beep/interrupt enable
265 0 to disable. 281 0 to disable.
266 1 to enable. 282 1 to enable.
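
The bit layout of the alarms_in, alarms_fan and alarms_temp files added in the sysfs-interface hunk above can be decoded as in this sketch. The helper names are hypothetical. The only facts taken from the text are that a '1' bit means an alarm and that the LSB corresponds to in0, fan1 and temp1 respectively.

```c
#include <assert.h>
#include <stdbool.h>

/* alarms_in: bit 0 is in0, bit 1 is in1, and so on. */
static inline bool in_alarm(unsigned long mask, unsigned int in_nr)
{
	assert(in_nr < 32);
	return (mask >> in_nr) & 1UL;
}

/*
 * alarms_fan and alarms_temp: bit 0 is fan1 or temp1, so channel
 * numbers start at 1 and are shifted down by one.
 */
static inline bool chan1_alarm(unsigned long mask, unsigned int nr)
{
	assert(nr >= 1 && nr <= 32);
	return (mask >> (nr - 1)) & 1UL;
}
```

A value of 5 read from alarms_in would therefore mean alarms on in0 and in2, while a value of 2 read from alarms_fan would mean an alarm on fan2 only.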
diff --git a/Documentation/hwmon/w83627hf b/Documentation/hwmon/w83627hf
index 5d23776e9907..bbeaba680443 100644
--- a/Documentation/hwmon/w83627hf
+++ b/Documentation/hwmon/w83627hf
@@ -36,6 +36,10 @@ Module Parameters
36 (default is 1) 36 (default is 1)
37 Use 'init=0' to bypass initializing the chip. 37 Use 'init=0' to bypass initializing the chip.
38 Try this if your computer crashes when you load the module. 38 Try this if your computer crashes when you load the module.
39* reset: int
40 (default is 0)
41 The driver used to reset the chip on load, but no longer does. Use
42 'reset=1' to restore the old behavior. Report if you need to do this.
39 43
40Description 44Description
41----------- 45-----------
diff --git a/Documentation/i2c/busses/i2c-sis69x b/Documentation/i2c/busses/i2c-sis96x
index b88953dfd580..00a009b977e9 100644
--- a/Documentation/i2c/busses/i2c-sis69x
+++ b/Documentation/i2c/busses/i2c-sis96x
@@ -7,7 +7,7 @@ Supported adapters:
7 Any combination of these host bridges: 7 Any combination of these host bridges:
8 645, 645DX (aka 646), 648, 650, 651, 655, 735, 745, 746 8 645, 645DX (aka 646), 648, 650, 651, 655, 735, 745, 746
9 and these south bridges: 9 and these south bridges:
10 961, 962, 963(L) 10 961, 962, 963(L)
11 11
12Author: Mark M. Hoffman <mhoffman@lightlink.com> 12Author: Mark M. Hoffman <mhoffman@lightlink.com>
13 13
@@ -29,7 +29,7 @@ The command "lspci" as root should produce something like these lines:
29 29
30or perhaps this... 30or perhaps this...
31 31
3200:00.0 Host bridge: Silicon Integrated Systems [SiS]: Unknown device 0645 3200:00.0 Host bridge: Silicon Integrated Systems [SiS]: Unknown device 0645
3300:02.0 ISA bridge: Silicon Integrated Systems [SiS]: Unknown device 0961 3300:02.0 ISA bridge: Silicon Integrated Systems [SiS]: Unknown device 0961
3400:02.1 SMBus: Silicon Integrated Systems [SiS]: Unknown device 0016 3400:02.1 SMBus: Silicon Integrated Systems [SiS]: Unknown device 0016
35 35
diff --git a/Documentation/kernel-doc-nano-HOWTO.txt b/Documentation/kernel-doc-nano-HOWTO.txt
index c406ce67edd0..c65233d430f0 100644
--- a/Documentation/kernel-doc-nano-HOWTO.txt
+++ b/Documentation/kernel-doc-nano-HOWTO.txt
@@ -45,10 +45,10 @@ How to extract the documentation
45 45
46If you just want to read the ready-made books on the various 46If you just want to read the ready-made books on the various
47subsystems (see Documentation/DocBook/*.tmpl), just type 'make 47subsystems (see Documentation/DocBook/*.tmpl), just type 'make
48psdocs', or 'make pdfdocs', or 'make htmldocs', depending on your 48psdocs', or 'make pdfdocs', or 'make htmldocs', depending on your
49preference. If you would rather read a different format, you can type 49preference. If you would rather read a different format, you can type
50'make sgmldocs' and then use DocBook tools to convert 50'make sgmldocs' and then use DocBook tools to convert
51Documentation/DocBook/*.sgml to a format of your choice (for example, 51Documentation/DocBook/*.sgml to a format of your choice (for example,
52'db2html ...' if 'make htmldocs' was not defined). 52'db2html ...' if 'make htmldocs' was not defined).
53 53
54If you want to see man pages instead, you can do this: 54If you want to see man pages instead, you can do this:
@@ -124,6 +124,36 @@ patterns, which are highlighted appropriately.
124Take a look around the source tree for examples. 124Take a look around the source tree for examples.
125 125
126 126
127kernel-doc for structs, unions, enums, and typedefs
128---------------------------------------------------
129
130Besides functions, you can also write documentation for structs, unions,
131enums and typedefs. Instead of the function name you must write the name
132of the declaration; the struct/union/enum/typedef must always precede
133the name. Nesting of declarations is not supported.
134Use the argument mechanism to document members or constants.
135
136Inside a struct description, you can use the "private:" and "public:"
137comment tags. Structure fields that are inside a "private:" area
138are not listed in the generated output documentation.
139
140Example:
141
142/**
143 * struct my_struct - short description
144 * @a: first member
145 * @b: second member
146 *
147 * Longer description
148 */
149struct my_struct {
150 int a;
151 int b;
152/* private: */
153 int c;
154};
155
156
127How to make new SGML template files 157How to make new SGML template files
128----------------------------------- 158-----------------------------------
129 159
@@ -147,4 +177,3 @@ documentation, in <filename>, for the functions listed.
147 177
148Tim. 178Tim.
149*/ <twaugh@redhat.com> 179*/ <twaugh@redhat.com>
150
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1cbcf65b764b..fc99075e0af4 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -335,6 +335,12 @@ running once the system is up.
335 timesource is not available, it defaults to PIT. 335 timesource is not available, it defaults to PIT.
336 Format: { pit | tsc | cyclone | pmtmr } 336 Format: { pit | tsc | cyclone | pmtmr }
337 337
338 disable_8254_timer
339 enable_8254_timer
340 [IA32/X86_64] Disable/Enable interrupt 0 timer routing
341 over the 8254 in addition to over the IO-APIC. The
342 kernel tries to set a sensible default.
343
338 hpet= [IA-32,HPET] option to disable HPET and use PIT. 344 hpet= [IA-32,HPET] option to disable HPET and use PIT.
339 Format: disable 345 Format: disable
340 346
@@ -452,6 +458,11 @@ running once the system is up.
452 458
453 eata= [HW,SCSI] 459 eata= [HW,SCSI]
454 460
461 ec_intr= [HW,ACPI] ACPI Embedded Controller interrupt mode
462 Format: <int>
463 0: polling mode
464 non-0: interrupt mode (default)
465
455 eda= [HW,PS2] 466 eda= [HW,PS2]
456 467
457 edb= [HW,PS2] 468 edb= [HW,PS2]
@@ -1029,6 +1040,8 @@ running once the system is up.
1029 1040
1030 nomce [IA-32] Machine Check Exception 1041 nomce [IA-32] Machine Check Exception
1031 1042
1043 nomca [IA-64] Disable machine check abort handling
1044
1032 noresidual [PPC] Don't use residual data on PReP machines. 1045 noresidual [PPC] Don't use residual data on PReP machines.
1033 1046
1034 noresume [SWSUSP] Disables resume and restores original swap 1047 noresume [SWSUSP] Disables resume and restores original swap
@@ -1128,6 +1141,8 @@ running once the system is up.
1128 Mechanism 1. 1141 Mechanism 1.
1129 conf2 [IA-32] Force use of PCI Configuration 1142 conf2 [IA-32] Force use of PCI Configuration
1130 Mechanism 2. 1143 Mechanism 2.
1144 nommconf [IA-32,X86_64] Disable use of MMCONFIG for PCI
1145 Configuration
1131 nosort [IA-32] Don't sort PCI devices according to 1146 nosort [IA-32] Don't sort PCI devices according to
1132 order given by the PCI BIOS. This sorting is 1147 order given by the PCI BIOS. This sorting is
1133 done to get a device order compatible with 1148 done to get a device order compatible with
@@ -1275,6 +1290,19 @@ running once the system is up.
1275 New name for the ramdisk parameter. 1290 New name for the ramdisk parameter.
1276 See Documentation/ramdisk.txt. 1291 See Documentation/ramdisk.txt.
1277 1292
1293 rcu.blimit= [KNL,BOOT] Set maximum number of finished
1294 RCU callbacks to process in one batch.
1295
1296 rcu.qhimark= [KNL,BOOT] Set threshold of queued
1297 RCU callbacks over which batch limiting is disabled.
1298
1299 rcu.qlowmark= [KNL,BOOT] Set threshold of queued
1300 RCU callbacks below which batch limiting is re-enabled.
1301
1302 rcu.rsinterval= [KNL,BOOT,SMP] Set the number of additional
1303 RCU callbacks to queue before forcing a reschedule
1304 on all cpus.
1305
1278 rdinit= [KNL] 1306 rdinit= [KNL]
1279 Format: <full_path> 1307 Format: <full_path>
1280 Run specified binary instead of /init from the ramdisk, 1308 Run specified binary instead of /init from the ramdisk,
@@ -1631,6 +1659,9 @@ running once the system is up.
1631 Format: 1659 Format:
1632 <irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]] 1660 <irq>,<irq_mask>,<io>,<full_duplex>,<do_sound>,<lockup_hack>[,<irq2>[,<irq3>[,<irq4>]]]
1633 1661
1662 norandmaps Don't use address space randomization
1663 Equivalent to echo 0 > /proc/sys/kernel/randomize_va_space
1664
1634 1665
1635______________________________________________________________________ 1666______________________________________________________________________
1636Changelog: 1667Changelog:
diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt
index 0ea5a0c6e827..2c3b1eae4280 100644
--- a/Documentation/kprobes.txt
+++ b/Documentation/kprobes.txt
@@ -136,17 +136,20 @@ Kprobes, jprobes, and return probes are implemented on the following
136architectures: 136architectures:
137 137
138- i386 138- i386
139- x86_64 (AMD-64, E64MT) 139- x86_64 (AMD-64, EM64T)
140- ppc64 140- ppc64
141- ia64 (Support for probes on certain instruction types is still in progress.) 141- ia64 (Does not support probes on instruction slot1.)
142- sparc64 (Return probes not yet implemented.) 142- sparc64 (Return probes not yet implemented.)
143 143
1443. Configuring Kprobes 1443. Configuring Kprobes
145 145
146When configuring the kernel using make menuconfig/xconfig/oldconfig, 146When configuring the kernel using make menuconfig/xconfig/oldconfig,
147ensure that CONFIG_KPROBES is set to "y". Under "Kernel hacking", 147ensure that CONFIG_KPROBES is set to "y". Under "Instrumentation
148look for "Kprobes". You may have to enable "Kernel debugging" 148Support", look for "Kprobes".
149(CONFIG_DEBUG_KERNEL) before you can enable Kprobes. 149
150So that you can load and unload Kprobes-based instrumentation modules,
151make sure "Loadable module support" (CONFIG_MODULES) and "Module
152unloading" (CONFIG_MODULE_UNLOAD) are set to "y".
150 153
151You may also want to ensure that CONFIG_KALLSYMS and perhaps even 154You may also want to ensure that CONFIG_KALLSYMS and perhaps even
152CONFIG_KALLSYMS_ALL are set to "y", since kallsyms_lookup_name() 155CONFIG_KALLSYMS_ALL are set to "y", since kallsyms_lookup_name()
@@ -262,18 +265,18 @@ at any time after the probe has been registered.
262 265
2635. Kprobes Features and Limitations 2665. Kprobes Features and Limitations
264 267
265As of Linux v2.6.12, Kprobes allows multiple probes at the same 268Kprobes allows multiple probes at the same address. Currently,
266address. Currently, however, there cannot be multiple jprobes on 269however, there cannot be multiple jprobes on the same function at
267the same function at the same time. 270the same time.
268 271
269In general, you can install a probe anywhere in the kernel. 272In general, you can install a probe anywhere in the kernel.
270In particular, you can probe interrupt handlers. Known exceptions 273In particular, you can probe interrupt handlers. Known exceptions
271are discussed in this section. 274are discussed in this section.
272 275
273For obvious reasons, it's a bad idea to install a probe in 276The register_*probe functions will return -EINVAL if you attempt
274the code that implements Kprobes (mostly kernel/kprobes.c and 277to install a probe in the code that implements Kprobes (mostly
275arch/*/kernel/kprobes.c). A patch in the v2.6.13 timeframe instructs 278kernel/kprobes.c and arch/*/kernel/kprobes.c, but also functions such
276Kprobes to reject such requests. 279as do_page_fault and notifier_call_chain).
277 280
278If you install a probe in an inline-able function, Kprobes makes 281If you install a probe in an inline-able function, Kprobes makes
279no attempt to chase down all inline instances of the function and 282no attempt to chase down all inline instances of the function and
@@ -290,18 +293,14 @@ from the accidental ones. Don't drink and probe.
290 293
291Kprobes makes no attempt to prevent probe handlers from stepping on 294Kprobes makes no attempt to prevent probe handlers from stepping on
292each other -- e.g., probing printk() and then calling printk() from a 295each other -- e.g., probing printk() and then calling printk() from a
293probe handler. As of Linux v2.6.12, if a probe handler hits a probe, 296probe handler. If a probe handler hits a probe, that second probe's
294that second probe's handlers won't be run in that instance. 297handlers won't be run in that instance, and the kprobe.nmissed member
295 298of the second probe will be incremented.
296In Linux v2.6.12 and previous versions, Kprobes' data structures are 299
297protected by a single lock that is held during probe registration and 300As of Linux v2.6.15-rc1, multiple handlers (or multiple instances of
298unregistration and while handlers are run. Thus, no two handlers 301the same handler) may run concurrently on different CPUs.
299can run simultaneously. To improve scalability on SMP systems, 302
300this restriction will probably be removed soon, in which case 303Kprobes does not use mutexes or allocate memory except during
301multiple handlers (or multiple instances of the same handler) may
302run concurrently on different CPUs. Code your handlers accordingly.
303
304Kprobes does not use semaphores or allocate memory except during
305registration and unregistration. 304registration and unregistration.
306 305
307Probe handlers are run with preemption disabled. Depending on the 306Probe handlers are run with preemption disabled. Depending on the
@@ -316,11 +315,18 @@ address instead of the real return address for kretprobed functions.
316(As far as we can tell, __builtin_return_address() is used only 315(As far as we can tell, __builtin_return_address() is used only
317for instrumentation and error reporting.) 316for instrumentation and error reporting.)
318 317
319If the number of times a function is called does not match the 318If the number of times a function is called does not match the number
320number of times it returns, registering a return probe on that 319of times it returns, registering a return probe on that function may
321function may produce undesirable results. We have the do_exit() 320produce undesirable results. We have the do_exit() case covered.
322and do_execve() cases covered. do_fork() is not an issue. We're 321do_execve() and do_fork() are not an issue. We're unaware of other
323unaware of other specific cases where this could be a problem. 322specific cases where this could be a problem.
323
324If, upon entry to or exit from a function, the CPU is running on
325a stack other than that of the current task, registering a return
326probe on that function may produce undesirable results. For this
327reason, Kprobes doesn't support return probes (or kprobes or jprobes)
328on the x86_64 version of __switch_to(); the registration functions
329return -EINVAL.
324 330
3256. Probe Overhead 3316. Probe Overhead
326 332
@@ -347,14 +353,12 @@ k = 0.77 usec; j = 1.31; r = 1.26; kr = 1.45; jr = 1.99
347 353
3487. TODO 3547. TODO
349 355
350a. SystemTap (http://sourceware.org/systemtap): Work in progress 356a. SystemTap (http://sourceware.org/systemtap): Provides a simplified
351to provide a simplified programming interface for probe-based 357programming interface for probe-based instrumentation. Try it out.
352instrumentation. 358b. Kernel return probes for sparc64.
353b. Improved SMP scalability: Currently, work is in progress to handle 359c. Support for other architectures.
354multiple kprobes in parallel. 360d. User-space probes.
355c. Kernel return probes for sparc64. 361e. Watchpoint probes (which fire on data references).
356d. Support for other architectures.
357e. User-space probes.
358 362
3598. Kprobes Example 3638. Kprobes Example
360 364
@@ -411,8 +415,7 @@ int init_module(void)
411 printk("Couldn't find %s to plant kprobe\n", "do_fork"); 415 printk("Couldn't find %s to plant kprobe\n", "do_fork");
412 return -1; 416 return -1;
413 } 417 }
414 ret = register_kprobe(&kp); 418 if ((ret = register_kprobe(&kp)) < 0) {
415 if (ret < 0) {
416 printk("register_kprobe failed, returned %d\n", ret); 419 printk("register_kprobe failed, returned %d\n", ret);
417 return -1; 420 return -1;
418 } 421 }
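A note on the combined assign-and-test form: in C, '<' binds more tightly
than '=', so the placement of the inner parentheses decides whether ret
holds the function's return value or only the 0/1 result of the
comparison. A small user-space sketch, where fake_register() is a
hypothetical stand-in for register_kprobe() that always fails:

```c
#include <errno.h>

/* Hypothetical stand-in for register_kprobe(); always fails with -EINVAL. */
static int fake_register(void)
{
	return -EINVAL;
}

/* Comparison evaluated first: ret receives its result (0 or 1). */
static int ret_compare_first(void)
{
	int ret;

	if ((ret = fake_register() < 0))
		return ret;
	return 0;
}

/* Assignment parenthesized first: ret keeps the real error code. */
static int ret_assign_first(void)
{
	int ret;

	if ((ret = fake_register()) < 0)
		return ret;
	return 0;
}
```

With the first form, a "returned %d" message would report 1 rather than
the error code, so the second form (or a separate assignment followed by
a test) is the one to use when the value is printed or propagated.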
diff --git a/Documentation/mips/AU1xxx_IDE.README b/Documentation/mips/AU1xxx_IDE.README
index a7e4c4ea3560..afb31c141d9d 100644
--- a/Documentation/mips/AU1xxx_IDE.README
+++ b/Documentation/mips/AU1xxx_IDE.README
@@ -95,11 +95,13 @@ CONFIG_BLK_DEV_IDEDMA_PCI=y
95CONFIG_IDEDMA_PCI_AUTO=y 95CONFIG_IDEDMA_PCI_AUTO=y
96CONFIG_BLK_DEV_IDE_AU1XXX=y 96CONFIG_BLK_DEV_IDE_AU1XXX=y
97CONFIG_BLK_DEV_IDE_AU1XXX_MDMA2_DBDMA=y 97CONFIG_BLK_DEV_IDE_AU1XXX_MDMA2_DBDMA=y
98CONFIG_BLK_DEV_IDE_AU1XXX_BURSTABLE_ON=y
99CONFIG_BLK_DEV_IDE_AU1XXX_SEQTS_PER_RQ=128 98CONFIG_BLK_DEV_IDE_AU1XXX_SEQTS_PER_RQ=128
100CONFIG_BLK_DEV_IDEDMA=y 99CONFIG_BLK_DEV_IDEDMA=y
101CONFIG_IDEDMA_AUTO=y 100CONFIG_IDEDMA_AUTO=y
102 101
102Also define 'IDE_AU1XXX_BURSTMODE' in 'drivers/ide/mips/au1xxx-ide.c' to enable
103the burst support on DBDMA controller.
104
103If the used system need the USB support enable the following kernel configs for 105If the used system need the USB support enable the following kernel configs for
104high IDE to USB throughput. 106high IDE to USB throughput.
105 107
@@ -115,6 +117,8 @@ CONFIG_BLK_DEV_IDE_AU1XXX_SEQTS_PER_RQ=128
115CONFIG_BLK_DEV_IDEDMA=y 117CONFIG_BLK_DEV_IDEDMA=y
116CONFIG_IDEDMA_AUTO=y 118CONFIG_IDEDMA_AUTO=y
117 119
120Also undefine 'IDE_AU1XXX_BURSTMODE' in 'drivers/ide/mips/au1xxx-ide.c' to
121disable the burst support on DBDMA controller.
118 122
119ADD NEW HARD DISC TO WHITE OR BLACK LIST 123ADD NEW HARD DISC TO WHITE OR BLACK LIST
120---------------------------------------- 124----------------------------------------
diff --git a/Documentation/networking/ip-sysctl.txt b/Documentation/networking/ip-sysctl.txt
index 2b7cf19a06ad..26364d06ae92 100644
--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -427,6 +427,23 @@ icmp_ignore_bogus_error_responses - BOOLEAN
427 will avoid log file clutter. 427 will avoid log file clutter.
428 Default: FALSE 428 Default: FALSE
429 429
430icmp_errors_use_inbound_ifaddr - BOOLEAN
431
432 If zero, icmp error messages are sent with the primary address of
433 the exiting interface.
434
435 If non-zero, the message will be sent with the primary address of
436 the interface that received the packet that caused the icmp error.
437 This is the behaviour many network administrators will expect from
438 a router. And it can make debugging complicated network layouts
439 much easier.
440
441 Note that if no primary address exists for the interface selected,
442 then the primary address of the first non-loopback interface that
443 has one will be used regardless of this setting.
444
445 Default: 0
446
430igmp_max_memberships - INTEGER 447igmp_max_memberships - INTEGER
431 Change the maximum number of multicast groups we can subscribe to. 448 Change the maximum number of multicast groups we can subscribe to.
432 Default: 20 449 Default: 20
diff --git a/Documentation/parport-lowlevel.txt b/Documentation/parport-lowlevel.txt
index 1d40008a1926..8f2302415eff 100644
--- a/Documentation/parport-lowlevel.txt
+++ b/Documentation/parport-lowlevel.txt
@@ -1068,7 +1068,7 @@ SYNOPSIS
1068 1068
1069struct parport_operations { 1069struct parport_operations {
1070 ... 1070 ...
1071 void (*write_status) (struct parport *port, unsigned char s); 1071 void (*write_control) (struct parport *port, unsigned char s);
1072 ... 1072 ...
1073}; 1073};
1074 1074
@@ -1097,9 +1097,9 @@ SYNOPSIS
1097 1097
1098struct parport_operations { 1098struct parport_operations {
1099 ... 1099 ...
1100 void (*frob_control) (struct parport *port, 1100 unsigned char (*frob_control) (struct parport *port,
1101 unsigned char mask, 1101 unsigned char mask,
1102 unsigned char val); 1102 unsigned char val);
1103 ... 1103 ...
1104}; 1104};
1105 1105
diff --git a/Documentation/pci-error-recovery.txt b/Documentation/pci-error-recovery.txt
index d089967e4948..634d3e5b5756 100644
--- a/Documentation/pci-error-recovery.txt
+++ b/Documentation/pci-error-recovery.txt
@@ -1,246 +1,396 @@
1 1
2 PCI Error Recovery 2 PCI Error Recovery
3 ------------------ 3 ------------------
4 May 31, 2005 4 February 2, 2006
5 5
6 Current document maintainer: 6 Current document maintainer:
7 Linas Vepstas <linas@austin.ibm.com> 7 Linas Vepstas <linas@austin.ibm.com>
8 8
9 9
10Some PCI bus controllers are able to detect certain "hard" PCI errors 10Many PCI bus controllers are able to detect a variety of hardware
11on the bus, such as parity errors on the data and address busses, as 11PCI errors on the bus, such as parity errors on the data and address
12well as SERR and PERR errors. These chipsets are then able to disable 12busses, as well as SERR and PERR errors. Some of the more advanced
13I/O to/from the affected device, so that, for example, a bad DMA 13chipsets are able to deal with these errors; these include PCI-E chipsets,
14address doesn't end up corrupting system memory. These same chipsets 14and the PCI-host bridges found on IBM Power4 and Power5-based pSeries
15are also able to reset the affected PCI device, and return it to 15boxes. A typical action taken is to disconnect the affected device,
16working condition. This document describes a generic API form 16halting all I/O to it. The goal of a disconnection is to avoid system
17performing error recovery. 17corruption; for example, to halt system memory corruption due to DMA's
18 18to "wild" addresses. Typically, a reconnection mechanism is also
19The core idea is that after a PCI error has been detected, there must 19offered, so that the affected PCI device(s) are reset and put back
20be a way for the kernel to coordinate with all affected device drivers 20into working condition. The reset phase requires coordination
21so that the pci card can be made operational again, possibly after 21between the affected device drivers and the PCI controller chip.
22performing a full electrical #RST of the PCI card. The API below 22This document describes a generic API for notifying device drivers
23provides a generic API for device drivers to be notified of PCI 23of a bus disconnection, and then performing error recovery.
24errors, and to be notified of, and respond to, a reset sequence. 24This API is currently implemented in the 2.6.16 and later kernels.
25 25
26Preliminary sketch of API, cut-n-pasted-n-modified email from 26Reporting and recovery is performed in several steps. First, when
27Ben Herrenschmidt, circa 5 april 2005 27a PCI hardware error has resulted in a bus disconnect, that event
28is reported as soon as possible to all affected device drivers,
29including multiple instances of a device driver on multi-function
30cards. This allows device drivers to avoid deadlocking in spinloops,
31waiting for some i/o-space register to change, when it never will.
32It also gives the drivers a chance to defer incoming I/O as
33needed.
34
35Next, recovery is performed in several stages. Most of the complexity
36is forced by the need to handle multi-function devices, that is,
37devices that have multiple device drivers associated with them.
38In the first stage, each driver is allowed to indicate what type
39of reset it desires, the choices being a simple re-enabling of I/O
40or requesting a hard reset (a full electrical #RST of the PCI card).
41If any driver requests a full reset, that is what will be done.
42
43After a full reset and/or a re-enabling of I/O, all drivers are
44again notified, so that they may then perform any device setup/config
45that may be required. After these have all completed, a final
46"resume normal operations" event is sent out.
47
48The biggest reason for choosing a kernel-based implementation rather
49than a user-space implementation was the need to deal with bus
50disconnects of PCI devices attached to storage media, and, in particular,
51disconnects from devices holding the root file system. If the root
52file system is disconnected, a user-space mechanism would have to go
53through a large number of contortions to complete recovery. Almost all
54of the current Linux file systems are not tolerant of disconnection
55from/reconnection to their underlying block device. By contrast,
56bus errors are easy to manage in the device driver. Indeed, most
57device drivers already handle very similar recovery procedures;
58for example, the SCSI-generic layer already provides significant
59mechanisms for dealing with SCSI bus errors and SCSI bus resets.
60
61
62Detailed Design
63---------------
64Design and implementation details below, based on a chain of
65public email discussions with Ben Herrenschmidt, circa 5 April 2005.
28 66
29The error recovery API support is exposed to the driver in the form of 67The error recovery API support is exposed to the driver in the form of
30a structure of function pointers pointed to by a new field in struct 68a structure of function pointers pointed to by a new field in struct
31pci_driver. The absence of this pointer in pci_driver denotes an 69pci_driver. A driver that fails to provide the structure is "non-aware",
32"non-aware" driver, behaviour on these is platform dependant. 70and the actual recovery steps taken are platform dependent. The
33Platforms like ppc64 can try to simulate pci hotplug remove/add. 71arch/powerpc implementation will simulate a PCI hotplug remove/add.
34
35The definition of "pci_error_token" is not covered here. It is based on
36Seto's work on the synchronous error detection. We still need to define
37functions for extracting infos out of an opaque error token. This is
38separate from this API.
39 72
40This structure has the form: 73This structure has the form:
41
42struct pci_error_handlers 74struct pci_error_handlers
43{ 75{
44 int (*error_detected)(struct pci_dev *dev, pci_error_token error); 76 int (*error_detected)(struct pci_dev *dev, enum pci_channel_state);
45 int (*mmio_enabled)(struct pci_dev *dev); 77 int (*mmio_enabled)(struct pci_dev *dev);
46 int (*resume)(struct pci_dev *dev);
47 int (*link_reset)(struct pci_dev *dev); 78 int (*link_reset)(struct pci_dev *dev);
48 int (*slot_reset)(struct pci_dev *dev); 79 int (*slot_reset)(struct pci_dev *dev);
80 void (*resume)(struct pci_dev *dev);
49}; 81};
50 82
51A driver doesn't have to implement all of these callbacks. The 83The possible channel states are:
52only mandatory one is error_detected(). If a callback is not 84enum pci_channel_state {
53implemented, the corresponding feature is considered unsupported. 85 pci_channel_io_normal, /* I/O channel is in normal state */
54For example, if mmio_enabled() and resume() aren't there, then the 86 pci_channel_io_frozen, /* I/O to channel is blocked */
55driver is assumed as not doing any direct recovery and requires 87 pci_channel_io_perm_failure, /* PCI card is dead */
88};
89
90Possible return values are:
91enum pci_ers_result {
92 PCI_ERS_RESULT_NONE, /* no result/none/not supported in device driver */
93 PCI_ERS_RESULT_CAN_RECOVER, /* Device driver can recover without slot reset */
94 PCI_ERS_RESULT_NEED_RESET, /* Device driver wants slot to be reset. */
95 PCI_ERS_RESULT_DISCONNECT, /* Device has completely failed, is unrecoverable */
96 PCI_ERS_RESULT_RECOVERED, /* Device driver is fully recovered and operational */
97};
98
99A driver does not have to implement all of these callbacks; however,
100if it implements any, it must implement error_detected(). If a callback
101is not implemented, the corresponding feature is considered unsupported.
102For example, if mmio_enabled() and resume() aren't there, then it
103is assumed that the driver is not doing any direct recovery and requires
56a reset. If link_reset() is not implemented, the card is assumed as 104a reset. If link_reset() is not implemented, the card is assumed as
57not caring about link resets, in which case, if recover is supported, 105not caring about link resets. Typically a driver will want to know about
58the core can try recover (but not slot_reset() unless it really did 106a slot_reset().
59reset the slot). If slot_reset() is not supported, link_reset() can 107
60be called instead on a slot reset. 108The actual steps taken by a platform to recover from a PCI error
61 109event will be platform-dependent, but will follow the general
62At first, the call will always be : 110sequence described below.
63 111
64 1) error_detected() 112STEP 0: Error Event
65 113-------------------
66 Error detected. This is sent once after an error has been detected. At 114A PCI bus error is detected by the PCI hardware. On powerpc, the slot
67this point, the device might not be accessible anymore depending on the 115is isolated, in that all I/O is blocked: all reads return 0xffffffff,
68platform (the slot will be isolated on ppc64). The driver may already 116all writes are ignored.
69have "noticed" the error because of a failing IO, but this is the proper 117
70"synchronisation point", that is, it gives a chance to the driver to 118
71cleanup, waiting for pending stuff (timers, whatever, etc...) to 119STEP 1: Notification
72complete; it can take semaphores, schedule, etc... everything but touch 120--------------------
73the device. Within this function and after it returns, the driver 121Platform calls the error_detected() callback on every instance of
122every driver affected by the error.
123
124At this point, the device might not be accessible anymore, depending on
125the platform (the slot will be isolated on powerpc). The driver may
126already have "noticed" the error because of a failing I/O, but this
127is the proper "synchronization point", that is, it gives the driver
128a chance to cleanup, waiting for pending stuff (timers, whatever, etc...)
129to complete; it can take semaphores, schedule, etc... everything but
130touch the device. Within this function and after it returns, the driver
74shouldn't do any new IOs. Called in task context. This is sort of a 131shouldn't do any new IOs. Called in task context. This is sort of a
75"quiesce" point. See note about interrupts at the end of this doc. 132"quiesce" point. See note about interrupts at the end of this doc.
76 133
77 Result codes: 134All drivers participating in this system must implement this call.
78 - PCIERR_RESULT_CAN_RECOVER: 135The driver must return one of the following result codes:
79 Driever returns this if it thinks it might be able to recover 136 - PCI_ERS_RESULT_CAN_RECOVER:
137 Driver returns this if it thinks it might be able to recover
80 the HW by just banging IOs or if it wants to be given 138 the HW by just banging IOs or if it wants to be given
81 a chance to extract some diagnostic informations (see 139 a chance to extract some diagnostic information (see
82 below). 140 mmio_enabled(), below).
83 - PCIERR_RESULT_NEED_RESET: 141 - PCI_ERS_RESULT_NEED_RESET:
84 Driver returns this if it thinks it can't recover unless the 142 Driver returns this if it can't recover without a hard
85 slot is reset. 143 slot reset.
86 - PCIERR_RESULT_DISCONNECT: 144 - PCI_ERS_RESULT_DISCONNECT:
87 Return this if driver thinks it won't recover at all, 145 Driver returns this if it doesn't want to recover at all.
88 (this will detach the driver ? or just leave it 146
89 dangling ? to be decided) 147The next step taken will depend on the result codes returned by the
90 148drivers.
91So at this point, we have called error_detected() for all drivers 149
92on the segment that had the error. On ppc64, the slot is isolated. What 150If all drivers on the segment/slot return PCI_ERS_RESULT_CAN_RECOVER,
93happens now typically depends on the result from the drivers. If all 151then the platform should re-enable IOs on the slot (or do nothing in
94drivers on the segment/slot return PCIERR_RESULT_CAN_RECOVER, we would 152particular, if the platform doesn't isolate slots), and recovery
95re-enable IOs on the slot (or do nothing special if the platform doesn't 153proceeds to STEP 2 (MMIO Enable).
96isolate slots) and call 2). If not and we can reset slots, we go to 4), 154
97if neither, we have a dead slot. If it's an hotplug slot, we might 155If any driver requested a slot reset (by returning PCI_ERS_RESULT_NEED_RESET),
98"simulate" reset by triggering HW unplug/replug though. 156then recovery proceeds to STEP 4 (Slot Reset).
99 157
100>>> Current ppc64 implementation assumes that a device driver will 158If the platform is unable to recover the slot, the next step
101>>> *not* schedule or semaphore in this routine; the current ppc64 159is STEP 6 (Permanent Failure).
160
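The choice of next step described above can be sketched in plain C. This
is only an illustration, not the platform code: the result codes are
copied from the enum shown earlier, while enum recovery_step and
next_step() are hypothetical names. It assumes the platform is able to
reset slots, and it treats every case that is neither "all drivers can
recover" nor "some driver needs a reset" as a permanent failure.

```c
/* Result codes as listed earlier in this document. */
enum pci_ers_result {
	PCI_ERS_RESULT_NONE,
	PCI_ERS_RESULT_CAN_RECOVER,
	PCI_ERS_RESULT_NEED_RESET,
	PCI_ERS_RESULT_DISCONNECT,
	PCI_ERS_RESULT_RECOVERED,
};

/* Hypothetical labels for the step numbers used in the text. */
enum recovery_step {
	STEP_MMIO_ENABLE = 2,
	STEP_SLOT_RESET = 4,
	STEP_PERM_FAILURE = 6,
};

/*
 * Pick the next step from the error_detected() results of all
 * drivers on the affected segment/slot.
 */
static enum recovery_step next_step(const enum pci_ers_result *results, int n)
{
	int all_can_recover = (n > 0);
	int i;

	for (i = 0; i < n; i++) {
		/* Any request for a reset sends the whole slot to STEP 4. */
		if (results[i] == PCI_ERS_RESULT_NEED_RESET)
			return STEP_SLOT_RESET;
		if (results[i] != PCI_ERS_RESULT_CAN_RECOVER)
			all_can_recover = 0;
	}
	/* Assumption: anything else is handled as a permanent failure. */
	return all_can_recover ? STEP_MMIO_ENABLE : STEP_PERM_FAILURE;
}
```

The point of the sketch is that the decision is made per slot, not per
driver: one driver asking for a reset overrides any number of drivers
that could have recovered without one.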
161>>> The current powerpc implementation assumes that a device driver will
162>>> *not* schedule or semaphore in this routine; the current powerpc
102>>> implementation uses one kernel thread to notify all devices; 163>>> implementation uses one kernel thread to notify all devices;
103>>> thus, of one device sleeps/schedules, all devices are affected. 164>>> thus, if one device sleeps/schedules, all devices are affected.
104>>> Doing better requires complex multi-threaded logic in the error 165>>> Doing better requires complex multi-threaded logic in the error
105>>> recovery implementation (e.g. waiting for all notification threads 166>>> recovery implementation (e.g. waiting for all notification threads
106>>> to "join" before proceeding with recovery.) This seems excessively 167>>> to "join" before proceeding with recovery.) This seems excessively
107>>> complex and not worth implementing. 168>>> complex and not worth implementing.
108 169
109>>> The current ppc64 implementation doesn't much care if the device 170>>> The current powerpc implementation doesn't much care if the device
110>>> attempts i/o at this point, or not. I/O's will fail, returning 171>>> attempts I/O at this point, or not. I/O's will fail, returning
111>>> a value of 0xff on read, and writes will be dropped. If the device 172>>> a value of 0xff on read, and writes will be dropped. If the device
112>>> driver attempts more than 10K I/O's to a frozen adapter, it will 173>>> driver attempts more than 10K I/O's to a frozen adapter, it will
113>>> assume that the device driver has gone into an infinite loop, and 174>>> assume that the device driver has gone into an infinite loop, and
114>>> it will panic the kernel. 175>>> it will panic the kernel. There doesn't seem to be any other
176>>> way of stopping a device driver that insists on spinning on I/O.
115 177
116 2) mmio_enabled() 178STEP 2: MMIO Enabled
179-------------------
180The platform re-enables MMIO to the device (but typically not the
181DMA), and then calls the mmio_enabled() callback on all affected
182device drivers.
117 183
118 This is the "early recovery" call. IOs are allowed again, but DMA is 184This is the "early recovery" call. IOs are allowed again, but DMA is
119not (hrm... to be discussed, I prefer not), with some restrictions. This 185not (hrm... to be discussed, I prefer not), with some restrictions. This
120is NOT a callback for the driver to start operations again, only to 186is NOT a callback for the driver to start operations again, only to
121peek/poke at the device, extract diagnostic information, if any, and 187peek/poke at the device, extract diagnostic information, if any, and
122eventually do things like trigger a device local reset or some such, 188eventually do things like trigger a device local reset or some such,
123but not restart operations. This is sent if all drivers on a segment 189but not restart operations. This callback is made if all drivers on
124agree that they can try to recover and no automatic link reset was 190a segment agree that they can try to recover and if no automatic link reset
125performed by the HW. If the platform can't just re-enable IOs without 191was performed by the HW. If the platform can't just re-enable IOs without
126a slot reset or a link reset, it doesn't call this callback and goes 192a slot reset or a link reset, it won't call this callback, and instead
127directly to 3) or 4). All IOs should be done _synchronously_ from 193will have gone directly to STEP 3 (Link Reset) or STEP 4 (Slot Reset).
128within this callback, errors triggered by them will be returned via 194
129the normal pci_check_whatever() api, no new error_detected() callback 195>>> The following is proposed; no platform implements this yet:
130will be issued due to an error happening here. However, such an error 196>>> Proposal: All I/O's should be done _synchronously_ from within
131might cause IOs to be re-blocked for the whole segment, and thus 197>>> this callback, errors triggered by them will be returned via
132invalidate the recovery that other devices on the same segment might 198>>> the normal pci_check_whatever() API, no new error_detected()
133have done, forcing the whole segment into one of the next states, 199>>> callback will be issued due to an error happening here. However,
134that is link reset or slot reset. 200>>> such an error might cause IOs to be re-blocked for the whole
135 201>>> segment, and thus invalidate the recovery that other devices
136 Result codes: 202>>> on the same segment might have done, forcing the whole segment
137 - PCIERR_RESULT_RECOVERED 203>>> into one of the next states, that is, link reset or slot reset.
204
205The driver should return one of the following result codes:
206 - PCI_ERS_RESULT_RECOVERED
138 Driver returns this if it thinks the device is fully 207 Driver returns this if it thinks the device is fully
139 functionnal and thinks it is ready to start 208 functional and thinks it is ready to start
140 normal driver operations again. There is no 209 normal driver operations again. There is no
141 guarantee that the driver will actually be 210 guarantee that the driver will actually be
142 allowed to proceed, as another driver on the 211 allowed to proceed, as another driver on the
143 same segment might have failed and thus triggered a 212 same segment might have failed and thus triggered a
144 slot reset on platforms that support it. 213 slot reset on platforms that support it.
145 214
146 - PCIERR_RESULT_NEED_RESET 215 - PCI_ERS_RESULT_NEED_RESET
147 Driver returns this if it thinks the device is not 216 Driver returns this if it thinks the device is not
148 recoverable in its current state and it needs a slot 217 recoverable in its current state and it needs a slot
149 reset to proceed. 218 reset to proceed.
150 219
151 - PCIERR_RESULT_DISCONNECT 220 - PCI_ERS_RESULT_DISCONNECT
152 Same as above. Total failure, no recovery even after 221 Same as above. Total failure, no recovery even after
153 reset driver dead. (To be defined more precisely) 222 reset driver dead. (To be defined more precisely)
154 223
155>>> The current ppc64 implementation does not implement this callback. 224The next step taken depends on the results returned by the drivers.
225If all drivers returned PCI_ERS_RESULT_RECOVERED, then the platform
226proceeds to either STEP 3 (Link Reset) or to STEP 5 (Resume Operations).
227
228If any driver returned PCI_ERS_RESULT_NEED_RESET, then the platform
229proceeds to STEP 4 (Slot Reset).
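
As a rough illustration of the decision just described, the following
user-space sketch folds the per-driver results into the platform's next
step. The enum and function names are stand-ins invented for this example;
they are not the kernel's definitions. What to do with a DISCONNECT result
is left as a placeholder because the text above does not define it.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the result codes named in the text (values illustrative). */
enum ers_result { ERS_RECOVERED, ERS_NEED_RESET, ERS_DISCONNECT };

/* Hypothetical labels for what the platform does next. */
enum next_step {
	NEXT_LINK_RESET_OR_RESUME,	/* STEP 3 or STEP 5 */
	NEXT_SLOT_RESET,		/* STEP 4 */
	NEXT_PLATFORM_POLICY,		/* DISCONNECT seen: not defined by the text */
};

/* Combine the results returned by every driver on the segment. */
enum next_step next_step_after_mmio_enabled(const enum ers_result *results,
					    size_t count)
{
	enum next_step next = NEXT_LINK_RESET_OR_RESUME;
	size_t i;

	assert(results != NULL || count == 0);

	for (i = 0; i < count; i++) {
		/* Any driver asking for a reset sends the segment to STEP 4. */
		if (results[i] == ERS_NEED_RESET)
			return NEXT_SLOT_RESET;
		if (results[i] == ERS_DISCONNECT)
			next = NEXT_PLATFORM_POLICY;
	}
	return next;
}
```

Only when every driver reports RECOVERED does the sketch continue to
STEP 3 or STEP 5; a single NEED_RESET is enough to force STEP 4.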
156 230
157 3) link_reset() 231>>> The current powerpc implementation does not implement this callback.
158 232
159 This is called after the link has been reset. This is typically 233
160a PCI Express specific state at this point and is done whenever a 234STEP 3: Link Reset
161non-fatal error has been detected that can be "solved" by resetting 235------------------
162the link. This call informs the driver of the reset and the driver 236The platform resets the link, and then calls the link_reset() callback
163should check if the device appears to be in working condition. 237on all affected device drivers. This is a PCI-Express specific state
164This function acts a bit like 2) mmio_enabled(), in that the driver 238and is done whenever a non-fatal error has been detected that can be
165is not supposed to restart normal driver I/O operations right away. 239"solved" by resetting the link. This call informs the driver of the
166Instead, it should just "probe" the device to check it's recoverability 240reset and the driver should check to see if the device appears to be
167status. If all is right, then the core will call resume() once all 241in working condition.
168drivers have ack'd link_reset(). 242
243The driver is not supposed to restart normal driver I/O operations
244at this point. It should limit itself to "probing" the device to
245check its recoverability status. If all is right, then the platform
246will call resume() once all drivers have ack'd link_reset().
169 247
170 Result codes: 248 Result codes:
171 (identical to mmio_enabled) 249 (identical to STEP 2 (MMIO Enabled))
250
251The platform then proceeds to either STEP 4 (Slot Reset) or STEP 5
252(Resume Operations).
253
254>>> The current powerpc implementation does not implement this callback.
255
256
257STEP 4: Slot Reset
258------------------
259The platform performs a soft or hard reset of the device, and then
260calls the slot_reset() callback.
261
262A soft reset consists of asserting the adapter #RST line and then
263restoring the PCI BAR's and PCI configuration header to a state
264that is equivalent to what it would be after a fresh system
265power-on followed by power-on BIOS/system firmware initialization.
266If the platform supports PCI hotplug, then the reset might be
267performed by toggling the slot electrical power off/on.
172 268
173>>> The current ppc64 implementation does not implement this callback. 269It is important for the platform to restore the PCI config space
270to the "fresh poweron" state, rather than the "last state". After
271a slot reset, the device driver will almost always use its standard
272device initialization routines, and an unusual config space setup
273may result in hung devices, kernel panics, or silent data corruption.
174 274
175 4) slot_reset() 275This call gives drivers the chance to re-initialize the hardware
276(re-download firmware, etc.). At this point, the driver may assume
277that the card is in a fresh state and is fully functional. In
278particular, interrupt generation should work normally.
176 279
177 This is called after the slot has been soft or hard reset by the 280Drivers should not yet restart normal I/O processing operations
178platform. A soft reset consists of asserting the adapter #RST line 281at this point. If all device drivers report success on this
179and then restoring the PCI BARs and PCI configuration header. If the 282callback, the platform will call resume() to complete the sequence,
180platform supports PCI hotplug, then it might instead perform a hard 283and let the driver restart normal I/O processing.
181reset by toggling power on the slot off/on. This call gives drivers
182the chance to re-initialize the hardware (re-download firmware, etc.),
183but drivers shouldn't restart normal I/O processing operations at
184this point. (See note about interrupts; interrupts aren't guaranteed
185to be delivered until the resume() callback has been called). If all
186device drivers report success on this callback, the patform will call
187resume() to complete the error handling and let the driver restart
188normal I/O processing.
189 284
190A driver can still return a critical failure for this function if 285A driver can still return a critical failure for this function if
191it can't get the device operational after reset. If the platform 286it can't get the device operational after reset. If the platform
192previously tried a soft reset, it migh now try a hard reset (power 287previously tried a soft reset, it might now try a hard reset (power
193cycle) and then call slot_reset() again. If the device still can't 288cycle) and then call slot_reset() again. If the device still can't
194be recovered, there is nothing more that can be done; the platform 289be recovered, there is nothing more that can be done; the platform
195will typically report a "permanent failure" in such a case. The 290will typically report a "permanent failure" in such a case. The
196device will be considered "dead" in this case. 291device will be considered "dead" in this case.
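
The escalation described in this paragraph can be sketched as a small
decision function. This is only an illustration: the type and function
names are hypothetical, and whether a platform can power-cycle a slot at
all is modelled here as a plain flag.

```c
#include <assert.h>

/* Hypothetical names; none of these come from the kernel sources. */
enum reset_kind { RESET_SOFT, RESET_HARD };

enum reset_outcome {
	OUTCOME_RESUME,		/* driver is happy: go on to resume() */
	OUTCOME_RETRY_HARD,	/* power-cycle, then call slot_reset() again */
	OUTCOME_PERM_FAILURE,	/* nothing more can be done: device is "dead" */
};

/*
 * Decide what to do after slot_reset() has been called.
 * tried:           the kind of reset that was just performed
 * driver_ok:       nonzero if the driver reported success
 * can_power_cycle: nonzero if the platform can toggle slot power
 */
enum reset_outcome after_slot_reset(enum reset_kind tried, int driver_ok,
				    int can_power_cycle)
{
	assert(tried == RESET_SOFT || tried == RESET_HARD);

	if (driver_ok)
		return OUTCOME_RESUME;
	if (tried == RESET_SOFT && can_power_cycle)
		return OUTCOME_RETRY_HARD;
	return OUTCOME_PERM_FAILURE;
}
```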
197 292
198 Result codes: 293Drivers for multi-function cards will need to coordinate among
199 - PCIERR_RESULT_DISCONNECT 294themselves as to which driver instance will perform any "one-shot"
200 Same as above. 295or global device initialization. For example, the Symbios sym53c8xx_2
296driver performs device init only from PCI function 0:
201 297
202>>> The current ppc64 implementation does not try a power-cycle reset 298+ if (PCI_FUNC(pdev->devfn) == 0)
203>>> if the driver returned PCIERR_RESULT_DISCONNECT. However, it should. 299+ sym_reset_scsi_bus(np, 0);
204 300
205 5) resume() 301 Result codes:
206 302 - PCI_ERS_RESULT_DISCONNECT
207 This is called if all drivers on the segment have returned 303 Same as above.
208PCIERR_RESULT_RECOVERED from one of the 3 prevous callbacks.
209That basically tells the driver to restart activity, tht everything
210is back and running. No result code is taken into account here. If
211a new error happens, it will restart a new error handling process.
212 304
213That's it. I think this covers all the possibilities. The way those 305Platform proceeds either to STEP 5 (Resume Operations) or STEP 6 (Permanent
214callbacks are called is platform policy. A platform with no slot reset 306Failure).
215capability for example may want to just "ignore" drivers that can't 307
308>>> The current powerpc implementation does not try a
309>>> power-cycle reset if the driver returned PCI_ERS_RESULT_DISCONNECT.
310>>> However, it probably should.
311
312
313STEP 5: Resume Operations
314-------------------------
315The platform will call the resume() callback on all affected device
316drivers if all drivers on the segment have returned
317PCI_ERS_RESULT_RECOVERED from one of the 3 previous callbacks.
318The goal of this callback is to tell the driver to restart activity,
319that everything is back and running. This callback does not return
320a result code.
321
322At this point, if a new error happens, the platform will restart
323a new error recovery sequence.
324
325STEP 6: Permanent Failure
326-------------------------
327A "permanent failure" has occurred, and the platform cannot recover
328the device. The platform will call error_detected() with a
329pci_channel_state value of pci_channel_io_perm_failure.
330
331The device driver should, at this point, assume the worst. It should
332cancel all pending I/O, refuse all new I/O, returning -EIO to
333higher layers. The device driver should then clean up all of its
334memory and remove itself from kernel operations, much as it would
335during system shutdown.
336
337The platform will typically notify the system operator of the
338permanent failure in some way. If the device is hotplug-capable,
339the operator will probably want to remove and replace the device.
340Note, however, not all failures are truly "permanent". Some are
341caused by over-heating, some by a poorly seated card. Many
342PCI error events are caused by software bugs, e.g. DMA's to
343wild addresses or bogus split transactions due to programming
344errors. See the discussion in powerpc/eeh-pci-error-recovery.txt
345for additional detail on real-life experience of the causes of
346software errors.
347
348
349Conclusion; General Remarks
350---------------------------
351The way those callbacks are called is platform policy. A platform with
352no slot reset capability may want to just "ignore" drivers that can't
216recover (disconnect them) and try to let other cards on the same segment 353recover (disconnect them) and try to let other cards on the same segment
217recover. Keep in mind that in most real life cases, though, there will 354recover. Keep in mind that in most real life cases, though, there will
218be only one driver per segment. 355be only one driver per segment.
219 356
220Now, there is a note about interrupts. If you get an interrupt and your 357Now, a note about interrupts. If you get an interrupt and your
221device is dead or has been isolated, there is a problem :) 358device is dead or has been isolated, there is a problem :)
222 359The current policy is to turn this into a platform policy.
223After much thinking, I decided to leave that to the platform. That is, 360That is, the recovery API only requires that:
224the recovery API only precies that:
225 361
226 - There is no guarantee that interrupt delivery can proceed from any 362 - There is no guarantee that interrupt delivery can proceed from any
227device on the segment starting from the error detection and until the 363device on the segment starting from the error detection and until the
228restart callback is sent, at which point interrupts are expected to be 364resume callback is sent, at which point interrupts are expected to be
229fully operational. 365fully operational.
230 366
231 - There is no guarantee that interrupt delivery is stopped, that is, ad 367 - There is no guarantee that interrupt delivery is stopped, that is,
232river that gets an interrupts after detecting an error, or that detects 368a driver that gets an interrupt after detecting an error, or that detects
233and error within the interrupt handler such that it prevents proper 369an error within the interrupt handler such that it prevents proper
234ack'ing of the interrupt (and thus removal of the source) should just 370ack'ing of the interrupt (and thus removal of the source) should just
235return IRQ_NOTHANDLED. It's up to the platform to deal with taht 371return IRQ_NOTHANDLED. It's up to the platform to deal with that
236condition, typically by masking the irq source during the duration of 372condition, typically by masking the IRQ source during the duration of
237the error handling. It is expected that the platform "knows" which 373the error handling. It is expected that the platform "knows" which
238interrupts are routed to error-management capable slots and can deal 374interrupts are routed to error-management capable slots and can deal
239with temporarily disabling that irq number during error processing (this 375with temporarily disabling that IRQ number during error processing (this
240isn't terribly complex). That means some IRQ latency for other devices 376isn't terribly complex). That means some IRQ latency for other devices
241sharing the interrupt, but there is simply no other way. High end 377sharing the interrupt, but there is simply no other way. High end
242platforms aren't supposed to share interrupts between many devices 378platforms aren't supposed to share interrupts between many devices
243anyway :) 379anyway :)
244 380
245 381>>> Implementation details for the powerpc platform are discussed in
246Revised: 31 May 2005 Linas Vepstas <linas@austin.ibm.com> 382>>> the file Documentation/powerpc/eeh-pci-error-recovery.txt
383
384>>> As of this writing, there are six device drivers with patches
385>>> implementing error recovery. Not all of these patches are in
386>>> mainline yet. These may be used as "examples":
387>>>
388>>> drivers/scsi/ipr.c
389>>> drivers/scsi/sym53c8xx_2
390>>> drivers/net/e100.c
391>>> drivers/net/e1000
392>>> drivers/net/ixgb
393>>> drivers/net/s2io.c
394
395The End
396-------
diff --git a/Documentation/power/interface.txt b/Documentation/power/interface.txt
index bd4ffb5bd49a..4117802af0f8 100644
--- a/Documentation/power/interface.txt
+++ b/Documentation/power/interface.txt
@@ -44,7 +44,7 @@ it.
44/sys/power/image_size controls the size of the image created by 44/sys/power/image_size controls the size of the image created by
45the suspend-to-disk mechanism. It can be written a string 45the suspend-to-disk mechanism. It can be written a string
46representing a non-negative integer that will be used as an upper 46representing a non-negative integer that will be used as an upper
47limit of the image size, in megabytes. The suspend-to-disk mechanism will 47limit of the image size, in bytes. The suspend-to-disk mechanism will
48do its best to ensure the image size will not exceed that number. However, 48do its best to ensure the image size will not exceed that number. However,
49if this turns out to be impossible, it will try to suspend anyway using the 49if this turns out to be impossible, it will try to suspend anyway using the
50smallest image possible. In particular, if "0" is written to this file, the 50smallest image possible. In particular, if "0" is written to this file, the
diff --git a/Documentation/power/swsusp.txt b/Documentation/power/swsusp.txt
index 08c79d4dc540..b28b7f04abb8 100644
--- a/Documentation/power/swsusp.txt
+++ b/Documentation/power/swsusp.txt
@@ -27,7 +27,7 @@ echo shutdown > /sys/power/disk; echo disk > /sys/power/state
27 27
28echo platform > /sys/power/disk; echo disk > /sys/power/state 28echo platform > /sys/power/disk; echo disk > /sys/power/state
29 29
30If you want to limit the suspend image size to N megabytes, do 30If you want to limit the suspend image size to N bytes, do
31 31
32echo N > /sys/power/image_size 32echo N > /sys/power/image_size
33 33
diff --git a/Documentation/powerpc/booting-without-of.txt b/Documentation/powerpc/booting-without-of.txt
new file mode 100644
index 000000000000..d02c64953dcd
--- /dev/null
+++ b/Documentation/powerpc/booting-without-of.txt
@@ -0,0 +1,1486 @@
1 Booting the Linux/ppc kernel without Open Firmware
2 --------------------------------------------------
3
4
5(c) 2005 Benjamin Herrenschmidt <benh at kernel.crashing.org>,
6 IBM Corp.
7(c) 2005 Becky Bruce <becky.bruce at freescale.com>,
8 Freescale Semiconductor, FSL SOC and 32-bit additions
9
10 May 18, 2005: Rev 0.1 - Initial draft, no chapter III yet.
11
12 May 19, 2005: Rev 0.2 - Add chapter III and bits & pieces here or
13 clarifies the fact that a lot of things are
14 optional, the kernel only requires a very
15 small device tree, though it is encouraged
16 to provide an as complete one as possible.
17
18 May 24, 2005: Rev 0.3 - Specify that the DT block has to be in RAM
19 - Misc fixes
20 - Define version 3 and new format version 16
21 for the DT block (version 16 needs kernel
22 patches, will be fwd separately).
23 String block now has a size, and full path
24 is replaced by unit name for more
25 compactness.
26 linux,phandle is made optional, only nodes
27 that are referenced by other nodes need it.
28 "name" property is now automatically
29 deduced from the unit name
30
31 June 1, 2005: Rev 0.4 - Correct confusion between OF_DT_END and
32 OF_DT_END_NODE in structure definition.
33 - Change version 16 format to always align
34 property data to 4 bytes. Since tokens are
35 already aligned, that means no specific
36 required alignment between property size
37 and property data. The old style variable
38 alignment would make it impossible to do
39 "simple" insertion of properties using
40 memmove (thanks Milton for
41 noticing). Updated kernel patch as well
42 - Correct a few more alignment constraints
43 - Add a chapter about the device-tree
44 compiler and the textual representation of
45 the tree that can be "compiled" by dtc.
46
47 November 21, 2005: Rev 0.5
48 - Additions/generalizations for 32-bit
49 - Changed to reflect the new arch/powerpc
50 structure
51 - Added chapter VI
52
53
54 ToDo:
55 - Add some definitions of interrupt tree (simple/complex)
56 - Add some definitions for pci host bridges
57 - Add some common address format examples
58 - Add definitions for standard properties and "compatible"
59 names for cells that are not already defined by the existing
60 OF spec.
61 - Compare FSL SOC use of PCI to standard and make sure no new
62 node definition required.
63 - Add more information about node definitions for SOC devices
64 that currently have no standard, like the FSL CPM.
65
66
67I - Introduction
68================
69
70During the recent development of the Linux/ppc64 kernel, and more
71specifically, the addition of new platform types outside of the old
72IBM pSeries/iSeries pair, it was decided to enforce some strict rules
73regarding the kernel entry and bootloader <-> kernel interfaces, in
74order to avoid the degeneration that had become the ppc32 kernel entry
75point and the way a new platform should be added to the kernel. The
76legacy iSeries platform breaks those rules as it predates this scheme,
77but no new board support will be accepted in the main tree that
78doesn't follow them properly. In addition, since the advent of the
79arch/powerpc merged architecture for ppc32 and ppc64, new 32-bit
80platforms and 32-bit platforms which move into arch/powerpc will be
81required to use these rules as well.
82
83The main requirement that will be defined in more detail below is
84the presence of a device-tree whose format is defined after Open
85Firmware specification. However, in order to make life easier
86to embedded board vendors, the kernel doesn't require the device-tree
87to represent every device in the system and only requires some nodes
88and properties to be present. This will be described in detail in
89section III, but, for example, the kernel does not require you to
90create a node for every PCI device in the system. It is a requirement
91to have a node for PCI host bridges in order to provide interrupt
92routing information and memory/IO ranges, among others. It is also
93recommended to define nodes for on chip devices and other busses that
94don't specifically fit in an existing OF specification. This creates a
95great flexibility in the way the kernel can then probe those and match
96drivers to devices, without having to hard code all sorts of tables. It
97also makes it more flexible for board vendors to do minor hardware
98upgrades without significantly impacting the kernel code or cluttering
99it with special cases.
100
101
1021) Entry point for arch/powerpc
103-------------------------------
104
105 There is one and only one entry point to the kernel, at the start
106 of the kernel image. That entry point supports two calling
107 conventions:
108
109 a) Boot from Open Firmware. If your firmware is compatible
110 with Open Firmware (IEEE 1275) or provides an OF compatible
111 client interface API (support for "interpret" callback of
112 forth words isn't required), you can enter the kernel with:
113
114 r5 : OF callback pointer as defined by IEEE 1275
115 bindings to powerpc. Only the 32 bit client interface
116 is currently supported
117
118 r3, r4 : address & length of an initrd if any or 0
119
120 The MMU is either on or off; the kernel will run the
121 trampoline located in arch/powerpc/kernel/prom_init.c to
122 extract the device-tree and other information from Open
123 Firmware and build a flattened device-tree as described
124 in b). prom_init() will then re-enter the kernel using
125 the second method. This trampoline code runs in the
126 context of the firmware, which is supposed to handle all
127 exceptions during that time.
128
129 b) Direct entry with a flattened device-tree block. This entry
130 point is called by a) after the OF trampoline and can also be
131 called directly by a bootloader that does not support the Open
132 Firmware client interface. It is also used by "kexec" to
133 implement "hot" booting of a new kernel from a previous
134 running one. This method is what I will describe in more
135 detail in this document, as method a) is simply standard Open
136 Firmware, and thus should be implemented according to the
137 various standard documents defining it and its binding to the
138 PowerPC platform. The entry point definition then becomes:
139
140 r3 : physical pointer to the device-tree block
141 (defined in chapter II) in RAM
142
143 r4 : physical pointer to the kernel itself. This is
144 used by the assembly code to properly disable the MMU
145 in case you are entering the kernel with MMU enabled
146 and a non-1:1 mapping.
147
148 r5 : NULL (so as to differentiate from method a)
149
150 Note about SMP entry: Either your firmware puts your other
151 CPUs in some sleep loop or spin loop in ROM where you can get
152 them out via a soft reset or some other means, in which case
153 you don't need to care, or you'll have to enter the kernel
154 with all CPUs. The way to do that with method b) will be
155 described in a later revision of this document.
156
157
1582) Board support
159----------------
160
16164-bit kernels:
162
163 Board supports (platforms) are not exclusive config options. An
164 arbitrary set of board supports can be built in a single kernel
165 image. The kernel will "know" what set of functions to use for a
166 given platform based on the content of the device-tree. Thus, you
167 should:
168
169 a) add your platform support as a _boolean_ option in
170 arch/powerpc/Kconfig, following the example of PPC_PSERIES,
171 PPC_PMAC and PPC_MAPLE. The latter is probably a good
172 example of a board support to start from.
173
174 b) create your main platform file as
175 "arch/powerpc/platforms/myplatform/myboard_setup.c" and add it
176 to the Makefile under the condition of your CONFIG_
177 option. This file will define a structure of type "ppc_md"
178 containing the various callbacks that the generic code will
179 use to get to your platform specific code.
180
181 c) Add a reference to your "ppc_md" structure in the
182 "machines" table in arch/powerpc/kernel/setup_64.c if you are
183 a 64-bit platform.
184
185 d) request and get assigned a platform number (see PLATFORM_*
186 constants in include/asm-powerpc/processor.h).
187
18832-bit embedded kernels:
189
190 Currently, board support is essentially an exclusive config option.
191 The kernel is configured for a single platform. Part of the reason
192 for this is to keep kernels on embedded systems small and efficient;
193 part of this is due to the fact the code is already that way. In the
194 future, a kernel may support multiple platforms, but only if the
195 platforms feature the same core architecture. A single kernel build
196 cannot support both configurations with Book E and configurations
197 with classic PowerPC architectures.
198
199 32-bit embedded platforms that are moved into arch/powerpc using a
200 flattened device tree should adopt the merged tree practice of
201 setting ppc_md up dynamically, even though the kernel is currently
202 built with support for only a single platform at a time. This allows
203 unification of the setup code, and will make it easier to go to a
204 multiple-platform-support model in the future.
205
206NOTE: I believe the above will be true once Ben's done with the merge
207of the boot sequences.... someone speak up if this is wrong!
208
209 To add a 32-bit embedded platform support, follow the instructions
210 for 64-bit platforms above, with the exception that the Kconfig
211 option should be set up such that the kernel builds exclusively for
212 the platform selected. The processor type for the platform should
213 enable another config option to select the specific board
214 supported.
215
216NOTE: If ben doesn't merge the setup files, may need to change this to
217point to setup_32.c
218
219
220 I will describe later the boot process and various callbacks that
221 your platform should implement.
222
223
224II - The DT block format
225========================
226
227
228This chapter defines the actual format of the flattened device-tree
229passed to the kernel. The actual content of it and kernel requirements
230are described later. You can find examples of code manipulating that
231format in various places, including arch/powerpc/kernel/prom_init.c
232which will generate a flattened device-tree from the Open Firmware
233representation, or the fs2dt utility which is part of the kexec tools
234which will generate one from a filesystem representation. It is
235expected that a bootloader like uboot provides a bit more support,
236that will be discussed later as well.
237
238Note: The block has to be in main memory. It has to be accessible in
239both real mode and virtual mode with no mapping other than main
240memory. If you are writing a simple flash bootloader, it should copy
241the block to RAM before passing it to the kernel.
242
243
2441) Header
245---------
246
247 The kernel is entered with r3 pointing to an area of memory that is
248 roughly described in include/asm-powerpc/prom.h by the structure
249 boot_param_header:
250
struct boot_param_header {
	u32	magic;			/* magic word OF_DT_HEADER */
	u32	totalsize;		/* total size of DT block */
	u32	off_dt_struct;		/* offset to structure */
	u32	off_dt_strings;		/* offset to strings */
	u32	off_mem_rsvmap;		/* offset to memory reserve map */
	u32	version;		/* format version */
	u32	last_comp_version;	/* last compatible version */

	/* version 2 fields below */
	u32	boot_cpuid_phys;	/* Which physical CPU id we're
					   booting on */
	/* version 3 fields below */
	u32	size_dt_strings;	/* size of the strings block */
};
267
268 Along with the constants:
269
/* Definitions used by the flattened device tree */
#define OF_DT_HEADER		0xd00dfeed	/* 4: version,
						   4: total size */
#define OF_DT_BEGIN_NODE	0x1		/* Start node: full name */
#define OF_DT_END_NODE		0x2		/* End node */
#define OF_DT_PROP		0x3		/* Property: name off,
						   size, content */
#define OF_DT_END		0x9
279
280 All values in this header are in big endian format, the various
281 fields in this header are defined more precisely below. All
282 "offset" values are in bytes from the start of the header; that is
283 from the value of r3.
284
285 - magic
286
287 This is a magic value that "marks" the beginning of the
288 device-tree block header. It contains the value 0xd00dfeed and is
289 defined by the constant OF_DT_HEADER.
290
291 - totalsize
292
293 This is the total size of the DT block including the header. The
294 "DT" block should enclose all data structures defined in this
295 chapter (which are pointed to by offsets in this header). That is,
296 the device-tree structure, strings, and the memory reserve map.
297
298 - off_dt_struct
299
300 This is an offset from the beginning of the header to the start
301 of the "structure" part of the device tree. (see 2) device tree)
302
303 - off_dt_strings
304
305 This is an offset from the beginning of the header to the start
306 of the "strings" part of the device-tree.
307
308 - off_mem_rsvmap
309
310 This is an offset from the beginning of the header to the start
311 of the reserved memory map. This map is a list of pairs of 64
312 bit integers. Each pair is a physical address and a size. The
314 list is terminated by an entry of size 0. This map provides the
315 kernel with a list of physical memory areas that are "reserved"
316 and thus not to be used for memory allocations, especially during
317 early initialization. The kernel needs to allocate memory during
318 boot for things like un-flattening the device-tree, allocating an
319 MMU hash table, etc... Those allocations must be done in such a
320 way to avoid overwriting critical things like, on Open Firmware
321 capable machines, the RTAS instance, or on some pSeries, the TCE
322 tables used for the iommu. Typically, the reserve map should
323 contain _at least_ this DT block itself (header,total_size). If
324 you are passing an initrd to the kernel, you should reserve it as
325 well. You do not need to reserve the kernel image itself. The map
326 should be 64 bit aligned.
327
328 - version
329
330 This is the version of this structure. Version 1 stops
331 here. Version 2 adds an additional field boot_cpuid_phys.
332 Version 3 adds the size of the strings block, allowing the kernel
333 to reallocate it easily at boot and free up the unused flattened
334 structure after expansion. Version 16 introduces a new more
335 "compact" format for the tree itself that is however not backward
336 compatible. You should always generate a structure of the highest
337 version defined at the time of your implementation. Currently
338 that is version 16, unless you explicitly aim at being backward
339 compatible.
340
341 - last_comp_version
342
343 Last compatible version. This indicates down to what version of
344 the DT block you are backward compatible. For example, version 2
345 is backward compatible with version 1 (that is, a kernel built
346 for version 1 will be able to boot with a version 2 format). You
347 should put a 1 in this field if you generate a device tree of
348 version 1 to 3, or 0x10 if you generate a tree of version 0x10
349 using the new unit name format.
350
351 - boot_cpuid_phys
352
353 This field only exists on version 2 headers. It indicates which
354 physical CPU ID is calling the kernel entry point. This is used,
355 among others, by kexec. If you are on an SMP system, this value
356 should match the content of the "reg" property of the CPU node in
357 the device-tree corresponding to the CPU calling the kernel entry
358 point (see further chapters for more informations on the required
359 device-tree contents)
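
The version-dependent header fields described above can be illustrated
with a minimal sketch. It assumes the field order of the kernel's
struct boot_param_header and the 0xd00dfeed magic value; the names of
the fields before "version", and the helper names parse_header() and
can_parse(), are assumptions made for this illustration and are not
part of any existing tool.

```python
import struct

OF_DT_HEADER = 0xD00DFEED  # assumed magic value marking a DT block

# Assumed field order; every field is a big-endian 32-bit value.
BASE_FIELDS = ("magic", "totalsize", "off_dt_struct", "off_dt_strings",
               "off_mem_rsvmap", "version", "last_comp_version")


def parse_header(blob):
    """Decode the header fields present for the blob's declared version."""
    values = struct.unpack_from(">7I", blob, 0)
    header = dict(zip(BASE_FIELDS, values))
    if header["magic"] != OF_DT_HEADER:
        raise ValueError("bad magic 0x%08x" % header["magic"])
    offset = 4 * len(BASE_FIELDS)
    if header["version"] >= 2:          # version 2 adds boot_cpuid_phys
        (header["boot_cpuid_phys"],) = struct.unpack_from(">I", blob, offset)
        offset += 4
    if header["version"] >= 3:          # version 3 adds the strings size
        (header["size_dt_strings"],) = struct.unpack_from(">I", blob, offset)
    return header


def can_parse(header, supported_version):
    """True if a parser written for supported_version can read this blob."""
    return header["last_comp_version"] <= supported_version
```

A parser that only knows version 3 would therefore refuse a block
whose last_comp_version is 0x10, while a version 16 parser accepts it.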
360
361
362 So the typical layout of a DT block (though the various parts don't
363 need to be in that order) looks like this (addresses go from top to
364 bottom):
365
366
367 ------------------------------
368 r3 -> | struct boot_param_header |
369 ------------------------------
370 | (alignment gap) (*) |
371 ------------------------------
372 | memory reserve map |
373 ------------------------------
374 | (alignment gap) |
375 ------------------------------
376 | |
377 | device-tree structure |
378 | |
379 ------------------------------
380 | (alignment gap) |
381 ------------------------------
382 | |
383 | device-tree strings |
384 | |
385 -----> ------------------------------
386 |
387 |
388 --- (r3 + totalsize)
389
390 (*) The alignment gaps are not necessarily present; their presence
391 and size are dependent on the various alignment requirements of
392 the individual data blocks.
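
As a rough illustration of how the alignment gaps arise, the sketch
below lays the blocks out in the order of the diagram. It assumes a 64
bit boundary for the reserve map and a 4 byte boundary for the
structure and strings blocks; the function names and the sizes used in
the usage note are hypothetical.

```python
def align_up(value, boundary):
    """Round value up to the next multiple of boundary (a power of two)."""
    return (value + boundary - 1) & ~(boundary - 1)


def plan_layout(header_size, rsvmap_size, struct_size, strings_size):
    """Return block offsets from the start of the header, plus totalsize."""
    off_mem_rsvmap = align_up(header_size, 8)   # reserve map: 64 bit aligned
    off_dt_struct = align_up(off_mem_rsvmap + rsvmap_size, 4)
    off_dt_strings = align_up(off_dt_struct + struct_size, 4)
    return {
        "off_mem_rsvmap": off_mem_rsvmap,
        "off_dt_struct": off_dt_struct,
        "off_dt_strings": off_dt_strings,
        "totalsize": off_dt_strings + strings_size,
    }
```

With an assumed 36 byte header (nine 32 bit fields), the reserve map
lands at offset 40, so the first gap is 4 bytes; with a 40 byte header
there would be no gap at all.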
393
394
2) Device tree generalities
---------------------------
397
398This device-tree itself is separated in two different blocks, a
399structure block and a strings block. Both need to be aligned to a 4
400byte boundary.
401
402First, let's quickly describe the device-tree concept before detailing
403the storage format. This chapter does _not_ describe the detail of the
404required types of nodes & properties for the kernel, this is done
405later in chapter III.
406
407The device-tree layout is strongly inherited from the definition of
408the Open Firmware IEEE 1275 device-tree. It's basically a tree of
409nodes, each node having two or more named properties. A property can
410have a value or not.
411
412It is a tree, so each node has one and only one parent except for the
root node, which has no parent.
414
415A node has 2 names. The actual node name is generally contained in a
416property of type "name" in the node property list whose value is a
417zero terminated string and is mandatory for version 1 to 3 of the
418format definition (as it is in Open Firmware). Version 0x10 makes it
419optional as it can generate it from the unit name defined below.
420
There is also a "unit name" that is used to differentiate nodes with
the same name at the same level. It is usually made of the node
name, the "@" sign, and a "unit address", whose definition is
specific to the bus type the node sits on.
425
426The unit name doesn't exist as a property per-se but is included in
427the device-tree structure. It is typically used to represent "path" in
428the device-tree. More details about the actual format of these will be
429below.
430
431The kernel powerpc generic code does not make any formal use of the
432unit address (though some board support code may do) so the only real
433requirement here for the unit address is to ensure uniqueness of
434the node unit name at a given level of the tree. Nodes with no notion
435of address and no possible sibling of the same name (like /memory or
436/cpus) may omit the unit address in the context of this specification,
437or use the "@0" default unit address. The unit name is used to define
438a node "full path", which is the concatenation of all parent node
439unit names separated with "/".
440
441The root node doesn't have a defined name, and isn't required to have
442a name property either if you are using version 3 or earlier of the
443format. It also has no unit address (no @ symbol followed by a unit
444address). The root node unit name is thus an empty string. The full
445path to the root node is "/".
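
The naming rules above can be summarised in a short sketch. The helper
names unit_name() and full_path() are made up for this illustration;
the list passed to full_path() holds the unit names from the first
level below the root down to the node itself.

```python
def unit_name(name, unit_address=None):
    """Build a unit name; the "@address" part is optional."""
    if unit_address is None:
        return name
    return "%s@%s" % (name, unit_address)


def full_path(unit_names):
    """Join unit names with "/"; an empty list denotes the root node."""
    return "/" + "/".join(unit_names)
```

For the CPU node of the example tree later in this chapter, this gives
"/cpus/PowerPC,970@0", and the root node alone gives "/".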
446
Every node which represents an actual device (that is, a node
which isn't only a virtual "container" for more nodes, like "/cpus"
is) is also required to have a "device_type" property indicating the
type of node.
451
452Finally, every node that can be referenced from a property in another
453node is required to have a "linux,phandle" property. Real open
454firmware implementations provide a unique "phandle" value for every
455node that the "prom_init()" trampoline code turns into
456"linux,phandle" properties. However, this is made optional if the
457flattened device tree is used directly. An example of a node
458referencing another node via "phandle" is when laying out the
459interrupt tree which will be described in a further version of this
460document.
461
This "linux,phandle" property is a 32 bit value that uniquely
463identifies a node. You are free to use whatever values or system of
464values, internal pointers, or whatever to generate these, the only
465requirement is that every node for which you provide that property has
466a unique value for it.
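
A generator of flattened trees may want to verify the uniqueness
requirement before emitting the block. The following is only a sketch
of such a check; the mapping from node path to phandle value is an
assumed in-memory representation, not something defined by this
document.

```python
from collections import Counter


def duplicate_phandles(phandles_by_path):
    """Return the phandle values used by more than one node, sorted."""
    counts = Counter(phandles_by_path.values())
    return sorted(value for value, n in counts.items() if n > 1)
```

An empty result means every node that carries the property has a
unique value.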
467
468Here is an example of a simple device-tree. In this example, an "o"
469designates a node followed by the node unit name. Properties are
470presented with their name followed by their content. "content"
471represents an ASCII string (zero terminated) value, while <content>
472represents a 32 bit hexadecimal value. The various nodes in this
473example will be discussed in a later chapter. At this point, it is
only meant to give you an idea of what a device-tree looks like. I have
475purposefully kept the "name" and "linux,phandle" properties which
476aren't necessary in order to give you a better idea of what the tree
477looks like in practice.
478
479 / o device-tree
480 |- name = "device-tree"
481 |- model = "MyBoardName"
482 |- compatible = "MyBoardFamilyName"
483 |- #address-cells = <2>
484 |- #size-cells = <2>
485 |- linux,phandle = <0>
486 |
487 o cpus
488 | | - name = "cpus"
489 | | - linux,phandle = <1>
490 | | - #address-cells = <1>
491 | | - #size-cells = <0>
492 | |
493 | o PowerPC,970@0
494 | |- name = "PowerPC,970"
495 | |- device_type = "cpu"
496 | |- reg = <0>
497 | |- clock-frequency = <5f5e1000>
498 | |- linux,boot-cpu
499 | |- linux,phandle = <2>
500 |
501 o memory@0
502 | |- name = "memory"
503 | |- device_type = "memory"
504 | |- reg = <00000000 00000000 00000000 20000000>
505 | |- linux,phandle = <3>
506 |
507 o chosen
508 |- name = "chosen"
509 |- bootargs = "root=/dev/sda2"
510 |- linux,platform = <00000600>
511 |- linux,phandle = <4>
512
513This tree is almost a minimal tree. It pretty much contains the
514minimal set of required nodes and properties to boot a linux kernel;
that is, some basic model information at the root, the CPUs, and the
516physical memory layout. It also includes misc information passed
517through /chosen, like in this example, the platform type (mandatory)
518and the kernel command line arguments (optional).
519
520The /cpus/PowerPC,970@0/linux,boot-cpu property is an example of a
521property without a value. All other properties have a value. The
522significance of the #address-cells and #size-cells properties will be
explained in chapter III which defines precisely the required nodes and
524properties and their content.
525
526
3) Device tree "structure" block
--------------------------------

The structure of the device tree is a linearized tree structure. The
"OF_DT_BEGIN_NODE" token starts a new node, and the "OF_DT_END_NODE"
ends that node definition. Child nodes are simply defined before
"OF_DT_END_NODE" (that is nodes within the node). A 'token' is a 32
bit value. The tree has to be "finished" with an OF_DT_END token.
534
535Here's the basic structure of a single node:
536
     * token OF_DT_BEGIN_NODE (that is 0x00000001)
     * for version 1 to 3, this is the node full path as a zero
       terminated string, starting with "/". For version 16 and later,
       this is the node unit name only (or an empty string for the
       root node)
     * [align gap to next 4 bytes boundary]
     * for each property:
        * token OF_DT_PROP (that is 0x00000003)
        * 32 bit value of property value size in bytes (or 0 if no
          value)
        * 32 bit value of offset in string block of property name
        * property value data if any
        * [align gap to next 4 bytes boundary]
     * [child nodes if any]
     * token OF_DT_END_NODE (that is 0x00000002)
552
So the node content can be summarised as a start token, a full path
(or, from version 16 on, a unit name), a list of properties, a list
of child nodes and an end token. Every child node is a full node
structure itself as defined above.
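
To make the node layout concrete, here is a minimal sketch that
serialises one node, with its children, in the version 16 form (unit
name only). The in-memory representation (tuples of unit name,
property list and child list) and the name_offset callback are
assumptions of this example; the terminating OF_DT_END token is left
to the caller.

```python
import struct

OF_DT_BEGIN_NODE = 0x00000001
OF_DT_END_NODE = 0x00000002
OF_DT_PROP = 0x00000003


def pad4(data):
    """Pad data with zero bytes up to the next 4 byte boundary."""
    return data + b"\0" * (-len(data) % 4)


def flatten_node(unit_name, props, children, name_offset):
    """Serialise one node in the version 16 layout.

    props is a list of (name, value_bytes) pairs, children is a list of
    (unit_name, props, children) tuples, and name_offset maps a property
    name to its offset in the strings block.
    """
    out = struct.pack(">I", OF_DT_BEGIN_NODE)
    out += pad4(unit_name.encode("ascii") + b"\0")
    for name, value in props:
        out += struct.pack(">III", OF_DT_PROP, len(value), name_offset(name))
        out += pad4(value)
    for child in children:
        out += flatten_node(*child, name_offset)
    out += struct.pack(">I", OF_DT_END_NODE)
    return out
```

The root node is passed with an empty unit name, which serialises as a
single zero byte padded to 4 bytes.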
556
4) Device tree "strings" block
------------------------------
558
559In order to save space, property names, which are generally redundant,
560are stored separately in the "strings" block. This block is simply the
561whole bunch of zero terminated strings for all property names
562concatenated together. The device-tree property definitions in the
563structure block will contain offset values from the beginning of the
564strings block.
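
The following sketch shows one possible way to build such a block and
to look a name up again from its offset. Storing each distinct name
only once is a choice made for this example; the function names are
hypothetical.

```python
def build_strings_block(prop_names):
    """Return (block, offsets) with each distinct name stored once."""
    block = b""
    offsets = {}
    for name in prop_names:
        if name not in offsets:
            offsets[name] = len(block)
            block += name.encode("ascii") + b"\0"
    return block, offsets


def name_at(block, offset):
    """Read the zero terminated property name stored at offset."""
    end = block.index(b"\0", offset)
    return block[offset:end].decode("ascii")
```

The offsets returned here are the values that go into the third word
of each property record of the structure block.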
565
566
III - Required content of the device tree
=========================================
569
570WARNING: All "linux,*" properties defined in this document apply only
571to a flattened device-tree. If your platform uses a real
572implementation of Open Firmware or an implementation compatible with
573the Open Firmware client interface, those properties will be created
574by the trampoline code in the kernel's prom_init() file. For example,
575that's where you'll have to add code to detect your board model and
set the platform number. However, when using the flattened device-tree
577entry point, there is no prom_init() pass, and thus you have to
578provide those properties yourself.
579
580
1) Note about cells and address representation
----------------------------------------------
583
584The general rule is documented in the various Open Firmware
documentations. If you choose to describe a bus with the device-tree
and there exists an OF bus binding, then you should follow the
587specification. However, the kernel does not require every single
588device or bus to be described by the device tree.
589
590In general, the format of an address for a device is defined by the
591parent bus type, based on the #address-cells and #size-cells
592property. In the absence of such a property, the parent's parent
593values are used, etc... The kernel requires the root node to have
594those properties defining addresses format for devices directly mapped
595on the processor bus.
596
597Those 2 properties define 'cells' for representing an address and a
598size. A "cell" is a 32 bit number. For example, if both contain 2
599like the example tree given above, then an address and a size are both
600composed of 2 cells, and each is a 64 bit number (cells are
601concatenated and expected to be in big endian format). Another example
602is the way Apple firmware defines them, with 2 cells for an address
603and one cell for a size. Most 32-bit implementations should define
604#address-cells and #size-cells to 1, which represents a 32-bit value.
605Some 32-bit processors allow for physical addresses greater than 32
606bits; these processors should define #address-cells as 2.
607
608"reg" properties are always a tuple of the type "address size" where
609the number of cells of address and size is specified by the bus
610#address-cells and #size-cells. When a bus supports various address
611spaces and other flags relative to a given address allocation (like
612prefetchable, etc...) those flags are usually added to the top level
613bits of the physical address. For example, a PCI physical address is
614made of 3 cells, the bottom two containing the actual address itself
615while the top cell contains address space indication, flags, and pci
616bus & device numbers.
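
As an illustration of the cell rules for a simple bus with no flag
bits, the sketch below decodes the raw bytes of a "reg" property into
(address, size) pairs. The function names are made up for this
example, and bus-specific encodings such as the PCI one are not
handled.

```python
import struct


def cells_to_int(cells):
    """Concatenate big endian 32 bit cells into one integer."""
    value = 0
    for cell in cells:
        value = (value << 32) | cell
    return value


def decode_reg(data, address_cells, size_cells):
    """Split raw "reg" bytes into a list of (address, size) tuples."""
    stride = address_cells + size_cells
    if stride == 0 or len(data) % (4 * stride):
        raise ValueError("reg length does not match the cell counts")
    cells = struct.unpack(">%dI" % (len(data) // 4), data)
    return [
        (cells_to_int(cells[i:i + address_cells]),
         cells_to_int(cells[i + address_cells:i + stride]))
        for i in range(0, len(cells), stride)
    ]
```

With the 2/2 format of the example tree, the memory node's "reg" value
decodes to a single range starting at 0 with a size of 0x20000000
bytes.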
617
For busses that support dynamic allocation, it's the accepted practice
to not provide the address in "reg" (keep it 0) while providing a
flag indicating the address is dynamically allocated, and then to
provide a separate "assigned-addresses" property that contains the
fully allocated addresses. See the PCI OF bindings for details.
624
625In general, a simple bus with no address space bits and no dynamic
626allocation is preferred if it reflects your hardware, as the existing
627kernel address parsing functions will work out of the box. If you
628define a bus type with a more complex address format, including things
629like address space bits, you'll have to add a bus translator to the
630prom_parse.c file of the recent kernels for your bus type.
631
The "reg" property only defines addresses and sizes (if #size-cells
is non-0) within a given bus. In order to translate addresses upward
(that is into parent bus addresses, and possibly into cpu physical
addresses), all busses must contain a "ranges" property. If the
"ranges" property is missing at a given level, it's assumed that
translation isn't possible. The format of the "ranges" property for a
bus is a list of:
640
641 bus address, parent bus address, size
642
643"bus address" is in the format of the bus this bus node is defining,
644that is, for a PCI bridge, it would be a PCI address. Thus, (bus
645address, size) defines a range of addresses for child devices. "parent
646bus address" is in the format of the parent bus of this bus. For
647example, for a PCI host controller, that would be a CPU address. For a
648PCI<->ISA bridge, that would be a PCI address. It defines the base
649address in the parent bus where the beginning of that range is mapped.
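
The translation of one level can be sketched as follows, assuming the
"ranges" entries have already been decoded into integer triplets. The
function name translate() is hypothetical, and address space flag bits
are not taken into account.

```python
def translate(child_addr, ranges):
    """Map a child bus address to the parent bus, or None if unmapped.

    ranges is a list of (bus_address, parent_bus_address, size) integers.
    """
    for bus_addr, parent_addr, size in ranges:
        if bus_addr <= child_addr < bus_addr + size:
            return parent_addr + (child_addr - bus_addr)
    return None
```

For example, with a single entry (0, 0xe0000000, 0x00100000), a device
at a made-up offset 0x24520 on the child bus is reached at 0xe0024520
on the parent bus. Applying the function once per bus level, from the
device up to the root, yields the cpu physical address.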
650
651For a new 64 bit powerpc board, I recommend either the 2/2 format or
652Apple's 2/1 format which is slightly more compact since sizes usually
653fit in a single 32 bit word. New 32 bit powerpc boards should use a
6541/1 format, unless the processor supports physical addresses greater
655than 32-bits, in which case a 2/1 format is recommended.
656
657
2) Note about "compatible" properties
-------------------------------------
660
661These properties are optional, but recommended in devices and the root
662node. The format of a "compatible" property is a list of concatenated
663zero terminated strings. They allow a device to express its
664compatibility with a family of similar devices, in some cases,
665allowing a single driver to match against several devices regardless
666of their actual names.
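
A minimal sketch of how such a value could be decoded and matched is
given below. The helper names and the "acme,..." strings in the usage
note are invented for the example and do not refer to real devices.

```python
def parse_compatible(value):
    """Split a "compatible" value into its zero terminated strings."""
    if not value:
        return []
    if not value.endswith(b"\0"):
        raise ValueError("compatible value is not zero terminated")
    return [s.decode("ascii") for s in value[:-1].split(b"\0")]


def matches(value, driver_ids):
    """True if any entry of the property is one the driver supports."""
    return any(entry in driver_ids for entry in parse_compatible(value))
```

A device exposing b"acme,eth-v2\0acme,eth\0" would thus be picked up
by a driver that only knows about "acme,eth".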
667
3) Note about "name" properties
-------------------------------
670
671While earlier users of Open Firmware like OldWorld macintoshes tended
672to use the actual device name for the "name" property, it's nowadays
673considered a good practice to use a name that is closer to the device
674class (often equal to device_type). For example, nowadays, ethernet
675controllers are named "ethernet", an additional "model" property
676defining precisely the chip type/model, and "compatible" property
defining the family in case a single driver can drive more than one
678of these chips. However, the kernel doesn't generally put any
679restriction on the "name" property; it is simply considered good
680practice to follow the standard and its evolutions as closely as
681possible.
682
683Note also that the new format version 16 makes the "name" property
684optional. If it's absent for a node, then the node's unit name is then
685used to reconstruct the name. That is, the part of the unit name
686before the "@" sign is used (or the entire unit name if no "@" sign
687is present).
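
The reconstruction rule is simple enough to state in a few lines. The
function name below is hypothetical.

```python
def name_from_unit_name(unit_name):
    """Return the part of a unit name before the first "@" sign."""
    return unit_name.split("@", 1)[0]
```

So "PowerPC,970@0" yields "PowerPC,970", and "chosen", which has no
"@" sign, is returned unchanged.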
688
4) Note about node and property names and character set
-------------------------------------------------------
691
While open firmware provides more flexible usage of 8859-1, this
specification enforces more strict rules. Node and property names
should consist only of ASCII characters 'a' to 'z', '0' to
695'9', ',', '.', '_', '+', '#', '?', and '-'. Node names additionally
696allow uppercase characters 'A' to 'Z' (property names should be
697lowercase. The fact that vendors like Apple don't respect this rule is
698irrelevant here). Additionally, node and property names should always
699begin with a character in the range 'a' to 'z' (or 'A' to 'Z' for node
700names).
701
702The maximum number of characters for both nodes and property names
703is 31. In the case of node names, this is only the leftmost part of
704a unit name (the pure "name" property), it doesn't include the unit
705address which can extend beyond that limit.
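
These rules can be expressed as two regular expressions, as in the
sketch below. It applies the rules literally as written above, so a
standard property name such as "#address-cells" fails the first
character rule; a real checker would presumably need an exception for
such names. The function names are assumptions of this example.

```python
import re

# First character: a letter; then up to 30 more allowed characters,
# for a maximum of 31 characters in total.
_PROP_RE = re.compile(r"[a-z][a-z0-9,._+#?-]{0,30}")
_NODE_RE = re.compile(r"[A-Za-z][A-Za-z0-9,._+#?-]{0,30}")


def valid_property_name(name):
    """Check a property name against the character set and length rules."""
    return _PROP_RE.fullmatch(name) is not None


def valid_node_name(unit_name):
    """Check the name part of a unit name; the unit address is ignored."""
    name = unit_name.split("@", 1)[0]
    return _NODE_RE.fullmatch(name) is not None
```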
706
707
5) Required nodes and properties
--------------------------------
710 These are all that are currently required. However, it is strongly
711 recommended that you expose PCI host bridges as documented in the
712 PCI binding to open firmware, and your interrupt tree as documented
713 in OF interrupt tree specification.
714
715 a) The root node
716
717 The root node requires some properties to be present:
718
719 - model : this is your board name/model
720 - #address-cells : address representation for "root" devices
721 - #size-cells: the size representation for "root" devices
722
723 Additionally, some recommended properties are:
724
725 - compatible : the board "family" generally finds its way here,
726 for example, if you have 2 board models with a similar layout,
727 that typically get driven by the same platform code in the
728 kernel, you would use a different "model" property but put a
729 value in "compatible". The kernel doesn't directly use that
 value (see /chosen/linux,platform for how the kernel chooses a
731 platform type) but it is generally useful.
732
733 The root node is also generally where you add additional properties
734 specific to your board like the serial number if any, that sort of
 thing. It is recommended that if you add any "custom" property whose
736 name may clash with standard defined ones, you prefix them with your
737 vendor name and a comma.
738
739 b) The /cpus node
740
741 This node is the parent of all individual CPU nodes. It doesn't
742 have any specific requirements, though it's generally good practice
743 to have at least:
744
745 #address-cells = <00000001>
746 #size-cells = <00000000>
747
748 This defines that the "address" for a CPU is a single cell, and has
749 no meaningful size. This is not necessary but the kernel will assume
750 that format when reading the "reg" properties of a CPU node, see
 below.
752
753 c) The /cpus/* nodes
754
755 So under /cpus, you are supposed to create a node for every CPU on
756 the machine. There is no specific restriction on the name of the
 CPU, though it's common practice to call it PowerPC,<name>. For
758 example, Apple uses PowerPC,G5 while IBM uses PowerPC,970FX.
759
760 Required properties:
761
762 - device_type : has to be "cpu"
763 - reg : This is the physical cpu number, it's a single 32 bit cell
764 and is also used as-is as the unit number for constructing the
765 unit name in the full path. For example, with 2 CPUs, you would
766 have the full path:
767 /cpus/PowerPC,970FX@0
768 /cpus/PowerPC,970FX@1
769 (unit addresses do not require leading zeroes)
770 - d-cache-line-size : one cell, L1 data cache line size in bytes
771 - i-cache-line-size : one cell, L1 instruction cache line size in
772 bytes
773 - d-cache-size : one cell, size of L1 data cache in bytes
774 - i-cache-size : one cell, size of L1 instruction cache in bytes
 - linux,boot-cpu : Should be defined if this cpu is the boot cpu.
776
777 Recommended properties:
778
779 - timebase-frequency : a cell indicating the frequency of the
780 timebase in Hz. This is not directly used by the generic code,
781 but you are welcome to copy/paste the pSeries code for setting
782 the kernel timebase/decrementer calibration based on this
783 value.
784 - clock-frequency : a cell indicating the CPU core clock frequency
785 in Hz. A new property will be defined for 64 bit values, but if
 your frequency is < 4GHz, one cell is enough. Here as well as
787 for the above, the common code doesn't use that property, but
788 you are welcome to re-use the pSeries or Maple one. A future
789 kernel version might provide a common function for this.
790
791 You are welcome to add any property you find relevant to your board,
792 like some information about the mechanism used to soft-reset the
793 CPUs. For example, Apple puts the GPIO number for CPU soft reset
794 lines in there as a "soft-reset" property since they start secondary
795 CPUs by soft-resetting them.
796
797
798 d) the /memory node(s)
799
800 To define the physical memory layout of your board, you should
801 create one or more memory node(s). You can either create a single
802 node with all memory ranges in its reg property, or you can create
803 several nodes, as you wish. The unit address (@ part) used for the
804 full path is the address of the first range of memory defined by a
805 given node. If you use a single memory node, this will typically be
806 @0.
807
808 Required properties:
809
810 - device_type : has to be "memory"
811 - reg : This property contains all the physical memory ranges of
812 your board. It's a list of addresses/sizes concatenated
813 together, with the number of cells of each defined by the
814 #address-cells and #size-cells of the root node. For example,
 with both of these properties being 2 like in the example given
 earlier, a 970 based machine with 6GB of RAM could typically
817 have a "reg" property here that looks like:
818
819 00000000 00000000 00000000 80000000
820 00000001 00000000 00000001 00000000
821
822 That is a range starting at 0 of 0x80000000 bytes and a range
823 starting at 0x100000000 and of 0x100000000 bytes. You can see
 that there is no memory covering the IO hole between 2GB and
 4GB. Some vendors prefer splitting those ranges into smaller
826 segments, but the kernel doesn't care.
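
The arithmetic behind this example can be checked with a few lines.
The sketch assumes the 2/2 cell format used above; the helper name
pairs_2_2() is made up.

```python
# The eight cells of the example "reg" property, in order.
reg_cells = [0x00000000, 0x00000000, 0x00000000, 0x80000000,
             0x00000001, 0x00000000, 0x00000001, 0x00000000]


def pairs_2_2(cells):
    """Group cells as (address, size) pairs of two cells each."""
    out = []
    for i in range(0, len(cells), 4):
        hi_a, lo_a, hi_s, lo_s = cells[i:i + 4]
        out.append(((hi_a << 32) | lo_a, (hi_s << 32) | lo_s))
    return out


ranges = pairs_2_2(reg_cells)
total = sum(size for _, size in ranges)
hole_start = ranges[0][0] + ranges[0][1]   # end of the first range
hole_end = ranges[1][0]                    # start of the second range
```

The two sizes add up to 6 * 2^30 bytes, and the gap between the end of
the first range (0x80000000) and the start of the second (0x100000000)
is the IO hole mentioned above.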
827
828 e) The /chosen node
829
830 This node is a bit "special". Normally, that's where open firmware
831 puts some variable environment information, like the arguments, or
832 phandle pointers to nodes like the main interrupt controller, or the
833 default input/output devices.
834
835 This specification makes a few of these mandatory, but also defines
836 some linux-specific properties that would be normally constructed by
837 the prom_init() trampoline when booting with an OF client interface,
838 but that you have to provide yourself when using the flattened format.
839
840 Required properties:
841
842 - linux,platform : This is your platform number as assigned by the
843 architecture maintainers
844
845 Recommended properties:
846
847 - bootargs : This zero-terminated string is passed as the kernel
848 command line
849 - linux,stdout-path : This is the full path to your standard
850 console device if any. Typically, if you have serial devices on
851 your board, you may want to put the full path to the one set as
852 the default console in the firmware here, for the kernel to pick
 it up as its own default console. If you look at the function
854 set_preferred_console() in arch/ppc64/kernel/setup.c, you'll see
855 that the kernel tries to find out the default console and has
856 knowledge of various types like 8250 serial ports. You may want
857 to extend this function to add your own.
858 - interrupt-controller : This is one cell containing a phandle
859 value that matches the "linux,phandle" property of your main
860 interrupt controller node. May be used for interrupt routing.
861
862
863 Note that u-boot creates and fills in the chosen node for platforms
864 that use it.
865
866 f) the /soc<SOCname> node
867
868 This node is used to represent a system-on-a-chip (SOC) and must be
869 present if the processor is a SOC. The top-level soc node contains
870 information that is global to all devices on the SOC. The node name
871 should contain a unit address for the SOC, which is the base address
872 of the memory-mapped register set for the SOC. The name of an soc
873 node should start with "soc", and the remainder of the name should
874 represent the part number for the soc. For example, the MPC8540's
875 soc node would be called "soc8540".
876
877 Required properties:
878
879 - device_type : Should be "soc"
880 - ranges : Should be defined as specified in 1) to describe the
881 translation of SOC addresses for memory mapped SOC registers.
882 - bus-frequency: Contains the bus frequency for the SOC node.
883 Typically, the value of this field is filled in by the boot
884 loader.
885
886
887 Recommended properties:
888
889 - reg : This property defines the address and size of the
890 memory-mapped registers that are used for the SOC node itself.
891 It does not include the child device registers - these will be
892 defined inside each child node. The address specified in the
893 "reg" property should match the unit address of the SOC node.
894 - #address-cells : Address representation for "soc" devices. The
895 format of this field may vary depending on whether or not the
896 device registers are memory mapped. For memory mapped
897 registers, this field represents the number of cells needed to
898 represent the address of the registers. For SOCs that do not
899 use MMIO, a special address format should be defined that
900 contains enough cells to represent the required information.
901 See 1) above for more details on defining #address-cells.
902 - #size-cells : Size representation for "soc" devices
903 - #interrupt-cells : Defines the width of cells used to represent
904 interrupts. Typically this value is <2>, which includes a
905 32-bit number that represents the interrupt number, and a
906 32-bit number that represents the interrupt sense and level.
907 This field is only needed if the SOC contains an interrupt
908 controller.
909
910 The SOC node may contain child nodes for each SOC device that the
911 platform uses. Nodes should not be created for devices which exist
912 on the SOC but are not used by a particular platform. See chapter VI
913 for more information on how to specify devices that are part of an
 SOC.
915
916 Example SOC node for the MPC8540:
917
	soc8540@e0000000 {
		#address-cells = <1>;
		#size-cells = <1>;
		#interrupt-cells = <2>;
		device_type = "soc";
		ranges = <00000000 e0000000 00100000>;
		reg = <e0000000 00003000>;
		bus-frequency = <0>;
	};
927
928
929
IV - "dtc", the device tree compiler
====================================
932
933
934dtc source code can be found at
935<http://ozlabs.org/~dgibson/dtc/dtc.tar.gz>
936
937WARNING: This version is still in early development stage; the
938resulting device-tree "blobs" have not yet been validated with the
kernel. The current generated blob lacks a useful reserve map (it will
940be fixed to generate an empty one, it's up to the bootloader to fill
941it up) among others. The error handling needs work, bugs are lurking,
942etc...
943
944dtc basically takes a device-tree in a given format and outputs a
945device-tree in another format. The currently supported formats are:
946
947 Input formats:
948 -------------
949
 - "dtb": "blob" format, that is a flattened device-tree block
   with header all in a binary blob.
 - "dts": "source" format. This is a text file containing a
   "source" for a device-tree. The format is defined later in this
   chapter.
 - "fs" format. This is a representation equivalent to the
   output of /proc/device-tree, that is nodes are directories and
   properties are files.
959
960 Output formats:
961 ---------------
962
963 - "dtb": "blob" format
964 - "dts": "source" format
965 - "asm": assembly language file. This is a file that can be
966 sourced by gas to generate a device-tree "blob". That file can
967 then simply be added to your Makefile. Additionally, the
 assembly file exports some symbols that can be used.
969
970
971The syntax of the dtc tool is
972
973 dtc [-I <input-format>] [-O <output-format>]
974 [-o output-filename] [-V output_version] input_filename
975
976
The "output_version" defines what version of the "blob" format will be
978generated. Supported versions are 1,2,3 and 16. The default is
979currently version 3 but that may change in the future to version 16.
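
For build scripts that drive dtc, the command line can be assembled
from the syntax given above. This sketch only builds the argument
list and does not run anything; the function name and the file names
in the usage note are hypothetical, and the accepted formats and
versions are the ones listed in this chapter.

```python
INPUT_FORMATS = ("dtb", "dts", "fs")
OUTPUT_FORMATS = ("dtb", "dts", "asm")
VERSIONS = (1, 2, 3, 16)


def dtc_command(in_fmt, out_fmt, in_file, out_file=None, version=None):
    """Build the dtc argument list for the syntax described above."""
    if in_fmt not in INPUT_FORMATS:
        raise ValueError("unsupported input format: %s" % in_fmt)
    if out_fmt not in OUTPUT_FORMATS:
        raise ValueError("unsupported output format: %s" % out_fmt)
    if version is not None and version not in VERSIONS:
        raise ValueError("unsupported output version: %s" % version)
    argv = ["dtc", "-I", in_fmt, "-O", out_fmt]
    if out_file is not None:
        argv += ["-o", out_file]
    if version is not None:
        argv += ["-V", str(version)]
    argv.append(in_file)
    return argv
```

For instance, compiling a source file "board.dts" into a version 16
blob "board.dtb" corresponds to
dtc -I dts -O dtb -o board.dtb -V 16 board.dts.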
980
981Additionally, dtc performs various sanity checks on the tree, like the
982uniqueness of linux,phandle properties, validity of strings, etc...
983
The format of the .dts "source" file is "C" like, supports C and C++
style comments.

/ {
};
989
990The above is the "device-tree" definition. It's the only statement
991supported currently at the toplevel.
992
/ {
    property1 = "string_value";     /* define a property containing a 0
                                     * terminated string
                                     */

    property2 = <1234abcd>;         /* define a property containing a
                                     * numerical 32 bits value (hexadecimal)
                                     */

    property3 = <12345678 12345678 deadbeef>;
                                    /* define a property containing 3
                                     * numerical 32 bits values (cells) in
                                     * hexadecimal
                                     */
    property4 = [0a 0b 0c 0d de ea ad be ef];
                                    /* define a property whose content is
                                     * an arbitrary array of bytes
                                     */

    childnode@address {             /* define a child node named "childnode"
                                     * whose unit name is "childnode at
                                     * address"
                                     */

        childprop = "hello\n";      /* define a property "childprop" of
                                     * childnode (in this case, a string)
                                     */
    };
};
1022
1023Nodes can contain other nodes etc... thus defining the hierarchical
1024structure of the tree.
1025
1026Strings support common escape sequences from C: "\n", "\t", "\r",
1027"\(octal value)", "\x(hex value)".
1028
1029It is also suggested that you pipe your source file through cpp (gcc
1030preprocessor) so you can use #include's, #define for constants, etc...
1031
1032Finally, various options are planned but not yet implemented, like
1033automatic generation of phandles, labels (exported to the asm file so
1034you can point to a property content and change it easily from whatever
1035you link the device-tree with), label or path instead of numeric value
1036in some cells to "point" to a node (replaced by a phandle at compile
1037time), export of reserve map address to the asm file, ability to
1038specify reserve map content at compile time, etc...
1039
We may provide a .h include file with common definitions if that
1041proves useful for some properties (like building PCI properties or
1042interrupt maps) though it may be better to add a notion of struct
1043definitions to the compiler...
1044
1045
V - Recommendations for a bootloader
====================================
1048
1049
1050Here are some various ideas/recommendations that have been proposed
1051while all this has been defined and implemented.
1052
1053 - The bootloader may want to be able to use the device-tree itself
1054 and may want to manipulate it (to add/edit some properties,
1055 like physical memory size or kernel arguments). At this point, 2
1056 choices can be made. Either the bootloader works directly on the
1057 flattened format, or the bootloader has its own internal tree
1058 representation with pointers (similar to the kernel one) and
1059 re-flattens the tree when booting the kernel. The former is a bit
1060 more difficult to edit/modify, the later requires probably a bit
1061 more code to handle the tree structure. Note that the structure
1062 format has been designed so it's relatively easy to "insert"
1063 properties or nodes or delete them by just memmoving things
1064 around. It contains no internal offsets or pointers for this
1065 purpose.
1066
 - An example of code for iterating nodes & retrieving properties
   directly from the flattened tree format can be found in the kernel
   file arch/ppc64/kernel/prom.c, look at scan_flat_dt() function,
   its usage in early_init_devtree(), and the corresponding various
   early_init_dt_scan_*() callbacks. That code can be re-used in a
   GPL bootloader, and as the author of that code, I would be happy
   to discuss possible free licencing to any vendor who wishes to
   integrate all or part of this code into a non-GPL bootloader.
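
To illustrate why the absence of internal offsets makes editing by
"memmoving" practical, the sketch below inserts one property record
into a structure block held as a byte string. It assumes the caller
already knows a valid insertion offset (right after a node name or
after another property of the same node) and the offset of the
property name in the strings block; the function names are
hypothetical. Adjusting totalsize and any block offsets in the header
afterwards is left to the caller.

```python
import struct

OF_DT_PROP = 0x00000003


def prop_record(value, name_offset):
    """Encode one property: token, value size, name offset, padded value."""
    padded = value + b"\0" * (-len(value) % 4)
    return struct.pack(">III", OF_DT_PROP, len(value), name_offset) + padded


def insert_property(struct_block, at, value, name_offset):
    """Return a copy of struct_block with a property spliced in at 'at'."""
    if at % 4:
        raise ValueError("insertion offset must be 4 byte aligned")
    record = prop_record(value, name_offset)
    return struct_block[:at] + record + struct_block[at:]
```

Everything after the insertion point simply moves up by the size of
the new record; nothing inside the structure block needs patching.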
1075
1076
1077
VI - System-on-a-chip devices and nodes
=======================================
1080
1081Many companies are now starting to develop system-on-a-chip
1082processors, where the processor core (cpu) and many peripheral devices
1083exist on a single piece of silicon. For these SOCs, an SOC node
1084should be used that defines child nodes for the devices that make
1085up the SOC. While platforms are not required to use this model in
1086order to boot the kernel, it is highly encouraged that all SOC
1087implementations define as complete a flat-device-tree as possible to
1088describe the devices on the SOC. This will allow for the
1089genericization of much of the kernel code.
1090
1091
10921) Defining child nodes of an SOC
1093---------------------------------
1094
1095Each device that is part of an SOC may have its own node entry inside
1096the SOC node. For each device that is included in the SOC, the unit
1097address property represents the address offset for this device's
1098memory-mapped registers in the parent's address space. The parent's
1099address space is defined by the "ranges" property in the top-level soc
1100node. The "reg" property for each node that exists directly under the
1101SOC node should contain the address mapping from the child address space
1102to the parent SOC address space and the size of the device's
1103memory-mapped register file.
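
As an illustration of how "ranges" and "reg" combine, here is a minimal
sketch in C. The names struct soc_range and soc_translate() are
assumptions made for this example, not kernel APIs, and the sketch
assumes a single "ranges" entry with one address cell and one size
cell, as in the MPC8540 example of appendix A.

```c
#include <stdint.h>

/* One <child-base parent-base size> entry of a "ranges" property. */
struct soc_range {
	uint32_t child_base;
	uint32_t parent_base;
	uint32_t size;
};

/*
 * Hypothetical helper: translate an address in the SOC child address
 * space into the parent address space.
 * Returns 0 on success, -1 if the address is outside the range.
 */
static int soc_translate(const struct soc_range *r, uint32_t child_addr,
                         uint32_t *parent_addr)
{
	if (child_addr < r->child_base ||
	    child_addr - r->child_base >= r->size)
		return -1;

	*parent_addr = r->parent_base + (child_addr - r->child_base);
	return 0;
}
```

With ranges = <00000000 e0000000 00100000>, a child whose "reg" starts
at offset 24520 (hexadecimal) would be reached at e0024520 in the
parent address space.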
1104
1105For many devices that may exist inside an SOC, there are predefined
1106specifications for the format of the device tree node. All SOC child
1107nodes should follow these specifications, except where noted in this
1108document.
1109
1110See appendix A for an example partial SOC node definition for the
1111MPC8540.
1112
1113
11142) Specifying interrupt information for SOC devices
1115---------------------------------------------------
1116
1117Each device that is part of an SOC and which generates interrupts
1118should have the following properties:
1119
    - interrupt-parent : contains the phandle of the interrupt
      controller which handles interrupts for this device
    - interrupts : a list of tuples representing the interrupt
      number and the interrupt sense and level for each interrupt
      for this device.
1125
1126This information is used by the kernel to build the interrupt table
1127for the interrupt controllers in the system.
1128
1129Sense and level information should be encoded as follows:
1130
1131 Devices connected to openPIC-compatible controllers should encode
1132 sense and polarity as follows:
1133
1134 0 = high to low edge sensitive type enabled
1135 1 = active low level sensitive type enabled
1136 2 = low to high edge sensitive type enabled
1137 3 = active high level sensitive type enabled
1138
1139 ISA PIC interrupt controllers should adhere to the ISA PIC
1140 encodings listed below:
1141
1142 0 = active low level sensitive type enabled
1143 1 = active high level sensitive type enabled
1144 2 = high to low edge sensitive type enabled
1145 3 = low to high edge sensitive type enabled
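
The two encodings assign different meanings to the same value, so the
value must always be read together with the controller type. A small
sketch in C of the tables above follows; enum pic_kind and
irq_sense_name() are names invented for this example and do not exist
in the kernel.

```c
#include <stddef.h>

/* Which of the two encoding tables applies (hypothetical enum). */
enum pic_kind { PIC_OPENPIC, PIC_ISA };

/*
 * Hypothetical helper: return a description of the sense/level cell
 * for the given controller type, or NULL if the value is not one of
 * the four encodings listed above.
 */
static const char *irq_sense_name(enum pic_kind kind, unsigned int sense)
{
	static const char *const openpic[] = {
		"high to low edge",	/* 0 */
		"active low level",	/* 1 */
		"low to high edge",	/* 2 */
		"active high level",	/* 3 */
	};
	static const char *const isa[] = {
		"active low level",	/* 0 */
		"active high level",	/* 1 */
		"high to low edge",	/* 2 */
		"low to high edge",	/* 3 */
	};

	if (sense > 3)
		return NULL;
	return kind == PIC_OPENPIC ? openpic[sense] : isa[sense];
}
```

For instance, a sense value of 3 means "active high level" behind an
openPIC-compatible controller but "low to high edge" behind an ISA PIC.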
1146
1147
1148
11493) Representing devices without a current OF specification
1150----------------------------------------------------------
1151
1152Currently, there are many devices on SOCs that do not have a standard
1153representation pre-defined as part of the open firmware
1154specifications, mainly because the boards that contain these SOCs are
1155not currently booted using open firmware. This section contains
1156descriptions for the SOC devices for which new nodes have been
1157defined; this list will expand as more and more SOC-containing
1158platforms are moved over to use the flattened-device-tree model.
1159
1160 a) MDIO IO device
1161
1162 The MDIO is a bus to which the PHY devices are connected. For each
1163 device that exists on this bus, a child node should be created. See
1164 the definition of the PHY node below for an example of how to define
1165 a PHY.
1166
1167 Required properties:
1168 - reg : Offset and length of the register set for the device
1169 - device_type : Should be "mdio"
1170 - compatible : Should define the compatible device type for the
1171 mdio. Currently, this is most likely to be "gianfar"
1172
1173 Example:
1174
	mdio@24520 {
		reg = <24520 20>;
		device_type = "mdio";
		compatible = "gianfar";

		ethernet-phy@0 {
			......
		};
	};
1184
1185
1186 b) Gianfar-compatible ethernet nodes
1187
1188 Required properties:
1189
1190 - device_type : Should be "network"
1191 - model : Model of the device. Can be "TSEC", "eTSEC", or "FEC"
1192 - compatible : Should be "gianfar"
1193 - reg : Offset and length of the register set for the device
1194 - address : List of bytes representing the ethernet address of
1195 this controller
1196 - interrupts : <a b> where a is the interrupt number and b is a
1197 field that represents an encoding of the sense and level
1198 information for the interrupt. This should be encoded based on
1199 the information in section 2) depending on the type of interrupt
1200 controller you have.
1201 - interrupt-parent : the phandle for the interrupt controller that
1202 services interrupts for this device.
1203 - phy-handle : The phandle for the PHY connected to this ethernet
1204 controller.
1205
1206 Example:
1207
	ethernet@24000 {
		#size-cells = <0>;
		device_type = "network";
		model = "TSEC";
		compatible = "gianfar";
		reg = <24000 1000>;
		address = [ 00 E0 0C 00 73 00 ];
		interrupts = <d 3 e 3 12 3>;
		interrupt-parent = <40000>;
		phy-handle = <2452000>;
	};
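
The "interrupts" property of the node above is a flat list of cells
that is read two at a time, as <number sense> pairs. A minimal sketch
in C of that split follows; struct irq_spec and split_interrupts() are
hypothetical names, and the sketch assumes two cells per interrupt and
cells already converted to host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* One <a b> tuple of an "interrupts" property. */
struct irq_spec {
	uint32_t number;	/* a: interrupt number */
	uint32_t sense;		/* b: sense/level encoding, see section 2) */
};

/*
 * Hypothetical helper: split a flat cell array into <number sense>
 * pairs. Returns the number of pairs written to `out`, or -1 if the
 * cell count is odd or `out` is too small.
 */
static int split_interrupts(const uint32_t *cells, size_t ncells,
                            struct irq_spec *out, size_t max_out)
{
	size_t i, n = ncells / 2;

	if (ncells % 2 != 0 || n > max_out)
		return -1;

	for (i = 0; i < n; i++) {
		out[i].number = cells[2 * i];
		out[i].sense = cells[2 * i + 1];
	}
	return (int)n;
}
```

Applied to <d 3 e 3 12 3>, this yields three interrupts (d, e and 12,
in hexadecimal), each with sense value 3.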
1219
1220
1221
1222 c) PHY nodes
1223
1224 Required properties:
1225
1226 - device_type : Should be "ethernet-phy"
1227 - interrupts : <a b> where a is the interrupt number and b is a
1228 field that represents an encoding of the sense and level
1229 information for the interrupt. This should be encoded based on
1230 the information in section 2) depending on the type of interrupt
1231 controller you have.
1232 - interrupt-parent : the phandle for the interrupt controller that
1233 services interrupts for this device.
1234 - reg : The ID number for the phy, usually a small integer
1235 - linux,phandle : phandle for this node; likely referenced by an
1236 ethernet controller node.
1237
1238
1239 Example:
1240
	ethernet-phy@0 {
		linux,phandle = <2452000>;
		interrupt-parent = <40000>;
		interrupts = <35 1>;
		reg = <0>;
		device_type = "ethernet-phy";
	};
1248
1249
1250 d) Interrupt controllers
1251
1252 Some SOC devices contain interrupt controllers that are different
1253 from the standard Open PIC specification. The SOC device nodes for
1254 these types of controllers should be specified just like a standard
1255 OpenPIC controller. Sense and level information should be encoded
1256 as specified in section 2) of this chapter for each device that
1257 specifies an interrupt.
1258
1259 Example :
1260
	pic@40000 {
		linux,phandle = <40000>;
		clock-frequency = <0>;
		interrupt-controller;
		#address-cells = <0>;
		reg = <40000 40000>;
		built-in;
		compatible = "chrp,open-pic";
		device_type = "open-pic";
		big-endian;
	};
1272
1273
1274 e) I2C
1275
1276 Required properties :
1277
1278 - device_type : Should be "i2c"
1279 - reg : Offset and length of the register set for the device
1280
1281 Recommended properties :
1282
1283 - compatible : Should be "fsl-i2c" for parts compatible with
1284 Freescale I2C specifications.
1285 - interrupts : <a b> where a is the interrupt number and b is a
1286 field that represents an encoding of the sense and level
1287 information for the interrupt. This should be encoded based on
1288 the information in section 2) depending on the type of interrupt
1289 controller you have.
1290 - interrupt-parent : the phandle for the interrupt controller that
1291 services interrupts for this device.
    - dfsrr : boolean; if defined, indicates that this I2C device has
      a digital filter sampling rate register
    - fsl5200-clocking : boolean; if defined, indicates that this device
      uses the FSL 5200 clocking mechanism.
1296
1297 Example :
1298
	i2c@3000 {
		interrupt-parent = <40000>;
		interrupts = <1b 3>;
		reg = <3000 18>;
		device_type = "i2c";
		compatible = "fsl-i2c";
		dfsrr;
	};
1307
1308
1309 f) Freescale SOC USB controllers
1310
1311 The device node for a USB controller that is part of a Freescale
1312 SOC is as described in the document "Open Firmware Recommended
1313 Practice : Universal Serial Bus" with the following modifications
1314 and additions :
1315
1316 Required properties :
1317 - compatible : Should be "fsl-usb2-mph" for multi port host usb
1318 controllers, or "fsl-usb2-dr" for dual role usb controllers
1319 - phy_type : For multi port host usb controllers, should be one of
1320 "ulpi", or "serial". For dual role usb controllers, should be
1321 one of "ulpi", "utmi", "utmi_wide", or "serial".
1322 - reg : Offset and length of the register set for the device
1323 - port0 : boolean; if defined, indicates port0 is connected for
1324 fsl-usb2-mph compatible controllers. Either this property or
1325 "port1" (or both) must be defined for "fsl-usb2-mph" compatible
1326 controllers.
1327 - port1 : boolean; if defined, indicates port1 is connected for
1328 fsl-usb2-mph compatible controllers. Either this property or
1329 "port0" (or both) must be defined for "fsl-usb2-mph" compatible
1330 controllers.
1331
1332 Recommended properties :
1333 - interrupts : <a b> where a is the interrupt number and b is a
1334 field that represents an encoding of the sense and level
1335 information for the interrupt. This should be encoded based on
1336 the information in section 2) depending on the type of interrupt
1337 controller you have.
1338 - interrupt-parent : the phandle for the interrupt controller that
1339 services interrupts for this device.
1340
1341 Example multi port host usb controller device node :
	usb@22000 {
		device_type = "usb";
		compatible = "fsl-usb2-mph";
		reg = <22000 1000>;
		#address-cells = <1>;
		#size-cells = <0>;
		interrupt-parent = <700>;
		interrupts = <27 1>;
		phy_type = "ulpi";
		port0;
		port1;
	};
1354
1355 Example dual role usb controller device node :
	usb@23000 {
		device_type = "usb";
		compatible = "fsl-usb2-dr";
		reg = <23000 1000>;
		#address-cells = <1>;
		#size-cells = <0>;
		interrupt-parent = <700>;
		interrupts = <26 1>;
		phy_type = "ulpi";
	};
1366
1367
1368 More devices will be defined as this spec matures.
1369
1370
1371Appendix A - Sample SOC node for MPC8540
1372========================================
1373
1374Note that the #address-cells and #size-cells for the SoC node
1375in this example have been explicitly listed; these are likely
1376not necessary as they are usually the same as the root node.
1377
	soc8540@e0000000 {
		#address-cells = <1>;
		#size-cells = <1>;
		#interrupt-cells = <2>;
		device_type = "soc";
		ranges = <00000000 e0000000 00100000>;
		reg = <e0000000 00003000>;
		bus-frequency = <0>;

		mdio@24520 {
			reg = <24520 20>;
			device_type = "mdio";
			compatible = "gianfar";

			ethernet-phy@0 {
				linux,phandle = <2452000>;
				interrupt-parent = <40000>;
				interrupts = <35 1>;
				reg = <0>;
				device_type = "ethernet-phy";
			};

			ethernet-phy@1 {
				linux,phandle = <2452001>;
				interrupt-parent = <40000>;
				interrupts = <35 1>;
				reg = <1>;
				device_type = "ethernet-phy";
			};

			ethernet-phy@3 {
				linux,phandle = <2452002>;
				interrupt-parent = <40000>;
				interrupts = <35 1>;
				reg = <3>;
				device_type = "ethernet-phy";
			};

		};

		ethernet@24000 {
			#size-cells = <0>;
			device_type = "network";
			model = "TSEC";
			compatible = "gianfar";
			reg = <24000 1000>;
			address = [ 00 E0 0C 00 73 00 ];
			interrupts = <d 3 e 3 12 3>;
			interrupt-parent = <40000>;
			phy-handle = <2452000>;
		};

		ethernet@25000 {
			#address-cells = <1>;
			#size-cells = <0>;
			device_type = "network";
			model = "TSEC";
			compatible = "gianfar";
			reg = <25000 1000>;
			address = [ 00 E0 0C 00 73 01 ];
			interrupts = <13 3 14 3 18 3>;
			interrupt-parent = <40000>;
			phy-handle = <2452001>;
		};

		ethernet@26000 {
			#address-cells = <1>;
			#size-cells = <0>;
			device_type = "network";
			model = "FEC";
			compatible = "gianfar";
			reg = <26000 1000>;
			address = [ 00 E0 0C 00 73 02 ];
			interrupts = <19 3>;
			interrupt-parent = <40000>;
			phy-handle = <2452002>;
		};

		serial@4500 {
			device_type = "serial";
			compatible = "ns16550";
			reg = <4500 100>;
			clock-frequency = <0>;
			interrupts = <1a 3>;
			interrupt-parent = <40000>;
		};

		pic@40000 {
			linux,phandle = <40000>;
			clock-frequency = <0>;
			interrupt-controller;
			#address-cells = <0>;
			reg = <40000 40000>;
			built-in;
			compatible = "chrp,open-pic";
			device_type = "open-pic";
			big-endian;
		};

		i2c@3000 {
			interrupt-parent = <40000>;
			interrupts = <1b 3>;
			reg = <3000 18>;
			device_type = "i2c";
			compatible = "fsl-i2c";
			dfsrr;
		};

	};
diff --git a/Documentation/scsi/ChangeLog.megaraid_sas b/Documentation/scsi/ChangeLog.megaraid_sas
new file mode 100644
index 000000000000..2dafa63bd370
--- /dev/null
+++ b/Documentation/scsi/ChangeLog.megaraid_sas
@@ -0,0 +1,47 @@
11 Release Date : Wed Feb 03 14:31:44 PST 2006 - Sumant Patro <Sumant.Patro@lsil.com>
22 Current Version : 00.00.02.04
33 Older Version : 00.00.02.04
4
5i. Support for 1078 type (ppc IOP) controller, device id : 0x60 added.
6 During initialization, depending on the device id, the template members
7 are initialized with function pointers specific to the ppc or
8 xscale controllers.
9
10 -Sumant Patro <Sumant.Patro@lsil.com>
11
121 Release Date : Fri Feb 03 14:16:25 PST 2006 - Sumant Patro
13 <Sumant.Patro@lsil.com>
142 Current Version : 00.00.02.04
153 Older Version : 00.00.02.02
16i. Register 16 byte CDB capability with scsi midlayer
17
 "This patch properly registers the 16 byte command length capability of the
 megaraid_sas controlled hardware with the scsi midlayer. All megaraid_sas
 hardware supports 16 byte CDB's."
21
22 -Joshua Giles <joshua_giles@dell.com>
23
241 Release Date : Mon Jan 23 14:09:01 PST 2006 - Sumant Patro <Sumant.Patro@lsil.com>
252 Current Version : 00.00.02.02
263 Older Version : 00.00.02.01
27
i. New template defined to represent each family of controllers (identified by processor used).
 The template will have definitions that will be initialised to appropriate values for a specific family of controllers. The template definition has four function pointers. During driver initialisation the function pointers will be set based on the controller family type. This change is done to support new controllers that have different processors and thus different register sets.
30
31 -Sumant Patro <Sumant.Patro@lsil.com>
32
331 Release Date : Mon Dec 19 14:36:26 PST 2005 - Sumant Patro <Sumant.Patro@lsil.com>
342 Current Version : 00.00.02.00-rc4
353 Older Version : 00.00.02.01
36
37i. Code reorganized to remove code duplication in megasas_build_cmd.
38
 "There's a lot of duplicate code in megasas_build_cmd. Move that out of the different codepaths and merge the remainder of megasas_build_cmd into megasas_queue_command"
40
41 - Christoph Hellwig <hch@lst.de>
42
43ii. Defined MEGASAS_IOC_FIRMWARE32 for code paths that handles 32 bit applications in 64 bit systems.
44
45 "MEGASAS_IOC_FIRMWARE can't be redefined if CONFIG_COMPAT is set, we need to define a MEGASAS_IOC_FIRMWARE32 define so native binaries continue to work"
46
47 - Christoph Hellwig <hch@lst.de>
diff --git a/Documentation/scsi/aic79xx.txt b/Documentation/scsi/aic79xx.txt
index 0aeef740a95a..382b439b439e 100644
--- a/Documentation/scsi/aic79xx.txt
+++ b/Documentation/scsi/aic79xx.txt
@@ -1,5 +1,5 @@
1==================================================================== 1====================================================================
2= Adaptec Ultra320 Family Manager Set v1.3.11 = 2= Adaptec Ultra320 Family Manager Set =
3= = 3= =
4= README for = 4= README for =
5= The Linux Operating System = 5= The Linux Operating System =
@@ -63,6 +63,11 @@ The following information is available in this file:
63 68-pin) 63 68-pin)
642. Version History 642. Version History
65 65
66 3.0 (December 1st, 2005)
67 - Updated driver to use SCSI transport class infrastructure
68 - Upported sequencer and core fixes from adaptec released
69 version 2.0.15 of the driver.
70
66 1.3.11 (July 11, 2003) 71 1.3.11 (July 11, 2003)
67 - Fix several deadlock issues. 72 - Fix several deadlock issues.
68 - Add 29320ALP and 39320B Id's. 73 - Add 29320ALP and 39320B Id's.
@@ -194,7 +199,7 @@ The following information is available in this file:
194 supported) 199 supported)
195 - Support for the PCI-X standard up to 133MHz 200 - Support for the PCI-X standard up to 133MHz
196 - Support for the PCI v2.2 standard 201 - Support for the PCI v2.2 standard
197 - Domain Validation 202 - Domain Validation
198 203
199 2.2. Operating System Support: 204 2.2. Operating System Support:
200 - Redhat Linux 7.2, 7.3, 8.0, Advanced Server 2.1 205 - Redhat Linux 7.2, 7.3, 8.0, Advanced Server 2.1
@@ -411,77 +416,53 @@ The following information is available in this file:
411 http://www.adaptec.com. 416 http://www.adaptec.com.
412 417
413 418
4145. Contacting Adaptec 4195. Adaptec Customer Support
415 420
416 A Technical Support Identification (TSID) Number is required for 421 A Technical Support Identification (TSID) Number is required for
417 Adaptec technical support. 422 Adaptec technical support.
418 - The 12-digit TSID can be found on the white barcode-type label 423 - The 12-digit TSID can be found on the white barcode-type label
419 included inside the box with your product. The TSID helps us 424 included inside the box with your product. The TSID helps us
420 provide more efficient service by accurately identifying your 425 provide more efficient service by accurately identifying your
421 product and support status. 426 product and support status.
427
422 Support Options 428 Support Options
423 - Search the Adaptec Support Knowledgebase (ASK) at 429 - Search the Adaptec Support Knowledgebase (ASK) at
424 http://ask.adaptec.com for articles, troubleshooting tips, and 430 http://ask.adaptec.com for articles, troubleshooting tips, and
425 frequently asked questions for your product. 431 frequently asked questions about your product.
426 - For support via Email, submit your question to Adaptec's 432 - For support via Email, submit your question to Adaptec's
427 Technical Support Specialists at http://ask.adaptec.com. 433 Technical Support Specialists at http://ask.adaptec.com/.
428 434
429 North America 435 North America
430 - Visit our Web site at http://www.adaptec.com. 436 - Visit our Web site at http://www.adaptec.com/.
431 - To speak with a Fibre Channel/RAID/External Storage Technical 437 - For information about Adaptec's support options, call
432 Support Specialist, call 1-321-207-2000, 438 408-957-2550, 24 hours a day, 7 days a week.
433 Hours: Monday-Friday, 3:00 A.M. to 5:00 P.M., PST. 439 - To speak with a Technical Support Specialist,
434 (Not open on holidays) 440 * For hardware products, call 408-934-7274,
435 - For Technical Support in all other technologies including 441 Monday to Friday, 3:00 am to 5:00 pm, PDT.
436 SCSI, call 1-408-934-7274, 442 * For RAID and Fibre Channel products, call 321-207-2000,
437 Hours: Monday-Friday, 6:00 A.M. to 5:00 P.M., PST. 443 Monday to Friday, 3:00 am to 5:00 pm, PDT.
438 (Not open on holidays) 444 To expedite your service, have your computer with you.
439 - For after hours support, call 1-800-416-8066 ($99/call, 445 - To order Adaptec products, including accessories and cables,
440 $149/call on holidays) 446 call 408-957-7274. To order cables online go to
441 - To order Adaptec products including software and cables, call 447 http://www.adaptec.com/buy-cables/.
442 1-800-442-7274 or 1-408-957-7274. You can also visit our
443 online store at http://www.adaptecstore.com
444 448
445 Europe 449 Europe
446 - Visit our Web site at http://www.adaptec-europe.com. 450 - Visit our Web site at http://www.adaptec-europe.com/.
447 - English and French: To speak with a Technical Support 451 - To speak with a Technical Support Specialist, call, or email,
448 Specialist, call one of the following numbers: 452 * German: +49 89 4366 5522, Monday-Friday, 9:00-17:00 CET,
449 - English: +32-2-352-3470 453 http://ask-de.adaptec.com/.
450 - French: +32-2-352-3460 454 * French: +49 89 4366 5533, Monday-Friday, 9:00-17:00 CET,
451 Hours: Monday-Thursday, 10:00 to 12:30, 13:30 to 17:30 CET 455 http://ask-fr.adaptec.com/.
452 Friday, 10:00 to 12:30, 13:30 to 16:30 CET 456 * English: +49 89 4366 5544, Monday-Friday, 9:00-17:00 GMT,
453 - German: To speak with a Technical Support Specialist, 457 http://ask.adaptec.com/.
454 call +49-89-456-40660 458 - You can order Adaptec cables online at
455 Hours: Monday-Thursday, 09:30 to 12:30, 13:30 to 16:30 CET 459 http://www.adaptec.com/buy-cables/.
456 Friday, 09:30 to 12:30, 13:30 to 15:00 CET
457 - To order Adaptec products, including accessories and cables:
458 - UK: +0800-96-65-26 or fax +0800-731-02-95
459 - Other European countries: +32-11-300-379
460
461 Australia and New Zealand
462 - Visit our Web site at http://www.adaptec.com.au.
463 - To speak with a Technical Support Specialist, call
464 +612-9416-0698
465 Hours: Monday-Friday, 10:00 A.M. to 4:30 P.M., EAT
466 (Not open on holidays)
467 460
468 Japan 461 Japan
462 - Visit our web site at http://www.adaptec.co.jp/.
469 - To speak with a Technical Support Specialist, call 463 - To speak with a Technical Support Specialist, call
470 +81-3-5308-6120 464 +81 3 5308 6120, Monday-Friday, 9:00 a.m. to 12:00 p.m.,
471 Hours: Monday-Friday, 9:00 a.m. to 12:00 p.m., 1:00 p.m. to 465 1:00 p.m. to 6:00 p.m.
472 6:00 p.m. TSC
473
474 Hong Kong and China
475 - To speak with a Technical Support Specialist, call
476 +852-2869-7200
477 Hours: Monday-Friday, 10:00 to 17:00.
478 - Fax Technical Support at +852-2869-7100.
479
480 Singapore
481 - To speak with a Technical Support Specialist, call
482 +65-245-7470
483 Hours: Monday-Friday, 10:00 to 17:00.
484 - Fax Technical Support at +852-2869-7100
485 466
486------------------------------------------------------------------- 467-------------------------------------------------------------------
487/* 468/*
diff --git a/Documentation/scsi/aic7xxx.txt b/Documentation/scsi/aic7xxx.txt
index 47e74ddc4bc9..3481fcded4c2 100644
--- a/Documentation/scsi/aic7xxx.txt
+++ b/Documentation/scsi/aic7xxx.txt
@@ -309,81 +309,57 @@ The following information is available in this file:
309 ----------------------------------------------------------------- 309 -----------------------------------------------------------------
310 310
311 Example: 311 Example:
312 'options aic7xxx aic7xxx=verbose,no_probe,tag_info:{{},{,,10}},seltime:1" 312 'options aic7xxx aic7xxx=verbose,no_probe,tag_info:{{},{,,10}},seltime:1'
313 enables verbose logging, Disable EISA/VLB probing, 313 enables verbose logging, Disable EISA/VLB probing,
314 and set tag depth on Controller 1/Target 2 to 10 tags. 314 and set tag depth on Controller 1/Target 2 to 10 tags.
315 315
3163. Contacting Adaptec 3164. Adaptec Customer Support
317 317
318 A Technical Support Identification (TSID) Number is required for 318 A Technical Support Identification (TSID) Number is required for
319 Adaptec technical support. 319 Adaptec technical support.
320 - The 12-digit TSID can be found on the white barcode-type label 320 - The 12-digit TSID can be found on the white barcode-type label
321 included inside the box with your product. The TSID helps us 321 included inside the box with your product. The TSID helps us
322 provide more efficient service by accurately identifying your 322 provide more efficient service by accurately identifying your
323 product and support status. 323 product and support status.
324
324 Support Options 325 Support Options
325 - Search the Adaptec Support Knowledgebase (ASK) at 326 - Search the Adaptec Support Knowledgebase (ASK) at
326 http://ask.adaptec.com for articles, troubleshooting tips, and 327 http://ask.adaptec.com for articles, troubleshooting tips, and
327 frequently asked questions for your product. 328 frequently asked questions about your product.
328 - For support via Email, submit your question to Adaptec's 329 - For support via Email, submit your question to Adaptec's
329 Technical Support Specialists at http://ask.adaptec.com. 330 Technical Support Specialists at http://ask.adaptec.com/.
330 331
331 North America 332 North America
332 - Visit our Web site at http://www.adaptec.com. 333 - Visit our Web site at http://www.adaptec.com/.
333 - To speak with a Fibre Channel/RAID/External Storage Technical 334 - For information about Adaptec's support options, call
334 Support Specialist, call 1-321-207-2000, 335 408-957-2550, 24 hours a day, 7 days a week.
335 Hours: Monday-Friday, 3:00 A.M. to 5:00 P.M., PST. 336 - To speak with a Technical Support Specialist,
336 (Not open on holidays) 337 * For hardware products, call 408-934-7274,
337 - For Technical Support in all other technologies including 338 Monday to Friday, 3:00 am to 5:00 pm, PDT.
338 SCSI, call 1-408-934-7274, 339 * For RAID and Fibre Channel products, call 321-207-2000,
339 Hours: Monday-Friday, 6:00 A.M. to 5:00 P.M., PST. 340 Monday to Friday, 3:00 am to 5:00 pm, PDT.
340 (Not open on holidays) 341 To expedite your service, have your computer with you.
341 - For after hours support, call 1-800-416-8066 ($99/call, 342 - To order Adaptec products, including accessories and cables,
342 $149/call on holidays) 343 call 408-957-7274. To order cables online go to
343 - To order Adaptec products including software and cables, call 344 http://www.adaptec.com/buy-cables/.
344 1-800-442-7274 or 1-408-957-7274. You can also visit our
345 online store at http://www.adaptecstore.com
346 345
347 Europe 346 Europe
348 - Visit our Web site at http://www.adaptec-europe.com. 347 - Visit our Web site at http://www.adaptec-europe.com/.
349 - English and French: To speak with a Technical Support 348 - To speak with a Technical Support Specialist, call, or email,
350 Specialist, call one of the following numbers: 349 * German: +49 89 4366 5522, Monday-Friday, 9:00-17:00 CET,
351 - English: +32-2-352-3470 350 http://ask-de.adaptec.com/.
352 - French: +32-2-352-3460 351 * French: +49 89 4366 5533, Monday-Friday, 9:00-17:00 CET,
353 Hours: Monday-Thursday, 10:00 to 12:30, 13:30 to 17:30 CET 352 http://ask-fr.adaptec.com/.
354 Friday, 10:00 to 12:30, 13:30 to 16:30 CET 353 * English: +49 89 4366 5544, Monday-Friday, 9:00-17:00 GMT,
355 - German: To speak with a Technical Support Specialist, 354 http://ask.adaptec.com/.
356 call +49-89-456-40660 355 - You can order Adaptec cables online at
357 Hours: Monday-Thursday, 09:30 to 12:30, 13:30 to 16:30 CET 356 http://www.adaptec.com/buy-cables/.
358 Friday, 09:30 to 12:30, 13:30 to 15:00 CET
359 - To order Adaptec products, including accessories and cables:
360 - UK: +0800-96-65-26 or fax +0800-731-02-95
361 - Other European countries: +32-11-300-379
362
363 Australia and New Zealand
364 - Visit our Web site at http://www.adaptec.com.au.
365 - To speak with a Technical Support Specialist, call
366 +612-9416-0698
367 Hours: Monday-Friday, 10:00 A.M. to 4:30 P.M., EAT
368 (Not open on holidays)
369 357
370 Japan 358 Japan
359 - Visit our web site at http://www.adaptec.co.jp/.
371 - To speak with a Technical Support Specialist, call 360 - To speak with a Technical Support Specialist, call
372 +81-3-5308-6120 361 +81 3 5308 6120, Monday-Friday, 9:00 a.m. to 12:00 p.m.,
373 Hours: Monday-Friday, 9:00 a.m. to 12:00 p.m., 1:00 p.m. to 362 1:00 p.m. to 6:00 p.m.
374 6:00 p.m. TSC
375
376 Hong Kong and China
377 - To speak with a Technical Support Specialist, call
378 +852-2869-7200
379 Hours: Monday-Friday, 10:00 to 17:00.
380 - Fax Technical Support at +852-2869-7100.
381
382 Singapore
383 - To speak with a Technical Support Specialist, call
384 +65-245-7470
385 Hours: Monday-Friday, 10:00 to 17:00.
386 - Fax Technical Support at +852-2869-7100
387 363
388------------------------------------------------------------------- 364-------------------------------------------------------------------
389/* 365/*
diff --git a/Documentation/sound/alsa/ALSA-Configuration.txt b/Documentation/sound/alsa/ALSA-Configuration.txt
index d2578013e829..36b511c7cade 100644
--- a/Documentation/sound/alsa/ALSA-Configuration.txt
+++ b/Documentation/sound/alsa/ALSA-Configuration.txt
@@ -837,8 +837,10 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
837 837
838 Module for AC'97 motherboards from Intel and compatibles. 838 Module for AC'97 motherboards from Intel and compatibles.
839 * Intel i810/810E, i815, i820, i830, i84x, MX440 839 * Intel i810/810E, i815, i820, i830, i84x, MX440
840 ICH5, ICH6, ICH7, ESB2
840 * SiS 7012 (SiS 735) 841 * SiS 7012 (SiS 735)
841 * NVidia NForce, NForce2 842 * NVidia NForce, NForce2, NForce3, MCP04, CK804
843 CK8, CK8S, MCP501
842 * AMD AMD768, AMD8111 844 * AMD AMD768, AMD8111
843 * ALi m5455 845 * ALi m5455
844 846
@@ -868,6 +870,12 @@ Prior to version 0.9.0rc4 options had a 'snd_' prefix. This was removed.
868 -------------------- 870 --------------------
869 871
870 Module for Intel ICH (i8x0) chipset MC97 modems. 872 Module for Intel ICH (i8x0) chipset MC97 modems.
873 * Intel i810/810E, i815, i820, i830, i84x, MX440
874 ICH5, ICH6, ICH7
875 * SiS 7013 (SiS 735)
876 * NVidia NForce, NForce2, NForce2s, NForce3
877 * AMD AMD8111
878 * ALi m5455
871 879
872 ac97_clock - AC'97 codec clock base (0 = auto-detect) 880 ac97_clock - AC'97 codec clock base (0 = auto-detect)
873 881
diff --git a/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl b/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl
index e651ed8d1e6f..4251085d38d3 100644
--- a/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl
+++ b/Documentation/sound/alsa/DocBook/writing-an-alsa-driver.tmpl
@@ -5206,14 +5206,14 @@ struct _snd_pcm_runtime {
5206 You need to pass the <function>snd_dma_pci_data(pci)</function>, 5206 You need to pass the <function>snd_dma_pci_data(pci)</function>,
5207 where pci is the struct <structname>pci_dev</structname> pointer 5207 where pci is the struct <structname>pci_dev</structname> pointer
5208 of the chip as well. 5208 of the chip as well.
5209 The <type>snd_sg_buf_t</type> instance is created as 5209 The <type>struct snd_sg_buf</type> instance is created as
5210 substream-&gt;dma_private. You can cast 5210 substream-&gt;dma_private. You can cast
5211 the pointer like: 5211 the pointer like:
5212 5212
5213 <informalexample> 5213 <informalexample>
5214 <programlisting> 5214 <programlisting>
5215<![CDATA[ 5215<![CDATA[
5216 struct snd_sg_buf *sgbuf = (struct snd_sg_buf_t*)substream->dma_private; 5216 struct snd_sg_buf *sgbuf = (struct snd_sg_buf *)substream->dma_private;
5217]]> 5217]]>
5218 </programlisting> 5218 </programlisting>
5219 </informalexample> 5219 </informalexample>
diff --git a/Documentation/spi/butterfly b/Documentation/spi/butterfly
index a2e8c8d90e35..9927af7a629c 100644
--- a/Documentation/spi/butterfly
+++ b/Documentation/spi/butterfly
@@ -12,13 +12,20 @@ You can make this adapter from an old printer cable and solder things
12directly to the Butterfly. Or (if you have the parts and skills) you 12directly to the Butterfly. Or (if you have the parts and skills) you
13can come up with something fancier, providing ciruit protection to the 13can come up with something fancier, providing ciruit protection to the
14Butterfly and the printer port, or with a better power supply than two 14Butterfly and the printer port, or with a better power supply than two
15signal pins from the printer port. 15signal pins from the printer port. Or for that matter, you can use
16similar cables to talk to many AVR boards, even a breadboard.
17
18This is more powerful than "ISP programming" cables since it lets kernel
19SPI protocol drivers interact with the AVR, and could even let the AVR
20issue interrupts to them. Later, your protocol driver should work
21easily with a "real SPI controller", instead of this bitbanger.
16 22
17 23
18The first cable connections will hook Linux up to one SPI bus, with the 24The first cable connections will hook Linux up to one SPI bus, with the
19AVR and a DataFlash chip; and to the AVR reset line. This is all you 25AVR and a DataFlash chip; and to the AVR reset line. This is all you
20need to reflash the firmware, and the pins are the standard Atmel "ISP" 26need to reflash the firmware, and the pins are the standard Atmel "ISP"
21connector pins (used also on non-Butterfly AVR boards). 27connector pins (used also on non-Butterfly AVR boards). On the parport
28side this is like "sp12" programming cables.
22 29
23 Signal Butterfly Parport (DB-25) 30 Signal Butterfly Parport (DB-25)
24 ------ --------- --------------- 31 ------ --------- ---------------
@@ -40,10 +47,14 @@ by clearing PORTB.[0-3]); (b) configure the mtd_dataflash driver; and
40 SELECT = J400.PB0/nSS = pin 17/C3,nSELECT 47 SELECT = J400.PB0/nSS = pin 17/C3,nSELECT
41 GND = J400.GND = pin 24/GND 48 GND = J400.GND = pin 24/GND
42 49
43The "USI" controller, using J405, can be used for a second SPI bus. That 50Or you could flash firmware making the AVR into an SPI slave (keeping the
44would let you talk to the AVR over SPI, running firmware that makes it act 51DataFlash in reset) and tweak the spi_butterfly driver to make it bind to
45as an SPI slave, while letting either Linux or the AVR use the DataFlash. 52the driver for your custom SPI-based protocol.
46There are plenty of spare parport pins to wire this one up, such as: 53
54The "USI" controller, using J405, can also be used for a second SPI bus.
55That would let you talk to the AVR using custom SPI-with-USI firmware,
56while letting either Linux or the AVR use the DataFlash. There are plenty
57of spare parport pins to wire this one up, such as:
47 58
48 Signal Butterfly Parport (DB-25) 59 Signal Butterfly Parport (DB-25)
49 ------ --------- --------------- 60 ------ --------- ---------------
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 9f11d36a8c10..b0c7ab93dcb9 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -16,6 +16,7 @@ before actually making adjustments.
16 16
17Currently, these files might (depending on your configuration) 17Currently, these files might (depending on your configuration)
18show up in /proc/sys/kernel: 18show up in /proc/sys/kernel:
19- acpi_video_flags
19- acct 20- acct
20- core_pattern 21- core_pattern
21- core_uses_pid 22- core_uses_pid
@@ -57,6 +58,15 @@ show up in /proc/sys/kernel:
57 58
58============================================================== 59==============================================================
59 60
61acpi_video_flags:
62
63flags
64
65See Doc*/kernel/power/video.txt. This setting allows the video boot mode
66to be set at run time.
67
68==============================================================
69
60acct: 70acct:
61 71
62highwater lowwater frequency 72highwater lowwater frequency
diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 391dd64363e7..a46c10fcddfc 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -28,6 +28,7 @@ Currently, these files are in /proc/sys/vm:
28- block_dump 28- block_dump
29- drop-caches 29- drop-caches
30- zone_reclaim_mode 30- zone_reclaim_mode
31- zone_reclaim_interval
31 32
32============================================================== 33==============================================================
33 34
@@ -126,15 +127,54 @@ the high water marks for each per cpu page list.
126 127
127zone_reclaim_mode: 128zone_reclaim_mode:
128 129
129This is set during bootup to 1 if it is determined that pages from 130Zone_reclaim_mode allows setting more or less aggressive approaches to
130remote zones will cause a significant performance reduction. The 131reclaim memory when a zone runs out of memory. If it is set to zero then no
132zone reclaim occurs. Allocations will be satisfied from other zones / nodes
133in the system.
134
135This value is the bitwise OR of the following:
136
1371 = Zone reclaim on
1382 = Zone reclaim writes dirty pages out
1394 = Zone reclaim swaps pages
1408 = Also do a global slab reclaim pass
141
142zone_reclaim_mode is set during bootup to 1 if it is determined that pages
143from remote zones will cause a measurable performance reduction. The
131page allocator will then reclaim easily reusable pages (those page 144page allocator will then reclaim easily reusable pages (those page
132cache pages that are currently not used) before going off node. 145cache pages that are currently not used) before allocating off node pages.
146
147It may be beneficial to switch off zone reclaim if the system is
148used for a file server and all of memory should be used for caching files
149from disk. In that case the caching effect is more important than
150data locality.
151
152Allowing zone reclaim to write out pages stops processes that are
153writing large amounts of data from dirtying pages on other nodes. Zone
154reclaim will write out dirty pages if a zone fills up and so effectively
155throttle the process. This may decrease the performance of a single process
156since it cannot use all of system memory to buffer the outgoing writes
157anymore, but it preserves the memory on other nodes so that the performance
158of other processes running on other nodes will not be affected.
159
160Allowing regular swap effectively restricts allocations to the local
161node unless explicitly overridden by memory policies or cpuset
162configurations.
163
164It may be advisable to allow slab reclaim if the system makes heavy
165use of files and builds up large slab caches. However, the slab
166shrink operation is global, may take a long time and free slabs
167in all nodes of the system.
168
169================================================================
170
171zone_reclaim_interval:
172
173The time allowed for off node allocations after zone reclaim
174has failed to reclaim enough pages to allow a local allocation.
133 175
134The user can override this setting. It may be beneficial to switch 176Time is set in seconds and set by default to 30 seconds.
135off zone reclaim if the system is used for a file server and all
136of memory should be used for caching files from disk.
137 177
138It may be beneficial to switch this on if one wants to do zone 178Reduce the interval if undesired off node allocations occur. However, too
139reclaim regardless of the numa distances in the system. 179frequent scans will have a negative impact on off node allocation performance.
140 180
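As a sketch of how the bit values listed for zone_reclaim_mode combine, the
following C fragment ORs them into a single integer and tests individual bits
again. The ZRM_* macro names and both helper functions are illustrative
assumptions; the text itself only defines the numeric values 1, 2, 4 and 8.

```c
/* Bit values as listed for zone_reclaim_mode; the names are hypothetical. */
#define ZRM_RECLAIM	1	/* Zone reclaim on */
#define ZRM_WRITE	2	/* Zone reclaim writes dirty pages out */
#define ZRM_SWAP	4	/* Zone reclaim swaps pages */
#define ZRM_SLAB	8	/* Also do a global slab reclaim pass */

/* OR the selected behaviours into the single integer the sysctl expects. */
int zone_reclaim_mode_value(int reclaim, int write, int swap, int slab)
{
	int mode = 0;

	if (reclaim)
		mode |= ZRM_RECLAIM;
	if (write)
		mode |= ZRM_WRITE;
	if (swap)
		mode |= ZRM_SWAP;
	if (slab)
		mode |= ZRM_SLAB;
	return mode;
}

/* Return non-zero if the given behaviour bit is enabled in mode. */
int zone_reclaim_mode_has(int mode, int bit)
{
	return (mode & bit) != 0;
}
```

For example, zone_reclaim_mode_value(1, 1, 0, 0) yields 3 (reclaim on, dirty
pages written out), which is the value one would then write to
/proc/sys/vm/zone_reclaim_mode.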
diff --git a/Documentation/unshare.txt b/Documentation/unshare.txt
new file mode 100644
index 000000000000..90a5e9e5bef1
--- /dev/null
+++ b/Documentation/unshare.txt
@@ -0,0 +1,295 @@
1
2unshare system call:
3--------------------
4This document describes the new system call, unshare. The document
5provides an overview of the feature, why it is needed, how it can
6be used, its interface specification, design, implementation and
7how it can be tested.
8
9Change Log:
10-----------
11version 0.1 Initial document, Janak Desai (janak@us.ibm.com), Jan 11, 2006
12
13Contents:
14---------
15 1) Overview
16 2) Benefits
17 3) Cost
18 4) Requirements
19 5) Functional Specification
20 6) High Level Design
21 7) Low Level Design
22 8) Test Specification
23 9) Future Work
24
251) Overview
26-----------
27Most legacy operating system kernels support an abstraction of threads
28as multiple execution contexts within a process. These kernels provide
29special resources and mechanisms to maintain these "threads". The Linux
30kernel, in a clever and simple manner, does not make a distinction
31between processes and "threads". The kernel allows processes to share
32resources and thus they can achieve legacy "threads" behavior without
33requiring additional data structures and mechanisms in the kernel. The
34power of implementing threads in this manner comes not only from
35its simplicity but also from allowing application programmers to work
36outside the confinement of all-or-nothing shared resources of legacy
37threads. On Linux, at the time of thread creation using the clone system
38call, applications can selectively choose which resources to share
39between threads.
40
41unshare system call adds a primitive to the Linux thread model that
42allows threads to selectively 'unshare' any resources that were being
43shared at the time of their creation. unshare was conceptualized by
44Al Viro in August of 2000, on the Linux-Kernel mailing list, as part
45of the discussion on POSIX threads on Linux. unshare augments the
46usefulness of Linux threads for applications that would like to control
47shared resources without creating a new process. unshare is a natural
48addition to the set of available primitives on Linux that implement
49the concept of process/thread as a virtual machine.
50
512) Benefits
52-----------
53unshare would be useful to large application frameworks such as PAM
54where creating a new process to control sharing/unsharing of process
55resources is not possible. Since namespaces are shared by default
56when creating a new process using fork or clone, unshare can benefit
57even non-threaded applications if they have a need to disassociate
58from default shared namespace. The following lists two use-cases
59where unshare can be used.
60
612.1 Per-security context namespaces
62-----------------------------------
63unshare can be used to implement polyinstantiated directories using
64the kernel's per-process namespace mechanism. Polyinstantiated directories,
65such as per-user and/or per-security context instance of /tmp, /var/tmp or
66per-security context instance of a user's home directory, isolate user
67processes when working with these directories. Using unshare, a PAM
68module can easily set up a private namespace for a user at login.
69Polyinstantiated directories are required for Common Criteria certification
70with Labeled System Protection Profile; however, with the availability
71of shared-tree feature in the Linux kernel, even regular Linux systems
72can benefit from setting up private namespaces at login and
73polyinstantiating /tmp, /var/tmp and other directories deemed
74appropriate by system administrators.
75
762.2 unsharing of virtual memory and/or open files
77-------------------------------------------------
78Consider a client/server application where the server is processing
79client requests by creating processes that share resources such as
80virtual memory and open files. Without unshare, the server has to
81decide what needs to be shared at the time of creating the process
82which services the request. unshare gives the server the ability to
83disassociate parts of the context during the servicing of the
84request. For large and complex middleware application frameworks, this
85ability to unshare after the process was created can be very
86useful.
87
883) Cost
89-------
90In order to not duplicate code and to handle the fact that unshare
91works on an active task (as opposed to clone/fork working on a newly
92allocated inactive task) unshare had to make minor reorganizational
93changes to copy_* functions utilized by clone/fork system call.
94There is a cost associated with altering existing, well tested and
95stable code to implement a new feature that may not get exercised
96extensively in the beginning. However, with proper design and code
97review of the changes and creation of an unshare test for the LTP
98the benefits of this new feature can exceed its cost.
99
1004) Requirements
101---------------
102unshare reverses sharing that was done using clone(2) system call,
103so unshare should have a similar interface as clone(2). That is,
104since flags in clone(int flags, void *stack) specify what should
105be shared, similar flags in unshare(int flags) should specify
106what should be unshared. Unfortunately, this may appear to invert
107the meaning of the flags from the way they are used in clone(2).
108However, there was no easy solution that was less confusing and that
109allowed incremental context unsharing in future without an ABI change.
110
111unshare interface should accommodate possible future addition of
112new context flags without requiring a rebuild of old applications.
113If and when new context flags are added, unshare design should allow
114incremental unsharing of those resources on an as needed basis.
115
1165) Functional Specification
117---------------------------
118NAME
119 unshare - disassociate parts of the process execution context
120
121SYNOPSIS
122 #include <sched.h>
123
124 int unshare(int flags);
125
126DESCRIPTION
127 unshare allows a process to disassociate parts of its execution
128 context that are currently being shared with other processes. Part
129 of execution context, such as the namespace, is shared by default
130 when a new process is created using fork(2), while other parts,
131 such as the virtual memory, open file descriptors, etc, may be
132 shared by explicit request to share them when creating a process
133 using clone(2).
134
135 The main use of unshare is to allow a process to control its
136 shared execution context without creating a new process.
137
138 The flags argument specifies one, or the bitwise OR of several, of
139 the following constants.
140
141 CLONE_FS
142 If CLONE_FS is set, file system information of the caller
143 is disassociated from the shared file system information.
144
145 CLONE_FILES
146 If CLONE_FILES is set, the file descriptor table of the
147 caller is disassociated from the shared file descriptor
148 table.
149
150 CLONE_NEWNS
151 If CLONE_NEWNS is set, the namespace of the caller is
152 disassociated from the shared namespace.
153
154 CLONE_VM
155 If CLONE_VM is set, the virtual memory of the caller is
156 disassociated from the shared virtual memory.
157
158RETURN VALUE
159 On success, zero is returned. On failure, -1 is returned and errno is set to indicate the error.
160
161ERRORS
162 EPERM CLONE_NEWNS was specified by a non-root process (process
163 without CAP_SYS_ADMIN).
164
165 ENOMEM Cannot allocate sufficient memory to copy parts of caller's
166 context that need to be unshared.
167
168 EINVAL Invalid flag was specified as an argument.
169
170CONFORMING TO
171 The unshare() call is Linux-specific and should not be used
172 in programs intended to be portable.
173
174SEE ALSO
175 clone(2), fork(2)
176
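A minimal sketch of calling the interface specified above, assuming a C
library that declares unshare() and the CLONE_* constants in <sched.h> when
_GNU_SOURCE is defined (as glibc does). The helper name detach_fd_table() is
hypothetical and only serves the illustration.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>

/*
 * Disassociate the caller's file descriptor table from any table it
 * currently shares. Returns 0 on success or a negative errno value.
 */
int detach_fd_table(void)
{
	if (unshare(CLONE_FILES) == -1) {
		int err = errno;	/* save before stdio can change it */

		fprintf(stderr, "unshare(CLONE_FILES): %s\n", strerror(err));
		return -err;
	}
	return 0;
}
```

Once the helper returns 0, descriptors that the caller opens or closes are no
longer visible to tasks that previously shared its descriptor table.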
1776) High Level Design
178--------------------
179Depending on the flags argument, the unshare system call allocates
180appropriate process context structures, populates it with values from
181the current shared version, associates newly duplicated structures
182with the current task structure and releases corresponding shared
183versions. Helper functions of clone (copy_*) could not be used
184directly by unshare because of the following two reasons.
185 1) clone operates on a newly allocated not-yet-active task
186 structure, whereas unshare operates on the current active
187 task. Therefore unshare has to take appropriate task_lock()
188 before associating newly duplicated context structures.
189 2) unshare has to allocate and duplicate all context structures
190 that are being unshared, before associating them with the
191 current task and releasing older shared structures. Failure
192 to do so will create race conditions and/or oops when trying
193 to back out due to an error. Consider the case of unsharing
194 both virtual memory and namespace. After successfully unsharing
195 vm, if the system call encounters an error while allocating
196 new namespace structure, the error return code will have to
197 reverse the unsharing of vm. As part of the reversal the
198 system call will have to go back to older, shared, vm
199 structure, which may not exist anymore.
200
201Therefore code from copy_* functions that allocated and duplicated
202current context structure was moved into new dup_* functions. Now,
203copy_* functions call dup_* functions to allocate and duplicate
204appropriate context structures and then associate them with the
205task structure that is being constructed. unshare system call on
206the other hand performs the following:
207 1) Check flags to force missing, but implied, flags
208 2) For each context structure, call the corresponding unshare
209 helper function to allocate and duplicate a new context
210 structure, if the appropriate bit is set in the flags argument.
211 3) If there is no error in allocation and duplication and there
212 are new context structures then lock the current task structure,
213 associate new context structures with the current task structure,
214 and release the lock on the current task structure.
215 4) Appropriately release older, shared, context structures.
216
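The ordering in steps 2) to 4) above (duplicate everything first, switch the
pointers under the lock, release the old structures last) can be illustrated
in user space. This is only an analogy under stated assumptions, not the
kernel code: struct ctx, struct task, dup_ctx(), put_ctx() and unshare_both()
are hypothetical names, a pthread mutex stands in for task_lock(), and plain
integer reference counts stand in for the kernel's own accounting.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical reference-counted context (think fs or files information). */
struct ctx {
	int refs;	/* plain int: a simplification, not safe for real sharing */
	int data;
};

/* Hypothetical task holding pointers to two possibly shared contexts. */
struct task {
	pthread_mutex_t lock;	/* plays the role of task_lock() */
	struct ctx *fs;
	struct ctx *files;
};

/* Allocate a private copy of a context; returns NULL on failure. */
struct ctx *dup_ctx(const struct ctx *old)
{
	struct ctx *copy = malloc(sizeof(*copy));

	if (copy) {
		copy->refs = 1;
		copy->data = old->data;
	}
	return copy;
}

/* Drop one reference and free the context once nobody uses it. */
void put_ctx(struct ctx *c)
{
	if (--c->refs == 0)
		free(c);
}

/* Unshare both contexts or neither: the task is untouched on failure. */
int unshare_both(struct task *t)
{
	struct ctx *new_fs, *new_files, *old_fs, *old_files;

	/* Phase 1: duplicate everything before modifying the task. */
	new_fs = dup_ctx(t->fs);
	if (!new_fs)
		return -1;
	new_files = dup_ctx(t->files);
	if (!new_files) {
		free(new_fs);	/* back out; the shared structures still exist */
		return -1;
	}

	/* Phase 2: associate the new structures under the lock. */
	pthread_mutex_lock(&t->lock);
	old_fs = t->fs;
	old_files = t->files;
	t->fs = new_fs;
	t->files = new_files;
	pthread_mutex_unlock(&t->lock);

	/* Phase 3: release the older, shared structures. */
	put_ctx(old_fs);
	put_ctx(old_files);
	return 0;
}
```

Because nothing is switched until every duplication has succeeded, the error
path never needs to go back to a shared structure that may already be gone.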
2177) Low Level Design
218-------------------
219Implementation of unshare can be grouped in the following 4 different
220items:
221 a) Reorganization of existing copy_* functions
222 b) unshare system call service function
223 c) unshare helper functions for each different process context
224 d) Registration of system call number for different architectures
225
226 7.1) Reorganization of copy_* functions
227 Each copy function such as copy_mm, copy_namespace, copy_files,
228 etc, had roughly two components. The first component allocated
229 and duplicated the appropriate structure and the second component
230 linked it to the task structure passed in as an argument to the copy
231 function. The first component was split into its own function.
232 These dup_* functions allocated and duplicated the appropriate
233 context structure. The reorganized copy_* functions invoked
234 their corresponding dup_* functions and then linked the newly
235 duplicated structures to the task structure with which the
236 copy function was called.
237
238 7.2) unshare system call service function
239 * Check flags
240 Force implied flags. If CLONE_THREAD is set force CLONE_VM.
241 If CLONE_VM is set, force CLONE_SIGHAND. If CLONE_SIGHAND is
242 set and signals are also being shared, force CLONE_THREAD. If
243 CLONE_NEWNS is set, force CLONE_FS.
244 * For each context flag, invoke the corresponding unshare_*
245 helper routine with flags passed into the system call and a
246 reference to a pointer to the new unshared structure
247 * If any new structures are created by unshare_* helper
248 functions, take the task_lock() on the current task,
249 modify appropriate context pointers, and release the
250 task lock.
251 * For all newly unshared structures, release the corresponding
252 older, shared, structures.
253
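The "Check flags" step of 7.2 can be sketched as a pure function that applies
the rules in the order the text gives them. The CLONE_* values are the ones
Linux defines; they are repeated under #ifndef only so that the fragment
compiles on its own. The function name force_implied_flags() and the
signals_shared parameter are assumptions made for illustration.

```c
/* Linux clone flag values, defined here only if no header provided them. */
#ifndef CLONE_VM
#define CLONE_VM	0x00000100
#endif
#ifndef CLONE_FS
#define CLONE_FS	0x00000200
#endif
#ifndef CLONE_SIGHAND
#define CLONE_SIGHAND	0x00000800
#endif
#ifndef CLONE_THREAD
#define CLONE_THREAD	0x00010000
#endif
#ifndef CLONE_NEWNS
#define CLONE_NEWNS	0x00020000
#endif

/*
 * Add the flags implied by those the caller passed in.
 * signals_shared is non-zero when the caller's signals are shared
 * with another task.
 */
unsigned long force_implied_flags(unsigned long flags, int signals_shared)
{
	if (flags & CLONE_THREAD)
		flags |= CLONE_VM;
	if (flags & CLONE_VM)
		flags |= CLONE_SIGHAND;
	if ((flags & CLONE_SIGHAND) && signals_shared)
		flags |= CLONE_THREAD;
	if (flags & CLONE_NEWNS)
		flags |= CLONE_FS;
	return flags;
}
```

With this sketch, a request to unshare only the namespace comes back with
CLONE_FS set as well, which is the behaviour item 2) of the test
specification checks for.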
254 7.3) unshare_* helper functions
255 For unshare_* helpers corresponding to CLONE_SYSVSEM, CLONE_SIGHAND,
256 and CLONE_THREAD, return -EINVAL since they are not implemented yet.
257 For others, check the flag value to see if the unsharing is
258 required for that structure. If it is, invoke the corresponding
259 dup_* function to allocate and duplicate the structure and return
260 a pointer to it.
261
262 7.4) Appropriately modify architecture specific code to register the
263 new system call.
264
2658) Test Specification
266---------------------
267The test for unshare should test the following:
268 1) Valid flags: Test to check that clone flags for signal and
269 signal handlers, for which unsharing is not implemented
270 yet, return -EINVAL.
271 2) Missing/implied flags: Test to make sure that unsharing
272 namespace without specifying unsharing of filesystem correctly
273 unshares both namespace and filesystem information.
274 3) For each of the four (namespace, filesystem, files and vm)
275 supported unsharing, verify that the system call correctly
276 unshares the appropriate structure. Verify that unsharing
277 them individually as well as in combination with each
278 other works as expected.
279 4) Concurrent execution: Use shared memory segments and futex on
280 an address in the shm segment to synchronize execution of
281 about 10 threads. Have a couple of threads execute execve,
282 a couple _exit and the rest unshare with different combination
283 of flags. Verify that unsharing is performed as expected and
284 that there are no oops or hangs.
285
2869) Future Work
287--------------
288The current implementation of unshare does not allow unsharing of
289signals and signal handlers. Signals are complex to begin with and
290to unshare signals and/or signal handlers of a currently running
291process is even more complex. If in the future there is a specific
292need to allow unsharing of signals and/or signal handlers, it can
293be incrementally added to unshare without affecting legacy
294applications using unshare.
295
diff --git a/Documentation/usb/et61x251.txt b/Documentation/usb/et61x251.txt
new file mode 100644
index 000000000000..b44dda407ce2
--- /dev/null
+++ b/Documentation/usb/et61x251.txt
@@ -0,0 +1,306 @@
1
2 ET61X[12]51 PC Camera Controllers
3 Driver for Linux
4 =================================
5
6 - Documentation -
7
8
9Index
10=====
111. Copyright
122. Disclaimer
133. License
144. Overview and features
155. Module dependencies
166. Module loading
177. Module parameters
188. Optional device control through "sysfs"
199. Supported devices
2010. Notes for V4L2 application developers
2111. Contact information
22
23
241. Copyright
25============
26Copyright (C) 2006 by Luca Risolia <luca.risolia@studio.unibo.it>
27
28
292. Disclaimer
30=============
31Etoms is a trademark of Etoms Electronics Corp.
32This software is not developed or sponsored by Etoms Electronics.
33
34
353. License
36==========
37This program is free software; you can redistribute it and/or modify
38it under the terms of the GNU General Public License as published by
39the Free Software Foundation; either version 2 of the License, or
40(at your option) any later version.
41
42This program is distributed in the hope that it will be useful,
43but WITHOUT ANY WARRANTY; without even the implied warranty of
44MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
45GNU General Public License for more details.
46
47You should have received a copy of the GNU General Public License
48along with this program; if not, write to the Free Software
49Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
50
51
524. Overview and features
53========================
54This driver supports the video interface of the devices mounting the ET61X151
55or ET61X251 PC Camera Controllers.
56
57It's worth to note that Etoms Electronics has never collaborated with the
58author during the development of this project; despite several requests,
59Etoms Electronics also refused to release enough detailed specifications of
60the video compression engine.
61
62The driver relies on the Video4Linux2 and USB core modules. It has been
63designed to run properly on SMP systems as well.
64
65The latest version of the ET61X[12]51 driver can be found at the following URL:
66http://www.linux-projects.org/
67
68Some of the features of the driver are:
69
70- full compliance with the Video4Linux2 API (see also "Notes for V4L2
71 application developers" paragraph);
72- available mmap or read/poll methods for video streaming through isochronous
73 data transfers;
74- automatic detection of image sensor;
75- support for any window resolutions and optional panning within the maximum
76 pixel area of image sensor;
77- image downscaling with arbitrary scaling factors from 1 and 2 in both
78 directions (see "Notes for V4L2 application developers" paragraph);
79- two different video formats for uncompressed or compressed data in low or
80 high compression quality (see also "Notes for V4L2 application developers"
81 paragraph);
82- full support for the capabilities of every possible image sensor that can
83 be connected to the ET61X[12]51 bridges, including, for instance, red, green,
84 blue and global gain adjustments and exposure control (see "Supported
85 devices" paragraph for details);
86- use of default color settings for sunlight conditions;
87- dynamic I/O interface for both ET61X[12]51 and image sensor control (see
88 "Optional device control through 'sysfs'" paragraph);
89- dynamic driver control thanks to various module parameters (see "Module
90 parameters" paragraph);
91- up to 64 cameras can be handled at the same time; they can be connected and
92 disconnected from the host many times without turning off the computer, if
93 the system supports hotplugging;
94- no known bugs.
95
96
975. Module dependencies
98======================
99For it to work properly, the driver needs kernel support for Video4Linux and
100USB.
101
102The following options of the kernel configuration file must be enabled and
103corresponding modules must be compiled:
104
105 # Multimedia devices
106 #
107 CONFIG_VIDEO_DEV=m
108
109To enable advanced debugging functionality on the device through /sysfs:
110
111 # Multimedia devices
112 #
113 CONFIG_VIDEO_ADV_DEBUG=y
114
115 # USB support
116 #
117 CONFIG_USB=m
118
119In addition, depending on the hardware being used, the modules below are
120necessary:
121
122 # USB Host Controller Drivers
123 #
124 CONFIG_USB_EHCI_HCD=m
125 CONFIG_USB_UHCI_HCD=m
126 CONFIG_USB_OHCI_HCD=m
127
128And finally:
129
130 # USB Multimedia devices
131 #
132 CONFIG_USB_ET61X251=m
133
134
1356. Module loading
136=================
137To use the driver, it is necessary to load the "et61x251" module into memory
138after every other module required: "videodev", "usbcore" and, depending on
139the USB host controller you have, "ehci-hcd", "uhci-hcd" or "ohci-hcd".
140
141Loading can be done as shown below:
142
143 [root@localhost home]# modprobe et61x251
144
145At this point the devices should be recognized. You can invoke "dmesg" to
146analyze kernel messages and verify that the loading process has gone well:
147
148 [user@localhost home]$ dmesg
149
150
1517. Module parameters
152====================
153Module parameters are listed below:
154-------------------------------------------------------------------------------
155Name: video_nr
156Type: short array (min = 0, max = 64)
157Syntax: <-1|n[,...]>
158Description: Specify V4L2 minor mode number:
159 -1 = use next available
160 n = use minor number n
161 You can specify up to 64 cameras this way.
162 For example:
163 video_nr=-1,2,-1 would assign minor number 2 to the second
164 registered camera and use auto for the first one and for every
165 other camera.
166Default: -1
167-------------------------------------------------------------------------------
168Name: force_munmap
169Type: bool array (min = 0, max = 64)
170Syntax: <0|1[,...]>
171Description: Force the application to unmap previously mapped buffer memory
172 before calling any VIDIOC_S_CROP or VIDIOC_S_FMT ioctl's. Not
173 all the applications support this feature. This parameter is
174 specific for each detected camera.
175 0 = do not force memory unmapping
176 1 = force memory unmapping (save memory)
177Default: 0
178-------------------------------------------------------------------------------
179Name: debug
180Type: ushort
181Syntax: <n>
182Description: Debugging information level, from 0 to 3:
183 0 = none (use carefully)
184 1 = critical errors
185 2 = significant informations
186 3 = more verbose messages
187 Level 3 is useful for testing only, when only one device
188 is used at the same time. It also shows some more informations
189 about the hardware being detected. This module parameter can be
190 changed at runtime thanks to the /sys filesystem interface.
191Default: 2
192-------------------------------------------------------------------------------
193
194
1958. Optional device control through "sysfs"
196==========================================
197If the kernel has been compiled with the CONFIG_VIDEO_ADV_DEBUG option enabled,
198it is possible to read and write both the ET61X[12]51 and the image sensor
199registers by using the "sysfs" filesystem interface.
200
201There are four files in the /sys/class/video4linux/videoX directory for each
202registered camera: "reg", "val", "i2c_reg" and "i2c_val". The first two files
203control the ET61X[12]51 bridge, while the other two control the sensor chip.
204"reg" and "i2c_reg" hold the index of the register to which subsequent
205read/write operations through "val" and "i2c_val" are addressed. These files
206are not intended for end users unless you know what you are doing. Remember
207that you must be logged in as root before writing to them.
208
209As an example, suppose we want to read the value contained in the
210register number 1 of the sensor register table - which is usually the product
211identifier - of the camera registered as "/dev/video0":
212
213 [root@localhost #] cd /sys/class/video4linux/video0
214 [root@localhost #] echo 1 > i2c_reg
215 [root@localhost #] cat i2c_val
216
217Note that if the sensor registers can not be read, "cat" will fail.
218To avoid race conditions, all the I/O accesses to the files are serialized.
219
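The shell session above can also be expressed in C. This is a sketch under
assumptions: read_sensor_reg() is a hypothetical helper, its directory
argument is expected to be something like /sys/class/video4linux/video0, and
the value is parsed with strtol() in base 0 because the text does not state
which number format the files use.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Select sensor register 'index' through "i2c_reg", then read its value
 * back through "i2c_val". Returns 0 on success, -1 on I/O or parse error.
 */
int read_sensor_reg(const char *dir, unsigned int index, long *value)
{
	char path[512], buf[64], *end;
	FILE *f;

	/* Equivalent of: echo <index> > i2c_reg */
	snprintf(path, sizeof(path), "%s/i2c_reg", dir);
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fprintf(f, "%u\n", index) < 0) {
		fclose(f);
		return -1;
	}
	if (fclose(f) != 0)	/* buffered write errors show up here */
		return -1;

	/* Equivalent of: cat i2c_val */
	snprintf(path, sizeof(path), "%s/i2c_val", dir);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;	/* e.g. the sensor register cannot be read */
	}
	fclose(f);

	*value = strtol(buf, &end, 0);
	return end == buf ? -1 : 0;
}
```

As with the shell commands, writing to the files requires root privileges,
and the call fails if the sensor registers cannot be read.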
220
2219. Supported devices
222====================
223None of the names of the companies as well as their products will be mentioned
224here. They have never collaborated with the author, so no advertising.
225
226From the point of view of a driver, a device is unambiguously identified by
227its vendor and product USB identifiers. Below is a list of known identifiers of
228devices mounting the ET61X[12]51 PC camera controllers:
229
230Vendor ID Product ID
231--------- ----------
2320x102c 0x6151
2330x102c 0x6251
2340x102c 0x6253
2350x102c 0x6254
2360x102c 0x6255
2370x102c 0x6256
2380x102c 0x6257
2390x102c 0x6258
2400x102c 0x6259
2410x102c 0x625a
2420x102c 0x625b
2430x102c 0x625c
2440x102c 0x625d
2450x102c 0x625e
2460x102c 0x625f
2470x102c 0x6260
2480x102c 0x6261
2490x102c 0x6262
2500x102c 0x6263
2510x102c 0x6264
2520x102c 0x6265
2530x102c 0x6266
2540x102c 0x6267
2550x102c 0x6268
2560x102c 0x6269
257
258The following image sensors are supported:
259
260Model Manufacturer
261----- ------------
262TAS5130D1B Taiwan Advanced Sensor Corporation
263
264All the available control settings of each image sensor are supported through
265the V4L2 interface.
266
267
26810. Notes for V4L2 application developers
269========================================
270This driver follows the V4L2 API specifications. In particular, it enforces two
271rules:
272
273- exactly one I/O method, either "mmap" or "read", is associated with each
274file descriptor. Once it is selected, the application must close and reopen the
275device to switch to the other I/O method;
276
277- although it is not mandatory, previously mapped buffer memory should always
278be unmapped before calling any "VIDIOC_S_CROP" or "VIDIOC_S_FMT" ioctl's.
279The same number of buffers as before will be allocated again to match the size
280of the new video frames, so you have to map the buffers again before any I/O
281attempts on them.
282
283Consistently with the hardware limits, this driver also supports image
284downscaling with arbitrary scaling factors from 1 and 2 in both directions.
285However, the V4L2 API specifications don't correctly define how the scaling
286factor can be chosen arbitrarily by the "negotiation" of the "source" and
287"target" rectangles. To work around this flaw, we have added the convention
288that, during the negotiation, whenever the "VIDIOC_S_CROP" ioctl is issued, the
289scaling factor is restored to 1.
290
291This driver supports two different video formats: the first one is the "8-bit
292Sequential Bayer" format and can be used to obtain uncompressed video data
293from the device through the current I/O method, while the second one provides
294"raw" compressed video data (without the frame headers that are unrelated to
295the compressed data). The current compression quality may vary from 0 to 1 and
296can be selected or queried through the VIDIOC_S_JPEGCOMP and VIDIOC_G_JPEGCOMP
297V4L2 ioctls.
298
299
30011. Contact information
301=======================
302The author may be contacted by e-mail at <luca.risolia@studio.unibo.it>.
303
304GPG/PGP-encrypted e-mails are accepted. The GPG key ID of the author is
305'FCE635A4'; the public 1024-bit key should be available at any keyserver;
306the fingerprint is: '88E8 F32F 7244 68BA 3958 5D40 99DA 5D2A FCE6 35A4'.
diff --git a/Documentation/usb/sn9c102.txt b/Documentation/usb/sn9c102.txt
index 3f8a119db31b..c6b76414172c 100644
--- a/Documentation/usb/sn9c102.txt
+++ b/Documentation/usb/sn9c102.txt
@@ -17,16 +17,15 @@ Index
177. Module parameters 177. Module parameters
188. Optional device control through "sysfs" 188. Optional device control through "sysfs"
199. Supported devices 199. Supported devices
2010. How to add plug-in's for new image sensors 2010. Notes for V4L2 application developers
2111. Notes for V4L2 application developers 2111. Video frame formats
2212. Video frame formats 2212. Contact information
2313. Contact information 2313. Credits
2414. Credits
25 24
26 25
271. Copyright 261. Copyright
28============ 27============
29Copyright (C) 2004-2005 by Luca Risolia <luca.risolia@studio.unibo.it> 28Copyright (C) 2004-2006 by Luca Risolia <luca.risolia@studio.unibo.it>
30 29
31 30
322. Disclaimer 312. Disclaimer
@@ -54,9 +53,8 @@ Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
54 53
554. Overview and features 544. Overview and features
56======================== 55========================
57This driver attempts to support the video and audio streaming capabilities of 56This driver attempts to support the video interface of the devices mounting the
58the devices mounting the SONiX SN9C101, SN9C102 and SN9C103 PC Camera 57SONiX SN9C101, SN9C102 and SN9C103 PC Camera Controllers.
59Controllers.
60 58
61It's worth to note that SONiX has never collaborated with the author during the 59It's worth to note that SONiX has never collaborated with the author during the
62development of this project, despite several requests for enough detailed 60development of this project, despite several requests for enough detailed
@@ -78,6 +76,7 @@ Some of the features of the driver are:
78- available mmap or read/poll methods for video streaming through isochronous 76- available mmap or read/poll methods for video streaming through isochronous
79 data transfers; 77 data transfers;
80- automatic detection of image sensor; 78- automatic detection of image sensor;
79- support for built-in microphone interface;
81- support for any window resolutions and optional panning within the maximum 80- support for any window resolutions and optional panning within the maximum
82 pixel area of image sensor; 81 pixel area of image sensor;
83- image downscaling with arbitrary scaling factors from 1, 2 and 4 in both 82- image downscaling with arbitrary scaling factors from 1, 2 and 4 in both
@@ -96,7 +95,7 @@ Some of the features of the driver are:
96 parameters" paragraph); 95 parameters" paragraph);
97- up to 64 cameras can be handled at the same time; they can be connected and 96- up to 64 cameras can be handled at the same time; they can be connected and
98 disconnected from the host many times without turning off the computer, if 97 disconnected from the host many times without turning off the computer, if
99 your system supports hotplugging; 98 the system supports hotplugging;
100- no known bugs. 99- no known bugs.
101 100
102 101
@@ -112,6 +111,12 @@ corresponding modules must be compiled:
112 # 111 #
113 CONFIG_VIDEO_DEV=m 112 CONFIG_VIDEO_DEV=m
114 113
114To enable advanced debugging functionality on the device through /sysfs:
115
116 # Multimedia devices
117 #
118 CONFIG_VIDEO_ADV_DEBUG=y
119
115 # USB support 120 # USB support
116 # 121 #
117 CONFIG_USB=m 122 CONFIG_USB=m
@@ -125,6 +130,21 @@ necessary:
125 CONFIG_USB_UHCI_HCD=m 130 CONFIG_USB_UHCI_HCD=m
126 CONFIG_USB_OHCI_HCD=m 131 CONFIG_USB_OHCI_HCD=m
127 132
133The SN9C103 controller also provides a built-in microphone interface. It is
134supported by the USB Audio driver thanks to the ALSA API:
135
136 # Sound
137 #
138 CONFIG_SOUND=y
139
140 # Advanced Linux Sound Architecture
141 #
142 CONFIG_SND=m
143
144 # USB devices
145 #
146 CONFIG_SND_USB_AUDIO=m
147
128And finally: 148And finally:
129 149
130 # USB Multimedia devices 150 # USB Multimedia devices
@@ -153,7 +173,7 @@ analyze kernel messages and verify that the loading process has gone well:
153Module parameters are listed below: 173Module parameters are listed below:
154------------------------------------------------------------------------------- 174-------------------------------------------------------------------------------
155Name: video_nr 175Name: video_nr
156Type: int array (min = 0, max = 64) 176Type: short array (min = 0, max = 64)
157Syntax: <-1|n[,...]> 177Syntax: <-1|n[,...]>
158Description: Specify V4L2 minor mode number: 178Description: Specify V4L2 minor mode number:
159 -1 = use next available 179 -1 = use next available
@@ -165,19 +185,19 @@ Description: Specify V4L2 minor mode number:
165 other camera. 185 other camera.
166Default: -1 186Default: -1
167------------------------------------------------------------------------------- 187-------------------------------------------------------------------------------
168Name: force_munmap; 188Name: force_munmap
169Type: bool array (min = 0, max = 64) 189Type: bool array (min = 0, max = 64)
170Syntax: <0|1[,...]> 190Syntax: <0|1[,...]>
171Description: Force the application to unmap previously mapped buffer memory 191Description: Force the application to unmap previously mapped buffer memory
172 before calling any VIDIOC_S_CROP or VIDIOC_S_FMT ioctl's. Not 192 before calling any VIDIOC_S_CROP or VIDIOC_S_FMT ioctl's. Not
173 all the applications support this feature. This parameter is 193 all the applications support this feature. This parameter is
174 specific for each detected camera. 194 specific for each detected camera.
175 0 = do not force memory unmapping" 195 0 = do not force memory unmapping
176 1 = force memory unmapping (save memory)" 196 1 = force memory unmapping (save memory)
177Default: 0 197Default: 0
178------------------------------------------------------------------------------- 198-------------------------------------------------------------------------------
179Name: debug 199Name: debug
180Type: int 200Type: ushort
181Syntax: <n> 201Syntax: <n>
182Description: Debugging information level, from 0 to 3: 202Description: Debugging information level, from 0 to 3:
183 0 = none (use carefully) 203 0 = none (use carefully)
@@ -187,14 +207,15 @@ Description: Debugging information level, from 0 to 3:
187 Level 3 is useful for testing only, when only one device 207 Level 3 is useful for testing only, when only one device
188 is used. It also shows some more informations about the 208 is used. It also shows some more informations about the
189 hardware being detected. This parameter can be changed at 209 hardware being detected. This parameter can be changed at
190 runtime thanks to the /sys filesystem. 210 runtime thanks to the /sys filesystem interface.
191Default: 2 211Default: 2
192------------------------------------------------------------------------------- 212-------------------------------------------------------------------------------
193 213
194 214
1958. Optional device control through "sysfs" [1] 2158. Optional device control through "sysfs" [1]
196========================================== 216==========================================
197It is possible to read and write both the SN9C10x and the image sensor 217If the kernel has been compiled with the CONFIG_VIDEO_ADV_DEBUG option enabled,
218it is possible to read and write both the SN9C10x and the image sensor
198registers by using the "sysfs" filesystem interface. 219registers by using the "sysfs" filesystem interface.
199 220
200Every time a supported device is recognized, a write-only file named "green" is 221Every time a supported device is recognized, a write-only file named "green" is
@@ -236,7 +257,7 @@ serialized.
236 257
237The sysfs interface also provides the "frame_header" entry, which exports the 258The sysfs interface also provides the "frame_header" entry, which exports the
238frame header of the most recent requested and captured video frame. The header 259frame header of the most recent requested and captured video frame. The header
239is 12-bytes long and is appended to every video frame by the SN9C10x 260is always 18-bytes long and is appended to every video frame by the SN9C10x
240controllers. As an example, this additional information can be used by the user 261controllers. As an example, this additional information can be used by the user
241application for implementing auto-exposure features via software. 262application for implementing auto-exposure features via software.
242 263
@@ -250,7 +271,8 @@ Byte # Value Description
2500x03 0xC4 Frame synchronisation pattern. 2710x03 0xC4 Frame synchronisation pattern.
2510x04 0xC4 Frame synchronisation pattern. 2720x04 0xC4 Frame synchronisation pattern.
2520x05 0x96 Frame synchronisation pattern. 2730x05 0x96 Frame synchronisation pattern.
2530x06 0x00 or 0x01 Unknown meaning. The exact value depends on the chip. 2740x06 0xXX Unknown meaning. The exact value depends on the chip;
275 possible values are 0x00, 0x01 and 0x20.
2540x07 0xXX Variable value, whose bits are ff00uzzc, where ff is a 2760x07 0xXX Variable value, whose bits are ff00uzzc, where ff is a
255 frame counter, u is unknown, zz is a size indicator 277 frame counter, u is unknown, zz is a size indicator
256 (00 = VGA, 01 = SIF, 10 = QSIF) and c stands for 278 (00 = VGA, 01 = SIF, 10 = QSIF) and c stands for
@@ -267,12 +289,23 @@ Byte # Value Description
267 times the area outside of the specified AE area. For 289 times the area outside of the specified AE area. For
268 images that are not pure white, the value scales down 290 images that are not pure white, the value scales down
269 according to relative whiteness. 291 according to relative whiteness.
292 according to relative whiteness.
293
294The following bytes are used by the SN9C103 bridge only:
295
2960x0C 0xXX Unknown meaning
2970x0D 0xXX Unknown meaning
2980x0E 0xXX Unknown meaning
2990x0F 0xXX Unknown meaning
3000x10 0xXX Unknown meaning
3010x11 0xXX Unknown meaning
270 302
271The AE area (sx, sy, ex, ey) in the active window can be set by programming the 303The AE area (sx, sy, ex, ey) in the active window can be set by programming the
272registers 0x1c, 0x1d, 0x1e and 0x1f of the SN9C10x controllers, where one unit 304registers 0x1c, 0x1d, 0x1e and 0x1f of the SN9C10x controllers, where one unit
273corresponds to 32 pixels. 305corresponds to 32 pixels.
274 306
275[1] The frame header has been documented by Bertrik Sikken. 307[1] Part of the meaning of the frame header has been documented by Bertrik
308 Sikken.
276 309
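As an illustration of the bit layout given for byte 0x07 of the frame header (ff00uzzc), the following C sketch splits that byte into its fields. It assumes the pattern is written most significant bit first, so that ff occupies bits 7-6 and c is bit 0; the structure and function names are hypothetical and not part of the driver.

```c
#include <stdint.h>

/* Size indicator values (bits zz), as listed in the table above. */
enum sn9c10x_size { SN9C10X_VGA = 0, SN9C10X_SIF = 1, SN9C10X_QSIF = 2 };

/* Fields of frame header byte 0x07 (hypothetical structure). */
struct sn9c10x_byte7 {
	unsigned int frame_counter;	/* ff: bits 7-6 */
	unsigned int u;			/* u:  bit 3, unknown meaning */
	unsigned int size;		/* zz: bits 2-1 */
	unsigned int c;			/* c:  bit 0 */
};

/* Decode byte 0x07, assuming the MSB-first layout ff00uzzc. */
struct sn9c10x_byte7 sn9c10x_decode_byte7(uint8_t b)
{
	struct sn9c10x_byte7 f;

	f.frame_counter = (b >> 6) & 0x3;
	f.u = (b >> 3) & 0x1;
	f.size = (b >> 1) & 0x3;
	f.c = b & 0x1;
	return f;
}
```

An application reading the "frame_header" sysfs entry could pass the eighth byte of the header to this function to obtain, for example, the size indicator of the last captured frame.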
277 310
2789. Supported devices 3119. Supported devices
@@ -298,6 +331,7 @@ Vendor ID Product ID
2980x0c45 0x602b 3310x0c45 0x602b
2990x0c45 0x602c 3320x0c45 0x602c
3000x0c45 0x602d 3330x0c45 0x602d
3340x0c45 0x602e
3010x0c45 0x6030 3350x0c45 0x6030
3020x0c45 0x6080 3360x0c45 0x6080
3030x0c45 0x6082 3370x0c45 0x6082
@@ -348,18 +382,7 @@ appreciated. Non-available hardware will not be supported by the author of this
348driver. 382driver.
349 383
350 384
35110. How to add plug-in's for new image sensors 38510. Notes for V4L2 application developers
352==============================================
353It should be easy to write plug-in's for new sensors by using the small API
354that has been created for this purpose, which is present in "sn9c102_sensor.h"
355(documentation is included there). As an example, have a look at the code in
356"sn9c102_pas106b.c", which uses the mentioned interface.
357
358At the moment, possible unsupported image sensors are: CIS-VF10 (VGA),
359OV7620 (VGA), OV7630 (VGA).
360
361
36211. Notes for V4L2 application developers
363========================================= 386=========================================
364This driver follows the V4L2 API specifications. In particular, it enforces two 387This driver follows the V4L2 API specifications. In particular, it enforces two
365rules: 388rules:
@@ -394,7 +417,7 @@ initialized (as described in the documentation of the API for the image sensors
394supplied by this driver). 417supplied by this driver).
395 418
396 419
39712. Video frame formats [1] 42011. Video frame formats [1]
398======================= 421=======================
399The SN9C10x PC Camera Controllers can send images in two possible video 422The SN9C10x PC Camera Controllers can send images in two possible video
400formats over the USB: either native "Sequential RGB Bayer" or Huffman 423formats over the USB: either native "Sequential RGB Bayer" or Huffman
@@ -455,7 +478,7 @@ The following Huffman codes have been found:
455 documented by Bertrik Sikken. 478 documented by Bertrik Sikken.
456 479
457 480
45813. Contact information 48112. Contact information
459======================= 482=======================
460The author may be contacted by e-mail at <luca.risolia@studio.unibo.it>. 483The author may be contacted by e-mail at <luca.risolia@studio.unibo.it>.
461 484
@@ -464,7 +487,7 @@ GPG/PGP encrypted e-mail's are accepted. The GPG key ID of the author is
464the fingerprint is: '88E8 F32F 7244 68BA 3958 5D40 99DA 5D2A FCE6 35A4'. 487the fingerprint is: '88E8 F32F 7244 68BA 3958 5D40 99DA 5D2A FCE6 35A4'.
465 488
466 489
46714. Credits 49013. Credits
468=========== 491===========
469Many thanks to following persons for their contribute (listed in alphabetical 492Many thanks to following persons for their contribute (listed in alphabetical
470order): 493order):
@@ -480,5 +503,5 @@ order):
480- Bertrik Sikken, who reverse-engineered and documented the Huffman compression 503- Bertrik Sikken, who reverse-engineered and documented the Huffman compression
481 algorithm used in the SN9C10x controllers and implemented the first decoder; 504 algorithm used in the SN9C10x controllers and implemented the first decoder;
482- Mizuno Takafumi for the donation of a webcam; 505- Mizuno Takafumi for the donation of a webcam;
483- An "anonymous" donator (who didn't want his name to be revealed) for the 506- an "anonymous" donator (who didn't want his name to be revealed) for the
484 donation of a webcam. 507 donation of a webcam.
diff --git a/Documentation/usb/w9968cf.txt b/Documentation/usb/w9968cf.txt
index 18a47738d56c..9d46cd0b19e3 100644
--- a/Documentation/usb/w9968cf.txt
+++ b/Documentation/usb/w9968cf.txt
@@ -57,16 +57,12 @@ based cameras should be supported as well.
57The driver is divided into two modules: the basic one, "w9968cf", is needed for 57The driver is divided into two modules: the basic one, "w9968cf", is needed for
58the supported devices to work; the second one, "w9968cf-vpp", is an optional 58the supported devices to work; the second one, "w9968cf-vpp", is an optional
59module, which provides some useful video post-processing functions like video 59module, which provides some useful video post-processing functions like video
60decoding, up-scaling and colour conversions. Once the driver is installed, 60decoding, up-scaling and colour conversions.
61every time an application tries to open a recognized device, "w9968cf" checks
62the presence of the "w9968cf-vpp" module and loads it automatically by default.
63 61
64Please keep in mind that official kernels do not include the second module for 62Note that the official kernels do neither include nor support the second
65performance purposes. However it is always recommended to download and install 63module for performance purposes. Therefore, it is always recommended to
66the latest and complete release of the driver, replacing the existing one, if 64download and install the latest and complete release of the driver,
67present: it will be still even possible not to load the "w9968cf-vpp" module at 65replacing the existing one, if present.
68all, if you ever want to. Another important missing feature of the version in
69the official Linux 2.4 kernels is the writeable /proc filesystem interface.
70 66
71The latest and full-featured version of the W996[87]CF driver can be found at: 67The latest and full-featured version of the W996[87]CF driver can be found at:
72http://www.linux-projects.org. Please refer to the documentation included in 68http://www.linux-projects.org. Please refer to the documentation included in
@@ -201,22 +197,6 @@ Note: The kernel must be compiled with the CONFIG_KMOD option
201 enabled for the 'ovcamchip' module to be loaded and for 197 enabled for the 'ovcamchip' module to be loaded and for
202 this parameter to be present. 198 this parameter to be present.
203------------------------------------------------------------------------------- 199-------------------------------------------------------------------------------
204Name: vppmod_load
205Type: bool
206Syntax: <0|1>
207Description: Automatic 'w9968cf-vpp' module loading: 0 disabled, 1 enabled.
208 If enabled, every time an application attempts to open a
209 camera, 'insmod' searches for the video post-processing module
210 in the system and loads it automatically (if present).
211 The optional 'w9968cf-vpp' module adds extra image manipulation
212 capabilities to the 'w9968cf' module,like software up-scaling,
213 colour conversions and video decompression for very high frame
214 rates.
215Default: 1
216Note: The kernel must be compiled with the CONFIG_KMOD option
217 enabled for the 'w9968cf-vpp' module to be loaded and for
218 this parameter to be present.
219-------------------------------------------------------------------------------
220Name: simcams 200Name: simcams
221Type: int 201Type: int
222Syntax: <n> 202Syntax: <n>
diff --git a/Documentation/vm/page_migration b/Documentation/vm/page_migration
new file mode 100644
index 000000000000..0dd4ef30c361
--- /dev/null
+++ b/Documentation/vm/page_migration
@@ -0,0 +1,175 @@
1Page migration
2--------------
3
4Page migration allows moving the physical location of pages between
5nodes in a NUMA system while the process is running. This means that the
6virtual addresses that the process sees do not change. However, the
7system rearranges the physical location of those pages.
8
9The main intent of page migration is to reduce the latency of memory access
10by moving pages near to the processor where the process accessing that memory
11is running.
12
13Page migration allows a process to manually relocate the node on which its
14pages are located through the MPOL_MF_MOVE and MPOL_MF_MOVE_ALL options while
15setting a new memory policy via mbind(). The pages of a process can also be
16relocated from another process using the sys_migrate_pages() function call. The
17migrate_pages function call takes two sets of nodes and moves pages of a
18process that are located on the source nodes to the destination nodes.
19Page migration functions are provided by the numactl package by Andi Kleen
20(a version later than 0.9.3 is required; get it from
21ftp://ftp.suse.com/pub/people/ak). numactl provides libnuma, which
22offers an interface for page migration similar to other NUMA functionality.
23cat /proc/<pid>/numa_maps allows an easy review of where the pages of
24a process are located. See also the numa_maps manpage in the numactl package.
25
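For readers who want to call the system call without libnuma, the C sketch below builds the two node masks and invokes migrate_pages() through syscall(). It is only an outline under stated assumptions: the helper names nodemask_set() and move_pages_between_nodes() are hypothetical, a single-word mask is assumed to be enough for the node numbers involved, and maxnode is passed as one more than the number of mask bits, which is the commonly used convention.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <limits.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Set the bit for NUMA node `node` in a node mask of `nwords` words. */
int nodemask_set(unsigned long *mask, size_t nwords, unsigned int node)
{
	if (node >= nwords * BITS_PER_WORD)
		return -1;
	mask[node / BITS_PER_WORD] |= 1UL << (node % BITS_PER_WORD);
	return 0;
}

/*
 * Hypothetical wrapper: ask the kernel to move the pages of process
 * `pid` (0 = calling process) that are located on node `from` to node
 * `to`. Returns the raw system call result, or -1 for a bad node number.
 */
long move_pages_between_nodes(pid_t pid, unsigned int from, unsigned int to)
{
	unsigned long old_nodes[1] = { 0 };
	unsigned long new_nodes[1] = { 0 };

	if (nodemask_set(old_nodes, 1, from) < 0 ||
	    nodemask_set(new_nodes, 1, to) < 0)
		return -1;

	/* maxnode: assumed convention of mask bits + 1 */
	return syscall(SYS_migrate_pages, pid,
		       (unsigned long)BITS_PER_WORD + 1,
		       old_nodes, new_nodes);
}
```

Calling move_pages_between_nodes(0, 0, 1) would request that the pages of the calling process move from node 0 to node 1; the result can then be checked with cat /proc/self/numa_maps.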
26Manual migration is useful if, for example, the scheduler has relocated
27a process to a processor on a distant node. A batch scheduler or an
28administrator may detect the situation and move the pages of the process
29nearer to the new processor. At some point in the future we may have
30some mechanism in the scheduler that will automatically move the pages.
31
32Larger installations usually partition the system using cpusets into
33sections of nodes. Paul Jackson has equipped cpusets with the ability to
34move pages when a task is moved to another cpuset (See ../cpusets.txt).
35Cpusets allow the automation of process locality. If a task is moved to
36a new cpuset, then all of its pages are moved with it so that the
37performance of the process does not degrade dramatically. The pages
38of processes in a cpuset are also moved if the allowed memory nodes of a
39cpuset are changed.
40
41Page migration allows the preservation of the relative location of pages
42within a group of nodes for all migration techniques, which preserves a
43particular memory allocation pattern even after migrating a
44process. This is necessary in order to preserve the memory latencies.
45Processes will run with similar performance after migration.
46
47Page migration occurs in several steps. First comes a high level
48description for those trying to use migrate_pages() from the kernel
49(for userspace usage see Andi Kleen's numactl package mentioned above),
50and then a low level description of how the details work.
51
52A. In kernel use of migrate_pages()
53-----------------------------------
54
551. Remove pages from the LRU.
56
57 Lists of pages to be migrated are generated by scanning over
58 pages and moving them into lists. This is done by
59 calling isolate_lru_page().
60 Calling isolate_lru_page() increases the references to the page
61 so that it cannot vanish while the page migration occurs.
62 It also prevents the swapper or other scans from encountering
63 the page.
64
652. Generate a list of newly allocated pages. These pages will contain the
66 contents of the pages from the first list after page migration is
67 complete.
68
693. The migrate_pages() function is called which attempts
70 to do the migration. It returns the moved pages in the
71 list specified as the third parameter and the failed
72 migrations in the fourth parameter. The first parameter
73 will contain the pages that could still be retried.
74
754. The leftover pages of various types are returned
76 to the LRU using putback_to_lru_pages() or otherwise
77 disposed of. The pages will still have the refcount as
78 increased by isolate_lru_page() if putback_to_lru_pages() is not
79 used! The kernel may want to handle the various cases of failures in
80 different ways.
81
82B. How migrate_pages() works
83----------------------------
84
85migrate_pages() does several passes over its list of pages. A page is moved
86if all references to a page are removable at the time. The page has
87already been removed from the LRU via isolate_lru_page() and the refcount
88is increased so that the page cannot be freed while page migration occurs.
89
90Steps:
91
921. Lock the page to be migrated
93
942. Ensure that writeback is complete.
95
963. Make sure that the page has an assigned swap cache entry if
97 it is an anonymous page. The swap cache reference is necessary
98 to preserve the information contained in the page table maps while
99 page migration occurs.
100
1014. Prep the new page that we want to move to. It is locked
102 and set to not being uptodate so that all accesses to the new
103 page immediately lock while the move is in progress.
104
1055. All the page table references to the page are either dropped (file
106 backed pages) or converted to swap references (anonymous pages).
107 This should decrease the reference count.
108
1096. The radix tree lock is taken. This will cause all processes trying
110 to reestablish a pte to block on the radix tree spinlock.
111
1127. The refcount of the page is examined and we back out if references remain;
113 otherwise we know that we are the only one referencing this page.
114
1158. The radix tree is checked and if it does not contain the pointer to this
116 page then we back out because someone else modified the mapping first.
117
1189. The mapping is checked. If the mapping is gone then a truncate action may
119 be in progress and we back out.
120
12110. The new page is prepped with some settings from the old page so that
122 accesses to the new page will be discovered to have the correct settings.
123
12411. The radix tree is changed to point to the new page.
125
12612. The reference count of the old page is dropped because the radix tree
127 reference is gone.
128
12913. The radix tree lock is dropped. With that, lookups become possible again
130 and other processes will move from spinning on the tree lock to sleeping on
131 the locked new page.
132
13314. The page contents are copied to the new page.
134
13515. The remaining page flags are copied to the new page.
136
13716. The old page flags are cleared to indicate that the page no longer
138 holds any information.
139
14017. Queued up writeback on the new page is triggered.
141
14218. If swap pte's were generated for the page then replace them with real
143 ptes. This will reenable access for processes not blocked by the page lock.
144
14519. The page locks are dropped from the old and new page.
146 Processes waiting on the page lock can continue.
147
14820. The new page is moved to the LRU and can be scanned by the swapper
149 etc again.
150
151TODO list
152---------
153
154- Page migration requires the use of swap handles to preserve the
155 information of the anonymous page table entries. This means that swap
156 space is reserved but never used. The maximum number of swap handles used
157 is determined by CHUNK_SIZE (see mm/mempolicy.c) per ongoing migration.
158 Reservation of pages could be avoided by having a special type of swap
159 handle that does not require swap space and that would only track the page
160 references. Something like that was proposed by Marcelo Tosatti in the
161 past (search for migration cache on lkml or linux-mm@kvack.org).
162
163- Page migration unmaps ptes for file backed pages and requires page
164 faults to reestablish these ptes. This could be optimized by somehow
165 recording the references before migration and then reestablishing them later.
166 However, there are several locking challenges that have to be overcome
167 before this is possible.
168
169- Page migration generates read ptes for anonymous pages. Dirty page
170 faults are required to make the pages writable again. It may be possible
171 to generate a pte marked dirty if it is known that the page is dirty and
172 that this process has the only reference to that page.
173
174Christoph Lameter, March 8, 2006.
175
diff --git a/Documentation/x86_64/boot-options.txt b/Documentation/x86_64/boot-options.txt
index 9c5fc15d03d1..1921353259ae 100644
--- a/Documentation/x86_64/boot-options.txt
+++ b/Documentation/x86_64/boot-options.txt
@@ -40,6 +40,22 @@ APICs
40 no_timer_check Don't check the IO-APIC timer. This can work around 40 no_timer_check Don't check the IO-APIC timer. This can work around
41 problems with incorrect timer initialization on some boards. 41 problems with incorrect timer initialization on some boards.
42 42
43 apicmaintimer Run time keeping from the local APIC timer instead
44 of using the PIT/HPET interrupt for this. This is useful
45 when the PIT/HPET interrupts are unreliable.
46
47 noapicmaintimer Don't do time keeping using the APIC timer.
48 Useful when this option was auto selected, but doesn't work.
49
50 apicpmtimer
51 Do APIC timer calibration using the pmtimer. Implies
52 apicmaintimer. Useful when your PIT timer is totally
53 broken.
54
55 disable_8254_timer / enable_8254_timer
56 Enable interrupt 0 timer routing over the 8254 in addition to over
57 the IO-APIC. The kernel tries to set a sensible default.
58
43Early Console 59Early Console
44 60
45 syntax: earlyprintk=vga 61 syntax: earlyprintk=vga