author Dmitry Torokhov <dtor_core@ameritech.net> 2005-09-09 21:14:47 -0400
committer Dmitry Torokhov <dtor_core@ameritech.net> 2005-09-09 21:14:47 -0400
commit d344c5e0856ad03278d8700b503762dbc8b86e12
tree a6d893a643470a3c2580a58f3228a55fa1fd1d82 /Documentation
parent 010988e888a0abbe7118635c1b33d049caae6b29
parent 87fc767b832ef5a681a0ff9d203c3289bc3be2bf
Manual merge with Linus
Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/00-INDEX | 2
-rw-r--r-- Documentation/DMA-ISA-LPC.txt | 151
-rw-r--r-- Documentation/DocBook/kernel-hacking.tmpl | 310
-rw-r--r-- Documentation/DocBook/mcabook.tmpl | 2
-rw-r--r-- Documentation/IPMI.txt | 13
-rw-r--r-- Documentation/RCU/NMI-RCU.txt | 112
-rw-r--r-- Documentation/RCU/rcuref.txt | 74
-rw-r--r-- Documentation/acpi-hotkey.txt | 2
-rw-r--r-- Documentation/applying-patches.txt | 439
-rw-r--r-- Documentation/cdrom/sonycd535 | 3
-rw-r--r-- Documentation/cpusets.txt | 12
-rw-r--r-- Documentation/crypto/api-intro.txt | 1
-rw-r--r-- Documentation/dcdbas.txt | 91
-rw-r--r-- Documentation/dell_rbu.txt | 74
-rw-r--r-- Documentation/dvb/bt8xx.txt | 89
-rw-r--r-- Documentation/dvb/ci.txt | 9
-rw-r--r-- Documentation/exception.txt | 2
-rw-r--r-- Documentation/fb/cyblafb/bugs | 14
-rw-r--r-- Documentation/fb/cyblafb/credits | 7
-rw-r--r-- Documentation/fb/cyblafb/documentation | 17
-rw-r--r-- Documentation/fb/cyblafb/fb.modes | 155
-rw-r--r-- Documentation/fb/cyblafb/performance | 80
-rw-r--r-- Documentation/fb/cyblafb/todo | 32
-rw-r--r-- Documentation/fb/cyblafb/usage | 206
-rw-r--r-- Documentation/fb/cyblafb/whycyblafb | 85
-rw-r--r-- Documentation/fb/modedb.txt | 73
-rw-r--r-- Documentation/feature-removal-schedule.txt | 35
-rw-r--r-- Documentation/filesystems/files.txt | 123
-rw-r--r-- Documentation/filesystems/fuse.txt | 315
-rw-r--r-- Documentation/filesystems/ntfs.txt | 12
-rw-r--r-- Documentation/filesystems/proc.txt | 41
-rw-r--r-- Documentation/filesystems/relayfs.txt | 362
-rw-r--r-- Documentation/filesystems/sysfs.txt | 28
-rw-r--r-- Documentation/filesystems/v9fs.txt | 95
-rw-r--r-- Documentation/filesystems/vfs.txt | 435
-rw-r--r-- Documentation/hwmon/lm78 | 7
-rw-r--r-- Documentation/hwmon/w83792d | 174
-rw-r--r-- Documentation/i2c/chips/max6875 | 94
-rw-r--r-- Documentation/i2c/functionality | 2
-rw-r--r-- Documentation/i2c/porting-clients | 25
-rw-r--r-- Documentation/i2c/writing-clients | 114
-rw-r--r-- Documentation/i386/boot.txt | 35
-rw-r--r-- Documentation/ibm-acpi.txt | 376
-rw-r--r-- Documentation/input/yealink.txt | 203
-rw-r--r-- Documentation/kbuild/makefiles.txt | 6
-rw-r--r-- Documentation/kdump/kdump.txt | 16
-rw-r--r-- Documentation/kernel-parameters.txt | 5
-rw-r--r-- Documentation/power/swsusp-dmcrypt.txt | 138
-rw-r--r-- Documentation/power/swsusp.txt | 102
-rw-r--r-- Documentation/power/video.txt | 10
-rw-r--r-- Documentation/scsi/aic7xxx.txt | 6
-rw-r--r-- Documentation/scsi/scsi_mid_low_api.txt | 41
-rw-r--r-- Documentation/sonypi.txt | 10
-rw-r--r-- Documentation/sparse.txt | 2
-rw-r--r-- Documentation/video4linux/CARDLIST.bttv | 4
-rw-r--r-- Documentation/video4linux/CARDLIST.saa7134 | 3
-rw-r--r-- Documentation/video4linux/CARDLIST.tuner | 1
-rw-r--r-- Documentation/vm/locking | 15
-rw-r--r-- Documentation/watchdog/watchdog-api.txt | 20
59 files changed, 4185 insertions, 725 deletions
diff --git a/Documentation/00-INDEX b/Documentation/00-INDEX
index f28a24e0279b..f6de52b01059 100644
--- a/Documentation/00-INDEX
+++ b/Documentation/00-INDEX
@@ -46,6 +46,8 @@ SubmittingPatches
  - procedure to get a source patch included into the kernel tree.
 VGA-softcursor.txt
  - how to change your VGA cursor from a blinking underscore.
+applying-patches.txt
+ - description of various trees and how to apply their patches.
 arm/
  - directory with info about Linux on the ARM architecture.
 basic_profiling.txt
diff --git a/Documentation/DMA-ISA-LPC.txt b/Documentation/DMA-ISA-LPC.txt
new file mode 100644
index 000000000000..705f6be92bdb
--- /dev/null
+++ b/Documentation/DMA-ISA-LPC.txt
@@ -0,0 +1,151 @@
+ DMA with ISA and LPC devices
+ ============================
+
+ Pierre Ossman <drzeus@drzeus.cx>
+
+This document describes how to do DMA transfers using the old ISA DMA
+controller. Even though ISA is more or less dead today the LPC bus
+uses the same DMA system so it will be around for quite some time.
+
+Part I - Headers and dependencies
+---------------------------------
+
+To do ISA style DMA you need to include two headers:
+
+#include <linux/dma-mapping.h>
+#include <asm/dma.h>
+
+The first is the generic DMA API used to convert virtual addresses to
+physical addresses (see Documentation/DMA-API.txt for details).
+
+The second contains the routines specific to ISA DMA transfers. Since
+this is not present on all platforms make sure you construct your
+Kconfig to be dependent on ISA_DMA_API (not ISA) so that nobody tries
+to build your driver on unsupported platforms.
+
+Part II - Buffer allocation
+---------------------------
+
+The ISA DMA controller has some very strict requirements on which
+memory it can access so extra care must be taken when allocating
+buffers.
+
+(You usually need a special buffer for DMA transfers instead of
+transferring directly to and from your normal data structures.)
+
+The DMA-able address space is the lowest 16 MB of _physical_ memory.
+Also the transfer block may not cross page boundaries (which are 64
+or 128 KiB depending on which channel you use).
+
+In order to allocate a piece of memory that satisfies all these
+requirements you pass the flag GFP_DMA to kmalloc.
+
+Unfortunately the memory available for ISA DMA is scarce so unless you
+allocate the memory during boot-up it's a good idea to also pass
+__GFP_REPEAT and __GFP_NOWARN to make the allocator try a bit harder.
+
+(This scarcity also means that you should allocate the buffer as
+early as possible and not release it until the driver is unloaded.)
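The two constraints in Part II (below 16 MB, no DMA page crossing) are plain address arithmetic, so they can be sketched in ordinary userspace C. This is an illustration only, not kernel code: the helper names are hypothetical, and the mapping of 64 KiB pages to channels 0-3 and 128 KiB pages to channels 5-7 is the conventional ISA layout, assumed here rather than stated by the document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Highest physical address (exclusive) the ISA DMA controller can reach. */
#define ISA_DMA_LIMIT (16u * 1024u * 1024u)

/*
 * Hypothetical helper: size of the DMA page a transfer must stay within.
 * Assumption: 8-bit channels (0-3) use 64 KiB pages, 16-bit channels
 * (5-7) use 128 KiB pages.
 */
static uint32_t isa_dma_page_size(unsigned int channel)
{
	return channel <= 3 ? 64u * 1024u : 128u * 1024u;
}

/* Returns true if [phys, phys + len) satisfies both constraints. */
static bool isa_dma_buffer_ok(uint32_t phys, uint32_t len, unsigned int channel)
{
	uint32_t page = isa_dma_page_size(channel);

	if (len == 0 || len > page)
		return false;
	/* The whole buffer must sit below 16 MB of physical memory. */
	if (phys >= ISA_DMA_LIMIT || len > ISA_DMA_LIMIT - phys)
		return false;
	/* The first and last byte must fall in the same DMA page. */
	return (phys / page) == ((phys + len - 1) / page);
}
```

A driver would not normally need such a check after a successful GFP_DMA allocation; the sketch only makes the stated rules concrete, for example when validating a buffer handed in from elsewhere.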
+
+Part III - Address translation
+------------------------------
+
+To translate the virtual address to a physical one, use the normal DMA
+API. Do _not_ use isa_virt_to_phys() even though it does the same
+thing. The reason for this is that the function isa_virt_to_phys()
+will require a Kconfig dependency on ISA, not just ISA_DMA_API which
+is really all you need. Remember that even though the DMA controller
+has its origins in ISA it is used elsewhere.
+
+Note: x86_64 had a broken DMA API when it came to ISA but has since
+been fixed. If your arch has problems then fix the DMA API instead of
+reverting to the ISA functions.
+
+Part IV - Channels
+------------------
+
+A normal ISA DMA controller has 8 channels. The lower four are for
+8-bit transfers and the upper four are for 16-bit transfers.
+
+(Actually the DMA controller is really two separate controllers where
+channel 4 is used to give DMA access for the second controller (0-3).
+This means that of the four 16-bit channels only three are usable.)
+
+You allocate these in a similar fashion as all basic resources:
+
+extern int request_dma(unsigned int dmanr, const char * device_id);
+extern void free_dma(unsigned int dmanr);
+
+The ability to use 16-bit or 8-bit transfers is _not_ up to you as a
+driver author but depends on what the hardware supports. Check your
+specs or test different channels.
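The channel layout described in Part IV can be written down as a small lookup. This is a userspace sketch with hypothetical names (the enum and function are not part of the kernel API); it only encodes what the text states: channels 0-3 are 8-bit, channel 4 is taken by the cascade, and channels 5-7 are the usable 16-bit ones.

```c
#include <assert.h>

/* Hypothetical classification of an ISA DMA channel number. */
enum isa_dma_width {
	ISA_DMA_UNUSABLE = 0,	/* cascade channel or out of range */
	ISA_DMA_8BIT = 8,
	ISA_DMA_16BIT = 16,
};

static enum isa_dma_width isa_dma_channel_width(unsigned int channel)
{
	if (channel <= 3)
		return ISA_DMA_8BIT;
	if (channel >= 5 && channel <= 7)
		return ISA_DMA_16BIT;
	/* Channel 4 links the two controllers; channels above 7 do not exist. */
	return ISA_DMA_UNUSABLE;
}
```

In a driver, the result of such a check would typically decide whether calling request_dma() for that channel makes sense at all.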
+
+Part V - Transfer data
+----------------------
+
+Now for the good stuff, the actual DMA transfer. :)
+
+Before you use any ISA DMA routines you need to claim the DMA lock
+using claim_dma_lock(). The reason is that some DMA operations are
+not atomic so only one driver may fiddle with the registers at a
+time.
+
+The first time you use the DMA controller you should call
+clear_dma_ff(). This clears an internal register in the DMA
+controller that is used for the non-atomic operations. As long as you
+(and everyone else) use the locking functions then you only need to
+reset this once.
+
+Next, you tell the controller in which direction you intend to do the
+transfer using set_dma_mode(). Currently you have the options
+DMA_MODE_READ and DMA_MODE_WRITE.
+
+Set the address from where the transfer should start (this needs to
+be 16-bit aligned for 16-bit transfers) and how many bytes to
+transfer. Note that it's _bytes_. The DMA routines will do all the
+required translation to values that the DMA controller understands.
+
+The final step is enabling the DMA channel and releasing the DMA
+lock.
+
+Once the DMA transfer is finished (or timed out) you should disable
+the channel again. You should also check get_dma_residue() to make
+sure that all data has been transferred.
+
+Example:
+
+unsigned long flags;
+int residue;
+
+flags = claim_dma_lock();
+
+clear_dma_ff(channel);
+
+set_dma_mode(channel, DMA_MODE_WRITE);
+set_dma_addr(channel, phys_addr);
+set_dma_count(channel, num_bytes);
+
+enable_dma(channel);
+
+release_dma_lock(flags);
+
+while (!device_done());
+
+flags = claim_dma_lock();
+
+disable_dma(channel);
+
+residue = get_dma_residue(channel);
+if (residue != 0)
+	printk(KERN_ERR "driver: Incomplete DMA transfer!"
+		" %d bytes left!\n", residue);
+
+release_dma_lock(flags);
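Part V notes that the count is given in bytes and that the DMA routines translate it for the controller. A rough userspace model of that translation is sketched below. The arithmetic is an assumption based on the conventional 8237 programming model (the count register holds the number of transfers minus one, and 16-bit channels count and address in words); the function names are hypothetical and this is not the kernel's implementation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the value written to the count register.
 * Assumption: the register holds "transfers - 1", and channels above 3
 * move 16-bit words, so the byte count is halved for them.
 * The caller must pass bytes >= 1 (and an even count on 16-bit channels).
 */
static uint16_t isa_dma_count_reg(unsigned int channel, uint32_t bytes)
{
	uint32_t count = bytes - 1;

	if (channel > 3)
		count >>= 1;	/* bytes to words */
	return (uint16_t)(count & 0xffff);
}

/*
 * Hypothetical model of the 16-bit offset register: the low part of the
 * physical address, expressed in words on 16-bit channels. This is why
 * the address has to be 16-bit aligned for 16-bit transfers.
 */
static uint16_t isa_dma_offset_reg(unsigned int channel, uint32_t phys)
{
	if (channel > 3)
		phys >>= 1;
	return (uint16_t)(phys & 0xffff);
}
```

The point of the model is only that a driver never does this by hand: it passes byte counts and physical addresses to set_dma_count() and set_dma_addr() and leaves the register encoding to them.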
+
+Part VI - Suspend/resume
+------------------------
+
+It is the driver's responsibility to make sure that the machine isn't
+suspended while a DMA transfer is in progress. Also, all DMA settings
+are lost when the system suspends so if your driver relies on the DMA
+controller being in a certain state then you have to restore these
+registers upon resume.
diff --git a/Documentation/DocBook/kernel-hacking.tmpl b/Documentation/DocBook/kernel-hacking.tmpl
index 49a9ef82d575..6367bba32d22 100644
--- a/Documentation/DocBook/kernel-hacking.tmpl
+++ b/Documentation/DocBook/kernel-hacking.tmpl
@@ -8,8 +8,7 @@
 
  <authorgroup>
  <author>
- <firstname>Paul</firstname>
- <othername>Rusty</othername>
+ <firstname>Rusty</firstname>
  <surname>Russell</surname>
  <affiliation>
  <address>
@@ -20,7 +19,7 @@
  </authorgroup>
 
  <copyright>
- <year>2001</year>
+ <year>2005</year>
  <holder>Rusty Russell</holder>
  </copyright>
 
@@ -64,7 +63,7 @@
  <chapter id="introduction">
  <title>Introduction</title>
  <para>
- Welcome, gentle reader, to Rusty's Unreliable Guide to Linux
+ Welcome, gentle reader, to Rusty's Remarkably Unreliable Guide to Linux
  Kernel Hacking. This document describes the common routines and
  general requirements for kernel code: its goal is to serve as a
  primer for Linux kernel development for experienced C
@@ -96,13 +95,13 @@
 
  <listitem>
  <para>
- not associated with any process, serving a softirq, tasklet or bh;
+ not associated with any process, serving a softirq or tasklet;
  </para>
  </listitem>
 
  <listitem>
  <para>
- running in kernel space, associated with a process;
+ running in kernel space, associated with a process (user context);
  </para>
  </listitem>
 
@@ -114,11 +113,12 @@
  </itemizedlist>
 
  <para>
- There is a strict ordering between these: other than the last
- category (userspace) each can only be pre-empted by those above.
- For example, while a softirq is running on a CPU, no other
- softirq will pre-empt it, but a hardware interrupt can. However,
- any other CPUs in the system execute independently.
+ There is an ordering between these. The bottom two can preempt
+ each other, but above that is a strict hierarchy: each can only be
+ preempted by the ones above it. For example, while a softirq is
+ running on a CPU, no other softirq will preempt it, but a hardware
+ interrupt can. However, any other CPUs in the system execute
+ independently.
  </para>
 
  <para>
@@ -130,10 +130,10 @@
  <title>User Context</title>
 
  <para>
- User context is when you are coming in from a system call or
- other trap: you can sleep, and you own the CPU (except for
- interrupts) until you call <function>schedule()</function>.
- In other words, user context (unlike userspace) is not pre-emptable.
+ User context is when you are coming in from a system call or other
+ trap: like userspace, you can be preempted by more important tasks
+ and by interrupts. You can sleep, by calling
+ <function>schedule()</function>.
  </para>
 
  <note>
@@ -153,7 +153,7 @@
 
  <caution>
  <para>
- Beware that if you have interrupts or bottom halves disabled
+ Beware that if you have preemption or softirqs disabled
  (see below), <function>in_interrupt()</function> will return a
  false positive.
  </para>
@@ -168,10 +168,10 @@
  <hardware>keyboard</hardware> are examples of real
  hardware which produce interrupts at any time. The kernel runs
  interrupt handlers, which services the hardware. The kernel
- guarantees that this handler is never re-entered: if another
+ guarantees that this handler is never re-entered: if the same
  interrupt arrives, it is queued (or dropped). Because it
  disables interrupts, this handler has to be fast: frequently it
- simply acknowledges the interrupt, marks a `software interrupt'
+ simply acknowledges the interrupt, marks a 'software interrupt'
  for execution and exits.
  </para>
 
@@ -188,60 +188,52 @@
  </sect1>
 
  <sect1 id="basics-softirqs">
- <title>Software Interrupt Context: Bottom Halves, Tasklets, softirqs</title>
+ <title>Software Interrupt Context: Softirqs and Tasklets</title>
 
  <para>
  Whenever a system call is about to return to userspace, or a
- hardware interrupt handler exits, any `software interrupts'
+ hardware interrupt handler exits, any 'software interrupts'
  which are marked pending (usually by hardware interrupts) are
  run (<filename>kernel/softirq.c</filename>).
  </para>
 
  <para>
  Much of the real interrupt handling work is done here. Early in
- the transition to <acronym>SMP</acronym>, there were only `bottom
+ the transition to <acronym>SMP</acronym>, there were only 'bottom
  halves' (BHs), which didn't take advantage of multiple CPUs. Shortly
  after we switched from wind-up computers made of match-sticks and snot,
- we abandoned this limitation.
+ we abandoned this limitation and switched to 'softirqs'.
  </para>
 
  <para>
  <filename class="headerfile">include/linux/interrupt.h</filename> lists the
- different BH's. No matter how many CPUs you have, no two BHs will run at
- the same time. This made the transition to SMP simpler, but sucks hard for
- scalable performance. A very important bottom half is the timer
- BH (<filename class="headerfile">include/linux/timer.h</filename>): you
- can register to have it call functions for you in a given length of time.
+ different softirqs. A very important softirq is the
+ timer softirq (<filename
+ class="headerfile">include/linux/timer.h</filename>): you can
+ register to have it call functions for you in a given length of
+ time.
  </para>
 
  <para>
- 2.3.43 introduced softirqs, and re-implemented the (now
- deprecated) BHs underneath them. Softirqs are fully-SMP
- versions of BHs: they can run on as many CPUs at once as
- required. This means they need to deal with any races in shared
- data using their own locks. A bitmask is used to keep track of
- which are enabled, so the 32 available softirqs should not be
- used up lightly. (<emphasis>Yes</emphasis>, people will
- notice).
- </para>
-
- <para>
- tasklets (<filename class="headerfile">include/linux/interrupt.h</filename>)
- are like softirqs, except they are dynamically-registrable (meaning you
- can have as many as you want), and they also guarantee that any tasklet
- will only run on one CPU at any time, although different tasklets can
- run simultaneously (unlike different BHs).
+ Softirqs are often a pain to deal with, since the same softirq
+ will run simultaneously on more than one CPU. For this reason,
+ tasklets (<filename
+ class="headerfile">include/linux/interrupt.h</filename>) are more
+ often used: they are dynamically-registrable (meaning you can have
+ as many as you want), and they also guarantee that any tasklet
+ will only run on one CPU at any time, although different tasklets
+ can run simultaneously.
  </para>
  <caution>
  <para>
- The name `tasklet' is misleading: they have nothing to do with `tasks',
+ The name 'tasklet' is misleading: they have nothing to do with 'tasks',
  and probably more to do with some bad vodka Alexey Kuznetsov had at the
  time.
  </para>
  </caution>
 
  <para>
- You can tell you are in a softirq (or bottom half, or tasklet)
+ You can tell you are in a softirq (or tasklet)
  using the <function>in_softirq()</function> macro
  (<filename class="headerfile">include/linux/interrupt.h</filename>).
  </para>
@@ -288,11 +280,10 @@
  <term>A rigid stack limit</term>
  <listitem>
  <para>
- The kernel stack is about 6K in 2.2 (for most
- architectures: it's about 14K on the Alpha), and shared
- with interrupts so you can't use it all. Avoid deep
- recursion and huge local arrays on the stack (allocate
- them dynamically instead).
+ Depending on configuration options the kernel stack is about 3K to 6K for most 32-bit architectures: it's
+ about 14K on most 64-bit archs, and often shared with interrupts
+ so you can't use it all. Avoid deep recursion and huge local
+ arrays on the stack (allocate them dynamically instead).
  </para>
  </listitem>
  </varlistentry>
@@ -339,7 +330,7 @@ asmlinkage long sys_mycall(int arg)
 
  <para>
  If all your routine does is read or write some parameter, consider
- implementing a <function>sysctl</function> interface instead.
+ implementing a <function>sysfs</function> interface instead.
  </para>
 
  <para>
@@ -417,7 +408,10 @@ cond_resched(); /* Will sleep */
  </para>
 
  <para>
- You will eventually lock up your box if you break these rules.
+ You should always compile your kernel
+ <symbol>CONFIG_DEBUG_SPINLOCK_SLEEP</symbol> on, and it will warn
+ you if you break these rules. If you <emphasis>do</emphasis> break
+ the rules, you will eventually lock up your box.
  </para>
 
  <para>
@@ -515,8 +509,7 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
  success).
  </para>
  </caution>
- [Yes, this moronic interface makes me cringe. Please submit a
- patch and become my hero --RR.]
+ [Yes, this moronic interface makes me cringe. The flamewar comes up every year or so. --RR.]
  </para>
  <para>
  The functions may sleep implicitly. This should never be called
@@ -587,10 +580,11 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
  </variablelist>
 
  <para>
- If you see a <errorname>kmem_grow: Called nonatomically from int
- </errorname> warning message you called a memory allocation function
- from interrupt context without <constant>GFP_ATOMIC</constant>.
- You should really fix that. Run, don't walk.
+ If you see a <errorname>sleeping function called from invalid
+ context</errorname> warning message, then maybe you called a
+ sleeping allocation function from interrupt context without
+ <constant>GFP_ATOMIC</constant>. You should really fix that.
+ Run, don't walk.
  </para>
 
  <para>
@@ -639,16 +633,16 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
  </sect1>
 
  <sect1 id="routines-udelay">
- <title><function>udelay()</function>/<function>mdelay()</function>
+ <title><function>mdelay()</function>/<function>udelay()</function>
  <filename class="headerfile">include/asm/delay.h</filename>
  <filename class="headerfile">include/linux/delay.h</filename>
  </title>
 
  <para>
- The <function>udelay()</function> function can be used for small pauses.
- Do not use large values with <function>udelay()</function> as you risk
+ The <function>udelay()</function> and <function>ndelay()</function> functions can be used for small pauses.
+ Do not use large values with them as you risk
  overflow - the helper function <function>mdelay()</function> is useful
- here, or even consider <function>schedule_timeout()</function>.
+ here, or consider <function>msleep()</function>.
  </para>
  </sect1>
 
@@ -698,8 +692,8 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
  These routines disable soft interrupts on the local CPU, and
  restore them. They are reentrant; if soft interrupts were
  disabled before, they will still be disabled after this pair
- of functions has been called. They prevent softirqs, tasklets
- and bottom halves from running on the current CPU.
+ of functions has been called. They prevent softirqs and tasklets
+ from running on the current CPU.
  </para>
  </sect1>
 
@@ -708,10 +702,16 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
  <filename class="headerfile">include/asm/smp.h</filename></title>
 
  <para>
- <function>smp_processor_id()</function> returns the current
- processor number, between 0 and <symbol>NR_CPUS</symbol> (the
- maximum number of CPUs supported by Linux, currently 32). These
- values are not necessarily continuous.
+ <function>get_cpu()</function> disables preemption (so you won't
+ suddenly get moved to another CPU) and returns the current
+ processor number, between 0 and <symbol>NR_CPUS</symbol>. Note
+ that the CPU numbers are not necessarily continuous. You return
+ it again with <function>put_cpu()</function> when you are done.
+ </para>
+ <para>
+ If you know you cannot be preempted by another task (ie. you are
+ in interrupt context, or have preemption disabled) you can use
+ smp_processor_id().
  </para>
  </sect1>
 
@@ -722,19 +722,14 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
  <para>
  After boot, the kernel frees up a special section; functions
  marked with <type>__init</type> and data structures marked with
- <type>__initdata</type> are dropped after boot is complete (within
- modules this directive is currently ignored). <type>__exit</type>
+ <type>__initdata</type> are dropped after boot is complete: similarly
+ modules discard this memory after initialization. <type>__exit</type>
  is used to declare a function which is only required on exit: the
  function will be dropped if this file is not compiled as a module.
  See the header file for use. Note that it makes no sense for a function
  marked with <type>__init</type> to be exported to modules with
  <function>EXPORT_SYMBOL()</function> - this will break.
  </para>
- <para>
- Static data structures marked as <type>__initdata</type> must be initialised
- (as opposed to ordinary static data which is zeroed BSS) and cannot be
- <type>const</type>.
- </para>
 
  </sect1>
 
@@ -762,9 +757,8 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
  <para>
  The function can return a negative error number to cause
  module loading to fail (unfortunately, this has no effect if
- the module is compiled into the kernel). For modules, this is
- called in user context, with interrupts enabled, and the
- kernel lock held, so it can sleep.
+ the module is compiled into the kernel). This function is
+ called in user context with interrupts enabled, so it can sleep.
  </para>
  </sect1>
 
@@ -779,6 +773,34 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
  reached zero. This function can also sleep, but cannot fail:
  everything must be cleaned up by the time it returns.
  </para>
+
+ <para>
+ Note that this macro is optional: if it is not present, your
+ module will not be removable (except for 'rmmod -f').
+ </para>
+ </sect1>
+
+ <sect1 id="routines-module-use-counters">
+ <title> <function>try_module_get()</function>/<function>module_put()</function>
+ <filename class="headerfile">include/linux/module.h</filename></title>
+
+ <para>
+ These manipulate the module usage count, to protect against
+ removal (a module also can't be removed if another module uses one
+ of its exported symbols: see below). Before calling into module
+ code, you should call <function>try_module_get()</function> on
+ that module: if it fails, then the module is being removed and you
+ should act as if it wasn't there. Otherwise, you can safely enter
+ the module, and call <function>module_put()</function> when you're
+ finished.
+ </para>
+
+ <para>
+ Most registerable structures have an
+ <structfield>owner</structfield> field, such as in the
+ <structname>file_operations</structname> structure. Set this field
+ to the macro <symbol>THIS_MODULE</symbol>.
+ </para>
  </sect1>
 
  <!-- add info on new-style module refcounting here -->
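The try_module_get()/module_put() protocol added in this hunk can be illustrated with a toy, single-threaded model in userspace C. Everything here (struct toy_module and the toy_* functions) is hypothetical and only mirrors the rules stated in the text: a get fails once removal has started, and removal is refused while the usage count is non-zero. The real kernel implementation also has to cope with concurrency, which this sketch deliberately ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a module's lifecycle state. */
enum toy_state { TOY_LIVE, TOY_GOING };

struct toy_module {
	enum toy_state state;
	unsigned int refcount;	/* usage count */
};

/* Models try_module_get(): fails if the module is being removed. */
static bool toy_try_get(struct toy_module *m)
{
	if (m->state != TOY_LIVE)
		return false;
	m->refcount++;
	return true;
}

/* Models module_put(): drops a reference taken by toy_try_get(). */
static void toy_put(struct toy_module *m)
{
	assert(m->refcount > 0);
	m->refcount--;
}

/* Models removal: only allowed once nobody holds a reference. */
static bool toy_try_remove(struct toy_module *m)
{
	if (m->refcount != 0)
		return false;
	m->state = TOY_GOING;
	return true;
}
```

The caller-side pattern is the one the text describes: take a reference, and if that fails, behave as if the module were not there; otherwise call into it and drop the reference afterwards.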
@@ -821,7 +843,7 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
  There is a macro to do this:
  <function>wait_event_interruptible()</function>
 
- <filename class="headerfile">include/linux/sched.h</filename> The
+ <filename class="headerfile">include/linux/wait.h</filename> The
  first argument is the wait queue head, and the second is an
  expression which is evaluated; the macro returns
  <returnvalue>0</returnvalue> when this expression is true, or
@@ -847,10 +869,11 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
  <para>
  Call <function>wake_up()</function>
 
- <filename class="headerfile">include/linux/sched.h</filename>;,
+ <filename class="headerfile">include/linux/wait.h</filename>;,
  which will wake up every process in the queue. The exception is
  if one has <constant>TASK_EXCLUSIVE</constant> set, in which case
- the remainder of the queue will not be woken.
+ the remainder of the queue will not be woken. There are other variants
+ of this basic function available in the same header.
  </para>
  </sect1>
  </chapter>
@@ -863,7 +886,7 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
  first class of operations work on <type>atomic_t</type>
 
  <filename class="headerfile">include/asm/atomic.h</filename>; this
- contains a signed integer (at least 24 bits long), and you must use
+ contains a signed integer (at least 32 bits long), and you must use
  these functions to manipulate or read atomic_t variables.
  <function>atomic_read()</function> and
  <function>atomic_set()</function> get and set the counter,
@@ -882,13 +905,12 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
 
  <para>
  Note that these functions are slower than normal arithmetic, and
- so should not be used unnecessarily. On some platforms they
- are much slower, like 32-bit Sparc where they use a spinlock.
+ so should not be used unnecessarily.
  </para>
 
  <para>
- The second class of atomic operations is atomic bit operations on a
- <type>long</type>, defined in
+ The second class of atomic operations is atomic bit operations on an
+ <type>unsigned long</type>, defined in
 
  <filename class="headerfile">include/linux/bitops.h</filename>. These
  operations generally take a pointer to the bit pattern, and a bit
@@ -899,7 +921,7 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
  <function>test_and_clear_bit()</function> and
  <function>test_and_change_bit()</function> do the same thing,
  except return true if the bit was previously set; these are
- particularly useful for very simple locking.
+ particularly useful for atomically setting flags.
  </para>
 
  <para>
@@ -907,12 +929,6 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
907 than BITS_PER_LONG. The resulting behavior is strange on big-endian 929 than BITS_PER_LONG. The resulting behavior is strange on big-endian
908 platforms though so it is a good idea not to do this. 930 platforms though so it is a good idea not to do this.
909 </para> 931 </para>
910
911 <para>
912 Note that the order of bits depends on the architecture, and in
913 particular, the bitfield passed to these operations must be at
914 least as large as a <type>long</type>.
915 </para>
916 </chapter> 932 </chapter>
917 933
918 <chapter id="symbols"> 934 <chapter id="symbols">
@@ -932,11 +948,8 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
932 <filename class="headerfile">include/linux/module.h</filename></title> 948 <filename class="headerfile">include/linux/module.h</filename></title>
933 949
934 <para> 950 <para>
935 This is the classic method of exporting a symbol, and it works 951 This is the classic method of exporting a symbol: dynamically
936 for both modules and non-modules. In the kernel all these 952 loaded modules will be able to use the symbol as normal.
937 declarations are often bundled into a single file to help
938 genksyms (which searches source files for these declarations).
939 See the comment on genksyms and Makefiles below.
940 </para> 953 </para>
941 </sect1> 954 </sect1>
942 955
@@ -949,7 +962,8 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
949 symbols exported by <function>EXPORT_SYMBOL_GPL()</function> can 962 symbols exported by <function>EXPORT_SYMBOL_GPL()</function> can
950 only be seen by modules with a 963 only be seen by modules with a
951 <function>MODULE_LICENSE()</function> that specifies a GPL 964 <function>MODULE_LICENSE()</function> that specifies a GPL
952 compatible license. 965 compatible license. It implies that the function is considered
966 an internal implementation issue, and not really an interface.
953 </para> 967 </para>
954 </sect1> 968 </sect1>
955 </chapter> 969 </chapter>
@@ -962,12 +976,13 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
962 <filename class="headerfile">include/linux/list.h</filename></title> 976 <filename class="headerfile">include/linux/list.h</filename></title>
963 977
964 <para> 978 <para>
965 There are three sets of linked-list routines in the kernel 979 There used to be three sets of linked-list routines in the kernel
966 headers, but this one seems to be winning out (and Linus has 980 headers, but this one is the winner. If you don't have some
967 used it). If you don't have some particular pressing need for 981 particular pressing need for a single list, it's a good choice.
968 a single list, it's a good choice. In fact, I don't care 982 </para>
969 whether it's a good choice or not, just use it so we can get 983
970 rid of the others. 984 <para>
985 In particular, <function>list_for_each_entry</function> is useful.
971 </para> 986 </para>
972 </sect1> 987 </sect1>
973 988
@@ -979,14 +994,13 @@ printk(KERN_INFO "my ip: %d.%d.%d.%d\n", NIPQUAD(ipaddress));
979 convention, and return <returnvalue>0</returnvalue> for success, 994 convention, and return <returnvalue>0</returnvalue> for success,
980 and a negative error number 995 and a negative error number
981 (eg. <returnvalue>-EFAULT</returnvalue>) for failure. This can be 996 (eg. <returnvalue>-EFAULT</returnvalue>) for failure. This can be
982 unintuitive at first, but it's fairly widespread in the networking 997 unintuitive at first, but it's fairly widespread in the kernel.
983 code, for example.
984 </para> 998 </para>
985 999
986 <para> 1000 <para>
987 The filesystem code uses <function>ERR_PTR()</function> 1001 Using <function>ERR_PTR()</function>
988 1002
989 <filename class="headerfile">include/linux/fs.h</filename>; to 1003 <filename class="headerfile">include/linux/err.h</filename>; to
990 encode a negative error number into a pointer, and 1004 encode a negative error number into a pointer, and
991 <function>IS_ERR()</function> and <function>PTR_ERR()</function> 1005 <function>IS_ERR()</function> and <function>PTR_ERR()</function>
992 to get it back out again: avoids a separate pointer parameter for 1006 to get it back out again: avoids a separate pointer parameter for
@@ -1040,7 +1054,7 @@ static struct block_device_operations opt_fops = {
1040 supported, due to lack of general use, but the following are 1054 supported, due to lack of general use, but the following are
1041 considered standard (see the GCC info page section "C 1055 considered standard (see the GCC info page section "C
1042 Extensions" for more details - Yes, really the info page, the 1056 Extensions" for more details - Yes, really the info page, the
1043 man page is only a short summary of the stuff in info): 1057 man page is only a short summary of the stuff in info).
1044 </para> 1058 </para>
1045 <itemizedlist> 1059 <itemizedlist>
1046 <listitem> 1060 <listitem>
@@ -1091,7 +1105,7 @@ static struct block_device_operations opt_fops = {
1091 </listitem> 1105 </listitem>
1092 <listitem> 1106 <listitem>
1093 <para> 1107 <para>
1094 Function names as strings (__FUNCTION__) 1108 Function names as strings (__func__).
1095 </para> 1109 </para>
1096 </listitem> 1110 </listitem>
1097 <listitem> 1111 <listitem>
@@ -1164,63 +1178,35 @@ static struct block_device_operations opt_fops = {
1164 <listitem> 1178 <listitem>
1165 <para> 1179 <para>
1166 Usually you want a configuration option for your kernel hack. 1180 Usually you want a configuration option for your kernel hack.
1167 Edit <filename>Config.in</filename> in the appropriate directory 1181 Edit <filename>Kconfig</filename> in the appropriate directory.
1168 (but under <filename>arch/</filename> it's called 1182 The Config language is simple to use by cut and paste, and there's
1169 <filename>config.in</filename>). The Config Language used is not 1183 complete documentation in
1170 bash, even though it looks like bash; the safe way is to use only 1184 <filename>Documentation/kbuild/kconfig-language.txt</filename>.
1171 the constructs that you already see in
1172 <filename>Config.in</filename> files (see
1173 <filename>Documentation/kbuild/kconfig-language.txt</filename>).
1174 It's good to run "make xconfig" at least once to test (because
1175 it's the only one with a static parser).
1176 </para>
1177
1178 <para>
1179 Variables which can be Y or N use <type>bool</type> followed by a
1180 tagline and the config define name (which must start with
1181 CONFIG_). The <type>tristate</type> function is the same, but
1182 allows the answer M (which defines
1183 <symbol>CONFIG_foo_MODULE</symbol> in your source, instead of
1184 <symbol>CONFIG_FOO</symbol>) if <symbol>CONFIG_MODULES</symbol>
1185 is enabled.
1186 </para> 1185 </para>
1187 1186
1188 <para> 1187 <para>
1189 You may well want to make your CONFIG option only visible if 1188 You may well want to make your CONFIG option only visible if
1190 <symbol>CONFIG_EXPERIMENTAL</symbol> is enabled: this serves as a 1189 <symbol>CONFIG_EXPERIMENTAL</symbol> is enabled: this serves as a
1191 warning to users. There many other fancy things you can do: see 1190 warning to users. There many other fancy things you can do: see
1192 the various <filename>Config.in</filename> files for ideas. 1191 the various <filename>Kconfig</filename> files for ideas.
1193 </para> 1192 </para>
1194 </listitem>
1195 1193
1196 <listitem>
1197 <para> 1194 <para>
1198 Edit the <filename>Makefile</filename>: the CONFIG variables are 1195 In your description of the option, make sure you address both the
1199 exported here so you can conditionalize compilation with `ifeq'. 1196 expert user and the user who knows nothing about your feature. Mention
1200 If your file exports symbols then add the names to 1197 incompatibilities and issues here. <emphasis> Definitely
1201 <varname>export-objs</varname> so that genksyms will find them. 1198 </emphasis> end your description with <quote> if in doubt, say N
1202 <caution> 1199 </quote> (or, occasionally, `Y'); this is for people who have no
1203 <para> 1200 idea what you are talking about.
1204 There is a restriction on the kernel build system that objects
1205 which export symbols must have globally unique names.
1206 If your object does not have a globally unique name then the
1207 standard fix is to move the
1208 <function>EXPORT_SYMBOL()</function> statements to their own
1209 object with a unique name.
1210 This is why several systems have separate exporting objects,
1211 usually suffixed with ksyms.
1212 </para>
1213 </caution>
1214 </para> 1201 </para>
1215 </listitem> 1202 </listitem>
1216 1203
1217 <listitem> 1204 <listitem>
1218 <para> 1205 <para>
1219 Document your option in Documentation/Configure.help. Mention 1206 Edit the <filename>Makefile</filename>: the CONFIG variables are
1220 incompatibilities and issues here. <emphasis> Definitely 1207 exported here so you can usually just add a "obj-$(CONFIG_xxx) +=
1221 </emphasis> end your description with <quote> if in doubt, say N 1208 xxx.o" line. The syntax is documented in
1222 </quote> (or, occasionally, `Y'); this is for people who have no 1209 <filename>Documentation/kbuild/makefiles.txt</filename>.
1223 idea what you are talking about.
1224 </para> 1210 </para>
1225 </listitem> 1211 </listitem>
1226 1212
@@ -1253,20 +1239,12 @@ static struct block_device_operations opt_fops = {
1253 </para> 1239 </para>
1254 1240
1255 <para> 1241 <para>
1256 <filename>include/linux/brlock.h:</filename> 1242 <filename>include/asm-i386/delay.h:</filename>
1257 </para> 1243 </para>
1258 <programlisting> 1244 <programlisting>
1259extern inline void br_read_lock (enum brlock_indices idx) 1245#define ndelay(n) (__builtin_constant_p(n) ? \
1260{ 1246 ((n) > 20000 ? __bad_ndelay() : __const_udelay((n) * 5ul)) : \
1261 /* 1247 __ndelay(n))
1262 * This causes a link-time bug message if an
1263 * invalid index is used:
1264 */
1265 if (idx >= __BR_END)
1266 __br_lock_usage_bug();
1267
1268 read_lock(&amp;__brlock_array[smp_processor_id()][idx]);
1269}
1270 </programlisting> 1248 </programlisting>
1271 1249
1272 <para> 1250 <para>
diff --git a/Documentation/DocBook/mcabook.tmpl b/Documentation/DocBook/mcabook.tmpl
index 4367f4642f3d..42a760cd7467 100644
--- a/Documentation/DocBook/mcabook.tmpl
+++ b/Documentation/DocBook/mcabook.tmpl
@@ -96,7 +96,7 @@
96 96
97 <chapter id="pubfunctions"> 97 <chapter id="pubfunctions">
98 <title>Public Functions Provided</title> 98 <title>Public Functions Provided</title>
99!Earch/i386/kernel/mca.c 99!Edrivers/mca/mca-legacy.c
100 </chapter> 100 </chapter>
101 101
102 <chapter id="dmafunctions"> 102 <chapter id="dmafunctions">
diff --git a/Documentation/IPMI.txt b/Documentation/IPMI.txt
index 84d3d4d10c17..bf1cf98d2a27 100644
--- a/Documentation/IPMI.txt
+++ b/Documentation/IPMI.txt
@@ -605,12 +605,13 @@ is in the ipmi_poweroff module. When the system requests a powerdown,
605it will send the proper IPMI commands to do this. This is supported on 605it will send the proper IPMI commands to do this. This is supported on
606several platforms. 606several platforms.
607 607
608There is a module parameter named "poweroff_control" that may either be zero 608There is a module parameter named "poweroff_powercycle" that may
609(do a power down) or 2 (do a power cycle, power the system off, then power 609either be zero (do a power down) or non-zero (do a power cycle, power
610it on in a few seconds). Setting ipmi_poweroff.poweroff_control=x will do 610the system off, then power it on in a few seconds). Setting
611the same thing on the kernel command line. The parameter is also available 611ipmi_poweroff.poweroff_control=x will do the same thing on the kernel
612via the proc filesystem in /proc/ipmi/poweroff_control. Note that if the 612command line. The parameter is also available via the proc filesystem
613system does not support power cycling, it will always to the power off. 613in /proc/sys/dev/ipmi/poweroff_powercycle. Note that if the system
614does not support power cycling, it will always do the power off.
614 615
615Note that if you have ACPI enabled, the system will prefer using ACPI to 616Note that if you have ACPI enabled, the system will prefer using ACPI to
616power off. 617power off.
diff --git a/Documentation/RCU/NMI-RCU.txt b/Documentation/RCU/NMI-RCU.txt
new file mode 100644
index 000000000000..d0634a5c3445
--- /dev/null
+++ b/Documentation/RCU/NMI-RCU.txt
@@ -0,0 +1,112 @@
1Using RCU to Protect Dynamic NMI Handlers
2
3
4Although RCU is usually used to protect read-mostly data structures,
5it is possible to use RCU to provide dynamic non-maskable interrupt
6handlers, as well as dynamic irq handlers. This document describes
7how to do this, drawing loosely from Zwane Mwaikambo's NMI-timer
8work in "arch/i386/oprofile/nmi_timer_int.c" and in
9"arch/i386/kernel/traps.c".
10
11The relevant pieces of code are listed below, each followed by a
12brief explanation.
13
14 static int dummy_nmi_callback(struct pt_regs *regs, int cpu)
15 {
16 return 0;
17 }
18
19The dummy_nmi_callback() function is a "dummy" NMI handler that does
20nothing, but returns zero, thus saying that it did nothing, allowing
21the NMI handler to take the default machine-specific action.
22
23 static nmi_callback_t nmi_callback = dummy_nmi_callback;
24
25This nmi_callback variable is a global function pointer to the current
26NMI handler.
27
28 fastcall void do_nmi(struct pt_regs * regs, long error_code)
29 {
30 int cpu;
31
32 nmi_enter();
33
34 cpu = smp_processor_id();
35 ++nmi_count(cpu);
36
37 if (!rcu_dereference(nmi_callback)(regs, cpu))
38 default_do_nmi(regs);
39
40 nmi_exit();
41 }
42
43The do_nmi() function processes each NMI. It first disables preemption
44in the same way that a hardware irq would, then increments the per-CPU
45count of NMIs. It then invokes the NMI handler stored in the nmi_callback
46function pointer. If this handler returns zero, do_nmi() invokes the
47default_do_nmi() function to handle a machine-specific NMI. Finally,
48preemption is restored.
49
50Strictly speaking, rcu_dereference() is not needed, since this code runs
51only on i386, which does not need rcu_dereference() anyway. However,
52it is a good documentation aid, particularly for anyone attempting to
53do something similar on Alpha.
54
55Quick Quiz: Why might the rcu_dereference() be necessary on Alpha,
56 given that the code referenced by the pointer is read-only?
57
58
59Back to the discussion of NMI and RCU...
60
61 void set_nmi_callback(nmi_callback_t callback)
62 {
63 rcu_assign_pointer(nmi_callback, callback);
64 }
65
66The set_nmi_callback() function registers an NMI handler. Note that any
67data that is to be used by the callback must be initialized -before-
68the call to set_nmi_callback(). On architectures that do not order
69writes, the rcu_assign_pointer() ensures that the NMI handler sees the
70initialized values.
71
72 void unset_nmi_callback(void)
73 {
74 rcu_assign_pointer(nmi_callback, dummy_nmi_callback);
75 }
76
77This function unregisters an NMI handler, restoring the original
78dummy_nmi_callback(). However, there may well be an NMI handler
79currently executing on some other CPU. We therefore cannot free
80up any data structures used by the old NMI handler until execution
81of it completes on all other CPUs.
82
83One way to accomplish this is via synchronize_sched(), perhaps as
84follows:
85
86 unset_nmi_callback();
87 synchronize_sched();
88 kfree(my_nmi_data);
89
90This works because synchronize_sched() blocks until all CPUs complete
91any preemption-disabled segments of code that they were executing.
92Since NMI handlers disable preemption, synchronize_sched() is guaranteed
93not to return until all ongoing NMI handlers exit. It is therefore safe
94to free up the handler's data as soon as synchronize_sched() returns.
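The publish/unpublish ordering described above can be sketched in userspace
with C11 atomics. This is only an illustration under stated assumptions, not
the kernel code: callback_t, handler_data, set_callback(), unset_callback()
and dispatch() are hypothetical names, a release store stands in for
rcu_assign_pointer(), and an acquire load stands in for rcu_dereference()
(acquire is stronger than the dependency ordering the kernel relies on). The
grace-period wait that synchronize_sched() provides is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these names are kernel APIs. */
typedef int (*callback_t)(int cpu);

/* Data the "real" handler depends on; must be set before publishing. */
static int handler_data;

/* Default handler: returns 0 to say that it did nothing. */
static int dummy_callback(int cpu)
{
	(void)cpu;
	return 0;
}

/* Example handler that reads data initialized by its registrant. */
static int real_callback(int cpu)
{
	return handler_data + cpu;
}

/* Global pointer to the current handler, as with nmi_callback. */
static _Atomic(callback_t) current_callback = dummy_callback;

/* Publish a handler: the release store orders earlier writes
 * (such as handler_data) before the pointer becomes visible. */
static void set_callback(callback_t cb)
{
	atomic_store_explicit(&current_callback, cb, memory_order_release);
}

/* Restore the dummy handler. Freeing the old handler's data would
 * still require waiting for in-flight handlers (not shown here). */
static void unset_callback(void)
{
	atomic_store_explicit(&current_callback, dummy_callback,
			      memory_order_release);
}

/* Fetch the current handler with an acquire load, then call it. */
static int dispatch(int cpu)
{
	callback_t cb = atomic_load_explicit(&current_callback,
					     memory_order_acquire);
	return cb(cpu);
}
```

A caller would initialize handler_data, then call set_callback(real_callback),
mirroring the rule that the callback's data must be ready before the pointer
is published.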
95
96
97Answer to Quick Quiz
98
99 Why might the rcu_dereference() be necessary on Alpha, given
100 that the code referenced by the pointer is read-only?
101
102 Answer: The caller to set_nmi_callback() might well have
103 initialized some data that is to be used by the
104 new NMI handler. In this case, the rcu_dereference()
105 would be needed, because otherwise a CPU that received
106 an NMI just after the new handler was set might see
107 the pointer to the new NMI handler, but the old
108 pre-initialized version of the handler's data.
109
110 More important, the rcu_dereference() makes it clear
111 to someone reading the code that the pointer is being
112 protected by RCU.
diff --git a/Documentation/RCU/rcuref.txt b/Documentation/RCU/rcuref.txt
new file mode 100644
index 000000000000..a23fee66064d
--- /dev/null
+++ b/Documentation/RCU/rcuref.txt
@@ -0,0 +1,74 @@
1Refcounter framework for elements of lists/arrays protected by
2RCU.
3
4Refcounting on elements of lists which are protected by traditional
5reader/writer spinlocks or semaphores are straight forward as in:
6
71. 2.
8add() search_and_reference()
9{ {
10 alloc_object read_lock(&list_lock);
11 ... search_for_element
12 atomic_set(&el->rc, 1); atomic_inc(&el->rc);
13 write_lock(&list_lock); ...
14 add_element read_unlock(&list_lock);
15 ... ...
16 write_unlock(&list_lock); }
17}
18
193. 4.
20release_referenced() delete()
21{ {
22 ... write_lock(&list_lock);
23 atomic_dec(&el->rc, relfunc) ...
24 ... delete_element
25} write_unlock(&list_lock);
26 ...
27 if (atomic_dec_and_test(&el->rc))
28 kfree(el);
29 ...
30 }
31
32If this list/array is made lock free using rcu, as in changing the
33write_lock in add() and delete() to spin_lock and changing read_lock
34in search_and_reference to rcu_read_lock(), the atomic_inc in
35search_and_reference could potentially hold a reference to an element which
36has already been deleted from the list/array. rcuref_inc_lf takes
37care of this scenario. search_and_reference should look like this:
38
391. 2.
40add() search_and_reference()
41{ {
42 alloc_object rcu_read_lock();
43 ... search_for_element
44 atomic_set(&el->rc, 1); if (rcuref_inc_lf(&el->rc)) {
45 write_lock(&list_lock); rcu_read_unlock();
46 return FAIL;
47 add_element }
48 ... ...
49 write_unlock(&list_lock); rcu_read_unlock();
50} }
513. 4.
52release_referenced() delete()
53{ {
54 ... write_lock(&list_lock);
55 rcuref_dec(&el->rc, relfunc) ...
56 ... delete_element
57} write_unlock(&list_lock);
58 ...
59 if (rcuref_dec_and_test(&el->rc))
60 call_rcu(&el->head, el_free);
61 ...
62 }
63
64Sometimes, reference to the element need to be obtained in the
65update (write) stream. In such cases, rcuref_inc_lf might be an overkill
66since the spinlock serialising list updates are held. rcuref_inc
67is to be used in such cases.
68For arches which do not have cmpxchg rcuref_inc_lf
69api uses a hashed spinlock implementation and the same hashed spinlock
70is acquired in all rcuref_xxx primitives to preserve atomicity.
71Note: Use rcuref_inc api only if you need to use rcuref_inc_lf on the
72refcounter atleast at one place. Mixing rcuref_inc and atomic_xxx api
73might lead to races. rcuref_inc_lf() must be used in lockfree
74RCU critical sections only.
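The idea behind a lock-free reference grab, taking a reference only while the
count is still non-zero, can be sketched in userspace with a C11
compare-and-swap loop. This is a minimal sketch under assumptions, not the
kernel's rcuref implementation: ref_inc_unless_zero() and ref_dec_and_test()
are hypothetical names, and the convention that true means success is chosen
here for illustration only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper, not the kernel's rcuref API.
 * Takes a reference only if the count is still non-zero.
 * Returns true on success, false if the count had already reached zero. */
static bool ref_inc_unless_zero(atomic_int *rc)
{
	int old = atomic_load(rc);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value
		 * and the loop re-checks it against zero before retrying. */
		if (atomic_compare_exchange_weak(rc, &old, old + 1))
			return true;
	}
	return false;	/* element is already on its way to being freed */
}

/* Hypothetical helper: drops one reference and returns true when the
 * caller dropped the last one and may schedule the element for freeing. */
static bool ref_dec_and_test(atomic_int *rc)
{
	return atomic_fetch_sub(rc, 1) == 1;
}
```

A lock-free reader would call ref_inc_unless_zero() on a candidate element and
treat a false return as "not found", since a zero count means the element has
already been deleted and must not be resurrected.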
diff --git a/Documentation/acpi-hotkey.txt b/Documentation/acpi-hotkey.txt
index 0acdc80c30c2..744f1aec6553 100644
--- a/Documentation/acpi-hotkey.txt
+++ b/Documentation/acpi-hotkey.txt
@@ -35,4 +35,4 @@ created. Please use command "cat /proc/acpi/hotkey/polling_method"
35to retrieve it. 35to retrieve it.
36 36
37Note: Use cmdline "acpi_generic_hotkey" to over-ride 37Note: Use cmdline "acpi_generic_hotkey" to over-ride
38loading any platform specific drivers. 38platform-specific with generic driver.
diff --git a/Documentation/applying-patches.txt b/Documentation/applying-patches.txt
new file mode 100644
index 000000000000..681e426e2482
--- /dev/null
+++ b/Documentation/applying-patches.txt
@@ -0,0 +1,439 @@
1
2 Applying Patches To The Linux Kernel
3 ------------------------------------
4
5 (Written by Jesper Juhl, August 2005)
6
7
8
9A frequently asked question on the Linux Kernel Mailing List is how to apply
10a patch to the kernel or, more specifically, what base kernel a patch for
11one of the many trees/branches should be applied to. Hopefully this document
12will explain this to you.
13
14In addition to explaining how to apply and revert patches, a brief
15description of the different kernel trees (and examples of how to apply
16their specific patches) is also provided.
17
18
19What is a patch?
20---
21 A patch is a small text document containing a delta of changes between two
22different versions of a source tree. Patches are created with the `diff'
23program.
24To correctly apply a patch you need to know what base it was generated from
25and what new version the patch will change the source tree into. These
26should both be present in the patch file metadata or be possible to deduce
27from the filename.
28
29
30How do I apply or revert a patch?
31---
32 You apply a patch with the `patch' program. The patch program reads a diff
33(or patch) file and makes the changes to the source tree described in it.
34
35Patches for the Linux kernel are generated relative to the parent directory
36holding the kernel source dir.
37
38This means that paths to files inside the patch file contain the name of the
39kernel source directories it was generated against (or some other directory
40names like "a/" and "b/").
41Since this is unlikely to match the name of the kernel source dir on your
42local machine (but is often useful info to see what version an otherwise
43unlabeled patch was generated against) you should change into your kernel
44source directory and then strip the first element of the path from filenames
45in the patch file when applying it (the -p1 argument to `patch' does this).
46
47To revert a previously applied patch, use the -R argument to patch.
48So, if you applied a patch like this:
49 patch -p1 < ../patch-x.y.z
50
51You can revert (undo) it like this:
52 patch -R -p1 < ../patch-x.y.z
53
54
55How do I feed a patch/diff file to `patch'?
56---
57 This (as usual with Linux and other UNIX like operating systems) can be
58done in several different ways.
59In all the examples below I feed the file (in uncompressed form) to patch
60via stdin using the following syntax:
61 patch -p1 < path/to/patch-x.y.z
62
63If you just want to be able to follow the examples below and don't want to
64know of more than one way to use patch, then you can stop reading this
65section here.
66
67Patch can also get the name of the file to use via the -i argument, like
68this:
69 patch -p1 -i path/to/patch-x.y.z
70
71If your patch file is compressed with gzip or bzip2 and you don't want to
72uncompress it before applying it, then you can feed it to patch like this
73instead:
74 zcat path/to/patch-x.y.z.gz | patch -p1
75 bzcat path/to/patch-x.y.z.bz2 | patch -p1
76
77If you wish to uncompress the patch file by hand first before applying it
78(what I assume you've done in the examples below), then you simply run
79gunzip or bunzip2 on the file - like this:
80 gunzip patch-x.y.z.gz
81 bunzip2 patch-x.y.z.bz2
82
83Which will leave you with a plain text patch-x.y.z file that you can feed to
84patch via stdin or the -i argument, as you prefer.
85
86A few other nice arguments for patch are -s which causes patch to be silent
87except for errors which is nice to prevent errors from scrolling out of the
88screen too fast, and --dry-run which causes patch to just print a listing of
89what would happen, but doesn't actually make any changes. Finally --verbose
90tells patch to print more information about the work being done.
91
92
93Common errors when patching
94---
95 When patch applies a patch file it attempts to verify the sanity of the
96file in different ways.
97Checking that the file looks like a valid patch file and checking that the code
98around the bits being modified matches the context provided in the patch are
99just two of the basic sanity checks patch does.
100
101If patch encounters something that doesn't look quite right it has two
102options. It can either refuse to apply the changes and abort or it can try
103to find a way to make the patch apply with a few minor changes.
104
105One example of something that's not 'quite right' that patch will attempt to
106fix up is if all the context matches, the lines being changed match, but the
107line numbers are different. This can happen, for example, if the patch makes
108a change in the middle of the file but for some reason a few lines have
109been added or removed near the beginning of the file. In that case
110everything looks good; it has just moved up or down a bit, and patch will
111usually adjust the line numbers and apply the patch.
112
113Whenever patch applies a patch that it had to modify a bit to make it fit
114it'll tell you about it by saying the patch applied with 'fuzz'.
115You should be wary of such changes since even though patch probably got it
116right it doesn't /always/ get it right, and the result will sometimes be
117wrong.
118
119When patch encounters a change that it can't fix up with fuzz it rejects it
120outright and leaves a file with a .rej extension (a reject file). You can
121read this file to see exactly what change couldn't be applied, so you can
122go fix it up by hand if you wish.
123
124If you don't have any third party patches applied to your kernel source, but
125only patches from kernel.org and you apply the patches in the correct order,
126and have made no modifications yourself to the source files, then you should
127never see a fuzz or reject message from patch. If you do see such messages
128anyway, then there's a high risk that either your local source tree or the
129patch file is corrupted in some way. In that case you should probably try
130redownloading the patch and if things are still not OK then you'd be advised
131to start with a fresh tree downloaded in full from kernel.org.
132
133Let's look a bit more at some of the messages patch can produce.
134
135If patch stops and presents a "File to patch:" prompt, then patch could not
136find a file to be patched. Most likely you forgot to specify -p1 or you are
137in the wrong directory. Less often, you'll find patches that need to be
138applied with -p0 instead of -p1 (reading the patch file should reveal if
139this is the case - if so, then this is an error by the person who created
140the patch but is not fatal).
141
142If you get "Hunk #2 succeeded at 1887 with fuzz 2 (offset 7 lines)." or a
143message similar to that, then it means that patch had to adjust the location
144of the change (in this example it needed to move 7 lines from where it
145expected to make the change to make it fit).
146The resulting file may or may not be OK, depending on the reason the file
147was different than expected.
148This often happens if you try to apply a patch that was generated against a
149different kernel version than the one you are trying to patch.
150
151If you get a message like "Hunk #3 FAILED at 2387.", then it means that the
152patch could not be applied correctly and the patch program was unable to
153fuzz its way through. This will generate a .rej file with the change that
154caused the patch to fail and also a .orig file showing you the original
155content that couldn't be changed.
156
157If you get "Reversed (or previously applied) patch detected! Assume -R? [n]"
158then patch detected that the change contained in the patch seems to have
159already been made.
160If you actually did apply this patch previously and you just re-applied it
161in error, then just say [n]o and abort this patch. If you applied this patch
162previously and actually intended to revert it, but forgot to specify -R,
163then you can say [y]es here to make patch revert it for you.
164This can also happen if the creator of the patch reversed the source and
165destination directories when creating the patch, and in that case reverting
166the patch will in fact apply it.
167
168A message similar to "patch: **** unexpected end of file in patch" or "patch
169unexpectedly ends in middle of line" means that patch could make no sense of
170the file you fed to it. Either your download is broken or you tried to feed
171patch a compressed patch file without uncompressing it first.
172
173As I already mentioned above, these errors should never happen if you apply
174a patch from kernel.org to the correct version of an unmodified source tree.
175So if you get these errors with kernel.org patches then you should probably
176assume that either your patch file or your tree is broken and I'd advise you
177to start over with a fresh download of a full kernel tree and the patch you
178wish to apply.
179
180
181Are there any alternatives to `patch'?
182---
183 Yes there are alternatives. You can use the `interdiff' program
184(http://cyberelk.net/tim/patchutils/) to generate a patch representing the
185differences between two patches and then apply the result.
186This will let you move from something like 2.6.12.2 to 2.6.12.3 in a single
187step. The -z flag to interdiff will even let you feed it patches in gzip or
188bzip2 compressed form directly without the use of zcat or bzcat or manual
189decompression.
190
191Here's how you'd go from 2.6.12.2 to 2.6.12.3 in a single step:
192 interdiff -z ../patch-2.6.12.2.bz2 ../patch-2.6.12.3.gz | patch -p1
193
194Although interdiff may save you a step or two you are generally advised to
195do the additional steps since interdiff can get things wrong in some cases.
196
197 Another alternative is `ketchup', which is a python script for automatic
198downloading and applying of patches (http://www.selenic.com/ketchup/).
199
200Other nice tools are diffstat which shows a summary of changes made by a
201patch, lsdiff which displays a short listing of affected files in a patch
202file, along with (optionally) the line numbers of the start of each patch,
203and grepdiff which displays a list of the files modified by a patch where
204the patch contains a given regular expression.
205
206
207Where can I download the patches?
208---
209 The patches are available at http://kernel.org/
210Most recent patches are linked from the front page, but they also have
211specific homes.
212
213The 2.6.x.y (-stable) and 2.6.x patches live at
214 ftp://ftp.kernel.org/pub/linux/kernel/v2.6/
215
216The -rc patches live at
217 ftp://ftp.kernel.org/pub/linux/kernel/v2.6/testing/
218
219The -git patches live at
220 ftp://ftp.kernel.org/pub/linux/kernel/v2.6/snapshots/
221
222The -mm kernels live at
223 ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/
224
225In place of ftp.kernel.org you can use ftp.cc.kernel.org, where cc is a
226country code. This way you'll be downloading from a mirror site that's most
227likely geographically closer to you, resulting in faster downloads for you,
228less bandwidth used globally and less load on the main kernel.org servers -
229these are good things, do use mirrors when possible.
230
231
232The 2.6.x kernels
233---
234 These are the base stable releases released by Linus. The highest numbered
235release is the most recent.
236
237If regressions or other serious flaws are found then a -stable fix patch
238will be released (see below) on top of this base. Once a new 2.6.x base
239kernel is released, a patch is made available that is a delta between the
240previous 2.6.x kernel and the new one.
241
242To apply a patch moving from 2.6.11 to 2.6.12 you'd do the following (note
243that such patches do *NOT* apply on top of 2.6.x.y kernels but on top of the
244base 2.6.x kernel - if you need to move from 2.6.x.y to 2.6.x+1 you need to
245first revert the 2.6.x.y patch).
246
247Here are some examples:
248
249# moving from 2.6.11 to 2.6.12
250$ cd ~/linux-2.6.11 # change to kernel source dir
251$ patch -p1 < ../patch-2.6.12 # apply the 2.6.12 patch
252$ cd ..
253$ mv linux-2.6.11 linux-2.6.12 # rename source dir
254
255# moving from 2.6.11.1 to 2.6.12
256$ cd ~/linux-2.6.11.1 # change to kernel source dir
257$ patch -p1 -R < ../patch-2.6.11.1 # revert the 2.6.11.1 patch
258 # source dir is now 2.6.11
259$ patch -p1 < ../patch-2.6.12 # apply new 2.6.12 patch
260$ cd ..
261$ mv linux-2.6.11.1 linux-2.6.12 # rename source dir
262
263
264The 2.6.x.y kernels
265---
266 Kernels with 4 digit versions are -stable kernels. They contain small(ish)
267critical fixes for security problems or significant regressions discovered
268in a given 2.6.x kernel.
269
270This is the recommended branch for users who want the most recent stable
271kernel and are not interested in helping test development/experimental
272versions.
273
274If no 2.6.x.y kernel is available, then the highest numbered 2.6.x kernel is
275the current stable kernel.
276
277These patches are not incremental, meaning that for example the 2.6.12.3
278patch does not apply on top of the 2.6.12.2 kernel source, but rather on top
279of the base 2.6.12 kernel source.
280So, in order to apply the 2.6.12.3 patch to your existing 2.6.12.2 kernel
281source you have to first back out the 2.6.12.2 patch (so you are left with a
282base 2.6.12 kernel source) and then apply the new 2.6.12.3 patch.
283
284Here's a small example:
285
286$ cd ~/linux-2.6.12.2 # change into the kernel source dir
287$ patch -p1 -R < ../patch-2.6.12.2 # revert the 2.6.12.2 patch
288$ patch -p1 < ../patch-2.6.12.3 # apply the new 2.6.12.3 patch
289$ cd ..
290$ mv linux-2.6.12.2 linux-2.6.12.3 # rename the kernel source dir
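
The revert-then-apply rule for -stable patches can be sketched as a small helper that only prints the commands it would run. The function name stable_steps is hypothetical and not part of any kernel tooling; the patch file names follow the ../patch-<version> layout used in the examples above.

```shell
#!/bin/sh
# Print (do not run) the patch commands needed to move a source tree
# between two releases that share the same 2.6.x base.
# stable_steps is an illustrative name, not an existing tool.
stable_steps() {
    from=$1
    to=$2
    # The base is the first three version components, e.g. 2.6.12.
    from_base=$(echo "$from" | cut -d. -f1-3)
    to_base=$(echo "$to" | cut -d. -f1-3)
    if [ "$from_base" != "$to_base" ]; then
        echo "error: $from and $to do not share a base" >&2
        return 1
    fi
    # -stable patches are not incremental: back out the old one first,
    # unless the tree is already a plain base kernel.
    if [ "$from" != "$from_base" ]; then
        echo "patch -p1 -R < ../patch-$from"
    fi
    echo "patch -p1 < ../patch-$to"
}

stable_steps 2.6.12.2 2.6.12.3
```

Printing the commands instead of running them makes it easy to check the plan before touching the source tree.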
291
292
293The -rc kernels
294---
295 These are release-candidate kernels. These are development kernels released
296by Linus whenever he deems the current git (the kernel's source management
297tool) tree to be in a reasonably sane state adequate for testing.
298
299These kernels are not stable and you should expect occasional breakage if
300you intend to run them. This is however the most stable of the main
301development branches and is also what will eventually turn into the next
302stable kernel, so it is important that it be tested by as many people as
303possible.
304
305This is a good branch to run for people who want to help out testing
306development kernels but do not want to run some of the really experimental
307stuff (such people should see the sections about -git and -mm kernels below).
308
309The -rc patches are not incremental, they apply to a base 2.6.x kernel, just
310like the 2.6.x.y patches described above. The kernel version before the -rcN
311suffix denotes the version of the kernel that this -rc kernel will eventually
312turn into.
313So, 2.6.13-rc5 means that this is the fifth release candidate for the 2.6.13
314kernel and the patch should be applied on top of the 2.6.12 kernel source.
315
316Here are 3 examples of how to apply these patches:
317
318# first an example of moving from 2.6.12 to 2.6.13-rc3
319$ cd ~/linux-2.6.12 # change into the 2.6.12 source dir
320$ patch -p1 < ../patch-2.6.13-rc3 # apply the 2.6.13-rc3 patch
321$ cd ..
322$ mv linux-2.6.12 linux-2.6.13-rc3 # rename the source dir
323
324# now let's move from 2.6.13-rc3 to 2.6.13-rc5
325$ cd ~/linux-2.6.13-rc3 # change into the 2.6.13-rc3 dir
326$ patch -p1 -R < ../patch-2.6.13-rc3 # revert the 2.6.13-rc3 patch
327$ patch -p1 < ../patch-2.6.13-rc5 # apply the new 2.6.13-rc5 patch
328$ cd ..
329$ mv linux-2.6.13-rc3 linux-2.6.13-rc5 # rename the source dir
330
331# finally let's try and move from 2.6.12.3 to 2.6.13-rc5
332$ cd ~/linux-2.6.12.3 # change to the kernel source dir
333$ patch -p1 -R < ../patch-2.6.12.3 # revert the 2.6.12.3 patch
334$ patch -p1 < ../patch-2.6.13-rc5 # apply new 2.6.13-rc5 patch
335$ cd ..
336$ mv linux-2.6.12.3 linux-2.6.13-rc5 # rename the kernel source dir
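
Since an -rc patch always applies to the base kernel preceding the release it will turn into, the base version can be derived from the patch name. The following is a minimal sketch; rc_base is a hypothetical helper name, and it assumes names of the form 2.6.x-rcN as used in this document.

```shell
#!/bin/sh
# Derive the base kernel an -rc patch applies to.
# rc_base is an illustrative name; input is assumed to look like 2.6.13-rc5.
rc_base() {
    target=${1%-rc*}      # strip the -rcN suffix:   2.6.13-rc5 -> 2.6.13
    prefix=${target%.*}   # everything before the last dot:     2.6
    last=${target##*.}    # the last version component:         13
    echo "$prefix.$((last - 1))"
}

rc_base 2.6.13-rc5
```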
337
338
339The -git kernels
340---
341 These are daily snapshots of Linus' kernel tree (managed in a git
342repository, hence the name).
343
344These patches are usually released daily and represent the current state of
345Linus' tree. They are more experimental than -rc kernels since they are
346generated automatically without even a cursory glance to see if they are
347sane.
348
349-git patches are not incremental and apply either to a base 2.6.x kernel or
350a base 2.6.x-rc kernel - you can see which from their name.
351A patch named 2.6.12-git1 applies to the 2.6.12 kernel source and a patch
352named 2.6.13-rc3-git2 applies to the source of the 2.6.13-rc3 kernel.
353
354Here are some examples of how to apply these patches:
355
356# moving from 2.6.12 to 2.6.12-git1
357$ cd ~/linux-2.6.12 # change to the kernel source dir
358$ patch -p1 < ../patch-2.6.12-git1 # apply the 2.6.12-git1 patch
359$ cd ..
360$ mv linux-2.6.12 linux-2.6.12-git1 # rename the kernel source dir
361
362# moving from 2.6.12-git1 to 2.6.13-rc2-git3
363$ cd ~/linux-2.6.12-git1 # change to the kernel source dir
364$ patch -p1 -R < ../patch-2.6.12-git1 # revert the 2.6.12-git1 patch
365 # we now have a 2.6.12 kernel
366$ patch -p1 < ../patch-2.6.13-rc2 # apply the 2.6.13-rc2 patch
367 # the kernel is now 2.6.13-rc2
368$ patch -p1 < ../patch-2.6.13-rc2-git3 # apply the 2.6.13-rc2-git3 patch
369 # the kernel is now 2.6.13-rc2-git3
370$ cd ..
371$ mv linux-2.6.12-git1 linux-2.6.13-rc2-git3 # rename source dir
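
The naming rule for -git patches (the part before -gitN names the kernel the patch applies to) can be expressed in a couple of lines of shell. This is only a sketch; git_base is a hypothetical helper name.

```shell
#!/bin/sh
# Derive the kernel version a -git patch applies to from its name.
# git_base is an illustrative name, not an existing tool.
git_base() {
    case $1 in
        *-git*) echo "${1%-git*}" ;;   # drop the trailing -gitN
        *) echo "not a -git version: $1" >&2; return 1 ;;
    esac
}

git_base 2.6.12-git1
git_base 2.6.13-rc3-git2
```

If the result still carries an -rc suffix, the matching -rc patch has to be applied to its own base first, as in the second example above.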
372
373
374The -mm kernels
375---
376 These are experimental kernels released by Andrew Morton.
377
378The -mm tree serves as a sort of proving ground for new features and other
379experimental patches.
380Once a patch has proved its worth in -mm for a while Andrew pushes it on to
381Linus for inclusion in mainline.
382
383Although it's encouraged that patches flow to Linus via the -mm tree, this
384is not always enforced.
385Subsystem maintainers (or individuals) sometimes push their patches directly
386to Linus, even though (or after) they have been merged and tested in -mm (or
387sometimes even without prior testing in -mm).
388
389You should generally strive to get your patches into mainline via -mm to
390ensure maximum testing.
391
392This branch is in constant flux and contains many experimental features, a
393lot of debugging patches not appropriate for mainline etc and is the most
394experimental of the branches described in this document.
395
396These kernels are not appropriate for use on systems that are supposed to be
397stable and they are more risky to run than any of the other branches (make
398sure you have up-to-date backups - that goes for any experimental kernel but
399even more so for -mm kernels).
400
401These kernels in addition to all the other experimental patches they contain
402usually also contain any changes in the mainline -git kernels available at
403the time of release.
404
405Testing of -mm kernels is greatly appreciated since the whole point of the
406tree is to weed out regressions, crashes, data corruption bugs, build
407breakage (and any other bug in general) before changes are merged into the
408more stable mainline Linus tree.
409But testers of -mm should be aware that breakage in this tree is more common
410than in any other tree.
411
412The -mm kernels are not released on a fixed schedule, but usually a few -mm
413kernels are released in between each -rc kernel (1 to 3 is common).
414The -mm kernels apply to either a base 2.6.x kernel (when no -rc kernels
415have been released yet) or to a Linus -rc kernel.
416
417Here are some examples of applying the -mm patches:
418
419# moving from 2.6.12 to 2.6.12-mm1
420$ cd ~/linux-2.6.12 # change to the 2.6.12 source dir
421$ patch -p1 < ../2.6.12-mm1 # apply the 2.6.12-mm1 patch
422$ cd ..
423$ mv linux-2.6.12 linux-2.6.12-mm1 # rename the source appropriately
424
425# moving from 2.6.12-mm1 to 2.6.13-rc3-mm3
426$ cd ~/linux-2.6.12-mm1
427$ patch -p1 -R < ../2.6.12-mm1 # revert the 2.6.12-mm1 patch
428 # we now have a 2.6.12 source
429$ patch -p1 < ../patch-2.6.13-rc3 # apply the 2.6.13-rc3 patch
430 # we now have a 2.6.13-rc3 source
431$ patch -p1 < ../2.6.13-rc3-mm3 # apply the 2.6.13-rc3-mm3 patch
432$ cd ..
433$ mv linux-2.6.12-mm1 linux-2.6.13-rc3-mm3 # rename the source dir
434
435
436This concludes this list of explanations of the various kernel trees and I
437hope you are now crystal clear on how to apply the various patches and help
438test the kernel.
439
diff --git a/Documentation/cdrom/sonycd535 b/Documentation/cdrom/sonycd535
index 59581a4b302a..b81e109970aa 100644
--- a/Documentation/cdrom/sonycd535
+++ b/Documentation/cdrom/sonycd535
@@ -68,7 +68,8 @@ it a better device citizen. Further thanks to Joel Katz
68Porfiri Claudio <C.Porfiri@nisms.tei.ericsson.se> for patches 68Porfiri Claudio <C.Porfiri@nisms.tei.ericsson.se> for patches
69to make the driver work with the older CDU-510/515 series, and 69to make the driver work with the older CDU-510/515 series, and
70Heiko Eissfeldt <heiko@colossus.escape.de> for pointing out that 70Heiko Eissfeldt <heiko@colossus.escape.de> for pointing out that
71the verify_area() checks were ignoring the results of said checks. 71the verify_area() checks were ignoring the results of said checks
72(note: verify_area() has since been replaced by access_ok()).
72 73
73(Acknowledgments from Ron Jeppesen in the 0.3 release:) 74(Acknowledgments from Ron Jeppesen in the 0.3 release:)
74Thanks to Corey Minyard who wrote the original CDU-31A driver on which 75Thanks to Corey Minyard who wrote the original CDU-31A driver on which
diff --git a/Documentation/cpusets.txt b/Documentation/cpusets.txt
index ad944c060312..47f4114fbf54 100644
--- a/Documentation/cpusets.txt
+++ b/Documentation/cpusets.txt
@@ -60,6 +60,18 @@ all of the cpus in the system. This removes any overhead due to
60load balancing code trying to pull tasks outside of the cpu exclusive 60load balancing code trying to pull tasks outside of the cpu exclusive
61cpuset only to be prevented by the tasks' cpus_allowed mask. 61cpuset only to be prevented by the tasks' cpus_allowed mask.
62 62
63A cpuset that is mem_exclusive restricts kernel allocations for
64page, buffer and other data commonly shared by the kernel across
65multiple users. All cpusets, whether mem_exclusive or not, restrict
66allocations of memory for user space. This enables configuring a
67system so that several independent jobs can share common kernel
68data, such as file system pages, while isolating each job's user
69allocation in its own cpuset. To do this, construct a large
70mem_exclusive cpuset to hold all the jobs, and construct child,
71non-mem_exclusive cpusets for each individual job. Only a small
72amount of typical kernel memory, such as requests from interrupt
73handlers, is allowed to be taken outside even a mem_exclusive cpuset.
74
63User level code may create and destroy cpusets by name in the cpuset 75User level code may create and destroy cpusets by name in the cpuset
64virtual file system, manage the attributes and permissions of these 76virtual file system, manage the attributes and permissions of these
65cpusets and which CPUs and Memory Nodes are assigned to each cpuset, 77cpusets and which CPUs and Memory Nodes are assigned to each cpuset,
diff --git a/Documentation/crypto/api-intro.txt b/Documentation/crypto/api-intro.txt
index a2d5b4900772..74dffc68ff9f 100644
--- a/Documentation/crypto/api-intro.txt
+++ b/Documentation/crypto/api-intro.txt
@@ -223,6 +223,7 @@ CAST5 algorithm contributors:
223 223
224TEA/XTEA algorithm contributors: 224TEA/XTEA algorithm contributors:
225 Aaron Grothe 225 Aaron Grothe
226 Michael Ringe
226 227
227Khazad algorithm contributors: 228Khazad algorithm contributors:
228 Aaron Grothe 229 Aaron Grothe
diff --git a/Documentation/dcdbas.txt b/Documentation/dcdbas.txt
new file mode 100644
index 000000000000..e1c52e2dc361
--- /dev/null
+++ b/Documentation/dcdbas.txt
@@ -0,0 +1,91 @@
1Overview
2
3The Dell Systems Management Base Driver provides a sysfs interface for
4systems management software such as Dell OpenManage to perform system
5management interrupts and host control actions (system power cycle or
6power off after OS shutdown) on certain Dell systems.
7
8Dell OpenManage requires this driver on the following Dell PowerEdge systems:
9300, 1300, 1400, 400SC, 500SC, 1500SC, 1550, 600SC, 1600SC, 650, 1655MC,
10700, and 750. Other Dell software such as the open source libsmbios project
11is expected to make use of this driver, and it may include the use of this
12driver on other Dell systems.
13
14The Dell libsmbios project aims towards providing access to as much BIOS
15information as possible. See http://linux.dell.com/libsmbios/main/ for
16more information about the libsmbios project.
17
18
19System Management Interrupt
20
21On some Dell systems, systems management software must access certain
22management information via a system management interrupt (SMI). The SMI data
23buffer must reside in 32-bit address space, and the physical address of the
24buffer is required for the SMI. The driver maintains the memory required for
25the SMI and provides a way for the application to generate the SMI.
26The driver creates the following sysfs entries for systems management
27software to perform these system management interrupts:
28
29/sys/devices/platform/dcdbas/smi_data
30/sys/devices/platform/dcdbas/smi_data_buf_phys_addr
31/sys/devices/platform/dcdbas/smi_data_buf_size
32/sys/devices/platform/dcdbas/smi_request
33
34Systems management software must perform the following steps to execute
35a SMI using this driver:
36
371) Lock smi_data.
382) Write system management command to smi_data.
393) Write "1" to smi_request to generate a calling interface SMI or
40 "2" to generate a raw SMI.
414) Read system management command response from smi_data.
425) Unlock smi_data.
43
44
45Host Control Action
46
47Dell OpenManage supports a host control feature that allows the administrator
48to perform a power cycle or power off of the system after the OS has finished
49shutting down. On some Dell systems, this host control feature requires that
50a driver perform a SMI after the OS has finished shutting down.
51
52The driver creates the following sysfs entries for systems management software
53to schedule the driver to perform a power cycle or power off host control
54action after the system has finished shutting down:
55
56/sys/devices/platform/dcdbas/host_control_action
57/sys/devices/platform/dcdbas/host_control_smi_type
58/sys/devices/platform/dcdbas/host_control_on_shutdown
59
60Dell OpenManage performs the following steps to execute a power cycle or
61power off host control action using this driver:
62
631) Write host control action to be performed to host_control_action.
642) Write type of SMI that driver needs to perform to host_control_smi_type.
653) Write "1" to host_control_on_shutdown to enable host control action.
664) Initiate OS shutdown.
67 (Driver will perform host control SMI when it is notified that the OS
68 has finished shutting down.)
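
Steps 1 to 3 above can be sketched as a shell function. This is an illustration only: schedule_host_control and the DCDBAS_DIR override are hypothetical names, and the values to write for the action and SMI type are not given in this document, so they are left as caller-supplied arguments. Step 4 (initiating the OS shutdown) is deliberately left out.

```shell
#!/bin/sh
# DCDBAS_DIR defaults to the sysfs directory named in the text; the
# override exists only so the sketch can be tried on a scratch directory.
DCDBAS_DIR=${DCDBAS_DIR:-/sys/devices/platform/dcdbas}

# schedule_host_control ACTION SMI_TYPE
# ACTION and SMI_TYPE are placeholders; the real values must be taken
# from the driver, they are not documented here.
schedule_host_control() {
    action=$1
    smi_type=$2
    echo "$action" > "$DCDBAS_DIR/host_control_action" || return 1      # step 1
    echo "$smi_type" > "$DCDBAS_DIR/host_control_smi_type" || return 1  # step 2
    echo 1 > "$DCDBAS_DIR/host_control_on_shutdown" || return 1         # step 3
}
```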
69
70
71Host Control SMI Type
72
73The following table shows the value to write to host_control_smi_type to
74perform a power cycle or power off host control action:
75
76PowerEdge System Host Control SMI Type
77---------------- ---------------------
78 300 HC_SMITYPE_TYPE1
79 1300 HC_SMITYPE_TYPE1
80 1400 HC_SMITYPE_TYPE2
81 500SC HC_SMITYPE_TYPE2
82 1500SC HC_SMITYPE_TYPE2
83 1550 HC_SMITYPE_TYPE2
84 600SC HC_SMITYPE_TYPE2
85 1600SC HC_SMITYPE_TYPE2
86 650 HC_SMITYPE_TYPE2
87 1655MC HC_SMITYPE_TYPE2
88 700 HC_SMITYPE_TYPE3
89 750 HC_SMITYPE_TYPE3
90
91
diff --git a/Documentation/dell_rbu.txt b/Documentation/dell_rbu.txt
new file mode 100644
index 000000000000..bcfa5c35036b
--- /dev/null
+++ b/Documentation/dell_rbu.txt
@@ -0,0 +1,74 @@
1Purpose:
2Demonstrate the usage of the new open sourced rbu (Remote BIOS Update) driver
3for updating BIOS images on Dell servers and desktops.
4
5Scope:
6This document discusses the functionality of the rbu driver only.
7It does not cover the support needed from applications to enable the BIOS to
8update itself with the image downloaded into memory.
9
10Overview:
11This driver works with Dell OpenManage or Dell Update Packages for updating
12the BIOS on Dell servers (starting from servers sold since 1999), desktops
13and notebooks (starting from those sold in 2005).
14Register at http://support.dell.com to find information on
15OpenManage and Dell Update Packages (DUP).
16
17The dell_rbu driver supports BIOS updates using the monolithic image and
18packetized image methods. In the monolithic case the driver allocates a
19contiguous chunk of physical pages holding the BIOS image. In the packetized
20case the application using the driver breaks the image into packets of fixed
21size and the driver places each packet in contiguous physical memory. The
22driver also maintains a linked list of packets for reading them back.
23If the dell_rbu driver is unloaded all the allocated memory is freed.
24
25The rbu driver needs to have an application which will inform the BIOS to
26enable the update in the next system reboot.
27
28The user should not unload the rbu driver after downloading the BIOS image
29or updating.
30
31Loading the driver creates the following entries under the /sys file system.
32/sys/class/firmware/dell_rbu/loading
33/sys/class/firmware/dell_rbu/data
34/sys/devices/platform/dell_rbu/image_type
35/sys/devices/platform/dell_rbu/data
36
37The driver supports two types of update mechanism: monolithic and packetized.
38Which mechanism is used depends upon the BIOS currently running on the system.
39Most Dell systems support a monolithic update, where the BIOS image is
40copied to a single contiguous block of physical memory.
41With the packet mechanism the single block can be broken into smaller chunks
42of contiguous memory and the BIOS image is scattered across these packets.
43
44By default the driver uses the monolithic update type. This can be
45changed to packetized at driver load time by specifying the load
46parameter image_type=packet. It can also be changed later as below:
47echo packet > /sys/devices/platform/dell_rbu/image_type
48
49Do the steps below to download the BIOS image.
501) echo 1 > /sys/class/firmware/dell_rbu/loading
512) cp bios_image.hdr /sys/class/firmware/dell_rbu/data
523) echo 0 > /sys/class/firmware/dell_rbu/loading
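
The three download steps above can be wrapped in a small function. This is a sketch under assumptions: rbu_download and the RBU_FW_DIR override are hypothetical names introduced for illustration, and on a real system the function must run as root with the dell_rbu driver loaded.

```shell
#!/bin/sh
# RBU_FW_DIR defaults to the firmware class directory named in the text;
# the override exists only so the sketch can be tried on a scratch directory.
RBU_FW_DIR=${RBU_FW_DIR:-/sys/class/firmware/dell_rbu}

# rbu_download IMAGE: run steps 1 to 3 for one image (or one packet).
rbu_download() {
    image=$1
    [ -r "$image" ] || { echo "cannot read $image" >&2; return 1; }
    echo 1 > "$RBU_FW_DIR/loading" || return 1   # step 1: start loading
    cp "$image" "$RBU_FW_DIR/data" || return 1   # step 2: copy the image
    echo 0 > "$RBU_FW_DIR/loading" || return 1   # step 3: loading finished
}

# Usage: rbu_download bios_image.hdr
```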
53
54The /sys/class/firmware/dell_rbu/ entries will remain till the following is
55done.
56echo -1 > /sys/class/firmware/dell_rbu/loading
57
58Until this step is completed the driver cannot be unloaded.
59
60The driver also provides the read-only file /sys/devices/platform/dell_rbu/data
61to read back the downloaded image. This is useful with the packet update
62mechanism, where steps 1, 2 and 3 above are repeated for every packet.
63By reading the /sys/devices/platform/dell_rbu/data file all packet data
64downloaded can be verified in a single file.
65The packets are arranged in this file one after the other in a FIFO order.
66
67NOTE:
68This driver requires a patch for firmware_class.c which adds the
69request_firmware_nowait_nohotplug function in order to work.
70Also, after updating the BIOS image a user mode application needs to execute
71code which sends the BIOS update request to the BIOS, so that on the next
72reboot the BIOS knows about the newly downloaded image and updates itself.
73Also, do not unload the rbu driver if the image has to be updated.
74
diff --git a/Documentation/dvb/bt8xx.txt b/Documentation/dvb/bt8xx.txt
index e6b8d05bc08d..cb63b7a93c82 100644
--- a/Documentation/dvb/bt8xx.txt
+++ b/Documentation/dvb/bt8xx.txt
@@ -1,55 +1,74 @@
1How to get the Nebula Electronics DigiTV, Pinnacle PCTV Sat, Twinhan DST + clones working 1How to get the Nebula, PCTV and Twinhan DST cards working
2========================================================================================= 2=========================================================
3 3
41) General information 4This class of cards has a bt878a as the PCI interface, and
5====================== 5require the bttv driver.
6 6
7This class of cards has a bt878a chip as the PCI interface. 7Please pay close attention to the warning about the bttv module
8The different card drivers require the bttv driver to provide the means 8options below for the DST card.
9to access the i2c bus and the gpio pins of the bt8xx chipset.
10 9
112) Compilation rules for Kernel >= 2.6.12 101) General informations
12========================================= 11=======================
13 12
14Enable the following options: 13These drivers require the bttv driver to provide the means to access
14the i2c bus and the gpio pins of the bt8xx chipset.
15 15
16Because of this, you need to enable
16"Device drivers" => "Multimedia devices" 17"Device drivers" => "Multimedia devices"
17 => "Video For Linux" => "BT848 Video For Linux" 18 => "Video For Linux" => "BT848 Video For Linux"
19
20Furthermore you need to enable
18"Device drivers" => "Multimedia devices" => "Digital Video Broadcasting Devices" 21"Device drivers" => "Multimedia devices" => "Digital Video Broadcasting Devices"
19 => "DVB for Linux" "DVB Core Support" "Nebula/Pinnacle PCTV/TwinHan PCI Cards" 22 => "DVB for Linux" "DVB Core Support" "BT8xx based PCI cards"
20 23
213) Loading Modules, described by two approaches 242) Loading Modules
22=============================================== 25==================
23 26
24In general you need to load the bttv driver, which will handle the gpio and 27In general you need to load the bttv driver, which will handle the gpio and
25i2c communication for us, plus the common dvb-bt8xx device driver, 28i2c communication for us, plus the common dvb-bt8xx device driver.
26which is called the backend. 29The frontends for Nebula (nxt6000), Pinnacle PCTV (cx24110) and
27The frontends for Nebula DigiTV (nxt6000), Pinnacle PCTV Sat (cx24110), 30TwinHan (dst) are loaded automatically by the dvb-bt8xx device driver.
28TwinHan DST + clones (dst and dst-ca) are loaded automatically by the backend.
29For further details about TwinHan DST + clones see /Documentation/dvb/ci.txt.
30 31
313a) The manual approach 323a) Nebula / Pinnacle PCTV
32----------------------- 33--------------------------
33 34
34Loading modules: 35 $ modprobe bttv (normally bttv is being loaded automatically by kmod)
35modprobe bttv 36 $ modprobe dvb-bt8xx (or just place dvb-bt8xx in /etc/modules for automatic loading)
36modprobe dvb-bt8xx
37 37
38Unloading modules:
39modprobe -r dvb-bt8xx
40modprobe -r bttv
41 38
423b) The automatic approach 393b) TwinHan and Clones
43-------------------------- 40--------------------------
44 41
45If not already done by installation, place a line either in 42 $ modprobe bttv i2c_hw=1 card=0x71
46/etc/modules.conf or in /etc/modprobe.conf containing this text: 43 $ modprobe dvb-bt8xx
47alias char-major-81 bttv 44 $ modprobe dst
45
46The value 0x71 will override the PCI type detection for dvb-bt8xx,
47which is necessary for TwinHan cards.
48
49If you have an older card (blue color circuit) and card=0x71 locks
50your machine, try using 0x68 instead. If that does not work, ask on the
51mailing list.
52
53The DST module takes a couple of useful parameters.
54
55verbose takes values 0 to 4. These values control the verbosity level,
56and can be used to debug also.
57
58verbose=0 means complete disabling of messages
59 1 only error messages are displayed
60 2 notifications are also displayed
61 3 informational messages are also displayed
62 4 debug setting
63
64dst_addons takes values 0 and 0x20. A value of 0 means it is a FTA card.
650x20 means it has a Conditional Access slot.
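
As an illustration of the two parameters described above, the following sketch builds a modprobe command line and rejects values outside the documented ranges. The function name dst_modprobe_cmd is hypothetical; it only prints the command and does not load the module.

```shell
#!/bin/sh
# Build (but do not run) a modprobe command for the dst module.
# dst_modprobe_cmd VERBOSE DST_ADDONS; the name is illustrative only.
dst_modprobe_cmd() {
    case $1 in
        [0-4]) ;;    # verbosity levels 0 to 4, as listed above
        *) echo "verbose must be 0 to 4" >&2; return 1 ;;
    esac
    case $2 in
        0|0x20) ;;   # 0 = FTA card, 0x20 = Conditional Access slot
        *) echo "dst_addons must be 0 or 0x20" >&2; return 1 ;;
    esac
    echo "modprobe dst verbose=$1 dst_addons=$2"
}

dst_modprobe_cmd 4 0x20
```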
66
67The autodetected values are determined by the card's 'response
68string' which you can see in your logs e.g.
48 69
49Then place a line in /etc/modules containing this text: 70dst_get_device_id: Recognise [DSTMCI]
50dvb-bt8xx
51 71
52Reboot your system and have fun!
53 72
54-- 73--
55Authors: Richard Walker, Jamie Honan, Michael Hunold, Manu Abraham, Uwe Bugla 74Authors: Richard Walker, Jamie Honan, Michael Hunold, Manu Abraham
diff --git a/Documentation/dvb/ci.txt b/Documentation/dvb/ci.txt
index 62e0701b542a..95f0e73b2135 100644
--- a/Documentation/dvb/ci.txt
+++ b/Documentation/dvb/ci.txt
@@ -23,7 +23,6 @@ This application requires the following to function properly as of now.
23 eg: $ szap -c channels.conf -r "TMC" -x 23 eg: $ szap -c channels.conf -r "TMC" -x
24 24
25 (b) a channels.conf containing a valid PMT PID 25 (b) a channels.conf containing a valid PMT PID
26
27 eg: TMC:11996:h:0:27500:278:512:650:321 26 eg: TMC:11996:h:0:27500:278:512:650:321
28 27
29 here 278 is a valid PMT PID. the rest of the values are the 28 here 278 is a valid PMT PID. the rest of the values are the
@@ -31,13 +30,7 @@ This application requires the following to function properly as of now.
31 30
32 (c) after running a szap, you have to run ca_zap, for the 31 (c) after running a szap, you have to run ca_zap, for the
33 descrambler to function, 32 descrambler to function,
34 33 eg: $ ca_zap channels.conf "TMC"
35 eg: $ ca_zap patched_channels.conf "TMC"
36
37 The patched means a patch to apply to scan, such that scan can
38 generate a channels.conf_with pmt, which has this PMT PID info
39 (NOTE: szap cannot use this channels.conf with the PMT_PID)
40
41 34
42 (d) Hopefully Enjoy your favourite subscribed channel as you do with 35 (d) Hopefully Enjoy your favourite subscribed channel as you do with
43 a FTA card. 36 a FTA card.
diff --git a/Documentation/exception.txt b/Documentation/exception.txt
index f1d436993eb1..3cb39ade290e 100644
--- a/Documentation/exception.txt
+++ b/Documentation/exception.txt
@@ -7,7 +7,7 @@ To protect itself the kernel has to verify this address.
7 7
8In older versions of Linux this was done with the 8In older versions of Linux this was done with the
9int verify_area(int type, const void * addr, unsigned long size) 9int verify_area(int type, const void * addr, unsigned long size)
10function. 10function (which has since been replaced by access_ok()).
11 11
12This function verified that the memory area starting at address 12This function verified that the memory area starting at address
13addr and of size size was accessible for the operation specified 13addr and of size size was accessible for the operation specified
diff --git a/Documentation/fb/cyblafb/bugs b/Documentation/fb/cyblafb/bugs
new file mode 100644
index 000000000000..f90cc66ea919
--- /dev/null
+++ b/Documentation/fb/cyblafb/bugs
@@ -0,0 +1,14 @@
1Bugs
2====
3
4I currently don't know of any bug. Please do send reports to:
5 - linux-fbdev-devel@lists.sourceforge.net
6 - Knut_Petersen@t-online.de.
7
8
9Untested features
10=================
11
12All LCD stuff is untested. If it worked in tridentfb, it should work in
13cyblafb. Please test and report the results to Knut_Petersen@t-online.de.
14
diff --git a/Documentation/fb/cyblafb/credits b/Documentation/fb/cyblafb/credits
new file mode 100644
index 000000000000..0eb3b443dc2b
--- /dev/null
+++ b/Documentation/fb/cyblafb/credits
@@ -0,0 +1,7 @@
1Thanks to
2=========
3 * Alan Hourihane, for writing the X trident driver
4 * Jani Monoses, for writing the tridentfb driver
5 * Antonino A. Daplas, for review of the first published
6 version of cyblafb and some code
7 * Jochen Hein, for testing and a helpful bug report
diff --git a/Documentation/fb/cyblafb/documentation b/Documentation/fb/cyblafb/documentation
new file mode 100644
index 000000000000..bb1aac048425
--- /dev/null
+++ b/Documentation/fb/cyblafb/documentation
@@ -0,0 +1,17 @@
1Available Documentation
2=======================
3
4Apollo PLE 133 Chipset VT8601A North Bridge Datasheet, Rev. 1.82, October 22,
52001, available from VIA:
6
7 http://www.viavpsd.com/product/6/15/DS8601A182.pdf
8
9The datasheet is incomplete, some registers that need to be programmed are not
10explained at all and important bits are listed as "reserved". But you really
11need the datasheet to understand the code. "p. xxx" comments refer to page
12numbers of this document.
13
14XFree/XOrg drivers are available and of good quality, looking at the code
15there is a good idea if the datasheet does not provide enough information
16or if the datasheet seems to be wrong.
17
diff --git a/Documentation/fb/cyblafb/fb.modes b/Documentation/fb/cyblafb/fb.modes
new file mode 100644
index 000000000000..cf4351fc32ff
--- /dev/null
+++ b/Documentation/fb/cyblafb/fb.modes
@@ -0,0 +1,155 @@
1#
2# Sample fb.modes file
3#
4# Provides an incomplete list of working modes for
5# the cyberblade/i1 graphics core.
6#
7# The value 4294967256 is used instead of -40. Of course, -40 is not
8# a really reasonable value, but chip design does not always follow
9# logic. Believe me, it's ok, and it's the way the BIOS does it.
10#
11# fbset requires 4294967256 in fb.modes and -40 as an argument to
12# the -t parameter. That's also not too reasonable, and it might change
13# in the future or might even be different for your current version.
14#
15
16mode "640x480-50"
17 geometry 640 480 640 3756 8
18 timings 47619 4294967256 24 17 0 216 3
19endmode
20
21mode "640x480-60"
22 geometry 640 480 640 3756 8
23 timings 39682 4294967256 24 17 0 216 3
24endmode
25
26mode "640x480-70"
27 geometry 640 480 640 3756 8
28 timings 34013 4294967256 24 17 0 216 3
29endmode
30
31mode "640x480-72"
32 geometry 640 480 640 3756 8
33 timings 33068 4294967256 24 17 0 216 3
34endmode
35
36mode "640x480-75"
37 geometry 640 480 640 3756 8
38 timings 31746 4294967256 24 17 0 216 3
39endmode
40
41mode "640x480-80"
42 geometry 640 480 640 3756 8
43 timings 29761 4294967256 24 17 0 216 3
44endmode
45
46mode "640x480-85"
47 geometry 640 480 640 3756 8
48 timings 28011 4294967256 24 17 0 216 3
49endmode
50
51mode "800x600-50"
52 geometry 800 600 800 3221 8
53 timings 30303 96 24 14 0 136 11
54endmode
55
56mode "800x600-60"
57 geometry 800 600 800 3221 8
58 timings 25252 96 24 14 0 136 11
59endmode
60
61mode "800x600-70"
62 geometry 800 600 800 3221 8
63 timings 21645 96 24 14 0 136 11
64endmode
65
66mode "800x600-72"
67 geometry 800 600 800 3221 8
68 timings 21043 96 24 14 0 136 11
69endmode
70
71mode "800x600-75"
72 geometry 800 600 800 3221 8
73 timings 20202 96 24 14 0 136 11
74endmode
75
76mode "800x600-80"
77 geometry 800 600 800 3221 8
78 timings 18939 96 24 14 0 136 11
79endmode
80
81mode "800x600-85"
82 geometry 800 600 800 3221 8
83 timings 17825 96 24 14 0 136 11
84endmode
85
86mode "1024x768-50"
87 geometry 1024 768 1024 2815 8
88 timings 19054 144 24 29 0 120 3
89endmode
90
91mode "1024x768-60"
92 geometry 1024 768 1024 2815 8
93 timings 15880 144 24 29 0 120 3
94endmode
95
96mode "1024x768-70"
97 geometry 1024 768 1024 2815 8
98 timings 13610 144 24 29 0 120 3
99endmode
100
101mode "1024x768-72"
102 geometry 1024 768 1024 2815 8
103 timings 13232 144 24 29 0 120 3
104endmode
105
106mode "1024x768-75"
107 geometry 1024 768 1024 2815 8
108 timings 12703 144 24 29 0 120 3
109endmode
110
111mode "1024x768-80"
112 geometry 1024 768 1024 2815 8
113 timings 11910 144 24 29 0 120 3
114endmode
115
116mode "1024x768-85"
117 geometry 1024 768 1024 2815 8
118 timings 11209 144 24 29 0 120 3
119endmode
120
121mode "1280x1024-50"
122 geometry 1280 1024 1280 2662 8
123 timings 11114 232 16 39 0 160 3
124endmode
125
126mode "1280x1024-60"
127 geometry 1280 1024 1280 2662 8
128 timings 9262 232 16 39 0 160 3
129endmode
130
131mode "1280x1024-70"
132 geometry 1280 1024 1280 2662 8
133 timings 7939 232 16 39 0 160 3
134endmode
135
136mode "1280x1024-72"
137 geometry 1280 1024 1280 2662 8
138 timings 7719 232 16 39 0 160 3
139endmode
140
141mode "1280x1024-75"
142 geometry 1280 1024 1280 2662 8
143 timings 7410 232 16 39 0 160 3
144endmode
145
146mode "1280x1024-80"
147 geometry 1280 1024 1280 2662 8
148 timings 6946 232 16 39 0 160 3
149endmode
150
151mode "1280x1024-85"
152 geometry 1280 1024 1280 2662 8
153 timings 6538 232 16 39 0 160 3
154endmode
155
diff --git a/Documentation/fb/cyblafb/performance b/Documentation/fb/cyblafb/performance
new file mode 100644
index 000000000000..eb4e47a9cea6
--- /dev/null
+++ b/Documentation/fb/cyblafb/performance
@@ -0,0 +1,80 @@
1Speed
2=====
3
4CyBlaFB is much faster than tridentfb and vesafb. Compare the performance data
5for mode 1280x1024-[8,16,32]@61 Hz.
6
7Test 1: Cat a file with 2000 lines of 0 characters.
8Test 2: Cat a file with 2000 lines of 80 characters.
9Test 3: Cat a file with 2000 lines of 160 characters.
10
11All values show the system time used, in seconds. Kernel 2.6.12 was used for
12the measurements. 2.6.13 is a bit slower; 2.6.14 will hopefully include a
13patch that speeds up kernel bitblitting a lot (> 20%).
14
15+-----------+-----------------------------------------------------+
16|           |                   not accelerated                   |
17| TRIDENTFB +-----------------+-----------------+-----------------+
18| of 2.6.12 |      8 bpp      |     16 bpp      |     32 bpp      |
19|           | noypan |  ypan  | noypan |  ypan  | noypan |  ypan  |
20+-----------+--------+--------+--------+--------+--------+--------+
21|  Test 1   |  4.31  |  4.33  |  6.05  | 12.81  |  ----  |  ----  |
22|  Test 2   | 67.94  |  5.44  | 123.16 | 14.79  |  ----  |  ----  |
23|  Test 3   | 131.36 |  6.55  | 240.12 | 16.76  |  ----  |  ----  |
24+-----------+--------+--------+--------+--------+--------+--------+
25| Comments  |                 |                 | completely      |
26|           |                 |                 | broken, monitor |
27|           |                 |                 | switches off    |
28+-----------+-----------------+-----------------+-----------------+
29
30
31+-----------+-----------------------------------------------------+
32|           |                     accelerated                     |
33| TRIDENTFB +-----------------+-----------------+-----------------+
34| of 2.6.12 |      8 bpp      |     16 bpp      |     32 bpp      |
35|           | noypan |  ypan  | noypan |  ypan  | noypan |  ypan  |
36+-----------+--------+--------+--------+--------+--------+--------+
37|  Test 1   |  ----  |  ----  | 20.62  |  1.22  |  ----  |  ----  |
38|  Test 2   |  ----  |  ----  | 22.61  |  3.19  |  ----  |  ----  |
39|  Test 3   |  ----  |  ----  | 24.59  |  5.16  |  ----  |  ----  |
40+-----------+--------+--------+--------+--------+--------+--------+
41| Comments  | broken, writing | broken, ok only | completely      |
42|           | to wrong places | if bgcolor is   | broken, monitor |
43|           | on screen + bug | black, bug in   | switches off    |
44|           | in fillrect()   | fillrect()      |                 |
45+-----------+-----------------+-----------------+-----------------+
46
47
48+-----------+-----------------------------------------------------+
49|           |                   not accelerated                   |
50|  VESAFB   +-----------------+-----------------+-----------------+
51| of 2.6.12 |      8 bpp      |     16 bpp      |     32 bpp      |
52|           | noypan |  ypan  | noypan |  ypan  | noypan |  ypan  |
53+-----------+--------+--------+--------+--------+--------+--------+
54|  Test 1   |  4.26  |  3.76  |  5.99  |  7.23  |  ----  |  ----  |
55|  Test 2   | 65.65  |  4.89  | 120.88 |  9.08  |  ----  |  ----  |
56|  Test 3   | 126.91 |  5.94  | 235.77 | 11.03  |  ----  |  ----  |
57+-----------+--------+--------+--------+--------+--------+--------+
58| Comments  | vga=0x307       | vga=0x31a       | vga=0x31b not   |
59|           | fh=80kHz        | fh=80kHz        | supported by    |
60|           | fv=75Hz         | fv=75Hz         | video BIOS and  |
61|           |                 |                 | hardware        |
62+-----------+-----------------+-----------------+-----------------+
63
64
65+-----------+-----------------------------------------------------+
66|           |                     accelerated                     |
67|  CYBLAFB  +-----------------+-----------------+-----------------+
68|           |      8 bpp      |     16 bpp      |     32 bpp      |
69|           | noypan |  ypan  | noypan |  ypan  | noypan |  ypan  |
70+-----------+--------+--------+--------+--------+--------+--------+
71|  Test 1   |  8.02  |  0.23  | 19.04  |  0.61  | 57.12  |  2.74  |
72|  Test 2   |  8.38  |  0.55  | 19.39  |  0.92  | 57.54  |  3.13  |
73|  Test 3   |  8.73  |  0.86  | 19.74  |  1.24  | 57.95  |  3.51  |
74+-----------+--------+--------+--------+--------+--------+--------+
75| Comments  |                 |                 |                 |
76|           |                 |                 |                 |
77|           |                 |                 |                 |
78|           |                 |                 |                 |
79+-----------+-----------------+-----------------+-----------------+
80
diff --git a/Documentation/fb/cyblafb/todo b/Documentation/fb/cyblafb/todo
new file mode 100644
index 000000000000..80fb2f89b6c1
--- /dev/null
+++ b/Documentation/fb/cyblafb/todo
@@ -0,0 +1,32 @@
1TODO / Missing features
2=======================
3
4Verify LCD stuff             "stretch" and "center" options are
5                             completely untested ... this code needs to be
6                             verified. As I don't have access to such
7                             hardware, please contact me if you are
8                             willing to run some tests.
9
10Interlaced video modes      The reason that interlaced
11                            modes are disabled is that I do not know
12                            the meaning of the vertical interlace
13                            parameter. Also the datasheet mentions a
14                            bit d8 of a horizontal interlace parameter,
15                            but does not describe the lower 8 bits.
16                            Please help if you can.
17
18low-res double scan modes   Who needs it?
19
20accelerated color blitting  Who needs it? The console driver does use color
21                            blitting for nothing but drawing the penguin,
22                            everything else is done using color expanding
23                            blitting of 1bpp character bitmaps.
24
25xpanning                    Who needs it?
26
27ioctls                      Who needs it?
28
29TV-out                      Will be done later
30
31???                         Feel free to contact me if you have any
32                            feature requests
diff --git a/Documentation/fb/cyblafb/usage b/Documentation/fb/cyblafb/usage
new file mode 100644
index 000000000000..e627c8f54211
--- /dev/null
+++ b/Documentation/fb/cyblafb/usage
@@ -0,0 +1,206 @@
1CyBlaFB is a framebuffer driver for the Cyberblade/i1 graphics core integrated
2into the VIA Apollo PLE133 (aka vt8601) south bridge. It is developed and
3tested using a VIA EPIA 5000 board.
4
5Cyblafb - compiled into the kernel or as a module?
6==================================================
7
8You might compile cyblafb either as a module or compile it permanently into the
9kernel.
10
11Unless you have a real reason to do so you should not compile both vesafb and
12cyblafb permanently into the kernel. It's possible and it helps during the
13development cycle, but it's useless and will at least block some otherwise
14useful memory for ordinary users.
15
16Selecting Modes
17===============
18
19 Startup Mode
20 ============
21
22 First of all, you might use the "vga=???" boot parameter as it is
23 documented in vesafb.txt and svga.txt. Cyblafb will detect the video
24 mode selected and will use the geometry and timings found by
25 inspecting the hardware registers.
26
27 video=cyblafb vga=0x317
28
29 Alternatively you might use a combination of the mode, ref and bpp
30 parameters. If you compiled the driver into the kernel, add something
31 like this to the kernel command line:
32
33 video=cyblafb:1280x1024,bpp=16,ref=50 ...
34
35 If you compiled the driver as a module, the same mode would be
36 selected by the following command:
37
38 modprobe cyblafb mode=1280x1024 bpp=16 ref=50 ...
39
40 None of the modes possible to select as startup modes are affected by
41 the problems described at the end of the next subsection.
42
43 Mode changes using fbset
44 ========================
45
46 You might use fbset to change the video mode, see "man fbset". Cyblafb
47 generally does assume that you know what you are doing. But it does
48 some checks, especially those that are needed to prevent you from
49 damaging your hardware.
50
51 - only 8, 16, 24 and 32 bpp video modes are accepted
52 - interlaced video modes are not accepted
53 - double scan video modes are not accepted
54 - if a flat panel is found, cyblafb does not allow you
55 to program a resolution higher than the physical
56 resolution of the flat panel monitor
57 - cyblafb does not allow xres to differ from xres_virtual
58 - cyblafb does not allow vclk to exceed 230 MHz. As 32 bpp
59 and (currently) 24 bit modes use a doubled vclk internally,
60 the dotclock limit as seen by fbset is 115 MHz for those
61 modes and 230 MHz for 8 and 16 bpp modes.
62
63 Any request that violates the rules given above will be ignored and
64 fbset will return an error.
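
The clock rule in the list above can be checked by hand. fbset expresses the dot clock as a pixclock period in picoseconds, so the frequency in kHz is 10^9 divided by that period. The following is a minimal userspace sketch of that arithmetic, not the driver's code: the function names are made up for illustration, and the limits are simply the ones quoted in the list.

```c
#include <stdbool.h>

/* Convert an fbset pixclock period (picoseconds) to a frequency in kHz. */
unsigned long pixclock_to_khz(unsigned long pixclock_ps)
{
	return 1000000000UL / pixclock_ps;
}

/*
 * Dot clock limit as seen by fbset, per the rules listed above:
 * 230 MHz for 8 and 16 bpp, 115 MHz for 24 and 32 bpp.
 * Returns 0 for depths that cyblafb does not accept.
 */
unsigned long dotclock_limit_khz(unsigned int bpp)
{
	switch (bpp) {
	case 8:
	case 16:
		return 230000;
	case 24:
	case 32:
		return 115000;
	default:
		return 0;
	}
}

/* Hypothetical helper: true if the requested clock is within the limit. */
bool mode_clock_ok(unsigned long pixclock_ps, unsigned int bpp)
{
	unsigned long limit = dotclock_limit_khz(bpp);

	if (limit == 0 || pixclock_ps == 0)
		return false;
	return pixclock_to_khz(pixclock_ps) <= limit;
}
```

With the fb.modes entry for 1280x1024-85 (pixclock 6538 ps, roughly 153 MHz), this sketch accepts the mode at 16 bpp and rejects it at 32 bpp.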
65
66 If you program a virtual y resolution higher than the hardware limit,
67 cyblafb will silently decrease that value to the highest possible
68 value.
69
70 Attempts to disable acceleration are ignored.
71
72 Some video modes that should work do not work as expected. If you use
73 the standard fb.modes, fbset 640x480-60 will program that mode, but
74 you will see a vertical area, about two characters wide, in which the
75 characters are much darker than the other characters on the screen.
76 Cyblafb does allow that mode to be set, as it does not violate the
77 official specifications. It would need a lot of code to reliably sort
78 out all invalid modes, whereas playing around with the margin values
79 will quickly give a valid mode. And if cyblafb detected such an invalid
80 mode, should it silently alter the requested values or should it
81 report an error? Both options have some pros and cons. As stated
82 above, none of the startup modes are affected, and if you set
83 verbosity to 1 or higher, cyblafb will print the fbset command that
84 would be needed to program that mode using fbset.
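
When playing around with margin values, it helps to see what refresh rate a given set of numbers implies. The sketch below is an illustration only and not part of cyblafb; the struct and function names are assumptions. It takes the fields of one fb.modes entry (the "geometry" resolution plus the "timings" values pixclock, left, right, upper, lower, hslen, vslen) and computes the vertical refresh rate.

```c
/* One fb.modes entry: visible resolution plus the "timings" values. */
struct fb_mode_timing {
	unsigned int xres, yres;	/* visible resolution */
	unsigned int pixclock;		/* pixel period in picoseconds */
	unsigned int left, right;	/* horizontal margins in pixels */
	unsigned int upper, lower;	/* vertical margins in lines */
	unsigned int hslen, vslen;	/* sync pulse lengths */
};

/* Vertical refresh rate in Hz implied by a mode, or 0.0 if pixclock is 0. */
double refresh_hz(const struct fb_mode_timing *m)
{
	double htotal = (double)m->xres + m->left + m->right + m->hslen;
	double vtotal = (double)m->yres + m->upper + m->lower + m->vslen;

	if (m->pixclock == 0 || htotal <= 0.0 || vtotal <= 0.0)
		return 0.0;
	/* 1e12 ps per second divided by the time needed for one full frame */
	return 1e12 / ((double)m->pixclock * htotal * vtotal);
}
```

For the 800x600-60 entry of the fb.modes file shipped with the driver (pixclock 25252, margins 96 24 14 0, sync lengths 136 11) this gives a value very close to 60 Hz. Enlarging a margin lowers the refresh rate unless pixclock is reduced to compensate.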
85
86
87Other Parameters
88================
89
90
91crt don't autodetect, assume monitor connected to
92 standard VGA connector
93
94fp don't autodetect, assume flat panel display
95 connected to flat panel monitor interface
96
97nativex inform driver about native x resolution of
98 flat panel monitor connected to special
99 interface (should be autodetected)
100
101stretch stretch image to adapt low resolution modes to
102 higher resolutions of flat panel monitors
103 connected to special interface
104
105center center image to adapt low resolution modes to
106 higher resolutions of flat panel monitors
107 connected to special interface
108
109memsize use if autodetected memsize is wrong ...
110 should never be necessary
111
112nopcirr disable PCI read retry
113nopciwr disable PCI write retry
114nopcirb disable PCI read bursts
115nopciwb disable PCI write bursts
116
117bpp bpp for specified modes
118 valid values: 8 || 16 || 24 || 32
119
120ref refresh rate for specified mode
121 valid values: 50 <= ref <= 85
122
123mode 640x480 or 800x600 or 1024x768 or 1280x1024
124 if not specified, the startup mode will be detected
125 and used, so you might also use the vga=??? parameter
126 described in vesafb.txt. If you do not specify a mode,
127 bpp and ref parameters are ignored.
128
129verbosity 0 is the default, increase to at least 2 for every
130 bug report!
131
132vesafb allows cyblafb to be loaded after vesafb has been
133 loaded. See sections "Module unloading ...".
134
135
136Development hints
137=================
138
139It's much faster to compile a module and to load the new version after
140unloading the old module than to compile a new kernel and to reboot. So if you
141try to work on cyblafb, it might be a good idea to use cyblafb as a module.
142In real life, fast often means dangerous, and that's also the case here. If
143you introduce a serious bug when cyblafb is compiled into the kernel, the
144kernel will lock or oops with a high probability before the file system is
145mounted, and the danger for your data is low. If you load your own broken version
146of cyblafb on a running system, the danger for the integrity of the file
147system is much higher as you might need a hard reset afterwards. Decide
148yourself.
149
150Module unloading, the vfb method
151================================
152
153If you want to unload/reload cyblafb using the virtual framebuffer, you need
154to enable vfb support in the kernel first. After that, load the modules as
155shown below:
156
157 modprobe vfb vfb_enable=1
158 modprobe fbcon
159 modprobe cyblafb
160 fbset -fb /dev/fb1 1280x1024-60 -vyres 2662
161 con2fb /dev/fb1 /dev/tty1
162 ...
163
164If you have made some changes to cyblafb and want to reload it, you might do it
165as shown below:
166
167 con2fb /dev/fb0 /dev/tty1
168 ...
169 rmmod cyblafb
170 modprobe cyblafb
171 con2fb /dev/fb1 /dev/tty1
172 ...
173
174Of course, you might choose another mode, and most certainly you also want to
175map some other /dev/tty* to the real framebuffer device. You might also choose
176to compile fbcon as a kernel module or place it permanently in the kernel.
177
178I do not know of any way to unload fbcon, and fbcon will prevent the
179framebuffer device loaded first from unloading. [If there is a way, then
180please add a description here!]
181
182Module unloading, the vesafb method
183===================================
184
185Configure the kernel:
186
187 <*> Support for frame buffer devices
188 [*] VESA VGA graphics support
189 <M> Cyberblade/i1 support
190
191Add e.g. "video=vesafb:ypan vga=0x307" to the kernel parameters. The ypan
192parameter is important, choose any vga parameter you like as long as it is
193a graphics mode.
194
195After booting, load cyblafb without any mode and bpp parameter and assign
196cyblafb to individual ttys using con2fb, e.g.:
197
198 modprobe cyblafb vesafb=1
199 con2fb /dev/fb1 /dev/tty1
200
201Unloading cyblafb works without problems after you assign vesafb to all
202ttys again, e.g.:
203
204 con2fb /dev/fb0 /dev/tty1
205 rmmod cyblafb
206
diff --git a/Documentation/fb/cyblafb/whycyblafb b/Documentation/fb/cyblafb/whycyblafb
new file mode 100644
index 000000000000..a123bc11e698
--- /dev/null
+++ b/Documentation/fb/cyblafb/whycyblafb
@@ -0,0 +1,85 @@
1I tried the following framebuffer drivers:
2
3 - TRIDENTFB is full of bugs. Acceleration is broken for Blade3D
4 graphics cores like the cyberblade/i1. It claims to support a great
5 number of devices, but documentation for most of these devices is
6 unfortunately not available. There is _no_ reason to use tridentfb
7 for cyberblade/i1 + CRT users. VESAFB is faster, and the one
8 advantage, mode switching, is broken in tridentfb.
9
10 - VESAFB is used by many distributions as a standard. Vesafb does
11 not support mode switching. VESAFB is a bit faster than the working
12 configurations of TRIDENTFB, but it is still too slow, even if you
13 use ypan.
14
15 - EPIAFB (you'll find it on sourceforge) supports the Cyberblade/i1
16 graphics core, but it still has serious bugs and development seems
17 to have stopped. This is the one driver with TV-out support. If you
18 do need this feature, try epiafb.
19
20None of these drivers was a real option for me.
21
22I believe that it is unreasonable to change code that claims to support 20
23devices if I only have more or less sufficient documentation for exactly one
24of these. The risk of breaking device foo while fixing device bar is too high.
25
26So I decided to start CyBlaFB as a stripped down tridentfb.
27
28All code specific to other Trident chips has been removed. After that there
29were a lot of cosmetic changes to increase the readability of the code. All
30register names were changed to those mnemonics used in the datasheet. Function
31and macro names were changed if they hindered easy understanding of the code.
32
33After that I debugged the code and implemented some new features. I'll try to
34give a little summary of the main changes:
35
36 - calculation of vertical and horizontal timings was fixed
37
38 - video signal quality has been improved dramatically
39
40 - acceleration:
41
42 - fillrect and copyarea were fixed and reenabled
43
44 - color expanding imageblit was newly implemented, color
45 imageblit (only used to draw the penguin) still uses the
46 generic code.
47
48 - init of the acceleration engine was improved and moved to a
49 place where it really works ...
50
51 - sync function has a timeout now and tries to reset and
52 reinit the accel engine if necessary
53
54 - fewer slow copyarea calls when doing ypan scrolling by using
55 undocumented bit d21 of screen start address stored in
56 CR2B[5]. BIOS does use it also, so this should be safe.
57
58 - cyblafb rejects any attempt to set modes that would cause vclk
59 values above a reasonable 230 MHz. 32bit modes use a clock
60 multiplier of 2, so fbset does show the correct values for
61 pixclock but not for vclk in this case. The fbset limit is 115 MHz
62 for 32 bpp modes.
63
64 - cyblafb rejects modes known to be broken or unimplemented (all
65 interlaced modes, all doublescan modes for now)
66
67 - cyblafb now works independent of the video mode in effect at startup
68 time (tridentfb does not init all needed registers to reasonable
69 values)
70
71 - switching between video modes does work reliably now
72
73 - the first video mode now is the one selected on startup using the
74 vga=???? mechanism or any of
75 - 640x480, 800x600, 1024x768, 1280x1024
76 - 8, 16, 24 or 32 bpp
77 - refresh between 50 Hz and 85 Hz, 1 Hz steps (1280x1024-32
78 is limited to 63Hz)
79
80 - pci retry and pci burst mode are settable (try to disable if you
81 experience latency problems)
82
83 - built as a module cyblafb might be unloaded and reloaded using
84 the vfb module and con2fb or might be used together with vesafb
85
diff --git a/Documentation/fb/modedb.txt b/Documentation/fb/modedb.txt
index e04458b319d5..4fcdb4cf4cca 100644
--- a/Documentation/fb/modedb.txt
+++ b/Documentation/fb/modedb.txt
@@ -20,12 +20,83 @@ in a video= option, fbmem considers that to be a global video mode option.
20 20
21Valid mode specifiers (mode_option argument): 21Valid mode specifiers (mode_option argument):
22 22
23 <xres>x<yres>[-<bpp>][@<refresh>] 23 <xres>x<yres>[M][R][-<bpp>][@<refresh>][i][m]
24 <name>[-<bpp>][@<refresh>] 24 <name>[-<bpp>][@<refresh>]
25 25
26with <xres>, <yres>, <bpp> and <refresh> decimal numbers and <name> a string. 26with <xres>, <yres>, <bpp> and <refresh> decimal numbers and <name> a string.
27Things between square brackets are optional. 27Things between square brackets are optional.
28 28
29If 'M' is specified in the mode_option argument (after <yres> and before
30<bpp> and <refresh>, if specified) the timings will be calculated using
31VESA(TM) Coordinated Video Timings instead of looking up the mode from a table.
32If 'R' is specified, do a 'reduced blanking' calculation for digital displays.
33If 'i' is specified, calculate for an interlaced mode. And if 'm' is
34specified, add margins to the calculation (1.8% of xres rounded down to 8
35pixels and 1.8% of yres).
36
37 Sample usage: 1024x768M@60m - CVT timing with margins
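
The margin arithmetic described above can be sketched as follows. This is an illustration, not the kernel's CVT code: the function names are hypothetical, integer arithmetic is assumed, and truncating the vertical margin to whole lines is an assumption, since the text only states the rounding rule for xres.

```c
/* Horizontal margin: 1.8% of xres, rounded down to a multiple of 8 pixels. */
unsigned int cvt_hmargin(unsigned int xres)
{
	return (xres * 18 / 1000) & ~7u;
}

/* Vertical margin: 1.8% of yres, truncated to whole lines (an assumption). */
unsigned int cvt_vmargin(unsigned int yres)
{
	return yres * 18 / 1000;
}
```

For the sample mode 1024x768M@60m this yields a horizontal margin of 16 pixels and a vertical margin of 13 lines.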
38
39***** oOo ***** oOo ***** oOo ***** oOo ***** oOo ***** oOo ***** oOo *****
40
41What is the VESA(TM) Coordinated Video Timings (CVT)?
42
43From the VESA(TM) Website:
44
45 "The purpose of CVT is to provide a method for generating a consistent
46 and coordinated set of standard formats, display refresh rates, and
47 timing specifications for computer display products, both those
48 employing CRTs, and those using other display technologies. The
49 intention of CVT is to give both source and display manufacturers a
50 common set of tools to enable new timings to be developed in a
51 consistent manner that ensures greater compatibility."
52
53This is the third standard approved by VESA(TM) concerning video timings. The
54first was the Discrete Video Timings (DVT) which is a collection of
55pre-defined modes approved by VESA(TM). The second is the Generalized Timing
56Formula (GTF) which is an algorithm to calculate the timings, given the
57pixelclock, the horizontal sync frequency, or the vertical refresh rate.
58
59The GTF is limited by the fact that it is designed mainly for CRT displays.
60It artificially increases the pixelclock because of its high blanking
61requirement. This is inappropriate for digital display interface with its high
62data rate which requires that it conserves the pixelclock as much as possible.
63Also, GTF does not take into account the aspect ratio of the display.
64
65The CVT addresses these limitations. If used with CRT's, the formula used
66is a derivation of GTF with a few modifications. If used with digital
67displays, the "reduced blanking" calculation can be used.
68
69From the framebuffer subsystem perspective, new formats need not be added
70to the global mode database whenever a new mode is released by display
71manufacturers. Specifying for CVT will work for most, if not all, relatively
72new CRT displays and probably with most flatpanels, if 'reduced blanking'
73calculation is specified. (The CVT compatibility of the display can be
74determined from its EDID. The version 1.3 of the EDID has extra 128-byte
75blocks where additional timing information is placed. As of this time, there
76is no support yet in the layer to parse these additional blocks.)
77
78CVT also introduced a new naming convention (should be seen from dmesg output):
79
80 <pix>M<a>[-R]
81
82 where: pix = total number of pixels in megapixels (xres x yres)
83 M = always present
84 a = aspect ratio (3 - 4:3; 4 - 5:4; 9 - 15:9, 16:9; A - 16:10)
85 -R = reduced blanking
86
87 example: .48M3-R - 800x600 with reduced blanking
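
As a rough illustration of the naming convention, the sketch below builds such a name from a resolution. It is not the kernel's implementation: the function names are hypothetical, only exact aspect ratios are recognised, and the two-decimal format used for modes of one megapixel or more is an assumption, since the text only shows an example below one megapixel.

```c
#include <stdio.h>

/* Map a resolution to the CVT aspect ratio code, or 0 if it is not listed. */
char cvt_aspect_code(unsigned int xres, unsigned int yres)
{
	if (xres * 3 == yres * 4)
		return '3';		/* 4:3 */
	if (xres * 4 == yres * 5)
		return '4';		/* 5:4 */
	if (xres * 9 == yres * 15 || xres * 9 == yres * 16)
		return '9';		/* 15:9 or 16:9 */
	if (xres * 10 == yres * 16)
		return 'A';		/* 16:10 */
	return 0;
}

/*
 * Format "<pix>M<a>[-R]" into buf. Returns 0 on success, -1 if the aspect
 * ratio has no code or the buffer is too small.
 */
int cvt_mode_name(char *buf, size_t len, unsigned int xres,
		  unsigned int yres, int reduced)
{
	/* megapixels in hundredths, rounded to nearest */
	unsigned long hundredths = ((unsigned long)xres * yres + 5000) / 10000;
	char a = cvt_aspect_code(xres, yres);
	int n;

	if (!a)
		return -1;
	if (hundredths < 100)
		n = snprintf(buf, len, ".%02luM%c%s", hundredths, a,
			     reduced ? "-R" : "");
	else
		n = snprintf(buf, len, "%lu.%02luM%c%s", hundredths / 100,
			     hundredths % 100, a, reduced ? "-R" : "");
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling cvt_mode_name() for 800x600 with reduced blanking produces ".48M3-R", matching the example above.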
88
89Note: VESA(TM) has restrictions on what is a standard CVT timing:
90
91 - aspect ratio can only be one of the above values
92 - acceptable refresh rates are 50, 60, 70 or 85 Hz only
93 - if reduced blanking, the refresh rate must be at 60Hz
94
95If any of the above is not satisfied, the kernel will print a warning but the
96timings will still be calculated.
97
98***** oOo ***** oOo ***** oOo ***** oOo ***** oOo ***** oOo ***** oOo *****
99
29To find a suitable video mode, you just call 100To find a suitable video mode, you just call
30 101
31int __init fb_find_mode(struct fb_var_screeninfo *var, 102int __init fb_find_mode(struct fb_var_screeninfo *var,
diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt
index 0665cb12bd66..5f95d4b3cab1 100644
--- a/Documentation/feature-removal-schedule.txt
+++ b/Documentation/feature-removal-schedule.txt
@@ -25,15 +25,6 @@ Who: Pavel Machek <pavel@suse.cz>
25 25
26--------------------------- 26---------------------------
27 27
28What: PCI Name Database (CONFIG_PCI_NAMES)
29When: July 2005
30Why: It bloats the kernel unnecessarily, and is handled by userspace better
31 (pciutils supports it.) Will eliminate the need to try to keep the
32 pci.ids file in sync with the sf.net database all of the time.
33Who: Greg Kroah-Hartman <gregkh@suse.de>
34
35---------------------------
36
37What: io_remap_page_range() (macro or function) 28What: io_remap_page_range() (macro or function)
38When: September 2005 29When: September 2005
39Why: Replaced by io_remap_pfn_range() which allows more memory space 30Why: Replaced by io_remap_pfn_range() which allows more memory space
@@ -51,14 +42,6 @@ Who: Adrian Bunk <bunk@stusta.de>
51 42
52--------------------------- 43---------------------------
53 44
54What: register_ioctl32_conversion() / unregister_ioctl32_conversion()
55When: April 2005
56Why: Replaced by ->compat_ioctl in file_operations and other method
57 vecors.
58Who: Andi Kleen <ak@muc.de>, Christoph Hellwig <hch@lst.de>
59
60---------------------------
61
62What: RCU API moves to EXPORT_SYMBOL_GPL 45What: RCU API moves to EXPORT_SYMBOL_GPL
63When: April 2006 46When: April 2006
64Files: include/linux/rcupdate.h, kernel/rcupdate.c 47Files: include/linux/rcupdate.h, kernel/rcupdate.c
@@ -74,14 +57,6 @@ Who: Paul E. McKenney <paulmck@us.ibm.com>
74 57
75--------------------------- 58---------------------------
76 59
77What: remove verify_area()
78When: July 2006
79Files: Various uaccess.h headers.
80Why: Deprecated and redundant. access_ok() should be used instead.
81Who: Jesper Juhl <juhl-lkml@dif.dk>
82
83---------------------------
84
85What: IEEE1394 Audio and Music Data Transmission Protocol driver, 60What: IEEE1394 Audio and Music Data Transmission Protocol driver,
86 Connection Management Procedures driver 61 Connection Management Procedures driver
87When: November 2005 62When: November 2005
@@ -102,16 +77,6 @@ Who: Jody McIntyre <scjody@steamballoon.com>
102 77
103--------------------------- 78---------------------------
104 79
105What: register_serial/unregister_serial
106When: September 2005
107Why: This interface does not allow serial ports to be registered against
108 a struct device, and as such does not allow correct power management
109 of such ports. 8250-based ports should use serial8250_register_port
110 and serial8250_unregister_port, or platform devices instead.
111Who: Russell King <rmk@arm.linux.org.uk>
112
113---------------------------
114
115What: i2c sysfs name change: in1_ref, vid deprecated in favour of cpu0_vid 80What: i2c sysfs name change: in1_ref, vid deprecated in favour of cpu0_vid
116When: November 2005 81When: November 2005
117Files: drivers/i2c/chips/adm1025.c, drivers/i2c/chips/adm1026.c 82Files: drivers/i2c/chips/adm1025.c, drivers/i2c/chips/adm1026.c
diff --git a/Documentation/filesystems/files.txt b/Documentation/filesystems/files.txt
new file mode 100644
index 000000000000..8c206f4e0250
--- /dev/null
+++ b/Documentation/filesystems/files.txt
@@ -0,0 +1,123 @@
1File management in the Linux kernel
2-----------------------------------
3
4This document describes how locking for files (struct file)
5and the file descriptor table (struct files_struct) works.
6
7Up until 2.6.12, the file descriptor table has been protected
8with a lock (files->file_lock) and reference count (files->count).
9->file_lock protected accesses to all the file related fields
10of the table. ->count was used for sharing the file descriptor
11table between tasks cloned with CLONE_FILES flag. Typically
12this would be the case for posix threads. As with the common
13refcounting model in the kernel, the last task doing
14a put_files_struct() frees the file descriptor (fd) table.
15The files (struct file) themselves are protected using
16reference count (->f_count).
17
18In the new lock-free model of file descriptor management,
19the reference counting is similar, but the locking is
20based on RCU. The file descriptor table contains multiple
21elements - the fd sets (open_fds and close_on_exec, the
22array of file pointers, the sizes of the sets and the array
23etc.). In order for the updates to appear atomic to
24a lock-free reader, all the elements of the file descriptor
25table are in a separate structure - struct fdtable.
26files_struct contains a pointer to struct fdtable through
27which the actual fd table is accessed. Initially the
28fdtable is embedded in files_struct itself. On a subsequent
29expansion of fdtable, a new fdtable structure is allocated
30and files->fdtab points to the new structure. The fdtable
31structure is freed with RCU and lock-free readers either
32see the old fdtable or the new fdtable making the update
33appear atomic. Here are the locking rules for
34the fdtable structure -
35
361. All references to the fdtable must be done through
37 the files_fdtable() macro :
38
39 struct fdtable *fdt;
40
41 rcu_read_lock();
42
43 fdt = files_fdtable(files);
44 ....
45 if (n <= fdt->max_fds)
46 ....
47 ...
48 rcu_read_unlock();
49
50 files_fdtable() uses rcu_dereference() macro which takes care of
51 the memory barrier requirements for lock-free dereference.
52 The fdtable pointer must be read within the read-side
53 critical section.
54
552. Reading of the fdtable as described above must be protected
56 by rcu_read_lock()/rcu_read_unlock().
57
583. For any update to the fd table, files->file_lock must
59 be held.
60
614. To look up the file structure given an fd, a reader
62 must use either fcheck() or fcheck_files() APIs. These
63 take care of barrier requirements due to lock-free lookup.
64 An example :
65
66 struct file *file;
67
68 rcu_read_lock();
69 file = fcheck(fd);
70 if (file) {
71 ...
72 }
73 ....
74 rcu_read_unlock();
75
765. Handling of the file structures is special. Since the look-up
77 of the fd (fget()/fget_light()) are lock-free, it is possible
78 that look-up may race with the last put() operation on the
79 file structure. This is avoided using the rcuref APIs
80 on ->f_count :
81
82 rcu_read_lock();
83 file = fcheck_files(files, fd);
84 if (file) {
85 if (rcuref_inc_lf(&file->f_count))
86 *fput_needed = 1;
87 else
88 /* Didn't get the reference, someone's freed */
89 file = NULL;
90 }
91 rcu_read_unlock();
92 ....
93 return file;
94
95 rcuref_inc_lf() detects if the refcount is already zero or
96 goes to zero during increment. If it does, we fail
97 fget()/fget_light().
98
996. Since both fdtable and file structures can be looked up
100 lock-free, they must be installed using rcu_assign_pointer()
101 API. If they are looked up lock-free, rcu_dereference()
102 must be used. However it is advisable to use files_fdtable()
103 and fcheck()/fcheck_files() which take care of these issues.
104
1057. While updating, the fdtable pointer must be looked up while
106 holding files->file_lock. If ->file_lock is dropped, then
107 another thread may expand the files, thereby creating a new
108 fdtable and making the earlier fdtable pointer stale.
109 For example :
110
111 spin_lock(&files->file_lock);
112 fd = locate_fd(files, file, start);
113 if (fd >= 0) {
114 /* locate_fd() may have expanded fdtable, load the ptr */
115 fdt = files_fdtable(files);
116 FD_SET(fd, fdt->open_fds);
117 FD_CLR(fd, fdt->close_on_exec);
118 spin_unlock(&files->file_lock);
119 .....
120
121 Since locate_fd() can drop ->file_lock (and reacquire ->file_lock),
122 the fdtable pointer (fdt) must be loaded after locate_fd().
123
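
The "take a reference only if the count has not already dropped to zero" idea from rule 5 can be sketched in userspace with C11 atomics. This is an analogue for illustration, not the kernel's rcuref implementation: the struct and function names are made up, and the RCU read-side protection that keeps the object's memory valid while the count is examined is not modelled here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the ->f_count field of a file structure. */
struct obj {
	atomic_int refcount;
};

/*
 * Take a reference only if the count is still non-zero.
 * Returns false if the last reference is already gone, in which case the
 * caller must treat the lookup as failed and not touch the object.
 */
bool ref_inc_not_zero(struct obj *o)
{
	int old = atomic_load(&o->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
bool ref_put(struct obj *o)
{
	return atomic_fetch_sub(&o->refcount, 1) == 1;
}
```

A plain increment would not be enough here: a reader racing with the final put could raise the count from zero back to one on an object that is already being torn down. The compare-and-exchange loop refuses to do that.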
diff --git a/Documentation/filesystems/fuse.txt b/Documentation/filesystems/fuse.txt
new file mode 100644
index 000000000000..6b5741e651a2
--- /dev/null
+++ b/Documentation/filesystems/fuse.txt
@@ -0,0 +1,315 @@
1Definitions
2~~~~~~~~~~~
3
4Userspace filesystem:
5
6 A filesystem in which data and metadata are provided by an ordinary
7 userspace process. The filesystem can be accessed normally through
8 the kernel interface.
9
10Filesystem daemon:
11
12 The process(es) providing the data and metadata of the filesystem.
13
14Non-privileged mount (or user mount):
15
16 A userspace filesystem mounted by a non-privileged (non-root) user.
17 The filesystem daemon is running with the privileges of the mounting
18 user. NOTE: this is not the same as mounts allowed with the "user"
19 option in /etc/fstab, which is not discussed here.
20
21Mount owner:
22
23 The user who does the mounting.
24
25User:
26
27 The user who is performing filesystem operations.
28
29What is FUSE?
30~~~~~~~~~~~~~
31
32FUSE is a userspace filesystem framework. It consists of a kernel
33module (fuse.ko), a userspace library (libfuse.*) and a mount utility
34(fusermount).
35
36One of the most important features of FUSE is allowing secure,
37non-privileged mounts. This opens up new possibilities for the use of
38filesystems. A good example is sshfs: a secure network filesystem
39using the sftp protocol.
40
41The userspace library and utilities are available from the FUSE
42homepage:
43
44 http://fuse.sourceforge.net/
45
46Mount options
47~~~~~~~~~~~~~
48
49'fd=N'
50
51 The file descriptor to use for communication between the userspace
52 filesystem and the kernel. The file descriptor must have been
53 obtained by opening the FUSE device ('/dev/fuse').
54
55'rootmode=M'
56
57 The file mode of the filesystem's root in octal representation.
58
59'user_id=N'
60
61 The numeric user id of the mount owner.
62
63'group_id=N'
64
65 The numeric group id of the mount owner.
66
67'default_permissions'
68
69 By default FUSE doesn't check file access permissions; the
70 filesystem is free to implement its access policy or leave it to
71 the underlying file access mechanism (e.g. in case of network
72 filesystems). This option enables permission checking, restricting
73 access based on file mode. This option is usually useful
74 together with the 'allow_other' mount option.
75
76'allow_other'
77
78 This option overrides the security measure restricting file access
79 to the user mounting the filesystem. This option is by default only
80 allowed to root, but this restriction can be removed with a
81 (userspace) configuration option.
82
83'max_read=N'
84
85 With this option the maximum size of read operations can be set.
86 The default is infinite. Note that the size of read requests is
87 limited anyway to 32 pages (which is 128kbyte on i386).
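
The options above are passed to the kernel as one comma-separated
string. The following is a minimal sketch of how a mount helper might
assemble that string; build_fuse_mount_opts() is a hypothetical name
used for illustration and is not taken from fusermount.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical helper: format the mandatory FUSE mount options.
 * 'fd' must come from opening /dev/fuse; 'rootmode' is printed in octal
 * as the 'rootmode=M' description requires.
 * Returns the string length, or -1 if 'buf' is too small.
 */
static int build_fuse_mount_opts(char *buf, size_t size, int fd,
                                 unsigned int rootmode, uid_t uid, gid_t gid)
{
    int n = snprintf(buf, size, "fd=%d,rootmode=%o,user_id=%u,group_id=%u",
                     fd, rootmode, (unsigned int)uid, (unsigned int)gid);

    if (n < 0 || (size_t)n >= size)
        return -1;      /* formatting error or truncated output */
    return n;
}
```

Optional flags such as 'default_permissions' or 'max_read=N' would be
appended to the same string, separated by commas.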
88
89How do non-privileged mounts work?
90~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
91
92Since the mount() system call is a privileged operation, a helper
93program (fusermount) is needed, which is installed setuid root.
94
95The implication of providing non-privileged mounts is that the mount
96owner must not be able to use this capability to compromise the
97system. Obvious requirements arising from this are:
98
99 A) mount owner should not be able to get elevated privileges with the
100 help of the mounted filesystem
101
102 B) mount owner should not get illegitimate access to information from
103 other users' and the super user's processes
104
105 C) mount owner should not be able to induce undesired behavior in
106 other users' or the super user's processes
107
108How are requirements fulfilled?
109~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
110
111 A) The mount owner could gain elevated privileges by either:
112
113 1) creating a filesystem containing a device file, then opening
114 this device
115
116 2) creating a filesystem containing a suid or sgid application,
117 then executing this application
118
119 The solution is not to allow opening device files and ignore
120 setuid and setgid bits when executing programs. To ensure this
121 fusermount always adds "nosuid" and "nodev" to the mount options
122 for non-privileged mounts.
123
124 B) If another user is accessing files or directories in the
125 filesystem, the filesystem daemon serving requests can record the
126 exact sequence and timing of operations performed. This
127 information is otherwise inaccessible to the mount owner, so this
128 counts as an information leak.
129
130 The solution to this problem will be presented in point 2) of C).
131
132 C) There are several ways in which the mount owner can induce
133 undesired behavior in other users' processes, such as:
134
135 1) mounting a filesystem over a file or directory which the mount
136 owner would otherwise not be able to modify (or could only
137 make limited modifications).
138
139 This is solved in fusermount, by checking the access
140 permissions on the mountpoint and only allowing the mount if
141 the mount owner can do unlimited modification (has write
142 access to the mountpoint, and mountpoint is not a "sticky"
143 directory)
144
145 2) Even if 1) is solved the mount owner can change the behavior
146 of other users' processes.
147
148 i) It can slow down or indefinitely delay the execution of a
149 filesystem operation, creating a DoS against the user or the
150 whole system. For example, a suid application locking a
151 system file, and then accessing a file on the mount owner's
152 filesystem, could be stopped, thus causing the system
153 file to be locked forever.
154
155 ii) It can present files or directories of unlimited length, or
156 directory structures of unlimited depth, possibly causing a
157 system process to eat up diskspace, memory or other
158 resources, again causing DoS.
159
160 The solution to this as well as B) is not to allow processes
161 to access the filesystem, which could otherwise not be
162 monitored or manipulated by the mount owner. Since if the
163 mount owner can ptrace a process, it can do all of the above
164 without using a FUSE mount, the same criteria as used in
165 ptrace can be used to check if a process is allowed to access
166 the filesystem or not.
167
168 Note that the ptrace check is not strictly necessary to
169 prevent C/2/i; it is enough to check if the mount owner has enough
170 privilege to send a signal to the process accessing the
171 filesystem, since SIGSTOP can be used to get a similar effect.
172
173I think these limitations are unacceptable?
174~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
175
176If a sysadmin trusts the users enough, or can ensure through other
177measures, that system processes will never enter non-privileged
178mounts, it can relax the last limitation with a "user_allow_other"
179config option. If this config option is set, the mounting user can
180add the "allow_other" mount option which disables the check for other
181users' processes.
182
183Kernel - userspace interface
184~~~~~~~~~~~~~~~~~~~~~~~~~~~~
185
186The following diagram shows how a filesystem operation (in this
187example unlink) is performed in FUSE.
188
189NOTE: everything in this description is greatly simplified
190
191 | "rm /mnt/fuse/file" | FUSE filesystem daemon
192 | |
193 | | >sys_read()
194 | | >fuse_dev_read()
195 | | >request_wait()
196 | | [sleep on fc->waitq]
197 | |
198 | >sys_unlink() |
199 | >fuse_unlink() |
200 | [get request from |
201 | fc->unused_list] |
202 | >request_send() |
203 | [queue req on fc->pending] |
204 | [wake up fc->waitq] | [woken up]
205 | >request_wait_answer() |
206 | [sleep on req->waitq] |
207 | | <request_wait()
208 | | [remove req from fc->pending]
209 | | [copy req to read buffer]
210 | | [add req to fc->processing]
211 | | <fuse_dev_read()
212 | | <sys_read()
213 | |
214 | | [perform unlink]
215 | |
216 | | >sys_write()
217 | | >fuse_dev_write()
218 | | [look up req in fc->processing]
219 | | [remove from fc->processing]
220 | | [copy write buffer to req]
221 | [woken up] | [wake up req->waitq]
222 | | <fuse_dev_write()
223 | | <sys_write()
224 | <request_wait_answer() |
225 | <request_send() |
226 | [add request to |
227 | fc->unused_list] |
228 | <fuse_unlink() |
229 | <sys_unlink() |
230
231There are a couple of ways in which to deadlock a FUSE filesystem.
232Since we are talking about unprivileged userspace programs,
233something must be done about these.
234
235Scenario 1 - Simple deadlock
236-----------------------------
237
238 | "rm /mnt/fuse/file" | FUSE filesystem daemon
239 | |
240 | >sys_unlink("/mnt/fuse/file") |
241 | [acquire inode semaphore |
242 | for "file"] |
243 | >fuse_unlink() |
244 | [sleep on req->waitq] |
245 | | <sys_read()
246 | | >sys_unlink("/mnt/fuse/file")
247 | | [acquire inode semaphore
248 | | for "file"]
249 | | *DEADLOCK*
250
251The solution for this is to allow requests to be interrupted while
252they are in userspace:
253
254 | [interrupted by signal] |
255 | <fuse_unlink() |
256 | [release semaphore] | [semaphore acquired]
257 | <sys_unlink() |
258 | | >fuse_unlink()
259 | | [queue req on fc->pending]
260 | | [wake up fc->waitq]
261 | | [sleep on req->waitq]
262
263If the filesystem daemon was single threaded, this will stop here,
264since there's no other thread to dequeue and execute the request.
265In this case the solution is to kill the FUSE daemon as well. If
266there are multiple serving threads, you just have to kill them as
267long as any remain.
268
269Moral: a filesystem which deadlocks, can soon find itself dead.
270
271Scenario 2 - Tricky deadlock
272----------------------------
273
274This one needs a carefully crafted filesystem. It's a variation on
275the above, only the call back to the filesystem is not explicit,
276but is caused by a pagefault.
277
278 | Kamikaze filesystem thread 1 | Kamikaze filesystem thread 2
279 | |
280 | [fd = open("/mnt/fuse/file")] | [request served normally]
281 | [mmap fd to 'addr'] |
282 | [close fd] | [FLUSH triggers 'magic' flag]
283 | [read a byte from addr] |
284 | >do_page_fault() |
285 | [find or create page] |
286 | [lock page] |
287 | >fuse_readpage() |
288 | [queue READ request] |
289 | [sleep on req->waitq] |
290 | | [read request to buffer]
291 | | [create reply header before addr]
292 | | >sys_write(addr - headerlength)
293 | | >fuse_dev_write()
294 | | [look up req in fc->processing]
295 | | [remove from fc->processing]
296 | | [copy write buffer to req]
297 | | >do_page_fault()
298 | | [find or create page]
299 | | [lock page]
300 | | * DEADLOCK *
301
302The solution is again to let the request be interrupted (not
303elaborated further).
304
305An additional problem is that while the write buffer is being
306copied to the request, the request must not be interrupted. This
307is because the destination address of the copy may not be valid
308after the request is interrupted.
309
310This is solved by doing the copy atomically, and allowing
311interruption while the page(s) belonging to the write buffer are
312faulted with get_user_pages(). The 'req->locked' flag indicates
313when the copy is taking place, and interruption is delayed until
314this flag is unset.
315
diff --git a/Documentation/filesystems/ntfs.txt b/Documentation/filesystems/ntfs.txt
index eef4aca0c753..a5fbc8e897fa 100644
--- a/Documentation/filesystems/ntfs.txt
+++ b/Documentation/filesystems/ntfs.txt
@@ -439,6 +439,18 @@ ChangeLog
439 439
440Note, a technical ChangeLog aimed at kernel hackers is in fs/ntfs/ChangeLog. 440Note, a technical ChangeLog aimed at kernel hackers is in fs/ntfs/ChangeLog.
441 441
4422.1.24:
443 - Support journals ($LogFile) which have been modified by chkdsk. This
444 means users can boot into Windows after we marked the volume dirty.
445 The Windows boot will run chkdsk and then reboot. The user can then
446 immediately boot into Linux rather than having to do a full Windows
447 boot first before rebooting into Linux and we will recognize such a
448 journal and empty it as it is clean by definition.
449 - Support journals ($LogFile) with only one restart page as well as
450 journals with two different restart pages. We sanity check both and
451 either use the only sane one or the more recent one of the two in the
452 case that both are valid.
453 - Lots of bug fixes and enhancements across the board.
4422.1.23: 4542.1.23:
443 - Stamp the user space journal, aka transaction log, aka $UsnJrnl, if 455 - Stamp the user space journal, aka transaction log, aka $UsnJrnl, if
444 it is present and active thus telling Windows and applications using 456 it is present and active thus telling Windows and applications using
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 6c98f2bd421e..d4773565ea2f 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -133,6 +133,7 @@ Table 1-1: Process specific entries in /proc
133 statm Process memory status information 133 statm Process memory status information
134 status Process status in human readable form 134 status Process status in human readable form
135 wchan If CONFIG_KALLSYMS is set, a pre-decoded wchan 135 wchan If CONFIG_KALLSYMS is set, a pre-decoded wchan
136 smaps Extension based on maps, presenting the rss size for each mapped file
136.............................................................................. 137..............................................................................
137 138
138For example, to get the status information of a process, all you have to do is 139For example, to get the status information of a process, all you have to do is
@@ -1240,16 +1241,38 @@ swap-intensive.
1240overcommit_memory 1241overcommit_memory
1241----------------- 1242-----------------
1242 1243
1243This file contains one value. The following algorithm is used to decide if 1244Controls overcommit of system memory, possibly allowing processes
1244there's enough memory: if the value of overcommit_memory is positive, then 1245to allocate (but not use) more memory than is actually available.
1245there's always enough memory. This is a useful feature, since programs often
1246malloc() huge amounts of memory 'just in case', while they only use a small
1247part of it. Leaving this value at 0 will lead to the failure of such a huge
1248malloc(), when in fact the system has enough memory for the program to run.
1249 1246
1250On the other hand, enabling this feature can cause you to run out of memory 1247
1251and thrash the system to death, so large and/or important servers will want to 12480 - Heuristic overcommit handling. Obvious overcommits of
1252set this value to 0. 1249 address space are refused. Used for a typical system. It
1250 ensures a seriously wild allocation fails while allowing
1251 overcommit to reduce swap usage. root is allowed to
1252 allocate slighly more memory in this mode. This is the
1253 default.
1254
12551 - Always overcommit. Appropriate for some scientific
1256 applications.
1257
12582 - Don't overcommit. The total address space commit
1259 for the system is not permitted to exceed swap plus a
1260 configurable percentage (default is 50) of physical RAM.
1261 Depending on the percentage you use, in most situations
1262 this means a process will not be killed while attempting
1263 to use already-allocated memory but will receive errors
1264 on memory allocation as appropriate.
1265
1266overcommit_ratio
1267----------------
1268
1269Percentage of physical memory size to include in overcommit calculations
1270(see above.)
1271
1272Memory allocation limit = swapspace + physmem * (overcommit_ratio / 100)
1273
1274 swapspace = total size of all swap areas
1275 physmem = size of physical memory in system
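
As a worked example of the formula above, here is a small sketch in C.
commit_limit() is a hypothetical helper for illustration, not a kernel
function; all sizes are assumed to be in the same unit (for example kB).

```c
#include <stdint.h>

/*
 * Hypothetical helper: allocation limit for overcommit_memory mode 2.
 * limit = swapspace + physmem * (overcommit_ratio / 100)
 */
static uint64_t commit_limit(uint64_t swapspace, uint64_t physmem,
                             unsigned int overcommit_ratio)
{
    /* Multiply before dividing so integer arithmetic keeps precision. */
    return swapspace + physmem * overcommit_ratio / 100;
}
```

With 2097152 kB of swap, 4194304 kB of RAM and the default ratio of 50,
the limit is 2097152 + 2097152 = 4194304 kB.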
1253 1276
1254nr_hugepages and hugetlb_shm_group 1277nr_hugepages and hugetlb_shm_group
1255---------------------------------- 1278----------------------------------
diff --git a/Documentation/filesystems/relayfs.txt b/Documentation/filesystems/relayfs.txt
new file mode 100644
index 000000000000..d24e1b0d4f39
--- /dev/null
+++ b/Documentation/filesystems/relayfs.txt
@@ -0,0 +1,362 @@
1
2relayfs - a high-speed data relay filesystem
3============================================
4
5relayfs is a filesystem designed to provide an efficient mechanism for
6tools and facilities to relay large and potentially sustained streams
7of data from kernel space to user space.
8
9The main abstraction of relayfs is the 'channel'. A channel consists
10of a set of per-cpu kernel buffers each represented by a file in the
11relayfs filesystem. Kernel clients write into a channel using
12efficient write functions which automatically log to the current cpu's
13channel buffer. User space applications mmap() the per-cpu files and
14retrieve the data as it becomes available.
15
16The format of the data logged into the channel buffers is completely
17up to the relayfs client; relayfs does however provide hooks which
18allow clients to impose some stucture on the buffer data. Nor does
19relayfs implement any form of data filtering - this also is left to
20the client. The purpose is to keep relayfs as simple as possible.
21
22This document provides an overview of the relayfs API. The details of
23the function parameters are documented along with the functions in the
24filesystem code - please see that for details.
25
26Semantics
27=========
28
29Each relayfs channel has one buffer per CPU, each buffer has one or
30more sub-buffers. Messages are written to the first sub-buffer until
31it is too full to contain a new message, in which case it is
32written to the next (if available). Messages are never split across
33sub-buffers. At this point, userspace can be notified so it empties
34the first sub-buffer, while the kernel continues writing to the next.
35
36When notified that a sub-buffer is full, the kernel knows how many
37bytes of it are padding i.e. unused. Userspace can use this knowledge
38to copy only valid data.
39
40After copying it, userspace can notify the kernel that a sub-buffer
41has been consumed.
42
43relayfs can operate in a mode where it will overwrite data not yet
44collected by userspace, and not wait for it to consume it.
45
46relayfs itself does not provide for communication of such data between
47userspace and kernel, allowing the kernel side to remain simple and not
48impose a single interface on userspace. It does provide a separate
49helper though, described below.
50
51klog, relay-app & librelay
52==========================
53
54relayfs itself is ready to use, but to make things easier, two
55additional systems are provided. klog is a simple wrapper to make
56writing formatted text or raw data to a channel simpler, regardless of
57whether a channel to write into exists or not, or whether relayfs is
58compiled into the kernel or is configured as a module. relay-app is
59the kernel counterpart of userspace librelay.c; combined, these two
60files provide glue to easily stream data to disk, without having to
61bother with housekeeping. klog and relay-app can be used together,
62with klog providing high-level logging functions to the kernel and
63relay-app taking care of kernel-user control and disk-logging chores.
64
65It is possible to use relayfs without relay-app & librelay, but you'll
66have to implement communication between userspace and kernel, allowing
67both to convey the state of buffers (full, empty, amount of padding).
68
69klog, relay-app and librelay can be found in the relay-apps tarball on
70http://relayfs.sourceforge.net
71
72The relayfs user space API
73==========================
74
75relayfs implements basic file operations for user space access to
76relayfs channel buffer data. Here are the file operations that are
77available and some comments regarding their behavior:
78
79open() enables user to open an _existing_ buffer.
80
81mmap() results in channel buffer being mapped into the caller's
82 memory space. Note that you can't do a partial mmap - you must
83 map the entire file, which is NRBUF * SUBBUFSIZE.
84
85read() read the contents of a channel buffer. The bytes read are
86 'consumed' by the reader i.e. they won't be available again
87 to subsequent reads. If the channel is being used in
88 no-overwrite mode (the default), it can be read at any time
89 even if there's an active kernel writer. If the channel is
90 being used in overwrite mode and there are active channel
91 writers, results may be unpredictable - users should make
92 sure that all logging to the channel has ended before using
93 read() with overwrite mode.
94
95poll() POLLIN/POLLRDNORM/POLLERR supported. User applications are
96 notified when sub-buffer boundaries are crossed.
97
98close() decrements the channel buffer's refcount. When the refcount
99 reaches 0 i.e. when no process or kernel client has the buffer
100 open, the channel buffer is freed.
101
102
103In order for a user application to make use of relayfs files, the
104relayfs filesystem must be mounted. For example,
105
106 mount -t relayfs relayfs /mnt/relay
107
108NOTE: relayfs doesn't need to be mounted for kernel clients to create
109 or use channels - it only needs to be mounted when user space
110 applications need access to the buffer data.
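
A user space reader might map one per-cpu file along the following
lines. This is a minimal sketch under stated assumptions:
map_channel_buffer() is a hypothetical name, the caller is assumed to
know the n_subbufs and subbuf_size values the kernel client passed to
relay_open(), and the PROT_READ/MAP_SHARED flags are a choice made
here, not something relayfs is documented to require.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map a whole per-cpu channel buffer read-only.
 * On success returns the mapping and stores its length in '*len';
 * returns NULL if the file cannot be opened or mapped.
 */
static void *map_channel_buffer(const char *path, size_t n_subbufs,
                                size_t subbuf_size, size_t *len)
{
    void *addr;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;

    /* Partial mappings are not allowed, so always map the entire file. */
    *len = n_subbufs * subbuf_size;
    addr = mmap(NULL, *len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);          /* the mapping stays valid after close() */

    return addr == MAP_FAILED ? NULL : addr;
}
```

A caller would pass a path such as /mnt/relay/basename0 (the name
depends on what the kernel client gave to relay_open()) and release the
mapping with munmap() when done. A reader that also wants poll()
notifications would keep the descriptor open instead of closing it.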
111
112
113The relayfs kernel API
114======================
115
116Here's a summary of the API relayfs provides to in-kernel clients:
117
118
119 channel management functions:
120
121 relay_open(base_filename, parent, subbuf_size, n_subbufs,
122 callbacks)
123 relay_close(chan)
124 relay_flush(chan)
125 relay_reset(chan)
126 relayfs_create_dir(name, parent)
127 relayfs_remove_dir(dentry)
128
129 channel management typically called on instigation of userspace:
130
131 relay_subbufs_consumed(chan, cpu, subbufs_consumed)
132
133 write functions:
134
135 relay_write(chan, data, length)
136 __relay_write(chan, data, length)
137 relay_reserve(chan, length)
138
139 callbacks:
140
141 subbuf_start(buf, subbuf, prev_subbuf, prev_padding)
142 buf_mapped(buf, filp)
143 buf_unmapped(buf, filp)
144
145 helper functions:
146
147 relay_buf_full(buf)
148 subbuf_start_reserve(buf, length)
149
150
151Creating a channel
152------------------
153
154relay_open() is used to create a channel, along with its per-cpu
155channel buffers. Each channel buffer will have an associated file
156created for it in the relayfs filesystem, which can be opened and
157mmapped from user space if desired. The files are named
158basename0...basenameN-1 where N is the number of online cpus, and by
159default will be created in the root of the filesystem. If you want a
160directory structure to contain your relayfs files, you can create it
161with relayfs_create_dir() and pass the parent directory to
162relay_open(). Clients are responsible for cleaning up any directory
163structure they create when the channel is closed - use
164relayfs_remove_dir() for that.
165
166The total size of each per-cpu buffer is calculated by multiplying the
167number of sub-buffers by the sub-buffer size passed into relay_open().
168The idea behind sub-buffers is that they're basically an extension of
169double-buffering to N buffers, and they also allow applications to
170easily implement random-access-on-buffer-boundary schemes, which can
171be important for some high-volume applications. The number and size
172of sub-buffers is completely dependent on the application and even for
173the same application, different conditions will warrant different
174values for these parameters at different times. Typically, the right
175values to use are best decided after some experimentation; in general,
176though, it's safe to assume that having only 1 sub-buffer is a bad
177idea - you're guaranteed to either overwrite data or lose events
178depending on the channel mode being used.
179
180Channel 'modes'
181---------------
182
183relayfs channels can be used in either of two modes - 'overwrite' or
184'no-overwrite'. The mode is entirely determined by the implementation
185of the subbuf_start() callback, as described below. In 'overwrite'
186mode, also known as 'flight recorder' mode, writes continuously cycle
187around the buffer and will never fail, but will unconditionally
188overwrite old data regardless of whether it's actually been consumed.
189In no-overwrite mode, writes will fail i.e. data will be lost, if the
190number of unconsumed sub-buffers equals the total number of
191sub-buffers in the channel. It should be clear that if there is no
192consumer or if the consumer can't consume sub-buffers fast enough,
193data will be lost in either case; the only difference is whether data
194is lost from the beginning or the end of a buffer.
195
196As explained above, a relayfs channel is made up of one or more
197per-cpu channel buffers, each implemented as a circular buffer
198subdivided into one or more sub-buffers. Messages are written into
199the current sub-buffer of the channel's current per-cpu buffer via the
200write functions described below. Whenever a message can't fit into
201the current sub-buffer, because there's no room left for it, the
202client is notified via the subbuf_start() callback that a switch to a
203new sub-buffer is about to occur. The client uses this callback to 1)
204initialize the next sub-buffer if appropriate 2) finalize the previous
205sub-buffer if appropriate and 3) return a boolean value indicating
206whether or not to actually go ahead with the sub-buffer switch.
207
208To implement 'no-overwrite' mode, the kernel client would provide
209an implementation of the subbuf_start() callback something like the
210following:
211
212static int subbuf_start(struct rchan_buf *buf,
213 void *subbuf,
214 void *prev_subbuf,
215 unsigned int prev_padding)
216{
217 if (prev_subbuf)
218 *((unsigned *)prev_subbuf) = prev_padding;
219
220 if (relay_buf_full(buf))
221 return 0;
222
223 subbuf_start_reserve(buf, sizeof(unsigned int));
224
225 return 1;
226}
227
228If the current buffer is full i.e. all sub-buffers remain unconsumed,
229the callback returns 0 to indicate that the buffer switch should not
230occur yet i.e. until the consumer has had a chance to read the current
231set of ready sub-buffers. For the relay_buf_full() function to make
232sense, the consumer is responsible for notifying relayfs when
233sub-buffers have been consumed via relay_subbufs_consumed(). Any
234subsequent attempts to write into the buffer will again invoke the
235subbuf_start() callback with the same parameters; only when the
236consumer has consumed one or more of the ready sub-buffers will
237relay_buf_full() return 0, in which case the buffer switch can
238continue.
239
240The implementation of the subbuf_start() callback for 'overwrite' mode
241would be very similar:
242
243static int subbuf_start(struct rchan_buf *buf,
244 void *subbuf,
245 void *prev_subbuf,
246 unsigned int prev_padding)
247{
248 if (prev_subbuf)
249 *((unsigned *)prev_subbuf) = prev_padding;
250
251 subbuf_start_reserve(buf, sizeof(unsigned int));
252
253 return 1;
254}
255
256In this case, the relay_buf_full() check is meaningless and the
257callback always returns 1, causing the buffer switch to occur
258unconditionally. It's also meaningless for the client to use the
259relay_subbufs_consumed() function in this mode, as it's never
260consulted.
261
262The default subbuf_start() implementation, used if the client doesn't
263define any callbacks, or doesn't define the subbuf_start() callback,
264implements the simplest possible 'no-overwrite' mode i.e. it does
265nothing but return 0.
266
267Header information can be reserved at the beginning of each sub-buffer
268by calling the subbuf_start_reserve() helper function from within the
269subbuf_start() callback. This reserved area can be used to store
270whatever information the client wants. In the example above, room is
271reserved in each sub-buffer to store the padding count for that
272sub-buffer. This is filled in for the previous sub-buffer in the
273subbuf_start() implementation; the padding value for the previous
274sub-buffer is passed into the subbuf_start() callback along with a
275pointer to the previous sub-buffer, since the padding value isn't
276known until a sub-buffer is filled. The subbuf_start() callback is
277also called for the first sub-buffer when the channel is opened, to
278give the client a chance to reserve space in it. In this case the
279previous sub-buffer pointer passed into the callback will be NULL, so
280the client should check the value of the prev_subbuf pointer before
281writing into the previous sub-buffer.
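
On the consumer side, the padding count stored by the example
callbacks can be used to copy only valid data. The following is a
minimal sketch that assumes the layout produced by those examples (an
unsigned int padding count at the start of each completed sub-buffer,
with the padding bytes at its end); copy_subbuf_payload() is a
hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy the valid payload of one completed sub-buffer
 * into 'out', skipping the header and the trailing padding.
 * Returns the payload length, or 0 if the header is inconsistent or
 * 'out' is too small.
 */
static size_t copy_subbuf_payload(const void *subbuf, size_t subbuf_size,
                                  void *out, size_t out_size)
{
    unsigned int padding;
    size_t payload;

    if (subbuf_size < sizeof(padding))
        return 0;

    /* The header was filled in by subbuf_start() for the previous sub-buffer. */
    memcpy(&padding, subbuf, sizeof(padding));
    if (padding > subbuf_size - sizeof(padding))
        return 0;       /* padding cannot exceed the data area */

    payload = subbuf_size - sizeof(padding) - padding;
    if (payload > out_size)
        return 0;

    memcpy(out, (const char *)subbuf + sizeof(padding), payload);
    return payload;
}
```

Note that the header of the sub-buffer currently being written has not
been filled in yet, so this only applies to sub-buffers that the kernel
has already switched away from.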
282
283Writing to a channel
284--------------------
285
286Kernel clients write data into the current cpu's channel buffer using
287relay_write() or __relay_write(). relay_write() is the main logging
288function - it uses local_irq_save() to protect the buffer and should be
289used if you might be logging from interrupt context. If you know
290you'll never be logging from interrupt context, you can use
291__relay_write(), which only disables preemption. These functions
292don't return a value, so you can't determine whether or not they
293failed - the assumption is that you wouldn't want to check a return
294value in the fast logging path anyway, and that they'll always succeed
295unless the buffer is full and no-overwrite mode is being used, in
296which case you can detect a failed write in the subbuf_start()
297callback by calling the relay_buf_full() helper function.
298
299relay_reserve() is used to reserve a slot in a channel buffer which
300can be written to later. This would typically be used in applications
301that need to write directly into a channel buffer without having to
302stage data in a temporary buffer beforehand. Because the actual write
303may not happen immediately after the slot is reserved, applications
304using relay_reserve() can keep a count of the number of bytes actually
305written, either in space reserved in the sub-buffers themselves or as
306a separate array. See the 'reserve' example in the relay-apps tarball
307at http://relayfs.sourceforge.net for an example of how this can be
308done. Because the write is under control of the client and is
309separated from the reserve, relay_reserve() doesn't protect the buffer
310at all - it's up to the client to provide the appropriate
311synchronization when using relay_reserve().
312
313Closing a channel
314-----------------
315
316The client calls relay_close() when it's finished using the channel.
317The channel and its associated buffers are destroyed when there are no
318longer any references to any of the channel buffers. relay_flush()
319forces a sub-buffer switch on all the channel buffers, and can be used
320to finalize and process the last sub-buffers before the channel is
321closed.
322
323Misc
324----
325
326Some applications may want to keep a channel around and re-use it
327rather than open and close a new channel for each use. relay_reset()
328can be used for this purpose - it resets a channel to its initial
329state without reallocating channel buffer memory or destroying
330existing mappings. It should however only be called when it's safe to
331do so i.e. when the channel isn't currently being written to.
332
333Finally, there are a couple of utility callbacks that can be used for
334different purposes. buf_mapped() is called whenever a channel buffer
335is mmapped from user space and buf_unmapped() is called when it's
336unmapped. The client can use this notification to trigger actions
337within the kernel application, such as enabling/disabling logging to
338the channel.
339
340
341Resources
342=========
343
344For news, example code, mailing list, etc. see the relayfs homepage:
345
346 http://relayfs.sourceforge.net
347
348
349Credits
350=======
351
352The ideas and specs for relayfs came about as a result of discussions
353on tracing involving the following:
354
355Michel Dagenais <michel.dagenais@polymtl.ca>
356Richard Moore <richardj_moore@uk.ibm.com>
357Bob Wisniewski <bob@watson.ibm.com>
358Karim Yaghmour <karim@opersys.com>
359Tom Zanussi <zanussi@us.ibm.com>
360
361Also thanks to Hubertus Franke for a lot of useful suggestions and bug
362reports.
diff --git a/Documentation/filesystems/sysfs.txt b/Documentation/filesystems/sysfs.txt
index dc276598a65a..c8bce82ddcac 100644
--- a/Documentation/filesystems/sysfs.txt
+++ b/Documentation/filesystems/sysfs.txt
@@ -90,7 +90,7 @@ void device_remove_file(struct device *, struct device_attribute *);
90 90
91It also defines this helper for defining device attributes: 91It also defines this helper for defining device attributes:
92 92
93#define DEVICE_ATTR(_name,_mode,_show,_store) \ 93#define DEVICE_ATTR(_name, _mode, _show, _store) \
94struct device_attribute dev_attr_##_name = { \ 94struct device_attribute dev_attr_##_name = { \
95 .attr = {.name = __stringify(_name) , .mode = _mode }, \ 95 .attr = {.name = __stringify(_name) , .mode = _mode }, \
96 .show = _show, \ 96 .show = _show, \
@@ -99,14 +99,14 @@ struct device_attribute dev_attr_##_name = { \
99 99
100For example, declaring 100For example, declaring
101 101
102static DEVICE_ATTR(foo,0644,show_foo,store_foo); 102static DEVICE_ATTR(foo, S_IWUSR | S_IRUGO, show_foo, store_foo);
103 103
104is equivalent to doing: 104is equivalent to doing:
105 105
106static struct device_attribute dev_attr_foo = { 106static struct device_attribute dev_attr_foo = {
107 .attr = { 107 .attr = {
108 .name = "foo", 108 .name = "foo",
109 .mode = 0644, 109 .mode = S_IWUSR | S_IRUGO,
110 }, 110 },
111 .show = show_foo, 111 .show = show_foo,
112 .store = store_foo, 112 .store = store_foo,
@@ -121,8 +121,8 @@ set of sysfs operations for forwarding read and write calls to the
121show and store methods of the attribute owners. 121show and store methods of the attribute owners.
122 122
123struct sysfs_ops { 123struct sysfs_ops {
124 ssize_t (*show)(struct kobject *, struct attribute *,char *); 124 ssize_t (*show)(struct kobject *, struct attribute *, char *);
125 ssize_t (*store)(struct kobject *,struct attribute *,const char *); 125 ssize_t (*store)(struct kobject *, struct attribute *, const char *);
126}; 126};
127 127
128[ Subsystems should have already defined a struct kobj_type as a 128[ Subsystems should have already defined a struct kobj_type as a
@@ -137,7 +137,7 @@ calls the associated methods.
137 137
138To illustrate: 138To illustrate:
139 139
140#define to_dev_attr(_attr) container_of(_attr,struct device_attribute,attr) 140#define to_dev_attr(_attr) container_of(_attr, struct device_attribute, attr)
141#define to_dev(d) container_of(d, struct device, kobj) 141#define to_dev(d) container_of(d, struct device, kobj)
142 142
143static ssize_t 143static ssize_t
@@ -148,7 +148,7 @@ dev_attr_show(struct kobject * kobj, struct attribute * attr, char * buf)
148 ssize_t ret = 0; 148 ssize_t ret = 0;
149 149
150 if (dev_attr->show) 150 if (dev_attr->show)
151 ret = dev_attr->show(dev,buf); 151 ret = dev_attr->show(dev, buf);
152 return ret; 152 return ret;
153} 153}
154 154
@@ -216,16 +216,16 @@ A very simple (and naive) implementation of a device attribute is:
216 216
217static ssize_t show_name(struct device *dev, struct device_attribute *attr, char *buf) 217static ssize_t show_name(struct device *dev, struct device_attribute *attr, char *buf)
218{ 218{
219 return sprintf(buf,"%s\n",dev->name); 219 return snprintf(buf, PAGE_SIZE, "%s\n", dev->name);
220} 220}
221 221
222static ssize_t store_name(struct device * dev, const char * buf) 222static ssize_t store_name(struct device * dev, const char * buf)
223{ 223{
224 sscanf(buf,"%20s",dev->name); 224 sscanf(buf, "%20s", dev->name);
225 return strlen(buf); 225 return strnlen(buf, PAGE_SIZE);
226} 226}
227 227
228static DEVICE_ATTR(name,S_IRUGO,show_name,store_name); 228static DEVICE_ATTR(name, S_IRUGO, show_name, store_name);
229 229
230 230
231(Note that the real implementation doesn't allow userspace to set the 231(Note that the real implementation doesn't allow userspace to set the
@@ -290,7 +290,7 @@ struct device_attribute {
290 290
291Declaring: 291Declaring:
292 292
293DEVICE_ATTR(_name,_str,_mode,_show,_store); 293DEVICE_ATTR(_name, _str, _mode, _show, _store);
294 294
295Creation/Removal: 295Creation/Removal:
296 296
@@ -310,7 +310,7 @@ struct bus_attribute {
310 310
311Declaring: 311Declaring:
312 312
313BUS_ATTR(_name,_mode,_show,_store) 313BUS_ATTR(_name, _mode, _show, _store)
314 314
315Creation/Removal: 315Creation/Removal:
316 316
@@ -331,7 +331,7 @@ struct driver_attribute {
331 331
332Declaring: 332Declaring:
333 333
334DRIVER_ATTR(_name,_mode,_show,_store) 334DRIVER_ATTR(_name, _mode, _show, _store)
335 335
336Creation/Removal: 336Creation/Removal:
337 337
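The forwarding pattern in the sysfs.txt hunks above (to_dev_attr(), to_dev() and dev_attr_show()) can be tried outside the kernel. The following is a minimal userspace sketch, not the kernel implementation: the struct layouts are simplified stand-ins, BUF_SIZE is an assumed substitute for PAGE_SIZE, and container_of is redefined locally with offsetof.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

/* Userspace stand-ins for the kernel types; layouts are illustrative only. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct attribute { const char *name; unsigned int mode; };
struct kobject { const char *name; };
struct device { struct kobject kobj; char name[21]; };

struct device_attribute {
	struct attribute attr;
	ssize_t (*show)(struct device *dev, char *buf);
};

#define to_dev_attr(_attr) container_of(_attr, struct device_attribute, attr)
#define to_dev(d) container_of(d, struct device, kobj)

#define BUF_SIZE 4096	/* assumption: stands in for PAGE_SIZE */

/* Bounded show method, mirroring the snprintf() form used in the patch. */
ssize_t show_name(struct device *dev, char *buf)
{
	return snprintf(buf, BUF_SIZE, "%s\n", dev->name);
}

/* Recover the owning structures from the embedded members, then forward. */
ssize_t dev_attr_show(struct kobject *kobj, struct attribute *attr, char *buf)
{
	struct device_attribute *dev_attr = to_dev_attr(attr);
	struct device *dev = to_dev(kobj);
	ssize_t ret = 0;

	if (dev_attr->show)
		ret = dev_attr->show(dev, buf);
	return ret;
}
```

The point of the sketch is that the generic layer only ever sees the embedded struct kobject and struct attribute; container_of() walks back to the enclosing device and device_attribute, so an attribute without a show method simply yields 0.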
diff --git a/Documentation/filesystems/v9fs.txt b/Documentation/filesystems/v9fs.txt
new file mode 100644
index 000000000000..4e92feb6b507
--- /dev/null
+++ b/Documentation/filesystems/v9fs.txt
@@ -0,0 +1,95 @@
1 V9FS: 9P2000 for Linux
2 ======================
3
4ABOUT
5=====
6
7v9fs is a Unix implementation of the Plan 9 9p remote filesystem protocol.
8
9This software was originally developed by Ron Minnich <rminnich@lanl.gov>
10and Maya Gokhale <maya@lanl.gov>. Additional development by Greg Watson
11<gwatson@lanl.gov> and most recently Eric Van Hensbergen
12<ericvh@gmail.com> and Latchesar Ionkov <lucho@ionkov.net>.
13
14USAGE
15=====
16
17For remote file server:
18
19 mount -t 9P 10.10.1.2 /mnt/9
20
21For Plan 9 From User Space applications (http://swtch.com/plan9)
22
23 mount -t 9P `namespace`/acme /mnt/9 -o proto=unix,name=$USER
24
25OPTIONS
26=======
27
28 proto=name select an alternative transport. Valid options are
29 currently:
30 unix - specifying a named pipe mount point
31 tcp - specifying a normal TCP/IP connection
32 fd - use passed file descriptors for connection
33 (see rfdno and wfdno)
34
35 name=name user name to attempt mount as on the remote server. The
36 server may override or ignore this value. Certain user
37 names may require authentication.
38
39 aname=name aname specifies the file tree to access when the server is
40 offering several exported file systems.
41
42 debug=n specifies debug level. The debug level is a bitmask.
43 0x01 = display verbose error messages
44 0x02 = developer debug (DEBUG_CURRENT)
45 0x04 = display 9P trace
46 0x08 = display VFS trace
47 0x10 = display Marshalling debug
48 0x20 = display RPC debug
49 0x40 = display transport debug
50 0x80 = display allocation debug
51
52 rfdno=n the file descriptor for reading with proto=fd
53
54 wfdno=n the file descriptor for writing with proto=fd
55
56 maxdata=n the number of bytes to use for 9P packet payload (msize)
57
58 port=n port to connect to on the remote server
59
60 timeout=n request timeouts (in ms) (default 60000ms)
61
62 noextend force legacy mode (no 9P2000.u semantics)
63
64 uid attempt to mount as a particular uid
65
66 gid attempt to mount with a particular gid
67
68 afid security channel - used by Plan 9 authentication protocols
69
70 nodevmap do not map special files - represent them as normal files.
71 This can be used to share devices/named pipes/sockets between
72 hosts. This functionality will be expanded in later versions.
73
74RESOURCES
75=========
76
77The Linux version of the 9P server, along with some client-side utilities
78can be found at http://v9fs.sf.net (along with a CVS repository of the
79development branch of this module). There are user and developer mailing
80lists here, as well as a bug-tracker.
81
82For more information on the Plan 9 Operating System check out
83http://plan9.bell-labs.com/plan9
84
85For information on Plan 9 from User Space (Plan 9 applications and libraries
86ported to Linux/BSD/OSX/etc) check out http://swtch.com/plan9
87
88
89STATUS
90======
91
92The 2.6 kernel support is working on PPC and x86.
93
94PLEASE USE THE SOURCEFORGE BUG-TRACKER TO REPORT PROBLEMS.
95
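Since the debug= option in v9fs.txt above is a bitmask, several trace classes can be enabled at once by OR-ing their values. A small sketch of composing such a mask follows; the bit values come from the option table, while the enum names and the format_debug_option() helper are hypothetical and exist only for this illustration.

```c
#include <stdio.h>

/* Bit values are from the debug= table; the names are hypothetical. */
enum v9fs_debug_bits {
	DBG_ERROR   = 0x01,	/* verbose error messages */
	DBG_CURRENT = 0x02,	/* developer debug */
	DBG_9P      = 0x04,	/* 9P trace */
	DBG_VFS     = 0x08,	/* VFS trace */
	DBG_CONV    = 0x10,	/* marshalling debug */
	DBG_RPC     = 0x20,	/* RPC debug */
	DBG_TRANS   = 0x40,	/* transport debug */
	DBG_ALLOC   = 0x80,	/* allocation debug */
};

/*
 * Write a "debug=<n>" mount option fragment into buf, using the decimal
 * form of the mask. Returns the value snprintf() returns.
 */
int format_debug_option(char *buf, size_t len, unsigned int mask)
{
	return snprintf(buf, len, "debug=%u", mask & 0xffu);
}
```

For example, combining verbose errors, the 9P trace and RPC debug gives 0x01 | 0x04 | 0x20 = 0x25, so the fragment passed with -o would be debug=37.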
diff --git a/Documentation/filesystems/vfs.txt b/Documentation/filesystems/vfs.txt
index 3f318dd44c77..f042c12e0ed2 100644
--- a/Documentation/filesystems/vfs.txt
+++ b/Documentation/filesystems/vfs.txt
@@ -1,35 +1,27 @@
1/* -*- auto-fill -*- */
2 1
3 Overview of the Virtual File System 2 Overview of the Linux Virtual File System
4 3
5 Richard Gooch <rgooch@atnf.csiro.au> 4 Original author: Richard Gooch <rgooch@atnf.csiro.au>
6 5
7 5-JUL-1999 6 Last updated on August 25, 2005
8 7
8 Copyright (C) 1999 Richard Gooch
9 Copyright (C) 2005 Pekka Enberg
9 10
10Conventions used in this document <section> 11 This file is released under the GPLv2.
11=================================
12 12
13Each section in this document will have the string "<section>" at the
14right-hand side of the section title. Each subsection will have
15"<subsection>" at the right-hand side. These strings are meant to make
16it easier to search through the document.
17 13
18NOTE that the master copy of this document is available online at: 14What is it?
19http://www.atnf.csiro.au/~rgooch/linux/docs/vfs.txt
20
21
22What is it? <section>
23=========== 15===========
24 16
25The Virtual File System (otherwise known as the Virtual Filesystem 17The Virtual File System (otherwise known as the Virtual Filesystem
26Switch) is the software layer in the kernel that provides the 18Switch) is the software layer in the kernel that provides the
27filesystem interface to userspace programs. It also provides an 19filesystem interface to userspace programs. It also provides an
28abstraction within the kernel which allows different filesystem 20abstraction within the kernel which allows different filesystem
29implementations to co-exist. 21implementations to coexist.
30 22
31 23
32A Quick Look At How It Works <section> 24A Quick Look At How It Works
33============================ 25============================
34 26
35In this section I'll briefly describe how things work, before 27In this section I'll briefly describe how things work, before
@@ -38,7 +30,8 @@ when user programs open and manipulate files, and then look from the
38other view which is how a filesystem is supported and subsequently 30other view which is how a filesystem is supported and subsequently
39mounted. 31mounted.
40 32
41Opening a File <subsection> 33
34Opening a File
42-------------- 35--------------
43 36
44The VFS implements the open(2), stat(2), chmod(2) and similar system 37The VFS implements the open(2), stat(2), chmod(2) and similar system
@@ -77,7 +70,7 @@ back to userspace.
77 70
78Opening a file requires another operation: allocation of a file 71Opening a file requires another operation: allocation of a file
79structure (this is the kernel-side implementation of file 72structure (this is the kernel-side implementation of file
80descriptors). The freshly allocated file structure is initialised with 73descriptors). The freshly allocated file structure is initialized with
81a pointer to the dentry and a set of file operation member functions. 74a pointer to the dentry and a set of file operation member functions.
82These are taken from the inode data. The open() file method is then 75These are taken from the inode data. The open() file method is then
83called so the specific filesystem implementation can do its work. You 76called so the specific filesystem implementation can do its work. You
@@ -102,7 +95,8 @@ filesystem or driver code at the same time, on different
102processors. You should ensure that access to shared resources is 95processors. You should ensure that access to shared resources is
103protected by appropriate locks. 96protected by appropriate locks.
104 97
105Registering and Mounting a Filesystem <subsection> 98
99Registering and Mounting a Filesystem
106------------------------------------- 100-------------------------------------
107 101
108If you want to support a new kind of filesystem in the kernel, all you 102If you want to support a new kind of filesystem in the kernel, all you
@@ -123,17 +117,21 @@ updated to point to the root inode for the new filesystem.
123It's now time to look at things in more detail. 117It's now time to look at things in more detail.
124 118
125 119
126struct file_system_type <section> 120struct file_system_type
127======================= 121=======================
128 122
129This describes the filesystem. As of kernel 2.1.99, the following 123This describes the filesystem. As of kernel 2.6.13, the following
130members are defined: 124members are defined:
131 125
132struct file_system_type { 126struct file_system_type {
133 const char *name; 127 const char *name;
134 int fs_flags; 128 int fs_flags;
135 struct super_block *(*read_super) (struct super_block *, void *, int); 129 struct super_block *(*get_sb) (struct file_system_type *, int,
136 struct file_system_type * next; 130 const char *, void *);
131 void (*kill_sb) (struct super_block *);
132 struct module *owner;
133 struct file_system_type * next;
134 struct list_head fs_supers;
137}; 135};
138 136
139 name: the name of the filesystem type, such as "ext2", "iso9660", 137 name: the name of the filesystem type, such as "ext2", "iso9660",
@@ -141,51 +139,97 @@ struct file_system_type {
141 139
142 fs_flags: various flags (e.g. FS_REQUIRES_DEV, FS_NO_DCACHE, etc.) 140 fs_flags: various flags (e.g. FS_REQUIRES_DEV, FS_NO_DCACHE, etc.)
143 141
144 read_super: the method to call when a new instance of this 142 get_sb: the method to call when a new instance of this
145 filesystem should be mounted 143 filesystem should be mounted
146 144
147 next: for internal VFS use: you should initialise this to NULL 145 kill_sb: the method to call when an instance of this filesystem
146 should be unmounted
147
148 owner: for internal VFS use: you should initialize this to THIS_MODULE in
149 most cases.
148 150
149The read_super() method has the following arguments: 151 next: for internal VFS use: you should initialize this to NULL
152
153The get_sb() method has the following arguments:
150 154
151 struct super_block *sb: the superblock structure. This is partially 155 struct super_block *sb: the superblock structure. This is partially
152 initialised by the VFS and the rest must be initialised by the 156 initialized by the VFS and the rest must be initialized by the
153 read_super() method 157 get_sb() method
158
159 int flags: mount flags
160
161 const char *dev_name: the device name we are mounting.
154 162
155 void *data: arbitrary mount options, usually comes as an ASCII 163 void *data: arbitrary mount options, usually comes as an ASCII
156 string 164 string
157 165
158 int silent: whether or not to be silent on error 166 int silent: whether or not to be silent on error
159 167
160The read_super() method must determine if the block device specified 168The get_sb() method must determine if the block device specified
161in the superblock contains a filesystem of the type the method 169in the superblock contains a filesystem of the type the method
162supports. On success the method returns the superblock pointer, on 170supports. On success the method returns the superblock pointer, on
163failure it returns NULL. 171failure it returns NULL.
164 172
165The most interesting member of the superblock structure that the 173The most interesting member of the superblock structure that the
166read_super() method fills in is the "s_op" field. This is a pointer to 174get_sb() method fills in is the "s_op" field. This is a pointer to
167a "struct super_operations" which describes the next level of the 175a "struct super_operations" which describes the next level of the
168filesystem implementation. 176filesystem implementation.
169 177
178Usually, a filesystem uses one of the generic get_sb()
179implementations and provides a fill_super() method instead. The
180generic methods are:
181
182 get_sb_bdev: mount a filesystem residing on a block device
170 183
171struct super_operations <section> 184 get_sb_nodev: mount a filesystem that is not backed by a device
185
186 get_sb_single: mount a filesystem which shares the instance between
187 all mounts
188
189A fill_super() method implementation has the following arguments:
190
191 struct super_block *sb: the superblock structure. The method fill_super()
192 must initialize this properly.
193
194 void *data: arbitrary mount options, usually comes as an ASCII
195 string
196
197 int silent: whether or not to be silent on error
198
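The division of labour described above, where a generic get_sb() helper owns the superblock and the filesystem supplies only fill_super(), can be modelled in userspace. This is a sketch under stated assumptions, not kernel code: struct super_block_model, get_sb_model() and demo_fill_super() are hypothetical names, and the NULL-on-failure convention follows the text above.

```c
#include <stdlib.h>
#include <string.h>

/* Userspace model only: these are NOT the kernel definitions. */
struct super_block_model {
	const void *s_op;	/* would point at a struct super_operations */
	unsigned long s_magic;
};

typedef int (*fill_super_fn)(struct super_block_model *sb, void *data,
			     int silent);

/*
 * Hypothetical counterpart of a generic get_sb helper: allocate the
 * superblock, let the filesystem fill it in, and free it on failure.
 */
struct super_block_model *get_sb_model(void *data, int silent,
				       fill_super_fn fill_super)
{
	struct super_block_model *sb = calloc(1, sizeof(*sb));

	if (!sb)
		return NULL;
	if (fill_super(sb, data, silent) != 0) {
		free(sb);
		return NULL;
	}
	return sb;
}

static const int demo_ops = 0;	/* placeholder for an operations table */

/* Example filesystem callback: rejects the mount option string "fail". */
int demo_fill_super(struct super_block_model *sb, void *data, int silent)
{
	(void)silent;
	if (data && strcmp((const char *)data, "fail") == 0)
		return -1;
	sb->s_magic = 0x64656d6f;
	sb->s_op = &demo_ops;
	return 0;
}
```

The filesystem-specific part shrinks to one callback that initializes the structure it is handed; allocation and error cleanup stay in the shared helper.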
199
200struct super_operations
172======================= 201=======================
173 202
174This describes how the VFS can manipulate the superblock of your 203This describes how the VFS can manipulate the superblock of your
175filesystem. As of kernel 2.1.99, the following members are defined: 204filesystem. As of kernel 2.6.13, the following members are defined:
176 205
177struct super_operations { 206struct super_operations {
178 void (*read_inode) (struct inode *); 207 struct inode *(*alloc_inode)(struct super_block *sb);
179 int (*write_inode) (struct inode *, int); 208 void (*destroy_inode)(struct inode *);
180 void (*put_inode) (struct inode *); 209
181 void (*drop_inode) (struct inode *); 210 void (*read_inode) (struct inode *);
182 void (*delete_inode) (struct inode *); 211
183 int (*notify_change) (struct dentry *, struct iattr *); 212 void (*dirty_inode) (struct inode *);
184 void (*put_super) (struct super_block *); 213 int (*write_inode) (struct inode *, int);
185 void (*write_super) (struct super_block *); 214 void (*put_inode) (struct inode *);
186 int (*statfs) (struct super_block *, struct statfs *, int); 215 void (*drop_inode) (struct inode *);
187 int (*remount_fs) (struct super_block *, int *, char *); 216 void (*delete_inode) (struct inode *);
188 void (*clear_inode) (struct inode *); 217 void (*put_super) (struct super_block *);
218 void (*write_super) (struct super_block *);
219 int (*sync_fs)(struct super_block *sb, int wait);
220 void (*write_super_lockfs) (struct super_block *);
221 void (*unlockfs) (struct super_block *);
222 int (*statfs) (struct super_block *, struct kstatfs *);
223 int (*remount_fs) (struct super_block *, int *, char *);
224 void (*clear_inode) (struct inode *);
225 void (*umount_begin) (struct super_block *);
226
227 void (*sync_inodes) (struct super_block *sb,
228 struct writeback_control *wbc);
229 int (*show_options)(struct seq_file *, struct vfsmount *);
230
231 ssize_t (*quota_read)(struct super_block *, int, char *, size_t, loff_t);
232 ssize_t (*quota_write)(struct super_block *, int, const char *, size_t, loff_t);
189}; 233};
190 234
191All methods are called without any locks being held, unless otherwise 235All methods are called without any locks being held, unless otherwise
@@ -193,43 +237,62 @@ noted. This means that most methods can block safely. All methods are
193only called from a process context (i.e. not from an interrupt handler 237only called from a process context (i.e. not from an interrupt handler
194or bottom half). 238or bottom half).
195 239
240 alloc_inode: this method is called by inode_alloc() to allocate memory
241 for struct inode and initialize it.
242
243 destroy_inode: this method is called by destroy_inode() to release
244 resources allocated for struct inode.
245
196 read_inode: this method is called to read a specific inode from the 246 read_inode: this method is called to read a specific inode from the
197 mounted filesystem. The "i_ino" member in the "struct inode" 247 mounted filesystem. The i_ino member in the struct inode is
198 will be initialised by the VFS to indicate which inode to 248 initialized by the VFS to indicate which inode to read. Other
199 read. Other members are filled in by this method 249 members are filled in by this method.
250
251 You can set this to NULL and use iget5_locked() instead of iget()
252 to read inodes. This is necessary for filesystems for which the
253 inode number is not sufficient to identify an inode.
254
255 dirty_inode: this method is called by the VFS to mark an inode dirty.
200 256
201 write_inode: this method is called when the VFS needs to write an 257 write_inode: this method is called when the VFS needs to write an
202 inode to disc. The second parameter indicates whether the write 258 inode to disc. The second parameter indicates whether the write
203 should be synchronous or not; not all filesystems check this flag. 259 should be synchronous or not; not all filesystems check this flag.
204 260
205 put_inode: called when the VFS inode is removed from the inode 261 put_inode: called when the VFS inode is removed from the inode
206 cache. This method is optional 262 cache.
207 263
208 drop_inode: called when the last access to the inode is dropped, 264 drop_inode: called when the last access to the inode is dropped,
209 with the inode_lock spinlock held. 265 with the inode_lock spinlock held.
210 266
211 This method should be either NULL (normal unix filesystem 267 This method should be either NULL (normal UNIX filesystem
212 semantics) or "generic_delete_inode" (for filesystems that do not 268 semantics) or "generic_delete_inode" (for filesystems that do not
213 want to cache inodes - causing "delete_inode" to always be 269 want to cache inodes - causing "delete_inode" to always be
214 called regardless of the value of i_nlink) 270 called regardless of the value of i_nlink)
215 271
216 The "generic_delete_inode()" behaviour is equivalent to the 272 The "generic_delete_inode()" behavior is equivalent to the
217 old practice of using "force_delete" in the put_inode() case, 273 old practice of using "force_delete" in the put_inode() case,
218 but does not have the races that the "force_delete()" approach 274 but does not have the races that the "force_delete()" approach
219 had. 275 had.
220 276
221 delete_inode: called when the VFS wants to delete an inode 277 delete_inode: called when the VFS wants to delete an inode
222 278
223 notify_change: called when VFS inode attributes are changed. If this
224 is NULL the VFS falls back to the write_inode() method. This
225 is called with the kernel lock held
226
227 put_super: called when the VFS wishes to free the superblock 279 put_super: called when the VFS wishes to free the superblock
228 (i.e. unmount). This is called with the superblock lock held 280 (i.e. unmount). This is called with the superblock lock held
229 281
230 write_super: called when the VFS superblock needs to be written to 282 write_super: called when the VFS superblock needs to be written to
231 disc. This method is optional 283 disc. This method is optional
232 284
285 sync_fs: called when VFS is writing out all dirty data associated with
286 a superblock. The second parameter indicates whether the method
287 should wait until the write out has been completed. Optional.
288
289 write_super_lockfs: called when VFS is locking a filesystem and forcing
290 it into a consistent state. This function is currently used by the
291 Logical Volume Manager (LVM).
292
293 unlockfs: called when VFS is unlocking a filesystem and making it writable
294 again.
295
233 statfs: called when the VFS needs to get filesystem statistics. This 296 statfs: called when the VFS needs to get filesystem statistics. This
234 is called with the kernel lock held 297 is called with the kernel lock held
235 298
@@ -238,21 +301,31 @@ or bottom half).
238 301
239 clear_inode: called when the VFS clears the inode. Optional 302 clear_inode: called when the VFS clears the inode. Optional
240 303
304 umount_begin: called when the VFS is unmounting a filesystem.
305
306 sync_inodes: called when the VFS is writing out dirty data associated with
307 a superblock.
308
309 show_options: called by the VFS to show mount options for /proc/<pid>/mounts.
310
311 quota_read: called by the VFS to read from filesystem quota file.
312
313 quota_write: called by the VFS to write to filesystem quota file.
314
241The read_inode() method is responsible for filling in the "i_op" 315The read_inode() method is responsible for filling in the "i_op"
242field. This is a pointer to a "struct inode_operations" which 316field. This is a pointer to a "struct inode_operations" which
243describes the methods that can be performed on individual inodes. 317describes the methods that can be performed on individual inodes.
244 318
245 319
246struct inode_operations <section> 320struct inode_operations
247======================= 321=======================
248 322
249This describes how the VFS can manipulate an inode in your 323This describes how the VFS can manipulate an inode in your
250filesystem. As of kernel 2.1.99, the following members are defined: 324filesystem. As of kernel 2.6.13, the following members are defined:
251 325
252struct inode_operations { 326struct inode_operations {
253 struct file_operations * default_file_ops; 327 int (*create) (struct inode *,struct dentry *,int, struct nameidata *);
254 int (*create) (struct inode *,struct dentry *,int); 328 struct dentry * (*lookup) (struct inode *,struct dentry *, struct nameidata *);
255 int (*lookup) (struct inode *,struct dentry *);
256 int (*link) (struct dentry *,struct inode *,struct dentry *); 329 int (*link) (struct dentry *,struct inode *,struct dentry *);
257 int (*unlink) (struct inode *,struct dentry *); 330 int (*unlink) (struct inode *,struct dentry *);
258 int (*symlink) (struct inode *,struct dentry *,const char *); 331 int (*symlink) (struct inode *,struct dentry *,const char *);
@@ -261,25 +334,22 @@ struct inode_operations {
261 int (*mknod) (struct inode *,struct dentry *,int,dev_t); 334 int (*mknod) (struct inode *,struct dentry *,int,dev_t);
262 int (*rename) (struct inode *, struct dentry *, 335 int (*rename) (struct inode *, struct dentry *,
263 struct inode *, struct dentry *); 336 struct inode *, struct dentry *);
264 int (*readlink) (struct dentry *, char *,int); 337 int (*readlink) (struct dentry *, char __user *,int);
265 struct dentry * (*follow_link) (struct dentry *, struct dentry *); 338 void * (*follow_link) (struct dentry *, struct nameidata *);
266 int (*readpage) (struct file *, struct page *); 339 void (*put_link) (struct dentry *, struct nameidata *, void *);
267 int (*writepage) (struct page *page, struct writeback_control *wbc);
268 int (*bmap) (struct inode *,int);
269 void (*truncate) (struct inode *); 340 void (*truncate) (struct inode *);
270 int (*permission) (struct inode *, int); 341 int (*permission) (struct inode *, int, struct nameidata *);
271 int (*smap) (struct inode *,int); 342 int (*setattr) (struct dentry *, struct iattr *);
272 int (*updatepage) (struct file *, struct page *, const char *, 343 int (*getattr) (struct vfsmount *mnt, struct dentry *, struct kstat *);
273 unsigned long, unsigned int, int); 344 int (*setxattr) (struct dentry *, const char *,const void *,size_t,int);
274 int (*revalidate) (struct dentry *); 345 ssize_t (*getxattr) (struct dentry *, const char *, void *, size_t);
346 ssize_t (*listxattr) (struct dentry *, char *, size_t);
347 int (*removexattr) (struct dentry *, const char *);
275}; 348};
276 349
277Again, all methods are called without any locks being held, unless 350Again, all methods are called without any locks being held, unless
278otherwise noted. 351otherwise noted.
279 352
280 default_file_ops: this is a pointer to a "struct file_operations"
281 which describes how to open and then manipulate open files
282
283 create: called by the open(2) and creat(2) system calls. Only 353 create: called by the open(2) and creat(2) system calls. Only
284 required if you want to support regular files. The dentry you 354 required if you want to support regular files. The dentry you
285 get should not have an inode (i.e. it should be a negative 355 get should not have an inode (i.e. it should be a negative
@@ -328,31 +398,143 @@ otherwise noted.
328 you want to support reading symbolic links 398 you want to support reading symbolic links
329 399
330 follow_link: called by the VFS to follow a symbolic link to the 400 follow_link: called by the VFS to follow a symbolic link to the
331 inode it points to. Only required if you want to support 401 inode it points to. Only required if you want to support
332 symbolic links 402 symbolic links. This function returns a void pointer cookie
403 that is passed to put_link().
404
405 put_link: called by the VFS to release resources allocated by
406 follow_link(). The cookie returned by follow_link() is passed to
407 this function as the last parameter. It is used by filesystems
408 such as NFS where page cache is not stable (i.e. page that was
409 installed when the symbolic link walk started might not be in the
410 page cache at the end of the walk).
411
412 truncate: called by the VFS to change the size of a file. The i_size
413 field of the inode is set to the desired size by the VFS before
414 this function is called. This function is called by the truncate(2)
415 system call and related functionality.
416
417 permission: called by the VFS to check for access rights on a POSIX-like
418 filesystem.
419
420 setattr: called by the VFS to set attributes for a file. This function is
421 called by chmod(2) and related system calls.
422
423 getattr: called by the VFS to get attributes of a file. This function is
424 called by stat(2) and related system calls.
425
426 setxattr: called by the VFS to set an extended attribute for a file.
427 Extended attribute is a name:value pair associated with an inode. This
428 function is called by setxattr(2) system call.
429
430 getxattr: called by the VFS to retrieve the value of an extended attribute
431 name. This function is called by getxattr(2) system call.
432
433 listxattr: called by the VFS to list all extended attributes for a given
434 file. This function is called by listxattr(2) system call.
435
436 removexattr: called by the VFS to remove an extended attribute from a file.
437 This function is called by removexattr(2) system call.
438
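The follow_link()/put_link() pairing described above is a cookie pattern: whatever follow_link() had to allocate is handed back opaquely, and put_link() receives it as its last parameter to release it. A minimal userspace model, with hypothetical names (struct nameidata_model, follow_link_model(), put_link_model()) and a heap copy standing in for a pinned page:

```c
#include <stdlib.h>
#include <string.h>

/* Userspace model only; the real struct nameidata is far richer. */
struct nameidata_model {
	const char *link;	/* link body currently being walked */
};

/* Pin a private copy of the link target and return it as the cookie. */
void *follow_link_model(const char *target, struct nameidata_model *nd)
{
	size_t len = strlen(target) + 1;
	char *copy = malloc(len);

	if (!copy)
		return NULL;
	memcpy(copy, target, len);
	nd->link = copy;
	return copy;
}

/* Release whatever follow_link_model() pinned, identified by the cookie. */
void put_link_model(struct nameidata_model *nd, void *cookie)
{
	nd->link = NULL;
	free(cookie);
}
```

Because the caller passes the cookie back unchanged, the release side never has to look the resource up again, which matters when, as in the NFS case mentioned above, the original page may no longer be in the page cache.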
439
440struct address_space_operations
441===============================
442
443This describes how the VFS can manipulate mapping of a file to page cache in
444your filesystem. As of kernel 2.6.13, the following members are defined:
445
446struct address_space_operations {
447 int (*writepage)(struct page *page, struct writeback_control *wbc);
448 int (*readpage)(struct file *, struct page *);
449 int (*sync_page)(struct page *);
450 int (*writepages)(struct address_space *, struct writeback_control *);
451 int (*set_page_dirty)(struct page *page);
452 int (*readpages)(struct file *filp, struct address_space *mapping,
453 struct list_head *pages, unsigned nr_pages);
454 int (*prepare_write)(struct file *, struct page *, unsigned, unsigned);
455 int (*commit_write)(struct file *, struct page *, unsigned, unsigned);
456 sector_t (*bmap)(struct address_space *, sector_t);
457 int (*invalidatepage) (struct page *, unsigned long);
458 int (*releasepage) (struct page *, int);
459 ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
460 loff_t offset, unsigned long nr_segs);
461 struct page* (*get_xip_page)(struct address_space *, sector_t,
462 int);
463};
464
465 writepage: called by the VM to write a dirty page to backing store.
466
467 readpage: called by the VM to read a page from backing store.
468
469 sync_page: called by the VM to notify the backing store to perform all
470 queued I/O operations for a page. I/O operations for other pages
471 associated with this address_space object may also be performed.
472
473 writepages: called by the VM to write out pages associated with the
474 address_space object.
475
476 set_page_dirty: called by the VM to set a page dirty.
477
478 readpages: called by the VM to read pages associated with the address_space
479 object.
333 480
481 prepare_write: called by the generic write path in VM to set up a write
482 request for a page.
334 483
335struct file_operations <section> 484 commit_write: called by the generic write path in VM to write page to
485 its backing store.
486
487 bmap: called by the VFS to map a logical block offset within object to
488 physical block number. This method is used for the legacy FIBMAP
489 ioctl. Other uses are discouraged.
490
491 invalidatepage: called by the VM on truncate to disassociate a page from its
492 address_space mapping.
493
494 releasepage: called by the VFS to release filesystem specific metadata from
495 a page.
496
497 direct_IO: called by the VM for direct I/O writes and reads.
498
499 get_xip_page: called by the VM to translate a block number to a page.
500 The page is valid until the corresponding filesystem is unmounted.
501 Filesystems that want to use execute-in-place (XIP) need to implement
502 it. An example implementation can be found in fs/ext2/xip.c.
503
504
505struct file_operations
336====================== 506======================
337 507
338This describes how the VFS can manipulate an open file. As of kernel 508This describes how the VFS can manipulate an open file. As of kernel
3392.1.99, the following members are defined: 5092.6.13, the following members are defined:
340 510
341struct file_operations { 511struct file_operations {
342 loff_t (*llseek) (struct file *, loff_t, int); 512 loff_t (*llseek) (struct file *, loff_t, int);
343 ssize_t (*read) (struct file *, char *, size_t, loff_t *); 513 ssize_t (*read) (struct file *, char __user *, size_t, loff_t *);
344 ssize_t (*write) (struct file *, const char *, size_t, loff_t *); 514 ssize_t (*aio_read) (struct kiocb *, char __user *, size_t, loff_t);
515 ssize_t (*write) (struct file *, const char __user *, size_t, loff_t *);
516 ssize_t (*aio_write) (struct kiocb *, const char __user *, size_t, loff_t);
345 int (*readdir) (struct file *, void *, filldir_t); 517 int (*readdir) (struct file *, void *, filldir_t);
346 unsigned int (*poll) (struct file *, struct poll_table_struct *); 518 unsigned int (*poll) (struct file *, struct poll_table_struct *);
347 int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long); 519 int (*ioctl) (struct inode *, struct file *, unsigned int, unsigned long);
520 long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
521 long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
348 int (*mmap) (struct file *, struct vm_area_struct *); 522 int (*mmap) (struct file *, struct vm_area_struct *);
349 int (*open) (struct inode *, struct file *); 523 int (*open) (struct inode *, struct file *);
524 int (*flush) (struct file *);
350 int (*release) (struct inode *, struct file *); 525 int (*release) (struct inode *, struct file *);
351 int (*fsync) (struct file *, struct dentry *); 526 int (*fsync) (struct file *, struct dentry *, int datasync);
352 int (*fasync) (struct file *, int); 527 int (*aio_fsync) (struct kiocb *, int datasync);
353 int (*check_media_change) (kdev_t dev); 528 int (*fasync) (int, struct file *, int);
354 int (*revalidate) (kdev_t dev);
355 int (*lock) (struct file *, int, struct file_lock *); 529 int (*lock) (struct file *, int, struct file_lock *);
530 ssize_t (*readv) (struct file *, const struct iovec *, unsigned long, loff_t *);
531 ssize_t (*writev) (struct file *, const struct iovec *, unsigned long, loff_t *);
532 ssize_t (*sendfile) (struct file *, loff_t *, size_t, read_actor_t, void *);
533 ssize_t (*sendpage) (struct file *, struct page *, int, size_t, loff_t *, int);
534 unsigned long (*get_unmapped_area)(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
535 int (*check_flags)(int);
536 int (*dir_notify)(struct file *filp, unsigned long arg);
537 int (*flock) (struct file *, int, struct file_lock *);
356}; 538};
357 539
358Again, all methods are called without any locks being held, unless 540Again, all methods are called without any locks being held, unless
@@ -362,8 +544,12 @@ otherwise noted.
362 544
363 read: called by read(2) and related system calls 545 read: called by read(2) and related system calls
364 546
547 aio_read: called by io_submit(2) and other asynchronous I/O operations
548
365 write: called by write(2) and related system calls 549 write: called by write(2) and related system calls
366 550
551 aio_write: called by io_submit(2) and other asynchronous I/O operations
552
367 readdir: called when the VFS needs to read the directory contents 553 readdir: called when the VFS needs to read the directory contents
368 554
369 poll: called by the VFS when a process wants to check if there is 555 poll: called by the VFS when a process wants to check if there is
@@ -372,18 +558,25 @@ otherwise noted.
372 558
373 ioctl: called by the ioctl(2) system call 559 ioctl: called by the ioctl(2) system call
374 560
561 unlocked_ioctl: called by the ioctl(2) system call. Filesystems that do not
562 require the BKL should use this method instead of the ioctl() above.
563
564 compat_ioctl: called by the ioctl(2) system call when 32 bit system calls
565 are used on 64 bit kernels.
566
375 mmap: called by the mmap(2) system call 567 mmap: called by the mmap(2) system call
376 568
377 open: called by the VFS when an inode should be opened. When the VFS 569 open: called by the VFS when an inode should be opened. When the VFS
378 opens a file, it creates a new "struct file" and initialises 570 opens a file, it creates a new "struct file". It then calls the
379 the "f_op" file operations member with the "default_file_ops" 571 open method for the newly allocated file structure. You might
380 field in the inode structure. It then calls the open method 572 think that the open method really belongs in
381 for the newly allocated file structure. You might think that 573 "struct inode_operations", and you may be right. I think it's
382 the open method really belongs in "struct inode_operations", 574 done the way it is because it makes filesystems simpler to
383 and you may be right. I think it's done the way it is because 575 implement. The open() method is a good place to initialize the
384 it makes filesystems simpler to implement. The open() method 576 "private_data" member in the file structure if you want to point
385 is a good place to initialise the "private_data" member in the 577 to a device structure
386 file structure if you want to point to a device structure 578
579 flush: called by the close(2) system call to flush a file
387 580
388 release: called when the last reference to an open file is closed 581 release: called when the last reference to an open file is closed
389 582
@@ -392,6 +585,23 @@ otherwise noted.
392 fasync: called by the fcntl(2) system call when asynchronous 585 fasync: called by the fcntl(2) system call when asynchronous
393 (non-blocking) mode is enabled for a file 586 (non-blocking) mode is enabled for a file
394 587
588 lock: called by the fcntl(2) system call for F_GETLK, F_SETLK, and F_SETLKW
589 commands
590
591 readv: called by the readv(2) system call
592
593 writev: called by the writev(2) system call
594
595 sendfile: called by the sendfile(2) system call
596
597 get_unmapped_area: called by the mmap(2) system call
598
599 check_flags: called by the fcntl(2) system call for F_SETFL command
600
601 dir_notify: called by the fcntl(2) system call for F_NOTIFY command
602
603 flock: called by the flock(2) system call
604
395Note that the file operations are implemented by the specific 605Note that the file operations are implemented by the specific
396filesystem in which the inode resides. When opening a device node 606filesystem in which the inode resides. When opening a device node
397(character or block special) most filesystems will call special 607(character or block special) most filesystems will call special
@@ -400,29 +610,28 @@ driver information. These support routines replace the filesystem file
400operations with those for the device driver, and then proceed to call 610operations with those for the device driver, and then proceed to call
401the new open() method for the file. This is how opening a device file 611the new open() method for the file. This is how opening a device file
402in the filesystem eventually ends up calling the device driver open() 612in the filesystem eventually ends up calling the device driver open()
403method. Note the devfs (the Device FileSystem) has a more direct path 613method.
404from device node to device driver (this is an unofficial kernel
405patch).
406 614
407 615
408Directory Entry Cache (dcache) <section> 616Directory Entry Cache (dcache)
409------------------------------ 617==============================
618
410 619
411struct dentry_operations 620struct dentry_operations
412======================== 621------------------------
413 622
414This describes how a filesystem can overload the standard dentry 623This describes how a filesystem can overload the standard dentry
415operations. Dentries and the dcache are the domain of the VFS and the 624operations. Dentries and the dcache are the domain of the VFS and the
416individual filesystem implementations. Device drivers have no business 625individual filesystem implementations. Device drivers have no business
417here. These methods may be set to NULL, as they are either optional or 626here. These methods may be set to NULL, as they are either optional or
418the VFS uses a default. As of kernel 2.1.99, the following members are 627the VFS uses a default. As of kernel 2.6.13, the following members are
419defined: 628defined:
420 629
421struct dentry_operations { 630struct dentry_operations {
422 int (*d_revalidate)(struct dentry *); 631 int (*d_revalidate)(struct dentry *, struct nameidata *);
423 int (*d_hash) (struct dentry *, struct qstr *); 632 int (*d_hash) (struct dentry *, struct qstr *);
424 int (*d_compare) (struct dentry *, struct qstr *, struct qstr *); 633 int (*d_compare) (struct dentry *, struct qstr *, struct qstr *);
425 void (*d_delete)(struct dentry *); 634 int (*d_delete)(struct dentry *);
426 void (*d_release)(struct dentry *); 635 void (*d_release)(struct dentry *);
427 void (*d_iput)(struct dentry *, struct inode *); 636 void (*d_iput)(struct dentry *, struct inode *);
428}; 637};
@@ -451,6 +660,7 @@ Each dentry has a pointer to its parent dentry, as well as a hash list
451of child dentries. Child dentries are basically like files in a 660of child dentries. Child dentries are basically like files in a
452directory. 661directory.
453 662
663
454Directory Entry Cache APIs 664Directory Entry Cache APIs
455-------------------------- 665--------------------------
456 666
@@ -471,7 +681,7 @@ manipulate dentries:
471 "d_delete" method is called 681 "d_delete" method is called
472 682
473 d_drop: this unhashes a dentry from its parents hash list. A 683 d_drop: this unhashes a dentry from its parents hash list. A
474 subsequent call to dput() will dellocate the dentry if its 684 subsequent call to dput() will deallocate the dentry if its
475 usage count drops to 0 685 usage count drops to 0
476 686
477 d_delete: delete a dentry. If there are no other open references to 687 d_delete: delete a dentry. If there are no other open references to
@@ -507,16 +717,16 @@ up by walking the tree starting with the first component
507of the pathname and using that dentry along with the next 717of the pathname and using that dentry along with the next
508component to look up the next level and so on. Since it 718component to look up the next level and so on. Since it
509is a frequent operation for workloads like multiuser 719is a frequent operation for workloads like multiuser
510environments and webservers, it is important to optimize 720environments and web servers, it is important to optimize
511this path. 721this path.
512 722
513Prior to 2.5.10, dcache_lock was acquired in d_lookup and thus 723Prior to 2.5.10, dcache_lock was acquired in d_lookup and thus
514in every component during path look-up. Since 2.5.10 onwards, 724in every component during path look-up. Since 2.5.10 onwards,
515fastwalk algorithm changed this by holding the dcache_lock 725fast-walk algorithm changed this by holding the dcache_lock
516at the beginning and walking as many cached path component 726at the beginning and walking as many cached path component
517dentries as possible. This signficantly decreases the number 727dentries as possible. This significantly decreases the number
518of acquisition of dcache_lock. However it also increases the 728of acquisition of dcache_lock. However it also increases the
519lock hold time signficantly and affects performance in large 729lock hold time significantly and affects performance in large
520SMP machines. Since 2.5.62 kernel, dcache has been using 730SMP machines. Since 2.5.62 kernel, dcache has been using
521a new locking model that uses RCU to make dcache look-up 731a new locking model that uses RCU to make dcache look-up
522lock-free. 732lock-free.
@@ -527,7 +737,7 @@ protected the hash chain, d_child, d_alias, d_lru lists as well
527as d_inode and several other things like mount look-up. RCU-based 737as d_inode and several other things like mount look-up. RCU-based
528changes affect only the way the hash chain is protected. For everything 738changes affect only the way the hash chain is protected. For everything
529else the dcache_lock must be taken for both traversing as well as 739else the dcache_lock must be taken for both traversing as well as
530updating. The hash chain updations too take the dcache_lock. 740updating. The hash chain updates too take the dcache_lock.
531The significant change is the way d_lookup traverses the hash chain, 741The significant change is the way d_lookup traverses the hash chain,
532it doesn't acquire the dcache_lock for this and rely on RCU to 742it doesn't acquire the dcache_lock for this and rely on RCU to
533ensure that the dentry has not been *freed*. 743ensure that the dentry has not been *freed*.
@@ -535,14 +745,15 @@ ensure that the dentry has not been *freed*.
535 745
536Dcache locking details 746Dcache locking details
537---------------------- 747----------------------
748
538For many multi-user workloads, open() and stat() on files are 749For many multi-user workloads, open() and stat() on files are
539very frequently occurring operations. Both involve walking 750very frequently occurring operations. Both involve walking
540of path names to find the dentry corresponding to the 751of path names to find the dentry corresponding to the
541concerned file. In 2.4 kernel, dcache_lock was held 752concerned file. In 2.4 kernel, dcache_lock was held
542during look-up of each path component. Contention and 753during look-up of each path component. Contention and
543cacheline bouncing of this global lock caused significant 754cache-line bouncing of this global lock caused significant
544scalability problems. With the introduction of RCU 755scalability problems. With the introduction of RCU
545in linux kernel, this was worked around by making 756in Linux kernel, this was worked around by making
546the look-up of path components during path walking lock-free. 757the look-up of path components during path walking lock-free.
547 758
548 759
@@ -562,7 +773,7 @@ Some of the important changes are :
5622. Insertion of a dentry into the hash table is done using 7732. Insertion of a dentry into the hash table is done using
563 hlist_add_head_rcu() which take care of ordering the writes - 774 hlist_add_head_rcu() which take care of ordering the writes -
564 the writes to the dentry must be visible before the dentry 775 the writes to the dentry must be visible before the dentry
565 is inserted. This works in conjuction with hlist_for_each_rcu() 776 is inserted. This works in conjunction with hlist_for_each_rcu()
566 while walking the hash chain. The only requirement is that 777 while walking the hash chain. The only requirement is that
567 all initialization to the dentry must be done before hlist_add_head_rcu() 778 all initialization to the dentry must be done before hlist_add_head_rcu()
568 since we don't have dcache_lock protection while traversing 779 since we don't have dcache_lock protection while traversing
@@ -584,7 +795,7 @@ Some of the important changes are :
584 the same. In some sense, dcache_rcu path walking looks like 795 the same. In some sense, dcache_rcu path walking looks like
585 the pre-2.5.10 version. 796 the pre-2.5.10 version.
586 797
5875. All dentry hash chain updations must take the dcache_lock as well as 7985. All dentry hash chain updates must take the dcache_lock as well as
588 the per-dentry lock in that order. dput() does this to ensure 799 the per-dentry lock in that order. dput() does this to ensure
589 that a dentry that has just been looked up in another CPU 800 that a dentry that has just been looked up in another CPU
590 doesn't get deleted before dget() can be done on it. 801 doesn't get deleted before dget() can be done on it.
@@ -640,10 +851,10 @@ handled as described below :
640 Since we redo the d_parent check and compare name while holding 851 Since we redo the d_parent check and compare name while holding
641 d_lock, lock-free look-up will not race against d_move(). 852 d_lock, lock-free look-up will not race against d_move().
642 853
6434. There can be a theoritical race when a dentry keeps coming back 8544. There can be a theoretical race when a dentry keeps coming back
644 to original bucket due to double moves. Due to this look-up may 855 to original bucket due to double moves. Due to this look-up may
645 consider that it has never moved and can end up in a infinite loop. 856 consider that it has never moved and can end up in a infinite loop.
646 But this is not any worse that theoritical livelocks we already 857 But this is not any worse that theoretical livelocks we already
647 have in the kernel. 858 have in the kernel.
648 859
649 860
diff --git a/Documentation/hwmon/lm78 b/Documentation/hwmon/lm78
index 357086ed7f64..fd5dc7a19f0e 100644
--- a/Documentation/hwmon/lm78
+++ b/Documentation/hwmon/lm78
@@ -2,16 +2,11 @@ Kernel driver lm78
2================== 2==================
3 3
4Supported chips: 4Supported chips:
5 * National Semiconductor LM78 5 * National Semiconductor LM78 / LM78-J
6 Prefix: 'lm78' 6 Prefix: 'lm78'
7 Addresses scanned: I2C 0x20 - 0x2f, ISA 0x290 (8 I/O ports) 7 Addresses scanned: I2C 0x20 - 0x2f, ISA 0x290 (8 I/O ports)
8 Datasheet: Publicly available at the National Semiconductor website 8 Datasheet: Publicly available at the National Semiconductor website
9 http://www.national.com/ 9 http://www.national.com/
10 * National Semiconductor LM78-J
11 Prefix: 'lm78-j'
12 Addresses scanned: I2C 0x20 - 0x2f, ISA 0x290 (8 I/O ports)
13 Datasheet: Publicly available at the National Semiconductor website
14 http://www.national.com/
15 * National Semiconductor LM79 10 * National Semiconductor LM79
16 Prefix: 'lm79' 11 Prefix: 'lm79'
17 Addresses scanned: I2C 0x20 - 0x2f, ISA 0x290 (8 I/O ports) 12 Addresses scanned: I2C 0x20 - 0x2f, ISA 0x290 (8 I/O ports)
diff --git a/Documentation/hwmon/w83792d b/Documentation/hwmon/w83792d
new file mode 100644
index 000000000000..8171c285bb55
--- /dev/null
+++ b/Documentation/hwmon/w83792d
@@ -0,0 +1,174 @@
1Kernel driver w83792d
2=====================
3
4Supported chips:
5 * Winbond W83792D
6 Prefix: 'w83792d'
7 Addresses scanned: I2C 0x2c - 0x2f
8 Datasheet: http://www.winbond.com.tw/E-WINBONDHTM/partner/PDFresult.asp?Pname=1035
9
10Author: Chunhao Huang
11Contact: DZShen <DZShen@Winbond.com.tw>
12
13
14Module Parameters
15-----------------
16
17* init int
18 (default 1)
19 Use 'init=0' to bypass initializing the chip.
20 Try this if your computer crashes when you load the module.
21
22* force_subclients=bus,caddr,saddr,saddr
23 This is used to force the i2c addresses for subclients of
24 a certain chip. Example usage is `force_subclients=0,0x2f,0x4a,0x4b'
25 to force the subclients of chip 0x2f on bus 0 to i2c addresses
26 0x4a and 0x4b.
27
28
29Description
30-----------
31
32This driver implements support for the Winbond W83792AD/D.
33
34Detection of the chip can sometimes be foiled because it can be in an
35internal state that allows no clean access (Bank with ID register is not
36currently selected). If you know the address of the chip, use a 'force'
37parameter; this will put it into a more well-behaved state first.
38
39The driver implements three temperature sensors, seven fan rotation speed
40sensors, nine voltage sensors, and two automatic fan regulation
41strategies called: Smart Fan I (Thermal Cruise mode) and Smart Fan II.
42Automatic fan control mode is possible only for fan1-fan3. Fan4-fan7 can run
43synchronized with a selected fan (fan1-fan3). This functionality and manual PWM
44control for fan4-fan7 are not yet implemented.
45
46Temperatures are measured in degrees Celsius and measurement resolution is 1
47degC for temp1 and 0.5 degC for temp2 and temp3. An alarm is triggered when
48the temperature gets higher than the Overtemperature Shutdown value; it stays
49on until the temperature falls below the Hysteresis value.
50
51Fan rotation speeds are reported in RPM (rotations per minute). An alarm is
52triggered if the rotation speed has dropped below a programmable limit. Fan
53readings can be divided by a programmable divider (1, 2, 4, 8, 16, 32, 64 or
54128) to give the readings more range or accuracy.
55
56Voltage sensors (also known as IN sensors) report their values in millivolts.
57An alarm is triggered if the voltage has crossed a programmable minimum
58or maximum limit.
59
60Alarms are provided as output from the "realtime status register". The
61following bits are defined:
62
63bit - alarm on:
640 - in0
651 - in1
662 - temp1
673 - temp2
684 - temp3
695 - fan1
706 - fan2
717 - fan3
728 - in2
739 - in3
7410 - in4
7511 - in5
7612 - in6
7713 - VID change
7814 - chassis
7915 - fan7
8016 - tart1
8117 - tart2
8218 - tart3
8319 - in7
8420 - in8
8521 - fan4
8622 - fan5
8723 - fan6
88
89Tart is asserted when the target temperature cannot be reached after 3 minutes
90of full-speed rotation of the corresponding fan.
91
92In addition to the alarms described above, there is a CHAS alarm on the chips
93which triggers if your computer case is open (This one is latched, contrary
94to realtime alarms).
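
The bit table above can be turned into a small decoder. The following is a
minimal sketch, assuming the alarm value has already been read as a 24-bit
mask; decode_alarms() and w83792d_alarm_names are hypothetical names used
only for illustration and are not part of the driver.

```c
#include <assert.h>
#include <stdio.h>

/* Alarm names indexed by bit number, copied from the table above. */
static const char *const w83792d_alarm_names[24] = {
	"in0", "in1", "temp1", "temp2", "temp3", "fan1", "fan2", "fan3",
	"in2", "in3", "in4", "in5", "in6", "VID change", "chassis", "fan7",
	"tart1", "tart2", "tart3", "in7", "in8", "fan4", "fan5", "fan6",
};

/*
 * Hypothetical helper: write a comma-separated list of the alarms set
 * in 'alarms' into 'buf' and return how many names were written in full.
 */
static int decode_alarms(unsigned long alarms, char *buf, size_t len)
{
	size_t used = 0;
	int count = 0;
	int bit;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (bit = 0; bit < 24; bit++) {
		int n;

		if (!(alarms & (1UL << bit)))
			continue;

		n = snprintf(buf + used, len - used, "%s%s",
			     count ? "," : "", w83792d_alarm_names[bit]);
		/* Stop if the buffer is full; the last name may be cut short. */
		if (n < 0 || (size_t)n >= len - used)
			break;

		used += (size_t)n;
		count++;
	}
	return count;
}
```

Since the chassis alarm (bit 14) is latched while the others are realtime, a
caller may want to report it separately from the rest of the list.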
95
96The chips only update values every 3 seconds; reading them more often will
97do no harm, but will return 'old' values.
98
99
100W83792D PROBLEMS
101----------------
102Known problems:
103 - This driver is only for the C version of the Winbond W83792D; some
104 motherboards carry the B version instead. The method used to
105 calculate in6-in7 (measured value, limits) differs slightly between
106 the C and B versions. The version can be identified
107 by CR[0x49h].
108 - The function of vid and vrm has not been finished, because I'm NOT
109 very familiar with them. Adding support is welcome.
110 - The function of chassis open detection needs more tests.
111 - If you have an ASUS server board and the chip was not found, you will
112 need to upgrade to the latest (or beta) BIOS. If that does not help, please
113 contact us.
114
115Fan control
116-----------
117
118Manual mode
119-----------
120
121Works as expected. You just need to specify the desired PWM/DC value (fan speed)
122in the appropriate pwm# file.
123
124Thermal cruise
125--------------
126
127In this mode, the W83792D provides the Smart Fan system to automatically control
128fan speed to keep the temperatures of the CPU and the system within a specific
129range. First, a target temperature and interval must be set. The target is set
130via the thermal_cruise# file. The tolerance# file defines the T +- tolerance
131interval. The fan speed will be lowered as long as the current temperature
132remains below the thermal_cruise# +- tolerance# interval. Once the temperature
133exceeds the high limit (T+tolerance), the fan will be turned on at the
134speed set by pwm#, and its PWM duty cycle will then be controlled automatically
135as the temperature varies. Three conditions may occur:
136
137(1) If the temperature still exceeds the high limit, PWM duty
138cycle will increase slowly.
139
140(2) If the temperature goes below the high limit, but still above the low
141limit (T-tolerance), the fan speed will be fixed at the current speed because
142the temperature is in the target range.
143
144(3) If the temperature goes below the low limit, PWM duty cycle will decrease
145slowly to 0 or a preset stop value until the temperature exceeds the low
146limit. (The preset stop value handling is not yet implemented in driver)
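
The three conditions above can be sketched as a single decision function.
This is only an illustration of the behaviour described in the text, not
code from the chip or the driver: the names are hypothetical, and treating
a temperature exactly equal to a limit as "in range" is an assumption,
since the text does not say what happens at the boundaries.

```c
/* What the chip does to the PWM duty cycle in Thermal Cruise mode. */
enum cruise_action {
	CRUISE_INCREASE,	/* (1) above the high limit: raise duty cycle slowly */
	CRUISE_HOLD,		/* (2) within T +- tolerance: keep current speed     */
	CRUISE_DECREASE,	/* (3) below the low limit: lower duty cycle slowly  */
};

/*
 * Hypothetical helper: 'temp', 'target' and 'tolerance' are in degC.
 * 'target' corresponds to thermal_cruise# and 'tolerance' to tolerance#.
 */
static enum cruise_action thermal_cruise_step(int temp, int target,
					      int tolerance)
{
	if (temp > target + tolerance)
		return CRUISE_INCREASE;
	if (temp < target - tolerance)
		return CRUISE_DECREASE;
	/* Assumption: the limits themselves count as in range. */
	return CRUISE_HOLD;
}
```

For example, with thermal_cruise# set to 50 and tolerance# set to 3, the
duty cycle would be held for any reading from 47 to 53 degC.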
147
148Smart Fan II
149------------
150
151The W83792D also provides a special mode for fans. Four temperature points are
152available. When the related temperature sensor detects a temperature in a preset
153temperature region (sf2_point@_fan# +- tolerance#), it causes the fan to run
154at the programmed value from sf2_level@_fan#. You need to set four temperatures
155for each fan.
156
157
158/sys files
159----------
160
161pwm[1-3] - this file stores PWM duty cycle or DC value (fan speed) in range:
162 0 (stop) to 255 (full)
163pwm[1-3]_enable - this file controls mode of fan/temperature control:
164 * 0 Disabled
165 * 1 Manual mode
166 * 2 Smart Fan II
167 * 3 Thermal Cruise
168pwm[1-3]_mode - Select PWM or DC mode
169 * 0 DC
170 * 1 PWM
171thermal_cruise[1-3] - Selects the desired temperature for cruise (degC)
172tolerance[1-3] - Value in degrees Celsius (degC) for +- T
173sf2_point[1-4]_fan[1-3] - four temperature points for each fan for Smart Fan II
174sf2_level[1-3]_fan[1-3] - three PWM/DC levels for each fan for Smart Fan II
diff --git a/Documentation/i2c/chips/max6875 b/Documentation/i2c/chips/max6875
index b02002898a09..96fec562a8e9 100644
--- a/Documentation/i2c/chips/max6875
+++ b/Documentation/i2c/chips/max6875
@@ -4,22 +4,13 @@ Kernel driver max6875
4Supported chips: 4Supported chips:
5 * Maxim MAX6874, MAX6875 5 * Maxim MAX6874, MAX6875
6 Prefix: 'max6875' 6 Prefix: 'max6875'
7 Addresses scanned: 0x50, 0x52 7 Addresses scanned: None (see below)
8 Datasheet: 8 Datasheet:
9 http://pdfserv.maxim-ic.com/en/ds/MAX6874-MAX6875.pdf 9 http://pdfserv.maxim-ic.com/en/ds/MAX6874-MAX6875.pdf
10 10
11Author: Ben Gardner <bgardner@wabtec.com> 11Author: Ben Gardner <bgardner@wabtec.com>
12 12
13 13
14Module Parameters
15-----------------
16
17* allow_write int
18 Set to non-zero to enable write permission:
19 *0: Read only
20 1: Read and write
21
22
23Description 14Description
24----------- 15-----------
25 16
@@ -33,34 +24,85 @@ registers.
33 24
34The Maxim MAX6874 is a similar, mostly compatible device, with more intputs 25The Maxim MAX6874 is a similar, mostly compatible device, with more intputs
35and outputs: 26and outputs:
36
37 vin gpi vout 27 vin gpi vout
38MAX6874 6 4 8 28MAX6874 6 4 8
39MAX6875 4 3 5 29MAX6875 4 3 5
40 30
41MAX6874 chips can have four different addresses (as opposed to only two for 31See the datasheet for more information.
42the MAX6875). The additional addresses (0x54 and 0x56) are not probed by
43this driver by default, but the probe module parameter can be used if
44needed.
45
46See the datasheet for details on how to program the EEPROM.
47 32
48 33
49Sysfs entries 34Sysfs entries
50------------- 35-------------
51 36
52eeprom_user - 512 bytes of user-defined EEPROM space. Only writable if 37eeprom - 512 bytes of user-defined EEPROM space.
53 allow_write was set and register 0x43 is 0.
54
55eeprom_config - 70 bytes of config EEPROM. Note that changes will not get
56 loaded into register space until a power cycle or device reset.
57
58reg_config - 70 bytes of register space. Any changes take affect immediately.
59 38
60 39
61General Remarks 40General Remarks
62--------------- 41---------------
63 42
64A typical application will require that the EEPROMs be programmed once and 43Valid addresses for the MAX6875 are 0x50 and 0x52.
65never altered afterwards. 44Valid addresses for the MAX6874 are 0x50, 0x52, 0x54 and 0x56.
45The driver does not probe any address, so you must force the address.
46
47Example:
48$ modprobe max6875 force=0,0x50
49
50The MAX6874/MAX6875 ignores address bit 0, so this driver attaches to multiple
51addresses. For example, for address 0x50, it also reserves 0x51.
52The even-address instance is called 'max6875', the odd one is 'max6875 subclient'.
53
54
55Programming the chip using i2c-dev
56----------------------------------
57
58Use the i2c-dev interface to access and program the chips.
59Reads and writes are performed differently depending on the address range.
60
61The configuration registers are at addresses 0x00 - 0x45.
62Use i2c_smbus_write_byte_data() to write a register and
63i2c_smbus_read_byte_data() to read a register.
64The command is the register number.
65
66Examples:
67To write a 1 to register 0x45:
68 i2c_smbus_write_byte_data(fd, 0x45, 1);
69
70To read register 0x45:
71 value = i2c_smbus_read_byte_data(fd, 0x45);
72
73
74The configuration EEPROM is at addresses 0x8000 - 0x8045.
75The user EEPROM is at addresses 0x8100 - 0x82ff.
76
77Use i2c_smbus_write_word_data() to write a byte to EEPROM.
78
79The command is the upper byte of the address: 0x80, 0x81, or 0x82.
80The data word is the lower part of the address or'd with data << 8.
81 cmd = address >> 8;
82 val = (address & 0xff) | (data << 8);
83
84Example:
85To write 0x5a to address 0x8003:
86 i2c_smbus_write_word_data(fd, 0x80, 0x5a03);
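
The cmd/val packing above can be wrapped in a small helper. This is a
minimal sketch: struct max6875_eeprom_write and max6875_pack_write() are
hypothetical names introduced here for illustration, and the helper only
computes the two arguments; it does not talk to the bus.

```c
#include <stdint.h>

/* The two values passed to i2c_smbus_write_word_data() after the fd. */
struct max6875_eeprom_write {
	uint8_t  cmd;	/* upper byte of the EEPROM address       */
	uint16_t val;	/* lower address byte or'd with data << 8 */
};

/* Hypothetical helper: pack one EEPROM byte write as described above. */
static struct max6875_eeprom_write max6875_pack_write(uint16_t address,
						      uint8_t data)
{
	struct max6875_eeprom_write w;

	w.cmd = (uint8_t)(address >> 8);
	w.val = (uint16_t)((address & 0xff) | ((uint16_t)data << 8));
	return w;
}
```

For address 0x8003 and data 0x5a this yields cmd 0x80 and val 0x5a03, the
same values as in the example above; the result would then be passed as
i2c_smbus_write_word_data(fd, w.cmd, w.val).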
87
88
89Reading data from the EEPROM is a little more complicated.
90Use i2c_smbus_write_byte_data() to set the read address and then
91i2c_smbus_read_byte() or i2c_smbus_read_i2c_block_data() to read the data.
92
93Example:
94To read data starting at offset 0x8100, first set the address:
95 i2c_smbus_write_byte_data(fd, 0x81, 0x00);
96
97And then read the data:
98 value = i2c_smbus_read_byte(fd);
99
100 or
101
102 count = i2c_smbus_read_i2c_block_data(fd, 0x84, buffer);
103
104The block read should read 16 bytes.
1050x84 is the block read command.
106
107See the datasheet for more details.
66 108
diff --git a/Documentation/i2c/functionality b/Documentation/i2c/functionality
index 8a78a95ae04e..41ffefbdc60c 100644
--- a/Documentation/i2c/functionality
+++ b/Documentation/i2c/functionality
@@ -115,7 +115,7 @@ CHECKING THROUGH /DEV
115If you try to access an adapter from a userspace program, you will have 115If you try to access an adapter from a userspace program, you will have
116to use the /dev interface. You will still have to check whether the 116to use the /dev interface. You will still have to check whether the
117functionality you need is supported, of course. This is done using 117functionality you need is supported, of course. This is done using
118the I2C_FUNCS ioctl. An example, adapted from the lm_sensors i2c_detect 118the I2C_FUNCS ioctl. An example, adapted from the lm_sensors i2cdetect
119program, is below: 119program, is below:
120 120
121 int file; 121 int file;
diff --git a/Documentation/i2c/porting-clients b/Documentation/i2c/porting-clients
index a7adbdd9ea8a..4849dfd6961c 100644
--- a/Documentation/i2c/porting-clients
+++ b/Documentation/i2c/porting-clients
@@ -1,4 +1,4 @@
1Revision 4, 2004-03-30 1Revision 5, 2005-07-29
2Jean Delvare <khali@linux-fr.org> 2Jean Delvare <khali@linux-fr.org>
3Greg KH <greg@kroah.com> 3Greg KH <greg@kroah.com>
4 4
@@ -17,20 +17,22 @@ yours for best results.
17 17
18Technical changes: 18Technical changes:
19 19
20* [Includes] Get rid of "version.h". Replace <linux/i2c-proc.h> with 20* [Includes] Get rid of "version.h" and <linux/i2c-proc.h>.
21 <linux/i2c-sensor.h>. Includes typically look like that: 21 Includes typically look like that:
22 #include <linux/module.h> 22 #include <linux/module.h>
23 #include <linux/init.h> 23 #include <linux/init.h>
24 #include <linux/slab.h> 24 #include <linux/slab.h>
25 #include <linux/i2c.h> 25 #include <linux/i2c.h>
26 #include <linux/i2c-sensor.h> 26 #include <linux/hwmon.h> /* for hardware monitoring drivers */
27 #include <linux/i2c-vid.h> /* if you need VRM support */ 27 #include <linux/hwmon-sysfs.h>
28 #include <linux/hwmon-vid.h> /* if you need VRM support */
28 #include <asm/io.h> /* if you have I/O operations */ 29 #include <asm/io.h> /* if you have I/O operations */
29 Please respect this inclusion order. Some extra headers may be 30 Please respect this inclusion order. Some extra headers may be
30 required for a given driver (e.g. "lm75.h"). 31 required for a given driver (e.g. "lm75.h").
31 32
32* [Addresses] SENSORS_I2C_END becomes I2C_CLIENT_END, SENSORS_ISA_END 33* [Addresses] SENSORS_I2C_END becomes I2C_CLIENT_END, ISA addresses
33 becomes I2C_CLIENT_ISA_END. 34 are no more handled by the i2c core.
35 SENSORS_INSMOD_<n> becomes I2C_CLIENT_INSMOD_<n>.
34 36
35* [Client data] Get rid of sysctl_id. Try using standard names for 37* [Client data] Get rid of sysctl_id. Try using standard names for
36 register values (for example, temp_os becomes temp_max). You're 38 register values (for example, temp_os becomes temp_max). You're
@@ -66,13 +68,15 @@ Technical changes:
66 if (!(adapter->class & I2C_CLASS_HWMON)) 68 if (!(adapter->class & I2C_CLASS_HWMON))
67 return 0; 69 return 0;
68 ISA-only drivers of course don't need this. 70 ISA-only drivers of course don't need this.
71 Call i2c_probe() instead of i2c_detect().
69 72
70* [Detect] As mentioned earlier, the flags parameter is gone. 73* [Detect] As mentioned earlier, the flags parameter is gone.
71 The type_name and client_name strings are replaced by a single 74 The type_name and client_name strings are replaced by a single
72 name string, which will be filled with a lowercase, short string 75 name string, which will be filled with a lowercase, short string
73 (typically the driver name, e.g. "lm75"). 76 (typically the driver name, e.g. "lm75").
74 In i2c-only drivers, drop the i2c_is_isa_adapter check, it's 77 In i2c-only drivers, drop the i2c_is_isa_adapter check, it's
75 useless. 78 useless. Same for isa-only drivers, as the test would always be
79 true. Only hybrid drivers (which are quite rare) still need it.
76 The errorN labels are reduced to the number needed. If that number 80 The errorN labels are reduced to the number needed. If that number
77 is 2 (i2c-only drivers), it is advised that the labels are named 81 is 2 (i2c-only drivers), it is advised that the labels are named
78 exit and exit_free. For i2c+isa drivers, labels should be named 82 exit and exit_free. For i2c+isa drivers, labels should be named
@@ -86,6 +90,8 @@ Technical changes:
86 device_create_file. Move the driver initialization before any 90 device_create_file. Move the driver initialization before any
87 sysfs file creation. 91 sysfs file creation.
88 Drop client->id. 92 Drop client->id.
93 Drop any 24RF08 corruption prevention you find, as this is now done
94 at the i2c-core level, and doing it twice voids it.
89 95
90* [Init] Limits must not be set by the driver (can be done later in 96* [Init] Limits must not be set by the driver (can be done later in
91 user-space). Chip should not be reset by default (although a module 97 user-space). Chip should not be reset by default (although a module
@@ -93,7 +99,8 @@ Technical changes:
93 limited to the strictly necessary steps. 99 limited to the strictly necessary steps.
94 100
95* [Detach] Get rid of data, remove the call to 101* [Detach] Get rid of data, remove the call to
96 i2c_deregister_entry. 102 i2c_deregister_entry. Do not log an error message if
103 i2c_detach_client fails, as i2c-core will now do it for you.
97 104
98* [Update] Don't access client->data directly, use 105* [Update] Don't access client->data directly, use
99 i2c_get_clientdata(client) instead. 106 i2c_get_clientdata(client) instead.
diff --git a/Documentation/i2c/writing-clients b/Documentation/i2c/writing-clients
index 91664be91ffc..077275722a7c 100644
--- a/Documentation/i2c/writing-clients
+++ b/Documentation/i2c/writing-clients
@@ -148,15 +148,15 @@ are defined in i2c.h to help you support them, as well as a generic
148detection algorithm. 148detection algorithm.
149 149
150You do not have to use this parameter interface; but don't try to use 150You do not have to use this parameter interface; but don't try to use
151function i2c_probe() (or i2c_detect()) if you don't. 151function i2c_probe() if you don't.
152 152
153NOTE: If you want to write a `sensors' driver, the interface is slightly 153NOTE: If you want to write a `sensors' driver, the interface is slightly
154 different! See below. 154 different! See below.
155 155
156 156
157 157
158Probing classes (i2c) 158Probing classes
159--------------------- 159---------------
160 160
161All parameters are given as lists of unsigned 16-bit integers. Lists are 161All parameters are given as lists of unsigned 16-bit integers. Lists are
162terminated by I2C_CLIENT_END. 162terminated by I2C_CLIENT_END.
@@ -171,12 +171,18 @@ The following lists are used internally:
171 ignore: insmod parameter. 171 ignore: insmod parameter.
172 A list of pairs. The first value is a bus number (-1 for any I2C bus), 172 A list of pairs. The first value is a bus number (-1 for any I2C bus),
173 the second is the I2C address. These addresses are never probed. 173 the second is the I2C address. These addresses are never probed.
174 This parameter overrules 'normal' and 'probe', but not the 'force' lists. 174 This parameter overrules the 'normal_i2c' list only.
175 force: insmod parameter. 175 force: insmod parameter.
176 A list of pairs. The first value is a bus number (-1 for any I2C bus), 176 A list of pairs. The first value is a bus number (-1 for any I2C bus),
177 the second is the I2C address. A device is blindly assumed to be on 177 the second is the I2C address. A device is blindly assumed to be on
178 the given address, no probing is done. 178 the given address, no probing is done.
179 179
180Additionally, kind-specific force lists may optionally be defined if
181the driver supports several chip kinds. They are grouped in a
182NULL-terminated list of pointers named forces, whose first element is the
183generic force list mentioned above. Each additional list corresponds to an
184insmod parameter of the form force_<kind>.
185
180Fortunately, as a module writer, you just have to define the `normal_i2c' 186Fortunately, as a module writer, you just have to define the `normal_i2c'
181parameter. The complete declaration could look like this: 187parameter. The complete declaration could look like this:
182 188
@@ -186,66 +192,17 @@ parameter. The complete declaration could look like this:
186 192
187 /* Magic definition of all other variables and things */ 193 /* Magic definition of all other variables and things */
188 I2C_CLIENT_INSMOD; 194 I2C_CLIENT_INSMOD;
195 /* Or, if your driver supports, say, 2 kinds of devices: */
196 I2C_CLIENT_INSMOD_2(foo, bar);
197
198If you use the multi-kind form, an enum will be defined for you:
199 enum chips { any_chip, foo, bar, ... }
200You can then (and certainly should) use it in the driver code.
189 201
190Note that you *have* to call the defined variable `normal_i2c', 202Note that you *have* to call the defined variable `normal_i2c',
191without any prefix! 203without any prefix!
192 204
193 205
194Probing classes (sensors)
195-------------------------
196
197If you write a `sensors' driver, you use a slightly different interface.
198As well as I2C addresses, we have to cope with ISA addresses. Also, we
199use a enum of chip types. Don't forget to include `sensors.h'.
200
201The following lists are used internally. They are all lists of integers.
202
203 normal_i2c: filled in by the module writer. Terminated by SENSORS_I2C_END.
204 A list of I2C addresses which should normally be examined.
205 normal_isa: filled in by the module writer. Terminated by SENSORS_ISA_END.
206 A list of ISA addresses which should normally be examined.
207 probe: insmod parameter. Initialize this list with SENSORS_I2C_END values.
208 A list of pairs. The first value is a bus number (SENSORS_ISA_BUS for
209 the ISA bus, -1 for any I2C bus), the second is the address. These
210 addresses are also probed, as if they were in the 'normal' list.
211 ignore: insmod parameter. Initialize this list with SENSORS_I2C_END values.
212 A list of pairs. The first value is a bus number (SENSORS_ISA_BUS for
213 the ISA bus, -1 for any I2C bus), the second is the I2C address. These
214 addresses are never probed. This parameter overrules 'normal' and
215 'probe', but not the 'force' lists.
216
217Also used is a list of pointers to sensors_force_data structures:
218 force_data: insmod parameters. A list, ending with an element of which
219 the force field is NULL.
220 Each element contains the type of chip and a list of pairs.
221 The first value is a bus number (SENSORS_ISA_BUS for the ISA bus,
222 -1 for any I2C bus), the second is the address.
223 These are automatically translated to insmod variables of the form
224 force_foo.
225
226So we have a generic insmod variabled `force', and chip-specific variables
227`force_CHIPNAME'.
228
229Fortunately, as a module writer, you just have to define the `normal_i2c'
230and `normal_isa' parameters, and define what chip names are used.
231The complete declaration could look like this:
232 /* Scan i2c addresses 0x37, and 0x48 to 0x4f */
233 static unsigned short normal_i2c[] = { 0x37, 0x48, 0x49, 0x4a, 0x4b, 0x4c,
234 0x4d, 0x4e, 0x4f, I2C_CLIENT_END };
235 /* Scan ISA address 0x290 */
236 static unsigned int normal_isa[] = {0x0290,SENSORS_ISA_END};
237
238 /* Define chips foo and bar, as well as all module parameters and things */
239 SENSORS_INSMOD_2(foo,bar);
240
241If you have one chip, you use macro SENSORS_INSMOD_1(chip), if you have 2
242you use macro SENSORS_INSMOD_2(chip1,chip2), etc. If you do not want to
243bother with chip types, you can use SENSORS_INSMOD_0.
244
245A enum is automatically defined as follows:
246 enum chips { any_chip, chip1, chip2, ... }
247
248
249Attaching to an adapter 206Attaching to an adapter
250----------------------- 207-----------------------
251 208
@@ -264,17 +221,10 @@ detected at a specific address, another callback is called.
264 return i2c_probe(adapter,&addr_data,&foo_detect_client); 221 return i2c_probe(adapter,&addr_data,&foo_detect_client);
265 } 222 }
266 223
267For `sensors' drivers, use the i2c_detect function instead:
268
269 int foo_attach_adapter(struct i2c_adapter *adapter)
270 {
271 return i2c_detect(adapter,&addr_data,&foo_detect_client);
272 }
273
274Remember, structure `addr_data' is defined by the macros explained above, 224Remember, structure `addr_data' is defined by the macros explained above,
275so you do not have to define it yourself. 225so you do not have to define it yourself.
276 226
277The i2c_probe or i2c_detect function will call the foo_detect_client 227The i2c_probe function will call the foo_detect_client
278function only for those i2c addresses that actually have a device on 228function only for those i2c addresses that actually have a device on
279them (unless a `force' parameter was used). In addition, addresses that 229them (unless a `force' parameter was used). In addition, addresses that
280are already in use (by some other registered client) are skipped. 230are already in use (by some other registered client) are skipped.
@@ -283,19 +233,18 @@ are already in use (by some other registered client) are skipped.
283The detect client function 233The detect client function
284-------------------------- 234--------------------------
285 235
286The detect client function is called by i2c_probe or i2c_detect. 236The detect client function is called by i2c_probe. The `kind' parameter
287The `kind' parameter contains 0 if this call is due to a `force' 237contains -1 for a probed detection, 0 for a forced detection, or a positive
288parameter, and -1 otherwise (for i2c_detect, it contains 0 if 238number for a forced detection with a chip type forced.
289this call is due to the generic `force' parameter, and the chip type
290number if it is due to a specific `force' parameter).
291 239
292Below, some things are only needed if this is a `sensors' driver. Those 240Below, some things are only needed if this is a `sensors' driver. Those
293parts are between /* SENSORS ONLY START */ and /* SENSORS ONLY END */ 241parts are between /* SENSORS ONLY START */ and /* SENSORS ONLY END */
294markers. 242markers.
295 243
296This function should only return an error (any value != 0) if there is 244Returning an error different from -ENODEV in a detect function will cause
297some reason why no more detection should be done anymore. If the 245the detection to stop: other addresses and adapters won't be scanned.
298detection just fails for this address, return 0. 246This should only be done on fatal or internal errors, such as a memory
247shortage or i2c_attach_client failing.
299 248
300For now, you can ignore the `flags' parameter. It is there for future use. 249For now, you can ignore the `flags' parameter. It is there for future use.
301 250
@@ -320,11 +269,10 @@ For now, you can ignore the `flags' parameter. It is there for future use.
320 const char *type_name = ""; 269 const char *type_name = "";
321 int is_isa = i2c_is_isa_adapter(adapter); 270 int is_isa = i2c_is_isa_adapter(adapter);
322 271
323 if (is_isa) { 272 /* Do this only if the chip can additionally be found on the ISA bus
273 (hybrid chip). */
324 274
325 /* If this client can't be on the ISA bus at all, we can stop now 275 if (is_isa) {
326 (call `goto ERROR0'). But for kicks, we will assume it is all
327 right. */
328 276
329 /* Discard immediately if this ISA range is already used */ 277 /* Discard immediately if this ISA range is already used */
330 if (check_region(address,FOO_EXTENT)) 278 if (check_region(address,FOO_EXTENT))
@@ -495,15 +443,13 @@ much simpler than the attachment code, fortunately!
495 /* SENSORS ONLY END */ 443 /* SENSORS ONLY END */
496 444
497 /* Try to detach the client from i2c space */ 445 /* Try to detach the client from i2c space */
498 if ((err = i2c_detach_client(client))) { 446 if ((err = i2c_detach_client(client)))
499 printk("foo.o: Client deregistration failed, client not detached.\n");
500 return err; 447 return err;
501 }
502 448
503 /* SENSORS ONLY START */ 449 /* HYBRID SENSORS CHIP ONLY START */
504 if (i2c_is_isa_client(client)) 450 if (i2c_is_isa_client(client))
505 release_region(client->addr,LM78_EXTENT); 451 release_region(client->addr,LM78_EXTENT);
506 /* SENSORS ONLY END */ 452 /* HYBRID SENSORS CHIP ONLY END */
507 453
508 kfree(client); /* Frees client data too, if allocated at the same time */ 454 kfree(client); /* Frees client data too, if allocated at the same time */
509 return 0; 455 return 0;
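
Reviewer note on the writing-clients hunks above: the new text fixes the meaning of the `kind' parameter (-1 probed, 0 forced, positive for a forced chip type) and states that any error other than -ENODEV stops the scan. The following is a minimal userspace sketch of those two rules only. The names detect_mode, classify_kind and detection_should_stop are hypothetical and are not part of the i2c core; the sketch also assumes that a return value of 0 lets scanning continue.

```c
#include <errno.h>

/* Hypothetical names for illustration; not kernel API. */
enum detect_mode {
	DETECT_PROBED,          /* kind < 0: address was probed        */
	DETECT_FORCED_GENERIC,  /* kind == 0: generic `force' parameter */
	DETECT_FORCED_KIND      /* kind > 0: force_<kind> parameter     */
};

/* Map the `kind' argument of a detect function to its meaning. */
static enum detect_mode classify_kind(int kind)
{
	if (kind < 0)
		return DETECT_PROBED;
	if (kind == 0)
		return DETECT_FORCED_GENERIC;
	return DETECT_FORCED_KIND;
}

/*
 * Decide whether a detect function's return value ends the scan.
 * -ENODEV means "no such chip here, keep scanning"; any other error
 * (for example -ENOMEM) is treated as fatal.
 */
static int detection_should_stop(int err)
{
	return err != 0 && err != -ENODEV;
}
```

In a real driver the detect function would return -ENODEV when the chip is simply absent at the address, and reserve other error codes for memory shortage or i2c_attach_client failing.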
diff --git a/Documentation/i386/boot.txt b/Documentation/i386/boot.txt
index 1c48f0eba6fb..10312bebe55d 100644
--- a/Documentation/i386/boot.txt
+++ b/Documentation/i386/boot.txt
@@ -2,7 +2,7 @@
2 ---------------------------- 2 ----------------------------
3 3
4 H. Peter Anvin <hpa@zytor.com> 4 H. Peter Anvin <hpa@zytor.com>
5 Last update 2002-01-01 5 Last update 2005-09-02
6 6
7On the i386 platform, the Linux kernel uses a rather complicated boot 7On the i386 platform, the Linux kernel uses a rather complicated boot
8convention. This has evolved partially due to historical aspects, as 8convention. This has evolved partially due to historical aspects, as
@@ -34,6 +34,8 @@ Protocol 2.02: (Kernel 2.4.0-test3-pre3) New command line protocol.
34Protocol 2.03: (Kernel 2.4.18-pre1) Explicitly makes the highest possible 34Protocol 2.03: (Kernel 2.4.18-pre1) Explicitly makes the highest possible
35 initrd address available to the bootloader. 35 initrd address available to the bootloader.
36 36
37Protocol 2.04: (Kernel 2.6.14) Extend the syssize field to four bytes.
38
37 39
38**** MEMORY LAYOUT 40**** MEMORY LAYOUT
39 41
@@ -103,10 +105,9 @@ The header looks like:
103Offset Proto Name Meaning 105Offset Proto Name Meaning
104/Size 106/Size
105 107
10601F1/1 ALL setup_sects The size of the setup in sectors 10801F1/1 ALL(1 setup_sects The size of the setup in sectors
10701F2/2 ALL root_flags If set, the root is mounted readonly 10901F2/2 ALL root_flags If set, the root is mounted readonly
10801F4/2 ALL syssize DO NOT USE - for bootsect.S use only 11001F4/4 2.04+(2 syssize The size of the 32-bit code in 16-byte paras
10901F6/2 ALL swap_dev DO NOT USE - obsolete
11001F8/2 ALL ram_size DO NOT USE - for bootsect.S use only 11101F8/2 ALL ram_size DO NOT USE - for bootsect.S use only
11101FA/2 ALL vid_mode Video mode control 11201FA/2 ALL vid_mode Video mode control
11201FC/2 ALL root_dev Default root device number 11301FC/2 ALL root_dev Default root device number
@@ -129,8 +130,12 @@ Offset Proto Name Meaning
1290228/4 2.02+ cmd_line_ptr 32-bit pointer to the kernel command line 1300228/4 2.02+ cmd_line_ptr 32-bit pointer to the kernel command line
130022C/4 2.03+ initrd_addr_max Highest legal initrd address 131022C/4 2.03+ initrd_addr_max Highest legal initrd address
131 132
132For backwards compatibility, if the setup_sects field contains 0, the 133(1) For backwards compatibility, if the setup_sects field contains 0, the
133real value is 4. 134 real value is 4.
135
136(2) For boot protocols prior to 2.04, the upper two bytes of the syssize
137 field are unusable, which means the size of a bzImage kernel
138 cannot be determined.
134 139
135If the "HdrS" (0x53726448) magic number is not found at offset 0x202, 140If the "HdrS" (0x53726448) magic number is not found at offset 0x202,
136the boot protocol version is "old". Loading an old kernel, the 141the boot protocol version is "old". Loading an old kernel, the
@@ -230,12 +235,16 @@ loader to communicate with the kernel. Some of its options are also
230relevant to the boot loader itself, see "special command line options" 235relevant to the boot loader itself, see "special command line options"
231below. 236below.
232 237
233The kernel command line is a null-terminated string up to 255 238The kernel command line is a null-terminated string currently up to
234characters long, plus the final null. 239255 characters long, plus the final null. A string that is too long
240will be automatically truncated by the kernel; a boot loader may allow
241a longer command line to be passed to permit future kernels to extend
242this limit.
235 243
236If the boot protocol version is 2.02 or later, the address of the 244If the boot protocol version is 2.02 or later, the address of the
237kernel command line is given by the header field cmd_line_ptr (see 245kernel command line is given by the header field cmd_line_ptr (see
238above.) 246above.) This address can be anywhere between the end of the setup
247heap and 0xA0000.
239 248
240If the protocol version is *not* 2.02 or higher, the kernel 249If the protocol version is *not* 2.02 or higher, the kernel
241command line is entered using the following protocol: 250command line is entered using the following protocol:
@@ -255,7 +264,7 @@ command line is entered using the following protocol:
255**** SAMPLE BOOT CONFIGURATION 264**** SAMPLE BOOT CONFIGURATION
256 265
257As a sample configuration, assume the following layout of the real 266As a sample configuration, assume the following layout of the real
258mode segment: 267mode segment (this is a typical, and recommended layout):
259 268
260 0x0000-0x7FFF Real mode kernel 269 0x0000-0x7FFF Real mode kernel
261 0x8000-0x8FFF Stack and heap 270 0x8000-0x8FFF Stack and heap
@@ -312,9 +321,9 @@ Such a boot loader should enter the following fields in the header:
312 321
313**** LOADING THE REST OF THE KERNEL 322**** LOADING THE REST OF THE KERNEL
314 323
315The non-real-mode kernel starts at offset (setup_sects+1)*512 in the 324The 32-bit (non-real-mode) kernel starts at offset (setup_sects+1)*512
316kernel file (again, if setup_sects == 0 the real value is 4.) It 325in the kernel file (again, if setup_sects == 0 the real value is 4.)
317should be loaded at address 0x10000 for Image/zImage kernels and 326It should be loaded at address 0x10000 for Image/zImage kernels and
3180x100000 for bzImage kernels. 3270x100000 for bzImage kernels.
319 328
320The kernel is a bzImage kernel if the protocol >= 2.00 and the 0x01 329The kernel is a bzImage kernel if the protocol >= 2.00 and the 0x01
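
Reviewer note on the boot.txt hunks above: the arithmetic described there (the setup_sects fallback, the file offset of the 32-bit code, the 2.04 syssize field counted in 16-byte paragraphs, and the two load addresses) can be sketched in plain C. This is an illustration under stated assumptions, not boot loader code: the function names are hypothetical, and the header fields are assumed to have been read from the kernel image already.

```c
#include <stdint.h>

/* Hypothetical helper names; only the arithmetic comes from boot.txt. */

/* A setup_sects value of 0 means 4, for backwards compatibility. */
static unsigned int effective_setup_sects(uint8_t setup_sects)
{
	return setup_sects ? setup_sects : 4;
}

/* Offset of the 32-bit (non-real-mode) kernel within the kernel file. */
static uint32_t pm_kernel_file_offset(uint8_t setup_sects)
{
	return (effective_setup_sects(setup_sects) + 1u) * 512u;
}

/* Protocol 2.04+: syssize is the 32-bit code size in 16-byte paragraphs. */
static uint64_t pm_kernel_size_bytes(uint32_t syssize)
{
	return (uint64_t)syssize * 16u;
}

/* Load address: 0x100000 for bzImage, 0x10000 for Image/zImage. */
static uint32_t pm_kernel_load_address(int is_bzimage)
{
	return is_bzimage ? 0x100000u : 0x10000u;
}
```

For protocols older than 2.04 the syssize helper should not be relied on, since the upper two bytes of the field are unusable there.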
diff --git a/Documentation/ibm-acpi.txt b/Documentation/ibm-acpi.txt
index c437b1aeff55..8b3fd82b2ce7 100644
--- a/Documentation/ibm-acpi.txt
+++ b/Documentation/ibm-acpi.txt
@@ -1,16 +1,16 @@
1 IBM ThinkPad ACPI Extras Driver 1 IBM ThinkPad ACPI Extras Driver
2 2
3 Version 0.8 3 Version 0.12
4 8 November 2004 4 17 August 2005
5 5
6 Borislav Deianov <borislav@users.sf.net> 6 Borislav Deianov <borislav@users.sf.net>
7 http://ibm-acpi.sf.net/ 7 http://ibm-acpi.sf.net/
8 8
9 9
10This is a Linux ACPI driver for the IBM ThinkPad laptops. It aims to 10This is a Linux ACPI driver for the IBM ThinkPad laptops. It supports
11support various features of these laptops which are accessible through 11various features of these laptops which are accessible through the
12the ACPI framework but not otherwise supported by the generic Linux 12ACPI framework but not otherwise supported by the generic Linux ACPI
13ACPI drivers. 13drivers.
14 14
15 15
16Status 16Status
@@ -25,9 +25,14 @@ detailed description):
25 - ThinkLight on and off 25 - ThinkLight on and off
26 - limited docking and undocking 26 - limited docking and undocking
27 - UltraBay eject 27 - UltraBay eject
28 - Experimental: CMOS control 28 - CMOS control
29 - Experimental: LED control 29 - LED control
30 - Experimental: ACPI sounds 30 - ACPI sounds
31 - temperature sensors
32 - Experimental: embedded controller register dump
33 - Experimental: LCD brightness control
34 - Experimental: volume control
35 - Experimental: fan speed, fan enable/disable
31 36
32A compatibility table by model and feature is maintained on the web 37A compatibility table by model and feature is maintained on the web
33site, http://ibm-acpi.sf.net/. I appreciate any success or failure 38site, http://ibm-acpi.sf.net/. I appreciate any success or failure
@@ -91,12 +96,12 @@ driver is still in the alpha stage, the exact proc file format and
91commands supported by the various features is guaranteed to change 96commands supported by the various features is guaranteed to change
92frequently. 97frequently.
93 98
94Driver Version -- /proc/acpi/ibm/driver 99Driver version -- /proc/acpi/ibm/driver
95-------------------------------------- 100---------------------------------------
96 101
97The driver name and version. No commands can be written to this file. 102The driver name and version. No commands can be written to this file.
98 103
99Hot Keys -- /proc/acpi/ibm/hotkey 104Hot keys -- /proc/acpi/ibm/hotkey
100--------------------------------- 105---------------------------------
101 106
102Without this driver, only the Fn-F4 key (sleep button) generates an 107Without this driver, only the Fn-F4 key (sleep button) generates an
@@ -188,7 +193,7 @@ and, on the X40, video corruption. By disabling automatic switching,
188the flickering or video corruption can be avoided. 193the flickering or video corruption can be avoided.
189 194
190The video_switch command cycles through the available video outputs 195The video_switch command cycles through the available video outputs
191(it sumulates the behavior of Fn-F7). 196(it simulates the behavior of Fn-F7).
192 197
193Video expansion can be toggled through this feature. This controls 198Video expansion can be toggled through this feature. This controls
194whether the display is expanded to fill the entire LCD screen when a 199whether the display is expanded to fill the entire LCD screen when a
@@ -201,6 +206,12 @@ Fn-F7 from working. This also disables the video output switching
201features of this driver, as it uses the same ACPI methods as 206features of this driver, as it uses the same ACPI methods as
202Fn-F7. Video switching on the console should still work. 207Fn-F7. Video switching on the console should still work.
203 208
209UPDATE: There's now a patch for the X.org Radeon driver which
210addresses this issue. Some people are reporting success with the patch
211while others are still having problems. For more information:
212
213https://bugs.freedesktop.org/show_bug.cgi?id=2000
214
204ThinkLight control -- /proc/acpi/ibm/light 215ThinkLight control -- /proc/acpi/ibm/light
205------------------------------------------ 216------------------------------------------
206 217
@@ -211,7 +222,7 @@ models which do not make the status available will show it as
211 echo on > /proc/acpi/ibm/light 222 echo on > /proc/acpi/ibm/light
212 echo off > /proc/acpi/ibm/light 223 echo off > /proc/acpi/ibm/light
213 224
214Docking / Undocking -- /proc/acpi/ibm/dock 225Docking / undocking -- /proc/acpi/ibm/dock
215------------------------------------------ 226------------------------------------------
216 227
217Docking and undocking (e.g. with the X4 UltraBase) requires some 228Docking and undocking (e.g. with the X4 UltraBase) requires some
@@ -228,11 +239,15 @@ NOTE: These events will only be generated if the laptop was docked
228when originally booted. This is due to the current lack of support for 239when originally booted. This is due to the current lack of support for
229hot plugging of devices in the Linux ACPI framework. If the laptop was 240hot plugging of devices in the Linux ACPI framework. If the laptop was
230booted while not in the dock, the following message is shown in the 241booted while not in the dock, the following message is shown in the
231logs: "ibm_acpi: dock device not present". No dock-related events are 242logs:
232generated but the dock and undock commands described below still 243
233work. They can be executed manually or triggered by Fn key 244 Mar 17 01:42:34 aero kernel: ibm_acpi: dock device not present
234combinations (see the example acpid configuration files included in 245
235the driver tarball package available on the web site). 246In this case, no dock-related events are generated but the dock and
247undock commands described below still work. They can be executed
248manually or triggered by Fn key combinations (see the example acpid
249configuration files included in the driver tarball package available
250on the web site).
236 251
237When the eject request button on the dock is pressed, the first event 252When the eject request button on the dock is pressed, the first event
238above is generated. The handler for this event should issue the 253above is generated. The handler for this event should issue the
@@ -267,7 +282,7 @@ the only docking stations currently supported are the X-series
267UltraBase docks and "dumb" port replicators like the Mini Dock (the 282UltraBase docks and "dumb" port replicators like the Mini Dock (the
268latter don't need any ACPI support, actually). 283latter don't need any ACPI support, actually).
269 284
270UltraBay Eject -- /proc/acpi/ibm/bay 285UltraBay eject -- /proc/acpi/ibm/bay
271------------------------------------ 286------------------------------------
272 287
273Inserting or ejecting an UltraBay device requires some actions to be 288Inserting or ejecting an UltraBay device requires some actions to be
@@ -284,8 +299,11 @@ when the laptop was originally booted (on the X series, the UltraBay
284is in the dock, so it may not be present if the laptop was undocked). 299is in the dock, so it may not be present if the laptop was undocked).
285This is due to the current lack of support for hot plugging of devices 300This is due to the current lack of support for hot plugging of devices
286in the Linux ACPI framework. If the laptop was booted without the 301in the Linux ACPI framework. If the laptop was booted without the
287UltraBay, the following message is shown in the logs: "ibm_acpi: bay 302UltraBay, the following message is shown in the logs:
288device not present". No bay-related events are generated but the eject 303
304 Mar 17 01:42:34 aero kernel: ibm_acpi: bay device not present
305
306In this case, no bay-related events are generated but the eject
289command described below still works. It can be executed manually or 307command described below still works. It can be executed manually or
290triggered by a hot key combination. 308triggered by a hot key combination.
291 309
@@ -306,22 +324,33 @@ necessary to enable the UltraBay device (e.g. call idectl).
306The contents of the /proc/acpi/ibm/bay file shows the current status 324The contents of the /proc/acpi/ibm/bay file shows the current status
307of the UltraBay, as provided by the ACPI framework. 325of the UltraBay, as provided by the ACPI framework.
308 326
309Experimental Features 327EXPERIMENTAL warm eject support on the 600e/x, A22p and A3x (To use
310--------------------- 328this feature, you need to supply the experimental=1 parameter when
329loading the module):
330
331These models do not have a button near the UltraBay device to request
332a hot eject but rather require the laptop to be put to sleep
333 (suspend-to-ram) before the bay device is ejected or inserted.
334The sequence of steps to eject the device is as follows:
335
336 echo eject > /proc/acpi/ibm/bay
337 put the ThinkPad to sleep
338 remove the drive
339 resume from sleep
340 cat /proc/acpi/ibm/bay should show that the drive was removed
341
342On the A3x, both the UltraBay 2000 and UltraBay Plus devices are
343supported. Use "eject2" instead of "eject" for the second bay.
311 344
312The following features are marked experimental because using them 345Note: the UltraBay eject support on the 600e/x, A22p and A3x is
313involves guessing the correct values of some parameters. Guessing 346EXPERIMENTAL and may not work as expected. USE WITH CAUTION!
314incorrectly may have undesirable effects like crashing your
315ThinkPad. USE THESE WITH CAUTION! To activate them, you'll need to
316supply the experimental=1 parameter when loading the module.
317 347
318Experimental: CMOS control - /proc/acpi/ibm/cmos 348CMOS control -- /proc/acpi/ibm/cmos
319------------------------------------------------ 349-----------------------------------
320 350
321This feature is used internally by the ACPI firmware to control the 351This feature is used internally by the ACPI firmware to control the
322ThinkLight on most newer ThinkPad models. It appears that it can also 352ThinkLight on most newer ThinkPad models. It may also control LCD
323control LCD brightness, sound volume and more, but only on some 353brightness, sound volume and more, but only on some models.
324models.
325 354
326The commands are non-negative integer numbers: 355The commands are non-negative integer numbers:
327 356
@@ -330,10 +359,9 @@ The commands are non-negative integer numbers:
330 echo 2 >/proc/acpi/ibm/cmos 359 echo 2 >/proc/acpi/ibm/cmos
331 ... 360 ...
332 361
333The range of numbers which are used internally by various models is 0 362The range of valid numbers is 0 to 21, but not all have an effect and
334to 21, but it's possible that numbers outside this range have 363the behavior varies from model to model. Here is the behavior on the
335interesting behavior. Here is the behavior on the X40 (tpb is the 364X40 (tpb is the ThinkPad Buttons utility):
336ThinkPad Buttons utility):
337 365
338 0 - no effect but tpb reports "Volume down" 366 0 - no effect but tpb reports "Volume down"
339 1 - no effect but tpb reports "Volume up" 367 1 - no effect but tpb reports "Volume up"
@@ -346,26 +374,18 @@ ThinkPad Buttons utility):
346 13 - ThinkLight off 374 13 - ThinkLight off
347 14 - no effect but tpb reports ThinkLight status change 375 14 - no effect but tpb reports ThinkLight status change
348 376
349If you try this feature, please send me a report similar to the 377LED control -- /proc/acpi/ibm/led
350above. On models which allow control of LCD brightness or sound 378---------------------------------
351volume, I'd like to provide this functionality in an user-friendly
352way, but first I need a way to identify the models which this is
353possible.
354
355Experimental: LED control - /proc/acpi/ibm/LED
356----------------------------------------------
357 379
358Some of the LED indicators can be controlled through this feature. The 380Some of the LED indicators can be controlled through this feature. The
359available commands are: 381available commands are:
360 382
361 echo <led number> on >/proc/acpi/ibm/led 383 echo '<led number> on' >/proc/acpi/ibm/led
362 echo <led number> off >/proc/acpi/ibm/led 384 echo '<led number> off' >/proc/acpi/ibm/led
363 echo <led number> blink >/proc/acpi/ibm/led 385 echo '<led number> blink' >/proc/acpi/ibm/led
364 386
365The <led number> parameter is a non-negative integer. The range of LED 387The <led number> range is 0 to 7. The set of LEDs that can be
366numbers used internally by various models is 0 to 7 but it's possible 388controlled varies from model to model. Here is the mapping on the X40:
367that numbers outside this range are also valid. Here is the mapping on
368the X40:
369 389
370 0 - power 390 0 - power
371 1 - battery (orange) 391 1 - battery (orange)
@@ -376,49 +396,224 @@ the X40:
376 396
377All of the above can be turned on and off and can be made to blink. 397All of the above can be turned on and off and can be made to blink.
378 398
379If you try this feature, please send me a report similar to the 399ACPI sounds -- /proc/acpi/ibm/beep
380above. I'd like to provide this functionality in an user-friendly way, 400----------------------------------
381but first I need to identify the which numbers correspond to which
382LEDs on various models.
383
384Experimental: ACPI sounds - /proc/acpi/ibm/beep
385-----------------------------------------------
386 401
387The BEEP method is used internally by the ACPI firmware to provide 402The BEEP method is used internally by the ACPI firmware to provide
388audible alerts in various situtation. This feature allows the same 403audible alerts in various situations. This feature allows the same
389sounds to be triggered manually. 404sounds to be triggered manually.
390 405
391The commands are non-negative integer numbers: 406The commands are non-negative integer numbers:
392 407
393 echo 0 >/proc/acpi/ibm/beep 408 echo <number> >/proc/acpi/ibm/beep
394 echo 1 >/proc/acpi/ibm/beep
395 echo 2 >/proc/acpi/ibm/beep
396 ...
397 409
398The range of numbers which are used internally by various models is 0 410The valid <number> range is 0 to 17. Not all numbers trigger sounds
399to 17, but it's possible that numbers outside this range are also 411and the sounds vary from model to model. Here is the behavior on the
400valid. Here is the behavior on the X40: 412X40:
401 413
402 2 - two beeps, pause, third beep 414 0 - stop a sound in progress (but use 17 to stop 16)
415 2 - two beeps, pause, third beep ("low battery")
403 3 - single beep 416 3 - single beep
404 4 - "unable" 417 4 - high, followed by low-pitched beep ("unable")
405 5 - single beep 418 5 - single beep
406 6 - "AC/DC" 419 6 - very high, followed by high-pitched beep ("AC/DC")
407 7 - high-pitched beep 420 7 - high-pitched beep
408 9 - three short beeps 421 9 - three short beeps
409 10 - very long beep 422 10 - very long beep
410 12 - low-pitched beep 423 12 - low-pitched beep
424 15 - three high-pitched beeps repeating constantly, stop with 0
425 16 - one medium-pitched beep repeating constantly, stop with 17
426 17 - stop 16
427
428Temperature sensors -- /proc/acpi/ibm/thermal
429---------------------------------------------
430
431Most ThinkPads include six or more separate temperature sensors but
432only expose the CPU temperature through the standard ACPI methods.
433This feature shows readings from up to eight different sensors. Some
434readings may not be valid, e.g. may show large negative values. For
435example, on the X40, a typical output may be:
436
437temperatures: 42 42 45 41 36 -128 33 -128
438
439Thomas Gruber took his R51 apart and traced all six active sensors in
440his laptop (the location of sensors may vary on other models):
441
4421: CPU
4432: Mini PCI Module
4443: HDD
4454: GPU
4465: Battery
4476: N/A
4487: Battery
4498: N/A
450
451No commands can be written to this file.
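
Since some slots may hold invalid readings, a script reading this file will usually want to filter them out. The following is a minimal sketch, not part of ibm-acpi: the function name valid_temps is made up for illustration, and it assumes that any negative reading (such as -128) marks an unused sensor slot.

```shell
# Print each valid sensor reading from an ibm-acpi "temperatures:" line
# as "<sensor number>: <degrees>", skipping invalid slots.
# Assumption: a negative reading (e.g. -128) marks an unused sensor.
valid_temps() {
    # $1 is the full line, e.g. "temperatures: 42 42 45 41 36 -128 33 -128".
    # Strip the label, then let the shell split the readings into $1..$N.
    set -- ${1#temperatures:}
    n=0
    for t in "$@"; do
        n=$((n + 1))
        if [ "$t" -ge 0 ]; then
            echo "$n: $t"
        fi
    done
    return 0
}
```

On a machine with the module loaded, it could be called as:

	valid_temps "$(cat /proc/acpi/ibm/thermal)"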
452
453EXPERIMENTAL: Embedded controller register dump -- /proc/acpi/ibm/ecdump
454------------------------------------------------------------------------
455
456This feature is marked EXPERIMENTAL because the implementation
457directly accesses hardware registers and may not work as expected. USE
458WITH CAUTION! To use this feature, you need to supply the
459experimental=1 parameter when loading the module.
460
461This feature dumps the values of 256 embedded controller
462registers. Values which have changed since the last time the registers
463were dumped are marked with a star:
464
465[root@x40 ibm-acpi]# cat /proc/acpi/ibm/ecdump
466EC +00 +01 +02 +03 +04 +05 +06 +07 +08 +09 +0a +0b +0c +0d +0e +0f
467EC 0x00: a7 47 87 01 fe 96 00 08 01 00 cb 00 00 00 40 00
468EC 0x10: 00 00 ff ff f4 3c 87 09 01 ff 42 01 ff ff 0d 00
469EC 0x20: 00 00 00 00 00 00 00 00 00 00 00 03 43 00 00 80
470EC 0x30: 01 07 1a 00 30 04 00 00 *85 00 00 10 00 50 00 00
471EC 0x40: 00 00 00 00 00 00 14 01 00 04 00 00 00 00 00 00
472EC 0x50: 00 c0 02 0d 00 01 01 02 02 03 03 03 03 *bc *02 *bc
473EC 0x60: *02 *bc *02 00 00 00 00 00 00 00 00 00 00 00 00 00
474EC 0x70: 00 00 00 00 00 12 30 40 *24 *26 *2c *27 *20 80 *1f 80
475EC 0x80: 00 00 00 06 *37 *0e 03 00 00 00 0e 07 00 00 00 00
476EC 0x90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
477EC 0xa0: *ff 09 ff 09 ff ff *64 00 *00 *00 *a2 41 *ff *ff *e0 00
478EC 0xb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
479EC 0xc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
480EC 0xd0: 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
481EC 0xe0: 00 00 00 00 00 00 00 00 11 20 49 04 24 06 55 03
482EC 0xf0: 31 55 48 54 35 38 57 57 08 2f 45 73 07 65 6c 1a
483
484This feature can be used to determine the register holding the fan
485speed on some models. To do that, do the following:
486
487 - make sure the battery is fully charged
488 - make sure the fan is running
489 - run 'cat /proc/acpi/ibm/ecdump' several times, once per second or so
490
491The first step makes sure various charging-related values don't
492vary. The second ensures that the fan-related values do vary, since
493the fan speed fluctuates a bit. The third will (hopefully) mark the
494fan register with a star:
495
496[root@x40 ibm-acpi]# cat /proc/acpi/ibm/ecdump
497EC +00 +01 +02 +03 +04 +05 +06 +07 +08 +09 +0a +0b +0c +0d +0e +0f
498EC 0x00: a7 47 87 01 fe 96 00 08 01 00 cb 00 00 00 40 00
499EC 0x10: 00 00 ff ff f4 3c 87 09 01 ff 42 01 ff ff 0d 00
500EC 0x20: 00 00 00 00 00 00 00 00 00 00 00 03 43 00 00 80
501EC 0x30: 01 07 1a 00 30 04 00 00 85 00 00 10 00 50 00 00
502EC 0x40: 00 00 00 00 00 00 14 01 00 04 00 00 00 00 00 00
503EC 0x50: 00 c0 02 0d 00 01 01 02 02 03 03 03 03 bc 02 bc
504EC 0x60: 02 bc 02 00 00 00 00 00 00 00 00 00 00 00 00 00
505EC 0x70: 00 00 00 00 00 12 30 40 24 27 2c 27 21 80 1f 80
506EC 0x80: 00 00 00 06 *be 0d 03 00 00 00 0e 07 00 00 00 00
507EC 0x90: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
508EC 0xa0: ff 09 ff 09 ff ff 64 00 00 00 a2 41 ff ff e0 00
509EC 0xb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
510EC 0xc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
511EC 0xd0: 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
512EC 0xe0: 00 00 00 00 00 00 00 00 11 20 49 04 24 06 55 03
513EC 0xf0: 31 55 48 54 35 38 57 57 08 2f 45 73 07 65 6c 1a
514
515Another set of values that varies often is the temperature
516readings. Since temperatures don't change very fast, you can take
517several quick dumps to eliminate them.
518
519You can use a similar method to figure out the meaning of other
520embedded controller registers - e.g. make sure nothing else changes
521except the charging or discharging battery to determine which
522registers contain the current battery capacity, etc. If you experiment
523with this, do send me your results (including some complete dumps with
524a description of the conditions when they were taken.)
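
Picking the starred values out of a 256-register dump by eye is tedious. The sketch below lists only the starred registers together with their addresses. It is an illustration, not part of ibm-acpi: the function name changed_registers is hypothetical, and it assumes the dump layout shown above (rows starting with "EC 0xN0:" followed by sixteen values).

```shell
# Read an ecdump listing on stdin and print every register marked with
# a star, one per line, as "0xNN <value>".
changed_registers() {
    awk '
        # Data rows look like "EC 0x80: 00 00 00 06 *be 0d ..."; the
        # header row ("EC +00 +01 ...") does not match and is skipped.
        $1 == "EC" && $2 ~ /^0x[0-9a-f]+:$/ {
            row = substr($2, 3, 1)      # high nibble of the address
            for (i = 3; i <= NF; i++) {
                if (substr($i, 1, 1) == "*") {
                    # Field 3 is column 0, so the low nibble is i - 3.
                    printf "0x%s%x %s\n", row, i - 3, substr($i, 2)
                }
            }
        }
    '
}
```

For example, 'cat /proc/acpi/ibm/ecdump | changed_registers' run once per second should, with a full battery and a running fan, keep reporting the same small set of addresses.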
525
526EXPERIMENTAL: LCD brightness control -- /proc/acpi/ibm/brightness
527-----------------------------------------------------------------
528
529This feature is marked EXPERIMENTAL because the implementation
530directly accesses hardware registers and may not work as expected. USE
531WITH CAUTION! To use this feature, you need to supply the
532experimental=1 parameter when loading the module.
533
534This feature allows software control of the LCD brightness on ThinkPad
535models which don't have a hardware brightness slider. The available
536commands are:
537
538 echo up >/proc/acpi/ibm/brightness
539 echo down >/proc/acpi/ibm/brightness
540 echo 'level <level>' >/proc/acpi/ibm/brightness
541
542The <level> number range is 0 to 7, although not all of them may be
543distinct. The current brightness level is shown in the file.
544
545EXPERIMENTAL: Volume control -- /proc/acpi/ibm/volume
546-----------------------------------------------------
547
548This feature is marked EXPERIMENTAL because the implementation
549directly accesses hardware registers and may not work as expected. USE
550WITH CAUTION! To use this feature, you need to supply the
551experimental=1 parameter when loading the module.
552
553This feature allows volume control on ThinkPad models which don't have
554a hardware volume knob. The available commands are:
555
556 echo up >/proc/acpi/ibm/volume
557 echo down >/proc/acpi/ibm/volume
558 echo mute >/proc/acpi/ibm/volume
559 echo 'level <level>' >/proc/acpi/ibm/volume
560
561The <level> number range is 0 to 15 although not all of them may be
562distinct. To unmute the volume after the mute command, use either the
563up or down command (the level command will not unmute the volume).
564The current volume level and mute state are shown in the file.
565
566EXPERIMENTAL: fan speed, fan enable/disable -- /proc/acpi/ibm/fan
567-----------------------------------------------------------------
568
569This feature is marked EXPERIMENTAL because the implementation
570directly accesses hardware registers and may not work as expected. USE
571WITH CAUTION! To use this feature, you need to supply the
572experimental=1 parameter when loading the module.
573
574This feature attempts to show the current fan speed. The speed is read
575directly from the hardware registers of the embedded controller. This
576is known to work on later R, T and X series ThinkPads but may show a
577bogus value on other models.
578
579The fan may be enabled or disabled with the following commands:
580
581 echo enable >/proc/acpi/ibm/fan
582 echo disable >/proc/acpi/ibm/fan
583
584WARNING WARNING WARNING: do not leave the fan disabled unless you are
585monitoring the temperature sensor readings and you are ready to enable
586it if necessary to avoid overheating.
587
588The fan only runs if it's enabled *and* the various temperature
589sensors which control it read high enough. On the X40, this seems to
590depend on the CPU and HDD temperatures. Specifically, the fan is
591turned on when either the CPU temperature climbs to 56 degrees or the
592HDD temperature climbs to 46 degrees. The fan is turned off when the
593CPU temperature drops to 49 degrees and the HDD temperature drops to
59441 degrees. These thresholds cannot currently be controlled.
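
The two pairs of thresholds form a hysteresis: between them, the fan simply keeps its current state. The following sketch models the behavior described above for the X40. It is only an illustration of the observed thresholds, not code from the driver or the firmware, and the function name fan_next_state is hypothetical.

```shell
# Model of the X40 fan thresholds described above (observed values).
# Usage: fan_next_state <current state: on|off> <CPU temp> <HDD temp>
fan_next_state() {
    state=$1 cpu=$2 hdd=$3
    if [ "$cpu" -ge 56 ] || [ "$hdd" -ge 46 ]; then
        echo on             # either sensor reached its turn-on threshold
    elif [ "$cpu" -le 49 ] && [ "$hdd" -le 41 ]; then
        echo off            # both sensors fell to their turn-off thresholds
    else
        echo "$state"       # in between: the fan keeps its current state
    fi
}
```

For instance, 'fan_next_state on 50 30' prints "on": the CPU has cooled below 56 degrees but has not yet dropped to 49, so a running fan keeps running.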
595
596On the X31 and X40 (and ONLY on those models), the fan speed can be
597controlled to a certain degree. Once the fan is running, it can be
598forced to run faster or slower with the following command:
599
600 echo 'speed <speed>' > /proc/acpi/ibm/fan
601
602The sustainable range of fan speeds on the X40 appears to be from
603about 3700 to about 7350. Values outside this range either do not have
604any effect or the fan speed eventually settles somewhere in that
605range. The fan cannot be stopped or started with this command.
606
607On the 570, temperature readings are not available through this
608feature and the fan control works a little differently. The fan speed
609is reported in levels from 0 (off) to 7 (max) and can be controlled
610with the following command:
411 611
412(I've only been able to identify a couple of them). 612 echo 'level <level>' > /proc/acpi/ibm/fan
413
414If you try this feature, please send me a report similar to the
415above. I'd like to provide this functionality in an user-friendly way,
416but first I need to identify the which numbers correspond to which
417sounds on various models.
418 613
419 614
420Multiple Command, Module Parameters 615Multiple Commands, Module Parameters
421----------------------------------- 616------------------------------------
422 617
423Multiple commands can be written to the proc files in one shot by 618Multiple commands can be written to the proc files in one shot by
424separating them with commas, for example: 619separating them with commas, for example:
@@ -451,24 +646,19 @@ scripts (included with ibm-acpi for completeness):
451 /usr/local/sbin/laptop_mode -- from the Linux kernel source 646 /usr/local/sbin/laptop_mode -- from the Linux kernel source
452 distribution, see Documentation/laptop-mode.txt 647 distribution, see Documentation/laptop-mode.txt
453 /sbin/service -- comes with Redhat/Fedora distributions 648 /sbin/service -- comes with Redhat/Fedora distributions
649 /usr/sbin/hibernate -- from the Software Suspend 2 distribution,
650 see http://softwaresuspend.berlios.de/
454 651
455Toan T Nguyen <ntt@control.uchicago.edu> has written a SuSE powersave 652Toan T Nguyen <ntt@physics.ucla.edu> notes that Suse uses the
456script for the X20, included in config/usr/sbin/ibm_hotkeys_X20 653powersave program to suspend ('powersave --suspend-to-ram') or
654hibernate ('powersave --suspend-to-disk'). This means that the
655hibernate script is not needed on that distribution.
457 656
458Henrik Brix Andersen <brix@gentoo.org> has written a Gentoo ACPI event 657Henrik Brix Andersen <brix@gentoo.org> has written a Gentoo ACPI event
459handler script for the X31. You can get the latest version from 658handler script for the X31. You can get the latest version from
460http://dev.gentoo.org/~brix/files/x31.sh 659http://dev.gentoo.org/~brix/files/x31.sh
461 660
462David Schweikert <dws@ee.eth.ch> has written an alternative blank.sh 661David Schweikert <dws@ee.eth.ch> has written an alternative blank.sh
463script which works on Debian systems, included in 662script which works on Debian systems. This script has now been
464configs/etc/acpi/actions/blank-debian.sh 663extended to also work on Fedora systems and included as the default
465 664blank.sh in the distribution.
466
467TODO
468----
469
470I'd like to implement the following features but haven't yet found the
471time and/or I don't yet know how to implement them:
472
473- UltraBay floppy drive support
474
diff --git a/Documentation/input/yealink.txt b/Documentation/input/yealink.txt
new file mode 100644
index 000000000000..85f095a7ad04
--- /dev/null
+++ b/Documentation/input/yealink.txt
@@ -0,0 +1,203 @@
1Driver documentation for yealink usb-p1k phones
2
30. Status
4~~~~~~~~~
5
6The p1k is a relatively cheap usb 1.1 phone with:
7 - keyboard full support, yealink.ko / input event API
8 - LCD full support, yealink.ko / sysfs API
9 - LED full support, yealink.ko / sysfs API
10 - dialtone full support, yealink.ko / sysfs API
11 - ringtone full support, yealink.ko / sysfs API
12 - audio playback full support, snd_usb_audio.ko / alsa API
13 - audio record full support, snd_usb_audio.ko / alsa API
14
15For vendor documentation see http://www.yealink.com
16
17
181. Compilation (stand alone version)
19~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
20
21Currently only kernel 2.6.x.y versions are supported.
22In order to build the yealink.ko module do:
23
24 make
25
26If you encounter problems, please check that the MAKE_OPTS variable in
27the Makefile points to the location where your kernel sources
28are located (default: /usr/src/linux).
29
30
31
322. keyboard features
33~~~~~~~~~~~~~~~~~~~~
34The current mapping in the kernel is provided by the map_p1k_to_key
35function:
36
37 Physical USB-P1K button layout input events
38
39
40 up up
41 IN OUT left, right
42 down down
43
44 pickup C hangup enter, backspace, escape
45 1 2 3 1, 2, 3
46 4 5 6 4, 5, 6,
47 7 8 9 7, 8, 9,
48 * 0 # *, 0, #,
49
50 The "up" and "down" keys are symbolised by arrows on the button.
51 The "pickup" and "hangup" keys are symbolised by a green and red phone
52 on the button.
53
54
553. LCD features
56~~~~~~~~~~~~~~~
57The LCD is divided and organised as a 3 line display:
58
59 |[] [][] [][] [][] in |[][]
60 |[] M [][] D [][] : [][] out |[][]
61 store
62
63 NEW REP SU MO TU WE TH FR SA
64
65 [] [] [] [] [] [] [] [] [] [] [] []
66 [] [] [] [] [] [] [] [] [] [] [] []
67
68
69Line 1 Format (see below) : 18.e8.M8.88...188
70 Icon names : M D : IN OUT STORE
71Line 2 Format : .........
72 Icon name : NEW REP SU MO TU WE TH FR SA
73Line 3 Format : 888888888888
74
75
76Format description:
77 From a user space perspective the world is separated into "digits" and "icons".
78 A digit can have a character set, an icon can only be ON or OFF.
79
80 Format specifier
81 '8' : Generic 7 segment digit with individual addressable segments
82
83 Reduced capability 7 segment digit, when segments are hard wired together.
84 '1' : 2 segments digit only able to produce a 1.
85 'e' : Most significant day of the month digit,
86 able to produce at least 1 2 3.
87 'M' : Most significant minute digit,
88 able to produce at least 0 1 2 3 4 5.
89
90 Icons or pictograms:
91 '.' : For example like AM, PM, SU, a 'dot' .. or other single segment
92 elements.
93
94
954. Driver usage
96~~~~~~~~~~~~~~~
97For userland the following interfaces are available using the sysfs interface:
98 /sys/.../
99 line1 Read/Write, lcd line1
100 line2 Read/Write, lcd line2
101 line3 Read/Write, lcd line3
102
103 get_icons Read, returns a set of available icons.
104 hide_icon Write, hide the element by writing the icon name.
105 show_icon Write, display the element by writing the icon name.
106
107 map_seg7 Read/Write, the 7 segments char set, common for all
108 yealink phones. (see map_to_7segment.h)
109
110 ringtone Write, upload binary representation of a ringtone,
111 see yealink.c. status EXPERIMENTAL due to potential
112 races between async. and sync usb calls.
113
114
1154.1 lineX
116~~~~~~~~~
117Reading /sys/../lineX will return the format string with its current value:
118
119 Example:
120 cat ./line3
121 888888888888
122 Linux Rocks!
123
124Writing to /sys/../lineX will set the corresponding LCD line.
125 - Excess characters are ignored.
126 - If fewer characters are written than allowed, the remaining digits are
127 unchanged.
128 - The tab '\t' and newline '\n' characters do not overwrite the original content.
129 - Writing a space to an icon will always hide its content.
130
131 Example:
132 date +"%m.%e.%k:%M" | sed 's/^0/ /' > ./line1
133
134 Will update the LCD with the current date & time.
135
136
1374.2 get_icons
138~~~~~~~~~~~~~
139Reading will return all available icon names and their current settings:
140
141 cat ./get_icons
142 on M
143 on D
144 on :
145 IN
146 OUT
147 STORE
148 NEW
149 REP
150 SU
151 MO
152 TU
153 WE
154 TH
155 FR
156 SA
157 LED
158 DIALTONE
159 RINGTONE
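
A script can use this listing to find out which icons are currently lit. The following sketch assumes the output format shown above, where an active icon is prefixed with the word "on" and an inactive one has no prefix. The function name icons_on is hypothetical.

```shell
# Read get_icons output on stdin and print the names of the icons that
# are currently on, one per line.
icons_on() {
    # Active icons appear as two fields: "on" followed by the icon name.
    awk '$1 == "on" && NF == 2 { print $2 }'
}
```

For the listing above, 'cat ./get_icons | icons_on' would print M, D and : on separate lines.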
160
161
1624.3 show/hide icons
163~~~~~~~~~~~~~~~~~~~
164Writing to these files will update the state of the icon.
165Only one icon at a time can be updated.
166
167If an icon is also on a ./lineX the corresponding value is
168updated with the first letter of the icon.
169
170 Example - light up the store icon:
171 echo -n "STORE" > ./show_icon
172
173 cat ./line1
174 18.e8.M8.88...188
175 S
176
177 Example - sound the ringtone for 10 seconds:
178 echo -n RINGTONE > /sys/..../show_icon
179 sleep 10
180 echo -n RINGTONE > /sys/..../hide_icon
181
182
1835. Sound features
184~~~~~~~~~~~~~~~~~
185Sound is supported by the ALSA driver: snd_usb_audio
186
187One 16-bit channel with sample and playback rates of 8000 Hz is the practical
188limit of the device.
189
190 Example - recording test:
191 arecord -v -d 10 -r 8000 -f S16_LE -t wav foobar.wav
192
193 Example - playback test:
194 aplay foobar.wav
195
196
1976. Credits & Acknowledgments
198~~~~~~~~~~~~~~~~~~~~~~~~~~~~
199 - Olivier Vandorpe, for starting the usbb2k-api project doing much of
200 the reverse engineering.
201 - Martin Diehl, for pointing out how to handle USB memory allocation.
202 - Dmitry Torokhov, for the numerous code reviews and suggestions.
203
diff --git a/Documentation/kbuild/makefiles.txt b/Documentation/kbuild/makefiles.txt
index 2616a58a5a4b..9a1586590d82 100644
--- a/Documentation/kbuild/makefiles.txt
+++ b/Documentation/kbuild/makefiles.txt
@@ -872,7 +872,13 @@ When kbuild executes the following steps are followed (roughly):
872 Assignments to $(targets) are without $(obj)/ prefix. 872 Assignments to $(targets) are without $(obj)/ prefix.
873 if_changed may be used in conjunction with custom commands as 873 if_changed may be used in conjunction with custom commands as
874 defined in 6.7 "Custom kbuild commands". 874 defined in 6.7 "Custom kbuild commands".
875
875 Note: It is a typical mistake to forget the FORCE prerequisite. 876 Note: It is a typical mistake to forget the FORCE prerequisite.
877 Another common pitfall is that whitespace is sometimes
878 significant; for instance, the below will fail (note the extra space
879 after the comma):
880 target: source(s) FORCE
881 #WRONG!# $(call if_changed, ld/objcopy/gzip)
876 882
877 ld 883 ld
878 Link target. Often LDFLAGS_$@ is used to set specific options to ld. 884 Link target. Often LDFLAGS_$@ is used to set specific options to ld.
diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
index 7ff213f4becd..1f5f7d28c9e6 100644
--- a/Documentation/kdump/kdump.txt
+++ b/Documentation/kdump/kdump.txt
@@ -39,8 +39,7 @@ SETUP
39 and apply http://lse.sourceforge.net/kdump/patches/kexec-tools-1.101-kdump.patch 39 and apply http://lse.sourceforge.net/kdump/patches/kexec-tools-1.101-kdump.patch
40 and after that build the source. 40 and after that build the source.
41 41
422) Download and build the appropriate (latest) kexec/kdump (-mm) kernel 422) Download and build the appropriate (2.6.13-rc1 onwards) vanilla kernel.
43 patchset and apply it to the vanilla kernel tree.
44 43
45 Two kernels need to be built in order to get this feature working. 44 Two kernels need to be built in order to get this feature working.
46 45
@@ -84,15 +83,16 @@ SETUP
84 83
854) Load the second kernel to be booted using: 844) Load the second kernel to be booted using:
86 85
87 kexec -p <second-kernel> --crash-dump --args-linux --append="root=<root-dev> 86 kexec -p <second-kernel> --args-linux --elf32-core-headers
88 init 1 irqpoll" 87 --append="root=<root-dev> init 1 irqpoll"
89 88
90 Note: i) <second-kernel> has to be a vmlinux image. bzImage will not work, 89 Note: i) <second-kernel> has to be a vmlinux image. bzImage will not work,
91 as of now. 90 as of now.
92 ii) By default ELF headers are stored in ELF32 format (for i386). This 91 ii) By default ELF headers are stored in ELF64 format. Option
93 is sufficient to represent the physical memory up to 4GB. To store 92 --elf32-core-headers forces generation of ELF32 headers. gdb can
94 headers in ELF64 format, specifiy "--elf64-core-headers" on the 93 not open ELF64 headers on 32 bit systems. So creating ELF32
95 kexec command line additionally. 94 headers can come in handy for users who have non-PAE systems and
95 hence have memory less than 4GB.
96 iii) Specify "irqpoll" as command line parameter. This reduces driver 96 iii) Specify "irqpoll" as command line parameter. This reduces driver
97 initialization failures in second kernel due to shared interrupts. 97 initialization failures in second kernel due to shared interrupts.
98 98
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 111e98056195..db2603ceabba 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1175,6 +1175,11 @@ running once the system is up.
1175 New name for the ramdisk parameter. 1175 New name for the ramdisk parameter.
1176 See Documentation/ramdisk.txt. 1176 See Documentation/ramdisk.txt.
1177 1177
1178 rdinit= [KNL]
1179 Format: <full_path>
1180 Run specified binary instead of /init from the ramdisk,
1181 used for early userspace startup. See initrd.
1182
1178 reboot= [BUGS=IA-32,BUGS=ARM,BUGS=IA-64] Rebooting mode 1183 reboot= [BUGS=IA-32,BUGS=ARM,BUGS=IA-64] Rebooting mode
1179 Format: <reboot_mode>[,<reboot_mode2>[,...]] 1184 Format: <reboot_mode>[,<reboot_mode2>[,...]]
1180 See arch/*/kernel/reboot.c. 1185 See arch/*/kernel/reboot.c.
diff --git a/Documentation/power/swsusp-dmcrypt.txt b/Documentation/power/swsusp-dmcrypt.txt
new file mode 100644
index 000000000000..59931b46ff7e
--- /dev/null
+++ b/Documentation/power/swsusp-dmcrypt.txt
@@ -0,0 +1,138 @@
1Author: Andreas Steinmetz <ast@domdv.de>
2
3
4How to use dm-crypt and swsusp together:
5========================================
6
7Some prerequisites:
8You know how dm-crypt works. If not, visit the following web page:
9http://www.saout.de/misc/dm-crypt/
10You have read Documentation/power/swsusp.txt and understand it.
11You did read Documentation/initrd.txt and know how an initrd works.
12You know how to create or how to modify an initrd.
13
14Now your system is properly set up, your disk is encrypted except for
15the swap device(s) and the boot partition which may contain a mini
16system for crypto setup and/or rescue purposes. You may even have
17an initrd that does your current crypto setup already.
18
19At this point you want to encrypt your swap, too. Still you want to
20be able to suspend using swsusp. This, however, means that you
21have to be able to either enter a passphrase or that you read
22the key(s) from an external device like a pcmcia flash disk
23or a usb stick prior to resume. So you need an initrd that sets
24up dm-crypt and then asks swsusp to resume from the encrypted
25swap device.
26
27The most important thing is that you set up dm-crypt in such
28a way that the swap device you suspend to/resume from has
29always the same major/minor within the initrd as well as
30within your running system. The easiest way to achieve this is
31to always set up this swap device first with dmsetup, so that
32it will always look like the following:
33
34brw------- 1 root root 254, 0 Jul 28 13:37 /dev/mapper/swap0
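
To check that the numbers really match, the major/minor pair can be extracted from the listing and compared between the initrd and the running system. This is only a sketch: the function name devnum_from_ls is made up, and it assumes the usual "ls -l" layout in which the fifth and sixth fields hold "<major>," and "<minor>" (owner and group names without spaces).

```shell
# Print "major:minor" for a block device line as printed by "ls -l".
# Assumption: fields 5 and 6 are "<major>," and "<minor>".
devnum_from_ls() {
    # Drop the trailing comma from the major number, then join the two.
    awk '{ sub(/,$/, "", $5); print $5 ":" $6 }'
}
```

Running 'ls -l /dev/mapper/swap0 | devnum_from_ls' should print 254:0 in both environments for the listing above.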
35
36Now set up your kernel to use /dev/mapper/swap0 as the default
37resume partition, so your kernel .config contains:
38
39CONFIG_PM_STD_PARTITION="/dev/mapper/swap0"
40
41Prepare your boot loader to use the initrd you will create or
42modify. For lilo the simplest setup looks like the following
43lines:
44
45image=/boot/vmlinuz
46initrd=/boot/initrd.gz
47label=linux
48append="root=/dev/ram0 init=/linuxrc rw"
49
50Finally you need to create or modify your initrd. Let's assume
51you create an initrd that reads the required dm-crypt setup
52from a pcmcia flash disk card. The card is formatted with an ext2
53fs which resides on /dev/hde1 when the card is inserted. The
54card contains at least the encrypted swap setup in a file
55named "swapkey". /etc/fstab of your initrd contains something
56like the following:
57
58/dev/hda1 /mnt ext3 ro 0 0
59none /proc proc defaults,noatime,nodiratime 0 0
60none /sys sysfs defaults,noatime,nodiratime 0 0
61
62/dev/hda1 contains an unencrypted mini system that sets up all
63of your crypto devices, again by reading the setup from the
64pcmcia flash disk. What follows now is a /linuxrc for your
65initrd that allows you to resume from encrypted swap and that
66continues boot with your mini system on /dev/hda1 if resume
67does not happen:
68
69#!/bin/sh
70PATH=/sbin:/bin:/usr/sbin:/usr/bin
71mount /proc
72mount /sys
73mapped=0
74noresume=`grep -c noresume /proc/cmdline`
75if [ "$*" != "" ]
76then
77 noresume=1
78fi
79dmesg -n 1
80/sbin/cardmgr -q
81for i in 1 2 3 4 5 6 7 8 9 0
82do
83 if [ -f /proc/ide/hde/media ]
84 then
85 usleep 500000
86 mount -t ext2 -o ro /dev/hde1 /mnt
87 if [ -f /mnt/swapkey ]
88 then
89 dmsetup create swap0 /mnt/swapkey > /dev/null 2>&1 && mapped=1
90 fi
91 umount /mnt
92 break
93 fi
94 usleep 500000
95done
96killproc /sbin/cardmgr
97dmesg -n 6
98if [ $mapped = 1 ]
99then
100 if [ $noresume != 0 ]
101 then
102 mkswap /dev/mapper/swap0 > /dev/null 2>&1
103 fi
104 echo 254:0 > /sys/power/resume
105 dmsetup remove swap0
106fi
107umount /sys
108mount /mnt
109umount /proc
110cd /mnt
111pivot_root . mnt
112mount /proc
113umount -l /mnt
114umount /proc
115exec chroot . /sbin/init $* < dev/console > dev/console 2>&1
116
117Please don't mind the weird loop above, busybox's msh doesn't know
118the let statement. Now, what is happening in the script?
119First we have to decide if we want to try to resume, or not.
120We will not resume if booting with "noresume" or any parameters
121for init like "single" or "emergency" as boot parameters.
122
123Then we need to set up dmcrypt with the setup data from the
124pcmcia flash disk. If this succeeds we need to reset the swap
125device if we don't want to resume. The line "echo 254:0 > /sys/power/resume"
126then attempts to resume from the first device mapper device.
127Note that it is important to set the device in /sys/power/resume,
128regardless if resuming or not, otherwise later suspend will fail.
129If resume starts, script execution terminates here.
130
131Otherwise we just remove the encrypted swap device and leave it to the
132mini system on /dev/hda1 to set the whole crypto up (it is up to
133you to modify this to your taste).
134
135What then follows is the well known process to change the root
136file system and continue booting from there. I prefer to unmount
137the initrd prior to continue booting but it is up to you to modify
138this.
diff --git a/Documentation/power/swsusp.txt b/Documentation/power/swsusp.txt
index 7a6b78966459..b0d50840788e 100644
--- a/Documentation/power/swsusp.txt
+++ b/Documentation/power/swsusp.txt
@@ -1,22 +1,20 @@
1From kernel/suspend.c: 1Some warnings, first.
2 2
3 * BIG FAT WARNING ********************************************************* 3 * BIG FAT WARNING *********************************************************
4 * 4 *
5 * If you have unsupported (*) devices using DMA...
6 * ...say goodbye to your data.
7 *
8 * If you touch anything on disk between suspend and resume... 5 * If you touch anything on disk between suspend and resume...
9 * ...kiss your data goodbye. 6 * ...kiss your data goodbye.
10 * 7 *
11 * If your disk driver does not support suspend... (IDE does) 8 * If you do resume from initrd after your filesystems are mounted...
12 * ...you'd better find out how to get along 9 * ...bye bye root partition.
13 * without your data. 10 * [this is actually same case as above]
14 *
15 * If you change kernel command line between suspend and resume...
16 * ...prepare for nasty fsck or worse.
17 * 11 *
18 * If you change your hardware while system is suspended... 12 * If you have unsupported (*) devices using DMA, you may have some
19 * ...well, it was not good idea. 13 * problems. If your disk driver does not support suspend... (IDE does),
14 * it may cause some problems, too. If you change kernel command line
15 * between suspend and resume, it may do something wrong. If you change
16 * your hardware while system is suspended... well, it was not good idea;
17 * but it will probably only crash.
20 * 18 *
21 * (*) suspend/resume support is needed to make it safe. 19 * (*) suspend/resume support is needed to make it safe.
22 20
@@ -30,6 +28,13 @@ echo shutdown > /sys/power/disk; echo disk > /sys/power/state
30echo platform > /sys/power/disk; echo disk > /sys/power/state 28echo platform > /sys/power/disk; echo disk > /sys/power/state
31 29
32 30
31Encrypted suspend image:
32------------------------
33If you want to store your suspend image encrypted with a temporary
34key to prevent data gathering after resume you must compile
35crypto and the aes algorithm into the kernel - modules won't work
36as they cannot be loaded at resume time.
37
33 38
34Article about goals and implementation of Software Suspend for Linux 39Article about goals and implementation of Software Suspend for Linux
35~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 40~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -85,11 +90,6 @@ resume.
85You have your server on UPS. Power died, and UPS is indicating 30 90You have your server on UPS. Power died, and UPS is indicating 30
86seconds to failure. What do you do? Suspend to disk. 91seconds to failure. What do you do? Suspend to disk.
87 92
88Ethernet card in your server died. You want to replace it. Your
89server is not hotplug capable. What do you do? Suspend to disk,
90replace ethernet card, resume. If you are fast your users will not
91even see broken connections.
92
93 93
94Q: Maybe I'm missing something, but why don't the regular I/O paths work? 94Q: Maybe I'm missing something, but why don't the regular I/O paths work?
95 95
@@ -117,31 +117,6 @@ Q: Does linux support ACPI S4?
117 117
118A: Yes. That's what echo platform > /sys/power/disk does. 118A: Yes. That's what echo platform > /sys/power/disk does.
119 119
120Q: My machine doesn't work with ACPI. How can I use swsusp than ?
121
122A: Do a reboot() syscall with right parameters. Warning: glibc gets in
123its way, so check with strace:
124
125reboot(LINUX_REBOOT_MAGIC1, LINUX_REBOOT_MAGIC2, 0xd000fce2)
126
127(Thanks to Peter Osterlund:)
128
129#include <unistd.h>
130#include <syscall.h>
131
132#define LINUX_REBOOT_MAGIC1 0xfee1dead
133#define LINUX_REBOOT_MAGIC2 672274793
134#define LINUX_REBOOT_CMD_SW_SUSPEND 0xD000FCE2
135
136int main()
137{
138 syscall(SYS_reboot, LINUX_REBOOT_MAGIC1, LINUX_REBOOT_MAGIC2,
139 LINUX_REBOOT_CMD_SW_SUSPEND, 0);
140 return 0;
141}
142
143Also /sys/ interface should be still present.
144
145Q: What is 'suspend2'? 120Q: What is 'suspend2'?
146 121
147A: suspend2 is 'Software Suspend 2', a forked implementation of 122A: suspend2 is 'Software Suspend 2', a forked implementation of
@@ -311,3 +286,46 @@ As a rule of thumb use encrypted swap to protect your data while your
311system is shut down or suspended. Additionally use the encrypted 286system is shut down or suspended. Additionally use the encrypted
312suspend image to prevent sensitive data from being stolen after 287suspend image to prevent sensitive data from being stolen after
313resume. 288resume.
289
290Q: Why can't we suspend to a swap file?
291
292A: Because accessing swap file needs the filesystem mounted, and
293filesystem might do something wrong (like replaying the journal)
294during mount.
295
296There are a few ways to get that fixed:
297
2981) Probably could be solved by modifying every filesystem to support
299some kind of "really read-only!" option. Patches welcome.
300
3012) suspend2 gets around that by storing absolute positions in on-disk
302image (and blocksize), with resume parameter pointing directly to
303suspend header.
304
305Q: Is there a maximum system RAM size that is supported by swsusp?
306
307A: It should work okay with highmem.
308
309Q: Does swsusp (to disk) use only one swap partition or can it use
310multiple swap partitions (aggregate them into one logical space)?
311
312A: Only one swap partition, sorry.
313
314Q: If my application(s) causes lots of memory & swap space to be used
315(over half of the total system RAM), is it correct that it is likely
316to be useless to try to suspend to disk while that app is running?
317
318A: No, it should work okay, as long as your app does not mlock()
319it. Just prepare a big enough swap partition.
320
321Q: What information is useful for debugging suspend-to-disk problems?
322
323A: Well, the last messages on the screen are always useful. If something
324is broken, it is usually some kernel driver, therefore trying with as
325few modules loaded as possible helps a lot. I also prefer people to
326suspend from console, preferably without X running. Booting with
327init=/bin/bash, then swapon and starting suspend sequence manually
328usually does the trick. Then it is a good idea to try with the latest
329vanilla kernel.
330
331
diff --git a/Documentation/power/video.txt b/Documentation/power/video.txt
index 7a4a5036d123..526d6dd267ea 100644
--- a/Documentation/power/video.txt
+++ b/Documentation/power/video.txt
@@ -46,6 +46,12 @@ There are a few types of systems where video works after S3 resume:
46 POSTing bios works. Ole Rohne has patch to do just that at 46 POSTing bios works. Ole Rohne has patch to do just that at
47 http://dev.gentoo.org/~marineam/patch-radeonfb-2.6.11-rc2-mm2. 47 http://dev.gentoo.org/~marineam/patch-radeonfb-2.6.11-rc2-mm2.
48 48
49(8) on some systems, you can use the video_post utility mentioned here:
50 http://bugzilla.kernel.org/show_bug.cgi?id=3670. Do echo 3 > /sys/power/state
51 && /usr/sbin/video_post - which will initialize the display in console mode.
52 If you are in X, you can switch to a virtual terminal and back to X using
53 CTRL+ALT+F1 - CTRL+ALT+F7 to get the display working in graphical mode again.
54
49Now, if you pass acpi_sleep=something, and it does not work with your 55Now, if you pass acpi_sleep=something, and it does not work with your
50bios, you'll get a hard crash during resume. Be careful. Also it is 56bios, you'll get a hard crash during resume. Be careful. Also it is
51safest to do your experiments with plain old VGA console. The vesafb 57safest to do your experiments with plain old VGA console. The vesafb
@@ -64,7 +70,8 @@ Model hack (or "how to do it")
64------------------------------------------------------------------------------ 70------------------------------------------------------------------------------
65Acer Aspire 1406LC ole's late BIOS init (7), turn off DRI 71Acer Aspire 1406LC ole's late BIOS init (7), turn off DRI
66Acer TM 242FX vbetool (6) 72Acer TM 242FX vbetool (6)
67Acer TM C300 vga=normal (only suspend on console, not in X), vbetool (6) 73Acer TM C110 video_post (8)
74Acer TM C300 vga=normal (only suspend on console, not in X), vbetool (6) or video_post (8)
68Acer TM 4052LCi s3_bios (2) 75Acer TM 4052LCi s3_bios (2)
69Acer TM 636Lci s3_bios vga=normal (2) 76Acer TM 636Lci s3_bios vga=normal (2)
70Acer TM 650 (Radeon M7) vga=normal plus boot-radeon (5) gets text console back 77Acer TM 650 (Radeon M7) vga=normal plus boot-radeon (5) gets text console back
@@ -113,6 +120,7 @@ IBM ThinkPad T42p (2373-GTG) s3_bios (2)
113IBM TP X20 ??? (*) 120IBM TP X20 ??? (*)
114IBM TP X30 s3_bios (2) 121IBM TP X30 s3_bios (2)
115IBM TP X31 / Type 2672-XXH none (1), use radeontool (http://fdd.com/software/radeon/) to turn off backlight. 122IBM TP X31 / Type 2672-XXH none (1), use radeontool (http://fdd.com/software/radeon/) to turn off backlight.
123IBM TP X32 none (1), but backlight is on and video is trashed after long suspend
116IBM Thinkpad X40 Type 2371-7JG s3_bios,s3_mode (4) 124IBM Thinkpad X40 Type 2371-7JG s3_bios,s3_mode (4)
117Medion MD4220 ??? (*) 125Medion MD4220 ??? (*)
118Samsung P35 vbetool needed (6) 126Samsung P35 vbetool needed (6)
diff --git a/Documentation/scsi/aic7xxx.txt b/Documentation/scsi/aic7xxx.txt
index 160e7354cd1e..47e74ddc4bc9 100644
--- a/Documentation/scsi/aic7xxx.txt
+++ b/Documentation/scsi/aic7xxx.txt
@@ -1,5 +1,5 @@
1==================================================================== 1====================================================================
2= Adaptec Aic7xxx Fast -> Ultra160 Family Manager Set v6.2.28 = 2= Adaptec Aic7xxx Fast -> Ultra160 Family Manager Set v7.0 =
3= README for = 3= README for =
4= The Linux Operating System = 4= The Linux Operating System =
5==================================================================== 5====================================================================
@@ -131,6 +131,10 @@ The following information is available in this file:
131 SCSI "stub" effects. 131 SCSI "stub" effects.
132 132
1332. Version History 1332. Version History
134 7.0 (4th August, 2005)
135 - Updated driver to use SCSI transport class infrastructure
136 - Upported sequencer and core fixes from last adaptec released
137 version of the driver.
134 6.2.36 (June 3rd, 2003) 138 6.2.36 (June 3rd, 2003)
135 - Correct code that disables PCI parity error checking. 139 - Correct code that disables PCI parity error checking.
136 - Correct and simplify handling of the ignore wide residue 140 - Correct and simplify handling of the ignore wide residue
diff --git a/Documentation/scsi/scsi_mid_low_api.txt b/Documentation/scsi/scsi_mid_low_api.txt
index 7536823c0cb1..44df89c9c049 100644
--- a/Documentation/scsi/scsi_mid_low_api.txt
+++ b/Documentation/scsi/scsi_mid_low_api.txt
@@ -373,13 +373,11 @@ Summary:
373 scsi_activate_tcq - turn on tag command queueing 373 scsi_activate_tcq - turn on tag command queueing
374 scsi_add_device - creates new scsi device (lu) instance 374 scsi_add_device - creates new scsi device (lu) instance
375 scsi_add_host - perform sysfs registration and SCSI bus scan. 375 scsi_add_host - perform sysfs registration and SCSI bus scan.
376 scsi_add_timer - (re-)start timer on a SCSI command.
377 scsi_adjust_queue_depth - change the queue depth on a SCSI device 376 scsi_adjust_queue_depth - change the queue depth on a SCSI device
378 scsi_assign_lock - replace default host_lock with given lock 377 scsi_assign_lock - replace default host_lock with given lock
379 scsi_bios_ptable - return copy of block device's partition table 378 scsi_bios_ptable - return copy of block device's partition table
380 scsi_block_requests - prevent further commands being queued to given host 379 scsi_block_requests - prevent further commands being queued to given host
381 scsi_deactivate_tcq - turn off tag command queueing 380 scsi_deactivate_tcq - turn off tag command queueing
382 scsi_delete_timer - cancel timer on a SCSI command.
383 scsi_host_alloc - return a new scsi_host instance whose refcount==1 381 scsi_host_alloc - return a new scsi_host instance whose refcount==1
384 scsi_host_get - increments Scsi_Host instance's refcount 382 scsi_host_get - increments Scsi_Host instance's refcount
385 scsi_host_put - decrements Scsi_Host instance's refcount (free if 0) 383 scsi_host_put - decrements Scsi_Host instance's refcount (free if 0)
@@ -458,27 +456,6 @@ int scsi_add_host(struct Scsi_Host *shost, struct device * dev)
458 456
459 457
460/** 458/**
461 * scsi_add_timer - (re-)start timer on a SCSI command.
462 * @scmd: pointer to scsi command instance
463 * @timeout: duration of timeout in "jiffies"
464 * @complete: pointer to function to call if timeout expires
465 *
466 * Returns nothing
467 *
468 * Might block: no
469 *
470 * Notes: Each scsi command has its own timer, and as it is added
471 * to the queue, we set up the timer. When the command completes,
472 * we cancel the timer. An LLD can use this function to change
473 * the existing timeout value.
474 *
475 * Defined in: drivers/scsi/scsi_error.c
476 **/
477void scsi_add_timer(struct scsi_cmnd *scmd, int timeout,
478 void (*complete)(struct scsi_cmnd *))
479
480
481/**
482 * scsi_adjust_queue_depth - allow LLD to change queue depth on a SCSI device 459 * scsi_adjust_queue_depth - allow LLD to change queue depth on a SCSI device
483 * @sdev: pointer to SCSI device to change queue depth on 460 * @sdev: pointer to SCSI device to change queue depth on
484 * @tagged: 0 - no tagged queuing 461 * @tagged: 0 - no tagged queuing
@@ -566,24 +543,6 @@ void scsi_deactivate_tcq(struct scsi_device *sdev, int depth)
566 543
567 544
568/** 545/**
569 * scsi_delete_timer - cancel timer on a SCSI command.
570 * @scmd: pointer to scsi command instance
571 *
572 * Returns 1 if able to cancel timer else 0 (i.e. too late or already
573 * cancelled).
574 *
575 * Might block: no [may in the future if it invokes del_timer_sync()]
576 *
577 * Notes: All commands issued by upper levels already have a timeout
578 * associated with them. An LLD can use this function to cancel the
579 * timer.
580 *
581 * Defined in: drivers/scsi/scsi_error.c
582 **/
583int scsi_delete_timer(struct scsi_cmnd *scmd)
584
585
586/**
587 * scsi_host_alloc - create a scsi host adapter instance and perform basic 546 * scsi_host_alloc - create a scsi host adapter instance and perform basic
588 * initialization. 547 * initialization.
589 * @sht: pointer to scsi host template 548 * @sht: pointer to scsi host template
diff --git a/Documentation/sonypi.txt b/Documentation/sonypi.txt
index 0f3b2405d09e..c1237a925505 100644
--- a/Documentation/sonypi.txt
+++ b/Documentation/sonypi.txt
@@ -99,6 +99,7 @@ statically linked into the kernel). Those options are:
99 SONYPI_MEYE_MASK 0x0400 99 SONYPI_MEYE_MASK 0x0400
100 SONYPI_MEMORYSTICK_MASK 0x0800 100 SONYPI_MEMORYSTICK_MASK 0x0800
101 SONYPI_BATTERY_MASK 0x1000 101 SONYPI_BATTERY_MASK 0x1000
102 SONYPI_WIRELESS_MASK 0x2000
102 103
103 useinput: if set (which is the default) two input devices are 104 useinput: if set (which is the default) two input devices are
104 created, one which interprets the jogdial events as 105 created, one which interprets the jogdial events as
@@ -137,6 +138,15 @@ Bugs:
137 speed handling etc). Use ACPI instead of APM if it works on your 138 speed handling etc). Use ACPI instead of APM if it works on your
138 laptop. 139 laptop.
139 140
141 - sonypi lacks the ability to distinguish between certain key
142 events on some models.
143
144 - some models with the nvidia card (geforce go 6200 tc) use a
145 different way to adjust the backlighting of the screen. There
146 is a userspace utility to adjust the brightness on those models,
147 which can be downloaded from
148 http://www.acc.umu.se/~erikw/program/smartdimmer-0.1.tar.bz2
149
140 - since all development was done by reverse engineering, there is 150 - since all development was done by reverse engineering, there is
141 _absolutely no guarantee_ that this driver will not crash your 151 _absolutely no guarantee_ that this driver will not crash your
142 laptop. Permanently. 152 laptop. Permanently.
diff --git a/Documentation/sparse.txt b/Documentation/sparse.txt
index f97841478459..5df44dc894e5 100644
--- a/Documentation/sparse.txt
+++ b/Documentation/sparse.txt
@@ -57,7 +57,7 @@ With BK, you can just get it from
57 57
58and DaveJ has tar-balls at 58and DaveJ has tar-balls at
59 59
60 http://www.codemonkey.org.uk/projects/bitkeeper/sparse/ 60 http://www.codemonkey.org.uk/projects/git-snapshots/sparse/
61 61
62 62
63Once you have it, just do 63Once you have it, just do
diff --git a/Documentation/video4linux/CARDLIST.bttv b/Documentation/video4linux/CARDLIST.bttv
index 62a12a08e2ac..ec785f9f15a3 100644
--- a/Documentation/video4linux/CARDLIST.bttv
+++ b/Documentation/video4linux/CARDLIST.bttv
@@ -126,10 +126,12 @@ card=124 - AverMedia AverTV DVB-T 761
126card=125 - MATRIX Vision Sigma-SQ 126card=125 - MATRIX Vision Sigma-SQ
127card=126 - MATRIX Vision Sigma-SLC 127card=126 - MATRIX Vision Sigma-SLC
128card=127 - APAC Viewcomp 878(AMAX) 128card=127 - APAC Viewcomp 878(AMAX)
129card=128 - DVICO FusionHDTV DVB-T Lite 129card=128 - DViCO FusionHDTV DVB-T Lite
130card=129 - V-Gear MyVCD 130card=129 - V-Gear MyVCD
131card=130 - Super TV Tuner 131card=130 - Super TV Tuner
132card=131 - Tibet Systems 'Progress DVR' CS16 132card=131 - Tibet Systems 'Progress DVR' CS16
133card=132 - Kodicom 4400R (master) 133card=132 - Kodicom 4400R (master)
134card=133 - Kodicom 4400R (slave) 134card=133 - Kodicom 4400R (slave)
135card=134 - Adlink RTV24 135card=134 - Adlink RTV24
136card=135 - DViCO FusionHDTV 5 Lite
137card=136 - Acorp Y878F
diff --git a/Documentation/video4linux/CARDLIST.saa7134 b/Documentation/video4linux/CARDLIST.saa7134
index 1b5a3a9ffbe2..dc57225f39be 100644
--- a/Documentation/video4linux/CARDLIST.saa7134
+++ b/Documentation/video4linux/CARDLIST.saa7134
@@ -62,3 +62,6 @@
62 61 -> Philips TOUGH DVB-T reference design [1131:2004] 62 61 -> Philips TOUGH DVB-T reference design [1131:2004]
63 62 -> Compro VideoMate TV Gold+II 63 62 -> Compro VideoMate TV Gold+II
64 63 -> Kworld Xpert TV PVR7134 64 63 -> Kworld Xpert TV PVR7134
65 64 -> FlyTV mini Asus Digimatrix [1043:0210,1043:0210]
66 65 -> V-Stream Studio TV Terminator
67 66 -> Yuan TUN-900 (saa7135)
diff --git a/Documentation/video4linux/CARDLIST.tuner b/Documentation/video4linux/CARDLIST.tuner
index f3302e1b1b9c..f5876be658a6 100644
--- a/Documentation/video4linux/CARDLIST.tuner
+++ b/Documentation/video4linux/CARDLIST.tuner
@@ -64,3 +64,4 @@ tuner=62 - Philips TEA5767HN FM Radio
64tuner=63 - Philips FMD1216ME MK3 Hybrid Tuner 64tuner=63 - Philips FMD1216ME MK3 Hybrid Tuner
65tuner=64 - LG TDVS-H062F/TUA6034 65tuner=64 - LG TDVS-H062F/TUA6034
66tuner=65 - Ymec TVF66T5-B/DFF 66tuner=65 - Ymec TVF66T5-B/DFF
67tuner=66 - LG NTSC (TALN mini series)
diff --git a/Documentation/vm/locking b/Documentation/vm/locking
index c3ef09ae3bb1..f366fa956179 100644
--- a/Documentation/vm/locking
+++ b/Documentation/vm/locking
@@ -83,19 +83,18 @@ single address space optimization, so that the zap_page_range (from
83vmtruncate) does not lose sending ipi's to cloned threads that might 83vmtruncate) does not lose sending ipi's to cloned threads that might
84be spawned underneath it and go to user mode to drag in pte's into tlbs. 84be spawned underneath it and go to user mode to drag in pte's into tlbs.
85 85
86swap_list_lock/swap_device_lock 86swap_lock
87------------------------------- 87--------------
88The swap devices are chained in priority order from the "swap_list" header. 88The swap devices are chained in priority order from the "swap_list" header.
89The "swap_list" is used for the round-robin swaphandle allocation strategy. 89The "swap_list" is used for the round-robin swaphandle allocation strategy.
90The #free swaphandles is maintained in "nr_swap_pages". These two together 90The #free swaphandles is maintained in "nr_swap_pages". These two together
91are protected by the swap_list_lock. 91are protected by the swap_lock.
92 92
93The swap_device_lock, which is per swap device, protects the reference 93The swap_lock also protects all the device reference counts on the
94counts on the corresponding swaphandles, maintained in the "swap_map" 94corresponding swaphandles, maintained in the "swap_map" array, and the
95array, and the "highest_bit" and "lowest_bit" fields. 95"highest_bit" and "lowest_bit" fields.
96 96
97Both of these are spinlocks, and are never acquired from intr level. The 97The swap_lock is a spinlock, and is never acquired from intr level.
98locking hierarchy is swap_list_lock -> swap_device_lock.
99 98
100To prevent races between swap space deletion or async readahead swapins 99To prevent races between swap space deletion or async readahead swapins
101deciding whether a swap handle is being used, ie worthy of being read in 100deciding whether a swap handle is being used, ie worthy of being read in
diff --git a/Documentation/watchdog/watchdog-api.txt b/Documentation/watchdog/watchdog-api.txt
index 28388aa700c6..c5beb548cfc4 100644
--- a/Documentation/watchdog/watchdog-api.txt
+++ b/Documentation/watchdog/watchdog-api.txt
@@ -228,6 +228,26 @@ advantechwdt.c -- Advantech Single Board Computer
228 The GETSTATUS call returns if the device is open or not. 228 The GETSTATUS call returns if the device is open or not.
229 [FIXME -- silliness again?] 229 [FIXME -- silliness again?]
230 230
231booke_wdt.c -- PowerPC BookE Watchdog Timer
232
233 Timeout default varies according to frequency, supports
234 SETTIMEOUT
235
236 Watchdog can not be turned off, CONFIG_WATCHDOG_NOWAYOUT
237 does not make sense
238
239 GETSUPPORT returns the watchdog_info struct, and
240 GETSTATUS returns the supported options. GETBOOTSTATUS
241 returns a 1 if the last reset was caused by the
242 watchdog and a 0 otherwise. This watchdog can not be
243 disabled once it has been started. The wdt_period kernel
244 parameter selects which bit of the time base changing
245 from 0->1 will trigger the watchdog exception. Changing
246 the timeout from the ioctl calls will change the
247 wdt_period as defined above. Finally if you would like to
248 replace the default Watchdog Handler you can implement the
249 WatchdogHandler() function in your own code.
250
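The ioctl calls named in the booke_wdt entry above can be exercised from
userspace roughly as follows. This is a minimal sketch, not code from the
driver: the helper wdt_probe() and struct wdt_report are hypothetical names
made up for illustration, and the only kernel interfaces used are the
generic WDIOC_* ioctls from <linux/watchdog.h>. The generic API takes the
timeout in seconds; how booke_wdt maps that value onto wdt_period is not
shown here. Note that opening a watchdog device normally starts the timer,
and this watchdog can not be stopped afterwards, so the caller has to keep
sending WDIOC_KEEPALIVE on the returned descriptor.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/watchdog.h>

/* Hypothetical container for what the sketch reads back from the device. */
struct wdt_report {
	struct watchdog_info info;	/* filled in by WDIOC_GETSUPPORT */
	int bootstatus;			/* filled in by WDIOC_GETBOOTSTATUS */
	int timeout;			/* value handed back by WDIOC_SETTIMEOUT */
};

/*
 * Hypothetical helper: open the watchdog device at 'path', query it,
 * request 'timeout' and ping it once.
 *
 * Returns the open file descriptor on success, or -1 if the device
 * cannot be opened or one of the ioctls fails.  On a watchdog that
 * can not be disabled, the timer may keep running even after the
 * descriptor has been closed on the error path.
 */
int wdt_probe(const char *path, int timeout, struct wdt_report *out)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;

	out->timeout = timeout;
	if (ioctl(fd, WDIOC_GETSUPPORT, &out->info) < 0 ||
	    ioctl(fd, WDIOC_GETBOOTSTATUS, &out->bootstatus) < 0 ||
	    ioctl(fd, WDIOC_SETTIMEOUT, &out->timeout) < 0 ||
	    ioctl(fd, WDIOC_KEEPALIVE, 0) < 0) {
		close(fd);
		return -1;
	}

	/* Caller must keep issuing WDIOC_KEEPALIVE on this descriptor. */
	return fd;
}
```

A caller would typically pass "/dev/watchdog" as the path, inspect
out->bootstatus to see what the driver reports about the last reset, and
then loop, issuing WDIOC_KEEPALIVE more often than the timeout expires.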
231eurotechwdt.c -- Eurotech CPU-1220/1410 251eurotechwdt.c -- Eurotech CPU-1220/1410
232 252
233 The timeout can be set using the SETTIMEOUT ioctl and defaults 253 The timeout can be set using the SETTIMEOUT ioctl and defaults